LangChain Integration

Added in version 2.12.0.

LangChain Community

While the stable integrations (such as ChatOCIModelDeploymentVLLM and OCIModelDeploymentVLLM) are also available from LangChain Community, integrations from ADS may provide additional or experimental features in the latest updates.

Requirements

The LangChain integration requires python>=3.9 and langchain>=0.3. Chat models also require langchain-openai.

LangChain-compatible model interfaces are needed for LangChain applications to invoke LLMs deployed on the OCI Data Science Model Deployment service.

If you deploy an LLM on the OCI Model Deployment service using AI Quick Actions or HuggingFace TGI, you can use the integration models described on this page to build your application with LangChain.

Authentication

By default, the integration uses the same authentication method configured with ads.set_auth(). Optionally, you can also pass the auth keyword argument when initializing the model to use a specific authentication method for the model. For example, to use resource principal for all OCI authentication:

import ads
from ads.llm import ChatOCIModelDeploymentVLLM

ads.set_auth(auth="resource_principal")

llm = ChatOCIModelDeploymentVLLM(
    model="odsc-llm",
    endpoint="https://modeldeployment.oci.customer-oci.com/<OCID>/predict",
    # Optionally you can specify additional keyword arguments for the model, e.g. temperature.
    temperature=0.1,
)

Alternatively, you may use a specific authentication method for the model:

import ads
from ads.llm import ChatOCIModelDeploymentVLLM

llm = ChatOCIModelDeploymentVLLM(
    model="odsc-llm",
    endpoint="https://modeldeployment.oci.customer-oci.com/<OCID>/predict",
    # Use security token authentication for the model
    auth=ads.auth.security_token(profile="my_profile"),
    # Optionally you can specify additional keyword arguments for the model, e.g. temperature.
    temperature=0.1,
)
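
You can also use API key authentication with a named profile for a specific model. A minimal sketch, assuming ads.auth.api_keys() accepts a profile argument in your ADS version (the profile name is illustrative):

import ads
from ads.llm import ChatOCIModelDeploymentVLLM

llm = ChatOCIModelDeploymentVLLM(
    model="odsc-llm",
    endpoint="https://modeldeployment.oci.customer-oci.com/<OCID>/predict",
    # Use API key authentication with a named profile for this model only
    auth=ads.auth.api_keys(profile="my_profile"),
)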

Completion Models

Completion models take a text string as input and return a string completion. To use completion models, your model should be deployed with the completion endpoint (/v1/completions). The following example shows how you can use the OCIModelDeploymentVLLM class for a model deployed with the vLLM container. If you deployed the model with the TGI container, you can use OCIModelDeploymentTGI similarly.

from ads.llm import OCIModelDeploymentVLLM

llm = OCIModelDeploymentVLLM(
    model="odsc-llm",
    endpoint="https://modeldeployment.oci.customer-oci.com/<OCID>/predict",
    # Optionally you can specify additional keyword arguments for the model.
    max_tokens=32,
)

# Invoke the LLM. The completion will be a string.
completion = llm.invoke("Who is the first president of the United States?")

# Stream the completion
for chunk in llm.stream("Who is the first president of the United States?"):
    print(chunk, end="", flush=True)

# Invoke asynchronously
completion = await llm.ainvoke("Who is the first president of the United States?")

# Stream asynchronously
async for chunk in llm.astream("Who is the first president of the United States?"):
    print(chunk, end="", flush=True)
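
Since these completion models implement LangChain's standard Runnable interface, they can also be composed into a chain. A minimal sketch using a prompt template (the template wording is illustrative):

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template("Answer briefly: {question}")
# Pipe the prompt into the LLM and parse the model output into a string.
chain = prompt | llm | StrOutputParser()
print(chain.invoke({"question": "Who is the first president of the United States?"}))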

Chat Models

Chat models take chat messages as input and return a chat message (usually an AIMessage) as output. To use chat models, your model must be deployed with the chat completion endpoint (/v1/chat/completions). The following example shows how you can use the ChatOCIModelDeploymentVLLM class for a model deployed with the vLLM container. If you deployed the model with the TGI container, you can use ChatOCIModelDeploymentTGI similarly.

from langchain_core.messages import HumanMessage, SystemMessage
from ads.llm import ChatOCIModelDeploymentVLLM

llm = ChatOCIModelDeploymentVLLM(
    model="odsc-llm",
    endpoint="<oci_model_deployment_url>/predict",
    # Optionally you can specify additional keyword arguments for the model.
    max_tokens=32,
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Who's the first president of the United States?"),
]

# Invoke the LLM. The response will be an `AIMessage`.
response = llm.invoke(messages)
# Print the text of the response
print(response.content)

# Stream the response. Note that each chunk is an `AIMessageChunk`.
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)

# Invoke asynchronously
response = await llm.ainvoke(messages)
print(response.content)

# Stream asynchronously
async for chunk in llm.astream(messages):
    print(chunk.content, end="", flush=True)
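
As with other LangChain chat models, the standard Runnable interface also supports batching over multiple inputs. A minimal sketch:

# Batch multiple conversations in a single call. Each result is an `AIMessage`.
responses = llm.batch(
    [
        [HumanMessage(content="Who is the first president of the United States?")],
        [HumanMessage(content="Who is the second president of the United States?")],
    ]
)
for response in responses:
    print(response.content)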

Tool Calling

The vLLM container supports tool/function calling on some models (e.g., Mistral and Hermes models). To use tool calling, you must customize the “Model deployment configuration” to use --enable-auto-tool-choice and specify --tool-call-parser when deploying the model with the vLLM container. A customized chat_template is also needed for tool/function calling to work with vLLM. ADS includes a convenient way to import the example templates provided by vLLM.

from ads.llm import ChatOCIModelDeploymentVLLM, ChatTemplates

llm = ChatOCIModelDeploymentVLLM(
    model="odsc-llm",
    endpoint="https://modeldeployment.oci.customer-oci.com/<OCID>/predict",
    # Set tool_choice to "auto" to enable tool/function calling.
    tool_choice="auto",
    # Use the modified mistral template provided by vLLM
    chat_template=ChatTemplates.mistral(),
)
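
For Hermes models, the corresponding vLLM example template can be used in the same way. A sketch, assuming your ADS version also provides a ChatTemplates.hermes() helper:

llm = ChatOCIModelDeploymentVLLM(
    model="odsc-llm",
    endpoint="https://modeldeployment.oci.customer-oci.com/<OCID>/predict",
    tool_choice="auto",
    # Use the modified Hermes template provided by vLLM (assumed helper)
    chat_template=ChatTemplates.hermes(),
)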

The following is an example of creating an agent with a tool to get the current exchange rate:

import requests
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate
from langchain.agents import create_tool_calling_agent, AgentExecutor

@tool
def get_exchange_rate(currency: str) -> dict:
    """Obtain the current exchange rates for a currency given as an ISO 4217 three-letter currency code."""

    response = requests.get(f"https://open.er-api.com/v6/latest/{currency}")
    return response.json()

tools = [get_exchange_rate]
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant"),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)

agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, return_intermediate_steps=True)
agent_executor.invoke({"input": "what's the currency conversion of USD to Yen"})
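
Alternatively, you can bind the tool directly to the chat model and inspect the proposed tool calls without running a full agent. A minimal sketch, assuming the model supports LangChain's standard bind_tools() interface:

# Bind the tool to the model so that it can emit tool calls.
llm_with_tools = llm.bind_tools(tools)
ai_msg = llm_with_tools.invoke("What is the exchange rate for USD?")
# Each tool call includes the tool name and the arguments proposed by the model.
print(ai_msg.tool_calls)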