LangChain Integration

Added in version 2.12.0.

LangChain Community

While stable integrations (such as ChatOCIModelDeploymentVLLM and OCIModelDeploymentVLLM) are also available from LangChain Community, the integrations in ADS may provide additional or experimental features in their latest updates.

Requirements

The LangChain integration requires python>=3.9 and langchain>=0.3. Chat models also require langchain-openai.

LangChain-compatible model interfaces are needed for LangChain applications to invoke LLMs deployed on the OCI Data Science Model Deployment service.

If you deploy an LLM on the OCI Model Deployment service using AI Quick Actions or HuggingFace TGI, you can use the integration models described on this page to build your application with LangChain.

Authentication

By default, the integration uses the authentication method configured with ads.set_auth(). Optionally, you can pass the auth keyword argument when initializing the model to use a specific authentication method for that model. For example, to use resource principal for all OCI authentication:

import ads
from ads.llm import ChatOCIModelDeployment

ads.set_auth(auth="resource_principal")

llm = ChatOCIModelDeployment(
    model="odsc-llm", # default model name if deployed on AQUA
    endpoint="https://modeldeployment.oci.customer-oci.com/<OCID>/predict",
    # Optionally you can specify additional keyword arguments for the model, e.g. temperature and default_headers.
    temperature=0.1,
    default_headers={"route": "v1/chat/completions"}, # default route for chat models
)

Alternatively, you may use a specific authentication method for an individual model:

import ads
from ads.llm import ChatOCIModelDeployment

llm = ChatOCIModelDeployment(
    model="odsc-llm", # default model name if deployed on AQUA
    endpoint="https://modeldeployment.oci.customer-oci.com/<OCID>/predict",
    # Use security token authentication for the model
    auth=ads.auth.security_token(profile="my_profile"),
    # Optionally you can specify additional keyword arguments for the model, e.g. temperature and default_headers.
    temperature=0.1,
    default_headers={"route": "v1/chat/completions"}, # default route for chat models
)

Completion Models

Completion models take a text string as input and return a string completion. To use completion models, your model should be deployed with the completion endpoint (/v1/completions).

from ads.llm import OCIModelDeploymentLLM

llm = OCIModelDeploymentLLM(
    model="odsc-llm", # default model name if deployed on AQUA
    endpoint="https://modeldeployment.oci.customer-oci.com/<OCID>/predict",
    # Optionally you can specify additional keyword arguments for the model.
    max_tokens=32,
    default_headers={"route": "v1/completions"}, # default route for completion models
)

# Invoke the LLM. The completion will be a string.
completion = llm.invoke("Who is the first president of the United States?")

# Stream the completion
for chunk in llm.stream("Who is the first president of the United States?"):
    print(chunk, end="", flush=True)

# Invoke asynchronously
completion = await llm.ainvoke("Who is the first president of the United States?")

# Stream asynchronously
async for chunk in llm.astream("Who is the first president of the United States?"):
    print(chunk, end="", flush=True)
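
Because OCIModelDeploymentLLM implements the standard LangChain Runnable interface, you can also compose it into a chain. Below is a minimal sketch using PromptTemplate and StrOutputParser; the prompt text and variable name are illustrative.

from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Build a prompt -> LLM -> string parser chain with LCEL.
prompt = PromptTemplate.from_template("Answer in one sentence: {question}")
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"question": "Who is the first president of the United States?"}))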

Chat Models

Chat models take chat messages as input and return a chat message (usually an AIMessage) as output. To use chat models, your model must be deployed with the chat completion endpoint (/v1/chat/completions).

from langchain_core.messages import HumanMessage, SystemMessage
from ads.llm import ChatOCIModelDeployment

llm = ChatOCIModelDeployment(
    model="odsc-llm", # default model name if deployed on AQUA
    endpoint="<oci_model_deployment_url>/predict",
    # Optionally you can specify additional keyword arguments for the model.
    max_tokens=32,
    default_headers={"route": "v1/chat/completions"}, # default route for chat models
)

messages = [
    HumanMessage(content="Who's the first president of the United States?"),
]

# Invoke the LLM. The response will be an `AIMessage`
response = llm.invoke(messages)
# Print the text of the response
print(response.content)

# Stream the response. Note that each chunk is an `AIMessageChunk`
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)

# Invoke asynchronously
response = await llm.ainvoke(messages)
print(response.content)

# Stream asynchronously
async for chunk in llm.astream(messages):
    print(chunk.content, end="")
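
ChatOCIModelDeployment can be chained with a prompt template in the same way. This is a minimal sketch assuming the llm defined above; the prompt text is illustrative.

from langchain_core.prompts import ChatPromptTemplate

# Compose a chat prompt with the model using LCEL.
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a concise assistant."),
        ("human", "{question}"),
    ]
)
chain = prompt | llm

response = chain.invoke({"question": "Who's the first president of the United States?"})
print(response.content)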

Embedding Models

You can also use an embedding model that's hosted on an OCI Data Science Model Deployment.

from ads.llm import OCIDataScienceEmbedding

# Create an instance of OCI Model Deployment Endpoint
# Replace the endpoint uri with your own
embeddings = OCIDataScienceEmbedding(
    endpoint="https://modeldeployment.us-ashburn-1.oci.customer-oci.com/<MD_OCID>/predict",
)

query = "Hello World!"
embeddings.embed_query(query)
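
To embed multiple texts in one call, use the standard embed_documents method from the LangChain Embeddings interface. A minimal sketch with illustrative inputs:

documents = ["Hello World!", "Bonjour le monde!"]
vectors = embeddings.embed_documents(documents)
print(len(vectors), len(vectors[0]))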

Tool Calling

The vLLM container supports tool/function calling on some models (e.g., Mistral and Hermes models). To use tool calling, you must customize the “Model deployment configuration” to use --enable-auto-tool-choice and specify --tool-call-parser when deploying the model with the vLLM container. A customized chat_template is also needed for tool/function calling to work with vLLM. ADS includes a convenient way to import the example templates provided by vLLM.

from ads.llm import ChatOCIModelDeploymentVLLM, ChatTemplates

llm = ChatOCIModelDeploymentVLLM(
    model="odsc-llm", # default model name if deployed on AQUA
    endpoint="https://modeldeployment.oci.customer-oci.com/<OCID>/predict",
    # Set tool_choice to "auto" to enable tool/function calling.
    tool_choice="auto",
    # Use the modified mistral template provided by vLLM
    chat_template=ChatTemplates.mistral()
)

The following is an example of creating an agent with a tool that gets the current exchange rate:

import requests
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate
from langchain.agents import create_tool_calling_agent, AgentExecutor

@tool
def get_exchange_rate(currency: str) -> dict:
    """Obtain the current exchange rates for a currency given its ISO 4217 three-letter currency code."""

    response = requests.get(f"https://open.er-api.com/v6/latest/{currency}")
    return response.json()

tools = [get_exchange_rate]
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant"),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)

agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, return_intermediate_steps=True)
agent_executor.invoke({"input": "what's the currency conversion of USD to Yen"})
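
If you want to inspect the raw tool calls instead of running a full agent, you can bind the tools to the chat model directly. This is a minimal sketch using the standard LangChain bind_tools interface, assuming the llm and get_exchange_rate tool defined above.

# Bind the tool schema to the model and invoke it directly.
llm_with_tools = llm.bind_tools([get_exchange_rate])
ai_msg = llm_with_tools.invoke("What is the exchange rate for USD?")

# Any tool invocations requested by the model are available on `tool_calls`.
print(ai_msg.tool_calls)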