LangChain Integration¶
Added in version 2.12.0.
LangChain Community
While the stable integrations (such as ChatOCIModelDeploymentVLLM and OCIModelDeploymentVLLM) are also available from LangChain Community, the integrations from ADS may provide additional or experimental features in the latest updates.
Requirements
The LangChain integration requires python>=3.9 and langchain>=0.3. Chat models also require langchain-openai.
LangChain-compatible models/interfaces are needed for LangChain applications to invoke LLMs deployed on the OCI Data Science Model Deployment service.
If you deploy an LLM on the OCI Model Deployment service using AI Quick Actions or HuggingFace TGI, you can use the integration models described on this page to build your application with LangChain.
Authentication¶
By default, the integration uses the same authentication method configured with ads.set_auth(). Optionally, you can also pass the auth keyword argument when initializing the model to use a specific authentication method for that model. For example, to use resource principal for all OCI authentication:
import ads
from ads.llm import ChatOCIModelDeployment
ads.set_auth(auth="resource_principal")
llm = ChatOCIModelDeployment(
    model="odsc-llm",  # default model name if deployed on AQUA
    endpoint="https://modeldeployment.oci.customer-oci.com/<OCID>/predict",
    # Optionally, you can specify additional keyword arguments for the model, e.g. temperature and default_headers.
    temperature=0.1,
    default_headers={"route": "v1/chat/completions"},  # default route for chat models
)
Alternatively, you can use a specific authentication method for the model:
import ads
from ads.llm import ChatOCIModelDeployment
llm = ChatOCIModelDeployment(
    model="odsc-llm",  # default model name if deployed on AQUA
    endpoint="https://modeldeployment.oci.customer-oci.com/<OCID>/predict",
    # Use security token authentication for the model
    auth=ads.auth.security_token(profile="my_profile"),
    # Optionally, you can specify additional keyword arguments for the model, e.g. temperature and default_headers.
    temperature=0.1,
    default_headers={"route": "v1/chat/completions"},  # default route for chat models
)
Completion Models¶
Completion models take a text string as input and return a string completion. To use completion models, your model should be deployed with the completion endpoint (/v1/completions).
from ads.llm import OCIModelDeploymentLLM
llm = OCIModelDeploymentLLM(
    model="odsc-llm",  # default model name if deployed on AQUA
    endpoint="https://modeldeployment.oci.customer-oci.com/<OCID>/predict",
    # Optionally, you can specify additional keyword arguments for the model.
    max_tokens=32,
    default_headers={"route": "v1/completions"},  # default route for completion models
)
# Invoke the LLM. The completion will be a string.
completion = llm.invoke("Who is the first president of United States?")
# Stream the completion
for chunk in llm.stream("Who is the first president of United States?"):
print(chunk, end="", flush=True)
# Invoke asynchronously
completion = await llm.ainvoke("Who is the first president of United States?")
# Stream asynchronously
async for chunk in llm.astream("Who is the first president of United States?"):
print(chunk, end="", flush=True)
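Since the completion model implements LangChain's standard Runnable interface, you can also process several prompts in a single call with batch(). A minimal sketch (the prompts are placeholders):
# Batch multiple prompts in one call. Each result is a string completion.
completions = llm.batch([
    "Who is the first president of United States?",
    "What is the capital of France?",
])
for completion in completions:
    print(completion)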
Chat Models¶
Chat models take chat messages as input and return an additional chat message (usually an AIMessage) as output. To use chat models, your model must be deployed with the chat completion endpoint (/v1/chat/completions).
from langchain_core.messages import HumanMessage, SystemMessage
from ads.llm import ChatOCIModelDeployment
llm = ChatOCIModelDeployment(
    model="odsc-llm",  # default model name if deployed on AQUA
    endpoint="<oci_model_deployment_url>/predict",
    # Optionally, you can specify additional keyword arguments for the model.
    max_tokens=32,
    default_headers={"route": "v1/chat/completions"},  # default route for chat models
)
messages = [
    HumanMessage(content="Who's the first president of United States?"),
]
# Invoke the LLM. The response will be `AIMessage`
response = llm.invoke(messages)
# Print the text of the response
print(response.content)
# Stream the response. Note that each chunk is an `AIMessageChunk`
for chunk in llm.stream(messages):
print(chunk.content, end="", flush=True)
# Invoke asynchronously
response = await llm.ainvoke(messages)
print(response.content)
# Stream asynchronously
async for chunk in llm.astream(messages):
print(chunk.content, end="")
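Because the chat model is a standard LangChain Runnable, it can also be composed with other components using the LangChain Expression Language (LCEL). Below is a minimal sketch chaining a prompt template, the chat model, and a string output parser; the prompt text is only an illustration:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        ("human", "{question}"),
    ]
)
# Compose prompt -> chat model -> output parser into a single chain.
chain = prompt | llm | StrOutputParser()
print(chain.invoke({"question": "Who's the first president of United States?"}))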
Embedding Models¶
You can also use an embedding model that's hosted on an OCI Data Science Model Deployment.
from ads.llm import OCIDataScienceEmbedding
# Create an instance of OCI Model Deployment Endpoint
# Replace the endpoint uri with your own
embeddings = OCIDataScienceEmbedding(
    endpoint="https://modeldeployment.us-ashburn-1.oci.customer-oci.com/<MD_OCID>/predict",
)
query = "Hello World!"
embeddings.embed_query(query)
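Like other LangChain embedding models, OCIDataScienceEmbedding also provides embed_documents() for embedding a batch of texts. A minimal sketch (the texts are placeholders):
# Embed multiple documents in one call; one vector is returned per input text.
documents = ["Hello World!", "LangChain integration with OCI Data Science"]
vectors = embeddings.embed_documents(documents)
print(len(vectors), len(vectors[0]))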
Tool Calling¶
The vLLM container supports tool/function calling on some models (e.g., Mistral and Hermes models). To use tool calling, you must customize the “Model deployment configuration” to use --enable-auto-tool-choice and specify --tool-call-parser when deploying the model with the vLLM container. A customized chat_template is also needed for tool/function calling to work with vLLM. ADS includes a convenient way to import the example templates provided by vLLM.
from ads.llm import ChatOCIModelDeploymentVLLM, ChatTemplates
llm = ChatOCIModelDeploymentVLLM(
    model="odsc-llm",  # default model name if deployed on AQUA
    endpoint="https://modeldeployment.oci.customer-oci.com/<OCID>/predict",
    # Set tool_choice to "auto" to enable tool/function calling.
    tool_choice="auto",
    # Use the modified mistral template provided by vLLM
    chat_template=ChatTemplates.mistral(),
)
The following is an example of creating an agent with a tool to get the current exchange rate:
import requests
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate
from langchain.agents import create_tool_calling_agent, AgentExecutor
@tool
def get_exchange_rate(currency: str) -> dict:
    """Obtain the current exchange rates of currency in ISO 4217 Three Letter Currency Code"""
    response = requests.get(f"https://open.er-api.com/v6/latest/{currency}")
    return response.json()
tools = [get_exchange_rate]
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant"),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, return_intermediate_steps=True)
agent_executor.invoke({"input": "what's the currency conversion of USD to Yen"})
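If you don't need a full agent, you can also bind the tool directly to the chat model and inspect the tool calls on the returned message. A minimal sketch, assuming the model deployment is configured for tool calling as described above:
# Bind the tool to the model; the model may respond with tool calls instead of plain text.
llm_with_tools = llm.bind_tools([get_exchange_rate])
ai_msg = llm_with_tools.invoke("What is the exchange rate of USD?")
# Each tool call contains the tool name and the arguments generated by the model.
for tool_call in ai_msg.tool_calls:
    print(tool_call["name"], tool_call["args"])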