For detailed documentation of all ChatNVIDIA features and configurations, head to the API reference.
Overview
The langchain-nvidia-ai-endpoints package contains LangChain integrations for chat models and embeddings powered by NVIDIA AI Foundation Models and hosted on the NVIDIA API Catalog.
NVIDIA AI Foundation models are community- and NVIDIA-built models that are optimized to deliver the best performance on NVIDIA-accelerated infrastructure. You can use the API to query live endpoints that are available on the NVIDIA API Catalog to get quick results from a DGX-hosted cloud compute environment, or you can download models from NVIDIA’s API catalog with NVIDIA NIM, which is included with the NVIDIA AI Enterprise license. The ability to run models on-premises gives your enterprise ownership of your customizations and full control of your IP and AI application.
NIM microservices are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIM microservices are containers that provide interactive APIs for running inference on an AI Model.
This example goes over how to use LangChain to interact with NVIDIA models via the ChatNVIDIA class.
For more information on accessing embedding models through this API, refer to the NVIDIAEmbeddings documentation.
Integration details
| Class | Package | Serializable | JS support |
|---|---|---|---|
| ChatNVIDIA | langchain-nvidia-ai-endpoints | beta | ❌ |
Model features
| Tool calling | Structured output | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
|---|---|---|---|---|---|---|---|---|
| ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ |
Install the package
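Assuming a pip-based environment (the % prefix is for notebook use; drop it on the command line):

```python
%pip install --upgrade --quiet langchain-nvidia-ai-endpoints
```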
Access the NVIDIA API Catalog
To get access to the NVIDIA API Catalog, do the following:

- Create a free account on the NVIDIA API Catalog and log in.
- Click your profile icon, and then click API Keys. The API Keys page appears.
- Click Generate API Key. The Generate API Key window appears.
- Click Generate Key. You should see API Key Granted, and your key appears.
- Copy and save the key as NVIDIA_API_KEY.
- To verify your key, use the following code.
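A minimal sketch of setting and sanity-checking the key from the environment (NVIDIA API Catalog keys begin with nvapi-):

```python
import getpass
import os

# Prompt for the key only if it is not already set in the environment.
if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvapi_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key
```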
Instantiation
Now we can access models in the NVIDIA API Catalog:
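A minimal sketch; the mistralai/mixtral-8x22b-instruct-v0.1 model used here is one of the general chat models mentioned later on this page:

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(model="mistralai/mixtral-8x22b-instruct-v0.1")
```

Invocation
With the client in place, a single call looks like the following; the prompt is only an illustration:

```python
result = llm.invoke("Write a ballad about LangChain.")
print(result.content)
```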
Self-host with NVIDIA NIM Microservices
When you are ready to deploy your AI application, you can self-host models with NVIDIA NIM. For more information, refer to NVIDIA NIM Microservices. The following code connects to locally hosted NIM Microservices.
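A sketch of pointing the client at a local endpoint; the base_url and model name below are placeholders for your own deployment:

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Point the client at a locally hosted NIM instead of the hosted API Catalog.
llm = ChatNVIDIA(
    base_url="http://localhost:8000/v1",  # placeholder: your NIM endpoint
    model="meta/llama3-8b-instruct",      # placeholder: the model your NIM serves
)
```

Stream, batch, and async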
These models natively support streaming and, as is the case with all LangChain LLMs, they expose a batch method to handle concurrent requests, as well as async methods for invoke, stream, and batch. Below are a few examples.
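A few sketches, reusing an llm client created as above; the prompts are illustrative:

```python
import asyncio

# Token-level streaming
for chunk in llm.stream("How far can a seagull fly in one day?"):
    print(chunk.content, end="", flush=True)

# Batch: several prompts handled concurrently
for msg in llm.batch(["What is the capital of France?", "What is the capital of Japan?"]):
    print(msg.content)

# Async variants: ainvoke, astream, abatch
async def demo() -> None:
    result = await llm.ainvoke("Name a famous river in Egypt.")
    print(result.content)
    async for chunk in llm.astream("How is a rainbow formed?"):
        print(chunk.content, end="", flush=True)

asyncio.run(demo())
```

Supported models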
Querying available_models will give you all of the models offered by your API credentials.
The playground_ prefix is optional.
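One way to list them, assuming the get_available_models helper exposed by recent versions of the package:

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Every model reachable with your API credentials.
for model in ChatNVIDIA.get_available_models():
    print(model.id)
```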
Model types
All of the models above are supported and can be accessed via ChatNVIDIA.
Some model types support unique prompting techniques and chat messages. We will review a few important ones below.
To find out more about a specific model, navigate to the API section of that model on the NVIDIA API Catalog.
General chat
Models such as meta/llama3-8b-instruct and mistralai/mixtral-8x22b-instruct-v0.1 are good all-around models that you can use with any LangChain chat messages. An example is shown below.
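A sketch of a simple prompt-to-string chain with one of these models; the system prompt and question are illustrative:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful AI assistant named Fred."),
        ("user", "{input}"),
    ]
)
chain = prompt | ChatNVIDIA(model="meta/llama3-8b-instruct") | StrOutputParser()

# Stream the answer token by token.
for txt in chain.stream({"input": "What's your name?"}):
    print(txt, end="")
```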
Code generation
These models accept the same arguments and input structure as regular chat models, but they tend to perform better on code-generation and structured code tasks. An example of this is meta/codellama-70b.
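A sketch using the same chain shape as above, swapped to the code model; the system prompt is illustrative:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an expert coding AI. Respond only in valid Python; no narration."),
        ("user", "{input}"),
    ]
)
chain = prompt | ChatNVIDIA(model="meta/codellama-70b") | StrOutputParser()

for txt in chain.stream({"input": "How do I solve the fizz buzz problem?"}):
    print(txt, end="")
```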
Multimodal
NVIDIA also supports multimodal inputs, meaning you can provide both images and text for the model to reason over. An example model supporting multimodal inputs is nvidia/neva-22b.
Below is an example use:
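A sketch that instantiates the multimodal model and picks an image to work with; the URL is a placeholder for any publicly reachable image:

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Placeholder: substitute any publicly reachable image URL.
image_url = "https://example.com/path/to/image.png"

llm = ChatNVIDIA(model="nvidia/neva-22b")
```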
Passing an image as a URL
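A sketch that sends the URL itself inside the message content, reusing the llm and image_url from the setup above:

```python
from langchain_core.messages import HumanMessage

llm.invoke(
    [
        HumanMessage(
            content=[
                {"type": "text", "text": "Describe this image:"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ]
        )
    ]
)
```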
Passing an image as a base64 encoded string
At the moment, some extra processing happens client-side to support larger images like the one above. But for smaller images (and to better illustrate the process going on under the hood), we can directly pass in the image as shown below:
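A sketch that downloads a small image, base64-encodes it, and sends it as a data URL; the URL is again a placeholder:

```python
import base64

import requests
from langchain_core.messages import HumanMessage

# Placeholder: substitute a small, publicly reachable image.
small_image_url = "https://example.com/path/to/small-image.png"
image_b64 = base64.b64encode(requests.get(small_image_url).content).decode("utf-8")

llm.invoke(
    [
        HumanMessage(
            content=[
                {"type": "text", "text": "What do you see in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ]
        )
    ]
)
```

Directly within the string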
The NVIDIA API uniquely accepts images as base64-encoded images inlined within <img/> HTML tags. While this isn’t interoperable with other LLMs, you can directly prompt the model accordingly.
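A sketch of inlining the tag directly in the prompt string, reusing the base64 payload computed above:

```python
llm.invoke(f'What\'s in this image?\n<img src="data:image/png;base64,{image_b64}" />')
```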
Example usage within a RunnableWithMessageHistory
Like any other integration, ChatNVIDIA supports chat utilities such as RunnableWithMessageHistory, which is analogous to using ConversationChain. Below, we show the LangChain RunnableWithMessageHistory example applied to the mistralai/mixtral-8x22b-instruct-v0.1 model.
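A sketch using langchain-core's in-memory chat history; the session-id plumbing follows the generic LangChain pattern rather than anything specific to ChatNVIDIA:

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# One history object per conversation id.
store = {}

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{input}"),
    ]
)
chain = prompt | ChatNVIDIA(model="mistralai/mixtral-8x22b-instruct-v0.1")

conversation = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

config = {"configurable": {"session_id": "demo-session"}}
print(conversation.invoke({"input": "Hi, I'm Olivia."}, config=config).content)
print(conversation.invoke({"input": "What did I just tell you my name was?"}, config=config).content)
```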
Tool calling
Starting in v0.2, ChatNVIDIA supports bind_tools.
ChatNVIDIA provides integration with the variety of models on build.nvidia.com as well as local NIMs. Not all of these models are trained for tool calling. Be sure to select a model that supports tool calling for your experimentation and applications.
You can get a list of models that are known to support tool calling as shown below.
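A sketch that filters the catalog by the supports_tools flag on the model metadata (present in recent versions of the package) and binds a toy tool; the weather function is hypothetical:

```python
from langchain_core.tools import tool
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Models advertised as tool-calling capable.
tool_models = [m for m in ChatNVIDIA.get_available_models() if m.supports_tools]
print([m.id for m in tool_models])

@tool
def get_current_weather(location: str) -> str:
    """Look up the current weather for a location."""
    return f"It is sunny in {location}."  # toy implementation

llm = ChatNVIDIA(model=tool_models[0].id).bind_tools([get_current_weather])
response = llm.invoke("What is the weather in Boston?")
print(response.tool_calls)
```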
API reference
For detailed documentation of all ChatNVIDIA features and configurations, head to the API reference: python.langchain.com/api_reference/nvidia_ai_endpoints/chat_models/langchain_nvidia_ai_endpoints.chat_models.ChatNVIDIA.html
Related topics
- langchain-nvidia-ai-endpoints package README
- Overview of NVIDIA NIM for Large Language Models (LLMs)
- Overview of NeMo Retriever Embedding NIM
- Overview of NeMo Retriever Reranking NIM
- NVIDIAEmbeddings Model for RAG Workflows
- NVIDIA Provider Page