Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
This release adds the ability to switch between API Catalog models and on-prem models using NIM-LLM, and adds documentation on how to build a RAG application from scratch. It also releases a containerized end-to-end RAG evaluation application integrated with the RAG chain-server APIs.
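The cloud/on-prem switch amounts to choosing which OpenAI-compatible base URL the RAG pipeline talks to. A minimal sketch, assuming a self-hosted NIM on `localhost:8000` and the public API Catalog endpoint; the function name and defaults are illustrative, not the project's actual configuration interface:

```python
# Sketch only: pick the LLM endpoint for the chain server.
# `nim_url` and the default port are assumptions for illustration.
API_CATALOG_URL = "https://integrate.api.nvidia.com/v1"

def resolve_llm_base_url(on_prem: bool,
                         nim_url: str = "http://localhost:8000/v1") -> str:
    """Return the OpenAI-compatible base URL the RAG pipeline should call."""
    return nim_url if on_prem else API_CATALOG_URL
```

Everything downstream (retrieval, prompting) stays the same; only the endpoint the generation step calls changes.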
- `/health`: provides a health check for the chain server.
- Renamed the `csv_rag` example to `structured_data_rag`.
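The `/health` endpoint can be probed with a few lines of standard-library Python. The host and port below (`localhost:8081`) are assumptions; substitute whatever address your chain server deployment exposes:

```python
import urllib.request
import urllib.error

def health_url(base: str = "http://localhost:8081") -> str:
    """Build the health-check URL from the chain server's base address."""
    return base.rstrip("/") + "/health"

def is_healthy(base: str = "http://localhost:8081") -> bool:
    """True if the chain server answers /health with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base), timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

This is handy as a readiness probe before sending queries to the other chain-server APIs.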
- The `nv-ai-foundation` and `nv-api-catalog` LLM engines are renamed to `nvidia-ai-endpoints`.
- The `nv-ai-foundation` embedding engine is renamed to `nvidia-ai-endpoints`.
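Configurations written against the old engine names can be migrated mechanically. A minimal sketch; the mapping comes from the renames above, while the helper itself is hypothetical:

```python
# Old engine names -> new name (from this release's renames).
RENAMED_ENGINES = {
    "nv-ai-foundation": "nvidia-ai-endpoints",  # LLM and embedding engine
    "nv-api-catalog": "nvidia-ai-endpoints",    # LLM engine
}

def migrate_engine_name(name: str) -> str:
    """Map a deprecated engine name to its replacement; pass others through."""
    return RENAMED_ENGINES.get(name, name)
```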
- The `developer_rag` example uses the UAE-Large-V1 embedding model.
- Uses `ai-embed-qa-4` as the embedding model for API Catalog examples instead of `nvolveqa_40k`.
- Uses the `pgvector/pgvector:pg16` container image.
- Uses the `ai-mixtral-8x7b-instruct` model for response generation.
- Uses `ai-llama3-70b` for response and code generation.

This release adds new dedicated RAG examples showcasing state-of-the-art use cases, switches to the latest API Catalog endpoints from NVIDIA, and refactors the API interface of the chain-server. It also improves the developer experience by adding GitHub Pages based documentation and streamlining the example deployment flow with dedicated compose files.
- Renamed the `llm-playground` service in compose files to `rag-playground`.
- Uses `NVIDIA_API_KEY` in the `compose.env` file.

This release adds new dedicated notebooks showcasing usage of cloud-based NVIDIA AI Foundation models, upgrades the Milvus container version to enable GPU-accelerated vector search, and adds support for the FAISS vector database. Detailed changes are listed below:
- `nemo-embed` and `nemo-infer`.
- `COLLECTION_NAME`.
- Added FAISS as a generic vector database solution behind `utils.py`.
- `23.12-py3`.
- `utils.py`.
- Changed the `get_llm` utility in `utils.py` to return a LangChain wrapper instead of LlamaIndex wrappers.

This release adds support for the PGvector vector database, speech-in speech-out support using RIVA, and RAG observability tooling. It also adds a dedicated example for a RAG pipeline using only models from NVIDIA AI Foundation, and an example demonstrating query decomposition. Detailed changes are listed below:
- Changed the `minio` service's port to `9010` from `9000` in the Docker-based deployment.
- Moved the `evaluation` directory from the top level to under `tools` and created a dedicated compose file.
- Renamed the `ai-playground` model engine name to `nv-ai-foundation` in configurations.

This release builds on the feedback received and brings many improvements, bug fixes, and new features. It is the first release to include support for NVIDIA AI Foundation models and for quantized LLM models. Detailed changes are listed below: