AI Workbench example RAG project with Gradio chat app
This project provides an example of a Retrieval Augmented Generation (RAG) application, targeting developers and researchers who want to build and deploy conversational AI systems with custom document knowledge. It offers a flexible Gradio-based chat interface and supports multiple inference backends, including local GPUs, cloud endpoints, and NVIDIA Inference Microservices (NIMs).
How It Works
The project leverages NVIDIA AI Workbench for environment management and reproducibility. Users can embed their documents into a local vector database for retrieval. Inference can be handled by Hugging Face's Text Generation Inference (TGI) server for local GPU execution, NVIDIA API Catalog endpoints for cloud-based inference, or NIMs for microservice deployments. This hybrid approach allows users to choose the most suitable inference strategy based on their hardware and deployment needs.
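The retrieve-then-generate flow described above can be sketched as follows. This is a minimal illustration, not the project's actual code: a toy in-memory store stands in for the local vector database, `embed()` is a bag-of-words stand-in for a real embedding model, and the generation step that would go to TGI, an API Catalog endpoint, or a NIM is stubbed out.

```python
# Minimal sketch of a RAG pipeline: embed document chunks, retrieve the
# most relevant one for a question, and build a prompt for the backend.
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in embedding: bag-of-words term counts. A real system would
    # call an embedding model here.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class VectorStore:
    """Toy in-memory vector database of (embedding, chunk) pairs."""

    def __init__(self) -> None:
        self.docs: list[tuple[Counter, str]] = []

    def add(self, chunk: str) -> None:
        self.docs.append((embed(chunk), chunk))

    def query(self, question: str, k: int = 1) -> list[str]:
        q = embed(question)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(question: str, store: VectorStore) -> str:
    # Retrieval step: fetch the most relevant chunk(s) as context.
    context = "\n".join(store.query(question))
    # Generation step: in the real app this prompt would be sent to the
    # chosen inference backend (TGI, API Catalog, or NIM). Stubbed here.
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"


store = VectorStore()
store.add("AI Workbench manages reproducible project environments.")
store.add("Gradio provides the chat interface for the RAG app.")
print(build_prompt("What provides the chat interface?", store))
```

Swapping the stubbed generation step for an HTTP call to the selected backend is what lets the same retrieval code serve local-GPU, cloud, and microservice deployments.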
Quick Start & Requirements
Highlighted Details
Maintenance & Community
This is an official NVIDIA example project. Feedback and issues can be submitted via the NVIDIA Developer Forums.
Licensing & Compatibility
Limitations & Caveats
Running gated models locally requires obtaining access from Hugging Face and configuring a Hugging Face API token. The local microservice tutorial additionally requires Docker and participation in the NeMo Inference Microservice (NIM) General Availability Program.
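A sketch of that local setup might look like the following. `HF_TOKEN` is Hugging Face's standard environment variable for access tokens; the token value shown is a placeholder, and the Docker check is only illustrative.

```shell
# Export a Hugging Face access token so gated model weights can be pulled.
# The value below is a placeholder, not a real token.
export HF_TOKEN="hf_your_token_here"

# Confirm Docker is available before attempting the NIM tutorial.
command -v docker >/dev/null \
  && echo "Docker available" \
  || echo "Install Docker before running the local microservice tutorial"
```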