workbench-example-hybrid-rag by NVIDIA

AI Workbench example RAG project with Gradio chat app

Created 1 year ago
340 stars

Top 81.1% on SourcePulse

Project Summary

This project provides an example of a Retrieval Augmented Generation (RAG) application, targeting developers and researchers who want to build and deploy conversational AI systems with custom document knowledge. It offers a flexible Gradio-based chat interface and supports multiple inference backends, including local GPUs, cloud endpoints, and NVIDIA Inference Microservices (NIMs).

How It Works

The project leverages NVIDIA AI Workbench for environment management and reproducibility. Users can embed their documents into a local vector database for retrieval. Inference can be handled by Hugging Face's Text Generation Inference (TGI) server for local GPU execution, NVIDIA API Catalog endpoints for cloud-based inference, or NIMs for microservice deployments. This hybrid approach allows users to choose the most suitable inference strategy based on their hardware and deployment needs.
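The retrieve-then-augment flow described above can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the project's code: the toy bag-of-words `embed()` replaces the real neural embedding model and local vector database, and the assembled prompt would be handed to whichever inference backend (TGI, a cloud endpoint, or a NIM) is configured.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in embedding: a bag-of-words count vector.
    The project itself uses a neural embedding model and a vector database."""
    return Counter(word.strip(".,?!") for word in text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Augment the user question with retrieved context before inference."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

corpus = [
    "AI Workbench manages reproducible project environments.",
    "TGI serves Hugging Face models on local GPUs.",
    "NIMs package models as deployable microservices.",
]
print(build_prompt("How are models served on local GPUs?", corpus))
```

Swapping the toy pieces for a real embedding model and vector store changes the quality of retrieval, but not this basic shape: embed, rank by similarity, and prepend the top hits to the prompt.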

Quick Start & Requirements

  • Install: NVIDIA AI Workbench is the recommended method for running this project.
  • Prerequisites: An NGC account and NVCF run key are required for cloud endpoints. A Hugging Face API token is needed for gated models when running locally. A GPU with at least 12GB VRAM is recommended for local inference. Docker is required for the local microservice tutorial.
  • Setup: Initial setup involves cloning the repository within AI Workbench and configuring secrets. Building the project environment can take several minutes.
  • Links: AI Workbench Docs, Explore Example Projects, Developer Forum

Highlighted Details

  • Supports a wide range of NVIDIA and Hugging Face models across different inference modes.
  • Offers quantization options (4-bit, 8-bit, none) for local TGI inference.
  • Allows customization of the Gradio chat application and backend logic.
  • Provides detailed tutorials for cloud, local GPU, remote microservice, and local microservice deployments.
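For the local GPU path, the quantization choice corresponds to TGI's `--quantize` launcher flag. The launch below is a hedged sketch only: the model ID, port, and volume path are placeholder assumptions, and in practice AI Workbench manages this container for you.

```shell
# Illustrative standalone TGI launch (placeholders: model ID, port, volume).
#   --quantize bitsandbytes-nf4  -> 4-bit
#   --quantize bitsandbytes      -> 8-bit
#   omit --quantize              -> no quantization
docker run --gpus all -p 8080:80 \
  -e HUGGING_FACE_HUB_TOKEN="$HF_TOKEN" \
  -v "$PWD/data:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mistral-7B-Instruct-v0.2 \
  --quantize bitsandbytes-nf4
```

The `HUGGING_FACE_HUB_TOKEN` environment variable is how TGI authenticates for gated model downloads, which is why the Hugging Face API token listed in the prerequisites matters for local inference.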

Maintenance & Community

This is an official NVIDIA example project. Feedback and issues can be submitted via the NVIDIA Developer Forums.

Licensing & Compatibility

  • License: Apache 2.0 License.
  • Compatibility: Users are responsible for complying with the licenses of third-party components, including models downloaded from Hugging Face.

Limitations & Caveats

Running gated models locally requires requesting access from Hugging Face and configuring a Hugging Face API token. The local microservice tutorial additionally requires Docker and enrollment in the NeMo Inference Microservice (NIM) General Availability Program.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days

Starred by Samuel Colvin (Founder and Author of Pydantic) and Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML).