workbench-example-hybrid-rag by NVIDIA

AI Workbench example RAG project with Gradio chat app

created 1 year ago · 333 stars · Top 83.6% on sourcepulse

Project Summary

This project provides an example of a Retrieval Augmented Generation (RAG) application, targeting developers and researchers who want to build and deploy conversational AI systems with custom document knowledge. It offers a flexible Gradio-based chat interface and supports multiple inference backends, including local GPUs, cloud endpoints, and NVIDIA Inference Microservices (NIMs).

How It Works

The project leverages NVIDIA AI Workbench for environment management and reproducibility. Users can embed their documents into a local vector database for retrieval. Inference can be handled by Hugging Face's Text Generation Inference (TGI) server for local GPU execution, NVIDIA API Catalog endpoints for cloud-based inference, or NIMs for microservice deployments. This hybrid approach allows users to choose the most suitable inference strategy based on their hardware and deployment needs.
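The retrieve-then-generate flow described above can be sketched in Python. This is a minimal illustration, not the project's code: the toy `embed` function and in-memory document list stand in for the project's real embedding model and local vector database, and the prompt returned by `build_prompt` is what would be sent to the chosen backend (TGI, an API Catalog endpoint, or a NIM).

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in embedding: a character-frequency vector over a-z.
    # A real deployment would call an embedding model instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank stored documents by similarity to the query embedding;
    # a vector database performs this step at scale.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Augment the user question with retrieved context before
    # handing it to whichever inference backend is configured.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

The key design point the project builds on is that only `build_prompt`'s final call site changes between backends; retrieval and prompt assembly stay the same whether inference runs on a local GPU, a cloud endpoint, or a microservice.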

Quick Start & Requirements

  • Install: NVIDIA AI Workbench is the recommended method for running this project.
  • Prerequisites: An NGC account and NVCF run key are required for cloud endpoints. A Hugging Face API token is needed for gated models when running locally. A GPU with at least 12GB VRAM is recommended for local inference. Docker is required for the local microservice tutorial.
  • Setup: Initial setup involves cloning the repository within AI Workbench and configuring secrets. Building the project environment can take several minutes.
  • Links: AI Workbench Docs, Explore Example Projects, Developer Forum

Highlighted Details

  • Supports a wide range of NVIDIA and Hugging Face models across different inference modes.
  • Offers quantization options (4-bit, 8-bit, none) for local TGI inference.
  • Allows customization of the Gradio chat application and backend logic.
  • Provides detailed tutorials for cloud, local GPU, remote microservice, and local microservice deployments.
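The quantization options above correspond to Text Generation Inference's `--quantize` launch flag. A hedged sketch of a local TGI launch follows, assuming the standard TGI container image; the model ID, port, and volume path are placeholders, and the project's own launcher invocation may differ:

```shell
# Illustrative TGI launch with 8-bit bitsandbytes quantization.
# Swap --quantize bitsandbytes for bitsandbytes-nf4 to run 4-bit,
# or drop the flag entirely for unquantized inference.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v "$PWD/models:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mistral-7B-Instruct-v0.2 \
  --quantize bitsandbytes
```

Quantization trades some output quality for a smaller memory footprint, which is what makes the 12GB VRAM recommendation workable for 7B-class models.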

Maintenance & Community

This is an official NVIDIA example project. Feedback and issues can be submitted via the NVIDIA Developer Forums.

Licensing & Compatibility

  • License: Apache 2.0 License.
  • Compatibility: Users are responsible for complying with the licenses of third-party components, including models downloaded from Hugging Face.

Limitations & Caveats

Running gated models locally requires requesting access on Hugging Face and configuring an API token. The local microservice tutorial additionally requires Docker and participation in the NeMo Inference Microservice (NIM) General Availability Program.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 23 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 2 more.

gpustack by gpustack

GPU cluster manager for AI model deployment
created 1 year ago · updated 2 days ago · 3k stars · Top 1.6% on sourcepulse
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Didier Lopes (Founder of OpenBB), and 10 more.

JARVIS by microsoft

System for LLM-orchestrated AI task automation
created 2 years ago · updated 4 days ago · 24k stars · Top 0.1% on sourcepulse