workbench-example-hybrid-rag by NVIDIA

AI Workbench example RAG project with Gradio chat app

created 1 year ago · 333 stars · Top 83.6% on sourcepulse

Project Summary

This project provides an example of a Retrieval Augmented Generation (RAG) application, targeting developers and researchers who want to build and deploy conversational AI systems with custom document knowledge. It offers a flexible Gradio-based chat interface and supports multiple inference backends, including local GPUs, cloud endpoints, and NVIDIA Inference Microservices (NIMs).

How It Works

The project leverages NVIDIA AI Workbench for environment management and reproducibility. Users can embed their documents into a local vector database for retrieval. Inference can be handled by Hugging Face's Text Generation Inference (TGI) server for local GPU execution, NVIDIA API Catalog endpoints for cloud-based inference, or NIMs for microservice deployments. This hybrid approach allows users to choose the most suitable inference strategy based on their hardware and deployment needs.
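The retrieve-then-generate flow described above can be sketched in Python. This is a minimal illustration, not the project's code: the toy `embed` function and in-memory document list stand in for the project's real embedding model and local vector database, and the prompt returned by `build_prompt` is what would be sent to the chosen backend (TGI, an API Catalog endpoint, or a NIM).

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in embedding: a character-frequency vector over a-z.
    # A real deployment would call an embedding model instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank stored documents by similarity to the query embedding;
    # a vector database performs this step at scale.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Augment the user question with retrieved context before
    # handing it to whichever inference backend is configured.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

The key design point the project builds on is that only `build_prompt`'s final call site changes between backends; retrieval and prompt assembly stay the same whether inference runs on a local GPU, a cloud endpoint, or a microservice.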

Quick Start & Requirements

  • Install: NVIDIA AI Workbench is the recommended method for running this project.
  • Prerequisites: An NGC account and NVCF run key are required for cloud endpoints. A Hugging Face API token is needed for gated models when running locally. A GPU with at least 12GB VRAM is recommended for local inference. Docker is required for the local microservice tutorial.
  • Setup: Initial setup involves cloning the repository within AI Workbench and configuring secrets. Building the project environment can take several minutes.
  • Links: AI Workbench Docs, Explore Example Projects, Developer Forum

Highlighted Details

  • Supports a wide range of NVIDIA and Hugging Face models across different inference modes.
  • Offers quantization options (4-bit, 8-bit, none) for local TGI inference.
  • Allows customization of the Gradio chat application and backend logic.
  • Provides detailed tutorials for cloud, local GPU, remote microservice, and local microservice deployments.
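The quantization options above correspond to Text Generation Inference's `--quantize` launch flag. A hedged sketch of a local TGI launch follows, assuming the standard TGI container image; the model ID, port, and volume path are placeholders, and the project's own launcher invocation may differ:

```shell
# Illustrative TGI launch with 8-bit bitsandbytes quantization.
# Swap --quantize bitsandbytes for bitsandbytes-nf4 to run 4-bit,
# or drop the flag entirely for unquantized inference.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v "$PWD/models:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mistral-7B-Instruct-v0.2 \
  --quantize bitsandbytes
```

Quantization trades some output quality for a smaller memory footprint, which is what makes the 12GB VRAM recommendation workable for 7B-class models.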

Maintenance & Community

This is an official NVIDIA example project. Feedback and issues can be submitted via the NVIDIA Developer Forums.

Licensing & Compatibility

  • License: Apache 2.0 License.
  • Compatibility: Users are responsible for complying with the licenses of third-party components, including models downloaded from Hugging Face.

Limitations & Caveats

Running gated models locally requires requesting access on Hugging Face and configuring an API token. The local microservice tutorial additionally requires Docker and participation in the NeMo Inference Microservice (NIM) General Availability Program.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 23 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 2 more.

gpustack by gpustack

GPU cluster manager for AI model deployment
created 1 year ago · updated 2 days ago · 3k stars · Top 1.6% on sourcepulse
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Didier Lopes (Founder of OpenBB), and 10 more.

JARVIS by microsoft

System for LLM-orchestrated AI task automation
created 2 years ago · updated 4 days ago · 24k stars · Top 0.1% on sourcepulse