rag by NVIDIA-AI-Blueprints

GPU-accelerated RAG pipeline for enterprise data

Created 9 months ago
259 stars

Top 97.9% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

NVIDIA-AI-Blueprints/rag provides a reference solution for building foundational Retrieval Augmented Generation (RAG) pipelines. It targets developers seeking a quick, production-ready RAG setup leveraging NVIDIA NIM microservices and GPU acceleration. The blueprint enables querying enterprise data, offering benefits like enhanced data governance, reduced latency, and multimodal data processing.

How It Works

This blueprint implements a modular RAG architecture orchestrated by a LangChain-based server. It uses NVIDIA NIM microservices for the core functions: response generation (LLM inference), embedding generation, and document parsing/extraction. Data is stored in a Milvus vector database accelerated with NVIDIA cuVS. The workflow processes a query, retrieves relevant document chunks, optionally reranks them for precision, and generates a response grounded in the retrieved context. Key advantages include GPU-accelerated indexing and search, multimodal data ingestion, and optional integration of vision language models (VLMs) and guardrails.
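As an illustrative sketch only (not the blueprint's actual code), the retrieve-then-generate loop can be modeled in plain Python; the `embed`, `retrieve`, and `generate` functions below are hypothetical stand-ins for calls to the embedding NIM, the Milvus/cuVS search, and the LLM NIM:

```python
import math

def embed(text: str) -> list[float]:
    # Toy character-frequency embedding; a real pipeline calls the
    # embedding NIM instead.
    vec = [0.0] * 16
    for ch in text.lower():
        vec[ord(ch) % 16] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # In the blueprint, Milvus with cuVS performs this similarity search
    # on the GPU; here it is a brute-force scan for clarity.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def generate(query: str, context: list[str]) -> str:
    # Stand-in for the LLM NIM: show the grounded prompt it would receive.
    return f"Answer to {query!r} using context: {context}"

chunks = ["GPUs accelerate vector search.",
          "Milvus stores embeddings.",
          "The sky is blue."]
print(generate("vector search", retrieve("vector search", chunks)))
```

The optional reranking step described above would sit between `retrieve` and `generate`, re-scoring the top-k chunks with a stronger model before they reach the prompt.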

Quick Start & Requirements

Deployment options include Docker Compose for single-node setups, integration with NVIDIA AI Workbench, or Helm charts for scalable deployments. Sample Jupyter notebooks are provided for interaction. Prerequisites include NVIDIA NIM microservices and compatible NVIDIA GPUs. The default pipeline uses models such as nvidia/llama-3.3-nemotron-super-49b-v1 for generation and nvidia/llama-3_2-nv-embedqa-1b-v2 for embeddings. Official quick-start guides, API specifications, and usage notebooks are linked from the documentation.
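Because the server exposes OpenAI-compatible APIs, a deployed pipeline can be queried with any HTTP client using the standard chat-completions request shape. The sketch below builds such a request with only the standard library; the base URL and port are assumptions for illustration, so consult the blueprint's API spec for the real endpoint:

```python
import json

def build_chat_request(question: str,
                       base_url: str = "http://localhost:8081/v1",
                       model: str = "nvidia/llama-3.3-nemotron-super-49b-v1"):
    # Standard OpenAI-style chat-completions payload; the host/port here
    # are placeholders, not the blueprint's documented defaults.
    url = f"{base_url}/chat/completions"
    headers = {"Content-Type": "application/json"}
    body = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.2,
    }
    return url, headers, json.dumps(body)

url, headers, payload = build_chat_request("What does our Q3 report say?")
# Send with any HTTP client, e.g. urllib.request or requests.
```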

Highlighted Details

  • Supports multimodal PDF data extraction, including text, tables, charts, and infographics.
  • Features hybrid search combining dense and sparse retrieval methods.
  • Offers optional VLM integration for answer generation and image captioning.
  • Provides GPU-accelerated index creation and search capabilities.
  • Includes optional NeMo Guardrails for input/output content safety and topic control.
  • Exposes OpenAI-compatible APIs and a sample UI (rag-playground).
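To make the hybrid-search bullet concrete, the toy sketch below fuses a dense (embedding-similarity) score with a sparse (term-overlap) score via a weighted sum. This is one common fusion scheme; the blueprint's actual scoring functions and fusion method may differ:

```python
import math
from collections import Counter

docs = ["gpu accelerated vector search",
        "sparse keyword retrieval with bm25",
        "dense embeddings capture semantics"]

def sparse_score(query: str, doc: str) -> float:
    # Simple term-overlap stand-in for BM25-style sparse scoring.
    q, d = Counter(query.split()), Counter(doc.split())
    return float(sum(min(q[t], d[t]) for t in q))

def dense_score(query: str, doc: str) -> float:
    # Character-frequency cosine as a stand-in for an embedding model.
    def vec(text: str) -> list[float]:
        v = [0.0] * 26
        for ch in text:
            if ch.isalpha():
                v[(ord(ch.lower()) - ord("a")) % 26] += 1.0
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]
    return sum(a * b for a, b in zip(vec(query), vec(doc)))

def hybrid_rank(query: str, alpha: float = 0.5) -> list[str]:
    # Weighted-sum fusion: alpha blends dense vs. sparse contributions.
    scored = [(alpha * dense_score(query, d) +
               (1 - alpha) * sparse_score(query, d), d) for d in docs]
    return [d for _, d in sorted(scored, reverse=True)]

print(hybrid_rank("sparse retrieval"))
```

Tuning `alpha` trades recall of exact keyword matches (sparse) against semantic matches (dense), which is why hybrid search tends to outperform either method alone.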

Maintenance & Community

The project is hosted on GitHub, encouraging community contributions through issues and pull requests to support the NVIDIA LLM ecosystem and gather feedback.

Licensing & Compatibility

The blueprint itself is licensed under the Apache License, Version 2.0. Use of the integrated models is governed by specific NVIDIA licenses, including the NVIDIA AI Foundation Models Community License and the Llama 3.2 Community License Agreement for certain models. Compatibility requires NVIDIA hardware and software stack.

Limitations & Caveats

Advanced features such as self-reflection, query rewriting, image captioning, NeMo Guardrails, VLM inferencing, and PDF extraction with Nemoretriever Parse are not supported on B200 GPUs; H100 or A100 GPUs are recommended for these functionalities. Image captioning is disabled by default to optimize latency, potentially affecting accuracy for image-related queries.

Health Check
Last Commit

1 day ago

Responsiveness

Inactive

Pull Requests (30d)
1
Issues (30d)
4
Star History
39 stars in the last 30 days

Explore Similar Projects

Starred by Tri Dao (Chief Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 1 more.

oslo by tunib-ai

0%
309
Framework for large-scale transformer optimization
Created 3 years ago
Updated 3 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jeff Hammerbacher (Cofounder of Cloudera), and 2 more.

towhee by towhee-io

0.0%
3k
Framework for neural data processing pipelines
Created 4 years ago
Updated 11 months ago