GPU-accelerated RAG pipeline for enterprise data
Top 97.9% on SourcePulse
NVIDIA-AI-Blueprints/rag provides a reference solution for building foundational Retrieval Augmented Generation (RAG) pipelines. It targets developers seeking a quick, production-ready RAG setup leveraging NVIDIA NIM microservices and GPU acceleration. The blueprint enables querying enterprise data, offering benefits like enhanced data governance, reduced latency, and multimodal data processing.
How It Works
This blueprint implements a modular RAG architecture orchestrated by a LangChain-based server. It utilizes NVIDIA NIM microservices for core functions: response generation (LLM inference), embedding retrieval, and document parsing/extraction. Data is stored in a Milvus Vector Database, accelerated with NVIDIA cuVS. The workflow involves query processing, retrieval of relevant document chunks, optional reranking for precision, and response generation using the retrieved context. Key advantages include GPU-accelerated indexing and search, multimodal data ingestion, and optional integration of vision language models (VLMs) and guardrails.
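The query flow described above (embed, retrieve, optionally rerank, generate) can be sketched with toy stand-ins. Everything here is illustrative: the blueprint uses NIM embedding/LLM microservices and Milvus with cuVS, not these placeholder functions.

```python
# Minimal sketch of the retrieve -> generate RAG flow, with toy stand-ins
# for the NIM embedding model, the Milvus/cuVS vector store, and the LLM.
from math import sqrt

def embed(text):
    # Toy bag-of-letters embedding standing in for the NIM embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, chunks, top_k=2):
    # Brute-force cosine search standing in for Milvus + cuVS indexing.
    q = embed(query)
    scored = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return scored[:top_k]

def generate(query, context):
    # Placeholder for LLM inference via a NIM microservice.
    return f"Answer to {query!r} grounded in: {' | '.join(context)}"

chunks = ["GPUs accelerate vector search", "Milvus stores embeddings",
          "Guardrails filter unsafe output"]
print(generate("vector search", retrieve("vector search", chunks)))
```

An optional reranking stage would sit between `retrieve` and `generate`, rescoring the top-k chunks with a stronger model before they are passed as context.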
Quick Start & Requirements
Deployment options include Docker Compose for single-node setups, direct integration with NVIDIA AI Workbench, or Helm charts for scalable deployments. Sample Jupyter notebooks are provided for interaction. Prerequisites include NVIDIA NIM microservices and compatible NVIDIA GPUs. Specific models such as nvidia/llama-3.3-nemotron-super-49b-v1 (generation) and nvidia/llama-3_2-nv-embedqa-1b-v2 (embedding) are used. Official quick-start guides, API specifications, and usage notebooks are linked from the documentation.
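NIM LLM microservices expose an OpenAI-compatible API, so a generation request can be assembled as below. The endpoint path and the way retrieved context is injected into the system prompt are illustrative assumptions, not the blueprint's exact API.

```python
# Sketch of an OpenAI-compatible chat-completions request body for the LLM
# NIM, with retrieved chunks injected as grounding context (illustrative).
import json

def build_rag_request(query, retrieved_chunks,
                      model="nvidia/llama-3.3-nemotron-super-49b-v1"):
    # Place the retrieved chunks in the system prompt as grounding context.
    context = "\n".join(retrieved_chunks)
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": query},
        ],
        "max_tokens": 512,
        "temperature": 0.2,
    }

payload = build_rag_request("What accelerates indexing?",
                            ["Milvus is accelerated with NVIDIA cuVS."])
# POST this as JSON to the microservice's /v1/chat/completions endpoint.
print(json.dumps(payload, indent=2))
```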
Highlighted Details
Includes a sample web chat frontend (rag-playground).

Maintenance & Community
The project is hosted on GitHub, encouraging community contributions through issues and pull requests to support the NVIDIA LLM ecosystem and gather feedback.
Licensing & Compatibility
The blueprint itself is licensed under the Apache License, Version 2.0. Use of the integrated models is governed by specific NVIDIA licenses, including the NVIDIA AI Foundation Models Community License and the Llama 3.2 Community License Agreement for certain models. Compatibility requires NVIDIA hardware and software stack.
Limitations & Caveats
Advanced features such as self-reflection, query rewriting, image captioning, NeMo Guardrails, VLM inferencing, and PDF extraction with Nemoretriever Parse are not supported on B200 GPUs; H100 or A100 GPUs are recommended for these capabilities. Image captioning is disabled by default to reduce latency, which may lower accuracy on image-related queries.