simple-local-rag by mrdbourke

Local RAG pipeline for querying PDFs using open-source tools

created 1 year ago
791 stars

Top 45.3% on sourcepulse

Project Summary

This repository provides a tutorial for building a Retrieval Augmented Generation (RAG) pipeline that runs entirely locally, targeting users who want to implement "chat with PDF" functionality using open-source tools. It demonstrates a practical application by creating "NutriChat," a system for querying a 1200-page nutrition textbook.
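Before a document can be queried, it has to be split into retrievable pieces. The sketch below is a hypothetical illustration of that chunking step: sentences are grouped into fixed-size chunks ready for embedding. Real PDF text extraction (via a PDF-parsing library) is assumed to have happened already; the sample text and chunk size are illustrative, not taken from the repository.

```python
import re

def chunk_sentences(text: str, sentences_per_chunk: int = 2) -> list[str]:
    """Split text on sentence boundaries, then group sentences into chunks."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i : i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]

text = "Fats store energy. Proteins build tissue. Carbs fuel cells. Water hydrates."
print(chunk_sentences(text))
# → ['Fats store energy. Proteins build tissue.', 'Carbs fuel cells. Water hydrates.']
```

Chunk size is a tuning knob: smaller chunks retrieve more precisely, larger chunks carry more context into the prompt.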

How It Works

The pipeline follows the RAG paradigm: retrieve relevant information from a data source, augment the LLM's prompt with this information, and then generate a response. This approach aims to reduce LLM hallucinations and enable interaction with custom, domain-specific data without the need for costly fine-tuning. The project emphasizes local execution for privacy, speed, and cost benefits.
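The retrieve–augment–generate loop can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: a toy word-overlap retriever stands in for real embedding similarity, and the corpus, query, and prompt template are all invented for the example.

```python
def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Score each chunk by shared words with the query; return the top-k."""
    q_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def augment(query: str, context: list[str]) -> str:
    """Build the prompt the LLM will see, grounded in retrieved context."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

chunks = [
    "Protein is made of amino acids and supports muscle repair.",
    "Vitamin C is found in citrus fruits.",
    "Fiber aids digestion and is abundant in whole grains.",
]
query = "What is protein made of?"
prompt = augment(query, retrieve(query, chunks))
print(prompt)  # the LLM then generates a response from this augmented prompt
```

In the real pipeline the word-overlap scorer is replaced by cosine similarity over embedding vectors, and the generation step is handled by a local LLM such as Gemma.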

Quick Start & Requirements

  • Installation: Clone the repository, create and activate a Python virtual environment, and install requirements via pip install -r requirements.txt.
  • Prerequisites: Python 3.11+, NVIDIA GPU with 5GB+ VRAM (or Google Colab), familiarity with Python and PyTorch. Manual installation of PyTorch with CUDA support (e.g., pip3 install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121) is recommended.
  • Setup: Requires agreeing to Hugging Face terms for models like Gemma and potentially authorizing your machine via Hugging Face CLI. Flash Attention 2 can be optionally compiled for speedups.
  • Resources: Links to a YouTube walkthrough video and Hugging Face for model access are provided.
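The installation steps above can be sketched as a shell session (Linux/macOS shown; the cu121 index URL is the CUDA 12.1 wheel the README recommends):

```shell
# Clone the repo and set up an isolated environment
git clone https://github.com/mrdbourke/simple-local-rag.git
cd simple-local-rag
python -m venv venv
source venv/bin/activate

# Install project requirements
pip install -r requirements.txt

# Manual PyTorch install with CUDA support, as recommended
pip3 install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```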

Highlighted Details

  • Demonstrates a full RAG pipeline from PDF ingestion to chat interface.
  • Focuses on local execution for privacy, speed, and cost efficiency.
  • Utilizes open-source tools and models, including Gemma.
  • Explains core RAG concepts, benefits, and use cases.

Maintenance & Community

The repository is maintained by mrdbourke. Further community engagement details (Discord/Slack, roadmap) are not explicitly mentioned in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README text. Compatibility for commercial use or closed-source linking would depend on the specific licenses of the underlying libraries and models used.

Limitations & Caveats

The README indicates that setup instructions are not fully complete. Compiling Flash Attention 2 can be time-consuming, especially on Windows. Accessing certain LLM models (like Gemma) requires agreeing to Hugging Face terms and potentially authorizing local access.
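For the Hugging Face step, the usual flow (a sketch, assuming you have an account and have accepted the model's terms on huggingface.co) looks like:

```shell
# Install the Hub client, then authorize this machine with a read-access token
pip install -U huggingface_hub
huggingface-cli login   # paste your token when prompted
```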

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 96 stars in the last 90 days
