simple-local-rag by mrdbourke

Local RAG pipeline for querying PDFs using open-source tools

created 1 year ago
791 stars

Top 45.3% on sourcepulse

Project Summary

This repository provides a tutorial for building a Retrieval Augmented Generation (RAG) pipeline that runs entirely locally, targeting users who want to implement "chat with PDF" functionality using open-source tools. It demonstrates a practical application by creating "NutriChat," a system for querying a 1200-page nutrition textbook.
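Before a document can be queried, it has to be split into retrievable pieces. The sketch below is a hypothetical illustration of that chunking step: sentences are grouped into fixed-size chunks ready for embedding. Real PDF text extraction (via a PDF-parsing library) is assumed to have happened already; the sample text and chunk size are illustrative, not taken from the repository.

```python
import re

def chunk_sentences(text: str, sentences_per_chunk: int = 2) -> list[str]:
    """Split text on sentence boundaries, then group sentences into chunks."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i : i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]

text = "Fats store energy. Proteins build tissue. Carbs fuel cells. Water hydrates."
print(chunk_sentences(text))
# → ['Fats store energy. Proteins build tissue.', 'Carbs fuel cells. Water hydrates.']
```

Chunk size is a tuning knob: smaller chunks retrieve more precisely, larger chunks carry more context into the prompt.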

How It Works

The pipeline follows the RAG paradigm: retrieve relevant information from a data source, augment the LLM's prompt with this information, and then generate a response. This approach aims to reduce LLM hallucinations and enable interaction with custom, domain-specific data without the need for costly fine-tuning. The project emphasizes local execution for privacy, speed, and cost benefits.
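The retrieve–augment–generate loop can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: a toy word-overlap retriever stands in for real embedding similarity, and the corpus, query, and prompt template are all invented for the example.

```python
def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Score each chunk by shared words with the query; return the top-k."""
    q_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def augment(query: str, context: list[str]) -> str:
    """Build the prompt the LLM will see, grounded in retrieved context."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

chunks = [
    "Protein is made of amino acids and supports muscle repair.",
    "Vitamin C is found in citrus fruits.",
    "Fiber aids digestion and is abundant in whole grains.",
]
query = "What is protein made of?"
prompt = augment(query, retrieve(query, chunks))
print(prompt)  # the LLM then generates a response from this augmented prompt
```

In the real pipeline the word-overlap scorer is replaced by cosine similarity over embedding vectors, and the generation step is handled by a local LLM such as Gemma.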

Quick Start & Requirements

  • Installation: Clone the repository, create and activate a Python virtual environment, and install requirements via pip install -r requirements.txt.
  • Prerequisites: Python 3.11+, NVIDIA GPU with 5GB+ VRAM (or Google Colab), familiarity with Python and PyTorch. Manual installation of PyTorch with CUDA support (e.g., pip3 install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121) is recommended.
  • Setup: Requires agreeing to Hugging Face terms for models like Gemma and potentially authorizing your machine via Hugging Face CLI. Flash Attention 2 can be optionally compiled for speedups.
  • Resources: Links to a YouTube walkthrough video and Hugging Face for model access are provided.
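The installation steps above can be sketched as a shell session (Linux/macOS shown; the cu121 index URL is the CUDA 12.1 wheel the README recommends):

```shell
# Clone the repo and set up an isolated environment
git clone https://github.com/mrdbourke/simple-local-rag.git
cd simple-local-rag
python -m venv venv
source venv/bin/activate

# Install project requirements
pip install -r requirements.txt

# Manual PyTorch install with CUDA support, as recommended
pip3 install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```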

Highlighted Details

  • Demonstrates a full RAG pipeline from PDF ingestion to chat interface.
  • Focuses on local execution for privacy, speed, and cost efficiency.
  • Utilizes open-source tools and models, including Gemma.
  • Explains core RAG concepts, benefits, and use cases.

Maintenance & Community

The repository is maintained by mrdbourke. Further community engagement details (Discord/Slack, roadmap) are not explicitly mentioned in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README text. Compatibility for commercial use or closed-source linking would depend on the specific licenses of the underlying libraries and models used.

Limitations & Caveats

The README indicates that setup instructions are not fully complete. Compiling Flash Attention 2 can be time-consuming, especially on Windows. Accessing certain LLM models (like Gemma) requires agreeing to Hugging Face terms and potentially authorizing local access.
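For the Hugging Face step, the usual flow (a sketch, assuming you have an account and have accepted the model's terms on huggingface.co) looks like:

```shell
# Install the Hub client, then authorize this machine with a read-access token
pip install -U huggingface_hub
huggingface-cli login   # paste your token when prompted
```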

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 96 stars in the last 90 days
