local-rag-system by jamwithai

Local RAG system for private document querying

Created 1 year ago
261 stars

Top 97.5% on SourcePulse

Project Summary

Summary

This repository provides a complete solution for building a private, offline Retrieval-Augmented Generation (RAG) system. Users can manage and query personal documents locally, which makes it a privacy-friendly alternative to cloud-based services. It is aimed at individuals and privacy-conscious users who want to use LLMs for document analysis without exposing their data to third parties. Its core benefit is that documents, embeddings, and model inference all stay on the user's own machine.

How It Works

The system uses a hybrid retrieval approach that combines traditional keyword (text) matching with semantic search, both served by OpenSearch. Document embeddings are generated with Sentence Transformers to support semantic retrieval. The retrieved passages are then passed as context to a local Large Language Model (LLM), which generates a personalized, context-aware response. Because all documents and processing stay on the user's machine, no data leaves it.
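
In the repository, the keyword and semantic scores are combined by OpenSearch itself. To show the idea behind hybrid ranking, here is a minimal pure-Python sketch of one common fusion scheme: min-max normalize each signal, then take a weighted sum. It is not the project's code. The toy keyword score stands in for OpenSearch's BM25, the 0.3/0.7 weighting is an arbitrary assumption, and the two-dimensional vectors are made-up stand-ins for Sentence Transformers embeddings.

```python
from math import sqrt


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, doc: str) -> float:
    """Toy lexical score (stand-in for BM25): fraction of query terms found in the doc."""
    terms = query.lower().split()
    words = set(doc.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def min_max(scores: list[float]) -> list[float]:
    """Rescale scores to [0, 1] so keyword and vector scores are comparable."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]  # every document ties on this signal
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(query: str, query_vec: list[float], docs: list[str],
                doc_vecs: list[list[float]], keyword_weight: float = 0.3) -> list[int]:
    """Return document indices ordered best-first by the weighted, normalized scores."""
    kw = min_max([keyword_score(query, d) for d in docs])
    sem = min_max([cosine(query_vec, v) for v in doc_vecs])
    combined = [keyword_weight * k + (1 - keyword_weight) * s for k, s in zip(kw, sem)]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)


# Hypothetical toy data: real embeddings would come from a Sentence Transformers model.
docs = ["invoice from march", "holiday photos", "march budget invoice"]
doc_vecs = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]]
print(hybrid_rank("march invoice", [1.0, 0.0], docs, doc_vecs))  # -> [0, 2, 1]
```

Normalizing before combining matters because raw BM25 scores are unbounded while cosine similarity lies in [-1, 1]. Without that step, whichever signal has the larger numeric range would dominate the ranking.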

Quick Start & Requirements

  • Installation: clone the repository, install dependencies with pip install -r requirements.txt, set the embedding model and OpenSearch connection settings in constants.py, then launch the Streamlit application with streamlit run welcome.py.
  • Prerequisites: a Python environment, a running OpenSearch instance, Sentence Transformers models, and a local LLM. The README does not specify hardware requirements (e.g., GPU, CUDA) or supported Python versions.
  • Links to a two-part blog guide are provided for a detailed walkthrough: Part 1 and Part 2.
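
The OpenSearch settings configured in constants.py ultimately feed requests like the ones below. This is a minimal sketch, not taken from the repository, of the three request bodies a hybrid setup typically needs: a k-NN-enabled index mapping, a search pipeline that normalizes and combines scores, and a hybrid query. The field names text and embedding, the 384-dimension size (which matches a model such as all-MiniLM-L6-v2), and the weights are all assumptions.

```python
# Assumed embedding size; it must match the Sentence Transformers model in use.
EMBEDDING_DIM = 384


def index_body(dim: int = EMBEDDING_DIM) -> dict:
    """Index mapping with a full-text field and a k-NN vector field (field names hypothetical)."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {"type": "knn_vector", "dimension": dim},
            }
        },
    }


def hybrid_pipeline_body(keyword_weight: float = 0.3) -> dict:
    """Search pipeline that min-max normalizes each sub-query's scores, then takes a weighted mean."""
    return {
        "description": "Normalize and combine keyword and vector scores",
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "min_max"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        # Weights follow the order of the sub-queries in the hybrid query.
                        "parameters": {"weights": [keyword_weight, 1 - keyword_weight]},
                    },
                }
            }
        ],
    }


def hybrid_query_body(query_text: str, query_vector: list[float], k: int = 5) -> dict:
    """Hybrid query: a keyword match plus a k-NN search over the embedding field."""
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {"text": {"query": query_text}}},
                    {"knn": {"embedding": {"vector": query_vector, "k": k}}},
                ]
            }
        },
    }
```

These bodies would be sent to PUT /<index-name>, PUT /_search/pipeline/<pipeline-name>, and GET /<index-name>/_search?search_pipeline=<pipeline-name>, respectively. The hybrid query and normalization processor come from OpenSearch's neural-search plugin, available in OpenSearch 2.10 and later.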

Highlighted Details

  • Enables a fully private, offline RAG system for personal documents.
  • Features hybrid search capabilities leveraging OpenSearch for both keyword and semantic matching.
  • Designed for easy integration with local LLMs for customized responses.
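
How retrieved passages reach a local LLM depends on the model runner. The common pattern is to pack the best-ranked chunks into a grounded prompt. Here is a minimal sketch of that step, assuming chunks arrive sorted best-first. The build_prompt name, the instruction wording, and the character budget are hypothetical. The budget is only a rough proxy for the model's token limit.

```python
def build_prompt(question: str, chunks: list[str], max_context_chars: int = 2000) -> str:
    """Pack the highest-ranked chunks that fit the budget into a grounded prompt."""
    selected: list[str] = []
    used = 0
    for chunk in chunks:  # assumed sorted best-first by the retriever
        if used + len(chunk) > max_context_chars:
            break  # stop at the first chunk that would overflow the budget
        selected.append(chunk)
        used += len(chunk)
    # Number each passage so the model (and the user) can refer back to sources.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(selected))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt("When was the invoice paid?", ["Invoice 42 was paid in March."])
print(prompt)
```

Counting characters keeps the sketch dependency-free. A real setup would count tokens with the local model's own tokenizer, so that the context plus the generated answer fits in its context window.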

Maintenance & Community

  • No details on contributors, sponsorships, or community channels (such as Discord or Slack) are provided.

Licensing & Compatibility

  • The license type is not specified.

Limitations & Caveats

  • The README does not detail specific hardware requirements, making performance estimation difficult.
  • The setup relies on manual configuration of constants and external services like OpenSearch and LLMs, potentially requiring significant technical expertise.
  • The repository's GitHub description field is empty (shown as "None").

Health Check
Last Commit

6 months ago

Responsiveness

Inactive

Pull Requests (30d)
1
Issues (30d)
0
Star History
37 stars in the last 30 days
