tiny-rag  by wdndev

Tiny RAG system for retrieval-augmented LLM

created 1 year ago
267 stars

Top 96.7% on sourcepulse

Project Summary

This project provides a minimal Retrieval-Augmented Generation (RAG) system for researchers and developers who need a lightweight, modular framework. It simplifies integrating external knowledge into LLM prompts to improve response accuracy and relevance, and it supports several document types as well as multiple embedding and LLM backends.

How It Works

The system employs a multi-stage approach: document parsing and embedding, offline database construction, and online retrieval with re-ranking. It supports text and image embeddings using models like BGE and CLIP, respectively. For retrieval, it combines BM25 keyword search with vector similarity search using FAISS. A re-ranking model then refines the retrieved results before they are passed to the LLM, enhancing the quality of context provided to the language model.
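The retrieval flow described above can be sketched in plain Python. This is a minimal illustration, not the project's code: the term-count `embed` function stands in for a real embedding model such as BGE, the brute-force cosine loop stands in for a FAISS index, and `rerank_score` is a token-overlap placeholder for a learned re-ranking model. All function names here are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption made for this sketch)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with the BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def embed(text, vocab):
    """Toy stand-in for an embedding model: a term-count vector over vocab."""
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def rerank_score(query, doc):
    """Placeholder re-ranker: fraction of query tokens present in the doc."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q) if q else 0.0


def hybrid_retrieve(query, docs, k=2):
    """Merge keyword and vector candidates, then re-rank the union."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    q_vec = embed(query, vocab)
    vec_scores = [cosine(q_vec, embed(d, vocab)) for d in docs]
    kw_scores = bm25_scores(query, docs)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    candidates = list(dict.fromkeys(top_k(kw_scores, k) + top_k(vec_scores, k)))
    return sorted(candidates, key=lambda i: rerank_score(query, docs[i]), reverse=True)


docs = [
    "faiss builds vector indexes for similarity search",
    "bm25 ranks documents by keyword overlap",
    "cats sleep most of the day",
]
result = hybrid_retrieve("vector similarity search", docs, k=2)
```

The returned list holds document indices in re-ranked order. In a real pipeline the top entries would be concatenated into the context passed to the LLM.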

Quick Start & Requirements

  • Install/Run:
      python script/tiny_rag.py -t build -c config/qwen2_config.json -p data/raw_data/wikipedia-cn-20230720-filtered.json (builds the database)
      python script/tiny_rag.py -t search -c config/qwen2_config.json (searches it)
  • Prerequisites: Python, FAISS, BGE, CLIP, Qwen2 (or other supported LLMs). Specific models need to be downloaded separately as indicated in the README.
  • Setup: Download the required models first. Configuration is managed via JSON files such as config/qwen2_config.json.
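To make the two commands easier to read, here is a hypothetical sketch of how a script could parse the -t, -c, and -p flags they use. Only the short flags and the build/search values come from the commands above. The long option names, the help text, and the rule that -p is required for build are assumptions, and the project's actual script may differ.

```python
import argparse


def build_parser():
    """Parser mirroring the -t / -c / -p flags seen in the commands above."""
    parser = argparse.ArgumentParser(description="Hypothetical tiny-rag style CLI")
    # Long option names below are assumptions added for readability.
    parser.add_argument("-t", "--task", choices=["build", "search"], required=True,
                        help="build the database or search it")
    parser.add_argument("-c", "--config", required=True,
                        help="path to a JSON configuration file")
    parser.add_argument("-p", "--path", default=None,
                        help="path to the raw data (used when building)")
    return parser


def validate(args):
    """Assumed rule: building needs a data path, searching does not."""
    if args.task == "build" and args.path is None:
        raise ValueError("-p is required when -t build is used")
    return args


# Parse the same arguments as the search command shown above.
search_args = validate(build_parser().parse_args(
    ["-t", "search", "-c", "config/qwen2_config.json"]
))
```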

Highlighted Details

  • Supports multiple document parsing formats (txt, markdown, pdf, word, ppt, images).
  • Implements dual-path retrieval (BM25 and vector similarity) with a re-ranking stage.
  • Offers modularity for custom embedding and LLM integrations by inheriting base classes.
  • Includes support for both local LLM inference and API-based models.
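The base-class extension point mentioned above might look roughly like the following. The class and method names (`BaseEmbedding`, `get_embedding`, `HashEmbedding`) are hypothetical and chosen only for illustration, so check the repository for the real interfaces. The hashing embedder is a toy that lets the sketch run without downloading a model.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import List


class BaseEmbedding(ABC):
    """Hypothetical base class: subclasses turn text into a fixed-size vector."""

    def __init__(self, dim: int):
        self.dim = dim

    @abstractmethod
    def get_embedding(self, text: str) -> List[float]:
        """Return a vector of length self.dim for the given text."""


class HashEmbedding(BaseEmbedding):
    """Toy backend: hashes each token into one of self.dim buckets."""

    def get_embedding(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            digest = hashlib.md5(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % self.dim
            vec[bucket] += 1.0
        return vec


embedder = HashEmbedding(dim=8)
vector = embedder.get_embedding("hello world hello")
```

Code that depends only on the base class can swap one backend for another, whether a local model or an API-based one, without changes to the retrieval logic.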

Maintenance & Community

The project appears to be a personal or small-team effort with no explicit mention of maintainers, community channels (like Discord/Slack), or a public roadmap.

Licensing & Compatibility

The README does not explicitly state a license. The code structure and dependencies suggest it is intended for research and development purposes. Commercial use would require clarification on licensing.

Limitations & Caveats

The project uses smaller models for demonstration, and larger models are recommended for better performance. The README does not detail error handling, scalability considerations, or specific performance benchmarks. The lack of explicit community support or clear licensing might be a concern for production deployments.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 54 stars in the last 90 days
