Tiny RAG system for retrieval-augmented LLM
Top 96.7% on sourcepulse
This project provides a minimal Retrieval-Augmented Generation (RAG) system for researchers and developers who need a lightweight, modular framework. It simplifies integrating external knowledge into LLMs to improve response accuracy and relevance, and it supports multiple document types along with several embedding and LLM backends.
How It Works
The system employs a multi-stage approach: document parsing and embedding, offline database construction, and online retrieval with re-ranking. It supports text and image embeddings using models like BGE and CLIP, respectively. For retrieval, it combines BM25 keyword search with FAISS vector similarity search. A re-ranking model then refines the retrieved results before they are passed to the LLM, improving the quality of the provided context.
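The hybrid retrieval stage described above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: Okapi BM25 is computed by hand, toy vectors stand in for BGE embeddings and the FAISS index, and a linear blend of normalized scores stands in for the re-ranking model — all of these substitutions are assumptions.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 keyword scores for one query over tokenized documents."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    df = Counter()                      # document frequency per term
    for d in docs_tokens:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def cosine(u, v):
    """Cosine similarity; stands in for a FAISS inner-product lookup."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_retrieve(query_tokens, query_vec, docs_tokens, doc_vecs,
                    alpha=0.5, top_k=2):
    """Blend normalized BM25 and vector scores, return top_k doc indices.

    The blend plays the role of the re-ranking pass here; the real
    system uses a dedicated re-ranker model instead.
    """
    kw = bm25_scores(query_tokens, docs_tokens)
    mx = max(kw) or 1.0                 # normalize BM25 into [0, 1]
    kw = [s / mx for s in kw]
    vec = [cosine(query_vec, v) for v in doc_vecs]
    combined = [alpha * a + (1 - alpha) * c for a, c in zip(kw, vec)]
    order = sorted(range(len(combined)), key=lambda i: combined[i],
                   reverse=True)
    return order[:top_k]

# Toy corpus: document 2 matches both query terms and the query vector.
docs = [["cat", "sat", "mat"], ["dog", "barked"], ["cat", "chased", "dog"]]
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(hybrid_retrieve(["cat", "dog"], [1.0, 1.0], docs, vecs))  # → [2, 1]
```

With real backends, the same shape holds: BM25 and the vector index each propose candidates, and a cross-encoder re-ranker orders the merged pool before prompting the LLM.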
Quick Start & Requirements
Build the database:
python script/tiny_rag.py -t build -c config/qwen2_config.json -p data/raw_data/wikipedia-cn-20230720-filtered.json
Then query it:
python script/tiny_rag.py -t search -c config/qwen2_config.json
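Conceptually, the build command performs the offline stage: parse documents, embed each chunk, and persist the index. The sketch below is a toy stand-in, not the project's code — a hash-based embedder replaces BGE and a JSON file replaces the FAISS index, both of which are assumptions for illustration.

```python
import hashlib
import json
import math

def toy_embed(text, dim=8):
    """Deterministic stand-in for a real embedding model (e.g. BGE)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        vec[h % dim] += 1.0             # bucket tokens into dimensions
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]      # unit-normalize for cosine search

def build_db(docs, path):
    """Offline stage: embed each chunk and persist the index to disk."""
    records = [{"text": d, "vector": toy_embed(d)} for d in docs]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False)
    return records

build_db(["the cat sat on the mat", "dogs bark at night"], "toy_db.json")
```

The real build step additionally handles multiple document formats and writes a FAISS index plus BM25 statistics rather than raw JSON.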
Maintenance & Community
The project appears to be a personal or small-team effort with no explicit mention of maintainers, community channels (like Discord/Slack), or a public roadmap.
Licensing & Compatibility
The README does not explicitly state a license. The code structure and dependencies suggest it is intended for research and development purposes. Commercial use would require clarification on licensing.
Limitations & Caveats
The project uses smaller models for demonstration, and larger models are recommended for better performance. The README does not detail error handling, scalability considerations, or specific performance benchmarks. The lack of explicit community support or clear licensing might be a concern for production deployments.