FlashRAG by RUC-NLPIR

Python toolkit for efficient RAG research

Created 1 year ago
2,847 stars

Top 16.7% on SourcePulse

Project Summary

FlashRAG is a Python toolkit for efficient Retrieval-Augmented Generation (RAG) research. It lets users reproduce existing RAG methods and develop new ones within a single framework that ships with 36 pre-processed benchmark datasets and 17 state-of-the-art RAG algorithms. It is aimed at researchers and developers working on RAG.

How It Works

FlashRAG provides a modular architecture with components for retrievers, rerankers, generators, and compressors, allowing flexible pipeline assembly. It supports various retrieval methods (dense and sparse) using Faiss and Pyserini/bm25s, and integrates with LLM acceleration tools like vLLM and FastChat. The toolkit simplifies RAG workflow preparation through efficient preprocessing scripts and offers an easy-to-use UI for configuration and experimentation.
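The modular idea described above can be sketched in a few lines of plain Python. This is a minimal illustration only, not FlashRAG's actual API: every name here (SequentialPipeline, keyword_retriever, echo_generator, and the callable type aliases) is hypothetical, the retriever is a toy term-overlap ranker standing in for a real sparse or dense index, and the generator simply echoes its prompt instead of calling an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical component signatures for illustration; not FlashRAG's API.
Retriever = Callable[[str, int], List[str]]    # (query, top_k) -> documents
Reranker = Callable[[str, List[str]], List[str]]
Compressor = Callable[[str, List[str]], str]   # (query, documents) -> context
Generator = Callable[[str], str]               # prompt -> answer


def keyword_retriever(corpus: List[str]) -> Retriever:
    """Toy sparse retriever: rank documents by query-term overlap."""
    def retrieve(query: str, top_k: int) -> List[str]:
        terms = set(query.lower().split())
        scored = [(len(terms & set(doc.lower().split())), doc) for doc in corpus]
        # Stable sort: ties keep their corpus order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for score, doc in scored[:top_k] if score > 0]
    return retrieve


def echo_generator(prompt: str) -> str:
    """Stand-in for an LLM call: returns the prompt so the flow is visible."""
    return prompt


@dataclass
class SequentialPipeline:
    """Retrieve, optionally rerank and compress, then generate."""
    retriever: Retriever
    generator: Generator
    reranker: Optional[Reranker] = None
    compressor: Optional[Compressor] = None
    top_k: int = 2

    def run(self, query: str) -> str:
        docs = self.retriever(query, self.top_k)
        if self.reranker is not None:
            docs = self.reranker(query, docs)
        if self.compressor is not None:
            context = self.compressor(query, docs)
        else:
            context = "\n".join(docs)
        prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
        return self.generator(prompt)
```

Because each stage is just a callable, swapping the toy retriever for a Faiss-backed or BM25-backed one, or adding a reranker, changes only the constructor arguments and leaves the run logic untouched.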

Quick Start & Requirements

  • Installation: run pip install flashrag-dev --pre, or clone the repository and run pip install -e . from its root.
  • Dependencies: Python 3.10+. Optional: vllm, sentence-transformers, pyserini. Faiss (faiss-cpu or faiss-gpu) must be installed through Conda.
  • Resources: Supports GPU acceleration via vLLM. Index building can be resource-intensive depending on corpus size and retrieval method.
  • Links: Installation, Quick Start, FlashRAG-UI
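Since several of the dependencies above are optional, it can help to check which ones are present before configuring a pipeline. The snippet below is a small sketch using only the Python standard library; the helper name available_backends is hypothetical, and the module names are the usual import names for these packages (both faiss-cpu and faiss-gpu import as faiss).

```python
from importlib.util import find_spec

# Import names of the optional packages listed above.
OPTIONAL_MODULES = ["faiss", "vllm", "sentence_transformers", "pyserini"]


def available_backends(modules=OPTIONAL_MODULES):
    """Map each module name to True if it can be imported in this environment."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, installed in available_backends().items():
        print(f"{name}: {'installed' if installed else 'missing'}")
```

The check only looks up each module without importing it, so it stays fast even when heavy packages such as vllm are installed.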

Highlighted Details

  • Includes 36 pre-processed RAG benchmark datasets and 17 implemented SOTA RAG algorithms.
  • Supports multimodal RAG with MLLMs like LLaVA and multimodal retrievers.
  • Offers FlashRAG-UI, a visual interface for easy configuration, experimentation, and evaluation.
  • Speeds up generation with vLLM and FastChat, and retrieval with Faiss.

Maintenance & Community

The project is under active development, and its roadmap includes plans to add more RAG approaches and evaluation metrics. Contributions are welcome.

Licensing & Compatibility

FlashRAG is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The toolkit is still under development, and some features are planned for future releases. Although it aims to reproduce the results of the original methods, it runs them under uniform settings, so its numbers may differ from the originally reported ones. Installing Faiss can be difficult on some systems.

Health Check

  • Last commit: 6 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 3
  • Star history: 241 stars in the last 30 days
