RAGLAB by fate-ubw

RAG framework for research, modularity, and reproducibility

created 1 year ago
302 stars

Top 89.3% on sourcepulse

Project Summary

RAGLAB is a comprehensive, modular framework designed for research and development in Retrieval-Augmented Generation (RAG). It provides researchers and practitioners with a unified platform to reproduce, compare, and develop new RAG algorithms, supporting a full pipeline from data processing to evaluation across multiple datasets and metrics.

How It Works

RAGLAB runs in two modes: "Interact Mode" for quickly exploring how an algorithm behaves, and "Evaluation Mode" for rigorous experiments and paper reproduction. It implements 6 state-of-the-art RAG algorithms and includes an evaluation system with 10 benchmark datasets, so that algorithms can be compared under the same conditions. The framework is built for extensibility: new algorithms, datasets, and evaluation metrics can be added without reworking the rest of the pipeline.

Quick Start & Requirements

  • Install: Clone the repository and create a Conda environment using conda env create -f environment.yml.
  • Prerequisites: PyTorch 2.0.1 (CUDA 11.8), flash-attn==2.2, the spaCy model en_core_web_sm, and the NLTK punkt tokenizer data. Multiple models and datasets must also be downloaded from Hugging Face.
  • Resources: The ColBERT server requires at least 60 GB of RAM. A GPU scheduler is available for running experiments in parallel.
  • Docs: process_wiki.md, train_docs.md
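
Because the prerequisites above are pinned to exact versions, it can help to confirm what is installed before launching anything heavy. The following is a minimal sketch and not part of RAGLAB: check_versions and EXPECTED are hypothetical names, and the distribution names "torch" and "flash-attn" are assumptions about how the packages are published.

```python
from importlib import metadata

# Pins copied from the prerequisites list; the distribution names are assumptions.
EXPECTED = {"torch": "2.0.1", "flash-attn": "2.2"}


def check_versions(expected, get_version=metadata.version):
    """Return {package: problem} for every package that is missing or mismatched."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Drop a local build tag such as "+cu118" before comparing.
        base = found.split("+")[0]
        # Accept an exact match or a more specific release ("2.2" matches "2.2.0").
        if base != wanted and not base.startswith(wanted + "."):
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    for name, problem in check_versions(EXPECTED).items():
        print(f"{name}: {problem}")
```

The get_version parameter exists only so the check can be exercised without the packages installed; in normal use the default lookup from importlib.metadata is enough.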

Highlighted Details

  • Reproduces 6 SOTA RAG algorithms and supports 10 benchmark datasets.
  • Includes an efficient retriever client with local API for parallel access and caching.
  • Compatible with large models (70B+), vLLM, and quantization techniques.
  • Supports ALCE evaluation; Factscore evaluation requires a separate environment due to dependency conflicts (PyTorch 1.13.1 vs. RAGLAB's 2.0.1).
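
The idea behind a caching retriever client, mentioned in the list above, can be illustrated with a small wrapper that memoizes results per query. This is a sketch of the general technique only and does not reflect RAGLAB's actual client or API: CachingRetriever, search, and toy_backend are hypothetical names, and the backend here is a stand-in for a call to a local retrieval server.

```python
from typing import Callable, Dict, List, Tuple


class CachingRetriever:
    """Wrap any (query, top_k) -> passages function and memoize its results."""

    def __init__(self, backend: Callable[[str, int], List[str]]):
        self._backend = backend
        self._cache: Dict[Tuple[str, int], List[str]] = {}
        self.hits = 0
        self.misses = 0

    def search(self, query: str, top_k: int = 5) -> List[str]:
        key = (query, top_k)
        if key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[key] = self._backend(query, top_k)
        # Return a copy so callers cannot mutate the cached list.
        return list(self._cache[key])


def toy_backend(query: str, top_k: int) -> List[str]:
    """Hypothetical stand-in for a request to a retrieval server."""
    return [f"passage {i} for {query!r}" for i in range(top_k)]


retriever = CachingRetriever(toy_backend)
first = retriever.search("what is RAG?", top_k=2)
second = retriever.search("what is RAG?", top_k=2)  # served from the cache
```

When several experiments issue the same queries against the same index, a cache like this means the expensive retrieval runs once per distinct query rather than once per experiment.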

Maintenance & Community

The project is associated with the EMNLP 2024 System Demonstrations track. The README does not link to any community channels.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Factscore evaluation requires a manually created, separate environment because its PyTorch version conflicts with RAGLAB's core dependencies. Some configuration steps, particularly the ColBERT server paths, must be edited by hand to use absolute paths.
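
One way to reduce mistakes when converting paths by hand is to resolve them programmatically before writing them into a config. The snippet below is a minimal sketch under stated assumptions: absolutize is a hypothetical helper, and the keys index_path and collection_path are illustrative names, not RAGLAB's actual configuration fields.

```python
from pathlib import Path


def absolutize(config: dict, keys, base_dir) -> dict:
    """Return a copy of config in which the listed path entries are absolute."""
    base = Path(base_dir).resolve()
    out = dict(config)
    for key in keys:
        path = Path(out[key]).expanduser()
        # Paths that are already absolute are kept; relative ones are anchored at base.
        out[key] = str(path if path.is_absolute() else base / path)
    return out


# Hypothetical config entries, for illustration only.
config = {"index_path": "data/index", "collection_path": "data/wiki.tsv", "top_k": 5}
resolved = absolutize(config, ["index_path", "collection_path"], base_dir=".")
```

The original dictionary is left unchanged, so the relative version can still be kept under version control while the resolved copy is used on a given machine.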

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 90 days
