RAGLAB by fate-ubw

RAG framework for research, modularity, and reproducibility

Created 1 year ago

309 stars

Top 87.1% on SourcePulse

Project Summary

RAGLAB is a comprehensive, modular framework designed for research and development in Retrieval-Augmented Generation (RAG). It provides researchers and practitioners with a unified platform to reproduce, compare, and develop new RAG algorithms, supporting a full pipeline from data processing to evaluation across multiple datasets and metrics.

How It Works

RAGLAB offers two modes: "Interact Mode" for quickly exploring how an algorithm behaves, and "Evaluation Mode" for controlled experiments and paper reproduction. It implements 6 state-of-the-art RAG algorithms and includes an evaluation system covering 10 benchmark datasets, so that algorithms can be compared under the same conditions. The framework is built for extensibility: new algorithms, datasets, and evaluation metrics can be added without reworking the existing pipeline.
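To make the idea of a fair comparison concrete, here is a minimal sketch of an evaluation loop that scores several algorithms on the same examples with the same metric. It is not RAGLAB's code: the `Algorithm` and `Dataset` types, the `evaluate` function, and the toy algorithms are all hypothetical, and exact match stands in for whichever metrics the framework actually reports.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: an "algorithm" is any callable mapping a question to an answer.
Algorithm = Callable[[str], str]
Dataset = List[Tuple[str, str]]  # (question, gold answer) pairs


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are not counted as errors."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 when the normalized strings are identical, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(gold) else 0.0


def evaluate(algorithms: Dict[str, Algorithm], dataset: Dataset) -> Dict[str, float]:
    """Score every algorithm on the same examples with the same metric."""
    scores: Dict[str, float] = {}
    for name, algo in algorithms.items():
        total = sum(exact_match(algo(question), gold) for question, gold in dataset)
        scores[name] = total / len(dataset) if dataset else 0.0
    return scores


if __name__ == "__main__":
    # Toy stand-ins; a real run would plug in RAG pipelines and a benchmark split.
    toy_data = [("Capital of France?", "Paris"), ("2 + 2?", "4")]
    toy_algorithms = {
        "always_paris": lambda question: "paris",
        "echo": lambda question: question,
    }
    print(evaluate(toy_algorithms, toy_data))
```

Holding the dataset and the metric fixed while varying only the algorithm is what makes the resulting numbers comparable.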

Quick Start & Requirements

Install: Clone the repository, then create the Conda environment with conda env create -f environment.yml.
Prerequisites: PyTorch 2.0.1 (CUDA 11.8), flash-attn==2.2, the spaCy model en_core_web_sm, and the NLTK punkt tokenizer data. Several models and datasets must also be downloaded from Hugging Face.
Resources: The ColBERT server requires at least 60 GB of RAM. A GPU scheduler is available for running experiments in parallel.
Docs: process_wiki.md, train_docs.md
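After creating the environment, a quick preflight check can confirm that the pinned packages are present. The sketch below uses only the standard library and is not part of RAGLAB; the `REQUIRED` table, `version_matches`, and `check` are hypothetical names, and the distribution names are assumptions about how each package is registered. It treats each pin as a prefix, which is looser than pip's `==`, and it does not verify CUDA or the NLTK punkt data.

```python
from importlib import metadata
from typing import Dict, Optional

# Version pins taken from the prerequisites above; None means "any version".
# The distribution names are assumptions, not confirmed by the project.
REQUIRED: Dict[str, Optional[str]] = {
    "torch": "2.0.1",
    "flash-attn": "2.2",
    "nltk": None,
    "en_core_web_sm": None,
}


def version_matches(installed: str, pinned: Optional[str]) -> bool:
    """True if `installed` equals the pin or extends it (e.g. 2.0.1+cu118 or 2.2.0)."""
    if pinned is None:
        return True
    return installed == pinned or installed.startswith((pinned + ".", pinned + "+"))


def check(required: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Map each distribution name to 'ok', 'missing', or a mismatch message."""
    report: Dict[str, str] = {}
    for name, pinned in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if version_matches(installed, pinned):
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, expected {pinned}"
    return report


if __name__ == "__main__":
    for package, status in check(REQUIRED).items():
        print(f"{package}: {status}")
```

Because the check reads installed package metadata instead of importing the packages, it runs quickly and does not need a GPU.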

Highlighted Details

Reproduces 6 SOTA RAG algorithms and supports 10 benchmark datasets.
Includes an efficient retriever client with local API for parallel access and caching.
Supports large models (70B+ parameters), vLLM, and quantization techniques.
Supports ALCE evaluation; FActScore evaluation requires a separate environment because its dependencies conflict with RAGLAB's (PyTorch 1.13.1 vs. 2.0.1).
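The retriever-client caching mentioned above can be illustrated with a small wrapper that remembers results per query. This is a sketch under stated assumptions, not RAGLAB's implementation: `CachedRetriever`, its `search` method, and the `(query, top_k) -> passages` backend signature are hypothetical, and a plain callable stands in for the call to the local retrieval API.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple

# Hypothetical backend signature: (query, top_k) -> list of passages.
Backend = Callable[[str, int], List[str]]


class CachedRetriever:
    """Wrap a retrieval backend with a small least-recently-used cache."""

    def __init__(self, backend: Backend, max_entries: int = 1024) -> None:
        self._backend = backend
        self._max_entries = max_entries
        self._cache: "OrderedDict[Tuple[str, int], List[str]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def search(self, query: str, top_k: int = 5) -> List[str]:
        key = (query, top_k)
        if key in self._cache:
            # Cache hit: mark the entry as recently used and skip the backend.
            self.hits += 1
            self._cache.move_to_end(key)
            return list(self._cache[key])
        # Cache miss: query the backend once and remember the result.
        self.misses += 1
        passages = self._backend(query, top_k)
        self._cache[key] = list(passages)
        if len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return list(passages)


if __name__ == "__main__":
    def fake_backend(query: str, top_k: int) -> List[str]:
        return [f"passage {i} for {query!r}" for i in range(top_k)]

    retriever = CachedRetriever(fake_backend, max_entries=2)
    retriever.search("what is RAG?", top_k=2)
    retriever.search("what is RAG?", top_k=2)  # served from the cache
    print(retriever.hits, retriever.misses)
```

Repeated queries are common when several algorithms are evaluated on the same dataset, which is where a cache in front of an expensive retriever pays off.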

Maintenance & Community

The project accompanies a paper in the EMNLP 2024 System Demonstrations track. The README does not link to any community channels.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

FActScore evaluation requires a manually created, separate environment because its PyTorch version conflicts with RAGLAB's core dependencies. Some configuration values, notably the ColBERT server paths, must be edited by hand and set to absolute paths.
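Converting relative paths in a configuration to absolute ones can be scripted instead of done by hand. The helper below is a minimal sketch, assuming the configuration has been loaded into a flat dictionary of strings; `absolutize` and the key names in the usage example are hypothetical and do not correspond to RAGLAB's actual config fields.

```python
from pathlib import Path
from typing import Dict, Iterable


def absolutize(config: Dict[str, str], path_keys: Iterable[str], base_dir: str) -> Dict[str, str]:
    """Return a copy of `config` in which the listed keys hold absolute paths."""
    base = Path(base_dir).expanduser().resolve()
    result = dict(config)
    for key in path_keys:
        if key not in result:
            continue  # leave the config untouched if the key is absent
        path = Path(result[key]).expanduser()
        if not path.is_absolute():
            path = base / path  # interpret relative paths against base_dir
        result[key] = str(path.resolve())
    return result


if __name__ == "__main__":
    # Hypothetical keys, for illustration only.
    config = {"index_path": "data/index", "top_k": "5"}
    print(absolutize(config, ["index_path"], base_dir="."))
```

Only the keys named in `path_keys` are rewritten, so non-path settings pass through unchanged.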

Explore Similar Projects

RAGTune by misbahsy

bergen by naver

RAG-Book by Nipi64310

RAG-FiT by IntelLabs

Rankify by DataScienceUIBK

Awesome-RAG by Danielskry

RAGChecker by amazon-science

RAGMeUp by ErikTromp

RAG-Survey by Tongji-KGLLM

rag-in-action by huangjia2019

FlashRAG by RUC-NLPIR

AutoRAG by Marker-Inc-Korea