End-to-end code for RAG retrieval model training, inference, and distillation
This project provides an end-to-end solution for training, inference, and distillation of Retrieval-Augmented Generation (RAG) retrieval models, covering embedding, ColBERT, and reranker components. It targets researchers and developers working on RAG systems, offering unified code, support for a range of open-source models, and a focus on efficient fine-tuning and on distilling large models into smaller ones.
How It Works
The framework supports fine-tuning of diverse RAG retrieval models: embedding models (BERT-based and LLM-based), late-interaction models (ColBERT), and reranker models (BERT-based and LLM-based). It leverages algorithms such as MRL (Matryoshka Representation Learning) loss for dimensionality reduction in embedding models, and supports multi-GPU training strategies via DeepSpeed and FSDP. For inference, a lightweight Python library, `rag-retrieval`, offers a unified interface to various reranker models, including specific logic for handling long documents.
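Since cross-encoder rerankers accept only a bounded input length, a common way to handle long documents is to split them into overlapping chunks, score each chunk against the query, and keep the maximum chunk score (MaxP-style aggregation). The exact strategy the library uses is not detailed here, so the sketch below is an illustration of the general technique, with a toy word-overlap scorer standing in for a real model:

```python
from typing import Callable, List

def chunk_text(text: str, max_len: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows."""
    step = max_len - overlap
    return [text[i:i + max_len] for i in range(0, max(len(text) - overlap, 1), step)]

def score_long_document(query: str, document: str,
                        score_fn: Callable[[str, str], float],
                        max_len: int = 200, overlap: int = 50) -> float:
    """Score a long document as the maximum of its per-chunk scores."""
    chunks = chunk_text(document, max_len, overlap)
    return max(score_fn(query, chunk) for chunk in chunks)

# Toy scorer: token-overlap ratio stands in for real cross-encoder inference.
def toy_score(query: str, passage: str) -> float:
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

doc = "irrelevant filler " * 40 + "retrieval augmented generation improves answers"
print(score_long_document("retrieval augmented generation", doc, toy_score))
```

Taking the max (rather than the mean) rewards a single highly relevant passage inside an otherwise off-topic document, which usually matches retrieval intent.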
Quick Start & Requirements
For training: `conda create -n rag-retrieval python=3.8`, activate the environment, then `pip install -r requirements.txt`. For inference: `pip install rag-retrieval`. In both cases, manually installing the PyTorch build matching your CUDA version is recommended.
Highlighted Details
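The "unified interface" idea can be illustrated with a small, library-independent sketch. The class and method names below are illustrative only, not the actual `rag-retrieval` API; a real implementation would wrap a cross-encoder model where the toy word-overlap scorer appears:

```python
from typing import List, Tuple

class ToyReranker:
    """Illustrative reranker exposing a single rerank() entry point."""

    def compute_scores(self, query: str, passages: List[str]) -> List[float]:
        # Token-overlap ratio stands in for cross-encoder model inference.
        q = set(query.lower().split())
        return [len(q & set(p.lower().split())) / max(len(q), 1) for p in passages]

    def rerank(self, query: str, passages: List[str]) -> List[Tuple[str, float]]:
        # Return (passage, score) pairs sorted from most to least relevant.
        scores = self.compute_scores(query, passages)
        return sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)

reranker = ToyReranker()
ranked = reranker.rerank(
    "how to fine-tune a reranker",
    ["Guide to fine-tune a reranker model", "Unrelated cooking recipe"],
)
print(ranked[0][0])  # most relevant passage first
```

The benefit of such an interface is that BERT-based and LLM-based rerankers become interchangeable behind the same `rerank(query, passages)` call.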
The fine-tuned `rag-retrieval-reranker` model shows strong benchmark results.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README notes that gains from fine-tuning open-source models on existing general-purpose datasets may be limited, and suggests that domain-specific (vertical) datasets yield larger improvements.