Evaluating retrieval algorithms for RAG with LlamaIndex
This repository evaluates retrieval techniques and algorithm performance for the Retrieve stage in RAG systems, using LlamaIndex as the primary framework. It targets researchers and engineers working on improving RAG retrieval accuracy and efficiency by comparing various retrieval methods, embedding models, and re-ranking strategies.
How It Works
The project systematically evaluates retrieval strategies including BM25, several embedding models (OpenAI, BGE, BGE-Finetune), and ensemble methods, and further assesses the impact of re-ranking with models such as Cohere Rerank, BGE-Base Rerank, and BGE-Large Rerank. It also explores query-side techniques, Query Rewrite and HyDE, to enhance retrieval performance. Each configuration is evaluated on a custom dataset (data/doc_qa_test.json), with hit_rate, mrr, and cost_time reported; a sketch of how such configurations could be assembled follows below.
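For orientation, here is a minimal sketch of three of the compared variants in LlamaIndex 0.9.x. The corpus path, model names, and top-k values are illustrative assumptions rather than the project's exact settings, and building the vector index assumes an embedding backend (e.g., an OpenAI API key) is configured.

```python
# Hedged sketch: three retrieval variants in llama-index 0.9.x.
# "data/docs", model names, and top_k values are assumptions for illustration.
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.indices.query.query_transform import HyDEQueryTransform
from llama_index.postprocessor import SentenceTransformerRerank
from llama_index.query_engine import TransformQueryEngine
from llama_index.retrievers import BM25Retriever

documents = SimpleDirectoryReader("data/docs").load_data()
index = VectorStoreIndex.from_documents(documents)  # uses the configured embedding model

# Sparse baseline: BM25 over the same nodes the vector index stores
# (requires the rank_bm25 package).
bm25_retriever = BM25Retriever.from_defaults(
    docstore=index.docstore, similarity_top_k=3
)

# Dense retrieval + re-ranking: fetch a wide candidate set, then keep the
# top_n nodes after re-scoring with a BGE cross-encoder.
reranker = SentenceTransformerRerank(model="BAAI/bge-reranker-base", top_n=3)
rerank_engine = index.as_query_engine(
    similarity_top_k=10, node_postprocessors=[reranker]
)

# HyDE: generate a hypothetical answer first, then retrieve with its embedding.
hyde_engine = TransformQueryEngine(
    index.as_query_engine(similarity_top_k=3),
    HyDEQueryTransform(include_original=True),
)
```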
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt; the benchmarks were run against LlamaIndex 0.9.21, so pinning that version is likely needed to reproduce them. The evaluation dataset ships with the repository at data/doc_qa_test.json, and a rough evaluation loop is sketched below.
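A run then looks roughly like the loop below, using LlamaIndex's RetrieverEvaluator and a guessed schema for data/doc_qa_test.json (query text mapped to the node IDs that count as hits); the file's actual structure may differ.

```python
# Hedged sketch of an evaluation loop reporting hit_rate, mrr, and cost_time.
# The JSON schema ({query: [expected_node_id, ...]}) is an assumption.
import json
import time

from llama_index.evaluation import RetrieverEvaluator

retriever = index.as_retriever(similarity_top_k=3)  # `index` as built above
evaluator = RetrieverEvaluator.from_metric_names(
    ["hit_rate", "mrr"], retriever=retriever
)

with open("data/doc_qa_test.json") as f:
    qa_pairs = json.load(f)

start = time.time()
results = [
    evaluator.evaluate(query, expected_ids=ids)
    for query, ids in qa_pairs.items()
]
cost_time = time.time() - start

hit_rate = sum(r.metric_vals_dict["hit_rate"] for r in results) / len(results)
mrr = sum(r.metric_vals_dict["mrr"] for r in results) / len(results)
print(f"hit_rate={hit_rate:.4f}  mrr={mrr:.4f}  cost_time={cost_time:.2f}s")
```

Swapping in BM25, a different embedding model, or a re-ranked retriever at the `retriever` line yields the per-configuration rows the README's tables report.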
Highlighted Details
Maintenance & Community
The project appears to be a personal research effort; the README names no maintainers, community channels, or plans for ongoing development.
Licensing & Compatibility
The README does not specify a license.
Limitations & Caveats
The project focuses on a single dataset and a specific LlamaIndex version (0.9.21), which may limit how far the results generalize. The extensive benchmark tables, while informative, are not accompanied by scripts that reproduce them directly. The cost_time figures for re-ranked ensembles are exceptionally high, which suggests either a measurement artifact or unusually expensive experimental conditions.