embedding_rerank_retrieval by percent4

RAG evaluation for retrieval algorithms, using LlamaIndex

created 1 year ago
271 stars

Top 95.8% on sourcepulse

Project Summary

This repository evaluates retrieval techniques and algorithm performance for the Retrieve stage in RAG systems, using LlamaIndex as the primary framework. It targets researchers and engineers working on improving RAG retrieval accuracy and efficiency by comparing various retrieval methods, embedding models, and re-ranking strategies.

How It Works

The project systematically evaluates different retrieval strategies, including BM25, various embedding models (OpenAI, BGE, BGE-Finetune), and ensemble methods. It then assesses the impact of re-ranking with models such as Cohere Rerank, BGE-Base Rerank, and BGE-Large Rerank. Evaluation runs on a custom dataset (data/doc_qa_test.json), and metrics such as hit_rate, mrr, and cost_time are reported for each configuration. The project also explores Query Rewrite and HyDE as ways to improve retrieval performance.
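To make the two accuracy metrics concrete, here is a minimal sketch of how hit_rate and mrr are commonly computed for a batch of queries. It is an illustration of the standard definitions, not the repository's own code: the function names and the "retrieved"/"expected" record layout are assumptions, and the real schema of data/doc_qa_test.json is not described in this summary.

```python
from typing import Dict, List, Sequence


def hit_rate(retrieved: Sequence[str], expected: Sequence[str]) -> float:
    """Return 1.0 if any expected id appears in the retrieved ids, else 0.0."""
    expected_set = set(expected)
    return 1.0 if any(doc_id in expected_set for doc_id in retrieved) else 0.0


def reciprocal_rank(retrieved: Sequence[str], expected: Sequence[str]) -> float:
    """Return 1/rank of the first relevant result (rank starts at 1), or 0.0 on a miss."""
    expected_set = set(expected)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in expected_set:
            return 1.0 / rank
    return 0.0


def evaluate(results: List[Dict[str, List[str]]]) -> Dict[str, float]:
    """Average both metrics over all queries.

    Each record is assumed (hypothetically) to look like
    {"retrieved": [...ranked ids...], "expected": [...relevant ids...]}.
    """
    n = len(results)
    if n == 0:
        return {"hit_rate": 0.0, "mrr": 0.0}
    hits = sum(hit_rate(r["retrieved"], r["expected"]) for r in results)
    rr = sum(reciprocal_rank(r["retrieved"], r["expected"]) for r in results)
    return {"hit_rate": hits / n, "mrr": rr / n}


if __name__ == "__main__":
    sample = [
        {"retrieved": ["a", "b", "c"], "expected": ["a"]},  # hit at rank 1
        {"retrieved": ["x", "b", "c"], "expected": ["b"]},  # hit at rank 2
        {"retrieved": ["x", "y", "z"], "expected": ["q"]},  # miss
    ]
    print(evaluate(sample))
```

A re-ranker leaves hit_rate unchanged when it only reorders the same top-k candidates, but it can raise mrr by moving the relevant document closer to rank 1. This is why the two metrics are usually read together.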

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python 3.9+, LlamaIndex 0.9.21, and potentially API keys for hosted models (OpenAI embeddings, Cohere Rerank).
  • Data: Requires data/doc_qa_test.json.
  • Resources: Running evaluations involves significant computation, especially for ensemble methods and re-ranking.

Highlighted Details

  • Comprehensive benchmark tables comparing BM25, various embedding models (including fine-tuned BGE variants), and ensemble approaches with and without re-ranking.
  • Detailed analysis of specific query examples demonstrating the strengths and weaknesses of different retrieval methods and the benefits of re-ranking and query rewriting.
  • Exploration of advanced techniques like HyDE (Hypothetical Document Embeddings) and Late Chunking for improved retrieval.
  • Includes notebooks for implementing and experimenting with Late Chunking.
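As a rough illustration of the Late Chunking idea mentioned above: the whole document is first encoded once so that every token embedding carries document-wide context, and only afterwards are the token embeddings pooled into one vector per chunk. The sketch below shows only that pooling step. The token embeddings are toy values, the function names are hypothetical, and nothing here is taken from the repository's notebooks; a real pipeline would obtain token embeddings from a long-context embedding model.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty sequence of equal-length vectors element-wise."""
    if not vectors:
        raise ValueError("cannot pool an empty span")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def late_chunk(
    token_embeddings: Sequence[Vector],
    spans: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Pool contextualized token embeddings into one vector per chunk.

    `spans` holds half-open [start, end) token index ranges, one per chunk.
    The embeddings are assumed to come from a single pass over the full
    document, which is what distinguishes late chunking from embedding
    each chunk in isolation.
    """
    return [mean_pool(token_embeddings[start:end]) for start, end in spans]


if __name__ == "__main__":
    # Toy 2-dimensional token embeddings for a 4-token "document".
    tokens = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 2.0]]
    chunks = late_chunk(tokens, [(0, 2), (2, 4)])
    print(chunks)  # [[2.0, 1.0], [1.0, 3.0]]
```

Mean pooling is used here because it is the simplest choice; other pooling strategies would fit the same interface.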

Maintenance & Community

The project appears to be a personal research effort, with no explicit mention of maintainers, community channels, or ongoing development beyond the provided README.

Licensing & Compatibility

The README does not specify a license.

Limitations & Caveats

The project focuses on a specific dataset and LlamaIndex version (0.9.21), which may limit generalizability. The benchmark tables are extensive and informative, but they are not accompanied by code for reproducing them directly. The "cost_time" metric for re-ranked ensembles is exceptionally high, which may reflect a performance problem or particular experimental conditions.
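One way to investigate a high latency figure is to time the retrieval and re-ranking stages separately. The sketch below is a generic measurement harness under stated assumptions: this summary does not say how the repository computes cost_time or in what unit, and `retrieve` and `rerank` are hypothetical callables standing in for whatever pipeline is being profiled.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def timed(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Call fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def profile_pipeline(
    query: str,
    retrieve: Callable[[str], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
) -> Tuple[List[str], Dict[str, float]]:
    """Run retrieve then rerank, recording wall-clock time for each stage."""
    candidates, t_retrieve = timed(retrieve, query)
    ranked, t_rerank = timed(rerank, query, candidates)
    timings = {
        "retrieve": t_retrieve,
        "rerank": t_rerank,
        "total": t_retrieve + t_rerank,
    }
    return ranked, timings


if __name__ == "__main__":
    # Stand-in stages; replace with real retriever and re-ranker calls.
    fake_retrieve = lambda q: ["doc_b", "doc_a"]
    fake_rerank = lambda q, docs: sorted(docs)
    ranked, timings = profile_pipeline("example query", fake_retrieve, fake_rerank)
    print(ranked, timings)
```

Splitting the total this way shows whether the cost comes from running several retrievers in an ensemble or from the re-ranker scoring every candidate, which a single aggregate number cannot distinguish.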

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 26 stars in the last 90 days
