Open-source RAG evaluation toolkit
This Python package provides an open-source toolkit for evaluating Retrieval-Augmented Generation (RAG) pipelines, targeting developers and researchers. It offers a flexible framework to measure RAG quality using metrics that do not require golden chunks or answers, enabling easier and more scalable evaluation.
How It Works
The toolkit employs techniques like UMBRELA and AutoNuggetizer for evaluation without ground truth data. It processes RAG outputs through Evaluators (e.g., TRECEvaluator, ConsistencyEvaluator), which apply various Metrics (e.g., HHEM Score, BERTScore, ROUGE-L) to assess aspects like faithfulness and relevance. Results are reported via detailed CSVs and visualizations, with an option for chained evaluations.
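To make that flow concrete, here is a minimal, self-contained sketch of the pattern described above: an evaluator applies metric callables to RAG outputs and writes a CSV report. All names here (RAGOutput, Evaluator, token_overlap) are illustrative stand-ins, not open-rag-eval's actual API:

    # Illustrative sketch of the Evaluator -> Metrics -> CSV flow; not the library's API.
    import csv
    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class RAGOutput:
        query: str
        retrieved_passages: List[str]
        generated_answer: str

    # A metric is any callable that scores one RAG output in [0, 1].
    Metric = Callable[[RAGOutput], float]

    def token_overlap(output: RAGOutput) -> float:
        """Toy faithfulness proxy: fraction of answer tokens found in the passages."""
        answer = output.generated_answer.lower().split()
        context = set(" ".join(output.retrieved_passages).lower().split())
        return sum(t in context for t in answer) / max(len(answer), 1)

    class Evaluator:
        def __init__(self, metrics: Dict[str, Metric]):
            self.metrics = metrics

        def evaluate(self, outputs: List[RAGOutput], csv_path: str) -> None:
            with open(csv_path, "w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(["query", *self.metrics])
                for out in outputs:
                    writer.writerow([out.query, *(m(out) for m in self.metrics.values())])

    outputs = [RAGOutput("What is RAG?",
                         ["RAG augments generation with retrieval."],
                         "RAG augments generation with retrieved passages.")]
    Evaluator({"token_overlap": token_overlap}).evaluate(outputs, "results.csv")

Chained evaluation, as mentioned above, amounts to feeding one evaluator's results into another; the CSV output makes that composition straightforward.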
Quick Start & Requirements
Install from source (recommended if you want to run the bundled samples):

    pip install -e .

or install from PyPI:

    pip install open-rag-eval

Requirements: an OpenAI API key (set the OPENAI_API_KEY environment variable) for the metrics that need one, and a Vectara account and API key for the Vectara connector.
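A quick pre-flight check for these credentials might look like the following. OPENAI_API_KEY comes from the requirements above; VECTARA_API_KEY is an assumed variable name for illustration:

    # Minimal environment check before running an evaluation.
    import os
    import sys

    required = ["OPENAI_API_KEY"]   # needed by the OpenAI-backed metrics
    optional = ["VECTARA_API_KEY"]  # assumed name; only needed for the Vectara connector

    missing = [v for v in required if not os.environ.get(v)]
    if missing:
        sys.exit(f"Missing required environment variables: {', '.join(missing)}")
    for v in optional:
        if not os.environ.get(v):
            print(f"Note: {v} not set; the Vectara connector will be unavailable.")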
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The ROUGE-L metric's accuracy may degrade for non-English languages because it relies on longest-common-subsequence overlap of surface tokens, with no language-specific preprocessing such as tokenization or stemming. Some metrics require an OpenAI API key.
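The tokenization point is easy to demonstrate. Below is a minimal, self-contained ROUGE-L (LCS-based F1) sketch, not the toolkit's implementation: with naive whitespace splitting, an unsegmented language like Chinese collapses to a single token, so near-identical sentences score 0.0:

    # Self-contained ROUGE-L (F1 over longest common subsequence of tokens).
    from typing import List

    def lcs_len(a: List[str], b: List[str]) -> int:
        # Classic dynamic-programming LCS length.
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i, x in enumerate(a):
            for j, y in enumerate(b):
                dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
        return dp[-1][-1]

    def rouge_l_f1(ref: List[str], hyp: List[str]) -> float:
        lcs = lcs_len(ref, hyp)
        if lcs == 0:
            return 0.0
        precision, recall = lcs / len(hyp), lcs / len(ref)
        return 2 * precision * recall / (precision + recall)

    # English: whitespace tokenization gives a meaningful partial-match score.
    print(rouge_l_f1("the cat sat".split(), "the cat slept".split()))  # ~0.67

    # Chinese has no spaces, so each sentence becomes a single token and any
    # near-miss scores 0.0 unless a language-specific tokenizer is applied first.
    print(rouge_l_f1("猫坐在垫子上".split(), "猫睡在垫子上".split()))  # 0.0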