open-rag-eval by vectara

Open-source RAG evaluation toolkit

created 8 months ago
271 stars

Top 95.8% on sourcepulse

Project Summary

open-rag-eval is an open-source Python toolkit for evaluating Retrieval-Augmented Generation (RAG) pipelines, aimed at developers and researchers. It offers a flexible framework for measuring RAG quality with metrics that do not require golden chunks or golden answers, which makes evaluation easier to set up and to scale.

How It Works

The toolkit employs techniques like UMBRELA and AutoNuggetizer for evaluation without ground truth data. It processes RAG outputs through Evaluators (e.g., TRECEvaluator, ConsistencyEvaluator), which apply various Metrics (e.g., HHEM Score, BERTScore, ROUGE-L) to assess aspects like faithfulness and relevance. Results are reported via detailed CSVs and visualizations, with an option for chained evaluations.
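
The evaluator-and-metric arrangement described above can be pictured with a minimal sketch. Everything in it is hypothetical and is not open-rag-eval's actual API: the `RagOutput` and `SimpleEvaluator` names, the toy `token_overlap` metric, and the CSV layout are assumptions chosen only to illustrate the pattern of one evaluator applying several metrics and reporting per-query scores.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RagOutput:
    """One RAG result: the query, retrieved passages, and generated answer."""
    query: str
    passages: List[str]
    answer: str


# Hypothetical metric signature: one RAG output in, one score out.
Metric = Callable[[RagOutput], float]


def token_overlap(output: RagOutput) -> float:
    """Toy stand-in metric: fraction of answer tokens found in the passages."""
    answer_tokens = output.answer.lower().split()
    if not answer_tokens:
        return 0.0
    passage_tokens = set(" ".join(output.passages).lower().split())
    hits = sum(1 for tok in answer_tokens if tok in passage_tokens)
    return hits / len(answer_tokens)


class SimpleEvaluator:
    """Hypothetical evaluator: applies every registered metric to every output."""

    def __init__(self, metrics: Dict[str, Metric]) -> None:
        self.metrics = metrics

    def evaluate(self, outputs: List[RagOutput]) -> List[Dict[str, object]]:
        rows = []
        for out in outputs:
            row: Dict[str, object] = {"query": out.query}
            for name, metric in self.metrics.items():
                row[name] = metric(out)
            rows.append(row)
        return rows


def to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize per-query scores as CSV text (one row per query)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping metrics as plain callables means a real metric (for example a faithfulness scorer) could be swapped in for the toy one without changing the evaluator loop.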

Quick Start & Requirements

  • Installation: pip install -e . (from source, recommended for samples) or pip install open-rag-eval.
  • Prerequisites: Python 3.9+, OpenAI API Key (environment variable OPENAI_API_KEY), Vectara account and API key (for Vectara connector).
  • Setup: Requires cloning the repo and configuring YAML files with API keys and corpus details for Vectara integration.
  • Links: Getting Started Guide, Open Evaluation Viewer
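
Before running an evaluation, the prerequisites listed above can be checked with a small preflight script. This is a sketch under stated assumptions, not part of the toolkit: the `check_prerequisites` helper is hypothetical, and it covers only the Python version and the `OPENAI_API_KEY` environment variable named above. Vectara credentials are left out because they are configured in YAML files.

```python
from typing import List, Mapping, Tuple


def check_prerequisites(
    env: Mapping[str, str],
    python_version: Tuple[int, int],
) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration; not provided by open-rag-eval.
    """
    problems = []
    # The listing states Python 3.9+ as the minimum version.
    if python_version < (3, 9):
        problems.append("Python 3.9 or newer is required")
    # An unset or empty key is treated as missing.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

In a real script the function would typically be called as `check_prerequisites(os.environ, sys.version_info[:2])`; taking both values as parameters keeps the check easy to test.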

Highlighted Details

  • Implements TREC-RAG benchmark evaluation metrics.
  • Supports connectors for Vectara, LlamaIndex, and LangChain.
  • Offers detailed reporting with per-query scores and intermediate outputs.
  • Includes visualization utilities for comparing results locally or via a web viewer.

Maintenance & Community

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

ROUGE-L scores may be less reliable for non-English text, because the metric relies on surface-level token alignment and applies no language-specific preprocessing. Some metrics require an OpenAI API key.
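
The language sensitivity of ROUGE-L can be illustrated with a minimal, from-scratch sketch. It assumes whitespace tokenization and a plain F1 combination of LCS-based precision and recall. This is not necessarily how open-rag-eval computes the metric, and the function names are illustrative.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L style F1 over whitespace tokens (an assumed tokenization)."""
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# English: five of six tokens align in order, so the score is 5/6.
english = rouge_l_f1("the cat sat on the mat", "the cat lay on the mat")

# Chinese without word segmentation: each sentence is a single whitespace
# token, so two near-identical sentences score 0.0.
chinese_raw = rouge_l_f1("猫坐在垫子上", "猫躺在垫子上")

# Splitting into characters first (a crude, language-specific
# preprocessing step) recovers the overlap: 5/6 again.
chinese_chars = rouge_l_f1(" ".join("猫坐在垫子上"), " ".join("猫躺在垫子上"))
```

The two Chinese sentences differ by one character, yet the unsegmented comparison scores zero. That is the kind of degradation to expect when no language-aware tokenizer is applied.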

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 3
  • Issues (30d): 2
  • Star history: 130 stars in the last 90 days
