open-rag-eval by vectara

Open-source RAG evaluation toolkit

Created 11 months ago
316 stars

Top 85.3% on SourcePulse

Project Summary

This Python package provides an open-source toolkit for evaluating Retrieval-Augmented Generation (RAG) pipelines, targeting developers and researchers. It offers a flexible framework to measure RAG quality using metrics that do not require golden chunks or answers, enabling easier and more scalable evaluation.

How It Works

The toolkit employs techniques like UMBRELA and AutoNuggetizer for evaluation without ground truth data. It processes RAG outputs through Evaluators (e.g., TRECEvaluator, ConsistencyEvaluator), which apply various Metrics (e.g., HHEM Score, BERTScore, ROUGE-L) to assess aspects like faithfulness and relevance. Results are reported via detailed CSVs and visualizations, with an option for chained evaluations.
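The evaluator-and-metric flow described above can be pictured with a small sketch. This is not the open-rag-eval API: `RagOutput`, `answer_support`, and `evaluate` are hypothetical names, and the toy metric is only a stand-in for a real faithfulness score such as HHEM. It shows the general shape of reference-free scoring, where each metric sees only the query, the retrieved passages, and the generated answer, and per-query scores are written out as CSV.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RagOutput:
    """One query's result from a RAG pipeline (hypothetical shape)."""
    query: str
    passages: List[str]
    answer: str


# A metric maps one RAG output to a score; no golden answer is passed in.
Metric = Callable[[RagOutput], float]


def answer_support(output: RagOutput) -> float:
    """Toy stand-in for a faithfulness metric:
    the share of answer words that also appear in the retrieved passages."""
    words = output.answer.lower().split()
    context = set(" ".join(output.passages).lower().split())
    return sum(w in context for w in words) / len(words) if words else 0.0


def evaluate(outputs: List[RagOutput], metrics: Dict[str, Metric]) -> str:
    """Apply every metric to every output and return a per-query CSV report."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["query", *metrics], lineterminator="\n"
    )
    writer.writeheader()
    for out in outputs:
        row = {"query": out.query}
        row.update({name: round(fn(out), 3) for name, fn in metrics.items()})
        writer.writerow(row)
    return buffer.getvalue()
```

Passing a dictionary of named metrics keeps the report columns aligned with whichever metrics are enabled, which is one plausible way to support the per-query reporting mentioned above.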

Quick Start & Requirements

  • Installation: pip install open-rag-eval, or pip install -e . from a cloned copy of the repository (recommended for running the samples).
  • Prerequisites: Python 3.9+, OpenAI API Key (environment variable OPENAI_API_KEY), Vectara account and API key (for Vectara connector).
  • Setup: For the Vectara integration, clone the repo and fill in the YAML configuration files with your API keys and corpus details.
  • Links: Getting Started Guide, Open Evaluation Viewer
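Before running an evaluation, the prerequisites listed above can be verified with a short preflight check. This is a minimal sketch, not part of the toolkit: `check_prerequisites` is a hypothetical helper, and it covers only the Python version and the `OPENAI_API_KEY` environment variable, since the source does not say how the Vectara key is supplied beyond the YAML configuration.

```python
import os
import sys


def check_prerequisites(env=None, version_info=None):
    """Return a list of human-readable problems; an empty list means ready.

    `env` and `version_info` default to the running process and are
    parameters only so the check can be exercised in isolation.
    """
    env = os.environ if env is None else env
    version_info = sys.version_info if version_info is None else version_info

    problems = []
    if tuple(version_info[:2]) < (3, 9):
        problems.append("Python 3.9 or newer is required")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current interpreter and environment; printing the returned list gives a quick summary of what still needs to be configured.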

Highlighted Details

  • Implements TREC-RAG benchmark evaluation metrics.
  • Supports connectors for Vectara, LlamaIndex, and LangChain.
  • Offers detailed reporting with per-query scores and intermediate outputs.
  • Includes visualization utilities for comparing results locally or via a web viewer.

Maintenance & Community

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

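The ROUGE-L metric's accuracy may degrade for non-English languages: it scores the longest common subsequence of tokens shared by two texts, so without language-specific preprocessing such as suitable tokenization the overlap can be badly underestimated. Some metrics require an OpenAI API key.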
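The language caveat is easier to see with a worked example. ROUGE-L is built on the longest common subsequence (LCS) of tokens between a reference and a candidate. The sketch below is a simplified F1 variant written for illustration, not the toolkit's implementation; the default whitespace tokenizer and the character-level alternative are assumptions chosen to show how tokenization changes the score for a language written without spaces.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the match diagonally, or carry over the best so far.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate, tokenize=str.split):
    """Simplified ROUGE-L: F1 of LCS-based precision and recall."""
    ref, cand = tokenize(reference), tokenize(candidate)
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# English: whitespace tokens work, and five of six tokens align in order.
english = rouge_l_f1("the cat sat on the mat", "the cat lay on the mat")

# Chinese has no spaces, so whitespace splitting yields one token per
# sentence and the two near-identical sentences share nothing.
naive = rouge_l_f1("猫坐在垫子上", "猫躺在垫子上")

# Character-level tokenization (one possible preprocessing step)
# recovers the overlap.
by_char = rouge_l_f1("猫坐在垫子上", "猫躺在垫子上", tokenize=list)
```

Here `english` and `by_char` both come out to 5/6, while `naive` is 0.0 even though the two Chinese sentences differ by a single character.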

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull requests (30d): 6
  • Issues (30d): 4
  • Star history: 9 stars in the last 30 days
