open-rag-eval by vectara

Open-source RAG evaluation toolkit

Created 9 months ago
299 stars

Top 89.0% on SourcePulse

Project Summary

open-rag-eval is an open-source Python toolkit for evaluating Retrieval-Augmented Generation (RAG) pipelines, aimed at developers and researchers. It offers a flexible framework for measuring RAG quality with metrics that do not require golden chunks or golden answers, which makes evaluation easier to set up and to scale.

How It Works

The toolkit employs techniques like UMBRELA and AutoNuggetizer for evaluation without ground truth data. It processes RAG outputs through Evaluators (e.g., TRECEvaluator, ConsistencyEvaluator), which apply various Metrics (e.g., HHEM Score, BERTScore, ROUGE-L) to assess aspects like faithfulness and relevance. Results are reported via detailed CSVs and visualizations, with an option for chained evaluations.
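
The relationship between evaluators and metrics can be pictured with a small sketch. All names below (`RAGOutput`, `answer_support`, `query_coverage`, `evaluate`) are hypothetical and are not the open-rag-eval API. The two toy metrics only stand in for real ones such as HHEM Score or ROUGE-L, to show how reference-free scores are computed per query.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RAGOutput:
    """One RAG result: the query, the retrieved passages, the generated answer."""
    query: str
    passages: List[str]
    answer: str


# A metric maps one RAG output to a score; no golden answer is involved.
Metric = Callable[[RAGOutput], float]


def answer_support(output: RAGOutput) -> float:
    """Toy faithfulness proxy: share of answer tokens found in the passages."""
    answer_tokens = output.answer.lower().split()
    passage_tokens = set(" ".join(output.passages).lower().split())
    if not answer_tokens:
        return 0.0
    return sum(t in passage_tokens for t in answer_tokens) / len(answer_tokens)


def query_coverage(output: RAGOutput) -> float:
    """Toy relevance proxy: share of query tokens found in the passages."""
    query_tokens = output.query.lower().split()
    passage_tokens = set(" ".join(output.passages).lower().split())
    if not query_tokens:
        return 0.0
    return sum(t in passage_tokens for t in query_tokens) / len(query_tokens)


def evaluate(outputs: List[RAGOutput], metrics: Dict[str, Metric]) -> List[Dict[str, float]]:
    """Apply every metric to every output and return one score row per query."""
    return [{name: fn(o) for name, fn in metrics.items()} for o in outputs]


outputs = [
    RAGOutput(
        query="capital of france",
        passages=["paris is the capital of france"],
        answer="the capital is paris",
    )
]
scores = evaluate(
    outputs,
    {"answer_support": answer_support, "query_coverage": query_coverage},
)
print(scores)
```

Keeping metrics as plain callables means an evaluator is little more than a named collection of them, so adding a metric does not change the reporting code.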

Quick Start & Requirements

  • Installation: pip install -e . (from source, recommended for samples) or pip install open-rag-eval.
  • Prerequisites: Python 3.9+, OpenAI API Key (environment variable OPENAI_API_KEY), Vectara account and API key (for Vectara connector).
  • Setup: Requires cloning the repo and configuring YAML files with API keys and corpus details for Vectara integration.
  • Links: Getting Started Guide, Open Evaluation Viewer
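
Before a first run, the prerequisites listed above can be checked with a short script. This is a minimal sketch: the helper name `missing_prerequisites` is hypothetical, and it covers only the Python version and `OPENAI_API_KEY`, because the listing does not say how the Vectara API key is supplied.

```python
import os
import sys
from typing import List, Mapping, Tuple


def missing_prerequisites(
    version: Tuple[int, int] = sys.version_info[:2],
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return a human-readable list of unmet prerequisites (empty if all are met)."""
    problems = []
    if version < (3, 9):
        problems.append(f"Python 3.9+ required, found {version[0]}.{version[1]}")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


for problem in missing_prerequisites():
    print("Missing:", problem)
```

Passing the version and environment as parameters keeps the check easy to exercise without touching the real environment.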

Highlighted Details

  • Implements TREC-RAG benchmark evaluation metrics.
  • Supports connectors for Vectara, LlamaIndex, and LangChain.
  • Offers detailed reporting with per-query scores and intermediate outputs.
  • Includes visualization utilities for comparing results locally or via a web viewer.
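
Per-query scores make it possible to see which queries changed between two runs, not just whether the average moved. The sketch below assumes a hypothetical report layout with `query_id` and `score` columns; the actual column names in open-rag-eval's CSV output may differ.

```python
import csv
import io
from typing import Dict

# Hypothetical per-query reports from two runs; real column names may differ.
RUN_A = """query_id,score
q1,0.80
q2,0.60
"""
RUN_B = """query_id,score
q1,0.90
q2,0.40
"""


def load_scores(csv_text: str) -> Dict[str, float]:
    """Parse a per-query report into a {query_id: score} mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["query_id"]: float(row["score"]) for row in reader}


def compare(a: Dict[str, float], b: Dict[str, float]) -> Dict[str, float]:
    """Score change (b minus a) for every query present in both runs."""
    shared = sorted(a.keys() & b.keys())
    return {qid: round(b[qid] - a[qid], 4) for qid in shared}


deltas = compare(load_scores(RUN_A), load_scores(RUN_B))
print(deltas)
```

In this made-up data the average drops even though `q1` improves, which is the kind of detail an aggregate score alone would hide.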

Maintenance & Community

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

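The ROUGE-L metric's accuracy may degrade for non-English languages: it scores the longest common subsequence of tokens between two texts, so without language-specific preprocessing (such as word segmentation) the overlap it measures becomes unreliable. Some metrics require an OpenAI API key.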
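
The ROUGE-L caveat can be illustrated with a small, self-contained implementation. This is a sketch of the F1 variant of ROUGE-L using plain whitespace tokenization, which is an assumption made for illustration; it is not taken from open-rag-eval, whose tokenization may differ.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 with whitespace tokenization (an assumption in this sketch)."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# English: five of six tokens line up in order, so the score is high.
english = rouge_l_f1("the cat sat on the mat", "the cat lay on the mat")

# Japanese is written without spaces, so each sentence becomes one token
# and two nearly identical sentences share no tokens at all.
japanese = rouge_l_f1("猫がマットの上に座った", "猫がマットの上に寝た")

print(english, japanese)
```

With a word segmenter applied first, the two Japanese sentences would share most of their tokens; the drop to zero here comes entirely from the missing preprocessing step.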

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 4
  • Issues (30d): 2
  • Star History: 21 stars in the last 30 days
