open-rag-eval by vectara

Open-source RAG evaluation toolkit

Created 1 year ago

333 stars

Top 82.5% on SourcePulse

Project Summary

This Python package provides an open-source toolkit for evaluating Retrieval-Augmented Generation (RAG) pipelines, targeting developers and researchers. It offers a flexible framework to measure RAG quality using metrics that do not require golden chunks or answers, enabling easier and more scalable evaluation.

How It Works

The toolkit uses techniques such as UMBRELA and AutoNuggetizer to evaluate without ground-truth data. RAG outputs pass through Evaluators (e.g., TRECEvaluator, ConsistencyEvaluator), which apply Metrics (e.g., HHEM Score, BERTScore, ROUGE-L) to assess aspects such as faithfulness and relevance. Results are reported as detailed CSVs and visualizations, and evaluations can optionally be chained.
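
The evaluator-and-metric flow described above can be sketched in a few lines. This is a toy illustration only, not the open-rag-eval API: the function names (passage_overlap, query_coverage, evaluate, to_csv), the record layout, and the two token-overlap scores are all assumptions standing in for real metrics such as HHEM Score or BERTScore. What it shares with the toolkit's approach is that each score is computed from the query, the retrieved passages, and the generated answer alone, with no golden answer.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical metric signature: (query, passages, answer) -> score in [0, 1].
Metric = Callable[[str, List[str], str], float]


def _tokens(text: str) -> List[str]:
    """Lowercase whitespace tokenization; deliberately simplistic."""
    return text.lower().split()


def passage_overlap(query: str, passages: List[str], answer: str) -> float:
    """Toy faithfulness proxy: share of answer tokens found in the passages."""
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return 0.0
    passage_vocab = {tok for p in passages for tok in _tokens(p)}
    return sum(tok in passage_vocab for tok in answer_tokens) / len(answer_tokens)


def query_coverage(query: str, passages: List[str], answer: str) -> float:
    """Toy relevance proxy: share of query tokens that appear in the answer."""
    query_tokens = _tokens(query)
    if not query_tokens:
        return 0.0
    answer_vocab = set(_tokens(answer))
    return sum(tok in answer_vocab for tok in query_tokens) / len(query_tokens)


def evaluate(results: List[dict], metrics: Dict[str, Metric]) -> List[dict]:
    """Apply every metric to every RAG output and return per-query rows."""
    rows = []
    for r in results:
        row = {"query": r["query"]}
        for name, metric in metrics.items():
            row[name] = round(metric(r["query"], r["passages"], r["answer"]), 3)
        rows.append(row)
    return rows


def to_csv(rows: List[dict]) -> str:
    """Serialize per-query rows as CSV text."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


sample = [{
    "query": "capital of france",
    "passages": ["paris is the capital of france"],
    "answer": "the capital is paris",
}]
metrics = {"faithfulness_proxy": passage_overlap, "relevance_proxy": query_coverage}
report = to_csv(evaluate(sample, metrics))
```

Keeping metrics as plain callables behind one signature is one way to let an evaluator mix and match scores; the real Evaluator and Metric classes may be organized differently.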

Quick Start & Requirements

Installation: pip install open-rag-eval, or pip install -e . from a clone of the repository (recommended for running the samples).
Prerequisites: Python 3.9+, OpenAI API Key (environment variable OPENAI_API_KEY), Vectara account and API key (for Vectara connector).
Setup: Requires cloning the repo and configuring YAML files with API keys and corpus details for Vectara integration.
Links: Getting Started Guide, Open Evaluation Viewer
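
Before running an evaluation, it can help to confirm the prerequisites listed above. The following is a minimal sketch, not part of the toolkit: check_prerequisites is a hypothetical helper that only inspects the Python version and the OPENAI_API_KEY environment variable named in the prerequisites. It does not validate the key or any Vectara credentials.

```python
import os
import sys


def check_prerequisites(env=None, version=None):
    """Return a list of problems; an empty list means the basics are in place.

    env and version are injectable so the check can be exercised without
    touching the real environment (hypothetical helper, for illustration).
    """
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)

    problems = []
    if version < (3, 9):
        problems.append(f"Python 3.9+ required, found {version[0]}.{version[1]}")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


# Report anything missing in the current environment.
for problem in check_prerequisites():
    print("missing:", problem)
```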

Highlighted Details

Implements TREC-RAG benchmark evaluation metrics.
Supports connectors for Vectara, LlamaIndex, and LangChain.
Offers detailed reporting with per-query scores and intermediate outputs.
Includes visualization utilities for comparing results locally or via a web viewer.

Maintenance & Community

Developed by Vectara.
Contributions, issues, and feature requests are welcome.
Links: GitHub Issues, Contributing Guide

Licensing & Compatibility

Licensed under Apache 2.0.
Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The ROUGE-L metric's accuracy may degrade for non-English languages because it matches the longest common subsequence of tokens and applies no language-specific preprocessing such as word segmentation. Some metrics require an OpenAI API key.
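
To make the language caveat concrete, here is a small from-scratch sketch of ROUGE-L as an F1 score over the longest common subsequence (LCS) of tokens. It is an illustration under stated assumptions, not the toolkit's implementation: the function names and the pluggable tokenize parameter are hypothetical, and the default tokenizer simply splits on whitespace.

```python
from typing import Callable, List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str,
               tokenize: Callable[[str], List[str]] = str.split) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    ref, cand = tokenize(reference), tokenize(candidate)
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# English: whitespace tokens align well (LCS of 5 out of 6 tokens).
english = rouge_l_f1("the cat sat on the mat", "the cat lay on the mat")

# Chinese has no spaces, so each sentence becomes one token and nothing matches.
chinese_whitespace = rouge_l_f1("猫坐在垫子上", "猫躺在垫子上")  # → 0.0

# Character-level tokenization recovers the overlap.
chinese_chars = rouge_l_f1("猫坐在垫子上", "猫躺在垫子上", tokenize=list)
```

The two Chinese sentences differ by a single character, yet whitespace tokenization scores them as completely different. Swapping in a language-appropriate tokenizer is one way to mitigate this.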

