ARES by stanford-futuredata

RAG evaluation framework

Created 2 years ago

684 stars

Top 49.7% on SourcePulse



Project Summary

ARES is an automated framework for evaluating Retrieval-Augmented Generation (RAG) systems, designed for researchers and developers. It automates the assessment of context relevance, answer faithfulness, and answer relevance by combining synthetic data generation with fine-tuned classifiers, significantly reducing the need for manual annotation.

How It Works

ARES combines synthetic data generation with Prediction-Powered Inference (PPI). Classifiers are fine-tuned on synthetically generated queries and answers and then used to score RAG outputs. PPI combines those classifier scores with a small human-annotated set to produce estimates with statistical confidence intervals, even when model responses vary. The framework is model-agnostic, so it can evaluate custom RAG pipelines.
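To make the PPI step concrete, here is a minimal sketch of the textbook prediction-powered estimate of a mean, such as the fraction of answers judged faithful. It is not taken from the ARES codebase: the function name ppi_mean_ci, its parameters, and the normal-approximation interval are assumptions made for illustration. The idea is to average the classifier's scores over the large unlabeled set, then correct that average by the classifier's mean error on the small human-labeled set.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def ppi_mean_ci(labeled_truth, labeled_preds, unlabeled_preds, alpha=0.05):
    """Prediction-powered point estimate and confidence interval for a mean.

    Hypothetical helper for illustration; not part of the ARES API.
    labeled_truth   -- human labels on the small validation set (0.0 or 1.0)
    labeled_preds   -- classifier scores on that same validation set
    unlabeled_preds -- classifier scores on the large unlabeled set
    """
    n, big_n = len(labeled_truth), len(unlabeled_preds)
    if n != len(labeled_preds):
        raise ValueError("labeled_truth and labeled_preds must align")
    if n < 2 or big_n < 2:
        raise ValueError("need at least two examples in each set")

    # Rectifier: how far the classifier is off, measured on human labels.
    errors = [y - f for y, f in zip(labeled_truth, labeled_preds)]

    # Classifier average on unlabeled data, corrected by the mean error.
    estimate = fmean(unlabeled_preds) + fmean(errors)

    # Both terms contribute sampling noise to the estimate.
    std_err = sqrt(variance(unlabeled_preds) / big_n + variance(errors) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate, (estimate - z * std_err, estimate + z * std_err)


estimate, (low, high) = ppi_mean_ci(
    labeled_truth=[1.0, 0.0, 1.0, 1.0],
    labeled_preds=[1.0, 0.0, 0.0, 1.0],
    unlabeled_preds=[1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0],
)
print(f"estimate={estimate:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

With so few examples the interval is very wide. In practice the unlabeled set would be much larger than the labeled one, which is where the approach gains its statistical efficiency.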

Quick Start & Requirements

Installation: pip install ares-ai
API Keys: Set OPENAI_API_KEY or TOGETHER_API_KEY environment variables.
Requirements: A human preference validation set (from about 50 to a few hundred examples), few-shot examples for scoring, and a larger set of unlabeled query-document-answer triples from the RAG system.
Datasets: Example datasets can be downloaded using wget commands provided in the README. The full NQ dataset (37.3 GB) can be fetched via ares.KILT_dataset("nq").
Documentation: https://github.com/stanford-futuredata/ARES#documentation
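As a small pre-flight check for the API-key step above, the following sketch reports which of the two environment variables is set before any evaluation is started. The helper find_api_key is hypothetical and not part of ARES; only the variable names OPENAI_API_KEY and TOGETHER_API_KEY come from the text.

```python
import os


def find_api_key(env=None):
    """Return the name of the first configured API-key variable, or None.

    Hypothetical helper for illustration; not part of the ARES API.
    An empty string is treated as "not set".
    """
    env = os.environ if env is None else env
    for name in ("OPENAI_API_KEY", "TOGETHER_API_KEY"):
        if env.get(name):
            return name
    return None


found = find_api_key()
if found is None:
    print("Set OPENAI_API_KEY or TOGETHER_API_KEY before running ARES.")
else:
    print(f"Using credentials from {found}.")
```

Passing a plain dictionary as env makes the check easy to exercise without touching the real environment.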

Highlighted Details

Supports local model execution via vLLM for enhanced privacy and offline capabilities.
Provides tools for synthetic query generation and classifier training.
Offers direct comparison of RAG configurations and evaluation against ground truth.
Includes example configurations for UES/IDP scoring, PPI evaluation, and classifier training.

Maintenance & Community

Developed by researchers from Stanford University.
Contact emails provided for questions: jonsaadfalcon@stanford.edu, manihani@stanford.edu.
Citation details provided for academic referencing.

Licensing & Compatibility

The README does not explicitly state a license. Hosting under a Stanford research group does not by itself imply any particular terms, so check the repository for a license file before reusing or redistributing the code.

Limitations & Caveats

The framework requires significant computational resources, including over 100 GB of disk space and powerful GPUs (A100 recommended). Smaller GPUs may encounter CUDA out-of-memory errors. Setup on cloud VMs requires manual installation of Conda, GCC, and NVIDIA drivers.
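Because the disk requirement is easy to overlook on a fresh VM, a quick check like the one below can be run before downloading datasets or model checkpoints. It is a sketch using only the Python standard library; the helper name has_enough_disk is hypothetical, and the 100 GB threshold is taken from the figure quoted above, treated here as a lower bound.

```python
import shutil

REQUIRED_GB = 100  # threshold assumed from the "over 100 GB" figure in the text


def has_enough_disk(path=".", required_gb=REQUIRED_GB):
    """Return (ok, free_gb) for the filesystem that contains `path`.

    Hypothetical helper for illustration; not part of the ARES API.
    """
    free_gb = shutil.disk_usage(path).free / 1e9  # bytes to decimal gigabytes
    return free_gb >= required_gb, free_gb


ok, free_gb = has_enough_disk(".")
status = "enough" if ok else "not enough"
print(f"{free_gb:.1f} GB free: {status} for a {REQUIRED_GB} GB requirement")
```

This covers disk space only. GPU memory is a separate constraint, and this check does not detect it.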

Health Check

Last Commit: 9 months ago
Responsiveness: Inactive

Star History: 7 stars in the last 30 days