CRAG by facebookresearch

Advancing RAG research with a comprehensive factual question answering benchmark

Created 1 year ago
254 stars

Top 99.1% on SourcePulse

Project Summary

Summary

CRAG (Comprehensive RAG Benchmark) is a rich, factual question-answering benchmark designed to advance research on Retrieval-Augmented Generation (RAG) systems. Aimed at researchers and engineers building RAG models, it provides a diverse dataset and simulated API environments for rigorously evaluating system performance across varied question complexity, entity popularity, and temporal dynamism, enabling more robust RAG development.

How It Works

The dataset spans five domains and eight question categories, covering entities from popular to long-tail and facts that change on timescales from years down to seconds. Information retrieval is simulated through mock APIs for web search and knowledge-graph queries. RAG responses are graded on a four-tier scale: 'perfect' (correct, no hallucination), 'acceptable' (useful, with minor errors), 'missing' (no information provided), and 'incorrect' (wrong or irrelevant). Automated evaluation combines rule-based matching with LLM assessment, scoring each response +1 if correct, 0 if missing, and -1 if incorrect.
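That +1/0/-1 scheme can be summarized in a few lines. Below is a minimal sketch of the aggregation; the helper names and the exact-match shortcut are illustrative assumptions, and the real logic (including the LLM judge) lives in local_evaluation.py.

```python
# Minimal sketch of CRAG-style scoring (hypothetical helper names; see
# local_evaluation.py for the actual rule-based + LLM evaluation).

def score_response(prediction: str, ground_truth: str) -> int:
    """Three-way score: +1 correct, 0 missing, -1 incorrect."""
    pred = prediction.strip().lower()
    if pred in {"i don't know", "i do not know"}:  # treated as "missing"
        return 0
    # Rule-based exact match; an LLM judge would assess everything else.
    return 1 if pred == ground_truth.strip().lower() else -1

def truthfulness_score(scores: list[int]) -> float:
    """Average per-question scores, so hallucinations (-1) drag a system
    below one that simply abstains (0)."""
    return sum(scores) / len(scores)
```

Under this scheme, a system that answers everything and frequently hallucinates can score lower than one that abstains when unsure, which is exactly the behavior the benchmark is designed to reward.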

Quick Start & Requirements

  • Installation: Execute pip install -r requirements.txt.
  • Prerequisites: Dependencies are listed in requirements.txt. Instructions for implementing and registering a custom model are in models/README.md and models/user_config.py, respectively; the provided example uses llama3-8b-instruct (see the sketch after this list).
  • Links: Dataset: docs/dataset.md. Mock APIs: mock_api. Evaluation: local_evaluation.py. Baselines: docs/baselines.md.
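Custom models plug into the evaluator through the class registered in models/user_config.py. The sketch below is purely illustrative: the method names, batch payload shape, and abstention behavior are assumptions, so consult models/README.md for the actual contract.

```python
# Hypothetical custom model for the CRAG harness. Method names and the
# batch layout are assumptions; models/README.md defines the real interface.

class AbstainBaselineModel:
    def get_batch_size(self) -> int:
        # Assumed hook telling the evaluator how many queries to send at once.
        return 4

    def batch_generate_answer(self, batch: dict) -> list[str]:
        """Return one answer string per query in the batch. The batch is
        assumed to carry queries plus retrieved web/KG search results."""
        answers = []
        for _query in batch.get("query", []):
            # A real system would retrieve, rerank, and prompt an LLM here
            # (the provided example uses llama3-8b-instruct).
            answers.append("i don't know")  # abstaining scores 0, never -1
        return answers
```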

Highlighted Details

  • Comprehensive benchmark for RAG systems, focusing on factual question answering.
  • Dataset is highly diverse: five domains, eight categories, varied entity popularity, and dynamic temporal aspects.
  • Includes mock APIs that simulate web-search and knowledge-graph environments (a usage sketch follows this list).
  • Robust auto-evaluation system combining rule-based matching and LLM assessment for response correctness.
  • Provides three baseline RAG models for testing and comparison.
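To give a flavor of how a pipeline might ground its answers, the sketch below queries the locally hosted mock API over HTTP. The host, port, route, and request body are all assumptions made for illustration; the mock_api directory documents the real endpoints.

```python
# Hypothetical call to the local mock knowledge-graph API. The URL, route,
# and payload are assumptions; see the mock_api directory for specifics.
import requests

MOCK_API = "http://localhost:8000"  # assumed default host/port

def kg_lookup(route: str, query: str) -> dict:
    """POST a query to one mock KG route and return the JSON payload."""
    resp = requests.post(f"{MOCK_API}/{route}", json={"query": query})
    resp.raise_for_status()
    return resp.json()

# A RAG system would fold results like this into its prompt, e.g.:
# facts = kg_lookup("movie/get_person_info", "Ang Lee")
```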

Maintenance & Community

This repository was migrated from meta-comprehensive-rag-benchmark-kdd-cup-2024. The README does not mention community channels (e.g., Discord, Slack), a roadmap, or notable contributors or sponsors.

Licensing & Compatibility

Licensed under Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). This license permits sharing and adaptation but explicitly prohibits commercial use.

Limitations & Caveats

The primary adoption blocker is the CC BY-NC 4.0 license, which forbids commercial use. The README does not document other technical limitations, alpha/beta status, known bugs, or platform constraints.

Health Check

  • Last Commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days
