CRAG by facebookresearch

Advancing RAG research with a comprehensive factual question answering benchmark

Created 1 year ago
254 stars

Top 99.1% on SourcePulse

Project Summary

Summary

CRAG (Comprehensive RAG Benchmark) is a rich, factual question-answering benchmark designed to advance research on Retrieval-Augmented Generation (RAG) systems. Aimed at researchers and engineers building RAG models, it provides a diverse dataset and simulated API environments for rigorously evaluating system performance across varied question complexity, entity popularity, and temporal dynamism, enabling more robust RAG development.

How It Works

The dataset spans five domains and eight question categories, covering entities from popular to long-tail and facts that change on timescales from years down to seconds. Information retrieval is simulated through mock APIs for web search and knowledge-graph queries. RAG responses are graded on a four-tier scale: 'perfect' (correct, no hallucination), 'acceptable' (useful, with minor errors), 'missing' (no information provided), and 'incorrect' (wrong or irrelevant). Automated evaluation combines rule-based matching with LLM assessment, scoring each response +1 if correct, 0 if missing, and -1 if incorrect.
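That +1/0/-1 scheme can be summarized in a few lines. Below is a minimal sketch of the aggregation; the helper names and the exact-match shortcut are illustrative assumptions, and the real logic (including the LLM judge) lives in local_evaluation.py.

```python
# Minimal sketch of CRAG-style scoring (hypothetical helper names; see
# local_evaluation.py for the actual rule-based + LLM evaluation).

def score_response(prediction: str, ground_truth: str) -> int:
    """Three-way score: +1 correct, 0 missing, -1 incorrect."""
    pred = prediction.strip().lower()
    if pred in {"i don't know", "i do not know"}:  # treated as "missing"
        return 0
    # Rule-based exact match; an LLM judge would assess everything else.
    return 1 if pred == ground_truth.strip().lower() else -1

def truthfulness_score(scores: list[int]) -> float:
    """Average per-question scores, so hallucinations (-1) drag a system
    below one that simply abstains (0)."""
    return sum(scores) / len(scores)
```

Under this scheme, a system that answers everything and frequently hallucinates can score lower than one that abstains when unsure, which is exactly the behavior the benchmark is designed to reward.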

Quick Start & Requirements

  • Installation: Execute pip install -r requirements.txt.
  • Prerequisites: Dependencies are listed in requirements.txt. Instructions for implementing and registering a custom model are in models/README.md and models/user_config.py, respectively; the provided example uses llama3-8b-instruct (see the sketch after this list).
  • Links: Dataset: docs/dataset.md. Mock APIs: mock_api. Evaluation: local_evaluation.py. Baselines: docs/baselines.md.
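Custom models plug into the evaluator through the class registered in models/user_config.py. The sketch below is purely illustrative: the method names, batch payload shape, and abstention behavior are assumptions, so consult models/README.md for the actual contract.

```python
# Hypothetical custom model for the CRAG harness. Method names and the
# batch layout are assumptions; models/README.md defines the real interface.

class AbstainBaselineModel:
    def get_batch_size(self) -> int:
        # Assumed hook telling the evaluator how many queries to send at once.
        return 4

    def batch_generate_answer(self, batch: dict) -> list[str]:
        """Return one answer string per query in the batch. The batch is
        assumed to carry queries plus retrieved web/KG search results."""
        answers = []
        for _query in batch.get("query", []):
            # A real system would retrieve, rerank, and prompt an LLM here
            # (the provided example uses llama3-8b-instruct).
            answers.append("i don't know")  # abstaining scores 0, never -1
        return answers
```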

Highlighted Details

  • Comprehensive benchmark for RAG systems, focusing on factual question answering.
  • Dataset is highly diverse: five domains, eight categories, varied entity popularity, and dynamic temporal aspects.
  • Includes mock APIs that simulate web-search and knowledge-graph environments (a usage sketch follows this list).
  • Robust auto-evaluation system combining rule-based matching and LLM assessment for response correctness.
  • Provides three baseline RAG models for testing and comparison.
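To give a flavor of how a pipeline might ground its answers, the sketch below queries the locally hosted mock API over HTTP. The host, port, route, and request body are all assumptions made for illustration; the mock_api directory documents the real endpoints.

```python
# Hypothetical call to the local mock knowledge-graph API. The URL, route,
# and payload are assumptions; see the mock_api directory for specifics.
import requests

MOCK_API = "http://localhost:8000"  # assumed default host/port

def kg_lookup(route: str, query: str) -> dict:
    """POST a query to one mock KG route and return the JSON payload."""
    resp = requests.post(f"{MOCK_API}/{route}", json={"query": query})
    resp.raise_for_status()
    return resp.json()

# A RAG system would fold results like this into its prompt, e.g.:
# facts = kg_lookup("movie/get_person_info", "Ang Lee")
```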

Maintenance & Community

This repository was migrated from meta-comprehensive-rag-benchmark-kdd-cup-2024. The README does not mention community channels (e.g., Discord, Slack), a roadmap, or notable contributors or sponsors.

Licensing & Compatibility

Licensed under Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). This license permits sharing and adaptation but explicitly prohibits commercial use.

Limitations & Caveats

The primary adoption blocker is the CC BY-NC 4.0 license, which forbids commercial use. The README does not document other technical limitations, alpha/beta status, known bugs, or platform constraints.

Health Check

  • Last Commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days
