facebookresearch
Advancing RAG research with a comprehensive factual question-answering benchmark
Summary
CRAG (Comprehensive RAG Benchmark) is a rich factual question-answering benchmark designed to advance research on Retrieval-Augmented Generation (RAG) systems. Aimed at researchers and engineers building RAG models, it provides a diverse dataset and simulated API environments for rigorously evaluating system performance across question complexity, entity popularity, and temporal dynamism, with the goal of making RAG systems more robust.
How It Works
The dataset spans five domains and eight question categories, covering entities from popular to long-tail and facts whose answers change on timescales from years down to seconds. Retrieval is simulated through mock APIs for web search and knowledge-graph lookups. Responses from a RAG system are graded on four tiers: perfect (correct, with no hallucinated content), acceptable (useful, but with minor errors), missing (provides no information), and incorrect (wrong or irrelevant). Automated evaluation combines rule-based matching with LLM-based assessment and scores each response +1 if correct, 0 if missing, and -1 if incorrect.
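The automated +1 / 0 / -1 scoring can be sketched as follows. This is a minimal illustration, not the benchmark's evaluation script: it assumes that an empty answer or the phrase "i don't know" counts as missing, that an exact match after lowercasing and whitespace normalization counts as the rule-based pass, and that any callable judge(prediction, gold) -> bool can stand in for the LLM assessor. The names grade, average_score, and ABSTAIN_PHRASE are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

# Per-response scores as described for CRAG's automated evaluation.
SCORES = {"correct": 1, "missing": 0, "incorrect": -1}

# Assumption: the abstention phrase that is treated as "missing".
ABSTAIN_PHRASE = "i don't know"

# Assumption: any callable (prediction, gold) -> bool stands in for the LLM assessor.
Judge = Callable[[str, str], bool]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace for rule-based matching."""
    return " ".join(text.lower().split())


def grade(prediction: str, gold: str, llm_judge: Optional[Judge] = None) -> str:
    """Label one response as 'correct', 'missing', or 'incorrect'."""
    pred = normalize(prediction)
    if pred in ("", ABSTAIN_PHRASE):
        return "missing"
    # Step 1: cheap rule-based exact match after normalization.
    if pred == normalize(gold):
        return "correct"
    # Step 2: fall back to the (hypothetical) LLM judge for non-exact answers.
    if llm_judge is not None and llm_judge(prediction, gold):
        return "correct"
    return "incorrect"


def average_score(pairs: Iterable[Tuple[str, str]], llm_judge: Optional[Judge] = None) -> float:
    """Mean of the +1/0/-1 scores over (prediction, gold) pairs."""
    scores = [SCORES[grade(p, g, llm_judge)] for p, g in pairs]
    if not scores:
        raise ValueError("no responses to score")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pairs = [("Paris", "paris"), ("I don't know", "1969"), ("Lyon", "Paris")]
    print(average_score(pairs))  # 0.0: one correct, one missing, one incorrect
```

Running the rule-based check first means the (expensive) LLM judge is only consulted for answers that are not already exact matches; a substring check such as lambda p, g: normalize(g) in normalize(p) is a crude offline substitute for it when experimenting.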
Quick Start & Requirements
Install the dependencies with: pip install -r requirements.txt. Instructions for implementing a custom model are in models/README.md, and the model is configured in models/user_config.py. The provided example uses llama3-8b-instruct.
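When implementing a custom model, one design choice follows directly from the scoring rule (+1 correct, 0 missing, -1 incorrect): answering with estimated probability p of being correct has expected score p - (1 - p) = 2p - 1, so answering only beats abstaining when p > 0.5. The sketch below illustrates that rule as a wrapper around a hypothetical generator that returns an answer together with a confidence in [0, 1]. The generator signature, the ABSTAIN string, and the function names are assumptions for illustration only; the actual model interface is documented in models/README.md and is not reproduced here.

```python
from typing import Callable, Tuple

# Assumption: the string returned when the model declines to answer.
ABSTAIN = "I don't know"

# Hypothetical base generator: query -> (answer, estimated probability it is correct).
Generator = Callable[[str], Tuple[str, float]]


def expected_answer_score(p_correct: float) -> float:
    """Expected score of answering under +1 correct / -1 incorrect, i.e. 2p - 1."""
    return p_correct * 1 + (1 - p_correct) * -1


def answer_or_abstain(generate: Generator, query: str, threshold: float = 0.5) -> str:
    """Return the generated answer only when its confidence exceeds the threshold."""
    answer, p_correct = generate(query)
    # Abstaining scores 0, so answering pays off only when 2p - 1 > 0, i.e. p > 0.5
    # (this break-even point assumes p_correct is calibrated).
    if p_correct > threshold:
        return answer
    return ABSTAIN


if __name__ == "__main__":
    confident = lambda q: ("1969", 0.8)
    unsure = lambda q: ("1970", 0.3)
    print(answer_or_abstain(confident, "When did Apollo 11 land?"))  # 1969
    print(answer_or_abstain(unsure, "When did Apollo 11 land?"))  # I don't know
```

Raising the threshold above 0.5 trades coverage for fewer -1 penalties, which can help when the confidence estimate is known to be overconfident.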
Maintenance & Community
The repository was migrated from meta-comprehensive-rag-benchmark-kdd-cup-2024. The README does not describe community channels (e.g., Discord or Slack), a roadmap, or notable contributors and sponsors.
Licensing & Compatibility
Licensed under Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). This license permits sharing and adaptation but explicitly prohibits commercial use.
Limitations & Caveats
The primary adoption blocker is the CC BY-NC 4.0 license, forbidding commercial applications. The README excerpt does not specify other technical limitations, alpha/beta status, known bugs, or platform constraints.