trustbit
Evaluating RAG system accuracy on enterprise data
Enterprise RAG Challenge
This repository hosts a challenge designed to rigorously test the accuracy and hallucination rates of Retrieval Augmented Generation (RAG) systems when processing enterprise data, specifically public annual reports. It targets developers, researchers, and power users building or evaluating LLM-driven assistants, offering a quantifiable benchmark for system performance in real-world information retrieval tasks. The primary benefit is objective comparison of RAG approaches against a standardized, verifiable dataset and question set.
How It Works
Participants deploy RAG systems that ingest a provided corpus of public annual reports and answer a predefined set of questions. Accuracy is paramount: the evaluation focuses on factual correctness and on minimizing hallucinations. To keep the comparison fair and verifiable, the challenge generates random seeds from public blockchain APIs and feeds them to a deterministic question generator, so every participant receives an identical question set.
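The seeded generation described above can be sketched as follows. This is a minimal illustration, not the repository's code: the company list, the question templates, the hex block-hash format, and the function names `seed_from_block_hash` and `generate_questions` are all assumptions made for the example.

```python
import random

# Hypothetical inputs: the real challenge has its own company list and
# question templates, and obtains its seed from a public blockchain API.
COMPANIES = ["Acme Corp", "Globex Inc", "Initech Ltd", "Umbrella PLC"]
TEMPLATES = [
    "What was the total revenue of {company} in the last fiscal year?",
    "How many employees did {company} report?",
    "Who is the CEO of {company}?",
]


def seed_from_block_hash(block_hash: str) -> int:
    """Turn a hex block hash (assumed format) into a 32-bit integer seed."""
    return int(block_hash, 16) % (2**32)


def generate_questions(seed: int, count: int) -> list[str]:
    """Deterministically build `count` questions from a seed."""
    # A private generator keeps the output independent of global random state.
    rng = random.Random(seed)
    questions = []
    for _ in range(count):
        template = rng.choice(TEMPLATES)
        company = rng.choice(COMPANIES)
        questions.append(template.format(company=company))
    return questions
```

Because the generator depends only on the seed, anyone who knows the published seed can regenerate the same list and check that no participant received a different set.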
Quick Start & Requirements
gen_seed.py, main.py). Hardware and software dependencies beyond a standard Python environment are not detailed, but a RAG system implementation will typically need additional components (e.g., vector databases, LLM APIs).
Highlighted Details
Maintenance & Community
No specific details regarding maintainers, community channels (e.g., Discord, Slack), or roadmaps are present in the provided README.
Licensing & Compatibility
The repository is released under the Apache 2.0 license. This permissive license allows use, modification, and distribution, including for commercial purposes and integration into closed-source projects, provided the attribution and license terms are followed.
Limitations & Caveats
The full 46GB dataset is not provided upfront, so participants may need to gather or manage large data volumes themselves. The challenge measures only accuracy and hallucination rates and excludes other important RAG performance metrics such as latency, cost, and throughput. Some questions may be intentionally nonsensical or refer to companies not present in the provided documents, specifically to test hallucination handling.
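To make the two measured quantities concrete, here is a rough scoring sketch. It is an assumption-laden illustration, not the challenge's actual scoring code: the `"N/A"` marker, the exact-match comparison, and the definition of a hallucination as any answer given to an unanswerable question are all hypothetical choices.

```python
from typing import Optional

NA = "N/A"  # assumed marker a system returns when it declines to answer


def score(
    answers: dict[str, str],
    expected: dict[str, Optional[str]],
) -> dict[str, float]:
    """Compute accuracy and hallucination rate over a question set.

    `expected[q]` is None when question `q` has no answer in the corpus
    (for example, it names a company that is not in the documents).
    """
    total = len(expected)
    if total == 0:
        return {"accuracy": 0.0, "hallucination_rate": 0.0}

    correct = 0
    hallucinated = 0
    for question, truth in expected.items():
        given = answers.get(question, NA)  # a missing answer counts as N/A
        if truth is None:
            if given == NA:
                correct += 1       # correctly declined
            else:
                hallucinated += 1  # invented an answer
        elif given == truth:
            correct += 1
    return {
        "accuracy": correct / total,
        "hallucination_rate": hallucinated / total,
    }
```

Under these assumptions, a system that always answers confidently is penalized on the unanswerable questions, while one that always declines scores zero on the answerable ones.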