RAGTruth by ParticleMedia

Hallucination corpus and evaluation tools for trustworthy RAG

Created 2 years ago
252 stars

Top 99.6% on SourcePulse

Project Summary

RAGTruth addresses hallucinations in Retrieval-Augmented Generation (RAG) systems. It offers a word-level hallucination corpus built from the responses of multiple LLMs across three RAG tasks: question answering (QA), data-to-text (Data2txt), and summarization (Summary). Researchers and engineers can use the dataset to train and evaluate hallucination detection for RAG models, supporting the development of more trustworthy AI.

How It Works

The project provides nearly 18,000 manually annotated responses generated by multiple LLMs under RAG conditions. Annotations are granular: each one identifies a specific hallucinated span, its type (e.g., Evident Baseless Info), and its intensity, along with flags such as implicit_true. This detailed labeling supports precise measurement and targeted mitigation of factual inaccuracies and unsupported claims in LLM outputs.
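To make the span-level annotation idea concrete, here is a minimal sketch of how one annotated response might be represented and measured. The `Span` class, its field names (`start`, `end`, `label_type`), the character-offset convention, and the sample sentence are all assumptions made for illustration; they are not taken from the RAGTruth README and should be checked against the actual data files.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Span:
    """One hallucination annotation (hypothetical layout, not the official schema)."""
    start: int       # assumed: inclusive character offset into the response
    end: int         # assumed: exclusive character offset into the response
    label_type: str  # e.g. "Evident Baseless Info"


def span_texts(response: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Return (label_type, annotated text) pairs for each span."""
    return [(s.label_type, response[s.start:s.end]) for s in spans]


def hallucinated_fraction(response: str, spans: List[Span]) -> float:
    """Fraction of response characters covered by at least one span."""
    if not response:
        return 0.0
    covered = [False] * len(response)
    for s in spans:
        # Clamp offsets so malformed spans cannot index out of range.
        for i in range(max(0, s.start), min(len(response), s.end)):
            covered[i] = True
    return sum(covered) / len(response)


# Synthetic example: the source passage (not shown) never mentions lunch.
response = "The cafe opens at 9 am and serves free lunch."
phrase = "serves free lunch"
start = response.find(phrase)
spans = [Span(start=start, end=start + len(phrase), label_type="Evident Baseless Info")]

for label, text in span_texts(response, spans):
    print(f"{label}: {text!r}")
print(f"hallucinated fraction: {hallucinated_fraction(response, spans):.2f}")
```

A coverage fraction like this is only one possible summary; a response-level yes/no flag or per-type counts may suit a given evaluation better.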

Quick Start & Requirements

Training and evaluation code was released in June 2024, and model weights are also available. The provided README excerpt does not detail installation commands, dependencies (e.g., Python version, CUDA), or setup resource estimates.

Highlighted Details

  • Corpus comprises ~18,000 responses from diverse LLMs across QA, Data-to-Text, and Summarization tasks.
  • Features meticulous, word-level annotations of hallucination spans, types, and intensity.
  • Includes detailed statistics on dataset composition and hallucination distribution by task and LLM.
  • Data format includes response.jsonl and source_info.jsonl with comprehensive fields for analysis.
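As a rough illustration of working with the two files named above, the sketch below reads JSON Lines data, joins responses to their source records, and counts flagged responses per task and model. Only the file names `response.jsonl` and `source_info.jsonl` come from the summary; the field names (`source_id`, `task_type`, `model`, `labels`) and the tiny synthetic records are assumptions for demonstration and may not match the real schema.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def hallucination_counts(responses, sources):
    """Map (task, model) -> (responses with >=1 span, total responses).

    Field names here are assumed, not confirmed by the README excerpt.
    """
    task_by_source = {s["source_id"]: s["task_type"] for s in sources}
    totals, flagged = Counter(), Counter()
    for r in responses:
        key = (task_by_source.get(r["source_id"], "unknown"), r["model"])
        totals[key] += 1
        if r.get("labels"):  # non-empty list of annotated spans
            flagged[key] += 1
    return {k: (flagged[k], totals[k]) for k in totals}


# Synthetic stand-ins for the real files, written to a temporary directory.
sample_sources = [
    {"source_id": "s1", "task_type": "QA"},
    {"source_id": "s2", "task_type": "Summary"},
]
sample_responses = [
    {"source_id": "s1", "model": "model-a", "labels": [{"start": 0, "end": 4}]},
    {"source_id": "s1", "model": "model-a", "labels": []},
    {"source_id": "s2", "model": "model-b", "labels": []},
]

with tempfile.TemporaryDirectory() as tmp:
    src_path = Path(tmp) / "source_info.jsonl"
    resp_path = Path(tmp) / "response.jsonl"
    src_path.write_text("\n".join(json.dumps(s) for s in sample_sources), encoding="utf-8")
    resp_path.write_text("\n".join(json.dumps(r) for r in sample_responses), encoding="utf-8")
    counts = hallucination_counts(read_jsonl(resp_path), read_jsonl(src_path))

for (task, model), (bad, total) in sorted(counts.items()):
    print(f"{task:8s} {model:8s} {bad}/{total} responses with hallucinations")
```

To try this on the actual corpus, point `read_jsonl` at the downloaded files and adjust the field names to whatever the dataset documentation specifies.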

Maintenance & Community

The README lists updates in January, February, and June 2024. No specific community channels (e.g., Discord, Slack) or detailed contributor information are present in the excerpt.

Licensing & Compatibility

The provided README excerpt does not specify a software license. This lack of clarity may impact commercial use or integration into closed-source projects.

Limitations & Caveats

The README does not detail any specific limitations, known bugs, or alpha status. The focus is on the dataset's utility for hallucination research.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Travis Fischer (Founder of Agentic), and 1 more.

HaluEval by RUCAIBox

0.3%
591
Benchmark dataset for LLM hallucination evaluation
Created 3 years ago
Updated 2 years ago