RefChecker by amazon-science

Framework for detecting LLM-generated fine-grained hallucinations

created 1 year ago
382 stars

Top 75.9% on sourcepulse

View on GitHub: https://github.com/amazon-science/RefChecker
Project Summary

RefChecker offers a standardized framework for detecting fine-grained hallucinations in Large Language Model (LLM) outputs. It breaks down LLM responses into knowledge triplets (subject, predicate, object) for precise fact-checking against provided or retrieved context. This approach is beneficial for researchers and developers evaluating LLM truthfulness across various settings like zero-context, RAG, and summarization.
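As an illustration of the triplet decomposition (the sentence and triplets below are invented for this sketch, not taken from RefChecker's benchmark):

```python
# A response is decomposed into knowledge triplets, each of which is
# fact-checked independently against the reference context.

response = "Marie Curie won the Nobel Prize in Physics in 1903."

# Each extracted claim is a (subject, predicate, object) triple.
triplets = [
    ("Marie Curie", "won", "the Nobel Prize in Physics"),
    ("Marie Curie's Nobel Prize in Physics", "was awarded in", "1903"),
]

# Triplet-level checking means a response can be judged partially correct:
# one triplet may be supported by the reference while another is not.
for subject, predicate, obj in triplets:
    print(f"{subject} | {predicate} | {obj}")
```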

How It Works

RefChecker employs a three-stage pipeline: a claim extractor ($E$), a hallucination checker ($C$), and aggregation rules ($\tau$). Claims are extracted as knowledge triplets, enabling a more granular analysis than sentence-level checks. The checker then verifies these triplets against references, classifying them as Entailment, Neutral, or Contradiction. Finally, aggregation rules combine these triplet-level judgments into an overall assessment of the response's factuality. This modular design allows for individual component customization and supports both LLM-based and NLI-based checkers.
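The pipeline can be sketched as follows; the extractor, checker, and aggregation rule here are trivial stand-ins (keyword matching instead of an LLM or NLI model), meant only to show how the three stages compose:

```python
Triplet = tuple[str, str, str]

def extract(response: str) -> list[Triplet]:
    """Claim extractor E (stub): RefChecker prompts an LLM to emit
    triplets; here they are hard-coded for illustration."""
    return [
        ("Paris", "is the capital of", "France"),
        ("Paris", "has a population of", "50 million"),
    ]

def check(triplet: Triplet, reference: str) -> str:
    """Hallucination checker C (stub): classify one triplet against the
    reference as Entailment, Neutral, or Contradiction."""
    subject, _, obj = triplet
    if obj in reference:
        return "Entailment"
    if subject in reference:
        return "Contradiction"  # subject covered, but object unsupported
    return "Neutral"            # reference says nothing about the claim

def aggregate(labels: list[str]) -> str:
    """Aggregation rule tau (strict variant): any contradicted triplet
    flags the whole response; any unverifiable one makes it Neutral."""
    if "Contradiction" in labels:
        return "Contradiction"
    return "Neutral" if "Neutral" in labels else "Entailment"

reference = "Paris is the capital of France. About 2 million people live there."
labels = [check(t, reference) for t in extract("<model response>")]
print(labels)             # ['Entailment', 'Contradiction']
print(aggregate(labels))  # Contradiction
```

Because each stage is a plain function boundary, any one of them can be swapped out (e.g., an NLI-based checker in place of an LLM-based one) without changing the others.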

Quick Start & Requirements

  • Install via pip: pip install refchecker
  • Download spaCy model: python -m spacy download en_core_web_sm
  • Optional dependencies for open-source extractors or acceleration: pip install refchecker[open-extractor,repcex]
  • Supports various LLMs via litellm and open-source models via vllm.
  • API keys are required for services like OpenAI and Amazon Bedrock.
  • Documentation: https://github.com/amazon-science/RefChecker#readme
  • Demo Website: Available via setup instructions.

Highlighted Details

  • Fine-grained hallucination detection by decomposing responses into knowledge triplets.
  • Supports three context settings: Zero Context (with optional retrieval), Noisy Context (RAG), and Accurate Context (Summarization, QA).
  • Includes a benchmark dataset with 2.1k human-annotated LLM responses across 7 popular LLMs.
  • Offers a modular pipeline with customizable extractors, checkers (LLM-based, NLI-based), and aggregation rules.

Maintenance & Community

  • Project paper available on arXiv: https://arxiv.org/pdf/2405.14486
  • Active development with recent updates including joint claim checking and broader LLM support.
  • Contributions are welcomed via pull requests.

Licensing & Compatibility

  • Licensed under the Apache-2.0 License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

Triplet extraction is a sampling approach and may miss complex semantics or contextual nuances. Evidence localization, mapping triplets back to source text, is challenging and may require multiple reasoning steps. The current benchmark primarily focuses on QA and summarization, with plans to expand task coverage.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 22 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Daniel Han (cofounder of Unsloth), and 1 more.

  • synthetic-data-kit by meta-llama: Synthetic data CLI tool for LLM fine-tuning (1k stars, top 1.6%; created 4 months ago, updated 1 week ago)