RefChecker by amazon-science

Framework for detecting LLM-generated fine-grained hallucinations

created 1 year ago
382 stars

Top 75.9% on sourcepulse

View on GitHub: https://github.com/amazon-science/RefChecker
Project Summary

RefChecker offers a standardized framework for detecting fine-grained hallucinations in Large Language Model (LLM) outputs. It breaks down LLM responses into knowledge triplets (subject, predicate, object) for precise fact-checking against provided or retrieved context. This approach is beneficial for researchers and developers evaluating LLM truthfulness across various settings like zero-context, RAG, and summarization.
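As an illustration of the triplet decomposition (the sentence and triplets below are invented for this sketch, not taken from RefChecker's benchmark):

```python
# A response is decomposed into knowledge triplets, each of which is
# fact-checked independently against the reference context.

response = "Marie Curie won the Nobel Prize in Physics in 1903."

# Each extracted claim is a (subject, predicate, object) triple.
triplets = [
    ("Marie Curie", "won", "the Nobel Prize in Physics"),
    ("Marie Curie's Nobel Prize in Physics", "was awarded in", "1903"),
]

# Triplet-level checking means a response can be judged partially correct:
# one triplet may be supported by the reference while another is not.
for subject, predicate, obj in triplets:
    print(f"{subject} | {predicate} | {obj}")
```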

How It Works

RefChecker employs a three-stage pipeline: a claim extractor ($E$), a hallucination checker ($C$), and aggregation rules ($\tau$). Claims are extracted as knowledge triplets, enabling a more granular analysis than sentence-level checks. The checker then verifies these triplets against references, classifying them as Entailment, Neutral, or Contradiction. Finally, aggregation rules combine these triplet-level judgments into an overall assessment of the response's factuality. This modular design allows for individual component customization and supports both LLM-based and NLI-based checkers.
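The pipeline can be sketched as follows; the extractor, checker, and aggregation rule here are trivial stand-ins (keyword matching instead of an LLM or NLI model), meant only to show how the three stages compose:

```python
Triplet = tuple[str, str, str]

def extract(response: str) -> list[Triplet]:
    """Claim extractor E (stub): RefChecker prompts an LLM to emit
    triplets; here they are hard-coded for illustration."""
    return [
        ("Paris", "is the capital of", "France"),
        ("Paris", "has a population of", "50 million"),
    ]

def check(triplet: Triplet, reference: str) -> str:
    """Hallucination checker C (stub): classify one triplet against the
    reference as Entailment, Neutral, or Contradiction."""
    subject, _, obj = triplet
    if obj in reference:
        return "Entailment"
    if subject in reference:
        return "Contradiction"  # subject covered, but object unsupported
    return "Neutral"            # reference says nothing about the claim

def aggregate(labels: list[str]) -> str:
    """Aggregation rule tau (strict variant): any contradicted triplet
    flags the whole response; any unverifiable one makes it Neutral."""
    if "Contradiction" in labels:
        return "Contradiction"
    return "Neutral" if "Neutral" in labels else "Entailment"

reference = "Paris is the capital of France. About 2 million people live there."
labels = [check(t, reference) for t in extract("<model response>")]
print(labels)             # ['Entailment', 'Contradiction']
print(aggregate(labels))  # Contradiction
```

Because each stage is a plain function boundary, any one of them can be swapped out (e.g., an NLI-based checker in place of an LLM-based one) without changing the others.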

Quick Start & Requirements

  • Install via pip: pip install refchecker
  • Download spaCy model: python -m spacy download en_core_web_sm
  • Optional dependencies for open-source extractors or acceleration: pip install refchecker[open-extractor,repcex]
  • Supports various LLMs via litellm and open-source models via vllm.
  • API keys are required for services like OpenAI and Amazon Bedrock.
  • Documentation: https://github.com/amazon-science/RefChecker#readme
  • Demo Website: Available via setup instructions.

Highlighted Details

  • Fine-grained hallucination detection by decomposing responses into knowledge triplets.
  • Supports three context settings: Zero Context (with optional retrieval), Noisy Context (RAG), and Accurate Context (Summarization, QA).
  • Includes a benchmark dataset with 2.1k human-annotated LLM responses across 7 popular LLMs.
  • Offers a modular pipeline with customizable extractors, checkers (LLM-based, NLI-based), and aggregation rules.

Maintenance & Community

  • Project paper available on arXiv: https://arxiv.org/pdf/2405.14486
  • Active development with recent updates including joint claim checking and broader LLM support.
  • Contributions are welcomed via pull requests.

Licensing & Compatibility

  • Licensed under the Apache-2.0 License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

Triplet extraction is a sampling approach and may miss complex semantics or contextual nuances. Evidence localization, mapping triplets back to source text, is challenging and may require multiple reasoning steps. The current benchmark primarily focuses on QA and summarization, with plans to expand task coverage.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 22 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Daniel Han (cofounder of Unsloth), and 1 more.

  • synthetic-data-kit by meta-llama: Synthetic data CLI tool for LLM fine-tuning (1k stars, top 1.6%; created 4 months ago, updated 1 week ago)