A framework for detecting fine-grained hallucinations in LLM outputs
RefChecker offers a standardized framework for detecting fine-grained hallucinations in Large Language Model (LLM) outputs. It decomposes LLM responses into knowledge triplets (subject, predicate, object) for precise fact-checking against provided or retrieved context. This makes it useful for researchers and developers evaluating LLM truthfulness in settings such as zero-context QA, retrieval-augmented generation (RAG), and summarization.
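To make the triplet formulation concrete, here is a hand-written illustration; the sentence and its decomposition are invented for this example, not actual RefChecker output:

```python
# A response and a hand-written decomposition into knowledge triplets
# (illustrative only -- not produced by RefChecker's extractor).
response = "Marie Curie won the Nobel Prize in Physics in 1903."

claims = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "won Nobel Prize in Physics in", "1903"),
]

# Each (subject, predicate, object) triplet is checked against the reference
# independently, so a single wrong fact is localized to one triplet instead
# of flagging the whole sentence.
```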
How It Works
RefChecker employs a three-stage pipeline: a claim extractor ($E$), a hallucination checker ($C$), and aggregation rules ($\tau$). Claims are extracted as knowledge triplets, enabling more granular analysis than sentence-level checks. The checker then verifies each triplet against the references, classifying it as Entailment, Neutral, or Contradiction. Finally, the aggregation rules combine these triplet-level judgments into an overall assessment of the response's factuality. The modular design lets each component be customized independently and supports both LLM-based and NLI-based checkers.
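As a sketch of how aggregation rules $\tau$ might combine triplet-level labels, here are two plausible strategies over the three checker labels. The rule names and exact semantics below are illustrative assumptions, not RefChecker's shipped implementations:

```python
from typing import Literal

Label = Literal["Entailment", "Neutral", "Contradiction"]

def aggregate_strict(labels: list[Label]) -> Label:
    """A response is factual only if every triplet is entailed;
    one contradicted triplet makes the whole response contradictory."""
    if any(label == "Contradiction" for label in labels):
        return "Contradiction"
    if all(label == "Entailment" for label in labels):
        return "Entailment"
    return "Neutral"

def aggregate_majority(labels: list[Label]) -> Label:
    """The response-level label is the most common triplet-level label."""
    return max(set(labels), key=labels.count)
```

A strict rule suits high-stakes settings where any contradiction should fail the response; a majority rule gives a softer overall signal.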
Quick Start & Requirements
```bash
pip install refchecker
python -m spacy download en_core_web_sm
pip install refchecker[open-extractor,repcex]  # optional extras
```
RefChecker supports API-based models via litellm and open-source models via vllm.
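A minimal end-to-end sketch of the Python API follows. The `LLMExtractor`/`LLMChecker` class names, their constructor arguments, and the `extract`/`check` signatures are assumptions based on the project README and may differ across versions; verify against the release you install.

```python
# Assumed API: LLMExtractor / LLMChecker and their signatures may vary
# between refchecker versions -- treat this as a sketch, not a spec.
from refchecker import LLMExtractor, LLMChecker

response = "The Eiffel Tower was completed in 1889 and is 330 metres tall."
reference = "The Eiffel Tower opened in 1889. It stands 330 metres (1,083 ft) tall."

# Any litellm-style model identifier should work here (assumption).
extractor = LLMExtractor(model="gpt-4o", batch_size=8)
checker = LLMChecker(model="gpt-4o", batch_size=8)

# Stage 1: decompose each response into knowledge triplets.
extraction_results = extractor.extract(batch_responses=[response])
claims = [[claim.content for claim in result.claims] for result in extraction_results]

# Stage 2: classify each triplet against the reference.
labels = checker.check(batch_claims=claims, batch_references=[reference])
print(labels)  # e.g. [["Entailment", "Entailment"]]
```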
Limitations & Caveats
Triplet extraction samples from an LLM and may miss complex semantics or contextual nuance. Evidence localization, mapping triplets back to source text, is challenging and may require multiple reasoning steps. The current benchmark focuses primarily on QA and summarization; broader task coverage is planned.