Hallucination detection research paper for generative LLMs using black-box methods
SelfCheckGPT provides zero-resource, black-box hallucination detection for generative LLMs. It's designed for researchers and developers evaluating LLM outputs, offering sentence-level consistency scores without needing access to the LLM's internal workings or training data.
How It Works
The library implements several variants of the self-check approach: BERTScore, Question-Answering (MQAG), n-gram, NLI, and LLM-Prompting. These methods compare a generated passage against multiple sampled variations of the same passage. For instance, BERTScore measures semantic similarity, MQAG generates and answers questions about the text, n-gram checks for distributional shifts, NLI assesses entailment/contradiction between sentences and samples, and LLM-Prompting uses another LLM to judge consistency. This ensemble of techniques allows for robust hallucination detection by leveraging different linguistic and semantic signals.
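The core idea behind all variants can be illustrated with a toy example (this is not the library's code): a sentence that is supported by independently sampled passages scores low, while an unsupported sentence scores high. Here simple unigram overlap stands in for the real signals (BERTScore, QA, n-gram LMs, NLI, or LLM prompting); the function name and scoring scheme are illustrative only.

```python
import re

def inconsistency_score(sentence: str, sampled_passages: list[str]) -> float:
    """Toy sketch of the self-check idea: return a score in [0, 1],
    where higher means the sentence is less supported by the samples.
    Uses unigram overlap as a stand-in for SelfCheckGPT's real signals."""
    words = set(re.findall(r"\w+", sentence.lower()))
    if not words or not sampled_passages:
        return 0.0
    overlaps = []
    for passage in sampled_passages:
        passage_words = set(re.findall(r"\w+", passage.lower()))
        # Fraction of the sentence's words supported by this sample.
        overlaps.append(len(words & passage_words) / len(words))
    # Average support across samples, inverted into an inconsistency score.
    return 1.0 - sum(overlaps) / len(overlaps)

samples = [
    "Paris is the capital of France.",
    "The capital city of France is Paris.",
]
print(inconsistency_score("Paris is the capital of France.", samples))   # 0.0
print(round(inconsistency_score("Paris was founded on the moon.", samples), 2))  # 0.67
```

In the library itself, each variant replaces the overlap measure with a stronger consistency signal but keeps the same shape: score each sentence of the main passage against N sampled passages.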
Quick Start & Requirements
Install with pip install selfcheckgpt. The library requires torch and spacy. Download a spaCy model (e.g., python -m spacy download en_core_web_sm).
Highlighted Details
gpt-3.5-turbo achieved the highest performance (AUC-PR 93.42 for NonFact) on the wiki_bio_gpt3_hallucination dataset. The wiki_bio_gpt3_hallucination dataset is available via Hugging Face Datasets or direct download.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats