selfcheckgpt by potsawee

Black-box, zero-resource hallucination detection for generative LLMs (research code accompanying the SelfCheckGPT paper)

Created 2 years ago
571 stars

Top 56.4% on SourcePulse

Project Summary

SelfCheckGPT provides zero-resource, black-box hallucination detection for generative LLMs. It's designed for researchers and developers evaluating LLM outputs, offering sentence-level consistency scores without access to the model's internal states or token probabilities (black-box) and without an external knowledge base (zero-resource).

How It Works

The library implements several variants of the self-check approach: BERTScore, Question-Answering (MQAG), n-gram, NLI, and LLM-Prompting. All variants share one premise: if the LLM actually knows a fact, independently sampled responses to the same prompt will agree on it, whereas hallucinated content tends to diverge across samples. Each method therefore compares a generated passage against multiple sampled variations of that passage: BERTScore measures semantic similarity, MQAG generates and answers questions about the text, n-gram checks for distributional shifts, NLI assesses entailment/contradiction between sentences and samples, and LLM-Prompting asks another LLM to judge consistency. Because the methods exploit different linguistic and semantic signals, they can also be combined for more robust detection.
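The NLI variant illustrates the pattern. The sketch below follows the usage shown in the repo's demo notebook; treat the exact class name and predict() signature as assumptions to verify against the current code:

```python
import torch
from selfcheckgpt.modeling_selfcheck import SelfCheckNLI

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
selfcheck_nli = SelfCheckNLI(device=device)

# Sentences of the passage under scrutiny...
sentences = [
    "Albert Einstein was born in Ulm, Germany.",
    "He won the Nobel Prize in Physics in 1925.",
]
# ...and stochastic re-samples of the same prompt from the same LLM.
samples = [
    "Einstein was born in Ulm. He received the 1921 Nobel Prize in Physics.",
    "Born in Ulm, Einstein was awarded the Nobel Prize in Physics in 1921.",
    "Einstein, born in Ulm, won the Nobel Prize in Physics in 1921.",
]

# One score per sentence in [0, 1]; higher means the samples tend to
# contradict the sentence, i.e. it is more likely hallucinated.
scores = selfcheck_nli.predict(sentences=sentences, sampled_passages=samples)
print(scores)  # the 1925 claim should score higher than the birthplace claim
```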

Quick Start & Requirements

  • Install via pip: pip install selfcheckgpt
  • Requires torch and spacy. Download a spaCy model (e.g., python -m spacy download en_core_web_sm).
  • GPU with CUDA is recommended for performance.
  • See the demo notebook for detailed usage examples: demo/SelfCheck_demo1.ipynb. A minimal end-to-end sketch follows below.
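A minimal sketch of that pipeline, assuming the SelfCheckNgram interface shown in the README (spaCy handles sentence segmentation, then each sentence is scored against re-sampled passages); the unigram variant runs on CPU, which makes it a convenient smoke test:

```python
import spacy
from selfcheckgpt.modeling_selfcheck import SelfCheckNgram

nlp = spacy.load("en_core_web_sm")

# The passage to check, plus stochastic re-samples of the same prompt.
passage = "Albert Einstein was born in Ulm. He won the Nobel Prize in 1925."
samples = [
    "Einstein was born in Ulm and won the 1921 Nobel Prize in Physics.",
    "Born in Ulm, Einstein received the Nobel Prize in Physics in 1921.",
]

# Split the passage into sentences with spaCy.
sentences = [sent.text.strip() for sent in nlp(passage).sents]

# n=1 -> unigram model; scores are average negative log-probabilities,
# so they are unbounded (see Limitations) and higher = more suspect.
selfcheck_ngram = SelfCheckNgram(n=1)
scores = selfcheck_ngram.predict(
    sentences=sentences,
    passage=passage,
    sampled_passages=samples,
)
print(scores)
```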

Highlighted Details

  • Offers five distinct hallucination detection methods: BERTScore, MQAG, N-gram, NLI, and LLM-Prompting.
  • SelfCheck-Prompt using gpt-3.5-turbo achieved the highest performance (AUC-PR 93.42 for NonFact) on the wiki_bio_gpt3_hallucination dataset.
  • Includes an implementation of MQAG (Multiple-choice Question Answering and Generation) from prior work.
  • Provides access to the wiki_bio_gpt3_hallucination dataset via Hugging Face Datasets or direct download (see the loading snippet below).
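For instance, via Hugging Face Datasets (the dataset id is assumed to live under the author's namespace; verify splits and fields on the dataset card):

```python
from datasets import load_dataset

# Dataset id assumed from the author's Hugging Face namespace.
dataset = load_dataset("potsawee/wiki_bio_gpt3_hallucination")
print(dataset)  # inspect available splits and fields
```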

Maintenance & Community

  • Paper accepted at EMNLP 2023.
  • The codebase received post-publication updates and additional analysis, but activity has since slowed (last commit roughly a year ago; see Health Check below).
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • The README does not state a license; check the repository for a LICENSE file before relying on it.
  • Suitability for commercial use depends on the unstated license and requires verification.

Limitations & Caveats

  • The N-gram method's scores are not bounded, unlike BERTScore and MQAG.
  • LLM-Prompting requires API keys for services like OpenAI or Groq, or a local setup for Hugging Face models, introducing external dependencies and potential costs (see the sketch after this list).
  • The effectiveness of NLI and LLM-Prompting methods relies on the quality of the underlying NLI model and the prompted LLM, respectively.
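To make the dependency trade-off concrete, here is a hedged sketch of the prompt-based variant with a local open-source judge model, following the class name and call pattern in the README; the judge model choice is illustrative and signatures should be verified against the current code:

```python
import torch
from selfcheckgpt.modeling_selfcheck import SelfCheckLLMPrompt

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A local Hugging Face judge avoids API keys and per-call costs, but a
# 7B-parameter judge needs a GPU with substantial memory.
judge = SelfCheckLLMPrompt("mistralai/Mistral-7B-Instruct-v0.2", device)

sentences = ["He won the Nobel Prize in 1925."]
samples = [
    "Einstein received the 1921 Nobel Prize in Physics.",
    "He was awarded the Nobel Prize in Physics in 1921.",
]

# The judge is prompted, per (sentence, sample) pair, to say whether the
# sample supports the sentence; answers are aggregated per sentence.
scores = judge.predict(sentences=sentences, sampled_passages=samples)
print(scores)
```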
Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Travis Fischer (founder of Agentic), and 1 more.

HaluEval by RUCAIBox

  • 520 stars
  • Benchmark dataset for LLM hallucination evaluation
  • Created 2 years ago; updated 1 year ago