selfcheckgpt by potsawee

Hallucination detection for generative LLMs using black-box methods (code for the SelfCheckGPT research paper)

created 2 years ago
548 stars

Top 59.1% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

SelfCheckGPT provides zero-resource, black-box hallucination detection for generative LLMs. It's designed for researchers and developers evaluating LLM outputs, offering sentence-level consistency scores without needing access to the LLM's internal workings or training data.

How It Works

The library implements several variants of the self-check approach: BERTScore, Question-Answering (MQAG), n-gram, NLI, and LLM-Prompting. All variants compare a generated passage against multiple stochastically sampled versions of the same passage: if the LLM "knows" a fact, its samples should agree; if it is hallucinating, they tend to diverge. Concretely, BERTScore measures semantic similarity between sentences and samples, MQAG generates multiple-choice questions and checks whether the samples yield the same answers, the n-gram variant scores a sentence's tokens by their probability under an n-gram language model trained on the samples, NLI assesses entailment/contradiction between sentences and samples, and LLM-Prompting asks another LLM to judge whether each sample supports each sentence. Combining these complementary linguistic and semantic signals makes the hallucination detection more robust.
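The sample-and-compare idea behind the n-gram variant can be sketched in a few lines of plain Python. This is a toy illustration only, not the library's implementation; the function name and smoothing scheme are made up for the example:

```python
import math
import re
from collections import Counter

def unigram_surprisal(sentence, sampled_passages, smoothing=1e-3):
    """Toy n-gram-style score: average negative log-probability of the
    sentence's tokens under a unigram model built from the sampled
    passages. Higher = less supported by the samples (more suspect)."""
    def tokenize(text):
        return re.findall(r"\w+", text.lower())

    # Build a unigram count table from all sampled passages.
    counts = Counter()
    for passage in sampled_passages:
        counts.update(tokenize(passage))
    total = sum(counts.values())
    vocab_size = len(counts)

    tokens = tokenize(sentence)
    score = 0.0
    for tok in tokens:
        # Additive smoothing so unseen tokens get a small non-zero probability.
        p = (counts[tok] + smoothing) / (total + smoothing * (vocab_size + 1))
        score += -math.log(p)
    return score / max(len(tokens), 1)

samples = [
    "Alan Turing was a British mathematician and computer scientist.",
    "Alan Turing was a mathematician who pioneered computer science.",
]
# A sentence the samples agree with scores low; an unsupported one scores high.
print(unigram_surprisal("Alan Turing was a mathematician.", samples))
print(unigram_surprisal("Alan Turing won the 1950 Nobel Prize.", samples))
```

The actual n-gram variant trains an n-gram model over the sampled passages and reports per-sentence average/max negative log-probability; the other variants swap the language model for BERTScore, question answering, NLI, or an LLM judge.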

Quick Start & Requirements

  • Install via pip: pip install selfcheckgpt
  • Requires torch and spacy. Download a spaCy model (e.g., python -m spacy download en_core_web_sm).
  • GPU with CUDA is recommended for performance.
  • See demo notebooks for detailed usage examples: demo/SelfCheck_demo1.ipynb

Highlighted Details

  • Offers five distinct hallucination detection methods: BERTScore, MQAG, N-gram, NLI, and LLM-Prompting.
  • SelfCheck-Prompt using gpt-3.5-turbo achieved the highest performance (AUC-PR 93.42 for NonFact) on the wiki_bio_gpt3_hallucination dataset.
  • Includes an implementation of MQAG (Multiple-choice Question Answering and Generation) from prior work.
  • Provides access to the wiki_bio_gpt3_hallucination dataset via Hugging Face Datasets or direct download.
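The SelfCheck-Prompt scoring highlighted above reduces to a simple aggregation once the judge LLM has answered Yes/No for each (sentence, sampled passage) pair. A minimal sketch of that final step, with the judge call itself left out (the function name is made up; the Yes→0.0 / No→1.0 / other→0.5 mapping follows the paper's description):

```python
def prompt_inconsistency_score(judgements):
    """Aggregate per-sample Yes/No judge answers for one sentence into a
    score in [0, 1]: 0.0 = every sample supports it, 1.0 = none do.
    "Yes" -> 0.0, "No" -> 1.0, anything else (e.g. "N/A") -> 0.5."""
    mapping = {"yes": 0.0, "no": 1.0}
    scores = [mapping.get(ans.strip().lower(), 0.5) for ans in judgements]
    return sum(scores) / len(scores)

# One of three sampled passages fails to support the sentence.
print(prompt_inconsistency_score(["Yes", "Yes", "No"]))  # → 0.3333333333333333
```

Sentence-level scores like this are what the AUC-PR numbers above are computed over: a higher score marks the sentence as more likely hallucinated.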

Maintenance & Community

  • Paper accepted at EMNLP 2023.
  • Codebase has received post-publication updates and additional analysis, though the last commit was about a year ago (see the health check below).
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • The README does not state a license; check the repository for a LICENSE file before relying on any particular terms.
  • Suitability for commercial use depends on the unstated license.

Limitations & Caveats

  • The n-gram method returns average negative log-probabilities, which are unbounded above, unlike the bounded scores of BERTScore and MQAG.
  • LLM-Prompting requires API keys for services like OpenAI or Groq, or local setup for HuggingFace models, introducing external dependencies and potential costs.
  • The effectiveness of NLI and LLM-Prompting methods relies on the quality of the underlying NLI model and the prompted LLM, respectively.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 2
Star History
32 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Travis Fischer (founder of Agentic).

HaluEval by RUCAIBox

Benchmark dataset for LLM hallucination evaluation

  • Top 0.2% on sourcepulse
  • 497 stars
  • created 2 years ago, updated 1 year ago