hallucination_probes by obalcells

Real-time hallucination detection for long-form text generation

Created 1 month ago
258 stars

Top 98.2% on SourcePulse

View on GitHub
Project Summary

This project provides a system for the real-time detection of hallucinated entities within long-form text generated by large language models. It is designed for researchers and developers seeking to improve the factual accuracy and trustworthiness of LLM outputs, offering a mechanism to identify and flag potentially fabricated information as it's generated.

How It Works

The core approach involves training "probe heads" that attach to existing LLM architectures. These probes analyze token-level probabilities during the generation process to identify entities that are likely hallucinations. The demo backend leverages vLLM for efficient inference, enabling these probes to compute confidence scores in real-time, which are then visualized by a Streamlit frontend. This allows for immediate feedback on the factual grounding of generated text.
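As a concrete illustration, a probe head of this kind can be a small learned layer that maps each generated token's representation to a hallucination confidence. The sketch below is a minimal, hypothetical PyTorch version; the class name, input shapes, and 0.5 threshold are assumptions for illustration, not details taken from the repo.

```python
# Minimal sketch of a token-level hallucination probe head (illustrative;
# not the repo's actual implementation). Assumes the probe reads per-token
# representations from a frozen LLM and emits one confidence per token.
import torch
import torch.nn as nn

class HallucinationProbe(nn.Module):
    """Maps each token's representation to a hallucination probability."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden_size)
        logits = self.head(token_states).squeeze(-1)  # (batch, seq_len)
        return torch.sigmoid(logits)                  # per-token confidence in [0, 1]

# Usage with stand-in activations (hidden_size=4096 is an assumption):
probe = HallucinationProbe(hidden_size=4096)
states = torch.randn(1, 12, 4096)     # placeholder for real LLM activations
scores = probe(states)                # (1, 12) hallucination confidences
flagged = (scores > 0.5).nonzero()    # token positions to highlight in the UI
```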

Quick Start & Requirements

  • Installation: Requires Python 3.10. Setup uses the uv package manager: install Python 3.10 (uv python install 3.10), create a virtual environment (uv venv --python 3.10), and sync dependencies (uv sync); the full command sequence is collected below this list. Environment variables must be configured by copying env.example to .env.
  • Prerequisites: Python 3.10, uv package manager, Anthropic API key, Hugging Face write token (for uploading datasets), and a Modal account for the demo UI.
  • Links: Paper: arxiv.org/abs/2509.03531, Project Website: hallucination-probes.com, Modal Signup: modal.com/signup.
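Collected for convenience, the setup steps above reduce to the following shell commands (env.example and .env are as named in the README summary; the final step is where you fill in your API keys):

```bash
# Setup sequence as described in the Quick Start bullets above.
uv python install 3.10     # install the pinned Python version via uv
uv venv --python 3.10      # create the virtual environment
uv sync                    # install dependencies
cp env.example .env        # copy the template, then fill in API keys
```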

Highlighted Details

  • Real-time, token-level hallucination detection integrated directly into the generation pipeline.
  • An annotation pipeline that utilizes frontier LLMs and web search for labeling entities and spans.
  • A comprehensive demo UI featuring a Modal backend (with vLLM) and a Streamlit frontend for interactive visualization.
  • Provides access to long-form datasets including LongFact, LongFact++, and HealthBench annotations via Hugging Face; a loading sketch follows this list.
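If these datasets are published on the Hugging Face Hub, loading one could look like the following sketch using the datasets library. The repo id is a placeholder assumption, not the actual dataset path; check the project README for the real identifiers.

```python
# Hypothetical sketch: load one of the annotated long-form datasets
# from the Hugging Face Hub. The repo id below is a PLACEHOLDER;
# consult the project README for the real dataset paths.
from datasets import load_dataset

ds = load_dataset("your-org/longfact-plus-plus-annotations", split="train")  # placeholder id
print(ds.column_names)  # inspect which fields hold text, entities, and labels
example = ds[0]         # one annotated long-form generation
```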

Maintenance & Community

The README lists authors for the associated paper but does not provide explicit links to community channels (e.g., Discord, Slack), a roadmap, or details on ongoing maintenance or sponsorships.

Licensing & Compatibility

No open-source license is specified in the provided README. The absence of an explicit license creates legal uncertainty for commercial use or integration into closed-source projects.

Limitations & Caveats

The system requires specific API keys for Anthropic and Hugging Face, and the demo functionality depends on setting up a Modal account. The codebase is tied to Python 3.10, and specific environment variables must be configured for training and annotation pipelines.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 34 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Travis Fischer (founder of Agentic), and 1 more.

HaluEval by RUCAIBox

Benchmark dataset for LLM hallucination evaluation

Created 2 years ago, updated 1 year ago
516 stars
Top 0.4% on SourcePulse