hallucination_probes by obalcells

Real-time hallucination detection for long-form text generation

Created 1 month ago
258 stars

Top 98.2% on SourcePulse

View on GitHub
Project Summary

This project provides a system for the real-time detection of hallucinated entities within long-form text generated by large language models. It is designed for researchers and developers seeking to improve the factual accuracy and trustworthiness of LLM outputs, offering a mechanism to identify and flag potentially fabricated information as it's generated.

How It Works

The core approach involves training "probe heads" that attach to existing LLM architectures. These probes analyze token-level probabilities during the generation process to identify entities that are likely hallucinations. The demo backend leverages vLLM for efficient inference, enabling these probes to compute confidence scores in real-time, which are then visualized by a Streamlit frontend. This allows for immediate feedback on the factual grounding of generated text.
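As a concrete illustration, a probe head of this kind can be a small learned layer that maps each generated token's representation to a hallucination confidence. The sketch below is a minimal, hypothetical PyTorch version; the class name, input shapes, and 0.5 threshold are assumptions for illustration, not details taken from the repo.

```python
# Minimal sketch of a token-level hallucination probe head (illustrative;
# not the repo's actual implementation). Assumes the probe reads per-token
# representations from a frozen LLM and emits one confidence per token.
import torch
import torch.nn as nn

class HallucinationProbe(nn.Module):
    """Maps each token's representation to a hallucination probability."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden_size)
        logits = self.head(token_states).squeeze(-1)  # (batch, seq_len)
        return torch.sigmoid(logits)                  # per-token confidence in [0, 1]

# Usage with stand-in activations (hidden_size=4096 is an assumption):
probe = HallucinationProbe(hidden_size=4096)
states = torch.randn(1, 12, 4096)     # placeholder for real LLM activations
scores = probe(states)                # (1, 12) hallucination confidences
flagged = (scores > 0.5).nonzero()    # token positions to highlight in the UI
```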

Quick Start & Requirements

  • Installation: Requires Python 3.10. Setup uses the uv package manager: install Python 3.10 (uv python install 3.10), create a virtual environment (uv venv --python 3.10), and sync dependencies (uv sync); the full command sequence is collected below this list. Environment variables must be configured by copying env.example to .env.
  • Prerequisites: Python 3.10, uv package manager, Anthropic API key, Hugging Face write token (for uploading datasets), and a Modal account for the demo UI.
  • Links: Paper: arxiv.org/abs/2509.03531, Project Website: hallucination-probes.com, Modal Signup: modal.com/signup.
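Collected for convenience, the setup steps above reduce to the following shell commands (env.example and .env are as named in the README summary; the final step is where you fill in your API keys):

```bash
# Setup sequence as described in the Quick Start bullets above.
uv python install 3.10     # install the pinned Python version via uv
uv venv --python 3.10      # create the virtual environment
uv sync                    # install dependencies
cp env.example .env        # copy the template, then fill in API keys
```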

Highlighted Details

  • Real-time, token-level hallucination detection integrated directly into the generation pipeline.
  • An annotation pipeline that utilizes frontier LLMs and web search for labeling entities and spans.
  • A comprehensive demo UI featuring a Modal backend (with vLLM) and a Streamlit frontend for interactive visualization.
  • Provides access to long-form datasets including LongFact, LongFact++, and HealthBench annotations via Hugging Face; a loading sketch follows this list.
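If these datasets are published on the Hugging Face Hub, loading one could look like the following sketch using the datasets library. The repo id is a placeholder assumption, not the actual dataset path; check the project README for the real identifiers.

```python
# Hypothetical sketch: load one of the annotated long-form datasets
# from the Hugging Face Hub. The repo id below is a PLACEHOLDER;
# consult the project README for the real dataset paths.
from datasets import load_dataset

ds = load_dataset("your-org/longfact-plus-plus-annotations", split="train")  # placeholder id
print(ds.column_names)  # inspect which fields hold text, entities, and labels
example = ds[0]         # one annotated long-form generation
```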

Maintenance & Community

The README lists authors for the associated paper but does not provide explicit links to community channels (e.g., Discord, Slack), a roadmap, or details on ongoing maintenance or sponsorships.

Licensing & Compatibility

No open-source license is specified in the provided README. The absence of an explicit license creates legal uncertainty for commercial use or integration into closed-source projects.

Limitations & Caveats

The system requires specific API keys for Anthropic and Hugging Face, and the demo functionality depends on setting up a Modal account. The codebase is tied to Python 3.10, and specific environment variables must be configured for training and annotation pipelines.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 34 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Travis Fischer (founder of Agentic), and 1 more.

HaluEval by RUCAIBox

Benchmark dataset for LLM hallucination evaluation

Created 2 years ago, updated 1 year ago
516 stars
Top 0.4% on SourcePulse