hallbayes by leochlon

LLM hallucination risk calculator and prompt re-engineering toolkit

Created 2 weeks ago

981 stars

Top 37.7% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

This toolkit addresses LLM hallucination risk by providing post-hoc calibration and prompt re-engineering for OpenAI models. It lets users quantify hallucination risk using the Expectation-level Decompression Law (EDFL) and make informed ANSWER/REFUSE decisions under target Service Level Agreements (SLAs), with transparent mathematical guarantees expressed in nats (natural-log units of information). The primary audience is engineers and researchers seeking to improve LLM reliability and safety without retraining.

How It Works

The core approach leverages the EDFL principle to bound hallucination risk. It generates "rolling priors" by creating ensembles of content-weakened prompts (skeletons) from the original prompt. The system calculates an information budget ($\bar{\Delta}$, the average information lift of the full prompt over the skeletons) and uses two priors: an average prior ($\bar{q}$) for the EDFL risk bound and a worst-case prior ($q_{\text{lo}}$) for SLA gating. A decision to ANSWER is made only if the information sufficiency ratio (ISR = $\bar{\Delta}$ / B2T, where B2T is the nats required to reach the target SLA from the worst-case prior) meets a threshold; this keeps the gate conservative while the reported risk bound stays realistic. It supports evidence-based (context erasure) and closed-book (semantic masking) prior generation methods.
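
A minimal numeric sketch of this gate is below. It assumes B2T is the binary KL divergence (in nats) between the target reliability $1 - h^*$ and the worst-case prior, and that the EDFL bound is obtained by inverting the same relation against the average prior; margins, clipping, and other details of the repository's actual implementation are omitted.

```python
import math

def binary_kl(p: float, q: float) -> float:
    """KL divergence, in nats, between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def bits_to_trust(q_lo: float, h_star: float) -> float:
    """Nats needed to lift the worst-case prior q_lo to the target
    reliability 1 - h_star (assumed form of B2T)."""
    return binary_kl(1.0 - h_star, q_lo)

def edfl_risk_bound(delta_bar: float, q_bar: float) -> float:
    """Smallest hallucination rate certifiable with budget delta_bar over
    the average prior q_bar, found by bisection on the same KL relation."""
    lo, hi = 0.0, 1.0 - q_bar
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if binary_kl(1.0 - mid, q_bar) > delta_bar:
            lo = mid   # budget too small to certify a risk this low
        else:
            hi = mid
    return hi

def decide(delta_bar: float, q_bar: float, q_lo: float,
           h_star: float = 0.05, isr_threshold: float = 1.0):
    """ANSWER only if the measured budget covers the nats the SLA requires."""
    b2t = bits_to_trust(q_lo, h_star)
    isr = delta_bar / b2t if b2t > 0 else float("inf")
    decision = "ANSWER" if isr >= isr_threshold else "REFUSE"
    return decision, isr, edfl_risk_bound(delta_bar, q_bar)

# 0.9 nats of lift over priors q_bar=0.25, q_lo=0.15, with a 5% SLA:
print(decide(0.9, q_bar=0.25, q_lo=0.15))  # -> ('REFUSE', ~0.56, risk bound ~0.12)
```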

Quick Start & Requirements

  • Primary install: pip install --upgrade openai
  • Prerequisites: an OpenAI API key (export OPENAI_API_KEY=sk-...) and openai>=1.0.0; the toolkit calls the OpenAI Chat Completions API (e.g., gpt-4o, gpt-4o-mini).
  • Links: Example scripts are provided within the repository for usage guidance; a minimal environment check is sketched below.
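
Before running the example scripts, a quick check that the Chat Completions dependency is wired up can look like the following. This is a generic openai>=1.0.0 snippet, not part of the hallbayes API itself.

```python
# Generic environment check: assumes OPENAI_API_KEY is exported.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Reply with the single word OK."}],
    max_tokens=5,
)
print(resp.choices[0].message.content)  # expected: "OK"
```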

Highlighted Details

  • Quantifies hallucination risk and provides explicit ANSWER/REFUSE decisions with transparent mathematical reasoning in "nats".
  • Supports two distinct modes for generating "rolling priors": evidence-based (erasing/permuting context) and closed-book (semantic masking of entities, numbers, and titles); a rough closed-book sketch follows this list.
  • Enables the generation of formal SLA certificates for auditability and compliance.
  • Documents why particular query types (e.g., arithmetic, factoids) behave differently under the gate, framing conservative abstention as a safety feature rather than a defect.
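
To make the closed-book mode concrete, the sketch below masks quoted titles, numbers, and capitalized spans to form a content-weakened skeleton, then estimates an answer-rate prior by sampling the model. The masking rules and helper names are illustrative assumptions; the repository's own skeleton generation is more careful.

```python
import re
from openai import OpenAI

client = OpenAI()

def mask_closed_book(prompt: str) -> str:
    """Crude semantic masking: hide quoted titles, numbers, and proper nouns."""
    s = re.sub(r'"[^"]+"', '"[TITLE]"', prompt)
    s = re.sub(r"\b\d[\d,.]*\b", "[NUM]", s)
    s = re.sub(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b", "[ENT]", s)
    return s

def answer_rate(prompt: str, n: int = 5, model: str = "gpt-4o-mini") -> float:
    """Estimate how often the model answers rather than abstaining."""
    answered = 0
    for _ in range(n):
        resp = client.chat.completions.create(
            model=model,
            temperature=1.0,
            messages=[{"role": "user",
                       "content": prompt + "\nIf you are unsure, reply only with IDK."}],
        )
        text = (resp.choices[0].message.content or "").strip()
        answered += 0 if text.upper().startswith("IDK") else 1
    return answered / n

question = "Who won the Nobel Prize in Literature in 2016?"
skeleton = mask_closed_book(question)
print(skeleton)               # e.g. "[ENT] won the [ENT] in [ENT] in [NUM]?"
print(answer_rate(skeleton))  # a rolling-prior estimate for this skeleton
```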

Maintenance & Community

Developed by Hassana Labs (https://hassana.io). The implementation follows the framework described in the paper “Compression Failure in LLMs: Bayesian in Expectation, Not in Realization” (NeurIPS 2024 preprint) and related EDFL/ISR/B2T methodology. No community channels (e.g., Discord, Slack) or explicit roadmap links are provided in the README.

Licensing & Compatibility

Licensed under the MIT License, a permissive license that allows commercial use, modification, and integration into closed-source projects, provided the copyright and license notice are retained.

Limitations & Caveats

The toolkit is strictly limited to OpenAI models and the Chat Completions API. Arithmetic queries may abstain unexpectedly: the skeletons still reveal the task pattern, so the measured information lift ($\bar{\Delta}$) stays low, and careful tuning or alternative event definitions (e.g., Correct/Incorrect) may be needed. Aggressive clipping or prior collapse ($q_{\text{lo}} \to 0$) can likewise push the gate toward over-refusal, necessitating adjustments to parameters like B_clip or the use of prior floors (see the sketch below). The bus factor is unclear, as development is attributed to a single entity.
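
As a rough illustration of those mitigations (the floor value and the B_clip semantics here are assumptions, not the repository's exact knobs): flooring $q_{\text{lo}}$ keeps B2T finite, and the clip ceiling controls how much each skeleton's lift can contribute to $\bar{\Delta}$.

```python
# Illustrative stabilization of the gate's inputs (hypothetical parameter values).
PRIOR_FLOOR = 0.02   # assumed floor: keep q_lo away from 0 so B2T stays finite
B_CLIP = 6.0         # assumed per-skeleton lift ceiling, in nats

def stabilized_inputs(per_skeleton_lifts, q_lo):
    q_lo = max(q_lo, PRIOR_FLOOR)                         # prior floor
    clipped = [min(max(d, 0.0), B_CLIP) for d in per_skeleton_lifts]
    delta_bar = sum(clipped) / len(clipped)               # information budget
    return delta_bar, q_lo

delta_bar, q_lo = stabilized_inputs([0.4, 7.3, 2.1], q_lo=0.0)
print(delta_bar, q_lo)  # lifts clipped at B_CLIP; collapsed prior floored at 0.02
```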

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 5

Star History

992 stars in the last 17 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Travis Fischer (founder of Agentic), and 1 more.

HaluEval by RUCAIBox

Top 0.8% on SourcePulse
510 stars
Benchmark dataset for LLM hallucination evaluation
Created 2 years ago
Updated 1 year ago