uqlm by cvs-health

Python package for LLM hallucination detection using uncertainty quantification

Created 5 months ago
1,036 stars

Top 36.2% on SourcePulse

Project Summary

UQLM is a Python library designed for detecting hallucinations in Large Language Models (LLMs) by employing uncertainty quantification (UQ) techniques. It offers a flexible framework for developers and researchers to assess the reliability of LLM outputs, providing confidence scores to identify potential errors or fabricated information.

How It Works

UQLM groups its UQ scorers into four types: Black-Box (consistency-based), White-Box (token-probability-based), LLM-as-a-Judge, and Ensemble. Black-box methods measure how consistent the responses are across multiple generations for the same prompt. They work with any LLM because they need only the generated text, but the extra generations add latency and cost. White-box methods use the token probabilities produced during generation, which makes them cheaper, but they require a model that exposes those probabilities. LLM-as-a-Judge methods ask one or more LLMs to evaluate a response, and the judge setup can be customized. Ensemble scorers combine several methods for more robust estimates and can be used off the shelf or with weights tuned on the user's own data.
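To make the black-box versus white-box distinction concrete, here is a minimal sketch in plain Python of one scorer from each family plus a weighted ensemble. It is not UQLM's API: the function names `exact_match_confidence`, `min_token_probability`, and `weighted_ensemble` are hypothetical, the whitespace/case normalization in the exact-match check is an assumption, and the sketch presumes the sampled responses and per-token log-probabilities have already been obtained from some LLM.

```python
import math
from typing import Sequence


def exact_match_confidence(original: str, sampled: Sequence[str]) -> float:
    """Black-box (consistency) sketch: fraction of sampled responses that
    match the original response after trimming whitespace and lowercasing
    (the normalization is an assumption, not UQLM's exact rule)."""
    if not sampled:
        raise ValueError("need at least one sampled response")
    target = original.strip().lower()
    matches = sum(1 for s in sampled if s.strip().lower() == target)
    return matches / len(sampled)


def min_token_probability(token_logprobs: Sequence[float]) -> float:
    """White-box sketch: confidence is the probability of the least likely
    token in the response, recovered from its log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(min(token_logprobs))


def weighted_ensemble(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ensemble sketch: weighted average of component confidence scores."""
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(s * w for s, w in zip(scores, weights)) / total


if __name__ == "__main__":
    # Hypothetical inputs: 4 sampled answers and 3 token log-probabilities.
    bb = exact_match_confidence("Paris", ["Paris", "paris", "Lyon", "Paris"])
    wb = min_token_probability([-0.05, -0.7, -0.01])
    print(bb)  # 0.75
    print(round(wb, 3))
    print(round(weighted_ensemble([bb, wb], [0.5, 0.5]), 3))
```

The black-box score needs several extra generations per prompt, while the white-box score only needs log-probabilities from the original generation, which is the cost trade-off described above.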

Quick Start & Requirements

Highlighted Details

  • Supports a wide range of UQ scorers, including semantic negentropy, exact match, BERTScore, minimum token probability, and several LLM-as-a-Judge approaches.
  • Offers an ensemble scorer that can be tuned on custom datasets for improved performance.
  • Integrates with LangChain, so models from any provider with a LangChain chat-model integration can be used.
  • Provides detailed documentation and example notebooks for each scorer category.
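Tuning an ensemble on a custom dataset amounts to choosing component weights that best separate correct from incorrect responses. The sketch below shows one simple way to do that: a grid search over non-negative weights summing to 1 that maximizes ROC AUC on labeled examples. This is an illustrative assumption, not UQLM's tuning procedure; `roc_auc` and `tune_weights` are hypothetical helpers, and the data are made up.

```python
from itertools import product
from typing import List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random correct (label 1) response scores higher
    than a random incorrect (label 0) one; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both correct and incorrect examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tune_weights(component_scores: Sequence[Sequence[float]],
                 labels: Sequence[int],
                 step: float = 0.1) -> Tuple[List[float], float]:
    """Grid-search weights (multiples of `step`, summing to 1) that maximize
    the AUROC of the weighted-average score. Cost grows exponentially with
    the number of scorers, so this only suits a handful of components."""
    if not component_scores:
        raise ValueError("need at least one example")
    k = len(component_scores[0])
    n_steps = round(1 / step)
    if n_steps < 1:
        raise ValueError("step must be at most 1")
    best_w: List[float] = []
    best_auc = -1.0
    for combo in product(range(n_steps + 1), repeat=k):
        if sum(combo) != n_steps:  # integer check keeps the sum exactly 1
            continue
        w = [c / n_steps for c in combo]
        ens = [sum(wi * si for wi, si in zip(w, row)) for row in component_scores]
        auc = roc_auc(ens, labels)
        if auc > best_auc:
            best_w, best_auc = w, auc
    return best_w, best_auc


if __name__ == "__main__":
    # Hypothetical rows: [exact-match score, min-token-probability score].
    rows = [[0.9, 0.2], [0.8, 0.3], [0.2, 0.9], [0.1, 0.4]]
    labels = [1, 1, 0, 0]  # 1 = response was judged correct
    weights, auc = tune_weights(rows, labels)
    print(weights, auc)
```

In this toy data the second scorer is anti-correlated with correctness, so the search shifts weight to the first one; on real data the tuned weights reflect which scorers are most informative for that task.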

Maintenance & Community

Licensing & Compatibility

  • License: Not explicitly stated in the README. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

  • The README does not specify the license, which is crucial for determining commercial usability.
  • Black-box methods incur significant computational cost and latency due to multiple LLM calls.
  • White-box methods are limited to LLMs that provide access to token probabilities.
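The cost gap behind these limitations is easy to estimate. The sketch below counts generation calls under a simplified model (an assumption): black-box scoring makes one original generation plus `num_responses` sampled generations per prompt, white-box scoring reuses the log-probabilities from the single original generation, and LLM-as-a-Judge adds one grading call per judge. The function name `estimate_llm_calls` and the default of 5 sampled responses are hypothetical, and any auxiliary models a scorer may use (for example, for semantic comparison) are not counted.

```python
def estimate_llm_calls(n_prompts: int, method: str,
                       num_responses: int = 5, num_judges: int = 1) -> int:
    """Rough count of LLM calls for one scoring run (simplified, hypothetical model)."""
    if n_prompts < 0 or num_responses < 0 or num_judges < 0:
        raise ValueError("counts must be non-negative")
    if method == "black_box":
        # One original response plus num_responses sampled responses per prompt.
        return n_prompts * (1 + num_responses)
    if method == "white_box":
        # Token probabilities come back with the original generation.
        return n_prompts
    if method == "judge":
        # Original generation plus one grading call per judge.
        return n_prompts * (1 + num_judges)
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    for m in ("black_box", "white_box", "judge"):
        print(m, estimate_llm_calls(1000, m))
    # black_box 6000
    # white_box 1000
    # judge 2000
```

Under this model, black-box scoring with five samples needs six times as many calls as white-box scoring, before accounting for the added latency.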

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 25
  • Issues (30d): 9
  • Star history: 122 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Travis Addair (Cofounder of Predibase), and 4 more.

alibi by SeldonIO

0.1%
3k
Python library for ML model inspection and interpretation
Created 6 years ago
Updated 15 hours ago