uqlm  by cvs-health

Python package for LLM hallucination detection using uncertainty quantification

created 3 months ago
822 stars

Top 44.1% on sourcepulse

Project Summary

UQLM is a Python library designed for detecting hallucinations in Large Language Models (LLMs) by employing uncertainty quantification (UQ) techniques. It offers a flexible framework for developers and researchers to assess the reliability of LLM outputs, providing confidence scores to identify potential errors or fabricated information.

How It Works

UQLM groups its UQ scorers into four types: Black-Box (consistency-based), White-Box (token-probability-based), LLM-as-a-Judge, and Ensemble. Black-box scorers measure how consistent the responses are across multiple generations for the same prompt; they work with any LLM but add latency and cost because each prompt requires several calls. White-box scorers use the token probabilities of a single generation, which is cheaper, but they only work with models that expose those probabilities. LLM-as-a-Judge scorers ask one or more other LLMs to evaluate a response and can be customized. Ensemble scorers combine several of these methods into one confidence score and can be used off the shelf or tuned.
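As a rough illustration of the first two scorer families, the sketch below computes a consistency score from sampled responses and a minimum token probability from per-token log-probabilities. It uses only the standard library and is not UQLM's API: the function names `exact_match_rate` and `min_token_probability`, the case-insensitive normalization, and the input shapes are all assumptions made for this example.

```python
import math


def exact_match_rate(original: str, samples: list[str]) -> float:
    """Black-box style score: fraction of sampled responses that match the original.

    Hypothetical helper for illustration; matching here is a simple
    case-insensitive comparison after trimming whitespace.
    """
    if not samples:
        raise ValueError("need at least one sampled response")
    normalized = original.strip().lower()
    matches = sum(1 for s in samples if s.strip().lower() == normalized)
    return matches / len(samples)


def min_token_probability(token_logprobs: list[float]) -> float:
    """White-box style score: probability of the least likely generated token.

    Hypothetical helper for illustration; assumes the model returned one
    natural-log probability per generated token.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(min(token_logprobs))


# Black-box: one original answer plus four extra generations for the same prompt.
consistency = exact_match_rate("Paris", ["paris", "Paris ", "Lyon", "Paris"])

# White-box: log-probabilities of three tokens from a single generation.
weakest = min_token_probability([math.log(0.9), math.log(0.5), math.log(0.8)])
```

In both cases a value closer to 1 suggests higher confidence. The black-box score needs several generations per prompt, while the white-box score needs only one generation with token probabilities attached.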

Quick Start & Requirements

Highlighted Details

  • Supports a wide range of UQ scorers including semantic negentropy, exact match, BERTScore, minimum token probability, and various LLM-as-a-Judge approaches.
  • Offers an ensemble scorer that can be tuned on custom datasets for improved performance.
  • Integrates with LangChain, allowing seamless use with various LLM providers.
  • Provides detailed documentation and example notebooks for each scorer category.
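
One simple way to picture an ensemble scorer and its tuning is a weighted average of component scores whose weights are chosen on labeled examples. The sketch below is an assumption-laden illustration, not UQLM's implementation: `ensemble_score`, `tune_two_scorer_weight`, the scorer names `"a"` and `"b"`, the 0.5 threshold, and the grid search are all hypothetical choices for this example.

```python
def ensemble_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of component confidence scores (hypothetical helper)."""
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * value for name, value in scores.items()) / total


def tune_two_scorer_weight(examples, steps: int = 10, threshold: float = 0.5):
    """Grid-search the weight w on scorer "a" (scorer "b" gets 1 - w).

    examples: list of (scores, is_correct) pairs, where scores is a dict
    with keys "a" and "b" and is_correct is a bool label.
    Returns the first weight that reaches the best accuracy, and that accuracy.
    """
    best_w, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        weights = {"a": w, "b": 1.0 - w}
        # A response is predicted correct when the ensemble score clears the threshold.
        hits = sum(
            (ensemble_score(scores, weights) >= threshold) == label
            for scores, label in examples
        )
        acc = hits / len(examples)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Made-up labeled data in which scorer "a" tracks correctness and "b" does not.
labeled = [
    ({"a": 0.9, "b": 0.1}, True),
    ({"a": 0.2, "b": 0.9}, False),
    ({"a": 0.8, "b": 0.3}, True),
    ({"a": 0.1, "b": 0.8}, False),
]
best_w, best_acc = tune_two_scorer_weight(labeled)
```

On this toy data the search shifts weight toward scorer `"a"`, which is the intuition behind tuning an ensemble on a custom dataset: scorers that track correctness on your data get more influence.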

Maintenance & Community

Licensing & Compatibility

  • License: Not explicitly stated in the README. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

  • The README does not specify the license, which is crucial for determining commercial usability.
  • Black-box methods incur significant computational cost and latency due to multiple LLM calls.
  • White-box methods are limited to LLMs that provide access to token probabilities.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 56
  • Issues (30d): 2
  • Star history: 841 stars in the last 90 days
