uqlm by cvs-health

Python package for LLM hallucination detection using uncertainty quantification

Created 8 months ago

1,096 stars

Top 34.8% on SourcePulse

Project Summary

UQLM is a Python library for detecting hallucinations in the outputs of Large Language Models (LLMs) using uncertainty quantification (UQ) techniques. It gives developers and researchers a flexible framework for assessing the reliability of LLM outputs, producing response-level confidence scores that flag likely errors or fabricated information.

How It Works

UQLM groups its UQ scorers into four types: Black-Box (consistency-based), White-Box (token-probability-based), LLM-as-a-Judge, and Ensemble. Black-box scorers measure how consistent the responses are across multiple generations for the same prompt; they work with any LLM but add latency and cost. White-box scorers use the token probabilities of a single generation, which makes them cheap, but they only work with LLMs that expose those probabilities. LLM-as-a-Judge scorers ask one or more other LLMs to evaluate a response, and can be customized. Ensemble scorers combine several of these methods for a more robust estimate, and can be used off the shelf or tuned.
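To make the black-box and white-box ideas concrete, here is a minimal standalone sketch. It does not use UQLM's API: the function names `exact_match_rate` and `min_token_probability` are hypothetical, and the inputs (a list of resampled responses, a list of per-token log-probabilities) are assumed to have been collected from the LLM beforehand.

```python
import math


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def exact_match_rate(original: str, samples: list[str]) -> float:
    """Black-box style score: the fraction of resampled responses that match
    the original response after normalization (1.0 = fully consistent)."""
    if not samples:
        raise ValueError("need at least one sampled response")
    target = _normalize(original)
    matches = sum(_normalize(s) == target for s in samples)
    return matches / len(samples)


def min_token_probability(token_logprobs: list[float]) -> float:
    """White-box style score: the probability of the least likely token
    in a single generation, computed from its log-probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(min(token_logprobs))


# Hypothetical data: one original answer plus four resampled answers.
consistency = exact_match_rate("Paris", ["paris", "Paris ", "Lyon", "PARIS"])

# Hypothetical per-token log-probabilities for a three-token response.
confidence = min_token_probability([math.log(0.9), math.log(0.5), math.log(0.8)])
```

The consistency score needs several extra generations per prompt, while the token-probability score needs only one generation but depends on the provider returning log-probabilities, which mirrors the trade-off described above.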

Quick Start & Requirements

Primary install: pip install uqlm
Prerequisites: A LangChain-compatible LLM.
Resources: Requires multiple LLM calls for Black-Box and LLM-as-a-Judge methods, potentially increasing cost and latency.
Demos: Black-Box UQ Demo, White-Box UQ Demo, LLM-as-a-Judge Demo, Ensemble Demos

Highlighted Details

Supports a wide range of UQ scorers including semantic negentropy, exact match, BERT-score, minimum token probability, and various LLM-as-a-Judge approaches.
Offers an ensemble scorer that can be tuned on custom datasets for improved performance.
Integrates with LangChain, allowing seamless use with various LLM providers.
Provides detailed documentation and example notebooks for each scorer category.
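One simple way to picture an ensemble scorer is as a weighted average of component confidence scores. The sketch below is an illustration under that assumption, not UQLM's implementation: `ensemble_score`, the scorer names, and the weights are all hypothetical, and every component score is assumed to lie in [0, 1].

```python
def ensemble_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine component confidence scores into one value in [0, 1]
    using a weighted average (weights are normalized by their sum)."""
    missing = set(scores) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(weights[name] * scores[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical component scores for a single response.
component_scores = {"exact_match": 0.75, "min_probability": 0.5, "judge": 1.0}

# Hypothetical weights; tuning an ensemble would amount to choosing these
# so that the combined score best separates correct from incorrect answers
# on a labeled dataset.
component_weights = {"exact_match": 1.0, "min_probability": 1.0, "judge": 2.0}

combined = ensemble_score(component_scores, component_weights)
```

With equal weights this reduces to a plain mean of the component scores; raising one weight lets a scorer that is more reliable on a given dataset dominate the combined score.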

Maintenance & Community

Associated research paper: Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers.
The project appears to be primarily driven by Dylan Bouchard and Mohit Singh Chauhan.

Licensing & Compatibility

License: Not explicitly stated in the README. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The README does not specify the license, which is crucial for determining commercial usability.
Black-box methods incur significant computational cost and latency due to multiple LLM calls.
White-box methods are limited to LLMs that provide access to token probabilities.

Health Check

Last Commit

2 days ago

Responsiveness

1 day

Pull Requests (30d)

35

Issues (30d)

4

Star History

14 stars in the last 30 days

Explore Similar Projects

LLM-Uncertainty-Bench by smartyfh

Benchmarking LLMs via uncertainty quantification

Created 2 years ago

Updated 1 year ago

Awesome-LLMs-as-Judges by CSHaitao

Survey paper for LLM-based evaluation methods

Created 1 year ago

Updated 5 months ago

JamesGPT by jconorgrogan

Jailbreak prompt for eliciting LLM biases and beliefs

Created 2 years ago

Updated 2 years ago

Awesome-LLM-Uncertainty-Reliability-Robustness by jxzhangjhu

Curated list of LLM uncertainty, reliability, and robustness resources

Created 2 years ago

Updated 7 months ago

semantic_uncertainty by jlko

Code for reproducing semantic uncertainty research paper experiments

Created 1 year ago

Updated 1 year ago

lm-polygraph by IINemo

Framework for uncertainty estimation in LLM text generation

Created 2 years ago

Updated 1 day ago

Starred by

Jeff Hammerbacher (Cofounder of Cloudera).

UQ360 by IBM

Open-source toolkit for uncertainty estimation/communication in ML models

Created 4 years ago

Updated 3 months ago

PandaLM by WeOpenML

LLM evaluation benchmark for reproducible, automated assessment

Created 2 years ago

Updated 1 year ago

Starred by

Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA) and Jeff Hammerbacher (Cofounder of Cloudera).

selfcheckgpt by potsawee

Hallucination detection research paper for generative LLMs using black-box methods

Created 2 years ago

Updated 1 year ago

Starred by

Elvis Saravia (Founder of DAIR.AI).

awesome-llm-interpretability by JShollaj

LLM interpretability resources

Created 2 years ago

Updated 6 months ago

Starred by

Jeff Hammerbacher (Cofounder of Cloudera).

pythea by leochlon

LLM hallucination risk calculator and prompt re-engineering toolkit

Created 4 months ago

Updated 1 day ago

Starred by

Yaowei Zheng (Author of LLaMA-Factory), Pawel Garbacki (Cofounder of Fireworks AI), and 9 more.

arena-hard-auto by lmarena

Automatic LLM benchmark for instruction-tuned models, correlating with human preference

Created 2 years ago

Updated 6 months ago
