lm-polygraph by IINemo

Framework for uncertainty estimation in LLM text generation

created 2 years ago
316 stars

Top 86.7% on sourcepulse

View on GitHub
Project Summary

LM-Polygraph provides a comprehensive Python framework for evaluating uncertainty estimation (UE) methods in Large Language Models (LLMs) for text generation. It aims to make LLM applications safer by identifying potential hallucinations through confidence scores, targeting researchers and developers working with LLMs.

How It Works

The framework supports "white-box" (full model access), "grey-box" (access to token probabilities via logprobs), and "black-box" (API-only) model interactions. It implements a wide array of state-of-the-art UE techniques, grouped into information-based, meaning-diversity, ensembling, and density-based methods. This flexible architecture enables consistent benchmarking and integration across a variety of LLM architectures and APIs.
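To make the grey-box, information-based setting concrete, here is a self-contained sketch in plain Python (not lm-polygraph's own API; the function name and data layout are illustrative): given per-token log-probability distributions, mean token entropy rises when the model spreads probability mass over many candidates.

```python
import math

def mean_token_entropy(token_logprob_dists):
    """Average Shannon entropy over per-token distributions.

    token_logprob_dists: one dict per generated token, mapping candidate
    token -> log-probability (e.g. the top-k logprobs a grey-box API returns).
    Higher values mean the model was less certain while generating.
    """
    entropies = []
    for dist in token_logprob_dists:
        # Renormalize over the observed candidates: top-k lists rarely sum to 1.
        probs = [math.exp(lp) for lp in dist.values()]
        total = sum(probs)
        probs = [p / total for p in probs]
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0))
    return sum(entropies) / len(entropies)

# One dominant candidate -> low entropy; near-uniform -> higher entropy.
confident = {"Paris": math.log(0.97), "London": math.log(0.02), "Rome": math.log(0.01)}
uncertain = {"Paris": math.log(0.40), "London": math.log(0.35), "Rome": math.log(0.25)}
print(mean_token_entropy([confident]))  # small
print(mean_token_entropy([uncertain]))  # larger
```

Information-based estimators of this kind operate only on per-token quantities, which is why grey-box access to logprobs is sufficient for them.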

Quick Start & Requirements

  • Installation:
    • From PyPI: pip install lm-polygraph
    • From GitHub (for notebooks/benchmarks): git clone ... && cd lm-polygraph && pip install . (checking out a stable release tag first is recommended).
  • Prerequisites: Python 3, Hugging Face Transformers. GPU with CUDA is recommended for performance. OpenAI-compatible endpoints require setting OPENAI_BASE_URL and OPENAI_API_KEY.
  • Resources: Demo application can be run via Docker. GPU memory requirements depend on the LLM used.
  • Links: Basic Usage, Demo Web Application, Documentation
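For the OpenAI-compatible setup above, the two variables are set as ordinary environment variables; the endpoint URL and key below are placeholders, not real values:

```shell
# Placeholders only; substitute your own endpoint and key.
export OPENAI_BASE_URL="https://your-endpoint.example/v1"
export OPENAI_API_KEY="placeholder-key"
```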

Highlighted Details

  • Supports over 30 uncertainty estimation methods, including novel approaches like LUQ and Kernel Language Entropy.
  • Includes an extendable benchmark for consistent evaluation of UE techniques.
  • Offers a demo web application that integrates confidence scores into chat dialogues.
  • Compatible with Hugging Face models (e.g., BLOOMz, LLaMA-2) and OpenAI APIs (e.g., GPT-3.5-turbo, GPT-4).
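To illustrate the meaning-diversity family of methods in miniature, the sketch below scores agreement among repeated samples using word overlap. This is only a toy stand-in: production methods such as semantic entropy replace word overlap with NLI or embedding-based similarity.

```python
def lexical_similarity_confidence(samples):
    """Toy meaning-diversity score: mean pairwise Jaccard similarity
    over the word sets of repeated generations for the same prompt.
    Near 1.0 -> samples agree (confident); near 0.0 -> they diverge.
    """
    word_sets = [set(s.lower().split()) for s in samples]
    n = len(word_sets)
    if n < 2:
        return 1.0  # a single sample cannot disagree with itself
    sims = []
    for i in range(n):
        for j in range(i + 1, n):
            union = word_sets[i] | word_sets[j]
            sims.append(len(word_sets[i] & word_sets[j]) / len(union) if union else 1.0)
    return sum(sims) / len(sims)

agreeing = ["the capital of france is paris"] * 3
diverse = ["paris", "london is the capital", "it might be rome"]
print(lexical_similarity_confidence(agreeing))  # high agreement
print(lexical_similarity_confidence(diverse))   # low agreement
```

Low agreement across repeated samples is a common hallucination signal: if the model cannot reproduce its own answer, it is likely unsure of it.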

Maintenance & Community

The project is associated with EMNLP 2023 and has recent arXiv publications, indicating active development. Links to community channels are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility with commercial use or integration into closed-source projects should therefore be verified against the repository's LICENSE file rather than assumed from common open-source practice.

Limitations & Caveats

The README notes that code from the main branch may be unstable. Some UE methods require training data, and certain demo applications might require Colab Pro for larger models due to memory constraints.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 22
  • Issues (30d): 8
  • Star History: 57 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Michael Han (Cofounder of Unsloth), and 1 more.

ktransformers by kvcache-ai (0.4%, 15k stars)
Framework for LLM inference optimization experimentation
created 1 year ago, updated 1 day ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.

TensorRT-LLM by NVIDIA (0.6%, 11k stars)
LLM inference optimization SDK for NVIDIA GPUs
created 1 year ago, updated 16 hours ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Didier Lopes (Founder of OpenBB), and 10 more.

JARVIS by microsoft (0.1%, 24k stars)
System for LLM-orchestrated AI task automation
created 2 years ago, updated 5 days ago