lm-polygraph by IINemo

Framework for uncertainty estimation in LLM text generation

created 2 years ago
316 stars

Top 86.7% on sourcepulse

View on GitHub
Project Summary

LM-Polygraph provides a comprehensive Python framework for evaluating uncertainty estimation (UE) methods in Large Language Models (LLMs) for text generation. It aims to make LLM applications safer by identifying potential hallucinations through confidence scores, targeting researchers and developers working with LLMs.

How It Works

The framework supports "white-box" (full model access), "grey-box" (access to token probabilities via logprobs), and "black-box" (API-only) model interactions. It implements a wide array of state-of-the-art UE techniques, grouped into information-based, meaning-diversity, ensembling, and density-based methods. This flexible architecture enables consistent benchmarking and integration across a variety of LLM architectures and APIs.
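To make the grey-box, information-based setting concrete, here is a self-contained sketch in plain Python (not lm-polygraph's own API; the function name and data layout are illustrative): given per-token log-probability distributions, mean token entropy rises when the model spreads probability mass over many candidates.

```python
import math

def mean_token_entropy(token_logprob_dists):
    """Average Shannon entropy over per-token distributions.

    token_logprob_dists: one dict per generated token, mapping candidate
    token -> log-probability (e.g. the top-k logprobs a grey-box API returns).
    Higher values mean the model was less certain while generating.
    """
    entropies = []
    for dist in token_logprob_dists:
        # Renormalize over the observed candidates: top-k lists rarely sum to 1.
        probs = [math.exp(lp) for lp in dist.values()]
        total = sum(probs)
        probs = [p / total for p in probs]
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0))
    return sum(entropies) / len(entropies)

# One dominant candidate -> low entropy; near-uniform -> higher entropy.
confident = {"Paris": math.log(0.97), "London": math.log(0.02), "Rome": math.log(0.01)}
uncertain = {"Paris": math.log(0.40), "London": math.log(0.35), "Rome": math.log(0.25)}
print(mean_token_entropy([confident]))  # small
print(mean_token_entropy([uncertain]))  # larger
```

Information-based estimators of this kind operate only on per-token quantities, which is why grey-box access to logprobs is sufficient for them.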

Quick Start & Requirements

  • Installation:
    • From PyPI: pip install lm-polygraph
    • From GitHub (for notebooks/benchmarks): git clone ... && cd lm-polygraph && pip install . (checking out a stable release tag first is recommended).
  • Prerequisites: Python 3, Hugging Face Transformers. GPU with CUDA is recommended for performance. OpenAI-compatible endpoints require setting OPENAI_BASE_URL and OPENAI_API_KEY.
  • Resources: Demo application can be run via Docker. GPU memory requirements depend on the LLM used.
  • Links: Basic Usage, Demo Web Application, Documentation
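For the OpenAI-compatible setup above, the two variables are set as ordinary environment variables; the endpoint URL and key below are placeholders, not real values:

```shell
# Placeholders only; substitute your own endpoint and key.
export OPENAI_BASE_URL="https://your-endpoint.example/v1"
export OPENAI_API_KEY="placeholder-key"
```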

Highlighted Details

  • Supports over 30 uncertainty estimation methods, including novel approaches like LUQ and Kernel Language Entropy.
  • Includes an extendable benchmark for consistent evaluation of UE techniques.
  • Offers a demo web application that integrates confidence scores into chat dialogues.
  • Compatible with Hugging Face models (e.g., BLOOMz, LLaMA-2) and OpenAI APIs (e.g., GPT-3.5-turbo, GPT-4).
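To illustrate the meaning-diversity family of methods in miniature, the sketch below scores agreement among repeated samples using word overlap. This is only a toy stand-in: production methods such as semantic entropy replace word overlap with NLI or embedding-based similarity.

```python
def lexical_similarity_confidence(samples):
    """Toy meaning-diversity score: mean pairwise Jaccard similarity
    over the word sets of repeated generations for the same prompt.
    Near 1.0 -> samples agree (confident); near 0.0 -> they diverge.
    """
    word_sets = [set(s.lower().split()) for s in samples]
    n = len(word_sets)
    if n < 2:
        return 1.0  # a single sample cannot disagree with itself
    sims = []
    for i in range(n):
        for j in range(i + 1, n):
            union = word_sets[i] | word_sets[j]
            sims.append(len(word_sets[i] & word_sets[j]) / len(union) if union else 1.0)
    return sum(sims) / len(sims)

agreeing = ["the capital of france is paris"] * 3
diverse = ["paris", "london is the capital", "it might be rome"]
print(lexical_similarity_confidence(agreeing))  # high agreement
print(lexical_similarity_confidence(diverse))   # low agreement
```

Low agreement across repeated samples is a common hallucination signal: if the model cannot reproduce its own answer, it is likely unsure of it.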

Maintenance & Community

The project is associated with EMNLP 2023 and has recent arXiv publications, indicating active development. Links to community channels are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility with commercial use or integration into closed-source projects should therefore be verified against the repository's LICENSE file rather than assumed from common open-source practice.

Limitations & Caveats

The README notes that code from the main branch may be unstable. Some UE methods require training data, and certain demo applications might require Colab Pro for larger models due to memory constraints.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 22
  • Issues (30d): 8
  • Star History: 57 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Michael Han (Cofounder of Unsloth), and 1 more.

ktransformers by kvcache-ai (0.4%, 15k stars)
Framework for LLM inference optimization experimentation
created 1 year ago, updated 1 day ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.

TensorRT-LLM by NVIDIA (0.6%, 11k stars)
LLM inference optimization SDK for NVIDIA GPUs
created 1 year ago, updated 16 hours ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Didier Lopes (Founder of OpenBB), and 10 more.

JARVIS by microsoft (0.1%, 24k stars)
System for LLM-orchestrated AI task automation
created 2 years ago, updated 5 days ago