lighteval by huggingface

LLM evaluation toolkit for multiple backends

created 1 year ago
1,773 stars

Top 24.8% on sourcepulse

View on GitHub
Project Summary

Lighteval is a Python toolkit for evaluating large language models (LLMs) across multiple inference backends, built for researchers and developers who need flexible, fast performance benchmarking. It lets users test LLMs on a wide range of tasks and metrics, define custom ones, and store results for later analysis.

How It Works

Lighteval supports multiple backends for LLM inference, including Hugging Face Transformers (via accelerate), Text Generation Inference (TGI), vLLM, and Nanotron. This versatility allows users to leverage different inference optimizations and hardware setups. The framework emphasizes detailed, sample-by-sample result logging, enabling in-depth analysis and debugging of model performance. Users can define custom tasks and metrics, extending the evaluation capabilities beyond pre-defined benchmarks.
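
For a concrete picture, here is a minimal sketch of driving an evaluation through the Python pipeline. The module paths, class and parameter names, and the model config shown here follow the style of recent lighteval releases but shift between versions, so treat every identifier as an assumption to verify against the documentation:

    # Minimal sketch -- module paths and parameter names vary across
    # lighteval versions; check each import against your installed release.
    from lighteval.logging.evaluation_tracker import EvaluationTracker
    from lighteval.pipeline import ParallelismManager, Pipeline, PipelineParameters
    from lighteval.models.vllm.vllm_model import VLLMModelConfig

    # The tracker controls where aggregate scores and per-sample details are saved.
    tracker = EvaluationTracker(output_dir="./results", save_details=True)

    # Pick the launcher for the backend; vLLM here for fast batched inference.
    params = PipelineParameters(launcher_type=ParallelismManager.VLLM)

    # Name the checkpoint to evaluate (the field name varies by version).
    model_config = VLLMModelConfig(pretrained="HuggingFaceH4/zephyr-7b-beta")

    # Task spec format: "suite|task|few-shot count|truncate-few-shots flag".
    pipeline = Pipeline(
        tasks="leaderboard|truthfulqa:mc|0|0",
        pipeline_parameters=params,
        evaluation_tracker=tracker,
        model_config=model_config,
    )
    pipeline.evaluate()
    pipeline.save_and_push_results()
    pipeline.show_results()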

Quick Start & Requirements

  • Install via pip: pip install lighteval
  • For Hugging Face Hub integration: huggingface-cli login
  • Backends like vLLM may require specific GPU hardware and CUDA versions.
  • Documentation: Lighteval's Wiki
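
As a quick sanity check after the two steps above, this snippet (plain importlib plus huggingface_hub calls, nothing lighteval-specific) confirms the install and the Hub login both took effect:

    # Verify the pip install and the Hub login before running evaluations.
    from importlib.metadata import version
    from huggingface_hub import whoami

    print("lighteval", version("lighteval"))  # raises PackageNotFoundError if the install failed
    print("logged in as", whoami()["name"])   # fails if `huggingface-cli login` was skipped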

Highlighted Details

  • Supports fast evaluations using the vLLM backend.
  • Integrates with Hugging Face Hub for model hosting and result storage.
  • Offers a Python API for easy integration into existing workflows.
  • Allows creation of custom evaluation tasks and metrics.
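
To illustrate the last point, a sketch of a custom community task definition. The Doc fields, the TASKS_TABLE convention, and the dataset repo below are assumptions drawn from the lighteval docs and vary between releases:

    # Sketch of a custom task file; verify field names against your installed version.
    from lighteval.tasks.lighteval_task import LightevalTaskConfig
    from lighteval.tasks.requests import Doc

    def prompt_fn(line, task_name: str = ""):
        # Map one raw dataset row onto lighteval's Doc structure.
        return Doc(
            task_name=task_name,
            query=line["question"],
            choices=line["choices"],
            gold_index=line["answer"],  # index of the correct choice
        )

    # Custom task files expose a TASKS_TABLE that lighteval discovers when the
    # file is passed on the command line (a --custom-tasks style flag).
    TASKS_TABLE = [
        LightevalTaskConfig(
            name="my_custom_task",          # hypothetical task name
            prompt_function=prompt_fn,
            suite=["community"],
            hf_repo="my-org/my-dataset",    # hypothetical dataset repo
            hf_subset="default",
            evaluation_splits=["test"],
            metric=["exact_match"],         # metric spec (name and format vary by version)
        )
    ]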

Maintenance & Community

Developed by the Hugging Face Leaderboard and Evals Team. Contributions are welcome; see CONTRIBUTING.md for details.

Licensing & Compatibility

  • License: MIT
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The project is under active development; specific backend configurations and advanced features may require careful setup and familiarity with each backend's dependencies.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 14
  • Issues (30d): 26

Star History

302 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Nathan Lambert (AI Researcher at AI2), and 6 more.

evaluate by huggingface

0.3%
2k
ML model evaluation library for standardized performance reporting
created 3 years ago
updated 3 weeks ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jerry Liu (Cofounder of LlamaIndex).

deepeval by confident-ai

2.0%
10k
LLM evaluation framework for unit testing LLM outputs
created 2 years ago
updated 16 hours ago