lighteval by huggingface

LLM evaluation toolkit for multiple backends

created 1 year ago
1,773 stars

Top 24.8% on sourcepulse

View on GitHub
Project Summary

Lighteval is a Python toolkit for evaluating large language models (LLMs) across multiple inference backends, built for researchers and developers who need flexible, fast performance benchmarking. It lets users test LLMs on a wide range of tasks and metrics, define custom ones, and store results for later analysis.

How It Works

Lighteval supports multiple backends for LLM inference, including Hugging Face Transformers (via accelerate), Text Generation Inference (TGI), vLLM, and Nanotron. This versatility allows users to leverage different inference optimizations and hardware setups. The framework emphasizes detailed, sample-by-sample result logging, enabling in-depth analysis and debugging of model performance. Users can define custom tasks and metrics, extending the evaluation capabilities beyond pre-defined benchmarks.
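
For a concrete picture, here is a minimal sketch of driving an evaluation through the Python pipeline. The module paths, class and parameter names, and the model config shown here follow the style of recent lighteval releases but shift between versions, so treat every identifier as an assumption to verify against the documentation:

    # Minimal sketch -- module paths and parameter names vary across
    # lighteval versions; check each import against your installed release.
    from lighteval.logging.evaluation_tracker import EvaluationTracker
    from lighteval.pipeline import ParallelismManager, Pipeline, PipelineParameters
    from lighteval.models.vllm.vllm_model import VLLMModelConfig

    # The tracker controls where aggregate scores and per-sample details are saved.
    tracker = EvaluationTracker(output_dir="./results", save_details=True)

    # Pick the launcher for the backend; vLLM here for fast batched inference.
    params = PipelineParameters(launcher_type=ParallelismManager.VLLM)

    # Name the checkpoint to evaluate (the field name varies by version).
    model_config = VLLMModelConfig(pretrained="HuggingFaceH4/zephyr-7b-beta")

    # Task spec format: "suite|task|few-shot count|truncate-few-shots flag".
    pipeline = Pipeline(
        tasks="leaderboard|truthfulqa:mc|0|0",
        pipeline_parameters=params,
        evaluation_tracker=tracker,
        model_config=model_config,
    )
    pipeline.evaluate()
    pipeline.save_and_push_results()
    pipeline.show_results()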

Quick Start & Requirements

  • Install via pip: pip install lighteval
  • For Hugging Face Hub integration: huggingface-cli login
  • Backends like vLLM may require specific GPU hardware and CUDA versions.
  • Documentation: Lighteval's Wiki
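
As a quick sanity check after the two steps above, this snippet (plain importlib plus huggingface_hub calls, nothing lighteval-specific) confirms the install and the Hub login both took effect:

    # Verify the pip install and the Hub login before running evaluations.
    from importlib.metadata import version
    from huggingface_hub import whoami

    print("lighteval", version("lighteval"))  # raises PackageNotFoundError if the install failed
    print("logged in as", whoami()["name"])   # fails if `huggingface-cli login` was skipped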

Highlighted Details

  • Supports fast evaluations using the vLLM backend.
  • Integrates with Hugging Face Hub for model hosting and result storage.
  • Offers a Python API for easy integration into existing workflows.
  • Allows creation of custom evaluation tasks and metrics.
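
To illustrate the last point, a sketch of a custom community task definition. The Doc fields, the TASKS_TABLE convention, and the dataset repo below are assumptions drawn from the lighteval docs and vary between releases:

    # Sketch of a custom task file; verify field names against your installed version.
    from lighteval.tasks.lighteval_task import LightevalTaskConfig
    from lighteval.tasks.requests import Doc

    def prompt_fn(line, task_name: str = ""):
        # Map one raw dataset row onto lighteval's Doc structure.
        return Doc(
            task_name=task_name,
            query=line["question"],
            choices=line["choices"],
            gold_index=line["answer"],  # index of the correct choice
        )

    # Custom task files expose a TASKS_TABLE that lighteval discovers when the
    # file is passed on the command line (a --custom-tasks style flag).
    TASKS_TABLE = [
        LightevalTaskConfig(
            name="my_custom_task",          # hypothetical task name
            prompt_function=prompt_fn,
            suite=["community"],
            hf_repo="my-org/my-dataset",    # hypothetical dataset repo
            hf_subset="default",
            evaluation_splits=["test"],
            metric=["exact_match"],         # metric spec (name and format vary by version)
        )
    ]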

Maintenance & Community

Developed by the Hugging Face Leaderboard and Evals Team. Contributions are welcome; see CONTRIBUTING.md for details.

Licensing & Compatibility

  • License: MIT
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The project is under active development; specific backend configurations and advanced features may require careful setup and familiarity with each backend's dependencies.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 14
  • Issues (30d): 26

Star History

302 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Nathan Lambert (AI Researcher at AI2), and 6 more.

evaluate by huggingface

0.3%
2k
ML model evaluation library for standardized performance reporting
created 3 years ago
updated 3 weeks ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jerry Liu (Cofounder of LlamaIndex).

deepeval by confident-ai

2.0%
10k
LLM evaluation framework for unit testing LLM outputs
created 2 years ago
updated 16 hours ago