Framework for few-shot language model evaluation
This framework provides a unified system for evaluating generative language models across a wide array of academic benchmarks. It supports numerous model loading methods, including Hugging Face transformers, vLLM, and various API-based models, making it a versatile tool for researchers and developers assessing LLM performance.
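For instance, switching between backends is typically just a matter of the --model flag; the command below is an illustrative sketch (model name, task, and flags may vary by version):

```bash
# Evaluate a Hugging Face model on HellaSwag, zero-shot (model name is illustrative)
lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-160m \
    --tasks hellaswag \
    --device cuda:0 \
    --batch_size 8
```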
How It Works
The harness employs a flexible, tokenization-agnostic interface to evaluate models on over 60 standard benchmarks comprising hundreds of subtasks. It supports quantized inference via GPTQ (through the AutoGPTQ library), the vLLM backend for faster and more memory-efficient inference, and multi-GPU parallelism via Hugging Face's Accelerate library. Prompt engineering is handled through Jinja2 templating and integration with Promptsource, allowing evaluation setups to be customized.
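As a sketch of the multi-GPU options, evaluation can be run data-parallel through Accelerate or with vLLM's tensor parallelism; model names and exact flags below are illustrative and may differ across releases:

```bash
# Data-parallel evaluation across available GPUs via Accelerate (sketch)
accelerate launch -m lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-2.8b \
    --tasks lambada_openai \
    --batch_size 16

# vLLM backend with tensor parallelism across 2 GPUs (sketch)
lm_eval --model vllm \
    --model_args pretrained=EleutherAI/pythia-2.8b,tensor_parallel_size=2,dtype=auto,gpu_memory_utilization=0.8 \
    --tasks gsm8k \
    --batch_size auto
```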
Quick Start & Requirements
git clone --depth 1 https://github.com/EleutherAI/lm-evaluation-harness && cd lm-evaluation-harness && pip install -e .
pip install lm_eval[vllm]   # optional: extra dependencies for the vLLM backend
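After installation, a quick sanity check can list the available tasks and run a small evaluation on CPU; the model and task names below are illustrative:

```bash
# List all available tasks (output format may vary by version)
lm_eval --tasks list

# Smoke test: evaluate a tiny model on 10 examples, CPU only
lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-70m \
    --tasks lambada_openai \
    --device cpu \
    --batch_size 1 \
    --limit 10
```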
Highlighted Details
Maintenance & Community
The project is actively maintained by EleutherAI, with contributions from numerous researchers and organizations. Support and discussion are available via GitHub issues and the EleutherAI Discord server.
Licensing & Compatibility
The project is licensed under the Apache 2.0 license, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
Native multi-node evaluation is not supported for the Hugging Face hf model type; custom integrations or external inference servers are recommended instead. The MPS backend for Apple Metal GPUs is in early development and may have correctness issues.
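One possible workaround for multi-node setups is to serve the model behind an OpenAI-compatible inference server and point the harness at it; the sketch below assumes such a server is already running at the given URL, and the argument names follow the project's local-completions interface (details may vary by version):

```bash
# Sketch: evaluate against a local OpenAI-compatible completions endpoint
lm_eval --model local-completions \
    --model_args model=facebook/opt-125m,base_url=http://localhost:8000/v1/completions,num_concurrent=1,max_retries=3 \
    --tasks gsm8k
```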