evaluate by huggingface

ML model evaluation library for standardized performance reporting

created 3 years ago
2,278 stars

Top 20.3% on sourcepulse

Project Summary

🤗 Evaluate is a Python library designed to simplify and standardize the evaluation of machine learning models and datasets. It provides a unified interface for dozens of popular metrics across NLP and Computer Vision, enabling easy integration with frameworks like NumPy, PyTorch, TensorFlow, and JAX. The library also facilitates model comparisons, dataset measurements, and the creation and sharing of custom evaluation modules via the Hugging Face Hub.

How It Works

The library employs a modular design, allowing users to load metrics, comparisons, and measurements with a simple evaluate.load() command. Each metric is self-contained and includes type checking and a "metric card" detailing its usage, limitations, and value ranges. This approach promotes reproducibility and standardization in model evaluation, abstracting away framework-specific implementations.

Quick Start & Requirements

Highlighted Details

  • Supports integration with NumPy, Pandas, PyTorch, TensorFlow, and JAX.
  • Features "metric cards" for detailed metric descriptions and usage examples.
  • Enables community contributions and sharing of custom metrics via the Hugging Face Hub.
  • Includes functionality for model comparisons and dataset measurements.

Maintenance & Community

The project is part of the Hugging Face ecosystem, which provides organizational backing and a large community, though the health metrics below suggest day-to-day maintenance has slowed.

Licensing & Compatibility

The library is released under the Apache 2.0 license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The README explicitly recommends the newer LightEval library for LLM evaluations, suggesting 🤗 Evaluate may be less actively maintained for cutting-edge LLM use cases.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 4
  • Issues (30d): 2
  • Star History: 79 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Luca Antiga (CTO of Lightning AI), and 4 more.

helm by stanford-crfm

Top 0.9% · 2k stars
Open-source Python framework for holistic evaluation of foundation models
created 3 years ago · updated 1 day ago