evaluate by huggingface

ML model evaluation library for standardized performance reporting

created 3 years ago
2,278 stars

Top 20.3% on sourcepulse

Project Summary

🤗 Evaluate is a Python library designed to simplify and standardize the evaluation of machine learning models and datasets. It provides a unified interface for dozens of popular metrics across NLP and Computer Vision, enabling easy integration with frameworks like NumPy, PyTorch, TensorFlow, and JAX. The library also facilitates model comparisons, dataset measurements, and the creation and sharing of custom evaluation modules via the Hugging Face Hub.

How It Works

The library employs a modular design, allowing users to load metrics, comparisons, and measurements with a simple evaluate.load() command. Each metric is self-contained and includes type checking and a "metric card" detailing its usage, limitations, and value ranges. This approach promotes reproducibility and standardization in model evaluation, abstracting away framework-specific implementations.

Quick Start & Requirements

Highlighted Details

  • Supports integration with NumPy, Pandas, PyTorch, TensorFlow, and JAX.
  • Features "metric cards" for detailed metric descriptions and usage examples.
  • Enables community contributions and sharing of custom metrics via the Hugging Face Hub.
  • Includes functionality for model comparisons and dataset measurements.

Maintenance & Community

The project is part of the Hugging Face ecosystem, which provides organizational backing and a large community, though the health metrics below suggest day-to-day maintenance has slowed.

Licensing & Compatibility

The library is released under the Apache 2.0 license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The README explicitly recommends the newer LightEval library for LLM evaluations, suggesting 🤗 Evaluate may be less actively maintained for cutting-edge LLM use cases.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 4
  • Issues (30d): 2
  • Star History: 79 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Luca Antiga (CTO of Lightning AI), and 4 more.

helm by stanford-crfm

Top 0.9% · 2k stars
Open-source Python framework for holistic evaluation of foundation models
created 3 years ago · updated 1 day ago