optimum-benchmark by huggingface

Benchmarking utility for Transformers, Diffusers, and other models

created 2 years ago
308 stars

Top 88.2% on sourcepulse

Project Summary

This project provides a unified, multi-backend utility for benchmarking models from the Hugging Face Transformers, Diffusers, PEFT, Timm, and Sentence-Transformers libraries. It targets researchers and engineers who need to evaluate model performance across hardware optimizations and quantization schemes, for both inference and training, and reports detailed metrics such as latency, memory, and energy consumption.

How It Works

Optimum-Benchmark employs a flexible configuration system: benchmarks can be defined through a Python API or a Hydra CLI. It supports multiple launchers (Process, Torchrun, Inline) and scenarios (Training, Inference), each with extensive options such as device isolation, input shape control, and detailed metric tracking (latency, memory, energy). Its core advantage is a unified abstraction over diverse hardware backends (PyTorch, ONNX Runtime, TensorRT-LLM, vLLM, OpenVINO, etc.) and their backend-specific optimizations.
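
A minimal sketch of the Python API described above, following the usage pattern from the project's README (class and method names reflect the optimum_benchmark package in recent releases and may vary between versions):

    from optimum_benchmark import (
        Benchmark,
        BenchmarkConfig,
        InferenceConfig,
        ProcessConfig,
        PyTorchConfig,
    )

    # the Process launcher spawns a separate process, so guard the entry point
    if __name__ == "__main__":
        # launcher: run the benchmark in an isolated process
        launcher_config = ProcessConfig()
        # scenario: inference, tracking latency and memory
        scenario_config = InferenceConfig(latency=True, memory=True)
        # backend: PyTorch running gpt2 on CPU
        backend_config = PyTorchConfig(model="gpt2", device="cpu")

        benchmark_config = BenchmarkConfig(
            name="pytorch_gpt2",
            launcher=launcher_config,
            scenario=scenario_config,
            backend=backend_config,
        )
        benchmark_report = Benchmark.launch(benchmark_config)
        benchmark_report.save_json("benchmark_report.json")  # persist metrics for later comparison

Swapping the backend (e.g., PyTorchConfig for an ONNX Runtime config) is the main lever: the launcher and scenario configuration stay the same across backends.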

Quick Start & Requirements

  • Installation: pip install optimum-benchmark (or with extras for specific backends, e.g., pip install optimum-benchmark[onnxruntime]); see the CLI sketch after this list.
  • Dependencies: Python 3.x; further dependencies depend on the chosen backend (e.g., CUDA for GPU acceleration, dedicated libraries for OpenVINO, TensorRT-LLM, or vLLM).
  • Resources: requires downloading models and datasets; GPU acceleration is recommended for performance testing.
  • Docs: the examples in the repository's examples/ directory.
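
For the Hydra CLI path, a typical invocation points Hydra at a config directory and names a config file; the config name below is illustrative and assumes one of the repository's example configs:

    optimum-benchmark --config-dir examples/ --config-name pytorch_bert

Standard Hydra overrides (e.g., appending backend.device=cpu) can be used to tweak a config from the command line without editing its YAML.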

Highlighted Details

  • Supports benchmarking of Transformers, Diffusers, PEFT, Timm, and Sentence-Transformers.
  • Includes backends for PyTorch, ONNX Runtime (CPU, CUDA, ROCm, TensorRT), Py-TXI, Neural Compressor, TensorRT-LLM, Torch-ORT, OpenVINO, vLLM, and IPEX.
  • Offers detailed tracking for latency, throughput, memory usage, and energy consumption.
  • Features such as "no weights" benchmarking and NUMA node control are available (see the sketch after this list).
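
The "no weights" feature noted above benchmarks a model with randomly initialized weights instead of downloading its checkpoint, which is useful for sizing hardware before committing to a model. A sketch, assuming the no_weights backend flag and the energy scenario flag behave as in recent releases:

    # benchmark gpt2 without downloading its checkpoint, also tracking energy
    backend_config = PyTorchConfig(model="gpt2", device="cuda", no_weights=True)
    scenario_config = InferenceConfig(latency=True, memory=True, energy=True)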

Maintenance & Community

The project is actively developed with a focus on expanding backend and hardware support. Contributions are welcomed, with a clear path outlined in CONTRIBUTING.md.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The project is explicitly noted as a work in progress and not yet ready for production use. Some hardware backends (e.g., Habana Gaudi) are listed as unsupported or under development.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 90 days

Explore Similar Projects

ktransformers by kvcache-ai

Framework for LLM inference optimization experimentation
Top 0.4% · 15k stars · created 1 year ago · updated 2 days ago
Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Michael Han (Cofounder of Unsloth), and 1 more.
TensorRT-LLM by NVIDIA

LLM inference optimization SDK for NVIDIA GPUs
Top 0.6% · 11k stars · created 1 year ago · updated 17 hours ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.
llama.cpp by ggml-org

C/C++ library for local LLM inference
Top 0.4% · 84k stars · created 2 years ago · updated 13 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.