optimum-benchmark by huggingface

Benchmarking utility for Transformers, Diffusers, and other models

Created 2 years ago · 315 stars · Top 85.6% on SourcePulse

Project Summary

This project provides a unified, multi-backend utility for benchmarking models from the Hugging Face Transformers, Diffusers, PEFT, Timm, and Sentence-Transformers libraries. It targets researchers and engineers who need to evaluate model performance across hardware platforms, optimizations, and quantization schemes, for both inference and training, with detailed metrics such as latency, memory, and energy consumption.

How It Works

Optimum-Benchmark employs a flexible configuration system, allowing users to define benchmarks via a Python API or a Hydra CLI. It supports multiple launchers (Process, Torchrun, Inline) and scenarios (Training, Inference), with extensive features for each, such as device isolation, input shape control, and detailed metric tracking (latency, memory, energy). The core advantage lies in its unified approach to abstracting diverse hardware backends (PyTorch, ONNX Runtime, TensorRT-LLM, vLLM, OpenVINO, etc.) and their specific optimizations.
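As a concrete illustration, here is a minimal sketch of the Python API following the launcher/scenario/backend split described above. Class names such as ProcessConfig, InferenceConfig, and PyTorchConfig follow the project's README, but defaults and signatures may vary across versions:

```python
from optimum_benchmark import (
    Benchmark,
    BenchmarkConfig,
    InferenceConfig,
    ProcessConfig,
    PyTorchConfig,
)

if __name__ == "__main__":
    # Launcher: run the benchmark in an isolated subprocess.
    launcher_config = ProcessConfig()
    # Scenario: inference, tracking latency and memory.
    scenario_config = InferenceConfig(latency=True, memory=True)
    # Backend: PyTorch on CPU; no_weights=True benchmarks a randomly
    # initialized model instead of downloading a checkpoint.
    backend_config = PyTorchConfig(model="gpt2", device="cpu", no_weights=True)

    benchmark_config = BenchmarkConfig(
        name="pytorch_gpt2",
        launcher=launcher_config,
        scenario=scenario_config,
        backend=backend_config,
    )
    benchmark_report = Benchmark.launch(benchmark_config)
```

The same configuration can be expressed as a Hydra YAML file and launched from the CLI, with Hydra overrides selecting the backend, launcher, and scenario.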

Quick Start & Requirements

  • Installation: pip install optimum-benchmark (or with extras for specific backends, e.g., pip install optimum-benchmark[onnxruntime]).
  • Dependencies: Python 3.x. Additional dependencies depend on the chosen backend (e.g., CUDA for GPU acceleration, specific libraries for OpenVINO, TensorRT-LLM, vLLM).
  • Resources: Requires downloading models and datasets (unless "no weights" benchmarking is used). GPU acceleration is recommended for performance testing; a multi-GPU sketch follows this list.
  • Docs: see the examples directory in the repository.
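For an end-to-end check on GPU hardware, the following sketch assumes the TorchrunConfig launcher and device_ids parameter shown in the project's README; the Hydra CLI equivalent is invoked via the optimum-benchmark entry point, pointing --config-dir and --config-name at one of the repository's example configs:

```python
from optimum_benchmark import (
    Benchmark,
    BenchmarkConfig,
    InferenceConfig,
    PyTorchConfig,
    TorchrunConfig,
)

if __name__ == "__main__":  # torchrun spawns worker processes, so guard the entry point
    # Two data-parallel workers, one per GPU.
    launcher_config = TorchrunConfig(nproc_per_node=2)
    scenario_config = InferenceConfig(latency=True, memory=True)
    backend_config = PyTorchConfig(
        model="gpt2",
        device="cuda",
        device_ids="0,1",
        no_weights=True,
    )

    benchmark_config = BenchmarkConfig(
        name="pytorch_gpt2_cuda",
        launcher=launcher_config,
        scenario=scenario_config,
        backend=backend_config,
    )
    benchmark_report = Benchmark.launch(benchmark_config)
```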

Highlighted Details

  • Supports benchmarking of Transformers, Diffusers, PEFT, Timm, and Sentence-Transformers.
  • Includes backends for PyTorch, ONNX Runtime (CPU, CUDA, ROCm, TensorRT), Py-TXI, Neural Compressor, TensorRT-LLM, Torch-ORT, OpenVINO, vLLM, and IPEX.
  • Offers detailed tracking of latency, throughput, memory usage, and energy consumption (a report-handling sketch follows this list).
  • Features such as "no weights" benchmarking (randomly initialized models that skip checkpoint downloads) and NUMA node control.
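As a sketch of how tracked metrics might be consumed, the following assumes the report helpers (log, save_json) shown in the project's README and an energy=True scenario flag; note that energy measurement typically relies on hardware counters (e.g., RAPL or NVML) and may not be available in every environment:

```python
from optimum_benchmark import (
    Benchmark,
    BenchmarkConfig,
    InferenceConfig,
    ProcessConfig,
    PyTorchConfig,
)

if __name__ == "__main__":
    benchmark_config = BenchmarkConfig(
        name="pytorch_gpt2_energy",
        launcher=ProcessConfig(),
        # Energy tracking sits alongside latency and memory in the scenario.
        scenario=InferenceConfig(latency=True, memory=True, energy=True),
        backend=PyTorchConfig(model="gpt2", device="cpu", no_weights=True),
    )

    benchmark_report = Benchmark.launch(benchmark_config)
    benchmark_report.log()                               # print tracked metrics
    benchmark_report.save_json("benchmark_report.json")  # persist for later comparison
```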

Maintenance & Community

The project is actively developed with a focus on expanding backend and hardware support. Contributions are welcomed, with a clear path outlined in CONTRIBUTING.md.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The project is explicitly noted as a work in progress and not yet ready for production use. Some hardware backends (e.g., Habana Gaudi) are listed as unsupported or under development.

Health Check

  • Last commit: 6 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 2
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days

Explore Similar Projects

LitServe by Lightning-AI
  • AI inference pipeline framework
  • Top 0.3% on SourcePulse · 4k stars · created 1 year ago · updated 1 day ago
  • Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Luis Capelo (cofounder of Lightning AI), and 3 more.

ktransformers by kvcache-ai
  • Framework for LLM inference optimization experimentation
  • Top 0.3% on SourcePulse · 15k stars · created 1 year ago · updated 2 days ago
  • Starred by Luis Capelo (cofounder of Lightning AI), Patrick von Platen (author of Hugging Face Diffusers; research engineer at Mistral), and 4 more.