Benchmarking utility for Transformers, Diffusers, and other models
This project provides a unified, multi-backend utility for benchmarking models from the Hugging Face Transformers, Diffusers, PEFT, Timm, and Sentence-Transformers libraries. It targets researchers and engineers who need to evaluate model performance across hardware optimizations and quantization schemes for both inference and training, reporting detailed metrics such as latency, memory, and energy consumption.
How It Works
Optimum-Benchmark employs a flexible configuration system that lets users define benchmarks through a Python API or a Hydra-powered CLI. It supports multiple launchers (Process, Torchrun, Inline) and scenarios (Training, Inference), each with extensive options such as device isolation, input shape control, and detailed metric tracking (latency, memory, energy). Its core advantage is a unified abstraction over diverse hardware backends (PyTorch, ONNX Runtime, TensorRT-LLM, vLLM, OpenVINO, etc.) and their backend-specific optimizations.
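As a minimal sketch of the Python API, the snippet below configures a launcher, a scenario, and a backend, then launches an inference benchmark. It assumes the Benchmark, BenchmarkConfig, ProcessConfig, InferenceConfig, and PyTorchConfig classes exported by recent releases; exact names and arguments may differ across versions.

    # Minimal sketch, assuming the classes exported by recent
    # optimum-benchmark releases; details may vary by version.
    from optimum_benchmark import (
        Benchmark,
        BenchmarkConfig,
        InferenceConfig,
        ProcessConfig,
        PyTorchConfig,
    )

    if __name__ == "__main__":  # required: the Process launcher spawns a subprocess
        benchmark_config = BenchmarkConfig(
            name="pytorch_gpt2",
            launcher=ProcessConfig(),  # run the benchmark in an isolated process
            scenario=InferenceConfig(latency=True, memory=True),
            backend=PyTorchConfig(model="gpt2", device="cpu", no_weights=True),
        )
        benchmark_report = Benchmark.launch(benchmark_config)
        print(benchmark_report)  # aggregated latency/memory metrics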
Quick Start & Requirements
Install from PyPI:

    pip install optimum-benchmark

or with extras for specific backends, e.g.:

    pip install optimum-benchmark[onnxruntime]
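Benchmarks can also be driven from the Hydra CLI using a YAML config. As a sketch, assuming a hypothetical pytorch_bert.yaml in a local configs/ directory:

    optimum-benchmark --config-dir configs/ --config-name pytorch_bert

Individual fields can then be tweaked from the command line via Hydra's override syntax (for example, appending backend.device=cpu) without editing the config file.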
Highlighted Details
Maintenance & Community
The project is actively developed, with a focus on expanding backend and hardware support. Contributions are welcome, with a clear path outlined in CONTRIBUTING.md.
Licensing & Compatibility
The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The project is explicitly noted as a work in progress and not yet ready for production use. Some hardware backends (e.g., Habana Gaudi) are listed as unsupported or under development.