optimum-benchmark by huggingface

Benchmarking utility for Transformers, Diffusers, and other models

created 2 years ago
308 stars

Top 88.2% on sourcepulse

Project Summary

This project provides a unified, multi-backend utility for benchmarking models from the Hugging Face Transformers, Diffusers, PEFT, Timm, and Sentence-Transformers libraries. It targets researchers and engineers who need to evaluate model performance across hardware optimizations and quantization schemes, for both inference and training, and reports detailed metrics such as latency, memory, and energy consumption.

How It Works

Optimum-Benchmark employs a flexible configuration system: benchmarks can be defined through a Python API or a Hydra CLI. It supports multiple launchers (Process, Torchrun, Inline) and scenarios (Training, Inference), each with extensive options such as device isolation, input shape control, and detailed metric tracking (latency, memory, energy). Its core advantage is a unified abstraction over diverse hardware backends (PyTorch, ONNX Runtime, TensorRT-LLM, vLLM, OpenVINO, etc.) and their backend-specific optimizations.
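
A minimal sketch of the Python API described above, following the usage pattern from the project's README (class and method names reflect the optimum_benchmark package in recent releases and may vary between versions):

    from optimum_benchmark import (
        Benchmark,
        BenchmarkConfig,
        InferenceConfig,
        ProcessConfig,
        PyTorchConfig,
    )

    # the Process launcher spawns a separate process, so guard the entry point
    if __name__ == "__main__":
        # launcher: run the benchmark in an isolated process
        launcher_config = ProcessConfig()
        # scenario: inference, tracking latency and memory
        scenario_config = InferenceConfig(latency=True, memory=True)
        # backend: PyTorch running gpt2 on CPU
        backend_config = PyTorchConfig(model="gpt2", device="cpu")

        benchmark_config = BenchmarkConfig(
            name="pytorch_gpt2",
            launcher=launcher_config,
            scenario=scenario_config,
            backend=backend_config,
        )
        benchmark_report = Benchmark.launch(benchmark_config)
        benchmark_report.save_json("benchmark_report.json")  # persist metrics for later comparison

Swapping the backend (e.g., PyTorchConfig for an ONNX Runtime config) is the main lever: the launcher and scenario configuration stay the same across backends.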

Quick Start & Requirements

  • Installation: pip install optimum-benchmark (or with extras for specific backends, e.g., pip install optimum-benchmark[onnxruntime]); see the CLI sketch after this list.
  • Dependencies: Python 3.x; further dependencies depend on the chosen backend (e.g., CUDA for GPU acceleration, dedicated libraries for OpenVINO, TensorRT-LLM, or vLLM).
  • Resources: requires downloading models and datasets; GPU acceleration is recommended for performance testing.
  • Docs: the examples in the repository's examples/ directory.
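
For the Hydra CLI path, a typical invocation points Hydra at a config directory and names a config file; the config name below is illustrative and assumes one of the repository's example configs:

    optimum-benchmark --config-dir examples/ --config-name pytorch_bert

Standard Hydra overrides (e.g., appending backend.device=cpu) can be used to tweak a config from the command line without editing its YAML.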

Highlighted Details

  • Supports benchmarking of Transformers, Diffusers, PEFT, Timm, and Sentence-Transformers.
  • Includes backends for PyTorch, ONNX Runtime (CPU, CUDA, ROCm, TensorRT), Py-TXI, Neural Compressor, TensorRT-LLM, Torch-ORT, OpenVINO, vLLM, and IPEX.
  • Offers detailed tracking for latency, throughput, memory usage, and energy consumption.
  • Features such as "no weights" benchmarking and NUMA node control are available (see the sketch after this list).
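
The "no weights" feature noted above benchmarks a model with randomly initialized weights instead of downloading its checkpoint, which is useful for sizing hardware before committing to a model. A sketch, assuming the no_weights backend flag and the energy scenario flag behave as in recent releases:

    # benchmark gpt2 without downloading its checkpoint, also tracking energy
    backend_config = PyTorchConfig(model="gpt2", device="cuda", no_weights=True)
    scenario_config = InferenceConfig(latency=True, memory=True, energy=True)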

Maintenance & Community

The project is actively developed with a focus on expanding backend and hardware support. Contributions are welcomed, with a clear path outlined in CONTRIBUTING.md.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The project is explicitly noted as a work in progress and not yet ready for production use. Some hardware backends (e.g., Habana Gaudi) are listed as unsupported or under development.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 90 days

Explore Similar Projects

ktransformers by kvcache-ai

Framework for LLM inference optimization experimentation
Top 0.4% · 15k stars · created 1 year ago · updated 2 days ago
Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Michael Han (Cofounder of Unsloth), and 1 more.
TensorRT-LLM by NVIDIA

LLM inference optimization SDK for NVIDIA GPUs
Top 0.6% · 11k stars · created 1 year ago · updated 17 hours ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.
llama.cpp by ggml-org

C/C++ library for local LLM inference
Top 0.4% · 84k stars · created 2 years ago · updated 13 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.