optimum by huggingface

Hardware optimization tools for Transformers, Diffusers, etc.

Created 4 years ago
3,086 stars

Top 15.5% on SourcePulse

Project Summary

Hugging Face Optimum provides a unified toolkit for accelerating the inference and training of Hugging Face Transformers, Diffusers, TIMM, and Sentence Transformers models across diverse hardware backends. It targets researchers and engineers seeking to maximize model performance on specific hardware without sacrificing ease of use.

How It Works

Optimum acts as an extension layer, abstracting hardware-specific optimizations. It supports exporting models to various formats like ONNX, ExecuTorch, TensorFlow Lite, and OpenVINO, enabling efficient execution via optimized runtimes. For training, it offers wrappers around the Hugging Face Trainer to leverage specialized hardware accelerators.
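
As an illustration of the export path, the sketch below converts a Transformers checkpoint to ONNX and runs it through ONNX Runtime. This is a minimal sketch, not the project's canonical example: it assumes the optimum[onnxruntime] extra is installed and uses a public sentiment-analysis checkpoint purely for illustration.

    from transformers import AutoTokenizer
    from optimum.onnxruntime import ORTModelForSequenceClassification

    model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative checkpoint

    # export=True converts the PyTorch weights to ONNX on the fly and
    # loads the result into an ONNX Runtime inference session.
    model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    inputs = tokenizer("Optimum makes deployment easier.", return_tensors="pt")
    print(model(**inputs).logits)

    # Save the exported model so later loads skip the conversion step.
    model.save_pretrained("onnx_model/")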

Quick Start & Requirements

  • Install the core package: python -m pip install optimum
  • Install accelerator-specific extras: python -m pip install "optimum[accelerator_type]" (e.g., optimum[onnxruntime], optimum[executorch], optimum[neural-compressor], optimum[openvino], optimum[amd], optimum[neuronx], optimum[habana]). Quote the brackets so shells such as zsh do not expand them. A minimal usage check follows this list.
  • NVIDIA TensorRT-LLM requires Docker: docker run -it --gpus all --ipc host huggingface/optimum-nvidia.
  • Full documentation: https://huggingface.co/docs/optimum/index
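
As a quick sanity check after installing an extra, an exported model can be dropped into a standard transformers pipeline. A minimal sketch, assuming optimum[onnxruntime] is installed (the checkpoint name is illustrative):

    from transformers import AutoTokenizer, pipeline
    from optimum.onnxruntime import ORTModelForSequenceClassification

    model_id = "distilbert-base-uncased-finetuned-sst-2-english"
    model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # ORT models act as drop-in replacements inside transformers pipelines.
    classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
    print(classifier("Installation looks good."))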

Highlighted Details

  • Supports ONNX, ExecuTorch, TensorFlow Lite, OpenVINO, Intel Neural Compressor, NVIDIA TensorRT-LLM, AMD Instinct, AWS Trainium/Inferentia, and Habana Gaudi.
  • Enables programmatic and CLI-based model export and optimization.
  • Provides wrappers for accelerated training on specialized hardware.
  • Integrates with PyTorch's native edge solution (ExecuTorch) and quantization tools such as Quanto; see the quantization sketch after this list.
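
As a concrete example of the quantization tooling, the ONNX Runtime backend ships a quantizer. The sketch below applies dynamic int8 quantization and assumes optimum[onnxruntime] is installed; the avx512_vnni preset targets recent Intel CPUs, and other AutoQuantizationConfig presets cover other targets.

    from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
    from optimum.onnxruntime.configuration import AutoQuantizationConfig

    model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative checkpoint

    # Export to ONNX first; quantization operates on the exported graph.
    model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
    quantizer = ORTQuantizer.from_pretrained(model)

    # Dynamic int8 quantization; the preset picks operator settings for AVX512-VNNI.
    qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
    quantizer.quantize(save_dir="onnx_quantized/", quantization_config=qconfig)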

Maintenance & Community

  • Developed by Hugging Face.
  • Community support channels and documentation are available via Hugging Face.

Licensing & Compatibility

  • Primarily Apache 2.0 licensed, consistent with the rest of the Hugging Face ecosystem.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

The README indicates that specific accelerator integrations may require separate installation steps or Docker images; users should consult the documentation for each backend's detailed prerequisites. Some integrations are at earlier stages of development than others.

Health Check

  • Last Commit: 3 days ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 22
  • Issues (30d): 15

Star History

66 stars in the last 30 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Lewis Tunstall (Research Engineer at Hugging Face), and 4 more.

fastformers by microsoft

0% · 707 stars
NLU optimization recipes for transformer models
Created 5 years ago · Updated 6 months ago
Starred by Luis Capelo (Cofounder of Lightning AI), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 4 more.

ktransformers by kvcache-ai

0.3% · 15k stars
Framework for LLM inference optimization experimentation
Created 1 year ago · Updated 2 days ago
Starred by Clement Delangue (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 20 more.

accelerate by huggingface

0.3% · 9k stars
PyTorch training helper for distributed execution
Created 4 years ago · Updated 1 day ago