Optimum Intel: SDK for optimizing Hugging Face Transformers/Diffusers models on Intel hardware
This library provides an interface between Hugging Face Transformers/Diffusers and Intel's optimization tools (Intel Extension for PyTorch, Intel Neural Compressor, OpenVINO) to accelerate inference on Intel architectures. It targets researchers and developers working with large language models and diffusion models on Intel hardware, enabling significant performance gains through quantization, pruning, and optimized operators.
How It Works
Optimum Intel integrates Intel Extension for PyTorch (IPEX) for optimized operators and attention mechanisms, Intel Neural Compressor (INC) for accuracy-driven quantization and pruning, and OpenVINO for high-performance inference across Intel CPUs, GPUs, and accelerators. It simplifies the process of converting Hugging Face models to OpenVINO IR and applying compression techniques, offering a unified API for these optimizations.
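As a minimal sketch of the OpenVINO path (the checkpoint name is only an illustrative example), exporting a Transformers model to OpenVINO IR happens directly at load time:

```python
# Minimal sketch: convert a Hugging Face checkpoint to OpenVINO IR.
# "gpt2" is an example; any supported Transformers model ID works.
from optimum.intel import OVModelForCausalLM

# export=True runs the one-time PyTorch -> OpenVINO IR conversion
model = OVModelForCausalLM.from_pretrained("gpt2", export=True)
model.save_pretrained("gpt2-openvino")  # persist the IR files for reuse
```

Loading the saved directory later skips the export step entirely.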
Quick Start & Requirements
```bash
# Each extra installs one backend; pick the ones you need
pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
pip install --upgrade --upgrade-strategy eager "optimum[neural-compressor]"
pip install --upgrade --upgrade-strategy eager "optimum[ipex]"

# Latest development version from source
pip install git+https://github.com/huggingface/optimum-intel.git
```
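As a hedged example of applying compression at load time (the model ID and bit-width are illustrative; assumes the openvino extra is installed), weight-only quantization can be requested during export:

```python
# Sketch: export with 8-bit weight-only quantization in one step.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

q_config = OVWeightQuantizationConfig(bits=8)  # 4-bit weights are also supported
model = OVModelForCausalLM.from_pretrained(
    "gpt2", export=True, quantization_config=q_config
)
```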
Highlighted Details
- OVModelForXxx and IPEXModelForXxx classes for seamless integration with Hugging Face pipelines (see the sketch below).
- Command-line model export and compression (optimum-cli).
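For instance (a sketch; the task and checkpoint are placeholders), an OVModelForXxx class drops directly into a standard Transformers pipeline:

```python
from transformers import AutoTokenizer, pipeline
from optimum.intel import OVModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
# export=True converts the checkpoint to OpenVINO IR before inference
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum Intel makes inference fast."))
```

The same export is available from the command line; the output directory name here is arbitrary:

```bash
optimum-cli export openvino --model distilbert-base-uncased-finetuned-sst-2-english ov_model
```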
Maintenance & Community
Developed on GitHub (huggingface/optimum-intel) by Hugging Face in collaboration with Intel.
Licensing & Compatibility
Apache 2.0 licensed. Each optimization path requires its corresponding Intel backend (IPEX, INC, or OpenVINO) to be installed.
Limitations & Caveats
Quantization is currently CPU-only. For Gaudi accelerators, Optimum Habana should be used.