optimum-intel by huggingface

SDK for optimizing Hugging Face Transformers/Diffusers models on Intel

Created 3 years ago
493 stars

Top 62.7% on SourcePulse

Project Summary

This library provides an interface between Hugging Face Transformers/Diffusers and Intel's optimization tools (Intel Extension for PyTorch, Intel Neural Compressor, OpenVINO) to accelerate inference on Intel architectures. It targets researchers and developers working with large language models and diffusion models on Intel hardware, enabling significant performance gains through quantization, pruning, and optimized operators.

How It Works

Optimum Intel integrates Intel Extension for PyTorch (IPEX) for optimized operators and attention mechanisms, Intel Neural Compressor (INC) for accuracy-driven quantization and pruning, and OpenVINO for high-performance inference across Intel CPUs, GPUs, and accelerators. It simplifies the process of converting Hugging Face models to OpenVINO IR and applying compression techniques, offering a unified API for these optimizations.

Quick Start & Requirements

  • Installation:
    • OpenVINO: pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
    • INC: pip install --upgrade --upgrade-strategy eager "optimum[neural-compressor]"
    • IPEX: pip install --upgrade --upgrade-strategy eager "optimum[ipex]"
    • From source: pip install git+https://github.com/huggingface/optimum-intel.git
  • Prerequisites: Python and PyTorch; some optimizations require particular Intel hardware.
  • Resources: Quantization requires a calibration dataset.
  • Docs: https://huggingface.co/docs/optimum-intel/index

Highlighted Details

  • Supports dynamic, static, and quantization-aware training approaches with accuracy-driven tuning criteria.
  • Enables weight-only INT8 quantization and hybrid quantization for Stable Diffusion.
  • Provides OVModelForXxx and IPEXModelForXxx classes for seamless integration with Hugging Face pipelines.
  • Includes a CLI for model export and quantization (optimum-cli).

Maintenance & Community

  • Actively developed by Hugging Face and Intel.
  • Examples and notebooks are available in the repository.

Licensing & Compatibility

  • Apache 2.0 License.
  • Compatible with commercial use and closed-source applications.

Limitations & Caveats

Quantization is currently CPU-only. For Gaudi accelerators, Optimum Habana should be used.

Health Check

  • Last Commit: 6 days ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 18
  • Issues (30d): 3
  • Star History: 10 stars in the last 30 days
