optimum-intel by huggingface

SDK for optimizing Hugging Face Transformers/Diffusers models on Intel

created 3 years ago
481 stars

Top 64.6% on sourcepulse

Project Summary

This library provides an interface between Hugging Face Transformers/Diffusers and Intel's optimization tools (Intel Extension for PyTorch, Intel Neural Compressor, OpenVINO) to accelerate inference on Intel architectures. It targets researchers and developers working with large language models and diffusion models on Intel hardware, enabling significant performance gains through quantization, pruning, and optimized operators.

How It Works

Optimum Intel integrates Intel Extension for PyTorch (IPEX) for optimized operators and attention mechanisms, Intel Neural Compressor (INC) for accuracy-driven quantization and pruning, and OpenVINO for high-performance inference across Intel CPUs, GPUs, and accelerators. It simplifies converting Hugging Face models to OpenVINO IR and applying compression techniques, exposing a unified API for these optimizations.
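
For example, a Transformers checkpoint can be exported to OpenVINO IR at load time and dropped into a standard Hugging Face pipeline. A minimal sketch, assuming an illustrative model ID (OVModelForCausalLM and the export=True flag follow the documented API):

    from transformers import AutoTokenizer, pipeline
    from optimum.intel import OVModelForCausalLM

    model_id = "gpt2"  # illustrative checkpoint
    # export=True converts the PyTorch checkpoint to OpenVINO IR at load time
    model = OVModelForCausalLM.from_pretrained(model_id, export=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # The exported model plugs into the usual pipeline API unchanged
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
    print(pipe("OpenVINO makes inference", max_new_tokens=20)[0]["generated_text"])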

Quick Start & Requirements

  • Installation:
    • OpenVINO: pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
    • INC: pip install --upgrade --upgrade-strategy eager "optimum[neural-compressor]"
    • IPEX: pip install --upgrade --upgrade-strategy eager "optimum[ipex]"
    • From source: pip install git+https://github.com/huggingface/optimum-intel.git
  • Prerequisites: Python and PyTorch; some optimizations require specific Intel hardware.
  • Resources: quantization requires a calibration dataset (see the sketch after this list).
  • Docs: https://huggingface.co/docs/optimum-intel/index
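
The calibration-based (static) quantization workflow with OVQuantizer looks roughly as follows. This is a sketch: the GLUE/SST-2 dataset, checkpoint, and 100-sample calibration set are illustrative, and exact signatures can vary between optimum-intel releases.

    from functools import partial
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from optimum.intel import OVQuantizer

    model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    def preprocess(examples, tokenizer):
        return tokenizer(examples["sentence"], padding="max_length", max_length=128, truncation=True)

    quantizer = OVQuantizer.from_pretrained(model)
    # Small calibration set used to collect activation statistics
    calibration_dataset = quantizer.get_calibration_dataset(
        "glue",
        dataset_config_name="sst2",
        preprocess_function=partial(preprocess, tokenizer=tokenizer),
        num_samples=100,
        dataset_split="train",
    )
    # Apply static post-training quantization and save the OpenVINO model
    quantizer.quantize(calibration_dataset=calibration_dataset, save_directory="ov_model_int8")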

Highlighted Details

  • Supports dynamic and static post-training quantization as well as quantization-aware training, with accuracy criteria.
  • Enables weight-only INT8 quantization and hybrid quantization for Stable Diffusion.
  • Provides OVModelForXxx and IPEXModelForXxx classes for seamless integration with Hugging Face pipelines.
  • Includes a CLI for model export and quantization (optimum-cli).
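
As a sketch of the weight-only INT8 path (the checkpoint name is illustrative; OVWeightQuantizationConfig and the optimum-cli invocation mirror the documented usage, though flags can change between releases):

    from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

    # Export to OpenVINO IR and compress the weights to INT8 in one step
    model = OVModelForCausalLM.from_pretrained(
        "gpt2",  # illustrative checkpoint
        export=True,
        quantization_config=OVWeightQuantizationConfig(bits=8),
    )
    model.save_pretrained("gpt2_ov_int8")

    # Roughly equivalent via the CLI:
    #   optimum-cli export openvino --model gpt2 --weight-format int8 gpt2_ov_int8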

Maintenance & Community

  • Actively developed by Hugging Face and Intel.
  • Examples and notebooks are available in the repository.

Licensing & Compatibility

  • Apache 2.0 License.
  • Compatible with commercial use and closed-source applications.

Limitations & Caveats

Quantization is currently CPU-only. For Gaudi accelerators, Optimum Habana should be used.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 25
  • Issues (30d): 3
  • Star history: 22 stars in the last 90 days

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Michael Han (Cofounder of Unsloth), and 1 more.

Explore Similar Projects

  • ktransformers by kvcache-ai (Top 0.4%, 15k stars): Framework for LLM inference optimization experimentation. Created 1 year ago; updated 2 days ago.