Optimum Intel: SDK for optimizing Hugging Face Transformers/Diffusers models on Intel hardware
This library provides an interface between Hugging Face Transformers/Diffusers and Intel's optimization tools (Intel Extension for PyTorch, Intel Neural Compressor, OpenVINO) to accelerate inference on Intel architectures. It targets researchers and developers working with large language models and diffusion models on Intel hardware, enabling significant performance gains through quantization, pruning, and optimized operators.
How It Works
Optimum Intel integrates Intel Extension for PyTorch (IPEX) for optimized operators and attention mechanisms, Intel Neural Compressor (INC) for accuracy-driven quantization and pruning, and OpenVINO for high-performance inference across Intel CPUs, GPUs, and accelerators. It simplifies the process of converting Hugging Face models to OpenVINO IR and applying compression techniques, offering a unified API for these optimizations.
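As a minimal sketch of the OpenVINO path (the checkpoint name is only an illustrative example), exporting a Transformers model to OpenVINO IR happens directly at load time:

```python
# Minimal sketch: convert a Hugging Face checkpoint to OpenVINO IR.
# "gpt2" is an example; any supported Transformers model ID works.
from optimum.intel import OVModelForCausalLM

# export=True runs the one-time PyTorch -> OpenVINO IR conversion
model = OVModelForCausalLM.from_pretrained("gpt2", export=True)
model.save_pretrained("gpt2-openvino")  # persist the IR files for reuse
```

Loading the saved directory later skips the export step entirely.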
Quick Start & Requirements
```bash
# Each extra installs one backend; pick the ones you need
pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
pip install --upgrade --upgrade-strategy eager "optimum[neural-compressor]"
pip install --upgrade --upgrade-strategy eager "optimum[ipex]"

# Latest development version from source
pip install git+https://github.com/huggingface/optimum-intel.git
```
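As a hedged example of applying compression at load time (the model ID and bit-width are illustrative; assumes the openvino extra is installed), weight-only quantization can be requested during export:

```python
# Sketch: export with 8-bit weight-only quantization in one step.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

q_config = OVWeightQuantizationConfig(bits=8)  # 4-bit weights are also supported
model = OVModelForCausalLM.from_pretrained(
    "gpt2", export=True, quantization_config=q_config
)
```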
Highlighted Details
- OVModelForXxx and IPEXModelForXxx classes for seamless integration with Hugging Face pipelines (see the sketch below).
- Command-line model export and compression (optimum-cli).
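For instance (a sketch; the task and checkpoint are placeholders), an OVModelForXxx class drops directly into a standard Transformers pipeline:

```python
from transformers import AutoTokenizer, pipeline
from optimum.intel import OVModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
# export=True converts the checkpoint to OpenVINO IR before inference
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum Intel makes inference fast."))
```

The same export is available from the command line; the output directory name here is arbitrary:

```bash
optimum-cli export openvino --model distilbert-base-uncased-finetuned-sst-2-english ov_model
```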
Maintenance & Community
Developed on GitHub (huggingface/optimum-intel) by Hugging Face in collaboration with Intel.
Licensing & Compatibility
Apache 2.0 licensed. Each optimization path requires its corresponding Intel backend (IPEX, INC, or OpenVINO) to be installed.
Limitations & Caveats
Quantization is currently CPU-only. For Gaudi accelerators, Optimum Habana should be used.