fastformers by microsoft

NLU optimization recipes for transformer models

created 5 years ago
705 stars

Top 49.5% on sourcepulse

Project Summary

FastFormers provides methods and recipes for highly efficient Transformer model inference for Natural Language Understanding (NLU) tasks. It targets researchers and engineers seeking significant speed-ups on CPU and GPU, demonstrating up to 233x speed improvement on CPU for multi-head self-attention architectures.

How It Works

The project leverages techniques like knowledge distillation, structured pruning (reducing heads and FFN dimensions), and 8-bit integer quantization via ONNX Runtime for CPU optimization. For GPU, it supports 16-bit floating-point precision. The core approach involves creating smaller, faster student models from larger teacher models, often with modifications to activation functions and architectural elements.
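
As a rough sketch of the CPU quantization step only (not the project's full recipe), ONNX Runtime's dynamic quantization API converts the weights of an already-exported ONNX model to 8-bit integers. The file names below are placeholders, not files shipped by FastFormers:

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Convert the weights of an exported ONNX model to signed 8-bit integers.
# "model.onnx" and "model-int8.onnx" are placeholder paths for illustration.
quantize_dynamic(
    model_input="model.onnx",
    model_output="model-int8.onnx",
    weight_type=QuantType.QInt8,
)
```

The quantized model can then be served with an ONNX Runtime session on a CPU with AVX2/AVX512 support; the GPU path instead casts the PyTorch model to 16-bit floats.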

Quick Start & Requirements

  • Installation: (1) pip install onnxruntime==1.8.0 --user --upgrade --no-deps --force-reinstall; (2) pip uninstall transformers -y; (3) git clone https://github.com/microsoft/fastformers; (4) cd fastformers; (5) pip install .
  • Prerequisites: Linux OS, Python 3.6/3.7, PyTorch 1.5.0+, ONNX Runtime 1.8.0+. CPU requires AVX2/AVX512 (AVX512 recommended for full speed). GPU requires Volta or later for 16-bit float support.
  • Demo: Requires downloading the SuperGLUE dataset and demo model files (a minimal inference sketch follows this list).
  • Docs: FastFormers Paper
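
The demo scripts themselves are not reproduced here. As a rough illustration of running a distilled classification model, a standard Hugging Face-style call looks like the sketch below; note that FastFormers ships its own customized transformers fork, so the actual entry points may differ, and the model path is a placeholder:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder path; in practice this would point at one of the downloaded demo models.
model_dir = "path/to/fastformers-demo-model"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir).eval()

inputs = tokenizer("FastFormers speeds up NLU inference.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))
```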

Highlighted Details

  • Claims up to 233.87x speed-up on CPU for Transformer architectures.
  • Supports knowledge distillation, structured pruning, and 8-bit quantization (a distillation-loss sketch follows this list).
  • Integrates with Hugging Face Transformers and ONNX Runtime.
  • Reproducible results from the FastFormers paper are available.
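
As background on the distillation bullet above, a typical soft-target distillation loss blends a temperature-scaled KL divergence against the teacher's logits with cross-entropy on the gold labels. This is a generic sketch, not the exact objective used in the FastFormers paper:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target KL loss (teacher) with hard-label cross-entropy."""
    # Temperature-scaled distributions; multiplying by T^2 keeps gradient
    # magnitudes comparable to the hard-label term.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Tiny usage example with random logits for a 3-class task.
student = torch.randn(4, 3, requires_grad=True)
teacher = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
print(distillation_loss(student, teacher, labels))
```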

Maintenance & Community

  • Developed in collaboration with the Hugging Face and ONNX Runtime teams.
  • Adopted the Microsoft Open Source Code of Conduct.

Licensing & Compatibility

  • MIT License. Permissive for commercial use and closed-source linking.

Limitations & Caveats

  • Currently supports only Linux operating systems.
  • Requires uninstalling any existing transformers package, since FastFormers ships a customized version.
  • GPU 16-bit float optimization requires specific hardware (Volta or newer); a quick capability check is sketched below.
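
A quick way to check whether a machine meets these hardware requirements; this sketch assumes a Linux host (it reads /proc/cpuinfo) and an installed PyTorch:

```python
import torch

# CPU: check for AVX2 / AVX512 instruction flags (Linux only).
with open("/proc/cpuinfo") as f:
    flags = f.read()
print("AVX2:", "avx2" in flags)
print("AVX512:", "avx512f" in flags)

# GPU: Volta and newer GPUs report CUDA compute capability major >= 7,
# which is what the 16-bit float path needs.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print("FP16-capable GPU (Volta+):", major >= 7)
else:
    print("No CUDA GPU detected")
```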

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
Star History

1 star in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jaret Burkett (Founder of Ostris), and 1 more.

nunchaku by nunchaku-tech

2.1% · 3k stars
High-performance 4-bit diffusion model inference engine
created 8 months ago, updated 15 hours ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai

0.2% · 40k stars
Deep learning optimization library for distributed training and inference
created 5 years ago, updated 1 day ago