fastformers by microsoft

NLU optimization recipes for transformer models

Created 5 years ago
707 stars

Top 48.3% on SourcePulse

View on GitHub
Project Summary

FastFormers provides methods and recipes for highly efficient Transformer model inference on Natural Language Understanding (NLU) tasks. It targets researchers and engineers seeking significant speed-ups on CPU and GPU, demonstrating up to 233x speed-up on CPU for Transformer architectures built on multi-head self-attention.

How It Works

The project leverages techniques like knowledge distillation, structured pruning (reducing heads and FFN dimensions), and 8-bit integer quantization via ONNX Runtime for CPU optimization. For GPU, it supports 16-bit floating-point precision. The core approach involves creating smaller, faster student models from larger teacher models, often with modifications to activation functions and architectural elements.
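FastFormers ships its own scripts for distillation, pruning, and quantization; the sketch below is only a rough, generic illustration of the CPU quantization step, not the repository's pipeline. It exports a placeholder student model to ONNX and applies ONNX Runtime's dynamic 8-bit quantization; the model name, file names, and opset version are assumptions.

```python
# Generic sketch (not FastFormers' own scripts): export a distilled student
# model to ONNX, then apply 8-bit dynamic quantization for CPU inference.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from onnxruntime.quantization import quantize_dynamic, QuantType

model_name = "distilbert-base-uncased"  # placeholder student model, not a FastFormers checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()
model.config.return_dict = False  # return tuples so torch.onnx.export can trace the outputs

# Record the graph once with dummy inputs, keeping batch and sequence length dynamic.
dummy = tokenizer("hello world", return_tensors="pt")
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "student.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "attention_mask": {0: "batch", 1: "seq"}},
    opset_version=12,
)

# Quantize weights to 8-bit integers; activations are quantized dynamically at runtime.
quantize_dynamic("student.onnx", "student-int8.onnx", weight_type=QuantType.QInt8)
```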

Quick Start & Requirements

  • Installation: pip install onnxruntime==1.8.0 --user --upgrade --no-deps --force-reinstall; pip uninstall transformers -y; git clone https://github.com/microsoft/fastformers; cd fastformers; pip install .
  • Prerequisites: Linux OS, Python 3.6/3.7, PyTorch 1.5.0+, ONNX Runtime 1.8.0+. CPU requires AVX2/AVX512 (AVX512 recommended for full speed). GPU requires Volta or later for 16-bit float support.
  • Demo: requires downloading the SuperGLUE dataset and demo model files; a generic ONNX Runtime inference sketch follows this list.
  • Docs: FastFormers Paper
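
Once a quantized ONNX model exists, CPU inference runs through a standard ONNX Runtime session. The sketch below is a minimal, generic example; the file name, tokenizer, and example sentence are placeholders carried over from the sketch above, not files provided by the SuperGLUE demo.

```python
# Generic sketch: CPU inference with ONNX Runtime on the quantized model.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")  # placeholder tokenizer
session = ort.InferenceSession("student-int8.onnx", providers=["CPUExecutionProvider"])

enc = tokenizer("FastFormers targets efficient NLU inference.", return_tensors="np")
logits = session.run(
    ["logits"],
    {"input_ids": enc["input_ids"].astype(np.int64),
     "attention_mask": enc["attention_mask"].astype(np.int64)},
)[0]
print(logits.argmax(axis=-1))  # predicted class index per input
```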

Highlighted Details

  • Claims up to 233.87x speed-up on CPU for Transformer architectures.
  • Supports knowledge distillation, pruning, and 8-bit quantization.
  • Integrates with Hugging Face Transformers and ONNX Runtime.
  • Reproducible results from the FastFormers paper are available.

Maintenance & Community

  • Developed in collaboration with the Hugging Face and ONNX Runtime teams.
  • Adopted the Microsoft Open Source Code of Conduct.

Licensing & Compatibility

  • MIT License. Permissive for commercial use and closed-source linking.

Limitations & Caveats

  • Currently supports only Linux operating systems.
  • Requires uninstalling any existing transformers package because FastFormers installs its own customized version of the library.
  • GPU 16-bit float optimization requires specific hardware (Volta+).
Health Check

  • Last Commit: 6 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 0 stars in the last 30 days

Explore Similar Projects

Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM

  • Transformer library for flexible model development
  • 1k stars · 0.3%
  • Created 4 years ago · Updated 8 months ago
Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 5 more.

matmulfreellm by ridgerchu

  • MatMul-free language models
  • 3k stars · 0.0%
  • Created 1 year ago · Updated 1 month ago
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA

  • Optimized transformer library for inference
  • 6k stars · 0.1%
  • Created 4 years ago · Updated 1 year ago