ao by pytorch

PyTorch library for quantization and sparsity in training/inference

Created 2 years ago
2,481 stars

Top 18.7% on SourcePulse

View on GitHub
Project Summary

torchao is a PyTorch library for optimizing neural network models through quantization and sparsity, targeting researchers and engineers who want faster execution and a smaller memory footprint in both training and inference. It offers composable, PyTorch-native tools that integrate seamlessly with torch.compile and FSDP2, enabling significant performance gains with minimal code changes.
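A minimal sketch of what this looks like in practice, assuming the quantize_ API and int8_weight_only config from torchao.quantization (newer releases also spell the config Int8WeightOnlyConfig):

    # Post-training int8 weight-only quantization composed with torch.compile.
    # API names follow the torchao README; check your installed version,
    # since config names have been renamed across releases.
    import torch
    from torchao.quantization import quantize_, int8_weight_only

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.ReLU(),
        torch.nn.Linear(1024, 1024),
    ).to(device).eval()

    quantize_(model, int8_weight_only())  # swap Linear weights to int8 in place
    model = torch.compile(model)          # composes with the compiler stack

    with torch.no_grad():
        out = model(torch.randn(8, 1024, device=device))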

How It Works

torchao provides a unified API for applying various quantization and sparsity techniques, including post-training quantization (PTQ) and quantization-aware training (QAT). It builds on PyTorch-native mechanisms such as tensor subclasses to define custom data types and layouts, which torch.compile can then lower into efficient kernels. This approach integrates flexibly with existing PyTorch workflows and hardware accelerators, enabling custom optimizations without deep C++/CUDA expertise.
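As a rough illustration of the QAT flow, the sketch below follows the prepare/convert pattern from the repo's QAT recipes; the quantizer name is real, but its import path has moved across releases (older versions expose it under torchao.quantization.prototype.qat), so treat the path as an assumption:

    # QAT sketch: train with fake quantization, then lower to quantized ops.
    # Import path is an assumption -- it has varied across torchao releases.
    import torch
    from torchao.quantization.qat import Int8DynActInt4WeightQATQuantizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Sequential(torch.nn.Linear(512, 512)).to(device)

    qat = Int8DynActInt4WeightQATQuantizer()
    model = qat.prepare(model)  # insert fake-quantize ops for training

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(10):  # fine-tune while simulating quantization error
        loss = model(torch.randn(8, 512, device=device)).pow(2).mean()
        loss.backward()
        opt.step()
        opt.zero_grad()

    model = qat.convert(model)  # replace fake-quant with real quantized ops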

Quick Start & Requirements

  • Install via pip: pip install torchao (or from the PyTorch package index for specific CUDA versions); see the sanity check after this list.
  • Requires the latest stable PyTorch release (or nightly for the newest features).
  • Recommended: CUDA 12.4+ for optimal performance.
  • Docs & source: https://github.com/pytorch/torchao
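A quick post-install sanity check (assumes torchao exposes __version__, which recent releases do):

    # Confirm the install and that PyTorch can see the CUDA toolchain.
    import torch
    import torchao  # __version__ attribute assumed present

    print("torch:", torch.__version__, "| torchao:", torchao.__version__)
    print("CUDA available:", torch.cuda.is_available(),
          "| CUDA version:", torch.version.cuda)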

Highlighted Details

  • Achieves up to 9.5x inference speedups on image segmentation and 10x on language models.
  • Enables up to 1.5x end-to-end speedups for large-scale pretraining with torchao.float8.
  • Integrates with torch.compile and FSDP2 for seamless composability.
  • Supports memory-efficient optimizers (e.g., AdamW8bit) and KV cache quantization for long-context inference (see the sketch after this list).
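A sketch of the optimizer swap mentioned above; the import path is an assumption, since the low-bit optimizers have moved from torchao.prototype.low_bit_optim to torchao.optim across releases:

    # 8-bit AdamW as a drop-in replacement for torch.optim.AdamW.
    # Import path is an assumption -- it has varied across torchao releases.
    import torch
    from torchao.optim import AdamW8bit

    model = torch.nn.Linear(1024, 1024).cuda()   # low-bit optim targets CUDA
    opt = AdamW8bit(model.parameters(), lr=1e-4)  # same call signature as AdamW

    for _ in range(3):
        loss = model(torch.randn(8, 1024, device="cuda")).pow(2).mean()
        loss.backward()
        opt.step()
        opt.zero_grad()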

Maintenance & Community

  • Developed by the PyTorch team, with contributions from the broader community.
  • Integrations with Hugging Face Transformers, Diffusers, and TorchTune.
  • Under active development, with some features still in alpha.

Licensing & Compatibility

  • Released under the BSD 3-Clause license.
  • Permissive license suitable for commercial use and closed-source applications.

Limitations & Caveats

  • Some advanced features like Int8 Quantized Training and smaller intX dtypes are marked as prototype, with performance benchmarks still under development.
  • Custom kernel development requires understanding PyTorch's compilation and extension mechanisms.
Health Check

  • Last commit: 9 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 160
  • Issues (30d): 36

Star History

  • 104 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Zack Li (Cofounder of Nexa AI), and 4 more.

smoothquant by mit-han-lab

0.4% · 2k stars
Post-training quantization research paper for large language models
Created 3 years ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Casper Hansen (Author of AutoAWQ), and 3 more.

deepsparse by neuralmagic

0.1% · 3k stars
CPU inference runtime for sparse deep learning models
Created 4 years ago
Updated 5 months ago