PyTorch library for quantization and sparsity in training/inference
torchao is a PyTorch library for optimizing neural network models through quantization and sparsity, aimed at researchers and engineers who want to improve inference speed and reduce memory footprint for both training and deployment. It offers composable, PyTorch-native tools that integrate seamlessly with torch.compile and FSDP2, enabling significant performance gains with minimal code changes.
How It Works
torchao provides a unified API for applying various quantization and sparsity techniques, including post-training quantization (PTQ) and quantization-aware training (QAT). It leverages PyTorch's dynamic graph capabilities to define custom data types and operations, which are then compiled into efficient kernels. This approach allows for flexible integration with existing PyTorch workflows and hardware accelerators, facilitating custom optimizations without requiring deep C++/CUDA expertise.
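As a concrete illustration, here is a minimal weight-only int8 post-training quantization sketch. It assumes the quantize_ and int8_weight_only entry points found in recent torchao releases, so treat the exact names as version-dependent.

    import torch
    import torch.nn as nn
    # Assumed import path for recent torchao releases; older/newer versions may differ.
    from torchao.quantization import quantize_, int8_weight_only

    model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).eval()

    # Post-training quantization: swap eligible nn.Linear weights to int8 in place.
    quantize_(model, int8_weight_only())

    # torch.compile lowers the quantized ops into fused, efficient kernels.
    model = torch.compile(model)

    with torch.inference_mode():
        out = model(torch.randn(8, 1024))

The same quantize_ call accepts other configurations (for example, int4 weight-only), which is what makes the API composable across techniques.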
Quick Start & Requirements
pip install torchao
(or install from the PyTorch index for specific CUDA versions).
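For a CUDA-specific build, torchao can also be installed from the PyTorch wheel index; the index URL and CUDA tag below are illustrative, so check the torchao README for the exact command matching your environment.

    pip install torchao --index-url https://download.pytorch.org/whl/cu126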
Highlighted Details
- Float8 training support via torchao.float8.
- Works with torch.compile and FSDP2 for seamless composability.
- 8-bit optimizers (e.g., AdamW8bit) and KV cache quantization for long context inference.
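The training-side features in this list compose the same way; below is a minimal sketch, assuming torchao.float8.convert_to_float8_training and an AdamW8bit optimizer (the optimizer has lived under torchao.prototype.low_bit_optim in older releases and torchao.optim more recently).

    import torch
    import torch.nn as nn
    from torchao.float8 import convert_to_float8_training
    # Assumed path; older releases expose this as torchao.prototype.low_bit_optim.AdamW8bit.
    from torchao.optim import AdamW8bit

    model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)).cuda()

    # Swap nn.Linear layers to float8 compute for training (H100-class GPUs).
    convert_to_float8_training(model)

    # 8-bit optimizer state reduces optimizer memory versus fp32 AdamW.
    optimizer = AdamW8bit(model.parameters(), lr=1e-4)

    model = torch.compile(model)  # also composes with FSDP2 for distributed training

    x = torch.randn(16, 4096, device="cuda")
    loss = model(x).float().pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()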
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats