ao by pytorch

PyTorch library for quantization and sparsity in training/inference

Created 1 year ago · 2,223 stars · Top 20.8% on sourcepulse

Project Summary

torchao is a PyTorch library for optimizing neural network models through quantization and sparsity, targeting researchers and engineers seeking to improve inference speed and reduce memory footprint for both training and deployment. It offers composable, PyTorch-native tools that integrate seamlessly with torch.compile and FSDP2, enabling significant performance gains with minimal code changes.

How It Works

torchao provides a unified API for applying various quantization and sparsity techniques, including post-training quantization (PTQ) and quantization-aware training (QAT). It builds on PyTorch tensor subclasses to define custom data types and operations, which torch.compile then lowers into efficient fused kernels. This approach allows flexible integration with existing PyTorch workflows and hardware accelerators, enabling custom optimizations without requiring deep C++/CUDA expertise.
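
The following is a minimal post-training-quantization sketch using the quantize_ and int4_weight_only entry points shown in the project README; the toy model, shapes, and device are placeholders, and exact API names can shift between releases.

    # Minimal PTQ sketch. quantize_ and int4_weight_only come from the torchao
    # README; the toy model and tensor shapes here are placeholders.
    import torch
    from torchao.quantization import quantize_, int4_weight_only

    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.ReLU(),
        torch.nn.Linear(1024, 1024),
    ).to(device="cuda", dtype=torch.bfloat16)

    # Swaps Linear weights for int4 weight-only tensor subclasses, in place.
    quantize_(model, int4_weight_only())

    # torch.compile lowers the dequantize + matmul pattern into fused kernels.
    model = torch.compile(model, mode="max-autotune")
    out = model(torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16))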

Quick Start & Requirements

  • Install via pip: pip install torchao (or from the PyTorch index for specific CUDA versions); a short install check follows this list.
  • Requires the latest stable PyTorch release or a nightly build.
  • Recommended: CUDA 12.4+ for optimal performance.
  • Official Docs: https://github.com/pytorch/torchao
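
To confirm the install, a smoke test like the one below should suffice (torchao.__version__ assumes the standard package version attribute is present):

    # Post-install smoke test; torchao.__version__ assumes standard package metadata.
    import torch
    import torchao

    print("torch:", torch.__version__)
    print("torchao:", torchao.__version__)
    print("CUDA available:", torch.cuda.is_available())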

Highlighted Details

  • Achieves up to 9.5x inference speedups on image segmentation and 10x on language models.
  • Enables up to 1.5x end-to-end speedups for large-scale pretraining with torchao.float8 (a training sketch follows this list).
  • Integrates with torch.compile and FSDP2 for seamless composability.
  • Supports memory-efficient optimizers (e.g., AdamW8bit) and KV cache quantization for long context inference.
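
As a rough illustration of the float8 and low-bit-optimizer items above, the sketch below combines convert_to_float8_training from the README with an 8-bit AdamW. The torchao.optim import path is an assumption that may differ across releases (older versions exposed it under torchao.prototype.low_bit_optim), and float8 training expects recent CUDA hardware.

    # Float8 pretraining sketch with an 8-bit optimizer. convert_to_float8_training
    # is documented in the torchao README; the torchao.optim path for AdamW8bit is
    # an assumption that varies by release. Needs recent CUDA hardware (H100-class).
    import torch
    from torchao.float8 import convert_to_float8_training
    from torchao.optim import AdamW8bit  # assumption: recent releases expose this path

    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 4096),
    ).to("cuda")

    # Swaps Linear layers for float8-training variants, in place.
    convert_to_float8_training(model)

    # 8-bit optimizer states shrink optimizer memory versus fp32 AdamW.
    optimizer = AdamW8bit(model.parameters(), lr=1e-4)

    x = torch.randn(16, 4096, device="cuda")
    loss = model(x).square().mean()
    loss.backward()
    optimizer.step()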

Maintenance & Community

  • Developed by the PyTorch team, with contributions from the broader community.
  • Integrations with Hugging Face Transformers, Diffusers, and TorchTune.
  • Active development with ongoing alpha features.

Licensing & Compatibility

  • Released under the BSD 3-Clause license.
  • Permissive license suitable for commercial use and closed-source applications.

Limitations & Caveats

  • Some advanced features like Int8 Quantized Training and smaller intX dtypes are marked as prototype, with performance benchmarks still under development.
  • Custom kernel development requires understanding PyTorch's compilation and extension mechanisms.

Health Check

  • Last commit: 22 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 171
  • Issues (30d): 26
  • Star History: 222 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Sualeh Asif (Cofounder of Cursor), and 1 more.

attorch by BobMcDear

  • 0.3% · 564 stars
  • PyTorch nn module subset, implemented in Python using Triton
  • Created 2 years ago, updated 2 days ago
  • Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Zhuohan Li (Author of vLLM), and 6 more.

torchtitan by pytorch

  • 0.9% · 4k stars
  • PyTorch platform for generative AI model training research
  • Created 1 year ago, updated 22 hours ago
  • Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Lianmin Zheng (Author of SGLang), and 13 more.

gpt-fast by pytorch-labs

  • 0.1% · 6k stars
  • PyTorch text generation for efficient transformer inference
  • Created 1 year ago, updated 3 months ago