PyTorch library for quantization and sparsity in training/inference
torchao is a PyTorch library for optimizing neural network models through quantization and sparsity, aimed at researchers and engineers who want to improve inference speed and reduce memory footprint for both training and deployment. It offers composable, PyTorch-native tools that integrate seamlessly with torch.compile and FSDP2, enabling significant performance gains with minimal code changes.
How It Works
torchao provides a unified API for applying various quantization and sparsity techniques, including post-training quantization (PTQ) and quantization-aware training (QAT). It leverages PyTorch's dynamic graph capabilities to define custom data types and operations, which are then compiled into efficient kernels. This approach allows for flexible integration with existing PyTorch workflows and hardware accelerators, facilitating custom optimizations without requiring deep C++/CUDA expertise.
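As a concrete illustration, here is a minimal weight-only int8 post-training quantization sketch. It assumes the quantize_ and int8_weight_only entry points found in recent torchao releases, so treat the exact names as version-dependent.

    import torch
    import torch.nn as nn
    # Assumed import path for recent torchao releases; older/newer versions may differ.
    from torchao.quantization import quantize_, int8_weight_only

    model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).eval()

    # Post-training quantization: swap eligible nn.Linear weights to int8 in place.
    quantize_(model, int8_weight_only())

    # torch.compile lowers the quantized ops into fused, efficient kernels.
    model = torch.compile(model)

    with torch.inference_mode():
        out = model(torch.randn(8, 1024))

The same quantize_ call accepts other configurations (for example, int4 weight-only), which is what makes the API composable across techniques.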
Quick Start & Requirements
pip install torchao
(or install from the PyTorch index for specific CUDA versions).
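For a CUDA-specific build, torchao can also be installed from the PyTorch wheel index; the index URL and CUDA tag below are illustrative, so check the torchao README for the exact command matching your environment.

    pip install torchao --index-url https://download.pytorch.org/whl/cu126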
Highlighted Details
- Float8 training support via torchao.float8.
- Works with torch.compile and FSDP2 for seamless composability.
- 8-bit optimizers (e.g., AdamW8bit) and KV cache quantization for long context inference.
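The training-side features in this list compose the same way; below is a minimal sketch, assuming torchao.float8.convert_to_float8_training and an AdamW8bit optimizer (the optimizer has lived under torchao.prototype.low_bit_optim in older releases and torchao.optim more recently).

    import torch
    import torch.nn as nn
    from torchao.float8 import convert_to_float8_training
    # Assumed path; older releases expose this as torchao.prototype.low_bit_optim.AdamW8bit.
    from torchao.optim import AdamW8bit

    model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)).cuda()

    # Swap nn.Linear layers to float8 compute for training (H100-class GPUs).
    convert_to_float8_training(model)

    # 8-bit optimizer state reduces optimizer memory versus fp32 AdamW.
    optimizer = AdamW8bit(model.parameters(), lr=1e-4)

    model = torch.compile(model)  # also composes with FSDP2 for distributed training

    x = torch.randn(16, 4096, device="cuda")
    loss = model(x).float().pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()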
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats