pytorch/ao: PyTorch library for quantization and sparsity in training/inference
Top 18.7% on SourcePulse
torchao is a PyTorch library for optimizing neural network models through quantization and sparsity, aimed at researchers and engineers who want faster execution and smaller memory footprints in both training and deployment. It offers composable, PyTorch-native tools that integrate with torch.compile and FSDP2, enabling significant performance gains with minimal code changes.
How It Works
torchao provides a unified API for applying quantization and sparsity techniques, including post-training quantization (PTQ) and quantization-aware training (QAT). It uses PyTorch tensor subclasses to define custom data types and operations, which torch.compile then lowers into efficient kernels. This approach allows flexible integration with existing PyTorch workflows and hardware accelerators, enabling custom optimizations without deep C++/CUDA expertise.
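As a concrete illustration, here is a minimal PTQ sketch. It assumes a recent torchao release (the int8_weight_only config is spelled Int8WeightOnlyConfig in newer versions), and the toy model is purely illustrative:

```python
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only

# Toy model standing in for a real network.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).eval()

# Swap Linear weights to int8 in place; quantize_ mutates the model.
quantize_(model, int8_weight_only())

# Composes with torch.compile, which fuses the low-bit kernels.
model = torch.compile(model)

with torch.inference_mode():
    out = model(torch.randn(8, 1024))
```

QAT follows a similar flow via the torchao.quantization.qat module.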
Quick Start & Requirements
pip install torchao (or from the PyTorch wheel index for specific CUDA versions).
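After installing, a first sparsity run might look like the sketch below. It assumes a recent torchao release, an Ampere-or-newer NVIDIA GPU, and half-precision weights; the hand-rolled 2:4 pruning loop is illustrative only (a real flow would use a proper pruner so accuracy is preserved):

```python
import torch
import torch.nn as nn
from torchao.sparsity import sparsify_, semi_sparse_weight

model = nn.Sequential(nn.Linear(2048, 2048), nn.ReLU()).half().cuda().eval()

# Crude magnitude pruning to a 2:4 pattern (keep the top-2 of every
# 4 weights) so the semi-structured conversion below is valid.
with torch.no_grad():
    for m in model.modules():
        if isinstance(m, nn.Linear):
            groups = m.weight.reshape(-1, 4)
            idx = groups.abs().topk(2, dim=1).indices
            mask = torch.zeros_like(groups).scatter_(1, idx, 1)
            m.weight.copy_((groups * mask).reshape(m.weight.shape))

# Replace Linear weights with accelerated 2:4 semi-structured sparse tensors.
sparsify_(model, semi_sparse_weight())

with torch.inference_mode():
    out = model(torch.randn(32, 2048, device="cuda", dtype=torch.half))
```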
Highlighted Details
- Float8 training support via torchao.float8.
- Integrates with torch.compile and FSDP2 for seamless composability.
- Low-bit optimizers (e.g., AdamW8bit) and KV cache quantization for long context inference (see the training sketch after this list).
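To make the training-side features concrete, here is a hedged sketch combining float8 training with an 8-bit optimizer. It assumes an SM89-or-newer GPU (e.g., H100) for float8 matmuls and a recent torchao release; in older releases AdamW8bit lives under torchao.prototype.low_bit_optim:

```python
import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training
from torchao.optim import AdamW8bit  # older releases: torchao.prototype.low_bit_optim

# Toy training setup; float8 matmuls need an SM89+ GPU (e.g., H100).
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)).cuda()

# Swap eligible nn.Linear modules to float8 compute for forward/backward.
convert_to_float8_training(model)

# 8-bit optimizer state cuts optimizer memory roughly 4x vs fp32 AdamW.
optim = AdamW8bit(model.parameters(), lr=1e-4)

model = torch.compile(model)  # torch.compile generates the fused float8 kernels

x = torch.randn(16, 4096, device="cuda")
model(x).pow(2).mean().backward()
optim.step()
optim.zero_grad()
```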
Maintenance & Community

Licensing & Compatibility
Limitations & Caveats