Model compression and acceleration toolbox
Sparsebit is a PyTorch-based toolkit for model compression and acceleration, offering pruning and quantization capabilities. It targets researchers and engineers seeking to reduce model size and inference latency with minimal code changes, supporting both Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT).
How It Works
Sparsebit leverages torch.fx to transform PyTorch models into a QuantModel, where operations become QuantModules. This modular design allows for easy extension of quantization methods, observers, and modules. For pruning, it supports structured and unstructured pruning across various model components (weights, activations, layers) using algorithms such as L1/L0 norm, Fisher pruning, and HRank, with ONNX export for pruned models.
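To make the quantization flow concrete, here is a minimal, framework-free sketch of what a min/max observer and fake quantization do during PTQ calibration. All names (`MinMaxObserver`, `fake_quantize`) are illustrative assumptions, not Sparsebit's actual API:

```python
# Illustrative sketch (NOT Sparsebit's API): a min/max observer collects
# tensor statistics during calibration, then uniform affine quantization
# simulates int8 precision ("fake quantization") in floating point.

class MinMaxObserver:
    """Tracks the running min/max of observed tensors (flat lists here)."""
    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, values):
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))

    def qparams(self, qmin=-128, qmax=127):
        """Derive scale and zero-point for int8 affine quantization."""
        scale = (self.max_val - self.min_val) / (qmax - qmin)
        zero_point = round(qmin - self.min_val / scale)
        return scale, zero_point


def fake_quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Quantize then dequantize, simulating int8 rounding error."""
    out = []
    for v in values:
        q = round(v / scale) + zero_point
        q = max(qmin, min(qmax, q))           # clamp to the int8 range
        out.append((q - zero_point) * scale)  # dequantize back to float
    return out


obs = MinMaxObserver()
obs.observe([-1.0, 0.5, 2.0])  # calibration pass over sample activations
scale, zp = obs.qparams()
approx = fake_quantize([-1.0, 0.5, 2.0], scale, zp)
```

In a real QAT setup the same quantize/dequantize pair is inserted into the forward pass so the network learns to tolerate the rounding error; Sparsebit wires this in automatically via its QuantModules.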
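The L1-norm pruning criterion mentioned above can be sketched in a few lines: rank each filter by the L1 norm of its weights and keep the strongest fraction. This is a hypothetical standalone illustration, not Sparsebit's interface:

```python
# Illustrative sketch (NOT Sparsebit's API) of L1-norm structured pruning:
# filters with small L1 norm contribute little and are removed.

def l1_norm(filter_weights):
    """Sum of absolute weight values for one filter."""
    return sum(abs(w) for w in filter_weights)

def prune_filters(filters, keep_ratio=0.5):
    """Return the sorted indices of filters to keep, ranked by L1 norm."""
    n_keep = max(1, int(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]),
                    reverse=True)
    return sorted(ranked[:n_keep])

filters = [
    [0.1, -0.2, 0.05],    # weak filter,   L1 = 0.35
    [1.0, -0.8, 0.6],     # strong filter, L1 = 2.4
    [0.01, 0.02, -0.01],  # weakest,       L1 = 0.04
    [0.5, 0.4, -0.3],     # strong filter, L1 = 1.2
]
kept = prune_filters(filters, keep_ratio=0.5)  # → [1, 3]
```

Structured pruning like this removes whole filters or channels, which is what makes the ONNX export of a genuinely smaller model possible; unstructured pruning instead zeroes individual weights.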
Quick Start & Requirements
pip install sparsebit
Maintenance & Community
The project is maintained by megvii-research, with its most recent updates in April 2023. It credits several open-source projects as inspiration. Contact sunpeiqin@megvii.com for team opportunities.
Licensing & Compatibility
Released under the Apache 2.0 license, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
While the design is flexible, the README concentrates on specific model architectures and quantization techniques (e.g., GPTQ, QAT). Compatibility with other models, and performance beyond the listed benchmarks, may require user validation.