Sparsebit by megvii-research

Model compression and acceleration toolbox

Created 3 years ago · 332 stars

Top 82.4% on SourcePulse

Project Summary

Sparsebit is a PyTorch-based toolkit for model compression and acceleration, offering pruning and quantization capabilities. It targets researchers and engineers seeking to reduce model size and inference latency with minimal code changes, supporting both Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT).

How It Works

Sparsebit uses torch.fx to trace a PyTorch model into a QuantModel whose operations are rewritten as QuantModules. This modular design makes it straightforward to extend quantization methods, observers, and modules. For pruning, it supports structured and unstructured pruning of weights, activations, and layers using criteria such as the L1/L0 norm, Fisher pruning, and HRank, and pruned models can be exported to ONNX.
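The module-swapping mechanism behind this design can be illustrated with plain torch.fx. The sketch below is illustrative only, not Sparsebit's code: QuantConv2d and its fixed scale are hypothetical stand-ins for Sparsebit's QuantModules and calibrated observers.

```python
import torch
import torch.fx as fx
import torch.nn as nn

class QuantConv2d(nn.Module):
    """Hypothetical wrapper that fake-quantizes the input before a conv."""
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.conv = conv
        self.scale = 0.1  # a real toolkit would calibrate this with an observer

    def forward(self, x):
        x = torch.fake_quantize_per_tensor_affine(x, self.scale, 0, -128, 127)
        return self.conv(x)

def swap_convs(model: nn.Module) -> fx.GraphModule:
    gm = fx.symbolic_trace(model)        # trace the model into a graph IR
    modules = dict(gm.named_modules())
    for node in gm.graph.nodes:
        if node.op == "call_module" and isinstance(modules[node.target], nn.Conv2d):
            # Replace the traced conv with its quantized wrapper, in place.
            parent_name, _, attr = node.target.rpartition(".")
            parent = modules[parent_name] if parent_name else gm
            setattr(parent, attr, QuantConv2d(modules[node.target]))
    gm.recompile()
    return gm
```

Calling swap_convs on, say, a torchvision ResNet returns a GraphModule whose convolutions now run behind fake-quantization; a QuantModel applies the same rewrite-by-tracing pattern across all supported ops.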

Quick Start & Requirements

  • Install via pip: pip install sparsebit (a minimal PTQ sketch follows this list)
  • Requires PyTorch. No specific CUDA version or GPU is explicitly mandated for basic use, though GPU-bound features such as the GPTQ CUDA kernels need a CUDA-capable device.
  • Documentation: docs
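For orientation, here is a minimal PTQ sketch following the workflow the README and docs describe. The names parse_qconfig, QuantModel, prepare_calibration, calc_qparams, and set_quant reflect our reading of the project's examples; treat them as assumptions to verify against the installed version.

```python
import torch
import torchvision
# Assumed import path, per the project's README examples.
from sparsebit.quantization import QuantModel, parse_qconfig

model = torchvision.models.resnet18().eval()
qconfig = parse_qconfig("qconfig.yaml")   # backend, bit widths, observers, ...
qmodel = QuantModel(model, qconfig)       # traces the model with torch.fx

# Calibrate on a small, representative set of batches (random here).
calib_data = [torch.randn(8, 3, 224, 224) for _ in range(4)]
qmodel.prepare_calibration()              # attach observers (assumed API)
with torch.no_grad():
    for batch in calib_data:
        qmodel(batch)
qmodel.calc_qparams()                     # observer stats -> scales/zero-points
qmodel.set_quant(w_quant=True, a_quant=True)  # enable fake-quant for evaluation
```

From here, the project's examples export the quantized graph to QDQ-ONNX for deployment, as highlighted below.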

Highlighted Details

  • Provides GPTQ CUDA kernels with group-size support for efficient quantization.
  • Enables fine-tuning large models such as LLaMA-65B with pipeline parallelism on consumer hardware (e.g., 8x 2080 Ti GPUs).
  • Offers PTQ and QAT examples for various architectures including LLaMA, BERT, and vision models (BEVDet, BEVDepth, ViT).
  • Supports exporting QDQ-ONNX for deployment with TensorRT and ONNXRuntime (see the inference sketch below).
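As a deployment illustration, the snippet below runs an exported QDQ-ONNX file with ONNXRuntime. The file name qmodel.onnx and the input shape are placeholders, not values Sparsebit fixes.

```python
import numpy as np
import onnxruntime as ort

# Load a QDQ-ONNX model exported from Sparsebit ("qmodel.onnx" is a placeholder).
sess = ort.InferenceSession("qmodel.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
dummy = np.random.randn(1, 3, 224, 224).astype(np.float32)  # assumed input shape
outputs = sess.run(None, {input_name: dummy})
print(outputs[0].shape)
```

The same QDQ-ONNX file can also be consumed by TensorRT for GPU deployment.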

Maintenance & Community

The project comes from megvii-research; the most recent updates landed in April 2023. The README credits several open-source projects that inspired it. Contact sunpeiqin@megvii.com for team opportunities.

Licensing & Compatibility

Released under the Apache 2.0 license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

While flexible, the README focuses on specific model architectures and quantization techniques (e.g., GPTQ, QAT); compatibility and performance for models beyond those listed may require user validation.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Jeremy Howard (Cofounder of fast.ai), Sasha Rush (Research Scientist at Cursor; Professor at Cornell Tech), and 1 more.

GPTQ-triton by fpgaminer · 307 stars (0%)
Triton kernel for GPTQ inference, improving context scaling
Created 2 years ago · Updated 2 years ago
Starred by Lysandre Debut (Chief Open-Source Officer at Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 4 more.

AQLM by Vahe1994 · 1k stars (0.4%)
PyTorch code for LLM compression via Additive Quantization (AQLM)
Created 1 year ago · Updated 1 month ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

llm-awq by mit-han-lab · 3k stars (0.3%)
Weight quantization research paper for LLM compression/acceleration
Created 2 years ago · Updated 2 months ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

neural-compressor by intel · 2k stars (0.2%)
Python library for model compression (quantization, pruning, distillation, NAS)
Created 5 years ago · Updated 16 hours ago