Model pruning tool for efficient inference
This library provides tools for applying movement pruning to neural networks, with a focus on structured sparsity that improves inference speed. It targets researchers and practitioners in NLP and deep learning who need to compress large models such as BERT with minimal accuracy loss, enabling efficient deployment on resource-constrained devices.
How It Works
The library implements "Block Movement Pruning," an extension of movement pruning that creates structured sparsity patterns. This approach prunes weights in whole blocks, producing patterns that are more amenable to hardware acceleration than unstructured sparsity. It explores semi-structured and structured variants, aiming to balance sparsity level, accuracy, and inference speed. Pruning happens during fine-tuning, so the network can adapt to the sparsity as it is introduced.
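As a rough illustration of the block scoring at the heart of this approach, the sketch below aggregates learned movement-pruning scores into per-block scores and keeps only the top-scoring blocks. It is a minimal sketch, not the library's implementation: the function name, block shape, and `density` parameter are all assumptions made for the example.

```python
import torch

def block_movement_mask(weight, scores, block_rows=32, block_cols=32, density=0.25):
    """Zero out all but the highest-scoring weight blocks.

    Illustrative only: `scores` are learned importance scores with the same
    shape as `weight`, updated during fine-tuning as in movement pruning.
    """
    out_f, in_f = weight.shape  # assume both divisible by the block shape
    # Aggregate per-element scores into one score per block.
    s = scores.reshape(out_f // block_rows, block_rows, in_f // block_cols, block_cols)
    block_scores = s.sum(dim=(1, 3))  # shape: (row blocks, column blocks)
    # Keep the `density` fraction of blocks with the largest scores.
    k = max(1, int(density * block_scores.numel()))
    threshold = block_scores.flatten().topk(k).values.min()
    block_mask = (block_scores >= threshold).to(weight.dtype)
    # Expand the block-level mask back to element resolution and apply it.
    mask = block_mask.repeat_interleave(block_rows, 0).repeat_interleave(block_cols, 1)
    return weight * mask

# Example: prune a 768x768 linear layer down to 25% of its blocks.
w = torch.randn(768, 768)
s = torch.randn(768, 768)  # in practice, trained jointly with the weights
pruned = block_movement_mask(w, s)
```

Because whole blocks are zeroed, the surviving weights form regular tiles that block-sparse kernels can exploit, which is what makes this approach more hardware-friendly than element-wise pruning.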
Quick Start & Requirements
Install the released package from PyPI:

python -m pip install -U nn_pruning

Or install from source for development:

git clone https://github.com/huggingface/nn_pruning.git
cd nn_pruning
python -m pip install -e ".[dev]"

Run the test suite to verify the install:

pytest nn_pruning
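After installation, a fine-pruned checkpoint can be compacted for faster inference. The snippet below follows the usage pattern shown in the project's README; the checkpoint path is a placeholder, and the `optimize_model` call should be verified against the installed version.

```python
from transformers import AutoModelForQuestionAnswering
from nn_pruning.inference_model_patcher import optimize_model

# Placeholder path: substitute a checkpoint produced by fine-pruning with
# this library (or one of the pruned models published alongside it).
model = AutoModelForQuestionAnswering.from_pretrained("path/to/fine-pruned-bert")

# Strip the attention heads and linear-layer rows that pruning zeroed out,
# shrinking the dense matrices so inference runs faster.
model = optimize_model(model, "dense")
```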
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The pytorch_block_sparse CUDA implementation is not yet competitive with dense linear layers for speed.