PyTorch extension for block-sparse linear layers
This library provides a PyTorch extension for fast block sparse matrices, enabling easy experimentation with sparse neural networks to achieve significant savings in memory and computation. It's targeted at researchers and practitioners looking to optimize model size and speed without substantial precision loss.
How It Works
The extension replaces `torch.nn.Linear` with `BlockSparseLinear`, using C++ CUDA templates built on the CUTLASS library for efficient block-sparse matrix multiplication. This approach aims to outperform naive PyTorch sparse implementations, which are often an order of magnitude slower than their dense counterparts. The block-sparse kernel itself is currently about 2x slower than an optimized dense `torch.nn.Linear` at equal size, but runtime scales with the fraction of nonzero blocks, so the overhead is recouped as sparsity grows: at 50% sparsity a `BlockSparseLinear` roughly matches dense speed, and at 75% sparsity it is approximately 2x faster than the dense equivalent.
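A minimal sketch of the drop-in replacement, assuming the `BlockSparseLinear(in_features, out_features, density=...)` constructor shown in the project README (the module and layer sizes here are illustrative):

```python
import torch.nn as nn
from pytorch_block_sparse import BlockSparseLinear

class SparseMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Dense baseline would be: self.fc = nn.Linear(1024, 256)
        # density=0.25 keeps 25% of the weight blocks (75% sparse),
        # the regime where the kernel is roughly 2x faster than dense.
        self.fc = BlockSparseLinear(1024, 256, density=0.25)

    def forward(self, x):
        return self.fc(x)
```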
Quick Start & Requirements
```bash
pip install pytorch-block-sparse
```

A CUDA-capable GPU is required, since the kernels are compiled from CUDA/CUTLASS templates.
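A quick forward-pass check, assuming a CUDA device is available and the `density` keyword from the project README (exact parameter names may differ across versions):

```python
import torch
from pytorch_block_sparse import BlockSparseLinear

# 75% sparse layer (density=0.25); the block-sparse kernels run on GPU only.
fc = BlockSparseLinear(1024, 256, density=0.25).cuda()
x = torch.randn(8, 1024, device="cuda")
y = fc(x)
print(y.shape)  # expected: torch.Size([8, 256])
```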
Highlighted Details
- `BlockSparseLinear` at 50% sparsity is as fast as its dense counterpart, and faster beyond that.
- `BlockSparseModelPatcher` for easily converting existing PyTorch models to use block sparsity; see the sketch after this list.
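A sketch of patching an existing model, patterned on the example in the project README; the `add_pattern`/`patch_model` calls and the regex follow that example and may differ across versions (loading `roberta-base` also requires the `transformers` package):

```python
from transformers import RobertaModel
from pytorch_block_sparse import BlockSparseModelPatcher

model = RobertaModel.from_pretrained("roberta-base").cuda()

mp = BlockSparseModelPatcher()
# Replace the intermediate dense layer of every encoder block with a
# 50%-dense (i.e., 50% sparse) BlockSparseLinear.
mp.add_pattern(r"roberta\.encoder\.layer\.[0-9]+\.intermediate\.dense", {"density": 0.5})
mp.patch_model(model)
```

At density 0.5 the patched layers should match dense speed while roughly halving the weight memory of those layers.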
Maintenance & Community
Last commit roughly 4 years ago; the repository is inactive.
Licensing & Compatibility
Compatible with standard PyTorch models: any `torch.nn.Linear` can be swapped for a `BlockSparseLinear`.
Limitations & Caveats
The current implementation is approximately 2x slower than an optimized dense `torch.nn.Linear` layer, though this is expected to improve with future updates and newer CUTLASS versions. Sparsifying pre-trained models is not directly supported; models typically need to be trained from scratch with the sparse layers in place.