TensorFlow ops/GPU kernels for block-sparse matrix multiplication and convolution
This package provides efficient TensorFlow GPU kernels for block-sparse matrix multiplication and convolution, targeting researchers and engineers working with large neural networks where sparsity can significantly improve performance. It offers custom ops for sparse operations, aiming to accelerate training and inference by optimizing memory access and computation on NVIDIA GPUs.
How It Works
The core of the package leverages custom CUDA kernels to implement block-sparse matrix multiplication (BlocksparseMatMul) and convolution (BlocksparseConv). It divides matrices and filters into blocks and processes only the non-zero blocks, reducing computation and memory bandwidth. The kernels are optimized for specific GPU architectures (Maxwell, Pascal, Volta) and support different sparsity patterns and feature-axis layouts, enabling faster execution than dense operations or standard sparse formats.
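As a rough illustration of the blocking scheme only (not the library's CUDA kernels), a dense NumPy reference of a block-sparse matmul might look like the sketch below; the function and variable names are illustrative:

```python
import numpy as np

def block_sparse_matmul(x, w_blocks, layout, block_size):
    """Reference block-sparse matmul: y = x @ W, where W is stored as one
    dense (block_size, block_size) block per non-zero entry of the binary
    `layout` (shape: in_blocks x out_blocks), in row-major order."""
    n, _ = x.shape
    in_blocks, out_blocks = layout.shape
    y = np.zeros((n, out_blocks * block_size), dtype=x.dtype)
    b = 0
    for i in range(in_blocks):          # block-rows of the layout
        for j in range(out_blocks):     # block-columns of the layout
            if layout[i, j]:            # only non-zero blocks are touched
                xi = x[:, i * block_size:(i + 1) * block_size]
                y[:, j * block_size:(j + 1) * block_size] += xi @ w_blocks[b]
                b += 1
    return y

# Example: 4x4 block layout with ~50% sparsity, 32x32 blocks
block_size, in_blocks, out_blocks = 32, 4, 4
layout = np.random.randint(2, size=(in_blocks, out_blocks))
w_blocks = [np.random.randn(block_size, block_size).astype(np.float32)
            for _ in range(int(layout.sum()))]
x = np.random.randn(8, in_blocks * block_size).astype(np.float32)
print(block_sparse_matmul(x, w_blocks, layout, block_size).shape)  # (8, 128)
```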
Quick Start & Requirements
Install from PyPI (an NVIDIA GPU and a GPU-enabled TensorFlow build are required):
```bash
pip install blocksparse
```
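A minimal usage sketch, assuming the BlocksparseMatMul API as documented for the TensorFlow 1.x-era releases (a constructor taking a binary block layout and block_size, a w_shape attribute for the weight variable, and a callable op); exact names may differ in your installed version:

```python
import numpy as np
import tensorflow as tf
from blocksparse.matmul import BlocksparseMatMul

hidden_size = 4096
block_size = 32
minibatch_size = 64

# Binary layout of non-zero blocks: (hidden/block) x (hidden/block)
sparsity = np.random.randint(2, size=(hidden_size // block_size,
                                      hidden_size // block_size))

# Build the block-sparse matmul object for this layout
bsmm = BlocksparseMatMul(sparsity, block_size=block_size)

# Dense activations in, block-sparse weights
x = tf.placeholder(tf.float32, shape=[None, hidden_size])
w = tf.get_variable("w", bsmm.w_shape, dtype=tf.float32)

# y = x @ W, where only the non-zero blocks of W are stored and computed
y = bsmm(x, w)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, feed_dict={x: np.ones((minibatch_size, hidden_size),
                                            dtype=np.float32)})
    print(out.shape)
```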
Highlighted Details
- Includes a utility for grouping parameter-gradient computations for efficiency (group_param_grads).
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
- BlocksparseMatMul kernels have different feature_axis support depending on the implementation (ASM vs. CudaC); see the sketch below.
- Some features are only lightly documented (e.g. SparseProj, integrated ReLU in layer_norm).
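If the feature axis matters for your layout, it can be selected when constructing the op; this is a hedged sketch that assumes the constructor accepts a feature_axis keyword, which may not hold for every kernel implementation or release:

```python
import numpy as np
from blocksparse.matmul import BlocksparseMatMul

layout = np.random.randint(2, size=(4, 4))  # binary block layout

# Assumption: feature_axis selects whether features sit on axis 0 or axis 1
# of the activations; check which values the installed kernels
# (ASM vs. CudaC) actually support before relying on this.
bsmm = BlocksparseMatMul(layout, block_size=32, feature_axis=0)
```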