openai
TensorFlow ops/GPU kernels for block-sparse matrix multiplication and convolution
Top 35.7% on SourcePulse
This package provides efficient TensorFlow GPU kernels for block-sparse matrix multiplication and convolution, targeting researchers and engineers working with large neural networks where sparsity can significantly improve performance. It offers custom ops for sparse operations, aiming to accelerate training and inference by optimizing memory access and computation on NVIDIA GPUs.
How It Works
The core of the package leverages custom CUDA kernels to implement block-sparse matrix multiplication (BlocksparseMatMul) and convolution (BlocksparseConv). It operates by dividing matrices and filters into blocks, processing only the non-zero blocks to reduce computation and memory bandwidth. The kernels are optimized for specific GPU architectures (Maxwell, Pascal, Volta) and support different sparsity patterns and feature axis layouts, enabling faster execution compared to dense operations or standard sparse formats.
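The block-wise idea described above can be illustrated with a minimal NumPy sketch. This is not the package's API or its CUDA implementation; the names `block_sparse_matmul`, `to_dense`, `layout`, and `blocks` are hypothetical and chosen only for this example. It assumes square blocks and a 0/1 layout matrix that marks which blocks of the weight matrix are non-zero.

```python
import numpy as np

def block_sparse_matmul(x, blocks, layout, block_size):
    """Compute y = x @ W, where W is stored only as its non-zero blocks.

    x:       dense array of shape (n, rows * block_size)
    blocks:  array of shape (nnz, block_size, block_size), one per non-zero block,
             ordered row-major over the layout (hypothetical storage convention)
    layout:  0/1 array of shape (rows, cols) marking the non-zero blocks of W
    """
    rows, cols = layout.shape
    y = np.zeros((x.shape[0], cols * block_size), dtype=x.dtype)
    # np.nonzero yields indices in row-major order, matching the order of `blocks`.
    for k, (r, c) in enumerate(zip(*np.nonzero(layout))):
        x_slice = x[:, r * block_size:(r + 1) * block_size]
        # Only non-zero blocks contribute; zero blocks are never touched.
        y[:, c * block_size:(c + 1) * block_size] += x_slice @ blocks[k]
    return y

def to_dense(blocks, layout, block_size):
    """Expand the block-sparse weights into a dense matrix (for checking only)."""
    rows, cols = layout.shape
    w = np.zeros((rows * block_size, cols * block_size), dtype=blocks.dtype)
    for k, (r, c) in enumerate(zip(*np.nonzero(layout))):
        w[r * block_size:(r + 1) * block_size,
          c * block_size:(c + 1) * block_size] = blocks[k]
    return w
```

The work done scales with the number of non-zero blocks rather than with the full matrix size, which is the source of the savings in computation and memory traffic. A real GPU kernel would process the blocks in parallel instead of looping over them.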
Quick Start & Requirements
pip install blocksparse
Highlighted Details
group_param_grads).
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
BlocksparseMatMul kernels have different feature_axis support depending on the implementation (ASM vs. CudaC).
SparseProj, integrated ReLU in layer_norm).
2 years ago
Inactive