blocksparse by openai

TensorFlow ops/GPU kernels for block-sparse matrix multiplication and convolution

created 7 years ago
1,043 stars

Top 36.7% on sourcepulse

Project Summary

This package provides efficient TensorFlow GPU kernels for block-sparse matrix multiplication and convolution, targeting researchers and engineers working with large neural networks where sparsity can significantly improve performance. It offers custom ops for sparse operations, aiming to accelerate training and inference by optimizing memory access and computation on NVIDIA GPUs.

How It Works

The core of the package leverages custom CUDA kernels to implement block-sparse matrix multiplication (BlocksparseMatMul) and convolution (BlocksparseConv). It operates by dividing matrices and filters into blocks, processing only the non-zero blocks to reduce computation and memory bandwidth. The kernels are optimized for specific GPU architectures (Maxwell, Pascal, Volta) and support different sparsity patterns and feature axis layouts, enabling faster execution compared to dense operations or standard sparse formats.
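The skip-the-zero-blocks idea can be illustrated with a small NumPy reference (a conceptual sketch only, not the package's CUDA implementation; the function name `block_sparse_matmul` and the `layout`/`blocks` representation are hypothetical):

```python
import numpy as np

def block_sparse_matmul(x, blocks, layout, block_size):
    """Compute x @ W where W is block-sparse.

    x:      (batch, in_features) dense input
    blocks: dict mapping (row_block, col_block) -> (block_size, block_size) array
    layout: (in_features//block_size, out_features//block_size) 0/1 mask
    """
    batch = x.shape[0]
    n_in_blocks, n_out_blocks = layout.shape
    y = np.zeros((batch, n_out_blocks * block_size), dtype=x.dtype)
    for i in range(n_in_blocks):          # input-feature block row
        for j in range(n_out_blocks):     # output-feature block column
            if layout[i, j]:              # zero blocks are skipped entirely
                xi = x[:, i * block_size:(i + 1) * block_size]
                y[:, j * block_size:(j + 1) * block_size] += xi @ blocks[(i, j)]
    return y

# Check against a dense matmul whose zero blocks are materialized explicitly.
rng = np.random.default_rng(0)
block_size, n_in, n_out = 4, 3, 2
layout = rng.integers(0, 2, size=(n_in, n_out))
blocks = {(i, j): rng.standard_normal((block_size, block_size))
          for i in range(n_in) for j in range(n_out) if layout[i, j]}

W = np.zeros((n_in * block_size, n_out * block_size))
for (i, j), b in blocks.items():
    W[i * block_size:(i + 1) * block_size,
      j * block_size:(j + 1) * block_size] = b

x = rng.standard_normal((5, n_in * block_size))
assert np.allclose(block_sparse_matmul(x, blocks, layout, block_size), x @ W)
```

The payoff is that work and memory traffic scale with the number of non-zero blocks rather than with the full matrix size; the package's CUDA kernels apply the same principle with architecture-specific tiling.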

Quick Start & Requirements

  • Install via pip: pip install blocksparse
  • Prerequisites: NVIDIA GPU (Maxwell or newer recommended), Linux (Ubuntu 16.04 tested), CUDA 8, Python 3.5+, TensorFlow 1.4.0+ (with GPU support).
  • CUDA 9/Volta requires updating build targets and recompiling TensorFlow from source.
  • See OpenAI blog post for more details.

Highlighted Details

  • Optimized CUDA kernels for block-sparse matrix multiplication and convolution.
  • Supports various GPU architectures (Kepler, Maxwell, Pascal, Volta) with performance notes.
  • Includes custom ops for layer normalization, batch normalization, and element-wise operations.
  • Offers utilities for weight normalization and gradient aggregation (group_param_grads).

Maintenance & Community

  • Project status is listed as "Active", though the last commit was about two years ago; breaking changes may occur.
  • Developed by OpenAI.

Licensing & Compatibility

  • License: MIT.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

  • Requires specific NVIDIA GPU hardware and is pinned to older CUDA/TensorFlow versions (CUDA 8, TensorFlow 1.x).
  • BlocksparseMatMul kernels have different feature_axis support depending on the implementation (ASM vs. CudaC).
  • Some features are experimental (e.g., SparseProj, integrated ReLU in layer_norm).
Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 90 days

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jaret Burkett (founder of Ostris), and 1 more.
