PyTorch extension for block-sparse linear layers
This library provides a PyTorch extension for fast block sparse matrices, enabling easy experimentation with sparse neural networks to achieve significant savings in memory and computation. It's targeted at researchers and practitioners looking to optimize model size and speed without substantial precision loss.
How It Works
The extension replaces `torch.nn.Linear` with `BlockSparseLinear`, using C++ CUDA templates built on the CUTLASS library for efficient block-sparse matrix multiplication. This approach aims to outperform naive PyTorch sparse implementations, which are often an order of magnitude slower than their dense counterparts. The block-sparse kernel itself is currently about 2x slower than an optimized dense `torch.nn.Linear` at equal size, but runtime scales with the fraction of nonzero blocks, so the overhead is recouped as sparsity grows: at 50% sparsity a `BlockSparseLinear` roughly matches dense speed, and at 75% sparsity it is approximately 2x faster than the dense equivalent.
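A minimal sketch of the drop-in replacement, assuming the `BlockSparseLinear(in_features, out_features, density=...)` constructor shown in the project README (the module and layer sizes here are illustrative):

```python
import torch.nn as nn
from pytorch_block_sparse import BlockSparseLinear

class SparseMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Dense baseline would be: self.fc = nn.Linear(1024, 256)
        # density=0.25 keeps 25% of the weight blocks (75% sparse),
        # the regime where the kernel is roughly 2x faster than dense.
        self.fc = BlockSparseLinear(1024, 256, density=0.25)

    def forward(self, x):
        return self.fc(x)
```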
Quick Start & Requirements
```bash
pip install pytorch-block-sparse
```

A CUDA-capable GPU is required, since the kernels are compiled from CUDA/CUTLASS templates.
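A quick forward-pass check, assuming a CUDA device is available and the `density` keyword from the project README (exact parameter names may differ across versions):

```python
import torch
from pytorch_block_sparse import BlockSparseLinear

# 75% sparse layer (density=0.25); the block-sparse kernels run on GPU only.
fc = BlockSparseLinear(1024, 256, density=0.25).cuda()
x = torch.randn(8, 1024, device="cuda")
y = fc(x)
print(y.shape)  # expected: torch.Size([8, 256])
```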
Highlighted Details
- `BlockSparseLinear` at 50% sparsity is as fast as its dense counterpart, and faster beyond that.
- `BlockSparseModelPatcher` for easily converting existing PyTorch models to use block sparsity; see the sketch after this list.
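A sketch of patching an existing model, patterned on the example in the project README; the `add_pattern`/`patch_model` calls and the regex follow that example and may differ across versions (loading `roberta-base` also requires the `transformers` package):

```python
from transformers import RobertaModel
from pytorch_block_sparse import BlockSparseModelPatcher

model = RobertaModel.from_pretrained("roberta-base").cuda()

mp = BlockSparseModelPatcher()
# Replace the intermediate dense layer of every encoder block with a
# 50%-dense (i.e., 50% sparse) BlockSparseLinear.
mp.add_pattern(r"roberta\.encoder\.layer\.[0-9]+\.intermediate\.dense", {"density": 0.5})
mp.patch_model(model)
```

At density 0.5 the patched layers should match dense speed while roughly halving the weight memory of those layers.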
Maintenance & Community
Last commit roughly 4 years ago; the repository is inactive.
Licensing & Compatibility
Compatible with standard PyTorch models: any `torch.nn.Linear` can be swapped for a `BlockSparseLinear`.
Limitations & Caveats
The current implementation is approximately 2x slower than an optimized dense `torch.nn.Linear` layer, though this is expected to improve with future updates and newer CUTLASS versions. Sparsifying pre-trained models is not directly supported; models typically need to be trained from scratch with the sparse layers in place.