pytorch_block_sparse  by huggingface

PyTorch extension for block-sparse linear layers

created 5 years ago
548 stars

Top 59.1% on sourcepulse

Project Summary

This library is a PyTorch extension for fast block-sparse matrices. It makes it easy to experiment with sparse neural networks, which can save significant memory and computation. It targets researchers and practitioners who want to reduce model size and improve speed without a substantial loss of precision.

How It Works

The extension provides BlockSparseLinear as a drop-in replacement for torch.nn.Linear. Its kernels are C++ CUDA templates built on the CUTLASS library for block-sparse matrix multiplication. The goal is to beat naive PyTorch sparse implementations, which are often an order of magnitude slower than their dense counterparts. For the same amount of nonzero work, the kernels currently run about 2x slower than the optimized dense torch.nn.Linear, so the benefit grows with sparsity: a 50% sparse matrix roughly breaks even with dense, and a 75% sparse matrix is approximately 2x faster.
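
To make the idea of block-sparse multiplication concrete, here is a minimal pure-Python sketch. It is not the library's CUDA/CUTLASS kernel and does not use the library's API; the function names, the dictionary-of-blocks storage layout, and the block size are all assumptions chosen for illustration. The point it shows is that only the stored (nonzero) blocks are visited, so the work scales with density rather than with the full matrix size.

```python
import random


def block_sparse_matvec(blocks, block_size, n_block_rows, x):
    """Compute y = W @ x for a block-sparse W (hypothetical helper).

    `blocks` maps (block_row, block_col) to a block_size x block_size
    list of lists. Blocks missing from the dict are all-zero and are
    never touched, which is where the savings come from.
    """
    y = [0.0] * (n_block_rows * block_size)
    for (block_row, block_col), block in blocks.items():
        x_offset = block_col * block_size
        y_offset = block_row * block_size
        for i in range(block_size):
            acc = 0.0
            for j in range(block_size):
                acc += block[i][j] * x[x_offset + j]
            y[y_offset + i] += acc
    return y


def to_dense(blocks, block_size, n_block_rows, n_block_cols):
    """Expand the block dictionary into a full dense matrix."""
    dense = [[0.0] * (n_block_cols * block_size)
             for _ in range(n_block_rows * block_size)]
    for (block_row, block_col), block in blocks.items():
        for i in range(block_size):
            for j in range(block_size):
                dense[block_row * block_size + i][block_col * block_size + j] = block[i][j]
    return dense


def dense_matvec(dense, x):
    """Reference dense product, used only to check the sparse result."""
    return [sum(w * v for w, v in zip(row, x)) for row in dense]


def random_block_sparse(n_block_rows, n_block_cols, block_size, density, seed=0):
    """Pick a random subset of block positions and fill them with random values."""
    rng = random.Random(seed)
    positions = [(r, c) for r in range(n_block_rows) for c in range(n_block_cols)]
    keep = max(1, round(density * len(positions)))
    return {
        pos: [[rng.uniform(-1.0, 1.0) for _ in range(block_size)]
              for _ in range(block_size)]
        for pos in rng.sample(positions, keep)
    }
```

With a density of 0.25, `random_block_sparse` keeps a quarter of the block positions, and `block_sparse_matvec` performs roughly a quarter of the multiply-adds that `dense_matvec` does on the expanded matrix, while producing the same result up to floating-point rounding.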

Quick Start & Requirements

  • Install via pip: pip install pytorch-block-sparse
  • Requires PyTorch and CUDA.
  • Official example notebooks are available for detailed usage.

Highlighted Details

  • Achieves 40-55% of cuBLAS performance on large matrices.
  • A Transformer with 50% sparsity using BlockSparseLinear is as fast as its dense counterpart.
  • Offers BlockSparseModelPatcher for easily converting existing PyTorch models to use block sparsity.
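
The figures above fit a simple back-of-the-envelope model, sketched below. It assumes (this is a simplification, not a measurement) that runtime is proportional to the number of nonzero weights divided by kernel efficiency, where efficiency is the fraction of cuBLAS throughput the sparse kernel reaches. The function name and the default efficiency of 0.5 are illustrative choices.

```python
def estimated_speedup(density, kernel_efficiency=0.5):
    """Rough speedup of a block-sparse layer over its dense equivalent.

    density: fraction of weights kept (1 - sparsity), in (0, 1].
    kernel_efficiency: assumed fraction of dense (cuBLAS) throughput
        reached by the sparse kernel; 0.5 matches the "about 2x slower"
        figure and is an assumption, not a benchmark result.
    """
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    if kernel_efficiency <= 0.0:
        raise ValueError("kernel_efficiency must be positive")
    # Dense time is proportional to 1; sparse time to density / efficiency.
    return kernel_efficiency / density


# 50% sparsity (density 0.5) breaks even; 75% sparsity (density 0.25) doubles speed.
break_even = estimated_speedup(0.5)
at_75_percent_sparsity = estimated_speedup(0.25)
```

Under this model the break-even density equals the kernel efficiency, so any improvement in the kernels directly lowers the sparsity needed to beat a dense layer. Real speedups also depend on matrix size and hardware, which this sketch ignores.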

Maintenance & Community

  • Developed by Hugging Face.
  • Further development details are available in the repository.

Licensing & Compatibility

  • The library is released under the MIT license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The kernels currently reach only about half the throughput of optimized dense torch.nn.Linear layers, so a layer needs more than roughly 50% sparsity before it runs faster than its dense equivalent. This is expected to improve with future updates and CUTLASS versions. Sparsifying pre-trained models is not directly supported; models typically need to be trained from scratch with the sparse layers.

Health Check

  • Last commit: 4 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 90 days

Explore Similar Projects

Starred by Amanpreet Singh (Cofounder of Contextual AI) and Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code).

torchshard by kaiyuyue

0%
299
PyTorch engine for tensor slicing into parallel shards
created 4 years ago
updated 1 month ago
Starred by Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX).

wanda by locuslab

0%
782
LLM pruning research paper implementation
created 2 years ago
updated 11 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Wei-Lin Chiang (Cofounder of LMArena), and 3 more.

sparseml by neuralmagic

0%
2k
Sparsification toolkit for optimized neural networks
created 4 years ago
updated 2 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Lianmin Zheng (Author of SGLang), and 13 more.

gpt-fast by pytorch-labs

0.1%
6k
PyTorch text generation for efficient transformer inference
created 1 year ago
updated 3 months ago