Sparse attention primitives for research
This repository provides sparse attention primitives for efficiently processing long sequences in Transformer models, targeting researchers and engineers working on large-scale language generation. It offers optimized attention kernels that reduce computational complexity by skipping unnecessary calculations, enabling faster training and inference.
How It Works
The core of the project lies in fused implementations of attention operations that support block sparsity. Instead of computing the full attention matrix, it allows users to define patterns of blocks to be skipped (set to zero) in the QK^T product and softmax calculation. This approach, detailed in the Sparse Transformers paper, significantly reduces computation by avoiding unnecessary operations, especially for long sequences.
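As a rough illustration of the idea (this is a minimal NumPy sketch, not the repository's fused CUDA kernels, and the function and parameter names are invented for this example): QK^T is computed only for blocks that a user-supplied layout marks as active, and the skipped positions end up with zero softmax weight.

```python
# Illustrative block-sparse attention in NumPy. Blocks the layout marks 0 are
# never multiplied out; the fused kernels in this repo also skip them inside
# the softmax, while this sketch simply masks them to -inf.
import numpy as np

def block_sparse_attention(q, k, v, layout, block_size):
    """q, k, v: [seq_len, d]; layout: [n_blocks, n_blocks] 0/1 block mask."""
    seq_len, d = q.shape
    n_blocks = seq_len // block_size
    scores = np.full((seq_len, seq_len), -np.inf)
    for i in range(n_blocks):
        rows = slice(i * block_size, (i + 1) * block_size)
        for j in range(n_blocks):
            if layout[i, j]:  # compute QK^T only for active blocks
                cols = slice(j * block_size, (j + 1) * block_size)
                scores[rows, cols] = q[rows] @ k[cols].T / np.sqrt(d)
    # Row-wise softmax: masked (-inf) positions get exactly zero weight.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Example: a causal "local" layout where each block attends to itself and its predecessor.
seq_len, block_size, d = 8, 2, 4
n_blocks = seq_len // block_size
layout = np.zeros((n_blocks, n_blocks), dtype=int)
for i in range(n_blocks):
    layout[i, max(0, i - 1):i + 1] = 1
q = k = v = np.random.randn(seq_len, d)
out = block_sparse_attention(q, k, v, layout, block_size)
print(out.shape)  # (8, 4)
```

In the fused kernels, the same block layout lets the skipped blocks be omitted from both the matmul and the softmax entirely, which is where the speedup on long sequences comes from.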
Quick Start & Requirements
Install: pip install blocksparse (requires CUDA 10 and tensorflow-gpu).
Run: python attention.py (on non-V100 GPUs) or python attention.py fp16 (on V100 GPUs).
Maintenance & Community
Status: Archive (code provided as-is, no updates expected).
Licensing & Compatibility
The repository does not explicitly state a license. The associated paper and blog post are from OpenAI.
Limitations & Caveats
The project is archived and no longer maintained. The primary dependency, blocksparse, may require installation from source depending on the CUDA and TensorFlow setup. FP16 support is restricted to specific GPU hardware (V100).