sparse_attention by openai

Sparse attention primitives for research

Created 6 years ago
1,588 stars

Top 26.3% on SourcePulse

Project Summary

This repository provides sparse attention primitives for efficiently processing long sequences in Transformer models, targeting researchers and engineers working on large-scale language generation. It offers optimized attention kernels that reduce computational complexity by skipping unnecessary calculations, enabling faster training and inference.

How It Works

The core of the project is a set of fused attention kernels that support block sparsity. Instead of computing the full attention matrix, users define a layout of blocks to be skipped (treated as zero) in the QK^T product and the softmax; because skipped blocks are never computed at all, rather than being masked out after a dense computation, the savings grow with sequence length. The approach is detailed in the Sparse Transformers paper.
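
For intuition, the sketch below shows the block-skipping idea in plain NumPy; it is illustrative only and not the repository's fused CUDA kernels. A block layout matrix marks which key/value blocks each query block attends to, and QK^T and the softmax are evaluated only over those blocks. The helper name, the toy "local" layout, and the shapes are assumptions, and within-block causal masking is omitted for brevity.

    # Conceptual NumPy sketch of block-sparse attention (illustrative only; the
    # repository ships fused CUDA kernels via blocksparse). "layout" is an
    # n_blocks x n_blocks 0/1 matrix: layout[i, j] == 1 means query block i
    # attends to key/value block j.
    import numpy as np

    def block_sparse_attention(q, k, v, layout, blocksize):
        # q, k, v: [seq_len, head_dim]; within-block causal masking omitted.
        seq_len, head_dim = q.shape
        out = np.zeros_like(q)
        for i in range(seq_len // blocksize):
            qi = q[i * blocksize:(i + 1) * blocksize]
            cols = np.nonzero(layout[i])[0]
            # Only the key/value blocks kept by the layout are ever touched.
            kj = np.concatenate([k[j * blocksize:(j + 1) * blocksize] for j in cols])
            vj = np.concatenate([v[j * blocksize:(j + 1) * blocksize] for j in cols])
            scores = qi @ kj.T / np.sqrt(head_dim)              # partial QK^T
            scores -= scores.max(axis=-1, keepdims=True)
            weights = np.exp(scores)
            weights /= weights.sum(axis=-1, keepdims=True)      # softmax over kept blocks
            out[i * blocksize:(i + 1) * blocksize] = weights @ vj
        return out

    # Toy "local" layout: each 32-token block attends to itself and its predecessor.
    seq_len, head_dim, blocksize = 256, 64, 32
    n_blocks = seq_len // blocksize
    layout = np.eye(n_blocks, dtype=int) + np.eye(n_blocks, k=-1, dtype=int)
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((seq_len, head_dim)) for _ in range(3))
    print(block_sparse_attention(q, k, v, layout, blocksize).shape)  # (256, 64)

In the library, the same block selection happens inside fused GPU kernels, which is where the speedup over dense attention comes from.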

Quick Start & Requirements

  • Install via pip install blocksparse (requires CUDA 10 and tensorflow-gpu).
  • For fp16 and smaller block sizes (8, 16, 32, 64), a GPU with Tensor Cores (compute capability >= 7.0, e.g., V100) is required.
  • Non-V100 GPUs can run fp32 with blocksize 32 (compute capability > 3.5).
  • Example usage: python attention.py (non-V100) or python attention.py fp16 (V100); see the sketch after this list.
  • Official blog and paper: https://openai.com/blog/sparse-transformers
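
As rough orientation, the sketch below shows what driving the repository's attention.py from Python might look like. The helper name blocksparse_attention_impl, its parameters (attn_mode, local_attn_ctx, blocksize), the attention mode names, and the tensor shapes are assumptions rather than verified API; check them against the source before relying on them.

    # Hypothetical usage sketch -- the helper name, signature, attention modes,
    # and shapes are assumptions; verify against attention.py in the repo.
    import numpy as np
    import tensorflow as tf
    from attention import blocksparse_attention_impl  # assumed import

    n_batch, n_ctx, n_embd, heads = 2, 1024, 256, 4
    rng = np.random.RandomState(0)
    q = tf.constant(rng.randn(n_batch, n_ctx, n_embd).astype(np.float32))
    k = tf.constant(rng.randn(n_batch, n_ctx, n_embd).astype(np.float32))
    v = tf.constant(rng.randn(n_batch, n_ctx, n_embd).astype(np.float32))

    # "strided" is one of the sparse patterns from the Sparse Transformers paper;
    # local_attn_ctx is assumed to set the stride / local window size.
    out = blocksparse_attention_impl(q, k, v, heads=heads, attn_mode="strided",
                                     local_attn_ctx=32, blocksize=32)

    # TF1-style session, consistent with the CUDA 10 / tensorflow-gpu requirement.
    with tf.Session() as sess:
        print(sess.run(out).shape)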

Highlighted Details

Maintenance & Community

Status: Archive (code provided as-is, no updates expected).

Licensing & Compatibility

The repository does not explicitly state a license. The associated paper and blog post are from OpenAI.

Limitations & Caveats

The project is archived and no longer maintained. The primary dependency, blocksparse, may require installation from source depending on the CUDA and TensorFlow setup. FP16 support is restricted to specific GPU hardware.

Health Check
  • Last Commit: 5 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 30 days

Explore Similar Projects

Starred by Alex Yu (Research Scientist at OpenAI; Former Cofounder of Luma AI), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

ring-attention-pytorch by lucidrains

Pytorch impl of Ring Attention for near-infinite context

Created 1 year ago
Updated 4 months ago
538 stars
Top 0.2% on SourcePulse