sparse_attention by openai

Sparse attention primitives for research

created 6 years ago
1,584 stars

Top 26.9% on sourcepulse

Project Summary

This repository provides sparse attention primitives for efficiently processing long sequences in Transformer models, targeting researchers and engineers working on large-scale language generation. It offers optimized attention kernels that reduce compute by skipping blocks of the attention matrix that the sparsity pattern masks out, enabling faster training and inference.

How It Works

The core of the project lies in fused implementations of attention operations that support block sparsity. Instead of computing the full attention matrix, users define a pattern of blocks to be skipped (treated as zero) in the QK^T product and the softmax calculation. This approach, detailed in the Sparse Transformers paper, significantly reduces computation for long sequences, since skipped blocks are never materialized.
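To make the block-skipping idea concrete, here is a minimal NumPy sketch of block-sparse attention. It is an illustration only, not the repository's fused CUDA kernel, and the names (block_sparse_attention, layout) are invented for this example:

    import numpy as np

    def block_sparse_attention(q, k, v, layout, blocksize):
        """Toy block-sparse attention. Only blocks where layout[bi, bj] is
        True are computed in QK^T; skipped blocks are treated as -inf and
        so receive zero weight in the softmax."""
        n, d = q.shape
        nb = n // blocksize
        scores = np.full((n, n), -np.inf)
        for bi in range(nb):
            for bj in range(nb):
                if layout[bi, bj]:  # compute only the kept blocks
                    rs = slice(bi * blocksize, (bi + 1) * blocksize)
                    cs = slice(bj * blocksize, (bj + 1) * blocksize)
                    scores[rs, cs] = q[rs] @ k[cs].T / np.sqrt(d)
        scores -= scores.max(axis=-1, keepdims=True)   # stable softmax
        weights = np.exp(scores)                       # exp(-inf) == 0
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    # Example: each query block attends to itself and the previous block.
    rng = np.random.default_rng(0)
    n, d, bs = 64, 16, 16
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    nb = n // bs
    layout = np.zeros((nb, nb), dtype=bool)
    for i in range(nb):
        layout[i, max(0, i - 1):i + 1] = True
    print(block_sparse_attention(q, k, v, layout, bs).shape)  # (64, 16)

The fused kernels in the repository achieve the same result without ever allocating the full n x n score matrix, which is where the speed and memory savings come from.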

Quick Start & Requirements

  • Install via pip install blocksparse (requires CUDA 10 and tensorflow-gpu).
  • For fp16 and smaller block sizes (8, 16, 32, 64), a GPU with Tensor Cores (compute capability >= 7.0, e.g., V100) is required.
  • GPUs without Tensor Cores (e.g., non-V100) can run fp32 with blocksize 32 (compute capability > 3.5).
  • Example usage: python attention.py (non-V100) or python attention.py fp16 (V100); a sparsity-pattern sketch follows this list.
  • Official blog and paper: https://openai.com/blog/sparse-transformers
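The kernels are driven by block layouts like the one sketched above. As a hedged illustration (this helper is not part of the repository, and the function name and exact pattern definition are assumptions made for this example), the "strided" pattern from the Sparse Transformers paper can be expressed as a boolean block layout:

    import numpy as np

    def strided_block_layout(n_blocks, stride):
        """Illustrative 'strided' pattern in the spirit of the Sparse
        Transformers paper: each block attends to a local causal window
        of `stride` blocks plus every `stride`-th earlier block."""
        layout = np.zeros((n_blocks, n_blocks), dtype=bool)
        for i in range(n_blocks):
            layout[i, max(0, i - stride + 1):i + 1] = True  # local window
            for j in range(i + 1):
                if (i - j) % stride == 0:                   # strided hops
                    layout[i, j] = True
        return layout

    print(strided_block_layout(8, 4).astype(int))

Layouts of this kind keep the number of computed blocks roughly O(n sqrt(n)) rather than O(n^2), which is the complexity reduction the paper reports.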

Highlighted Details

Maintenance & Community

Status: Archive (code provided as-is, no updates expected).

Licensing & Compatibility

The repository does not explicitly state a license. The associated paper and blog post are from OpenAI.

Limitations & Caveats

The project is archived and no longer maintained. The primary dependency, blocksparse, may require installation from source depending on the CUDA and TensorFlow setup. FP16 support is restricted to GPUs with Tensor Cores (compute capability >= 7.0).

Health Check

  • Last commit: 5 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 17 stars in the last 90 days

Explore Similar Projects

nunchaku by nunchaku-tech
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jaret Burkett (Founder of Ostris), and 1 more.
Top 2.1% on sourcepulse · 3k stars
High-performance 4-bit diffusion model inference engine
created 8 months ago · updated 16 hours ago

FasterTransformer by NVIDIA
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 6 more.
Top 0.2% on sourcepulse · 6k stars
Optimized transformer library for inference
created 4 years ago · updated 1 year ago