HKUSTDial: Trainable sparse attention for long sequences
Top 53.9% on SourcePulse
This project provides a high-performance, trainable sparse attention implementation designed to significantly improve memory efficiency and computational speed for Transformer models handling extremely long sequences. It targets researchers and engineers working with large language models and other sequence-based deep learning architectures, offering a way to scale attention mechanisms beyond current practical limits.
How It Works
Flash-Sparse-Attention combines the memory efficiency of Flash Attention with sparse computation techniques. It implements trainable sparse attention, allowing the model to dynamically skip low-contribution attention weights via a configurable softmax_threshold. The core approach supports dense, sparse, and gated attention variants, regular and variable-length inputs, causal and local window attention, and optimizations like Split-KV for decoding. This allows for reduced effective compute and memory usage on long sequences.
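The thresholded-softmax idea described above can be illustrated with a minimal NumPy sketch. This is not the project's kernel or API; the function name, the dense-then-mask formulation, and the renormalization step are illustrative assumptions. A real fused kernel would skip the pruned key/value blocks entirely rather than zeroing them after the fact.

```python
import numpy as np

def sparse_attention(q, k, v, softmax_threshold=0.01, causal=True):
    """Sketch of threshold-based sparse attention for a single head.

    Scores are computed densely, softmax-normalized, and any weight
    below `softmax_threshold` is zeroed out; surviving weights are
    renormalized so each row still sums to 1.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                   # (L, L) attention logits
    if causal:
        L = scores.shape[0]
        future = np.triu(np.ones((L, L), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)  # mask future positions
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Drop low-contribution weights (the trainable-sparsity knob).
    weights = np.where(weights < softmax_threshold, 0.0, weights)
    weights /= weights.sum(axis=-1, keepdims=True)  # renormalize survivors
    return weights @ v

rng = np.random.default_rng(0)
L, d = 8, 4
q, k, v = (rng.standard_normal((L, d)) for _ in range(3))
out = sparse_attention(q, k, v, softmax_threshold=0.05)
```

Raising `softmax_threshold` prunes more of the attention map, trading a small amount of fidelity for less effective compute on long sequences.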
Quick Start & Requirements
Install from PyPI:

    pip install flash-sparse-attn

Or build from source:

    git clone https://github.com/flash-algo/flash-sparse-attn.git && cd flash-sparse-attn && pip install .
Maintenance & Community
The project acknowledges contributions from OpenSeek, Flash-Attention, and NVIDIA CUTLASS. No specific community channels (like Discord/Slack) or roadmap links are provided in the README. The citation points to an arXiv paper from 2025, indicating recent development activity.
Licensing & Compatibility
The README does not state a license. Until one is added, verify the project's terms before using it in commercial or closed-source applications.
Limitations & Caveats
Support for arbitrary mask and bias shapes is available in a separate branch, not the main branch. Features such as Paged Attention, TMA, WGMMA, and FP8 low precision are listed as future aims, indicating they are not yet implemented.