Training-free sparse attention for model inference acceleration
SpargeAttn provides a training-free sparse attention mechanism designed to accelerate inference across various models, including language, image, and video generation. It targets researchers and engineers seeking to improve the efficiency of existing deep learning architectures without requiring model retraining.
How It Works
SpargeAttn implements a sparse attention mechanism that dynamically identifies and focuses on salient attention patterns at inference time. This reduces computational overhead by selectively computing attention scores, yielding significant speedups without any retraining. The implementation offers two variants: one based on SageAttention, and an updated version based on SageAttention2 that is reported to deliver a further 30% speedup.
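As a rough illustration of the idea only (SpargeAttn's actual kernels are fused CUDA implementations built on SageAttention, and the block size and keep ratio below are illustrative placeholders, not its real hyper-parameters), the sketch computes coarse block-level scores from mean-pooled queries and keys and evaluates attention only for the highest-scoring block pairs:

```python
# Illustrative block-sparse attention sketch in PyTorch; not SpargeAttn's API.
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block=64, keep_ratio=0.3):
    """q, k, v: (batch, heads, seq, dim); seq must be divisible by `block`."""
    b, h, n, d = q.shape
    nb = n // block
    # Coarse scores from mean-pooled blocks decide which pairs to compute.
    q_pool = q.view(b, h, nb, block, d).mean(dim=3)
    k_pool = k.view(b, h, nb, block, d).mean(dim=3)
    coarse = q_pool @ k_pool.transpose(-1, -2) / d ** 0.5   # (b, h, nb, nb)
    # Keep only the top-scoring key blocks for each query block.
    k_keep = max(1, int(keep_ratio * nb))
    topk = coarse.topk(k_keep, dim=-1).indices
    mask = torch.zeros(b, h, nb, nb, dtype=torch.bool, device=q.device)
    mask.scatter_(-1, topk, True)
    # Expand the block mask to token resolution and run masked attention.
    full_mask = mask.repeat_interleave(block, 2).repeat_interleave(block, 3)
    scores = (q @ k.transpose(-1, -2)) / d ** 0.5
    scores = scores.masked_fill(~full_mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```

A fused kernel avoids materializing the full score matrix at all, which is where the real speedup comes from; the sketch only shows the block-selection logic.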
Quick Start & Requirements
Install the ninja build dependency (pip install ninja), then build from source with python setup.py install or, for an editable install, pip install -e .
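After installation, usage is intended as a drop-in replacement for a dense attention call during inference. The module and function names below are assumptions for illustration only; consult the repository README for the actual API and tuned hyper-parameters:

```python
# Hypothetical usage sketch; import path and function name are assumptions.
import torch
from spas_sage_attn import spas_sage2_attn_meansim_cuda  # SageAttention2-based variant

q = torch.randn(1, 16, 4096, 128, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Replaces a dense scaled-dot-product attention call.
out = spas_sage2_attn_meansim_cuda(q, k, v, is_causal=False)
```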
Maintenance & Community
The project welcomes contributions for supporting additional models. Links to tuned checkpoints on Hugging Face are provided.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial or closed-source integration.
Limitations & Caveats
The README notes that the provided hyper-parameters are tuned for the SageAttention variant, and re-tuning is recommended for optimal performance with the newer SageAttention2 API. The --compile flag can slow down the first inference pass.
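If the first-pass slowdown stems from one-time compilation on the initial call (the usual behavior of torch.compile-style flags; an assumption here, as the README does not say), a throwaway warm-up pass before benchmarking or serving avoids measuring it. A minimal sketch, assuming a CUDA device:

```python
# Warm-up before timing so one-time compilation cost is not measured.
import time
import torch

def timed(fn, *args, warmup=1, iters=10):
    for _ in range(warmup):          # absorbs first-pass compilation overhead
        fn(*args)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters
```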