FlashAttention extension for ring attention
This repository provides implementations of Ring Attention, a technique for scaling the attention computation across multiple GPUs, integrated with FlashAttention for efficiency. It targets researchers and engineers working with large language models who need to overcome memory and compute bottlenecks during training and inference, offering attention variants built on FlashAttention's optimized kernels that reduce memory overhead and increase throughput.
How It Works
Ring Attention distributes attention computation across a ring of GPUs, allowing for longer sequence lengths than would be possible on a single device. It leverages FlashAttention's optimized kernels for efficient computation of the attention mechanism. The project offers several variants, including a basic ring attention, a compute-balanced "zigzag" version, and a "llama3" context parallelism approach that is less intrusive for existing training frameworks.
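Conceptually, each rank keeps its local queries and rotates key/value blocks around the ring, merging per-block results with a running log-sum-exp. The following is a minimal sketch of that idea only, not this repository's kernels: it uses plain PyTorch attention instead of FlashAttention and omits causal masking, GQA, and the zigzag load balancing.

```python
# Minimal ring-attention sketch (illustrative only; assumes world_size > 1 and NCCL).
import torch
import torch.distributed as dist


def naive_block_attn(q, k, v, scale):
    # q, k, v: (B, H, L, D). Returns the per-block output and log-sum-exp in fp32.
    scores = torch.einsum("bhqd,bhkd->bhqk", q.float(), k.float()) * scale
    lse = torch.logsumexp(scores, dim=-1)                      # (B, H, Lq)
    out = torch.einsum("bhqk,bhkd->bhqd", scores.softmax(-1), v.float())
    return out, lse


def ring_attention(q, k, v):
    rank, world = dist.get_rank(), dist.get_world_size()
    scale = q.shape[-1] ** -0.5
    send_to, recv_from = (rank + 1) % world, (rank - 1) % world
    k_cur, v_cur = k.contiguous(), v.contiguous()
    out, lse = None, None

    for _ in range(world):
        # Post the send/recv of the next K/V block, then compute on the current one,
        # so communication overlaps with computation.
        k_next, v_next = torch.empty_like(k_cur), torch.empty_like(v_cur)
        handles = dist.batch_isend_irecv([
            dist.P2POp(dist.isend, k_cur, send_to),
            dist.P2POp(dist.irecv, k_next, recv_from),
            dist.P2POp(dist.isend, v_cur, send_to),
            dist.P2POp(dist.irecv, v_next, recv_from),
        ])

        block_out, block_lse = naive_block_attn(q, k_cur, v_cur, scale)
        if out is None:
            out, lse = block_out, block_lse
        else:
            # Merge the new block into the running result via log-sum-exp rescaling.
            new_lse = torch.logaddexp(lse, block_lse)
            out = (
                out * (lse - new_lse).exp().unsqueeze(-1)
                + block_out * (block_lse - new_lse).exp().unsqueeze(-1)
            )
            lse = new_lse

        for h in handles:
            h.wait()
        k_cur, v_cur = k_next, v_next

    return out.to(q.dtype)
```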
Quick Start & Requirements
Install with pip install ring-flash-attn, or build from source.

Run a correctness test, e.g.:
torchrun --nproc_per_node 8 test/test_llama3_flash_attn_varlen_func.py
(example for 8 GPUs)

Run a benchmark, e.g.:
torchrun --nproc_per_node 8 benchmark/benchmark_kvpacked_func.py
(example for 8 GPUs)
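A first end-to-end run might look like the sketch below. This is a hedged example: it assumes the package exposes ring_flash_attn_qkvpacked_func as a drop-in for flash-attn's flash_attn_qkvpacked_func; the exported names and exact signatures should be checked against the scripts in test/.

```python
# Hypothetical usage sketch; each rank holds only its shard of the sequence.
import torch
import torch.distributed as dist
from ring_flash_attn import ring_flash_attn_qkvpacked_func  # assumed export

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())  # single-node example

batch, local_seqlen, nheads, headdim = 1, 4096, 8, 64  # per-rank sequence shard
qkv = torch.randn(
    batch, local_seqlen, 3, nheads, headdim,
    device="cuda", dtype=torch.bfloat16, requires_grad=True,
)

# Output layout mirrors flash-attn: (batch, local_seqlen, nheads, headdim)
out = ring_flash_attn_qkvpacked_func(qkv, causal=True)
out.sum().backward()
```

Launch such a script with torchrun --nproc_per_node <num_gpus> script.py.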
Highlighted Details

llama3_flash_attn_varlen_func is recommended for varlen use cases due to its lower intrusion and better precision.

Maintenance & Community
The project is actively developed by zhuzilin. There are no explicit mentions of community channels (e.g., Discord/Slack) or formal roadmaps in the README.
Licensing & Compatibility
The repository does not explicitly state a license. This is a critical omission for evaluating commercial use or integration into closed-source projects.
Limitations & Caveats
The implementation has known arithmetic errors: FlashAttention returns each block's output in bf16, so accumulating results across ring steps loses precision, and working around this requires extra fp32 buffers that increase memory usage. Dropout is not supported because of the difficulty of managing RNG states across devices, and windowed attention is not supported due to implementation complexity.
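The accumulation issue can be seen in isolation with the toy snippet below (the shapes and values are arbitrary, chosen only to illustrate the precision gap between a bf16 and an fp32 accumulator).

```python
# Illustration only: summing many bf16 block outputs drifts when the accumulator
# is also bf16, while an fp32 accumulator (extra memory) stays near the reference.
import torch

blocks = [torch.randn(4096) * 1e-2 for _ in range(64)]  # stand-ins for per-block outputs

ref = torch.stack(blocks).sum(0)                         # fp32 reference
acc_bf16 = torch.zeros(4096, dtype=torch.bfloat16)
acc_fp32 = torch.zeros(4096, dtype=torch.float32)
for b in blocks:
    acc_bf16 += b.to(torch.bfloat16)                     # bf16 outputs, bf16 accumulator
    acc_fp32 += b.to(torch.bfloat16).float()             # same bf16 outputs, fp32 accumulator

print((acc_bf16.float() - ref).abs().max(), (acc_fp32 - ref).abs().max())
```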