QwenLM/FlashQLA
Accelerate AI workloads with high-performance linear attention kernels
FlashQLA is a high-performance linear attention kernel library that accelerates GDN Chunked Prefill. It targets researchers and engineers working with large language models, particularly for pretraining and edge-side agentic inference, and offers significant speedups over existing Triton kernels on modern NVIDIA hardware. The kernels are written in TileLang.
How It Works
FlashQLA builds on TileLang to implement an optimized GDN Chunked Prefill kernel, applying operator fusion and performance optimizations to both the forward and backward passes. Its key techniques are:
Gate-driven automatic intra-card context parallelism, which raises GPU SM utilization by exploiting the exponential decay property of the GDN gate.
Hardware-friendly algebraic reformulations, which reduce computational overhead without sacrificing numerical precision.
Fused, warp-specialized kernels built with TileLang, which overlap data movement with computation.
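To make the exponential decay property concrete, here is a minimal plain-Python sketch. It assumes GDN refers to Gated DeltaNet and uses that model's commonly cited gated delta rule, S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T, with unit-norm keys. The function names (`gated_delta_step`, `run_segment`, `random_tokens`) are hypothetical and are not part of FlashQLA. The sketch only shows that the influence of the state carried into a segment is bounded by the product of the gates over that segment, which is one plausible reading of why a long sequence can be split across SMs. It is not the library's actual scheduling algorithm.

```python
import math
import random

def gated_delta_step(S, k, v, alpha, beta):
    """One token of the assumed gated delta rule.

    S is a d_v x d_k state stored as a list of rows, k a unit-norm key,
    v a value vector, alpha in (0, 1] the decay gate, beta in [0, 1] the
    write strength.
    """
    new_state = []
    for row, v_i in zip(S, v):
        pred = sum(s * k_j for s, k_j in zip(row, k))  # (S k)_i: current read-out for key k
        # Decay the old state, erase the old read-out along k, then write the new value.
        new_state.append([alpha * (s - beta * pred * k_j) + beta * v_i * k_j
                          for s, k_j in zip(row, k)])
    return new_state

def run_segment(S, tokens):
    """Apply the recurrence sequentially over a list of (k, v, alpha, beta) tokens."""
    for k, v, alpha, beta in tokens:
        S = gated_delta_step(S, k, v, alpha, beta)
    return S

def frobenius(A):
    return math.sqrt(sum(x * x for row in A for x in row))

def random_tokens(rng, n, d_k, d_v, alpha_range):
    """Hypothetical test data: unit-norm keys, Gaussian values, random gates."""
    tokens = []
    for _ in range(n):
        k = [rng.gauss(0, 1) for _ in range(d_k)]
        norm = math.sqrt(sum(x * x for x in k))
        k = [x / norm for x in k]
        v = [rng.gauss(0, 1) for _ in range(d_v)]
        tokens.append((k, v, rng.uniform(*alpha_range), rng.uniform(0.0, 1.0)))
    return tokens

rng = random.Random(0)
d_k, d_v, n = 4, 3, 64
tokens = random_tokens(rng, n, d_k, d_v, (0.8, 0.95))
S0 = [[rng.gauss(0, 1) for _ in range(d_k)] for _ in range(d_v)]
zero = [[0.0] * d_k for _ in range(d_v)]

with_history = run_segment(S0, tokens)       # segment started from the true incoming state
without_history = run_segment(zero, tokens)  # same segment started from an empty state

gap = frobenius([[a - b for a, b in zip(ra, rb)]
                 for ra, rb in zip(with_history, without_history)])
decay = math.prod(alpha for _, _, alpha, _ in tokens)
bound = decay * frobenius(S0)

print("gap between the two final states:", gap)
print("bound from cumulative gate decay:", bound)
```

Because each factor (I - beta k k^T) has spectral norm at most 1 under these assumptions, the gap can never exceed the cumulative decay times the norm of the incoming state. When the gates decay quickly, a segment depends only weakly on distant history.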
Quick Start & Requirements
git clone https://github.com/QwenLM/FlashQLA.git
cd FlashQLA
pip install -v .
Maintenance & Community
No specific community channels (e.g., Discord, Slack) or roadmap links are provided in the README. The project is associated with QwenLM.
Licensing & Compatibility
FlashQLA is released under the MIT License, which is permissive for commercial use and integration into closed-source projects.
Limitations & Caveats
The library has stringent hardware and software requirements, mandating NVIDIA SM90+ GPUs and recent versions of CUDA and PyTorch. These prerequisites may limit adoption for users with older hardware or different development environments.
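A quick way to check the GPU requirement before installing is to compare the device's compute capability against 9.0, which is what SM90 denotes. The sketch below is a minimal example and is not part of FlashQLA. The helper names are hypothetical, and it relies only on PyTorch's `torch.cuda.is_available` and `torch.cuda.get_device_capability`. It does not check CUDA or PyTorch versions, since the exact minimums are not stated here.

```python
def meets_min_capability(capability, minimum=(9, 0)):
    """Return True if a (major, minor) compute capability is at least `minimum`."""
    return tuple(capability) >= tuple(minimum)

def local_gpu_supported():
    """Check device 0 on this machine.

    Returns False when PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    return meets_min_capability(torch.cuda.get_device_capability(0))

print("SM90+ GPU detected:", local_gpu_supported())
```

Tuple comparison keeps the check correct for future architectures, for example (10, 0) compares greater than (9, 0), whereas comparing only the major or minor number in isolation would not.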