hustvl
Efficient attention mechanism for deep language models
Top 96.0% on SourcePulse
Mixture-of-Depths Attention (MoDA) addresses signal degradation in deep LLMs by letting attention heads access information from preceding layers. Each head attends both to the KV pairs of the current sequence and to KV pairs from earlier depths, which improves feature propagation. MoDA comes with a hardware-efficient implementation and is positioned as a primitive for scaling model depth without significant computational overhead. It is aimed at researchers and engineers building deep models.
How It Works
MoDA integrates a "depth stream" alongside standard sequence attention. Each head queries KV pairs from the current layer's sequence and from depth streams of preceding layers, mitigating feature dilution in deep models. The project emphasizes a hardware-efficient implementation that resolves non-contiguous memory access. A "Chunk/Group-aware MoDA" variant further optimizes depth KV calculation by reorganizing data based on chunk size and GQA groups, reducing memory access overhead.
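The idea of one head attending jointly to sequence KV pairs and depth KV pairs can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the project's Triton kernel: the function name moda_head is hypothetical, and the sketch assumes that the depth KV pairs are simply appended to the sequence KV pairs and normalized with a single softmax.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def moda_head(query, seq_keys, seq_values, depth_keys, depth_values):
    """One head, one query token (hypothetical helper, not the repo's API).

    Assumption: sequence KV (current layer) and depth KV (earlier layers)
    share one softmax, so they compete for the same attention mass.
    """
    keys = seq_keys + depth_keys
    values = seq_values + depth_values
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights


# Toy inputs: two sequence positions plus one KV pair from an earlier layer.
q = [1.0, 0.0]
seq_k = [[1.0, 0.0], [0.0, 1.0]]
seq_v = [[1.0, 0.0], [0.0, 1.0]]
depth_k = [[1.0, 0.0]]
depth_v = [[0.5, 0.5]]

out, w = moda_head(q, seq_k, seq_v, depth_k, depth_v)
baseline_out, baseline_w = moda_head(q, seq_k, seq_v, [], [])
```

With empty depth lists the function reduces to ordinary scaled dot-product attention over the sequence, which makes the depth stream easy to compare against a baseline.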
Quick Start & Requirements
To install, clone the repo and install the MoDA-enabled fla package locally by running cd libs/moda_triton && pip install -e . from the repository root. Dependencies include PyTorch (>= 2.5), Triton (>= 3.0), einops, transformers (>= 4.53.0), datasets (>= 3.3.0), and causal-conv1d (>= 1.4.0). Example commands are provided for testing the Triton kernel and for training vision tasks (DeiT on ImageNet).
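Before installing, it can help to check an environment against the minimum versions listed above. The following is a rough sketch using only the standard library. The pip distribution names in MINIMUMS are assumptions (for example, that PyTorch is installed as torch), and the version parsing is deliberately simplistic: it compares only the leading numeric components.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

# Minimum versions from the requirements above; None means "any version".
# The distribution names are assumed, not taken from the project.
MINIMUMS = {
    "torch": "2.5",
    "triton": "3.0",
    "einops": None,
    "transformers": "4.53.0",
    "datasets": "3.3.0",
    "causal-conv1d": "1.4.0",
}


def parse(text):
    """Turn '2.6.0+cu124' into (2, 6, 0); ignores local and pre-release tags."""
    numbers = []
    for piece in text.split("+")[0].split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        numbers.append(int(digits))
    return tuple((numbers + [0, 0, 0])[:3])


def check(minimums, lookup=version):
    """Return {name: status}; lookup can be swapped out for testing."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is None or parse(installed) >= parse(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for name, status in check(MINIMUMS).items():
        print(f"{name}: {status}")
```

A real setup would more likely rely on pip's own resolver or the packaging library for version comparison; the hand-rolled parser here only keeps the sketch free of extra dependencies.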
Maintenance & Community
Developed by researchers from Huazhong University of Science & Technology and ByteDance. Updates are shared via X/Twitter and blog articles. No explicit community channels or public roadmap are detailed.
Licensing & Compatibility
The specific open-source license is not explicitly stated in the provided README, requiring further investigation for commercial use or integration into closed-source projects.
Limitations & Caveats
The project is under active development, with a stated TODO to "Release full LLM training recipe and reproducible configs." Comprehensive LLM training configurations are pending, though vision task recipes are available.