Mixture-of-Head Attention for efficient Transformers
Top 94.3% on SourcePulse
This project introduces Mixture-of-Head Attention (MoH), a novel architecture that treats attention heads as experts within a Mixture-of-Experts framework. It improves inference efficiency and model performance by letting each token select the attention heads best suited to it and by replacing the standard summation over heads with a weighted summation. MoH applies to Vision Transformers (ViT), Diffusion Transformers (DiT), and Large Language Models (LLMs), and targets researchers and practitioners in computer vision and natural language processing.
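In symbols, and only as a schematic reading of the description above (the notation H_i for head outputs, W_i^O for slices of the output projection, and g_i for routing weights is ours, not the project's), the change replaces the uniform head summation of standard attention with a routed, weighted one:

```latex
% Standard multi-head attention: every head contributes with weight 1.
\mathrm{MHA}(x) \;=\; \sum_{i=1}^{h} H_i(x)\, W_i^{O}

% Mixture-of-Head attention: each token activates a subset of heads,
% combined with learned routing weights g_i (g_i = 0 for unselected heads).
\mathrm{MoH}(x) \;=\; \sum_{i=1}^{h} g_i(x)\, H_i(x)\, W_i^{O}
```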
How It Works
MoH replaces the standard multi-head attention mechanism with one in which attention heads act as specialized experts. A learned router lets each token dynamically select a subset of heads, and the selected heads' outputs are combined via a weighted sum. Compared with traditional multi-head attention, this uses parameters more efficiently and adds flexibility, delivering performance gains even when only a reduced number of attention heads is activated per token.
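A minimal PyTorch sketch of this idea, assuming a Top-K router over heads inside a single self-attention block; module and parameter names (MoHAttention, heads_per_token, etc.) are illustrative, and details the official SkyworkAI code may add (such as always-active shared heads or auxiliary routing losses) are omitted:

```python
# Sketch of mixture-of-head attention with a per-token Top-K head router.
# Not the official implementation; shapes and routing details are assumptions.
import torch
import torch.nn as nn

class MoHAttention(nn.Module):
    def __init__(self, dim, num_heads=8, heads_per_token=4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.k = heads_per_token                      # heads activated per token
        self.qkv = nn.Linear(dim, dim * 3)
        self.router = nn.Linear(dim, num_heads)       # per-token scores over heads
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)          # each: (B, H, N, d)

        # Standard scaled dot-product attention, computed per head.
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        heads = attn.softmax(dim=-1) @ v              # (B, H, N, d)

        # Router: each token picks its Top-K heads and weights them.
        scores = self.router(x)                       # (B, N, H)
        topk_val, topk_idx = scores.topk(self.k, dim=-1)
        gates = torch.zeros_like(scores).scatter(
            -1, topk_idx, topk_val.softmax(dim=-1))   # zero for unselected heads

        # Weighted summation of head outputs replaces the uniform summation
        # implicit in concatenate-then-project multi-head attention.
        heads = heads.permute(0, 2, 1, 3)             # (B, N, H, d)
        out = (gates.unsqueeze(-1) * heads).reshape(B, N, C)
        return self.proj(out)
```

Because a module like this keeps the same input and output shapes as standard multi-head attention, it can act as a drop-in replacement inside ViT, DiT, or LLM blocks, which is how the project applies the idea across those architectures.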
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project is associated with SkyworkAI; its latest updates (October 2024) concern LLaMA3-8B model availability and tokenizer configuration. Related projects include MoE++ and Chat-UniVi.
Licensing & Compatibility
The majority of the project is released under the Apache 2.0 license. However, the service is a research preview intended for non-commercial use only, subject to LLaMA's model license, OpenAI's data terms, and ShareGPT's privacy practices.
Limitations & Caveats
The "service" aspect is explicitly stated as a research preview for non-commercial use, imposing restrictions beyond the Apache 2.0 license due to dependencies on other model licenses and data terms.
Last updated 10 months ago · Inactive