MoBA by MoonshotAI

Research paper introducing MoBA for long-context LLMs

created 5 months ago
1,847 stars

Top 23.9% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

MoBA (Mixture of Block Attention) addresses the quadratic complexity of attention mechanisms in Large Language Models (LLMs) for long-context processing. It targets researchers and developers building or fine-tuning LLMs, offering a flexible and efficient alternative to standard attention that can transition between full and sparse modes without compromising performance.

How It Works

MoBA applies Mixture of Experts (MoE) principles to the attention mechanism. It divides the full context into blocks and uses a parameter-less top-k gating mechanism for each query token to select the most relevant KV blocks. This allows the model to autonomously learn where to attend, avoiding predefined biases of other sparse attention methods. The approach is designed for seamless integration and continued training with existing models.
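As a rough illustration of this gating, the minimal PyTorch sketch below (a simplification under stated assumptions, not the repository's code: single head, keys mean-pooled per block as the gating signal, no causal masking or batching) shows each query selecting its top-k blocks and attending only within them.

    # Simplified single-head sketch of block top-k gated attention (illustrative,
    # not MoonshotAI's implementation; causal masking and batching are omitted).
    import torch
    import torch.nn.functional as F

    def moba_attention_sketch(q, k, v, block_size=4, topk=2):
        """q, k, v: [seq_len, head_dim] tensors; returns [seq_len, head_dim]."""
        seq_len, head_dim = k.shape
        num_blocks = (seq_len + block_size - 1) // block_size

        # Parameter-free gating signal: mean-pool the keys inside each block.
        pad = num_blocks * block_size - seq_len
        k_blocks = F.pad(k, (0, 0, 0, pad)).view(num_blocks, block_size, head_dim)
        block_means = k_blocks.mean(dim=1)                       # [num_blocks, head_dim]

        # Each query scores every block summary and keeps only the top-k blocks.
        gate_scores = q @ block_means.T                          # [seq_len, num_blocks]
        top_blocks = gate_scores.topk(min(topk, num_blocks), dim=-1).indices

        out = torch.zeros_like(q)
        for i in range(seq_len):
            # Standard scaled-dot-product attention restricted to the selected blocks.
            idx = torch.cat([
                torch.arange(b * block_size, min((b + 1) * block_size, seq_len))
                for b in top_blocks[i].tolist()
            ])
            weights = F.softmax((q[i] @ k[idx].T) / head_dim ** 0.5, dim=-1)
            out[i] = weights @ v[idx]
        return out

    # Example: 16 tokens split into 4 blocks; each query attends to its top-2 blocks.
    q, k, v = (torch.randn(16, 8) for _ in range(3))
    print(moba_attention_sketch(q, k, v).shape)                  # torch.Size([16, 8])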

Quick Start & Requirements

  • Install: conda create -n moba python=3.10, conda activate moba, pip install .
  • Prerequisites: flash-attn==2.6.3, torch >= 2.1.0.
  • Usage: python3 examples/llama.py --model meta-llama/Llama-3.1-8B --attn moba (see the sketch after this list)
  • More info: MoonshotAI/MoBA
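
For orientation, the sketch below shows the kind of flow such an example script typically follows with the Hugging Face transformers library. It is an assumption-laden illustration, not the repository's examples/llama.py, and it omits the MoBA-specific step of wiring the chosen attention backend into the model, whose API is repo-specific.

    # Rough sketch of the flow an example script such as examples/llama.py follows.
    # Illustrative only: the real script also registers the selected attention backend
    # with the model before use; that step is omitted here.
    import argparse
    from transformers import AutoModelForCausalLM, AutoTokenizer

    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default="meta-llama/Llama-3.1-8B")
    parser.add_argument("--attn", default="moba")  # value taken from the quoted command
    args = parser.parse_args()

    tokenizer = AutoTokenizer.from_pretrained(args.model)
    model = AutoModelForCausalLM.from_pretrained(args.model, torch_dtype="auto", device_map="auto")

    prompt = "MoBA applies mixture-of-experts ideas to attention by"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0], skip_special_tokens=True))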

Highlighted Details

  • Offers a production-ready moba_efficient implementation with up to a 40x speedup over moba_naive (measured at 32K sequence length, 1 attention head, MoBA block size 2048, MoBA top-k 3; see the quick calculation after this list).
  • Designed for seamless transition between full and sparse attention modes.
  • Requires continued training of existing models to achieve acceleration benefits; not a drop-in solution for pre-trained models.
  • Successfully deployed to support Kimi's long-context requests.
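
To put the quoted benchmark configuration in perspective, the back-of-the-envelope calculation below (an illustration assuming each query attends to topk blocks of block_size tokens, ignoring causal details) shows how much of the 32K-token KV cache a MoBA query touches. Note that the 40x figure above compares the two implementations, not sparse versus full attention.

    # Back-of-the-envelope sparsity for the quoted benchmark configuration
    # (assumes each query attends to topk blocks of block_size tokens; causal details ignored).
    seq_len, block_size, topk = 32 * 1024, 2048, 3

    attended = topk * block_size      # KV tokens each query can attend to
    fraction = attended / seq_len     # fraction of the full context touched per query
    print(f"{attended} of {seq_len} tokens per query ({fraction:.1%} of full attention)")
    # -> 6144 of 32768 tokens per query (18.8% of full attention)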

Maintenance & Community

  • The project is associated with Moonshot AI.
  • Code is available on GitHub: MoonshotAI/MoBA.

Licensing & Compatibility

  • The README does not explicitly state a license. The code repository should be checked for licensing information.

Limitations & Caveats

  • MoBA requires continued training on existing models to realize its performance benefits; it is not a direct replacement for standard attention on pre-trained models.
  • The moba_naive implementation is for understanding and visualization, not production use.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 98 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 1 more.

yarn by jquesnelle

1.0% · 2k stars
Context window extension method for LLMs (research paper, models)
created 2 years ago, updated 1 year ago