MoBA by MoonshotAI

Research paper introducing MoBA for long-context LLMs

Created 8 months ago
1,949 stars

Top 22.4% on SourcePulse

Project Summary

MoBA (Mixture of Block Attention) addresses the quadratic complexity of the attention mechanism in Large Language Models (LLMs) for long-context processing. It targets researchers and developers building or fine-tuning LLMs, offering a flexible, efficient alternative to standard attention that can switch between full and sparse modes without compromising performance.

How It Works

MoBA applies Mixture of Experts (MoE) principles to the attention mechanism. It partitions the full context into blocks and uses a parameter-less top-k gating mechanism that lets each query token select the most relevant KV blocks. The model therefore learns autonomously where to attend, avoiding the predefined biases of other sparse attention methods. The approach is designed for seamless integration and continued training with existing models.
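For intuition, the following is a minimal sketch of the block top-k gating idea described above, written against plain PyTorch. It is not the repository's implementation or API: the function name, single-head layout, mean-pooled block scoring, per-query loop, and the absence of a causal mask are simplifying assumptions.

```python
# Sketch of block top-k gated attention (illustrative, not the MoBA kernels):
# split K/V into fixed-size blocks, score each block against the query via its
# mean-pooled key, keep the top-k blocks, and attend only over those blocks.
import torch
import torch.nn.functional as F

def moba_like_attention(q, k, v, block_size=4, top_k=2):
    """q: [Tq, D], k/v: [Tk, D]. Single head, no causal mask, for brevity."""
    Tq, D = q.shape
    Tk = k.shape[0]
    num_blocks = (Tk + block_size - 1) // block_size

    # Pad K/V so the sequence divides evenly into blocks.
    pad = num_blocks * block_size - Tk
    k_blocks = F.pad(k, (0, 0, 0, pad)).view(num_blocks, block_size, D)
    v_blocks = F.pad(v, (0, 0, 0, pad)).view(num_blocks, block_size, D)

    # Parameter-free gating: each block is represented by its mean-pooled key.
    block_repr = k_blocks.mean(dim=1)                      # [num_blocks, D]
    gate_scores = q @ block_repr.T                         # [Tq, num_blocks]
    top_idx = gate_scores.topk(min(top_k, num_blocks), dim=-1).indices

    out = torch.zeros_like(q)
    for i in range(Tq):
        sel_k = k_blocks[top_idx[i]].reshape(-1, D)        # selected keys
        sel_v = v_blocks[top_idx[i]].reshape(-1, D)        # selected values
        attn = F.softmax(q[i] @ sel_k.T / D ** 0.5, dim=-1)
        out[i] = attn @ sel_v
    return out

# Tiny usage example with random tensors.
torch.manual_seed(0)
q, k, v = (torch.randn(8, 16) for _ in range(3))
print(moba_like_attention(q, k, v).shape)  # torch.Size([8, 16])
```

The per-query Python loop is purely illustrative; the production moba_efficient implementation is what delivers the speedups noted under Highlighted Details.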

Quick Start & Requirements

  • Install: conda create -n moba python=3.10 && conda activate moba && pip install .
  • Prerequisites: flash-attn==2.6.3, torch >= 2.1.0 (a quick version check is sketched after this list).
  • Usage: python3 examples/llama.py --model meta-llama/Llama-3.1-8B --attn moba
  • More info: MoonshotAI/MoBA
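As a small aid, the sketch below checks the two prerequisites listed above from inside Python. The version strings mirror the requirements stated in this summary; how strictly the flash-attn pin must match is an assumption.

```python
# Sanity-check the prerequisites listed above (torch >= 2.1.0, flash-attn == 2.6.3).
# The exact-match check for flash-attn is an assumption based on the pin above.
import torch
import flash_attn

print("torch:", torch.__version__)
print("flash-attn:", flash_attn.__version__)

torch_major_minor = tuple(int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
assert torch_major_minor >= (2, 1), "torch >= 2.1.0 is required"
assert flash_attn.__version__.startswith("2.6.3"), "flash-attn == 2.6.3 is required"
```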

Highlighted Details

  • Offers a production-ready moba_efficient implementation with up to a 40x speedup over moba_naive (measured at 32K sequence length, 1 attention head, MoBA block size 2048, MoBA top-k 3); a timing-harness sketch follows this list.
  • Designed for seamless transition between full and sparse attention modes.
  • Requires continued training of existing models to achieve acceleration benefits; not a drop-in solution for pre-trained models.
  • Successfully deployed to support Kimi's long-context requests.
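For context on the speedup figure above, here is a rough timing harness one could adapt to compare the two implementations. The callables naive_fn and efficient_fn are placeholders, not the package's actual import paths or signatures, and the head dimension is an assumption; substitute the real MoBA functions and move tensors to GPU before drawing conclusions.

```python
# Rough timing harness; `naive_fn` / `efficient_fn` are placeholders for the
# repository's moba_naive / moba_efficient attention callables (signatures assumed).
import time
import torch

def avg_seconds(fn, *args, iters=10):
    fn(*args)                                  # warm-up call
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

# 32K sequence length and 1 attention head as in the reported test;
# a head dimension of 128 is an assumption for illustration.
seq_len, head_dim = 32 * 1024, 128
q = k = v = torch.randn(seq_len, head_dim)

# speedup = avg_seconds(naive_fn, q, k, v) / avg_seconds(efficient_fn, q, k, v)
```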

Maintenance & Community

  • The project is associated with Moonshot AI.
  • Code is available on GitHub: MoonshotAI/MoBA.

Licensing & Compatibility

  • The README does not explicitly state a license. The code repository should be checked for licensing information.

Limitations & Caveats

  • MoBA requires continued training on existing models to realize its performance benefits; it is not a direct replacement for standard attention on pre-trained models.
  • The moba_naive implementation is for understanding and visualization, not production use.

Health Check

  • Last Commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 40 stars in the last 30 days
