Research paper introducing MoBA for long-context LLMs
Top 23.9% on sourcepulse
MoBA (Mixture of Block Attention) addresses the quadratic complexity of the attention mechanism in Large Language Models (LLMs) for long-context processing. It targets researchers and developers building or fine-tuning LLMs, offering a flexible and efficient alternative to standard attention that can transition between full and sparse attention modes without compromising performance.
How It Works
MoBA applies Mixture of Experts (MoE) principles to the attention mechanism. It divides the full context into blocks and uses a parameter-less top-k gating mechanism for each query token to select the most relevant KV blocks. This allows the model to autonomously learn where to attend, avoiding predefined biases of other sparse attention methods. The approach is designed for seamless integration and continued training with existing models.
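As an illustration of this gating, here is a minimal single-head PyTorch sketch (the function name moba_sketch, the shapes, and the use of mean pooling as the block-level key summary are assumptions for illustration; details of the repository's implementation such as causal masking and multi-head batching are omitted):

import torch
import torch.nn.functional as F

def moba_sketch(q, k, v, block_size=4, top_k=2):
    # Illustrative block top-k gated attention (single head, no batching).
    # q, k, v: [seq_len, head_dim]; seq_len assumed divisible by block_size.
    seq_len, dim = k.shape
    n_blocks = seq_len // block_size

    # Gate: score each query against a mean-pooled summary of every KV block.
    block_keys = k.view(n_blocks, block_size, dim).mean(dim=1)    # [n_blocks, dim]
    gate = q @ block_keys.T                                       # [seq_len, n_blocks]

    # Parameter-less top-k gating: keep only the highest-scoring blocks per query.
    topk_idx = gate.topk(top_k, dim=-1).indices                   # [seq_len, top_k]
    block_mask = torch.zeros(seq_len, n_blocks, dtype=torch.bool)
    block_mask.scatter_(1, topk_idx, True)

    # Expand the block mask to token level and run masked attention.
    token_mask = block_mask.repeat_interleave(block_size, dim=1)  # [seq_len, seq_len]
    scores = (q @ k.T) / dim ** 0.5
    scores = scores.masked_fill(~token_mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(16, 8)
sparse_out = moba_sketch(q, k, v, block_size=4, top_k=2)  # sparse mode
full_out = moba_sketch(q, k, v, block_size=4, top_k=4)    # top_k = n_blocks recovers full attention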
Quick Start & Requirements
conda create -n moba python=3.10
conda activate moba
pip install .

Requirements: flash-attn==2.6.3, torch >= 2.1.0

python3 examples/llama.py --model meta-llama/Llama-3.1-8B --attn moba
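An optional sanity check (not part of the repository's instructions) to confirm the pinned dependencies are importable before running the example; that flash_attn exposes __version__ is an assumption to verify against your installed build:

import torch
import flash_attn

print("torch:", torch.__version__)            # expected >= 2.1.0
print("flash-attn:", flash_attn.__version__)  # expected 2.6.3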
Highlighted Details
moba_efficient implementation with up to a 40x speedup over moba_naive (tested at 32K sequence length, 1 attention head, MoBA block size 2048, MoBA top-k 3).
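For context on those settings (an illustrative calculation, not the repository's benchmark code), the block size and top-k imply that each query attends to only a fraction of the keys:

seq_len, block_size, top_k = 32_768, 2048, 3

n_blocks = seq_len // block_size      # 16 blocks
keys_per_query = top_k * block_size   # 6,144 keys selected per query
density = keys_per_query / seq_len    # 0.1875

print(f"{n_blocks} blocks; each query attends to {keys_per_query} of "
      f"{seq_len} keys ({density:.1%} of full attention)")

Note that the 40x figure compares two implementations of the same MoBA computation (moba_efficient vs. moba_naive), not sparse versus full attention.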
Maintenance & Community
Last updated 4 months ago; the repository is listed as inactive.

Licensing & Compatibility

Limitations & Caveats
The moba_naive implementation is for understanding and visualization, not production use.