megablocks by databricks

Lightweight library for mixture-of-experts (MoE) training

Created 2 years ago
1,478 stars

Top 27.8% on SourcePulse

Project Summary

MegaBlocks is a lightweight library for efficient Mixture-of-Experts (MoE) training, targeting researchers and engineers working with large language models. It introduces "dropless-MoE" (dMoE) to improve training speed and efficiency, eliminating token dropping and the need for the capacity_factor hyperparameter.

How It Works

MegaBlocks reformulates MoE layer computation as block-sparse operations, so every token is processed without padding or dropping. This removes the trade-off between hardware efficiency and token dropping that conventional MoE implementations accept, which is where its speedups come from. The library is integrated with Megatron-LM and supports data, expert, and pipeline parallelism for MoE training.
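
To make the routing trade-off concrete, here is a minimal, self-contained sketch in plain PyTorch. It is illustrative only: the Python loop over experts stands in for the block-sparse kernels MegaBlocks actually uses, and none of the names below come from the library's API.

    # Toy top-1 router: capacity-based dropping vs. dropless grouping.
    import torch

    torch.manual_seed(0)
    num_tokens, hidden, num_experts = 16, 8, 4
    x = torch.randn(num_tokens, hidden)
    router = torch.nn.Linear(hidden, num_experts)
    expert_ids = router(x).argmax(dim=-1)           # top-1 expert per token

    # Capacity-factor approach: each expert accepts at most `capacity` tokens;
    # overflow tokens are dropped and fall back to the residual stream.
    capacity = int(1.0 * num_tokens / num_experts)  # capacity_factor = 1.0
    dropped = sum(
        max(0, (expert_ids == e).sum().item() - capacity) for e in range(num_experts)
    )
    print(f"tokens dropped with capacity_factor=1.0: {dropped}/{num_tokens}")

    # Dropless approach: every expert processes exactly the tokens routed to it,
    # so per-expert batches are variable-sized. MegaBlocks expresses this ragged
    # computation as block-sparse matrix multiplication instead of padding or dropping.
    experts = torch.nn.ModuleList([torch.nn.Linear(hidden, hidden) for _ in range(num_experts)])
    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        idx = (expert_ids == e).nonzero(as_tuple=True)[0]
        if idx.numel() > 0:
            out[idx] = expert(x[idx])               # no tokens dropped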

Quick Start & Requirements

  • Installation: pip install megablocks for general use (a minimal usage sketch follows this list). For Megatron-LM integration, build the provided Dockerfile (docker build . -t megablocks-dev), launch the container with bash docker.sh, then run pip install . inside it.
  • Prerequisites: PyTorch and NumPy. For optimal performance on Hopper GPUs, install the grouped-GEMM extra with pip install megablocks[gg]. Megatron-LM integration requires datasets preprocessed into Megatron-LM's format.
  • Resources: NGC's PyTorch container (e.g., nvcr.io/nvidia/pytorch:23.09-py3) is recommended for Megatron-LM integration.
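
The snippet below sketches how a standalone dropless MoE layer might be constructed after pip install megablocks. The module paths and Arguments fields are assumptions based on the repository's layers package and may differ between versions; treat it as a starting point rather than the authoritative API.

    # Hypothetical usage sketch -- module paths and Arguments fields are assumptions
    # about megablocks' standalone API and may differ across versions.
    import torch
    from megablocks.layers.arguments import Arguments  # assumed location
    from megablocks.layers.dmoe import dMoE            # assumed location

    args = Arguments(
        hidden_size=1024,        # model dimension
        ffn_hidden_size=4096,    # per-expert FFN width
        moe_num_experts=8,       # total experts in the layer
        moe_top_k=2,             # experts activated per token
    )
    layer = dMoE(args)

    x = torch.randn(1, 512, 1024)  # (batch, sequence, hidden)
    y = layer(x)                   # dropless forward pass; no capacity_factor to tune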

Highlighted Details

  • dMoE outperforms Tutel MoEs by up to 40%.
  • dMoE can accelerate training by up to 2.4x compared to dense Transformers with Megatron-LM.
  • Supports grouped GEMM for Hopper GPUs via megablocks[gg].
  • Compatible with vLLM for running MoE models such as Mixtral-8x7B (serving sketch after this list).
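
As a companion to the vLLM bullet above, here is a hedged serving sketch using vLLM's Python API. The model name, tensor-parallel degree, and sampling settings are illustrative and depend on your hardware and the checkpoint you have access to.

    # Serving a Mixtral-style MoE checkpoint with vLLM (illustrative settings).
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # example MoE checkpoint
        tensor_parallel_size=2,                        # shard the model across 2 GPUs
    )
    params = SamplingParams(temperature=0.7, max_tokens=128)
    outputs = llm.generate(["Explain mixture-of-experts routing in one paragraph."], params)
    print(outputs[0].outputs[0].text)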

Maintenance & Community

The project is associated with Databricks and its authors are listed in the citation. Further integration with Databricks libraries is planned.

Licensing & Compatibility

The library is released under the permissive Apache 2.0 license, allowing commercial use and integration with closed-source projects.

Limitations & Caveats

The primary integration target is Megatron-LM, though the library can also be used with other frameworks such as vLLM. Datasets for the Megatron-LM integration must first be preprocessed into Megatron-LM's format.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 22 stars in the last 30 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), and 40 more.

unsloth by unslothai

  • Top 0.6% on SourcePulse
  • 48k stars
  • Finetuning tool for LLMs, targeting speed and memory efficiency
  • Created 1 year ago; updated 5 hours ago