Lightweight library for mixture-of-experts (MoE) training
MegaBlocks is a lightweight library for efficient Mixture-of-Experts (MoE) training, targeting researchers and engineers working with large language models. It introduces "dropless-MoE" (dMoE) to improve training speed and efficiency, eliminating token dropping and the need for the capacity_factor hyperparameter.
How It Works
MegaBlocks reformulates MoE layers using block-sparse operations, enabling efficient computation without dropping tokens. This approach avoids the hardware efficiency trade-offs typically associated with MoE implementations, leading to significant speedups. The library is integrated with Megatron-LM, supporting data, expert, and pipeline parallelism for MoE training.
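To illustrate the difference between capacity-based routing (which truncates each expert's batch and drops overflow tokens) and dropless routing (which processes variable-sized expert batches), here is a small conceptual PyTorch sketch of top-1 routing. It is only an illustration of the dropless idea, not the block-sparse kernels MegaBlocks actually uses.

```python
# Conceptual illustration of dropless top-1 MoE routing in plain PyTorch.
# MegaBlocks replaces this variable-sized gather/compute/scatter loop with
# block-sparse matrix products; this sketch only shows the routing behavior.
import torch

torch.manual_seed(0)

num_tokens, hidden, num_experts = 8, 16, 4
x = torch.randn(num_tokens, hidden)

# Toy single-matrix "experts" and a linear router.
experts = [torch.randn(hidden, hidden) for _ in range(num_experts)]
router = torch.randn(hidden, num_experts)

# Top-1 routing: every token picks one expert.
scores = torch.softmax(x @ router, dim=-1)
gates, assignment = scores.max(dim=-1)

out = torch.zeros_like(x)
for e in range(num_experts):
    idx = (assignment == e).nonzero(as_tuple=True)[0]
    if idx.numel() == 0:
        continue
    # No capacity limit: every routed token is processed, however unevenly
    # the router distributes the batch. A capacity-based implementation
    # would truncate idx to roughly capacity_factor * num_tokens / num_experts
    # tokens and drop the rest.
    out[idx] = gates[idx, None] * (x[idx] @ experts[e])
```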
Quick Start & Requirements
Install with pip install megablocks for general use; the megablocks[gg] extra additionally installs grouped GEMM support. For Megatron-LM integration, build the provided Dockerfile (docker build . -t megablocks-dev), start the container with the launch script (bash docker.sh), and run pip install . inside the container. Megatron-LM integration requires datasets in Megatron-LM format, and the NGC PyTorch container (nvcr.io/nvidia/pytorch:23.09-py3) is the recommended base image.
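After installation, the MoE layers can also be used directly from Python. The sketch below is hedged: it assumes the layer classes published in the MegaBlocks repository (megablocks.layers.dmoe.dMoE configured via megablocks.layers.arguments.Arguments), and the exact argument names, dtype handling, and forward-pass return signature should be verified against the repository's tests. A CUDA device is assumed.

```python
# Hedged usage sketch; field names follow the Arguments dataclass in the
# MegaBlocks repository but should be checked against the installed version.
from functools import partial

import torch
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

args = Arguments(
    hidden_size=1024,
    ffn_hidden_size=4096,
    moe_num_experts=8,
    moe_top_k=2,
    init_method=partial(torch.nn.init.normal_, mean=0.0, std=0.02),
    bf16=True,
    fp16=False,
)

layer = dMoE(args).cuda().to(torch.bfloat16)

# Megatron-style activation layout: (sequence_length, batch_size, hidden_size).
x = torch.randn(2048, 4, 1024, device="cuda", dtype=torch.bfloat16)
out, _ = layer(x)  # assumed to return (output, bias), Megatron-style
```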
Highlighted Details
Optional grouped GEMM kernels can be installed via the megablocks[gg] extra.
Maintenance & Community
The project is associated with Databricks and its authors are listed in the citation. Further integration with Databricks libraries is planned.
Licensing & Compatibility
The library is released under a permissive license, allowing for commercial use and integration with closed-source projects.
Limitations & Caveats
The primary integration target is Megatron-LM, though it can be used with other frameworks like vLLM. Datasets for Megatron-LM integration require specific formatting.