databricks/megablocks: Lightweight library for mixture-of-experts (MoE) training
Top 27.8% on SourcePulse
MegaBlocks is a lightweight library for efficient Mixture-of-Experts (MoE) training, targeting researchers and engineers working with large language models. It introduces "dropless-MoE" (dMoE) to improve training speed and efficiency, eliminating token dropping and the need for the capacity_factor hyperparameter.
How It Works
MegaBlocks reformulates MoE layers using block-sparse operations, enabling efficient computation without dropping tokens. This approach avoids the hardware efficiency trade-offs typically associated with MoE implementations, leading to significant speedups. The library is integrated with Megatron-LM, supporting data, expert, and pipeline parallelism for MoE training.
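To make the dropless idea concrete, the sketch below shows top-1 routing in plain PyTorch, where every token is kept and each expert processes a variable-sized group of tokens. The Python loop stands in for the block-sparse and grouped kernels MegaBlocks actually uses, and all names and sizes here are invented for illustration.

```python
# Conceptual illustration of dropless top-1 MoE routing (not MegaBlocks' kernels).
# Every token is routed, so there is no capacity_factor and no token dropping.
import torch

num_tokens, hidden, ffn_hidden, num_experts = 8, 16, 32, 4
tokens = torch.randn(num_tokens, hidden)

# Router: pick one expert per token (top-1 for simplicity).
router = torch.nn.Linear(hidden, num_experts)
scores = router(tokens).softmax(dim=-1)
expert_ids = scores.argmax(dim=-1)            # (num_tokens,)

# Per-expert feed-forward weights.
w1 = torch.randn(num_experts, hidden, ffn_hidden)
w2 = torch.randn(num_experts, ffn_hidden, hidden)

# Dropless assignment: each expert gets however many tokens the router sends it.
output = torch.zeros_like(tokens)
for e in range(num_experts):
    idx = (expert_ids == e).nonzero(as_tuple=True)[0]
    if idx.numel() == 0:
        continue                               # this expert received no tokens
    x = tokens[idx]                            # variable-sized token group
    output[idx] = torch.relu(x @ w1[e]) @ w2[e]

# Scale each token's output by the router probability of its chosen expert.
output = output * scores.gather(1, expert_ids.unsqueeze(1))
```

Because no capacity limit is imposed, an unbalanced router simply produces larger groups for popular experts instead of dropping tokens; the block-sparse formulation is what makes these variable-sized groups efficient to compute on GPUs.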
Quick Start & Requirements
pip install megablocks for general use. For Megatron-LM integration, use the provided Dockerfile (docker build . -t megablocks-dev) and launch script (bash docker.sh), then pip install . inside the container.
The NGC PyTorch image (nvcr.io/nvidia/pytorch:23.09-py3) is recommended for Megatron-LM integration, and datasets must be in Megatron-LM format.
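For programmatic use of the pip-installed package, here is a minimal, hypothetical sketch of constructing a dMoE layer. The module paths (megablocks.layers.arguments, megablocks.layers.dmoe), the Arguments field names, and the call signature are assumptions based on the repository layout and should be checked against the installed version.

```python
# Hypothetical usage sketch -- import paths and Arguments fields below are
# assumptions; consult the MegaBlocks source for the authoritative API.
import torch
from megablocks.layers.arguments import Arguments  # assumed import path
from megablocks.layers.dmoe import dMoE             # assumed import path

args = Arguments(
    hidden_size=1024,        # model width (assumed field name)
    ffn_hidden_size=4096,    # per-expert FFN width (assumed field name)
    moe_num_experts=8,       # experts in the dMoE layer (assumed field name)
    moe_top_k=1,             # experts selected per token (assumed field name)
)
layer = dMoE(args).cuda().half()

x = torch.randn(4, 512, 1024, device="cuda", dtype=torch.half)
out = layer(x)               # dropless: no capacity_factor, no dropped tokens
```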
Highlighted Details
Installing the megablocks[gg] extra enables dMoE computation with grouped GEMM kernels.
Maintenance & Community
The project is associated with Databricks and its authors are listed in the citation. Further integration with Databricks libraries is planned.
Licensing & Compatibility
The library is released under a permissive license, allowing for commercial use and integration with closed-source projects.
Limitations & Caveats
The primary integration target is Megatron-LM, though it can be used with other frameworks like vLLM. Datasets for Megatron-LM integration require specific formatting.