megablocks by databricks

Lightweight library for mixture-of-experts (MoE) training

created 2 years ago
1,388 stars

Top 29.7% on sourcepulse

View on GitHub
Project Summary

MegaBlocks is a lightweight library for efficient Mixture-of-Experts (MoE) training, targeting researchers and engineers working with large language models. It introduces "dropless-MoE" (dMoE) to improve training speed and efficiency, eliminating token dropping and the need for the capacity_factor hyperparameter.

How It Works

MegaBlocks reformulates MoE layer computation as block-sparse operations, so every token is processed without padding or dropping. This sidesteps the trade-off between hardware efficiency and token dropping that conventional MoE implementations make, yielding significant speedups. The library is integrated with Megatron-LM and supports data, expert, and pipeline parallelism for MoE training.
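To make the routing difference concrete, the sketch below contrasts capacity-based routing, which must pad or drop tokens per expert, with dropless grouping, where each expert simply receives a variable-sized group of tokens. This is an illustrative plain-PyTorch sketch, not the library's block-sparse kernels; all names and sizes are made up for illustration.

```python
# Illustrative sketch only: shows why dropless routing avoids the
# capacity_factor trade-off. MegaBlocks itself implements this with
# block-sparse kernels, not the dense loops below.
import torch

torch.manual_seed(0)

num_tokens, hidden, num_experts, top_k = 8, 4, 2, 1
x = torch.randn(num_tokens, hidden)                      # token activations
router_logits = torch.randn(num_tokens, num_experts)
expert_ids = router_logits.topk(top_k, dim=-1).indices.squeeze(-1)

# Capacity-based MoE: each expert processes at most `capacity` tokens;
# any overflow is dropped (or slots are padded with zeros).
capacity_factor = 1.0
capacity = int(capacity_factor * num_tokens * top_k / num_experts)
dropped = 0
for e in range(num_experts):
    tokens_for_e = (expert_ids == e).nonzero(as_tuple=True)[0]
    dropped += max(0, tokens_for_e.numel() - capacity)
print(f"capacity={capacity}, tokens dropped={dropped}")

# Dropless MoE: every token is routed; experts receive variable-sized
# groups, which block-sparse matmuls can handle without padding.
order = torch.argsort(expert_ids)                        # group tokens by expert
grouped = x[order]                                       # no padding, no drops
group_sizes = torch.bincount(expert_ids, minlength=num_experts)
print("tokens per expert:", group_sizes.tolist())
```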

Quick Start & Requirements

  • Installation: pip install megablocks for general use (a usage sketch follows this list). For Megatron-LM integration, use the provided Dockerfile (docker build . -t megablocks-dev) and launch script (bash docker.sh), then pip install . inside the container.
  • Prerequisites: PyTorch, NumPy. For optimal performance on Hopper GPUs, install with megablocks[gg]. Megatron-LM integration requires datasets in Megatron-LM format.
  • Resources: NGC's PyTorch container (e.g., nvcr.io/nvidia/pytorch:23.09-py3) is recommended for Megatron-LM integration.
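Once installed, a dMoE layer can be constructed directly from Python, independent of Megatron-LM. The sketch below is a minimal, unverified example: the module paths (megablocks.layers.arguments.Arguments, megablocks.layers.dmoe.dMoE) and the argument field names are assumptions about the project's layers API and should be checked against the installed version.

```python
# Minimal sketch of using MegaBlocks outside Megatron-LM.
# Module paths and argument names are assumptions; verify against the
# installed version of the library.
import torch
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

args = Arguments(
    hidden_size=1024,          # model dimension
    ffn_hidden_size=4096,      # per-expert FFN dimension
    moe_num_experts=8,         # total experts
    moe_top_k=2,               # experts activated per token
)
layer = dMoE(args)

x = torch.randn(4, 16, 1024)   # (batch, sequence, hidden)
# Note: some versions expect CUDA tensors in fp16/bf16, and the layer
# may return a tuple that includes an auxiliary/bias term.
out = layer(x)
```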

Highlighted Details

  • dMoE outperforms Tutel MoEs by up to 40%.
  • dMoE can accelerate training by up to 2.4x compared to dense Transformers with Megatron-LM.
  • Supports grouped GEMM for Hopper GPUs via megablocks[gg].
  • Compatible with vLLM for running models like Mixtral-8x7B (see the sketch after this list).
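For the vLLM route, a hedged sketch of loading a Mixtral-style MoE checkpoint is shown below. The model name, dtype, and tensor_parallel_size are illustrative choices, and a model of this size requires multiple high-memory GPUs in practice.

```python
# Sketch: serving a MoE checkpoint such as Mixtral-8x7B with vLLM.
# Settings here are illustrative, not a recommended configuration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    tensor_parallel_size=2,    # shard the experts across GPUs
    dtype="bfloat16",
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain mixture-of-experts in one sentence."], params)
print(outputs[0].outputs[0].text)
```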

Maintenance & Community

The project is maintained under the Databricks organization, and its authors are listed in the repository's citation entry. Further integration with Databricks libraries is planned.

Licensing & Compatibility

The library is released under the Apache 2.0 license, a permissive license that allows commercial use and integration with closed-source projects.

Limitations & Caveats

The primary integration target is Megatron-LM, though it can be used with other frameworks like vLLM. Datasets for Megatron-LM integration require specific formatting.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 48 stars in the last 90 days
