megablocks by databricks

Lightweight library for mixture-of-experts (MoE) training

Created 2 years ago
1,478 stars

Top 27.8% on SourcePulse

Project Summary

MegaBlocks is a lightweight library for efficient Mixture-of-Experts (MoE) training, targeting researchers and engineers working with large language models. It introduces "dropless-MoE" (dMoE) to improve training speed and efficiency, eliminating token dropping and the need for the capacity_factor hyperparameter.

How It Works

MegaBlocks reformulates MoE layer computation as block-sparse operations, so every token is processed without padding or dropping. This removes the trade-off between hardware efficiency and token dropping that conventional MoE implementations accept, which is where its speedups come from. The library is integrated with Megatron-LM and supports data, expert, and pipeline parallelism for MoE training.
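
To make the routing trade-off concrete, here is a minimal, self-contained sketch in plain PyTorch. It is illustrative only: the Python loop over experts stands in for the block-sparse kernels MegaBlocks actually uses, and none of the names below come from the library's API.

    # Toy top-1 router: capacity-based dropping vs. dropless grouping.
    import torch

    torch.manual_seed(0)
    num_tokens, hidden, num_experts = 16, 8, 4
    x = torch.randn(num_tokens, hidden)
    router = torch.nn.Linear(hidden, num_experts)
    expert_ids = router(x).argmax(dim=-1)           # top-1 expert per token

    # Capacity-factor approach: each expert accepts at most `capacity` tokens;
    # overflow tokens are dropped and fall back to the residual stream.
    capacity = int(1.0 * num_tokens / num_experts)  # capacity_factor = 1.0
    dropped = sum(
        max(0, (expert_ids == e).sum().item() - capacity) for e in range(num_experts)
    )
    print(f"tokens dropped with capacity_factor=1.0: {dropped}/{num_tokens}")

    # Dropless approach: every expert processes exactly the tokens routed to it,
    # so per-expert batches are variable-sized. MegaBlocks expresses this ragged
    # computation as block-sparse matrix multiplication instead of padding or dropping.
    experts = torch.nn.ModuleList([torch.nn.Linear(hidden, hidden) for _ in range(num_experts)])
    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        idx = (expert_ids == e).nonzero(as_tuple=True)[0]
        if idx.numel() > 0:
            out[idx] = expert(x[idx])               # no tokens dropped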

Quick Start & Requirements

  • Installation: pip install megablocks for general use (a minimal usage sketch follows this list). For Megatron-LM integration, build the provided Dockerfile (docker build . -t megablocks-dev), launch the container with bash docker.sh, then run pip install . inside it.
  • Prerequisites: PyTorch and NumPy. For optimal performance on Hopper GPUs, install the grouped-GEMM extra with pip install megablocks[gg]. Megatron-LM integration requires datasets preprocessed into Megatron-LM's format.
  • Resources: NGC's PyTorch container (e.g., nvcr.io/nvidia/pytorch:23.09-py3) is recommended for Megatron-LM integration.
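
The snippet below sketches how a standalone dropless MoE layer might be constructed after pip install megablocks. The module paths and Arguments fields are assumptions based on the repository's layers package and may differ between versions; treat it as a starting point rather than the authoritative API.

    # Hypothetical usage sketch -- module paths and Arguments fields are assumptions
    # about megablocks' standalone API and may differ across versions.
    import torch
    from megablocks.layers.arguments import Arguments  # assumed location
    from megablocks.layers.dmoe import dMoE            # assumed location

    args = Arguments(
        hidden_size=1024,        # model dimension
        ffn_hidden_size=4096,    # per-expert FFN width
        moe_num_experts=8,       # total experts in the layer
        moe_top_k=2,             # experts activated per token
    )
    layer = dMoE(args)

    x = torch.randn(1, 512, 1024)  # (batch, sequence, hidden)
    y = layer(x)                   # dropless forward pass; no capacity_factor to tune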

Highlighted Details

  • dMoE outperforms Tutel MoEs by up to 40%.
  • dMoE can accelerate training by up to 2.4x compared to dense Transformers with Megatron-LM.
  • Supports grouped GEMM for Hopper GPUs via megablocks[gg].
  • Compatible with vLLM for running MoE models such as Mixtral-8x7B (serving sketch after this list).
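
As a companion to the vLLM bullet above, here is a hedged serving sketch using vLLM's Python API. The model name, tensor-parallel degree, and sampling settings are illustrative and depend on your hardware and the checkpoint you have access to.

    # Serving a Mixtral-style MoE checkpoint with vLLM (illustrative settings).
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # example MoE checkpoint
        tensor_parallel_size=2,                        # shard the model across 2 GPUs
    )
    params = SamplingParams(temperature=0.7, max_tokens=128)
    outputs = llm.generate(["Explain mixture-of-experts routing in one paragraph."], params)
    print(outputs[0].outputs[0].text)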

Maintenance & Community

The project is associated with Databricks and its authors are listed in the citation. Further integration with Databricks libraries is planned.

Licensing & Compatibility

The library is released under the permissive Apache 2.0 license, allowing commercial use and integration with closed-source projects.

Limitations & Caveats

The primary integration target is Megatron-LM, though the library can also be used with other frameworks such as vLLM. Datasets for the Megatron-LM integration must first be preprocessed into Megatron-LM's format.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 22 stars in the last 30 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), and 40 more.

unsloth by unslothai

  • Top 0.6% on SourcePulse
  • 48k stars
  • Finetuning tool for LLMs, targeting speed and memory efficiency
  • Created 1 year ago; updated 5 hours ago