MegaDLMs by JinjieNi

Accelerate diffusion language model training at any scale with GPU optimization

Created 1 month ago
284 stars

Top 92.2% on SourcePulse

View on GitHub
Summary

MegaDLMs is a GPU-optimized framework designed for training diffusion language models (DLMs) and autoregressive LMs at any scale. It serves as the backend for projects like Quokka and OpenMoE 2, offering significant speedups and high model FLOP utilization (MFU) for researchers and power users.

How It Works

The framework combines Megatron-LM's flexible parallelism strategies (tensor, pipeline, context, and expert parallelism) with Transformer Engine's optimized GPU kernels for efficient transformer-layer computation. It supports FP8, FP16, and BF16 precision and incorporates techniques such as FlashAttention and compute-communication overlap to maximize training throughput and scalability.
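
To illustrate the FP8 execution path, the minimal sketch below uses Transformer Engine's public PyTorch API directly (te.Linear and fp8_autocast); the layer sizes, recipe settings, and dtypes are illustrative assumptions, not MegaDLMs' actual configuration.

```python
# Minimal sketch of FP8 execution with Transformer Engine's PyTorch API.
# Layer sizes, recipe settings, and dtypes are illustrative, not taken
# from MegaDLMs.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# HYBRID = E4M3 for forward tensors, E5M2 for backward gradients.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

# A single TE linear layer standing in for one transformer sublayer.
layer = te.Linear(4096, 4096, bias=True, params_dtype=torch.bfloat16).cuda()
x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16, requires_grad=True)

# Matmuls inside this context run through Transformer Engine's FP8 kernels.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

y.float().sum().backward()  # gradients flow back through the FP8 path
```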

Quick Start & Requirements

Installation is recommended via the PyTorch NGC Container (nvcr.io/nvidia/pytorch:24.11-py3); alternatively, users can build from source following the Megatron-LM guide. Key software prerequisites are PyTorch, Transformer Engine, and recent versions of CUDA, cuDNN, and NCCL. FP8 training requires NVIDIA GPUs with native FP8 support (Hopper, Ada, or Blackwell); otherwise, Turing-architecture GPUs or later are supported. Environment variables must be set from envs/.env. Official documentation is available at https://deepwiki.com/JinjieNi/MegaDLMs.
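
As a quick sanity check of those hardware requirements, the hedged sketch below (not part of MegaDLMs) inspects the local GPU's compute capability with plain PyTorch; the thresholds reflect the general rule that Turing is compute capability 7.5 and native FP8 arrives with Ada (8.9), Hopper (9.0), and Blackwell.

```python
# Hedged sanity check (not part of MegaDLMs): confirm the local GPU meets the
# stated hardware requirements before launching training.
import torch

assert torch.cuda.is_available(), "No CUDA device visible"
major, minor = torch.cuda.get_device_capability(0)
print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}")

# Turing is compute capability 7.5; native FP8 arrives with Ada (8.9),
# Hopper (9.0), and Blackwell.
if (major, minor) >= (8, 9):
    print("FP8 training supported.")
elif (major, minor) >= (7, 5):
    print("BF16/FP16 supported; FP8 needs Hopper, Ada, or Blackwell.")
else:
    print("GPU predates Turing; below the stated requirements.")
```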

Highlighted Details

  • Supports comprehensive training pipelines for DLMs and Autoregressive LMs, including pre-training, SFT, and RL on dense and MoE architectures.
  • Achieves up to 47% MFU and 3x faster training speeds compared to other frameworks on H100 clusters.
  • Seamless integration with HuggingFace checkpoints.
  • Implements advanced parallelism: Data Parallelism (DDP, FSDP), Tensor Parallelism, Pipeline Parallelism, Context Parallelism, and Expert Parallelism (see the sketch after this list).
  • Performance optimizations include FlashAttention, FP8 training, activation checkpointing, and communication overlap.
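
The parallel dimensions above compose multiplicatively. The sketch below walks through that arithmetic with hypothetical sizes (none of the numbers are MegaDLMs defaults): each model replica spans the product of the tensor-, pipeline-, and context-parallel sizes, and the remaining factor of the world size is the data-parallel degree.

```python
# Hypothetical illustration of how Megatron-style parallel dimensions compose.
# None of these numbers are MegaDLMs defaults.
world_size = 256        # total GPUs in the job
tensor_parallel = 8     # TP: shards each layer's weight matrices
pipeline_parallel = 4   # PP: splits the layer stack into sequential stages
context_parallel = 2    # CP: splits the sequence dimension of attention

# One model replica occupies TP * PP * CP GPUs; the leftover factor is the
# data-parallel size over which gradients are averaged.
model_parallel_size = tensor_parallel * pipeline_parallel * context_parallel
assert world_size % model_parallel_size == 0
data_parallel = world_size // model_parallel_size
print(f"GPUs per replica: {model_parallel_size}, data-parallel size: {data_parallel}")

# For MoE models, expert parallelism typically partitions the experts across a
# subgroup of the data-parallel ranks (e.g. EP = 4 requires data_parallel % 4 == 0).
```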

Maintenance & Community

MegaDLMs is an actively maintained codebase, serving as the training backend for related projects like Quokka, Super Data Learners, and OpenMoE 2. Specific community channels (e.g., Discord, Slack) are not detailed in the README.

Licensing & Compatibility

The project is licensed under the Apache 2.0 license, which is generally permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

MoE pre-training is slated for release after OpenMoE 2 training concludes. Features such as SFT, RL, and multi-modality support are still in development and appear on the project's to-do list. The project was only recently released (November 2025) and is still evolving.

Health Check

Last Commit: 2 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 2
Star History: 276 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI

0.1%
7k
Framework for training large-scale autoregressive language models
Created 5 years ago
Updated 2 months ago
Starred by Clement Delangue (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 20 more.

accelerate by huggingface

0.2%
9k
PyTorch training helper for distributed execution
Created 5 years ago
Updated 2 days ago