MegaDLMs by JinjieNi

Accelerate diffusion language model training at any scale with GPU optimization

Created 1 month ago
284 stars

Top 92.2% on SourcePulse

View on GitHub
Summary

MegaDLMs is a GPU-optimized framework designed for training diffusion language models (DLMs) and autoregressive LMs at any scale. It serves as the backend for projects like Quokka and OpenMoE 2, offering significant speedups and high model FLOP utilization (MFU) for researchers and power users.

How It Works

The framework combines Megatron-LM's flexible parallelism strategies (tensor, pipeline, context, and expert parallelism) with Transformer Engine's optimized GPU kernels for efficient transformer-layer computation. It supports FP8, FP16, and BF16 precision and incorporates techniques such as FlashAttention and compute-communication overlap to maximize training throughput and scalability.
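
To illustrate the FP8 execution path, the minimal sketch below uses Transformer Engine's public PyTorch API directly (te.Linear and fp8_autocast); the layer sizes, recipe settings, and dtypes are illustrative assumptions, not MegaDLMs' actual configuration.

```python
# Minimal sketch of FP8 execution with Transformer Engine's PyTorch API.
# Layer sizes, recipe settings, and dtypes are illustrative, not taken
# from MegaDLMs.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# HYBRID = E4M3 for forward tensors, E5M2 for backward gradients.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

# A single TE linear layer standing in for one transformer sublayer.
layer = te.Linear(4096, 4096, bias=True, params_dtype=torch.bfloat16).cuda()
x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16, requires_grad=True)

# Matmuls inside this context run through Transformer Engine's FP8 kernels.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

y.float().sum().backward()  # gradients flow back through the FP8 path
```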

Quick Start & Requirements

Installation is recommended via the PyTorch NGC Container (nvcr.io/nvidia/pytorch:24.11-py3); alternatively, users can build from source following the Megatron-LM guide. Key software prerequisites are PyTorch, Transformer Engine, and recent versions of CUDA, cuDNN, and NCCL. FP8 training requires NVIDIA GPUs with native FP8 support (Hopper, Ada, or Blackwell); otherwise, Turing-architecture GPUs or later are supported. Environment variables must be set from envs/.env. Official documentation is available at https://deepwiki.com/JinjieNi/MegaDLMs.
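
As a quick sanity check of those hardware requirements, the hedged sketch below (not part of MegaDLMs) inspects the local GPU's compute capability with plain PyTorch; the thresholds reflect the general rule that Turing is compute capability 7.5 and native FP8 arrives with Ada (8.9), Hopper (9.0), and Blackwell.

```python
# Hedged sanity check (not part of MegaDLMs): confirm the local GPU meets the
# stated hardware requirements before launching training.
import torch

assert torch.cuda.is_available(), "No CUDA device visible"
major, minor = torch.cuda.get_device_capability(0)
print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}")

# Turing is compute capability 7.5; native FP8 arrives with Ada (8.9),
# Hopper (9.0), and Blackwell.
if (major, minor) >= (8, 9):
    print("FP8 training supported.")
elif (major, minor) >= (7, 5):
    print("BF16/FP16 supported; FP8 needs Hopper, Ada, or Blackwell.")
else:
    print("GPU predates Turing; below the stated requirements.")
```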

Highlighted Details

  • Supports comprehensive training pipelines for DLMs and Autoregressive LMs, including pre-training, SFT, and RL on dense and MoE architectures.
  • Achieves up to 47% MFU and 3x faster training speeds compared to other frameworks on H100 clusters.
  • Seamless integration with HuggingFace checkpoints.
  • Implements advanced parallelism: Data Parallelism (DDP, FSDP), Tensor Parallelism, Pipeline Parallelism, Context Parallelism, and Expert Parallelism (see the sketch after this list).
  • Performance optimizations include FlashAttention, FP8 training, activation checkpointing, and communication overlap.
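
The parallel dimensions above compose multiplicatively. The sketch below walks through that arithmetic with hypothetical sizes (none of the numbers are MegaDLMs defaults): each model replica spans the product of the tensor-, pipeline-, and context-parallel sizes, and the remaining factor of the world size is the data-parallel degree.

```python
# Hypothetical illustration of how Megatron-style parallel dimensions compose.
# None of these numbers are MegaDLMs defaults.
world_size = 256        # total GPUs in the job
tensor_parallel = 8     # TP: shards each layer's weight matrices
pipeline_parallel = 4   # PP: splits the layer stack into sequential stages
context_parallel = 2    # CP: splits the sequence dimension of attention

# One model replica occupies TP * PP * CP GPUs; the leftover factor is the
# data-parallel size over which gradients are averaged.
model_parallel_size = tensor_parallel * pipeline_parallel * context_parallel
assert world_size % model_parallel_size == 0
data_parallel = world_size // model_parallel_size
print(f"GPUs per replica: {model_parallel_size}, data-parallel size: {data_parallel}")

# For MoE models, expert parallelism typically partitions the experts across a
# subgroup of the data-parallel ranks (e.g. EP = 4 requires data_parallel % 4 == 0).
```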

Maintenance & Community

MegaDLMs is an actively maintained codebase, serving as the training backend for related projects like Quokka, Super Data Learners, and OpenMoE 2. Specific community channels (e.g., Discord, Slack) are not detailed in the README.

Licensing & Compatibility

The project is licensed under the Apache 2.0 license, which is generally permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

MoE pre-training is slated for release after OpenMoE 2 training concludes. Features such as SFT, RL, and multi-modality support are still in development and appear on the project's to-do list. The project was only recently released (November 2025) and is still evolving.

Health Check

Last Commit: 2 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 2
Star History: 276 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI

0.1%
7k
Framework for training large-scale autoregressive language models
Created 5 years ago
Updated 2 months ago
Starred by Clement Delangue (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 20 more.

accelerate by huggingface

0.2%
9k
PyTorch training helper for distributed execution
Created 5 years ago
Updated 2 days ago