JinjieNi/MegaDLMs: Accelerate diffusion language model training at any scale with GPU optimization
Summary
MegaDLMs is a GPU-optimized framework designed for training diffusion language models (DLMs) and autoregressive LMs at any scale. It serves as the backend for projects like Quokka and OpenMoE 2, offering significant speedups and high model FLOP utilization (MFU) for researchers and power users.
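The MFU figure mentioned above is the ratio of achieved model FLOPs to the GPU's theoretical peak. A back-of-envelope calculation, using illustrative placeholder numbers rather than MegaDLMs benchmarks:

```shell
# Back-of-envelope MFU: achieved model TFLOP/s divided by peak TFLOP/s.
# Both values below are illustrative assumptions, not MegaDLMs measurements.
achieved_tflops=450   # hypothetical measured model TFLOP/s per GPU
peak_tflops=989       # e.g. H100 SXM dense BF16 peak TFLOP/s
awk -v a="$achieved_tflops" -v p="$peak_tflops" \
  'BEGIN { printf "MFU: %.1f%%\n", 100 * a / p }'
```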
How It Works
The framework combines Megatron-LM's flexible parallelism strategies (tensor, pipeline, context, and expert parallelism: TP, PP, CP, EP) with Transformer Engine's optimized GPU kernels for efficient layer computation. It supports FP8, FP16, and BF16 precision and incorporates techniques such as FlashAttention and compute/communication overlap to maximize training throughput and scalability.
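Since MegaDLMs builds on Megatron-LM, a training run would typically select these parallelism and precision options via Megatron-style flags. The entry point and exact flag set below are assumptions drawn from upstream Megatron-LM, not confirmed MegaDLMs usage:

```shell
# Hypothetical Megatron-style launch on one 8-GPU node:
# 2-way tensor x 2-way pipeline parallelism, BF16 params with FP8 compute
# via Transformer Engine, FlashAttention, and TP communication overlap.
torchrun --nproc_per_node=8 pretrain_gpt.py \
  --tensor-model-parallel-size 2 \
  --pipeline-model-parallel-size 2 \
  --context-parallel-size 1 \
  --expert-model-parallel-size 1 \
  --bf16 \
  --fp8-format hybrid \
  --use-flash-attn \
  --tp-comm-overlap
```

Tensor-parallel degree times pipeline-parallel degree must divide the total GPU count; the remaining factor becomes the data-parallel degree.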
Quick Start & Requirements
Installation is recommended via the PyTorch NGC Container (nvcr.io/nvidia/pytorch:24.11-py3); alternatively, users can build from source following the Megatron-LM guide. Key software prerequisites are PyTorch, Transformer Engine, and up-to-date CUDA, cuDNN, and NCCL. On the hardware side, NVIDIA GPUs of Turing architecture or later are supported, with FP8 training requiring Hopper, Ada, or Blackwell. Environment variables must be set from envs/.env. Official documentation is available at https://deepwiki.com/JinjieNi/MegaDLMs.
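A container-based setup along the lines described above might look like the following. The repository URL is an assumption inferred from the project's DeepWiki page, and the exact workflow is a sketch, not the project's documented procedure:

```shell
# Pull the recommended NGC PyTorch container (version taken from the README).
docker pull nvcr.io/nvidia/pytorch:24.11-py3

# Clone the repository (URL assumed from the DeepWiki link) and enter it.
git clone https://github.com/JinjieNi/MegaDLMs.git
cd MegaDLMs

# Launch the container with all GPUs visible and the repo mounted.
docker run --gpus all -it --rm -v "$PWD":/workspace/MegaDLMs \
  nvcr.io/nvidia/pytorch:24.11-py3

# Inside the container: export the required environment variables from envs/.env.
set -a; source envs/.env; set +a
```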
Highlighted Details
Maintenance & Community
MegaDLMs is an actively maintained codebase, serving as the training backend for related projects like Quokka, Super Data Learners, and OpenMoE 2. Specific community channels (e.g., Discord, Slack) are not detailed in the README.
Licensing & Compatibility
The project is licensed under the Apache 2.0 license, which is generally permissive for commercial use and integration into closed-source projects.
Limitations & Caveats
MoE pre-training is slated for release after OpenMoE 2 training concludes. Features such as SFT, RL, and multi-modality support are still in development and listed on the project's todo list. The project was recently released (November 2025), indicating it is still evolving.