Megatron-LM by NVIDIA

Framework for training transformer models at scale

Created 6 years ago
13,602 stars

Top 3.6% on SourcePulse

Project Summary

Megatron-LM and Megatron-Core provide a research framework and a library of GPU-optimized techniques for training transformer models at scale. They are designed for researchers and developers working with large language models, offering advanced parallelism and memory-saving features for efficient training on NVIDIA hardware.

How It Works

Megatron-Core offers composable, modular APIs for GPU-optimized building blocks like attention mechanisms, transformer layers, and normalization. It supports advanced model parallelism (tensor, sequence, pipeline, context, MoE) and data parallelism, enabling efficient training of models with hundreds of billions of parameters. Techniques like activation recomputation, distributed optimizers, and FlashAttention further reduce memory usage and improve training speed.
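
As a rough sketch of how these building blocks compose, the following loosely mirrors the Megatron-Core quickstart; module paths, class names, and constructor arguments vary between releases, so treat it as illustrative rather than copy-paste code.

    # Minimal single-process sketch of Megatron-Core's composable APIs
    # (module paths and arguments approximate recent releases).
    import os
    import torch
    from megatron.core import parallel_state
    from megatron.core.transformer.transformer_config import TransformerConfig
    from megatron.core.models.gpt.gpt_model import GPTModel
    from megatron.core.models.gpt.gpt_layer_specs import get_gpt_layer_local_spec

    # Megatron-Core expects torch.distributed to be initialized, even on one GPU.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    torch.distributed.init_process_group(backend="nccl", world_size=1, rank=0)

    # Declare the model-parallel layout up front (all degrees are 1 here).
    parallel_state.initialize_model_parallel(
        tensor_model_parallel_size=1,
        pipeline_model_parallel_size=1,
    )

    # A TransformerConfig drives the GPU-optimized transformer building blocks.
    config = TransformerConfig(
        num_layers=2,
        hidden_size=128,
        num_attention_heads=4,
        use_cpu_initialization=True,
    )

    # Compose a small GPT model from a transformer layer spec.
    model = GPTModel(
        config=config,
        transformer_layer_spec=get_gpt_layer_local_spec(),
        vocab_size=50257,
        max_sequence_length=1024,
    )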

Quick Start & Requirements

  • Installation: Recommended via NGC's PyTorch container. Docker commands provided for setup.
  • Prerequisites: Recent releases of PyTorch, CUDA, NCCL, and NVIDIA APEX; NLTK is needed for data preprocessing (a quick sanity check is sketched after this list).
  • Resources: Requires NVIDIA GPUs (FP8 requires Hopper-architecture GPUs). Training examples scale up to 6144 H100 GPUs.
  • Documentation: Megatron-Core Developer Guide
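
The snippet below is an illustrative sanity check, not a script shipped with the repository: run inside the NGC PyTorch container, it confirms that the prerequisites listed above are importable and that CUDA GPUs are visible.

    # Illustrative prerequisite check (assumed snippet; not part of the repo).
    import torch

    assert torch.cuda.is_available(), "Megatron-LM training requires NVIDIA GPUs"
    print("PyTorch:", torch.__version__, "| CUDA:", torch.version.cuda)
    print("NCCL:", ".".join(map(str, torch.cuda.nccl.version())))
    print("GPU 0:", torch.cuda.get_device_name(0))

    import apex  # noqa: F401  # NVIDIA APEX supplies fused kernels and optimizers

    # NLTK's punkt models are used by the data preprocessing tools.
    import nltk
    nltk.download("punkt")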

Highlighted Details

  • Supports advanced parallelism: tensor, sequence, pipeline, context, and MoE expert parallelism (see the sketch after this list).
  • Features memory optimization techniques: activation checkpointing, distributed optimizer, FlashAttention.
  • Enables efficient training of models with hundreds of billions of parameters, demonstrating strong scaling on H100 GPUs.
  • Offers tools for checkpoint conversion between different model classes and formats.
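
As a sketch of how those parallelism degrees are typically declared (assuming torch.distributed is already initialized across all ranks; keyword names follow recent Megatron-Core releases and may differ in older versions):

    # Hypothetical layout for 16 GPUs: TP=2 x PP=2 x CP=1 x EP=1, leaving DP=4.
    from megatron.core import parallel_state

    parallel_state.initialize_model_parallel(
        tensor_model_parallel_size=2,    # shard each layer's weights across 2 GPUs
        pipeline_model_parallel_size=2,  # split the layer stack into 2 pipeline stages
        context_parallel_size=1,         # shard the sequence dimension for long contexts
        expert_model_parallel_size=1,    # spread MoE experts across GPUs
    )
    # Data parallelism absorbs the remaining factor of the world size:
    # world_size = TP * PP * CP * DP  ->  16 = 2 * 2 * 1 * 4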

Maintenance & Community

  • Actively developed by NVIDIA, with recent updates including Mamba support and multimodal training enhancements.
  • Links to documentation and examples are provided.

Licensing & Compatibility

  • License: OpenBSD.
  • Compatible with NVIDIA accelerated computing infrastructure and Tensor Core GPUs.

Limitations & Caveats

FlashAttention is non-deterministic; avoid --use-flash-attn when bitwise reproducibility is critical. Transformer Engine attention is likewise non-deterministic unless NVTE_ALLOW_NONDETERMINISTIC_ALGO=0 is set in the environment. Determinism has been verified in NGC PyTorch containers >= 23.12.
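
In practice this usually means pinning the environment variable in the launcher before any Transformer Engine kernels are selected; the sketch below is illustrative rather than a repo-provided recipe.

    # Sketch: request deterministic Transformer Engine algorithms for a run.
    import os

    # Must be set before Megatron / Transformer Engine modules choose kernels.
    os.environ["NVTE_ALLOW_NONDETERMINISTIC_ALGO"] = "0"

    # For bitwise reproducibility, also leave --use-flash-attn off and pin seeds.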

Health Check

  • Last Commit: 13 hours ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 41
  • Issues (30d): 40
  • Star History: 370 stars in the last 30 days

Explore Similar Projects

Starred by Ying Sheng (Coauthor of SGLang) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

llm-analysis by cli99 · 0.4% · 455 stars
CLI tool for LLM latency/memory analysis during training/inference
Created 2 years ago · Updated 5 months ago

Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM · 0.3% · 1k stars
Transformer library for flexible model development
Created 4 years ago · Updated 8 months ago

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; Author of CS 231n), Lewis Tunstall (Research Engineer at Hugging Face), and 13 more.

torchtitan by pytorch · 0.7% · 4k stars
PyTorch platform for generative AI model training research
Created 1 year ago · Updated 19 hours ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI · 0.2% · 7k stars
Framework for training large-scale autoregressive language models
Created 4 years ago · Updated 2 days ago

Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 26 more.

ColossalAI by hpcaitech · 0.1% · 41k stars
AI system for large-scale parallel training
Created 3 years ago · Updated 13 hours ago