Transformer LM research repo for BERT & GPT-2 training at scale
This repository is a fork of Microsoft's Megatron-DeepSpeed (itself a fork of NVIDIA's Megatron-LM), tailored for the BigScience project's large-scale transformer language model training. It enables researchers and engineers to train models such as BERT and GPT-2 with advanced distributed training techniques.
How It Works
This project integrates DeepSpeed's optimizations (like ZeRO-DP and pipeline parallelism) with Megatron-LM's architecture for efficient, large-scale distributed training. It supports tensor and pipeline model parallelism, allowing model layers and computations to be split across multiple GPUs and nodes, significantly reducing memory requirements and increasing training throughput.
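As a rough sketch of how those parallelism dimensions are expressed at launch time: the script name, flag spellings, and sizes below are assumptions based on common Megatron-DeepSpeed usage, not commands documented by this repo, so check its example scripts for the exact arguments.

# Hypothetical single-node launch: 8 GPUs split into 2-way tensor parallelism
# and 2-way pipeline parallelism, with DeepSpeed handling the remaining data parallelism.
deepspeed --num_gpus 8 pretrain_gpt.py \
    --tensor-model-parallel-size 2 \
    --pipeline-model-parallel-size 2 \
    --num-layers 24 --hidden-size 1024 --num-attention-heads 16 \
    --micro-batch-size 4 --global-batch-size 64 \
    --deepspeed --deepspeed_config ds_config.json
    # ...plus the usual data, tokenizer, and optimizer arguments

With 2-way tensor and 2-way pipeline parallelism on 8 GPUs, the remaining factor of 2 becomes ZeRO-powered data parallelism.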
Quick Start & Requirements
pip install -r requirements.txt
Apex and DeepSpeed require separate compilation steps; a rough build sketch follows.
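A minimal sketch of one common way to do those source builds, assuming a CUDA toolkit matching your PyTorch build is already installed; the exact versions, branches, and build flags this project expects may differ.

# Build NVIDIA Apex with its fused C++/CUDA extensions
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

# Install DeepSpeed; DS_BUILD_OPS=1 pre-compiles its CUDA ops instead of JIT-building them at first use
DS_BUILD_OPS=1 pip install deepspeed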
Highlighted Details
Supports loading pretrained Hugging Face tokenizers by setting tokenizer-type=PretrainedFromHF.
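For example, a pretrained Hugging Face tokenizer can be selected at launch roughly as shown below; the --tokenizer-name-or-path flag and the gpt2 identifier are illustrative assumptions, so confirm the exact argument names against the repo's argument parser.

# Added to the pretraining command alongside the other arguments
    --tokenizer-type PretrainedFromHF \
    --tokenizer-name-or-path gpt2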
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Last recorded activity was about 1 year ago; the repository is marked inactive.