Megatron-DeepSpeed by bigscience-workshop

Transformer LM research repo for BERT & GPT-2 training at scale

Created 4 years ago
1,425 stars

Top 28.6% on SourcePulse

View on GitHub
Project Summary

This repository is the BigScience project's fork of Microsoft's Megatron-DeepSpeed, which is itself a fork of NVIDIA's Megatron-LM, adapted for large-scale transformer language model training. It enables researchers and engineers to train models like BERT and GPT-2 with advanced distributed training techniques.

How It Works

This project integrates DeepSpeed's optimizations (like ZeRO-DP and pipeline parallelism) with Megatron-LM's architecture for efficient, large-scale distributed training. It supports tensor and pipeline model parallelism, allowing model layers and computations to be split across multiple GPUs and nodes, significantly reducing memory requirements and increasing training throughput.
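
For orientation, the sketch below writes out a minimal DeepSpeed configuration of the kind the training scripts consume via --deepspeed_config. The keys are standard DeepSpeed options, but the values are purely illustrative and not the settings used for BigScience runs.

import json

# Minimal DeepSpeed config sketch: ZeRO stage 1 (shard optimizer states across
# data-parallel ranks) plus fp16 training. Values are illustrative only.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "zero_optimization": {"stage": 1},
    "fp16": {"enabled": True},
    "steps_per_print": 100,
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

Tensor and pipeline parallelism are configured separately through Megatron's --tensor-model-parallel-size and --pipeline-model-parallel-size command-line flags; ZeRO-DP then applies across the remaining data-parallel ranks.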

Quick Start & Requirements

  • Install: Clone the repository, then install dependencies using pip install -r requirements.txt. Apex and DeepSpeed require separate compilation steps.
  • Prerequisites: NVIDIA GPU with CUDA and a PyTorch build that matches the installed CUDA version (see the environment-check sketch after this list).
  • Setup: Requires compiling Apex and DeepSpeed with specific CUDA architecture flags.
  • Docs: BigScience Workshop
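
As a rough pre-flight check (a sketch, not a script shipped with the repository), the snippet below verifies that a CUDA-enabled PyTorch build is present and that Apex and DeepSpeed are importable:

import importlib.util

import torch

# Check the prerequisites listed above: a CUDA-enabled PyTorch build plus
# compiled Apex and DeepSpeed.
print(f"PyTorch {torch.__version__}, built against CUDA {torch.version.cuda}")
print(f"CUDA available: {torch.cuda.is_available()}")

for pkg in ("apex", "deepspeed"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'importable' if found else 'missing -- install/compile it first'}")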

Highlighted Details

  • Supports advanced parallelism techniques: Data Parallelism (DP), Tensor Model Parallelism (TP), and Pipeline Model Parallelism (PP); see the sketch after this list for how the three dimensions combine.
  • Integrates DeepSpeed ZeRO-DP for memory optimization.
  • Provides scripts for data preprocessing, pretraining, fine-tuning, and evaluation of BERT and GPT models.
  • Enables use of Hugging Face tokenizers via --tokenizer-type PretrainedFromHF.
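
As a quick worked example of how the parallelism dimensions in the first bullet combine (illustrative numbers, not a BigScience configuration), the total GPU count factors into data-, tensor-, and pipeline-parallel sizes:

# world_size = DP x TP x PP; the data-parallel size follows from the other two.
# The figures below are hypothetical.
world_size = 64          # e.g. 8 nodes x 8 GPUs
tensor_parallel = 4      # --tensor-model-parallel-size
pipeline_parallel = 2    # --pipeline-model-parallel-size

assert world_size % (tensor_parallel * pipeline_parallel) == 0
data_parallel = world_size // (tensor_parallel * pipeline_parallel)
print(f"DP={data_parallel} x TP={tensor_parallel} x PP={pipeline_parallel} = {world_size} GPUs")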

Maintenance & Community

  • Community-driven project with contributions welcome.
  • Links to BigScience issues and good first issues are provided for contribution guidance.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README snippet provided. However, as a fork of NVIDIA's Megatron-LM and Microsoft's Megatron-DeepSpeed, both of which carry permissive licenses, it likely inherits permissive terms; check the upstream LICENSE files to confirm.

Limitations & Caveats

  • Pipeline parallelism is not currently supported for the T5 model.
  • The test suite is not yet integrated with CI and requires manual execution.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 30 days

Explore Similar Projects

Starred by Lukas Biewald (Cofounder of Weights & Biases), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

DialoGPT by microsoft

Response generation model via large-scale pretraining
0.0% · 2k stars · Created 6 years ago · Updated 3 years ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI

Framework for training large-scale autoregressive language models
0.1% · 7k stars · Created 4 years ago · Updated 1 month ago

Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 27 more.

ColossalAI by hpcaitech

AI system for large-scale parallel training
0.0% · 41k stars · Created 4 years ago · Updated 3 weeks ago