Megatron-DeepSpeed by bigscience-workshop

Transformer LM research repo for BERT & GPT-2 training at scale

created 4 years ago · 1,405 stars · Top 29.5% on sourcepulse

View on GitHub
Project Summary

This repository is a fork of Microsoft's Megatron-DeepSpeed, itself a fork of NVIDIA's Megatron-LM, tailored to the BigScience project's large-scale training of transformer language models. It lets researchers and engineers train models such as BERT and GPT-2 with advanced distributed training techniques.

How It Works

This project integrates DeepSpeed's optimizations (like ZeRO-DP and pipeline parallelism) with Megatron-LM's architecture for efficient, large-scale distributed training. It supports tensor and pipeline model parallelism, allowing model layers and computations to be split across multiple GPUs and nodes, significantly reducing memory requirements and increasing training throughput.
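
The repository drives DeepSpeed through its Megatron training scripts, but the core idea can be illustrated with a minimal hand-rolled sketch: DeepSpeed wraps an ordinary PyTorch model in an engine that applies ZeRO-DP partitioning and owns the backward/step plumbing. The toy model, config values, and dummy loss below are illustrative assumptions, not the project's actual setup.

    # Minimal sketch of DeepSpeed ZeRO-DP wrapping a PyTorch model.
    # Illustrative only -- the repo configures DeepSpeed from its Megatron
    # pretraining scripts. Run under the deepspeed launcher so the
    # distributed environment variables are set.
    import torch
    import deepspeed

    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 1024),
    )

    ds_config = {
        "train_micro_batch_size_per_gpu": 4,
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
        "fp16": {"enabled": True},          # mixed precision with dynamic loss scaling
        "zero_optimization": {"stage": 1},  # ZeRO stage 1: partition optimizer states
    }

    # deepspeed.initialize returns an engine that owns the optimizer step,
    # gradient accumulation, and ZeRO partitioning.
    engine, _, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )

    for _ in range(10):
        x = torch.randn(4, 1024, device=engine.device, dtype=torch.float16)
        loss = engine(x).float().pow(2).mean()  # dummy loss
        engine.backward(loss)  # engine applies loss scaling and syncs gradients
        engine.step()          # optimizer step plus ZeRO bookkeeping

In the repository itself, this wrapping happens inside the pretraining scripts, with the tensor- and pipeline-parallel degrees supplied as Megatron command-line arguments.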

Quick Start & Requirements

  • Install: Clone the repository, then install dependencies using pip install -r requirements.txt. Apex and DeepSpeed require separate compilation steps.
  • Prerequisites: an NVIDIA GPU with CUDA and a PyTorch build that matches the installed CUDA version (a quick sanity check is sketched after this list).
  • Setup: Apex and DeepSpeed must be compiled with CUDA architecture flags matching the target GPUs.
  • Docs: BigScience Workshop
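
As a quick sanity check for the prerequisites above (an assumption, not a script shipped with the repo), you can confirm that the installed PyTorch build sees the GPU and report the compute capability that informs the Apex/DeepSpeed CUDA architecture flags:

    # Environment sanity check (not part of the repo): verify the PyTorch/CUDA
    # pairing and print the GPU compute capability used when choosing CUDA
    # architecture flags for Apex and DeepSpeed builds.
    import torch

    print("PyTorch:", torch.__version__)
    print("Built with CUDA:", torch.version.cuda)
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        print("Device:", torch.cuda.get_device_name(0))
        print(f"Compute capability: {major}.{minor}")  # e.g. 8.0 for A100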

Highlighted Details

  • Supports advanced parallelism techniques: Data Parallelism (DP), Tensor Model Parallelism (TP), and Pipeline Model Parallelism (PP).
  • Integrates DeepSpeed ZeRO-DP for memory optimization.
  • Provides scripts for data preprocessing, pretraining, fine-tuning, and evaluation of BERT and GPT models.
  • Enables use of Hugging Face tokenizers via tokenizer-type=PretrainedFromHF (see the tokenizer sketch after this list).
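
The Hugging Face tokenizer option amounts to loading a pretrained tokenizer from the Hugging Face Hub, which the repo does internally when tokenizer-type=PretrainedFromHF is set. A minimal standalone sketch using the transformers library (the gpt2 tokenizer name is just an example, not the project's required tokenizer):

    # Standalone sketch of the kind of tokenizer PretrainedFromHF loads under
    # the hood; "gpt2" is an arbitrary example choice.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    ids = tokenizer("Large-scale language model training")["input_ids"]
    print(ids)
    print(tokenizer.decode(ids))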

Maintenance & Community

  • Community-driven project with contributions welcome.
  • Links to BigScience issues and good first issues are provided for contribution guidance.

Licensing & Compatibility

  • The README snippet does not state a license explicitly. As a fork of Microsoft's Megatron-DeepSpeed, itself derived from NVIDIA's Megatron-LM, the code most likely carries the permissive terms of those upstream projects; check the repository's LICENSE file before redistribution or commercial use.

Limitations & Caveats

  • Pipeline parallelism is not currently supported for the T5 model.
  • The test suite is not yet integrated with CI and requires manual execution.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 23 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Zhiqiang Xie (Author of SGLang).

veScale by volcengine

  • 0.1% · 839 stars
  • PyTorch-native framework for LLM training
  • created 1 year ago · updated 3 weeks ago

Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM

  • 0.3% · 1k stars
  • Transformer library for flexible model development
  • created 3 years ago · updated 7 months ago

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

  • 0.2% · 25k stars
  • SDK for reproducing DeepSeek-R1
  • created 6 months ago · updated 3 days ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai

  • 0.2% · 40k stars
  • Deep learning optimization library for distributed training and inference
  • created 5 years ago · updated 1 day ago