Megatron-DeepSpeed by bigscience-workshop

Transformer LM research repo for BERT & GPT-2 training at scale

Created 4 years ago
1,425 stars

Top 28.6% on SourcePulse

View on GitHub
Project Summary

This repository is the BigScience project's fork of Microsoft's Megatron-DeepSpeed, which is itself a fork of NVIDIA's Megatron-LM, adapted for large-scale transformer language model training. It enables researchers and engineers to train models like BERT and GPT-2 with advanced distributed training techniques.

How It Works

This project integrates DeepSpeed's optimizations (like ZeRO-DP and pipeline parallelism) with Megatron-LM's architecture for efficient, large-scale distributed training. It supports tensor and pipeline model parallelism, allowing model layers and computations to be split across multiple GPUs and nodes, significantly reducing memory requirements and increasing training throughput.
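
For orientation, the sketch below writes out a minimal DeepSpeed configuration of the kind the training scripts consume via --deepspeed_config. The keys are standard DeepSpeed options, but the values are purely illustrative and not the settings used for BigScience runs.

import json

# Minimal DeepSpeed config sketch: ZeRO stage 1 (shard optimizer states across
# data-parallel ranks) plus fp16 training. Values are illustrative only.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "zero_optimization": {"stage": 1},
    "fp16": {"enabled": True},
    "steps_per_print": 100,
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

Tensor and pipeline parallelism are configured separately through Megatron's --tensor-model-parallel-size and --pipeline-model-parallel-size command-line flags; ZeRO-DP then applies across the remaining data-parallel ranks.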

Quick Start & Requirements

  • Install: Clone the repository, then install dependencies using pip install -r requirements.txt. Apex and DeepSpeed require separate compilation steps.
  • Prerequisites: NVIDIA GPU with CUDA and a PyTorch build that matches the installed CUDA version (see the environment-check sketch after this list).
  • Setup: Requires compiling Apex and DeepSpeed with specific CUDA architecture flags.
  • Docs: BigScience Workshop
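
As a rough pre-flight check (a sketch, not a script shipped with the repository), the snippet below verifies that a CUDA-enabled PyTorch build is present and that Apex and DeepSpeed are importable:

import importlib.util

import torch

# Check the prerequisites listed above: a CUDA-enabled PyTorch build plus
# compiled Apex and DeepSpeed.
print(f"PyTorch {torch.__version__}, built against CUDA {torch.version.cuda}")
print(f"CUDA available: {torch.cuda.is_available()}")

for pkg in ("apex", "deepspeed"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'importable' if found else 'missing -- install/compile it first'}")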

Highlighted Details

  • Supports advanced parallelism techniques: Data Parallelism (DP), Tensor Model Parallelism (TP), and Pipeline Model Parallelism (PP); see the sketch after this list for how the three dimensions combine.
  • Integrates DeepSpeed ZeRO-DP for memory optimization.
  • Provides scripts for data preprocessing, pretraining, fine-tuning, and evaluation of BERT and GPT models.
  • Enables use of Hugging Face tokenizers via --tokenizer-type PretrainedFromHF.
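
As a quick worked example of how the parallelism dimensions in the first bullet combine (illustrative numbers, not a BigScience configuration), the total GPU count factors into data-, tensor-, and pipeline-parallel sizes:

# world_size = DP x TP x PP; the data-parallel size follows from the other two.
# The figures below are hypothetical.
world_size = 64          # e.g. 8 nodes x 8 GPUs
tensor_parallel = 4      # --tensor-model-parallel-size
pipeline_parallel = 2    # --pipeline-model-parallel-size

assert world_size % (tensor_parallel * pipeline_parallel) == 0
data_parallel = world_size // (tensor_parallel * pipeline_parallel)
print(f"DP={data_parallel} x TP={tensor_parallel} x PP={pipeline_parallel} = {world_size} GPUs")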

Maintenance & Community

  • Community-driven project with contributions welcome.
  • Links to BigScience issues and good first issues are provided for contribution guidance.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README snippet provided. However, as a fork of NVIDIA's Megatron-LM and Microsoft's Megatron-DeepSpeed, both of which carry permissive licenses, it likely inherits permissive terms; check the upstream LICENSE files to confirm.

Limitations & Caveats

  • Pipeline parallelism is not currently supported for the T5 model.
  • The test suite is not yet integrated with CI and requires manual execution.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 30 days

Explore Similar Projects

Starred by Lukas Biewald (Cofounder of Weights & Biases), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

DialoGPT by microsoft

Response generation model via large-scale pretraining
0.0% · 2k stars · Created 6 years ago · Updated 3 years ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI

Framework for training large-scale autoregressive language models
0.1% · 7k stars · Created 4 years ago · Updated 1 month ago

Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 27 more.

ColossalAI by hpcaitech

AI system for large-scale parallel training
0.0% · 41k stars · Created 4 years ago · Updated 3 weeks ago