fms-fsdp by foundation-model-stack

Efficiently train foundation models with PyTorch

Created 1 year ago
265 stars

Top 96.6% on SourcePulse

View on GitHub
Project Summary

This repository provides an example of efficiently pre-training foundation models, specifically Llama2, using native PyTorch features such as Fully Sharded Data Parallel (FSDP) and the scaled dot product attention (SDPA) implementation of Flash Attention v2. It targets researchers and engineers who want to leverage PyTorch's advanced capabilities for large-scale model training, and it offers performance benchmarks and practical implementation details.
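
Concretely, the SDPA path refers to PyTorch's scaled_dot_product_attention, which can be pinned to its Flash Attention backend. The snippet below is an illustrative sketch with made-up tensor shapes, not code from this repo, and assumes a recent PyTorch that provides torch.nn.attention.sdpa_kernel:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Illustrative shapes: (batch, heads, seq_len, head_dim); the repo's Llama2
# configurations differ.
q = torch.randn(2, 32, 1024, 128, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Restrict SDPA to the Flash Attention backend; this raises if the backend is
# unsupported for the given hardware/dtype/shape combination.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```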

How It Works

The project leverages PyTorch's FSDP for distributed training and uses SDPA's Flash Attention v2 backend for optimized attention computation. It aims to maximize hardware utilization and training throughput by combining these native PyTorch features with torch.compile, selective activation checkpointing, and overlap of computation with communication. The goal is to showcase efficient training strategies within the PyTorch ecosystem rather than to provide a full end-to-end framework.
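
As a minimal sketch of how these pieces typically compose in native PyTorch (the model constructor, decoder-layer class, and checkpointing predicate below are hypothetical placeholders; the repo's actual wrapping policy and configuration live in its training scripts):

```python
import functools
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision, ShardingStrategy
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing,
    checkpoint_wrapper,
)

# Placeholders standing in for the repo's Llama2 model and decoder-layer class.
model = build_llama2_7b()       # hypothetical constructor
layer_cls = TransformerBlock    # hypothetical decoder-layer class

model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,   # shard params, grads, optimizer state
    auto_wrap_policy=functools.partial(
        transformer_auto_wrap_policy, transformer_layer_cls={layer_cls}
    ),
    mixed_precision=MixedPrecision(
        param_dtype=torch.bfloat16, reduce_dtype=torch.bfloat16
    ),
    device_id=torch.cuda.current_device(),
    use_orig_params=True,                             # recommended when combining with torch.compile
)

# Selective activation checkpointing: recompute only a subset of blocks
# (the selection predicate is a stand-in for the repo's policy).
apply_activation_checkpointing(
    model,
    checkpoint_wrapper_fn=checkpoint_wrapper,
    check_fn=lambda m: isinstance(m, layer_cls) and should_checkpoint(m),
)

model = torch.compile(model)
```

With per-block wrapping, FSDP can prefetch the next block's parameter all-gather while the current block computes, which is how communication gets overlapped with computation.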

Quick Start & Requirements

  • Install dependencies using pip install -r requirements.txt.
  • Recommended: Latest PyTorch nightlies and ibm-fms.
  • The training example uses Slurm (sbatch ./scripts/train.slurm), but equivalent torchrun commands are available (see the sketch after this list).
  • Requires pre-tokenized data.
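
Putting those steps together, a hedged command sketch (the pip and Slurm lines are from the README; the torchrun line is illustrative, and the entry-point script and its flags should be taken from the repo's scripts):

```bash
# Install dependencies
pip install -r requirements.txt

# Launch the provided Slurm example
sbatch ./scripts/train.slurm

# Or launch directly on a single 8-GPU node with torchrun
# (replace <training_script.py> with the repo's actual entry point and arguments)
torchrun --nnodes=1 --nproc_per_node=8 <training_script.py>
```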

Highlighted Details

  • Achieves 4550 tokens/sec/GPU on 128 A100s and 9600 tokens/sec/GPU on 96 H100s for a 7B model with FSDP and Flash Attention v2.
  • Demonstrates model FLOPs utilization (MFU) from 0.38 to 0.74 and hardware FLOPs utilization (HFU) from 0.46 to 0.74 across model sizes and hardware (a rough sketch of how MFU can be estimated follows this list).
  • Trained a Llama2 7B replica to 2.2T tokens, completing training roughly 20% faster than the published Llama2 training times.
  • Includes a script to convert trained models to Hugging Face format.
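
For context on the MFU figures above, here is a back-of-the-envelope estimate using the common 6 × params FLOPs-per-token approximation and the commonly quoted dense BF16 peaks (A100 ≈ 312 TFLOPS, H100 ≈ 989 TFLOPS); the repo's reported numbers come from its own accounting and will differ somewhat:

```python
def approx_mfu(params: float, tokens_per_sec_per_gpu: float, peak_flops: float) -> float:
    """Rough model FLOPs utilization: achieved FLOPs/sec per GPU over peak FLOPs/sec.

    Uses the ~6 * params FLOPs-per-token approximation, which ignores
    attention FLOPs, so the result is indicative only.
    """
    achieved_flops_per_sec = 6 * params * tokens_per_sec_per_gpu
    return achieved_flops_per_sec / peak_flops

# 7B model with the throughputs quoted above.
print(approx_mfu(7e9, 4550, 312e12))  # ~0.61 on A100 (BF16 peak 312 TFLOPS)
print(approx_mfu(7e9, 9600, 989e12))  # ~0.41 on H100 (BF16 peak ~989 TFLOPS)
```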

Maintenance & Community

This repository is a companion to the Foundation Model Stack and represents IBM's work with the PyTorch community. Specific community links or active maintainer information are not detailed in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README text. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

The repository focuses on the pre-training phase and does not include data preparation or post-training alignment/tuning. It uses an internally curated dataset and omits details on sampling ratios. Larger models run at smaller batch sizes may show lower model FLOPs utilization (MFU).

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

InternEvo by InternLM

0.2%
407
Lightweight training framework for model pre-training
Created 1 year ago
Updated 4 weeks ago
Starred by Théophile Gervet (Cofounder of Genesis AI), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 6 more.

lingua by facebookresearch

0.1%
5k
LLM research codebase for training and inference
Created 11 months ago
Updated 2 months ago
Starred by Lukas Biewald (Cofounder of Weights & Biases), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

DialoGPT by microsoft

0.1%
2k
Response generation model via large-scale pretraining
Created 6 years ago
Updated 2 years ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI

0.2%
7k
Framework for training large-scale autoregressive language models
Created 4 years ago
Updated 2 days ago
Starred by Tobi Lutke (Cofounder of Shopify), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 26 more.

axolotl by axolotl-ai-cloud

0.5%
10k
CLI tool for streamlined post-training of AI models
Created 2 years ago
Updated 15 hours ago