fms-fsdp by foundation-model-stack

Efficiently train foundation models with PyTorch

created 1 year ago
258 stars

Top 98.6% on sourcepulse

View on GitHub
Project Summary

This repository provides an example of efficiently pre-training foundation models, specifically Llama2, using native PyTorch features such as Fully Sharded Data Parallel (FSDP) and the scaled dot-product attention (SDPA) API backed by Flash Attention v2. It targets researchers and engineers who want to leverage PyTorch's advanced capabilities for large-scale model training, offering performance benchmarks and practical implementation details.

How It Works

The project uses PyTorch's FSDP for distributed training and its SDPA operator, which dispatches to Flash Attention v2 kernels, for optimized attention computation. It aims to maximize hardware utilization and training throughput by combining these native PyTorch features with torch.compile, selective activation checkpointing, and overlapping of computation with communication. The goal is to showcase efficient training strategies within the PyTorch ecosystem rather than to provide a full end-to-end framework.
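As a rough illustration of how these pieces fit together, the sketch below wraps a toy transformer block with FSDP and routes attention through PyTorch's scaled_dot_product_attention. The model, dimensions, and wrapping policy are assumptions for illustration, not this repository's actual Llama2 training code, and it presumes torch.distributed has already been initialized (for example via torchrun).

    # Illustrative sketch only: a toy block, not the repository's Llama2 model.
    # Assumes torch.distributed.init_process_group() has already run (e.g. under torchrun).
    import functools
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

    class ToyBlock(nn.Module):
        def __init__(self, dim: int = 512, n_heads: int = 8):
            super().__init__()
            self.n_heads = n_heads
            self.qkv = nn.Linear(dim, 3 * dim)
            self.proj = nn.Linear(dim, dim)

        def forward(self, x):
            b, t, d = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            # Shape to (batch, heads, seq, head_dim) for SDPA.
            q, k, v = (z.view(b, t, self.n_heads, -1).transpose(1, 2) for z in (q, k, v))
            # SDPA dispatches to a fused Flash Attention kernel when one is available.
            y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            return self.proj(y.transpose(1, 2).reshape(b, t, d))

    def shard(model: nn.Module) -> FSDP:
        # Wrapping each transformer block as its own FSDP unit lets parameter
        # all-gathers overlap with computation of neighboring blocks.
        policy = functools.partial(
            transformer_auto_wrap_policy, transformer_layer_cls={ToyBlock}
        )
        return FSDP(model, auto_wrap_policy=policy, use_orig_params=True)

In the repository, torch.compile and selective activation checkpointing are layered on top of this kind of setup.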

Quick Start & Requirements

  • Install dependencies using pip install -r requirements.txt.
  • Recommended: Latest PyTorch nightlies and ibm-fms.
  • The training example uses Slurm (sbatch ./scripts/train.slurm); equivalent torchrun commands are also available (see the sketch after this list).
  • Requires pre-tokenized data.
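For orientation, torchrun sets rank and world-size environment variables for each worker process; a training entrypoint typically reads them and initializes the NCCL process group roughly as follows. This is a generic sketch, not this repository's actual entrypoint or arguments.

    # Generic sketch of what a torchrun-launched worker does at startup;
    # not the repository's actual training script.
    import os
    import torch
    import torch.distributed as dist

    def setup_distributed() -> int:
        local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun per worker
        torch.cuda.set_device(local_rank)
        # RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT are read from the env by default.
        dist.init_process_group(backend="nccl")
        return local_rank

    if __name__ == "__main__":
        local_rank = setup_distributed()
        if dist.get_rank() == 0:
            print(f"world size: {dist.get_world_size()}")
        dist.destroy_process_group()

On a single node this would be launched with something like torchrun --nproc_per_node=8 followed by the script name; the repository's Slurm script wraps an equivalent multi-node invocation.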

Highlighted Details

  • Achieves 4550 tokens/sec/GPU on 128 A100s and 9600 tokens/sec/GPU on 96 H100s for a 7B model with FSDP and Flash Attention v2.
  • Demonstrates MFUs (model FLOPs utilization) from 0.38 to 0.74 and HFUs (hardware FLOPs utilization) from 0.46 to 0.74 across model sizes and hardware; a back-of-the-envelope MFU estimate is sketched after this list.
  • Trained a Llama2 7B replica to 2.2T tokens, with throughput roughly 20% faster than the published Llama2 training times.
  • Includes a script to convert trained models to Hugging Face format.
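For context on the figures above: MFU compares achieved FLOP throughput against the accelerator's peak, commonly estimated at ~6 FLOPs per parameter per token for dense transformers, while HFU additionally counts FLOPs spent recomputing activations under checkpointing and so is never lower than MFU. The peak figure and the resulting number below are illustrative assumptions, not values taken from this repository.

    # Back-of-the-envelope MFU estimate; the A100 peak figure and the
    # 6-FLOPs-per-parameter-per-token rule of thumb are assumptions for illustration.
    def estimate_mfu(params: float, tokens_per_sec_per_gpu: float, peak_flops: float) -> float:
        achieved = 6.0 * params * tokens_per_sec_per_gpu  # forward + backward FLOPs/s
        return achieved / peak_flops

    if __name__ == "__main__":
        a100_bf16_peak = 312e12  # dense BF16 peak on A100, per NVIDIA's datasheet
        mfu = estimate_mfu(params=7e9, tokens_per_sec_per_gpu=4550, peak_flops=a100_bf16_peak)
        print(f"estimated MFU: {mfu:.2f}")  # ~0.61 under these assumptions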

Maintenance & Community

This repository is a companion to the Foundation Model Stack and represents IBM's work with the PyTorch community. Specific community links or active maintainer information are not detailed in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README text. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

The repository covers only the pre-training phase; it does not include data preparation or post-training alignment/tuning. It uses an internally curated dataset and omits details on sampling ratios. Larger models run at smaller batch sizes show lower model FLOPs utilization (MFU).

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 14 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

hyena-dna by HazyResearch

Genomic foundation model for long-range DNA sequence modeling

  • 704 stars
  • Created 2 years ago, updated 3 months ago