PiPPy by pytorch

PyTorch tool for pipeline parallelism

Created 3 years ago
779 stars

Top 44.9% on SourcePulse

Project Summary

PiPPy provides a compiler and runtime for automating pipeline parallelism in PyTorch models, targeting researchers and engineers scaling large deep learning models. It simplifies the implementation of pipeline parallelism, enabling efficient execution across multiple devices and hosts with minimal code modification.

How It Works

PiPPy automatically partitions a PyTorch model into stages by tracing its execution graph. It then transforms the traced model into a Pipe object, which defines the stages and the data flow between them. The PipelineStage runtime executes these stages in parallel, handling micro-batch splitting, scheduling, inter-stage communication, and gradient propagation. Because splitting operates on the traced graph, complex model topologies such as skip connections and tied weights are handled automatically.
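The stage-and-micro-batch idea described above can be sketched in pure Python. This is a toy simulation, not PiPPy's API: plain functions stand in for model layers, and the names split_into_stages, split_microbatches, run_stage, and pipeline_forward are hypothetical. It cuts a layer list into stages at given split points, chunks a batch into micro-batches, and steps through "ticks" in which stage k works on micro-batch t - k, which is what lets stages overlap. A real runtime would place each stage on its own device and send activations between them.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a model layer: a plain function on one sample.
Layer = Callable[[float], float]


def split_into_stages(layers: Sequence[Layer], split_points: Sequence[int]) -> List[List[Layer]]:
    """Cut a flat layer list into stages; each split-point index begins a new stage."""
    bounds = [0, *sorted(split_points), len(layers)]
    return [list(layers[a:b]) for a, b in zip(bounds, bounds[1:])]


def split_microbatches(batch: Sequence[float], n: int) -> List[List[float]]:
    """Chunk a batch into n micro-batches; the first len(batch) % n chunks get one extra item."""
    size, rem = divmod(len(batch), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)
        chunks.append(list(batch[start:end]))
        start = end
    return chunks


def run_stage(stage: List[Layer], microbatch: List[float]) -> List[float]:
    """Apply every layer of one stage to every sample of one micro-batch."""
    out = []
    for x in microbatch:
        for layer in stage:
            x = layer(x)
        out.append(x)
    return out


def pipeline_forward(
    stages: List[List[Layer]], batch: Sequence[float], n_microbatches: int
) -> Tuple[List[float], List[List[Tuple[int, int]]]]:
    """Simulate a pipelined forward pass tick by tick.

    At tick t, stage k works on micro-batch t - k. Returns the merged output
    and a per-tick trace of the (stage, micro-batch) pairs that were busy.
    """
    mbs = split_microbatches(batch, n_microbatches)
    n_stages = len(stages)
    # outputs[k][i] is stage k's output for micro-batch i (the activation sent downstream).
    outputs: List[List[List[float]]] = [[[] for _ in range(n_microbatches)] for _ in range(n_stages)]
    trace = []
    for tick in range(n_microbatches + n_stages - 1):
        busy = []
        for k in range(n_stages):
            i = tick - k
            if 0 <= i < n_microbatches:
                inp = mbs[i] if k == 0 else outputs[k - 1][i]
                outputs[k][i] = run_stage(stages[k], inp)
                busy.append((k, i))
        trace.append(busy)
    # Merge the last stage's micro-batch outputs back into one batch.
    merged = [x for mb in outputs[-1] for x in mb]
    return merged, trace


if __name__ == "__main__":
    layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
    stages = split_into_stages(layers, split_points=[2])  # two stages of two layers each
    out, trace = pipeline_forward(stages, [0.0, 1.0, 2.0, 3.0], n_microbatches=2)
    print(out)    # [1.0, 1.0, 9.0, 25.0]
    print(trace)  # [[(0, 0)], [(0, 1), (1, 0)], [(1, 1)]]
```

The trace shows the fill and drain phases: only stage 0 is busy at the first tick, both stages overlap at the second, and only the last stage is busy at the end.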

Quick Start & Requirements

  • Install a PyTorch nightly build (newer than 2.2.0.dev): pip install -r requirements.txt --find-links https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html (use the corresponding CUDA nightly index for GPU builds).
  • Install PiPPy from source: python setup.py install or python setup.py develop.
  • Requires PyTorch >= 2.2.0.dev.
  • Examples are available in the HuggingFace examples directory.

Highlighted Details

  • Automatic model splitting via tracing, with split points specified through annotate_split_points or the pipe_split() API.
  • Supports non-trivial topologies, including skip connections and tied weights.
  • First-class support for cross-host pipeline parallelism.
  • Composability with data parallelism and tensor parallelism (3D parallelism).
  • Supports various pipeline scheduling paradigms (e.g., GPipe, 1F1B).
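GPipe and 1F1B differ mainly in when each stage runs its backward passes. As a hedged, pure-Python illustration (not the schedule classes in torch.distributed.pipelining), the sketch below generates the per-stage order of forward (F) and backward (B) steps under the usual textbook descriptions: GPipe runs every forward and then every backward, while non-interleaved 1F1B runs a warm-up of forwards, alternates one forward with one backward, and then drains the remaining backwards. The function names are hypothetical. peak_activations counts how many micro-batch activations a stage must keep alive, and bubble_fraction is the standard idle-time estimate (p - 1) / (m + p - 1) for p stages and m micro-batches, which assumes equal per-stage cost and ignores communication.

```python
from typing import List, Tuple

Op = Tuple[str, int]  # ("F" or "B", micro-batch index)


def gpipe_order(n_microbatches: int) -> List[Op]:
    """GPipe: every forward, then every backward (same order on every stage)."""
    return [("F", i) for i in range(n_microbatches)] + [("B", i) for i in range(n_microbatches)]


def one_f_one_b_order(stage: int, n_stages: int, n_microbatches: int) -> List[Op]:
    """Non-interleaved 1F1B order for one stage: warm-up, steady state, cool-down."""
    # Earlier stages run more warm-up forwards so later stages have work to do.
    warmup = min(n_stages - stage - 1, n_microbatches)
    ops: List[Op] = [("F", i) for i in range(warmup)]
    # Steady state: one forward, then the backward of the oldest in-flight micro-batch.
    for i in range(n_microbatches - warmup):
        ops.append(("F", warmup + i))
        ops.append(("B", i))
    # Cool-down: finish the backwards still outstanding from warm-up.
    ops += [("B", i) for i in range(n_microbatches - warmup, n_microbatches)]
    return ops


def peak_activations(ops: List[Op]) -> int:
    """Maximum number of micro-batches whose activations are alive at once."""
    live = peak = 0
    for kind, _ in ops:
        live += 1 if kind == "F" else -1
        peak = max(peak, live)
    return peak


def bubble_fraction(n_stages: int, n_microbatches: int) -> float:
    """Idle fraction of a pipeline step, assuming equal-cost stages and free communication."""
    return (n_stages - 1) / (n_microbatches + n_stages - 1)


if __name__ == "__main__":
    print(one_f_one_b_order(stage=3, n_stages=4, n_microbatches=2))
    # [('F', 0), ('B', 0), ('F', 1), ('B', 1)]
    print(peak_activations(gpipe_order(8)), peak_activations(one_f_one_b_order(0, 4, 8)))  # 8 4
    print(bubble_fraction(4, 8))
```

Under these assumptions both schedules have the same bubble; the benefit of 1F1B is memory, since stage s holds activations for at most min(p - s, m) micro-batches instead of all m.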

Maintenance & Community

PiPPy has been migrated into PyTorch as the torch.distributed.pipelining subpackage. The original repository now serves as a home for examples, and its library code will be removed.

Licensing & Compatibility

PiPPy is licensed under the 3-clause BSD license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The library code in the original PiPPy repository is slated for removal, and users are directed to the torch.distributed.pipelining subpackage instead. The repository now primarily serves as a collection of examples.

Health Check
Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (Author of LLaMA-Factory), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

VeOmni by ByteDance-Seed

3.4%
1k
Framework for scaling multimodal model training across accelerators
Created 5 months ago
Updated 3 weeks ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jiayi Pan (Author of SWE-Gym; MTS at xAI), and 20 more.

alpa by alpa-projects

0.0%
3k
Auto-parallelization framework for large-scale neural network training and serving
Created 4 years ago
Updated 1 year ago