PiPPy by pytorch

PyTorch tool for pipeline parallelism

Created 4 years ago

783 stars

Top 44.9% on SourcePulse

8 Experts Love This Project

Yang Song

Professor at Caltech; Research Scientist at OpenAI

Jeremy Howard

Cofounder of fast.ai

Johannes Hagemann

Cofounder of Prime Intellect

Stas Bekman

Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake

and 4 more!

Project Summary

PiPPy provides a compiler and runtime for automating pipeline parallelism in PyTorch models, targeting researchers and engineers scaling large deep learning models. It simplifies the implementation of pipeline parallelism, enabling efficient execution across multiple devices and hosts with minimal code modification.

How It Works

PiPPy automatically partitions a PyTorch model into stages by tracing its execution graph. It then transforms these stages into a Pipe object, which defines the data flow between stages. A PipelineStage runtime executes these stages concurrently, managing micro-batch splitting, inter-stage communication, and gradient synchronization. This approach allows for automatic handling of complex model topologies like skip connections and tied weights.
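The flow described above can be sketched in plain Python with no PyTorch dependency. This is an illustrative toy, not PiPPy's implementation: the helper names (split_into_stages, split_into_microbatches, pipeline_forward) are hypothetical, "layers" are ordinary functions on numbers, and only the forward pass is modeled. It shows a model being cut into stages at split points, a batch being chunked into micro-batches, and a clock-driven loop in which stage s works on micro-batch t - s at tick t, so different stages are busy with different micro-batches at the same tick.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[int], int]


def split_into_stages(layers: Sequence[Layer], split_points: Sequence[int]) -> List[List[Layer]]:
    """Cut the layer list before each index in split_points (hypothetical helper)."""
    bounds = [0, *sorted(split_points), len(layers)]
    return [list(layers[a:b]) for a, b in zip(bounds, bounds[1:])]


def split_into_microbatches(batch: Sequence[int], n_chunks: int) -> List[List[int]]:
    """Chunk a batch into at most n_chunks contiguous micro-batches."""
    size = -(-len(batch) // n_chunks)  # ceiling division
    return [list(batch[i:i + size]) for i in range(0, len(batch), size)]


def run_stage(stage: Sequence[Layer], microbatch: Sequence[int]) -> List[int]:
    """Apply every layer of one stage to every sample of one micro-batch."""
    out = list(microbatch)
    for layer in stage:
        out = [layer(x) for x in out]
    return out


def pipeline_forward(
    stages: Sequence[Sequence[Layer]], microbatches: Sequence[Sequence[int]]
) -> Tuple[List[int], List[List[Tuple[int, int]]]]:
    """Forward-only pipeline: at tick t, stage s processes micro-batch t - s."""
    n_stages, n_mb = len(stages), len(microbatches)
    acts = [list(mb) for mb in microbatches]  # current activations per micro-batch
    timeline = []  # per tick: list of (stage, micro-batch) pairs that were busy
    for tick in range(n_stages + n_mb - 1):
        busy = []
        for s in range(n_stages):
            mb = tick - s
            if 0 <= mb < n_mb:
                acts[mb] = run_stage(stages[s], acts[mb])
                busy.append((s, mb))
        timeline.append(busy)
    return [x for mb in acts for x in mb], timeline


# Toy "model" of four layers, split into two stages before layer index 2.
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
stages = split_into_stages(layers, split_points=[2])
microbatches = split_into_microbatches([1, 2, 3, 4], n_chunks=2)
output, timeline = pipeline_forward(stages, microbatches)
print(output)    # [1, 9, 25, 49], same as running the four layers unsplit
print(timeline)  # [[(0, 0)], [(0, 1), (1, 0)], [(1, 1)]]
```

At the middle tick both stages are busy at once, which is the overlap that pipeline parallelism exploits. A real runtime would place each stage on its own device and send activations between them instead of sharing a Python list.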

Quick Start & Requirements

Install a PyTorch nightly build (2.2.0.dev or newer): pip install -r requirements.txt --find-links https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html (this is the CPU build; use the CUDA nightly index for GPU builds).
Install PiPPy from source: python setup.py install, or python setup.py develop for an editable install.
Examples are available in the HuggingFace examples directory.

Highlighted Details

Automatic model splitting via tracing and annotate_split_points or pipe_split() API.
Supports non-trivial topologies, including skip connections and tied weights.
First-class support for cross-host pipeline parallelism.
Composability with data parallelism and tensor parallelism (3D parallelism).
Supports various pipeline scheduling paradigms (e.g., GPipe, 1F1B).
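To make the difference between the two named schedules concrete, here is a simplified sketch of the order in which a single stage runs forward (F) and backward (B) steps under a GPipe-style and a 1F1B-style schedule. The function names are hypothetical and the code models only per-stage ordering, not communication or timing; it is not taken from PiPPy or torch.distributed.pipelining. The number of micro-batches whose activations are held at once is used as a rough proxy for activation memory.

```python
from typing import List, Tuple

Op = Tuple[str, int]  # ("F" or "B", micro-batch index)


def gpipe_order(n_microbatches: int) -> List[Op]:
    """GPipe-style: all forwards first, then all backwards."""
    fwd = [("F", m) for m in range(n_microbatches)]
    bwd = [("B", m) for m in range(n_microbatches)]
    return fwd + bwd


def one_f_one_b_order(stage: int, n_stages: int, n_microbatches: int) -> List[Op]:
    """1F1B-style: warm-up forwards, then alternate F and B, then drain backwards."""
    warmup = min(n_stages - stage - 1, n_microbatches)
    ops: List[Op] = [("F", m) for m in range(warmup)]
    next_f, next_b = warmup, 0
    while next_f < n_microbatches:  # steady state: one forward, one backward
        ops.append(("F", next_f))
        next_f += 1
        ops.append(("B", next_b))
        next_b += 1
    while next_b < n_microbatches:  # cool-down: remaining backwards
        ops.append(("B", next_b))
        next_b += 1
    return ops


def peak_in_flight(ops: List[Op]) -> int:
    """Largest number of micro-batches with a forward done but no backward yet."""
    live = peak = 0
    for kind, _ in ops:
        live += 1 if kind == "F" else -1
        peak = max(peak, live)
    return peak


# First stage of a 4-stage pipeline with 8 micro-batches.
print(peak_in_flight(gpipe_order(8)))              # 8
print(peak_in_flight(one_f_one_b_order(0, 4, 8)))  # 4
print(peak_in_flight(one_f_one_b_order(3, 4, 8)))  # 1 (last stage)
```

Under these assumptions the GPipe-style order holds every micro-batch's activations before any backward step runs, while the 1F1B-style order caps the count at the number of stages remaining in the pipeline from that stage onward.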

Maintenance & Community

PiPPy has been migrated into PyTorch as the torch.distributed.pipelining subpackage. The original repository now serves only as a home for examples, and its library code will be removed.

Licensing & Compatibility

PiPPy is licensed under the 3-clause BSD license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The library code in the original PiPPy repository is slated for removal, and users are directed to the torch.distributed.pipelining subpackage instead. The repository now primarily hosts examples.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

1 star in the last 30 days