PiPPy by pytorch

PyTorch tool for pipeline parallelism

created 3 years ago
775 stars

Top 46.0% on sourcepulse

Project Summary

PiPPy provides a compiler and runtime that automate pipeline parallelism for PyTorch models. It targets researchers and engineers scaling large deep learning models, and lets them run a model efficiently across multiple devices and hosts with minimal code modification.

How It Works

PiPPy automatically partitions a PyTorch model into stages by tracing its execution graph. It then transforms these stages into a Pipe object, which defines the data flow between stages. A PipelineStage runtime executes these stages concurrently, managing micro-batch splitting, inter-stage communication, and gradient synchronization. This approach allows for automatic handling of complex model topologies like skip connections and tied weights.
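The flow described above can be illustrated with a small pure-Python sketch. It is not PiPPy's implementation: the "layers" are plain functions, and every name here (split_into_stages, split_microbatches, gpipe_forward_timeline, pipelined_forward) is hypothetical, chosen only for illustration. It assumes a forward-only, fill-and-drain timeline in which stage s handles micro-batch m at time step s + m.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[float], float]  # stand-in for a model layer (assumption)


def split_into_stages(layers: Sequence[Layer], split_before: Sequence[int]) -> List[List[Layer]]:
    """Cut an ordered layer list into contiguous stages.

    A new stage begins at each layer index listed in split_before.
    """
    bounds = [0] + sorted(split_before) + [len(layers)]
    return [list(layers[a:b]) for a, b in zip(bounds, bounds[1:])]


def split_microbatches(batch: Sequence[float], num_microbatches: int) -> List[List[float]]:
    """Split a batch into equally sized micro-batches."""
    if len(batch) % num_microbatches != 0:
        raise ValueError("batch size must be divisible by num_microbatches")
    size = len(batch) // num_microbatches
    return [list(batch[i:i + size]) for i in range(0, len(batch), size)]


def run_stage(stage: Sequence[Layer], microbatch: Sequence[float]) -> List[float]:
    """Apply every layer of one stage to every sample of one micro-batch."""
    out = list(microbatch)
    for layer in stage:
        out = [layer(x) for x in out]
    return out


def gpipe_forward_timeline(num_stages: int, num_microbatches: int) -> List[List[Tuple[int, int]]]:
    """For each time step, list the (stage, micro-batch) pairs that are active."""
    steps = num_stages + num_microbatches - 1
    return [
        [(s, t - s) for s in range(num_stages) if 0 <= t - s < num_microbatches]
        for t in range(steps)
    ]


def pipelined_forward(stages: Sequence[Sequence[Layer]],
                      microbatches: Sequence[Sequence[float]]) -> List[float]:
    """Run the forward pass in timeline order and concatenate the outputs."""
    acts = [list(mb) for mb in microbatches]  # acts[m]: current activation of micro-batch m
    for step in gpipe_forward_timeline(len(stages), len(microbatches)):
        # Pairs within one step touch different micro-batches, so a real
        # runtime could execute them concurrently on different devices.
        for s, m in step:
            acts[m] = run_stage(stages[s], acts[m])
    return [x for mb in acts for x in mb]
```

With S stages and M micro-batches, the timeline has S + M - 1 steps instead of the S * M steps a fully sequential run would take, which is where the concurrency benefit comes from. The pipelined result matches running all layers on the whole batch, since each micro-batch still passes through the stages in order.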

Quick Start & Requirements

  • Requires a PyTorch nightly build newer than 2.2.0.dev. Install it with: pip install -r requirements.txt --find-links https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html (this is the CPU build; substitute the CUDA wheel index for GPU builds).
  • Install PiPPy from source: python setup.py install, or python setup.py develop for a development install.
  • Examples are available in the HuggingFace examples directory.

Highlighted Details

  • Automatic model splitting via tracing, with split points marked through annotate_split_points or the pipe_split() API.
  • Supports non-trivial topologies, including skip connections and tied weights.
  • First-class support for cross-host pipeline parallelism.
  • Composability with data parallelism and tensor parallelism (3D parallelism).
  • Supports various pipeline scheduling paradigms (e.g., GPipe, 1F1B).
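To make the difference between the two named schedules concrete, the sketch below builds the per-stage operation order for GPipe and 1F1B and counts how many micro-batch activations a stage holds at its peak. These are simplified textbook forms of the schedules, not PiPPy's scheduler code, and the function names (gpipe_order, one_f_one_b_order, peak_in_flight) are hypothetical. It assumes each forward stores one activation that the matching backward releases.

```python
from typing import List, Tuple

Op = Tuple[str, int]  # ("F" or "B", micro-batch index)


def gpipe_order(num_microbatches: int) -> List[Op]:
    """GPipe: every forward pass first, then every backward pass."""
    forwards = [("F", m) for m in range(num_microbatches)]
    backwards = [("B", m) for m in range(num_microbatches)]
    return forwards + backwards


def one_f_one_b_order(stage: int, num_stages: int, num_microbatches: int) -> List[Op]:
    """1F1B: warm up with a few forwards, then alternate one forward and one backward."""
    # Earlier stages need more warm-up forwards before the first backward
    # can reach them from the last stage.
    warmup = min(num_stages - 1 - stage, num_microbatches)
    ops: List[Op] = [("F", m) for m in range(warmup)]
    f, b = warmup, 0
    while f < num_microbatches:  # steady state: one forward, one backward
        ops.append(("F", f))
        f += 1
        ops.append(("B", b))
        b += 1
    while b < num_microbatches:  # cool-down: drain the remaining backwards
        ops.append(("B", b))
        b += 1
    return ops


def peak_in_flight(ops: List[Op]) -> int:
    """Largest number of micro-batches whose activations are held at once."""
    live = peak = 0
    for kind, _ in ops:
        live += 1 if kind == "F" else -1
        peak = max(peak, live)
    return peak
```

Under these assumptions, a GPipe stage holds activations for all M micro-batches at its peak, while a 1F1B stage at index i of S stages holds at most min(S - i, M). Both schedules do the same work per stage, so the trade-off shown here is about activation memory, not the amount of computation.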

Maintenance & Community

PiPPy has been migrated into PyTorch as torch.distributed.pipelining. The original repository now serves as a home for examples, and its library code will be removed.

Licensing & Compatibility

PiPPy is licensed under the 3-clause BSD license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The library code in the original PiPPy repository is slated for removal, and users are directed to the torch.distributed.pipelining subpackage instead. The repository now serves primarily as a collection of examples.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 12 stars in the last 90 days
