DualPipe by deepseek-ai

Pipeline parallelism algorithm for training large models

created 5 months ago
2,838 stars

Top 17.1% on sourcepulse

View on GitHub
Project Summary

DualPipe is a bidirectional pipeline parallelism algorithm designed to optimize large model training by enabling computation-communication overlap and reducing pipeline bubbles. It targets researchers and engineers working with large-scale deep learning models who need to improve training efficiency and throughput. The primary benefit is enhanced training speed through minimized idle time during communication phases.

How It Works

DualPipe implements a bidirectional pipeline schedule in which micro-batches are fed from both ends of the pipeline, allowing forward and backward computation to overlap with communication. This approach, detailed in the DeepSeek-V3 Technical Report, aims to fully utilize hardware resources by keeping computation units busy. A derived "V-shape" schedule, DualPipeV, refines this by halving the pipeline stages, potentially reducing memory usage and improving efficiency further.
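
For context, the DeepSeek-V3 technical report compares pipeline-bubble sizes across schedules. The minimal sketch below reproduces those formulas as reported (F, B, and W are per-chunk times for forward, full backward, and weight-gradient backward; FB is the overlapped forward-plus-backward chunk; PP is the number of pipeline ranks); the values plugged in are purely illustrative, not measurements.

    # Minimal sketch: pipeline-bubble formulas as listed in the DeepSeek-V3
    # technical report (arXiv:2412.19437). F, B, W are per-chunk times for
    # forward, full backward, and weight-gradient backward; FB is the
    # overlapped forward+backward chunk; PP is the (even) number of ranks.

    def bubble_1f1b(PP, F, B, W):
        return (PP - 1) * (F + B)

    def bubble_zb1p(PP, F, B, W):
        return (PP - 1) * (F + B - 2 * W)

    def bubble_dualpipe(PP, F, B, W, FB):
        return (PP // 2 - 1) * (FB + B - 3 * W)

    # Illustrative values only: F=1, B=2, W=1, FB=3, PP=8.
    print(bubble_1f1b(8, 1, 2, 1))         # 21
    print(bubble_zb1p(8, 1, 2, 1))         # 7
    print(bubble_dualpipe(8, 1, 2, 1, 3))  # 6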

Quick Start & Requirements

  • Installation is not explicitly documented in the README; the Python examples imply a standard pip or from-source install.
  • Requirements: PyTorch 2.0 and above.
  • Examples: python examples/example_dualpipe.py, python examples/example_dualpipev.py.
  • Note: a custom overlapped_forward_backward implementation is required for real-world applications; a hypothetical sketch follows this list.
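
The overlapped_forward_backward hook is where an application decides how the forward compute of one micro-batch chunk is interleaved with the backward compute of another so that communication hides behind computation. The sketch below is hypothetical: the hook name comes from the README, but the signature and argument names are assumptions for illustration; the real interface is defined by the repository's examples.

    # Hypothetical sketch only. The hook name overlapped_forward_backward is
    # from the README; the signature and argument names below are assumed for
    # illustration (see examples/example_dualpipe.py for the real interface).
    import torch

    def overlapped_forward_backward(fwd_module, fwd_inputs, bwd_outputs, bwd_grads):
        # Forward pass for one micro-batch chunk.
        fwd_outputs = fwd_module(*fwd_inputs)
        # Backward pass for another chunk. A real implementation interleaves
        # the two at a finer granularity so that communication kernels
        # overlap with compute kernels.
        if bwd_outputs is not None:
            torch.autograd.backward(bwd_outputs, grad_tensors=bwd_grads)
        return fwd_outputs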

Highlighted Details

  • Achieves full overlap of forward and backward computation-communication phases (the generic mechanism is sketched after this list).
  • Reduces pipeline bubbles compared to traditional methods like 1F1B and ZB1P.
  • DualPipeV offers reduced activation memory per device compared to DualPipe.
  • The algorithm is introduced in the DeepSeek-V3 Technical Report (arXiv:2412.19437).
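
The overlap mechanism itself rests on asynchronous point-to-point communication. The snippet below is not DualPipe's code; it is a generic torch.distributed illustration of launching p2p transfers that stay in flight while compute runs on the same rank.

    # Generic illustration (not DualPipe's code): async p2p transfers that
    # proceed while compute runs on the same rank.
    import torch
    import torch.distributed as dist

    def compute_while_communicating(block, x, send_buf, recv_buf, prev_rank, next_rank):
        # Queue sends/receives to the neighbouring pipeline ranks...
        ops = [
            dist.P2POp(dist.isend, send_buf, next_rank),
            dist.P2POp(dist.irecv, recv_buf, prev_rank),
        ]
        reqs = dist.batch_isend_irecv(ops)
        # ...and do useful work while the transfers are in flight.
        y = block(x)
        for req in reqs:
            req.wait()
        return y, recv_buf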

Maintenance & Community

Developed by Jiashi Li, Chengqi Deng, and Wenfeng Liang from DeepSeek-AI. No community links (Discord, Slack, etc.) are provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not mentioned.

Limitations & Caveats

The README indicates that a custom overlapped_forward_backward method is necessary for practical deployment, so the provided examples are illustrative rather than production-ready. DualPipeV's memory reduction also depends on the number of pipeline stages being even.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
Star History
101 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Zhiqiang Xie (Author of SGLang).

veScale by volcengine

0.1%
839
PyTorch-native framework for LLM training
created 1 year ago
updated 3 weeks ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

InternEvo by InternLM

1.0%
402
Lightweight training framework for model pre-training
created 1 year ago
updated 1 week ago
Starred by Yang Song (Professor at Caltech; Research Scientist at OpenAI), Jeremy Howard (Cofounder of fast.ai), and 4 more.

PiPPy by pytorch

0.1%
775
PyTorch tool for pipeline parallelism
created 3 years ago
updated 11 months ago