DualPipe by deepseek-ai

Pipeline parallelism algorithm for training large models

Created 6 months ago
2,858 stars

Top 16.7% on SourcePulse

Project Summary

DualPipe is a bidirectional pipeline parallelism algorithm designed to optimize large model training by enabling computation-communication overlap and reducing pipeline bubbles. It targets researchers and engineers working with large-scale deep learning models who need to improve training efficiency and throughput. The primary benefit is enhanced training speed through minimized idle time during communication phases.

How It Works

DualPipe implements a bidirectional pipeline schedule, allowing forward and backward passes to overlap with communication. This approach, detailed in the DeepSeek-V3 Technical Report, aims to fully utilize hardware resources by keeping computation units busy. A derived "V-shape" schedule, DualPipeV, further refines this by halving the pipeline stages, potentially reducing memory usage and further improving efficiency.

Quick Start & Requirements

  • Installation: not explicitly documented; the Python examples imply a pip-style install or running directly from a repository clone.
  • Requirements: PyTorch 2.0 and above.
  • Examples: python examples/example_dualpipe.py, python examples/example_dualpipev.py.
  • Note: Custom overlapped_forward_backward implementation is required for real-world applications.
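To make the last bullet concrete, here is a minimal sketch of what such a module might look like. Only the method name `overlapped_forward_backward` comes from the README; the signature, the `StageChunk` class, and the naive sequential body are assumptions, standing in for a real implementation that would genuinely overlap the two passes:

```python
import torch

class StageChunk(torch.nn.Module):
    """Hypothetical pipeline-stage module, for illustration only."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(x))

    def overlapped_forward_backward(self, fwd_inputs, bwd_outputs=None, bwd_grads=None):
        # Assumed signature. A real implementation would interleave the
        # forward pass of one micro-batch with the backward pass of
        # another (e.g. on separate CUDA streams) so communication for
        # one direction hides behind computation for the other. This
        # naive stand-in simply runs them back to back.
        if bwd_outputs is not None:
            torch.autograd.backward(bwd_outputs, grad_tensors=bwd_grads)
        return self.forward(fwd_inputs)
```

The point of requiring a custom method is that overlap opportunities are model-specific: the library cannot know which parts of your forward and backward passes can be scheduled concurrently.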

Highlighted Details

  • Achieves full overlap of forward and backward computation-communication phases.
  • Reduces pipeline bubbles compared to traditional methods like 1F1B and ZB1P.
  • DualPipeV offers reduced activation memory per device compared to DualPipe.
  • The algorithm is introduced in the DeepSeek-V3 Technical Report (arXiv:2412.19437).
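The bubble reduction can be made concrete with the schedule-bubble formulas reported for these methods (F = forward time, B = backward time, W = weight-gradient step, F&B = an overlapped forward+backward, PP = pipeline stages). The formulas follow the comparison table in the DualPipe README as best understood here; the timing values below are arbitrary illustrative units, not measurements:

```python
def bubble_1f1b(pp, f, b):
    # 1F1B idles for (PP - 1) full forward+backward slots
    return (pp - 1) * (f + b)

def bubble_zb1p(pp, f, b, w):
    # ZB1P hides two weight-gradient steps per stage in the bubble
    return (pp - 1) * (f + b - 2 * w)

def bubble_dualpipe(pp, fb, b, w):
    # DualPipe's bidirectional schedule halves the effective depth
    # and overlaps forward with backward (the F&B term)
    return (pp // 2 - 1) * (fb + b - 3 * w)

# Illustrative units (assumed): F=1, B=2, W=1, F&B=3, PP=8
print(bubble_1f1b(8, 1, 2))        # 21
print(bubble_zb1p(8, 1, 2, 1))     # 7
print(bubble_dualpipe(8, 3, 2, 1)) # 6
```

Even with these toy numbers the ordering matches the README's claim: DualPipe's bubble is the smallest of the three schedules.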

Maintenance & Community

Developed by Jiashi Li, Chengqi Deng, and Wenfeng Liang from DeepSeek-AI. No community links (Discord, Slack, etc.) are provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not mentioned.

Limitations & Caveats

The README indicates that a custom overlapped_forward_backward method is necessary for practical deployment, suggesting the provided examples are illustrative rather than production-ready. DualPipeV's memory reduction also depends on the number of pipeline stages being even, since its V-shape schedule folds the pipeline in half.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 16 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (Author of LLaMA-Factory), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

VeOmni by ByteDance-Seed

3.4% · 1k stars
Framework for scaling multimodal model training across accelerators
Created 5 months ago
Updated 3 weeks ago
Starred by Yang Song (Professor at Caltech; Research Scientist at OpenAI), Jeremy Howard (Cofounder of fast.ai), and 6 more.

PiPPy by pytorch

0% · 779 stars
PyTorch tool for pipeline parallelism
Created 3 years ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jiayi Pan (Author of SWE-Gym; MTS at xAI), and 20 more.

alpa by alpa-projects

0.0% · 3k stars
Auto-parallelization framework for large-scale neural network training and serving
Created 4 years ago
Updated 1 year ago
Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 26 more.

ColossalAI by hpcaitech

0.1% · 41k stars
AI system for large-scale parallel training
Created 3 years ago
Updated 15 hours ago