DualPipe by deepseek-ai

Pipeline parallelism algorithm for training large models

Created 6 months ago
2,858 stars

Top 16.7% on SourcePulse

Project Summary

DualPipe is a bidirectional pipeline parallelism algorithm designed to optimize large model training by enabling computation-communication overlap and reducing pipeline bubbles. It targets researchers and engineers working with large-scale deep learning models who need to improve training efficiency and throughput. The primary benefit is enhanced training speed through minimized idle time during communication phases.

How It Works

DualPipe implements a bidirectional pipeline schedule, allowing forward and backward passes to overlap with communication. This approach, detailed in the DeepSeek-V3 Technical Report, aims to fully utilize hardware resources by keeping computation units busy. A derived "V-shape" schedule, DualPipeV, further refines this by halving the pipeline stages, potentially reducing memory usage and further improving efficiency.

Quick Start & Requirements

  • Installation: not explicitly documented; the Python examples imply a pip-style install or running directly from a repository clone.
  • Requirements: PyTorch 2.0 and above.
  • Examples: python examples/example_dualpipe.py, python examples/example_dualpipev.py.
  • Note: Custom overlapped_forward_backward implementation is required for real-world applications.
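To make the last bullet concrete, here is a minimal sketch of what such a module might look like. Only the method name `overlapped_forward_backward` comes from the README; the signature, the `StageChunk` class, and the naive sequential body are assumptions, standing in for a real implementation that would genuinely overlap the two passes:

```python
import torch

class StageChunk(torch.nn.Module):
    """Hypothetical pipeline-stage module, for illustration only."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(x))

    def overlapped_forward_backward(self, fwd_inputs, bwd_outputs=None, bwd_grads=None):
        # Assumed signature. A real implementation would interleave the
        # forward pass of one micro-batch with the backward pass of
        # another (e.g. on separate CUDA streams) so communication for
        # one direction hides behind computation for the other. This
        # naive stand-in simply runs them back to back.
        if bwd_outputs is not None:
            torch.autograd.backward(bwd_outputs, grad_tensors=bwd_grads)
        return self.forward(fwd_inputs)
```

The point of requiring a custom method is that overlap opportunities are model-specific: the library cannot know which parts of your forward and backward passes can be scheduled concurrently.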

Highlighted Details

  • Achieves full overlap of forward and backward computation-communication phases.
  • Reduces pipeline bubbles compared to traditional methods like 1F1B and ZB1P.
  • DualPipeV offers reduced activation memory per device compared to DualPipe.
  • The algorithm is introduced in the DeepSeek-V3 Technical Report (arXiv:2412.19437).
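The bubble reduction can be made concrete with the schedule-bubble formulas reported for these methods (F = forward time, B = backward time, W = weight-gradient step, F&B = an overlapped forward+backward, PP = pipeline stages). The formulas follow the comparison table in the DualPipe README as best understood here; the timing values below are arbitrary illustrative units, not measurements:

```python
def bubble_1f1b(pp, f, b):
    # 1F1B idles for (PP - 1) full forward+backward slots
    return (pp - 1) * (f + b)

def bubble_zb1p(pp, f, b, w):
    # ZB1P hides two weight-gradient steps per stage in the bubble
    return (pp - 1) * (f + b - 2 * w)

def bubble_dualpipe(pp, fb, b, w):
    # DualPipe's bidirectional schedule halves the effective depth
    # and overlaps forward with backward (the F&B term)
    return (pp // 2 - 1) * (fb + b - 3 * w)

# Illustrative units (assumed): F=1, B=2, W=1, F&B=3, PP=8
print(bubble_1f1b(8, 1, 2))        # 21
print(bubble_zb1p(8, 1, 2, 1))     # 7
print(bubble_dualpipe(8, 3, 2, 1)) # 6
```

Even with these toy numbers the ordering matches the README's claim: DualPipe's bubble is the smallest of the three schedules.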

Maintenance & Community

Developed by Jiashi Li, Chengqi Deng, and Wenfeng Liang from DeepSeek-AI. No community links (Discord, Slack, etc.) are provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not mentioned.

Limitations & Caveats

The README indicates that a custom overlapped_forward_backward method is necessary for practical deployment, suggesting the provided examples are illustrative rather than production-ready. DualPipeV's memory reduction also depends on the number of pipeline stages being even, since its V-shape schedule folds the pipeline in half.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 16 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (Author of LLaMA-Factory), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

VeOmni by ByteDance-Seed

3.4% · 1k stars
Framework for scaling multimodal model training across accelerators
Created 5 months ago
Updated 3 weeks ago
Starred by Yang Song (Professor at Caltech; Research Scientist at OpenAI), Jeremy Howard (Cofounder of fast.ai), and 6 more.

PiPPy by pytorch

0% · 779 stars
PyTorch tool for pipeline parallelism
Created 3 years ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jiayi Pan (Author of SWE-Gym; MTS at xAI), and 20 more.

alpa by alpa-projects

0.0% · 3k stars
Auto-parallelization framework for large-scale neural network training and serving
Created 4 years ago
Updated 1 year ago
Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 26 more.

ColossalAI by hpcaitech

0.1% · 41k stars
AI system for large-scale parallel training
Created 3 years ago
Updated 15 hours ago