[Distribution] Support DualPipeV #71427
Conversation
Your PR has been submitted successfully. Thank you for contributing to this open-source project!
zhangbo9674
left a comment
LGTM
ForFishes
left a comment
LGTM
* [Distribution] Support DualPipeV
* [Distribution] Support DualPipeV (#71427)
  * [Distribution] Support DualPipeV
  * [Distributed] Add fail-fast for dualpipev (#71977)
  * [Distribution] support ScheduleNode for overlapping in dualpipev (#71665)
    * [Distribution] support ScheduleNode for overlapping in dualpipev
    * fix
    * opt mem
  * [Bug fix] fix mem leakage in dualpipev (#72070)
  * fix code style
  * fix pipeline in dynamic_shape
* [Distribution] Support DualPipeV
  * fix
  * fix
PR Category
Distributed Strategy
PR Types
New features
Description
An implementation of the DeepSeek-V3 DualPipeV schedule, based on https://github.com/deepseek-ai/DualPipe/blob/main/dualpipe/dualpipev.py
For the pipeline schedule
Usage:

Set `use_dualpipev=True` for both your `PipelineLayer` and the `strategy.hybrid_configs`. The following code can be run with:

`python -m paddle.distributed.launch --gpus="0,1,2,3" demo.py`
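A minimal sketch of what `demo.py` could look like, assuming a toy 8-layer MLP split across 4 pipeline stages. The `use_dualpipev` kwarg on `PipelineLayer` follows the PR description; the exact placement of the flag inside `hybrid_configs` (here under `"pp_configs"`) and the toy model are assumptions:

```python
import paddle
import paddle.nn as nn
from paddle.distributed import fleet
from paddle.distributed.fleet.meta_parallel import LayerDesc, PipelineLayer

strategy = fleet.DistributedStrategy()
strategy.hybrid_configs = {
    "dp_degree": 1,
    "mp_degree": 1,
    "pp_degree": 4,
    # Assumption: the strategy-side switch lives under "pp_configs"; the PR
    # only says to set use_dualpipev=True in strategy.hybrid_configs.
    "pp_configs": {"use_dualpipev": True},
}
# Pipeline schedules need several micro-batches per step to fill the pipe.
strategy.pipeline_configs = {"accumulate_steps": 8, "micro_batch_size": 2}
fleet.init(is_collective=True, strategy=strategy)

# Toy model: 8 Linear layers, segmented into 4 pipeline stages.
descs = [LayerDesc(nn.Linear, 64, 64) for _ in range(8)]
model = PipelineLayer(
    layers=descs,
    num_stages=4,
    loss_fn=nn.MSELoss(),
    use_dualpipev=True,  # kwarg name taken from the PR description
)

optimizer = paddle.optimizer.AdamW(parameters=model.parameters())
model = fleet.distributed_model(model)
optimizer = fleet.distributed_optimizer(optimizer)

# One pipelined training step over a random batch (16 = 8 * 2 micro-batches).
x = paddle.randn([16, 64])
y = paddle.randn([16, 64])
loss = model.train_batch([x, y], optimizer)
print(loss)
```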
For the SplitBW Linear

SplitBW Linear implements the zero-bubble pipeline technique proposed in https://arxiv.org/abs/2401.10241, which splits the backward pass into an input-gradient pass and a weight-gradient pass that can be scheduled separately.
Use `paddle.distributed.fleet.meta_parallel.zero_bubble_utils.SplitBWLinear` to replace the standard `nn.Linear`. Note that `SplitBWLinear` can only be used with `DualPipeV`; otherwise, users need to manage the `WeightGradStore` themselves to ensure that all weight gradients are computed.
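A minimal sketch of the drop-in replacement, using the import path given above and assuming the `SplitBWLinear` constructor mirrors `nn.Linear(in_features, out_features)`:

```python
import paddle
import paddle.nn as nn
# Import path as stated in the PR description.
from paddle.distributed.fleet.meta_parallel.zero_bubble_utils import SplitBWLinear


class MLP(nn.Layer):
    """A feed-forward block with its Linear layers swapped for SplitBWLinear."""

    def __init__(self, hidden=64):
        super().__init__()
        # Assumption: SplitBWLinear mirrors nn.Linear(in_features, out_features).
        self.fc1 = SplitBWLinear(hidden, 4 * hidden)
        self.act = nn.GELU()
        self.fc2 = SplitBWLinear(4 * hidden, hidden)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))
```

Inside a DualPipeV stage the block behaves like its `nn.Linear` counterpart in the forward pass, but the weight-gradient computation is deferred (via the `WeightGradStore`) so the scheduler can overlap it with other pipeline work.

Pcard-76459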