@K11OntheBoat
Collaborator

Fixes a bug on the current dev branch: the IPC signal cleanup for splitwise_complete_prefilled_step did not account for the DP+EP case.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


K11OntheBoat seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.

Collaborator

@gongshaotian gongshaotian left a comment

LGTM

Comment on lines -447 to -450
dp_rank_id = (
    self.local_rank
    + self.parallel_config.local_data_parallel_id * self.parallel_config.tensor_parallel_size
)
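
For readers following the thread, here is a minimal sketch of what the removed expression computes. The helper name `flat_rank_index` and the sample sizes are hypothetical and only mirror the arithmetic quoted above; this is not the code the PR introduces.

```python
def flat_rank_index(local_rank: int, local_data_parallel_id: int, tensor_parallel_size: int) -> int:
    """Hypothetical helper mirroring the removed expression:
    local_rank + local_data_parallel_id * tensor_parallel_size."""
    return local_rank + local_data_parallel_id * tensor_parallel_size


# Assumed sample sizes: 2 local DP groups, tensor_parallel_size = 2.
# Each (dp, tp) pair maps to a distinct flat slot.
indices = [flat_rank_index(tp, dp, 2) for dp in range(2) for tp in range(2)]
print(indices)  # [0, 1, 2, 3]
```

With these assumed sizes the expression offsets each rank by its data-parallel group, so the index depends on both the TP rank and the local DP id.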
Collaborator

Will this affect DP scenarios other than DP+PD?

Collaborator Author

This logic was added earlier by Lin Jun (林军) to support TP. After this change he helped test TP, and it works fine.

Contributor

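It only affects the PD scenario. Before this code runs, it checks whether the current node is a P (prefill) node, so scenarios other than PD are not affected. PD+TP runs successfully in a 1P1D setup with tp_size=2 on each node.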

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 2b2b645 into PaddlePaddle:develop Sep 29, 2025
30 of 39 checks passed