[AutoParallel] Fix pipeline parallel get none grad in non-computation rank. #60214
Merged
GhostScreaming merged 3 commits on Dec 27, 2023
zyfncg approved these changes on Dec 27, 2023
Wanglongzhi2001 pushed a commit to Wanglongzhi2001/Paddle that referenced this pull request on Jan 7, 2024:
…rank. (PaddlePaddle#60214)
* [AutoParallel] Fix pipeline parallel get none grad in non-computation rank.
* fix optimizer update parameter is uninitialized
* fix gradient clip
Co-authored-by: LiYuRio <[email protected]>
PR types
Bug fixes
PR changes
Others
Description
PCard-73145
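Fixes an issue in dynamic-graph semi-automatic parallel mode where, under pipeline parallelism, a non-computation rank returns Python None for an uninitialized Tensor. Also fixes an error raised when a hook prints an uninitialized Tensor.

nn.Linear has a known issue: bias may be None, in which case the C++ bias Tensor passed to _C_ops.linear is uninitialized and the add-bias computation is skipped. This conflicts with the meaning of an uninitialized Tensor in dynamic semi-automatic pipeline parallelism. Consider this case: the model uses a Linear with a bias, but on a non-computation rank Linear.bias is naturally uninitialized, so that rank skips the call to the PHI API elementwise_add, while the computation rank still runs elementwise_add.

This issue currently has no impact. For example, if save_load needs to store Linear.bias, it can still obtain the correct bias from the owning rank through paddle.distributed.reshard. Dynamic-to-static conversion is also rewritten from the Python-side nn.Linear, so skipping the add-bias computation in the PHI API has no effect there either.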
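The nn.Linear ambiguity described above can be illustrated with a plain-Python toy model. This is only a sketch and does not use Paddle: ToyTensor and toy_linear are hypothetical names invented for illustration, and "uninitialized" is modeled as a tensor whose data is None. It shows that the same layer takes a different op path depending on whether the bias is initialized on the current rank.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToyTensor:
    """Hypothetical stand-in for a tensor; NOT a Paddle class."""
    # data=None models a tensor that is uninitialized on this rank.
    data: Optional[List[float]] = None

    def initialized(self) -> bool:
        return self.data is not None


def toy_linear(x: List[float],
               weight: List[List[float]],
               bias: ToyTensor) -> Tuple[List[float], List[str]]:
    """Toy y = x @ W (+ b). Returns the output and the list of ops that ran.

    weight is a list of rows with shape [in_features][out_features].
    The add is skipped whenever the bias is uninitialized, which is the
    behavior the description attributes to the bias=None case.
    """
    ops = ["matmul"]
    out = [sum(xi * w for xi, w in zip(x, col)) for col in zip(*weight)]
    if bias.initialized():
        out = [o + b for o, b in zip(out, bias.data)]
        ops.append("elementwise_add")
    return out, ops


x = [1.0, 2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]

# Rank that owns the bias: the add runs.
owner_out, owner_ops = toy_linear(x, identity, ToyTensor([0.5, 0.5]))

# Rank where the bias is uninitialized: the add is silently skipped,
# indistinguishable here from a layer created without a bias.
other_out, other_ops = toy_linear(x, identity, ToyTensor())

print(owner_ops, other_ops)
```

In this toy, an uninitialized bias cannot be told apart from a layer that has no bias at all, which is the semantic conflict the description points out; the real framework resolves values across ranks with mechanisms such as paddle.distributed.reshard, which the sketch does not model.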