【FlexCheckpoint】Add the test about the sharded_state_dict of optimizer #75067
Merged
Your PR was submitted successfully. Thank you for your contribution to the open-source project!
Codecov Report: ✅ All modified and coverable lines are covered by tests.
xingmingyyj reviewed Sep 4, 2025
    static_to_struct = {
        v.local_tensor.name: k for k, v in model_sharded_state_dict.items()
    }
    static_to_struct = {}
    static_to_struct_mapping = {
        v.local_tensor.name: k for k, v in model_sharded_state_dict.items()
    }
    static_to_struct_mapping = {}
/re-run all-failed
Force-pushed from 0c385bc to a7ba193
LGTM
XieYunshen approved these changes Sep 23, 2025
wanglezz pushed a commit to wanglezz/Paddle that referenced this pull request Sep 25, 2025
…75067)
* fix the share_weight_bug
* add note
* add the unit test
* set the timeout
* add more test
* Trigger CI rebuild
* fix the CmakeLists
xingmingyyj pushed a commit to xingmingyyj/Paddle that referenced this pull request Oct 22, 2025
…75067)
* fix the share_weight_bug
* add note
* add the unit test
* set the timeout
* add more test
* Trigger CI rebuild
* fix the CmakeLists
swgu98 pushed a commit that referenced this pull request Oct 23, 2025
…#75996)
* [Flex CP]Fix merge_sharded_state_dict with aoa and offload (#75062)
* fix merge_state_dict with aoa and offload
* add tests
* refine
* fix
* fix
* add log
* fix
* fix
* 【FlexCheckpoint】Upgrade some macros and optimize load_state_dict communication (#75282)
* upgrad macros and load_state_dict comm task fix fix support 0-d tensor fix balance save and fix
* fix test
* Add the test about the sharded_state_dict of optimizer (#75067)
* fix the share_weight_bug
* add note
* add the unit test
* set the timeout
* add more test
* Trigger CI rebuild
* fix the CmakeLists
* handle_missing_edge_cases_in_fc (#75413)
* up_grade fc (#75613) fix and add test fix fix fix fix cmakelists add notion
---------
Co-authored-by: Chen Zhiyang
Co-authored-by: Tianyu Zheng
xingmingyyj pushed a commit to xingmingyyj/Paddle that referenced this pull request Nov 5, 2025
…75067)
* fix the share_weight_bug
* add note
* add the unit test
* set the timeout
* add more test
* Trigger CI rebuild
* fix the CmakeLists
sneaxiy pushed a commit that referenced this pull request Nov 6, 2025
….2 (#76249)
* 【FlexCP】merge_sharded_state_dict support distribute merge (#75005)
* fix data is nullptr
* add dist merge
* change test
* change test
* 【FlexCP】add Skip param param for merge_shard_state_dict (#75061)
* fix data is nullptr
* add dist merge
* change test
* change test
* add skip optimizer param
* [Flex CP]Fix merge_sharded_state_dict with aoa and offload (#75062)
* fix merge_state_dict with aoa and offload
* add tests
* refine
* fix
* fix
* add log
* fix
* fix
* 【FlexCheckpoint】Upgrade some macros and optimize load_state_dict communication (#75282)
* upgrad macros and load_state_dict comm task fix fix support 0-d tensor fix balance save and fix
* fix test
* Add the test about the sharded_state_dict of optimizer (#75067)
* fix the share_weight_bug
* add note
* add the unit test
* set the timeout
* add more test
* Trigger CI rebuild
* fix the CmakeLists
* handle_missing_edge_cases_in_fc (#75413)
* up_grade fc (#75613) fix and add test fix fix fix fix cmakelists add notion
* 【FlexCheckpoint】fix_the_layer_id_macro (#75556)
* fix_the_layer_id_macro
* fix the ctest
* add expert_id_macro
* fix the assert bug
* fix the code style
* Pr support load hf checkpoint (#75928)
* support hf checkpoint fix support cast add id macro fix
* add test and fix some bug
* fix full param bug
* add full param cast test
---------
Co-authored-by: xingmingyyj
* 【Flexcheckpoint】add_get_var_mapping_chain_macro (#76013)
* add_get_var_mapping_chain_macro
* add note
* fix the bug input_vars and resolve_mapping_chain
* fix the code style
* fit the dtype assert bug
* fix the bug
* fix the merge_sharded_state_dict bug
* fix aoa transpose corner case (#76234)
---------
Co-authored-by: xiaoguoguo626807
Co-authored-by: Chen Zhiyang
Co-authored-by: Tianyu Zheng
PR Category
Operator Mechanism
PR Types
Bug fixes
Description
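In the optimizer's sharded_state_dict, shared parameters all refer to the same weight, and optimizer state is created only for the parameter that appears first. A check is therefore needed here so that a shared parameter appearing later does not overwrite the entry of the parameter that appeared first.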
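The check described above can be illustrated with a minimal sketch. It is not the code from this PR: `FakeTensor`, `FakeShardedWeight`, `build_static_to_struct_mapping`, and the parameter names are hypothetical stand-ins, and only the `v.local_tensor.name: k` mapping shape is taken from the review snippet. The assumption is that two structured keys pointing at the same underlying tensor name represent a shared weight.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    # Hypothetical stand-in for a framework tensor; only the name matters here.
    name: str


@dataclass
class FakeShardedWeight:
    # Hypothetical stand-in for one entry of the model's sharded state dict.
    local_tensor: FakeTensor


def build_static_to_struct_mapping(model_sharded_state_dict):
    """Map each tensor name to the first structured key that refers to it."""
    mapping = {}
    for struct_key, sharded_weight in model_sharded_state_dict.items():
        tensor_name = sharded_weight.local_tensor.name
        # A plain dict comprehension would let a later shared parameter
        # overwrite the first one; keep the first occurrence instead.
        if tensor_name not in mapping:
            mapping[tensor_name] = struct_key
    return mapping


# Two structured keys share one underlying weight (hypothetical names).
shared = FakeTensor(name="linear_0.w_0")
model_sharded_state_dict = {
    "embedding.weight": FakeShardedWeight(shared),
    "lm_head.weight": FakeShardedWeight(shared),
    "norm.weight": FakeShardedWeight(FakeTensor(name="norm_0.w_0")),
}

mapping = build_static_to_struct_mapping(model_sharded_state_dict)

# For comparison: the comprehension form keeps the last occurrence.
comprehension = {
    v.local_tensor.name: k for k, v in model_sharded_state_dict.items()
}
```

With the guard in place, the shared tensor name stays mapped to the key that appeared first, whereas the comprehension form ends up pointing at the later key.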