【FlexCheckpoint】Aoa config reverse #76437

Merged
From00 merged 11 commits into PaddlePaddle:develop from zty-king:aoa_config_reverse
Nov 25, 2025

Conversation

Contributor

@zty-king zty-king commented Nov 16, 2025

PR Category

User Experience

PR Types

Others

Description

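  • Automatically apply the inverse of the operations in aoa_config.
  • Extend the cast syntax. The original form dtype="dst_dtype" is still supported. When aoa_config_reverse=True, meaning the inverse of aoa_statements is to be used, a cast must be written as src_dtype='xxx',dst_dtype='xxx'. For example, src_dtype='float16',dst_dtype='float32' converts from float16 to float32. The reason for the extension: of the seven AOA primitives, six (add, merge, rename, split, remove, transpose) keep both src and dst information in their configuration, so the reverse AOA can be derived from them. Only cast loses the source dtype. The formats are therefore unified so that all seven primitives are invertible.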
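
To illustrate why cast needs both src_dtype and dst_dtype to be invertible, here is a minimal sketch of reversing a single statement. It is not the code in this PR: the function name reverse_statement is hypothetical, the parser handles only simple "names -> names, key=value" statements, and the assumption that the inverse of a merge is a split with the same axis is mine.

```python
def reverse_statement(stmt: str) -> str:
    """Toy inverse of one AOA-style statement (hypothetical helper, not Paddle's API)."""
    lhs, rhs = stmt.strip().split("->", 1)
    src_names = [name.strip() for name in lhs.split(",")]

    # On the right-hand side, items containing '=' are attributes; the rest are names.
    parts = [part.strip() for part in rhs.split(",")]
    dst_names = [part for part in parts if "=" not in part]
    attrs = {}
    for part in parts:
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()

    # The old cast form records only the target dtype, so the source dtype is lost.
    if "dtype" in attrs:
        raise ValueError("cannot reverse a cast that only records dtype")

    # The extended cast form keeps both ends, so reversing is a swap.
    if "src_dtype" in attrs and "dst_dtype" in attrs:
        attrs["src_dtype"], attrs["dst_dtype"] = attrs["dst_dtype"], attrs["src_dtype"]

    tail = "".join(f", {key}={value}" for key, value in attrs.items())
    return f"{', '.join(dst_names)} -> {', '.join(src_names)}{tail}"


print(reverse_statement("w -> w, src_dtype='float16', dst_dtype='float32'"))
# w -> w, src_dtype='float32', dst_dtype='float16'
print(reverse_statement("s0, s1 -> s, axis = 1"))
# s -> s0, s1, axis=1
```

Attributes whose values contain commas, such as a transpose permutation, are out of scope for this sketch.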

paddle-bot Bot commented Nov 16, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot Bot added the contributor External developers label Nov 16, 2025

codecov-commenter commented Nov 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@530cae4). Learn more about missing BASE report.

Additional details and impacted files
@@             Coverage Diff             @@
##             develop    #76437   +/-   ##
===========================================
  Coverage           ?   100.00%           
===========================================
  Files              ?         3           
  Lines              ?        59           
  Branches           ?         0           
===========================================
  Hits               ?        59           
  Misses             ?         0           
  Partials           ?         0           

@zty-king
Contributor Author

/re-run all-failed

self.aoa_statements = [
"s0, s1 -> s, axis = 1 \n",
"s -> s, dtype = 'float64'\n",
"s -> s, dtype = 'float32'\n",
Contributor

Why was the dtype changed here?

Contributor Author

@zty-king zty-king Nov 18, 2025

cast currently supports only float32, float16, and bfloat16. A type check was added, so the unit test was updated to match.
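
A type check of this kind could look roughly like the sketch below. The three supported dtypes come from the comment above. The constant and function names are hypothetical and are not taken from the Paddle code.

```python
# Dtypes the cast primitive accepts, per the comment above.
SUPPORTED_CAST_DTYPES = {"float32", "float16", "bfloat16"}


def check_cast_dtype(dtype: str) -> str:
    """Return dtype unchanged if cast supports it, otherwise raise (hypothetical helper)."""
    if dtype not in SUPPORTED_CAST_DTYPES:
        raise ValueError(
            f"unsupported cast dtype {dtype!r}; expected one of {sorted(SUPPORTED_CAST_DTYPES)}"
        )
    return dtype
```

Under such a check, a test statement using dtype = 'float64' would be rejected, which matches the switch to 'float32' in the test.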

self.start_macro_test()


class TestFusedQkvOldMacro(TestMacro):
Contributor

Why was this unit test changed?

Contributor Author

It was moved, and a macro unit test for ID was added.

From00
From00 previously approved these changes Nov 19, 2025
Contributor

@From00 From00 left a comment

LGTM

Contributor

@From00 From00 left a comment

LGTM

@From00 From00 merged commit 86d9966 into PaddlePaddle:develop Nov 25, 2025
58 of 59 checks passed
xingmingyyj pushed a commit to xingmingyyj/Paddle that referenced this pull request Dec 5, 2025
* aoa_config_reverse

* fix the bug

* add test

* fix dtype style and add test

* adapt full param update
swgu98 pushed a commit that referenced this pull request Dec 6, 2025
* 【FlexCheckpoint】Aoa config reverse (#76437)

* aoa_config_reverse

* fix the bug

* add test

* fix dtype style and add test

* adapt full param update

* [FlexCheckPoint]adapt fc to sharding stage3 (#76538)

* adapt fc to sharding stage3

* add test and fix bug

* fix bug

* fix bug

* add test

* fc comm using grouped send/recv (#76779)

fix

fix

fix

fix

---------

Co-authored-by: Tianyu Zheng <[email protected]>
@zty-king zty-king deleted the aoa_config_reverse branch January 8, 2026 08:37
Labels

contributor External developers

4 participants