[CP] Remove the need of recording cp_dim in the global var #162540
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162540
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 1 Pending as of commit 2cbd999 with merge base 7a0f933.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
XilunWu left a comment:
TODO: we still have multiple hard-coded sites (`seq_dim=2` or `seq_dim = 2`).
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
We are not doing ring attention but only using allgather to do CP for Flex.
Pull Request resolved: #162541
Approved by: https://github.com/ezyang, https://github.com/Skylion007, https://github.com/tianyu-l, https://github.com/XilunWu
ghstack dependencies: #162539, #162540

Summary: This PR is extracted from #162542, to make the original PR easier to review. This PR only contains cosmetic changes.
Pull Request resolved: #163115
Approved by: https://github.com/tianyu-l
ghstack dependencies: #162539, #162540, #162541

Pull Request resolved: #163131
Approved by: https://github.com/tianyu-l, https://github.com/XilunWu
ghstack dependencies: #162539, #162540, #162541, #163115
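The allgather-based CP approach mentioned above (for #162541) can be sketched in plain Python. This is a hypothetical, torch-free illustration of the collective pattern, not the actual PyTorch implementation: each rank holds a shard of the sequence and, instead of a ring-attention send/recv pipeline, gathers every rank's key/value shard before attending.

```python
# Hypothetical sketch of allgather-based context parallelism (CP):
# plain lists stand in for per-rank KV tensors.

def all_gather_shards(shards):
    """Simulate a collective all_gather: every rank ends up with the
    full concatenation of all ranks' shards."""
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]  # one full copy per rank

# A sequence of length 8 sharded across 4 ranks (2 tokens each).
kv_shards = [[0, 1], [2, 3], [4, 5], [6, 7]]
gathered = all_gather_shards(kv_shards)

# After the allgather, each rank can attend over the whole sequence
# locally; no ring-style pipelining of shards is required.
assert all(g == [0, 1, 2, 3, 4, 5, 6, 7] for g in gathered)
```

The trade-off this sketch makes visible: allgather pays the full KV communication up front and keeps the attention kernel unchanged, whereas ring attention overlaps communication with blockwise computation.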
…62540) This information can be obtained during the dispatching.
Pull Request resolved: pytorch#162540
Approved by: https://github.com/ezyang, https://github.com/tianyu-l, https://github.com/XilunWu
ghstack dependencies: pytorch#162539
We should only unsqueeze if necessary. Fixes pytorch#162743.
Pull Request resolved: pytorch#163231
Approved by: https://github.com/eqy
ghstack dependencies: pytorch#162539, pytorch#162540, pytorch#162541, pytorch#163115, pytorch#163131
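The "only unsqueeze if necessary" fix (pytorch#163231) can be illustrated with a minimal, torch-free sketch. The function name and shape-tuple representation are hypothetical stand-ins for the actual tensor code: the point is simply to guard the rank expansion so already-expanded inputs are not given a spurious extra dimension.

```python
def maybe_unsqueeze(shape, target_rank, dim=0):
    """Insert a size-1 dimension at `dim` only when the input's rank is
    below `target_rank`; inputs that already have enough dimensions
    pass through unchanged."""
    if len(shape) >= target_rank:
        return shape  # already at the expected rank: do not unsqueeze
    return shape[:dim] + (1,) + shape[dim:]

assert maybe_unsqueeze((8, 16), 3) == (1, 8, 16)     # expanded
assert maybe_unsqueeze((4, 8, 16), 3) == (4, 8, 16)  # left as-is
```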
Stack from ghstack (oldest at bottom):
This information can be obtained during the dispatching.
cc @H-Huang @awgu @wanchaol @fduwjj @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim @dcci
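The description's point that the CP dimension "can be obtained during the dispatching" can be sketched as follows. All names here are hypothetical (this is not the actual PyTorch dispatcher code): instead of recording `cp_dim` in a module-level global, the handler derives the sharded sequence dimension from the arguments it is dispatching on.

```python
from dataclasses import dataclass

@dataclass
class ShardedArg:
    """Stand-in for a DTensor-like argument that knows which of its
    dimensions is sharded across the CP group."""
    shape: tuple
    sharded_dim: int

def infer_cp_dim(args):
    """Derive the context-parallel (sequence) dim from the dispatched
    args rather than from a previously recorded global variable."""
    dims = {a.sharded_dim for a in args if isinstance(a, ShardedArg)}
    if len(dims) != 1:
        raise ValueError(f"ambiguous or missing CP dim: {dims}")
    return dims.pop()

# q/k shards agree on the sharded dim, so dispatch can recover it locally.
q = ShardedArg((2, 8, 64), sharded_dim=1)
k = ShardedArg((2, 8, 64), sharded_dim=1)
assert infer_cp_dim([q, k]) == 1
```

Deriving the dim per dispatch avoids the stale-state hazards of a global (for example, two CP regions sharding different dims in the same process), which is the motivation the PR description hints at.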