Fix DDP bug in single process multiple device use cases #36503
💊 Build failures summary and remediations
As of commit d8299d1 (more details on the Dr. CI page):
XLA failure: Job pytorch_xla_linux_bionic_py3_6_clang9_build is failing. Please create an issue with title prefixed by
Extra GitHub checks: 1 failed
zhaojuanmao
left a comment
sorry, accidentally clicked approve
Differential Revision: [D21179274](https://our.internmc.facebook.com/intern/diff/D21179274)
…n't need them
…ses" Differential Revision: [D21179274](https://our.internmc.facebook.com/intern/diff/D21179274)
…le device use cases" Differential Revision: [D21179274](https://our.internmc.facebook.com/intern/diff/D21179274)
zhaojuanmao
left a comment
sounds good, I will think about how to refactor the test structures as well.
thanks for fixing it.
This is a commit to merge pytorch#36503 into the 1.5.1 release. It fixes single-process multi-GPU DDP use cases by explicitly exposing each model replica's parameters to DDP. pytorch#36656 landed in master at 8d6a8d2.

Summary: Pull Request resolved: pytorch#36503
Test Plan: Imported from OSS
Differential Revision: D21179274
Pulled By: mrshenli
fbshipit-source-id: 0afce30ae0ddda753d1e240584a0f80df9aec4c2
Fixed by #36503
Summary: Pull Request resolved: pytorch#40190
Fixed by pytorch#36503
Test Plan: Imported from OSS
Differential Revision: D22101516
Pulled By: mrshenli
fbshipit-source-id: 9abd6dce602530c11b7fe623ac0f4d556dccc961
Test was added a few months back in pytorch#36503 but recently became flaky for ROCm.
Stack from ghstack:
Differential Revision: D21179274