[reland] move rebuild buckets from end of first iteration to beginning of second iteration#44798
Closed
zhaojuanmao wants to merge 6 commits intogh/zhaojuanmao/54/basefrom
Closed
[reland] move rebuild buckets from end of first iteration to beginning of second iteration#44798zhaojuanmao wants to merge 6 commits intogh/zhaojuanmao/54/basefrom
zhaojuanmao wants to merge 6 commits intogh/zhaojuanmao/54/basefrom
Conversation
…g of second iteration [test all] Update for relanding: in ddp.join(), moved _rebuild_buckets from end of backward to beginning of forward as well. Part of relanding PR #41954, this refactoring is to move rebuild_buckets call from end of first iteration to beginning of second iteration ghstack-source-id: 112011490 Differential Revision: [D23735185](https://our.internmc.facebook.com/intern/diff/D23735185/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23735185/)! [ghstack-poisoned]
…to beginning of second iteration" [test all] Update for relanding: in ddp.join(), moved _rebuild_buckets from end of backward to beginning of forward as well. Part of relanding PR #41954, this refactoring is to move rebuild_buckets call from end of first iteration to beginning of second iteration Differential Revision: [D23735185](https://our.internmc.facebook.com/intern/diff/D23735185/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23735185/)! [ghstack-poisoned]
💊 CI failures summary and remediationsAs of commit 333a3c8 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
…to beginning of second iteration" [test all] Update for relanding: in ddp.join(), moved _rebuild_buckets from end of backward to beginning of forward as well. Part of relanding PR #41954, this refactoring is to move rebuild_buckets call from end of first iteration to beginning of second iteration Differential Revision: [D23735185](https://our.internmc.facebook.com/intern/diff/D23735185/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23735185/)! [ghstack-poisoned]
…to beginning of second iteration" [test all] Update for relanding: in ddp.join(), moved _rebuild_buckets from end of backward to beginning of forward as well. Part of relanding PR #41954, this refactoring is to move rebuild_buckets call from end of first iteration to beginning of second iteration Differential Revision: [D23735185](https://our.internmc.facebook.com/intern/diff/D23735185/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23735185/)! [ghstack-poisoned]
zhaojuanmao
added a commit
that referenced
this pull request
Sep 17, 2020
…g of second iteration Pull Request resolved: #44798 [test all] Update for relanding: in ddp.join(), moved _rebuild_buckets from end of backward to beginning of forward as well. Part of relanding PR #41954, this refactoring is to move rebuild_buckets call from end of first iteration to beginning of second iteration ghstack-source-id: 112244222 ghstack-source-id: 112244222 Differential Revision: [D23735185](https://our.internmc.facebook.com/intern/diff/D23735185/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23735185/)!
…to beginning of second iteration" [test all] Update for relanding: in ddp.join(), moved _rebuild_buckets from end of backward to beginning of forward as well. Part of relanding PR #41954, this refactoring is to move rebuild_buckets call from end of first iteration to beginning of second iteration Differential Revision: [D23735185](https://our.internmc.facebook.com/intern/diff/D23735185/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23735185/)! [ghstack-poisoned]
…to beginning of second iteration" [test all] Update for relanding: in ddp.join(), moved _rebuild_buckets from end of backward to beginning of forward as well. Part of relanding PR #41954, this refactoring is to move rebuild_buckets call from end of first iteration to beginning of second iteration Differential Revision: [D23735185](https://our.internmc.facebook.com/intern/diff/D23735185/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23735185/)! [ghstack-poisoned]
zhaojuanmao
added a commit
that referenced
this pull request
Sep 17, 2020
…g of second iteration Pull Request resolved: #44798 [test all] Update for relanding: in ddp.join(), moved _rebuild_buckets from end of backward to beginning of forward as well. Part of relanding PR #41954, this refactoring is to move rebuild_buckets call from end of first iteration to beginning of second iteration ghstack-source-id: 112279261 ghstack-source-id: 112279261 Differential Revision: [D23735185](https://our.internmc.facebook.com/intern/diff/D23735185/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23735185/)!
Contributor
Author
|
failures are not related, ci-all tests also passed in #44893 |
rohan-varma
approved these changes
Sep 17, 2020
Contributor
rohan-varma
left a comment
There was a problem hiding this comment.
LGTM, the changes in join() will ensure we don't run into desync issues as discussed.
mrshenli
approved these changes
Sep 17, 2020
| work, ones, self.ddp_join_divide_by_initial_world_size | ||
| ) | ||
|
|
||
| # Calling _rebuild_buckets before forward compuation, |
Contributor
There was a problem hiding this comment.
compuation -> computation
Contributor
|
This pull request has been merged in e14b208. |
xuzhao9
pushed a commit
that referenced
this pull request
Sep 18, 2020
…g of second iteration (#44798) Summary: Pull Request resolved: #44798 [test all] Update for relanding: in ddp.join(), moved _rebuild_buckets from end of backward to beginning of forward as well. Part of relanding PR #41954, this refactoring is to move rebuild_buckets call from end of first iteration to beginning of second iteration ghstack-source-id: 112279261 ghstack-source-id: 112279261 Test Plan: unit tests Reviewed By: rohan-varma Differential Revision: D23735185 fbshipit-source-id: c26e0efeecb3511640120faa1122a2c856cd694e
rohan-varma
added a commit
that referenced
this pull request
Oct 6, 2020
…_buckets fails Occasionally users run DDP with models with unused params, in this case we would like to surface an error message telling them to run with find_unused_params=True. However, a recent change to rebuild_buckets logic (#44798) made it so that we raise a size mismatch error when this happens, but the information about unused parameters is likely to be more useful and likely to be the most common case of failure. Prefer raising this error over the subsequent size mismatch errors. Differential Revision: [D24151256](https://our.internmc.facebook.com/intern/diff/D24151256/) [ghstack-poisoned]
rohan-varma
added a commit
that referenced
this pull request
Oct 6, 2020
…hen rebuild_buckets fails" Occasionally users run DDP with models with unused params, in this case we would like to surface an error message telling them to run with find_unused_params=True. However, a recent change to rebuild_buckets logic (#44798) made it so that we raise a size mismatch error when this happens, but the information about unused parameters is likely to be more useful and likely to be the most common case of failure. Prefer raising this error over the subsequent size mismatch errors. Differential Revision: [D24151256](https://our.internmc.facebook.com/intern/diff/D24151256/) [ghstack-poisoned]
rohan-varma
added a commit
that referenced
this pull request
Oct 6, 2020
…_buckets fails Pull Request resolved: #45933 Occasionally users run DDP with models with unused params, in this case we would like to surface an error message telling them to run with find_unused_params=True. However, a recent change to rebuild_buckets logic (#44798) made it so that we raise a size mismatch error when this happens, but the information about unused parameters is likely to be more useful and likely to be the most common case of failure. Prefer raising this error over the subsequent size mismatch errors. ghstack-source-id: 113713050 Differential Revision: [D24151256](https://our.internmc.facebook.com/intern/diff/D24151256/)
rohan-varma
added a commit
that referenced
this pull request
Oct 7, 2020
…hen rebuild_buckets fails" Occasionally users run DDP with models with unused params, in this case we would like to surface an error message telling them to run with find_unused_params=True. However, a recent change to rebuild_buckets logic (#44798) made it so that we raise a size mismatch error when this happens, but the information about unused parameters is likely to be more useful and likely to be the most common case of failure. Prefer raising this error over the subsequent size mismatch errors. Differential Revision: [D24151256](https://our.internmc.facebook.com/intern/diff/D24151256/) [ghstack-poisoned]
rohan-varma
added a commit
that referenced
this pull request
Oct 8, 2020
…hen rebuild_buckets fails" Occasionally users run DDP with models with unused params, in this case we would like to surface an error message telling them to run with find_unused_params=True. However, a recent change to rebuild_buckets logic (#44798) made it so that we raise a size mismatch error when this happens, but the information about unused parameters is likely to be more useful and likely to be the most common case of failure. Prefer raising this error over the subsequent size mismatch errors. Differential Revision: [D24151256](https://our.internmc.facebook.com/intern/diff/D24151256/) [ghstack-poisoned]
facebook-github-bot
pushed a commit
that referenced
this pull request
Oct 9, 2020
…_buckets fails (#45933) Summary: Pull Request resolved: #45933 Occasionally users run DDP with models with unused params, in this case we would like to surface an error message telling them to run with find_unused_params=True. However, a recent change to rebuild_buckets logic (#44798) made it so that we raise a size mismatch error when this happens, but the information about unused parameters is likely to be more useful and likely to be the most common case of failure. Prefer raising this error over the subsequent size mismatch errors. ghstack-source-id: 113914759 Test Plan: Added unittest Reviewed By: mrshenli Differential Revision: D24151256 fbshipit-source-id: 5d349a988b4aac7d3e0ef7b3cd84dfdcbe9db675
This was referenced Oct 9, 2020
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stack from ghstack:
[test all]
Update for relanding: in ddp.join(), moved _rebuild_buckets from end of backward to beginning of forward as well.
Part of relanding PR #41954, this refactoring is to move rebuild_buckets call from end of first iteration to beginning of second iteration
Differential Revision: D23735185
NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator!