
Conversation

@chenyangyu1988 (Contributor)

Summary:
Add a unit test to ensure that no gradient sync happens when calling ddp.module(input) directly, i.e., without invoking prepare_for_backward.

PyText depends on DDP for data-parallel distributed training. To accumulate gradients locally before syncing them, we call orig_model.forward instead of ddp_model.forward. Add a unit test so that future changes do not break this assumption.
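The calling pattern described above can be illustrated with a toy stand-in for the DDP wrapper. This is a minimal sketch: `ToyDDP`, `ToyModule`, and the `sync_armed` flag are hypothetical names, not PyTorch internals. The point is only that calling the wrapper arms gradient-sync bookkeeping, while calling the wrapped module directly skips it.

```python
# Minimal sketch (hypothetical names, not PyTorch internals): a toy wrapper
# mimicking DDP's dispatch. Calling the wrapper arms sync bookkeeping (a
# stand-in for prepare_for_backward); calling .module directly skips it.

class ToyModule:
    def __call__(self, x):
        return x * 2  # a trivial "forward"

class ToyDDP:
    """Stand-in for DistributedDataParallel's calling convention."""
    def __init__(self, module):
        self.module = module
        self.sync_armed = False  # models prepare_for_backward being queued

    def __call__(self, x):
        self.sync_armed = True   # real DDP arms gradient sync here
        return self.module(x)

ddp = ToyDDP(ToyModule())

# Local accumulation path: bypass the wrapper, no sync bookkeeping.
assert ddp.module(3) == 6 and not ddp.sync_armed

# Normal DDP path: sync bookkeeping kicks in.
assert ddp(3) == 6 and ddp.sync_armed
```

With real DDP the same pattern applies: forward passes through `ddp.module(...)` leave gradients accumulating locally, and a final pass through `ddp(...)` triggers the usual synchronization during backward.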

Differential Revision: D15263155

@pytorchbot added the "oncall: distributed" label (add this issue/PR to the distributed oncall triage queue) on May 8, 2019

@mrshenli mrshenli left a comment


Thanks for adding the test. I feel this test should only live temporarily, to make sure that changes in PyTorch do not break your use case, and it should retire when we have a better solution (e.g., a context manager as @apaszke suggested).


Let's do 4 iterations to make sure that things are still correct after the first sync.
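The suggestion above, running enough iterations that correctness is checked both before and after the first sync, can be sketched numerically without torch. In this hypothetical stand-in, floats play the role of per-worker gradients and a plain average plays the role of the all-reduce:

```python
# Hypothetical numeric sketch of the suggested 4-iteration test: floats stand
# in for per-worker gradients and averaging stands in for the all-reduce.
# Accumulate locally for two steps, sync, then repeat, so behavior is checked
# both before and after the first sync.

per_step = [[1.0, 3.0], [2.0, 4.0], [0.5, 1.5], [1.5, 2.5]]  # grads per worker
world = [0.0, 0.0]   # accumulated grad on each of two "workers"
sync_every = 2
synced = []          # averaged grad observed at each sync point

for step, grads in enumerate(per_step):
    for rank in range(2):
        world[rank] += grads[rank]          # local accumulation, no sync
    if (step + 1) % sync_every == 0:
        synced.append(sum(world) / 2)       # all-reduce (mean) stand-in
        world = [0.0, 0.0]                  # grads consumed after each sync

assert synced == [5.0, 3.0]  # correct both before and after the first sync
```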

@mrshenli mrshenli May 8, 2019


Shall we add a comment noting that this only works when there is a single model replica per process? Otherwise, _sync_params would broadcast params from replica 0 to the others and erase the accumulated grads.

…ut) (pytorch#20282)

Summary:
Pull Request resolved: pytorch#20282

Add a unit test to ensure that no gradient sync happens when calling ddp.module(input) directly, i.e., without invoking prepare_for_backward.

PyText depends on DDP for data-parallel distributed training. To accumulate gradients locally before syncing them, we call orig_model.forward instead of ddp_model.forward. Add a unit test so that future changes do not break this assumption.

Reviewed By: mrshenli

Differential Revision: D15263155

fbshipit-source-id: f5bd884616b2783064f383820dc6397f8608c3c6
@facebook-github-bot (Contributor)

This pull request has been merged in 2019f6c.

facebook-github-bot pushed a commit that referenced this pull request May 10, 2019
Summary:
Pull Request resolved: #20351

This was broken because of a merge race between #20282 and the stack in #20236.

Cleaned up the test and comments a bit as well.

Differential Revision: D15292786

fbshipit-source-id: a4379ea700cad959d3a6921fc5ddf9384fb8f228

Labels: Merged, oncall: distributed

5 participants