
Conversation

Contributor

@pietern commented Apr 26, 2019

Stack:
    ● #19799 Multiple module outputs and multiple calls to backward  💛

A module that returns multiple outputs, where the caller may end up
making multiple calls to torch.autograd.backward, did not work with
DistributedDataParallel. DDP expected the first call to
torch.autograd.backward to provide gradients for ALL parameters that
require gradients and were used in computing the module output. If the
outputs have disjoint autograd graphs, it is fine to call
torch.autograd.backward on each of them and fill in the module's
parameter gradients in separate chunks.
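
A minimal sketch of the scenario this fixes (the module and names are illustrative, not from the PR; a single-process gloo group is set up only so the snippet runs standalone):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn

# Single-process process group so the example is self-contained.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

class TwoHeads(nn.Module):
    def __init__(self):
        super().__init__()
        self.head_a = nn.Linear(10, 1)
        self.head_b = nn.Linear(10, 1)

    def forward(self, x):
        # The two outputs have disjoint autograd graphs: each one
        # depends only on the parameters of its own head.
        return self.head_a(x), self.head_b(x)

model = nn.parallel.DistributedDataParallel(TwoHeads())
out_a, out_b = model(torch.randn(8, 10))

# Each backward() fills in gradients for a disjoint subset of the
# module's parameters. Before this change, DDP finalized gradient
# reduction after the first call and the second call failed.
out_a.sum().backward()
out_b.sum().backward()
```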

With this change we delay queueing the finalizer callback until we have
marked all buckets as ready, instead of queueing it the first time we
receive an autograd hook. This makes the implementation functionally
equivalent to the DistributedDataParallel implementation from before
#18953 was merged.
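
An illustrative Python model of the changed control flow; the real reducer is C++ (torch/csrc/distributed/c10d/reducer.cpp) and every name below is an assumption, not the actual API:

```python
from dataclasses import dataclass

@dataclass
class Bucket:
    # Gradients for a contiguous chunk of parameters, reduced together.
    pending: int
    ready: bool = False

class ReducerSketch:
    def __init__(self, bucket_sizes):
        self.buckets = [Bucket(pending=n) for n in bucket_sizes]

    def autograd_hook(self, bucket_index):
        # Fires once per parameter gradient, possibly spread across
        # several torch.autograd.backward() calls.
        bucket = self.buckets[bucket_index]
        bucket.pending -= 1
        if bucket.pending == 0:
            bucket.ready = True  # the real code kicks off allreduce here

        # Old behavior: queue the finalizer on the FIRST hook, assuming a
        # single backward() covers every parameter. New behavior: queue
        # it only once ALL buckets are ready, so gradients may arrive
        # across multiple backward() calls.
        if all(b.ready for b in self.buckets):
            self.queue_finalizer()

    def queue_finalizer(self):
        # Stand-in for queueing the callback that copies reduced
        # gradients back into param.grad and resets per-iteration state.
        pass
```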

Differential Revision: D15097045
Differential Version: 80783859
@pytorchbot added the oncall: distributed label Apr 26, 2019
Contributor

@facebook-github-bot

This pull request has been merged in 0d8a361.

facebook-github-bot pushed a commit that referenced this pull request Apr 29, 2019
Summary:
Pull Request resolved: #19897

During validation, gradient reduction is not needed, and autograd is
never called. The model output will always be a detached tensor. After
the new reducer was merged, this meant that it would find all model
parameters unused and kick off reduction for them. With #19799 merged,
the reducer is handed a model output where no parameters are used and
tries to kick off reduction of zeroed gradients. Test for
`torch.is_grad_enabled()` and `self.training` before calling into the
reducer (a sketch of the guard follows this commit message).

Reviewed By: mrshenli

Differential Revision: D15118726

fbshipit-source-id: b0208f632a61cbe8110fa626fa427937b7f05924
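
A minimal sketch of that guard, loosely following the shape of DistributedDataParallel.forward at the time; `self.reducer.prepare_for_backward` and `_find_tensors` mirror internal names from torch/nn/parallel/distributed.py, but the exact surrounding code is an assumption:

```python
import torch

def forward(self, *inputs, **kwargs):  # sketch of DistributedDataParallel.forward
    output = self.module(*inputs, **kwargs)
    # Only arm the reducer when a backward pass can actually happen.
    # Under torch.no_grad() (typical validation) or in eval mode the
    # output is detached, so every parameter would look "unused" and the
    # reducer would wrongly kick off reduction of zeroed gradients.
    if torch.is_grad_enabled() and self.training:
        # _find_tensors unwraps tensors from tuple/list/dict outputs.
        self.reducer.prepare_for_backward(list(_find_tensors(output)))
    return output
```
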
soumith pushed a commit that referenced this pull request Apr 29, 2019, with the same summary as the commit above.
zhangguanheng66 pushed a commit to zhangguanheng66/pytorch that referenced this pull request May 6, 2019, with the same summary as this pull request's description (Pull Request resolved: pytorch#19799; Reviewed By: mrshenli; Differential Revision: D15097045; fbshipit-source-id: 2df023319713bc31e29a8b45108c78e6593fccd4).
zhangguanheng66 pushed a commit to zhangguanheng66/pytorch that referenced this pull request May 6, 2019, with the same summary as the pytorch#19897 commit above.
@ezyang deleted the export-D15097045 branch May 30, 2019 16:02