
Conversation

@mrshenli
Contributor

Fixes #21108

When grad is disabled, Python autograd function outputs are wrapped as detached aliases, which prevents calling Tensor.set_() on them after the recent changes merging Tensors and Variables. This breaks users who call rnn.flatten_parameters() in the forward pass, since that function calls set_().
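A minimal sketch of the failure mode (the `RNNWrapper` module and the shapes here are illustrative, not from the original report): a module that calls flatten_parameters() in forward, run under DataParallel with grad disabled.

```python
import torch
import torch.nn as nn

class RNNWrapper(nn.Module):
    # Hypothetical wrapper used only to illustrate the failure.
    def __init__(self):
        super(RNNWrapper, self).__init__()
        self.rnn = nn.LSTM(10, 20, batch_first=True)

    def forward(self, x):
        # flatten_parameters() calls Tensor.set_() internally; under
        # torch.no_grad(), DataParallel's replicated parameters were
        # detached aliases, so this set_() call raised an error.
        self.rnn.flatten_parameters()
        return self.rnn(x)[0]

model = nn.DataParallel(RNNWrapper().cuda(), device_ids=[0, 1])
with torch.no_grad():
    out = model(torch.randn(4, 5, 10).cuda())  # raised before this fix
```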

The proposed solution is to avoid going through the autograd Broadcast function when running in no_grad mode.
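A sketch of the idea (not the exact merged diff; `broadcast_params` is a hypothetical helper name): pick the broadcast path based on whether grad is enabled.

```python
import torch
from torch.cuda import comm
from torch.nn.parallel._functions import Broadcast

def broadcast_params(tensors, devices):
    # Hypothetical helper sketching the fix, not the merged code.
    if not torch.is_grad_enabled():
        # Plain copies with no autograd wrapping: the per-device
        # replicas are ordinary tensors, so set_() works on them.
        return comm.broadcast_coalesced(tensors, devices)
    # Autograd-aware path: Broadcast.apply returns a flat tuple of
    # len(devices) * len(tensors) copies; regroup them per device to
    # match broadcast_coalesced's output shape.
    copies = Broadcast.apply(devices, *tensors)
    return [copies[i:i + len(tensors)]
            for i in range(0, len(copies), len(tensors))]
```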

@apsdehal

@mrshenli added the oncall: distributed and triaged labels on May 31, 2019
@mrshenli
Contributor Author

This is blocking facebookresearch/mmf/issues/76

@pytorchbot added the module: nn label on May 31, 2019
@facebook-github-bot (Contributor) left a comment

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@mrshenli merged this pull request in 51ebbe9.

facebook-github-bot pushed a commit that referenced this pull request Jun 3, 2019
Summary:
Retry #21197

The previous one failed because it used some Python 3-only syntax.

ezyang: Do we still have multi-GPU Python 2 tests? I am curious why the CI tests did not catch this error.
Pull Request resolved: #21262

Differential Revision: D15598941

Pulled By: mrshenli

fbshipit-source-id: 95f416589448c443685d6d236d205b011998a715
@mrshenli deleted the nograd branch on June 14, 2019 at 14:58
@BramVanroy

Is this fix part of 1.2?

@gchanan
Contributor

@gchanan commented on Oct 14, 2019

@BramVanroy yes, this made 1.2.

@Emrys365

Could you also modify the replicate part in data_parallel()? https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/data_parallel.py#L217

The same problem still happens when I call torch.nn.parallel.data_parallel instead of torch.nn.DataParallel. You can refer to this code snippet to reproduce the problem: #21108 (comment)
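For reference, a sketch of the still-failing path (the `Wrapper` module here is hypothetical, standing in for the linked snippet): the functional API replicates through data_parallel.py, which at this point did not include the no_grad fix.

```python
import torch
import torch.nn as nn
from torch.nn.parallel import data_parallel

class Wrapper(nn.Module):
    # Hypothetical module standing in for the linked repro snippet.
    def __init__(self):
        super(Wrapper, self).__init__()
        self.rnn = nn.GRU(10, 20, batch_first=True)

    def forward(self, x):
        self.rnn.flatten_parameters()  # still hits set_() on detached aliases
        return self.rnn(x)[0]

module = Wrapper().cuda()
with torch.no_grad():
    # Unlike nn.DataParallel after this PR, the functional entry point
    # still replicated parameters through the autograd Broadcast.
    out = data_parallel(module, torch.randn(4, 5, 10).cuda(), device_ids=[0, 1])
```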


Labels

Merged · module: nn · oncall: distributed · triaged


Development

Successfully merging this pull request may close these issues.

[DataParallel] flatten_parameters doesn't work under torch.no_grad

8 participants