Conversation

@t-vi (Collaborator) commented Jun 22, 2019

Returning the result of an inplace squeeze_ in einsum (which itself is traced) interacts badly with autograd.Function.

I must admit that I'm not 100% certain whether it should be necessary to change this, but I consider this a good change overall.

Fixes: #22072
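
For context, here is a minimal sketch (mine, not code from this PR) of the failure mode reported in #22072: a custom autograd.Function whose forward returns einsum's output, where the custom backward was reportedly never invoked before this change. The class name and shapes are hypothetical.

import torch

class MyEinsum(torch.autograd.Function):
    @staticmethod
    def forward(ctx, a, b):
        ctx.save_for_backward(a, b)
        # Per the discussion below, the in-place squeeze_ inside einsum used to
        # attach a grad_fn to this result, preventing the Function's own
        # backward from being hooked up.
        return torch.einsum('ij,jk->ik', a, b)

    @staticmethod
    def backward(ctx, grad_out):
        a, b = ctx.saved_tensors
        return grad_out @ b.t(), a.t() @ grad_out

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(3, 4, requires_grad=True)
MyEinsum.apply(a, b).sum().backward()  # before the fix, the custom backward was reportedly skipped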

@apaszke (Contributor) left a comment

I'm ok with merging this, but we should also understand why an in-place operation at the end of a Function is a problem.

   for (int i = dim-1; i>=0; i--)
     if (sum_dims[i])
-      result.squeeze_(i);
+      sizes.erase(sizes.begin() + i);

Contributor

nit: can you please add braces here? Braceless nested statements make me nervous 😬

Collaborator Author

Me, too, even if I'm the one to blame for them in the first place.

@t-vi (Collaborator, Author) commented Jun 23, 2019

So in terms of root cause we can take autograd.Function and einsum out of the equation:

import torch

a = torch.randn(5, requires_grad=True)
with torch.no_grad():
    b = a.unsqueeze(0).squeeze_(0)  # in-place squeeze_ on a view of a
print(b)

a.unsqueeze(0) will have requires_grad set (I don't know why), and b ends up with its grad_fn set to AsStridedBackward, which it arguably should not.
Apparently, this prevents the Function from setting its own backward.
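
To make the observation above concrete, here is my own small inspection sketch (not from the thread); the values noted in the comments reflect the behavior reported here for the PyTorch build of the time, not necessarily current releases:

import torch

a = torch.randn(5, requires_grad=True)
with torch.no_grad():
    v = a.unsqueeze(0)        # view of a leaf that requires grad
    print(v.requires_grad)    # reportedly True here, despite no_grad
    b = v.squeeze_(0)         # in-place op on that view
print(b.grad_fn)              # reportedly an AsStridedBackward node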

@apaszke (Contributor) commented Jun 23, 2019

Which is... reasonable? I mean, if the forward (running under no_grad) returns views that have a derivative, then I'm sure we have code that doesn't change it, because otherwise we wouldn't have been able to find the common "base" variable. On the other hand, the weird part is that even though einsum should return something that doesn't require_grad in a no_grad block, a view applied to it later adds a grad_fn to it?
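
As a side note on the view-tracking point, this is my own sketch (outside no_grad) of how a view of a tensor that requires grad stays linked to its base; _base is an internal attribute used here only for illustration:

import torch

base = torch.randn(5, requires_grad=True)
view = base.unsqueeze(0)     # an ordinary, grad-enabled view
print(view._base is base)    # True: the view keeps a reference to its base
print(view.grad_fn)          # an UnsqueezeBackward-style node linking back to base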

@t-vi (Collaborator, Author) commented Jun 23, 2019 via email

@facebook-github-bot (Contributor) left a comment

@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@soumith merged this pull request in f5df0c9.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 25, 2019
Summary:
Returning the result of an inplace `squeeze_` in `einsum` (which itself is traced) interacts badly with `autograd.Function`.

I must admit that I'm not 100% certain whether it should be necessary to change this, but I consider this a good change overall.

Fixes: pytorch/pytorch#22072
Pull Request resolved: pytorch/pytorch#22111

Differential Revision: D15974990

Pulled By: soumith

fbshipit-source-id: 477e7f23833f02999085f665c175d062e7d32acd

Successfully merging this pull request may close these issues.

Einsum as result of autograd.Function forward makes that backward is not called
