Backward function will set a flag if it released variables #21533
Conversation
zou3519
left a comment
lgtm
def test_backward_twice_with_saved_values(self):
    b = torch.randn(3, requires_grad=True, dtype=torch.double)
    c = torch.zeros(3, dtype=torch.double)
    c[[1, 2]] = b[[1, 1]]
It's not completely obvious that this indexing expression causes a TensorList to be saved for backward, and that this is what we're fixing in this PR (that backward twice with a saved TensorList works as expected). I don't see any other functions in derivatives.yaml that save a TensorList, though, so I don't have any ideas on how to make this test better.
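A minimal sketch of what this test exercises, assuming the behavior described in the PR summary (the advanced-index assignment goes through an op that saves a TensorList for backward, and a second backward over the released graph should now fail cleanly rather than reuse released variables):

    import torch

    b = torch.randn(3, requires_grad=True, dtype=torch.double)
    c = torch.zeros(3, dtype=torch.double)
    c[[1, 2]] = b[[1, 1]]        # this indexing op saves a TensorList for backward

    c.sum().backward()           # first backward succeeds and releases the saved variables
    try:
        c.sum().backward()       # reuses the released graph, so this should raise
    except RuntimeError as err:
        print("second backward failed as expected:", err)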
Oh, one more thing: you should delete pytorch/torch/csrc/autograd/saved_variable.cpp lines 100 to 103 (as of 556af7c).
Edit: never mind, you already did that deduplication.
${will_release_variables}
${saved_variables}
${saved_list_sizes}
bool released = false;
Naming this released doesn't seem like the best idea, because it will cause a problem with any function that tries to capture an argument named released.
Also, can we conditionally generate this variable, like we do with every other variable in autograd Functions?
I also think there are 2 more subtle issues with this:
1. Why is this global on the function? This would prevent us, for no good reason, from conditionally unpacking only the variables we need when we only have to compute some subset of the backwards. We could just track this per TensorList, like we do with the size.
2. Speaking of the size, doesn't that solve your disambiguation problem? You know the size at the start and can check the size later to figure out whether something was reset. There is a judgment call on whether you should error if the captured TensorList was empty, but it seems we can just do the same thing we do with Tensors (what happens if we capture an undefined tensor?). A sketch of this size-tracking idea follows below.
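A hypothetical sketch of the per-TensorList bookkeeping suggested above, written in the style of the codegen snippet quoted later in this thread; the helper name, the _size_ member, and the TORCH_CHECK/ERR_BACKWARD_TWICE strings are assumptions for illustration, not the actual gen_autograd_functions.py code:

    def emit_tensorlist_members(name):
        # Record how many elements were saved alongside the vector itself.
        saved_variables = [
            'std::vector<SavedVariable> {}_;'.format(name),
            'size_t {}_size_;'.format(name),
        ]
        # Releasing only clears the vector; each SavedVariable owns its data
        # and grad_fn, so clearing drops those too.
        release_variables = [
            '{}_.clear();'.format(name),
        ]
        # Unpacking compares the current size against the recorded one, so a
        # list that was empty from the start is distinguishable from one that
        # was released.
        unpack = [
            'TORCH_CHECK({n}_.size() == {n}_size_, ERR_BACKWARD_TWICE);'.format(n=name),
        ]
        return saved_variables, release_variables, unpack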
1. You can still unpack some of the variables; the flag is only set after release_variables() has been called.
2. We do not hold a TensorList; instead we hold a vector.
Name has been changed to variables_were_released_
Are your responses "1." and "2." responses to my questions "1." and "2."? If so, I don't understand how they address the questions. For example, with 2., I understand that you don't literally hold the "TensorList"; the question is about distinguishing whether the thing you saved was empty to start with or wasn't empty but was released.
elif arg['type'] == 'TensorList':
    saved_variables.append('std::vector<SavedVariable> {}_;'.format(name))
    release_variables.append('for (auto& sv : {}_)'.format(name))
    release_variables.append('{')
I don't really understand the strategy here. Why are we both:
(1) resetting the data / resetting the grad function, and
(2) tracking released on TensorLists and clearing the vector?
Why do both need to happen?
Here is the explanation from @zou3519: #21533 (comment)
Sorry, I was talking about something else in that comment (brainstorming a different solution that would not use a released flag).
@ifedan Just clear() (2) is sufficient; I don't think you need to reset the data and grad function (1). clear() removes the SavedVariables in the list, and because each SavedVariable owns a tensor and a grad_fn, removing the SavedVariable makes those go away as well.
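A tiny pure-Python analogy of that ownership argument (this is not the C++ autograd code; Owner merely stands in for SavedVariable): once the container holding the owners is cleared, whatever they owned goes away with them, so no separate reset of data or grad_fn is needed.

    import weakref

    class Owner:                      # stand-in for a SavedVariable owning a tensor and grad_fn
        def __init__(self, payload):
            self.payload = payload

    saved = [Owner(object()), Owner(object())]
    probe = weakref.ref(saved[0])     # watch one of the owners

    saved.clear()                     # analogue of release_variables() calling clear() on the vector
    assert probe() is None            # the owner, and everything it owned, is gone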
Fixed
zou3519
left a comment
Requesting changes based on @gchanan's review
facebook-github-bot
left a comment
@ifedan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
gchanan
left a comment
A comment about why this was implemented this way, as opposed to, say, looping through and releasing each one, would be nice.
facebook-github-bot
left a comment
@ifedan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Why are we doing this per-arg instead of having a single assert at the beginning of the function? That way it would have been much faster and simpler.
Never mind, this only applies to tensor list arguments, so it shouldn't matter.
I have a similar question here: given that we have very few autograd Nodes/Functions that are not holding tensors, why don't we just put the flag in the autograd Node/Function itself? (SavedVariable also has a flag called …)
@wanchaol: I think that would work, but you'd need to handle the case where you don't hold tensors (it's important you don't give an error in those cases). I'd lean toward code that always works over adding special cases (unless there is a good reason, like perf).
This is a fix for #21469
Currently there is no way to tell whether a backward function has released its variables when those variables were saved in a vector. This change sets a flag when a function that has saved variables releases them, so we can prevent someone from calling the function again with already-released variables.
Functions that do not have saved variables can still be called multiple times, for backward compatibility.
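A small sketch of that backward-compatibility case; the choice of op is illustrative, the point (per the description above) being that a graph whose nodes saved no variables can be backward-ed more than once without error:

    import torch

    a = torch.randn(3, requires_grad=True)
    b = torch.randn(3, requires_grad=True)
    out = (a + b).sum()   # addition saves no tensors for backward

    out.backward()        # gradients accumulate into a.grad and b.grad
    out.backward()        # nothing was released, so a second call still works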