Conversation

@ifedan (Contributor) commented Jun 7, 2019

This is a fix for #21469.
Currently there is no way to tell whether a backward function has released its saved variables when those variables are stored in a vector (i.e. a saved TensorList). This change sets a flag once a function that has saved variables releases them, so that calling the function again with already-released variables raises an error instead of proceeding.
Functions that do not have saved variables can still be called multiple times, for backward compatibility.
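
For illustration, here is a minimal sketch of the scenario being guarded against (not code from this PR; it assumes current PyTorch behavior). The advanced-indexing assignment is the same setup used in the new test and saves a TensorList for backward, so a second backward through the graph should hit the usual retain_graph error rather than touch released variables.

import torch

# Minimal repro sketch (assumption: mirrors the regression scenario).
b = torch.randn(3, requires_grad=True, dtype=torch.double)
c = torch.zeros(3, dtype=torch.double)
c[[1, 2]] = b[[1, 1]]     # the indexing op saves a TensorList of indices for backward

c.sum().backward()        # first backward releases the saved variables
try:
    c.sum().backward()    # second backward must now fail loudly
except RuntimeError as err:
    print(err)            # "Trying to backward through the graph a second time, ..."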

@pytorchbot added the module: autograd (Related to torch.autograd and the autograd engine in general) and module: internals (Related to internal abstractions in c10 and ATen) labels on Jun 7, 2019
@ifedan requested a review from gchanan on June 11, 2019 17:46
@ifedan requested a review from zou3519 on June 13, 2019 16:09
@zou3519 (Contributor) left a comment

lgtm

def test_backward_twice_with_saved_values(self):
    b = torch.randn(3, requires_grad=True, dtype=torch.double)
    c = torch.zeros(3, dtype=torch.double)
    c[[1, 2]] = b[[1, 1]]
Contributor:

It's not completely obvious that this indexing expression causes a TensorList to be saved for backward, and that this is what we're fixing in this PR (that backward twice with a saved TensorList works as expected). I don't see any other functions in derivatives.yaml that save a TensorList, though, so I don't have ideas on how to make this test better.
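
For completeness, the quoted hunk only shows the setup; the assertion that presumably follows (a sketch, not necessarily the merged test body) would check that a second backward raises the standard error:

# Hypothetical continuation of the test above; the exact assertion in the
# merged test may differ.
c.sum().backward()
self.assertRaisesRegex(RuntimeError, 'Specify retain_graph=True',
                       lambda: c.sum().backward())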

@zou3519 (Contributor) commented Jun 13, 2019

Oh, one more thing: you should delete

const char* ERR_BACKWARD_TWICE =
    "Trying to backward through the graph a second time, but the buffers have "
    "already been freed. Specify retain_graph=True when calling backward "
    "the first time.";
since it's now unused.

Edit: Never mind, you already deduplicated it.

${will_release_variables}
${saved_variables}
${saved_list_sizes}
bool released = false;
Contributor:

Naming this released doesn't seem like the best idea, because it will cause a problem with any function that tries to capture an argument named released.

Contributor:

Also, can we conditionally generate this variable, like we do with every other variable in autograd Functions?

Contributor:

I also think there are two more subtle issues with this:

  1. Why is this global on the function? This would prevent us from conditionally unpacking only the variables we need when computing just a subset of the backwards, for no good reason. We could track this per TensorList, like we do with the size.
  2. Speaking of the size, doesn't that solve your disambiguation problem? You know the size at the start and can check it later to figure out whether something was reset. I guess there is a judgment call on whether you should error if the captured TensorList was empty, but it seems we can just do the same thing we do with Tensors (what happens if we capture an undefined tensor?).

Contributor Author:

  1. You can still unpack only some of the variables; the flag is set only after release_variables() is called.
  2. We do not hold a TensorList; instead we hold a vector of SavedVariables.

Contributor Author:

The name has been changed to variables_were_released_.

Contributor:

Are your responses "1." and "2." responses to my questions "1." and "2."? If so, I don't understand how they address the questions. For example, with 2., I understand that you don't literally hold the TensorList; the question is about distinguishing whether the thing you saved was empty to start with, or was non-empty but has since been released.

elif arg['type'] == 'TensorList':
    saved_variables.append('std::vector<SavedVariable> {}_;'.format(name))
    release_variables.append('for (auto& sv : {}_)'.format(name))
    release_variables.append('{')
Contributor:

I don't really understand the strategy here. Why are we both:
(1) resetting the data / resetting the grad function and
(2) tracking released on TensorLists and clearing the vector?

Why do both need to happen?

Contributor Author:

Here is the explanation from @zou3519: #21533 (comment)

Contributor:

Sorry, I was talking about something else in that comment (brainstorming a different solution that would not use a released flag).

@ifedan Just clear() (2) is sufficient; I don't think you need to reset the data and grad function (1). clear() removes the SavedVariables in the list, and because a SavedVariable owns a tensor and a grad_fn, removing it makes them go away as well.
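
In codegen terms, that simplification would look roughly like the sketch below (an illustration of the shape being discussed here, not the exact merged lines; 'indices' is just an example argument name):

# Rough sketch mirroring the gen_autograd_functions.py-style string emission.
name = 'indices'
saved_variables, release_variables = [], []

saved_variables.append('std::vector<SavedVariable> {}_;'.format(name))
saved_variables.append('bool {}_released_ = false;'.format(name))
# Releasing is just clear(): dropping each SavedVariable also drops the tensor
# and grad_fn it owns, so no per-element reset is needed.
release_variables.append('{}_.clear();'.format(name))
release_variables.append('{}_released_ = true;'.format(name))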

Contributor Author:

Fixed

@zou3519 (Contributor) left a comment

Requesting changes based on @gchanan's review

@ifedan requested review from gchanan and zou3519 on June 13, 2019 18:40
@facebook-github-bot (Contributor) left a comment

@ifedan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@gchanan (Contributor) left a comment

A comment about why this was implemented this way, as opposed to, say, looping through and releasing each one, would be nice.

@facebook-github-bot (Contributor) left a comment

@ifedan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor): @ifedan merged this pull request in 0998a32.

@apaszke (Contributor) left a comment

Why are we doing this per-arg instead of having a single assert at the beginning of the function?? That way it would have been much faster and simpler.

Never mind, this only applies to tensor list arguments so it shouldn't matter.

@wanchaol (Collaborator) commented:

> Why are we doing this per-arg instead of having a single assert at the beginning of the function?? That way it would have been much faster and simpler.
>
> Never mind, this only applies to tensor list arguments so it shouldn't matter.

I have a similar question here: given that we have very few autograd Nodes/Functions that do not hold tensors, why don't we just put the flag in the autograd Node/Function itself (SavedVariable also has a flag called was_default_constructed that is used for this)? That way it could be simpler, and we wouldn't need to worry about the difference between TensorList and SavedVariable when releasing.

@gchanan (Contributor) commented Feb 10, 2020

@wanchaol: I think that would work, but you'd need to handle the case where you don't hold tensors (it's important you don't give an error in those cases). I'd lean toward code that always works over adding special cases (unless there is a good reason, like perf).
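
As a quick illustration of the case being called out (a sketch assuming standard autograd behavior): a node that saves nothing for backward must remain callable any number of times.

import torch

# An op that saves no variables for backward (tensor + scalar) can be
# backward-ed through repeatedly without retain_graph.
a = torch.randn(3, requires_grad=True)
out = a + 1
out.sum().backward()
out.sum().backward()   # still fine: there was nothing to release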
