Move version_counter_ to TensorImpl #18223
Conversation
Hmm. I expected the version counter to go on storage, because you care about mutations any time you touch the storage, not just the tensor itself. The old Variable code had logic in view to make sure the shared pointer gets shared across all variable views, but if you have a non-autograd tensor, it won't hit that code path and the sharing will not be set up appropriately. This is not observable today because you haven't flipped the switch yet. Once you put it on storage you can get rid of the
ezyang
left a comment
Doesn't look right
@ezyang you can't put the version counter on Storage, because sparse tensors don't have a storage. Also, I'm not sure exactly which model we are using here for "flip the switch."
What I mean by "flip the switch" is capturing a non-Variable Tensor in SavedVariables. You won't have accurate tracking in that case. If there isn't a way around the sparse tensor problem, then we need to audit all of the non-Variable view operations and make sure they preserve the version counter appropriately as well.
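To illustrate the view sharing being discussed, here is a small Python sketch (assuming a recent PyTorch build, where the `Tensor._version` property exposes the version counter): a view and its base must both observe an in-place mutation made through either of them.

```python
import torch

base = torch.zeros(4)
view = base[:2]                      # a view shares its base's version counter

print(base._version, view._version)  # 0 0
view.add_(1)                         # in-place mutation through the view
print(base._version, view._version)  # 1 1 -- both see the bump
```

The concern above is that this sharing was only wired up by the Variable view logic, so a plain non-autograd tensor view would not get it.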
Based on offline chat with @ezyang:
Update from in-person discussion: the version counter should be a concept of autograd and managed by autograd-related code. If the user is not using autograd, or chooses to update a variable in place in non-Variable scope, the version counter of that variable should not be incremented.
facebook-github-bot
left a comment
@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
// NOTE: After the Variable/Tensor merge, a tensor will not have AutogradMeta when
// its `requires_grad_` is false, but when we use this tensor in the forward pass of
// a function that requires saving this tensor for backward, we need to keep track of
// this tensor's version to make sure it's always valid in the autograd graph.
This note will be difficult for future readers to understand. The reason it is difficult to understand is that it makes a comment relative to a change, but in the future, no one will see the change; they will see the code as is! Sometimes knowing the historical context is helpful for understanding why code is set up this way, but in this case, the issue should be explained from first principles.
I will remove the "After the Variable/Tensor merge" phrase after the merge is completed, change the future tense to present tense and rework the comment to make it easier to understand.
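For readers who want the first-principles version of this note, here is a minimal Python sketch of why a saved tensor's version must stay valid: `w` has `requires_grad=False` (so, after the merge, no AutogradMeta), yet it is saved for the backward of the multiplication, and autograd must still be able to detect that it was mutated in place.

```python
import torch

x = torch.randn(3, requires_grad=True)
w = torch.randn(3)        # requires_grad=False: no autograd metadata
y = (x * w).sum()         # w is saved for backward, since d(x*w)/dx == w

w.add_(1)                 # in-place mutation bumps w's version counter
y.backward()              # RuntimeError: one of the variables needed for gradient
                          # computation has been modified by an inplace operation
```

This only works if `w` carries a version counter even without AutogradMeta, which is exactly what moving `version_counter_` into TensorImpl provides.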
c10/core/TensorImpl.h
// One logical way to achieve this goal is to initialize AutogradMeta and create the
// version counter for the non-requires-grad tensor only when it's saved for backward.
// However, since saving a tensor for backward happens in the forward pass, and our
// invariant is that forward pass needs to be thread-safe, lazy-initializing AutogradMeta
This has nothing to do with "forward pass" specifically. We support multithreaded read access to Tensor, period. Saving a tensor is a read operation, and therefore the reasoning below follows.
I feel that we could ask "why does saving a tensor have to be a read operation", but it can't be a write operation, because then the forward pass would not be thread-safe, and having the forward pass work in multi-threaded scenarios is an important use case.
To me, it's obvious that saving a variable for later is a read-only operation. (Forget about PyTorch variables. If I stash an object somewhere so I can look at it later, surely that "stashing" process doesn't write to the object!)
I think it's perfectly fine to give an example where making "tensor saving" a write operation would break things. But you're more apt to confuse readers if you make this front and center.
facebook-github-bot
left a comment
@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
c10/core/TensorImpl.h
// NOTE [ Version Counter Sharing ]
//
// Every Tensor has a version counter. Version counters are incremented
// whenever the data or shape of a tensor changes through Variable operations.
Do version counters update on shape change? Are you referring to resize_? Because if you in-place restride a tensor, I actually wouldn't expect the version counter to update (after all, the data didn't change.)
VariableType::as_strided_(self, size, stride, storage_offset) actually updates the version counter of self. We can investigate whether this is strictly necessary, outside of this PR.
I looked at all the functions in VariableType that do an in-place restride of a tensor (VariableType::set_, VariableType::_th_set_, VariableType::as_strided_), and all of them resize the tensor at the same time, which is why the version counter is bumped in those functions.
In principle we shouldn't update the version counter if we only restride a tensor in place. I will change "shape" to "size" in this comment.
There's no API for just restriding, so this doesn't seem relevant. But if there were an API that let you do a raw in-place restride of a Tensor, I would absolutely expect it to update the version counter, because it can change the data.
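As a concrete reference for the behavior under discussion, a short Python sketch (again using the `Tensor._version` property): in-place mutations bump the counter, out-of-place operations give the result a fresh counter, and `as_strided_` bumps it as well, as noted above.

```python
import torch

t = torch.ones(2, 3)
print(t._version)                  # 0

t.add_(1)                          # in-place data mutation bumps the counter
print(t._version)                  # 1

u = t + 1                          # out-of-place op: new tensor, fresh counter
print(t._version, u._version)      # 1 0

t.as_strided_((3, 2), (2, 1))      # in-place restride (see discussion above)
print(t._version)                  # 2
```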
c10/core/TensorImpl.h
//
// Every Tensor has a version counter. Version counters are incremented
// whenever the data or shape of a tensor changes through Variable operations.
// These are typically in-place operations. Version counters are used to
I don't think it's just typical: the only way we should be bumping the version counter is if we do an in-place mutation.
c10/core/TensorImpl.h
//
// Version counters are not shared when:
//
// 1. We replace a `Variable`'s underlying `Tensor` by calling `set_data(...)`.
You start a list here but actually there is only one member of the list ;)
c10/core/TensorImpl.h
// gradient calculations. Version counters may be shared between Variables:
//
// 1. A view shares the version counter of the base Variable,
// 2. Detached variables share the version counter of the source,
It's important to distinguish x.detach() from x.data. The former shares the version counter; the latter doesn't, am I right? You should actually write code here. A link to #5396 would make it even better!
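Taking up the suggestion to write code, a hedged Python sketch of the distinction (behavior as described in this thread and in #5396; `_version` exposes the counter):

```python
import torch

x = torch.zeros(3, requires_grad=True)

d = x.detach()                   # detach() shares x's version counter
print(x._version, d._version)    # 0 0
d.add_(1)                        # mutation through the detached alias is visible
print(x._version, d._version)    # 1 1

# As described here and in #5396, x.data does NOT share the version counter,
# so an in-place change made through x.data can silently invalidate gradients.
```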
c10/core/TensorImpl.h
// Question 1: Why do we not increment the version counter in non-Variable
// operations?
//
// Answer: We explicitly don't increment the version counter in non-Variable
No, I don't think this is accurate. The most well known escape hatch is x.data and the fact that version counters don't get "incremented" here has nothing to do with whether or not we increment version counter in non-Variable operations; it's the fact that we didn't share the version counter in this situation.
Also, this question comes totally out of the blue. Why would I ask a question like this? It would only make sense if I was previously told, "Hey! Version counters get updated when you call in-place operations on variables. They do NOT get updated when you call in-place operations on tensors."
IMO: the accurate answer to this question is that we think of "version counter tracking" as an affordance that is given to you by the Variable API. So if you don't use the Variable API, you don't get this affordance.
c10/core/TensorImpl.h
// a function that requires saving this tensor for backward, we need to keep track of
// this tensor's version to make sure it's always valid in the autograd graph.
//
// One logical way to achieve this goal is to initialize AutogradMeta and create the
I'd make it clearer here that you are talking about a hypothetical alternative way to work around this problem, but not what is actually implemented.
facebook-github-bot
left a comment
@yf225 is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: According to pytorch/pytorch#13638 (comment), after the Variable/Tensor merge, we may capture variables without autograd metadata inside an autograd function, and we need a working version counter in these cases. This PR makes it possible by moving `version_counter_` out of autograd metadata and into TensorImpl, so that variables without autograd metadata still have version counters.
Pull Request resolved: pytorch/pytorch#18223
Differential Revision: D14735123
Pulled By: yf225
fbshipit-source-id: 15f690311393ffd5a53522a226da82f5abb6c65b
According to #13638 (comment), after the Variable/Tensor merge, we may capture variables without autograd metadata inside an autograd function, and we need a working version counter in these cases. This PR makes it possible by moving `version_counter_` out of autograd metadata and into TensorImpl, so that variables without autograd metadata still have version counters.