

@yf225 yf225 commented Mar 20, 2019

According to #13638 (comment), after the Variable/Tensor merge we may capture variables without autograd metadata inside an autograd function, and we need a working version counter in those cases. This PR makes that possible by moving `version_counter_` out of the autograd metadata and into TensorImpl, so that variables without autograd metadata still have version counters.
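
A minimal Python sketch of the user-visible behavior this enables (hedged: `_version` is PyTorch's private counter attribute, and the exact error text may differ):

```python
import torch

w = torch.ones(3, requires_grad=True)
x = torch.ones(3)          # requires_grad=False: no autograd metadata after the merge
y = (w * x).sum()          # x is saved for backward (needed to compute dL/dw)

x.add_(1)                  # in-place update bumps the version counter on TensorImpl

try:
    y.backward()           # autograd compares the saved version against the current one
except RuntimeError as err:
    print(err)             # "... has been modified by an inplace operation ..."
```

Without a version counter on tensors that lack autograd metadata, the in-place update of `x` above could silently corrupt the gradient of `w`.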

@yf225 yf225 force-pushed the move_version_counter_ branch from 7680c05 to 4d65bce Compare March 20, 2019 15:51
@yf225 yf225 requested review from ezyang and gchanan March 20, 2019 15:51
@yf225 yf225 mentioned this pull request Mar 20, 2019

ezyang commented Mar 20, 2019

Hmm. I expected the version counter to go on storage, because you care about mutations any time you touch the storage, not just the tensor itself. The old Variable code had logic in the view path to make sure the shared pointer gets shared across all variable views, but a non-autograd tensor won't hit that code path, so the sharing won't be set up appropriately. This isn't observable today because you haven't flipped the switch yet.

Once you put it on storage you can get rid of the shared_ptr; it was only needed because of this aliasing behavior.


@ezyang ezyang left a comment


Doesn't look right


gchanan commented Mar 20, 2019

@ezyang you can't put the version counter on Storage because detach, which is usable on sparse tensors, also shares the version counter, but sparse tensors don't have Storage.

I'm not sure exactly which model we are using here for "flip the switch."


ezyang commented Mar 20, 2019

What I mean by "flip the switch" is capturing a non-Variable Tensor in SavedVariables; you won't have accurate tracking in that case.

If there isn't a way around the sparse tensor problem, then we need to audit all of the non-Variable view operations and make sure they preserve the version counter appropriately as well.


yf225 commented Mar 20, 2019

Based on an offline chat with @ezyang:

  1. We should add checks to the VariableType.cpp functions to make sure `version_counter_` is always incremented properly.
  2. We should remove the `version_counter_` increments from the VariableType.cpp dispatch functions.
  3. We should add `version_counter_` increments to the non-Variable dispatch functions, so that `version_counter_` accurately reflects the update count of a tensor regardless of whether it carries autograd metadata (see the sketch below).
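
A small user-level sketch of what item 3 is after (hedged: from Python every in-place op already goes through the Variable dispatch; `_version` is the private counter attribute):

```python
import torch

x = torch.zeros(3)   # plain tensor, requires_grad=False, no autograd metadata
print(x._version)    # 0 for a freshly created tensor
x.add_(1)            # in-place update
print(x._version)    # 1: the counter lives on TensorImpl, so it works without AutogradMeta
```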

@yf225 yf225 changed the title Move version_counter_ to TensorImpl [WIP] Move version_counter_ to TensorImpl Mar 26, 2019
@yf225 yf225 force-pushed the move_version_counter_ branch from 6f80408 to 286f11e Compare March 26, 2019 22:18
@yf225 yf225 force-pushed the move_version_counter_ branch from 286f11e to 7212564 Compare March 26, 2019 22:20
@yf225 yf225 changed the title [WIP] Move version_counter_ to TensorImpl Move version_counter_ to TensorImpl Mar 27, 2019

yf225 commented Mar 27, 2019

@ezyang @gchanan This PR is ready for review again. Thx :)


yf225 commented Apr 8, 2019

Update from an in-person discussion: the version counter should be a concept of autograd and be managed by autograd-related code. If the user is not using autograd, or chooses to in-place update a variable in the non-Variable scope, the version counter of that variable should not be incremented.

@ezyang ezyang added the module: internals Related to internal abstractions in c10 and ATen label Apr 9, 2019

@facebook-github-bot facebook-github-bot left a comment


@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

// NOTE: After the Variable/Tensor merge, a tensor will not have AutogradMeta when
// its `requires_grad_` is false, but when we use this tensor in the forward pass of
// a function that requires saving this tensor for backward, we need to keep track of
// this tensor's version to make sure it's always valid in the autograd graph.
Contributor


This note will be difficult for future readers to understand. The reason is that it makes a comment relative to a change, but in the future no one will see the change; they will see the code as is! Sometimes knowing the historical context is helpful for understanding why code is set up a certain way, but in this case the issue should be explained from first principles.

Contributor Author


I will remove the "After the Variable/Tensor merge" phrase after the merge is completed, change the future tense to present tense and rework the comment to make it easier to understand.

// One logical way to achieve this goal is to initialize AutogradMeta and create the
// version counter for the non-requires-grad tensor only when it's saved for backward.
// However, since saving a tensor for backward happens in the forward pass, and our
// invariant is that forward pass needs to be thread-safe, lazy-initializing AutogradMeta
Contributor


This has nothing to do with "forward pass" specifically. We support multithreaded read access to Tensor, period. Saving a tensor is a read operation, and therefore the reasoning below follows.

Contributor Author


I feel that we could ask "why does saving a tensor have to be a read operation?", but it can't be a write operation, because then the forward pass would not be thread-safe, and having the forward pass work in multi-threaded scenarios is an important use case.

Contributor


To me, it's obvious that saving a variable for later is a read-only operation. (Forget about PyTorch variables. If I stash an object somewhere so I can look at it later, surely that "stashing" process doesn't write to the object!)

I think it's perfectly fine to give an example where making "tensor saving" a write operation would break things. But you're more apt to confuse readers if you make this front and center.
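
A hedged sketch of that point (the `StashIdentity` name is made up for illustration; `_version` is the private counter attribute): stashing a tensor with `save_for_backward` behaves as a pure read and leaves the version counter untouched.

```python
import torch

class StashIdentity(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)   # "stash" x for later: a read-only operation
        return x.clone()

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out

x = torch.ones(3, requires_grad=True)
v0 = x._version
y = StashIdentity.apply(x)
assert x._version == v0            # saving x for backward did not mutate it
```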


@facebook-github-bot facebook-github-bot left a comment


@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

// NOTE [ Version Counter Sharing ]
//
// Every Tensor has a version counter. Version counters are incremented
// whenever the data or shape of a tensor changes through Variable operations.
Contributor


Do version counters update on shape change? Are you referring to resize_? Because if you in-place restride a tensor, I actually wouldn't expect the version counter to update (after all, the data didn't change.)

Contributor Author


VariableType::as_strided_(self, size, stride, storage_offset) actually updates the version counter of self. We can investigate whether this is strictly necessary, outside of this PR.

Contributor Author


I looked at all the functions in VariableType that in-place restride a tensor (VariableType::set_, VariableType::_th_set_, VariableType::as_strided_), and all of them resize the tensor at the same time, which is why the version counter is bumped in those functions.

In principle we shouldn't update the version counter if we only in-place restride a tensor. I will change "shape" to "size" in this comment.

Contributor


There's no API for just restriding, so this doesn't seem relevant. But if there were an API that let you raw in-place restride a Tensor, I would absolutely expect it to update the version counter, because it can change the data.
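
For context, a small sketch of the distinction being drawn here (hedged: `_version` is the private counter attribute, and the view-sharing behavior is the one described in the NOTE above):

```python
import torch

x = torch.arange(6.0)
v0 = x._version

y = x.as_strided((3,), (2,))       # out-of-place: creates a view, writes no data
assert x._version == v0            # merely restriding into a view bumps nothing
assert y._version == x._version    # the view shares the base's version counter

x.add_(1)                          # in-place data mutation through the base
assert x._version == v0 + 1
assert y._version == x._version    # the shared counter reflects the change
```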

//
// Every Tensor has a version counter. Version counters are incremented
// whenever the data or shape of a tensor changes through Variable operations.
// These are typically in-place operations. Version counters are used to
Contributor


I don't think it's just typical: the only way we should be bumping the version counter is if we do an in-place mutation.

//
// Version counters are not shared when:
//
// 1. We replace a `Variable`'s underlying `Tensor` by calling `set_data(...)`.
Contributor


You start a list here but actually there is only one member of the list ;)

// gradient calculations. Version counters may be shared between Variables:
//
// 1. A view shares the version counter of the base Variable,
// 2. Detached variables share the version counter of the source,
Contributor


It's important to distinguish x.detach() from x.data. The former shares the version counter; the latter doesn't, am I right? You should actually write code here. A link to #5396 would make it even better!
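
Something like this sketch could go into the note (hedged: `_version` is the private counter attribute; the `.data` behavior is as described in this thread and in #5396, not asserted here):

```python
import torch

x = torch.ones(3, requires_grad=True)

d = x.detach()                # shares x's version counter
v0 = x._version
d.add_(1)                     # mutate the data through the detached alias
assert x._version == v0 + 1   # autograd can still see that x's data changed

# x.data aliases the same memory too, but (per this thread and #5396) it does
# not give the same version-counter guarantee, which is why detach() is the
# safer escape hatch.
```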

// Question 1: Why do we not increment the version counter in non-Variable
// operations?
//
// Answer: We explicitly don't increment the version counter in non-Variable
Contributor


No, I don't think this is accurate. The most well-known escape hatch is x.data, and the fact that version counters don't get "incremented" there has nothing to do with whether or not we increment the version counter in non-Variable operations; it's that we didn't share the version counter in that situation.

Contributor


Also, this question comes totally out of the blue. Why would I ask a question like this? It would only make sense if I had previously been told, "Hey! Version counters get updated when you call in-place operations on variables. They do NOT get updated when you call in-place operations on tensors."

Contributor


IMO, the accurate answer to this question is that we think of "version counter tracking" as an affordance that is given to you by the Variable API. So if you don't use the Variable API, you don't get this affordance.

// a function that requires saving this tensor for backward, we need to keep track of
// this tensor's version to make sure it's always valid in the autograd graph.
//
// One logical way to achieve this goal is to initialize AutogradMeta and create the
Contributor


I'd make it clearer here that you are talking about a hypothetical alternative way to work around this problem, but not what is actually implemented.


@facebook-github-bot facebook-github-bot left a comment


@yf225 is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@yf225 merged this pull request in 4ae59e4.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Apr 11, 2019
Summary:
According to pytorch/pytorch#13638 (comment), after the Variable/Tensor merge, we may capture variables without autograd metadata inside an autograd function, and we need a working version counter in these cases. This PR makes it possible by moving `version_counter_` out of autograd metadata and into TensorImpl, so that variables without autograd metadata still have version counters.
Pull Request resolved: pytorch/pytorch#18223

Differential Revision: D14735123

Pulled By: yf225

fbshipit-source-id: 15f690311393ffd5a53522a226da82f5abb6c65b
zhangguanheng66 pushed a commit to zhangguanheng66/pytorch that referenced this pull request May 6, 2019