

@yf225 yf225 commented Mar 20, 2019

According to #13638 (comment), after the Variable/Tensor merge we may capture variables without autograd metadata inside an autograd function, and we need a working version counter in those cases. This PR makes that possible by moving `version_counter_` out of the autograd metadata and into TensorImpl, so that variables without autograd metadata still have version counters.
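
A minimal Python sketch of the user-visible behavior this enables (hedged: `_version` is PyTorch's private counter attribute, and the exact error text may differ):

```python
import torch

w = torch.ones(3, requires_grad=True)
x = torch.ones(3)          # requires_grad=False: no autograd metadata after the merge
y = (w * x).sum()          # x is saved for backward (needed to compute dL/dw)

x.add_(1)                  # in-place update bumps the version counter on TensorImpl

try:
    y.backward()           # autograd compares the saved version against the current one
except RuntimeError as err:
    print(err)             # "... has been modified by an inplace operation ..."
```

Without a version counter on tensors that lack autograd metadata, the in-place update of `x` above could silently corrupt the gradient of `w`.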

@yf225 yf225 force-pushed the move_version_counter_ branch from 7680c05 to 4d65bce Compare March 20, 2019 15:51
@yf225 yf225 requested review from ezyang and gchanan March 20, 2019 15:51
@yf225 yf225 mentioned this pull request Mar 20, 2019

ezyang commented Mar 20, 2019

Hmm. I expected the version counter to go on storage, because you care about mutations any time you touch the storage, not just the tensor itself. The old Variable code had logic in the view path to make sure the shared pointer gets shared across all variable views, but a non-autograd tensor won't hit that code path, so the sharing won't be set up appropriately. This isn't observable today because you haven't flipped the switch yet.

Once you put it on storage you can get rid of the shared_ptr; it was only needed because of this aliasing behavior.


@ezyang ezyang left a comment


Doesn't look right


gchanan commented Mar 20, 2019

@ezyang you can't put the version counter on Storage because detach, which is usable on sparse tensors, also shares the version counter, but sparse tensors don't have Storage.

I'm not sure exactly which model we are using here for "flip the switch."


ezyang commented Mar 20, 2019

What I mean by "flip the switch" is capturing a non-Variable Tensor in SavedVariables; you won't have accurate tracking in that case.

If there isn't a way around the sparse tensor problem, then we need to audit all of the non-Variable view operations and make sure they preserve the version counter appropriately as well.


yf225 commented Mar 20, 2019

Based on an offline chat with @ezyang:

  1. We should add checks to the VariableType.cpp functions to make sure `version_counter_` is always incremented properly.
  2. We should remove the `version_counter_` increments from the VariableType.cpp dispatch functions.
  3. We should add `version_counter_` increments to the non-Variable dispatch functions, so that `version_counter_` accurately reflects the update count of a tensor regardless of whether it carries autograd metadata (see the sketch below).
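
A small user-level sketch of what item 3 is after (hedged: from Python every in-place op already goes through the Variable dispatch; `_version` is the private counter attribute):

```python
import torch

x = torch.zeros(3)   # plain tensor, requires_grad=False, no autograd metadata
print(x._version)    # 0 for a freshly created tensor
x.add_(1)            # in-place update
print(x._version)    # 1: the counter lives on TensorImpl, so it works without AutogradMeta
```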

@yf225 yf225 changed the title Move version_counter_ to TensorImpl [WIP] Move version_counter_ to TensorImpl Mar 26, 2019
@yf225 yf225 force-pushed the move_version_counter_ branch from 6f80408 to 286f11e Compare March 26, 2019 22:18
@yf225 yf225 force-pushed the move_version_counter_ branch from 286f11e to 7212564 Compare March 26, 2019 22:20
@yf225 yf225 changed the title [WIP] Move version_counter_ to TensorImpl Move version_counter_ to TensorImpl Mar 27, 2019

yf225 commented Mar 27, 2019

@ezyang @gchanan This PR is ready for review again. Thx :)


yf225 commented Apr 8, 2019

Update from an in-person discussion: the version counter should be a concept of autograd and be managed by autograd-related code. If the user is not using autograd, or chooses to in-place update a variable in the non-Variable scope, the version counter of that variable should not be incremented.

@ezyang ezyang added the module: internals Related to internal abstractions in c10 and ATen label Apr 9, 2019

@facebook-github-bot facebook-github-bot left a comment


@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

// NOTE: After the Variable/Tensor merge, a tensor will not have AutogradMeta when
// its `requires_grad_` is false, but when we use this tensor in the forward pass of
// a function that requires saving this tensor for backward, we need to keep track of
// this tensor's version to make sure it's always valid in the autograd graph.
Contributor


This note will be difficult for future readers to understand. The reason is that it makes a comment relative to a change, but in the future no one will see the change; they will see the code as is! Sometimes knowing the historical context is helpful for understanding why code is set up a certain way, but in this case the issue should be explained from first principles.

Contributor Author


I will remove the "After the Variable/Tensor merge" phrase after the merge is completed, change the future tense to present tense and rework the comment to make it easier to understand.

// One logical way to achieve this goal is to initialize AutogradMeta and create the
// version counter for the non-requires-grad tensor only when it's saved for backward.
// However, since saving a tensor for backward happens in the forward pass, and our
// invariant is that forward pass needs to be thread-safe, lazy-initializing AutogradMeta
Contributor


This has nothing to do with "forward pass" specifically. We support multithreaded read access to Tensor, period. Saving a tensor is a read operation, and therefore the reasoning below follows.

Contributor Author


I feel that we could ask "why does saving a tensor have to be a read operation?", but it can't be a write operation, because then the forward pass would not be thread-safe, and having the forward pass work in multi-threaded scenarios is an important use case.

Contributor


To me, it's obvious that saving a variable for later is a read-only operation. (Forget about PyTorch variables. If I stash an object somewhere so I can look at it later, surely that "stashing" process doesn't write to the object!)

I think it's perfectly fine to give an example where making "tensor saving" a write operation would break things. But you're more apt to confuse readers if you make this front and center.
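
A hedged sketch of that point (the `StashIdentity` name is made up for illustration; `_version` is the private counter attribute): stashing a tensor with `save_for_backward` behaves as a pure read and leaves the version counter untouched.

```python
import torch

class StashIdentity(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)   # "stash" x for later: a read-only operation
        return x.clone()

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out

x = torch.ones(3, requires_grad=True)
v0 = x._version
y = StashIdentity.apply(x)
assert x._version == v0            # saving x for backward did not mutate it
```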


@facebook-github-bot facebook-github-bot left a comment


@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

// NOTE [ Version Counter Sharing ]
//
// Every Tensor has a version counter. Version counters are incremented
// whenever the data or shape of a tensor changes through Variable operations.
Contributor


Do version counters update on shape change? Are you referring to resize_? Because if you in-place restride a tensor, I actually wouldn't expect the version counter to update (after all, the data didn't change.)

Contributor Author


VariableType::as_strided_(self, size, stride, storage_offset) actually updates the version counter of self. We can investigate whether this is strictly necessary, outside of this PR.

Contributor Author


I looked at all the functions in VariableType that in-place restride a tensor (VariableType::set_, VariableType::_th_set_, VariableType::as_strided_), and all of them resize the tensor at the same time, which is why the version counter is bumped in those functions.

In principle we shouldn't update the version counter if we only in-place restride a tensor. I will change "shape" to "size" in this comment.

Contributor


There's no API for just restriding, so this doesn't seem relevant. But if there were an API that let you raw in-place restride a Tensor, I would absolutely expect it to update the version counter, because it can change the data.
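
For context, a small sketch of the distinction being drawn here (hedged: `_version` is the private counter attribute, and the view-sharing behavior is the one described in the NOTE above):

```python
import torch

x = torch.arange(6.0)
v0 = x._version

y = x.as_strided((3,), (2,))       # out-of-place: creates a view, writes no data
assert x._version == v0            # merely restriding into a view bumps nothing
assert y._version == x._version    # the view shares the base's version counter

x.add_(1)                          # in-place data mutation through the base
assert x._version == v0 + 1
assert y._version == x._version    # the shared counter reflects the change
```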

//
// Every Tensor has a version counter. Version counters are incremented
// whenever the data or shape of a tensor changes through Variable operations.
// These are typically in-place operations. Version counters are used to
Contributor


I don't think it's just typical: the only way we should be bumping the version counter is if we do an in-place mutation.

//
// Version counters are not shared when:
//
// 1. We replace a `Variable`'s underlying `Tensor` by calling `set_data(...)`.
Contributor


You start a list here but actually there is only one member of the list ;)

// gradient calculations. Version counters may be shared between Variables:
//
// 1. A view shares the version counter of the base Variable,
// 2. Detached variables share the version counter of the source,
Contributor


It's important to distinguish x.detach() from x.data. The former shares the version counter; the latter doesn't, am I right? You should actually write code here. A link to #5396 would make it even better!
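
Something like this sketch could go into the note (hedged: `_version` is the private counter attribute; the `.data` behavior is as described in this thread and in #5396, not asserted here):

```python
import torch

x = torch.ones(3, requires_grad=True)

d = x.detach()                # shares x's version counter
v0 = x._version
d.add_(1)                     # mutate the data through the detached alias
assert x._version == v0 + 1   # autograd can still see that x's data changed

# x.data aliases the same memory too, but (per this thread and #5396) it does
# not give the same version-counter guarantee, which is why detach() is the
# safer escape hatch.
```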

// Question 1: Why do we not increment the version counter in non-Variable
// operations?
//
// Answer: We explicitly don't increment the version counter in non-Variable
Contributor


No, I don't think this is accurate. The most well-known escape hatch is x.data, and the fact that version counters don't get "incremented" there has nothing to do with whether or not we increment the version counter in non-Variable operations; it's that we didn't share the version counter in that situation.

Contributor


Also, this question comes totally out of the blue. Why would I ask a question like this? It would only make sense if I had previously been told, "Hey! Version counters get updated when you call in-place operations on variables. They do NOT get updated when you call in-place operations on tensors."

Contributor


IMO, the accurate answer to this question is that we think of "version counter tracking" as an affordance that is given to you by the Variable API. So if you don't use the Variable API, you don't get this affordance.

// a function that requires saving this tensor for backward, we need to keep track of
// this tensor's version to make sure it's always valid in the autograd graph.
//
// One logical way to achieve this goal is to initialize AutogradMeta and create the
Contributor


I'd make it clearer here that you are talking about a hypothetical alternative way to work around this problem, but not what is actually implemented.


@facebook-github-bot facebook-github-bot left a comment


@yf225 is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@yf225 merged this pull request in 4ae59e4.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Apr 11, 2019
Summary:
According to pytorch/pytorch#13638 (comment), after the Variable/Tensor merge, we may capture variables without autograd metadata inside an autograd function, and we need a working version counter in these cases. This PR makes it possible by moving `version_counter_` out of autograd metadata and into TensorImpl, so that variables without autograd metadata still have version counters.
Pull Request resolved: pytorch/pytorch#18223

Differential Revision: D14735123

Pulled By: yf225

fbshipit-source-id: 15f690311393ffd5a53522a226da82f5abb6c65b
zhangguanheng66 pushed a commit to zhangguanheng66/pytorch that referenced this pull request May 6, 2019