[BC-breaking] Remove Variable::Impl and DifferentiableViewImpl #17072
Conversation
aten/src/ATen/SparseTensorImpl.cpp (outdated)

```diff
-  AT_ASSERT(!indices.is_variable() && !values.is_variable()); // They should be plain tensors!  // TODO: change this to check `.requires_grad()` and `GradMode::is_enabled()` when Variable and Tensor are merged
+  AT_ASSERT((!(indices.is_variable() && indices.requires_grad()) &&
+             !(values.is_variable() && values.requires_grad()))
+            || at::NonVariableTypeMode::is_enabled()); // TODO: use `compute_requires_grad()` after it's moved to ATen
```
I will work on moving `compute_requires_grad()` to ATen in the next PR.
facebook-github-bot left a comment
@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
torch/nn/modules/module.py (outdated)

```python
with torch.no_grad():
    param = fn(param)
    if param._grad is not None:
        param._grad = fn(param._grad)
```
I need to debug why we need this change
Update: For some test cases, `param._grad` is already a tensor with `allow_tensor_metadata_change` set to false, and changing its storage in `set_data()` will throw an error. To fix this problem, we should just do `param._grad = fn(param._grad)`, and use `with torch.no_grad()` to avoid accumulating gradients.
New code is more idiomatic anyway, sgtm.
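For illustration, a minimal sketch of why the `torch.no_grad()` guard in the new code matters. The `fn` below is a hypothetical stand-in for the per-tensor transform that `_apply` receives; this is not the actual module code:

```python
import torch

param = torch.randn(3, requires_grad=True)
fn = lambda t: t.double()  # hypothetical per-tensor transform, like the fn passed to _apply

moved = fn(param)            # recorded by autograd: `moved` is a non-leaf with a grad_fn
assert moved.grad_fn is not None

with torch.no_grad():
    moved = fn(param)        # not recorded: no autograd history is attached
assert moved.grad_fn is None
```

Rebinding `param._grad` under the guard therefore swaps in the transformed tensor without writing through `set_data()` on the old tensor and without recording anything in the autograd graph.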
c10/core/TensorImpl.h (outdated)

```cpp
   *
   * For usage of `version_counter` and `allow_tensor_metadata_change`, see NOTE [ TensorImpl Shallow-Copying ].
   */
  virtual void copy_tensor_data(
```
A few issues with this function:

- This is just used by the public API functions `shallow_copy_and_detach` / `shallow_copy_from`, right? Then it should be private.
- This doesn't need to be virtual; in fact you never use `this`, so it doesn't even need to be a non-static method.
- Because it's a static method and you aren't using virtual dispatch, you can use the correct static types for the inputs, so you don't have to do scary `static_cast`s in the derived types.
err, you probably need to make it protected so the derived types can call it.
Fixed.
```cpp
   * Shallow-copies data from another TensorImpl into this TensorImpl.
   */
  virtual void shallow_copy_from(c10::intrusive_ptr<TensorImpl> impl) {
    copy_tensor_data(
```
- It's scary not to have an ASSERT in here -- I know you check the only call site to give a nice error message, but still, you want to guard the unsafe places.
- If you follow the arguments above that `copy_tensor_data` should have correct static types, you should do the `static_cast` of `impl` in here, after the ASSERT.
I added the `static_cast` and the ASSERT in the TensorImpl derived types' `shallow_copy_from()`.
```cpp
   *
   * For usage of `version_counter` and `allow_tensor_metadata_change`, see NOTE [ TensorImpl Shallow-Copying ].
   */
  static void copy_tensor_data(
```
Did it not work to make it protected?
It is protected now, with the keyword at https://github.com/pytorch/pytorch/pull/17072/files#diff-7f54d357d87885923a1536316f17ac6bR1410.
```cpp
      /*src_impl=*/this,
      /*dest_impl=*/impl.get(),
      /*version_counter=*/version_counter,
      /*allow_tensor_metadata_change=*/allow_tensor_metadata_change);
```
don't we need to refresh numel here?
Fixed.
```cpp
      /*src_impl=*/opaque_impl,
      /*dest_impl=*/this,
      /*version_counter=*/version_counter(),
      /*allow_tensor_metadata_change=*/allow_tensor_metadata_change());
```
don't we need to refresh numel here?
Fixed.
gchanan left a comment
Nice!
facebook-github-bot left a comment
@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: As part of the Variable/Tensor merge work (pytorch/pytorch#13638), we make the following changes in this PR:

1. Remove the `Variable::Impl` class and the `DifferentiableViewImpl` class.
2. Change all `Variable.data()` call sites to either use `Variable` directly, or use `Variable.tensor_data()`.
3. Remove the `Variable.data()` API.
4. Add `Variable.variable_data()` that matches `tensor.data` in the Python API, which creates a new `Variable` that shares the same storage and tensor metadata with the original `Variable`, but with a completely new autograd history.

After this PR, Variable doesn't wrap a Tensor internally anymore, and both Variable and Tensor use the same TensorImpl class as their `impl_`. The only difference is that Variable always has AutogradMeta in its TensorImpl, but Tensor doesn't.

**Note that this PR is BC-breaking in the following use cases:**

**Use Case 1:** Previously, `x.data = y` worked even if `x` and `y` were of different TensorImpl types (e.g. `x` is a CPU dense tensor whose impl is of type TensorImpl, while `y` is a CPU sparse tensor whose impl is of type SparseTensorImpl). However, after this PR, `x.data = y` doesn't work anymore if `x` and `y` are of different TensorImpl types, because the underlying implementation `variable.set_data(tensor)` no longer works if `variable` and `tensor` have different TensorImpl types.

**Use Case 2:** If a tensor `x`'s `grad` is sparse, accumulating dense gradients to `x` will change the tensor that `x.grad` is pointing to. This is better illustrated with the following example:

```python
params = torch.tensor([1.5, 1.5]).requires_grad_()
with torch.no_grad():
    # Change gradient to a sparse tensor
    params.grad = torch.sparse_coo_tensor(torch.tensor([[1, 1]]).long(), torch.tensor([1., 1.]))

grad_saved = params.grad
params.backward(torch.tensor([1.5, 1.5]))
assert id(grad_saved) == id(params.grad)  # This will fail after this PR
```

The assertion in the last line will fail after this PR, because adding dense gradients to sparse gradients will change the `params.grad` tensor reference.

Pull Request resolved: pytorch/pytorch#17072
Differential Revision: D14075257
Pulled By: yf225
fbshipit-source-id: 0e681df641270dea586042dd26db59f2e76b5957
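To make Use Case 1 from the summary above concrete, here is a minimal sketch; the shapes and values are illustrative only and are not taken from the PR:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])                 # dense CPU tensor, impl is TensorImpl
y = torch.sparse_coo_tensor(torch.tensor([[0]]),  # sparse CPU tensor, impl is SparseTensorImpl
                            torch.tensor([1.0]),
                            (3,))

# Worked before this PR; raises afterwards, because variable.set_data(tensor)
# now requires both sides to share the same TensorImpl type.
x.data = y
```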
Wow, epic work!!!

This broke XLA build. cc @ailzhang

Hooray!
Summary: After #17072, we are allowed to pass Variables into ATen ops, thus there is no need to unwrap input variables in the c10 call path. Note that since Caffe2 still expects inputs to be pure Tensors, we moved the unwrapping logic to the Caffe2 wrapper.

Pull Request resolved: #21620
Differential Revision: D15763560
Pulled By: yf225
fbshipit-source-id: 5375f0e51eb320f380ae599ebf98e6b259f0bff8
…flag, and check it in `nn.Module._apply()` (#21613)

Summary: #17072 breaks `model.to(xla_device)`, because moving `model` to an XLA device involves changing its parameters' TensorImpl type, and the current implementation of `nn.Module.to()` doesn't support changing module parameters' TensorImpl type:

```python
# https://github.com/pytorch/pytorch/blob/6dc445e1a84dc5d093d640de54f038f021d13227/torch/nn/modules/module.py#L192-L208
def _apply(self, fn):
    ...
    for param in self._parameters.values():
        if param is not None:
            # Tensors stored in modules are graph leaves, and we don't
            # want to create copy nodes, so we have to unpack the data.
            param.data = fn(param.data)  # NOTE: this doesn't allow changing `param.data`'s TensorImpl type
            if param._grad is not None:
                param._grad.data = fn(param._grad.data)  # NOTE: this doesn't allow changing `param._grad.data`'s TensorImpl type
    ...
```

yf225 TODO: fix the description here when we finish the implementation.

To fix this problem, we introduce a new API `model.to_()` that always assigns new tensors to the parameters (thus supporting changing the parameters to any TensorImpl type), and also bumps the version counter of the original parameters correctly so that they are invalidated in any autograd graph they participate in. We also add a warning to the current `model.to()` API to inform users about the upcoming behavior change of `model.to()`: in future releases, it will create and return a new model instead of updating the current model in place.

This unblocks adding XLA to our CI test suite, which also allows XLA to catch up with other changes in our codebase, notably the c10 dispatcher.

[xla ci] cc. resistor ailzhang

Pull Request resolved: #21613
Differential Revision: D15895387
Pulled By: yf225
fbshipit-source-id: b79f230fb06019122a37fdf0711bf2130a016fe6
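For context, a small sketch (not from the PR) of what "invalidated in any autograd graph they participate in" means in practice: bumping a tensor's version counter makes any backward pass that saved that tensor fail. This is generic autograd behavior, not the new `model.to_()` code:

```python
import torch

w = torch.randn(3, requires_grad=True)
out = (w ** 2).sum()      # autograd saves `w` for the backward of pow

with torch.no_grad():
    w.mul_(2.0)           # in-place update bumps w's version counter

# Raises: one of the variables needed for gradient computation has been
# modified by an inplace operation.
out.backward()
```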