
Conversation

@weiyangfb (Contributor) commented Aug 30, 2018

>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source, requires_grad=True)
>>> print(copy)
tensor([[-1.2001,  1.9869],
        [-1.0134,  1.3096]], grad_fn=<CopyBackwards>)

>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source, requires_grad=False)
>>> print(copy)
tensor([[-0.7402,  0.0467],
        [ 0.4344, -0.0420]])

>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source)
>>> print(copy)
tensor([[-0.7402,  0.0467],
        [ 0.4344, -0.0420]])

@weiyangfb changed the title from "mute grad when copy construct from a var with requires_grad=Fales" to "mute grad when copy construct from a var with requires_grad=False" Aug 30, 2018
@facebook-github-bot (Contributor) left a comment

weiyangfb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@weiyangfb changed the title from "mute grad when copy construct from a var with requires_grad=False" to "[Easy]mute grad when copy construct from a var with requires_grad=False" Sep 11, 2018
@weiyangfb added the "ready for review (this tag is deprecated)" label Sep 11, 2018
@weiyangfb (Contributor, Author) commented

@ezyang can I get a review on this easy PR?

@ezyang (Contributor) left a comment

Read-only ops should not mutate.

@apaszke (Contributor) commented Sep 12, 2018

I would also say that the semantics are slightly weird. The output variable depends on the data of the input variable, and so there's no reason not to differentiate through this...
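
For concreteness, under the behavior shown in the first example of the PR description (where the copy gets grad_fn=<CopyBackwards>), gradients do flow back to the source; a minimal sketch:

>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source, requires_grad=True)
>>> copy.sum().backward()
>>> source.grad    # populated through CopyBackwards
tensor([[1., 1.],
        [1., 1.]])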


@weiyangfb (Contributor, Author) commented

@apaszke I would guess the copy-constructed tensor should not be made differentiable through the copy op here. Please correct me if I am wrong, but I think the expected behavior should be the same as:

source = torch.randn(10, requires_grad=True)
copy = torch.tensor(source.data)
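
A quick check of that expectation (a sketch; source.data carries no autograd history, so the copy comes out detached):

>>> source = torch.randn(10, requires_grad=True)
>>> copy = torch.tensor(source.data)
>>> copy.requires_grad, copy.grad_fn
(False, None)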

@ezyang (Contributor) commented Sep 14, 2018

To figure out a good solution to this problem, we must ask ourselves:

  1. How is tensor() with a Tensor argument implemented today, and
  2. Why does this implementation "do the wrong thing" with respect to requires_grad?

We can see that tensor() dispatches to tensor_ctor, which dispatches to internal_new_from_data. Here are the important lines in the function:

  if (THPVariable_Check(data)) {
      auto var = reinterpret_cast<THPVariable*>(data)->cdata;
      auto type_inference_device_type = device_opt.has_value() ? device_opt->type()
                                                               : torch::getDeviceType(var.type());
      // infer the scalar type and device type; it's not expected to infer the layout since these constructors
      // are defined per-layout-type (e.g. tensor vs sparse_coo_tensor).
      const auto& type_inference_type = torch::getVariableType(var.type().scalarType(),
                                                       *torch::getLayout(type.backend()),
                                                       type_inference_device_type);
      const auto& type_to_use = type_inference ? type_inference_type : type;
      return copy_variables ? new_with_tensor_copy(type_to_use, var, device_index) :
                              new_with_type_conversion(type_to_use, var, device_index);
  }

In the bodies of new_with_tensor_copy and new_with_type_conversion you can see that this calls copy or toType on the Variable, which means you get the normal gradient behavior, as if you had called copy() or toType() yourself; this is why you incorrectly end up with requires_grad set at the end.

This suggests two solutions:

  1. Keep calling copy/toType, but then detach the tensor afterwards, so that you set requires_grad = false and delete the grad_fn. Effectively, you are saying that tensor(t) is equivalent to t.clone().detach() (see the sketch after this list). This is the "do the wrong thing first, and then fixup after yourself" strategy.
  2. Create a new operator on VariableType, new_with_tensor, which does the right thing from the get-go and doesn't ever create a grad_fn or set requires_grad on the output. This is the "do the right thing from the beginning" strategy, but it is more complicated to implement.
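
At the Python level, the equivalence targeted in option (1) can be sketched as follows (intended semantics only, not the C++ implementation):

>>> t = torch.randn(2, 2, requires_grad=True)
>>> copy = t.clone().detach()    # what tensor(t) should be equivalent to
>>> copy.requires_grad, copy.grad_fn, copy.is_leaf
(False, None, True)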

I think both approaches are reasonable. If you want to do (2), you'll need to write a custom VariableType method implementation, modeled on the hand-written (hardcoded) implementation of detach, like this:

Tensor VariableType::detach(const Tensor & self) const {
  profiler::RecordFunction profiler("detach");
  torch::jit::Node* node = nullptr;
  if (jit::tracer::isTracing()) {
    auto& graph = jit::tracer::getTracingState()->graph;
    node = graph->create(jit::aten::detach, /*outputs=*/0);
    jit::tracer::recordSourceLocation(node);
    jit::tracer::addInputs(node, "self", self);
    graph->appendNode(node);

  }
  // <NON_GENERATED_CODE>
  auto result = as_variable_ref(const_cast<Tensor&>(self)).detach();
  // </NON_GENERATED_CODE>
  if (jit::tracer::isTracing()) {
    jit::tracer::addOutput(node, result);
  }
  return result;
}

You'll have something similar, except that it calls copy/toType as necessary. But I think detach after the fact is also a very reasonable strategy, and much simpler.

@weiyangfb force-pushed the create_tensor_requires_grad_false branch from 68e72fd to 04b32b3 on September 16, 2018 06:55
@weiyangfb changed the title from "[Easy]mute grad when copy construct from a var with requires_grad=False" to "[Easy]detach new_tensor when copy construct from a tensor (requires_grad=True) while setting requires_grad=False" Sep 16, 2018
@weiyangfb (Contributor, Author) commented

@ezyang I am picking the (2) approach here: clone() and detach() when copy constructing a new tensor.

@facebook-github-bot (Contributor) left a comment

weiyangfb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang (Contributor) left a comment

I feel there are some semantic problems.

Basically, in my opinion (with concurrences from @gchanan and @apaszke), we think that torch.tensor(x) should be equivalent to x.clone().detach() and torch.tensor(x, requires_grad=True) should be equivalent to x.clone().detach().requires_grad_(True). The basic idea is that torch.tensor reads out "the data" from whatever it is passed, and constructs a leaf variable. This gives us the invariant that "for any argument x, torch.tensor(x) does not require grad, and torch.tensor(x, requires_grad=True) gives a leaf variable that requires grad."
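
Spelled out as a quick check (a sketch, assuming the proposed semantics are in place):

>>> x = torch.randn(3, requires_grad=True)
>>> torch.tensor(x).requires_grad
False
>>> y = torch.tensor(x, requires_grad=True)
>>> y.requires_grad, y.is_leaf
(True, True)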

@weiyangfb (Contributor, Author) commented

@ezyang Thanks a lot for the clarifications! Sorry I missed the default case. I will make changes so that:

source = torch.randn(3, 3, requires_grad=True)
copy = torch.tensor(source) # requires_grad=False

@weiyangfb force-pushed the create_tensor_requires_grad_false branch from 04b32b3 to c040b6b on September 17, 2018 19:22
@facebook-github-bot (Contributor) left a comment

weiyangfb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@ezyang (Contributor) left a comment

I think it's still not right.

Also, can we update the docs for tensor for this case?

@ezyang (Contributor) commented Sep 17, 2018

In general, we should discourage people from passing a tensor to torch.tensor, in favor of explicitly saying x.clone().detach() or x.clone().detach().requires_grad_(True). @apaszke suggested a warning might be appropriate in this case.

@weiyangfb (Contributor, Author) commented

@ezyang Thanks a lot for the comments! I will update the PR so that a tensor copy-constructed from another tensor is always a leaf variable, with requires_grad set according to the input args. I will also raise a Python warning on this path and update the docs.

@weiyangfb force-pushed the create_tensor_requires_grad_false branch from c040b6b to 02f1b22 on September 17, 2018 22:29
@weiyangfb changed the title from "[Easy]detach new_tensor when copy construct from a tensor (requires_grad=True) while setting requires_grad=False" to "[Easy] make copy constructed tensor a leaf variable in using torch.tensor(sourceTensor)" Sep 17, 2018
@weiyangfb changed the title from "[Easy] make copy constructed tensor a leaf variable in using torch.tensor(sourceTensor)" to "[Easy] make copy constructed tensor a leaf variable when using torch.tensor(sourceTensor)" Sep 17, 2018
@weiyangfb force-pushed the create_tensor_requires_grad_false branch from 02f1b22 to 6889608 on September 17, 2018 22:46
@facebook-github-bot (Contributor) left a comment

weiyangfb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


…or, and set requires_grad wrt to input args,

2. raise warnings for this path and update docs
@weiyangfb force-pushed the create_tensor_requires_grad_false branch from 6889608 to 9c413bb on September 18, 2018 04:47
@facebook-github-bot (Contributor) left a comment

weiyangfb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@apaszke (Contributor) left a comment

I'm not sure if what we did here is correct.

-  return copy_variables ? new_with_tensor_copy(type_to_use, var, device_index) :
+  auto new_tensor = copy_variables ? new_with_tensor_copy(type_to_use, var, device_index) :
                            new_with_type_conversion(type_to_use, var, device_index);
+  new_tensor.detach_(); // making copy constructed tensor a leaf node


When data is a tensor `x`, :func:`torch.tensor` reads out 'the data' from whatever it is passed,
and constructs a leaf variable. Therefore ``torch.tensor(x)`` is equivalent to ``x.clone().detach()``
and ``torch.tensor(x, requires_grad=True)`` is equivalent to ``x.clone().detach().requires_grad_(True)``.
The equivalents using ``clone()`` and ``detach()`` are recommended.
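
For instance, assuming the warning added in this PR is in place, the recommended spellings look like:

>>> x = torch.ones(2, requires_grad=True)
>>> y = x.clone().detach()                        # recommended
>>> z = x.clone().detach().requires_grad_(True)   # recommended when grad is needed
>>> w = torch.tensor(x)                           # same result as y, but emits a UserWarning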



  if (THPVariable_Check(data)) {
    PyErr_WarnEx(PyExc_UserWarning,
      "To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() "


type_inference)
type_inference,
args_requires_grad)
.set_requires_grad(r.toBool(3));


-  return copy_variables ? new_with_tensor_copy(type_to_use, var, device_index) :
+  auto new_tensor = copy_variables ? new_with_tensor_copy(type_to_use, var, device_index) :
                            new_with_type_conversion(type_to_use, var, device_index);
+  new_tensor.detach_(); // making copy constructed tensor a leaf node


cpuhrsch pushed a commit to cpuhrsch/pytorch that referenced this pull request Sep 21, 2018
Summary:
- fix PR pytorch#11061 by moving `detach_()` and `set_requires_grad()` to `torch.tensor_ctor()` and `tensor.new_tensor`, and also removed warnings and `args_requires_grad` from `internal_new_from_data`
- with this patch, the returned tensor from `tensor_ctor()` and `new_tensor` will be detached from source tensor, and set requires_grad based on the input args
- `torch.as_tensor` retains its behavior as documented

gchanan apaszke
Pull Request resolved: pytorch#11815

Differential Revision: D9932713

Pulled By: weiyangfb

fbshipit-source-id: 4290cbc57bd449954faadc597c24169a7b2d8259
iotamudelta pushed a commit to ROCm/pytorch that referenced this pull request Sep 21, 2018
facebook-github-bot pushed a commit that referenced this pull request Sep 26, 2018
Summary:
The earlier tests had around 80 warnings, and now there are 6 warnings; these are due to the JIT.

The changes remove the wrapping of a Tensor by a Tensor constructor, which emits warnings due to the changes in #11061.
Pull Request resolved: #12038

Differential Revision: D10033392

Pulled By: apaszke

fbshipit-source-id: b1faf368e650d062d7983f9932511bee4702a893
@ezyang ezyang added the merged label Jun 26, 2019

Labels

ready for review (this tag is deprecated)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[pytorch] requires_grad=False from torch.tensor ignored when the input tensor has requires_grad=True

6 participants