
Conversation

@vishwakftw
Contributor

Fixes #8774.

cc: @soumith @ssnl

@zou3519
Contributor

zou3519 commented Jul 10, 2018

Should grad_fn be callable at all?

@vishwakftw
Contributor Author

vishwakftw commented Jul 10, 2018

@zou3519 grad_fn represents a parameterized gradient function. It probably should not be callable publicly, but I don't know what design decisions were taken. There seems to be a detailed writeup in torch/csrc/autograd/functions.h.
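
For anyone following along, here is a minimal Python sketch of the behavior under discussion (the exact grad_fn class name depends on the op that produced the tensor):

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2

# grad_fn is the autograd graph node that produced y; autograd invokes it
# internally with the gradient(s) of y during the backward pass.
print(y.grad_fn)  # e.g. <MulBackward0 object at 0x...>

# The node is also callable from Python. Before this fix, calling it with
# the wrong number of inputs, e.g. y.grad_fn(), could segfault instead of
# raising a Python error (see #8774).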

This reverts commit 5ddc3cb.

Reverting because this is redundant.
Contributor

@apaszke left a comment


That's still an incomplete fix, because the function might be called with the wrong number of inputs. It would be better to use the information we stash in the forward pass to validate the inputs.
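
In spirit, the suggestion amounts to something like the following toy Python sketch (hypothetical names; the real check lives in the C++ autograd core):

class ToyNode(object):
    """Toy stand-in for an autograd graph node."""

    def __init__(self, num_inputs):
        # Metadata stashed during the forward pass.
        self.num_inputs = num_inputs

    def __call__(self, *grads):
        # Validate the call against the stashed metadata up front, instead
        # of letting a mismatched call crash deeper in the engine.
        if len(grads) != self.num_inputs:
            raise RuntimeError(
                'expected {} arguments, got {} instead'.format(
                    self.num_inputs, len(grads)))
        # ... compute and return the input gradients ...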

@vishwakftw
Contributor Author

vishwakftw commented Jul 10, 2018

@apaszke Is my fix correct? There are some failing tests that I don't know how to debug.

@yf225
Contributor

yf225 commented Jul 10, 2018

@pytorchbot retest this please

@ssnl
Collaborator

ssnl commented Jul 11, 2018

The error looks legit:

17:19:32 ======================================================================
17:19:32 ERROR: test_once_differentiable (__main__.TestAutograd)
17:19:32 ----------------------------------------------------------------------
17:19:32 Traceback (most recent call last):
17:19:32   File "test_autograd.py", line 131, in test_once_differentiable
17:19:32     x, y = self._function_test(MyFunction)
17:19:32   File "test_autograd.py", line 79, in _function_test
17:19:32     result.sum().backward(go, create_graph=True)
17:19:32   File "/opt/python/2.7/lib/python2.7/site-packages/torch/tensor.py", line 93, in backward
17:19:32     torch.autograd.backward(self, gradient, retain_graph, create_graph)
17:19:32   File "/opt/python/2.7/lib/python2.7/site-packages/torch/autograd/__init__.py", line 90, in backward
17:19:32     allow_unreachable=True)  # allow_unreachable flag
17:19:32   File "/opt/python/2.7/lib/python2.7/site-packages/torch/autograd/function.py", line 76, in apply
17:19:32     return self._forward_cls.backward(self, *args)
17:19:32   File "/opt/python/2.7/lib/python2.7/site-packages/torch/autograd/function.py", line 223, in wrapper
17:19:32     return err_fn(*[fake_requires_grad(v) for v in outputs])
17:19:32 RuntimeError: /var/lib/jenkins/workspace/torch/csrc/autograd/function.h:120: operator(): Assertion `num_inputs() == inputs.size()` failed: expected 0 arguments, got 3 instead
17:19:32 
17:19:32 ----------------------------------------------------------------------

@apaszke
Contributor

apaszke commented Jul 11, 2018

@vishwakftw can you check which Function fails the assertion? It might be that there's a bug elsewhere.


virtual variable_list apply(const variable_list& inputs) override;

virtual variable_list operator()(const variable_list& inputs) {
  return apply(inputs);
}


Contributor

@apaszke left a comment


Oh, I didn't notice you added the check at every single function call! Maybe just put it in THPCppFunction_call, so that it only applies to calls from Python, which are the ones that cause the segfault. The rest of the system should be correct "by design".


variable_list operator()(const variable_list& inputs) {
  return apply(inputs);
}


@vishwakftw
Contributor Author

After building locally, adding the check to THPCppFunction_call fails for the DelayedError function call.

@vishwakftw
Contributor Author

Same test error as earlier, @apaszke.

int num_inputs = PyTuple_GET_SIZE(args);
int num_inputs_required = ((THPCppFunction*)self)->cdata->num_inputs();
std::string self_name = ((THPCppFunction*)self)->cdata->name();
if ((self_name.find("Error") == std::string::npos) && (num_inputs != num_inputs_required)) {


DelayedError(std::string msg, int num_inputs)
    : msg(std::move(msg)) {
  for (int i = 0; i < num_inputs; i++)
    input_metadata_.emplace_back();
}




Contributor

@facebook-github-bot left a comment


@apaszke has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Jul 13, 2018
Summary: Fixes #8774.

Reviewed By: soumith

Differential Revision: D8836478

Pulled By: apaszke

fbshipit-source-id: f113bf47fe493be9f095a5a5490caf08dbb44e38
@vishwakftw closed this Jul 13, 2018
@vishwakftw deleted the grad_fn-fix branch July 13, 2018 21:48
@ssnl
Collaborator

ssnl commented Jul 15, 2018

I know that this has been merged... But can we have a test for this?

@vishwakftw
Contributor Author

How do I do it? Should I send in a PR?

@ssnl
Collaborator

ssnl commented Jul 15, 2018

That would be great! I can open an issue to track this if you are busy.

@vishwakftw
Contributor Author

I think I should be able to do it; it doesn't seem like a lot of effort. :)

@vishwakftw
Contributor Author

PR is in at #9457.
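
For reference, a regression test could look roughly like the sketch below (an illustration only, not necessarily the exact code in #9457; the precise exception type raised by the check is an assumption):

import unittest

import torch


class TestGradFnCall(unittest.TestCase):
    def test_calling_grad_fn_with_wrong_inputs_raises(self):
        x = torch.randn(2, requires_grad=True)
        y = x * 2
        # Calling the graph node with the wrong number of inputs should
        # now raise a Python exception instead of segfaulting.
        with self.assertRaises((TypeError, RuntimeError)):
            y.grad_fn()


if __name__ == '__main__':
    unittest.main()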

petrex pushed a commit to petrex/pytorch that referenced this pull request Jul 16, 2018
* upstream/master: (24 commits)
  Implement tensor weak references (pytorch#9363)
  Nuke TestCollectEnv (pytorch#9459)
  Add test case for segmentation fault fix in grad_fn (pytorch#9457)
  Add peephole optimization for type_as operators. (pytorch#9316)
  Fix out-of-range error for test_neg (pytorch#9431)
  add depthwise conv support for mkldnn (pytorch#8782)
  Refactor `_log_sum_exp` (pytorch#9173)
  Add ModuleDict and ParameterDict containers (pytorch#8463)
  Introduce SupervisedPtr, delete THAllocator and THCDeviceAllocator (pytorch#9358)
  Introducing IsInf (pytorch#9169)
  add device to CUDAEvent (pytorch#9415)
  Make localScalar error message more intuitive (pytorch#9443)
  Only accept continguous tensors in TopK for cuda (pytorch#9441)
  Add support for .norm() pytorch onnx export and ReduceL1/ReduceL2 caffe2 operators (pytorch#9299)
  Only view() rhs of index_put if we need to (pytorch#9424)
  Add BatchBucketizeOp in caffe2 (pytorch#9385)
  Implementation of Wngrad optimizer caffe2 python wrapper and unit test on least square regression (pytorch#9001)
  Implementation and operator test for Wngrad optimizer (pytorch#8999)
  Fix segmentation fault in grad_fn (pytorch#9292)
  update docs (pytorch#9423)
  ...
jramseyer pushed a commit to jramseyer/pytorch that referenced this pull request Jul 30, 2018
Summary: Fixes pytorch#8774.

Reviewed By: soumith

Differential Revision: D8836478

Pulled By: apaszke

fbshipit-source-id: f113bf47fe493be9f095a5a5490caf08dbb44e38
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
Summary: Fixes pytorch#8774.

Reviewed By: soumith

Differential Revision: D8836478

Pulled By: apaszke

fbshipit-source-id: f113bf47fe493be9f095a5a5490caf08dbb44e38
Linked issue: [pytorch] calling t.grad_fn() segfaults