
Conversation


@albanD albanD commented Nov 21, 2019

Stack from ghstack:

Differential Revision: D18665459

albanD added a commit that referenced this pull request Nov 21, 2019
ghstack-source-id: 72a278b
Pull Request resolved: #30258

@vincentqb vincentqb left a comment


LGTM

albanD added a commit to albanD/pytorch that referenced this pull request Nov 25, 2019
ghstack-source-id: 6aaf2a2
Pull Request resolved: pytorch#30258

vincentqb commented Nov 25, 2019

Is this test failure related? It doesn't seem related to me.

Nov 23 00:24:20 ERROR [0.077s]: test_tensor_sharing (jit.test_data_parallel.TestDataParallel)
Nov 23 00:24:20 ----------------------------------------------------------------------
Nov 23 00:24:20 Traceback (most recent call last):
Nov 23 00:24:20   File "/var/lib/jenkins/workspace/test/jit/test_data_parallel.py", line 118, in test_tensor_sharing
Nov 23 00:24:20     r0_forward = replica[0].forward(x)
Nov 23 00:24:20   File "/var/lib/jenkins/workspace/test/common_utils.py", line 82, in prof_meth_call
Nov 23 00:24:20     return prof_callable(meth_call, *args, **kwargs)
Nov 23 00:24:20   File "/var/lib/jenkins/workspace/test/common_utils.py", line 76, in prof_callable
Nov 23 00:24:20     return callable(*args, **kwargs)
Nov 23 00:24:20 RuntimeError: diff_view_meta->output_nr_ == 0 INTERNAL ASSERT FAILED at /var/lib/jenkins/workspace/torch/csrc/autograd/variable.cpp:326, please report a bug to PyTorch. 
Nov 23 00:24:20 The above operation failed in interpreter.
Nov 23 00:24:20 Traceback (most recent call last):
Nov 23 00:24:20   File "/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py", line 1370
Nov 23 00:24:20     if input.dim() == 2 and bias is not None:
Nov 23 00:24:20         # fused op is marginally faster
Nov 23 00:24:20         ret = torch.addmm(bias, input, weight.t())
Nov 23 00:24:20               ~~~~~~~~~~~ <--- HERE
Nov 23 00:24:20     else:
Nov 23 00:24:20         output = input.matmul(weight.t())
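
For context, a minimal sketch of the pattern the failing test exercises, reconstructed from the traceback above. The module, devices, and shapes here are assumptions (the real test_tensor_sharing in test/jit/test_data_parallel.py replicates a scripted module), so this is illustrative rather than an exact repro:

import torch
import torch.nn as nn
from torch.nn.parallel import replicate

# Hypothetical stand-in for the module used by test_tensor_sharing.
module = nn.Linear(4, 4).cuda(0)
x = torch.randn(2, 4, device="cuda:0")

# replicate() broadcasts the parameters to each device; calling forward on a
# replica is where the output_nr_ internal assert fired on this branch.
replicas = replicate(module, [0, 1])
out = replicas[0].forward(x)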


albanD commented Nov 25, 2019

I was looking into the tests, but I need a multi-GPU box to test, so it's taking some time to get things running.

albanD added a commit to albanD/pytorch that referenced this pull request Nov 25, 2019
ghstack-source-id: 6aaf2a2
Pull Request resolved: pytorch#30258

albanD commented Nov 25, 2019

After investigation, the problem is the same as #13452 (comment): the Broadcast Function from distributed does bad things.
This PR fixes the in-place detection for different models, and the Broadcast Function should definitely not be doing what it does.
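
For readers unfamiliar with the mechanism involved, here is a minimal sketch (plain autograd, not the Broadcast path itself) of the view + in-place pattern that the diff_view_meta / output_nr_ bookkeeping in variable.cpp tracks; the names and shapes are illustrative only:

import torch

base = torch.randn(3, requires_grad=True)
work = base * 2            # non-leaf tensor, so in-place ops on its views are allowed
view = work[1:]            # a differentiable view: shares storage and autograd metadata with work
view.add_(1.0)             # in-place op on the view: autograd must rebase the view's graph
view.sum().backward()      # works because the in-place detection rewired the graph
print(base.grad)           # tensor([0., 2., 2.]): gradient flows only through the viewed slice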


@vincentqb vincentqb left a comment


Does this cause the issues in #30334? Thanks for identifying this.


albanD commented Nov 25, 2019

Closing this until we fix the Broadcast issue described in #13452 (comment)

@albanD albanD closed this Nov 25, 2019
@albanD albanD mentioned this pull request Dec 19, 2019
@facebook-github-bot facebook-github-bot deleted the gh/albanD/12/head branch December 26, 2019 15:16