
Conversation

@kennyhorror (Contributor)

Summary:
Currently nn.Linear (and its internal functional code) will fail in THBlas:

RuntimeError: invalid argument 8: lda should be at least max(1, 0), but have 0 at caffe2/aten/src/TH/generic/THBlas.cpp:363

This diff fixes that bug.
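For concreteness, a minimal repro sketch of the failure (layer sizes here are illustrative, not from the diff): calling nn.Linear on a batch of size 0 triggered the THBlas error above, and with this fix it should instead return an empty output of the correct shape.

```python
import torch
import torch.nn as nn

# A Linear layer applied to an empty batch. On affected builds this
# raised the THBlas "lda should be at least max(1, 0)" error; with the
# fix it returns an empty output tensor with the right trailing dim.
layer = nn.Linear(in_features=4, out_features=3)
x = torch.zeros(0, 4)  # batch of size 0
y = layer(x)
print(y.shape)  # torch.Size([0, 3])
```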

So far I have identified 2 possible places where changes need to be made, based on the current dispatcher logic:

  1. The file touched in this diff
  2. caffe2/aten/src/THC/generic/THCTensorMathBlas.cu

At the moment I haven't found a better place than injecting the logic into those files: they contain the only non-generated function for the forward pass, plus the mm_mat2_backward function family on the backward pass.

Test Plan: New unit tests pass. Code that was failing earlier now works. Other backends still need to be tested.

Differential Revision: D17599915

@pytorchbot pytorchbot added module: cpu CPU specific problem (e.g., perf, algorithm) module: nn Related to torch.nn module: operators labels Oct 2, 2019
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D17599915

Collaborator

I'm a bit confused... If the result is empty, what are these assignments doing here?

Contributor Author

There are 2 main cases where failure in BLAS routines happens:

  1. n or m is 0, which results in a final matrix with 0 as one of its dimensions (for example, Linear called on a batch of size 0). In this case the only thing needed from this function is a resize of the final matrix to the correct output dimensions.

  2. The other case where BLAS can fail is when k is 0, which can happen when one of the strides is 0 (this happens on the backward pass for Linear with an empty batch). In that case I'm getting rid of the no-op part of ADDMM:
    alpha * A x B + beta * C -> beta * C
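The two cases above can be sketched in terms of torch.addmm (the shapes here are illustrative, not taken from the patch):

```python
import torch

# Case 1: n or m is 0 -> the result has a zero dimension; the only
# work needed is resizing the output to the correct (empty) shape.
A1 = torch.randn(0, 5)            # m = 0
B1 = torch.randn(5, 3)
C1 = torch.zeros(0, 3)
out1 = torch.addmm(C1, A1, B1)    # alpha*A@B + beta*C, both empty
print(out1.shape)                 # torch.Size([0, 3])

# Case 2: k is 0 -> A@B contributes nothing, so
# alpha*A@B + beta*C reduces to beta*C.
A2 = torch.randn(2, 0)            # k = 0
B2 = torch.randn(0, 3)
C2 = torch.ones(2, 3)
out2 = torch.addmm(C2, A2, B2, beta=2.0)
print(out2)                       # 2.0 * C2, i.e. all twos
```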

Collaborator

usually beta is 1.0, so we can skip this loop altogether
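A pure-Python sketch of that suggestion (the function name is hypothetical, not from the patch): when k == 0 the product term vanishes, so addmm reduces to scaling C by beta, and when beta == 1.0 even that scaling loop can be skipped.

```python
def addmm_degenerate(C, beta):
    """Result of alpha*A@B + beta*C when A@B is empty (k == 0)."""
    if beta == 1.0:
        return C  # nothing to do: beta*C == C, skip the loop entirely
    # Otherwise scale each element of C by beta.
    return [[beta * x for x in row] for row in C]

C = [[1.0, 2.0], [3.0, 4.0]]
print(addmm_degenerate(C, 1.0))  # C unchanged
print(addmm_degenerate(C, 2.0))  # [[2.0, 4.0], [6.0, 8.0]]
```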

@pytorchbot pytorchbot added module: cublas Problem related to cublas support module: cuda Related to torch.cuda, and CUDA support in general labels Oct 3, 2019
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D17599915

@facebook-github-bot (Contributor) left a comment

@kennyhorror has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Pull Request resolved: #27211

Differential Revision: D17599915

Pulled By: kennyhorror

fbshipit-source-id: cd47fe9fddb3bad1be5ddaddb1c1d9b95a76d258

zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 9, 2019
Pull Request resolved: pytorch/pytorch#27211

Differential Revision: D17599915

Pulled By: kennyhorror

fbshipit-source-id: 78894ce602d96aac2d6bf8c16a3fab43973e2d53
@facebook-github-bot (Contributor)

@kennyhorror merged this pull request in a891e92.

@ezyang (Contributor) commented Oct 9, 2019

This seems to break the py2.7.9 test:

Oct 09 00:47:41 test_interface (__main__.TestClassType) ... terminate called after throwing an instance of 'std::runtime_error'
Oct 09 00:47:41   what():  pybind11_object_dealloc(): Tried to deallocate unregistered instance!
Oct 09 00:47:41 Traceback (most recent call last):
Oct 09 00:47:41   File "test/run_test.py", line 458, in <module>
Oct 09 00:47:41     main()
Oct 09 00:47:41   File "test/run_test.py", line 450, in main
Oct 09 00:47:41     raise RuntimeError(message)
Oct 09 00:47:41 RuntimeError: test_jit failed! Received signal: SIGIOT

thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
Pull Request resolved: pytorch#27211

Differential Revision: D17599915

Pulled By: kennyhorror

fbshipit-source-id: 78894ce602d96aac2d6bf8c16a3fab43973e2d53