
Conversation

@IvanYashchuk
Collaborator

torch.inverse now works for complex inputs on GPU.
Test cases with complex matrices are xfailed for now, since some operations they rely on, such as batched matmul, do not yet work with complex.

Ref. #33152
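A minimal sketch of what this enables (illustrative only, not the PR's test code; it falls back to CPU when no GPU is available, and the diagonal shift is an assumption made here to keep the random batch well conditioned):

```python
import torch

# Invert a batch of complex matrices and check A @ inv(A) against the identity.
# The diagonal shift keeps the random batch well conditioned.
device = "cuda" if torch.cuda.is_available() else "cpu"
n, batch = 5, 4
a = torch.randn(batch, n, n, dtype=torch.complex128, device=device)
a = a + 2 * n * torch.eye(n, dtype=torch.complex128, device=device)
a_inv = torch.inverse(a)
identity = torch.eye(n, dtype=torch.complex128, device=device).expand_as(a)
max_err = (a @ a_inv - identity).abs().max().item()
```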

@IvanYashchuk added the module: complex (Related to complex number support in PyTorch) and module: linear algebra (Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul) labels on Sep 19, 2020
@dr-ci

dr-ci bot commented Sep 19, 2020

💊 CI failures summary and remediations

As of commit 55ad7fc (more details on the Dr. CI page):


  • 2/8 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)
  • 6/8 broken upstream at merge base 9e0102c since Nov 08

1 failure was confirmed as flaky and can be ignored:

  • pytorch_linux_xenial_py3_clang7_onnx_ort_test2

🚧 6 ongoing upstream failures:

These were probably caused by upstream breakages that are not fixed yet:


Extra GitHub checks: 2 failed



@ailzhang added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Sep 22, 2020
@vishwakftw vishwakftw requested a review from xwang233 September 22, 2020 20:27
Collaborator

I think single-matrix inverse currently uses cuSOLVER, while batched inverse is mostly on MAGMA. The cuBLAS batched inverse is not used because its performance is too poor.

Different implementations shouldn't give these large differences. Can you try running tests only in double precision, and see if the numerical difference is still this large?
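A quick way to run that check locally (an illustrative sketch under the assumption that the discrepancy is precision-driven, not the test-suite code): invert the same matrix in float32 and float64 and compare the residuals.

```python
import torch

# Compare the inversion residual ||A @ inv(A) - I|| in single vs. double
# precision; the float32 residual is typically several orders larger.
torch.manual_seed(0)
n = 64
a64 = torch.randn(n, n, dtype=torch.float64)
a32 = a64.to(torch.float32)
eye64 = torch.eye(n, dtype=torch.float64)
err64 = (a64 @ torch.inverse(a64) - eye64).abs().max().item()
err32 = (a32 @ torch.inverse(a32) - eye64.to(torch.float32)).abs().max().item()
```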

Collaborator Author

I think currently single-inverse is on cusolver, but batched inverse is mostly on MAGMA.

Indeed, I mixed them up.

I will check the tolerances.

Collaborator Author

Double precision is fine and the tests pass with a tight tolerance (atol=1e-11 in my case).
But for floats, individual entries can vary significantly:

Tensors failed to compare as equal! With rtol=0 and atol=0.1, found 2553 element(s) (out of 393216) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.37847900390625 (820.0460815429688 vs. 819.6676025390625), which occurred at index (1, 1, 254, 71).

I think MAGMA and cuSOLVER can give these large differences as long as the identity == inv(A) A tests pass:

self.assertEqual(identity.expand_as(matrix), torch.matmul(matrix, matrix_inverse))
self.assertEqual(identity.expand_as(matrix), torch.matmul(matrix_inverse, matrix))
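For reference, the comparison in the failure messages above can be mirrored with plain tensor ops (the values below are made up for illustration): an element fails when its absolute difference exceeds atol + rtol * |expected|.

```python
import torch

# Count elements whose difference exceeds the atol/rtol margin, mirroring
# what the test comparison reports. Values are illustrative only.
expected = torch.tensor([820.0461, 1.0, 2.0])
actual = torch.tensor([819.6676, 1.0, 2.05])
rtol, atol = 0.0, 0.1
outside = (actual - expected).abs() > atol + rtol * expected.abs()
num_bad = int(outside.sum())                 # elements outside the margin
max_diff = (actual - expected).abs().max().item()
```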

Collaborator

@xwang233 left a comment

Thanks for the work! The PR generally looks good, other than the tolerance in tests.

Collaborator

@xwang233 left a comment

Thanks! The cusolver implementations LGTM. Can you rebase and make sure all relevant tests passed?

@IvanYashchuk
Collaborator Author

So the MAGMA vs. cuSOLVER comparison (batched vs. single inverse) fails on Windows with:

Tensors failed to compare as equal! With rtol=0 and atol=0.1, found 19138 element(s) (out of 393216) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 1.11126708984375 (819.3917236328125 vs. 818.2804565429688)

I wonder what we should do about this check.

@xwang233
Collaborator

I think it's mostly due to precision. When the original matrix is drawn from randn, its entries are mostly small numbers in roughly the -1 to 1 range, but its inverse can contain large numbers (such as the 819 you are seeing), which causes numerical inaccuracy in float32 precision.
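This effect can be demonstrated directly (sizes here are arbitrary, and the float64 result is used as the reference): entries of a randn matrix are O(1), but entries of its inverse can be much larger, and the float32 absolute error grows with them.

```python
import torch

# Invert the same random matrix in float32 and float64; the float64 result
# serves as reference. Large inverse entries carry large absolute errors
# in float32 even when the relative error stays near machine precision.
torch.manual_seed(0)
a64 = torch.randn(128, 128, dtype=torch.float64)
inv64 = torch.inverse(a64)
inv32 = torch.inverse(a64.to(torch.float32)).to(torch.float64)
largest_entry = inv64.abs().max().item()     # typically much larger than 1
abs_err = (inv32 - inv64).abs().max().item()
rel_err = abs_err / largest_entry
```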

@codecov

codecov bot commented Oct 2, 2020

Codecov Report

Merging #45034 into master will increase coverage by 46.17%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           master   #45034       +/-   ##
===========================================
+ Coverage   35.28%   81.45%   +46.17%     
===========================================
  Files         443     1798     +1355     
  Lines       56512   188195   +131683     
===========================================
+ Hits        19938   153297   +133359     
+ Misses      36574    34898     -1676     

@mruberry mruberry self-requested a review October 5, 2020 06:42
This was referenced Oct 14, 2020
Contributor

@anjali411 left a comment

lgtm thanks @IvanYashchuk

@IvanYashchuk
Collaborator Author

I've fixed merge conflicts.

@IvanYashchuk
Collaborator Author

@mruberry, @anjali411 could you import this pull request?

@facebook-github-bot
Contributor

Hi @IvanYashchuk!

Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but we do not have a signature on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@IvanYashchuk
Collaborator Author

I've fixed new merge conflicts.
@mruberry, @anjali411 this PR should be ready now for merging.

Contributor

@facebook-github-bot left a comment

@anjali411 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@anjali411
Contributor

Hi @IvanYashchuk can you rebase and resolve the merge conflict?

Contributor

@facebook-github-bot left a comment

@anjali411 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@anjali411 merged this pull request in 33acbed.

@ezyang
Contributor

ezyang commented Nov 6, 2020

@IvanYashchuk IvanYashchuk reopened this Nov 6, 2020
@IvanYashchuk
Collaborator Author

Failing tests should be fixed now with a770132.

@anjali411
Contributor

@IvanYashchuk can you create a new PR for reland?

facebook-github-bot pushed a commit that referenced this pull request Nov 11, 2020
Summary:
`torch.inverse` now works for complex inputs on GPU.
Opening a new PR here. The previous PR was merged and reverted due to a bug in tests marked with `slowTest`.
Previous PR #45034

Ref. #33152

Pull Request resolved: #47595

Reviewed By: navahgar

Differential Revision: D24840955

Pulled By: anjali411

fbshipit-source-id: ec49fffdc4b3cb4ae7507270fa24e127be14f59b

Labels

cla signed · Merged · module: complex (Related to complex number support in PyTorch) · module: linear algebra (Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul) · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
