Added CUDA support for complex input for torch.inverse #45034
Conversation
💊 CI failures summary and remediations (as of commit 55ad7fc):
1 failure confirmed as flaky and can be ignored.
🚧 6 ongoing upstream failures, probably caused by upstream breakages that are not fixed yet.
Extra GitHub checks: 2 failed.
Review thread on test/test_torch.py (outdated):
I think currently single inverse is on cuSOLVER, but batched inverse is mostly on MAGMA. cuBLAS batched inverse performance is too poor, so it is not used.
Different implementations shouldn't give such large differences. Can you try running the tests only in double precision and see if the numerical difference is still this large?
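Outside the test suite, a quick way to probe this is a minimal sketch along these lines (my addition, not from the thread; shapes, seed, and tolerances are illustrative):

```python
import torch

# Hedged repro sketch: compare the batched inverse (MAGMA path) against
# per-matrix inverses (cuSOLVER path) in float32 and float64, to see how
# much of the observed difference is explained by precision alone.
torch.manual_seed(0)

for dtype in (torch.float32, torch.float64):
    a = torch.randn(8, 256, 256, dtype=dtype, device="cuda")
    batched = torch.inverse(a)                           # batched path
    single = torch.stack([torch.inverse(m) for m in a])  # one matrix at a time
    print(dtype, (batched - single).abs().max().item())
```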
> I think currently single inverse is on cuSOLVER, but batched inverse is mostly on MAGMA.
Indeed, I mixed them up.
I will check the tolerances.
Double precision is fine, and tests pass even with a low tolerance (atol=1e-11 in my case).
But for floats, individual entries can vary significantly:
Tensors failed to compare as equal! With rtol=0 and atol=0.1, found 2553 element(s) (out of 393216) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.37847900390625 (820.0460815429688 vs. 819.6676025390625), which occurred at index (1, 1, 254, 71).
I think MAGMA and cuSOLVER can give such large differences, as long as the `identity == inv(A) A` tests pass:
self.assertEqual(identity.expand_as(matrix), torch.matmul(matrix, matrix_inverse))
self.assertEqual(identity.expand_as(matrix), torch.matmul(matrix_inverse, matrix))
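For reference, a self-contained version of that residual check might look like the following sketch (sizes, device, and tolerance are illustrative, not taken from the actual test suite):

```python
import torch

# Accept an inverse if A @ inv(A) and inv(A) @ A are both close to the
# identity, instead of comparing element-wise against a reference inverse.
matrix = torch.randn(4, 64, 64, dtype=torch.float32, device="cuda")
matrix_inverse = torch.inverse(matrix)
identity = torch.eye(64, dtype=matrix.dtype, device=matrix.device)

# atol is generous because random matrices can be poorly conditioned.
assert torch.allclose(torch.matmul(matrix, matrix_inverse),
                      identity.expand_as(matrix), atol=1e-3)
assert torch.allclose(torch.matmul(matrix_inverse, matrix),
                      identity.expand_as(matrix), atol=1e-3)
```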
xwang233 left a comment:
Thanks for the work! The PR generally looks good, other than the tolerances in the tests.
xwang233 left a comment:
Thanks! The cuSOLVER implementations LGTM. Can you rebase and make sure all relevant tests pass?
So MAGMA vs cuSOLVER (batched vs single inverse) fails on Windows with
I wonder what we should do with this check.
I think it's mostly due to precision? When the original matrix is in the range of
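To make the precision argument concrete, here is a back-of-the-envelope sketch (the error model and the `torch.linalg.cond` call are my addition, not from the thread): the forward error of an inverse scales roughly with cond(A) times machine epsilon, so fp32 results from two correct backends can legitimately differ in the first decimal place when entries are in the hundreds.

```python
import torch

# Rough forward-error model: |delta(inv(A))| / |inv(A)| ~ cond(A) * eps.
# With fp32 eps ~ 1.2e-7, a condition number in the thousands already
# allows relative differences around 1e-4, i.e. ~0.1 absolute error on
# entries near 1e3, without either backend being wrong.
a = torch.randn(512, 512, dtype=torch.float32, device="cuda")
cond = torch.linalg.cond(a).item()
eps = torch.finfo(torch.float32).eps
print(f"cond(A) = {cond:.3e}, expected relative error ~ {cond * eps:.3e}")
```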
Force-pushed a997594 to 1aa60da (for fp32 batched inverse).
Force-pushed 1aa60da to a298f93.
Codecov Report
@@            Coverage Diff             @@
##           master   #45034       +/-   ##
===========================================
+ Coverage   35.28%   81.45%    +46.17%
===========================================
  Files         443     1798      +1355
  Lines       56512   188195    +131683
===========================================
+ Hits        19938   153297    +133359
+ Misses      36574    34898      -1676
anjali411 left a comment:
LGTM, thanks @IvanYashchuk.
I've fixed the merge conflicts.
@mruberry, @anjali411, could you import this pull request?
Hi @IvanYashchuk! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but we do not have a signature on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
I've fixed the new merge conflicts.
facebook-github-bot left a comment:
@anjali411 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Hi @IvanYashchuk, can you rebase and resolve the merge conflict?
facebook-github-bot left a comment:
@anjali411 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@anjali411 merged this pull request in 33acbed.
Failing tests should be fixed now with a770132.
@IvanYashchuk, can you create a new PR for the reland?
Summary: `torch.inverse` now works for complex inputs on GPU. Opening a new PR here. The previous PR was merged and reverted due to a bug in tests marked with `slowTest`. Previous PR #45034. Ref. #33152
Pull Request resolved: #47595
Reviewed By: navahgar
Differential Revision: D24840955
Pulled By: anjali411
fbshipit-source-id: ec49fffdc4b3cb4ae7507270fa24e127be14f59b
`torch.inverse` now works for complex inputs on GPU. Test cases with complex matrices are xfailed for now; for example, batched matmul does not work with complex yet.
Ref. #33152
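A short usage sketch of the new capability (assuming a recent PyTorch where batched complex matmul is also available; at the time of this PR that combination was still xfailed, and sizes and tolerance here are illustrative):

```python
import torch

# Invert a batch of complex matrices on the GPU and verify the result
# via the identity residual rather than a reference inverse.
a = torch.randn(3, 32, 32, dtype=torch.complex64, device="cuda")
a_inv = torch.inverse(a)

identity = torch.eye(32, dtype=a.dtype, device=a.device).expand_as(a)
print(torch.allclose(a @ a_inv, identity, atol=1e-4))
```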