Add cuSOLVER path for torch.linalg.lstsq #57317

IvanYashchuk · 2021-04-29T22:22:37Z

Stack from ghstack:

- #54725 Added cuBLAS path for torch.linalg.lstsq
- #57317 Add cuSOLVER path for torch.linalg.lstsq

This PR implements a QR-based least-squares solver using the geqrf, ormqr, and
triangular_solve operations.

The internal code of triangular_solve was fixed to correctly handle larger
rectangular arrays.

Differential Revision: D28312683
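
A minimal sketch of the overdetermined (m >= n) path described above, written in pure Python so it runs without a GPU. It is not the PR's code: modified Gram-Schmidt stands in for the Householder-based geqrf, an explicit Q^T b product stands in for ormqr, and back substitution stands in for triangular_solve. The function names (`qr_gram_schmidt`, `back_substitute`, `lstsq_qr`) are illustrative only, and the sketch assumes A is real with full column rank.

```python
import math


def qr_gram_schmidt(A):
    """Thin QR of an m x n matrix (m >= n, full column rank).

    Uses modified Gram-Schmidt as a stand-in for geqrf.
    Returns Q as a list of n orthonormal columns and R as an
    n x n upper-triangular matrix (list of rows).
    """
    m, n = len(A), len(A[0])
    columns = [[A[i][j] for i in range(m)] for j in range(n)]
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = columns[j][:]
        for k in range(j):
            # Remove the component of v along the k-th orthonormal column.
            R[k][j] = sum(q * x for q, x in zip(Q[k], v))
            v = [x - R[k][j] * q for x, q in zip(v, Q[k])]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        Q.append([x / R[j][j] for x in v])
    return Q, R


def back_substitute(R, y):
    """Solve R x = y for upper-triangular R (the triangular_solve step)."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / R[i][i]
    return x


def lstsq_qr(A, b):
    """Least-squares solution of A x ~ b via A = Q R, then R x = Q^T b."""
    Q, R = qr_gram_schmidt(A)
    # Apply Q^T to b (the role ormqr plays in the PR).
    y = [sum(q * bi for q, bi in zip(column, b)) for column in Q]
    return back_substitute(R, y)


# Fit y = c0 + c1 * t to three points lying on y = 1 + 2 t.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 3.0, 5.0]
print(lstsq_qr(A, b))  # approximately [1.0, 2.0]
```

Householder reflections (what geqrf uses) are numerically more robust than Gram-Schmidt; the swap here only keeps the sketch short.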


facebook-github-bot · 2021-04-29T22:22:48Z

💊 CI failures summary and remediations

As of commit b105517 (more details on the Dr. CI page):

2/2 failures introduced in this PR

2 failures not recognized by patterns:

| Job | Step | Action |
| --- | --- | --- |
| mypy | Unknown | 🔁 rerun |
| flake8-py3 | Unknown | 🔁 rerun |

This comment was automatically generated by Dr. CI (expand for details).



mruberry · 2021-05-01T23:32:35Z

@xwang233 and @lezcano and/or @nikitaved, would you review this, please?

lezcano

LGTM! The logic is as clean as it can be. I just left a small comment on a bit that I found slightly more difficult to understand.

lezcano · 2021-05-03T13:00:17Z

torch/linalg/__init__.py

 :attr:`driver` chooses the LAPACK/MAGMA function that will be used.
 For CPU inputs the valid values are `'gels'`, `'gelsy'`, `'gelsd'`, `'gelss'`.
-For CUDA input, the only valid driver is `'gels'`, which assumes that :attr:`A` is full-rank and `m < n`.
+For CUDA input, the only valid driver is `'gels'`, which assumes that :attr:`A` is full-rank.


lezcano · 2021-05-03T13:27:34Z

aten/src/ATen/native/cuda/BatchLinearAlgebra.cu

+        const_cast<Tensor&>(infos),
+        upper, transpose, conjugate_transpose, unitriangular);
+
+    B.narrow(-2, m, n - m).zero_();


This is because triangular_solve_kernel writes its output into the first m elements of B, right? Could you leave a comment explaining this here?
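
The effect of the zeroing can be illustrated with a toy pure-Python sketch of the underdetermined (m < n) path, restricted to a single-row A so that one Householder reflector is the entire QR factorization of A^T. This is an illustration under stated assumptions, not the CUDA code: `apply_householder` and `solve_single_row` are hypothetical names, `buffer_tail` simulates whatever the rows of B beyond the first m happen to hold, and it assumes (as the question above suggests) that the triangular solve only overwrites the first m rows.

```python
import math


def apply_householder(v, x):
    """Return Q x for the reflector Q = I - 2 v v^T / (v^T v)."""
    scale = 2.0 * sum(vi * xi for vi, xi in zip(v, x)) / sum(vi * vi for vi in v)
    return [xi - scale * vi for vi, xi in zip(v, x)]


def solve_single_row(a, b, buffer_tail, zero_tail=True):
    """Solve a . x = b for a nonzero row vector a of length n (m = 1 < n).

    buffer_tail holds the n - 1 leftover entries of the work buffer.
    """
    n = len(a)
    # Step 1: QR of a^T. One reflector maps a^T onto alpha * e1,
    # so a^T = Q [alpha, 0, ..., 0]^T and R is the 1 x 1 matrix [alpha].
    norm = math.sqrt(sum(ai * ai for ai in a))
    alpha = -math.copysign(norm, a[0])
    v = [a[0] - alpha] + list(a[1:])
    # Step 2: triangular solve R^T z = b. Only the first m = 1 entry
    # of the n-entry buffer is overwritten with the result.
    buf = [b / alpha] + list(buffer_tail)
    # Zero the remaining n - m entries so they do not leak into Q * buf.
    if zero_tail:
        buf[1:] = [0.0] * (n - 1)
    # Step 3: x = Q buf.
    return apply_householder(v, buf)


a, b = [3.0, 4.0], 10.0
clean = solve_single_row(a, b, buffer_tail=[7.0])
dirty = solve_single_row(a, b, buffer_tail=[7.0], zero_tail=False)
print(clean)  # approximately [1.2, 1.6], the minimum-norm solution
print(dirty)  # a different vector with a larger norm
```

In this toy case the un-zeroed result still satisfies a . x = b, but it picks up a component from the null space of a and is no longer the minimum-norm solution.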

aten/src/ATen/native/cuda/BatchLinearAlgebraLib.cu

xwang233

LGTM. Thanks for the PR!

xwang233 · 2021-05-03T19:50:12Z

test/test_linalg.py

+        # cases m < n are only supported on CPU and for cuSOLVER path on CUDA
        m_l_n_sizes = [(m // 2, m) for m in ms]
-        matrix_sizes = m_ge_n_sizes + (m_l_n_sizes if device == 'cpu' else [])
+        matrix_sizes = m_ge_n_sizes + (m_l_n_sizes if cusolver_available else [])


maybe use (cusolver_available or device == 'cpu') to test both?
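
A small sketch of how the suggested condition would select test shapes. The helper name `lstsq_test_sizes` and the definition of `m_ge_n_sizes` are assumptions made for illustration; only `m_l_n_sizes` and the combined condition come from the snippet above.

```python
def lstsq_test_sizes(ms, device, cusolver_available):
    """Return the (m, n) shapes to test for a given device.

    m_ge_n_sizes is a hypothetical definition; the real test may differ.
    """
    m_ge_n_sizes = [(m, m // 2) for m in ms] + [(m, m) for m in ms]
    m_l_n_sizes = [(m // 2, m) for m in ms]
    # Cases with m < n are supported on CPU and on the cuSOLVER path on CUDA.
    supports_m_l_n = device == 'cpu' or cusolver_available
    return m_ge_n_sizes + (m_l_n_sizes if supports_m_l_n else [])


for device, cusolver in [('cpu', False), ('cuda', False), ('cuda', True)]:
    sizes = lstsq_test_sizes([4, 8], device, cusolver)
    print(device, cusolver, (2, 4) in sizes)
```

With the condition written this way, CPU runs keep covering the m < n shapes even on builds where cuSOLVER is unavailable.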


mruberry

Nice work all! Thanks for reviewing, @nikitaved, @xwang233

mruberry · 2021-05-06T00:04:13Z

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

mruberry · 2021-05-06T05:45:34Z

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2021-05-06T11:46:14Z

@mruberry merged this pull request in 7b31d42.

samestep · 2021-05-06T16:24:52Z

Reverting this PR because it broke one of the Windows test jobs: https://app.circleci.com/pipelines/github/pytorch/pytorch/317376/workflows/463399f8-78ef-4894-a9bf-8b666943efc2/jobs/13217419

facebook-github-bot · 2021-05-06T16:29:37Z

This pull request has been reverted by 72ebdd6.

mruberry · 2021-05-07T08:13:13Z

This diff was reverted, but the previous commits in the stack were not, I think. Link to why it was reverted:

https://app.circleci.com/pipelines/github/pytorch/pytorch/317452/workflows/c6beb886-c8ac-4cb5-bdbc-9cf351f89b7c/jobs/13220476

It broke pytorch_windows_vs2019_py36_cuda10.1_test2 and tests test_linalg_lstsq_input_checks_cuda_complex128, test_linalg_lstsq_input_checks_cuda_complex64, test_linalg_lstsq_input_checks_cuda_float32, and test_linalg_lstsq_input_checks_cuda_float64.

Sample failure snippet:

Traceback (most recent call last):
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 292, in instantiated_test
    result = test_fn(self, *args)
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 617, in dep_fn
    return fn(slf, device, *args, **kwargs)
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 617, in dep_fn
    return fn(slf, device, *args, **kwargs)
  File "test_linalg.py", line 401, in test_linalg_lstsq_input_checks
    torch.linalg.lstsq(a, b)
AssertionError: RuntimeError not raised

The easiest way to reland the rest of the stack is probably to rebase the uncommitted PRs on nightly with the fix. We can run the updated PR through ci/all to validate this build is fixed, too.


IvanYashchuk · 2021-05-07T13:56:37Z

@mruberry, I fixed the problem with that Windows CUDA 10.1 build. Here is the ci-all PR #57816.

The problem was that the condition of cuSOLVER availability was not correct in the test. I think we should consider adding a more robust way to check from Python whether cuSOLVER is used in PyTorch.

We use cuSOLVER if the CUDA version is >= 10.1.243, but torch._C._cuda_getCompiledVersion() and torch.version.cuda report only the major and minor versions ('10.1'). The current version of this PR checks that an error is thrown for underdetermined input (m < n) only on CUDA versions < 10.1. In fact, the underdetermined case does not work and raises the error on all CUDA versions < 10.1.243, so raising the error is not tested for versions from 10.1.0 up to 10.1.243.
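
The gap can be sketched with plain tuple comparisons. The 10.1.243 threshold and the "< 10.1" test condition are taken from the comment above; the helper names and the sample build versions are hypothetical and only illustrate the reasoning.

```python
CUSOLVER_MIN_VERSION = (10, 1, 243)  # threshold quoted in the comment above


def parse_version(text):
    """Turn a dotted version string such as '10.1.243' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def cusolver_path_used(full_version):
    """True if a build with this full CUDA version would use cuSOLVER."""
    return parse_version(full_version) >= CUSOLVER_MIN_VERSION


def error_is_checked(reported_version):
    """True if the test asserts the m < n error, given only 'major.minor'."""
    return parse_version(reported_version) < (10, 1)


def untested_gap(full_version, reported_version):
    """True if the m < n error is raised but the test does not assert it."""
    raises_error = not cusolver_path_used(full_version)
    return raises_error and not error_is_checked(reported_version)


# Hypothetical builds: (full CUDA version, what Python can see).
for full, reported in [("10.0.130", "10.0"), ("10.1.105", "10.1"), ("10.1.243", "10.1")]:
    print(full, reported, untested_gap(full, reported))
```

Only the middle build falls into the gap: it raises the error for m < n, yet from Python it is indistinguishable from a 10.1.243 build.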

Summary:
Pull Request resolved: pytorch#57317

This PR implements QR-based least squares solver using geqrf, ormqr, and triangular_solve operations. Internal code of triangular_solve was fixed to handle correctly larger sized rectangular arrays.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28242069

Pulled By: mruberry

fbshipit-source-id: 23979d19ccc7f591afa8df4435d0db847e2d0d97

mruberry · 2021-05-08T23:39:40Z


Thanks @IvanYashchuk, and thanks for the thorough analysis. So users with a CUDA version between 10.1 and 10.1.243 will get the correct behavior (we think), but our test suite will report the behavior as incorrect?

mruberry · 2021-05-08T23:43:21Z

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

IvanYashchuk · 2021-05-10T05:36:15Z

> So users with a CUDA version between 10.1 and 10.1.243 will get the correct behavior (we think)

Yes, but our test suite doesn't test the behavior for these versions, so the tests will pass.


Summary:
Pull Request resolved: pytorch#57317

This PR implements QR-based least squares solver using geqrf, ormqr, and triangular_solve operations. Internal code of triangular_solve was fixed to handle correctly larger sized rectangular arrays.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28312683

Pulled By: mruberry

fbshipit-source-id: dc8ae837a5fb0685d85c8733a47d7d25dc46443a

Add cuSOLVER path for torch.linalg.lstsq

587a6f4


facebook-github-bot added the cla signed label Apr 29, 2021

IvanYashchuk requested review from lezcano, mruberry and nikitaved April 29, 2021 22:25

IvanYashchuk added the module: linear algebra label Apr 29, 2021

IvanYashchuk mentioned this pull request Apr 29, 2021

Linear algebra GPU backend tracking issue [magma/cusolver/cublas] #47953

Open

pytorchbot added the open source label Apr 29, 2021

Update on "Add cuSOLVER path for torch.linalg.lstsq"

f6baa61


mruberry requested a review from xwang233 May 1, 2021 23:32

lezcano approved these changes May 3, 2021


lezcano reviewed May 3, 2021


xwang233 approved these changes May 3, 2021


Update on "Add cuSOLVER path for torch.linalg.lstsq"

284e13c


IvanYashchuk mentioned this pull request May 4, 2021

Added cuBLAS path for torch.linalg.lstsq #54725

Closed

Update on "Add cuSOLVER path for torch.linalg.lstsq"

1a44aab


Update on "Add cuSOLVER path for torch.linalg.lstsq"

f8c230d


Update on "Add cuSOLVER path for torch.linalg.lstsq"

9030d2c


mruberry approved these changes May 6, 2021


facebook-github-bot closed this in 7b31d42 May 6, 2021

facebook-github-bot added the Merged label May 6, 2021

facebook-github-bot added the Reverted label May 6, 2021

IvanYashchuk reopened this May 7, 2021

facebook-github-bot closed this in d11cce4 May 10, 2021

facebook-github-bot deleted the gh/ivanyashchuk/29/head branch May 13, 2021 14:17

IvanYashchuk removed the Reverted label Nov 30, 2021

raj-magesh mentioned this pull request Sep 14, 2022

[Missing documentation] cuSOLVER path for linalg.lstsq for underdetermined linear systems (on GPU) #85021

Closed
