Skip to content

Conversation

@IvanYashchuk
Copy link
Collaborator

@IvanYashchuk IvanYashchuk commented Apr 29, 2021

Stack from ghstack:

This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

Differential Revision: D28312683

This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Apr 29, 2021

💊 CI failures summary and remediations

As of commit b105517 (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

2 failures not recognized by patterns:

Job Step Action
GitHub Actions mypy Unknown 🔁 rerun
GitHub Actions flake8-py3 Unknown 🔁 rerun

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

IvanYashchuk added a commit that referenced this pull request Apr 29, 2021
This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

ghstack-source-id: 2c88768
Pull Request resolved: #57317
@IvanYashchuk IvanYashchuk added the module: linear algebra Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul label Apr 29, 2021
This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

[ghstack-poisoned]
IvanYashchuk added a commit that referenced this pull request May 1, 2021
This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

ghstack-source-id: 6955a81
Pull Request resolved: #57317
@mruberry mruberry requested a review from xwang233 May 1, 2021 23:32
@mruberry
Copy link
Collaborator

mruberry commented May 1, 2021

@xwang233 and @lezcano and/or @nikitaved, would you review this, please?

Copy link
Collaborator

@lezcano lezcano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! The logic is as clean as it can be. I just left a small comment on a bit that I found slightly more difficult to understand.

:attr:`driver` chooses the LAPACK/MAGMA function that will be used.
For CPU inputs the valid values are `'gels'`, `'gelsy'`, `'gelsd`, `'gelss'`.
For CUDA input, the only valid driver is `'gels'`, which assumes that :attr:`A` is full-rank and `m < n`.
For CUDA input, the only valid driver is `'gels'`, which assumes that :attr:`A` is full-rank.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

const_cast<Tensor&>(infos),
upper, transpose, conjugate_transpose, unitriangular);

B.narrow(-2, m, n - m).zero_();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because triangular_solve_kernel writes its output into the first m elements of B, right? Could you leave a comment explaining this here?

Copy link
Collaborator

@xwang233 xwang233 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the PR!

# cases m < n are only supported on CPU and for cuSOLVER path on CUDA
m_l_n_sizes = [(m // 2, m) for m in ms]
matrix_sizes = m_ge_n_sizes + (m_l_n_sizes if device == 'cpu' else [])
matrix_sizes = m_ge_n_sizes + (m_l_n_sizes if cusolver_available else [])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use (cusolver_available or device == 'cpu') to test both?

This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

[ghstack-poisoned]
IvanYashchuk added a commit that referenced this pull request May 4, 2021
This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

ghstack-source-id: cfdf68e
Pull Request resolved: #57317
This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

[ghstack-poisoned]
IvanYashchuk added a commit that referenced this pull request May 4, 2021
This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

ghstack-source-id: 86d64f8
Pull Request resolved: #57317
This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

[ghstack-poisoned]
IvanYashchuk added a commit that referenced this pull request May 4, 2021
This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

ghstack-source-id: 40dc912
Pull Request resolved: #57317
This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

[ghstack-poisoned]
IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request May 4, 2021
This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

ghstack-source-id: e7d2246
Pull Request resolved: pytorch#57317
Copy link
Collaborator

@mruberry mruberry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work all! Thanks for reviewing, @nikitaved, @xwang233

@mruberry
Copy link
Collaborator

mruberry commented May 6, 2021

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

1 similar comment
@mruberry
Copy link
Collaborator

mruberry commented May 6, 2021

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mruberry merged this pull request in 7b31d42.

@samestep
Copy link
Contributor

samestep commented May 6, 2021

Reverting this PR because it broke one of the Windows test jobs: https://app.circleci.com/pipelines/github/pytorch/pytorch/317376/workflows/463399f8-78ef-4894-a9bf-8b666943efc2/jobs/13217419

@facebook-github-bot
Copy link
Contributor

This pull request has been reverted by 72ebdd6.

@mruberry
Copy link
Collaborator

mruberry commented May 7, 2021

This diff was revert, but the previous commits in the stack were not, I think. Link to why it was reverted:

https://app.circleci.com/pipelines/github/pytorch/pytorch/317452/workflows/c6beb886-c8ac-4cb5-bdbc-9cf351f89b7c/jobs/13220476

It broke pytorch_windows_vs2019_py36_cuda10.1_test2 and tests test_linalg_lstsq_input_checks_cuda_complex128, test_linalg_lstsq_input_checks_cuda_complex64, test_linalg_lstsq_input_checks_cuda_float32, and test_linalg_lstsq_input_checks_cuda_float64.

Sample failure snippet:

Traceback (most recent call last):
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 292, in instantiated_test
    result = test_fn(self, *args)
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 617, in dep_fn
    return fn(slf, device, *args, **kwargs)
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 617, in dep_fn
    return fn(slf, device, *args, **kwargs)
  File "test_linalg.py", line 401, in test_linalg_lstsq_input_checks
    torch.linalg.lstsq(a, b)
AssertionError: RuntimeError not raised

The easiest way to reland the rest of the stack is probably to rebase the uncommitted PRs on nightly with the fix. We can run the updated PR through ci/all to validate this build is fixed, too.

@IvanYashchuk IvanYashchuk reopened this May 7, 2021
This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

Differential Revision: [D28242069](https://our.internmc.facebook.com/intern/diff/D28242069)

[ghstack-poisoned]
IvanYashchuk added a commit that referenced this pull request May 7, 2021
This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

ghstack-source-id: aada19b
Pull Request resolved: #57317
@IvanYashchuk
Copy link
Collaborator Author

@mruberry, I fixed the problem with that Windows CUDA 10.1 build. Here is the ci-all PR #57816.

The problem was that the condition of cuSOLVER availability was not correct in the test. I think we should consider adding a more robust way to check from Python whether cuSOLVER is used in PyTorch.

We use cuSOLVER if CUDA version is >= 10.1.243, but torch._C._cuda_getCompiledVersion() or torch.version.cuda output only major and minor versions ('10.1'). The current version of this PR checks that for the case of underdetermined input (m<n) the error is thrown for CUDA versions < 10.1. However, the underdetermined case is actually not working and raises the error for CUDA versions < 10.1.243, so raising the error is not tested for versions from 10.1.0 to 10.1.243.

mrshenli pushed a commit to mrshenli/pytorch that referenced this pull request May 8, 2021
Summary:
Pull Request resolved: pytorch#57317

This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28242069

Pulled By: mruberry

fbshipit-source-id: 23979d19ccc7f591afa8df4435d0db847e2d0d97
@mruberry
Copy link
Collaborator

mruberry commented May 8, 2021

@mruberry, I fixed the problem with that Windows CUDA 10.1 build. Here is the ci-all PR #57816.

The problem was that the condition of cuSOLVER availability was not correct in the test. I think we should consider adding a more robust way to check from Python whether cuSOLVER is used in PyTorch.

We use cuSOLVER if CUDA version is >= 10.1.243, but torch._C._cuda_getCompiledVersion() or torch.version.cuda output only major and minor versions ('10.1'). The current version of this PR checks that for the case of underdetermined input (m<n) the error is thrown for CUDA versions < 10.1. However, the underdetermined case is actually not working and raises the error for CUDA versions < 10.1.243, so raising the error is not tested for versions from 10.1.0 to 10.1.243.

Thanks @IvanYashchuk, and thanks for the thorough analysis. So users with a CUDA version between 10.1 and 10.1.243 will get the correct behavior (we think), but our test suite will report the behavior as incorrect?

@mruberry
Copy link
Collaborator

mruberry commented May 8, 2021

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@IvanYashchuk
Copy link
Collaborator Author

So users with a CUDA version between 10.1 and 10.1.243 will get the correct behavior (we think)

Yes, but our test suite doesn't test the behavior for these versions, the tests will pass.

@facebook-github-bot facebook-github-bot deleted the gh/ivanyashchuk/29/head branch May 13, 2021 14:17
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
Pull Request resolved: pytorch#57317

This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28242069

Pulled By: mruberry

fbshipit-source-id: 23979d19ccc7f591afa8df4435d0db847e2d0d97
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
Pull Request resolved: pytorch#57317

This PR implements QR-based least squares solver using geqrf, ormqr, and
triangular_solve operations.

Internal code of triangular_solve was fixed to handle correctly larger
sized rectangular arrays.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28312683

Pulled By: mruberry

fbshipit-source-id: dc8ae837a5fb0685d85c8733a47d7d25dc46443a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed Merged module: linear algebra Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul open source

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants