
Conversation

@IvanYashchuk (Collaborator) commented Apr 16, 2021

Stack from ghstack:

Differential Revision: D27960151

@facebook-github-bot (Contributor) commented Apr 16, 2021

💊 CI failures summary and remediations

As of commit 87182f9 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test2 (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

======================================================================
FAIL [1.157s]: test_cudnn_multiple_threads_same_device (__main__.TestCuda)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 439, in wrapper
    fn(*args, **kwargs)
  File "test_cuda.py", line 2505, in test_cudnn_multiple_threads_same_device
    (2048 - test_iters) * (2048 - test_iters))
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 1397, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Scalars failed to compare as equal! Comparing 2992900.0 and 1098304 gives a difference of 1894596.0, but the allowed difference with rtol=1.3e-06 and atol=1e-05 is only 1.4278052!

----------------------------------------------------------------------
Ran 159 tests in 85.431s

FAILED (failures=1, skipped=67)

Generating XML reports...
Generated XML report: test-reports\dist-gloo\test_cuda\TEST-TestCuda-20210429235051.xml
Generated XML report: test-reports\dist-gloo\test_cuda\TEST-TestCudaComm-20210429235051.xml
Traceback (most recent call last):

This comment was automatically generated by Dr. CI.

IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 16, 2021
@IvanYashchuk added the module: linear algebra label (Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul) Apr 16, 2021
IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 16, 2021
IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 19, 2021
@IvanYashchuk removed the request for review from antocuni April 20, 2021 16:50
IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 20, 2021
@mruberry (Collaborator) commented:

@lezcano would you sanity check this?

IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 26, 2021
@lezcano lezcano self-requested a review April 26, 2021 10:35
@antocuni (Contributor) left a comment:

LGTM.
If I understand it correctly, this PR by itself is mostly a no-op: it introduces _linalg_qr_out_helper but never uses it apart from re-implementing the old _linalg_qr_helper. I assume it will be used by subsequent PRs in the ghstack.

TORCH_INTERNAL_ASSERT(Q.sizes().equals(expected_Q_shape));

// Q tensor must be in batched column major order (Fortran contiguous)
TORCH_INTERNAL_ASSERT(Q.transpose(-2, -1).is_contiguous());
@antocuni (Contributor):

This has nothing to do with this PR; it's just a random idea that came to mind while reading it.
Should we have something like is_fortran_contiguous()? And maybe even introduce at::MemoryFormat::FortranContiguous, which would be useful when creating tensors.
I have seen similar patterns all over the linalg code, and having first-class support for Fortran-style memory would probably simplify a lot of it.
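
For illustration, a minimal sketch of what such a helper could look like (is_batched_fortran_contiguous is a hypothetical name, not an existing ATen function); it simply wraps the check this PR already performs:

#include <ATen/ATen.h>

// Hypothetical helper, not part of ATen or this PR: checks the "batched
// column-major" layout the linalg code expects, i.e. swapping the last two
// dimensions yields a C-contiguous tensor.
bool is_batched_fortran_contiguous(const at::Tensor& t) {
  TORCH_INTERNAL_ASSERT(t.dim() >= 2);
  return t.transpose(-2, -1).is_contiguous();
}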

@IvanYashchuk (Collaborator, Author):

In the linalg code we only need the last two dimensions to be Fortran contiguous, while all the other dimensions stay C contiguous. A truly Fortran-style memory layout would put the matrix in the first two dimensions and the batches at the end, which is not useful to have.
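
For reference, a minimal sketch of the usual idiom for allocating such a tensor (column-major in the last two dimensions, C-contiguous across the batch dimensions), assuming input is the batched matrix argument; this is the same empty-plus-transpose pattern used further down in this PR:

// Create the tensor with the last two dimensions swapped, then swap them
// back in place, so only those two dimensions end up column-major.
at::Tensor result = at::empty(input.transpose(-2, -1).sizes(), input.options())
                        .transpose_(-2, -1);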

@lezcano (Collaborator) left a comment:

It looks very good. I just found one part that was a bit difficult to follow. I made a proposal for a simpler way to handle it, which I think should save one allocation.

If this sounds reasonable, the handling of the "unpacking" of the tensors after the geqrf call should be changed as well.

// geqrf requires m x n workspace input that is modified in-place
// if m > n and reduced==true we use Q tensor for storing the result of geqrf operation
// otherwise R tensor is used
Tensor QR = R;
@lezcano (Collaborator):

This logic is a bit difficult to follow; it's hard to see that all the cases are covered. I think it'd be better to initialise it with a lambda and an if/else structure that spells out all the else branches:

const Tensor QR = [&]{
  if (m <= n) {
    return R;
  } else {
    if (compute_q) {
      return reduced_mode ? Q : R;
    } else {
      return at::empty(input.transpose(-2, -1).sizes(), input.options()).transpose_(-2, -1);
    }
  }
}();

Note that the current implementation allocates a new tensor whenever compute_q == false, which is not necessary.

@lezcano (Collaborator):

An analogous if/else (without the lambda) should be written below to handle the unpacking of the result.
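
For concreteness, a hypothetical skeleton of what that mirrored if/else could look like (the actual copies depend on the helper's code; m, n, compute_q, reduced_mode and QR are the ones from the snippet above):

// Sketch only: move the factors out of the geqrf workspace QR into Q and R,
// mirroring the branches that chose the workspace above.
if (m <= n) {
  // the workspace was R itself
} else {
  if (compute_q) {
    // the workspace was Q (reduced mode) or R (full mode)
  } else {
    // the workspace was a scratch tensor; only R needs to be filled
  }
}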

@IvanYashchuk (Collaborator, Author):

the current implementation allocates a new tensor whenever compute_q == false, which is not necessary

Right, I'll fix that.

@IvanYashchuk (Collaborator, Author):

I changed the code to use your conditions.

@mruberry (Collaborator) commented:

@lezcano would you take a look at this PR?

@IvanYashchuk (Collaborator, Author) commented:

If I understand it correctly, this PR by itself is mostly a no-op: it introduces _linalg_qr_out_helper but never uses it apart from re-implementing the old _linalg_qr_helper. I assume it will be used by subsequent PRs in the ghstack.

The previous helper was allocating memory on the go, with narrowing and resizes, which is not a good strategy. Here we first allocate memory of the correct size and then fill it with the result.
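
As a standalone illustration of the difference (hypothetical names, not the helper's actual code): the outputs are created at their final size and layout up front, so the LAPACK routines can write into them directly without any later resize or narrow.

#include <ATen/ATen.h>
#include <algorithm>

// Hypothetical sketch of the "allocate first, then fill" strategy for the
// reduced QR of a single m x n matrix; names and shapes are illustrative.
void qr_preallocate_sketch(const at::Tensor& input, at::Tensor& Q, at::Tensor& R) {
  const int64_t m = input.size(-2);
  const int64_t n = input.size(-1);
  const int64_t k = std::min(m, n);
  // Allocate the outputs at their final sizes in column-major order.
  Q = at::empty({k, m}, input.options()).transpose_(-2, -1);  // m x k, Fortran contiguous
  R = at::empty({n, k}, input.options()).transpose_(-2, -1);  // k x n, Fortran contiguous
  // ... geqrf/orgqr would then fill Q and R in place ...
}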

IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 29, 2021
@lezcano (Collaborator) left a comment:

LGTM!

@mruberry (Collaborator) left a comment:

Stamped

@facebook-github-bot (Contributor) commented:

@mruberry merged this pull request in d5e1cac.

@facebook-github-bot deleted the gh/ivanyashchuk/15/head branch May 4, 2021 14:16
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary: Pull Request resolved: pytorch#56254

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27960151

Pulled By: mruberry

fbshipit-source-id: 4067afed0dcca3f32d0fa153e50a268a850817b2

Labels

cla signed, Merged, module: linear algebra (Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul), open source
