Port CUDA torch.geqrf to ATen #56251
Conversation
This PR ports `torch.geqrf` from TH to ATen for the CUDA path. Resolves #24569
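For context, `geqrf` computes a QR factorization via Householder reflections, returning R in the upper triangle together with the scalar factors `tau` of the elementary reflectors. A pure-Python sketch of the underlying algorithm (illustrative only, not the MAGMA/LAPACK implementation; it accumulates Q explicitly so the result can be checked, whereas `geqrf` stores the reflector vectors below the diagonal together with `tau`):

```python
import math

def householder_qr(A):
    # Pure-Python illustration of Householder QR, the algorithm behind
    # LAPACK's geqrf. A is an m x n list of rows with m >= n. Returns
    # Q (m x m) and R (m x n) such that A = Q @ R.
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[float(i == j) for j in range(m)] for i in range(m)]
    for k in range(n):
        # Build the Householder vector that zeroes column k below the diagonal.
        x = [R[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])
        norm_v = math.sqrt(sum(t * t for t in v))
        v = [t / norm_v for t in v]
        # Apply the reflector H = I - 2 v v^T to the trailing block of R.
        for j in range(k, n):
            dot = sum(v[i] * R[k + i][j] for i in range(m - k))
            for i in range(m - k):
                R[k + i][j] -= 2.0 * dot * v[i]
        # Accumulate Q = Q * H so the factorization can be verified.
        for r in range(m):
            dot = sum(Q[r][k + i] * v[i] for i in range(m - k))
            for i in range(m - k):
                Q[r][k + i] -= 2.0 * dot * v[i]
    return Q, R
```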
💊 CI failures summary: as of commit 2843955, there are no failures. (Automatically generated by Dr. CI.)
```diff
 dispatch:
-  CPU: geqrf
-  CUDA: legacy::cuda::_th_geqrf
+  CPU, CUDA: geqrf
```
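The diff above changes the operator's registration in `native_functions.yaml` so that both backends dispatch to the same ATen kernel. A sketch of what the resulting entry looks like (the exact function signature here is an assumption and may differ from upstream):

```yaml
# Hypothetical shape of the native_functions.yaml entry after this PR:
- func: geqrf(Tensor self) -> (Tensor a, Tensor tau)
  dispatch:
    CPU, CUDA: geqrf
```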
You don't have to do it in this PR, but consider making this structured! https://github.com/pytorch/rfcs/blob/rfc-0005/RFC-0005-structured-kernel-definitions.md
```cpp
// magmaGeqrf uses a hybrid CPU-GPU algorithm to compute the elementary reflectors.
// The driver routine geqrf2_gpu accepts a tensor on the CPU for elementary reflectors.
Tensor tau_cpu = at::empty(tau.sizes(), tau.options().device(at::kCPU));
```
Does it make sense to allocate this in pinned memory and use a non-blocking copy below? The effect is probably negligible, but I don't know off the top of my head.
I think it wouldn't hurt.
Is it simply `Tensor tau_cpu = at::empty(tau.sizes(), tau.options().device(at::kCPU).pinned_memory(true));`?
Yes, and `/*non_blocking=*/true` for `copy_` below, iirc.
Thank you! It gave a non-negligible speedup:
| size | before | after |
|------------------------------|--------|-------|
| torch.Size([2, 2]) | 3.0 | 2.3 |
| torch.Size([2, 2, 2]) | 5.8 | 4.4 |
| torch.Size([32, 2, 2]) | 88.3 | 70.6 |
| torch.Size([8, 8]) | 2.9 | 2.2 |
| torch.Size([2, 8, 8]) | 5.7 | 4.4 |
| torch.Size([32, 8, 8]) | 88.3 | 70.7 |
| torch.Size([16, 16]) | 3.0 | 2.2 |
| torch.Size([2, 16, 16]) | 5.7 | 4.5 |
| torch.Size([32, 16, 16]) | 89.0 | 72.4 |
| torch.Size([32, 32]) | 3.0 | 2.3 |
| torch.Size([2, 32, 32]) | 5.8 | 4.6 |
| torch.Size([32, 32, 32]) | 91.6 | 72.7 |
| torch.Size([64, 64]) | 3.1 | 2.4 |
| torch.Size([2, 64, 64]) | 6.0 | 4.7 |
| torch.Size([32, 64, 64]) | 93.1 | 74.0 |
| torch.Size([128, 128]) | 3.5 | 2.8 |
| torch.Size([2, 128, 128]) | 6.9 | 5.5 |
| torch.Size([32, 128, 128]) | 107.8 | 88.1 |
| torch.Size([256, 256]) | 4.7 | 3.7 |
| torch.Size([2, 256, 256]) | 9.2 | 7.2 |
| torch.Size([32, 256, 256]) | 144.1 | 112.1 |
| torch.Size([512, 512]) | 6.9 | 6.0 |
| torch.Size([2, 512, 512]) | 13.3 | 11.8 |
| torch.Size([32, 512, 512]) | 209.3 | 184.6 |
| torch.Size([1024, 1024]) | 15.4 | 14.6 |
| torch.Size([2, 1024, 1024]) | 30.5 | 28.9 |
| torch.Size([32, 1024, 1024]) | 486.5 | 462.4 |
Times are in milliseconds (ms).
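For reference, per-call timings like these are typically collected with a small benchmark harness; the actual tool used above isn't shown, so here is only a minimal stdlib sketch of the idea. A real harness for this PR would also call `torch.cuda.synchronize()` around the timed region so asynchronous CUDA work is fully counted:

```python
import time

def bench_ms(fn, iters=50):
    # Warm up once, then report mean wall-clock time per call in milliseconds.
    # For GPU work the device must be synchronized before and after timing;
    # this toy version only measures host-side time.
    fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1e3

ms = bench_ms(lambda: sum(i * i for i in range(10_000)))
print(f"{ms:.3f} ms per call")
```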
I just hope MAGMA synchronizes when it writes tau, otherwise the non-blocking copy might produce a wrong result (it would copy tau to the GPU before it's properly populated).
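The hazard raised here can be modeled without a GPU. A purely illustrative Python toy, with a thread standing in for the MAGMA kernel and an event standing in for stream synchronization: without the wait, the "copy" could observe a half-filled buffer; with it, the copy always sees the final values.

```python
import threading
import time

def produce(buf, done):
    # Stand-in for the MAGMA kernel: fills the buffer, then signals completion.
    for i in range(len(buf)):
        buf[i] = i + 1
        time.sleep(0.001)
    done.set()

def async_copy(buf, out, sync_event=None):
    # Stand-in for a non-blocking copy: snapshots the buffer. Without
    # sync_event it may run while the producer is still writing.
    if sync_event is not None:
        sync_event.wait()
    out[:] = buf

buf = [0] * 8
copied = [0] * 8
done = threading.Event()
t = threading.Thread(target=produce, args=(buf, done))
t.start()
async_copy(buf, copied, sync_event=done)  # safe: waits for the producer
t.join()
assert copied == list(range(1, 9))
```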
Hey @IvanYashchuk, unfortunately the internal tools are unable to merge this; would you try rebasing the stack now that the first 2 PRs are in?
@mruberry, I rebased the stack.
Summary: Pull Request resolved: pytorch#56251

This PR ports `torch.geqrf` from TH to ATen for the CUDA path. Resolves pytorch#24569

Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27960155
Pulled By: mruberry
fbshipit-source-id: a8b010c41d703a5de4bf40b045c89e6b95b5a5ca
Stack from ghstack:
This PR ports `torch.geqrf` from TH to ATen for the CUDA path. Resolves #24569
Differential Revision: D27960155