
Conversation

IvanYashchuk (Collaborator) commented Apr 16, 2021

Stack from ghstack:

This PR ports `torch.geqrf` from TH to ATen for the CUDA path.

Resolves #24569

Differential Revision: D27960155
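
For readers unfamiliar with the op, here is a brief usage sketch (illustrative only, not code from this PR) of what geqrf computes via the ATen C++ API; the shapes and device below are arbitrary assumptions:

```cpp
#include <ATen/ATen.h>
#include <tuple>

int main() {
  // geqrf returns the QR factorization in LAPACK's packed form:
  // `a` holds R in its upper triangle and the Householder reflectors
  // below the diagonal; `tau` holds the reflectors' scaling factors.
  // A CUDA-enabled build and device are needed for the path touched by this PR.
  at::Tensor input = at::randn({4, 3}, at::TensorOptions().device(at::kCUDA));
  at::Tensor a, tau;
  std::tie(a, tau) = at::geqrf(input);
  return 0;
}
```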

facebook-github-bot (Contributor) commented Apr 16, 2021

💊 CI failures summary and remediations

As of commit 2843955 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 16, 2021
This PR ports `torch.geqrf` from TH to ATen for CUDA path.

Resolves pytorch#24569

ghstack-source-id: 9d6b500
Pull Request resolved: pytorch#56251
IvanYashchuk added the module: porting and module: linear algebra labels Apr 16, 2021
IvanYashchuk requested review from mruberry and ngimel and removed the review request for ezyang April 16, 2021 09:59
IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 16, 2021
This PR ports `torch.geqrf` from TH to ATen for CUDA path.

Resolves pytorch#24569

ghstack-source-id: ecdc580
Pull Request resolved: pytorch#56251
The corresponding dispatch change for geqrf in aten/src/ATen/native/native_functions.yaml replaces the legacy TH CUDA binding with the shared ATen kernel:

 dispatch:
-  CPU: geqrf
-  CUDA: legacy::cuda::_th_geqrf
+  CPU, CUDA: geqrf
// magmaGeqrf uses a hybrid CPU-GPU algorithm to compute the elementary reflectors.
// The driver routine geqrf2_gpu accepts a tensor on the CPU for elementary reflectors.
Tensor tau_cpu = at::empty(tau.sizes(), tau.options().device(at::kCPU));
Collaborator:
does it make sense to allocate this in pinned memory and use non-blocking copy below? The effect is probably negligible, but I don't know off the top of my head.

Collaborator (author):
I think it wouldn't hurt.
Is it simply `Tensor tau_cpu = at::empty(tau.sizes(), tau.options().device(at::kCPU).pinned_memory(true));`?

Collaborator:
Yes, and `/*non_blocking=*/true` for `copy_` below, iirc.
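
Putting the two suggestions together, a minimal sketch of the discussed change (illustrative; it assumes the surrounding geqrf helper where `tau` is the CUDA output tensor, and is not the PR's literal diff):

```cpp
// Allocate the CPU-side tau in pinned (page-locked) memory so the copy back
// to the CUDA tensor can be issued as a non-blocking, asynchronous transfer.
Tensor tau_cpu = at::empty(
    tau.sizes(),
    tau.options().device(at::kCPU).pinned_memory(true));

// ... magmaGeqrf / geqrf2_gpu populates tau_cpu with the elementary reflectors ...

// Non-blocking copy back to the device; this is only asynchronous because the
// source tensor lives in pinned memory.
tau.copy_(tau_cpu, /*non_blocking=*/true);
```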

Collaborator (author):
Thank you! It did give a non-negligible speedup:

| shape                        | before (ms) | after (ms) |
|------------------------------|-------------|------------|
| torch.Size([2, 2])           | 3.0    | 2.3   |
| torch.Size([2, 2, 2])        | 5.8    | 4.4   |
| torch.Size([32, 2, 2])       | 88.3   | 70.6  |
| torch.Size([8, 8])           | 2.9    | 2.2   |
| torch.Size([2, 8, 8])        | 5.7    | 4.4   |
| torch.Size([32, 8, 8])       | 88.3   | 70.7  |
| torch.Size([16, 16])         | 3.0    | 2.2   |
| torch.Size([2, 16, 16])      | 5.7    | 4.5   |
| torch.Size([32, 16, 16])     | 89.0   | 72.4  |
| torch.Size([32, 32])         | 3.0    | 2.3   |
| torch.Size([2, 32, 32])      | 5.8    | 4.6   |
| torch.Size([32, 32, 32])     | 91.6   | 72.7  |
| torch.Size([64, 64])         | 3.1    | 2.4   |
| torch.Size([2, 64, 64])      | 6.0    | 4.7   |
| torch.Size([32, 64, 64])     | 93.1   | 74.0  |
| torch.Size([128, 128])       | 3.5    | 2.8   |
| torch.Size([2, 128, 128])    | 6.9    | 5.5   |
| torch.Size([32, 128, 128])   | 107.8  | 88.1  |
| torch.Size([256, 256])       | 4.7    | 3.7   |
| torch.Size([2, 256, 256])    | 9.2    | 7.2   |
| torch.Size([32, 256, 256])   | 144.1  | 112.1 |
| torch.Size([512, 512])       | 6.9    | 6.0   |
| torch.Size([2, 512, 512])    | 13.3   | 11.8  |
| torch.Size([32, 512, 512])   | 209.3  | 184.6 |
| torch.Size([1024, 1024])     | 15.4   | 14.6  |
| torch.Size([2, 1024, 1024])  | 30.5   | 28.9  |
| torch.Size([32, 1024, 1024]) | 486.5  | 462.4 |

Times are in milliseconds (ms).

Collaborator:
I just hope MAGMA synchronizes when it writes tau, otherwise the non-blocking copy might produce a wrong result (it would copy tau to the GPU before it's properly populated).
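
If that turned out to be a problem, one heavy-handed safeguard would be a device synchronization before the copy; this is purely an illustration of the concern (an assumption, not the PR's code, and not a claim about MAGMA's actual behaviour):

```cpp
#include <cuda_runtime.h>
#include <c10/cuda/CUDAException.h>

// Wait for all outstanding device work (including any asynchronous writes to
// tau_cpu issued by MAGMA) before enqueueing the non-blocking copy, so the
// copy cannot read tau_cpu before it is fully populated.
C10_CUDA_CHECK(cudaDeviceSynchronize());
tau.copy_(tau_cpu, /*non_blocking=*/true);
```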

IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 19, 2021
This PR ports `torch.geqrf` from TH to ATen for CUDA path.

Resolves pytorch#24569

ghstack-source-id: 3827eca
Pull Request resolved: pytorch#56251
IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 20, 2021
This PR ports `torch.geqrf` from TH to ATen for CUDA path.

Resolves pytorch#24569

ghstack-source-id: 3e76211
Pull Request resolved: pytorch#56251
IvanYashchuk (author):
@ngimel, @mruberry I updated this PR to use pinned memory and non-blocking copy for the tau_cpu tensor that is passed to MAGMA.

mruberry (Collaborator):
Hey @IvanYashchuk, unfortunately the internal tools are unable to merge this; would you try rebasing the stack now that the first 2 PRs are in?

IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 26, 2021
This PR ports `torch.geqrf` from TH to ATen for CUDA path.

Resolves pytorch#24569

ghstack-source-id: 501c5f3
Pull Request resolved: pytorch#56251
IvanYashchuk (author):
@mruberry, I rebased the stack.

facebook-github-bot (Contributor):
@mruberry merged this pull request in f84f206.

facebook-github-bot deleted the gh/ivanyashchuk/12/head branch April 30, 2021 14:16
crcrpar pushed a commit to crcrpar/pytorch that referenced this pull request May 7, 2021
Summary:
Pull Request resolved: pytorch#56251

This PR ports `torch.geqrf` from TH to ATen for CUDA path.

Resolves pytorch#24569

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27960155

Pulled By: mruberry

fbshipit-source-id: a8b010c41d703a5de4bf40b045c89e6b95b5a5ca
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
Pull Request resolved: pytorch#56251

This PR ports `torch.geqrf` from TH to ATen for CUDA path.

Resolves pytorch#24569

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27960155

Pulled By: mruberry

fbshipit-source-id: a8b010c41d703a5de4bf40b045c89e6b95b5a5ca
Labels: cla signed, Merged, module: linear algebra, module: porting, open source
