Add cuSOLVER path for torch.geqrf #56252
Conversation
[ghstack-poisoned]
💊 CI failures summary and remediations

As of commit 5011c45 (more details on the Dr. CI page):

🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages:
ghstack-source-id: 5cbd242 Pull Request resolved: pytorch#56252
Ref. #47953 [ghstack-poisoned]
ghstack-source-id: 28c2115 Pull Request resolved: pytorch#56252
xwang233 left a comment
Thanks for the PR! Overall this looks good. I have left some comments.
```cpp
params,
m,
n,
CUDA_R_32F,
```
Would it be beneficial to get rid of the template specialization for the 64-bit API with something like this?
pytorch/aten/src/ATen/native/cuda/BatchLinearAlgebraLib.cu, lines 492 to 506 in bcdcf34:

```cpp
#ifdef USE_CUSOLVER_64_BIT
  cusolverDnParams_t params;
  cudaDataType datatype = at::cuda::solver::get_cusolver_datatype<scalar_t>();
  TORCH_CUSOLVER_CHECK(cusolverDnCreateParams(&params));
  for (int64_t i = 0; i < batch_size; i++) {
    at::cuda::solver::xpotrs(
        handle, params, uplo, n, nrhs, datatype,
        A_ptr + i * A_matrix_stride,
        lda, datatype,
        self_working_copy_ptr + i * self_matrix_stride,
        ldb,
        infos_ptr
    );
  }
```
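The idea behind this suggestion, one generic code path that selects the element type via a runtime `cudaDataType` tag instead of compiling a template specialization per scalar type, can be sketched with a plain lookup table. This is a hypothetical Python analogue only; the real mapping lives in `at::cuda::solver::get_cusolver_datatype` and the real generic entry points are the 64-bit `cusolverDnX*` routines.

```python
# Hypothetical analogue of keying one generic 64-bit cuSOLVER call on a
# runtime datatype tag instead of per-type template specializations.
# The tag names mirror CUDA's cudaDataType enum; this code is a sketch,
# not the actual ATen/cuSOLVER interface.
import numpy as np

CUSOLVER_DATATYPE = {
    np.float32: "CUDA_R_32F",
    np.float64: "CUDA_R_64F",
    np.complex64: "CUDA_C_32F",
    np.complex128: "CUDA_C_64F",
}

def xgeqrf_dispatch(a):
    """Single generic entry point; the backend learns the element type
    from a datatype tag rather than from a template parameter."""
    datatype = CUSOLVER_DATATYPE[a.dtype.type]
    # A real implementation would now pass `datatype` to the one generic
    # cuSOLVER routine; here we just report which tag was selected.
    return datatype

assert xgeqrf_dispatch(np.zeros((2, 2), dtype=np.float32)) == "CUDA_R_32F"
assert xgeqrf_dispatch(np.zeros((2, 2), dtype=np.complex128)) == "CUDA_C_64F"
```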
ghstack-source-id: 6da75be Pull Request resolved: pytorch#56252
ghstack-source-id: ebe22ac Pull Request resolved: pytorch#56252
ghstack-source-id: 01ccae0 Pull Request resolved: pytorch#56252
mruberry left a comment
Thanks for taking a look, @xwang233!
Differential Revision: [D27960152](https://our.internmc.facebook.com/intern/diff/D27960152) [ghstack-poisoned]
ghstack-source-id: 9d9edd8 Pull Request resolved: pytorch#56252
```cpp
void geqrf_kernel(const Tensor& input, const Tensor& tau, int64_t m, int64_t n) {
#if defined(USE_CUSOLVER)
  return geqrf_cusolver(input, tau, m, n);
#else
  return geqrf_magma(input, tau, m, n);
#endif
}
```
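For readers unfamiliar with what `geqrf` computes, here is a minimal NumPy sketch of the LAPACK-style contract that both backends implement: R is left in the upper triangle of the output, the Householder reflectors are packed below the diagonal, and their scalar factors are returned in `tau`. This is an unblocked illustrative implementation, not the cuSOLVER or MAGMA code.

```python
# Minimal sketch of geqrf's packed output format (NumPy stand-in).
import numpy as np

def geqrf_like(a):
    """Unblocked Householder QR with LAPACK-style packed output."""
    a = np.array(a, dtype=float)
    m, n = a.shape
    k = min(m, n)
    tau = np.zeros(k)
    for j in range(k):
        x = a[j:, j].copy()
        normx = np.linalg.norm(x)
        if normx == 0.0:
            continue
        sign = 1.0 if x[0] >= 0 else -1.0
        beta = -sign * normx
        u = x
        u[0] -= beta                      # u = x - beta * e1, u[0] != 0
        tau[j] = 2.0 * u[0] ** 2 / (u @ u)
        v = u / u[0]                      # LAPACK stores v with v[0] == 1
        # Apply H = I - tau * v v^T to the trailing submatrix.
        a[j:, j:] -= tau[j] * np.outer(v, v @ a[j:, j:])
        a[j + 1:, j] = v[1:]              # pack the reflector below the diagonal
        a[j, j] = beta
    return a, tau

def orgqr_like(a, tau):
    """Reconstruct Q from the packed reflectors (orgqr-style)."""
    m = a.shape[0]
    q = np.eye(m)
    for j in reversed(range(len(tau))):
        v = np.zeros(m)
        v[j] = 1.0
        v[j + 1:] = a[j + 1:, j]
        q -= tau[j] * np.outer(v, v @ q)  # q = H_j @ q
    return q

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
packed, tau = geqrf_like(A)
R = np.triu(packed)       # torch.linalg.qr(..., mode='r') is geqrf + triu_
Q = orgqr_like(packed, tau)
assert np.allclose(Q @ R, A)
```

The last two lines show why `mode='r'` of `torch.linalg.qr` can be implemented as `geqrf` followed by `triu_`, as discussed below.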
Hi @IvanYashchuk, I forgot to ask for a benchmark table comparing cuSOLVER vs MAGMA. I see that matrices of all shapes are dispatched to the cuSOLVER path in this heuristic. Are there any performance complaints about cusolverDnXgeqrf and cusolverDn&lt;T&gt;geqrf?
torch.linalg.qr with mode='r' basically just calls geqrf + triu_. Here are the results for that: #56256 (comment). They show that for large sizes MAGMA is a bit faster.
Here are the results comparing geqrf_cusolver and geqrf_magma directly. For large sizes the MAGMA variant is faster, but we still use cuSOLVER here unconditionally, since we aim to remove all uses of single-input MAGMA functions: they create and destroy CUDA streams internally.
| Shape                         | cuSOLVER | MAGMA  |
|-------------------------------|----------|--------|
| torch.Size([2, 2]) | 0.049 | 5.3 |
| torch.Size([2, 2, 2]) | 0.034 | 10.2 |
| torch.Size([32, 2, 2]) | 0.417 | 189.5 |
| torch.Size([64, 2, 2]) | 0.840 | 321.8 |
| torch.Size([128, 2, 2]) | 1.6 | 632.9 |
| torch.Size([8, 8]) | 0.062 | 6.1 |
| torch.Size([2, 8, 8]) | 0.122 | 12.4 |
| torch.Size([32, 8, 8]) | 1.8 | 157.6 |
| torch.Size([64, 8, 8]) | 3.7 | 319.0 |
| torch.Size([128, 8, 8]) | 7.5 | 724.8 |
| torch.Size([16, 16]) | 0.125 | 6.7 |
| torch.Size([2, 16, 16]) | 0.247 | 12.7 |
| torch.Size([32, 16, 16]) | 3.9 | 152.8 |
| torch.Size([64, 16, 16]) | 7.8 | 312.1 |
| torch.Size([128, 16, 16]) | 15.6 | 661.9 |
| torch.Size([32, 32]) | 0.256 | 5.7 |
| torch.Size([2, 32, 32]) | 5.1 | 10.1 |
| torch.Size([32, 32, 32]) | 8.1 | 250.8 |
| torch.Size([64, 32, 32]) | 16.2 | 376.7 |
| torch.Size([128, 32, 32]) | 32.5 | 682.1 |
| torch.Size([64, 64]) | 0.658 | 5.7 |
| torch.Size([2, 64, 64]) | 1.3 | 9.3 |
| torch.Size([32, 64, 64]) | 20.9 | 211.8 |
| torch.Size([64, 64, 64]) | 41.8 | 312.9 |
| torch.Size([128, 64, 64]) | 83.7 | 556.3 |
| torch.Size([128, 128]) | 1.5 | 5.2 |
| torch.Size([2, 128, 128]) | 3.1 | 11.6 |
| torch.Size([32, 128, 128]) | 49.8 | 208.4 |
| torch.Size([64, 128, 128]) | 99.8 | 361.6 |
| torch.Size([128, 128, 128]) | 199.6 | 903.5 |
| torch.Size([256, 256]) | 2.3 | 9.7 |
| torch.Size([2, 256, 256]) | 4.6 | 14.7 |
| torch.Size([32, 256, 256]) | 75.9 | 228.9 |
| torch.Size([64, 256, 256]) | 152.0 | 419.8 |
| torch.Size([128, 256, 256]) | 303.9 | 846.4 |
| torch.Size([512, 512]) | 5.8 | 9.8 |
| torch.Size([2, 512, 512]) | 11.727 | 17.9 |
| torch.Size([32, 512, 512]) | 187.4 | 285.0 |
| torch.Size([64, 512, 512]) | 374.8 | 594.5 |
| torch.Size([128, 512, 512]) | 749.3 | 1263.3 |
| torch.Size([1024, 1024]) | 15.3 | 16.3 |
| torch.Size([2, 1024, 1024]) | 30.6 | 32.7 |
| torch.Size([32, 1024, 1024]) | 490.8 | 527.6 |
| torch.Size([64, 1024, 1024]) | 985.4 | 1022.6 |
| torch.Size([128, 1024, 1024]) | 1978.6 | 2026.9 |

| Shape                         | cuSOLVER | MAGMA  |
|-------------------------------|----------|--------|
| torch.Size([512, 512]) | 8.0 | 11.9 |
| torch.Size([1024, 1024]) | 15.1 | 22.5 |
| torch.Size([2048, 2048]) | 54.9 | 54.9 |
| torch.Size([4096, 4096]) | 276.4 | 265.8 |
| torch.Size([8192, 8192]) | 1712.3 | 1555.8 |
Times are in milliseconds (ms).
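A rough sketch of how per-shape timings like those above can be collected. NumPy's QR stands in for the GPU backends here, and the repeat policy is an assumption about the methodology, not a description of it; on CUDA, the timed region must additionally be bracketed with `torch.cuda.synchronize()` because kernel launches are asynchronous.

```python
# Illustrative timing harness for per-shape linear-algebra benchmarks.
# np.linalg.qr(mode="r") is a CPU stand-in for the geqrf-based paths;
# a GPU version would synchronize the device before reading the clock.
import time
import numpy as np

def time_geqrf(shape, repeats=5):
    rng = np.random.default_rng(0)
    a = rng.standard_normal(shape)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.linalg.qr(a, mode="r")        # stand-in for the geqrf call
        best = min(best, time.perf_counter() - t0)
    return best * 1e3                    # milliseconds, as in the table

for shape in [(64, 64), (256, 256)]:
    print(f"{shape}: {time_geqrf(shape):.3f} ms")
```

Taking the best of several repeats (rather than the mean) is a common way to reduce noise from other processes and from first-call warm-up effects.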
Summary: Pull Request resolved: pytorch#56252
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27960152
Pulled By: mruberry
fbshipit-source-id: 0510a302aab50623d7490efaba0133f740cd57c3
Stack from ghstack:
Differential Revision: D27960152