
Conversation

@vishwakftw
Contributor

@vishwakftw vishwakftw commented Jul 27, 2018

Complete billing of changes:

Related to Batch Inverse:

  • Add batched inverse (CPU)
  • Add batched inverse (CUDA)
  • Modify autograd entry
  • Add tests
    • test_autograd
    • test_cuda
    • test_torch
  • Modify docs
  • Remove _batch_inverse in MultivariateNormal.
  • Allow batch matrices as inputs for negative powers in matrix_power

Miscellaneous modifications:

  • Move all batch operations to BatchLinearAlgebra.cpp/.cu and provide general framework for adding more batch ops.
  • Add a RAII structure for MAGMA queue management.
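
As a rough illustration of the user-facing behaviour listed above, here is a minimal usage sketch; the shapes, variable names, and tolerances are arbitrary examples, not code taken from the PR's tests:

```python
import torch

# Batch of four invertible 3x3 matrices (made symmetric positive definite on purpose).
A = torch.randn(4, 3, 3)
A = A @ A.transpose(-1, -2) + torch.eye(3)

A_inv = torch.inverse(A)  # batched inverse: each matrix in the batch is inverted

# Each product A @ A_inv should be close to the 3x3 identity.
print(torch.allclose(torch.matmul(A, A_inv), torch.eye(3).expand(4, 3, 3), atol=1e-4))

# Negative powers in matrix_power accept batched inputs as well;
# matrix_power(A, -1) should agree with torch.inverse(A).
print(torch.allclose(torch.matrix_power(A, -1), A_inv, atol=1e-4))
```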

@vishwakftw vishwakftw closed this Jul 27, 2018
I didn't know how getri worked after using getrf, which led to some issues
@vishwakftw vishwakftw reopened this Jul 27, 2018
@vadimkantorov
Contributor

As a test case, this style transfer code https://github.com/NVIDIA/FastPhotoStyle/blob/master/photo_smooth.py#L77 batch-inverts matrices of shape HxWx3x3 in NumPy; one could compare the performance of batched GPU inversion against NumPy there.
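
A rough micro-benchmark along those lines might look like the sketch below; the batch size, the conditioning trick, and the timing loop are illustrative assumptions, not code from FastPhotoStyle:

```python
import time
import numpy as np
import torch

H, W = 256, 256
# H*W well-conditioned 3x3 matrices (diagonal boosted to avoid near-singular samples).
mats = np.random.randn(H * W, 3, 3) + 3.0 * np.eye(3)

t0 = time.time()
inv_np = np.linalg.inv(mats)            # NumPy's batched inverse on the CPU
cpu_time = time.time() - t0

mats_cuda = torch.from_numpy(mats).cuda()
torch.cuda.synchronize()
t0 = time.time()
inv_cuda = torch.inverse(mats_cuda)     # batched inverse on the GPU
torch.cuda.synchronize()                # wait for the kernel before stopping the clock
gpu_time = time.time() - t0

print("numpy cpu: %.4fs, torch cuda: %.4fs" % (cpu_time, gpu_time))
```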

@vishwakftw vishwakftw changed the title [WIP] Batched Inverse [ready] Batched Inverse Jul 28, 2018
@vishwakftw
Contributor Author

Oops, I didn't notice #9102. Should I close this?

Contributor Author

@vishwakftw vishwakftw left a comment


After building, the times for the CPU version and the NumPy version looked similar, but there was a big performance regression in the CUDA case. Help would be appreciated; I have left my own inline comments with suggestions for improvement.

@vishwakftw vishwakftw changed the title [ready] Batched Inverse [help needed] Batched Inverse Jul 29, 2018
@li-roy li-roy added the ready for review label Jul 31, 2018
@vishwakftw
Contributor Author

@zou3519 test failures are unrelated.

Contributor

@zou3519 zou3519 left a comment


One minor comment, otherwise, lgtm!

@vishwakftw
Contributor Author

There is an optimization available for CUDA batched getrf for square matrices of dim < 32. Should I include this optimization here, or leave it for a later PR?

@zou3519
Contributor

zou3519 commented Oct 22, 2018

@vishwakftw if it's a quick change, feel free to do it here. Otherwise, it'll be easier to review as a separate PR.

- use magma_*getrf_smallsq_shfl for batches of matrices with dim <= 32
- remove destroyMagmaQueue and make magma queue more RAII-like

@vishwakftw
Contributor Author

vishwakftw commented Oct 23, 2018

It seems the small-matrix optimization for getrf is already applied inside MAGMA's getrf function itself; I found this while inspecting the source code. I'll revert that change.

Contributor

@zou3519 zou3519 left a comment


Looks great now! One last comment :) then we can merge this.

The test failures are unrelated.

Contributor

@facebook-github-bot facebook-github-bot left a comment


zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 28, 2018
Summary:
Complete billing of changes:

Related to Batch Inverse:
- [x] Add batched inverse (CPU)
- [x] Add batched inverse (CUDA)
- [x] Modify autograd entry
- [x] Add tests
  - [x] test_autograd
  - [x] test_cuda
  - [x] test_torch
- [x] Modify docs
- [x] Remove `_batch_inverse` in `MultivariateNormal`.
- [x] Allow batch matrices as inputs for negative powers in `matrix_power`

Miscellaneous modifications:
- [x] Move all batch operations to BatchLinearAlgebra.cpp/.cu and provide general framework for adding more batch ops.
- [x] Add a RAII structure for MAGMA queue management.
Pull Request resolved: pytorch/pytorch#9949

Differential Revision: D10559089

Pulled By: zou3519

fbshipit-source-id: 7da24977f8a79d97dd42883302e13e708c1726e4
@vishwakftw vishwakftw deleted the batch-inverse branch October 28, 2018 08:58
@Jasonhunger

Jasonhunger commented Jan 12, 2021

DTD_inv = torch.inverse(DTD + self.lambda3 * torch.eye(self.input_dim).cuda())
RuntimeError: inverse_cuda: For batch 0: U(704,704) is zero, singular U.

How can I solve this error?
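
One common workaround, sketched below with a hypothetical safe_inverse helper (a general suggestion, not a fix confirmed in this thread): the error means the matrix handed to torch.inverse is numerically singular, so one can increase the regularization term, retry with a small ridge added to the diagonal, or fall back to a pseudo-inverse.

```python
import torch

def safe_inverse(mat, eps=1e-6):
    """Invert mat, retrying with a small ridge on the diagonal if it is singular."""
    n = mat.size(-1)
    eye = torch.eye(n, device=mat.device, dtype=mat.dtype)
    try:
        return torch.inverse(mat)
    except RuntimeError:
        # The matrix is numerically singular; bump the diagonal and retry.
        # torch.pinverse(mat) is another option if a pseudo-inverse is acceptable.
        return torch.inverse(mat + eps * eye)
```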
