Implement batch matrix inverse #9102
Conversation
A general …
@vadimkantorov isn't that what …?
@karol-arndt You are right! Thanks for the tip :) This makes porting that fbpca code to PyTorch trivial.
Thanks for the PR!
Well, this indeed sounds like a natural and intuitive extension of the current `inverse` function.
I think the trend will probably be to move away from the `b*` prefix. But let's see what the others think about it.
I agree with @fmassa -- I think we're trying to move away from the 'b' prefix to simplify the API, and I would prefer that batch inverse be built into the `inverse` function.
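For concreteness, here is a sketch of the API shape being discussed, written against the ATen C++ tensor API. It illustrates the proposed behavior (which, per the end of this thread, eventually landed), and is not code from this PR:

```cpp
// Illustrative only: a single `inverse` entry point that accepts both one
// matrix and a batch of matrices, instead of a separate b-prefixed function.
#include <ATen/ATen.h>

void inverse_api_example() {
  at::Tensor single = at::randn({4, 4});
  at::Tensor batch  = at::randn({8, 4, 4});
  at::Tensor inv_single = single.inverse();  // shape (4, 4)
  at::Tensor inv_batch  = batch.inverse();   // shape (8, 4, 4), one inverse per matrix
}
```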
How hard would it be to implement this in ATen versus TH? (Asking for information.)
@fmassa In that case, I'll adapt the code to extend the existing `inverse` function.

@ezyang I've never written any ATen code before. I wouldn't expect it to be particularly difficult, though; the API seems friendlier than TH's. If the current trend is to write new code in ATen, I can try to port the implementation in the coming days. Should I use MKL for the CPU code?
Yep, we're trying to do everything in ATen as much as possible. Sometimes it's not possible, but when it is, it's preferred. If MKL has a good CPU implementation, it's definitely a good pick.
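For reference, a minimal sketch of what an MKL/LAPACKE-backed CPU path could look like; `inverse_batched_lapacke` is a hypothetical name, and error handling is omitted:

```cpp
// Hypothetical sketch, not the PR's code: invert each matrix of a contiguous
// (batch, n, n) row-major buffer in place, using LAPACKE's getrf/getri
// (provided by MKL, among other LAPACK implementations).
#include <lapacke.h>
#include <vector>
#include <cstddef>

void inverse_batched_lapacke(float* a, int n, int batch_size) {
  std::vector<lapack_int> ipiv(n);  // pivot indices, reused for each matrix
  for (int b = 0; b < batch_size; ++b) {
    float* mat = a + (std::size_t)b * n * n;
    LAPACKE_sgetrf(LAPACK_ROW_MAJOR, n, n, mat, n, ipiv.data());  // LU factorization
    LAPACKE_sgetri(LAPACK_ROW_MAJOR, n, mat, n, ipiv.data());     // inverse from the LU factors
  }
}
```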
@ezyang I implemented the inverse methods in ATen. I had to add a cuBLAS handle to the ATen context (it's shared with the THC handle, similarly to the cuSPARSE handle). Worth noting: in addition to the cuBLAS version, THC also had a MAGMA-based implementation of inverse, which I didn't reimplement when porting to ATen. Is it still needed, and should it be reimplemented?

@fmassa As you asked, the code now works with an arbitrary number of batch dimensions, as an extension of the previous `inverse` function.
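A minimal sketch, assuming nothing about the PR's internals, of how arbitrary leading batch dimensions can be collapsed into the single flat batch a batched kernel expects (the function name is hypothetical):

```cpp
// Illustrative only: reduce an (..., n, n) input to (batch, n, n), run the
// batched kernel on the flat view, then restore the original shape.
#include <ATen/ATen.h>

at::Tensor with_flattened_batch(const at::Tensor& self) {
  auto n = self.size(-1);
  auto flat = self.contiguous().view({-1, n, n});  // merge all leading dims
  // ... invoke the batched inverse kernel on `flat` here ...
  return flat.view(self.sizes());                  // back to (..., n, n)
}
```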
@pytorchbot retest this please
aten/src/ATen/native/cuda/Inverse.cu
```cpp
scalar_t **input_gpu;   // assumed declared just above this excerpt
scalar_t **output_gpu;
// Host-side arrays of per-matrix device pointers, one entry per batch element.
scalar_t **input_ptrs = new scalar_t*[batch_size];
scalar_t **output_ptrs = new scalar_t*[batch_size];
// Device-side array that will hold copies of those pointers for cuBLAS.
AT_CUDA_CHECK(cudaMalloc(&input_gpu, batch_size*sizeof(scalar_t*)));
```
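The hunk above is assembling host-side arrays of per-matrix device pointers and copying them to the device, which is the form cuBLAS's batched API consumes. A rough, self-contained sketch of that overall pattern for `float` (my illustration, with error handling omitted; not the PR's code):

```cpp
// Sketch of batched inversion via cuBLAS: LU-factorize all matrices with
// getrfBatched, then invert from the factors with getriBatched.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

void batched_inverse(cublasHandle_t handle, float* A, float* Ainv,
                     int n, int batch_size) {
  // Host-side arrays of pointers to each (n, n) matrix on the device.
  std::vector<float*> a_ptrs(batch_size), c_ptrs(batch_size);
  for (int i = 0; i < batch_size; ++i) {
    a_ptrs[i] = A + (size_t)i * n * n;
    c_ptrs[i] = Ainv + (size_t)i * n * n;
  }
  // cuBLAS wants those pointer arrays in device memory.
  float **d_a_ptrs, **d_c_ptrs;
  cudaMalloc(&d_a_ptrs, batch_size * sizeof(float*));
  cudaMalloc(&d_c_ptrs, batch_size * sizeof(float*));
  cudaMemcpy(d_a_ptrs, a_ptrs.data(), batch_size * sizeof(float*),
             cudaMemcpyHostToDevice);
  cudaMemcpy(d_c_ptrs, c_ptrs.data(), batch_size * sizeof(float*),
             cudaMemcpyHostToDevice);

  // Pivot indices (n per matrix) and per-matrix status codes.
  int *d_pivots, *d_info;
  cudaMalloc(&d_pivots, (size_t)batch_size * n * sizeof(int));
  cudaMalloc(&d_info, batch_size * sizeof(int));

  // One call factorizes the whole batch; a second inverts from the factors.
  cublasSgetrfBatched(handle, n, d_a_ptrs, n, d_pivots, d_info, batch_size);
  cublasSgetriBatched(handle, n, (const float* const*)d_a_ptrs, n, d_pivots,
                      d_c_ptrs, n, d_info, batch_size);

  cudaFree(d_a_ptrs); cudaFree(d_c_ptrs);
  cudaFree(d_pivots); cudaFree(d_info);
}
```

Note that getriBatched works out of place: the factored input and the inverse output must be distinct buffers, which is why two pointer arrays are needed.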
@karol-arndt the MAGMA-based inverse is MUCH faster than the cuBLAS version for many sizes, and it has a better performance profile overall (as @martinarjovsky, who has been reporting on this, can attest). If it's not too much of an ask, porting the MAGMA bindings would be good as well.
Well, this PR definitely needs more work anyway, as some of the tests are currently failing (the results appear to be transposes of the correct ones, which suggests a data layout issue, such as row- versus column-major ordering, on some platforms). I currently don't have the time to work on this anymore, and the current implementation is good enough for me to continue my research. I will most likely return to this in a few weeks, but if someone has the time and energy to fix the issue and add the MAGMA version, that would be great ;)
I believe work on this PR is being continued at #9949. Thanks @karol-arndt for implementing this feature and moving it into ATen!
I think this can be closed now, since batch inverse is now part of master.
Thanks @karol-arndt for the original implementation!
I was recently working on some Kalman filter code and found myself in need of a batch matrix inverse, so I implemented it (doing it in a Python for loop is incredibly slow, especially with CUDA). Since cuBLAS already provides this functionality (and the standard inverse function just passes 1 as the batch size), it's just a matter of allocating some buffers and passing the data to the appropriate cuBLAS functions. The implementation is based on the `btrifact` (batch LU factorization using `getrf`) function. I also added a CPU implementation (which is really just a for loop) for the sake of completeness.

I figured that this might be useful for other people, so I'm sharing it here. This is my first contribution to PyTorch and I'm not very experienced with CUDA programming, so all comments regarding the code are most welcome.
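As a closing illustration, a quick correctness check in the spirit of what a test for this feature might do (a hypothetical helper, not from the PR), comparing the batched result against per-matrix inverses:

```cpp
// Illustrative check: the batched inverse should match inverting each matrix
// individually. The identity shift keeps the random matrices well-conditioned.
#include <ATen/ATen.h>
#include <cassert>

void check_batch_inverse() {
  at::Tensor a = at::randn({8, 5, 5}) + 5 * at::eye(5);
  at::Tensor batched = a.inverse();  // batched path over the leading dim
  for (int64_t i = 0; i < a.size(0); ++i) {
    assert(at::allclose(batched[i], a[i].inverse(), 1e-4, 1e-4));
  }
}
```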