
Conversation

@pearu pearu commented Oct 24, 2019

This PR implements support for generalized LU factorization that is required for various algorithms such as PCA (see issue #8049).

@pearu pearu self-assigned this Oct 24, 2019
@pearu pearu force-pushed the pearu/generalized-LU branch from 4b5bb4e to f5d59f6 Compare October 27, 2019 15:58
@pearu pearu changed the title Generalized LU factorization (WIP) Generalized LU factorization Oct 27, 2019
vishwakftw previously approved these changes Oct 27, 2019

@vishwakftw vishwakftw left a comment

I've been lurking around this PR, waiting for it to be completed, and it finally is! Thank you!!

I just have some comments before this lands.

@pearu pearu requested a review from vishwakftw October 28, 2019 08:50

ezyang commented Oct 28, 2019

fbgemm submodule update doesn't look intentional

@pearu pearu force-pushed the pearu/generalized-LU branch 4 times, most recently from 4703bf3 to b71a29d Compare October 28, 2019 15:48

pearu commented Oct 28, 2019

@ezyang, the fbgemm submodule update issue is now fixed.

@pearu pearu changed the title Generalized LU factorization Generalized LU factorization [WIP] Oct 29, 2019

ezyang commented Oct 29, 2019

What still needs to be done on this PR? It's titled WIP


pearu commented Oct 29, 2019

@ezyang, the PR is complete as it is; however, there is a MAGMA bug, see
#28608 (comment)
that appears only for small square singular matrices on CUDA devices. @vishwakftw had concerns that users might interpret the resulting NaNs as errors of their own and, as I understood, suggested holding back this PR. I believe the issue should be fixed in MAGMA and that, until then, the MAGMA bug should be documented in the lu documentation. If that would be satisfactory, I can update the PR and remove WIP asap.

@pearu pearu changed the title Generalized LU factorization [WIP] Generalized LU factorization Oct 29, 2019
@vishwakftw

@pearu, I think the PR can land as is, but it should raise an error when the matrices are singular, i.e., preserving the older behavior. Perhaps when the MAGMA bug is fixed, we can enable accepting singular matrices. Do you think this is reasonable?


pearu commented Oct 29, 2019

@vishwakftw, let me sum up the situation:
The old behavior is:

  1. LU factorization accepts only square matrices
  2. LU factorization raises an error when the input matrix is singular

This PR changes the old behavior as follows:
A. LU factorization accepts square and non-square matrices
B. LU factorization allows singular square matrices
C. LU factorization allows singular non-square matrices

The MAGMA bug is effective only when all of the following conditions are satisfied at the same time:
(i) A's data is stored on a CUDA device
(ii) len(A.shape) > 2, that is, the input A contains batches of matrices
(iii) m == n and m <= 32, that is, the batches contain small square matrices
(iv) pivot == True

To preserve backward-compatible behavior, avoid the MAGMA bug, and support non-square LU factorizations in full, it would be sufficient to enable the following features from this PR:

  • A, B, C if storage is in CPU RAM
  • A, B, C if storage is in CUDA memory and len(A.shape) == 2
  • A, C if storage is in CUDA memory and len(A.shape) > 2; B is disabled (that is, singular square matrices used in batches cause an exception) until a fix for the MAGMA bug becomes available.

In essence, the current NaN results will be replaced with an exception until a fix to the MAGMA bug makes the NaN results impossible.

Notice that I tend to insist that singular inputs should be allowed as much as possible, because for linear algebra algorithms such as truncated SVD or PCA, not allowing singular inputs to LU would be a big blocker. In my opinion, the MAGMA bug affects only a small subset of possible applications, and it would not be reasonable to hold back the development of new algorithms for which there is high demand.

Btw, there is also another approach: instead of using magmaLuBatched, use the same approach as in the case of CPU storage. That would reduce performance in certain (but possibly important) cases but would enable A, B, and C in full.
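
For concreteness, here is a minimal sketch of how the A/B/C behaviors above could be exercised from Python, assuming the Tensor.lu interface that this PR generalizes; the shapes and values are illustrative only, not taken from the PR's tests:

```python
import torch

# Behaviors A and C: batches of non-square (possibly singular) matrices.
A = torch.randn(2, 3, 5)
LU, pivots = A.lu(pivot=True)   # expected: LU is (2, 3, 5), pivots is (2, 3), i.e. min(m, n) pivots per matrix

# Behavior B: batches of singular square matrices; with get_infos=True a nonzero
# entry in infos flags the batch element where a zero pivot was encountered.
S = torch.zeros(2, 4, 4)
LU, pivots, infos = S.lu(pivot=True, get_infos=True)
print(infos)
```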


ezyang commented Oct 30, 2019

This approach SGTM!


pearu commented Oct 30, 2019

> This approach SGTM!

@ezyang Just to be sure, which approach are you referring to?

  1. Raise an exception if a singular square matrix is used in batches. (A, C, partial B)
  2. Discard using magmaLuBatched and use the CPU algorithm. (A, B, C, all in full)


vishwakftw commented Oct 30, 2019

@pearu maybe there is another way to resolve it, and this won't affect runtime for non-singular matrices.

I propose the following:

  • Perform batched LU using magmaLuBatched.
  • Check whether there are any singular matrices.
  • If yes, replace the NaNs with zero and fix the pivot tensor to be arange(1, n + 1).


pearu commented Oct 30, 2019

I like the proposal, @vishwakftw, as it would be a full solution.
However, I am not sure that replacing NaN with 0 will always give the correct LU factorization.
It might be safer to recompute the LU factorization using magmaLu for the singular matrices.

Re pivot == arange(1, n + 1): that might not be correct in general. In the case of recomputing the LU, it would also be unnecessary.
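
To make the concern about arange(1, n + 1) concrete, here is a small counterexample of my own (not from the PR's test suite): a singular matrix can still require a row interchange under partial pivoting.

```python
import torch

# Batch of one singular 2x2 matrix whose first column forces a row swap.
a = torch.tensor([[[0., 0.],
                   [1., 0.]]])
LU, pivots, infos = a.lu(pivot=True, get_infos=True)
# Partial pivoting must pick row 2 for the first column (its only nonzero entry),
# so the correct pivots start with 2 and cannot equal arange(1, n + 1) == [1, 2].
```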

@vishwakftw

@pearu we could also do that: identify singular matrices using infos and perform the LU decomposition for them separately. This would solve the pivot issue too.
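
A rough Python-level sketch of that idea (the actual change lives in the ATen/CUDA code; the helper name and the use of Tensor.lu here are mine, for illustration only):

```python
import torch

def lu_with_singular_fallback(a):
    """a: CUDA tensor of shape (batch, n, n); hypothetical helper, not the PR's API."""
    # Batched LU first (the magmaLuBatched path); infos reports a zero pivot per batch element.
    lu, pivots, infos = a.lu(pivot=True, get_infos=True)
    # Recompute only the matrices flagged as singular, one at a time (the non-batched path),
    # so non-singular inputs pay essentially no extra cost.
    for i in (infos > 0).nonzero().flatten().tolist():
        lu_i, piv_i = a[i:i + 1].lu(pivot=True)
        lu[i] = lu_i[0]
        pivots[i] = piv_i[0]
    return lu, pivots, infos
```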

@vishwakftw vishwakftw dismissed their stale review October 31, 2019 14:47

Need to fix the singular case on CUDA


pearu commented Oct 31, 2019

Yes, exactly, I am currently working on this approach.


kostmo commented Oct 31, 2019

CircleCI build failures summary

As of commit 4a72b8b:

  • 0/2 flaky
  • 2/2 failures introduced in this PR

This comment was automatically generated by Dr. CI.

@pearu pearu requested a review from vishwakftw October 31, 2019 20:39
@pearu pearu requested a review from vishwakftw November 1, 2019 16:36

@vishwakftw vishwakftw left a comment


I think the changes look good to me.

Would it be possible for you to check if there is a perf regression for non-singular matrices?


pearu commented Nov 1, 2019

Theoretically, there should not be, as the loop over the infos tensor in batchCheckErrors was replaced with another loop over the infos tensor that checks for singularity.
However, I noticed that batchCheckErrors copies infos to host RAM before running the loop, while the loop for the singularity check does not. This implies a performance penalty, as the infos tensor is accessed item-wise from CUDA memory...
I'll fix it and run some performance tests as well.
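
For illustration, the difference amounts to the following pattern (a Python-level sketch, not the actual ATen code):

```python
import torch

# Suppose infos lives on the GPU, as after a batched LU call.
infos = torch.zeros(10000, dtype=torch.int32, device='cuda')

# Item-wise access on the CUDA tensor: one synchronizing device-to-host copy per element.
slow = [i for i in range(infos.numel()) if int(infos[i]) > 0]

# Copy to the host once, then loop: a single device-to-host transfer.
infos_cpu = infos.cpu()
fast = [i for i, v in enumerate(infos_cpu.tolist()) if v > 0]
```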


pearu commented Nov 1, 2019

For the cases where MAGMA issue 13 is effective, there is some performance regression related to the overhead of scanning the infos tensor for positive values. For instance, consider an LU factorization of identity matrices in batches:

```python
import torch

N, n = 10000, 32   # number of batches and matrix size; varied per the table below
a = torch.zeros((N, n, n)); a[:] = torch.eye(n); a_cuda = a.cuda()
# timing measured with: %timeit a_cuda.lu(pivot=True)
```

We have the following timing results (best of three runs):

| N, n      | pytorch master   | this PR          |
|-----------|------------------|------------------|
| 10, 2     | 108 µs ± 3.7 µs  | 120 µs ± 425 ns  |
| 10000, 2  | 154 µs ± 200 ns  | 183 µs ± 4.17 µs |
| 10000, 32 | 567 µs ± 2.48 µs | 592 µs ± 445 ns  |
| 10000, 33 | 2.39 ms ± 1.8 µs | 2.39 ms ± 732 ns |

So, the additional overhead is about 12-30 µs per lu call iff MAGMA issue 13 is effective.
Otherwise (see the last row), there is no regression in performance.

Btw, the timings were identical when using a = torch.randn((N, n, n)).

@vishwakftw

I guess this is fine, thank you for providing the benchmarks, @pearu.

@vishwakftw

@pytorchbot rebase this please


pearu commented Nov 4, 2019

@vishwakftw @ezyang, is rebasing this PR stuck somewhere?

@vishwakftw

Semi-automatic rebase is not working; could you please try to rebase manually?

@pearu pearu force-pushed the pearu/generalized-LU branch from e40ccca to 7c7bd66 Compare November 4, 2019 20:32

pearu commented Nov 4, 2019

@pytorchbot rebase this please

@pearu pearu force-pushed the pearu/generalized-LU branch from 7c7bd66 to a277f54 Compare November 4, 2019 21:02

pearu commented Nov 5, 2019

@vishwakftw, rebase is done.

@vishwakftw

Failures look spurious. @pytorchbot merge this please.

@pytorchbot pytorchbot added the merge-this-please Was marked for merge with @pytorchbot merge this please label Nov 5, 2019

@facebook-github-bot facebook-github-bot left a comment


@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@ezyang merged this pull request in fd4f22e.

@pearu pearu deleted the pearu/generalized-LU branch November 5, 2019 21:43
zdevito pushed a commit to zdevito/ATen that referenced this pull request Nov 5, 2019
Summary:
This PR implements support for generalized LU factorization that is required for various algorithms such as PCA (see issue pytorch/pytorch#8049).
Pull Request resolved: pytorch/pytorch#28608

Differential Revision: D18326449

Pulled By: ezyang

fbshipit-source-id: d4011d75710e06e87ddbf5ad9afae42ba3330548