
Linear algebra GPU backend tracking issue [magma/cusolver/cublas] #47953

@xwang233

Linear algebra GPU backend tracking issue [MAGMA/cuSOLVER/cuBLAS]

Currently, most GPU linear algebra operators use MAGMA as their backend, with only a few using cuSOLVER/cuBLAS instead. To improve performance, we would like to migrate poorly performing MAGMA operators to cuSOLVER/cuBLAS backends where those perform better.

This issue tracks which linear algebra operators currently do not use MAGMA as their default GPU backend, and also keeps a list of known poorly performing MAGMA operators that could benefit from cuSOLVER/cuBLAS. Feel free to modify this list and link to this issue if you are aware of any such operators.

We welcome contributions that add cuSOLVER/cuBLAS backends for poorly performing MAGMA operators. Please make sure you add benchmarks to your PR, and add heuristics that dispatch the operator to different backends where necessary (see the sketch after the note below).

(This issue doesn't track CPU or other backends.)
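For reference, here is a minimal benchmark sketch using `torch.utils.benchmark`; the operator, shapes, and device are placeholders, so substitute whatever your PR touches:

```python
import torch
import torch.utils.benchmark as benchmark

# Placeholder example: time torch.linalg.cholesky on a batch of SPD matrices.
b, n = 64, 128
a = torch.randn(b, n, n, device="cuda")
a = a @ a.transpose(-2, -1) + n * torch.eye(n, device="cuda")  # make SPD

t = benchmark.Timer(
    stmt="torch.linalg.cholesky(a)",
    globals={"torch": torch, "a": a},
)
print(t.blocked_autorange())  # the Timer handles CUDA synchronization
```

Sweeping `b`, `m`, and `n` and comparing timings before and after the backend change is usually what's needed to justify a dispatch heuristic.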

CUDA version requirement for cuSOLVER/cuBLAS

The cuSOLVER/cuBLAS backends are only enabled when the CUDA version is >= 10.1.243 [#45452]. There is no restriction on GPU architecture.
If your CUDA version is lower than that, everything is dispatched to MAGMA. If MAGMA is not linked into your build, you will get a runtime error when calling these linear algebra operators on the GPU.
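As a rough illustration of this constraint (not an official API), the check can be approximated from Python; note that `torch.version.cuda` reports only major.minor, so the 10.1.243 patch level cannot be distinguished:

```python
import torch

# Approximate sketch of the dispatch constraint above. torch.version.cuda
# reports only "major.minor" (e.g. "10.1"), so builds at exactly 10.1 are
# ambiguous with respect to the 10.1.243 threshold from #45452.
if torch.cuda.is_available() and torch.version.cuda is not None:
    major, minor = (int(x) for x in torch.version.cuda.split(".")[:2])
    if (major, minor) <= (10, 1):
        print("CUDA may be < 10.1.243: GPU linalg ops may dispatch to MAGMA only")
```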

Operators that currently use non-MAGMA backends

For simplicity, we use `b` for batch size and `m`, `n` for matrix dimensions. A two-dimensional tensor is treated as a matrix with batch size 1. Unless noted otherwise, `b == 1` covers both 2-D tensors and >=3-D tensors whose batch dimension equals 1.

Also, most `torch.linalg.x` operators share the same backend as the corresponding `torch.x` linear algebra operator by default.
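A small illustration of both conventions (shapes are arbitrary):

```python
import torch

# b == 1 covers both a plain 2-D matrix and a 3-D tensor with batch dim 1.
a2d = torch.randn(4, 4, device="cuda")     # b == 1, m == n == 4
a3d = a2d.unsqueeze(0)                     # shape (1, 4, 4): still b == 1
ab  = torch.randn(8, 4, 4, device="cuda")  # b == 8

# torch.inverse and torch.linalg.inv share the same GPU backend by default.
assert torch.allclose(torch.inverse(a2d), torch.linalg.inv(a2d))
```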

| operator | cuSOLVER? | MAGMA? | others? | comment |
| --- | --- | --- | --- | --- |
| `torch.inverse`, `torch.linalg.inv_ex` | `b <= 2` | otherwise | | |
| `torch.svd` | always | | | `if (m <= 32 && n <= 32 && b > 1 && (!some \|\| m == n)) gesvdjBatched; else gesvdj;` |
| `torch.cholesky`, `torch.linalg.cholesky_ex` | always | otherwise | | `b > 1` uses cuSOLVER only when CUDA >= 11.3 |
| `torch.cholesky_solve` | `b == 1` | otherwise | | |
| `torch.cholesky_inverse` | `b == 1` | otherwise | | It uses `cholesky_solve` as the backend. |
| `torch.orgqr` | always | | | |
| `torch.ormqr` | always | | | |
| `torch.geqrf` | always | | | `if (n <= 256 && b >= max(2, n / 16)) cublas_batched; else cusolver_looped` |
| `torch.linalg.qr` | always | | | It uses `geqrf` + `orgqr` as the backend. |
| `torch.linalg.eigh` | always | | | |
| `torch.lu_solve` | `(b == 1 && n > 512) \|\| (b > 2 && n <= 128)` | otherwise | | `b` and `n` are the tensor sizes of `LU_data`, i.e. matrix `A`. |
| `torch.lstsq` | always | | | It uses `geqrf`, `ormqr`, and `triangular_solve`. |
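To make the dispatch conditions above concrete, here is a Python transcription of two of the heuristics from the table; the function names are hypothetical, and the real logic lives in ATen's C++ sources:

```python
def geqrf_backend(b: int, n: int) -> str:
    # From the torch.geqrf row: batched cuBLAS for many small matrices,
    # looped cuSOLVER otherwise.
    if n <= 256 and b >= max(2, n // 16):
        return "cublas_batched"
    return "cusolver_looped"

def lu_solve_backend(b: int, n: int) -> str:
    # From the torch.lu_solve row: cuSOLVER for a single large matrix or
    # many small matrices; MAGMA otherwise.
    if (b == 1 and n > 512) or (b > 2 and n <= 128):
        return "cusolver"
    return "magma"

print(geqrf_backend(b=32, n=64))      # cublas_batched
print(lu_solve_backend(b=1, n=1024))  # cusolver
```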

Last updated: fe4ded0, June 29, 2021

PyTorch 1.9 linear algebra development plan

See #47953 (comment)

For details of the MAGMA mechanism

See #47953 (comment)

See also

cc @ezyang @gchanan @zou3519 @bdhirsh @ngimel @vishwakftw @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @VitalyFedyunin @ptrblck @IvanYashchuk
