## 🚀 Feature

When working with large matrices, it is often useful to access one or a handful of desired eigenpairs without computing the complete spectrum. PyTorch currently offers only torch.lobpcg, which is limited to positive-definite matrices and, in my experience, is very slow. I've written a PyTorch C++ extension for computing extremal eigenpairs based on canonical Arnoldi and Lanczos techniques. These tools are applicable to a wide range of ML problems and would be a useful addition to the core PyTorch library. The algorithms rely only on matrix-vector products and are therefore relevant to the ongoing torch.sparse project.
## Motivation

Regarding current PyTorch offerings, torch.lobpcg is limited to positive-definite matrices and does not appear to be an efficient implementation of the LOBPCG algorithm (it is much slower than SciPy's equivalent lobpcg on CPU). nn.utils.spectral_norm computes a single eigenpair using power iteration; however, power iteration has poor convergence properties, and in the special case of symmetric/Hermitian matrices, convergence can be greatly improved (e.g. with the Lanczos method) at little additional cost.

The motivation is to augment the PyTorch library with fast, compiled tools based on canonical algorithms for extremal eigenproblems. The proposed package is inspired by the ARPACK Fortran library. It uses only matrix-vector products and is therefore applicable to sparse matrices. Even for matrices stored in dense format, the algorithm is faster and has a smaller memory footprint than full-spectrum methods.
## Pitch

The current package (temporarily named "arpack") includes a single tool, eigsh, which is applicable to symmetric matrices that are potentially indefinite. At each step, the algorithm performs an m-step Lanczos factorization and then calls the LAPACK routines ?stebz and ?stein to compute a partial Schur decomposition of the resulting tridiagonal system. There are no MAGMA analogues to ?stebz and ?stein that I'm aware of; therefore, only CPU is currently supported (see "Alternatives" below for some proposals regarding GPU). A rough sketch of the Lanczos kernel is shown below.
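For intuition, here is a minimal sketch of the m-step Lanczos factorization in plain PyTorch. This is an illustrative assumption of how the kernel works, not the proposed C++ implementation; in particular, torch.linalg.eigh on the small tridiagonal matrix stands in for the LAPACK ?stebz/?stein calls, and the function name and defaults are my own.

```python
import torch

def lanczos(A, m=64, v0=None):
    # Sketch only, not the proposed API. Builds an m-step Lanczos
    # factorization A @ Q ≈ Q @ T, where Q has orthonormal columns and
    # T is symmetric tridiagonal with diagonal `alpha` and off-diagonal
    # `beta`. A is touched only through mat-vecs, so any operator
    # supporting `A @ v` (e.g. a sparse matrix) works.
    n = A.shape[0]
    v = torch.randn(n, dtype=A.dtype) if v0 is None else v0.clone()
    v = v / v.norm()
    Q = torch.zeros(n, m, dtype=A.dtype)
    alpha = torch.zeros(m, dtype=A.dtype)
    beta = torch.zeros(m - 1, dtype=A.dtype)
    for j in range(m):
        Q[:, j] = v
        w = A @ v  # the only use of A
        alpha[j] = torch.dot(w, v)
        # Full reorthogonalization against all previous Lanczos vectors
        # (simple but costly; ARPACK-style codes are more economical).
        w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j < m - 1:
            beta[j] = w.norm()
            v = w / beta[j]
    return Q, alpha, beta

# Ritz values/vectors of the small tridiagonal system approximate the
# extremal eigenpairs of A.
A = torch.randn(1024, 1024, dtype=torch.float64)
A = A.T @ A
Q, alpha, beta = lanczos(A)
T = torch.diag(alpha) + torch.diag(beta, 1) + torch.diag(beta, -1)
theta, S = torch.linalg.eigh(T)  # stand-in for ?stebz/?stein
eigval = theta[-1]     # ≈ largest (algebraic) eigenvalue of A
eigvec = Q @ S[:, -1]  # ≈ corresponding eigenvector
```

In the proposed eigsh, this m-step factorization is repeated each iteration (hence the n_iter return value in the snapshot below), with convergence checked against a residual tolerance.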
Snapshot:

```python
import torch
import arpack

# symmetric matrix input
A = torch.randn(1024, 1024)
A = torch.mm(A.t(), A)

# quickly compute the largest/smallest (algebraic) eigenpair
eigval, eigvec, n_iter = arpack.eigsh(A, largest=True)
```

In my runtime experiments, the algorithm is much faster than torch.linalg.eigh/torch.lobpcg and comes close to scipy.sparse.linalg.eigsh. These experiments were run on CPU using a PyTorch build compiled with MKL and OpenMP threading.
Results:

```
[--------------------------- eigensolvers --------------------------]
                    |   256   |   512   |   1024  |   2048
6 threads: ----------------------------------------------------------
      torch_eigh    |  1803.1 |  7187.1 | 30054.7 | 208623.6
      torch_lobpcg  |  5670.4 |  9149.9 | 38405.4 |  47878.6
      arpack_eigsh  |   366.0 |  2171.9 |  4787.2 |  15731.7
      scipy_eigsh   |   629.9 |  1023.1 |  3104.4 |  11026.9
      scipy_lobpcg  |  5705.1 |  7917.0 |  8210.3 |  16537.1

Times are in microseconds (us).
```
The benchmark script can be viewed here.
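The linked script isn't reproduced here, but a minimal reconstruction of this kind of comparison with torch.utils.benchmark might look as follows (the sizes and thread count mirror the table above; the two solvers shown and all labels are assumptions for illustration):

```python
import torch
import torch.utils.benchmark as benchmark

results = []
for n in (256, 512, 1024, 2048):
    A = torch.randn(n, n)
    A = A.T @ A  # symmetric test matrix
    for sub_label, stmt in [
        ("torch_eigh", "torch.linalg.eigh(A)"),
        ("torch_lobpcg", "torch.lobpcg(A, k=1)"),
    ]:
        results.append(
            benchmark.Timer(
                stmt=stmt,
                globals={"torch": torch, "A": A},
                num_threads=6,
                label="eigensolvers",
                sub_label=sub_label,
                description=str(n),
            ).blocked_autorange()
        )

# Render a table in the same format as the results above
benchmark.Compare(results).print()
```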
## Alternatives
In addition to the custom eigsh function, I've also included a function eigsh_mkl that wraps an equivalent tool from the Intel Math Kernel Library (MKL). It is a bit slower than my own implementation, but potentially more reliable as an Intel-backed product.
I've started tracking a list of relevant CPU and CUDA implementations for extremal eigenvalue problems here. For positive-definite matrices, there appears to be a LOBPCG implementation available for both platforms. In general, offerings on CUDA are relatively limited. My own eigsh implementation is portable to CUDA in principle, with the exception of the LAPACK routines ?stebz and ?stein, for which I'm not aware of any MAGMA/cuSOLVER alternatives.
cc @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @IvanYashchuk @xwang233 @lezcano @aocsa @cpuhrsch