## 🚀 Feature

When working with large matrices, it is often useful to access one or a handful of desired eigenpairs without computing the complete spectrum. PyTorch currently offers only torch.lobpcg, which is limited to positive-definite matrices and, in my experience, is very slow. I've written a PyTorch C++ extension for computing extremal eigenpairs based on canonical Arnoldi and Lanczos techniques. These tools are applicable to a wide range of ML problems and would be a useful addition to the core PyTorch library. The algorithms rely only on matrix-vector products and are therefore relevant to the ongoing torch.sparse project.
## Motivation

Regarding current PyTorch offerings, torch.lobpcg is limited to positive-definite matrices and does not appear to be an efficient implementation of the LOBPCG algorithm (it is much slower than SciPy's equivalent lobpcg on CPU). nn.utils.spectral_norm computes a single eigenpair using power iteration; however, power iteration has poor convergence properties, and in the special case of symmetric/Hermitian matrices, convergence can be greatly improved (e.g. with the Lanczos method) at little additional cost.

The motivation is to augment the PyTorch library with fast, compiled tools based on canonical algorithms for extremal eigenproblems. The proposed package is inspired by the ARPACK Fortran library. It uses only matrix-vector products and is therefore applicable to sparse matrices. Even for matrices stored in dense format, the algorithm is faster and has a smaller memory footprint than full-spectrum methods.
## Pitch

The current package (temporarily named "arpack") includes a single tool, eigsh, which is applicable to symmetric matrices that are potentially indefinite. At each step, the algorithm performs an m-step Lanczos factorization and then calls the LAPACK routines ?stebz and ?stein to compute a partial Schur decomposition of the resulting tridiagonal system. There are no MAGMA analogues to ?stebz and ?stein that I'm aware of; therefore, only CPU is currently supported (see "Alternatives" below for some proposals regarding GPU). A rough sketch of the Lanczos kernel is shown below.
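For intuition, here is a minimal sketch of the m-step Lanczos factorization in plain PyTorch. This is an illustrative assumption of how the kernel works, not the proposed C++ implementation; in particular, torch.linalg.eigh on the small tridiagonal matrix stands in for the LAPACK ?stebz/?stein calls, and the function name and defaults are my own.

```python
import torch

def lanczos(A, m=64, v0=None):
    # Sketch only, not the proposed API. Builds an m-step Lanczos
    # factorization A @ Q ≈ Q @ T, where Q has orthonormal columns and
    # T is symmetric tridiagonal with diagonal `alpha` and off-diagonal
    # `beta`. A is touched only through mat-vecs, so any operator
    # supporting `A @ v` (e.g. a sparse matrix) works.
    n = A.shape[0]
    v = torch.randn(n, dtype=A.dtype) if v0 is None else v0.clone()
    v = v / v.norm()
    Q = torch.zeros(n, m, dtype=A.dtype)
    alpha = torch.zeros(m, dtype=A.dtype)
    beta = torch.zeros(m - 1, dtype=A.dtype)
    for j in range(m):
        Q[:, j] = v
        w = A @ v  # the only use of A
        alpha[j] = torch.dot(w, v)
        # Full reorthogonalization against all previous Lanczos vectors
        # (simple but costly; ARPACK-style codes are more economical).
        w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j < m - 1:
            beta[j] = w.norm()
            v = w / beta[j]
    return Q, alpha, beta

# Ritz values/vectors of the small tridiagonal system approximate the
# extremal eigenpairs of A.
A = torch.randn(1024, 1024, dtype=torch.float64)
A = A.T @ A
Q, alpha, beta = lanczos(A)
T = torch.diag(alpha) + torch.diag(beta, 1) + torch.diag(beta, -1)
theta, S = torch.linalg.eigh(T)  # stand-in for ?stebz/?stein
eigval = theta[-1]     # ≈ largest (algebraic) eigenvalue of A
eigvec = Q @ S[:, -1]  # ≈ corresponding eigenvector
```

In the proposed eigsh, this m-step factorization is repeated each iteration (hence the n_iter return value in the snapshot below), with convergence checked against a residual tolerance.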
Snapshot:

```python
import torch
import arpack

# symmetric matrix input
A = torch.randn(1024, 1024)
A = torch.mm(A.t(), A)

# quickly compute the largest/smallest (algebraic) eigenpair
eigval, eigvec, n_iter = arpack.eigsh(A, largest=True)
```

In my runtime experiments, the algorithm is much faster than torch.linalg.eigh/torch.lobpcg and comes close to scipy.sparse.linalg.eigsh. These experiments were run on CPU using a PyTorch build compiled with MKL and OpenMP threading.
Results:

```
[--------------------------- eigensolvers --------------------------]
                    |   256   |   512   |   1024  |   2048
6 threads: ----------------------------------------------------------
      torch_eigh    |  1803.1 |  7187.1 | 30054.7 | 208623.6
      torch_lobpcg  |  5670.4 |  9149.9 | 38405.4 |  47878.6
      arpack_eigsh  |   366.0 |  2171.9 |  4787.2 |  15731.7
      scipy_eigsh   |   629.9 |  1023.1 |  3104.4 |  11026.9
      scipy_lobpcg  |  5705.1 |  7917.0 |  8210.3 |  16537.1

Times are in microseconds (us).
```
The benchmark script can be viewed here.
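The linked script isn't reproduced here, but a minimal reconstruction of this kind of comparison with torch.utils.benchmark might look as follows (the sizes and thread count mirror the table above; the two solvers shown and all labels are assumptions for illustration):

```python
import torch
import torch.utils.benchmark as benchmark

results = []
for n in (256, 512, 1024, 2048):
    A = torch.randn(n, n)
    A = A.T @ A  # symmetric test matrix
    for sub_label, stmt in [
        ("torch_eigh", "torch.linalg.eigh(A)"),
        ("torch_lobpcg", "torch.lobpcg(A, k=1)"),
    ]:
        results.append(
            benchmark.Timer(
                stmt=stmt,
                globals={"torch": torch, "A": A},
                num_threads=6,
                label="eigensolvers",
                sub_label=sub_label,
                description=str(n),
            ).blocked_autorange()
        )

# Render a table in the same format as the results above
benchmark.Compare(results).print()
```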
## Alternatives
In addition to the custom eigsh function, I've also included a function eigsh_mkl that wraps an equivalent tool from the Intel Math Kernel Library (MKL). It is a bit slower than my own implementation, but potentially more reliable as an Intel-backed product.
I've started tracking a list of relevant CPU and CUDA implementations for extremal eigenvalue problems here. For positive-definite matrices, there appears to be a LOBPCG implementation available for both platforms. In general, offerings on CUDA are relatively limited. My own eigsh implementation is portable to CUDA in principle, with the exception of the LAPACK routines ?stebz and ?stein, for which I'm not aware of any MAGMA/cuSOLVER alternatives.
cc @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @IvanYashchuk @xwang233 @lezcano @aocsa @cpuhrsch