Feature
Abstraction for lazy evaluation of tensors that can make use of special matrix structure for linear algebra operations, similar to TensorFlow's / scipy's LinearOperator.
Motivation
Sequences of operations involving tensors can often be expressed and computed without ever instantiating the tensors themselves. This can be achieved using lazy evaluation, so that a compiler can optimize the operations (“operator fusion”). In the context of linear algebra, the involved (input or intermediate) tensors often have special structure (e.g. diagonal, block diagonal, triangular, Toeplitz, …). Exploiting this structure using (semi-)symbolic evaluation and reductions can significantly decrease time and memory complexity, often by many orders of magnitude (see example below).
While TensorFlow’s LinearOperator (and, to a lesser degree, scipy’s) provides a basic abstraction for implementing such operators, PyTorch does not have this concept. This is a significant shortcoming for researchers whose work involves a lot of linear algebra.
General Wishlist
In order for this to be generally useful, we should have the following:
- PyTorch autograd functions can accept the operator abstraction
- The operator abstraction supports the same methods as Tensors, with default implementations that materialize the operator and perform the standard dense operations on the resulting tensors.
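A minimal sketch of what such an abstraction might look like, with a structured subclass that overrides the default dense fallback (class and method names here are hypothetical, not an existing PyTorch API):

```python
import torch

class LinearOperator:
    """Sketch of a lazy operator base class (hypothetical API).

    Default implementations materialize the operator via to_dense() and
    fall back to standard dense Tensor operations; subclasses override
    methods where their structure admits a cheaper computation.
    """

    def to_dense(self) -> torch.Tensor:
        raise NotImplementedError

    def matmul(self, other: "LinearOperator") -> torch.Tensor:
        # Default: materialize both operands and multiply densely.
        return self.to_dense() @ other.to_dense()

    def diagonal(self) -> torch.Tensor:
        return self.to_dense().diagonal()


class DiagOperator(LinearOperator):
    """Diagonal matrix represented only by its diagonal entries."""

    def __init__(self, diag: torch.Tensor):
        self.diag = diag

    def to_dense(self) -> torch.Tensor:
        return torch.diag(self.diag)

    def matmul(self, other):
        if isinstance(other, DiagOperator):
            # Structured fast path: O(n) instead of O(n^3),
            # and the result stays lazy (a DiagOperator).
            return DiagOperator(self.diag * other.diag)
        return super().matmul(other)

    def diagonal(self) -> torch.Tensor:
        return self.diag
```

In a full design the dense fallback would also return an operator (wrapping the materialized tensor) rather than a raw Tensor; the sketch keeps that simple.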
Alternatives / 3rd party solutions
Third-party implementations include KeOps’s LazyTensor, which compiles the symbolic operations (basically a JIT), and GPyTorch’s LazyTensor, which performs reductions in a pair-wise fashion using PyTorch’s Python frontend.
Illustration: Simple example using GPyTorch LazyTensor
```python
import torch
from gpytorch.lazy import DiagLazyTensor

d = torch.rand(10000)
D = d.diag()                # dense 10000 x 10000 tensor
D_lazy = DiagLazyTensor(d)  # same matrix, but represented only by d

(D @ D).diag()            # 1.6 seconds
(D_lazy @ D_lazy).diag()  # 0.000027 seconds (D_lazy @ D_lazy is itself a DiagLazyTensor)
```
Many other operations can similarly be accelerated when exploiting known tensor structure, including matrix decompositions (Cholesky) and spectral operations.
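As an illustration of the decomposition case: for a diagonal positive-definite matrix, the Cholesky factor is simply the diagonal of element-wise square roots, so a structure-aware implementation avoids the O(n³) dense factorization entirely (this snippet uses plain dense tensors to verify the identity, not a lazy abstraction):

```python
import torch

d = torch.rand(1000) + 0.1  # strictly positive diagonal -> PSD matrix
D = torch.diag(d)

L_dense = torch.linalg.cholesky(D)  # O(n^3) dense factorization
L_struct = torch.diag(d.sqrt())     # O(n) using the diagonal structure

assert torch.allclose(L_dense, L_struct, atol=1e-6)
```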
Limitations
Both of these solutions have significant shortcomings. KeOps requires setting up an external compiler toolchain (which can be painful), and does not straightforwardly support sparse / structured representations. GPyTorch, being implemented in pure python, incurs overhead that for smaller tensors / simpler operations often outweighs the benefits.
Pyro’s suggested Funsor abstraction also addresses the issue, but is much more general in scope (it generalizes the tensor interface to also cover arbitrary functions of multiple variables, where variables may be integers, real numbers or themselves tensors).
Existing plans for Lazy Tensors
The proposed implementation for lazy tensors in #25753 lays the groundwork for lazy evaluation. However, so far it only deals with the basic dispatch mechanism, not with any actual code optimizations. It should be possible to optimize the involved operations by manipulating the PyTorch intermediate representation (IR) generated during evaluation of the lazy operations.
In order to be able to exploit special structure, it seems that one would need to have certain structured tensor primitives (similar to GPyTorch’s BlockDiagLazyTensor or TensorFlow’s LinearOperatorKronecker).
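For instance, a Kronecker primitive would let a matrix-vector product with A ⊗ B be computed via the identity (A ⊗ B) vec(X) = vec(A X Bᵀ) (row-major vec), without ever materializing the full Kronecker product. A standalone sketch of the underlying computation (the function name is illustrative, not an existing API):

```python
import torch

def kron_matvec(A: torch.Tensor, B: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Compute (A kron B) @ x without forming the Kronecker product.

    Uses (A kron B) vec(X) = vec(A X B^T) with row-major vec,
    costing O(n*m*q + n*p*q) instead of O(n*p * m*q).
    """
    m, q = A.shape[1], B.shape[1]
    X = x.reshape(m, q)                 # row-major unvec
    return (A @ X @ B.T).reshape(-1)    # row-major vec of A X B^T

A = torch.rand(3, 4)
B = torch.rand(5, 6)
x = torch.rand(4 * 6)

dense = torch.kron(A, B) @ x   # materializes the full 15 x 24 matrix
structured = kron_matvec(A, B, x)

assert torch.allclose(dense, structured, atol=1e-5)
```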
@gpleiss, @jacobrgardner, @vishwakftw, @bwasti
cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @Varal7 @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @IvanYashchuk @xwang233 @lezcano @vincentqb @vishwakftw @ssnl