$(AB)^T = B^T A^T$.

If $A, B$ are invertible, then $(AB)^{-1} = B^{-1}A^{-1}$.

If $A, B$ are square and $AB = I$, then $BA = I$.

$\begin{bmatrix} I_m & X \\ 0 & I_n \end{bmatrix}^{-1} = \begin{bmatrix} I_m & -X \\ 0 & I_n \end{bmatrix}$.
Neumann series: if $\|X\| < 1$ in any norm, then $I - X$ is invertible and
$(I - X)^{-1} = I + X + X^2 + X^3 + \cdots$ (a quick numerical check of this and the trace identities below follows this list of facts).
For a square $n \times n$ matrix $A$, the trace is $\mathrm{Trace}(A) = \sum_{i=1}^n A_{i,i}$ (sum of the diagonal entries). For any $X, Y$ such that $XY$ is square, $\mathrm{Trace}(XY) = \mathrm{Trace}(YX)$ (quite useful). For $B \in \mathbb{R}^{m\times n}$, we have $\|B\|_F^2 = \sum_i \sum_j |B_{ij}|^2 = \mathrm{Trace}(B^T B)$.
Triangular structure (upper or lower) is invariant under addition, multiplication, and
inversion. That is, triangular matrices form a ring (in abstract algebra; don’t worry if
this is foreign to you).
Symmetry is invariant under addition and inversion, but not multiplication; AB is
usually not symmetric even if A, B are.
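These facts are easy to sanity-check numerically. Below is a minimal NumPy sketch (the matrix sizes and the scaling of $X$ are arbitrary choices for illustration) verifying the Neumann series and the two trace identities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Neumann series: scale X so that ||X||_2 < 1, then (I - X)^{-1} = I + X + X^2 + ...
n = 5
X = rng.standard_normal((n, n))
X *= 0.5 / np.linalg.norm(X, 2)                       # now ||X||_2 = 0.5 < 1
neumann = sum(np.linalg.matrix_power(X, k) for k in range(60))
print(np.allclose(neumann, np.linalg.inv(np.eye(n) - X)))          # True

# Trace identities
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 4))
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))                # True: Trace(XY) = Trace(YX)
print(np.isclose(np.linalg.norm(B, 'fro')**2, np.trace(B.T @ B)))  # True: ||B||_F^2 = Trace(B^T B)
```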
2 SVD: the most important matrix decomposition
We now start the discussion of the most important topic of the course: the singular value
decomposition (SVD). The SVD exists for any matrix, square or rectangular, real or complex.
We will prove its existence and discuss its properties and applications in particular in
low-rank approximation, which can immediately be used for compressing the matrix, and
therefore data.
The SVD has many intimate connections to symmetric eigenvalue problems. Let’s start
with a review.
Symmetric eigenvalue decomposition: Any symmetric matrix $A \in \mathbb{R}^{n\times n}$ has the decomposition

    $A = V \Lambda V^T$    (1)

where $V$ is orthogonal, $V^T V = I_n = V V^T$, and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ is a diagonal matrix of eigenvalues.
The $\lambda_i$ are the eigenvalues, and $V$ is the matrix of eigenvectors (its columns are the eigenvectors).
The decomposition (1) makes two remarkable claims: the eigenvectors can be taken to be orthogonal (which is true more generally of normal matrices, i.e. those satisfying $A^* A = AA^*$), and the eigenvalues are real.
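Both claims can be observed numerically with NumPy's symmetric eigensolver np.linalg.eigh; the random symmetric test matrix below is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                                   # a random symmetric matrix

lam, V = np.linalg.eigh(A)                          # eigenvalues (ascending) and eigenvectors
print(lam.dtype)                                    # float64: the eigenvalues are real
print(np.allclose(V.T @ V, np.eye(5)))              # True: the eigenvectors are orthonormal
print(np.allclose(V @ np.diag(lam) @ V.T, A))       # True: A = V Lambda V^T
```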
It is worth reminding you that eigenvectors are not uniquely determined: (assuming they are normalised s.t. the 2-norm is 1) their signs can always be flipped. (And actually more: when there are repeated eigenvalues $\lambda_i = \lambda_j$, the corresponding eigenvectors span a subspace whose dimension matches the multiplicity. For example, any vector is an eigenvector of the identity matrix $I$.)
Now here is the protagonist of this course: Be sure to spend dozens (if not hundreds) of
hours thinking about it!
Theorem 2.1 (Singular Value Decomposition (SVD)) Any matrix $A \in \mathbb{R}^{m\times n}$ has the decomposition

    $A = U \Sigma V^T$.    (2)

Here $U^T U = V^T V = I_n$ (assuming $m \geq n$ for definiteness), and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$ with $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0$.
The $\sigma_i$ (always nonnegative) are called the singular values of $A$. The rank of $A$ is the number of positive singular values. The columns of $U$ are called the left singular vectors, and the columns of $V$ are the right singular vectors.
Writing $U = [u_1, \ldots, u_n]$ and $V = [v_1, \ldots, v_n]$, we have an important alternative expression for $A$: $A = \sum_{i=1}^n \sigma_i u_i v_i^T$. We will use this expression repeatedly in what follows. Also note that we always order the singular values in nonincreasing order, so $\sigma_1$ is always the largest singular value.
The SVD tells us that any (tall) matrix can be written as orthonormal times diagonal times orthogonal. Roughly, orthonormal/orthogonal matrices can be thought of as rotations or reflections, so the SVD says the action of a matrix can be thought of as a rotation/reflection followed by magnification (or shrinkage), followed by another rotation/reflection.
Proof: For the SVD$^6$ ($m \geq n$ and assume full rank, $\sigma_n > 0$, for simplicity): Take the Gram matrix $A^T A$ (symmetric) and its eigendecomposition $A^T A = V \Lambda V^T$ with $V$ orthogonal. $\Lambda$ is nonnegative, and $(AV)^T (AV) = \Lambda =: \Sigma^2$ is diagonal, so $AV\Sigma^{-1} =: U$ is orthonormal. Right-multiply by $\Sigma V^T$ to get $A = U \Sigma V^T$. $\square$
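The proof is constructive, so for illustration it can be executed directly (this is not how the SVD should be computed in practice, since forming the Gram matrix $A^T A$ squares the condition number); a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 3
A = rng.standard_normal((m, n))            # full column rank with probability 1

lam, V = np.linalg.eigh(A.T @ A)           # eigendecomposition of the Gram matrix (ascending)
lam, V = lam[::-1], V[:, ::-1]             # reorder so the singular values are nonincreasing
Sigma = np.diag(np.sqrt(lam))
U = A @ V @ np.linalg.inv(Sigma)           # U = A V Sigma^{-1}

print(np.allclose(U.T @ U, np.eye(n)))     # True: U has orthonormal columns
print(np.allclose(U @ Sigma @ V.T, A))     # True: A = U Sigma V^T
```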
It is also worth mentioning the "full" SVD: $A = [U, U_\perp] \begin{bmatrix} \Sigma \\ 0 \end{bmatrix} V^T$, where $[U, U_\perp] \in \mathbb{R}^{m\times m}$ is square and orthogonal. $U_\perp \in \mathbb{R}^{m\times (m-n)}$ (pronounced 'U-perp', for perpendicular) is an orthonormal matrix such that $U^T U_\perp = 0$ (whose existence and construction can be established via the Householder QR factorisation in Section 6). Essentially the full SVD follows from the (thin) SVD (2) by filling in $U$ in (2) with its orthogonal complement $U_\perp$.
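In NumPy the thin and full SVDs correspond to the full_matrices flag of np.linalg.svd; a quick sketch of the relationship (random test matrix, arbitrary sizes):

```python
import numpy as np

A = np.random.default_rng(3).standard_normal((6, 3))

U_thin, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: U_thin is 6 x 3
U_full, s, Vt = np.linalg.svd(A, full_matrices=True)    # full SVD: [U, U_perp] is 6 x 6
U_perp = U_full[:, 3:]                                   # orthogonal complement of the column space

print(U_thin.shape, U_full.shape)                        # (6, 3) (6, 6)
print(np.allclose(U_thin.T @ U_perp, 0))                 # True: U^T U_perp = 0
```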
$^6$ I like to think this is the shortest proof out there.
Example 2.1 Let's find the SVD of $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}$. We can follow the proof and compute the Gram matrix $A^T A = \begin{bmatrix} 6 & 4 \\ 4 & 6 \end{bmatrix}$. Its eigenvalues are 10 and 2 (via the characteristic polynomial $(\lambda - 6)(\lambda - 6) - 4^2 = 0$; please note though that this isn't how they should be computed on a computer (the QR algorithm), which we'll cover later). An eigenvector matrix can be found as $V = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$ (degree of freedom in the signs of the columns), computed e.g. via the null vector of $(A^T A - \lambda I)$. Hence $A^T A = V \Sigma^2 V^T$ where $\Sigma^2 = \begin{bmatrix} 10 & \\ & 2 \end{bmatrix}$. Let $U = AV\Sigma^{-1} = \begin{bmatrix} 3/\sqrt{20} & -1/2 \\ 3/\sqrt{20} & 1/2 \\ 1/\sqrt{20} & 1/2 \\ 1/\sqrt{20} & -1/2 \end{bmatrix}$, which is orthonormal. We thus have $A = U \Sigma V^T$.
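We can confirm the example against NumPy's built-in SVD; the computed singular values should be $\sqrt{10}$ and $\sqrt{2}$ (the singular vectors may differ by sign flips):

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 1.],
              [1., 0.],
              [0., 1.]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(s)                                       # [3.1622..., 1.4142...] = [sqrt(10), sqrt(2)]
print(np.allclose(U @ np.diag(s) @ Vt, A))     # True
```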
2.1 (Some of the many) applications and consequences of the
SVD: rank, column/row space, etc
From the SVD one can immediately read off a number of important properties of the matrix.
For example:
The rank $r$ of $A \in \mathbb{R}^{m\times n}$, often denoted rank($A$): this is the number of nonzero (positive) singular values $\sigma_i(A)$. rank($A$) is also equal to the number of linearly independent rows or the number of independent columns, as you probably learnt in your first course on linear algebra (exercise).
– We can always write $A = \sum_{i=1}^{\mathrm{rank}(A)} \sigma_i u_i v_i^T$.
Important: An $m \times n$ matrix $A$ that is of rank $r$ can be written as an (outer) product of $m \times r$ and $r \times n$ matrices:

    $A = U_r \Sigma_r V_r^T$.

To see this, note from the SVD that $A = \sum_{i=1}^n \sigma_i u_i v_i^T = \sum_{i=1}^r \sigma_i u_i v_i^T$ (since $\sigma_{r+1} = \cdots = \sigma_n = 0$), and so $A = U_r \Sigma_r V_r^T$ where $U_r = [u_1, \ldots, u_r]$, $V_r = [v_1, \ldots, v_r]$, and $\Sigma_r$ is the leading $r \times r$ submatrix of $\Sigma$ (see the numerical sketch after this list).
Column space (Span($A$), the linear subspace spanned by vectors $Ax$): the span of $u_1, \ldots, u_r$, often denoted Span($U$).
Row space (Span($A^T$)): the span of $v_1, \ldots, v_r$, i.e. the row span of $v_1^T, \ldots, v_r^T$; Span($V$).
Null space Null($A$): the span of $v_{r+1}, \ldots, v_n$, as $Av = 0$ for these vectors. Null($A$) is trivial (just the 0 vector) if $m \geq n$ and $r = n$. (When $m < n$ the full SVD is needed to describe Null($A$).)
We have $Av_i = \sigma_i u_i$ and $u_i^T A = \sigma_i v_i^T$. If $\sigma_i, u_i, v_i$ satisfy this pair of equations, they are called a (the $i$th) singular triplet of $A$.
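As a numerical illustration of the items above, here is a sketch that reads these quantities off the (full) SVD for a random rank-3 test matrix; the matrix, the sizes and the tolerance are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, r = 100, 80, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-3 by construction

U, s, Vt = np.linalg.svd(A)                    # full SVD
tol = 1e-10 * s[0]
rank = int(np.sum(s > tol))                    # numerical rank: 3

# Rank-r factorisation A = U_r Sigma_r V_r^T and the storage it needs
Ur, Sr, Vrt = U[:, :rank], np.diag(s[:rank]), Vt[:rank, :]
print(np.allclose(Ur @ Sr @ Vrt, A))           # True
print(m * n, (m + n + 1) * rank)               # 8000 vs 543 numbers stored

# Column space, null space, and a singular triplet
col_basis  = U[:, :rank]                       # orthonormal basis of Span(A)
null_basis = Vt[rank:, :].T                    # orthonormal basis of Null(A)
print(np.allclose(A @ null_basis, 0))          # True
i = 0
print(np.allclose(A @ Vt[i], s[i] * U[:, i]))  # True: A v_i = sigma_i u_i
print(np.allclose(U[:, i] @ A, s[i] * Vt[i]))  # True: u_i^T A = sigma_i v_i^T
```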
Aside from these and other applications, the SVD is also a versatile theoretical tool. Very
often, a good place to start in proving a fact about matrices is to first consider its SVD; you
will see this many times in this course. For example, the SVD can give solutions immediately
for linear systems and least-squares problems, though there are more efficient ways to solve
these problems.
2.2 SVD and symmetric eigenvalue decomposition
As mentioned above, the SVD and the eigenvalue decomposition for symmetric matrices are closely connected. Here are some results that highlight the connections between $A = U \Sigma V^T$ and the symmetric eigenvalue decomposition. (We assume $m \geq n$ for definiteness.)
$V$ is an eigvector matrix of $A^T A$. (To verify, see the proof of the SVD.)
$U$ is an eigvector matrix (for the nonzero eigvals) of $AA^T$ (be careful with sign flips; see below).

$\sigma_i = \sqrt{\lambda_i(A^T A)}$ for $i = 1, \ldots, n$.
If $A$ is symmetric, its singular values $\sigma_i(A)$ are the absolute values of its eigenvalues $\lambda_i(A)$, i.e., $\sigma_i(A) = |\lambda_i(A)|$.
Exercise: What if $A$ is unitary, skew-symmetric, normal, or triangular? (problem sheet)
Jordan-Wielandt matrix $\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}$: This matrix has eigenvalues $\pm\sigma_i(A)$, and $m - n$ copies of the eigenvalue 0. Its eigenvector matrix is $\begin{bmatrix} U & U & U_\perp \\ V & -V & 0 \end{bmatrix}$, where $A^T U_\perp = 0$ ($U_\perp$ is empty when $m = n$). This matrix, along with the Gram matrix $A^T A$, is a very useful tool when one tries to extend a result on symmetric eigenvalues to an analogue in terms of the SVD (or vice versa).
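A quick numerical check of the Jordan-Wielandt eigenvalue claim (random test matrix, arbitrary sizes):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 5, 3
A = rng.standard_normal((m, n))
s = np.linalg.svd(A, compute_uv=False)            # singular values of A

J = np.block([[np.zeros((m, m)), A],
              [A.T, np.zeros((n, n))]])           # Jordan-Wielandt matrix
eigs = np.sort(np.linalg.eigvalsh(J))             # its eigenvalues, ascending
expected = np.sort(np.concatenate([s, -s, np.zeros(m - n)]))
print(np.allclose(eigs, expected))                # True: eigenvalues are +/- sigma_i and m-n zeros
```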
2.3 Uniqueness etc
We have established the existence of the SVD $A = U \Sigma V^T$. A natural question is: is it unique? In other words, are the factors $U, \Sigma, V$ uniquely determined by $A$?
It is straightforward to see that the singular vectors are not uniquely determined. Most
obviously, the singular vectors can be flipped in signs, just like eigenvectors. However, note
that the signs of $u_i, v_i$ are not entirely arbitrary: if we replace $u_i$ by $-u_i$, the same needs to be done for $v_i$ in order to satisfy $A = \sum_{i=1}^n \sigma_i u_i v_i^T$. Essentially, once the sign (or rotation) of $u_i$ (or $v_i$) is fixed, $v_i$ (or $u_i$) is determined uniquely.
More generally, in the presence of multiple singular values (i.e., $\sigma_i = \sigma_{i+1}$ for some $i$), there is a higher degree of freedom in the SVD. Again this is very much analogous to eigenvalues (recall the discussion on eigenvectors of the identity). Here think of what the SVD is for an orthogonal matrix: there is an enormous amount of freedom in the choice of $U$ and $V$. The singular values $\sigma_i$, on the other hand, are always unique, as their squares are the eigenvalues of the Gram matrix.
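To make the orthogonal-matrix case concrete: if $Q$ is orthogonal, then for any orthogonal $W$, taking $U = QW$, $\Sigma = I$ and $V = W$ gives a valid SVD of $Q$ (all singular values equal 1). A small sketch (the random orthogonal matrices are generated via QR purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # a random orthogonal matrix
W, _ = np.linalg.qr(rng.standard_normal((n, n)))   # any orthogonal W gives another SVD of Q

U, Sigma, V = Q @ W, np.eye(n), W                  # Q = U Sigma V^T with all singular values 1
print(np.allclose(U @ Sigma @ V.T, Q))             # True
print(np.allclose(U.T @ U, np.eye(n)), np.allclose(V.T @ V, np.eye(n)))  # True True
```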
3 Low-rank approximation via truncated SVD
While the SVD has a huge number of applications, undoubtedly the biggest reason that makes
it so important in computational mathematics is its optimality for low-rank approximation.
To discuss this topic we need some preparation. We will make heavy use of the spectral norm of matrices. We start with an important characterisation of $\|A\|_2$ in terms of the singular value(s), which we previously stated but did not prove.
Proposition 3.1

    $\|A\|_2 = \max_{x \neq 0} \dfrac{\|Ax\|_2}{\|x\|_2} = \max_{\|x\|_2 = 1} \|Ax\|_2 = \sigma_1(A)$.
Proof: Use the SVD: for any $x$ with unit norm $\|x\|_2 = 1$,

    $\|Ax\|_2 = \|U \Sigma V^T x\|_2$
    $= \|\Sigma V^T x\|_2$   (by unitary invariance)
    $= \|\Sigma y\|_2$   (with $y = V^T x$, so $\|y\|_2 = 1$)
    $= \sqrt{\textstyle\sum_{i=1}^n \sigma_i^2 y_i^2}$
    $\leq \sqrt{\textstyle\sum_{i=1}^n \sigma_1^2 y_i^2} = \sigma_1 \|y\|_2 = \sigma_1$.

Finally, note that taking $x = v_1$ (the leading right singular vector), we have $\|Av_1\|_2 = \sigma_1$. $\square$
Similarly, the Frobenius norm can be expressed as $\|A\|_F = \sqrt{\sum_i \sum_j |A_{ij}|^2} = \sqrt{\sum_{i=1}^n (\sigma_i(A))^2}$, and the trace norm is (by definition) $\|A\|_* = \sum_{i=1}^{\min(m,n)} \sigma_i(A)$ (exercise). In general, norms that are unitarily invariant can be characterised by the singular values [20].
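These characterisations are easy to confirm numerically (NumPy exposes the spectral, Frobenius and trace/nuclear norms directly; the test matrix is an arbitrary choice):

```python
import numpy as np

A = np.random.default_rng(7).standard_normal((6, 4))
s = np.linalg.svd(A, compute_uv=False)

print(np.isclose(np.linalg.norm(A, 2), s[0]))                        # spectral norm = sigma_1
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))   # Frobenius norm
print(np.isclose(np.linalg.norm(A, 'nuc'), np.sum(s)))               # trace (nuclear) norm
```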
Now to the main problem: Given $A \in \mathbb{R}^{m\times n}$, consider the problem of finding a rank-$r$ matrix (remember: these are matrices with $r$ nonzero singular values) $A_r \in \mathbb{R}^{m\times n}$ that best approximates $A$. That is, find the minimiser $A_r$ for

    $\mathrm{argmin}_{\mathrm{rank}(A_r) \leq r} \|A - A_r\|_2$.    (3)