Matrix Note
This note is not a supplementary material for the main slides. I will write notes such as this one
when certain concepts cannot be put on slides. This time, the aim is to elaborate upon subspace
concepts at a more fundamental level.
In words, span{a1 , . . . an } is the set of all possible linear combinations of the vectors a1 , . . . an . It
is easy to verify that span{a1 , . . . an } is a subspace for any given a1 , . . . an . In the literature, span
is commonly used to represent a subspace. For example, we can represent Rm by
Rm = span{e1 , . . . , em }.
In fact, any subspace can be written as a span:
Theorem 1.1 For every subspace S ⊆ Rm , there exists a positive integer n and a collection of
vectors a1 , . . . , an ∈ Rm such that S = span{a1 , . . . , an }.
In the literature you will notice that the result in Theorem 1.1 is taken for granted—almost as common sense, without elaboration. There is an easy way to prove Theorem 1.1, but the proof requires some results on linear independence. We relegate the proof to a later part of this note.
A set of vectors {a1 , . . . , an } is called linearly independent if α1 a1 + · · · + αn an = 0 holds only when α1 = · · · = αn = 0. Otherwise, it is called linearly dependent. A subset {ai1 , . . . , aik }, where {i1 , . . . , ik } ⊆ {1, . . . , n} with ij ≠ il for any j ≠ l, and 1 ≤ k ≤ n, is called a maximal linearly independent subset of {a1 , . . . , an } if it is linearly independent and is not contained in any other linearly independent subset of {a1 , . . . , an }. From the above definitions, we see that
1. if {a1 , . . . , an } is linearly independent, then no aj can be a linear combination of the set of the other vectors {ai : i ∈ {1, . . . , n}, i ≠ j};
2. if {a1 , . . . , an } is linearly dependent, then there exists a vector aj that is a linear combination of the other vectors.
Thus, roughly speaking, we may see a linearly independent {a1 , . . . , an } as a non-redundant (or
sufficiently different) set of vectors, and a maximal linearly independent {ai1 , . . . , aik } as an irre-
ducibly non-redundant set of vectors for representing the whole vector set {a1 , . . . , an }. It can be
easily shown that for any maximal linearly independent subset {ai1 , . . . , aik } of {a1 , . . . , an }, we
have
span{a1 , . . . , an } = span{ai1 , . . . , aik }.
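As a quick numerical illustration of this span equality, here is a small NumPy sketch (the vectors are arbitrary choices) that greedily extracts a maximal linearly independent subset via rank tests and confirms that the two spans have the same dimension.

```python
import numpy as np

# Arbitrary vectors in R^3 with a redundancy: a3 = a1 + a2.
a = [np.array([1., 0., 1.]),
     np.array([0., 1., 1.]),
     np.array([1., 1., 2.]),
     np.array([1., 0., 0.])]

# Greedily keep a vector only if it increases the rank, i.e., it is not a
# linear combination of the vectors kept so far.
subset = []
for v in a:
    cand = np.column_stack(subset + [v])
    if np.linalg.matrix_rank(cand) > len(subset):
        subset.append(v)

A = np.column_stack(a)
Q = np.column_stack(subset)
# Same span <=> same dimension here, since the kept columns come from A.
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(Q))  # both 3
```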
We also have the following results which are very basic and we again take them almost for
granted:
The first result is simple: if there exists a β ≠ α such that y = Σ_{i=1}^{n} βi ai , then we have Σ_{i=1}^{n} (αi − βi ) ai = y − y = 0, which contradicts the linear independence of {a1 , . . . , an }. For the second result, I show you the proof in [2]. The proof is done by induction. Suppose m = 1. Then it is
easy to see that n ≤ 1 must hold. Next suppose m ≥ 2. We need to show that n ≤ m. Partition
the ai ’s as
ai = [ bi ]
     [ ci ] ,    i = 1, . . . , n,
where bi ∈ Rm−1 , ci ∈ R. If c1 = . . . = cn = 0, then the linear independence of {a1 , . . . , an } implies
that {b1 , . . . , bn } is also linearly independent. By induction, we know that for any collection of n
linearly independent vectors in Rm−1 , it must hold that n ≤ m − 1. It follows that {b1 , . . . , bn }
must satisfy n ≤ m − 1 too; hence, we get n ≤ m. If ci ≠ 0 for some i, we need more work. Assume w.l.o.g. that cn ≠ 0 (we can always reshuffle the ordering of a1 , . . . , an ). For any α1 , . . . , αn−1 ∈ R,
choose αn as follows
αn = −(1/cn ) Σ_{i=1}^{n−1} αi ci .
Let b̃i = bi − (ci /cn )bn for ease of explanation. With the above choice of αn , the last components cancel and Σ_{i=1}^{n} αi ai = [ Σ_{i=1}^{n−1} αi b̃i ; 0 ]; hence the linear independence of {a1 , . . . , an } implies that {b̃1 , . . . , b̃n−1 } is linearly independent. By induction, we know that {b̃1 , . . . , b̃n−1 } must satisfy n − 1 ≤ m − 1; hence, we get n ≤ m. The proof is complete.
The first and second results above are easy to verify. The proof of the third result is constructive
and is a consequence of the Gram-Schmidt procedure; this will be covered later.
2 Basis and Dimension
Let S ⊆ Rm , S 6= {0}, be a subspace. A set of vectors {b1 , . . . , bk } ⊂ Rm is called a basis for S
if {b1 , . . . , bk } is linearly independent and S = span{b1 , . . . , bk }. Taking S = span{a1 , . . . , an } as
an example, any maximal linearly independent subset of {a1 , . . . , an } is a basis for S. From the
definition of bases, the following facts can be shown:
1. A subspace may have more than one basis.
2. A subspace always has an orthonormal basis (if it does not equal {0}).
3. All bases for a subspace S have the same number of elements; i.e., if {b1 , . . . , bk } and
{c1 , . . . , cl } are both bases for S, then k = l.
Given a subspace S ⊆ Rm with S ≠ {0}, the dimension of S is defined as the number of elements of a basis for S. Also, by convention, the dimension of the subspace {0} is defined as zero.
The notation dim S is used to denote the dimension of S. Some examples are as follows: We have
dim Rm = dim span{e1 , . . . , em } = m. If {ai1 , . . . , aik } is a maximal linearly independent subset
of {a1 , . . . , an }, then dim span{a1 , . . . , an } = k. We have the following properties for subspace
dimension.
Note that the notation X + Y denotes the sum of the sets X and Y; i.e., X + Y = {x + y | x ∈
X , y ∈ Y}.
Theorem 1.2 Let S be a subspace of Rm .
1. For every y ∈ Rm , there exists a unique vector ys ∈ S that minimizes ‖z − y‖_2 over all z ∈ S. Thus, we can write ΠS (y) = arg min_{z∈S} ‖z − y‖_2^2 .
2. Given y ∈ Rm , we have ys = ΠS (y) if and only if
ys ∈ S,   zT (ys − y) = 0,  for all z ∈ S.   (1)
The above theorem is a special case of the projection theorem in convex analysis and optimiza-
tion [1, Proposition B.11], which deals with projections onto closed convex sets. In the following
we provide a proof that is enough for the subspace case.
Proof: First, we should mention that there always exists a vector in S at which the minimum of ‖z − y‖_2^2 over all z ∈ S is attained; in this claim we only need S to be closed. This result is
shown by applying the Weierstrass theorem, and readers are referred to [1, proof of Proposition
B.11] for details.
Second, we show the sufficiency of Statement 2. Let ys ∈ S be a vector that minimizes ‖z − y‖_2 over all z ∈ S. Since ‖z − y‖_2^2 ≥ ‖ys − y‖_2^2 for all z ∈ S, and
‖z − y‖_2^2 = ‖z − ys + ys − y‖_2^2 = ‖z − ys ‖_2^2 + 2(z − ys )T (ys − y) + ‖ys − y‖_2^2 ,
we have
‖z − ys ‖_2^2 + 2(z − ys )T (ys − y) ≥ 0,   for all z ∈ S.
The above equation is equivalent to
‖z‖_2^2 + 2zT (ys − y) ≥ 0,   for all z ∈ S;
the reason is that z ∈ S implies z − ys ∈ S, and the converse is also true. Now, suppose that there exists a point z̄ ∈ S such that z̄T (ys − y) ≠ 0. Then, by choosing z = αz̄, where α = −z̄T (ys − y)/‖z̄‖_2^2 , one can verify that ‖z‖_2^2 + 2zT (ys − y) < 0 and yet z ∈ S. Thus, by contradiction, we must have zT (ys − y) = 0 for all z ∈ S.
Third, we show the necessity of Statement 2. Suppose that there exists a vector ȳs ∈ S such
that zT (ȳs − y) = 0 for all z ∈ S. The aforementioned condition can be rewritten as
(z − ȳs )T (ȳs − y) = 0, for all z ∈ S,
where we have used the equivalence z − ȳs ∈ S ⇐⇒ z ∈ S. Now, for any z ∈ S, we have
‖z − y‖_2^2 = ‖z − ȳs ‖_2^2 + 2(z − ȳs )T (ȳs − y) + ‖ȳs − y‖_2^2
           = ‖z − ȳs ‖_2^2 + ‖ȳs − y‖_2^2
           ≥ ‖ȳs − y‖_2^2 .   (2)
The above inequality implies that ȳs minimizes ‖z − y‖_2 over all z ∈ S. This, together with the previous sufficiency proof, completes the proof of Statement 2. In addition, we note that the equality in (2) holds if and only if z = ȳs . This implies that ȳs is the only minimizer of ‖z − y‖_2 over all z ∈ S. Thus, we also obtain the uniqueness claim in Statement 1.
Theorem 1.2 has many implications, e.g., in least squares and orthogonal projections which we
will learn in later lectures. In the next section we will consider an application of Theorem 1.2 to
subspaces.
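To make the projection theorem concrete, here is a minimal NumPy sketch (the subspace and the vector y are arbitrary choices): it builds an orthonormal basis Q of a subspace S, forms ys = QQT y (anticipating Property 1.6 later in this note), and checks the optimality condition (1) numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))        # S = R(A), a 2-dimensional subspace of R^5
Q, _ = np.linalg.qr(A)                 # orthonormal basis of S
y = rng.standard_normal(5)

ys = Q @ (Q.T @ y)                     # candidate projection of y onto S

# Condition (1): z^T (ys - y) = 0 for all z in S; testing a basis of S suffices.
print(np.allclose(Q.T @ (ys - y), 0))  # True

# ys minimizes ||z - y||_2 over S: compare with random points of S.
for _ in range(5):
    z = A @ rng.standard_normal(2)
    assert np.linalg.norm(ys - y) <= np.linalg.norm(z - y) + 1e-12
```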
4 Orthogonal Complements
Let S be a nonempty subset of Rm . The orthogonal complement of S is defined as the set
S⊥ = { y ∈ Rm | zT y = 0 for all z ∈ S }.
It is easy to verify that S ⊥ is always a subspace even if S is not. From the definition, we see that
1. any z ∈ S, y ∈ S ⊥ are orthogonal, i.e., S ⊥ consists of all vectors that are orthogonal to all
vectors of S;
The following theorem is a direct consequence of the projection theorem in Theorem 1.2.

Theorem 1.3 Let S ⊆ Rm be a subspace. Then the following hold:
1. every y ∈ Rm can be uniquely decomposed as
y = ys + yc ,
where ys ∈ S and yc ∈ S⊥ ; specifically, ys = ΠS (y) and yc = y − ΠS (y);
2. ΠS⊥ (y) = y − ΠS (y) for every y ∈ Rm .
Proof: Let us rephrase the problem in Statement 1 as follows: Given a vector y ∈ Rm , find
a vector ys such that ys ∈ S and y − ys ∈ S ⊥ . This problem is exactly the same as (1) in
Theorem 1.2. Thus, by Theorem 1.2 the solution is uniquely given by ys = ΠS (y). Also, we have
yc = y − ΠS (y) ∈ S ⊥ .
To show Statement 2, let us consider the following problem: Given a vector y ∈ Rm , find a vector ȳc such that ȳc ∈ S⊥ and y − ȳc ∈ (S⊥ )⊥ , or equivalently,
ȳc ∈ S⊥ ,   zT (ȳc − y) = 0,  for all z ∈ S⊥ .
By Theorem 1.2, the solution is uniquely given by ȳc = ΠS⊥ (y). On the other hand, the above conditions are seen to be satisfied if we choose ȳc = y − ΠS (y) ∈ S⊥ . It follows that ΠS⊥ (y) = y − ΠS (y).
Armed with Theorem 1.3, we can easily prove the following results.
1. S + S ⊥ = Rm ;
2. dim S + dim S ⊥ = m;
3. (S⊥ )⊥ = S.¹

¹We also have the following result: Let S be any subset (and not necessarily a subspace) in Rm . Then, we have (S⊥ )⊥ = span S, where span S is defined as the set of all finite linear combinations of points in S.
Proof: Statement 1 is merely a consequence of the first statement in Theorem 1.3. For State-
ment 2, let us assume S does not equal either {0} or Rm ; the cases of {0} and Rm are trivial. Let {u1 , . . . , uk } and {v1 , . . . , vl } be orthonormal bases of S and S⊥ , respectively; here note that k = dim S, l = dim S⊥ . We can write
S + S ⊥ = span{u1 , . . . , uk , v1 , . . . , vl }.
Also, from the definition of orthogonal complements, it is immediate that uTi vj = 0 for all i, j.
Hence, {u1 , . . . , uk , v1 , . . . , vl } is an orthonormal basis for S + S ⊥ , and consequently we have
dim(S + S ⊥ ) = k + l. Moreover, since S + S ⊥ = Rm , we also have dim(S + S ⊥ ) = m. The result
m = dim S + dim S ⊥ therefore holds.
The proof of Statement 3 is as follows. One can verify from the orthogonal complements
definition that y ∈ S implies y ∈ (S ⊥ )⊥ . On the other hand, any y ∈ (S ⊥ )⊥ satisfies
y = Π(S ⊥ )⊥ (y)
= y − ΠS ⊥ (y) = y − (y − ΠS (y))
= ΠS (y),
where the first equation above is due to the equivalence y ∈ S ⇐⇒ y = ΠS (y) (which is easy to
verify), and the second equation above is due to the second statement of Theorem 1.3. The equality
y = ΠS (y) shown above implies that y ∈ S.
The above result can be shown by Property 1.2. First, we use N (A) = R(AT )⊥ . Second, from the second result in Property 1.2 we have
dim R(AT ) + dim R(AT )⊥ = n,   i.e.,   dim R(AT ) + dim N (A) = n.
Third, it is known that dim R(AT ) = rank(AT ) = rank(A). Combining these results together gives dim N (A) = n − rank(A).
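The identity dim N (A) = n − rank(A) is easy to check numerically; the sketch below uses an arbitrary rank-deficient A and counts the (numerically) zero singular values of A to get the null space dimension.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 6, 5, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank(A) = 3

rank = np.linalg.matrix_rank(A)
# dim N(A) = number of (numerically) zero singular values of A.
s = np.linalg.svd(A, compute_uv=False)
dim_null = int(np.sum(s < 1e-10 * s[0]))

print(rank, dim_null, n - rank)  # 3, 2, 2
```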
We close this section by mentioning one more property.
This property is left as an exercise. I should mention that this is a challenging exercise.
5 Gram-Schmidt
We consider the Gram-Schmidt (GS) procedure. Note that the following explanation is different
from that in the lecture slides; we use the perspective of projection to explain how the GS procedure
works.
5.1 GS as a Matrix Decomposition Result
Let A ∈ Rm×n be a full-column-rank matrix (or a matrix with linearly independent columns). The matrix A can be decomposed as
A = QR,   (3)
for some semi-orthogonal Q ∈ Rm×n and upper triangular R ∈ Rn×n with rii > 0 for all i. The procedure for performing such a decomposition is called the GS procedure. From the matrix product
QR = [ q1 q2 · · · qn ] [ r11  r12  · · ·  r1,n−1   r1,n  ]
                        [      r22  · · ·    ⋮       ⋮   ]
                        [            ⋱                   ]
                        [          rn−1,n−1  rn−1,n      ]
                        [                      rnn       ]
   = [ r11 q1 ,  r12 q1 + r22 q2 ,  . . . ,  Σ_{i=1}^{n} ri,n qi ],
we see that
a1 = r11 q1 ,
a2 = r12 q1 + r22 q2 ,
..
.
ak = r1k q1 + · · · + rkk qk ,
..
.
an = r1,n q1 + · · · + rnn qn .
Property 1.4 Let A ∈ Rm×n , B ∈ Rn×p . If B has full row rank, then R(AB) = R(B).
Property 1.5 Let R ∈ Rn×n be upper triangular with rii ≠ 0 for all i. It holds that R(R) = Rn .
Or, R has full rank.
The proof of Property 1.4 is left as an exercise for you. The proof of Property 1.5 is given at the end of this subsection. Applying Properties 1.4 and 1.5 to (4) gives
S = R(Q).
We therefore confirm that every subspace has an orthonormal basis.
Proof of Property 1.5: There are two ways to prove Property 1.5. The first proof is to utilize the property that if R has nonzero determinant, then R is invertible (or has full rank). Since R is triangular, we have det(R) = ∏_{i=1}^{n} rii ≠ 0. Hence, we come to the first conclusion that R has full rank. Using the property that a subspace S ⊆ Rn has dim S = n if and only if S = Rn , we have the second conclusion that R(R) = Rn .
Our second proof goes as follows. We want to show that, given any vector y ∈ Rn , there exists
an x ∈ Rn such that y = Rx. If this is true, then we have Rn = R(R); consequently we also have
dim R(R) = n which is equivalent to saying that R has full rank. Suppose that y = Rx holds. We
can write
[ y1   ]   [ r11  · · ·  r1,n−1    r1,n   ] [ x1   ]
[  ⋮   ] = [        ⋱       ⋮       ⋮    ] [  ⋮   ]
[ yn−1 ]   [            rn−1,n−1  rn−1,n ] [ xn−1 ]
[ yn   ]   [                        rnn  ] [ xn   ]
From the above equation, we see that we can recursively calculate the components of x by
xn = yn /rnn
xn−1 = (yn−1 − rn−1,n xn )/rn−1,n−1
..
.
x1 = ( y1 − Σ_{i=2}^{n} r1,i xi ) / r11 .
The above equations indicate that, given any y ∈ Rn , there exists an x ∈ Rn such that y = Rx.
The proof is complete.
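The recursion above is just back substitution; here is a direct NumPy sketch of it (assuming R is upper triangular with nonzero diagonal), checked against a generic solver.

```python
import numpy as np

def back_substitution(R, y):
    """Solve R x = y for upper triangular R with nonzero diagonal."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # x_i = (y_i - sum_{j > i} r_{ij} x_j) / r_{ii}
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

rng = np.random.default_rng(2)
R = np.triu(rng.standard_normal((4, 4))) + 4 * np.eye(4)  # well-conditioned test matrix
y = rng.standard_normal(4)
print(np.allclose(back_substitution(R, y), np.linalg.solve(R, y)))  # True
```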
Property 1.6 Let Q ∈ Rm×n be semi-orthogonal and let S = R(Q). Given any vector y ∈ Rm , it
holds that ΠS (y) = QQT y.
The proof of Property 1.6 is left as an exercise. The GS decomposition is done by executing n steps in a sequential fashion.
Step 1: Let q1 = a1 /‖a1 ‖2 and r11 = ‖a1 ‖2 . Note that a1 ≠ 0 because {a1 , . . . , an } is linearly independent. We have
a1 = r11 q1 .   (5)
Also, to facilitate the subsequent proof, let S1 = span{a1 } = span{q1 }.
Step 2: According to Theorem 1.3.1, we can decompose a2 as
a2 = ΠS1 (a2 ) + ΠS1⊥ (a2 ).
Note that if ΠS1⊥ (a2 ) = 0, then we have a2 ∈ S1 , which further implies that {a1 , . . . , an } is linearly dependent. Since we assume linearly independent {a1 , . . . , an }, we must have ΠS1⊥ (a2 ) ≠ 0. By Property 1.6 it holds that
ΠS1 (a2 ) = (q1T a2 ) q1 ,   where we denote r12 := q1T a2 .
Let q̃2 = ΠS1⊥ (a2 ) = a2 − r12 q1 , and let q2 = q̃2 /‖q̃2 ‖2 , r22 = ‖q̃2 ‖2 . We have
a2 = r12 q1 + r22 q2 .   (6)
Since q1 ∈ S1 and q2 ∈ S1⊥ , q1 and q2 are orthogonal to each other. Let S2 = span{a1 , a2 }. We
have
S2 = R([ a1 a2 ]) = R( [ q1 q2 ] [ r11  r12 ] ) = span{q1 , q2 },
                                 [       r22 ]
where the second equality is due to (5) and (6), and the third equality is due to Properties 1.4 and
1.5.
Let us proceed to Step k + 1, k ≤ n − 1.
Step k + 1: Suppose that, in Step k, we have obtained
Qk = [ q1 , . . . , qk ],   Rk = [ r11  · · ·  r1,k−1    r1,k   ]
                                [        ⋱      ⋮        ⋮    ]
                                [            rk−1,k−1  rk−1,k ]
                                [                        rkk  ] ,
Sk = span{a1 , . . . , ak } = R(Qk ),
Let q̃k+1 = ΠSk⊥ (ak+1 ). We must have q̃k+1 ≠ 0; if q̃k+1 = 0, we have ak+1 ∈ Sk , which implies that {a1 , . . . , ak+1 } is linearly dependent. By Property 1.6,
q̃k+1 = ak+1 − ΠSk (ak+1 ) = ak+1 − Qk QkT ak+1 .
Let qk+1 = q̃k+1 /‖q̃k+1 ‖2 , rk+1,k+1 = ‖q̃k+1 ‖2 , ri,k+1 = qiT ak+1 for i = 1, . . . , k, and Qk+1 = [ Qk qk+1 ].
It should be noted that Qk+1 is semi-orthogonal. This is because (i) {q1 , . . . , qk } is orthonormal, and (ii) q1 , . . . , qk ∈ Sk and qk+1 ∈ Sk⊥ imply that qiT qk+1 = 0 for all 1 ≤ i ≤ k.
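Putting the steps together, the GS procedure can be coded in a few lines. The sketch below follows the projection viewpoint above, computing q̃k+1 = ak+1 − Qk QkT ak+1 at each step; it assumes A has linearly independent columns, and the test matrix is arbitrary.

```python
import numpy as np

def gram_schmidt(A):
    """QR via the Gram-Schmidt procedure; A must have full column rank."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        R[:k, k] = Q[:, :k].T @ A[:, k]          # r_{ik} = q_i^T a_k for i < k
        q_tilde = A[:, k] - Q[:, :k] @ R[:k, k]  # projection of a_k onto S_{k-1}^perp
        R[k, k] = np.linalg.norm(q_tilde)
        Q[:, k] = q_tilde / R[k, k]
    return Q, R

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
Q, R = gram_schmidt(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))  # True True
```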
References
[1] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, Mass., U.S.A., 2nd
edition, 1999.
[2] S. Boyd and L. Vandenberghe. Introduction to Applied Linear Algebra – Vectors, Matrices, and
Least Squares. 2018. Available online at [Link]
ENGG 5781: Matrix Analysis and Computations 2022-23 First Term
B = [ 1  z1  z1^2  · · ·  z1^{k−1} ]
    [ 1  z2  z2^2  · · ·  z2^{k−1} ]
    [ ⋮   ⋮    ⋮              ⋮    ]
    [ 1  zk  zk^2  · · ·  zk^{k−1} ]  ∈ Ck×k ,
with z1 , . . . , zk ∈ C. We will show that B is nonsingular if zi ’s are distinct. For now, let us assume
this to be true and focus on showing the linear independence of A. If m ≥ n, we can represent A
by
AT = [ B × ]
with k = n; here, “×” means parts that do not matter. By the rank definition, we have rank(A) =
rank(AT ) ≥ rank(B) = n. Since we also have rank(A) ≤ n, we obtain the result rank(A) = n.
Moreover, if m ≤ n we can represent A by
A = [ BT × ]
with k = m. Following the same argument as above, we obtain rank(A) = m. Thus we have
established the result that A has full rank.
Now, we show that B is nonsingular if zi ’s are distinct. Observe that
Bα = 0 ⇐⇒ p(zi ) = 0, i = 1, . . . , k (1)
where
p(z) = α1 + α2 z + α3 z 2 + . . . + αk z k−1
denotes a polynomial of degree at most k − 1. On the one hand, the condition on the R.H.S. of (1) implies that z1 , . . . , zk are roots of p(z). On the other hand, it is known that a nonzero polynomial of degree at most k − 1 has at most k − 1 roots. Consequently, if we have zi ≠ zj for all i, j with i ≠ j, the two statements can both hold only when p is the zero polynomial, i.e., only when α = 0. Hence, Bα = 0 implies α = 0, and we have shown that B must be nonsingular if the zi 's are distinct.
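A quick numerical sanity check of this claim (the nodes zi are arbitrary choices; np.vander with increasing=True produces the column ordering used above):

```python
import numpy as np

z = np.array([0.3, -1.2, 2.0, 2.5])                 # distinct nodes
B = np.vander(z, increasing=True)                   # rows [1, z_i, z_i^2, z_i^3]
print(np.linalg.matrix_rank(B), abs(np.linalg.det(B)) > 1e-12)  # 4 True

z_rep = np.array([0.3, 0.3, 2.0, 2.5])              # repeated node
B_rep = np.vander(z_rep, increasing=True)
print(np.linalg.matrix_rank(B_rep))                 # 3, i.e., singular
```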
ENGG 5781: Matrix Analysis and Computations 2022-23 First Term
1 Eigenvalue Problem
The eigenvalue problem is as follows. Given A ∈ Cn×n , find a vector v ∈ Cn , v ≠ 0, such that
Av = λv, (1)
for some λ ∈ C. If we can find a 2-tuple (v, λ) such that (1) holds, we say that (v, λ) is an eigen-pair
of A, λ is an eigenvalue of A, and v is an eigenvector of A associated with λ. From (1), we observe
two facts. First, (1) is equivalent to finding a λ such that
det(A − λI) = 0,   (2)
since (1) has a solution v ≠ 0 if and only if A − λI is singular. Second, given an eigenvalue λ, the associated eigenvectors are the nonzero solutions v of
(A − λI)v = 0.   (3)
Let us consider the solution to (2). Denote p(λ) = det(A − λI). From the definition of
determinant, one can verify that p(λ) is a polynomial of degree n. Since a polynomial of degree n
has n roots, we may write
p(λ) = ∏_{i=1}^{n} (λi − λ),
where λ1 , . . . , λn are the roots of p(λ). The above equation shows that (2) holds if and only if
λ = λi for any i = 1, . . . , n. Thus, we conclude that the eigenvalue problem in (1) always has a
solution, and that the roots λ1 , . . . , λn of p(λ) are the solutions to (1) . For convenience, let us
denote
Avi = λi vi , i = 1, . . . , n, (4)
where vi denotes an eigenvector associated with λi .
Next, we take a look at the solution to (3) w.r.t. v, given an eigenvalue λ. Since N (A − λI) is
a subspace and N (A − λI) ≠ {0}, we can represent it by N (A − λI) = span{b1 , . . . , br } for some
basis {b1 , . . . , br } ⊂ Cn and for some r ≥ 1. In particular, the dimension of N (A − λI) is r. From
the above representation, we notice that there may be multiple eigenvectors associated with the
same λ. We should be careful when we describe the multiplicity of eigenvectors: A scaled version
of an eigenvector v, i.e., αv for some α ∈ C, α ≠ 0, is also an eigenvector, but such a case is trivial.
From such a viewpoint, the eigenvector v associated with λ is unique subject to a complex scaling
if dim N (A − λI) = 1. Moreover, for dim N (A − λI) > 1, there are infinitely many eigenvectors
associated with λ even if we do not count the complex scaling cases; however, we can find a number
of r = dim N (A − λI) linearly independent eigenvectors associated with λ. Also, dim N (A − λI)
is the maximal number of linearly independent eigenvectors we can obtain for λ.
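In NumPy, eigen-pairs can be computed with np.linalg.eig; the small sketch below also estimates dim N (A − λI) from the singular values of A − λI, for an illustrative matrix in which the eigenvalue 2 is a double root but admits only one linearly independent eigenvector.

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 3.]])

lam, V = np.linalg.eig(A)            # eigenvalues and one eigenvector per column
print(np.allclose(A @ V[:, 0], lam[0] * V[:, 0]))   # Av = lambda v holds

# dim N(A - 2I) via the number of (numerically) zero singular values.
s = np.linalg.svd(A - 2 * np.eye(3), compute_uv=False)
print(int(np.sum(s < 1e-10)))        # 1, even though 2 is a double root of p(lambda)
```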
2 Multiplicity of Eigenvalues and Eigenvectors
We are concerned with the multiplicity of an eigenvalue and the multiplicity of its associated
eigenvectors. For ease of exposition of ideas, let us assume w.l.o.g. that the eigenvalues λ1 , . . . , λn
are ordered such that {λ1 , . . . , λk }, k ≤ n, is the set of all distinct eigenvalues of A; i.e., λi ≠ λj for all i, j ∈ {1, . . . , k} with i ≠ j, and λi ∈ {λ1 , . . . , λk } for all i ∈ {1, . . . , n}. Then, consider the
following definitions:
• The algebraic multiplicity of an eigenvalue λi , i ∈ {1, . . . , k}, is defined as the number of times
that λi appears as a root of p(λ). We will denote the algebraic multiplicity of λi as µi .
• The geometric multiplicity of λi , i ∈ {1, . . . , k}, is defined as the maximal number of linearly
independent eigenvectors associated with λi . We will denote the geometric multiplicity of λi
as γi , and note that γi = dim N (A − λi I).
Intuitively, it seems that if an eigenvalue is repeated r times, then we should also have r (linearly independent) eigenvectors associated with it. We will show that, in general, only the following is guaranteed:

Property 3.1 For every i ∈ {1, . . . , k}, we have γi ≤ µi .
Proof of Property 3.1: For convenience, let λ̄ ∈ {λ1 , . . . , λk } be any eigenvalue of A, and denote r = dim N (A − λ̄I). We aim to show that the characteristic polynomial det(A − λI) has at least r repeated roots at λ = λ̄. From basic subspace concepts (cf. Lecture 1), we can find a collection of orthonormal vectors q1 , . . . , qr ∈ N (A − λ̄I) and a collection of vectors qr+1 , . . . , qn ∈ Cn such that Q = [ q1 , . . . , qn ] is unitary. Let Q1 = [ q1 , . . . , qr ], Q2 = [ qr+1 , . . . , qn ], and note Q = [ Q1 Q2 ].
We have
QH AQ = [ Q1H ] A [ Q1 Q2 ] = [ Q1H AQ1   Q1H AQ2 ]
        [ Q2H ]               [ Q2H AQ1   Q2H AQ2 ] .
Since Aqi = λ̄qi for i = 1, . . . , r, we get AQ1 = λ̄Q1 . By also noting that Q1H Q1 = I and Q2H Q1 = 0, the above matrix equation can be simplified to
QH AQ = [ λ̄I   Q1H AQ2 ]
        [  0   Q2H AQ2 ] .
Consequently, we have the following equivalence for the characteristic polynomial of A:
det(A − λI) = det(QH AQ − λI) = (λ̄ − λ)^r det(Q2H AQ2 − λI),
where the second equality is due to the determinant result for block upper triangular matrices; here note that det(Q2H AQ2 − λI) is a polynomial of degree n − r. From the above equation we see that det(A − λI) has at least r repeated roots at λ = λ̄. The proof is complete.
A matrix A ∈ Cn×n is said to be similar to a matrix B ∈ Cn×n if there exists a nonsingular S ∈ Cn×n such that
B = S−1 AS.
Similar matrices are similar in the sense that their characteristic polynomials are the same. Specifically, if A is similar to B then we have
det(B − λI) = det(S−1 (A − λI)S) = det(A − λI).
3.2 Eigendecomposition
We are now ready to consider eigendecomposition. A matrix A ∈ Cn×n is said to admit an eigendecomposition if there exists a nonsingular V ∈ Cn×n such that
A = VΛV−1 ,
where Λ = Diag(λ1 , . . . , λn ); in that case AV = VΛ, so the columns of V are n linearly independent eigenvectors of A and the diagonal entries of Λ are the corresponding eigenvalues. We have the following facts:
3. A does not admit an eigendecomposition if µi > γi for some i ∈ {1, . . . , k}. Such instances
exist as discussed in the last section.
Before we close this section, we should mention that eigendecomposition is guaranteed to exist
in some matrix subclasses. For example, a circulant matrix always admits an eigendecomposition.
Also, we will show later that it is easy to find an arbitrarily close approximation of a given matrix
A such that the approximate matrix has its eigenvalues all being distinct and thus admits an
eigendecomposition.
Proof: Suppose, seeking a contradiction, that v1 , . . . , vk are linearly dependent; that is,
Σ_{i=1}^{k} αi vi = 0   (5)
for some α ≠ 0. Let us assume w.l.o.g. that α1 ≠ 0, k ≥ 2. From (5), we obtain two equations
0 = A ( Σ_{i=1}^{k} αi vi ) = Σ_{i=1}^{k} αi λi vi ,   (6)
0 = λk ( Σ_{i=1}^{k} αi vi ) = Σ_{i=1}^{k} αi λk vi .   (7)
Subtracting (6) from (7) results in
Σ_{i=1}^{k−1} αi (λk − λi ) vi = 0.   (8)
Since α1 ≠ 0 and λi ≠ λj for all i, j ∈ {1, . . . , k}, i ≠ j, the above equation holds only when v1 = 0.
This contradicts the fact that v1 is an eigenvector, and the proof is complete.
2. Suppose that the λi 's are ordered such that {λ1 , . . . , λk }, k ≤ n, is the set of all distinct eigenvalues of A. Also, let vi be any eigenvector associated with λi . Then v1 , . . . , vk must be mutually orthogonal.
Proof: For the first result, let (v, λ) be any eigen-pair of A. We have
vH Av = vH (λv) = λ‖v‖_2^2 .
On the other hand, since A is Hermitian,
vH Av = (AH v)H v = (Av)H v = (λv)H v = λ∗ ‖v‖_2^2 .
The above two equations imply that λ = λ∗ , or that λ is real. Moreover, consider the following equations for any i, j ∈ {1, . . . , k}, i ≠ j:
vjH Avi = λi vjH vi ,   vjH Avi = (Avj )H vi = λj vjH vi ,
where in the second equation we have used the fact that A is Hermitian and its corresponding eigenvalues are real. The above equations imply (λi − λj )vjH vi = 0. Since λi ≠ λj , it must hold that vjH vi = 0. Hence, we have shown that v1 , . . . , vk must be mutually orthogonal.
Note that as a direct corollary of the first result of Property 3.3, any eigenvector of a real
symmetric matrix can be taken as real. Next, we present the main result.
Theorem 3.1 Every A ∈ Hn admits an eigendecomposition
A = VΛVH ,
where V ∈ Cn×n is unitary and Λ = Diag(λ1 , . . . , λn ) is real.
The proof of Theorem 3.1 requires another theorem, namely, the Schur decomposition theorem.
We will consider the Schur decomposition in the next section.
Theorem 3.2 Let A ∈ Cn×n , and let λ1 , . . . , λn be its eigenvalues. The matrix A admits a
decomposition
A = UTUH , (9)
for some unitary U ∈ Cn×n and for some upper triangular T ∈ Cn×n with tii = λi for all i. If A
is real and λ1 , . . . , λn are all real, U and T can be taken as real.
The decomposition in (9) will be called the Schur decomposition in the sequel. The Schur
decomposition not only shows that any square matrix is similar to an upper triangular matrix, it
also reveals that the “triangularizer” S can be unitary.
Lemma 3.1 Let X ∈ Cn×n and suppose that X takes a block upper triangular form
X = [ X11  X12 ]
    [  0   X22 ] ,
where X11 ∈ Ck×k , X12 ∈ Ck×(n−k) , X22 ∈ C(n−k)×(n−k) , with 0 ≤ k < n. There exists a unitary U ∈ Cn×n such that
UH XU = [ X11  Y12 ]
        [  0   Y22 ] ,
where
Y22 = [ λ̄  × ]
      [ 0  × ] ,
for some Y12 ∈ Ck×(n−k) , Y22 ∈ C(n−k)×(n−k) , λ̄ ∈ C.
Proof of Lemma 3.1: The proof may be seen as a variation of the proof of Property 3.1 in
Section 2. Let λ̄ be any eigenvalue of X22 , and v ∈ Cn−k be an eigenvector of X22 associated with λ̄.
Following the same proof as in Property 3.1, there exists a collection of vectors q2 , . . . , qn−k ∈ Cn−k
such that Q = [ v, q2 , . . . , qn−k ] ∈ C(n−k)×(n−k) is unitary, and it can be shown that QH X22 Q
takes the form
QH X22 Q = [ λ̄  × ]
           [ 0  × ] .
Now, let
U = [ I  0 ]
    [ 0  Q ] .
We have
UH XU = [ I   0  ] [ X11  X12 ] [ I  0 ]
        [ 0   QH ] [  0   X22 ] [ 0  Q ]
      = [ I   0  ] [ X11  X12 Q ]
        [ 0   QH ] [  0   X22 Q ]
      = [ X11    X12 Q    ]
        [  0   QH X22 Q   ] .
Since QH X22 Q takes the form displayed above, UH XU takes the desired form, and the proof is complete.
The idea of proving the Schur decomposition in Theorem 3.2 is to apply Lemma 3.1 recursively.
To put into context, let A0 = A and consider the following iterations:
A1 = U1H A0 U1 ,
A2 = U2H A1 U2 ,
⋮
An−1 = Un−1H An−2 Un−1 ,
where every Ui is unitary and obtained by applying Lemma 3.1 with X = Ai−1 and k = i − 1.
From Lemma 3.1 we observe that Ai takes the form
Ai = [ Tii  × ]
     [  0   × ] ,
for some upper triangular Tii ∈ Ci×i . Hence, it follows that An−1 is upper triangular. Let
U = U1 U2 · · · Un−1 .
It can be verified that U is unitary. By noting that An−1 = UH AU, we obtain the Schur decom-
position formula A = UTUH where T = An−1 .
We should also mention how the result tii = λi for all i is concluded. By the similarity of A and T and by det(T − λI) = ∏_{i=1}^{n} (tii − λ), the n roots t11 , . . . , tnn of det(T − λI) must be λ1 , . . . , λn or a permutation thereof. Hence, we can assume w.l.o.g. that tii = λi for i = 1, . . . , n.
Furthermore, it can be verified that if A is real and its eigenvalues λ1 , . . . , λn are also real, then
we can choose U as real orthogonal and T as real in the Schur decomposition. This is left as a
self-practice problem for you.
• Computations of the Schur decomposition: The proof of the Schur decomposition is construc-
tive, and it also tells how we may write an algorithm to compute the Schur factors U and T.
From the proof in the last subsection, we see that we need two sub-algorithms to construct
U and T, namely, i) an algorithm for computing an eigenvector of a given matrix, and ii) an
algorithm that finds a unitary matrix Q such that its first column vector is fixed as a given
vector v. Task i) may be done by the power method, while Task ii) can be accomplished by
the so-called QR decomposition—which we will study later.
It should be further mentioned that the procedure mentioned above is arguably not the
best approach for computing the Schur factors, although it is top-down, insightful and easy
to understand. There exist computationally more efficient methods for computing the Schur
factors, and they were established based on some rather specific and sophisticated ideas which
are beyond the scope of this course. The only keyword I can give you is QR decomposition.
• Proof of Theorem 3.1 and beyond: With the Schur decomposition, we can easily show that any Hermitian matrix admits an eigendecomposition. Let A be Hermitian and let A = UTUH be its Schur decomposition. We see that
TH = (UH AU)H = UH AH U = UH AU = T,
so T is Hermitian. Since T is also upper triangular, T must be diagonal; that is, the Schur decomposition of a Hermitian A is an eigendecomposition A = UTUH with unitary U and diagonal T.
• Existence of eigendecomposition: We have mentioned that even though A does not admit an
eigendecomposition, it is not hard to find an approximation of A such that the approximate
matrix admits an eigendecomposition. This is described as follows.
Proposition 3.1 Let A ∈ Cn×n . For every ε > 0, there exists a matrix à ∈ Cn×n such that
the n eigenvalues of à are distinct and
‖A − Ã‖F ≤ ε.
¹As a minor point to note, for the real symmetric case we first need the first result of Property 3.3 to confirm that the λi 's are real.
The proof is simple and is as follows: Let A = UTUH be the Schur decomposition of A, and let à = U(T + D)UH , where D = Diag(d1 , . . . , dn ) and d1 , . . . , dn are chosen such that |di |2 ≤ ε2 /n for all i and such that t11 + d1 , . . . , tnn + dn are distinct. It is easy to verify that
‖A − Ã‖F^2 = ‖UDUH ‖F^2 = ‖D‖F^2 ≤ ε2 ,
and that the eigenvalues of à are t11 + d1 , . . . , tnn + dn , which are distinct (see the sketch after this list for a numerical illustration).
The above proposition suggests that for any square A, we can always find a matrix à that
is arbitrarily close to A and admits an eigendecomposition.
• The Jordan canonical form: Recall that we have been asking the question of whether a matrix can be similar to a diagonal matrix, and, if that is not possible, to a matrix that is as close to diagonal as possible. The Jordan canonical form says the following: any square A can be decomposed as A = SJS−1 , where S is nonsingular and J is block diagonal with upper bidiagonal (Jordan) blocks. The Jordan canonical form is beyond the scope of this course, but we should note that it is an enhancement of the Schur decomposition. In a nutshell, the proof of the Jordan canonical form first applies the Schur decomposition. Then, as the non-trivial part of the proof, it shows that the Schur factor T is similar to another matrix J that takes this bidiagonal structure.
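Here is the numerical illustration of Proposition 3.1 promised above—a minimal sketch that assumes SciPy's schur routine is available; the defective matrix A and the diagonal perturbation D are arbitrary choices made to mimic the proof.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[5., 1.],
              [0., 5.]])                  # defective: eigenvalue 5 repeated, one eigenvector
eps = 1e-3

T, U = schur(A, output='complex')         # Schur decomposition A = U T U^H
n = A.shape[0]
d = (eps / np.sqrt(n)) * np.arange(1, n + 1) / n    # distinct, with |d_i|^2 <= eps^2 / n
A_tilde = U @ (T + np.diag(d)) @ U.conj().T

print(np.linalg.norm(A - A_tilde, 'fro') <= eps)    # True
print(np.linalg.eigvals(A_tilde))                   # two distinct eigenvalues near 5
```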
ENGG 5781: Matrix Analysis and Computations 2022-23 First Term
The focus of this note is to give a more in-depth description of variational characterizations of
eigenvalues of real symmetric matrices. I will also provide the proof of some results concerning the
PSD matrix inequalities in the main lecture slides.
Our notation is as follows. The eigenvalues of a given matrix A ∈ Sn are denoted by
λ1 (A), . . . , λn (A), and they are assumed to be arranged such that
λmax (A) = λ1 (A) ≥ λ2 (A) ≥ . . . ≥ λn (A) = λmin (A),
where we also denote λmin (A) and λmax (A) as the smallest and largest eigenvalues of A, respec-
tively. If not specified, we will simply denote λ1 , . . . , λn , with λ1 ≥ . . . ≥ λn , as the eigenvalues of
A ∈ Sn . Also, we will denote VΛVT as the eigendecomposition of A without explicit mentioning.
Consider the two optimization problems
max_{x∈Rn , x≠0} xT Ax / xT x,   (1)
min_{x∈Rn , x≠0} xT Ax / xT x.   (2)
Note that the ratio xT Ax/xT x is called a Rayleigh quotient. Rayleigh quotients are invariant to the scaling of x; one can see that
(αx)T A(αx) / (αx)T (αx) = xT Ax / xT x
for any α ≠ 0 and x ≠ 0. Without loss of generality, let us assume ‖x‖2 = 1 and rewrite problems
(1) and (2) as
max_{x∈Rn , x≠0} xT Ax/xT x = max_{x∈Rn , ‖x‖2 =1} xT Ax,
min_{x∈Rn , x≠0} xT Ax/xT x = min_{x∈Rn , ‖x‖2 =1} xT Ax,
respectively. The solutions to problems (1) and (2) are described by the following result, commonly known as the Rayleigh–Ritz theorem: for any A ∈ Sn ,
λmin ‖x‖2^2 ≤ xT Ax ≤ λmax ‖x‖2^2 ,  for all x ∈ Rn ,   (3)
λmax = max_{x∈Rn , ‖x‖2 =1} xT Ax,   (4)
λmin = min_{x∈Rn , ‖x‖2 =1} xT Ax,   (5)
with the maximum in (4) attained at x = v1 and the minimum in (5) attained at x = vn .
Proof: This theorem is a direct consequence of eigendecomposition for real symmetric matrices.
By letting y = VT x, the term xT Ax can be expressed as
xT Ax = yT Λy = Σ_{i=1}^{n} λi |yi |2 .
Since
λn Σ_{i=1}^{n} |yi |2 ≤ Σ_{i=1}^{n} λi |yi |2 ≤ λ1 Σ_{i=1}^{n} |yi |2
and ‖y‖2^2 = ‖VT x‖2^2 = ‖x‖2^2 for any orthogonal V, we have (3). It follows that
xT Ax ≤ λmax ,  for all x ∈ Rn , ‖x‖2 = 1,   (6)
xT Ax ≥ λmin ,  for all x ∈ Rn , ‖x‖2 = 1.   (7)
It can be verified that the equalities in (6) and (7) are attained when x = v1 and x = vn ,
respectively. Thus, we also have shown (4) and (5).
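A small numerical confirmation of (3)–(5) for an arbitrary symmetric test matrix (note that np.linalg.eigh orders eigenvalues increasingly, the opposite of the convention in this note):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2                      # arbitrary symmetric test matrix
lam, V = np.linalg.eigh(A)             # ascending: lam[0] = lambda_min, lam[-1] = lambda_max

for _ in range(1000):
    x = rng.standard_normal(6)
    q = x @ A @ x / (x @ x)            # Rayleigh quotient at a random point
    assert lam[0] - 1e-12 <= q <= lam[-1] + 1e-12

print(np.isclose(V[:, -1] @ A @ V[:, -1], lam[-1]))  # True: maximum attained at v_1
print(np.isclose(V[:, 0] @ A @ V[:, 0], lam[0]))     # True: minimum attained at v_n
```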
Following the same proof as in the Rayleigh–Ritz theorem, it can be shown that
λ2 = max_{x∈span{v2 ,...,vn }, ‖x‖2 =1} xT Ax.   (8)
However, the above eigenvalue characterization does not look very attractive as its feasible set
depends on the principal eigenvector. Let us consider another problem
max_{x∈Rn , ‖x‖2 =1, wT x=0} xT Ax,
where w ∈ Rn is given. By letting y = VT x again, we can derive a lower bound
max_{x∈Rn , ‖x‖2 =1, wT x=0} xT Ax = max_{y∈Rn , ‖y‖2 =1, (VT w)T y=0} yT Λy
                                   ≥ max_{y∈Rn , ‖y‖2 =1, (VT w)T y=0, y3 =y4 =···=yn =0} yT Λy.   (9)
Here, there is a subtle issue to which we should pay some attention—Eq. (9) is invalid if the subspace span{VT w, e3 , . . . , en }⊥ equals {0}. That is, we need to make sure that there exists a nonzero vector y in span{VT w, e3 , . . . , en }⊥ , for otherwise problem (9) is infeasible. But since
dim(span{VT w, e3 , . . . , en }⊥ ) = n − dim(span{VT w, e3 , . . . , en }) ≥ n − (n − 1) = 1,
there always exists a nonzero vector y ∈ span{VT w, e3 , . . . , en }⊥ . From (9), we further get
max_{y∈span{VT w,e3 ,...,en }⊥ , ‖y‖2 =1} yT Λy = max_{y∈span{VT w,e3 ,...,en }⊥ , ‖y‖2 =1} Σ_{i=1}^{2} λi |yi |2 ≥ λ2 .   (10)
It follows that
max_{x∈Rn , ‖x‖2 =1, wT x=0} xT Ax ≥ λ2 .
Moreover, we notice that the equality above is attained if we choose w = v1 ; specifically, this is a consequence of (8). Hence, we have a variational characterization of λ2 as follows:
λ2 = min_{w∈Rn} max_{x∈Rn , ‖x‖2 =1, wT x=0} xT Ax.
Theorem 4.5 (Courant–Fischer) Let A ∈ Sn , and let Sk denote any subspace of Rn and of dimension k. For any k ∈ {1, . . . , n}, it holds that
λk = min_{Sn−k+1 ⊆Rn} max_{x∈Sn−k+1 , ‖x‖2 =1} xT Ax   (11)
   = max_{Sk ⊆Rn} min_{x∈Sk , ‖x‖2 =1} xT Ax.   (12)
Proof: We will show (11) only, as the proof of (12) is similar to that of (11) (or take the proof of (12) as a self-practice problem). First, we show that
min_{Sn−k+1 ⊆Rn} max_{x∈Sn−k+1 , ‖x‖2 =1} xT Ax ≤ λk .   (13)
It can be verified that
λk = max_{x∈span{vk ,...,vn }, ‖x‖2 =1} xT Ax.
Also, since dim(span{vk , . . . , vn }) = n − k + 1, the subspace span{vk , . . . , vn } is feasible to the outer minimization problem on the left-hand side of (13), so the left-hand side of (13) is at most λk ; that is, we obtain (13).
Second, we show that
min_{Sn−k+1 ⊆Rn} max_{x∈Sn−k+1 , ‖x‖2 =1} xT Ax ≥ λk .   (14)
≥ min_{V⊆Rn} max_{y∈V∩W, ‖y‖2 =1} yT Λy
= min_{V⊆Rn} max_{y∈V∩W, ‖y‖2 =1} Σ_{i=1}^{k} λi |yi |2
≥ λk ,
and the proof is complete.
1.3 Implications of the Courant-Fischer Theorem
The Courant–Fischer theorem and its proof insight have led to a collection of elegant results for
eigenvalue inequalities, and here we describe some of them. Let A, B ∈ Sn , z ∈ Rn . We have the
following results:
(b) (interlacing) λk+1 (A) ≤ λk (A ± zzT ) for k = 1, . . . , n − 1, and λk (A ± zzT ) ≤ λk−1 (A) for
k = 2, . . . , n;
(c) if rank(B) ≤ r, then λk+r (A) ≤ λk (A + B) for k = 1, . . . , n − r and λk (A + B) ≤ λk−r (A) for
k = r + 1, . . . , n;
(e) for any I = {i1 , . . . , ir } ⊆ {1, . . . , n}, λk+n−r (A) ≤ λk (AI ) ≤ λk (A) for k = 1, . . . , r.
(f) for any semi-orthogonal U ∈ Rn×r , λk+n−r (A) ≤ λk (UT AU) ≤ λk (A) for k = 1, . . . , r.
The details of the proof of the above results are left as self-practice problems for you, and here are some hints: Result (a) is obtained by applying (3) to (11). Result (b) is shown via the Courant–Fischer theorem; specifically we have
λk (A ± zzT ) ≥ min_{Sn−k+1 ⊆Rn} max_{x∈Sn−k+1 ∩R(z)⊥ , ‖x‖2 =1} xT Ax
             ≥ min_{Sr ⊆Rn , n−k≤r} max_{x∈Sr , ‖x‖2 =1} xT Ax
             = λk+1 (A),
where the second inequality is obtained by showing that dim(Sn−k+1 ∩ R(z)⊥ ) ≥ n − k (use the
result dim S1 + dim S2 − dim(S1 ∩ S2 ) = dim(S1 + S2 )). Result (c) is shown by applying Result (b) and using the fact that a matrix B ∈ Sn with rank(B) ≤ r can be written as a sum of r outer products B = Σ_{i=1}^{r} µi ui uiT . Result (d) is proven by applying Result (c). For Result (e), consider
the case of I = {1, . . . , r} for ease of exposition of ideas. We have
λk (A) ≥ min_{Sl ⊆Rr , r−k+1≤l} max_{y∈Sl , ‖y‖2 =1} yT AI y = λk (AI ),
where one needs to show that dim(Sn−k+1 ∩span{e1 , . . . , er }) ≥ r −k +1; some care also needs to be
taken as we change the vector dimension in the second equation above. Result (f) is a consequence
of Result (e).
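As an illustration of Result (b), the sketch below checks the rank-one interlacing inequalities numerically for an arbitrary A ∈ Sn and an arbitrary z.

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2
z = rng.standard_normal(6)

def eig_desc(X):
    # decreasing order, matching the convention of this note
    return np.sort(np.linalg.eigvalsh(X))[::-1]

lam_A = eig_desc(A)
lam_P = eig_desc(A + np.outer(z, z))                # eigenvalues of A + z z^T

n = 6
ok_lower = all(lam_P[k] >= lam_A[k + 1] - 1e-10 for k in range(n - 1))
ok_upper = all(lam_P[k] <= lam_A[k - 1] + 1e-10 for k in range(1, n))
print(ok_lower, ok_upper)                           # True True
```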
1.4 Maximization of a Sum of Rayleigh Quotients
In the previous subsections we considered maximization or minimization of a single Rayleigh quotient. Here we are interested in a problem concerning a sum of Rayleigh quotients:
max_{U∈Rn×r , ui ≠0 ∀i, uiT uj =0 ∀i≠j}  Σ_{i=1}^{r} (uiT Aui )/(uiT ui ),   (16)
where we want the vectors u1 , . . . , ur of the Rayleigh quotients to be orthogonal to each other. This problem finds applications in matrix factorization and PCA, as we will see in the next lecture. As in the previous treatment for Rayleigh quotients, we can rewrite problem (16) as
max_{U∈Rn×r , ui ≠0 ∀i, uiT uj =0 ∀i≠j}  Σ_{i=1}^{r} (uiT Aui )/(uiT ui )  =  max_{U∈Rn×r , ‖ui ‖2 =1 ∀i, uiT uj =0 ∀i≠j}  Σ_{i=1}^{r} uiT Aui .   (17)
The following result describes an equivalence relation between problem (17) and eigenvalues: the optimal value of problem (17) satisfies
max_{U∈Rn×r , UT U=I} tr(UT AU) = Σ_{i=1}^{r} λi (A),   (18)
and the maximum is attained by U = [ v1 , . . . , vr ].
Proof: This theorem can be easily shown by Result (f) in Section 1.3. Specifically, for any
semi-orthogonal U ∈ Rn×r we have
tr(UT AU) = Σ_{i=1}^{r} λi (UT AU) ≤ Σ_{i=1}^{r} λi (A),   (19)
where the inequality is owing to Result (f) in Section 1.3. Since the equality in (19) is attained by
U = [ v1 , . . . , vr ], we obtain the desired result.
To enrich your understanding of this topic, I also show you an alternative proof which does not
use Result (f) in Section 1.3. In essence, the main challenge lies in showing (19). Consider the
following problem
max_{U∈Rn×r , UT U=I} tr(UT ΛU).
It can be shown that the above problem is equivalent to the problem on the right-hand side of (18).
Now, observe that
tr(UT ΛU) = tr(UUT Λ) = Σ_{i=1}^{n} [UUT ]ii λi ,
and that 0 ≤ [UUT ]ii ≤ 1 for all i and Σ_{i=1}^{n} [UUT ]ii = tr(UUT ) = tr(UT U) = r.
This leads us to consider a relaxation
max_{U∈Rn×r , UT U=I} tr(UT ΛU)  ≤  max_{x∈Rn}  Σ_{i=1}^{n} xi λi
                                     s.t.  0 ≤ xi ≤ 1,  i = 1, . . . , n,
                                           Σ_{i=1}^{n} xi = r.
In particular, we replace every [UUT ]ii by xi . One can easily see that the optimal value of the problem on the right-hand side of the above equation is Σ_{i=1}^{r} λi . Hence, we have shown (19).
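The equivalence between the sum-of-Rayleigh-quotients problem and the top-r eigenvalues can be probed numerically as follows (arbitrary test matrix): random semi-orthogonal U never beats the top-r eigenvector basis.

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((8, 8))
A = (M + M.T) / 2
r = 3

lam, V = np.linalg.eigh(A)                   # ascending order
top = lam[::-1][:r].sum()                    # lambda_1 + ... + lambda_r

U_star = V[:, ::-1][:, :r]                   # top-r eigenvectors
print(np.isclose(np.trace(U_star.T @ A @ U_star), top))   # True: attains the bound

for _ in range(200):
    U, _ = np.linalg.qr(rng.standard_normal((8, r)))      # random semi-orthogonal U
    assert np.trace(U.T @ A @ U) <= top + 1e-10
```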
2 Matrix Inequalities
In this section the aim is to give the proof of some of the results we mentioned in PSD matrix
inequalities in the main lecture slides. To be specific, let A, B ∈ Sn and recall the following results.
X ⪰ 0 ⟺ S ⪰ 0.
Also, we have X ≻ 0 ⟺ S ≻ 0.
where the first inequality is due to Weyl’s inequality, and the second inequality is because A − B
is PSD.
Proof of (b): Consider the matrix A − I. Since A − I = V(Λ − I)VT , the eigenvalues of A − I
are λ1 − 1, . . . , λn − 1. It follows that A − I is PSD if and only if λk − 1 ≥ 0 for all k, and that
A − I is PD if and only if λk − 1 > 0 for all k.
Proof of (c): The proof is the same as that of (b) and is omitted for brevity.
Proof of (d): Since A is PD, its square root A1/2 is invertible. We have
A ⪰ B  ⟺  A − B ⪰ 0  ⟺  I − A−1/2 BA−1/2 ⪰ 0,   (20)
where Theorem 4.3 has been used to obtain the second equivalence. Let C = A−1/2 BA−1/2 , and
denote its eigendecomposition as C = QDQT where D = Diag(d1 , . . . , dn ). Since B is PD and
A1/2 is invertible, by Theorem 4.3 C is PD; hence, we have dk > 0 for all k. From the right-hand
side of (20) we further show the following equivalence
I − C ⪰ 0  ⟺  1 ≥ dk ,  k = 1, . . . , n
          ⟺  1/dk ≥ 1,  k = 1, . . . , n
          ⟺  D−1 ⪰ I
          ⟺  Q(D−1 − I)QT ⪰ 0
          ⟺  C−1 − I ⪰ 0
          ⟺  A1/2 B−1 A1/2 − I ⪰ 0
          ⟺  B−1 − A−1 ⪰ 0,
where we have applied Theorem 4.3 multiple times and Results (b)–(c).
Proof of (e): Let
Y = [ I         0 ]
    [ −C−1 BT   I ] .
One can verify that
YT XY = [ A − BC−1 BT   0 ]
        [      0        C ] .
Also, Y is nonsingular; from the block triangular structure of Y we see that det(Y) = 1. Hence, by Theorem 4.3, X is PSD if and only if YT XY ⪰ 0. The condition YT XY ⪰ 0 is equivalent to A − BC−1 BT ⪰ 0 and C ⪰ 0, and hence we have shown the desired result for the PSD case. The PD case is shown in the same manner.
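A small numerical check of the Schur-complement characterization in Result (e); the blocks are arbitrary, with C chosen PD.

```python
import numpy as np

rng = np.random.default_rng(8)
B = rng.standard_normal((3, 2))
G = rng.standard_normal((2, 2))
C = G @ G.T + 0.1 * np.eye(2)                      # PD block

def is_psd(M, tol=1e-10):
    return np.min(np.linalg.eigvalsh((M + M.T) / 2)) >= -tol

for A in [B @ np.linalg.solve(C, B.T) + 0.5 * np.eye(3),   # Schur complement PSD
          B @ np.linalg.solve(C, B.T) - 0.5 * np.eye(3)]:  # Schur complement not PSD
    X = np.block([[A, B], [B.T, C]])
    S = A - B @ np.linalg.solve(C, B.T)            # Schur complement of C in X
    print(is_psd(X), is_psd(S))                    # the two flags agree in both cases
```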
ENGG 5781: Matrix Analysis and Computations 2022-23 First Term
In this note we give the detailed proof of some results in the main slides.
1 Proof of SVD
Recall the SVD theorem:
Theorem 5.1 Given any A ∈ Rm×n , there exists a 3-tuple (U, Σ, V) ∈ Rm×m × Rm×n × Rn×n
such that
A = UΣVT ,
U and V are orthogonal, and Σ takes the form
[Σ]ij = σi for i = j,  and  [Σ]ij = 0 for i ≠ j.
The proof is as follows. First, consider the matrix product AAT . Since AAT is real symmetric
and PSD, by eigendecomposition we can express AAT as
AAT = UΛUT = [ U1 U2 ] [ Λ̃  0 ] [ U1T ]
                       [ 0  0 ] [ U2T ]
     = U1 Λ̃U1T ,   (1)
where we assume that the eigenvalues are ordered such that λ1 ≥ . . . ≥ λr > 0 and λr+1 = . . . =
λp = 0, with r being the number of nonzero eigenvalues; U ∈ Rm×m denotes a corresponding
orthogonal eigenvector matrix; we partition U as U = [ U1 U2 ], with U1 ∈ Rm×r and U2 ∈
Rm×(m−r) ; Λ̃ = Diag(λ1 , . . . , λr ). It is easy to verify from the decomposition above that
UT2 A = 0. (2)
To see this, we note from UT2 U1 = 0 that UT2 A(UT2 A)T = 0. By the simple result that BBT = 0
implies B = 0 (which is easy to show and whose proof is omitted here), we conclude that UT2 A = 0.
Second, construct the following matrices:
Σ̃ = Λ̃1/2 = Diag(√λ1 , . . . , √λr ),   V1 = AT U1 Σ̃−1 ∈ Rn×r ,
and let V2 ∈ Rn×(n−r) be such that V = [ V1 V2 ] ∈ Rn×n is orthogonal. It can be verified that
V1T V1 = I.   (3)
Third, consider the matrix product UT AV. We have
UT AV = [ U1T AV1   U1T AV2 ]
        [ U2T AV1   U2T AV2 ]
      = [ Σ̃  0 ]
        [ 0  0 ] ,
where (2) and (3) have been used. By multiplying the above equation on the left by U and on the
right by VT , we obtain the desired result A = UΣVT . The proof is complete.
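The construction in the proof can be mimicked numerically for a small arbitrary A; only the "thin" factors U1, Σ̃, V1 are formed, and the singular values are compared with those returned by np.linalg.svd.

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((4, 3))

lam, U = np.linalg.eigh(A @ A.T)          # eigendecomposition of A A^T (ascending)
idx = np.argsort(lam)[::-1]               # reorder eigenvalues decreasingly
lam, U = lam[idx], U[:, idx]
r = int(np.sum(lam > 1e-10))              # number of nonzero eigenvalues

U1 = U[:, :r]
Sig = np.diag(np.sqrt(lam[:r]))           # Sigma-tilde
V1 = A.T @ U1 @ np.linalg.inv(Sig)        # V1 = A^T U1 Sigma^{-1}

print(np.allclose(U1 @ Sig @ V1.T, A))                    # thin SVD reproduces A
print(np.allclose(V1.T @ V1, np.eye(r)))                  # V1 has orthonormal columns
print(np.allclose(np.sqrt(lam[:r]), np.linalg.svd(A, compute_uv=False)))
```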
Consider a nonsingular A ∈ Rn×n and the linear system y = Ax. Suppose that A and y are perturbed to Â = A + ∆A and ŷ = y + ∆y, respectively, and let x̂ be the solution of the perturbed system
ŷ = Âx̂.
The problem is to analyze how the solution error x̂ − x scales with ∆A and ∆y.
This analysis problem can be tackled via SVD. To put into context, define
κ(A) = σmax (A)/σmin (A),
which is called the condition number of A. Note that if A is close to singular, then σmin (A) will
be very small and we would expect a very large κ(A). We have the following result.

Theorem 5.2 Suppose that ‖∆A‖2 /‖A‖2 ≤ ε and ‖∆y‖2 /‖y‖2 ≤ ε for some ε ≥ 0 with εκ(A) < 1. Then
‖x̂ − x‖2 /‖x‖2 ≤ 2εκ(A)/(1 − εκ(A)).

Theorem 5.2 suggests that for a sufficiently small error level ε, the relative solution error ‖x̂ −
x‖2 /‖x‖2 tends to increase with the condition number κ(A). In particular, if εκ(A) ≤ 1/2, we may simplify the relative solution error bound in Theorem 5.2 to
‖x̂ − x‖2 /‖x‖2 ≤ 4εκ(A),
where we can see that the error bound above scales linearly with κ(A).
Proof of Theorem 5.2: For notational convenience, denote ∆x = x̂ − x. The perturbed linear
system can be written as
(A + ∆A)(x + ∆x) = y + ∆y.
The above equation can be re-organized as
A∆x = ∆y − ∆Ax − ∆A∆x,
and then
∆x = A−1 (∆y − ∆Ax − ∆A∆x).
Let us take the 2-norm on the above equation:
‖∆x‖2 ≤ ‖A−1 ‖2 ( ‖∆y‖2 + ‖∆A‖2 ‖x‖2 + ‖∆A‖2 ‖∆x‖2 ),   (4)
where we have used the norm inequality ‖Ax‖2 ≤ ‖A‖2 ‖x‖2 and the triangle inequality ‖x + y‖2 ≤ ‖x‖2 + ‖y‖2 to obtain (4).
Next, we apply the assumptions ‖∆A‖2 /‖A‖2 ≤ ε and ‖∆y‖2 /‖y‖2 ≤ ε to (4). We have
‖∆x‖2 ≤ (1/σmin (A)) ( ε‖y‖2 + ε‖A‖2 ‖x‖2 + ε‖A‖2 ‖∆x‖2 ) ≤ εκ(A) ( 2‖x‖2 + ‖∆x‖2 ),
where the result ‖A−1 ‖2 = maxi=1,...,n 1/σi (A) = 1/σmin (A) has been used. The above inequality can be rewritten as
(1 − εκ(A))‖∆x‖2 ≤ 2εκ(A)‖x‖2 ,
and if 1 − εκ(A) > 0 we can further rewrite
‖∆x‖2 /‖x‖2 ≤ 2εκ(A)/(1 − εκ(A)),
as desired.
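The bound can be illustrated numerically. The sketch below builds a deliberately ill-conditioned A (an assumed toy SVD construction), perturbs A and y at relative level ε, and compares the observed relative error with 2εκ(A)/(1 − εκ(A)).

```python
import numpy as np

rng = np.random.default_rng(10)
Q1, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Q2, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q1 @ np.diag([10., 5., 1., 1e-3]) @ Q2.T      # condition number 1e4
x = rng.standard_normal(4)
y = A @ x

eps = 1e-7
E = rng.standard_normal((4, 4))
E *= eps * np.linalg.norm(A, 2) / np.linalg.norm(E, 2)   # ||dA||_2 / ||A||_2 = eps
e = rng.standard_normal(4)
e *= eps * np.linalg.norm(y) / np.linalg.norm(e)         # ||dy||_2 / ||y||_2 = eps

x_hat = np.linalg.solve(A + E, y + e)             # solution of the perturbed system
kappa = np.linalg.cond(A, 2)                      # sigma_max / sigma_min
rel_err = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
bound = 2 * eps * kappa / (1 - eps * kappa)
print(rel_err <= bound, rel_err, bound)           # True, observed error, theoretical bound
```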
ENGG 5781: Matrix Analysis and Computations 2022-23 First Term
This note shows the proof of the properties and theorems in the main lecture slides.
Property 7.1 Let A, B ∈ Rn×n be lower triangular. Then, AB is lower triangular. Also, if A,
B have unit diagonal entries, then AB has unit diagonal entries.
Property 7.3 Let A ∈ Rn×n be nonsingular lower triangular. Then, A−1 is lower triangular with
[A−1 ]ii = 1/aii .
Proof of Property 7.1: Let C = AT and D = AB, and note that the (k, l)th entry of D is
dkl = ckT bl ,
where ck denotes the kth column of C (i.e., the kth row of A) and bl denotes the lth column of B. Since B is lower triangular, its columns can be represented as
bl = Σ_{j=l}^{n} bjl ej ,   l = 1, . . . , n,
where we recall that the ek 's are unit vectors. Also, since C = AT is upper triangular, we can employ a similar representation
ck = Σ_{i=1}^{k} aki ei ,   k = 1, . . . , n.
Using the above representations, dkl can be expressed as
dkl = ( Σ_{i=1}^{k} aki ei )T ( Σ_{j=l}^{n} bjl ej ) = Σ_{i=1}^{k} Σ_{j=l}^{n} aki bjl eiT ej .
By noting that eiT ej = 0 for all i ≠ j, and eiT ei = 1, the above expression can be simplified to
dkl = { 0,                      k < l,
      { Σ_{i=l}^{k} aki bil ,   k ≥ l.
It follows that D is lower triangular. The above formula also indicates that if akk = bkk = 1 for all
1 ≤ k ≤ n, then dkk = akk bkk = 1 for all 1 ≤ k ≤ n.
Proof of Property 7.2: Recall the cofactor expansion of the determinant,
det(A) = Σ_{j=1}^{n} aij cij ,   where  cij = (−1)^{i+j} det(Aij ),
for any i = 1, . . . , n, where Aij is a submatrix obtained by deleting the ith row and jth column of A. Now, consider a lower triangular A. Let us choose i = 1 for the above cofactor expansion formula:
det(A) = Σ_{j=1}^{n} a1j c1j = a11 det(A11 ).
By repeatedly applying the same cofactor expansion on the cofactors, we obtain det(A) = a11 a22 · · · ann .
2 Proof of Theorem 7.1
Let us recapitulate the theorem:
Theorem 7.1 A matrix A ∈ Rn×n has an LU decomposition if every principal submatrix A{1,...,k}
satisfies
det(A{1,...,k} ) ≠ 0,
for k = 1, 2, . . . , n − 1. If the LU decomposition of A exists and A is nonsingular, then (L, U) is
unique.
From the development of Gauss elimination shown in the main slides, we see that the LU decomposition of a given A exists (or can be constructed) if the pivots akk^(k−1) are all nonzero. In the following, we show that if every principal submatrix A{1,...,k} , 1 ≤ k ≤ n − 1, is nonsingular, then akk^(k−1) is nonzero. Consider the matrix equation
A(k−1) = Mk−1 · · · M2 M1 A
for any 1 ≤ k ≤ n − 1. For convenience, let W = Mk−1 · · · M2 M1 . By Properties 7.1 and 7.3, W
is lower triangular with unit diagonal elements. By letting Ai:j,k:l denote the submatrix of A obtained by keeping rows i, i + 1, . . . , j and columns k, k + 1, . . . , l of A, we can expand A^(k−1) = WA as
[ A^(k−1)_{1:k,1:k}    A^(k−1)_{1:k,k+1:n}   ]
[ A^(k−1)_{k+1:n,1:k}  A^(k−1)_{k+1:n,k+1:n} ]
   = [ W_{1:k,1:k}      0               ] [ A_{1:k,1:k}    A_{1:k,k+1:n}   ]
     [ W_{k+1:n,1:k}    W_{k+1:n,k+1:n} ] [ A_{k+1:n,1:k}  A_{k+1:n,k+1:n} ] .
Consequently, we have
det(A^(k−1)_{1:k,1:k}) = det(W_{1:k,1:k}) det(A_{1:k,1:k}).
Note that A^(k−1)_{1:k,1:k} is upper triangular, and W_{1:k,1:k} is lower triangular with unit diagonal elements. Thus, by Property 7.2, their determinants are
det(A^(k−1)_{1:k,1:k}) = ∏_{i=1}^{k} aii^(k−1) ,   det(W_{1:k,1:k}) = 1,
respectively. It follows that if A_{1:k,1:k} is nonsingular, then akk^(k−1) ≠ 0.
We are also interested in proving the uniqueness of the LU decomposition. Suppose that A =
L1 U1 and A = L2 U2 are two LU decompositions of A. Also, assume that A is nonsingular. Then,
we claim that L1 , L2 , U1 and U2 are all nonsingular, and thus invertible: the nonsingularity of L1
and L2 follows from Property 7.2, as well as the fact that L1 and L2 are lower triangular with unit
diagonal elements; the nonsingularity of U1 and U2 can be deduced from the nonsingularity of L1
and L2 and the nonsingularity of A (how?). From A = L1 U1 = L2 U2 , we can write
L2−1 L1 = U2 U1−1 .   (2)
Note that the left-hand side of the above equation is a lower triangular matrix, while the right-hand side is an upper triangular matrix (see Properties 7.1 and 7.3). Hence, (2) can only be satisfied when L2−1 L1 and U2 U1−1 are diagonal. Also, since L2−1 L1 has unit diagonal elements, we further obtain L2−1 L1 = I, and consequently, L1 = L2 . Moreover, from the above result, we also have U2 U1−1 = I, and then U1 = U2 . Thus, we have shown that the LU decomposition of a nonsingular A, if it exists, has to be unique.
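For concreteness, here is a bare-bones Gauss-elimination sketch that produces the factors L and U under the assumption of Theorem 7.1 (nonzero pivots, so no pivoting is needed); it is for illustration only, not a recommended numerical routine.

```python
import numpy as np

def lu_no_pivot(A):
    """LU decomposition without pivoting; assumes all pivots are nonzero."""
    A = A.astype(float).copy()
    n = A.shape[0]
    L = np.eye(n)
    for k in range(n - 1):
        # Eliminate the entries below the pivot a_kk^{(k-1)}.
        L[k + 1:, k] = A[k + 1:, k] / A[k, k]
        A[k + 1:, k:] -= np.outer(L[k + 1:, k], A[k, k:])
    return L, np.triu(A)

M = np.array([[4., 3., 2.],
              [8., 7., 9.],
              [4., 6., 5.]])            # all leading principal submatrices nonsingular
L, U = lu_no_pivot(M)
print(np.allclose(L @ U, M))            # True
```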
(note that any lower triangular M (or L) with unit diagonal elements is invertible, as we have
discussed in the proof of Theorem 7.2). Since A is symmetric, M−1 AM−T is also symmetric. It
follows that M−1 LD is symmetric. By noting that M−1 L is lower triangular with unit diagonal
elements, the only possibility for M−1 LD to be symmetric is that M−1 LD is diagonal. Also, if A
is nonsingular, then it can be verified from A = LDMT that the diagonal matrix D is nonsingular.
As a result, M−1 L must be diagonal. Since M−1 L has unit diagonal elements, we further conclude
that M−1 L = I, or equivalently, L = M.
If A is PD, then any principal submatrix of A is PD—and nonsingular; see Lecture 4, page
15. Hence, by Theorem 7.1, the LU or LDM decomposition of a PD A always exists in a unique
sense. Also, by Theorem 7.2, the LDM decomposition can be simplified to the LDL decomposition
A = LDLT , where (L, D) is unique. It can be verified that for a PD A, we have di > 0 for all i (I leave this as an exercise for you). By constructing G = LD1/2 , we get A = GGT .
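The construction G = LD1/2 is essentially the Cholesky decomposition. A minimal sketch (assuming A is PD so that no pivoting is needed): compute L and D by the standard LDLT recursion, form G, and compare with np.linalg.cholesky.

```python
import numpy as np

def ldl_decompose(A):
    """A = L D L^T for symmetric PD A; L unit lower triangular, D = diag(d)."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - (L[j, :j] ** 2) @ d[:j]
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ d[:j]) / d[j]
    return L, d

rng = np.random.default_rng(11)
M = rng.standard_normal((4, 4))
A = M @ M.T + np.eye(4)                       # PD test matrix
L, d = ldl_decompose(A)
G = L @ np.diag(np.sqrt(d))                   # G = L D^{1/2}
print(np.allclose(G @ G.T, A))                # True
print(np.allclose(G, np.linalg.cholesky(A)))  # True: same lower triangular factor
```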