ENGG 5781: Matrix Analysis and Computations 2023-24 First Term

Lecture 1: Basic Concepts


Instructor: Wing-Kin Ma

This note is a supplementary material for the main slides. I will write notes such as this one when certain concepts cannot be put on slides. This time, the aim is to elaborate upon subspace concepts at a more fundamental level.

1 Subspace and Linear Independence


1.1 Subspace
A nonempty subset S of R^m is called a subspace of R^m if, for any α, β ∈ R,

x, y ∈ S  ⟹  αx + βy ∈ S.

It can be verified that if S ⊆ R^m is a subspace and a_1, . . . , a_n ∈ S, then any linear combination of a_1, . . . , a_n, i.e., ∑_{i=1}^n α_i a_i where α ∈ R^n, also lies in S.
Given a collection of vectors a_1, . . . , a_n ∈ R^m, the span of {a_1, . . . , a_n} is defined as

span{a_1, . . . , a_n} = { y = ∑_{i=1}^n α_i a_i | α ∈ R^n }.

In words, span{a1 , . . . an } is the set of all possible linear combinations of the vectors a1 , . . . an . It
is easy to verify that span{a1 , . . . an } is a subspace for any given a1 , . . . an . In the literature, span
is commonly used to represent a subspace. For example, we can represent Rm by
Rm = span{e1 , . . . , em }.
In fact, any subspace can be written as a span:
Theorem 1.1 For every subspace S ⊆ Rm , there exists a positive integer n and a collection of
vectors a1 , . . . , an ∈ Rm such that S = span{a1 , . . . , an }.
In the literature you would notice that the result in Theorem 1.1 is often taken for granted, almost as if it were common sense, without elaboration. There is an easy way to prove Theorem 1.1, but the proof requires some results on linear independence. We relegate the proof to a later part of this note.

1.2 Linear Independence


A set of vectors {a_1, . . . , a_n} ⊂ R^m is said to be linearly independent if

∑_{i=1}^n α_i a_i ≠ 0  for all α ∈ R^n with α ≠ 0.

Otherwise, it is called linearly dependent. A subset {a_{i1}, . . . , a_{ik}}, where {i_1, . . . , i_k} ⊆ {1, . . . , n} with i_j ≠ i_l for any j ≠ l, and 1 ≤ k ≤ n, is called a maximal linearly independent subset of {a_1, . . . , a_n} if it is linearly independent and is not contained in any other linearly independent subset of {a_1, . . . , a_n}. From the above definitions, we see that
1. if {a_1, . . . , a_n} is linearly independent, then no a_j can be written as a linear combination of the other vectors {a_i}_{i∈{1,...,n}, i≠j};

2. if {a1 , . . . , an } is linearly dependent, then there exists a vector aj such that it is a linear
combination of the other vectors;

3. {a_{i1}, . . . , a_{ik}} is a maximal linearly independent subset of {a_1, . . . , a_n} if and only if it is linearly independent and {a_{i1}, . . . , a_{ik}, a_j} is linearly dependent for every j ∈ {1, . . . , n} \ {i_1, . . . , i_k}.

Thus, roughly speaking, we may see a linearly independent {a1 , . . . , an } as a non-redundant (or
sufficiently different) set of vectors, and a maximal linearly independent {ai1 , . . . , aik } as an irre-
ducibly non-redundant set of vectors for representing the whole vector set {a1 , . . . , an }. It can be
easily shown that for any maximal linearly independent subset {ai1 , . . . , aik } of {a1 , . . . , an }, we
have
span{a1 , . . . , an } = span{ai1 , . . . , aik }.
We also have the following results which are very basic and we again take them almost for
granted:

1. Let {a_1, . . . , a_n} ⊂ R^m be a linearly independent vector set. Suppose y ∈ span{a_1, . . . , a_n}. Then the coefficient α for the representation

   y = ∑_{i=1}^n α_i a_i

   is unique; i.e., there does not exist a β ∈ R^n, β ≠ α, such that y = ∑_{i=1}^n β_i a_i.

2. If {a1 , . . . an } ⊂ Rm is linearly independent, then n ≤ m must hold.

The first result is simple: if there exists a β ≠ α such that y = ∑_{i=1}^n β_i a_i, then we have ∑_{i=1}^n (α_i − β_i) a_i = y − y = 0, which contradicts the linear independence of {a_1, . . . , a_n}. For the second result, I show you the proof in [2]. The proof is done by induction. Suppose m = 1. Then it is easy to see that n ≤ 1 must hold. Next suppose m ≥ 2. We need to show that n ≤ m. Partition the a_i's as

$$ a_i = \begin{bmatrix} b_i \\ c_i \end{bmatrix}, \qquad i = 1, \ldots, n, $$

where b_i ∈ R^{m−1}, c_i ∈ R. If c_1 = . . . = c_n = 0, then the linear independence of {a_1, . . . , a_n} implies that {b_1, . . . , b_n} is also linearly independent. By induction, we know that for any collection of n linearly independent vectors in R^{m−1}, it must hold that n ≤ m − 1. It follows that {b_1, . . . , b_n} must satisfy n ≤ m − 1 too; hence, we get n ≤ m. If c_i ≠ 0 for some i, we need more work. Assume w.l.o.g. that c_n ≠ 0 (we can always reshuffle the ordering of a_1, . . . , a_n). For any α_1, . . . , α_{n−1} ∈ R, choose α_n as follows:

α_n = −(1/c_n) ∑_{i=1}^{n−1} α_i c_i.
Then we see that

$$ \sum_{i=1}^{n} \alpha_i a_i = \begin{bmatrix} \sum_{i=1}^{n-1} \alpha_i b_i - \left( \sum_{i=1}^{n-1} \frac{c_i}{c_n} \alpha_i \right) b_n \\ 0 \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n-1} \alpha_i \left( b_i - \frac{c_i}{c_n} b_n \right) \\ 0 \end{bmatrix}. $$
Let b̃i = bi − (ci /cn )bn for ease of explanation. From the above equation, we see that the linear
independence of {a1 , . . . , an } implies that {b̃1 , . . . , b̃n−1 } is linearly independent. By induction, we
know that {b̃1 , . . . , b̃n−1 } must satisfy n − 1 ≤ m − 1; hence, we get n ≤ m. The proof is complete.

1.3 Orthogonality and Orthonormality


A collection of vectors a_1, . . . , a_n ∈ R^m is said to be orthogonal if a_i^T a_j = 0 for all i, j with i ≠ j, and orthonormal if, in addition, ‖a_i‖_2 = 1 for all i. Some basic results arising from such a definition are as follows.
1. An orthogonal collection of nonzero vectors (in particular, an orthonormal collection) is also linearly independent.
2. Let {a_1, . . . , a_n} ⊂ R^m be an orthonormal set of vectors. Suppose y ∈ span{a_1, . . . , a_n}. Then the coefficient α for the representation y = ∑_{i=1}^n α_i a_i is uniquely given by α_i = a_i^T y, i = 1, . . . , n.


3. Let {a1 , . . . , an } ⊂ Rm be a linearly independent set of vectors. There exists an orthonormal
set of vectors {q1 , . . . , qn } such that
span{a1 , . . . , an } = span{q1 , . . . , qn }.

The first and second results above are easy to verify. The proof of the third result is constructive
and is a consequence of the Gram-Schmidt procedure; this will be covered later.

1.4 Proof of Theorem 1.1


We now prove Theorem 1.1. Suppose that the subspace S does not equal {0}; the case of S = {0}
is trivial to prove. Let n be a positive integer and a1 , . . . , an ∈ S. We have
span{a1 , . . . , an } ⊆ S,
and the reason is that any linear combination of vectors in S also lies in S. Now we wish to show
that there exists a linearly independent collection of a1 , . . . , an ∈ S such that S ⊆ span{a1 , . . . , an }.
We adopt a constructive proof.
First, pick any nonzero vector a_1 ∈ S. If every y ∈ S can be written as y = α_1 a_1 for some α_1, then we are done. If not, consider the following recursive process. Suppose that we have previously picked a linearly independent collection of vectors a_1, . . . , a_{k−1} ∈ S, but there exists y ∈ S such that y ≠ ∑_{i=1}^{k−1} α_i a_i for any α_i's (the first step above provides the base case). Then we pick a_k as any y ∈ S that satisfies y ≠ ∑_{i=1}^{k−1} α_i a_i for any α_i's. Clearly, a_1, . . . , a_k are linearly independent. If every y ∈ S can be written as y = ∑_{i=1}^k α_i a_i for some α_i's, then S ⊆ span{a_1, . . . , a_k} and we are done. Otherwise, we increase k by one and repeat the above steps.
The question is whether the recursive process above will stop; if yes, our proof is complete.
Suppose that the process has reached an iteration number of k > m. On one hand, the process says
that {a1 , . . . , ak } is linearly independent. On the other hand, if {a1 , . . . , ak } is linearly independent,
then k ≤ m must hold (a take-it-for-granted fact in linear independence). Hence, by contradiction,
the process must stop; k = m is the largest possible iteration number. The proof is complete.

2 Basis and Dimension
Let S ⊆ Rm , S 6= {0}, be a subspace. A set of vectors {b1 , . . . , bk } ⊂ Rm is called a basis for S
if {b1 , . . . , bk } is linearly independent and S = span{b1 , . . . , bk }. Taking S = span{a1 , . . . , an } as
an example, any maximal linearly independent subset of {a1 , . . . , an } is a basis for S. From the
definition of bases, the following facts can be shown:
1. A subspace may have more than one basis.
2. A subspace always has an orthonormal basis (if it does not equal {0}).
3. All bases for a subspace S have the same number of elements; i.e., if {b1 , . . . , bk } and
{c1 , . . . , cl } are both bases for S, then k = l.
Given a subspace S ⊆ R^m with S ≠ {0}, the dimension of S is defined as the number of
elements of a basis for S. Also, by convention, the dimension of the subspace {0} is defined as zero.
The notation dim S is used to denote the dimension of S. Some examples are as follows: We have
dim Rm = dim span{e1 , . . . , em } = m. If {ai1 , . . . , aik } is a maximal linearly independent subset
of {a1 , . . . , an }, then dim span{a1 , . . . , an } = k. We have the following properties for subspace
dimension.

Property 1.1 Let S, S1 , S2 ⊆ Rm be subspaces.


1. If S1 ⊆ S2 , then dim S1 ≤ dim S2 .
2. If S1 ⊆ S2 and dim S1 = dim S2 , then S1 = S2 .
3. We have dim S = m if and only if S = R^m (note: this is a consequence of Statement 2).
4. dim(S1 + S2 ) + dim(S1 ∩ S2 ) = dim S1 + dim S2 .
5. dim(S1 + S2 ) ≤ dim S1 + dim S2 (note: this is a consequence of Statement 4).

Note that the notation X + Y denotes the sum of the sets X and Y; i.e., X + Y = {x + y | x ∈
X , y ∈ Y}.

3 Projections onto Subspaces


Let S ⊆ R^m be a nonempty closed set, and let y ∈ R^m be any given vector. A projection of y onto S is any solution to the following problem

min_{z∈S} ‖z − y‖_2^2.

In words, a projection of y onto S is a point in S that is closest to y in the Euclidean sense. In general, there may be more than one such closest point. If every y ∈ R^m has only one projection onto S, we will use the notation

Π_S(y) = arg min_{z∈S} ‖z − y‖_2^2

to denote the projection of y onto S.


We are interested in projections onto subspaces. Such concepts play a crucial role in linear
algebra and matrix analysis. Consider the following theorem:

Theorem 1.2 Let S be a subspace of R^m.
1. For every y ∈ R^m, there exists a unique vector y_s ∈ S that minimizes ‖z − y‖_2 over all z ∈ S. Thus, we can write Π_S(y) = arg min_{z∈S} ‖z − y‖_2^2.
2. Given y ∈ R^m, we have y_s = Π_S(y) if and only if

y_s ∈ S,   z^T(y_s − y) = 0 for all z ∈ S.    (1)

The above theorem is a special case of the projection theorem in convex analysis and optimiza-
tion [1, Proposition B.11], which deals with projections onto closed convex sets. In the following
we provide a proof that is enough for the subspace case.

Proof: First, we should mention that there always exists a vector in S at which the minimum of ‖z − y‖_2^2 over all z ∈ S is attained; for this claim we only need S to be closed. This result is shown by applying the Weierstrass theorem, and readers are referred to [1, proof of Proposition B.11] for details.
Second, we show the sufficiency of Statement 2. Let y_s ∈ S be a vector that minimizes ‖z − y‖_2 over all z ∈ S. Since ‖z − y‖_2^2 ≥ ‖y_s − y‖_2^2 for all z ∈ S, and

‖z − y‖_2^2 = ‖z − y_s + y_s − y‖_2^2 = ‖z − y_s‖_2^2 + 2(z − y_s)^T(y_s − y) + ‖y_s − y‖_2^2,

we have

‖z − y_s‖_2^2 + 2(z − y_s)^T(y_s − y) ≥ 0, for all z ∈ S.

The above condition is equivalent to

‖z‖_2^2 + 2z^T(y_s − y) ≥ 0, for all z ∈ S;

the reason is that z ∈ S implies z − y_s ∈ S, and the converse is also true. Now, suppose that there exists a point z̄ ∈ S such that z̄^T(y_s − y) ≠ 0. Then, by choosing z = αz̄, where α = −z̄^T(y_s − y)/‖z̄‖_2^2, one can verify that ‖z‖_2^2 + 2z^T(y_s − y) < 0 and yet z ∈ S. Thus, by contradiction, we must have z^T(y_s − y) = 0 for all z ∈ S.
Third, we show the necessity of Statement 2. Suppose that there exists a vector ȳ_s ∈ S such that z^T(ȳ_s − y) = 0 for all z ∈ S. The aforementioned condition can be rewritten as

(z − ȳ_s)^T(ȳ_s − y) = 0, for all z ∈ S,

where we have used the equivalence z − ȳ_s ∈ S ⟺ z ∈ S. Now, for any z ∈ S, we have

‖z − y‖_2^2 = ‖z − ȳ_s‖_2^2 + 2(z − ȳ_s)^T(ȳ_s − y) + ‖ȳ_s − y‖_2^2 = ‖z − ȳ_s‖_2^2 + ‖ȳ_s − y‖_2^2 ≥ ‖ȳ_s − y‖_2^2.    (2)

The above inequality implies that ȳ_s minimizes ‖z − y‖_2 over all z ∈ S. This, together with the previous sufficiency proof, completes the proof of Statement 2. In addition, we note that the equality in (2) holds if and only if z = ȳ_s. This implies that ȳ_s is the only minimizer of ‖z − y‖_2 over all z ∈ S. Thus, we also obtain the uniqueness claim in Statement 1.

Theorem 1.2 has many implications, e.g., in least squares and orthogonal projections which we
will learn in later lectures. In the next section we will consider an application of Theorem 1.2 to
subspaces.
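To make the orthogonality condition (1) concrete, here is a small numerical sketch (my own illustration, not part of the original note): it projects a vector onto the column space of a random matrix by solving a least-squares problem with numpy, and then checks that the residual y_s − y is orthogonal to the subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))      # columns span a 2-dimensional subspace S of R^5
y = rng.standard_normal(5)

# Projection of y onto S = R(A): minimize ||A alpha - y||_2 over alpha, then set y_s = A alpha.
alpha, *_ = np.linalg.lstsq(A, y, rcond=None)
y_s = A @ alpha

# Condition (1): z^T (y_s - y) = 0 for all z in S; it suffices to check z = a_1 and z = a_2.
print(A.T @ (y_s - y))               # numerically zero
```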

4 Orthogonal Complements
Let S be a nonempty subset of R^m. The orthogonal complement of S is defined as the set

S^⊥ = { y ∈ R^m | z^T y = 0 for all z ∈ S }.


It is easy to verify that S ⊥ is always a subspace even if S is not. From the definition, we see that

1. any z ∈ S, y ∈ S ⊥ are orthogonal, i.e., S ⊥ consists of all vectors that are orthogonal to all
vectors of S;

2. we have either S ∩ S ⊥ = {0} or S ∩ S ⊥ = ∅; i.e., if we exclude 0, the sets S and S ⊥ are


non-intersecting.

The following theorem is a direct consequence of the projection theorem in Theorem 1.2.

Theorem 1.3 Let S be a subspace of Rm .

1. For every y ∈ Rm , there exists a unique 2-tuple (ys , yc ) ∈ S × S ⊥ such that

y = ys + y c .

Also, such a (ys , yc ) is given by ys = ΠS (y), yc = y − ΠS (y).

2. The projection of y onto S ⊥ is given by ΠS ⊥ (y) = y − ΠS (y).

Proof: Let us rephrase the problem in Statement 1 as follows: Given a vector y ∈ Rm , find
a vector ys such that ys ∈ S and y − ys ∈ S ⊥ . This problem is exactly the same as (1) in
Theorem 1.2. Thus, by Theorem 1.2 the solution is uniquely given by ys = ΠS (y). Also, we have
yc = y − ΠS (y) ∈ S ⊥ .
To show Statement 2, let us consider the following problem: Given a vector y ∈ Rm , find a
vector ȳc such that ȳc ∈ S ⊥ and y − ȳc ∈ (S ⊥ )⊥ , or equivalently,

ȳc ∈ S ⊥ , z̄T (y − ȳc ) = 0, for all z̄ ∈ S ⊥ .

By Theorem 1.2, the solution is uniquely given by ȳ_c = Π_{S^⊥}(y). On the other hand, the above conditions are seen to be satisfied if we choose ȳ_c = y − Π_S(y) ∈ S^⊥. It follows that Π_{S^⊥}(y) = y − Π_S(y).

Armed with Theorem 1.3, we can easily prove the following results.

Property 1.2 The following properties hold for any subspace S ⊆ Rm :

1. S + S ⊥ = Rm ;

2. dim S + dim S ⊥ = m;

3. (S ⊥ )⊥ = S. 1

1
We also have the following result: Let S be any subset (and not necessarily a subspace) in Rm . Then, we have
(S ⊥ )⊥ = span S, where span S is defined as the set of all finite linear combinations of points in S.

Proof: Statement 1 is merely a consequence of the first statement in Theorem 1.3. For Statement 2, let us assume S does not equal either {0} or R^m; the cases of {0} and R^m are trivial. Let {u_1, . . . , u_k} and {v_1, . . . , v_l} be orthonormal bases of S and S^⊥, respectively; here note that k = dim S, l = dim S^⊥. We can write

S + S ⊥ = span{u1 , . . . , uk , v1 , . . . , vl }.

Also, from the definition of orthogonal complements, it is immediate that uTi vj = 0 for all i, j.
Hence, {u1 , . . . , uk , v1 , . . . , vl } is an orthonormal basis for S + S ⊥ , and consequently we have
dim(S + S ⊥ ) = k + l. Moreover, since S + S ⊥ = Rm , we also have dim(S + S ⊥ ) = m. The result
m = dim S + dim S ⊥ therefore holds.
The proof of Statement 3 is as follows. One can verify from the orthogonal complements
definition that y ∈ S implies y ∈ (S ⊥ )⊥ . On the other hand, any y ∈ (S ⊥ )⊥ satisfies

y = Π(S ⊥ )⊥ (y)
= y − ΠS ⊥ (y) = y − (y − ΠS (y))
= ΠS (y),

where the first equation above is due to the equivalence y ∈ S ⇐⇒ y = ΠS (y) (which is easy to
verify), and the second equation above is due to the second statement of Theorem 1.3. The equality
y = ΠS (y) shown above implies that y ∈ S. 

Let us give an example of the application of Property 1.2. It is known that, for any A ∈ R^{m×n},

dim N(A) = n − rank(A).

The above result can be shown by Property 1.2. First, we use N (A) = R(AT )⊥ . Second, from the
second result in Property 1.2 we have

n = dim R(AT )⊥ + dim R(AT ).

Third, it is known that dim R(AT ) = rank(AT ) = rank(A). Combining these results together gives
dim N (A) = n − rank(A).
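As a quick numerical illustration of this identity (again my own sketch, using scipy's null_space helper, which is not part of the note):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
# Build a 4 x 6 matrix of rank 3 by multiplying two rank-3 factors.
A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 6))

n = A.shape[1]
rank_A = np.linalg.matrix_rank(A)
N = null_space(A)                    # orthonormal basis for N(A)

print(N.shape[1], n - rank_A)        # both print 3
```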
We close this section by mentioning one more property.

Property 1.3 Let S1 , S2 ⊆ Rm be subspaces. It holds that S1 ∩ S2 = (S1⊥ + S2⊥ )⊥ .

This property is left as an exercise. I should mention that this is a challenging exercise.

5 Gram-Schmidt
We consider the Gram-Schmidt (GS) procedure. Note that the following explanation is different
from that in the lecture slides; we use the perspective of projection to explain how the GS procedure
works.

5.1 GS as a Matrix Decomposition Result
Let A ∈ Rm×n be a full-column rank matrix (or a matrix with linearly independent columns). The
matrix A can be decomposed as
A = QR, (3)
for some semi-orthogonal Q ∈ R^{m×n} and upper triangular R ∈ R^{n×n} with r_ii > 0 for all i. The procedure for performing such a decomposition is called the GS procedure. From the matrix product
 
$$ QR = [\, q_1 \;\; q_2 \;\; \cdots \;\; q_n \,] \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1,n-1} & r_{1,n} \\ & r_{22} & & & \vdots \\ & & \ddots & & \vdots \\ & & & r_{n-1,n-1} & r_{n-1,n} \\ & & & & r_{nn} \end{bmatrix} = \left[\, r_{11} q_1 \;\;\; r_{12} q_1 + r_{22} q_2 \;\;\; \cdots \;\;\; \textstyle\sum_{i=1}^n r_{i,n} q_i \,\right], $$

we can see that the GS decomposition (3) is equivalent to

a1 = r11 q1 ,
a2 = r12 q1 + r22 q2 ,
..
.
ak = r1k q1 + · · · + rkk qk ,
..
.
an = r1,n q1 + · · · + rnn qn .

5.2 GS and Orthonormal Bases for Subspaces


In Section 2 we mentioned that every subspace S ⊆ Rm has an orthonormal basis; i.e., there exists
an orthonormal set of vectors {q1 , . . . , qn } such that S = span{q1 , . . . , qn }. We want to take a
closer look into this property. Let S ⊆ Rm be a given subspace. From Theorem 1.1 we know that
there exists a vector set {a1 , . . . , an } such that S = span{a1 , . . . , an }. Without loss of generality,
let us assume that {a1 , . . . , an } is linearly independent. Let A = [ a1 , . . . , an ], and let A = QR
be the corresponding GS decomposition. We have

S = R(A) = R(QR). (4)

Consider the following properties.

Property 1.4 Let A ∈ Rm×n , B ∈ Rn×p . If B has full row rank, then R(AB) = R(B).

Property 1.5 Let R ∈ Rn×n be upper triangular with rii 6= 0 for all i. It holds that R(R) = Rn .
Or, R has full rank.

The proof of Property 1.4 is left as an exercise for you. The proof of Property 1.5 is given at the end of this subsection. Applying Properties 1.4 and 1.5 to (4) gives

S = R(Q).

We therefore confirm that every subspace has an orthonormal basis.
Proof of Property 1.5: There are two ways to prove Property 1.5. The first proof utilizes the property that if R has nonzero determinant, then R is invertible (or has full rank). Since R is triangular, we have det(R) = ∏_{i=1}^n r_ii ≠ 0. Hence, we come to the first conclusion that R has full rank. Using the property that a subspace S ⊆ R^n has dim S = n if and only if S = R^n, we have the second conclusion that R(R) = R^n.
Our second proof goes as follows. We want to show that, given any vector y ∈ Rn , there exists
an x ∈ Rn such that y = Rx. If this is true, then we have Rn = R(R); consequently we also have
dim R(R) = n which is equivalent to saying that R has full rank. Suppose that y = Rx holds. We
can write

$$ \begin{bmatrix} y_1 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix} = \begin{bmatrix} r_{11} & \cdots & r_{1,n-1} & r_{1,n} \\ & \ddots & \vdots & \vdots \\ & & r_{n-1,n-1} & r_{n-1,n} \\ & & & r_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_{n-1} \\ x_n \end{bmatrix}. $$
From the above equation, we see that we can recursively calculate the components of x by

x_n = y_n / r_{nn},
x_{n−1} = (y_{n−1} − r_{n−1,n} x_n) / r_{n−1,n−1},
⋮
x_1 = (y_1 − ∑_{i=2}^n r_{1,i} x_i) / r_{11}.

The above equations indicate that, given any y ∈ Rn , there exists an x ∈ Rn such that y = Rx.
The proof is complete. 
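The recursion above is just back-substitution. A minimal numpy sketch of it (mine, not from the note):

```python
import numpy as np

def back_substitution(R, y):
    """Solve R x = y for upper triangular R with nonzero diagonal."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

rng = np.random.default_rng(2)
R = np.triu(rng.standard_normal((4, 4))) + 4 * np.eye(4)   # well-conditioned upper triangular
y = rng.standard_normal(4)
x = back_substitution(R, y)
print(np.allclose(R @ x, y))         # True
```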

5.3 Proof of the GS Decomposition


We now show how the GS decomposition comes about. The proof of the GS decomposition is
constructive, that is, we show the decomposition by showing how the factors Q and R are obtained,
or computed, from A. We will use the following property.

Property 1.6 Let Q ∈ R^{m×n} be semi-orthogonal and let S = R(Q). Given any vector y ∈ R^m, it
holds that ΠS (y) = QQT y.

The proof of Property 1.6 is left as an exercise. The GS decomposition is obtained by executing n steps in a sequential fashion.
Step 1: Let q_1 = a_1/‖a_1‖_2 and r_{11} = ‖a_1‖_2. Note that a_1 ≠ 0 because {a_1, . . . , a_n} is linearly independent. We have

a_1 = r_{11} q_1.    (5)

Also, to facilitate the subsequent proof, let S_1 = span{a_1} = span{q_1}.
Step 2: According to the first statement of Theorem 1.3, we can decompose a_2 as

a_2 = Π_{S_1}(a_2) + Π_{S_1^⊥}(a_2).

Note that if Π_{S_1^⊥}(a_2) = 0, then we have a_2 ∈ S_1, which further implies that {a_1, . . . , a_n} is linearly dependent. Since we assume {a_1, . . . , a_n} to be linearly independent, we must have Π_{S_1^⊥}(a_2) ≠ 0. By Property 1.6 it holds that

Π_{S_1}(a_2) = (q_1^T a_2) q_1,   where we denote r_{12} := q_1^T a_2.

Let q̃_2 = Π_{S_1^⊥}(a_2) = a_2 − r_{12} q_1, and let q_2 = q̃_2/‖q̃_2‖_2, r_{22} = ‖q̃_2‖_2. We have

a2 = r12 q1 + r22 q2 . (6)

Since q_1 ∈ S_1 and q_2 ∈ S_1^⊥, q_1 and q_2 are orthogonal to each other. Let S_2 = span{a_1, a_2}. We have

$$ S_2 = R([\, a_1 \;\; a_2 \,]) = R\!\left( [\, q_1 \;\; q_2 \,] \begin{bmatrix} r_{11} & r_{12} \\ & r_{22} \end{bmatrix} \right) = \mathrm{span}\{q_1, q_2\}, $$

where the second equality is due to (5) and (6), and the third equality is due to Properties 1.4 and 1.5.
Let us proceed to Step k + 1, k ≤ n − 1.
Step k + 1: Suppose that, in Step k, we have obtained

$$ Q_k = [\, q_1, \ldots, q_k \,], \qquad R_k = \begin{bmatrix} r_{11} & \cdots & r_{1,k-1} & r_{1,k} \\ & \ddots & \vdots & \vdots \\ & & r_{k-1,k-1} & r_{k-1,k} \\ & & & r_{kk} \end{bmatrix}, \qquad S_k = \mathrm{span}\{a_1, \ldots, a_k\} = R(Q_k), $$

where Q_k is semi-orthogonal, and r_ii > 0 for 1 ≤ i ≤ k. We decompose a_{k+1} as

a_{k+1} = Π_{S_k}(a_{k+1}) + Π_{S_k^⊥}(a_{k+1}),   where we denote q̃_{k+1} := Π_{S_k^⊥}(a_{k+1}).

We must have q̃_{k+1} ≠ 0; if q̃_{k+1} = 0, we have a_{k+1} ∈ S_k, which implies that {a_1, . . . , a_{k+1}} is linearly dependent. By Property 1.6,

Π_{S_k}(a_{k+1}) = Q_k (Q_k^T a_{k+1}).

Also, denote r_{i,k+1} = q_i^T a_{k+1}. We get

q̃_{k+1} = a_{k+1} − ∑_{i=1}^k r_{i,k+1} q_i.

Let q_{k+1} = q̃_{k+1}/‖q̃_{k+1}‖_2, r_{k+1,k+1} = ‖q̃_{k+1}‖_2. We can express a_{k+1} as

ak+1 = r1,k+1 q1 + · · · + rk,k+1 qk + rk+1,k+1 qk+1 .

Let S_{k+1} = span{a_1, . . . , a_{k+1}}. Let

$$ Q_{k+1} = [\, q_1, \ldots, q_{k+1} \,], \qquad R_{k+1} = \begin{bmatrix} r_{11} & \cdots & r_{1,k} & r_{1,k+1} \\ & \ddots & \vdots & \vdots \\ & & r_{k,k} & r_{k,k+1} \\ & & & r_{k+1,k+1} \end{bmatrix}. $$
It should be noted that Q_{k+1} is semi-orthogonal. This is because (i) {q_1, . . . , q_k} is orthonormal, and (ii) q_1, . . . , q_k ∈ S_k and q_{k+1} ∈ S_k^⊥ imply that q_i^T q_{k+1} = 0 for all 1 ≤ i ≤ k. We have

Sk+1 = R(Qk+1 Rk+1 ) = R(Qk+1 ),

where we have used Properties 1.4 and 1.5.


As we proceed to Step n, we obtain the QR decomposition (3). By writing down the proof
procedure as an algorithmic routine, we obtain the GS procedure in the lecture slides.
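For concreteness, here is one way to turn the above proof procedure into code: a sketch of the classical Gram-Schmidt procedure in numpy, written under the assumption that A has linearly independent columns (the version in the lecture slides may differ in details).

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt: A = Q R with semi-orthogonal Q and upper triangular R, r_ii > 0."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        # Projection of a_{k+1} onto S_k = span{q_1, ..., q_k} is Q_k Q_k^T a_{k+1} (Property 1.6).
        R[:k, k] = Q[:, :k].T @ A[:, k]
        q_tilde = A[:, k] - Q[:, :k] @ R[:k, k]      # component in the orthogonal complement of S_k
        R[k, k] = np.linalg.norm(q_tilde)
        Q[:, k] = q_tilde / R[k, k]
    return Q, R

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
Q, R = gram_schmidt(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(4)))   # True True
```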

References
[1] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, Mass., U.S.A., 2nd
edition, 1999.

[2] S. Boyd and L. Vandenberghe. Introduction to Applied Linear Algebra – Vectors, Matrices, and
Least Squares. 2018. Available online at [Link]

ENGG 5781: Matrix Analysis and Computations 2022-23 First Term

Lecture 2: Linear Representations and Least Squares


Instructor: Wing-Kin Ma

1 Proof of Linear Independence of Vandermonde Matrices


Let k be any positive integer, and consider the following matrix

$$ B = \begin{bmatrix} 1 & z_1 & z_1^2 & \cdots & z_1^{k-1} \\ 1 & z_2 & z_2^2 & \cdots & z_2^{k-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & z_k & z_k^2 & \cdots & z_k^{k-1} \end{bmatrix} \in \mathbb{C}^{k \times k}, $$

with z_1, . . . , z_k ∈ C. We will show that B is nonsingular if the z_i's are distinct. For now, let us assume this to be true and focus on showing that A has full rank. If m ≥ n, we can represent A by
by
AT = [ B × ]
with k = n; here, “×” means parts that do not matter. By the rank definition, we have rank(A) =
rank(AT ) ≥ rank(B) = n. Since we also have rank(A) ≤ n, we obtain the result rank(A) = n.
Moreover, if m ≤ n we can represent A by

A = [ BT × ]

with k = m. Following the same argument as above, we obtain rank(A) = m. Thus we have
established the result that A has full rank.
Now, we show that B is nonsingular if zi ’s are distinct. Observe that

Bα = 0 ⇐⇒ p(zi ) = 0, i = 1, . . . , k (1)

where
p(z) = α1 + α2 z + α3 z 2 + . . . + αk z k−1
denotes a polynomial of degree at most k − 1. On one hand, the condition on the R.H.S. of (1) implies that z_1, . . . , z_k are roots of p(z). On the other hand, it is known that a nonzero polynomial of degree at most k − 1 has at most k − 1 roots. Consequently, if the z_i's are distinct, the two statements contradict each other unless α = 0; that is, Bα = 0 forces α = 0. Hence, we have shown that B must be nonsingular if the z_i's are distinct.
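A quick numerical check of the nonsingularity claim (my own sketch; numpy's vander with increasing=True builds exactly the matrix B above):

```python
import numpy as np

z = np.array([0.3, -1.2, 2.0, 0.7])          # distinct points
B = np.vander(z, increasing=True)            # rows [1, z_i, z_i^2, z_i^3]
print(np.linalg.matrix_rank(B))              # 4, i.e., B is nonsingular

z_rep = np.array([0.3, 0.3, 2.0, 0.7])       # a repeated point
print(np.linalg.matrix_rank(np.vander(z_rep, increasing=True)))   # 3, i.e., singular
```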

ENGG 5781: Matrix Analysis and Computations 2022-23 First Term

Lecture 3: Eigenvalues and Eigenvectors


Instructor: Wing-Kin Ma

1 Eigenvalue Problem
The eigenvalue problem is as follows. Given A ∈ Cn×n , find a vector v ∈ Cn , v 6= 0, such that

Av = λv, (1)

for some λ ∈ C. If we can find a 2-tuple (v, λ) such that (1) holds, we say that (v, λ) is an eigen-pair
of A, λ is an eigenvalue of A, and v is an eigenvector of A associated with λ. From (1), we observe
two facts. First, (1) is equivalent to finding a λ such that

det(A − λI) = 0. (2)

Second, (1) is equivalent to

v ∈ N (A − λI), N (A − λI) 6= {0}. (3)

Let us consider the solution to (2). Denote p(λ) = det(A − λI). From the definition of
determinant, one can verify that p(λ) is a polynomial of degree n. Since a polynomial of degree n has n roots (counting multiplicity), we may write

p(λ) = ∏_{i=1}^n (λ_i − λ),

where λ1 , . . . , λn are the roots of p(λ). The above equation shows that (2) holds if and only if
λ = λi for any i = 1, . . . , n. Thus, we conclude that the eigenvalue problem in (1) always has a
solution, and that the roots λ1 , . . . , λn of p(λ) are the solutions to (1) . For convenience, let us
denote
Avi = λi vi , i = 1, . . . , n, (4)
where vi denotes an eigenvector associated with λi .
Next, we take a look at the solution to (3) w.r.t. v, given an eigenvalue λ. Since N (A − λI) is
a subspace and N (A − λI) 6= {0}, we can represent it by N (A − λI) = span{b1 , . . . , br } for some
basis {b1 , . . . , br } ⊂ Cn and for some r ≥ 1. In particular, the dimension of N (A − λI) is r. From
the above representation, we notice that there may be multiple eigenvectors associated with the
same λ. We should be careful when we describe the multiplicity of eigenvectors: A scaled version
of an eigenvector v, i.e., αv for some α ∈ C, α 6= 0, is also an eigenvector, but such a case is trivial.
From such a viewpoint, the eigenvector v associated with λ is unique subject to a complex scaling
if dim N (A − λI) = 1. Moreover, for dim N (A − λI) > 1, there are infinitely many eigenvectors
associated with λ even if we do not count the complex scaling cases; however, we can find a number
of r = dim N (A − λI) linearly independent eigenvectors associated with λ. Also, dim N (A − λI)
is the maximal number of linearly independent eigenvectors we can obtain for λ.
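As a small illustration (mine, not from the note), numpy.linalg.eig returns exactly such eigen-pairs, one eigenvector per eigenvalue:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
lam, V = np.linalg.eig(A)                    # eigenvalues and one eigenvector per eigenvalue
for i in range(len(lam)):
    # Check A v_i = lambda_i v_i for each eigen-pair.
    print(lam[i], np.allclose(A @ V[:, i], lam[i] * V[:, i]))
```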

2 Multiplicity of Eigenvalues and Eigenvectors
We are concerned with the multiplicity of an eigenvalue and the multiplicity of its associated
eigenvectors. For ease of exposition of ideas, let us assume w.l.o.g. that the eigenvalues λ1 , . . . , λn
are ordered such that {λ1 , . . . λk }, k ≤ n, is the set of all distinct eigenvalues of A; i.e., λi 6= λj
for all i, j ∈ {1, . . . , k} with i 6= j, and λi ∈ {λ1 , . . . λk } for all i ∈ {1, . . . , n}. Then, consider the
following definitions:

• The algebraic multiplicity of an eigenvalue λi , i ∈ {1, . . . , k}, is defined as the number of times
that λi appears as a root of p(λ). We will denote the algebraic multiplicity of λi as µi .

• The geometric multiplicity of λi , i ∈ {1, . . . , k}, is defined as the maximal number of linearly
independent eigenvectors associated with λi . We will denote the geometric multiplicity of λi
as γi , and note that γi = dim N (A − λi I).

Intuitively, it seems that if an eigenvalue is repeated r times, then we should also have r (linearly
independent) eigenvectors associated with it. We will show that

Property 3.1 We have µi ≥ γi for i = 1, . . . , k.

However, there exist instances for which µi > γi . An example is as follows:


 
$$ A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. $$

It can be verified that the roots of p(λ) are λ1 = λ2 = λ3 = 0. Thus, we have µ1 = 3, k = 1.


However, one can also verify that

$$ N(A - \lambda_1 I) = N(A) = \mathrm{span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}, $$

and consequently, γ_1 = dim N(A − λ_1 I) = 2.
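The example above can be checked numerically (a sketch of mine, using scipy's null_space):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])

# Algebraic multiplicity: lambda = 0 is a triple root of det(A - lambda I) = -lambda^3.
print(np.linalg.eigvals(A))                  # [0. 0. 0.]

# Geometric multiplicity: dim N(A - 0*I) = dim N(A).
print(null_space(A).shape[1])                # 2
```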

Proof of Property 3.1: For convenience, let λ̄ ∈ {λ1 , . . . , λk } be any eigenvalue of A, and denote
r = dim N (A − λ̄I). We aim to show that the characteristic polynomial det(A − λI) has at least r
repeated roots at λ = λ̄. From basic subspace concepts (cf. Lecture 1), we can find a collection of orthonormal vectors q_1, . . . , q_r ∈ N(A − λ̄I) and a collection of vectors q_{r+1}, . . . , q_n ∈ C^n such that Q = [ q_1, . . . , q_n ] is unitary. Let Q_1 = [ q_1, . . . , q_r ], Q_2 = [ q_{r+1}, . . . , q_n ], and note Q = [ Q_1  Q_2 ].
We have

$$ Q^H A Q = \begin{bmatrix} Q_1^H \\ Q_2^H \end{bmatrix} [\, AQ_1 \;\; AQ_2 \,] = \begin{bmatrix} Q_1^H A Q_1 & Q_1^H A Q_2 \\ Q_2^H A Q_1 & Q_2^H A Q_2 \end{bmatrix}. $$

Since A q_i = λ̄ q_i for i = 1, . . . , r, we get A Q_1 = λ̄ Q_1. By also noting that Q_1^H Q_1 = I and Q_2^H Q_1 = 0, the above matrix equation can be simplified to

$$ Q^H A Q = \begin{bmatrix} \bar{\lambda} I & Q_1^H A Q_2 \\ 0 & Q_2^H A Q_2 \end{bmatrix}. $$
Consequently, we have the following equivalence for the characteristic polynomial of A:

det(A − λI) = det(Q^H (A − λI) Q) = det(Q^H A Q − λI)
            = det(λ̄ I − λI) · det(Q_2^H A Q_2 − λI)
            = (λ̄ − λ)^r det(Q_2^H A Q_2 − λI),

where the second equality is due to the determinant result for block upper triangular matrices; here, note that det(Q_2^H A Q_2 − λI) is a polynomial of degree n − r. From the above equation we see that det(A − λI) has at least r repeated roots at λ = λ̄. The proof is complete.

3 Similarity, Diagonalizability, and Eigendecomposition


3.1 Similarity and Diagonalizability
To set the stage for describing eigendecomposition, we start with introducing the concepts of
similarity and diagonalizability. A matrix B ∈ Cn×n is said to be similar to another matrix
A ∈ Cn×n if there exists a nonsingular S ∈ Cn×n such that

B = S−1 AS.

Similar matrices are similar in the sense that their characteristic polynomials are the same. Specif-
ically, if A is similar to B then we have

det(A − λI) = det(S−1 (A − λI)S) = det(B − λI).

It is easy to verify that similar matrices have the following properties:


1. If B is similar to A, A is also similar to B.

2. If A, B are similar, they have the same set of eigenvalues.

3. If A, B are similar, they have det(A) = det(B).


In matrix analysis we are curious about whether a matrix can be similar to a diagonal matrix—
obviously because diagonal matrices are easy to deal with. A matrix A ∈ Cn×n is said to be
diagonalizable if it is similar to a diagonal matrix; i.e., there exists a nonsingular S ∈ Cn×n and a
diagonal D ∈ Cn×n such that
D = S−1 AS,
or equivalently,
A = SDS−1 .
Now, observe that the above equation can be equivalently rewritten as AS = SD, or, in column-
by-column form
Asi = di si , i = 1, . . . , n,
where di denotes the (i, i)th entry of D. Hence, every (si , di ) must be an eigen-pair of A. Also, since
A and D are similar they have the same set of eigenvalues; more precisely, d1 , . . . , dn must equal
λ_1, . . . , λ_n or a permutation thereof. Let us assume w.l.o.g. that d_i = λ_i for i = 1, . . . , n. Then every s_i is an eigenvector associated with λ_i. Furthermore, it is important to note that s_1, . . . , s_n must be linearly independent, since S is nonsingular.

3.2 Eigendecomposition
We are now ready to consider eigendecomposition. A matrix A ∈ Cn×n is said to admit an
eigendecomposition if there exists a nonsingular V ∈ Cn×n such that

A = VΛV−1 ,

where Λ = Diag(λ1 , . . . , λn ) and λ1 , . . . , λn are the eigenvalues of A. Essentially, eigendecomposi-


tion is the same as the diagonalization discussed above.
A natural question that arises from the above definition is whether eigendecomposition exists.
From the discussion in the last subsection, we see that the question is the same as asking whether
we can find a collection of eigenvectors v1 , . . . , vn , with each vi being associated with λi , such that
v1 , . . . , vn are linearly independent. Let us consider the following property:
Property 3.2 Let A ∈ Cn×n , and suppose that the eigenvalues λ1 , . . . , λn are ordered such that
{λ1 , . . . λk }, k ≤ n, is the set of all distinct eigenvalues of A. Also, let vi be any eigenvector
associated with λi . Then v1 , . . . , vk must be linearly independent.
We relegate the proof of the above property to the next subsection. Property 3.2 gives rise to the
following implications:
1. If the eigenvalues λ1 , . . . , λn of A are all distinct, then A admits an eigendecomposition.

2. By considering also Property 3.1, A admits an eigendecomposition if and only if µi = γi for


all i = 1, . . . , k.

3. A does not admit an eigendecomposition if µi > γi for some i ∈ {1, . . . , k}. Such instances
exist as discussed in the last section.
Before we close this section, we should mention that eigendecomposition is guaranteed to exist
in some matrix subclasses. For example, a circulant matrix always admits an eigendecomposition.
Also, we will show later that it is easy to find an arbitrarily close approximation of a given matrix
A such that the approximate matrix has its eigenvalues all being distinct and thus admits an
eigendecomposition.
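As a numerical sketch (mine) of the distinct-eigenvalue case in Implication 1: when the computed eigenvalues are distinct, the eigenvector matrix returned by numpy is nonsingular and A = VΛV^{-1} can be reassembled.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))              # generically has 4 distinct eigenvalues
lam, V = np.linalg.eig(A)

# With distinct eigenvalues, V is nonsingular and A = V Lambda V^{-1}.
A_rebuilt = V @ np.diag(lam) @ np.linalg.inv(V)
print(np.allclose(A_rebuilt, A))             # True
```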

3.3 Proof of Property 3.2


We prove Property 3.2 by contradiction. Suppose that λ_i ≠ λ_j for all i, j ∈ {1, . . . , k} with i ≠ j, but we can find a collection of linearly dependent v_1, . . . , v_k, i.e.,

∑_{i=1}^k α_i v_i = 0,    (5)

for some α ≠ 0. Let us assume w.l.o.g. that α_1 ≠ 0 and k ≥ 2. From (5), we obtain two equations

0 = A ( ∑_{i=1}^k α_i v_i ) = ∑_{i=1}^k α_i λ_i v_i,    (6)

0 = λ_k ( ∑_{i=1}^k α_i v_i ) = ∑_{i=1}^k α_i λ_k v_i.    (7)
Subtracting (6) from (7) results in

∑_{i=1}^{k−1} α_i (λ_k − λ_i) v_i = 0.    (8)

By repeatedly applying the trick in (5)–(8), one can show that

[ α_1 ∏_{i=2}^k (λ_i − λ_1) ] v_1 = 0.

Since α_1 ≠ 0 and λ_i ≠ λ_j for all i, j ∈ {1, . . . , k} with i ≠ j, the above equation holds only when v_1 = 0. This contradicts the fact that v_1 is an eigenvector, and the proof is complete.

4 Eigendecomposition for Hermitian and Real Symmetric Matri-


ces
A square matrix A is said to be Hermitian if A = AH , and symmetric if A = AT . We denote Hn to
be the set of all n × n complex Hermitian matrices and Sn to be the set of all n × n real symmetric
matrices. As we will see, Hermitian and real symmetric matrices always admit eigendecompositions.
To give some insights, consider the following property.

Property 3.3 Let A ∈ Hn . We have the following results:

1. The eigenvalues λ1 , . . . , λn of A are real.

2. Suppose that the λ_i's are ordered such that {λ_1, . . . , λ_k}, k ≤ n, is the set of all distinct eigenvalues of A. Also, let v_i be any eigenvector associated with λ_i. Then v_1, . . . , v_k must be mutually orthogonal (and orthonormal if each v_i is normalized to unit norm).

Proof: For any eigen-pair (v, λ) of A, we have

v^H A v = v^H (λv) = λ‖v‖_2^2.

If A is Hermitian we also have

v^H A v = (Av)^H v = (λv)^H v = λ^* ‖v‖_2^2.

The above two equations imply that λ = λ^*, i.e., λ is real. Moreover, consider the following
equations for any i, j ∈ {1, . . . , k}, i ≠ j:

v_j^H A v_i = v_j^H (A v_i) = λ_i v_j^H v_i,

v_j^H A v_i = (A v_j)^H v_i = λ_j v_j^H v_i,

where in the second equation we have used the fact that A is Hermitian and its eigenvalues are real. The above equations imply (λ_i − λ_j) v_j^H v_i = 0. Since λ_i ≠ λ_j, it must hold that v_j^H v_i = 0. Hence, we have shown that v_1, . . . , v_k must be mutually orthogonal.
Note that as a direct corollary of the first result of Property 3.3, any eigenvector of a real
symmetric matrix can be taken as real. Next, we present the main result.

Theorem 3.1 Every A ∈ Hn admits an eigendecomposition

A = VΛVH ,

where V ∈ Cn×n is unitary, and Λ = Diag(λ1 , . . . , λn ) with λi ∈ R for all i. Also, if A ∈ Sn , V


can be taken as real orthogonal.

The proof of Theorem 3.1 requires another theorem, namely, the Schur decomposition theorem. We will consider the Schur decomposition in the next section.

5 The Schur Decomposition


If we cannot always diagonalize a matrix, which is true as we discussed, our next question would
be to ask if a matrix can always be similar to another matrix that is closer to a diagonal matrix.
The Schur decomposition seeks to look at triangularizability—given A ∈ Cn×n , can we find a
nonsingular S ∈ Cn×n such that
A = STS−1
for some upper triangular T ∈ Cn×n ? Having such a result is still good: When T is upper triangular,
its characteristic polynomial is simply

det(T − λI) = ∏_{i=1}^n (t_ii − λ).

As a result, we can have t_ii = λ_i for all i.


As a beautiful result in matrix analysis, we can always triangularize a matrix. The result is
summarized as follows.

Theorem 3.2 Let A ∈ Cn×n , and let λ1 , . . . , λn be its eigenvalues. The matrix A admits a
decomposition
A = UTUH , (9)
for some unitary U ∈ Cn×n and for some upper triangular T ∈ Cn×n with tii = λi for all i. If A
is real and λ1 , . . . , λn are all real, U and T can be taken as real.

The decomposition in (9) will be called the Schur decomposition in the sequel. The Schur
decomposition not only shows that any square matrix is similar to an upper triangular matrix, it
also reveals that the “triangularizer” S can be unitary.
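For reference, a numerical sketch (mine) using scipy's complex Schur routine, which returns a unitary U and an upper triangular T carrying the eigenvalues on its diagonal:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
T, U = schur(A, output='complex')            # A = U T U^H with U unitary, T upper triangular

print(np.allclose(U @ T @ U.conj().T, A))    # True
print(np.round(np.diag(T), 4))               # eigenvalues of A, read off the diagonal of T
print(np.round(np.linalg.eigvals(A), 4))     # same multiset, possibly in a different order
```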

5.1 Proof of the Schur Decomposition


Consider the following lemma.

Lemma 3.1 Let X ∈ C^{n×n} and suppose that X takes a block upper triangular form

$$ X = \begin{bmatrix} X_{11} & X_{12} \\ 0 & X_{22} \end{bmatrix}, $$

where X_{11} ∈ C^{k×k}, X_{12} ∈ C^{k×(n−k)}, X_{22} ∈ C^{(n−k)×(n−k)}, with 0 ≤ k < n. There exists a unitary U ∈ C^{n×n} such that

$$ U^H X U = \begin{bmatrix} X_{11} & Y_{12} \\ 0 & Y_{22} \end{bmatrix}, \qquad Y_{22} = \begin{bmatrix} \bar{\lambda} & \times \\ 0 & \times \end{bmatrix}, $$

for some Y_{12} ∈ C^{k×(n−k)}, Y_{22} ∈ C^{(n−k)×(n−k)}, λ̄ ∈ C.
Proof of Lemma 3.1: The proof may be seen as a variation of the proof of Property 3.1 in
Section 2. Let λ̄ be any eigenvalue of X_{22}, and let v ∈ C^{n−k} be a unit-norm eigenvector of X_{22} associated with λ̄. Following the same proof as in Property 3.1, there exists a collection of vectors q_2, . . . , q_{n−k} ∈ C^{n−k} such that Q = [ v, q_2, . . . , q_{n−k} ] ∈ C^{(n−k)×(n−k)} is unitary, and it can be shown that Q^H X_{22} Q
takes the form

$$ Q^H X_{22} Q = \begin{bmatrix} \bar{\lambda} & \times \\ 0 & \times \end{bmatrix}. $$

Now, let

$$ U = \begin{bmatrix} I & 0 \\ 0 & Q \end{bmatrix}. $$

We have

$$ U^H X U = \begin{bmatrix} I & 0 \\ 0 & Q^H \end{bmatrix} \begin{bmatrix} X_{11} & X_{12} \\ 0 & X_{22} \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & Q \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & Q^H \end{bmatrix} \begin{bmatrix} X_{11} & X_{12} Q \\ 0 & X_{22} Q \end{bmatrix} = \begin{bmatrix} X_{11} & X_{12} Q \\ 0 & Q^H X_{22} Q \end{bmatrix}. $$
The result is obtained, as desired. 

The idea of proving the Schur decomposition in Theorem 3.2 is to apply Lemma 3.1 recursively.
To put into context, let A0 = A and consider the following iterations:

A_1 = U_1^H A_0 U_1,
A_2 = U_2^H A_1 U_2,
⋮
A_{n−1} = U_{n−1}^H A_{n−2} U_{n−1},

where every U_i is unitary and obtained by applying Lemma 3.1 with X = A_{i−1} and k = i − 1. From Lemma 3.1 we observe that A_i takes the form

$$ A_i = \begin{bmatrix} T_{ii} & \times \\ 0 & \times \end{bmatrix}, $$

for some upper triangular T_{ii} ∈ C^{i×i}. Hence, it follows that A_{n−1} is upper triangular. Let

U = U1 U2 · · · Un−1 .

It can be verified that U is unitary. By noting that An−1 = UH AU, we obtain the Schur decom-
position formula A = UTUH where T = An−1 .

We should also mention how the result t_ii = λ_i for all i is concluded. By the similarity of A and T and by det(T − λI) = ∏_{i=1}^n (t_ii − λ), the n roots t_11, . . . , t_nn of det(T − λI) must be λ_1, . . . , λ_n or a permutation thereof. Hence, we can assume w.l.o.g. that t_ii = λ_i for i = 1, . . . , n.
Furthermore, it can be verified that if A is real and its eigenvalues λ1 , . . . , λn are also real, then
we can choose U as real orthogonal and T as real in the Schur decomposition. This is left as a
self-practice problem for you.

5.2 Implications of the Schur Decomposition


Here we discuss some implications of the Schur decomposition.

• Computations of the Schur decomposition: The proof of the Schur decomposition is construc-
tive, and it also tells how we may write an algorithm to compute the Schur factors U and T.
From the proof in the last subsection, we see that we need two sub-algorithms to construct
U and T, namely, i) an algorithm for computing an eigenvector of a given matrix, and ii) an
algorithm that finds a unitary matrix Q such that its first column vector is fixed as a given
vector v. Task i) may be done by the power method, while Task ii) can be accomplished by
the so-called QR decomposition—which we will study later.
It should be further mentioned that the procedure mentioned above is arguably not the
best approach for computing the Schur factors, although it is top-down, insightful and easy
to understand. There exist computationally more efficient methods for computing the Schur
factors, and they were established based on some rather specific and sophisticated ideas which
are beyond the scope of this course. The only keyword I can give you is QR decomposition.

• Proof of Theorem 3.1 and beyond: With the Schur decomposition, we can easily show that any
Hermitian matrix admits an eigendecomposition. Let A be Hermitian and let A = UTUH
be its Schur decomposition. We see that

0 = A − AH = UTUH − UTH UH = U(T − TH )UH .

This implies that T − T^H = 0, or T must be diagonal. Consequently, the Schur decomposition


reduces to the eigendecomposition with unitary V. The implication T − TH = 0 also implies
that tii = t∗ii , or the eigenvalues λi ’s are real. The same result also applies to a real symmetric
A.1 Using the same idea, we can also show that if A satisfies A = −AH , i.e., it is the
so-called skew-Hermitian matrix, then A admits an eigendecomposition with unitary V and
with purely imaginary λi ’s.

• Existence of eigendecomposition: We have mentioned that even though A does not admit an
eigendecomposition, it is not hard to find an approximation of A such that the approximate
matrix admits an eigendecomposition. This is described as follows.

Proposition 3.1 Let A ∈ Cn×n . For every ε > 0, there exists a matrix à ∈ Cn×n such that
the n eigenvalues of à are distinct and

‖A − Ã‖_F ≤ ε.
1
As a minor point to note, for the real symmetric case we first need the first result of Property 3.3 to confirm that
λi ’s are real.

The proof is simple and is as follows: Let A = U T U^H be the Schur decomposition of A, and let à = U(T + D)U^H, where D = Diag(d_1, . . . , d_n) and d_1, . . . , d_n are chosen such that |d_i|^2 ≤ ε^2/n for all i and such that t_11 + d_1, . . . , t_nn + d_n are distinct. It is easy to verify that ‖A − Ã‖_F^2 = ‖U D U^H‖_F^2 = ‖D‖_F^2 ≤ ε^2.
The above proposition suggests that for any square A, we can always find a matrix à that is arbitrarily close to A and admits an eigendecomposition; a small numerical sketch of this construction is given right after this list.

• The Jordan canonical form: Recall that we have been asking the question of whether a matrix can be similar to a diagonal matrix, and, if not possible, to a matrix that is closer to a diagonal matrix. The Jordan canonical form says the following: any square A can be decomposed as A = SJS^{-1}, where S is nonsingular and J takes a certain block-diagonal form with upper bidiagonal (Jordan) blocks. The Jordan canonical form is beyond the scope of this course, but we should note that it is an enhancement of the Schur decomposition. In a nutshell, the proof of the Jordan canonical form first applies the Schur decomposition. Then, as the non-trivial part of the proof, it shows that the Schur factor T is similar to another matrix J that takes this near-diagonal structure.
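Below is a small numerical sketch (mine) of the construction in Proposition 3.1 referred to in the "existence of eigendecomposition" item above: perturb the diagonal of the Schur factor T with distinct shifts that are small in Frobenius norm.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])            # does not admit an eigendecomposition (mu_1 = 3 > gamma_1 = 2)

eps = 1e-2
T, U = schur(A, output='complex')          # A = U T U^H
n = A.shape[0]
d = (eps / np.sqrt(n)) * np.arange(1, n + 1) / n     # distinct shifts with sum of |d_i|^2 <= eps^2
A_tilde = U @ (T + np.diag(d)) @ U.conj().T

print(np.linalg.norm(A - A_tilde, 'fro') <= eps)     # True
print(np.round(np.linalg.eigvals(A_tilde), 6))       # three distinct eigenvalues
```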

ENGG 5781: Matrix Analysis and Computations 2022-23 First Term

Lecture 4: Positive Semidefinite Matrices and Variational


Characterizations of Eigenvalues
Instructor: Wing-Kin Ma

The focus of this note is to give a more in-depth description of variational characterizations of
eigenvalues of real symmetric matrices. I will also provide the proof of some results concerning the
PSD matrix inequalities in the main lecture slides.
Our notation is as follows. The eigenvalues of a given matrix A ∈ S^n are denoted by
λ1 (A), . . . , λn (A), and they are assumed to be arranged such that
λmax (A) = λ1 (A) ≥ λ2 (A) ≥ . . . ≥ λn (A) = λmin (A),
where we also denote λmin (A) and λmax (A) as the smallest and largest eigenvalues of A, respec-
tively. If not specified, we will simply denote λ1 , . . . , λn , with λ1 ≥ . . . ≥ λn , as the eigenvalues of
A ∈ Sn . Also, we will denote VΛVT as the eigendecomposition of A without explicit mentioning.

1 Variational Characterizations of Eigenvalues of Real Symmetric


Matrices
For a general A ∈ Rn×n , we may only characterize its eigenvalues as the roots of the characteristic
polynomial p(λ) = det(A − λI). However, for a real symmetric A, we can alternatively characterize
the eigenvalues as solutions to certain quadratic optimization problems. As we will see, such
variational characterizations of the eigenvalues lead to a number of interesting matrix analysis
results.

1.1 Rayleigh Quotient Maximization and Minimization


Let A ∈ S^n, and consider the following optimization problems

max_{x∈R^n, x≠0} (x^T A x)/(x^T x),    (1)

min_{x∈R^n, x≠0} (x^T A x)/(x^T x).    (2)

Note that the ratio x^T A x / x^T x is called a Rayleigh quotient. Rayleigh quotients are invariant to the scaling of x; one can see that

((αx)^T A (αx)) / ((αx)^T (αx)) = (x^T A x)/(x^T x),

for any α ≠ 0 and x ≠ 0. Without loss of generality, let us assume ‖x‖_2 = 1 and rewrite problems (1) and (2) as
max_{x∈R^n, x≠0} (x^T A x)/(x^T x) = max_{x∈R^n, ‖x‖_2=1} x^T A x,

min_{x∈R^n, x≠0} (x^T A x)/(x^T x) = min_{x∈R^n, ‖x‖_2=1} x^T A x,
respectively. The solutions to problems (1) and (2) are described in the following theorem.

Theorem 4.4 (Rayleigh-Ritz) For any A ∈ S^n, it holds that

λ_min ‖x‖_2^2 ≤ x^T A x ≤ λ_max ‖x‖_2^2,    (3)

λ_max = max_{x∈R^n, ‖x‖_2=1} x^T A x,    (4)

λ_min = min_{x∈R^n, ‖x‖_2=1} x^T A x.    (5)

Proof: This theorem is a direct consequence of the eigendecomposition of real symmetric matrices. By letting y = V^T x, the term x^T A x can be expressed as

x^T A x = y^T Λ y = ∑_{i=1}^n λ_i |y_i|^2.

Since

λ_n ∑_{i=1}^n |y_i|^2 ≤ ∑_{i=1}^n λ_i |y_i|^2 ≤ λ_1 ∑_{i=1}^n |y_i|^2

and ‖y‖_2^2 = ‖V^T x‖_2^2 = ‖x‖_2^2 for any orthogonal V, we have (3). It follows that

x^T A x ≤ λ_max, for all x ∈ R^n with ‖x‖_2 = 1,    (6)
x^T A x ≥ λ_min, for all x ∈ R^n with ‖x‖_2 = 1.    (7)

It can be verified that the equalities in (6) and (7) are attained when x = v_1 and x = v_n, respectively. Thus, we also have shown (4) and (5).
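A quick numerical sanity check of the Rayleigh-Ritz theorem (my own sketch): random unit vectors never take x^T A x outside [λ_min, λ_max], and x = v_1 attains the maximum. Note that numpy's eigh returns eigenvalues in ascending order, the opposite of the note's convention.

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                            # a real symmetric matrix

lam, V = np.linalg.eigh(A)                   # ascending: lam[0] = lambda_min, lam[-1] = lambda_max
lam_min, lam_max = lam[0], lam[-1]

X = rng.standard_normal((5, 1000))
X /= np.linalg.norm(X, axis=0)               # 1000 random unit vectors
quad = np.sum(X * (A @ X), axis=0)           # x^T A x for each unit vector x
print(np.all(quad <= lam_max + 1e-12), np.all(quad >= lam_min - 1e-12))   # True True

v1 = V[:, -1]                                # unit-norm eigenvector for lambda_max
print(np.isclose(v1 @ A @ v1, lam_max))      # True: the maximum in (4) is attained at x = v_1
```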

1.2 The Courant-Fischer Minimax Theorem


The Rayleigh-Ritz theorem gives an alternative characterization of the smallest and largest eigenval-
ues of a real symmetric matrix. The next question is whether we can provide a similar characterization for any eigenvalue. To give some insight, consider the following problem

max_{x∈span{v_2,...,v_n}, ‖x‖_2=1} x^T A x.

Following the same proof as in the Rayleigh-Ritz theorem, it can be shown that

λ_2 = max_{x∈span{v_2,...,v_n}, ‖x‖_2=1} x^T A x;

(please try). Also, the above problem can be alternatively expressed as

λ_2 = max_{x∈span{v_1}^⊥, ‖x‖_2=1} x^T A x.    (8)

However, the above eigenvalue characterization does not look very attractive as its feasible set depends on the principal eigenvector. Let us consider another problem

max_{x∈R^n, ‖x‖_2=1, w^T x=0} x^T A x,

where w ∈ R^n is given. By letting y = V^T x again, we can derive a lower bound

max_{x∈R^n, ‖x‖_2=1, w^T x=0} x^T A x = max_{y∈R^n, ‖y‖_2=1, (V^T w)^T y=0} y^T Λ y
    ≥ max_{y∈R^n, ‖y‖_2=1, (V^T w)^T y=0, y_3=...=y_n=0} y^T Λ y
    = max_{y∈span{V^T w, e_3, ..., e_n}^⊥, ‖y‖_2=1} y^T Λ y.    (9)

Here, there is a subtle issue to which we should pay some attention: Eq. (9) is invalid if the subspace span{V^T w, e_3, . . . , e_n}^⊥ equals {0}. In other words, we need to make sure that there exists a nonzero vector y in span{V^T w, e_3, . . . , e_n}^⊥, for otherwise problem (9) is infeasible. But since

dim(span{VT w, e3 , . . . , en }⊥ ) = n − dim(span{VT w, e3 , . . . , en }) ≥ n − (n − 1) = 1,

there always exists a nonzero vector y ∈ span{V^T w, e_3, . . . , e_n}^⊥. From (9), we further get

max_{y∈span{V^T w,e_3,...,e_n}^⊥, ‖y‖_2=1} y^T Λ y = max_{y∈span{V^T w,e_3,...,e_n}^⊥, ‖y‖_2=1} ∑_{i=1}^2 λ_i |y_i|^2 ≥ λ_2.    (10)
Combining (9)–(10), we have the following inequality

max_{x∈R^n, ‖x‖_2=1, w^T x=0} x^T A x ≥ λ_2.

Moreover, we notice that the above inequality is attained with equality if we choose w = v_1; specifically, this is a consequence of (8). Hence, we have a variational characterization of λ_2 as follows:

λ_2 = min_{w∈R^n} max_{x∈R^n, ‖x‖_2=1, w^T x=0} x^T A x.

The example showcased above is an instance of a more general variational characterization


result called the Courant-Fischer theorem.

Theorem 4.5 (Courant-Fischer) Let A ∈ S^n, and let S_k denote any subspace of R^n of dimension k. For any k ∈ {1, . . . , n}, it holds that

λ_k = min_{S_{n−k+1} ⊆ R^n} max_{x∈S_{n−k+1}, ‖x‖_2=1} x^T A x,    (11)

λ_k = max_{S_k ⊆ R^n} min_{x∈S_k, ‖x‖_2=1} x^T A x.    (12)

Proof: We will show (11) only, as the proof of (12) is similar to that of (11) (or take the proof of (12) as a self-practice problem). First, we show that

min_{S_{n−k+1} ⊆ R^n} max_{x∈S_{n−k+1}, ‖x‖_2=1} x^T A x ≤ λ_k.    (13)
It can be verified that

λ_k = max_{x∈span{v_k,...,v_n}, ‖x‖_2=1} x^T A x.

Also, dim(span{v_k, . . . , v_n}) = n − k + 1, which means that span{v_k, . . . , v_n} is feasible for the outer minimization problem on the left-hand side of (13); we therefore obtain (13).
Second, we show that

min_{S_{n−k+1} ⊆ R^n} max_{x∈S_{n−k+1}, ‖x‖_2=1} x^T A x ≥ λ_k,    (14)

which thereby completes the proof. Let


V = {y = V^T x | x ∈ S_{n−k+1}},
W = {y ∈ R^n | y_{k+1} = . . . = y_n = 0} = span{e_1, . . . , e_k}.
It can be shown that dim V = n − k + 1 for any Sn−k+1 and dim W = k. Also, we will need the
following subspace property:
Property 4.4 Let S1 , S2 be subspaces of Rn . If dim S1 +dim S2 > n, then the intersecting subspace
S1 ∩ S2 must not equal {0} (or have dim(S1 ∩ S2 ) ≥ 1).
Proof of Property 4.4: One can prove this property from the subspace result dim S1 + dim S2 −
dim(S1 ∩ S2 ) = dim(S1 + S2 ). Specifically, we have
dim(S1 ∩ S2 ) = dim S1 + dim S2 − dim(S1 + S2 ) ≥ dim S1 + dim S2 − n.
Hence, if dim S1 + dim S2 ≥ n + 1, the intersecting subspace S1 ∩ S2 must have dimension no less
than one and consequently must not equal {0}.
I also show you an alternative proof that does not require the result dim S1 + dim S2 − dim(S1 ∩
S2 ) = dim(S1 + S2 ). Suppose that S1 ∩ S2 = {0}, but dim S1 + dim S2 ≥ n + 1. Let k = dim S1 ,
l = dim S2 , and let {a1 , . . . , ak } and {b1 , . . . , bl } be bases of S1 and S2 , respectively. The condition
S1 ∩ S2 = {0} implies that
∑_{i=1}^k α_i a_i = ∑_{i=1}^l β_i b_i for some α, β  ⟹  α = 0, β = 0.    (15)

On the other hand, for k + l ≥ n + 1, the vector set {a_1, . . . , a_k, b_1, . . . , b_l} must be linearly dependent. This implies that there must exist α, β, with either α ≠ 0 or β ≠ 0, such that the left-hand side of (15) holds. This contradicts the implication in (15).
By Property 4.4, the intersecting subspace V ∩ W must contain a nonzero vector for any S_{n−k+1}. Subsequently, we have

min_{S_{n−k+1}⊆R^n} max_{x∈S_{n−k+1}, ‖x‖_2=1} x^T A x = min_{V⊆R^n} max_{y∈V, ‖y‖_2=1} y^T Λ y
    ≥ min_{V⊆R^n} max_{y∈V∩W, ‖y‖_2=1} y^T Λ y
    = min_{V⊆R^n} max_{y∈V∩W, ‖y‖_2=1} ∑_{i=1}^k λ_i |y_i|^2
    ≥ λ_k,

and the proof is complete.
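A numerical sketch (mine) of the characterization (11): for every subspace of dimension n − k + 1, the inner maximum is at least λ_k, and the subspace span{v_k, . . . , v_n} attains it. Random subspaces are drawn via the QR factorization of Gaussian matrices.

```python
import numpy as np

rng = np.random.default_rng(11)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2
lam, V = np.linalg.eigh(A)
lam_desc, V_desc = lam[::-1], V[:, ::-1]     # note's ordering: lambda_1 >= ... >= lambda_n

n, k = 6, 3

def inner_max(S):
    """max of x^T A x over unit-norm x in R(S), for S with orthonormal columns."""
    return np.linalg.eigvalsh(S.T @ A @ S).max()

# Every (n-k+1)-dimensional subspace gives an inner maximum of at least lambda_k ...
worst = min(inner_max(np.linalg.qr(rng.standard_normal((n, n - k + 1)))[0]) for _ in range(200))
print(worst >= lam_desc[k - 1] - 1e-10)      # True

# ... and the subspace span{v_k, ..., v_n} attains lambda_k.
print(np.isclose(inner_max(V_desc[:, k - 1:]), lam_desc[k - 1]))   # True
```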

1.3 Implications of the Courant-Fischer Theorem
The Courant-Fischer theorem and its proof insights have led to a collection of elegant results on eigenvalue inequalities, and here we describe some of them. Let A, B ∈ S^n, z ∈ R^n. We have the
following results:

(a) (Weyl) λk (A) + λn (B) ≤ λk (A + B) ≤ λk (A) + λ1 (B) for k = 1, . . . , n;

(b) (interlacing) λk+1 (A) ≤ λk (A ± zzT ) for k = 1, . . . , n − 1, and λk (A ± zzT ) ≤ λk−1 (A) for
k = 2, . . . , n;

(c) if rank(B) ≤ r, then λk+r (A) ≤ λk (A + B) for k = 1, . . . , n − r and λk (A + B) ≤ λk−r (A) for
k = r + 1, . . . , n;

(d) (Weyl) λj+k−1 (A + B) ≤ λj (A) + λk (B) for j, k ∈ {1, . . . , n} with j + k ≤ n + 1;

(e) for any I = {i1 , . . . , ir } ⊆ {1, . . . , n}, λk+n−r (A) ≤ λk (AI ) ≤ λk (A) for k = 1, . . . , r.

(f) for any semi-orthogonal U ∈ Rn×r , λk+n−r (A) ≤ λk (UT AU) ≤ λk (A) for k = 1, . . . , r.

The details of the proofs of the above results are left as self-practice problems for you, and here are some hints: Result (a) is obtained by applying (3) to (11). Result (b) is shown via the Courant-Fischer theorem; specifically, we have

λ_k(A ± zz^T) = min_{S_{n−k+1}⊆R^n} max_{x∈S_{n−k+1}, ‖x‖_2=1} x^T (A ± zz^T) x
    ≥ min_{S_{n−k+1}⊆R^n} max_{x∈S_{n−k+1}∩R(z)^⊥, ‖x‖_2=1} x^T A x
    ≥ min_{S_r⊆R^n, n−k≤r} max_{x∈S_r, ‖x‖_2=1} x^T A x
    = λ_{k+1}(A),

where the second inequality is obtained by showing that dim(S_{n−k+1} ∩ R(z)^⊥) ≥ n − k (use the result dim S_1 + dim S_2 − dim(S_1 ∩ S_2) = dim(S_1 + S_2)). Result (c) is shown by applying Result (b) and using the fact that a matrix B ∈ S^n with rank(B) ≤ r can be written as a sum of r outer products, B = ∑_{i=1}^r μ_i u_i u_i^T. Result (d) is proven by applying Result (c). For Result (e), consider the case of I = {1, . . . , r} for ease of exposition of ideas. We have

λ_k(A) ≥ min_{S_{n−k+1}⊆R^n} max_{x∈S_{n−k+1}∩span{e_1,...,e_r}, ‖x‖_2=1} x^T A x
    ≥ min_{S_l⊆R^r, r−k+1≤l} max_{y∈S_l, ‖y‖_2=1} y^T A_I y
    = λ_k(A_I),

where one needs to show that dim(Sn−k+1 ∩span{e1 , . . . , er }) ≥ r −k +1; some care also needs to be
taken as we change the vector dimension in the second equation above. Result (f) is a consequence
of Result (e).

1.4 Maximization of a Sum of Rayleigh Quotients
In the previous subsections we considered maximization or minimization of a single Rayleigh quotient. Here we are interested in a problem concerning a sum of Rayleigh quotients:

max_{U∈R^{n×r}, u_i≠0 ∀i, u_i^T u_j=0 ∀i≠j} ∑_{i=1}^r (u_i^T A u_i)/(u_i^T u_i),    (16)

where we want the vectors u1 , . . . ur of the Rayleigh quotients to be orthogonal to each other. This
problem finds applications in matrix factorization and PCA, as we will see in the next lecture. As
in the previous treatment for Rayleigh quotients, we can rewrite problem (16) as
max_{U∈R^{n×r}, u_i≠0 ∀i, u_i^T u_j=0 ∀i≠j} ∑_{i=1}^r (u_i^T A u_i)/(u_i^T u_i) = max_{U∈R^{n×r}, ‖u_i‖_2=1 ∀i, u_i^T u_j=0 ∀i≠j} ∑_{i=1}^r u_i^T A u_i
    = max_{U∈R^{n×r}, U^T U=I} tr(U^T A U),    (17)

The following result describes an equivalence relation between problem (17) and eigenvalues.

Theorem 4.6 Let A ∈ S^n. It holds that

∑_{i=1}^r λ_i = max_{U∈R^{n×r}, U^T U=I} tr(U^T A U).    (18)

Proof: This theorem can be easily shown by Result (f) in Section 1.3. Specifically, for any semi-orthogonal U ∈ R^{n×r} we have

tr(U^T A U) = ∑_{i=1}^r λ_i(U^T A U) ≤ ∑_{i=1}^r λ_i(A),    (19)

where the inequality is owing to Result (f) in Section 1.3. Since the equality in (19) is attained by U = [ v_1, . . . , v_r ], we obtain the desired result.
To enrich your understanding of this topic, I also show you an alternative proof which does not
use Result (f) in Section 1.3. In essence, the main challenge lies in showing (19). Consider the
following problem
max_{U∈R^{n×r}, U^T U=I} tr(U^T Λ U).

It can be shown that the above problem is equivalent to the problem on the right-hand side of (18). Now, observe that

tr(U^T Λ U) = tr(U U^T Λ) = ∑_{i=1}^n [U U^T]_{ii} λ_i,

and that for any semi-orthogonal U ∈ R^{n×r} we have

[U U^T]_{ii} = e_i^T U U^T e_i = ‖U^T e_i‖_2^2 ≤ ‖e_i‖_2^2 = 1,

∑_{i=1}^n [U U^T]_{ii} = tr(U U^T) = tr(U^T U) = r.

This leads us to consider a relaxation

max_{U∈R^{n×r}, U^T U=I} tr(U^T Λ U) ≤ max_{x∈R^n} ∑_{i=1}^n x_i λ_i
    s.t. 0 ≤ x_i ≤ 1, i = 1, . . . , n,
         ∑_{i=1}^n x_i = r.

In particular, we replace every [U U^T]_{ii} by x_i. One can easily see that the optimal value of the problem on the right-hand side of the above equation is ∑_{i=1}^r λ_i. Hence, we have shown (19).
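A numerical check of Theorem 4.6 (my sketch): the choice U = [ v_1, . . . , v_r ] attains the sum of the r largest eigenvalues, and randomly drawn semi-orthogonal U never exceed it.

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2
lam, V = np.linalg.eigh(A)                   # ascending eigenvalues, orthonormal eigenvectors

r = 3
top_sum = lam[-r:].sum()                     # lambda_1 + ... + lambda_r in the note's ordering

U_opt = V[:, -r:]                            # eigenvectors of the r largest eigenvalues
print(np.isclose(np.trace(U_opt.T @ A @ U_opt), top_sum))    # True: the maximum is attained

# Random semi-orthogonal U (thin QR of a Gaussian matrix) never beats the optimum.
vals = []
for _ in range(200):
    U, _ = np.linalg.qr(rng.standard_normal((6, r)))
    vals.append(np.trace(U.T @ A @ U))
print(max(vals) <= top_sum + 1e-10)          # True
```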

2 Matrix Inequalities
In this section the aim is to give the proof of some of the results we mentioned in PSD matrix
inequalities in the main lecture slides. To be specific, let A, B ∈ Sn and recall the following results.

(a) A ⪰ B implies λ_k(A) ≥ λ_k(B) for k = 1, . . . , n.

(b) A ⪰ I is equivalent to λ_k(A) ≥ 1 for k = 1, . . . , n. Also, A ≻ I is equivalent to λ_k(A) > 1 for k = 1, . . . , n.

(c) I ⪰ A is equivalent to λ_k(A) ≤ 1 for k = 1, . . . , n. Also, I ≻ A is equivalent to λ_k(A) < 1 for k = 1, . . . , n.

(d) If A, B ≻ 0, then A ⪰ B is equivalent to B^{-1} ⪰ A^{-1}.

(e) (The Schur complement) Let

$$ X = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}, $$

where A ∈ S^m, B ∈ R^{m×n}, C ∈ S^n with C ≻ 0, and denote S = A − B C^{-1} B^T. It holds that

X ⪰ 0 ⟺ S ⪰ 0.

Also, we have X ≻ 0 ⟺ S ≻ 0.

The proof is as follows.


Proof of (a): Suppose A ⪰ B. We have, for any k ∈ {1, . . . , n},

λ_k(A) = λ_k(B + A − B) ≥ λ_k(B) + λ_n(A − B) ≥ λ_k(B),

where the first inequality is due to Weyl's inequality, and the second inequality is because A − B is PSD.
Proof of (b): Consider the matrix A − I. Since A − I = V(Λ − I)VT , the eigenvalues of A − I
are λ1 − 1, . . . , λn − 1. It follows that A − I is PSD if and only if λk − 1 ≥ 0 for all k, and that
A − I is PD if and only if λk − 1 > 0 for all k.
Proof of (c): The proof is the same as that of (b) and is omitted for brevity.

7
Proof of (d): Since A is PD, its square root A1/2 is invertible. We have

A − B  0 ⇐⇒ A1/2 (I − A−1/2 BA−1/2 )A1/2  0 ⇐⇒ I − A−1/2 BA−1/2  0 (20)

where Theorem 4.3 has been used to obtain the second equivalence. Let C = A−1/2 BA−1/2 , and
denote its eigendecomposition as C = QDQT where D = Diag(d1 , . . . , dn ). Since B is PD and
A1/2 is invertible, by Theorem 4.3 C is PD; hence, we have dk > 0 for all k. From the right-hand
side of (20) we further show the following equivalence

    I − C ⪰ 0 ⟺ 1 ≥ dk, k = 1, . . . , n
            ⟺ 1/dk ≥ 1, k = 1, . . . , n
            ⟺ D^{-1} ⪰ I
            ⟺ Q(D^{-1} − I)Q^T ⪰ 0
            ⟺ C^{-1} − I ⪰ 0
            ⟺ A^{1/2}B^{-1}A^{1/2} − I ⪰ 0
            ⟺ B^{-1} − A^{-1} ⪰ 0,

where we have applied Theorem 4.3 multiple times and Results (b)–(c).
Proof of (e): Let

    Y = [ I            0 ]
        [ −C^{-1}B^T   I ].

One can verify that

    Y^T X Y = [ A − BC^{-1}B^T   0 ]
              [ 0                C ].
Also, Y is nonsingular; from the block triangular structure of Y we see that det(Y) = 1. Hence, by Theorem 4.3, X is PSD if and only if Y^T X Y ⪰ 0. The condition Y^T X Y ⪰ 0 is equivalent to A − BC^{-1}B^T ⪰ 0 and C ⪰ 0, and hence we have shown the desired result for the PSD case. The PD case is shown in the same manner.
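
To complement the proof of (e), here is a small numerical check (my addition, assuming NumPy): build X from random blocks with C ≻ 0 and compare the smallest eigenvalues of X and of the Schur complement S, which must have the same sign by Result (e).

import numpy as np

np.random.seed(2)
m, n = 3, 4
A0 = np.random.randn(m, m); A = (A0 + A0.T) / 2         # an arbitrary A in S^m
B  = np.random.randn(m, n)
C0 = np.random.randn(n, n); C = C0 @ C0.T + np.eye(n)   # C is PD

X = np.block([[A, B], [B.T, C]])
S = A - B @ np.linalg.inv(C) @ B.T                      # Schur complement of C in X

# by Result (e), the smallest eigenvalues of X and S are negative, zero, or positive together
print(np.linalg.eigvalsh(X).min(), np.linalg.eigvalsh(S).min())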

ENGG 5781: Matrix Analysis and Computations 2022-23 First Term

Lecture 5: Singular Value Decomposition


Instructor: Wing-Kin Ma

In this note we give the detailed proof of some results in the main slides.

1 Proof of SVD
Recall the SVD theorem:

Theorem 5.1 Given any A ∈ Rm×n , there exists a 3-tuple (U, Σ, V) ∈ Rm×m × Rm×n × Rn×n
such that
A = UΣVT ,
U and V are orthogonal, and Σ takes the form

    [Σ]_{ij} = { σ_i,  i = j,
               { 0,    i ≠ j,

with σ1 ≥ σ2 ≥ . . . ≥ σp ≥ 0 and with p = min{m, n}.

The proof is as follows. First, consider the matrix product AA^T. Since AA^T is real symmetric and PSD, by eigendecomposition we can express AA^T as

    AA^T = UΛU^T = [ U_1  U_2 ] [ Λ̃  0 ] [ U_1^T ]  =  U_1 Λ̃ U_1^T,        (1)
                                [ 0   0 ] [ U_2^T ]

where we assume that the eigenvalues are ordered such that λ1 ≥ . . . ≥ λr > 0 and λ_{r+1} = . . . = λ_m = 0, with r being the number of nonzero eigenvalues; U ∈ R^{m×m} denotes a corresponding orthogonal eigenvector matrix; we partition U as U = [ U1 U2 ], with U1 ∈ R^{m×r} and U2 ∈ R^{m×(m−r)}; and Λ̃ = Diag(λ1, . . . , λr). It is easy to verify from the decomposition above that

UT2 A = 0. (2)

To see this, we note from UT2 U1 = 0 that UT2 A(UT2 A)T = 0. By the simple result that BBT = 0
implies B = 0 (which is easy to show and whose proof is omitted here), we conclude that UT2 A = 0.
Second, construct the following matrices:

    Σ̃ = Λ̃^{1/2} = Diag(√λ1, . . . , √λr),    V1 = A^T U1 Σ̃^{-1} ∈ R^{n×r}.

One can easily see from (1) that

    V1^T V1 = Σ̃^{-1} U1^T (AA^T) U1 Σ̃^{-1} = Σ̃^{-1} Λ̃ Σ̃^{-1} = I.

Furthermore, let V2 ∈ R^{n×(n−r)} be a matrix such that V = [ V1 V2 ] is orthogonal; we know from Lecture 1 that such a matrix always exists. It can be verified that

    U1^T A V1 = Σ̃,    U1^T A V2 = 0.        (3)

Third, consider the matrix product U^T A V. We have

    U^T A V = [ U1^T A V1   U1^T A V2 ]  =  [ Σ̃  0 ]
              [ U2^T A V1   U2^T A V2 ]     [ 0   0 ],

where (2) and (3) have been used. By multiplying the above equation on the left by U and on the
right by VT , we obtain the desired result A = UΣVT . The proof is complete.
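
The construction in the proof can be mirrored numerically. The following sketch is my addition, assumes NumPy, and for simplicity also assumes m ≤ n with A of full row rank, so that r = m and the U2, V2 blocks are not needed; it builds U1, Σ̃, V1 exactly as in the proof and checks that A = U1 Σ̃ V1^T.

import numpy as np

np.random.seed(3)
m, n = 3, 5
A = np.random.randn(m, n)               # full row rank with probability one

lam, U1 = np.linalg.eigh(A @ A.T)       # eigendecomposition of AA^T
idx = np.argsort(lam)[::-1]             # sort eigenvalues in descending order
lam, U1 = lam[idx], U1[:, idx]

Sigma = np.diag(np.sqrt(lam))           # Sigma = Λ̃^{1/2}
V1 = A.T @ U1 @ np.linalg.inv(Sigma)    # V1 = A^T U1 Σ̃^{-1}

print(np.allclose(V1.T @ V1, np.eye(m)))    # V1 has orthonormal columns
print(np.allclose(U1 @ Sigma @ V1.T, A))    # recovers A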

2 Sensitivity Analysis of the Linear System Solution


Recall the perturbed linear system problem: Let A ∈ Rn×n be nonsingular and y ∈ Rn , and denote
x as the solution to the linear system y = Ax. The actual A and y we deal with are perturbed.
To be specific, we have
Â = A + ∆A,    ŷ = y + ∆y,
where ∆A and ∆y are errors. Let x̂ denote a solution to the perturbed linear system

ŷ = Âx̂.

The problem is to analyze how the solution error x̂ − x scales with ∆A and ∆y.
This analysis problem can be tackled via SVD. To put into context, define

    κ(A) = σmax(A)/σmin(A),

which is called the condition number of A. Note that if A is close to singular, then σmin (A) will
be very small and we would expect a very large κ(A). We have the following result.

Theorem 5.2 Let ε > 0 be a constant such that

    ‖∆A‖_2/‖A‖_2 ≤ ε,    ‖∆y‖_2/‖y‖_2 ≤ ε.

If ε is sufficiently small such that εκ(A) < 1, then

    ‖x̂ − x‖_2/‖x‖_2 ≤ 2εκ(A)/(1 − εκ(A)).

Theorem 5.2 suggests that for a sufficiently small error level ε, the relative solution error ‖x̂ − x‖_2/‖x‖_2 tends to increase with the condition number κ(A). In particular, if εκ(A) ≤ 1/2, we may simplify the relative solution error bound in Theorem 5.2 to

    ‖x̂ − x‖_2/‖x‖_2 ≤ 4εκ(A),

where we can see that the error bound above scales linearly with κ(A).
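
To see the effect of κ(A) empirically, the following sketch (my addition, assuming NumPy; the sizes and the values of ε and κ are arbitrary choices) builds A with a prescribed condition number via its SVD, perturbs the system at relative level ε, and compares the observed relative error with the Theorem 5.2 bound 2εκ(A)/(1 − εκ(A)).

import numpy as np

np.random.seed(4)
n, eps, kappa = 20, 1e-6, 1e4

# construct A = U diag(s) V^T with σmax = 1 and σmin = 1/κ
U, _ = np.linalg.qr(np.random.randn(n, n))
V, _ = np.linalg.qr(np.random.randn(n, n))
s = np.linspace(1.0, 1.0 / kappa, n)
A = U @ np.diag(s) @ V.T

x = np.random.randn(n)
y = A @ x

# perturbations scaled so that ||dA||2 = eps*||A||2 and ||dy||2 = eps*||y||2
dA = np.random.randn(n, n)
dA *= eps * np.linalg.norm(A, 2) / np.linalg.norm(dA, 2)
dy = np.random.randn(n)
dy *= eps * np.linalg.norm(y) / np.linalg.norm(dy)

x_hat = np.linalg.solve(A + dA, y + dy)
rel_err = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
bound = 2 * eps * kappa / (1 - eps * kappa)
print(rel_err, bound)                   # observed error vs. the Theorem 5.2 bound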

Proof of Theorem 5.2: For notational convenience, denote ∆x = x̂ − x. The perturbed linear
system can be written as
(A + ∆A)(x + ∆x) = y + ∆y.
The above equation can be re-organized as

A∆x = ∆y − ∆Ax − ∆A∆x,

and then
∆x = A−1 (∆y − ∆Ax − ∆A∆x).
Let us take the 2-norm on both sides of the above equation:

    ‖∆x‖_2 ≤ ‖A^{-1}‖_2 ‖∆y − ∆Ax − ∆A∆x‖_2
           ≤ ‖A^{-1}‖_2 (‖∆y‖_2 + ‖∆Ax‖_2 + ‖∆A∆x‖_2)
           ≤ ‖A^{-1}‖_2 (‖∆y‖_2 + ‖∆A‖_2‖x‖_2 + ‖∆A‖_2‖∆x‖_2),        (4)

where we have used the norm inequality ‖Ax‖_2 ≤ ‖A‖_2‖x‖_2 and the triangle inequality ‖x + y‖_2 ≤ ‖x‖_2 + ‖y‖_2 to obtain (4).
Next, we apply the assumptions ‖∆A‖_2/‖A‖_2 ≤ ε and ‖∆y‖_2/‖y‖_2 ≤ ε to (4). We have

    ‖∆y‖_2 ≤ ε‖y‖_2 = ε‖Ax‖_2 ≤ ε‖A‖_2‖x‖_2,        (5)

and substituting (5) and ‖∆A‖_2 ≤ ε‖A‖_2 into (4) results in

    ‖∆x‖_2 ≤ ‖A^{-1}‖_2‖A‖_2 (2ε‖x‖_2 + ε‖∆x‖_2)
           = 2εκ(A)‖x‖_2 + εκ(A)‖∆x‖_2,

where the result ‖A^{-1}‖_2 = max_{i=1,...,n} 1/σ_i(A) = 1/σmin(A), together with ‖A‖_2 = σmax(A), has been used so that ‖A^{-1}‖_2‖A‖_2 = κ(A).
can be rewritten as
(1 − εκ(A))k∆xk2 ≤ 2εκ(A)kxk2 ,
and if 1 − εκ(A) > 0 we can further rewrite

k∆xk2 2εκ(A)
≤ ,
kxk2 1 − εκ(A)

as desired. 

ENGG 5781: Matrix Analysis and Computations 2022-23 First Term

Lecture 7: Linear Systems


Instructor: Wing-Kin Ma

This note shows the proof of the properties and theorems in the main lecture slides.

1 Proof of Properties 7.1–7.3


Recall Properties 7.1–7.3 in the main lecture slides:

Property 7.1 Let A, B ∈ Rn×n be lower triangular. Then, AB is lower triangular. Also, if A,
B have unit diagonal entries, then AB has unit diagonal entries.

Property 7.2 If A ∈ R^{n×n} is lower triangular, then det(A) = ∏_{i=1}^n a_ii.

Property 7.3 Let A ∈ Rn×n be nonsingular lower triangular. Then, A−1 is lower triangular with
[A−1 ]ii = 1/aii .

Their proofs are shown as follows.

1.1 Proof of Property 7.1


Property 7.1 can be shown by examining the matrix product AB in an element-by-element fashion.
I also show you an alternative proof using unit vector representations. For convenience, let C = AT ,
and D = AB = CT B. The (k, l)th entry of D is

dkl = cTk bl .

Since B is lower triangular, its columns can be represented by

    b_l = ∑_{j=l}^n b_{jl} e_j,    l = 1, . . . , n,

where we recall that the e_k's are unit vectors. Also, since C = A^T is upper triangular, we can employ a similar representation

    c_k = ∑_{i=1}^k a_{ki} e_i,    k = 1, . . . , n.
Using the above representations, d_{kl} can be expressed as

    d_{kl} = (∑_{i=1}^k a_{ki} e_i)^T (∑_{j=l}^n b_{jl} e_j)
           = ∑_{i=1}^k ∑_{j=l}^n a_{ki} b_{jl} e_i^T e_j.

By noting that e_i^T e_j = 0 for all i ≠ j, and e_i^T e_i = 1, the above expression can be simplified to

    d_{kl} = { 0,                          k < l,
             { ∑_{i=l}^k a_{ki} b_{il},    k ≥ l.

It follows that D is lower triangular. The above formula also indicates that if akk = bkk = 1 for all
1 ≤ k ≤ n, then dkk = akk bkk = 1 for all 1 ≤ k ≤ n.
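
A quick numerical check of Property 7.1 (my addition, assuming NumPy):

import numpy as np

np.random.seed(6)
n = 4
A = np.tril(np.random.randn(n, n), -1) + np.eye(n)   # unit lower triangular
B = np.tril(np.random.randn(n, n), -1) + np.eye(n)
D = A @ B
print(np.allclose(D, np.tril(D)))       # the product is lower triangular
print(np.allclose(np.diag(D), 1.0))     # and has unit diagonal entries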

1.2 Proof of Property 7.2


Recall the cofactor expansion formula for the determinant of a general A ∈ R^{n×n}: expanding along the ith row,

    det(A) = ∑_{j=1}^n a_{ij} c_{ij},    c_{ij} = (−1)^{i+j} det(A_{ij}),

for any i = 1, . . . , n, where A_{ij} is the submatrix obtained by deleting the ith row and jth column of A. Now, consider a lower triangular A. Let us choose i = 1 for the above cofactor expansion formula. Since a_{1j} = 0 for all j > 1, only the first term survives, and

    det(A) = ∑_{j=1}^n a_{1j} c_{1j} = a_{11} det(A_{11}).

By repeatedly applying the same cofactor expansion to the cofactors, we obtain det(A) = a_{11} a_{22} · · · a_{nn}.

1.3 Proof of Property 7.3


Consider the following system
Ax = ek ,
where 1 ≤ k ≤ n, and A is lower triangular. Let us examine the first k equations of the system:
    a_11 x_1 = 0,                                          (1a)
    a_21 x_1 + a_22 x_2 = 0,                               (1b)
        ⋮                                                  (1c)
    a_{k−1,1} x_1 + · · · + a_{k−1,k−1} x_{k−1} = 0,       (1d)
    a_{k1} x_1 + · · · + a_{kk} x_k = 1.                   (1e)
By applying forward substitution w.r.t. (1a)–(1e), we obtain

    x_1 = · · · = x_{k−1} = 0,    x_k = 1/a_{kk}.
Here, we assume that a_kk ≠ 0. This assumption is satisfied if A is nonsingular; cf. Property 7.2.
Now, we show that the inverse of a lower triangular A is also lower triangular. Let B be the
inverse of A. The identity AB = I can be decomposed into n linear systems:
Abk = ek , k = 1, . . . , n.
Using the previously proven result, the solution bk has [bk ]l = 0 for l = 1, . . . , k − 1. Consequently,
B takes a lower triangular structure. In addition, we have [bk ]k = 1/akk .
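
The argument above is constructive: each column of A^{-1} is obtained by forward substitution on A b_k = e_k. A minimal sketch of this column-by-column inversion (my addition; it assumes NumPy, and the function name lower_tri_inverse is only illustrative):

import numpy as np

def lower_tri_inverse(A):
    # invert a nonsingular lower triangular A, one column at a time
    n = A.shape[0]
    B = np.zeros((n, n))
    for k in range(n):
        e = np.zeros(n); e[k] = 1.0
        b = np.zeros(n)
        for i in range(n):                              # forward substitution
            b[i] = (e[i] - A[i, :i] @ b[:i]) / A[i, i]
        B[:, k] = b                                     # b_l = 0 for l < k, b_k = 1/a_kk
    return B

A = np.tril(np.random.randn(4, 4)) + 2 * np.eye(4)      # a nonsingular lower triangular A
B = lower_tri_inverse(A)
print(np.allclose(A @ B, np.eye(4)))                    # B is indeed A^{-1}
print(np.allclose(B, np.tril(B)))                       # and it is lower triangular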

2 Proof of Theorem 7.1
Let us recapitulate the theorem:

Theorem 7.1 A matrix A ∈ Rn×n has an LU decomposition if every principal submatrix A{1,...,k}
satisfies
det(A{1,...,k}) ≠ 0,
for k = 1, 2, . . . , n − 1. If the LU decomposition of A exists and A is nonsingular, then (L, U) is
unique.

From the development of Gauss elimination shown in the main slides, we see that the LU decomposition of a given A exists (or can be constructed) if the pivots a_kk^{(k−1)} are all nonzero. In the following, we show that if every principal submatrix A{1,...,k}, 1 ≤ k ≤ n − 1, is nonsingular, then a_kk^{(k−1)} is nonzero. Consider the matrix equation

    A^{(k−1)} = M_{k−1} · · · M_2 M_1 A

for any 1 ≤ k ≤ n − 1. For convenience, let W = M_{k−1} · · · M_2 M_1. By Properties 7.1 and 7.3, W is lower triangular with unit diagonal elements. Denoting by A_{i:j,k:l} the submatrix of A obtained by keeping rows i, i + 1, . . . , j and columns k, k + 1, . . . , l of A, we can expand A^{(k−1)} = WA as

    [ A^{(k−1)}_{1:k,1:k}    A^{(k−1)}_{1:k,k+1:n}   ]     [ W_{1:k,1:k}     0                ] [ A_{1:k,1:k}    A_{1:k,k+1:n}   ]
    [ A^{(k−1)}_{k+1:n,1:k}  A^{(k−1)}_{k+1:n,k+1:n} ]  =  [ W_{k+1:n,1:k}   W_{k+1:n,k+1:n}  ] [ A_{k+1:n,1:k}  A_{k+1:n,k+1:n} ].

From the above equation, we see that

    A^{(k−1)}_{1:k,1:k} = W_{1:k,1:k} A_{1:k,1:k}.

Consequently, we have

    det(A^{(k−1)}_{1:k,1:k}) = det(W_{1:k,1:k}) det(A_{1:k,1:k}).
Note that A^{(k−1)}_{1:k,1:k} is upper triangular, and W_{1:k,1:k} is lower triangular with unit diagonal elements. Thus, by Property 7.2, their determinants are

    det(A^{(k−1)}_{1:k,1:k}) = ∏_{i=1}^k a_ii^{(k−1)},    det(W_{1:k,1:k}) = 1,

respectively. It follows that if A_{1:k,1:k} is nonsingular, then a_kk^{(k−1)} ≠ 0.
We are also interested in proving the uniqueness of the LU decomposition. Suppose that A =
L1 U1 and A = L2 U2 are two LU decompositions of A. Also, assume that A is nonsingular. Then,
we claim that L1 , L2 , U1 and U2 are all nonsingular, and thus invertible: the nonsingularity of L1
and L2 follows from Property 7.2, as well as the fact that L1 and L2 are lower triangular with unit
diagonal elements; the nonsingularity of U1 and U2 can be deduced from the nonsingularity of L1
and L2 and the nonsingularity of A (how?). From A = L1 U1 = L2 U2, we can write

    L_2^{-1} L_1 = U_2 U_1^{-1}.        (2)

Note that the left-hand side of the above equation is a lower triangular matrix, while the right-hand side is an upper triangular matrix (see Properties 7.1 and 7.3). Hence, (2) can only be satisfied when L_2^{-1} L_1 and U_2 U_1^{-1} are diagonal. Also, since L_2^{-1} L_1 has unit diagonal elements, we further obtain L_2^{-1} L_1 = I, and consequently, L_1 = L_2. Moreover, from the above result, we also have U_2 U_1^{-1} = I, and then U_1 = U_2. Thus, we have shown that the LU decomposition of a nonsingular A, if it exists, has to be unique.
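
The existence argument is constructive: as long as the pivots a_kk^{(k−1)} stay nonzero, Gauss elimination produces the factors directly. Below is a minimal sketch of LU without pivoting (my addition, assuming NumPy; the function name lu_no_pivot is only illustrative, and the absence of pivoting means the leading principal submatrices must be nonsingular, as in Theorem 7.1):

import numpy as np

def lu_no_pivot(A):
    # LU decomposition A = LU by Gauss elimination without pivoting
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float)
    for k in range(n - 1):
        if U[k, k] == 0.0:
            raise ValueError("zero pivot: a leading principal submatrix is singular")
        L[k+1:, k] = U[k+1:, k] / U[k, k]                # multipliers
        U[k+1:, :] -= np.outer(L[k+1:, k], U[k, :])      # eliminate below the pivot
    return L, U

A = np.random.randn(5, 5) + 5 * np.eye(5)                # pivots nonzero in practice
L, U = lu_no_pivot(A)
print(np.allclose(L @ U, A))                             # A = LU
print(np.allclose(L, np.tril(L)), np.allclose(U, np.triu(U)))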

3 Proof of Theorem 7.2


Theorem 7.2 If A = LDMT is the LDM decomposition of a nonsingular symmetric A, then
L = M.

Let A = LDMT be the LDM decomposition of A, and consider

M−1 AM−T = M−1 LD;

(note that any lower triangular M (or L) with unit diagonal elements is invertible, as we have discussed in the proof of Theorem 7.1). Since A is symmetric, M^{-1} A M^{-T} is also symmetric. It
follows that M−1 LD is symmetric. By noting that M−1 L is lower triangular with unit diagonal
elements, the only possibility for M−1 LD to be symmetric is that M−1 LD is diagonal. Also, if A
is nonsingular, then it can be verified from A = LDMT that the diagonal matrix D is nonsingular.
As a result, M−1 L must be diagonal. Since M−1 L has unit diagonal elements, we further conclude
that M−1 L = I, or equivalently, L = M.

4 Proof of Theorem 7.3


Theorem 7.3 If A ∈ Sn is PD, then there exists a unique lower triangular G ∈ Rn×n with positive
diagonal elements such that A = GGT .

If A is PD, then any principal submatrix of A is PD, and hence nonsingular; see Lecture 4, page 15. Hence, by Theorem 7.1, the LU (or LDM) decomposition of a PD A always exists and is unique. Also, by Theorem 7.2, the LDM decomposition can be simplified to the LDL decomposition A = LDL^T, where L and D are unique. It can be verified that for a PD A, we have d_i > 0 for all i (I leave this as an exercise for you). By constructing G = LD^{1/2}, which is lower triangular with positive diagonal elements, we get A = GG^T.
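
Following the construction in the proof, one can compute the LDL^T factors and form G = LD^{1/2}. The sketch below is my addition; it assumes NumPy, the helper name ldl_decompose is only illustrative, and by the uniqueness stated in Theorem 7.3 the result should match the library Cholesky factor.

import numpy as np

def ldl_decompose(A):
    # LDL^T factors of a symmetric PD matrix: unit lower triangular L, diagonal entries d
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - (L[j, :j] ** 2) @ d[:j]
        L[j+1:, j] = (A[j+1:, j] - L[j+1:, :j] @ (d[:j] * L[j, :j])) / d[j]
    return L, d

np.random.seed(5)
M = np.random.randn(4, 4)
A = M @ M.T + np.eye(4)                        # a PD symmetric A

L, d = ldl_decompose(A)                        # d_i > 0 since A is PD
G = L @ np.diag(np.sqrt(d))                    # G = L D^{1/2}
print(np.allclose(G @ G.T, A))                 # A = G G^T
print(np.allclose(G, np.linalg.cholesky(A)))   # matches the (unique) Cholesky factor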
