Notes - Lecture 9
Matrix Decomposition: A mathematical technique used to break down a matrix into simpler, more manageable components. Decomposition can make solving matrix equations, understanding the properties of matrices, and performing calculations more efficient. Here are some common types of matrix decomposition:
Eigenvalue Decomposition
Spectral Decomposition
LU Decomposition
Cholesky Decomposition
QR Decomposition
Singular Value Decomposition
Matrix Decomposition
Numerical Stability: Decomposition methods can enhance numerical stability, reducing the impact of rounding errors and improving the accuracy of results, especially in systems with ill-conditioned matrices. In iterative methods, decompositions can be used to precondition matrices, improving convergence rates and stability.
Solving Linear Systems: LU decomposition is useful for solving systems of linear equations, particularly when solving multiple systems with the same coefficient matrix but different right-hand sides. QR decomposition helps in solving least squares problems, where the goal is to find the best-fit solution when the system is overdetermined (more equations than unknowns).
Spectral Analysis: Decompositions such as Eigenvalue Decomposition and Singular Value
Decomposition (SVD) are critical for understanding the properties of matrices, including stability,
oscillations, and pattern recognition.
Dimensionality Reduction: SVD is used in techniques like Principal Component Analysis (PCA) to
reduce the dimensionality of data while preserving its essential features.
Data Compression: SVD and other decompositions are used in data compression techniques,
such as image and video compression, by representing data in a lower-dimensional space.
Signal Processing: Decompositions help in filtering, noise reduction, and feature extraction in
signal processing.
Computational Complexity
Computational Complexity or Cost of Computation refers to the amount of computational
resources (such as operations or time) required to perform a given numerical task or algorithm,
relative to the size of the input data (such as matrix size). This measure helps evaluate the
efficiency and scalability of algorithms used for solving linear algebra problems.
Big-O notation: Used to express the upper bound of the computational complexity, showing how the number of operations grows relative to the input size. For example, $O(n^3)$ indicates that the number of operations grows cubically with the matrix size $n$.
LU Decomposition
Let’s look at this method for a $3 \times 3$ matrix:
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad \text{with} \quad L = \begin{pmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{pmatrix}, \quad \text{and} \quad U = \begin{pmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix}.$$
We get
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{pmatrix} \begin{pmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix}$$
$$\Rightarrow \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} l_{11} u_{11} & l_{11} u_{12} & l_{11} u_{13} \\ l_{21} u_{11} & l_{21} u_{12} + l_{22} u_{22} & l_{21} u_{13} + l_{22} u_{23} \\ l_{31} u_{11} & l_{31} u_{12} + l_{32} u_{22} & l_{31} u_{13} + l_{32} u_{23} + l_{33} u_{33} \end{pmatrix}$$
Here, with $l_{11} = l_{22} = l_{33} = 1$ and $A = \begin{pmatrix} 2 & -3 & 3 \\ 6 & -8 & 7 \\ -2 & 6 & -1 \end{pmatrix}$, we can solve these equations sequentially to obtain: $u_{11} = 2$, $u_{12} = -3$, $u_{13} = 3$, $l_{21} = \frac{6}{u_{11}} = 3$, $u_{22} = -8 - l_{21} u_{12} = 1$, $u_{23} = 7 - l_{21} u_{13} = -2$, $l_{31} = \frac{-2}{u_{11}} = -1$, $l_{32} = \frac{6 - l_{31} u_{12}}{u_{22}} = 3$, $u_{33} = -1 - l_{31} u_{13} - l_{32} u_{23} = 8$.
LU Decomposition
Therefore, $L = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ -1 & 3 & 1 \end{pmatrix}$ and $U = \begin{pmatrix} 2 & -3 & 3 \\ 0 & 1 & -2 \\ 0 & 0 & 8 \end{pmatrix}$.
Thus, once a matrix is decomposed into $L$ and $U$, solving the system $Ax = b$ becomes a simple matter of forward substitution (solve $Ly = b$) followed by backward substitution (solve $Ux = y$), which is computationally efficient.
• Cost of computation for LU decomposition: LU decomposition factors a matrix into lower and upper triangular matrices, typically requiring $O(n^3)$ operations. Once the decomposition is done, solving linear systems involves $O(n^2)$ operations.
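As a concrete illustration, here is a short Python sketch (assuming NumPy and SciPy are available) that factors the matrix once and then reuses the factorization for several right-hand sides:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# The 3x3 matrix from the worked example above.
A = np.array([[2., -3., 3.],
              [6., -8., 7.],
              [-2., 6., -1.]])

# Factor once: O(n^3). lu_factor returns the packed LU factors
# and pivot indices (SciPy applies partial pivoting).
lu, piv = lu_factor(A)

# Reuse the factorization for multiple right-hand sides: O(n^2) each.
for b in (np.array([1., 2., 3.]), np.array([0., 1., 0.])):
    x = lu_solve((lu, piv), b)
    print(x, np.allclose(A @ x, b))  # solution and residual check
```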
QR Decomposition
• Theorem: Suppose $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ and the columns of $A$ are linearly independent. Then, $A$ can be factored as $A = QR$, where $Q$ is an $m \times n$ matrix with orthonormal columns, and $R$ is an invertible $n \times n$ upper triangular matrix.
Proof: The proof will involve explicitly constructing $Q$ and $R$ and demonstrating that their product equals $A$. Assume the columns of $A$ are $a_1, a_2, \ldots, a_n$. We apply the Gram-Schmidt orthogonalization process to these vectors to obtain a set of orthonormal vectors $\hat{q}_1, \hat{q}_2, \ldots, \hat{q}_n$. We then define $Q$ as the $m \times n$ matrix with columns $\hat{q}_1, \hat{q}_2, \ldots, \hat{q}_n$, resulting in a matrix $Q$ with orthonormal columns.
We can represent $A$ and $Q$ as follows:
$$A = [a_1, a_2, \ldots, a_n], \qquad Q = [\hat{q}_1, \hat{q}_2, \ldots, \hat{q}_n].$$
Since $a_j$ lies within $\mathrm{span}\{a_1, a_2, \ldots, a_n\}$ and $\mathrm{span}\{a_1, a_2, \ldots, a_n\} = \mathrm{span}\{\hat{q}_1, \hat{q}_2, \ldots, \hat{q}_n\}$, we can express each $a_j$ as a linear combination of these orthonormal vectors:
$$a_1 = (\hat{q}_1 \cdot a_1)\hat{q}_1 + (\hat{q}_2 \cdot a_1)\hat{q}_2 + \cdots + (\hat{q}_n \cdot a_1)\hat{q}_n$$
$$a_2 = (\hat{q}_1 \cdot a_2)\hat{q}_1 + (\hat{q}_2 \cdot a_2)\hat{q}_2 + \cdots + (\hat{q}_n \cdot a_2)\hat{q}_n$$
$$\vdots$$
$$a_n = (\hat{q}_1 \cdot a_n)\hat{q}_1 + (\hat{q}_2 \cdot a_n)\hat{q}_2 + \cdots + (\hat{q}_n \cdot a_n)\hat{q}_n$$
QR Decomposition
Next, we define $R$ as the $n \times n$ matrix
$$R = \begin{pmatrix} \hat{q}_1 \cdot a_1 & \hat{q}_1 \cdot a_2 & \cdots & \hat{q}_1 \cdot a_n \\ \hat{q}_2 \cdot a_1 & \hat{q}_2 \cdot a_2 & \cdots & \hat{q}_2 \cdot a_n \\ \vdots & \vdots & \ddots & \vdots \\ \hat{q}_n \cdot a_1 & \hat{q}_n \cdot a_2 & \cdots & \hat{q}_n \cdot a_n \end{pmatrix}.$$
QR Decomposition
Define the projection of $a$ onto $q$ as
$$\mathrm{proj}_{q}\, a = \left(a \cdot \frac{q}{\|q\|}\right)\frac{q}{\|q\|} = \frac{q \cdot a}{q \cdot q}\, q = (\hat{q} \cdot a)\,\hat{q},$$
where $\hat{q} = \frac{q}{\|q\|} = \frac{q}{\sqrt{q \cdot q}}$ is the unit vector along $q$.
Now, we will apply the Gram-Schmidt orthogonalization to the column vectors of $A$:
$$q_1 = a_1, \qquad \hat{q}_1 = \frac{q_1}{\|q_1\|}$$
$$q_2 = a_2 - \mathrm{proj}_{q_1} a_2, \qquad \hat{q}_2 = \frac{q_2}{\|q_2\|}$$
$$q_3 = a_3 - \mathrm{proj}_{q_1} a_3 - \mathrm{proj}_{q_2} a_3, \qquad \hat{q}_3 = \frac{q_3}{\|q_3\|}$$
$$\vdots$$
$$q_k = a_k - \sum_{j=1}^{k-1} \mathrm{proj}_{q_j} a_k, \qquad \hat{q}_k = \frac{q_k}{\|q_k\|}$$
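The recursion above translates directly into code. Below is a minimal classical Gram-Schmidt QR sketch in Python (NumPy assumed); in floating point one would normally prefer the modified Gram-Schmidt variant or a library routine for better numerical stability:

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt: A (m x n, independent columns) -> Q, R."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].copy()
        for j in range(k):
            R[j, k] = Q[:, j] @ A[:, k]   # entry q_j . a_k of R
            v -= R[j, k] * Q[:, j]        # subtract the projection onto q_j
        R[k, k] = np.linalg.norm(v)       # ||q_k||, positive for full rank
        Q[:, k] = v / R[k, k]
    return Q, R

A = np.array([[2., 1., 3.], [-1., 0., 7.], [0., -1., -1.]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))  # True True
```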
QR Decomposition
Let’s look at the elements below the main diagonal of
$$R = \begin{pmatrix} \hat{q}_1 \cdot a_1 & \hat{q}_1 \cdot a_2 & \hat{q}_1 \cdot a_3 & \cdots & \hat{q}_1 \cdot a_n \\ \hat{q}_2 \cdot a_1 & \hat{q}_2 \cdot a_2 & \hat{q}_2 \cdot a_3 & \cdots & \hat{q}_2 \cdot a_n \\ \hat{q}_3 \cdot a_1 & \hat{q}_3 \cdot a_2 & \hat{q}_3 \cdot a_3 & \cdots & \hat{q}_3 \cdot a_n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \hat{q}_n \cdot a_1 & \hat{q}_n \cdot a_2 & \hat{q}_n \cdot a_3 & \cdots & \hat{q}_n \cdot a_n \end{pmatrix}.$$
First, $\hat{q}_2 \cdot a_1 = \hat{q}_3 \cdot a_1 = \cdots = \hat{q}_n \cdot a_1 = 0$, since $a_1 = q_1 = \|q_1\|\hat{q}_1$ and $\{\hat{q}_1, \hat{q}_2, \ldots, \hat{q}_n\}$ is an orthonormal set.
Similarly, $\hat{q}_3 \cdot a_2 = \hat{q}_4 \cdot a_2 = \cdots = \hat{q}_n \cdot a_2 = 0$. Let’s show this in general: $a_k$ can be written as a linear combination of $\{\hat{q}_1, \hat{q}_2, \ldots, \hat{q}_k\}$, and thus $a_k$ is orthogonal to $\{\hat{q}_{k+1}, \hat{q}_{k+2}, \ldots, \hat{q}_n\}$. Hence the elements in the $k$-th column of $R$ below the main diagonal satisfy $\hat{q}_{k+1} \cdot a_k = \hat{q}_{k+2} \cdot a_k = \cdots = \hat{q}_n \cdot a_k = 0$.
QR Decomposition
This implies that all inner products below the main diagonal of $R$ must be zero, as they are of the form $\hat{q}_i \cdot a_j$ with $i > j$. This makes $R$ an upper triangular matrix of the form
$$R = \begin{pmatrix} \hat{q}_1 \cdot a_1 & \hat{q}_1 \cdot a_2 & \cdots & \hat{q}_1 \cdot a_n \\ 0 & \hat{q}_2 \cdot a_2 & \cdots & \hat{q}_2 \cdot a_n \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat{q}_n \cdot a_n \end{pmatrix}.$$
A triangular matrix is invertible if and only if all the entries on its main diagonal, $\hat{q}_k \cdot a_k$, are nonzero. Let’s look at the expression for $\hat{q}_k$ from the Gram-Schmidt orthogonalization process, rearranged as $a_k = \sum_{j=1}^{k-1} (\hat{q}_j \cdot a_k)\,\hat{q}_j + \|q_k\|\,\hat{q}_k$. Take the dot product with $\hat{q}_k$ on both sides: $a_k \cdot \hat{q}_k = \sum_{j=1}^{k-1} (\hat{q}_j \cdot a_k)(\hat{q}_j \cdot \hat{q}_k) + \|q_k\|(\hat{q}_k \cdot \hat{q}_k)$. This simplifies to $a_k \cdot \hat{q}_k = \|q_k\| > 0$, since $\{q_1, q_2, \ldots, q_n\}$ is a linearly independent set of orthogonal vectors (so $q_k \ne \mathbf{0}$). Hence, $R$ is invertible.
QR Decomposition
• Example: Determine the QR decomposition of the matrix
$$A = \begin{pmatrix} 2 & 1 & 3 \\ -1 & 0 & 7 \\ 0 & -1 & -1 \end{pmatrix}.$$
The columns of $A$ are
$$a_1 = \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix}, \qquad a_2 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \qquad a_3 = \begin{pmatrix} 3 \\ 7 \\ -1 \end{pmatrix}.$$
Verify that $\{a_1, a_2, a_3\}$ are linearly independent. After that, perform Gram-Schmidt orthogonalization to obtain the orthonormal vectors:
$$\hat{q}_1 = \begin{pmatrix} 2/\sqrt{5} \\ -1/\sqrt{5} \\ 0 \end{pmatrix}, \qquad \hat{q}_2 = \begin{pmatrix} 1/\sqrt{30} \\ 2/\sqrt{30} \\ -5/\sqrt{30} \end{pmatrix}, \qquad \hat{q}_3 = \begin{pmatrix} 1/\sqrt{6} \\ 2/\sqrt{6} \\ 1/\sqrt{6} \end{pmatrix}$$
Form $Q = [\hat{q}_1 \ \hat{q}_2 \ \hat{q}_3]$ as
$$Q = \begin{pmatrix} 2/\sqrt{5} & 1/\sqrt{30} & 1/\sqrt{6} \\ -1/\sqrt{5} & 2/\sqrt{30} & 2/\sqrt{6} \\ 0 & -5/\sqrt{30} & 1/\sqrt{6} \end{pmatrix}$$
QR Decomposition
Let’s form the $R$ matrix:
$$R = \begin{pmatrix} \hat{q}_1 \cdot a_1 & \hat{q}_1 \cdot a_2 & \hat{q}_1 \cdot a_3 \\ 0 & \hat{q}_2 \cdot a_2 & \hat{q}_2 \cdot a_3 \\ 0 & 0 & \hat{q}_3 \cdot a_3 \end{pmatrix} = \begin{pmatrix} \sqrt{5} & 2/\sqrt{5} & -1/\sqrt{5} \\ 0 & 6/\sqrt{30} & 22/\sqrt{30} \\ 0 & 0 & 16/\sqrt{6} \end{pmatrix}$$
Verify that $\hat{q}_i \cdot a_j = 0$ with $i > j$ (i.e. $\hat{q}_2 \cdot a_1 = \hat{q}_3 \cdot a_1 = \hat{q}_3 \cdot a_2 = 0$).
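The same numbers can be checked with a library routine (a sketch assuming NumPy; note that np.linalg.qr may return $Q$ and $R$ with some signs flipped, so we normalize the diagonal of $R$ to be positive to match the hand computation):

```python
import numpy as np

A = np.array([[2., 1., 3.], [-1., 0., 7.], [0., -1., -1.]])
Q, R = np.linalg.qr(A)

# Flip signs so R has a positive diagonal, matching the text.
s = np.sign(np.diag(R))
Q, R = Q * s, s[:, None] * R

print(np.round(np.diag(R), 4))  # ~[2.2361 1.0954 6.532] = sqrt(5), 6/sqrt(30), 16/sqrt(6)
print(np.allclose(Q @ R, A))    # True
```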
QR Decomposition
• Points To Note:
If $A \in \mathbb{R}^{n \times n}$ has $n$ linearly independent columns, then $Q \in \mathbb{R}^{n \times n}$ is an orthogonal matrix (i.e. $Q^T Q = Q Q^T = I_n$).
If $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ has $n$ linearly independent columns, then $Q \in \mathbb{R}^{m \times n}$ is a matrix with orthonormal columns (i.e. $Q^T Q = I_n$ but $Q Q^T \ne I_m$).
Cost of computation: the Gram-Schmidt process requires $O(mn^2)$ operations.
• Reduced QR decomposition: Any $m \times n$ matrix $A = [a_1, a_2, \ldots, a_n]$ (regardless of whether its columns are linearly independent or dependent), where $m \ge n$, can be factored as
$$A = \hat{Q}\hat{R},$$
where $\hat{Q}$ is an $m \times n$ matrix with orthonormal columns, and $\hat{R}$ is an $n \times n$ upper triangular matrix. This factorization is known as the reduced QR decomposition.
When $m > n$, $\hat{Q}^T \hat{Q} = I_n$ but $\hat{Q}\hat{Q}^T \ne I_m$. Thus, it is sometimes useful to have a full QR decomposition, in which $Q$ becomes an orthogonal matrix.
QR Decomposition
• Full QR decomposition: Any $m \times n$ matrix $A = [a_1, a_2, \ldots, a_n]$ (regardless of whether its columns are linearly independent or dependent), where $m \ge n$, can be factored as $A = QR$, where $Q$ is obtained by appending to $\hat{Q}$ an additional $m - n$ orthonormal columns (otherwise arbitrary) to form an $m \times m$ orthogonal matrix. Similarly, to adjust for the last $m - n$ columns of $Q$, we append rows of zeros to $\hat{R}$, resulting in an $m \times n$ upper triangular matrix $R$ given by
$$R = \begin{pmatrix} \hat{R} \\ \mathbf{0} \end{pmatrix}.$$
When $A$ has full rank, meaning its columns are linearly independent, the matrix $\hat{R}$ in the reduced case also has linearly independent columns and is nonsingular. This ensures that the diagonal elements of $\hat{R}$ are nonzero. When, in addition, we require the diagonal elements of $\hat{R}$ to be positive, the reduced QR decomposition becomes unique. In contrast, the full QR decomposition is generally not unique because the last $m - n$ columns of $Q$ can be any orthonormal completion.
QR factorization is generally defined for tall-and-skinny matrices, meaning matrices with more rows than columns. For a matrix $A \in \mathbb{R}^{m \times n}$ with $m < n$, attempting a QR factorization results in an $R$ that is not upper triangular.
QR Decomposition
[Figure: block diagrams of the reduced QR decomposition ($A \in \mathbb{R}^{m \times n}$ as $\hat{Q} \in \mathbb{R}^{m \times n}$ times $\hat{R} \in \mathbb{R}^{n \times n}$) and the full QR decomposition. Entries marked in white are zero, while entries marked in yellow are not necessarily zero.]
QR Decomposition
• Theorem: If $A$ has linearly independent columns, then the normal equation for the system $Ax = b$ can be expressed as $Rx = Q^T b$.
Proof: In the least squares approximation, the normal system for $Ax = b$ is $A^T A x = A^T b$.
Let’s decompose $A$ as $A = QR$ and substitute in the normal equation:
$$A^T A x = A^T b$$
$$(QR)^T QR x = (QR)^T b$$
$$R^T Q^T Q R x = R^T Q^T b$$
$$R^T R x = R^T Q^T b \quad (\text{since } Q^T Q = I)$$
Since $A$ has linearly independent columns, $R$ is invertible and thus $R^T$ is also invertible:
$$(R^T)^{-1} R^T R x = (R^T)^{-1} R^T Q^T b$$
$$R x = Q^T b$$
Since $R$ is an upper triangular matrix, we can easily obtain the solution vector $x$ through straightforward back substitution.
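A minimal Python sketch of this recipe (NumPy and SciPy assumed; lstsq_qr is an illustrative helper): factor once, then solve the triangular system $Rx = Q^T b$ by back substitution.

```python
import numpy as np
from scipy.linalg import solve_triangular

def lstsq_qr(A, b):
    """Least squares via reduced QR: solve R x = Q^T b."""
    Q, R = np.linalg.qr(A)               # reduced QR for m x n, m >= n
    return solve_triangular(R, Q.T @ b)  # back substitution (R upper triangular)

A = np.array([[2., -1.], [1., -5.], [-3., 1.]])
b = np.array([-4., 2., 5.])
print(np.allclose(lstsq_qr(A, b), np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```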
QR Decomposition
• Example: Obtain the least squares solution for the following system using QR decomposition:
$$\begin{pmatrix} 2 & -1 & 1 \\ 1 & -5 & 2 \\ -3 & 1 & -4 \\ 1 & -1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} -4 \\ 2 \\ 5 \\ -1 \end{pmatrix}$$
The matrix $A$ has the following linearly independent columns:
$$a_1 = \begin{pmatrix} 2 \\ 1 \\ -3 \\ 1 \end{pmatrix}, \qquad a_2 = \begin{pmatrix} -1 \\ -5 \\ 1 \\ -1 \end{pmatrix}, \qquad a_3 = \begin{pmatrix} 1 \\ 2 \\ -4 \\ 1 \end{pmatrix}$$
Apply the Gram-Schmidt orthogonalization: $q_1 = a_1 = (2, 1, -3, 1)^T$, with $a_2 \cdot q_1 = -11$ and $\|q_1\|^2 = 15$. Then
$$q_2 = a_2 - \frac{a_2 \cdot q_1}{\|q_1\|^2}\, q_1 = \begin{pmatrix} 7/15 \\ -64/15 \\ -6/5 \\ -4/15 \end{pmatrix}, \qquad a_3 \cdot q_1 = 17$$
QR Decomposition
$$a_3 \cdot q_2 = -\frac{53}{15}, \qquad \|q_2\|^2 = \frac{299}{15}$$
$$q_3 = a_3 - \frac{a_3 \cdot q_1}{\|q_1\|^2}\, q_1 - \frac{a_3 \cdot q_2}{\|q_2\|^2}\, q_2 = \begin{pmatrix} -354/299 \\ 33/299 \\ -243/299 \\ -54/299 \end{pmatrix}$$
The orthonormal vectors are
$$\hat{q}_1 = \frac{q_1}{\|q_1\|}, \qquad \hat{q}_2 = \frac{q_2}{\|q_2\|}, \qquad \hat{q}_3 = \frac{q_3}{\|q_3\|}.$$
We can obtain $Q = [\hat{q}_1 \ \hat{q}_2 \ \hat{q}_3]$ and
$$R = \begin{pmatrix} \hat{q}_1 \cdot a_1 & \hat{q}_1 \cdot a_2 & \hat{q}_1 \cdot a_3 \\ 0 & \hat{q}_2 \cdot a_2 & \hat{q}_2 \cdot a_3 \\ 0 & 0 & \hat{q}_3 \cdot a_3 \end{pmatrix}.$$
Substitute the values into $Rx = Q^T b$ to find the final solution: $x_1 = -\frac{18}{7}$, $x_2 = -\frac{151}{210}$, $x_3 = \frac{107}{210}$.
This method is particularly useful for computing least squares solutions for various systems that share the same $A$, as you only need to perform the decomposition $A = QR$ once.
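These fractions are easy to confirm numerically (a sketch reusing the illustrative lstsq_qr helper defined after the previous theorem):

```python
import numpy as np

A = np.array([[2., -1., 1.], [1., -5., 2.], [-3., 1., -4.], [1., -1., 1.]])
b = np.array([-4., 2., 5., -1.])
x = lstsq_qr(A, b)  # helper from the earlier sketch
print(np.allclose(x, [-18/7, -151/210, 107/210]))  # True
```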
QR Decomposition
• Points To Note:
QR decomposition is highly effective for solving overdetermined systems (e.g. least-squares problems). Remember that solving linear least-squares problems using the normal equations involves solving a system with the matrix $A^T A$. However, directly using the normal equations can be problematic because $\kappa(A^T A) = \kappa(A)^2$, where $\kappa(A)$ denotes the condition number of $A$. The QR approach mitigates the condition-number-squaring issue and is significantly more numerically stable.
QR decomposition can also be used to solve a square system $Ax = b$. However, it typically requires approximately twice as many operations as Gaussian elimination, making it less commonly used for this purpose.
QR Decomposition of a Rank-Deficient Matrix: Let $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ and $\mathrm{rank}(A) = r \le n$. The full QR factorization of $A$ is obtained as follows. First, construct $\hat{Q} \in \mathbb{R}^{m \times r}$ and $\hat{R} \in \mathbb{R}^{r \times n}$ as described previously and denote them $Q_1 = \hat{Q}$ and $R_1 = \hat{R}$. Let $\{\hat{q}_1, \hat{q}_2, \ldots, \hat{q}_r\}$ be the columns of $Q_1$. Extend this set to form an orthonormal basis $\{\hat{q}_1, \hat{q}_2, \ldots, \hat{q}_r, \hat{q}_{r+1}, \ldots, \hat{q}_m\}$ for $\mathbb{R}^m$, and let $Q_2$ be the matrix with columns $\{\hat{q}_{r+1}, \hat{q}_{r+2}, \ldots, \hat{q}_m\}$. The full QR factorization of $A$ is then written as
$$A = QR = [Q_1 \ Q_2] \begin{pmatrix} \hat{R} \\ \mathbf{0} \end{pmatrix},$$
where $Q \in \mathbb{R}^{m \times m}$ is the block matrix composed of $Q_1$ and $Q_2$, and $R \in \mathbb{R}^{m \times n}$ is the block matrix composed of $\hat{R}$ and $\mathbf{0}$.
Positive Definite/Semidefinite Matrices
A matrix $A \in \mathbb{R}^{n \times n}$ is positive definite if $x^T A x > 0$ for all nonzero $x \in \mathbb{R}^n$. A matrix $A \in \mathbb{R}^{n \times n}$ is positive semidefinite if $x^T A x \ge 0$ for all $x \in \mathbb{R}^n$. Note that $x^T A x$ is referred to as the quadratic form associated with $A$.
We will demonstrate that a symmetric matrix $A$ is positive definite if and only if all its eigenvalues are positive. Similarly, a symmetric matrix $A$ is positive semidefinite if and only if all its eigenvalues are non-negative. Thus, symmetric matrices with positive eigenvalues are positive definite matrices, while symmetric matrices with nonnegative eigenvalues are positive semidefinite matrices.
• Theorem: A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is positive definite if and only if $A$ has only positive eigenvalues. And a symmetric matrix $A \in \mathbb{R}^{n \times n}$ is positive semidefinite if and only if $A$ has only nonnegative eigenvalues.
Proof: First, we assume that the matrix is positive definite and show that its eigenvalues are positive. Let $(\lambda_j, \hat{e}_j)$ represent an eigenpair of the symmetric positive definite matrix $A$, where $\|\hat{e}_j\|^2 = \hat{e}_j^T \hat{e}_j = 1$.
Since $A$ is symmetric, the eigenvalue $\lambda_j$ must be real. We have $A\hat{e}_j = \lambda_j \hat{e}_j$. Thus,
$$\hat{e}_j^T A \hat{e}_j = \hat{e}_j^T \lambda_j \hat{e}_j = \lambda_j \hat{e}_j^T \hat{e}_j = \lambda_j \|\hat{e}_j\|^2 = \lambda_j.$$
Positive Definite/Semidefinite Matrices
For a positive definite matrix $A$, $x^T A x > 0$ for all nonzero $x$. Therefore, for the eigenvector $\hat{e}_j$ (which is nonzero by definition), we have $\hat{e}_j^T A \hat{e}_j > 0$, which implies $\lambda_j > 0$.
If $A$ is positive semidefinite, then $x^T A x \ge 0$ for all $x$. Therefore, for the eigenvector $\hat{e}_j$, we have $\hat{e}_j^T A \hat{e}_j \ge 0$, which implies $\lambda_j \ge 0$.
Now, we assume the eigenvalues are positive and show that this assumption leads to a positive definite matrix. Recall the spectral decomposition for symmetric matrices: $A = Q D Q^T$. Take $x$ to be any nonzero vector and suppose $y = Q^T x$ (i.e. $y^T = x^T Q$). Note that $y \ne \mathbf{0}$, as $Q$ is an orthogonal matrix. Now, evaluate
$$x^T A x = x^T Q D Q^T x = (x^T Q) D (Q^T x) = y^T D y = \sum_{i=1}^{n} \lambda_i y_i^2 > 0 \quad (\text{as each } \lambda_i > 0 \text{ and } y \ne \mathbf{0}).$$
If instead the eigenvalues are only nonnegative, the same computation gives $x^T A x = \sum_{i=1}^{n} \lambda_i y_i^2 \ge 0$, which means $A$ is positive semidefinite.
Positive Definite/Semidefinite Matrices
• Examples:
The matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ is positive definite, as it has positive eigenvalues: its spectrum is $\Lambda(A) = \{3, 1\}$.
The matrix $A = \begin{pmatrix} 4 & 2 \\ 2 & 1 \end{pmatrix}$ is positive semidefinite, as it has non-negative eigenvalues: its spectrum is $\Lambda(A) = \{5, 0\}$ (the trace is 5 and the determinant is 0).
The matrix $A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$ is neither positive definite nor positive semidefinite, as it has a negative eigenvalue: its spectrum is $\Lambda(A) = \{3, -1\}$.
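These classifications are easy to check numerically; the sketch below (NumPy assumed, with a hypothetical classify helper) tests the signs of the eigenvalues returned by the symmetric eigensolver:

```python
import numpy as np

def classify(A, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)  # real eigenvalues, ascending order
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semidefinite"
    return "indefinite"

for A in (np.array([[2., 1.], [1., 2.]]),
          np.array([[4., 2.], [2., 1.]]),
          np.array([[1., 2.], [2., 1.]])):
    print(np.linalg.eigvalsh(A), classify(A))
```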
Positive Definite/Semidefinite Matrices
• Theorem: Consider $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ and $\mathrm{rank}(A) = n$. The symmetric matrix $A^T A$ is positive definite.
Proof: Let’s evaluate the following: $x^T A^T A x = (Ax)^T(Ax) = \|Ax\|^2 \ge 0$.
This shows that $x^T A^T A x$ is always non-negative, as it represents the squared norm of $Ax$.
Given that $\mathrm{rank}(A) = n$, we observe that the dimension of the null space of $A$ is $\dim[\mathrm{N}(A)] = n - \mathrm{rank}(A) = 0$.
Since the null space of $A$ is trivial, $Ax \ne \mathbf{0}$ for any nonzero vector $x$. Therefore,
$$x^T A^T A x = \|Ax\|^2 > 0.$$
Positive Definite/Semidefinite Matrices
However, if $\mathrm{rank}(A) < n$, then $\dim[\mathrm{N}(A)] > 0$. In this case, there exist nonzero vectors $x$ for which $Ax = \mathbf{0}$, and hence $x^T A^T A x = 0$, so $A^T A$ is only positive semidefinite.
In the case where $m < n$, $A^T A$ is always positive semidefinite but never positive definite. This is because the null space of $A$ has a dimension greater than zero ($\dim[\mathrm{N}(A)] > 0$), meaning there exist nonzero vectors $x$ for which $x^T A^T A x = \|Ax\|^2 = 0$.
Positive definite matrices are invertible: since all eigenvalues are positive, the matrix is invertible, and its inverse is also positive definite.
Singular Value Decomposition
Let $A$ be an $m \times n$ matrix. To understand the SVD, we first need to define the singular values of $A$. Begin by examining the $n \times n$ matrix $A^T A$. Since this matrix is symmetric, its eigenvalues are guaranteed to be real numbers. Moreover, $A^T A$ must have non-negative (or positive) eigenvalues, as it is a positive semidefinite (or definite) matrix.
Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of $A^T A$, including repetitions. Arrange these eigenvalues in descending order: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$.
Singular Values: Define $\sigma_i = \sqrt{\lambda_i}$, so that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$. The numbers $\sigma_1, \sigma_2, \ldots, \sigma_n$ defined in this manner are called the singular values of $A$.
Singular Value Decomposition
• Theorem: $A^T A$, $A A^T$, and $A$ have the same rank.
Proof: Let $x \in \mathrm{N}(A)$; then $Ax = \mathbf{0} \Rightarrow A^T A x = \mathbf{0}$, i.e. $x \in \mathrm{N}(A)$ leads to $x \in \mathrm{N}(A^T A)$, therefore $\mathrm{N}(A) \subseteq \mathrm{N}(A^T A)$.
Conversely, let $y \in \mathrm{N}(A^T A)$; then $y^T A^T A y = 0 \Rightarrow (Ay)^T(Ay) = 0 \Rightarrow \|Ay\|^2 = 0$, which leads to $Ay = \mathbf{0}$, so $\mathrm{N}(A^T A) \subseteq \mathrm{N}(A)$. Hence $\mathrm{N}(A) = \mathrm{N}(A^T A)$, and by the rank-nullity theorem $\mathrm{rank}(A) = \mathrm{rank}(A^T A)$.
Using a similar argument for $A^T$: $\mathrm{rank}(A^T) = \mathrm{rank}(A A^T)$. Since $\mathrm{rank}(A) = \mathrm{rank}(A^T)$, we have
$$\mathrm{rank}(A) = \mathrm{rank}(A^T A) = \mathrm{rank}(A A^T).$$
Singular Value Decomposition
• Theorem: The rank of a symmetric matrix is determined by the number of its nonzero eigenvalues (with repetitions).
Proof: Let $B$ be a symmetric matrix (for our purposes, $B = A^T A$). The matrix $B$ can be decomposed as $B = Q D Q^{-1}$, its eigenvalue decomposition. $B$ and $D$ are similar matrices, and similar matrices have the same rank. Therefore, the rank of $B$ is equal to the rank of $D$, which is the number of nonzero rows of $D$ and hence the number of nonzero eigenvalues.
• Theorem: The number of nonzero singular values of $A$ equals the rank of $A$.
Proof: The rank of any symmetric matrix, such as $A^T A$, is equal to the number of its nonzero eigenvalues (counted with multiplicity). As the singular values of $A$ are the square roots of the eigenvalues of $A^T A$, the number of nonzero singular values of $A$ matches the rank of $A^T A$, which is also the rank of $A$.
Singular Value Decomposition
• Reduced SVD for rectangular matrices: Every real matrix $A \in \mathbb{R}^{m \times n}$ with rank $r$ can be decomposed into the form $A = U\Sigma V^T$:
$$A = \begin{pmatrix} u_1 & u_2 & \cdots & u_r \end{pmatrix} \begin{pmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \sigma_r \end{pmatrix} \begin{pmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_r^T \end{pmatrix}$$
The matrix $\Sigma \in \mathbb{R}^{r \times r}$ is a diagonal matrix of the form $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$, where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$. Here, the $\sigma_i$ represent the nonzero singular values of $A$, which are the positive square roots of the nonzero eigenvalues of both $A^T A$ and $A A^T$.
The columns of $U \in \mathbb{R}^{m \times r}$ consist of the $r$ orthonormal eigenvectors (corresponding to the $r$ nonzero eigenvalues) of $A A^T$, also known as the left singular vectors of $A$.
The columns of $V \in \mathbb{R}^{n \times r}$ (or equivalently, the rows of $V^T$) consist of the $r$ orthonormal eigenvectors (corresponding to the $r$ nonzero eigenvalues) of $A^T A$, also known as the right singular vectors of $A$.
The SVD represents the matrix $A$ as $U\Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T$, where each term is a rank-one matrix formed by the outer product of vectors. This is also referred to as the dyadic form of the SVD.
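A short sketch of the reduced (dyadic) form with NumPy, using the $3 \times 2$ matrix from the worked example later in this section:

```python
import numpy as np

A = np.array([[1., 1.], [0., 0.], [np.sqrt(2), -np.sqrt(2)]])

# full_matrices=False returns reduced factors with min(m, n) columns;
# trimming zero singular values then gives the rank-r form.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))  # numerical rank

# Dyadic form: sum of r rank-one matrices sigma_i * u_i v_i^T.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(r))
print(s[:r], np.allclose(A_rebuilt, A))  # [2. 1.4142...] True
```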
Singular Value Decomposition
• Full SVD for rectangular matrices: By adding $m - r$ (and $n - r$) orthonormal columns to $U$ (and $V$) that are orthogonal to the $r$ eigenvectors of $A A^T$ (and $A^T A$), we obtain orthogonal matrices $U \in \mathbb{R}^{m \times m}$ (and $V \in \mathbb{R}^{n \times n}$).
[Figure: block diagrams comparing the shapes of the factors in the reduced SVD and the full SVD.]
$U$ is an $m \times m$ orthogonal matrix whose columns $u_1, u_2, \ldots, u_m$ are the left singular vectors of $A$.
$\Sigma$ is an $m \times n$ matrix with the singular values $\sigma_1, \sigma_2, \ldots, \sigma_r$ on its pseudo-diagonal, and zeros elsewhere. Specifically, $\Sigma$ is an $r \times r$ diagonal matrix padded with $m - r$ additional zero rows and $n - r$ additional zero columns.
$V$ is an $n \times n$ orthogonal matrix whose columns $v_1, v_2, \ldots, v_n$ (the rows $v_1^T, v_2^T, \ldots, v_n^T$ of $V^T$) are the right singular vectors of $A$.
Singular Value Decomposition
To ensure that $V$ and $U$ are orthogonal matrices, we need to construct square matrices with orthonormal columns. This can be achieved by including all eigenvectors, both those corresponding to nonzero eigenvalues and those corresponding to zero eigenvalues.
Specifically, for $A^T A$ and $A A^T$ we can write the spectral decompositions and the respective eigenvalue-eigenvector relations as
$$A^T A = V(\Sigma^T \Sigma)V^T, \qquad A A^T = U(\Sigma \Sigma^T)U^T$$
$$A^T A v_i = \sigma_i^2 v_i, \quad i = 1, 2, \ldots, r \qquad\qquad A A^T u_i = \sigma_i^2 u_i, \quad i = 1, 2, \ldots, r$$
$$A^T A v_i = \mathbf{0}, \quad i = r+1, \ldots, n \qquad\qquad A A^T u_i = \mathbf{0}, \quad i = r+1, \ldots, m$$
Points To Note:
$A^T A \in \mathbb{R}^{n \times n}$ has rank $r$ and thus $r$ nonzero eigenvalues and $n - r$ zero eigenvalues.
$A A^T \in \mathbb{R}^{m \times m}$ has rank $r$ and thus $r$ nonzero eigenvalues and $m - r$ zero eigenvalues.
Singular Value Decomposition
Proof of SVD:
Starting from the equation $A^T A v_i = \sigma_i^2 v_i$ for $i \in \{1, 2, \ldots, r\}$, where the $v_i$ are the eigenvectors of $A^T A$ corresponding to the eigenvalues $\sigma_i^2$, we can multiply both sides by $v_i^T$:
$$v_i^T A^T A v_i = \sigma_i^2 v_i^T v_i \;\Rightarrow\; (A v_i)^T (A v_i) = \sigma_i^2 v_i^T v_i.$$
Since $v_i^T v_i = 1$ (because the $v_i$ are normalized eigenvectors), this simplifies to
$$\|A v_i\|^2 = \sigma_i^2,$$
and hence $\|A v_i\| = \sigma_i$.
Singular Value Decomposition
Multiply both sides of the equation $A^T A v_i = \sigma_i^2 v_i$ by $A$:
$$A A^T (A v_i) = \sigma_i^2 (A v_i) \;\Rightarrow\; A A^T u_i = \sigma_i^2 u_i, \quad \text{where } u_i = \frac{A v_i}{\sigma_i}.$$
Here, the $u_i$ are eigenvectors of $A A^T$ corresponding to the eigenvalues $\sigma_i^2$. Since the norm of $A v_i$ is $\sigma_i$, each $u_i$ has unit norm.
The eigenvectors $u_i$ are orthogonal to each other because, for $i \ne j$,
$$(A v_i)^T (A v_j) = v_i^T A^T A v_j = v_i^T \sigma_j^2 v_j = \sigma_j^2 v_i^T v_j = 0.$$
Singular Value Decomposition
Consider the following key equations: $A v_i = \sigma_i u_i$ for $i = 1, \ldots, r$, where the $u_i$ are the eigenvectors of $A A^T$, the $v_i$ are the eigenvectors of $A^T A$, and the $\sigma_i$ are the singular values, which are positive square roots of the nonzero eigenvalues of both $A A^T$ and $A^T A$.
Combine these $r$ equations into matrix form:
$$A V_r = U_r \Sigma_r,$$
where $V_r = [v_1 \ v_2 \ \cdots \ v_r]$ is an $n \times r$ matrix of right singular vectors, $U_r = [u_1 \ u_2 \ \cdots \ u_r]$ is an $m \times r$ matrix of left singular vectors, and $\Sigma_r$ is an $r \times r$ diagonal matrix with the singular values $\sigma_i$ on the diagonal. As $V_r$ has orthonormal columns spanning the row space of $A$, the matrix $V_r V_r^T$ is the orthogonal projector onto that space, so $A V_r V_r^T = A$. Thus, multiplying both sides of $A V_r = U_r \Sigma_r$ by $V_r^T$, we get
$$A V_r V_r^T = U_r \Sigma_r V_r^T \;\Rightarrow\; A = U_r \Sigma_r V_r^T \quad \text{(reduced SVD)}.$$
Singular Value Decomposition
As noted above, to make $V$ and $U$ orthogonal matrices we must construct square matrices with orthonormal columns by including all eigenvectors, both those corresponding to nonzero eigenvalues and those corresponding to zero eigenvalues.
For $V$, include the eigenvectors of $A^T A$: $V = [v_1 \ v_2 \ \ldots \ v_r \ v_{r+1} \ \ldots \ v_n]$, where $v_i$ for $i = r+1, \ldots, n$ are the eigenvectors corresponding to the zero eigenvalues of $A^T A$. These eigenvectors can be obtained by solving $A^T A v_i = \mathbf{0}$ for $i = r+1, \ldots, n$.
For $U$, include the eigenvectors of $A A^T$: $U = [u_1 \ u_2 \ \ldots \ u_r \ u_{r+1} \ \ldots \ u_m]$, where $u_i$ for $i = r+1, \ldots, m$ are the eigenvectors corresponding to the zero eigenvalues of $A A^T$. These eigenvectors can be obtained by solving $A A^T u_i = \mathbf{0}$ for $i = r+1, \ldots, m$.
This construction ensures that $V$ and $U$ are orthogonal matrices, as they consist of orthonormal eigenvectors of $A^T A$ and $A A^T$, respectively.
The full SVD can then be written as $A V = U \Sigma$ or $A = U \Sigma V^T$, where
$$\Sigma = \begin{pmatrix} \Sigma_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}.$$
Singular Value Decomposition
• Theorem: Four Orthonormal Bases. The singular vectors provide orthonormal bases for the four fundamental subspaces of $A$.
$$A x = \left(\sum_{j=1}^{r} \sigma_j u_j v_j^T\right) x = \sum_{j=1}^{r} \sigma_j u_j (v_j^T x) = \sum_{j=1}^{r} \sigma_j (v_j^T x)\, u_j.$$
In the last step, we have rearranged the order of the scalar $v_j^T x$ and the vector $u_j$.
We observe that $Ax$ is a weighted sum of the vectors $u_1, u_2, \ldots, u_r$. Since this holds for any $x \in \mathbb{R}^n$, we can conclude that $\mathrm{C}(A) \subseteq \mathrm{span}\{u_1, u_2, \ldots, u_r\}$.
Can we deduce the converse? Since $\mathrm{C}(A)$ is a subspace, if we can demonstrate that each of the vectors $u_1, u_2, \ldots, u_r$ is contained in $\mathrm{C}(A)$, then we will confirm that $\mathrm{span}\{u_1, u_2, \ldots, u_r\} \subseteq \mathrm{C}(A)$.
To demonstrate this, recall that $A v_i = \sigma_i u_i$ for $i = 1, \ldots, r$. Therefore, we can express $u_i$ as $u_i = \frac{1}{\sigma_i} A v_i$. Since $A v_i$ is a linear combination of the columns of $A$, it follows that $u_i$ is in the column space of $A$. Consequently, we can conclude that $\mathrm{C}(A) = \mathrm{span}\{u_1, u_2, \ldots, u_r\}$ and $\mathrm{N}(A^T) = \mathrm{span}\{u_{r+1}, u_{r+2}, \ldots, u_m\}$.
Similarly, using $A^T = \sum_{j=1}^{r} \sigma_j v_j u_j^T$, one can show that $\mathrm{C}(A^T) = \mathrm{span}\{v_1, v_2, \ldots, v_r\}$ and $\mathrm{N}(A) = \mathrm{span}\{v_{r+1}, v_{r+2}, \ldots, v_n\}$.
Singular Value Decomposition
• Example: Find the SVD of $A$, where
$$A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \\ \sqrt{2} & -\sqrt{2} \end{pmatrix}.$$
Note that $\mathrm{rank}(A) = 2 = n$. Therefore, $A$ is a full-rank matrix.
Obtain $A^T A = \begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix}$. Note that $A^T A$ is a symmetric matrix.
Step 1: For $A^T A$, we find the eigenvalues $\lambda_1 = 4$ and $\lambda_2 = 2$, with corresponding eigenvectors
$$v_1 = \begin{pmatrix} \sqrt{2}/2 \\ -\sqrt{2}/2 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix},$$
arranged in the order $\lambda_1 \ge \lambda_2$. Thus, $v_1$ and $v_2$ are the right singular vectors of $A$.
Step 2: Define the singular values as $\sigma_j = \|A v_j\| = \sqrt{\lambda_j}$ for $j = 1, \ldots, r$. Therefore, we compute $\sigma_1 = \sqrt{\lambda_1} = 2$ and $\sigma_2 = \sqrt{\lambda_2} = \sqrt{2}$.
Singular Value Decomposition
Alternatively, we could have computed the singular values directly using:
$$A v_1 = \begin{pmatrix} 1 & 1 \\ 0 & 0 \\ \sqrt{2} & -\sqrt{2} \end{pmatrix} \begin{pmatrix} \sqrt{2}/2 \\ -\sqrt{2}/2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix}, \qquad A v_2 = \begin{pmatrix} 1 & 1 \\ 0 & 0 \\ \sqrt{2} & -\sqrt{2} \end{pmatrix} \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix} = \begin{pmatrix} \sqrt{2} \\ 0 \\ 0 \end{pmatrix}.$$
From these, the singular values are: $\sigma_1 = \|A v_1\| = 2$ and $\sigma_2 = \|A v_2\| = \sqrt{2}$.
Step 3: Define $u_j = \frac{A v_j}{\sigma_j}$ for $j = 1, \ldots, r$. Using the vectors $A v_1$ and $A v_2$ computed earlier:
$$u_1 = \frac{1}{\sigma_1} A v_1 = \frac{1}{2}\begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \qquad u_2 = \frac{1}{\sigma_2} A v_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} \sqrt{2} \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.$$
Step 4: Combining, we obtain the reduced SVD $A = U_r \Sigma_r V_r^T$:
$$\begin{pmatrix} 1 & 1 \\ 0 & 0 \\ \sqrt{2} & -\sqrt{2} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & \sqrt{2} \end{pmatrix} \begin{pmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{pmatrix}$$
Singular Value Decomposition
To complete the full SVD, we need to add a unit vector $u_3$ that is orthogonal to $u_1$ and $u_2$. We can choose $u_3 = (0 \ 1 \ 0)^T$.
Finally, the matrix $A$ can be expressed in dyadic form as $A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T$:
$$\begin{pmatrix} 1 & 1 \\ 0 & 0 \\ \sqrt{2} & -\sqrt{2} \end{pmatrix} = 2\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\begin{pmatrix} \tfrac{\sqrt{2}}{2} & -\tfrac{\sqrt{2}}{2} \end{pmatrix} + \sqrt{2}\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}\begin{pmatrix} \tfrac{\sqrt{2}}{2} & \tfrac{\sqrt{2}}{2} \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ \sqrt{2} & -\sqrt{2} \end{pmatrix} + \begin{pmatrix} 1 & 1 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}$$
Singular Value Decomposition
• Low-Rank Approximation: One of the most valuable features of the SVD is its ability to provide optimal low-rank approximations of a matrix. The SVD expresses a matrix $A$ in the form
$$A = \sum_{j=1}^{r} \sigma_j u_j v_j^T,$$
where $A$ is represented as the sum of $r$ rank-1 matrices $\sigma_j u_j v_j^T$. Given that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$, one might expect that using a partial sum $\sum_{j=1}^{k} \sigma_j u_j v_j^T$, for some $k$ much smaller than $r$, will provide a good approximation to $A$. This is particularly useful when $A$ represents a low-rank phenomenon that has been distorted by noise or random sampling errors, which can artificially increase its rank.
For square, diagonalizable matrices $A = Q D Q^{-1}$, the eigenvalue decomposition also allows $A$ to be expressed as a sum of rank-1 matrices. However, the SVD offers distinct advantages for low-rank approximations: (1) the SVD is applicable to all matrices, while the eigenvalue decomposition is limited to square, diagonalizable matrices; (2) the singular values in the SVD are non-negative real numbers, ordered as $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$, providing a clear indication of how much each rank-1 matrix $\sigma_j u_j v_j^T$ contributes to the overall matrix $A$.
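A minimal rank-$k$ truncation sketch in Python (NumPy assumed; best_rank_k is an illustrative helper). By the Eckart-Young theorem, this partial sum is the best rank-$k$ approximation of $A$ in both the 2-norm and the Frobenius norm:

```python
import numpy as np

def best_rank_k(A, k):
    """Keep the k largest singular triplets: sum of sigma_j u_j v_j^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(0)
# A rank-2 signal plus small noise; the noise artificially inflates the rank.
signal = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
noisy = signal + 0.01 * rng.standard_normal(signal.shape)
print(np.linalg.norm(signal - best_rank_k(noisy, 2)))  # small reconstruction error
```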
Singular Value Decomposition
• Example: Find the eigenvalue decomposition and SVD of $A = \begin{pmatrix} 2 & 100 \\ 0 & 1 \end{pmatrix}$.
The eigenvalues of $A$ are $\lambda_1 = 2$ and $\lambda_2 = 1$.
The eigenvalue decomposition is given by
$$A = Q D Q^{-1} = \begin{pmatrix} 1 & 1 \\ 0 & -1/100 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 100 \\ 0 & -100 \end{pmatrix}$$
$$= \begin{pmatrix} 2 & 1 \\ 0 & -1/100 \end{pmatrix} \begin{pmatrix} 1 & 100 \\ 0 & -100 \end{pmatrix}$$
$$= \begin{pmatrix} 2 \\ 0 \end{pmatrix}\begin{pmatrix} 1 & 100 \end{pmatrix} + \begin{pmatrix} 1 \\ -1/100 \end{pmatrix}\begin{pmatrix} 0 & -100 \end{pmatrix}$$
$$= \begin{pmatrix} 2 & 200 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & -100 \\ 0 & 1 \end{pmatrix}$$
Let’s evaluate
$$A - \begin{pmatrix} 2 & 200 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -100 \\ 0 & 1 \end{pmatrix} \qquad \text{and} \qquad A - \begin{pmatrix} 0 & -100 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 200 \\ 0 & 0 \end{pmatrix}.$$
Note that neither of the rank-1 matrices provides a satisfactory approximation of $A$.
Singular Value Decomposition
SVD of $A$:
$$A = U\Sigma V^T = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T$$
We obtain $\sigma_1 \approx 100.025$ and $\sigma_2 \approx 0.02$. In contrast to the eigenvalue decomposition above, the leading rank-1 term $\sigma_1 u_1 v_1^T$ already approximates $A$ to within $\sigma_2 \approx 0.02$ in the 2-norm.
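A numerical check of this comparison (sketch, NumPy assumed):

```python
import numpy as np

A = np.array([[2., 100.], [0., 1.]])
U, s, Vt = np.linalg.svd(A)
print(s)  # ~ [100.025, 0.02]

# The error of the best rank-1 approximation equals sigma_2 (Eckart-Young).
A1 = s[0] * np.outer(U[:, 0], Vt[0])
print(np.linalg.norm(A - A1, 2))  # ~ 0.02
```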
Singular Value Decomposition
• Dimensionality Reduction: SVD is a valuable tool for dimensionality reduction in machine learning. It decomposes a matrix $A$ into a sum of outer products of vectors: $A = U\Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T$, where each term $\sigma_i u_i v_i^T$ represents a rank-one matrix. The vectors $u_i$ and $v_i$ provide independent directions, and $\sigma_i$ indicates their strength. SVD is particularly useful for approximating matrices. For instance, $A \approx \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \sigma_3 u_3 v_3^T$ captures the most significant features of $A$. This approximation is more effective than using terms like $A \approx \sigma_4 u_4 v_4^T + \sigma_5 u_5 v_5^T + \sigma_6 u_6 v_6^T$ because it prioritizes the largest singular values ($\sigma_1 \ge \sigma_2 \ge \sigma_3$), which represent the most important features of the data.
Example: SVD reduces the file/data size without sacrificing image quality (important features). Consider a dog’s image with $2000 \times 1500$ pixels: storing a rank-$k$ approximation requires only $k(2000 + 1500 + 1)$ numbers instead of $2000 \times 1500 = 3{,}000{,}000$ pixel values.
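The idea can be sketched in a few lines of Python (NumPy assumed; img here is a stand-in random array rather than an actual photograph):

```python
import numpy as np

# Stand-in for a 2000 x 1500 grayscale image (any 2-D array works).
img = np.random.default_rng(1).random((2000, 1500))

U, s, Vt = np.linalg.svd(img, full_matrices=False)
k = 50                               # keep only the 50 strongest features
img_k = (U[:, :k] * s[:k]) @ Vt[:k]  # rank-k compressed image

stored = k * (img.shape[0] + img.shape[1] + 1)
print(f"store {stored} numbers instead of {img.size}")  # 175050 vs 3000000
```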
Pseudoinverse: Moore-Penrose Inverse
Given a matrix $A$ of size $m \times n$, the Moore-Penrose pseudoinverse, denoted by $A^\dagger$, is the unique matrix of size $n \times m$ that satisfies the following four conditions:
$A A^\dagger A = A$: this condition ensures that applying the pseudoinverse to the original matrix and then the matrix again leaves the original matrix unchanged.
$A^\dagger A A^\dagger = A^\dagger$: similarly, this ensures that applying the matrix to the pseudoinverse and back again leaves the pseudoinverse unchanged.
$(A A^\dagger)^T = A A^\dagger$: this condition means that the product $A A^\dagger$ is symmetric.
$(A^\dagger A)^T = A^\dagger A$: this condition means that the product $A^\dagger A$ is also symmetric.
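These four conditions are easy to verify numerically (a sketch using NumPy's np.linalg.pinv on an arbitrary example matrix):

```python
import numpy as np

A = np.array([[1., -0.5], [-0.5, 1.], [0., 2.]])  # any m x n matrix
P = np.linalg.pinv(A)                             # A-dagger, n x m

print(np.allclose(A @ P @ A, A),       # condition 1
      np.allclose(P @ A @ P, P),       # condition 2
      np.allclose((A @ P).T, A @ P),   # condition 3
      np.allclose((P @ A).T, P @ A))   # condition 4
```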
Pseudoinverse: Moore-Penrose Inverse
• Key Properties:
Existence and uniqueness: The Moore-Penrose pseudoinverse always exists and is unique for any matrix, regardless of whether the matrix is square, rectangular, singular, or non-singular.
Application in solving linear systems: The pseudoinverse provides a way to solve linear systems, especially when the system is underdetermined (more unknowns than equations) or overdetermined (more equations than unknowns). For a system $Ax = b$: (1) if the system is overdetermined, the pseudoinverse gives the least squares solution; (2) if the system is underdetermined, the pseudoinverse gives the minimum norm solution.
The Moore-Penrose pseudoinverse offers a general approach for solving the linear equation system $Ax = b$, where $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$, and $b \in \mathbb{R}^m$. Moore and Penrose showed that there is a general solution for these systems, which can be given by
$$x = A^\dagger b.$$
Pseudoinverse: Moore-Penrose Inverse
Two-sided inverse: When $m = n$ and $A$ is of full rank, the Moore-Penrose pseudoinverse $A^\dagger$ simplifies to $A^{-1}$. This is also referred to as a two-sided inverse of a matrix $A$, as both $A A^{-1} = I = A^{-1} A$ hold true. This is commonly referred to as the inverse of $A$. In this context, the matrix $A$ has full rank, meaning its rank $r$ is equal to both the number of rows $m$ and the number of columns $n$.
Left inverse: When $m > n$, the solution is the one that minimizes the quantity $\|b - Ax\|$. In this scenario, there are more equations than unknowns, making it generally impossible to find a solution that satisfies all the equations. The pseudoinverse provides a solution $x$ that makes $Ax$ as close as possible (in the least-squares sense) to the desired vector $b$.
Right inverse: When $m < n$, the Moore-Penrose solution minimizes the 2-norm of $x$, i.e. $\|x\|$. In such cases, there are typically infinitely many solutions, and the Moore-Penrose solution is the one with the smallest 2-norm.
When $A$ has full rank, the Moore-Penrose pseudoinverse can be determined directly using the following method: when $m > n$, $A^\dagger = (A^T A)^{-1} A^T$ (left inverse), and when $m < n$, $A^\dagger = A^T (A A^T)^{-1}$ (right inverse). However, if $A$ is not of full rank, these formulas are not applicable. In such cases, the pseudoinverse is most effectively calculated using the SVD.
Pseudoinverse: Moore-Penrose Inverse
SVD for Pseudoinverse: Suppose we are interested in solving $Ax = b$ for a rectangular matrix $A$. Substitute the SVD $A = U\Sigma V^T$ to obtain
$$U\Sigma V^T x = b \;\Rightarrow\; U^T U \Sigma V^T x = U^T b \;\Rightarrow\; \Sigma V^T x = U^T b \quad (\text{as } U \text{ is an orthogonal matrix, } U^T U = I)$$
$$\Sigma^\dagger \Sigma V^T x = \Sigma^\dagger U^T b \;\Rightarrow\; V^T x = \Sigma^\dagger U^T b \quad (\text{as } \Sigma^\dagger \Sigma = I \text{ when } A \text{ has full column rank})$$
$$V V^T x = V \Sigma^\dagger U^T b \;\Rightarrow\; x = V\Sigma^\dagger U^T b \quad (\text{as } V \text{ is an orthogonal matrix, } V V^T = I)$$
The pseudoinverse using the SVD is given by $A^\dagger = V\Sigma^\dagger U^T$, where $\Sigma^\dagger$ is the pseudoinverse of $\Sigma$. If $\Sigma$ is diagonal with singular values $\sigma_1, \sigma_2, \ldots, \sigma_r$, then $\Sigma^\dagger$ is an $n \times m$ diagonal matrix whose first $r$ diagonal entries are $1/\sigma_1, 1/\sigma_2, \ldots, 1/\sigma_r$. Thus, to get $\Sigma^\dagger$ we first transpose $\Sigma$ and then invert the nonzero singular values (i.e. take their reciprocals) while keeping the zero singular values as zero. Note that $\Sigma\Sigma^\dagger$ is a square matrix whose first $r$ diagonal entries are 1 and all other entries are 0.
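A minimal sketch of this recipe (NumPy assumed; pinv_svd is an illustrative helper, and np.linalg.pinv itself computes the pseudoinverse this way, via the SVD with a tolerance):

```python
import numpy as np

def pinv_svd(A, tol=1e-12):
    """Pseudoinverse via the SVD: A-dagger = V Sigma-dagger U^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.zeros_like(s)
    nz = s > tol                 # invert only the nonzero singular values
    s_inv[nz] = 1.0 / s[nz]
    return (Vt.T * s_inv) @ U.T  # V Sigma-dagger U^T

A = np.array([[1., 1.], [0., 0.], [np.sqrt(2), -np.sqrt(2)]])
print(np.allclose(pinv_svd(A), np.linalg.pinv(A)))  # True
```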
Pseudoinverse: Moore-Penrose Inverse
• Example: Find the pseudoinverse of $A = \begin{pmatrix} 1 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 1 \end{pmatrix}$.
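A numerical confirmation (sketch, NumPy assumed; since this $A$ is square and invertible, its pseudoinverse coincides with its ordinary inverse):

```python
import numpy as np

A = np.array([[1., -0.5], [-0.5, 1.]])
print(np.linalg.pinv(A))  # [[4/3, 2/3], [2/3, 4/3]]
print(np.allclose(np.linalg.pinv(A), np.linalg.inv(A)))  # True
```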
Further Reading
Refer to the following chapters for more information: