Applied Mathematics

Peter J. Olver
School of Mathematics
University of Minnesota
Minneapolis, MN 55455
[email protected]
http://www.math.umn.edu/olver

Chehrzad Shakiban
Department of Mathematics
University of St. Thomas
St. Paul, MN 55105-1096
[email protected]
http://webcampus3.stthomas.edu/c9shakiban

Table of Contents
Chapter 1. Linear Algebra
1.1. Solution of Linear Systems
1.2. Matrices and Vectors
Basic Matrix Arithmetic
1.3. Gaussian Elimination - Regular Case
Elementary Matrices
The LU Factorization
Forward and Back Substitution
1.4. Pivoting and Permutations
Permutation Matrices
The Permuted LU Factorization
1.5. Matrix Inverses
Gauss-Jordan Elimination
Solving Linear Systems with the Inverse
The LDV Factorization
1.6. Transposes and Symmetric Matrices
Factorization of Symmetric Matrices
1.7. Practical Linear Algebra
Tridiagonal Matrices
Pivoting Strategies
1.8. General Linear Systems
Homogeneous Systems
1.9. Determinants

Chapter 2. Vector Spaces

2.1. Vector Spaces
2.2. Subspaces
2.3. Span and Linear Independence
Linear Independence and Dependence
2.4. Bases
2.5. The Fundamental Matrix Subspaces
Kernel and Range
The Superposition Principle

Adjoint Systems, Cokernel, and Corange
The Fundamental Theorem of Linear Algebra
2.6. Graphs and Incidence Matrices

Chapter 3. Inner Products and Norms

3.1. Inner Products
Inner Products on Function Space
3.2. Inequalities
The Cauchy-Schwarz Inequality
Orthogonal Vectors
The Triangle Inequality
3.3. Norms
Unit Vectors
Equivalence of Norms
3.4. Positive Definite Matrices
Gram Matrices
3.5. Completing the Square
The Cholesky Factorization
3.6. Complex Vector Spaces
Complex Numbers
Complex Vector Spaces and Inner Products

Chapter 4. Minimization and Least Squares Approximation

4.1. Minimization Problems
Equilibrium Mechanics
Solution of Equations
The Closest Point
4.2. Minimization of Quadratic Functions
4.3. The Closest Point
4.4. Least Squares
4.5. Data Fitting and Interpolation
Polynomial Approximation and Interpolation
Approximation and Interpolation by General Functions
Weighted Least Squares
Least Squares Approximation in Function Spaces

Chapter 5. Orthogonality
5.1. Orthogonal Bases
Computations in Orthogonal Bases
5.2. The Gram-Schmidt Process
A Modified Gram-Schmidt Process
5.3. Orthogonal Matrices
The QR Factorization
5.4. Orthogonal Polynomials
The Legendre Polynomials

Other Systems of Orthogonal Polynomials
5.5. Orthogonal Projections and Least Squares
Orthogonal Projection
Orthogonal Least Squares
Orthogonal Polynomials and Least Squares
5.6. Orthogonal Subspaces
Orthogonality of the Fundamental Matrix Subspaces
and the Fredholm Alternative

Chapter 6. Equilibrium
6.1. Springs and Masses
The Minimization Principle
6.2. Electrical Networks
The Minimization Principle and the Electrical-Mechanical Analogy
6.3. Structures in Equilibrium
Bars

Chapter 7. Linear Functions, Linear Transformations and Linear Systems

7.1. Linear Functions
Linear Operators
The Space of Linear Functions
Composition of Linear Functions
Inverses
7.2. Linear Transformations
Change of Basis
7.3. Affine Transformations and Isometries
Isometry
7.4. Linear Systems
The Superposition Principle
Inhomogeneous Systems
Superposition Principles for Inhomogeneous Systems
Complex Solutions to Real Systems
7.5. Adjoints
Self-Adjoint and Positive Definite Linear Functions
Minimization

Chapter 8. Eigenvalues
8.1. First Order Linear Systems of Ordinary Differential Equations
The Scalar Case
The Phase Plane
8.2. Eigenvalues and Eigenvectors
Basic Properties of Eigenvalues
8.3. Eigenvector Bases and Diagonalization
Diagonalization
8.4. Incomplete Matrices and the Jordan Canonical Form
8.5. Eigenvalues of Symmetric Matrices
The Spectral Theorem
Optimization Principles
8.6. Singular Values

Chapter 9. Linear Dynamical Systems

9.1. Linear Dynamical Systems
Existence and Uniqueness
Complete Systems
The General Case
9.2. Stability of Linear Systems
9.3. Two-Dimensional Systems
Distinct Real Eigenvalues
Complex Conjugate Eigenvalues
Incomplete Double Real Eigenvalue
Complete Double Real Eigenvalue
9.4. Dynamics of Structures
Stable Structures
Unstable Structures
Systems with Different Masses
Friction and Damping
9.5. Forcing and Resonance
Electrical Circuits
Forcing and Resonance in Systems
9.6. Matrix Exponentials
Inhomogeneous Linear Systems
Applications in Geometry

Chapter 10. Iteration of Linear Systems

10.1. Linear Iterative Systems
Scalar Systems
Powers of Matrices
10.2. Stability
Fixed Points
10.3. Matrix Norms
Explicit Formulae
The Gerschgorin Circle Theorem
10.4. Markov Processes
10.5. Iterative Solution of Linear Systems
The Jacobi Method
The Gauss-Seidel Method
Successive Over-Relaxation (SOR)
Conjugate Gradients
10.6. Numerical Computation of Eigenvalues

The Power Method
The QR Algorithm
Tridiagonalization

Chapter 11. Boundary Value Problems in One Dimension

11.1. Elastic Bars
11.2. The Green's Function
The Delta Function
Calculus of Generalized Functions
The Green's Function
11.3. Adjoints and Minimum Principles
Adjoints of Differential Operators
Minimum Principles
Inhomogeneous Boundary Conditions
11.4. Beams and Splines
Splines
11.5. Sturm-Liouville Boundary Value Problems
11.6. Finite Elements
Weak Solutions

Chapter 12. Fourier Series

12.1. Dynamical Equations of Continuous Media
12.2. Fourier Series
Periodic Extensions
Piecewise Continuous Functions
The Convergence Theorem
Even and Odd Functions
Complex Fourier Series
The Delta Function
12.3. Differentiation and Integration
Integration of Fourier Series
Differentiation of Fourier Series
12.4. Change of Scale
12.5. Convergence of the Fourier Series
Convergence in Vector Spaces
Uniform Convergence
Smoothness and Decay
Hilbert Space
Convergence in Norm
Completeness
Pointwise Convergence

Chapter 13. Fourier Analysis

13.1. Discrete Fourier Series and the Fast Fourier Transform
Compression and Noise Removal

The Fast Fourier Transform
13.2. Wavelets
The Haar Wavelets
Modern Wavelets
Solving the Dilation Equation
13.3. The Fourier Transform
Derivatives and Integrals
Applications to Differential Equations
Convolution
Fourier Transform on Hilbert Space
The Heisenberg Uncertainty Principle
13.4. The Laplace Transform
The Laplace Transform Calculus
Applications to Initial Value Problems
Convolution

Chapter 14. Vibration and Diffusion in One-Dimensional Media

14.1. The Diffusion and Heat Equations
The Heat Equation
Smoothing and Long Time Behavior
Inhomogeneous Boundary Conditions
The Heated Ring
The Fundamental Solution
14.2. Similarity and Symmetry Methods
The Inhomogeneous Heat Equation
The Root Cellar Problem
14.3. The Wave Equation
Forcing and Resonance
14.4. d'Alembert's Solution of the Wave Equation
Solutions on Bounded Intervals
14.5. Numerical Methods
Finite Differences
Numerical Solution Methods for the Heat Equation
Numerical Solution Methods for the Wave Equation

Chapter 15. The Laplace Equation

15.1. The Laplace Equation in the Plane
Classification of Linear Partial Differential Equations in the Plane
Characteristics
15.2. Separation of Variables
Polar Coordinates
15.3. The Green's Function
The Method of Images
15.4. Adjoints and Minimum Principles
Uniqueness
Adjoints and Boundary Conditions
Positive Definiteness and the Dirichlet Principle
15.5. Finite Elements
Finite Elements and Triangulation
The Finite Element Equations
Assembling the Elements
The Coefficient Vector and the Boundary Conditions
Inhomogeneous Boundary Conditions
Second Order Elliptic Boundary Value Problems

Chapter 16. Complex Analysis

16.1. Complex Variables
Examples of Complex Functions
16.2. Complex Differentiation
Power Series and Analyticity
16.3. Harmonic Functions
Applications to Fluid Mechanics
16.4. Conformal Mapping
Analytic Maps
Conformality
Composition and The Riemann Mapping Theorem
Annular Domains
Applications to Harmonic Functions and Laplace's Equation
Applications to Fluid Flow
Poisson's Equation and the Green's Function
16.5. Complex Integration
Lift and Circulation
16.6. Cauchy's Integral Formulae and The Calculus of Residues
Cauchy's Integral Formula
Derivatives by Integration
The Calculus of Residues
The Residue Theorem
Evaluation of Real Integrals

Chapter 17. Dynamics of Planar Media

17.1. Diffusion in Planar Media
Derivation of the Diffusion Equation
Self-Adjoint Formulation
17.2. Solution Techniques for Diffusion Equations
Qualitative Properties
Inhomogeneous Boundary Conditions and Forcing
17.3. Explicit Solutions for the Heat Equation
Heating of a Rectangle
Heating of a Disk
17.4. The Fundamental Solution

17.5. The Planar Wave Equation
Separation of Variables
17.6. Analytical Solutions of the Wave Equation
Vibration of a Rectangular Drum
Vibration of a Circular Drum
Scaling and Symmetry
17.7. Nodal Curves

Chapter 18. Partial Differential Equations in Space

18.1. The Laplace and Poisson Equations
18.2. Separation of Variables
Laplace's Equation in a Ball
18.3. The Green's Function
The Green's Function on the Entire Space
Bounded Domains and the Method of Images
18.4. The Heat Equation in Three-Dimensional Media
Heating of a Ball
The Fundamental Solution to the Heat Equation
18.5. The Wave Equation in Three-Dimensional Media
Vibrations of a Ball
18.6. Spherical Waves and Huygens' Principle
The Method of Descent

Chapter 19. Nonlinear Systems

19.1. Iteration of Functions
Scalar Functions
Quadratic Convergence
Vector-Valued Iteration
19.2. Solution of Equations and Systems
The Bisection Algorithm
Fixed Point Methods
Newton's Method
Systems of Equations
19.3. Optimization
The Objective Function
The Gradient
Critical Points
The Second Derivative Test
Minimization of Scalar Functions
Gradient Descent
Conjugate gradients

Chapter 20. Nonlinear Ordinary Differential Equations

20.1. First Order Systems of Ordinary Differential Equations
Scalar Ordinary Differential Equations

First Order Systems
Higher Order Systems
20.2. Existence, Uniqueness, and Continuous Dependence
Existence
Uniqueness
Continuous Dependence
20.3. Stability
Stability of Scalar Differential Equations
Linearization and Stability
Conservative Systems
Lyapunov's Method
20.4. Numerical Solution Methods
Euler's Method
Taylor Methods
Error Analysis
An Equivalent Integral Equation
Implicit and Predictor-Corrector Methods
Runge-Kutta Methods
Stiff Differential Equations

Chapter 21. The Calculus of Variations

21.1. Examples of Variational Problems
Minimal Curves and Geodesics
Minimal Surfaces
21.2. The Simplest Variational Problem
The First Variation and the Euler-Lagrange Equation
Curves of Shortest Length
Minimal Surface of Revolution
The Brachistochrone Problem
21.3. The Second Variation
21.4. Multi-dimensional Variational Problems
21.5. Numerical Methods for Variational Problems
Finite Elements
Nonlinear Shooting

Chapter 22. Nonlinear Partial Differential Equations

22.1. Nonlinear Waves and Shocks
A Nonlinear Wave Equation
22.2. Nonlinear Diffusion
Burgers' Equation
The Hopf-Cole Transformation
Viscosity Solutions
22.3. Dispersion and Solitons
The Korteweg-de Vries Equation
22.4. Conclusion and Bon Voyage

Appendix A. Vector Calculus in Two Dimensions

A.1. Plane Curves
A.2. Planar Domains
A.3. Vector Fields
A.4. Gradient and Curl
A.5. Integrals on Curves
Arc Length
Arc Length Integrals
Line Integrals of Vector Fields
Flux
A.6. Double Integrals
A.7. Green's Theorem

Appendix B. Vector Calculus in Three Dimensions

B.1. Dot and Cross Product
B.2. Curves
B.3. Line Integrals
Arc Length
Line Integrals of Vector Fields
B.4. Surfaces
Tangents to Surfaces
B.5. Surface Integrals
Surface Area
Flux Integrals
B.6. Volume Integrals
Change of Variables
B.7. Gradient, Divergence, and Curl
The Gradient
Divergence and Curl
Interconnections and Connectedness
B.8. The Fundamental Integration Theorems
The Fundamental Theorem for Line Integrals
Stokes' Theorem
The Divergence Theorem

Appendix C. Series
C.1. Power Series
Taylor's Theorem
C.2. Laurent Series
C.3. Special Functions
The Gamma Function
Series Solutions of Ordinary Differential Equations
Regular Points
The Airy Equation

The Legendre Equation
Regular Singular Points
Bessel's Equation

Chapter 1
Linear Algebra
The source of linear algebra is the solution of systems of linear algebraic equations.
Linear algebra is the foundation upon which almost all applied mathematics rests. This is
not to say that nonlinear equations are less important; rather, progress in the vastly more
complicated nonlinear realm is impossible without a firm grasp of the fundamentals of
linear systems. Furthermore, linear algebra underlies the numerical analysis of continuous
systems, both linear and nonlinear, which are typically modeled by differential equations.
Without a systematic development of the subject from the start, we will be ill equipped
to handle the resulting large systems of linear equations involving many (e.g., thousands
of) unknowns.
This first chapter is devoted to the systematic development of direct algorithms for
solving systems of linear algebraic equations in a finite number of variables. Our primary
focus will be the most important situation involving the same number of equations as
unknowns, although in Section 1.8 we extend our techniques to completely general linear
systems. While the former usually have a unique solution, more general systems more
typically have either no solutions, or infinitely many, and so tend to be of less direct physical
relevance. Nevertheless, the ability to confidently handle all types of linear systems is a
basic prerequisite for the subject.
The basic solution algorithm is known as Gaussian elimination, in honor of one of
the all-time mathematical greats, the nineteenth century German mathematician Carl
Friedrich Gauss. As the father of linear algebra, his name will occur repeatedly throughout
this text. Gaussian elimination is quite elementary, but remains one of the most important
techniques in applied (as well as theoretical) mathematics. Section 1.7 discusses some
practical issues and limitations in computer implementations of the Gaussian elimination
method for large systems arising in applications.
The systematic development of the subject relies on the fundamental concepts of
scalar, vector, and matrix, and we quickly review the basics of matrix arithmetic. Gaussian elimination can be reinterpreted as a matrix factorization, the (permuted) LU decomposition, which provides additional insight into the solution algorithm. Matrix inverses
and determinants are discussed in Sections 1.5 and 1.9, respectively. However, both play a
relatively minor role in practical applied mathematics, and so will not assume their more
traditional central role in this applications-oriented text.

Indirect algorithms, which are based on iteration, will be the subject of Chapter 10.

1/12/04

c 2003

Peter J. Olver

1.1. Solution of Linear Systems.

Gaussian elimination is a simple, systematic approach to the solution of systems of
linear equations. It is the workhorse of linear algebra and, as such, is of fundamental importance in applied mathematics. In this section, we review the method in the
most important case, in which there are the same number of equations as unknowns. The
general situation will be deferred until Section 1.8.
To illustrate, consider an elementary system of three linear equations
$$
\begin{aligned}
x + 2y + z &= 2, \\
2x + 6y + z &= 7, \\
x + y + 4z &= 3,
\end{aligned}
\tag{1.1}
$$

in three unknowns x, y, z. Linearity refers to the fact that the unknowns only appear to
the first power in the equations. The basic solution method is to systematically employ
the following fundamental operation:
Linear System Operation #1 : Add a multiple of one equation to another equation.
Before continuing, you should convince yourself that this operation does not change
the solutions to the system. As a result, our goal is to judiciously apply the operation
and so be led to a much simpler linear system that is easy to solve, and, moreover, has the
same solutions as the original. Any linear system that is derived from the original system
by successive application of such operations will be called an equivalent system. By the
preceding remark, equivalent linear systems have the same solutions.
The systematic feature is that we successively eliminate the variables in our equations
in order of appearance. We begin by eliminating the first variable, x, from the second
equation. To this end, we subtract twice the first equation from the second, leading to the
equivalent system
$$
\begin{aligned}
x + 2y + z &= 2, \\
2y - z &= 3, \\
x + y + 4z &= 3.
\end{aligned}
\tag{1.2}
$$

Next, we eliminate x from the third equation by subtracting the first equation from it:
$$
\begin{aligned}
x + 2y + z &= 2, \\
2y - z &= 3, \\
-y + 3z &= 1.
\end{aligned}
\tag{1.3}
$$

The equivalent system (1.3) is already simpler than the original system (1.1). Notice that
the second and third equations do not involve x (by design) and so constitute a system
of two linear equations for two unknowns. Moreover, once we have solved this subsystem
for y and z, we can substitute the answer into the first equation, and we need only solve
a single linear equation for x.

Also, there are no product terms like x y or x y z. The official definition of linearity will be
deferred until Chapter 7.


We continue on in this fashion, the next phase being the elimination of the second
variable y from the third equation by adding $\frac{1}{2}$ times the second equation to it. The result is
$$
\begin{aligned}
x + 2y + z &= 2, \\
2y - z &= 3, \\
\tfrac{5}{2}\, z &= \tfrac{5}{2},
\end{aligned}
\tag{1.4}
$$

which is the simple system we are after. It is in what is called triangular form, which means
that, while the first equation involves all three variables, the second equation only involves
the second and third variables, and the last equation only involves the last variable.
Any triangular system can be straightforwardly solved by the method of Back Substitution. As the name suggests, we work backwards, solving the last equation first, which
requires $z = 1$. We substitute this result back into the next to last equation, which becomes $2y - 1 = 3$, with solution $y = 2$. We finally substitute these two values for $y$ and
$z$ into the first equation, which becomes $x + 5 = 2$, and so the solution to the triangular
system (1.4) is
$$
x = -3, \qquad y = 2, \qquad z = 1.
\tag{1.5}
$$
Moreover, since we only used our basic operation to pass from (1.1) to the triangular
system (1.4), this is also the solution to the original system of linear equations. We note
that the system (1.1) has a unique (meaning one and only one) solution, namely (1.5).
And that, barring a few complications that can crop up from time to time, is all that
there is to the method of Gaussian elimination! It is very simple, but its importance cannot
be overemphasized. Before discussing the relevant issues, it will help to reformulate our
method in a more convenient matrix notation.
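
The elimination and back substitution steps of this section can be replayed numerically. The following is a minimal Python sketch, not part of the original text: the helper name `add_multiple` and the list-of-rows layout are illustrative assumptions, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def add_multiple(eqs, c, src, dst):
    """Linear System Operation #1: add c times equation src to equation dst."""
    eqs[dst] = [d + c * s for s, d in zip(eqs[src], eqs[dst])]

# Each row stores the coefficients of x, y, z followed by the right hand side.
eqs = [[Fraction(v) for v in row] for row in
       [[1, 2, 1, 2],
        [2, 6, 1, 7],
        [1, 1, 4, 3]]]

add_multiple(eqs, -2, 0, 1)               # eliminate x from equation 2, giving (1.2)
add_multiple(eqs, -1, 0, 2)               # eliminate x from equation 3, giving (1.3)
add_multiple(eqs, Fraction(1, 2), 1, 2)   # eliminate y from equation 3, giving (1.4)

# Back substitution on the triangular system (1.4): last equation first.
z = eqs[2][3] / eqs[2][2]
y = (eqs[1][3] - eqs[1][2] * z) / eqs[1][1]
x = (eqs[0][3] - eqs[0][1] * y - eqs[0][2] * z) / eqs[0][0]
print(x, y, z)   # -3 2 1
```

Substituting the printed values back into the three original equations is a quick way to confirm that the operations did not change the solution.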

1.2. Matrices and Vectors.

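A matrix is a rectangular array of numbers. Thus,
$$
\begin{pmatrix} 1 & 0 & 3 \\ -2 & 4 & 1 \end{pmatrix}, \qquad
\begin{pmatrix} \pi & 0 \\ e & \frac{1}{2} \\ -1 & .83 \\ \sqrt{5} & -\frac{4}{7} \end{pmatrix}, \qquad
\begin{pmatrix} .2 & -1.6 & .32 \end{pmatrix}, \qquad
\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & 3 \\ -2 & 5 \end{pmatrix},
$$
are all examples of matrices. We use the notation
$$
A = \begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{pmatrix}
\tag{1.6}
$$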

for a general matrix of size $m \times n$ (read "$m$ by $n$"), where $m$ denotes the number of rows in
$A$ and $n$ denotes the number of columns. Thus, the preceding examples of matrices have
respective sizes $2 \times 3$, $4 \times 2$, $1 \times 3$, $2 \times 1$ and $2 \times 2$. A matrix is square if $m = n$, i.e., it
has the same number of rows as columns. A column vector is an $m \times 1$ matrix, while a row
vector is a $1 \times n$ matrix. As we shall see, column vectors are by far the more important of
the two, and the term "vector" without qualification will always mean "column vector".
A $1 \times 1$ matrix, which has but a single entry, is both a row and column vector.
The number that lies in the $i$th row and the $j$th column of $A$ is called the $(i, j)$ entry
of $A$, and is denoted by $a_{ij}$. The row index always appears first and the column index
second. Two matrices are equal, $A = B$, if and only if they have the same size, and all
their entries are the same: $a_{ij} = b_{ij}$.
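A general linear system of $m$ equations in $n$ unknowns will take the form
$$
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1, \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2, \\
&\ \,\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m.
\end{aligned}
\tag{1.7}
$$
As such, it has three basic constituents: the $m \times n$ coefficient matrix $A$, with entries $a_{ij}$ as
in (1.6), the column vector
$$
\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
$$
containing the unknowns, and the column vector
$$
\mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}
$$
containing the right hand sides. For instance, in our previous example (1.1),
the coefficient matrix is
$$
A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix},
$$
the vector of unknowns is $\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$, while
$\mathbf{b} = \begin{pmatrix} 2 \\ 7 \\ 3 \end{pmatrix}$ contains the right hand sides.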

Remark : We will consistently use bold face lower case letters to denote vectors, and
ordinary capital letters to denote general matrices.
In tensor analysis, [2], a sub- and super-script notation is adopted, with $a^i_j$ denoting the
$(i, j)$ entry of the matrix $A$. This has certain advantages, but, to avoid possible confusion with
powers, we shall stick with the simpler subscript notation throughout this text.

Matrix Arithmetic
There are three basic operations in matrix arithmetic: matrix addition, scalar multiplication, and matrix multiplication. First we define addition of matrices. You are only
allowed to add two matrices of the same size, and matrix addition is performed entry by
entry. Therefore, if $A$ and $B$ are $m \times n$ matrices, their sum $C = A + B$ is the $m \times n$ matrix
whose entries are given by $c_{ij} = a_{ij} + b_{ij}$ for $i = 1, \dots, m$, $j = 1, \dots, n$. For example,
$$
\begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix} + \begin{pmatrix} 3 & -5 \\ 2 & 1 \end{pmatrix}
= \begin{pmatrix} 4 & -3 \\ 1 & 1 \end{pmatrix}.
$$
When defined, matrix addition is commutative, $A + B = B + A$, and associative, $A + (B + C) = (A + B) + C$, just like ordinary addition.
A scalar is a fancy name for an ordinary number; the term merely distinguishes it
from a vector or a matrix. For the time being, we will restrict our attention to real scalars
and matrices with real entries, but eventually complex scalars and complex matrices must
be dealt with. We will often identify a scalar $c \in \mathbb{R}$ with the $1 \times 1$ matrix $(c)$ in which
it is the sole entry. Scalar multiplication takes a scalar $c$ and an $m \times n$ matrix $A$ and
computes the $m \times n$ matrix $B = cA$ by multiplying each entry of $A$ by $c$. Thus, $b_{ij} = c\,a_{ij}$
for $i = 1, \dots, m$, $j = 1, \dots, n$. For example,
$$
3 \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 3 & 6 \\ -3 & 0 \end{pmatrix}.
$$
Basic properties of scalar multiplication are summarized at the end of this section.
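
Both operations act entry by entry, which makes them easy to express in code. The following Python sketch is an illustration only; the function names `mat_add` and `scalar_mul` and the nested-list representation of a matrix are assumptions, not notation from the text.

```python
def mat_add(A, B):
    """Entrywise sum c_ij = a_ij + b_ij; only defined for matrices of equal size."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same size")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar_mul(c, A):
    """Scalar multiple b_ij = c * a_ij."""
    return [[c * a for a in row] for row in A]

A = [[1, 2], [-1, 0]]
B = [[3, -5], [2, 1]]
print(mat_add(A, B))      # [[4, -3], [1, 1]]
print(scalar_mul(3, A))   # [[3, 6], [-3, 0]]
```

Swapping the arguments of `mat_add` gives the same result, in line with commutativity of matrix addition.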
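Finally, we define matrix multiplication. First, the product between a row vector $\mathbf{a}$
and a column vector $\mathbf{x}$ having the same number of entries is the scalar defined by the
following rule:
$$
\mathbf{a}\,\mathbf{x} = \begin{pmatrix} a_1 & a_2 & \dots & a_n \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = \sum_{k=1}^{n} a_k x_k.
\tag{1.8}
$$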

More generally, if $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix, so that the number of
columns in $A$ equals the number of rows in $B$, then the matrix product $C = AB$ is defined
as the $m \times p$ matrix whose $(i, j)$ entry equals the vector product of the $i$th row of $A$ and
the $j$th column of $B$. Therefore,
$$
c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}.
\tag{1.9}
$$

Note that our restriction on the sizes of A and B guarantees that the relevant row and
column vectors will have the same number of entries, and so their product is defined.
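
Formula (1.9) translates directly into a triple loop over i, j and k. The sketch below is a hedged Python illustration; `mat_mul` is a hypothetical helper name, the matrices are made-up examples, and no attempt is made at efficiency.

```python
def mat_mul(A, B):
    """Matrix product via (1.9): c_ij is the sum over k of a_ik * b_kj."""
    m, n, p = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 0, 3],
     [-2, 4, 1]]          # size 2 x 3
B = [[1, 3],
     [-2, 5],
     [0, 1]]              # size 3 x 2
print(mat_mul(A, B))      # [[1, 6], [-10, 15]], a 2 x 2 matrix
```

The size check mirrors the restriction in the text: the product is only formed when each row of the first factor has as many entries as each column of the second.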
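For example, the product of the coefficient matrix $A$ and vector of unknowns $\mathbf{x}$ for
our original system (1.1) is given by
$$
A\,\mathbf{x} = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} x + 2y + z \\ 2x + 6y + z \\ x + y + 4z \end{pmatrix}.
$$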

The result is a column vector whose entries reproduce the left hand sides of the original
linear system! As a result, we can rewrite the system
$$
A\,\mathbf{x} = \mathbf{b}
\tag{1.10}
$$
as an equality between two column vectors. This result is general; a linear system (1.7)
consisting of $m$ equations in $n$ unknowns can be written in the matrix form (1.10) where
$A$ is the $m \times n$ coefficient matrix (1.6), $\mathbf{x}$ is the $n \times 1$ column vector of unknowns, and $\mathbf{b}$
is the $m \times 1$ column vector containing the right hand sides. This is the reason behind the
non-evident definition of matrix multiplication. Component-wise multiplication of matrix
entries turns out to be almost completely useless in applications.
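
As a quick check of the matrix form (1.10), one can multiply the coefficient matrix of (1.1) by a candidate solution and compare with the right hand sides. This is a minimal Python sketch; `mat_vec` is an illustrative helper name rather than anything defined in the text.

```python
def mat_vec(A, x):
    """Product of an m x n matrix A with a column vector x of length n."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

A = [[1, 2, 1],
     [2, 6, 1],
     [1, 1, 4]]
b = [2, 7, 3]
x = [-3, 2, 1]               # values of x, y, z found by elimination in Section 1.1

print(mat_vec(A, x))         # [2, 7, 3]
print(mat_vec(A, x) == b)    # True, so A x = b holds
```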
Now, the bad news. Matrix multiplication is not commutative. For example, $BA$ may
not be defined even when $AB$ is. Even if both are defined, they may be different sized
matrices. For instance, the product of a row vector $\mathbf{r}$, a $1 \times n$ matrix, and a column vector
$\mathbf{c}$, an $n \times 1$ matrix, is a $1 \times 1$ matrix or scalar $s = \mathbf{r}\,\mathbf{c}$, whereas the reversed product $C = \mathbf{c}\,\mathbf{r}$
is an $n \times n$ matrix. For example,
$$
\begin{pmatrix} 1 & 2 \end{pmatrix} \begin{pmatrix} 3 \\ 0 \end{pmatrix} = 3,
\qquad \text{whereas} \qquad
\begin{pmatrix} 3 \\ 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \end{pmatrix}
= \begin{pmatrix} 3 & 6 \\ 0 & 0 \end{pmatrix}.
$$
In computing the latter product, don't forget that we multiply the rows of the first matrix
by the columns of the second. Moreover, even if the matrix products $AB$ and $BA$ have
the same size, which requires both $A$ and $B$ to be square matrices, we may still have
$AB \neq BA$. For example,
$$
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 2 \end{pmatrix}
= \begin{pmatrix} -2 & 5 \\ -4 & 11 \end{pmatrix}
\neq
\begin{pmatrix} 3 & 4 \\ 5 & 6 \end{pmatrix}
= \begin{pmatrix} 0 & 1 \\ -1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.
$$
On the other hand, matrix multiplication is associative, so $A(BC) = (AB)C$ whenever $A$
has size $m \times n$, $B$ has size $n \times p$ and $C$ has size $p \times q$; the result is a matrix of size $m \times q$.
The proof of this fact is left to the reader. Consequently, the one significant difference
between matrix algebra and ordinary algebra is that you need to be careful not to change
the order of multiplicative factors without proper justification.
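
The failure of commutativity, and the survival of associativity, are easy to observe on small examples. The Python sketch below is illustrative only; `mat_mul` is a hypothetical helper, and the third matrix `C` is an arbitrary choice made for the associativity check.

```python
def mat_mul(A, B):
    """Matrix product: entry (i, j) is row i of A times column j of B."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("sizes are not compatible")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]

r = [[1, 2]]              # 1 x 2 row vector
c = [[3], [0]]            # 2 x 1 column vector
print(mat_mul(r, c))      # [[3]]
print(mat_mul(c, r))      # [[3, 6], [0, 0]]

A = [[1, 2], [3, 4]]
B = [[0, 1], [-1, 2]]
print(mat_mul(A, B))      # [[-2, 5], [-4, 11]]
print(mat_mul(B, A))      # [[3, 4], [5, 6]]

C = [[2, 0], [1, 1]]      # arbitrary third factor
print(mat_mul(A, mat_mul(B, C)) == mat_mul(mat_mul(A, B), C))   # True
```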
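Since matrix multiplication multiplies rows times columns, one can compute the
columns in a matrix product $C = AB$ by multiplying the matrix $A$ by the individual
columns of $B$. The $k$th column of $C$ is equal to the product of $A$ with the $k$th column of
$B$. For example, the two columns of the matrix product
$$
\begin{pmatrix} 1 & -1 & 2 \\ 2 & 0 & -2 \end{pmatrix}
\begin{pmatrix} 3 & 4 \\ 0 & 2 \\ -1 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 4 \\ 8 & 6 \end{pmatrix}
$$
are obtained by multiplying the first matrix with the individual columns of the second:
$$
\begin{pmatrix} 1 & -1 & 2 \\ 2 & 0 & -2 \end{pmatrix}
\begin{pmatrix} 3 \\ 0 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 \\ 8 \end{pmatrix},
\qquad
\begin{pmatrix} 1 & -1 & 2 \\ 2 & 0 & -2 \end{pmatrix}
\begin{pmatrix} 4 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}.
$$
In general, if we use $\mathbf{b}_j$ to denote the $j$th column of $B$, then
$$
AB = A \begin{pmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \dots & \mathbf{b}_p \end{pmatrix}
= \begin{pmatrix} A\mathbf{b}_1 & A\mathbf{b}_2 & \dots & A\mathbf{b}_p \end{pmatrix}.
\tag{1.11}
$$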
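
The column-by-column description (1.11) can be sketched in Python as follows. The helper names `mat_vec`, `columns` and `from_columns` are assumptions introduced for this illustration; the matrices are those of the preceding example.

```python
def mat_vec(A, x):
    """Product of a matrix A with a single column vector x."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

def columns(B):
    """Return the columns b_1, ..., b_p of B as lists."""
    return [list(col) for col in zip(*B)]

def from_columns(cols):
    """Assemble a matrix (list of rows) from a list of its columns."""
    return [list(row) for row in zip(*cols)]

A = [[1, -1, 2],
     [2, 0, -2]]
B = [[3, 4],
     [0, 2],
     [-1, 1]]

product_columns = [mat_vec(A, b_j) for b_j in columns(B)]   # A b_1, A b_2
print(product_columns)                 # [[1, 8], [4, 6]]
print(from_columns(product_columns))   # [[1, 4], [8, 6]]
```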

There are two important special matrices. The first is the zero matrix of size $m \times n$,
denoted $O_{m \times n}$ or just $O$ if the size is clear from context. It forms the additive unit, so
$A + O = A = O + A$ for any matrix $A$ of the same size. The role of the multiplicative unit
is played by the square identity matrix
$$
I = I_n = \begin{pmatrix}
1 & 0 & 0 & \dots & 0 \\
0 & 1 & 0 & \dots & 0 \\
0 & 0 & 1 & \dots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \dots & 1
\end{pmatrix}
$$
of size $n \times n$. The entries of $I$ along the main diagonal (which runs from top left to bottom
right) are equal to 1; the off-diagonal entries are all 0. As the reader can check, if $A$ is any
$m \times n$ matrix, then $I_m A = A = A I_n$. We will sometimes write the last equation as just
$I A = A = A I$; even though the identity matrices can have different sizes, only one size is
valid for each matrix product to be defined.
The identity matrix is a particular example of a diagonal matrix. In general, a matrix
is diagonal if all its off-diagonal entries are zero: $a_{ij} = 0$ for all $i \neq j$. We will sometimes
write $D = \mathrm{diag}(c_1, \dots, c_n)$ for the $n \times n$ diagonal matrix with
diagonal entries $d_{ii} = c_i$.
Thus, $\mathrm{diag}(1, 3, 0)$ refers to the diagonal matrix
$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \end{pmatrix}$, while the $n \times n$ identity
matrix can be written as $I_n = \mathrm{diag}(1, 1, \dots, 1)$.
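
The diagonal and identity matrices are simple to construct in code. The following Python sketch is illustrative; the function names `diag`, `identity` and `mat_mul` are assumptions, and the 2 x 3 matrix `A` is an arbitrary example used to check the two-sided identity property with identity matrices of different sizes.

```python
def diag(*entries):
    """n x n diagonal matrix with the given diagonal entries."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

def identity(n):
    """n x n identity matrix, i.e. diag(1, 1, ..., 1)."""
    return diag(*([1] * n))

def mat_mul(A, B):
    """Matrix product, assuming the sizes are compatible."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

print(diag(1, 3, 0))      # [[1, 0, 0], [0, 3, 0], [0, 0, 0]]

A = [[1, 0, 3],
     [-2, 4, 1]]          # size 2 x 3
print(mat_mul(identity(2), A) == A)   # True: I_2 A = A
print(mat_mul(A, identity(3)) == A)   # True: A I_3 = A
```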
Let us conclude this section by summarizing the basic properties of matrix arithmetic.
In the following table, A, B, C are matrices, c, d scalars, O is a zero matrix, and I is an
identity matrix. The matrices are assumed to have the correct sizes so that the indicated
operations are defined.

Basic Matrix Arithmetic

Commutativity - Matrix Addition             A + B = B + A
Associativity - Matrix Addition             (A + B) + C = A + (B + C)
Zero Matrix - Matrix Addition               A + O = A = O + A
Associativity - Scalar Multiplication       c (d A) = (c d) A
Additive Inverse                            A + (-A) = O,  -A = (-1) A
Unit - Scalar Multiplication                1 A = A
Zero - Scalar Multiplication                0 A = O
Distributivity - Matrix Addition            c (A + B) = (c A) + (c B)
Distributivity - Scalar Addition            (c + d) A = (c A) + (d A)
Associativity - Matrix Multiplication       (A B) C = A (B C)
Identity Matrix                             A I = A = I A
Zero Matrix - Matrix Multiplication         A O = O = O A

1.3. Gaussian Elimination - Regular Case.

With the basic matrix arithmetic operations in hand, let us now return to our primary
task. The goal is to develop a systematic method for solving linear systems of equations.
While we could continue to work directly with the equations, matrices provide a convenient
alternative that begins by merely shortening the amount of writing, but ultimately leads
to profound insight into the solution and its structure.
We begin by replacing the system (1.7) by its matrix constituents. It is convenient to
ignore the vector of unknowns, and form the augmented matrix
$$
M = \bigl( A \mid \mathbf{b} \bigr) = \left( \begin{array}{cccc|c}
a_{11} & a_{12} & \dots & a_{1n} & b_1 \\
a_{21} & a_{22} & \dots & a_{2n} & b_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn} & b_m
\end{array} \right)
\tag{1.12}
$$
which is an $m \times (n + 1)$ matrix obtained by tacking the right hand side vector onto the
original coefficient matrix. The extra vertical line is included just to remind us that the
last column of this matrix is special. For example, the augmented matrix for the system
(1.1), i.e.,
$$
\begin{aligned}
x + 2y + z &= 2, \\
2x + 6y + z &= 7, \\
x + y + 4z &= 3,
\end{aligned}
\qquad \text{is} \qquad
M = \left( \begin{array}{ccc|c}
1 & 2 & 1 & 2 \\
2 & 6 & 1 & 7 \\
1 & 1 & 4 & 3
\end{array} \right).
\tag{1.13}
$$
Note that one can immediately recover the equations in the original linear system from
the augmented matrix. Since operations on equations also affect their right hand sides,
keeping track of everything is most easily done through the augmented matrix.
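
Forming the augmented matrix, and recovering the system from it, amounts to appending or splitting off the last column. The Python sketch below illustrates this under the assumption that matrices are stored as lists of rows; the names `augment` and `split_augmented` are hypothetical.

```python
def augment(A, b):
    """Form M = (A | b) by appending b as an extra last column of A."""
    if len(A) != len(b):
        raise ValueError("A and b must have the same number of rows")
    return [list(row) + [bi] for row, bi in zip(A, b)]

def split_augmented(M):
    """Recover the coefficient matrix and the right hand side from M."""
    return [row[:-1] for row in M], [row[-1] for row in M]

A = [[1, 2, 1],
     [2, 6, 1],
     [1, 1, 4]]
b = [2, 7, 3]

M = augment(A, b)
print(M)                              # [[1, 2, 1, 2], [2, 6, 1, 7], [1, 1, 4, 3]]
print(split_augmented(M) == (A, b))   # True
```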

For the time being, we will concentrate our efforts on linear systems that have the
same number, $n$, of equations as unknowns. The associated coefficient matrix $A$ is square,
of size $n \times n$. The corresponding augmented matrix $M = \bigl( A \mid \mathbf{b} \bigr)$ then has size $n \times (n + 1)$.

The matrix operation that assumes the role of Linear System Operation #1 is:

Elementary Row Operation #1 :

Add a scalar multiple of one row of the augmented matrix to another row.
For example, if we add $-2$ times the first row of the augmented matrix (1.13) to the
second row, the result is the row vector
$$
-2\,( 1 \ \ 2 \ \ 1 \ \ 2 ) + ( 2 \ \ 6 \ \ 1 \ \ 7 ) = ( 0 \ \ 2 \ \ {-1} \ \ 3 ).
$$
The result can be recognized as the second row of the modified augmented matrix
$$
\left( \begin{array}{ccc|c}
1 & 2 & 1 & 2 \\
0 & 2 & -1 & 3 \\
1 & 1 & 4 & 3
\end{array} \right)
\tag{1.14}
$$

that corresponds to the first equivalent system (1.2). When elementary row operation #1
is performed, it is critical that the result replace the row being added to, not the row
being multiplied by the scalar. Notice that the elimination of a variable in an equation
(in this case, the first variable in the second equation) amounts to making its entry in
the coefficient matrix equal to zero.
We shall call the (1, 1) entry of the coefficient matrix the first pivot. The precise
definition of pivot will become clear as we continue; the one key requirement is that a
pivot be nonzero. Eliminating the first variable x from the second and third equations
amounts to making all the matrix entries in the column below the pivot equal to zero. We
have already done this with the (2, 1) entry in (1.14). To make the (3, 1) entry equal to
zero, we subtract the first row from the last row. The resulting augmented matrix is

$$
\left( \begin{array}{ccc|c}
1 & 2 & 1 & 2 \\
0 & 2 & -1 & 3 \\
0 & -1 & 3 & 1
\end{array} \right),
$$

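which corresponds to the system (1.3). The second pivot is the (2, 2) entry of this matrix,
which is 2, and is the coefficient of the second variable in the second equation. Again, the
pivot must be nonzero. We use the elementary row operation of adding $\frac{1}{2}$ of the second
row to the third row to make the entry below the second pivot equal to 0; the result is the
augmented matrix
$$
N = \left( \begin{array}{ccc|c}
1 & 2 & 1 & 2 \\
0 & 2 & -1 & 3 \\
0 & 0 & \frac{5}{2} & \frac{5}{2}
\end{array} \right)
$$
that corresponds to the triangular system (1.4). We write the final augmented matrix as
$$
N = \bigl( U \mid \mathbf{c} \bigr),
\qquad \text{where} \qquad
U = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & -1 \\ 0 & 0 & \frac{5}{2} \end{pmatrix},
\qquad
\mathbf{c} = \begin{pmatrix} 2 \\ 3 \\ \frac{5}{2} \end{pmatrix}.
$$
The corresponding linear system has vector form
$$
U\,\mathbf{x} = \mathbf{c}.
\tag{1.15}
$$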

Its coefficient matrix U is upper triangular , which means that all its entries below the
main diagonal are zero: uij = 0 whenever i > j. The three nonzero entries on its diagonal,
1, 2, 25 , including the last one in the (3, 3) slot are the three pivots. Once the system has
been reduced to triangular form (1.15), we can easily solve it, as discussed earlier, by back
substitution.

Gaussian Elimination: Regular Case

start
    for j = 1 to n
        if m_jj = 0, stop; print "A is not regular"
        else for i = j + 1 to n
            set l_ij = m_ij / m_jj
            add − l_ij times row j of M to row i of M
        next i
    next j
end

The preceding algorithm for solving a linear system is known as regular Gaussian
elimination. A square matrix A will be called regular if the algorithm successfully reduces
it to upper triangular form U with all non-zero pivots on the diagonal. In other words,
for regular matrices, we identify each successive nonzero entry in a diagonal position as
the current pivot. We then use the pivot row to make all the entries in the column below
the pivot equal to zero through elementary row operations of Type #1. A system whose
coefficient matrix is regular is solved by first reducing the augmented matrix to upper
triangular form and then solving the resulting triangular system by back substitution.
Let us state this algorithm in the form of a program, written in a general pseudocode
that can be easily translated into any specific language, e.g., C++, Fortran, Java,
Maple, Mathematica or Matlab. We use a single letter $M = (m_{ij})$ to denote the current augmented matrix at each stage in the computation, and initialize $M = ( A \mid b )$. Note that the entries of M will change as the algorithm progresses. The final output of the program, assuming A is regular, is the augmented matrix $M = ( U \mid c )$, where U is the upper triangular matrix whose diagonal entries $u_{ii}$ are the pivots and c is the vector of right hand sides obtained after performing the elementary row operations.
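One possible translation of the regular Gaussian elimination program into Python is sketched below. The function name `regular_gaussian_elimination`, the 0-based indexing, and the choice to raise `ValueError` rather than print a message are assumptions made for this illustration; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def regular_gaussian_elimination(A, b):
    """Reduce the augmented matrix (A | b) to upper triangular form (U | c).

    Raises ValueError if a zero pivot is met, i.e. if A is not regular.
    Indices are 0-based; Fractions keep the arithmetic exact.
    """
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for j in range(n):
        if M[j][j] == 0:
            raise ValueError("A is not regular")
        for i in range(j + 1, n):
            l_ij = M[i][j] / M[j][j]
            # add -l_ij times row j of M to row i of M
            M[i] = [m_i - l_ij * m_j for m_i, m_j in zip(M[i], M[j])]
    return M

# The running example: pivots 1, 2, 5/2 end up on the diagonal.
M = regular_gaussian_elimination([[1, 2, 1], [2, 6, 1], [1, 1, 4]], [2, 7, 3])
```

The returned matrix still has to be solved by back substitution; this sketch stops at the triangular form, as the program in the text does.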
Elementary Matrices
A key observation is that elementary row operations can, in fact, be realized by matrix
multiplication.
Definition 1.1. The elementary matrix E associated with an elementary row operation for matrices with m rows is the matrix obtained by applying the row operation to the $m \times m$ identity matrix $I_m$.

Strangely, there is no commonly accepted term for these kinds of matrices. Our proposed
adjective regular will prove to be quite useful in the sequel.


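For example, applying the elementary row operation that adds $-2$ times the first row to the second row to the $3 \times 3$ identity matrix
\[
I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
results in the corresponding elementary matrix
\[
E_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
We claim that, if A is any 3-rowed matrix, then forming the product $E_1 A$ has the same effect as the given elementary row operation. For example,
\[
E_1 A = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & -1 \\ 1 & 1 & 4 \end{pmatrix},
\]
which you may recognize as the first elementary row operation we used to solve the illustrative example. Indeed, if we set
\[
E_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
E_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}, \qquad
E_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \frac12 & 1 \end{pmatrix},
\tag{1.16}
\]
then multiplication by $E_1$ will subtract twice the first row from the second row, multiplication by $E_2$ will subtract the first row from the third row, and multiplication by $E_3$ will add $\frac12$ the second row to the third row: precisely the row operations used to place our original system in triangular form. Therefore, performing them in the correct order (and using the associativity of matrix multiplication), we conclude that when
\[
A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}, \qquad \text{then} \qquad
E_3 E_2 E_1 A = U = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & -1 \\ 0 & 0 & \frac52 \end{pmatrix}.
\tag{1.17}
\]
The reader should check this by directly multiplying the indicated matrices.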
In general, then, the elementary matrix E of size $m \times m$ will have all 1's on the diagonal, a nonzero entry c in position (i, j), for some $i \ne j$, and all other entries equal to zero. If A is any $m \times n$ matrix, then the matrix product E A is equal to the matrix obtained from A by the elementary row operation adding c times row j to row i. (Note the reversal of order of i and j.)
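The correspondence between elementary matrices and row operations can be checked numerically. The sketch below builds the elementary matrix by placing c in position (i, j) of the identity and multiplies it into A; the helper names `identity`, `elementary_matrix` and `matmul` are hypothetical, and indices are 0-based.

```python
def identity(m):
    """m x m identity matrix as a list of rows."""
    return [[1 if r == c else 0 for c in range(m)] for r in range(m)]

def elementary_matrix(m, c, j, i):
    """m x m elementary matrix for 'add c times row j to row i' (0-based)."""
    E = identity(m)
    E[i][j] = c   # row index i, column index j: note the reversal of order
    return E

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]

A = [[1, 2, 1], [2, 6, 1], [1, 1, 4]]
E1 = elementary_matrix(3, -2, 0, 1)   # add -2 times row 1 to row 2
product = matmul(E1, A)               # second row becomes (0, 2, -1)
```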
The elementary row operation that undoes adding c times row j to row i is the inverse row operation that subtracts c (or, equivalently, adds $-c$) times row j from row i. The corresponding inverse elementary matrix again has 1's along the diagonal and $-c$ in the (i, j) slot. Let us denote the inverses of the particular elementary matrices (1.16) by $L_i$, so that, according to our general rule,
\[
L_1 = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
L_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}, \qquad
L_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -\frac12 & 1 \end{pmatrix}.
\tag{1.18}
\]
Note that the product
\[
L_i E_i = I \tag{1.19}
\]
is the $3 \times 3$ identity matrix, reflecting the fact that these are inverse operations. (A more thorough discussion of matrix inverses will be postponed until the following section.)
The product of the latter three elementary matrices is equal to
\[
L = L_1 L_2 L_3 = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -\frac12 & 1 \end{pmatrix}.
\tag{1.20}
\]
The matrix L is called a special lower triangular matrix, where "lower triangular" means that all the entries above the main diagonal are 0, while "special" indicates that all the entries on the diagonal are equal to 1. Observe that the entries of L below the diagonal are the same as the corresponding nonzero entries in the $L_i$. This is a general fact, that holds when the lower triangular elementary matrices are multiplied in the correct order. (For instance, the product $L_3 L_2 L_1$ is not so easily predicted.) More generally, the following elementary consequence of the laws of matrix multiplication will be used extensively.
Lemma 1.2. If $L$ and $\widehat{L}$ are lower triangular matrices of the same size, so is their product $L \widehat{L}$. If they are both special lower triangular, so is their product. Similarly, if $U, \widehat{U}$ are (special) upper triangular matrices, so is their product $U \widehat{U}$.
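The remark about the order of multiplication can be tested directly. In the sketch below (the helper `matmul` is an illustrative name), the product $L_1 L_2 L_3$ simply collects the off-diagonal entries of the three factors, whereas the reversed product $L_3 L_2 L_1$ has a different (3, 1) entry. Both products are still special lower triangular, as Lemma 1.2 predicts.

```python
from fractions import Fraction

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]

half = Fraction(1, 2)
L1 = [[1, 0, 0], [2, 1, 0], [0, 0, 1]]
L2 = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
L3 = [[1, 0, 0], [0, 1, 0], [0, -half, 1]]

forward = matmul(matmul(L1, L2), L3)     # L1 L2 L3: entries drop into place
backward = matmul(matmul(L3, L2), L1)    # L3 L2 L1: the (3, 1) entry changes
```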
The L U Factorization

We have almost arrived at our first important result. Consider the product of the matrices L and U in (1.17), (1.20). Using equation (1.19), along with the basic property of the identity matrix I and associativity of matrix multiplication, we conclude that
\[
\begin{aligned}
L U &= (L_1 L_2 L_3)(E_3 E_2 E_1 A) = L_1 L_2 (L_3 E_3) E_2 E_1 A = L_1 L_2 \, I \, E_2 E_1 A \\
&= L_1 (L_2 E_2) E_1 A = L_1 \, I \, E_1 A = L_1 E_1 A = I A = A.
\end{aligned}
\]
In other words, we have factorized the coefficient matrix A = L U into a product of a special lower triangular matrix L and an upper triangular matrix U with the nonzero pivots on its main diagonal. The same holds true for almost all square coefficient matrices.
Theorem 1.3. A matrix A is regular if and only if it can be factorized
\[
A = L U, \tag{1.21}
\]
where L is a special lower triangular matrix, having all 1's on the diagonal, and U is upper triangular with nonzero diagonal entries, which are its pivots. The nonzero off-diagonal entries $l_{ij}$ for $i > j$ appearing in L prescribe the elementary row operations that bring A into upper triangular form; namely, one subtracts $l_{ij}$ times row j from row i at the appropriate step of the Gaussian elimination process.

Example 1.4. Let us compute the L U factorization of the matrix
\[
A = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 5 & 2 \\ 2 & -2 & 0 \end{pmatrix}.
\]
Applying the Gaussian elimination algorithm, we begin by subtracting twice the first row from the second row, and then subtract the first row from the third. The result is the matrix
\[
\begin{pmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & -3 & -1 \end{pmatrix}.
\]
The next step adds the second row to the third row, leading to the upper triangular matrix
\[
U = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix},
\]
with its diagonal entries $2, 3, -1$ indicating the pivots. The corresponding lower triangular matrix is
\[
L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix},
\]
whose entries below the diagonal are the negatives of the multiples we used during the elimination procedure. Namely, the (2, 1) entry of L indicates that we added $-2$ times the first row to the second row; the (3, 1) entry indicates that we added $-1$ times the first row to the third; and, finally, the (3, 2) entry indicates that we added the second row to the third row during the algorithm. The reader might wish to verify the factorization A = L U, or, explicitly,
\[
\begin{pmatrix} 2 & 1 & 1 \\ 4 & 5 & 2 \\ 2 & -2 & 0 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix}.
\]
Forward and Back Substitution

Once we know the L U factorization of a regular matrix A, we are able to solve any
associated linear system A x = b in two stages:
(1) First solve the lower triangular system
\[
L c = b \tag{1.22}
\]
for the vector c by forward substitution. This is the same as back substitution, except one solves the equations for the variables in the direct order, from first to last. Explicitly,
\[
c_1 = b_1, \qquad c_i = b_i - \sum_{j=1}^{i-1} l_{ij}\, c_j, \qquad \text{for} \quad i = 2, 3, \ldots, n,
\tag{1.23}
\]
noting that the previously computed values of $c_1, \ldots, c_{i-1}$ are used to determine $c_i$.
(2) Second, solve the resulting upper triangular system
\[
U x = c \tag{1.24}
\]
by back substitution. Explicitly, the values of the unknowns
\[
x_n = \frac{c_n}{u_{nn}}, \qquad
x_i = \frac{1}{u_{ii}} \Bigl( c_i - \sum_{j=i+1}^{n} u_{ij}\, x_j \Bigr), \qquad \text{for} \quad i = n-1, \ldots, 2, 1,
\tag{1.25}
\]
are successively computed, but now in reverse order.

Note that this algorithm does indeed solve the original system, since if
\[
U x = c \qquad \text{and} \qquad L c = b, \qquad \text{then} \qquad A x = L U x = L c = b.
\]
Once we have found the L U factorization of the coefficient matrix A, the Forward and Back Substitution processes quickly produce the solution, and are easy to program on a computer.
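Formulas (1.23) and (1.25) translate almost line for line into code. The following is a minimal sketch, assuming L is special lower triangular and U has nonzero diagonal entries; the function names and 0-based indices are illustrative.

```python
from fractions import Fraction

def forward_substitution(L, b):
    """Solve L c = b for special lower triangular L (1's on the diagonal)."""
    c = []
    for i in range(len(b)):
        # c_i = b_i - sum of l_ij c_j over the previously computed c_j
        c.append(Fraction(b[i]) - sum(L[i][j] * c[j] for j in range(i)))
    return c

def back_substitution(U, c):
    """Solve U x = c for upper triangular U with nonzero diagonal entries."""
    n = len(c)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - s) / U[i][i]
    return x

# Factors of a regular 3 x 3 matrix and one right hand side.
L = [[1, 0, 0], [2, 1, 0], [1, -1, 1]]
U = [[2, 1, 1], [0, 3, 0], [0, 0, -1]]
c = forward_substitution(L, [1, 2, 2])
x = back_substitution(U, c)
```

Once L and U are stored, a new right hand side costs only these two triangular solves and no further elimination.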
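Example 1.5. With the L U decomposition
\[
\begin{pmatrix} 2 & 1 & 1 \\ 4 & 5 & 2 \\ 2 & -2 & 0 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix}
\]
found in Example 1.4, we can readily solve any linear system with the given coefficient matrix by Forward and Back Substitution. For instance, to find the solution to
\[
\begin{pmatrix} 2 & 1 & 1 \\ 4 & 5 & 2 \\ 2 & -2 & 0 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix},
\]
we first solve the lower triangular system
\[
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix},
\qquad \text{or, explicitly,} \qquad
\begin{aligned} a &= 1, \\ 2a + b &= 2, \\ a - b + c &= 2. \end{aligned}
\]
The first equation says a = 1; substituting into the second, we find b = 0; the final equation gives c = 1. We then solve the upper triangular system
\[
\begin{pmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix},
\qquad \text{which is} \qquad
\begin{aligned} 2x + y + z &= 1, \\ 3y &= 0, \\ -z &= 1. \end{aligned}
\]
In turn, we find $z = -1$, then $y = 0$, and then $x = 1$, which is the unique solution to the original system.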
Of course, if we are not given the L U factorization in advance, we can just use direct Gaussian elimination on the augmented matrix. Forward and Back Substitution is useful if one has already computed the factorization by solving for a particular right hand side b, but then later wants to know the solutions corresponding to alternative b's.

1.4. Pivoting and Permutations.

The method of Gaussian elimination presented so far applies only to regular matrices.
But not every square matrix is regular; a simple class of examples are matrices whose
upper left entry is zero, and so cannot serve as the first pivot. More generally, the regular
elimination algorithm cannot proceed whenever a zero entry appears in the current pivot
spot on the diagonal. Zero can never serve as a pivot, since we cannot use it to eliminate
any nonzero entries in the column below it. What then to do? The answer requires
revisiting the source of our algorithm.

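Let us consider, as a specific example, the linear system
\[
\begin{aligned}
3y + z &= 2, \\
2x + 6y + z &= 7, \\
x + 4z &= 3.
\end{aligned}
\tag{1.26}
\]
The augmented coefficient matrix is
\[
\left(\begin{array}{ccc|c} 0 & 3 & 1 & 2 \\ 2 & 6 & 1 & 7 \\ 1 & 0 & 4 & 3 \end{array}\right).
\]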

In this case, the (1, 1) entry is 0, and is not a legitimate pivot. The problem, of course, is that the first variable x does not appear in the first equation, and so we cannot use it to eliminate x in the other two equations. But this problem is actually a bonus: we already have an equation with only two variables in it, and so we only need to eliminate x from one of the other two equations. To be systematic, we rewrite the system in a different order,
\[
\begin{aligned}
2x + 6y + z &= 7, \\
3y + z &= 2, \\
x + 4z &= 3,
\end{aligned}
\]
by interchanging the first two equations. In other words, we employ
Linear System Operation #2 : Interchange two equations.
Clearly this operation does not change the solution, and so produces an equivalent system. In our case, the resulting augmented coefficient matrix is
\[
\left(\begin{array}{ccc|c} 2 & 6 & 1 & 7 \\ 0 & 3 & 1 & 2 \\ 1 & 0 & 4 & 3 \end{array}\right),
\]

and is obtained from the original by performing the second type of row operation:
Elementary Row Operation #2 : Interchange two rows of the matrix.

The new nonzero upper left entry, 2, can now serve as the first pivot, and we may continue to apply elementary row operations of Type #1 to reduce our matrix to upper triangular form. For this particular example, we eliminate the remaining nonzero entry in the first column by subtracting $\frac12$ the first row from the last:
\[
\left(\begin{array}{ccc|c} 2 & 6 & 1 & 7 \\ 0 & 3 & 1 & 2 \\ 0 & -3 & \frac72 & -\frac12 \end{array}\right).
\]
The (2, 2) entry serves as the next pivot. To eliminate the nonzero entry below it, we add the second to the third row:
\[
\left(\begin{array}{ccc|c} 2 & 6 & 1 & 7 \\ 0 & 3 & 1 & 2 \\ 0 & 0 & \frac92 & \frac32 \end{array}\right).
\]

Gaussian Elimination: Nonsingular Case

start
    for j = 1 to n
        if m_kj = 0 for all k ≥ j, stop; print "A is singular"
        if m_jj = 0 but m_kj ≠ 0 for some k > j, switch rows k and j
        for i = j + 1 to n
            set l_ij = m_ij / m_jj
            add − l_ij times row j to row i of M
        next i
    next j
end

We have now placed the system in upper triangular form, with the three pivots, $2, 3, \frac92$, along the diagonal. Back substitution produces the solution $x = \frac53$, $y = \frac59$, $z = \frac13$.
The row interchange that is required when a zero shows up on the diagonal in pivot
position is known as pivoting. Later, in Section 1.7, we shall discuss practical reasons for
pivoting even when a diagonal entry is nonzero. The coefficient matrices for which the
Gaussian elimination algorithm with pivoting produces the solution are of fundamental
importance.
Definition 1.6. A square matrix is called nonsingular if it can be reduced to upper
triangular form with all non-zero elements on the diagonal by elementary row operations
of Types 1 and 2. Conversely, a square matrix that cannot be reduced to upper triangular
form because at some stage in the elimination procedure the diagonal entry and all the
entries below it are zero is called singular .
Every regular matrix is nonsingular, but, as we just saw, nonsingular matrices are
more general. Uniqueness of solutions is the key defining characteristic of nonsingularity.
Theorem 1.7. A linear system A x = b has a unique solution for every choice of
right hand side b if and only if its coefficient matrix A is square and nonsingular.
We are able to prove the "if" part of this theorem, since nonsingularity implies reduction to an equivalent upper triangular form that has the same solutions as the original system. The unique solution to the system is found by back substitution. The "only if" part will be proved in Section 1.8.
The revised version of the Gaussian Elimination algorithm, valid for all nonsingular coefficient matrices, is implemented by the accompanying program. The starting point is the augmented matrix $M = ( A \mid b )$ representing the linear system A x = b. After successful termination of the program, the result is an augmented matrix in upper triangular form $M = ( U \mid c )$ representing the equivalent linear system U x = c. One then uses Back Substitution to determine the solution x to the linear system.
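A possible Python rendering of the nonsingular algorithm, followed by back substitution, is sketched below. The program in the text allows any row k > j with a nonzero entry to be switched in; this sketch assumes, for definiteness, that the first such row is used. The name `solve_nonsingular` and the use of `ValueError` are likewise illustrative choices.

```python
from fractions import Fraction

def solve_nonsingular(A, b):
    """Solve A x = b by Gaussian elimination with row interchanges."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for j in range(n):
        # look for a nonzero entry on or below the diagonal in column j
        k = next((r for r in range(j, n) if M[r][j] != 0), None)
        if k is None:
            raise ValueError("A is singular")
        if k != j:
            M[j], M[k] = M[k], M[j]          # switch rows k and j
        for i in range(j + 1, n):
            l_ij = M[i][j] / M[j][j]
            M[i] = [m_i - l_ij * m_j for m_i, m_j in zip(M[i], M[j])]
    # back substitution on the triangular augmented matrix (U | c)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# A system whose (1, 1) entry is zero, so one row interchange is needed.
x = solve_nonsingular([[0, 3, 1], [2, 6, 1], [1, 0, 4]], [2, 7, 3])
```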

Permutation Matrices
As with the first type of elementary row operation, row interchanges can be accomplished by multiplication by a second type of elementary matrix. Again, the elementary matrix is found by applying the row operation in question to the identity matrix of the appropriate size. For instance, interchanging rows 1 and 2 of the $3 \times 3$ identity matrix produces the elementary interchange matrix
\[
P = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
As the reader can check, the effect of multiplying a 3-rowed matrix A on the left by P, producing P A, is the same as interchanging the first two rows of A. For instance,
\[
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
= \begin{pmatrix} 4 & 5 & 6 \\ 1 & 2 & 3 \\ 7 & 8 & 9 \end{pmatrix}.
\]
Multiple row interchanges are accomplished by combining such elementary interchange matrices. Each such combination of row interchanges corresponds to a unique permutation matrix.

Definition 1.8. A permutation matrix is a matrix obtained from the identity matrix
by any combination of row interchanges.
In particular, applying a row interchange to a permutation matrix produces another
permutation matrix. The following result is easily established.
Lemma 1.9. A matrix P is a permutation matrix if and only if each row of P
contains all 0 entries except for a single 1, and, in addition, each column of P also contains
all 0 entries except for a single 1.
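In general, if a permutation matrix P has a 1 in position (i, j), then the effect of multiplication by P is to move the j-th row of A into the i-th row of the product P A.
Example 1.10. There are six different $3 \times 3$ permutation matrices, namely
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.
\tag{1.27}
\]
These have the following effects: if A is a matrix with row vectors $r_1, r_2, r_3$, then multiplication on the left by each of the six permutation matrices produces
\[
\begin{pmatrix} r_1 \\ r_2 \\ r_3 \end{pmatrix}, \quad
\begin{pmatrix} r_2 \\ r_1 \\ r_3 \end{pmatrix}, \quad
\begin{pmatrix} r_3 \\ r_2 \\ r_1 \end{pmatrix}, \quad
\begin{pmatrix} r_1 \\ r_3 \\ r_2 \end{pmatrix}, \quad
\begin{pmatrix} r_2 \\ r_3 \\ r_1 \end{pmatrix}, \quad
\begin{pmatrix} r_3 \\ r_1 \\ r_2 \end{pmatrix},
\]
respectively. Thus, the first permutation matrix, which is the identity, does nothing. The second, third and fourth represent row interchanges. The last two are non-elementary permutations; each can be realized as a pair of row interchanges.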

An elementary combinatorial argument proves that there are a total of
\[
n! = n\,(n-1)\,(n-2) \cdots 3 \cdot 2 \cdot 1 \tag{1.28}
\]
different permutation matrices of size $n \times n$. Moreover, the product $P = P_1 P_2$ of any two permutation matrices is also a permutation matrix. An important point is that multiplication of permutation matrices is noncommutative: the order in which one permutes makes a difference. Switching the first and second rows, and then switching the second and third rows does not have the same effect as first switching the second and third rows and then switching the first and second rows!
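Both the count (1.28) and the noncommutativity can be confirmed by brute force for n = 3. In the sketch below, a permutation is encoded as a tuple `perm` in which row i of P A is row `perm[i]` of A; this encoding and the helper names are assumptions made for illustration, with 0-based indices.

```python
from itertools import permutations

def permutation_matrix(perm):
    """Matrix P whose i-th row has its single 1 in column perm[i]."""
    n = len(perm)
    return [[1 if c == perm[r] else 0 for c in range(n)] for r in range(n)]

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]

# One matrix per ordering of the rows: 3! = 6 of them.
all_3x3 = [permutation_matrix(p) for p in permutations(range(3))]

P12 = permutation_matrix((1, 0, 2))   # interchange rows 1 and 2
P23 = permutation_matrix((0, 2, 1))   # interchange rows 2 and 3

one_order = matmul(P12, P23)
other_order = matmul(P23, P12)        # a different permutation matrix
```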
The Permuted L U Factorization
As we now know, any nonsingular matrix A can be reduced to upper triangular form
by elementary row operations of types #1 and #2. The row interchanges merely reorder
the equations. If one performs all of the required row interchanges in advance, then
the elimination algorithm can proceed without requiring any further pivoting. Thus, the
matrix obtained by permuting the rows of A in the prescribed manner is regular. In other
words, if A is a nonsingular matrix, then there is a permutation matrix P such that the
product P A is regular, and hence admits an L U factorization. As a result, we deduce the
general permuted L U factorization
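\[
P A = L U, \tag{1.29}
\]
where P is a permutation matrix, L is special lower triangular, and U is upper triangular with the pivots on the diagonal. For instance, in the preceding example, we permuted the first and second rows, and hence equation (1.29) has the explicit form
\[
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ \frac12 & -1 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 6 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & \frac92 \end{pmatrix}.
\tag{1.30}
\]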
As a result of these considerations, we have established the following generalization
of Theorem 1.3.

Theorem 1.11. Let A be an $n \times n$ matrix. Then the following conditions are equivalent:
(i ) A is nonsingular.
(ii ) A has n nonzero pivots.
(iii ) A admits a permuted L U factorization: P A = L U.
One should be aware of a couple of practical complications. First, to implement the
permutation P of the rows that makes A regular, one needs to be clairvoyant: it is not
always clear in advance when and where a required row interchange will crop up. Second,
any row interchange performed during the course of the Gaussian Elimination algorithm
will affect the lower triangular matrix L, and precomputed entries must be permuted
accordingly; an example appears in Exercise .
Once the permuted L U factorization is established, the solution to the original system A x = b is obtained by using the same Forward and Back Substitution algorithm presented above. Explicitly, we first multiply the system A x = b by the permutation matrix, leading to
\[
P A x = P b \equiv \widehat{b}, \tag{1.31}
\]
whose right hand side $\widehat{b}$ has been obtained by permuting the entries of b in the same fashion as the rows of A. We then solve the two systems
\[
L c = \widehat{b}, \qquad \text{and} \qquad U x = c, \tag{1.32}
\]
by, respectively, Forward and Back Substitution as before.
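The bookkeeping described above, in which a row interchange also permutes the multipliers already stored in L, can be sketched as follows. The permutation is recorded as a list `perm` with row i of P A equal to row `perm[i]` of A; that encoding, the name `permuted_lu`, and the rule of switching in the first available nonzero entry are assumptions of this illustration.

```python
from fractions import Fraction

def permuted_lu(A):
    """Return (perm, L, U) with P A = L U, row i of P A being row perm[i] of A."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]   # multipliers only, for now
    perm = list(range(n))
    for j in range(n):
        k = next((r for r in range(j, n) if U[r][j] != 0), None)
        if k is None:
            raise ValueError("A is singular")
        if k != j:
            # interchange the rows of U, the multipliers already stored
            # in L, and the record of the permutation
            U[j], U[k] = U[k], U[j]
            L[j], L[k] = L[k], L[j]
            perm[j], perm[k] = perm[k], perm[j]
        for i in range(j + 1, n):
            L[i][j] = U[i][j] / U[j][j]
            U[i] = [u_i - L[i][j] * u_j for u_i, u_j in zip(U[i], U[j])]
    for i in range(n):
        L[i][i] = Fraction(1)                   # L is special lower triangular
    return perm, L, U

perm, L, U = permuted_lu([[0, 2, 1], [2, 6, 1], [1, 1, 4]])
b = [1, -2, 0]
b_hat = [b[p] for p in perm]   # entries of b permuted like the rows of A
```

With `perm`, `L` and `U` in hand, the system is finished off by forward substitution on `b_hat` followed by back substitution.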

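Example 1.12. Suppose we wish to solve
\[
\begin{pmatrix} 0 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix}.
\]
In view of the P A = L U factorization established in (1.30), we need only solve the two auxiliary systems (1.32) by Forward and Back Substitution, respectively. The lower triangular system is
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ \frac12 & -1 & 1 \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
= \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix}
= \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix},
\]
with solution $a = -2$, $b = 1$, $c = 2$. The resulting upper triangular system is
\[
\begin{pmatrix} 2 & 6 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & \frac92 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} -2 \\ 1 \\ 2 \end{pmatrix}
= \begin{pmatrix} a \\ b \\ c \end{pmatrix}.
\]
The solution, which is also the solution to the original system, is obtained by back substitution, with $z = \frac49$, $y = \frac{5}{18}$, $x = -\frac{37}{18}$.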

1.5. Matrix Inverses.

The inverse of a matrix is analogous to the reciprocal $a^{-1} = 1/a$ of a scalar, which is the $1 \times 1$ case. We already introduced the inverses of matrices corresponding to elementary row operations. In this section, we will analyze inverses of general square matrices. We begin with the formal definition.
Definition 1.13. Let A be a square matrix of size $n \times n$. An $n \times n$ matrix X is called the inverse of A if it satisfies
\[
X A = I = A X, \tag{1.33}
\]
where $I = I_n$ is the $n \times n$ identity matrix. The inverse is commonly denoted by $X = A^{-1}$.

Remark : Noncommutativity of matrix multiplication requires that we impose both
conditions in (1.33) in order to properly define an inverse to the matrix A. The first
condition X A = I says that X is a left inverse, while the second A X = I requires that
X also be a right inverse, in order that it fully qualify as a bona fide inverse of A.

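Example 1.14. Since
\[
\begin{pmatrix} 1 & 2 & -1 \\ -3 & 1 & 2 \\ -2 & 2 & 1 \end{pmatrix}
\begin{pmatrix} 3 & 4 & -5 \\ 1 & 1 & -1 \\ 4 & 6 & -7 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 3 & 4 & -5 \\ 1 & 1 & -1 \\ 4 & 6 & -7 \end{pmatrix}
\begin{pmatrix} 1 & 2 & -1 \\ -3 & 1 & 2 \\ -2 & 2 & 1 \end{pmatrix},
\]
we conclude that when
\[
A = \begin{pmatrix} 1 & 2 & -1 \\ -3 & 1 & 2 \\ -2 & 2 & 1 \end{pmatrix}
\qquad \text{then} \qquad
A^{-1} = \begin{pmatrix} 3 & 4 & -5 \\ 1 & 1 & -1 \\ 4 & 6 & -7 \end{pmatrix}.
\]
Note that the entries of $A^{-1}$ do not follow any easily discernible pattern in terms of the entries of A.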

Not every square matrix has an inverse. Indeed, not every scalar has an inverse: the one counterexample being a = 0. There is no general concept of inverse for rectangular matrices.

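Example 1.15. Let us compute the inverse $X = \begin{pmatrix} x & y \\ z & w \end{pmatrix}$ of a general $2 \times 2$ matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. The right inverse condition
\[
A X = \begin{pmatrix} a x + b z & a y + b w \\ c x + d z & c y + d w \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I
\]
holds if and only if x, y, z, w satisfy the linear system
\[
a x + b z = 1, \qquad a y + b w = 0, \qquad c x + d z = 0, \qquad c y + d w = 1.
\]
Solving by Gaussian elimination (or directly), we find
\[
x = \frac{d}{a d - b c}, \qquad y = -\frac{b}{a d - b c}, \qquad z = -\frac{c}{a d - b c}, \qquad w = \frac{a}{a d - b c},
\]
provided the common denominator $a d - b c$ does not vanish. Therefore, the matrix
\[
X = \frac{1}{a d - b c} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
\]
forms a right inverse to A. However, a short computation shows that it also defines a left inverse:
\[
X A = \begin{pmatrix} x a + y c & x b + y d \\ z a + w c & z b + w d \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I,
\]
and hence $X = A^{-1}$ is the inverse to A.
The denominator appearing in the preceding formulae has a special name; it is called the determinant of the $2 \times 2$ matrix A, and denoted
\[
\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = a d - b c. \tag{1.34}
\]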

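Thus, the determinant of a $2 \times 2$ matrix is the product of the diagonal entries minus the product of the off-diagonal entries. (Determinants of larger square matrices will be discussed in Section 1.9.) Thus, the $2 \times 2$ matrix A is invertible, with
\[
A^{-1} = \frac{1}{a d - b c} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \tag{1.35}
\]
if and only if $\det A \ne 0$. For example, if $A = \begin{pmatrix} 1 & 3 \\ -2 & -4 \end{pmatrix}$, then $\det A = 2 \ne 0$. We conclude that A has an inverse, which, by (1.35), is
\[
A^{-1} = \frac12 \begin{pmatrix} -4 & -3 \\ 2 & 1 \end{pmatrix}
= \begin{pmatrix} -2 & -\frac32 \\ 1 & \frac12 \end{pmatrix}.
\]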
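Formula (1.35) is short enough to code directly. The sketch below assumes a matrix given as two rows of two numbers; the function name `inverse_2x2` and the error raised for a vanishing determinant are illustrative choices.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix via the determinant formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (determinant is zero)")
    det = Fraction(det)                 # exact division below
    return [[d / det, -b / det],
            [-c / det, a / det]]

A_inv = inverse_2x2([[1, 3], [-2, -4]])   # here det A = 2
```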

The following key result will be established later in this chapter.

Theorem 1.16. A square matrix A has an inverse if and only if it is nonsingular.
Consequently, an $n \times n$ matrix will have an inverse if and only if it can be reduced to upper triangular form with n nonzero pivots on the diagonal by a combination of elementary row operations. Indeed, "invertible" is often used as a synonym for "nonsingular". All other matrices are singular and do not have an inverse as defined above. Before attempting to prove this fundamental result, we need to first become familiar with some elementary properties of matrix inverses.
Lemma 1.17. The inverse of a square matrix, if it exists, is unique.
Proof : If X and Y both satisfy (1.33), so X A = I = A X and Y A = I = A Y, then, by associativity, X = X I = X (A Y) = (X A) Y = I Y = Y, and hence X = Y. Q.E.D.
Inverting a matrix twice gets us back to where we started.
Lemma 1.18. If A is invertible, then $A^{-1}$ is also invertible and $(A^{-1})^{-1} = A$.
Proof : The matrix inverse equations $A^{-1} A = I = A A^{-1}$ are sufficient to prove that A is the inverse of $A^{-1}$. Q.E.D.
Example 1.19. We already learned how to find the inverse of an elementary matrix of type #1; we just negate the one nonzero off-diagonal entry. For example, if
\[
E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{pmatrix},
\qquad \text{then} \qquad
E^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{pmatrix}.
\]
This reflects the fact that the inverse of the elementary row operation that adds twice the first row to the third row is the operation of subtracting twice the first row from the third row.

Example 1.20. Let $P = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ denote the elementary matrix that has the effect of interchanging rows 1 and 2 of a matrix. Then $P^2 = I$, since doing the same operation twice in a row has no net effect. This implies that $P^{-1} = P$ is its own inverse. Indeed, the same result holds for all elementary permutation matrices that correspond to row operations of type #2. However, it is not true for more general permutation matrices.
Lemma 1.21. If A and B are invertible matrices of the same size, then their product, A B, is invertible, and
\[
(A B)^{-1} = B^{-1} A^{-1}. \tag{1.36}
\]
Note particularly the reversal in order of the factors.
Proof : Let $X = B^{-1} A^{-1}$. Then, by associativity,
\[
X (A B) = B^{-1} A^{-1} A B = B^{-1} B = I, \qquad (A B) X = A B B^{-1} A^{-1} = A A^{-1} = I.
\]
Thus X is both a left and a right inverse for the product matrix A B and the result follows. Q.E.D.

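Example 1.22. One verifies, directly, that the inverse of $A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}$ is $A^{-1} = \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix}$, while the inverse of $B = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ is $B^{-1} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. Therefore, the inverse of their product
\[
C = A B = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}
= \begin{pmatrix} -2 & 1 \\ -1 & 0 \end{pmatrix}
\]
is given by
\[
C^{-1} = B^{-1} A^{-1} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 0 & -1 \\ 1 & -2 \end{pmatrix}.
\]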
We can straightforwardly generalize the preceding result. The inverse of a multiple product of invertible matrices is the product of their inverses, in the reverse order:
\[
(A_1 A_2 \cdots A_{m-1} A_m)^{-1} = A_m^{-1} A_{m-1}^{-1} \cdots A_2^{-1} A_1^{-1}. \tag{1.37}
\]
Warning: In general, $(A + B)^{-1} \ne A^{-1} + B^{-1}$. This equation is not even true for scalars ($1 \times 1$ matrices)!
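A one-line scalar check makes the warning concrete; the values a = b = 1 are chosen here purely for illustration:
\[
(1 + 1)^{-1} = \tfrac12, \qquad \text{whereas} \qquad 1^{-1} + 1^{-1} = 2.
\]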
Gauss-Jordan Elimination
The basic algorithm used to compute the inverse of a square matrix is known as Gauss-Jordan Elimination, in honor of Gauss and Wilhelm Jordan, a nineteenth century German engineer. A key fact is that we only need to solve the right inverse equation
\[
A X = I \tag{1.38}
\]
in order to compute $X = A^{-1}$. The other equation in (1.33), namely X A = I, will then follow as an automatic consequence. In other words, for square matrices, a right inverse is automatically a left inverse, and conversely! A proof will appear below.
The reader may well ask, then, why use both left and right inverse conditions in the original definition? There are several good reasons. First of all, a rectangular matrix may satisfy one of the two conditions (having either a left inverse or a right inverse) but can never satisfy both. Moreover, even when we restrict our attention to square

matrices, starting with only one of the conditions makes the logical development of the
subject considerably more difficult, and not really worth the extra effort. Once we have
established the basic properties of the inverse of a square matrix, we can then safely discard
the superfluous left inverse condition. Finally, when we generalize the notion of an inverse
to a linear operator in Chapter 7, then, unlike square matrices, we cannot dispense with
either of the conditions.
Let us write out the individual columns of the right inverse equation (1.38). The i-th column of the $n \times n$ identity matrix I is the vector $e_i$ that has a single 1 in the i-th slot and 0's elsewhere, so
\[
e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \qquad
e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \qquad \ldots \qquad
e_n = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}.
\tag{1.39}
\]
According to (1.11), the i-th column of the matrix product A X is equal to $A x_i$, where $x_i$ denotes the i-th column of $X = ( x_1 \; x_2 \; \ldots \; x_n )$. Therefore, the single matrix equation (1.38) is equivalent to n linear systems
\[
A x_1 = e_1, \qquad A x_2 = e_2, \qquad \ldots \qquad A x_n = e_n, \tag{1.40}
\]
all having the same coefficient matrix. As such, to solve them we are led to form the n augmented matrices $M_1 = ( A \mid e_1 ), \ldots, M_n = ( A \mid e_n )$, and then perform our Gaussian elimination algorithm on each one. But this would be a waste of effort. Since the coefficient matrix is the same, we will end up performing identical row operations on each augmented matrix. Consequently, it will be more efficient to combine them into one large augmented matrix $M = ( A \mid e_1 \ldots e_n ) = ( A \mid I )$, of size $n \times (2n)$, in which the right hand sides $e_1, \ldots, e_n$ of our systems are placed into n different columns, which we then recognize as reassembling the columns of an $n \times n$ identity matrix. We may then apply our elementary row operations to reduce, if possible, the large augmented matrix so that its first n columns are in upper triangular form.

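Example 1.23. For example, to find the inverse of the matrix $A = \begin{pmatrix} 0 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}$, we form the large augmented matrix
\[
\left(\begin{array}{ccc|ccc} 0 & 2 & 1 & 1 & 0 & 0 \\ 2 & 6 & 1 & 0 & 1 & 0 \\ 1 & 1 & 4 & 0 & 0 & 1 \end{array}\right).
\]
Applying the same sequence of elementary row operations as in Section 1.4, we first interchange the rows
\[
\left(\begin{array}{ccc|ccc} 2 & 6 & 1 & 0 & 1 & 0 \\ 0 & 2 & 1 & 1 & 0 & 0 \\ 1 & 1 & 4 & 0 & 0 & 1 \end{array}\right),
\]
and then eliminate the nonzero entries below the first pivot,
\[
\left(\begin{array}{ccc|ccc} 2 & 6 & 1 & 0 & 1 & 0 \\ 0 & 2 & 1 & 1 & 0 & 0 \\ 0 & -2 & \frac72 & 0 & -\frac12 & 1 \end{array}\right).
\]
Next we eliminate the entry below the second pivot:
\[
\left(\begin{array}{ccc|ccc} 2 & 6 & 1 & 0 & 1 & 0 \\ 0 & 2 & 1 & 1 & 0 & 0 \\ 0 & 0 & \frac92 & 1 & -\frac12 & 1 \end{array}\right).
\]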

At this stage, we have reduced our augmented matrix to the upper triangular form $( U \mid C )$, which is equivalent to reducing the original n linear systems $A x_i = e_i$ to n upper triangular systems $U x_i = c_i$. We could therefore perform n back substitutions to produce the solutions $x_i$, which would form the individual columns of the inverse matrix $X = ( x_1 \ldots x_n )$.
In the standard Gauss-Jordan scheme, one instead continues to employ the usual sequence of elementary row operations to fully reduce the augmented matrix to the form $( I \mid X )$ in which the left hand $n \times n$ matrix has become the identity, while the right hand matrix is the desired solution $X = A^{-1}$. Indeed, $( I \mid X )$ represents the n trivial, but equivalent, linear systems $I x_i = x_i$ with identity coefficient matrix.
Now, the identity matrix has 0's below the diagonal, just like U. It also has 1's along the diagonal, whereas U has the pivots (which are all nonzero) along the diagonal. Thus, the next phase in the procedure is to make all the diagonal entries of U equal to 1. To do this, we need to introduce the last, and least, of our linear systems operations.

Linear System Operation #3 : Multiply an equation by a nonzero constant.

This operation does not change the solution, and so yields an equivalent linear system.
The corresponding elementary row operation is:
Elementary Row Operation #3 : Multiply a row of the matrix by a nonzero scalar.

Dividing the rows of the upper triangular augmented matrix $( U \mid C )$ by the diagonal pivots of U will produce a matrix of the form $( V \mid K )$ where V is special upper triangular, meaning it has all 1's along the diagonal. In the particular example, the result of these three elementary row operations of Type #3 is
\[
\left(\begin{array}{ccc|ccc} 1 & 3 & \frac12 & 0 & \frac12 & 0 \\ 0 & 1 & \frac12 & \frac12 & 0 & 0 \\ 0 & 0 & 1 & \frac29 & -\frac19 & \frac29 \end{array}\right),
\]
where we multiplied the first and second rows by $\frac12$ and the third row by $\frac29$.
We are now over half way towards our goal of an identity matrix on the left. We need
only make the entries above the diagonal equal to zero. This can be done by elementary
row operations of Type #1, but now we work backwards as in back substitution. First,
eliminate the nonzero entries in the third column lying above the (3, 3) entry; this is done
by subtracting one half the third row from the second and also from the first:

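$$\left(\begin{array}{ccc|ccc} 1 & 3 & 0 & -\frac19 & \frac59 & -\frac19 \\ 0 & 1 & 0 & \frac{7}{18} & \frac{1}{18} & -\frac19 \\ 0 & 0 & 1 & \frac29 & -\frac19 & \frac29 \end{array}\right).$$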

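Finally, subtract 3 times the second row from the first to eliminate the remaining nonzero
off-diagonal entry:

$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -\frac{23}{18} & \frac{7}{18} & \frac29 \\ 0 & 1 & 0 & \frac{7}{18} & \frac{1}{18} & -\frac19 \\ 0 & 0 & 1 & \frac29 & -\frac19 & \frac29 \end{array}\right).$$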
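The final right hand matrix is our desired inverse:

$$A^{-1} = \begin{pmatrix} -\frac{23}{18} & \frac{7}{18} & \frac29 \\ \frac{7}{18} & \frac{1}{18} & -\frac19 \\ \frac29 & -\frac19 & \frac29 \end{pmatrix},$$

thereby completing the Gauss-Jordan procedure. The reader may wish to verify that the
final result does satisfy both inverse conditions $A A^{-1} = I = A^{-1} A$.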
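
The reduction of $(A \mid I)$ to $(I \mid X)$ can be sketched in a few lines of Python. This is an illustrative sketch rather than the text's own program: the function name `gauss_jordan_inverse` is hypothetical, exact fractions are used to avoid round-off, and each column is cleared above and below the pivot in a single pass instead of the two-stage order described above. The coefficient matrix is assumed to be the one whose inverse is displayed above.

```python
from fractions import Fraction

def gauss_jordan_inverse(a):
    """Invert a square matrix by reducing (A | I) to (I | X) with exact fractions."""
    n = len(a)
    # Build the augmented matrix (A | I).
    m = [[Fraction(a[i][j]) for j in range(n)] +
         [Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        # Type #2: interchange rows if the pivot position holds a zero.
        p = next((i for i in range(j, n) if m[i][j] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        m[j], m[p] = m[p], m[j]
        # Type #3: scale the pivot row so that the pivot becomes 1.
        pivot = m[j][j]
        m[j] = [entry / pivot for entry in m[j]]
        # Type #1: clear every other entry in column j.
        for i in range(n):
            if i != j and m[i][j] != 0:
                factor = m[i][j]
                m[i] = [x - factor * y for x, y in zip(m[i], m[j])]
    # The right hand block now holds the inverse.
    return [row[n:] for row in m]

A = [[0, 2, 1], [2, 6, 1], [1, 1, 4]]   # assumed coefficient matrix of the example
X = gauss_jordan_inverse(A)
```

Using `Fraction` keeps the entries in the same form as the hand computation, so the rows of `X` can be compared directly with the displayed inverse.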
We are now able to complete the proofs of the basic results on inverse matrices. First,
we need to determine the elementary matrix corresponding to an elementary row operation
of Type #3. Again, this is obtained by performing the indicated elementary row operation
on the identity matrix. Thus, the elementary matrix that multiplies row $i$ by the nonzero
scalar $c \neq 0$ is the diagonal matrix having $c$ in the $i$th diagonal position, and 1's
elsewhere along the diagonal. The inverse elementary matrix is the diagonal matrix with $1/c$
in the $i$th diagonal position and 1's elsewhere on the main diagonal; it corresponds to the
inverse operation that divides row $i$ by $c$. For example, the elementary matrix that
multiplies the second row of a $3 \times n$ matrix by the scalar 5 is

$$E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \text{and has inverse} \qquad E^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac15 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

The Gauss-Jordan method tells us how to reduce any nonsingular square matrix $A$
to the identity matrix by a sequence of elementary row operations. Let $E_1, E_2, \ldots, E_N$
be the corresponding elementary matrices. Therefore,

$$E_N E_{N-1} \cdots E_2 E_1 A = I. \tag{1.41}$$

We claim that the matrix product

$$X = E_N E_{N-1} \cdots E_2 E_1 \tag{1.42}$$

is the inverse of $A$. Indeed, formula (1.41) says that $X A = I$, and so $X$ is a left inverse.
Furthermore, each elementary matrix has an inverse, and so by (1.37), $X$ itself is invertible,
with

$$X^{-1} = E_1^{-1} E_2^{-1} \cdots E_{N-1}^{-1} E_N^{-1}. \tag{1.43}$$
Therefore, multiplying the already established formula $X A = I$ on the left by $X^{-1}$, we
find $A = X^{-1}$, and so, by Lemma 1.18, $X = A^{-1}$ as claimed. This completes the proof of
Theorem 1.16. Finally, equating $A = X^{-1}$ to (1.43), and using the fact that the inverse of
an elementary matrix is also an elementary matrix, we have established:
Proposition 1.24. Any nonsingular matrix $A$ can be written as the product of
elementary matrices.

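For example, the $2 \times 2$ matrix $A = \begin{pmatrix} 0 & -1 \\ 1 & 3 \end{pmatrix}$ is converted into the identity matrix
by row operations corresponding to the matrices $E_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, corresponding to a row
interchange, $E_2 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$, scaling the second row by $-1$, and $E_3 = \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix}$ that
subtracts 3 times the second row from the first. Therefore,

$$A^{-1} = E_3 E_2 E_1 = \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 3 & 1 \\ -1 & 0 \end{pmatrix},$$

while

$$A = E_1^{-1} E_2^{-1} E_3^{-1} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 3 \end{pmatrix}.$$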
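
The two products of elementary matrices can be checked mechanically. The following is a small sketch, with a hypothetical helper `matmul` written out so that the block needs no external library; the matrices are those of the $2 \times 2$ example, with the signs as reconstructed there.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

A = [[0, -1], [1, 3]]

E1 = [[0, 1], [1, 0]]     # Type #2: interchange the two rows
E2 = [[1, 0], [0, -1]]    # Type #3: multiply row 2 by -1
E3 = [[1, -3], [0, 1]]    # Type #1: subtract 3 times row 2 from row 1

# Inverse elementary matrices undo the corresponding row operations.
E1_inv = [[0, 1], [1, 0]]
E2_inv = [[1, 0], [0, -1]]
E3_inv = [[1, 3], [0, 1]]

A_inv = matmul(E3, matmul(E2, E1))                  # formula (1.42)
A_rebuilt = matmul(E1_inv, matmul(E2_inv, E3_inv))  # formula (1.43)
```

Note that the elementary matrices are multiplied in the reverse of the order in which the row operations are applied, since each new operation acts on the left.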

As an application, let us prove that the inverse of a nonsingular triangular matrix is
also triangular. Specifically:
Lemma 1.25. If $L$ is a lower triangular matrix with all nonzero entries on the main
diagonal, then $L$ is nonsingular and its inverse $L^{-1}$ is also lower triangular. In particular,
if $L$ is special lower triangular, so is $L^{-1}$. A similar result holds for upper triangular
matrices.
Proof: It suffices to note that if $L$ has all nonzero diagonal entries, one can reduce
$L$ to the identity by elementary row operations of Types #1 and #3, whose associated
elementary matrices are all lower triangular. Lemma 1.2 implies that the product (1.42)
is then also lower triangular. If $L$ is special, then all the pivots are equal to 1 and so no
elementary row operations of Type #3 are required, so the inverse is a product of special
lower triangular matrices, and hence is special lower triangular.
Q.E.D.
Solving Linear Systems with the Inverse
An important motivation for the matrix inverse is that it enables one to effect an
immediate solution to a nonsingular linear system.
Theorem 1.26. If $A$ is invertible, then the unique solution to the linear system
$A x = b$ is given by $x = A^{-1} b$.
Proof: We merely multiply the system by $A^{-1}$, which yields $x = A^{-1} A x = A^{-1} b$, as
claimed.
Q.E.D.
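Thus, with the inverse in hand, a more direct way to solve our example (1.26) is to
multiply the right hand side by the inverse matrix:

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -\frac{23}{18} & \frac{7}{18} & \frac29 \\ \frac{7}{18} & \frac{1}{18} & -\frac19 \\ \frac29 & -\frac19 & \frac29 \end{pmatrix} \begin{pmatrix} 2 \\ 7 \\ 3 \end{pmatrix} = \begin{pmatrix} \frac56 \\ \frac56 \\ \frac13 \end{pmatrix},$$

reproducing our earlier solution.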

However, while aesthetically appealing, the solution method based on the inverse matrix
is hopelessly inefficient as compared to forward and back substitution based on a
(permuted) L U factorization, and should not be used. A complete justification of this
dictum will be provided in Section 1.7. In contrast to what you might have learned in an
introductory linear algebra course, you should never use the matrix inverse for practical
computations! This is not to say that the inverse is completely without merit. Far from it!
The inverse continues to play a fundamental role in the theoretical side of linear algebra, as
well as providing important insight into the algorithms that are used in practice. But the
basic message of practical, applied linear algebra is that L U decomposition and Gaussian
Elimination are fundamental; inverses are only used for theoretical purposes, and are to
be avoided in all but the most elementary practical computations.
Remark: The reader may have learned a version of the Gauss-Jordan algorithm for
solving a single linear system that replaces the back substitution step by a further
application of all three types of elementary row operations in order to reduce the coefficient
matrix to the identity. In other words, to solve $A x = b$, we start with the augmented matrix
$M = (A \mid b)$ and use all three types of elementary row operations to produce (assuming
nonsingularity) the fully reduced form $(I \mid x)$, representing the trivial, equivalent system
$I x = x$, with the solution $x$ to the original system in its final column. However, as we shall
see, back substitution is much more efficient, and is the method of choice in all practical
situations.
The L D V Factorization
The Gauss-Jordan construction leads to a slightly more detailed version of the L U
factorization, which is useful in certain situations. Let $D$ denote the diagonal matrix having
the same diagonal entries as $U$; in other words, $D$ has the pivots on its diagonal and zeros
everywhere else. Let $V$ be the special upper triangular matrix obtained from $U$ by dividing
each row by its pivot, so that $V$ has all 1's on the diagonal. We already encountered $V$
during the course of the Gauss-Jordan method. It is easily seen that $U = D V$, which
implies the following result.
Theorem 1.27. A matrix $A$ is regular if and only if it admits a factorization

$$A = L D V, \tag{1.44}$$

where $L$ is a special lower triangular matrix, $D$ is a diagonal matrix having the nonzero
pivots on the diagonal, and $V$ is special upper triangular.
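For the matrix appearing in Example 1.5, we have $U = D V$, where

$$U = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \qquad D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \qquad V = \begin{pmatrix} 1 & \frac12 & \frac12 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

producing the $A = L D V$ factorization

$$\begin{pmatrix} 2 & 1 & 1 \\ 4 & 5 & 2 \\ 2 & -2 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} 1 & \frac12 & \frac12 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$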
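
The construction of $L$, $D$ and $V$ can be sketched as follows. This is a minimal illustration, assuming a regular matrix (no row interchanges); the function name `ldv` is hypothetical, and exact fractions are used so that the output matches the hand computation.

```python
from fractions import Fraction

def ldv(a):
    """Return (L, D, V) with A = L D V for a regular square matrix A."""
    n = len(a)
    u = [[Fraction(x) for x in row] for row in a]
    l = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        if u[j][j] == 0:
            raise ValueError("zero pivot: the matrix is not regular")
        for i in range(j + 1, n):
            # Record the multiplier in L and subtract the multiple of the pivot row.
            l[i][j] = u[i][j] / u[j][j]
            u[i] = [x - l[i][j] * y for x, y in zip(u[i], u[j])]
    # D carries the pivots; V is U with each row divided by its pivot.
    d = [[u[i][i] if i == j else Fraction(0) for j in range(n)] for i in range(n)]
    v = [[u[i][j] / u[i][i] for j in range(n)] for i in range(n)]
    return l, d, v

L, D, V = ldv([[2, 1, 1], [4, 5, 2], [2, -2, 0]])
```

For the matrix above, `L`, `D` and `V` should agree with the three factors displayed in the example.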

Proposition 1.28. If $A = L U$ is regular, then the factors $L$ and $U$ are each uniquely
determined. The same holds for its $A = L D V$ factorization.
Proof: Suppose $L U = \tilde L \tilde U$. Since the diagonal entries of all four matrices are
nonzero, Lemma 1.25 implies that they are invertible. Therefore,

$$\tilde L^{-1} L = \tilde L^{-1} L U U^{-1} = \tilde L^{-1} \tilde L \tilde U U^{-1} = \tilde U U^{-1}. \tag{1.45}$$

The left hand side of the matrix equation (1.45) is the product of two special lower
triangular matrices, and so, according to Lemma 1.2, is itself special lower triangular with
1's on the diagonal. The right hand side is the product of two upper triangular matrices,
and hence is itself upper triangular. Comparing the individual entries, the only way such a
special lower triangular matrix could equal an upper triangular matrix is if they both equal
the diagonal identity matrix. Therefore, $\tilde L^{-1} L = I = \tilde U U^{-1}$, which implies that
$\tilde L = L$ and $\tilde U = U$, and proves the result. The L D V version is an immediate
consequence. Q.E.D.

As you may have guessed, the more general cases requiring one or more row interchanges lead to a permuted L D V factorization in the following form.

Theorem 1.29. A matrix $A$ is nonsingular if and only if there is a permutation
matrix $P$ such that

$$P A = L D V, \tag{1.46}$$

where $L$, $D$, $V$ are as before.
Uniqueness does not hold for the more general permuted factorizations (1.29), (1.46)
since there may be various permutation matrices that place a matrix A in regular form
P A; see Exercise for an explicit example. Moreover, unlike the regular case, the pivots,
i.e., the diagonal entries of U , are no longer uniquely defined, but depend on the particular
combination of row interchanges employed during the course of the computation.

1.6. Transposes and Symmetric Matrices.

Another basic operation on a matrix is to interchange its rows and columns. If $A$ is
an $m \times n$ matrix, then its transpose, denoted $A^T$, is the $n \times m$ matrix whose $(i, j)$
entry equals the $(j, i)$ entry of $A$; thus

$$B = A^T \qquad \text{means that} \qquad b_{ij} = a_{ji}.$$
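For example, if

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \qquad \text{then} \qquad A^T = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}.$$

Note that the rows of $A$ are the columns of $A^T$ and vice versa. In particular, the transpose
of a row vector is a column vector, while the transpose of a column vector is a row vector.
For example, if

$$v = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \qquad \text{then} \qquad v^T = ( 1 \ \ 2 \ \ 3 ).$$

The transpose of a scalar, considered as a $1 \times 1$ matrix, is itself: $c^T = c$ for $c \in \mathbb{R}$.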

Remark: Most vectors appearing in applied mathematics are column vectors. To
conserve vertical space in this text, we will often use the transpose notation, e.g.,
$v = ( v_1 \ \ v_2 \ \ v_3 )^T$, as a compact way of writing column vectors.
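In the square case, transpose can be viewed as "reflecting" the matrix entries across
the main diagonal. For example,

$$\begin{pmatrix} 1 & 2 & 1 \\ 3 & 0 & 5 \\ 2 & 4 & 8 \end{pmatrix}^T = \begin{pmatrix} 1 & 3 & 2 \\ 2 & 0 & 4 \\ 1 & 5 & 8 \end{pmatrix}.$$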
In particular, the transpose of a lower triangular matrix is upper triangular and vice-versa.
Performing the transpose twice gets you back to where you started:

$$(A^T)^T = A. \tag{1.47}$$

Unlike the inverse, the transpose is compatible with matrix addition and scalar
multiplication:

$$(A + B)^T = A^T + B^T, \qquad (c A)^T = c A^T. \tag{1.48}$$

The transpose is also compatible with matrix multiplication, but with a twist. Like the
inverse, the transpose reverses the order of multiplication:

$$(A B)^T = B^T A^T. \tag{1.49}$$

The proof of (1.49) is a straightforward consequence of the basic laws of matrix
multiplication. An important special case is the product between a row vector $v^T$ and a
column vector $w$. In this case,

$$v^T w = (v^T w)^T = w^T v, \tag{1.50}$$

because the product is a scalar and so equals its own transpose.
Lemma 1.30. The operations of transpose and inverse commute. In other words, if
$A$ is invertible, so is $A^T$, and its inverse is

$$A^{-T} = (A^T)^{-1} = (A^{-1})^T. \tag{1.51}$$

Proof: Let $Y = (A^{-1})^T$. Then, according to (1.49),

$$Y A^T = (A^{-1})^T A^T = (A A^{-1})^T = I^T = I.$$

The proof that $A^T Y = I$ is similar, and so we conclude that $Y = (A^T)^{-1}$.
Q.E.D.
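
For completeness, the second half of the argument in the proof of Lemma 1.30 can be written out in the same way. With $Y = (A^{-1})^T$ as in the proof, applying (1.49) to the product in the opposite order gives:

```latex
A^T Y \;=\; A^T (A^{-1})^T \;=\; (A^{-1} A)^T \;=\; I^T \;=\; I .
```

Together with $Y A^T = I$, this shows that $Y$ is a two-sided inverse of $A^T$.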

Factorization of Symmetric Matrices

The most important class of square matrices are those that are unchanged by the
transpose operation.
Definition 1.31. A square matrix is symmetric if it equals its own transpose: $A = A^T$.
Thus, $A$ is symmetric if and only if its entries satisfy $a_{ji} = a_{ij}$ for all $i, j$. In other
words, entries lying in "mirror image" positions relative to the main diagonal must be
equal. For example, the most general symmetric $3 \times 3$ matrix has the form

$$A = \begin{pmatrix} a & b & c \\ b & d & e \\ c & e & f \end{pmatrix}.$$

Note that any diagonal matrix, including the identity, is symmetric. A lower or upper
triangular matrix is symmetric if and only if it is, in fact, a diagonal matrix.
The L D V factorization of a nonsingular matrix takes a particularly simple form if
the matrix also happens to be symmetric. This result will form the foundation of some
significant later developments.

Theorem 1.32. A symmetric matrix $A$ is regular if and only if it can be factored as

$$A = L D L^T, \tag{1.52}$$

where $L$ is a special lower triangular matrix and $D$ is a diagonal matrix with nonzero
diagonal entries.
Proof: We already know, according to Theorem 1.27, that we can factorize

$$A = L D V. \tag{1.53}$$

We take the transpose of both sides of this equation and use the fact that the transpose of
a matrix product is the product of the transposes in the reverse order, whence

$$A^T = (L D V)^T = V^T D^T L^T = V^T D L^T, \tag{1.54}$$

where we used the fact that a diagonal matrix is automatically symmetric, $D^T = D$. Note
that $V^T$ is special lower triangular, and $L^T$ is special upper triangular. Therefore (1.54)
gives the L D V factorization of $A^T$.
In particular, if $A = A^T$, then we can invoke the uniqueness of the L D V factorization,
cf. Proposition 1.28, to conclude that $L = V^T$ and $V = L^T$ (which are two versions of
the same equation). Replacing $V$ by $L^T$ in (1.53) proves the factorization (1.52). Q.E.D.
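Example 1.33. Let us find the $L D L^T$ factorization of the particular symmetric matrix

$$A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}.$$

This is done by performing the usual Gaussian elimination algorithm. Subtracting twice
the first row from the second and also the first row from the third produces the matrix

$$\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & -1 \\ 0 & -1 & 3 \end{pmatrix}.$$

We then add one half of the second row of the latter matrix to its third row, resulting in
the upper triangular form

$$U = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & -1 \\ 0 & 0 & \frac52 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & \frac52 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & -\frac12 \\ 0 & 0 & 1 \end{pmatrix} = D V,$$

which we further factorize by dividing each row of $U$ by its pivot. On the other hand, the
special lower triangular matrix associated with the row operations is

$$L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -\frac12 & 1 \end{pmatrix},$$

which, as guaranteed by Theorem 1.32, is the transpose of $V = L^T$. Therefore, the desired
$A = L U = L D L^T$ factorizations of this particular symmetric matrix are

$$\begin{pmatrix} 1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -\frac12 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & -1 \\ 0 & 0 & \frac52 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -\frac12 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & \frac52 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & -\frac12 \\ 0 & 0 & 1 \end{pmatrix}.$$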
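
The same computation can be sketched in Python. The function name `ldlt` and its return convention (the factor $L$ together with the list of pivots forming the diagonal of $D$) are illustrative choices, not part of the text; the sketch assumes a regular symmetric matrix and uses exact fractions.

```python
from fractions import Fraction

def ldlt(a):
    """Return (L, pivots) with A = L diag(pivots) L^T for a regular symmetric A."""
    n = len(a)
    if any(a[i][j] != a[j][i] for i in range(n) for j in range(i)):
        raise ValueError("matrix is not symmetric")
    u = [[Fraction(x) for x in row] for row in a]
    l = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        if u[j][j] == 0:
            raise ValueError("zero pivot: the matrix is not regular")
        for i in range(j + 1, n):
            # Ordinary Gaussian elimination; the multipliers fill in L.
            l[i][j] = u[i][j] / u[j][j]
            u[i] = [x - l[i][j] * y for x, y in zip(u[i], u[j])]
    return l, [u[j][j] for j in range(n)]

A = [[1, 2, 1], [2, 6, 1], [1, 1, 4]]
L, pivots = ldlt(A)
```

Applied to the irregular matrix with rows (0, 1) and (1, 0), the sketch raises an error at the first pivot, since that symmetric matrix has no such factorization without a row interchange.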
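Example 1.34. Let us look at a general $2 \times 2$ symmetric matrix

$$A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}. \tag{1.55}$$

Regularity requires that the first pivot be $a \neq 0$. A single row operation will place $A$ in
upper triangular form

$$U = \begin{pmatrix} a & b \\ 0 & \dfrac{a c - b^2}{a} \end{pmatrix}.$$

The associated lower triangular matrix is $L = \begin{pmatrix} 1 & 0 \\ \frac{b}{a} & 1 \end{pmatrix}$. Thus, $A = L U$. Finally,
$D = \begin{pmatrix} a & 0 \\ 0 & \frac{a c - b^2}{a} \end{pmatrix}$ is just the diagonal part of $U$, and we find $U = D L^T$, so that the
$L D L^T$ factorization is explicitly given by

$$\begin{pmatrix} a & b \\ b & c \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \frac{b}{a} & 1 \end{pmatrix} \begin{pmatrix} a & 0 \\ 0 & \dfrac{a c - b^2}{a} \end{pmatrix} \begin{pmatrix} 1 & \frac{b}{a} \\ 0 & 1 \end{pmatrix}. \tag{1.56}$$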
Remark: If $A = L D L^T$, then $A$ is necessarily symmetric. Indeed,

$$A^T = (L D L^T)^T = (L^T)^T D^T L^T = L D L^T = A.$$

However, not every symmetric matrix has an $L D L^T$ factorization. A simple example is
the irregular but invertible $2 \times 2$ matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.

1.7. Practical Linear Algebra.

For pedagogical reasons, the examples and exercises that have been used to illustrate
the algorithms are all based on rather small ($2 \times 2$ or $3 \times 3$) matrices. In such cases,
or even for matrices of moderate size, the differences between the various approaches to
solving linear systems (Gauss, Gauss-Jordan, matrix inverse, etc.) are relatively unimportant,
particularly if one has a decent computer or even hand calculator to perform the tedious
parts. However, real-world applied mathematics deals with much larger linear systems,
and the design of efficient algorithms is critical. For example, numerical solutions of
ordinary differential equations will typically lead to matrices with hundreds or thousands of
entries, while numerical solution of partial differential equations arising in fluid and solid
mechanics, weather prediction, image and video processing, chemical reactions, quantum
mechanics, molecular dynamics, and many other areas will often lead to matrices with
millions of entries. It is not hard for such systems to tax even the most sophisticated
supercomputer. Thus, it is essential that we look into the computational details of
competing algorithms in order to compare their efficiency, and thereby gain some experience
with the issues underlying the design of high performance numerical algorithms.
The most basic question is: how many arithmetic operations are required for each of
our algorithms? We shall keep track of additions and multiplications separately, since the
latter typically take slightly longer to perform in a computer processor. However, we shall
not distinguish between addition and subtraction, nor between multiplication and division,
as these typically rely on the same floating point algorithm. We shall also assume that
the matrices and vectors are generic, with few, if any, zero entries. Modifications of the
basic algorithms for sparse matrices, meaning those that have lots of zero entries, are an
important topic of research, since these include many of the large matrices that appear
in applications to differential equations. We refer the interested reader to more advanced
numerical linear algebra texts, e.g., [121, 119], for further developments.
First, for ordinary multiplication of an $n \times n$ matrix $A$ and a vector $b$, each entry of
the product $A b$ requires $n$ multiplications of the form $a_{ij} b_j$ and $n - 1$ additions to sum
the resulting products. Since there are $n$ entries, this means a total of $n^2$ multiplications
and $n(n - 1) = n^2 - n$ additions. Thus, for a matrix of size $n = 100$, one needs about
10,000 distinct multiplications and a similar (but slightly fewer) number of additions. If
$n = 1{,}000{,}000 = 10^6$ then $n^2 = 10^{12}$, which is phenomenally large, and the total time
required to perform the computation becomes a significant issue.
(Footnote: See [31] for more sophisticated computational methods to speed up matrix
multiplication.)
Let us look at the regular Gaussian Elimination algorithm, referring back to our
program. First, we count how many arithmetic operations are based on the $j$th pivot
$m_{jj}$. For each of the $n - j$ rows lying below it, we must perform one division to compute
the factor $l_{ij} = m_{ij} / m_{jj}$ used in the elementary row operation. The entries in the column
below the pivot will be made equal to zero automatically, and so we need only compute the
updated entries lying below and to the right of the pivot. There are $(n - j)^2$ such entries in
the coefficient matrix and an additional $n - j$ entries in the last column of the augmented
matrix. Let us concentrate on the former for the moment. For each of these, we replace
$m_{ik}$ by $m_{ik} - l_{ij} m_{jk}$, and so must perform one multiplication and one addition. Therefore,
for the $j$th pivot there are a total of $(n - j)(n - j + 1)$ multiplications, including the
initial $n - j$ divisions needed to produce the $l_{ij}$, and $(n - j)^2$ additions needed to update
the coefficient matrix. Therefore, to reduce a regular $n \times n$ matrix to upper triangular
form requires a total of

$$\sum_{j=1}^{n} (n - j)(n - j + 1) = \frac{n^3 - n}{3} \quad \text{multiplications, and} \tag{1.57}$$

$$\sum_{j=1}^{n} (n - j)^2 = \frac{2 n^3 - 3 n^2 + n}{6} \quad \text{additions.} \tag{1.58}$$

Thus, when $n$ is large, both require approximately $\frac13 n^3$ operations.
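
The two totals can be checked by brute force. The following sketch simply walks through the loop structure of the elimination described above and tallies the operations on the coefficient matrix, without performing any arithmetic on actual entries; the function name `elimination_counts` is hypothetical.

```python
def elimination_counts(n):
    """Count multiplications/divisions and additions used to reduce an
    n x n coefficient matrix to upper triangular form."""
    mults = adds = 0
    for j in range(1, n + 1):              # j-th pivot
        for i in range(j + 1, n + 1):      # each of the n - j rows below it
            mults += 1                     # one division to form l_ij
            for k in range(j + 1, n + 1):  # entries to the right of the pivot column
                mults += 1                 # l_ij * m_jk
                adds += 1                  # m_ik - (that product)
    return mults, adds

mults, adds = elimination_counts(10)
```

For $n = 10$ the closed forms (1.57) and (1.58) give 330 multiplications and 285 additions, which is what the tally should reproduce.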

We should also be keeping track of the number of operations on the right hand side
of the system. No pivots appear there, and so there are

$$\sum_{j=1}^{n} (n - j) = \frac{n^2 - n}{2} \tag{1.59}$$

multiplications and the same number of additions required to produce the right hand side
in the resulting triangular system $U x = c$. For large $n$, this number is considerably smaller
than the coefficient matrix totals (1.57), (1.58).
The next phase of the algorithm can be similarly analyzed. To find the value of

$$x_j = \frac{1}{u_{jj}} \left( c_j - \sum_{i=j+1}^{n} u_{ji} x_i \right)$$

once we have computed $x_{j+1}, \ldots, x_n$, requires $n - j + 1$ multiplications/divisions and
$n - j$ additions. Therefore, the Back Substitution phase of the algorithm requires

$$\sum_{j=1}^{n} (n - j + 1) = \frac{n^2 + n}{2} \quad \text{multiplications, and} \quad \sum_{j=1}^{n} (n - j) = \frac{n^2 - n}{2} \quad \text{additions.} \tag{1.60}$$

For $n$ large, both of these are approximately equal to $\frac12 n^2$. Comparing these results, we
conclude that the bulk of the computational effort goes into the reduction of the coefficient
matrix to upper triangular form.
(Footnote to the summation formulae above: in an exercise, the reader is asked to prove
them by induction.)
Forward substitution, to solve $L c = b$, has the same operations count, except that
since the diagonal entries of $L$ are all equal to 1, no divisions are required, and so we use a
total of $\frac12 (n^2 - n)$ multiplications and the same number of additions. Thus, once we have
computed the L U decomposition of the matrix $A$, the Forward and Back Substitution
process requires about $n^2$ arithmetic operations of the two types, which is the same as the
number of operations needed to perform the matrix multiplication $A^{-1} b$. Thus, even if
we know the inverse of the coefficient matrix $A$, it is still just as efficient to use Forward
and Back Substitution to compute the solution!
As noted above, the computation of $L$ and $U$ requires about $\frac13 n^3$ arithmetic operations
of each type. On the other hand, to complete the full-blown Gauss-Jordan elimination
scheme, we must perform all the elementary row operations on the large augmented matrix,
which has size $n \times 2n$. Therefore, during the reduction to upper triangular form, there are
an additional $\frac12 n^3$ operations of each type required. Moreover, we then need to perform
an additional $\frac13 n^3$ operations to reduce $U$ to the identity matrix, and a corresponding
$\frac12 n^3$ operations on the right hand matrix, too. (All these are approximate totals, based on
the leading term in the actual count.) Therefore, Gauss-Jordan requires a grand total of
$\frac53 n^3$ operations to complete, just to find $A^{-1}$; multiplying the right hand side to obtain
the solution $x = A^{-1} b$ involves another $n^2$ operations. Thus, the Gauss-Jordan method
requires approximately five times as many arithmetic operations, and so would take five
times as long to complete, as compared to the more elementary Gaussian Elimination and
Back Substitution algorithm. These observations serve to justify our earlier contention
that matrix inversion is inefficient, and should never be used to solve linear systems in
practice.

Tridiagonal Matrices

Of course, in special cases, the arithmetic operation count might be considerably
reduced, particularly if $A$ is a sparse matrix with many zero entries. A number of
specialized techniques have been designed to handle such sparse linear systems. A particularly
important class are the tridiagonal matrices

$$A = \begin{pmatrix} q_1 & r_1 & & & & \\ p_1 & q_2 & r_2 & & & \\ & p_2 & q_3 & r_3 & & \\ & & \ddots & \ddots & \ddots & \\ & & & p_{n-2} & q_{n-1} & r_{n-1} \\ & & & & p_{n-1} & q_n \end{pmatrix} \tag{1.61}$$

with all entries zero except for those on the main diagonal, $a_{i,i} = q_i$, on the subdiagonal,
meaning the $n - 1$ entries $a_{i+1,i} = p_i$ immediately below the main diagonal, and the
superdiagonal, meaning the entries $a_{i,i+1} = r_i$ immediately above the main diagonal. (Zero
entries are left blank.) Such matrices arise in the numerical solution of ordinary differential
equations and the spline fitting of curves for interpolation and computer graphics. If
$A = L U$ is regular, it turns out that the factors are lower and upper bidiagonal matrices,

$$L = \begin{pmatrix} 1 & & & & & \\ l_1 & 1 & & & & \\ & l_2 & 1 & & & \\ & & \ddots & \ddots & & \\ & & & l_{n-2} & 1 & \\ & & & & l_{n-1} & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} d_1 & u_1 & & & & \\ & d_2 & u_2 & & & \\ & & d_3 & u_3 & & \\ & & & \ddots & \ddots & \\ & & & & d_{n-1} & u_{n-1} \\ & & & & & d_n \end{pmatrix}. \tag{1.62}$$
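Multiplying out $L U$, and equating the result to $A$ leads to the equations

$$\begin{array}{lll}
d_1 = q_1, & u_1 = r_1, & l_1 d_1 = p_1, \\
l_1 u_1 + d_2 = q_2, & u_2 = r_2, & l_2 d_2 = p_2, \\
\qquad \vdots & \qquad \vdots & \qquad \vdots \\
l_{j-1} u_{j-1} + d_j = q_j, & u_j = r_j, & l_j d_j = p_j, \\
\qquad \vdots & \qquad \vdots & \qquad \vdots \\
l_{n-2} u_{n-2} + d_{n-1} = q_{n-1}, & u_{n-1} = r_{n-1}, & l_{n-1} d_{n-1} = p_{n-1}, \\
l_{n-1} u_{n-1} + d_n = q_n. & &
\end{array} \tag{1.63}$$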
These elementary algebraic equations can be successively solved for the entries of $L$ and $U$
in the order $d_1, u_1, l_1, d_2, u_2, l_2, d_3, u_3, \ldots$. The original matrix $A$ is regular provided
none of the diagonal entries $d_1, d_2, \ldots$ are zero, which allows the recursive procedure to
proceed.
Once the L U factors are in place, we can apply Forward and Back Substitution to solve
the tridiagonal linear system $A x = b$. We first solve $L c = b$ by Forward Substitution,
which leads to the recursive equations

$$c_1 = b_1, \qquad c_2 = b_2 - l_1 c_1, \qquad \ldots \qquad c_n = b_n - l_{n-1} c_{n-1}. \tag{1.64}$$

We then solve $U x = c$ by Back Substitution, again recursively:

$$x_n = \frac{c_n}{d_n}, \qquad x_{n-1} = \frac{c_{n-1} - u_{n-1} x_n}{d_{n-1}}, \qquad \ldots \qquad x_1 = \frac{c_1 - u_1 x_2}{d_1}. \tag{1.65}$$

As you can check, there are a total of $5n - 4$ multiplications/divisions and $3n - 3$
additions/subtractions required to solve a general tridiagonal system of $n$ linear equations,
a striking improvement over the general case.
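
The recursions (1.63), (1.64) and (1.65) translate almost line by line into code. The following Python sketch is only an illustration under the assumption that the matrix is regular; the function name `solve_tridiagonal` and the argument layout (three lists holding the sub-, main and superdiagonal) are choices made here, not part of the text.

```python
def solve_tridiagonal(p, q, r, b):
    """Solve A x = b for a regular tridiagonal A with subdiagonal p (n-1 entries),
    diagonal q (n entries) and superdiagonal r (n-1 entries)."""
    n = len(q)
    # Factorization (1.63): u_j = r_j, l_j = p_j / d_j, d_{j+1} = q_{j+1} - l_j u_j.
    d = [0.0] * n
    l = [0.0] * (n - 1)
    d[0] = q[0]
    for j in range(n - 1):
        if d[j] == 0:
            raise ValueError("zero pivot: the matrix is not regular")
        l[j] = p[j] / d[j]
        d[j + 1] = q[j + 1] - l[j] * r[j]
    if d[n - 1] == 0:
        raise ValueError("zero pivot: the matrix is not regular")
    # Forward Substitution (1.64): solve L c = b.
    c = [0.0] * n
    c[0] = b[0]
    for j in range(1, n):
        c[j] = b[j] - l[j - 1] * c[j - 1]
    # Back Substitution (1.65): solve U x = c.
    x = [0.0] * n
    x[n - 1] = c[n - 1] / d[n - 1]
    for j in range(n - 2, -1, -1):
        x[j] = (c[j] - r[j] * x[j + 1]) / d[j]
    return x

# A 5 x 5 system with 4 on the diagonal and 1 next to it; the right hand
# side is chosen so that the exact solution is (1, 2, 3, 4, 5).
x = solve_tridiagonal([1.0] * 4, [4.0] * 5, [1.0] * 4, [6.0, 12.0, 18.0, 24.0, 24.0])
```

Each loop runs once over the rows, which is where the linear operation count comes from.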
Example 1.35. Consider the $n \times n$ tridiagonal matrix

$$A = \begin{pmatrix} 4 & 1 & & & & \\ 1 & 4 & 1 & & & \\ & 1 & 4 & 1 & & \\ & & \ddots & \ddots & \ddots & \\ & & & 1 & 4 & 1 \\ & & & & 1 & 4 \end{pmatrix}$$

in which the diagonal entries are all $q_i = 4$, while the entries immediately above and below
the main diagonal are all $p_i = r_i = 1$. According to (1.63), the tridiagonal factorization
(1.62) has $u_1 = u_2 = \ldots = u_{n-1} = 1$, while

$$d_1 = 4, \qquad l_j = 1/d_j, \qquad d_{j+1} = 4 - l_j, \qquad j = 1, 2, \ldots, n - 1.$$

The computed values are

    j     d_j         l_j
    1     4           .25
    2     3.75        .266666
    3     3.733333    .267857
    4     3.732143    .267942
    5     3.732057    .267948
    6     3.732051    .267949

These converge rapidly to

$$d_j \to 2 + \sqrt{3} = 3.732051\ldots, \qquad l_j \to 2 - \sqrt{3} = .2679492\ldots,$$

which makes the factorization for large $n$ almost trivial. The numbers $2 \pm \sqrt{3}$ are the
roots of the quadratic equation $x^2 - 4x + 1 = 0$; an explanation of this observation will be
revealed in Chapter 19.
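
The recursion for $d_j$ and $l_j$ is easy to reproduce numerically. The sketch below, with the hypothetical function name `factor_4_1_1`, iterates it in floating point and lets one compare the last computed values with $2 \pm \sqrt{3}$.

```python
import math

def factor_4_1_1(n):
    """Pivots d_1..d_n and multipliers l_1..l_{n-1} for the n x n matrix
    with 4 on the diagonal and 1 on the sub- and superdiagonals."""
    d = [4.0]
    l = []
    for j in range(n - 1):
        l.append(1.0 / d[j])       # l_j = 1 / d_j
        d.append(4.0 - l[j])       # d_{j+1} = 4 - l_j
    return d, l

d, l = factor_4_1_1(12)
gap_d = abs(d[-1] - (2 + math.sqrt(3)))   # distance of d_12 from its limit
gap_l = abs(l[-1] - (2 - math.sqrt(3)))   # distance of l_11 from its limit
```

The gaps shrink by roughly a factor of $(2 - \sqrt{3})^2 \approx .07$ per step, which is why a handful of iterations already agrees with the limit to six decimal places.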

Pivoting Strategies
Let us now consider the practical side of pivoting. As we know, in the irregular
situations when a zero shows up in a diagonal pivot position, a row interchange is required
to proceed with the elimination algorithm. But even when a nonzero element appears in
the current pivot position, there may be good numerical reasons for exchanging rows in
order to install a more desirable element in the pivot position. Here is a simple example:

$$.01\, x + 1.6\, y = 32.1, \qquad x + .6\, y = 22. \tag{1.66}$$

The exact solution to the system is $x = 10$, $y = 20$. Suppose we are working with a
very primitive calculator that only retains 3 digits of accuracy. (Of course, this is not
a very realistic situation, but the example could be suitably modified to produce similar
difficulties no matter how many digits of accuracy our computer retains.) The augmented
matrix is

$$\left(\begin{array}{cc|c} .01 & 1.6 & 32.1 \\ 1 & .6 & 22 \end{array}\right).$$

Choosing the (1, 1) entry as our pivot, and subtracting 100 times the first row from the
second produces the upper triangular form

$$\left(\begin{array}{cc|c} .01 & 1.6 & 32.1 \\ 0 & -159.4 & -3188 \end{array}\right).$$

Since our calculator has only three-place accuracy, it will round the entries in the second
row, producing the augmented coefficient matrix

$$\left(\begin{array}{cc|c} .01 & 1.6 & 32.1 \\ 0 & -159.0 & -3190 \end{array}\right).$$
Gaussian Elimination With Partial Pivoting

start
    for i = 1 to n
        set σ(i) = i
    next i
    for j = 1 to n
        if m_{σ(i),j} = 0 for all i ≥ j, stop; print "A is singular"
        choose i ≥ j such that |m_{σ(i),j}| is maximal
        interchange σ(i) ←→ σ(j)
        for i = j + 1 to n
            set l_{σ(i)j} = m_{σ(i)j} / m_{σ(j)j}
            for k = j + 1 to n + 1
                set m_{σ(i)k} = m_{σ(i)k} − l_{σ(i)j} m_{σ(j)k}
            next k
        next i
    next j
end

The solution by back substitution gives $y = 3190/159 = 20.0628\ldots \simeq 20.1$, and then
$x = 100\,(32.1 - 1.6\, y) = 100\,(32.1 - 32.16) \simeq 100\,(32.1 - 32.2) = -10$. The relatively small
error in $y$ has produced a very large error in $x$; not even its sign is correct!
The problem is that the first pivot, .01, is much smaller than the other element, 1,
that appears in the column below it. Interchanging the two rows before performing the row
operation would resolve the difficulty, even with such an inaccurate calculator! After
the interchange, we have

$$\left(\begin{array}{cc|c} 1 & .6 & 22 \\ .01 & 1.6 & 32.1 \end{array}\right),$$

which results in the rounded-off upper triangular form

$$\left(\begin{array}{cc|c} 1 & .6 & 22 \\ 0 & 1.594 & 31.88 \end{array}\right) \simeq \left(\begin{array}{cc|c} 1 & .6 & 22 \\ 0 & 1.59 & 31.9 \end{array}\right).$$

The solution by back substitution now gives a respectable answer:

$$y = 31.9/1.59 = 20.0628\ldots \simeq 20.1, \qquad x = 22 - .6\, y = 22 - 12.06 \simeq 22 - 12.1 = 9.9.$$

The general strategy, known as Partial Pivoting, says that at each stage, we should
use the largest (in absolute value) legitimate element, i.e., one lying on or below the
diagonal, as the pivot, even if the diagonal element is nonzero. In a computer implementation
of pivoting, there is no need to waste processor time physically exchanging the row entries
in memory. Rather, one introduces a separate array of pointers that serve to indicate which
original row is currently in which permuted position. More specifically, one initializes $n$ row
pointers $\sigma(1) = 1, \ldots, \sigma(n) = n$. Interchanging row $i$ and row $j$ of the coefficient or
augmented matrix is then accomplished by merely interchanging $\sigma(i)$ and $\sigma(j)$. Thus, to
access a matrix element that is currently in row $i$ of the augmented matrix, one merely
retrieves the element that is in row $\sigma(i)$ in the computer's memory. An explicit
implementation of this strategy, a program for partial pivoting that includes row pointers,
appears above.
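
The three-digit calculator experiment can be simulated with Python's `decimal` module, whose context precision rounds the result of every arithmetic operation to a fixed number of significant digits. The sketch below follows the row-pointer scheme of the pseudocode; the function name `solve_rounded` and its parameters are illustrative assumptions, and the rounding mode is the module's default, which may differ in borderline cases from a particular calculator.

```python
from decimal import Decimal, localcontext

def solve_rounded(aug, pivoting, digits=3):
    """Solve a linear system given by its augmented matrix (entries as strings),
    rounding every arithmetic result to `digits` significant digits."""
    n = len(aug)
    with localcontext() as ctx:
        ctx.prec = digits
        m = [[Decimal(s) for s in row] for row in aug]
        sigma = list(range(n))                 # row pointers
        for j in range(n):
            if pivoting:
                # Largest entry in absolute value on or below the diagonal.
                best = max(range(j, n), key=lambda i: abs(m[sigma[i]][j]))
                sigma[j], sigma[best] = sigma[best], sigma[j]
            if m[sigma[j]][j] == 0:
                raise ValueError("zero pivot")
            for i in range(j + 1, n):
                l = m[sigma[i]][j] / m[sigma[j]][j]
                for k in range(j + 1, n + 1):
                    m[sigma[i]][k] = m[sigma[i]][k] - l * m[sigma[j]][k]
        # Back substitution through the row pointers.
        x = [Decimal(0)] * n
        for j in reversed(range(n)):
            s = m[sigma[j]][n]
            for i in range(j + 1, n):
                s = s - m[sigma[j]][i] * x[i]
            x[j] = s / m[sigma[j]][j]
    return x

system = [["0.01", "1.6", "32.1"], ["1", "0.6", "22"]]
plain = solve_rounded(system, pivoting=False)
pivoted = solve_rounded(system, pivoting=True)
```

Without pivoting the simulated calculator should return the badly wrong value of $x$ found in the text, while with pivoting both unknowns come out close to the exact solution $x = 10$, $y = 20$. No rows are ever moved in memory; only the entries of `sigma` are exchanged.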
Partial pivoting will solve most problems, although there can still be difficulties. For
instance, it will not handle the system

$$10\, x + 1600\, y = 3210, \qquad x + .6\, y = 22,$$

obtained by multiplying the first equation in (1.66) by 1000. The tip-off is that, while the
entries in the column containing the pivot are smaller, those in its row are much larger. The
solution to this difficulty is Full Pivoting, in which one also performs column interchanges
(preferably with a column pointer) to move the largest legitimate element into the
pivot position. In practice, a column interchange is just a reordering of the variables in
the system, which, as long as one keeps proper track of the order, also doesn't change the
solutions.
Finally, there are some matrices that are hard to handle even with pivoting tricks.
Such ill-conditioned matrices are typically characterized by being "almost" singular. A
famous example of an ill-conditioned matrix is the $n \times n$ Hilbert matrix

$$H_n = \begin{pmatrix} 1 & \frac12 & \frac13 & \frac14 & \cdots & \frac1n \\ \frac12 & \frac13 & \frac14 & \frac15 & \cdots & \frac1{n+1} \\ \frac13 & \frac14 & \frac15 & \frac16 & \cdots & \frac1{n+2} \\ \frac14 & \frac15 & \frac16 & \frac17 & \cdots & \frac1{n+3} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac1n & \frac1{n+1} & \frac1{n+2} & \frac1{n+3} & \cdots & \frac1{2n-1} \end{pmatrix}. \tag{1.67}$$

In Proposition 3.36 we will prove that $H_n$ is nonsingular for all $n$. However, the solution of
a linear system whose coefficient matrix is a Hilbert matrix $H_n$, even for moderately sized
$n$, is a very challenging problem, even if one uses high precision computer arithmetic.
This is because the larger $n$ is, the closer $H_n$ is, in a sense, to being singular.
(Footnote on "almost" singular: This can be quantified by saying that their determinant is
very small, but non-zero; see also Sections 8.5 and 10.3.)
(Footnote on high precision arithmetic: In computer algebra systems such as Maple or
Mathematica, one can use exact rational arithmetic to perform the computations. Then the
important issues are time and computational efficiency.)
The reader is urged to try the following computer experiment. Fix a moderately large
value of n, say 20. Choose a column vector x with n entries chosen at random. Compute
b = Hn x directly. Then try to solve the system Hn x = b by Gaussian Elimination. If
it works for n = 20, try n = 50 or 100. This will give you a good indicator of the degree
of precision used by your computer program, and the accuracy of the numerical solution
algorithm.
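
One possible version of this experiment in pure Python is sketched below. The function names are illustrative, ordinary double precision floats are assumed, and the "random" vector is replaced by a vector of all ones so that the run is reproducible; substituting random entries is a one-line change.

```python
def hilbert(n):
    """The n x n Hilbert matrix, with entries 1 / (i + j - 1) for i, j = 1..n."""
    return [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]

def solve(a, b):
    """Gaussian Elimination with partial pivoting, followed by Back Substitution."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]    # augmented matrix
    for j in range(n):
        p = max(range(j, n), key=lambda i: abs(m[i][j]))
        if m[p][j] == 0.0:
            raise ValueError("matrix is singular to working precision")
        m[j], m[p] = m[p], m[j]
        for i in range(j + 1, n):
            f = m[i][j] / m[j][j]
            for k in range(j, n + 1):
                m[i][k] -= f * m[j][k]
    x = [0.0] * n
    for j in range(n - 1, -1, -1):
        s = m[j][n] - sum(m[j][i] * x[i] for i in range(j + 1, n))
        x[j] = s / m[j][j]
    return x

def hilbert_error(n):
    """Largest error in any entry when recovering x from b = H_n x."""
    h = hilbert(n)
    x_true = [1.0] * n                               # fixed stand-in for a random vector
    b = [sum(h[i][j] * x_true[j] for j in range(n)) for i in range(n)]
    x = solve(h, b)
    return max(abs(xi - ti) for xi, ti in zip(x, x_true))

errors = {n: hilbert_error(n) for n in (3, 6, 10)}
```

Printing `errors` shows how quickly the accuracy degrades as $n$ grows; the exact figures depend on the machine arithmetic, so none are quoted here.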

1.8. General Linear Systems.

So far, we have only treated linear systems involving the same number of equations as
unknowns, and then only those with nonsingular coefficient matrices. These are precisely
the systems that always have a unique solution. We now turn to the problem of solving
a general linear system of $m$ equations in $n$ unknowns. The cases not covered as yet
are rectangular systems, with $m \neq n$, as well as square systems with singular coefficient
matrices. The basic idea underlying the Gaussian Elimination Algorithm for nonsingular
systems can be straightforwardly adapted to these cases, too. One systematically utilizes
the same elementary row operations so as to manipulate the coefficient matrix into a
particular reduced form generalizing the upper triangular form we aimed for in the earlier
square, nonsingular cases.
Definition 1.36. An $m \times n$ matrix is said to be in row echelon form if it has the
following "staircase" structure:

$$U = \begin{pmatrix}
\circledast & * & \cdots & * & * & \cdots & * & * & \cdots & * \\
0 & 0 & \cdots & 0 & \circledast & \cdots & * & * & \cdots & * \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 & \circledast & \cdots & * \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}$$

The entries indicated by $\circledast$ are the pivots, and must be nonzero. The first $r$ rows of $U$
each contain one pivot, but not all columns need to contain a pivot. The entries below the
"staircase" are all zero, while the non-pivot entries above the staircase, indicated by stars,
can be anything. The last $m - r$ rows are identically zero, and do not contain any pivots.
Here is an explicit example of a matrix in row echelon form:

\[
\begin{pmatrix}
3 & 1 & 0 & 2 & 5 & 1 \\
0 & 1 & 2 & 1 & 8 & 0 \\
0 & 0 & 0 & 0 & 2 & 4 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
\]

The three pivots, which are the first nonzero entries in the three nonzero rows, are, respectively, 3, 1, 2. There may, in exceptional situations, be one or more initial all zero
columns.
Proposition 1.37. Any matrix can be reduced to row echelon form by a sequence
of elementary row operations of Types #1 and #2.
In matrix language, Proposition 1.37 implies that if A is any $m \times n$ matrix, then there
exists an $m \times m$ permutation matrix P and an $m \times m$ special lower triangular matrix L
such that
\[ P A = L U, \tag{1.68} \]
where U is in row echelon form. The factorization is not unique.
A constructive proof of this result is based on the general Gaussian elimination algorithm, which proceeds as follows. Starting at the top left of the matrix, one searches for
the first column which is not identically zero. Any of the nonzero entries in that column
may serve as the pivot. Partial pivoting indicates that it is probably best to choose the
largest one, although this is not essential for the algorithm to proceed. One places the
chosen pivot in the first row of the matrix via a row interchange, if necessary. The entries
below the pivot are made equal to zero by the appropriate elementary row operations of
Type #1. One then proceeds iteratively, performing the same reduction algorithm on the
submatrix consisting of all entries strictly to the right and below the pivot. The algorithm
terminates when either there is a pivot in the last row, or all of the rows lying below the
last pivot are identically zero, and so no more pivots can be found.
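The algorithm just described can be sketched as follows. This is a minimal illustration, assuming exact rational arithmetic via Python's `fractions` module; the function name `row_echelon` and the small test matrix are our own and are not taken from the text. The largest entry in each column is chosen as pivot, as partial pivoting suggests.

```python
from fractions import Fraction

def row_echelon(A):
    """Reduce A to row echelon form by row operations of Types #1 and #2.

    Returns (U, pivots), where pivots lists the 0-based (row, column)
    positions of the pivots.
    """
    U = [[Fraction(x) for x in row] for row in A]
    m, n = len(U), len(U[0])
    pivots = []
    r = 0                                    # row that receives the next pivot
    for c in range(n):
        if r == m:
            break                            # a pivot sits in the last row
        # Partial pivoting: largest entry of column c among rows r, ..., m-1.
        p = max(range(r, m), key=lambda i: abs(U[i][c]))
        if U[p][c] == 0:
            continue                         # no pivot available in this column
        U[r], U[p] = U[p], U[r]              # Type #2: row interchange
        for i in range(r + 1, m):            # Type #1: clear entries below pivot
            factor = U[i][c] / U[r][c]
            U[i] = [a - factor * b for a, b in zip(U[i], U[r])]
        pivots.append((r, c))
        r += 1
    return U, pivots

if __name__ == "__main__":
    # A hypothetical 3 x 4 matrix whose second column contains no pivot.
    U, pivots = row_echelon([[0, 0, 2, 4], [1, 2, 1, 3], [2, 4, 4, 10]])
    for row in U:
        print([str(x) for x in row])
    print(pivots)
```

Choosing the first nonzero entry instead of the largest one would also work, and would in general produce a different row echelon form.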
Example 1.38. Let us illustrate the general Gaussian Elimination algorithm with a
particular example. Consider the linear system
\[
\begin{aligned}
x + 3y + 2z - u &= a, \\
2x + 6y + z + 4u + 3v &= b, \\
-x - 3y - 3z + 3u + v &= c, \\
3x + 9y + 8z - 7u + 2v &= d,
\end{aligned}
\tag{1.69}
\]

of 4 equations in 5 unknowns, where a, b, c, d are specified numbers. (It will be convenient
to work with the right hand side in general form, although the reader may prefer, at least
initially, to assign specific values to a, b, c, d.) The coefficient matrix is

\[
A = \begin{pmatrix}
1 & 3 & 2 & -1 & 0 \\
2 & 6 & 1 & 4 & 3 \\
-1 & -3 & -3 & 3 & 1 \\
3 & 9 & 8 & -7 & 2
\end{pmatrix}.
\tag{1.70}
\]

To solve the system, we introduce the augmented matrix

\[
\left( \begin{array}{ccccc|c}
1 & 3 & 2 & -1 & 0 & a \\
2 & 6 & 1 & 4 & 3 & b \\
-1 & -3 & -3 & 3 & 1 & c \\
3 & 9 & 8 & -7 & 2 & d
\end{array} \right)
\]

obtained by appending the right hand side of the system. The upper left entry is nonzero,
and so can serve as the first pivot; we eliminate the entries below it by elementary row
operations, resulting in

\[
\left( \begin{array}{ccccc|c}
1 & 3 & 2 & -1 & 0 & a \\
0 & 0 & -3 & 6 & 3 & b - 2a \\
0 & 0 & -1 & 2 & 1 & c + a \\
0 & 0 & 2 & -4 & 2 & d - 3a
\end{array} \right).
\]

Now, the second column contains no suitable nonzero entry to serve as the second pivot.
(The top entry already lies in a row with a pivot in it, and so cannot be used.) Therefore,
we move on to the third column, choosing the (2, 3) entry, $-3$, as our second pivot. Again,
we eliminate the entries below it, leading to

\[
\left( \begin{array}{ccccc|c}
1 & 3 & 2 & -1 & 0 & a \\
0 & 0 & -3 & 6 & 3 & b - 2a \\
0 & 0 & 0 & 0 & 0 & c - \frac13 b + \frac53 a \\
0 & 0 & 0 & 0 & 4 & d + \frac23 b - \frac{13}{3} a
\end{array} \right).
\]

The final pivot is in the last column, and we interchange the last two rows in order to
place the coefficient matrix in row echelon form:

\[
\left( \begin{array}{ccccc|c}
1 & 3 & 2 & -1 & 0 & a \\
0 & 0 & -3 & 6 & 3 & b - 2a \\
0 & 0 & 0 & 0 & 4 & d + \frac23 b - \frac{13}{3} a \\
0 & 0 & 0 & 0 & 0 & c - \frac13 b + \frac53 a
\end{array} \right).
\tag{1.71}
\]

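There are three pivots, $1, -3, 4$, sitting in positions (1, 1), (2, 3) and (3, 5). Note the
staircase form, with the pivots on the steps and everything below the staircase being zero.
Recalling the row operations used to construct the solution (and keeping in mind that
the row interchange that appears at the end also affects the entries of L), we find the
factorization (1.68) has the explicit form

\[
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0
\end{pmatrix}
\begin{pmatrix}
1 & 3 & 2 & -1 & 0 \\
2 & 6 & 1 & 4 & 3 \\
-1 & -3 & -3 & 3 & 1 \\
3 & 9 & 8 & -7 & 2
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & 0 \\
2 & 1 & 0 & 0 \\
3 & -\frac23 & 1 & 0 \\
-1 & \frac13 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 3 & 2 & -1 & 0 \\
0 & 0 & -3 & 6 & 3 \\
0 & 0 & 0 & 0 & 4 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]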

We shall return to find the solution to our system after a brief theoretical interlude.
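As a quick sanity check, the factorization P A = L U for the coefficient matrix of (1.69) can be multiplied out mechanically. The sketch below assumes exact arithmetic with Python's `fractions` module; the helper `matmul` is our own, and the matrices are entered with the signs of the system (1.69), the permutation P interchanging the last two rows.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

P = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]

A = [[1, 3, 2, -1, 0],
     [2, 6, 1, 4, 3],
     [-1, -3, -3, 3, 1],
     [3, 9, 8, -7, 2]]

L = [[1, 0, 0, 0],
     [2, 1, 0, 0],
     [3, F(-2, 3), 1, 0],
     [-1, F(1, 3), 0, 1]]

U = [[1, 3, 2, -1, 0],
     [0, 0, -3, 6, 3],
     [0, 0, 0, 0, 4],
     [0, 0, 0, 0, 0]]

if __name__ == "__main__":
    print(matmul(P, A) == matmul(L, U))
```

The entries of L below the diagonal are exactly the multipliers used during the elimination, rearranged by the final row interchange.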

Warning: In the augmented matrix, pivots can never appear in the last column,
representing the right hand side of the system. Thus, even if $c - \frac13 b + \frac53 a \neq 0$, that entry
does not qualify as a pivot.
We now introduce the most important numerical quantity associated with a matrix.
Definition 1.39. The rank of a matrix A is the number of pivots.

For instance, the rank of the matrix (1.70) equals 3, since its reduced row echelon
form, i.e., the first five columns of (1.71), has three pivots. Since there is at most one pivot
per row and one pivot per column, the rank of an $m \times n$ matrix is bounded by both m and
n, and so $0 \le r \le \min\{m, n\}$. The only matrix of rank 0 is the zero matrix, which has no
pivots.
Proposition 1.40. A square matrix of size $n \times n$ is nonsingular if and only if its
rank is equal to n.
Indeed, the only way an $n \times n$ matrix can end up having n pivots is if its reduced row
echelon form is upper triangular with nonzero diagonal entries. But a matrix that reduces
to such triangular form is, by definition, nonsingular.
Interestingly, the rank of a matrix does not depend on which elementary row operations are performed along the way to row echelon form. Indeed, performing a different
sequence of row operations (say, using partial pivoting versus no pivoting) can produce
a completely different row echelon form. The remarkable fact, though, is that all such row
echelon forms end up having exactly the same number of pivots, and this number is the
rank of the matrix. A formal proof of this fact will appear in Chapter 2.
Once the coefficient matrix has been reduced to row echelon form, the solution proceeds as follows. The first step is to see if there are any incompatibilities. Suppose one
of the rows in the row echelon form of the coefficient matrix is identically zero, but the
corresponding entry in the last column of the augmented matrix is nonzero. What linear
equation would this represent? Well, the coefficients of all the variables are zero, and so
the equation is of the form 0 = c, where c, the number on the right hand side of the
equation, is the entry in the last column. If $c \neq 0$, then the equation cannot be satisfied.
Consequently, the entire system has no solutions, and is an incompatible linear system.
On the other hand, if c = 0, then the equation is merely 0 = 0, and so is trivially satisfied.
For example, the last row in the echelon form (1.71) is all zero, and hence the last entry in
the final column must also vanish in order that the system be compatible. Therefore, the
linear system (1.69) will have a solution if and only if the right hand sides a, b, c, d satisfy
the linear constraint
\[ \tfrac53 a - \tfrac13 b + c = 0. \tag{1.72} \]
In general, if the system is incompatible, there is nothing else to do. Otherwise,
every zero row in the echelon form of the augmented matrix also has a zero entry in the
last column, and the system is compatible, so one or more solutions exist. To find the
solution(s), we work backwards, starting with the last row that contains a pivot. The
variables in the system naturally split into two classes.
Definition 1.41. In a linear system U x = c in row echelon form, the variables
corresponding to columns containing a pivot are called basic variables, while the variables
corresponding to the columns without a pivot are called free variables.
The solution to the system proceeds by a version of the Back Substitution procedure.
The nonzero equations are solved, in reverse order, each for the basic variable corresponding to
its pivot. Each result is substituted into the preceding equations before they in turn are
solved. The remaining free variables, if any, are allowed to take on any values whatsoever,
and the solution then specifies all the basic variables in terms of the free variables, which
serve to parametrize the general solution.
Example 1.42. Let us illustrate this construction with our particular example.
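Assuming the compatibility condition (1.72), the reduced augmented matrix (1.71) is

\[
\left( \begin{array}{ccccc|c}
1 & 3 & 2 & -1 & 0 & a \\
0 & 0 & -3 & 6 & 3 & b - 2a \\
0 & 0 & 0 & 0 & 4 & d + \frac23 b - \frac{13}{3} a \\
0 & 0 & 0 & 0 & 0 & 0
\end{array} \right).
\]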

The pivots are found in columns 1, 3, 5, and so the corresponding variables, x, z, v, are
basic; the other variables, y, u, are free. We will solve the reduced system for the basic
variables in terms of the free variables.
As a specific example, the values a = 0, b = 3, c = 1, d = 1, satisfy the compatibility constraint (1.72). The resulting augmented echelon matrix (1.71) corresponds to the
system
\[
\begin{aligned}
x + 3y + 2z - u &= 0, \\
-3z + 6u + 3v &= 3, \\
4v &= 3, \\
0 &= 0.
\end{aligned}
\]
We now solve the equations, in reverse order, for the basic variables, and then substitute
the resulting values in the preceding equations. The result is the general solution
\[
v = \tfrac34, \qquad z = -1 + 2u + v = -\tfrac14 + 2u, \qquad x = -3y - 2z + u = \tfrac12 - 3y - 3u.
\]

The free variables y, u are completely arbitrary; any value they assume will produce a
solution to the original system. For instance, if $y = -2$, $u = 1 - \pi$, then $x = 3\pi + \tfrac72$,
$z = \tfrac74 - 2\pi$, $v = \tfrac34$. But keep in mind that this is merely one of an infinite number of
different solutions.
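The claim that every choice of the free variables gives a solution is easy to test. The following sketch substitutes the general solution back into the original system (1.69) with a = 0, b = 3, c = 1, d = 1; the helper names `general_solution` and `residual` are ours, and exact rational arithmetic is assumed so that the residuals vanish identically rather than approximately.

```python
from fractions import Fraction as F

# Coefficient matrix of (1.69) and the right hand sides a, b, c, d.
A = [[1, 3, 2, -1, 0],
     [2, 6, 1, 4, 3],
     [-1, -3, -3, 3, 1],
     [3, 9, 8, -7, 2]]
rhs = [0, 3, 1, 1]

def general_solution(y, u):
    """Basic variables x, z, v expressed through the free variables y, u."""
    v = F(3, 4)
    z = F(-1, 4) + 2 * u
    x = F(1, 2) - 3 * y - 3 * u
    return [x, y, z, u, v]

def residual(sol):
    """Left hand side minus right hand side, one entry per equation."""
    return [sum(a * s for a, s in zip(row, sol)) - r
            for row, r in zip(A, rhs)]

if __name__ == "__main__":
    for y, u in [(0, 0), (-2, 1), (F(5, 7), F(-3, 2))]:
        print(y, u, residual(general_solution(y, u)))
```

Each printed residual should be the zero vector, whichever rational values of y and u are supplied.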
In general, if the $m \times n$ coefficient matrix of a system of m linear equations in n
unknowns has rank r, there are $m - r$ all zero rows in the row echelon form, and these
$m - r$ equations must have zero right hand side in order that the system be compatible and
have a solution. Moreover, there are a total of r basic variables and $n - r$ free variables,
and so the general solution depends upon $n - r$ parameters.
Summarizing the preceding discussion, we have learned that there are only three
possible outcomes for the solution to a general linear system.
Theorem 1.43. A system A x = b of m linear equations in n unknowns has either
(i) exactly one solution, (ii) no solutions, or (iii) infinitely many solutions.
Case (ii) occurs if the system is incompatible, producing a zero row in the echelon
form that has a nonzero right hand side. Case (iii) occurs if the system is compatible and
there are one or more free variables, that is, when the
rank of the coefficient matrix is strictly less than the number of columns: r < n. Case
(i) occurs for nonsingular square coefficient matrices, and, more generally, for compatible
systems for which r = n, implying there are no free variables. Since $r \le m$, this case can
only arise if the coefficient matrix has at least as many rows as columns, i.e., the linear
system has at least as many equations as unknowns.
A linear system can never have a finite number, other than 0 or 1, of solutions.
Thus, any linear system that has more than one solution automatically has infinitely many.
This result does not apply to nonlinear systems. As you know, a real quadratic equation
$a x^2 + b x + c = 0$ can have either 2, 1, or 0 real solutions.
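The three cases can be told apart by counting pivots. The sketch below is one possible way to do this and is not taken from the text: it uses the observation that the system is incompatible exactly when the augmented matrix acquires an extra pivot in its last column, i.e., when its rank exceeds that of the coefficient matrix. The function names `rank` and `classify` are our own, and exact rational arithmetic is assumed.

```python
from fractions import Fraction

def rank(M):
    """Number of pivots found when reducing M to row echelon form."""
    M = [[Fraction(x) for x in row] for row in M]
    m, n = len(M), len(M[0])
    r = 0
    for c in range(n):
        # First nonzero entry in column c among the rows not yet holding a pivot.
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, m):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def classify(A, b):
    """Decide which case of the theorem applies to the system A x = b."""
    n = len(A[0])
    r = rank(A)
    r_aug = rank([row + [bi] for row, bi in zip(A, b)])
    if r_aug > r:
        return "no solutions"            # zero row with nonzero right hand side
    if r == n:
        return "exactly one solution"    # compatible, no free variables
    return "infinitely many solutions"   # compatible, n - r free variables

if __name__ == "__main__":
    print(classify([[1, 1], [1, -1]], [2, 0]))
    print(classify([[1, 1], [2, 2]], [1, 3]))
    print(classify([[1, 1], [2, 2]], [1, 2]))
```

The three small systems in the usage lines are hypothetical examples chosen so that each case occurs once.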
Example 1.44. Consider the linear system
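\[ y + 4z = a, \qquad 3x - y + 2z = b, \qquad x + y + 6z = c, \]

consisting of three equations in three unknowns. The augmented coefficient matrix is

\[
\left( \begin{array}{ccc|c}
0 & 1 & 4 & a \\
3 & -1 & 2 & b \\
1 & 1 & 6 & c
\end{array} \right).
\]

Interchanging the first two rows, and then eliminating the elements below the first pivot
leads to

\[
\left( \begin{array}{ccc|c}
3 & -1 & 2 & b \\
0 & 1 & 4 & a \\
0 & \frac43 & \frac{16}{3} & c - \frac13 b
\end{array} \right).
\]

The second pivot is in the (2, 2) position, but after eliminating the entry below it, we find
the row echelon form to be

\[
\left( \begin{array}{ccc|c}
3 & -1 & 2 & b \\
0 & 1 & 4 & a \\
0 & 0 & 0 & c - \frac13 b - \frac43 a
\end{array} \right).
\]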

Since we have a row of all zeros, the original coefficient matrix is singular, and its rank is
only 2.
The compatibility condition for the system follows from this last row in the reduced
echelon form, and so requires
\[ \tfrac43 a + \tfrac13 b - c = 0. \]
If this is not satisfied, the system has no solutions; otherwise it has infinitely many. The
free variable is z, since there is no pivot in the third column. The general solution is
\[ y = a - 4z, \qquad x = \tfrac13 b + \tfrac13 y - \tfrac23 z = \tfrac13 a + \tfrac13 b - 2z, \]

where z is arbitrary.
Geometrically, Theorem 1.43 is indicating something about the possible configurations
of linear subsets (lines, planes, etc.) of an n-dimensional space. For example, a single linear
equation $a x + b y + c z = d$ defines a plane P in three-dimensional space. The solutions to
a system of three linear equations in three unknowns form the intersection $P_1 \cap P_2 \cap P_3$ of
three planes. Generically, three planes intersect in a single common point; this is case (i)
of the theorem, and occurs if and only if the coefficient matrix is nonsingular. The case of
infinitely many solutions occurs when the three planes intersect on a common line, or, even
more degenerately, when they all coincide. On the other hand, parallel planes, or planes
intersecting in parallel lines, have no common point of intersection, and this corresponds
to case (ii), a system with no solutions. Again, no other possibilities occur; clearly
one cannot have three planes having exactly 2 points in their common intersection: it is
either 0, 1 or $\infty$. Some possible geometric configurations are illustrated in Figure 1.1.

Figure 1.1. Intersecting Planes. (Panels: No Solution, Unique Solution, Infinite # Solutions.)
Homogeneous Systems
A linear system with all 0s on the right hand side is called a homogeneous system. In
matrix notation, a homogeneous system takes the form
\[ A\,\mathbf{x} = \mathbf{0}. \tag{1.73} \]
Homogeneous systems are always compatible, since $\mathbf{x} = \mathbf{0}$ is a solution, known as the trivial
solution. If the homogeneous system has a nontrivial solution $\mathbf{x} \neq \mathbf{0}$, then Theorem 1.43
assures that it must have infinitely many solutions. This will occur if and only if the
reduced system has one or more free variables. Thus, we find:
Theorem 1.45. A homogeneous linear system $A\,\mathbf{x} = \mathbf{0}$ of m equations in n unknowns has a nontrivial solution $\mathbf{x} \neq \mathbf{0}$ if and only if the rank of A is r < n. If m < n, the
system always has a nontrivial solution. If m = n, the system has a nontrivial solution if
and only if A is singular.
Example 1.46. Consider the homogeneous linear system
\[
2x_1 + x_2 + 5x_4 = 0, \qquad 4x_1 + 2x_2 - x_3 + 8x_4 = 0, \qquad -2x_1 - x_2 + 3x_3 - 4x_4 = 0,
\]
with coefficient matrix
\[
A = \begin{pmatrix}
2 & 1 & 0 & 5 \\
4 & 2 & -1 & 8 \\
-2 & -1 & 3 & -4
\end{pmatrix}.
\]
Since the system is homogeneous and has fewer equations than unknowns, Theorem 1.45
assures us that it has infinitely many solutions, including the trivial solution $x_1 = x_2 = x_3 = x_4 = 0$.
When solving a homogeneous system, the final column of the augmented
matrix consists of all zeros. As it will never be altered by row operations, it is a waste of
effort to carry it along during the process. We therefore perform the Gaussian Elimination
algorithm directly on the coefficient matrix A. Working with the (1, 1) entry as the first
pivot, we first obtain
\[
\begin{pmatrix}
2 & 1 & 0 & 5 \\
0 & 0 & -1 & -2 \\
0 & 0 & 3 & 1
\end{pmatrix}.
\]
The (2, 3) entry is the second pivot, and we apply one final row operation to place the
matrix in row echelon form
\[
\begin{pmatrix}
2 & 1 & 0 & 5 \\
0 & 0 & -1 & -2 \\
0 & 0 & 0 & -5
\end{pmatrix}.
\]
This corresponds to the reduced homogeneous system
\[
2x_1 + x_2 + 5x_4 = 0, \qquad -x_3 - 2x_4 = 0, \qquad -5x_4 = 0.
\]
Since there are three pivots in the final row echelon form, the rank of the matrix A is
3. There is one free variable, namely $x_2$. Using Back Substitution, we easily obtain the
general solution
\[ x_1 = -\tfrac12 t, \qquad x_2 = t, \qquad x_3 = x_4 = 0, \]
which depends upon a single free parameter $t = x_2$.
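The one-parameter family of solutions can be checked by direct substitution. The sketch below, with helper names `general_solution` and `apply` of our own choosing, multiplies the coefficient matrix by the general solution for a few values of t; exact rational arithmetic is assumed.

```python
from fractions import Fraction as F

# Coefficient matrix of the homogeneous system in Example 1.46.
A = [[2, 1, 0, 5],
     [4, 2, -1, 8],
     [-2, -1, 3, -4]]

def general_solution(t):
    """x1 = -t/2, x2 = t, x3 = x4 = 0, with free parameter t = x2."""
    return [F(-1, 2) * t, t, 0, 0]

def apply(M, x):
    """Matrix-vector product M x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

if __name__ == "__main__":
    for t in (0, 1, -4, F(2, 3)):
        print(t, apply(A, general_solution(t)))
```

Taking t = 0 recovers the trivial solution, while every other value of t gives a nontrivial one.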
Example 1.47. Consider the homogeneous linear system
\[
2x - y + 3z = 0, \qquad -4x + 2y - 6z = 0, \qquad 2x - y + z = 0, \qquad 6x - 3y + 3z = 0,
\]
with coefficient matrix
\[
A = \begin{pmatrix}
2 & -1 & 3 \\
-4 & 2 & -6 \\
2 & -1 & 1 \\
6 & -3 & 3
\end{pmatrix}.
\]
The system admits the trivial solution
x = y = z = 0, but in this case we need to complete the elimination algorithm before we
can state whether or not there are other solutions. After the first stage, the coefficient
matrix has the form
\[
\begin{pmatrix}
2 & -1 & 3 \\
0 & 0 & 0 \\
0 & 0 & -2 \\
0 & 0 & -6
\end{pmatrix}.
\]
To continue, we need to interchange the second and
third rows to place a nonzero entry in the final pivot position; after that the reduction to
row echelon form is immediate:
\[
\begin{pmatrix}
2 & -1 & 3 \\
0 & 0 & -2 \\
0 & 0 & 0 \\
0 & 0 & -6
\end{pmatrix}
\longmapsto
\begin{pmatrix}
2 & -1 & 3 \\
0 & 0 & -2 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{pmatrix}.
\]
Thus, the system reduces to the equations
\[
2x - y + 3z = 0, \qquad -2z = 0, \qquad 0 = 0, \qquad 0 = 0,
\]
where the third and fourth equations are trivially compatible, as they must be in the
homogeneous case. The rank is equal to two, which is less than the number of columns,
and so, even though the system has more equations than unknowns, it has infinitely many
solutions. These can be written in terms of the free variable y, and so the general solution
is $x = \tfrac12 y$, $z = 0$, where y is arbitrary.

1.9. Determinants.
You may be surprised that, so far, we have left undeveloped a topic that often assumes a central role in basic linear algebra: determinants. As with matrix inverses, while
determinants can be useful in low dimensions and for theoretical purposes, they are mostly
irrelevant when it comes to large scale applications and practical computations. Indeed,
the best way to compute a determinant is (surprise) Gaussian Elimination! However,
you should be familiar with the basics of determinants, and so for completeness, we shall
provide a very brief introduction.
The determinant of a square matrix A, written det A, is a number that immediately
tells whether the matrix is singular or not. (Rectangular matrices do not have determinants.) We already encountered, in (1.34), the determinant of a $2 \times 2$ matrix, which is
equal to the product of the diagonal entries minus the product of the off-diagonal entries:
\[ \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = a d - b c. \]
The determinant is nonzero if and only if the matrix has an
inverse. Our goal is to generalize this construction to general square matrices.
There are many different ways to define determinants. The difficulty is that the actual formula is very unwieldy (see (1.81) below) and not well motivated. We prefer an
axiomatic approach that explains how our elementary row operations affect the determinant. In this manner, one can compute the determinant by Gaussian elimination, which
is, in fact, the fastest and most practical computational method in all but the simplest
situations. In effect, this remark obviates the need to ever compute a determinant.
Theorem 1.48. The determinant of a square matrix A is the uniquely defined scalar
quantity det A that satisfies the following axioms:
(1) Adding a multiple of one row to another does not change the determinant.
(2) Interchanging two rows changes the sign of the determinant.
(3) Multiplying a row by any scalar (including zero) multiplies the determinant by the
same scalar.
(4) Finally, the determinant function is fixed by setting
det I = 1.

(1.74)

Checking that all four of these axioms hold in the $2 \times 2$ case (1.34) is left as an
elementary exercise for the reader. A particular consequence of axiom 3 is that when we
multiply a row of any matrix A by the zero scalar, the resulting matrix, which has a row
of all zeros, necessarily has zero determinant.
Lemma 1.49. Any matrix with one or more all zero rows has zero determinant.

Since the determinantal axioms tell how determinants behave under all three of our
elementary row operations, we can use Gaussian elimination to compute a general determinant, recovering det A from its permuted L U factorization.
Theorem 1.50. If A is a regular matrix, with A = L U factorization as in (1.21),
then
\[ \det A = \det U = \prod_{i=1}^{n} u_{ii} \tag{1.75} \]
equals the product of the pivots. More generally, if A is nonsingular, and requires k row
interchanges to arrive at its permuted L U factorization P A = L U, then
\[ \det A = \det P \, \det U = (-1)^k \prod_{i=1}^{n} u_{ii}. \tag{1.76} \]
Finally, A is singular if and only if det A = 0.

Proof: In the regular case, one only needs elementary row operations of Type #1 to
reduce A to upper triangular form U, and axiom 1 says these do not change the determinant. Therefore det A = det U. Proceeding with the full Gauss-Jordan scheme, the next
phase is to divide each row in U by its pivot, leading to the special upper triangular matrix
V with all 1s on the diagonal. Axiom 3 implies
\[ \det A = \det U = \Bigl( \prod_{i=1}^{n} u_{ii} \Bigr) \det V. \tag{1.77} \]
Finally, we can reduce V to the identity by further row operations of Type #1, and so by
(1.74),
\[ \det V = \det I = 1. \tag{1.78} \]
Combining equations (1.77), (1.78) proves the theorem for the regular case. The nonsingular case follows without difficulty: each row interchange changes the sign of the
determinant, and so det A equals det U if there have been an even number of interchanges,
but equals $-\det U$ if there have been an odd number.
Finally, if A is singular, then we can reduce it to a matrix with at least one row of
zeros by elementary row operations of Types #1 and #2. Lemma 1.49 implies that the
resulting matrix has zero determinant, and so det A = 0, also.
Q.E.D.
Corollary 1.51. The determinant of a diagonal matrix is the product of the diagonal
entries. The same result holds for both lower triangular and upper triangular matrices.
Example 1.52. Let us compute the determinant of the $4 \times 4$ matrix
\[
A = \begin{pmatrix}
1 & 0 & -1 & 2 \\
2 & 1 & -3 & 4 \\
0 & 2 & -2 & 3 \\
1 & 1 & -4 & -2
\end{pmatrix}.
\]
We perform our usual Gaussian Elimination algorithm, successively leading to the matrices
\[
A \longmapsto
\begin{pmatrix}
1 & 0 & -1 & 2 \\
0 & 1 & -1 & 0 \\
0 & 2 & -2 & 3 \\
0 & 1 & -3 & -4
\end{pmatrix}
\longmapsto
\begin{pmatrix}
1 & 0 & -1 & 2 \\
0 & 1 & -1 & 0 \\
0 & 0 & 0 & 3 \\
0 & 0 & -2 & -4
\end{pmatrix}
\longmapsto
\begin{pmatrix}
1 & 0 & -1 & 2 \\
0 & 1 & -1 & 0 \\
0 & 0 & -2 & -4 \\
0 & 0 & 0 & 3
\end{pmatrix},
\]
where we used a single row interchange to obtain the final upper triangular form. Owing
to the row interchange, the determinant of the original matrix is $-1$ times the product of
the pivots:
\[ \det A = -1 \cdot 1 \cdot 1 \cdot (-2) \cdot 3 = 6. \]
In particular, this tells us that A is nonsingular. But, of course, this was already implied
by the elimination, since the matrix reduced to upper triangular form with 4 pivots.
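The computation in this example follows a general recipe: eliminate, count the row interchanges, and multiply the pivots. A minimal sketch of that recipe is given below; the function name `det` is our own, exact rational arithmetic is assumed, and the first nonzero entry in each column is used as the pivot (no partial pivoting).

```python
from fractions import Fraction

def det(A):
    """Determinant as (-1)^k times the product of the pivots,
    where k is the number of row interchanges used."""
    U = [[Fraction(x) for x in row] for row in A]
    n = len(U)
    sign = 1
    for k in range(n):
        # Look for a nonzero entry in column k, on or below the diagonal.
        p = next((i for i in range(k, n) if U[i][k] != 0), None)
        if p is None:
            return Fraction(0)          # no pivot available: A is singular
        if p != k:
            U[k], U[p] = U[p], U[k]     # row interchange flips the sign
            sign = -sign
        for i in range(k + 1, n):       # Type #1 operations leave det unchanged
            f = U[i][k] / U[k][k]
            U[i] = [a - f * b for a, b in zip(U[i], U[k])]
    result = Fraction(sign)
    for k in range(n):
        result *= U[k][k]
    return result

if __name__ == "__main__":
    A = [[1, 0, -1, 2],
         [2, 1, -3, 4],
         [0, 2, -2, 3],
         [1, 1, -4, -2]]
    print(det(A))
```

For the matrix of this example the function reproduces the value found by hand, using the same single row interchange.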
Let us now present some of the basic properties of determinants.
Lemma 1.53. The determinant of the product of two square matrices of the same
size is the product of the determinants:
\[ \det(A B) = \det A \, \det B. \tag{1.79} \]
Proof: The product formula holds if A is an elementary matrix; this is a consequence of the determinantal axioms, combined with Corollary 1.51. By induction, if
$A = E_1 E_2 \cdots E_N$ is a product of elementary matrices, then (1.79) also holds. Therefore, the result holds whenever A is nonsingular. On the other hand, if A is singular, then
according to Exercise , $A = E_1 E_2 \cdots E_N Z$, where the $E_i$ are elementary matrices, and
Z, the row echelon form, is a matrix with a row of zeros. But then $Z B = W$ also has a
row of zeros, and so $A B = E_1 E_2 \cdots E_N W$ is also singular. Thus, both sides of (1.79) are
zero in this case.
Q.E.D.
It is a remarkable fact that, even though matrix multiplication is not commutative, and
so $A B \neq B A$ in general, it is nevertheless always true that both products have the same
determinant: $\det(A B) = \det(B A)$. Indeed, both are equal to the product $\det A \, \det B$ of
the individual determinants because ordinary (scalar) multiplication is commutative.
Lemma 1.54. Transposing a matrix does not change its determinant:
\[ \det A^T = \det A. \tag{1.80} \]
Proof: By inspection, this formula holds if A is an elementary matrix. If
$A = E_1 E_2 \cdots E_N$ is a product of elementary matrices, then using (1.49), (1.79) and induction
\[
\begin{aligned}
\det A^T &= \det (E_1 E_2 \cdots E_N)^T = \det (E_N^T E_{N-1}^T \cdots E_1^T) = \det E_N^T \, \det E_{N-1}^T \cdots \det E_1^T \\
&= \det E_N \, \det E_{N-1} \cdots \det E_1 = \det E_1 \, \det E_2 \cdots \det E_N \\
&= \det (E_1 E_2 \cdots E_N) = \det A.
\end{aligned}
\]
The middle equality follows from the commutativity of ordinary multiplication. This proves
the nonsingular case; the singular case follows from Lemma 1.30, which implies that $A^T$ is
singular if and only if A is.
Q.E.D.

Remark : Lemma 1.54 has the interesting consequence that one can equally well use
elementary column operations to compute determinants. We will not develop this approach in any detail here, since it does not help us to solve linear equations.
Finally, we state the general formula for a determinant; a proof can be found in [135].
Theorem 1.55. If A is an $n \times n$ matrix with entries $a_{ij}$, then
\[ \det A = \sum_{\pi} \pm \, a_{1,\pi(1)} \, a_{2,\pi(2)} \cdots a_{n,\pi(n)}. \tag{1.81} \]
The sum in (1.81) is over all possible permutations $\pi$ of the columns of A. The
summands consist of all possible ways of choosing n entries of A with one entry in each
column and one entry in each row of A. The sign in front of the indicated term depends on
the permutation $\pi$; it is $+$ if $\pi$ is an even permutation, meaning that its matrix can be
reduced to the identity by an even number of row interchanges, and is $-$ if $\pi$ is odd. For
example, the six terms in the well-known formula
\[
\det \begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
=
\begin{aligned}
& a_{11} a_{22} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32} \\
& {} - a_{11} a_{23} a_{32} - a_{12} a_{21} a_{33} - a_{13} a_{22} a_{31}
\end{aligned}
\tag{1.82}
\]
for a $3 \times 3$ determinant correspond to the six possible $3 \times 3$ permutation matrices (1.27).

The proof that (1.81) obeys the basic determinantal axioms is straightforward, but
will not be done here. The reader might wish to try the $3 \times 3$ case to be convinced that
it works. This explicit formula proves that the determinant function is well-defined, and
formally completes the proof of Theorem 1.48.
Unfortunately, the explicit determinant formula (1.81) contains n! terms, and so,
as soon as n is even moderately large, is completely impractical for computation. The
most efficient way is still our mainstay, Gaussian Elimination, coupled with the fact that the
determinant is the product of the pivots!
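To see the explicit formula (1.81) at work, and to appreciate how quickly n! grows, one can evaluate the sum over permutations directly. The sketch below is illustrative only; the names `sign` and `det_by_permutations` are our own. The parity of a permutation is found here by counting inversions, which agrees with the parity of the number of row interchanges needed to reduce its permutation matrix to the identity.

```python
from itertools import permutations
from math import factorial

def sign(perm):
    """+1 for an even permutation, -1 for an odd one (inversion count)."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_by_permutations(A):
    """Sum of signed products, one entry from each row and each column."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total

if __name__ == "__main__":
    A = [[1, 0, -1, 2],
         [2, 1, -3, 4],
         [0, 2, -2, 3],
         [1, 1, -4, -2]]
    print(det_by_permutations(A))            # sum of 4! = 24 terms
    print(factorial(10), factorial(20))      # number of terms for n = 10, 20
```

Already for n = 20 the number of terms is far beyond what can be summed in practice, whereas elimination needs only on the order of n^3 arithmetic operations.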
Determinants have many fascinating and theoretically important properties. However, in our applications, these will not be required, and so we conclude this very brief
introduction to the subject.


Chapter 2
Vector Spaces
Vector spaces and their ancillary structures provide the common language of linear
algebra, and, as such, are an essential prerequisite for understanding contemporary applied mathematics. The key concepts of vector space, subspace, linear independence,
span, and basis will appear, not only in linear systems of equations and the geometry of
n-dimensional Euclidean space, but also in the analysis of linear ordinary differential equations, linear partial differential equations, linear boundary value problems, all of Fourier
analysis, numerical approximations like the finite element method, and many, many other
fields. Therefore, in order to develop the wide variety of analytical methods and applications covered in this text, we need to acquire a firm working knowledge of basic vector
space analysis.
One of the great triumphs of modern mathematics was the recognition that many
seemingly distinct constructions are, in fact, different manifestations of the same general
mathematical structure. The abstract notion of a vector space serves to unify spaces of
ordinary vectors, spaces of functions, such as polynomials, exponentials, trigonometric
functions, as well as spaces of matrices, linear operators, etc., all in a common conceptual
framework. Moreover, proofs that might look rather complicated in any particular context often turn out to be relatively transparent when recast in the abstract vector space
framework. The price that one pays for the increased level of abstraction is that, while the
underlying mathematics is not all that difficult, the student typically takes a long time to
assimilate the material. In our opinion, the best way to approach the subject is to think in
terms of concrete examples. First, make sure you understand what the concept or theorem
says in the case of ordinary Euclidean space. Once this is grasped, the next important case
to consider is an elementary function space, e.g., the space of continuous scalar functions.
With these two examples firmly in hand, the leap to the general abstract version should
not be too painful. Patience is essential; ultimately the only way to truly understand an
abstract concept like a vector space is by working with it! And always keep in mind that
the effort expended here will be amply rewarded later on.
Following an introduction to vector spaces and subspaces, we introduce the notions of
span and linear independence of a collection of vector space elements. These are combined
into the all-important concept of a basis of a vector space, leading to a linear algebraic
characterization of its dimension. We will then study the four fundamental subspaces
associated with a matrix (range, kernel, corange and cokernel) and explain how they
help us understand the solution to linear algebraic systems. Of particular note is the
all-pervasive linear superposition principle that enables one to construct more general
solutions to linear systems by combining known solutions. Superposition is the hallmark
of linearity, and will apply not only to linear algebraic equations, but also linear ordinary
differential equations, linear partial differential equations, linear boundary value problems,
and so on. Some interesting applications in graph theory, to be used in our later study of
electrical circuits, will form the final topic of this chapter.

2.1. Vector Spaces.

A vector space is the abstract formulation of the most basic underlying properties of
n-dimensional Euclidean space $\mathbb{R}^n$, which is defined as the set of all real (column) vectors
with n entries. The basic laws of vector addition and scalar multiplication in $\mathbb{R}^n$ serve
as the motivation for the general, abstract definition of a vector space. In the beginning,
we will refer to the elements of a vector space as vectors, even though, as we shall see,
they might also be functions or matrices or even more general objects. Unless dealing
with certain specific examples such as a space of functions, we will use bold face, lower
case Latin letters to denote the elements of our vector space. We begin with the general
definition.
Definition 2.1. A vector space is a set V equipped with two operations:
(i) Addition: adding any pair of vectors $\mathbf{v}, \mathbf{w} \in V$ produces another vector $\mathbf{v} + \mathbf{w} \in V$;
(ii) Scalar Multiplication: multiplying a vector $\mathbf{v} \in V$ by a scalar $c \in \mathbb{R}$ produces a vector
$c\,\mathbf{v} \in V$;
which are required to satisfy the following axioms for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and all scalars $c, d \in \mathbb{R}$:
(a) Commutativity of Addition: $\mathbf{v} + \mathbf{w} = \mathbf{w} + \mathbf{v}$.
(b) Associativity of Addition: $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$.
(c) Additive Identity: There is a zero element $\mathbf{0} \in V$ satisfying $\mathbf{v} + \mathbf{0} = \mathbf{v} = \mathbf{0} + \mathbf{v}$.
(d) Additive Inverse: For each $\mathbf{v} \in V$ there is an element $-\mathbf{v} \in V$ such that
$\mathbf{v} + (-\mathbf{v}) = \mathbf{0} = (-\mathbf{v}) + \mathbf{v}$.
(e) Distributivity: $(c + d)\,\mathbf{v} = (c\,\mathbf{v}) + (d\,\mathbf{v})$, and $c\,(\mathbf{v} + \mathbf{w}) = (c\,\mathbf{v}) + (c\,\mathbf{w})$.
(f) Associativity of Scalar Multiplication: $c\,(d\,\mathbf{v}) = (c\,d)\,\mathbf{v}$.
(g) Unit for Scalar Multiplication: the scalar $1 \in \mathbb{R}$ satisfies $1\,\mathbf{v} = \mathbf{v}$.
Note: We will use bold face $\mathbf{0}$ to denote the zero element of our vector space, while
ordinary 0 denotes the real number zero. The following identities are elementary consequences of the vector space axioms:
(h) $0\,\mathbf{v} = \mathbf{0}$. (i) $(-1)\,\mathbf{v} = -\mathbf{v}$. (j) $c\,\mathbf{0} = \mathbf{0}$. (k) If $c\,\mathbf{v} = \mathbf{0}$, then either $c = 0$ or $\mathbf{v} = \mathbf{0}$.
Let us, as an example, prove (h). Let $\mathbf{z} = 0\,\mathbf{v}$. Then, by the distributive property,
\[ \mathbf{z} + \mathbf{z} = 0\,\mathbf{v} + 0\,\mathbf{v} = (0 + 0)\,\mathbf{v} = 0\,\mathbf{v} = \mathbf{z}. \]
Adding $-\mathbf{z}$ to both sides of this equation, and making use of axioms (b), (d), and then (c),
implies that $\mathbf{z} = \mathbf{0}$, which completes the proof. Verification of the other three properties
is left as an exercise for the reader.

The precise definition of dimension will appear later, in Theorem 2.28,


Remark : For most of this chapter we will deal with real vector spaces, in which the
scalars are the real numbers R. Complex vector spaces, where complex scalars are allowed,
will be introduced in Section 3.6. Vector spaces over other fields are studied in abstract
algebra, [77].
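Example 2.2. As noted above, the prototypical example of a real vector space is the
space $\mathbb{R}^n$ consisting of column vectors or n-tuples of real numbers $\mathbf{v} = (v_1, v_2, \ldots, v_n)^T$.
Vector addition and scalar multiplication are defined in the usual manner:
$$\mathbf{v} + \mathbf{w} = \begin{pmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{pmatrix}, \qquad
c\,\mathbf{v} = \begin{pmatrix} c\,v_1 \\ c\,v_2 \\ \vdots \\ c\,v_n \end{pmatrix}, \qquad \text{whenever} \qquad
\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}, \quad
\mathbf{w} = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}.$$
The zero vector is $\mathbf{0} = (0, \ldots, 0)^T$. The fact that vectors in $\mathbb{R}^n$ satisfy all of the vector space axioms is an immediate consequence of the laws of vector addition and scalar
multiplication. Details are left to the reader.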
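The componentwise operations of Example 2.2 can be sketched in a few lines of Python. The helper names `add` and `scale` and the sample vectors are assumptions made for this illustration; exact rationals from the standard `fractions` module are used so that the axioms can be compared with `==`. A spot check on particular vectors is, of course, not a proof of the axioms.

```python
from fractions import Fraction

def add(v, w):
    """Componentwise sum of two vectors of equal length."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    return tuple(a + b for a, b in zip(v, w))

def scale(c, v):
    """Scalar multiple c v, computed entry by entry."""
    return tuple(c * a for a in v)

# Hypothetical sample data in R^3.
u = (Fraction(1), Fraction(-2), Fraction(3))
v = (Fraction(0), Fraction(5), Fraction(1, 2))
w = (Fraction(4), Fraction(1), Fraction(-1))
c, d = Fraction(3), Fraction(-2, 3)
zero = (Fraction(0),) * 3

# Spot checks of axioms (a) through (g) on the sample data.
assert add(v, w) == add(w, v)                                # (a)
assert add(u, add(v, w)) == add(add(u, v), w)                # (b)
assert add(v, zero) == v                                     # (c)
assert add(v, scale(-1, v)) == zero                          # (d)
assert scale(c + d, v) == add(scale(c, v), scale(d, v))      # (e)
assert scale(c, add(v, w)) == add(scale(c, v), scale(c, w))  # (e)
assert scale(c, scale(d, v)) == scale(c * d, v)              # (f)
assert scale(1, v) == v                                      # (g)
```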
Example 2.3. Let $\mathcal{M}_{m \times n}$ denote the space of all real matrices of size $m \times n$. Then
$\mathcal{M}_{m \times n}$ forms a vector space under the laws of matrix addition and scalar multiplication.
The zero element is the zero matrix $O$. Again, the vector space axioms are immediate
consequences of the basic laws of matrix arithmetic. (For the purposes of this example, we
ignore additional matrix properties, like matrix multiplication.) The preceding example of
the vector space $\mathbb{R}^n = \mathcal{M}_{n \times 1}$ is a particular case when the matrices have only one column.
Example 2.4. Consider the space
$$P^{(n)} = \left\{\, p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 \,\right\} \tag{2.1}$$
consisting of all polynomials of degree $\le n$. Addition of polynomials is defined in the usual
manner; for example,
$$(x^2 - 3x) + (2x^2 - 5x + 4) = 3x^2 - 8x + 4.$$
Note that the sum $p(x) + q(x)$ of two polynomials of degree $\le n$ also has degree $\le n$.
(However, it is not true that the sum of two polynomials of degree $= n$ also has degree $n$;
for example $(x^2 + 1) + (-x^2 + x) = x + 1$ has degree 1 even though the two summands have
degree 2. This means that the set of polynomials of degree $= n$ is not a vector space.) The
zero element of $P^{(n)}$ is the zero polynomial. We can multiply polynomials by scalars (real
constants) in the usual fashion; for example if $p(x) = x^2 - 2x$, then $3\,p(x) = 3x^2 - 6x$.
The proof that $P^{(n)}$ satisfies the vector space axioms is an easy consequence of the basic
laws of polynomial algebra.
Remark: We are ignoring the fact that one can also multiply polynomials; this is not
a vector space operation. Also, any scalar can be viewed as a constant polynomial, but one
should really regard these as two completely different objects: one is a number, while
the other is a constant function. To add to the confusion, one typically uses the same
notation for these two objects; for instance, 1 could either mean the real number 1 or the
constant function taking the value 1 everywhere. The reader needs to exercise due care
when interpreting each occurrence.
For much of analysis, including differential equations, Fourier theory, numerical methods, and so on, the most important vector spaces consist of sets of functions with certain
specified properties. The simplest such example is the following.
Example 2.5. Let $I \subset \mathbb{R}$ be an interval. Consider the function space $\mathcal{F} = \mathcal{F}(I)$
that consists of all real-valued functions $f(x)$ defined for all $x \in I$, which we also write
as $f \colon I \to \mathbb{R}$. The claim is that the function space $\mathcal{F}$ has the structure of a vector space.
Addition of functions in $\mathcal{F}$ is defined in the usual manner: $(f + g)(x) = f(x) + g(x)$.
Multiplication by scalars $c \in \mathbb{R}$ is the same as multiplication by constants, $(c\,f)(x) = c\,f(x)$.
The zero element is the constant function that is identically 0 for all $x \in I$. The proof
of the vector space axioms is straightforward, just as in the case of polynomials. As in
the preceding remark, we are ignoring all additional operations (multiplication, division,
inversion, composition, etc.) that can be done with functions; these are irrelevant as far
as the vector space structure of $\mathcal{F}$ goes.
Remark: An interval can be (a) closed, meaning that it includes its endpoints: $I = [a, b]$;
(b) open, which does not include either endpoint: $I = (a, b)$; or (c) half open,
which includes one but not the other endpoint, so $I = [a, b)$ or $(a, b]$. An open endpoint is
allowed to be infinite; in particular, $(-\infty, \infty) = \mathbb{R}$ is another way of writing the real line.
Example 2.6. The preceding examples are all, in fact, special cases of an even
more general construction. A clue is to note that the last example of a function space
does not make any use of the fact that the domain of definition of our functions is a real
interval. Indeed, the construction produces a function space $\mathcal{F}(I)$ corresponding to any
subset $I \subset \mathbb{R}$.
Even more generally, let $S$ be any set. Let $\mathcal{F} = \mathcal{F}(S)$ denote the space of all real-valued functions $f \colon S \to \mathbb{R}$. Then we claim that $\mathcal{F}(S)$ is a vector space under the operations
of function addition and scalar multiplication. More precisely, given functions $f$ and $g$,
we define their sum to be the function $h = f + g$ such that $h(x) = f(x) + g(x)$ for all
$x \in S$. Similarly, given a function $f$ and a real scalar $c \in \mathbb{R}$, we define the scalar multiple
$k = c\,f$ to be the function such that $k(x) = c\,f(x)$ for all $x \in S$. The verification of the
vector space axioms proceeds straightforwardly, and the reader should be able to fill in the
necessary details.
In particular, if $S \subset \mathbb{R}$ is an interval, then $\mathcal{F}(S)$ coincides with the space of scalar
functions described in the preceding example. If $S \subset \mathbb{R}^n$ is a subset of Euclidean space,
then the elements of $\mathcal{F}(S)$ are functions $f(x_1, \ldots, x_n)$ depending upon the $n$ variables
corresponding to the coordinates of points $\mathbf{x} = (x_1, \ldots, x_n) \in S$ in the domain. In this
fashion, the set of real-valued functions defined on any domain in $\mathbb{R}^n$ is found to also form
a vector space.
Another useful example is to let $S = \{x_1, \ldots, x_n\} \subset \mathbb{R}$ be a finite set of real numbers.
A real-valued function $f \colon S \to \mathbb{R}$ is defined by its values $f(x_1), f(x_2), \ldots, f(x_n)$ at the
specified points. In applications, one can view such functions as indicating the sample
values of a scalar function $f(x) \in \mathcal{F}(\mathbb{R})$ taken at the sample points $x_1, \ldots, x_n$. For example,
when measuring a physical quantity, e.g., temperature, velocity, pressure, etc., one typically
only measures a finite set of sample values. The intermediate, non-recorded values between
the sample points are then reconstructed through some form of interpolation, a topic
that we shall visit in depth later on. Interestingly, the sample values $f(x_i)$ can be identified
with the entries $f_i$ of a vector
$$\mathbf{f} = (f_1, f_2, \ldots, f_n)^T = (f(x_1), f(x_2), \ldots, f(x_n))^T \in \mathbb{R}^n,$$
known as the sample vector. Every sampled function $f \colon \{x_1, \ldots, x_n\} \to \mathbb{R}$ corresponds to
a unique vector $\mathbf{f} \in \mathbb{R}^n$ and vice versa. (However, different scalar functions $f \colon \mathbb{R} \to \mathbb{R}$
can have the same sample values.) Addition of sample functions corresponds to addition
of their sample vectors, as does scalar multiplication. Thus, the vector space of sample
functions $\mathcal{F}(S) = \mathcal{F}(\{x_1, \ldots, x_n\})$ is the same as the vector space $\mathbb{R}^n$! This connection
between sampled functions and vectors will be the key to the finite Fourier transform, of
fundamental importance in modern signal processing.
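The correspondence between sampled functions and vectors can be illustrated with a short Python sketch. The sample points and the functions `f`, `g`, `h` below are hypothetical choices made for this illustration; integer arithmetic keeps every comparison exact.

```python
def sample(f, points):
    """Sample vector (f(x_1), ..., f(x_n)) of f at the given points."""
    return [f(x) for x in points]

points = [0, 1, 2, 3]          # hypothetical sample points x_1, ..., x_4

def f(x):
    return x * x

def g(x):
    return 2 * x + 1

# Sampling the sum f + g gives the sum of the two sample vectors.
sum_then_sample = sample(lambda x: f(x) + g(x), points)
sample_then_sum = [a + b for a, b in zip(sample(f, points), sample(g, points))]

# A different function on R with the same sample values as f:
# the added term vanishes at every sample point but not in between.
def h(x):
    return f(x) + x * (x - 1) * (x - 2) * (x - 3)
```

Here `sample(h, points)` agrees with `sample(f, points)` even though `h` and `f` differ between the sample points, which is the caveat noted in parentheses above.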
Example 2.7. The preceding construction admits yet a further generalization. We
continue to let $S$ be an arbitrary set. Let $V$ be a vector space. The claim is that the space
$\mathcal{F}(S, V)$ consisting of all $V$-valued functions $\mathbf{f} \colon S \to V$ is a vector space. In other words,
we replace the particular vector space $\mathbb{R}$ in the preceding example by a general vector
space, and the same conclusion holds. The operations of function addition and scalar
multiplication are defined in the evident manner: $(\mathbf{f} + \mathbf{g})(x) = \mathbf{f}(x) + \mathbf{g}(x)$ and $(c\,\mathbf{f})(x) = c\,\mathbf{f}(x)$,
where we are using the vector addition and scalar multiplication operations on $V$ to
induce corresponding operations on $V$-valued functions. The proof that $\mathcal{F}(S, V)$ satisfies
all of the vector space axioms proceeds as before.
The most important example is when $S \subset \mathbb{R}^n$ is a domain in Euclidean space and
$V = \mathbb{R}^m$ is itself a Euclidean space. In this case, the elements of $\mathcal{F}(S, \mathbb{R}^m)$ consist of
vector-valued functions $\mathbf{f} \colon S \to \mathbb{R}^m$, so that $\mathbf{f}(\mathbf{x}) = (f_1(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n))^T$
is a column vector consisting of $m$ functions of $n$ variables, all defined on a common
domain $S$. The general construction implies that addition and scalar multiplication of
vector-valued functions is done componentwise; for example
$$2 \begin{pmatrix} x^2 \\ e^x - 4 \end{pmatrix} - \begin{pmatrix} \cos x \\ x \end{pmatrix} = \begin{pmatrix} 2x^2 - \cos x \\ 2e^x - x - 8 \end{pmatrix}.$$

2.2. Subspaces.
In the preceding section, we were introduced to the most basic vector spaces that play
a role in this text. Almost all of the important vector spaces arising in applications appear
as particular subsets of these key examples.
Definition 2.8. A subspace of a vector space $V$ is a subset $W \subset V$ which is a vector
space in its own right.
Since elements of $W$ also belong to $V$, the operations of vector addition and scalar
multiplication for $W$ are induced by those of $V$. In particular, $W$ must contain the zero
element of $V$ in order to satisfy axiom (c). The verification of the vector space axioms for
a subspace is particularly easy: we only need check that addition and scalar multiplication
keep us within the subspace.
Proposition 2.9. A nonempty subset $W \subset V$ of a vector space is a subspace if and only if
(a) for every $\mathbf{v}, \mathbf{w} \in W$, the sum $\mathbf{v} + \mathbf{w} \in W$, and
(b) for every $\mathbf{v} \in W$ and every $c \in \mathbb{R}$, the scalar product $c\,\mathbf{v} \in W$.
Proof: The proof is essentially trivial. For example, to show commutativity, given
$\mathbf{v}, \mathbf{w} \in W$, we can regard them as elements of $V$, in which case $\mathbf{v} + \mathbf{w} = \mathbf{w} + \mathbf{v}$ because $V$
is a vector space. But the closure condition implies that the sum also belongs to $W$, and
so the commutativity axiom also holds for elements of $W$. The other axioms are equally
easy to validate.
Q.E.D.
Remark: Condition (a) says that a subspace must be closed under addition, while
(b) says it must also be closed under scalar multiplication. It will sometimes be useful to
combine the two closure conditions. Thus, to prove $W \subset V$ is a subspace it suffices to
check that $c\,\mathbf{v} + d\,\mathbf{w} \in W$ for every $\mathbf{v}, \mathbf{w} \in W$ and $c, d \in \mathbb{R}$.
Example 2.10. Let us list some examples of subspaces of the three-dimensional
Euclidean space $\mathbb{R}^3$. In each case, we must verify the closure conditions; the first two are
immediate.
(a) The trivial subspace $W = \{\mathbf{0}\}$.
(b) The entire space $W = \mathbb{R}^3$.
(c) The set of all vectors of the form $(x, y, 0)^T$, i.e., the $(x, y)$-coordinate plane. Note
that the sum $(x, y, 0)^T + (\hat{x}, \hat{y}, 0)^T = (x + \hat{x}, y + \hat{y}, 0)^T$, and scalar multiple $c\,(x, y, 0)^T = (c\,x, c\,y, 0)^T$,
of vectors in the $(x, y)$-plane also lie in the plane, proving closure.
(d) The set of solutions $(x, y, z)^T$ to the homogeneous linear equation
$$3x + 2y - z = 0.$$
Indeed, if $\mathbf{x} = (x, y, z)^T$ is a solution, then so is any scalar multiple $c\,\mathbf{x} = (c\,x, c\,y, c\,z)^T$,
since
$$3(c\,x) + 2(c\,y) - (c\,z) = c\,(3x + 2y - z) = 0.$$
Moreover, if $\hat{\mathbf{x}} = (\hat{x}, \hat{y}, \hat{z})^T$ is a second solution, the sum $\mathbf{x} + \hat{\mathbf{x}} = (x + \hat{x}, y + \hat{y}, z + \hat{z})^T$ is
also a solution, since
$$3(x + \hat{x}) + 2(y + \hat{y}) - (z + \hat{z}) = (3x + 2y - z) + (3\hat{x} + 2\hat{y} - \hat{z}) = 0.$$
Note that the solution space is a two-dimensional plane consisting of all vectors which are
perpendicular (orthogonal) to the vector $(3, 2, -1)^T$.
(e) The set of all vectors lying in the plane spanned by the vectors $\mathbf{v}_1 = (2, -3, 0)^T$
and $\mathbf{v}_2 = (1, 0, 3)^T$. In other words, we consider all vectors of the form
$$\mathbf{v} = a\,\mathbf{v}_1 + b\,\mathbf{v}_2 = a \begin{pmatrix} 2 \\ -3 \\ 0 \end{pmatrix} + b \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} = \begin{pmatrix} 2a + b \\ -3a \\ 3b \end{pmatrix},$$
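where $a, b \in \mathbb{R}$ are arbitrary scalars. If $\mathbf{v} = a\,\mathbf{v}_1 + b\,\mathbf{v}_2$ and $\mathbf{w} = \hat{a}\,\mathbf{v}_1 + \hat{b}\,\mathbf{v}_2$ are any two
vectors of this form, so is
$$c\,\mathbf{v} + d\,\mathbf{w} = c\,(a\,\mathbf{v}_1 + b\,\mathbf{v}_2) + d\,(\hat{a}\,\mathbf{v}_1 + \hat{b}\,\mathbf{v}_2) = (a\,c + \hat{a}\,d)\,\mathbf{v}_1 + (b\,c + \hat{b}\,d)\,\mathbf{v}_2 = \tilde{a}\,\mathbf{v}_1 + \tilde{b}\,\mathbf{v}_2,$$
where $\tilde{a} = a\,c + \hat{a}\,d$, $\tilde{b} = b\,c + \hat{b}\,d$. This proves that the plane is a subspace of $\mathbb{R}^3$. The reader
might already have noticed that this subspace is the same plane that was considered in
item (d).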
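The closing observation, that the plane in item (e) coincides with the solution set in item (d), can be spot-checked numerically. The sketch below assumes the spanning vectors $(2, -3, 0)^T$ and $(1, 0, 3)^T$ and the equation $3x + 2y - z = 0$; the function names and the list of trial scalars are hypothetical. A finite check of this kind illustrates the closure argument but does not replace it.

```python
from fractions import Fraction
from itertools import product

def in_plane(v):
    """True when v = (x, y, z) satisfies 3x + 2y - z = 0."""
    x, y, z = v
    return 3 * x + 2 * y - z == 0

def combo(c, v, d, w):
    """Linear combination c v + d w, computed componentwise."""
    return tuple(c * a + d * b for a, b in zip(v, w))

v1 = (2, -3, 0)
v2 = (1, 0, 3)

# Hypothetical trial scalars; exact rationals keep the equality test reliable.
scalars = [Fraction(-5, 2), -1, 0, Fraction(1, 3), 4]

# Every trial combination a v1 + b v2 should satisfy the equation of item (d).
all_in_plane = all(in_plane(combo(a, v1, b, v2))
                   for a, b in product(scalars, repeat=2))
```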
Example 2.11. The following subsets of $\mathbb{R}^3$ are not subspaces.
(a) The set $P$ of all vectors of the form $(x, y, 1)^T$, i.e., the plane parallel to the
$x\,y$ coordinate plane passing through $(0, 0, 1)^T$. Indeed, $\mathbf{0} \notin P$, which is the most basic
requirement for a subspace. In fact, neither of the closure axioms holds for this subset.
(b) The positive octant $O^+ = \{x > 0,\ y > 0,\ z > 0\}$. While the sum of two vectors in
$O^+$ belongs to $O^+$, multiplying by negative scalars takes us outside the octant, violating
closure under scalar multiplication.
(c) The unit sphere $S^2 = \{x^2 + y^2 + z^2 = 1\}$. Again, $\mathbf{0} \notin S^2$. More generally, curved
surfaces, e.g., the paraboloid $P = \{z = x^2 + y^2\}$, are not subspaces. Although $\mathbf{0} \in P$,
most scalar multiples of vectors in $P$ do not belong to $P$. For example, $(1, 1, 2)^T \in P$,
but $2\,(1, 1, 2)^T = (2, 2, 4)^T \notin P$.
In fact, there are only four fundamentally different types of subspaces $W \subset \mathbb{R}^3$ of
three-dimensional Euclidean space:
(i) the entire space $W = \mathbb{R}^3$,
(ii) a plane passing through the origin,
(iii) a line passing through the origin,
(iv) the trivial subspace $W = \{\mathbf{0}\}$.
To verify this observation, we argue as follows. If $W = \{\mathbf{0}\}$ contains only the zero vector,
then we are in case (iv). Otherwise, $W \subset \mathbb{R}^3$ contains a nonzero vector $\mathbf{0} \ne \mathbf{v}_1 \in W$.
But since $W$ must contain all scalar multiples $c\,\mathbf{v}_1$ of this element, it includes the entire
line in the direction of $\mathbf{v}_1$. If $W$ contains another vector $\mathbf{v}_2$ that does not lie in the line
through $\mathbf{v}_1$, then it must contain the entire plane $\{c\,\mathbf{v}_1 + d\,\mathbf{v}_2\}$ spanned by $\mathbf{v}_1, \mathbf{v}_2$. Finally,
if there is a third vector $\mathbf{v}_3$ not contained in this plane, then we claim that $W = \mathbb{R}^3$. This
final fact will be an immediate consequence of general results in this chapter, although the
interested reader might try to prove it directly before proceeding.
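Each failure of closure in Example 2.11 can be exhibited with a single counterexample. The following sketch encodes the three sets as membership tests; the function names and the particular test vectors (other than $(1, 1, 2)^T$, which comes from the text) are assumptions made for illustration.

```python
def add(v, w):
    """Componentwise sum of two vectors."""
    return tuple(a + b for a, b in zip(v, w))

def scale(c, v):
    """Scalar multiple c v."""
    return tuple(c * a for a in v)

def in_shifted_plane(v):
    """Membership in the plane z = 1 of item (a)."""
    return v[2] == 1

def in_positive_octant(v):
    """Membership in the positive octant of item (b)."""
    return all(a > 0 for a in v)

def on_paraboloid(v):
    """Membership in the paraboloid z = x^2 + y^2 of item (c)."""
    x, y, z = v
    return z == x * x + y * y

# (a) the sum of two vectors with z = 1 has z = 2
sum_a = add((0, 0, 1), (1, 1, 1))
# (b) a negative multiple leaves the octant
neg_b = scale(-1, (1, 2, 3))
# (c) doubling (1, 1, 2) gives (2, 2, 4), but 2^2 + 2^2 = 8, not 4
dbl_c = scale(2, (1, 1, 2))
```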
Example 2.12. Let $I \subset \mathbb{R}$ be an interval, and let $\mathcal{F}(I)$ be the space of real-valued
functions $f \colon I \to \mathbb{R}$. Let us look at some of the most important examples of subspaces
of $\mathcal{F}(I)$. In each case, we need only verify the closure conditions to verify that the given
subset is indeed a subspace.
(a) The space $P^{(n)}$ of polynomials of degree $\le n$, which we already encountered.
(b) The space $P^{(\infty)} = \bigcup_{n \ge 0} P^{(n)}$ consisting of all polynomials.
(c) The space $C^0(I)$ of all continuous functions. Closure of this subspace relies on
knowing that if $f(x)$ and $g(x)$ are continuous, then both $f(x) + g(x)$ and $c\,f(x)$ for any
$c \in \mathbb{R}$ are also continuous, two basic results from calculus.
(d) More restrictively, one can consider the subspace $C^n(I)$ consisting of all functions
$f(x)$ that have $n$ continuous derivatives $f'(x), f''(x), \ldots, f^{(n)}(x)$ on $I$. Again, we need to
know that if $f(x)$ and $g(x)$ have $n$ continuous derivatives, so do $f(x) + g(x)$ and $c\,f(x)$ for
any $c \in \mathbb{R}$.
(e) The space $C^\infty(I) = \bigcap_{n \ge 0} C^n(I)$ of infinitely differentiable or smooth functions
is also a subspace. (The fact that this intersection is a subspace follows directly from
Exercise .)
(f) The space $\mathcal{A}(I)$ of analytic functions on the interval $I$. Recall that a function
$f(x)$ is called analytic at a point $a$ if it is smooth, and, moreover, its Taylor series
$$f(a) + f'(a)\,(x - a) + \tfrac{1}{2}\, f''(a)\,(x - a)^2 + \cdots = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(x - a)^n \tag{2.2}$$
converges to $f(x)$ for all $x$ sufficiently close to $a$. (It does not have to converge on the entire
interval $I$.) Not every smooth function is analytic, and so $\mathcal{A}(I) \subsetneq C^\infty(I)$. An explicit
example is the function
$$f(x) = \begin{cases} e^{-1/x}, & x > 0, \\ 0, & x \le 0. \end{cases} \tag{2.3}$$
It can be shown that every derivative of this function at 0 exists and equals zero: $f^{(n)}(0) = 0$,
$n = 0, 1, 2, \ldots$, and so the function is smooth. However, its Taylor series at $a = 0$ is
$0 + 0\,x + 0\,x^2 + \cdots \equiv 0$, which converges to the zero function, not to $f(x)$. Therefore $f(x)$
is not analytic at $a = 0$.
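The vanishing of all derivatives at 0 rests on the fact that $e^{-1/x}$ tends to zero faster than any power of $x$ as $x \to 0^+$. The following sketch gives numerical evidence for this (it is not a proof); the sample points and exponents are arbitrary choices made for illustration.

```python
import math

def f(x):
    """The function (2.3): exp(-1/x) for x > 0, and 0 otherwise."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

# Hypothetical sample points approaching 0 from the right.
xs = (0.1, 0.05, 0.02, 0.01)

# For each power n, the ratios f(x) / x^n shrink as x decreases,
# suggesting that f(x) = o(x^n) as x -> 0+ for every n.
ratios = {n: [f(x) / x ** n for x in xs] for n in (1, 5, 10)}

for n, values in ratios.items():
    print(n, ["%.3e" % v for v in values])
```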
(g) The set of all mean zero functions. The mean or average of an integrable function
defined on a closed interval $I = [a, b]$ is the real number
$$\overline{f} = \frac{1}{b - a} \int_a^b f(x)\,dx. \tag{2.4}$$
In particular, $f$ has mean zero if and only if $\int_a^b f(x)\,dx = 0$. Note that $\overline{f + g} = \overline{f} + \overline{g}$,
and so the sum of two mean zero functions also has mean zero. Similarly, $\overline{c\,f} = c\,\overline{f}$, and
any scalar multiple of a mean zero function also has mean zero.
(h) Let $x_0 \in I$ be a given point. Then the set of all functions $f(x)$ that vanish
at the point, $f(x_0) = 0$, is a subspace. Indeed, if $f(x_0) = 0$ and $g(x_0) = 0$, then clearly
$(f + g)(x_0) = 0$ and $c\,f(x_0) = 0$, proving closure. This example can evidently be generalized
to functions that vanish at several points, or even on an entire subset.
(i) The set of all solutions $u = f(x)$ to the homogeneous linear differential equation
$$u'' + 2u' - 3u = 0.$$
Indeed, if $u = f(x)$ and $u = g(x)$ are solutions, so are $u = f(x) + g(x)$ and $u = c\,f(x)$ for
any $c \in \mathbb{R}$. Note that we do not need to actually solve the equation to verify these claims!

If I = [ a, b ] is closed, we use the appropriate one-sided derivatives at its endpoints.

They follow directly from linearity; for example
$$(f + g)'' + 2(f + g)' - 3(f + g) = (f'' + 2f' - 3f) + (g'' + 2g' - 3g) = 0.$$
Warning: In the last three examples, the value 0 is essential for the indicated set of
functions to be a subspace. The set of functions such that $f(x_0) = 1$, say, is not a subspace.
The set of functions with a fixed nonzero mean, say $\overline{f} = 3$, is also not a subspace. Nor is
the set of solutions to an inhomogeneous ordinary differential equation, say
$$u'' + 2u' - 3u = x - 3.$$
None of these subsets contain the zero function, nor do they satisfy the closure conditions.
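To see concretely why closure fails in the inhomogeneous case, one can repeat the linearity computation with a nonzero right hand side. The operator symbol $L$ below is shorthand introduced only for this sketch.

```latex
% Shorthand (an assumption of this sketch): L[u] = u'' + 2u' - 3u.
% Suppose f and g both solve the inhomogeneous equation L[u] = x - 3.
\begin{aligned}
L[f + g] &= L[f] + L[g] = (x - 3) + (x - 3) = 2\,(x - 3), \\
L[c\,f]  &= c\,L[f] = c\,(x - 3).
\end{aligned}
% As functions of x, 2(x - 3) differs from x - 3, and c(x - 3) equals x - 3
% only when c = 1. Hence neither f + g nor c f (for c \ne 1) solves the
% original equation, so both closure conditions fail.
```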

2.3. Span and Linear Independence.

The definition of the span of a finite collection of elements of a vector space generalizes,
in a natural fashion, the geometric notion of two vectors spanning a plane in $\mathbb{R}^3$. As such,
it forms the first of two important, general methods for constructing subspaces of vector
spaces.
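Definition 2.13. Let $\mathbf{v}_1, \ldots, \mathbf{v}_k$ be a finite collection of elements of a vector space
$V$. A sum of the form
$$c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k = \sum_{i=1}^{k} c_i \mathbf{v}_i, \tag{2.5}$$
where the coefficients $c_1, c_2, \ldots, c_k$ are any scalars, is known as a linear combination of the
elements $\mathbf{v}_1, \ldots, \mathbf{v}_k$. Their span is the subset $W = \operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\} \subset V$ consisting of all
possible linear combinations (2.5).
For example,
$$3\,\mathbf{v}_1 + \mathbf{v}_2 - 2\,\mathbf{v}_3, \qquad 8\,\mathbf{v}_1 - \tfrac{1}{3}\,\mathbf{v}_3, \qquad \mathbf{v}_2 = 0\,\mathbf{v}_1 + 1\,\mathbf{v}_2 + 0\,\mathbf{v}_3, \qquad \text{and} \qquad \mathbf{0} = 0\,\mathbf{v}_1 + 0\,\mathbf{v}_2 + 0\,\mathbf{v}_3,$$
are four different linear combinations of the three vector space elements $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3 \in V$.
The key observation is that a span always forms a subspace.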
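Proposition 2.14. The span of a collection of vectors, $W = \operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$,
forms a subspace of the underlying vector space.
Proof: We need to show that if
$$\mathbf{v} = c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k \qquad \text{and} \qquad \hat{\mathbf{v}} = \hat{c}_1 \mathbf{v}_1 + \cdots + \hat{c}_k \mathbf{v}_k$$
are any two linear combinations, then their sum
$$\mathbf{v} + \hat{\mathbf{v}} = (c_1 + \hat{c}_1)\,\mathbf{v}_1 + \cdots + (c_k + \hat{c}_k)\,\mathbf{v}_k$$
is also a linear combination, as is any scalar multiple
$$a\,\mathbf{v} = (a\,c_1)\,\mathbf{v}_1 + \cdots + (a\,c_k)\,\mathbf{v}_k.$$
Q.E.D.
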

Example 2.15. Examples of subspaces spanned by vectors in $\mathbb{R}^3$:
(i) If $\mathbf{v}_1 \ne \mathbf{0}$ is any non-zero vector in $\mathbb{R}^3$, then its span is the line $\{\, c\,\mathbf{v}_1 \mid c \in \mathbb{R} \,\}$ in
the direction of $\mathbf{v}_1$. If $\mathbf{v}_1 = \mathbf{0}$, then its span just consists of the origin.
(ii) If $\mathbf{v}_1$ and $\mathbf{v}_2$ are any two vectors in $\mathbb{R}^3$, then their span is the set of all vectors
of the form $c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2$. Typically, such a span forms a plane passing through the origin.
However, if $\mathbf{v}_1$ and $\mathbf{v}_2$ are parallel, then their span is just a line. The most degenerate case
is when $\mathbf{v}_1 = \mathbf{v}_2 = \mathbf{0}$, where the span is just a point, namely the origin.
(iii) If we are given three non-coplanar vectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$, then their span is all of $\mathbb{R}^3$,
as we shall prove below. However, if they all lie in a plane, then their span is the plane,
unless they are all parallel, in which case their span is a line or, when $\mathbf{v}_1 = \mathbf{v}_2 = \mathbf{v}_3 = \mathbf{0}$,
a single point.
Thus, any subspace of $\mathbb{R}^3$ can be realized as the span of some set of vectors. Note that
we can also consider the span of four or more vectors, but the range of possible subspaces
is limited, as we noted above, to either a point (the origin), a line, a plane, or the entire
three-dimensional space. A crucial question, that we will return to shortly, is to determine
when a given vector belongs to the span of a collection of vectors.
Remark: It is entirely possible for different sets of vectors to span the same subspace.
For instance, the pair of vectors $\mathbf{e}_1 = (1, 0, 0)^T$ and $\mathbf{e}_2 = (0, 1, 0)^T$ span the $x\,y$ plane in
$\mathbb{R}^3$, as do the three coplanar vectors $\mathbf{v}_1 = (1, 1, 0)^T$, $\mathbf{v}_2 = (1, 2, 0)^T$, $\mathbf{v}_3 = (2, 1, 0)^T$.
Example 2.16. Let $V = \mathcal{F}(\mathbb{R})$ denote the space of all scalar functions $f(x)$.
(a) The span of the three monomials $f_1(x) = 1$, $f_2(x) = x$ and $f_3(x) = x^2$ is the set
of all functions of the form
$$f(x) = c_1 f_1(x) + c_2 f_2(x) + c_3 f_3(x) = c_1 + c_2\,x + c_3\,x^2,$$
where $c_1, c_2, c_3$ are arbitrary scalars (constants). In other words, $\operatorname{span}\{1, x, x^2\} = P^{(2)}$
is the subspace of all quadratic (degree $\le 2$) polynomials. In a similar fashion, the space
$P^{(n)}$ of polynomials of degree $\le n$ is spanned by the monomials $1, x, x^2, \ldots, x^n$.
(b) The next example plays a key role in many applications. Let $\omega \in \mathbb{R}$ be fixed.
Consider the two basic trigonometric functions $f_1(x) = \cos \omega x$, $f_2(x) = \sin \omega x$ of frequency
$\omega$, and hence period $2\pi/\omega$. Their span consists of all functions of the form
$$f(x) = c_1 f_1(x) + c_2 f_2(x) = c_1 \cos \omega x + c_2 \sin \omega x. \tag{2.6}$$
For example, the function $\cos(\omega x + 2)$ lies in the span because, by the addition formula
for the cosine,
$$\cos(\omega x + 2) = \cos 2 \,\cos \omega x - \sin 2 \,\sin \omega x$$
is a linear combination of $\cos \omega x$ and $\sin \omega x$.
We can express a general function in their span in the alternative phase-amplitude
form
$$f(x) = c_1 \cos \omega x + c_2 \sin \omega x = r \cos(\omega x - \delta). \tag{2.7}$$
Expanding the right hand side, we find
$$r \cos(\omega x - \delta) = r \cos\delta \,\cos \omega x + r \sin\delta \,\sin \omega x,$$
and hence
$$c_1 = r \cos\delta, \qquad c_2 = r \sin\delta.$$
We can view the amplitude $r \ge 0$ and the phase shift $\delta$ as the polar coordinates of the point
$\mathbf{c} = (c_1, c_2) \in \mathbb{R}^2$ prescribed by the coefficients. Thus, any combination of $\sin \omega x$ and
$\cos \omega x$ can be rewritten as a single cosine, with a phase lag. Figure 2.1 shows the particular
case $3\cos(2x - 1)$, which has amplitude $r = 3$, frequency $\omega = 2$ and phase shift $\delta = 1$. The
first peak appears at $x = \delta/\omega = \tfrac{1}{2}$.

Figure 2.1. Graph of $3\cos(2x - 1)$.
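The polar-coordinate interpretation translates directly into a computation. The sketch below, using only the standard `math` module, recovers $r$ and $\delta$ from the coefficients $c_1, c_2$; the function name `to_phase_amplitude` is an assumption of this illustration, and the test data are the values $r = 3$, $\omega = 2$, $\delta = 1$ of Figure 2.1.

```python
import math

def to_phase_amplitude(c1, c2):
    """Return (r, delta) with c1 = r cos(delta), c2 = r sin(delta), r >= 0.

    These are the polar coordinates of the point (c1, c2); atan2 returns
    the angle delta in the interval (-pi, pi].
    """
    r = math.hypot(c1, c2)
    delta = math.atan2(c2, c1)
    return r, delta

# Data of Figure 2.1: 3 cos(2x - 1).
omega = 2.0
r, delta = 3.0, 1.0
c1, c2 = r * math.cos(delta), r * math.sin(delta)

# Converting the coefficients back should reproduce r and delta.
r_back, delta_back = to_phase_amplitude(c1, c2)
```

Using `atan2` rather than `atan(c2 / c1)` avoids a division by zero when $c_1 = 0$ and selects the correct quadrant for $\delta$.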
(c) The space $\mathcal{T}^{(2)}$ of quadratic trigonometric polynomials is spanned by the functions
$$1, \qquad \cos x, \qquad \sin x, \qquad \cos^2 x, \qquad \cos x \sin x, \qquad \sin^2 x.$$
Thus, the general quadratic trigonometric polynomial can be written as a linear combination
$$q(x) = c_0 + c_1 \cos x + c_2 \sin x + c_3 \cos^2 x + c_4 \cos x \sin x + c_5 \sin^2 x, \tag{2.8}$$
where $c_0, \ldots, c_5$ are arbitrary constants. A more useful spanning set for the same subspace
is the trigonometric functions
$$1, \qquad \cos x, \qquad \sin x, \qquad \cos 2x, \qquad \sin 2x. \tag{2.9}$$
Indeed, by the double angle formulas, both
$$\cos 2x = \cos^2 x - \sin^2 x, \qquad \sin 2x = 2 \sin x \cos x,$$
have the form of a quadratic trigonometric polynomial (2.8), and hence both belong to
$\mathcal{T}^{(2)}$. On the other hand, we can write
$$\cos^2 x = \tfrac{1}{2} \cos 2x + \tfrac{1}{2}, \qquad \cos x \sin x = \tfrac{1}{2} \sin 2x, \qquad \sin^2 x = -\tfrac{1}{2} \cos 2x + \tfrac{1}{2},$$
in terms of the functions (2.9). Therefore, the original linear combination (2.8) can be
written in the alternative form
$$\begin{aligned}
q(x) &= \left( c_0 + \tfrac{1}{2} c_3 + \tfrac{1}{2} c_5 \right) + c_1 \cos x + c_2 \sin x + \left( \tfrac{1}{2} c_3 - \tfrac{1}{2} c_5 \right) \cos 2x + \tfrac{1}{2} c_4 \sin 2x \\
&= \hat{c}_0 + \hat{c}_1 \cos x + \hat{c}_2 \sin x + \hat{c}_3 \cos 2x + \hat{c}_4 \sin 2x,
\end{aligned} \tag{2.10}$$
and so the functions (2.9) do indeed span $\mathcal{T}^{(2)}$. It is worth noting that we first characterized $\mathcal{T}^{(2)}$ as the span of 6 functions, whereas the second characterization only required 5
functions. It turns out that 5 is the minimal number of functions needed to span $\mathcal{T}^{(2)}$, but
the proof of this fact will be deferred until Chapter 3.
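The change of spanning set in (2.10) amounts to a simple map on coefficients, which can be sketched and checked numerically as follows. The function names and the sample coefficient vector are hypothetical; agreement at finitely many points is evidence for the identity, while the proof is the double angle computation above.

```python
import math

def to_double_angle(c):
    """Map (c0, ..., c5) of (2.8) to the coefficients of (2.10)."""
    c0, c1, c2, c3, c4, c5 = c
    return (c0 + c3 / 2 + c5 / 2, c1, c2, (c3 - c5) / 2, c4 / 2)

def q_powers(c, x):
    """Evaluate (2.8): the combination of 1, cos, sin, cos^2, cos sin, sin^2."""
    c0, c1, c2, c3, c4, c5 = c
    s, k = math.sin(x), math.cos(x)
    return c0 + c1 * k + c2 * s + c3 * k * k + c4 * k * s + c5 * s * s

def q_double_angle(ch, x):
    """Evaluate (2.10): the combination of 1, cos x, sin x, cos 2x, sin 2x."""
    h0, h1, h2, h3, h4 = ch
    return (h0 + h1 * math.cos(x) + h2 * math.sin(x)
            + h3 * math.cos(2 * x) + h4 * math.sin(2 * x))

c = (1.0, -2.0, 0.5, 3.0, 4.0, -1.0)     # hypothetical coefficients
c_hat = to_double_angle(c)

# Largest discrepancy between the two forms on a grid of sample points.
max_gap = max(abs(q_powers(c, x) - q_double_angle(c_hat, x))
              for x in [0.1 * i for i in range(-30, 31)])
```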
(d) The homogeneous linear ordinary differential equation
$$u'' + 2u' - 3u = 0 \tag{2.11}$$
considered in part (i) of Example 2.12 has two independent solutions: $f_1(x) = e^x$ and
$f_2(x) = e^{-3x}$. (Now may be a good time for you to review the basic techniques for solving
linear, constant coefficient ordinary differential equations.) The general solution to the
differential equation is a linear combination
$$u = c_1 f_1(x) + c_2 f_2(x) = c_1 e^x + c_2 e^{-3x}.$$
Thus, the vector space of solutions to (2.11) is described as the span of these two basic
solutions. The fact that there are no other solutions is not obvious, but relies on the
basic existence and uniqueness theorems for linear ordinary differential equations; see
Theorem 7.33 for further details.
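For readers who want a quick reminder of where the two exponentials come from, here is the standard exponential ansatz applied to (2.11), given as a brief sketch rather than a full treatment.

```latex
% Substitute the trial solution u = e^{r x} into u'' + 2u' - 3u = 0:
\begin{aligned}
u'' + 2u' - 3u
  &= \left( r^2 + 2r - 3 \right) e^{r x} \\
  &= (r - 1)(r + 3)\, e^{r x}.
\end{aligned}
% Since e^{r x} never vanishes, the equation holds exactly when
% r = 1 or r = -3, giving the solutions e^{x} and e^{-3x}.
```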
Remark: One can also define the span of an infinite collection of elements of a vector
space. To avoid convergence issues, one should only consider finite linear combinations
(2.5). For example, the span of the monomials $1, x, x^2, x^3, \ldots$ is the space $P^{(\infty)}$ of all
polynomials. (Not the space of convergent Taylor series.) Similarly, the span of the
functions $1, \cos x, \sin x, \cos 2x, \sin 2x, \cos 3x, \sin 3x, \ldots$ is the space of all trigonometric
polynomials, to be discussed in great detail in Chapter 12.
Linear Independence and Dependence
Most of the time, all of the vectors used to form a span are essential. For example, we
cannot use fewer than two vectors to span a plane in $\mathbb{R}^3$ since the span of a single vector
is at most a line. However, in the more degenerate cases, some of the spanning elements
are not needed. For instance, if the two vectors are parallel, then their span is a line, but
only one of the vectors is really needed to define the line. Similarly, the subspace spanned
by the polynomials
$$p_1(x) = x - 2, \qquad p_2(x) = x^2 - 5x + 4, \qquad p_3(x) = 3x^2 - 4x, \qquad p_4(x) = x^2 - 1 \tag{2.12}$$
is the vector space $P^{(2)}$ of quadratic polynomials. But only three of the polynomials are
really required to span $P^{(2)}$. (The reason will become clear soon, but you may wish to see
if you can demonstrate this on your own.) The elimination of such superfluous spanning
elements is encapsulated in the following basic definition.
Definition 2.17. The vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k \in V$ are called linearly dependent if there
exists a collection of scalars $c_1, \ldots, c_k$, not all zero, such that
$$c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k = \mathbf{0}. \tag{2.13}$$
Vectors which are not linearly dependent are called linearly independent.
The restriction that the $c_i$'s not all simultaneously vanish is essential. Indeed, if
$c_1 = \cdots = c_k = 0$, then the linear combination (2.13) is automatically zero. To check
linear independence, one needs to show that the only linear combination that produces
the zero vector (2.13) is this trivial one. In other words, $c_1 = \cdots = c_k = 0$ is the one and
only solution to the vector equation (2.13).
Example 2.18. Some examples of linear independence and dependence:
(a) The vectors
$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix} 0 \\ 3 \\ 1 \end{pmatrix}, \qquad \mathbf{v}_3 = \begin{pmatrix} -1 \\ 4 \\ 3 \end{pmatrix},$$
are linearly dependent. Indeed,
$$\mathbf{v}_1 - 2\,\mathbf{v}_2 + \mathbf{v}_3 = \mathbf{0}.$$
On the other hand, the first two vectors $\mathbf{v}_1, \mathbf{v}_2$ are linearly independent. To see this,
suppose that
$$c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 = \begin{pmatrix} c_1 \\ 2c_1 + 3c_2 \\ -c_1 + c_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$
For this to happen, the coefficients $c_1, c_2$ must satisfy the homogeneous linear system
$$c_1 = 0, \qquad 2c_1 + 3c_2 = 0, \qquad -c_1 + c_2 = 0,$$
which has only the trivial solution $c_1 = c_2 = 0$, proving linear independence.
(b) In general, any collection $\mathbf{v}_1, \ldots, \mathbf{v}_k$ that includes the zero vector, say $\mathbf{v}_1 = \mathbf{0}$, is
automatically linearly dependent, since $1\,\mathbf{v}_1 + 0\,\mathbf{v}_2 + \cdots + 0\,\mathbf{v}_k = \mathbf{0}$ is a nontrivial linear
combination that adds up to $\mathbf{0}$.
(c) The polynomials (2.12) are linearly dependent; indeed,
$$p_1(x) + p_2(x) - p_3(x) + 2\,p_4(x) \equiv 0$$
is a nontrivial linear combination that vanishes identically. On the other hand, the first
three polynomials, $p_1(x), p_2(x), p_3(x)$, are linearly independent. Indeed, if the linear combination
$$c_1 p_1(x) + c_2 p_2(x) + c_3 p_3(x) = (c_2 + 3c_3)\,x^2 + (c_1 - 5c_2 - 4c_3)\,x - 2c_1 + 4c_2 \equiv 0$$
is the zero polynomial, then its coefficients must vanish, and hence $c_1, c_2, c_3$ are required
to solve the homogeneous linear system
$$c_2 + 3c_3 = 0, \qquad c_1 - 5c_2 - 4c_3 = 0, \qquad -2c_1 + 4c_2 = 0.$$
But this has only the trivial solution $c_1 = c_2 = c_3 = 0$, and so linear independence follows.
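Both claims in part (c) can be checked mechanically by storing each polynomial as its list of coefficients. In the sketch below the representation `[a0, a1, a2]` for $a_0 + a_1 x + a_2 x^2$ and the helper names are assumptions made for illustration; the nonzero determinant is one standard way to confirm that a square homogeneous system has only the trivial solution.

```python
def combine(coeffs, polys):
    """Linear combination of polynomials stored as lists [a0, a1, a2]."""
    degree_slots = len(polys[0])
    return [sum(c * p[i] for c, p in zip(coeffs, polys))
            for i in range(degree_slots)]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 0."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

p1 = [-2, 1, 0]    # x - 2
p2 = [4, -5, 1]    # x^2 - 5x + 4
p3 = [0, -4, 3]    # 3x^2 - 4x
p4 = [-1, 0, 1]    # x^2 - 1

# The dependency p1 + p2 - p3 + 2 p4 should be the zero polynomial.
relation = combine([1, 1, -1, 2], [p1, p2, p3, p4])

# Coefficient matrix whose columns are p1, p2, p3; a nonzero determinant
# means c1 p1 + c2 p2 + c3 p3 = 0 forces c1 = c2 = c3 = 0.
M = [[p[i] for p in (p1, p2, p3)] for i in range(3)]
independence_det = det3(M)
```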
Remark: In the last example, we are using the basic fact that a polynomial is identically zero,
$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \equiv 0 \qquad \text{for all} \quad x,$$
if and only if its coefficients all vanish: $a_0 = a_1 = \cdots = a_n = 0$. This is equivalent
to the self-evident fact that the basic monomial functions $1, x, x^2, \ldots, x^n$ are linearly
independent; see Exercise .
Example 2.19. The set of quadratic trigonometric functions
$$1, \qquad \cos x, \qquad \sin x, \qquad \cos^2 x, \qquad \cos x \sin x, \qquad \sin^2 x,$$
that were used to define the vector space $\mathcal{T}^{(2)}$ of quadratic trigonometric polynomials, are,
in fact, linearly dependent. This is a consequence of the basic trigonometric identity
$$\cos^2 x + \sin^2 x \equiv 1,$$
which can be rewritten as a nontrivial linear combination
$$1 + 0 \cos x + 0 \sin x - \cos^2 x + 0 \cos x \sin x - \sin^2 x \equiv 0$$
that sums to the zero function. On the other hand, the alternative spanning set
$$1, \qquad \cos x, \qquad \sin x, \qquad \cos 2x, \qquad \sin 2x,$$
is linearly independent, since the only identically zero linear combination
$$c_0 + c_1 \cos x + c_2 \sin x + c_3 \cos 2x + c_4 \sin 2x \equiv 0$$
is the trivial one $c_0 = \cdots = c_4 = 0$. However, the latter fact is not as obvious, and requires
a bit of work to prove directly; see Exercise . An easier proof, based on orthogonality,
will appear in Chapter 5.
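Let us now focus our attention on the linear independence or dependence of a set
of vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k \in \mathbb{R}^n$ in Euclidean space. We begin by forming the $n \times k$ matrix
$A = (\, \mathbf{v}_1 \ \ldots \ \mathbf{v}_k \,)$ whose columns are the given vectors. (The fact that we use column
vectors is essential here.) The key is a very basic formula
$$A\,\mathbf{c} = c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k, \qquad \text{where} \qquad \mathbf{c} = \begin{pmatrix} c_1 \\ \vdots \\ c_k \end{pmatrix}, \tag{2.14}$$
that expresses any linear combination in terms of matrix multiplication. For example,
$$\begin{pmatrix} 1 & 3 & 0 \\ -1 & 2 & 1 \\ 4 & -1 & -2 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}
= \begin{pmatrix} c_1 + 3c_2 \\ -c_1 + 2c_2 + c_3 \\ 4c_1 - c_2 - 2c_3 \end{pmatrix}
= c_1 \begin{pmatrix} 1 \\ -1 \\ 4 \end{pmatrix} + c_2 \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix} + c_3 \begin{pmatrix} 0 \\ 1 \\ -2 \end{pmatrix}.$$
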
Formula (2.14) is an immediate consequence of the rules of matrix multiplication; see also
Exercise c. It allows us to reformulate the notions of linear independence and span in
terms of linear systems of equations. The main result is the following:
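Formula (2.14) says that the usual row-by-row product $A\,\mathbf{c}$ and the column-by-column combination $c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k$ are two ways of computing the same vector. The sketch below computes both for a $3 \times 3$ matrix with entries $(1, 3, 0)$, $(-1, 2, 1)$, $(4, -1, -2)$ in its rows; the function names and the coefficient vector are hypothetical.

```python
def matvec(A, c):
    """Row-by-row product: entry i is the dot product of row i with c."""
    return [sum(a * x for a, x in zip(row, c)) for row in A]

def column_combination(A, c):
    """c_1 v_1 + ... + c_k v_k, where v_j denotes the j-th column of A."""
    n, k = len(A), len(A[0])
    result = [0] * n
    for j in range(k):           # loop over the columns v_j
        for i in range(n):       # add c_j times column j, entry by entry
            result[i] += c[j] * A[i][j]
    return result

A = [[1, 3, 0],
     [-1, 2, 1],
     [4, -1, -2]]
c = [2, -1, 5]                   # hypothetical coefficients c_1, c_2, c_3

by_rows = matvec(A, c)
by_columns = column_combination(A, c)
```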
Theorem 2.20. Let $\mathbf{v}_1, \ldots, \mathbf{v}_k \in \mathbb{R}^n$ and let $A = (\, \mathbf{v}_1 \ \ldots \ \mathbf{v}_k \,)$ be the corresponding
$n \times k$ matrix.
(a) The vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k \in \mathbb{R}^n$ are linearly dependent if and only if there is a non-zero
solution $\mathbf{c} \ne \mathbf{0}$ to the homogeneous linear system $A\,\mathbf{c} = \mathbf{0}$.
(b) The vectors are linearly independent if and only if the only solution to the homogeneous system $A\,\mathbf{c} = \mathbf{0}$ is the trivial one $\mathbf{c} = \mathbf{0}$.
(c) A vector $\mathbf{b}$ lies in the span of $\mathbf{v}_1, \ldots, \mathbf{v}_k$ if and only if the linear system $A\,\mathbf{c} = \mathbf{b}$ is
compatible, i.e., it has at least one solution.
Proof: We prove the first statement, leaving the other two as exercises for the reader.
The condition that $\mathbf{v}_1, \ldots, \mathbf{v}_k$ be linearly dependent is that there is a nonzero vector
$$\mathbf{c} = (c_1, c_2, \ldots, c_k)^T \ne \mathbf{0}$$
such that the linear combination
$$A\,\mathbf{c} = c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k = \mathbf{0}.$$
Therefore, linear dependence requires the existence of a nontrivial solution to the homogeneous linear system $A\,\mathbf{c} = \mathbf{0}$.
Q.E.D.
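Example 2.21. Let us determine whether the vectors
$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix} 3 \\ 0 \\ 4 \end{pmatrix}, \qquad \mathbf{v}_3 = \begin{pmatrix} 1 \\ -4 \\ 6 \end{pmatrix}, \qquad \mathbf{v}_4 = \begin{pmatrix} 4 \\ 2 \\ 3 \end{pmatrix}, \tag{2.15}$$
are linearly independent or linearly dependent. We combine them as column vectors into
a single matrix
$$A = \begin{pmatrix} 1 & 3 & 1 & 4 \\ 2 & 0 & -4 & 2 \\ -1 & 4 & 6 & 3 \end{pmatrix}.$$
According to Theorem 2.20, we need to figure out whether there are any nontrivial solutions
to the homogeneous equation $A\,\mathbf{c} = \mathbf{0}$; this can be done by reducing $A$ to row echelon form,
which is
$$U = \begin{pmatrix} 1 & 3 & 1 & 4 \\ 0 & -6 & -6 & -6 \\ 0 & 0 & 0 & 0 \end{pmatrix}. \tag{2.16}$$
The general solution to the homogeneous system $A\,\mathbf{c} = \mathbf{0}$ is
$$\mathbf{c} = (\, 2c_3 - c_4,\ -c_3 - c_4,\ c_3,\ c_4 \,)^T,$$
where $c_3, c_4$ (the free variables) are arbitrary. Any nonzero choice of $c_3, c_4$ will produce
a nontrivial linear combination
$$(2c_3 - c_4)\,\mathbf{v}_1 + (-c_3 - c_4)\,\mathbf{v}_2 + c_3 \mathbf{v}_3 + c_4 \mathbf{v}_4 = \mathbf{0}$$
that adds up to the zero vector. Therefore, the vectors (2.15) are linearly dependent.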
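The row reduction in this example can be reproduced with a short Gaussian elimination routine. The sketch below is one possible implementation, not the text's own algorithm: it uses exact rational arithmetic from the standard `fractions` module, swaps rows when a pivot entry is zero, and does not normalize pivots. The matrix entries are those of the four vectors $(1, 2, -1)^T$, $(3, 0, 4)^T$, $(1, -4, 6)^T$, $(4, 2, 3)^T$; the function names are hypothetical.

```python
from fractions import Fraction

def row_echelon(A):
    """Return (U, rank): a row echelon form of A and its number of pivots."""
    U = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(U), len(U[0])
    r = 0                                   # index of the next pivot row
    for j in range(cols):
        # Look for a nonzero entry in column j at or below row r.
        p = next((i for i in range(r, rows) if U[i][j] != 0), None)
        if p is None:
            continue                        # no pivot here: a free variable
        U[r], U[p] = U[p], U[r]             # row swap (a no-op when p == r)
        for i in range(r + 1, rows):
            m = U[i][j] / U[r][j]           # multiplier for row i
            U[i] = [a - m * b for a, b in zip(U[i], U[r])]
        r += 1
        if r == rows:
            break
    return U, r

def matvec(A, c):
    """Matrix-vector product A c."""
    return [sum(a * x for a, x in zip(row, c)) for row in A]

def null_vector(c3, c4):
    """General solution of A c = 0 in terms of the free variables c3, c4."""
    return [2 * c3 - c4, -c3 - c4, c3, c4]

A = [[1, 3, 1, 4],
     [2, 0, -4, 2],
     [-1, 4, 6, 3]]
U, rank = row_echelon(A)
```

Since the rank (2) is smaller than the number of columns (4), there are two free variables, in agreement with the general solution displayed above.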
In fact, Theorem 1.45 says that in this particular case we didn't even need to do the
row reduction if we only needed to answer the question of linear dependence or linear
independence. Any coefficient matrix with more columns than rows automatically has a
nontrivial solution to the associated homogeneous system. This implies the following:
Lemma 2.22. Any collection of $k > n$ vectors in $\mathbb{R}^n$ is linearly dependent.
Warning: The converse to this lemma is not true. For example, the two vectors
$\mathbf{v}_1 = ( 1, 2, 3 )^T$ and $\mathbf{v}_2 = ( -2, -4, -6 )^T$ in $\mathbb{R}^3$ are linearly dependent since $2 \mathbf{v}_1 + \mathbf{v}_2 = \mathbf{0}$.
For a collection of $n$ or fewer vectors in $\mathbb{R}^n$, one does need to perform the elimination to
calculate the rank of the corresponding matrix.
Lemma 2.22 is a particular case of the following general characterization of linearly
independent vectors.
Proposition 2.23. A set of $k$ vectors in $\mathbb{R}^n$ is linearly independent if and only if
the corresponding $n \times k$ matrix $A$ has rank $k$. In particular, this requires $k \leq n$.
Or, to state the result another way, the vectors are linearly independent if and only if
the linear system $A \mathbf{c} = \mathbf{0}$ has no free variables. The proposition is an immediate corollary
of Theorems 2.20 and 1.45.
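The rank criterion can be sketched in a few lines of Python. This is a minimal illustration, not the text's own algorithm: it counts pivots by Gaussian elimination in exact rational arithmetic, and the function names `rank` and `is_independent` are assumptions made for the example.

```python
from fractions import Fraction

def rank(rows):
    """Count the pivots found by Gaussian elimination with exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0                                   # index of the next pivot row
    for c in range(len(m[0])):
        # look for a nonzero entry in column c, at or below row r
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                        # no pivot in this column
        m[r], m[p] = m[p], m[r]             # row interchange if needed
        for i in range(r + 1, len(m)):      # eliminate below the pivot
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_independent(vectors):
    """k vectors in R^n are independent iff their n x k matrix has rank k."""
    matrix = [list(row) for row in zip(*vectors)]   # vectors become columns
    return rank(matrix) == len(vectors)

print(is_independent([(1, 2, 3), (-2, -4, -6)]))    # the vectors of the Warning
```

The two vectors of the Warning produce a matrix of rank 1, so the test reports dependence even though $k = 2 \leq n = 3$.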
Example 2.21. (continued) Let us now see which vectors $\mathbf{b} \in \mathbb{R}^3$ lie in the span of
the vectors (2.15). This will be the case if and only if the linear system $A \mathbf{x} = \mathbf{b}$ has a
solution. Since the resulting row echelon form (2.16) has a row of all zeros, there will be a
compatibility condition on the entries of $\mathbf{b}$, and therefore not every vector lies in the span.
To find the precise condition, we augment the coefficient matrix, and apply the same row
operations, leading to the reduced augmented matrix
$$\left( \begin{array}{cccc|c} 1 & 3 & 1 & 4 & b_1 \\ 0 & -6 & -6 & -6 & b_2 - 2 b_1 \\ 0 & 0 & 0 & 0 & b_3 + \frac{7}{6} b_2 - \frac{4}{3} b_1 \end{array} \right).$$
Therefore, $\mathbf{b} = ( b_1, b_2, b_3 )^T$ lies in the span of these four vectors if and only if
$$- \tfrac{4}{3} b_1 + \tfrac{7}{6} b_2 + b_3 = 0.$$
In other words, these four vectors only span a plane in $\mathbb{R}^3$.
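The span test can be checked mechanically: $\mathbf{b}$ lies in the span exactly when augmenting $A$ by $\mathbf{b}$ does not raise the rank. The sketch below compares that rank test with the compatibility condition $-\frac{4}{3} b_1 + \frac{7}{6} b_2 + b_3 = 0$. The matrix uses the signs assumed for (2.15), namely columns $(1,2,-1)^T$, $(3,0,4)^T$, $(1,-4,6)^T$, $(4,2,3)^T$, and the names `rank`, `in_span` and `condition` are illustrative.

```python
from fractions import Fraction

A = [[1, 3, 1, 4],
     [2, 0, -4, 2],
     [-1, 4, 6, 3]]

def rank(rows):
    """Count the pivots found by Gaussian elimination with exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def in_span(b):
    """b is in the span of the columns of A iff [A | b] has the same rank as A."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(augmented) == rank(A)

def condition(b):
    """The compatibility condition read off from the reduced augmented matrix."""
    b1, b2, b3 = b
    return Fraction(-4, 3) * b1 + Fraction(7, 6) * b2 + b3 == 0

for b in [(1, 2, -1), (3, 0, 4), (0, 0, 1), (1, 1, 1)]:
    print(b, in_span(b), condition(b))
```

The two tests agree on every sample vector: the columns of $A$ themselves pass, while $(0, 0, 1)^T$ and $(1, 1, 1)^T$ lie off the plane.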

The same method demonstrates that a collection of vectors will span all of $\mathbb{R}^n$ if
and only if the row echelon form of the associated matrix contains no all zero rows, or,
equivalently, the rank is equal to $n$, the number of rows in the matrix.
Proposition 2.24. A collection of $k$ vectors will span $\mathbb{R}^n$ if and only if their $n \times k$
matrix has rank $n$. In particular, this requires $k \geq n$.
Warning: Not every collection of $n$ or more vectors in $\mathbb{R}^n$ will span all of $\mathbb{R}^n$. A
counterexample is provided by the vectors (2.15).

2.4. Bases.
In order to span a vector space or subspace, we must use a sufficient number of distinct
elements. On the other hand, including too many elements in the spanning set will violate
linear independence, and cause redundancies. The optimal spanning sets are those that are
also linearly independent. By combining the properties of span and linear independence,
we arrive at the all-important concept of a basis.
Definition 2.25. A basis of a vector space $V$ is a finite collection of elements
$\mathbf{v}_1, \ldots, \mathbf{v}_n \in V$ which (a) span $V$, and (b) are linearly independent.
Bases are absolutely fundamental in all areas of linear algebra and linear analysis, including matrix algebra, geometry of Euclidean space, solutions to linear differential equations, both ordinary and partial, linear boundary value problems, Fourier analysis, signal
and image processing, data compression, control systems, and so on.
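Example 2.26. The standard basis of $\mathbb{R}^n$ consists of the $n$ vectors
$$\mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \qquad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \qquad \ldots \qquad \mathbf{e}_n = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}, \tag{2.17}$$
so that $\mathbf{e}_i$ is the vector with 1 in the $i$th slot and 0's elsewhere. We already encountered
these vectors as the columns of the $n \times n$ identity matrix, as in (1.39). They clearly span
$\mathbb{R}^n$ since we can write any vector
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = x_1 \mathbf{e}_1 + x_2 \mathbf{e}_2 + \cdots + x_n \mathbf{e}_n, \tag{2.18}$$
as a linear combination, whose coefficients are the entries of $\mathbf{x}$. Moreover, the only linear
combination that gives the zero vector $\mathbf{x} = \mathbf{0}$ is the trivial one $x_1 = \cdots = x_n = 0$, and so
$\mathbf{e}_1, \ldots, \mathbf{e}_n$ are linearly independent.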
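Remark: In the three-dimensional case $\mathbb{R}^3$, a common physical notation for the standard basis is
$$\mathbf{i} = \mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \qquad \mathbf{j} = \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \qquad \mathbf{k} = \mathbf{e}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}. \tag{2.19}$$
There are many other possible bases for $\mathbb{R}^3$. Indeed, any three non-coplanar vectors
can be used to form a basis. This is a consequence of the following general characterization
of bases in $\mathbb{R}^n$.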

Theorem 2.27. Every basis of $\mathbb{R}^n$ contains exactly $n$ vectors. A set of $n$ vectors
$\mathbf{v}_1, \ldots, \mathbf{v}_n \in \mathbb{R}^n$ is a basis if and only if the $n \times n$ matrix $A = ( \mathbf{v}_1 \; \ldots \; \mathbf{v}_n )$ is nonsingular.
Proof: This is a direct consequence of Theorem 2.20. Linear independence requires
that the only solution to the homogeneous system $A \mathbf{x} = \mathbf{0}$ is the trivial one $\mathbf{x} = \mathbf{0}$.
Secondly, a vector $\mathbf{b} \in \mathbb{R}^n$ will lie in the span of $\mathbf{v}_1, \ldots, \mathbf{v}_n$ if and only if the linear system
$A \mathbf{x} = \mathbf{b}$ has a solution. For $\mathbf{v}_1, \ldots, \mathbf{v}_n$ to span $\mathbb{R}^n$, this must hold for all possible right
hand sides $\mathbf{b}$. Theorem 1.7 tells us that both results require that $A$ be nonsingular, i.e.,
have maximal rank $n$.
Q.E.D.
Thus, every basis of $n$-dimensional Euclidean space $\mathbb{R}^n$ contains the same number of
vectors, namely $n$. This is a general fact, and motivates a linear algebra characterization
of dimension.
Theorem 2.28. Suppose the vector space V has a basis v1 , . . . , vn . Then every other
basis of V has the same number of elements in it. This number is called the dimension of
V , and written dim V = n.
The proof of Theorem 2.28 rests on the following lemma.
Lemma 2.29. Suppose $\mathbf{v}_1, \ldots, \mathbf{v}_n$ span a vector space $V$. Then every set of $k > n$
elements $\mathbf{w}_1, \ldots, \mathbf{w}_k \in V$ is linearly dependent.
Proof: Let us write each element
$$\mathbf{w}_j = \sum_{i=1}^{n} a_{ij} \mathbf{v}_i, \qquad j = 1, \ldots, k,$$
as a linear combination of the spanning set. Then
$$c_1 \mathbf{w}_1 + \cdots + c_k \mathbf{w}_k = \sum_{i=1}^{n} \sum_{j=1}^{k} a_{ij} c_j \mathbf{v}_i.$$
This linear combination will be zero whenever $\mathbf{c} = ( c_1, c_2, \ldots, c_k )^T$ solves the homogeneous
linear system
$$\sum_{j=1}^{k} a_{ij} c_j = 0, \qquad i = 1, \ldots, n,$$
consisting of $n$ equations in $k > n$ unknowns. Theorem 1.45 guarantees that every homogeneous system with more unknowns than equations always has a non-trivial solution
$\mathbf{c} \neq \mathbf{0}$, and this immediately implies that $\mathbf{w}_1, \ldots, \mathbf{w}_k$ are linearly dependent.
Q.E.D.
Proof of Theorem 2.28 : Suppose we have two bases containing a different number of
elements. By definition, the smaller basis spans the vector space. But then Lemma 2.29
tells us that the elements in the larger purported basis must be linearly dependent. This
contradicts our assumption that both sets are bases, and proves the theorem.
Q.E.D.
As a direct consequence, we can now provide a precise meaning to the optimality
property of bases.

Theorem 2.30. Suppose V is an n-dimensional vector space. Then

(a) Every set of more than n elements of V is linearly dependent.
(b) No set of fewer than n elements spans V .
(c) A set of n elements forms a basis if and only if it spans V .
(d) A set of n elements forms a basis if and only if it is linearly independent.
In other words, once we determine the dimension of a vector space, to check that a
given collection with the correct number of elements forms a basis, we only need check one
of the two defining properties: span or linear independence. Thus, n elements that span an
n-dimensional vector space are automatically linearly independent and hence form a basis;
vice versa, n linearly independent elements of an n-dimensional vector space automatically
span the space and so form a basis.
Example 2.31. The standard basis of the space $\mathcal{P}^{(n)}$ of polynomials of degree $\leq n$
is given by the $n + 1$ monomials $1, x, x^2, \ldots, x^n$. (A formal proof of linear independence
appears in Exercise .) We conclude that the vector space $\mathcal{P}^{(n)}$ has dimension $n + 1$.
Thus, any collection of $n + 2$ or more polynomials of degree $\leq n$ is automatically linearly
dependent. Any other basis of $\mathcal{P}^{(n)}$ must contain precisely $n + 1$ polynomials. But, not
every collection of $n + 1$ polynomials in $\mathcal{P}^{(n)}$ is a basis; they must be linearly independent.
See Exercise for details.
Remark: By definition, every vector space of dimension $1 \leq n < \infty$ has a basis. If a
vector space V has no basis, it is either the trivial vector space V = {0}, which by convention has dimension 0, or, by definition, its dimension is infinite. An infinite-dimensional
vector space necessarily contains an infinite collection of linearly independent vectors, and
hence no (finite) basis. Examples of infinite-dimensional vector spaces include most spaces
of functions, such as the spaces of continuous, differentiable, or mean zero functions, as
well as the space of all polynomials, and the space of solutions to a linear homogeneous
partial differential equation. On the other hand, the solution space for a homogeneous
linear ordinary differential equation turns out to be a finite-dimensional vector space. The
most important example of an infinite-dimensional vector space, Hilbert space, to be
introduced in Chapter 12, is essential to modern analysis and function theory, [122, 126],
as well as providing the theoretical setting for all of quantum mechanics, [100, 104].
Warning: There is a well-developed concept of a complete basis of such infinite-dimensional function spaces, essential in Fourier analysis, [122, 126], but this requires additional analytical constructions that are beyond our present abilities. Thus, in this book
the term basis always means a finite collection of vectors in a finite-dimensional vector
space.
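Lemma 2.32. The elements $\mathbf{v}_1, \ldots, \mathbf{v}_n$ form a basis of $V$ if and only if every $\mathbf{x} \in V$
can be written uniquely as a linear combination thereof:
$$\mathbf{x} = c_1 \mathbf{v}_1 + \cdots + c_n \mathbf{v}_n = \sum_{i=1}^{n} c_i \mathbf{v}_i. \tag{2.20}$$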


Proof: The condition that the basis span $V$ implies every $\mathbf{x} \in V$ can be written as
some linear combination of the basis elements. Suppose we can write an element
$$\mathbf{x} = c_1 \mathbf{v}_1 + \cdots + c_n \mathbf{v}_n = \widehat{c}_1 \mathbf{v}_1 + \cdots + \widehat{c}_n \mathbf{v}_n$$
as two different combinations. Subtracting one from the other, we find
$$(c_1 - \widehat{c}_1) \mathbf{v}_1 + \cdots + (c_n - \widehat{c}_n) \mathbf{v}_n = \mathbf{0}.$$
Linear independence of the basis elements implies that the coefficients $c_i - \widehat{c}_i = 0$. We
conclude that $c_i = \widehat{c}_i$, and hence the linear combinations are the same.
Q.E.D.
The coefficients $(c_1, \ldots, c_n)$ in (2.20) are called the coordinates of the vector $\mathbf{x}$ with
respect to the given basis. For the standard basis (2.17) of $\mathbb{R}^n$, the coordinates of a vector
$\mathbf{x} = ( x_1, x_2, \ldots, x_n )^T$ are its entries, i.e., its usual Cartesian coordinates, cf. (2.18). In
many applications, an inspired change of basis will lead to a better adapted coordinate
system, thereby simplifying the computations.
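Example 2.33. A Wavelet Basis. The vectors
$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}, \qquad \mathbf{v}_3 = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix}, \qquad \mathbf{v}_4 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ -1 \end{pmatrix}, \tag{2.21}$$
form a basis of $\mathbb{R}^4$. This is verified by performing Gaussian elimination on the corresponding $4 \times 4$ matrix
$$A = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 1 & -1 & 0 \\ 1 & -1 & 0 & 1 \\ 1 & -1 & 0 & -1 \end{pmatrix},$$
to check that it is nonsingular. This basis is a very simple example of a wavelet basis; the
general case will be discussed in Section 13.2. Wavelets arise in modern applications to
signal and digital image processing, [43, 128].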
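How do we find the coordinates of a vector $\mathbf{x}$ relative to the basis? We need to fix the
coefficients $c_1, c_2, c_3, c_4$ so that
$$\mathbf{x} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + c_3 \mathbf{v}_3 + c_4 \mathbf{v}_4.$$
We rewrite this equation in matrix form
$$\mathbf{x} = A \mathbf{c}, \qquad \text{where} \qquad \mathbf{c} = ( c_1, c_2, c_3, c_4 )^T.$$
For example, solving the linear system for the vector $\mathbf{x} = ( 4, -2, 1, 5 )^T$ by Gaussian
Elimination produces the unique solution $c_1 = 2$, $c_2 = -1$, $c_3 = 3$, $c_4 = -2$, which are its
coordinates in the wavelet basis:
$$\begin{pmatrix} 4 \\ -2 \\ 1 \\ 5 \end{pmatrix} = 2 \mathbf{v}_1 - \mathbf{v}_2 + 3 \mathbf{v}_3 - 2 \mathbf{v}_4 = 2 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} - \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix} + 3 \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix} - 2 \begin{pmatrix} 0 \\ 0 \\ 1 \\ -1 \end{pmatrix}.$$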

In general, to find the coordinates of a vector $\mathbf{x}$ with respect to a new basis of $\mathbb{R}^n$
requires the solution of a linear system of equations, namely
$$A \mathbf{c} = \mathbf{x} \qquad \text{for} \qquad \mathbf{c} = A^{-1} \mathbf{x}. \tag{2.22}$$
Here $\mathbf{x} = ( x_1, x_2, \ldots, x_n )^T$ are the Cartesian coordinates of $\mathbf{x}$, with respect to the standard
basis $\mathbf{e}_1, \ldots, \mathbf{e}_n$, while $\mathbf{c} = ( c_1, c_2, \ldots, c_n )^T$ denotes its coordinates with respect to the new
basis $\mathbf{v}_1, \ldots, \mathbf{v}_n$ formed by the columns of the coefficient matrix $A = ( \mathbf{v}_1 \; \mathbf{v}_2 \; \ldots \; \mathbf{v}_n )$. In
practice, one solves for the coordinates by using Gaussian Elimination, not by matrix
inversion.
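The change of coordinates can be sketched as a direct solve of $A \mathbf{c} = \mathbf{x}$. The Python below is a minimal illustration using Gauss-Jordan elimination in exact rational arithmetic, a variant of the Gaussian Elimination named in the text. It takes the wavelet basis with the signs $\mathbf{v}_2 = (1, 1, -1, -1)^T$, $\mathbf{v}_3 = (1, -1, 0, 0)^T$, $\mathbf{v}_4 = (0, 0, 1, -1)^T$ as an assumption, and the function name `solve` is hypothetical.

```python
from fractions import Fraction

def solve(A, x):
    """Solve A c = x for a nonsingular square A by Gauss-Jordan elimination."""
    n = len(A)
    # augmented matrix [A | x] with exact rational entries
    m = [[Fraction(a) for a in row] + [Fraction(xi)] for row, xi in zip(A, x)]
    for c in range(n):
        # find a pivot in column c (a row interchange may be needed)
        p = next((i for i in range(c, n) if m[i][c] != 0), None)
        if p is None:
            raise ValueError("matrix is singular: the columns do not form a basis")
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [a / pivot for a in m[c]]            # scale the pivot row
        for i in range(n):
            if i != c:                              # clear the rest of column c
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return [row[n] for row in m]

# Columns are the wavelet basis vectors v1, ..., v4 (signs assumed as above).
A = [[1, 1, 1, 0],
     [1, 1, -1, 0],
     [1, -1, 0, 1],
     [1, -1, 0, -1]]

coords = solve(A, [4, -2, 1, 5])
print([str(c) for c in coords])   # coordinates of x in the wavelet basis
```

For $\mathbf{x} = (4, -2, 1, 5)^T$ this reproduces the coordinates $c_1 = 2$, $c_2 = -1$, $c_3 = 3$, $c_4 = -2$ without ever forming $A^{-1}$.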
Why would one want to change bases? The answer is simplification and speed: many
computations and formulas become much easier, and hence faster, to perform in a basis that
is adapted to the problem at hand. In signal processing, the wavelet basis is particularly
appropriate for denoising, compression, and efficient storage of signals, including audio,
still images, videos, medical images, geophysical images, and so on. These processes would
be quite time-consuming, if not impossible in the case of video processing, to accomplish
in the standard basis. Many other examples will appear throughout the text.

2.5. The Fundamental Matrix Subspaces.

Let us now return to the general study of linear systems of equations, which we write
in our usual matrix form
$$A \mathbf{x} = \mathbf{b}. \tag{2.23}$$
Here $A$ is an $m \times n$ matrix, where $m$ is the number of equations and $n$ the number of
unknowns, i.e., the entries of $\mathbf{x} \in \mathbb{R}^n$.
Kernel and Range
There are four important vector subspaces associated with any matrix, which play a
key role in the interpretation of our solution algorithm. The first two of these subspaces
are defined as follows.
Definition 2.34. The range of an $m \times n$ matrix $A$ is the subspace $\operatorname{rng} A \subset \mathbb{R}^m$
spanned by the columns of $A$. The kernel or null space of $A$ is the subspace $\ker A \subset \mathbb{R}^n$
consisting of all vectors which are annihilated by $A$, so
$$\ker A = \{ \mathbf{z} \in \mathbb{R}^n \mid A \mathbf{z} = \mathbf{0} \} \subset \mathbb{R}^n. \tag{2.24}$$
An alternative name for the range is the column space of the matrix. By definition, a
vector $\mathbf{b} \in \mathbb{R}^m$ belongs to $\operatorname{rng} A$ if and only if it can be written as a linear combination,
$$\mathbf{b} = x_1 \mathbf{v}_1 + \cdots + x_n \mathbf{v}_n,$$
of the columns of $A = ( \mathbf{v}_1 \; \mathbf{v}_2 \; \ldots \; \mathbf{v}_n )$. By our basic matrix multiplication formula (2.14),
the right hand side of this equation equals the product $A \mathbf{x}$ of the matrix $A$ with the column
vector $\mathbf{x} = ( x_1, x_2, \ldots, x_n )^T$, and hence $\mathbf{b} = A \mathbf{x}$ for some $\mathbf{x} \in \mathbb{R}^n$, so
$$\operatorname{rng} A = \{ A \mathbf{x} \mid \mathbf{x} \in \mathbb{R}^n \} \subset \mathbb{R}^m. \tag{2.25}$$

Therefore, a vector b lies in the range of A if and only if the linear system A x = b has
a solution. Thus, the compatibility conditions for linear systems can be re-interpreted as
the conditions for a vector to lie in the range of the coefficient matrix.
A common alternative name for the kernel is the null space of the matrix $A$. The kernel
of $A$ is the set of solutions to the homogeneous system $A \mathbf{z} = \mathbf{0}$. The proof that $\ker A$ is a
subspace requires us to verify the usual closure conditions. Suppose that $\mathbf{z}, \mathbf{w} \in \ker A$, so
that $A \mathbf{z} = \mathbf{0} = A \mathbf{w}$. Then, for any scalars $c, d$,
$$A(c \mathbf{z} + d \mathbf{w}) = c A \mathbf{z} + d A \mathbf{w} = \mathbf{0},$$
which implies that $c \mathbf{z} + d \mathbf{w} \in \ker A$, proving that $\ker A$ is a subspace. This fact can be
re-expressed as the following superposition principle for solutions to a homogeneous system
of linear equations.
Theorem 2.35. If $\mathbf{z}_1, \ldots, \mathbf{z}_k$ are solutions to a homogeneous linear system $A \mathbf{z} = \mathbf{0}$,
then so is any linear combination $c_1 \mathbf{z}_1 + \cdots + c_k \mathbf{z}_k$.
Warning: The set of solutions to an inhomogeneous linear system $A \mathbf{x} = \mathbf{b}$ with $\mathbf{b} \neq \mathbf{0}$
is not a subspace.

Example 2.36. Let us compute the kernel of the matrix
$$A = \begin{pmatrix} 1 & -2 & 0 & 3 \\ 2 & -3 & -1 & -4 \\ 3 & -5 & -1 & -1 \end{pmatrix}.$$
Since we are solving the homogeneous system $A \mathbf{x} = \mathbf{0}$, we only need perform the elementary
row operations on $A$ itself. The resulting row echelon form
$$U = \begin{pmatrix} 1 & -2 & 0 & 3 \\ 0 & 1 & -1 & -10 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
corresponds to the equations $x - 2 y + 3 w = 0$, $y - z - 10 w = 0$. The free variables are
$z, w$. The general solution to the homogeneous system is
$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} 2 z + 17 w \\ z + 10 w \\ z \\ w \end{pmatrix} = z \begin{pmatrix} 2 \\ 1 \\ 1 \\ 0 \end{pmatrix} + w \begin{pmatrix} 17 \\ 10 \\ 0 \\ 1 \end{pmatrix},$$
which, for arbitrary scalars $z, w$, describes the most general vector in $\ker A$. Thus, the
kernel of this matrix is the two-dimensional subspace of $\mathbb{R}^4$ spanned by the linearly independent
vectors $( 2, 1, 1, 0 )^T$, $( 17, 10, 0, 1 )^T$.
Remark : This example is indicative of a general method for finding a basis for ker A
which will be developed in more detail in the following section.
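The kernel computation of Example 2.36 can be sketched in Python as follows. This is one possible implementation under stated assumptions: it reduces all the way to reduced row echelon form (Gauss-Jordan) rather than stopping at the row echelon form $U$, it takes the matrix entries with the signs $A = (1, -2, 0, 3;\ 2, -3, -1, -4;\ 3, -5, -1, -1)$, and the names `rref` and `kernel_basis` are illustrative.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                          # column c holds a free variable
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [a / pivot for a in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def kernel_basis(rows):
    """One basis vector of ker A per free variable, read off from the reduced form."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (j for j in range(n) if j not in pivots):
        z = [Fraction(0)] * n
        z[f] = Fraction(1)                    # set this free variable to 1
        for i, p in enumerate(pivots):
            z[p] = -R[i][f]                   # solve for the basic variables
        basis.append(z)
    return basis

A = [[1, -2, 0, 3],
     [2, -3, -1, -4],
     [3, -5, -1, -1]]
for z in kernel_basis(A):
    print([str(entry) for entry in z])
```

Setting one free variable to 1 and the others to 0 reproduces the two vectors $(2, 1, 1, 0)^T$ and $(17, 10, 0, 1)^T$ found by hand.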
Once we know the kernel of the coefficient matrix A, i.e., the space of solutions to the
homogeneous system A z = 0, we are in a position to completely characterize the solutions
to the inhomogeneous linear system (2.23).
Theorem 2.37. The linear system $A \mathbf{x} = \mathbf{b}$ has a solution $\mathbf{x}^\star$ if and only if $\mathbf{b}$ lies in
the range of $A$. If this occurs, then $\mathbf{x}$ is a solution to the linear system if and only if
$$\mathbf{x} = \mathbf{x}^\star + \mathbf{z}, \tag{2.26}$$
where $\mathbf{z} \in \ker A$ is an arbitrary element of the kernel of $A$.


Proof: We already demonstrated the first part of the theorem. If $A \mathbf{x} = \mathbf{b} = A \mathbf{x}^\star$ are
any two solutions, then their difference $\mathbf{z} = \mathbf{x} - \mathbf{x}^\star$ satisfies
$$A \mathbf{z} = A(\mathbf{x} - \mathbf{x}^\star) = A \mathbf{x} - A \mathbf{x}^\star = \mathbf{b} - \mathbf{b} = \mathbf{0},$$
and hence $\mathbf{z}$ belongs to the kernel of $A$. Therefore, $\mathbf{x}$ and $\mathbf{x}^\star$ are related by formula (2.26),
which proves the second part of the theorem.
Q.E.D.
Therefore, to construct the most general solution to an inhomogeneous system, we
need only know one particular solution $\mathbf{x}^\star$, along with the general solution $\mathbf{z} \in \ker A$ to
the homogeneous equation. This construction should remind the reader of the method
of solution for inhomogeneous linear ordinary differential equations. Indeed, both linear
algebraic systems and linear ordinary differential equations are but two particular instances
of the general theory of linear systems, to be developed in Chapter 7. In particular, we can
characterize the case when the linear system has a unique solution in any of the following
equivalent ways.
Proposition 2.38. Let $A$ be an $m \times n$ matrix. Then the following conditions are
equivalent:
(i) $\ker A = \{ \mathbf{0} \}$.
(ii) $\operatorname{rank} A = n$.
(iii) There are no free variables in the linear system $A \mathbf{x} = \mathbf{b}$.
(iv) The system $A \mathbf{x} = \mathbf{b}$ has a unique solution for each $\mathbf{b} \in \operatorname{rng} A$.
Specializing even further to square matrices, we can characterize invertibility by looking at either the kernel or the range of the matrix.
Proposition 2.39. If $A$ is a square matrix, then the following three conditions are
equivalent: (i) $A$ is nonsingular; (ii) $\ker A = \{ \mathbf{0} \}$; (iii) $\operatorname{rng} A = \mathbb{R}^n$.
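Example 2.40. Consider the system $A \mathbf{x} = \mathbf{b}$, where
$$A = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -2 \\ 1 & -2 & 3 \end{pmatrix}, \qquad \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix},$$
where the right hand side of the system will be left arbitrary. Applying our usual Gaussian
Elimination procedure to the augmented matrix
$$\left( \begin{array}{ccc|c} 1 & 0 & -1 & b_1 \\ 0 & 1 & -2 & b_2 \\ 1 & -2 & 3 & b_3 \end{array} \right) \qquad \text{leads to the row echelon form} \qquad \left( \begin{array}{ccc|c} 1 & 0 & -1 & b_1 \\ 0 & 1 & -2 & b_2 \\ 0 & 0 & 0 & b_3 + 2 b_2 - b_1 \end{array} \right).$$
The system has a solution if and only if the resulting compatibility condition
$$- b_1 + 2 b_2 + b_3 = 0 \tag{2.27}$$
holds. This equation serves to characterize the vectors $\mathbf{b}$ that belong to the range of the
matrix $A$, which is therefore a certain plane in $\mathbb{R}^3$ passing through the origin.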

To characterize the kernel of $A$, we take $\mathbf{b} = \mathbf{0}$, and solve the homogeneous system
$A \mathbf{z} = \mathbf{0}$. The row echelon form corresponds to the reduced system
$$z_1 - z_3 = 0, \qquad z_2 - 2 z_3 = 0.$$
The free variable is $z_3$, and the equations are solved to give
$$z_1 = c, \qquad z_2 = 2 c, \qquad z_3 = c,$$
where $c$ is arbitrary. The general solution to the homogeneous system is $\mathbf{z} = ( c, 2 c, c )^T = c \, ( 1, 2, 1 )^T$,
and so the kernel is the line in the direction of the vector $( 1, 2, 1 )^T$.
If we take $\mathbf{b} = ( 3, 1, 1 )^T$, which satisfies (2.27) and hence lies in the range of $A$,
then the general solution to the inhomogeneous system $A \mathbf{x} = \mathbf{b}$ is
$$x_1 = 3 + c, \qquad x_2 = 1 + 2 c, \qquad x_3 = c,$$
where $c$ is an arbitrary scalar. We can write the solution in the form (2.26), namely
$$\mathbf{x} = \begin{pmatrix} 3 + c \\ 1 + 2 c \\ c \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix} + c \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} = \mathbf{x}^\star + \mathbf{z},$$
where $\mathbf{x}^\star = ( 3, 1, 0 )^T$ plays the role of the particular solution, and $\mathbf{z} = c \, ( 1, 2, 1 )^T$ is the
general element of the kernel.
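The structure "particular solution plus kernel element" of Theorem 2.37 can be verified directly for this example. The short Python check below assumes the signed entries $A = (1, 0, -1;\ 0, 1, -2;\ 1, -2, 3)$; the names `matvec` and `general_solution` are illustrative.

```python
A = [[1, 0, -1],
     [0, 1, -2],
     [1, -2, 3]]
b = [3, 1, 1]
x_star = [3, 1, 0]     # particular solution of A x = b
z = [1, 2, 1]          # direction spanning ker A

def matvec(M, v):
    """Return the matrix-vector product M v as a list."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def general_solution(c):
    """x = x_star + c z, the form (2.26)."""
    return [xs + c * zi for xs, zi in zip(x_star, z)]

assert matvec(A, z) == [0, 0, 0]              # z lies in the kernel
assert -b[0] + 2 * b[1] + b[2] == 0           # b satisfies (2.27)
for c in range(-5, 6):
    assert matvec(A, general_solution(c)) == b
print("x_star + c z solves A x = b for every tested c")
```

Every multiple of the kernel direction can be added to $\mathbf{x}^\star$ without disturbing the right hand side, which is exactly the content of (2.26).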

The Superposition Principle

The principle of superposition lies at the heart of linearity. For homogeneous systems,
superposition allows one to generate new solutions by combining known solutions. For
inhomogeneous systems, superposition combines the solutions corresponding to different
inhomogeneities or forcing functions. Superposition is the reason why linear systems are
so important for applications and why they are so much easier to solve. We shall explain
the superposition principle in the context of inhomogeneous linear algebraic systems. In
Chapter 7 we shall see that the general principle applies as stated to completely general
linear systems, including linear differential equations, linear boundary value problems,
linear integral equations, etc.
Suppose we have found particular solutions $\mathbf{x}^\star_1$ and $\mathbf{x}^\star_2$ to two inhomogeneous linear
systems
$$A \mathbf{x} = \mathbf{b}_1, \qquad A \mathbf{x} = \mathbf{b}_2,$$
that have the same coefficient matrix $A$. Consider the system
$$A \mathbf{x} = c_1 \mathbf{b}_1 + c_2 \mathbf{b}_2,$$
in which the right hand side is a linear combination or superposition of the previous two.
Then a particular solution to the combined system is given by the same linear combination
of the previous solutions:
$$\mathbf{x}^\star = c_1 \mathbf{x}^\star_1 + c_2 \mathbf{x}^\star_2.$$

The proof is easy; we use the rules of matrix arithmetic to compute
$$A \mathbf{x}^\star = A(c_1 \mathbf{x}^\star_1 + c_2 \mathbf{x}^\star_2) = c_1 A \mathbf{x}^\star_1 + c_2 A \mathbf{x}^\star_2 = c_1 \mathbf{b}_1 + c_2 \mathbf{b}_2.$$
In many applications, the inhomogeneities $\mathbf{b}_1, \mathbf{b}_2$ represent external forces, and the
solutions $\mathbf{x}^\star_1, \mathbf{x}^\star_2$ represent the response of the physical apparatus to the force. The linear
superposition principle says that if we know how the system responds to the individual
forces, we immediately know its response to any combination thereof. The precise details
of the system are irrelevant; all that is required is linearity.
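Example 2.41. The system
$$\begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} f \\ g \end{pmatrix}$$
models the mechanical response of a pair of masses connected by springs to an external
force. The solution $\mathbf{x} = ( x, y )^T$ represents the respective displacements of the masses,
while the components of the right hand side $\mathbf{f} = ( f, g )^T$ represent the respective forces
applied to each mass. (See Chapter 6 for full details.) We can compute the response of the
system $\mathbf{x}^\star_1 = \left( \frac{4}{15}, - \frac{1}{15} \right)^T$ to a unit force $\mathbf{e}_1 = ( 1, 0 )^T$ on the first mass, and the response
$\mathbf{x}^\star_2 = \left( - \frac{1}{15}, \frac{4}{15} \right)^T$ to a unit force $\mathbf{e}_2 = ( 0, 1 )^T$ on the second mass. We then know the
response of the system to a general force, since we can write
$$\mathbf{f} = \begin{pmatrix} f \\ g \end{pmatrix} = f \mathbf{e}_1 + g \mathbf{e}_2 = f \begin{pmatrix} 1 \\ 0 \end{pmatrix} + g \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$
and hence the solution is
$$\mathbf{x} = f \mathbf{x}^\star_1 + g \mathbf{x}^\star_2 = f \begin{pmatrix} \frac{4}{15} \\ - \frac{1}{15} \end{pmatrix} + g \begin{pmatrix} - \frac{1}{15} \\ \frac{4}{15} \end{pmatrix} = \begin{pmatrix} \frac{4}{15} f - \frac{1}{15} g \\ - \frac{1}{15} f + \frac{4}{15} g \end{pmatrix}.$$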
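The superposition computation of this example can be sketched in Python with exact fractions. The responses to the unit forces are taken to be $\mathbf{x}^\star_1 = (\frac{4}{15}, -\frac{1}{15})^T$ and $\mathbf{x}^\star_2 = (-\frac{1}{15}, \frac{4}{15})^T$ (the signs are an assumption consistent with the coefficient matrix), and the helper names `matvec` and `response` are illustrative.

```python
from fractions import Fraction

A = [[4, 1],
     [1, 4]]
x1 = [Fraction(4, 15), Fraction(-1, 15)]   # response to the unit force e1
x2 = [Fraction(-1, 15), Fraction(4, 15)]   # response to the unit force e2

def matvec(M, v):
    """Return the matrix-vector product M v as a list."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def response(f, g):
    """Superposition: the response to the force (f, g) is f x1 + g x2."""
    return [f * a + g * b for a, b in zip(x1, x2)]

# The individual responses solve the two unit-force systems ...
assert matvec(A, x1) == [1, 0]
assert matvec(A, x2) == [0, 1]
# ... and their superposition solves the system for any force (f, g).
for f, g in [(1, 1), (3, -6), (15, 0), (-2, 7)]:
    assert matvec(A, response(f, g)) == [f, g]
print([str(c) for c in response(15, 0)])
```

Only the two unit responses are ever computed; every other right hand side is handled by linear combination.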

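The preceding construction is easily extended to several inhomogeneities, and the
result is a general Superposition Principle for inhomogeneous linear systems.
Theorem 2.42. Suppose that we know particular solutions $\mathbf{x}^\star_1, \ldots, \mathbf{x}^\star_k$ to each of the
inhomogeneous linear systems
$$A \mathbf{x} = \mathbf{b}_1, \qquad A \mathbf{x} = \mathbf{b}_2, \qquad \ldots \qquad A \mathbf{x} = \mathbf{b}_k, \tag{2.28}$$
where $\mathbf{b}_1, \ldots, \mathbf{b}_k \in \operatorname{rng} A$. Then, for any choice of scalars $c_1, \ldots, c_k$, a particular solution
to the combined system
$$A \mathbf{x} = c_1 \mathbf{b}_1 + \cdots + c_k \mathbf{b}_k \tag{2.29}$$
is the same superposition
$$\mathbf{x}^\star = c_1 \mathbf{x}^\star_1 + \cdots + c_k \mathbf{x}^\star_k \tag{2.30}$$
of individual solutions. The general solution to (2.29) is
$$\mathbf{u} = \mathbf{x}^\star + \mathbf{z} = c_1 \mathbf{x}^\star_1 + \cdots + c_k \mathbf{x}^\star_k + \mathbf{z}, \tag{2.31}$$
where $\mathbf{z}$ is the general solution to the homogeneous equation $A \mathbf{z} = \mathbf{0}$.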


In particular, if we know particular solutions $\mathbf{x}^\star_1, \ldots, \mathbf{x}^\star_m$ to
$$A \mathbf{x} = \mathbf{e}_i, \qquad \text{for each} \qquad i = 1, \ldots, m, \tag{2.32}$$
where $\mathbf{e}_1, \ldots, \mathbf{e}_m$ are the standard basis vectors of $\mathbb{R}^m$, cf. (2.17), then we can reconstruct
a particular solution $\mathbf{x}^\star$ to the general linear system $A \mathbf{x} = \mathbf{b}$ by first writing
$$\mathbf{b} = b_1 \mathbf{e}_1 + \cdots + b_m \mathbf{e}_m$$
as a linear combination of the basis vectors, and then using superposition to form
$$\mathbf{x}^\star = b_1 \mathbf{x}^\star_1 + \cdots + b_m \mathbf{x}^\star_m. \tag{2.33}$$
However, for linear algebraic systems, the practical value of this insight is rather limited.
Indeed, in the case when $A$ is square and nonsingular, the superposition method is just a
reformulation of the method of computing the inverse of the matrix. The vectors
$\mathbf{x}^\star_1, \ldots, \mathbf{x}^\star_n$ which satisfy (2.32) are just the columns of $A^{-1}$, cf. (1.39), and the superposition formula (2.33) is, using (2.14), precisely the solution formula $\mathbf{x}^\star = A^{-1} \mathbf{b}$ that we
abandoned in practical computations, in favor of the more efficient Gaussian elimination
method. Nevertheless, the implications of this result turn out to be of great importance
in the study of linear boundary value problems.
Adjoint Systems, Cokernel, and Corange
A linear system of $m$ equations in $n$ unknowns results in an $m \times n$ coefficient matrix $A$.
The transposed matrix $A^T$ will be of size $n \times m$, and forms the coefficient matrix of an associated
linear system consisting of $n$ equations in $m$ unknowns.
Definition 2.43. The adjoint to a linear system $A \mathbf{x} = \mathbf{b}$ of $m$ equations in $n$
unknowns is the linear system
$$A^T \mathbf{y} = \mathbf{f} \tag{2.34}$$
of $n$ equations in $m$ unknowns. Here $\mathbf{y} \in \mathbb{R}^m$ and $\mathbf{f} \in \mathbb{R}^n$.
Example 2.44. Consider the linear system
$$\begin{aligned} x_1 - 3 x_2 - 7 x_3 + 9 x_4 &= b_1, \\ x_2 + 5 x_3 - 3 x_4 &= b_2, \\ x_1 - 2 x_2 - 2 x_3 + 6 x_4 &= b_3, \end{aligned} \tag{2.35}$$
of three equations in four unknowns. Its coefficient matrix
$$A = \begin{pmatrix} 1 & -3 & -7 & 9 \\ 0 & 1 & 5 & -3 \\ 1 & -2 & -2 & 6 \end{pmatrix} \qquad \text{has transpose} \qquad A^T = \begin{pmatrix} 1 & 0 & 1 \\ -3 & 1 & -2 \\ -7 & 5 & -2 \\ 9 & -3 & 6 \end{pmatrix}.$$
Thus, the adjoint system to (2.35) is the following system of four equations in three unknowns:
$$\begin{aligned} y_1 + y_3 &= f_1, \\ -3 y_1 + y_2 - 2 y_3 &= f_2, \\ -7 y_1 + 5 y_2 - 2 y_3 &= f_3, \\ 9 y_1 - 3 y_2 + 6 y_3 &= f_4. \end{aligned} \tag{2.36}$$
Warning: Many texts misuse the term "adjoint" to describe the classical adjugate or cofactor
matrix. These are completely unrelated, and the latter will play no role in this book.

On the surface, there appears to be little direct connection between the solutions to a
linear system and its adjoint. Nevertheless, as we shall soon see (and then in even greater
depth in Sections 5.6 and 8.5) there are remarkable, but subtle interrelations between the
two. These turn out to have significant consequences, not only for linear algebraic systems
but also for their even more profound extensions to differential equations.
To this end, we use the adjoint system to define the other two fundamental subspaces
associated with a coefficient matrix A.
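Definition 2.45. The corange of an $m \times n$ matrix $A$ is the range of its transpose,
$$\operatorname{corng} A = \operatorname{rng} A^T = \{ A^T \mathbf{y} \mid \mathbf{y} \in \mathbb{R}^m \} \subset \mathbb{R}^n. \tag{2.37}$$
The cokernel or left null space of $A$ is the kernel of its transpose,
$$\operatorname{coker} A = \ker A^T = \{ \mathbf{w} \in \mathbb{R}^m \mid A^T \mathbf{w} = \mathbf{0} \} \subset \mathbb{R}^m, \tag{2.38}$$
that is, the set of solutions to the homogeneous adjoint system.
The corange coincides with the subspace of $\mathbb{R}^n$ spanned by the rows of $A$, and is
sometimes referred to as the row space. As a consequence of Theorem 2.37, the adjoint
system $A^T \mathbf{y} = \mathbf{f}$ has a solution if and only if $\mathbf{f} \in \operatorname{rng} A^T = \operatorname{corng} A$.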
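Example 2.46. To solve the linear system (2.35) appearing above, we perform
Gaussian Elimination on its augmented matrix
$$\left( \begin{array}{cccc|c} 1 & -3 & -7 & 9 & b_1 \\ 0 & 1 & 5 & -3 & b_2 \\ 1 & -2 & -2 & 6 & b_3 \end{array} \right),$$
which reduces it to the row echelon form
$$\left( \begin{array}{cccc|c} 1 & -3 & -7 & 9 & b_1 \\ 0 & 1 & 5 & -3 & b_2 \\ 0 & 0 & 0 & 0 & b_3 - b_2 - b_1 \end{array} \right).$$
Thus, the system has a solution if and only if $\mathbf{b}$ satisfies the compatibility condition
$- b_1 - b_2 + b_3 = 0$, which therefore characterizes the vectors $\mathbf{b} \in \operatorname{rng} A$.
For such vectors, the general solution is
$$\mathbf{x} = \begin{pmatrix} b_1 + 3 b_2 - 8 x_3 \\ b_2 - 5 x_3 + 3 x_4 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} b_1 + 3 b_2 \\ b_2 \\ 0 \\ 0 \end{pmatrix} + x_3 \begin{pmatrix} -8 \\ -5 \\ 1 \\ 0 \end{pmatrix} + x_4 \begin{pmatrix} 0 \\ 3 \\ 0 \\ 1 \end{pmatrix}.$$
In the second expression, the first vector is a particular solution and the remaining terms
constitute the general element of the two-dimensional kernel of $A$.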

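The solution to the adjoint system (2.36) is also obtained by Gaussian Elimination
starting with its augmented matrix
$$\left( \begin{array}{ccc|c} 1 & 0 & 1 & f_1 \\ -3 & 1 & -2 & f_2 \\ -7 & 5 & -2 & f_3 \\ 9 & -3 & 6 & f_4 \end{array} \right).$$
The resulting row echelon form is
$$\left( \begin{array}{ccc|c} 1 & 0 & 1 & f_1 \\ 0 & 1 & 1 & f_2 + 3 f_1 \\ 0 & 0 & 0 & f_3 - 5 f_2 - 8 f_1 \\ 0 & 0 & 0 & f_4 + 3 f_2 \end{array} \right).$$
Thus, there are two compatibility constraints
required for a solution to the adjoint system: $- 8 f_1 - 5 f_2 + f_3 = 0$, $3 f_2 + f_4 = 0$. These
are the conditions required for the right hand side to belong to the corange: $\mathbf{f} \in \operatorname{rng} A^T = \operatorname{corng} A$.
If satisfied, the adjoint system has the general solution depending on the single
free variable $y_3$:
$$\mathbf{y} = \begin{pmatrix} f_1 - y_3 \\ 3 f_1 + f_2 - y_3 \\ y_3 \end{pmatrix} = \begin{pmatrix} f_1 \\ 3 f_1 + f_2 \\ 0 \end{pmatrix} + y_3 \begin{pmatrix} -1 \\ -1 \\ 1 \end{pmatrix}.$$
In the latter formula, the first term represents a particular solution, while the second is
the general element of $\ker A^T = \operatorname{coker} A$.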
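The adjoint computation can be checked numerically. The Python sketch below builds $A^T$ from the coefficient matrix of (2.35), taken with the signs $A = (1, -3, -7, 9;\ 0, 1, 5, -3;\ 1, -2, -2, 6)$ as an assumption, and confirms that the stated general solution satisfies the adjoint system whenever the right hand side obeys the two compatibility constraints. The names `matvec`, `compatible_rhs` and `adjoint_solution` are illustrative.

```python
A = [[1, -3, -7, 9],
     [0, 1, 5, -3],
     [1, -2, -2, 6]]
AT = [list(col) for col in zip(*A)]        # transpose: 4 equations, 3 unknowns

def matvec(M, v):
    """Return the matrix-vector product M v as a list."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def compatible_rhs(f1, f2):
    """Right hand side satisfying -8 f1 - 5 f2 + f3 = 0 and 3 f2 + f4 = 0."""
    return [f1, f2, 8 * f1 + 5 * f2, -3 * f2]

def adjoint_solution(f1, f2, y3):
    """General solution of the adjoint system in terms of the free variable y3."""
    return [f1 - y3, 3 * f1 + f2 - y3, y3]

for f1 in range(-2, 3):
    for f2 in range(-2, 3):
        for y3 in range(-2, 3):
            y = adjoint_solution(f1, f2, y3)
            assert matvec(AT, y) == compatible_rhs(f1, f2)

# The cokernel direction is annihilated by the transpose.
assert matvec(AT, [-1, -1, 1]) == [0, 0, 0, 0]
print("adjoint solution verified")
```

The same vector $(-1, -1, 1)^T$ encodes the compatibility condition $- b_1 - b_2 + b_3 = 0$ of the original system, a first hint of the interrelations mentioned above.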
The Fundamental Theorem of Linear Algebra

The four fundamental subspaces associated with an $m \times n$ matrix $A$, then, are its
range, corange, kernel and cokernel. The range and cokernel are subspaces of $\mathbb{R}^m$, while
the kernel and corange are subspaces of $\mathbb{R}^n$. Moreover, these subspaces are not completely
arbitrary, but are, in fact, profoundly related through both their numerical and geometric
properties.
The Fundamental Theorem of Linear Algebra states that their dimensions are entirely
prescribed by the rank (and size) of the matrix.
Theorem 2.47. Let $A$ be an $m \times n$ matrix of rank $r$. Then
$$\dim \operatorname{corng} A = \dim \operatorname{rng} A = \operatorname{rank} A = \operatorname{rank} A^T = r, \qquad \dim \ker A = n - r, \qquad \dim \operatorname{coker} A = m - r. \tag{2.39}$$

Remark : Thus, the rank of a matrix, i.e., the number of pivots, indicates the number
of linearly independent columns, which, remarkably, is always the same as the number of
linearly independent rows! A matrix and its transpose have the same rank, i.e., the same
number of pivots, even though their row echelon forms are quite different, and are rarely
transposes of each other. Theorem 2.47 also proves our earlier contention that the rank of
a matrix is an intrinsic quantity, and does not depend on which specific elementary row
operations are employed during the reduction process, nor on the final row echelon form.
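The dimension count of Theorem 2.47 is easy to tabulate once the rank is known. The Python sketch below is illustrative only: it counts pivots with exact arithmetic, checks that $A$ and $A^T$ have the same rank, and returns the four dimensions. The function names `rank` and `fundamental_dimensions` are assumptions, and the sample matrix is the coefficient matrix of (2.35) with signs $A = (1, -3, -7, 9;\ 0, 1, 5, -3;\ 1, -2, -2, 6)$.

```python
from fractions import Fraction

def rank(rows):
    """Count the pivots found by Gaussian elimination with exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def fundamental_dimensions(A):
    """Dimensions of range, corange, kernel and cokernel of an m x n matrix."""
    m, n = len(A), len(A[0])
    r = rank(A)
    AT = [list(col) for col in zip(*A)]
    assert rank(AT) == r                     # A and its transpose share the rank
    return {"rng": r, "corng": r, "ker": n - r, "coker": m - r}

A = [[1, -3, -7, 9],
     [0, 1, 5, -3],
     [1, -2, -2, 6]]
print(fundamental_dimensions(A))
```

For this $3 \times 4$ matrix of rank 2 the kernel is two-dimensional and the cokernel one-dimensional, matching the two free variables $x_3, x_4$ and the single free variable $y_3$ found in the examples.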

Not to be confused with the Fundamental Theorem of Algebra, which states that every polynomial has a complex root; see Theorem 16.62.


Proof : Since the dimension of a subspace is prescribed by the number of vectors in

any basis, we need to relate bases of the fundamental subspaces to the rank of the matrix.
Rather than present the general argument, we will show how to construct bases for each of
the subspaces in a particular instance, and thereby illustrate the method of proof. Consider
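the matrix
$$A = \begin{pmatrix} 2 & -1 & 1 & 2 \\ -8 & 4 & -6 & -4 \\ 4 & -2 & 3 & 2 \end{pmatrix}.$$
The row echelon form of $A$ is obtained in the usual manner:
$$U = \begin{pmatrix} 2 & -1 & 1 & 2 \\ 0 & 0 & -2 & 4 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
There are two pivots, and thus the rank of $A$ is $r = 2$.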
Kernel: We need to find the solutions to the homogeneous system $A \mathbf{x} = \mathbf{0}$. In our
example, the pivots are in columns 1 and 3, and so the free variables are $x_2, x_4$. Using back
substitution on the reduced homogeneous system $U \mathbf{x} = \mathbf{0}$, we find the general solution
$$\mathbf{x} = \begin{pmatrix} \frac{1}{2} x_2 - 2 x_4 \\ x_2 \\ 2 x_4 \\ x_4 \end{pmatrix} = x_2 \begin{pmatrix} \frac{1}{2} \\ 1 \\ 0 \\ 0 \end{pmatrix} + x_4 \begin{pmatrix} -2 \\ 0 \\ 2 \\ 1 \end{pmatrix}. \tag{2.40}$$
Note that the second and fourth entries are the corresponding free variables $x_2, x_4$. Therefore,
$$\mathbf{z}_1 = \left( \tfrac{1}{2}, 1, 0, 0 \right)^T, \qquad \mathbf{z}_2 = ( -2, 0, 2, 1 )^T,$$
are the basis vectors for $\ker A$. By construction, they span the kernel, and linear independence follows easily since the only way in which the linear combination (2.40) could
vanish, $\mathbf{x} = \mathbf{0}$, is if both free variables vanish: $x_2 = x_4 = 0$. In general, there are $n - r$
free variables, each corresponding to one of the basis elements of the kernel, which thus
implies the dimension formula for $\ker A$.
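Corange: The corange is the subspace of $\mathbb{R}^n$ spanned by the rows of $A$. We claim
that applying an elementary row operation does not alter the corange. To see this for row
operations of the first type, suppose, for instance, that $\widehat{A}$ is obtained by adding $a$ times the
first row of $A$ to the second row. If $\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots, \mathbf{r}_m$ are the rows of $A$, then the rows of $\widehat{A}$
are $\mathbf{r}_1$, $\widehat{\mathbf{r}}_2 = \mathbf{r}_2 + a \, \mathbf{r}_1$, $\mathbf{r}_3, \ldots, \mathbf{r}_m$. If
$$\mathbf{v} = c_1 \mathbf{r}_1 + c_2 \mathbf{r}_2 + c_3 \mathbf{r}_3 + \cdots + c_m \mathbf{r}_m$$
is any vector belonging to $\operatorname{corng} A$, then
$$\mathbf{v} = \widehat{c}_1 \mathbf{r}_1 + c_2 \widehat{\mathbf{r}}_2 + c_3 \mathbf{r}_3 + \cdots + c_m \mathbf{r}_m, \qquad \text{where} \qquad \widehat{c}_1 = c_1 - a \, c_2,$$
is also a linear combination of the rows of the new matrix, and hence lies in $\operatorname{corng} \widehat{A}$.
The converse is also valid: $\mathbf{v} \in \operatorname{corng} \widehat{A}$ implies $\mathbf{v} \in \operatorname{corng} A$, and we conclude that
elementary row operations of Type #1 do not change $\operatorname{corng} A$. The proof for the other
two types of elementary row operations is even easier, and left to the reader.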

Since the row echelon form $U$ is obtained from $A$ by a sequence of elementary row
operations, we conclude that $\operatorname{corng} A = \operatorname{corng} U$. Moreover, because each nonzero row in
$U$ contains a pivot, it is not hard to see that the nonzero rows of $U$ are linearly
independent, and hence form a basis of both $\operatorname{corng} U$ and $\operatorname{corng} A$. Since there is one row
per pivot, $\operatorname{corng} U = \operatorname{corng} A$ has dimension $r$, the number of pivots. In our example, then,
a basis for $\operatorname{corng} A$ consists of the row vectors
$$\mathbf{s}_1 = ( 2 \;\; -1 \;\; 1 \;\; 2 ), \qquad \mathbf{s}_2 = ( 0 \;\; 0 \;\; -2 \;\; 4 ).$$

The reader may wish to verify their linear independence, as well as the fact that every row
of A lies in their span.
Range: There are two methods for computing a basis of the range or column space.
The first proves that it has dimension equal to the rank. This has the important, and
remarkable consequence that the space spanned by the rows of a matrix and the space
spanned by its columns always have the same dimension, even though they are, in general,
subspaces of different vector spaces.
Now the range of A and the range of U are, in general, different subspaces, so we
cannot directly use a basis for rng U as a basis for rng A. However, the linear dependencies
among the columns of A and U are the same. It is not hard to see that the columns of U
that contain the pivots form a basis for rng U . This implies that the same columns of A
form a basis for rng A. In particular, this implies that dim rng A = dim rng U = r.
In our example, the pivots lie in the first and third columns of $U$, and hence the first
and third columns of $A$, namely
$$\mathbf{v}_1 = \begin{pmatrix} 2 \\ -8 \\ 4 \end{pmatrix}, \qquad \mathbf{v}_3 = \begin{pmatrix} 1 \\ -6 \\ 3 \end{pmatrix},$$
form a basis for $\operatorname{rng} A$. This implies that every column of $A$ can be written uniquely as a
linear combination of the first and third columns, as you can validate directly.
In more detail, using our matrix multiplication formula (2.14), we see that a linear
combination of columns of $A$ is trivial,
$$c_1 \mathbf{v}_1 + \cdots + c_n \mathbf{v}_n = A\,\mathbf{c} = \mathbf{0},$$
if and only if $\mathbf{c} \in \ker A$. But we know $\ker A = \ker U$, and so the same linear combination
of columns of $U$, namely
$$U \mathbf{c} = c_1 \mathbf{u}_1 + \cdots + c_n \mathbf{u}_n = \mathbf{0},$$
is also trivial. In particular, the linear independence of the pivot columns of $U$, labeled
$\mathbf{u}_{j_1}, \ldots, \mathbf{u}_{j_r}$, implies the linear independence of the same collection, $\mathbf{v}_{j_1}, \ldots, \mathbf{v}_{j_r}$, of columns
of $A$. Moreover, the fact that any other column of $U$ can be written as a linear combination
$$\mathbf{u}_k = d_1 \mathbf{u}_{j_1} + \cdots + d_r \mathbf{u}_{j_r}$$
of the pivot columns implies that the same holds for the corresponding column of $A$, so
$$\mathbf{v}_k = d_1 \mathbf{v}_{j_1} + \cdots + d_r \mathbf{v}_{j_r}.$$
We conclude that the pivot columns of $A$ form a basis for its range or column space.
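An alternative method to find a basis for the range is to note that $\operatorname{rng} A = \operatorname{corng} A^T$.
Thus, we can employ our previous algorithm to compute $\operatorname{corng} A^T$. In our example, applying Gaussian elimination to
$$A^T = \begin{pmatrix} 2 & -8 & 4 \\ -1 & 4 & -2 \\ 1 & -6 & 3 \\ 2 & -4 & 2 \end{pmatrix} \qquad \text{leads to the row echelon form} \qquad \widehat{U} = \begin{pmatrix} 2 & -8 & 4 \\ 0 & -2 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{2.41}$$
Observe that the row echelon form of $A^T$ is not the transpose of the row echelon form of
$A$! However, they do have the same number of pivots since both $A$ and $A^T$ have the same
rank. Since the pivots of $A^T$ are in the first two columns of $\widehat{U}$, we conclude that
$$\mathbf{y}_1 = \begin{pmatrix} 2 \\ -8 \\ 4 \end{pmatrix}, \qquad \mathbf{y}_2 = \begin{pmatrix} 0 \\ -2 \\ 1 \end{pmatrix},$$
form an alternative basis for $\operatorname{rng} A$.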
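
The two range computations above can be checked mechanically. The following Python sketch runs Gaussian elimination in exact rational arithmetic and reads off the corange basis, the pivot columns, and the alternative range basis. The helper name `row_echelon` is made up for this illustration, and the entries of `A` are an assumption: the full matrix is not displayed in this passage, so it was chosen to agree with the row echelon form and pivot columns quoted above.

```python
from fractions import Fraction

def row_echelon(rows):
    """Return (U, pivot_columns) using Gaussian elimination with row swaps."""
    U = [[Fraction(x) for x in row] for row in rows]
    m, n = len(U), len(U[0])
    pivots, r = [], 0
    for c in range(n):
        # look for a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, m) if U[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: it is a free column
        U[r], U[p] = U[p], U[r]
        for i in range(r + 1, m):
            factor = U[i][c] / U[r][c]
            U[i] = [a - factor * b for a, b in zip(U[i], U[r])]
        pivots.append(c)
        r += 1
    return U, pivots

# Assumed example matrix (signs inferred for illustration, not confirmed here).
A = [[2, -1, 1, 2],
     [-8, 4, -6, -4],
     [4, -2, 3, 2]]

U, pivots = row_echelon(A)
rank = len(pivots)
corange_basis = [tuple(U[i]) for i in range(rank)]            # nonzero rows of U
range_basis = [tuple(row[c] for row in A) for c in pivots]    # pivot columns of A

A_T = [list(col) for col in zip(*A)]
U_T, pivots_T = row_echelon(A_T)
alt_range_basis = [tuple(U_T[i]) for i in range(len(pivots_T))]

print("pivot columns (0-based):", pivots)
print("corange basis:", corange_basis)
print("range basis:", range_basis)
print("alternative range basis:", alt_range_basis)
```

Both range bases have two elements, one per pivot, in line with the fact that the row rank and column rank coincide.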

Cokernel: Finally, to determine a basis for the cokernel of the matrix, we apply the
preceding algorithm for finding a basis for $\ker A^T = \operatorname{coker} A$. Since the ranks of $A$ and $A^T$
coincide, there are now $m - r$ free variables, which is the same as the dimension of $\ker A^T$.
In our particular example, using the reduced form (2.41), the only free variable is $y_3$,
and the general solution to the homogeneous adjoint system $A^T \mathbf{y} = \mathbf{0}$ is
$$\mathbf{y} = \begin{pmatrix} 0 \\ \tfrac12 y_3 \\ y_3 \end{pmatrix} = y_3 \begin{pmatrix} 0 \\ \tfrac12 \\ 1 \end{pmatrix}.$$
We conclude that $\operatorname{coker} A$ is one-dimensional, with basis $(\, 0, \ \tfrac12, \ 1 \,)^T$.

2.6. Graphs and Incidence Matrices.

We now present an application of linear systems to graph theory. A graph consists
of one or more points, called vertices, and lines or curves connecting them, called edges.
Each edge connects exactly two vertices, which, for simplicity, are assumed to always be
distinct, so that no edge forms a loop that connects a vertex to itself. However, we do
permit two vertices to be connected by multiple edges. Some examples of graphs appear
in Figure 2.2; the vertices are the black dots. In a planar representation of the graph, the
edges may cross over each other at non-nodal points, but do not actually meet: think of
a circuit where the (insulated) wires lie on top of each other, but do not touch. Thus, the
first graph in Figure 2.2 has 5 vertices and 8 edges; the second has 4 vertices and 6 edges
(the two central edges do not meet); the final graph has 5 vertices and 10 edges.

Figure 2.2. Three Different Graphs.

Graphs arise in a multitude of applications. A particular case that will be considered in
depth is electrical networks, where the edges represent wires, and the vertices represent the
nodes where the wires are connected. Another example is the framework for a building:
the edges represent the beams and the vertices the joints where the beams are connected.
In each case, the graph encodes the topology (meaning interconnectedness) of the
system, but not its geometry (lengths of edges, angles, etc.).

Figure 2.3. Three Versions of the Same Graph.

Two graphs are considered to be the same if one can identify all their edges and
vertices, so that they have the same connectivity properties. A good way to visualize
this is to think of the graph as a collection of strings connected at the vertices. Moving
the vertices and strings around without cutting or rejoining them will have no effect on
the underlying graph. Consequently, there are many ways to draw a given graph; three
equivalent graphs appear in Figure 2.3.
Two vertices in a graph are adjacent if there is an edge connecting them. Two edges
are adjacent if they meet at a common vertex. For instance, in the graph in Figure 2.4, all
vertices are adjacent; edge 1 is adjacent to all edges except edge 5. A path is a sequence of
distinct, i.e., non-repeated, edges, with each edge adjacent to its successor. For example,
in Figure 2.4, one path starts at vertex #1, then goes in order along the edges labeled as
1, 4, 3, 2, thereby passing through vertices 1, 2, 4, 1, 3. Note that while an edge cannot be
repeated in a path, a vertex may be. A circuit is a path that ends up where it began. For
example, the circuit consisting of edges 1, 4, 5, 2 starts at vertex 1, then goes to vertices
2, 4, 3 in order, and finally returns to vertex 1. The starting vertex for a circuit is not
important. For example, edges 4, 5, 2, 1 also represent the same circuit we just described.
A graph is called connected if you can get from any vertex to any other vertex by a path,
which is by far the most important case for applications. We note that every graph can
be decomposed into a disconnected collection of connected subgraphs.

Figure 2.4. A Simple Graph.

Figure 2.5. Digraphs.

In electrical circuits, one is interested in measuring currents and voltage drops along
the wires in the network represented by the graph. Both of these quantities have a direction,
and therefore we need to specify an orientation on each edge in order to quantify how the
current moves along the wire. The orientation will be fixed by specifying the vertex the
edge starts at, and the vertex it ends at. Once we assign a direction to an edge, a
current along that wire will be positive if it moves in the same direction, i.e., goes from
the starting vertex to the ending one, and negative if it moves in the opposite direction.
The direction of the edge does not dictate the direction of the current; it just fixes what
directions positive and negative values of current represent. A graph with directed edges
is known as a directed graph or digraph for short. The edge directions are represented by
arrows; examples of digraphs can be seen in Figure 2.5.
Consider a digraph $D$ consisting of $n$ vertices connected by $m$ edges. The incidence
matrix associated with $D$ is an $m \times n$ matrix $A$ whose rows are indexed by the edges
and whose columns are indexed by the vertices. If edge $k$ starts at vertex $i$ and ends at
vertex $j$, then row $k$ of the incidence matrix will have a $+1$ in its $(k, i)$ entry and $-1$ in its
$(k, j)$ entry; all other entries of the row are zero. Thus, our convention is that a $+1$ entry
represents the vertex at which the edge starts and a $-1$ entry the vertex at which it ends.

Figure 2.6. A Simple Digraph.

A simple example is the digraph in Figure 2.6, which consists of five edges joined at
four different vertices. Its $5 \times 4$ incidence matrix is
$$A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{pmatrix}. \tag{2.42}$$
Thus the first row of $A$ tells us that the first edge starts at vertex 1 and ends at vertex 2.
Similarly, row 2 says that the second edge goes from vertex 1 to vertex 3. Clearly one can
completely reconstruct any digraph from its incidence matrix.
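
The sign convention can be made concrete in a few lines of Python. This is only a sketch: the function names `incidence_matrix` and `edges_from_incidence` are invented for the illustration, and the edge list is the one read off from the rows of (2.42), with vertices numbered from 1 as in the text.

```python
def incidence_matrix(edges, n_vertices):
    """Rows are edges, columns are vertices: +1 at the start, -1 at the end."""
    A = []
    for start, end in edges:
        row = [0] * n_vertices
        row[start - 1] = 1    # vertex labels start at 1, list indices at 0
        row[end - 1] = -1
        A.append(row)
    return A

def edges_from_incidence(A):
    """Recover the (start, end) pair of each edge from its row."""
    return [(row.index(1) + 1, row.index(-1) + 1) for row in A]

# Edge k runs from the first vertex of the pair to the second.
edges = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]
A = incidence_matrix(edges, 4)

for row in A:
    print(row)
print(edges_from_incidence(A))
```

Running the two functions back to back returns the original edge list, which is the sense in which the digraph can be reconstructed from its incidence matrix.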
Example 2.48. The matrix

1
1

0
A=
0

0
0

1
0
1
1
0
0

0
1
1
0
1
0

0
0
0
1
1
1

0
0

0
.
0

0
1

(2.43)

qualifies as an incidence matrix because each row contains a single $+1$, a single $-1$, and
the other entries are 0. Let us construct the digraph corresponding to $A$. Since $A$ has five
columns, there are five vertices in the digraph, which we label by the numbers 1, 2, 3, 4, 5.
Since it has seven rows, there are 7 edges. The first row has its $+1$ in column 1 and its
$-1$ in column 2, and so the first edge goes from vertex 1 to vertex 2. Similarly, the second
edge corresponds to the second row of $A$ and so goes from vertex 3 to vertex 1. The third
row of $A$ gives an edge from vertex 3 to vertex 2; and so on. In this manner we construct
the digraph drawn in Figure 2.7.

Figure 2.7. Another Digraph.

The incidence matrix has important geometric and quantitative consequences for the
graph it represents. In particular, its kernel and cokernel have topological significance. For
example, the kernel of the incidence matrix (2.43) is spanned by the single vector
$$\mathbf{z} = (\, 1 \ \ 1 \ \ 1 \ \ 1 \ \ 1 \,)^T,$$
and represents the fact that the sum of the entries in any given row of $A$ is zero. This
observation holds in general for connected digraphs.
Proposition 2.49. If $A$ is the incidence matrix for a connected digraph, then $\ker A$
is one-dimensional, with basis $\mathbf{z} = (\, 1 \ \ 1 \ \ \ldots \ \ 1 \,)^T$.
Proof: If edge $k$ connects vertices $i$ and $j$, then the $k$th equation in $A \mathbf{z} = \mathbf{0}$ is $z_i = z_j$.
The same equality holds, by a simple induction, if the vertices $i$ and $j$ are connected
by a path. Therefore, if $D$ is connected, all the entries of $\mathbf{z}$ are equal, and the result
follows.
Q.E.D.
Corollary 2.50. If $A$ is the incidence matrix for a connected digraph with $n$ vertices,
then $\operatorname{rank} A = n - 1$.
Proof: This is an immediate consequence of Theorem 2.47.
Q.E.D.
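
A quick numerical illustration of Proposition 2.49 and Corollary 2.50, using the incidence matrix (2.42) of the connected digraph in Figure 2.6. The `rank` helper below is a plain Gaussian elimination written for this sketch, not a routine taken from the text.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by Gaussian elimination in exact arithmetic."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Incidence matrix (2.42): 5 edges, 4 vertices.
A = [[1, -1, 0, 0],
     [1, 0, -1, 0],
     [1, 0, 0, -1],
     [0, 1, 0, -1],
     [0, 0, 1, -1]]

n = len(A[0])
z = [1] * n
A_z = [sum(a * b for a, b in zip(row, z)) for row in A]   # should vanish
r = rank(A)

print("A z =", A_z)
print("rank A =", r, " n - 1 =", n - 1)
```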

Next, let us look at the cokernel of an incidence matrix. Consider the particular
example (2.42) corresponding to the digraph in Figure 2.6. We need to compute the kernel
of the transposed incidence matrix
$$A^T = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ -1 & 0 & 0 & 1 & 0 \\ 0 & -1 & 0 & 0 & 1 \\ 0 & 0 & -1 & -1 & -1 \end{pmatrix}. \tag{2.44}$$
Solving the homogeneous system $A^T \mathbf{y} = \mathbf{0}$ by Gaussian elimination, we discover that
$\operatorname{coker} A = \ker A^T$ is spanned by the two vectors
$$\mathbf{y}_1 = (\, 1 \ \ 0 \ \ {-1} \ \ 1 \ \ 0 \,)^T, \qquad \mathbf{y}_2 = (\, 0 \ \ 1 \ \ {-1} \ \ 0 \ \ 1 \,)^T.$$

Each of these vectors represents a circuit in the digraph, the nonzero entries representing
the direction in which the edges are traversed. For example, $\mathbf{y}_1$ corresponds to the circuit
that starts out along edge #1, then traverses edge #4 and finishes by going along edge #3
in the reverse direction, which is indicated by the minus sign in its third entry. Similarly,
$\mathbf{y}_2$ represents the circuit consisting of edge #2, followed by edge #5, and then edge #3,
backwards. The fact that $\mathbf{y}_1$ and $\mathbf{y}_2$ are linearly independent vectors says that the two
circuits are independent.
The general element of $\operatorname{coker} A$ is a linear combination $c_1 \mathbf{y}_1 + c_2 \mathbf{y}_2$. Certain values of
the constants lead to other types of circuits; for example, $-\mathbf{y}_1$ represents the same circuit
as $\mathbf{y}_1$, but traversed in the opposite direction. Another example is
$$\mathbf{y}_1 - \mathbf{y}_2 = (\, 1 \ \ {-1} \ \ 0 \ \ 1 \ \ {-1} \,)^T,$$
which represents the square circuit going around the outside of the digraph, along edges
1, 4, 5, 2, the fifth and second being in the reverse direction. We can view this circuit as a
combination of the two triangular circuits; when we add them together the middle edge #3
is traversed once in each direction, which effectively cancels its contribution. (A similar
cancellation occurs in the theory of line integrals; see Section A.5.) Other combinations
represent virtual circuits; for instance, one can interpret $2\,\mathbf{y}_1 - \tfrac12 \mathbf{y}_2$ as two times around
the first triangular circuit plus one half of the other triangular circuit, in the opposite
direction, whatever that might mean.
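
The circuit vectors can be checked against the incidence matrix directly: a vector $\mathbf{y}$ lies in the cokernel exactly when $\mathbf{y}^T A = \mathbf{0}$, so that, at every vertex, the edges entering and leaving along the circuit balance. The sketch below assumes the matrix (2.42) and the vectors $\mathbf{y}_1$, $\mathbf{y}_2$ as given above; `left_multiply` is an illustrative helper name.

```python
# Incidence matrix (2.42) of the digraph in Figure 2.6.
A = [[1, -1, 0, 0],
     [1, 0, -1, 0],
     [1, 0, 0, -1],
     [0, 1, 0, -1],
     [0, 0, 1, -1]]

def left_multiply(y, A):
    """Return the row vector y^T A, with one entry per vertex."""
    return [sum(y[k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]

y1 = [1, 0, -1, 1, 0]     # edge 1, edge 4, then edge 3 backwards
y2 = [0, 1, -1, 0, 1]     # edge 2, edge 5, then edge 3 backwards
outer = [a - b for a, b in zip(y1, y2)]   # the outside square circuit

for name, y in (("y1", y1), ("y2", y2), ("y1 - y2", outer)):
    print(name, y, "->", left_multiply(y, A))
```

The third entry of `outer` is zero, reflecting the cancellation of the shared middle edge.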
Let us summarize the preceding discussion.
Theorem 2.51. Each circuit in a digraph $D$ is represented by a vector in the cokernel
of its incidence matrix, whose entries are $+1$ if the edge is traversed in the correct direction,
$-1$ if in the opposite direction, and 0 if the edge is not in the circuit. The dimension of
the cokernel of $A$ equals the number of independent circuits in $D$.
The preceding two theorems have an important and remarkable consequence. Suppose
$D$ is a connected digraph with $m$ edges and $n$ vertices and $A$ its $m \times n$ incidence matrix.
Corollary 2.50 implies that $A$ has rank $r = n - 1 = n - \dim \ker A$. On the other hand,
Theorem 2.51 tells us that $\dim \operatorname{coker} A = l$ equals the number of independent circuits in
$D$. The Fundamental Theorem 2.47 says that $r = m - l$. Equating these two different
computations of the rank, we find $r = n - 1 = m - l$, or $n + l = m + 1$. This celebrated
result is known as Euler's formula for graphs, first discovered by the extraordinarily prolific
eighteenth century Swiss mathematician Leonhard Euler.
Theorem 2.52. If $G$ is a connected graph, then
$$\#\,\text{vertices} + \#\,\text{independent circuits} = \#\,\text{edges} + 1. \tag{2.45}$$
Remark: If the graph is planar, meaning that it can be graphed in the plane without
any edges crossing over each other, then the number of independent circuits is equal to
the number of holes in the graph, i.e., the number of distinct polygonal regions bounded
by the edges of the graph. For example, the pentagonal digraph in Figure 2.7 bounds
three triangles, and so has three independent circuits. For non-planar graphs, (2.45) gives
a possible definition of the number of independent circuits, but one that is not entirely
standard. A more detailed discussion relies on further developments in the topological
properties of graphs, cf. [33].

Footnote: "Euler" is pronounced "Oiler".

Figure 2.8. A Cubical Digraph.

Example 2.53. Consider the graph corresponding to the edges of a cube, as illustrated in Figure 2.8, where the second figure represents the same graph squashed down
onto a plane. The graph has 8 vertices and 12 edges. Euler's formula (2.45) tells us that
there are 5 independent circuits. These correspond to the interior square and four trapezoids in the planar version of the digraph, and hence to circuits around 5 of the 6 faces
of the cube. The missing face does indeed define a circuit, but it can be represented as
the sum of the other five circuits, and so is not independent. In Exercise , the reader is
asked to write out the incidence matrix for the cubical digraph and explicitly identify the
basis of its cokernel with the circuits.
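
The count of five independent circuits can be reproduced numerically as $m - \operatorname{rank} A$. In the sketch below, the vertex labeling and the edge orientations are assumptions made for the illustration (the orientations in Figure 2.8 are not recoverable here): each corner of the cube is labeled by a three-bit number, and each edge is directed from the lower label to the higher one. The rank, and hence the circuit count, does not depend on this choice of orientation.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by Gaussian elimination in exact arithmetic."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Hypothetical labeling: corners 0..7, an edge joins corners differing in one bit.
edges = [(v, v | bit) for v in range(8) for bit in (1, 2, 4) if not v & bit]

A = []
for start, end in edges:
    row = [0] * 8
    row[start], row[end] = 1, -1
    A.append(row)

n_vertices, n_edges = 8, len(edges)
r = rank(A)
independent_circuits = n_edges - r    # dim coker A = m - r

print("edges:", n_edges, "rank:", r, "independent circuits:", independent_circuits)
```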
We do not have the space to further develop the remarkable connections between
graph theory and linear algebra. The interested reader is encouraged to consult a text
devoted to graph theory, e.g., [33].


Chapter 3
Inner Products and Norms
The geometry of Euclidean space relies on the familiar properties of length and angle.
The abstract concept of a norm on a vector space formalizes the geometrical notion of the
length of a vector. In Euclidean geometry, the angle between two vectors is governed by
their dot product, which is itself formalized by the abstract concept of an inner product.
Inner products and norms lie at the heart of analysis, both linear and nonlinear, in both
finite-dimensional vector spaces and infinite-dimensional function spaces. It is impossible
to overemphasize their importance for theoretical developments, practical applications
in all fields, and the design of numerical solution algorithms.
Mathematical analysis is founded on a few key inequalities. The most basic is the
Cauchy–Schwarz inequality, which is valid in any inner product space. The more familiar triangle inequality for the associated norm is derived as a simple consequence. Not
every norm arises from an inner product, and in the general situation, the triangle inequality becomes part of the definition. Both inequalities retain their validity in both
finite-dimensional and infinite-dimensional vector spaces. Indeed, their abstract formulation helps focus on the key ideas in the proof, avoiding distracting complications resulting
from the explicit formulas.
In Euclidean space $\mathbb{R}^n$, the characterization of general inner products will lead us
to the extremely important class of positive definite matrices. Positive definite matrices
play a key role in a variety of applications, including minimization problems, least squares,
mechanical systems, electrical circuits, and the differential equations describing dynamical
processes. Later, we will generalize the notion of positive definiteness to more general linear
operators, governing the ordinary and partial differential equations arising in continuum
mechanics and dynamics. Positive definite matrices most commonly appear in so-called
Gram matrix form, consisting of the inner products between selected elements of an inner
product space. In general, positive definite matrices can be completely characterized by
their pivots resulting from Gaussian elimination. The associated matrix factorization can
be reinterpreted as the method of completing the square for the associated quadratic form.
So far, we have confined our attention to real vector spaces. Complex numbers, vectors
and functions also play an important role in applications, and so, in the final section, we
formally introduce complex vector spaces. Most of the formulation proceeds in direct
analogy with the real version, but the notions of inner product and norm on complex
vector spaces require some thought. Applications of complex vector spaces and their
inner products are of particular importance in Fourier analysis and signal processing, and
absolutely essential in modern quantum mechanics.

Figure 3.1. The Euclidean Norm in $\mathbb{R}^2$ and $\mathbb{R}^3$.

3.1. Inner Products.

The most basic example of an inner product is the familiar dot product
$$\langle \mathbf{v} ; \mathbf{w} \rangle = \mathbf{v} \cdot \mathbf{w} = v_1 w_1 + v_2 w_2 + \cdots + v_n w_n = \sum_{i=1}^{n} v_i w_i, \tag{3.1}$$
between (column) vectors $\mathbf{v} = (\, v_1, v_2, \ldots, v_n \,)^T$, $\mathbf{w} = (\, w_1, w_2, \ldots, w_n \,)^T$ lying in the Euclidean space $\mathbb{R}^n$. An important observation is that the dot product (3.1) can be identified
with the matrix product
$$\mathbf{v} \cdot \mathbf{w} = \mathbf{v}^T \mathbf{w} = (\, v_1 \ \ v_2 \ \ \ldots \ \ v_n \,) \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} \tag{3.2}$$
between a row vector $\mathbf{v}^T$ and a column vector $\mathbf{w}$.

The dot product is the cornerstone of Euclidean geometry. The key remark is that
the dot product of a vector with itself,
$$\langle \mathbf{v} ; \mathbf{v} \rangle = v_1^2 + v_2^2 + \cdots + v_n^2,$$
is the sum of the squares of its entries, and hence equal to the square of its length. Therefore, the Euclidean norm or length of a vector is found by taking the square root:
$$\| \mathbf{v} \| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}. \tag{3.3}$$
This formula generalizes the classical Pythagorean Theorem to $n$-dimensional Euclidean
space; see Figure 3.1. Since each term in the sum is non-negative, the length of a vector is
also non-negative, $\| \mathbf{v} \| \geq 0$. Furthermore, the only vector of length 0 is the zero vector.
The dot product and norm satisfy certain evident properties, and these serve as the
basis for the abstract definition of more general inner products on real vector spaces.

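Definition 3.1. An inner product on the real vector space $V$ is a pairing that takes
two vectors $\mathbf{v}, \mathbf{w} \in V$ and produces a real number $\langle \mathbf{v} ; \mathbf{w} \rangle \in \mathbb{R}$. The inner product is
required to satisfy the following three axioms for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$, and $c, d \in \mathbb{R}$.
(i) Bilinearity:
$$\langle c\,\mathbf{u} + d\,\mathbf{v} ; \mathbf{w} \rangle = c\,\langle \mathbf{u} ; \mathbf{w} \rangle + d\,\langle \mathbf{v} ; \mathbf{w} \rangle, \qquad \langle \mathbf{u} ; c\,\mathbf{v} + d\,\mathbf{w} \rangle = c\,\langle \mathbf{u} ; \mathbf{v} \rangle + d\,\langle \mathbf{u} ; \mathbf{w} \rangle. \tag{3.4}$$
(ii) Symmetry:
$$\langle \mathbf{v} ; \mathbf{w} \rangle = \langle \mathbf{w} ; \mathbf{v} \rangle. \tag{3.5}$$
(iii) Positivity:
$$\langle \mathbf{v} ; \mathbf{v} \rangle > 0 \quad \text{whenever} \quad \mathbf{v} \neq \mathbf{0}, \qquad \text{while} \qquad \langle \mathbf{0} ; \mathbf{0} \rangle = 0. \tag{3.6}$$

A vector space equipped with an inner product is called an inner product space. As
we shall see, a given vector space can admit many different inner products. Verification of
the inner product axioms for the Euclidean dot product is straightforward, and left to the
reader.
Given an inner product, the associated norm of a vector $\mathbf{v} \in V$ is defined as the
positive square root of the inner product of the vector with itself:
$$\| \mathbf{v} \| = \sqrt{\langle \mathbf{v} ; \mathbf{v} \rangle}. \tag{3.7}$$
The positivity axiom implies that $\| \mathbf{v} \| \geq 0$ is real and non-negative, and equals 0 if and
only if $\mathbf{v} = \mathbf{0}$ is the zero vector.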

Example 3.2. While certainly the most important inner product on $\mathbb{R}^n$, the dot
product is by no means the only possibility. A simple example is provided by the weighted
inner product
$$\langle \mathbf{v} ; \mathbf{w} \rangle = 2\,v_1 w_1 + 5\,v_2 w_2, \qquad \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}, \quad \mathbf{w} = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}, \tag{3.8}$$
between vectors in $\mathbb{R}^2$. The symmetry axiom (3.5) is immediate. Moreover,
$$\langle c\,\mathbf{u} + d\,\mathbf{v} ; \mathbf{w} \rangle = 2\,(c\,u_1 + d\,v_1)\,w_1 + 5\,(c\,u_2 + d\,v_2)\,w_2 = (2\,c\,u_1 w_1 + 5\,c\,u_2 w_2) + (2\,d\,v_1 w_1 + 5\,d\,v_2 w_2) = c\,\langle \mathbf{u} ; \mathbf{w} \rangle + d\,\langle \mathbf{v} ; \mathbf{w} \rangle,$$
which verifies the first bilinearity condition; the second follows by a very similar computation. (Or, one can rely on symmetry; see Exercise .) Moreover,
$$\langle \mathbf{v} ; \mathbf{v} \rangle = 2\,v_1^2 + 5\,v_2^2 \geq 0$$
is clearly strictly positive for any $\mathbf{v} \neq \mathbf{0}$ and equal to zero when $\mathbf{v} = \mathbf{0}$, which proves
positivity and hence establishes (3.8) as a legitimate inner product on $\mathbb{R}^2$. The associated
weighted norm is
$$\| \mathbf{v} \| = \sqrt{2\,v_1^2 + 5\,v_2^2}.$$
A less evident example is provided by the expression
$$\langle \mathbf{v} ; \mathbf{w} \rangle = v_1 w_1 - v_1 w_2 - v_2 w_1 + 4\,v_2 w_2. \tag{3.9}$$
Bilinearity is verified in the same manner as before, and symmetry is obvious. Positivity
is ensured by noticing that
$$\langle \mathbf{v} ; \mathbf{v} \rangle = v_1^2 - 2\,v_1 v_2 + 4\,v_2^2 = (v_1 - v_2)^2 + 3\,v_2^2 > 0$$
is strictly positive for all $\mathbf{v} \neq \mathbf{0}$. Therefore, (3.9) defines an alternative inner
product on $\mathbb{R}^2$. The associated norm
$$\| \mathbf{v} \| = \sqrt{v_1^2 - 2\,v_1 v_2 + 4\,v_2^2}$$
defines a different notion of distance and a consequent non-Pythagorean plane geometry.
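
The axioms for the two inner products (3.8) and (3.9) can be spot-checked on a grid of integer vectors. This is a sanity check on sample data, not a proof, and the function names `ip_weighted` and `ip_mixed` are invented for the illustration.

```python
from itertools import product

def ip_weighted(v, w):
    """Weighted inner product (3.8)."""
    return 2 * v[0] * w[0] + 5 * v[1] * w[1]

def ip_mixed(v, w):
    """The less evident inner product (3.9)."""
    return v[0] * w[0] - v[0] * w[1] - v[1] * w[0] + 4 * v[1] * w[1]

def check_axioms(ip, vectors, scalars):
    """Test bilinearity in the first slot, symmetry and positivity on samples."""
    for u, v, w in product(vectors, repeat=3):
        for c, d in product(scalars, repeat=2):
            combo = (c * u[0] + d * v[0], c * u[1] + d * v[1])
            if ip(combo, w) != c * ip(u, w) + d * ip(v, w):
                return False
        if ip(v, w) != ip(w, v):
            return False
    return all(ip(v, v) > 0 for v in vectors if v != (0, 0))

samples = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]
weighted_ok = check_axioms(ip_weighted, samples, [-1, 0, 2])
mixed_ok = check_axioms(ip_mixed, samples, [-1, 0, 2])

# Completing the square, as in the positivity argument for (3.9).
square_ok = all(ip_mixed(v, v) == (v[0] - v[1]) ** 2 + 3 * v[1] ** 2 for v in samples)

print(weighted_ok, mixed_ok, square_ok)
```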

Example 3.3. Let $c_1, \ldots, c_n$ be a set of positive numbers. The corresponding
weighted inner product and weighted norm on $\mathbb{R}^n$ are defined by
$$\langle \mathbf{v} ; \mathbf{w} \rangle = \sum_{i=1}^{n} c_i\, v_i w_i, \qquad \| \mathbf{v} \| = \sqrt{\sum_{i=1}^{n} c_i\, v_i^2}. \tag{3.10}$$
The numbers $c_i > 0$ are the weights. The larger the weight $c_i$, the more the $i$th coordinate
of $\mathbf{v}$ contributes to the norm. Weighted norms are particularly important in statistics and
data fitting, where one wants to emphasize certain quantities and de-emphasize others;
this is done by assigning suitable weights to the different components of the data vector
$\mathbf{v}$. Section 4.3 on least squares approximation methods will contain further details.
Inner Products on Function Space
Inner products and norms on function spaces will play an absolutely essential role in
modern analysis, particularly Fourier analysis and the solution to boundary value problems
for both ordinary and partial differential equations. Let us introduce the most important
examples.
Example 3.4. Given a bounded closed interval $[\,a, b\,] \subset \mathbb{R}$, consider the vector space
$C^0 = C^0[\,a, b\,]$ consisting of all continuous functions $f : [\,a, b\,] \to \mathbb{R}$. The integral
$$\langle f ; g \rangle = \int_a^b f(x)\, g(x)\, dx \tag{3.11}$$
defines an inner product on the vector space $C^0$, as we shall prove below. The associated
norm is, according to the basic definition (3.7),
$$\| f \| = \sqrt{\int_a^b f(x)^2\, dx}. \tag{3.12}$$
This quantity is known as the $L^2$ norm of the function $f$ over the interval $[\,a, b\,]$. The $L^2$
norm plays the same role in infinite-dimensional function space that the Euclidean norm
or length of a vector plays in the finite-dimensional Euclidean vector space $\mathbb{R}^n$.
For example, if we take $[\,a, b\,] = [\,0, \tfrac12 \pi\,]$, then the $L^2$ inner product between $f(x) = \sin x$ and $g(x) = \cos x$ is equal to
$$\langle \sin x ; \cos x \rangle = \int_0^{\pi/2} \sin x \cos x\, dx = \tfrac12 \sin^2 x \,\Big|_{x=0}^{\pi/2} = \tfrac12.$$
Similarly, the norm of the function $\sin x$ is
$$\| \sin x \| = \sqrt{\int_0^{\pi/2} (\sin x)^2\, dx} = \sqrt{\frac{\pi}{4}}.$$
One must always be careful when evaluating function norms. For example, the constant
function $c(x) \equiv 1$ has norm
$$\| 1 \| = \sqrt{\int_0^{\pi/2} 1^2\, dx} = \sqrt{\frac{\pi}{2}},$$
not 1 as you might have expected. We also note that the value of the norm depends upon
which interval the integral is taken over. For instance, on the longer interval $[\,0, \pi\,]$,
$$\| 1 \| = \sqrt{\int_0^{\pi} 1^2\, dx} = \sqrt{\pi}.$$
Thus, when dealing with the $L^2$ inner product or norm, one must always be careful to
specify the function space, or, equivalently, the interval on which it is being evaluated.
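
These values are easy to confirm numerically. The sketch below approximates the $L^2$ inner product with composite Simpson's rule; the choice of quadrature rule, the number of subintervals, and the helper names `simpson`, `l2_inner` and `l2_norm` are all assumptions of this illustration rather than anything prescribed by the text.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def l2_inner(f, g, a, b):
    """Approximate L^2 inner product (3.11) of f and g on [a, b]."""
    return simpson(lambda x: f(x) * g(x), a, b)

def l2_norm(f, a, b):
    """Approximate L^2 norm (3.12) of f on [a, b]."""
    return math.sqrt(l2_inner(f, f, a, b))

def one(x):
    return 1.0

half_pi = math.pi / 2
print(l2_inner(math.sin, math.cos, 0, half_pi))   # compare with 1/2
print(l2_norm(math.sin, 0, half_pi))              # compare with sqrt(pi/4)
print(l2_norm(one, 0, half_pi))                   # compare with sqrt(pi/2)
print(l2_norm(one, 0, math.pi))                   # compare with sqrt(pi)
```

The last two lines use the same function on different intervals and give different norms, which is the point of the warning above.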
Let us prove that formula (3.11) does, indeed, define an inner product. First, we need
to check that $\langle f ; g \rangle$ is well-defined. This follows because the product $f(x)\,g(x)$ of two
continuous functions is also continuous, and hence its integral over a bounded interval is
defined and finite. The symmetry condition for the inner product is immediate:
$$\langle f ; g \rangle = \int_a^b f(x)\, g(x)\, dx = \langle g ; f \rangle,$$
because multiplication of functions is commutative. The first bilinearity axiom
$$\langle c\,f + d\,g ; h \rangle = c\,\langle f ; h \rangle + d\,\langle g ; h \rangle,$$
amounts to the following elementary integral identity
$$\int_a^b \bigl[\, c\,f(x) + d\,g(x) \,\bigr]\, h(x)\, dx = c \int_a^b f(x)\, h(x)\, dx + d \int_a^b g(x)\, h(x)\, dx,$$
valid for arbitrary continuous functions $f, g, h$ and scalars (constants) $c, d$. The second
bilinearity axiom is proved similarly; alternatively, one can use symmetry to deduce it
from the first as in Exercise . Finally, positivity requires that
$$\| f \|^2 = \langle f ; f \rangle = \int_a^b f(x)^2\, dx \geq 0.$$
This is clear because $f(x)^2 \geq 0$, and the integral of a nonnegative function is nonnegative.
Moreover, since the function $f(x)^2$ is continuous and nonnegative, its integral will vanish,
$\int_a^b f(x)^2\, dx = 0$, if and only if $f(x) \equiv 0$ is the zero function, cf. Exercise . This completes
the demonstration.

Figure 3.2. Angle Between Two Vectors.

Remark: The preceding construction applies to more general functions, but we have
restricted our attention to continuous functions to avoid certain technical complications.
The most general function space admitting this important inner product is known as
Hilbert space, which forms the foundation for modern analysis, [126], including the rigorous
theory of Fourier series, [51], and also lies at the heart of modern quantum mechanics,
[100, 104, 122]. One does need to be extremely careful when trying to extend the inner
product to more general functions. Indeed, there are nonzero, discontinuous functions with
zero $L^2$ norm. An example is
$$f(x) = \begin{cases} 1, & x = 0, \\ 0, & \text{otherwise}, \end{cases} \qquad \text{which satisfies} \qquad \| f \|^2 = \int_{-1}^{1} f(x)^2\, dx = 0, \tag{3.13}$$
because any function which is zero except at finitely many (or even countably many) points
has zero integral. We will discuss some of the details of the Hilbert space construction in
Chapters 12 and 13.
One can also define weighted inner products on the function space $C^0[\,a, b\,]$. The
weights along the interval are specified by a (continuous) positive scalar function $w(x) > 0$.
The corresponding weighted inner product and norm are
$$\langle f ; g \rangle = \int_a^b f(x)\, g(x)\, w(x)\, dx, \qquad \| f \| = \sqrt{\int_a^b f(x)^2\, w(x)\, dx}. \tag{3.14}$$
The verification of the inner product axioms in this case is left as an exercise for the reader.

3.2. Inequalities.
Returning to the general framework of inner products on vector spaces, we now prove
the most important inequality in applied mathematics. Its origins can be found in the
geometric interpretation of the dot product on Euclidean space in terms of the angle
between vectors.
The Cauchy–Schwarz Inequality
In two and three-dimensional Euclidean geometry, the dot product between two vectors can be geometrically characterized by the equation
$$\mathbf{v} \cdot \mathbf{w} = \| \mathbf{v} \|\, \| \mathbf{w} \| \cos \theta, \tag{3.15}$$
where $\theta$ measures the angle between the vectors $\mathbf{v}$ and $\mathbf{w}$, as depicted in Figure 3.2. Since
$$| \cos \theta | \leq 1,$$
the absolute value of the dot product is bounded by the product of the lengths of the
vectors:
$$| \mathbf{v} \cdot \mathbf{w} | \leq \| \mathbf{v} \|\, \| \mathbf{w} \|.$$
This fundamental inequality is named after two of the founders of modern analysis, Augustin Cauchy and Hermann Schwarz. It holds, in fact, for any inner product.
Theorem 3.5. Every inner product satisfies the Cauchy–Schwarz inequality
$$| \langle \mathbf{v} ; \mathbf{w} \rangle | \leq \| \mathbf{v} \|\, \| \mathbf{w} \|, \qquad \mathbf{v}, \mathbf{w} \in V. \tag{3.16}$$
Here, $\| \mathbf{v} \|$ is the associated norm, while $| \cdot |$ denotes absolute value. Equality holds if and
only if $\mathbf{v}$ and $\mathbf{w}$ are parallel vectors.
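Footnote: Russians also give credit for the discovery of this inequality to their compatriot Viktor Bunyakovskii, and,
indeed, many authors append his name to it.
Footnote: Two vectors are parallel if and only if one is a scalar multiple of the other. The zero vector
is parallel to every other vector, by convention.

Proof: The case when $\mathbf{w} = \mathbf{0}$ is trivial, since both sides of (3.16) are equal to 0. Thus,
we may suppose $\mathbf{w} \neq \mathbf{0}$. Let $t \in \mathbb{R}$ be an arbitrary scalar. Using the three basic inner
product axioms, we have
$$0 \leq \| \mathbf{v} + t\,\mathbf{w} \|^2 = \langle \mathbf{v} + t\,\mathbf{w} ; \mathbf{v} + t\,\mathbf{w} \rangle = \| \mathbf{v} \|^2 + 2\,t\,\langle \mathbf{v} ; \mathbf{w} \rangle + t^2\, \| \mathbf{w} \|^2, \tag{3.17}$$
with equality holding if and only if $\mathbf{v} = -\,t\,\mathbf{w}$, which requires $\mathbf{v}$ and $\mathbf{w}$ to be parallel
vectors. We fix $\mathbf{v}$ and $\mathbf{w}$, and consider the right hand side of (3.17) as a quadratic function,
$$p(t) = \| \mathbf{w} \|^2\, t^2 + 2\,\langle \mathbf{v} ; \mathbf{w} \rangle\, t + \| \mathbf{v} \|^2,$$
of the scalar variable $t$. To get the maximum mileage out of the fact that $p(t) \geq 0$, let us
look at where it assumes a minimum. This occurs when its derivative vanishes:
$$p'(t) = 2\, \| \mathbf{w} \|^2\, t + 2\,\langle \mathbf{v} ; \mathbf{w} \rangle = 0, \qquad \text{and thus at} \qquad t = -\,\frac{\langle \mathbf{v} ; \mathbf{w} \rangle}{\| \mathbf{w} \|^2}.$$
Substituting this particular minimizing value into (3.17), we find
$$0 \leq \| \mathbf{v} \|^2 - 2\,\frac{\langle \mathbf{v} ; \mathbf{w} \rangle^2}{\| \mathbf{w} \|^2} + \frac{\langle \mathbf{v} ; \mathbf{w} \rangle^2}{\| \mathbf{w} \|^2} = \| \mathbf{v} \|^2 - \frac{\langle \mathbf{v} ; \mathbf{w} \rangle^2}{\| \mathbf{w} \|^2}.$$
Rearranging this last inequality, we conclude that
$$\frac{\langle \mathbf{v} ; \mathbf{w} \rangle^2}{\| \mathbf{w} \|^2} \leq \| \mathbf{v} \|^2, \qquad \text{or} \qquad \langle \mathbf{v} ; \mathbf{w} \rangle^2 \leq \| \mathbf{v} \|^2\, \| \mathbf{w} \|^2.$$
Taking the (positive) square root of both sides of the final inequality completes the theorem's proof.
Q.E.D.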
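
The steps of the proof can be traced on a concrete pair of vectors. The sketch below uses the inner product (3.9) on $\mathbb{R}^2$ and exact rational arithmetic; the sample vectors are arbitrary choices made for the illustration.

```python
from fractions import Fraction

def ip(v, w):
    """Inner product (3.9) on R^2."""
    return v[0] * w[0] - v[0] * w[1] - v[1] * w[0] + 4 * v[1] * w[1]

def p(t, v, w):
    """The quadratic p(t) = || v + t w ||^2 from the proof."""
    u = [a + t * b for a, b in zip(v, w)]
    return ip(u, u)

v = [Fraction(3), Fraction(-1)]     # arbitrary sample vectors
w = [Fraction(2), Fraction(5)]

t_min = -ip(v, w) / ip(w, w)        # where p'(t) = 0
p_min = p(t_min, v, w)              # minimum value of p

print("<v;w> =", ip(v, w), " ||v||^2 =", ip(v, v), " ||w||^2 =", ip(w, w))
print("t_min =", t_min, " p(t_min) =", p_min)
print("Cauchy-Schwarz holds:", ip(v, w) ** 2 <= ip(v, v) * ip(w, w))

# Parallel vectors give equality: here v_par = -2 w.
v_par = [-2 * x for x in w]
equality = ip(v_par, w) ** 2 == ip(v_par, v_par) * ip(w, w)
print("equality for parallel vectors:", equality)
```

The minimum value `p_min` equals $\| \mathbf{v} \|^2 - \langle \mathbf{v} ; \mathbf{w} \rangle^2 / \| \mathbf{w} \|^2$, and its non-negativity is exactly the inequality.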
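Given any inner product on a vector space, we can use the quotient
$$\cos \theta = \frac{\langle \mathbf{v} ; \mathbf{w} \rangle}{\| \mathbf{v} \|\, \| \mathbf{w} \|} \tag{3.18}$$
to define the angle between the elements $\mathbf{v}, \mathbf{w} \in V$. The Cauchy–Schwarz inequality
tells us that the ratio lies between $-1$ and $+1$, and hence the angle $\theta$ is well-defined, and,
in fact, unique if we restrict it to lie in the range $0 \leq \theta \leq \pi$.
For example, using the standard dot product on $\mathbb{R}^3$, the angle between the vectors
$\mathbf{v} = (\, 1 \ \ 0 \ \ 1 \,)^T$ and $\mathbf{w} = (\, 0 \ \ 1 \ \ 1 \,)^T$ is given by
$$\cos \theta = \frac{1}{\sqrt{2}\,\sqrt{2}} = \frac12,$$
and so $\theta = \tfrac13 \pi$, i.e., $60^\circ$. Similarly, the angle between the polynomials $p(x) = x$ and
$q(x) = x^2$ defined on the interval $I = [\,0, 1\,]$ is given by
$$\cos \theta = \frac{\langle x ; x^2 \rangle}{\| x \|\, \| x^2 \|} = \frac{\displaystyle \int_0^1 x^3\, dx}{\sqrt{\displaystyle \int_0^1 x^2\, dx}\ \sqrt{\displaystyle \int_0^1 x^4\, dx}} = \frac{\tfrac14}{\sqrt{\tfrac13}\,\sqrt{\tfrac15}} = \sqrt{\frac{15}{16}},$$
so that $\theta = 0.25268\ldots$ radians.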

Warning: One should not try to give this notion of angle between functions more
significance than the formal definition warrants; it does not correspond to any angular
properties of their graph. Also, the value depends on the choice of inner product and the
interval upon which it is being computed. For example, if we change to the inner product
on the interval $[\,-1, 1\,]$, then $\langle x ; x^2 \rangle = \int_{-1}^{1} x^3\, dx = 0$, and hence (3.18) becomes $\cos \theta = 0$,
so the angle between $x$ and $x^2$ is now $\theta = \tfrac12 \pi$.

Even in Euclidean space $\mathbb{R}^n$, the measurement of angle (and length) depends upon
the choice of an underlying inner product. Different inner products lead to different angle
measurements; only for the standard Euclidean dot product does angle correspond to our
everyday experience.
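
The dependence of the angle on the interval can be seen in a short computation. In this sketch a polynomial is stored as a list of coefficients in increasing degree, and the integrals are evaluated exactly; the representation and the helper names are choices made for the illustration.

```python
import math
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def integrate(p, a, b):
    """Exact integral of the polynomial p over [a, b]."""
    a, b = Fraction(a), Fraction(b)
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1) for k, c in enumerate(p))

def inner(p, q, a, b):
    """L^2 inner product (3.11) of two polynomials on [a, b]."""
    return integrate(poly_mul(p, q), a, b)

def angle(p, q, a, b):
    """Angle defined by (3.18), in radians."""
    cos_theta = inner(p, q, a, b) / math.sqrt(inner(p, p, a, b) * inner(q, q, a, b))
    return math.acos(cos_theta)

x = [0, 1]          # p(x) = x
x2 = [0, 0, 1]      # q(x) = x^2

theta_01 = angle(x, x2, 0, 1)       # interval [0, 1]
theta_sym = angle(x, x2, -1, 1)     # interval [-1, 1]
print(theta_01, theta_sym)
```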

Orthogonal Vectors
A particularly important geometrical configuration occurs when two vectors are perpendicular, which means that they meet at a right angle: $\theta = \tfrac12 \pi$ or $\tfrac32 \pi$, and so $\cos \theta = 0$.
The angle formula (3.18) implies that the vectors $\mathbf{v}, \mathbf{w}$ are perpendicular if and only if
their dot product vanishes: $\mathbf{v} \cdot \mathbf{w} = 0$. Perpendicularity also plays a key role in general
inner product spaces, but, for historical reasons, has been given a different name.
Definition 3.6. Two elements $\mathbf{v}, \mathbf{w} \in V$ of an inner product space $V$ are called
orthogonal if their inner product $\langle \mathbf{v} ; \mathbf{w} \rangle = 0$.
Orthogonality is a remarkably powerful tool in all applications of linear algebra, and
often serves to dramatically simplify many computations. We will devote Chapter 5 to its
detailed development.

Example 3.7. The vectors $\mathbf{v} = (\, 1, 2 \,)^T$ and $\mathbf{w} = (\, 6, -3 \,)^T$ are orthogonal with
respect to the Euclidean dot product in $\mathbb{R}^2$, since $\mathbf{v} \cdot \mathbf{w} = 1 \cdot 6 + 2 \cdot (-3) = 0$. We deduce
that they meet at a $90^\circ$ angle. However, these vectors are not orthogonal with respect to
the weighted inner product (3.8):
$$\langle \mathbf{v} ; \mathbf{w} \rangle = \left\langle \begin{pmatrix} 1 \\ 2 \end{pmatrix} ; \begin{pmatrix} 6 \\ -3 \end{pmatrix} \right\rangle = 2 \cdot 1 \cdot 6 + 5 \cdot 2 \cdot (-3) = -18 \neq 0.$$
Thus, orthogonality, like angles in general, depends upon which inner product is being
used.

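Example 3.8. The polynomials $p(x) = x$ and $q(x) = x^2 - \tfrac{1}{2}$ are orthogonal with
respect to the inner product $\langle p ; q \rangle = \int_0^1 p(x)\, q(x)\, dx$ on the interval $[0, 1]$, since
$$
\left\langle x ; x^2 - \tfrac{1}{2} \right\rangle = \int_0^1 x \left( x^2 - \tfrac{1}{2} \right) dx = \int_0^1 \left( x^3 - \tfrac{1}{2} x \right) dx = 0.
$$
They fail to be orthogonal on most other intervals. For example, on the interval $[0, 2]$,
$$
\left\langle x ; x^2 - \tfrac{1}{2} \right\rangle = \int_0^2 x \left( x^2 - \tfrac{1}{2} \right) dx = \int_0^2 \left( x^3 - \tfrac{1}{2} x \right) dx = 3.
$$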
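The dependence of orthogonality on the interval can be verified exactly. The sketch below represents a polynomial as a list of coefficients (lowest degree first) and uses rational arithmetic; the helper names `poly_mul`, `poly_integral`, and `inner` are assumptions made for this illustration.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += Fraction(a) * Fraction(b)
    return out


def poly_integral(p, a, b):
    """Exact integral of the polynomial p over [a, b]."""
    a, b = Fraction(a), Fraction(b)
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1) for k, c in enumerate(p))


def inner(p, q, a, b):
    """L2 inner product <p;q> = integral of p(x) q(x) over [a, b]."""
    return poly_integral(poly_mul(p, q), a, b)


p = [0, 1]                    # p(x) = x
q = [Fraction(-1, 2), 0, 1]   # q(x) = x^2 - 1/2
print(inner(p, q, 0, 1))  # 0: orthogonal on [0, 1]
print(inner(p, q, 0, 2))  # 3: not orthogonal on [0, 2]
```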

The Triangle Inequality

The familiar triangle inequality states that the length of one side of a triangle is at
most equal to the sum of the lengths of the other two sides. Referring to Figure 3.3, if the
first two sides are represented by vectors $v$ and $w$, then the third corresponds to their sum
$v + w$, and so $\| v + w \| \le \| v \| + \| w \|$. The triangle inequality is a direct consequence of
the Cauchy–Schwarz inequality, and hence holds for any inner product space.
Theorem 3.9. The norm associated with an inner product satisfies the triangle
inequality
$$
\| v + w \| \le \| v \| + \| w \| \tag{3.19}
$$
for every $v, w \in V$. Equality holds if and only if $v$ and $w$ are parallel vectors.

Figure 3.3. Triangle Inequality.
Proof: We compute
$$
\| v + w \|^2 = \langle v + w ; v + w \rangle = \| v \|^2 + 2 \langle v ; w \rangle + \| w \|^2
\le \| v \|^2 + 2 \| v \| \, \| w \| + \| w \|^2 = \bigl( \| v \| + \| w \| \bigr)^2,
$$
where the inequality follows from Cauchy–Schwarz. Taking square roots of both sides and
using positivity completes the proof.
Q.E.D.

Example 3.10. The vectors $v = (1, 2, -1)^T$ and $w = (2, 0, 3)^T$ sum to $v + w = (3, 2, 2)^T$.
Their Euclidean norms are $\| v \| = \sqrt{6}$ and $\| w \| = \sqrt{13}$, while $\| v + w \| = \sqrt{17}$. The
triangle inequality (3.19) in this case says $\sqrt{17} \le \sqrt{6} + \sqrt{13}$, which is valid.

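Example 3.11. Consider the functions $f(x) = x - 1$ and $g(x) = x^2 + 1$. Using the
$L^2$ norm on the interval $[0, 1]$, we find
$$
\| f \| = \sqrt{\int_0^1 (x - 1)^2 \, dx} = \sqrt{\frac{1}{3}}, \qquad
\| g \| = \sqrt{\int_0^1 (x^2 + 1)^2 \, dx} = \sqrt{\frac{28}{15}}, \qquad
\| f + g \| = \sqrt{\int_0^1 (x^2 + x)^2 \, dx} = \sqrt{\frac{31}{30}}.
$$
The triangle inequality requires $\sqrt{\frac{31}{30}} \le \sqrt{\frac{1}{3}} + \sqrt{\frac{28}{15}}$, which is correct.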
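The three integrals in Example 3.11 can be evaluated exactly. The sketch below relies on the identity $\int_0^1 x^i x^j \, dx = 1/(i+j+1)$, so the squared $L^2$ norm on $[0,1]$ of a polynomial with coefficients $c_k$ is $\sum_{i,j} c_i c_j/(i+j+1)$; the function name `l2_norm_sq` is an illustrative choice.

```python
from fractions import Fraction
from math import sqrt


def l2_norm_sq(c):
    """Squared L2 norm on [0, 1] of the polynomial sum(c[k] * x**k)."""
    return sum(Fraction(ci * cj, i + j + 1)
               for i, ci in enumerate(c) for j, cj in enumerate(c))


f = [-1, 1]     # f(x) = x - 1
g = [1, 0, 1]   # g(x) = x^2 + 1
h = [0, 1, 1]   # f(x) + g(x) = x^2 + x

print(l2_norm_sq(f), l2_norm_sq(g), l2_norm_sq(h))  # 1/3 28/15 31/30
# Triangle inequality for the L2 norm:
print(sqrt(l2_norm_sq(h)) <= sqrt(l2_norm_sq(f)) + sqrt(l2_norm_sq(g)))  # True
```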

The Cauchy–Schwarz and triangle inequalities look much more impressive when written
out in full detail. For the Euclidean inner product (3.1), they are
$$
\left| \sum_{i=1}^n v_i w_i \right| \le \sqrt{\sum_{i=1}^n v_i^2} \; \sqrt{\sum_{i=1}^n w_i^2}, \qquad
\sqrt{\sum_{i=1}^n (v_i + w_i)^2} \le \sqrt{\sum_{i=1}^n v_i^2} + \sqrt{\sum_{i=1}^n w_i^2}. \tag{3.20}
$$
Theorems 3.5 and 3.9 imply that these inequalities are valid for arbitrary real numbers
$v_1, \ldots, v_n, w_1, \ldots, w_n$. For the $L^2$ inner product (3.12) on function space, they produce
the following splendid integral inequalities:
$$
\left| \int_a^b f(x)\, g(x) \, dx \right| \le \sqrt{\int_a^b f(x)^2 \, dx} \; \sqrt{\int_a^b g(x)^2 \, dx}, \qquad
\sqrt{\int_a^b \bigl( f(x) + g(x) \bigr)^2 \, dx} \le \sqrt{\int_a^b f(x)^2 \, dx} + \sqrt{\int_a^b g(x)^2 \, dx}, \tag{3.21}
$$
which hold for arbitrary continuous (and even more general) functions. The first of these is
the original Cauchy–Schwarz inequality, whose proof appeared to be quite deep when it first
appeared. Only after the abstract notion of an inner product space was properly formalized
did its innate simplicity and generality become evident. One can also generalize either of
these sets of inequalities to weighted inner products, replacing the integration element $dx$
by a weighted version $w(x)\, dx$, provided $w(x) > 0$.

3.3. Norms.
Every inner product gives rise to a norm that can be used to measure the magnitude
or length of the elements of the underlying vector space. However, not every such norm
used in analysis and applications arises from an inner product. To define a general norm
on a vector space, we will extract those properties that do not directly rely on the inner
product structure.
Definition 3.12. A norm on the vector space $V$ assigns a real number $\| v \|$ to each
vector $v \in V$, subject to the following axioms for all $v, w \in V$, and $c \in \mathbb{R}$:
(i) Positivity: $\| v \| \ge 0$, with $\| v \| = 0$ if and only if $v = 0$.
(ii) Homogeneity: $\| c\, v \| = | c | \, \| v \|$.
(iii) Triangle inequality: $\| v + w \| \le \| v \| + \| w \|$.
As we now know, every inner product gives rise to a norm. Indeed, positivity of the
norm is one of the inner product axioms. The homogeneity property follows since
$$
\| c\, v \| = \sqrt{\langle c\, v ; c\, v \rangle} = \sqrt{c^2 \langle v ; v \rangle} = | c | \sqrt{\langle v ; v \rangle} = | c | \, \| v \|.
$$
Finally, the triangle inequality for an inner product norm was established in Theorem 3.9.
Here are some important examples of norms that do not come from inner products.

Example 3.13. Let $V = \mathbb{R}^n$. The 1-norm of a vector $v = ( v_1 \; v_2 \; \ldots \; v_n )^T$ is
defined as the sum of the absolute values of its entries:
$$
\| v \|_1 = | v_1 | + \cdots + | v_n |. \tag{3.22}
$$
The max or $\infty$-norm is equal to the maximal entry (in absolute value):
$$
\| v \|_\infty = \sup \{ | v_1 |, \ldots, | v_n | \}. \tag{3.23}
$$
Verification of the positivity and homogeneity properties for these two norms is straightforward; the triangle inequality is a direct consequence of the elementary inequality
$$
| a + b | \le | a | + | b |
$$
for absolute values.
The Euclidean norm, 1-norm, and $\infty$-norm on $\mathbb{R}^n$ are just three representatives of
the general p-norm
$$
\| v \|_p = \sqrt[p]{\sum_{i=1}^n | v_i |^p}. \tag{3.24}
$$
This quantity defines a norm for any $1 \le p < \infty$. The $\infty$-norm is a limiting case of
the p-norm as $p \to \infty$. Note that the Euclidean norm (3.3) is the 2-norm, and is often
designated as such; it is the only p-norm which comes from an inner product. The positivity
and homogeneity properties of the p-norm are straightforward. The triangle inequality,
however, is not trivial; in detail, it reads
$$
\sqrt[p]{\sum_{i=1}^n | v_i + w_i |^p} \le \sqrt[p]{\sum_{i=1}^n | v_i |^p} + \sqrt[p]{\sum_{i=1}^n | w_i |^p}, \tag{3.25}
$$
and is known as Minkowski's inequality. A proof can be found in [97].
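The following sketch evaluates the p-norm (3.24) for increasing values of $p$ and compares it with the max norm (3.23), illustrating the limiting behaviour as $p \to \infty$. The sample vector and the function names `p_norm` and `max_norm` are arbitrary choices made for illustration.

```python
def p_norm(v, p):
    """p-norm (3.24), defined for 1 <= p < infinity."""
    if p < 1:
        raise ValueError("p must satisfy p >= 1")
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def max_norm(v):
    """Max (infinity) norm (3.23)."""
    return max(abs(x) for x in v)


v = [3, -4, 1]
for p in (1, 2, 4, 16, 64):
    print(p, p_norm(v, p))   # values decrease toward the max norm as p grows
print("max", max_norm(v))    # 4
```

Minkowski's inequality (3.25) can be spot-checked in the same way, for instance by comparing `p_norm([a + b for a, b in zip(v, w)], p)` with `p_norm(v, p) + p_norm(w, p)` for a few vectors `w`.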

Example 3.14. There are analogous norms on the space $C^0[a, b]$ of continuous
functions on an interval $[a, b]$. Basically, one replaces the previous sums by integrals.
Thus, the $L^p$ norm is defined as
$$
\| f \|_p = \sqrt[p]{\int_a^b | f(x) |^p \, dx}. \tag{3.26}
$$
In particular, the $L^1$ norm is given by integrating the absolute value of the function:
$$
\| f \|_1 = \int_a^b | f(x) | \, dx. \tag{3.27}
$$
The $L^2$ norm (3.12) appears as a special case, $p = 2$, and, again, is the only one arising from
an inner product. The proof of the general triangle or Minkowski inequality for $p \neq 1, 2$ is
again not trivial. The limiting $L^\infty$ norm is defined by the maximum
$$
\| f \|_\infty = \max \{ | f(x) | : a \le x \le b \}. \tag{3.28}
$$

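Example 3.15. Consider the polynomial $p(x) = 3 x^2 - 2$ on the interval $-1 \le x \le 1$.
Its $L^2$ norm is
$$
\| p \|_2 = \sqrt{\int_{-1}^1 (3 x^2 - 2)^2 \, dx} = \sqrt{\frac{18}{5}} = 1.8974\ldots.
$$
Its $L^\infty$ norm is
$$
\| p \|_\infty = \max \{ | 3 x^2 - 2 | : -1 \le x \le 1 \} = 2,
$$
with the maximum occurring at $x = 0$. Finally, its $L^1$ norm is
$$
\begin{aligned}
\| p \|_1 &= \int_{-1}^1 | 3 x^2 - 2 | \, dx \\
&= \int_{-1}^{-\sqrt{2/3}} (3 x^2 - 2) \, dx + \int_{-\sqrt{2/3}}^{\sqrt{2/3}} (2 - 3 x^2) \, dx + \int_{\sqrt{2/3}}^{1} (3 x^2 - 2) \, dx \\
&= \left( \tfrac{4}{3} \sqrt{\tfrac{2}{3}} - 1 \right) + \tfrac{8}{3} \sqrt{\tfrac{2}{3}} + \left( \tfrac{4}{3} \sqrt{\tfrac{2}{3}} - 1 \right) = \tfrac{16}{3} \sqrt{\tfrac{2}{3}} - 2 = 2.3546\ldots.
\end{aligned}
$$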
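The three norms in Example 3.15 can be approximated numerically. The sketch below uses a simple midpoint rule for the integrals and a coarse grid search for the maximum; both are crude illustrative choices rather than recommended quadrature or optimization methods, and the names `midpoint_integral` and `p` are assumptions of this sketch.

```python
from math import sqrt


def midpoint_integral(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def p(x):
    return 3 * x * x - 2


a, b = -1.0, 1.0
l1 = midpoint_integral(lambda x: abs(p(x)), a, b)
l2 = sqrt(midpoint_integral(lambda x: p(x) ** 2, a, b))
# Crude grid search for the maximum of |p|; the grid contains x = 0.
linf = max(abs(p(a + k * (b - a) / 1000)) for k in range(1001))

print(l1, l2, linf)  # approximately 2.3546, 1.8974, and 2
```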

Every norm defines a distance between vector space elements, namely
$$
d(v, w) = \| v - w \|. \tag{3.29}
$$
For the standard dot product norm, we recover the usual notion of distance between points
in Euclidean space. Other types of norms produce alternative (and sometimes quite useful)
notions of distance that, nevertheless, satisfy all the familiar distance axioms. Notice that
distance is symmetric, $d(v, w) = d(w, v)$. Moreover, $d(v, w) = 0$ if and only if $v = w$.
The triangle inequality implies that
$$
d(v, w) \le d(v, z) + d(z, w) \tag{3.30}
$$
for any triple of vectors $v, w, z$.

Unit Vectors
Let $V$ be a fixed normed vector space. The elements $u \in V$ with unit norm $\| u \| = 1$
play a special role, and are known as unit vectors (or functions). The following easy lemma
shows how to construct a unit vector pointing in the same direction as any given nonzero
vector.
Lemma 3.16. If $v \neq 0$ is any nonzero vector, then the vector $u = v / \| v \|$ obtained
by dividing $v$ by its norm is a unit vector parallel to $v$.
Proof: We compute, making use of the homogeneity property of the norm:
$$
\| u \| = \left\| \frac{v}{\| v \|} \right\| = \frac{\| v \|}{\| v \|} = 1.
$$
Q.E.D.

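Example 3.17. The vector $v = (1, -2)^T$ has length $\| v \|_2 = \sqrt{5}$ with respect to
the standard Euclidean norm. Therefore, the unit vector pointing in the same direction as
$v$ is
$$
u = \frac{v}{\| v \|_2} = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ -2 \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} \end{pmatrix}.
$$
On the other hand, for the 1 norm, $\| v \|_1 = 3$, and so
$$
\tilde{u} = \frac{v}{\| v \|_1} = \frac{1}{3} \begin{pmatrix} 1 \\ -2 \end{pmatrix} = \begin{pmatrix} \frac{1}{3} \\ -\frac{2}{3} \end{pmatrix}
$$
is the unit vector parallel to $v$ in the 1 norm. Finally, $\| v \|_\infty = 2$, and hence the corresponding unit vector for the $\infty$ norm is
$$
\hat{u} = \frac{v}{\| v \|_\infty} = \frac{1}{2} \begin{pmatrix} 1 \\ -2 \end{pmatrix} = \begin{pmatrix} \frac{1}{2} \\ -1 \end{pmatrix}.
$$
Thus, the notion of unit vector will depend upon which norm is being used.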
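Lemma 3.16 translates directly into a normalization routine. In this sketch the dictionary `NORMS` and the function `unit_vector` are illustrative names, and the vector is taken to be $(1, -2)^T$.

```python
from math import sqrt

# The three norms used in Example 3.17, keyed by an arbitrary label.
NORMS = {
    "1": lambda v: sum(abs(x) for x in v),
    "2": lambda v: sqrt(sum(x * x for x in v)),
    "inf": lambda v: max(abs(x) for x in v),
}


def unit_vector(v, norm):
    """Return v / ||v|| for the given norm function (Lemma 3.16)."""
    n = norm(v)
    if n == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [x / n for x in v]


v = [1, -2]
for name, norm in NORMS.items():
    u = unit_vector(v, norm)
    print(name, u, norm(u))  # each u has norm 1 in its own norm
```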

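Example 3.18. Similarly, on the interval $[0, 1]$, the quadratic polynomial $p(x) = x^2 - \tfrac{1}{2}$ has $L^2$ norm
$$
\| p \|_2 = \sqrt{\int_0^1 \left( x^2 - \tfrac{1}{2} \right)^2 dx} = \sqrt{\int_0^1 \left( x^4 - x^2 + \tfrac{1}{4} \right) dx} = \sqrt{\frac{7}{60}}.
$$
Therefore, $u(x) = \dfrac{p(x)}{\| p \|} = \sqrt{\dfrac{60}{7}} \, x^2 - \sqrt{\dfrac{15}{7}}$ is a unit polynomial, $\| u \| = 1$, which is
parallel to (or, more correctly, a scalar multiple of) the polynomial $p$. On the other
hand, for the $L^\infty$ norm,
$$
\| p \|_\infty = \max \left\{ \left| x^2 - \tfrac{1}{2} \right| : 0 \le x \le 1 \right\} = \tfrac{1}{2},
$$
and hence, in this case $\tilde{u}(x) = 2\, p(x) = 2 x^2 - 1$ is the corresponding unit function.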
The unit sphere for the given norm is defined as the set of all unit vectors
$$
S_1 = \{ \| u \| = 1 \} \subset V. \tag{3.31}
$$
Thus, the unit sphere for the Euclidean norm on $\mathbb{R}^n$ is the usual round sphere
$$
S_1 = \{ \| x \|^2 = x_1^2 + x_2^2 + \cdots + x_n^2 = 1 \}.
$$
For the $\infty$ norm, it is the surface of the unit cube
$$
S_1 = \{ x \in \mathbb{R}^n \mid \max \{ | x_1 |, \ldots, | x_n | \} = 1 \}.
$$
For the 1 norm, it is the unit diamond or octahedron
$$
S_1 = \{ x \in \mathbb{R}^n \mid | x_1 | + | x_2 | + \cdots + | x_n | = 1 \}.
$$
Figure 3.4. Unit Balls and Spheres for 1, 2 and $\infty$ Norms in $\mathbb{R}^2$.

See Figure 3.4 for the two-dimensional pictures.

In all cases, the closed unit ball $B_1 = \{ \| u \| \le 1 \}$ consists of all vectors of norm less
than or equal to 1, and has the unit sphere as its boundary. If $V$ is a finite-dimensional
normed vector space, then the unit ball B1 forms a compact subset, meaning that it is
closed and bounded. This topological fact, which is not true in infinite-dimensional spaces,
underscores the fundamental distinction between finite-dimensional vector analysis and the
vastly more complicated infinite-dimensional realm.
Equivalence of Norms
While there are many different types of norms, in a finite-dimensional vector space
they are all more or less equivalent. Equivalence does not mean that they assume the same
value, but rather that they are, in a certain sense, always close to one another, and so for
most analytical purposes can be used interchangeably. As a consequence, we may be able
to simplify the analysis of a problem by choosing a suitably adapted norm.
Theorem 3.19. Let $\| \cdot \|_1$ and $\| \cdot \|_2$ be any two norms on $\mathbb{R}^n$. Then there exist
positive constants $c^\star, C^\star > 0$ such that
$$
c^\star \| v \|_1 \le \| v \|_2 \le C^\star \| v \|_1 \qquad \text{for every} \qquad v \in \mathbb{R}^n. \tag{3.32}
$$
Proof: We just sketch the basic idea, leaving the details to a more rigorous real analysis course, cf. [125, 126]. We begin by noting that a norm defines a continuous function
$f(v) = \| v \|$ on $\mathbb{R}^n$. (Continuity is, in fact, a consequence of the triangle inequality.) Let
$S_1 = \{ \| u \|_1 = 1 \}$ denote the unit sphere of the first norm. Any continuous function defined on a compact set achieves both a maximum and a minimum value. Thus, restricting
the second norm function to the unit sphere $S_1$ of the first norm, we can set
$$
c^\star = \| u^\star \|_2 = \min \{ \| u \|_2 \mid u \in S_1 \}, \qquad
C^\star = \| U^\star \|_2 = \max \{ \| u \|_2 \mid u \in S_1 \}, \tag{3.33}
$$
for certain vectors $u^\star, U^\star \in S_1$. Note that $0 < c^\star \le C^\star < \infty$, with equality holding if and
only if the second norm is a constant multiple of the first. The minimum and maximum (3.33) will serve as the
constants in the desired inequalities (3.32). Indeed, by definition,
$$
c^\star \le \| u \|_2 \le C^\star \qquad \text{when} \qquad \| u \|_1 = 1, \tag{3.34}
$$
and so (3.32) is valid for all $u \in S_1$. To prove the inequalities in general, assume $v \neq 0$.
(The case $v = 0$ is trivial.) Lemma 3.16 says that $u = v / \| v \|_1 \in S_1$ is a unit vector
in the first norm: $\| u \|_1 = 1$. Moreover, by the homogeneity property of the norm,
$\| u \|_2 = \| v \|_2 / \| v \|_1$. Substituting into (3.34) and clearing denominators completes the
proof of (3.32).
Q.E.D.

Figure 3.5. Equivalence of Norms. (Panels: $\infty$ norm and 2 norm; 1 norm and 2 norm.)
Example 3.20. Consider the Euclidean norm $\| \cdot \|_2$ and the max norm
$\| \cdot \|_\infty$ on $\mathbb{R}^n$. According to (3.33), the bounding constants are found by minimizing and
maximizing $\| u \|_\infty = \max \{ | u_1 |, \ldots, | u_n | \}$ over all unit vectors $\| u \|_2 = 1$ on the (round)
unit sphere. Its maximal value is obtained at the poles, when $U^\star = \pm e_k$, with $\| e_k \|_\infty = 1$.
Thus, $C^\star = 1$. The minimal value is obtained when $u^\star = \left( \frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}} \right)$ has all equal
components, whereby $c^\star = \| u^\star \|_\infty = 1 / \sqrt{n}$. Therefore,
$$
\frac{1}{\sqrt{n}} \, \| v \|_2 \le \| v \|_\infty \le \| v \|_2. \tag{3.35}
$$
One can interpret these inequalities as follows. Suppose $v$ is a vector lying on the unit
sphere in the Euclidean norm, so $\| v \|_2 = 1$. Then (3.35) tells us that its $\infty$ norm is
bounded from above and below by $1 / \sqrt{n} \le \| v \|_\infty \le 1$. Therefore, the unit Euclidean
sphere sits inside the unit sphere in the $\infty$ norm, and outside the sphere of radius $1 / \sqrt{n}$.
Figure 3.5 illustrates the two-dimensional situation.
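The bounds (3.35) are easy to spot-check on random vectors. The sketch below is only a numerical sanity check, not a proof; the function names, the sampling range, and the small floating-point tolerance are all illustrative assumptions.

```python
import random
from math import sqrt


def norm2(v):
    return sqrt(sum(x * x for x in v))


def norm_inf(v):
    return max(abs(x) for x in v)


def check_bounds(n, trials=1000, seed=0, tol=1e-12):
    """Check (1/sqrt(n)) ||v||_2 <= ||v||_inf <= ||v||_2 on random vectors in R^n."""
    rng = random.Random(seed)
    for _ in range(trials):
        v = [rng.uniform(-1, 1) for _ in range(n)]
        lower, upper = norm2(v) / sqrt(n), norm2(v)
        if not (lower <= norm_inf(v) + tol and norm_inf(v) <= upper + tol):
            return False
    return True


print(check_bounds(2), check_bounds(5), check_bounds(50))  # True True True
```

The extreme cases are attained exactly as described in the text: a standard basis vector gives equality on the right, and a vector with all entries equal gives equality on the left.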
One significant consequence of the equivalence of norms is that, in $\mathbb{R}^n$, convergence is
independent of the norm. The following are all equivalent to the standard convergence
of a sequence $u^{(1)}, u^{(2)}, u^{(3)}, \ldots$ of vectors in $\mathbb{R}^n$:
(a) the vectors converge: $u^{(k)} \to u^\star$;
(b) the individual components all converge: $u_i^{(k)} \to u_i^\star$ for $i = 1, \ldots, n$;
(c) the norm of the difference goes to zero: $\| u^{(k)} - u^\star \| \to 0$.
The last case, called convergence in norm, does not depend on which norm is chosen.
Indeed, the basic inequality (3.32) implies that if one norm goes to zero, so does any other
norm. An important consequence is that all norms on $\mathbb{R}^n$ induce the same topology:
convergence of sequences, notions of open and closed sets, and so on. None of this is true in
infinite-dimensional function space! A rigorous development of the underlying topological
and analytical properties of compactness, continuity, and convergence is beyond the scope
of this course. The motivated student is encouraged to consult a text in real analysis, e.g.,
[125, 126], to find the relevant definitions, theorems and proofs.
Example 3.21. Consider the infinite-dimensional vector space $C^0[0, 1]$ consisting of
all continuous functions on the interval $[0, 1]$. The functions
$$
f_n(x) = \begin{cases} 1 - n\, x, & 0 \le x \le \frac{1}{n}, \\ 0, & \frac{1}{n} \le x \le 1, \end{cases}
$$
have identical $L^\infty$ norms
$$
\| f_n \|_\infty = \sup \{ | f_n(x) | \mid 0 \le x \le 1 \} = 1.
$$
On the other hand, their $L^2$ norm
$$
\| f_n \|_2 = \sqrt{\int_0^1 f_n(x)^2 \, dx} = \sqrt{\int_0^{1/n} (1 - n\, x)^2 \, dx} = \frac{1}{\sqrt{3 n}}
$$
goes to zero as $n \to \infty$. This example shows that there is no constant $C^\star$ such that
$$
\| f \|_\infty \le C^\star \| f \|_2
$$
for all $f \in C^0[0, 1]$. The $L^\infty$ and $L^2$ norms on $C^0[0, 1]$ are not equivalent: there exist
functions which have unit $L^\infty$ norm but arbitrarily small $L^2$ norm. Similar inequivalence
properties apply to all of the other standard function space norms. As a result, the topology
on function space is intimately connected with the underlying choice of norm.
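The behaviour of the two norms of $f_n$ can be observed numerically. The following sketch samples $f_n$ on a uniform grid for the $L^\infty$ norm and uses a midpoint rule for the $L^2$ norm; the grid sizes and function names are arbitrary illustrative choices.

```python
from math import sqrt


def f(n, x):
    """The function f_n of Example 3.21."""
    return 1 - n * x if x <= 1 / n else 0.0


def sup_norm(n, samples=10_001):
    """Approximate L-infinity norm on [0, 1] by sampling a uniform grid (includes x = 0)."""
    return max(abs(f(n, k / (samples - 1))) for k in range(samples))


def l2_norm(n, samples=100_000):
    """Approximate L2 norm on [0, 1] by the midpoint rule."""
    h = 1 / samples
    return sqrt(h * sum(f(n, (k + 0.5) * h) ** 2 for k in range(samples)))


for n in (1, 10, 100):
    # Columns: n, sup norm (always 1), numerical L2 norm, exact value 1/sqrt(3n)
    print(n, sup_norm(n), l2_norm(n), 1 / sqrt(3 * n))
```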

3.4. Positive Definite Matrices.

Let us now return to the study of inner products, and fix our attention on the finite-dimensional situation. Our immediate goal is to determine the most general inner product
which can be placed on the finite-dimensional vector space $\mathbb{R}^n$. The resulting analysis will
lead us to the extremely important class of positive definite matrices. Such matrices play
a fundamental role in a wide variety of applications, including minimization problems, mechanics, electrical circuits, and differential equations. Moreover, their infinite-dimensional
generalization to positive definite linear operators underlies all of the most important examples of boundary value problems for ordinary and partial differential equations.
Let $\langle x ; y \rangle$ denote an inner product between vectors $x = ( x_1 \; x_2 \; \ldots \; x_n )^T$, $y = ( y_1 \; y_2 \; \ldots \; y_n )^T$, in $\mathbb{R}^n$. Let us write the vectors in terms of the standard basis vectors:
$$
x = x_1 e_1 + \cdots + x_n e_n = \sum_{i=1}^n x_i e_i, \qquad
y = y_1 e_1 + \cdots + y_n e_n = \sum_{j=1}^n y_j e_j. \tag{3.36}
$$

Let us carefully analyze the three basic inner product axioms, in order. We use the
bilinearity of the inner product to expand
$$
\langle x ; y \rangle = \left\langle \sum_{i=1}^n x_i e_i ; \sum_{j=1}^n y_j e_j \right\rangle = \sum_{i,j=1}^n x_i y_j \langle e_i ; e_j \rangle.
$$
Therefore we can write
$$
\langle x ; y \rangle = \sum_{i,j=1}^n k_{ij} x_i y_j = x^T K y, \tag{3.37}
$$
where $K$ denotes the $n \times n$ matrix of inner products of the basis vectors, with entries
$$
k_{ij} = \langle e_i ; e_j \rangle, \qquad i, j = 1, \ldots, n. \tag{3.38}
$$
We conclude that any inner product must be expressed in the general bilinear form (3.37).
The two remaining inner product axioms will impose certain conditions on the inner
product matrix $K$. The symmetry of the inner product implies that
$$
k_{ij} = \langle e_i ; e_j \rangle = \langle e_j ; e_i \rangle = k_{ji}, \qquad i, j = 1, \ldots, n.
$$
Consequently, the inner product matrix $K$ is symmetric:
$$
K = K^T.
$$
Conversely, symmetry of $K$ ensures symmetry of the bilinear form:
$$
\langle x ; y \rangle = x^T K y = (x^T K y)^T = y^T K^T x = y^T K x = \langle y ; x \rangle,
$$
where the second equality follows from the fact that the quantity is a scalar, and hence
equals its transpose.
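The construction (3.38) can be carried out mechanically: evaluate the inner product on all pairs of standard basis vectors, and the resulting matrix reproduces the inner product through the bilinear form (3.37). In the sketch below, the particular inner product on $\mathbb{R}^2$ is an assumption chosen for illustration (it is the one that appears in Example 3.24), and the helper names are hypothetical.

```python
def inner_product_matrix(inner, n):
    """Matrix K with entries k_ij = <e_i ; e_j> for the standard basis of R^n."""
    basis = [[1 if r == c else 0 for r in range(n)] for c in range(n)]
    return [[inner(basis[i], basis[j]) for j in range(n)] for i in range(n)]


def bilinear(K, x, y):
    """Evaluate the bilinear form x^T K y."""
    return sum(K[i][j] * x[i] * y[j] for i in range(len(x)) for j in range(len(y)))


# Illustrative inner product on R^2 (an assumption for this sketch).
def inner(x, y):
    return 4 * x[0] * y[0] - 2 * x[0] * y[1] - 2 * x[1] * y[0] + 3 * x[1] * y[1]


K = inner_product_matrix(inner, 2)
print(K)  # [[4, -2], [-2, 3]]

x, y = [1, 2], [3, -1]
print(inner(x, y) == bilinear(K, x, y))  # True
```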
The final condition for an inner product is positivity. This requires that
$$
\| x \|^2 = \langle x ; x \rangle = x^T K x = \sum_{i,j=1}^n k_{ij} x_i x_j \ge 0 \qquad \text{for all} \qquad x \in \mathbb{R}^n, \tag{3.39}
$$
with equality if and only if $x = 0$. The precise meaning of this positivity condition on the
matrix $K$ is not as immediately evident, and so will be encapsulated in the following very
important definition.
Definition 3.22. An $n \times n$ matrix $K$ is called positive definite if it is symmetric,
$K^T = K$, and satisfies the positivity condition
$$
x^T K x > 0 \qquad \text{for all} \qquad 0 \neq x \in \mathbb{R}^n. \tag{3.40}
$$
We will sometimes write $K > 0$ to mean that $K$ is a symmetric, positive definite matrix.
Warning: The condition $K > 0$ does not mean that all the entries of $K$ are positive.
There are many positive definite matrices which have some negative entries; see Example 3.24 below. Conversely, many symmetric matrices with all positive entries are not
positive definite!

Remark : Although some authors allow non-symmetric matrices to be designated as

positive definite, we will only say that a matrix is positive definite when it is symmetric.
But, to underscore our convention and so as not to confuse the casual reader, we will often
include the adjective symmetric when speaking of positive definite matrices.
Our preliminary analysis has resulted in the following characterization of inner products on a finite-dimensional vector space.
Theorem 3.23. Every inner product on $\mathbb{R}^n$ is given by
$$
\langle x ; y \rangle = x^T K y, \qquad \text{for} \qquad x, y \in \mathbb{R}^n, \tag{3.41}
$$
where $K$ is a symmetric, positive definite matrix.

Given a symmetric matrix $K$, the expression
$$
q(x) = x^T K x = \sum_{i,j=1}^n k_{ij} x_i x_j, \tag{3.42}
$$
is known as a quadratic form on $\mathbb{R}^n$. The quadratic form is called positive definite if
$$
q(x) > 0 \qquad \text{for all} \qquad 0 \neq x \in \mathbb{R}^n. \tag{3.43}
$$
Thus, a quadratic form is positive definite if and only if its coefficient matrix is.

Example 3.24. Even though the symmetric matrix $K = \begin{pmatrix} 4 & -2 \\ -2 & 3 \end{pmatrix}$ has two negative entries, it is, nevertheless, a positive definite matrix. Indeed, the corresponding
quadratic form
$$
q(x) = x^T K x = 4 x_1^2 - 4 x_1 x_2 + 3 x_2^2 = ( 2 x_1 - x_2 )^2 + 2 x_2^2 \ge 0
$$
is a sum of two non-negative quantities. Moreover, $q(x) = 0$ if and only if both terms are
zero, which requires that $2 x_1 - x_2 = 0$ and $x_2 = 0$, whereby $x_1 = 0$ also. This proves
positivity for all nonzero $x$, and hence $K > 0$ is indeed a positive definite matrix. The
corresponding inner product on $\mathbb{R}^2$ is
$$
\langle x ; y \rangle = ( x_1 \; x_2 ) \begin{pmatrix} 4 & -2 \\ -2 & 3 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = 4 x_1 y_1 - 2 x_1 y_2 - 2 x_2 y_1 + 3 x_2 y_2.
$$
On the other hand, despite the fact that the matrix $K = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$ has all positive
entries, it is not a positive definite matrix. Indeed, writing out
$$
q(x) = x^T K x = x_1^2 + 4 x_1 x_2 + x_2^2,
$$
we find, for instance, that $q(1, -1) = -2 < 0$, violating positivity. These two simple
examples should be enough to convince the reader that the problem of determining whether
a given symmetric matrix is or is not positive definite is not completely elementary.
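Both halves of Example 3.24 can be checked by evaluating the quadratic forms directly. This is a small sketch; `quad_form` is an illustrative helper name, and the sample points are arbitrary.

```python
def quad_form(K, x):
    """Evaluate the quadratic form q(x) = x^T K x."""
    n = len(x)
    return sum(K[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


K1 = [[4, -2], [-2, 3]]   # positive definite despite negative entries
K2 = [[1, 2], [2, 1]]     # all positive entries, but not positive definite

# The sum-of-squares identity q(x) = (2 x1 - x2)^2 + 2 x2^2 at a few sample points:
for x1, x2 in [(1, 0), (0, 1), (1, 2), (-3, 5)]:
    assert quad_form(K1, [x1, x2]) == (2 * x1 - x2) ** 2 + 2 * x2 ** 2

print(quad_form(K2, [1, -1]))  # -2, so K2 violates positivity
```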

Exercise shows that the coefficient matrix K in any quadratic form can be taken to be
symmetric without any loss of generality.


With a little practice, it is not difficult to read off the coefficient matrix K from the
explicit formula for the quadratic form (3.42).
Example 3.25. Consider the quadratic form
$$
q(x, y, z) = x^2 + 4 x y + 6 y^2 - 2 x z + 9 z^2
$$
depending upon three variables. The corresponding coefficient matrix is
$$
K = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 6 & 0 \\ -1 & 0 & 9 \end{pmatrix},
\qquad \text{whereby} \qquad
q(x, y, z) = ( x \; y \; z ) \begin{pmatrix} 1 & 2 & -1 \\ 2 & 6 & 0 \\ -1 & 0 & 9 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}.
$$
Note that the squared terms in $q$ contribute directly to the diagonal entries of $K$, while the
mixed terms are split in half to give the symmetric off-diagonal entries. The reader might
wish to try proving that this particular matrix is positive definite by proving positivity of
the quadratic form: $q(x, y, z) > 0$ for all nonzero $( x, y, z )^T \in \mathbb{R}^3$. Later, we will establish
a systematic test for positive definiteness.
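The rule for reading off the coefficient matrix (squared terms go on the diagonal, mixed terms are split in half) can be written as a short routine. In this sketch the quadratic form is encoded as a dictionary mapping index pairs to coefficients; that encoding and the function name `coefficient_matrix` are assumptions made for illustration.

```python
from fractions import Fraction


def coefficient_matrix(coeffs, n):
    """Symmetric matrix K of a quadratic form in n variables.

    coeffs maps index pairs (i, j), with i <= j, to the coefficient of x_i x_j.
    """
    K = [[Fraction(0)] * n for _ in range(n)]
    for (i, j), c in coeffs.items():
        if i == j:
            K[i][i] += Fraction(c)        # squared term: diagonal entry
        else:
            K[i][j] += Fraction(c, 2)     # mixed term: split in half
            K[j][i] += Fraction(c, 2)
    return K


# q(x, y, z) = x^2 + 4xy + 6y^2 - 2xz + 9z^2, with variables indexed 0, 1, 2
q = {(0, 0): 1, (0, 1): 4, (1, 1): 6, (0, 2): -2, (2, 2): 9}
K = coefficient_matrix(q, 3)
print([[int(e) for e in row] for row in K])  # [[1, 2, -1], [2, 6, 0], [-1, 0, 9]]
```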
Slightly more generally, a quadratic form and its associated symmetric coefficient
matrix are called positive semi-definite if
$$
q(x) = x^T K x \ge 0 \qquad \text{for all} \qquad x \in \mathbb{R}^n. \tag{3.44}
$$
A positive semi-definite matrix may have null directions, meaning non-zero vectors $z$ such
that $q(z) = z^T K z = 0$. Clearly any vector $z \in \ker K$ that lies in the matrix's kernel
defines a null direction, but there may be others. In particular, a positive definite matrix
is not allowed to have null directions, so $\ker K = \{ 0 \}$. Proposition 2.39 implies that all
positive definite matrices are invertible.
Theorem 3.26. All positive definite matrices $K$ are non-singular.

Example 3.27. The matrix $K = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$ is positive semi-definite, but not
positive definite. Indeed, the associated quadratic form
$$
q(x) = x^T K x = x_1^2 - 2 x_1 x_2 + x_2^2 = (x_1 - x_2)^2 \ge 0
$$
is a perfect square, and so clearly non-negative. However, the elements of $\ker K$, namely
the scalar multiples of the vector $( 1 \; 1 )^T$, define null directions, since $q(1, 1) = 0$.

Example 3.28. A general symmetric $2 \times 2$ matrix $K = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ is positive definite
if and only if the associated quadratic form satisfies
$$
q(x) = a x_1^2 + 2 b x_1 x_2 + c x_2^2 > 0 \tag{3.45}
$$
for all $x \neq 0$. Analytic geometry tells us that this is the case if and only if
$$
a > 0, \qquad a c - b^2 > 0, \tag{3.46}
$$
i.e., the quadratic form has positive leading coefficient and positive determinant (or negative discriminant). A direct proof of this elementary fact will appear shortly.

Remark: A quadratic form $q(x) = x^T K x$ and its associated symmetric matrix $K$
are called negative semi-definite if $q(x) \le 0$ for all $x$ and negative definite if $q(x) < 0$
for all $x \neq 0$. A quadratic form is called indefinite if it is neither positive nor negative
semi-definite; equivalently, there exist one or more points $x_+$ where $q(x_+) > 0$ and one or
more points $x_-$ where $q(x_-) < 0$.
Gram Matrices
Symmetric matrices whose entries are given by inner products of elements of an inner
product space play an important role. They are named after the nineteenth century Danish
mathematician Jørgen Gram (not the metric mass unit).
Definition 3.29. Let $V$ be an inner product space, and let $v_1, \ldots, v_n \in V$. The
associated Gram matrix
$$
K = \begin{pmatrix}
\langle v_1 ; v_1 \rangle & \langle v_1 ; v_2 \rangle & \ldots & \langle v_1 ; v_n \rangle \\
\langle v_2 ; v_1 \rangle & \langle v_2 ; v_2 \rangle & \ldots & \langle v_2 ; v_n \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle v_n ; v_1 \rangle & \langle v_n ; v_2 \rangle & \ldots & \langle v_n ; v_n \rangle
\end{pmatrix} \tag{3.47}
$$
is the $n \times n$ matrix whose entries are the inner products between the chosen vector space
elements.
Symmetry of the inner product implies symmetry of the Gram matrix:
$$
k_{ij} = \langle v_i ; v_j \rangle = \langle v_j ; v_i \rangle = k_{ji}, \qquad \text{and hence} \qquad K^T = K. \tag{3.48}
$$
In fact, the most direct method for producing positive definite and semi-definite matrices
is through the Gram matrix construction.
Theorem 3.30. All Gram matrices are positive semi-definite. A Gram matrix is
positive definite if and only if the elements $v_1, \ldots, v_n \in V$ are linearly independent.
Proof: To prove positive (semi-)definiteness of $K$, we need to examine the associated
quadratic form
$$
q(x) = x^T K x = \sum_{i,j=1}^n k_{ij} x_i x_j.
$$
Substituting the values (3.48) for the matrix entries, we find
$$
q(x) = \sum_{i,j=1}^n \langle v_i ; v_j \rangle \, x_i x_j.
$$
Bilinearity of the inner product on $V$ implies that we can assemble this summation into a
single inner product
$$
q(x) = \left\langle \sum_{i=1}^n x_i v_i ; \sum_{j=1}^n x_j v_j \right\rangle = \langle v ; v \rangle = \| v \|^2 \ge 0,
\qquad \text{where} \qquad v = x_1 v_1 + \cdots + x_n v_n
$$
lies in the subspace of $V$ spanned by the given vectors. This immediately proves that $K$
is positive semi-definite.
Moreover, $q(x) = \| v \|^2 > 0$ as long as $v \neq 0$. If $v_1, \ldots, v_n$ are linearly independent,
then $v = 0$ if and only if $x_1 = \cdots = x_n = 0$, and hence, in this case, $q(x)$ and $K$ are
positive definite. Conversely, if the elements are linearly dependent, then there is a nonzero
$x$ for which $v = 0$, so $q(x) = 0$ and $K$ is not positive definite.
Q.E.D.

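Example 3.31. Consider the vectors $v_1 = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}$, $v_2 = \begin{pmatrix} 3 \\ 0 \\ 6 \end{pmatrix}$ in $\mathbb{R}^3$. For the
standard Euclidean dot product, the Gram matrix is
$$
K = \begin{pmatrix} v_1 \cdot v_1 & v_1 \cdot v_2 \\ v_2 \cdot v_1 & v_2 \cdot v_2 \end{pmatrix} = \begin{pmatrix} 6 & -3 \\ -3 & 45 \end{pmatrix}.
$$
Positive definiteness implies that the associated quadratic form
$$
q(x_1, x_2) = 6 x_1^2 - 6 x_1 x_2 + 45 x_2^2 > 0
$$
is positive for all $(x_1, x_2) \neq 0$. This can be checked directly using the criteria in (3.46).
On the other hand, if we use the weighted inner product $\langle x ; y \rangle = 3 x_1 y_1 + 2 x_2 y_2 + 5 x_3 y_3$, then the corresponding Gram matrix is
$$
K = \begin{pmatrix} \langle v_1 ; v_1 \rangle & \langle v_1 ; v_2 \rangle \\ \langle v_2 ; v_1 \rangle & \langle v_2 ; v_2 \rangle \end{pmatrix} = \begin{pmatrix} 16 & -21 \\ -21 & 207 \end{pmatrix},
$$
which, by construction, is also positive definite.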
In the case of the Euclidean dot product, the construction of the Gram matrix $K$
can be directly implemented as follows. Given vectors $v_1, \ldots, v_n \in \mathbb{R}^m$, let us form the
$m \times n$ matrix $A = ( v_1 \; v_2 \; \ldots \; v_n )$ whose columns are the vectors in question. Owing to the
identification (3.2) between the dot product and multiplication of row and column vectors,
the $(i, j)$ entry of $K$ is given as the product
$$
k_{ij} = v_i \cdot v_j = v_i^T v_j
$$
of the $i$th row of the transpose $A^T$ with the $j$th column of $A$. In other words, the Gram
matrix
$$
K = A^T A \tag{3.49}
$$
is the matrix product of the transpose of $A$ with $A$. For the preceding Example 3.31,
$$
A = \begin{pmatrix} 1 & 3 \\ 2 & 0 \\ -1 & 6 \end{pmatrix},
\qquad \text{and so} \qquad
K = A^T A = \begin{pmatrix} 1 & 2 & -1 \\ 3 & 0 & 6 \end{pmatrix} \begin{pmatrix} 1 & 3 \\ 2 & 0 \\ -1 & 6 \end{pmatrix} = \begin{pmatrix} 6 & -3 \\ -3 & 45 \end{pmatrix}.
$$
Theorem 3.30 implies that the Gram matrix (3.49) is positive definite if and only if
the columns of $A$ are linearly independent. This implies the following result.

Proposition 3.32. Given an $m \times n$ matrix $A$, the following are equivalent:

(i) The $n \times n$ Gram matrix $K = A^T A$ is positive definite.
(ii) $A$ has linearly independent columns.
(iii) $\operatorname{rank} A = n \le m$.
(iv) $\ker A = \{ 0 \}$.
As noted above, Gram matrices can be based on more general inner products on
more general vector spaces. Let us consider an alternative inner product on the finite-dimensional vector space $\mathbb{R}^m$. As noted in Theorem 3.23, a general inner product on $\mathbb{R}^m$
has the form
$$
\langle v ; w \rangle = v^T C w \qquad \text{for} \qquad v, w \in \mathbb{R}^m, \tag{3.50}
$$
where $C > 0$ is a symmetric, positive definite $m \times m$ matrix. Therefore, given $n$ vectors
$v_1, \ldots, v_n \in \mathbb{R}^m$, the entries of the corresponding Gram matrix are the products
$$
k_{ij} = \langle v_i ; v_j \rangle = v_i^T C v_j.
$$
If we assemble the column vectors as above into an $m \times n$ matrix $A = ( v_1 \; v_2 \; \ldots \; v_n )$, then
the Gram inner products are given by multiplying the $i$th row of $A^T$ by the $j$th column
of the product matrix $C A$. Therefore, the Gram matrix based on the alternative inner
product (3.50) is given by
$$
K = A^T C A. \tag{3.51}
$$
Theorem 3.30 immediately implies that $K$ is positive definite provided $A$ has rank $n$.
Theorem 3.33. Suppose $A$ is an $m \times n$ matrix with linearly independent columns.
Suppose $C > 0$ is any positive definite $m \times m$ matrix. Then the matrix $K = A^T C A$ is a
positive definite $n \times n$ matrix.
The Gram matrix $K$ constructed in (3.51) arises in a wide range of applications,
including weighted least squares approximation theory, cf. Chapter 4, and the study of equilibrium of mechanical and electrical systems, cf. Chapter 6. Starting in Chapter 11, we
shall look at infinite-dimensional generalizations that apply to differential equations and
boundary value problems.
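Example 3.34. In the majority of applications, $C = \operatorname{diag}(c_1, \ldots, c_m)$ is a diagonal
positive definite matrix, which requires it to have strictly positive diagonal entries $c_i > 0$.
This choice corresponds to a weighted inner product (3.10) on $\mathbb{R}^m$. For example, if we
set $C = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 5 \end{pmatrix}$, then the weighted Gram matrix based on the vectors $\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}$, $\begin{pmatrix} 3 \\ 0 \\ 6 \end{pmatrix}$
of Example 3.31 is
$$
K = A^T C A = \begin{pmatrix} 1 & 2 & -1 \\ 3 & 0 & 6 \end{pmatrix} \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 5 \end{pmatrix} \begin{pmatrix} 1 & 3 \\ 2 & 0 \\ -1 & 6 \end{pmatrix} = \begin{pmatrix} 16 & -21 \\ -21 & 207 \end{pmatrix},
$$
reproducing the second part of Example 3.31.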
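The two Gram matrix formulas (3.49) and (3.51) reduce to matrix multiplication. The sketch below uses plain nested lists; the helpers `transpose` and `matmul` are illustrative names written out here rather than taken from any library.

```python
def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    """Matrix product A B for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 3], [2, 0], [-1, 6]]            # columns are v1 and v2 of Example 3.31
C = [[3, 0, 0], [0, 2, 0], [0, 0, 5]]    # diagonal weight matrix of Example 3.34

print(matmul(transpose(A), A))             # [[6, -3], [-3, 45]]
print(matmul(matmul(transpose(A), C), A))  # [[16, -21], [-21, 207]]
```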

The Gram construction also carries over to inner products on function space. Here is
a particularly important example.
Example 3.35. Consider the vector space $C^0[0, 1]$ consisting of continuous functions
on the interval $0 \le x \le 1$, equipped with the $L^2$ inner product $\langle f ; g \rangle = \int_0^1 f(x)\, g(x) \, dx$.
Let us construct the Gram matrix corresponding to the elementary monomial functions
$1, x, x^2$. We compute the required inner products
$$
\begin{aligned}
\langle 1 ; 1 \rangle &= \| 1 \|^2 = \int_0^1 dx = 1, &
\langle 1 ; x \rangle &= \int_0^1 x \, dx = \frac{1}{2}, \\
\langle x ; x \rangle &= \| x \|^2 = \int_0^1 x^2 \, dx = \frac{1}{3}, &
\langle 1 ; x^2 \rangle &= \int_0^1 x^2 \, dx = \frac{1}{3}, \\
\langle x^2 ; x^2 \rangle &= \| x^2 \|^2 = \int_0^1 x^4 \, dx = \frac{1}{5}, &
\langle x ; x^2 \rangle &= \int_0^1 x^3 \, dx = \frac{1}{4}.
\end{aligned}
$$
Therefore, the Gram matrix is
$$
K = \begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{3} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \end{pmatrix}.
$$
The monomial functions $1, x, x^2$ are linearly independent. Therefore, Theorem 3.30 implies
that this particular matrix is positive definite.
The alert reader may recognize this Gram matrix $K = H_3$ as the $3 \times 3$ Hilbert matrix
that we encountered in (1.67). More generally, the Gram matrix corresponding to the
monomials $1, x, x^2, \ldots, x^n$ has entries
$$
k_{ij} = \langle x^i ; x^j \rangle = \int_0^1 x^{i+j} \, dx = \frac{1}{i + j + 1}, \qquad i, j = 0, \ldots, n.
$$
Therefore, the monomial Gram matrix $K = H_{n+1}$ is the $(n + 1) \times (n + 1)$ Hilbert matrix
(1.67). As a consequence of Theorems 3.26 and 3.30, we have proved the following nontrivial result.
Proposition 3.36. The $n \times n$ Hilbert matrix $H_n$ is positive definite. In particular,
$H_n$ is a nonsingular matrix.
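The monomial Gram matrix can be generated from the closed form $\langle x^i ; x^j \rangle = 1/(i+j+1)$, and its positive definiteness can be spot-checked by evaluating the quadratic form at a few nonzero vectors. This is only a sanity check under exact rational arithmetic, not a proof; the function names and the random test vectors are illustrative assumptions.

```python
from fractions import Fraction
import random


def monomial_gram(n):
    """Gram matrix of 1, x, ..., x^(n-1) for the L2 inner product on [0, 1].

    Uses the closed form <x^i ; x^j> = 1 / (i + j + 1), i.e. the n x n Hilbert matrix.
    """
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]


def quad_form(K, x):
    """Evaluate q(x) = x^T K x."""
    n = len(x)
    return sum(K[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


H3 = monomial_gram(3)
print(H3[0], H3[1], H3[2])

rng = random.Random(0)
for _ in range(5):
    x = [Fraction(rng.randint(-9, 9)) for _ in range(3)]
    if any(x):                        # skip the zero vector
        assert quad_form(H3, x) > 0   # positivity at this sample point
```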
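Example 3.37. Let us construct the Gram matrix corresponding to the functions
$1, \cos x, \sin x$ with respect to the inner product $\langle f ; g \rangle = \int_{-\pi}^{\pi} f(x)\, g(x) \, dx$ on the interval
$[-\pi, \pi]$. We compute the inner products
$$
\begin{aligned}
\langle 1 ; 1 \rangle &= \| 1 \|^2 = \int_{-\pi}^{\pi} dx = 2\pi, &
\langle 1 ; \cos x \rangle &= \int_{-\pi}^{\pi} \cos x \, dx = 0, \\
\langle \cos x ; \cos x \rangle &= \| \cos x \|^2 = \int_{-\pi}^{\pi} \cos^2 x \, dx = \pi, &
\langle 1 ; \sin x \rangle &= \int_{-\pi}^{\pi} \sin x \, dx = 0, \\
\langle \sin x ; \sin x \rangle &= \| \sin x \|^2 = \int_{-\pi}^{\pi} \sin^2 x \, dx = \pi, &
\langle \cos x ; \sin x \rangle &= \int_{-\pi}^{\pi} \cos x \sin x \, dx = 0.
\end{aligned}
$$
Therefore, the Gram matrix is a simple diagonal matrix $K = \begin{pmatrix} 2\pi & 0 & 0 \\ 0 & \pi & 0 \\ 0 & 0 & \pi \end{pmatrix}$. Positive
definiteness of $K$ is immediately evident.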

3.5. Completing the Square.

Gram matrices furnish us with an abundant supply of positive definite matrices. However, we still do not know how to test whether a given symmetric matrix is positive definite.
As we shall soon see, the secret already appears in the particular computations in Examples
3.2 and 3.24.
The student may recall the importance of the method known as "completing the
square", first in the derivation of the quadratic formula for the solution to
$$
q(x) = a x^2 + 2 b x + c = 0, \tag{3.52}
$$
and, later, in the integration of various types of rational functions. The key idea is to
combine the first two terms in (3.52) as a perfect square, and so rewrite the quadratic
function in the form
$$
q(x) = a \left( x + \frac{b}{a} \right)^2 + \frac{a c - b^2}{a} = 0. \tag{3.53}
$$
As a consequence,
$$
\left( x + \frac{b}{a} \right)^2 = \frac{b^2 - a c}{a^2}.
$$
The quadratic formula
$$
x = \frac{-b \pm \sqrt{b^2 - a c}}{a}
$$
follows by taking the square root of both sides and then solving for $x$. The intermediate
step (3.53), where we eliminate the linear term, is known as completing the square.
We can perform the same manipulation on the corresponding homogeneous quadratic
form
$$
q(x_1, x_2) = a x_1^2 + 2 b x_1 x_2 + c x_2^2. \tag{3.54}
$$
We write
$$
q(x_1, x_2) = a x_1^2 + 2 b x_1 x_2 + c x_2^2 = a \left( x_1 + \frac{b}{a} x_2 \right)^2 + \frac{a c - b^2}{a} x_2^2 = a y_1^2 + \frac{a c - b^2}{a} y_2^2 \tag{3.55}
$$
as a sum of squares of the new variables
$$
y_1 = x_1 + \frac{b}{a} x_2, \qquad y_2 = x_2. \tag{3.56}
$$
Since $y_1 = y_2 = 0$ if and only if $x_1 = x_2 = 0$, the final expression is positive definite if and
only if both coefficients are positive:
$$
a > 0, \qquad \frac{a c - b^2}{a} > 0.
$$

This proves that conditions (3.46) are necessary and sufficient for the quadratic form (3.45)
to be positive definite.
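How this simple idea can be generalized to the multi-variable case will become clear if we write the quadratic form identity (3.55) in matrix form. The original quadratic form (3.54) is
$$ q(x) = x^T K x, \qquad \text{where} \qquad K = \begin{pmatrix} a & b \\ b & c \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \tag{3.57} $$
The second quadratic form in (3.55) is
$$ \hat q(y) = y^T D y, \qquad \text{where} \qquad D = \begin{pmatrix} a & 0 \\ 0 & \frac{a c - b^2}{a} \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}. \tag{3.58} $$
Anticipating the final result, the equation connecting $x$ and $y$ can be written in matrix form as
$$ y = L^T x \qquad \text{or} \qquad \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1 + \frac{b}{a} x_2 \\ x_2 \end{pmatrix}, \qquad \text{where} \qquad L^T = \begin{pmatrix} 1 & \frac{b}{a} \\ 0 & 1 \end{pmatrix}. $$
Substituting into (3.58), we find
$$ y^T D y = (L^T x)^T D (L^T x) = x^T L D L^T x = x^T K x, \qquad \text{where} \qquad K = L D L^T \tag{3.59} $$
is precisely the $L D L^T$ factorization of $K = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ that appears in (1.56). We are thus led to the important conclusion that completing the square is the same as the $L D L^T$ factorization of a symmetric matrix, obtained through Gaussian elimination!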
Recall the definition of a regular matrix as one that can be reduced to upper triangular form without any row interchanges; Theorem 1.32 says that these are the matrices admitting an $L D L^T$ factorization. The identity (3.59) is therefore valid for all regular $n \times n$ symmetric matrices, and shows how to write the associated quadratic form as a sum of squares:
$$ \hat q(y) = y^T D y = d_1 y_1^2 + \cdots + d_n y_n^2. \tag{3.60} $$
The coefficients $d_i$ are the pivots of $K$. In particular, according to Exercise , $\hat q(y)$ is positive definite if and only if all the pivots are positive: $d_i > 0$. Let us now state the main result that completely characterizes positive definite matrices.
Theorem 3.38. A symmetric matrix $K$ is positive definite if and only if it is regular and has all positive pivots. Consequently, $K$ is positive definite if and only if it can be factored $K = L D L^T$, where $L$ is special lower triangular, and $D$ is diagonal with all positive diagonal entries.

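Example 3.39. Consider the symmetric matrix $K = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 6 & 0 \\ -1 & 0 & 9 \end{pmatrix}$. Gaussian elimination produces the factors
$$ L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & 1 & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 6 \end{pmatrix}, \qquad L^T = \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} $$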

in the factorization $K = L D L^T$. Since the pivots (the diagonal entries 1, 2, 6 in $D$) are all positive, Theorem 3.38 implies that $K$ is positive definite, which means that the associated quadratic form satisfies
$$ q(x) = x_1^2 + 4 x_1 x_2 - 2 x_1 x_3 + 6 x_2^2 + 9 x_3^2 > 0 \qquad \text{for all} \qquad x = (x_1, x_2, x_3)^T \ne 0. $$
Indeed, the $L D L^T$ factorization implies that $q(x)$ can be explicitly written as a sum of squares:
$$ q(x) = y_1^2 + 2 y_2^2 + 6 y_3^2, \qquad \text{where} \qquad y_1 = x_1 + 2 x_2 - x_3, \quad y_2 = x_2 + x_3, \quad y_3 = x_3, $$
are the entries of $y = L^T x$. Positivity of the coefficients of the $y_i^2$ (which are the pivots) implies that $q(x)$ is positive definite.
On the other hand, for the $L D L^T$ factorization
$$ K = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 6 & 2 \\ 3 & 2 & 8 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & -2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -9 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix}, $$
the fact that $D$ has a negative diagonal entry, $-9$, implies that $K$ is not positive definite, even though all its entries are positive. The associated quadratic form
$$ q(x) = x_1^2 + 4 x_1 x_2 + 6 x_1 x_3 + 6 x_2^2 + 4 x_2 x_3 + 8 x_3^2 $$
is not positive definite since, for instance, $q(-7, 2, 1) = -9 < 0$. (This point corresponds to $y = L^T x = (0, 0, 1)^T$, which isolates the negative pivot.)
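The pivot test of Theorem 3.38 can be sketched in a few lines. The code below is an illustrative sketch only: the function names are hypothetical, and it performs plain Gaussian elimination without row interchanges in exact rational arithmetic, treating a zero pivot as a sign that the matrix is not regular.

```python
from fractions import Fraction


def ldlt(K):
    """Return (L, d) with K = L diag(d) L^T for a regular symmetric matrix K.

    Gaussian elimination without row interchanges; raises ValueError when a
    zero pivot shows that K is not regular.
    """
    n = len(K)
    A = [[Fraction(v) for v in row] for row in K]   # working copy of K
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    d = []
    for j in range(n):
        pivot = A[j][j]
        if pivot == 0:
            raise ValueError("zero pivot: the matrix is not regular")
        d.append(pivot)
        for i in range(j + 1, n):
            L[i][j] = A[i][j] / pivot               # multiplier l_ij
            for k in range(j, n):
                A[i][k] -= L[i][j] * A[j][k]        # row_i -= l_ij * row_j
    return L, d


def is_positive_definite(K):
    """Theorem 3.38: K is positive definite iff it is regular with positive pivots."""
    try:
        _, d = ldlt(K)
    except ValueError:
        return False
    return all(p > 0 for p in d)


def quadratic_form(K, x):
    """Evaluate q(x) = x^T K x."""
    n = len(x)
    return sum(K[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


K1 = [[1, 2, -1], [2, 6, 0], [-1, 0, 9]]   # first matrix of Example 3.39
K2 = [[1, 2, 3], [2, 6, 2], [3, 2, 8]]     # second matrix of Example 3.39

print(ldlt(K1)[1], is_positive_definite(K1))   # pivots 1, 2, 6
print(ldlt(K2)[1], is_positive_definite(K2))   # pivots 1, 2, -9
print(quadratic_form(K2, [-7, 2, 1]))          # a point where q is negative
```

Exact fractions avoid any doubt about whether a tiny pivot is really zero; a floating point version would need a tolerance instead.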
The only remaining issue is to show that an irregular matrix cannot be positive definite. For example, the quadratic form corresponding to the irregular matrix $K = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ is $q(x) = 2 x_1 x_2$, which is clearly not positive definite, e.g., $q(1, -1) = -2$. In general, if the upper left entry $k_{11} = 0$, then it cannot serve as the first pivot, and so $K$ is not regular. But then $q(e_1) = e_1^T K e_1 = 0$, and so $K$ is not positive definite. (It may be positive semi-definite, or, more likely, indefinite.)
Otherwise, if $k_{11} \ne 0$, then we use Gaussian elimination to make all entries lying in the first column below the pivot equal to zero. As remarked above, this is equivalent to completing the square in the initial terms of the associated quadratic form
$$ \begin{aligned} q(x) &= k_{11} x_1^2 + 2 k_{12} x_1 x_2 + \cdots + 2 k_{1n} x_1 x_n + k_{22} x_2^2 + \cdots + k_{nn} x_n^2 \\ &= k_{11} \left( x_1 + \frac{k_{12}}{k_{11}} x_2 + \cdots + \frac{k_{1n}}{k_{11}} x_n \right)^2 + \tilde q(x_2, \ldots, x_n) \\ &= k_{11} \left( x_1 + l_{21} x_2 + \cdots + l_{n1} x_n \right)^2 + \tilde q(x_2, \ldots, x_n), \end{aligned} \tag{3.61} $$
where
$$ l_{21} = \frac{k_{21}}{k_{11}} = \frac{k_{12}}{k_{11}}, \qquad \ldots \qquad l_{n1} = \frac{k_{n1}}{k_{11}} = \frac{k_{1n}}{k_{11}}, $$
are precisely the multiples appearing in the first column of the lower triangular matrix $L$ obtained from Gaussian Elimination, while
$$ \tilde q(x_2, \ldots, x_n) = \sum_{i,j = 2}^{n} \tilde k_{ij} x_i x_j $$
is a quadratic form involving one fewer variable. The entries of its symmetric coefficient matrix $\tilde K$ are
$$ \tilde k_{ij} = \tilde k_{ji} = k_{ij} - l_{j1} k_{1i}, \qquad \text{for} \qquad i \ge j. $$
Thus, the entries of $\tilde K$ that lie on or below the diagonal are exactly the same as the entries appearing on or below the diagonal of $K$ after the first phase of the elimination process. In particular, the second pivot of $K$ is the entry $\tilde k_{22}$ that appears in the corresponding slot in $\tilde K$. If $\tilde q$ is not positive definite, then $q$ cannot be positive definite. Indeed, suppose that there exist $x_2^\star, \ldots, x_n^\star$, not all zero, such that $\tilde q(x_2^\star, \ldots, x_n^\star) \le 0$. Setting
$$ x_1^\star = - l_{21} x_2^\star - \cdots - l_{n1} x_n^\star $$
makes the initial square term in (3.61) equal to 0, so $q(x_1^\star, x_2^\star, \ldots, x_n^\star) = \tilde q(x_2^\star, \ldots, x_n^\star) \le 0$. In particular, if the second diagonal entry $\tilde k_{22} = 0$, then $\tilde q$ is not positive definite, and so neither is $q$. Continuing this process, if any diagonal entry of the reduced matrix vanishes, then the reduced quadratic form cannot be positive definite, and so neither can $q$. This demonstrates that if $K$ is irregular, then it cannot be positive definite, which completes the proof of Theorem 3.38.
The Cholesky Factorization
The identity (3.59) shows us how to write any regular quadratic form $q(x)$ as a sum of squares. One can push this result slightly further in the positive definite case. Since each pivot $d_i > 0$, we can write the diagonal form (3.60) as a sum of squares with unit coefficients:
$$ \hat q(y) = d_1 y_1^2 + \cdots + d_n y_n^2 = \left( \sqrt{d_1}\, y_1 \right)^2 + \cdots + \left( \sqrt{d_n}\, y_n \right)^2 = z_1^2 + \cdots + z_n^2, $$
where $z_i = \sqrt{d_i}\, y_i$. In matrix form, we are writing
$$ \hat q(y) = y^T D y = z^T z = \| z \|^2, \qquad \text{where} \qquad z = C y, \quad \text{with} \quad C = \mathrm{diag}\left( \sqrt{d_1}, \ldots, \sqrt{d_n} \right). $$
Since $D = C^2$, the matrix $C$ can be thought of as a "square root" of the diagonal matrix $D$. Substituting back into (1.52), we deduce the Cholesky factorization
$$ K = L D L^T = L C C^T L^T = M M^T, \qquad \text{where} \qquad M = L C \tag{3.62} $$
of a positive definite matrix. Note that $M$ is a lower triangular matrix with all positive entries, namely the square roots of the pivots $m_{ii} = c_i = \sqrt{d_i}$, on its diagonal. Applying the Cholesky factorization to the corresponding quadratic form produces
$$ q(x) = x^T K x = x^T M M^T x = z^T z = \| z \|^2, \qquad \text{where} \qquad z = M^T x. \tag{3.63} $$
One can interpret this as a change of variables from $x$ to $z$ that converts an arbitrary inner product norm, as defined by the square root of the positive definite quadratic form $q(x)$, into the standard Euclidean norm $\| z \|$.

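Example 3.40. For the matrix $K = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 6 & 0 \\ -1 & 0 & 9 \end{pmatrix}$ considered in Example 3.39, the Cholesky formula (3.62) gives $K = M M^T$, where
$$ M = L C = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & \sqrt{6} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & \sqrt{2} & 0 \\ -1 & \sqrt{2} & \sqrt{6} \end{pmatrix}. $$
The associated quadratic function can then be written as a sum of pure squares:
$$ q(x) = x_1^2 + 4 x_1 x_2 - 2 x_1 x_3 + 6 x_2^2 + 9 x_3^2 = z_1^2 + z_2^2 + z_3^2, $$
where $z = M^T x$, or, explicitly, $z_1 = x_1 + 2 x_2 - x_3$, $z_2 = \sqrt{2}\, x_2 + \sqrt{2}\, x_3$, $z_3 = \sqrt{6}\, x_3$.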
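For readers who want to experiment, here is a hedged sketch of a Cholesky routine. Rather than forming $L$ and $C$ separately as in (3.62), it computes the entries of $M$ directly, column by column, which is the usual textbook variant and yields the same matrix $M = L C$. The function name is an assumption of this sketch, and floating point square roots are used, so results agree only up to rounding.

```python
import math


def cholesky(K):
    """Return lower triangular M with K = M M^T for a symmetric positive definite K.

    Raises ValueError if a nonpositive value appears under a square root,
    which happens exactly when K fails to be positive definite.
    """
    n = len(K)
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: what remains of k_jj after removing earlier columns.
        s = K[j][j] - sum(M[j][k] ** 2 for k in range(j))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        M[j][j] = math.sqrt(s)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            t = K[i][j] - sum(M[i][k] * M[j][k] for k in range(j))
            M[i][j] = t / M[j][j]
    return M


K = [[1, 2, -1], [2, 6, 0], [-1, 0, 9]]   # matrix of Example 3.40
M = cholesky(K)
for row in M:
    print(row)   # rows of M; compare with the matrix M = L C above
```

Applying the same routine to the second matrix of Example 3.39 raises the error at the third column, where the pivot is negative.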

3.6. Complex Vector Spaces.

Although physical applications ultimately require real answers, complex numbers and
complex vector spaces assume an extremely useful, if not essential role in the intervening
analysis. Particularly in the description of periodic phenomena, complex numbers and
complex exponentials assume a central role, dramatically simplifying complicated trigonometric formulae. Complex variable methods are essential in fluid mechanics, electrical
engineering, Fourier analysis, potential theory, electromagnetism, and so on. In quantum
mechanics, complex numbers are ubiquitous. The basic physical quantities are complex
wave functions. Moreover, the Schrödinger equation, which is the basic equation governing
quantum systems, is a complex partial differential equation with complex-valued solutions.
In this section, we survey the basic facts about complex numbers and complex vector
spaces. Most of the constructions are entirely analogous to the real case, and will not be
dwelled on at length. The one exception is the complex version of an inner product, which
does introduce some novelties not found in its simpler real counterpart. Complex analysis
(integration and differentiation of complex functions) and its applications to fluid flows,
potential theory, waves and other areas of mathematics, physics and engineering, will be
the subject of Chapter 16.
Complex Numbers
Recall that a complex number is an expression of the form $z = x + i y$, where $x, y$ are real and $i = \sqrt{-1}$.† We call $x = \mathrm{Re}\, z$ the real part of $z$ and $y = \mathrm{Im}\, z$ the imaginary part. (Note: The imaginary part is the real number $y$, not $i y$.) A real number $x$ is merely a complex number with zero imaginary part: $\mathrm{Im}\, z = 0$. Complex addition and multiplication are based on simple adaptations of the rules of real arithmetic to include the identity $i^2 = -1$, and so
$$ \begin{aligned} (x + i y) + (u + i v) &= (x + u) + i (y + v), \\ (x + i y) (u + i v) &= (x u - y v) + i (x v + y u). \end{aligned} \tag{3.64} $$

† Electrical engineers prefer to use j to indicate the imaginary unit.


Complex numbers enjoy all the usual laws of real addition and multiplication, including commutativity: $z w = w z$.
We can identify a complex number $x + i y$ with a vector $(x, y)^T \in \mathbb{R}^2$ in the real plane. Complex addition (3.64) corresponds to vector addition, but complex multiplication does not have a readily identifiable vector counterpart.
Another important operation on complex numbers is that of complex conjugation.
Definition 3.41. The complex conjugate of $z = x + i y$ is $\bar z = x - i y$, whereby $\mathrm{Re}\, \bar z = \mathrm{Re}\, z$, $\mathrm{Im}\, \bar z = - \mathrm{Im}\, z$.
Geometrically, the operation of complex conjugation coincides with reflection of the corresponding vector through the real axis, as illustrated in Figure 3.6. In particular $\bar z = z$ if and only if $z$ is real. Note that
$$ \mathrm{Re}\, z = \frac{z + \bar z}{2}, \qquad \mathrm{Im}\, z = \frac{z - \bar z}{2 i}. \tag{3.65} $$
Complex conjugation is compatible with complex arithmetic:
$$ \overline{z + w} = \bar z + \bar w, \qquad \overline{z w} = \bar z\, \bar w. $$
In particular, the product of a complex number and its conjugate
$$ z \bar z = (x + i y)(x - i y) = x^2 + y^2 \tag{3.66} $$
is real and non-negative. Its square root is known as the modulus of the complex number $z = x + i y$, and written
$$ | z | = \sqrt{x^2 + y^2}. \tag{3.67} $$
Note that $| z | \ge 0$, with $| z | = 0$ if and only if $z = 0$. The modulus $| z |$ generalizes the absolute value of a real number, and coincides with the standard Euclidean norm in the $(x, y)$-plane. This implies the validity of the triangle inequality
$$ | z + w | \le | z | + | w |. \tag{3.68} $$
Equation (3.66) can be rewritten in terms of the modulus as
$$ z \bar z = | z |^2. \tag{3.69} $$
Rearranging the factors, we deduce the formula for the reciprocal of a nonzero complex number:
$$ \frac{1}{z} = \frac{\bar z}{| z |^2}, \quad z \ne 0, \qquad \text{or, equivalently} \qquad \frac{1}{x + i y} = \frac{x - i y}{x^2 + y^2}. \tag{3.70} $$
The general formula for complex division
$$ \frac{w}{z} = \frac{w \bar z}{| z |^2} \qquad \text{or, equivalently} \qquad \frac{u + i v}{x + i y} = \frac{(x u + y v) + i (x v - y u)}{x^2 + y^2}, \tag{3.71} $$
is an immediate consequence.

Figure 3.6. Complex Numbers.
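The rules (3.66) to (3.71) can be tried out with Python's built-in complex type, which, following the engineers' convention, writes the imaginary unit as j. This is a small sketch; the helper name divide and the sample values are assumptions chosen for illustration.

```python
def divide(w, z):
    """Compute w / z as w * conj(z) / |z|^2, following (3.71)."""
    if z == 0:
        raise ZeroDivisionError("division by the complex number 0")
    return w * z.conjugate() / abs(z) ** 2


z = 3 + 4j
w = 1 - 2j

print(z.conjugate())         # complex conjugate of z
print(abs(z))                # modulus |z| = sqrt(3^2 + 4^2)
print(z * z.conjugate())     # equals |z|^2, with zero imaginary part
print(divide(w, z), w / z)   # formula (3.71) next to built-in division
```

The built-in operator `/` already implements complex division; the explicit helper is there only to mirror the formula in the text.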
The modulus of a complex number,
$$ r = | z | = \sqrt{x^2 + y^2}, $$
is one component of its polar coordinate representation
$$ x = r \cos \theta, \quad y = r \sin \theta \qquad \text{or} \qquad z = r (\cos \theta + i \sin \theta). \tag{3.72} $$
The polar angle, which measures the angle that the line connecting $z$ to the origin makes with the horizontal axis, is known as the phase, and written
$$ \theta = \mathrm{ph}\, z. \tag{3.73} $$
The more common term is the argument, and written $\arg z = \mathrm{ph}\, z$. For various reasons, and to avoid confusion with the argument of a function, we have chosen to use "phase" throughout this text. As such, the phase is only defined up to an integer multiple of $2 \pi$.
We note that the modulus and phase of a product of complex numbers can be readily computed:
$$ | z w | = | z | \, | w |, \qquad \mathrm{ph}\, (z w) = \mathrm{ph}\, z + \mathrm{ph}\, w. \tag{3.74} $$
On the other hand, complex conjugation preserves the modulus, but negates the phase:
$$ | \bar z | = | z |, \qquad \mathrm{ph}\, \bar z = - \mathrm{ph}\, z. \tag{3.75} $$
One of the most important formulas in all of mathematics is Euler's formula
$$ e^{i \theta} = \cos \theta + i \sin \theta, \tag{3.76} $$
relating the complex exponential with the real sine and cosine functions. This basic identity has a variety of mathematical justifications; see Exercise for one that is based on comparing power series. Euler's formula (3.76) can be used to compactly rewrite the polar form (3.72) of a complex number as
$$ z = r e^{i \theta} \qquad \text{where} \qquad r = | z |, \quad \theta = \mathrm{ph}\, z. \tag{3.77} $$

Figure 3.7. Real and Imaginary Parts of $e^z$.

The complex conjugate identity
$$ e^{- i \theta} = \cos(- \theta) + i \sin(- \theta) = \cos \theta - i \sin \theta = \overline{e^{i \theta}}, $$
permits us to express the basic trigonometric functions in terms of complex exponentials:
$$ \cos \theta = \frac{e^{i \theta} + e^{- i \theta}}{2}, \qquad \sin \theta = \frac{e^{i \theta} - e^{- i \theta}}{2 i}. \tag{3.78} $$
These formulae are very useful when working with trigonometric identities and integrals.
The exponential of a general complex number is easily derived from the basic Euler formula and the standard properties of the exponential function, which carry over unaltered to the complex domain; thus,
$$ e^z = e^{x + i y} = e^x e^{i y} = e^x \cos y + i e^x \sin y. \tag{3.79} $$
Graphs of the real and imaginary parts of the complex exponential appear in Figure 3.7. Note that $e^{2 \pi i} = 1$, and hence the exponential function is periodic
$$ e^{z + 2 \pi i} = e^z \tag{3.80} $$
with imaginary period $2 \pi i$, a reflection of the periodicity of the trigonometric functions in Euler's formula.
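Euler's formula, the polar form (3.77), and the periodicity (3.80) can all be checked numerically with the standard cmath module. The sketch below is illustrative only: the helper names and the sample point are assumptions, and the comparisons hold up to floating point rounding.

```python
import cmath
import math


def euler_gap(theta):
    """Return |e^{i theta} - (cos theta + i sin theta)|, which should be about 0."""
    return abs(cmath.exp(1j * theta) - complex(math.cos(theta), math.sin(theta)))


def polar_form(z):
    """Return (r, theta) with z = r e^{i theta}; theta is the phase in (-pi, pi]."""
    return abs(z), cmath.phase(z)


z = -1 + 1j                       # sample point chosen for illustration
r, theta = polar_form(z)
rebuilt = r * cmath.exp(1j * theta)

print(euler_gap(0.75))                                  # rounding-level error
print(r, theta)                                         # sqrt(2) and 3*pi/4, up to rounding
print(abs(rebuilt - z))                                 # about 0
print(abs(cmath.exp(z + 2j * math.pi) - cmath.exp(z)))  # about 0: period 2*pi*i
```

Note that cmath.phase picks one representative of the phase, the one in $(-\pi, \pi]$; any other value differing by a multiple of $2\pi$ describes the same complex number.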
Complex Vector Spaces and Inner Products
A complex vector space is defined in exactly the same manner as its real cousin, cf. Definition 2.1, the only difference being that we replace real scalars $\mathbb{R}$ by complex scalars $\mathbb{C}$. The most basic example is the $n$-dimensional complex vector space $\mathbb{C}^n$ consisting of all column vectors $z = (z_1, z_2, \ldots, z_n)^T$ that have $n$ complex entries: $z_1, \ldots, z_n \in \mathbb{C}$. Verification of each of the vector space axioms is a straightforward exercise.
We can write any complex vector $z = x + i y \in \mathbb{C}^n$ as a linear combination of two real vectors $x, y \in \mathbb{R}^n$. Its complex conjugate $\bar z = x - i y$ is obtained by taking the complex conjugates of its individual entries. Thus, for example, if
$$ z = \begin{pmatrix} 1 + 2 i \\ 3 \\ 5 i \end{pmatrix}, \qquad \text{then} \qquad \bar z = \begin{pmatrix} 1 - 2 i \\ 3 \\ - 5 i \end{pmatrix}. $$
In particular, $z \in \mathbb{R}^n \subset \mathbb{C}^n$ is a real vector if and only if $z = \bar z$.

Most of the vector space concepts we developed in the real domain, including span, linear independence, basis, and dimension, can be straightforwardly extended to the complex regime. The one exception is the concept of an inner product, which requires a little thought. In analysis, the most important applications of inner products and norms are based on the associated inequalities: Cauchy-Schwarz and triangle. But there is no natural ordering of the complex numbers, and so one cannot make any sense of a complex inequality like $z < w$. Inequalities only make sense in the real domain, and so the norm of a complex vector should still be a positive, real number.
With this in mind, the naïve idea of simply summing the squares of the entries of a complex vector will not define a norm on $\mathbb{C}^n$, since the result will typically be complex. Moreover, this would give some nonzero complex vectors, e.g., $(1, i)^T$, a zero "norm", violating positivity.†
The correct definition is modeled on the definition of the modulus
$$ | z | = \sqrt{z \bar z} $$
of a complex scalar $z \in \mathbb{C}$. If, in analogy with the real definition (3.7), the quantity inside the square root is to represent the inner product of $z$ with itself, then we should define the dot product between two complex numbers to be
$$ z \cdot w = z \bar w, \qquad \text{so that} \qquad z \cdot z = z \bar z = | z |^2. $$
If $z = x + i y$ and $w = u + i v$, then
$$ z \cdot w = z \bar w = (x + i y)(u - i v) = (x u + y v) + i (y u - x v). \tag{3.81} $$
Thus, the dot product of two complex numbers is, in general, complex. The real part of $z \cdot w$ is, in fact, the Euclidean dot product between the corresponding vectors in $\mathbb{R}^2$, while the imaginary part is, interestingly, their scalar cross-product, cf. (cross2 ).
The vector version of this construction is named after the nineteenth century French mathematician Charles Hermite, and called the Hermitian dot product on $\mathbb{C}^n$. It has the explicit formula
$$ z \cdot w = z^T \bar w = z_1 \bar w_1 + z_2 \bar w_2 + \cdots + z_n \bar w_n, \qquad \text{for} \qquad z = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix}, \quad w = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}. \tag{3.82} $$

† On the other hand, in relativity, the Minkowski "norm" is also not always positive, and indeed the vectors with zero norm play a critical role as they lie on the light cone emanating from the origin, [106].


Pay attention to the fact that we must apply complex conjugation to all the entries of the second vector. For example, if $z = \begin{pmatrix} 1 + i \\ 3 + 2 i \end{pmatrix}$, $w = \begin{pmatrix} 1 + 2 i \\ i \end{pmatrix}$, then
$$ z \cdot w = (1 + i)(1 - 2 i) + (3 + 2 i)(- i) = 5 - 4 i. $$
On the other hand,
$$ w \cdot z = (1 + 2 i)(1 - i) + i (3 - 2 i) = 5 + 4 i. $$
Therefore, the Hermitian dot product is not symmetric. Reversing the order of the vectors results in complex conjugation of the dot product:
$$ w \cdot z = \overline{z \cdot w}. $$
But this extra complication does have the effect that the induced norm, namely
$$ 0 \le \| z \| = \sqrt{z \cdot z} = \sqrt{z^T \bar z} = \sqrt{| z_1 |^2 + \cdots + | z_n |^2}, \tag{3.83} $$
is strictly positive for all $0 \ne z \in \mathbb{C}^n$. For example, if
$$ z = \begin{pmatrix} 1 + 3 i \\ - 2 i \\ - 5 \end{pmatrix}, \qquad \text{then} \qquad \| z \| = \sqrt{| 1 + 3 i |^2 + | - 2 i |^2 + | - 5 |^2} = \sqrt{39}. $$
The Hermitian dot product is well behaved under complex vector addition:
$$ (z + \hat z) \cdot w = z \cdot w + \hat z \cdot w, \qquad z \cdot (w + \hat w) = z \cdot w + z \cdot \hat w. $$
However, while complex scalar multiples can be extracted from the first vector without alteration, when they multiply the second vector, they emerge as complex conjugates:
$$ (c z) \cdot w = c (z \cdot w), \qquad z \cdot (c w) = \bar c (z \cdot w), \qquad c \in \mathbb{C}. $$
Thus, the Hermitian dot product is not bilinear in the strict sense, but satisfies something that, for lack of a better name, is known as sesqui-linearity.
The general definition of an inner product on a complex vector space is modeled on the preceding properties of the Hermitian dot product.
Definition 3.42. An inner product on the complex vector space $V$ is a pairing that takes two vectors $v, w \in V$ and produces a complex number $\langle v ; w \rangle \in \mathbb{C}$, subject to the following requirements for all $u, v, w \in V$, and $c, d \in \mathbb{C}$.
(i) Sesqui-linearity:
$$ \begin{aligned} \langle c u + d v ; w \rangle &= c \langle u ; w \rangle + d \langle v ; w \rangle, \\ \langle u ; c v + d w \rangle &= \bar c \langle u ; v \rangle + \bar d \langle u ; w \rangle. \end{aligned} \tag{3.84} $$
(ii) Conjugate Symmetry:
$$ \langle v ; w \rangle = \overline{\langle w ; v \rangle}. \tag{3.85} $$
(iii) Positivity:
$$ \| v \|^2 = \langle v ; v \rangle \ge 0, \qquad \text{and} \quad \langle v ; v \rangle = 0 \quad \text{if and only if} \quad v = 0. \tag{3.86} $$

Thus, when dealing with a complex inner product space, one must pay careful attention to the complex conjugate that appears when the second argument in the inner product is multiplied by a complex scalar, as well as the complex conjugate that appears when switching the order of the two arguments.
Theorem 3.43. The Cauchy-Schwarz inequality,
$$ | \langle v ; w \rangle | \le \| v \| \, \| w \|, \qquad v, w \in V, $$
with $| \cdot |$ now denoting the complex modulus, and the triangle inequality
$$ \| v + w \| \le \| v \| + \| w \| $$
hold for any complex inner product space.
The proof of this result is practically the same as in the real case, and the details are left to the reader.
Example 3.44. The vectors $v = (1 + i, 2 i, -3)^T$, $w = (2 - i, 1, 2 + 2 i)^T$, satisfy
$$ \begin{aligned} \| v \| &= \sqrt{2 + 4 + 9} = \sqrt{15}, \qquad \| w \| = \sqrt{5 + 1 + 8} = \sqrt{14}, \\ v \cdot w &= (1 + i)(2 + i) + 2 i + (-3)(2 - 2 i) = -5 + 11 i. \end{aligned} $$
Thus, the Cauchy-Schwarz inequality reads
$$ | \langle v ; w \rangle | = | -5 + 11 i | = \sqrt{146} \le \sqrt{210} = \sqrt{15}\, \sqrt{14} = \| v \| \, \| w \|. $$
Similarly, the triangle inequality tells us that
$$ \| v + w \| = \| (3, 1 + 2 i, -1 + 2 i)^T \| = \sqrt{9 + 5 + 5} = \sqrt{19} \le \sqrt{15} + \sqrt{14} = \| v \| + \| w \|. $$
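The computations of Example 3.44 can be reproduced with a few lines of code. This is a sketch under the convention of (3.82), in which the second vector is conjugated; the function names hdot and norm are assumptions of the sketch, not notation from the text.

```python
import math


def hdot(z, w):
    """Hermitian dot product: sum of z_k * conj(w_k), conjugating the second vector."""
    if len(z) != len(w):
        raise ValueError("vectors must have the same length")
    return sum(zk * wk.conjugate() for zk, wk in zip(z, w))


def norm(z):
    """Induced norm sqrt(z . z); the dot product of z with itself is real."""
    return math.sqrt(hdot(z, z).real)


v = [1 + 1j, 2j, -3 + 0j]
w = [2 - 1j, 1 + 0j, 2 + 2j]
s = [vk + wk for vk, wk in zip(v, w)]          # the sum v + w

print(hdot(v, w), hdot(w, v))                  # conjugates of one another
print(abs(hdot(v, w)) <= norm(v) * norm(w))    # Cauchy-Schwarz
print(norm(s) <= norm(v) + norm(w))            # triangle inequality
```

Swapping which argument is conjugated gives the other common convention; the two differ by a complex conjugate, so it is worth checking which one a given library uses.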

Example 3.45. Let $C^0 = C^0[-\pi, \pi]$ denote the complex vector space consisting of all complex valued continuous functions $f(x) = u(x) + i v(x)$ depending upon the real variable $-\pi \le x \le \pi$. The Hermitian $L^2$ inner product is defined as
$$ \langle f ; g \rangle = \int_{-\pi}^{\pi} f(x)\, \overline{g(x)}\, dx, \tag{3.87} $$
with corresponding norm
$$ \| f \| = \sqrt{\int_{-\pi}^{\pi} | f(x) |^2 \, dx} = \sqrt{\int_{-\pi}^{\pi} \left[ u(x)^2 + v(x)^2 \right] dx}. \tag{3.88} $$
The reader should check that (3.87) satisfies the basic Hermitian inner product axioms.
For example, if $k, l$ are integers, then the inner product of the complex exponential functions $e^{i k x}$ and $e^{i l x}$ is
$$ \langle e^{i k x} ; e^{i l x} \rangle = \int_{-\pi}^{\pi} e^{i k x}\, \overline{e^{i l x}}\, dx = \int_{-\pi}^{\pi} e^{i (k - l) x} \, dx = \begin{cases} 2 \pi, & k = l, \\ \left. \dfrac{e^{i (k - l) x}}{i (k - l)} \right|_{x = -\pi}^{\pi} = 0, & k \ne l. \end{cases} $$
We conclude that when $k \ne l$, the complex exponentials $e^{i k x}$ and $e^{i l x}$ are orthogonal, since their inner product is zero. This key example will be of fundamental significance in the complex version of Fourier analysis.
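The orthogonality of the complex exponentials can also be observed numerically. The following sketch approximates the integral (3.87) on $[-\pi, \pi]$ by the midpoint rule; the function names, the choice of quadrature rule, and the number of sample points are all assumptions of the sketch rather than anything prescribed by the text.

```python
import cmath
import math


def l2_inner(f, g, n=2000):
    """Approximate the Hermitian L^2 inner product of f and g on [-pi, pi].

    Uses the midpoint rule with n equal subintervals; f and g must return
    complex values.
    """
    h = 2 * math.pi / n
    total = 0j
    for m in range(n):
        x = -math.pi + (m + 0.5) * h
        total += f(x) * g(x).conjugate()
    return total * h


def exp_mode(k):
    """Return the function x -> e^{i k x}."""
    return lambda x: cmath.exp(1j * k * x)


print(l2_inner(exp_mode(3), exp_mode(3)))   # close to 2*pi
print(l2_inner(exp_mode(3), exp_mode(5)))   # close to 0
```

For these particular integrands the midpoint rule is accurate essentially to rounding error; for general continuous functions the approximation improves as n grows.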

Chapter 4
Minimization and Least Squares Approximation
Because Nature strives to be efficient, many systems arising in applications are founded
on a minimization principle. For example, in a mechanical system, the stable equilibrium
positions minimize the potential energy. The basic geometrical problem of minimizing
distance also appears in many contexts. For example, in optics and relativity, light rays
follow the paths of minimal distance (the geodesics) on the curved space-time. In data
analysis, the most fundamental method for fitting a function to a set of sampled data
points is to minimize the least squares error, which serves as a measurement of the overall
deviation between the sample data and the function. The least squares paradigm carries
over to a wide range of applied mathematical systems. In particular, it underlies the
theory of Fourier series, in itself of inestimable importance in mathematics, physics and
engineering. Solutions to many of the important boundary value problems arising in
mathematical physics and engineering are also characterized by an underlying minimization
principle. Moreover, the finite element numerical solution method relies on the associated
minimization principle. Optimization is ubiquitous in control theory, engineering design
and manufacturing, linear programming, econometrics, and most other fields of analysis.
This chapter introduces and solves the most basic minimization problem: that of a
quadratic function of several variables. The minimizer is found by solving an associated
linear system. The solution to the quadratic minimization problem leads directly to a
broad range of applications, including least squares fitting of data, interpolation, and
approximation of functions. Applications to equilibrium mechanics will form the focus of
Chapter 6. Applications to the numerical solution of differential equations in numerical
analysis will appear starting in Chapter 11. More general nonlinear minimization problems,
which, as usual, require a thorough analysis of the linear situation, will be deferred until
Section 19.3.

4.1. Minimization Problems.

Let us begin by introducing three important minimization problems: one physical,
one analytical, and one geometrical.
Equilibrium Mechanics
A fundamental principle of mechanics is that systems in equilibrium minimize potential energy. For example, a ball in a bowl will roll downhill until it reaches the bottom,
where it minimizes its potential energy due to gravity. Similarly, a pendulum will swing
back and forth unless it is at the bottom of its arc, where potential energy is minimized.
Actually, the pendulum has a second equilibrium position at the top of the arc, but this

is an unstable equilibrium, meaning that any tiny movement will knock it off balance.
Therefore, a better way of stating the principle is that stable equilibria are where the mechanical system minimizes potential energy. For the ball rolling on a curved surface, the
local minima (the bottoms of valleys) are the stable equilibria, while the local maxima
(the tops of hills) are unstable. This basic idea is fundamental to the understanding
and analysis of the equilibrium configurations of a wide range of physical systems, including masses and springs, structures, electrical circuits, and even continuum models of solid
mechanics and elasticity, fluid mechanics, electromagnetism, thermodynamics, statistical
mechanics, and so on.
Solution of Equations
Suppose we wish to solve a system of equations
$$ f_1(x) = 0, \qquad f_2(x) = 0, \qquad \ldots \qquad f_m(x) = 0, \tag{4.1} $$
where $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. This system can be converted into a minimization problem in the following seemingly silly manner. Define
$$ p(x) = \left[ f_1(x) \right]^2 + \cdots + \left[ f_m(x) \right]^2 = \| f(x) \|^2, \tag{4.2} $$
where $\| \cdot \|$ denotes the Euclidean norm on $\mathbb{R}^m$. Clearly, $p(x) \ge 0$ for all $x$. Moreover, $p(x^\star) = 0$ if and only if each summand is zero, and hence $x^\star$ is a solution to (4.1). Therefore, whenever the system has a solution, the minimum value of $p(x)$ is zero, and the minimum is achieved if and only if $x = x^\star$ solves the system (4.1).
The most important case is when we have a linear system
$$ A x = b \tag{4.3} $$
consisting of $m$ equations in $n$ unknowns. In this case, the solutions may be obtained by minimizing the function
$$ p(x) = \| A x - b \|^2. \tag{4.4} $$
Of course, it is not clear that we have gained much, since we already know how to solve $A x = b$ by Gaussian elimination. However, this rather simple artifice has profound consequences.
Suppose that the system (4.3) does not have a solution, i.e., $b$ does not lie in the range of the matrix $A$. This situation is very typical when there are more equations than unknowns: $m > n$. Such problems arise in data fitting, when the measured data points are all supposed to lie on a straight line, say, but rarely do so exactly, due to experimental error. Although we know there is no exact solution to the system, we might still try to find the vector $x^\star$ that comes as close to solving the system as possible. One way to measure closeness is by looking at the magnitude of the residual vector $r = A x - b$, i.e., the difference between the left and right hand sides of the system. The smaller $\| r \| = \| A x - b \|$, the better the attempted solution. The vector $x^\star$ that minimizes the function (4.4) is known as the least squares solution to the linear system. We note that if the linear system (4.3) happens to have an actual solution, with $A x^\star = b$, then $x^\star$ qualifies as the least squares solution too, since in this case $p(x^\star) = 0$ achieves its absolute minimum.

Thus, the least squares solutions naturally generalize traditional solutions. While not the
only possible method, least squares is the easiest to analyze and solve, and hence, typically,
the method of choice for fitting functions to experimental data and performing statistical
analysis.
The Closest Point
The following minimization problem arises in elementary geometry. Given a point $b \in \mathbb{R}^m$ and a subset $V \subset \mathbb{R}^m$, find the point $v^\star \in V$ that is closest to $b$. In other words, we seek to minimize the distance $d(b, v) = \| v - b \|$ over all possible $v \in V$.
The simplest situation occurs when $V$ is a subspace of $\mathbb{R}^m$. In this case, the closest point problem can be reformulated as a least squares minimization problem. Let $v_1, \ldots, v_n$ be a basis for $V$. The general element $v \in V$ is a linear combination of the basis vectors. Applying our handy matrix multiplication formula (2.14), we can write the subspace elements in the form
$$ v = x_1 v_1 + \cdots + x_n v_n = A x, $$
where $A = ( v_1 \ v_2 \ \ldots \ v_n )$ is the $m \times n$ matrix formed by the (column) basis vectors. Note that we can identify $V = \mathrm{rng}\, A$ with the range of $A$, i.e., the subspace spanned by its columns. Consequently, the closest point in $V$ to $b$ is found by minimizing
$$ \| v - b \|^2 = \| A x - b \|^2 $$
over all possible $x \in \mathbb{R}^n$. This is exactly the same as the least squares function (4.4)! Thus, if $x^\star$ is the least squares solution to the system $A x = b$, then $v^\star = A x^\star$ is the closest point to $b$ belonging to $V = \mathrm{rng}\, A$. In this way, we have established a fundamental connection between least squares solutions to linear systems and the geometrical problem of minimizing distances to subspaces.
All three of the preceding minimization problems are solved by the same underlying mathematical construction, which will be described in detail in Section 4.3.
Remark: We will concentrate on minimization problems. Maximizing a function $f(x)$ is the same as minimizing its negative $- f(x)$, and so can be easily handled by the same methods.

4.2. Minimization of Quadratic Functions.

The simplest algebraic equations are the linear systems; these must be thoroughly understood before venturing into the far more complicated nonlinear realm. For minimization problems, the starting point is the minimization of a quadratic function. (Linear functions do not have minima: think of the function $f(x) = b x + c$, whose graph is a straight line.) In this section, we shall see how the problem of minimizing a general quadratic function of $n$ variables can be solved by linear algebra techniques.
Let us begin by reviewing the very simplest example: minimizing a scalar quadratic function
$$ p(x) = a x^2 + 2 b x + c. \tag{4.5} $$

Figure 4.1. Parabolas (panels: a > 0, a < 0, a = 0).

If $a > 0$, then the graph of $p$ is a parabola pointing upwards, and so there exists a unique minimum value. If $a < 0$, the parabola points downwards, and there is no minimum (although there is a maximum). If $a = 0$, the graph is a straight line, and there is neither minimum nor maximum, except in the trivial case when $b = 0$ also, and the function is constant, with every $x$ qualifying as a minimum and a maximum. The three nontrivial possibilities are sketched in Figure 4.1.
In the case $a > 0$, the minimum can be found by calculus. The critical points of a function, which are candidates for minima (and maxima), are found by setting its derivative to zero. In this case, differentiating, and solving
$$ p'(x) = 2 a x + 2 b = 0, $$
we conclude that the only possible minimum value occurs at
$$ x^\star = - \frac{b}{a}, \qquad \text{where} \qquad p(x^\star) = c - \frac{b^2}{a}. \tag{4.6} $$
Of course, one must check that this critical point is indeed a minimum, and not a maximum or inflection point. The second derivative test will show that $p''(x^\star) = 2 a > 0$, and so $x^\star$ is at least a local minimum.
A more instructive approach to this problem, and one that only requires elementary algebra, is to "complete the square". As was done in (3.53), we rewrite
$$ p(x) = a \left( x + \frac{b}{a} \right)^2 + \frac{a c - b^2}{a}. \tag{4.7} $$
If $a > 0$, then the first term is always $\ge 0$, and moreover equals 0 only at $x^\star = - b/a$, reproducing (4.6). The second term is constant, and so unaffected by the value of $x$. We conclude that $p(x)$ is minimized when the squared term in (4.7) vanishes. Thus, the simple algebraic identity (4.7) immediately proves that the global minimum of $p$ is at $x^\star = - b/a$, and, moreover, its minimal value $p(x^\star) = (a c - b^2)/a$ is the constant term.
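The scalar result is short enough to encode directly. In this sketch the function name and the sample coefficients are assumptions made for illustration; note that the linear coefficient of $p$ is $2b$, matching (4.5).

```python
from fractions import Fraction


def minimize_scalar_quadratic(a, b, c):
    """Minimize p(x) = a x^2 + 2 b x + c for a > 0.

    Returns (x_star, p_min) with x_star = -b/a and p_min = (a c - b^2)/a,
    read off from the completed square (4.7).
    """
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a <= 0:
        raise ValueError("a unique minimum requires a > 0")
    return -b / a, (a * c - b * b) / a


def p(a, b, c, x):
    """Evaluate p(x) = a x^2 + 2 b x + c directly."""
    return a * x * x + 2 * b * x + c


# Illustrative coefficients: p(x) = 2 x^2 - 6 x + 7, so a = 2, b = -3, c = 7.
x_star, p_min = minimize_scalar_quadratic(2, -3, 7)
print(x_star, p_min)                     # 3/2 and 5/2
print(p(2, -3, 7, x_star) == p_min)      # the formula agrees with direct evaluation
```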
Now that we have the scalar case firmly in hand, let us turn to the more difficult problem of minimizing quadratic functions that depend on several variables. Thus, we seek to minimize a (real) quadratic function
$$ p(x) = p(x_1, \ldots, x_n) = \sum_{i,j = 1}^{n} k_{ij} x_i x_j - 2 \sum_{i = 1}^{n} f_i x_i + c, \tag{4.8} $$
depending on $n$ variables $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$. The coefficients $k_{ij}$, $f_i$ and $c$ are all assumed to be real; moreover, according to Exercise , we can assume, without loss of generality, that the coefficients of the quadratic terms are symmetric: $k_{ij} = k_{ji}$. Note that $p(x)$ is slightly more general than a quadratic form (3.42) in that it also contains linear and constant terms. We shall rewrite the quadratic function (4.8) in a more convenient matrix notation:
$$ p(x) = x^T K x - 2 x^T f + c, \tag{4.9} $$
where $K = (k_{ij})$ is a symmetric $n \times n$ matrix, $f$ is a constant vector, and $c$ is a constant scalar. We shall adapt our method of completing the square to find its minimizer.
We first note that in the simple scalar case (4.5), we needed to impose the condition that the quadratic coefficient $a$ is positive in order to obtain a (unique) minimum. The corresponding condition for the multivariable case is that the quadratic coefficient matrix $K$ be positive definite. This key assumption enables us to prove a very general minimization theorem.
Theorem 4.1. If $K > 0$ is a symmetric, positive definite matrix, then the quadratic function (4.9) has a unique minimizer, which is the solution to the linear system
$$ K x = f, \qquad \text{namely} \qquad x^\star = K^{-1} f. \tag{4.10} $$
The minimum value of $p(x)$ is equal to any of the following expressions:
$$ p(x^\star) = p(K^{-1} f) = c - f^T K^{-1} f = c - f^T x^\star = c - (x^\star)^T K x^\star. \tag{4.11} $$
Proof: Suppose $x^\star = K^{-1} f$ is the (unique, why?) solution to (4.10). Then, for any $x \in \mathbb{R}^n$, we can write
$$ \begin{aligned} p(x) &= x^T K x - 2 x^T f + c = x^T K x - 2 x^T K x^\star + c \\ &= (x - x^\star)^T K (x - x^\star) + \left[ c - (x^\star)^T K x^\star \right], \end{aligned} \tag{4.12} $$
where we used the symmetry of $K = K^T$ to identify $x^T K x^\star = (x^\star)^T K x$. The second term in the final formula does not depend on $x$. Moreover, the first term has the form $y^T K y$ where $y = x - x^\star$. Since we assumed that $K$ is positive definite, $y^T K y \ge 0$, and it vanishes, thereby achieving its minimum, if and only if $y = x - x^\star = 0$. Therefore, the minimum of $p(x)$ occurs at $x = x^\star$. The minimum value of $p(x)$ is equal to the constant term. The alternative expressions in (4.11) follow from simple substitutions.
Q.E.D.
Example 4.2. Let us illustrate the result with a simple example. Consider the
problem of minimizing the quadratic function
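$$p(x_1, x_2) = 4 x_1^2 - 2 x_1 x_2 + 3 x_2^2 + 3 x_1 - 2 x_2 + 1$$
over all (real) $x_1, x_2$. We first write $p$ in the matrix form (4.9), so
$$p(x_1, x_2) = (\, x_1 \ \ x_2 \,) \begin{pmatrix} 4 & -1 \\ -1 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - 2\, (\, x_1 \ \ x_2 \,) \begin{pmatrix} -\frac{3}{2} \\ 1 \end{pmatrix} + 1,$$
whereby
$$K = \begin{pmatrix} 4 & -1 \\ -1 & 3 \end{pmatrix}, \qquad f = \begin{pmatrix} -\frac{3}{2} \\ 1 \end{pmatrix}. \tag{4.13}$$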

(Pay attention to the overall factor of 2 preceding the linear terms.) According to the
theorem, to find the minimum, we must solve the linear system
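$$\begin{pmatrix} 4 & -1 \\ -1 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -\frac{3}{2} \\ 1 \end{pmatrix}. \tag{4.14}$$
Applying our Gaussian elimination algorithm, only one operation is required to place the
coefficient matrix in upper triangular form:
$$\left( \begin{array}{cc|c} 4 & -1 & -\frac{3}{2} \\ -1 & 3 & 1 \end{array} \right) \longmapsto \left( \begin{array}{cc|c} 4 & -1 & -\frac{3}{2} \\ 0 & \frac{11}{4} & \frac{5}{8} \end{array} \right).$$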

Note that the coefficient matrix is regular (no row interchanges are required) and its two
pivots, namely $4$, $\frac{11}{4}$, are both positive; this proves that $K > 0$ and hence $p(x_1, x_2)$ really
does have a minimum, obtained by applying Back Substitution to the reduced system:
$$x^\star = \begin{pmatrix} x_1^\star \\ x_2^\star \end{pmatrix} = \begin{pmatrix} -\frac{7}{22} \\ \frac{5}{22} \end{pmatrix} \approx \begin{pmatrix} -.318182 \\ .227273 \end{pmatrix}.$$
The quickest way to compute the minimal value
$$p(x^\star) = p\bigl( -\tfrac{7}{22}, \tfrac{5}{22} \bigr) = \tfrac{13}{44} \approx .295455$$
is to use the second formula in (4.11).
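The computation in Example 4.2 can be replayed mechanically. The following is a minimal sketch, not part of the original text: the function name `minimize_quadratic` is our own, and exact rational arithmetic from Python's standard `fractions` module is an assumption made so that the pivots and the minimum come out as fractions. It performs Gaussian elimination without row interchanges, rejects any non-positive pivot, back substitutes, and evaluates $c - f^T x^\star$.

```python
from fractions import Fraction as F

def minimize_quadratic(K, f, c):
    """Minimize p(x) = x^T K x - 2 x^T f + c by solving K x = f.

    Gaussian elimination without row interchanges; raises ValueError when a
    pivot is not positive, i.e. when the symmetric matrix K is not positive
    definite. Returns (x_star, p_min, pivots).
    """
    n = len(K)
    # Augmented matrix [K | f] in exact rational arithmetic.
    M = [[F(v) for v in row] + [F(fi)] for row, fi in zip(K, f)]
    pivots = []
    for j in range(n):
        if M[j][j] <= 0:
            raise ValueError("K is not positive definite")
        pivots.append(M[j][j])
        for i in range(j + 1, n):
            m = M[i][j] / M[j][j]
            M[i] = [a - m * b for a, b in zip(M[i], M[j])]
    # Back Substitution on the upper triangular system.
    x = [F(0)] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    p_min = F(c) - sum(F(fi) * xi for fi, xi in zip(f, x))  # c - f^T x*
    return x, p_min, pivots

K = [[4, -1], [-1, 3]]
f = [F(-3, 2), 1]
x_star, p_min, pivots = minimize_quadratic(K, f, 1)
print(x_star, p_min, pivots)
```

For the data of (4.13) this reproduces the pivots $4, \frac{11}{4}$, the minimizer $\bigl(-\frac{7}{22}, \frac{5}{22}\bigr)^T$ and the minimum value $\frac{13}{44}$.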

It is instructive to compare the algebraic solution method with the minimization
procedure you learned in multi-variable calculus. The critical points of $p(x_1, x_2)$ are found
by setting both partial derivatives equal to zero:
$$\frac{\partial p}{\partial x_1} = 8 x_1 - 2 x_2 + 3 = 0, \qquad \frac{\partial p}{\partial x_2} = -2 x_1 + 6 x_2 - 2 = 0.$$

If we divide by an overall factor of 2, these are precisely the same linear equations we
already constructed in (4.14). Thus, not surprisingly, the calculus approach leads to the
same critical point. To check whether a critical point is a local minimum, we need to test
the second derivative. In the case of a function of several variables, this requires analyzing
the Hessian matrix, which is the symmetric matrix of second order partial derivatives
$$H = \begin{pmatrix} \dfrac{\partial^2 p}{\partial x_1^2} & \dfrac{\partial^2 p}{\partial x_1 \partial x_2} \\[2ex] \dfrac{\partial^2 p}{\partial x_1 \partial x_2} & \dfrac{\partial^2 p}{\partial x_2^2} \end{pmatrix} = \begin{pmatrix} 8 & -2 \\ -2 & 6 \end{pmatrix} = 2 K,$$
which is exactly twice the quadratic coefficient matrix (4.13). If the Hessian matrix is
positive definite (which we already know in this case), then the critical point is indeed
a (local) minimum. Thus, the calculus and algebraic approaches to this minimization
problem lead (not surprisingly) to identical results. However, the algebraic method is
more powerful, because it immediately produces the unique, global minimum, whereas,
without extra work (e.g., proving convexity of the function), calculus can only guarantee
that the critical point is a local minimum, [9]. The reader can find the full story on
minimization of nonlinear functions, which is, in fact, based on the algebraic theory of
positive definite matrices, in Section 19.3.
The most efficient method for producing a minimum of a quadratic function $p(x)$ on
$\mathbb{R}^n$, then, is to first write out the symmetric coefficient matrix $K$ and the vector $f$. Solving
the system $K x = f$ will produce the minimizer $x^\star$ provided $K > 0$, which should be
checked during the course of the procedure by making sure no row interchanges are used
and all the pivots are positive. If these conditions are not met then (with one minor
exception; see below) one immediately concludes that there is no minimizer.
Example 4.3. Let us minimize the quadratic function
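$$p(x, y, z) = x^2 + 2 x y + x z + 2 y^2 + y z + 2 z^2 + 6 y - 7 z + 5.$$
This has the matrix form (4.9) with
$$K = \begin{pmatrix} 1 & 1 & \frac{1}{2} \\ 1 & 2 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 2 \end{pmatrix}, \qquad f = \begin{pmatrix} 0 \\ -3 \\ \frac{7}{2} \end{pmatrix}, \qquad c = 5.$$
Gaussian elimination produces the $L D L^T$ factorization
$$K = \begin{pmatrix} 1 & 1 & \frac{1}{2} \\ 1 & 2 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ \frac{1}{2} & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{7}{4} \end{pmatrix} \begin{pmatrix} 1 & 1 & \frac{1}{2} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$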

The pivots, i.e., the diagonal entries of D, are all positive, and hence K is positive definite.
Theorem 4.1 then guarantees that p(x, y, z) has a unique minimizer, which is found by
solving the linear system K x = f . The solution is then quickly obtained by forward and
back substitution:
$$x^\star = 2, \qquad y^\star = -3, \qquad z^\star = 2, \qquad \text{with} \qquad p(x^\star, y^\star, z^\star) = p(2, -3, 2) = -11.$$
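To make the factor-then-substitute procedure of Example 4.3 concrete, here is a minimal sketch under stated assumptions: the helper names `ldlt` and `solve_ldlt` are hypothetical, the matrix is assumed symmetric and regular (no row interchanges), and exact fractions are used so that the pivots can be compared with the diagonal of $D$.

```python
from fractions import Fraction as F

def ldlt(K):
    """Return (L, d) with K = L diag(d) L^T, assuming K is symmetric and regular."""
    n = len(K)
    U = [[F(v) for v in row] for row in K]       # working copy, reduced to upper triangular
    L = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        if U[j][j] == 0:
            raise ValueError("zero pivot: K is not regular")
        for i in range(j + 1, n):
            L[i][j] = U[i][j] / U[j][j]           # multiplier stored in L
            U[i] = [a - L[i][j] * b for a, b in zip(U[i], U[j])]
    return L, [U[j][j] for j in range(n)]

def solve_ldlt(L, d, f):
    """Solve L D L^T x = f by forward substitution, scaling, and back substitution."""
    n = len(d)
    z = []
    for i in range(n):                            # L z = f
        z.append(F(f[i]) - sum(L[i][k] * z[k] for k in range(i)))
    y = [zi / di for zi, di in zip(z, d)]         # D y = z
    x = [F(0)] * n
    for i in reversed(range(n)):                  # L^T x = y
        x[i] = y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x

h = F(1, 2)
K = [[1, 1, h], [1, 2, h], [h, h, 2]]
f = [0, -3, F(7, 2)]
L, d = ldlt(K)
x_star = solve_ldlt(L, d, f)
p_min = 5 - sum(F(fi) * xi for fi, xi in zip(f, x_star))   # c - f^T x*
print(d, x_star, p_min)
```

All entries of `d` being positive is the positive definiteness test of the text; the minimum value is again computed from $c - f^T x^\star$ in (4.11).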

Theorem 4.1 solves the general quadratic minimization problem when the quadratic
coefficient matrix is positive definite. If $K$ is not positive definite, then the quadratic
function (4.9) does not have a minimum, apart from one exceptional situation.
Theorem 4.4. If $K > 0$ is positive definite, then the quadratic function $p(x) =
x^T K x - 2\, x^T f + c$ has a unique global minimizer $x^\star$ satisfying $K x^\star = f$. If $K \geq 0$ is positive
semi-definite, and $f \in \operatorname{rng} K$, then every solution to $K x^\star = f$ is a global minimum of $p(x)$.
However, in the semi-definite case, the minimum is not unique since $p(x^\star + z) = p(x^\star)$ for
any null vector $z \in \ker K$. In all other cases, there is no global minimum, and $p(x)$ can
assume arbitrarily large negative values.
Proof : The first part is just a restatement of Theorem 4.1. The second part is proved
by a similar computation, and uses the fact that a positive semi-definite but not definite
matrix has a nontrivial kernel. If $K$ is not positive semi-definite, then one can find a
vector $y$ such that $a = y^T K y < 0$. If we set $x = t\, y$, then $p(x) = p(t\, y) = a\, t^2 - 2\, b\, t + c$,
with $b = y^T f$. Since $a < 0$, by choosing $|t| \gg 0$ sufficiently large, one can arrange that
$p(t\, y) \ll 0$ is an arbitrarily large negative quantity. The one remaining case, when $K$ is
positive semi-definite but $f \notin \operatorname{rng} K$, is left until Exercise .
Q.E.D.

4.3. Least Squares and the Closest Point.

We are now in a position to solve the basic geometric problem of finding the element
in a subspace that is closest to a given point in Euclidean space.
Problem: Let $V$ be a subspace of $\mathbb{R}^m$. Given $b \in \mathbb{R}^m$, find $v^\star \in V$ which minimizes
$\| v - b \|$ over all possible $v \in V$.

The minimal distance $\| v^\star - b \|$ to the closest point is called the distance from the
point $b$ to the subspace $V$. Of course, if $b \in V$ lies in the subspace, then the answer is
easy: the closest point is $v^\star = b$ itself. The distance from $b$ to the subspace is zero. Thus,
the problem only becomes interesting when $b \notin V$.

Remark : Initially, you may assume that $\| \cdot \|$ denotes the usual Euclidean norm, and
so the distance corresponds to the usual Euclidean length. But it will be no more difficult
to solve the closest point problem for any norm that arises from an inner product:
$\| v \| = \sqrt{\langle v ; v \rangle}$. In fact, requiring that $V \subset \mathbb{R}^m$ is not crucial either; the same method works
when $V$ is a finite-dimensional subspace of any inner product space.
However, the methods do not apply to more general norms not coming from inner
products, e.g., the $1$ norm or $\infty$ norm. These are much harder to handle, and, in such cases,
the closest point problem is a nonlinear minimization problem whose solution requires the
more sophisticated methods of Section 19.3.
When solving the closest point problem, the goal is to minimize the squared distance
$$\| v - b \|^2 = \| v \|^2 - 2\, \langle v ; b \rangle + \| b \|^2, \tag{4.15}$$
over all possible $v$ belonging to the subspace $V \subset \mathbb{R}^m$. Let us assume that we know a
basis $v_1, \ldots, v_n$ of $V$, with $n = \dim V$. Then the most general vector in $V$ is a linear
combination
$$v = x_1 v_1 + \cdots + x_n v_n \tag{4.16}$$
of the basis vectors. We substitute the formula (4.16) for $v$ into the distance function
(4.15). As we shall see, the resulting expression is a quadratic function of the coefficients
$x = (x_1, x_2, \ldots, x_n)^T$, and so the minimum is provided by Theorem 4.1.
First, the quadratic terms come from expanding
$$\| v \|^2 = \langle v ; v \rangle = \langle x_1 v_1 + \cdots + x_n v_n \,;\, x_1 v_1 + \cdots + x_n v_n \rangle = \sum_{i,j=1}^{n} x_i x_j\, \langle v_i ; v_j \rangle. \tag{4.17}$$
Therefore,
$$\| v \|^2 = \sum_{i,j=1}^{n} k_{ij}\, x_i x_j = x^T K x,$$
where $K$ is the symmetric $n \times n$ Gram matrix whose $(i, j)$ entry is the inner product
$$k_{ij} = \langle v_i ; v_j \rangle, \tag{4.18}$$
between the basis vectors of our subspace. Similarly,
$$\langle v ; b \rangle = \langle x_1 v_1 + \cdots + x_n v_n \,;\, b \rangle = \sum_{i=1}^{n} x_i\, \langle v_i ; b \rangle,$$
and so
$$\langle v ; b \rangle = \sum_{i=1}^{n} x_i f_i = x^T f,$$
where $f \in \mathbb{R}^n$ is the vector whose $i$th entry is the inner product
$$f_i = \langle v_i ; b \rangle \tag{4.19}$$
between the point and the subspace basis elements. We conclude that the squared distance
function (4.15) reduces to the quadratic function
$$p(x) = x^T K x - 2\, x^T f + c = \sum_{i,j=1}^{n} k_{ij}\, x_i x_j - 2 \sum_{i=1}^{n} f_i x_i + c, \tag{4.20}$$
in which $K$ and $f$ are given in (4.18), (4.19), while $c = \| b \|^2$.

Since we assumed that the basis vectors $v_1, \ldots, v_n$ are linearly independent,
Proposition 3.32 assures us that the associated Gram matrix $K = A^T A$ is positive definite.
Therefore, we may directly apply our basic minimization Theorem 4.1 to solve the closest
point problem.
Theorem 4.5. Let $v_1, \ldots, v_n$ form a basis for the subspace $V \subset \mathbb{R}^m$. Given $b \in \mathbb{R}^m$,
the closest point $v^\star = x_1^\star v_1 + \cdots + x_n^\star v_n \in V$ is prescribed by the solution $x^\star = K^{-1} f$
to the linear system
$$K x = f, \tag{4.21}$$
where $K$ and $f$ are given in (4.18), (4.19). The distance between the point and the subspace
is
$$\| v^\star - b \| = \sqrt{\| b \|^2 - f^T x^\star}. \tag{4.22}$$
When using the standard Euclidean inner product and norm on $\mathbb{R}^m$ to measure distance,
the entries of the Gram matrix $K$ and the vector $f$ are given by dot products:
$$k_{ij} = v_i \cdot v_j = v_i^T v_j, \qquad f_i = v_i \cdot b = v_i^T b.$$
As in (3.49), both sets of equations can be combined into a single matrix equation. If
$A = (\, v_1 \ v_2 \ \ldots \ v_n \,)$ denotes the $m \times n$ matrix formed by the basis vectors, then
$$K = A^T A, \qquad f = A^T b, \qquad c = \| b \|^2. \tag{4.23}$$
A direct derivation of these equations is instructive. Since, by formula (2.14),
$$v = x_1 v_1 + \cdots + x_n v_n = A x,$$
we find
$$\| v - b \|^2 = \| A x - b \|^2 = (A x - b)^T (A x - b) = (x^T A^T - b^T)(A x - b)$$
$$= x^T A^T A\, x - 2\, x^T A^T b + b^T b = x^T K x - 2\, x^T f + c,$$
thereby justifying (4.23). (In the next to last equality, we equate the scalar quantities
$b^T A x = (b^T A x)^T = x^T A^T b$.)
If, instead of the Euclidean inner product, we adopt an alternative inner product
$\langle v ; w \rangle = v^T C\, w$ prescribed by a positive definite matrix $C > 0$, then the same computations
produce
$$K = A^T C A, \qquad f = A^T C\, b, \qquad c = \| b \|^2. \tag{4.24}$$
The weighted Gram matrix formula was previously derived in (3.51).

Example 4.6. Let $V \subset \mathbb{R}^3$ be the plane spanned by $v_1 = (1, 2, -1)^T$, $v_2 = (2, -3, -1)^T$.
Our goal is to find the point $v^\star \in V$ that is closest to $b = (1, 0, 0)^T$, where distance is
measured in the usual Euclidean norm. We combine the basis vectors to form the matrix
$$A = \begin{pmatrix} 1 & 2 \\ 2 & -3 \\ -1 & -1 \end{pmatrix}.$$
According to (4.23), the positive definite Gram matrix and associated vector are
$$K = A^T A = \begin{pmatrix} 6 & -3 \\ -3 & 14 \end{pmatrix}, \qquad f = A^T b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}.$$
(Or, alternatively, these can be computed directly by taking inner products, as in (4.18), (4.19).)
We solve the linear system $K x = f$ for $x^\star = K^{-1} f = \bigl( \frac{4}{15}, \frac{1}{5} \bigr)^T$. Theorem 4.5 implies that
the closest point is
$$v^\star = x_1^\star v_1 + x_2^\star v_2 = A x^\star = \begin{pmatrix} \frac{2}{3} \\ -\frac{1}{15} \\ -\frac{7}{15} \end{pmatrix} \approx \begin{pmatrix} .6667 \\ -.0667 \\ -.4667 \end{pmatrix}.$$
The distance from the point $b$ to the plane is $\| v^\star - b \| = \frac{1}{\sqrt{3}} \approx .5774$.


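Suppose, on the other hand, that distance is measured in the weighted norm
$\| v \| = \sqrt{v_1^2 + \frac{1}{2} v_2^2 + \frac{1}{3} v_3^2}$ corresponding to the diagonal matrix $C = \operatorname{diag}(1, \frac{1}{2}, \frac{1}{3})$. In this case, we
form the weighted Gram matrix and vector (4.24):
$$K = A^T C A = \begin{pmatrix} 1 & 2 & -1 \\ 2 & -3 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & \frac{1}{3} \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 2 & -3 \\ -1 & -1 \end{pmatrix} = \begin{pmatrix} \frac{10}{3} & -\frac{2}{3} \\ -\frac{2}{3} & \frac{53}{6} \end{pmatrix},$$
$$f = A^T C\, b = \begin{pmatrix} 1 & 2 & -1 \\ 2 & -3 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & \frac{1}{3} \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix},$$
and so
$$x^\star = K^{-1} f \approx \begin{pmatrix} .3506 \\ .2529 \end{pmatrix}, \qquad v^\star = A x^\star \approx \begin{pmatrix} .8563 \\ -.0575 \\ -.6034 \end{pmatrix}.$$
In this case, the distance between the point and the subspace is measured in the weighted
norm: $\| v^\star - b \| \approx .3790$.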
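Both halves of Example 4.6 follow the same recipe: build the Gram matrix (4.18) and the vector (4.19), solve $K x = f$, and read off the distance from (4.22). The sketch below is illustrative only. The function `closest_point` is a hypothetical helper, it handles just two basis vectors (so that Cramer's rule suffices), and it restricts the inner product to a diagonal weight matrix $C$.

```python
from fractions import Fraction as F
from math import sqrt

def closest_point(basis, b, weights=None):
    """Closest point to b in span(basis) for <v;w> = sum_k c_k v_k w_k.

    `weights` is the diagonal of C (default: all ones, the Euclidean case).
    Only two basis vectors are handled, so K x = f is solved by Cramer's rule.
    Returns (x_star, v_star, squared_distance).
    """
    w = [F(1)] * len(b) if weights is None else [F(c) for c in weights]

    def ip(u, v):
        return sum(wk * uk * vk for wk, uk, vk in zip(w, u, v))

    v1, v2 = basis
    k11, k12, k22 = ip(v1, v1), ip(v1, v2), ip(v2, v2)   # Gram matrix entries (4.18)
    f1, f2 = ip(v1, b), ip(v2, b)                        # right hand side (4.19)
    det = k11 * k22 - k12 * k12
    x = [(f1 * k22 - k12 * f2) / det, (k11 * f2 - k12 * f1) / det]
    v_star = [x[0] * p + x[1] * q for p, q in zip(v1, v2)]
    dist2 = ip(b, b) - (f1 * x[0] + f2 * x[1])           # ||b||^2 - f^T x*, cf. (4.22)
    return x, v_star, dist2

v1, v2, b = [1, 2, -1], [2, -3, -1], [1, 0, 0]
x_e, v_e, d2_e = closest_point([v1, v2], b)
x_w, v_w, d2_w = closest_point([v1, v2], b, weights=[1, F(1, 2), F(1, 3)])
print(x_e, v_e, sqrt(d2_e))
print(x_w, v_w, sqrt(d2_w))
```

The exact weighted solution is $x^\star = \bigl( \frac{61}{174}, \frac{22}{87} \bigr)^T$, which rounds to the decimals quoted above.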
Remark : The solution to the closest point problem given in Theorem 4.5 applies, as
stated, to the more general case when $V \subset W$ is a finite-dimensional subspace of a general
inner product space $W$. The underlying inner product space $W$ can even be
infinite-dimensional, which it is when dealing with least squares approximations in function space,
to be described at the end of this chapter, and in Fourier analysis.
Least Squares
As we first observed in Section 4.1, the solution to the closest point problem also
solves the basic least squares minimization problem! Let us officially define the notion of
a (classical) least squares solution to a linear system.
Definition 4.7. The least squares solution to a linear system of equations
$$A x = b \tag{4.25}$$
is the vector $x^\star \in \mathbb{R}^n$ that minimizes the Euclidean norm $\| A x - b \|$.

Remark : Later, we will generalize the least squares method to more general weighted
norms coming from inner products. However, for the time being we restrict our attention
to the Euclidean version.
If the system (4.25) actually has a solution, then it is automatically the least squares
solution. Thus, the concept of least squares solution is new only when the system does not
have a solution, i.e., b does not lie in the range of A. We also want the least squares solution
to be unique. As with an ordinary solution, this happens if and only if ker A = {0}, or,
equivalently, the columns of A are linearly independent, or, equivalently, rank A = n.
As before, to make the connection with the closest point problem, we identify the
subspace $V = \operatorname{rng} A \subset \mathbb{R}^m$ as the range or column space of the matrix $A$. If the columns
of $A$ are linearly independent, then they form a basis for the range $V$. Since every element
of the range can be written as $v = A x$, minimizing $\| A x - b \|$ is the same as minimizing
the distance $\| v - b \|$ between the point and the subspace. The least squares solution $x^\star$
to the minimization problem gives the closest point $v^\star = A x^\star$ in $V = \operatorname{rng} A$. Therefore,
the least squares solution follows from Theorem 4.5. In the Euclidean case, we state the
result more explicitly by using (4.23) to write out the linear system (4.21) and the minimal
distance (4.22).
Theorem 4.8. Assume $\ker A = \{0\}$. Set $K = A^T A$ and $f = A^T b$. Then the least
squares solution to $A x = b$ is the unique solution to the normal equations
$$K x = f \qquad \text{or} \qquad (A^T A)\, x = A^T b, \tag{4.26}$$
namely
$$x^\star = (A^T A)^{-1} A^T b. \tag{4.27}$$
The least squares error is
$$\| A x^\star - b \|^2 = \| b \|^2 - f^T x^\star = \| b \|^2 - b^T A\, (A^T A)^{-1} A^T b. \tag{4.28}$$
Note that the normal equations (4.26) can be simply obtained by multiplying the
original system $A x = b$ on both sides by $A^T$. In particular, if $A$ is square and invertible,
then $(A^T A)^{-1} = A^{-1} (A^T)^{-1}$, and so (4.27) reduces to $x = A^{-1} b$, while the two terms in
the error formula (4.28) cancel out, producing 0 error. In the rectangular case, when
this is not allowed, formula (4.27) gives a new formula for the solution to (4.25) when
$b \in \operatorname{rng} A$.
Example 4.9. Consider the linear system
$$\begin{aligned} x_1 + 2 x_2 \phantom{{}+ x_3} &= 1, \\ 3 x_1 - x_2 + x_3 &= 0, \\ - x_1 + 2 x_2 + x_3 &= -1, \\ x_1 - x_2 - 2 x_3 &= 2, \\ 2 x_1 + x_2 - x_3 &= 2, \end{aligned}$$
consisting of 5 equations in 3 unknowns. The coefficient matrix and right hand side are
$$A = \begin{pmatrix} 1 & 2 & 0 \\ 3 & -1 & 1 \\ -1 & 2 & 1 \\ 1 & -1 & -2 \\ 2 & 1 & -1 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 0 \\ -1 \\ 2 \\ 2 \end{pmatrix}.$$
A direct application of Gaussian elimination shows that $b \notin \operatorname{rng} A$, and so the system is
incompatible: it has no solution. Of course, to apply the least squares method, one is
not required to check this in advance. If the system has a solution, it is the least squares
solution too, and the least squares method will find it.
To form the normal equations (4.26), we compute
$$K = A^T A = \begin{pmatrix} 16 & -2 & -2 \\ -2 & 11 & 2 \\ -2 & 2 & 7 \end{pmatrix}, \qquad f = A^T b = \begin{pmatrix} 8 \\ 0 \\ -7 \end{pmatrix}.$$
Solving the $3 \times 3$ system $K x = f$ by Gaussian elimination, we find
$$x^\star = K^{-1} f \approx (\, .4119, \ .2482, \ -.9532 \,)^T$$
to be the least squares solution to the system. The least squares error is
$$\| b - A x^\star \| \approx \| (\, .0917, \ -.0342, \ -.131, \ -.0701, \ -.0252 \,)^T \| \approx .1799,$$
which is reasonably small, indicating that the system is, roughly speaking, not too
incompatible.
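The normal equations of Example 4.9 can be formed and solved in a few lines. This is a sketch rather than a recommended numerical method: the function `least_squares` is our own naming, it assumes $\ker A = \{0\}$ so that $K = A^T A$ is positive definite and no pivoting is needed, and it works in exact rational arithmetic purely so the result can be compared digit for digit with the decimals above.

```python
from fractions import Fraction as F

def least_squares(A, b):
    """Solve the normal equations (A^T A) x = A^T b exactly; assumes ker A = {0}.

    Returns the least squares solution and the squared error ||b - A x||^2.
    """
    m, n = len(A), len(A[0])
    # Augmented matrix [K | f] with K = A^T A and f = A^T b.
    M = [[sum(F(A[i][p]) * A[i][q] for i in range(m)) for q in range(n)]
         + [sum(F(A[i][p]) * b[i] for i in range(m))] for p in range(n)]
    for j in range(n):                        # Gauss-Jordan; pivots are positive since K > 0
        M[j] = [v / M[j][j] for v in M[j]]
        for i in range(n):
            if i != j:
                M[i] = [a - M[i][j] * c for a, c in zip(M[i], M[j])]
    x = [row[n] for row in M]
    residual = [b[i] - sum(A[i][q] * x[q] for q in range(n)) for i in range(m)]
    return x, sum(r * r for r in residual)

A = [[1, 2, 0], [3, -1, 1], [-1, 2, 1], [1, -1, -2], [2, 1, -1]]
b = [1, 0, -1, 2, 2]
x_star, err2 = least_squares(A, b)
print([float(v) for v in x_star], float(err2) ** 0.5)
```

In exact terms $x^\star = \bigl( \frac{229}{556}, \frac{69}{278}, -\frac{265}{278} \bigr)^T$ and the squared error is $\frac{9}{278}$, in agreement with formula (4.28).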
Remark : If $\ker A \neq \{0\}$, then the least squares solution to $A x = b$ is not unique,
cf. Exercise . When you ask Matlab to solve such a linear system (when $A$ is not square),
it gives you the least squares solution that has the minimum Euclidean norm.

4.4. Data Fitting and Interpolation.

One of the most important applications of the least squares minimization process is
to the fitting of data points. Suppose we are running an experiment in which we measure
a certain time-dependent physical quantity. At time ti we make the measurement yi , and
thereby obtain a set of, say, m data points
$$(t_1, y_1), \qquad (t_2, y_2), \qquad \ldots \qquad (t_m, y_m). \tag{4.29}$$
Suppose our theory indicates that the data points are supposed to all lie on a single line
$$y = \alpha + \beta\, t, \tag{4.30}$$
whose precise form, meaning its coefficients $\alpha, \beta$, is to be determined. For example, a
police car is interested in clocking the speed of a vehicle using measurements of its relative
distance at several times. Assuming that the vehicle is traveling at constant speed, its
position at time $t$ will have the linear form (4.30), with $\beta$, the velocity, and $\alpha$, the initial
position, to be determined. Experimental error will almost inevitably prevent the measured
points from lying exactly on a line, and so the problem is to find the straight line (4.30)
which best fits the measured data.
The error between the measured value $y_i$ and the sample value predicted by the
function (4.30) at $t = t_i$ is
$$e_i = y_i - (\alpha + \beta\, t_i), \qquad i = 1, \ldots, m.$$

We can write this system in matrix form as
$$e = y - A x,$$
where
$$e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_m \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}, \qquad \text{while} \qquad A = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{pmatrix}, \qquad x = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}. \tag{4.31}$$
We call $e$ the error vector and $y$ the data vector. The coefficients $\alpha, \beta$ of our desired
function (4.30) are the unknowns, forming the entries of the column vector $x$.
If we could fit the data exactly, so $y_i = \alpha + \beta\, t_i$ for all $i$, then each $e_i = 0$, and we
could solve $A x = y$. In matrix language, the data points all lie on a straight line if and
only if $y \in \operatorname{rng} A$. If the data points are not all collinear, then we seek the straight line
that minimizes the total squared error or Euclidean norm
$$\text{Error} = \| e \| = \sqrt{e_1^2 + \cdots + e_m^2}.$$

Figure 4.2. Least Squares Approximation of Data by a Straight Line.


Pictorially, referring to Figure 4.2, the errors are the vertical distances from the points to
the line, and we are seeking to minimize the square root of the sum of the squares of the
individual errors†, hence the term least squares. In vector language, we are looking for the
coefficient vector $x = (\alpha, \beta)^T$ which minimizes the Euclidean norm of the error vector
$$\| e \| = \| A x - y \|. \tag{4.32}$$
Thus, we are precisely in the situation of characterizing the least squares solution to the
system $A x = y$ that was covered in the preceding subsection.

† This choice of minimization may strike the reader as a little odd. Why not just minimize
the sum of the absolute value of the errors, i.e., the $1$ norm $\| e \|_1 = | e_1 | + \cdots + | e_m |$ of the
error vector, or minimize the maximal error, i.e., the $\infty$ norm $\| e \|_\infty = \max \{ | e_1 |, \ldots, | e_m | \}$? Or,
even better, why minimize the vertical distance to the line? Maybe the perpendicular distance
from each data point to the line, as computed in Exercise , would be a better measure of
error. The answer is that, although all of these alternative minimization criteria are interesting
and potentially useful, they all lead to nonlinear minimization problems, and are much harder
to solve! The least squares minimization problem, by contrast, can be solved by linear algebra.
Moreover, one needs to properly understand the linear solution before moving on to the more
treacherous nonlinear situation, cf. Section 19.3.

Theorem 4.8 prescribes the solution to this least squares minimization problem. We
form the normal equations
$$(A^T A)\, x = A^T y, \qquad \text{with solution} \qquad x^\star = (A^T A)^{-1} A^T y. \tag{4.33}$$
Invertibility of the Gram matrix $K = A^T A$ relies on the assumption that the matrix $A$
has linearly independent columns. For the matrix (4.31), this requires that not all the $t_i$
be equal, i.e., we must measure the data at two or more distinct times.
Note that this restriction does not preclude measuring some of the data at the same time,
e.g., by repeating the experiment. However, choosing all the $t_i$'s to be the same is a silly
data fitting problem. (Why?)
For the particular matrices (4.31), we compute
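$$A^T A = \begin{pmatrix} 1 & 1 & \ldots & 1 \\ t_1 & t_2 & \ldots & t_m \end{pmatrix} \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{pmatrix} = \begin{pmatrix} m & \sum t_i \\ \sum t_i & \sum (t_i)^2 \end{pmatrix} = m \begin{pmatrix} 1 & \overline{t} \\ \overline{t} & \overline{t^2} \end{pmatrix},$$
$$A^T y = \begin{pmatrix} 1 & 1 & \ldots & 1 \\ t_1 & t_2 & \ldots & t_m \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum t_i y_i \end{pmatrix} = m \begin{pmatrix} \overline{y} \\ \overline{t y} \end{pmatrix}, \tag{4.34}$$
where the overbars, namely
$$\overline{t} = \frac{1}{m} \sum_{i=1}^{m} t_i, \qquad \overline{y} = \frac{1}{m} \sum_{i=1}^{m} y_i, \qquad \overline{t^2} = \frac{1}{m} \sum_{i=1}^{m} t_i^2, \qquad \overline{t y} = \frac{1}{m} \sum_{i=1}^{m} t_i y_i, \tag{4.35}$$
denote the average sample values of the indicated variables.

Warning: The average of a product is not equal to the product of the averages! In
particular,
$$\overline{t^2} \neq (\, \overline{t} \,)^2, \qquad \overline{t y} \neq \overline{t}\ \overline{y}.$$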

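Substituting (4.34) into the normal equations (4.33), and canceling the common factor
of $m$, we find that we have only to solve a pair of linear equations
$$\alpha + \overline{t}\, \beta = \overline{y}, \qquad \overline{t}\, \alpha + \overline{t^2}\, \beta = \overline{t y}.$$
The solution is
$$\alpha = \overline{y} - \overline{t}\, \beta, \qquad \beta = \frac{\overline{t y} - \overline{t}\ \overline{y}}{\overline{t^2} - (\, \overline{t} \,)^2} = \frac{\sum (t_i - \overline{t}\,)\, y_i}{\sum (t_i - \overline{t}\,)^2}. \tag{4.36}$$
Therefore, the best (in the least squares sense) straight line that fits the given data is
$$y = \beta\, (t - \overline{t}\,) + \overline{y},$$
where the line's slope $\beta$ is given in (4.36).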

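Example 4.10. Suppose the data points are given by the table

    t_i |  0   1   3   6
    y_i |  2   3   7  12

Then
$$A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 3 \\ 1 & 6 \end{pmatrix}, \qquad A^T = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 3 & 6 \end{pmatrix}, \qquad y = \begin{pmatrix} 2 \\ 3 \\ 7 \\ 12 \end{pmatrix}.$$
Therefore
$$A^T A = \begin{pmatrix} 4 & 10 \\ 10 & 46 \end{pmatrix}, \qquad A^T y = \begin{pmatrix} 24 \\ 96 \end{pmatrix}.$$
The normal equations (4.33) reduce to
$$4\, \alpha + 10\, \beta = 24, \qquad 10\, \alpha + 46\, \beta = 96, \qquad \text{so} \qquad \alpha = \tfrac{12}{7}, \quad \beta = \tfrac{12}{7}.$$
Therefore, the best least squares fit to the data is the straight line
$$y = \tfrac{12}{7} + \tfrac{12}{7}\, t.$$
Alternatively, one can compute this formula directly from (4.36).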
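The alternative route through (4.36) is easy to script. The following minimal sketch assumes integer (or rational) data so that `fractions.Fraction` can produce exact averages; the function name `fit_line` is ours.

```python
from fractions import Fraction as F

def fit_line(t, y):
    """Least squares line y = alpha + beta t via the averaged formulas (4.36)."""
    m = len(t)
    t_bar = F(sum(t), m)
    y_bar = F(sum(y), m)
    t2_bar = F(sum(ti * ti for ti in t), m)
    ty_bar = F(sum(ti * yi for ti, yi in zip(t, y)), m)
    denom = t2_bar - t_bar ** 2           # zero exactly when all t_i coincide
    if denom == 0:
        raise ValueError("need at least two distinct sample times")
    beta = (ty_bar - t_bar * y_bar) / denom
    alpha = y_bar - t_bar * beta
    return alpha, beta

alpha, beta = fit_line([0, 1, 3, 6], [2, 3, 7, 12])
print(alpha, beta)
```

Here $\overline{t} = \frac{5}{2}$, $\overline{y} = 6$, $\overline{t^2} = \frac{23}{2}$ and $\overline{t y} = 24$, so the formula returns the same values $\alpha = \beta = \frac{12}{7}$ as the normal equations.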

Example 4.11. Suppose we are given a sample of an unknown radioactive isotope.
At time $t_i$ we measure, using a Geiger counter, the amount $m_i$ of radioactive material in
the sample. The problem is to determine the initial amount of material and the isotope's
half life. If the measurements were exact, we would have $m(t) = m_0\, e^{\beta t}$, where $m_0 = m(0)$
is the initial mass, and $\beta < 0$ the decay rate. The half life is given by $t^\star = -\dfrac{\log 2}{\beta}$; see
Example 8.1 for additional information.

As it stands this is not a linear least squares problem, but it can be converted to that
form by taking logarithms:
$$y(t) = \log m(t) = \log m_0 + \beta\, t = \alpha + \beta\, t.$$
We can thus do a linear least squares fit on the logarithms $y_i = \log m_i$ of the radioactive
mass data at the measurement times $t_i$ to determine the best values for $\beta$ and $\alpha = \log m_0$.
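As a hedged illustration of the logarithmic transformation, the sketch below fits a line to $(t_i, \log m_i)$ and converts the result back to $m_0$, $\beta$ and the half life. The measurements are hypothetical, generated noise-free from $m_0 = 5$ and $\beta = -0.3$; they are not data from the text, and the function name `fit_decay` is our own.

```python
import math

def fit_decay(times, masses):
    """Fit m(t) = m0 * exp(beta t) by a least squares line through (t_i, log m_i)."""
    m = len(times)
    logs = [math.log(v) for v in masses]
    t_bar = sum(times) / m
    y_bar = sum(logs) / m
    # Slope from the second form of (4.36).
    beta = (sum((t - t_bar) * y for t, y in zip(times, logs))
            / sum((t - t_bar) ** 2 for t in times))
    alpha = y_bar - t_bar * beta
    m0 = math.exp(alpha)
    half_life = -math.log(2) / beta
    return m0, beta, half_life

# Hypothetical noise-free measurements generated from m0 = 5, beta = -0.3.
times = [0.0, 1.0, 2.0, 4.0]
masses = [5.0 * math.exp(-0.3 * t) for t in times]
m0, beta, half_life = fit_decay(times, masses)
print(m0, beta, half_life)
```

Note that this procedure minimizes the squared error in $\log m$, not in $m$ itself, so with noisy data it weights small masses more heavily than a direct (nonlinear) fit would.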
Polynomial Approximation and Interpolation
The basic least squares philosophy has a variety of different extensions, all interesting
and all useful. First, we can replace the affine function (4.30) by a quadratic function
$$y = \alpha + \beta\, t + \gamma\, t^2. \tag{4.37}$$
In this case, we are looking for the parabola that best fits the data. For example, Newton's
theory of gravitation says that (in the absence of air resistance) a falling object obeys the
parabolic law (4.37), where $\alpha = h_0$ is the initial height, $\beta = v_0$ is the initial velocity, and
$\gamma = -\frac{1}{2}\, g$ is minus one half the gravitational acceleration. Suppose we observe a falling body, and
measure its height $y_i$ at times $t_i$. Then we can approximate its initial height, initial velocity
and gravitational acceleration by finding the parabola (4.37) that best fits the data. Again, we
characterize the least squares fit by minimizing the sum of the squares of errors $e_i = y_i - y(t_i)$.

Figure 4.3. Interpolating Polynomials (Linear, Quadratic, Cubic).

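The method can evidently be extended to a completely general polynomial function
$$y(t) = \alpha_0 + \alpha_1 t + \cdots + \alpha_n t^n \tag{4.38}$$
of degree $n$. The total least squares error between the data and the sample values of the
function is equal to
$$\| e \|^2 = \sum_{i=1}^{m} \bigl[\, y_i - y(t_i) \,\bigr]^2 = \| y - A x \|^2, \tag{4.39}$$
where
$$A = \begin{pmatrix} 1 & t_1 & t_1^2 & \ldots & t_1^n \\ 1 & t_2 & t_2^2 & \ldots & t_2^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & t_m & t_m^2 & \ldots & t_m^n \end{pmatrix}, \qquad x = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}. \tag{4.40}$$
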

In particular, if $m = n + 1$, then $A$ is square, and so, assuming $A$ is invertible, we can
solve $A x = y$ exactly. In other words, there is no error, and the solution is an interpolating
polynomial, meaning that it fits the data exactly. A proof of the following result can be
found in Exercise .
Lemma 4.12. If $t_1, \ldots, t_{n+1}$ are distinct, $t_i \neq t_j$, then the $(n + 1) \times (n + 1)$
interpolation matrix (4.40) is nonsingular.
This result immediately implies the basic existence theorem for interpolating polynomials.
Theorem 4.13. Let $t_1, \ldots, t_{n+1}$ be distinct sample points. Then, for any prescribed
data $y_1, \ldots, y_{n+1}$, there exists a unique interpolating polynomial (4.38) of degree $\leq n$ with
sample values $y(t_i) = y_i$ for all $i = 1, \ldots, n + 1$.
Thus, two points will determine a unique interpolating line, three points a unique
interpolating parabola, four points an interpolating cubic, and so on. Examples are
illustrated in Figure 4.3.
Example 4.14. The basic ideas of interpolation and least squares fitting of data
can be applied to approximate complicated mathematical functions by much simpler
polynomials. Such approximation schemes are used in all numerical computations: when
you ask your computer or calculator to compute $e^t$ or $\cos t$ or any other function, it only
knows how to add, subtract, multiply and divide, and so must rely on an approximation
scheme based on polynomials†. In the dark ages before computers, one would consult
precomputed tables of values of the function at particular data points. If one needed a
value at a nontabulated point, then some form of polynomial interpolation would typically
be used to accurately approximate the intermediate value.

† Actually, one could also allow interpolation and approximation by rational functions, a
subject known as Padé approximation theory. See [12] for details.

For example, suppose we want to compute reasonably accurate values for the
exponential function $e^t$ for values of $t$ lying in the interval $0 \leq t \leq 1$ by using a quadratic
polynomial
$$p(t) = \alpha + \beta\, t + \gamma\, t^2. \tag{4.41}$$
If we choose 3 points, say $t_1 = 0$, $t_2 = .5$, $t_3 = 1$, then there is a unique quadratic polynomial
(4.41) that interpolates $e^t$ at the data points, i.e.,
$$p(t_i) = e^{t_i} \qquad \text{for} \qquad i = 1, 2, 3.$$
In this case, the coefficient matrix (4.40), namely
$$A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & .5 & .25 \\ 1 & 1 & 1 \end{pmatrix},$$
is invertible. Therefore, we can exactly solve the interpolation equations $A x = y$, where
$$y = \begin{pmatrix} e^{t_1} \\ e^{t_2} \\ e^{t_3} \end{pmatrix} = \begin{pmatrix} 1 \\ 1.64872 \\ 2.71828 \end{pmatrix}$$
is the data vector. The solution
$$x = \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} = \begin{pmatrix} 1. \\ .876603 \\ .841679 \end{pmatrix}$$
yields the interpolating polynomial
$$p(t) = 1 + .876603\, t + .841679\, t^2. \tag{4.42}$$

Figure 4.4. Quadratic Interpolating Polynomial for $e^t$.

It is the unique quadratic polynomial that agrees with $e^t$ at the three specified data points.
See Figure 4.4 for a comparison of the graphs; the first graph shows $e^t$, the second $p(t)$, and
the third lays the two graphs on top of each other. Even with such a simple interpolation
scheme, the two functions are quite close. The $L^\infty$ norm of the difference is
$$\| e^t - p(t) \|_\infty = \max \bigl\{\, | e^t - p(t) | \ : \ 0 \leq t \leq 1 \,\bigr\} \approx .01442,$$
with the maximum error occurring at $t \approx .796$.
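The interpolation equations of Example 4.14 form a small square linear system, so they can be solved directly. The sketch below is an illustration under stated assumptions: `interpolate` and `p` are hypothetical names, floating point arithmetic is used, partial pivoting is added as a standard safeguard (it is not required by the text), and the maximum error is only estimated on a grid of 1001 points.

```python
import math

def interpolate(ts, ys):
    """Coefficients (a_0, ..., a_n) of the interpolating polynomial through (t_i, y_i).

    Solves the square system A x = y of (4.40) by Gaussian elimination with
    partial pivoting; the sample points are assumed distinct.
    """
    n = len(ts)
    M = [[t ** k for k in range(n)] + [y] for t, y in zip(ts, ys)]
    for j in range(n):
        r = max(range(j, n), key=lambda i: abs(M[i][j]))   # pivot row
        M[j], M[r] = M[r], M[j]
        for i in range(j + 1, n):
            m = M[i][j] / M[j][j]
            M[i] = [a - m * b for a, b in zip(M[i], M[j])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

ts = [0.0, 0.5, 1.0]
coeffs = interpolate(ts, [math.exp(t) for t in ts])

def p(t):
    return sum(c * t ** k for k, c in enumerate(coeffs))

# Estimate the maximum error on a uniform grid over [0, 1].
grid = [i / 1000 for i in range(1001)]
max_err, t_max = max((abs(math.exp(t) - p(t)), t) for t in grid)
print(coeffs, max_err, t_max)
```

The computed coefficients agree with (4.42) to the digits shown there, and the grid search locates the largest error near $t \approx .796$.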

There is, in fact, an explicit formula for the interpolating polynomial that is named
after the influential eighteenth century Italo-French mathematician Joseph-Louis Lagrange.
It relies on the basic superposition principle for solving inhomogeneous systems,
Theorem 2.42. Specifically, if we know the solutions $x_1, \ldots, x_{n+1}$ to the particular interpolation
systems
$$A x_k = e_k, \qquad k = 1, \ldots, n + 1, \tag{4.43}$$
where $e_1, \ldots, e_{n+1}$ are the standard basis vectors of $\mathbb{R}^{n+1}$, then the solution to
$$A x = y = y_1 e_1 + \cdots + y_{n+1} e_{n+1}$$
is given by the superposition formula
$$x = y_1 x_1 + \cdots + y_{n+1} x_{n+1}.$$
The particular interpolation equation (4.43) corresponds to interpolation data $y = e_k$,
meaning that $y_k = 1$, while $y_i = 0$ at all points $t_i$ with $i \neq k$. If we can find the
$n + 1$ particular interpolating polynomials that realize this very special data, we can use
superposition to construct the general interpolating polynomial. It turns out that there is
a simple explicit formula for the basic interpolating polynomials.
Theorem 4.15. Given distinct values $t_1, \ldots, t_{n+1}$, the $k$th Lagrange interpolating
polynomial is the degree $n$ polynomial given by
$$L_k(t) = \frac{(t - t_1) \cdots (t - t_{k-1})(t - t_{k+1}) \cdots (t - t_{n+1})}{(t_k - t_1) \cdots (t_k - t_{k-1})(t_k - t_{k+1}) \cdots (t_k - t_{n+1})}, \qquad k = 1, \ldots, n + 1. \tag{4.44}$$
It is the unique polynomial of degree $\leq n$ that satisfies
$$L_k(t_i) = \begin{cases} 1, & i = k, \\ 0, & i \neq k, \end{cases} \qquad i, k = 1, \ldots, n + 1. \tag{4.45}$$

Figure 4.5. Lagrange Interpolating Polynomials $L_1(t)$, $L_2(t)$, $L_3(t)$ for the Points 0, .5, 1.

Proof : The uniqueness of the Lagrange interpolating polynomial is an immediate
consequence of Theorem 4.13. To show that (4.44) is the correct formula, we note that
when $t = t_i$, $i \neq k$, the factor $(t - t_i)$ in the numerator of $L_k(t)$ vanishes, while when $t = t_k$
the numerator and denominator are equal.
Q.E.D.
Theorem 4.16. If $t_1, \ldots, t_{n+1}$ are distinct, then the polynomial of degree $\leq n$ that
interpolates the associated data $y_1, \ldots, y_{n+1}$ is
$$p(t) = y_1 L_1(t) + \cdots + y_{n+1} L_{n+1}(t). \tag{4.46}$$
Proof : We merely compute
$$p(t_k) = y_1 L_1(t_k) + \cdots + y_k L_k(t_k) + \cdots + y_{n+1} L_{n+1}(t_k) = y_k,$$
where, according to (4.45), every summand except the $k$th is zero, while $L_k(t_k) = 1$.
Q.E.D.

Example 4.17. The three quadratic Lagrange interpolating polynomials for the values
$t_1 = 0$, $t_2 = \frac{1}{2}$, $t_3 = 1$ used to interpolate $e^t$ in Example 4.14 are
$$\begin{aligned} L_1(t) &= \frac{(t - \frac{1}{2})(t - 1)}{(0 - \frac{1}{2})(0 - 1)} = 2\, t^2 - 3\, t + 1, \\ L_2(t) &= \frac{(t - 0)(t - 1)}{(\frac{1}{2} - 0)(\frac{1}{2} - 1)} = -4\, t^2 + 4\, t, \\ L_3(t) &= \frac{(t - 0)(t - \frac{1}{2})}{(1 - 0)(1 - \frac{1}{2})} = 2\, t^2 - t. \end{aligned} \tag{4.47}$$
Thus, one can rewrite the quadratic interpolant (4.42) to $e^t$ as
$$y(t) = L_1(t) + e^{1/2} L_2(t) + e\, L_3(t) = (2\, t^2 - 3\, t + 1) + 1.64872\, (-4\, t^2 + 4\, t) + 2.71828\, (2\, t^2 - t).$$
We stress that this is the same interpolating polynomial; we have merely rewritten it in
the more transparent Lagrange form.
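Formulas (4.44) and (4.46) translate almost verbatim into code. The following sketch uses hypothetical function names and 0-based indices (so `k = 1` refers to $L_2$), and checks that the Lagrange form and the monomial form (4.42) give the same value at a sample point.

```python
import math

def lagrange_basis(ts, k, t):
    """Evaluate the k-th Lagrange polynomial (4.44) at t (k is 0-based here)."""
    value = 1.0
    for i, ti in enumerate(ts):
        if i != k:
            value *= (t - ti) / (ts[k] - ti)
    return value

def lagrange_interpolant(ts, ys, t):
    """Evaluate p(t) = sum_k y_k L_k(t), formula (4.46)."""
    return sum(y * lagrange_basis(ts, k, t) for k, y in enumerate(ys))

ts = [0.0, 0.5, 1.0]
ys = [math.exp(t) for t in ts]
t = 0.25
via_lagrange = lagrange_interpolant(ts, ys, t)
via_monomials = 1 + 0.876603 * t + 0.841679 * t ** 2    # rounded coefficients of (4.42)
print(via_lagrange, via_monomials)
```

The two values differ only through the rounding of the coefficients in (4.42). One advantage of the Lagrange form is that no linear system has to be solved; its cost is that every evaluation recomputes the products.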

Figure 4.6. Degree 2, 4 and 10 Interpolating Polynomials for $1/(1 + t^2)$.

One might expect that the higher the degree, the more accurate the interpolating
polynomial. This expectation turns out, unfortunately, not to be uniformly valid. While
low degree interpolating polynomials are usually reasonable approximants to functions,
high degree interpolants are more expensive to compute, and, moreover, can be rather
badly behaved, particularly near the ends of the interval. For example, Figure 4.6 displays
the degree 2, 4 and 10 interpolating polynomials for the function $1/(1 + t^2)$ on the interval
$-3 \leq t \leq 3$ using equally spaced data points. Note the rather poor approximation of the
function near the endpoints of the interval. Higher degree interpolants fare even worse,
although the bad behavior becomes more and more concentrated near the ends of the
interval. As a consequence, high degree polynomial interpolation tends not to be used
in practical applications. Better alternatives rely on least squares approximants by low
degree polynomials, to be described next, and interpolation by piecewise cubic splines, a
topic that will be discussed in depth in Chapter 11.
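Readers without access to the figure can reproduce the experiment numerically. The sketch below is an assumption-laden illustration, not the authors' computation: it evaluates the equally spaced interpolants of $1/(1 + t^2)$ on $-3 \leq t \leq 3$ through the Lagrange form (4.46) and estimates the maximum error for degrees 2, 4 and 10 on a grid of 601 points. The names `interpolant`, `runge` and `max_errors` are ours, and no particular error values are claimed here.

```python
def interpolant(ts, ys, t):
    """Evaluate the Lagrange form (4.46) of the interpolating polynomial at t."""
    total = 0.0
    for k, (tk, yk) in enumerate(zip(ts, ys)):
        term = yk
        for i, ti in enumerate(ts):
            if i != k:
                term *= (t - ti) / (tk - ti)
        total += term
    return total

def runge(t):
    return 1.0 / (1.0 + t * t)

grid = [-3 + 6 * i / 600 for i in range(601)]
max_errors = {}
for n in (2, 4, 10):
    nodes = [-3 + 6 * j / n for j in range(n + 1)]      # n + 1 equally spaced points
    values = [runge(s) for s in nodes]
    max_errors[n] = max(abs(runge(t) - interpolant(nodes, values, t)) for t in grid)
print(max_errors)
```

Printing the location of the largest error as well (for instance with `max` over `(error, t)` pairs) shows whether it sits near the ends of the interval, as the text describes.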
If we have $m > n + 1$ data points, then, usually, there is no degree $n$ polynomial
that fits all the data, and so one must switch over to a least squares approximation. The
first requirement is that the associated $m \times (n + 1)$ interpolation matrix (4.40) has rank
$n + 1$; this follows from Lemma 4.12 provided at least $n + 1$ of the values $t_1, \ldots, t_m$ are
distinct. Thus, given data at $m \geq n + 1$ different sample points $t_1, \ldots, t_m$, we can uniquely
determine the best least squares polynomial of degree $n$ that fits the data by solving the
normal equations (4.33).
Example 4.18. If we use more than three data points, but still require a quadratic
polynomial, then we cannot interpolate exactly, and must use a least squares approximant.
Let us return to the problem of approximating the exponential function e t . For instance,
using five equally spaced sample points t1 = 0, t2 = .25, t3 = .5, t4 = .75, t5 = 1, the
coefficient matrix and sampled data vector (4.40) are
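$$
A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & .25 & .0625 \\ 1 & .5 & .25 \\ 1 & .75 & .5625 \\ 1 & 1 & 1 \end{pmatrix},
\qquad
y = \begin{pmatrix} 1 \\ 1.28403 \\ 1.64872 \\ 2.11700 \\ 2.71828 \end{pmatrix}.
$$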
Figure 4.7. Quadratic Approximating Polynomial and Quartic Interpolating Polynomial for $e^t$.

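The solution to the normal equations (4.26), with
$$
K = A^T A = \begin{pmatrix} 5 & 2.5 & 1.875 \\ 2.5 & 1.875 & 1.5625 \\ 1.875 & 1.5625 & 1.38281 \end{pmatrix},
\qquad
f = A^T y = \begin{pmatrix} 8.76803 \\ 5.45140 \\ 4.40153 \end{pmatrix},
$$
is
$$
x = K^{-1} f = ( 1.00514, \ .864277, \ .843538 )^T .
$$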

This leads to the modified approximating quadratic polynomial
$$ p_2(t) = 1.00514 + .864277\, t + .843538\, t^2 . $$
On the other hand, the quartic interpolating polynomial
$$ p_4(t) = .069416\, t^4 + .140276\, t^3 + .509787\, t^2 + .998803\, t + 1 $$
is found directly from the data values as above. The quadratic polynomial has a maximal
error of .011 (slightly better than the quadratic interpolant), while the quartic has
a significantly smaller maximal error: .0000527. (In this case, high degree interpolants
are not ill behaved.) See Figure 4.7 for a comparison of the graphs, and Example 4.21
below for further discussion.
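The numbers in Example 4.18 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names are illustrative and not part of the text. It builds the sample matrix from the monomials 1, t, t^2 and solves the normal equations directly.

```python
import numpy as np

# Five equally spaced sample points on [0, 1] and the sampled values of e^t.
t = np.linspace(0.0, 1.0, 5)
y = np.exp(t)

# Columns of A are the monomials 1, t, t^2 evaluated at the sample points.
A = np.vander(t, 3, increasing=True)

# Normal equations K x = f, with K = A^T A and f = A^T y.
K = A.T @ A
f = A.T @ y
x = np.linalg.solve(K, f)

print(K)  # Gram matrix of the sampled monomials
print(x)  # coefficients of the least squares quadratic
```

For larger or less well conditioned problems, a routine such as `np.linalg.lstsq(A, y, rcond=None)` avoids forming K explicitly; the normal equations are used here only because they mirror the text.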
Approximation and Interpolation by General Functions
There is nothing special about polynomial functions in the preceding approximation
scheme. For example, suppose we were interested in finding the best $2\pi$-periodic trigonometric approximation
$$ y = \alpha_1 \cos t + \alpha_2 \sin t $$
to a given set of data. Again, the least squares error takes the same form $\| y - A x \|^2$ as
in (4.39), where
$$
A = \begin{pmatrix} \cos t_1 & \sin t_1 \\ \cos t_2 & \sin t_2 \\ \vdots & \vdots \\ \cos t_m & \sin t_m \end{pmatrix},
\qquad
x = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix},
\qquad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}.
$$

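The key is that the unspecified parameters (in this case $\alpha_1, \alpha_2$) occur linearly in the
approximating function. Thus, the most general case is to approximate the data (4.29) by
a linear combination
$$ y(t) = \alpha_1 h_1(t) + \alpha_2 h_2(t) + \cdots + \alpha_n h_n(t), $$
of prescribed, linearly independent functions $h_1(t), \ldots, h_n(t)$. The least squares error is,
as always, given by
$$ \text{Error} = \sqrt{\ \sum_{i=1}^{m} \bigl( y_i - y(t_i) \bigr)^2 \ } = \| y - A x \|, $$
where the coefficient matrix, the vector of unknown coefficients, and the data vector are
$$
A = \begin{pmatrix}
h_1(t_1) & h_2(t_1) & \ldots & h_n(t_1) \\
h_1(t_2) & h_2(t_2) & \ldots & h_n(t_2) \\
\vdots & \vdots & \ddots & \vdots \\
h_1(t_m) & h_2(t_m) & \ldots & h_n(t_m)
\end{pmatrix},
\qquad
x = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix},
\qquad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}.
\tag{4.48}
$$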

Thus, the columns of A are the sampled values of the functions. If A is square and
nonsingular, then we can find an interpolating function of the prescribed form by solving
the linear system
A x = y.
(4.49)
A particularly important case is provided by the $2n + 1$ trigonometric functions
$$ 1, \quad \cos x, \quad \sin x, \quad \cos 2x, \quad \sin 2x, \quad \ldots \quad \cos nx, \quad \sin nx. $$
Interpolation on $2n + 1$ equally spaced data points on the interval $[\,0, 2\pi\,]$ leads to the
discrete Fourier transform, of profound significance in signal processing, data transmission,
and compression, [27]. Trigonometric interpolation and the discrete Fourier transform will
be the focus of Section 13.1.
If there are more than n data points, then we cannot, in general, interpolate exactly,
and must content ourselves with a least squares approximation. The least squares solution
to the interpolation equations (4.49) is found by solving the associated normal equations
$K x = f$, where the $(i, j)$ entry of $K = A^T A$ is $m$ times the average value of the product
of $h_i(t)$ and $h_j(t)$, namely
$$ k_{ij} = m \, \overline{h_i(t)\, h_j(t)} = \sum_{\kappa = 1}^{m} h_i(t_\kappa)\, h_j(t_\kappa), \tag{4.50} $$
whereas the $i$th entry of $f = A^T y$ is $m$ times the average
$$ f_i = m \, \overline{h_i(t)\, y} = \sum_{\kappa = 1}^{m} h_i(t_\kappa)\, y_\kappa . \tag{4.51} $$

The one key question is whether the columns of A are linearly independent; this is more
subtle than the polynomial case covered by Lemma 4.12, and requires the sampled function
vectors to be linearly independent, which in general is different than requiring the functions
themselves to be linearly independent. See Exercise for a few details on the distinction
between these two notions of linear independence.
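The general scheme, and the distinction between independent functions and independent sample vectors, can be illustrated with a short sketch. It assumes NumPy; the helper names `sample_matrix` and `least_squares_coefficients`, and the synthetic data, are hypothetical choices made for this illustration only.

```python
import numpy as np

def sample_matrix(funcs, t):
    """Return A whose columns are the functions h_1, ..., h_n sampled at t."""
    return np.column_stack([h(t) for h in funcs])

def least_squares_coefficients(funcs, t, y):
    """Solve the normal equations (A^T A) x = A^T y for the coefficients."""
    A = sample_matrix(funcs, t)
    if np.linalg.matrix_rank(A) < A.shape[1]:
        raise ValueError("sampled function vectors are linearly dependent")
    return np.linalg.solve(A.T @ A, A.T @ y)

# Synthetic data (made up for the demonstration) lying exactly on 2 cos t - sin t.
t = np.linspace(0.0, 2.0 * np.pi, 9)[:-1]
y = 2.0 * np.cos(t) - np.sin(t)
coeffs = least_squares_coefficients([np.cos, np.sin], t, y)

# cos t and cos 3t are linearly independent as functions, but their samples
# at t = 0 and t = pi coincide, so the sampled columns are dependent.
t_bad = np.array([0.0, np.pi])
A_bad = sample_matrix([np.cos, lambda s: np.cos(3.0 * s)], t_bad)
rank_bad = np.linalg.matrix_rank(A_bad)
```

Here `coeffs` recovers the two coefficients of the synthetic data, while `rank_bad` shows that a poor choice of sample points can destroy the independence that the functions themselves possess.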
If the parameters do not occur linearly in the functional formula, then we cannot use a
linear analysis to find the least squares solution. For example, a direct linear least squares
approach does not suffice to find the frequency $\omega$, the amplitude $r$, and the phase $\delta$ of a
general trigonometric approximation:
$$ y = c_1 \cos \omega t + c_2 \sin \omega t = r \cos( \omega t + \delta ). $$
Approximating data by such a function constitutes a nonlinear minimization problem, and
must be solved by the more sophisticated techniques presented in Section 19.3.
Weighted Least Squares
Another generalization is to introduce weights in the measurement of the least squares
error. Suppose some of the data is known to be more reliable or more significant than
others. For example, measurements at an earlier time may be more accurate, or more
critical to the data fitting problem, than measurements at later times. In that situation,
we should penalize any errors at the earlier times and downplay errors in the later data.
In general, this requires the introduction of a positive weight $c_i > 0$ associated to each
data point $(t_i, y_i)$; the larger the weight, the more important the error. For a straight line
approximation $y = \alpha + \beta t$, the weighted least squares error is defined as
$$
\text{Error} = \sqrt{\ \sum_{i=1}^{m} c_i\, e_i^2 \ } = \sqrt{\ \sum_{i=1}^{m} c_i \bigl[\, y_i - ( \alpha + \beta t_i ) \,\bigr]^2 \ } .
$$

Let us rewrite this formula in matrix form. Let $C = \operatorname{diag}(c_1, \ldots, c_m)$ denote the diagonal
weight matrix. Note that $C > 0$ is positive definite, since all the weights are positive. The
least squares error
$$ \text{Error} = \sqrt{ e^T C\, e } = \| e \| $$
is then the norm of the error vector $e$ with respect to the weighted inner product
$$ \langle v \,; w \rangle = v^T C\, w \tag{4.52} $$
induced by the matrix $C$. Since $e = y - A x$,
$$
\begin{aligned}
\| e \|^2 = \| A x - y \|^2 &= (A x - y)^T C\, (A x - y) \\
&= x^T A^T C A\, x - 2\, x^T A^T C\, y + y^T C\, y = x^T K x - 2\, x^T f + c,
\end{aligned}
\tag{4.53}
$$
where
$$ K = A^T C A, \qquad f = A^T C\, y, \qquad c = y^T C\, y = \| y \|^2 . $$
Note that $K$ is the Gram matrix derived in (3.51), whose entries
$$ k_{ij} = \langle v_i \,; v_j \rangle = v_i^T C\, v_j $$
are the weighted inner products between the column vectors $v_1, \ldots, v_n$ of $A$. Theorem 3.33
immediately implies that $K$ is positive definite provided $A$ has linearly independent
columns or, equivalently, has rank $n \le m$.
Theorem 4.19. Suppose $A$ is an $m \times n$ matrix with linearly independent columns.
Suppose $C > 0$ is any positive definite $m \times m$ matrix. Then, the quadratic function (4.53)
giving the weighted least squares error has a unique minimizer, which is the solution to
the weighted normal equations
$$ A^T C A\, x = A^T C\, y, \qquad \text{so that} \qquad x = (A^T C A)^{-1} A^T C\, y. \tag{4.54} $$

In other words, the weighted least squares solution is obtained by multiplying both
sides of the original system A x = y by the matrix AT C. The derivation of this result
allows C > 0 to be any positive definite matrix. In applications, the off-diagonal entries
of C can be used to weight cross-correlation terms in the data.
Example 4.20. In Example 4.10 we fit the data

    t_i     0     1     3     6
    y_i     2     3     7    12
    c_i     3     2    1/2   1/4

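with an unweighted least squares line. Now we shall assign the weights for the error at
each sample point listed in the last row of the table, so that errors in the first two data
values carry more weight. To find the weighted least squares line $y = \alpha + \beta t$ that best fits
the data, we compute
$$
A^T C A = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 3 & 6 \end{pmatrix}
\begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & \frac12 & 0 \\ 0 & 0 & 0 & \frac14 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 3 \\ 1 & 6 \end{pmatrix}
= \begin{pmatrix} \frac{23}{4} & 5 \\ 5 & \frac{31}{2} \end{pmatrix},
$$
$$
A^T C\, y = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 3 & 6 \end{pmatrix}
\begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & \frac12 & 0 \\ 0 & 0 & 0 & \frac14 \end{pmatrix}
\begin{pmatrix} 2 \\ 3 \\ 7 \\ 12 \end{pmatrix}
= \begin{pmatrix} \frac{37}{2} \\ \frac{69}{2} \end{pmatrix}.
$$
Thus, the weighted normal equations (4.54) reduce to
$$ \tfrac{23}{4}\, \alpha + 5\, \beta = \tfrac{37}{2}, \qquad 5\, \alpha + \tfrac{31}{2}\, \beta = \tfrac{69}{2}, \qquad \text{so} \qquad \alpha = 1.7817, \quad \beta = 1.6511. $$
Therefore, the least squares fit to the data under the given weights is $y = 1.7817 + 1.6511\, t$.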
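The weighted normal equations of this example are small enough to verify by machine. The sketch below assumes NumPy; the sample times, data values, and weights are read off from the matrices displayed in the example, and the variable names are illustrative.

```python
import numpy as np

# Sample times, data values, and weights taken from the matrices in Example 4.20.
t = np.array([0.0, 1.0, 3.0, 6.0])
y = np.array([2.0, 3.0, 7.0, 12.0])
c = np.array([3.0, 2.0, 0.5, 0.25])

A = np.column_stack([np.ones_like(t), t])  # columns: 1 and t
C = np.diag(c)                             # diagonal weight matrix

# Weighted normal equations (A^T C A) x = A^T C y, as in (4.54).
K = A.T @ C @ A
f = A.T @ C @ y
alpha, beta = np.linalg.solve(K, f)

print(K, f, alpha, beta)
```

Setting every weight equal to 1 in `c` reduces the computation to the ordinary, unweighted least squares line.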
Least Squares Approximation in Function Spaces

So far, while we have used least squares minimization to interpolate and approximate
known, complicated functions by simpler polynomials, we have only worried about the
errors committed at a discrete, preassigned set of sample points. A more uniform approach
would be to take into account the errors committed at all points in the interval of interest.
This can be accomplished by replacing the discrete, finite-dimensional vector space norm
on sample vectors by a continuous, infinite-dimensional function space norm in order to
specify the least squares error that must be minimized over the entire interval.
More specifically, we let $V = C^0[\,a, b\,]$ denote the space of continuous functions on the
bounded interval $[\,a, b\,]$ with $L^2$ inner product
$$ \langle f \,; g \rangle = \int_a^b f(t)\, g(t)\, dt. \tag{4.55} $$
Let $P^{(n)}$ denote the subspace consisting of all polynomials of degree $\le n$. For simplicity,
we employ the standard monomial basis $1, t, t^2, \ldots, t^n$. We will be approximating a general
function $f(t) \in C^0[\,a, b\,]$ by a polynomial
$$ p(t) = \alpha_1 + \alpha_2\, t + \cdots + \alpha_{n+1}\, t^n \in P^{(n)} \tag{4.56} $$
of degree at most $n$. The error function $e(t) = f(t) - p(t)$ measures the discrepancy between
the function and its approximating polynomial at each $t$. Instead of summing the squares
of the errors at a finite set of sample points, we go to a continuous limit that integrates
the squared errors of all points in the interval. Thus, the approximating polynomial will
be characterized as the one that minimizes the $L^2$ least squares error
$$ \text{Error} = \| e \| = \| p - f \| = \sqrt{ \int_a^b [\, p(t) - f(t) \,]^2 \, dt } . \tag{4.57} $$

To solve the minimization problem, we begin by substituting (4.56) and expanding,
as in (4.17):
$$
\| p - f \|^2 = \Bigl\| \, \sum_{i=1}^{n+1} \alpha_i\, t^{i-1} - f(t) \, \Bigr\|^2
= \sum_{i,j=1}^{n+1} \alpha_i\, \alpha_j\, \langle t^{i-1} \,; t^{j-1} \rangle
- 2 \sum_{i=1}^{n+1} \alpha_i\, \langle t^{i-1} \,; f(t) \rangle + \| f(t) \|^2 .
$$
As a result, we are led to minimize the same kind of quadratic function
$$ x^T K x - 2\, x^T f + c, \tag{4.58} $$
where $x = ( \alpha_1, \alpha_2, \ldots, \alpha_{n+1} )^T$ is the vector containing the unknown coefficients in the
minimizing polynomial, while
$$
k_{ij} = \langle t^{i-1} \,; t^{j-1} \rangle = \int_a^b t^{i+j-2}\, dt,
\qquad
f_i = \langle t^{i-1} \,; f \rangle = \int_a^b t^{i-1} f(t)\, dt,
\tag{4.59}
$$
are, as before, the Gram matrix $K$ consisting of inner products between basis monomials
along with the vector $f$ of inner products between the monomials and the right hand side.
The coefficients of the least squares minimizing polynomial are thus found by solving the
associated normal equations $K x = f$.
Figure 4.8. Quadratic Least Squares Approximation of $e^t$.

Example 4.21. Let us return to the problem of approximating the exponential
function $f(t) = e^t$ on the interval $0 \le t \le 1$. We consider the subspace $P^{(2)}$ consisting of
all quadratic polynomials
$$ p(t) = \alpha + \beta\, t + \gamma\, t^2 . $$
Using the monomial basis $1, t, t^2$, the normal equations are
$$
\begin{pmatrix} 1 & \frac12 & \frac13 \\ \frac12 & \frac13 & \frac14 \\ \frac13 & \frac14 & \frac15 \end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix}
= \begin{pmatrix} e - 1 \\ 1 \\ e - 2 \end{pmatrix}.
$$
The coefficient matrix is the Gram matrix $K$ consisting of the inner products
$$ \langle t^i \,; t^j \rangle = \int_0^1 t^{i+j}\, dt = \frac{1}{i + j + 1} $$
between basis monomials, while the right hand side is the vector of inner products
$$ \langle e^t \,; t^i \rangle = \int_0^1 t^i e^t \, dt. $$
The solution is computed to be
$$ \alpha = 39\, e - 105 \simeq 1.012991, \qquad \beta = -216\, e + 588 \simeq .851125, \qquad \gamma = 210\, e - 570 \simeq .839184, $$
leading to the least squares quadratic approximant
$$ p^\star(t) = 1.012991 + .851125\, t + .839184\, t^2, \tag{4.60} $$
that is plotted in Figure 4.8. The least squares error is
$$ \| e^t - p^\star(t) \| \simeq .00527593. $$
The maximal error is measured by the $L^\infty$ norm of the difference,
$$ \| e^t - p^\star(t) \|_\infty = \max \bigl\{\, | e^t - p^\star(t) | \ : \ 0 \le t \le 1 \,\bigr\} \simeq .014981815, $$
with the maximum occurring at $t = 1$. Thus, the simple quadratic polynomial (4.60) will
give a reasonable approximation to the first two decimal places in $e^t$ on the entire interval
$[\,0, 1\,]$. A more accurate approximation can be made by taking a higher degree polynomial,
or by decreasing the length of the interval.
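The normal equations of Example 4.21 can be solved numerically as a check. This is a minimal sketch assuming NumPy; the right hand side uses the closed-form values of the integrals of $t^i e^t$ on [0, 1] (obtained by integration by parts), and the two error norms are only estimated on a fine grid rather than computed exactly.

```python
import numpy as np

n = 3  # number of basis monomials: 1, t, t^2

# Gram matrix on [0, 1]: <t^i ; t^j> = 1 / (i + j + 1) for i, j = 0, 1, 2.
K = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])

# Right hand side <e^t ; t^i> for i = 0, 1, 2.
e = np.e
f = np.array([e - 1.0, 1.0, e - 2.0])

coeffs = np.linalg.solve(K, f)

# Grid estimates of the L^2 and maximum errors on [0, 1].
t = np.linspace(0.0, 1.0, 100001)
err = np.exp(t) - (coeffs[0] + coeffs[1] * t + coeffs[2] * t**2)
l2_error = np.sqrt(np.mean(err**2))   # interval has length 1
max_error = np.max(np.abs(err))

print(coeffs, l2_error, max_error)
```

Increasing `n` reproduces larger Hilbert matrices, for which `np.linalg.cond(K)` grows rapidly; this is the ill conditioning mentioned in the remark that follows.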

Remark : Although the least squares polynomial (4.60) minimizes the $L^2$ norm of the
error, it does slightly worse with the $L^\infty$ norm than the previous sample-based minimizer
(4.42). The problem of finding the quadratic polynomial that minimizes the $L^\infty$ norm is
more difficult, and must be solved by nonlinear minimization methods.
Remark : As noted in Example 3.35, the Gram matrix for the simple monomial basis
is the $n \times n$ Hilbert matrix (1.67). The ill conditioned nature of the Hilbert matrix, and the
consequential difficulty in accurately solving the normal equations, complicate the practical numerical implementation of high degree least squares polynomial approximations. A
better approach, based on an alternative orthogonal polynomial basis, will be discussed
in the ensuing Chapter.

Chapter 5
Orthogonality
Orthogonality is the mathematical formalization of the geometrical property of perpendicularity, suitably adapted to general inner product spaces. In finite-dimensional
spaces, bases that consist of mutually orthogonal elements play an essential role in the theory, in applications, and in practical numerical algorithms. Many computations become
dramatically simpler and less prone to numerical instabilities when performed in orthogonal systems. In infinite-dimensional function space, orthogonality unlocks the secrets of
Fourier analysis and its manifold applications, and underlies basic series solution methods
for partial differential equations. Indeed, many large scale modern applications, including
signal processing and computer vision, would be impractical, if not completely infeasible,
were it not for the dramatic simplifying power of orthogonality. As we will later discover,
orthogonal systems naturally arise as eigenvector and eigenfunction bases for symmetric
matrices and self-adjoint boundary value problems for both ordinary and partial differential equations, and so play a major role in both finite-dimensional and infinite-dimensional
analysis and applications.
Orthogonality is motivated by geometry, and the methods have significant geometrical
consequences. Orthogonal matrices play an essential role in the geometry of Euclidean
space, computer graphics, animation, and three-dimensional image analysis, to be discussed
in Chapter 7. The orthogonal projection of a point onto a subspace is the closest point or
least squares minimizer. Moreover, when written in terms of an orthogonal basis for the
subspace, the normal equations underlying least squares analysis have an elegant explicit
solution formula. Yet another important fact is that the four fundamental subspaces of a
matrix form mutually orthogonal pairs. The orthogonality property leads directly to a new
characterization of the compatibility conditions for linear systems known as the Fredholm
alternative.
The duly famous Gram-Schmidt process will convert an arbitrary basis of an inner
product space into an orthogonal basis. As such, it forms one of the key algorithms of
linear analysis, in both finite-dimensional vector spaces and also function space where it
leads to the classical orthogonal polynomials and other systems of orthogonal functions. In
Euclidean space, the Gram-Schmidt process can be re-interpreted as a new kind of matrix
factorization, in which a nonsingular matrix $A = Q R$ is written as the product of an
orthogonal matrix $Q$ and an upper triangular matrix $R$. The $Q R$ factorization underlies
one of the primary numerical algorithms for computing eigenvalues, to be presented in
Section 10.6.

Figure 5.1. Orthogonal Bases in $\mathbb{R}^2$ and $\mathbb{R}^3$.

5.1. Orthogonal Bases.

Let $V$ be a fixed real inner product space. Recall that two elements $v, w \in V$ are
called orthogonal if their inner product vanishes: $\langle v \,; w \rangle = 0$. In the case of vectors in
Euclidean space, this means that they meet at a right angle, as sketched in Figure 5.1.
A particularly important configuration is when $V$ admits a basis consisting of mutually
orthogonal elements.
Definition 5.1. A basis $u_1, \ldots, u_n$ of $V$ is called orthogonal if $\langle u_i \,; u_j \rangle = 0$ for
all $i \ne j$. The basis is called orthonormal if, in addition, each vector has unit length:
$\| u_i \| = 1$, for all $i = 1, \ldots, n$.
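For the Euclidean space $\mathbb{R}^n$ equipped with the standard dot product, the simplest
example of an orthonormal basis is the standard basis
$$
e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix},
\qquad
e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix},
\qquad \ldots \qquad
e_n = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}.
$$
Orthogonality follows because $e_i \cdot e_j = 0$, for $i \ne j$, while $\| e_i \| = 1$ implies normality.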
Since a basis cannot contain the zero vector, there is an easy way to convert an
orthogonal basis to an orthonormal basis. Namely, one replaces each basis vector by a unit
vector pointing in the same direction, as in Lemma 3.16.

The methods can be adapted more or less straightforwardly to complex inner product spaces.
The main complication, as noted in Section 3.6, is that we need to be careful with the order of
vectors appearing in the non-symmetric complex inner products. In this chapter, we will write
all inner product formulas in the proper order so that they retain their validity in complex vector
spaces.

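Lemma 5.2. If $v_1, \ldots, v_n$ is any orthogonal basis, then the normalized vectors
$u_i = v_i / \| v_i \|$ form an orthonormal basis.
Example 5.3. The vectors
$$
v_1 = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix},
\qquad
v_2 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix},
\qquad
v_3 = \begin{pmatrix} 5 \\ -2 \\ 1 \end{pmatrix},
$$
are easily seen to form a basis of $\mathbb{R}^3$. Moreover, they are mutually perpendicular, $v_1 \cdot v_2 =
v_1 \cdot v_3 = v_2 \cdot v_3 = 0$, and so form an orthogonal basis with respect to the standard dot
product on $\mathbb{R}^3$. When we divide each orthogonal basis vector by its length, the result is
the orthonormal basis
$$
u_1 = \frac{1}{\sqrt 6} \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}
= \begin{pmatrix} \frac{1}{\sqrt 6} \\ \frac{2}{\sqrt 6} \\ -\frac{1}{\sqrt 6} \end{pmatrix},
\qquad
u_2 = \frac{1}{\sqrt 5} \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}
= \begin{pmatrix} 0 \\ \frac{1}{\sqrt 5} \\ \frac{2}{\sqrt 5} \end{pmatrix},
\qquad
u_3 = \frac{1}{\sqrt{30}} \begin{pmatrix} 5 \\ -2 \\ 1 \end{pmatrix}
= \begin{pmatrix} \frac{5}{\sqrt{30}} \\ -\frac{2}{\sqrt{30}} \\ \frac{1}{\sqrt{30}} \end{pmatrix},
$$
satisfying $u_1 \cdot u_2 = u_1 \cdot u_3 = u_2 \cdot u_3 = 0$ and $\| u_1 \| = \| u_2 \| = \| u_3 \| = 1$. The appearance
of square roots in the elements of an orthonormal basis is fairly typical.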
A useful observation is that any orthogonal collection of nonzero vectors is automatically linearly independent.
Proposition 5.4. If $v_1, \ldots, v_k \in V$ are nonzero, mutually orthogonal, so $\langle v_i \,; v_j \rangle =
0$ for all $i \ne j$, then they are linearly independent.
Proof : Suppose
$$ c_1 v_1 + \cdots + c_k v_k = 0. $$
Let us take the inner product of the equation with any $v_i$. Using linearity of the inner
product and orthogonality of the elements, we compute
$$ 0 = \langle c_1 v_1 + \cdots + c_k v_k \,; v_i \rangle = c_1 \langle v_1 \,; v_i \rangle + \cdots + c_k \langle v_k \,; v_i \rangle = c_i \langle v_i \,; v_i \rangle = c_i \| v_i \|^2 . $$
Therefore, provided $v_i \ne 0$, we conclude that the coefficient $c_i = 0$. Since this holds for
all $i = 1, \ldots, k$, linear independence of $v_1, \ldots, v_k$ follows.
Q.E.D.
As a direct corollary, we infer that any orthogonal collection of nonzero vectors is
automatically a basis for its span.
Proposition 5.5. Suppose $v_1, \ldots, v_n \in V$ are mutually orthogonal nonzero elements
of an inner product space $V$. Then $v_1, \ldots, v_n$ form an orthogonal basis for their span
$W = \operatorname{span} \{ v_1, \ldots, v_n \} \subset V$, which is therefore a subspace of dimension $n = \dim W$. In
particular, if $\dim V = n$, then they form an orthogonal basis for $V$.
Orthogonality is also of great significance for function spaces.

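Example 5.6. Consider the vector space $P^{(2)}$ consisting of all quadratic polynomials
$p(x) = \alpha + \beta x + \gamma x^2$, equipped with the $L^2$ inner product and norm
$$
\langle p \,; q \rangle = \int_0^1 p(x)\, q(x)\, dx,
\qquad
\| p \| = \sqrt{ \langle p \,; p \rangle } = \sqrt{ \int_0^1 p(x)^2 \, dx } .
$$
The standard monomials $1, x, x^2$ do not form an orthogonal basis. Indeed,
$$ \langle 1 \,; x \rangle = \tfrac12, \qquad \langle 1 \,; x^2 \rangle = \tfrac13, \qquad \langle x \,; x^2 \rangle = \tfrac14 . $$
One orthogonal basis of $P^{(2)}$ is provided by the following polynomials:
$$ p_1(x) = 1, \qquad p_2(x) = x - \tfrac12, \qquad p_3(x) = x^2 - x + \tfrac16 . \tag{5.1} $$
Indeed, one easily verifies that $\langle p_1 \,; p_2 \rangle = \langle p_1 \,; p_3 \rangle = \langle p_2 \,; p_3 \rangle = 0$, while
$$
\| p_1 \| = 1,
\qquad
\| p_2 \| = \frac{1}{\sqrt{12}} = \frac{1}{2 \sqrt 3},
\qquad
\| p_3 \| = \frac{1}{\sqrt{180}} = \frac{1}{6 \sqrt 5} .
\tag{5.2}
$$
The corresponding orthonormal basis is found by dividing each orthogonal basis element
by its norm:
$$
u_1(x) = 1,
\qquad
u_2(x) = \sqrt 3 \, ( 2 x - 1 ),
\qquad
u_3(x) = \sqrt 5 \, \bigl( 6 x^2 - 6 x + 1 \bigr) .
\tag{5.3}
$$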

In Section 5.4 below, we will learn how to construct such orthogonal systems of polynomials.
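The orthogonality relations and norms claimed in Example 5.6 are easy to confirm by machine. The sketch below assumes NumPy and uses its `numpy.polynomial.Polynomial` class; the helper `inner` is a hypothetical name introduced here, and it evaluates the $L^2$ inner product on [0, 1] by integrating the product polynomial term by term.

```python
from numpy.polynomial import Polynomial

def inner(p, q):
    """L^2 inner product on [0, 1] of two polynomials (floating point)."""
    antiderivative = (p * q).integ()
    return antiderivative(1.0) - antiderivative(0.0)

# Coefficients are listed from the constant term upward.
p1 = Polynomial([1.0])                     # 1
p2 = Polynomial([-0.5, 1.0])               # x - 1/2
p3 = Polynomial([1.0 / 6.0, -1.0, 1.0])    # x^2 - x + 1/6

print(inner(p1, p2), inner(p1, p3), inner(p2, p3))   # all vanish up to rounding
print(inner(p2, p2), inner(p3, p3))                  # squared norms 1/12 and 1/180
```

The same helper shows that the monomials are not orthogonal: `inner(Polynomial([1.0]), Polynomial([0.0, 1.0]))` evaluates to 1/2.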
Computations in Orthogonal Bases

What are the advantages of orthogonal and orthonormal bases? Once one has a basis
of a vector space, a key issue is how to express other elements as linear combinations of
the basis elements, that is, to find their coordinates in the prescribed basis. In general,
this is not an easy problem, since it requires solving a system of linear equations, (2.22).
In high dimensional situations arising in applications, computing the solution may require
a considerable, if not infeasible, amount of time and effort.
However, if the basis is orthogonal, or, even better, orthonormal, then the change
of basis computation requires almost no work. This is the crucial insight underlying the
efficacy of both discrete and continuous Fourier methods, large data least squares approximations, signal and image processing, and a multitude of other crucial applications.
Theorem 5.7. Let $u_1, \ldots, u_n$ be an orthonormal basis for an inner product space
$V$. Then one can write any element $v \in V$ as a linear combination
$$ v = c_1 u_1 + \cdots + c_n u_n, \tag{5.4} $$
in which the coordinates
$$ c_i = \langle v \,; u_i \rangle, \qquad i = 1, \ldots, n, \tag{5.5} $$
are explicitly given as inner products. Moreover, the norm
$$ \| v \| = \sqrt{ c_1^2 + \cdots + c_n^2 } = \sqrt{ \ \sum_{i=1}^{n} \langle v \,; u_i \rangle^2 \ } \tag{5.6} $$
is the square root of the sum of the squares of its coordinates.
Proof : Let us compute the inner product of (5.4) with one of the basis vectors. Using
the orthonormality conditions
$$ \langle u_i \,; u_j \rangle = \begin{cases} 0, & i \ne j, \\ 1, & i = j, \end{cases} \tag{5.7} $$
and bilinearity of the inner product, we find
$$
\langle v \,; u_i \rangle = \Bigl\langle \, \sum_{j=1}^{n} c_j u_j \,; u_i \Bigr\rangle
= \sum_{j=1}^{n} c_j \langle u_j \,; u_i \rangle = c_i \| u_i \|^2 = c_i .
$$
To prove formula (5.6), we similarly expand
$$ \| v \|^2 = \langle v \,; v \rangle = \sum_{i,j=1}^{n} c_i\, c_j\, \langle u_i \,; u_j \rangle = \sum_{i=1}^{n} c_i^2 , $$
again making use of the orthonormality of the basis elements.
Q.E.D.

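Example 5.8. Let us rewrite the vector $v = ( 1, 1, 1 )^T$ in terms of the orthonormal
basis
$$
u_1 = \begin{pmatrix} \frac{1}{\sqrt 6} \\ \frac{2}{\sqrt 6} \\ -\frac{1}{\sqrt 6} \end{pmatrix},
\qquad
u_2 = \begin{pmatrix} 0 \\ \frac{1}{\sqrt 5} \\ \frac{2}{\sqrt 5} \end{pmatrix},
\qquad
u_3 = \begin{pmatrix} \frac{5}{\sqrt{30}} \\ -\frac{2}{\sqrt{30}} \\ \frac{1}{\sqrt{30}} \end{pmatrix},
$$
constructed in Example 5.3. Computing the dot products
$$ v \cdot u_1 = \frac{2}{\sqrt 6}, \qquad v \cdot u_2 = \frac{3}{\sqrt 5}, \qquad v \cdot u_3 = \frac{4}{\sqrt{30}}, $$
we conclude that
$$ v = \frac{2}{\sqrt 6}\, u_1 + \frac{3}{\sqrt 5}\, u_2 + \frac{4}{\sqrt{30}}\, u_3 , $$
as the reader can validate. Needless to say, a direct computation based on solving the
associated linear system, as in Chapter 2, is more tedious.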
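The coordinate formula (5.5) amounts to a single matrix-vector product when the orthonormal basis vectors are stored as the columns of a matrix. The following sketch assumes NumPy and uses the basis of Example 5.3, with the signs of the entries fixed by the orthogonality relations and the dot products quoted above; the variable names are illustrative.

```python
import numpy as np

# Orthonormal basis of R^3 from Example 5.3, stored as the columns of U.
U = np.column_stack([
    np.array([1.0, 2.0, -1.0]) / np.sqrt(6.0),
    np.array([0.0, 1.0, 2.0]) / np.sqrt(5.0),
    np.array([5.0, -2.0, 1.0]) / np.sqrt(30.0),
])

v = np.array([1.0, 1.0, 1.0])

# (5.5): each coordinate is an inner product, c_i = <v ; u_i>.
c = U.T @ v

# (5.4): the vector is recovered as the combination c_1 u_1 + ... + c_n u_n.
reconstructed = U @ c

# (5.6): the norm of v equals the Euclidean norm of its coordinate vector.
print(c, reconstructed, np.linalg.norm(v), np.linalg.norm(c))
```

No linear system is solved at any point, which is the practical content of Theorem 5.7.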
While passage from an orthogonal basis to its orthonormal version is elementary (one
simply divides each basis element by its norm), we shall often find it more convenient to
work directly with the unnormalized version. The next result provides the corresponding
formula expressing a vector in terms of an orthogonal, but not necessarily orthonormal
basis. The proof proceeds exactly as in the orthonormal case, and details are left to the
reader.
Theorem 5.9. If $v_1, \ldots, v_n$ form an orthogonal basis, then the corresponding coordinates of a vector
$$ v = a_1 v_1 + \cdots + a_n v_n \qquad \text{are given by} \qquad a_i = \frac{ \langle v \,; v_i \rangle }{ \| v_i \|^2 } . \tag{5.8} $$
In this case, the norm can be computed via the formula
$$ \| v \|^2 = \sum_{i=1}^{n} a_i^2 \, \| v_i \|^2 = \sum_{i=1}^{n} \left( \frac{ \langle v \,; v_i \rangle }{ \| v_i \| } \right)^2 . \tag{5.9} $$

Equation (5.8), along with its orthonormal simplification (5.5), is one of the most
important and useful formulas we shall establish. Applications will appear repeatedly
throughout the remainder of the text.
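Example 5.10. The wavelet basis
$$
v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix},
\qquad
v_2 = \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix},
\qquad
v_3 = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix},
\qquad
v_4 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ -1 \end{pmatrix},
\tag{5.10}
$$
introduced in Example 2.33 is, in fact, an orthogonal basis of $\mathbb{R}^4$. The norms are
$$ \| v_1 \| = 2, \qquad \| v_2 \| = 2, \qquad \| v_3 \| = \sqrt 2, \qquad \| v_4 \| = \sqrt 2 . $$
Therefore, using (5.8), we can readily express any vector as a linear combination of the
wavelet basis vectors. For example,
$$ v = \begin{pmatrix} 4 \\ -2 \\ 1 \\ 5 \end{pmatrix} = 2\, v_1 - v_2 + 3\, v_3 - 2\, v_4 , $$
where the wavelet basis coordinates are computed directly by
$$
\frac{ \langle v \,; v_1 \rangle }{ \| v_1 \|^2 } = \frac{8}{4} = 2,
\qquad
\frac{ \langle v \,; v_2 \rangle }{ \| v_2 \|^2 } = \frac{-4}{4} = -1,
\qquad
\frac{ \langle v \,; v_3 \rangle }{ \| v_3 \|^2 } = \frac{6}{2} = 3,
\qquad
\frac{ \langle v \,; v_4 \rangle }{ \| v_4 \|^2 } = \frac{-4}{2} = -2 .
$$
This is clearly a lot quicker than solving the linear system, as we did in Example 2.33.
Finally, we note that
$$ 46 = \| v \|^2 = 2^2 \| v_1 \|^2 + (-1)^2 \| v_2 \|^2 + 3^2 \| v_3 \|^2 + (-2)^2 \| v_4 \|^2 = 4 \cdot 4 + 1 \cdot 4 + 9 \cdot 2 + 4 \cdot 2, $$
in conformity with (5.9).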
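Formula (5.8) translates directly into code. The sketch below assumes NumPy and the standard dot product; the function name `orthogonal_coordinates` is a hypothetical helper introduced for this illustration, applied to the wavelet basis and the vector of Example 5.10.

```python
import numpy as np

def orthogonal_coordinates(basis, v):
    """Coordinates a_i = <v ; v_i> / ||v_i||^2 with respect to an orthogonal basis."""
    return np.array([np.dot(v, b) / np.dot(b, b) for b in basis])

# Wavelet basis (5.10) of R^4.
wavelets = [
    np.array([1.0, 1.0, 1.0, 1.0]),
    np.array([1.0, 1.0, -1.0, -1.0]),
    np.array([1.0, -1.0, 0.0, 0.0]),
    np.array([0.0, 0.0, 1.0, -1.0]),
]

v = np.array([4.0, -2.0, 1.0, 5.0])
a = orthogonal_coordinates(wavelets, v)

# (5.9): ||v||^2 equals the sum of a_i^2 ||v_i||^2.
norm_squared = sum(ai**2 * np.dot(b, b) for ai, b in zip(a, wavelets))

print(a, norm_squared)
```

The function silently assumes that the supplied vectors really are mutually orthogonal; for a non-orthogonal basis the returned numbers are not the coordinates of `v`.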
Example 5.11. The same formulae are equally valid for orthogonal bases in function
spaces. For example, to express a quadratic polynomial
$$ p(x) = c_1\, p_1(x) + c_2\, p_2(x) + c_3\, p_3(x) = c_1 + c_2 \bigl( x - \tfrac12 \bigr) + c_3 \bigl( x^2 - x + \tfrac16 \bigr) $$
in terms of the orthogonal basis (5.1), we merely compute the inner product integrals
$$
c_1 = \frac{ \langle p \,; p_1 \rangle }{ \| p_1 \|^2 } = \int_0^1 p(x)\, dx,
\qquad
c_2 = \frac{ \langle p \,; p_2 \rangle }{ \| p_2 \|^2 } = 12 \int_0^1 p(x) \bigl( x - \tfrac12 \bigr) dx,
$$
$$
c_3 = \frac{ \langle p \,; p_3 \rangle }{ \| p_3 \|^2 } = 180 \int_0^1 p(x) \bigl( x^2 - x + \tfrac16 \bigr) dx .
$$
Thus, for example,
$$ p(x) = x^2 + x + 1 = \tfrac{11}{6} + 2 \bigl( x - \tfrac12 \bigr) + \bigl( x^2 - x + \tfrac16 \bigr), $$
as is easily checked.

Example 5.12. Perhaps the most important example of an orthogonal basis is
provided by the basic trigonometric functions. Let $T^{(n)}$ denote the vector space consisting
of all trigonometric polynomials
$$ T(x) = \sum_{0 \le j + k \le n} a_{jk}\, (\sin x)^j (\cos x)^k \tag{5.11} $$
of degree $\le n$. The constituent monomials $(\sin x)^j (\cos x)^k$ obviously span $T^{(n)}$, but they
do not form a basis owing to identities stemming from the basic trigonometric formula
$\cos^2 x + \sin^2 x = 1$; see Example 2.19 for additional details. Exercise introduced a more
convenient spanning set consisting of the $2n + 1$ functions
$$ 1, \quad \cos x, \quad \sin x, \quad \cos 2x, \quad \sin 2x, \quad \ldots \quad \cos nx, \quad \sin nx. \tag{5.12} $$
Let us prove that these functions form an orthogonal basis of $T^{(n)}$ with respect to the $L^2$
inner product and norm:
$$
\langle f \,; g \rangle = \int_{-\pi}^{\pi} f(x)\, g(x)\, dx,
\qquad
\| f \|^2 = \int_{-\pi}^{\pi} f(x)^2 \, dx .
\tag{5.13}
$$
The elementary integration formulae
$$
\int_{-\pi}^{\pi} \cos kx \, \cos lx \, dx = \begin{cases} 0, & k \ne l, \\ 2\pi, & k = l = 0, \\ \pi, & k = l \ne 0, \end{cases}
\qquad
\int_{-\pi}^{\pi} \sin kx \, \sin lx \, dx = \begin{cases} 0, & k \ne l, \\ \pi, & k = l \ne 0, \end{cases}
\qquad
\int_{-\pi}^{\pi} \cos kx \, \sin lx \, dx = 0,
\tag{5.14}
$$
which are valid for all nonnegative integers $k, l \ge 0$, imply the orthogonality relations
$$
\begin{aligned}
& \langle \cos kx \,; \cos lx \rangle = \langle \sin kx \,; \sin lx \rangle = 0, \quad k \ne l,
& \qquad & \langle \cos kx \,; \sin lx \rangle = 0, \\
& \| \cos kx \| = \| \sin kx \| = \sqrt{\pi}, \quad k \ne 0,
& \qquad & \| 1 \| = \sqrt{2 \pi} .
\end{aligned}
\tag{5.15}
$$
Proposition 5.5 now assures us that the functions (5.12) form a basis for $T^{(n)}$. One
key consequence is that $\dim T^{(n)} = 2n + 1$, a fact that is not so easy to establish
directly. Orthogonality of the trigonometric functions (5.12) means that we can compute
the coefficients $a_0, \ldots, a_n, b_1, \ldots, b_n$ of any trigonometric polynomial
$$ p(x) = a_0 + \sum_{k=1}^{n} \bigl( a_k \cos kx + b_k \sin kx \bigr) \tag{5.16} $$
by an explicit integration formula. Namely,
$$
\begin{aligned}
a_0 &= \frac{ \langle f \,; 1 \rangle }{ \| 1 \|^2 } = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, dx,
\qquad
a_k = \frac{ \langle f \,; \cos kx \rangle }{ \| \cos kx \|^2 } = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx \, dx, \\
b_k &= \frac{ \langle f \,; \sin kx \rangle }{ \| \sin kx \|^2 } = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx \, dx,
\qquad k \ge 1 .
\end{aligned}
\tag{5.17}
$$
These formulae will play an essential role in the theory and applications of Fourier series;
see Chapter 12.

5.2. The Gram-Schmidt Process.

Once one becomes convinced of the utility of orthogonal and orthonormal bases, the
natural question follows: How can we construct them? A practical algorithm was first
discovered by Laplace in the eighteenth century. Today the algorithm is known as the
Gram-Schmidt process, after its rediscovery by Jorgen Gram, whom we already met in
Chapter 3, and Erhard Schmidt, a nineteenth century German mathematician. It forms
one of the premier algorithms of applied and computational linear algebra.
Let $V$ denote a finite-dimensional inner product space. (To begin with, the reader can
assume $V$ is a subspace of $\mathbb{R}^m$ with the standard Euclidean dot product, although the
algorithm will be formulated in complete generality.) We assume that we already know
some basis $w_1, \ldots, w_n$ of $V$, where $n = \dim V$. Our goal is to use this information to
construct an orthogonal basis $v_1, \ldots, v_n$.
We will construct the orthogonal basis elements one by one. Since initially we are not
worrying about normality, there are no conditions on the first orthogonal basis element $v_1$
and so there is no harm in choosing
$$ v_1 = w_1 . $$
Note that $v_1 \ne 0$ since $w_1$ appears in the original basis. The second basis vector must be
orthogonal to the first: $\langle v_2 \,; v_1 \rangle = 0$. Let us try to arrange this by subtracting a suitable
multiple of $v_1$, and set
$$ v_2 = w_2 - c\, v_1 , $$
where $c$ is a scalar to be determined. The orthogonality condition
$$ 0 = \langle v_2 \,; v_1 \rangle = \langle w_2 \,; v_1 \rangle - c\, \langle v_1 \,; v_1 \rangle = \langle w_2 \,; v_1 \rangle - c\, \| v_1 \|^2 $$
requires that $c = \langle w_2 \,; v_1 \rangle / \| v_1 \|^2$, and therefore
$$ v_2 = w_2 - \frac{ \langle w_2 \,; v_1 \rangle }{ \| v_1 \|^2 }\, v_1 . \tag{5.18} $$
Linear independence of $v_1 = w_1$ and $w_2$ ensures that $v_2 \ne 0$. (Check!)
Next, we construct
$$ v_3 = w_3 - c_1 v_1 - c_2 v_2 $$
by subtracting suitable multiples of the first two orthogonal basis elements from $w_3$. We
want $v_3$ to be orthogonal to both $v_1$ and $v_2$. Since we already arranged that $\langle v_1 \,; v_2 \rangle = 0$,
this requires
$$
0 = \langle v_3 \,; v_1 \rangle = \langle w_3 \,; v_1 \rangle - c_1 \langle v_1 \,; v_1 \rangle,
\qquad
0 = \langle v_3 \,; v_2 \rangle = \langle w_3 \,; v_2 \rangle - c_2 \langle v_2 \,; v_2 \rangle,
$$
and hence
$$ c_1 = \frac{ \langle w_3 \,; v_1 \rangle }{ \| v_1 \|^2 }, \qquad c_2 = \frac{ \langle w_3 \,; v_2 \rangle }{ \| v_2 \|^2 } . $$
Therefore, the next orthogonal basis vector is given by the formula
$$ v_3 = w_3 - \frac{ \langle w_3 \,; v_1 \rangle }{ \| v_1 \|^2 }\, v_1 - \frac{ \langle w_3 \,; v_2 \rangle }{ \| v_2 \|^2 }\, v_2 . $$
Continuing in the same manner, suppose we have already constructed the mutually
orthogonal vectors $v_1, \ldots, v_{k-1}$ as linear combinations of $w_1, \ldots, w_{k-1}$. The next orthogonal basis element $v_k$ will be obtained from $w_k$ by subtracting off a suitable linear
combination of the previous orthogonal basis elements:
$$ v_k = w_k - c_1 v_1 - \cdots - c_{k-1} v_{k-1} . $$
Since $v_1, \ldots, v_{k-1}$ are already orthogonal, the orthogonality constraint
$$ 0 = \langle v_k \,; v_j \rangle = \langle w_k \,; v_j \rangle - c_j \langle v_j \,; v_j \rangle $$
requires
$$ c_j = \frac{ \langle w_k \,; v_j \rangle }{ \| v_j \|^2 } \qquad \text{for} \qquad j = 1, \ldots, k - 1. \tag{5.19} $$
In this fashion, we establish the general Gram-Schmidt formula
$$ v_k = w_k - \sum_{j=1}^{k-1} \frac{ \langle w_k \,; v_j \rangle }{ \| v_j \|^2 }\, v_j , \qquad k = 1, \ldots, n. \tag{5.20} $$
The Gram-Schmidt process (5.20) defines a recursive procedure for constructing the orthogonal basis vectors $v_1, \ldots, v_n$. If we are actually after an orthonormal basis $u_1, \ldots, u_n$,
we merely normalize the resulting orthogonal basis vectors, setting $u_k = v_k / \| v_k \|$ for
$k = 1, \ldots, n$.
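The recursion (5.20) can be sketched in a few lines of code. The version below assumes NumPy and the standard dot product on $\mathbb{R}^m$; the function name `gram_schmidt` and the tolerance `tol` used to detect linear dependence are choices made for this illustration, not part of the text. The sample vectors are the basis of $\mathbb{R}^3$ treated in Example 5.13.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Apply (5.20): return mutually orthogonal vectors v_1, ..., v_n."""
    orthogonal = []
    for w in vectors:
        w = np.asarray(w, dtype=float)
        v = w.copy()
        # Subtract the component of w_k along each previous v_j.
        for vj in orthogonal:
            v = v - (np.dot(w, vj) / np.dot(vj, vj)) * vj
        if np.linalg.norm(v) < tol:
            raise ValueError("input vectors are linearly dependent")
        orthogonal.append(v)
    return orthogonal

w1, w2, w3 = [1, 1, -1], [1, 0, 2], [2, -2, 3]
v1, v2, v3 = gram_schmidt([w1, w2, w3])

# Normalizing each v_k gives the corresponding orthonormal basis.
u1, u2, u3 = (v / np.linalg.norm(v) for v in (v1, v2, v3))

print(v1, v2, v3)
```

The sketch follows the formula literally, taking inner products with the original vector $w_k$. In floating point arithmetic with nearly dependent inputs this form can lose orthogonality, and production code usually relies on a library QR factorization instead.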
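Example 5.13. The vectors
$$
w_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix},
\qquad
w_2 = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix},
\qquad
w_3 = \begin{pmatrix} 2 \\ -2 \\ 3 \end{pmatrix},
\tag{5.21}
$$
are readily seen to form a basis of $\mathbb{R}^3$. To construct an orthogonal basis (with respect
to the standard dot product) using the Gram-Schmidt procedure, we begin by setting
$v_1 = w_1 = ( 1, 1, -1 )^T$. The next basis vector is
$$
v_2 = w_2 - \frac{ w_2 \cdot v_1 }{ \| v_1 \|^2 }\, v_1
= \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} - \frac{-1}{3} \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}
= \begin{pmatrix} \frac43 \\ \frac13 \\ \frac53 \end{pmatrix}.
$$
The last orthogonal basis vector is
$$
v_3 = w_3 - \frac{ w_3 \cdot v_1 }{ \| v_1 \|^2 }\, v_1 - \frac{ w_3 \cdot v_2 }{ \| v_2 \|^2 }\, v_2
= \begin{pmatrix} 2 \\ -2 \\ 3 \end{pmatrix} - \frac{-3}{3} \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}
- \frac{7}{\frac{14}{3}} \begin{pmatrix} \frac43 \\ \frac13 \\ \frac53 \end{pmatrix}
= \begin{pmatrix} 1 \\ -\frac32 \\ -\frac12 \end{pmatrix}.
$$
The reader can easily validate the orthogonality of $v_1, v_2, v_3$.
An orthonormal basis is obtained by dividing each vector by its length. Since
$$ \| v_1 \| = \sqrt 3, \qquad \| v_2 \| = \sqrt{ \frac{14}{3} }, \qquad \| v_3 \| = \sqrt{ \frac{7}{2} }, $$
we produce the corresponding orthonormal basis vectors
$$
u_1 = \begin{pmatrix} \frac{1}{\sqrt 3} \\ \frac{1}{\sqrt 3} \\ -\frac{1}{\sqrt 3} \end{pmatrix},
\qquad
u_2 = \begin{pmatrix} \frac{4}{\sqrt{42}} \\ \frac{1}{\sqrt{42}} \\ \frac{5}{\sqrt{42}} \end{pmatrix},
\qquad
u_3 = \begin{pmatrix} \frac{2}{\sqrt{14}} \\ -\frac{3}{\sqrt{14}} \\ -\frac{1}{\sqrt{14}} \end{pmatrix}.
\tag{5.22}
$$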

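Example 5.14. Here is a typical sort of problem: Find an orthonormal basis (with
respect to the dot product) for the subspace $V \subset \mathbb{R}^4$ consisting of all vectors which are
orthogonal to the vector $a = ( 1, 2, -1, -3 )^T$. Now, a vector $x = ( x_1, x_2, x_3, x_4 )^T$ is
orthogonal to $a$ if and only if
$$ x \cdot a = x_1 + 2\, x_2 - x_3 - 3\, x_4 = 0. $$
Solving this homogeneous linear system by the usual method, we find that the free variables
are $x_2, x_3, x_4$, and so a (non-orthogonal) basis for the subspace is
$$
w_1 = \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix},
\qquad
w_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix},
\qquad
w_3 = \begin{pmatrix} 3 \\ 0 \\ 0 \\ 1 \end{pmatrix}.
$$

This will, in fact, be a consequence of the successful completion of the Gram-Schmidt algorithm and does not need to be checked in advance. If the given vectors were not linearly
independent, then eventually one of the Gram-Schmidt vectors would vanish, and the process will
break down.

To obtain an orthogonal basis, we apply the Gram-Schmidt process. First, $v_1 = w_1 = ( -2, 1, 0, 0 )^T$. The next element is
$$
v_2 = w_2 - \frac{ w_2 \cdot v_1 }{ \| v_1 \|^2 }\, v_1
= \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} - \frac{-2}{5} \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix}
= \begin{pmatrix} \frac15 \\ \frac25 \\ 1 \\ 0 \end{pmatrix}.
$$
The last element of our orthogonal basis is
$$
v_3 = w_3 - \frac{ w_3 \cdot v_1 }{ \| v_1 \|^2 }\, v_1 - \frac{ w_3 \cdot v_2 }{ \| v_2 \|^2 }\, v_2
= \begin{pmatrix} 3 \\ 0 \\ 0 \\ 1 \end{pmatrix} - \frac{-6}{5} \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix}
- \frac{\frac35}{\frac65} \begin{pmatrix} \frac15 \\ \frac25 \\ 1 \\ 0 \end{pmatrix}
= \begin{pmatrix} \frac12 \\ 1 \\ -\frac12 \\ 1 \end{pmatrix}.
$$
An orthonormal basis can then be obtained by dividing each $v_i$ by its length:
$$
u_1 = \begin{pmatrix} -\frac{2}{\sqrt 5} \\ \frac{1}{\sqrt 5} \\ 0 \\ 0 \end{pmatrix},
\qquad
u_2 = \begin{pmatrix} \frac{1}{\sqrt{30}} \\ \frac{2}{\sqrt{30}} \\ \frac{5}{\sqrt{30}} \\ 0 \end{pmatrix},
\qquad
u_3 = \begin{pmatrix} \frac{1}{\sqrt{10}} \\ \frac{2}{\sqrt{10}} \\ -\frac{1}{\sqrt{10}} \\ \frac{2}{\sqrt{10}} \end{pmatrix}.
\tag{5.23}
$$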

The Gram-Schmidt procedure has one final important consequence. By definition, every finite-dimensional vector space admits a basis. Given an inner product, the Gram-Schmidt process enables one to construct an orthogonal and even orthonormal basis of the space. Therefore, we have, in fact, implemented a constructive proof of the existence of orthogonal and orthonormal bases of finite-dimensional inner product spaces. Indeed, the construction shows that there are many different orthogonal and hence orthonormal bases.

Theorem 5.15. A finite-dimensional inner product space has an orthonormal basis.

Modifications of the Gram-Schmidt Process
With the basic Gram-Schmidt algorithm now in hand, it is worth looking at a couple of reformulations that have both practical and theoretical uses. The first is an alternative approach that can be used to directly construct the orthonormal basis vectors $u_1, \ldots, u_n$ from the basis $w_1, \ldots, w_n$.
We begin by replacing each orthogonal basis vector in the basic Gram-Schmidt formula (5.20) by its normalized version $u_j = v_j / \| v_j \|$. As a result, we find that the original basis vectors can be expressed in terms of the orthonormal basis via a triangular system
\[ \begin{aligned}
w_1 &= r_{11} u_1, \\
w_2 &= r_{12} u_1 + r_{22} u_2, \\
w_3 &= r_{13} u_1 + r_{23} u_2 + r_{33} u_3, \\
&\;\;\vdots \\
w_n &= r_{1n} u_1 + r_{2n} u_2 + \cdots + r_{nn} u_n.
\end{aligned} \tag{5.24} \]


The coefficients $r_{ij}$ can, in fact, be directly computed without using the intermediate derivation. Indeed, taking the inner product of the $j$-th equation with the orthonormal basis vector $u_i$, where $i \le j$, we find, in view of the orthonormality constraints (5.7),
\[ \langle w_j ; u_i \rangle = \langle r_{1j} u_1 + \cdots + r_{jj} u_j ; u_i \rangle = r_{1j} \langle u_1 ; u_i \rangle + \cdots + r_{jj} \langle u_j ; u_i \rangle = r_{ij}, \]
and hence
\[ r_{ij} = \langle w_j ; u_i \rangle. \tag{5.25} \]
On the other hand, according to (5.6),
\[ \| w_j \|^2 = \| r_{1j} u_1 + \cdots + r_{jj} u_j \|^2 = r_{1j}^2 + \cdots + r_{j-1,j}^2 + r_{jj}^2. \tag{5.26} \]
The pair of equations (5.25), (5.26) can be rearranged to devise a recursive procedure to compute the orthonormal basis. At stage $j$, we assume that we have already constructed $u_1, \ldots, u_{j-1}$. We then compute
\[ r_{ij} = \langle w_j ; u_i \rangle, \qquad \text{for each} \quad i = 1, \ldots, j-1. \tag{5.27} \]
We obtain the next orthonormal basis vector $u_j$ by the formulae
\[ r_{jj} = \sqrt{ \| w_j \|^2 - r_{1j}^2 - \cdots - r_{j-1,j}^2 }, \qquad
u_j = \frac{ w_j - r_{1j} u_1 - \cdots - r_{j-1,j} u_{j-1} }{ r_{jj} }. \tag{5.28} \]
Running through the formulae (5.27), (5.28) for $j = 1, \ldots, n$ leads to the same orthonormal basis $u_1, \ldots, u_n$ as the previous version of the Gram-Schmidt process.
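The recursion (5.27), (5.28) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `dot` and `gram_schmidt_orthonormal` are hypothetical, the inner product is taken to be the Euclidean dot product, and vectors are plain lists of numbers. The sample input consists of the three vectors used in Example 5.16.

```python
import math

def dot(x, y):
    """Euclidean dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt_orthonormal(ws):
    """Orthonormalize linearly independent vectors via (5.27), (5.28).

    Returns (us, r): us is the list of orthonormal vectors and r[i][j]
    holds the coefficient r_ij (with zero-based indices).
    """
    n = len(ws)
    us = []
    r = [[0.0] * n for _ in range(n)]
    for j, w in enumerate(ws):
        # (5.27): r_ij = <w_j ; u_i> for the already constructed u_i.
        for i, u in enumerate(us):
            r[i][j] = dot(w, u)
        # (5.28): r_jj^2 = ||w_j||^2 - r_1j^2 - ... - r_{j-1,j}^2.
        rjj_sq = dot(w, w) - sum(r[i][j] ** 2 for i in range(j))
        if rjj_sq <= 0.0:
            raise ValueError("vectors appear to be linearly dependent")
        r[j][j] = math.sqrt(rjj_sq)
        # Subtract the components along u_1, ..., u_{j-1}, then normalize.
        residual = list(w)
        for i, u in enumerate(us):
            residual = [x - r[i][j] * c for x, c in zip(residual, u)]
        us.append([x / r[j][j] for x in residual])
    return us, r

us, r = gram_schmidt_orthonormal([[1, 1, -1], [1, 0, 2], [2, -2, 3]])
```

The coefficients are stored in `r` because they reappear later as the entries of the upper triangular factor in the Q R factorization.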
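Example 5.16. Let us apply the revised algorithm to the vectors
\[ w_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \qquad
w_2 = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \qquad
w_3 = \begin{pmatrix} 2 \\ -2 \\ 3 \end{pmatrix}, \]
of Example 5.13. To begin, we set
\[ r_{11} = \| w_1 \| = \sqrt{3}, \qquad
u_1 = \frac{w_1}{r_{11}} = \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \end{pmatrix}. \]
The next step is to compute
\[ r_{12} = \langle w_2 ; u_1 \rangle = -\frac{1}{\sqrt{3}}, \qquad
r_{22} = \sqrt{ \| w_2 \|^2 - r_{12}^2 } = \sqrt{\tfrac{14}{3}}, \qquad
u_2 = \frac{w_2 - r_{12} u_1}{r_{22}} = \begin{pmatrix} \frac{4}{\sqrt{42}} \\ \frac{1}{\sqrt{42}} \\ \frac{5}{\sqrt{42}} \end{pmatrix}. \]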

When j = 1, there is nothing to do.


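The final step yields
\[ r_{13} = \langle w_3 ; u_1 \rangle = -\sqrt{3}, \qquad
r_{23} = \langle w_3 ; u_2 \rangle = \sqrt{\tfrac{21}{2}}, \]
\[ r_{33} = \sqrt{ \| w_3 \|^2 - r_{13}^2 - r_{23}^2 } = \sqrt{\tfrac{7}{2}}, \qquad
u_3 = \frac{w_3 - r_{13} u_1 - r_{23} u_2}{r_{33}} = \begin{pmatrix} \frac{2}{\sqrt{14}} \\ -\frac{3}{\sqrt{14}} \\ -\frac{1}{\sqrt{14}} \end{pmatrix}. \]
As advertised, the result is the same orthonormal basis vectors $u_1, u_2, u_3$ found in Example 5.13.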
For hand computations, the orthogonal version (5.20) of the Gram-Schmidt process is slightly easier, even if one does ultimately want an orthonormal basis, since it avoids the square roots that are ubiquitous in the orthonormal version (5.27), (5.28). On the other hand, for numerical implementation on a computer, the orthonormal version is a bit faster, as it involves fewer arithmetic operations.
However, in practical, large scale computations, both versions of the Gram-Schmidt process suffer from a serious flaw. They are subject to numerical instabilities, and so round-off errors may seriously corrupt the computations, producing inaccurate, non-orthogonal vectors. Fortunately, there is a simple rearrangement of the calculation that obviates this difficulty and leads to a numerically robust algorithm that is used in practice. The idea is to treat the vectors simultaneously rather than sequentially, making full use of the orthonormal basis vectors as they arise. More specifically, the algorithm begins as before: we take $u_1 = w_1 / \| w_1 \|$. We then subtract off the appropriate multiples of $u_1$ from all of the remaining basis vectors so as to arrange their orthogonality to $u_1$. This is accomplished by setting
\[ w_k^{(2)} = w_k - \langle w_k ; u_1 \rangle\, u_1, \qquad \text{for} \quad k = 2, \ldots, n. \]
The second orthonormal basis vector $u_2 = w_2^{(2)} / \| w_2^{(2)} \|$ is then obtained by normalizing. We next modify the remaining vectors $w_3^{(2)}, \ldots, w_n^{(2)}$ to produce vectors
\[ w_k^{(3)} = w_k^{(2)} - \langle w_k^{(2)} ; u_2 \rangle\, u_2, \qquad k = 3, \ldots, n, \]
that are orthogonal to both $u_1$ and $u_2$. Then $u_3 = w_3^{(3)} / \| w_3^{(3)} \|$ is taken as the next orthonormal basis element, and so on. The full algorithm starts with the initial basis vectors $w_k^{(1)} = w_k$, $k = 1, \ldots, n$, and then recursively computes
\[ u_j = \frac{ w_j^{(j)} }{ \| w_j^{(j)} \| }, \qquad
w_k^{(j+1)} = w_k^{(j)} - \langle w_k^{(j)} ; u_j \rangle\, u_j, \qquad
\begin{aligned} j &= 1, \ldots, n, \\ k &= j + 1, \ldots, n. \end{aligned} \tag{5.29} \]
(In the final phase, when $j = n$, the second formula is no longer relevant.) The result is a numerically stable computation of the same orthonormal basis vectors $u_1, \ldots, u_n$.
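The scheme (5.29) translates almost line by line into code. The sketch below is an illustration only: the name `stable_gram_schmidt` is hypothetical, the inner product is assumed to be the Euclidean dot product, and the exact test for a zero norm would normally be replaced by a tolerance in floating point work. The sample input is the basis of Example 5.17.

```python
import math

def stable_gram_schmidt(ws):
    """Orthonormalize the vectors ws following the scheme (5.29)."""
    work = [list(map(float, w)) for w in ws]    # w_k^(1) = w_k
    us = []
    for j in range(len(work)):
        norm = math.sqrt(sum(x * x for x in work[j]))
        if norm == 0.0:
            raise ValueError("vectors are linearly dependent")
        u = [x / norm for x in work[j]]         # u_j = w_j^(j) / ||w_j^(j)||
        us.append(u)
        # Remove the u_j component from every remaining vector at once.
        for k in range(j + 1, len(work)):
            c = sum(x * y for x, y in zip(work[k], u))
            work[k] = [x - c * y for x, y in zip(work[k], u)]
    return us

us = stable_gram_schmidt([[2, 2, -1], [0, 4, -1], [1, 2, -3]])
```

The only difference from the earlier recursion is the order of the subtractions: each new $u_j$ is removed from all later vectors immediately, so every inner product is taken with an already updated vector.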

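Example 5.17. Let us apply the stable Gram-Schmidt process (5.29) to the basis vectors
\[ w_1^{(1)} = w_1 = \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix}, \qquad
w_2^{(1)} = w_2 = \begin{pmatrix} 0 \\ 4 \\ -1 \end{pmatrix}, \qquad
w_3^{(1)} = w_3 = \begin{pmatrix} 1 \\ 2 \\ -3 \end{pmatrix}. \]
The first orthonormal basis vector is
\[ u_1 = \frac{ w_1^{(1)} }{ \| w_1^{(1)} \| } = \begin{pmatrix} \frac23 \\ \frac23 \\ -\frac13 \end{pmatrix}. \]
Next, we compute
\[ w_2^{(2)} = w_2^{(1)} - \langle w_2^{(1)} ; u_1 \rangle\, u_1 = \begin{pmatrix} -2 \\ 2 \\ 0 \end{pmatrix}, \qquad
w_3^{(2)} = w_3^{(1)} - \langle w_3^{(1)} ; u_1 \rangle\, u_1 = \begin{pmatrix} -1 \\ 0 \\ -2 \end{pmatrix}. \]
The second orthonormal basis vector is
\[ u_2 = \frac{ w_2^{(2)} }{ \| w_2^{(2)} \| } = \begin{pmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \end{pmatrix}. \]
Finally,
\[ w_3^{(3)} = w_3^{(2)} - \langle w_3^{(2)} ; u_2 \rangle\, u_2 = \begin{pmatrix} -\frac12 \\ -\frac12 \\ -2 \end{pmatrix}, \qquad
u_3 = \frac{ w_3^{(3)} }{ \| w_3^{(3)} \| } = \begin{pmatrix} -\frac{\sqrt{2}}{6} \\ -\frac{\sqrt{2}}{6} \\ -\frac{2\sqrt{2}}{3} \end{pmatrix}. \]
The resulting vectors $u_1, u_2, u_3$ form the desired orthonormal basis.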

5.3. Orthogonal Matrices.

Matrices whose columns form an orthonormal basis of $\mathbb{R}^n$ relative to the standard Euclidean dot product have a distinguished role. Such orthogonal matrices appear in a wide range of applications in geometry, physics, quantum mechanics, partial differential equations, symmetry theory, and special functions. Rotational motions of bodies in three-dimensional space are described by orthogonal matrices, and hence they lie at the foundations of rigid body mechanics, including satellite and underwater vehicle motions, as well as three-dimensional computer graphics and animation. Furthermore, orthogonal matrices are an essential ingredient in one of the most important methods of numerical linear algebra: the Q R algorithm for computing eigenvalues of matrices, to be presented in Section 10.6.

Definition 5.18. A square matrix $Q$ is called an orthogonal matrix if it satisfies
\[ Q^T Q = I. \tag{5.30} \]
The orthogonality condition implies that one can easily invert an orthogonal matrix:
\[ Q^{-1} = Q^T. \tag{5.31} \]
In fact the two conditions are equivalent, and hence a matrix is orthogonal if and only if its inverse is equal to its transpose. The second important characterization of orthogonal matrices relates them directly to orthonormal bases.

Proposition 5.19. A matrix $Q$ is orthogonal if and only if its columns form an orthonormal basis with respect to the Euclidean dot product on $\mathbb{R}^n$.

Proof: Let $u_1, \ldots, u_n$ be the columns of $Q$. Then $u_1^T, \ldots, u_n^T$ are the rows of the transposed matrix $Q^T$. The $(i, j)$-th entry of the product $Q^T Q = I$ is given as the product of the $i$-th row of $Q^T$ times the $j$-th column of $Q$. Thus,
\[ u_i \cdot u_j = u_i^T u_j = \begin{cases} 1, & i = j, \\ 0, & i \neq j, \end{cases} \]
which are precisely the conditions (5.7) for $u_1, \ldots, u_n$ to form an orthonormal basis. Q.E.D.

Warning: Technically, we should be referring to an "orthonormal" matrix, not an "orthogonal" matrix. But the terminology is so standard throughout mathematics that we have no choice but to adopt it here. There is no commonly accepted term for a matrix whose columns form an orthogonal but not orthonormal basis.

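Example 5.20. A $2 \times 2$ matrix $Q = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is orthogonal if and only if its columns $u_1 = \begin{pmatrix} a \\ c \end{pmatrix}$, $u_2 = \begin{pmatrix} b \\ d \end{pmatrix}$, form an orthonormal basis of $\mathbb{R}^2$. Equivalently, the requirement
\[ Q^T Q = \begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix}
= \begin{pmatrix} a^2 + c^2 & a b + c d \\ a b + c d & b^2 + d^2 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]
implies that its entries must satisfy the algebraic equations
\[ a^2 + c^2 = 1, \qquad a b + c d = 0, \qquad b^2 + d^2 = 1. \]
The first and last equations say the points $( a, c )^T$ and $( b, d )^T$ lie on the unit circle in $\mathbb{R}^2$, and so
\[ a = \cos\theta, \qquad c = \sin\theta, \qquad b = \cos\psi, \qquad d = \sin\psi, \]
for some choice of angles $\theta, \psi$. The remaining orthogonality condition is
\[ 0 = a b + c d = \cos\theta \cos\psi + \sin\theta \sin\psi = \cos(\theta - \psi). \]
This implies that $\theta$ and $\psi$ differ by a right angle: $\psi = \theta \pm \frac12 \pi$. The $\pm$ sign leads to two cases:
\[ b = -\sin\theta, \quad d = \cos\theta, \qquad \text{or} \qquad b = \sin\theta, \quad d = -\cos\theta. \]
As a result, every $2 \times 2$ orthogonal matrix has one of two possible forms
\[ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\qquad \text{or} \qquad
\begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix},
\qquad \text{where} \quad 0 \le \theta < 2\pi. \tag{5.32} \]
The corresponding orthonormal bases are illustrated in Figure 5.2. Note that the former is a right-handed basis which can be obtained from the standard basis $e_1, e_2$ by a rotation through angle $\theta$, while the latter has the opposite, reflected orientation.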
Figure 5.2. Orthonormal Bases in $\mathbb{R}^2$.

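Example 5.21. A $3 \times 3$ orthogonal matrix $Q = ( u_1 \; u_2 \; u_3 )$ is prescribed by 3 mutually perpendicular vectors of unit length in $\mathbb{R}^3$. For instance, the orthonormal basis constructed in (5.22) corresponds to the orthogonal matrix
\[ Q = \begin{pmatrix}
\frac{1}{\sqrt{3}} & \frac{4}{\sqrt{42}} & \frac{2}{\sqrt{14}} \\
\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{42}} & -\frac{3}{\sqrt{14}} \\
-\frac{1}{\sqrt{3}} & \frac{5}{\sqrt{42}} & -\frac{1}{\sqrt{14}}
\end{pmatrix}. \]
A complete list of $3 \times 3$ orthogonal matrices can be found in Exercises  and .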

Lemma 5.22. An orthogonal matrix has determinant $\det Q = \pm 1$.

Proof: Taking the determinant of (5.30) gives
\[ 1 = \det I = \det(Q^T Q) = \det Q^T \, \det Q = (\det Q)^2, \]
which immediately proves the lemma. Q.E.D.

An orthogonal matrix is called proper if it has determinant $+1$. Geometrically, the columns of a proper orthogonal matrix form a right-handed basis of $\mathbb{R}^n$, as defined in Exercise . An improper orthogonal matrix, with determinant $-1$, corresponds to a left-handed basis that lives in a mirror image world.

Proposition 5.23. The product of two orthogonal matrices is also orthogonal.

Proof: If $Q_1^T Q_1 = I = Q_2^T Q_2$, then $(Q_1 Q_2)^T (Q_1 Q_2) = Q_2^T Q_1^T Q_1 Q_2 = Q_2^T Q_2 = I$, and so $Q_1 Q_2$ is also orthogonal. Q.E.D.

This property says that the set of all orthogonal matrices forms a group, known as the orthogonal group. The orthogonal group lies at the foundation of everyday Euclidean geometry.
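The statements above are easy to test numerically for the two families of $2 \times 2$ orthogonal matrices in (5.32). The following sketch is an illustration only; the helper names (`matmul`, `transpose`, `is_orthogonal`, `rotation`, `reflection`, `det2`) and the sample angles are assumptions, and the comparison with the identity matrix uses a small floating point tolerance.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def is_orthogonal(q, tol=1e-12):
    """Check the condition Q^T Q = I of (5.30), entry by entry."""
    p = matmul(transpose(q), q)
    n = len(q)
    return all(abs(p[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

def rotation(theta):
    """First form in (5.32): a proper orthogonal matrix."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

def reflection(theta):
    """Second form in (5.32): an improper orthogonal matrix."""
    return [[math.cos(theta), math.sin(theta)],
            [math.sin(theta), -math.cos(theta)]]

def det2(q):
    return q[0][0] * q[1][1] - q[0][1] * q[1][0]

rot, ref = rotation(0.7), reflection(1.9)
product = matmul(rot, ref)   # orthogonal again, by Proposition 5.23
```

Here `det2(rot)` is $+1$ and `det2(ref)` is $-1$ up to round-off, in agreement with Lemma 5.22, and the product of the two is again improper.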

The precise mathematical definition of a group can be found in Exercise . Although they
will not play a significant role in this text, groups are the mathematical formalization of symmetry and, as such, form one of the most fundamental concepts in advanced mathematics and its
applications, particularly quantum mechanics and modern theoretical physics. Indeed, according
to the mathematician Felix Klein, cf. [ 152 ], all geometry is based on group theory.


The Q R Factorization
The Gram-Schmidt procedure for orthonormalizing bases of $\mathbb{R}^n$ can be reinterpreted as a matrix factorization. This is more subtle than the L U factorization that resulted from Gaussian elimination, but is of comparable importance, and is used in a broad range of applications in mathematics, physics, engineering and numerical analysis.
Let $w_1, \ldots, w_n$ be a basis of $\mathbb{R}^n$, and let $u_1, \ldots, u_n$ be the corresponding orthonormal basis that results from any one of the three implementations of the Gram-Schmidt process. We assemble both sets of column vectors to form nonsingular $n \times n$ matrices
\[ A = ( w_1 \; w_2 \; \ldots \; w_n ), \qquad Q = ( u_1 \; u_2 \; \ldots \; u_n ). \]
Since the $u_i$ form an orthonormal basis, $Q$ is an orthogonal matrix. In view of the matrix multiplication formula (2.14), the Gram-Schmidt equations (5.24) can be recast into an equivalent matrix form:
\[ A = Q R, \qquad \text{where} \qquad
R = \begin{pmatrix}
r_{11} & r_{12} & \ldots & r_{1n} \\
0 & r_{22} & \ldots & r_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & r_{nn}
\end{pmatrix} \tag{5.33} \]
is an upper triangular matrix, whose entries are the previously computed coefficients (5.27), (5.28). Since the Gram-Schmidt process works on any basis, the only requirement on the matrix $A$ is that its columns form a basis of $\mathbb{R}^n$, and hence $A$ can be any nonsingular matrix. We have therefore established the celebrated Q R factorization of nonsingular matrices.

Theorem 5.24. Any nonsingular matrix $A$ can be factorized, $A = Q R$, into the product of an orthogonal matrix $Q$ and an upper triangular matrix $R$. The factorization is unique if all the diagonal entries of $R$ are assumed to be positive.
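The proof of uniqueness is left to Exercise .

Example 5.25. The columns of the matrix
\[ A = \begin{pmatrix} 1 & 1 & 2 \\ 1 & 0 & -2 \\ -1 & 2 & 3 \end{pmatrix} \]
are the same as the basis vectors considered in Example 5.16. The orthonormal basis (5.22) constructed using the Gram-Schmidt algorithm leads to the orthogonal and upper triangular matrices
\[ Q = \begin{pmatrix}
\frac{1}{\sqrt{3}} & \frac{4}{\sqrt{42}} & \frac{2}{\sqrt{14}} \\
\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{42}} & -\frac{3}{\sqrt{14}} \\
-\frac{1}{\sqrt{3}} & \frac{5}{\sqrt{42}} & -\frac{1}{\sqrt{14}}
\end{pmatrix}, \qquad
R = \begin{pmatrix}
\sqrt{3} & -\frac{1}{\sqrt{3}} & -\sqrt{3} \\
0 & \frac{\sqrt{14}}{\sqrt{3}} & \frac{\sqrt{21}}{\sqrt{2}} \\
0 & 0 & \frac{\sqrt{7}}{\sqrt{2}}
\end{pmatrix}. \]
The reader may wish to verify that, indeed, $A = Q R$.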

While any of the three implementations of the Gram-Schmidt algorithm will produce the Q R factorization of a given matrix $A = ( w_1 \; w_2 \; \ldots \; w_n )$, the stable version, as encoded in equations (5.29), is the one to use in practical computations, as it is the least likely to fail due to numerical artifacts arising from round-off errors. The accompanying pseudocode program reformulates the algorithm purely in terms of the matrix entries $a_{ij}$ of $A$. During the course of the algorithm, the entries of the matrix $A$ are successively overwritten; the final result is the orthogonal matrix $Q$ appearing in place of $A$. The entries $r_{ij}$ of $R$ must be stored separately.

Q R Factorization of a Matrix A

    start
    for j = 1 to n
        set r_jj = sqrt( a_1j^2 + ... + a_nj^2 )
        if r_jj = 0, stop; print "A has linearly dependent columns"
        else for i = 1 to n
            set a_ij = a_ij / r_jj
        next i
        for k = j + 1 to n
            set r_jk = a_1j a_1k + ... + a_nj a_nk
            for i = 1 to n
                set a_ik = a_ik - a_ij r_jk
            next i
        next k
    next j
    end
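A direct transcription of the pseudocode program into Python might look as follows. This is a minimal sketch under stated assumptions, not a library routine: the name `qr_factorize` is hypothetical, the matrix is square and stored as a list of rows, and a working copy is modified instead of overwriting $A$ itself. As in the pseudocode, the test for dependent columns is an exact comparison with zero; in floating point practice a small tolerance would be used. The sample matrix is the one factorized by hand in Example 5.26.

```python
import math

def qr_factorize(a):
    """Return (q, r) with a = q r, following the pseudocode program."""
    n = len(a)
    q = [list(map(float, row)) for row in a]   # working copy of A
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # r_jj is the norm of the current j-th column.
        r[j][j] = math.sqrt(sum(q[i][j] ** 2 for i in range(n)))
        if r[j][j] == 0.0:
            raise ValueError("A has linearly dependent columns")
        for i in range(n):
            q[i][j] /= r[j][j]
        for k in range(j + 1, n):
            # r_jk is the dot product of the normalized column j with column k.
            r[j][k] = sum(q[i][j] * q[i][k] for i in range(n))
            for i in range(n):
                q[i][k] -= q[i][j] * r[j][k]
    return q, r

a = [[2, 1, 0, 0], [1, 2, 1, 0], [0, 1, 2, 1], [0, 0, 1, 2]]
q, r = qr_factorize(a)
```

Returning `q` and `r` separately, instead of overwriting the input, trades a little memory for the ability to check the product `q r` against the original matrix afterwards.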
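Example 5.26. Let us factorize the matrix
\[ A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix} \]
using the numerically stable Q R algorithm. As in the program, we work directly on the matrix $A$, gradually changing it into orthogonal form. In the first loop, we set $r_{11} = \sqrt{5}$ to be the norm of the first column vector of $A$. We then normalize the first column by dividing by $r_{11}$; the resulting matrix is
\[ \begin{pmatrix} \frac{2}{\sqrt{5}} & 1 & 0 & 0 \\ \frac{1}{\sqrt{5}} & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}. \]
The next entries $r_{12} = \frac{4}{\sqrt{5}}$, $r_{13} = \frac{1}{\sqrt{5}}$, $r_{14} = 0$, are obtained by taking the dot products of the first column with the other three columns. For $j = 2, 3, 4$, we subtract $r_{1j}$ times the first column from the $j$-th column; the result
\[ \begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac35 & -\frac25 & 0 \\ \frac{1}{\sqrt{5}} & \frac65 & \frac45 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix} \]
is a matrix whose first column is normalized to have unit length, and whose second, third and fourth columns are orthogonal to it. In the next loop, we normalize the second column by dividing by its norm $r_{22} = \sqrt{\frac{14}{5}}$, and so obtain the matrix
\[ \begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{3}{\sqrt{70}} & -\frac25 & 0 \\ \frac{1}{\sqrt{5}} & \frac{6}{\sqrt{70}} & \frac45 & 0 \\ 0 & \frac{5}{\sqrt{70}} & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}. \]
We then take dot products of the second column with the remaining two columns to produce $r_{23} = \frac{16}{\sqrt{70}}$, $r_{24} = \frac{\sqrt{5}}{\sqrt{14}}$. Subtracting these multiples of the second column from the third and fourth columns, we obtain
\[ \begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{3}{\sqrt{70}} & \frac27 & \frac{3}{14} \\ \frac{1}{\sqrt{5}} & \frac{6}{\sqrt{70}} & -\frac47 & -\frac37 \\ 0 & \frac{5}{\sqrt{70}} & \frac67 & \frac{9}{14} \\ 0 & 0 & 1 & 2 \end{pmatrix}, \]
which now has its first two columns orthonormalized, and orthogonal to the last two columns. We then normalize the third column by dividing by $r_{33} = \sqrt{\frac{15}{7}}$, and so
\[ \begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{3}{\sqrt{70}} & \frac{2}{\sqrt{105}} & \frac{3}{14} \\ \frac{1}{\sqrt{5}} & \frac{6}{\sqrt{70}} & -\frac{4}{\sqrt{105}} & -\frac37 \\ 0 & \frac{5}{\sqrt{70}} & \frac{6}{\sqrt{105}} & \frac{9}{14} \\ 0 & 0 & \frac{7}{\sqrt{105}} & 2 \end{pmatrix}. \]
Finally, we subtract $r_{34} = \frac{20}{\sqrt{105}}$ times the third column from the fourth column. Dividing the resulting fourth column by its norm $r_{44} = \sqrt{\frac56}$ results in the final formulas
\[ Q = \begin{pmatrix}
\frac{2}{\sqrt{5}} & -\frac{3}{\sqrt{70}} & \frac{2}{\sqrt{105}} & -\frac{1}{\sqrt{30}} \\
\frac{1}{\sqrt{5}} & \frac{6}{\sqrt{70}} & -\frac{4}{\sqrt{105}} & \frac{2}{\sqrt{30}} \\
0 & \frac{5}{\sqrt{70}} & \frac{6}{\sqrt{105}} & -\frac{3}{\sqrt{30}} \\
0 & 0 & \frac{7}{\sqrt{105}} & \frac{4}{\sqrt{30}}
\end{pmatrix}, \qquad
R = \begin{pmatrix}
\sqrt{5} & \frac{4}{\sqrt{5}} & \frac{1}{\sqrt{5}} & 0 \\
0 & \frac{\sqrt{14}}{\sqrt{5}} & \frac{16}{\sqrt{70}} & \frac{\sqrt{5}}{\sqrt{14}} \\
0 & 0 & \frac{\sqrt{15}}{\sqrt{7}} & \frac{20}{\sqrt{105}} \\
0 & 0 & 0 & \frac{\sqrt{5}}{\sqrt{6}}
\end{pmatrix}, \]
for the $A = Q R$ factorization.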

The Q R factorization can be used as an alternative to Gaussian elimination to solve linear systems. Indeed, the system
\[ A x = b \qquad \text{becomes} \qquad Q R x = b, \qquad \text{and hence} \qquad R x = Q^T b, \tag{5.34} \]
since $Q^{-1} = Q^T$ for an orthogonal matrix. Since $R$ is upper triangular, the latter system can be solved for $x$ by back substitution. The resulting algorithm, while more expensive to compute, does offer some numerical advantages over traditional Gaussian elimination as it is less prone to inaccuracies resulting from ill-conditioned coefficient matrices.
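Example 5.27. Let us apply the $A = Q R$ factorization
\[ \begin{pmatrix} 1 & 1 & 2 \\ 1 & 0 & -2 \\ -1 & 2 & 3 \end{pmatrix}
= \begin{pmatrix}
\frac{1}{\sqrt{3}} & \frac{4}{\sqrt{42}} & \frac{2}{\sqrt{14}} \\
\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{42}} & -\frac{3}{\sqrt{14}} \\
-\frac{1}{\sqrt{3}} & \frac{5}{\sqrt{42}} & -\frac{1}{\sqrt{14}}
\end{pmatrix}
\begin{pmatrix}
\sqrt{3} & -\frac{1}{\sqrt{3}} & -\sqrt{3} \\
0 & \frac{\sqrt{14}}{\sqrt{3}} & \frac{\sqrt{21}}{\sqrt{2}} \\
0 & 0 & \frac{\sqrt{7}}{\sqrt{2}}
\end{pmatrix} \]
that we found in Example 5.25 to solve the linear system $A x = ( 0, -4, 5 )^T$. We first compute
\[ Q^T b = \begin{pmatrix}
\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \\
\frac{4}{\sqrt{42}} & \frac{1}{\sqrt{42}} & \frac{5}{\sqrt{42}} \\
\frac{2}{\sqrt{14}} & -\frac{3}{\sqrt{14}} & -\frac{1}{\sqrt{14}}
\end{pmatrix}
\begin{pmatrix} 0 \\ -4 \\ 5 \end{pmatrix}
= \begin{pmatrix} -3\sqrt{3} \\ \frac{\sqrt{21}}{\sqrt{2}} \\ \frac{\sqrt{7}}{\sqrt{2}} \end{pmatrix}. \]
We then solve the upper triangular system
\[ R x = \begin{pmatrix}
\sqrt{3} & -\frac{1}{\sqrt{3}} & -\sqrt{3} \\
0 & \frac{\sqrt{14}}{\sqrt{3}} & \frac{\sqrt{21}}{\sqrt{2}} \\
0 & 0 & \frac{\sqrt{7}}{\sqrt{2}}
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} -3\sqrt{3} \\ \frac{\sqrt{21}}{\sqrt{2}} \\ \frac{\sqrt{7}}{\sqrt{2}} \end{pmatrix} \]
by back substitution, leading to the solution $x = ( -2, 0, 1 )^T$.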
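The two steps of (5.34), forming $Q^T b$ and then back substituting, can be sketched as follows. The code is an illustration under stated assumptions: the helper name `back_substitute` is hypothetical, and the factors `q` and `r` are simply typed in from Example 5.25 instead of being computed.

```python
import math

def back_substitute(r, c):
    """Solve the upper triangular system r x = c."""
    n = len(c)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(r[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (c[i] - s) / r[i][i]
    return x

s3, s14, s42 = math.sqrt(3), math.sqrt(14), math.sqrt(42)
# Q and R from Example 5.25, stored as lists of rows.
q = [[1 / s3, 4 / s42, 2 / s14],
     [1 / s3, 1 / s42, -3 / s14],
     [-1 / s3, 5 / s42, -1 / s14]]
r = [[s3, -1 / s3, -s3],
     [0.0, math.sqrt(14 / 3), math.sqrt(21 / 2)],
     [0.0, 0.0, math.sqrt(7 / 2)]]
b = [0.0, -4.0, 5.0]

# c = Q^T b: entry j is the dot product of column j of Q with b.
c = [sum(q[i][j] * b[i] for i in range(3)) for j in range(3)]
x = back_substitute(r, c)   # approximately (-2, 0, 1)
```

No matrix inverse is ever formed: the orthogonal factor is undone by a transpose, and the triangular factor by back substitution.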

5.4. Orthogonal Polynomials.

Orthogonal and orthonormal bases play, if anything, an even more essential role in the analysis on function spaces. Unlike the Euclidean space $\mathbb{R}^n$, most obvious bases of a (finite-dimensional) function space are typically not orthogonal with respect to any natural inner product. Thus, the computation of an orthonormal basis of functions is a critical step towards simplifying the subsequent analysis. The Gram-Schmidt process can be applied in the same manner, and leads to the classical orthogonal polynomials that arise in approximation and interpolation theory. Other orthogonal systems of functions play starring roles in Fourier analysis and its generalizations, in quantum mechanics, in the solution of partial differential equations by separation of variables, and a host of other applications.
In this section, we concentrate on orthogonal polynomials. Orthogonal systems of trigonometric functions will appear in Chapters 12 and 13. Orthogonal systems of special functions, including Bessel functions and spherical harmonics, are used in the solution to linear partial differential equations in Chapters 17 and 18.
The Legendre Polynomials
We shall construct an orthonormal basis for the vector space $P^{(n)}$ consisting of all polynomials of degree $\le n$. For definiteness the construction will be based on the particular $L^2$ inner product
\[ \langle p ; q \rangle = \int_{-1}^{1} p(t)\, q(t)\, dt. \tag{5.35} \]
The method will work for any other bounded interval, but choosing $[ -1, 1 ]$ will lead us to a particularly important case. We shall apply the Gram-Schmidt orthogonalization process to the elementary, but non-orthogonal monomial basis $1, t, t^2, \ldots, t^n$. Because
\[ \langle t^k ; t^l \rangle = \int_{-1}^{1} t^{k+l}\, dt = \begin{cases} \dfrac{2}{k + l + 1}, & k + l \text{ even}, \\ 0, & k + l \text{ odd}, \end{cases} \tag{5.36} \]
odd degree monomials are orthogonal to even degree monomials, but that is all. Let $q_0(t), q_1(t), \ldots, q_n(t)$ denote the orthogonal polynomials that result from applying the Gram-Schmidt process to the non-orthogonal monomial basis $1, t, t^2, \ldots, t^n$. We begin by setting
\[ q_0(t) = 1, \qquad \| q_0 \|^2 = \int_{-1}^{1} q_0(t)^2\, dt = 2. \]
According to (5.18), the next orthogonal basis polynomial is
\[ q_1(t) = t - \frac{\langle t ; q_0 \rangle}{\| q_0 \|^2}\, q_0(t) = t, \qquad \| q_1 \|^2 = \tfrac23. \]
In general, the Gram-Schmidt formula (5.20) says we should define
\[ q_k(t) = t^k - \sum_{j=0}^{k-1} \frac{\langle t^k ; q_j \rangle}{\| q_j \|^2}\, q_j(t) \qquad \text{for} \quad k = 1, 2, \ldots. \]
We can then recursively compute the next few polynomials
\[ \begin{aligned}
q_2(t) &= t^2 - \tfrac13, & \| q_2 \|^2 &= \tfrac{8}{45}, \\
q_3(t) &= t^3 - \tfrac35\, t, & \| q_3 \|^2 &= \tfrac{8}{175}, \\
q_4(t) &= t^4 - \tfrac67\, t^2 + \tfrac{3}{35}, & \| q_4 \|^2 &= \tfrac{128}{11025},
\end{aligned} \tag{5.37} \]
and so on. The reader can verify that they satisfy the orthogonality conditions
\[ \langle q_i ; q_j \rangle = \int_{-1}^{1} q_i(t)\, q_j(t)\, dt = 0, \qquad i \neq j. \]
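The computation of the monic Legendre polynomials can be reproduced in exact rational arithmetic. The sketch below is one possible way to do it, under stated assumptions: a polynomial is stored as a list of coefficients with the lowest power first, the names `inner` and `monic_legendre` are hypothetical, and the inner product is evaluated term by term from the monomial formula (5.36).

```python
from fractions import Fraction

def inner(p, q):
    """L2 inner product on [-1, 1] of two coefficient lists, via (5.36)."""
    total = Fraction(0)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            if (a + b) % 2 == 0:      # odd powers integrate to zero
                total += pa * qb * Fraction(2, a + b + 1)
    return total

def monic_legendre(n):
    """Return q_0, ..., q_n obtained by Gram-Schmidt on 1, t, ..., t^n."""
    qs = []
    for k in range(n + 1):
        mono = [Fraction(0)] * k + [Fraction(1)]     # the monomial t^k
        qk = list(mono)
        for qj in qs:
            # Subtract the component of t^k along the earlier q_j.
            c = inner(mono, qj) / inner(qj, qj)
            for i, coeff in enumerate(qj):
                qk[i] -= c * coeff
        qs.append(qk)
    return qs

qs = monic_legendre(4)
norms_sq = [inner(q, q) for q in qs]
```

With `Fraction` arithmetic the coefficients and squared norms come out as exact rationals, so they can be compared directly with the table (5.37).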

The resulting polynomials $q_0, q_1, q_2, \ldots$ are known as the monic Legendre polynomials, in honor of the 18th century French mathematician Adrien-Marie Legendre who used them to study Newtonian gravitation. Since the first $n$ of them, namely $q_0, \ldots, q_{n-1}$, span the subspace $P^{(n-1)}$ of polynomials of degree $\le n - 1$, the next one, $q_n$, is the unique monic polynomial that is orthogonal to every polynomial of degree $\le n - 1$:
\[ \langle t^k ; q_n \rangle = 0, \qquad k = 0, \ldots, n - 1. \tag{5.38} \]
Footnote: A polynomial is called monic if its leading coefficient is equal to 1.

Since the monic Legendre polynomials form a basis for the space of polynomials, one can uniquely rewrite any polynomial of degree $n$ as a linear combination:
\[ p(t) = c_0 q_0(t) + c_1 q_1(t) + \cdots + c_n q_n(t). \tag{5.39} \]

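In view of the general orthogonality formula (5.8), the coefficients are simply given by inner products
\[ c_k = \frac{\langle p ; q_k \rangle}{\| q_k \|^2} = \frac{1}{\| q_k \|^2} \int_{-1}^{1} p(t)\, q_k(t)\, dt, \qquad k = 0, \ldots, n. \tag{5.40} \]
For example,
\[ t^4 = q_4(t) + \tfrac67\, q_2(t) + \tfrac15\, q_0(t) = \left( t^4 - \tfrac67\, t^2 + \tfrac{3}{35} \right) + \tfrac67 \left( t^2 - \tfrac13 \right) + \tfrac15. \]
The coefficients can either be obtained directly, or via (5.40); for example,
\[ c_4 = \frac{11025}{128} \int_{-1}^{1} t^4\, q_4(t)\, dt = 1, \qquad
c_3 = \frac{175}{8} \int_{-1}^{1} t^4\, q_3(t)\, dt = 0. \]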

The classical Legendre polynomials are certain scalar multiples, namely
\[ P_k(t) = \frac{(2 k)!}{2^k\, (k!)^2}\, q_k(t), \qquad k = 0, 1, 2, \ldots, \tag{5.41} \]
of the orthogonal basis polynomials. The multiple is fixed by the requirement that
\[ P_k(1) = 1, \tag{5.42} \]
which is not so important here, but does play a role in other applications. The first few classical Legendre polynomials are
\[ \begin{aligned}
P_0(t) &= 1, & \| P_0 \|^2 &= 2, \\
P_1(t) &= t, & \| P_1 \|^2 &= \tfrac23, \\
P_2(t) &= \tfrac32\, t^2 - \tfrac12, & \| P_2 \|^2 &= \tfrac25, \\
P_3(t) &= \tfrac52\, t^3 - \tfrac32\, t, & \| P_3 \|^2 &= \tfrac27, \\
P_4(t) &= \tfrac{35}{8}\, t^4 - \tfrac{15}{4}\, t^2 + \tfrac38, & \| P_4 \|^2 &= \tfrac29, \\
P_5(t) &= \tfrac{63}{8}\, t^5 - \tfrac{35}{4}\, t^3 + \tfrac{15}{8}\, t, & \| P_5 \|^2 &= \tfrac{2}{11}, \\
P_6(t) &= \tfrac{231}{16}\, t^6 - \tfrac{315}{16}\, t^4 + \tfrac{105}{16}\, t^2 - \tfrac{5}{16}, & \| P_6 \|^2 &= \tfrac{2}{13},
\end{aligned} \]
and are graphed in Figure 5.3. There is, in fact, an explicit formula for the Legendre polynomials, due to the early nineteenth century French mathematician Olinde Rodrigues.

Figure 5.3. The Legendre Polynomials $P_0(t), \ldots, P_5(t)$.

Theorem 5.28. The Rodrigues formula for the classical Legendre polynomials is
\[ P_k(t) = \frac{1}{2^k\, k!}\, \frac{d^k}{dt^k} (t^2 - 1)^k, \qquad \| P_k \| = \sqrt{\frac{2}{2 k + 1}}, \qquad k = 0, 1, 2, \ldots. \tag{5.43} \]
Thus, for example,
\[ P_4(t) = \frac{1}{16 \cdot 4!}\, \frac{d^4}{dt^4} (t^2 - 1)^4 = \frac{1}{384}\, \frac{d^4}{dt^4} (t^2 - 1)^4 = \tfrac{35}{8}\, t^4 - \tfrac{15}{4}\, t^2 + \tfrac38. \]
Proof: Let
\[ R_{j,k}(t) = \frac{d^j}{dt^j} (t^2 - 1)^k, \tag{5.44} \]
which is evidently a polynomial of degree $2 k - j$. In particular, the Rodrigues formula (5.43) claims that $P_k(t)$ is a multiple of $R_{k,k}(t)$. Note that
\[ \frac{d}{dt} R_{j,k}(t) = R_{j+1,k}(t). \tag{5.45} \]
Moreover,
\[ R_{j,k}(1) = 0 = R_{j,k}(-1) \qquad \text{whenever} \quad j < k, \tag{5.46} \]
since, by the product rule, differentiating $(t^2 - 1)^k$ a total of $j < k$ times still leaves at least one factor of $t^2 - 1$ in each summand, which therefore vanishes at $t = \pm 1$.

Lemma 5.29. If $j \le k$, then the polynomial $R_{j,k}(t)$ is orthogonal to all polynomials of degree $\le j - 1$.

Proof: In other words,
\[ \langle t^i ; R_{j,k} \rangle = \int_{-1}^{1} t^i\, R_{j,k}(t)\, dt = 0, \qquad \text{for all} \quad 0 \le i < j \le k. \tag{5.47} \]
Since $j > 0$, we use (5.45) to write $R_{j,k}(t) = R_{j-1,k}'(t)$. Integrating by parts,
\[ \langle t^i ; R_{j,k} \rangle = \int_{-1}^{1} t^i\, R_{j-1,k}'(t)\, dt
= t^i\, R_{j-1,k}(t)\, \Big|_{t=-1}^{1} - i \int_{-1}^{1} t^{i-1}\, R_{j-1,k}(t)\, dt
= -\, i\, \langle t^{i-1} ; R_{j-1,k} \rangle, \]
where the boundary terms vanish owing to (5.46). We then repeat the process, and eventually
\[ \begin{aligned}
\langle t^i ; R_{j,k} \rangle &= -\, i\, \langle t^{i-1} ; R_{j-1,k} \rangle
= i (i - 1)\, \langle t^{i-2} ; R_{j-2,k} \rangle = \cdots = (-1)^i\, i (i - 1) \cdots 3 \cdot 2\, \langle 1 ; R_{j-i,k} \rangle \\
&= (-1)^i\, i! \int_{-1}^{1} R_{j-i,k}(t)\, dt = (-1)^i\, i!\; R_{j-i-1,k}(t)\, \Big|_{t=-1}^{1} = 0,
\end{aligned} \]
by (5.46), and since $j > i$. Q.E.D.

In particular, $R_{k,k}(t)$ is a polynomial of degree $k$ which is orthogonal to every polynomial of degree $\le k - 1$. By our earlier remarks, this implies that it is a constant multiple,
\[ R_{k,k}(t) = c_k\, P_k(t), \]
of the $k$-th Legendre polynomial. To determine $c_k$, we need only compare the leading terms:
\[ R_{k,k}(t) = \frac{d^k}{dt^k} (t^2 - 1)^k = \frac{d^k}{dt^k} ( t^{2k} + \cdots ) = \frac{(2 k)!}{k!}\, t^k + \cdots, \qquad \text{while} \qquad P_k(t) = \frac{(2 k)!}{2^k\, (k!)^2}\, t^k + \cdots. \]
We conclude that $c_k = 2^k\, k!$, which proves (5.43). The proof of the formula for $\| P_k \|$ can be found in Exercise . Q.E.D.

The Legendre polynomials play an important role in many aspects of applied mathematics, including numerical analysis, least squares approximation of functions, and solution of partial differential equations.
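The Rodrigues formula (5.43) is straightforward to evaluate symbolically on coefficient lists, which gives an independent check on the table of classical Legendre polynomials. The following is a sketch under stated assumptions: polynomials are stored as lists of `Fraction` coefficients with the lowest power first, and the helper names `poly_mul`, `poly_diff` and `legendre_rodrigues` are hypothetical.

```python
from fractions import Fraction
from math import factorial

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_diff(p):
    """Derivative of a polynomial given as a coefficient list."""
    return [i * c for i, c in enumerate(p)][1:] or [Fraction(0)]

def legendre_rodrigues(k):
    """P_k from (5.43): (1 / (2^k k!)) d^k/dt^k (t^2 - 1)^k."""
    p = [Fraction(1)]
    for _ in range(k):
        p = poly_mul(p, [Fraction(-1), Fraction(0), Fraction(1)])  # times (t^2 - 1)
    for _ in range(k):
        p = poly_diff(p)
    scale = Fraction(1, 2 ** k * factorial(k))
    return [scale * c for c in p]

p4 = legendre_rodrigues(4)   # coefficients of 35/8 t^4 - 15/4 t^2 + 3/8
```

Summing the coefficients of `legendre_rodrigues(k)` evaluates the polynomial at $t = 1$, which offers a quick check of the normalization (5.42).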
Other Systems of Orthogonal Polynomials
The standard Legendre polynomials form an orthogonal system with respect to the $L^2$ inner product on the interval $[ -1, 1 ]$. Dealing with any other interval, or, more generally, a weighted inner product between functions on an interval, leads to a different, suitably adapted collection of orthogonal polynomials. In all cases, applying the Gram-Schmidt process to the standard monomials $1, t, t^2, t^3, \ldots$ will produce the desired orthogonal system.

Example 5.30. In this example, we construct orthogonal polynomials for the weighted inner product
\[ \langle f ; g \rangle = \int_0^\infty f(t)\, g(t)\, e^{-t}\, dt \tag{5.48} \]
on the interval $[ 0, \infty )$. A straightforward integration by parts proves that
\[ \int_0^\infty t^k\, e^{-t}\, dt = k!, \qquad \text{and hence} \qquad \langle t^i ; t^j \rangle = (i + j)!, \qquad \| t^i \|^2 = (2 i)!. \tag{5.49} \]
We apply the Gram-Schmidt process to construct a system of orthogonal polynomials for this inner product. The first few are
\[ \begin{aligned}
q_0(t) &= 1, & \| q_0 \|^2 &= 1, \\
q_1(t) &= t - \frac{\langle t ; q_0 \rangle}{\| q_0 \|^2}\, q_0(t) = t - 1, & \| q_1 \|^2 &= 1, \\
q_2(t) &= t^2 - \frac{\langle t^2 ; q_0 \rangle}{\| q_0 \|^2}\, q_0(t) - \frac{\langle t^2 ; q_1 \rangle}{\| q_1 \|^2}\, q_1(t) = t^2 - 4 t + 2, & \| q_2 \|^2 &= 4, \\
q_3(t) &= t^3 - 9 t^2 + 18 t - 6, & \| q_3 \|^2 &= 36.
\end{aligned} \]
The resulting orthogonal polynomials are known as the (monic) Laguerre polynomials, named after the nineteenth century French mathematician Edmond Laguerre.
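Because the moments (5.49) are known in closed form, the Gram-Schmidt computation of the monic Laguerre polynomials needs no numerical integration. The sketch below assumes the same coefficient-list representation as before (lowest power first); the names `inner` and `monic_laguerre` are hypothetical.

```python
from fractions import Fraction
from math import factorial

def inner(p, q):
    """Weighted inner product (5.48), using <t^i ; t^j> = (i + j)!."""
    return sum(pa * qb * factorial(a + b)
               for a, pa in enumerate(p) for b, qb in enumerate(q))

def monic_laguerre(n):
    """Gram-Schmidt on 1, t, ..., t^n with respect to (5.48)."""
    qs = []
    for k in range(n + 1):
        mono = [Fraction(0)] * k + [Fraction(1)]    # the monomial t^k
        qk = list(mono)
        for qj in qs:
            c = inner(mono, qj) / inner(qj, qj)
            for i, coeff in enumerate(qj):
                qk[i] -= c * coeff
        qs.append(qk)
    return qs

qs = monic_laguerre(3)
norms_sq = [inner(q, q) for q in qs]
```

Only the function `inner` differs from the Legendre computation; the Gram-Schmidt loop itself does not depend on which inner product is used.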
In some cases, a change of variables may be used to relate systems of orthogonal polynomials and thereby circumvent the Gram-Schmidt computation. Suppose, for instance, that our goal is to construct an orthogonal system of polynomials for the $L^2$ inner product $\langle\langle f ; g \rangle\rangle = \int_a^b f(t)\, g(t)\, dt$ on the interval $[ a, b ]$. The key remark is that we can map the interval $[ a, b ]$ to $[ -1, 1 ]$ by a simple linear change of variables of the form $s = \alpha + \beta t$. Specifically,
\[ s = \frac{2 t - b - a}{b - a} \qquad \text{will change} \qquad a \le t \le b \qquad \text{to} \qquad -1 \le s \le 1. \tag{5.50} \]
The map changes functions $F(s), G(s)$, defined for $-1 \le s \le 1$, into the functions
\[ f(t) = F\!\left( \frac{2 t - b - a}{b - a} \right), \qquad g(t) = G\!\left( \frac{2 t - b - a}{b - a} \right), \tag{5.51} \]
defined for $a \le t \le b$. Moreover, interpreting (5.50) as a change of variables for the integrals, we have $ds = \frac{2}{b - a}\, dt$, and so the inner products are related by
\[ \begin{aligned}
\langle f ; g \rangle &= \int_a^b f(t)\, g(t)\, dt = \int_a^b F\!\left( \frac{2 t - b - a}{b - a} \right) G\!\left( \frac{2 t - b - a}{b - a} \right) dt \\
&= \int_{-1}^{1} F(s)\, G(s)\, \frac{b - a}{2}\, ds = \frac{b - a}{2}\, \langle F ; G \rangle,
\end{aligned} \tag{5.52} \]

where the final $L^2$ inner product is over the interval $[ -1, 1 ]$. In particular, the change of variables maintains orthogonality, while rescaling the norms:
\[ \langle f ; g \rangle = 0 \quad \text{if and only if} \quad \langle F ; G \rangle = 0, \qquad \| f \| = \sqrt{\frac{b - a}{2}}\; \| F \|. \tag{5.53} \]
Moreover, if $F(s)$ is a polynomial of degree $n$ in $s$, then $f(t)$ is a polynomial of degree $n$ in $t$ and vice versa. Applying these observations to the Legendre polynomials, we immediately deduce the following.

Proposition 5.31. The transformed Legendre polynomials
\[ \widetilde{P}_k(t) = P_k\!\left( \frac{2 t - b - a}{b - a} \right), \qquad \| \widetilde{P}_k \| = \sqrt{\frac{b - a}{2 k + 1}}, \qquad k = 0, 1, 2, \ldots, \tag{5.54} \]
form an orthogonal system of polynomials with respect to the $L^2$ inner product on the interval $[ a, b ]$.
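Example 5.32. As an example, consider the $L^2$ inner product $\langle\langle f ; g \rangle\rangle = \int_0^1 f(t)\, g(t)\, dt$ on the interval $[ 0, 1 ]$. The map $s = 2 t - 1$ will change $0 \le t \le 1$ to $-1 \le s \le 1$. According to Proposition 5.31, this change of variables will convert the Legendre polynomials $P_k(s)$ into an orthogonal system of polynomials
\[ \widetilde{P}_k(t) = P_k(2 t - 1), \qquad \text{with corresponding } L^2 \text{ norms} \qquad \| \widetilde{P}_k \| = \sqrt{\frac{1}{2 k + 1}}, \]
on the interval $[ 0, 1 ]$. The first few are
\[ \begin{aligned}
\widetilde{P}_0(t) &= 1, \\
\widetilde{P}_1(t) &= 2 t - 1, \\
\widetilde{P}_2(t) &= 6 t^2 - 6 t + 1, \\
\widetilde{P}_3(t) &= 20 t^3 - 30 t^2 + 12 t - 1, \\
\widetilde{P}_4(t) &= 70 t^4 - 140 t^3 + 90 t^2 - 20 t + 1, \\
\widetilde{P}_5(t) &= \tfrac{63}{8} (2 t - 1)^5 - \tfrac{35}{4} (2 t - 1)^3 + \tfrac{15}{8} (2 t - 1) = 252 t^5 - 630 t^4 + 560 t^3 - 210 t^2 + 30 t - 1.
\end{aligned} \tag{5.55} \]
One can, as an alternative, derive these formulae through a direct application of the Gram-Schmidt process.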
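The substitution $s = 2 t - 1$ can also be carried out mechanically, which provides a check on the list (5.55) and on the norm formula of Proposition 5.31. This is a sketch under stated assumptions: the classical polynomials $P_0, \ldots, P_5$ are typed in from the table given earlier, polynomials are coefficient lists with the lowest power first, and the helper names `poly_mul`, `shift` and `inner01` are hypothetical.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def shift(p, alpha, beta):
    """Coefficients of p(alpha + beta * t), computed by Horner's rule."""
    result = [Fraction(p[-1])]
    for c in reversed(p[:-1]):
        result = poly_mul(result, [Fraction(alpha), Fraction(beta)])
        result[0] += c
    return result

def inner01(p, q):
    """L2 inner product on [0, 1]; the integral of t^m over [0, 1] is 1/(m+1)."""
    return sum(a * b * Fraction(1, i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))

# Classical Legendre polynomials P_0, ..., P_5, lowest power first.
legendre = [
    [1],
    [0, 1],
    [Fraction(-1, 2), 0, Fraction(3, 2)],
    [0, Fraction(-3, 2), 0, Fraction(5, 2)],
    [Fraction(3, 8), 0, Fraction(-15, 4), 0, Fraction(35, 8)],
    [0, Fraction(15, 8), 0, Fraction(-35, 4), 0, Fraction(63, 8)],
]
shifted = [shift(p, -1, 2) for p in legendre]   # P_k(2t - 1)
```

For a general interval $[ a, b ]$ the same `shift` helper applies with `alpha = -(b + a) / (b - a)` and `beta = 2 / (b - a)`, following (5.50).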

5.5. Orthogonal Projections and Least Squares.

In Chapter 4, we introduced, solved and learned the significance of the problem of
finding the point on a prescribed subspace that lies closest to a given point. In this
section, we shall discover an important geometrical interpretation of our solution: the
closest point is the orthogonal projection of the point onto the subspace. Furthermore,
if we adopt an orthogonal, or, even better, orthonormal basis for the subspace, then the
closest point can be constructed through a very elegant, explicit formula. In this manner,
orthogonality allows us to effectively bypass the normal equations and solution formulae
that were so laboriously computed in Chapter 4. The resulting orthogonal projection
formulae have important practical consequences for the solution of a wide range of least
squares minimization problems.
Figure 5.4. The Orthogonal Projection of a Vector onto a Subspace.

Orthogonal Projection
We begin by characterizing the orthogonal projection of a vector onto a subspace. Throughout this section, we will consider a prescribed finite-dimensional subspace $W \subset V$ of a real inner product space $V$. While the subspace is necessarily finite-dimensional, the inner product space itself may be infinite-dimensional. Initially, though, you may wish to concentrate on $V = \mathbb{R}^m$ with the ordinary Euclidean dot product, which is the easiest case to visualize as it coincides with our geometric intuition, as in Figure 5.4.
A vector $z \in V$ is said to be orthogonal to the subspace $W$ if it is orthogonal to every vector in $W$, so $\langle z ; w \rangle = 0$ for all $w \in W$. Given a basis $w_1, \ldots, w_n$ for $W$, we note that $z$ is orthogonal to $W$ if and only if it is orthogonal to every basis vector: $\langle z ; w_i \rangle = 0$ for $i = 1, \ldots, n$. Indeed, any other vector in $W$ has the form $w = c_1 w_1 + \cdots + c_n w_n$ and hence, by linearity, $\langle z ; w \rangle = c_1 \langle z ; w_1 \rangle + \cdots + c_n \langle z ; w_n \rangle = 0$, as required.

Definition 5.33. The orthogonal projection of $v$ onto the subspace $W$ is the element $w \in W$ that makes the difference $z = v - w$ orthogonal to $W$.

As we shall see, the orthogonal projection is unique. The explicit construction is greatly simplified by taking an orthonormal basis of the subspace, which, if necessary, can be arranged by applying the Gram-Schmidt process to a known basis. (A direct construction of the orthogonal projection in terms of a general basis appears in Exercise .)

Theorem 5.34. Let $u_1, \ldots, u_n$ be an orthonormal basis for the subspace $W \subset V$. Then the orthogonal projection of a vector $v \in V$ onto $W$ is
\[ w = c_1 u_1 + \cdots + c_n u_n \qquad \text{where} \qquad c_i = \langle v ; u_i \rangle, \quad i = 1, \ldots, n. \tag{5.56} \]
Proof: First, since $u_1, \ldots, u_n$ form a basis of the subspace, the orthogonal projection element $w = c_1 u_1 + \cdots + c_n u_n$ must be some linear combination thereof. Definition 5.33 requires that the difference $z = v - w$ be orthogonal to $W$. It suffices to check orthogonality to the basis vectors of $W$. By our orthonormality assumption, for each $1 \le i \le n$,
\[ \begin{aligned}
0 = \langle z ; u_i \rangle &= \langle v ; u_i \rangle - \langle w ; u_i \rangle = \langle v ; u_i \rangle - \langle c_1 u_1 + \cdots + c_n u_n ; u_i \rangle \\
&= \langle v ; u_i \rangle - c_1 \langle u_1 ; u_i \rangle - \cdots - c_n \langle u_n ; u_i \rangle = \langle v ; u_i \rangle - c_i.
\end{aligned} \]
We deduce that the coefficients $c_i = \langle v ; u_i \rangle$ of the orthogonal projection $w$ are uniquely prescribed by the orthogonality requirement. Q.E.D.

More generally, if we employ an orthogonal basis $v_1, \ldots, v_n$ for the subspace $W$, then the same argument demonstrates that the orthogonal projection of $v$ onto $W$ is given by
\[ w = a_1 v_1 + \cdots + a_n v_n, \qquad \text{where} \qquad a_i = \frac{\langle v ; v_i \rangle}{\| v_i \|^2}, \quad i = 1, \ldots, n. \tag{5.57} \]
Of course, we could equally well replace the orthogonal basis by the orthonormal basis obtained by dividing each vector by its length: $u_i = v_i / \| v_i \|$. The reader should be able to prove that the two formulae (5.56), (5.57) for the orthogonal projection yield the same vector $w$.
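Example 5.35. Consider the plane $W \subset \mathbb{R}^3$ spanned by the orthogonal vectors
\[ v_1 = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \]
According to formula (5.57), the orthogonal projection of $v = ( 1, 0, 0 )^T$ onto $W$ is
\[ w = \frac{\langle v ; v_1 \rangle}{\| v_1 \|^2}\, v_1 + \frac{\langle v ; v_2 \rangle}{\| v_2 \|^2}\, v_2
= \frac16 \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix} + \frac13 \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} \frac12 \\ 0 \\ \frac12 \end{pmatrix}. \]
Alternatively, we can replace $v_1, v_2$ by the orthonormal basis
\[ u_1 = \frac{v_1}{\| v_1 \|} = \begin{pmatrix} \frac{1}{\sqrt{6}} \\ -\frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{pmatrix}, \qquad
u_2 = \frac{v_2}{\| v_2 \|} = \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{pmatrix}. \]
Then, using the orthonormal version (5.56),
\[ w = \langle v ; u_1 \rangle\, u_1 + \langle v ; u_2 \rangle\, u_2
= \frac{1}{\sqrt{6}} \begin{pmatrix} \frac{1}{\sqrt{6}} \\ -\frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{pmatrix}
+ \frac{1}{\sqrt{3}} \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{pmatrix}
= \begin{pmatrix} \frac12 \\ 0 \\ \frac12 \end{pmatrix}. \]
The answer is, of course, the same. As the reader may notice, while the theoretical formula is simpler when written in an orthonormal basis, for hand computations the orthogonal basis version avoids dealing with square roots. (Of course, when performing the computation on a computer, this is not a significant issue.)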
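Formula (5.57) amounts to one dot product and one scaling per basis vector, as the following sketch shows. It is an illustration only: the names `dot` and `project` are hypothetical, the inner product is assumed to be the Euclidean dot product, and the vectors are assumed to have integer entries so that exact `Fraction` arithmetic can be used.

```python
from fractions import Fraction

def dot(x, y):
    """Euclidean dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))

def project(v, basis):
    """Orthogonal projection of v onto span(basis), via (5.57).

    The basis vectors are assumed to be mutually orthogonal and nonzero.
    """
    w = [Fraction(0)] * len(v)
    for b in basis:
        a = Fraction(dot(v, b), dot(b, b))      # a_i = <v ; v_i> / ||v_i||^2
        w = [wi + a * bi for wi, bi in zip(w, b)]
    return w

v = [1, 0, 0]
basis = [[1, -2, 1], [1, 1, 1]]
w = project(v, basis)                       # the projection (1/2, 0, 1/2)
z = [vi - wi for vi, wi in zip(v, w)]       # difference, orthogonal to W
```

The difference `z` has zero dot product with both basis vectors, which is exactly the requirement of Definition 5.33.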

An intriguing observation is that the coefficients in the orthogonal projection formulae

(5.56) and (5.57) coincide with the formulae (5.5), (5.8) for writing a vector in terms of an
orthonormal or orthogonal basis. Indeed, if v were an element of W , then it would coincide
with its orthogonal projection, w = v (why?). As a result, the orthogonal projection
formulae include the orthogonal basis formulae as a special case.
It is also worth noting that the same formulae occur in the Gram-Schmidt algorithm, (5.19). This observation leads to a useful geometric interpretation for the Gram-Schmidt construction. For each $k = 1, \dots, n$, let
$$V_k = \operatorname{span}\{ \mathbf{w}_1, \dots, \mathbf{w}_k \} = \operatorname{span}\{ \mathbf{v}_1, \dots, \mathbf{v}_k \} = \operatorname{span}\{ \mathbf{u}_1, \dots, \mathbf{u}_k \} \tag{5.58}$$
denote the $k$-dimensional subspace spanned by the first $k$ basis elements. The basic Gram-Schmidt formula (5.20) can be rewritten in the form $\mathbf{v}_k = \mathbf{w}_k - \mathbf{y}_k$, where $\mathbf{y}_k$ is the orthogonal projection of $\mathbf{w}_k$ onto the subspace $V_{k-1}$. The resulting vector $\mathbf{v}_k$ is, by construction, orthogonal to the subspace, and hence orthogonal to all of the previous basis elements, which serves to rejustify the Gram-Schmidt construction.
Orthogonal Least Squares
Now we make an important connection: The orthogonal projection of a vector onto a subspace is also the least squares vector, that is, the closest point in the subspace!
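Theorem 5.36. Let $W \subset V$ be a finite-dimensional subspace of an inner product space. Given a vector $\mathbf{v} \in V$, the closest point or least squares minimizer $\mathbf{w} \in W$ is the same as the orthogonal projection of $\mathbf{v}$ onto $W$.
Proof : Let $\mathbf{w} \in W$ be the orthogonal projection of $\mathbf{v}$ onto the subspace, which requires that the difference $\mathbf{z} = \mathbf{v} - \mathbf{w}$ be orthogonal to $W$. Suppose $\tilde{\mathbf{w}} \in W$ is any other vector in the subspace. Then,
$$\| \mathbf{v} - \tilde{\mathbf{w}} \|^2 = \| \mathbf{w} + \mathbf{z} - \tilde{\mathbf{w}} \|^2 = \| \mathbf{w} - \tilde{\mathbf{w}} \|^2 + 2 \langle \mathbf{w} - \tilde{\mathbf{w}} ; \mathbf{z} \rangle + \| \mathbf{z} \|^2 = \| \mathbf{w} - \tilde{\mathbf{w}} \|^2 + \| \mathbf{z} \|^2 .$$
The inner product term $\langle \mathbf{w} - \tilde{\mathbf{w}} ; \mathbf{z} \rangle = 0$ vanishes because $\mathbf{z}$ is orthogonal to every vector in $W$, including $\mathbf{w} - \tilde{\mathbf{w}}$. Since $\mathbf{z} = \mathbf{v} - \mathbf{w}$ is uniquely prescribed by the vector $\mathbf{v}$, the second term $\| \mathbf{z} \|^2$ does not change with the choice of the point $\tilde{\mathbf{w}} \in W$. Therefore, $\| \mathbf{v} - \tilde{\mathbf{w}} \|^2$ will be minimized if and only if $\| \mathbf{w} - \tilde{\mathbf{w}} \|^2$ is minimized. Since $\tilde{\mathbf{w}} \in W$ is allowed to be any element of the subspace $W$, the minimal value $\| \mathbf{w} - \tilde{\mathbf{w}} \|^2 = 0$ occurs when $\tilde{\mathbf{w}} = \mathbf{w}$. Thus, the closest point $\tilde{\mathbf{w}}$ coincides with the orthogonal projection $\mathbf{w}$.
Q.E.D.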
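The key identity in the proof can be tested numerically. The sketch below is only an illustration under stated assumptions: it borrows the plane of Example 5.35, with orthogonal basis $(1, -2, 1)^T$, $(1, 1, 1)^T$ and projection $\mathbf{w} = (\frac12, 0, \frac12)^T$ of $\mathbf{v} = (1, 0, 0)^T$, and the helper names are hypothetical. It compares $\| \mathbf{v} - \tilde{\mathbf{w}} \|^2$ with $\| \mathbf{w} - \tilde{\mathbf{w}} \|^2 + \| \mathbf{z} \|^2$ for a few choices of $\tilde{\mathbf{w}} \in W$.

```python
from fractions import Fraction

def dot(a, b):
    """Euclidean dot product."""
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    """Componentwise difference a - b."""
    return [x - y for x, y in zip(a, b)]

v1, v2 = [1, -2, 1], [1, 1, 1]            # orthogonal basis of W (assumed)
v = [1, 0, 0]
w = [Fraction(1, 2), 0, Fraction(1, 2)]   # orthogonal projection of v onto W
z = sub(v, w)                             # z = v - w is orthogonal to W

def check(c1, c2):
    """Both sides of the identity for the point c1*v1 + c2*v2 of W."""
    wt = [c1 * a + c2 * b for a, b in zip(v1, v2)]
    lhs = dot(sub(v, wt), sub(v, wt))
    rhs = dot(sub(w, wt), sub(w, wt)) + dot(z, z)
    return lhs, rhs

for c1, c2 in [(0, 0), (1, 0), (Fraction(1, 6), Fraction(1, 3)), (-2, 5)]:
    lhs, rhs = check(c1, c2)
    print(c1, c2, lhs, rhs)
```

The two printed values agree in every row, and the squared distance is smallest for the coefficients $\frac16, \frac13$, which reproduce $\mathbf{w}$ itself.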

In particular, if we are supplied with an orthonormal or orthogonal basis of our subspace, then we can compute the closest least squares point $\mathbf{w} \in W$ to $\mathbf{v}$ using our orthogonal projection formulae (5.56) or (5.57). In this way, orthogonal bases have a very dramatic simplifying effect on the least squares approximation formulae. They completely avoid the construction of and solution to the much more complicated normal equations.

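Example 5.37. Consider the least squares problem of finding the closest point $\mathbf{w}$ to the vector $\mathbf{v} = (1, 2, 2, 1)^T$ in the three-dimensional subspace spanned by the orthogonal vectors $\mathbf{v}_1 = (1, -1, 2, 0)^T$, $\mathbf{v}_2 = (0, 2, 1, -2)^T$, $\mathbf{v}_3 = (1, 1, 0, 1)^T$. (We use the ordinary Euclidean norm on $\mathbb{R}^4$ throughout this example.) Since the spanning vectors are orthogonal (but not orthonormal), we can use the orthogonal projection formula (5.57) to find the linear combination $\mathbf{w} = a_1 \mathbf{v}_1 + a_2 \mathbf{v}_2 + a_3 \mathbf{v}_3$. Thus,
$$a_1 = \frac{\langle \mathbf{v} ; \mathbf{v}_1 \rangle}{\| \mathbf{v}_1 \|^2} = \frac{3}{6} = \frac{1}{2}, \qquad a_2 = \frac{\langle \mathbf{v} ; \mathbf{v}_2 \rangle}{\| \mathbf{v}_2 \|^2} = \frac{4}{9}, \qquad a_3 = \frac{\langle \mathbf{v} ; \mathbf{v}_3 \rangle}{\| \mathbf{v}_3 \|^2} = \frac{4}{3},$$
and so $\mathbf{w} = \frac12 \mathbf{v}_1 + \frac49 \mathbf{v}_2 + \frac43 \mathbf{v}_3 = \left( \frac{11}{6}, \frac{31}{18}, \frac{13}{9}, \frac{4}{9} \right)^T$ is the closest point to $\mathbf{v}$ in the subspace.
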

Even when we only know a non-orthogonal basis for the subspace, it may still be a
good strategy to first use Gram-Schmidt to replace it by an orthogonal or even orthonormal
basis, and then apply the orthogonal projection formulae (5.56), (5.57) to calculate the
least squares point. Not only does this simplify the final computation, it will often avoid
the ill-conditioning and numerical inaccuracies that sometimes afflict the direct solution to
the normal equations (4.26). The following example illustrates this alternative procedure.

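Example 5.38. Let us return to the problem, solved in Example 4.6, of finding the closest point on the plane $V$ spanned by $\mathbf{w}_1 = (1, 2, -1)^T$, $\mathbf{w}_2 = (2, -3, -1)^T$ to the point $\mathbf{b} = (1, 0, 0)^T$. We proceed now by first using the Gram-Schmidt process to compute an orthogonal basis
$$\mathbf{v}_1 = \mathbf{w}_1 = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad \mathbf{v}_2 = \mathbf{w}_2 - \frac{\mathbf{w}_2 \cdot \mathbf{v}_1}{\| \mathbf{v}_1 \|^2}\, \mathbf{w}_1 = \begin{pmatrix} \frac52 \\ -2 \\ -\frac32 \end{pmatrix},$$
for our subspace. Therefore, applying the orthogonal projection formula (5.57), the closest point is
$$\mathbf{v}^\star = \frac{\mathbf{b} \cdot \mathbf{v}_1}{\| \mathbf{v}_1 \|^2}\, \mathbf{v}_1 + \frac{\mathbf{b} \cdot \mathbf{v}_2}{\| \mathbf{v}_2 \|^2}\, \mathbf{v}_2 = \begin{pmatrix} \frac23 \\ -\frac{1}{15} \\ -\frac{7}{15} \end{pmatrix},$$
reconfirming our earlier result. By this device, we have managed to circumvent the tedious solving of linear equations.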
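The two-stage procedure of this example (orthogonalize first, then project) can be sketched as follows. This is an illustration only: it assumes the Euclidean dot product, the vectors $\mathbf{w}_1 = (1, 2, -1)^T$, $\mathbf{w}_2 = (2, -3, -1)^T$, $\mathbf{b} = (1, 0, 0)^T$, and linearly independent input; the function names are hypothetical.

```python
from fractions import Fraction

def dot(a, b):
    """Euclidean dot product."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Return an orthogonal (not normalized) basis with the same span.

    Assumes the input vectors are linearly independent.
    """
    basis = []
    for w in vectors:
        v = [Fraction(x) for x in w]
        for u in basis:
            c = dot(w, u) / dot(u, u)   # exact: u holds Fractions
            v = [vj - c * uj for vj, uj in zip(v, u)]
        basis.append(v)
    return basis

def project(b, basis):
    """Orthogonal projection of b onto the span of an orthogonal basis."""
    out = [Fraction(0)] * len(b)
    for u in basis:
        c = dot(b, u) / dot(u, u)
        out = [oj + c * uj for oj, uj in zip(out, u)]
    return out

basis = gram_schmidt([[1, 2, -1], [2, -3, -1]])
closest = project([1, 0, 0], basis)
print(basis[1], closest)
```

No linear system is solved at any stage; each coefficient is a single quotient of inner products.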
Let us revisit the problem, described in Section 4.4, of approximating experimental data by a least squares minimization procedure. The required calculations are significantly simplified by the introduction of an orthogonal basis of the least squares subspace. Given sample points $t_1, \dots, t_m$, let
$$\mathbf{t}_k = \left( t_1^k, t_2^k, \dots, t_m^k \right)^T, \qquad k = 0, 1, 2, \dots$$
be the vectors obtained by sampling the monomial $t^k$. More generally, sampling a polynomial
$$y = p(t) = \alpha_0 + \alpha_1 t + \cdots + \alpha_n t^n \tag{5.59}$$
results in the self-same linear combination
$$\mathbf{p} = \left( p(t_1), \dots, p(t_m) \right)^T = \alpha_0 \mathbf{t}_0 + \alpha_1 \mathbf{t}_1 + \cdots + \alpha_n \mathbf{t}_n \tag{5.60}$$
of monomial sample vectors. We conclude that the sampled polynomial vectors form a subspace $W = \operatorname{span}\{ \mathbf{t}_0, \dots, \mathbf{t}_n \} \subset \mathbb{R}^m$ spanned by the monomial sample vectors.

Let $\mathbf{y} = (y_1, y_2, \dots, y_m)^T$ denote data measured at the sample points. The polynomial least squares approximation to the given data is, by definition, the polynomial $y = p(t)$ whose corresponding sample vector $\mathbf{p} \in W$ is the closest point or, equivalently, the orthogonal projection of the data vector $\mathbf{y}$ onto the subspace $W$. The sample vectors $\mathbf{t}_0, \dots, \mathbf{t}_n$ are not orthogonal, and so the direct approach requires solving the normal equations (4.33) in order to find the desired polynomial least squares coefficients $\alpha_0, \dots, \alpha_n$.
An alternative method is to first use the Gram-Schmidt procedure to construct an orthogonal basis for the subspace $W$, from which the least squares coefficients are found by simply taking appropriate inner products. Let us adopt the rescaled version
$$\langle \mathbf{v} ; \mathbf{w} \rangle = \frac{1}{m} \sum_{i=1}^m v_i w_i = \overline{v\, w} \tag{5.61}$$
of the standard dot product on $\mathbb{R}^m$. If $\mathbf{v}, \mathbf{w}$ represent the sample vectors corresponding to the functions $v(t), w(t)$, then their inner product $\langle \mathbf{v} ; \mathbf{w} \rangle$ is equal to the average value of the product function $v(t)\, w(t)$ on the $m$ sample points. In particular, the inner product between our monomial basis vectors corresponding to sampling $t^k$ and $t^l$ is
$$\langle \mathbf{t}_k ; \mathbf{t}_l \rangle = \frac{1}{m} \sum_{i=1}^m t_i^k\, t_i^l = \frac{1}{m} \sum_{i=1}^m t_i^{k+l} = \overline{t^{k+l}}, \tag{5.62}$$
which is the averaged sample value of the monomial $t^{k+l}$.

To keep the formulae reasonably simple, let us further assume that the sample points are evenly spaced and symmetric about 0. The second requirement means that if $t_i$ is a sample point, so is $-t_i$. An example would be the seven sample points $-3, -2, -1, 0, 1, 2, 3$. As a consequence of these two assumptions, the averaged sample values of the odd powers of $t$ vanish: $\overline{t^{2i+1}} = 0$. Hence, by (5.62), the sample vectors $\mathbf{t}_k$ and $\mathbf{t}_l$ are orthogonal whenever $k + l$ is odd.
Applying the Gram-Schmidt algorithm to $\mathbf{t}_0, \mathbf{t}_1, \mathbf{t}_2, \dots$ produces the orthogonal basis vectors $\mathbf{q}_0, \mathbf{q}_1, \mathbf{q}_2, \dots$. Each $\mathbf{q}_k = ( q_k(t_1), \dots, q_k(t_m) )^T$ can be interpreted as the sample vector for a certain interpolating polynomial $q_k(t)$ of degree $k$. The first few polynomials $q_k(t)$, their corresponding orthogonal sample vectors, along with their squared norms, $\| \mathbf{q}_k \|^2 = \overline{q_k(t)^2}$, follow:
$$\begin{aligned} q_0(t) &= 1, & \mathbf{q}_0 &= \mathbf{t}_0, & \| \mathbf{q}_0 \|^2 &= 1, \\ q_1(t) &= t, & \mathbf{q}_1 &= \mathbf{t}_1, & \| \mathbf{q}_1 \|^2 &= \overline{t^2}, \\ q_2(t) &= t^2 - \overline{t^2}, & \mathbf{q}_2 &= \mathbf{t}_2 - \overline{t^2}\, \mathbf{t}_0, & \| \mathbf{q}_2 \|^2 &= \overline{t^4} - \left( \overline{t^2} \right)^2, \\ q_3(t) &= t^3 - \frac{\overline{t^4}}{\overline{t^2}}\, t, & \mathbf{q}_3 &= \mathbf{t}_3 - \frac{\overline{t^4}}{\overline{t^2}}\, \mathbf{t}_1, & \| \mathbf{q}_3 \|^2 &= \overline{t^6} - \frac{\left( \overline{t^4} \right)^2}{\overline{t^2}}. \end{aligned} \tag{5.63}$$

For weighted least squares, we would adopt an appropriately weighted inner product.

The method works without these particular assumptions, but the formulas become more unwieldy; see Exercise .

Figure 5.5. Least Squares Data Approximations (panels: Linear, Quadratic, Cubic).

With these in hand, the least squares approximating polynomial of degree $n$ to the given data vector $\mathbf{y}$ is given by a linear combination
$$p(t) = a_0\, q_0(t) + a_1\, q_1(t) + a_2\, q_2(t) + \cdots + a_n\, q_n(t). \tag{5.64}$$
The required coefficients are obtained directly through the orthogonality formulae (5.57), and so
$$a_k = \frac{\langle \mathbf{q}_k ; \mathbf{y} \rangle}{\| \mathbf{q}_k \|^2} = \frac{\overline{q_k\, y}}{\overline{q_k^2}}. \tag{5.65}$$


An additional advantage of the orthogonal basis approach, beyond the fact that one can write down explicit formulas for the coefficients, is that the same coefficients $a_j$ appear in all the least squares formulae, and hence one can readily increase the degree, and, presumably, the accuracy, of the approximating polynomial without having to recompute any of the lower degree terms. For instance, if a quadratic approximant $a_0 + a_1 q_1(t) + a_2 q_2(t)$ looks insufficiently close, one can add in the cubic term $a_3 q_3(t)$ with $a_3$ given by (5.65) for $k = 3$, without having to recompute the quadratic coefficients $a_0, a_1, a_2$. This simplification is not valid when using the non-orthogonal basis elements, where the lower order coefficients will change whenever the degree of the approximating polynomial is increased.
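Example 5.39. Consider the following tabulated sample values:
$$\begin{array}{c|ccccccc} t_i & -3 & -2 & -1 & 0 & 1 & 2 & 3 \\ \hline y_i & -1.4 & -1.3 & -0.6 & 0.1 & 0.9 & 1.8 & 2.9 \end{array}$$
To compute polynomial least squares fits of degrees 1, 2 and 3, we begin by computing the polynomials (5.63), which for the given sample points $t_i$ are
$$\begin{aligned} q_0(t) &= 1, & q_1(t) &= t, & q_2(t) &= t^2 - 4, & q_3(t) &= t^3 - 7\, t, \\ \| \mathbf{q}_0 \|^2 &= 1, & \| \mathbf{q}_1 \|^2 &= 4, & \| \mathbf{q}_2 \|^2 &= 12, & \| \mathbf{q}_3 \|^2 &= \tfrac{216}{7}. \end{aligned}$$
Thus, to four decimal places, the coefficients for the least squares approximation (5.64) are
$$\begin{aligned} a_0 &= \langle \mathbf{q}_0 ; \mathbf{y} \rangle = 0.3429, & a_1 &= \tfrac14 \langle \mathbf{q}_1 ; \mathbf{y} \rangle = 0.7357, \\ a_2 &= \tfrac{1}{12} \langle \mathbf{q}_2 ; \mathbf{y} \rangle = 0.0738, & a_3 &= \tfrac{7}{216} \langle \mathbf{q}_3 ; \mathbf{y} \rangle = -0.0083. \end{aligned}$$
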

To obtain the best linear approximation, we use
$$p_1(t) = a_0\, q_0(t) + a_1\, q_1(t) = 0.3429 + 0.7357\, t,$$
with a least squares error of 0.7081. Similarly, the quadratic and cubic least squares approximations are
$$\begin{aligned} p_2(t) &= 0.3429 + 0.7357\, t + 0.0738\, (t^2 - 4), \\ p_3(t) &= 0.3429 + 0.7357\, t + 0.0738\, (t^2 - 4) - 0.0083\, (t^3 - 7\, t), \end{aligned}$$
with respective least squares errors 0.2093 and 0.1697 at the sample points. A plot of the three approximations appears in Figure 5.5. The cubic term does not significantly increase the accuracy of the approximation, and so this data probably comes from sampling a quadratic function.
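The computation in this example is short enough to replay. The sketch below rests on two assumptions that should be read as such: the data vector `y` is taken to be $(-1.4, -1.3, -0.6, 0.1, 0.9, 1.8, 2.9)$ at $t = -3, \dots, 3$, and the "least squares error" is interpreted as the Euclidean norm of the residual at the sample points. The names `avg_dot` and `error` are illustrative.

```python
import math

t = [-3, -2, -1, 0, 1, 2, 3]
y = [-1.4, -1.3, -0.6, 0.1, 0.9, 1.8, 2.9]   # assumed data values
m = len(t)

def avg_dot(u, v):
    """Rescaled dot product (5.61): the average of the entrywise products."""
    return sum(a * b for a, b in zip(u, v)) / m

# Sample vectors of q_0, ..., q_3 from (5.63) for these sample points.
q = [
    [1.0 for s in t],
    [float(s) for s in t],
    [s**2 - 4.0 for s in t],
    [s**3 - 7.0 * s for s in t],
]

# Coefficients (5.65); each one is computed independently of the others.
a = [avg_dot(qk, y) / avg_dot(qk, qk) for qk in q]

def error(n):
    """Euclidean norm of the residual y - p_n at the sample points."""
    fit = [sum(a[k] * q[k][i] for k in range(n + 1)) for i in range(m)]
    return math.sqrt(sum((yi - fi) ** 2 for yi, fi in zip(y, fit)))

print([round(ak, 4) for ak in a])
print([round(error(n), 4) for n in (1, 2, 3)])
```

Raising the degree only appends one more entry to `a`; the earlier entries are untouched, which is the practical content of the remark preceding the example.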
Orthogonal Polynomials and Least Squares
In a similar fashion, the orthogonality of Legendre polynomials and more general orthogonal functions serves to simplify the construction of least squares approximants in function space. As an example, let us reconsider the problem, from Chapter 4, of approximating $e^t$ by a polynomial of degree $n$. For the interval $-1 \le t \le 1$, we write the best least squares approximant as a linear combination of Legendre polynomials,
$$p(t) = a_0 P_0(t) + a_1 P_1(t) + \cdots + a_n P_n(t) = a_0 + a_1 t + a_2 \left( \tfrac32 t^2 - \tfrac12 \right) + \cdots . \tag{5.66}$$
Since the Legendre polynomials form an orthogonal basis, the least squares coefficients can be immediately computed by the inner product formula (5.57), so
$$a_k = \frac{\langle e^t ; P_k \rangle}{\| P_k \|^2} = \frac{2k + 1}{2} \int_{-1}^{1} e^t\, P_k(t)\, dt .$$
For example, the quadratic approximant is obtained from the first three terms in (5.66), where
$$\begin{aligned} a_0 &= \frac12 \int_{-1}^{1} e^t\, dt = \frac12 \left( e - \frac1e \right) \approx 1.175201, \qquad a_1 = \frac32 \int_{-1}^{1} t\, e^t\, dt = \frac3e \approx 1.103638, \\ a_2 &= \frac52 \int_{-1}^{1} \left( \tfrac32 t^2 - \tfrac12 \right) e^t\, dt = \frac52 \left( e - \frac7e \right) \approx .357814 . \end{aligned}$$
Figure 5.6. Quadratic Least Squares Approximation to $e^t$.

Therefore
$$e^t \approx 1.175201 + 1.103638\, t + .357814 \left( \tfrac32 t^2 - \tfrac12 \right) \tag{5.67}$$
gives the quadratic least squares approximation to $e^t$ on $[-1, 1]$. Graphs appear in Figure 5.6; the first graph shows $e^t$, the second the quadratic approximant (5.67), and the third lays the two graphs on top of each other.
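The Legendre coefficients can also be approximated numerically rather than by integrating in closed form. The following sketch is one possible way to do so, not the method of the text: it evaluates $P_k$ by the standard three-term recurrence and approximates the integral with a composite Simpson rule (a choice made here for simplicity). All function names are hypothetical.

```python
import math

def legendre(k, t):
    """Evaluate the Legendre polynomial P_k(t) by the three-term recurrence."""
    p_prev, p = 1.0, t
    if k == 0:
        return p_prev
    for n in range(1, k):
        # (n + 1) P_{n+1}(t) = (2n + 1) t P_n(t) - n P_{n-1}(t)
        p_prev, p = p, ((2 * n + 1) * t * p - n * p_prev) / (n + 1)
    return p

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def legendre_coefficient(f, k):
    """a_k = (2k + 1)/2 times the integral over [-1, 1] of f(t) P_k(t)."""
    return (2 * k + 1) / 2 * simpson(lambda t: f(t) * legendre(k, t), -1.0, 1.0)

coeffs = [legendre_coefficient(math.exp, k) for k in range(4)]
print([round(c, 6) for c in coeffs])
```

Each coefficient is obtained from its own integral, so requesting a further term never alters those already computed.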
As in the discrete case, there are two major advantages of the orthogonal Legendre
approach over the direct approach presented in Example 4.21. First, we do not need
to solve any linear systems of equations. Indeed, the coefficient matrix for polynomial
least squares approximation based on the monomial basis is some variant of the notoriously ill-conditioned Hilbert matrix, (1.67), and the computation of an accurate solution
is particularly tricky. Our precomputation of an orthogonal system of polynomials has
successfully circumvented the dangerous Hilbert matrix system.
The second advantage was already mentioned in the preceding subsection. Unlike the direct approach, the coefficients $a_k$ do not change if we desire to go to higher accuracy by increasing the degree of the approximating polynomial. For instance, in the first case, if the quadratic approximation (5.67) is not accurate enough, we can add in a cubic correction $a_3 P_3(t) = a_3 \left( \tfrac52 t^3 - \tfrac32 t \right)$, where we compute the required coefficient by
$$a_3 = \frac72 \int_{-1}^{1} \left( \tfrac52 t^3 - \tfrac32 t \right) e^t\, dt = \frac72 \left( \frac{37}{e} - 5\, e \right) \approx .070456 .$$


We do not need to recompute the coefficients $a_0, a_1, a_2$. The successive Legendre polynomial coefficients decrease fairly rapidly:
$$\begin{aligned} a_0 &\approx 1.175201, & a_1 &\approx 1.103638, & a_2 &\approx .357814, & a_3 &\approx .070456, \\ a_4 &\approx .009965, & a_5 &\approx .001100, & a_6 &\approx .000099, \end{aligned}$$
leading to greater and greater accuracy in the least squares approximation. An explanation will appear in Chapter 12.
If we switch to another norm, then we need to construct an associated set of orthogonal polynomials to apply the method. For instance, the polynomial least squares approximation of degree $n$ to a function $f(t)$ with respect to the $L^2$ norm on $[0, 1]$ has the form $a_0 + a_1 \tilde{P}_1(t) + a_2 \tilde{P}_2(t) + \cdots + a_n \tilde{P}_n(t)$, where $\tilde{P}_k(t)$ are the rescaled Legendre polynomials (5.55), and, by orthogonality,
$$a_k = \frac{\langle f ; \tilde{P}_k \rangle}{\| \tilde{P}_k \|^2} = (2k + 1) \int_0^1 f(t)\, \tilde{P}_k(t)\, dt .$$
For the particular function $e^t$, we find
$$\begin{aligned} a_0 &= \int_0^1 e^t\, dt = e - 1 \approx 1.718282, \qquad a_1 = 3 \int_0^1 (2t - 1)\, e^t\, dt = 3\, (3 - e) \approx .845155, \\ a_2 &= 5 \int_0^1 (6 t^2 - 6 t + 1)\, e^t\, dt = 5\, (7 e - 19) \approx .139864 . \end{aligned}$$
Thus, the best quadratic least squares approximation is
$$\begin{aligned} p_2^\star(t) &= 1.718282 + .845155\, (2t - 1) + .139864\, (6 t^2 - 6 t + 1) \\ &= 1.012991 + .851125\, t + .839184\, t^2 . \end{aligned}$$
It is worth emphasizing that this is the same approximating polynomial as we computed in (4.60). The use of an orthogonal system of polynomials merely streamlines the computation.
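The passage from the rescaled Legendre form to the monomial form is a small piece of bookkeeping that is easy to get wrong by hand. The sketch below redoes it from the closed-form coefficients $e - 1$, $3(3 - e)$, $5(7e - 19)$; it assumes the rescaled polynomials $\tilde{P}_0 = 1$, $\tilde{P}_1 = 2t - 1$, $\tilde{P}_2 = 6t^2 - 6t + 1$ appearing in the integrals, and the variable names are illustrative.

```python
import math

e = math.e
# Closed-form coefficients for f(t) = e^t on [0, 1].
a0, a1, a2 = e - 1, 3 * (3 - e), 5 * (7 * e - 19)

# Rescaled Legendre polynomials as coefficient lists [c0, c1, c2],
# meaning c0 + c1*t + c2*t^2.
shifted = [[1, 0, 0], [-1, 2, 0], [1, -6, 6]]

# Collect the coefficient of each power of t.
monomial = [
    sum(a * p[j] for a, p in zip((a0, a1, a2), shifted)) for j in range(3)
]
print([round(c, 6) for c in monomial])
```

The three resulting numbers are the constant, linear and quadratic coefficients of the approximating polynomial in the monomial basis.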

5.6. Orthogonal Subspaces.

We now extend the notion of orthogonality from individual elements to subspaces of
an inner product space V .
Definition 5.40. Two subspaces $W, Z \subset V$ are called orthogonal if every vector in $W$ is orthogonal to every vector in $Z$.
In other words, $W$ and $Z$ are orthogonal subspaces if and only if $\langle \mathbf{w} ; \mathbf{z} \rangle = 0$ for every $\mathbf{w} \in W$, $\mathbf{z} \in Z$. In practice, one only needs to check orthogonality of basis elements: If $\mathbf{w}_1, \dots, \mathbf{w}_k$ is a basis for $W$ and $\mathbf{z}_1, \dots, \mathbf{z}_l$ a basis for $Z$, then $W$ and $Z$ are orthogonal if and only if $\langle \mathbf{w}_i ; \mathbf{z}_j \rangle = 0$ for all $i = 1, \dots, k$ and $j = 1, \dots, l$.
Example 5.41. The plane $W \subset \mathbb{R}^3$ defined by the equation $2x - y + 3z = 0$ is orthogonal, with respect to the dot product, to the line $Z$ spanned by its normal vector $\mathbf{n} = (2, -1, 3)^T$. Indeed, every $\mathbf{w} = (x, y, z)^T \in W$ satisfies the orthogonality condition $\mathbf{w} \cdot \mathbf{n} = 2x - y + 3z = 0$, which is just the equation for the plane.
Example 5.42. Let $W$ be the span of $\mathbf{w}_1 = (1, -2, 0, 1)^T$, $\mathbf{w}_2 = (3, -5, 2, 1)^T$, and $Z$ the span of $\mathbf{z}_1 = (3, 2, 0, 1)^T$, $\mathbf{z}_2 = (1, 0, -1, -1)^T$. Then all $\mathbf{w}_i \cdot \mathbf{z}_j = 0$, and hence $W$ and $Z$ are orthogonal subspaces of $\mathbb{R}^4$ under the Euclidean dot product.
Definition 5.43. The orthogonal complement to a subspace $W \subset V$, denoted $W^\perp$, is defined as the set of all vectors which are orthogonal to $W$, so
$$W^\perp = \{\, \mathbf{v} \in V \mid \langle \mathbf{v} ; \mathbf{w} \rangle = 0 \ \text{ for all } \ \mathbf{w} \in W \,\} . \tag{5.68}$$
One easily checks that the orthogonal complement $W^\perp$ to a subspace $W \subset V$ is also a subspace. Moreover, $W \cap W^\perp = \{ \mathbf{0} \}$. (Why?) Note that the orthogonal complement to a subspace will depend upon which inner product is being used. In the remainder of the chapter, we will concentrate exclusively on the Euclidean inner product.

Figure 5.7. Orthogonal Complement to a Line.

Example 5.44. Let $W = \{\, (t, 2t, 3t)^T \mid t \in \mathbb{R} \,\}$ be the line (one-dimensional subspace) in the direction of the vector $\mathbf{w}_1 = (1, 2, 3)^T \in \mathbb{R}^3$. The orthogonal complement $W^\perp$ will be the plane passing through the origin having normal vector $\mathbf{w}_1$, as sketched in Figure 5.7. In other words, $\mathbf{z} = (x, y, z)^T \in W^\perp$ if and only if
$$\mathbf{z} \cdot \mathbf{w}_1 = x + 2y + 3z = 0 . \tag{5.69}$$
Thus $W^\perp$ is characterized as the solution space to the homogeneous linear equation (5.69), or, equivalently, the kernel of the $1 \times 3$ matrix $A = \mathbf{w}_1^T = ( 1 \ \ 2 \ \ 3 )$. We can write the general solution to the equation in the form
$$\mathbf{z} = \begin{pmatrix} -2y - 3z \\ y \\ z \end{pmatrix} = y \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix} + z \begin{pmatrix} -3 \\ 0 \\ 1 \end{pmatrix} = y\, \mathbf{z}_1 + z\, \mathbf{z}_2 ,$$
where $y, z$ are the free variables. The indicated solution vectors $\mathbf{z}_1 = (-2, 1, 0)^T$, $\mathbf{z}_2 = (-3, 0, 1)^T$, form a (non-orthogonal) basis for the orthogonal complement $W^\perp$.
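Proposition 5.45. Suppose that $W \subset V$ is a finite-dimensional subspace of an inner product space. Then every vector $\mathbf{v} \in V$ can be uniquely decomposed into $\mathbf{v} = \mathbf{w} + \mathbf{z}$ where $\mathbf{w} \in W$ and $\mathbf{z} \in W^\perp$.
Proof : We let $\mathbf{w} \in W$ be the orthogonal projection of $\mathbf{v}$ onto $W$. Then $\mathbf{z} = \mathbf{v} - \mathbf{w}$ is, by definition, orthogonal to $W$ and hence belongs to $W^\perp$. Note that $\mathbf{z}$ can be viewed as the orthogonal projection of $\mathbf{v}$ onto the complementary subspace $W^\perp$. If we are given two such decompositions, $\mathbf{v} = \mathbf{w} + \mathbf{z} = \tilde{\mathbf{w}} + \tilde{\mathbf{z}}$, then $\mathbf{w} - \tilde{\mathbf{w}} = \tilde{\mathbf{z}} - \mathbf{z}$. The left hand side of this equation lies in $W$ while the right hand side belongs to $W^\perp$. But, as we already noted, the only vector that belongs to both $W$ and $W^\perp$ is the zero vector. Thus, $\mathbf{w} = \tilde{\mathbf{w}}$ and $\mathbf{z} = \tilde{\mathbf{z}}$, which proves uniqueness.
Q.E.D.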

Figure 5.8. Orthogonal Decomposition of a Vector.

As a direct consequence of Exercise , we conclude that a subspace and its orthogonal complement have complementary dimensions:
Proposition 5.46. If $\dim W = m$ and $\dim V = n$, then $\dim W^\perp = n - m$.
Example 5.47. Return to the situation described in Example 5.44. Let us decompose the vector $\mathbf{v} = (1, 0, 0)^T \in \mathbb{R}^3$ into a sum $\mathbf{v} = \mathbf{w} + \mathbf{z}$ of a vector $\mathbf{w}$ lying in the line $W$ and a vector $\mathbf{z}$ belonging to its orthogonal plane $W^\perp$, defined by (5.69). Each is obtained by an orthogonal projection onto the subspace in question, but we only need to compute one of the two directly since the second can be obtained by subtracting from $\mathbf{v}$.
Orthogonal projection onto a one-dimensional subspace is easy since any basis is, trivially, an orthogonal basis. Thus, the projection of $\mathbf{v}$ onto the line spanned by $\mathbf{w}_1 = (1, 2, 3)^T$ is $\mathbf{w} = \| \mathbf{w}_1 \|^{-2} \langle \mathbf{v} ; \mathbf{w}_1 \rangle\, \mathbf{w}_1 = \left( \frac{1}{14}, \frac{2}{14}, \frac{3}{14} \right)^T$. The component in $W^\perp$ is then obtained by subtraction: $\mathbf{z} = \mathbf{v} - \mathbf{w} = \left( \frac{13}{14}, -\frac{2}{14}, -\frac{3}{14} \right)^T$. Alternatively, one can obtain $\mathbf{z}$ directly by orthogonal projection onto the plane $W^\perp$. You need to be careful: the basis derived in Example 5.44 is not orthogonal, and so you will need to set up and solve the normal equations to find the closest point $\mathbf{z}$. Or, you can first convert the basis into an orthogonal basis by a single Gram-Schmidt step, and then use the orthogonal projection formula (5.57). All three methods lead to the same vector $\mathbf{z} \in W^\perp$.

Example 5.48. Let $W \subset \mathbb{R}^4$ be the two-dimensional subspace spanned by the orthogonal vectors $\mathbf{w}_1 = (1, 1, 0, 1)^T$ and $\mathbf{w}_2 = (1, 1, 1, -2)^T$. Its orthogonal complement $W^\perp$ (with respect to the Euclidean dot product) is the set of all vectors $\mathbf{v} = (x, y, z, w)^T$ that satisfy the linear system
$$\mathbf{v} \cdot \mathbf{w}_1 = x + y + w = 0, \qquad \mathbf{v} \cdot \mathbf{w}_2 = x + y + z - 2w = 0 .$$
Applying the usual algorithm (the free variables are $y$ and $w$), we find that the solution space is spanned by $\mathbf{z}_1 = (-1, 1, 0, 0)^T$, $\mathbf{z}_2 = (-1, 0, 3, 1)^T$, which form a non-orthogonal basis for $W^\perp$.

The orthogonal basis $\mathbf{y}_1 = \mathbf{z}_1 = (-1, 1, 0, 0)^T$ and $\mathbf{y}_2 = \mathbf{z}_2 - \frac12 \mathbf{z}_1 = \left( -\frac12, -\frac12, 3, 1 \right)^T$ for $W^\perp$ is obtained by a single Gram-Schmidt step. To decompose the vector $\mathbf{v} = (1, 0, 0, 0)^T = \mathbf{w} + \mathbf{z}$, say, we compute the two orthogonal projections: $\mathbf{w} = \frac13 \mathbf{w}_1 + \frac17 \mathbf{w}_2 = \left( \frac{10}{21}, \frac{10}{21}, \frac17, \frac{1}{21} \right)^T \in W$, and $\mathbf{z} = -\frac12 \mathbf{y}_1 - \frac{1}{21} \mathbf{y}_2 = \left( \frac{11}{21}, -\frac{10}{21}, -\frac17, -\frac{1}{21} \right)^T \in W^\perp$. Or you can easily obtain $\mathbf{z} = \mathbf{v} - \mathbf{w}$ by subtraction.
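The decomposition $\mathbf{v} = \mathbf{w} + \mathbf{z}$ of this example can be confirmed by computing both projections independently and comparing their sum with $\mathbf{v}$. The sketch below assumes the bases $\mathbf{w}_1 = (1, 1, 0, 1)^T$, $\mathbf{w}_2 = (1, 1, 1, -2)^T$ for $W$ and $\mathbf{y}_1 = (-1, 1, 0, 0)^T$, $\mathbf{y}_2 = (-\frac12, -\frac12, 3, 1)^T$ for $W^\perp$; the helper `project` is a hypothetical name.

```python
from fractions import Fraction

def dot(a, b):
    """Euclidean dot product."""
    return sum(x * y for x, y in zip(a, b))

def project(v, basis):
    """Projection onto span(basis); the basis must be mutually orthogonal."""
    out = [Fraction(0)] * len(v)
    for u in basis:
        c = Fraction(dot(v, u)) / Fraction(dot(u, u))
        out = [oj + c * uj for oj, uj in zip(out, u)]
    return out

w1, w2 = [1, 1, 0, 1], [1, 1, 1, -2]            # orthogonal basis of W
y1 = [-1, 1, 0, 0]                               # orthogonal basis of the
y2 = [Fraction(-1, 2), Fraction(-1, 2), 3, 1]    # orthogonal complement
v = [1, 0, 0, 0]

w = project(v, [w1, w2])     # component in W
z = project(v, [y1, y2])     # component in the orthogonal complement
print(w, z)
print([a + b for a, b in zip(w, z)] == v, dot(w, z) == 0)
```

That the two projections add up to $\mathbf{v}$ reflects the complementary dimensions $2 + 2 = 4$: together the four basis vectors span all of $\mathbb{R}^4$.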
Proposition 5.49. If $W$ is a finite-dimensional subspace of an inner product space, then $(W^\perp)^\perp = W$.
This result is an immediate corollary of the orthogonal decomposition Proposition 5.45.
Warning: Propositions 5.45 and 5.49 are not necessarily true for infinite-dimensional vector spaces. In general, if $\dim W = \infty$, one can only assert that $W \subseteq (W^\perp)^\perp$. For example, it can be shown that, [125], on any bounded interval $[a, b]$ the orthogonal complement to the subspace of all polynomials $P^{(\infty)} \subset C^0[a, b]$ with respect to the $L^2$ inner product is trivial: $(P^{(\infty)})^\perp = \{ 0 \}$. This means that the only continuous function which satisfies the moment equations
$$\langle x^n ; f(x) \rangle = \int_a^b x^n f(x)\, dx = 0, \qquad \text{for all} \qquad n = 0, 1, 2, \dots$$
is the zero function $f(x) \equiv 0$. But the orthogonal complement of $\{ 0 \}$ is the entire space, and so $\left( (P^{(\infty)})^\perp \right)^\perp = C^0[a, b] \ne P^{(\infty)}$.
The difference is that, in infinite-dimensional function space, a proper subspace $W \subsetneq V$ can be dense, whereas in finite dimensions, every proper subspace is a thin subset that only occupies an infinitesimal fraction of the entire vector space. This seeming paradox underlies the success of numerical methods, such as the finite element method, in approximating functions by elements of a subspace.
Orthogonality of the Fundamental Matrix Subspaces and the Fredholm Alternative
In Chapter 2, we introduced the four fundamental subspaces associated with an $m \times n$ matrix $A$. According to the fundamental Theorem 2.47, the first two, the kernel or null space and the corange or row space, are subspaces of $\mathbb{R}^n$ having complementary dimensions. The second two, the cokernel or left null space and the range or column space, are subspaces of $\mathbb{R}^m$, also of complementary dimensions. In fact, more than this is true: the subspace pairs are orthogonal complements with respect to the standard Euclidean dot product!
Theorem 5.50. Let $A$ be an $m \times n$ matrix of rank $r$. Then its kernel and corange are orthogonal complements as subspaces of $\mathbb{R}^n$, of respective dimensions $n - r$ and $r$, while its cokernel and range are orthogonal complements in $\mathbb{R}^m$, of respective dimensions $m - r$ and $r$:
$$\ker A = (\operatorname{corng} A)^\perp \subset \mathbb{R}^n, \qquad \operatorname{coker} A = (\operatorname{rng} A)^\perp \subset \mathbb{R}^m . \tag{5.70}$$
Figure 5.9 illustrates the geometric configuration of (5.70).

In general, a subset $W \subset V$ of a normed vector space is dense if, for every $\mathbf{v} \in V$, there are elements $\mathbf{w} \in W$ that are arbitrarily close, $\| \mathbf{v} - \mathbf{w} \| < \varepsilon$ for every $\varepsilon > 0$. The Weierstrass approximation theorem, [126], tells us that the polynomials form a dense subspace of the space of continuous functions, and underlies the proof of the result mentioned in the preceding paragraph.

Figure 5.9. The Fundamental Matrix Subspaces (labels: ker A, corng A, coker A, rng A).

Proof : A vector $\mathbf{x} \in \mathbb{R}^n$ lies in $\ker A$ if and only if $A \mathbf{x} = \mathbf{0}$. According to the rules of matrix multiplication, the $i$th entry of $A \mathbf{x}$ equals the product of the $i$th row $\mathbf{r}_i^T$ of $A$ and $\mathbf{x}$. But this product vanishes, $\mathbf{r}_i^T \mathbf{x} = \mathbf{r}_i \cdot \mathbf{x} = 0$, if and only if $\mathbf{x}$ is orthogonal to $\mathbf{r}_i$. Therefore, $\mathbf{x} \in \ker A$ if and only if $\mathbf{x}$ is orthogonal to all the rows of $A$. Since the rows span $\operatorname{corng} A = \operatorname{rng} A^T$, this is equivalent to the statement that $\mathbf{x}$ lies in the orthogonal complement $(\operatorname{corng} A)^\perp$, which proves the first statement. The proof for the range and cokernel follows from the same argument applied to the transposed matrix $A^T$. Q.E.D.
Combining Theorems 2.47 and 5.50, we deduce the following important characterization of compatible linear systems, known as the Fredholm alternative. The Swedish mathematician Ivar Fredholm's main interest was in solving linear integral equations, but his compatibility criterion is also applicable to linear matrix systems, as well as linear differential equations, linear variational problems, and many other linear systems.
Theorem 5.51. The linear system $A \mathbf{x} = \mathbf{b}$ has a solution if and only if $\mathbf{b}$ is orthogonal to the cokernel of $A$.
Indeed, the linear system has a solution if and only if the right hand side $\mathbf{b} \in \operatorname{rng} A$ belongs to the range of $A$, which, by (5.70), requires that $\mathbf{b}$ be orthogonal to the cokernel $\operatorname{coker} A$. Therefore, the compatibility conditions for the linear system $A \mathbf{x} = \mathbf{b}$ can be written in the form
$$\mathbf{y} \cdot \mathbf{b} = 0 \qquad \text{for every } \mathbf{y} \text{ satisfying} \qquad A^T \mathbf{y} = \mathbf{0} . \tag{5.71}$$
Or, to state in another way, the vector $\mathbf{b}$ is a linear combination of the columns of $A$ if and only if it is orthogonal to every vector $\mathbf{y}$ in the cokernel of $A$. In practice, one only needs to check orthogonality of $\mathbf{b}$ with respect to a basis $\mathbf{y}_1, \dots, \mathbf{y}_{m-r}$ of the cokernel, leading to a system of $m - r$ compatibility constraints, where $r = \operatorname{rank} A$ denotes the rank of the coefficient matrix. We note that $m - r$ is also the number of all zero rows in the row echelon form of $A$, and hence yields precisely the same number of constraints on the right hand side $\mathbf{b}$.
Example 5.52. In Example 2.40, we analyzed the linear system $A \mathbf{x} = \mathbf{b}$ with coefficient matrix $A = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -2 \\ 1 & -2 & 3 \end{pmatrix}$. Using direct Gaussian elimination, we were led to a single compatibility condition, namely $-b_1 + 2 b_2 + b_3 = 0$, required for the system to have a solution. We now understand the meaning behind this equation: it is telling us that the right hand side $\mathbf{b}$ must be orthogonal to the cokernel of $A$. The cokernel is determined by solving the homogeneous adjoint system $A^T \mathbf{y} = \mathbf{0}$, and is the line spanned by the vector $\mathbf{y}_1 = (-1, 2, 1)^T$. Thus, the compatibility condition requires that $\mathbf{b}$ be orthogonal to $\mathbf{y}_1$, in accordance with the Fredholm Theorem 5.51.
Example 5.53. Let us determine the compatibility conditions for the linear system
$$x_1 - x_2 + 3 x_3 = b_1, \qquad -x_1 + 2 x_2 - 4 x_3 = b_2, \qquad 2 x_1 + 3 x_2 + x_3 = b_3, \qquad x_1 + 2 x_3 = b_4,$$
by computing the cokernel of its coefficient matrix $A = \begin{pmatrix} 1 & -1 & 3 \\ -1 & 2 & -4 \\ 2 & 3 & 1 \\ 1 & 0 & 2 \end{pmatrix}$. To this end, we need to solve the homogeneous adjoint system $A^T \mathbf{y} = \mathbf{0}$, namely
$$y_1 - y_2 + 2 y_3 + y_4 = 0, \qquad -y_1 + 2 y_2 + 3 y_3 = 0, \qquad 3 y_1 - 4 y_2 + y_3 + 2 y_4 = 0 .$$
Using Gaussian elimination, we find that the general solution
$$\mathbf{y} = y_3\, (-7, -5, 1, 0)^T + y_4\, (-2, -1, 0, 1)^T$$
is a linear combination (whose coefficients are the free variables) of the two basis vectors for $\operatorname{coker} A$. Thus, the compatibility conditions are obtained by taking their dot products with the right hand side of the original system:
$$-7 b_1 - 5 b_2 + b_3 = 0, \qquad -2 b_1 - b_2 + b_4 = 0 .$$
The reader can check that these are indeed the same compatibility conditions that result from a direct Gaussian elimination on the augmented matrix $\left( A \mid \mathbf{b} \right)$.
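The Fredholm test of this example lends itself to a direct check. The sketch below assumes the coefficient matrix with rows $(1, -1, 3)$, $(-1, 2, -4)$, $(2, 3, 1)$, $(1, 0, 2)$ and the cokernel basis $(-7, -5, 1, 0)^T$, $(-2, -1, 0, 1)^T$; the function names and the two sample right hand sides are hypothetical choices made for illustration.

```python
A = [[1, -1, 3],
     [-1, 2, -4],
     [2, 3, 1],
     [1, 0, 2]]

# Basis of coker A quoted in the example (solutions of A^T y = 0).
coker = [[-7, -5, 1, 0], [-2, -1, 0, 1]]

def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def is_compatible(b):
    """Fredholm test: b must be orthogonal to every cokernel basis vector."""
    return all(sum(yi * bi for yi, bi in zip(y, b)) == 0 for y in coker)

# Each basis vector really lies in the cokernel.
assert all(matvec(transpose(A), y) == [0, 0, 0] for y in coker)

b_good = matvec(A, [1, 2, 3])   # lies in rng A by construction
b_bad = [1, 0, 0, 0]            # violates both compatibility conditions
print(b_good, is_compatible(b_good), is_compatible(b_bad))
```

A right hand side built as $A \mathbf{x}$ passes the test automatically, whereas a generic vector of $\mathbb{R}^4$ does not, since the range is only a two-dimensional subspace cut out by the two conditions.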

We are now very close to a full understanding of the fascinating geometry that lurks behind the simple algebraic operation of multiplying a vector $\mathbf{x} \in \mathbb{R}^n$ by an $m \times n$ matrix, resulting in a vector $\mathbf{b} = A \mathbf{x} \in \mathbb{R}^m$. Since the kernel and corange of $A$ are orthogonal complementary subspaces in the domain space $\mathbb{R}^n$, Proposition 5.45 tells us that we can uniquely decompose $\mathbf{x} = \mathbf{w} + \mathbf{z}$ where $\mathbf{w} \in \operatorname{corng} A$, while $\mathbf{z} \in \ker A$. Since $A \mathbf{z} = \mathbf{0}$, we have
$$\mathbf{b} = A \mathbf{x} = A (\mathbf{w} + \mathbf{z}) = A \mathbf{w} .$$
Therefore, we can regard multiplication by $A$ as a combination of two operations:
(i ) The first is an orthogonal projection onto the subspace $\operatorname{corng} A$ taking $\mathbf{x}$ to $\mathbf{w}$.
(ii ) The second takes a vector in $\operatorname{corng} A \subset \mathbb{R}^n$ to a vector in $\operatorname{rng} A \subset \mathbb{R}^m$, taking the orthogonal projection $\mathbf{w}$ to the image vector $\mathbf{b} = A \mathbf{w} = A \mathbf{x}$.
Moreover, if $A$ has rank $r$ then, according to Theorem 2.47, both $\operatorname{rng} A$ and $\operatorname{corng} A$ are $r$-dimensional subspaces, albeit of different vector spaces. Each vector $\mathbf{b} \in \operatorname{rng} A$ corresponds to a unique vector $\mathbf{w} \in \operatorname{corng} A$. Indeed, if $\mathbf{w}, \tilde{\mathbf{w}} \in \operatorname{corng} A$ satisfy $\mathbf{b} = A \mathbf{w} = A \tilde{\mathbf{w}}$, then $A (\mathbf{w} - \tilde{\mathbf{w}}) = \mathbf{0}$ and hence $\mathbf{w} - \tilde{\mathbf{w}} \in \ker A$. But, since they are complementary subspaces, the only vector that belongs to both the kernel and the corange is the zero vector, and hence $\mathbf{w} = \tilde{\mathbf{w}}$. In this manner, we have proved the first part of the following result; the second is left as Exercise .


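Proposition 5.54. Multiplication by an $m \times n$ matrix $A$ of rank $r$ defines a one-to-one correspondence between the $r$-dimensional subspaces $\operatorname{corng} A \subset \mathbb{R}^n$ and $\operatorname{rng} A \subset \mathbb{R}^m$. Moreover, if $\mathbf{v}_1, \dots, \mathbf{v}_r$ form a basis of $\operatorname{corng} A$ then their images $A \mathbf{v}_1, \dots, A \mathbf{v}_r$ form a basis for $\operatorname{rng} A$.
In summary, the linear system $A \mathbf{x} = \mathbf{b}$ has a solution if and only if $\mathbf{b} \in \operatorname{rng} A$, or, equivalently, is orthogonal to every vector $\mathbf{y} \in \operatorname{coker} A$. If the compatibility conditions hold, then the system has a unique solution $\mathbf{w} \in \operatorname{corng} A$ that, by the definition of the corange or row space, is a linear combination of the rows of $A$. The general solution to the system is $\mathbf{x} = \mathbf{w} + \mathbf{z}$ where $\mathbf{w}$ is the particular solution belonging to the corange, while $\mathbf{z} \in \ker A$ is an arbitrary element of the kernel.
Theorem 5.55. A compatible linear system $A \mathbf{x} = \mathbf{b}$ with $\mathbf{b} \in \operatorname{rng} A = (\operatorname{coker} A)^\perp$ has a unique solution $\mathbf{w} \in \operatorname{corng} A$ with $A \mathbf{w} = \mathbf{b}$. The general solution is $\mathbf{x} = \mathbf{w} + \mathbf{z}$ where $\mathbf{z} \in \ker A$. The particular solution is distinguished by the fact that it has minimum Euclidean norm $\| \mathbf{w} \|$ among all possible solutions.
Indeed, since the corange and kernel are orthogonal subspaces, the norm of a general solution $\mathbf{x} = \mathbf{w} + \mathbf{z}$ is
$$\| \mathbf{x} \|^2 = \| \mathbf{w} + \mathbf{z} \|^2 = \| \mathbf{w} \|^2 + 2\, \mathbf{w} \cdot \mathbf{z} + \| \mathbf{z} \|^2 = \| \mathbf{w} \|^2 + \| \mathbf{z} \|^2 \ge \| \mathbf{w} \|^2 ,$$
with equality if and only if $\mathbf{z} = \mathbf{0}$.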
Example 5.56. Consider the linear system
$$\begin{pmatrix} 1 & -1 & 2 & -2 \\ 0 & 1 & -2 & 1 \\ 1 & 3 & -5 & 2 \\ 5 & -1 & 9 & -6 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \\ 4 \\ 6 \end{pmatrix} .$$
Applying the standard Gaussian elimination algorithm, we discover that the coefficient matrix has rank 3, and the kernel is spanned by the single vector $\mathbf{z}_1 = (1, -1, 0, 1)^T$. The system itself is compatible; indeed, the right hand side is orthogonal to the basis cokernel vector $(2, 24, -7, 1)^T$, and so satisfies the Fredholm alternative.
The general solution to the linear system is $\mathbf{x} = (t, 3 - t, 1, t)^T$ where $t = w$ is the free variable. We decompose the solution $\mathbf{x} = \mathbf{w} + \mathbf{z}$ into a vector $\mathbf{w}$ in the corange and an element $\mathbf{z}$ in the kernel. The easiest way to do this is to first compute its orthogonal projection $\mathbf{z} = \| \mathbf{z}_1 \|^{-2} (\mathbf{x} \cdot \mathbf{z}_1)\, \mathbf{z}_1 = (t - 1, 1 - t, 0, t - 1)^T$ of the solution $\mathbf{x}$ onto the one-dimensional kernel. We conclude that $\mathbf{w} = \mathbf{x} - \mathbf{z} = (1, 2, 1, 1)^T \in \operatorname{corng} A$ is the unique solution belonging to the corange of the coefficient matrix, i.e., the only solution that can be written as a linear combination of its row vectors, or, equivalently, the only solution which is orthogonal to the kernel. The reader should check this by finding the coefficients in the linear combination, or, equivalently, writing $\mathbf{w} = A^T \mathbf{v}$ for some $\mathbf{v} \in \mathbb{R}^4$.
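The claims of this example can be spot-checked for several values of the free variable. The sketch below is written under the assumption that the system is the one with rows $(1, -1, 2, -2)$, $(0, 1, -2, 1)$, $(1, 3, -5, 2)$, $(5, -1, 9, -6)$ and right hand side $(-1, 1, 4, 6)^T$; the helper names are illustrative.

```python
from fractions import Fraction

A = [[1, -1, 2, -2],
     [0, 1, -2, 1],
     [1, 3, -5, 2],
     [5, -1, 9, -6]]
b = [-1, 1, 4, 6]
z1 = [1, -1, 0, 1]          # spans ker A

def dot(u, v):
    """Euclidean dot product."""
    return sum(x * y for x, y in zip(u, v))

def matvec(M, x):
    """Matrix-vector product M x."""
    return [dot(row, x) for row in M]

def solution(t):
    """General solution x = (t, 3 - t, 1, t) with free variable t."""
    return [t, 3 - t, 1, t]

def corange_part(x):
    """Subtract from x its orthogonal projection onto the kernel."""
    c = Fraction(dot(x, z1), dot(z1, z1))
    return [xi - c * zi for xi, zi in zip(x, z1)]

for t in (0, 1, Fraction(5, 2), -4):
    x = solution(t)
    w = corange_part(x)
    print(t, matvec(A, x) == b, w, dot(x, x) >= dot(w, w))
```

Whatever value of $t$ is chosen, the same vector $\mathbf{w}$ results, and its squared norm never exceeds that of $\mathbf{x}$, in agreement with Theorem 5.55.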
In this example, the analysis was simplified by the fact that the kernel was one-dimensional, and hence the orthogonal projection was relatively easy to compute. In more complicated situations, to determine the decomposition $\mathbf{x} = \mathbf{w} + \mathbf{z}$ one needs to solve the normal equations (4.26) in order to find the orthogonal projection or least squares point in the subspace; alternatively, one can first determine an orthogonal basis for the subspace, and then apply the orthogonal (or orthonormal) projection formula (5.57). Of course, once one of the constituents $\mathbf{w}, \mathbf{z}$ has been found, the other can be simply obtained by subtraction from $\mathbf{x}$.


Chapter 6
Equilibrium
In this chapter, we turn to some interesting applications of linear algebra to the
analysis of mechanical structures and electrical circuits. We will discover that there are remarkable analogies between electrical and mechanical systems. Both fit into a very general
mathematical framework which, when suitably formulated, will also apply in the continuous realm, and ultimately governs the equilibria of systems arising throughout physics and
engineering. The one difference is that discrete structures and circuits are governed by
linear algebraic equations on finite-dimensional vector spaces, whereas continuous media
are modeled by differential equations and boundary value problems on infinite-dimensional
function spaces.
We begin by analyzing in detail a linear chain of masses interconnected by springs
and constrained to move only in the longitudinal direction. Our general mathematical
framework is already manifest in this rather simple mechanical structure. Next, we consider
simple electrical circuits consisting of resistors and current sources interconnected by a
network of wires. Finally, we treat small (so as to remain in a linear regime) displacements
of two- and three-dimensional structures constructed out of elastic bars. In all cases, we
only consider the equilibrium configurations; dynamical processes for each of the physical
systems will be taken up in Chapter 9.
In the mechanical and electrical systems treated in the present chapter, the linear system governing the equilibrium configuration has the same structure: the coefficient matrix
is of general positive (semi-)definite Gram form. The positive definite cases correspond
to stable structures and circuits, which can support any external forcing, and possess a
unique stable equilibrium solution that can be characterized by a minimization principle.
On the other hand, the positive semi-definite cases correspond to unstable structures and
circuits that cannot remain in equilibrium except for very special configurations of external forces. In the case of mechanical structures, the instabilities are of two types: rigid
motions, under which the structure maintains its overall geometrical shape, and mechanisms, under which it spontaneously deforms in the absence of any applied force.

6.1. Springs and Masses.

A mass-spring chain consists of $n$ masses $m_1, m_2, \dots, m_n$ arranged in a straight
line. Each mass is connected to its immediate neighbor(s) by a spring. Moreover, the
mass-spring chain may be connected at one or both ends by a spring to a solid support.
At first, for specificity, let us look at the case when both ends of the chain are attached,
as illustrated in Figure 6.1. To be definite, we assume that the masses are arranged in a
vertical line, and order them from top to bottom. On occasion, we may refer to the top
support as mass $m_0$ and the bottom support as mass $m_{n+1}$. For simplicity, we will only
Figure 6.1. A Mass-Spring Chain with Fixed Ends.

allow the masses to move in the vertical direction, i.e., one-dimensional motion. (Section 6.3
deals with the more complicated cases of two- and three-dimensional motion.)
If we subject some or all of the masses to an external force, e.g., gravity, then the
system will move to a new equilibrium position. (The differential equations governing the
chain's dynamical behavior will be the subject of Chapter 9. Damping or frictional effects will
cause the system to eventually settle down into a stable equilibrium configuration.) The motion
of the $i$th mass is measured
by its displacement $u_i$ from its original position, which, since we are only allowing vertical
motion, is a scalar. Referring to Figure 6.1, we use the convention that $u_i > 0$ if the mass
has moved downwards, and $u_i < 0$ if it has moved upwards. The problem is to determine
the new equilibrium configuration of the chain under the prescribed forcing, that is, to set
up and solve a system of equations for the displacements $u_1, \dots, u_n$.
Let $e_j$ denote the elongation of the $j$th spring, which connects mass $m_{j-1}$ to mass $m_j$.
By elongation, we mean how far the spring has been stretched, so that $e_j > 0$ if the spring
is longer than its reference length, while $e_j < 0$ if the spring has been compressed. The
elongations can be determined directly from the displacements according to the geometric
formula
$$e_j = u_j - u_{j-1}, \qquad j = 2, \dots, n, \tag{6.1}$$
while
$$e_1 = u_1, \qquad e_{n+1} = -u_n, \tag{6.2}$$
since the top and bottom supports are fixed. We write the elongation equations (6.1), (6.2)
in matrix form
$$e = A u, \tag{6.3}$$
where
$$e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_{n+1} \end{pmatrix}$$
is the elongation vector,
$$u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$$
is the displacement vector, and the coefficient matrix
$$A = \begin{pmatrix} 1 & & & & \\ -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \\ & & & & -1 \end{pmatrix} \tag{6.4}$$
has size $(n + 1) \times n$, with only the non-zero entries being indicated. The matrix $A$ is
known as the reduced incidence matrix for the mass-spring chain. It effectively encodes
the underlying geometry of the mass-spring chain, including the boundary conditions at
the top and the bottom.
The next step is to connect the elongation $e_j$ experienced by the $j$th spring to its internal force $y_j$. This is the basic constitutive assumption, that relates geometry to kinematics.
In the present case, we shall assume that the springs are not stretched (or compressed)
particularly far, and so obey Hooke's Law
$$y_j = c_j e_j, \tag{6.5}$$
named after the prolific seventeenth century English scientist and inventor Robert Hooke.
The constant $c_j > 0$ measures the spring's stiffness. Hooke's Law says that force is
proportional to elongation: the more you stretch a spring, the more internal force it
experiences. A hard spring will have a large stiffness and so takes a large force to stretch,
whereas a soft spring will have a small, but still positive, stiffness. We write (6.5) in matrix
form
$$y = C e, \tag{6.6}$$
where
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n+1} \end{pmatrix}, \qquad C = \begin{pmatrix} c_1 & & & \\ & c_2 & & \\ & & \ddots & \\ & & & c_{n+1} \end{pmatrix}.$$
Note particularly that $C > 0$ is a diagonal, positive definite matrix.

Finally, the forces must balance if the system is to remain in equilibrium. Let $f_i$
denote the external force on the $i$th mass $m_i$. We also measure force in the downwards
direction, so $f_i > 0$ means the force is pulling the $i$th mass downwards. (In particular,
gravity would induce a positive force on each mass.) The $i$th mass is immediately below
the $i$th spring and above the $(i + 1)$st spring. If the $i$th spring is stretched, it will exert an
upwards force on $m_i$, while if the $(i + 1)$st spring is stretched, it will pull $m_i$ downwards.
Therefore, the balance of forces on $m_i$ requires that
$$f_i = y_i - y_{i+1}. \tag{6.7}$$
The matrix form of the force balance law is
$$f = A^T y, \tag{6.8}$$
where $f = (f_1, \dots, f_n)^T$. The remarkable, and very general fact is that the force balance
coefficient matrix
$$A^T = \begin{pmatrix} 1 & -1 & & & & \\ & 1 & -1 & & & \\ & & 1 & -1 & & \\ & & & \ddots & \ddots & \\ & & & & 1 & -1 \end{pmatrix} \tag{6.9}$$
is the transpose of the reduced incidence matrix (6.4) for the chain. (The connection with the
incidence matrix of a graph will become evident in Section 6.2.) This connection
between geometry and force balance turns out to be very general, and is the reason underlying the positivity of the final coefficient matrix in the resulting system of equilibrium
equations.
Summarizing, we have
$$e = A u, \qquad y = C e, \qquad f = A^T y. \tag{6.10}$$
These equations can be combined into a single linear system
$$K u = f, \qquad \text{where} \qquad K = A^T C A \tag{6.11}$$
is called the stiffness matrix associated with the entire mass-spring chain. The stiffness
matrix $K$ has the form of a Gram matrix (3.51) for the weighted inner product $\langle v ; w \rangle = v^T C w$ induced by the diagonal matrix of spring stiffnesses. Theorem 3.33 tells us that
since $A$ has linearly independent columns (which should be checked), and $C > 0$ is positive
definite, then the stiffness matrix $K > 0$ is automatically positive definite. In particular,
Theorem 3.38 guarantees that $K$ is an invertible matrix, and hence the linear system (6.11)
has a unique solution $u = K^{-1} f$. We can therefore conclude that the mass-spring chain
assumes a unique equilibrium position.
In fact, in the particular case considered here,
$$K = \begin{pmatrix} c_1 + c_2 & -c_2 & & & & & \\ -c_2 & c_2 + c_3 & -c_3 & & & & \\ & -c_3 & c_3 + c_4 & -c_4 & & & \\ & & -c_4 & c_4 + c_5 & -c_5 & & \\ & & & \ddots & \ddots & \ddots & \\ & & & & -c_{n-1} & c_{n-1} + c_n & -c_n \\ & & & & & -c_n & c_n + c_{n+1} \end{pmatrix} \tag{6.12}$$
has a very simple symmetric, tridiagonal form. As such, we can apply our tridiagonal
solution algorithm of Section 1.7 to rapidly solve the system.
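As a rough illustration of how (6.4), (6.6) and (6.11) fit together, the following Python sketch assembles $K = A^T C A$ for a chain with both ends fixed. The function names and the sample stiffnesses 1, 2, 3, 4 are illustrative assumptions and do not come from the text.

```python
def incidence_fixed(n):
    """Reduced incidence matrix (6.4): size (n+1) x n, both ends fixed."""
    A = [[0] * n for _ in range(n + 1)]
    for j in range(n):          # j is the zero-based index of mass j+1
        A[j][j] = 1             # the spring above the mass stretches as it moves down
        A[j + 1][j] = -1        # the spring below the mass is compressed
    return A


def stiffness(A, c):
    """Stiffness matrix K = A^T C A with C = diag(c), as in (6.11)."""
    m, n = len(A), len(A[0])
    return [[sum(A[k][i] * c[k] * A[k][j] for k in range(m))
             for j in range(n)] for i in range(n)]


# Hypothetical spring stiffnesses c_1, ..., c_4 for n = 3 masses.
K = stiffness(incidence_fixed(3), [1, 2, 3, 4])
for row in K:
    print(row)
```

The printed rows should show the tridiagonal pattern of (6.12): diagonal entries $c_i + c_{i+1}$ and off-diagonal entries $-c_{i+1}$.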
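Example 6.1. Let us consider the particular case of $n = 3$ masses connected by
identical springs with unit spring constant. Thus, $c_1 = c_2 = c_3 = c_4 = 1$ and $C = \operatorname{diag}(1, 1, 1, 1) = I$ is the $4 \times 4$ identity matrix. The $3 \times 3$ stiffness matrix is then
$$K = A^T A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}.$$
A straightforward Gaussian elimination produces the $K = L D L^T$ factorization
$$\begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ -\frac12 & 1 & 0 \\ 0 & -\frac23 & 1 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 \\ 0 & \frac32 & 0 \\ 0 & 0 & \frac43 \end{pmatrix} \begin{pmatrix} 1 & -\frac12 & 0 \\ 0 & 1 & -\frac23 \\ 0 & 0 & 1 \end{pmatrix}.$$
With this in hand, we can solve the basic equilibrium equations $K u = f$ by our basic
forward and back substitution algorithm.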
Suppose, for example, we pull the middle mass downwards with a unit force, so $f_2 = 1$
while $f_1 = f_3 = 0$. Then $f = ( 0, 1, 0 )^T$, and the solution to the equilibrium equations
(6.11) is $u = \left( \frac12, 1, \frac12 \right)^T$, whose entries prescribe the mass displacements. Observe that
all three masses have moved down, with the middle mass moving twice as far as the other
two. The corresponding spring elongations and internal forces are obtained by matrix
multiplication
$$y = e = A u = \left( \tfrac12, \tfrac12, -\tfrac12, -\tfrac12 \right)^T.$$
Thus the top two springs are elongated, while the bottom two are compressed, all by an
equal amount.
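The factorization and substitution steps of Example 6.1 can be sketched in Python using exact fractions. This is a minimal sketch for symmetric tridiagonal matrices without pivoting; the function names are assumptions made for illustration and are not the algorithm of Section 1.7 verbatim.

```python
from fractions import Fraction as F


def ldlt_tridiagonal(diag, off):
    """Factor a symmetric tridiagonal matrix as L D L^T (no pivoting).

    diag: main diagonal (length n); off: off-diagonal (length n - 1).
    Returns (l, d): the subdiagonal of the unit lower bidiagonal L
    and the diagonal of D.
    """
    d = [F(diag[0])]
    l = []
    for i in range(1, len(diag)):
        l.append(F(off[i - 1]) / d[i - 1])
        d.append(F(diag[i]) - l[-1] * off[i - 1])
    return l, d


def solve_ldlt(l, d, f):
    """Solve L D L^T u = f by forward substitution, scaling, back substitution."""
    n = len(d)
    z = [F(f[0])]
    for i in range(1, n):                    # forward: L z = f
        z.append(F(f[i]) - l[i - 1] * z[i - 1])
    w = [z[i] / d[i] for i in range(n)]      # diagonal: D w = z
    u = w[:]
    for i in range(n - 2, -1, -1):           # back: L^T u = w
        u[i] = w[i] - l[i] * u[i + 1]
    return u


l, d = ldlt_tridiagonal([2, 2, 2], [-1, -1])
u = solve_ldlt(l, d, [0, 1, 0])
print([str(x) for x in d], [str(x) for x in u])
```

For the matrix of Example 6.1 this should reproduce the pivots $2, \frac32, \frac43$ and the displacements $\left( \frac12, 1, \frac12 \right)^T$.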
Similarly, if all the masses are equal, $m_1 = m_2 = m_3 = m$, then the solution under a
constant downwards gravitational force $f = ( m g, m g, m g )^T$ of magnitude $g$ is
$$u = K^{-1} \begin{pmatrix} m g \\ m g \\ m g \end{pmatrix} = \begin{pmatrix} \frac32 m g \\ 2 m g \\ \frac32 m g \end{pmatrix}, \qquad \text{and} \qquad y = e = A u = \left( \tfrac32 m g, \tfrac12 m g, -\tfrac12 m g, -\tfrac32 m g \right)^T.$$
Now, the middle mass has only moved 33% farther than the others, whereas the top and
bottom springs are experiencing three times as much elongation/compression as the middle
springs.
An important observation is that we cannot determine the internal forces $y$ or elongations $e$ directly from the force balance law (6.8) because the transposed matrix $A^T$ is
not square, and so the system $f = A^T y$ does not have a unique solution. We must first
determine the displacements u using the full equilibrium equations (6.11), and then use the
resulting displacements to reconstruct the elongations and internal forces. This situation
is referred to as statically indeterminate.
Remark: Even though we construct $K = A^T C A$ and then factor it as $K = L D L^T$,
there is no direct algorithm to get from $A$ and $C$ to $L$ and $D$, which, typically, are matrices
of a different size.
The behavior of the system will depend upon both the forcing and the boundary
conditions. Suppose, by way of contrast, that we only fix the top of the chain to a support,
and leave the bottom mass hanging freely, as in Figure 6.2. The geometric relation between
the displacements and the elongations has the same form (6.3) as before, but the reduced
incidence matrix is slightly altered:
$$A = \begin{pmatrix} 1 & & & & \\ -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \end{pmatrix}. \tag{6.13}$$
This matrix has size $n \times n$ and is obtained from the preceding example (6.4) by eliminating
the last row corresponding to the missing bottom spring. The constitutive equations are
still governed by Hooke's law $y = C e$, as in (6.6), with $C = \operatorname{diag}(c_1, \dots, c_n)$ the $n \times n$
diagonal matrix of spring stiffnesses. Finally, the force balance equations are also found
to have the same general form $f = A^T y$ as in (6.8), but with the transpose of the revised
incidence matrix (6.13). In conclusion, the equilibrium equations $K u = f$ have an identical
form (6.11), based on the revised stiffness matrix
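$$K = A^T C A = \begin{pmatrix} c_1 + c_2 & -c_2 & & & & & \\ -c_2 & c_2 + c_3 & -c_3 & & & & \\ & -c_3 & c_3 + c_4 & -c_4 & & & \\ & & -c_4 & c_4 + c_5 & -c_5 & & \\ & & & \ddots & \ddots & \ddots & \\ & & & & -c_{n-1} & c_{n-1} + c_n & -c_n \\ & & & & & -c_n & c_n \end{pmatrix}. \tag{6.14}$$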

Note that only the bottom right entry is different from the fixed end version (6.12). In
contrast to the chain with two fixed ends, this system is called statically determinate
because the incidence matrix $A$ is square and nonsingular. This means that it is possible
to solve the force balance law (6.8) directly for the internal forces $y = A^{-T} f = (A^T)^{-1} f$ without having
to solve the full equilibrium equations for the displacements.
Figure 6.2. A Mass-Spring Chain with One Free End.

Example 6.2. For a three mass chain with one free end and equal unit spring
constants $c_1 = c_2 = c_3 = 1$, the stiffness matrix is
$$K = A^T A = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}.$$
Pulling the middle mass downwards with a unit force, whereby $f = ( 0, 1, 0 )^T$, results in
the displacements
$$u = K^{-1} f = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}, \qquad \text{so that} \qquad y = e = A u = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}.$$
In this configuration, the bottom two masses have moved equal amounts, and twice as far
as the top mass. Because we are only pulling on the middle mass, the lower-most spring
hangs free and experiences no elongation, whereas the top two springs are stretched by the
same amount.
Similarly, for a chain of equal masses subject to a constant downwards gravitational
force $f = ( m g, m g, m g )^T$, the equilibrium position is
$$u = K^{-1} \begin{pmatrix} m g \\ m g \\ m g \end{pmatrix} = \begin{pmatrix} 3 m g \\ 5 m g \\ 6 m g \end{pmatrix}, \qquad \text{and} \qquad y = e = A u = \begin{pmatrix} 3 m g \\ 2 m g \\ m g \end{pmatrix}.$$
Note how much further the masses have moved now that the restraining influence of the
bottom support has been removed. The top spring is experiencing the most elongation,
and is thus the most likely to break, because it must support all three masses.
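Because the chain with one free end is statically determinate, the three relations $f = A^T y$, $y = C e$, $e = A u$ can be solved one after another without ever forming $K$. The Python sketch below illustrates this under the sign conventions of this section; the function name is a hypothetical choice made for illustration.

```python
from fractions import Fraction


def solve_free_end_chain(c, f):
    """Chain with top fixed and bottom free; c = stiffnesses, f = external forces."""
    n = len(c)
    # Force balance A^T y = f reads y_i - y_{i+1} = f_i (and y_n = f_n),
    # so each spring carries the total load applied at and below its mass.
    y = [0] * n
    load = 0
    for i in range(n - 1, -1, -1):
        load += f[i]
        y[i] = load
    # Hooke's Law y = C e, solved for the elongations.
    e = [Fraction(y[i]) / c[i] for i in range(n)]
    # Geometry e = A u: e_1 = u_1 and e_j = u_j - u_{j-1}, so u is a running sum.
    u, total = [], 0
    for ej in e:
        total += ej
        u.append(total)
    return y, e, u


y, e, u = solve_free_end_chain([1, 1, 1], [0, 1, 0])
print(y, [str(x) for x in u])
```

With unit stiffnesses this should reproduce both cases of Example 6.2: the unit force on the middle mass, and (taking $m g = 1$) the gravitational loading.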
The Minimization Principle
According to Theorem 4.1, when the coefficient matrix of the linear system governing a
mass-spring chain is positive definite, the unique equilibrium solution can be characterized
by a minimization principle. The quadratic function to be minimized has a physical interpretation: it is the potential energy of the system. Nature is parsimonious when it comes to
energy: physical systems seek out equilibrium configurations that minimize energy. This
general minimization principle can often be advantageously used in the construction of
mathematical models, as well as in their solution, both analytical and numerical.
The energy function to be minimized can be determined directly from physical principles. For a mass-spring chain, the potential energy of the $i$th mass equals the negative of the product
of the applied force times its displacement: $-f_i u_i$. The minus sign is the result of our
convention that a positive displacement $u_i > 0$ means that the mass has moved down,
and hence decreased its potential energy. Thus, the total potential energy due to external
forcing on all the masses in the chain is
$$-\sum_{i=1}^{n} f_i u_i = -u^T f.$$
Next, we calculate the internal energy of the system. The potential energy in a single
spring elongated by an amount $e$ is obtained by integrating the internal force, $y = c e$,
leading to
$$\int_0^e y \, de = \int_0^e c e \, de = \tfrac12 c e^2.$$

Totalling the contributions from each spring, we find the internal spring energy to be
$$\frac12 \sum_{i=1}^{n} c_i e_i^2 = \tfrac12 e^T C e = \tfrac12 u^T A^T C A u = \tfrac12 u^T K u,$$
where we used the incidence equation $e = A u$ relating elongation and displacement. Therefore, the total potential energy is
$$p(u) = \tfrac12 u^T K u - u^T f. \tag{6.15}$$
Since $K > 0$, Theorem 4.1 implies that this quadratic function has a unique minimizer
that satisfies the equilibrium equation $K u = f$.
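Example 6.3. For a three mass chain with two fixed ends described in Example 6.1,
the potential energy function (6.15) has the explicit form
$$\begin{aligned} p(u) &= \frac12 \begin{pmatrix} u_1 & u_2 & u_3 \end{pmatrix} \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} - \begin{pmatrix} u_1 & u_2 & u_3 \end{pmatrix} \begin{pmatrix} f_1 \\ f_2 \\ f_3 \end{pmatrix} \\ &= u_1^2 - u_1 u_2 + u_2^2 - u_2 u_3 + u_3^2 - f_1 u_1 - f_2 u_2 - f_3 u_3, \end{aligned}$$
where $f = ( f_1, f_2, f_3 )^T$ is the external forcing. The minimizer of this particular quadratic
function gives the equilibrium displacements $u = ( u_1, u_2, u_3 )^T$ of the three masses.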
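The minimization principle can be checked numerically on Example 6.1. The Python sketch below evaluates the potential energy (6.15) at the equilibrium $u = \left( \frac12, 1, \frac12 \right)^T$ for $f = (0, 1, 0)^T$ and at a few nearby points; the particular perturbations are arbitrary choices made for illustration.

```python
from fractions import Fraction as F

K = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]   # stiffness matrix of Example 6.1
f = [0, 1, 0]                               # unit force on the middle mass


def potential_energy(u):
    """p(u) = 1/2 u^T K u - u^T f, as in (6.15)."""
    quad = sum(u[i] * K[i][j] * u[j] for i in range(3) for j in range(3))
    return F(1, 2) * quad - sum(ui * fi for ui, fi in zip(u, f))


u_star = [F(1, 2), F(1), F(1, 2)]           # equilibrium from Example 6.1
p_star = potential_energy(u_star)

# Hypothetical perturbations of the equilibrium; each should raise the energy.
steps = [[F(1, 10), 0, 0], [0, F(-1, 5), 0], [F(1, 3), F(1, 3), F(1, 3)]]
perturbed = [potential_energy([a + b for a, b in zip(u_star, h)]) for h in steps]
print(str(p_star), [str(p) for p in perturbed])
```

Since $p(u^\star + h) - p(u^\star) = \frac12 h^T K h$ when $K u^\star = f$, every perturbed value should exceed the equilibrium energy, which here equals $-\frac12$.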

6.2. Electrical Networks.

An electrical network consists of a collection of wires that are joined together at their
ends. The junctions where one or more wires are connected are called nodes. Abstractly,
we can view any such electrical network as a graph, the wires being the edges and the
nodes the vertices. To begin with we assume that there are no electrical devices (batteries,
inductors, capacitors, etc.) in the network and so the only impediment to current
flowing through the network is each wire's resistance. (If desired, we may add resistors
to the network to increase the resistance along the wires.) As we shall see, resistance (or,
rather, its reciprocal) plays a very similar role to spring stiffness.

Figure 6.3. A Simple Electrical Network.

We shall introduce current sources into the network at one or more of the nodes, and
would like to determine how the induced current flows through the wires in the network.
The basic equilibrium equations for the currents are the consequence of three fundamental
laws of electricity.
Voltage is defined as the electromotive force that moves electrons through a wire. The voltage in a wire is induced
by the difference in the voltage potentials at the two ends, just as the gravitational force on
a mass is induced by a difference in gravitational potential. To quantify voltage, we need
to assign an orientation to the wire. Then a positive voltage means the electrons move in
the assigned direction, while under a negative voltage they move in reverse. The original
choice of orientation is arbitrary, but once assigned will pin down the sign conventions used
by voltages, currents, etc. To this end, we draw a digraph to represent the network, and
each edge or wire is assigned a direction that indicates its starting and ending vertices or
nodes. A simple example is illustrated in Figure 6.3, and contains five wires joined at four
different nodes. The arrows indicate the orientations of the wires, while the wavy lines are
the standard electrical symbols for resistance.
In an electrical network, each node will have a voltage potential, denoted $u_i$. If wire
$k$ starts at node $i$ and ends at node $j$, under its assigned orientation, then its voltage $v_k$
equals the potential difference at its ends:
$$v_k = u_i - u_j. \tag{6.16}$$
Note that $v_k > 0$ if $u_i > u_j$, and so the electrons go from the starting node $i$ to the ending
node $j$, in accordance with our choice of orientation. In our particular illustrative example,
$$v_1 = u_1 - u_2, \qquad v_2 = u_1 - u_3, \qquad v_3 = u_1 - u_4, \qquad v_4 = u_2 - u_4, \qquad v_5 = u_3 - u_4.$$
Let us rewrite this system in matrix form
$$v = A u, \tag{6.17}$$
where, for our particular example,
$$A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{pmatrix}. \tag{6.18}$$
The alert reader will recognize this matrix as the incidence matrix (2.42) for the digraph
defined by the circuit. This is true in general: the voltages along the wires
of an electrical network are related to the potentials at the nodes by a linear system of the
form (6.17), where $A$ is the incidence matrix of the network digraph. The rows of the
incidence matrix are indexed by the wires; the columns are indexed by the nodes. Each
row of the matrix $A$ has a single $+1$ in the column indexed by the starting node, and a
single $-1$ in the column of the ending node.
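The rule for building an incidence matrix translates directly into code. The Python sketch below constructs $A$ from a list of oriented wires; the edge list is read off from the voltage formulas of the sample network, while the function name and the sample potentials are assumptions made for illustration.

```python
def incidence_matrix(edges, num_nodes):
    """Rows = wires, columns = nodes; +1 at the start node, -1 at the end node."""
    A = []
    for start, end in edges:
        row = [0] * num_nodes
        row[start - 1] = 1
        row[end - 1] = -1
        A.append(row)
    return A


# Oriented wires of the sample network, as (starting node, ending node).
edges = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]
A = incidence_matrix(edges, 4)

# Arbitrary sample potentials and the resulting wire voltages v = A u.
u = [5, 3, 2, 0]
v = [sum(a * x for a, x in zip(row, u)) for row in A]
print(A)
print(v)
```

The first line of output should agree with (6.18), and each entry of `v` is the potential at a wire's starting node minus the potential at its ending node.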
Kirchhoff's Voltage Law states that the sum of the voltages around each closed loop in
the network is zero. For example, in the circuit under consideration, around the left-hand
triangle we have
$$v_1 + v_4 - v_3 = (u_1 - u_2) + (u_2 - u_4) - (u_1 - u_4) = 0.$$
Note that $v_3$ appears with a minus sign since we must traverse wire #3 in the opposite
direction to its assigned orientation when going around the loop in the counterclockwise
direction. The voltage law is a direct consequence of (6.17). Indeed, as discussed in
Section 2.6, the loops can be identified with vectors $\ell \in \operatorname{coker} A = \ker A^T$ in the cokernel
of the incidence matrix, and so
$$\ell \cdot v = \ell^T v = \ell^T A u = 0. \tag{6.19}$$
Therefore, orthogonality of the voltage vector $v$ to the loop vector $\ell$ is the mathematical
formulation of the zero-loop relation.
Given a prescribed set of voltages $v$ along the wires, can one find corresponding voltage
potentials $u$ at the nodes? To answer this question, we need to solve $v = A u$, which
requires $v \in \operatorname{rng} A$. According to the Fredholm Alternative Theorem 5.51, the necessary
and sufficient condition for this to hold is that $v$ be orthogonal to $\operatorname{coker} A$. Theorem 2.51
says that the cokernel of an incidence matrix is spanned by the loop vectors, and so $v$ is
a possible set of voltages if and only if $v$ is orthogonal to all the loop vectors $\ell \in \operatorname{coker} A$,
i.e., the Voltage Law is necessary and sufficient for the given voltages to be physically
realizable in the network.
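The realizability test can be sketched in Python for the sample network. The first loop vector below is the left-hand triangle from the text; the second (wires 2, 5, 3 through nodes 1, 3, 4) is an assumption read off from the wire orientations in (6.18), chosen so that the two loops together span the two-dimensional cokernel.

```python
A = [[1, -1, 0, 0], [1, 0, -1, 0], [1, 0, 0, -1], [0, 1, 0, -1], [0, 0, 1, -1]]

# Loop vectors: +1 if a wire is traversed along its orientation, -1 if against.
loops = [[1, 0, -1, 1, 0],    # wires 1, 4, then 3 backwards
         [0, 1, -1, 0, 1]]    # wires 2, 5, then 3 backwards


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def in_cokernel(loop):
    """Check loop^T A = 0, i.e. the loop vector lies in coker A."""
    return all(dot(loop, [row[j] for row in A]) == 0 for j in range(4))


def satisfies_voltage_law(v):
    """Voltage Law: v must be orthogonal to every loop vector."""
    return all(dot(loop, v) == 0 for loop in loops)


print([in_cokernel(loop) for loop in loops])
print(satisfies_voltage_law([2, 3, 5, 3, 2]))   # v = A u for u = (5, 3, 2, 0)
print(satisfies_voltage_law([1, 0, 0, 0, 0]))   # not of the form A u
```

Voltages of the form $v = A u$ pass the test automatically, while a vector such as $(1, 0, 0, 0, 0)^T$ fails it and so cannot arise from any choice of node potentials.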
Kirchhoff's Laws are related to the topology of the circuit, that is, how the different wires
are connected together. Ohm's Law is a constitutive relation, indicating what the wires
are made of. The resistance along a wire, including any added resistors, prescribes the
relation between voltage and current, i.e., the rate of flow of electric charge. The law reads
$$v_k = R_k y_k, \tag{6.20}$$
where $v_k$ is the voltage and $y_k$ (often denoted $I_k$ in the engineering literature) denotes the
current along wire $k$. Thus, for a fixed voltage, the larger the resistance of the wire, the
smaller the current that flows through it. The direction of the current is also prescribed
by our choice of orientation of the wire, so that $y_k > 0$ if the current is flowing from the
starting to the ending node. We combine the individual equations (6.20) into a matrix
form
$$v = R y, \tag{6.21}$$
where the resistance matrix $R = \operatorname{diag}(R_1, \dots, R_n) > 0$ is diagonal and positive definite.
We shall, in analogy with (6.6), replace (6.21) by the inverse relationship
$$y = C v, \tag{6.22}$$
where $C = R^{-1}$ is the conductance matrix, again diagonal, positive definite, whose entries
are the conductances $c_k = 1/R_k$ of the wires. For the particular circuit in Figure 6.3,
$$C = \begin{pmatrix} c_1 & & & & \\ & c_2 & & & \\ & & c_3 & & \\ & & & c_4 & \\ & & & & c_5 \end{pmatrix} = \begin{pmatrix} 1/R_1 & & & & \\ & 1/R_2 & & & \\ & & 1/R_3 & & \\ & & & 1/R_4 & \\ & & & & 1/R_5 \end{pmatrix}. \tag{6.23}$$
Finally, we stipulate that electric current is not allowed to accumulate at any node, i.e.,
every electron that arrives at a node must leave along one of the wires. Let $y_k, y_l, \dots, y_m$
denote the currents along all the wires $k, l, \dots, m$ that meet at node $i$ in the network, and
$f_i$ an external current source, if any, applied at node $i$. Kirchhoff's Current Law requires
that the net current into the node, namely
$$\pm\, y_k \pm y_l \pm \cdots \pm y_m + f_i = 0, \tag{6.24}$$
must be zero. Each $\pm$ sign is determined by the orientation of the wire, with $-$ if node $i$
is a starting node or $+$ if it is an ending node.
In our particular example, suppose that we send a 1 amp current source into the first
node. Then Kirchhoff's Current Law requires
$$y_1 + y_2 + y_3 = 1, \qquad -y_1 + y_4 = 0, \qquad -y_2 + y_5 = 0, \qquad -y_3 - y_4 - y_5 = 0.$$
Since we have solved (6.24) for the currents, the signs in front of the $y_i$ have been reversed,
with $+$ now indicating a starting node and $-$ an ending node. The matrix form of this
system is
$$A^T y = f, \tag{6.25}$$
where $y = ( y_1, y_2, y_3, y_4, y_5 )^T$ are the currents along the five wires, and $f = ( 1, 0, 0, 0 )^T$
represents the current sources at the four nodes. The coefficient matrix
$$A^T = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ -1 & 0 & 0 & 1 & 0 \\ 0 & -1 & 0 & 0 & 1 \\ 0 & 0 & -1 & -1 & -1 \end{pmatrix} \tag{6.26}$$
is the transpose of the incidence matrix (6.18). As in the mass-spring chain, this is a
general fact, and is an immediate result of Kirchhoff's two laws. The coefficient matrix for
the current law is the transpose of the incidence matrix for the voltage law.
Let us assemble the full system of equilibrium equations:
$$v = A u, \qquad y = C v, \qquad f = A^T y. \tag{6.27}$$
Remarkably, we arrive at a system of linear relations that has an identical form to the
mass-spring chain system (6.10). As before, they combine into a single linear system
$$K u = f, \qquad \text{where} \qquad K = A^T C A \tag{6.28}$$
is the resistivity matrix associated with the given network. In our particular example,
combining (6.18), (6.23), (6.26) produces the resistivity matrix
$$K = A^T C A = \begin{pmatrix} c_1 + c_2 + c_3 & -c_1 & -c_2 & -c_3 \\ -c_1 & c_1 + c_4 & 0 & -c_4 \\ -c_2 & 0 & c_2 + c_5 & -c_5 \\ -c_3 & -c_4 & -c_5 & c_3 + c_4 + c_5 \end{pmatrix} \tag{6.29}$$
depending on the conductances of the five wires in the network.

Remark: There is a simple pattern to the resistivity matrix, evident in (6.29). The
diagonal entries $k_{ii}$ equal the sum of the conductances of all the wires having node $i$ at
one end. The non-zero off-diagonal entries $k_{ij}$, $i \neq j$, equal $-c_k$, the negative of the conductance of the
wire joining node $i$ to node $j$, while $k_{ij} = 0$ if there is no wire joining the two nodes.
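The pattern described in the Remark suggests assembling $K$ wire by wire rather than through the matrix product $A^T C A$. The following Python sketch does this for the sample network; the function name is a hypothetical choice, and unit conductances are used as sample values.

```python
def resistivity_matrix(edges, conductances, num_nodes):
    """Assemble K entry by entry from the wires, following the Remark's pattern."""
    K = [[0] * num_nodes for _ in range(num_nodes)]
    for (i, j), c in zip(edges, conductances):
        i, j = i - 1, j - 1     # convert node numbers to zero-based indices
        K[i][i] += c            # each wire adds its conductance to both end nodes
        K[j][j] += c
        K[i][j] -= c            # and subtracts it from the two coupling entries
        K[j][i] -= c
    return K


edges = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]
K = resistivity_matrix(edges, [1, 1, 1, 1, 1], 4)
for row in K:
    print(row)
```

Every row of the assembled matrix sums to zero, which is one way to see that the vector of all ones lies in the kernel of $K$.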
Consider the case when all the wires in our network have equal unit resistance, and
so $c_k = 1/R_k = 1$ for $k = 1, \dots, 5$. Then the resistivity matrix is
$$K = \begin{pmatrix} 3 & -1 & -1 & -1 \\ -1 & 2 & 0 & -1 \\ -1 & 0 & 2 & -1 \\ -1 & -1 & -1 & 3 \end{pmatrix}. \tag{6.30}$$
However, trying to solve the system (6.28) runs into an immediate difficulty: there is no
solution! The matrix (6.30) is not positive definite: it has zero determinant, and so is
not invertible. Moreover, the particular current source vector $f = ( 1, 0, 0, 0 )^T$ does not lie
in the range of $K$. Something is clearly amiss.

(The pattern for the off-diagonal entries of the resistivity matrix assumes that there is only one wire joining the two nodes.)

Before getting discouraged, let us sit back and use a little physical intuition. We are
trying to put a 1 amp current into the network at node 1. Where can the electrons go? The
answer is nowhere: they are trapped in the circuit and, as they accumulate, something
drastic will happen, and sparks will fly! This is clearly an unstable situation, and so the fact
that the equilibrium equations do not have a solution is trying to tell us that the physical
system cannot remain in a steady state. The physics rescues the mathematics, or, vice
versa, the mathematics elucidates the underlying physical processes!
In order to achieve a steady state in an electrical network, we must remove as much
current as we put in. In other words, the sum of all the current sources must vanish:
$$f_1 + f_2 + \cdots + f_n = 0.$$
For example, if we feed a 1 amp current into node 1, then we must extract a total of 1
amp's worth of current from the other nodes. If we extract a 1 amp current from node
4, the modified current source vector $f = ( 1, 0, 0, -1 )^T$ does indeed lie in the range of $K$
(check!) and the equilibrium system (6.28) has a solution. Fine . . .
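But we are not out of the woods yet. As we know, if a linear system has a singular
square coefficient matrix, then either it has no solutions (the case we already rejected)
or it has infinitely many solutions (the case we are considering now). In the particular
network under consideration, the general solution to the linear system
$$\begin{pmatrix} 3 & -1 & -1 & -1 \\ -1 & 2 & 0 & -1 \\ -1 & 0 & 2 & -1 \\ -1 & -1 & -1 & 3 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix}$$
is found by Gaussian elimination:
$$u = \begin{pmatrix} \frac12 + t \\ \frac14 + t \\ \frac14 + t \\ t \end{pmatrix} = \begin{pmatrix} \frac12 \\ \frac14 \\ \frac14 \\ 0 \end{pmatrix} + t \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \tag{6.31}$$
where $t = u_4$ is the free variable. The nodal voltage potentials
$$u_1 = \tfrac12 + t, \qquad u_2 = \tfrac14 + t, \qquad u_3 = \tfrac14 + t, \qquad u_4 = t,$$
depend on a free parameter $t$.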

The ambiguity arises because we have not specified a baseline value for the voltage
potentials. Indeed, voltage potential is a mathematical abstraction that cannot be measured directly; only relative potential differences have physical import. To eliminate the
ambiguity, one needs to assign a base potential level. (A similar ambiguity arises in the
specification of gravitational potential.) In terrestrial electricity, the Earth is assumed
to be at a zero voltage potential. Specifying a particular node to have zero potential is
physically equivalent to grounding that node. Grounding one of the nodes, e.g., setting
$u_4 = t = 0$, will then uniquely specify all the other voltage potentials, resulting in a unique
solution $u_1 = \frac12$, $u_2 = \frac14$, $u_3 = \frac14$, $u_4 = 0$, to the system.
On the other hand, even without specification of a baseline potential level, the corresponding voltages and currents along the wires are uniquely specified. In our example,
computing $y = v = A u$ gives
$$y_1 = v_1 = \tfrac14, \qquad y_2 = v_2 = \tfrac14, \qquad y_3 = v_3 = \tfrac12, \qquad y_4 = v_4 = \tfrac14, \qquad y_5 = v_5 = \tfrac14,$$
independent of the value of $t$ in (6.31). Thus, the nonuniqueness of the voltage potential
solution $u$ is not an essential difficulty. All physical quantities that we can measure,
namely currents and voltages, are uniquely specified by the solution to the equilibrium system.
Remark: Although they have no real physical meaning, we cannot dispense with the
nonmeasurable (and non-unique) voltage potentials $u$. Most circuits are statically indeterminate since their incidence matrix is rectangular and not invertible, and so the linear
system $A^T y = f$ cannot be solved directly for the currents in terms of the current sources,
since it does not have a unique solution. Only by first solving the full equilibrium system
(6.28) for the potentials, and then using the relation $y = C A u$ between the potentials and
the currents, can we determine the actual values of the currents in our network.
Let us analyze what is going on in the context of our general mathematical framework.
Proposition 3.32 says that the resistivity matrix $K = A^T C A$ is positive definite (and
hence nonsingular) provided $A$ has linearly independent columns, or, equivalently, $\ker A = \{0\}$. But Proposition 2.49 says that the incidence matrix $A$ of a directed graph never
has a trivial kernel. Therefore, the resistivity matrix $K$ is only positive semi-definite,
and hence singular. If the network is connected, then $\ker A = \ker K = \operatorname{coker} K$ is one-dimensional, spanned by the vector $z = ( 1, 1, 1, \dots, 1 )^T$. According to the Fredholm
Alternative Theorem 5.51, the fundamental network equation $K u = f$ has a solution if
and only if $f$ is orthogonal to $\operatorname{coker} K$, and so the current source vector must satisfy
$$f \cdot z = f_1 + f_2 + \cdots + f_n = 0, \tag{6.32}$$
as we already observed. Therefore, the linear algebra reconfirms our physical intuition: a
connected network admits an equilibrium configuration, obtained by solving (6.28), if and
only if the nodal current sources add up to zero, i.e., there is no net influx of current into
the network.
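These statements are easy to verify for the unit-resistance network (6.30). The Python sketch below checks that $z$ lies in the kernel of $K$, tests the compatibility condition (6.32), and confirms that every member of the family (6.31) solves the system; the helper names and the sample values of $t$ are illustrative assumptions.

```python
from fractions import Fraction as F

K = [[3, -1, -1, -1], [-1, 2, 0, -1], [-1, 0, 2, -1], [-1, -1, -1, 3]]


def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def solvable(f):
    """Fredholm condition (6.32): f must be orthogonal to z = (1, ..., 1)^T."""
    return sum(f) == 0


def solution(t):
    """The one-parameter family (6.31) for f = (1, 0, 0, -1)^T."""
    return [F(1, 2) + t, F(1, 4) + t, F(1, 4) + t, t]


print(matvec(K, [1, 1, 1, 1]))                          # z lies in ker K
print(solvable([1, 0, 0, 0]), solvable([1, 0, 0, -1]))
print(all(matvec(K, solution(t)) == [1, 0, 0, -1] for t in (0, 1, F(-7, 3))))
```

The last line reflects the general structure of the solution set: a particular solution plus an arbitrary multiple of the kernel vector $z$.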
Grounding one of the nodes is equivalent to nullifying the value of its voltage potential:
$u_i = 0$. This variable is now fixed, and can be safely eliminated from our system. To
accomplish this, we let $A^\star$ denote the $m \times (n - 1)$ matrix obtained by deleting the $i$th
column from $A$. For example, if we ground node number 4 in our sample network, then
we erase the fourth column of the incidence matrix (6.18), leading to the reduced incidence
matrix
$$A^\star = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{6.33}$$
The key observation is that $A^\star$ has trivial kernel, $\ker A^\star = \{0\}$, and therefore the reduced
network resistivity matrix
$$K^\star = (A^\star)^T C A^\star = \begin{pmatrix} c_1 + c_2 + c_3 & -c_1 & -c_2 \\ -c_1 & c_1 + c_4 & 0 \\ -c_2 & 0 & c_2 + c_5 \end{pmatrix} \tag{6.34}$$
is positive definite. Note that we can obtain $K^\star$ directly from $K$ by deleting both its
fourth row and fourth column. Let $f^\star = ( 1, 0, 0 )^T$ denote the reduced current source
vector obtained by deleting the fourth entry from $f$. Then the reduced linear system is
$$K^\star u^\star = f^\star, \qquad \text{where} \qquad u^\star = ( u_1, u_2, u_3 )^T \tag{6.35}$$
is the reduced voltage potential vector. Positive definiteness of $K^\star$ implies that (6.35) has
a unique solution $u^\star$, from which we can reconstruct the voltages $v = A^\star u^\star$ and currents
$y = C v = C A^\star u^\star$ along the wires. In our example, if all the wires have unit resistance,
then the reduced system (6.35) is
$$\begin{pmatrix} 3 & -1 & -1 \\ -1 & 2 & 0 \\ -1 & 0 & 2 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},$$
and has unique solution $u^\star = \left( \frac12, \frac14, \frac14 \right)^T$. The voltage potentials are
$$u_1 = \tfrac12, \qquad u_2 = \tfrac14, \qquad u_3 = \tfrac14, \qquad u_4 = 0,$$
and correspond to the earlier solution (6.31) when $t = 0$. The corresponding voltages and
currents along the wires are the same as before.
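The grounding procedure can be carried out mechanically. The Python sketch below deletes the grounded column, forms $K^\star$, solves the reduced system with a small exact Gauss-Jordan routine, and recovers the currents. The routine `solve` is a generic helper written for this illustration (it assumes a nonsingular matrix) and is not the elimination algorithm of Chapter 1 verbatim.

```python
from fractions import Fraction as F


def solve(M, rhs):
    """Gauss-Jordan elimination with exact fractions (M assumed nonsingular)."""
    n = len(M)
    aug = [[F(x) for x in row] + [F(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


A = [[1, -1, 0, 0], [1, 0, -1, 0], [1, 0, 0, -1], [0, 1, 0, -1], [0, 0, 1, -1]]
c = [1, 1, 1, 1, 1]          # unit conductances
f = [1, 0, 0, -1]            # 1 amp in at node 1, 1 amp out at node 4
ground = 3                   # zero-based index of the grounded node 4

# Reduced incidence matrix and source vector: drop the grounded column/entry.
A_red = [[a for j, a in enumerate(row) if j != ground] for row in A]
f_red = [x for j, x in enumerate(f) if j != ground]

# Reduced resistivity matrix K* = (A*)^T C A*, as in (6.34).
n = len(f_red)
K_red = [[sum(A_red[k][i] * c[k] * A_red[k][j] for k in range(len(A)))
          for j in range(n)] for i in range(n)]

u_red = solve(K_red, f_red)
v = [sum(a * x for a, x in zip(row, u_red)) for row in A_red]   # v = A* u*
y = [ck * vk for ck, vk in zip(c, v)]                           # y = C v
print([str(x) for x in u_red], [str(x) for x in y])
```

The computed currents can be checked against Kirchhoff's Current Law $A^T y = f$ using the full, unreduced incidence matrix.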
So far, we have only considered the effect of current sources at the nodes. Suppose
now that the circuit contains one or more batteries. Each battery serves as a voltage source
along one of the wires, and we let $b_k$ denote the voltage of a battery connected to wire
$k$. The quantity $b_k$ comes with a sign, indicated by the battery's positive and negative
terminals. Our convention is that $b_k > 0$ if the current from the battery runs in the same
direction as our chosen orientation of the wire. The battery voltage modifies the voltage
balance equation (6.16):
$$v_k = u_i - u_j + b_k.$$
The corresponding matrix form (6.17) becomes
$$v = A u + b, \tag{6.36}$$
where $b = ( b_1, b_2, \dots, b_m )^T$ is the battery vector whose entries are indexed by the wires.
(If there is no battery on wire $k$, the corresponding entry is $b_k = 0$.) The remaining two
equations are as before, so $y = C v$ are the currents in the wires, and, in the absence of
external current sources, Kirchhoff's Current Law implies $A^T y = 0$. Using the modified
formula (6.36) for the voltages, these combine into the following equilibrium system
$$K u = A^T C A u = -A^T C b. \tag{6.37}$$
Figure 6.4. Cubical Electrical Network with a Battery.

Thus, interestingly, the voltage potentials satisfy the normal weighted least squares equations (4.54) corresponding to the system $A\,\mathbf{u} = -\,\mathbf{b}$, with weights given by the conductances
in the individual wires in the circuit. It is a remarkable fact that Nature solves a least
squares problem in order to make the weighted norm of the voltages $\mathbf{v}$ as small as possible.
Furthermore, the batteries have exactly the same effect on the voltage potentials as if
we imposed the current source vector
\[ \mathbf{f} = -\,A^T C\,\mathbf{b}. \tag{6.38} \]
Namely, the effect of the battery of voltage $b_k$ on wire $k$ is exactly the same as introducing additional current sources of $-\,c_k b_k$ at the starting node and $c_k b_k$ at the ending
node. Note that the induced current vector $\mathbf{f} \in \operatorname{rng} K$ continues to satisfy the network
constraint (6.32). Vice versa, a given system of current sources $\mathbf{f}$ has the same effect as
any collection of batteries $\mathbf{b}$ that satisfies (6.38).
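For readers who want the intermediate step, the battery equilibrium system and the equivalent current sources follow by substituting the modified voltage formula into Kirchhoff's Current Law. The display below is a sketch of that substitution; the second line assumes a single battery on a wire $k$ that runs from node $i$ to node $j$, so that row $k$ of $A$ has $+1$ in column $i$ and $-1$ in column $j$.

```latex
\begin{aligned}
\mathbf{0} = A^T \mathbf{y} = A^T C\,\mathbf{v} = A^T C\,(A\,\mathbf{u} + \mathbf{b})
  \quad &\Longrightarrow \quad A^T C A\,\mathbf{u} = -\,A^T C\,\mathbf{b}, \\
\mathbf{f} = -\,A^T C\,\mathbf{b}
  \quad &\Longrightarrow \quad f_i = -\,c_k b_k, \qquad f_j = +\,c_k b_k, \qquad f_l = 0 \ \text{ for } l \neq i, j .
\end{aligned}
```

The two nonzero entries sum to zero, which is why the induced current vector automatically satisfies the network constraint.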
Unlike a current source, a circuit with a battery always admits a solution for the voltage potentials and currents. Although the currents are uniquely determined, the voltage
potentials are not. As before, to eliminate the ambiguity, we can ground one of the nodes
and use the reduced incidence matrix $A^\star$ and reduced current source vector $\mathbf{f}^\star$ obtained
by eliminating the column/entry corresponding to the grounded node.
Example 6.4. Consider an electrical network running along the sides of a cube,
where each wire contains a 2 ohm resistor and there is a 9 volt battery source on one wire.
The problem is to determine how much current flows through the wire directly opposite the
battery. Orienting the wires and numbering them as indicated in Figure 6.4, the incidence
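matrix is
\[ A = \begin{pmatrix}
1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 & -1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & -1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -1
\end{pmatrix}. \]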

We connect the battery along wire #1 and measure the resulting current along wire #12.
To avoid the ambiguity in the voltage potentials, we ground the last node and erase the
final column from $A$ to obtain the reduced incidence matrix $A^\star$. Since the resistance
matrix $R$ has all 2's along the diagonal, the conductance matrix is $C = \tfrac12 I$. Therefore the
network resistivity matrix is
\[ K^\star = (A^\star)^T C A^\star = \tfrac12 (A^\star)^T A^\star = \frac12 \begin{pmatrix}
3 & -1 & -1 & -1 & 0 & 0 & 0 \\
-1 & 3 & 0 & 0 & -1 & -1 & 0 \\
-1 & 0 & 3 & 0 & -1 & 0 & -1 \\
-1 & 0 & 0 & 3 & 0 & -1 & -1 \\
0 & -1 & -1 & 0 & 3 & 0 & 0 \\
0 & -1 & 0 & -1 & 0 & 3 & 0 \\
0 & 0 & -1 & -1 & 0 & 0 & 3
\end{pmatrix}. \]
The current source corresponding to the battery $\mathbf{b} = ( 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )^T$ along
the first wire is
\[ \mathbf{f}^\star = -\,(A^\star)^T C\,\mathbf{b} = \tfrac12 \,( -9, 9, 0, 0, 0, 0, 0 )^T. \]
Solving the resulting linear system by Gaussian elimination, the voltage potentials are
\[ \mathbf{u}^\star = (K^\star)^{-1} \mathbf{f}^\star = \left( -3, \tfrac94, -\tfrac98, -\tfrac98, \tfrac38, \tfrac38, -\tfrac34 \right)^T. \]
Thus, the induced currents along the sides of the cube are
\[ \mathbf{y} = C\,\mathbf{v} = C\,(A^\star \mathbf{u}^\star + \mathbf{b}) = \left( \tfrac{15}{8}, -\tfrac{15}{16}, -\tfrac{15}{16}, \tfrac{15}{16}, \tfrac{15}{16}, -\tfrac34, -\tfrac{3}{16}, -\tfrac34, -\tfrac{3}{16}, \tfrac{3}{16}, \tfrac{3}{16}, -\tfrac38 \right)^T. \]
In particular, the current on the wire that is opposite the battery is $y_{12} = -\tfrac38$, flowing in
the opposite direction to its orientation. The most current flows through the battery wire,
while wires 7, 9, 10 and 11 transmit the least current.
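The cube computation can be reproduced with a short script. This is a sketch under stated assumptions: the list `wires` (start node, end node for each of the twelve wires, oriented from the lower to the higher node number) is inferred from the incidence matrix of this example rather than given in the text, and the helper `solve` is a hypothetical exact-arithmetic Gaussian elimination routine.

```python
from fractions import Fraction

def solve(K, f):
    """Gaussian elimination with back substitution in exact rational arithmetic."""
    n = len(K)
    M = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(K, f)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    u = [Fraction(0)] * n
    for i in reversed(range(n)):
        u[i] = (M[i][n] - sum(M[i][j] * u[j] for j in range(i + 1, n))) / M[i][i]
    return u

# Assumed wiring: wire k runs from wires[k][0] to wires[k][1]; nodes are numbered 1..8.
wires = [(1, 2), (1, 3), (1, 4), (2, 5), (2, 6), (3, 5),
         (3, 7), (4, 6), (4, 7), (5, 8), (6, 8), (7, 8)]
m, n = len(wires), 7                           # node 8 is grounded, leaving 7 unknowns
c = Fraction(1, 2)                             # conductance of each 2 ohm resistor
b = [Fraction(9)] + [Fraction(0)] * (m - 1)    # 9 volt battery on wire 1

# Reduced incidence matrix: +1 at the start node, -1 at the end node.
A = [[0] * n for _ in range(m)]
for k, (i, j) in enumerate(wires):
    if i <= n:
        A[k][i - 1] = 1
    if j <= n:
        A[k][j - 1] = -1

# K = A^T C A and f = -A^T C b with C = c I.
K = [[c * sum(A[k][p] * A[k][q] for k in range(m)) for q in range(n)] for p in range(n)]
f = [-c * sum(A[k][p] * b[k] for k in range(m)) for p in range(n)]
u = solve(K, f)
# Currents y = C (A u + b).
y = [c * (sum(A[k][p] * u[p] for p in range(n)) + b[k]) for k in range(m)]

print("potentials:", [str(x) for x in u])
print("current on wire 12:", y[11])
```

Reversing the orientation of every wire would flip the signs of the potentials but leave the current on each wire, measured along its orientation, in the same physical direction.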
The Minimization Principle and the Electrical–Mechanical Analogy
As with a mass–spring chain, the current flows in such a resistive electrical network
can be characterized by a minimization principle. The power in a wire is defined as the
product of its current $y$ and voltage $v$,
\[ P = y\,v = R\,y^2 = c\,v^2, \tag{6.39} \]
where $R$ is the resistance, $c = 1/R$ the conductance, and we are using Ohm's Law (6.20)
to relate voltage and current. Physically, the power tells us the rate at which electrical
energy is converted into heat or other forms of energy by the resistance along the wire.
Summing over all the wires in the network, the total power is the dot product
\[ \begin{aligned}
P = \sum_{k=1}^{m} y_k v_k = \mathbf{y}^T \mathbf{v} = \mathbf{v}^T C\,\mathbf{v} &= (A\,\mathbf{u} + \mathbf{b})^T C\,(A\,\mathbf{u} + \mathbf{b}) \\
&= \mathbf{u}^T A^T C A\,\mathbf{u} + 2\,\mathbf{u}^T A^T C\,\mathbf{b} + \mathbf{b}^T C\,\mathbf{b}.
\end{aligned} \]
The resulting quadratic function can be written in the usual form†
\[ \tfrac12 P = p(\mathbf{u}) = \tfrac12\,\mathbf{u}^T K\,\mathbf{u} - \mathbf{u}^T \mathbf{f} + c, \tag{6.40} \]
where $K = A^T C A$ is the network resistivity matrix, while $\mathbf{f} = -\,A^T C\,\mathbf{b}$ are the equivalent
current sources at the nodes (6.38) that correspond to the batteries. The last term $c = \tfrac12\,\mathbf{b}^T C\,\mathbf{b}$
is one half the internal power in the battery, and does not depend upon the
currents/voltages in the wires. In deriving (6.40), we have ignored any additional external
current sources at the nodes. By an analogous argument, a current source will contribute
to the linear terms in the power in the same fashion, and so the linear terms $\mathbf{u}^T \mathbf{f}$ represent
the effect of both batteries and external current sources.
In general, the resistivity matrix K is only positive semi-definite, and so the quadratic
power function (6.40) does not, in general, have a minimizer. As argued above, to ensure
equilibrium, we need to ground one or more of the nodes. The resulting reduced form
\[ p(\mathbf{u}^\star) = \tfrac12\,(\mathbf{u}^\star)^T K^\star \mathbf{u}^\star - (\mathbf{u}^\star)^T \mathbf{f}^\star, \]
for the power now has a positive definite coefficient matrix $K^\star > 0$. The minimizer of
the power function is the solution $\mathbf{u}^\star$ to the reduced linear system (6.35). Therefore, the
network adjusts itself to minimize the power or total energy loss! Just as with mechanical
systems, Nature solves a minimization problem in an effort to conserve energy.
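The claim that the solution of the reduced linear system minimizes the reduced power function can be verified in one line. The following display is a sketch that uses only the symmetry and positive definiteness of $K^\star$; the perturbation vector $\mathbf{w}$ is a symbol introduced here for illustration.

```latex
\begin{aligned}
p(\mathbf{u}^\star + \mathbf{w}) - p(\mathbf{u}^\star)
  &= \mathbf{w}^T \left( K^\star \mathbf{u}^\star - \mathbf{f}^\star \right) + \tfrac12\,\mathbf{w}^T K^\star \mathbf{w} \\
  &= \tfrac12\,\mathbf{w}^T K^\star \mathbf{w} \;>\; 0
  \qquad \text{whenever } K^\star \mathbf{u}^\star = \mathbf{f}^\star \text{ and } \mathbf{w} \neq \mathbf{0}.
\end{aligned}
```

When $K$ is only positive semi-definite, the last inequality can fail for $\mathbf{w} \in \ker K$, which is the reason a node must be grounded.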
We have discovered the remarkable correspondence between the equilibrium equations
for electrical networks (6.10), and those of mass–spring chains (6.27). This Electrical–Mechanical
Correspondence is summarized in the following table. In the following section,
we will see that the analogy extends to more general structures. In Chapter 15, we will
discover that it continues to apply in the continuous regime, and subsumes solid mechanics,
fluid mechanics, electrostatics, and many other physical systems in a common mathematical framework!

† For alternating currents, there is no annoying factor of 2 in the formula for the power, and
the analogy is more direct.

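Structures                   Variables                                                              Networks

Displacements                $\mathbf{u}$                                                           Voltages
Elongations                  $\mathbf{v} = A\,\mathbf{u}$                                           Voltage drops
Spring stiffnesses           $C$                                                                    Conductivities
Internal Forces              $\mathbf{y} = C\,\mathbf{v}$                                           Currents
External forcing             $\mathbf{f} = A^T \mathbf{y}$                                          Current sources
Stiffness matrix             $K = A^T C A$                                                          Resistivity matrix
Potential energy             $p(\mathbf{u}) = \tfrac12\,\mathbf{u}^T K \mathbf{u} - \mathbf{u}^T \mathbf{f}$    $\tfrac12 \times$ Power
Prestressed bars/springs     $\mathbf{v} = A\,\mathbf{u} + \mathbf{b}$                              Batteries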

6.3. Structures in Equilibrium.

A structure (sometimes known as a truss) is a mathematical idealization of a framework for a building. Think of a skyscraper when just the I-beams are connected together
before the walls, floors, ceilings, roof and ornamentation are added. An ideal structure
is constructed of elastic bars connected at joints. By a bar , we mean a straight, rigid
rod that can be (slightly) elongated, but not bent. (Beams, which are allowed to bend,
are more complicated, and we defer their treatment until Section 11.4.) When a bar is
stretched, it obeys Hooke's law (at least in the linear regime we are currently modeling)
and so, for all practical purposes, behaves like a spring with a very large stiffness. As
a result, a structure can be regarded as a two- or three-dimensional generalization of a
mass–spring chain.
The joints will allow the bar to rotate in any direction. Of course, this is an idealization; in a building, the rivets and bolts will prevent rotation to a significant degree. However, under moderate stress (for example, if the wind is blowing through our skyscraper),
the rivets and bolts can only be expected to keep the structure connected, and the rotational motion will provide stresses on the bolts which must be taken into account when
designing the structure. Of course, under extreme stress, the structure will fall apart,
a disaster that its designers must avoid. The purpose of this section is to derive conditions that will guarantee that a structure is rigidly stable under moderate forcing, or,
alternatively, understand the mechanisms that might lead to its collapse.
Bars
The first order of business is to understand how an individual bar reacts to motion. We
have already encountered the basic idea in our treatment of springs. The key complication
here is that the ends of the bar/spring are not restricted to a single direction of motion,
but can move in either two or three-dimensional space. We use d = 2 or 3 to denote
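the dimension of the underlying space. (When $d = 1$, the truss reduces to a mass–spring
chain.)

Figure 6.5. Tangent Line Approximation.

Consider an unstressed bar with one end at position $\mathbf{a}_1 \in \mathbb{R}^d$ and the other end at
position $\mathbf{a}_2 \in \mathbb{R}^d$. In $d = 2$ dimensions, we write $\mathbf{a}_i = ( a_i, b_i )^T$, while in $d = 3$-dimensional
space $\mathbf{a}_i = ( a_i, b_i, c_i )^T$. The length of the bar is $L = \| \mathbf{a}_1 - \mathbf{a}_2 \|$, where we use the standard
Euclidean norm to measure distance on $\mathbb{R}^d$ throughout this section.
Suppose we move the ends of the bar a little, sending $\mathbf{a}_i$ to $\mathbf{b}_i = \mathbf{a}_i + \varepsilon\,\mathbf{u}_i$ and
simultaneously $\mathbf{a}_j$ to $\mathbf{b}_j = \mathbf{a}_j + \varepsilon\,\mathbf{u}_j$. The unit vectors $\mathbf{u}_i, \mathbf{u}_j \in \mathbb{R}^d$ indicate the respective
direction of displacement of the two ends, and we think of $\varepsilon > 0$, the magnitude of the
displacement, as small. How much has this motion stretched the bar? The length of the
displaced bar is
\[ \begin{aligned}
L + e &= \| \mathbf{b}_i - \mathbf{b}_j \| = \| (\mathbf{a}_i + \varepsilon\,\mathbf{u}_i) - (\mathbf{a}_j + \varepsilon\,\mathbf{u}_j) \| = \| (\mathbf{a}_i - \mathbf{a}_j) + \varepsilon\,(\mathbf{u}_i - \mathbf{u}_j) \| \\
&= \sqrt{ \| \mathbf{a}_i - \mathbf{a}_j \|^2 + 2\,\varepsilon\,(\mathbf{a}_i - \mathbf{a}_j) \cdot (\mathbf{u}_i - \mathbf{u}_j) + \varepsilon^2\,\| \mathbf{u}_i - \mathbf{u}_j \|^2 }\,.
\end{aligned} \tag{6.41} \]
The difference between the new length and the original length, namely
\[ e = \sqrt{ \| \mathbf{a}_i - \mathbf{a}_j \|^2 + 2\,\varepsilon\,(\mathbf{a}_i - \mathbf{a}_j) \cdot (\mathbf{u}_i - \mathbf{u}_j) + \varepsilon^2\,\| \mathbf{u}_i - \mathbf{u}_j \|^2 } - \| \mathbf{a}_i - \mathbf{a}_j \|, \tag{6.42} \]
is, by definition, the bar's elongation.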

If the underlying dimension d is 2 or more, the elongation e is a nonlinear function
of the displacement vectors ui , uj . Thus, an exact, geometrical treatment of structures
in equilibrium requires dealing with nonlinear systems of equations. For example, the
design of robotic mechanisms, [111], requires dealing with the fully nonlinear equations.
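However, in many practical situations, the displacements are fairly small, so $\varepsilon \ll 1$. For
example, when a building moves, the lengths of bars are in meters, but the displacements
are, barring catastrophes, typically in centimeters if not millimeters. In such situations,
we can replace the geometrically exact elongation by a much simpler linear approximation.
The most basic linear approximation to a nonlinear function $g(\varepsilon)$ near $\varepsilon = 0$ is given
by its tangent line or linear Taylor polynomial
\[ g(\varepsilon) \approx g(0) + g'(0)\,\varepsilon, \tag{6.43} \]
as in Figure 6.5. In the case of small displacements of a bar, the elongation (6.42) is a
square root function of the particular form
\[ g(\varepsilon) = \sqrt{ a^2 + 2\,\varepsilon\,b + \varepsilon^2 c^2 } - a, \]
where
\[ a = \| \mathbf{a}_i - \mathbf{a}_j \|, \qquad b = (\mathbf{a}_i - \mathbf{a}_j) \cdot (\mathbf{u}_i - \mathbf{u}_j), \qquad c = \| \mathbf{u}_i - \mathbf{u}_j \|, \]
are independent of $\varepsilon$. Since $g(0) = 0$ and $g'(0) = b/a$, the linear approximation (6.43) has
the form
\[ \sqrt{ a^2 + 2\,\varepsilon\,b + \varepsilon^2 c^2 } - a \approx \varepsilon\,\frac{b}{a} \qquad \text{for} \qquad \varepsilon \ll 1. \]
In this manner, we arrive at the linear approximation to the bar's elongation
\[ e \approx \varepsilon\,\frac{(\mathbf{a}_i - \mathbf{a}_j) \cdot (\mathbf{u}_i - \mathbf{u}_j)}{\| \mathbf{a}_i - \mathbf{a}_j \|} = \mathbf{n} \cdot (\varepsilon\,\mathbf{u}_i - \varepsilon\,\mathbf{u}_j), \qquad \text{where} \qquad \mathbf{n} = \frac{\mathbf{a}_i - \mathbf{a}_j}{\| \mathbf{a}_i - \mathbf{a}_j \|} \]
is the unit vector, $\| \mathbf{n} \| = 1$, that points in the direction of the bar from node $j$ to node $i$.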
The overall small factor of $\varepsilon$ was merely a device used to derive the linear approximation. It can now be safely discarded, so that the displacement of the $i$-th node is now $\mathbf{u}_i$
instead of $\varepsilon\,\mathbf{u}_i$, and we assume $\| \mathbf{u}_i \|$ is small. If bar $k$ connects node $i$ to node $j$, then its
(approximate) elongation is equal to
\[ e_k = \mathbf{n}_k \cdot (\mathbf{u}_i - \mathbf{u}_j) = \mathbf{n}_k \cdot \mathbf{u}_i - \mathbf{n}_k \cdot \mathbf{u}_j, \qquad \text{where} \qquad \mathbf{n}_k = \frac{\mathbf{a}_i - \mathbf{a}_j}{\| \mathbf{a}_i - \mathbf{a}_j \|}. \tag{6.44} \]
The elongation $e_k$ is the sum of two terms: the first, $\mathbf{n}_k \cdot \mathbf{u}_i$, is the component of the
displacement vector for node $i$ in the direction of the unit vector $\mathbf{n}_k$ that points along the
bar towards node $i$, whereas the second, $-\,\mathbf{n}_k \cdot \mathbf{u}_j$, is the component of the displacement
vector for node $j$ in the direction of the unit vector $-\,\mathbf{n}_k$ that points in the opposite
direction along the bar towards node $j$. Their sum gives the total elongation of the bar.
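The quality of the linear approximation can be seen numerically. The sketch below compares the exact change in length of a bar with the first-order formula $\mathbf{n} \cdot (\mathbf{u}_i - \mathbf{u}_j)$; the bar endpoints and displacements are hypothetical sample values, and the function names are introduced here for illustration only.

```python
import math

def exact_elongation(a_i, a_j, u_i, u_j):
    """Change in bar length when the ends move from a_i, a_j to a_i + u_i, a_j + u_j."""
    b_i = [a + u for a, u in zip(a_i, u_i)]
    b_j = [a + u for a, u in zip(a_j, u_j)]
    return math.dist(b_i, b_j) - math.dist(a_i, a_j)

def linear_elongation(a_i, a_j, u_i, u_j):
    """First-order approximation n . (u_i - u_j), n being the unit vector from node j to node i."""
    length = math.dist(a_i, a_j)
    n = [(p - q) / length for p, q in zip(a_i, a_j)]
    return sum(c * (p - q) for c, p, q in zip(n, u_i, u_j))

a_i, a_j = (1.0, 1.0), (0.0, 0.0)            # sample bar of length sqrt(2)
for eps in (1e-1, 1e-2, 1e-3):
    u_i, u_j = (eps, 0.0), (0.0, 0.0)        # move one end horizontally by eps
    exact = exact_elongation(a_i, a_j, u_i, u_j)
    linear = linear_elongation(a_i, a_j, u_i, u_j)
    print(f"eps={eps:g}  exact={exact:.8f}  linear={linear:.8f}  error={abs(exact - linear):.2e}")
```

Each tenfold reduction in the displacement size shrinks the error by roughly a factor of one hundred, as expected of a tangent line approximation.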
We assemble all the linear equations (6.44) relating nodal displacements to bar elongations in matrix form
\[ \mathbf{e} = A\,\mathbf{u}. \tag{6.45} \]
Here $\mathbf{e} = ( e_1, e_2, \ldots, e_m )^T \in \mathbb{R}^m$ is the vector of elongations, while $\mathbf{u} = ( \mathbf{u}_1^T, \mathbf{u}_2^T, \ldots, \mathbf{u}_n^T )^T \in \mathbb{R}^{dn}$ is the vector
of displacements. Each $\mathbf{u}_i \in \mathbb{R}^d$ is itself a column vector with $d$ entries, and so $\mathbf{u}$ has a
total of $d\,n$ entries. For example, in the planar case $d = 2$, we have $\mathbf{u}_i = ( x_i, y_i )^T$ since each
node's displacement has both an $x$ and $y$ component, and so
\[ \mathbf{u} = \begin{pmatrix} \mathbf{u}_1 \\ \mathbf{u}_2 \\ \vdots \\ \mathbf{u}_n \end{pmatrix} = ( x_1, y_1, x_2, y_2, \ldots, x_n, y_n )^T \in \mathbb{R}^{2n}. \]
In three dimensions, $d = 3$, we have $\mathbf{u}_i = ( x_i, y_i, z_i )^T$, and so each node will contribute
three components to the displacement vector
\[ \mathbf{u} = ( x_1, y_1, z_1, x_2, y_2, z_2, \ldots, x_n, y_n, z_n )^T \in \mathbb{R}^{3n}. \]
The incidence matrix $A$ connecting the displacements and elongations will be of size
$m \times d\,n$. The $k$-th row of $A$ will have (at most) $2\,d$ nonzero entries. The entries in the $d$ slots
corresponding to node $i$ will be the components of the (transposed) unit bar vector $\mathbf{n}_k^T$
pointing towards node $i$, as given in (6.44), while the entries in the $d$ slots corresponding to
node $j$ will be the components of its negative $-\,\mathbf{n}_k^T$, which is the unit bar vector pointing
towards node $j$. All other entries are 0. The constructions are best appreciated by working
through an explicit example.

Figure 6.6. Three Bar Planar Structure.
Example 6.5. Consider the planar structure pictured in Figure 6.6. The four nodes
are at positions
\[ \mathbf{a}_1 = (0, 0)^T, \qquad \mathbf{a}_2 = (1, 1)^T, \qquad \mathbf{a}_3 = (3, 1)^T, \qquad \mathbf{a}_4 = (4, 0)^T, \]
so the two side bars are at 45° angles and the center bar is horizontal. Applying our
algorithm, the associated incidence matrix is
\[ A = \left( \begin{array}{cc|cc|cc|cc}
-\frac{1}{\sqrt2} & -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & -\frac{1}{\sqrt2}
\end{array} \right). \tag{6.46} \]
The three rows of $A$ refer to the three bars in our structure. The columns come in pairs,
as indicated by the vertical lines in the matrix: the first two columns refer to the $x$ and
$y$ displacements of the first node; the third and fourth columns refer to the second node,
and so on. The first two entries of the first row of $A$ indicate the unit vector
\[ \mathbf{n}_1 = \frac{\mathbf{a}_1 - \mathbf{a}_2}{\| \mathbf{a}_1 - \mathbf{a}_2 \|} = \left( -\tfrac{1}{\sqrt2}, -\tfrac{1}{\sqrt2} \right)^T \]
that points along the first bar towards the first node, while the third and fourth entries
have the opposite signs, and form the unit vector
\[ -\,\mathbf{n}_1 = \frac{\mathbf{a}_2 - \mathbf{a}_1}{\| \mathbf{a}_2 - \mathbf{a}_1 \|} = \left( \tfrac{1}{\sqrt2}, \tfrac{1}{\sqrt2} \right)^T \]
along the same bar that points in the opposite direction towards the second node. The
remaining entries are zero because the first bar only connects the first two nodes. Similarly,
the unit vector along the second bar pointing towards node 2 is
\[ \mathbf{n}_2 = \frac{\mathbf{a}_2 - \mathbf{a}_3}{\| \mathbf{a}_2 - \mathbf{a}_3 \|} = ( -1, 0 )^T, \]
and this gives the third and fourth entries of the second row of $A$; the fifth and sixth entries
are their negatives, corresponding to the unit vector $-\,\mathbf{n}_2$ pointing towards node 3. The
last row is constructed from the unit vector in the direction of bar #3 in the same fashion.
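The row-by-row construction just described is easy to automate. The following sketch builds the structure incidence matrix from node positions and a list of bars; the function name `incidence_matrix` and the zero-based bar list are conventions chosen here, not taken from the text.

```python
import math

def incidence_matrix(nodes, bars):
    """Structure incidence matrix: one row per bar, d columns per node.

    For a bar (i, j), the slots of node i receive the unit vector
    n = (a_i - a_j) / |a_i - a_j| and the slots of node j receive -n.
    """
    d = len(nodes[0])
    A = []
    for i, j in bars:
        length = math.dist(nodes[i], nodes[j])
        n = [(p - q) / length for p, q in zip(nodes[i], nodes[j])]
        row = [0.0] * (d * len(nodes))
        row[d * i:d * i + d] = n
        row[d * j:d * j + d] = [-x for x in n]
        A.append(row)
    return A

# The three bar planar structure of Example 6.5 (zero-based node indices).
nodes = [(0, 0), (1, 1), (3, 1), (4, 0)]
bars = [(0, 1), (1, 2), (2, 3)]
A = incidence_matrix(nodes, bars)
for row in A:
    print(["%6.3f" % x for x in row])
```

Because only unit vectors enter the rows, rescaling the bar lengths while keeping their directions leaves the matrix unchanged.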
Remark : Interestingly, the incidence matrix for a structure only depends on the directions of the bars and not their lengths. This is analogous to the fact that the incidence
matrix for an electrical network only depends on the connectivity properties of the wires
and not on their overall lengths. Indeed, one can regard the incidence matrix for a structure
as a kind of $d$-dimensional generalization of the incidence matrix for a directed graph.
The next phase of our procedure is to introduce the constitutive relations for the bars
in our structure that determine their internal forces or stresses. As we remarked at the
beginning of the section, each bar is viewed as a very strong spring, subject to a linear
Hooke's law equation
\[ y_k = c_k\,e_k \tag{6.47} \]
that relates its elongation $e_k$ to its internal force $y_k$. The bar stiffness $c_k > 0$ is a positive
scalar, and so $y_k > 0$ if the bar is in tension, while $y_k < 0$ if the bar is compressed. In this
approximation, there is no bending and the bars will only experience external forcing at
the nodes. We write (6.47) in matrix form
\[ \mathbf{y} = C\,\mathbf{e}, \]
where $C = \operatorname{diag}\,(c_1, \ldots, c_m) > 0$ is a diagonal, positive definite matrix.
Finally, we need to balance the forces at each node in order to achieve equilibrium.
If bar $k$ terminates at node $i$, then it exerts a force $-\,y_k\,\mathbf{n}_k$ on the node, where $\mathbf{n}_k$ is
the unit vector pointing towards the node in the direction of the bar, as in (6.44). The
minus sign comes from physics: if the bar is under tension, so $y_k > 0$, then it is trying to
contract back to its unstressed state, and so will pull the node towards it, in the opposite
direction to $\mathbf{n}_k$, while a bar in compression will push the node away. In addition, we
may have an externally applied force vector, denoted by $\mathbf{f}_i$, on node $i$, which might be
some combination of gravity, weights, mechanical forces, and so on. (In this admittedly
simplified model, external forces only act on the nodes.) Force balance at equilibrium
requires that the sum of all the forces, external and internal, at each node cancel; thus,
\[ \mathbf{f}_i + \sum_k \,( -\,y_k\,\mathbf{n}_k ) = \mathbf{0}, \qquad \text{or} \qquad \sum_k y_k\,\mathbf{n}_k = \mathbf{f}_i, \]
where the sum is over all the bars that are attached to node $i$. The matrix form of the
force balance equations is (and this should no longer come as a surprise)
\[ \mathbf{f} = A^T \mathbf{y}, \tag{6.48} \]
where $A^T$ is the transpose of the incidence matrix, and $\mathbf{f} = ( \mathbf{f}_1^T, \mathbf{f}_2^T, \ldots, \mathbf{f}_n^T )^T \in \mathbb{R}^{dn}$
is the vector containing all external forces on the nodes. Putting everything together,
(6.45), (6.47), (6.48), i.e.,
\[ \mathbf{e} = A\,\mathbf{u}, \qquad \mathbf{y} = C\,\mathbf{e}, \qquad \mathbf{f} = A^T \mathbf{y}, \]
we once again are led to our familiar linear system of equations
\[ K\,\mathbf{u} = \mathbf{f}, \qquad \text{where} \qquad K = A^T C A. \tag{6.49} \]
The stiffness matrix $K$ is a positive (semi-)definite Gram matrix (3.51) associated with
the weighted inner product on the space of elongations prescribed by the diagonal matrix
$C$.
As we know, the stiffness matrix for our structure will be positive definite, $K > 0$, if
and only if the incidence matrix has trivial kernel: $\ker A = \{\mathbf{0}\}$. The preceding example,
and indeed all of those constructed so far, will not have this property, for the same reason
as in an electrical network: we have not tied down (or grounded) our structure
anywhere. In essence, we are considering a structure floating in outer space, which is free
to move around without changing its shape. As we will see, each possible rigid motion
of the structure will correspond to an element of the kernel of its incidence matrix, and
thereby prevent positive definiteness of the stiffness matrix $K$.

Figure 6.7. A Triangular Structure.
Example 6.6. Consider a planar space station in the shape of a unit equilateral
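triangle, as in Figure 6.7. Placing the nodes at positions
\[ \mathbf{a}_1 = \left( \tfrac12, \tfrac{\sqrt3}{2} \right)^T, \qquad \mathbf{a}_2 = ( 1, 0 )^T, \qquad \mathbf{a}_3 = ( 0, 0 )^T, \]
we use the preceding algorithm to compute the incidence matrix
\[ A = \left( \begin{array}{cc|cc|cc}
-\frac12 & \frac{\sqrt3}{2} & \frac12 & -\frac{\sqrt3}{2} & 0 & 0 \\
\frac12 & \frac{\sqrt3}{2} & 0 & 0 & -\frac12 & -\frac{\sqrt3}{2} \\
0 & 0 & 1 & 0 & -1 & 0
\end{array} \right), \]
whose rows are indexed by the bars, and whose columns are indexed in pairs by the three
nodes. The kernel of $A$ is three-dimensional, with basis
\[ \mathbf{z}_1 = ( 1, 0, 1, 0, 1, 0 )^T, \qquad \mathbf{z}_2 = ( 0, 1, 0, 1, 0, 1 )^T, \qquad \mathbf{z}_3 = \left( -\tfrac{\sqrt3}{2}, \tfrac12, 0, 1, 0, 0 \right)^T. \tag{6.50} \]

Figure 6.8. Rotating a Space Station.
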

These three displacement vectors correspond to three different planar rigid motions: the
first two correspond to translations, and the third to a rotation.
The translations are easy to discern. Translating the space station in a horizontal direction means that we move all three nodes the same amount, and so the displacements are
$\mathbf{u}_1 = \mathbf{u}_2 = \mathbf{u}_3 = \mathbf{a}$ for some fixed vector $\mathbf{a}$. In particular, a rigid unit horizontal translation
has $\mathbf{a} = \mathbf{e}_1 = ( 1, 0 )^T$, and corresponds to the first kernel basis vector. Similarly, a unit
vertical translation of all three nodes has $\mathbf{a} = \mathbf{e}_2 = ( 0, 1 )^T$, and corresponds to
the second kernel basis vector. Any other translation is a linear combination of these two.
Translations do not alter the lengths of any of the bars, and so do not induce any stress
in the structure.
The rotations are a little more subtle, owing to the linear approximation that we used
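to compute the elongations. Referring to Figure 6.8, rotating the space station through a
small angle $\varepsilon$ around the node $\mathbf{a}_3 = ( 0, 0 )^T$ will move the other two nodes to positions
\[ \mathbf{b}_1 = \begin{pmatrix} \frac12 \cos\varepsilon - \frac{\sqrt3}{2} \sin\varepsilon \\[2pt] \frac12 \sin\varepsilon + \frac{\sqrt3}{2} \cos\varepsilon \end{pmatrix}, \qquad
\mathbf{b}_2 = \begin{pmatrix} \cos\varepsilon \\ \sin\varepsilon \end{pmatrix}, \qquad
\mathbf{b}_3 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \tag{6.51} \]
However, the corresponding displacements
\[ \mathbf{u}_1 = \mathbf{b}_1 - \mathbf{a}_1 = \begin{pmatrix} \frac12 (\cos\varepsilon - 1) - \frac{\sqrt3}{2} \sin\varepsilon \\[2pt] \frac12 \sin\varepsilon + \frac{\sqrt3}{2} (\cos\varepsilon - 1) \end{pmatrix}, \qquad
\mathbf{u}_2 = \mathbf{b}_2 - \mathbf{a}_2 = \begin{pmatrix} \cos\varepsilon - 1 \\ \sin\varepsilon \end{pmatrix}, \qquad
\mathbf{u}_3 = \mathbf{b}_3 - \mathbf{a}_3 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \tag{6.52} \]
do not combine into a vector that belongs to $\ker A$. The problem is that, under a rotation,
the nodes move along circles, while the kernel displacements $\mathbf{u} = \varepsilon\,\mathbf{z} \in \ker A$ correspond
to straight line motion! In order to maintain consistency, we must adopt a similar linear
approximation of the nonlinear circular motion of the nodes. Thus, we replace the nonlinear
displacements $\mathbf{u}_j(\varepsilon)$ in (6.52) by their linear tangent approximations† $\varepsilon\,\mathbf{u}_j'(0)$, and so
\[ \mathbf{u}_1 \approx \varepsilon \begin{pmatrix} -\frac{\sqrt3}{2} \\[2pt] \frac12 \end{pmatrix}, \qquad
\mathbf{u}_2 \approx \varepsilon \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad
\mathbf{u}_3 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
The resulting displacements do combine to produce the displacement vector
\[ \mathbf{u} = \varepsilon \left( -\tfrac{\sqrt3}{2}, \tfrac12, 0, 1, 0, 0 \right)^T = \varepsilon\,\mathbf{z}_3 \]
that moves the space station in the direction of the third element of the kernel of the
incidence matrix! Thus, as claimed, $\mathbf{z}_3$ represents the linear approximation to a rigid
rotation around the third node.
Remarkably, the rotations around the other two nodes, although distinct nonlinear
motions, can be linearly approximated by particular combinations of the three kernel basis
elements $\mathbf{z}_1, \mathbf{z}_2, \mathbf{z}_3$, and so already appear in our description of $\ker A$. For example, the
displacement vector
\[ \mathbf{u} = -\tfrac{\sqrt3}{2}\,\mathbf{z}_1 + \tfrac12\,\mathbf{z}_2 - \mathbf{z}_3 = \left( 0, 0, -\tfrac{\sqrt3}{2}, -\tfrac12, -\tfrac{\sqrt3}{2}, \tfrac12 \right)^T \tag{6.53} \]
represents the linear approximation to a rigid rotation around the first node. We conclude
that the three-dimensional kernel of the incidence matrix represents the sum total of all
possible rigid motions of the space station, or, more correctly, their linear approximations.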
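The statements about the kernel can be checked numerically. The sketch below rebuilds the incidence matrix of the triangular space station and verifies that the two translations and the linearized rotations are annihilated by it. The helper names are hypothetical, and the formula $\mathbf{u}_i = J\,(\mathbf{a}_i - \mathbf{p})$ for an infinitesimal counterclockwise rotation about a center $\mathbf{p}$, with $J$ the rotation by 90 degrees, is a standard fact assumed here rather than quoted from the text.

```python
import math

def incidence_matrix(nodes, bars):
    """Rows are bars; node i receives the unit vector n, node j receives -n."""
    d = len(nodes[0])
    A = []
    for i, j in bars:
        length = math.dist(nodes[i], nodes[j])
        n = [(p - q) / length for p, q in zip(nodes[i], nodes[j])]
        row = [0.0] * (d * len(nodes))
        row[d * i:d * i + d] = n
        row[d * j:d * j + d] = [-x for x in n]
        A.append(row)
    return A

def matvec(A, u):
    return [sum(a * x for a, x in zip(row, u)) for row in A]

def infinitesimal_rotation(nodes, center):
    """Planar tangent displacements u_i = J (a_i - center), J = rotation by 90 degrees."""
    u = []
    for x, y in nodes:
        u += [-(y - center[1]), x - center[0]]
    return u

r = math.sqrt(3) / 2
nodes = [(0.5, r), (1.0, 0.0), (0.0, 0.0)]
bars = [(0, 1), (0, 2), (1, 2)]                  # zero-based node pairs
A = incidence_matrix(nodes, bars)

z1 = [1, 0, 1, 0, 1, 0]                          # horizontal translation
z2 = [0, 1, 0, 1, 0, 1]                          # vertical translation
z3 = infinitesimal_rotation(nodes, nodes[2])     # rotation about node 3 (the origin)
w = infinitesimal_rotation(nodes, nodes[0])      # counterclockwise rotation about node 1
combo = [r * a - 0.5 * b + c for a, b, c in zip(z1, z2, z3)]

for name, z in [("z1", z1), ("z2", z2), ("z3", z3), ("w", w)]:
    print(name, "max |A z| =", max(abs(e) for e in matvec(A, z)))
```

Here `w` is the counterclockwise rotation about the first node, so it equals the negative of the clockwise combination displayed in (6.53); both lie in the kernel.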
Which types of forces will maintain the space station in equilibrium? This will happen
if and only if we can solve the force balance equations
\[ A^T \mathbf{y} = \mathbf{f} \tag{6.54} \]
for the internal forces $\mathbf{y}$. The Fredholm Alternative Theorem 5.51 implies that the system
(6.54) has a solution if and only if $\mathbf{f}$ is orthogonal to $\operatorname{coker} A^T = \ker A$. Therefore, $\mathbf{f} = ( f_1, g_1, f_2, g_2, f_3, g_3 )^T$
must be orthogonal to the basis vectors (6.50), and so must satisfy
the three linear constraints
\[ \mathbf{z}_1 \cdot \mathbf{f} = f_1 + f_2 + f_3 = 0, \qquad \mathbf{z}_2 \cdot \mathbf{f} = g_1 + g_2 + g_3 = 0, \qquad \mathbf{z}_3 \cdot \mathbf{f} = -\tfrac{\sqrt3}{2}\,f_1 + \tfrac12\,g_1 + g_2 = 0. \]
The first requires that there is no net horizontal force on the space station. The second
requires no net vertical force. The last constraint requires that the moment of the forces
around the third node vanishes. The vanishing of the force moments around each of the
other two nodes follows, since the associated kernel vectors can be expressed as linear
combinations of the three basis elements. The physical requirements are clear. If there is
a net horizontal or vertical force, the space station will rigidly translate in that direction;
if there is a non-zero force moment, the station will rigidly rotate. In any event, unless
the force balance equations are satisfied, the space station cannot remain in equilibrium.
A freely floating space station is in an unstable configuration and can easily be set into
motion.

† Note that $\mathbf{u}_j(0) = \mathbf{0}$.

Since there are three independent rigid motions, we need to impose three constraints
on the structure in order to stabilize it. Grounding one of the nodes, i.e., preventing it
from moving by attaching it to a fixed support, will serve to eliminate the two translational
instabilities. For example, setting u3 = 0 has the effect of fixing the third node of the space
station to a support. With this specification, we can eliminate the variables associated with
that node, and thereby delete the corresponding columns of the incidence matrix leaving
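the reduced incidence matrix
\[ A^\star = \left( \begin{array}{cc|cc}
-\frac12 & \frac{\sqrt3}{2} & \frac12 & -\frac{\sqrt3}{2} \\
\frac12 & \frac{\sqrt3}{2} & 0 & 0 \\
0 & 0 & 1 & 0
\end{array} \right). \]
The kernel of $A^\star$ is now only one-dimensional, spanned by the single vector
\[ \mathbf{z}_3^\star = \left( -\tfrac{\sqrt3}{2}, \tfrac12, 0, 1 \right)^T, \]
which corresponds to (the linear approximation of) the rotations around the fixed node. To
prevent the structure from rotating, we can also fix the second node, by further requiring
$\mathbf{u}_2 = \mathbf{0}$. This allows us to eliminate the third and fourth columns of the incidence matrix,
leaving the doubly reduced incidence matrix
\[ A^{\star\star} = \begin{pmatrix} -\frac12 & \frac{\sqrt3}{2} \\ \frac12 & \frac{\sqrt3}{2} \\ 0 & 0 \end{pmatrix}. \]
Now $\ker A^{\star\star} = \{\mathbf{0}\}$ is trivial, and hence the corresponding reduced stiffness matrix
\[ K^{\star\star} = (A^{\star\star})^T A^{\star\star} = \begin{pmatrix} -\frac12 & \frac12 & 0 \\ \frac{\sqrt3}{2} & \frac{\sqrt3}{2} & 0 \end{pmatrix} \begin{pmatrix} -\frac12 & \frac{\sqrt3}{2} \\ \frac12 & \frac{\sqrt3}{2} \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} \frac12 & 0 \\ 0 & \frac32 \end{pmatrix} \]
is positive definite. The space station with two fixed nodes is a stable structure, which can
now support an arbitrary external forcing. (Forces on the fixed nodes now have no effect
since they are no longer allowed to move.)

Figure 6.9. Three Bar Structure with Fixed Supports.
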
In general, a planar structure without any fixed nodes will have at least a three-dimensional kernel, corresponding to the rigid planar motions of translations and (linear
approximations to) rotations. To stabilize the structure, one must fix two (non-coincident)
nodes. A three-dimensional structure that is not tied to any fixed supports will admit 6
independent rigid motions in its kernel. Three of these correspond to rigid translations in
the three coordinate directions, while the other three correspond to linear approximations
to the rigid rotations around the three coordinate axes. To eliminate the rigid motion
instabilities of the structure, one needs to fix three non-collinear nodes; details can be
found in the exercises.
Even after attaching a sufficient number of nodes to fixed supports so as to eliminate
all possible rigid motions, there may still remain nonzero vectors in the kernel of the
reduced incidence matrix of the structure. These indicate additional instabilities in which
the shape of the structure can deform without any applied force. Such non-rigid motions
are known as mechanisms of the structure. Since a mechanism moves the nodes without
elongating any of the bars, it does not induce any internal forces. A structure that admits
a mechanism is unstable: even very tiny external forces may provoke a large motion.
Example 6.7. Consider the three bar structure of Example 6.5, but now with its
two ends attached to supports, as pictured in Figure 6.9. Since we are fixing nodes 1 and
4, setting $\mathbf{u}_1 = \mathbf{u}_4 = \mathbf{0}$, we should remove the first and last column pairs from the
incidence matrix (6.46), leading to the reduced incidence matrix
\[ A^\star = \left( \begin{array}{cc|cc}
\frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 & 0 \\
-1 & 0 & 1 & 0 \\
0 & 0 & -\frac{1}{\sqrt2} & \frac{1}{\sqrt2}
\end{array} \right). \]
The structure no longer admits any rigid motions. However, the kernel of $A^\star$ is one-dimensional,
spanned by the reduced displacement vector $\mathbf{z}^\star = ( 1, -1, 1, 1 )^T$, which corresponds to the unstable mechanism that displaces the second node in the direction
$\mathbf{u}_2 = ( 1, -1 )^T$ and the third node in the direction $\mathbf{u}_3 = ( 1, 1 )^T$. Geometrically, then,
$\mathbf{z}^\star$ represents the displacement where node 2 moves down and to the right at a 45° angle,
while node 3 moves simultaneously up and to the right at a 45° angle. This mechanism does
not alter the lengths of the three bars (at least in our linear approximation regime) and
so requires no net force to be set into motion.

Figure 6.10. Reinforced Planar Structure.

As with the rigid motions of the space station, an external forcing vector $\mathbf{f}^\star$ will
maintain equilibrium only when it lies in the corange of $A^\star$, and hence must be orthogonal
to all the mechanisms in $\ker A^\star = (\operatorname{corng} A^\star)^\perp$. Thus, the nodal forces $\mathbf{f}_2 = ( f_2, g_2 )^T$ and
$\mathbf{f}_3 = ( f_3, g_3 )^T$ must satisfy the balance law
\[ \mathbf{z}^\star \cdot \mathbf{f}^\star = f_2 - g_2 + f_3 + g_3 = 0. \]
If this fails, the equilibrium equation has no solution, and the structure will move. For
example, a uniform horizontal force $f_2 = f_3 = 1$, $g_2 = g_3 = 0$, will induce the mechanism,
whereas a uniform vertical force, $f_2 = f_3 = 0$, $g_2 = g_3 = 1$, will maintain equilibrium. In
the latter case, the solution to the equilibrium equations
\[ K^\star \mathbf{u}^\star = \mathbf{f}^\star, \qquad \text{where} \qquad K^\star = (A^\star)^T A^\star = \begin{pmatrix}
\frac32 & \frac12 & -1 & 0 \\
\frac12 & \frac12 & 0 & 0 \\
-1 & 0 & \frac32 & -\frac12 \\
0 & 0 & -\frac12 & \frac12
\end{pmatrix}, \]
is indeterminate, since we can add in any element of $\ker K^\star = \ker A^\star$, so
\[ \mathbf{u}^\star = ( -3, 5, -2, 0 )^T + t\,( 1, -1, 1, 1 )^T. \]
In other words, the equilibrium position is not unique, since the structure can still be
displaced in the direction of the unstable mechanism while maintaining the overall force
balance. On the other hand, the elongations and internal forces
\[ \mathbf{y} = \mathbf{e} = A^\star \mathbf{u}^\star = ( \sqrt2, 1, \sqrt2 )^T, \]
are well-defined, indicating that, under our stabilizing uniform vertical force, all three bars
are elongated, with the two diagonal bars experiencing 41.4% more elongation than
the horizontal bar.
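The mechanism and the balance law of this example can be verified with a few lines of code. The sketch below hard-codes the reduced incidence matrix with columns ordered as $(x_2, y_2, x_3, y_3)$ and signs $( 1, -1, 1, 1 )$ for the mechanism, as reconstructed from the node positions of Example 6.5; the helper names are hypothetical.

```python
import math

s = 1 / math.sqrt(2)
# Reduced incidence matrix with nodes 1 and 4 fixed; columns are (x2, y2, x3, y3).
A = [[s, s, 0, 0],
     [-1, 0, 1, 0],
     [0, 0, -s, s]]
z = [1, -1, 1, 1]                      # candidate mechanism

def matvec(M, v):
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# A z = 0: to first order, the mechanism changes no bar length.
print("A z =", matvec(A, z))

# Balance law: a force is admissible only if it is orthogonal to the mechanism.
f_horizontal = [1, 0, 1, 0]
f_vertical = [0, 1, 0, 1]
print("z . f_horizontal =", dot(z, f_horizontal))
print("z . f_vertical   =", dot(z, f_vertical))

# The elongations are the same for every member of the solution family u0 + t z.
u0 = [-3, 5, -2, 0]
for t in (0.0, 2.5):
    u = [a + t * b for a, b in zip(u0, z)]
    print("t =", t, " e =", matvec(A, u))
```

The nonzero value of the first dot product is what signals that the horizontal force sets the mechanism in motion, while the vertical force passes the test.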

Remark : Just like the rigid rotations, the mechanisms described here are linear approximations to the actual nonlinear motions. In a physical structure, the vertices will
move along curves whose tangents at the initial configuration are the directions indicated
by the mechanism vector. In certain cases, a structure can admit a linear mechanism, but
one that cannot be physically realized due to the nonlinear constraints imposed by the
geometrical configurations of the bars. Nevertheless, such a structure is at best borderline
stable, and should not be used in any real-world applications that rely on stability of the
structure.
We can always stabilize a structure by first fixing nodes to eliminate rigid motions,
and then adding in extra bars to prevent mechanisms. In the preceding example, suppose
we attach an additional bar connecting nodes 2 and 4, leading to the reinforced structure
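in Figure 6.10. The revised incidence matrix is
\[ A = \left( \begin{array}{cc|cc|cc|cc}
-\frac{1}{\sqrt2} & -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & -\frac{1}{\sqrt2} \\
0 & 0 & -\frac{3}{\sqrt{10}} & \frac{1}{\sqrt{10}} & 0 & 0 & \frac{3}{\sqrt{10}} & -\frac{1}{\sqrt{10}}
\end{array} \right), \]
and is obtained from (6.46) by appending another row representing the added bar. When
nodes 1 and 4 are fixed, the reduced incidence matrix
\[ A^\star = \begin{pmatrix}
\frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 & 0 \\
-1 & 0 & 1 & 0 \\
0 & 0 & -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} \\
-\frac{3}{\sqrt{10}} & \frac{1}{\sqrt{10}} & 0 & 0
\end{pmatrix} \]
has trivial kernel, $\ker A^\star = \{\mathbf{0}\}$, and hence the structure is stable. It admits no mechanisms, and can support any configuration of forces (within reason: very large forces
will take us outside the linear regime described by the model, and the structure may be
crushed!).
This particular case is statically determinate owing to the fact that the incidence
matrix is square and nonsingular, which implies that one can solve the force balance
equations (6.54) directly for the internal forces. For instance, a uniform downwards vertical
force $f_2 = f_3 = 0$, $g_2 = g_3 = -1$, e.g., gravity, will produce the internal forces
\[ y_1 = -\sqrt2, \qquad y_2 = -1, \qquad y_3 = -\sqrt2, \qquad y_4 = 0, \]
indicating that bars 1, 2 and 3 are experiencing compression, while, interestingly, the
reinforcing bar 4 remains unchanged in length and hence experiences no internal force.
Assuming the bars are all of the same material, and taking the elastic constant to be 1, so
$C = I$, then the reduced stiffness matrix is
\[ K^\star = (A^\star)^T A^\star = \begin{pmatrix}
\frac{12}{5} & \frac15 & -1 & 0 \\
\frac15 & \frac35 & 0 & 0 \\
-1 & 0 & \frac32 & -\frac12 \\
0 & 0 & -\frac12 & \frac12
\end{pmatrix}. \]
The solution to the reduced equilibrium equations is
\[ \mathbf{u}^\star = \left( -\tfrac12, -\tfrac32, -\tfrac32, -\tfrac72 \right)^T, \qquad \text{so} \qquad \mathbf{u}_2 = \left( -\tfrac12, -\tfrac32 \right)^T, \qquad \mathbf{u}_3 = \left( -\tfrac32, -\tfrac72 \right)^T \]
give the displacements of the two nodes under the applied force. Both are moving down
and to the left, with node 3 moving relatively farther owing to its lack of reinforcement.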
Suppose we reinforce the structure yet further by adding in a bar connecting nodes 1
and 3. The resulting reduced incidence matrix
\[ A^\star = \begin{pmatrix}
\frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 & 0 \\
-1 & 0 & 1 & 0 \\
0 & 0 & -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} \\
-\frac{3}{\sqrt{10}} & \frac{1}{\sqrt{10}} & 0 & 0 \\
0 & 0 & \frac{3}{\sqrt{10}} & \frac{1}{\sqrt{10}}
\end{pmatrix} \]
again has trivial kernel, $\ker A^\star = \{\mathbf{0}\}$, and hence the structure is stable. Indeed, adding
in extra bars to a stable structure cannot cause it to lose stability. (In matrix language,
appending additional rows to a matrix cannot increase the size of its kernel, cf. Exercise .)
Since the incidence matrix is rectangular, the structure is now statically indeterminate and
we cannot determine the internal forces without first solving the full equilibrium equations
(6.49) for the displacements. The stiffness matrix is
\[ K^\star = (A^\star)^T A^\star = \begin{pmatrix}
\frac{12}{5} & \frac15 & -1 & 0 \\
\frac15 & \frac35 & 0 & 0 \\
-1 & 0 & \frac{12}{5} & -\frac15 \\
0 & 0 & -\frac15 & \frac35
\end{pmatrix}. \]
For the same uniform vertical force, the displacement $\mathbf{u}^\star = (K^\star)^{-1} \mathbf{f}^\star$ is
\[ \mathbf{u}^\star = \left( \tfrac{1}{10}, -\tfrac{17}{10}, -\tfrac{1}{10}, -\tfrac{17}{10} \right)^T, \]
so that the free nodes now move symmetrically down and towards the center of the structure. The internal forces on the bars are
\[ y_1 = -\tfrac45 \sqrt2, \qquad y_2 = -\tfrac15, \qquad y_3 = -\tfrac45 \sqrt2, \qquad y_4 = -\sqrt{\tfrac25}, \qquad y_5 = -\sqrt{\tfrac25}. \]
All five bars are now experiencing compression, the two outside bars being the most
stressed, the reinforcing bars slightly more than half that, while the center bar feels less
than a fifth the stress that the outside bars experience. This simple computation should
already indicate to the practicing construction engineer which of the bars in our structure
are more likely to collapse under an applied external force. By comparison, the reader can
investigate what happens under a uniform horizontal force.

Figure 6.11. A Swing Set.
Summarizing our discussion, we have established the following fundamental result
characterizing the stability and equilibrium of structures.
Theorem 6.8. A structure is stable, and will maintain an equilibrium under arbitrary external forcing, if and only if its reduced incidence matrix $A^\star$ has linearly independent columns, or, equivalently, $\ker A^\star = \{\mathbf{0}\}$. More generally, an external force $\mathbf{f}^\star$
on a structure will maintain equilibrium if and only if $\mathbf{f}^\star \in (\ker A^\star)^\perp$, which means that
the external force is orthogonal to all rigid motions and all mechanisms admitted by the
structure.
Example 6.9. A swing set is to be constructed, consisting of two diagonal supports
at each end and a horizontal cross bar. Is this configuration stable, i.e., can a child swing
on it without it collapsing? The movable joints are at positions
$$ \mathbf{a}_1 = (1, 1, 3)^T, \qquad \mathbf{a}_2 = (4, 1, 3)^T. $$
The four fixed supports are at positions
$$ \mathbf{a}_3 = (0, 0, 0)^T, \qquad \mathbf{a}_4 = (0, 2, 0)^T, \qquad \mathbf{a}_5 = (5, 0, 0)^T, \qquad \mathbf{a}_6 = (5, 2, 0)^T. $$

The reduced incidence matrix for the structure is calculated in the usual manner:

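$$ A^\star = \begin{pmatrix} \frac{1}{\sqrt{11}} & \frac{1}{\sqrt{11}} & \frac{3}{\sqrt{11}} & 0 & 0 & 0 \\ \frac{1}{\sqrt{11}} & -\frac{1}{\sqrt{11}} & \frac{3}{\sqrt{11}} & 0 & 0 & 0 \\ -1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -\frac{1}{\sqrt{11}} & \frac{1}{\sqrt{11}} & \frac{3}{\sqrt{11}} \\ 0 & 0 & 0 & -\frac{1}{\sqrt{11}} & -\frac{1}{\sqrt{11}} & \frac{3}{\sqrt{11}} \end{pmatrix}. $$
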

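For instance, the first three entries contained in the first row refer to the unit vector
$\mathbf{n}_1 = \dfrac{\mathbf{a}_1 - \mathbf{a}_3}{\| \mathbf{a}_1 - \mathbf{a}_3 \|}$ in the direction of the bar going from $\mathbf{a}_3$ to $\mathbf{a}_1$. Suppose the five bars
have the same stiffness, and so (taking $c_1 = \cdots = c_5 = 1$) the reduced stiffness matrix for
the structure is
$$ K^\star = (A^\star)^T A^\star = \begin{pmatrix} \frac{13}{11} & 0 & \frac{6}{11} & -1 & 0 & 0 \\ 0 & \frac{2}{11} & 0 & 0 & 0 & 0 \\ \frac{6}{11} & 0 & \frac{18}{11} & 0 & 0 & 0 \\ -1 & 0 & 0 & \frac{13}{11} & 0 & -\frac{6}{11} \\ 0 & 0 & 0 & 0 & \frac{2}{11} & 0 \\ 0 & 0 & 0 & -\frac{6}{11} & 0 & \frac{18}{11} \end{pmatrix}. $$
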

We find that $\ker K^\star = \ker A^\star$ is one-dimensional, with basis
$$ \mathbf{z}^\star = (3, 0, -1, 3, 0, 1)^T, $$
which indicates a mechanism that causes the swing set to collapse: the first node moves
down and to the right, while the second node moves up and to the right, the horizontal
motion being three times as large as the vertical. The structure can support forces $\mathbf{f}_1 = (f_1, g_1, h_1)^T$, $\mathbf{f}_2 = (f_2, g_2, h_2)^T$ if and only if the combined force vector $\mathbf{f}^\star$ is orthogonal
to the mechanism vector $\mathbf{z}^\star$, and so
$$ 3 (f_1 + f_2) - h_1 + h_2 = 0. $$
Thus, as long as the net horizontal force is in the y direction and the vertical forces on the
two joints are equal, the structure will maintain its shape. Otherwise, a reinforcing bar,
say from a1 to a6 (although this will interfere with the swinging!) or a pair of bars from
the nodes to two new ground supports, will be required to completely stabilize the swing.
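For a uniform downwards unit vertical force, $\mathbf{f} = (0, 0, -1, 0, 0, -1)^T$, a particular
solution to (6.11) is
$$ \mathbf{u}^\star = \left( \tfrac{13}{6},\ 0,\ -\tfrac{4}{3},\ \tfrac{11}{6},\ 0,\ 0 \right)^T, $$
and the general solution $\mathbf{u} = \mathbf{u}^\star + t\, \mathbf{z}^\star$ is obtained by adding in an arbitrary element of the
kernel. The resulting forces/elongations are uniquely determined,
$$ \mathbf{y} = \mathbf{e} = A^\star \mathbf{u} = A^\star \mathbf{u}^\star = \left( -\tfrac{\sqrt{11}}{6},\ -\tfrac{\sqrt{11}}{6},\ -\tfrac{1}{3},\ -\tfrac{\sqrt{11}}{6},\ -\tfrac{\sqrt{11}}{6} \right)^T, $$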

so that every bar is compressed, the middle one experiencing slightly more than half the
stress of the outer supports.
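
The numbers in this example can be checked with a few lines of standard-library Python. The sketch below is only an illustration, not part of the text's method: the bar list `bars` and the helper names `incidence_row` and `matvec` are assumptions of mine, chosen to match the description of two diagonal supports at each end plus the cross bar. It rebuilds the reduced incidence matrix from the node positions, then confirms that the quoted mechanism stretches no bar and that the quoted particular solution balances the load.

```python
from math import sqrt

# Joint and support positions copied from Example 6.9.
nodes = {1: (1, 1, 3), 2: (4, 1, 3),
         3: (0, 0, 0), 4: (0, 2, 0), 5: (5, 0, 0), 6: (5, 2, 0)}
free = [1, 2]                                   # movable joints
# Assumed bar list: two supports per end plus the horizontal cross bar.
bars = [(3, 1), (4, 1), (1, 2), (5, 2), (6, 2)]

def incidence_row(i, j):
    """Row of the reduced incidence matrix for the bar joining nodes i and j."""
    d = [q - p for p, q in zip(nodes[i], nodes[j])]
    length = sqrt(sum(c * c for c in d))
    row = [0.0] * (3 * len(free))
    for node, sign in ((j, 1.0), (i, -1.0)):
        if node in free:                        # fixed supports contribute no columns
            k = 3 * free.index(node)
            for c in range(3):
                row[k + c] = sign * d[c] / length
    return row

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [incidence_row(i, j) for i, j in bars]      # 5 x 6 reduced incidence matrix
At = [list(col) for col in zip(*A)]             # its transpose, 6 x 5

z = [3, 0, -1, 3, 0, 1]                         # candidate mechanism
u = [13 / 6, 0, -4 / 3, 11 / 6, 0, 0]           # candidate particular solution
f = [0, 0, -1, 0, 0, -1]                        # unit downward load on each joint

elongation_of_mechanism = matvec(A, z)          # should vanish: no bar stretches
y = matvec(A, u)                                # internal forces (all c_i = 1)
residual = [k - fi for k, fi in zip(matvec(At, y), f)]   # K u - f, should vanish
```

Because the stiffness matrix here is $A^T A$ with unit stiffnesses, the residual is formed as $A^T (A \mathbf{u}) - \mathbf{f}$ without ever assembling $K$ explicitly.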
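If we stabilize the structure by adding in two vertical supports at the nodes, then the
new reduced incidence matrix
$$ A^\star = \begin{pmatrix} \frac{1}{\sqrt{11}} & \frac{1}{\sqrt{11}} & \frac{3}{\sqrt{11}} & 0 & 0 & 0 \\ \frac{1}{\sqrt{11}} & -\frac{1}{\sqrt{11}} & \frac{3}{\sqrt{11}} & 0 & 0 & 0 \\ -1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -\frac{1}{\sqrt{11}} & \frac{1}{\sqrt{11}} & \frac{3}{\sqrt{11}} \\ 0 & 0 & 0 & -\frac{1}{\sqrt{11}} & -\frac{1}{\sqrt{11}} & \frac{3}{\sqrt{11}} \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} $$
has trivial kernel, indicating stabilization of the structure. The reduced stiffness matrix
$$ K^\star = \begin{pmatrix} \frac{13}{11} & 0 & \frac{6}{11} & -1 & 0 & 0 \\ 0 & \frac{2}{11} & 0 & 0 & 0 & 0 \\ \frac{6}{11} & 0 & \frac{29}{11} & 0 & 0 & 0 \\ -1 & 0 & 0 & \frac{13}{11} & 0 & -\frac{6}{11} \\ 0 & 0 & 0 & 0 & \frac{2}{11} & 0 \\ 0 & 0 & 0 & -\frac{6}{11} & 0 & \frac{29}{11} \end{pmatrix} $$
is only slightly different than before, but this is enough to make it positive definite, $K^\star > 0$,
and so allow arbitrary external forcing without collapse. Under the uniform vertical force,
the internal forces are
$$ \mathbf{y} = \mathbf{e} = A^\star \mathbf{u} = \left( -\tfrac{\sqrt{11}}{10},\ -\tfrac{\sqrt{11}}{10},\ -\tfrac{1}{5},\ -\tfrac{\sqrt{11}}{10},\ -\tfrac{\sqrt{11}}{10},\ -\tfrac{2}{5},\ -\tfrac{2}{5} \right)^T. $$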

Note the overall reductions in stress in the original bars; the two new vertical bars are now
experiencing the largest amount of stress.


Chapter 7
Linear Functions and Linear Systems
We began this book by learning how to systematically solve systems of linear algebraic
equations. This elementary problem formed our launching pad for developing the fundamentals of linear algebra. In its initial form, matrices and vectors were the primary focus
of our study, but the theory was developed in a sufficiently general and abstract form that
it can be immediately applied to many other important situations, particularly infinite-dimensional function spaces. Indeed, applied mathematics deals, not just with algebraic
equations, but also differential equations, difference equations, integral equations, integro-differential equations, differential delay equations, control systems, and many, many other
types of systems, not all of which, unfortunately, can be adequately developed in this
introductory text. It is now time to assemble what we have learned about linear matrix
systems and place the results in a suitably general framework that will lead to insight into
the fundamental principles that govern completely general linear problems.
The most basic underlying object of linear systems theory is the vector space, and
we have already seen that the elements of vector spaces can be vectors, or functions, or
even vector-valued functions. The seminal ideas of span, linear independence, basis and
dimension are equally applicable and equally vital in more general contexts, particularly
function spaces. Just as vectors in Euclidean space are prototypes of general elements
of vector spaces, matrices are also prototypes of much more general objects, known as
linear functions. Linear functions are also known as linear maps or linear operators,
particularly when we deal with function spaces, and include linear differential operators,
linear integral operators, evaluation of a function or its derivative at a point, and many
other basic operations on functions. Generalized functions, such as the delta function to
be introduced in Chapter 11, are, in fact, properly formulated as linear operators on a
suitable space of functions. As such, linear maps form the simplest class of functions on
vector spaces. Nonlinear functions can often be closely approximated by linear functions,
generalizing the calculus approximation of a function by its tangent line. As a result, linear
functions must be thoroughly understood before any serious progress can be made in the
vastly more complicated nonlinear world.
In geometry, linear functions are interpreted as linear transformations of space (or
space-time), and, as such, lie at the foundations of motion of bodies, computer graphics and games, and the mathematical formulation of symmetry. Most basic geometrical
transformations, including rotations, scalings, reflections, projections, shears and so on,
are governed by linear transformations. However, translations require a slight generalization, known as an affine function. Linear operators on infinite-dimensional function
spaces are the basic objects of quantum mechanics. Each quantum mechanical observable
(mass, energy, momentum) is formulated as a linear operator on an infinite-dimensional
Hilbert space, the space of wave functions or states of the system. The dynamics of the
quantum mechanical system is governed by the linear Schrödinger equation, [100, 104].
It is remarkable that quantum mechanics is an entirely linear theory, whereas classical
and relativistic mechanics are inherently nonlinear! The holy grail of modern physics,
the unification of general relativity and quantum mechanics, is to resolve the apparent
incompatibility of the microscopic and macroscopic physical regimes.
A linear system is just an equation satisfied by a linear function. The most basic
linear system is a system of linear algebraic equations. Linear systems also include linear
differential equations, linear boundary value problems, linear partial differential equations,
and many, many others in a common conceptual framework. The fundamental idea of linear
superposition and the relation between the solutions to inhomogeneous and homogeneous
systems underlie the structure of the solution space of all linear systems. You have no doubt
encountered many of these ideas in your first course on ordinary differential equations;
they have also already appeared in our development of the theory underlying the solution
of linear algebraic systems. The second part of this book will be devoted to solution
techniques for particular classes of linear systems arising in applied mathematics.

7.1. Linear Functions.

We begin our study of linear functions with the basic definition. For simplicity, we shall
concentrate on real linear functions between real vector spaces. Extending the concepts
and constructions to complex linear functions on complex vector spaces is not difficult,
and will be dealt with later.
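Definition 7.1. Let $V$ and $W$ be real vector spaces. A function $L\colon V \to W$ is called
linear if it obeys two basic rules:
$$ L[\mathbf{v} + \mathbf{w}] = L[\mathbf{v}] + L[\mathbf{w}], \qquad L[c\,\mathbf{v}] = c\,L[\mathbf{v}], \tag{7.1} $$
for all $\mathbf{v}, \mathbf{w} \in V$ and all scalars $c \in \mathbb{R}$. We will call $V$ the domain space and $W$ the target space for $L$.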
In particular, setting c = 0 in the second condition implies that a linear function
always maps the zero element in V to the zero element in W , so
L[ 0 ] = 0.

(7.2)

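We can readily combine the two defining conditions into a single rule
$$ L[c\,\mathbf{v} + d\,\mathbf{w}] = c\,L[\mathbf{v}] + d\,L[\mathbf{w}] \qquad \text{for all} \quad \mathbf{v}, \mathbf{w} \in V, \quad c, d \in \mathbb{R}, \tag{7.3} $$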

that characterizes linearity of a function $L$. An easy induction proves that a linear function
respects linear combinations, so
$$ L[c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k] = c_1 L[\mathbf{v}_1] + \cdots + c_k L[\mathbf{v}_k] \tag{7.4} $$
for any $c_1, \ldots, c_k \in \mathbb{R}$ and $\mathbf{v}_1, \ldots, \mathbf{v}_k \in V$.

(Footnote to Definition 7.1: The term "target" is used here to avoid later confusion with the range of $L$, which, in general,
is a subspace of the target vector space $W$.)

The interchangeable terms linear map, linear operator and, when V = W , linear
transformation are all commonly used as alternatives to linear function, depending on
the circumstances and taste of the author. The term linear operator is particularly
useful when the underlying vector space is a function space, so as to avoid confusing the
two different uses of the word function. As usual, we will sometimes refer to the elements
of a vector space as vectors even though they might be functions or matrices or something
else, depending upon the vector space being considered.
Example 7.2. The simplest linear function is the zero function $L[\mathbf{v}] \equiv \mathbf{0}$, which
maps every element $\mathbf{v} \in V$ to the zero vector in $W$. Note that, in view of (7.2), this is
the only constant linear function. A nonzero constant function is not, despite its evident
simplicity, linear. Another simple but important linear function is the identity function
$I = I_V\colon V \to V$, which leaves every vector unchanged: $I[\mathbf{v}] = \mathbf{v}$. Slightly more generally,
the operation of scalar multiplication $M_a[\mathbf{v}] = a\,\mathbf{v}$ by a fixed scalar $a \in \mathbb{R}$ defines a linear
function from $V$ to itself.
Example 7.3. Suppose $V = \mathbb{R}$. We claim that every linear function $L\colon \mathbb{R} \to \mathbb{R}$ has
the form
$$ y = L(x) = a\,x, $$
for some constant $a$. Indeed, writing $x \in \mathbb{R}$ as a scalar product $x = x \cdot 1$, and using the
second property in (7.1), we find
$$ L(x) = L(x \cdot 1) = x\,L(1) = a\,x, \qquad \text{where} \qquad a = L(1). $$

Therefore, the only scalar linear functions are those whose graph is a straight line passing
through the origin.
Warning: Even though the graph of the function
y = a x + b,

(7.5)

is a straight line, this is not a linear function unless b = 0 so the line goes through
the origin. The correct name for a function of the form (7.5) is an affine function; see
Definition 7.20 below.
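
The warning can be illustrated on a single sample. The following sketch is only a spot check, not a proof; the function names and the constants $a = 2$, $b = 1$ are hypothetical choices of mine. It tests both rules of (7.1) for $y = a x$ and for $y = a x + b$.

```python
def linear(x, a=2):
    """y = a x, a straight line through the origin."""
    return a * x

def affine(x, a=2, b=1):
    """y = a x + b, a straight line that misses the origin when b != 0."""
    return a * x + b

def respects_linearity(func, x, y, c):
    """Check both rules in (7.1) on one sample; a single failure disproves linearity."""
    additive = func(x + y) == func(x) + func(y)
    homogeneous = func(c * x) == c * func(x)
    return additive and homogeneous

linear_ok = respects_linearity(linear, 1, 2, 3)
affine_ok = respects_linearity(affine, 1, 2, 3)
```

Passing on one sample does not establish linearity, but failing on one sample, as the affine function does, is enough to rule it out.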
Example 7.4. Let $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$. Let $A$ be an $m \times n$ matrix. Then the
function $L[\mathbf{v}] = A\,\mathbf{v}$ given by matrix multiplication is easily seen to be a linear function.
Indeed, the requirements (7.1) reduce to the basic distributivity and scalar multiplication
properties of matrix multiplication:
$$ A(\mathbf{v} + \mathbf{w}) = A\,\mathbf{v} + A\,\mathbf{w}, \qquad A(c\,\mathbf{v}) = c\,A\,\mathbf{v}, \qquad \text{for all} \quad \mathbf{v}, \mathbf{w} \in \mathbb{R}^n, \quad c \in \mathbb{R}. $$
In fact, every linear function between two Euclidean spaces has this form.
Theorem 7.5. Every linear function $L\colon \mathbb{R}^n \to \mathbb{R}^m$ is given by matrix multiplication,
$L[\mathbf{v}] = A\,\mathbf{v}$, where $A$ is an $m \times n$ matrix.

Figure 7.1. Linear Function on Euclidean Space.

Warning: Pay attention to the order of $m$ and $n$. While $A$ has size $m \times n$, the linear
function $L$ goes from $\mathbb{R}^n$ to $\mathbb{R}^m$.
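Proof: The key idea is to look at what the linear function does to the basis vectors.
Let $\mathbf{e}_1, \ldots, \mathbf{e}_n$ be the standard basis of $\mathbb{R}^n$, and let $\hat{\mathbf{e}}_1, \ldots, \hat{\mathbf{e}}_m$ be the standard basis of $\mathbb{R}^m$.
(We temporarily place hats on the latter to avoid confusing the two.) Since $L[\mathbf{e}_j] \in \mathbb{R}^m$,
we can write it as a linear combination of the latter basis vectors:
$$ L[\mathbf{e}_j] = \mathbf{a}_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix} = a_{1j}\,\hat{\mathbf{e}}_1 + a_{2j}\,\hat{\mathbf{e}}_2 + \cdots + a_{mj}\,\hat{\mathbf{e}}_m, \qquad j = 1, \ldots, n. \tag{7.6} $$
Let us construct the $m \times n$ matrix
$$ A = ( \mathbf{a}_1 \ \mathbf{a}_2 \ \ldots \ \mathbf{a}_n ) = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix} \tag{7.7} $$
whose columns are the image vectors (7.6). Using (7.4), we then compute the effect of $L$
on a general vector $\mathbf{v} = ( v_1, v_2, \ldots, v_n )^T \in \mathbb{R}^n$:
$$ L[\mathbf{v}] = L[v_1 \mathbf{e}_1 + \cdots + v_n \mathbf{e}_n] = v_1 L[\mathbf{e}_1] + \cdots + v_n L[\mathbf{e}_n] = v_1 \mathbf{a}_1 + \cdots + v_n \mathbf{a}_n = A\,\mathbf{v}. $$
The final equality follows from our basic formula (2.14) connecting matrix multiplication
and linear combinations. We conclude that the vector $L[\mathbf{v}]$ coincides with the vector $A\,\mathbf{v}$
obtained by multiplying $\mathbf{v}$ by the coefficient matrix $A$. Q.E.D.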
The proof of Theorem 7.5 shows us how to construct the matrix representative of
a given linear function $L\colon \mathbb{R}^n \to \mathbb{R}^m$. We merely assemble the image column vectors
$\mathbf{a}_1 = L[\mathbf{e}_1], \ldots, \mathbf{a}_n = L[\mathbf{e}_n]$ into an $m \times n$ matrix $A$.
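
The recipe can be written out in a few lines. In the sketch below the helper names `matrix_of` and `apply`, and the sample map `L` from $\mathbb{R}^3$ to $\mathbb{R}^2$, are hypothetical illustrations rather than anything defined in the text. The code feeds each standard basis vector to `L` and stores the images as the columns of the matrix.

```python
def matrix_of(L, n):
    """Return the matrix (as a list of rows) whose j-th column is L(e_j)."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]   # j-th standard basis vector
        columns.append(L(e_j))
    return [list(row) for row in zip(*columns)]        # columns -> rows

def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def L(v):
    """A hypothetical linear map from R^3 to R^2, written without a matrix."""
    x, y, z = v
    return [2 * x - y, y + 3 * z]

A = matrix_of(L, 3)        # a 2 x 3 matrix, as the Warning about m and n predicts
v = [1, 2, 3]
same = apply(A, v) == L(v)
```

The construction reproduces `L` only if `L` really is linear; applied to a nonlinear function it still returns a matrix, but that matrix will disagree with the function away from the basis vectors.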
Example 7.6. In the case of a function from $\mathbb{R}^n$ to itself, the two basic linearity
conditions (7.1) have a simple geometrical interpretation. Since vector addition is the
same as completing the parallelogram indicated in Figure 7.1, the first linearity condition
requires that $L$ map parallelograms to parallelograms. The second linearity condition says
that if we stretch a vector by a factor $c$, then its image under $L$ must also be stretched by
the same amount. Thus, one can often detect linearity by simply looking at the geometry
of the function.

Figure 7.2. Linearity of Rotations.

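As a specific example, consider the function $R_\theta\colon \mathbb{R}^2 \to \mathbb{R}^2$ that rotates the vectors in
the plane around the origin by a specified angle $\theta$. This geometric transformation clearly
preserves parallelograms, as well as stretching (see Figure 7.2), and hence defines a
linear function. In order to find its matrix representative, we need to find out where the
basis vectors $\mathbf{e}_1, \mathbf{e}_2$ are mapped. Referring to Figure 7.3, we have
$$ R_\theta[\mathbf{e}_1] = \cos\theta\,\mathbf{e}_1 + \sin\theta\,\mathbf{e}_2 = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}, \qquad R_\theta[\mathbf{e}_2] = -\sin\theta\,\mathbf{e}_1 + \cos\theta\,\mathbf{e}_2 = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}. $$
According to the general recipe (7.7), we assemble these two column vectors to obtain the
matrix form of the rotation transformation, and so
$$ R_\theta[\mathbf{v}] = A_\theta\,\mathbf{v}, \qquad \text{where} \qquad A_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \tag{7.8} $$
Therefore, rotating a vector $\mathbf{v} = \begin{pmatrix} x \\ y \end{pmatrix}$ through angle $\theta$ gives the vector
$$ \hat{\mathbf{v}} = R_\theta[\mathbf{v}] = A_\theta\,\mathbf{v} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \cos\theta - y \sin\theta \\ x \sin\theta + y \cos\theta \end{pmatrix} $$
with coordinates
$$ \hat{x} = x \cos\theta - y \sin\theta, \qquad \hat{y} = x \sin\theta + y \cos\theta. $$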
These formulae can be proved directly, but, in fact, are a consequence of the underlying
linearity of rotations.

Figure 7.3. Rotation in $\mathbb{R}^2$.

Linear Operators
So far, we have concentrated on linear functions on Euclidean space, and discovered
that they are all represented by matrices. For function spaces, there is a much wider
variety of linear operators available, and a complete classification is out of the question.
Let us look at some of the main representative examples that arise in applications.
Example 7.7. (i) Recall that $\mathrm{C}^0[a,b]$ denotes the vector space consisting of all
continuous functions on the interval $[a,b]$. Evaluation of the function at a point, $L[f] = f(x_0)$, defines a linear operator $L\colon \mathrm{C}^0[a,b] \to \mathbb{R}$, because
$$ L[c\,f + d\,g] = c\,f(x_0) + d\,g(x_0) = c\,L[f] + d\,L[g] $$
for any functions $f, g \in \mathrm{C}^0[a,b]$ and scalars (constants) $c, d$.
(ii) Another real-valued linear function is the integration operator
$$ I[f] = \int_a^b f(x)\,dx. \tag{7.9} $$
Linearity of $I$ is an immediate consequence of the basic integration identity
$$ \int_a^b \bigl[ c\,f(x) + d\,g(x) \bigr]\,dx = c \int_a^b f(x)\,dx + d \int_a^b g(x)\,dx, $$
which is valid for arbitrary integrable (which includes continuous) functions $f, g$ and
scalars $c, d$.
(iii) We have already seen that multiplication of functions by a fixed scalar, $f(x) \mapsto c\,f(x)$, defines a linear map $M_c\colon \mathrm{C}^0[a,b] \to \mathrm{C}^0[a,b]$; the particular case $c = 1$ reduces to the
identity transformation $I = M_1$. More generally, if $a(x) \in \mathrm{C}^0[a,b]$ is a fixed continuous
function, then the operation $M_a[f(x)] = a(x)\,f(x)$ of multiplication by $a$ also defines a
linear transformation $M_a\colon \mathrm{C}^0[a,b] \to \mathrm{C}^0[a,b]$.
(iv) Another important linear transformation is the indefinite integral
$$ J[f] = \int_a^x f(y)\,dy. \tag{7.10} $$
According to the Fundamental Theorem of Calculus, the integral of a continuous function is
continuously differentiable; therefore, $J\colon \mathrm{C}^0[a,b] \to \mathrm{C}^1[a,b]$ defines a linear operator from
the space of continuous functions to the space of continuously differentiable functions.
(v) Vice versa, differentiation of functions is also a linear operation. To be precise,
since not every continuous function can be differentiated, we take the domain space to be
the vector space $\mathrm{C}^1[a,b]$ of continuously differentiable functions on the interval $[a,b]$. The
derivative operator
$$ D[f] = f' \tag{7.11} $$
defines a linear operator $D\colon \mathrm{C}^1[a,b] \to \mathrm{C}^0[a,b]$. This follows from the elementary differentiation formula
$$ D[c\,f + d\,g] = (c\,f + d\,g)' = c\,f' + d\,g' = c\,D[f] + d\,D[g], $$
valid whenever $c, d$ are constant.
The Space of Linear Functions
Given vector spaces $V, W$, we use $\mathcal{L}(V, W)$ to denote the set of all linear functions
$L\colon V \to W$. We claim that $\mathcal{L}(V, W)$ is itself a vector space. We add two linear functions
$L, M \in \mathcal{L}(V, W)$ in the same way we add general functions: $(L + M)[\mathbf{v}] = L[\mathbf{v}] + M[\mathbf{v}]$.
You should check that $L + M$ satisfies the linear function axioms (7.1) provided $L$ and $M$ do.
Similarly, multiplication of a linear function by a scalar $c \in \mathbb{R}$ is defined so that $(c\,L)[\mathbf{v}] = c\,L[\mathbf{v}]$, again producing a linear function. The verification that $\mathcal{L}(V, W)$ satisfies the basic
vector space axioms is left to the reader.
In particular, if $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$, then Theorem 7.5 implies that we can identify
$\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$ with the space $M_{m \times n}$ of all $m \times n$ matrices. Addition of linear functions
corresponds to matrix addition, while scalar multiplication coincides with the usual scalar
multiplication of matrices. Therefore, the space of all $m \times n$ matrices forms a vector
space, a fact we already knew. A basis for $M_{m \times n}$ is given by the $m n$ matrices $E_{ij}$,
$1 \le i \le m$, $1 \le j \le n$, which have a single 1 in the $(i, j)$ position and zeros everywhere
else. Therefore, the dimension of $M_{m \times n}$ is $m n$. Note that $E_{ij}$ corresponds to the specific
linear transformation mapping $\mathbf{e}_j \in \mathbb{R}^n$ to $\hat{\mathbf{e}}_i \in \mathbb{R}^m$ and every other $\mathbf{e}_k \in \mathbb{R}^n$ to zero.
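Example 7.8. The space of linear transformations of the plane, $\mathcal{L}(\mathbb{R}^2, \mathbb{R}^2)$, is identified with the space $M_{2 \times 2}$ of $2 \times 2$ matrices $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. The standard basis of $M_{2 \times 2}$
consists of the $4 = 2 \cdot 2$ matrices
$$ E_{11} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad E_{12} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad E_{21} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \qquad E_{22} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}. $$
Indeed, we can uniquely write any other matrix
$$ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = a\,E_{11} + b\,E_{12} + c\,E_{21} + d\,E_{22}, $$
as a linear combination of these four basis matrices.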

In infinite-dimensional situations, one usually imposes additional restrictions, e.g., continuity
or boundedness of the linear operators. We can safely relegate these more subtle distinctions to a
more advanced treatment of the subject. See [ 122 ] for a full discussion of the rather sophisticated
analytical details, which do play an important role in serious quantum mechanical applications.

A particularly important case is when the target space of the linear functions is R.
Definition 7.9. The dual space to a vector space $V$ is defined as the vector space
$V^* = \mathcal{L}(V, \mathbb{R})$ consisting of all real-valued linear functions $L\colon V \to \mathbb{R}$.
If $V = \mathbb{R}^n$, then every linear function $L\colon \mathbb{R}^n \to \mathbb{R}$ is given by multiplication by a $1 \times n$
matrix, i.e., a row vector. Explicitly,
$$ L[\mathbf{v}] = \mathbf{a}\,\mathbf{v} = a_1 v_1 + \cdots + a_n v_n, \qquad \text{where} \qquad \mathbf{a} = ( a_1 \ a_2 \ \ldots \ a_n ), \qquad \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}. $$
Therefore, we can identify the dual space $(\mathbb{R}^n)^*$ with the space of row vectors with $n$
entries. In light of this observation, the distinction between row vectors and column
vectors is now seen to be much more sophisticated than mere semantics or notation. Row
vectors should be viewed as real-valued linear functions, the dual objects to column
vectors.
The standard dual basis $\varepsilon_1, \ldots, \varepsilon_n$ of $(\mathbb{R}^n)^*$ consists of the standard row basis vectors,
namely $\varepsilon_j$ is the row vector with 1 in the $j$th slot and zeros elsewhere. The $j$th dual basis
element defines the linear function
$$ E_j[\mathbf{v}] = \varepsilon_j\,\mathbf{v} = v_j, $$
that picks off the $j$th coordinate of $\mathbf{v}$ with respect to the original basis $\mathbf{e}_1, \ldots, \mathbf{e}_n$. Thus,
the dimensions of $V = \mathbb{R}^n$ and its dual $(\mathbb{R}^n)^*$ are both equal to $n$.
An inner product structure provides a mechanism for identifying a vector space and
its dual. However, it should be borne in mind that this identification will depend upon
the choice of inner product.
Theorem 7.10. Let $V$ be a finite-dimensional real inner product space. Then every
linear function $L\colon V \to \mathbb{R}$ is given by an inner product
$$ L[\mathbf{v}] = \langle \mathbf{a} ; \mathbf{v} \rangle \tag{7.12} $$
with a unique vector $\mathbf{a} \in V$. The correspondence between $L$ and $\mathbf{a}$ allows us to identify
$V^* \simeq V$.
Proof: Let $\mathbf{u}_1, \ldots, \mathbf{u}_n$ be an orthonormal basis of $V$. (If necessary, we can use the
Gram-Schmidt process to generate such a basis.) If we write $\mathbf{v} = x_1 \mathbf{u}_1 + \cdots + x_n \mathbf{u}_n$, then,
by linearity,
$$ L[\mathbf{v}] = x_1 L[\mathbf{u}_1] + \cdots + x_n L[\mathbf{u}_n] = a_1 x_1 + \cdots + a_n x_n, $$
where $a_i = L[\mathbf{u}_i]$. On the other hand, if we write $\mathbf{a} = a_1 \mathbf{u}_1 + \cdots + a_n \mathbf{u}_n$, then, by
orthonormality of the basis,
$$ \langle \mathbf{a} ; \mathbf{v} \rangle = \sum_{i,j=1}^{n} a_i x_j \langle \mathbf{u}_i ; \mathbf{u}_j \rangle = a_1 x_1 + \cdots + a_n x_n. $$
Thus equation (7.12) holds, which completes the proof. Q.E.D.


Remark: In the particular case when $V = \mathbb{R}^n$ is endowed with the standard dot
product, then Theorem 7.10 identifies a row vector representing a linear function with
the corresponding column vector obtained by transposition $\mathbf{a} \mapsto \mathbf{a}^T$. Thus, the naive
identification of a row and a column vector is, in fact, an indication of a much more subtle
phenomenon that relies on the identification of $\mathbb{R}^n$ with its dual based on the Euclidean
inner product. Alternative inner products will lead to alternative, more complicated,
identifications of row and column vectors; see Exercise for details.
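
To see how the identification changes with the inner product, consider a weighted inner product $\langle \mathbf{v} ; \mathbf{w} \rangle = \mathbf{v}^T K \mathbf{w}$ on $\mathbb{R}^2$. The sketch below is an illustration under assumptions of my own: the positive definite matrix `K`, the row vector `ell`, and the helper names are all hypothetical. Requiring $\mathbf{a}^T K \mathbf{v} = \boldsymbol{\ell}\,\mathbf{v}$ for every $\mathbf{v}$ gives $K \mathbf{a} = \boldsymbol{\ell}^T$, so the representing vector is no longer the plain transpose of the row vector.

```python
from fractions import Fraction as F

K = [[F(2), F(-1)], [F(-1), F(3)]]   # hypothetical symmetric positive definite matrix
ell = [F(1), F(4)]                   # row vector of the functional L[v] = v1 + 4 v2

def inner(v, w):
    """Weighted inner product <v ; w> = v^T K w."""
    return sum(v[i] * K[i][j] * w[j] for i in range(2) for j in range(2))

def representer(ell):
    """Solve K a = ell^T by Cramer's rule (2 x 2 case only)."""
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    a1 = (ell[0] * K[1][1] - K[0][1] * ell[1]) / det
    a2 = (K[0][0] * ell[1] - K[1][0] * ell[0]) / det
    return [a1, a2]

a = representer(ell)

# <a ; e_j> must reproduce the j-th entry of ell on each standard basis vector.
basis = [[F(1), F(0)], [F(0), F(1)]]
values = [inner(a, e) for e in basis]
```

With the identity matrix in place of `K`, the same computation returns `a == ell`, which recovers the transposition described in the Remark.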
Important: Theorem 7.10 is not true if V is infinite-dimensional. This fact will have
important repercussions for the analysis of the differential equations of continuum mechanics, which will lead us immediately into the much deeper waters of generalized function
theory. Details will be deferred until Section 11.2.
Composition of Linear Functions
Besides adding and multiplying by scalars, one can also compose linear functions.
Lemma 7.11. Let $V, W, Z$ be vector spaces. If $L\colon V \to W$ and $M\colon W \to Z$ are linear
functions, then the composite function $M \circ L\colon V \to Z$, defined by $(M \circ L)[\mathbf{v}] = M[L[\mathbf{v}]]$,
is linear.
Proof: This is straightforward:
$$ (M \circ L)[c\,\mathbf{v} + d\,\mathbf{w}] = M[L[c\,\mathbf{v} + d\,\mathbf{w}]] = M[c\,L[\mathbf{v}] + d\,L[\mathbf{w}]] = c\,M[L[\mathbf{v}]] + d\,M[L[\mathbf{w}]] = c\,(M \circ L)[\mathbf{v}] + d\,(M \circ L)[\mathbf{w}], $$
where we used, successively, the linearity of $L$ and then of $M$. Q.E.D.

For example, if $L[\mathbf{v}] = A\,\mathbf{v}$ maps $\mathbb{R}^n$ to $\mathbb{R}^m$, and $M[\mathbf{w}] = B\,\mathbf{w}$ maps $\mathbb{R}^m$ to $\mathbb{R}^l$, so
that $A$ is an $m \times n$ matrix and $B$ is an $l \times m$ matrix, then
$$ (M \circ L)[\mathbf{v}] = M[L[\mathbf{v}]] = B(A\,\mathbf{v}) = (B A)\,\mathbf{v}, $$
and hence the composition $M \circ L\colon \mathbb{R}^n \to \mathbb{R}^l$ corresponds to the $l \times n$ product matrix $B A$.
In other words, on Euclidean space, composition of linear functions is the same as matrix
multiplication!
As with matrix multiplication, composition of (linear) functions is not commutative.
In general the order of the constituents makes a difference.
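Example 7.12. Composing two rotations gives another rotation: $R_\varphi \circ R_\theta = R_{\varphi + \theta}$.
In other words, if we first rotate by angle $\theta$ and then by angle $\varphi$, the net result is rotation
by angle $\varphi + \theta$. On the matrix level of (7.8), this implies that $A_\varphi A_\theta = A_{\varphi + \theta}$, or, explicitly,
$$ \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} \cos(\varphi + \theta) & -\sin(\varphi + \theta) \\ \sin(\varphi + \theta) & \cos(\varphi + \theta) \end{pmatrix}. $$
Multiplying out the left hand side, we deduce the well-known trigonometric addition formulae
$$ \cos(\varphi + \theta) = \cos\varphi \cos\theta - \sin\varphi \sin\theta, \qquad \sin(\varphi + \theta) = \cos\varphi \sin\theta + \sin\varphi \cos\theta. $$
In fact, this computation constitutes a bona fide proof of these two identities!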
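
A numerical spot check of the matrix identity is easy to write. In this sketch the function names and the two sample angles are arbitrary choices of mine; it multiplies two rotation matrices of the form (7.8) and compares the product with the rotation matrix for the summed angle, up to rounding error.

```python
from math import cos, sin, isclose

def rotation(theta):
    """The 2 x 2 rotation matrix of (7.8) for the angle theta (radians)."""
    return [[cos(theta), -sin(theta)],
            [sin(theta),  cos(theta)]]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

phi, theta = 0.7, 1.9                      # arbitrary sample angles
lhs = matmul(rotation(phi), rotation(theta))
rhs = rotation(phi + theta)

agree = all(isclose(lhs[i][j], rhs[i][j], abs_tol=1e-12)
            for i in range(2) for j in range(2))
```

Swapping the two factors gives the same product, which reflects the fact that planar rotations commute; this is special to two dimensions.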

Example 7.13. One can build up more sophisticated linear operators on function
space by adding and composing simpler ones. In particular, the linear higher order derivative operators are obtained by composing the derivative operator $D$, defined in (7.11), with
itself. For example,
$$ D^2[f] = D \circ D[f] = D[f'] = f'' $$
is the second derivative operator. One needs to exercise some care about the domain of
definition, since not every function is differentiable. In general,
$$ D^k[f] = f^{(k)}(x) \qquad \text{defines a linear operator} \qquad D^k\colon \mathrm{C}^n[a,b] \to \mathrm{C}^{n-k}[a,b] $$
for any $n \ge k$.
If we compose $D^k$ with the linear operation of multiplication by a fixed function
$a(x) \in \mathrm{C}^{n-k}[a,b]$ we obtain the linear operator $f(x) \mapsto a\,D^k[f] = a(x)\,f^{(k)}(x)$. Finally, a
general linear ordinary differential operator of order $n$
$$ L = a_n(x)\,D^n + a_{n-1}(x)\,D^{n-1} + \cdots + a_1(x)\,D + a_0(x) \tag{7.13} $$
is obtained by summing such linear operators. If the coefficient functions $a_0(x), \ldots, a_n(x)$
are continuous, then
$$ L[u] = a_n(x) \frac{d^n u}{dx^n} + a_{n-1}(x) \frac{d^{n-1} u}{dx^{n-1}} + \cdots + a_1(x) \frac{du}{dx} + a_0(x)\,u \tag{7.14} $$
defines a linear operator from $\mathrm{C}^n[a,b]$ to $\mathrm{C}^0[a,b]$. The most important case (but certainly not the only one arising in applications) is when the coefficients $a_i(x) = c_i$ of $L$
are all constant.

Inverses
The inverse of a linear function is defined in direct analogy with the Definition 1.13
of the inverse of a (square) matrix.
Definition 7.14. Let $L\colon V \to W$ be a linear function. If $M\colon W \to V$ is a linear
function such that both composite functions
$$ L \circ M = I_W, \qquad M \circ L = I_V, \tag{7.15} $$
are equal to the identity function, then we call $M$ the inverse of $L$ and write $M = L^{-1}$.
The two conditions (7.15) require
$$ L[M[\mathbf{w}]] = \mathbf{w} \quad \text{for all} \quad \mathbf{w} \in W, \qquad \text{and} \qquad M[L[\mathbf{v}]] = \mathbf{v} \quad \text{for all} \quad \mathbf{v} \in V. $$
Of course, if $M = L^{-1}$ is the inverse of $L$, then $L = M^{-1}$ is the inverse of $M$ since the
conditions are symmetric.
If $V = \mathbb{R}^n$, $W = \mathbb{R}^m$, so that $L$ and $M$ are given by matrix multiplication, by $A$ and
$B$ respectively, then the conditions (7.15) reduce to the usual conditions
$$ A B = I, \qquad B A = I, $$
for matrix inversion, cf. (1.33). Therefore $B = A^{-1}$ is the inverse matrix. In particular,
for $L$ to have an inverse, we need $m = n$ and its coefficient matrix $A$ to be square and
nonsingular.

Figure 7.4. Rotation.

Example 7.15. The Fundamental Theorem of Calculus says, roughly, that differentiation $D[f] = f'$ and (indefinite) integration $J[f] = \int_a^x f(y)\,dy$ are inverse operations.
More precisely, the derivative of the indefinite integral of $f$ is equal to $f$, and hence
$$ D[J[f(x)]] = \frac{d}{dx} \int_a^x f(y)\,dy = f(x). $$
In other words, the composition
$$ D \circ J = I_{\mathrm{C}^0[a,b]} $$
defines the identity operator on the function space $\mathrm{C}^0[a,b]$. On the other hand, if we
integrate the derivative of a continuously differentiable function $f \in \mathrm{C}^1[a,b]$, we obtain
$$ J[D[f(x)]] = J[f'(x)] = \int_a^x f'(y)\,dy = f(x) - f(a). $$
Therefore
$$ J[D[f(x)]] = f(x) - f(a), \qquad \text{and so} \qquad J \circ D \ne I_{\mathrm{C}^1[a,b]} $$
is not the identity operator. Therefore, differentiation, $D$, is a left inverse for integration,
$J$, but not a right inverse!
This perhaps surprising phenomenon could not be anticipated from the finite-dimensional
matrix theory. Indeed, if a matrix $A$ has a left inverse $B$, then $B$ is automatically a right
inverse too, and we write $B = A^{-1}$ as the inverse of $A$. On an infinite-dimensional vector
space, a linear operator may possess one inverse without necessarily the other. However,
if both a left and a right inverse exist they must be equal; see Exercise .
If we restrict $D$ to the subspace $V = \{ f \mid f(a) = 0 \} \subset \mathrm{C}^1[a,b]$ consisting of all continuously differentiable functions that vanish at the left hand endpoint, then $J\colon \mathrm{C}^0[a,b] \to V$,
and $D\colon V \to \mathrm{C}^0[a,b]$ are, by the preceding argument, inverse linear operators: $D \circ J = I_{\mathrm{C}^0[a,b]}$, and $J \circ D = I_V$. Note that $V \subsetneq \mathrm{C}^1[a,b] \subsetneq \mathrm{C}^0[a,b]$. Thus, we discover the curious
and disconcerting infinite-dimensional phenomenon that $J$ defines a one-to-one, invertible,
linear map from a vector space $\mathrm{C}^0[a,b]$ to a proper subspace $V \subsetneq \mathrm{C}^0[a,b]$. This paradoxical
situation cannot occur in finite dimensions. A linear map on a finite-dimensional vector
space can only be invertible when the domain and target spaces have the same dimension,
and hence its matrix is necessarily square!
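
The one-sided inverse can be observed concretely on polynomials, which lie in both function spaces. The sketch below is an illustration restricted to polynomials, with coefficient lists `[c0, c1, c2, ...]` as a representation of my own choosing; the names `D`, `J`, and `evaluate` mirror the operators of the example but are otherwise hypothetical. It uses exact rational arithmetic.

```python
from fractions import Fraction

def evaluate(p, x):
    """Value at x of the polynomial with coefficients p = [c0, c1, c2, ...]."""
    return sum(c * x**k for k, c in enumerate(p))

def D(p):
    """Derivative: the coefficient of x^(k-1) becomes k * c_k."""
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]

def J(p, a):
    """Integral from a to x, chosen so that the result vanishes at x = a."""
    anti = [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(p)]
    anti[0] = -evaluate(anti, a)
    return anti

a = Fraction(1)                                  # left endpoint of the interval
f = [Fraction(5), Fraction(0), Fraction(3)]      # f(x) = 5 + 3 x^2, so f(a) = 8

DJ_f = D(J(f, a))    # differentiate after integrating
JD_f = J(D(f), a)    # integrate after differentiating
```

Here `DJ_f` reproduces `f`, while `JD_f` differs from `f` by the constant $f(a)$, in line with $J[D[f]] = f(x) - f(a)$. On the subspace of functions with $f(a) = 0$ the discrepancy disappears.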

7.2. Linear Transformations.

Figure 7.5. Reflection through the y axis.

A linear function $L\colon \mathbb{R}^n \to \mathbb{R}^n$ that maps $n$-dimensional Euclidean space to itself defines a linear transformation. As such, it can be assigned a geometrical interpretation that
leads to further insight into the nature and scope of linear functions. The transformation
$L$ maps a point $\mathbf{x} \in \mathbb{R}^n$ to its image point $L[\mathbf{x}] = A\,\mathbf{x}$, where $A$ is its $n \times n$ matrix representative. Many of the basic maps that appear in geometry, in computer graphics and
computer gaming, in deformations of elastic bodies, in symmetry and crystallography, and
in Einstein's special relativity, are defined by linear transformations. The two-, three- and
four-dimensional (viewing time as a fourth dimension) cases are of particular importance.
Most of the important classes of linear transformations already appear in the two-dimensional case. Every linear function $L\colon \mathbb{R}^2 \to \mathbb{R}^2$ has the form
$$ L \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a x + b y \\ c x + d y \end{pmatrix}, \qquad \text{where} \qquad A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \tag{7.16} $$
is an arbitrary $2 \times 2$ matrix. We have already encountered the rotation matrices
$$ R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \tag{7.17} $$
whose effect is to rotate every vector in $\mathbb{R}^2$ through an angle $\theta$; in Figure 7.4 we illustrate
the effect on a couple of square regions in the plane. Planar rotation matrices coincide
with the $2 \times 2$ proper orthogonal matrices, meaning matrices $Q$ that satisfy
$$ Q^T Q = I, \qquad \det Q = +1. \tag{7.18} $$

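The improper orthogonal matrices, i.e., those with determinant $-1$, define reflections. For
example, the matrix
$$ A = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \qquad \text{corresponds to the linear transformation} \qquad L \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -x \\ y \end{pmatrix}, \tag{7.19} $$
which reflects the plane through the $y$ axis; see Figure 7.5. It can be visualized by thinking
of the $y$ axis as a mirror. Another simple example is the improper orthogonal matrix
$$ R = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \qquad \text{The corresponding linear transformation} \qquad L \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} y \\ x \end{pmatrix} \tag{7.20} $$
is a reflection through the diagonal line $y = x$, as illustrated in Figure 7.6.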

Figure 7.6. Reflection through the Diagonal.

Figure 7.7. A Three-Dimensional Rotation.

A similar bipartite classification of orthogonal matrices carries over to three-dimensional
(and even higher dimensional) space. The proper orthogonal matrices correspond to rotations and the improper to reflections, or, more generally, reflections combined with rotations. For example, the proper orthogonal matrix
$$ \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} $$
corresponds to a rotation through an angle $\theta$ around the $z$-axis, while
$$ \begin{pmatrix} \cos\varphi & 0 & -\sin\varphi \\ 0 & 1 & 0 \\ \sin\varphi & 0 & \cos\varphi \end{pmatrix} $$
corresponds to a rotation through an angle $\varphi$ around the $y$-axis. In general, a proper orthogonal matrix $Q = ( \mathbf{u}_1 \ \mathbf{u}_2 \ \mathbf{u}_3 )$ with columns $\mathbf{u}_i = Q\,\mathbf{e}_i$ corresponds to the rotation in which the
standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ are rotated to new positions given by the orthonormal
basis $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$. It can be shown (see Exercise) that every $3 \times 3$ proper orthogonal matrix
corresponds to a rotation around a line through the origin in $\mathbb{R}^3$, the axis of the rotation,
as sketched in Figure 7.7.
Since the product of two (proper) orthogonal matrices is also (proper) orthogonal,
this implies that the composition of two rotations is also a rotation. Unlike the planar
case, the order in which the rotations are performed is important! Multiplication of $n \times n$
orthogonal matrices is not commutative for $n \geq 3$. For example, rotating first around the
$z$-axis and then rotating around the $y$-axis does not have the same effect as first rotating
around the $y$-axis and then rotating around the $z$-axis. If you don't believe this,
try it out with a solid object, e.g., this book, and rotate through $90^\circ$, say, around each
axis; the final configuration of the book will depend upon the order in which you do the

Figure 7.8. Stretch along the x-axis.

Figure 7.9. Shear in the x direction.

rotations. Then prove this mathematically by showing that the two rotation matrices do
not commute.
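The non-commutativity of three-dimensional rotations can also be checked numerically. The following is a minimal sketch in plain Python; the helper names `matmul`, `rot_z`, `rot_y`, and `rounded` are illustrative choices, not notation from the text. It builds the two quarter-turn rotation matrices and compares the two orders of multiplication.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def rot_z(theta):
    """Rotation through angle theta around the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(theta):
    """Rotation through angle theta around the y-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rounded(M):
    """Round entries to integers to suppress floating-point noise at 90 degrees."""
    return [[round(x) for x in row] for row in M]

quarter_turn = math.pi / 2
# Matrix products act right to left: the rightmost factor is applied first.
z_then_y = rounded(matmul(rot_y(quarter_turn), rot_z(quarter_turn)))
y_then_z = rounded(matmul(rot_z(quarter_turn), rot_y(quarter_turn)))

print(z_then_y)
print(y_then_z)
print(z_then_y == y_then_z)
```

The two printed matrices differ, so the corresponding rotation matrices do not commute.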
Other important linear transformations arise from elementary matrices. First, the
elementary matrices corresponding to the third type of row operations (multiplying a
row by a scalar) correspond to simple stretching transformations. For example, if
$$A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}, \quad\text{then the linear transformation}\quad L\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2x \\ y \end{pmatrix}$$
has the effect of stretching along the $x$ axis by a factor of 2; see Figure 7.8. A matrix with
a negative diagonal entry corresponds to a reflection followed by a stretch. For example,
the elementary matrix (7.19) gives an example of a pure reflection, while the more general
elementary matrix
$$\begin{pmatrix} -2 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$$
can be written as the product of a reflection through the $y$ axis followed by a stretch along
the $x$ axis. In this case, the order of these operations is immaterial.
For $2 \times 2$ matrices, there is only one type of row interchange matrix, namely the matrix
(7.20) that yields a reflection through the diagonal $y = x$. The elementary matrices of Type
#1 correspond to shearing transformations of the plane. For example, the matrix
$$\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \quad\text{represents the linear transformation}\quad L\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + 2y \\ y \end{pmatrix},$$
which has the effect of shearing the plane along the $x$-axis. The constant 2 will be called
the shear factor, which can be either positive or negative. Each point moves parallel to
the $x$ axis by an amount proportional to its (signed) distance from the axis; see Figure 7.9.

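Similarly, the elementary matrix
$$\begin{pmatrix} 1 & 0 \\ -3 & 1 \end{pmatrix} \quad\text{represents the linear transformation}\quad L\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y - 3x \end{pmatrix},$$
which represents a shear along the $y$ axis. Shears map rectangles to parallelograms; distances are altered, but areas are unchanged.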
All of the preceding linear maps are invertible, and so represented by nonsingular
matrices. Besides the zero map/matrix, which sends every point $x \in \mathbb{R}^2$ to the origin, the
simplest singular map is
$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad\text{corresponding to the linear transformation}\quad L\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ 0 \end{pmatrix},$$
which is merely the orthogonal projection of the vector $( x, y )^T$ onto the $x$-axis. Other
rank one matrices represent various kinds of projections from the plane to a line through
the origin; see Exercise for details.
A similar classification of linear maps appears in higher dimensions. The linear transformations constructed from elementary matrices can be built up from the following four
basic types:
(i ) A stretch in a single coordinate direction.
(ii ) A reflection through a coordinate plane.
(iii ) A reflection through a diagonal plane.
(iv ) A shear along a coordinate axis.
Moreover, we already proved that every nonsingular matrix can be written as a product
of elementary matrices; see (1.41). This has the remarkable consequence that every invertible linear
transformation can be constructed from a sequence of elementary stretches, reflections, and
shears. In addition, there is one further, non-invertible type of basic linear transformation:
(v ) An orthogonal projection onto a lower dimensional subspace.
All possible linear transformations of $\mathbb{R}^n$ can be built up, albeit non-uniquely, as a combination of these five basic types.

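Example 7.16. Consider the matrix
$$A = \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix}$$
corresponding to a plane rotation through $\theta = 30^\circ$, cf. (7.17). Rotations are not elementary linear transformations.
To express this particular rotation as a product of elementary matrices, we need to perform
a Gauss-Jordan row reduction to reduce it to the identity matrix. Let us indicate the basic
steps:
$$E_1 = \begin{pmatrix} 1 & 0 \\ -\frac{1}{\sqrt{3}} & 1 \end{pmatrix}, \qquad E_1 A = \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ 0 & \frac{2}{\sqrt{3}} \end{pmatrix},$$
$$E_2 = \begin{pmatrix} 1 & 0 \\ 0 & \frac{\sqrt{3}}{2} \end{pmatrix}, \qquad E_2 E_1 A = \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ 0 & 1 \end{pmatrix},$$
$$E_3 = \begin{pmatrix} \frac{2}{\sqrt{3}} & 0 \\ 0 & 1 \end{pmatrix}, \qquad E_3 E_2 E_1 A = \begin{pmatrix} 1 & -\frac{1}{\sqrt{3}} \\ 0 & 1 \end{pmatrix},$$
$$E_4 = \begin{pmatrix} 1 & \frac{1}{\sqrt{3}} \\ 0 & 1 \end{pmatrix}, \qquad E_4 E_3 E_2 E_1 A = I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$
and hence
$$A = \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix} = E_1^{-1} E_2^{-1} E_3^{-1} E_4^{-1} = \begin{pmatrix} 1 & 0 \\ \frac{1}{\sqrt{3}} & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \frac{2}{\sqrt{3}} \end{pmatrix} \begin{pmatrix} \frac{\sqrt{3}}{2} & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -\frac{1}{\sqrt{3}} \\ 0 & 1 \end{pmatrix}.$$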
Therefore, a $30^\circ$ rotation can be effected by performing the following composition of elementary transformations in the prescribed order:
(1) First, a shear in the $x$-direction with shear factor $-\frac{1}{\sqrt{3}}$,
(2) Then a stretch (or, rather, a contraction) in the direction of the $x$-axis by a factor of $\frac{\sqrt{3}}{2}$,
(3) Then a stretch in the $y$-direction by the reciprocal factor $\frac{2}{\sqrt{3}}$,
(4) Finally, a shear in the direction of the $y$-axis with shear factor $\frac{1}{\sqrt{3}}$.
The fact that the combination of these special transformations results in a pure rotation is
surprising and non-obvious. Similar decompositions can be systematically found for higher
dimensional linear transformations.
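As a sanity check on Example 7.16, the product of the two shears and two stretches can be multiplied out numerically and compared with the $30^\circ$ rotation matrix. This is only a sketch: the variable names (`shear_y`, `stretch_y`, `stretch_x`, `shear_x`) and the helper `compose` are labels introduced here, and the factors are the inverse elementary matrices $E_1^{-1}, E_2^{-1}, E_3^{-1}, E_4^{-1}$ in that order.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def compose(*factors):
    """Left-to-right product of 2 x 2 matrices."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for factor in factors:
        result = matmul(result, factor)
    return result

s3 = math.sqrt(3.0)
shear_y = [[1.0, 0.0], [1.0 / s3, 1.0]]     # inverse of E1: shear along the y-axis
stretch_y = [[1.0, 0.0], [0.0, 2.0 / s3]]   # inverse of E2: stretch in the y-direction
stretch_x = [[s3 / 2.0, 0.0], [0.0, 1.0]]   # inverse of E3: contraction in the x-direction
shear_x = [[1.0, -1.0 / s3], [0.0, 1.0]]    # inverse of E4: shear along the x-axis

product = compose(shear_y, stretch_y, stretch_x, shear_x)

theta = math.radians(30.0)
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

# Largest entrywise discrepancy between the product and the rotation matrix.
max_error = max(abs(product[i][j] - rotation[i][j])
                for i in range(2) for j in range(2))
print(max_error)
```

The printed discrepancy is at the level of floating-point rounding, consistent with the factorization.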
Change of Basis
Sometimes a linear transformation represents an elementary geometrical transformation, but this is not evident because the matrix happens to be written in the wrong
coordinates. The characterization of linear functions from $\mathbb{R}^n$ to $\mathbb{R}^m$ as multiplication by
$m \times n$ matrices in Theorem 7.5 relies on using the standard bases of the domain and target
spaces. In many cases, the standard basis is not particularly well adapted to the linear
transformation, and one can often gain more insight by adopting a more suitable basis.
Therefore, we need to understand how to write a given linear transformation in a new
basis.
The following general result says that, in any basis, a linear function on finite-dimensional vector spaces can be realized by matrix multiplication of the coordinates.
But the particular matrix representative will depend upon the choice of basis.
Theorem 7.17. Let $L\colon V \to W$ be a linear function. Suppose $V$ has basis $v_1, \ldots, v_n$
and $W$ has basis $w_1, \ldots, w_m$. We can write
$$v = x_1 v_1 + \cdots + x_n v_n \in V, \qquad w = y_1 w_1 + \cdots + y_m w_m \in W,$$
where $x = ( x_1, x_2, \ldots, x_n )^T$ are the coordinates of $v$ relative to the chosen basis on $V$
and $y = ( y_1, y_2, \ldots, y_m )^T$ are those of $w$ relative to its basis. Then the linear function
$w = L[ v ]$ is given in these coordinates by multiplication, $y = B x$, by an $m \times n$ matrix $B$.

Proof : We mimic the proof of Theorem 7.5, replacing the standard basis vectors by
more general basis vectors. In other words, we should apply $L$ to the basis vectors of $V$
and express the result as a linear combination of the basis vectors in $W$. Specifically, we
write $L[ v_j ] = \sum_{i=1}^{m} b_{ij} w_i$. The coefficients $b_{ij}$ form the entries of the desired coefficient
matrix. Indeed, by linearity,
$$L[ v ] = L[ x_1 v_1 + \cdots + x_n v_n ] = x_1 L[ v_1 ] + \cdots + x_n L[ v_n ] = \sum_{i=1}^{m} \Bigl( \sum_{j=1}^{n} b_{ij} x_j \Bigr) w_i,$$
and so $y_i = \sum_{j=1}^{n} b_{ij} x_j$, as claimed. Q.E.D.

Suppose that the linear transformation $L\colon \mathbb{R}^n \to \mathbb{R}^m$ is represented by a certain $m \times n$
matrix $A$ relative to the standard bases $e_1, \ldots, e_n$ and $\hat{e}_1, \ldots, \hat{e}_m$ of the domain and target
spaces. If we introduce new bases for $\mathbb{R}^n$ and $\mathbb{R}^m$ then the same linear transformation
may have a completely different matrix representation. Therefore, different matrices may
represent the same underlying linear transformation, with respect to different bases.

Example 7.18. Consider the linear transformation
$$L\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x - y \\ 2x + 4y \end{pmatrix},$$
which we write in the standard, Cartesian coordinates $x, y$ on $\mathbb{R}^2$. The corresponding coefficient
matrix $A = \begin{pmatrix} 1 & -1 \\ 2 & 4 \end{pmatrix}$ is the matrix representation of $L$ relative to the standard basis
$e_1, e_2$ of $\mathbb{R}^2$. This means that
$$L[ e_1 ] = \begin{pmatrix} 1 \\ 2 \end{pmatrix} = e_1 + 2 e_2, \qquad L[ e_2 ] = \begin{pmatrix} -1 \\ 4 \end{pmatrix} = -e_1 + 4 e_2.$$
Let us see what happens if we replace the standard basis by the alternative basis
$$v_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 1 \\ -2 \end{pmatrix}.$$
What is the corresponding matrix formulation of the same linear transformation? According to the recipe of Theorem 7.17, we must compute
$$L[ v_1 ] = \begin{pmatrix} 2 \\ -2 \end{pmatrix} = 2 v_1, \qquad L[ v_2 ] = \begin{pmatrix} 3 \\ -6 \end{pmatrix} = 3 v_2.$$
The linear transformation acts by stretching in the direction $v_1$ by a factor of 2 and
simultaneously stretching in the direction $v_2$ by a factor of 3. Therefore, the matrix form
of $L$ with respect to this new basis is the diagonal matrix $D = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}$. In general,
$$L[ a v_1 + b v_2 ] = 2 a v_1 + 3 b v_2,$$
whose effect is to multiply the new basis coordinates $a = ( a, b )^T$ by the diagonal matrix
$D$. Both $A$ and $D$ represent the same linear transformation, the former in the standard

basis and the latter in the new basis. The simple geometry of this linear transformation
is thereby exposed through the inspired choice of an adapted basis. The secret behind the
choice of such well-adapted bases will be revealed in Chapter 8.
How does one effect a change of basis in general? According to formula (2.22), if
$v_1, \ldots, v_n$ form a new basis of $\mathbb{R}^n$, then the coordinates $y = ( y_1, y_2, \ldots, y_n )^T$ of a vector
$$x = y_1 v_1 + y_2 v_2 + \cdots + y_n v_n$$
are found by solving the linear system
$$S y = x, \qquad\text{where}\qquad S = ( v_1 \; v_2 \; \ldots \; v_n ) \tag{7.21}$$
is the nonsingular $n \times n$ matrix whose columns are the basis vectors.

Consider first a linear transformation $L\colon \mathbb{R}^n \to \mathbb{R}^n$ from $\mathbb{R}^n$ to itself. When written
in terms of the standard basis, $L[ x ] = A x$ has a certain $n \times n$ coefficient matrix $A$. To
change to the new basis $v_1, \ldots, v_n$, we use (7.21) to rewrite the standard $x$ coordinates in
terms of the new $y$ coordinates. We also need to write the target vector $f = A x$ in terms
of the new coordinates, which requires $f = S g$. Therefore, the new target coordinates are
expressed in terms of the new domain coordinates via
$$g = S^{-1} f = S^{-1} A x = S^{-1} A S y = B y.$$
Therefore, in the new basis, the matrix form of our linear transformation is
$$B = S^{-1} A S. \tag{7.22}$$
Two matrices $A$ and $B$ which are related by such an equation for some nonsingular matrix $S$
are called similar. Similar matrices represent the same linear transformation, but relative
to different bases of the underlying vector space $\mathbb{R}^n$.
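Returning to the preceding example, we assemble the new basis vectors to form the
change of basis matrix $S = \begin{pmatrix} 1 & 1 \\ -1 & -2 \end{pmatrix}$, and verify that
$$S^{-1} A S = \begin{pmatrix} 2 & 1 \\ -1 & -1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 2 & 4 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ -1 & -2 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} = D,$$
reconfirming our earlier computation.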
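The change of basis formula $B = S^{-1} A S$ is easy to check by machine for this example. The sketch below uses exact rational arithmetic from Python's standard `fractions` module; the helper names `matmul` and `inverse_2x2` are illustrative and not part of the text.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse_2x2(M):
    """Exact inverse of a nonsingular 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, -1], [2, 4]]     # matrix of L in the standard basis
S = [[1, 1], [-1, -2]]    # columns are the new basis vectors v1, v2

S_inv = inverse_2x2(S)
B = matmul(matmul(S_inv, A), S)   # matrix of L in the new basis

print(S_inv)
print(B)
```

Here `B` comes out as the diagonal matrix with entries 2 and 3, in agreement with the matrix $D$ of Example 7.18.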
More generally, a linear transformation $L\colon \mathbb{R}^n \to \mathbb{R}^m$ is represented by an $m \times n$
matrix $A$ with respect to the standard bases on both the domain and target spaces. What
happens if we introduce a new basis $v_1, \ldots, v_n$ on the domain space $\mathbb{R}^n$ and a new basis
$w_1, \ldots, w_m$ on the target space $\mathbb{R}^m$? Arguing as above, we conclude that the matrix
representative of $L$ with respect to these new bases is given by
$$B = T^{-1} A S, \tag{7.23}$$
where $S = ( v_1 \; v_2 \; \ldots \; v_n )$ is the domain basis matrix, while $T = ( w_1 \; w_2 \; \ldots \; w_m )$ is the
range basis matrix.

In particular, suppose that the linear transformation has rank
$$r = \dim \operatorname{rng} L = \dim \operatorname{corng} L.$$
Let us choose a basis $v_1, \ldots, v_n$ of $\mathbb{R}^n$ such that $v_1, \ldots, v_r$ form a basis of $\operatorname{corng} L$ while
$v_{r+1}, \ldots, v_n$ form a basis for $\ker L = (\operatorname{corng} L)^\perp$. According to Proposition 5.54, the
image vectors $w_1 = L[ v_1 ], \ldots, w_r = L[ v_r ]$ form a basis for $\operatorname{rng} L$, while $L[ v_{r+1} ] = \cdots = L[ v_n ] = 0$. We further choose a basis $w_{r+1}, \ldots, w_m$ for $\operatorname{coker} L = (\operatorname{rng} L)^\perp$, and note that
the combination $w_1, \ldots, w_m$ forms a basis for $\mathbb{R}^m$. The matrix form of $L$ relative to these
two adapted bases is simply
$$B = \begin{pmatrix}
1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}. \tag{7.24}$$
In this matrix, the first $r$ rows have a single 1 in the diagonal slot, indicating that the first
$r$ basis vectors of the domain space are mapped to the first $r$ basis vectors of the target
space, while the last $m - r$ rows are all zero, indicating that the last $n - r$ basis vectors in
the domain are all mapped to 0. Thus, by a suitable choice of bases on both the domain
and target spaces, any linear transformation has an extremely simple canonical form.
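Example 7.19. According to the illustrative example following Theorem 2.47, the
matrix
$$A = \begin{pmatrix} 2 & -1 & 1 & 2 \\ -8 & 4 & -6 & -4 \\ 4 & -2 & 3 & 2 \end{pmatrix}$$
has rank 2. Based on the calculations, we choose the domain space basis
$$v_1 = \begin{pmatrix} 2 \\ -1 \\ 1 \\ 2 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 0 \\ 0 \\ -2 \\ 4 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} \frac{1}{2} \\ 1 \\ 0 \\ 0 \end{pmatrix}, \qquad v_4 = \begin{pmatrix} -2 \\ 0 \\ 2 \\ 1 \end{pmatrix},$$
noting that $v_1, v_2$ are a basis for the row space $\operatorname{corng} A$, while $v_3, v_4$ are a basis for $\ker A$.
For our basis of the target space, we first compute $w_1 = A v_1$ and $w_2 = A v_2$, which form
a basis for $\operatorname{rng} A$. We supplement these by the single basis vector $w_3$ for $\operatorname{coker} A$, and so
$$w_1 = \begin{pmatrix} 10 \\ -34 \\ 17 \end{pmatrix}, \qquad w_2 = \begin{pmatrix} 6 \\ -4 \\ 2 \end{pmatrix}, \qquad w_3 = \begin{pmatrix} 0 \\ \frac{1}{2} \\ 1 \end{pmatrix}.$$
In terms of these two bases, the canonical matrix form of the linear function is
$$B = T^{-1} A S = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},$$
where the bases are assembled to form the matrices
$$S = \begin{pmatrix} 2 & 0 & \frac{1}{2} & -2 \\ -1 & 0 & 1 & 0 \\ 1 & -2 & 0 & 2 \\ 2 & 4 & 0 & 1 \end{pmatrix}, \qquad T = \begin{pmatrix} 10 & 6 & 0 \\ -34 & -4 & \frac{1}{2} \\ 17 & 2 & 1 \end{pmatrix}.$$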
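The canonical form in Example 7.19 can be confirmed without inverting $T$: since $T$ is nonsingular, $B = T^{-1} A S$ is equivalent to $A S = T B$. The sketch below checks this identity exactly with Python's `fractions` module. The entries of `A`, `S`, and `T` are the matrices of the example as reconstructed here (signs included), and the helper names `matmul` and `det3` are illustrative.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

half = Fraction(1, 2)
A = [[2, -1, 1, 2], [-8, 4, -6, -4], [4, -2, 3, 2]]
S = [[2, 0, half, -2], [-1, 0, 1, 0], [1, -2, 0, 2], [2, 4, 0, 1]]   # columns v1..v4
T = [[10, 6, 0], [-34, -4, half], [17, 2, 1]]                         # columns w1..w3
B = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]                        # canonical form

lhs = matmul(A, S)   # columns are A v1, A v2, A v3, A v4
rhs = matmul(T, B)   # columns are w1, w2, 0, 0

print(lhs == rhs)
print(det3(T))       # nonzero, so w1, w2, w3 form a basis of R^3
```

The first two columns of `A S` reproduce $w_1$ and $w_2$, and the last two vanish because $v_3, v_4$ lie in the kernel.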

7.3. Affine Transformations and Isometries.

Not every transformation of importance in geometrical applications arises as a linear
function. A simple example is a translation, where all the points in $\mathbb{R}^n$ are moved in the
same direction by a common distance. The function that accomplishes this is
$$T[ x ] = x + a, \qquad x \in \mathbb{R}^n, \tag{7.25}$$
where $a \in \mathbb{R}^n$ is a fixed vector that determines the direction and the distance that the
points are translated. Except in the trivial case $a = 0$, the translation $T$ is not a linear
function because
$$T[ x + y ] = x + y + a \neq T[ x ] + T[ y ] = x + y + 2 a.$$
Or, even more simply, one notes that $T[ 0 ] = a \neq 0$.
Combining translations and linear functions leads us to an important class of geometrical transformations.
Definition 7.20. A function $F\colon \mathbb{R}^n \to \mathbb{R}^m$ of the form
$$F[ x ] = A x + b, \tag{7.26}$$
where $A$ is an $m \times n$ matrix and $b \in \mathbb{R}^m$ a fixed vector, is called an affine function.

For example, every affine function from $\mathbb{R}$ to itself has the form
$$f(x) = \alpha x + \beta. \tag{7.27}$$
As mentioned earlier, even though the graph of $f(x)$ is a straight line, $f$ is not a linear
function unless $\beta = 0$, and the line goes through the origin. Thus, to be technically
correct, we should refer to (7.27) as an affine scalar function.
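Example 7.21. The affine function
$$F(x, y) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 1 \\ -2 \end{pmatrix} = \begin{pmatrix} -y + 1 \\ x - 2 \end{pmatrix}$$
has the effect of first rotating the plane $\mathbb{R}^2$ by $90^\circ$ about the origin, and then translating
by the vector $( 1, -2 )^T$. The reader may enjoy proving that this combination has the same
effect as just rotating the plane through an angle of $90^\circ$ centered at the point $\bigl( \frac{3}{2}, -\frac{1}{2} \bigr)$.
See Exercise .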
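The claim in Example 7.21 can be spot-checked before attempting a proof. The sketch below assumes the map $F(x, y) = ( -y + 1, \, x - 2 )$ and takes the center of rotation to be the fixed point of $F$, found by solving $x = -y + 1$, $y = x - 2$, which gives $( \frac{3}{2}, -\frac{1}{2} )$. The function names `F` and `rotate_90_about` and the sample points are illustrative.

```python
from fractions import Fraction

def F(p):
    """The affine map of the example: rotate by 90 degrees, then translate by (1, -2)."""
    x, y = p
    return (-y + 1, x - 2)

def rotate_90_about(center, p):
    """Rotate the point p counterclockwise by 90 degrees around the given center."""
    cx, cy = center
    dx, dy = p[0] - cx, p[1] - cy
    return (cx - dy, cy + dx)

center = (Fraction(3, 2), Fraction(-1, 2))   # the fixed point of F
samples = [(0, 0), (1, 0), (2, 5), (-3, 4)]

# F should fix the center and agree with the rotation about it at every sample point.
fixes_center = F(center) == center
agree = all(F(p) == rotate_90_about(center, p) for p in samples)
print(fixes_center, agree)
```

Agreement at a handful of points is of course not a proof, but two affine maps of the plane that agree at three non-collinear points must coincide, so the check is more informative than it may first appear.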

The composition of two affine functions is again an affine function. Specifically, given
$F[ x ] = A x + a$, $G[ y ] = B y + b$, then
$$(G \circ F)[ x ] = G[ F[ x ] ] = G[ A x + a ] = B (A x + a) + b = C x + c, \qquad\text{where}\qquad C = B A, \quad c = B a + b. \tag{7.28}$$
Note that the coefficient matrix of the composition is the product of the coefficient matrices,
but the resulting vector of translation is not the sum of the two translation vectors!
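The composition rule $C = B A$, $c = B a + b$ can be illustrated with a small computation. The following sketch uses made-up sample data (a quarter-turn followed by a stretch); the helper names `matvec`, `matmul`, `apply_affine`, and `compose_affine` are hypothetical.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def apply_affine(M, t, x):
    """Evaluate the affine function x -> M x + t."""
    return [mi + ti for mi, ti in zip(matvec(M, x), t)]

def compose_affine(B, b, A, a):
    """Return (C, c) with G[F[x]] = C x + c for F[x] = A x + a and G[y] = B y + b."""
    C = matmul(B, A)
    c = [u + v for u, v in zip(matvec(B, a), b)]
    return C, c

A, a = [[0, -1], [1, 0]], [1, -2]   # F: rotate by 90 degrees, then translate
B, b = [[2, 0], [0, 1]], [0, 3]     # G: stretch along x, then translate

C, c = compose_affine(B, b, A, a)
x = [3, 4]
direct = apply_affine(B, b, apply_affine(A, a, x))   # G[F[x]] computed step by step
combined = apply_affine(C, c, x)                     # C x + c

print(C, c)
print(direct, combined)
```

In this sample, `c` is `[2, 1]`, whereas the naive sum of the two translation vectors would be `[1, 1]`.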
Isometry
A transformation that preserves distance is known as a rigid motion, or, more abstractly, as an isometry. We already encountered the basic rigid motions in Chapter 6:
they are the translations and the rotations.
Definition 7.22. A function $F\colon V \to V$ is called an isometry on a normed vector
space if it preserves the distance:
$$d(F[ v ], F[ w ]) = d(v, w) \qquad\text{for all}\qquad v, w \in V. \tag{7.29}$$
Since the distance between points is just the norm of the vector between them,
$d(v, w) = \| v - w \|$, cf. (3.29), the isometry condition (7.29) can be restated as
$$\| F[ v ] - F[ w ] \| = \| v - w \| \qquad\text{for all}\qquad v, w \in V. \tag{7.30}$$
Clearly, any translation
$$T[ v ] = v + a, \qquad\text{where}\qquad a \in V \quad\text{is a fixed vector},$$
defines an isometry since $T[ v ] - T[ w ] = v - w$. A linear transformation $L\colon V \to V$ defines
an isometry if and only if
$$\| L[ v ] \| = \| v \| \qquad\text{for all}\qquad v \in V, \tag{7.31}$$
because, by linearity, $L[ v ] - L[ w ] = L[ v - w ]$. More generally, an affine transformation
$F[ v ] = L[ v ] + a$ is an isometry if and only if its linear part $L[ v ]$ is.
For the standard Euclidean norm on $V = \mathbb{R}^n$, the linear isometries consist of rotations and reflections. Both are characterized by orthogonal matrices, the rotations having
determinant $+1$, while the reflections have determinant $-1$.
Proposition 7.23. A linear transformation $L[ x ] = Q x$ defines a Euclidean isometry
of $\mathbb{R}^n$ if and only if $Q$ is an orthogonal matrix.
Proof : The linear isometry condition (7.31) requires that
$$\| Q x \|^2 = (Q x)^T Q x = x^T Q^T Q x = x^T x = \| x \|^2 \qquad\text{for all}\qquad x \in \mathbb{R}^n. \tag{7.32}$$
This holds if and only if $Q^T Q = I$ (a symmetric matrix $K$ satisfying $x^T K x = 0$ for all $x$ must vanish, and here $K = Q^T Q - I$), which is precisely the condition (5.30) that $Q$
be an orthogonal matrix.
Q.E.D.
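Proposition 7.23 suggests a simple numerical test: form $Q^T Q$ and compare it with the identity. The sketch below applies this to a rotation, a reflection, and a shear, and also compares norms directly. The helper names, the tolerance `tol`, and the sample angle and vector are assumptions made for illustration.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def is_orthogonal(Q, tol=1e-12):
    """Check Q^T Q = I entrywise, up to a floating-point tolerance."""
    gram = matmul(transpose(Q), Q)
    n = len(Q)
    return all(abs(gram[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

def apply(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))

theta = 0.7  # arbitrary sample angle in radians
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
reflection = [[0.0, 1.0], [1.0, 0.0]]
shear = [[1.0, 2.0], [0.0, 1.0]]

v = [3.0, 4.0]  # sample vector of length 5
for name, M in [("rotation", rotation), ("reflection", reflection), ("shear", shear)]:
    print(name, is_orthogonal(M), norm(apply(M, v)))
```

The rotation and the reflection pass the orthogonality test and preserve the length of the sample vector, while the shear does neither.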

Figure 7.10. A Screw.

Remark : It can be proved, [153], that the most general Euclidean isometry of $\mathbb{R}^n$ is an
affine transformation $F[ x ] = Q x + a$ where $Q$ is an orthogonal matrix and $a$ is a constant
vector. Therefore, every Euclidean isometry is a combination of translations, rotations and
reflections. The proper isometries correspond to the rotations, with $\det Q = 1$, and can be
realized as physical motions; improper isometries, with $\det Q = -1$, are then obtained by
reflection in a mirror.
The isometries of $\mathbb{R}^2$ and $\mathbb{R}^3$ are fundamental to the understanding of how objects
move in three-dimensional space. Basic computer graphics and animation require efficient
implementation of rigid isometries in three-dimensional space, coupled with appropriate
(nonlinear) perspective maps prescribing the projection of three-dimensional objects onto
a two-dimensional viewing screen.
There are three basic types of proper affine isometries. First are the translations
$F[ x ] = x + a$ in a fixed direction $a$. Second are the rotations. For example, $F[ x ] = Q x$ with
$\det Q = 1$ represent rotations around the origin, while the more general case $F[ x ] = Q(x - b) + b = Q x + ( I - Q ) b$ is a rotation around the point $b$. Finally, the screw motions are
affine maps of the form $F[ x ] = Q x + a$ where the orthogonal matrix $Q$ represents a rotation
through an angle $\theta$ around a fixed axis $a$, which is also the direction of the translation term;
see Figure 7.10. For example,
$$F\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ a \end{pmatrix}$$
represents a vertical screw along the $z$-axis through an angle $\theta$ by a distance $a$. As its name
implies, a screw represents the motion of a point on the head of a screw. It can be proved,
cf. Exercise , that every proper isometry of $\mathbb{R}^3$ is either a translation, a rotation, or a
screw.

7.4. Linear Systems.

The abstract notion of a linear system serves to unify, in a common conceptual framework, linear systems of algebraic equations, linear ordinary differential equations, linear

partial differential equations, linear boundary value problems, and a wide variety of other
linear problems in mathematics and its applications. The idea is simply to replace matrix
multiplication by a general linear function. Many of the structural results we learned in the
matrix context have, when suitably formulated, direct counterparts in these more general
situations, thereby shedding some light on the nature of their solutions.
Definition 7.24. A linear system is an equation of the form
L[ u ] = f ,

(7.33)

in which $L\colon V \to W$ is a linear function between vector spaces, the right hand side $f \in W$
is an element of the target space, while the desired solution $u \in V$ belongs to the domain
space. The system is homogeneous if $f = 0$; otherwise, it is called inhomogeneous.
Example 7.25. If $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$, then, according to Theorem 7.5, every
linear function $L\colon \mathbb{R}^n \to \mathbb{R}^m$ is given by matrix multiplication: $L[ u ] = A u$. Therefore, in
this particular case, every linear system is a matrix system, namely $A u = f$.
Example 7.26. A linear ordinary differential equation takes the form $L[ u ] = f$,
where $L$ is an $n$th order linear differential operator of the form (7.13), and the right hand
side is, say, a continuous function. Written out, the differential equation takes the familiar
form
$$L[ u ] = a_n(x) \frac{d^n u}{dx^n} + a_{n-1}(x) \frac{d^{n-1} u}{dx^{n-1}} + \cdots + a_1(x) \frac{du}{dx} + a_0(x) u = f(x). \tag{7.34}$$
You should already have some familiarity with solving the constant coefficient case. Appendix C describes a method for constructing series representations for the solutions to
more general, non-constant coefficient equations.

Example 7.27. Let $K(x, y)$ be a function of two variables which is continuous for
all $a \leq x, y \leq b$. Then the integral
$$I_K[ u ](x) = \int_a^b K(x, y)\, u(y)\, dy$$
defines a linear operator $I_K\colon C^0[ a, b ] \to C^0[ a, b ]$, known as an integral transform. Important examples include the Fourier and Laplace transforms, to be discussed in Chapter 13.
Finding the inverse transform requires solving a linear integral equation $I_K[ u ] = f$, which
has the explicit form
$$\int_a^b K(x, y)\, u(y)\, dy = f(x).$$

Example 7.28. One can combine linear maps to form more complicated, mixed
types of linear systems. For example, consider a typical initial value problem
$$u'' + u' - 2 u = x, \qquad u(0) = 1, \qquad u'(0) = -1, \tag{7.35}$$
for a scalar unknown function $u(x)$. The differential equation can be written as a linear
system
$$L[ u ] = x, \qquad\text{where}\qquad L[ u ] = (D^2 + D - 2)[ u ] = u'' + u' - 2 u$$

is a linear, constant coefficient differential operator. If we further define
$$M[ u ] = \begin{pmatrix} L[ u ] \\ u(0) \\ u'(0) \end{pmatrix} = \begin{pmatrix} u''(x) + u'(x) - 2 u(x) \\ u(0) \\ u'(0) \end{pmatrix},$$
then $M$ defines a linear map whose domain is the space $C^2$ of twice continuously differentiable
functions, and whose range is the vector space $V$ consisting of all triples
$$v = \begin{pmatrix} f(x) \\ a \\ b \end{pmatrix},$$
where $f \in C^0$ is a continuous function and $a, b \in \mathbb{R}$ are real constants. You
should convince yourself that $V$ is indeed a vector space under the evident addition and
scalar multiplication operations. In this way, we can write the initial value problem (7.35)
in linear systems form as $M[ u ] = f$, where $f = ( x, 1, -1 )^T$.
A similar construction applies to linear boundary value problems. For example, the
boundary value problem
$$u'' + u = e^x, \qquad u(0) = 1, \qquad u(1) = 2,$$
is in the form of a linear system
$$M[ u ] = f, \qquad\text{where}\qquad M[ u ] = \begin{pmatrix} u''(x) + u(x) \\ u(0) \\ u(1) \end{pmatrix}, \qquad f = \begin{pmatrix} e^x \\ 1 \\ 2 \end{pmatrix}.$$
Note that $M\colon C^2 \to V$ defines a linear map having the preceding domain and target spaces.
The Superposition Principle
Before attempting to tackle general inhomogeneous linear systems, it will help to
look first at the homogeneous version. The most important fact is that homogeneous
linear systems admit a superposition principle that allows one to construct new solutions
from known solutions. As we learned, the word "superposition" refers to taking linear
combinations of solutions.
Consider a general homogeneous linear system
L[ z ] = 0

(7.36)

where $L$ is a linear function. If we are given two solutions, say $z_1$ and $z_2$, meaning that
L[ z1 ] = 0,

L[ z2 ] = 0,

then their sum z1 + z2 is automatically a solution, since, in view of the linearity of L,

L[ z1 + z2 ] = L[ z1 ] + L[ z2 ] = 0 + 0 = 0.

Footnote: This is a particular case of the general Cartesian product construction between vector spaces,
with $V = C^0 \times \mathbb{R}^2$. See Exercise for details.


Similarly, given a solution z and any scalar c, the scalar multiple c z is automatically a
solution, since
L[ c z ] = c L[ z ] = c 0 = 0.
Combining these two elementary observations, we can now state the general superposition
principle. The proof is an immediate consequence of formula (7.4).
Theorem 7.29. If $z_1, \ldots, z_k$ are all solutions to the same homogeneous linear system
$L[ z ] = 0$, and $c_1, \ldots, c_k$ are any scalars, then the linear combination $c_1 z_1 + \cdots + c_k z_k$ is
also a solution.
As with matrices, we call the solution space to the homogeneous linear system (7.36)
the kernel of the linear function $L$. Theorem 7.29 implies that the kernel always forms a
subspace.
Proposition 7.30. If $L\colon V \to W$ is a linear function, then its kernel
$$\ker L = \{\, z \in V \mid L[ z ] = 0 \,\} \subset V \tag{7.37}$$
forms a subspace of the domain space $V$.

As we know, in the case of linear matrix systems, the kernel can be explicitly determined by the basic Gaussian elimination algorithm. For more general linear operators, one
must develop appropriate solution techniques for solving the homogeneous linear system.
Here is a simple example from the theory of linear, constant coefficient ordinary differential
equations.
Example 7.31. Consider the second order linear differential operator
$$L = D^2 - 2 D - 3, \tag{7.38}$$
which maps the function $u(x)$ to the function
$$L[ u ] = (D^2 - 2 D - 3)[ u ] = u'' - 2 u' - 3 u.$$
The associated homogeneous system takes the form of a homogeneous, linear, second order
ordinary differential equation
$$L[ u ] = u'' - 2 u' - 3 u = 0. \tag{7.39}$$
In accordance with the standard solution method, we plug the exponential ansatz
$u = e^{\lambda x}$ into the equation. The result is
$$L[ e^{\lambda x} ] = D^2[ e^{\lambda x} ] - 2 D[ e^{\lambda x} ] - 3 e^{\lambda x} = (\lambda^2 - 2 \lambda - 3) e^{\lambda x},$$
and therefore, $e^{\lambda x}$ is a solution if and only if $\lambda$ satisfies the characteristic equation
$$0 = \lambda^2 - 2 \lambda - 3 = (\lambda - 3)(\lambda + 1).$$
The two roots are $\lambda_1 = 3$, $\lambda_2 = -1$, and hence
$$u_1(x) = e^{3x}, \qquad u_2(x) = e^{-x}, \tag{7.40}$$
are two linearly independent solutions of (7.39). According to the general superposition
principle, every linear combination
$$u(x) = c_1 u_1(x) + c_2 u_2(x) = c_1 e^{3x} + c_2 e^{-x}$$
of these two basic solutions is also a solution, for any choice of constants $c_1, c_2$. In fact,
this two-parameter family constitutes the most general solution to the ordinary differential
equation (7.39). Thus, the kernel of the second order differential operator (7.38) is two-dimensional, with basis given by the independent exponential solutions (7.40).

Footnote: The German word ansatz (plural ansätze) refers to the method of finding a solution to a
complicated equation by guessing the solution's form in advance. Typically, one is not clever
enough to guess the precise solution, and so the ansatz will have one or more free parameters
(in this case the constant exponent $\lambda$) that, with some luck, can be rigged up to fulfill the
requirements imposed by the equation. Thus, a reasonable English translation of "ansatz" is
"inspired guess".
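The superposition principle for the equation $u'' - 2 u' - 3 u = 0$ can be tested numerically. The sketch below computes the characteristic roots with the quadratic formula and then estimates the residual $u'' - 2u' - 3u$ of a linear combination $c_1 e^{3x} + c_2 e^{-x}$ using central finite differences. The step size `h`, the sample coefficients, the evaluation point, and all function names are assumptions chosen for illustration.

```python
import math

def characteristic_roots(p, q):
    """Real roots of lambda^2 + p*lambda + q = 0, assuming a positive discriminant."""
    disc = p * p - 4.0 * q
    if disc <= 0:
        raise ValueError("this sketch handles distinct real roots only")
    r = math.sqrt(disc)
    return ((-p + r) / 2.0, (-p - r) / 2.0)

def residual(u, x, h=1e-3):
    """Finite-difference estimate of u'' - 2 u' - 3 u at the point x."""
    d1 = (u(x + h) - u(x - h)) / (2.0 * h)
    d2 = (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)
    return d2 - 2.0 * d1 - 3.0 * u(x)

lam1, lam2 = characteristic_roots(-2.0, -3.0)

def combination(c1, c2):
    """The linear combination c1 * exp(lam1 x) + c2 * exp(lam2 x)."""
    return lambda x: c1 * math.exp(lam1 * x) + c2 * math.exp(lam2 * x)

u = combination(2.0, -5.0)                 # arbitrary sample coefficients
res_solution = residual(u, 0.3)            # should be close to zero
res_other = residual(lambda x: x * x, 0.3) # x^2 is not a solution

print(lam1, lam2)
print(res_solution, res_other)
```

The residual of the linear combination is zero up to discretization error, while a function outside the kernel, such as $x^2$, leaves a residual of order one.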
In general, the solution space to an $n$th order homogeneous linear ordinary differential
equation
$$L[ u ] = a_n(x) \frac{d^n u}{dx^n} + a_{n-1}(x) \frac{d^{n-1} u}{dx^{n-1}} + \cdots + a_1(x) \frac{du}{dx} + a_0(x) u = 0 \tag{7.41}$$
forms a subspace of the vector space $C^n[ a, b ]$ of $n$ times continuously differentiable functions, since it is just the kernel of a linear differential operator $L\colon C^n[ a, b ] \to C^0[ a, b ]$. This
implies that linear combinations of solutions are also solutions. To determine the number
of solutions, or, more precisely, the dimension of the solution space, we need to impose
some mild restrictions on the differential operator.
Definition 7.32. The differential operator $L$ is called nonsingular on an open interval $[ a, b ]$ if all its coefficients $a_n(x), \ldots, a_0(x) \in C^0[ a, b ]$ are continuous functions and its
leading coefficient does not vanish: $a_n(x) \neq 0$ for all $a < x < b$.
The basic existence and uniqueness result governing nonsingular homogeneous linear
ordinary differential equations can be formulated as a characterization of the dimension of
the solution space.
Theorem 7.33. The kernel of a nonsingular $n$th order ordinary differential operator
forms an $n$-dimensional subspace $\ker L \subset C^n[ a, b ]$.
A proof of this result can be found in Section 20.1. The fact that the kernel has
dimension $n$ means that it has a basis consisting of $n$ linearly independent solutions
$u_1(x), \ldots, u_n(x) \in C^n[ a, b ]$ such that the general solution to the homogeneous differential equation (7.41) is given by a linear combination
$$u(x) = c_1 u_1(x) + \cdots + c_n u_n(x),$$

where c1 , . . . , cn are arbitrary constants. Therefore, once we find n linearly independent

solutions of an nth order homogeneous linear ordinary differential equation, we can immediately write down its most general solution.
The condition that the leading coefficient $a_n(x)$ does not vanish is essential. Points
where $a_n(x) = 0$ are known as singular points. They arise in many applications, but must
be treated separately and with care; see Appendix C. (Of course, if the coefficients are
constant then there is nothing to worry about: either the leading coefficient is nonzero,
$a_n \neq 0$, or the operator is, in fact, of lower order than advertised.)
Example 7.34. A second order Euler differential equation takes the form
$$E[ u ] = a x^2 u'' + b x u' + c u = 0, \tag{7.42}$$
where $0 \neq a, b, c$ are constants, and $E = a x^2 D^2 + b x D + c$ is a second order, non-constant
coefficient differential operator. Instead of the exponential solution ansatz used in the
constant coefficient case, Euler equations are solved by using a power ansatz
$$u(x) = x^r$$
with unknown exponent $r$. Substituting into the differential equation, we find
$$E[ x^r ] = a r (r - 1) x^r + b r x^r + c x^r = [\, a r (r - 1) + b r + c \,] x^r = 0,$$
and hence $x^r$ is a solution if and only if $r$ satisfies the characteristic equation
$$a r (r - 1) + b r + c = a r^2 + (b - a) r + c = 0. \tag{7.43}$$
If the characteristic equation has two distinct real roots, $r_1 \neq r_2$, then there are two linearly
independent solutions $u_1(x) = x^{r_1}$ and $u_2(x) = x^{r_2}$, and the general (real) solution to (7.42)
has the form
$$u(x) = c_1 | x |^{r_1} + c_2 | x |^{r_2}. \tag{7.44}$$
(The absolute values are usually needed to ensure that the solutions remain real when $x < 0$.)
The other cases (repeated roots and complex roots) will be discussed
below.
The Euler equation has a singular point at $x = 0$, where its leading coefficient vanishes.
Theorem 7.33 assures us that the differential equation has a two-dimensional solution space
on any interval not containing the singular point. However, predicting the number of
solutions which remain continuously differentiable at $x = 0$ is not as immediate, since it
depends on the values of the exponents $r_1$ and $r_2$. For instance, the case
$$x^2 u'' - 3 x u' + 3 u = 0 \qquad\text{has solution}\qquad u = c_1 x + c_2 x^3,$$
which forms a two-dimensional subspace of $C^0(\mathbb{R})$. However,
$$x^2 u'' + x u' - u = 0 \qquad\text{has solution}\qquad u = c_1 x + \frac{c_2}{x},$$
and only the multiples of the first solution $x$ are continuous at $x = 0$. Therefore, the
solutions that are continuous everywhere form only a one-dimensional subspace of $C^0(\mathbb{R})$.
Finally,
$$x^2 u'' + 5 x u' + 3 u = 0 \qquad\text{has solution}\qquad u = \frac{c_1}{x} + \frac{c_2}{x^3},$$
and there are no nontrivial solutions $u \not\equiv 0$ that are continuous at $x = 0$.
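The three Euler equations above can be reproduced by solving the characteristic equation $a r^2 + (b - a) r + c = 0$ directly. The sketch below is restricted to the case of two distinct real roots; the function names are hypothetical, and the criterion used for continuity at $x = 0$ (a basis solution $| x |^r$ is continuous there exactly when $r \geq 0$) is stated as an assumption of the sketch.

```python
import math

def euler_exponents(a, b, c):
    """Roots r1 <= r2 of a r^2 + (b - a) r + c = 0, assuming they are real and distinct."""
    B = b - a
    disc = B * B - 4.0 * a * c
    if disc <= 0:
        raise ValueError("this sketch handles distinct real roots only")
    root = math.sqrt(disc)
    r1, r2 = (-B - root) / (2.0 * a), (-B + root) / (2.0 * a)
    return (min(r1, r2), max(r1, r2))

def continuous_basis_count(a, b, c):
    """Number of basis solutions |x|^r that stay continuous at x = 0 (those with r >= 0)."""
    return sum(1 for r in euler_exponents(a, b, c) if r >= 0)

examples = {
    "x^2 u'' - 3 x u' + 3 u = 0": (1.0, -3.0, 3.0),
    "x^2 u'' + x u' - u = 0": (1.0, 1.0, -1.0),
    "x^2 u'' + 5 x u' + 3 u = 0": (1.0, 5.0, 3.0),
}

for label, (a, b, c) in examples.items():
    print(label, euler_exponents(a, b, c), continuous_basis_count(a, b, c))
```

The counts 2, 1, and 0 match the dimensions of the spaces of everywhere-continuous solutions described in the text.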
Example 7.35. Considerr the Laplace equation
[ u ] =

2u 2u
+ 2 =0
x2
y

(7.45)

for a function u(x, y) defined on a domain R 2 . The Laplace equation is the most
important partial differential equation, and its applications range over almost all fields
of mathemtics, physics and engineering, including complex analysis, geometry, fluid mechanics, electromagnetism, elasticity, thermodynamics, and quantum mechanics. It is a
homogeneous linear partial differential equation corresponding to the partial differential
operator = x2 + y2 known as the Laplacian operator. Linearity can either be proved
directly, or by noting that is built up from the basic linear partial derivative operators
x , y by the processes of composition and addition, as in Exercise .
Unlike the case of a linear ordinary differential equation, there are an infinite number
of linearly independent solutions to the Laplace equation. Examples include the trigonometric/exponential solutions
e x cos y,

e x sin y,

e y cos x,

e y sin y,

where is any real constant. There are also infinitely many independent polynomial
solutions, the first few of which are
1,

x2 y 2 ,

x y,

x3 3 x y 2 ,

...

The reader might enjoy finding some more polynomial solutions and trying to spot the
pattern. (The answer will appear shortly.) As usual, we can build up more complicated
solutions by taking general linear combinations of these particular ones. In fact, it will be
shown that the most general solution to the Laplace equation can be written as a convergent
infinite series in the basic polynomial solutions. Later, in Chapters 15 and 16, we will learn
how to construct these and many other solutions to the planar Laplace equation.
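Before the general theory arrives, candidate solutions can be tested numerically. The sketch below approximates $u_{xx} + u_{yy}$ by the standard five-point central difference; the step size `h`, the sample value `OMEGA`, and the function names are assumptions made for illustration only.

```python
import math

def laplacian(u, x, y, h=1e-3):
    """Five-point central-difference approximation of u_xx + u_yy at (x, y)."""
    return (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
            - 4 * u(x, y)) / h**2

OMEGA = 1.5  # arbitrary sample value of the real constant

def exp_cos(x, y):
    return math.exp(OMEGA * x) * math.cos(OMEGA * y)

def cubic(x, y):
    return x**3 - 3 * x * y**2

def paraboloid(x, y):
    return x**2 + y**2   # not harmonic: its Laplacian is the constant 4

for name, u in [("exp_cos", exp_cos), ("cubic", cubic), ("paraboloid", paraboloid)]:
    print(name, laplacian(u, 0.3, 0.7))
```

The first two values vanish up to discretization and rounding error, while the third is close to 4, so the check does distinguish harmonic from non-harmonic functions.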
Inhomogeneous Systems
Now we turn our attention to an inhomogeneous linear system
$$L[u] = f. \tag{7.46}$$
Unless $f = 0$, the solution space to (7.46) is not a subspace. (Why?) The key question is existence: is there a solution to the system? In the homogeneous case, existence is not an issue, since $0$ is always a solution to $L[z] = 0$. The key question for homogeneous systems is uniqueness: whether $\ker L = \{0\}$, in which case $0$ is the only solution, or whether there are nontrivial solutions $0 \ne z \in \ker L$.
In the matrix case, the compatibility of an inhomogeneous system $A x = b$, which was required for the existence of a solution, led to the general definition of the range of a matrix, which we copy verbatim for linear functions.
Definition 7.36. The range of a linear function $L\colon V \to W$ is the subspace
$$\operatorname{rng} L = \{\, L[v] \mid v \in V \,\} \subset W.$$
The proof that $\operatorname{rng} L$ is a subspace is straightforward. If $f = L[v]$ and $g = L[w]$ are any two elements of the range, so is any linear combination, since, by linearity,
$$c f + d g = c L[v] + d L[w] = L[c v + d w] \in \operatorname{rng} L.$$
For example, if $L[v] = A v$ is given by multiplication by an $m \times n$ matrix, then its range is the subspace $\operatorname{rng} L = \operatorname{rng} A \subset \mathbb{R}^m$ spanned by the columns of $A$, that is, the column space of the coefficient matrix. When $L$ is a linear differential operator, or more general linear operator, characterizing its range can be a much more challenging problem.
The fundamental theorem regarding solutions to inhomogeneous linear equations exactly mimics our earlier result, Theorem 2.37, in the particular case of matrix systems.
Theorem 7.37. Let $L\colon V \to W$ be a linear function. Let $f \in W$. Then the inhomogeneous linear system
$$L[u] = f \tag{7.47}$$
has a solution if and only if $f \in \operatorname{rng} L$. In this case, the general solution to the system has the form
$$u = u^\star + z \tag{7.48}$$
where $u^\star$ is a particular solution, so $L[u^\star] = f$, and $z$ is a general element of $\ker L$, i.e., the general solution to the corresponding homogeneous system
$$L[z] = 0. \tag{7.49}$$
Proof: We merely repeat the proof of Theorem 2.37. The existence condition $f \in \operatorname{rng} L$ is an immediate consequence of the definition of the range. Suppose $u^\star$ is a particular solution to (7.47). If $z$ is a solution to (7.49), then, by linearity,
$$L[u^\star + z] = L[u^\star] + L[z] = f + 0 = f,$$
and hence $u^\star + z$ is also a solution to (7.47). To show that every solution has this form, let $u$ be a second solution, so that $L[u] = f$. Then
$$L[u - u^\star] = L[u] - L[u^\star] = f - f = 0.$$
Therefore $u - u^\star = z \in \ker L$ is a solution to (7.49). Q.E.D.
Remark: In physical systems, the inhomogeneity $f$ typically corresponds to an external forcing function. The solution $z$ to the homogeneous system represents the system's natural, unforced motion. Therefore, the decomposition formula (7.48) states that a linear system responds to an external force as a combination of its own internal motion and a specific motion $u^\star$ induced by the forcing. Examples of this important principle appear throughout the book.
Corollary 7.38. The inhomogeneous linear system (7.47) has a unique solution if and only if $f \in \operatorname{rng} L$ and $\ker L = \{0\}$.
Therefore, to prove that a linear system has a unique solution, we first need to prove an existence result, that there is at least one solution, which requires the right hand side $f$ to lie in the range of the operator $L$, and then a uniqueness result, that the only solution to the homogeneous system $L[z] = 0$ is the trivial zero solution $z = 0$. Consequently, if an inhomogeneous system $L[u] = f$ has a unique solution, then any other inhomogeneous system $L[u] = g$ that is defined by the same linear function also has a unique solution for every $g \in \operatorname{rng} L$.
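In the matrix setting, the decomposition $u = u^\star + z$ is easy to see in action. The following small sketch uses a $2 \times 3$ matrix, right hand side, particular solution and kernel vector that were all chosen here purely for illustration.

```python
def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2, 1],
     [0, 1, 1]]          # sample coefficient matrix (an assumption)
f = [4, 1]               # right hand side, chosen to lie in rng A
u_star = [2, 1, 0]       # one particular solution: A u_star = f
z = [1, -1, 1]           # spans ker A: A z = 0

def general_solution(c):
    """u = u_star + c z, the general solution of A u = f."""
    return [p + c * k for p, k in zip(u_star, z)]

for c in (-2, 0, 3):
    print(c, general_solution(c), matvec(A, general_solution(c)))
```

Since the kernel here is one-dimensional, the solution is not unique, in agreement with Corollary 7.38.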
Example 7.39. Consider the inhomogeneous linear second order differential equation
$$u'' + u' - 2 u = x.$$
Note that this can be written in the linear system form
$$L[u] = x, \qquad \text{where} \qquad L = D^2 + D - 2$$
is a linear second order differential operator. The kernel of the differential operator $L$ is found by solving the associated homogeneous linear equation
$$L[z] = z'' + z' - 2 z = 0.$$
Applying the usual solution method, we find that the homogeneous differential equation has a two-dimensional solution space, with basis functions
$$z_1(x) = e^{-2x}, \qquad z_2(x) = e^{x}.$$
Therefore, the general element of $\ker L$ is a linear combination
$$z(x) = c_1 z_1(x) + c_2 z_2(x) = c_1 e^{-2x} + c_2 e^{x}.$$
To find a particular solution to the inhomogeneous differential equation, we rely on the method of undetermined coefficients. We introduce the solution ansatz $u = a x + b$, and compute
$$L[u] = L[a x + b] = -2 a x - 2 b + a = x.$$
(Footnote: One could also employ the method of variation of parameters, although in general the undetermined coefficient method, when applicable, is the more straightforward of the two. Details of the two methods can be found, for instance, in [24].)
Equating the two expressions, we conclude that $a = -\tfrac12$, $b = -\tfrac14$, and hence
$$u^\star(x) = -\tfrac12 x - \tfrac14$$
is a particular solution to the inhomogeneous differential equation. Theorem 7.37 then says that the general solution is
$$u(x) = u^\star(x) + z(x) = -\tfrac12 x - \tfrac14 + c_1 e^{-2x} + c_2 e^{x}.$$
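As a sanity check on the signs, one can substitute the general solution back into the equation. The sketch below hard-codes $u(x) = -\tfrac12 x - \tfrac14 + c_1 e^{-2x} + c_2 e^{x}$ together with its first two derivatives (computed by hand) and evaluates the residual $u'' + u' - 2u - x$; the function names are illustrative.

```python
import math

def u(x, c1, c2):
    return -x / 2 - 1 / 4 + c1 * math.exp(-2 * x) + c2 * math.exp(x)

def du(x, c1, c2):
    return -1 / 2 - 2 * c1 * math.exp(-2 * x) + c2 * math.exp(x)

def d2u(x, c1, c2):
    return 4 * c1 * math.exp(-2 * x) + c2 * math.exp(x)

def residual(x, c1, c2):
    """L[u] - x with L = D^2 + D - 2; zero for every solution."""
    return d2u(x, c1, c2) + du(x, c1, c2) - 2 * u(x, c1, c2) - x

for c1, c2 in [(0, 0), (1, -2), (0.5, 3)]:
    print(c1, c2, [residual(x, c1, c2) for x in (-1.0, 0.0, 0.7)])
```

The residual vanishes (up to rounding) for every choice of $c_1, c_2$, which is exactly the statement that the homogeneous part lies in $\ker L$.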

Example 7.40. By inspection, we see that
$$u(x, y) = -\tfrac12 \sin(x + y)$$
is a solution to the particular Poisson equation
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = \sin(x + y). \tag{7.50}$$
Theorem 7.37 implies that every solution to this inhomogeneous version of the Laplace equation takes the form
$$u(x, y) = -\tfrac12 \sin(x + y) + z(x, y),$$
where $z(x, y)$ is an arbitrary solution to the homogeneous Laplace equation (7.45).
Example 7.41. The problem is to solve the linear boundary value problem
$$u'' + u = x, \qquad u(0) = 0, \qquad u(\pi) = 0. \tag{7.51}$$
The first step is to solve the differential equation. To this end, we find that $\cos x$ and $\sin x$ form a basis for the solution space to the corresponding homogeneous differential equation $z'' + z = 0$. The method of undetermined coefficients then produces the particular solution $u^\star(x) = x$ to the inhomogeneous differential equation, and so the general solution is
$$u(x) = x + c_1 \cos x + c_2 \sin x. \tag{7.52}$$
The next step is to see whether any solutions also satisfy the boundary conditions. Plugging formula (7.52) into the boundary conditions gives
$$u(0) = c_1 = 0, \qquad u(\pi) = \pi - c_1 = 0.$$
However, these two conditions are incompatible, and so there is no solution to the linear system (7.51). The function $f(x) = x$ does not lie in the range of the differential operator $L[u] = u'' + u$ when $u$ is subjected to the boundary conditions.
On the other hand, if we change the inhomogeneity, the boundary value problem
$$u'' + u = x - \tfrac12 \pi, \qquad u(0) = 0, \qquad u(\pi) = 0, \tag{7.53}$$
does admit a solution, but the solution fails to be unique. Applying the preceding solution method, we find that the function
$$u(x) = x - \tfrac12 \pi + \tfrac12 \pi \cos x + c \sin x$$
solves the system for any choice of constant $c$. Note that $z(x) = \sin x$ forms a basis for the kernel or solution space of the homogeneous boundary value problem
$$z'' + z = 0, \qquad z(0) = 0, \qquad z(\pi) = 0.$$
Incidentally, if we slightly modify the interval of definition, considering
$$u'' + u = f(x), \qquad u(0) = 0, \qquad u\left(\tfrac12 \pi\right) = 0, \tag{7.54}$$
then the system is compatible for any inhomogeneity $f(x)$, and the solution to the boundary value problem is unique. For example, if $f(x) = x$, then the unique solution is
$$u(x) = x - \tfrac12 \pi \sin x. \tag{7.55}$$

This example highlights some major differences between boundary value problems and initial value problems for ordinary differential equations. For nonsingular initial value problems, there is a unique solution for any set of initial conditions. For boundary value problems, the structure of the solution space (either a unique solution for all inhomogeneities, or no solution, or infinitely many solutions, depending on the right hand side) has more of the flavor of a linear matrix system. An interesting question is how to characterize the inhomogeneities $f(x)$ that admit a solution, i.e., lie in the range of the operator. We will return to this question in Chapter 11.
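The trichotomy in Example 7.41 comes down to a tiny linear system for $c_1, c_2$. The following sketch, with hypothetical names, takes a particular solution $u_p$ of $u'' + u = f$ and the right endpoint $b$, writes $u = u_p + c_1 \cos x + c_2 \sin x$, and classifies the boundary value problem $u(0) = u(b) = 0$. The tolerance used to decide whether $\sin b$ vanishes is an assumption of the sketch.

```python
import math

def classify_bvp(u_p, b, tol=1e-12):
    """Classify u'' + u = f, u(0) = u(b) = 0, given a particular solution u_p."""
    c1 = -u_p(0.0)                       # forced by u(0) = 0
    rhs = -u_p(b) - c1 * math.cos(b)     # remaining condition: c2 * sin(b) = rhs
    if abs(math.sin(b)) > tol:
        return "unique", (c1, rhs / math.sin(b))
    if abs(rhs) > tol:
        return "none", None
    return "infinitely many", (c1, None)  # c2 is a free constant

print(classify_bvp(lambda x: x, math.pi))                # f(x) = x on [0, pi]
print(classify_bvp(lambda x: x - math.pi / 2, math.pi))  # f(x) = x - pi/2 on [0, pi]
print(classify_bvp(lambda x: x, math.pi / 2))            # f(x) = x on [0, pi/2]
```

The three calls reproduce the three outcomes of the example: no solution, a one-parameter family, and the unique solution with $c_2 = -\tfrac12\pi$.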
Superposition Principles for Inhomogeneous Systems
The superposition principle for inhomogeneous linear systems allows us to combine
different inhomogeneities provided we do not change the underlying linear operator. The
result is a straightforward generalization of the matrix version described in Theorem 2.42.
Theorem 7.42. Let $L\colon V \to W$ be a prescribed linear function. Suppose that, for each $i = 1, \ldots, k$, we know a particular solution $u_i^\star$ to the inhomogeneous linear system $L[u] = f_i$ for some $f_i \in \operatorname{rng} L$. Given scalars $c_1, \ldots, c_k$, a particular solution to the combined inhomogeneous system
$$L[u] = c_1 f_1 + \cdots + c_k f_k \tag{7.56}$$
is the same linear combination $u^\star = c_1 u_1^\star + \cdots + c_k u_k^\star$ of particular solutions. The general solution to the inhomogeneous system (7.56) is
$$u = u^\star + z = c_1 u_1^\star + \cdots + c_k u_k^\star + z,$$
where $z \in \ker L$ is the general solution to the associated homogeneous system $L[z] = 0$.
The proof is an easy consequence of linearity, and left to the reader. In physical terms, the superposition principle can be interpreted as follows. If we know the response of a linear physical system to several different external forces, represented by $f_1, \ldots, f_k$, then the response of the system to a linear combination of these forces is just the identical linear combination of the individual responses. The homogeneous solution $z$ represents an internal motion that the system acquires independent of any external forcing. Superposition requires linearity of the system, and so is always applicable in quantum mechanics, which is a linear theory. But, in classical and relativistic mechanics superposition only applies in a linear approximation corresponding to small motions/displacements/etc. The nonlinear regime is much more unpredictable, and combinations of external forces may lead to unexpected results.
Example 7.43. We already know that a particular solution to the linear differential equation
$$u'' + u = x$$
is
$$u_1^\star = x.$$
The method of undetermined coefficients is used to solve the inhomogeneous equation
$$u'' + u = \cos x.$$
Since $\cos x$ and $\sin x$ are already solutions to the homogeneous equation, we must use the solution ansatz $u = a x \cos x + b x \sin x$, which, when substituted into the differential equation, produces the particular solution
$$u_2^\star = \tfrac12 x \sin x.$$
Therefore, by the superposition principle, the combination inhomogeneous system
$$u'' + u = 3 x - 2 \cos x$$
has a particular solution
$$u^\star = 3 u_1^\star - 2 u_2^\star = 3 x - x \sin x.$$
The general solution is obtained by appending the general solution to the homogeneous equation: $u = 3 x - x \sin x + c_1 \cos x + c_2 \sin x$.
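Superposition can be confirmed numerically for this example. In the sketch below, the operator $L[u] = u'' + u$ is applied with a central-difference second derivative (the step size `h` is an assumption), and the combination $3 u_1^\star - 2 u_2^\star$ with $u_1^\star = x$, $u_2^\star = \tfrac12 x \sin x$ is compared against the forcing $3x - 2\cos x$.

```python
import math

def apply_L(u, x, h=1e-4):
    """Approximate L[u] = u'' + u at x, using a central difference for u''."""
    return (u(x + h) - 2 * u(x) + u(x - h)) / h**2 + u(x)

def u1(x):
    return x                        # particular solution for forcing x

def u2(x):
    return 0.5 * x * math.sin(x)    # particular solution for forcing cos x

def combo(x):
    return 3 * u1(x) - 2 * u2(x)    # equals 3x - x sin x

for x in (0.0, 0.8, 2.5):
    print(x, apply_L(combo, x), 3 * x - 2 * math.cos(x))
```

The two printed columns agree to within the finite-difference error, which supports the minus sign in $3x - x \sin x$.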
Example 7.44. Consider the boundary value problem
$$u'' + u = x, \qquad u(0) = 2, \qquad u\left(\tfrac12 \pi\right) = -1, \tag{7.57}$$
which is a modification of (7.54) with inhomogeneous boundary conditions. The superposition principle applies here, and allows us to decouple the inhomogeneity due to the forcing from the inhomogeneity due to the boundary conditions. We already solved the boundary value problem with homogeneous boundary conditions; see (7.55). On the other hand, the unforced boundary value problem
$$u'' + u = 0, \qquad u(0) = 2, \qquad u\left(\tfrac12 \pi\right) = -1, \tag{7.58}$$
has unique solution
$$u(x) = 2 \cos x - \sin x. \tag{7.59}$$
Therefore, the solution to the combined problem (7.57) is the sum of these two:
$$u(x) = x + 2 \cos x - \left(1 + \tfrac12 \pi\right) \sin x.$$
The solution is unique because the corresponding homogeneous boundary value problem
$$z'' + z = 0, \qquad z(0) = 0, \qquad z\left(\tfrac12 \pi\right) = 0,$$
has only the trivial solution $z(x) \equiv 0$. Incidentally, the solution (7.59) can itself be decomposed as a linear combination of the solutions $\cos x$ and $\sin x$ to a pair of yet more elementary boundary value problems with just one inhomogeneous boundary condition; namely, $u(0) = 1$, $u\left(\tfrac12 \pi\right) = 0$, and, respectively, $u(0) = 0$, $u\left(\tfrac12 \pi\right) = 1$.
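The decoupling of forcing and boundary data can be checked directly. The sketch below reads the boundary data of (7.57) as $u(0) = 2$, $u(\tfrac12\pi) = -1$ (the sign of the second value is an assumption recovered from the closed-form solution) and adds the forced solution (7.55) to the unforced solution (7.59).

```python
import math

def forced(x):
    """Solves u'' + u = x with u(0) = 0, u(pi/2) = 0; see (7.55)."""
    return x - 0.5 * math.pi * math.sin(x)

def unforced(x):
    """Solves u'' + u = 0 with u(0) = 2, u(pi/2) = -1; see (7.59)."""
    return 2 * math.cos(x) - math.sin(x)

def combined(x):
    """Superposition of the two: solves (7.57)."""
    return forced(x) + unforced(x)

print(combined(0.0), combined(math.pi / 2))
```

The boundary values of the sum are the sums of the boundary values, so the combined function meets both conditions of (7.57).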
Complex Solutions to Real Systems

The easiest way to obtain solutions to a linear, homogeneous, constant coefficient
ordinary differential equation is through an exponential ansatz, which effectively reduces it
to the algebraic characteristic equation. Complex roots of the characteristic equation yield
complex exponential solutions. But, if the equation is real, then the real and imaginary
parts of the complex solutions are automatically real solutions. This solution technique is
a particular case of a general principle for producing real solutions to real linear systems
from, typically, simpler complex solutions. To work, the method requires some additional
structure on the vector spaces involved.
Definition 7.45. A complex vector space $V$ is called conjugated if it admits an operation of complex conjugation taking $u \in V$ to $\bar u$ that is compatible with scalar multiplication. In other words, if $u \in V$ and $\lambda \in \mathbb{C}$, then we require $\overline{\lambda u} = \bar\lambda\, \bar u$.
The simplest example of a conjugated vector space is $\mathbb{C}^n$. The complex conjugate of a vector is obtained by conjugating all its entries. Thus we have
$$u = v + i w, \qquad \bar u = v - i w, \qquad \text{where} \qquad v = \operatorname{Re} u = \frac{u + \bar u}{2}, \qquad w = \operatorname{Im} u = \frac{u - \bar u}{2 i}, \tag{7.60}$$
are the real and imaginary parts of $u \in \mathbb{C}^n$. For example, if
$$u = \begin{pmatrix} 1 - 2 i \\ 3 i \\ 5 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 5 \end{pmatrix} + i \begin{pmatrix} -2 \\ 3 \\ 0 \end{pmatrix}, \qquad \text{then} \qquad \bar u = \begin{pmatrix} 1 + 2 i \\ -3 i \\ 5 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 5 \end{pmatrix} - i \begin{pmatrix} -2 \\ 3 \\ 0 \end{pmatrix}.$$
The same definition of real and imaginary part carries over to general conjugated vector spaces. A subspace $V \subset \mathbb{C}^n$ is conjugated if and only if $\bar u \in V$ whenever $u \in V$. Another prototypical example of a conjugated vector space is the space of complex-valued functions $f(x) = r(x) + i s(x)$ defined on the interval $a \le x \le b$. The complex conjugate function is $\overline{f(x)} = r(x) - i s(x)$. Thus, the complex conjugate of
$$e^{(1 + 3 i) x} = e^{x} \cos 3 x + i e^{x} \sin 3 x \qquad \text{is} \qquad \overline{e^{(1 + 3 i) x}} = e^{(1 - 3 i) x} = e^{x} \cos 3 x - i e^{x} \sin 3 x.$$
An element $v \in V$ of a conjugated vector space is called real if $\bar v = v$. One easily checks that the real and imaginary parts of a general element, as defined by (7.60), are both real elements.
Definition 7.46. A linear operator $L\colon V \to W$ between conjugated vector spaces is called real if it commutes with complex conjugation:
$$\overline{L[u]} = L[\bar u]. \tag{7.61}$$
For example, the linear function $F\colon \mathbb{C}^n \to \mathbb{C}^m$ given by matrix multiplication, $F(u) = A u$, is real if and only if $A$ is a real matrix. Similarly, a differential operator (7.13) is real if its coefficients are real-valued functions.
Theorem 7.47. If $L[u] = 0$ is a real homogeneous linear system and $u = v + i w$ is a complex solution, then its complex conjugate $\bar u = v - i w$ is also a solution. Moreover, both the real and imaginary parts, $v$ and $w$, of a complex solution are real solutions.
Proof: First note that, by reality,
$$L[\bar u] = \overline{L[u]} = 0 \qquad \text{whenever} \qquad L[u] = 0,$$
and hence the complex conjugate $\bar u$ of any solution is also a solution. Therefore, by linear superposition, $v = \operatorname{Re} u = \tfrac12 (u + \bar u)$ and $w = \operatorname{Im} u = \tfrac{1}{2 i} (u - \bar u)$ are also solutions. Q.E.D.
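Example 7.48. The real linear matrix system
$$\begin{pmatrix} 2 & -1 & 3 & 0 \\ -2 & 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
has a complex solution
$$u = \begin{pmatrix} -1 - 3 i \\ 1 \\ 1 + 2 i \\ -2 - 4 i \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \\ 1 \\ -2 \end{pmatrix} + i \begin{pmatrix} -3 \\ 0 \\ 2 \\ -4 \end{pmatrix}.$$
Since the coefficient matrix is real, the real and imaginary parts,
$$v = (-1, 1, 1, -2)^T, \qquad w = (-3, 0, 2, -4)^T,$$
are both solutions of the system.
On the other hand, the complex linear system
$$\begin{pmatrix} 2 & -2 i & i & 0 \\ 1 + i & 0 & -2 - i & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
has the complex solution
$$u = \begin{pmatrix} 1 - i \\ -i \\ 2 \\ 2 + 2 i \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 2 \\ 2 \end{pmatrix} + i \begin{pmatrix} -1 \\ -1 \\ 0 \\ 2 \end{pmatrix}.$$
However, neither the real nor the imaginary part is a solution to the system.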
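Both halves of this example can be verified with Python's built-in complex numbers. The entries below are a reconstruction: the signs of the matrix and vector entries are assumptions, chosen so that each stated complex vector really does solve its system.

```python
def matvec(A, u):
    """Matrix-vector product; entries may be real or complex."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]

def real_part(u):
    return [complex(z).real for z in u]

def imag_part(u):
    return [complex(z).imag for z in u]

# Real system (signs reconstructed, an assumption of this sketch).
A_real = [[2, -1, 3, 0], [-2, 1, 1, 2]]
u_a = [-1 - 3j, 1, 1 + 2j, -2 - 4j]

# Complex system: the coefficient matrix has genuinely complex entries.
A_cplx = [[2, -2j, 1j, 0], [1 + 1j, 0, -2 - 1j, 1]]
u_b = [1 - 1j, -1j, 2, 2 + 2j]

print(matvec(A_real, u_a), matvec(A_real, real_part(u_a)), matvec(A_real, imag_part(u_a)))
print(matvec(A_cplx, u_b), matvec(A_cplx, real_part(u_b)), matvec(A_cplx, imag_part(u_b)))
```

For the real matrix all three products vanish, as Theorem 7.47 predicts; for the complex matrix only the full complex vector is annihilated.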
Example 7.49. Consider the real ordinary differential equation
$$u'' + 2 u' + 5 u = 0.$$
To solve it, as in Example 7.31, we use the exponential ansatz $u = e^{\lambda x}$, leading to the characteristic equation
$$\lambda^2 + 2 \lambda + 5 = 0.$$
There are two roots,
$$\lambda_1 = -1 + 2 i, \qquad \lambda_2 = -1 - 2 i,$$
leading, via Euler's formula (3.76), to the complex solutions
$$u_1(x) = e^{(-1 + 2 i) x} = e^{-x} \cos 2 x + i e^{-x} \sin 2 x,$$
$$u_2(x) = e^{(-1 - 2 i) x} = e^{-x} \cos 2 x - i e^{-x} \sin 2 x.$$
The complex conjugate of the first solution is the second, in accordance with Theorem 7.47. Moreover, the real and imaginary parts of the two solutions
$$v(x) = e^{-x} \cos 2 x, \qquad w(x) = e^{-x} \sin 2 x,$$
are individual real solutions. The general solution is a linear combination
$$u(x) = c_1 e^{-x} \cos 2 x + c_2 e^{-x} \sin 2 x,$$
of the two linearly independent real solutions.
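The passage from complex roots to real solutions is easy to automate for a monic second order equation $u'' + b u' + c u = 0$. The sketch below uses the standard library `cmath` module; the helper names and the restriction to a complex conjugate pair of roots are assumptions of the sketch.

```python
import cmath
import math

def char_roots(b, c):
    """Roots of the characteristic equation lambda^2 + b*lambda + c = 0."""
    s = cmath.sqrt(b * b - 4 * c)
    return (-b + s) / 2, (-b - s) / 2

def real_solutions(b, c):
    """Real basis e^{mu x} cos(nu x), e^{mu x} sin(nu x) for the root mu + i nu."""
    lam, _ = char_roots(b, c)
    mu, nu = lam.real, lam.imag

    def v(x):
        return math.exp(mu * x) * math.cos(nu * x)

    def w(x):
        return math.exp(mu * x) * math.sin(nu * x)

    return v, w

print(char_roots(2, 5))
v, w = real_solutions(2, 5)
print(v(0.5), w(0.5))
```

For $b = 2$, $c = 5$ this recovers the roots $-1 \pm 2i$ and the real solutions $e^{-x}\cos 2x$, $e^{-x}\sin 2x$.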
Example 7.50. Consider the second order Euler differential equation
$$L[u] = x^2 u'' + 7 x u' + 13 u = 0.$$
The roots of the associated characteristic equation
$$r (r - 1) + 7 r + 13 = r^2 + 6 r + 13 = 0$$
are complex: $r = -3 \pm 2 i$, and the resulting solutions $x^r = x^{-3 \pm 2 i}$ are complex conjugate powers. Using Euler's formula (3.76), we write them in real and imaginary form, e.g.,
$$x^{-3 + 2 i} = x^{-3} e^{2 i \log x} = x^{-3} \cos(2 \log x) + i\, x^{-3} \sin(2 \log x).$$
Again, by Theorem 7.47, the real and imaginary parts of the complex solution are by themselves real solutions to the equation. Therefore, the general real solution is
$$u(x) = c_1 x^{-3} \cos(2 \log x) + c_2 x^{-3} \sin(2 \log x).$$
Example 7.51. The complex monomial
$$u(x, y) = (x + i y)^n$$
is a solution to the Laplace equation (7.45) because, by the chain rule,
$$\frac{\partial^2 u}{\partial x^2} = n (n - 1) (x + i y)^{n - 2}, \qquad \frac{\partial^2 u}{\partial y^2} = n (n - 1)\, i^2 (x + i y)^{n - 2} = - n (n - 1) (x + i y)^{n - 2},$$
and hence $u_{xx} + u_{yy} = 0$. Since the Laplace operator is real, Theorem 7.47 implies that the real and imaginary parts of this complex solution are real solutions. The resulting real solutions are known as harmonic polynomials.
To find the explicit formulae for the harmonic polynomials, we use the Binomial Formula and the fact that $i^2 = -1$, $i^3 = -i$, $i^4 = 1$, etc., to expand
$$(x + i y)^n = x^n + n x^{n-1} (i y) + \binom{n}{2} x^{n-2} (i y)^2 + \binom{n}{3} x^{n-3} (i y)^3 + \cdots = x^n + i\, n x^{n-1} y - \binom{n}{2} x^{n-2} y^2 - i \binom{n}{3} x^{n-3} y^3 + \cdots,$$
in which we use the standard notation
$$\binom{n}{k} = \frac{n!}{k!\, (n - k)!} \tag{7.62}$$
for the binomial coefficients. Separating the real and imaginary terms, we find
$$\operatorname{Re} (x + i y)^n = x^n - \binom{n}{2} x^{n-2} y^2 + \binom{n}{4} x^{n-4} y^4 + \cdots,$$
$$\operatorname{Im} (x + i y)^n = n x^{n-1} y - \binom{n}{3} x^{n-3} y^3 + \binom{n}{5} x^{n-5} y^5 + \cdots. \tag{7.63}$$

The first few of these harmonic polynomials were described in Example 7.35. In fact, it can
be proved that every polynomial solution to the Laplace equation is a linear combination
of the fundamental real harmonic polynomials; see Chapter 16 for full details.
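The binomial formulae translate directly into a short program that lists the harmonic polynomials and confirms that each one is annihilated by the Laplacian. In this sketch a polynomial is stored as a dictionary mapping the exponent pair $(i, j)$ of $x^i y^j$ to its coefficient; this representation and the function names are choices made here for illustration.

```python
from math import comb

def harmonic_polynomials(n):
    """Return (re, im): coefficient dicts of Re and Im of (x + i y)^n."""
    re, im = {}, {}
    for k in range(n + 1):
        sign = -1 if (k // 2) % 2 else 1       # i^k is +-1 (k even) or +-i (k odd)
        target = re if k % 2 == 0 else im
        target[(n - k, k)] = sign * comb(n, k)  # term binom(n,k) x^(n-k) y^k
    return re, im

def laplacian(poly):
    """Apply d^2/dx^2 + d^2/dy^2 to a polynomial {(i, j): coeff}."""
    out = {}
    for (i, j), c in poly.items():
        if i >= 2:
            out[(i - 2, j)] = out.get((i - 2, j), 0) + c * i * (i - 1)
        if j >= 2:
            out[(i, j - 2)] = out.get((i, j - 2), 0) + c * j * (j - 1)
    return {m: c for m, c in out.items() if c != 0}

for n in range(1, 5):
    re, im = harmonic_polynomials(n)
    print(n, re, im, laplacian(re), laplacian(im))
```

An empty dictionary for the Laplacian means the polynomial is harmonic; for $n = 3$ the real part is $x^3 - 3xy^2$, in agreement with Example 7.35.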

7.5. Adjoints.
In Sections 2.5 and 5.6, we discovered the importance of the adjoint system $A^T y = f$ in the analysis of systems of linear equations $A x = b$. Two of the four fundamental matrix subspaces are based on the transposed matrix. While the $m \times n$ matrix $A$ defines a linear function from $\mathbb{R}^n$ to $\mathbb{R}^m$, its transpose, $A^T$, has size $n \times m$ and hence characterizes a linear function in the reverse direction, from $\mathbb{R}^m$ to $\mathbb{R}^n$.
As with most fundamental concepts for linear matrix systems, the adjoint system and
transpose operation on the coefficient matrix are the prototypes of a much more general
construction that is valid for general linear functions. However, it is not as obvious how
to transpose a more general linear operator L[ u ], e.g., a differential operator acting on
function space. In this section, we shall introduce the concept of the adjoint of a linear
function that generalizes the transpose operation on matrices. Unfortunately, most of the
interesting examples must be deferred until we develop additional analytical tools, starting
in Chapter 11.
The adjoint (and transpose) relies on an inner product structure on both the domain
and target spaces. For simplicity, we restrict our attention to real inner product spaces,
leaving the complex version to the interested reader. Thus, we begin with a linear function
$L\colon V \to W$ that maps an inner product space $V$ to a second inner product space $W$. We distinguish the inner products on $V$ and $W$ (which may be different even when $V$ and $W$ are the same vector space) by using a single angle bracket
$$\langle v ; \tilde v \rangle \qquad \text{to denote the inner product between} \qquad v, \tilde v \in V,$$
and a double angle bracket
$$\langle\!\langle w ; \tilde w \rangle\!\rangle \qquad \text{to denote the inner product between} \qquad w, \tilde w \in W.$$
With the prescription of inner products on both the domain and target spaces, the abstract definition of the adjoint of a linear function can be formulated.
Definition 7.52. Let $V, W$ be inner product spaces, and let $L\colon V \to W$ be a linear function. The adjoint of $L$ is the function $L^*\colon W \to V$ that satisfies
$$\langle\!\langle L[v] ; w \rangle\!\rangle = \langle v ; L^*[w] \rangle \qquad \text{for all} \qquad v \in V, \quad w \in W. \tag{7.64}$$
Note that the adjoint function goes in the opposite direction to $L$, just like the transposed matrix. Also, the left hand side of equation (7.64) indicates the inner product on $W$, while the right hand side is the inner product on $V$, which is where the respective vectors live. In infinite-dimensional situations, the adjoint may not exist. But if it does, then it is uniquely determined by (7.64); see Exercise .
Remark: Technically, (7.64) only defines the formal adjoint of $L$. For the infinite-dimensional function spaces arising in analysis, a true adjoint must satisfy certain additional requirements, [122]. However, we will suppress all such advanced analytical complications in our introductory treatment of the subject.
Lemma 7.53. The adjoint of a linear function is a linear function.
Proof: Given $v \in V$, $w, z \in W$, and scalars $c, d \in \mathbb{R}$, we find
$$\langle v ; L^*[c w + d z] \rangle = \langle\!\langle L[v] ; c w + d z \rangle\!\rangle = c\, \langle\!\langle L[v] ; w \rangle\!\rangle + d\, \langle\!\langle L[v] ; z \rangle\!\rangle = c\, \langle v ; L^*[w] \rangle + d\, \langle v ; L^*[z] \rangle = \langle v ; c L^*[w] + d L^*[z] \rangle.$$
Since this holds for all $v \in V$, we must have
$$L^*[c w + d z] = c L^*[w] + d L^*[z],$$
proving linearity. Q.E.D.
The proof of the next result is left as an exercise.
Lemma 7.54. The adjoint of the adjoint of $L$ is just $L = (L^*)^*$.
Example 7.55. Let us first show how the defining equation (7.64) for the adjoint leads directly to the transpose of a matrix. Let $L\colon \mathbb{R}^n \to \mathbb{R}^m$ be the linear function $L[v] = A v$ defined by multiplication by the $m \times n$ matrix $A$. Then $L^*\colon \mathbb{R}^m \to \mathbb{R}^n$ is linear, and so is represented by matrix multiplication, $L^*[w] = A^* w$, by an $n \times m$ matrix $A^*$. We impose the ordinary Euclidean dot products
$$\langle v ; \tilde v \rangle = v^T \tilde v, \quad v, \tilde v \in \mathbb{R}^n, \qquad \langle\!\langle w ; \tilde w \rangle\!\rangle = w^T \tilde w, \quad w, \tilde w \in \mathbb{R}^m,$$
as our inner products on both $\mathbb{R}^n$ and $\mathbb{R}^m$. Evaluation of both sides of the adjoint equation (7.64) gives
$$\langle\!\langle L[v] ; w \rangle\!\rangle = \langle\!\langle A v ; w \rangle\!\rangle = (A v)^T w = v^T A^T w, \qquad \langle v ; L^*[w] \rangle = \langle v ; A^* w \rangle = v^T A^* w. \tag{7.65}$$
Since these must agree for all $v, w$, cf. Exercise , the matrix $A^*$ representing $L^*$ is equal to the transposed matrix $A^T$. Therefore, the adjoint of a matrix with respect to the Euclidean inner product is its transpose: $A^* = A^T$.
Example 7.56. Let us now adopt different, weighted inner products on the domain and target spaces for the linear map $L\colon \mathbb{R}^n \to \mathbb{R}^m$ given by $L[v] = A v$. Suppose that the inner product on the domain space $\mathbb{R}^n$ is given by $\langle v ; \tilde v \rangle = v^T M \tilde v$, while the inner product on the target space $\mathbb{R}^m$ is given by $\langle\!\langle w ; \tilde w \rangle\!\rangle = w^T C \tilde w$, where $M > 0$ and $C > 0$ are positive definite matrices of respective sizes $n \times n$ and $m \times m$. Then, in place of (7.65), we have
$$\langle\!\langle A v ; w \rangle\!\rangle = (A v)^T C w = v^T A^T C w, \qquad \langle v ; A^* w \rangle = v^T M A^* w.$$
Equating these expressions, we deduce that $A^T C = M A^*$. Therefore the weighted adjoint of the matrix $A$ is given by the more complicated formula
$$A^* = M^{-1} A^T C. \tag{7.66}$$
In applications, $M$ plays the role of the mass matrix, and explicitly appears in the dynamical systems to be solved in Chapter 9. In particular, suppose $A$ is square, defining a linear map $L\colon \mathbb{R}^n \to \mathbb{R}^n$. If we adopt the same inner product $\langle v ; \tilde v \rangle = v^T C \tilde v$ on both the domain and target spaces $\mathbb{R}^n$, then the adjoint matrix $A^* = C^{-1} A^T C$ is similar to the transpose.
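Formula (7.66) can be tested numerically by checking the defining identity (7.64) on sample vectors. To keep the sketch free of matrix inversion, it assumes that both weight matrices are diagonal with positive entries, $M = \operatorname{diag}(m_1, \ldots, m_n)$ and $C = \operatorname{diag}(c_1, \ldots, c_m)$; this restriction and all names below are illustrative assumptions, not part of the text.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def weighted_inner(u, w, diag):
    """u^T D w for the diagonal weight matrix D = diag(diag)."""
    return sum(a * d * b for a, d, b in zip(u, diag, w))

def weighted_adjoint(A, m_diag, c_diag):
    """A* = M^{-1} A^T C for diagonal M (n x n) and C (m x m); A is m x n."""
    n, m = len(m_diag), len(c_diag)
    return [[A[j][i] * c_diag[j] / m_diag[i] for j in range(m)] for i in range(n)]

A = [[1, 2, 0],
     [0, 1, -1]]          # sample 2 x 3 matrix, so n = 3 and m = 2
m_diag = [1, 2, 4]        # weights on the domain R^3
c_diag = [3, 5]           # weights on the target R^2
A_star = weighted_adjoint(A, m_diag, c_diag)

v, w = [1, -2, 3], [2, 1]
lhs = weighted_inner(matvec(A, v), w, c_diag)        # << A v ; w >>
rhs = weighted_inner(v, matvec(A_star, w), m_diag)   # < v ; A* w >
print(A_star, lhs, rhs)
```

Both inner products evaluate to the same number, while the plain transpose would fail the identity unless all weights equal 1.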
Everything that we learned about transposes can be reinterpreted in the more general
language of adjoints. The next result generalizes the fact, (1.49), that the transpose of the
product of two matrices is the product of the transposes, in the reverse order.
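Lemma 7.57. If $L\colon V \to W$ and $M\colon W \to Z$ have respective adjoints $L^*\colon W \to V$ and $M^*\colon Z \to W$, then the composite linear function $M \circ L\colon V \to Z$ has adjoint $(M \circ L)^* = L^* \circ M^*$, which maps $Z$ to $V$.
Proof: Let $\langle v ; \tilde v \rangle$, $\langle\!\langle w ; \tilde w \rangle\!\rangle$, $\langle\!\langle\!\langle z ; \tilde z \rangle\!\rangle\!\rangle$ denote, respectively, the inner products on $V, W, Z$. For $v \in V$, $z \in Z$, we compute using the definition (7.64),
$$\langle v ; (M \circ L)^*[z] \rangle = \langle\!\langle\!\langle M \circ L[v] ; z \rangle\!\rangle\!\rangle = \langle\!\langle\!\langle M[L[v]] ; z \rangle\!\rangle\!\rangle = \langle\!\langle L[v] ; M^*[z] \rangle\!\rangle = \langle v ; L^*[M^*[z]] \rangle = \langle v ; (L^* \circ M^*)[z] \rangle.$$
Since this holds for all $v$ and $z$, the identification follows. Q.E.D.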
In this chapter, we have only looked at adjoints in the finite-dimensional situation, when the linear functions are given by matrix multiplication. The equally important case of adjoints of linear operators on function spaces, e.g., differential operators appearing in boundary value problems, will be a principal focus of Section 11.3.
Self-Adjoint and Positive Definite Linear Functions
Throughout this section V will be a fixed inner product space. We can generalize the
notions of symmetric and positive definite matrices to linear operators on V in a natural
fashion. The analog of a symmetric matrix is a self-adjoint linear function.
Definition 7.58. A linear function $K\colon V \to V$ is called self-adjoint if $K^* = K$. A self-adjoint linear function is positive definite if
$$\langle v ; K[v] \rangle > 0 \qquad \text{for all} \qquad 0 \ne v \in V. \tag{7.67}$$
In particular, if $K > 0$ then $\ker K = \{0\}$, and so the positive definite linear system $K[u] = f$ with $f \in \operatorname{rng} K$ has a unique solution. The next result generalizes our basic observation that the Gram matrices $K = A^T A$, cf. (3.49), are symmetric and positive (semi-)definite.
Theorem 7.59. Let $L\colon V \to W$ be a linear map between inner product spaces with adjoint $L^*\colon W \to V$. Then the composite map $K = L^* \circ L\colon V \to V$ is self-adjoint. Moreover, $K$ is positive definite if and only if $\ker L = \{0\}$.
Proof: First, by Lemmas 7.57 and 7.54,
$$K^* = (L^* \circ L)^* = L^* \circ (L^*)^* = L^* \circ L = K,$$
proving self-adjointness. Furthermore, for $v \in V$, the inner product
$$\langle v ; K[v] \rangle = \langle v ; L^*[L[v]] \rangle = \langle\!\langle L[v] ; L[v] \rangle\!\rangle = \| L[v] \|^2 > 0$$
is strictly positive provided $L[v] \ne 0$. Thus, if $\ker L = \{0\}$, then the positivity condition (7.67) holds, and conversely.
Q.E.D.
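For matrices with the Euclidean dot product, the theorem says that $v^T (A^T A) v = \| A v \|^2$, which is positive for all $v \ne 0$ exactly when $A$ has trivial kernel. The sketch below compares one matrix of each kind; both sample matrices are assumptions chosen for illustration.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def gram(A):
    """K = A^T A for a matrix stored as a list of rows."""
    cols = list(zip(*A))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def quad_form(K, v):
    """The quadratic form v^T K v."""
    return sum(vi * ki for vi, ki in zip(v, matvec(K, v)))

A_injective = [[1, 0], [1, 1], [0, 1]]   # ker A = {0}
A_singular = [[1, 1], [2, 2]]            # (1, -1) lies in ker A

v = [1, -1]
print(gram(A_injective), quad_form(gram(A_injective), v))
print(gram(A_singular), quad_form(gram(A_singular), v))
```

The first quadratic form is positive at $v$, while the second vanishes there because $v$ is a kernel vector, so that Gram matrix is only positive semi-definite.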
Consider the case of a linear function $L\colon \mathbb{R}^n \to \mathbb{R}^m$ that is represented by the $m \times n$ matrix $A$. For the Euclidean dot product on the two spaces, the adjoint $L^*$ is represented by the transpose $A^T$, and hence the map $K = L^* \circ L$ has matrix representation $A^T A$. Therefore, in this case Theorem 7.59 reduces to our earlier Proposition 3.32 governing the positive definiteness of the Gram matrix product $A^T A$. If we change the inner product on the target space to $\langle\!\langle w ; \tilde w \rangle\!\rangle = w^T C \tilde w$, then $L^*$ is represented by $A^T C$, and hence $K = L^* \circ L$ has matrix form $A^T C A$, which is the general symmetric, positive definite Gram matrix constructed in (3.51) that played a key role in our development of the equations of equilibrium in Chapter 6. Finally, if we also use the alternative inner product $\langle v ; \tilde v \rangle = v^T M \tilde v$ on the domain space $\mathbb{R}^n$, then, according to (7.66), the adjoint of $L$ has matrix form
$$A^* = M^{-1} A^T C, \qquad \text{and therefore} \qquad K = A^* A = M^{-1} A^T C A \tag{7.68}$$
is a self-adjoint, positive definite matrix with respect to the weighted inner product on $\mathbb{R}^n$ prescribed by the positive definite matrix $M$. In this case, the positive definite, self-adjoint operator $K$ is no longer represented by a symmetric matrix. So, we did not quite tell the truth when we said we would only allow symmetric matrices to be positive definite; we really meant only self-adjoint matrices. The general case will be important in our discussion of the vibrations of mass/spring chains that have unequal masses. Extensions of these constructions to differential operators underlie the analysis of the static and dynamic differential equations of continuum mechanics, to be studied in Chapters 11 through 18.
Minimization
In Chapter 4, we learned that the solution to a matrix system $K u = f$, with positive definite coefficient matrix $K > 0$, can be characterized as the unique minimizer for the quadratic function $p(u) = \tfrac12 u^T K u - u^T f$. There is an analogous minimization principle that characterizes the solutions to linear systems defined by positive definite linear operators. This general result is of tremendous importance in analysis of boundary value problems for differential equations and also underlies the finite element numerical solution algorithms. Details will appear in the subsequent chapters.
Theorem 7.60. Let $K\colon V \to V$ be a positive definite operator on an inner product space $V$. If $f \in \operatorname{rng} K$, then the quadratic function
$$p(u) = \tfrac12 \langle u ; K[u] \rangle - \langle u ; f \rangle \tag{7.69}$$
has a unique minimizer, which is the solution $u = u^\star$ to the linear system $K[u] = f$.

Proof: The proof mimics that of its matrix counterpart in Theorem 4.1. Since $f = K[u^\star]$, we can write
$$p(u) = \tfrac12 \langle u ; K[u] \rangle - \langle u ; K[u^\star] \rangle = \tfrac12 \langle u - u^\star ; K[u - u^\star] \rangle - \tfrac12 \langle u^\star ; K[u^\star] \rangle, \tag{7.70}$$
where we used linearity, along with the fact that $K$ is self-adjoint to identify the terms $\langle u ; K[u^\star] \rangle = \langle u^\star ; K[u] \rangle$. Since $K > 0$ is positive definite, the first term on the right hand side of (7.70) is always $\ge 0$; moreover it equals its minimal value $0$ if and only if $u = u^\star$. On the other hand, the second term does not depend upon $u$ at all, and hence is a constant. Therefore, to minimize $p(u)$ we must make the first term as small as possible, which is accomplished by setting $u = u^\star$.
Q.E.D.
Remark : For linear functions given by matrix multiplication, positive definiteness
automatically implies invertibility, and so the linear system K u = f has a solution for
every right hand side. This is no longer necessarily true when K is a positive definite
operator on an infinite-dimensional function space. Therefore, the existence of a solution
or minimizer is a significant issue. And, in fact, many modern analytical existence results
rely on such minimization principles.
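In the finite-dimensional case the identity (7.70) can be observed directly: moving away from the solution $u^\star$ by a displacement $d$ raises $p$ by exactly $\tfrac12 d^T K d$. The sketch below uses a sample $2 \times 2$ positive definite matrix and right hand side chosen here for illustration, with the Euclidean dot product as inner product.

```python
def matvec(K, u):
    return [sum(a * x for a, x in zip(row, u)) for row in K]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def p(u, K, f):
    """Quadratic function p(u) = (1/2) u^T K u - u^T f."""
    return 0.5 * dot(u, matvec(K, u)) - dot(u, f)

K = [[2, 1], [1, 2]]      # sample symmetric positive definite matrix
f = [3, 3]
u_star = [1, 1]           # solves K u = f

def excess(d):
    """p(u_star + d) - p(u_star); by (7.70) this equals (1/2) d^T K d."""
    u = [a + b for a, b in zip(u_star, d)]
    return p(u, K, f) - p(u_star, K, f)

for d in ([0.5, -0.25], [-1, 1], [0.1, 0.1]):
    print(d, excess(d), 0.5 * dot(d, matvec(K, d)))
```

Every nonzero displacement gives a strictly positive excess, so $u^\star$ is the unique minimizer.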
Theorem 7.61. Suppose $L\colon V \to W$ is a linear map between inner product spaces with $\ker L = \{0\}$ and adjoint map $L^*\colon W \to V$. Let $K = L^* \circ L\colon V \to V$ be the associated positive definite operator. If $f \in \operatorname{rng} K$, then the quadratic function
$$p(u) = \tfrac12 \| L[u] \|^2 - \langle u ; f \rangle \tag{7.71}$$
has a unique minimizer $u^\star$, which is the solution to the linear system $K[u^\star] = f$.
Proof: It suffices to note that the quadratic term in (7.69) can be written in the alternative form
$$\langle u ; K[u] \rangle = \langle u ; L^*[L[u]] \rangle = \langle\!\langle L[u] ; L[u] \rangle\!\rangle = \| L[u] \|^2.$$
Thus, (7.71) reduces to the quadratic function of the form (7.69) with $K = L^* \circ L$, and so Theorem 7.61 is an immediate consequence of Theorem 7.60.
Q.E.D.
Warning: In (7.71), the first term $\| L[u] \|^2$ is computed using the norm based on the inner product on $W$, while the second term $\langle u ; f \rangle$ employs the inner product on $V$.
Example 7.62. For a generalized positive definite matrix (7.68), the quadratic function (7.71) is computed with respect to the alternative inner product $\langle v ; \tilde v \rangle = v^T M \tilde v$, so
$$p(u) = \tfrac12 (A u)^T C A u - u^T M f = \tfrac12 u^T (A^T C A) u - u^T (M f).$$
Theorem 7.61 tells us that the minimizer of the quadratic function is the solution to
$$A^T C A u = M f, \qquad \text{or, equivalently,} \qquad K u = M^{-1} A^T C A u = f.$$
This also follows from our earlier finite-dimensional minimization Theorem 4.1.
This section is a preview of things to come, but the full implications will require us to develop more analytical expertise. In Chapters 11, 15 and 18, we will find that the most important minimization principles for characterizing solutions to the linear boundary value problems of physics and engineering all arise through this general, abstract construction.

Chapter 8
Eigenvalues
So far, our applications have concentrated on statics: unchanging equilibrium configurations
of physical systems (mass/spring chains, circuits, and structures) that are
modeled by linear systems of algebraic equations. It is now time to allow motion in our
universe. In general, a dynamical system refers to the (differential) equations governing
the temporal behavior of some physical system: mechanical, electrical, chemical, fluid, etc.
Our immediate goal is to understand the behavior of the simplest class of linear dynamical
systems: first order autonomous linear systems of ordinary differential equations.
As always, complete analysis of the linear situation is an essential prerequisite to making
progress in the more complicated nonlinear realm.
We begin with a very quick review of the scalar case, whose solutions are exponential
functions. Substituting a similar exponential solution ansatz into the system leads us
immediately to the equations defining the eigenvalues and eigenvectors of the coefficient
matrix. Eigenvalues and eigenvectors are of absolutely fundamental importance in both
the mathematical theory and a very wide range of applications, including iterative systems
and numerical solution methods. Thus, to continue we need to gain a proper grounding in
their basic theory and computation.
The present chapter develops the most important properties of eigenvalues and eigenvectors;
the applications to dynamical systems will appear in Chapter 9, while applications
to iterative systems and numerical methods are the topic of Chapter 10. Extensions of the
eigenvalue concept to differential operators acting on infinite-dimensional function space,
of essential importance for solving linear partial differential equations modelling continuous
dynamical systems, will be covered in later chapters.

Each square matrix has a collection of one or more complex scalars, called eigenvalues,
and associated vectors, called eigenvectors.
Roughly speaking, the eigenvectors indicate directions of pure stretch and the eigenvalues
the amount of stretching. Most matrices are complete, meaning that their (complex)
eigenvectors form a basis of the underlying vector space. When written in the eigenvector
basis, the matrix assumes a very simple diagonal form, and the analysis of its properties
becomes extremely simple. A particularly important class are the symmetric matrices,
whose eigenvectors form an orthogonal basis of $\mathbb{R}^n$; in fact, this is by far the most common
way for orthogonal bases to appear. Incomplete matrices are trickier, and we relegate
them and their associated non-diagonal Jordan canonical form to the final section. The
numerical computation of eigenvalues and eigenvectors is a challenging issue, and must
be deferred until Section 10.6. Unless you are prepared to consult that section now, in
order to solve the computer-based problems in this chapter, you will need to make use of
a program that can accurately compute eigenvalues and eigenvectors of matrices.

Footnote: See the footnote in Chapter 7 for an explanation of the term "ansatz" or inspired guess.
A non-square matrix $A$ does not have eigenvalues; however, we have already made
extensive use of the associated square Gram matrix $K = A^T A$. The square roots of the
eigenvalues of $K$ serve to define the singular values of $A$. Singular values and principal
component analysis are now used in an increasingly broad range of modern applications,
including statistical analysis, image processing, semantics, language and speech recognition,
and learning theory. The singular values are used to define the condition number of
a matrix, which indicates the degree of difficulty of accurately solving the associated linear
system.

8.1. Simple Dynamical Systems.

The purpose of this section is to motivate the concepts of eigenvalue and eigenvector
of square matrices by attempting to solve the simplest class of dynamical systems: first
order linear systems of ordinary differential equations. We begin with a review of the
scalar case, introducing basic notions of stability in preparation for the general version,
to be treated in depth in Chapter 9. We use the exponential form of the scalar solution
as a template for a possible solution in the vector case, and this immediately leads us to
the fundamental eigenvalue/eigenvector equation. Readers who are uninterested in such
motivations are advised to skip ahead to Section 8.2.
Scalar Ordinary Differential Equations
Eigenvalues first appear when attempting to solve linear systems of ordinary differential equations. In order to motivate the construction, we begin by reviewing the scalar
case. Consider the elementary ordinary differential equation
$$\frac{du}{dt} = a\,u. \tag{8.1}$$
Here $a \in \mathbb{R}$ is a real constant, while the unknown $u(t)$ is a scalar function. As you learned
in first year calculus, the general solution to (8.1) is an exponential function
$$u(t) = c\,e^{a t}. \tag{8.2}$$
The integration constant $c$ is uniquely determined by a single initial condition
$$u(t_0) = b \tag{8.3}$$
imposed at an initial time $t_0$. Substituting $t = t_0$ into the solution formula (8.2),
$$u(t_0) = c\,e^{a t_0} = b, \qquad \text{and so} \qquad c = b\,e^{-a t_0}.$$
We conclude that
$$u(t) = b\,e^{a (t - t_0)} \tag{8.4}$$
is the unique solution to the scalar initial value problem (8.1), (8.3).
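As a quick numerical sanity check of the solution formula (8.4), the following minimal Python sketch evaluates the formula and compares a centered finite-difference estimate of the derivative with $a\,u$. The function name `solve_scalar_ivp` and the sample values of `a`, `b`, and `t0` are illustrative assumptions, not taken from the text.

```python
import math

def solve_scalar_ivp(a, b, t0):
    """Return the function u(t) = b * exp(a * (t - t0)) from formula (8.4)."""
    return lambda t: b * math.exp(a * (t - t0))

# Illustrative (assumed) data: du/dt = -0.5 u with u(1) = 2.
a, b, t0 = -0.5, 2.0, 1.0
u = solve_scalar_ivp(a, b, t0)

# The initial condition (8.3) is reproduced at t = t0.
initial_value = u(t0)

# A centered difference approximates du/dt; it should be close to a * u(t).
t, h = 3.0, 1e-6
derivative_estimate = (u(t + h) - u(t - h)) / (2 * h)
residual = abs(derivative_estimate - a * u(t))
print(initial_value, residual)
```

The residual is not exactly zero because the derivative is only approximated; it merely indicates that the formula is consistent with the differential equation.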

Figure 8.1. Solutions to $\dot u = a\,u$ (three panels: $a < 0$, $a = 0$, $a > 0$).

Example 8.1. The radioactive decay of an isotope, say Uranium 238, is governed
by the differential equation
$$\frac{du}{dt} = -\gamma\,u. \tag{8.5}$$
Here $u(t)$ denotes the amount of the isotope remaining at time $t$, and the coefficient
$\gamma > 0$ governs the decay rate. The solution is given by an exponentially decaying function
$u(t) = c\,e^{-\gamma t}$, where $c = u(0)$ is the initial amount of radioactive material.
The half-life $t^\star$ is the time it takes for half of a sample to decay, that is, when
$u(t^\star) = \tfrac{1}{2}\,u(0)$. To determine $t^\star$, we solve the algebraic equation
$$e^{-\gamma t^\star} = \tfrac{1}{2}, \qquad \text{so that} \qquad t^\star = \frac{\log 2}{\gamma}. \tag{8.6}$$
Over each successive half-life, exactly half of the remaining isotope decays, so that at each
integer multiple $n\,t^\star$ of the half-life we have $u(n\,t^\star) = 2^{-n}\,u(0)$.
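The half-life formula (8.6) can be tried out numerically. The sketch below is only an illustration: the decay rate (called `gamma` here) and the initial amount `u0` are assumed values in arbitrary units, not data for any real isotope.

```python
import math

def half_life(gamma):
    """Half-life t* = log(2) / gamma for the decay equation du/dt = -gamma * u."""
    return math.log(2.0) / gamma

def amount(u0, gamma, t):
    """Amount remaining at time t, namely u(t) = u0 * exp(-gamma * t)."""
    return u0 * math.exp(-gamma * t)

gamma = 0.25   # assumed decay rate, in arbitrary inverse time units
u0 = 80.0      # assumed initial amount
t_star = half_life(gamma)

# After n half-lives the amount remaining should be u0 / 2**n.
remaining = [amount(u0, gamma, n * t_star) for n in range(4)]
print(t_star, remaining)
```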
Let us make some elementary, but pertinent observations about this simple linear
dynamical system. First of all, since the equation is homogeneous, the zero function
$u(t) \equiv 0$ (corresponding to $c = 0$) is a constant solution, known as an equilibrium solution
or fixed point, since it does not depend on $t$. If the coefficient $a > 0$ is positive, then the
solutions (8.2) are exponentially growing (in absolute value) as $t \to +\infty$. This implies that
the zero equilibrium solution is unstable. The initial condition $u(t_0) = 0$ produces the zero
solution, but if we make a tiny error (either physical, numerical, or mathematical) in the
initial data, say $u(t_0) = \varepsilon$, then the solution $u(t) = \varepsilon\,e^{a(t - t_0)}$ will eventually get very far
away from equilibrium. More generally, any two solutions with very close, but not equal,
initial data, will eventually become arbitrarily far apart: $| u_1(t) - u_2(t) | \to \infty$ as $t \to \infty$.
One consequence is the inherent difficulty in accurately computing the long time behavior
of the solution, since small numerical errors will eventually have very large effects.
On the other hand, if $a < 0$, the solutions are exponentially decaying in time. In this
case, the zero solution is stable, since a small error in the initial data will have a negligible
effect on the solution. In fact, the zero solution is globally asymptotically stable. The
phrase "asymptotically stable" implies that solutions that start out near zero eventually
return; more specifically, if $u(t_0) = \varepsilon$ is small, then $u(t) \to 0$ as $t \to \infty$. The adjective
"globally" implies that this happens no matter how large the initial data is. In fact, for
a linear system, the stability (or instability) of an equilibrium solution is always a global
phenomenon.
The borderline case is when $a = 0$. Then all the solutions to (8.1) are constant. In this
case, the zero solution is stable (indeed, globally stable) but not asymptotically stable.
The solution to the initial value problem $u(t_0) = \varepsilon$ is $u(t) \equiv \varepsilon$. Therefore, a solution that
starts out near equilibrium will remain near, but will not asymptotically return. The three
qualitatively different possibilities are illustrated in Figure 8.1.
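The three stability regimes can be illustrated with a small Python sketch. Because the equation is linear, the difference of two solutions is itself a solution, so a gap of size `eps` in the initial data evolves as `eps * exp(a * (t - t0))`. The helper name `gap` and the sample values of `eps`, `a`, and `t` are assumptions chosen for illustration.

```python
import math

def gap(a, eps, t, t0=0.0):
    """|u1(t) - u2(t)| for two solutions of du/dt = a u whose initial data differ by eps."""
    return abs(eps) * math.exp(a * (t - t0))

eps = 1e-6   # assumed size of the error in the initial data
for a in (1.0, 0.0, -1.0):
    # a > 0: the gap grows; a = 0: it stays constant; a < 0: it decays.
    print(a, [gap(a, eps, t) for t in (0.0, 10.0, 20.0)])
```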
First Order Dynamical Systems
The simplest class of dynamical systems consists of $n$ first order ordinary differential
equations for $n$ unknown functions
$$\frac{du_1}{dt} = f_1(t, u_1, \ldots, u_n), \qquad \ldots \qquad \frac{du_n}{dt} = f_n(t, u_1, \ldots, u_n),$$
which depend on a scalar variable $t \in \mathbb{R}$, which we usually view as time. We will often
write the system in the equivalent vector form
$$\frac{du}{dt} = f(t, u). \tag{8.7}$$
The vector-valued solution $u(t) = (u_1(t), \ldots, u_n(t))^T$ serves to parametrize a curve in $\mathbb{R}^n$,
called a solution trajectory. A dynamical system is called autonomous if the time variable
$t$ does not appear explicitly on the right hand side, so that the system has the form
$$\frac{du}{dt} = f(u). \tag{8.8}$$
Dynamical systems of ordinary differential equations appear in an astonishing variety of
applications, and have been the focus of intense research activity since the early days of
calculus.
We shall concentrate most of our attention on the very simplest case: a homogeneous,
linear, autonomous dynamical system
$$\frac{du}{dt} = A\,u, \tag{8.9}$$
in which $A$ is a constant $n \times n$ matrix. In full detail, the system consists of $n$ linear ordinary
differential equations
$$\begin{aligned}
\frac{du_1}{dt} &= a_{11} u_1 + a_{12} u_2 + \cdots + a_{1n} u_n, \\
\frac{du_2}{dt} &= a_{21} u_1 + a_{22} u_2 + \cdots + a_{2n} u_n, \\
&\;\;\vdots \\
\frac{du_n}{dt} &= a_{n1} u_1 + a_{n2} u_2 + \cdots + a_{nn} u_n,
\end{aligned} \tag{8.10}$$
involving $n$ unknown functions $u_1(t), u_2(t), \ldots, u_n(t)$. In the autonomous case, the coefficients
$a_{ij}$ are assumed to be (real) constants. We seek not only to develop basic solution
techniques for such dynamical systems, but also to understand their behavior from both a
qualitative and quantitative standpoint.
Drawing our inspiration from the exponential solution formula (8.2) in the scalar case,
let us investigate whether the vector system has any solutions of a similar exponential form
$$u(t) = e^{\lambda t}\,v, \tag{8.11}$$
in which $\lambda$ is a constant scalar, so $e^{\lambda t}$ is a scalar function of $t$, while $v \in \mathbb{R}^n$ is a constant
vector. In other words, the components $u_i(t) = v_i\,e^{\lambda t}$ of our desired solution are assumed
to be constant multiples of the same exponential function. Since $v$ is assumed to be
constant, the derivative of $u(t)$ is easily found:
$$\frac{du}{dt} = \frac{d}{dt}\bigl( e^{\lambda t}\,v \bigr) = \lambda\,e^{\lambda t}\,v.$$
On the other hand, since $e^{\lambda t}$ is a scalar, it commutes with matrix multiplication, and so
$$A\,u = A\,e^{\lambda t}\,v = e^{\lambda t} A\,v.$$
Therefore, $u(t)$ will solve the system (8.9) if and only if
$$\lambda\,e^{\lambda t}\,v = e^{\lambda t} A\,v,$$
or, canceling the common scalar factor $e^{\lambda t}$,
$$\lambda\,v = A\,v.$$
The result is a system of algebraic equations relating the vector $v$ and the scalar $\lambda$. Analysis
of this system and its ramifications will be the topic of the remainder of this chapter.
After gaining a complete understanding, we will return to the solution of linear dynamical
systems in Chapter 9.
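The derivation above can be checked numerically on a small case. The following Python sketch assumes an illustrative $2 \times 2$ matrix together with a scalar `lam` and vector `v` satisfying $A v = \lambda v$, and compares a centered finite-difference estimate of $du/dt$ with $A\,u$ for $u(t) = e^{\lambda t} v$. The helper `mat_vec` and all sample values are assumptions made for the illustration.

```python
import math

def mat_vec(A, v):
    """Product of a matrix (given as a list of rows) with a vector (a list)."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

# Assumed illustrative data: A v = 4 v holds for this pair.
A = [[3.0, 1.0], [1.0, 3.0]]
lam, v = 4.0, [1.0, 1.0]

def u(t):
    """Candidate solution u(t) = exp(lam * t) * v, as in (8.11)."""
    return [math.exp(lam * t) * v_i for v_i in v]

# Compare a centered-difference estimate of du/dt with A u(t) at one time.
t, h = 0.3, 1e-6
du_dt = [(p - m) / (2 * h) for p, m in zip(u(t + h), u(t - h))]
rhs = mat_vec(A, u(t))
max_residual = max(abs(d - r) for d, r in zip(du_dt, rhs))
print(max_residual)
```

Replacing `lam` by a value for which $A v \neq \lambda v$ should make the residual large, which is one way to see why the algebraic condition $\lambda v = A v$ is needed.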

8.2. Eigenvalues and Eigenvectors.

We inaugurate our discussion of eigenvalues and eigenvectors with the fundamental
definition.
Definition 8.2. Let $A$ be an $n \times n$ matrix. A scalar $\lambda$ is called an eigenvalue of $A$
if there is a non-zero vector $v \neq 0$, called an eigenvector, such that
$$A\,v = \lambda\,v. \tag{8.12}$$
Thus, the matrix $A$ effectively stretches the eigenvector $v$ by an amount specified by
the eigenvalue $\lambda$. In this manner, the eigenvectors specify the directions of pure stretch
for the linear transformation defined by the matrix $A$.
Remark: The odd-looking terms "eigenvalue" and "eigenvector" are hybrid German-English
words. In the original German, they are Eigenwert and Eigenvektor, which can
be fully translated as "proper value" and "proper vector". For some reason, the half-translated
terms have acquired a certain charm, and are now standard. The alternative
English terms characteristic value and characteristic vector can be found in some (mostly
older) texts. Oddly, the term characteristic equation, to be defined below, is still used.

The requirement that the eigenvector $v$ be nonzero is important, since $v = 0$ is a
trivial solution to the eigenvalue equation (8.12) for any scalar $\lambda$. Moreover, as far as
solving linear ordinary differential equations goes, the zero vector $v = 0$ only gives the
trivial zero solution $u(t) \equiv 0$.
The eigenvalue equation (8.12) is a system of linear equations for the entries of the
eigenvector $v$, provided the eigenvalue $\lambda$ is specified in advance, but is mildly
nonlinear as a combined system for $\lambda$ and $v$. Gaussian elimination per se will not solve
the problem, and we are in need of a new idea. Let us begin by rewriting the equation in
the form
$$(A - \lambda I)\,v = 0, \tag{8.13}$$
where $I$ is the identity matrix of the correct size. Now, for given $\lambda$, equation (8.13) is a
homogeneous linear system for $v$, and always has the trivial zero solution $v = 0$. But we
are specifically seeking a nonzero solution! According to Theorem 1.45, a homogeneous
linear system has a nonzero solution $v \neq 0$ if and only if its coefficient matrix, which in
this case is $A - \lambda I$, is singular. This observation is the key to resolving the eigenvector
equation.
Theorem 8.3. A scalar $\lambda$ is an eigenvalue of the $n \times n$ matrix $A$ if and only if
the matrix $A - \lambda I$ is singular, i.e., of rank $< n$. The corresponding eigenvectors are the
nonzero solutions to the eigenvalue equation $(A - \lambda I)\,v = 0$.
We know a number of ways to characterize singular matrices, including the determinantal
criterion given in Theorem 1.50. Therefore, the following result is an immediate
corollary of Theorem 8.3.
Proposition 8.4. A scalar $\lambda$ is an eigenvalue of the matrix $A$ if and only if $\lambda$ is a
solution to the characteristic equation
$$\det(A - \lambda I) = 0. \tag{8.14}$$
In practice, when finding eigenvalues and eigenvectors by hand, one first solves the
characteristic equation (8.14). Then, for each eigenvalue $\lambda$ one uses standard linear algebra
methods, i.e., Gaussian elimination, to solve the corresponding linear system (8.13) for the
eigenvector $v$.
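Example 8.5. Consider the $2 \times 2$ matrix
$$A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}.$$
We compute the determinant in the characteristic equation using (1.34):
$$\det(A - \lambda I) = \det \begin{pmatrix} 3 - \lambda & 1 \\ 1 & 3 - \lambda \end{pmatrix} = (3 - \lambda)^2 - 1 = \lambda^2 - 6\lambda + 8.$$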
Footnote: Note that it is not legal to write (8.13) in the form $(A - \lambda)v = 0$ since we do not know how
to subtract a scalar $\lambda$ from a matrix $A$. Worse, if you type A − λ in Matlab, it will subtract
$\lambda$ from all the entries of $A$, which is not what we are after!


The characteristic equation is a quadratic polynomial equation, and can be solved by
factorization:
$$\lambda^2 - 6\lambda + 8 = (\lambda - 4)(\lambda - 2) = 0.$$
We conclude that $A$ has two eigenvalues: $\lambda_1 = 4$ and $\lambda_2 = 2$.
For each eigenvalue, the corresponding eigenvectors are found by solving the associated
homogeneous linear system (8.13). For the first eigenvalue, the corresponding eigenvector
equation is
$$(A - 4 I)\,v = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad \text{or} \qquad \begin{aligned} -x + y &= 0, \\ x - y &= 0. \end{aligned}$$
The general solution is
$$x = y = a, \qquad v = \begin{pmatrix} a \\ a \end{pmatrix} = a \begin{pmatrix} 1 \\ 1 \end{pmatrix},$$
where $a$ is an arbitrary scalar. Only the nonzero solutions count as eigenvectors, and so
the eigenvectors for the eigenvalue $\lambda_1 = 4$ must have $a \neq 0$, i.e., they are all nonzero scalar
multiples of the basic eigenvector $v_1 = (1, 1)^T$.
Remark: In general, if $v$ is an eigenvector of $A$ for the eigenvalue $\lambda$, then so is any
nonzero scalar multiple of $v$. In practice, we only distinguish linearly independent
eigenvectors. Thus, in this example, we shall say "$v_1 = (1, 1)^T$ is the eigenvector corresponding
to the eigenvalue $\lambda_1 = 4$", when we really mean that the eigenvectors for $\lambda_1 = 4$ consist of
all nonzero scalar multiples of $v_1$.
Similarly, for the second eigenvalue $\lambda_2 = 2$, the eigenvector equation is
$$(A - 2 I)\,v = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
The solution $(-a, a)^T = a\,(-1, 1)^T$ is the set of scalar multiples of the eigenvector
$v_2 = (-1, 1)^T$. Therefore, the complete list of eigenvalues and eigenvectors (up to scalar
multiple) is
$$\lambda_1 = 4, \quad v_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \lambda_2 = 2, \quad v_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$
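The hand procedure used in this example (solve the characteristic equation, then find a nonzero kernel vector of $A - \lambda I$) can be sketched in Python for $2 \times 2$ matrices. This is a minimal illustration for small exact examples, not a recommended numerical method; the function names are hypothetical, and the quadratic $\lambda^2 - (a + d)\lambda + (ad - bc)$ used below is the expanded form of $\det(A - \lambda I)$ for a matrix with rows $(a, b)$ and $(c, d)$, which for this example is $\lambda^2 - 6\lambda + 8$.

```python
import cmath

def eigenvalues_2x2(A):
    """Roots of lambda^2 - (a + d) lambda + (a d - b c) = 0 for a 2x2 matrix A."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

def eigenvector_2x2(A, lam):
    """A nonzero solution of (A - lam I) v = 0, assuming lam is an eigenvalue."""
    (a, b), (c, d) = A
    # Each nonzero row (p, q) of the singular matrix A - lam I is annihilated by (-q, p).
    for p, q in ((a - lam, b), (c, d - lam)):
        if abs(p) > 1e-12 or abs(q) > 1e-12:
            return (-q, p)
    return (1, 0)  # A - lam I is the zero matrix, so every nonzero vector works

A = [[3, 1], [1, 3]]
lam1, lam2 = eigenvalues_2x2(A)
v1, v2 = eigenvector_2x2(A, lam1), eigenvector_2x2(A, lam2)
print(lam1, v1, lam2, v2)
```

The returned vectors may differ from $(1, 1)^T$ and $(-1, 1)^T$ by a nonzero scalar factor, which is consistent with the Remark above.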
Example 8.6. Consider the $3 \times 3$ matrix
$$A = \begin{pmatrix} 0 & -1 & -1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}.$$

Footnote: If, at this stage, you end up with a linear system with only the trivial zero solution, you've
done something wrong! Either you don't have a correct eigenvalue (maybe you made a mistake
setting up and/or solving the characteristic equation) or you've made an error solving the
homogeneous eigenvector system.


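Using the formula (1.82) for a $3 \times 3$ determinant, we compute the characteristic equation
$$\begin{aligned}
0 = \det(A - \lambda I) &= \det \begin{pmatrix} -\lambda & -1 & -1 \\ 1 & 2 - \lambda & 1 \\ 1 & 1 & 2 - \lambda \end{pmatrix} \\
&= (-\lambda)(2 - \lambda)^2 + (-1) \cdot 1 \cdot 1 + (-1) \cdot 1 \cdot 1 \\
&\qquad - 1 \cdot (2 - \lambda)(-1) - 1 \cdot 1 \cdot (-\lambda) - (2 - \lambda) \cdot 1 \cdot (-1) \\
&= -\lambda^3 + 4\lambda^2 - 5\lambda + 2.
\end{aligned}$$
The resulting cubic polynomial can be factorized:
$$-\lambda^3 + 4\lambda^2 - 5\lambda + 2 = -(\lambda - 1)^2 (\lambda - 2) = 0.$$
Most $3 \times 3$ matrices have three different eigenvalues, but this particular one has only two:
$\lambda_1 = 1$, which is called a double eigenvalue since it is a double root of the characteristic
equation, along with a simple eigenvalue $\lambda_2 = 2$.
The eigenvector equation (8.13) for the double eigenvalue $\lambda_1 = 1$ is
$$(A - I)\,v = \begin{pmatrix} -1 & -1 & -1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$
The general solution to this homogeneous linear system
$$v = \begin{pmatrix} -a - b \\ a \\ b \end{pmatrix} = a \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} + b \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}$$
depends upon two free variables, $y = a$, $z = b$. Any nonzero solution forms a valid
eigenvector for the eigenvalue $\lambda_1 = 1$, and so the general eigenvector is any non-zero linear
combination of the two "basis eigenvectors" $v_1 = (-1, 1, 0)^T$, $\hat v_1 = (-1, 0, 1)^T$.
On the other hand, the eigenvector equation for the simple eigenvalue $\lambda_2 = 2$ is
$$(A - 2 I)\,v = \begin{pmatrix} -2 & -1 & -1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$
The general solution
$$v = \begin{pmatrix} -a \\ a \\ a \end{pmatrix} = a \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}$$
consists of all scalar multiples of the eigenvector $v_2 = (-1, 1, 1)^T$.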


In summary, the eigenvalues and (basis) eigenvectors for this matrix are
$$\lambda_1 = 1, \quad v_1 = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \quad \hat v_1 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \qquad \lambda_2 = 2, \quad v_2 = \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}. \tag{8.15}$$
In general, given an eigenvalue $\lambda$, the corresponding eigenspace $V_\lambda \subset \mathbb{R}^n$ is the subspace
spanned by all its eigenvectors. Equivalently, the eigenspace is the kernel
$$V_\lambda = \ker(A - \lambda I). \tag{8.16}$$
In particular, $\lambda$ is an eigenvalue if and only if $V_\lambda \neq \{0\}$ is a nontrivial subspace, and then
every nonzero element of $V_\lambda$ is a corresponding eigenvector. The most economical way to
indicate each eigenspace is by writing out a basis, as in (8.15).
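A list such as (8.15) is easy to verify directly from the definition $A v = \lambda v$. The sketch below does this in plain Python for the matrix of Example 8.6, entered with the signs that reproduce the characteristic polynomial $-\lambda^3 + 4\lambda^2 - 5\lambda + 2$. The helper names `mat_vec` and `is_eigenpair` and the tolerance are assumptions made for the illustration.

```python
def mat_vec(A, v):
    """Product of a matrix (given as a list of rows) with a vector (a list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def is_eigenpair(A, lam, v, tol=1e-12):
    """True when v is nonzero and A v = lam v up to the tolerance."""
    if all(abs(x) <= tol for x in v):
        return False  # the zero vector never counts as an eigenvector
    return all(abs(y - lam * x) <= tol for y, x in zip(mat_vec(A, v), v))

A = [[0, -1, -1], [1, 2, 1], [1, 1, 2]]
pairs = [(1, [-1, 1, 0]), (1, [-1, 0, 1]), (2, [-1, 1, 1])]
checks = [is_eigenpair(A, lam, v) for lam, v in pairs]

# Any nonzero combination of the two lambda = 1 eigenvectors lies in the same eigenspace.
combo = [2 * p - 3 * q for p, q in zip([-1, 1, 0], [-1, 0, 1])]
combo_ok = is_eigenpair(A, 1, combo)
print(checks, combo_ok)
```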

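Example 8.7. The characteristic equation of the matrix
$A = \begin{pmatrix} 1 & 2 & 1 \\ 1 & -1 & 1 \\ 2 & 0 & 1 \end{pmatrix}$ is
$$0 = \det(A - \lambda I) = -\lambda^3 + \lambda^2 + 5\lambda + 3 = -(\lambda + 1)^2 (\lambda - 3).$$
Again, there is a double eigenvalue $\lambda_1 = -1$ and a simple eigenvalue $\lambda_2 = 3$. However, in
this case the matrix
$$A - \lambda_1 I = A + I = \begin{pmatrix} 2 & 2 & 1 \\ 1 & 0 & 1 \\ 2 & 0 & 2 \end{pmatrix}$$
has only a one-dimensional kernel, spanned by $(2, -1, -2)^T$. Thus, even though $\lambda_1$ is a
double eigenvalue, it only admits a one-dimensional eigenspace. The list of eigenvalues
and eigenvectors is, in a sense, incomplete:
$$\lambda_1 = -1, \quad v_1 = \begin{pmatrix} 2 \\ -1 \\ -2 \end{pmatrix}, \qquad \lambda_2 = 3, \quad v_2 = \begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix}.$$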

Example 8.8. Finally, consider the matrix
$A = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & -2 \\ 2 & 2 & -1 \end{pmatrix}$. The characteristic
equation is
$$0 = \det(A - \lambda I) = -\lambda^3 + \lambda^2 - 3\lambda - 5 = -(\lambda + 1)(\lambda^2 - 2\lambda + 5).$$
The linear factor yields the eigenvalue $-1$. The quadratic factor leads to two complex
roots, $1 + 2i$ and $1 - 2i$, which can be obtained via the quadratic formula. Hence $A$ has
one real and two complex eigenvalues:
$$\lambda_1 = -1, \qquad \lambda_2 = 1 + 2i, \qquad \lambda_3 = 1 - 2i.$$

Complex eigenvalues are as important as real eigenvalues, and we need to be able to handle
them too. To find the corresponding eigenvectors, which will also be complex, we need
to solve the usual eigenvalue equation (8.13), which is now a complex homogeneous linear
system. For example, the eigenvector(s) for $\lambda_2 = 1 + 2i$ are found by solving
$$(A - (1 + 2i) I)\,v = \begin{pmatrix} -2i & 2 & 0 \\ 0 & -2i & -2 \\ 2 & 2 & -2 - 2i \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$
This linear system can be solved by Gaussian elimination (with complex pivots). A simpler
approach is to work directly: the first equation $-2i\,x + 2y = 0$ tells us that $y = i\,x$, while
the second equation $-2i\,y - 2z = 0$ says $z = -i\,y = x$. If we trust our calculations
so far, we do not need to solve the final equation $2x + 2y + (-2 - 2i)z = 0$, since we
know that the coefficient matrix is singular and hence it must be a consequence of the first
two equations. (However, it does serve as a useful check on our work.) So, the general
solution $v = (x, i\,x, x)^T$ is an arbitrary constant multiple of the complex eigenvector
$v_2 = (1, i, 1)^T$.
Summarizing, the matrix under consideration has three complex eigenvalues and three
corresponding eigenvectors, each unique up to (complex) scalar multiple:
$$\lambda_1 = -1, \quad v_1 = \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}, \qquad \lambda_2 = 1 + 2i, \quad v_2 = \begin{pmatrix} 1 \\ i \\ 1 \end{pmatrix}, \qquad \lambda_3 = 1 - 2i, \quad v_3 = \begin{pmatrix} 1 \\ -i \\ 1 \end{pmatrix}.$$
Note that the third complex eigenvalue is the complex conjugate of the second, and the
eigenvectors are similarly related. This is indicative of a general fact for real matrices:
Proposition 8.9. If $A$ is a real matrix with a complex eigenvalue $\lambda = \mu + i\,\nu$ and
corresponding complex eigenvector $v = x + i\,y$, then the complex conjugate $\overline{\lambda} = \mu - i\,\nu$ is
also an eigenvalue with complex conjugate eigenvector $\overline{v} = x - i\,y$.
Proof: First take complex conjugates of the eigenvalue equation (8.12):
$$\overline{A}\;\overline{v} = \overline{A\,v} = \overline{\lambda\,v} = \overline{\lambda}\;\overline{v}.$$
Using the fact that a real matrix is unaffected by conjugation, so $\overline{A} = A$, we conclude
$$A\,\overline{v} = \overline{\lambda}\;\overline{v}, \tag{8.17}$$
which is the eigenvalue equation for the eigenvalue $\overline{\lambda}$ and eigenvector $\overline{v}$.
Q.E.D.

As a consequence, when dealing with real matrices, one only needs to compute the
eigenvectors for one of each complex conjugate pair of eigenvalues. This observation
effectively halves the amount of work in the unfortunate event that we are confronted with
complex eigenvalues.
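Proposition 8.9 can be illustrated with Python's built-in complex numbers. The sketch below takes the real matrix of Example 8.8 (entered with the signs consistent with its characteristic polynomial $-\lambda^3 + \lambda^2 - 3\lambda - 5$) and one complex eigenpair, conjugates both, and measures how far each pair is from satisfying $A v = \lambda v$. The helper names are hypothetical.

```python
def mat_vec(A, v):
    """Product of a matrix (given as a list of rows) with a vector (a list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def residual(A, lam, v):
    """Largest entry, in absolute value, of A v - lam v."""
    return max(abs(y - lam * x) for y, x in zip(mat_vec(A, v), v))

A = [[1, 2, 0], [0, 1, -2], [2, 2, -1]]   # real matrix of Example 8.8
lam, v = 1 + 2j, [1, 1j, 1]               # one complex eigenpair

# Conjugating both the eigenvalue and the eigenvector gives the partner pair.
lam_bar = lam.conjugate()
v_bar = [complex(x).conjugate() for x in v]
print(residual(A, lam, v), residual(A, lam_bar, v_bar))
```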

Remark: The reader may recall that we said one should never use determinants in
practical computations. So why have we reverted to using determinants to find eigenvalues?
The truthful answer is that the practical computation of eigenvalues and eigenvectors never
resorts to the characteristic equation! The method is fraught with numerical traps and
inefficiencies when (a) computing the determinant leading to the characteristic equation,
then (b) solving the resulting polynomial equation, which is itself a nontrivial numerical
problem, [30], and, finally, (c) solving each of the resulting linear eigenvector systems.
Indeed, if we only know an approximation $\tilde\lambda$ to the true eigenvalue $\lambda$, the approximate
eigenvector system $(A - \tilde\lambda I)\,v = 0$ has a nonsingular coefficient matrix, and hence only
admits the trivial solution, which does not even qualify as an eigenvector! Nevertheless,
the characteristic equation does give us important theoretical insight into the structure
of the eigenvalues of a matrix, and can be used on small, e.g., $2 \times 2$ and $3 \times 3$, matrices,
when exact arithmetic is employed. Numerical algorithms for computing eigenvalues and
eigenvectors are based on completely different ideas, and will be discussed in Section 10.6.
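The point about approximate eigenvalues can be seen on the matrix of Example 8.5. In the following sketch, the perturbation `1e-8` is an assumed stand-in for a numerical error in the computed eigenvalue, and the helper names are hypothetical.

```python
def det_2x2(M):
    """Determinant of a 2x2 matrix given as a list of rows."""
    (a, b), (c, d) = M
    return a * d - b * c

def shifted(A, lam):
    """The matrix A - lam * I for a 2x2 matrix A."""
    return [[A[0][0] - lam, A[0][1]], [A[1][0], A[1][1] - lam]]

A = [[3.0, 1.0], [1.0, 3.0]]                   # matrix of Example 8.5, exact eigenvalue 4
exact_det = det_2x2(shifted(A, 4.0))           # singular: a nonzero kernel vector exists
approx_det = det_2x2(shifted(A, 4.0 + 1e-8))   # assumed small error in the eigenvalue
print(exact_det, approx_det)
```

With the perturbed value the determinant is small but nonzero, so the system $(A - \tilde\lambda I)\,v = 0$ has only the zero solution.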
Basic Properties of Eigenvalues
If $A$ is an $n \times n$ matrix, then its characteristic polynomial is
$$p_A(\lambda) = \det(A - \lambda I) = c_n \lambda^n + c_{n-1} \lambda^{n-1} + \cdots + c_1 \lambda + c_0. \tag{8.18}$$
The fact that $p_A(\lambda)$ is a polynomial of degree $n$ is a consequence of the general determinantal
formula (1.81). Indeed, every term is plus or minus a product of matrix entries
containing one from each row and one from each column. The term corresponding to the
identity permutation is obtained by multiplying the diagonal entries together, which,
in this case, is
$$(a_{11} - \lambda)(a_{22} - \lambda) \cdots (a_{nn} - \lambda) = (-1)^n \lambda^n + (-1)^{n-1} \bigl( a_{11} + a_{22} + \cdots + a_{nn} \bigr) \lambda^{n-1} + \cdots. \tag{8.19}$$
All of the other terms have at most $n - 2$ diagonal factors $a_{ii} - \lambda$, and so are polynomials
of degree $\le n - 2$ in $\lambda$. Thus, (8.19) is the only summand containing the monomials $\lambda^n$
and $\lambda^{n-1}$, and so their respective coefficients are
$$c_n = (-1)^n, \qquad c_{n-1} = (-1)^{n-1} (a_{11} + a_{22} + \cdots + a_{nn}) = (-1)^{n-1} \operatorname{tr} A, \tag{8.20}$$
where $\operatorname{tr} A$, the sum of its diagonal entries, is called the trace of the matrix $A$. The other
coefficients $c_{n-2}, \ldots, c_1$ in (8.18) are more complicated combinations of the entries of $A$.
However, setting $\lambda = 0$ implies $p_A(0) = \det A = c_0$, and hence the constant term equals the
determinant of the matrix. In particular, if $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is a $2 \times 2$ matrix, its characteristic
polynomial has the form
$$p_A(\lambda) = \det(A - \lambda I) = \det \begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix} = \lambda^2 - (a + d)\lambda + (a d - b c) = \lambda^2 - (\operatorname{tr} A)\lambda + (\det A). \tag{8.21}$$
As a result of these considerations, the characteristic equation of an $n \times n$ matrix $A$
is a polynomial equation of degree $n$, namely $p_A(\lambda) = 0$. According to the Fundamental
Theorem of Algebra (see Corollary 16.63) every (complex) polynomial of degree $n$ can be
completely factored:
$$p_A(\lambda) = (-1)^n (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_n). \tag{8.22}$$
The complex numbers $\lambda_1, \ldots, \lambda_n$, some of which may be repeated, are the roots of the
characteristic equation $p_A(\lambda) = 0$, and hence the eigenvalues of the matrix $A$. Therefore,
we immediately conclude:
Theorem 8.10. An $n \times n$ matrix $A$ has at least one and at most $n$ distinct complex
eigenvalues.
Most $n \times n$ matrices (meaning those for which the characteristic polynomial factors
into $n$ distinct factors) have exactly $n$ complex eigenvalues. More generally, an eigenvalue
$\lambda_j$ is said to have multiplicity $m$ if the factor $(\lambda - \lambda_j)$ appears exactly $m$ times in the
factorization (8.22) of the characteristic polynomial. An eigenvalue is simple if it has
multiplicity 1. In particular, $A$ has $n$ distinct eigenvalues if and only if all its eigenvalues are
simple. In all cases, when the eigenvalues are counted in accordance with their multiplicity,
every $n \times n$ matrix has a total of $n$ possibly repeated eigenvalues.
An example of a matrix with just one eigenvalue, of multiplicity $n$, is the $n \times n$ identity
matrix $I$, whose only eigenvalue is $\lambda = 1$. In this case, every nonzero vector in $\mathbb{R}^n$ is an
eigenvector of the identity matrix, and so the eigenspace is all of $\mathbb{R}^n$. At the other extreme,
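the bidiagonal Jordan block matrix
$$J_\lambda = \begin{pmatrix} \lambda & & & & \\ 1 & \lambda & & & \\ & 1 & \lambda & & \\ & & \ddots & \ddots & \\ & & & 1 & \lambda \end{pmatrix} \tag{8.23}$$
also has only one eigenvalue, $\lambda$, again of multiplicity $n$. But in this case, $J_\lambda$ has only
one eigenvector (up to scalar multiple), which is the standard basis vector $e_n$, and so its
eigenspace is one-dimensional.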
Remark: If $\lambda$ is a complex eigenvalue of multiplicity $k$ for the real matrix $A$, then its
complex conjugate $\overline{\lambda}$ also has multiplicity $k$. This is because complex conjugate roots of a
real polynomial necessarily appear with identical multiplicities.
Remark: If $n \le 4$, then one can, in fact, write down an explicit formula for the
solution to a polynomial equation of degree $n$, and hence explicit (but not particularly
helpful) formulae for the eigenvalues of general $2 \times 2$, $3 \times 3$ and $4 \times 4$ matrices. As soon
as $n \ge 5$, there is no explicit formula (at least in terms of radicals), and so one must
usually resort to numerical approximations. This remarkable and deep algebraic result
was proved by the young Norwegian mathematician Niels Henrik Abel in the early part of
the nineteenth century, [57].

If we explicitly multiply out the factored product (8.22) and equate the result to the
characteristic polynomial (8.18), we find that its coefficients $c_0, c_1, \ldots, c_{n-1}$ can be written
as certain polynomials of the roots, known as the elementary symmetric polynomials. The
first and last are of particular importance:
$$c_0 = \lambda_1 \lambda_2 \cdots \lambda_n, \qquad c_{n-1} = (-1)^{n-1} (\lambda_1 + \lambda_2 + \cdots + \lambda_n). \tag{8.24}$$
Comparison with our previous formulae for the coefficients $c_0$ and $c_{n-1}$ leads us to the
following useful result.
Proposition 8.11. The sum of the eigenvalues of a matrix equals its trace:
$$\lambda_1 + \lambda_2 + \cdots + \lambda_n = \operatorname{tr} A = a_{11} + a_{22} + \cdots + a_{nn}. \tag{8.25}$$
The product of the eigenvalues equals its determinant:
$$\lambda_1 \lambda_2 \cdots \lambda_n = \det A. \tag{8.26}$$
Remark: For repeated eigenvalues, one must add or multiply them in the formulae
(8.25), (8.26) according to their multiplicity.
Example 8.12. The matrix $A = \begin{pmatrix} 1 & 2 & 1 \\ 1 & -1 & 1 \\ 2 & 0 & 1 \end{pmatrix}$ considered in Example 8.7 has trace
and determinant
$$\operatorname{tr} A = 1, \qquad \det A = 3.$$
These fix, respectively, the coefficient of $\lambda^2$ and the constant term in the characteristic
equation. This matrix has two distinct eigenvalues, $-1$, which is a double eigenvalue, and
$3$, which is simple. For this particular matrix, formulae (8.25), (8.26) become
$$1 = \operatorname{tr} A = (-1) + (-1) + 3, \qquad 3 = \det A = (-1)(-1)\,3.$$
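The trace and determinant identities (8.25), (8.26) are convenient checks on a hand computation. The following Python sketch applies them to the matrix of Example 8.7, entered with the signs consistent with $\operatorname{tr} A = 1$ and $\det A = 3$; the helper names `trace` and `det_3x3` are hypothetical.

```python
import math

def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

def det_3x3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[1, 2, 1], [1, -1, 1], [2, 0, 1]]
eigenvalues = [-1, -1, 3]   # listed with multiplicity, as the Remark requires

trace_matches = trace(A) == sum(eigenvalues)
det_matches = det_3x3(A) == math.prod(eigenvalues)
print(trace(A), det_3x3(A), trace_matches, det_matches)
```

Listing the double eigenvalue only once would break both identities, which is a simple way to catch a forgotten multiplicity.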

8.3. Eigenvector Bases and Diagonalization.

Most of the vector space bases that play a distinguished role in applications consist
of eigenvectors of a particular matrix. In this section, we show that the eigenvectors for
any complete matrix automatically form a basis for $\mathbb{R}^n$ or, in the complex case, $\mathbb{C}^n$. In
the following subsection, we use the eigenvector basis to rewrite the linear transformation
determined by the matrix in a simple diagonal form.
The first task is to show that eigenvectors corresponding to distinct eigenvalues are
automatically linearly independent.
Lemma 8.13. If $\lambda_1, \ldots, \lambda_k$ are distinct eigenvalues of the same matrix $A$, then the
corresponding eigenvectors $v_1, \ldots, v_k$ are linearly independent.

Proof: We use induction on the number of eigenvalues. The case $k = 1$ is immediate
since an eigenvector cannot be zero. Assume that we know the result for $k - 1$ eigenvalues.
Suppose we have a linear combination
$$c_1 v_1 + \cdots + c_{k-1} v_{k-1} + c_k v_k = 0 \tag{8.27}$$
which vanishes. Let us multiply this equation by the matrix $A$:
$$\begin{aligned}
A \bigl( c_1 v_1 + \cdots + c_{k-1} v_{k-1} + c_k v_k \bigr) &= c_1 A v_1 + \cdots + c_{k-1} A v_{k-1} + c_k A v_k \\
&= c_1 \lambda_1 v_1 + \cdots + c_{k-1} \lambda_{k-1} v_{k-1} + c_k \lambda_k v_k = 0.
\end{aligned}$$
On the other hand, if we just multiply the original equation by $\lambda_k$, we also have
$$c_1 \lambda_k v_1 + \cdots + c_{k-1} \lambda_k v_{k-1} + c_k \lambda_k v_k = 0.$$
Subtracting this from the previous equation, the final terms cancel and we are left with
the equation
$$c_1 (\lambda_1 - \lambda_k) v_1 + \cdots + c_{k-1} (\lambda_{k-1} - \lambda_k) v_{k-1} = 0.$$
This is a vanishing linear combination of the first $k - 1$ eigenvectors, and so, by our
induction hypothesis, can only happen if all the coeffici