Applied Mathematics. OLVER SHAKIBAN
Applied Mathematics. OLVER SHAKIBAN
Peter J. Olver
School of Mathematics
University of Minnesota
Minneapolis, MN 55455
[email protected]
http://www.math.umn.edu/olver
Chehrzad Shakiban
Department of Mathematics
University of St. Thomas
St. Paul, MN 55105-1096
[email protected]
http://webcampus3.stthomas.edu/c9shakiban
Table of Contents
Chapter 1. Linear Algebra
1.1. Solution of Linear Systems
1.2. Matrices and Vectors
Basic Matrix Arithmetic
1.3. Gaussian Elimination Regular Case
Elementary Matrices
The L U Factorization
Forward and Back Substitution
1.4. Pivoting and Permutations
Permutation Matrices
The Permuted L U Factorization
1.5. Matrix Inverses
GaussJordan Elimination
Solving Linear Systems with the Inverse
The L D V Factorization
1.6. Transposes and Symmetric Matrices
Factorization of Symmetric Matrices
1.7. Practical Linear Algebra
Tridiagonal Matrices
Pivoting Strategies
1.8. General Linear Systems
Homogeneous Systems
1.9. Determinants
Chapter 5. Orthogonality
5.1. Orthogonal Bases
Computations in Orthogonal Bases
5.2. The GramSchmidt Process
A Modified GramSchmidt Process
5.3. Orthogonal Matrices
The Q R Factorization
5.4. Orthogonal Polynomials
2
Chapter 6. Equilibrium
6.1. Springs and Masses
The Minimization Principle
6.2. Electrical Networks
The Minimization Principle and the ElectricalMechanical Analogy
6.3. Structures in Equilibrium
Bars
Chapter 8. Eigenvalues
8.1. First Order Linear Systems of Ordinary Differential Equations
The Scalar Case
The Phase Plane
8.2. Eigenvalues and Eigenvectors
Basic Properties of Eigenvalues
8.3. Eigenvector Bases and Diagonalization
3
Diagonalization
8.4. Incomplete Matrices and the Jordan Canonical Form
8.5. Eigenvalues of Symmetric Matrices
The Spectral Theorem
Optimization Principles
8.6. Singular Values
Uniqueness
Adjoints and Boundary Conditions
Positive Definiteness and the Dirichlet Principle
15.5. Finite Elements
Finite Elements and Triangulation
The Finite Element Equations
Assembling the Elements
The Coefficient Vector and the Boundary Conditions
Inhomogeneous Boundary Conditions
Second Order Elliptic Boundary Value Problems
Plane Curves
Planar Domains
Vector Fields
Gradient and Curl
Integrals on Curves
Arc Length
Arc Length Integrals
Line Integrals of Vector Fields
Flux
A.6. Double Integrals
A.7. Greens Theorem
Appendix C. Series
C.1. Power Series
Taylors Theorem
C.2. Laurent Series
C.3. Special Functions
The Gamma Function
Series Solutions of Ordinary Differential Equations
Regular Points
10
11
Chapter 1
Linear Algebra
The source of linear algebra is the solution of systems of linear algebraic equations.
Linear algebra is the foundation upon which almost all applied mathematics rests. This is
not to say that nonlinear equations are less important; rather, progress in the vastly more
complicated nonlinear realm is impossible without a firm grasp of the fundamentals of
linear systems. Furthermore, linear algebra underlies the numerical analysis of continuous
systems, both linear and nonlinear, which are typically modeled by differential equations.
Without a systematic development of the subject from the start, we will be ill equipped
to handle the resulting large systems of linear equations involving many (e.g., thousands
of) unknowns.
This first chapter is devoted to the systematic development of direct algorithms for
solving systems of linear algegbraic equations in a finite number of variables. Our primary
focus will be the most important situation involving the same number of equations as
unknowns, although in Section 1.8 we extend our techniques to completely general linear
systems. While the former usually have a unique solution, more general systems more
typically have either no solutions, or infinitely many, and so tend to be of less direct physical
relevance. Nevertheless, the ability to confidently handle all types of linear systems is a
basic prerequisite for the subject.
The basic solution algorithm is known as Gaussian elimination, in honor of one of
the all-time mathematical greats the nineteenth century German mathematician Carl
Friedrich Gauss. As the father of linear algebra, his name will occur repeatedly throughout
this text. Gaussian elimination is quite elementary, but remains one of the most important
techniques in applied (as well as theoretical) mathematics. Section 1.7 discusses some
practical issues and limitations in computer implementations of the Gaussian elimination
method for large systems arising in applications.
The systematic development of the subject relies on the fundamental concepts of
scalar, vector, and matrix, and we quickly review the basics of matrix arithmetic. Gaussian elimination can be reinterpreted as matrix factorization, the (permuted) L U decomposition, which provides additional insight into the solution algorithm. Matrix inverses
and determinants are discussed in Sections 1.5 and 1.9, respectively. However, both play a
relatively minor role in practical applied mathematics, and so will not assume their more
traditional central role in this applications-oriented text.
Indirect algorithms, which are based on iteration, will be the subject of Chapter 10.
1/12/04
c 2003
Peter J. Olver
(1.1)
in three unknowns x, y, z. Linearity refers to the fact that the unknowns only appear to
the first power in the equations. The basic solution method is to systematically employ
the following fundamental operation:
Linear System Operation #1 : Add a multiple of one equation to another equation.
Before continuing, you should convince yourself that this operation does not change
the solutions to the system. As a result, our goal is to judiciously apply the operation
and so be led to a much simpler linear system that is easy to solve, and, moreover has the
same solutions as the original. Any linear system that is derived from the original system
by successive application of such operations will be called an equivalent system. By the
preceding remark, equivalent linear systems have the same solutions.
The systematic feature is that we successively eliminate the variables in our equations
in order of appearance. We begin by eliminating the first variable, x, from the second
equation. To this end, we subtract twice the first equation from the second, leading to the
equivalent system
x + 2 y + z = 2,
2 y z = 3,
(1.2)
x + y + 4 z = 3.
Next, we eliminate x from the third equation by subtracting the first equation from it:
x + 2 y + z = 2,
2 y z = 3,
y + 3 z = 1.
(1.3)
The equivalent system (1.3) is already simpler than the original system (1.1). Notice that
the second and third equations do not involve x (by design) and so constitute a system
of two linear equations for two unknowns. Moreover, once we have solved this subsystem
for y and z, we can substitute the answer into the first equation, and we need only solve
a single linear equation for x.
Also, there are no product terms like x y or x y z. The official definition of linearity will be
deferred until Chapter 7.
1/12/04
c 2003
Peter J. Olver
We continue on in this fashion, the next phase being the elimination of the second
variable y from the third equation by adding 12 the second equation to it. The result is
x + 2 y + z = 2,
2 y z = 3,
5
5
2 z = 2,
(1.4)
which is the simple system we are after. It is in what is called triangular form, which means
that, while the first equation involves all three variables, the second equation only involves
the second and third variables, and the last equation only involves the last variable.
Any triangular system can be straightforwardly solved by the method of Back Substitution. As the name suggests, we work backwards, solving the last equation first, which
requires z = 1. We substitute this result back into the next to last equation, which becomes 2 y 1 = 3, with solution y = 2. We finally substitute these two values for y and
z into the first equation, which becomes x + 5 = 2, and so the solution to the triangular
system (1.4) is
x = 3,
y = 2,
z = 1.
(1.5)
Moreover, since we only used our basic operation to pass from (1.1) to the triangular
system (1.4), this is also the solution to the original system of linear equations. We note
that the system (1.1) has a unique meaning one and only one solution, namely (1.5).
And that, barring a few complications that can crop up from time to time, is all that
there is to the method of Gaussian elimination! It is very simple, but its importance cannot
be overemphasized. Before discussing the relevant issues, it will help to reformulate our
method in a more convenient matrix notation.
1
1 0 3
e
2
( .2 1.6 .32 ),
,
,
2 4 1
1 .83
5 74
am2
...
a1n
a2n
..
.
amn
0
,
0
1
2
3
,
5
(1.6)
for a general matrix of size m n (read m by n), where m denotes the number of rows in
A and n denotes the number of columns. Thus, the preceding examples of matrices have
respective sizes 2 3, 4 2, 1 3, 2 1 and 2 2. A matrix is square if m = n, i.e., it
has the same number of rows as columns. A column vector is a m 1 matrix, while a row
1/12/04
c 2003
Peter J. Olver
vector is a 1 n matrix. As we shall see, column vectors are by far the more important of
the two, and the term vector without qualification will always mean column vector.
A 1 1 matrix, which has but a single entry, is both a row and column vector.
The number that lies in the ith row and the j th column of A is called the (i, j) entry
of A, and is denoted by aij . The row index always appears first and the column index
second . Two matrices are equal, A = B, if and only if they have the same size, and all
their entries are the same: aij = bij .
A general linear system of m equations in n unknowns will take the form
a11 x1 + a12 x2 + + a1n xn = b1 ,
= b2 ,
..
.
= bm .
(1.7)
As such, it has three basic constituents: the m n coefficient matrix A, with entries a ij as
x
1
x2
b
1
b2
b =
..
.
bm
xn
containing right hand sides. For instance, in our previous example (1.1),
x
2 1
sides.
Remark : We will consistently use bold face lower case letters to denote vectors, and
ordinary capital letters to denote general matrices.
Matrix Arithmetic
There are three basic operations in matrix arithmetic: matrix addition, scalar multiplication, and matrix multiplication. First we define addition of matrices. You are only
allowed to add two matrices of the same size, and matrix addition is performed entry by
In tensor analysis, [ 2 ], a sub- and super-script notation is adopted, with a ij denoting the
(i, j) entry of the matrix A. This has certain advantages, but, to avoid possible confusion with
powers, we shall stick with the simpler subscript notation throughout this text.
1/12/04
c 2003
Peter J. Olver
1
1
2
0
3 5
2 1
4
1
3
1
1 2
3 6
3
=
.
1 0
3 0
Basic properties of scalar multiplication are summarized at the end of this section.
Finally, we define matrix multiplication. First, the product between a row vector a
and a column vector x having the same number of entries is the scalar defined by the
following rule:
x
1
n
X
x2
a x = ( a 1 a2 . . . a n )
=
a
x
+
a
x
+
+
a
x
=
ak xk .
1 1
2 2
n n
..
.
k=1
xn
(1.8)
n
X
aik bkj .
(1.9)
k=1
Note that our restriction on the sizes of A and B guarantees that the relevant row and
column vectors will have the same number of entries, and so their product is defined.
For example, the product of the coefficient matrix A and vector of unknowns x for
our original system (1.1) is given by
x + 2y + z
1 2 1
x
A x = 2 6 1 y = 2 x + 6 y + z .
x + y + 4z
1 1 4
z
1/12/04
c 2003
Peter J. Olver
The result is a column vector whose entries reproduce the left hand sides of the original
linear system! As a result, we can rewrite the system
Ax = b
(1.10)
as an equality between two column vectors. This result is general; a linear system (1.7)
consisting of m equations in n unknowns can be written in the matrix form (1.10) where
A is the m n coefficient matrix (1.6), x is the n 1 column vectors of unknowns, and b
is the m 1 column vector containing the right hand sides. This is the reason behind the
non-evident definition of matrix multiplication. Component-wise multiplication of matrix
entries turns out to be almost completely useless in applications.
Now, the bad news. Matrix multiplication is not commutative. For example, BA may
not be defined even when A B is. Even if both are defined, they may be different sized
matrices. For example the product of a row vector r, a 1 n matrix, and a column vector
c, an n 1 matrix, is a 1 1 matrix or scalar s = r c, whereas the reversed product C = c r
is an n n matrix. For example,
3 6
3
3
.
(1 2) =
= 3,
whereas
(1 2)
0 0
0
0
In computing the latter product, dont forget that we multiply the rows of the first matrix
by the columns of the second. Moreover, even if the matrix products A B and B A have
the same size, which requires both A and B to be square matrices, we may still have
AB 6
= B A. For example,
1 2
0 1
2 5
3 4
0 1
1 2
=
=
6
=
.
3 4
1 2
4 11
5 6
1 2
3 4
On the other hand, matrix multiplication is associative, so A (B C) = (A B) C whenever A
has size m n, B has size n p and C has size p q; the result is a matrix of size m q.
The proof of this fact is left to the reader. Consequently, the one significant difference
between matrix algebra and ordinary algebra is that you need to be careful not to change
the order of multiplicative factors without proper justification.
Since matrix multiplication multiplies rows times columns, one can compute the
columns in a matrix product C = A B by multiplying the matrix A by the individual
columns of B. The k th column of C is equal to the product of A with the k th column of
B. For example, the two columns of the matrix product
3 4
1 1 2
1 4
0 2 =
2 0 2
8 6
1 1
are obtained by multiplying the first matrix with the individual columns of the second:
1
1
2
1
1 1 2
2 = 4 .
,
0 =
6
2 0 2
8
2 0 2
1
1
1/12/04
c 2003
Peter J. Olver
A B = A b1 b2 . . . b p = A b 1 A b 2 . . . A b p .
(1.11)
There are two important special matrices. The first is the zero matrix of size m n,
denoted Omn or just O if the size is clear from context. It forms the additive unit, so
A + O = A = O + A for any matrix A of the same size. The role of the multiplicative unit
is played by the square identity matrix
1 0 0 ... 0
0 1 0 ... 0
0 0 1 ... 0
I = In =
. . .
..
..
.. .. ..
.
.
0
0 0
...
of size nn. The entries of I along the main diagonal (which runs from top left to bottom
right) are equal to 1; the off-diagonal entries are all 0. As the reader can check, if A is any
m n matrix, then Im A = A = A In . We will sometimes write the last equation as just
I A = A = A I ; even though the identity matrices can have different sizes, only one size is
valid for each matrix product to be defined.
The identity matrix is a particular example of a diagonal matrix. In general, a matrix
is diagonal if all its off-diagonal entries are zero: aij = 0 for all i 6
= j. We will sometimes
write D = diag (c1 , . . . , cn ) for the n n diagonalmatrix with
diagonal entries dii = ci .
1 0 0
Thus, diag (1, 3, 0) refers to the diagonal matrix 0 3 0 , while the n n identity
0 0 0
matrix can be written as In = diag (1, 1, . . . , 1).
Let us conclude this section by summarizing the basic properties of matrix arithmetic.
In the following table, A, B, C are matrices, c, d scalars, O is a zero matrix, and I is an
identity matrix. The matrices are assumed to have the correct sizes so that the indicated
operations are defined.
a
a12 . . . a1n b1
11
..
.
. .
.
.
bn
am1 am2 . . . amn
1/12/04
c 2003
Peter J. Olver
A+B =B+A
(A + B) + C = A + (B + C)
A+O=A=O+A
c (d A) = (c d) A
A + ( A) = O, A = (1)A
1A=A
0A=O
c (A + B) = (c A) + (c B)
(c + d) A = (c A) + (d A)
(A B) C = A (B C)
Identity Matrix
Zero Matrix Matrix Multiplication
A I = A = IA
AO = O = OA
which is an m (n + 1) matrix obtained by tacking the right hand side vector onto the
original coefficient matrix. The extra vertical line is included just to remind us that the
last column of this matrix is special. For example, the augmented matrix for the system
(1.1), i.e.,
x + 2 y + z = 2,
1 2 1 2
2 x + 6 y + z = 7,
(1.13)
is
M = 2 6 1 7 .
3
1
1
4
x + y + 4 z = 3,
Note that one can immediately recover the equations in the original linear system from
the augmented matrix. Since operations on equations also affect their right hand sides,
keeping track of everything is most easily done through the augmented matrix.
For the time being, we will concentrate our efforts on linear systems that have the
same number, n, of equations as unknowns. The associated
matrix A is square,
coefficient
c 2003
Peter J. Olver
1 2
0 2
1 1
row of
1
1
4
2
(1.14)
3
3
that corresponds to the first equivalent system (1.2). When elementary row operation #1
is performed, it is critical that the result replace the row being added to not the row
being multiplied by the scalar. Notice that the elimination of a variable in an equation
in this case, the first variable in the second equation amounts to making its entry in
the coefficient matrix equal to zero.
We shall call the (1, 1) entry of the coefficient matrix the first pivot. The precise
definition of pivot will become clear as we continue; the one key requirement is that a
pivot be nonzero. Eliminating the first variable x from the second and third equations
amounts to making all the matrix entries in the column below the pivot equal to zero. We
have already done this with the (2, 1) entry in (1.14). To make the (3, 1) entry equal to
zero, we subtract the first row from the last row. The resulting augmented matrix is
1 2
1 2
0 2 1 3 ,
0 1 3 1
which corresponds to the system (1.3). The second pivot is the (2, 2) entry of this matrix,
which is 2, and is the coefficient of the second variable in the second equation. Again, the
pivot must be nonzero. We use the elementary row operation of adding 12 of the second
row to the third row to make the entry below the second pivot equal to 0; the result is the
augmented matrix
1 2 1 2
N = 0 2 1 3 .
0 0 52 25
that corresponds to the triangular system (1.4). We write the final augmented matrix as
N = U |c ,
where
U= 0
0
2 1
2 1 ,
0 52
c = 3 .
5
2
U x = c.
(1.15)
Its coefficient matrix U is upper triangular , which means that all its entries below the
main diagonal are zero: uij = 0 whenever i > j. The three nonzero entries on its diagonal,
1, 2, 25 , including the last one in the (3, 3) slot are the three pivots. Once the system has
been reduced to triangular form (1.15), we can easily solve it, as discussed earlier, by back
substitution.
1/12/04
c 2003
Peter J. Olver
next i
next j
end
The preceding algorithm for solving a linear system is known as regular Gaussian
elimination. A square matrix A will be called regular if the algorithm successfully reduces
it to upper triangular form U with all non-zero pivots on the diagonal. In other words,
for regular matrices, we identify each successive nonzero entry in a diagonal position as
the current pivot. We then use the pivot row to make all the entries in the column below
the pivot equal to zero through elementary row operations of Type #1. A system whose
coefficient matrix is regular is solved by first reducing the augmented matrix to upper
triangular form and then solving the resulting triangular system by back substitution.
Let us state this algorithm in the form of a program, written in a general pseudocode
that can be easily translated into any specific language, e.g., C++, Fortran, Java,
Maple, Mathematica or Matlab. We use a single letter M = (mij ) to denote the
Strangely, there is no commonly accepted term for these kinds of matrices. Our proposed
adjective regular will prove to be quite useful in the sequel.
1/12/04
10
c 2003
Peter J. Olver
1 0 0
1 2 1
1 2 1
1 0 0
E1 A = 2 1 0 2 6 1 = 0 2 1 ,
1 1 4
1 1 4
0 0 1
which you may recognize as the first elementary row operation we used
trative example. Indeed, if we set
1 0 0
1 0 0
1 0
E1 = 2 1 0 ,
E2 = 0 1 0 ,
E3 = 0 1
0 0 1
1 0 1
0 12
(1.16)
then multiplication by E1 will subtract twice the first row from the second row, multiplication by E2 will subtract the first row from the third row, and multiplication by E 3 will
add 21 the second row to the third row precisely the row operations used to place our
original system in triangular form. Therefore, performing them in the correct order (and
using the associativity of matrix multiplication), we conclude that when
1 2 1
1 2 1
A = 2 6 1,
then
E 3 E2 E1 A = U = 0 2 1 .
(1.17)
5
1 1 4
0 0 2
The reader should check this by directly multiplying the indicated matrices.
In general, then, the elementary matrix E of size m m will have all 1s on the
diagonal, a nonzero entry c in position (i, j), for some i 6
= j, and all other entries equal
to zero. If A is any m n matrix, then the matrix product E A is equal to the matrix
obtained from A by the elementary row operation adding c times row j to row i. (Note
the reversal of order of i and j.)
The elementary row operation that undoes adding c times row j to row i is the inverse
row operation that subtracts c (or, equivalently, adds c) times row j from row i. The
corresponding inverse elementary matrix again has 1s along the diagonal and c in the
(i, j) slot. Let us denote the inverses of the particular elementary matrices (1.16) by L i ,
so that, according to our general rule,
1 0 0
1 0 0
1 0 0
(1.18)
L1 = 2 1 0 ,
L2 = 0 1 0 ,
L3 = 0 1 0 .
1
0 0 1
1 0 1
0 2 1
Note that the product
Li Ei = I
1/12/04
11
(1.19)
c 2003
Peter J. Olver
is the 3 3 identity matrix, reflecting the fact that these are inverse operations. (A more
thorough discussion of matrix inverses will be postponed until the following section.)
The product of the latter three elementary matrices is equal to
1 0 0
L = L 1 L2 L3 = 2 1 0 .
(1.20)
1 21 1
The matrix L is called a special lower triangular matrix, where lower triangular means
that all the entries above the main diagonal are 0, while special indicates that all the
entries on the diagonal are equal to 1. Observe that the entries of L below the diagonal are
the same as the corresponding nonzero entries in the Li . This is a general fact, that holds
when the lower triangular elementary matrices are multiplied in the correct order. (For
instance, the product L3 L2 L1 is not so easily predicted.) More generally, the following
elementary consequence of the laws of matrix multiplication will be used extensively.
b are lower triangular matrices of the same size, so is their
Lemma 1.2. If L and L
b
product L L. If they are both special lower triangular, so is their product. Similarly, if
b are (special) upper triangular matrices, so is their product U U
b.
U, U
The L U Factorization
We have almost arrived at our first important result. Consider the product of the
matrices L and U in (1.17), (1.20). Using equation (1.19), along with the basic property
of the identity matrix I and associativity of matrix multiplication, we conclude that
L U = (L1 L2 L3 )(E3 E2 E1 A) = L1 L2 (L3 E3 )E2 E1 A = L1 L2 I E2 E1 A
= L1 (L2 E2 )E1 A = L1 I E1 A = L1 E1 A = I A = A.
In other words, we have factorized the coefficient matrix A = L U into a product of a
special lower triangular matrix L and an upper triangular matrix U with the nonzero
pivots on its main diagonal. The same holds true for almost all square coefficient matrices.
Theorem 1.3. A matrix A is regular if and only if it can be factorized
A = L U,
(1.21)
where L is a special lower triangular matrix, having all 1s on the diagonal, and U is upper
triangular with nonzero diagonal entries, which are its pivots. The nonzero off-diagonal
entries lij for i > j appearing in L prescribe the elementary row operations that bring
A into upper triangular form; namely, one subtracts lij times row j from row i at the
appropriate step of the Gaussian elimination process.
2 1 1
Example 1.4. Let us compute the L U factorization of the matrix A = 4 5 2 .
2 2 0
Applying the Gaussian elimination algorithm, we begin by subtracting twice the first row
from the second row, and then subtract the first row from the third. The result is the
1/12/04
12
c 2003
Peter J. Olver
1
1
3
0 . The next step adds the second row to the third row, leading to
3 1
2 1 1
the upper triangular matrix U = 0 3 0 , with its diagonal entries 2, 3, 1 indicat0 0 1
1 0 0
ing the pivots. The corresponding lower triangular matrix is L = 2 1 0 , whose
1 1 1
entries below the diagonal are the negatives of the multiples we used during the elimination
procedure. Namely, the (2, 1) entry of L indicates that we added 2 times the first row
to the second row; the (3, 1) entry indicates that we added 1 times the first row to the
third; and, finally, the (3, 2) entry indicates that we added the second row to the third
row during the algorithm. The reader might wish to verify the factorization A = L U , or,
explicitly,
1 0 0
2 1 1
2 1 1
4 5 2 = 2 1 00 3 0 .
1 1 1
0 0 1
2 2 0
2
matrix 0
0
(1.22)
for the vector c by forward substitution. This is the same as back substitution, except one
solves the equations for the variables in the direct order from first to last. Explicitly,
c1 = b1 ,
c i = bi
i
X
lij cj ,
for
i = 2, 3, . . . , n,
(1.23)
j =1
noting that the previously computed values of c1 , . . . , ci1 are used to determine ci .
(2) Second, solve the resulting upper triangular system
Ux=c
(1.24)
n
X
c
1
xn = n ,
xi =
ci
for
i = n 1, . . . , 2, 1,
uij xj ,
unn
uii
j = i+1
(1.25)
and
L c = b,
then
13
A x = L U x = L c = b.
c 2003
Peter J. Olver
Once we have found the L U factorization of the coefficient matrix A, the Forward and
Back Substitution processes quickly produce the solution, and are easy to program on a
computer.
Example 1.5. With the
2 1
4 5
2 2
L U decomposition
1 0 0
2 1
1
2 = 2 1 00 3
1 1 1
0 0
0
1
0
1
found in Example 1.4, we can readily solve any linear system with the given coefficient
matrix by Forward and Back Substitution. For instance, to find the solution to
1
2 1 1
x
4 5 2y = 2,
2
2 2 0
z
1
2
1
1
0 0
a
1 0b = 2,
2
1 1
c
or, explicitly,
a
2a + b
= 1,
= 2,
a b + c = 2.
The first equation says a = 1; substituting into the second, we find b = 0; the final equation
gives c = 1. We then solve the upper triangular system
2 1
0 3
0 0
1
x
a
1
0 y = b = 0,
1
z
c
1
which is
2 x + y + z = 1,
3y
= 0,
z = 1.
In turn, we find z = 1, then y = 0, and then x = 1, which is the unique solution to the
original system.
Of course, if we are not given the L U factorization in advance, we can just use direct
Gaussian elimination on the augmented matrix. Forward and Back Substitution is useful
if one has already computed the factorization by solving for a particular right hand side
b, but then later wants to know the solutions corresponding to alternative bs.
14
c 2003
Peter J. Olver
2 x + 6 y + z = 7,
x + 4 z = 3.
The augmented coefficient matrix is
0
2
1
3 1 2
6 1 7 .
0 4 3
In this case, the (1, 1) entry is 0, and is not a legitimate pivot. The problem, of course,
is that the first variable x does not appear in the first equation, and so we cannot use it
to eliminate x in the other two equations. But this problem is actually a bonus we
already have an equation with only two variables in it, and so we only need to eliminate x
from one of the other two equations. To be systematic, we rewrite the system in a different
order,
2 x + 6 y + z = 7,
3 y + z = 2,
x + 4 z = 3,
by interchanging the first two equations. In other words, we employ
Linear System Operation #2 : Interchange two equations.
Clearly this operation does not change the solution, and so produces an equivalent
system. In our case, the resulting augmented coefficient matrix is
2 6 1 7
0 3 1 2,
1 0 4 3
and is obtained from the original by performing the second type of row operation:
Elementary Row Operation #2 : Interchange two rows of the matrix.
The new nonzero upper left entry, 2, can now serve as the first pivot, and we may
continue to apply elementary row operations of Type #1 to reduce our matrix to upper
triangular form. For this particular example, we eliminate the remaining nonzero entry in
the first column by subtracting 12 the first row from the last:
2 6 1 7
0 3 1 2 .
0 3 72 12
The (2, 2) entry serves as the next pivot. To eliminate the nonzero entry below it, we add
the second to the third row:
2 6 1 7
0 3 1 2 .
0 0 29 23
1/12/04
15
c 2003
Peter J. Olver
next i
next j
end
We have now placed the system in upper triangular form, with the three pivots, 2, 3, 29
along the diagonal. Back substitution produces the solution x = 35 , y = 95 , z = 31 .
The row interchange that is required when a zero shows up on the diagonal in pivot
position is known as pivoting. Later, in Section 1.7, we shall discuss practical reasons for
pivoting even when a diagonal entry is nonzero. The coefficient matrices for which the
Gaussian elimination algorithm with pivoting produces the solution are of fundamental
importance.
Definition 1.6. A square matrix is called nonsingular if it can be reduced to upper
triangular form with all non-zero elements on the diagonal by elementary row operations
of Types 1 and 2. Conversely, a square matrix that cannot be reduced to upper triangular
form because at some stage in the elimination procedure the diagonal entry and all the
entries below it are zero is called singular .
Every regular matrix is nonsingular, but, as we just saw, nonsingular matrices are
more general. Uniqueness of solutions is the key defining characteristic of nonsingularity.
Theorem 1.7. A linear system A x = b has a unique solution for every choice of
right hand side b if and only if its coefficient matrix A is square and nonsingular.
We are able to prove the if part of this theorem, since nonsingularity implies reduction to an equivalent upper triangular form that has the same solutions as the original
system. The unique solution to the system is found by back substitution. The only if
part will be proved in Section 1.8.
The revised version of the Gaussian Elimination algorithm, valid for all nonsingular
coefficient matrices, is implemented
16
c 2003
Peter J. Olver
Permutation Matrices
As with the first type of elementary row operation, row interchanges can be accomplished by multiplication by a second type of elementary matrix. Again, the elementary
matrix is found by applying the row operation in question to the identity matrix of the
appropriate size. For instance, interchanging rows 1 and 2 of the 3 3 identity matrix
produces the elementary interchange matrix
0 1 0
P = 1 0 0.
0 0 1
As the reader can check, the effect of multiplying a 3 rowed matrix A on the left by P ,
producing P A, is the same as interchanging the first two rows of A. For instance,
4 5 6
1 2 3
0 1 0
1 0 04 5 6 = 1 2 3.
7 8 9
7 8 9
0 0 1
Definition 1.8. A permutation matrix is a matrix obtained from the identity matrix
by any combination of row interchanges.
In particular, applying a row interchange to a permutation matrix produces another
permutation matrix. The following result is easily established.
Lemma 1.9. A matrix P is a permutation matrix if and only if each row of P
contains all 0 entries except for a single 1, and, in addition, each column of P also contains
all 0 entries except for a single 1.
In general, if a permutation matrix P has a 1 in position (i, j), then the effect of
multiplication by P is to move the j th row of A into the ith row of the product P A.
matrices, namely
1 0
0 0 1
0 1, 1 0 0.
0 0
0 1 0
(1.27)
These have the following effects: if A is a matrix with row vectors r 1 , r2 , r3 , then multiplication on the left by each of the six permutation matrices produces
r3
r2
r1
r3
r2
r1
r1 ,
r3 ,
r3 ,
r2 ,
r1 ,
r2 ,
r2
r1
r2
r1
r3
r3
Example 1.10.
1 0 0
0
0 1 0, 1
0 0 1
0
0 0 , 0
0 1
1
different 3 3 permutation
0 1
1 0 0
0
1 0 , 0 0 1 , 0
0 0
0 1 0
1
respectively. Thus, the first permutation matrix, which is the identity, does nothing. The
second, third and fourth represent row interchanges. The last two are non-elementary
permutations; each can be realized as a pair of row interchanges.
1/12/04
17
c 2003
Peter J. Olver
(1.28)
(1.29)
1
0 0
2 6 1
0 1 0
0 2 1
1 0 02 6 1 = 0
(1.30)
1 00 2 1 .
1
9
1
1
0
0
0 0 1
1 1 4
2
2
As a result of these considerations, we have established the following generalization
of Theorem 1.3.
18
c 2003
Peter J. Olver
above. Explicitly, we first multiply the system A x = b by the permutation matrix, leading
to
b
P A x = P b b,
(1.31)
and
U x = c,
(1.32)
0
2
1
wish to solve
1
x
2 1
6 1 y = 2 .
0
z
1 4
In view of the P A = L U factorization established in (1.30), we need only solve the two auxiliary systems (1.32) by Forward and Back Substitution, respectively. The lower triangular
system is
1
0 0
0 1 0
1
2
a
0
1 0 b = 1 = 1 0 0 2 ,
1
0 0 1
0
0
1 1
c
2
with solution a = 2, b = 1,
2
0
0
2 1
y =
1
= b.
9
0 2
z
2
c
The solution, which is also the solution to the original system, is obtained by back substi5
, x = 37
tution, with z = 49 , y = 18
18 .
(1.33)
19
c 2003
Peter J. Olver
3
1 0 0
3 4 5
1 2 1
3 1 2 1 1 1 = 0 1 0 = 1
4
0 0 1
4 6 7
2 2 1
1 2 1
we conclude that when A = 3 1 2 then A1 =
2 2 1
1
entries of A do not follow any easily discernable pattern
1 2 1
4 5
1 1 3 1 2 ,
2 2 1
6 7
3 4 5
1 1 1 . Note that the
4 6 7
in terms of the entries of A.
Not every square matrix has an inverse. Indeed, not every scalar has an inverse the
one counterexample being a = 0. There is no general concept of inverse for rectangular
matrices.
x y
of a general 2 2 matrix
Example 1.15. Let us compute the inverse X =
z w
a b
A=
. The right inverse condition
c d
1 0
ax + bz ay + bw
= I
=
AX =
0 1
cx + dz cy + dw
holds if and only if x, y, z, w satisfy the linear system
a x + b z = 1,
a y + b w = 0,
c x + d z = 0,
c y + d w = 0.
d
,
ad bc
y=
b
,
ad bc
z=
c
,
ad bc
w=
a
,
ad bc
1
d b
X=
ad bc c a
forms a right inverse to A. However, a short computation shows that it also defines a left
inverse:
xa + yc xb + yd
1 0
XA =
=
= I,
za+wc zb+wd
0 1
and hence X = A1 is the inverse to A.
The denominator appearing in the preceding formulae has a special name; it is called
the determinant of the 2 2 matrix A, and denoted
a b
= a d b c.
(1.34)
det
c d
1/12/04
20
c 2003
Peter J. Olver
Thus, the determinant of a 2 2 matrix is the product of the diagonal entries minus
the product of the off-diagonal entries. (Determinants of larger square matrices will be
discussed in Section 1.9.) Thus, the 2 2 matrix A is invertible, with
1
d b
1
,
(1.35)
A =
ad bc c a
1
3
, then det A = 2 6
= 0. We conclude
if and only if det A 6
= 0. For example, if A =
2 4
!
3
1
4
3
2
that A has an inverse, which, by (1.35), is A1 =
=
.
1
2
1
2
1
2
1 0 0
1 0 0
E = 0 1 0,
then
E 1 = 0 1 0 .
2 0 1
2 0 1
This reflects the fact that the inverse of the elementary row operation that adds twice the
first row to the third row is the operation of subtracting twice the first row from the third
row.
0 1 0
Example 1.20. Let P = 1 0 0 denote the elementary matrix that has the
0 0 1
effect of interchanging rows 1 and 2 of a matrix. Then P 2 = I , since doing the same
1/12/04
21
c 2003
Peter J. Olver
operation twice in a row has no net effect. This implies that P 1 = P is its own inverse.
Indeed, the same result holds for all elementary permutation matrices that correspond to
row operations of type #2. However, it is not true for more general permutation matrices.
Lemma 1.21. If A and B are invertible matrices of the same size, then their product,
A B, is invertible, and
(A B)1 = B 1 A1 .
(1.36)
Note particularly the reversal in order of the factors.
Proof : Let X = B 1 A1 . Then, by associativity,
X (A B) = B 1 A1 A B = B 1 B = I ,
(A B) X = A B B 1 A1 = A A1 = I .
Thus X is both a left and a right inverse for the product matrix A B and the result
follows.
Q.E.D.
1 2
Example 1.22. One verifies, directly, that the inverse of A =
is A1 =
0
1
0 1
0 1
1 2
1
. Therefore, the
is B
=
, while the inverse of B =
0 1
1 0
1 0
1 2
0 1
2 1
inverse of thier product C = A B =
=
is given by C 1 =
0
1
1
0
1
0
0 1
1 2
0 1
1 1
B A =
=
.
1 0
0 1
1 2
We can straightforwardly generalize the preceding result. The inverse of a multiple
product of invertible matrices is the product of their inverses, in the reverse order :
1
1 1
(A1 A2 Am1 Am )1 = A1
m Am1 A2 A1 .
(1.37)
(1.38)
22
c 2003
Peter J. Olver
matrices, starting with only one of the conditions makes the logical development of the
subject considerably more difficult, and not really worth the extra effort. Once we have
established the basic properties of the inverse of a square matrix, we can then safely discard
the superfluous left inverse condition. Finally, when we generalize the notion of an inverse
to a linear operator in Chapter 7, then, unlike square matrices, we cannot dispense with
either of the conditions.
Let us write out the individual columns of the right inverse equation (1.38). The i th
column of the n n identity matrix I is the vector ei that has a single 1 in the ith slot
and 0s elsewhere, so
0
0
1
0
1
0
0
0
0
. .
,
. ,
(1.39)
.
.
.
e
=
e
=
e1 =
.
n
2
..
..
..
0
0
0
0
According to (1.11), the ith column of the matrix product A X is equal to A xi , where
xi denotes the ith column of X = ( x1 x2 . . . xn ). Therefore, the single matrix equation
(1.38) is equivalent to n linear systems
A x1 = e1 ,
A x 2 = e2 ,
...
A x n = en ,
(1.40)
0 2 1
Example 1.23. For example, to find the inverse of the matrix A = 2 6 1 ,
1 1 4
we form the large augmented matrix
0 2 1 1 0 0
2 6 1 0 1 0.
1 1 4 0 0 1
Applying the same sequence of elementary
change the rows
2 6 1
0 2 1
1 1 4
1/12/04
0 1 0
1 0 0,
0 0 1
23
c 2003
Peter J. Olver
2 6 1 0 1
0 2 1 1 0
0 2 72 0 12
Next we eliminate the entry below
2
0
0
pivot,
0
0.
1
6 1 0 1 0
2 1 1 0 0 .
0 92 1 12 1
At this
stage, we have reduced our augmented matrix to the upper triangular form
of elementary row operations to fully reduce the augmented matrix to the form
I | X in which the left hand n n matrix has become
the identity, while the right hand
pivots of U will produce a matrix of the form V | K where V is special upper triangular ,
meaning it has all 1s along the diagonal. In the particular example, the result of these
three elementary row operations of Type #3 is
1
1 3 12 0
0
2
0 1 12 12
0
0 ,
0 0 1 2 1 2
9
1
2
where we multiplied the first and second rows by and the third row by 29 .
We are now over half way towards our goal of an identity matrix on the left. We need
only make the entries above the diagonal equal to zero. This can be done by elementary
row operations of Type #1, but now we work backwards as in back substitution. First,
1/12/04
24
c 2003
Peter J. Olver
eliminate the nonzero entries in the third column lying above the (3, 3) entry; this is done
by subtracting one half the third row from the second and also from the first:
5
1
1 3 0 19
9
9
1
0 1 0 18
91 .
18
2
0 0 1 2
1
9
Finally, subtract
entry:
1
3
the second from the first to eliminate the remaining nonzero off-diagonal
1 0
0 23
18
7
0 18
2
1
7
18
1
18
19
0 1
0 0
9
The final right hand matrix is our desired inverse:
7
23
18
18
7
1
A1 = 18
18
2
9
19
2
9
19
2
9
2
9
19
2
9
thereby completing the GaussJordan procedure. The reader may wish to verify that the
final result does satisfy both inverse conditions A A1 = I = A1 A.
We are now able to complete the proofs of the basic results on inverse matrices. First,
we need to determine the elementary matrix corresponding to an elementary row operation
of type #3. Again, this is obtained by performing the indicated elementary row operation
on the identity matrix. Thus, the elementary matrix that multiplies row i by the nonzero
scalar c 6
= 0 is the diagonal matrix having c in the ith diagonal position, and 1s elsewhere
along the diagonal. The inverse elementary matrix is the diagonal matrix with 1/c in the
ith diagonal position and 1s elsewhere on the main diagonal; it corresponds to the inverse
operation that divides row i by c. For example, the elementary matrix that multiplies the
second row of a 3 n matrix by the scalar 5 is
1 0 0
1 0 0
E = 0 5 0,
and has inverse
E 1 = 0 15 0 .
0 0 1
0 0 1
The GaussJordan method tells us how to reduce any nonsingular square matrix A
to the identity matrix by a sequence of elementary row operations. Let E 1 , E2 , . . . , EN be
the corresponding elementary matrices. Therefore,
EN EN 1 E2 E1 A = I .
(1.41)
X = EN EN 1 E2 E1
(1.42)
is the inverse of A. Indeed, formula (1.41) says that X A = I , and so X is a left inverse.
Furthermore, each elementary matrix has an inverse, and so by (1.37), X itself is invertible,
with
1
1
X 1 = E11 E21 EN
(1.43)
1 EN .
1/12/04
25
c 2003
Peter J. Olver
0
1
1
3
0 1
by row operations corresponding to the matrices E1 =
, corresponding to a row
1 0
1 3
1 0
that
, scaling the second row by 1, and E3 =
interchange, E2 =
0 1
0 1
subtracts 3 times the second row from the first. Therefore,
3 1
0 1
1 0
1 3
1
,
=
A = E 3 E2 E1 =
1 0
1 0
0 1
0 1
For example, the 2 2 matrix A =
while
A=
0
1
1
0
1 0
0 1
1 3
0 1
0 1
1 3
26
c 2003
Peter J. Olver
Thus, with the inverse in hand, a more direct way to solve our example (1.26) is to
multiply the right hand side by the inverse matrix:
5
23
7
2
2
x
18
18
9
6
1
1 = 5 ,
7
y =
18
7
18
9
6
z
2
1
2
3
19
9
9
3
nonsingularity) the fully reduced form I | x , representing the trivial, equivalent system
I x = x, with the solution x to the original system in its final column. However, as we shall
see, back substitution is much more efficient, and is the method of choice in all practical
situations.
The L D V Factorization
The GaussJordan construction leads to a slightly more detailed version of the L U
factorization, which is useful in certain situations. Let D denote the diagonal matrix having
the same diagonal entries as U ; in other words, D has the pivots on its diagonal and zeros
everywhere else. Let V be the special upper triangular matrix obtained from U by dividing
each row by its pivot, so that V has all 1s on the diagonal. We already encountered V
during the course of the GaussJordan method. It is easily seen that U = D V , which
implies the following result.
Theorem 1.27. A matrix A is regular if and only if it admits a factorization
A = L D V,
(1.44)
where L is special lower triangular matrix, D is a diagonal matrix having the nonzero
pivots on the diagonal, and V is special upper triangular.
1/12/04
27
c 2003
Peter J. Olver
1 21
2 0 0
2 1 1
V = 0 1
D = 0 3 0 ,
U = 0 3 0 ,
0 0 1
0 0 1
0 0
1 0
2 1 1
4 5 2 = 2 1
1 1
2 2 0
0
2
00
1
0
0
3
0
0
1
0 0
1
0
1
2
1
0
1
2
1
2
0 ,
1
0 .
1
Proposition 1.28. If A = L U is regular, then the factors L and U are each uniquely
determined. The same holds for its A = L D V factorization.
eU
e . Since the diagonal entries of all four matrices are nonProof : Suppose L U = L
zero, Lemma 1.25 implies that they are invertible. Therefore,
e 1 L = L
e 1 L U U 1 = L
e 1 L
eU
e U 1 = U
e U 1 .
L
(1.45)
The left hand side of the matrix equation (1.45) is the product of two special lower triangular matrices, and so, according to Lemma 1.2, is itself special lower triangular with
1s on the diagonal. The right hand side is the product of two upper triangular matrices,
and hence is itself upper triangular. Comparing the individual entries, the only way such a
special lower triangular matrix could equal an upper triangular matrix is if they both equal
e 1 L = I = U
e U 1 , which implies that L
e=L
the diagonal identity matrix. Therefore, L
e
and U = U , and proves the result. The L D V version is an immediate consequence. Q.E.D.
As you may have guessed, the more general cases requiring one or more row interchanges lead to a permuted L D V factorization in the following form.
means that
28
bij = aji .
c 2003
Peter J. Olver
For example, if
A=
1 2
4 5
3
6
then
1
AT = 2
3
4
5.
6
Note that the rows of A are the columns of AT and vice versa. In particular, the transpose
of a row vector is a column vector, while the transpose of a column vector is a row vector.
1
For example, if
v = 2 ,
then
vT = ( 1 2 3 ).
3
1
2 1
1 3 2
3
0
5 = 2 0 4 .
2 4 8
1 5 8
In particular, the transpose of a lower triangular matrix is upper triangular and vice-versa.
Performing the transpose twice gets you back to where you started:
(AT )T = A.
(1.47)
Unlike the inverse, the transpose is compatible with matrix addition and scalar multiplication:
(1.48)
(A + B)T = AT + B T ,
(c A)T = c AT .
The transpose is also compatible with matrix multiplication, but with a twist. Like the
inverse, the transpose reverses the order of multiplication:
(A B)T = B T AT .
(1.49)
The proof of (1.49) is a straightforward consequence of the basic laws of matrix multiplication. An important special case is the product between a row vector v T and a column
vector w. In this case,
vT w = (vT w)T = wT v,
(1.50)
because the product is a scalar and so equals its own transpose.
Lemma 1.30. The operations of transpose and inverse commute. In other words, if
A is invertible, so is AT , and its inverse is
AT (AT )1 = (A1 )T .
1/12/04
29
(1.51)
c 2003
Peter J. Olver
Q.E.D.
A .
Thus, A is symmetric if and only if its entries satisfy aji = aij for all i, j. In other
words, entries lying in mirror image positions relative to the main diagonal must be
equal. For example, the most general symmetric 3 3 matrix has the form
a b c
A = b d e .
c e f
Note that any diagonal matrix, including the identity, is symmetric. A lower or upper
triangular matrix is symmetric if and only if it is, in fact, a diagonal matrix.
The L D V factorization of a nonsingular matrix takes a particularly simple form if
the matrix also happens to be symmetric. This result will form the foundation of some
significant later developments.
(1.52)
where L is a special lower triangular matrix and D is a diagonal matrix with nonzero
diagonal entries.
Proof : We already know, according to Theorem 1.27, that we can factorize
A = L D V.
(1.53)
We take the transpose of both sides of this equation and use the fact that the tranpose of
a matrix product is the product of the transposes in the reverse order, whence
AT = (L D V )T = V T DT LT = V T D LT ,
(1.54)
where we used the fact that a diagonal matrix is automatically symmetric, D T = D. Note
that V T is special lower triangular, and LT is special upper triangular. Therefore (1.54)
gives the L D V factorization of AT .
In particular, if A = AT , then we can invoke the uniqueness of the L D V factorization,
cf. Proposition 1.28, to conclude that L = V T , and V = LT , (which are two versions of
the same equation). Replacing V by LT in (1.53) proves the factorization (1.52). Q.E.D.
1/12/04
30
c 2003
Peter J. Olver
T
Example
1.33. Let
us find the L D L factorization of the particular symmetric
1 2 1
matrix A = 2 6 1 . This is done by performing the usual Gaussian elimination
1 1 4
algorithm. Subtracting twice
from the second and also the first row from the
the first row
1 2
1
third produces the matrix 0 2 1 . We then add one half of the second row of
0 1 3
the latter matrix to its third row, resulting in the upper triangular form
1 0 0
1 2 1
1 2 1
U = 0 2 1 = 0 2 0 0 1 12 = D V,
0 0 1
0 0 52
0 0 52
which we further factorize by dividing each row of U by its pivot. On the other
hand, the
1 0 0
special lower triangular matrix associated with the row operations is L = 2 1 0 ,
1 12 1
which, as guaranteed by Theorem 1.32, is the transpose of V = LT . Therefore, the desired
A = L U = L D LT factorizations of this particular symmetric matrix are
1 0 0
1 2 1
1 0 0
1 2 1
1 2 1
1 0 0
2 6 1 = 2 1 0 0 2 1 = 2 1 0 0 2 0 0 1 1 .
2
1 12 1
1 1 4
1 12 1
0 0 52
0 0 52
0 0 1
Example 1.34. Let us look at a general 2 2 symmetric matrix
a b
A=
.
b c
(1.55)
a
0
1 0
. Thus, A = L U . Finally, D =
L= b
a c b2 is just the diagonal part of
1
0
a
a
U , and we find U = D LT , so that the L D LT factorization is explicitly given by
!
!
!
a
0
b
1 0
a b
1
2
(1.56)
= b
a .
a
c
b
b c
1
0
0 1
a
a
Remark : If A = L D LT , then A is necessarily symmetric. Indeed,
AT = (L D LT )T = (LT )T DT LT = L D LT = A.
T
However, not every symmetric matrix has
an LD L factorization. A simple example is
0 1
the irregular but invertible 2 2 matrix
.
1 0
1/12/04
31
c 2003
Peter J. Olver
1/12/04
32
c 2003
Peter J. Olver
mik by mik lij mjk , and so must perform one multiplication and one addition. Therefore,
for the j th pivot there are a total of (n j)(n j + 1) multiplications including the
initial n j divisions needed to produce the lij and (n j)2 additions needed to update
the coefficient matrix. Therefore, to reduce a regular n n matrix to upper triangular
form requires a total of
n
X
j =1
(n j)(n j + 1) =
n
X
j =1
(n j)2 =
n3 n
3
multiplications, and
2 n3 3 n 2 + n
6
(1.57)
additions.
(1.58)
j =1
(n j) =
n2 n
2
(1.59)
multiplications and the same number of additions required to produce the right hand side
in the resulting triangular system U x = c. For large n, this number is considerably smaller
than the coefficient matrix totals (1.57), (1.58).
The next phase of the algorithm can be similarly analyzed. To find the value of
n
X
1
xj =
uji xi
cj
ujj
i=j+1
once we have computed xj+1 , . . . , xn , requires n j + 1 multiplications/divisions and n j
additions. Therefore, the Back Substitution phase of the algorithm requires
n
X
j =1
(n j + 1) =
n
X
n2 n
n2 + n
(n j) =
multiplications, and
additions. (1.60)
2
2
j =1
For n large, both of these are approximately equal to 12 n2 . Comparing these results, we
conclude that the bulk of the computational effort goes into the reduction of the coefficient
matrix to upper triangular form.
Forward substitution, to solve L c = b, has the same operations count, except that
since the diagonal entries of L are all equal to 1, no divisions are required, and so we use a
total of 21 (n2 n) multiplications and the same number of additions. Thus, once we have
computed the L U decomposition of the matrix A, the Forward and Back Substitution
process requires about n2 arithmetic operations of the two types, which is the same as the
In Exercise
1/12/04
33
c 2003
Peter J. Olver
Tridiagonal Matrices
p1
A=
r1
q2
p2
r2
q3
..
.
r3
..
.
pn2
..
qn1
pn1
rn1
qn
(1.61)
with all entries zero except for those on the main diagonal, ai,i = qi , on the subdiagonal ,
meaning the n 1 entries ai+1,i = pi immediately below the main diagonal, and the
superdiagonal , meaning the entries ai,i+1 = ri immediately above the main diagonal. (Zero
entries are left blank.) Such matrices arise in the numerical solution of ordinary differential
equations and the spline fitting of curves for interpolation and computer graphics. If
1/12/04
34
c 2003
Peter J. Olver
A = L U is regular, it turns out that the factors are lower and upper bidiagonal matrices,
d u
1
1
1
d 2 u2
l1 1
d 3 u3
l2 1
,
U
=
L=
.
.
.
.
.
.
.
.
.
.
.
.
dn1 un1
ln2
1
dn
ln1 1
(1.62)
Multiplying out L U , and equating the result to A leads to the equations
d1 = q 1 ,
u1 = r 1 ,
l 1 d1 = p 1 ,
l1 u1 + d 2 = q 2 ,
..
.
lj1 uj1 + dj = qj ,
u2 = r 2 ,
..
.
uj = r j ,
l 2 d2 = p 2 ,
..
.
l j dj = p j ,
..
.
..
.
ln2 un2 + dn1 = qn1 ,
ln1 un1 + dn = qn .
(1.63)
..
.
un1 = rn1 ,
These elementary algebraic equations can be successively solved for the entries of L and U
in the order d1 , u1 , l1 , d2 , u2 , l2 , d3 , u3 . . . . The original matrix A is regular provided none
of the diagonal entries d1 , d2 , . . . are zero, which allows the recursive procedure to proceed.
Once the L U factors are in place, we can apply Forward and Back Substitution to solve
the tridiagonal linear system A x = b. We first solve L c = b by Forward Substitution,
which leads to the recursive equations
c1 = b1 ,
c 2 = b2 l1 c1 ,
...
cn = bn ln1 cn1 .
(1.64)
cn
,
dn
xn1 =
cn1 un1 xn
,
dn1
...
x1 =
c 1 u 1 x2
.
d1
(1.65)
As you can check, there are a total of 5 n 4 multiplications/divisions and 3 n 3 additions/subtractions required to solve a general tridiagonal system of n linear equations
a striking improvement over the general case.
Example 1.35. Consider the n n tridiagonal matrix
4 1
1 4 1
1 4 1
1 4 1
A=
.
.
.
.. .. ..
1 4 1
1 4
1/12/04
35
c 2003
Peter J. Olver
in which the diagonal entries are all qi = 4, while the entries immediately above and below
the main diagonal are all pi = ri = 1. According to (1.63), the tridiagonal factorization
(1.62) has u1 = u2 = . . . = un1 = 1, while
d1 = 4,
lj = 1/dj ,
dj+1 = 4 lj ,
j = 1, 2, . . . , n 1.
dj
3.75
3.733333
3.732143
3.732057
3.732051
3.732051
lj
.25
.266666
.267857
.267942
.267948
.267949
.267949
dj 2 + 3 = 3.732051 . . . ,
3 = .2679492 . . . ,
which makes the factorization for large n almost trivial. The numbers 2 3 are the
roots of the quadratic equation x2 4 x + 1 = 0; an explanation of this observation will be
revealed in Chapter 19.
lj 2
Pivoting Strategies
Let us now consider the practical side of pivoting. As we know, in the irregular
situations when a zero shows up in a diagonal pivot position, a row interchange is required
to proceed with the elimination algorithm. But even when a nonzero element appear in
the current pivot position, there may be good numerical reasons for exchanging rows in
order to install a more desirable element in the pivot position. Here is a simple example:
.01 x + 1.6 y = 32.1,
x + .6 y = 22.
(1.66)
The exact solution to the system is x = 10, y = 20. Suppose we are working with a
very primitive calculator that only retains 3 digits of accuracy. (Of course, this is not
a very realistic situation, but the example could be suitably modified to produce similar
difficulties no matter how many digits of accuracy our computer retains.) The augmented
matrix is
32.1
.01
1.6
.
0 159.4 3188
Since our calculator has only threeplace accuracy, it will round the entries in the second
row, producing the augmented coefficient matrix
32.1
.01
1.6
.
0 159.0 3190
1/12/04
36
c 2003
Peter J. Olver
for k = j + 1 to n + 1
set m(i)k = m(i)k l(i)j m(j)k
next k
next i
next j
end
3190
= 20.0628 . . . ' 20.1, and then
159
x = 100 (32.1 1.6 y) = 100 (32.1 32.16) ' 100 (32.1 32.2) = 10. The relatively small
error in y has produced a very large error in x not even its sign is correct!
The problem is that the first pivot, .01, is much smaller than the other element, 1,
that appears in the column below it. Interchanging the two rows before performing the row
operation would resolve the difficulty even with such an inaccurate calculator! After
the interchange, we have
1
.6 22
,
.01 1.6 32.1
The solution by back substitution gives y =
1
.6 22
1
.6
'
0 1.594 31.88
0 1.59
22
31.9 .
The general strategy, known as Partial Pivoting, says that at each stage, we should
use the largest legitimate (i.e., lying on or below the diagonal) element as the pivot, even
1/12/04
37
c 2003
Peter J. Olver
x + .6 y = 22,
obtained by multiplying the first equation in (1.66) by 1000. The tip-off is that, while the
entries in the column containing the pivot are smaller, those in its row are much larger. The
solution to this difficulty is Full Pivoting, in which one also performs column interchanges
preferably with a column pointer to move the largest legitimate element into the
pivot position. In practice, a column interchange is just a reordering of the variables in
the system, which, as long as one keeps proper track of the order, also doesnt change the
solutions.
Finally, there are some matrices that are hard to handle even with pivoting tricks.
Such ill-conditioned matrices are typically characterized by being almost singular . A
famous example of an ill-conditioned matrix is the n n Hilbert matrix
1
1
1
1
...
1
2
3
4
n
1
1
1
1
.
.
.
3
4
5
n+1
2
1
1
1
1
1
.
.
.
3
4
5
6
n+2
.
Hn = 1
(1.67)
1
1
1
1
...
4
5
6
7
n+3
..
..
..
..
..
..
.
.
.
.
.
1
1
1
1
1
...
n n+1 n+2 n+3
2n 1
In Proposition 3.36 we will prove that Hn is nonsingular for all n. However, the solution of
a linear system whose coefficient matrix is a Hilbert matrix Hn , even for moderately sized
n, is a very challenging problem, even if one uses high precision computer arithmetic .
This can be quantified by saying that their determinant is very small, but non-zero; see also
Sections 8.5 and 10.3.
In computer algebra systems such as Maple or Mathematica, one can use exact rational
arithmetic to perform the computations. Then the important issues are time and computational
efficiency.
1/12/04
38
c 2003
Peter J. Olver
This is because the larger n is, the closer Hn is, in a sense, to being singular.
The reader is urged to try the following computer experiment. Fix a moderately large
value of n, say 20. Choose a column vector x with n entries chosen at random. Compute
b = Hn x directly. Then try to solve the system Hn x = b by Gaussian Elimination. If
it works for n = 20, try n = 50 or 100. This will give you a good indicator of the degree
of precision used by your computer program, and the accuracy of the numerical solution
algorithm.
...
0
0 ...
0
0
0 ...
0
0
..
..
.. . .
..
.
.
.
.
.
U =
0
0 ...
0
0
0
0 ...
0
0
..
..
..
.. . .
.
.
.
.
.
0
0 ...
0
0
...
...
...
...
...
...
...
...
...
...
..
.
...
..
...
...
...
...
..
.
...
0
..
.
0
..
.
...
..
.
...
...
..
.
...
..
.
..
..
.
...
0
..
.
0
..
.
0
..
.
0
..
.
...
..
.
...
..
.
..
.
..
3 1
0 2 5 1
0 1 2 1 8 0
0 0
0 0 2 4
0 0
0 0 0 0
1/12/04
39
c 2003
Peter J. Olver
The three pivots, which are the first three nonzero entries in the nonsero rows, are, respectively, 3, 1, 2. There may, in exceptional situations, be one or more initial all zero
columns.
Proposition 1.37. Any matrix can be reduced to row echelon form by a sequence
of elementary row operations of Types #1 and #2.
In matrix language, Proposition 1.37 implies that if A is any m n matrix, then there
exists an m m permutation matrix P and an m m special lower triangular matrix L
such that
P A = L U,
(1.68)
where U is in row echelon form. The factorization is not unique.
A constructive proof of this result is based on the general Gaussian elimination algorithm, which proceeds as follows. Starting at the top left of the matrix, one searches for
the first column which is not identically zero. Any of the nonzero entries in that column
may serve as the pivot. Partial pivoting indicates that it is probably best to choose the
largest one, although this is not essential for the algorithm to proceed. One places the
chosen pivot in the first row of the matrix via a row interchange, if necessary. The entries
below the pivot are made equal to zero by the appropriate elementary row operations of
Type #1. One then proceeds iteratively, performing the same reduction algorithm on the
submatrix consisting of all entries strictly to the right and below the pivot. The algorithm
terminates when either there is a pivot in the last row, or all of the rows lying below the
last pivot are identically zero, and so no more pivots can be found.
Example 1.38. Let us illustrate the general Gaussian Elimination algorithm with a
particular example. Consider the linear system
x + 3y + 2z u
= a,
2 x + 6 y + z + 4 u + 3 v = b,
(1.69)
x 3 y 3 z + 3 u + v = c,
3 x + 9 y + 8 z 7 u + 2 v = d,
1
3
2 1 0
6
1
4 3
2
A=
(1.70)
.
1 3 3 3 1
3
9
8 7 2
To solve the system, we introduce the augmented matrix
1
3
2 1 0 a
6
1
4 3 b
2
1 3 3 3 1 c
3
9
8 7 2
d
It will be convenient to work with the right hand side in general form, although the reader
may prefer, at least initially, to assign specific values to a, b, c, d.
1/12/04
40
c 2003
Peter J. Olver
obtained by appending the right hand side of the system. The upper left entry is nonzero,
and so can serve as the first pivot; we eliminate the entries below it by elementary row
operations, resulting in
1
0
0
0
3 2
0 3
0 1
0 2
1
6
2
4
0
3
1
2
b 2a
.
c+a
d 3a
Now, the second column contains no suitable nonzero entry to serve as the second pivot.
(The top entry already lies in a row with a pivot in it, and so cannot be used.) Therefore,
we move on to the third column, choosing the (2, 3) entry, 3, as our second pivot. Again,
we eliminate the entries below it, leading to
a
1 3 2 1 0
0 0 3 6 3
b 2a
1
5
0 0 0
0 0 c 3 b + 3 a .
0 0 0
0 4 d + 32 b 13
3 a
The final pivot is in the last column, and we interchange the last two rows in order to
place the coefficient matrix in row echelon form:
a
1 3 2 1 0
0 0 3 6 3
b 2a
(1.71)
2
13
0 0 0
0 4 d + 3 b 3 a .
0 0 0
0 0 c 13 b + 35 a
There are three pivots, 1, 3, 4, sitting in positions (1, 1), (2, 3) and (3, 5). Note the
staircase form, with the pivots on the steps and everything below the staircase being zero.
Recalling the row operations used to construct the solution (and keeping in mind that
the row interchange that appears at the end also affects the entries of L), we find the
factorization (1.68) has the explicit form
1
0
0
0
0
1
0
0
0
0
0
1
0
1
3
0 2
6
1
1 3
0
3
9
2
1
3
8
1 0
1
4 3 2
=
3 1
3
1
7 2
0
1
32
1
3
0
0
1
0
1
0
0 0
0
0
1
0
3
0
0
0
2
3
0
0
1 0
6 3
0 4
0 0
We shall return to find the solution to our system after a brief theoretical interlude.
Warning: In the augmented matrix, pivots can never appear in the last column,
= 0, that entry
representing the right hand side of the system. Thus, even if c 31 b + 35 a 6
does not qualify as a pivot.
We now introduce the most important numerical quantity associated with a matrix.
Definition 1.39. The rank of a matrix A is the number of pivots.
1/12/04
41
c 2003
Peter J. Olver
For instance, the rank of the matrix (1.70) equals 3, since its reduced row echelon
form, i.e., the first five columns of (1.71), has three pivots. Since there is at most one pivot
per row and one pivot per column, the rank of an m n matrix is bounded by both m and
n, and so 0 r min{m, n}. The only matrix of rank 0 is the zero matrix, which has no
pivots.
Proposition 1.40. A square matrix of size n n is nonsingular if and only if its
rank is equal to n.
Indeed, the only way an n n matrix can end up having n pivots is if its reduced row
echelon form is upper triangular with nonzero diagonal entries. But a matrix that reduces
to such triangular form is, by definition, nonsingular.
Interestingly, the rank of a matrix does not depend on which elementary row operations are performed along the way to row echelon form. Indeed, performing a different
sequence of row operations say using partial pivoting versus no pivoting can produce
a completely different row echelon form. The remarkable fact, though, is that all such row
echelon forms end up having exactly the same number of pivots, and this number is the
rank of the matrix. A formal proof of this fact will appear in Chapter 2.
Once the coefficient matrix has been reduced to row echelon form, the solution proceeds as follows. The first step is to see if there are any incompatibilities. Suppose one
of the rows in the row echelon form of the coefficient matrix is identically zero, but the
corresponding entry in the last column of the augmented matrix is nonzero. What linear
equation would this represent? Well, the coefficients of all the variables are zero, and so
the equation is of the form 0 = c, where c, the number on the right hand side of the
equation, is the entry in the last column. If c 6
= 0, then the equation cannot be satisfied.
Consequently, the entire system has no solutions, and is an incompatible linear system.
On the other hand, if c = 0, then the equation is merely 0 = 0, and so is trivially satisfied.
For example, the last row in the echelon form (1.71) is all zero, and hence the last entry in
the final column must also vanish in order that the system be compatible. Therefore, the
linear system (1.69) will have a solution if and only if the right hand sides a, b, c, d satisfy
the linear constraint
1
5
(1.72)
3 a 3 b + c = 0.
In general, if the system is incompatible, there is nothing else to do. Otherwise,
every zero row in the echelon form of the augmented matrix also has a zero entry in the
last column, and the system is compatible, so one or more solutions exist. To find the
solution(s), we work backwards, starting with the last row that contains a pivot. The
variables in the system naturally split into two classes.
Definition 1.41. In a linear system U x = c in row echelon form, the variables
corresponding to columns containing a pivot are called basic variables, while the variables
corresponding to the columns without a pivot are called free variables.
The solution to the system proceeds by a version of the Back Substitution procedure.
The nonzero equations are solved, in reverse order, for the basic variable corresponding to
its pivot. Each result is substituted into the preceding equations before they in turn are
1/12/04
42
c 2003
Peter J. Olver
solved. The remaining free variables, if any, are allowed to take on any values whatsoever,
and the solution then specifies all the basic variables in terms of the free variables, which
serve to parametrize the general solution.
Example 1.42. Let us illustrate this construction with our particular example.
Assuming the compatibility condition (1.72), the reduced augmented matrix (1.71) is
a
1 3 2 1 0
0 0 3 6 3
b 2a
2
13
0 0 0
0 4 d + 3 b 3 a .
0 0 0
0 0
0
The pivots are found in columns 1, 3, 5, and so the corresponding variables, x, z, v, are
basic; the other variables, y, u, are free. We will solve the reduced system for the basic
variables in terms of the free variables.
As a specific example, the values a = 0, b = 3, c = 1, d = 1, satisfy the compatibility constraint (1.72). The resulting augmented echelon matrix (1.71) corresponds to the
system
x + 3y + 2z u
= 0,
3 z + 6 u + 3 v = 3,
4 v = 3,
0 = 0.
We now solve the equations, in reverse order, for the basic variables, and then substitute
the resulting values in the preceding equations. The result is the general solution
v = 34 ,
z = 1 + 2u + v = 14 + 2 u,
x = 3y 2z + u =
1
2
3 y 3 u.
The free variables y, u are completely arbitrary; any value they assume will produce a
solution to the original system. For instance, if y = 2, u = 1 , then x = 3 + 72 ,
z = 47 2 , v = 43 . But keep in mind that this is merely one of an infinite number of
different solutions.
In general, if the m n coefficient matrix of a system of m linear equations in n
unknowns has rank r, there are m r all zero rows in the row echelon form, and these
m r equations must have zero right hand side in order that the system be compatible and
have a solution. Moreover, there are a total of r basic variables and n r free variables,
and so the general solution depends upon n r parameters.
Summarizing the preceding discussion, we have learned that there are only three
possible outcomes for the solution to a general linear system.
Theorem 1.43. A system A x = b of m linear equations in n unknowns has either
(i ) exactly one solution, (ii ) no solutions, or (iii ) infinitely many solutions.
Case (ii ) occurs if the system is incompatible, producing a zero row in the echelon
form that has a nonzero right hand side. Case (iii ) occurs if the system is compatible and
there are one or more free variables. This happens when the system is compatible and the
rank of the coefficient matrix is strictly less than the number of columns: r < n. Case
1/12/04
43
c 2003
Peter J. Olver
(i ) occurs for nonsingular square coefficient matrices, and, more generally, for compatible
systems for which r = n, implying there are no free varaibles. Since r m, this case can
only arise if the coefficient matrix has at least as many rows as columns, i.e., the linear
system has at least as many equations as unknowns.
A linear system can never have a finite number other than 0 or 1 of solutions.
Thus, any linear system that has more than one solution automatically has infinitely many.
This result does not apply to nonlinear systems. As you know, a real quadratic equation
a x2 + b x + c = 0 can have either 2, 1, or 0 real solutions.
Example 1.44. Consider the linear system
y + 4 z = a,
3 x y + 2 z = b,
x + y + 6 z = c,
0 1 4 a
3 1 2 b .
1 1 6 c
Interchanging the first two rows, and then eliminating the elements below the first pivot
leads to
b
3 1 2
0 1
a .
4
16
4
c 13 b
0 3
3
The second pivot is in the (2, 2) position, but after eliminating the entry below it, we find
the row echelon form to be
3 1 2
b
0 1 4
.
a
1
4
0 0 0
c 3b 3a
Since we have a row of all zeros, the original coefficient matrix is singular, and its rank is
only 2.
The compatibility condition for the system follows from this last row in the reduced
echelon form, and so requires
1
4
3 a + 3 b c = 0.
If this is not satisfied, the system has no solutions; otherwise it has infinitely many. The
free variable is z, since there is no pivot in the third column. The general solution is
y = a 4 z,
x=
1
3
b + 31 y 32 z =
1
3
a + 13 b 2z,
where z is arbitrary.
Geometrically, Theorem 1.43 is indicating something about the possible configurations
of linear subsets (lines, planes, etc.) of an n-dimensional space. For example, a single linear
equation a x + b y + c z = d defines a plane P in three-dimensional space. The solutions to
a system of three linear equations in three unknowns is the intersection P 1 P2 P3 of
three planes. Generically, three planes intersect in a single common point; this is case (i )
1/12/04
44
c 2003
Peter J. Olver
No Solution
Unique Solution
Figure 1.1.
Infinite # Solutions
Intersecting Planes.
of the theorem, and occurs if and only if the coefficient matrix is nonsingular. The case of
infinitely many solutions occurs when the three planes intersect on a common line, or, even
more degenerately, when they all coincide. On the other hand, parallel planes, or planes
intersecting in parallel lines, have no common point of intersection, and this corresponds
to the third case of a system with no solutions. Again, no other possibilities occur; clearly
one cannot have three planes having exactly 2 points in their common intersection it is
either 0, 1 or . Some possible geometric configurations are illustrated in Figure 1.1.
Homogeneous Systems
A linear system with all 0s on the right hand side is called a homogeneous system. In
matrix notation, a homogeneous system takes the form
A x = 0.
(1.73)
Homogeneous systems are always compatible, since x = 0 is a solution, known as the trivial
solution. If the homogeneous system has a nontrivial solution x 6
= 0, then Theorem 1.43
assures that it must have infinitely many solutions. This will occur if and only if the
reduced system has one or more free variables. Thus, we find:
Theorem 1.45. A homogeneous linear system A x = 0 of m equations in n unknowns has a nontrivial solution x 6
= 0 if and only if the rank of A is r < n. If m < n, the
system always has a nontrivial solution. If m = n, the system has a nontrivial solution if
and only if A is singular.
Example 1.46. Consider the homogeneous linear system
2 x1 + x2 + 5 x4 = 0,
with coefficient matrix
4 x1 + 2 x2 x3 + 8 x4 = 0,
A=
4
2
1
2
1
2 x1 x2 + 3 x3 4 x4 = 0,
0
5
1 8 .
3 4
Since the system is homogeneous and has fewer equations than unknowns, Theorem 1.45
assures us that it has infinitely many solutions, including the trivial solution x 1 = x2 =
1/12/04
45
c 2003
Peter J. Olver
2 1 0
5
0 0 1 2 .
0 0 3
1
The (2, 3) entry is the second pivot, and we apply one final row operation to place the
matrix in row echelon form
2 1 0
5
0 0 1 2 .
0 0 0 5
This corresponds to the reduced homogeneous system
2 x1 + x2 + 5 x4 = 0,
x3 2 x4 = 0,
5 x4 = 0.
Since there are three pivots in the final row echelon form, the rank of the matrix A is
3. There is one free variable, namely x2 . Using Back Substitution, we easily obtain the
general solution
x1 = 21 t,
x2 = t,
x3 = x4 = 0,
which depends upon a single free parameter t = x2 .
Example 1.47. Consider the homogeneous linear system
2 x y + 3 z = 0,
4 x + 2 y 6 z = 0,
2 x y + z = 0,
6 x 3 y + 3 z = 0,
2 1 3
4 2 6
with coefficient matrix A =
. The system admits the trivial solution
2 1 1
6 3 3
x = y = z = 0, but in this case we need to complete the elimination algorithm before we
can state whether or not there are other solutions. After the first stage, the coefficient
2 1 3
0
0 0
matrix has the form
. To continue, we need to interchange the second and
0 0 2
0 0 6
third rows to place a nonzero entry in the final pivot position; after that the reduction to
2 1 3
2 1 3
0 0 2
0 0 2
row echelon form is immediate:
7
. Thus, the system
0 0
0
0 0
0
0 0 6
0 0
0
reduces to the equations
2 x y + 3 z = 0,
1/12/04
2 z = 0,
46
0 = 0,
0 = 0,
c 2003
Peter J. Olver
where the third and fourth equations are trivially compatible, as they must be in the
homogeneous case. The rank is equal to two, which is less than the number of columns,
and so, even though the system has more equations than unknowns, it has infinitely many
solutions. These can be written in terms of the free variable y, and so the general solution
is x = 12 y, z = 0, where y is arbitrary.
1.9. Determinants.
You may be surprised that, so far, we have left undeveloped a topic that often assumes a central role in basic linear algebra: determinants. As with matrix inverses, while
determinants can be useful in low dimensions and for theoretical purposes, they are mostly
irrelevant when it comes to large scale applications and practical computations. Indeed,
the best way to compute a determinant is (surprise) Gaussian Elimination! However,
you should be familiar with the basics of determinants, and so for completeness, we shall
provide a very brief introduction.
The determinant of a square matrix A, written det A, is a number that immediately
tells whether the matrix is singular or not. (Rectangular matrices do not have determinants.) We already encountered, (1.34), the determinant of a 2 2 matrix, which is
equal
to the
product of the diagonal entries minus the product of the off-diagonal entries:
a b
det
= a d b c. The determinant is nonzero if and only if the matrix has an
c d
inverse. Our goal is to generalize this construction to general square matrices.
There are many different ways to define determinants. The difficulty is that the actual formula is very unwieldy see (1.81) below and not well motivated. We prefer an
axiomatic approach that explains how our elementary row operations affect the determinant. In this manner, one can compute the determinant by Gaussian elimination, which
is, in fact, the fastest and most practical computational method in all but the simplest
situations. In effect, this remark obviates the need to ever compute a determinant.
Theorem 1.48. The determinant of a square matrix A is the uniquely defined scalar
quantity det A that satisfies the following axioms:
(1) Adding a multiple of one row to another does not change the determinant.
(2) Interchanging two rows changes the sign of the determinant.
(3) Multiplying a row by any scalar (including zero) multiplies the determinant by the
same scalar.
(4) Finally, the determinant function is fixed by setting
det I = 1.
(1.74)
Checking that all four of these axioms hold in the 2 2 case (1.34) is left as an
elementary exercise for the reader. A particular consequence of axiom 3 is that when we
multiply a row of any matrix A by the zero scalar, the resulting matrix, which has a row
of all zeros, necessarily has zero determinant.
Lemma 1.49. Any matrix with one or more all zero rows has zero determinant.
1/12/04
47
c 2003
Peter J. Olver
Since the determinantal axioms tell how determinants behave under all three of our
elementary row operations, we can use Gaussian elimination to compute a general determinant, recovering det A from its permuted L U factorization.
Theorem 1.50. If A is a regular matrix, with A = L U factorization as in (1.21),
then
det A = det U =
n
Y
uii
(1.75)
i=1
equals the product of the pivots. More generally, if A is nonsingular, and requires k row
interchanges to arrive at its permuted L U factorization P A = L U , then
det A = det P det U = (1)
n
Y
uii .
(1.76)
i=1
Finally, we can reduce V to the identity by further row operations of Type #1, and so by
(1.74),
det V = det I = 1.
(1.78)
Combining equations (1.77), (1.78) proves the theorem for the regular case. The nonsingular case follows without difficulty each row interchange changes the sign of the
determinant, and so det A equals det U if there have been an even number of interchanges,
but equals det U if there have been an odd number.
Finally, if A is singular, then we can reduce it to a matrix with at least one row of
zeros by elementary row operations of types #1 and #2. Lemma 1.49 implies that the
resulting matrix has zero determinant, and so det A = 0, also.
Q.E.D.
Corollary 1.51. The determinant of a diagonal matrix is the product of the diagonal
entries. The same result holds for both lower triangular and upper triangular matrices.
Example 1.52. Let us compute the determinant of the 4 4 matrix
1
2
A=
0
1
1/12/04
0
1
2
1
48
1
3
2
4
2
4
.
3
2
c 2003
Peter J. Olver
We perform our usual Gaussian Elimination algorithm, successively leading to the matrices
1 0 1 2
1 0 1 2
1 0 1 2
0 1 1 0
0 1 1 0
0 1 1 0
A 7
,
7
7
0 0 2 4
0 0 0
3
0 2 2 3
0 0 0
3
0 0 2 4
0 1 3 4
where we used a single row interchange to obtain the final upper triangular form. Owing
to the row interchange, the determinant of the original matrix is 1 times the product of
the pivots:
det A = 1 1 1 ( 2) 3 = 6.
In particular, this tells us that A is nonsingular. But, of course, this was already implied
by the elimination, since the matrix reduced to upper triangular form with 4 pivots.
Let us now present some of the basic properties of determinants.
Lemma 1.53. The determinant of the product of two square matrices of the same
size is the product of the determinants:
det(A B) = det A det B.
(1.79)
Proof : The product formula holds if A is an elementary matrix; this is a consequence of the determinantal axioms, combined with Corollary 1.51. By induction, if
A = E1 E2 EN is a product of elementary matrices, then (1.79) also holds. Therefore, the result holds whenever A is nonsingular. On the other hand, if A is singular, then
according to Exercise , A = E1 E2 EN Z, where the Ei are elementary matrices, and
Z, the row echelon form, is a matrix with a row of zeros. But then Z B = W also has a
row of zeros, and so A B = E1 E2 EN W is also singular. Thus, both sides of (1.79) are
zero in this case.
Q.E.D.
It is a remarkable fact that, even though matrix multiplication is not commutative, and
so A B 6
= B A in general, it is nevertheless always true that both products have the same
determinant: det(A B) = det(B A). Indeed, both are equal to the product det A det B of
the individual determinants because ordinary (scalar) multiplication is commutative.
Lemma 1.54. Transposing a matrix does not change its determinant:
det AT = det A.
(1.80)
The middle equality follows from the commutativity of ordinary multiplication. This proves
the nonsingular case; the singular case follows from Lemma 1.30, which implies that A T is
singular if and only if A is.
Q.E.D.
1/12/04
49
c 2003
Peter J. Olver
Remark : Lemma 1.54 has the interesting consequence that one can equally well use
elementary column operations to compute determinants. We will not develop this approach in any detail here, since it does not help us to solve linear equations.
Finally, we state the general formula for a determinant; a proof can be found in [135].
Theorem 1.55. If A is an n n matrix with entries aij , then
det A =
(1.81)
The sum in (1.81) is over all possible permutations of the columns of A. The
summands consist of all possible ways of choosing n entries of A with one entry in each
column and 1 entry in each row of A. The sign in front of the indicated term depends on
the permutation ; it is + if is an even permutation, meaning that its matrix can be
reduced to the identity by an even number of row interchanges, and is is odd. For
example, the six terms in the well-known formula
1/12/04
50
c 2003
Peter J. Olver
Chapter 2
Vector Spaces
Vector spaces and their ancillary structures provide the common language of linear
algebra, and, as such are an essential prerequisite for understanding contemporary applied mathematics. The key concepts of vector space, subspace, linear independence,
span, and basis will appear, not only in linear systems of equations and the geometry of
n-dimensional Euclidean space, but also in the analysis of linear ordinary differential equations, linear partial differential equations, linear boundary value problems, all of Fourier
analysis, numerical approximations like the finite element method, and many, many other
fields. Therefore, in order to develop the wide variety of analytical methods and applications covered in this text, we need to acquire a firm working knowledge of basic vector
space analysis.
One of the great triumphs of modern mathematics was the recognition that many
seemingly distinct constructions are, in fact, different manifestations of the same general
mathematical structure. The abstract notion of a vector space serves to unify spaces of
ordinary vectors, spaces of functions, such as polynomials, exponentials, trigonometric
functions, as well as spaces of matrices, linear operators, etc., all in a common conceptual
framework. Moreover, proofs that might look rather complicated in any particular context often turn out to be relatively transparent when recast in the abstract vector space
framework. The price that one pays for the increased level of abstraction is that, while the
underlying mathematics is not all that difficult, the student typically takes a long time to
assimilate the material. In our opinion, the best way to approach the subject is to think in
terms of concrete examples. First, make sure you understand what the concept or theorem
says in the case of ordinary Euclidean space. Once this is grasped, the next important case
to consider is an elementary function space, e.g., the space of continuous scalar functions.
With these two examples firmly in hand, the leap to the general abstract version should
not be too painful. Patience is essential; ultimately the only way to truly understand an
abstract concept like a vector space is by working with it! And always keep in mind that
the effort expended here will be amply rewarded later on.
Following an introduction to vector spaces and subspaces, we introduce the notions of
span and linear independence of a collection of vector space elements. These are combined
into the all-important concept of a basis of a vector space, leading to a linear algebraic
characterization of its dimension. We will then study the four fundamental subspaces
associated with a matrix range, kernel, corange and cokernel and explain how they
help us understand the solution to linear algebraic systems. Of particular note is the
all-pervasive linear superposition principle that enables one to construct more general
solutions to linear systems by combining known solutions. Superposition is the hallmark
1/12/04
51
c 2003
Peter J. Olver
of linearity, and will apply not only to linear algebraic equations, but also linear ordinary
differential equations, linear partial differential equations, linear boundary value problems,
and so on. Some interesting applications in graph theory, to be used in our later study of
electrical circuits, will form the final topic of this chapter.
1/12/04
52
c 2003
Peter J. Olver
Remark : For most of this chapter we will deal with real vector spaces, in which the
scalars are the real numbers R. Complex vector spaces, where complex scalars are allowed,
will be introduced in Section 3.6. Vector spaces over other fields are studied in abstract
algebra, [77].
Example 2.2.
space R n consisting
Vector addition and
v +w
1
cv
v
w
1
v2 + w 2
,
v+w =
..
.
vn + w n
c v2
cv =
.. ,
.
c vn
T
whenever
v2
w
, w = .2 .
v=
.
.
.
.
.
vn
wn
The zero vector is 0 = ( 0, . . . , 0 ) . The fact that vectors in R n satisfy all of the vector space axioms is an immediate consequence of the laws of vector addition and scalar
multiplication. Details are left to the reader.
Example 2.3. Let Mmn denote the space of all real matrices of size m n. Then
Mmn forms a vector space under the laws of matrix addition and scalar multiplication.
The zero element is the zero matrix O. Again, the vector space axioms are immediate
consequences of the basic laws of matrix arithmetic. (For the purposes of this example, we
ignore additional matrix properties, like matrix multiplication.) The preceding example of
the vector space R n = M1n is a particular case when the matrices have only one column.
Example 2.4. Consider the space
(2.1)
53
c 2003
Peter J. Olver
For much of analysis, including differential equations, Fourier theory, numerical methods, and so on, the most important vector spaces consist of sets of functions with certain
specified properties. The simplest such example is the following.
Example 2.5. Let I R be an interval. Consider the function space F = F(I)
that consists of all real-valued functions f (x) defined for all x I, which we also write
as f : I R. The claim is that the function space F has the structure of a vector space.
Addition of functions in F is defined in the usual manner: (f + g)(x) = f (x) + g(x).
Multiplication by scalars c R is the same as multiplication by constants, (c f )(x) = c f (x).
The zero element is the constant function that is identically 0 for all x I. The proof
of the vector space axioms is straightforward, just as in the case of polynomials. As in
the preceding remark, we are ignoring all additional operations multiplication, division,
inversion, composition, etc. that can be done with functions; these are irrelevant as far
as the vector space structure of F goes.
Remark : An interval can be (a) closed , meaning that it includes its endpoints: I =
[ a, b ], (b) open, which does not include either endpoint: I = ( a, b ), or (c) half open,
which includes one but not the other endpoint, so I = [ a, b ) or ( a, b ]. An open endpoint is
allowed to be infinite; in particular, ( , ) = R is another way of writing the real line.
Example 2.6. The preceding examples are all, in fact, special cases of an even
more general construction. A clue is to note that the last example of a function space
does not make any use of the fact that the domain of definition of our functions is a real
interval. Indeed, the construction produces a function space F(I) corresponding to any
subset I R.
Even more generally, let S be any set. Let F = F(S) denote the space of all realvalued functions f : S R. Then we claim that V is a vector space under the operations
of function addition and scalar multiplication. More precisely, given functions f and g,
we define their sum to be the function h = f + g such that h(x) = f (x) + g(x) for all
x S. Similarly, given a function f and a real scalar c R, we define the scalar multiple
k = c f to be the function such that k(x) = c f (x) for all x S. The verification of the
vector space axioms proceeds straightforwardly, and the reader should be able to fill in the
necessary details.
In particular, if S R is an interval, then F(S) coincides with the space of scalar
functions described in the preceding example. If S R n is a subset of Euclidean space,
then the elements of F(S) are functions f (x1 , . . . , xn ) depending upon the n variables
corresponding to the coordinates of points x = (x1 , . . . , xn ) S in the domain. In this
fashion, the set of real-valued functions defined on any domain in R n is found to also form
a vector space.
Another useful example is to let S = {x1 , . . . , xn } R be a finite set of real numbers.
A real-valued function f : S R is defined by its values f (x1 ), f (x2 ), . . . f (xn ) at the
specified points. In applications, one can view such functions as indicating the sample
values of a scalar function f (x) F(R) taken at the sample points x1 , . . . , xn . For example,
when measuring a physical quantity, e.g., temperature, velocity, pressure, etc., one typically
only measures a finite set of sample values. The intermediate, non-recorded values between
the sample points are then reconstructed through some form of interpolation a topic
1/12/04
54
c 2003
Peter J. Olver
that we shall visit in depth later on. Interestingly, the sample values f (x i ) can be identified
with the entries fi of a vector
T
Rn,
2 x cos x
cos x
x2
.
=
2
2 ex x 8
x
ex 4
2.2. Subspaces.
In the preceding section, we were introduced to the most basic vector spaces that play
a role in this text. Almost all of the important vector spaces arising in applications appear
as particular subsets of these key examples.
Definition 2.8. A subspace of a vector space V is a subset W V which is a vector
space in its own right.
Since elements of W also belong to V , the operations of vector addition and scalar
multiplication for W are induced by those of V . In particular, W must contain the zero
element of V in order to satisfy axiom (c). The verification of the vector space axioms for
a subspace is particularly easy: we only need check that addition and scalar multiplication
keep us within the subspace.
1/12/04
55
c 2003
Peter J. Olver
(c) The set of all vectors of the form ( x, y, 0 ) , i.e., the (x, y)coordinate plane. Note
T
T
T
T
that the sum ( x, y, 0 ) +( x
b, yb, 0 ) = ( x + x
b, y + yb, 0 ) , and scalar multiple c ( x, y, 0 ) =
T
( c x, c y, 0 ) , of vectors in the (x, y)plane also lie in the plane, proving closure.
T
is
3 (x + x
b) + 2 (y + yb) (z + zb) = (3 x + 2 y z) + (3 x
b + 2 yb zb) = 0.
Note that the solution space is a two-dimensional plane consisting of all vectors which are
T
perpendicular (orthogonal) to the vector ( 3, 2, 1 ) .
(e) The set of all vectors lying in the plane spanned by the vectors v1 = ( 2, 3, 0 )
T
and v2 = ( 1, 0, 3 ) . In other words, we consider all vectors of the form
2a + b
1
2
v = a v1 + b v2 = a 3 + b 0 = 3 a ,
3b
3
0
1/12/04
56
c 2003
Peter J. Olver
where e
a = a c+b
a d, eb = b c+ bb d. This proves that the plane is a subspace of R 3 . The reader
might already have noticed that this subspace is the same plane that was considered in
item (d).
Example 2.11. The following subsets of R 3 are not subspaces.
T
(a) The set P of all vectors of the form ( x, y, 1 ) , i.e., the plane parallel to the
T
x y coordinate plane passing through ( 0, 0, 1 ) . Indeed, 0 6
P , which is the most basic
requirement for a subspace. In fact, neither of the closure axioms hold for this subset.
(b) The positive octant O + = {x > 0, y > 0, z > 0}. While the sum of two vectors in
O+ belongs to O + , multiplying by negative scalars takes us outside the orthant, violating
closure under scalar multiplication.
(c) The unit sphere S 2 = { x2 + y 2 + z 2 = 1 }. Again, 0 6
S 2 . More generally, curved
surfaces, e.g., the paraboloid P = { z = x2 + y 2 }, are not subspaces. Although 0 P ,
T
most scalar multiples of vectors in P do not belong to P . For example, ( 1, 1, 2 ) P ,
T
T
but 2 ( 1, 1, 2 ) = ( 2, 2, 4 ) 6
P.
In fact, there are only four fundamentally different types of subspaces W R 3 of
three-dimensional Euclidean space:
(i ) The entire space W = R 3 ,
(ii ) a plane passing through the origin,
(iii ) a line passing through the origin,
(iv ) the trivial subspace W = {0}.
To verify this observation, we argue as follows. If W = {0} contains only the zero vector,
then we are in case (iv). Otherwise, W R 3 contains a nonzero vector 0 6
= v1 W .
But since W must contain all scalar multiples c v1 of this element, it includes the entire
line in the direction of v1 . If W contains another vector v2 that does not lie in the line
through v1 , then it must contain the entire plane {c v1 + d v2 } spanned by v1 , v2 . Finally,
if there is a third vector v3 not contained in this plane, then we claim that W = R 3 . This
final fact will be an immediate consequence of general results in this chapter, although the
interested reader might try to prove it directly before proceeding.
Example 2.12. Let I R be an interval, and let F(I) be the space of real-valued
functions f : I R. Let us look at some of the most important examples of subspaces
of F(I). In each case, we need only verify the closure conditions to verify that the given
subset is indeed a subspace.
(a) The space P (n) of polynomials of degree n, which we already encountered.
S
(b) The space P () = n0 P (n) consisting of all polynomials.
(c) The space C0 (I) of all continuous functions. Closure of this subspace relies on
knowing that if f (x) and g(x) are continuous, then both f (x) + g(x) and cf (x) for any
c R are also continuous two basic results from calculus.
1/12/04
57
c 2003
Peter J. Olver
(d) More restrictively, one can consider the subspace Cn (I) consisting of all functions
f (x) that have n continuous derivatives f 0 (x), f 00 (x), . . . , f (n) (x) on I. Again, we need to
know that if f (x) and g(x) have n continuous derivatives, so do f (x) + g(x) and cf (x) for
any c R.
T
(e) The space C (I) = n0 Cn (I) of infinitely differentiable or smooth functions
is also a subspace. (The fact that this intersection is a subspace follows directly from
Exercise .)
(f ) The space A(I) of analytic functions on the interval I. Recall that a function
f (x) is called analytic at a point a if it is smooth, and, moreover, its Taylor series
f (a) + f 0 (a) (x a) +
1
2
f 00 (a) (x a)2 + =
X
f (n) (a)
(x a)n
n!
n=0
(2.2)
converges to f (x) for all x sufficiently close to a. (It does not have to converge on the entire
interval I.) Not every smooth function is analytic, and so A(I) ( C (I). An explicit
example is the function
1/x
e
,
x > 0,
(2.3)
f (x) =
0,
x 0.
It can be shown that every derivative of this function at 0 exists and equals zero: f (n) (0) =
0, n = 0, 1, 2, . . ., and so the function is smooth. However, its Taylor series at a = 0 is
0 + 0 x + 0 x2 + 0, which converges to the zero function, not to f (x). Therefore f (x)
is not analytic at a = 0.
(g) The set of all mean zero functions. The mean or average of an integrable function
defined on a closed interval I = [ a, b ] is the real number
Z b
1
f (x) dx.
(2.4)
f=
ba a
Z b
f (x) dx = 0. Note that f + g = f + g,
In particular, f has mean zero if and only if
a
and so the sum of two mean zero functions also has mean zero. Similarly, cf = c f , and
any scalar multiple of a mean zero function also has mean zero.
(h) Let x0 I be a given point. Then the set of all functions f (x) that vanish
at the point, f (x0 ) = 0, is a subspace. Indeed, if f (x0 ) = 0 and g(x0 ) = 0, then clearly
(f +g)(x0 ) = 0 and c f (x0 ) = 0, proving closure. This example can evidently be generalized
to functions that vanish at several points, or even on an entire subset.
(i) The set of all solutions u = f (x) to the homogeneous linear differential equation
u00 + 2 u0 3 u = 0.
Indeed, if u = f (x) and u = g(x) are solutions, so are u = f (x) + g(x) and u = c f (x) for
any c R. Note that we do not need to actually solve the equation to verify these claims!
1/12/04
58
c 2003
Peter J. Olver
where the coefficients c1 , c2 , . . . , ck are any scalars, is known as a linear combination of the
elements v1 , . . . , vk . Their span is the subset W = span {v1 , . . . , vk } V consisting of all
possible linear combinations (2.5).
For example,
3 v1 + v2 2 v 3 ,
8 v1 31 v3 ,
v2 = 0 v 1 + 1 v 2 + 0 v 3 ,
and
0 = 0 v 1 + 0 v2 + 0 v3 ,
are four different linear combinations of the three vector space elements v 1 , v2 , v3 V .
The key observation is that a span always forms a subspace.
Proposition 2.14. The span of a collection of vectors, W = span {v1 , . . . , vk },
forms a subspace of the underlying vector space.
Proof : We need to show that if
v = c 1 v1 + + c k vk
and
b=b
v
c 1 v1 + + b
c k vk
b = (c1 + b
v+v
c1 )v1 + + (ck + b
ck )vk ,
a v = (a c1 )v1 + + (a ck )vk
1/12/04
59
Q.E .D.
c 2003
Peter J. Olver
(2.6)
For example, the function cos( x + 2) lies in the span because, by the addition formula
for the cosine,
cos( x + 2) = cos 2 cos x sin 2 sin x
is a linear combination of cos x and sin x.
We can express a general function in their span in the alternative phase-amplitude
form
f (x) = c1 cos x + c2 sin x = r cos( x ).
(2.7)
Expanding the right hand side, we find
r cos( x ) = r cos cos x + r sin sin x
1/12/04
60
c 2003
Peter J. Olver
3
2
1
-4
-2
2
-1
-2
-3
Figure 2.1.
and hence
c1 = r cos ,
c2 = r sin .
We can view the amplitude r 0 and the phase shift as the polar coordinates of point
c = (c1 , c2 ) R 2 prescribed by the coefficients. Thus, any combination of sin x and
cos x can be rewritten as a single cosine, with a phase lag. Figure 2.1 shows the particular
case 3 cos(2 x 1) which has amplitude r = 3, frequency = 2 and phase shift = 1. The
first peak appears at x = / = 12 .
(c) The space T (2) of quadratic trigonometric polynomials is spanned by the functions
1,
cos x,
sin x,
cos2 x,
sin2 x.
cos x sin x,
Thus, the general quadratic trigonometric polynomial can be written as a linear combination
q(x) = c0 + c1 cos x + c2 sin x + c3 cos2 x + c4 cos x sin x + c5 sin2 x,
(2.8)
where c0 , . . . , c5 are arbitrary constants. A more useful spanning set for the same subspace
is the trigonometric functions
1,
cos x,
sin x,
cos 2 x,
sin 2 x.
(2.9)
have the form of a quadratic trigonometric polynomial (2.8), and hence both belong to
T (2) . On the other hand, we can write
cos2 x =
1
2
cos 2 x + 12 ,
cos x sin x =
1
2
sin 2 x,
sin2 x = 12 cos 2 x + 12 ,
in terms of the functions (2.9). Therefore, the original linear combination (2.8) can be
written in the alternative form
q(x) = c0 +
1
2
c3 +
1
2
c5 + c1 cos x + c2 sin x + 12 c3
=b
c0 + b
c1 cos x + b
c2 sin x + b
c3 cos 2 x + b
c4 sin 2 x,
1/12/04
61
1
2
c5
cos 2 x +
1
2
c4 sin 2 x
(2.10)
c 2003
Peter J. Olver
and so the functions (2.9) do indeed span T (2) . It is worth noting that we first characterized T (2) as the span of 6 functions, whereas the second characterization only required 5
functions. It turns out that 5 is the minimal number of functions needed to span T (2) , but
the proof of this fact will be deferred until Chapter 3.
(d) The homogeneous linear ordinary differential equation
u00 + 2 u0 3 u = 0.
(2.11)
considered in part (i) of Example 2.12 has two independent solutions: f 1 (x) = ex and
f2 (x) = e 3 x . (Now may be a good time for you to review the basic techniques for solving
linear, constant coefficient ordinary differential equations.) The general solution to the
differential equation is a linear combination
u = c1 f1 (x) + c2 f2 (x) = c1 ex + c2 e 3 x .
Thus, the vector space of solutions to (2.11) is described as the span of these two basic
solutions. The fact that there are no other solutions is not obvious, but relies on the
basic existence and uniqueness theorems for linear ordinary differential equations; see
Theorem 7.33 for further details.
Remark : One can also define the span of an infinite collection of elements of a vector
space. To avoid convergence issues, one should only consider finite linear combinations
(2.5). For example, the span of the monomials 1, x, x2 , x3 , . . . is the space P () of all
polynomials. (Not the space of convergent Taylor series.) Similarly, the span of the
functions 1, cos x, sin x, cos 2 x, sin 2 x, cos 3 x, sin 3 x, . . . is the space of all trigonometric
polynomials, to be discussed in great detail in Chapter 12.
Linear Independence and Dependence
Most of the time, all of the vectors used to form a span are essential. For example, we
cannot use fewer than two vectors to span a plane in R 3 since the span of a single vector
is at most a line. However, in the more degenerate cases, some of the spanning elements
are not needed. For instance, if the two vectors are parallel, then their span is a line, but
only one of the vectors is really needed to define the line. Similarly, the subspace spanned
by the polynomials
p1 (x) = x 2,
p2 (x) = x2 5 x + 4,
p3 (x) = 3 x2 4 x,
p4 (x) = x2 1.
(2.12)
is the vector space P (2) of quadratic polynomials. But only three of the polynomials are
really required to span P (2) . (The reason will become clear soon, but you may wish to see
if you can demonstrate this on your own.) The elimination of such superfluous spanning
elements is encapsulated in the following basic definition.
Definition 2.17. The vectors v1 , . . . , vk V are called linearly dependent if there
exists a collection of scalars c1 , . . . , ck , not all zero, such that
c1 v1 + + ck vk = 0.
(2.13)
Vectors which are not linearly dependent are called linearly independent.
1/12/04
62
c 2003
Peter J. Olver
The restriction that the ci s not all simultaneously vanish is essential. Indeed, if
c1 = = ck = 0, then the linear combination (2.13) is automatically zero. To check
linear independence, one needs to show that the only linear combination that produces
the zero vector (2.13) is this trivial one. In other words, c1 = = ck = 0 is the one and
only solution to the vector equation (2.13).
Example 2.18. Some examples of linear independence and dependence:
(a) The vectors
1
0
1
v 3 = 4 ,
v 2 = 3 ,
v1 = 2 ,
3
1
1
v1 2 v2 + v3 = 0.
On the other hand, the first two vectors v1 , v2 are linearly independent. To see this,
suppose that
0
c1
c1 v1 + c 2 v2 = 2 c 1 + 3 c 2 = 0 .
0
c1 + c2
For this to happen, the coefficients c1 , c2 must satisfy the homogeneous linear system
c1 = 0,
2 c1 + 3 c2 = 0,
c1 + c2 = 0,
c1 5 c2 4 c3 = 0,
2 c1 + 4 c2 = 0.
But this has only the trivial solution c1 = c2 = c3 = 0, and so linear independence follows.
1/12/04
63
c 2003
Peter J. Olver
Remark : In the last example, we are using the basic fact that a polynomial is identically zero,
p(x) = a0 + a1 x + a2 x2 + + an xn 0
for all
x,
if and only if its coefficients all vanish: a0 = a1 = = an = 0. This is equivalent
to the self-evident fact that the basic monomial functions 1, x, x 2 , . . . , xn are linearly
independent; see Exercise .
Example 2.19. The set of quadratic trigonometric functions
1,
cos x,
sin x,
cos2 x,
cos x sin x,
sin2 x,
that were used to define the vector space T (2) of quadratic trigonometric polynomials, are,
in fact, linearly dependent. This is a consequence of the basic trigonometric identity
cos2 x + sin2 x 1
which can be rewritten as a nontrivial linear combination
1 + 0 cos x + 0 sin x cos2 x + 0 cos x sin x sin2 x 0
that sums to the zero function. On the other hand, the alternative spanning set
1,
cos x,
sin x,
cos 2 x,
sin 2 x,
A c = c 1 v1 + + c k vk ,
where
c2
c=
.. ,
.
ck
(2.14)
that expresses any linear combination in terms of matrix multiplication. For example,
1
3
0
c1
c1 + 3 c 2
1
3
0
1 2
1 c2 = c1 + 2 c2 + c3 = c1 1 + c2 2 + c3 1 .
4 1 2
c3
4 c1 c2 2 c 3
4
1
2
Formula (2.14) is an immediate consequence of the rules of matrix multiplication; see also
Exercise c. It allows us to reformulate the notions of linear independence and span in
terms of linear systems of equations. The main result is the following:
1/12/04
64
c 2003
Peter J. Olver
c = ( c1 , c2 , . . . , c k ) 6
=0
such that the linear combination
A c = c1 v1 + + ck vk = 0.
Therefore, linear dependence requires the existence of a nontrivial solution to the homogeneous linear system A c = 0.
Q.E.D.
Example 2.21. Let us determine whether the vectors
1
3
1
v1 = 2 ,
v2 = 0 ,
v3 = 4 ,
1
4
6
1 3
A=
2 0
1 4
v4 = 2 ,
3
(2.15)
1
4
6
4
2 .
3
According to Theorem 2.20, we need to figure out whether there are any nontrivial solutions
to the homogeneous equation A c = 0; this can be done by reducing A to row echelon form,
which is
1 3
1
4
U = 0 6 6 6 .
(2.16)
0 0
0
0
The general solution to the homogeneous system A c = 0 is
c = ( 2 c 3 c4 , c 3 c4 , c3 , c4 ) ,
where c3 , c4 the free variables are arbitrary. Any nonzero choice of c3 , c4 will produce
a nontrivial linear combination
(2 c3 c4 )v1 + ( c3 c4 )v2 + c3 v3 + c4 v4 = 0
that adds up to the zero vector. Therefore, the vectors (2.15) are linearly dependent.
1/12/04
65
c 2003
Peter J. Olver
In fact, Theorem 1.45 says that in this particular case we didnt even need to do the
row reduction if we only needed to answer the question of linear dependence or linear
independence. Any coefficient matrix with more columns than rows automatically has a
nontrivial solution to the associated homogeneous system. This implies the following:
Lemma 2.22. Any collection of k > n vectors in R n is linearly dependent.
Warning: The converse to this lemma is not true. For example, the two vectors
T
T
v1 = ( 1, 2, 3 ) and v2 = ( 2, 4, 6 ) in R 3 are linearly dependent since 2 v1 + v2 = 0.
For a collection of n or fewer vectors in R n , one does need to perform the elimination to
calculate the rank of the corresponding matrix.
Lemma 2.22 is a particular case of the following general characterization of linearly
independent vectors.
Proposition 2.23. A set of k vectors in R n is linearly independent if and only if
the corresponding n k matrix A has rank k. In particular, this requires k n.
Or, to state the result another way, the vectors are linearly independent if and only if
the linear system A c = 0 has no free variables. The proposition is an immediate corollary
of Propositions 2.20 and 1.45.
Example 2.21. (continued ) Let us now see which vectors b R 3 lie in the span of
the vectors (2.15). This will be the case if and only if the linear system A x = b has a
solution. Since the resulting row echelon form (2.16) has a row of all zeros, there will be a
compatibility condition on the entries of b, and therefore not every vector lies in the span.
To find the precise condition, we augment the coefficient matrix, and apply the same row
operations, leading to the reduced augmented matrix
1 3
1
4
b1
0 6 6 6
.
b2 2 b 1
0 0
0
0 b +7b 4b
3
7
6
b2 + b3 = 0.
66
c 2003
Peter J. Olver
2.4. Bases.
In order to span a vector space or subspace, we must use a sufficient number of distinct
elements. On the other hand, including too many elements in the spanning set will violate
linear independence, and cause redundancies. The optimal spanning sets are those that are
also linearly independent. By combining the properties of span and linear independence,
we arrive at the all-important concept of a basis.
Definition 2.25. A basis of a vector space V is a finite collection of elements
v1 , . . . , vn V which (a) span V , and (b) are linearly independent.
Bases are absolutely fundamental in all areas of linear algebra and linear analysis, including matrix algebra, geometry of Euclidean space, solutions to linear differential equations, both ordinary and partial, linear boundary value problems, Fourier analysis, signal
and image processing, data compression, control systems, and so on.
Example 2.26. The standard basis of R n consists of the n vectors
1
0
0
0
1
0
0
0
0
e1 = .. ,
e2 = .. ,
...
en =
..
,
.
.
.
0
0
0
0
0
1
(2.17)
so that ei is the vector with 1 in the ith slot and 0s elsewhere. We already encountered
these vectors as the columns of the n n identity matrix, as in (1.39). They clearly span
R n since we can write any vector
x
1
x2
x=
.. = x1 e1 + x2 e2 + + xn en ,
.
xn
(2.18)
as a linear combination, whose coefficients are the entries of x. Moreover, the only linear
combination that gives the zero vector x = 0 is the trivial one x1 = = xn = 0, and so
e1 , . . . , en are linearly independent.
Remark : In the three-dimensional case R 3 , a common physical notation for the standard basis is
1
0
0
i = e1 = 0 ,
j = e2 = 1 ,
k = e3 = 0 .
(2.19)
0
0
1
There are many other possible bases for R 3 . Indeed, any three non-coplanar vectors
can be used to form a basis. This is a consequence of the following general characterization
of bases in R n .
1/12/04
67
c 2003
Peter J. Olver
n
X
aij vi ,
j = 1, . . . , k,
i=1
n X
k
X
aij cj vi .
i=1 j =1
T
consisting of n equations in k > n unknowns. Theorem 1.45 guarantees that every homogeneous system with more unknowns than equations always has a non-trivial solution
c6
= 0, and this immediately implies that w1 , . . . , wk are linearly dependent.
Q.E.D.
Proof of Theorem 2.28 : Suppose we have two bases containing a different number of
elements. By definition, the smaller basis spans the vector space. But then Lemma 2.29
tell us that the elements in the larger purported basis must be linearly dependent. This
contradicts our assumption that both sets are bases, and proves the theorem.
Q.E.D.
As a direct consequence, we can now provide a precise meaning to the optimality
property of bases.
1/12/04
68
c 2003
Peter J. Olver
n
X
ci vi
(2.20)
i=1
1/12/04
69
c 2003
Peter J. Olver
Proof : The condition that the basis span V implies every x V can be written as
some linear combination of the basis elements. Suppose we can write an element
x = c 1 v1 + + c n vn = b
c 1 v1 + + b
c n vn
1
1
1
1
1
1
v3 =
v2 =
v1 = ,
,
,
0
1
1
0
1
1
0
0
v4 =
,
1
1
(2.21)
form a basis of R 4 . This is verified by performing Gaussian elimination on the corresponding 4 4 matrix
1 1
1
0
1 1 1 0
A=
,
1 1 0
1
1 1 0 1
to check that it is nonsingular. This basis is a very simple example of a wavelet basis; the
general case will be discussed in Section 13.2. Wavelets arise in modern applications to
signal and digital image processing, [43, 128].
How do we find the coordinates of a vector x relative to the basis? We need to fix the
coefficients c1 , c2 , c3 , c4 so that
x = c 1 v1 + c 2 v2 + c 3 v3 + c 4 v4 .
We rewrite this equation in matrix form
x = Ac
where
c = ( c 1 , c2 , c3 , c4 ) .
T
For example, solving the linear system for the vector x = ( 4, 2, 1, 5 ) by Gaussian
Elimination produces the unique solution c1 = 2, c2 = 1, c3 = 3, c4 = 2, which are its
coordinates in the wavelet basis:
4
1
1
1
0
2
1 1
1
0
= 2 v 1 v2 + 3 v 3 2 v 4 = 2
+ 3
2
.
1
1
1
0
1
5
1
1
0
1
1/12/04
70
c 2003
Peter J. Olver
for
c = A1 x.
(2.22)
(2.24)
An alternative name for the range is the column space of the matrix. By definition, a
vector b R m belongs to rng A if and only if it can be written as a linear combination,
b = x 1 v1 + + x n vn ,
of the columns of A = ( v1 v2 . . . vn ). By our basic matrix multiplication formula (2.14),
the right hand side of this equation equals the product A x of the matrix A with the column
T
vector x = ( x1 , x2 , . . . , xn ) , and hence b = A x for some x R n , so
rng A = { A x | x R n } R m .
1/12/04
71
(2.25)
c 2003
Peter J. Olver
Therefore, a vector b lies in the range of A if and only if the linear system A x = b has
a solution. Thus, the compatibility conditions for linear systems can be re-interpreted as
the conditions for a vector to lie in the range of the coefficient matrix.
A common alternative name for the kernel is the null space of the matrix A. The kernel
of A is the set of solutions to the homogeneous system A z = 0. The proof that ker A is a
subspace requires us to verify the usual closure conditions. Suppose that z, w ker A, so
that A z = 0 = A w. Then, for any scalars c, d,
A(c z + d w) = c A z + d A w = 0,
which implies that c z + d w ker A, proving that ker A is a subspace. This fact can be
re-expressed as the following superposition principle for solutions to a homogeneous system
of linear equations.
Theorem 2.35. If z1 , . . . , zk are solutions to a homogeneous linear system A z = 0,
then so is any linear combination c1 z1 + + ck zk .
Warning: The set of solutions to an inhomogeneous linear system A x = b with b 6
=0
is not a subspace.
1 2 0
3
Example 2.36. Let us compute the kernel of the matrix A = 2 3 1 4 .
3 5 1 1
Since we are solving the homogeneous system A x = 0, we only need
perform the elemen
1 2 0
3
tary row operations on A itself. The resulting row echelon form U = 0 1 1 10
0 0
0
0
corresponds to the equations x 2 y + 3w = 0, y z 10 w = 0. The free variables are
z, w. The general solution to the homogeneous system is
x
2 z + 17 w
2
17
y z + 10 w
1
10
x= =
= z + w ,
z
z
1
0
w
w
0
1
which, for arbitrary scalars z, w, describes the most general vector in ker A. Thus, the
kernel of this matrix is the two-dimensional subspace of R 4 spanned by the linearly indeT
T
pendent vectors ( 2, 1, 1, 0 ) , ( 17, 10, 0, 1 ) .
Remark : This example is indicative of a general method for finding a basis for ker A
which will be developed in more detail in the following section.
Once we know the kernel of the coefficient matrix A, i.e., the space of solutions to the
homogeneous system A z = 0, we are in a position to completely characterize the solutions
to the inhomogeneous linear system (2.23).
Theorem 2.37. The linear system A x = b has a solution x? if and only if b lies in
the range of A. If this occurs, then x is a solution to the linear system if and only if
x = x? + z,
(2.26)
72
c 2003
Peter J. Olver
1 0 1
x1
A = 0 1 2 ,
x = x2 ,
x3
1 2 3
b1
b = b2 ,
b3
where the right hand side of the system will be left arbitrary. Applying our usual Gaussian
Elimination procedure to the augmented matrix
1 0 1
b1
1 0 1 b1
0 1 2
.
0 1 2 b2
leads to the row echelon form
b2
0 0 0 b3 + 2 b 2 b 1
1 2 3 b3
The system has a solution if and only if the resulting compatibility condition
b1 + 2 b 2 + b3 = 0
(2.27)
holds. This equation serves to characterize the vectors b that belong to the range of the
matrix A, which is therefore a certain plane in R 3 passing through the origin.
1/12/04
73
c 2003
Peter J. Olver
z2 2 z3 = 0.
z2 = 2 c,
z3 = c,
T
x2 = 1 + 2 c,
x3 = c,
where c is an arbitrary scalar. We can write the solution in the form (2.26), namely
3+c
3
1
x = 1 + 2 c = 1 + c 2 = x? + z,
c
0
1
T
is the
74
c 2003
Peter J. Olver
f
x
4 1
=
g
y
1 4
models the mechanical response of a pair of masses connected by springs to an external
T
force. The solution x = ( x, y ) represent the respective displacements of the masses,
T
while the components of the right hand side f = ( f, g ) represent the respective forces
applied to each mass. (See Chapter 6 for full details.) We can compute the response of the
4
T
1 T
system x?1 = 15
to a unit force e1 = ( 1, 0 ) on the first mass, and the response
, 15
1 4 T
T
x?2 = 15
to a unit force e2 = ( 0, 1 ) on the second mass. We then know the
, 15
response of the system to a general force, since we can write
0
1
f
,
+g
= f e1 + g e2 = f
f=
1
0
g
and hence the solution is
x = f x?1 + g x?2 = f
4
15
1
15
+g
1
15
4
15
4
1
15 f 15 g
4
1
f + 15
g
15
A x = b2 ,
...
A x = bk ,
(2.28)
(2.30)
(2.31)
75
c 2003
Peter J. Olver
for each
i = 1, . . . , m,
(2.32)
where e1 , . . . , em are the standard basis vectors of R m , cf. (2.17), then we can reconstruct
a particular solution x? to the general linear system A x = b by first writing
b = b1 e1 + + b m em
as a linear combination of the basis vectors, and then using superposition to form
x? = b1 x?1 + + bm x?m .
(2.33)
However, for linear algebraic systems, the practical value of this insight is rather limited.
Indeed, in the case when A is square and nonsingular, the superposition method is just a
reformulation of the method of computing the inverse of the matrix. Indeed, the vectors
x?1 , . . . , x?n which satisfy (2.32) are just the columns of A1 , cf. (1.39), and the superposition formula (2.33) is, using (2.14), precisely the solution formula x ? = A1 b that we
abandoned in practical computations, in favor of the more efficient Gaussian elimination
method. Nevertheless, the implications of this result turn out to be of great importance
in the study of linear boundary value problems.
Adjoint Systems, Cokernel, and Corange
A linear system of m equations in n unknowns results in an mn coefficient matrix A.
The transposed matrix AT will be of size n m, and forms the coefficient of an associated
linear system consisting of n equations in m unknowns.
Definition 2.43. The adjoint to a linear system A x = b of m equations in n
unknowns is the linear system
AT y = f
(2.34)
of n equations in m unknowns. Here y R m and f R n .
Example 2.44. Consider the linear system
x1 3 x 2 7 x 3 + 9 x 4 = b 1 ,
x2 + 5 x 3 3 x 4 = b 2 ,
x1 2 x 2 2 x 3 + 6 x 4 = b 3 ,
1
3
has transpose AT =
7
9
(2.35)
1 3 7 9
unknowns. Its coefficient matrix is A = 0 1
5 3
1 2 2 6
0
1
1 2
. Thus, the adjoint system to (2.35) is the following
5 2
3 6
Warning: Many texts misuse the term adjoint to describe the classical adjugate or cofactor
matrix. These are completely unrelated, and the latter will play no role in this book.
1/12/04
76
c 2003
Peter J. Olver
(2.36)
7 y 1 + 5 y 2 2 y 3 = f3 ,
9 y1 3 y 2 + 6 y 3 = f4 .
On the surface, there appears to be little direct connection between the solutions to a
linear system and its adjoint. Nevertheless, as we shall soon see (and then in even greater
depth in Sections 5.6 and 8.5) there are remarkable, but subtle interrelations between the
two. These turn out to have significant consequences, not only for linear algebraic systems
but to even more profound extensions to differential equations.
To this end, we use the adjoint system to define the other two fundamental subspaces
associated with a coefficient matrix A.
Definition 2.45. The corange of an m n matrix A is the range of its transpose,
(2.37)
corng A = rng AT = AT y y R m R n .
coker A = ker AT = w R m AT w = 0 R m ,
(2.38)
The corange coincides with the subspace of R n spanned by the rows of A, and is
sometimes referred to as the row space. As a consequence of Theorem 2.37, the adjoint
system AT y = f has a solution if and only if f rng AT = corng A.
To solve the linear system
(2.35) appearing above,
we perform
1 3 7 9 b1
Gaussian Elimination on its augmented matrix 0 1
5 3 b2 that reduces
b
3
1 2 2 6
1 3 7 9
b1
. Thus, the system has a
it to the row echelon form 0 1
5 3
b2
0 0
0
0
b3 b 2 b 1
solution if and only if b rng A satisfies the compatibility condition b 1 b2 + b3 = 0.
For such vectors, the general solution is
Example 2.46.
b1 + 3 b 2 8 x 3
b1 + 3 b 2
8
0
b2
b2 5 x 3 + 3 x 4
5
3
x=
=
+ x3
+ x 4 .
x3
0
1
0
x4
0
0
1
In the second expression, the first vector is a particular solution and the remaining terms
constitute the general element of the two-dimensional kernel of A.
1/12/04
77
c 2003
Peter J. Olver
The solution to the adjoint system (2.36) is also obtained by Gaussian Elimination
1
0
1 f1
3 1 2 f2
starting with its augmented matrix
9 3 6
f4
1 0 1
f1
f2
0 1 1
form is
0 0 0
f4 + 3 f 2
required for a solution to the adjoint system: 8 f1 5 f2 + f3 = 0, 3 f2 + f4 = 0. These
are the conditions required for the right hand side to belong to the corange: f rng A T =
corng A. If satisfied, the adjoint system has the general solution depending on the single
free variable y3 :
f1 y 3
f1
1
y = 3 f1 + f2 y3 = 3 f1 + f2 + y3 1 .
y3
0
1
In the latter formula, the first term represents a particular solution, while the second is
the general element of ker AT = coker A.
The Fundamental Theorem of Linear Algebra
The four fundamental subspaces associated with an m n matrix A, then, are its
range, corange, kernel and cokernel. The range and cokernel are subspaces of R m , while
the kernel and corange are subspaces of R n . Moreover, these subspaces are not completely
arbitrary, but are, in fact, profoundly related through both their numerical and geometric
properties.
The Fundamental Theorem of Linear Algebra states that their dimensions are entirely
prescribed by the rank (and size) of the matrix.
Theorem 2.47. Let A be an m n matrix of rank r. Then
dim corng A = dim rng A = rank A = rank AT = r,
dim ker A = n r,
dim coker A = m r.
(2.39)
Remark : Thus, the rank of a matrix, i.e., the number of pivots, indicates the number
of linearly independent columns, which, remarkably, is always the same as the number of
linearly independent rows! A matrix and its transpose have the same rank, i.e., the same
number of pivots, even though their row echelon forms are quite different, and are rarely
transposes of each other. Theorem 2.47 also proves our earlier contention that the rank of
a matrix is an intrinsic quantity, and does not depend on which specific elementary row
operations are employed during the reduction process, nor on the final row echelon form.
Not to be confused with the Fundamental Theorem of Algebra, that states that every polynomial has a complex root; see Theorem 16.62.
1/12/04
78
c 2003
Peter J. Olver
2 1 1
2
A = 8 4 6 4 .
4 2 3
2
2 1 1 2
The row echelon form of A is obtained in the usual manner: U = 0 0 2 4 .
0 0
0 0
There are two pivots, and thus the rank of A is r = 2.
Kernel : We need to find the solutions to the homogeneous system A x = 0. In our
example, the pivots are in columns 1 and 3, and so the free variables are x 2 , x4 . Using back
substitution on the reduced homogeneous system U x = 0, we find the general solution
1
1
2
2 x2 2 x 4
2
x2
1
0
(2.40)
x=
= x2 + x4
.
0
2
2 x4
1
0
x4
Note that the second and fourth entries are the corresponding free variables x 2 , x4 . Therefore,
T
T
z2 = ( 2 0 2 1 ) ,
z1 = 12 1 0 0 ,
are the basis vectors for ker A. By construction, they span the kernel, and linear independence follows easily since the only way in which the linear combination (2.40) could
vanish, x = 0, is if both free variables vanish: x2 = x4 = 0. In general, there are n r
free variables, each corresponding to one of the basis elements of the kernel, which thus
implies the dimension formula for ker A.
Corange: The corange is the subspace of R n spanned by the rows of A. We claim
that applying an elementary row operation does not alter the corange. To see this for row
b is obtained adding a times the
operations of the first type, suppose, for instance, that A
b
first row of A to the second row. If r1 , r2 , r3 , . . . , rm are the rows of A, then the rows of A
are r1 , b
r2 = r2 + a r1 , r3 , . . . , rm . If
v = c 1 r1 + c 2 r2 + c 3 r3 + + c m rm
where
b
c 1 = c1 a c 2 ,
b
is also a linear combination of the rows of the new matrix, and hence lies in corng A.
b implies v corng A and we conclude that
The converse is also valid v corng A
elementary row operations of Type #1 do not change corng A. The proof for the other
two types of elementary row operations is even easier, and left to the reader.
1/12/04
79
c 2003
Peter J. Olver
Since the row echelon form U is obtained from A by a sequence of elementary row
operations, we conclude that corng A = corng U . Moreover, because each nonzero row in
U contains a pivot, it is not hard to see that the nonzero rows of corng U are linearly
independent, and hence form a basis of both corng U and corng A. Since there is one row
per pivot, corng U = corng A has dimension r, the number of pivots. In our example, then,
a basis for corng A consists of the row vectors
s1 = ( 2 1 1 2 ),
s2 = ( 0 0 2 4 ).
The reader may wish to verify their linear independence, as well as the fact that every row
of A lies in their span.
Range: There are two methods for computing a basis of the range or column space.
The first proves that it has dimension equal to the rank. This has the important, and
remarkable consequence that the space spanned by the rows of a matrix and the space
spanned by its columns always have the same dimension, even though they are, in general,
subspaces of different vector spaces.
Now the range of A and the range of U are, in general, different subspaces, so we
cannot directly use a basis for rng U as a basis for rng A. However, the linear dependencies
among the columns of A and U are the same. It is not hard to see that the columns of U
that contain the pivots form a basis for rng U . This implies that the same columns of A
form a basis for rng A. In particular, this implies that dim rng A = dim rng U = r.
In our example, the pivots lie in the first and third columns of U , and hence the first
and third columns of A, namely
2
1
v1 = 8 ,
v3 = 6 ,
4
3
form a basis for rng A. This implies that every column of A can be written uniquely as a
linear combination of the first and third column, as you can validate directly.
In more detail, using our matrix multiplication formula (2.14), we see that a linear
combination of columns of A is trivial,
c1 v1 + + cn vn = A c = 0,
if and only if c ker A. But we know ker A = ker U , and so the same linear combination
of columns of U , namely
U c = c1 u1 + + cn un = 0,
is also trivial. In particular, the linear independence of the pivot columns of U , labeled
uj1 , . . . , ujr , implies the linear independence of the same collection, vj1 , . . . , vjr , of columns
of A. Moreover, the fact that any other column of U can be written as a linear combination
uk = d 1 uj1 + + d r ujr
of the pivot columns implies that the same holds for the corresponding column of A, so
v k = d 1 v j1 + + d r v jr .
1/12/04
80
c 2003
Peter J. Olver
We conclude that the pivot columns of A form a basis for its range or column space.
An alternative method to find a basis for the range is to note that rng A = corng A T .
Thus, we can employ our previous algorithm to compute corng AT . In our example, applying Gaussian elimination to
2 8 4
2 8 4
0 2 1
1 4 2
b =
AT =
Observe that the row echelon form of AT is not the transpose of the row echelon form of
A! However, they do have the same number of pivots since both A and A T have the same
b , we conclude that
rank. Since the pivots of AT are in the first two columns of U
0
2
y2 = 2 ,
y1 = 8 ,
1
4
0
0
1
1
y = 2 y3 = y 3 2 .
y3
1
2
81
c 2003
Peter J. Olver
Figure 2.2.
Figure 2.3.
the edges represent the beams and the vertices the joints where the beams are connected.
In each case, the graph encodes the topology meaning interconnectedness of the
system, but not its geometry lengths of edges, angles, etc.
Two graphs are considered to be the same if one can identify all their edges and
vertices, so that they have the same connectivity properties. A good way to visualize
this is to think of the graph as a collection of strings connected at the vertices. Moving
the vertices and strings around without cutting or rejoining them will have no effect on
the underlying graph. Consequently, there are many ways to draw a given graph; three
equivalent graphs appear in Figure 2.3.
Two vertices in a graph are adjacent if there is an edge connecting them. Two edges
are adjacent if they meet at a common vertex. For instance, in the graph in Figure 2.4, all
vertices are adjacent; edge 1 is adjacent to all edges except edge 5. A path is a sequence of
distinct, i.e., non-repeated, edges, with each edge adjacent to its successor. For example,
in Figure 2.4, one path starts at vertex #1, then goes in order along the edges labeled as
1, 4, 3, 2, thereby passing through vertices 1, 2, 4, 1, 3. Note that while an edge cannot be
repeated in a path, a vertex may be. A circuit is a path that ends up where it began. For
example, the circuit consisting of edges 1, 4, 5, 2 starts at vertex 1, then goes to vertices
2, 4, 3 in order, and finally returns to vertex 1. The starting vertex for a circuit is not
important. For example, edges 4, 5, 2, 1 also represent the same circuit we just described.
A graph is called connected if you can get from any vertex to any other vertex by a path,
1/12/04
82
c 2003
Peter J. Olver
A Simple Graph.
Figure 2.4.
Figure 2.5.
Digraphs.
which is by far the most important case for applications. We note that every graph can
be decomposed into a disconnected collection of connected subgraphs.
In electrical circuits, one is interested in measuring currents and voltage drops along
the wires in the network represented by the graph. Both of these quantities have a direction,
and therefore we need to specify an orientation on each edge in order to quantify how the
current moves along the wire. The orientation will be fixed by specifying the vertex the
edge starts at, and the vertex it ends at. Once we assign a direction to an edge, a
current along that wire will be positive if it moves in the same direction, i.e., goes from
the starting vertex to the ending one, and negative if it moves in the opposite direction.
The direction of the edge does not dictate the direction of the current it just fixes what
directions positive and negative values of current represent. A graph with directed edges
is known as a directed graph or digraph for short. The edge directions are represented by
arrows; examples of digraphs can be seen in Figure 2.5.
Consider a digraph D consisting of n vertices connected by m edges. The incidence
matrix associated with D is an m n matrix A whose rows are indexed by the edges
and whose columns are indexed by the vertices. If edge k starts at vertex i and ends at
vertex j, then row k of the incidence matrix will have a + 1 in its (k, i) entry and 1 in its
(k, j) entry; all other entries of the row are zero. Thus, our convention is that a + 1 entry
1/12/04
83
c 2003
Peter J. Olver
A Simple Digraph.
Figure 2.6.
represents the vertex at which the edge starts and a 1 entry the vertex at which it ends.
A simple example is the digraph in Figure 2.6, which consists of five edges joined at
four different vertices. Its 5 4 incidence matrix is
1 1 0
0
1 0 1 0
A = 1 0
0 1 .
(2.42)
0 1
0 1
0 0
1 1
Thus the first row of A tells us that the first edge starts at vertex 1 and ends at vertex 2.
Similarly, row 2 says that the second edge goes from vertex 1 to vertex 3. Clearly one can
completely reconstruct any digraph from its incidence matrix.
Example 2.48. The matrix
1
1
0
A=
0
0
0
1
0
1
1
0
0
0
1
1
0
1
0
0
0
0
1
1
1
0
0
0
.
0
0
1
(2.43)
qualifies as an incidence matrix because each row contains a single +1, a single 1, and
the other entries are 0. Let us construct the digraph corresponding to A. Since A has five
columns, there are five vertices in the digraph, which we label by the numbers 1, 2, 3, 4, 5.
Since it has seven rows, there are 7 edges. The first row has its + 1 in column 1 and its
1 in column 2 and so the first edge goes from vertex 1 to vertex 2. Similarly, the second
edge corresponds to the second row of A and so goes from vertex 3 to vertex 1. The third
row of A gives an edge from vertex 3 to vertex 2; and so on. In this manner we construct
the digraph drawn in Figure 2.7.
The incidence matrix has important geometric and quantitative consequences for the
graph it represents. In particular, its kernel and cokernel have topological significance. For
1/12/04
84
c 2003
Peter J. Olver
1
1
2
3
5
6
Another Digraph.
Figure 2.7.
example, the kernel of the incidence matrix (2.43) is spanned by the single vector
T
z = (1 1 1 1 1) ,
and represents the fact that the sum of the entries in any given row of A is zero. This
observation holds in general for connected digraphs.
Proposition 2.49. If A is the incidence matrix for a connected digraph, then ker A
T
is one-dimensional, with basis z = ( 1 1 . . . 1 ) .
Proof : If edge k connects vertices i and j, then the k th equation in A z = 0 is zi = zj .
The same equality holds, by a simple induction, if the vertices i and j are connected
by a path. Therefore, if D is connected, all the entries of z are equal, and the result
follows.
Q.E.D.
Corollary 2.50. If A is the incidence matrix for a connected digraph with n vertices,
then rank A = n 1.
Proof : This is an immediate consequence of Theorem 2.47.
Q.E.D.
Next, let us look at the cokernel of an incidence matrix. Consider the particular
example (2.42) corresponding to the digraph in Figure 2.6. We need to compute the kernel
of the transposed incidence matrix
1
1
1
0
0
0
1
0
1 0
(2.44)
AT =
.
0 1 0
0
1
0
0 1 1 1
Solving the homogeneous system AT y = 0 by Gaussian elimination, we discover that
coker A = ker AT is spanned by the two vectors
T
y1 = ( 1 0 1 1 0 ) ,
1/12/04
y2 = ( 0 1 1 0 1 ) .
85
c 2003
Peter J. Olver
Each of these vectors represents a circuit in the digraph, the nonzero entries representing
the direction in which the edges are traversed. For example, y1 corresponds to the circuit
that starts out along edge #1, then traverses edge #4 and finishes by going along edge #3
in the reverse direction, which is indicated by the minus sign in its third entry. Similarly,
y2 represents the circuit consisting of edge #2, followed by edge #5, and then edge #3,
backwards. The fact that y1 and y2 are linearly independent vectors says that the two
circuits are independent.
The general element of coker A is a linear combination c1 y1 + c2 y2 . Certain values of
the constants lead to other types of circuits; for example y1 represents the same circuit
as y1 , but traversed in the opposite direction. Another example is
T
y1 y2 = ( 1 1 0 1 1 ) ,
which represents the square circuit going around the outside of the digraph, along edges
1, 4, 5, 2, the fourth and second being in the reverse direction. We can view this circuit as a
combination of the two triangular circuits; when we add them together the middle edge #3
is traversed once in each direction, which effectively cancels its contribution. (A similar
cancellation occurs in the theory of line integrals; see Section A.5.) Other combinations
represent virtual circuits; for instance, one can interpret 2 y1 21 y2 as two times around
the first triangular circuit plus one half of the other triangular circuit, in the opposite
direction whatever that might mean.
Let us summarize the preceding discussion.
Theorem 2.51. Each circuit in a digraph D is represented by a vector in the cokernel
of its incidence matrix, whose entries are + 1 if the edge is traversed in the correct direction,
1 if in the opposite direction, and 0 if the edge is not in the circuit. The dimension of
the cokernel of A equals the number of independent circuits in D.
The preceding two theorems have an important and remarkable consequence. Suppose
D is a connected digraph with m edges and n vertices and A its m n incidence matrix.
Corollary 2.50 implies that A has rank r = n 1 = n dim ker A. On the other hand,
Theorem 2.51 tells us that dim coker A = l equals the number of independent circuits in
D. The Fundamental Theorem 2.47 says that r = m l. Equating these two different
computations of the rank, we find r = n 1 = m l, or n + l = m + 1. This celebrated
result is known as Eulers formula for graphs, first discovered by the extraordinarily prolific
eighteenth century Swiss mathematician Leonhard Euler .
Theorem 2.52. If G is a connected graph, then
# vertices + # independent circuits = # edges + 1.
(2.45)
Remark : If the graph is planar , meaning that it can be graphed in the plane without
any edges crossing over each other, then the number of independent circuits is equal to
the number of holes in the graph, i.e., the number of distinct polygonal regions bounded
Pronounced Oiler
1/12/04
86
c 2003
Peter J. Olver
Figure 2.8.
A Cubical Digraph.
by the edges of the graph. For example, the pentagonal digraph in Figure 2.7 bounds
three triangles, and so has three independent circuits. For non-planar graphs, (2.45) gives
a possible definition of the number of independent circuits, but one that is not entirely
standard. A more detailed discussion relies on further developments in the topological
properties of graphs, cf. [33].
Example 2.53. Consider the graph corresponding to the edges of a cube, as illustrated in Figure 2.8, where the second figure represents the same graph squashed down
onto a plane. The graph has 8 vertices and 12 edges. Eulers formula (3.76) tells us that
there are 5 independent circuits. These correspond to the interior square and four trapezoids in the planar version of the digraph, and hence to circuits around 5 of the 6 faces
of the cube. The missing face does indeed define a circuit, but it can be represented as
the sum of the other five circuits, and so is not independent. In Exercise , the reader is
asked to write out the incidence matrix for the cubical digraph and explicitly identify the
basis of its kernel with the circuits.
We do not have the space to further develop the remarkable connections between
graph theory and linear algebra. The interested reader is encouraged to consult a text
devoted to graph theory, e.g., [33].
1/12/04
87
c 2003
Peter J. Olver
Chapter 3
Inner Products and Norms
The geometry of Euclidean space relies on the familiar properties of length and angle.
The abstract concept of a norm on a vector space formalizes the geometrical notion of the
length of a vector. In Euclidean geometry, the angle between two vectors is governed by
their dot product, which is itself formalized by the abstract concept of an inner product.
Inner products and norms lie at the heart of analysis, both linear and nonlinear, in both
finite-dimensional vector spaces and infinite-dimensional function spaces. It is impossible
to overemphasize their importance for both theoretical developments, practical applications
in all fields, and in the design of numerical solution algorithms.
Mathematical analysis is founded on a few key inequalities. The most basic is the
CauchySchwarz inequality, which is valid in any inner product space. The more familiar triangle inequality for the associated norm is derived as a simple consequence. Not
every norm arises from an inner product, and in the general situation, the triangle inequality becomes part of the definition. Both inequalities retain their validity in both
finite-dimensional and infinite-dimensional vector spaces. Indeed, their abstract formulation helps focus on the key ideas in the proof, avoiding distracting complications resulting
from the explicit formulas.
In Euclidean space R n , the characterization of general inner products will lead us
to the extremely important class of positive definite matrices. Positive definite matrices
play a key role in a variety of applications, including minimization problems, least squares,
mechanical systems, electrical circuits, and the differential equations describing dynamical
processes. Later, we will generalize the notion of positive definiteness to more general linear
operators, governing the ordinary and partial differential equations arising in continuum
mechanics and dynamics. Positive definite matrices most commonly appear in so-called
Gram matrix form, consisting of the inner products between selected elements of an inner
product space. In general, positive definite matrices can be completely characterized by
their pivots resulting from Gaussian elimination. The associated matrix factorization can
be reinterpreted as the method of completing the square for the associated quadratic form.
So far, we have confined our attention to real vector spaces. Complex numbers, vectors
and functions also play an important role in applications, and so, in the final section, we
formally introduce complex vector spaces. Most of the formulation proceeds in direct
analogy with the real version, but the notions of inner product and norm on complex
vector spaces requires some thought. Applications of complex vector spaces and their
inner products are of particular importance in Fourier analysis and signal processing, and
absolutely essential in modern quantum mechanics.
1/12/04
88
c 2003
Peter J. Olver
v3
v
v2
v2
v1
v1
Figure 3.1.
n
X
vi wi ,
(3.1)
i=1
between (column) vectors v = ( v1 , v2 , . . . , vn ) , w = ( w1 , w2 , . . . , wn ) lying in the Euclidean space R n . An important observation is that the dot product (3.1) can be identified
with the matrix product
w
1
v w = v T w = ( v 1 v2
w2
. . . vn )
..
.
wn
(3.2)
vv =
v12 + v22 + + vn2 .
(3.3)
kvk =
This formula generalizes the classical Pythagorean Theorem to n-dimensional Euclidean
space; see Figure 3.1. Since each term in the sum is non-negative, the length of a vector is
also non-negative, k v k 0. Furthermore, the only vector of length 0 is the zero vector.
The dot product and norm satisfy certain evident properties, and these serve as the
basis for the abstract definition of more general inner products on real vector spaces.
1/12/04
89
c 2003
Peter J. Olver
Definition 3.1. An inner product on the real vector space V is a pairing that takes
two vectors v, w V and produces a real number h v ; w i R. The inner product is
required to satisfy the following three axioms for all u, v, w V , and c, d R.
(i ) Bilinearity:
h c u + d v ; w i = c h u ; w i + d h v ; w i,
(3.4)
h u ; c v + d w i = c h u ; v i + d h u ; w i.
(ii ) Symmetry:
h v ; w i = h w ; v i.
(iii ) Positivity:
hv;vi > 0
whenever
v6
= 0,
(3.5)
while
h 0 ; 0 i = 0.
(3.6)
A vector space equipped with an inner product is called an inner product space. As
we shall see, a given vector space can admit many different inner products. Verification of
the inner product axioms for the Euclidean dot product is straightforward, and left to the
reader.
Given an inner product, the associated norm of a vector v V is defined as the
positive square root of the inner product of the vector with itself:
p
(3.7)
kvk = hv;vi .
The positivity axiom implies that k v k 0 is real and non-negative, and equals 0 if and
only if v = 0 is the zero vector.
Example 3.2. While certainly the most important inner product on R n , the dot
product is by no means the only possibility. A simple example is provided by the weighted
inner product
w1
v1
.
(3.8)
,
w=
h v ; w i = 2 v 1 w1 + 5 v 2 w2 ,
v=
w2
v2
between vectors in R 2 . The symmetry axiom (3.5) is immediate. Moreover,
h c u + d v ; w i = 2 (c u1 + d v1 ) w1 + 5 (c u2 + d v2 ) w2
= (2 c u1 w1 + 5 c u2 w2 ) + (2 d v1 w1 + 5 d v2 w2 ) = c h u ; w i + d h v ; w i,
which verifies the first bilinearity condition; the second follows by a very similar computation. (Or, one can rely on symmetry; see Exercise .) Moreover,
h v ; v i = 2 v12 + 5 v22 0
is clearly strictly positive for any v 6
= 0 and equal to zero when v = 0, which proves
positivity and hence establishes (3.8) as an legitimate inner product on R 2 . The associated
weighted norm is
p
kvk =
2 v12 + 5 v22 .
A less evident example is provided by the expression
h v ; w i = v 1 w1 v 1 w2 v 2 w1 + 4 v 2 w2 .
1/12/04
90
(3.9)
c 2003
Peter J. Olver
Bilinearity is verified in the same manner as before, and symmetry is obvious. Positivity
is ensured by noticing that
h v ; v i = v12 2 v1 v2 + 4 v22 = (v1 v2 )2 + 3 v22 > 0
is strictly positive for all nonzero v =
6 0. Therefore, (3.9) defines an alternative inner
product on R 2 . The associated norm
p
kvk =
v12 2 v1 v2 + 4 v22
i=1
The numbers ci > 0 are the weights. The larger the weight ci , the more the ith coordinate
of v contributes to the norm. Weighted norms are particularly important in statistics and
data fitting, where one wants to emphasize certain quantities and de-emphasize others;
this is done by assigning suitable weights to the different components of the data vector
v. Section 4.3 on least squares approximation methods will contain further details.
Inner Products on Function Space
Inner products and norms on function spaces will play an absolutely essential role in
modern analysis, particularly Fourier analysis and the solution to boundary value problems
for both ordinary and partial differential equations. Let us introduce the most important
examples.
Example 3.4. Given a bounded closed interval [ a, b ] R, consider the vector space
C = C0 [ a, b ] consisting of all continuous functions f : [ a, b ] R. The integral
0
hf ;gi =
f (x) g(x) dx
(3.11)
defines an inner product on the vector space C0 , as we shall prove below. The associated
norm is, according to the basic definition (3.7),
s
Z b
f (x)2 dx .
(3.12)
kf k =
a
This quantity is known as the L2 norm of the function f over the interval [ a, b ]. The L2
norm plays the same role in infinite-dimensional function space that the Euclidean norm
or length of a vector plays in the finite-dimensional Euclidean vector space R n .
1/12/04
91
c 2003
Peter J. Olver
/2
0
/2
1
1
2
= .
sin x cos x dx = sin x
2
2
x=0
2
k sin x k =
.
(sin x) dx =
4
0
One must always be careful when evaluating function norms. For example, the constant
function c(x) 1 has norm
s
r
Z /2
2
k1k =
,
1 dx =
2
0
not 1 as you might have expected. We also note that the value of the norm depends upon
which interval the integral is taken over. For instance, on the longer interval [ 0, ],
sZ
k1k =
12 dx = .
0
Thus, when dealing with the L2 inner product or norm, one must always be careful to
specify the function space, or, equivalently, the interval on which it is being evaluated.
Let us prove that formula (3.11) does, indeed, define an inner product. First, we need
to check that h f ; g i is well-defined. This follows because the product f (x) g(x) of two
continuous functions is also continuous, and hence its integral over a bounded interval is
defined and finite. The symmetry condition for the inner product is immediate:
Z b
hf ;gi =
f (x) g(x) dx = h g ; f i,
a
valid for arbitrary continuous functions f, g, h and scalars (constants) c, d. The second
bilinearity axiom is proved similarly; alternatively, one can use symmetry to deduce it
from the first as in Exercise . Finally, positivity requires that
Z b
2
kf k = hf ;f i =
f (x)2 dx 0.
a
1/12/04
92
c 2003
Peter J. Olver
Figure 3.2.
This is clear because f (x)2 0, and the integral of a nonnegative function is nonnegative.
Moreover, since the function f (x)2 is continuous and nonnegative, its integral will vanish,
Z b
f (x)2 dx = 0 if and only if f (x) 0 is the zero function, cf. Exercise . This completes
a
the demonstration.
Remark : The preceding construction applies to more general functions, but we have
restricted our attention to continuous functions to avoid certain technical complications.
The most general function space admitting this important inner product is known as
Hilbert space, which forms the foundation for modern analysis, [126], including the rigorous
theory of Fourier series, [51], and also lies at the heart of modern quantum mechanics,
[100, 104, 122]. One does need to be extremely careful when trying to extend the inner
product to more general functions. Indeed, there are nonzero, discontinuous functions with
zero L2 norm. An example is
Z 1
1,
x = 0,
2
which satisfies
kf k =
f (x)2 dx = 0
(3.13)
f (x) =
0,
otherwise,
1
because any function which is zero except at finitely many (or even countably many) points
has zero integral. We will discuss some of the details of the Hilbert space construction in
Chapters 12 and 13.
One can also define weighted inner products on the function space C0 [ a, b ]. The
weights along the interval are specified by a (continuous) positive scalar function w(x) > 0.
The corresponding weighted inner product and norm are
s
Z b
Z b
(3.14)
hf ;gi =
f (x) g(x) w(x) dx,
kf k =
f (x)2 w(x) dx .
a
The verification of the inner product axioms in this case is left as an exercise for the reader.
3.2. Inequalities.
1/12/04
93
c 2003
Peter J. Olver
Returning to the general framework of inner products on vector spaces, we now prove
the most important inequality in applied mathematics. Its origins can be found in the
geometric interpretation of the dot product on Euclidean space in terms of the angle
between vectors.
The CauchySchwarz Inequality
In two and three-dimensional Euclidean geometry, the dot product between two vectors can be geometrically characterized by the equation
v w = k v k k w k cos ,
(3.15)
where measures the angle between the vectors v and w, as depicted in Figure 3.2. Since
| cos | 1,
the absolute value of the dot product is bounded by the product of the lengths of the
vectors:
| v w | k v k k w k.
This fundamental inequality is named after two of the founders of modern analysis, Augustin Cauchy and Herman Schwarz. It holds, in fact, for any inner product.
Theorem 3.5. Every inner product satisfies the CauchySchwarz inequality
| h v ; w i | k v k k w k,
v, w V.
(3.16)
Here, k v k is the associated norm, while | | denotes absolute value. Equality holds if and
only if v and w are parallel vectors.
Proof : The case when w = 0 is trivial, since both sides of (3.16) are equal to 0. Thus,
we may suppose w 6
= 0. Let t R be an arbitrary scalar. Using the three basic inner
product axioms, we have
0 k v + t w k 2 = h v + t w ; v + t w i = k v k 2 + 2 t h v ; w i + t 2 k w k2 ,
(3.17)
and thus at
t=
hv;wi
.
k w k2
Russians also give credit for its discovery to their compatriot Viktor Bunyakovskii, and,
indeed, many authors append his name to the inequality.
Two vectors are parallel if and only if one is a scalar multiple of the other. The zero vector
is parallel to every other vector, by convention.
1/12/04
94
c 2003
Peter J. Olver
h v ; w i 2 k v k 2 k w k2 .
or
Taking the (positive) square root of both sides of the final inequality completes the theorems proof.
Q.E.D.
Given any inner product on a vector space, we can use the quotient
cos =
hv;wi
kvk kwk
(3.18)
1
1
= ,
2
2 2
and so = 13 , i.e., 60 . Similarly, the angle between the polynomials p(x) = x and
q(x) = x2 defined on the interval I = [ 0, 1 ] is given by
Z 1
r
1
x3 dx
2
hx;x i
15
4
0
s
cos =
= sZ
,
= q q =
2
Z
kxk kx k
16
1
1
1
1
3
5
x2 dx
x4 dx
0
1
2
Even in Euclidean space R n , the measurement of angle (and length) depends upon
the choice of an underlying inner product. Different inner products lead to different angle
measurements; only for the standard Euclidean dot product does angle correspond to our
everyday experience.
1/12/04
95
c 2003
Peter J. Olver
Orthogonal Vectors
A particularly important geometrical configuration occurs when two vectors are perpendicular , which means that they meet at a right angle: = 21 or 23 , and so cos = 0.
The angle formula (3.18) implies that the vectors v, w are perpendicular if and only if
their dot product vanishes: v w = 0. Perpendicularity also plays a key role in general
inner product spaces, but, for historical reasons, has been given a different name.
Definition 3.6. Two elements v, w V of an inner product space V are called
orthogonal if their inner product h v ; w i = 0.
Orthogonality is a remarkably powerful tool in all applications of linear algebra, and
often serves to dramatically simplify many computations. We will devote Chapter 5 to its
detailed development.
T
6
1
= 2 1 6 + 5 2 (3) = 18 6
= 0.
;
hv;wi =
3
2
Thus, orthogonality, like angles in general, depends upon which inner product is being
used.
Example 3.8. The polynomials p(x) = x and q(x) = x2 12 are orthogonal with
Z 1
respect to the inner product h p ; q i =
p(x) q(x) dx on the interval [ 0, 1 ], since
0
x; x
1
2
1
0
x x
1
2
dx =
1
0
x3
1
2
x dx = 0.
They fail to be orthogonal on most other intervals. For example, on the interval [ 0, 2 ],
Z 2
Z 2
2 1
3 1
2
1
x x 2 dx =
x; x 2 =
x 2 x dx = 3.
0
The familiar triangle inequality states that the length of one side of a triangle is at
most equal to the sum of the lengths of the other two sides. Referring to Figure 3.3, if the
first two side are represented by vectors v and w, then the third corresponds to their sum
v + w, and so k v + w k k v k + k w k. The triangle inequality is a direct consequence of
the CauchySchwarz inequality, and hence holds for any inner product space.
Theorem 3.9.
inequality
(3.19)
for every v, w V . Equality holds if and only if v and w are parallel vectors.
1/12/04
96
c 2003
Peter J. Olver
v+w
w
Triangle Inequality.
Figure 3.3.
Proof : We compute
k v + w k2 = h v + w ; v + w i = k v k 2 + 2 h v ; w i + k w k 2
2
k v k2 + 2 k v k k w k + k w k 2 = k v k + k w k ,
where the inequality follows from CauchySchwarz. Taking square roots of both sides and
using positivity completes the proof.
Q.E.D.
1
2
3
kf k =
1
0
(x 1)2 dx =
kf + gk =
The triangle inequality requires
1
,
3
s
77
60
kgk =
Z
(x2 + x)2 dx =
0
1
3
23
15
(x2 + 1)2 dx =
0
23
,
15
77
.
60
, which is correct.
The CauchySchwarz and triangle inequalities look much more impressive when writ1/12/04
97
c 2003
Peter J. Olver
ten out in full detail. For the Euclidean inner product (3.1), they are
v
v
n
u n
u n
X
uX 2 uX
vi t
wi2 ,
vi wi t
i=1
i=1
i=1
v
v
v
u n
u n
u n
uX 2
uX 2
uX
2
t
(vi + wi ) t
vi + t
wi .
i=1
i=1
(3.20)
i=1
Theorems 3.5 and 3.9 imply that these inequalities are valid for arbitrary real numbers
v1 , . . . , vn , w1 , . . . , wn . For the L2 inner product (3.12) on function space, they produce
the following splendid integral inequalities:
s
s
Z
Z b
Z b
b
2
f (x) dx
g(x)2 dx ,
f (x) g(x) dx
a
a
(3.21)
s
s
s
Z b
Z b
Z b
2
f (x) + g(x) dx
f (x)2 dx +
g(x)2 dx ,
a
which hold for arbitrary continuous (and even more general) functions. The first of these is
the original CauchySchwarz inequality, whose proof appeared to be quite deep when it first
appeared. Only after the abstract notion of an inner product space was properly formalized
did its innate simplicity and generality become evident. One can also generalize either of
these sets of inequalities to weighted inner products, replacing the integration element dx
by a weighted version w(x) dx, provided w(x) > 0.
3.3. Norms.
Every inner product gives rise to a norm that can be used to measure the magnitude
or length of the elements of the underlying vector space. However, not every such norm
used in analysis and applications arises from an inner product. To define a general norm
on a vector space, we will extract those properties that do not directly rely on the inner
product structure.
Definition 3.12. A norm on the vector space V assigns a real number k v k to each
vector v V , subject to the following axioms for all v, w V , and c R:
(i ) Positivity: k v k 0, with k v k = 0 if and only if v = 0.
(ii ) Homogeneity: k c v k = | c | k v k.
(iii ) Triangle inequality: k v + w k k v k + k w k.
As we now know, every inner product gives rise to a norm. Indeed, positivity of the
norm is one of the inner product axioms. The homogeneity property follows since
p
p
p
kcvk =
hcv;cvi =
c2 h v ; v i = | c | h v ; v i = | c | k v k.
Finally, the triangle inequality for an inner product norm was established in Theorem 3.9.
Here are some important examples of norms that do not come from inner products.
1/12/04
98
c 2003
Peter J. Olver
is
(3.22)
The max or norm is equal to the maximal entry (in absolute value):
k v k = sup { | v1 |, . . . , | vn | }.
(3.23)
Verification of the positivity and homogeneity properties for these two norms is straightforward; the triangle inequality is a direct consequence of the elementary inequality
|a + b| |a| + |b|
for absolute values.
The Euclidean norm, 1norm, and norm on R n are just three representatives of
the general pnorm
v
u n
uX
p
| v i |p .
(3.24)
k v kp = t
i=1
This quantity defines a norm for any 1 p < . The norm is a limiting case of
the pnorm as p . Note that the Euclidean norm (3.3) is the 2norm, and is often
designated as such; it is the only pnorm which comes from an inner product. The positivity
and homogeneity properties of the pnorm are straightforward. The triangle inequality,
however, is not trivial; in detail, it reads
v
v
v
u n
u n
u n
X
u
uX
uX
p
p
p
p
p
t
t
| vi + wi |
| vi | + t
| w i |p ,
(3.25)
i=1
i=1
i=1
In particular, the L1 norm is given by integrating the absolute value of the function:
Z b
k f k1 =
| f (x) | dx.
(3.27)
a
The L norm (3.12) appears as a special case, p = 2, and, again, is the only one arising from
an inner product. The proof of the general triangle or Minkowski inequality for p 6
= 1, 2 is
99
(3.28)
c 2003
Peter J. Olver
k p k = max | 3 x2 2 | : 1 x 1 = 2,
| 3 x2 2 | dx
1
Z
2/3
Z 2/3
(3 x 2) dx +
2
(2 3 x ) dx + (3 x 2) dx
2/3
1
2/3
q
q
q
2
= 34 23 1 + 83 23 + 43 23 1 = 16
3
3 2 = 2.3546 . . . .
(3.29)
For the standard dot product norm, we recover the usual notion of distance between points
in Euclidean space. Other types of norms produce alternative (and sometimes quite useful)
notions of distance that, nevertheless, satisfy all the familiar distance axioms. Notice that
distance is symmetric, d(v, w) = d(w, v). Moreover, d(v, w) = 0 if and only if v = w.
The triangle inequality implies that
d(v, w) d(v, z) + d(z, w)
(3.30)
v kvk
=
= 1.
kuk =
kvk kvk
1/12/04
100
c 2003
Q.E .D.
Peter J. Olver
T
Example 3.17. The vector v = ( 1, 2 ) has length k v k2 = 5 with respect to
the standard Euclidean norm. Therefore, the unit vector pointing in the same direction as
v is
1 !
v
1
1
5
=
u=
.
=
2
k v k2
5
2
5
1
2
1
3
32
is the unit vector parallel to v in the 1 norm. Finally, k v k = 2, and hence the corresponding unit vector for the norm is
1 !
1
v
1
2
b=
.
=
=
u
k v k
2 2
1
Thus, the notion of unit vector will depend upon which norm is being used.
2 1 2
4
7
2
1
k p k2 =
.
x 2 dx =
x x + 4 dx =
60
0
0
p(x)
2
15 is a unit polynomial, k u k = 1, which is
= 60
x
2
7
7
kpk
parallel to (or, more correctly, a scalar multiple of) the polynomial p. On the other
hand, for the L norm,
k p k = max x2 21 0 x 1 = 12 ,
Therefore, u(x) =
S1 = k u k = 1 V.
(3.31)
Thus, the unit sphere for the Euclidean norm on R n is the usual round sphere
S1 = { x R n | x1 = 1 or x2 = 1 or . . . or xn = 1 } .
For the 1 norm, it is the unit diamond or octahedron
S1 = { x R n | | x 1 | + | x 2 | + + | x n | = 1 } .
1/12/04
101
c 2003
Peter J. Olver
-1
0.5
0.5
0.5
-0.5
0.5
-1
-0.5
0.5
-1
-0.5
0.5
-0.5
-0.5
-0.5
-1
-1
-1
Figure 3.4.
In all cases, the closed unit ball B1 = k u k 1 consists of all vectors of norm less
than or equal to 1, and has the unit sphere as its boundary. If V is a finite-dimensional
normed vector space, then the unit ball B1 forms a compact subset, meaning that it is
closed and bounded. This topological fact, which is not true in infinite-dimensional spaces,
underscores the fundamental distinction between finite-dimensional vector analysis and the
vastly more complicated infinite-dimensional realm.
Equivalence of Norms
While there are many different types of norms, in a finite-dimensional vector space
they are all more or less equivalent. Equivalence does not mean that they assume the same
value, but rather that they are, in a certain sense, always close to one another, and so for
most analytical purposes can be used interchangeably. As a consequence, we may be able
to simplify the analysis of a problem by choosing a suitably adapted norm.
Theorem 3.19. Let k k1 and k k2 be any two norms on R n . Then there exist
positive constants c? , C ? > 0 such that
c? k v k 1 k v k 2 C ? k v k 1
for every
v Rn.
(3.32)
Proof : We just sketch the basic idea, leaving the details to a more rigorous real analysis course, cf. [125, 126]. We begin by noting that a norm defines a continuous function
n
f (v) =
k v k on R . (Continuity is, in fact, a consequence of the triangle inequality.) Let
S1 = k u k1 = 1 denote the unit sphere of the first norm. Any continuous function defined on a compact set achieves both a maximum and a minimum value. Thus, restricting
the second norm function to the unit sphere S1 of the first norm, we can set
c? = k u? k2 = min { k u k2 | u S1 } ,
C ? = k U? k2 = max { k u k2 | u S1 } ,
(3.33)
for certain vectors u? , U? S1 . Note that 0 < c? C ? < , with equality holding if and
only if the the norms are the same. The minimum and maximum (3.33) will serve as the
constants in the desired inequalities (3.32). Indeed, by definition,
c? k u k 2 C ?
1/12/04
102
when
k u k1 = 1,
(3.34)
c 2003
Peter J. Olver
Figure 3.5.
Equivalence of Norms.
and so (3.32) is valid for all u S1 . To prove the inequalities in general, assume v 6
= 0.
(The case v = 0 is trivial.) Lemma 3.16 says that u = v/k v k1 S1 is a unit vector
in the first norm: k u k1 = 1. Moreover, by the homogeneity property of the norm,
k u k2 = k v k2 /k v k1 . Substituting into (3.34) and clearing denominators completes the
proof of (3.32).
Q.E.D.
Example 3.20. For example, consider the Euclidean norm k k2 and the max norm
k k on R n . According to (3.33), the bounding constants are found by minimizing and
maximizing k u k = max{ | u1 |, . . . , | un | } over all unit vectors k u k2 = 1 on the (round)
unit sphere. Its maximal value is obtained at the poles, whenU? = ek , with
k ek k = 1.
1
1
Thus, C ? = 1. The minimal value is obtained when u? = , . . . ,
has all equal
n
n
(3.35)
One can interpret these inequalities as follows. Suppose v is a vector lying on the unit
sphere in the Euclidean norm, so k v k2 = 1. Then (3.35) tells us that its norm is
bounded from above and below by 1/ n k v k 1. Therefore, the unit Euclidean
sphere sits inside the unit sphere in the norm, and outside the sphere of radius 1/ n.
Figure 3.5 illustrates the two-dimensional situation.
One significant consequence of the equivalence of norms is that, in R n , convergence is
independent of the norm. The following are all equivalent to the standard convergence
of a sequence u(1) , u(2) , u(3) , . . . of vectors in R n :
(a) the vectors converge: u(k) u? :
(k)
(b) the individual components all converge: ui u?i for i = 1, . . . , n.
(c) the difference in norms goes to zero: k u(k) u? k 0.
The last case, called convergence in norm, does not depend on which norm is chosen.
Indeed, the basic inequality (3.32) implies that if one norm goes to zero, so does any other
1/12/04
103
c 2003
Peter J. Olver
norm. An important consequence is that all norms on R n induce the same topology
convergence of sequences, notions of open and closed sets, and so on. None of this is true in
infinite-dimensional function space! A rigorous development of the underlying topological
and analytical properties of compactness, continuity, and convergence is beyond the scope
of this course. The motivated student is encouraged to consult a text in real analysis, e.g.,
[125, 126], to find the relevant definitions, theorems and proofs.
Example 3.21. Consider the infinite-dimensional vector space C0 [ 0, 1 ] consisting of
all continuous functions on the interval [ 0, 1 ]. The functions
(
1 n x,
0 x n1 ,
fn (x) =
1
0,
n x 1,
have identical L norms
k fn k = sup { | fn (x) | | 0 x 1 } = 1.
On the other hand, their L2 norm
s
s
Z 1
Z
2
fn (x) dx =
k f n k2 =
0
1/n
0
1
(1 n x)2 dx =
3n
goes to zero as n . This example shows that there is no constant C ? such that
k f k C ? k f k2
for all f C0 [ 0, 1 ]. The L and L2 norms on C0 [ 0, 1 ] are not equivalent there exist
functions which have unit L2 norm but arbitrarily small L norm. Similar inequivalence
properties apply to all of the other standard function space norms. As a result, the topology
on function space is intimately connected with the underlying choice of norm.
n
X
xi e i ,
y = y 1 e1 + + y n en =
i=1
104
n
X
yj ej . (3.36)
j =1
c 2003
Peter J. Olver
Let us carefully analyze the three basic inner product axioms, in order. We use the
bilinearity of the inner product to expand
+
* n
n
n
X
X
X
xi yj h ei ; ej i.
yj e j =
xi e i ;
hx;yi =
i,j = 1
j =1
i=1
n
X
kij xi yj = xT K y,
(3.37)
i,j = 1
where K denotes the n n matrix of inner products of the basis vectors, with entries
kij = h ei ; ej i,
i, j = 1, . . . , n.
(3.38)
We conclude that any inner product must be expressed in the general bilinear form (3.37).
The two remaining inner product axioms will impose certain conditions on the inner
product matrix K. The symmetry of the inner product implies that
kij = h ei ; ej i = h ej ; ei i = kji ,
i, j = 1, . . . , n.
kxk = hx;xi = x Kx =
n
X
i,j = 1
kij xi xj 0
for all
x Rn,
(3.39)
with equality if and only if x = 0. The precise meaning of this positivity condition on the
matrix K is not as immediately evident, and so will be encapsulated in the following very
important definition.
Definition 3.22. An n n matrix K is called positive definite if it is symmetric,
K = K, and satisfies the positivity condition
T
xT K x > 0
for all
06
= x R n.
(3.40)
We will sometimes write K > 0 to mean that K is a symmetric, positive definite matrix.
Warning: The condition K > 0 does not mean that all the entries of K are positive.
There are many positive definite matrices which have some negative entries see Example 3.24 below. Conversely, many symmetric matrices with all positive entries are not
positive definite!
1/12/04
105
c 2003
Peter J. Olver
x, y R n ,
for
(3.41)
q(x) = x K x =
n
X
kij xi xj ,
(3.42)
i,j = 1
for all
06
= x R n.
(3.43)
Thus, a quadratic form is positive definite if and only if its coefficient matrix is.
4 2
has two negExample 3.24. Even though the symmetric matrix K =
2 3
ative entries, it is, nevertheless, a positive definite matrix. Indeed, the corresponding
quadratic form
2
q(x) = xT K x = 4 x21 4 x1 x2 + 3 x22 = 2 x1 x2 + 2 x22 0
is a sum of two non-negative quantities. Moreover, q(x) = 0 if and only if both terms are
zero, which requires that 2 x1 x2 = 0 and x2 = 0, whereby x1 = 0 also. This proves
positivity for all nonzero x, and hence K > 0 is indeed a positive definite matrix. The
corresponding inner product on R 2 is
4 2
y1
= 4 x 1 y1 2 x 1 y2 2 x 2 y1 + 3 x 2 y2 .
h x ; y i = ( x 1 x2 )
y2
2 3
1 2
On the other hand, despite the fact that the matrix K =
has all positive
2 1
entries, it is not a positive definite matrix. Indeed, writing out
q(x) = xT K x = x21 + 4 x1 x2 + x22 ,
we find, for instance, that q(1, 1) = 2 < 0, violating positivity. These two simple
examples should be enough to convince the reader that the problem of determining whether
a given symmetric matrix is or is not positive definite is not completely elementary.
Exercise shows that the coefficient matrix K in any quadratic form can be taken to be
symmetric without any loss of generality.
1/12/04
106
c 2003
Peter J. Olver
With a little practice, it is not difficult to read off the coefficient matrix K from the
explicit formula for the quadratic form (3.42).
Example 3.25. Consider the quadratic form
q(x, y, z) = x2 + 4 x y + 6 y 2 2 x z + 9 z 2
depending upon three variables. The corresponding coefficient matrix
1
1 2 1
whereby
q(x, y, z) = ( x y z )
2
K=
2 6 0
1
1 0 9
is
2
6
0
1
x
0
y .
9
z
Note that the squared terms in q contribute directly to the diagonal entries of K, while the
mixed terms are split in half to give the symmetric off-diagonal entries. The reader might
wish to try proving that this particular matrix is positive definite by proving positivity of
T
the quadratic form: q(x, y, z) > 0 for all nonzero ( x, y, z ) R 3 . Later, we will establish
a systematic test for positive definiteness.
Slightly more generally, a quadratic form and its associated symmetric coefficient
matrix are called positive semi-definite if
q(x) = xT K x 0
for all
x R n.
(3.44)
A positive semi-definite matrix may have null directions, meaning non-zero vectors z such
that q(z) = zT K z = 0. Clearly any vector z ker K that lies in the matrixs kernel
defines a null direction, but there may be others. In particular, a positive definite matrix
is not allowed to have null directions, so ker K = {0}. Proposition 2.39 implies that all
positive definite matrices are invertible.
Theorem 3.26. All positive definite matrices K are non-singular.
1 1
Example 3.27. The matrix K =
is positive semi-definite, but not
1 1
positive definite. Indeed, the associated quadratic form
q(x) = xT K x = x21 2 x1 x2 + x22 = (x1 x2 )2 0
is a perfect square, and so clearly non-negative. However, the elements of ker K, namely
T
the scalar multiples of the vector ( 1 1 ) , define null directions, since q(1, 1) = 0.
a b
Example 3.28. A general symmetric 2 2 matrix K =
is positive definite
b c
if and only if the associated quadratic form satisfies
q(x) = a x21 + 2 b x1 x2 + c x22 > 0
(3.45)
for all x 6
= 0. Analytic geometry tells us that this is the case if and only if
a c b2 > 0,
a > 0,
(3.46)
i.e., the quadratic form has positive leading coefficient and positive determinant (or negative discriminant). A direct proof of this elementary fact will appear shortly.
1/12/04
107
c 2003
Peter J. Olver
h v1 ; v 1 i h v1 ; v 2 i . . . h v 1 ; v n i
h v2 ; v 1 i h v2 ; v 2 i . . . h v 2 ; v n i
.
K=
(3.47)
..
..
..
..
.
.
.
.
h vn ; v 1 i h vn ; v 2 i . . . h v n ; v n i
is the n n matrix whose entries are the inner products between the chosen vector space
elements.
Symmetry of the inner product implies symmetry of the Gram matrix:
kij = h vi ; vj i = h vj ; vi i = kji ,
and hence
K T = K.
(3.48)
In fact, the most direct method for producing positive definite and semi-definite matrices
is through the Gram matrix construction.
Theorem 3.30. All Gram matrices are positive semi-definite. A Gram matrix is
positive definite if and only if the elements v1 , . . . , vn V are linearly independent.
Proof : To prove positive (semi-)definiteness of K, we need to examine the associated
quadratic form
n
X
T
q(x) = x K x =
kij xi xj .
i,j = 1
n
X
i,j = 1
h v i ; v j i x i xj .
Bilinearity of the inner product on V implies that we can assemble this summation into a
single inner product
* n
+
n
X
X
q(x) =
xi v i ;
xj v j
= h v ; v i = k v k2 0,
i=1
1/12/04
j =1
108
c 2003
Peter J. Olver
where
v = x 1 v1 + + x n vn
lies in the subspace of V spanned by the given vectors. This immediately proves that K
is positive semi-definite.
Moreover, q(x) = k v k2 > 0 as long as v 6
= 0. If v1 , . . . , vn are linearly independent,
then v = 0 if and only if x1 = = xn = 0, and hence, in this case, q(x) and K are
positive definite.
Q.E.D.
3
1
Example 3.31. Consider the vectors v1 = 2 , v2 = 0 in R 3 . For the
6
1
standard Euclidean dot product, the Gram matrix is
6 3
v1 v 1 v1 v 2
.
=
K=
v2 v 1 v2 v 2
3 45
Positive definiteness implies that the associated quadratic form
q(x1 , x2 ) = 6 x21 6 x1 x2 + 45 x22 > 0
is positive for all (x1 , x2 ) 6
= 0. This can be checked directly using the criteria in (3.46).
On the other hand, if we use the weighted inner product h x ; y i = 3 x1 y1 + 2 x2 y2 +
5 x3 y3 , then the corresponding Gram matrix is
16 21
h v1 ; v 1 i h v1 ; v 2 i
,
=
K=
21 207
h v2 ; v 1 i h v2 ; v 2 i
which, by construction, is also positive definite.
In the case of the Euclidean dot product, the construction of the Gram matrix K
can be directly implemented as follows. Given vectors v1 , . . . , vn R m , let us form the
mn matrix A = ( v1 v2 . . . vn ) whose columns are the vectors in question. Owing to the
identification (3.2) between the dot product and multiplication of row and column vectors,
the (i, j) entry of K is given as the product
kij = vi vj = viT vj
of the ith row of the transpose AT with the j th column of A. In other words, the Gram
matrix
K = AT A
(3.49)
is the matrix
A=
2
1
1 3
3
1
2
1
6
3
T
2 0 =
and so
K=A A=
0 ,
.
3 0 6
3 45
6
1 6
Theorem 3.30 implies that the Gram matrix (3.49) is positive definite if and only if
the columns of A are linearly independent. This implies the following result.
1/12/04
109
c 2003
Peter J. Olver
if we
set
3 0 0
1
3
C = 0 2 0 , then the weighted Gram matrix based on the vectors 2 , 0
0 0 5
1
6
of Example 3.31 is
3 0 0
1 3
16
21
1 2 1
T
,
0 2 0 2 0 =
K =A CA=
21 207
3 0 6
0 0 5
1 6
reproducing the second part of Example 3.31.
1/12/04
110
c 2003
Peter J. Olver
The Gram construction also carries over to inner products on function space. Here is
a particularly important example.
Example 3.35. Consider vector space C0 [ 0, 1 ] consisting of continuous functions
Z 1
2
on the interval 0 x 1, equipped with the L inner product h f ; g i =
f (x) g(x) dx.
0
Let us construct the Gram matrix corresponding to the elementary monomial functions
1, x, x2 . We compute the required inner products
Z 1
Z 1
1
2
h1;1i = k1k =
dx = 1,
h1;xi =
x dx = ,
2
0
0
Z 1
Z 1
1
1
x2 dx = ,
h 1 ; x2 i =
x2 dx = ,
h x ; x i = k x k2 =
3
3
0
0
Z 1
Z 1
1
1
h x2 ; x 2 i = k x 2 k 2 =
h x ; x2 i =
x4 dx = ,
x3 dx = .
5
4
0
0
Therefore, the Gram matrix is
1 12 13
K = 12 31 14 .
1
3
1
4
1
5
The monomial functions 1, x, x2 are linearly independent. Therefore, Theorem 3.30 implies
that this particular matrix is positive definite.
The alert reader may recognize this Gram matrix K = H3 as the 3 3 Hilbert matrix
that we encountered in (1.67). More generally, the Gram matrix corresponding to the
monomials 1, x, x2 , . . . , xn has entries
Z 1
1
i
j
xi+j dt =
,
i, j = 0, . . . , n.
kij = h x ; x i =
i+j+1
0
Therefore, the monomial Gram matrix K = Hn+1 is the (n + 1) (n + 1) Hilbert matrix
(1.67). As a consequence of Theorems 3.26 and 3.33, we have proved the following nontrivial result.
Proposition 3.36. The n n Hilbert matrix Hn is positive definite. In particular,
Hn is a nonsingular matrix.
Example 3.37. Let us construct the Gram matrixZcorresponding to the functions
Z
2
h cos x ; cos x i = k cos x k =
cos2 x dx = ,
Z
sin2 x dx = ,
h sin x ; sin x i = k sin x k2 =
1/12/04
111
h 1 ; cos x i =
h 1 ; sin x i =
cos x dx = 0,
sin x dx = 0,
Z
cos x sin x dx = 0.
h cos x ; sin x i =
c 2003
Peter J. Olver
0
0 . Positive
(3.52)
and, later, in the integration of various types of rational functions. The key idea is to
combine the first two terms in (3.52) as a perfect square, and so rewrite the quadratic
function in the form
2
b
a c b2
= 0.
(3.53)
q(x) = a x +
+
a
a
As a consequence,
b
x+
a
b2 a c
.
a2
b2 a c
a
follows by taking the square root of both sides and then solving for x. The intermediate
step (3.53), where we eliminate the linear term, is known as completing the square.
We can perform the same manipulation on the corresponding homogeneous quadratic
form
(3.54)
q(x1 , x2 ) = a x21 + 2 b x1 x2 + c x22 .
x=
We write
q(x1 , x2 ) =
a x21
+ 2 b x 1 x2 +
c x22
b
= a x1 + x2
a
a c b2 2
a c b2 2
x2 = a y12 +
y2
a
a
(3.55)
a c b2
> 0.
a
a > 0,
1/12/04
112
c 2003
Peter J. Olver
This proves that conditions (3.46) are necessary and sufficient for the quadratic form (3.45)
to be positive definite.
How this simple idea can be generalized to the multi-variable case will become clear
if we write the quadratic form identity (3.55) in matrix form. The original quadratic form
(3.54) is
x1
a b
T
,
x=
.
(3.57)
q(x) = x K x,
where
K=
x2
b c
The second quadratic form in (3.55) is
!
0
y1
T
2
qb (y) = y D y,
where
D=
,
y=
.
(3.58)
ac b
y2
a
Anticipating the final result, the equation connecting x and y can be written in matrix
form as
!
b
1 0
y
1
.
where
LT = b
= x1 + a x2 ,
y = LT x
or
y2
1
x2
a
Substituting into (3.58), we find
a
0
a b
T
that appears in (1.56). We are
is precisely the L D L factorization of K =
b c
thus led to the important conclusion that completing the square is the same as the L D L T
factorization of a symmetric matrix , obtained through Gaussian elimination!
Recall the definition of a regular matrix as one that can be reduced to upper triangular
form without any row interchanges;Theorem 1.32 says that these are the matrices admitting
an L D LT factorization. The identity (3.59) is therefore valid for all regular nn symmetric
matrices, and shows how to write the associated quadratic form as a sum of squares:
qb (y) = yT D y = d1 y12 + + dn yn2 .
(3.60)
The coefficients di are the pivots of K. In particular, according to Exercise , qb (y) > 0
is positive definite if and only if all the pivots are positive: d i > 0. Let us now state the
main result that completely characterizes positive definite matrices.
Theorem 3.38. A symmetric matrix K is positive definite if and only if it is regular
and has all positive pivots. Consequently, K is positive definite if and only if it can be
factored K = L D LT , where L is special lower triangular, and D is diagonal with all
positive diagonal entries.
1 2 1
Example 3.39. Consider the symmetric matrix K = 2 6 0 . Gaussian
1 0 9
elimination produces the factors
1 2 1
1 0 0
1 0 0
LT = 0 1 1 .
D = 0 2 0,
L = 2 1 0,
0 0 1
0 0 6
1 1 1
1/12/04
113
c 2003
Peter J. Olver
for all
x = ( x 1 , x2 , x3 ) 6
= 0.
Indeed, the L D LT factorization implies that q(x) can be explicitly written as a sum of
squares:
q(x) = y12 + 2 y22 + 6 y32 ,
where
y1 = x 1 + 2 x 2 x 3 , y 2 = x 2 + x 3 ,
1
1 0 0
1 0 0
1 2 3
0
0 2 0
K= 2 6 2 = 2 1 0
0
0 0 9
3 2 1
3 2 8
y3 = x3 ,
2
1
0
3
2 ,
1
the fact that D has a negative diagonal entry, 9, implies that K is not positive definite
even though all its entries are positive. The associated quadratic form is
q(x) = x21 + 4 x1 x2 + 6 x1 x3 + 6 x22 + 4 x2 x3 + 8 x23
is not positive definite since, for instance, q(5, 2, 1) = 9 < 0.
The only remaining issue is to show that an irregular matrix cannot be positive
defi0 1
,
nite. For example, the quadratic form corresponding to the irregular matrix K =
1 0
is q(x) = 2 x1 x2 , which is clearly not positive definite, e.g., q(1, 1) = 2. In general,
if the upper left entry k11 = 0, then it cannot serve as the first pivot, and so K is not
regular. But then q(e1 ) = eT1 K e1 = 0, and so K is not positive definite. (It may be
positive semi-definite, or, more likely, indefinite.)
Otherwise, if k11 6
= 0, then we use Gaussian elimination to make all entries lying in
the first column below the pivot equal to zero. As remarked above, this is equivalent to
completing the square in the initial terms of the associated quadratic form
q(x) = k11 x21 + 2 k12 x1 x2 + + 2 k1n x1 xn + k22 x22 + + knn x2n
2
k12
k1n
= k11 x1 +
+ qe(x2 , . . . , xn )
x + +
x
k11 2
k11 n
where
(3.61)
k21
k
k
k
= 12 ,
...
ln1 = n1 = 1n ,
k11
k11
k11
k11
are precisely the multiples appearing in the first column of the lower triangular matrix L
obtained from Gaussian Elimination, while
l21 =
1/12/04
qe(x2 , . . . , xn ) =
114
n
X
i,j = 2
e
k ij xi xj
c 2003
Peter J. Olver
is a quadratic form involving one fewer variable. The entries of its symmetric coefficient
e are
matrix K
e
k ij = e
k ji = kij lj1 k1i ,
for
i j.
e that lie on or below the diagonal are exactly the same as the entries
Thus, the entries of K
appearing on or below the diagonal of K after the the first phase of the elimination process.
In particular, the second pivot of K is the entry e
k 22 that appears in the corresponding slot
e
in K. If qe is not positive definite, then q cannot be positive definite. Indeed, suppose that
there exist x?2 , . . . , x?n , not all zero, such that qe(x?2 , . . . , x?n ) 0. Setting
x?1 = l21 x?2 ln1 x?n ,
makes the initial square term in (3.61) equal to 0, so q(x?1 , x?2 , . . . , x?n ) = qe(x?2 , . . . , x?n ) 0.
In particular, if the second diagonal entry e
k 22 = 0, then qe is not positive definite, and so
neither is q. Continuing this process, if any diagonal entry of the reduced matrix vanishes,
then the reduced quadratic form cannot be positive definite, and so neither can q. This
demonstrates that if K is irregular, then it cannot be positive definite, which completes
the proof of Theorem 3.38.
The Cholesky Factorization
The identity (3.59) shows us how to write any regular quadratic form q(x) as a sum
of squares. One can push this result slightly further in the positive definite case. Since
each pivot di > 0, we can write the diagonal form (3.60) as a sum of squares with unit
coefficients:
2
p
2
p
qb (y) = d1 y12 + + dn yn2 =
d 1 y1 + +
dn yn = z12 + + zn2 ,
p
where zi = di yi . In matrix form, we are writing
qb (y) = yT D y = zT z = k z k2 ,
where
z = C y,
with C = diag (
p
p
d 1 , . . . , dn )
Since D = C 2 , the matrix C can be thought of as a square root of the diagonal matrix
D. Substituting back into (1.52), we deduce the Cholesky factorization
K = L D L T = L C C T LT = M M T ,
where
M = LC
(3.62)
where
z = M T x.
(3.63)
One can interpret this as a change of variables from x to z that converts an arbitrary inner
product norm, as defined by the square root of the positive definite quadratic form q(x),
into the standard Euclidean norm k z k.
1/12/04
115
c 2003
Peter J. Olver
1 2 1
Example 3.40. For the matrix K = 2 6 0 considered in Example 3.39,
1 0 9
T
the Cholesky formula (3.62) gives K = M M , where
1 0
0
1 0 0
1 0
0
M = LC = 2 1 0 0
2 0 = 2 2 0 .
1 1 1
0 0
1
6
2
6
The associated quadratic function can then be written as a sum of pure squares:
are real and i = 1. We call x = Re z the real part of z and y = Im z the imaginary
part. (Note: The imaginary part is the real number y, not i y.) A real number x is
merely a complex number with zero imaginary part: Im z = 0. Complex addition and
multiplication are based on simple adaptations of the rules of real arithmetic to include
the identity i 2 = 1, and so
(x + i y) + (u + i v) = (x + u) + i (y + v),
(x + i y) (u + i v) = (x u y v) + i (x v + y u).
(3.64)
1/12/04
116
c 2003
Peter J. Olver
Complex numbers enjoy all the usual laws of real addition and multiplication, including
commutativity: z w = w z.
T
We can identity a complex number x+ i y with a vector ( x, y ) R 2 in the real plane.
Complex addition (3.64) corresponds to vector addition, but complex multiplication does
not have a readily identifiable vector counterpart.
Another important operation on complex numbers is that of complex conjugation.
Definition 3.41. The complex conjugate of z = x + i y is z = x i y, whereby
Re z = Re z, Im z = Im z.
Geometrically, the operation of complex conjugation coincides with reflection of the
corresponding vector through the real axis, as illustrated in Figure 3.6. In particular z = z
if and only if z is real. Note that
Re z =
z+z
,
2
Im z =
zz
.
2i
(3.65)
z w = z w.
(3.66)
is real and non-negative. Its square root is known as the modulus of the complex number
z = x + i y, and written
p
| z | = x2 + y 2 .
(3.67)
Note that | z | 0, with | z | = 0 if and only if z = 0. The modulus | z | generalizes the
absolute value of a real number, and coincides with the standard Euclidean norm in the
(x, y)plane. This implies the validity of the triangle inequality
| z + w | | z | + | w |.
(3.68)
(3.69)
Rearranging the factors, we deduce the formula for the reciprocal of a nonzero complex
number:
1
z
=
,
z
| z |2
z6
= 0,
1
x iy
= 2
.
x + iy
x + y2
(3.70)
u + iv
(x u + y v) + i (x v y u)
=
,
x + iy
x2 + y 2
(3.71)
or, equivalently
or, equivalently
117
c 2003
Peter J. Olver
z
r
Complex Numbers.
Figure 3.6.
is an immediate consequence.
The modulus of a complex number,
r = |z| =
p
x2 + y 2 ,
y = r sin
or
z = r(cos + i sin ).
(3.72)
The polar angle, which measures the angle that the line connecting z to the origin makes
with the horizontal axis, is known as the phase, and written
ph z = .
(3.73)
The more common term is the argument, and written arg z = ph z. For various reasons,
and to avoid confusion with the argument of a function, we have chosen to use phase
throughout this text. As such, the phase is only defined up to an integer multiple of 2 .
We note that the modulus and phase of a product of complex numbers can be readily
computed:
| z w | = | z | | w |,
ph (z w) = ph z + ph w.
(3.74)
On the other hand, complex conjugation preserves the modulus, but negates the phase:
| z | = | z |,
ph z = ph z.
(3.75)
(3.76)
relating the complex exponential with the real sine and cosine functions. This basic identity has a variety of mathematical justifications; see Exercise for one that is based on
comparing power series. Eulers formula (3.76) can be used to compactly rewrite the polar
form (3.72) of a complex number as
z = r ei
1/12/04
where
118
r = | z |,
= ph z.
(3.77)
c 2003
Peter J. Olver
Figure 3.7.
e i + e i
,
2
sin =
e i e i
.
2i
(3.78)
These formulae are very useful when working with trigonometric identities and integrals.
The exponential of a general complex number is easily derived from the basic Euler formula and the standard properties of the exponential function which carry over
unaltered to the complex domain; thus,
ez = ex+ i y = ex e i y = ex cos y + i ex sin y.
(3.79)
Graphs of the real and imaginary parts of the complex exponential appear in Figure 3.7.
Note that e2 i = 1, and hence the exponential function is periodic
ez+2 i = ez
(3.80)
119
c 2003
Peter J. Olver
1 2i
1 + 2i
then
z = 3 .
z = 3 ,
5 i
5i
|z| = zz
of a complex scalar z C. If, in analogy with the real definition (3.7), the quantity inside
the square root is to represent the inner product of z with itself, then we should define the
dot product between two complex numbers to be
z w = z w,
so that
z z = z z = | z |2 .
If z = x + i y and w = u + i v, then
z w = z w = (x + i y) (u i v) = (x u + y v) + i (y u x v).
(3.81)
Thus, the dot product of two complex numbers is, in general, complex. The real part of
z w is, in fact, the Euclidean dot product between the corresponding vectors in R 2 , while
the imaginary part is, interestingly, their scalar cross-product, cf. (cross2 ).
The vector version of this construction is named after the nineteenth century French
mathematician Charles Hermite, and called the Hermitian dot product on C n . It has the
explicit formula
w
z
1
1
w2
z2
z w = zT w = z1 w1 + z2 w2 + + zn wn , for z =
.. , w = .. . (3.82)
.
.
wn
zn
On the other hand, in relativity, the Minkowski norm is also not always positive, and
indeed the vectors with zero norm play a critical role as they lie on the light cone emanating from
the origin, [ 106 ].
1/12/04
120
c 2003
Peter J. Olver
1+ i
1 + 2i
second vector. For example, if z =
,w=
, then
3 + 2i
i
z w = (1 + i )(1 2 i ) + (3 + 2 i )( i ) = 5 4 i .
On the other hand,
w z = (1 + 2 i )(1 i ) + i (3 2 i ) = 5 + 4 i .
Therefore, the Hermitian dot product is not symmetric. Reversing the order of the vectors
results in complex conjugation of the dot product:
w z = z w.
But this extra complication does have the effect that the induced norm, namely
p
0 k z k = z z = zT z = | z 1 | 2 + + | z n | 2 ,
(3.83)
1 + 3i
p
z = 2 i ,
| 1 + 3 i |2 + | 2 i |2 + | 5 |2 = 39 .
then
kzk =
5
The Hermitian dot product is well behaved under complex vector addition:
(z + b
z) w = z w + b
z w,
b = z w + z w.
b
z (w + w)
However, while complex scalar multiples can be extracted from the first vector without
alteration, when they multiply the second vector, they emerge as complex conjugates:
(c z) w = c (z w),
z (c w) = c (z w),
c C.
Thus, the Hermitian dot product is not bilinear in the strict sense, but satisfies something
that, for lack of a better name, is known as sesqui-linearity.
The general definition of an inner product on a complex vector space is modeled on
the preceding properties of the Hermitian dot product.
Definition 3.42. An inner product on the complex vector space V is a pairing that
takes two vectors v, w V and produces a complex number h v ; w i C, subject to the
following requirements for all u, v, w V , and c, d C.
(i ) Sesqui-linearity:
h c u + d v ; w i = c h u ; w i + d h v ; w i,
(3.84)
h u ; c v + d w i = c h u ; v i + d h u ; w i.
(ii ) Conjugate Symmetry:
h v ; w i = h w ; v i.
(3.85)
(iii ) Positivity:
k v k2 = h v ; v i 0,
1/12/04
and h v ; v i = 0
121
if and only if v = 0.
c 2003
(3.86)
Peter J. Olver
Thus, when dealing with a complex inner product space, one must pay careful attention to the complex conjugate that appears when the second argument in the inner
product is multiplied by a complex scalar, as well as the complex conjugate that appears
when switching the order of the two arguments.
Theorem 3.43. The CauchySchwarz inequality,
| h v ; w i | k v k k w k,
v, w V.
with | | now denoting the complex modulus, and the triangle inequality
kv + wk kvk + kwk
hold for any complex inner product space.
The proof of this result is practically the same as in the real case, and the details are
left to the reader.
T
k v k = 2 + 4 + 9 = 15,
k w k = 5 + 1 + 8 = 14,
v w = (1 + i )(2 + i ) + 2 i + ( 3)(2 2 i ) = 5 + 11 i .
| h v ; w i | = | 5 + 11 i | = 146 210 = 15 14 = k v k k w k.
T
k v + w k = k ( 3, 1 + 2 i , 1 + 2 i ) k = 9 + 5 + 5 = 19 15 + 14 = k v k + k w k.
| f (x) |2 dx =
sZ
u(x)2 + v(x)2 dx .
(3.88)
The reader should check that (3.87) satisfies the basic Hermitian inner product axioms.
For example, if k, l are integers, then the inner product of the complex exponential
functions e i kx and e i lx is
2 ,
k = l,
Z
Z
h e i kx ; e i lx i =
e i kx e i lx dx =
e i (kl)x dx =
e i (kl)x
= 0,
k6
= l.
i (k l)
i kx
x =
Chapter 4
Minimization and Least Squares Approximation
Because Nature strives to be efficient, many systems arising in applications are founded
on a minimization principle. For example, in a mechanical system, the stable equilibrium
positions minimize the potential energy. The basic geometrical problem of minimizing
distance also appears in many contexts. For example, in optics and relativity, light rays
follow the paths of minimal distance the geodesics on the curved space-time. In data
analysis, the most fundamental method for fitting a function to a set of sampled data
points is to minimize the least squares error, which serves as a measurement of the overall
deviation between the sample data and the function. The least squares paradigm carries
over to a wide range of applied mathematical systems. In particular, it underlies the
theory of Fourier series, in itself of inestimable importance in mathematics, physics and
engineering. Solutions to many of the important boundary value problems arising in
mathematical physics and engineering are also characterized by an underlying minimization
principle. Moreover, the finite element numerical solution method relies on the associated
minimization principle. Optimization is ubiquitous in control theory, engineering design
and manufacturing, linear programming, econometrics, and most other fields of analysis.
This chapter introduces and solves the most basic minimization problem that of a
quadratic function of several variables. The minimizer is found by solving an associated
linear system. The solution to the quadratic minimization problem leads directly to a
broad range of applications, including least squares fitting of data, interpolation, and
approximation of functions. Applications to equilibrium mechanics will form the focus of
Chapter 6. Applications to the numerical solution of differential equations in numerical
analysis will appear starting in Chapter 11. More general nonlinear minimization problems,
which, as usual, require a thorough analysis of the linear situation, will be deferred until
Section 19.3.
123
c 2003
Peter J. Olver
is an unstable equilibrium, meaning that any tiny movement will knock it off balance.
Therefore, a better way of stating the principle is that stable equilibria are where the mechanical system minimizes potential energy. For the ball rolling on a curved surface, the
local minima the bottoms of valleys are the stable equilibria, while the local maxima
the tops of hills are unstable. This basic idea is fundamental to the understanding
and analysis of the equilibrium configurations of a wide range of physical systems, including masses and springs, structures, electrical circuits, and even continuum models of solid
mechanics and elasticity, fluid mechanics, electromagnetism, thermodynamics, statistical
mechanics, and so on.
Solution of Equations
Suppose we wish to solve a system of equations
f1 (x) = 0,
f2 (x) = 0,
...
fm (x) = 0,
(4.1)
2
p(x) = f1 (x) + + fm (x) = k f (x) k2 ,
(4.2)
where k k denotes the Euclidean norm on R m . Clearly, p(x) 0 for all x. Moreover,
p(x? ) = 0 if and only if each summand is zero, and hence x? is a solution to (4.1).
Therefore, the minimum value of p(x) is zero, and the minimum is achieved if and only if
x = x? solves the system (4.1).
The most important case is when we have a linear system
Ax = b
(4.3)
124
c 2003
Peter J. Olver
Thus, the least squares solutions naturally generalize traditional solutions. While not the
only possible method, least squares is is easiest to analyze and solve, and hence, typically,
the method of choice for fitting functions to experimental data and performing statistical
analysis.
The Closest Point
The following minimization problem arises in elementary geometry. Given a point
b R m and a subset V R m , find the point v? V that is closest to b. In other words,
we seek to minimize the distance d(b, v) = k v b k over all possible v V .
The simplest situation occurs when V is a subspace of R m . In this case, the closest
point problem can be reformulated as a least squares minimization problem. Let v 1 , . . . , vn
be a basis for V . The general element v V is a linear combination of the basis vectors.
Applying our handy matrix multiplication formula (2.14), we can write the subspace elements in the form
v = x1 v1 + + xn vn = A x,
where A = ( v1 v2 . . . vn ) is the m n matrix formed by the (column) basis vectors. Note
that we can identify V = rng A with the range of A, i.e., the subspace spanned by its
columns. Consequently, the closest point in V to b is found by minimizing
k v b k2 = k A x b k 2
over all possible x R n . This is exactly the same as the least squares function (4.4)!
Thus, if x? is the least squares solution to the system A x = b, then v ? = A x? is the
closest point to b belonging to V = rng A. In this way, we have established a fundamental
connection between least squares solutions to linear systems and the geometrical problem
of minimizing distances to subspaces.
All three of the preceding minimization problems are solved by the same underlying
mathematical construction, which will be described in detail in Section 4.3.
Remark : We will concentrate on minimization problems. Maximizing a function f (x)
is the same as minimizing its negative f (x), and so can be easily handled by the same
methods.
125
c 2003
Peter J. Olver
-1
-1
a>0
-1
a<0
Figure 4.1.
a=0
Parabolas.
If a > 0, then the graph of p is a parabola pointing upwards, and so there exists a unique
minimum value. If a < 0, the parabola points downwards, and there is no minimum
(although there is a maximum). If a = 0, the graph is a straight line, and there is neither
minimum nor maximum except in the trivial case when b = 0 also, and the function
is constant, with every x qualifying as a minimum and a maximum. The three nontrivial
possibilities are sketched in Figure 4.1.
In the case a > 0, the minimum can be found by calculus. The critical points of a
function, which are candidates for minima (and maxima), are found by setting its derivative
to zero. In this case, differentiating, and solving
p0 (x) = 2 a x + 2 b = 0,
we conclude that the only possible minimum value occurs at
b2
b
,
where
p(x? ) = c .
(4.6)
a
a
Of course, one must check that this critical point is indeed a minimum, and not a maximum
or inflection point. The second derivative test will show that p00 (x? ) = 2 a > 0, and so x?
is at least a local minimum.
A more instructive approach to this problem and one that only requires elementary
algebra is to complete the square. As was done in (3.53), we rewrite
2
b
a c b2
p(x) = a x +
+
.
(4.7)
a
a
x? =
If a > 0, then the first term is always 0, and moreover equals 0 only at x ? = b/a,
reproducing (4.6). The second term is constant, and so unaffected by the value of x. We
conclude that p(x) is minimized when the squared term in (4.7) vanishes. Thus, the simple
algebraic identity (4.7) immediately proves that the global minimum of p is at x ? = b/a,
and, moreover its minimal value p(x? ) = (a c b2 )/a is the constant term.
Now that we have the scalar case firmly in hand, let us turn to the more difficult
problem of minimizing quadratic functions that depend on several variables. Thus, we
seek to minimize a (real) quadratic function
p(x) = p(x1 , . . . , xn ) =
n
X
i,j = 1
1/12/04
126
kij xi xj 2
n
X
fi xi + c,
(4.8)
i=1
c 2003
Peter J. Olver
x? = K 1 f .
namely
(4.10)
(4.11)
Proof : Suppose x? = K 1 f is the (unique why?) solution to (4.10). Then, for any
x R n , we can write
p(x) = xT K x 2 xT f + c = xT K x 2 xT K x? + c
= (x x? )T K(x x? ) + c (x? )T K x? ,
(4.12)
4
1
1
3
127
x1
x2
2 ( x 1 x2 )
32
1
+ 1,
c 2003
Peter J. Olver
whereby
K=
4
1
1
,
3
f=
32
1
(4.13)
(Pay attention to the overall factor of 2 preceding the linear terms.) According to the
theorem, to find the minimum, we must solve the linear system
3 !
2
x1
4 1
.
(4.14)
=
1 3
x2
1
Applying our Gaussian elimination algorithm, only one operation is required to place the
coefficient matrix in upper triangular form:
4 1 32
4 1 32
.
7
5
1 3 1
0 11
4
8
Note that the coefficient matrix is regular (no row interchanges are required) and its two
pivots, namely 4, 11
4 , are both positive; this proves that K > 0 and hence p(x 1 , x2 ) really
does have a minimum, obtained by applying Back Substitution to the reduced system:
!
? 7 !
.318182
x
1
22
=
x? =
.
5
x?2
.227273
22
The quickest way to compute the minimal value
7 5
p(x? ) = p 22
, 22 =
13
44
.295455
p
= 2 x1 + 6 x2 2 = 0.
x2
If we divide by an overall factor of 2, these are precisely the same linear equations we
already constructed in (4.14). Thus, not surprisingly, the calculus approach leads to the
same critical point. To check whether a critical point is a local minimum, we need to test
the second derivative. In the case of a function of several variables, this requires analyzing
the Hessian matrix , which is the symmetric matrix of second order partial derivatives
2p
2p
x21
x
x
8
2
1
2
=
H=
= 2 K,
2p
2 6
2p
x1 x2
x22
which is exactly twice the quadratic coefficient matrix (4.13). If the Hessian matrix is
positive definite which we already know in this case then the critical point is indeed
1/12/04
128
c 2003
Peter J. Olver
a (local) minimum. Thus, the calculus and algebraic approaches to this minimization
problem lead (not surprisingly) to identical results. However, the algebraic method is
more powerful, because it immedaitely produces the unique, global minimum, whereas,
without extra work (e.g., proving convexity of the function), calculus can only guarantee
that the critical point is a local minimum, [9]. The reader can find the full story on
minimization of nonlinear functions, which is, in fact based on the algebraic theory of
positive definite matrices, in Section 19.3.
The most efficient method for producing a minimum of a quadratic function p(x) on
R n , then, is to first write out the symmetric coefficient matrix K and the vector f . Solving
the system K x = f will produce the minimizer x? provided K > 0 which should be
checked during the course of the procedure by making sure no row interchanges are used
and all the pivots are positive. If these conditions are not met then (with one minor
exception see below) one immediately concludes that there is no minimizer.
Example 4.3. Let us minimize the quadratic function
p(x, y, z) = x2 + 2 x y + x z + 2 y 2 + y z + 2 z 2 + 6 y 7 z + 5.
This has the matrix form (4.9) with
1 1 12
K = 1 2 12 ,
1
2
1
2
f = 3,
7
2
1 1 12
1 0 0
1 0
K = 1 2 12 = 1 1 0 0 1
1
2
1
2
1
2
c = 5.
0 1
1
0
0 0
3
4
1
2
0 .
The pivots, i.e., the diagonal entries of D, are all positive, and hence K is positive definite.
Theorem 4.1 then guarantees that p(x, y, z) has a unique minimizer, which is found by
solving the linear system K x = f . The solution is then quickly obtained by forward and
back substitution:
x? = 2,
y ? = 3,
z ? = 2,
with
Theorem 4.1 solves the general quadratic minimization problem when the quadratic
coefficient matrix is positive definite. If K is not positive definite, then the quadratic
function (4.9) does not have a minimum, apart from one exceptional situation.
Theorem 4.4. If K > 0 is positive definite, then the quadratic function p(x) =
xT K x2 xT f +c has a unique global minimizer x? satisfying K x? = f . If K 0 is positive
semi-definite, and f rng K, then every solution to K x? = f is a global minimum of p(x).
However, in the semi-definite case, the minimum is not unique since p(x ? + z) = p(x? ) for
any null vector z ker K. In all other cases, there is no global minimum, and p(x) can
assume arbitrarily large negative values.
1/12/04
129
c 2003
Peter J. Olver
Proof : The first part is just a restatement of Theorem 4.1. The second part is proved
by a similar computation, and uses the fact that a positive semi-definite but not definite
matrix has a nontrivial kernel. If K is not positive semi-definite, then one can find a
vector y such that a = yT K y < 0. If we set x = t y, then p(x) = p(t y) = a t2 + 2 b t + c,
with b = yT f . Since a < 0, by choosing | t | 0 sufficiently large, one can arrange that
p(t y) 0 is an arbitrarily large negative quantity. The one remaining case when K is
positive semi-definite, but f 6
rng K is left until Exercise .
Q.E.D.
The minimal distance k v? b k to the closest point is called the distance from the
point b to the subspace V . Of course, if b V lies in the subspace, then the answer is
easy: the closest point is v? = b itself. The distance from b to the subspace is zero. Thus,
the problem only becomes interesting when b 6
V.
Remark : Initially, you may assume that k k denotes the usual Euclidean norm, and
so the distance corresponds to the usual Euclidean length. But it will be no more difficult
to solve the closest point problem for any norm that arises from an inner product: k v k =
p
h v ; v i. In fact, requiring that V R m is not crucial either; the same method works
when V is a finite-dimensional subspace of any inner product space.
However, the methods do not apply to more general norms not coming from inner
products, e.g., the 1 norm or norm. These are much harder to handle, and, in such cases,
the closest point problem is a nonlinear minimization problem whose solution requires the
more sophisticated methods of Section 19.3.
When solving the closest point problem, the goal is to minimize the distance
k v b k2 = k v k2 2 h v ; b i + k b k 2 ,
(4.15)
over all possible v belonging to the subspace V R m . Let us assume that we know a
basis v1 , . . . , vn of V , with n = dim V . Then the most general vector in V is a linear
combination
v = x 1 v1 + + x n vn
(4.16)
of the basis vectors. We substitute the formula (4.16) for v into the distance function
(4.15). As we shall see, the resulting expression is a quadratic function of the coefficients
T
x = ( x1 , x2 , . . . , xn ) , and so the minimum is provided by Theorem 4.1.
First, the quadratic terms come from expanding
2
k v k = h v ; v i = h x 1 v1 + + x n vn ; x 1 v1 + + x n vn i =
1/12/04
130
n
X
i,j = 1
xi xj h vi ; vj i.
c 2003
(4.17)
Peter J. Olver
Therefore,
2
kvk =
n
X
kij xi xj = xT Kx,
i,j = 1
where K is the symmetric n n Gram matrix whose (i, j) entry is the inner product
kij = h vi ; vj i,
(4.18)
n
X
n
X
i=1
xi h vi ; b i,
xi fi = x T f ,
i=1
(4.19)
between the point and the subspace basis elements. We conclude that the squared distance
function (4.15) reduces to the quadratic function
T
p(x) = x Kx 2 x f + c =
n
X
i,j = 1
kij xi xj 2
n
X
fi xi + c,
(4.20)
i=1
131
fi = vi b = viT b.
c 2003
Peter J. Olver
As in (3.49), both sets of equations can be combined into a single matrix equation. If
A = ( v1 v2 . . . vn ) denotes the m n matrix formed by the basis vectors, then
K = AT A,
f = AT b,
c = k b k2 .
(4.23)
1
2
Example 4.6. Let V R 3 be the plane spanned by v1 = 2 , v2 = 3 .
1
1
1
Our goal is to find the point v? V that is closest to b = 0 , where distance is
0
measured
in
the
usual
Euclidean
norm.
We
combine
the
basis
vectors
to form the matrix
1
2
A = 2 3 . According to (4.23), the positive definite Gram matrix and associated
1 1
vector are
6 3
1
T
T
K=A A=
,
f =A b=
.
3 14
2
(Or, alternatively, these can be computed directly by taking inner products, as in (4.18), (4.19).)
4 1 T
, 5 . Theorem 4.5 implies that
We solve the linear system K x = f for x? = K 1 f = 15
the closest point is
2
.6667
3
1
v? = x?1 v1 + x?2 v2 = A x? = 15
.0667 .
.4667
7
15
132
1
3
.5774.
c 2003
Peter J. Olver
Suppose, on the other hand, that distance is measured in the weighted norm k v k =
+ 21 v22 + 31 v32 corresponding to the diagonal matrix C = diag (1, 12 , 13 ). In this case, we
form the weighted Gram matrix and vector (4.24):
!
1 0 0
1
2
2
10
1 2 1
T
1
3
3
0 2 0 2 3 =
K = A CA =
,
53
2
2 3 1
1
1 1
0 0 3
3
6
1 0 0
1
1 2 1
1
T
1
0 2 0 0 =
f = A Cb =
,
2 3 1
2
0
0 0 13
v12
and so
x? = K 1 f
.8563
v? = A x? .0575 .
.6034
.3506
,
.2529
In this case, the distance between the point and the subspace is measured in the weighted
norm: k v? b k .3790.
Remark : The solution to the closest point problem given in Theorem 4.5 applies, as
stated, to the more general case when V W is a finite-dimensional subspace of a general
inner product space W . The underlying inner product space W can even be infinitedimensional, which it is when dealing with least squares approximations in function space,
to be described at the end of this chapter, and in Fourier analysis.
Least Squares
As we first observed in Section 4.1, the solution to the closest point problem also
solves the basic least squares minimization problem! Let us officially define the notion of
a (classical) least squares solution to a linear system.
Definition 4.7. The least squares solution to a linear system of equations
Ax = b
(4.25)
133
c 2003
Peter J. Olver
of A are linearly independent, then they form a basis for the range V . Since every element
of the range can be written as v = A x, minimizing k A x b k is the same as minimizing
the distance k v b k between the point and the subspace. The least squares solution x ?
to the minimization problem gives the closest point v ? = A x? in V = rng A. Therefore,
the least squares solution follows from Theorem 4.5. In the Euclidean case, we state the
result more explicitly by using (4.23) to write out the linear system (4.21) and the minimal
distance (4.22).
Theorem 4.8. Assume ker A = {0}. Set K = AT A and f = AT b. Then the least
squares solution to A x = b is the unique solution to the normal equations
Kx = f
or
(AT A) x = AT b,
(4.26)
namely
x? = (AT A)1 AT b.
(4.27)
k A x? b k2 = k b k2 f T x? = k b k2 bT A (AT A)1 AT b.
(4.28)
Note that the normal equations (4.26) can be simply obtained by multiplying the
original system A x = b on both sides by AT . In particular, if A is square and invertible,
then (AT A)1 = A1 (AT )1 , and so (4.27) reduces to x = A1 b, while the two terms in
the error formula (4.28) cancel out, producing 0 error. In the rectangular case when
this is not allowed formula (4.27) gives a new formula for the solution to (4.25) when
b rng A.
Example 4.9. Consider the linear system
x1 + 2 x 2
= 1,
3 x1 x2 + x3 = 0,
x1 + 2 x2 + x3 = 1,
x1 x2 2 x3 = 2,
2 x1 + x2 x3 = 2,
consisting of 5 equations in 3 unknowns. The coefficient matrix and right hand side are
1
3
A = 1
1
2
2
1
2
1
1
0
1
1 ,
2
1
1
0
b = 1 .
2
2
134
c 2003
Peter J. Olver
16 2 2
8
K = AT A = 2 11
2 ,
f = AT b = 0 .
2 2
7
7
(t2 , y2 ),
...
(tm , ym ).
(4.29)
Suppose our theory indicates that the data points are supposed to all lie on a single line
y = + t,
(4.30)
i = 1, . . . , m.
135
c 2003
Peter J. Olver
Figure 4.2.
where
e
1
e2
e=
..
.
y
1
y2
y=
..
.
t1
t2
..
.
,
while
A=
.
(4.31)
x=
..
.
em
ym
1 tm
We call e the error vector and y the data vector . The coefficients , of our desired
function (4.30) are the unknowns, forming the entries of the column vector x.
If we could fit the data exactly, so yi = + ti for all i, then each ei = 0, and we
could solve A x = y. In matrix language, the data points all lie on a straight line if and
only if y rng A. If the data points are not all collinear, then we seek the straight line
that minimizes the total squared error or Euclidean norm
q
Error = k e k = e21 + + e2m .
Pictorially, referring to Figure 4.2, the errors are the vertical distances from the points to
the line, and we are seeking to minimize the square root of the sum of the squares of the
individual errors , hence the term least squares. In vector language, we are looking for the
T
coefficient vector x = ( , ) which minimizes the Euclidean norm of the error vector
k e k = k A x y k.
(4.32)
This choice of minimization may strike the reader as a little odd. Why not just minimize
the sum of the absolute value of the errors, i.e., the 1 norm k e k1 = | e1 | + + | en | of the
error vector, or minimize the maximal error, i.e., the norm k e k = max{ | e1 |, , | en | }? Or,
even better, why minimize the vertical distance to the line? Maybe the perpendicular distance
from each data point to the line, as computed in Exercise , would be a better measure of
error. The answer is that, although all of these alternative minimization criteria are interesting
and potentially useful, they all lead to nonlinear minimization problems, and are much harder
to solve! The least squares minimization problem can be solved by linear algebra, whereas the
others lead to nonlinear minimization problems. Moreover, one needs to be properly understand
the linear solution before moving on to the more treacherous nonlinear situation, cf. Section 19.3.
1/12/04
136
c 2003
Peter J. Olver
Thus, we are precisely in the situation of characterizing the least squares solution to the
system A x = y that was covered in the preceding subsection.
Theorem 4.8 prescribes the solution to this least squares minimization problem. We
form the normal equations
(AT A) x = AT y,
with solution
x? = (AT A)1 AT y.
(4.33)
Invertibility of the Gram matrix K = AT A relies on the assumption that the matrix A
have linearly independent columns. This requires that its columns be linearly independent,
and so not all the ti are equal, i.e., we must measure the data at at least two distinct times.
Note that this restriction does not preclude measuring some of the data at the same time,
e.g., by repeating the experiment. However, choosing all the ti s to be the same is a silly
data fitting problem. (Why?)
For the particular matrices (4.31), we compute
1 t
1
!
P
1 t2
t
1
m
t
1
1
.
.
.
1
i
AT A =
..
= P t P(t )2 = m t t2 ,
t1 t2 . . . tm ...
.
i
i
1 tm
(4.34)
y
1
P
y2
y
y
1 1 ... 1
T
= P i
=m
A y=
,
.
..
t1 t2 . . . t m
t y
ty
i i
ym
m
1 X
t,
m i=1 i
y=
m
1 X
y,
m i=1 i
t2 =
m
1 X 2
t ,
m i=1 i
ty =
m
1 X
t y,
m i=1 i i
(4.35)
Warning: The average of a product is not equal to the product of the averages! In
particular,
t2 6
= ( t )2 ,
ty 6
= t y.
Substituting (4.34) into the normal equations (4.33), and canceling the common factor
of m, we find that we have only to solve a pair of linear equations
t + t2 = t y.
+ t = y,
The solution is
= y t ,
P
(t t ) yi
= P i
=
.
(ti t )2
t2 ( t ) 2
tyty
(4.36)
Therefore, the best (in the least squares sense) straight line that fits the given data is
y = (t t ) + y,
where the lines slope is given in (4.36).
1/12/04
137
c 2003
Peter J. Olver
Example 4.10. Suppose the data points are given by the table
Then
1
1
A=
1
1
Therefore
ti
yi
12
0
1
,
3
6
T
A A=
AT =
4 10
10 46
1
0
1 1
1 3
1
6
2
3
y = .
7
12
A y=
24
96
10 + 46 = 96,
so
12
7 ,
12
7 .
Therefore, the best least squares fit to the data is the straight line
y=
12
7
12
7
t.
(4.37)
In this case, we are looking for the parabola that best fits the data. For example, Newtons
theory of gravitation says that (in the absence of air resistance) a falling object obeys the
1/12/04
138
c 2003
Peter J. Olver
Linear
Quadratic
Figure 4.3.
Cubic
Interpolating Polynomials.
parabolic law (4.37), where = h0 is the initial height, = v0 is the initial velocity, and
= 21 g m is one half the weight of the object. Suppose we observe a falling body, and
measure its height yi at times ti . Then we can approximate its initial height, initial velocity
and weight by finding the parabola (4.37) that best fits the data. Again, we characterize
the least squares fit by minimizing the sum of the squares of errors ei = yi y(ti ).
The method can evidently be extended to a completely general polynomial function
y(t) = 0 + 1 t + + n tn
(4.38)
of degree n. The total least squares error between the data and the sample values of the
function is equal to
m
X
2
2
kek =
yi y(ti ) = k y A x k2 ,
(4.39)
i=1
where
1
A=
..
.
1
t1
t2
..
.
tm
t21
t22
..
.
t2m
...
...
..
.
...
tn1
tn2
.. ,
.
tnm
x=
.2 .
.
.
n
(4.40)
139
c 2003
Peter J. Olver
Example 4.14. The basic ideas of interpolation and least squares fitting of data
can be applied to approximate complicated mathematical functions by much simpler polynomials. Such approximation schemes are used in all numerical computations when
you ask your computer or calculator to compute et or cos t or any other function, it only
knows how to add, subtract, multiply and divide, and so must rely on an approximation
scheme based on polynomials In the dark ages before computers, one would consult
precomputed tables of values of the function at particular data points. If one needed a
value at a nontabulated point, then some form of polynomial interpolation would typically
be used to accurately approximate the intermediate value.
For example, suppose we want to compute reasonably accurate values for the exponential function et for values of t lying in the interval 0 t 1 by using a quadratic
polynomial
p(t) = + t + t2 .
(4.41)
If we choose 3 points, say t1 = 0, t2 = .5, t3 = 1, then there is a unique quadratic polynomial
(4.41) that interpolates et at the data points, i.e.,
p(ti ) = eti
for
i = 1, 2, 3.
1 0
0
A = 1 .5 .25 ,
1 1
1
e t1
1
y = et2 = 1.64872
e t3
2.71828
1.
x = = .876603
.841679
(4.42)
It is the unique quadratic polynomial that agrees with et at the three specified data points.
See Figure 4.4 for a comparison of the graphs; the first graph shows e t , the second p(t), and
Actually, one could also allow interpolation and approximation by rational functions, a subject known as Pade approximation theory. See [ 12 ] for details.
1/12/04
140
c 2003
Peter J. Olver
2.5
2.5
2.5
1.5
1.5
1.5
0.5
0.5
0.5
0.2
0.4
0.6
0.8
0.2
Figure 4.4.
0.4
0.6
0.8
0.2
0.4
0.6
0.8
the third lays the two graphs on top of each other. Even with such a simple interpolation
scheme, the two functions are quite close. The L norm of the difference is
There is, in fact, an explicit formula for the interpolating polynomial that is named after the influential eighteenth century ItaloFrench mathematician JosephLouis Lagrange.
It relies on the basic superposition principle for solving inhomogeneous systems Theorem 2.42. Specifically, if we know the solutions x1 , . . . , xn+1 to the particular interpolation
systems
A xk = ek ,
k = 1, . . . , n + 1,
(4.43)
where e1 , . . . , en+1 are the standard basis vectors of R n+1 , then the solution to
A x = y = y1 e1 + + yn+1 en+1
is given by the superposition formula
x = y1 x1 + + yn+1 xn+1 .
The particular interpolation equation (4.43) corresponds to interpolation data y = e k ,
meaning that yk = 1, while yi = 0 at all points ti with i 6
= k. If we can find the
n + 1 particular interpolating polynomials that realize this very special data, we can use
superposition to construct the general interpolating polynomial. It turns out that there is
a simple explicit formula for the basic interpolating polynomials.
Theorem 4.15. Given distinct values t1 , . . . , tn+1 , the k th Lagrange interpolating
polynomial is the degree n polynomial given by
Lk (t) =
k = 1, . . . , n + 1.
(4.44)
141
c 2003
(4.45)
Peter J. Olver
0.8
0.8
0.8
0.6
0.6
0.6
0.4
0.4
0.4
0.2
0.2
0.2
0.2
0.4
0.6
0.8
0.2
0.2
L1 (t)
0.4
0.6
0.8
L2 (t)
Figure 4.5.
0.4
0.6
0.8
L3 (t)
(4.46)
Q.E.D.
Example 4.17. For example, the three quadratic Lagrange interpolating polynomials for the values t1 = 0, t2 = 21 , t3 = 1 used to interpolate et in Example 4.14 are
(t 12 )(t 1)
= 2 t2 3 t + 1,
(0 12 )(0 1)
(t 0)(t 1)
L2 (t) = 1
= 4 t2 + 4 t,
1
( 2 0)( 2 1)
L1 (t) =
(4.47)
(t 0)(t 12 )
L3 (t) =
= 2 t2 t.
(1 0)(1 12 )
142
c 2003
Peter J. Olver
-3
-2
0.8
0.8
0.8
0.6
0.6
0.6
0.4
0.4
0.4
0.2
0.2
0.2
-1
-0.2
Figure 4.6.
-3
-2
-1
-3
-2
-0.2
-1
-0.2
One might expect that the higher the degree, the more accurate the interpolating
polynomial. This expectation turns out, unfortunately, not to be uniformly valid. While
low degree interpolating polynomials are usually reasonable approximants to functions,
high degree interpolants are more expensive to compute, and, moreover, can be rather
badly behaved, particularly near the ends of the interval. For example, Figure 4.6 displays
the degree 2, 4 and 10 interpolating polynomials for the function 1/(1 + t 2 ) on the interval
3 t 3 using equally spaced data points. Note the rather poor approximation of the
function near the endpoints of the interval. Higher degree interpolants fare even worse,
although the bad behavior becomes more and more concentrated near the ends of the
interval. As a consequence, high degree polynomial interpolation tends not to be used
in practical applications. Better alternatives rely on least squares approximants by low
degree polynomials, to be described next, and interpolation by piecewise cubic splines, a
topic that will be discussed in depth in Chapter 11.
If we have m > n + 1 data points, then, usually, there is no degree n polynomial
that fits all the data, and so one must switch over to a least squares approximation. The
first requirement is that the associated m (n + 1) interpolation matrix (4.40) has rank
n + 1; this follows from Lemma 4.12 provided at least n + 1 of the values t 1 , . . . , tm are
distinct. Thus, given data at m n + 1 different sample points t1 , . . . , tm , we can uniquely
determine the best least squares polynomial of degree n that fits the data by solving the
normal equations (4.33).
Example 4.18. If we use more than three data points, but still require a quadratic
polynomial, then we cannot interpolate exactly, and must use a least squares approximant.
Let us return to the problem of approximating the exponential function e t . For instance,
using five equally spaced sample points t1 = 0, t2 = .25, t3 = .5, t4 = .75, t5 = 1, the
coefficient matrix and sampled data vector (4.40) are
1
1
A = 1
1
1
1/12/04
0
0
.25 .0625
.5
.25 ,
.75 .5625
1
1
143
1
1.28403
y = 1.64872 .
2.11700
2.71828
c 2003
Peter J. Olver
2.5
2.5
1.5
1.5
0.5
0.5
0.2
Figure 4.7.
0.4
0.6
0.8
0.2
0.4
0.6
0.8
is
5.
T
K=A A=
2.5
1.875
2.5
1.875
1.5625
1.875
1.5625 ,
1.38281
8.76803
f = AT y = 5.45140 ,
4.40153
T
144
c 2003
Peter J. Olver
to a given set of data. Again, the least squares error takes the same form k y A x k 2 as
in (4.39), where
cos t
y
sin t1
1
1
y
cos t2 sin t2
1
,
.2 .
A=
x
=
,
y
=
.
.
.
..
..
2
.
cos tm sin tm
ym
The key is that the unspecified parameters in this case 1 , 2 occur linearly in the
approximating function. Thus, the most general case is to approximate the data (4.29) by
a linear combination
y(t) = 1 h1 (t) + 2 h2 (t) + + n hn (t),
of prescribed, linearly independent functions h1 (x), . . . , hn (x). The least squares error is,
as always, given by
v
u m
u X
2
yi y(ti )
= k y A x k,
Error = t
i=1
h1 (t1 ) h2 (t1 )
h1 (t2 ) h2 (t2 )
A=
..
..
.
.
h1 (tm ) h2 (tm )
y
. . . hn (t1 )
1
1
2
y2
. . . hn (t2 )
,
x=
y=
..
..
.. ,
.. .
.
.
.
.
n
ym
. . . h (t )
n
(4.48)
Thus, the columns of A are the sampled values of the functions. If A is square and
nonsingular, then we can find an interpolating function of the prescribed form by solving
the linear system
A x = y.
(4.49)
A particularly important case is provided by the 2 n + 1 trigonometric functions
1,
cos x,
sin x,
cos 2 x,
sin 2 x,
...
cos n x,
sin n x.
m
X
hi (t ) hj (t ),
(4.50)
=1
1/12/04
145
c 2003
Peter J. Olver
m
X
hi (t ) y .
(4.51)
=1
The one key question is whether the columns of A are linearly independent; this is more
subtle than the polynomial case covered by Lemma 4.12, and requires the sampled function
vectors to be linearly independent, which in general is different than requiring the functions
themselves to be linearly independent. See Exercise for a few details on the distinction
between these two notions of linear independence.
If the parameters do not occur linearly in the functional formula, then we cannot use a
linear analysis to find the least squares solution. For example, a direct linear least squares
approach does not suffice to find the frequency , the amplitude r, and the phase of a
general trigonometric approximation:
y = c1 cos t + c2 sin t = r cos( t + ).
Approximating data by such a function constitutes a nonlinear minimization problem, and
must be solved by the more sophisticated techniques presented in Section 19.3.
Weighted Least Squares
Another generalization is to introduce weights in the measurement of the least squares
error. Suppose some of the data is known to be more reliable or more significant than
others. For example, measurements at an earlier time may be more accurate, or more
critical to the data fitting problem, than measurements at later time. In that situation,
we should penalize any errors at the earlier times and downplay errors in the later data.
In general, this requires the introduction of a positive weight c i > 0 associated to each
data point (ti , yi ); the larger the weight, the more important the error. For a straight line
approximation y = + t, the weighted least squares error is defined as
v
v
u m
u m
X
u
uX
2
2
t
Error =
ci ei = t
ci yi ( + ti ) .
i=1
i=1
Let us rewrite this formula in matrix form. Let C = diag (c1 , . . . , cm ) denote the diagonal
weight matrix . Note that C > 0 is positive definite, since all the weights are positive. The
least squares error
Error = eT C e = k e k
is then the norm of the error vector e with respect to the weighted inner product
h v ; w i = vT C w
(4.52)
= xT AT C A x 2 xT AT C y + yT C y = xT K x 2 xT f + c,
1/12/04
146
c 2003
(4.53)
Peter J. Olver
where
K = AT C A,
f = AT C y,
c = y T C y = k y k2 .
are the weighted inner products between the column vectors v1 , . . . , vn of A. Theorem 3.33
immediately implies that K is positive definite provided A has linearly independent
columns or, equivalently, has rank n m.
Theorem 4.19. Suppose A is an m n matrix with linearly independent columns.
Suppose C > 0 is any positive definite m m matrix. Then, the quadratic function (4.53)
giving the weighted least squares error has a unique minimizer, which is the solution to
the weighted normal equations
AT C A x = AT C y,
x = (AT C A)1 AT C y.
so that
(4.54)
In other words, the weighted least squares solution is obtained by multiplying both
sides of the original system A x = y by the matrix AT C. The derivation of this result
allows C > 0 to be any positive definite matrix. In applications, the off-diagonal entries
of C can be used to weight cross-correlation terms in the data.
Example 4.20. In Example 4.10 we fit the data
ti
yi
12
ci
1
2
1
4
with an unweighted least squares line. Now we shall assign the weights for the error at
each sample point listed in the last row of the table, so that errors in the first two data
values carry more weight. To find the weighted least squares line y = + t that best fits
the data, we compute
1 0
!
3 0 0 0
23
5
1
1
1
1
1
1
0
2
0
0
4
,
AT C A =
1 3
0 1 3 6
0 0 12 0
5 31
2
0 0 0 14
1 6
2
!
3 0 0 0
37
1
1
1
1
0
2
0
0
3
2
=
AT C y =
.
69
0 1 3 6
0 0 21 0
7
2
0 0 0 14
12
Thus, the weighted normal equations (4.54) reduce to
23
4
+ 5 =
37
2 ,
5 +
31
2
69
2 ,
so
= 1.7817,
= 1.6511.
Therefore, the least squares fit to the data under the given weights is y = 1.7817 + 1.6511 t.
1/12/04
147
c 2003
Peter J. Olver
Let P (n) denote the subspace consisting of all polynomials of degree n. For simplicity,
we employ the standard monomial basis 1, t, t2 , . . . , tn . We will be approximating a general
function f (t) C0 [ a, b ] by a polynomial
p(t) = 1 + 2 t + + n+1 tn P (n)
(4.56)
of degree at most n. The error function e(t) = f (t)p(t) measures the discrepancy between
the function and its approximating polynomial at each t. Instead of summing the squares
of the errors at a finite set of sample points, we go to a continuous limit that integrates
the squared errors of all points in the interval. Thus, the approximating polynomial will
be characterized as the one that minimizes the L2 least squares error
s
Z b
[ p(t) f (t) ]2 dt .
(4.57)
Error = k e k = k p f k =
a
n+1
2 n+1
n+1
X
X
X
i1 j
i1
2
i j h t
; t i2
i t
f (t) =
i h ti1 ; f (t) i+k f (t) k2 .
kp f k =
i,j = 1
i=1
i=1
xT K x 2 xT f + c,
(4.58)
where x = 1 , 2 , . . . , n+1
is the vector containing the unknown coefficients in the
minimizing polynomial, while
Z b
Z b
i1 j1
i+j2
i1
(4.59)
kij = h t
;t
i=
t
dt,
fi = h t
;f i =
ti1 f (t) dt,
a
are, as before, the Gram matrix K consisting of inner products between basis monomials
along with the vector f of inner products between the monomials and the right hand side.
The coefficients of the least squares minimizing polynomial are thus found by solving the
associated normal equations K x = f .
1/12/04
148
c 2003
Peter J. Olver
2.5
2.5
2.5
1.5
1.5
1.5
0.5
0.5
0.5
0.2
0.4
0.6
0.8
Figure 4.8.
0.2
0.4
0.6
0.8
0.2
0.4
0.6
0.8
1 12 13
e1
1 1 1
2 3 4 = 1 .
1
3
1
4
1
5
e2
The coefficient matrix is the Gram matrix K consisting of the inner products
Z 1
1
i j
ht ;t i =
ti+j dt =
i+j+1
0
between basis monomials, while the right hand side is the vector of inner products
Z 1
t i
ti et dt.
he ;t i =
0
(4.60)
with the maximum occurring at t = 1. Thus, the simple quadratic polynomial (4.60) will
give a reasonable approximation to the first two decimal places in e t on the entire interval
[ 0, 1 ]. A more accurate approximation can be made by taking a higher degree polynomial,
or by decreasing the length of the interval.
1/12/04
149
c 2003
Peter J. Olver
Remark : Although the least squares polynomial (4.60) minimizes the L 2 norm of the
error, it does slightly worse with the L norm than the previous sample-based minimizer
(4.42). The problem of finding the quadratic polynomial that minimizes the L norm is
more difficult, and must be solved by nonlinear minimization methods.
Remark : As noted in Example 3.35, the Gram matrix for the simple monomial basis
is the nn Hilbert matrix (1.67). The ill conditioned nature of the Hilbert matrix, and the
consequential difficulty in accurately solving the normal equations, complicates the practical numerical implementation of high degree least squares polynomial approximations. A
better approach, based on an alternative orthogonal polynomial basis, will be discussed in
in the ensuing Chapter.
1/12/04
150
c 2003
Peter J. Olver
Chapter 5
Orthogonality
Orthogonality is the mathematical formalization of the geometrical property of perpendicularity suitably adapted to general inner product spaces. In finite-dimensional
spaces, bases that consist of mutually orthogonal elements play an essential role in the theory, in applications, and in practical numerical algorithms. Many computations become
dramatically simpler and less prone to numerical instabilities when performed in orthogonal systems. In infinite-dimensional function space, orthogonality unlocks the secrets of
Fourier analysis and its manifold applications, and underlies basic series solution methods
for partial differential equations. Indeed, many large scale modern applications, including
signal processing and computer vision, would be impractical, if not completely infeasible
were it not for the dramatic simplifying power of orthogonality. As we will later discover,
orthogonal systems naturally arise as eigenvector and eigenfunction bases for symmetric
matrices and self-adjoint boundary value problems for both ordinary and partial differential equations, and so play a major role in both finite-dimensional and infinite-dimensional
analysis and applications.
Orthogonality is motivated by geometry, and the methods have significant geometrical
consequences. Orthogonal matrices play an essential role in the geometry of Euclidean
space, computer graphics, animation, and three-dimensional image analysis, to be discussed
in Chapter 7. The orthogonal projection of a point onto a subspace is the closest point or
least squares minimizer. Moreover, when written in terms of an orthogonal basis for the
subspace, the normal equations underlying least squares analysis have an elegant explicit
solution formula. Yet another important fact is that the four fundamental subspaces of a
matrix form mutually orthogonal pairs. The orthogonality property leads directly to a new
characterization of the compatibility conditions for linear systems known as the Fredholm
alternative.
The duly famous GramSchmidt process will convert an arbitrary basis of an inner
product space into an orthogonal basis. As such, it forms one of the key algorithms of
linear analysis, in both finite-dimensional vector spaces and also function space where it
leads to the classical orthogonal polynomials and other systems of orthogonal functions. In
Euclidean space, the GramSchmidt process can be re-interpreted as a new kind of matrix
factorization, in which a nonsingular matrix A = Q R is written as the product of an
orthogonal matrix Q and an upper triangular matrix R. The Q R factorization underlies
one of the primary numerical algorithms for computing eigenvalues, to be presented in
Section 10.6.
1/12/04
150
c 2003
Peter J. Olver
Figure 5.1.
The methods can be adapted more or less straightforwardly to complex inner product spaces.
The main complication, as noted in Section 3.6, is that we need to be careful with the order of
vectors appearing in the non-symmetric complex inner products. In this chapter, we will write
all inner product formulas in the proper order so that they retain their validity in complex vector
spaces.
1/12/04
151
c 2003
Peter J. Olver
1
v1 = 2 ,
1
v 2 = 1 ,
2
5
v 3 = 2 ,
1
are easily seen to form a basis of R 3 . Moreover, they are mutually perpendicular, v1 v2 =
v1 v3 = v2 v3 = 0 , and so form an orthogonal basis with respect to the standard dot
product on R 3 . When we divide each orthogonal basis vector by its length, the result is
the orthonormal basis
1
5
0
1
0
5
26
302
1
1
1
1
u1 =
= 6 , u2 =
2
1 = 5 , u3 =
2 =
30 ,
6 1
5 2
30
1
2
1
16
5
30
satisfying u1 u2 = u1 u3 = u2 u3 = 0 and k u1 k = k u2 k = k u3 k = 1. The appearance
of square roots in the elements of an orthonormal basis is fairly typical.
A useful observation is that any orthogonal collection of nonzero vectors is automatically linearly independent.
Proposition 5.4. If v1 , . . . , vk V are nonzero, mutually orthogonal, so h vi ; vj i =
0 for all i 6
= j, then they are linearly independent.
Proof : Suppose
c1 v1 + + ck vk = 0.
Let us take the inner product of the equation with any vi . Using linearity of the inner
product and orthogonality of the elements, we compute
0 = h c 1 v1 + + c k vk ; v i i = c 1 h v1 ; v i i + + c k h vk ; v i i = c i h vi ; v i i = c i k v i k2 .
Therefore, provided vi 6
= 0, we conclude that the coefficient ci = 0. Since this holds for
all i = 1, . . . , k, linear independence of v1 , . . . , vk follows.
Q.E.D.
As a direct corollary, we infer that any orthogonal collection of nonzero vectors is
automatically a basis for its span.
Proposition 5.5. Suppose v1 , . . . , vn V are mutually orthogonal nonzero elements
of an inner product space V . Then v1 , . . . , vn form an orthogonal basis for their span
W = span {v1 , . . . , vn } V , which is therefore a subspace of dimension n = dim W . In
particular, if dim V = n, then they form a orthogonal basis for V .
Orthogonality is also of great significance for function spaces.
1/12/04
152
c 2003
Peter J. Olver
Example 5.6. Consider the vector space P (2) consisting of all quadratic polynomials
p(x) = + x + x2 , equipped with the L2 inner product and norm
s
Z 1
Z 1
p
hp;qi =
p(x) q(x) dx,
kpk = hp;pi =
p(x)2 dx .
0
1
2,
(2)
h 1 ; x2 i =
1
3
h x ; x2 i =
1
4
p3 (x) = x2 x + 61 .
(5.1)
1
1
1
1
k p2 k = = ,
k p3 k =
= .
(5.2)
12
2 3
180
6 5
The corresponding orthonormal basis is found by dividing each orthogonal basis element
by its norm:
(5.3)
u3 (x) = 5 6 x2 6 x + 1 .
u1 (x) = 1,
u2 (x) = 3 ( 2 x 1 ) ,
k p1 k = 1,
In Section 5.4 below, we will learn how to construct such orthogonal systems of polynomials.
Computations in Orthogonal Bases
What are the advantages of orthogonal and orthonormal bases? Once one has a basis
of a vector space, a key issue is how to express other elements as linear combinations of
the basis elements that is, to find their coordinates in the prescribed basis. In general,
this is not an easy problem, since it requires solving a system of linear equations, (2.22).
In high dimensional situations arising in applications, computing the solution may require
a considerable, if not infeasible amount of time and effort.
However, if the basis is orthogonal, or, even better, orthonormal, then the change
of basis computation requires almost no work. This is the crucial insight underlying the
efficacy of both discrete and continuous Fourier methods, large data least squares approximations, signal and image processing, and a multitude of other crucial applications.
Theorem 5.7. Let u1 , . . . , un be an orthonormal basis for an inner product space
V . Then one can write any element v V as a linear combination
in which the coordinates
v = c 1 u1 + + c n un ,
ci = h v ; ui i,
(5.4)
i = 1, . . . , n,
(5.5)
(5.6)
i=1
153
c 2003
Peter J. Olver
Proof : Let us compute the inner product of (5.4) with one of the basis vectors. Using
the orthonormality conditions
0
i6
= j,
(5.7)
h ui ; u j i =
1
i = j,
and bilinearity of the inner product, we find
+
* n
n
X
X
c j h uj ; u i i = c i k u i k 2 = c i .
c j uj ; u i =
h v ; ui i =
j =1
j =1
kvk = hv;vi =
n
X
i,j = 1
c i c j h ui ; u j i =
n
X
c2i ,
i=1
Q.E.D.
1
5
0
26
1
30
2 ,
u1 = 6 ,
u2 = 5 ,
u3 =
30
2
1
1
6
5
30
constructed in Example 5.3. Computing the dot products
v u1 =
2
6
v u2 =
3
5
v u3 =
4
30
we conclude that
v=
2
6
u1 +
3
5
u2 +
4
30
u3 ,
as the reader can validate. Needless to say, a direct computation based on solving the
associated linear system, as in Chapter 2, is more tedious.
While passage from an orthogonal basis to its orthonormal version is elementary one
simply divides each basis element by its norm we shall often find it more convenient to
work directly with the unnormalized version. The next result provides the corresponding
formula expressing a vector in terms of an orthogonal, but not necessarily orthonormal
basis. The proof proceeds exactly as in the orthonormal case, and details are left to the
reader.
Theorem 5.9. If v1 , . . . , vn form an orthogonal basis, then the corresponding coordinates of a vector
v = a 1 v1 + + a n vn
1/12/04
are given by
154
ai =
h v ; vi i
.
k v i k2
c 2003
(5.8)
Peter J. Olver
(5.9)
Equation (5.8), along with its orthonormal simplification (5.5), is one of the most
important and useful formulas we shall establish. Applications will appear repeatedly
throughout the remainder of the text.
Example 5.10. The wavelet basis
1
1
1
1
v2 =
v1 = ,
,
1
1
1
1
1
1
v3 =
,
0
0
0
0
v4 =
,
1
1
(5.10)
introduced in Example 2.33 is, in fact, an orthogonal basis of R 4 . The norms are
k v1 k = 2,
k v2 k = 2,
k v3 k = 2,
k v4 k = 2.
Therefore, using (5.8), we can readily express any vector as a linear combination of the
wavelet basis vectors. For example,
4
2
v=
= 2 v 1 v2 + 3 v 3 2 v 4 ,
1
5
where the wavelet basis coordinates are computed directly by
8
h v ; v1 i
= = 2,
2
k v1 k
4
h v ; v2 i
4
=
= 1,
2
k v2 k
4
h v ; v3 i
6
= =3
2
k v3 k
2
h v ; v4 i
4
=
= 2 .
2
k v4 k
2
This is clearly a lot quicker than solving the linear system, as we did in Example 2.33.
Finally, we note that
46 = k v k2 = 22 k v1 k2 + ( 1)2 k v2 k2 + 32 k v3 k2 + ( 2)2 k v4 k2 = 4 4 + 1 4 + 9 2 + 4 2,
in conformity with (5.9).
Example 5.11. The same formulae are equally valid for orthogonal bases in function
spaces. For example, to express a quadratic polynomial
h p ; p1 i
h p ; p2 i
=
= 12
c1 =
p(x) dx,
c2 =
p(x) x 21 dx,
2
2
k p1 k
k p2 k
0
0
Z 1
h p ; p2 i
c3 =
= 180
p(x) x2 x + 16 dx.
2
k p2 k
0
1/12/04
155
c 2003
Peter J. Olver
11
6
as is easily checked.
+2 x
1
2
+ x2 x +
1
6
T (x) =
(5.11)
0 j+k n
of degree n. The constituent monomials (sin x)j (cos x)k obviously span T (n) , but they
do not form a basis owing to identities stemming from the basic trigonometric formula
cos2 x + sin2 x = 1; see Example 2.19 for additional details. Exercise introduced a more
convenient spanning set consisting of the 2 n + 1 functions
1,
cos x,
sin x,
cos 2 x,
sin 2 x,
...
cos n x,
sin n x.
(5.12)
Let us prove that these functions form an orthogonal basis of T (n) with respect to the L2
inner product and norm:
hf ;gi =
kf k =
f (x)2 dx.
(5.13)
k6
= l,
0,
cos k x cos l x dx =
2 , k = l = 0,
,
k=l6
= 0,
sin k x sin l x dx =
0,
k6
= l,
k=l6
= 0,
cos k x sin l x dx = 0,
(5.14)
which are valid for all nonnegative integers k, l 0, imply the orthogonality relations
h cos k x ; cos l x i = h sin k x ; sin l x i = 0,
k6
= l,
k6
= 0,
k cos k x k = k sin k x k = ,
h cos k x ; sin l x i = 0,
k 1 k = 2 .
(5.15)
Proposition 5.5 now assures us that the functions (5.12) form a basis for T (n) . One
key consequence is that dim T (n) = 2 n + 1 a fact that is not so easy to establish
directly. Orthogonality of the trigonometric functions (5.12) means that we can compute
the coefficients a0 , . . . , an , b1 , . . . , bn of any trigonometric polynomial
p(x) = a0 +
n
X
ak cos k x + bk sin k x
k=1
1/12/04
156
(5.16)
c 2003
Peter J. Olver
(5.17)
These formulae willplay an essential role in the theory and applications of Fourier series;
see Chapter 12.
h w2 ; v 1 i
, and therefore
k v 1 k2
v2 = w 2
h w2 ; v 1 i
v1 .
k v 1 k2
(5.18)
157
c 2003
Peter J. Olver
by subtracting suitable multiples of the first two orthogonal basis elements from w 3 . We
want v3 to be orthogonal to both v1 and v2 . Since we already arranged that h v1 ; v2 i = 0,
this requires
0 = h v3 ; v1 i = h w3 ; v1 i c1 h v1 ; v1 i,
0 = h v3 ; v2 i = h w3 ; v2 i c2 h v2 ; v2 i,
and hence
c1 =
h w3 ; v 1 i
,
k v 1 k2
c2 =
h w3 ; v 2 i
.
k v 2 k2
h w3 ; v 1 i
h w3 ; v 2 i
v1
v2 .
2
k v1 k
k v 2 k2
Continuing in the same manner, suppose we have already constructed the mutually
orthogonal vectors v1 , . . . , vk1 as linear combinations of w1 , . . . , wk1 . The next orthogonal basis element vk will be obtained from wk by subtracting off a suitable linear
combination of the previous orthogonal basis elements:
vk = wk c1 v1 ck1 vk1 .
Since v1 , . . . , vk1 are already orthogonal, the orthogonality constraint
0 = h v k ; v j i = h w k ; v j i c j h vj ; v j i
requires
cj =
h wk ; v j i
k v j k2
for
j = 1, . . . , k 1.
(5.19)
k1
X
j =1
h wk ; v j i
vj ,
k v j k2
k = 1, . . . , n.
(5.20)
The GramSchmidt process (5.20) defines a recursive procedure for constructing the orthogonal basis vectors v1 , . . . , vn . If we are actually after an orthonormal basis u1 , . . . , un ,
we merely normalize the resulting orthogonal basis vectors, setting u k = vk /k vk k for
k = 1, . . . , n.
Example 5.13. The vectors
1
w1 = 1 ,
1
1/12/04
1
w2 = 0 ,
2
158
2
w3 = 2 ,
3
c 2003
(5.21)
Peter J. Olver
are readily seen to form a basis of R 3 . To construct an orthogonal basis (with respect
to the standard
dot
4
1
1
3
1
w2 v 1
1
v2 = w 2
v = 0
= 3 .
1
k v 1 k2 1
3
2
1
5
3
1
2
1
3
w v
w v
7
3
v3 = w3 3 21 v1 3 22 v2 = 2
1 14 13 = 32 .
k v1 k
k v2 k
3
3
5
3
1
12
3
14
7
k v1 k = 3,
,
k v3 k =
.
k v2 k =
3
2
we produce the corresponding orthonormal basis vectors
u1 =
1
13
3
13
u2 =
4
42
1
42
5
42
u3 =
2
14
3
14
1
14
(5.22)
Example 5.14. Here is a typical sort of problem: Find an orthonormal basis (with
respect to the dot product) for the subspace V R 4 consisting of all vectors which are
T
T
orthogonal to the vector a = ( 1, 2, 1, 3 ) . Now, a vector x = ( x1 , x2 , x3 , x4 ) is
orthogonal to a if and only if
x a = x1 + 2 x2 x3 3 x4 = 0.
Solving this homogeneous linear system by the usual method, we find that the free variables
are x2 , x3 , x4 , and so a (non-orthogonal) basis for the subspace is
2
1
3
1
0
0
w1 =
w 2 = ,
w 3 = .
,
0
1
0
0
0
1
This will, in fact, be a consequence of the successful completion of the GramSchmidt algorithm and does not need to be checked in advance. If the given vectors were not linearly
independent, then eventually one of the GramSchmidt vectors would vanish, and the process will
break down.
1/12/04
159
c 2003
Peter J. Olver
1
2
2
5
2
w2 v 1
0 2 1
1
v1 =
. The next element is v2 = w2
=5
. The
2
1
0
0
k v1 k
5
1
0
0
0
0
last element of our orthogonal basis is
1 1
3
2
5
2
32
6
w3 v 2
w3 v 1
0
1 55 1
=
v
v
=
v3 = w 3
.
k v 1 k2 1
k v 2 k2 2 0
5 0 65 1 1
2
2
1
1
5
30
10
1
2
2
10
30 ,
5 ,
u1 =
u
=
u
=
.
2
3
5
1
0
30
10
2
0
0
10
(5.23)
(5.24)
160
c 2003
Peter J. Olver
The coefficients rij can, in fact, be directly computed without using the intermediate
derivation. Indeed, taking the inner product of the j th equation with the orthonormal
basis vector uj , we find, in view of the orthonormality constraints (5.7),
h wj ; ui i = h r1j u1 + + rjj uj ; ui i = r1j h u1 ; ui i + + rjj h un ; ui i = rij ,
and hence
rij = h wj ; ui i.
(5.25)
(5.26)
The pair of equations (5.25), (5.26) can be rearranged to devise a recursive procedure to
compute the orthonormal basis. At stage j, we assume that we have already constructed
u1 , . . . , uj1 . We then compute
rij = h wj ; ui i,
for each
i = 1, . . . , j 1.
(5.27)
2 r2
k wj k2 r1j
j1,j ,
uj =
Running through the formulae (5.27), (5.28) for j = 1, . . . , n leads to the same orthonormal
basis u1 , . . . , un as the previous version of the GramSchmidt process.
Example 5.16. Let us apply the revised algorithm to the vectors
2
1
1
w3 = 2 ,
w2 = 0 ,
w1 = 1 ,
3
2
1
r11 = k w1 k =
3,
u1 =
w1
=
r11
r12
1
= h w2 ; u1 i = , r22 =
3
q
2 =
k w2 k2 r12
14
,
3
1
13
3
13
u2 =
w2 r12 u1
=
r22
4
42
1
42
5
42
1/12/04
161
c 2003
Peter J. Olver
r13 = h w3 ; u1 i = 3 ,
r33 =
k w3
k2
2
r13
r23 = h w3 ; u2 i =
2
r23
7
,
2
u3 =
21
,
2
w3 r13 u1 r23 u2
=
r33
2
143
14
1
14
As advertised, the result is the same orthonormal basis vectors u1 , u2 , u3 found in Example 5.13.
For hand computations, the orthogonal version (5.20) of the GramSchmidt process is
slightly easier even if one does ultimately want an orthonormal basis since it avoids
the square roots that are ubiquitous in the orthonormal version (5.27), (5.28). On the
other hand, for numerical implementation on a computer, the orthonormal version is a bit
faster, as it involves fewer arithmetic operations.
However, in practical, large scale computations, both versions of the GramSchmidt
process suffer from a serious flaw. They are subject to numerical instabilities, and so roundoff errors may seriously corrupt the computations, producing inaccurate, non-orthogonal
vectors. Fortunately, there is a simple rearrangement of the calculation that obviates
this difficulty and leads to a numerically robust algorithm that is used in practice. The
idea is to treat the vectors simultaneously rather than sequentially, making full use of
the orthonormal basis vectors as they arise. More specifically, the algorithm begins as
before we take u1 = w1 /k w1 k. We then subtract off the appropriate multiples of u1
from all of the remaining basis vectors so as to arrange their orthogonality to u 1 . This is
accomplished by setting
(2)
wk = w k h w k ; u 1 i u1 ,
(2)
for
k = 2, . . . , n.
(2)
(3)
(2)
wk = w k h w k ; u 2 i u2 ,
k = 3, . . . , n,
(3)
(3)
uj =
wj
(j)
k wj
(j+1)
wk
(j)
(j)
= w k h w k ; u j i uj ,
j = 1, . . . n,
k = j + 1, . . . , n.
(5.29)
(In the final phase, when j = n, the second formula is no longer relevant.) The result is a
numerically stable computation of the same orthonormal basis vectors u 1 , . . . , un .
1/12/04
162
c 2003
Peter J. Olver
Example 5.17. Let us apply the stable GramSchmidt process (5.29) to the basis
vectors
2
0
1
(1)
(1)
(1)
w1 = w 1 = 2 ,
w 2 = w 2 = 4 ,
w3 = w3 = 2 .
1
1
3
2
(1)
w1
(1)
k w1
3
2
3
13
. Next, we compute
1
(2)
(1)
(1)
(2)
w 3 = w 3 h w 3 ; u 1 i u1 = 0 .
w2
2
12
(2)
w2
1
k w3 k
2
232
The resulting vectors u1 , u2 , u3 form the desired orthonormal basis.
2
(1)
(1)
= w 2 h w 2 ; u 1 i u1 = 2 ,
0
(5.30)
The orthogonality condition implies that one can easily invert an orthogonal matrix:
Q1 = QT .
(5.31)
In fact the two conditions are equivalent, and hence a matrix is orthogonal if and only if
its inverse is equal to its transpose. The second important characterization of orthogonal
matrices relates them directly to orthonormal bases.
1/12/04
163
c 2003
Peter J. Olver
a b
is orthogonal if and only if its columns
Example 5.20. A 2 2 matrix Q =
c d
a
b
u1 =
, u2 =
, form an orthonormal basis of R 2 . Equivalently, the requirement
c
d
T
Q Q=
a
b
c
d
a b
c d
a2 + c 2
ac + bd
ac + bd
b 2 + d2
1
0
0
1
a c + b d = 0,
T
b2 + d2 = 1.
T
The first and last equations say the points ( a, c ) and ( b, d ) lie on the unit circle in R 2 ,
and so
a = cos ,
c = sin ,
b = cos ,
d = sin ,
for some choice of angles , . The remaining orthogonality condition is
0 = a c + b d = cos cos + sin sin = cos( ).
This implies that and differ by a right angle: = 12 . The sign leads to two
cases:
b = sin ,
d = cos ,
or
b = sin ,
d = cos .
As a result, every 2 2 orthogonal matrix has one of two possible forms
cos sin
cos
sin
or
,
where
0 < 2 .
sin cos
sin cos
(5.32)
The corresponding orthonormal bases are illustrated in Figure 5.2. Note that the former
is a right-handed basis which can be obtained from the standard basis e 1 , e2 by a rotation
through angle , while the latter has the opposite, reflected orientation.
1/12/04
164
c 2003
Peter J. Olver
u2
u1
u1
u2
Figure 5.2.
Orthonormal Bases in R 2 .
1
3
1
3
13
4
42
1
42
5
42
and .
2
14
314
114
Q.E.D.
The precise mathematical definition of a group can be found in Exercise . Although they
will not play a significant role in this text, groups are the mathematical formalization of symmetry and, as such, form one of the most fundamental concepts in advanced mathematics and its
applications, particularly quantum mechanics and modern theoretical physics. Indeed, according
to the mathematician Felix Klein, cf. [ 152 ], all geometry is based on group theory.
1/12/04
165
c 2003
Peter J. Olver
The Q R Factorization
The GramSchmidt procedure for orthonormalizing bases of R n can be reinterpreted
as a matrix factorization. This is more subtle than the L U factorization that resulted
from Gaussian elimination, but is of comparable importance, and is used in a broad range
of applications in mathematics, physics, engineering and numerical analysis.
Let w1 , . . . , wn be a basis of R n , and let u1 , . . . , un be the corresponding orthonormal
basis that results from any one of the three implementations of the GramSchmidt process.
We assemble both sets of column vectors to form nonsingular n n matrices
A = ( w1 w2 . . . wn ),
Q = ( u1 u2 . . . un ).
Since the ui form an orthonormal basis, Q is an orthogonal matrix. In view of the matrix
multiplication formula (2.14), the GramSchmidt equations (5.24) can be recast into an
equivalent matrix form:
A = Q R,
11
0
R=
..
.
0
where
r12
r22
..
.
...
...
..
.
...
r1n
r2n
..
(5.33)
rnn
is an upper triangular matrix, whose entries are the previously computed coefficients
(5.27), (5.28). Since the GramSchmidt process works on any basis, the only requirement on the matrix A is that its columns form a basis of R n , and hence A can be any
nonsingular matrix. We have therefore established the celebrated Q R factorization of
nonsingular matrices.
Theorem 5.24. Any nonsingular matrix A can be factorized, A = Q R, into the
product of an orthogonal matrix Q and an upper triangular matrix R. The factorization
is unique if all the diagonal entries of R are assumed to be positive.
The proof of uniqueness is left to Exercise .
1 1 2
Example 5.25. The columns of the matrix A = 1 0 2 are the same as
1 2 3
the basis vectors considered in Example 5.16. The orthonormal basis (5.22) constructed
using the GramSchmidt algorithm leads to the orthogonal and upper triangular matrices
1
4
2
1
3
42
14
3
13
1
3 ,
14
21 .
Q=
R
=
3
42
14
3
2
1
5
7
1
3 42 14
0
0
2
While any of the three implementations of the GramSchmidt algorithm will produce
the Q R factorization of a given matrix A = ( w1 w2 . . . wn ), the stable version, as encoded
1/12/04
166
c 2003
Peter J. Olver
Q R Factorization of a Matrix A
start
for j = 1 to n
q
set rjj =
a21j + + a2nj
next k
next j
end
in equations (5.29), is the one to use in practical computations, as it is the least likely to
fail due to numerical artifacts arising from round-off errors. The accompanying pseudocode
program reformulates the algorithm purely in terms of the matrix entries a ij of A. During
the course of the algorithm, the entries of the matrix A are successively overwritten; the
final result is the orthogonal matrix Q appearing in place of A. The entries r ij of R must
be stored separately.
Example 5.26. Let us factorize the matrix
2 1 0
1 2 1
A=
0 1 2
0 0 1
0
0
1
2
using the numerically stable Q R algorithm. As in the program, we work directly on the
matrix A, gradually changing it into orthogonal form. In the first loop, we set r 11 = 5
to be the norm of the first column vector of
We then normalize the first column
A.
2
1 0 0
15
2
1
0
. The next entries r =
5
by dividing by r11 ; the resulting matrix is
12
0 1 2 1
0 0 1 2
1/12/04
167
c 2003
Peter J. Olver
4 ,
5
r13 = 15 , r14 = 0, are obtained by taking the dot products of the first column
with the other three columns. For j = 1, 2, 3, we subtract r1j times the first column
2
3
2
0
5
5
15
6
4
0
5
5
th
is a matrix whose first column is
0
1
2 1
0
0
1 2
normalized to have unit length, and whose second, third and fourth columns are orthogonal
to it. In the next loop, we normalize the second column by dividing by its norm r 22 =
370 25 0
5
1
0
5
14
5
70
5
5
2
1
70
0
0
1 2
5
the second column with the remaining two columns to produce r23 = 1670 , r24 = 14
.
Subtracting these multiples of the second column from the third and fourth columns, we
2
3
2
7
14
5
70
1
4
3
6
7
7
5
70
obtain
, which now has its first two columns orthonormalized,
9
6
5
0
7
14
70
0
0
1
2
and orthogonal to the last
two
columns. We then normalize
the third column by dividing
3
2
3
2
70
14
105
5
1
6
4
3
7
70
105
, and so 5
by r33 = 15
. Finally, we subtract r34 = 20
7
105
5
6
9
0
14
70
105
7
0
0
2
105
times the third column from the fourth column. Dividing the resulting fourth column by
its norm r44 = 56 results in the final formulas
2
5
1
Q=
0
370
6
70
5
70
2
105
4
105
6
105
7
105
130
2
30
330
4
30
R=
0
4
5
14
5
0
0
1
5
16
70
15
7
5
14
20
105
5
6
becomes
Q R x = b,
and hence
R x = QT b,
(5.34)
168
c 2003
Peter J. Olver
can be solved for x by back substitution. The resulting algorithm, while more expensive
to compute, does offer some numerical advantages over traditional Gaussian elimination
as it is less prone to inaccuracies resulting from ill-conditioned coefficient matrices.
Example 5.27. Let us apply the A = Q R factorization
1
4
2
3 13
3
42
14
1 1 2
1 0 2 = 1
1
3 0
14
3
42
14
3
1 2 3
1
5
1
3 42 14
0
0
21 ,
2
7
2
1
1
3
3
0
3
3
3
4
4 = 21 .
1
5
QT b =
2
42
42
42
7
3
1
2
2
14
14
14
3 13 3
3 3
x
21
14
21
y =
Rx =
2
0
3
2
7
7
z
0
0
2
2
T
169
c 2003
Peter J. Olver
L2 inner product
hp;qi =
(5.35)
The method will work for any other bounded interval, but choosing [ 1, 1 ] will lead us to a
particularly important case. We shall apply the GramSchmidt orthogonalization process
to the elementary, but non-orthogonal monomial basis 1, t, t2 , . . . tn . Because
Z 1
2
,
k + l even,
k+l
k l
t
dt =
ht ;t i =
(5.36)
k+l+1
1
0,
k + l odd,
odd degree monomials are orthogonal to even degree monomials, but that is all. Let
q0 (t), q1 (t), . . . , qn (t) denote the orthogonal polynomials that result from applying the
GramSchmidt process to the non-orthogonal monomial basis 1, t, t 2 , . . . , tn . We begin
by setting
Z 1
2
q0 (t) = 1,
k q0 k =
q0 (t)2 dt = 2.
1
h t ; q0 i
q (t) = t,
k q 0 k2 0
k q 1 k2 =
2
3
k1
X
j =0
h tk ; q j i
q (t)
k q j k2 j
for
k = 1, 2, . . . .
q3 (t) = t3 35 t,
4
q4 (t) = t
6 2
7 t
k q 2 k2 =
3
35
k q 3 k2 =
2
k q4 k =
8
45 ,
8
175 ,
128
11025
(5.37)
,
and so on. The reader can verify that they satisfy the orthogonality conditions
Z 1
qi (t) qj (t) dt = 0,
i6
= j.
h qi ; q j i =
1
1/12/04
170
c 2003
Peter J. Olver
subspace P (n1) of polynomials of degree n 1, the next one, qn , is the unique monic
polynomial that is orthogonal to every polynomial of degree n 1:
h tk ; qn i = 0,
k = 0, . . . , n 1.
(5.38)
Since the monic Legendre polynomials form a basis for the space of polynomials, one
can uniquely rewrite any polynomial of degree n as a linear combination:
p(t) = c0 q0 (t) + c1 q1 (t) + + cn qn (t).
(5.39)
In view of the general orthogonality formula (5.8), the coefficients are simply given by
inner products
Z 1
h p ; qk i
1
ck =
=
p(t) qk (t) dt,
k = 0, . . . , n.
(5.40)
k q k k2
k qk k2 1
For example,
t4 = q4 (t) + 67 q2 (t) + 15 q0 (t) = (t4 67 t2 +
3
35 )
6
7
(t2 13 ) + 51 .
The coefficients can either be obtained directly, or via (5.40); for example,
11025
c4 =
128
175
c3 =
8
t q4 (t) dt = 1,
1
1
t4 q3 (t) dt = 0.
(2 k)!
q (t),
(k!)2 k
k = 0, 1, 2, . . . ,
2k
(5.41)
of the orthogonal basis polynomials. The multiple is fixed by the requirement that
Pk (1) = 1,
(5.42)
which is not so important here, but does play a role in other applications. The first few
classical Legendre polynomials are
k P0 k2 = 2,
k P1 k2 = 32 ,
P0 (t) = 1,
P1 (t) = t,
P2 (t) =
P3 (t) =
P4 (t) =
P5 (t) =
P6 (t) =
3 2
1
2t 2,
5 3
3
2 t 2 t,
35 4
15 2
3
8 t 4 t + 8,
35 3
15
63 5
8 t 4 t + 8 t.
231 6
315 4
105 2
16 t 16 t + 16 t
k P2 k2 = 52 ,
k P3 k2 = 27 ,
k P4 k2 = 92 ,
5
16
k P 5 k2 =
k P 6 k2 =
2
11 ,
2
13 ,
and are graphed in Figure 5.3. There is, in fact, an explicit formula for the Legendre polynomials, due to the early nineteenth century Portuguese mathematician Olinde Rodrigues.
1/12/04
171
c 2003
Peter J. Olver
-1
-1
1.5
1.5
1.5
0.5
0.5
0.5
-0.5
0.5
-1
-0.5
0.5
-1
-0.5
-0.5
-0.5
-0.5
-1
-1
-1
-1.5
-1.5
-1.5
1.5
1.5
1.5
0.5
0.5
0.5
-0.5
0.5
-1
-0.5
0.5
-1
-0.5
-0.5
-0.5
-0.5
-1
-1
-1
-1.5
-1.5
-1.5
Figure 5.3.
0.5
0.5
Theorem 5.28. The Rodrigues formula for the classical Legendre polynomials is
r
1
dk
2
2
k
(5.43)
Pk (t) = k
(t 1) ,
k Pk k =
,
k = 0, 1, 2, . . . .
k
2 k! dt
2k + 1
Thus, for example,
P4 (t) =
Proof : Let
d4 2
1 d4 2
1
4
(t
1)
=
(t 1)4 =
16 4! dt4
384 dt4
Rj,k (t) =
35 4
8 t
15 2
4 t
dj 2
(t 1)k ,
dtj
+ 83 .
(5.44)
which is evidently a polynomial of degree 2 kj. In particular, the Rodrigues formula (5.43)
claims that Pk (t) is a multiple of Rk,k (t). Note that
d
R (t) = Rj+1,k (t).
dt j,k
(5.45)
Moreover,
Rj,k (1) = 0 = Rj,k ( 1)
whenever
j < k,
(5.46)
since, by the product rule, differentiating (t2 1)k a total of j < k times still leaves at
least one factor of t2 1 in each summand, which therefore vanishes at t = 1.
Lemma 5.29. If j k, then the polynomial Rj,k (t) is orthogonal to all polynomials
of degree j 1.
1/12/04
172
c 2003
Peter J. Olver
1
1
ti Rj,k (t) dt = 0,
for all
(5.47)
0 i < j k.
0
Since j > 0, we use (5.45) to write Rj,k (t) = Rj1,k
(t). Integrating by parts,
i
h t ; Rj,k i =
1
1
0
ti Rj1,k
(t) dt
= it Rj1,k (t)
i
t = 1
1
1
where the boundary terms vanish owing to (5.46). We then repeat the process, and eventually
h ti ; Rj,k i = i h ti1 ; Rj1,k i
i
i
= (1) i !
Rji,k (t) dt = (1) i ! Rji1,k (t)
= 0,
t = 1
Q.E.D.
In particular, Rk,k (t) is a polynomial of degree k which is orthogonal to every polynomial of degree k 1. By our earlier remarks, this implies that it is a constant multiple,
Rk,k (t) = ck Pk (t)
of the k th Legendre polynomial. To determine ck , we need only compare the leading terms:
Rk,k (t) =
dk 2 k
(2 k)! k
(2 k)!
dk 2
k
(t
1)
=
(t + ) =
t + , while Pk (t) = k t2 k + .
k
k
2
dt
dt
(k!)
2 k!
We conclude that ck = 2k k!, which proves (5.43). The proof of the formula for k Pk k can
be found in Exercise .
Q.E.D.
The Legendre polynomials play an important role in many aspects of applied mathematics, including numerical analysis, least squares approximation of functions, and solution
of partial differential equations.
Other Systems of Orthogonal Polynomials
The standard Legendre polynomials form an orthogonal system with respect to the L 2
inner product on the interval [ 1, 1 ]. Dealing with any other interval, or, more generally,
a weighted inner product between functions on an interval, leads to a different, suitably
adapted collection of orthogonal polynomials. In all cases, applying the GramSchmidt
process to the standard monomials 1, t, t2 , t3 , . . . will produce the desired orthogonal system.
1/12/04
173
c 2003
Peter J. Olver
Example 5.30. In this example, we construct orthogonal polynomials for the weighted
inner product
Z
hf ;gi =
f (t) g(t) e t dt
(5.48)
(5.49)
q (t) = t2 4 t + 2,
k q 0 k2 0
k q 1 k2 1
q1 (t) = t
q3 (t) = t3 9 t2 + 18 t 6,
k q0 k2 = 1,
k q1 k2 = 1,
k q 2 k2 = 4 ,
k q3 k2 = 36 .
The resulting orthogonal polynomials are known as the (monic) Laguerre polynomials,
named after the nineteenth century French mathematician Edmond Laguerre.
In some cases, a change of variables may be used to relate systems of orthogonal polynomials and thereby circumvent the GramSchmidt computation. Suppose, for instance,
that our goal is to construct an orthogonal system of polynomials for the L 2 inner product
Z b
f (t) g(t) dt on the interval [ a, b ]. The key remark is that we can map the
hh f ; g ii =
a
2t b a
ba
will change
atb
to
1 s 1.
(5.50)
The map changes functions F (s), G(s), defined for 1 s 1, into the functions
2t b a
2t b a
f (t) = F
,
g(t) = G
,
(5.51)
ba
ba
defined for a t b. Moreover, interpreting (5.50) as a change of variables for the
2
integrals, we have ds =
dt, and so the inner products are related by
ba
Z b
Z b
2t b a
2t b a
hf ;gi =
f (t) g(t) dt =
F
G
dt
ba
ba
a
a
(5.52)
Z 1
ba
ba
F (s) G(s)
=
ds =
h F ; G i,
2
2
1
1/12/04
174
c 2003
Peter J. Olver
where the final L2 inner product is over the interval [ 1, 1 ]. In particular, the change of
variables maintains orthogonality, while rescaling the norms:
r
ba
(5.53)
k F k.
h f ; g i = 0 if and only if h F ; G i = 0,
kf k =
2
Moreover, if F (s) is a polynomial of degree n in s, then f (t) is a polynomial of degree n in t
and vice versa. Applying these observations to the Legendre polynomials, we immediately
deduce the following.
Proposition 5.31. The transformed Legendre polynomials
r
2
t
a
ba
Pek (t) = Pk
,
,
k = 0, 1, 2, . . . ,
k Pek k =
ba
2k + 1
(5.54)
form an orthogonal system of polynomials with respect to the L2 inner product on the
interval [ a, b ].
Z 1
2
Example 5.32. As an example, consider the L inner product hh f ; g ii =
f (t) g(t) dt
0
Pe1 (t) = 2 t 1,
Pe2 (t) = 6 t2 6 t + 1,
Pe3 (t) = 20 t3 30 t2 + 12 t 1,
Pe5 (t) =
63 5
8 t
35 3
4 t
15
8
(5.55)
t.
One can, as an alternative, derive these formulae through a direct application of the Gram
Schmidt process.
175
c 2003
Peter J. Olver
W
Figure 5.4.
Orthogonal Projection
We begin by characterizing the orthogonal projection of a vector onto a subspace.
Throughout this section, we will consider a prescribed finite-dimensional subspace W V
of a real inner product space V . While the subspace is necessarily finite-dimensional, the
inner product space itself may be infinite-dimensional. Initially, though, you may wish to
concentrate on V = R m with the ordinary Euclidean dot product, which is the easiest case
to visualize as it coincides with our geometric intuition, as in Figure 5.4.
A vector z V is said to be orthogonal to the subspace W if it is orthogonal to every
vector in W , so h z ; w i = 0 for all w W . Given a basis w1 , . . . , wn for W , we note that
z is orthogonal to W if and only if it is orthogonal to every basis vector: h z ; w i i = 0 for
i = 1, . . . , n. Indeed, any other vector in W has the form w = c1 w1 + + cn wn and
hence, by linearity, h z ; w i = c1 h z ; w1 i + + cn h z ; wn i = 0, as required.
Definition 5.33. The orthogonal projection of v onto the subspace W is the element
w W that makes the difference z = v w orthogonal to W .
As we shall see, the orthogonal projection is unique. The explicit construction is
greatly simplified by taking a orthonormal basis of the subspace, which, if necessary, can be
arranged by applying the GramSchmidt process to a known basis. (A direct construction
of the orthogonal projection in terms of a general basis appears in Exercise .)
Theorem 5.34. Let u1 , . . . , un be an orthonormal basis for the subspace W V .
Then the orthogonal projection of a vector v V onto W is
w = c 1 u1 + + c n un
where
ci = h v ; ui i,
i = 1, . . . , n.
(5.56)
Proof : First, since u1 , . . . , un form a basis of the subspace, the orthogonal projection
element w = c1 u1 + + cn un must be some linear combination thereof. Definition 5.33
1/12/04
176
c 2003
Peter J. Olver
where
ai =
h v ; vi i
,
k v i k2
i = 1, . . . , n.
(5.57)
Of course, we could equally well replace the orthogonal basis by the orthonormal basis
obtained by dividing each vector by its length: ui = vi /k vi k. The reader should be able
to prove that the two formulae (5.56), (5.57) for the orthogonal projection yield the same
vector w.
Example 5.35. Consider the plane W R 3 spanned by the orthogonal vectors
1
1
v 2 = 1 .
v1 = 2 ,
1
1
T
1
1
1
h v ; v1 i
h v ; v2 i
1
1 2
w=
v +
v =
2 +
1 = 0 .
k v 1 k2 1
k v 2 k2 2
6
3
1
1
1
2
Alternatively, we can replace v1 , v2 by the orthonormal basis
u1 =
v1
=
k v1 k
1
62
6
1
6
u2 =
1
62
6
1
6
v2
=
k v2 k
+ 1
1
3
1
3
1
3
1
3
1
3
1
3
2
= 0 .
1
2
The answer is, of course, the same. As the reader may notice, while the theoretical formula
is simpler when written in an orthonormal basis, for hand computations the orthogonal basis version avoids dealing with square roots. (Of course, when performing the computation
on a computer, this is not a significant issue.)
1/12/04
177
c 2003
Peter J. Olver
(5.58)
denote the k-dimensional subspace spanned by the first k basis elements. The basic Gram
Schmidt formula (5.20) can be rewritten in the form vk = wk yk , where yk is the orthogonal projection of wk onto the subspace Vk1 . The resulting vector vk is, by construction,
orthogonal to the subspace, and hence orthogonal to all of the previous basis elements,
which serves to rejustify the GramSchmidt construction.
Orthogonal Least Squares
Now we make an important connection: The orthogonal projection of a vector onto a
subspace is also the least squares vector the closest point in the subspace!
Theorem 5.36. Let W V be a finite-dimensional subspace of an inner product
space. Given a vector v V , the closest point or least squares minimizer w W is the
same as the orthogonal projection of v onto W .
Proof : Let w W be the orthogonal projection of v onto the subspace, which requires
e W is any other vector in
that the difference z = v w be orthogonal to W . Suppose w
the subspace. Then,
e k2 = k w + z w
e k2 = k w w
e k2 + 2 h w w
e ; z i + k z k2 = k w w
e k2 + k z k 2 .
kv w
In particular, if we are supplied with an orthonormal or orthogonal basis of our subspace, then we can compute the closest least squares point w W to v using our orthogonal
projection formulae (5.56) or (5.57). In this way, orthogonal bases have a very dramatic
simplifying effect on the least squares approximation formulae. They completely avoid the
construction of and solution to the much more complicated normal equations.
Example 5.37. Consider the least squares problem of finding the closest point w to
T
the vector v = ( 1, 2, 2, 1 ) in the three-dimensional subspace spanned by the orthogonal
1/12/04
178
c 2003
Peter J. Olver
31 13 4 T
is the closest point to v in the subspace.
and so w = 12 v1 + 49 v2 + 34 v3 = 11
6 , 18 , 9 , 9
a1 =
1
3
h v ; v1 i
= = ,
2
k v1 k
6
2
a2 =
Even when we only know a non-orthogonal basis for the subspace, it may still be a
good strategy to first use GramSchmidt to replace it by an orthogonal or even orthonormal
basis, and then apply the orthogonal projection formulae (5.56), (5.57) to calculate the
least squares point. Not only does this simplify the final computation, it will often avoid
the ill-conditioning and numerical inaccuracies that sometimes afflict the direct solution to
the normal equations (4.26). The following example illustrates this alternative procedure.
Example 5.38. Let us return to the problem, solved in Example 4.6, of finding the
T
T
closest point on plane V spanned by w1 = ( 1, 2, 1 ) , w2 = ( 2, 3, 1 ) to the point
T
b = ( 1, 0, 0 ) . We proceed now by first using the GramSchmidt process to compute an
orthogonal basis
5
1
2
w v
v2 = w2 2 21 w1 = 2 ,
v1 = w 1 = 2 ,
k v1 k
1
3
2
for our subspace. Therefore, applying the orthogonal projection formula (5.57), the closest
point is
3
b v1
b v2
1
v? =
v
+
v
=
15 ,
k v 1 k2 1 k v 2 k2 2
7
15
reconfirming our earlier result. By this device, we have managed to circumvent the tedious
solving of linear equations.
Let us revisit the problem, described in Section 4.4, of approximating experimental
data by a least squares minimization procedure. The required calculations are significantly
simplified by the introduction of an orthogonal basis of the least squares subspace. Given
sample points t1 , . . . , tm , let
T
k = 0, 1, 2, . . .
be the vectors obtained by sampling the monomial tk . More generally, sampling a polynomial
y = p(t) = 0 + 1 t + + n tn
(5.59)
results in the self-same linear combination
T
p = ( p(t1 ), . . . , p(tn ) ) = 0 t0 + 1 t1 + + n tn
1/12/04
179
c 2003
(5.60)
Peter J. Olver
of monomial samplevectors. We
that the sampled polynomial vectors form a
conclude
m
subspace W = span t0 , . . . , tn R spanned by the monomial sample vectors.
T
Let y = ( y1 , y2 , . . . , ym ) denote data measured at the sample points. The polynomial least squares approximation to the given data is, by definition, the polynomial y = p(t)
whose corresponding sample vector p W is the closest point or, equivalently, the orthogonal projection of the data vector y onto the subspace W . The sample vectors t 0 , . . . , tn
are not orthogonal, and so the direct approach requires solving the normal equations (4.33)
in order to find the desired polynomial least squares coefficients 0 , . . . , n .
An alternative method is to first use the GramSchmidt procedure to construct an
orthogonal basis for the subspace W , from which the least squares coefficients are found
by simply taking appropriate inner products. Let us adopt the rescaled version
m
1 X
hv;wi =
v w = vw
m i=1 i i
(5.61)
(5.62)
For weighted least squares, we would adopt an appropriately weighted inner product.
The method works without these particular assumptions, but the formulas become more
unwieldy; see Exercise .
1/12/04
180
c 2003
Peter J. Olver
-3
-2
-1
3
-3
-2
-1
-3
-2
-1
-1
-1
Linear
-1
Quadratic
Figure 5.5.
Cubic
k qk k2 = qk (t)2 , follow:
k q0 k2 = 1,
q0 (t) = 1,
q 0 = t0 ,
q1 (t) = t,
q 1 = t1 ,
q2 (t) = t2 t2 ,
q 2 = t 2 t 2 t0 ,
k q 1 k2 = t2 ,
2
(5.63)
k q 2 k 2 = t 4 t2 ,
2
t4
t4
t4
q3 (t) = t3 t ,
q 3 = t 3 t1 ,
k q 3 k2 = t6
.
t2
t2
t2
With these in hand, the least squares approximating polynomial of degree n to the
given data vector y is given by a linear combination
p(t) = a0 q0 (t) + a1 q1 (t) + a2 q2 (t) + + an qn (t).
(5.64)
The required coefficients are obtained directly through the orthogonality formulae (5.57),
and so
q y
h qk ; y i
= k .
(5.65)
ak =
2
k qk k
q2
k
An additional advantage of the orthogonal basis approach, beyond the fact that one
can write down explicit formulas for the coefficients, is that the same coefficients a j appear
in all the least squares formulae, and hence one can readily increase the degree, and,
presumably, the accuracy, of the approximating polynomial without having to recompute
any of the lower degree terms. For instance, if a quadratic approximant a 0 + a1 q1 (t) +
a2 q2 (t) looks insufficiently close, one can add in the cubic term a3 q3 (t) with a3 given
by (5.65) for k = 3, without having to recompute the quadratic coefficients a 0 , a1 , a2 .
This simplification is not valid when using the non-orthogonal basis elements, where the
lower order coefficients will change whenever the degree of the approximating polynomial
is increased.
.
Example 5.39. Consider the following tabulated sample values:
1/12/04
ti
yi
1.4
1.3
.6
.1
.9
1.8
2.9
c 2003
Peter J. Olver
181
To compute polynomial least squares fits of degrees 1, 2 and 3, we begin by computing the
polynomials (5.63), which for the given sample points ti are
q0 (t) = 1,
q1 (t) = t,
k q0 k2 = 1,
k q1 k2 = 4,
q2 (t) = t2 4,
k q2 k2 = 12,
q3 (t) = t3 7 t ,
k q 3 k2 =
216
7 .
Thus, to four decimal places, the coefficients for the least squares approximation (5.64) are
a0 = h q0 ; y i = 0.3429,
a2 =
1
12
h q2 ; y i = 0.0738,
a1 =
1
4
a3 =
7
216
h q1 ; y i = 0.7357,
h q3 ; y i = 0.0083.
with respective least squares errors 0.2093 and 0.1697 at the sample points. A plot of the
three approximations appears in Figure 5.5. The cubic term does not significantly increase
the accuracy of the approximation, and so this data probably comes from sampling a
quadratic function.
Orthogonal Polynomials and Least Squares
In a similar fashion, the orthogonality of Legendre polynomials and more general
orthogonal functions serves to simplify the construction of least squares approximants
in function space. As an example, let us reconsider the problem, from Chapter 4, of
approximating et by a polynomial of degree n. For the interval 1 t 1, we write the
best least squares approximant as a linear combination of Legendre polynomials,
For example, the quadratic approximant is obtained from the first three terms in (5.66),
where
Z
Z
1 1 t
1
1
3 1 t
3
a0 =
e dt =
e
' 1.175201,
a1 =
t e dt =
' 1.103638,
2 1
2
e
2 1
e
Z
5 1 3 2 1 t
7
5
a2 =
t 2 e dt =
e
' .357814.
2 1 2
2
e
1/12/04
182
c 2003
Peter J. Olver
2.5
2.5
-1
-0.5
2.5
1.5
1.5
1.5
0.5
0.5
0.5
0.5
-1
-0.5
0.5
-1
-0.5
0.5
Figure 5.6.
Therefore
2
2 t
1
2
(5.67)
7
2 t t e dt = 2
3
3
2
5
37 e
e
' .070456.
We do not need to recompute the coefficients a0 , a1 , a2 . The successive Legendre polynomial coefficients decrease fairly rapidly:
a0 ' 1.175201,
a4 ' .009965,
a1 ' 1.103638,
a5 ' .001100,
a2 ' .357814,
a6 ' .000099,
a3 ' .070456,
leading to greater and greater accuracy in the least squares approximation. An explanation
will appear in Chapter 12.
If we switch to another norm, then we need to construct an associated set of orthogonal polynomials to apply the method. For instance, the polynomial least squares
approximation of degree n to a function f (t) with respect to the L2 norm on [ 0, 1 ] has
the form a0 + a1 Pe1 (t) + a2 Pe2 (t) + + an Pen (t), where Pe1 (t) are the rescaled Legendre
1/12/04
183
c 2003
Peter J. Olver
a2 = 5
a1 = 3
Z
Z
1
0
1
0
184
c 2003
(5.68)
Peter J. Olver
Figure 5.7.
(5.69)
Thus W is characterized as the solution space to the homogeneous linear equation (5.69),
or, equivalently, the kernel of the 1 3 matrix A = w1T = ( 1 2 3 ). We can write the
general solution to the equation in the form
2y 3z
2
3
= y 1 + z 0 = y z1 + z z2 ,
y
z=
z
0
1
T
185
c 2003
Peter J. Olver
v
z
w
Figure 5.8.
2
3 T
obtained by subtraction: z = v w = 14 , 14
, 14
. Alternatively, one can obtain z
directly by orthogonal projection onto the plane W . You need to be careful: the basis
derived in Example 5.44 is not orthogonal, and so you will need to set up and solve the
normal equations to find the closest point z. Or, you can first convert the basis into an
orthogonal basis by a single GramSchmidt step, and then use the orthogonal projection
formula (5.57). All three methods lead to the same vector z W .
v w2 = x + y + z 2 w = 0.
Applying the usual algorithm the free variables are y and w we find that the solution
T
T
space is spanned by z1 = ( 1, 1, 0, 0 ) , z2 = ( 1, 0, 3, 1 ) , which form a non-orthogonal
basis for W .
T
T
The orthogonal basis y1 = z1 = ( 1, 1, 0, 0 ) and y2 = z2 12 z1 = 12 , 12 , 3, 1
for W is obtained by a single GramSchmidt step. To decompose the vector v =
1/12/04
186
c 2003
Peter J. Olver
1
10
1
1 T
W , and z = 12 y1 21
y2 = 11
W . Or you can
21 , 21 , 7 , 21
21 , 21 , 7 , 21
easily obtain z = v w by subtraction.
Proposition 5.49. If W is a finite-dimensional subspace of an inner product space,
then (W ) = W .
This result is an immediate corollary of the orthogonal decomposition Proposition 5.45.
Warning: Propositions 5.45 and 5.49 are not necessarily true for infinite-dimensional vector spaces. In general, if dim W = , one can only assert that W (W ) . For example,
it can be shown that, [125], on any bounded interval [ a, b ] the orthogonal complement to
the subspace of all polynomials P () C0 [ a, b ] with respect to the L2 inner product is
trivial: (P () ) = {0}. This means that the only continuous function which satisfies the
moment equations
Z b
n
h x ; f (x) i =
xn f (x) dx = 0,
for all
n = 0, 1, 2, . . .
a
is the zero function f (x) 0. But the orthogonal complement of {0} is the entire space,
and so ((P () ) ) = C0 [ a, b ] 6
= P () .
The difference is that, in infinite-dimensional function space, a proper subspace W (V
can be dense , whereas in finite dimensions, every proper subspace is a thin subset that
only occupies an infinitesimal fraction of the entire vector space. This seeming paradox
underlies the success of numerical methods, such as the finite element method, in approximating functions by elements of a subspace.
Orthogonality of the Fundamental Matrix Subspaces and the Fredholm Alternative
In Chapter 2, we introduced the four fundamental subspaces associated with an m n
matrix A. According to the fundamental Theorem 2.47, the first two, the kernel or null
space and the corange or row space, are subspaces of R n having complementary dimensions.
The second two, the cokernel or left null space and the range or column space, are subspaces
of R m , also of complementary dimensions. In fact, more than this is true the subspace
pairs are orthogonal complements with respect to the standard Euclidean dot product!
Theorem 5.50. Let A be an m n matrix of rank r. Then its kernel and corange
are orthogonal complements as subspaces of R n , of respective dimensions n r and r,
while its cokernel and range are orthogonal complements in R m , of respective dimensions
m r and r:
ker A = (corng A) R n ,
coker A = (rng A) R m .
(5.70)
In general, a subset W V of a normed vector space is dense if, for every v V , there
are elements w W that are arbitrarily close, k v w k < for every > 0. The Weierstrass
approximation theorem, [ 126 ], tells us that the polynomials form a dense subspace of the space of
continuous functions, and underlies the proof of the result mentioned in the preceding paragraph.
1/12/04
187
c 2003
Peter J. Olver
ker A
rng A
cokerA
corngA
Figure 5.9.
AT y = 0.
(5.71)
Or, to state in another way, the vector b is a linear combination of the columns of A if
and only if it is orthogonal to every vector y in the cokernel of A. In practice, one only
needs to check orthogonality of b with respect to a basis y1 , . . . , ymr of the cokernel,
leading to a system of m r compatibility constraints, where r = rank A denotes the rank
of the coefficient matrix. We note that m r is also the number of all zero rows in the
row echelon form of A, and hence yields precisely the same number of constraints on the
right hand side b.
Example 5.52. In Example 2.40,
we analyzed the linear system A x = b with
1 0 1
coefficient matrix A = 0 1 2 . Using direct Gaussian elimination, we were led to
1 2 3
1/12/04
188
c 2003
Peter J. Olver
a single compatibility condition, namely b1 +2 b2 +b3 = 0, required for the system to have
a solution. We now understand the meaning behind this equation: it is telling us that the
right hand side b must be orthogonal to the cokernel of A. The cokernel is determined by
solving the homogeneous adjoint system AT y = 0, and is the line spanned by the vector
y1 = (1, 2, 1)T . Thus, the compatibility condition requires that b be orthogonal to y 1 ,
in accordance with the Fredholm Theorem 5.51.
Example 5.53. Let us determine the compatibility conditions for the linear system
x1 x 2 + 3 x 3 = b 1 ,
x 1 + 2 x 2 4 x 3 = b2 ,
2 x 1 + 3 x 2 + x 3 = b3 , x 1 + 2 x 3 = b4 ,
1 1 3
1 2 4
by computing the cokernel of its coefficient matrix A =
. To this end,
2
3
1
1
0
2
T
we need to solve the homogeneous adjoint system A y = 0, namely
y1 y2 + 2 y3 + y4 = 0,
y1 + 2 y2 + 3 y3 = 0,
3 y1 4 y2 + y3 + 2 y4 = 0.
y = y3 ( 7, 5, 1, 0 ) + y4 ( 2, 1, 0, 1 )
is a linear combination (whose coefficients are the free variables) of the two basis vectors
for coker A. Thus, the compatibility conditions are obtained by taking their dot products
with the right hand side of the original system:
7 b1 5 b2 + b3 = 0,
2 b1 b2 + b4 = 0.
The reader can check that these are indeed the same compatibility
conditions
that result
We are now very close to a full understanding of the fascinating geometry that lurks
behind the simple algebraic operation of multiplying a vector x R n by an m n matrix,
resulting in a vector b = A x R m . Since the kernel and corange of A are orthogonal
complementary subspaces in the domain space R n , Proposition 5.46 tells us that we can
uniquely decompose x = w + z where w corng A, while z ker A. Since A z = 0, we
have
b = A x = A(w + z) = A w.
Therefore, we can regard multiplication by A as a combination of two operations:
(i ) The first is an orthogonal projection onto the subspace corng A taking x to w.
(ii ) The second takes a vector in corng A R n to a vector in rng A R m , taking the
orthogonal projection w to the image vector b = A w = A x.
Moreover, if A has rank r then, according to Theorem 2.47, both rng A and corng A are rdimensional subspaces, albeit of different vector spaces. Each vector b rng A corresponds
e corng A satisfy b = A w = A w,
e then
to a unique vector w corng A. Indeed, if w, w
e = 0 and hence w w
e ker A. But, since they are complementary subspaces,
A(w w)
the only vector that belongs to both the kernel and the corange is the zero vector, and
e In this manner, we have proved the first part of the following result; the
hence w = w.
second is left as Exercise .
1/12/04
189
c 2003
Peter J. Olver
Proposition 5.54. Multiplication by an m n matrix A of rank r defines a one-toone correspondence between the r-dimensional subspaces corng A R n and rng A R m .
Moreover, if v1 , . . . , vr forms a basis of corng A then their images A v1 , . . . , A vr form a
basis for rng A.
In summary, the linear system A x = b has a solution if and only if b rng A, or,
equivalently, is orthogonal to every vector y coker A. If the compatibility conditions
hold, then the system has a unique solution w corng A that, by the definition of the
corange or row space, is a linear combination of the rows of A. The general solution to
the system is x = w + z where w is the particular solution belonging to the corange, while
z ker A is an arbitrary element of the kernel.
Theorem 5.55. A compatible linear system A x = b with b rng A = (coker A)
has a unique solution w corng A with A w = b. The general solution is x = w + z
where z ker A. The particular solution is distinguished by the fact that it has minimum
Euclidean norm k w k among all possible solutions.
Indeed, since the corange and kernel are orthogonal subspaces, the norm of a general
solution x = w + z is
k x k2 = k w + z k 2 = k w k2 + 2 w z + k z k 2 = k w k2 + k z k2 k w k2 ,
with equality if and only if z = 0.
Example 5.56. Consider the
1 1
0 1
1 3
5 1
linear system
1
x
2 2
2 1 y 1
.
=
4
z
5 2
6
w
9 6
Applying the standard Gaussian elimination algorithm, we discover that the coefficient
T
matrix has rank 3, and the kernel is spanned by the single vector z1 = ( 1, 1, 0, 1 ) . The
system itself is compatible; indeed, the right hand side is orthogonal to the basis cokernel
T
vector ( 2, 24, 7, 1 ) , and so satisfies the Fredholm alternative.
T
The general solution to the linear system is x = ( t, 3 t, 1, t ) where t = w is the
free variable. We decompose the solution x = w + z into a vector w in the corange and
an element z in the kernel. The easiest way to do this is to first compute its orthogonal
T
projection z = k z1 k2 x z1 z1 = ( t 1, 1 t, 0, t 1 ) of the solution x onto the oneT
dimensional kernel. We conclude that w = x z = ( 1, 2, 1, 1 ) corng A is the unique
solution belonging to the corange of the coefficient matrix, i.e., the only solution that can
be written as a linear combination of its row vectors, or, equivalently, the only solution
which is orthogonal to the kernel. The reader should check this by finding the coefficients
in the linear combination, or, equivalently, writing w = AT v for some v R 4 .
In this example, the analysis was simplified by the fact that the kernel was onedimensional, and hence the orthogonal projection was relatively easy to compute. In more
complicated situations, to determine the decomposition x = w + z one needs to solve the
1/12/04
190
c 2003
Peter J. Olver
normal equations (4.26) in order to find the orthogonal projection or least squares point in
the subspace; alternatively, one can first determine an orthogonal basis for the subspace,
and then apply the orthogonal (or orthonormal) projection formula (5.57). Of course,
once one of the constituents w, z has been found, the other can be simply obtained by
subtraction from x.
1/12/04
191
c 2003
Peter J. Olver
Chapter 6
Equilibrium
In this chapter, we turn to some interesting applications of linear algebra to the
analysis of mechanical structures and electrical circuits. We will discover that there are remarkable analogies between electrical and mechanical systems. Both fit into a very general
mathematical framework which, when suitably formulated, will also apply in the continuous realm, and ultimately governs the equilibria of systems arising throughout physics and
engineering. The one difference is that discrete structures and circuits are governed by
linear algebraic equations on finite-dimensional vector spaces, whereas continuous media
are modeled by differential equations and boundary value problems on infinite-dimensional
function spaces.
We begin by analyzing in detail a linear chain of masses interconnected by springs
and constrained to move only in the longitudinal direction. Our general mathematical
framework is already manifest in this rather simple mechanical structure. Next, we consider
simple electrical circuits consisting of resistors and current sources interconnected by a
network of wires. Finally, we treat small (so as to remain in a linear regime) displacements
of two and three-dimensional structures constructed out of elastic bars. In all cases, we
only consider the equilibrium configurations; dynamical processes for each of the physical
systems will be taken up in Chapter 9.
In the mechanical and electrical systems treated in the present chapter, the linear system governing the equilibrium configuration has the same structure: the coefficient matrix
is of general positive (semi-)definite Gram form. The positive definite cases correspond
to stable structures and circuits, which can support any external forcing, and possess a
unique stable equilibrium solution that can be characterized by a minimization principle.
On the other hand, the positive semi-definite cases correspond to unstable structures and
circuits that cannot remain in equilibrium except for very special configurations of external forces. In the case of mechanical structures, the instabilities are of two types: rigid
motions, under which the structure maintains its overall absence of any applied force.
192
c 2003
Peter J. Olver
m1
m2
m3
Figure 6.1.
allow the masses to move in the vertical direction one-dimensional motion. (Section 6.3
deals with the more complicated cases of two- and three-dimensional motion.)
If we subject some or all of the masses to an external force, e.g., gravity, then the
system will move to a new equilibrium position. The motion of the ith mass is measured
by its displacement ui from its original position, which, since we are only allowing vertical
motion, is a scalar. Referring to Figure 6.1, we use the convention that u i > 0 if the mass
has moved downwards, and ui < 0 if it has moved upwards. The problem is to determine
the new equilibrium configuration of the chain under the prescribed forcing, that is, to set
up and solve a system of equations for the displacements u1 , . . . , un .
Let ej denote the elongation of the j th spring, which connects mass mj1 to mass mj .
By elongation, we mean how far the spring has been stretched, so that e j > 0 if the spring
is longer than its reference length, while ej < 0 if the spring has been compressed. The
elongations can be determined directly from the displacements according to the geometric
formula
ej = uj uj1 ,
j = 2, . . . , n,
(6.1)
while
e1 = u1 ,
en+1 = un ,
(6.2)
since the top and bottom supports are fixed. We write the elongation equations (6.1), (6.2)
in matrix form
e = A u,
(6.3)
The differential equations governing its dynamical behavior will be the subject of Chapter 9. Damping or frictional effects will cause the system to eventually settle down into a stable
equilibrium configuration.
1/12/04
193
c 2003
Peter J. Olver
e
1
e2
where e =
..
.
.
un
en+1
the coefficient matrix
1 1
1 1
1
1
(6.4)
A=
.
.
..
..
1 1
1
has size (n + 1) n, with only the non-zero entries being indicated. The matrix A is
known as the reduced incidence matrix for the massspring chain. It effectively encodes
the underlying geometry of the massspring chain, including the boundary conditions at
the top and the bottom.
The next step is to connect the elongation ej experienced by the j th spring to its internal force yj . This is the basic constitutive assumption, that relates geometry to kinematics.
In the present case, we shall assume that the springs are not stretched (or compressed)
particularly far, and so obey Hookes Law
yj = c j e j ,
(6.5)
named after the prolific seventeenth century English scientist and inventor Robert Hooke.
The constant cj > 0 measures the springs stiffness. Hookes Law says that force is
proportional to elongation the more you stretch a spring, the more internal force it
experiences. A hard spring will have a large stiffness and so takes a large force to stretch,
whereas a soft spring will have a small, but still positive, stiffness. We write (6.5) in matrix
form
y = C e,
(6.6)
where
y
1
y2
y=
..
.
yn+1
C=
c2
..
.
cn+1
The connection with the incidence matrix of a graph will become evident in Section 6.2.
1/12/04
194
c 2003
Peter J. Olver
the ith spring and above the (i + 1)st spring. If the ith spring is stretched, it will exert an
upwards force on mi , while if the (i + 1)st spring is stretched, it will pull mi downwards.
Therefore, the balance of forces on mi requires that
fi = yi yi+1 .
(6.7)
(6.8)
where f = (f1 , . . . , fn )T . The remarkable, and very general fact is that the force balance
coefficient matrix
1 1
1 1
1
1
(6.9)
AT =
1 1
..
..
.
.
1
is the transpose of the reduced incidence matrix (6.4) for the chain. This connection
between geometry and force balance turns out to be very general, and is the reason underlying the positivity of the final coefficient matrix in the resulting system of equilibrium
equations.
Summarizing, we have
e = A u,
y = C e,
f = AT y.
(6.10)
K = AT C A
where
(6.11)
is called the stiffness matrix associated with the entire massspring chain. The stiffness
matrix K has the form of a Gram matrix (3.51) for the weighted inner product h v ; w i =
vT C w induced by the diagonal matrix of spring stiffnesses. Theorem 3.33 tells us that
since A has linearly independent columns (which should be checked), and C > 0 is positive
definite, then the stiffness matrix K > 0 is automatically positive definite. In particular,
Theorem 3.38 guarantees that K is an invertible matrix, and hence the linear system (6.11)
has a unique solution u = K 1 f . We can therefore conclude that the massspring chain
assumes a unique equilibrium position.
In fact, in the particular case considered here,
c1 + c 2
c2
c2
c2 + c 3
c3
c3
c3 + c 4
c4
c4
c4 + c 5 c 5
(6.12)
K=
.
.
.
.
.
.
.
.
.
c
c
+c
c
n1
n1
cn
1/12/04
195
cn + cn+1
c 2003
Peter J. Olver
has a very simple symmetric, tridiagonal form. As such, we can apply our tridiagonal
solution algorithm of Section 1.7 to rapidly solve the system.
Example 6.1. Let us consider the particular case of n = 3 masses connected by
identical springs with unit spring constant. Thus, c1 = c2 = c3 = c4 = 1 and C =
diag (1, 1, 1, 1) = I is the 4 4 identity matrix. The 3 3 stiffness matrix is then
1 1
T
K=A A= 0 1
0 0
1
0
1
0
0
1
0
0
1
1
0
1
1
0
0
2
0
= 1
1
0
1
1
2
1
0
1 .
2
1
0 0
2 1 0
1 12
2 0 0
0
1 2 1 = 1
1 0 0 32 0 0 1 23 .
2
0 1 2
0 0
1
0 32 1
0 0 43
With this in hand, we can solve the basic equilibrium equations K u = f by our basic
forward and back substitution algorithm.
Suppose, for example, we pull the middle mass downwards with a unit force, so f 2 = 1
T
while f1 = f3 = 0. Then f = ( 0, 1, 0 ) , and the solution to the equilibrium equations
1
T
(6.11) is u = 2 , 1, 12 , whose entries prescribe the mass displacements. Observe that
all three masses have moved down, with the middle mass moving twice as far as the other
two. The corresponding spring elongations and internal forces are obtained by matrix
multiplication
T
y = e = A u = 12 , 12 , 12 , 12
.
Thus the top two springs are elongated, while the bottom two are compressed, all by an
equal amount.
Similarly, if all the masses are equal, m1 = m2 = m3 = m, then the solution under a
T
constant downwards gravitational force f = ( m g, m g, m g ) of magnitude g is
3
m
g
mg
2
u = K 1 m g = 2 m g ,
mg
3
2
mg
and
y = e = Au =
2 m g,
1
2
m g, 21 m g, 23 m g
Now, the middle mass has only moved 33% farther than the others, whereas the top and
bottom spring are experiencing three times as much elongation/compression as the middle
springs.
An important observation is that we cannot determine the internal forces y or elongations e directly from the force balance law (6.8) because the transposed matrix A T is
not square, and so the system f = AT y does not have a unique solution. We must first
1/12/04
196
c 2003
Peter J. Olver
determine the displacements u using the full equilibrium equations (6.11), and then use the
resulting displacements to reconstruct the elongations and internal forces. This situation
is referred to as statically indeterminate.
Remark : Even though we construct K = AT C A and then factor it as K = L D LT ,
there is no direct algorithm to get from A and C to L and D, which, typically, are matrices
of a different size.
The behavior of the system will depend upon both the forcing and the boundary
conditions. Suppose, by way of contrast, that we only fix the top of the chain to a support,
and leave the bottom mass hanging freely, as in Figure 6.2. The geometric relation between
the displacements and the elongations has the same form (6.3) as before, but the reduced
incidence matrix is slightly altered:
1
1
A=
1
1
1
1
1
..
.
..
.
1
(6.13)
This matrix has size n n and is obtained from the preceding example (6.4) by eliminating
the last row corresponding to the missing bottom spring. The constitutive equations are
still governed by Hookes law y = C e, as in (6.6), with C = diag (c 1 , . . . , cn ) the n n
diagonal matrix of spring stiffnesses. Finally, the force balance equations are also found
to have the same general form f = AT y as in (6.8), but with the transpose of the revised
incidence matrix (6.13). In conclusion, the equilibrium equations K x = f have an identical
form (6.11), based on the revised stiffness matrix
c1 + c 2
c2
T
K = A CA =
c2
c2 + c 3
c3
c3
c3 + c 4
c4
c4
c4 + c 5
..
c5
..
cn1
..
cn1 + cn
cn
cn
cn
(6.14)
Note that only the bottom right entry is different from the fixed end version (6.12). In
contrast to the chain with two fixed ends, this system is called statically determinate
because the incidence matrix A is square and nonsingular. This means that it is possible
to solve the force balance law (6.8) directly for the internal forces y = A 1 f without having
to solve the full equilibrium equations for the displacements.
Example 6.2.
1/12/04
For a three mass chain with one free end and equal unit spring
197
c 2003
Peter J. Olver
m1
m2
m3
Figure 6.2.
1
1 1 0
T
1
K = A A = 0 1 1
0
0 0
1
is
2
0 0
1 0 = 1
0
1 1
1 0
2 1 .
1 1
T
Pulling the middle mass downwards with a unit force, whereby f = ( 0, 1, 0 ) , results in
the displacements
1
1
so that
y = e = A u = 1 .
u = K 1 f = 2 ,
0
2
In this configuration, the bottom two masses have moved equal amounts, and twice as far
as the top mass. Because we are only pulling on the middle mass, the lower-most spring
hangs free and experiences no elongation, whereas the top two springs are stretched by the
same amount.
Similarly, for a chain of equal masses subject to a constant downwards gravitational
T
force f = ( m g, m g, m g ) , the equilibrium position is
mg
3mg
3mg
and
y = e = A u = 2 m g .
u = K 1 m g = 5 m g ,
mg
6mg
mg
Note how much further the masses have moved now that the restraining influence of the
bottom support has been removed. The top spring is experiencing the most elongation,
and is thus the most likely to break, because it must support all three masses.
The Minimization Principle
According to Theorem 4.1, when the coefficient matrix of the linear system governing a
massspring chain is positive definite, the unique equilibrium solution can be characterized
by a minimization principle. The quadratic function to be minimized has a physical interpretation: it is the potential energy of the system. Nature is parsimonious when it comes to
energy: physical systems seek out equilibrium configurations that minimize energy. This
1/12/04
198
c 2003
Peter J. Olver
n
X
f i ui = u T f .
i=1
Next, we calculate the internal energy of the system. The potential energy in a single
spring elongated by an amount e is obtained by integrating the internal force, y = c e,
leading to
Z e
Z e
y de =
c e de = 12 c e2 .
0
Totalling the contributions from each spring, we find the internal spring energy to be
n
1 X
c e2 =
2 i=1 i i
1
2
eT C e =
1
2
uT AT CA u =
1
2
uT K u,
where we used the incidence equation e = A u relating elongation and displacement. Therefore, the total potential energy is
p(u) =
1
2
uT K u u T f .
(6.15)
Since K > 0, Theorem 4.1 implies that this quadratic function has a unique minimizer
that satisfies the equilibrium equation K u = f .
Example 6.3. For a three mass chain with two fixed ends described in Example 6.1,
the potential energy function (6.15) has the explicit form
f1
2 1 0
u1
1
p(u) = ( u1 u2 u3 ) 1 2 1 u2 ( u1 u2 u3 ) f2
2
f3
u3
0 1 2
= u21 u1 u2 + u22 u2 u3 + u23 f1 u1 f2 u2 f3 u3 ,
T
199
c 2003
Peter J. Olver
u1
R1
R2
u2
R3
u3
R5
R4
u4
Figure 6.3.
nodes the vertices. To begin with we assume that there are no electrical devices (batteries,
inductors, capacitors, etc.) in the network and so the the only impediment to current
flowing through the network is each wires resistance. (If desired, we may add resistors
to the network to increase the resistance along the wires.) As we shall see, resistance (or,
rather, its reciprocal) plays a very similar role to spring stiffness.
We shall introduce current sources into the network at one or more of the nodes, and
would like to determine how the induced current flows through the wires in the network.
The basic equilibrium equations for the currents are the consequence of three fundamental
laws of electricity.
Voltage is defined as the electromotive force that moves electrons through a wire. is
induced by a drop in the voltage potential along the wire. The voltage in a wire is induced
by the difference in the voltage potentials at the two ends, just as the gravitational force on
a mass is induced by a difference in gravitational potential. To quantify voltage, we need
to assign an orientation to the wire. Then a positive voltage means the electrons move in
the assigned direction, while under a negative voltage they move in reverse. The original
choice of orientation is arbitrary, but once assigned will pin down the sign conventions used
by voltages, currents, etc. To this end, we draw a digraph to represent the network, and
each edge or wire is assigned a direction that indicates its starting and ending vertices or
nodes. A simple example is illustrated in Figure 6.3, and contains five wires joined at four
different nodes. The arrows indicate the orientations of the wires, while the wavy lines are
the standard electrical symbols for resistance.
In an electrical network, each node will have a voltage potential, denoted u i . If wire
k starts at node i and ends at node j, under its assigned orientation, then its voltage v k
equals the potential difference at its ends:
vk = ui uj .
(6.16)
Note that vk > 0 if ui > uj , and so the electrons go from the starting node i to the ending
1/12/04
200
c 2003
Peter J. Olver
node j, in accordance with our choice of orientation. In our particular illustrative example,
v1 = u1 u2 ,
v 2 = u1 u3 ,
v 3 = u1 u4 ,
v 4 = u2 u4 ,
v 5 = u3 u4 .
(6.17)
A = 1 0
0 1
0 0
0
1
0
0
1
0
0
1 .
1
1
(6.18)
The alert reader will recognize this matrix as the incidence matrix (2.42) for the digraph
defined by the circuit; see (2.42). This is true in general the voltages along the wires
of an electrical network are related to the potentials at the nodes by a linear system of the
form (6.17), where A is the incidence matrix of the network digraph. The rows of the
incidence matrix are indexed by the wires; the columns are indexed by the nodes. Each
row of the matrix A has a single + 1 in the column indexed by the starting node, and a
single 1 in the column of the ending node.
Kirchhoff s Voltage Law states that the sum of the voltages around each closed loop in
the network is zero. For example, in the circuit under consideration, around the left-hand
triangle we have
v1 + v4 v3 = (u1 u2 ) + (u2 u4 ) (u1 u4 ) = 0.
Note that v3 appears with a minus sign since we must traverse wire #3 in the opposite
direction to its assigned orientation when going around the loop in the counterclockwise
direction. The voltage law is a direct consequence of (6.17). Indeed, as discussed in
Section 2.6, the loops can be identified with vectors ` coker A = ker A T in the cokernel
of the incidence matrix, and so
` v = `T v = `T A u = 0.
(6.19)
Therefore, orthogonality of the voltage vector v to the loop vector ` is the mathematical
formulation of the zero-loop relation.
Given a prescribed set of voltages v along the wires, can one find corresponding voltage
potentials u at the nodes? To answer this question, we need to solve v = A u, which
requires v rng A. According to the Fredholm Alternative Theorem 5.51, the necessary
and sufficient condition for this to hold is that v be orthogonal to coker A. Theorem 2.51
says that the cokernel of an incidence matrix is spanned by the loop vectors, and so v is
a possible set of voltages if and only if v is orthogonal to all the loop vectors ` coker A,
i.e., the Voltage Law is necessary and sufficient for the given voltages to be physically
realizable in the network.
Kirchhoffs Laws are related to the topology of the circuit how the different wires
are connected together. Ohms Law is a constitutive relation, indicating what the wires
1/12/04
201
c 2003
Peter J. Olver
are made of. The resistance along a wire, including any added resistors, prescribes the
relation between voltage and current or the rate of flow of electric charge. The law reads
v k = R k yk ,
(6.20)
where vk is the voltage and yk (often denoted Ik in the engineering literature) denotes the
current along wire k. Thus, for a fixed voltage, the larger the resistance of the wire, the
smaller the current that flows through it. The direction of the current is also prescribed
by our choice of orientation of the wire, so that yk > 0 if the current is flowing from the
starting to the ending node. We combine the individual equations (6.20) into a matrix
form
v = R y,
(6.21)
where the resistance matrix R = diag (R1 , . . . , Rn ) > 0 is diagonal and positive definite.
We shall, in analogy with (6.6), replace (6.21) by the inverse relationship
y = C v,
(6.22)
where C = R1 is the conductance matrix , again diagonal, positive definite, whose entries
are the conductances ck = 1/Rk of the wires. For the particular circuit in Figure 6.3,
C=
c1
c2
c3
c4
c5
1/R1
1/R2
1/R3
1/R4
1/R5
(6.23)
Finally, we stipulate that electric current is not allowed to accumulate at any node, i.e.,
every electron that arrives at a node must leave along one of the wires. Let y k , yl , . . . , ym
denote the currents along all the wires k, l, . . . , m that meet at node i in the network, and
fi an external current source, if any, applied at node i. Kirchhoff s Current Law requires
that the net current into the node, namely
yk yl ym + fi = 0,
(6.24)
must be zero. Each sign is determined by the orientation of the wire, with if node i
is a starting node or + if it is an ending node.
In our particular example, suppose that we send a 1 amp current source into the first
node. Then Kirchhoffs Current Law requires
y1 + y2 + y3 = 1,
y1 + y4 = 0,
y2 + y5 = 0,
y3 y4 y5 = 0.
Since we have solved (6.24) for the currents, the signs in front of the y i have been reversed,
with + now indicating a starting node and an ending node. The matrix form of this
system is
AT y = f ,
(6.25)
1/12/04
202
c 2003
Peter J. Olver
1
1
1
0
0
0
1
0
1 0
AT =
(6.26)
,
0 1 0
0
1
0
0 1 1 1
is the transpose of the incidence matrix (6.18). As in the massspring chain, this is a
general fact, and is an immediate result of Kirchhoffs two laws. The coefficient matrix for
the current law is the transpose of the incidence matrix for the voltage law.
Let us assemble the full system of equilibrium equations:
v = A u,
y = C v,
f = AT y.
(6.27)
Remarkably, we arrive at a system of linear relations that has an identical form to the
massspring chain system (6.10). As before, they combine into a single linear system
Ku = f,
where
K = AT C A
(6.28)
is the resistivity matrix associated with the given network. In our particular example,
combining (6.18), (6.23), (6.26) produces the resistivity matrix
c1 + c 2 + c 3
c1
c2
c3
c1
c1 + c 4
0
c4
K = AT C A =
(6.29)
c2
0
c2 + c5
c5
c3
c4
c5
c3 + c 4 + c 5
depending on the conductances of the five wires in the network.
Remark : There is a simple pattern to the resistivity matrix, evident in (6.29). The
diagonal entries kii equal the sum of the conductances of all the wires having node i at
one end. The non-zero off-diagonal entries kij , i 6
= j, equal ck , the conductance of the
wire joining node i to node j, while kij = 0 if there is no wire joining the two nodes.
Consider the case when all the wires in our network have equal unit resistance, and
so ck = 1/Rk = 1 for k = 1, . . . , 5. Then the resistivity matrix is
3 1 1 1
0 1
1 2
(6.30)
K=
.
1 0
2 1
1 1 1 3
However, trying to solve the system (6.28) runs into an immediate difficulty: there is no
solution! The matrix (6.30) is not positive definite it has zero determinant, and so is
T
not invertible. Moreover, the particular current source vector f = ( 1, 0, 0, 0 ) does not lie
in the range of K. Something is clearly amiss.
This assumes that there is only one wire joining the two nodes.
1/12/04
203
c 2003
Peter J. Olver
Before getting discouraged, let us sit back and use a little physical intuition. We are
trying to put a 1 amp current into the network at node 1. Where can the electrons go? The
answer is nowhere they are trapped in the circuit and, as they accumulate, something
drastic will happen sparks will fly! This is clearly an unstable situation, and so the fact
that the equilibrium equations do not have a solution is trying to tell us that the physical
system cannot remain in a steady state. The physics rescues the mathematics, or, vice
versa, the mathematics elucidates the underlying physical processes!
In order to achieve a steady state in an electrical network, we must remove as much
current as we put in. In other words, the sum of all the current sources must vanish:
f1 + f2 + + fn = 0.
For example, if we feed a 1 amp current into node 1, then we must extract a total of 1
amps worth of current from the other nodes. If we extract a 1 amp current from node
T
4, the modified current source vector f = ( 1, 0, 0, 1 ) does indeed lie in the range of K
(check!) and the equilibrium system (6.28) has a solution. Fine . . .
But we are not out of the woods yet. As we know, if a linear system has a singular
square coefficient matrix, then either it has no solutions the case we already rejected
or it has infinitely many solutions the case we are considering now. In the particular
network under consideration, the general solution to the linear system
u1
3 1 1 1
1
0 1 u2 0
1 2
1 0
2 1
u3
0
1 1 1 3
u4
1
is found by Gaussian elimination:
1
2
u = 4
1
4
1
2
1
+ t
1
= 41 + t ,
1
+ t
4
1
t
0
+t
(6.31)
1
2
+ t,
u2 =
1
4
+ t,
u3 =
1
4
+ t,
u4 = t,
204
c 2003
Peter J. Olver
On the other hand, even without specification of a baseline potential level, the corresponding voltages and currents along the wires are uniquely specified. In our example,
computing y = v = A u gives
y1 = v 1 =
1
4
y2 = v 2 =
1
4
y3 = v 3 =
1
2
y4 = v 4 =
1
4
y5 = v 5 =
1
4
independent of the value of t in (6.31). Thus, the nonuniqueness of the voltage potential
solution u is not an essential difficulty. All physical quantities that we can measure
currents and voltages are uniquely specified by the solution to the equilibrium system.
Remark : Although they have no real physical meaning, we cannot dispense with the
nonmeasurable (and non-unique) voltage potentials u. Most circuits are statically indeterminate since their incidence matrix is rectangular and not invertible, and so the linear
system AT y = f cannot be solved directly for the currents in terms of the voltage sources
it does not have a unique solution. Only by first solving the full equilibrium system
(6.28) for the potentials, and then using the relation y = CA u between the potentials and
the currents, can we determine the actual values of the currents in our network.
Let us analyze what is going on in the context of our general mathematical framework.
Proposition 3.32 says that the resistivity matrix K = AT CA is positive definite (and
hence nonsingular) provided A has linearly independent columns, or, equivalently, ker A =
{0}. But Proposition 2.49 says that the incidence matrix A of a directed graph never
has a trivial kernel. Therefore, the resistivity matrix K is only positive semi-definite,
and hence singular. If the network is connected, then ker A = ker K = coker K is oneT
dimensional, spanned by the vector z = ( 1, 1, 1, . . . , 1 ) . According to the Fredholm
Alternative Theorem 5.51, the fundamental network equation K u = f has a solution if
and only if f is orthogonal to coker K, and so the current source vector must satisfy
f z = f1 + f2 + + fn = 0,
(6.32)
as we already observed. Therefore, the linear algebra reconfirms our physical intuition: a
connected network admits an equilibrium configuration, obtained by solving (6.28), if and
only if the nodal current sources add up to zero, i.e., there is no net influx of current into
the network.
Grounding one of the nodes is equivalent to nullifying the value of its voltage potential:
ui = 0. This variable is now fixed, and can be safely eliminated from our system. To
accomplish this, we let A? denote the m (n 1) matrix obtained by deleting the ith
column from A. For example, if we ground node number 4 in our sample network, then
we erase the fourth column of the incidence matrix (6.18), leading to the reduced incidence
matrix
1 1 0
1 0 1
?
(6.33)
0 .
A = 1 0
0 1
0
0 0
1
1/12/04
205
c 2003
Peter J. Olver
The key observation is that A? has trivial kernel, ker A? = {0}, and therefore the reduced
network resistivity matrix
c1 + c 2 + c 3
c1
c2
(6.34)
K ? = (A? )T C A? =
c1
c1 + c 4
0 .
c2
0
c2 + c5
is positive definite. Note that we can obtain K ? directly from K by deleting both its
T
fourth row and fourth column. Let f ? = ( 1, 0, 0 ) denote the reduced current source
vector obtained by deleting the fourth entry from f . Then the reduced linear system is
K ? u? = f ? ,
where
u ? = ( u1 , u2 , u3 ) ,
(6.35)
is the reduced voltage potential vector. Positive definiteness of K ? implies that (6.35) has
a unique solution u? , from which we can reconstruct the voltages v = A? u? and currents
y = C v = CA? u? along the wires. In our example, if all the wires have unit resistance,
then the reduced system (6.35) is
1
u1
3 1 1
1 2
0 u2 = 0 ,
0
u3
1 0
2
T
and has unique solution u? = 21 14 41 . The voltage potentials are
u1 = 12 ,
u2 = 41 ,
u3 = 14 ,
u4 = 0,
and correspond to the earlier solution (6.31) when t = 0. The corresponding voltages and
currents along the wires are the same as before.
So far, we have only considered the effect of current sources at the nodes. Suppose
now that the circuit contains one or more batteries. Each battery serves as a voltage source
along one of the wires, and we let bk denote the voltage of a battery connected to wire
k. The quantity bk comes with a sign, indicated by the batterys positive and negative
terminals. Our convention is that bk > 0 if the current from the battery runs in the same
direction as our chosen orientation of the wire. The battery voltage modifies the voltage
balance equation (6.16):
vk = ui uj + bk .
The corresponding matrix form (6.17) becomes
v = A u + b,
(6.36)
where b = ( b1 , b2 , . . . , bm ) is the battery vector whose entries are indexed by the wires.
(If there is no battery on wire k, the corresponding entry is bk = 0.) The remaining two
equations are as before, so y = C v are the currents in the wires, and, in the absence of
external current sources, Kirchhoffs Current Law implies AT y = 0. Using the modified
formula (6.36) for the voltages, these combine into the following equilibrium system
K ? u = AT C A u = AT C b.
1/12/04
206
(6.37)
c 2003
Peter J. Olver
R 12
u7
R7
u8
R 10
R9
u3
R6
u4
R2
R4
R8
u6
R3
u1
Figure 6.4.
R 11
u5
R5
R1
u2
Thus, interestingly, the voltage potentials satisfy the normal weighted least squares equations (4.54) corresponding to the system A u = b, with weights given by the conductances
in the individual wires in the circuit. It is a remarkable fact that Nature solves a least
squares problem in order to make the weighted norm of the voltages v as small as possible.
Furthermore, the batteries have exactly the same effect on the voltage potentials as if
we imposed the current source vector
f = AT C b.
(6.38)
Namely, the effect of the battery of voltage bk on wire k is the exactly the same as introducing an additional current sources of ck bk at the starting node and ck bk at the ending
node. Note that the induced current vector f rng K continues to satisfy the network
constraint (6.32). Vice versa, a given system of current sources f has the same effect as
any collection of batteries b that satisfies (6.38).
Unlike a current source, a circuit with a battery always admits a solution for the voltage potentials and currents. Although the currents are uniquely determined, the voltage
potentials are not. As before, to eliminate the ambiguity, we can ground one of the nodes
and use the reduced incidence matrix A? and reduced current source vector f ? obtained
by eliminating the column/entry corresponding to the grounded node.
Example 6.4. Consider an electrical network running along the sides of a cube,
where each wire contains a 2 ohm resistor and there is a 9 volt battery source on one wire.
The problem is to determine how much current flows through the wire directly opposite the
battery. Orienting the wires and numbering them as indicated in Figure 6.4, the incidence
1/12/04
207
c 2003
Peter J. Olver
matrix is
1
1
0
A=
0
0
0
1
0
0
1
1
0
0
0
0
0
0
0
0
1
0
0
0
1
1
0
0
0
0
0
0
0
1
0
0
0
0
1
1
0
0
0
0
0
0
1
0
1
0
0
0
1
0
0
0
0
0
0
1
0
0
1
0
0
1
0
0
0
0
0
0
0
1
0
1
0
0
1
0
0
0
.
0
1
1
We connect the battery along wire #1 and measure the resulting current along wire #12.
To avoid the ambiguity in the voltage potentials, we ground the last node and erase the
final column from A to obtain the reduced incidence matrix A? . Since the resistance
matrix R has all 2s along the diagonal, the conductance matrix is C = 21 I . Therefore the
network resistivity matrix is
3 1 1 1 0
0
0
0
0 1 1 0
1 3
1 0
3
0 1 0 1
?
T
?
?
? T
?
0
3
0 1 1 .
K = (A ) CA = 12 (A ) A = 1 0
2
3
0
0
0 1 1 0
0 1 0 1 0
3
0
0
0 1 1 0
0
3
T
.
u? = (K ? )1 f ? = 3, 49 , 98 , 98 , 38 , 83 , 34
Thus, the induced currents along the sides of the cube are
15
15 15 15
3
3
3
3
3
3
3 T
y = C v = C (A? u? + b) = 15
.
8 , 16 , 16 , 16 , 16 , 4 , 16 , 4 , 16 , 16 , 16 , 8
In particular, the current on the wire that is opposite the battery is y12 = 38 , flowing in
the opposite direction to its orientation. The most current flows through the battery wire,
while wires 7, 9, 10 and 11 transmit the least current.
The Minimization Principle and the ElectricalMechanical Analogy
As with a massspring chain, the current flows in such a resistive electrical network
can be characterized by a minimization principle. The power in a wire is defined as the
1/12/04
208
c 2003
Peter J. Olver
(6.39)
where R is the resistance, c = 1/R the conductance, and we are using Ohms Law (6.20)
to relate voltage and current. Physically, the power tells us the rate at which electrical
energy is converted into heat or energy by the resistance along the wire.
Summing over all the wires in the network, the total power is the dot product
P =
m
X
yk vk = yT v = vT C v = (A u + b)T C (A u + b)
k=1
= uT AT C A u + 2 uT AT C b + bT C b.
P = p(u) =
1
2
uT K u uT f + c,
(6.40)
1
2
(u? )T K ? u? (u? )T f ? ,
for the power now has a positive definite coefficient matrix K ? > 0. The minimizer of
the power function is the solution u? to the reduced linear system (6.35). Therefore, the
network adjusts itself to minimize the power or total energy loss! Just as with mechanical
systems, Nature solves a minimization problem in an effort to conserve energy.
We have discovered the remarkable correspondence between the equilibrium equations
for electrical networks (6.10), and those of massspring chains (6.27). This Electrical
Mechanical Correspondence is summarized in the following table. In the following section,
we will see that the analogy extends to more general structures. In Chapter 15, we will
discover that it continues to apply in the continuous regime, and subsumes solid mechanics,
fluid mechanics, electrostatics, and many other physical systems in a common mathematical framework!
For alternating currencts, there is no annoying factor of 2 in the formula for the power, and
the analogy is more direct.
1/12/04
209
c 2003
Peter J. Olver
Structures
Variables
Networks
Displacements
Elongations
Spring stiffnesses
Internal Forces
External forcing
Stiffness matrix
Potential energy
u
v = Au
C
y=Cv
f = AT y
K = AT C A
p(u) = 21 uT Ku uT f
Voltages
Voltage drops
Conductivities
Currents
Current sources
Resistivity matrix
1
2 Power
Prestressed bars/springs
v = Au + b
Batteries
210
c 2003
Peter J. Olver
Figure 6.5.
Consider an unstressed bar with one end at position a1 R d and the other end at
T
position a2 R d . In d = 2 dimensions, we write ai = ( ai , bi ) , while in d = 3-dimensional
T
space ai = ( ai , bi , ci ) . The length of the bar is L = k a1 a2 k, where we use the standard
Euclidean norm to measure distance on R d throughout this section.
Suppose we move the ends of the bar a little, sending ai to bi = ai + ui and
simultaneously aj to bj = aj + uj . The unit vectors ui , uj R d indicate the respective
direction of displacement of the two ends, and we think of > 0, the magnitude of the
displacement, as small. How much has this motion stretched the bar? The length of the
displaced bar is
L + e = k bi bj k = k (ai + ui ) (aj + uj ) k = k (ai aj ) + (ui uj ) k
q
(6.41)
= k ai aj k2 + 2 (ai aj ) (ui uj ) + 2 k ui uj k2 .
The difference between the new length and the original length, namely
q
k ai aj k2 + 2 (ai aj ) (ui uj ) + 2 k ui uj k2 k ai aj k,
e =
(6.42)
(6.43)
as in Figure 6.5. In the case of small displacements of a bar, the elongation (6.42) is a
square root function of the particular form
p
g() = a2 + 2 b + 2 c2 a,
1/12/04
211
c 2003
Peter J. Olver
where
a = k ai aj k,
b = (ai aj ) (ui uj ),
c = k ui uj k,
b
are independent of . Since g(0) = 0 and g 0 (0) = , the linear approximation (6.43) has
a
the form
p
b
a2 + 2 b + 2 c 2 a
for
1.
a
In this manner, we arrive at the linear approximation to the bars elongation
e
(ai aj ) (ui uj )
= n ( ui uj ),
k ai aj k
where
n=
(ai aj )
k ai aj k
is the unit vector, k n k = 1, that points in the direction of the bar from node j to node i.
The overall small factor of was merely a device used to derive the linear approximation. It can now be safely discarded, so that the displacement of the i th node is now ui
instead of ui , and we assume k ui k is small. If bar k connects node i to node j, then its
(approximate) elongation is equal to
ek = nk (ui uj ) = nk ui nk uj ,
where
nk =
ai a j
.
k ai aj k
(6.44)
The elongation ek is the sum of two terms: the first, nk ui , is the component of the
displacement vector for node i in the direction of the unit vector nk that points along the
bar towards node i, whereas the second, nk uj , is the component of the displacement
vector for node j in the direction of the unit vector nk that points in the opposite
direction along the bar towards node j. Their sum gives the total elongation of the bar.
We assemble all the linear equations (6.44) relating nodal displacements to bar elongations in matrix form
e = A u.
(6.45)
e
u
1
e2
u
R m is the vector of elongations, while u = .2 R d n is the vector
Here e =
.
.
.
.
.
em
un
of displacements. Each ui R d is itself a column vector with d entries,andso u has a
xi
total of d n entries. For example, in the planar case d = 2, we have ui =
since each
yi
nodes displacement has both an x and y component, and so
x
1
y1
u
1
x2
u2
= y2 R 2 n .
u =
.
.
.
.
.
.
un
xn
yn
1/12/04
212
c 2003
Peter J. Olver
Figure 6.6.
R3 n.
The incidence matrix A connecting the displacements and elongations will be of size
m d n. The k th row of A will have (at most) 2 d nonzero entries. The entries in the d slots
corresponding to node i will be the components of the (transposed) unit bar vector n Tk
pointing towards node i, as given in (6.44), while the entries in the d slots corresponding to
node j will be the components of its negative nTk , which is the unit bar vector pointing
towards node j. All other entries are 0. The constructions are best appreciated by working
through an explicit example.
Example 6.5. Consider the planar structure pictured in Figure 6.6. The four nodes
are at positions
a1 = (0, 0)T ,
a2 = (1, 1)T ,
a3 = (3, 1)T ,
a4 = (4, 0)T ,
so the two side bars are at 45 angles and the center bar is horizontal. Applying our
algorithm, the associated incidence matrix is
12 12 12 12
0
0 0
0
(6.46)
A= 0
1
0 0
0 .
0 1 0
1
1
1
1
0
0 0
0 2 2 2 2
The three rows of A refer to the three bars in our structure. The columns come in pairs,
as indicated by the vertical lines in the matrix: the first two columns refer to the x and
y displacements of the first node; the third and fourth columns refer to the second node,
and so on. The first two entries of the first row of A indicate the unit vector
T
a1 a 2
= 12 , 12
n1 =
k a1 a2 k
that points along the first bar towards the first node, while the third and fourth entries
have the opposite signs, and form the unit vector
T
a2 a 1
= 12 , 12
n1 =
k a2 a1 k
1/12/04
213
c 2003
Peter J. Olver
along the same bar that points in the opposite direction towards the second node. The
remaining entries are zero because the first bar only connects the first two nodes. Similarly,
the unit vector along the second bar pointing towards node 2 is
n2 =
a2 a 3
T
= ( 1, 0 ) ,
k a2 a3 k
and this gives the third and fourth entries of the second row of A; the fifth and sixth entries
are their negatives, corresponding to the unit vector n2 pointing towards node 3. The
last row is constructed from the unit vector in the direction of bar #3 in the same fashion.
Remark : Interestingly, the incidence matrix for a structure only depends on the directions of the bars and not their lengths. This is analogous to the fact that the incidence
matrix for an electrical network only depends on the connectivity properties of the wires
and not on their overall lengths. Indeed, one can regard the incidence matrix for a structure
as a kind of ddimensional generalization of the incidence matrix for a directed graph.
The next phase of our procedure is to introduce the constitutive relations for the bars
in our structure that determine their internal forces or stresses. As we remarked at the
beginning of the section, each bar is viewed as a very strong spring, subject to a linear
Hookes law equation
yk = c k e k
(6.47)
that relates its elongation ek to its internal force yk . The bar stiffness ck > 0 is a positive
scalar, and so yk > 0 if the bar is in tension, while yk < 0 if the bar is compressed. In this
approximation, there is no bending and the bars will only experience external forcing at
the nodes. We write (6.47) in matrix form
y = C e,
where C = diag (c1 , . . . , cm ) > 0 is a diagonal, positive definite matrix.
Finally, we need to balance the forces at each node in order to achieve equilibrium.
If bar k terminates at node i, then it exerts a force yk nk on the node, where nk is
the unit vector pointing towards the node in the direction of the bar, as in (6.44). The
minus sign comes from physics: if the bar is under tension, so yk > 0, then it is trying to
contract back to its unstressed state, and so will pull the node towards it in the opposite
direction to nk while a bar in compression will push the node away. In addition, we
may have an externally applied force vector, denoted by f i , on node i, which might be
some combination of gravity, weights, mechanical forces, and so on. (In this admittedly
simplified model, external forces only act on the nodes.) Force balance at equilibrium
requires that the sum of all the forces, external and internal, at each node cancel; thus,
X
X
fi +
( yk nk ) = 0,
or
yk n k = f i ,
k
where the sum is over all the bars that are attached to node i. The matrix form of the
force balance equations is (and this should no longer come as a surprise)
f = AT y,
1/12/04
214
(6.48)
c 2003
Peter J. Olver
Figure 6.7.
A Triangular Structure.
T
f = AT y,
y = C e,
where
K = AT C A.
(6.49)
The stiffness matrix K is a positive (semi-)definite Gram matrix (3.51) associated with
the weighted inner product on the space of elongations prescribed by the diagonal matrix
C.
As we know, the stiffness matrix for our structure will be positive definite, K > 0, if
and only if the incidence matrix has trivial kernel: ker A = {0}. The preceding example,
and indeed all of these constructed so far, will not have this property, for the same reason
as in an electrical network because we have not tied down (or grounded) our structure
anywhere. In essence, we are considering a structure floating in outer space, which is free
to move around without changing its shape. As we will see, each possible rigid motion
of the structure will correspond to an element of the kernel of its incidence matrix, and
thereby prevent positive definiteness of the structure matrix K.
Example 6.6. Consider a planar space station in the shape of a unit equilateral
triangle, as in Figure 6.7. Placing the nodes at positions
a1 =
1
2,
3
2
a2 = ( 1, 0 ) ,
a3 = ( 0, 0 ) ,
1
3
1 3
0
0
2
2
2
2
3
3 1
A = 12
0
0 ,
2
2 2
0 1
0
0
0 1
1/12/04
215
c 2003
Peter J. Olver
Figure 6.8.
whose rows are indexed by the bars, and whose columns are indexed in pairs by the three
nodes. The kernel of A is three-dimensional, with basis
0
1
0
z2 =
,
1
0
1
1
0
1
z1 =
,
0
1
0
3
2
1
2
.
z3 =
(6.50)
These three displacement vectors correspond to three different planar rigid motions: the
first two correspond to translations, and the third to a rotation.
The translations are easy to discern. Translating the space station in a horizontal direction means that we move all three nodes the same amount, and so the displacements are
u1 = u2 = u3 = a for some fixed vector a. In particular, a rigid unit horizontal translation
T
has a = e1 = ( 1, 0 ) , and corresponds to the first kernel basis vector. Similarly, a unit
T
vertical translation of all three nodes corresponds to a = e2 = ( 0, 1 ) , and corresponds to
the second kernel basis vector. Any other translation is a linear combination of these two.
Translations do not alter the lengths of any of the bars, and so do not induce any stress
in the structure.
The rotations are a little more subtle, owing to the linear approximation that we used
to compute the elongations. Referring to Figure 6.8, rotating the space station through a
T
small angle around the node a3 = ( 0, 0 ) will move the other two nodes to positions
b1 =
1/12/04
1
2
1
2
cos
sin +
3
sin
2
3
2 cos
b2 =
216
!
cos
,
sin
!
0
b3 =
.
0
c 2003
(6.51)
Peter J. Olver
1
2
1
2
u1 = b 1 a 1 =
u2 = b 2 a 2 =
(cos 1)
sin +
3
2
!
cos 1
,
sin
3
2
sin
(cos 1)
!
0
u3 = b3 a3 =
,
0
(6.52)
do not combine into a vector that belongs to ker A. The problem is that, under a rotation,
the nodes move along circles, while the kernel displacements u = z ker A correspond
to straight line motion! In order to maintain consistency, we must adopt a similar linear
approximation of the nonlinear circular motion of the nodes. Thus, we replace the nonlinear
displacements uj () in (6.52) by their linear tangent approximations u0j (0), and so
u1
3
2
1
2
u2
!
0
1
u3 =
!
0
0
1
2
0 1 0 0
= z3
that moves the space station in the direction of the third element of the kernel of the
incidence matrix! Thus, as claimed, z3 represents the linear approximation to a rigid
rotation around the first node.
Remarkably, the rotations around the other two nodes, although distinct nonlinear
motions, can be linearly approximated by particular combinations of the three kernel basis
elements z1 , z2 , z3 , and so already appear in our description of ker A. For example, the
displacement vector
u=
3
2
z1 +
1
2
z2 z 3 = 0 0
3
2
12
3
2
1
2
(6.53)
represents the linear approximation to a rigid rotation around the first node. We conclude
that the three-dimensional kernel of the incidence matrix represents the sum total of all
possible rigid motions of the space station, or, more correctly, their linear approximations.
Which types of forces will maintain the space station in equilibrium? This will happen
if and only if we can solve the force balance equations
AT y = f
(6.54)
for the internal forces y. The Fredholm Alternative Theorem 5.51 implies that the system
(6.54) has a solution if and only if f is orthogonal to coker A T = ker A. Therefore, f =
1/12/04
217
c 2003
Peter J. Olver
z2 f = g1 + g2 + g3 = 0,
z3 f =
3
2
f1 +
1
2
g1 + g3 = 0.
The first requires that there is no net horizontal force on the space station. The second
requires no net vertical force. The last constraint requires that the moment of the forces
around the first node vanishes. The vanishing of the force moments around each of the
other two nodes follows, since the associated kernel vectors can be expressed as linear
combinations of the three basis elements. The physical requirements are clear. If there is
a net horizontal or vertical force, the space station will rigidly translate in that direction;
if there is a non-zero force moment, the station will rigidly rotate. In any event, unless
the force balance equations are satisfied, the space station cannot remain in equilibrium.
A freely floating space station is in an unstable configuration and can easily be set into
motion.
Since there are three independent rigid motions, we need to impose three constraints
on the structure in order to stabilize it. Grounding one of the nodes, i.e., preventing it
from moving by attaching it to a fixed support, will serve to eliminate the two translational
instabilities. For example, setting u3 = 0 has the effect of fixing the third node of the space
station to a support. With this specification, we can eliminate the variables associated with
that node, and thereby delete the corresponding columns of the incidence matrix leaving
the reduced incidence matrix
1
3
0
0
2
2
1
3
3
A? = 12
.
2
2
2
0
3
2
1
2
0 1
which corresponds to (the linear approximation of) the rotations around the fixed node. To
prevent the structure from rotating, we can also fix the second node, by further requiring
u2 = 0. This allows us to eliminate the third and fourth columns of the incidence matrix
and the resulting doubly reduced incidence matrix
1
3
2
A?? =
1
2
2
3
2
Now ker A?? = {0} is trivial, and hence the corresponding reduced stiffness matrix
1
3
!
!
2
2
1
1
1
0
0
2
3 =
2
K ?? = (A?? )T A?? = 2
12
2
3
3
0
0 23
2
2
0
0
1/12/04
218
c 2003
Peter J. Olver
Figure 6.9.
is positive definite. The space station with two fixed nodes is a stable structure, which can
now support an arbitrary external forcing. (Forces on the fixed nodes now have no effect
since they are no longer allowed to move.)
In general, a planar structure without any fixed nodes will have at least a threedimensional kernel, corresponding to the rigid planar motions of translations and (linear
approximations to) rotations. To stabilize the structure, one must fix two (non-coincident)
nodes. A three-dimensional structure that is not tied to any fixed supports will admit 6
independent rigid motions in its kernel. Three of these correspond to rigid translations in
the three coordinate directions, while the other three correspond to linear approximations
to the rigid rotations around the three coordinate axes. To eliminate the rigid motion
instabilities of the structure, one needs to fix three non-collinear nodes; details can be
found in the exercises.
Even after attaching a sufficient number of nodes to fixed supports so as to eliminate
all possible rigid motions, there may still remain nonzero vectors in the kernel of the
reduced incidence matrix of the structure. These indicate additional instabilities in which
the shape of the structure can deform without any applied force. Such non-rigid motions
are known as mechanisms of the structure. Since a mechanism moves the nodes without
elongating any of the bars, it does not induce any internal forces. A structure that admits
a mechanism is unstable even very tiny external forces may provoke a large motion.
Example 6.7. Consider the three bar structure of Example 6.5, but now with its
two ends attached to supports, as pictured in Figure 6.9. Since we are fixing nodes 1 and
4, setting u1 = u4 = 0, we should remove the first two and last column pairs from the
incidence matrix (6.46), leading to the reduced incidence matrix
1
0
0
2
A? = 1 0 1
0 .
1
1
0
0 2 2
The structure no longer admits any rigid motions. However, the kernel of A ? is oneT
dimensional, spanned by reduced displacement vector z? = ( 1 1 1 1 ) , which corresponds to the unstable mechanism that displaces the second node in the direction
T
T
u2 = ( 1 1 ) and the third node in the direction u3 = ( 1 1 ) . Geometrically, then,
1/12/04
219
c 2003
Peter J. Olver
Figure 6.10.
z? represents the displacement where node 2 moves down and to the left at a 45 angle,
while node 3 moves simultaneously up and to the left at a 45 angle. This mechanism does
not alter the lengths of the three bars (at least in our linear approximation regime) and
so requires no net force to be set into motion.
As with the rigid motions of the space station, an external forcing vector f ? will
maintain equilibrium only when it lies in the corange of A? , and hence must be orthogonal
T
to all the mechanisms in ker A? = (corng A? ) . Thus, the nodal forces f 2 = ( f2 , g2 ) and
T
f 3 = ( f3 , g3 ) must satisfy the balance law
z? f ? = f2 g2 + f3 + g3 = 0.
If this fails, the equilibrium equation has no solution, and the structure will move. For
example, a uniform horizontal force f2 = f3 = 1, g2 = g3 = 0, will induce the mechanism,
whereas a uniform vertical force, f2 = f3 = 0, g2 = g3 = 1, will maintain equilibrium. In
the latter case, the solution to the equilibrium equations
3
1
1
0
2
2
1
1
0
0
2
? ?
?
?
? T ?
2
K u =f ,
where
K = (A ) A =
,
3
1
1 0
2
2
0
21
1
2
u? = ( 3 5 2 0 ) + t ( 1 1 1 1 ) .
In other words, the equilibrium position is not unique, since the structure can still be
displaced in the direction of the unstable mechanism while maintaining the overall force
balance. On the other hand, the elongations and internal forces
T
y = e = A ? u? = ( 2 1 2 ) ,
are well-defined, indicating that, under our stabilizing uniform vertical force, all three bars
are compressed, with the two diagonal bars experiencing 41.4% more compression than
the horizontal bar.
1/12/04
220
c 2003
Peter J. Olver
Remark : Just like the rigid rotations, the mechanisms described here are linear approximations to the actual nonlinear motions. In a physical structure, the vertices will
move along curves whose tangents at the initial configuration are the directions indicated
by the mechanism vector. In certain cases, a structure can admit a linear mechanism, but
one that cannot be physically realized due to the nonlinear constraints imposed by the
geometrical configurations of the bars. Nevertheless, such a structure is at best borderline
stable, and should not be used in any real-world applications that rely on stability of the
structure.
We can always stabilize a structure by first fixing nodes to eliminate rigid motions,
and then adding in extra bars to prevent mechanisms. In the preceding example, suppose
we attach an additional bar connecting nodes 2 and 4, leading to the reinforced structure
in Figure 6.10. The revised incidence matrix is
A=
12 12
0 1
0
0
0 310
12
0
0
0
1
2
0
0
1
10
0
1
12
0
0
1
2
0
1
2
3
10
12
110
and is obtained from (6.46) by appending another row representing the added bar. When
nodes 1 and 4 are fixed, the reduced incidence matrix
A =
1
2
1
2
1
0
0
0
1
2
0
3
10
1
10
12
0
has trivial kernel, ker A? = {0}, and hence the structure is stable. It admits no mechanisms, and can support any configuration of forces (within reason very large forces
will take us outside the linear regime described by the model, and the structure may be
crushed!).
This particular case is statically determinate owing to the fact that the incidence
matrix is square and nonsingular, which implies that one can solve the force balance
equations (6.54) directly for the internal forces. For instance, a uniform downwards vertical
force f2 = f3 = 0, g2 = g3 = 1, e.g., gravity, will produce the internal forces
y1 =
2,
y2 = 1,
y3 =
2,
y4 = 0
indicating that bars 1, 2 and 3 are experiencing compression, while, interestingly, the
reinforcing bar 4 remains unchanged in length and hence experiences no internal force.
Assuming the bars are all of the same material, and taking the elastic constant to be 1, so
1/12/04
221
c 2003
Peter J. Olver
12
5
1
5
K = (A ) A =
1
?
? T
1
5
3
5
3
2
0
.
1
2
1
2
1
2
so
u2 = 12 32 ,
u? = 12 32 32 72 ,
T
u3 = 23 27 .
give the displacements of the two nodes under the applied force. Both are moving down
and to the left, with node 3 moving relatively farther owing to its lack of reinforcement.
Suppose we reinforce the structure yet further by adding in a bar connecting nodes 1
and 3. The resulting reduced incidence matrix
1
1
0
0
2
2
1
0
1
0
1
1
0
0
A =
2
2
1
0
0
10
10
0
3
10
1
10
again has trivial kernel, ker A? = {0}, and hence the structure is stable. Indeed, adding
in extra bars to a stable structure cannot cause it to lose stability. (In matrix language,
appending additional rows to a matrix cannot increase the size of its kernel, cf. Exercise .)
Since the incidence matrix is rectangular, the structure is now statically indeterminate and
we cannot determine the internal forces without first solving the full equilibrium equations
(6.49) for the displacements. The stiffness matrix is
12
5
1
5
K ? = (A? )T A? =
1
0
1
5
3
5
12
5
51
0
.
1
5
3
5
1
10
1
17
17
10 10 10
so that the free nodes now move symmetrically down and towards the center of the structure. The internal forces on the bars are
q
q
4
4
2
1
y1 = 5 2,
y3 = 5 2,
y5 = 25 .
y2 = 5 ,
y4 = 5 ,
All five bars are now experiencing compression, the two outside bars being the most
stressed, the reinforcing bars slightly more than half that, while the center bar feels less
1/12/04
222
c 2003
Peter J. Olver
A Swing Set.
Figure 6.11.
than a fifth the stress that the outside bars experience. This simple computation should
already indicate to the practicing construction engineer which of the bars in our structure
are more likely to collapse under an applied external force. By comparison, the reader can
investigate what happens under a uniform horizontal force.
Summarizing our discussion, we have established the following fundamental result
characterizing the stability and equilibrium of structures.
Theorem 6.8. A structure is stable, and will maintain an equilibrium under arbitrary external forcing, if and only if its reduced incidence matrix A ? has linearly independent columns, or, equivalently, ker A? = {0}. More generally, an external force f ?
on a structure will maintain equilibrium if and only if f ? (ker A? ) , which means that
the external force is orthogonal to all rigid motions and all mechanisms admitted by the
structure.
Example 6.9. A swing set is to be constructed, consisting of two diagonal supports
at each end and a horizontal cross bar. Is this configuration stable, i.e., can a child swing
on it without it collapsing? The movable joints are at positions
T
a1 = ( 1, 1, 3 ) ,
a2 = ( 4, 1, 3 ) .
a3 = ( 0, 0, 0 ) ,
a4 = ( 0, 2, 0 ) ,
a5 = ( 5, 0, 0 ) ,
a6 = ( 5, 2, 0 ) .
The reduced incidence matrix for the structure is calculated in the usual manner:
1
11
1
11
?
A = 1
0
0
1
11
111
3
11
3
11
0
0
0
0
0
1
0
0
0
0
1
11
111
3
11
3
11
1
11
1
11
For instance, the first three entries contained in the first row refer to the unit vector
a1 a 3
in the direction of the bar going from a3 to a1 . Suppose the three bars
n1 =
k a1 a3 k
1/12/04
223
c 2003
Peter J. Olver
have the same stiffness, and so (taking c1 = = c5 = 1) the reduced stiffness matrix for
the structure is
13
11
?
? T ?
K = (A ) A = 11
1
0
0
6
11
2
11
18
11
13
11
2
11
6
11
6
11
0
18
11
z? = ( 3 0 1 3 0 1 ) ,
which indicates a mechanism that causes the swing set to collapse: the first node moves
up and to the right, while the second node moves down and to the right, the horizontal
motion being three times as large as the vertical. The structure can support forces f 1 =
T
T
( f1 , g1 , h1 ) , f 2 = ( f2 , g2 , h2 ) , if and only if the combined force vector f ? is orthogonal
to the mechanism vector z? , and so
3 (f1 + f2 ) h1 + h2 = 0.
Thus, as long as the net horizontal force is in the y direction and the vertical forces on the
two joints are equal, the structure will maintain its shape. Otherwise, a reinforcing bar,
say from a1 to a6 (although this will interfere with the swinging!) or a pair of bars from
the nodes to two new ground supports, will be required to completely stabilize the swing.
T
0 43 11
0 0
u? = 13
6
6
and the general solution u = u? + t z? is obtained by adding in an arbitrary element of the
kernel. The resulting forces/elongations are uniquely determined,
y = e = A ? u = A ? u? =
11
6
11
6
31
11
6
11
6
so that every bar is compressed, the middle one experiencing slightly more than half the
stress of the outer supports.
If we stabilize the structure by adding in two vertical supports at the nodes, then the
1/12/04
224
c 2003
Peter J. Olver
?
A =
11
1
11
1
11
111
3
11
3
11
1
11
1
11
13
6
0 11
11
2
0
0
11
6
29
0 11
K ? = 11
1 0
0
0
0
0
0
1
11
111
3
11
11
0
1
1
0
0
0
0
0
0
0
0
6
13
0
11
11
2
0
0
11
29
6
0
11
11
is only slightly different than before, but this is enough to make it positive definite, K ? > 0,
and so allow arbitrary external forcing without collapse. Under the uniform vertical force,
the internal forces are
T
?
11
11
11
11
1
2
2
y = e = A u = 10 10 5 10 10 5 5
.
Note the overall reductions in stress in the original bars; the two new vertical bars are now
experiencing the largest amount of stress.
1/12/04
225
c 2003
Peter J. Olver
Chapter 7
Linear Functions and Linear Systems
We began this book by learning how to systematically solve systems of linear algebraic
equations. This elementary problem formed our launching pad for developing the fundamentals of linear algebra. In its initial form, matrices and vectors were the primary focus
of our study, but the theory was developed in a sufficiently general and abstract form that
it can be immediately applied to many other important situations particularly infinitedimensional function spaces. Indeed, applied mathematics deals, not just with algebraic
equations, but also differential equations, difference equations, integral equations, integrodifferential equations, differential delay equations, control systems, and many, many other
types of systems not all of which, unfortunately, can be adequately developed in this
introductory text. It is now time to assemble what we have learned about linear matrix
systems and place the results in a suitably general framework that will lead to insight into
the fundamental principles that govern completely general linear problems.
The most basic underlying object of linear systems theory is the vector space, and
we have already seen that the elements of vector spaces can be vectors, or functions, or
even vector-valued functions. The seminal ideas of span, linear independence, basis and
dimension are equally applicable and equally vital in more general contexts, particularly
function spaces. Just as vectors in Euclidean space are prototypes of general elements
of vector spaces, matrices are also prototypes of much more general objects, known as
linear functions. Linear functions are also known as linear maps or linear operators,
particularly when we deal with function spaces, and include linear differential operators,
linear integral operators, evaluation of a function or its derivative at a point, and many
other basic operations on functions. Generalized functions, such as the delta function to
be introduced in Chapter 11, are, in fact, properly formulated as linear operators on a
suitable space of functions. As such, linear maps form the simplest class of functions on
vector spaces. Nonlinear functions can often be closely approximated by linear functions,
generalizing the calculus approximation of a function by its tangent line. As a result, linear
functions must be thoroughly understood before any serious progress can be made in the
vastly more complicated nonlinear world.
In geometry, linear functions are interpreted as linear transformations of space (or
space-time), and, as such, lie at the foundations of motion of bodies, computer graphics and games, and the mathematical formulation of symmetry. Most basic geometrical
transformations, including rotations, scalings, reflections, projections, shears and so on,
are governed by linear transformations. However, translations require a slight generalization, known as an affine function. Linear operators on infinite-dimensional function
spaces are the basic objects of quantum mechanics. Each quantum mechanical observable
1/12/04
225
c 2003
Peter J. Olver
L[ c v ] = c L[ v ],
(7.1)
We will call V the domain space and W the target space for L.
In particular, setting c = 0 in the second condition implies that a linear function
always maps the zero element in V to the zero element in W , so
L[ 0 ] = 0.
(7.2)
We can readily combine the two defining conditions into a single rule
    L[ c v + d w ] = c L[ v ] + d L[ w ]   for all   v, w ∈ V,   c, d ∈ R,    (7.3)
that characterizes linearity of a function L. An easy induction proves that a linear function
respects linear combinations, so
L[ c1 v1 + · · · + ck vk ] = c1 L[ v1 ] + · · · + ck L[ vk ]
(7.4)
The term target is used here to avoid later confusion with the range of L, which, in general,
is a subspace of the target vector space W .
Every scalar linear function L: R → R has the form L[ x ] = a x, where a = L(1).
Therefore, the only scalar linear functions are those whose graph is a straight line passing
through the origin.
Warning: Even though the graph of the function
y = a x + b,
(7.5)
is a straight line, this is not a linear function unless b = 0 so the line goes through
the origin. The correct name for a function of the form (7.5) is an affine function; see
Definition 7.20 below.
Example 7.4. Let V = R^n and W = R^m. Let A be an m × n matrix. Then the function L[ v ] = A v given by matrix multiplication is easily seen to be a linear function.
Indeed, the requirements (7.1) reduce to the basic distributivity and scalar multiplication
properties of matrix multiplication:
    A(v + w) = A v + A w,    A(c v) = c A v,    for all   v, w ∈ R^n,   c ∈ R.
In fact, every linear function between two Euclidean spaces has this form.
Theorem 7.5. Every linear function L: R^n → R^m is given by matrix multiplication, L[ v ] = A v, where A is an m × n matrix.
Figure 7.1.
Warning: Pay attention to the order of m and n. While A has size m × n, the linear function L goes from R^n to R^m.
Proof : The key idea is to look at what the linear function does to the basis vectors.
Let e1, . . . , en be the standard basis of R^n, and let ê1, . . . , êm be the standard basis of R^m. (We temporarily place hats on the latter to avoid confusing the two.) Since L[ ej ] ∈ R^m, we can write it as a linear combination of the latter basis vectors:
    L[ ej ] = aj = ( a1j, a2j, . . . , amj )^T = a1j ê1 + a2j ê2 + · · · + amj êm,    j = 1, . . . , n.    (7.6)
Let us construct the m × n matrix
    A = ( a1 a2 . . . an ) = [ a11  a12  . . .  a1n ; a21  a22  . . .  a2n ; . . . ; am1  am2  . . .  amn ]    (7.7)
whose columns are the image vectors (7.6). Using (7.4), we then compute the effect of L
on a general vector v = ( v1, v2, . . . , vn )^T ∈ R^n:
    L[ v ] = L[ v1 e1 + · · · + vn en ] = v1 L[ e1 ] + · · · + vn L[ en ] = v1 a1 + · · · + vn an = A v.
The final equality follows from our basic formula (2.14) connecting matrix multiplication
and linear combinations. We conclude that the vector L[ v ] coincides with the vector A v
obtained by multiplying v by the coefficient matrix A.
Q.E.D.
The proof of Theorem 7.5 shows us how to construct the matrix representative of
a given linear function L: R n R m . We merely assemble the image column vectors
a1 = L[ e1 ], . . . , an = L[ en ] into an m n matrix A.
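As a small computational sketch of this recipe (assuming the Python NumPy library; the helper name matrix_of is hypothetical), one can assemble the matrix of any linear map on Euclidean space by applying it to the standard basis vectors and using the images as columns:

    import numpy as np

    def matrix_of(L, n, m):
        """Assemble the m x n matrix of a linear map L: R^n -> R^m by
        applying L to the standard basis e_1, ..., e_n, as in (7.6)-(7.7)."""
        A = np.zeros((m, n))
        for j in range(n):
            e_j = np.zeros(n)
            e_j[j] = 1.0
            A[:, j] = L(e_j)          # the j-th column is L[e_j]
        return A

    # sample map L(v) = (v1 + 2 v2, 3 v2) on R^2
    L = lambda v: np.array([v[0] + 2*v[1], 3*v[1]])
    print(matrix_of(L, 2, 2))         # [[1. 2.] [0. 3.]]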
Example 7.6. In the case of a function from R n to itself, the two basic linearity
conditions (7.1) have a simple geometrical interpretation. Since vector addition is the
Figure 7.2.  Linearity of Rotations.
same as completing the parallelogram indicated in Figure 7.1, the first linearity condition
requires that L map parallelograms to parallelograms. The second linearity condition says
that if we stretch a vector by a factor c, then its image under L must also be stretched by
the same amount. Thus, one can often detect linearity by simply looking at the geometry
of the function.
As a specific example, consider the function R_θ: R² → R² that rotates the vectors in the plane around the origin by a specified angle θ. This geometric transformation clearly preserves parallelograms, as well as stretching, see Figure 7.2, and hence defines a
linear function. In order to find its matrix representative, we need to find out where the
basis vectors e1 , e2 are mapped. Referring to Figure 7.3, we have
    R_θ[ e1 ] = cos θ e1 + sin θ e2 = ( cos θ, sin θ )^T,    R_θ[ e2 ] = − sin θ e1 + cos θ e2 = ( − sin θ, cos θ )^T.
According to the general recipe (7.7), we assemble these two column vectors to obtain the matrix form of the rotation transformation, and so
    R_θ[ v ] = A_θ v,   where   A_θ = [ cos θ  − sin θ ; sin θ  cos θ ].    (7.8)
Therefore, rotating a vector v = ( x, y )^T through angle θ gives the vector
    v̂ = R_θ[ v ] = A_θ v = [ cos θ  − sin θ ; sin θ  cos θ ] ( x, y )^T = ( x cos θ − y sin θ, x sin θ + y cos θ )^T,
with coordinates
    x̂ = x cos θ − y sin θ,    ŷ = x sin θ + y cos θ.
These formulae can be proved directly, but, in fact, are a consequence of the underlying
linearity of rotations.
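A short numerical sketch of the rotation matrix (7.8), assuming NumPy:

    import numpy as np

    def rotation(theta):
        """The 2 x 2 rotation matrix A_theta of (7.8)."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s], [s, c]])

    v = np.array([1.0, 0.0])
    print(rotation(np.pi / 2) @ v)     # approximately [0, 1]: e1 rotated by 90 degrees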
Figure 7.3.  Rotation in R².
Linear Operators
So far, we have concentrated on linear functions on Euclidean space, and discovered
that they are all represented by matrices. For function spaces, there is a much wider
variety of linear operators available, and a complete classification is out of the question.
Let us look at some of the main representative examples that arise in applications.
Example 7.7. (i) Recall that C^0[ a, b ] denotes the vector space consisting of all continuous functions on the interval [ a, b ]. Evaluation of the function at a point, L[ f ] = f(x₀), defines a linear operator L: C^0[ a, b ] → R, because
L[ c f + d g ] = c f (x0 ) + d g(x0 ) = c L[ f ] + d L[ g ]
for any functions f, g C0 [ a, b ] and scalars (constants) c, d.
(ii ) Another real-valued linear function is the integration operator
    I[ f ] = ∫_a^b f(x) dx.    (7.9)
Linearity of I is an immediate consequence of the basic properties of integration:
    ∫_a^b [ c f(x) + d g(x) ] dx = c ∫_a^b f(x) dx + d ∫_a^b g(x) dx,
which is valid for arbitrary integrable (which includes continuous) functions f, g and scalars c, d.
(iii) We have already seen that multiplication of functions by a fixed scalar, f(x) ↦ c f(x), defines a linear map M_c: C^0[ a, b ] → C^0[ a, b ]; the particular case c = 1 reduces to the identity transformation I = M_1. More generally, if a(x) ∈ C^0[ a, b ] is a fixed continuous function, then the operation M_a[ f(x) ] = a(x) f(x) of multiplication by a also defines a linear transformation M_a: C^0[ a, b ] → C^0[ a, b ].
(iv ) Another important linear transformation is the indefinite integral
    J[ f ] = ∫_a^x f(y) dy.    (7.10)
The space of linear functions on R² can be identified with the space M_{2×2} of 2 × 2 matrices A = [ a  b ; c  d ]. The standard basis of M_{2×2} consists of the 4 = 2 · 2 matrices
    E11 = [ 1  0 ; 0  0 ],    E12 = [ 0  1 ; 0  0 ],    E21 = [ 0  0 ; 1  0 ],    E22 = [ 0  0 ; 0  1 ].
Indeed, we can uniquely write any other matrix
    A = [ a  b ; c  d ] = a E11 + b E12 + c E21 + d E22,
A particularly important case is when the target space of the linear functions is R.
Definition 7.9. The dual space to a vector space V is defined as the vector space V^∗ = L( V, R ) consisting of all real-valued linear functions L: V → R.
If V = R n , then every linear function L: R n R is given by multiplication by a 1 n
matrix, i.e., a row vector. Explicitly,
    L[ v ] = a v = a1 v1 + · · · + an vn,   where   a = ( a1 a2 . . . an ),   v = ( v1, v2, . . . , vn )^T.
Therefore, we can identify the dual space (R n ) with the space of row vectors with n
entries. In light of this observation, the distinction between row vectors and column
vectors is now seen to be much more sophisticated than mere semantics or notation. Row
vectors should be viewed as real-valued linear functions the dual objects to column
vectors.
The standard dual basis ε1, . . . , εn of (R^n)^∗ consists of the standard row basis vectors, namely εj is the row vector with 1 in the jth slot and zeros elsewhere. The jth dual basis element defines the linear function
    εj[ v ] = εj v = vj,
that picks off the jth coordinate of v with respect to the original basis e1, . . . , en. Thus, the dimension of V = R^n and its dual (R^n)^∗ are both equal to n.
An inner product structure provides a mechanism for identifying a vector space and
its dual. However, it should be borne in mind that this identification will depend upon
the choice of inner product.
Theorem 7.10. Let V be a finite-dimensional real inner product space. Then every
linear function L: V → R is given by an inner product
    L[ v ] = ⟨ a ; v ⟩    (7.12)
with a fixed vector a ∈ V.
    Σ_{i,j=1}^{n} a_i x_j ⟨ u_i ; u_j ⟩ = a_1 x_1 + · · · + a_n x_n.    Q.E.D.
Remark : In the particular case when V = R n is endowed with the standard dot
product, then Theorem 7.10 identifies a row vector representing a linear function with
the corresponding column vector obtained by transposition, a ↦ a^T. Thus, the naïve
identification of a row and a column vector is, in fact, an indication of a much more subtle
phenomenon that relies on the identification of R n with its dual based on the Euclidean
inner product. Alternative inner products will lead to alternative, more complicated,
identifications of row and column vectors; see Exercise for details.
Important: Theorem 7.10 is not true if V is infinite-dimensional. This fact will have
important repercussions for the analysis of the differential equations of continuum mechanics, which will lead us immediately into the much deeper waters of generalized function
theory. Details will be deferred until Section 11.2.
Composition of Linear Functions
Besides adding and multiplying by scalars, one can also compose linear functions.
Lemma 7.11. Let V, W, Z be vector spaces. If L: V → W and M: W → Z are linear functions, then the composite function M ∘ L: V → Z, defined by (M ∘ L)[ v ] = M[ L[ v ] ], is linear.
Proof: This is straightforward:
    (M ∘ L)[ c v + d w ] = M[ L[ c v + d w ] ] = M[ c L[ v ] + d L[ w ] ]
        = c M[ L[ v ] ] + d M[ L[ w ] ] = c (M ∘ L)[ v ] + d (M ∘ L)[ w ],
where we used, successively, the linearity of L and then of M .
Q.E.D.
For example, the composition of two planar rotations is again a rotation, R_θ ∘ R_φ = R_{θ+φ}; in matrix form,
    [ cos θ  − sin θ ; sin θ  cos θ ] [ cos φ  − sin φ ; sin φ  cos φ ] = [ cos(θ + φ)  − sin(θ + φ) ; sin(θ + φ)  cos(θ + φ) ].
Multiplying out the left hand side, we deduce the well-known trigonometric addition formulae
    cos(θ + φ) = cos θ cos φ − sin θ sin φ,    sin(θ + φ) = sin θ cos φ + cos θ sin φ.
In fact, this computation constitutes a bona fide proof of these two identities!
Example 7.13. One can build up more sophisticated linear operators on function
space by adding and composing simpler ones. In particular, the linear higher order derivative operators are obtained by composing the derivative operator D, defined in (7.11), with
itself. For example,
    D²[ f ] = D ∘ D[ f ] = D[ f′ ] = f″
is the second derivative operator. One needs to exercise some care about the domain of definition, since not every function is differentiable. In general,
    D^k[ f ] = f^{(k)}(x),   where   D^k : C^n[ a, b ] → C^{n−k}[ a, b ]   for any n ≥ k.
If we compose D^k with the linear operation of multiplication by a fixed function a(x) ∈ C^{n−k}[ a, b ], we obtain the linear operator f(x) ↦ a(x) D^k[ f ] = a(x) f^{(k)}(x). Finally, a
general linear ordinary differential operator of order n
    L = a_n(x) D^n + a_{n−1}(x) D^{n−1} + · · · + a_1(x) D + a_0(x)    (7.13)
is obtained by summing such linear operators. If the coefficient functions a 0 (x), . . . , an (x)
are continuous, then
    L[ u ] = a_n(x) (d^n u / dx^n) + a_{n−1}(x) (d^{n−1} u / dx^{n−1}) + · · · + a_1(x) (du/dx) + a_0(x) u    (7.14)
defines a linear operator from C^n[ a, b ] to C^0[ a, b ]. The most important case, but certainly not the only one arising in applications, is when the coefficients a_i(x) = c_i of L are all constant.
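A brief sketch of how an operator of the form (7.13) acts, and of its linearity, assuming the Python SymPy library (the helper L below is an illustrative construction, not the text's notation):

    import sympy as sp

    x = sp.symbols('x')

    def L(f, coeffs):
        """Apply a_0 + a_1 D + ... + a_n D^n to f, as in (7.13);
        coeffs = [a_0, a_1, ..., a_n] may depend on x."""
        return sum(a * sp.diff(f, x, k) for k, a in enumerate(coeffs))

    f, g = sp.sin(x), sp.exp(2*x)
    coeffs = [x, 1, x**2]                  # the operator x^2 D^2 + D + x
    # linearity check: L[3 f + 5 g] - 3 L[f] - 5 L[g] simplifies to 0
    print(sp.simplify(L(3*f + 5*g, coeffs) - 3*L(f, coeffs) - 5*L(g, coeffs)))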
Inverses
The inverse of a linear function is defined in direct analogy with the Definition 1.13
of the inverse of a (square) matrix.
Definition 7.14. Let L: V → W be a linear function. If M: W → V is a linear function such that both composite functions
    L ∘ M = I_W,    M ∘ L = I_V,    (7.15)
are equal to the identity function, then we call M the inverse of L and write M = L^{−1}.
The two conditions (7.15) require
    L[ M[ w ] ] = w   for all   w ∈ W,   and   M[ L[ v ] ] = v   for all   v ∈ V.
In the case of a linear map on Euclidean space given by matrix multiplication, L[ v ] = A v and M[ w ] = B w, the conditions (7.15) reduce to the matrix equations
    A B = I,    B A = I,
for matrix inversion, cf. (1.33). Therefore B = A^{−1} is the inverse matrix. In particular, for L to have an inverse, we need m = n and its coefficient matrix A to be square and nonsingular.
Figure 7.4.  Rotation.
More precisely, the derivative of the indefinite integral of f is equal to f , and hence
    D[ J[ f(x) ] ] = (d/dx) ∫_a^x f(y) dy = f(x).
In other words, the composition
    D ∘ J = I_{C^0[ a,b ]}
defines the identity operator on the function space C0 [ a, b ]. On the other hand, if we
integrate the derivative of a continuously differentiable function f C 1 [ a, b ], we obtain
    J[ D[ f(x) ] ] = J[ f′(x) ] = ∫_a^x f′(y) dy = f(x) − f(a).
Therefore, J ∘ D[ f ] differs from f whenever f(a) ≠ 0, and so
    J ∘ D ≠ I_{C^1[ a,b ]}
is not the identity operator. Therefore, differentiation, D, is a left inverse for integration,
J, but not a right inverse!
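This one-sided relationship is easy to check symbolically; the following sketch assumes the Python SymPy library:

    import sympy as sp

    x, y, a = sp.symbols('x y a')
    f = sp.sin(x) + 3                    # a sample function with f(a) != 0

    J = lambda g: sp.integrate(g.subs(x, y), (y, a, x))   # J[g] = integral from a to x
    D = lambda g: sp.diff(g, x)                           # D[g] = g'

    print(sp.simplify(D(J(f)) - f))      # 0: D composed with J acts as the identity
    print(sp.simplify(J(D(f)) - f))      # -sin(a) - 3 = -f(a): J composed with D does not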
This perhaps surprising phenomenon could not be anticipated from the finite-dimensional
matrix theory. Indeed, if a matrix A has a left inverse B, then B is automatically a right
inverse too, and we write B = A1 as the inverse of A. On an infinite-dimensional vector
space, a linear operator may possess one inverse without necessarily the other. However,
if both a left and a right inverse exist they must be equal; see Exercise .
If we restrict D to the subspace V = { f ∈ C^1[ a, b ] | f(a) = 0 } ⊂ C^1[ a, b ] consisting of all continuously differentiable functions that vanish at the left hand endpoint, then J: C^0[ a, b ] → V and D: V → C^0[ a, b ] are, by the preceding argument, inverse linear operators: D ∘ J = I_{C^0[ a,b ]} and J ∘ D = I_V. Note that V ⊊ C^1[ a, b ] ⊊ C^0[ a, b ]. Thus, we discover the curious and disconcerting infinite-dimensional phenomenon that J defines a one-to-one, invertible, linear map from a vector space, C^0[ a, b ], to a proper subspace V ⊊ C^0[ a, b ]. This paradoxical
situation cannot occur in finite dimensions. A linear map on a finite-dimensional vector
space can only be invertible when the domain and target spaces have the same dimension,
and hence its matrix is necessarily square!
Figure 7.5.  Reflection through the y-axis.
A linear function L: R n R n that maps n-dimensional Euclidean space to itself defines a linear transformation. As such, it can be assigned a geometrical interpretation that
leads to further insight into the nature and scope of linear functions. The transformation
L maps a point x R n to its image point L[ x ] = A x, where A is its n n matrix representative. Many of the basic maps that appear in geometry, in computer graphics and
computer gaming, in deformations of elastic bodies, in symmetry and crystallography, and
in Einsteins special relativity, are defined by linear transformations. The two-, three- and
four-dimensional (viewing time as a fourth dimension) cases are of particular importance.
Most of the important classes of linear transformations already appear in the two-dimensional case. Every linear function L: R² → R² has the form
    L[ ( x, y )^T ] = A ( x, y )^T = ( a x + b y, c x + d y )^T,   where   A = [ a  b ; c  d ]    (7.16)
is an arbitrary 2 × 2 matrix. We have already encountered the rotation matrices
    R_θ = [ cos θ  − sin θ ; sin θ  cos θ ],    (7.17)
whose effect is to rotate every vector in R² through an angle θ; in Figure 7.4 we illustrate the effect on a couple of square regions in the plane. Planar rotation matrices coincide with the 2 × 2 proper orthogonal matrices, meaning matrices Q that satisfy
    Q^T Q = I,    det Q = +1.    (7.18)
The improper orthogonal matrices, i.e., those with determinant −1, define reflections. For example, the matrix
    A = [ −1  0 ; 0  1 ]   corresponds to the linear transformation   L[ ( x, y )^T ] = ( −x, y )^T,    (7.19)
which reflects the plane through the y axis; see Figure 7.5. It can be visualized by thinking
of the y axis as a mirror. Another simple example is the improper orthogonal matrix
    R = [ 0  1 ; 1  0 ].   The corresponding linear transformation   L[ ( x, y )^T ] = ( y, x )^T    (7.20)
is a reflection through the diagonal line y = x, as illustrated in Figure 7.6.
Figure 7.6.  Reflection through the diagonal y = x.
Figure 7.7.  A Three-Dimensional Rotation.
Three-dimensional rotations are likewise given by proper orthogonal matrix transformations. For example, the proper orthogonal matrix
    [ cos θ  − sin θ  0 ; sin θ  cos θ  0 ; 0  0  1 ]   corresponds to a rotation through an angle θ around the z-axis, while
    [ cos θ  0  sin θ ; 0  1  0 ; − sin θ  0  cos θ ]   corresponds to a rotation through an angle θ around the y-axis.
In general, a proper orthogonal matrix Q = ( u1 u2 u3 ) with columns ui = Q ei corresponds to the rotation in which the standard basis vectors e1, e2, e3 are rotated to new positions given by the orthonormal basis u1, u2, u3. It can be shown, see Exercise , that every 3 × 3 proper orthogonal matrix corresponds to a rotation around a line through the origin in R³, the axis of the rotation, as sketched in Figure 7.7.
Since the product of two (proper) orthogonal matrices is also (proper) orthogonal,
this implies that the composition of two rotations is also a rotation. Unlike the planar
case, the order in which the rotations are performed is important! Multiplication of n × n orthogonal matrices is not commutative for n ≥ 3. For example, rotating first around the z-axis and then rotating around the y-axis does not have the same effect as first rotating around the y-axis and then rotating around the z-axis. If you don't believe this, try it out with a solid object, e.g., this book, and rotate through 90°, say, around each axis; the final configuration of the book will depend upon the order in which you do the
Figure 7.8.  A Stretch along the x-axis.
Figure 7.9.  A Shear in the x-direction.
rotations. Then prove this mathematically by showing that the two rotation matrices do
not commute.
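A quick numerical sketch of this non-commutativity, assuming NumPy (the helpers Rz and Ry are illustrative names):

    import numpy as np

    def Rz(theta):
        """Rotation through angle theta around the z-axis."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

    def Ry(theta):
        """Rotation through angle theta around the y-axis."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

    t = np.pi / 2                       # 90 degree rotations
    print(Rz(t) @ Ry(t))                # the two products differ,
    print(Ry(t) @ Rz(t))                # so the rotations do not commute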
Other important linear transformations arise from elementary matrices. First, the
elementary matrices corresponding to the third type of row operation, multiplying a row by a scalar, correspond to simple stretching transformations. For example, if
    A = [ 2  0 ; 0  1 ],   then the linear transformation   L[ ( x, y )^T ] = ( 2 x, y )^T
has the effect of stretching along the x axis by a factor of 2; see Figure 7.8. A matrix with
a negative diagonal entry corresponds to a reflection followed by a stretch. For example,
the elementary matrix (7.19) gives an example of a pure reflection, while the more general
elementary matrix
    [ −2  0 ; 0  1 ] = [ 2  0 ; 0  1 ] [ −1  0 ; 0  1 ]
can be written as the product of a reflection through the y axis followed by a stretch along
the x axis. In this case, the order of these operations is immaterial.
For 2 × 2 matrices, there is only one type of row interchange matrix, namely the matrix (7.20) that yields a reflection through the diagonal y = x. The elementary matrices of Type #1 correspond to shearing transformations of the plane. For example, the matrix
    [ 1  2 ; 0  1 ]   represents the linear transformation   L[ ( x, y )^T ] = ( x + 2 y, y )^T,
which has the effect of shearing the plane along the x-axis. The constant 2 will be called the shear factor, which can be either positive or negative. Each point moves parallel to the x-axis by an amount proportional to its (signed) distance from the axis; see Figure 7.9.
Similarly, the elementary matrix
    [ 1  0 ; −3  1 ]   represents the linear transformation   L[ ( x, y )^T ] = ( x, y − 3 x )^T,
which represents a shear along the y axis. Shears map rectangles to parallelograms; distances are altered, but areas are unchanged.
All of the preceding linear maps are invertible, and so represented by nonsingular
matrices. Besides the zero map/matrix, which sends every point x R 2 to the origin, the
simplest singular map is
    A = [ 1  0 ; 0  0 ],   corresponding to the linear transformation   L[ ( x, y )^T ] = ( x, 0 )^T,
which is merely the orthogonal projection of the vector ( x, y )^T onto the x-axis. Other
rank one matrices represent various kinds of projections from the plane to a line through
the origin; see Exercise for details.
A similar classification of linear maps appears in higher dimensions. The linear transformations constructed from elementary matrices can be built up from the following four
basic types:
(i ) A stretch in a single coordinate direction.
(ii ) A reflection through a coordinate plane.
(iii ) A reflection through a diagonal plane,
(iv ) A shear along a coordinate axis.
Moreover, we already proved that every nonsingular matrix can be written as a product
of elementary matrices; see (1.41). This has the remarkable consequence that every linear
transformation can be constructed from a sequence of elementary stretches, reflections, and
shears. In addition, there is one further, non-invertible type of basic linear transformation:
(v ) An orthogonal projection onto a lower dimensional subspace.
All possible linear transformations of R n can be built up, albeit non-uniquely, as a combination of these five basic types.
Example 7.16. Consider the matrix A = [ √3/2  −1/2 ; 1/2  √3/2 ], corresponding to a plane rotation through θ = 30°, cf. (7.17). Rotations are not elementary linear transformations.
To express this particular rotation as a product of elementary matrices, we need to perform
a Gauss-Jordan row reduction to reduce it to the identity matrix. Let us indicate the basic
steps:
    E1 = [ 1  0 ; −1/√3  1 ],    E1 A = [ √3/2  −1/2 ; 0  2/√3 ],
    E2 = [ 1  0 ; 0  √3/2 ],    E2 E1 A = [ √3/2  −1/2 ; 0  1 ],
    E3 = [ 2/√3  0 ; 0  1 ],    E3 E2 E1 A = [ 1  −1/√3 ; 0  1 ],
    E4 = [ 1  1/√3 ; 0  1 ],    E4 E3 E2 E1 A = I,
and hence
    A = E1^{−1} E2^{−1} E3^{−1} E4^{−1} = [ 1  0 ; 1/√3  1 ] [ 1  0 ; 0  2/√3 ] [ √3/2  0 ; 0  1 ] [ 1  −1/√3 ; 0  1 ].
Therefore, a 30 rotation can be effected by performing the following composition of elementary transformations in the prescribed order:
(1) First, a shear in the x-direction with shear factor 1/√3,
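A quick numerical check of the elementary factorization found above, sketched with NumPy and using the matrices E1, . . . , E4 exactly as displayed:

    import numpy as np

    s = np.sqrt(3.0)
    A  = np.array([[s/2, -0.5], [0.5, s/2]])          # rotation through 30 degrees
    E1 = np.array([[1.0, 0.0], [-1/s, 1.0]])          # shear
    E2 = np.array([[1.0, 0.0], [0.0, s/2]])           # scale the second row
    E3 = np.array([[2/s, 0.0], [0.0, 1.0]])           # scale the first row
    E4 = np.array([[1.0, 1/s], [0.0, 1.0]])           # shear

    print(np.round(E4 @ E3 @ E2 @ E1 @ A, 12))        # the identity matrix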
w = y1 w1 + · · · + ym wm ∈ W,
Proof : We mimic the proof of Theorem 7.5, replacing the standard basis vectors by
more general basis vectors. In other words, we should apply L to the basis vectors of V
and express the result as a linear combination of the basis vectors in W . Specifically, we
Specifically, we write L[ vj ] = Σ_{i=1}^{m} b_{ij} w_i, so that the coefficients b_{ij} form the entries of the desired coefficient matrix B. Then, by linearity,
    L[ v ] = L[ x1 v1 + · · · + xn vn ] = x1 L[ v1 ] + · · · + xn L[ vn ] = Σ_{i=1}^{m} ( Σ_{j=1}^{n} b_{ij} x_j ) w_i,
and so y_i = Σ_{j=1}^{n} b_{ij} x_j, as claimed.    Q.E.D.
Example 7.18. Consider the linear transformation L[ ( x, y )^T ] = ( x − y, 2 x + 4 y )^T, which we write in the standard Cartesian coordinates x, y on R². The corresponding coefficient matrix A = [ 1  −1 ; 2  4 ] is the matrix representation of L relative to the standard basis e1, e2 of R². This means that
    L[ e1 ] = ( 1, 2 )^T = e1 + 2 e2,    L[ e2 ] = ( −1, 4 )^T = − e1 + 4 e2.
Let us see what happens if we replace the standard basis by the alternative basis
    v1 = ( 1, −1 )^T,    v2 = ( 1, −2 )^T.
What is the corresponding matrix formulation of the same linear transformation? According to the recipe of Theorem 7.17, we must compute
    L[ v1 ] = ( 2, −2 )^T = 2 v1,    L[ v2 ] = ( 3, −6 )^T = 3 v2.
The linear transformation acts by stretching in the direction v1 by a factor of 2 and
simultaneously stretching in the direction v2 by a factor of 3. Therefore, its new matrix representative is the diagonal matrix
    D = [ 2  0 ; 0  3 ],
whose effect is to multiply the new basis coordinates a = ( a, b )^T by the diagonal matrix
D. Both A and D represent the same linear transformation, the former in the standard
basis and the latter in the new basis. The simple geometry of this linear transformation
is thereby exposed through the inspired choice of an adapted basis. The secret behind the
choice of such well-adapted bases will be revealed in Chapter 8.
How does one effect a change of basis in general? According to formula (2.22), if
v1, . . . , vn form a new basis of R^n, then the coordinates y = ( y1, y2, . . . , yn )^T of a vector
    x = y1 v1 + y2 v2 + · · · + yn vn    (7.21)
are found by solving the linear system
    S y = x,   where   S = ( v1 v2 . . . vn )    (7.22)
is the nonsingular matrix whose columns are the new basis vectors. It follows that the matrix representative of a linear transformation L in the new basis is B = S^{−1} A S.
Two matrices A and B which are related by such an equation for some nonsingular matrix S
are called similar . Similar matrices represent the same linear transformation, but relative
to different bases of the underlying vector space R n .
Returning to the preceding example, we assemble the new basis vectors to form the change of basis matrix S = [ 1  1 ; −1  −2 ], and verify that
    S^{−1} A S = [ 2  1 ; −1  −1 ] [ 1  −1 ; 2  4 ] [ 1  1 ; −1  −2 ] = [ 2  0 ; 0  3 ] = D,
reconfirming our earlier computation.
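The similarity computation is easy to confirm numerically; a sketch assuming NumPy:

    import numpy as np

    A = np.array([[1.0, -1.0], [2.0, 4.0]])       # matrix of L in the standard basis
    S = np.array([[1.0, 1.0], [-1.0, -2.0]])      # columns are the new basis vectors v1, v2

    D = np.linalg.inv(S) @ A @ S                  # matrix of L in the basis v1, v2
    print(np.round(D, 10))                        # [[2. 0.] [0. 3.]]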
More generally, a linear transformation L: R n R m is represented by an m n
matrix A with respect to the standard bases on both the domain and target spaces. What
happens if we introduce a new basis v1 , . . . , vn on the domain space R n and a new basis
w1 , . . . , wm on the target space R m ? Arguing as above, we conclude that the matrix
representative of L with respect to these new bases is given by
    B = T^{−1} A S,    (7.23)
where S = ( v1 v2 . . . vn ) and T = ( w1 w2 . . . wm ) are the matrices whose columns are the respective new basis vectors.
In particular, by choosing bases adapted to the kernel and range of the linear transformation, its matrix representative can be brought into the canonical form
    B = [ 1  0  . . .  0  0  . . .  0 ; 0  1  . . .  0  0  . . .  0 ; . . . ; 0  0  . . .  1  0  . . .  0 ; 0  0  . . .  0  0  . . .  0 ; . . . ; 0  0  . . .  0  0  . . .  0 ].    (7.24)
In this matrix, the first r rows have a single 1 in the diagonal slot, indicating that the first r basis vectors of the domain space are mapped to the first r basis vectors of the target space, while the last m − r rows are all zero, indicating that the last n − r basis vectors in the domain are all mapped to 0. Thus, by a suitable choice of bases on both the domain
and target spaces, any linear transformation has an extremely simple canonical form.
Example 7.19. According to the illustrative example following Theorem 2.47, the
matrix
    A = [ 2  −1  1  2 ; −8  4  −6  −4 ; 4  −2  3  2 ]
has rank 2. Based on the calculations, we choose the domain space basis
    v1 = ( 2, −1, 1, 2 )^T,    v2 = ( 0, 0, −2, 4 )^T,    v3 = ( 1, 2, 0, 0 )^T,    v4 = ( 0, 2, 1, 1/2 )^T,
noting that v1 , v2 are a basis for the row space corng A, while v3 , v4 are a basis for ker A.
For our basis of the target space, we first compute w1 = A v1 and w2 = A v2 , which form
a basis for rng A. We supplement these by the single basis vector w3 for coker A, and so
    w1 = ( 10, −34, 17 )^T,    w2 = ( 6, −4, 2 )^T,    w3 = ( 0, 1/2, 1 )^T.
In accordance with the canonical form (7.24),
    B = T^{−1} A S = [ 1  0  0  0 ; 0  1  0  0 ; 0  0  0  0 ],    where
    S = [ 2  0  1  0 ; −1  0  2  2 ; 1  −2  0  1 ; 2  4  0  1/2 ],    T = [ 10  6  0 ; −34  −4  1/2 ; 17  2  1 ]
are the matrices assembled from the respective basis vectors.
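A numerical sketch confirming the canonical form, assuming NumPy and using the matrices A, S, T exactly as reconstructed above:

    import numpy as np

    A = np.array([[ 2., -1.,  1.,  2.],
                  [-8.,  4., -6., -4.],
                  [ 4., -2.,  3.,  2.]])

    # columns of S: the row space basis v1, v2 and the kernel basis v3, v4
    S = np.array([[ 2.,  0., 1., 0. ],
                  [-1.,  0., 2., 2. ],
                  [ 1., -2., 0., 1. ],
                  [ 2.,  4., 0., 0.5]])

    # columns of T: w1 = A v1, w2 = A v2, and the cokernel basis vector w3
    T = np.column_stack([A @ S[:, 0], A @ S[:, 1], [0.0, 0.5, 1.0]])

    B = np.linalg.inv(T) @ A @ S
    print(np.round(B, 10))      # [[1 0 0 0] [0 1 0 0] [0 0 0 0]]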
A translation of R^n is a map of the form
    T[ x ] = x + a,    x ∈ R^n,    (7.25)
where a ∈ R^n is a fixed vector that determines the direction and the distance that the points are translated. Except in the trivial case a = 0, the translation T is not a linear function because
    T[ x + y ] = x + y + a ≠ T[ x ] + T[ y ] = x + y + 2 a.
Or, even more simply, one notes that T[ 0 ] = a ≠ 0.
Combining translations and linear functions leads us to an important class of geometrical transformations.
Definition 7.20. A function F: R^n → R^m of the form
    F[ x ] = A x + b,    (7.26)
where A is an m × n matrix and b ∈ R^m a fixed vector, is called an affine function.

In particular, every affine scalar function has the form
    f(x) = α x + β.    (7.27)
As mentioned earlier, even though the graph of f(x) is a straight line, f is not a linear function unless β = 0, and the line goes through the origin. Thus, to be technically correct, we should refer to (7.27) as an affine scalar function.
Example 7.21. The affine function
    F(x, y) = [ 0  −1 ; 1  0 ] ( x, y )^T + ( 1, 2 )^T = ( − y + 1, x + 2 )^T
has the effect of first rotating the plane R² by 90° about the origin, and then translating by the vector ( 1, 2 )^T. The reader may enjoy proving that this combination has the same effect as just rotating the plane through an angle of 90° centered at the point ( − 1/2, 3/2 ).
See Exercise .
The composition of two affine functions is again an affine function. Specifically, given
F [ x ] = A x + a, G[ y ] = B y + b, then
    (G ∘ F)[ x ] = G[ F[ x ] ] = G[ A x + a ] = B (A x + a) + b = C x + c,   where   C = B A,   c = B a + b.    (7.28)
Note that the coefficient matrix of the composition is the product of the coefficient matrices,
but the resulting vector of translation is not the sum of the two translation vectors!
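A brief numerical check of the composition rule (7.28), assuming NumPy; the particular matrices and vectors are arbitrary illustrative choices:

    import numpy as np

    A, a = np.array([[0.0, -1.0], [1.0, 0.0]]), np.array([1.0, 2.0])   # F[x] = A x + a
    B, b = np.array([[2.0, 0.0], [0.0, 1.0]]), np.array([0.0, -1.0])   # G[y] = B y + b

    x = np.array([3.0, 4.0])
    composed = B @ (A @ x + a) + b                 # (G o F)[x] computed directly
    C, c = B @ A, B @ a + b                        # coefficient matrix and translation of (7.28)
    print(np.allclose(composed, C @ x + c))        # True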
Isometry
A transformation that preserves distance is known as a rigid motion, or, more abstractly, as an isometry. We already encountered the basic rigid motions in Chapter 6
they are the translations and the rotations.
Definition 7.22. A function F: V → V is called an isometry on a normed vector space if it preserves the distance:
    d(F[ v ], F[ w ]) = d(v, w)   for all   v, w ∈ V.    (7.29)
Since the distance between points is just the norm of the vector between them, d(v, w) = ‖ v − w ‖, cf. (3.29), the isometry condition (7.29) can be restated as
    ‖ F[ v ] − F[ w ] ‖ = ‖ v − w ‖   for all   v, w ∈ V.    (7.30)
Clearly, any translation
    T[ v ] = v + a,   where   a ∈ V is a fixed vector,
defines an isometry, since T[ v ] − T[ w ] = v − w. A linear transformation L: V → V defines an isometry if and only if
    ‖ L[ v ] ‖ = ‖ v ‖   for all   v ∈ V,    (7.31)
because, by linearity, L[ v ] − L[ w ] = L[ v − w ]. More generally, an affine transformation F[ v ] = L[ v ] + a is an isometry if and only if its linear part L[ v ] is.
For the standard Euclidean norm on V = R^n, the linear isometries consist of rotations and reflections. Both are characterized by orthogonal matrices, the rotations having determinant +1, while the reflections have determinant −1.
Proposition 7.23. A linear transformation L[ x ] = Q x defines a Euclidean isometry of R^n if and only if Q is an orthogonal matrix.
Proof: The isometry condition (7.31) requires that
    ‖ Q x ‖² = (Q x)^T Q x = x^T Q^T Q x = x^T x = ‖ x ‖²   for all   x ∈ R^n.    (7.32)
Clearly this holds if and only if Q^T Q = I, which is precisely the condition (5.30) that Q
be an orthogonal matrix.
Q.E.D.
Figure 7.10.  A Screw.
Remark : It can be proved, [153], that the most general Euclidean isometry of R n is an
affine transformation F [ x ] = Q x + a where Q is an orthogonal matrix and a is a constant
vector. Therefore, every Euclidean isometry is a combination of translations, rotations and
reflections. The proper isometries correspond to the rotations, with det Q = 1, and can be
realized as physical motions; improper isometries, with det Q = 1, are then obtained by
reflection in a mirror.
The isometries of R 2 and R 3 are fundamental to the understanding of how objects
move in three-dimensional space. Basic computer graphics and animation require efficient
implementation of rigid isometries in three-dimensional space, coupled with appropriate
(nonlinear) perspective maps prescribing the projection of three-dimensional objects onto
a two-dimensional viewing screen.
There are three basic types of proper affine isometries. First are the translations
F [ x ] = x+a in a fixed direction a. Second are the rotations. For example, F [ x ] = Q x with
det Q = 1 represent rotations around the origin, while the more general case F [ x ] = Q(x
b) + b = Q x + ( I Q)b is a rotation around the point b. Finally, the screw motions are
affine maps of the form F[ x ] = Q x + a, where the orthogonal matrix Q represents a rotation through an angle θ around a fixed axis a, which is also the direction of the translation term;
see Figure 7.10. For example,
    F[ ( x, y, z )^T ] = [ cos θ  − sin θ  0 ; sin θ  cos θ  0 ; 0  0  1 ] ( x, y, z )^T + ( 0, 0, a )^T
represents a vertical screw along the z-axis through an angle θ by a distance a. As its name
implies, a screw represents the motion of a point on the head of a screw. It can be proved,
cf. Exercise , that every proper isometry of R 3 is either a translation, a rotation, or a
screw.
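A minimal sketch of a screw motion and a check that it preserves distances, assuming NumPy (the helper name screw is hypothetical):

    import numpy as np

    def screw(theta, a):
        """Proper isometry F[x] = Q x + a e3: rotate through angle theta
        about the z-axis and translate a distance a along that same axis."""
        c, s = np.cos(theta), np.sin(theta)
        Q = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        t = np.array([0.0, 0.0, a])
        return lambda x: Q @ x + t

    F = screw(np.pi / 3, 2.0)
    x, y = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 1.0])
    # distances are preserved, confirming that a screw is an isometry
    print(np.linalg.norm(F(x) - F(y)), np.linalg.norm(x - y))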
7.4. Linear Systems.

The abstract notion of a linear system serves to unify, within a common framework, linear systems of algebraic equations, linear ordinary differential equations, linear partial differential equations, linear boundary value problems, and a wide variety of other linear problems in mathematics and its applications. The idea is simply to replace matrix
multiplication by a general linear function. Many of the structural results we learned in the
matrix context have, when suitably formulated, direct counterparts in these more general
situations, thereby shedding some light on the nature of their solutions.
Definition 7.24. A linear system is an equation of the form
L[ u ] = f ,
(7.33)
in which L: V → W is a linear function between vector spaces, the right hand side f ∈ W is an element of the target space, while the desired solution u ∈ V belongs to the domain space. The system is homogeneous if f = 0; otherwise, it is called inhomogeneous.
Example 7.25. If V = R n and W = R m , then, according to Theorem 7.5, every
linear function L: R n R m is given by matrix multiplication: L[ u ] = A u. Therefore, in
this particular case, every linear system is a matrix system, namely A u = f .
Example 7.26. A linear ordinary differential equation takes the form L[ u ] = f ,
where L is an nth order linear differential operator of the form (7.13), and the right hand
side is, say, a continuous function. Written out, the differential equation takes the familiar
form
    L[ u ] = a_n(x) (d^n u / dx^n) + a_{n−1}(x) (d^{n−1} u / dx^{n−1}) + · · · + a_1(x) (du/dx) + a_0(x) u = f(x).    (7.34)
You should already have some familiarity with solving the constant coefficient case. Appendix C describes a method for constructing series representations for the solutions to more general, non-constant coefficient equations.
Example 7.27. Let K(x, y) be a function of two variables which is continuous for
all a ≤ x, y ≤ b. Then the integral
    I_K[ u ] = ∫_a^b K(x, y) u(y) dy
defines a linear operator I_K : C^0[ a, b ] → C^0[ a, b ], known as an integral transform. Important examples include the Fourier and Laplace transforms, to be discussed in Chapter 13.
Finding the inverse transform requires solving a linear integral equation I_K[ u ] = f, which has the explicit form
    ∫_a^b K(x, y) u(y) dy = f(x).
Example 7.28. One can combine linear maps to form more complicated, mixed
types of linear systems. For example, consider a typical initial value problem
    u″ + u′ − 2 u = x,    u(0) = 1,    u′(0) = 1,    (7.35)
for a scalar unknown function u(x). The differential equation can be written as a linear system
    L[ u ] = x,   where   L[ u ] = (D² + D − 2)[ u ] = u″ + u′ − 2 u
is a linear second order differential operator. If we further define the linear map
    M[ u ] = ( L[ u ], u(0), u′(0) )^T = ( u″(x) + u′(x) − 2 u(x), u(0), u′(0) )^T,
then M defines a linear map whose domain is the space C² of twice continuously differentiable functions, and whose range is the vector space V consisting of all triples v = ( f(x), a, b )^T, where f ∈ C^0 is a continuous function and a, b ∈ R are real constants. You should convince yourself that V is indeed a vector space under the evident addition and scalar multiplication operations. In this way, we can write the initial value problem (7.35) in linear systems form as M[ u ] = f, where f = ( x, 1, 1 )^T.
A similar construction applies to linear boundary value problems. For example, the
boundary value problem
    u″ + u = e^x,    u(0) = 1,    u(1) = 2,
can be written in linear systems form M[ u ] = f by setting
    M[ u ] = ( u″(x) + u(x), u(0), u(1) )^T,    f = ( e^x, 1, 2 )^T.
Note that M: C² → V defines a linear map having the preceding domain and target spaces.
The Superposition Principle
Before attempting to tackle general inhomogeneous linear systems, it will help to
look first at the homogeneous version. The most important fact is that homogeneous
linear systems admit a superposition principle, that allows one to construct new solutions
from known solutions. As we learned, the word superposition refers to taking linear
combinations of solutions.
Consider a general homogeneous linear system
L[ z ] = 0
(7.36)
where L is a linear function. If we are given two solutions, say z 1 and z2 , meaning that
    L[ z1 ] = 0,    L[ z2 ] = 0,
then their sum z1 + z2 is automatically a solution, since, by linearity, L[ z1 + z2 ] = L[ z1 ] + L[ z2 ] = 0 + 0 = 0.
This is a particular case of the general Cartesian product construction between vector spaces, with V = C^0 × R². See Exercise for details.
Similarly, given a solution z and any scalar c, the scalar multiple c z is automatically a
solution, since
L[ c z ] = c L[ z ] = c 0 = 0.
Combining these two elementary observations, we can now state the general superposition
principle. The proof is an immediate consequence of formula (7.4).
Theorem 7.29. If z1 , . . . , zk are all solutions to the same homogeneous linear system
L[ z ] = 0, and c1, . . . , ck are any scalars, then the linear combination c1 z1 + · · · + ck zk is
also a solution.
As with matrices, we call the solution space to the homogeneous linear system (7.36)
the kernel of the linear function L. Theorem 7.29 implies that the kernel always forms a
subspace.
Proposition 7.30. If L: V → W is a linear function, then its kernel
    ker L = { z ∈ V | L[ z ] = 0 } ⊂ V    (7.37)
forms a subspace of the domain space V.

Example 7.31. Consider the second order linear differential operator
    L = D² − 2 D − 3,    (7.38)
and the associated homogeneous linear ordinary differential equation
    L[ u ] = u″ − 2 u′ − 3 u = 0.    (7.39)
In accordance with the standard solution method, we plug the exponential ansatz
    u = e^{λ x}
into the differential equation. The resulting characteristic equation λ² − 2 λ − 3 = ( λ − 3 )( λ + 1 ) = 0 has two real roots, 3 and −1, and hence
    u1(x) = e^{3 x},    u2(x) = e^{−x},    (7.40)
are two linearly independent solutions of (7.39). According to the general superposition
principle, every linear combination
    u(x) = c1 u1(x) + c2 u2(x) = c1 e^{3 x} + c2 e^{−x}
of these two basic solutions is also a solution, for any choice of constants c 1 , c2 . In fact,
this two-parameter family constitutes the most general solution to the ordinary differential
equation (7.39). Thus, the kernel of the second order differential operator (7.38) is two-dimensional, with basis given by the independent exponential solutions (7.40).
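The exponents in (7.40) are just the roots of the characteristic polynomial, which can be found numerically; a sketch assuming NumPy, and taking the differential equation in the form given above:

    import numpy as np

    # characteristic polynomial of u'' - 2 u' - 3 u = 0
    roots = np.roots([1.0, -2.0, -3.0])
    print(sorted(roots))     # [-1.0, 3.0]: the exponents of the basis solutions e^{-x}, e^{3x}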
In general, the solution space to an nth order homogeneous linear ordinary differential
equation
    L[ u ] = a_n(x) (d^n u / dx^n) + a_{n−1}(x) (d^{n−1} u / dx^{n−1}) + · · · + a_1(x) (du/dx) + a_0(x) u = 0    (7.41)
forms a subspace of the vector space C^n[ a, b ] of n times continuously differentiable functions, since it is just the kernel of a linear differential operator L: C^n[ a, b ] → C^0[ a, b ]. This
implies that linear combinations of solutions are also solutions. To determine the number
of solutions, or, more precisely, the dimension of the solution space, we need to impose
some mild restrictions on the differential operator.
Definition 7.32. The differential operator L is called nonsingular on an open interval [ a, b ] if all its coefficients a_n(x), . . . , a_0(x) ∈ C^0[ a, b ] are continuous functions and its leading coefficient does not vanish: a_n(x) ≠ 0 for all a < x < b.
The basic existence and uniqueness result governing nonsingular homogeneous linear
ordinary differential equations can be formulated as a characterization of the dimension of
the solution space.
Theorem 7.33. The kernel of a nonsingular nth order ordinary differential operator forms an n-dimensional subspace ker L ⊂ C^n[ a, b ].
A proof of this result can be found in Section 20.1. The fact that the kernel has
dimension n means that it has a basis consisting of n linearly independent solutions
u1(x), . . . , un(x) ∈ C^n[ a, b ], such that the general solution to the homogeneous differential equation (7.41) is given by a linear combination
    u(x) = c1 u1(x) + · · · + cn un(x),    (7.42)
An important class of non-constant coefficient operators consists of the second order Euler differential equations
    E[ u ] = a x² u″ + b x u′ + c u = 0,
where 0 ≠ a, b, c are constants, and E = a x² D² + b x D + c is a second order, non-constant coefficient differential operator.
coefficient differential operator. Instead of the exponential solution ansatz used in the
constant coefficient case, Euler equations are solved by using a power ansatz
    u(x) = x^r,
with unknown exponent r. Substituting into the differential equation, we find
    E[ x^r ] = a r (r − 1) x^r + b r x^r + c x^r = [ a r (r − 1) + b r + c ] x^r = 0,
and hence x^r is a solution if and only if r satisfies the characteristic equation
    a r (r − 1) + b r + c = a r² + (b − a) r + c = 0.    (7.43)
For example, the Euler equation
    x² u″ − 3 x u′ + 3 u = 0    has solution    u = c1 x + c2 x³,
both of whose basis solutions are continuous everywhere, while
    x² u″ + x u′ − u = 0    has solution    u = c1 x + c2 / x,
and only the multiples of the first solution x are continuous at x = 0. Therefore, the
solutions that are continuous everywhere form only a one-dimensional subspace of C 0 (R).
Finally,
    x² u″ + 5 x u′ + 3 u = 0    has solution    u = c1 / x + c2 / x³,
Example 7.35. Consider the two-dimensional Laplace equation
    ∂²u/∂x² + ∂²u/∂y² = 0    (7.45)
for a function u(x, y) defined on a domain Ω ⊂ R². The Laplace equation is the most
important partial differential equation, and its applications range over almost all fields of mathematics, physics and engineering, including complex analysis, geometry, fluid mechanics, electromagnetism, elasticity, thermodynamics, and quantum mechanics. It is a homogeneous linear partial differential equation corresponding to the partial differential operator Δ = ∂²_x + ∂²_y, known as the Laplacian operator. Linearity can either be proved directly, or by noting that Δ is built up from the basic linear partial derivative operators ∂_x, ∂_y by the processes of composition and addition, as in Exercise .
Unlike the case of a linear ordinary differential equation, there are an infinite number
of linearly independent solutions to the Laplace equation. Examples include the trigonometric/exponential solutions
    e^{ω x} cos ω y,    e^{ω x} sin ω y,    e^{ω y} cos ω x,    e^{ω y} sin ω x,
where ω is any real constant. There are also infinitely many independent polynomial
solutions, the first few of which are
    1,    x,    y,    x² − y²,    x y,    x³ − 3 x y²,    . . .
The reader might enjoy finding some more polynomial solutions and trying to spot the
pattern. (The answer will appear shortly.) As usual, we can build up more complicated
solutions by taking general linear combinations of these particular ones. In fact, it will be
shown that the most general solution to the Laplace equation can be written as a convergent
infinite series in the basic polynomial solutions. Later, in Chapters 15 and 16, we will learn
how to construct these and many other solutions to the planar Laplace equation.
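It is easy to verify symbolically that the polynomials listed above are indeed harmonic; a sketch assuming the Python SymPy library:

    import sympy as sp

    x, y = sp.symbols('x y')
    laplacian = lambda u: sp.diff(u, x, 2) + sp.diff(u, y, 2)

    # the first few polynomial solutions listed above
    for u in [1, x, y, x**2 - y**2, x*y, x**3 - 3*x*y**2]:
        print(u, laplacian(u))        # each Laplacian is identically 0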
Inhomogeneous Systems
Now we turn our attention to an inhomogeneous linear system
L[ u ] = f .
(7.46)
Unless f = 0, the solution space to (7.46) is not a subspace. (Why?) The key question
is existence is there a solution to the system? In the homogeneous case, existence is
not an issue, since 0 is always a solution to L[ z ] = 0. The key question for homogeneous
systems is uniqueness: whether ker L = {0}, in which case 0 is the only solution, or whether there are nontrivial solutions 0 ≠ z ∈ ker L.
In the matrix case, the compatibility of an inhomogeneous system A x = b, which was required for the existence of a solution, led to the general definition of the range of a matrix, which we copy verbatim for linear functions.
a matrix, which we copy verbatim for linear functions.
Definition 7.36. The range of a linear function L: V W is the subspace
    rng L = { L[ v ] | v ∈ V } ⊂ W.
The proof that rng L is a subspace is straightforward. If f = L[ v ] and g = L[ w ] are
any two elements of the range, so is any linear combination, since, by linearity
c f + d g = c L[ v ] + d L[ w ] = L[ c v + d w ] rng L.
For example, if L[ v ] = A v is given by multiplication by an m × n matrix, then its range is the subspace rng L = rng A ⊂ R^m spanned by the columns of A, the column space of the coefficient matrix. When L is a linear differential operator, or more general linear
of the coefficient matrix. When L is a linear differential operator, or more general linear
operator, characterizing its range can be a much more challenging problem.
The fundamental theorem regarding solutions to inhomogeneous linear equations exactly mimics our earlier result, Theorem 2.37, in the particular case of matrix systems.
Theorem 7.37. Let L: V → W be a linear function. Let f ∈ W. Then the inhomogeneous linear system
L[ u ] = f
(7.47)
has a solution if and only if f rng L. In this case, the general solution to the system has
the form
    u = u⋆ + z    (7.48)
where u⋆ is a particular solution, so L[ u⋆ ] = f, and z is a general element of ker L, i.e.,
the general solution to the corresponding homogeneous system
L[ z ] = 0.
(7.49)
Proof: We merely repeat the proof of Theorem 2.37. The existence condition f ∈ rng L is an immediate consequence of the definition of the range. Suppose u⋆ is a particular solution to (7.47). If z is a solution to (7.49), then, by linearity,
    L[ u⋆ + z ] = L[ u⋆ ] + L[ z ] = f + 0 = f,
and hence u⋆ + z is also a solution to (7.47). To show that every solution has this form, let u be a second solution, so that L[ u ] = f. Then
    L[ u − u⋆ ] = L[ u ] − L[ u⋆ ] = f − f = 0.
Therefore u − u⋆ = z ∈ ker L is a solution to (7.49).    Q.E.D.
Remark : In physical systems, the inhomogeneity f typically corresponds to an external forcing function. The solution z to the homogeneous system represents the systems
natural, unforced motion. Therefore, the decomposition formula (7.48) states that a linear
system responds to an external force as a combination of its own internal motion and a
specific motion u⋆ induced by the forcing. Examples of this important principle appear
throughout the book.
Corollary 7.38. The inhomogeneous linear system (7.47) has a unique solution if
and only if f ∈ rng L and ker L = {0}.
Therefore, to prove that a linear system has a unique solution, we first need to prove
an existence result that there is at least one solution, which requires the right hand side f
to lie in the range of the operator L, and then a uniqueness result, that the only solution
to the homogeneous system L[ z ] = 0 is the trivial zero solution z = 0. Consequently, if
an inhomogeneous system L[ u ] = f has a unique solution, then any other inhomogeneous
system L[ u ] = g that is defined by the same linear function also has a unique solution for
every g rng L.
Example 7.39. Consider the inhomogeneous linear second order differential equation
    u″ + u′ − 2 u = x.
Note that this can be written in the linear system form
    L[ u ] = x,   where   L = D² + D − 2
is a linear second order differential operator. The kernel of the differential operator L is found by solving the associated homogeneous linear equation
    L[ z ] = z″ + z′ − 2 z = 0.
Applying the usual solution method, we find that the homogeneous differential equation has a two-dimensional solution space, with basis functions
    z1(x) = e^{−2 x},    z2(x) = e^x.
One could also employ the method of variation of parameters, although in general the undetermined coefficient method, when applicable, is the more straightforward of the two. Details of
the two methods can be found, for instance, in [ 24 ].
The method of undetermined coefficients produces the particular solution u⋆(x) = − (1/2) x − 1/4, and hence the general solution to the inhomogeneous equation is
    u(x) = − (1/2) x − 1/4 + c1 e^{−2 x} + c2 e^x.    (7.50)
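The solution (7.50) is easy to confirm with a computer algebra system; a sketch assuming SymPy (the constants may appear with different labels):

    import sympy as sp

    x = sp.symbols('x')
    u = sp.Function('u')

    # general solution of u'' + u' - 2u = x
    sol = sp.dsolve(sp.Eq(u(x).diff(x, 2) + u(x).diff(x) - 2*u(x), x), u(x))
    print(sol)    # u(x) = C1*exp(-2*x) + C2*exp(x) - x/2 - 1/4, up to relabeling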
Example 7.40. By inspection, a particular solution of the inhomogeneous Laplace (Poisson) equation
    ∂²u/∂x² + ∂²u/∂y² = sin(x + y)
is u⋆(x, y) = − (1/2) sin(x + y). Theorem 7.37 implies that every solution to this inhomogeneous version of the Laplace equation takes the form
    u(x, y) = − (1/2) sin(x + y) + z(x, y),
where z(x, y) is an arbitrary solution to the homogeneous Laplace equation (7.45).
Example 7.41. The problem is to solve the linear boundary value problem
    u″ + u = x,    u(0) = 0,    u(π) = 0.    (7.51)
The first step is to solve the differential equation. To this end, we find that cos x and sin x
form a basis for the solution space to the corresponding homogeneous differential equation
z″ + z = 0. The method of undetermined coefficients then produces the particular solution u⋆(x) = x to the inhomogeneous differential equation, and so the general solution is
    u(x) = x + c1 cos x + c2 sin x.    (7.52)
The next step is to see whether any solutions also satisfy the boundary conditions. Plugging
formula (7.52) into the boundary conditions gives
    u(0) = c1 = 0,    u(π) = π − c1 = 0.
However, these two conditions are incompatible, and so there is no solution to the linear
system (7.51). The function f(x) = x does not lie in the range of the differential operator L[ u ] = u″ + u when u is subjected to the boundary conditions.
On the other hand, if we change the inhomogeneity, the boundary value problem
    u″ + u = x − (1/2) π,    u(0) = 0,    u(π) = 0,    (7.53)
does admit a solution, but the solution fails to be unique. Applying the preceding solution
method, we find that the function
    u(x) = x − (1/2) π + (1/2) π cos x + c sin x
1/12/04
255
c 2003
Peter J. Olver
solves the system for any choice of constant c. Note that z(x) = sin x forms a basis for the
kernel or solution space of the homogeneous boundary value problem
    z″ + z = 0,    z(0) = 0,    z(π) = 0.    (7.54)
If, instead, we impose the boundary conditions u(0) = 0, u( (1/2) π ) = 0, then the system is compatible for any inhomogeneity f(x), and the solution to the boundary value problem is unique. For example, if f(x) = x, then the unique solution is
    u(x) = x − (1/2) π sin x.    (7.55)
This example highlights some major differences between boundary value problems
and initial value problems for ordinary differential equations. For nonsingular initial value
problems, there is a unique solution for any set of initial conditions. For boundary value problems, the structure of the solution space, either a unique solution for all inhomogeneities, or no solution, or infinitely many solutions, depending on the right hand side, has more of the flavor of a linear matrix system. An interesting question is how to
characterize the inhomogeneities f (x) that admit a solution, i.e., lie in the range of the
operator. We will return to this question in Chapter 11.
Superposition Principles for Inhomogeneous Systems
The superposition principle for inhomogeneous linear systems allows us to combine
different inhomogeneities provided we do not change the underlying linear operator. The
result is a straightforward generalization of the matrix version described in Theorem 2.42.
Theorem 7.42. Let L: V → W be a prescribed linear function. Suppose that, for each i = 1, . . . , k, we know a particular solution u⋆_i to the inhomogeneous linear system L[ u ] = f_i for some f_i ∈ rng L. Given scalars c1, . . . , ck, a particular solution to the combined inhomogeneous system
    L[ u ] = c1 f_1 + · · · + ck f_k    (7.56)
is the same linear combination u⋆ = c1 u⋆_1 + · · · + ck u⋆_k of particular solutions. The general solution to the inhomogeneous system (7.56) is
    u = u⋆ + z = c1 u⋆_1 + · · · + ck u⋆_k + z,
where z ∈ ker L is the general solution to the associated homogeneous system L[ z ] = 0.
The proof is an easy consequence of linearity, and left to the reader. In physical
terms, the superposition principle can be interpreted as follows. If we know the response
of a linear physical system to several different external forces, represented by f 1 , . . . , fk ,
then the response of the system to a linear combination of these forces is just the identical
linear combination of the individual responses. The homogeneous solution z represents an
internal motion that the system acquires independent of any external forcing. Superposition requires linearity of the system, and so is always applicable in quantum mechanics,
which is a linear theory. But, in classical and relativistic mechanics superposition only
applies in a linear approximation corresponding to small motions/displacements/etc. The
nonlinear regime is much more unpredictable, and combinations of external forces may
lead to unexpected results.
Example 7.43. We already know that a particular solution to the linear differential
equation
    u″ + u = x    is    u⋆_1 = x.
The method of undetermined coefficients is used to solve the inhomogeneous equation
    u″ + u = cos x.
Since cos x and sin x are already solutions to the homogeneous equation, we must use
the solution ansatz u = a x cos x + b x sin x, which, when substituted into the differential
equation, produces the particular solution
    u⋆_2 = (1/2) x sin x.
Therefore, by the superposition principle, the combined inhomogeneous system
    u″ + u = 3 x − 2 cos x
has particular solution u⋆ = 3 u⋆_1 − 2 u⋆_2 = 3 x − x sin x. The general solution is obtained by appending the general solution to the homogeneous equation:
    u = 3 x − x sin x + c1 cos x + c2 sin x.
Example 7.44. Consider the boundary value problem
    u″ + u = x,    u(0) = 2,    u( (1/2) π ) = 1,    (7.57)
which is a modification of (7.54) with inhomogeneous boundary conditions. The superposition principle applies here, and allows us to decouple the inhomogeneity due to the
forcing from the inhomogeneity due to the boundary conditions. We already solved the
boundary value problem with homogeneous boundary conditions; see (7.55). On the other
hand, the unforced boundary value problem
    u″ + u = 0,    u(0) = 2,    u( (1/2) π ) = 1,    (7.58)
has unique solution
    u(x) = 2 cos x + sin x.    (7.59)
Therefore, the solution to the combined problem (7.57) is the sum of these two:
    u(x) = x + 2 cos x + ( 1 − (1/2) π ) sin x.
The solution is unique because the corresponding homogeneous boundary value problem
    z″ + z = 0,    z(0) = 0,    z( (1/2) π ) = 0,
has only the trivial solution z(x) ≡ 0. Incidentally, the solution (7.59) can itself be
decomposed as a linear combination of the solutions cos x and sin x to a pair of yet more elementary boundary value problems with just one inhomogeneous boundary condition; namely, u(0) = 1, u( (1/2) π ) = 0, and, respectively, u(0) = 0, u( (1/2) π ) = 1.
where
    v = Re u = (u + ū)/2,    w = Im u = (u − ū)/(2 i).    (7.60)
For example, if
    u = ( 1 + 2 i, 3 i, 5 )^T = ( 1, 0, 5 )^T + i ( 2, 3, 0 )^T,
then
    ū = ( 1 − 2 i, − 3 i, 5 )^T = ( 1, 0, 5 )^T − i ( 2, 3, 0 )^T.
The same definition of real and imaginary part carries over to general conjugated vector
spaces. A subspace V ⊂ C^n is conjugated if and only if ū ∈ V whenever u ∈ V. Another prototypical example of a conjugated vector space is the space of complex-valued functions f(x) = r(x) + i s(x) defined on the interval a ≤ x ≤ b. The complex conjugate function is f̄(x) = r(x) − i s(x). Thus, the complex conjugate of
    e^{(1 + 3 i) x} = e^x cos 3 x + i e^x sin 3 x    is    e^{(1 − 3 i) x} = e^x cos 3 x − i e^x sin 3 x.    (7.61)
Since the system is real, L[ ū ] = 0 whenever
    L[ u ] = 0,
and hence the complex conjugate ū of any solution is also a solution. Therefore, by linear superposition, v = Re u = (1/2)(u + ū) and w = Im u = (1/(2 i))(u − ū) are also solutions.    Q.E.D.
Example 7.48. The real linear matrix system
    [ 2  1  −3  0 ; 2  1  1  −2 ] ( x, y, z, w )^T = ( 0, 0 )^T
has the complex solution
    u = ( 1 + 3 i, 1, 1 + 2 i, 2 + 4 i )^T = ( 1, 1, 1, 2 )^T + i ( 3, 0, 2, 4 )^T.
Since the coefficient matrix is real, the real and imaginary parts,
    v = ( 1, 1, 1, 2 )^T,    w = ( 3, 0, 2, 4 )^T,
are also solutions of the homogeneous system.
On the other hand, when the coefficient matrix of a homogeneous linear system has genuinely complex entries, this conclusion breaks down: such a system can admit a complex solution u = v + i w for which neither the real part v nor the imaginary part w is itself a solution. The reality of the coefficient matrix is essential.
Example 7.49. Consider the real ordinary differential equation
    u″ + 2 u′ + 5 u = 0.
To solve it, as in Example 7.31, we use the exponential ansatz u = e^{λ x}, leading to the characteristic equation
    λ² + 2 λ + 5 = 0.
The roots are the complex conjugate pair λ1 = − 1 + 2 i, λ2 = − 1 − 2 i, leading to the complex exponential solutions e^{λ1 x}, e^{λ2 x}. Their real and imaginary parts,
    v(x) = e^{−x} cos 2 x,    w(x) = e^{−x} sin 2 x,
are two linearly independent real solutions, and so the general real solution is u(x) = c1 e^{−x} cos 2 x + c2 e^{−x} sin 2 x.
As another example, consider the complex polynomial function u(x, y) = (x + i y)^n, where n is a nonnegative integer. Then
    ∂²u/∂x² = n(n − 1)(x + i y)^{n−2},    ∂²u/∂y² = n(n − 1) i² (x + i y)^{n−2} = − n(n − 1)(x + i y)^{n−2},
and hence uxx + uyy = 0. Since the Laplace operator is real, Theorem 7.47 implies that the real and imaginary parts of this complex solution are real solutions. The resulting real solutions are known as harmonic polynomials.
To find the explicit formulae for the harmonic polynomials, we use the Binomial
Formula and the fact that i² = −1, i³ = −i, i⁴ = 1, etc., to expand
    (x + i y)^n = x^n + n x^{n−1} ( i y ) + (n choose 2) x^{n−2} ( i y )² + (n choose 3) x^{n−3} ( i y )³ + · · ·
               = x^n + i n x^{n−1} y − (n choose 2) x^{n−2} y² − i (n choose 3) x^{n−3} y³ + · · · ,
in which we use the standard notation
    (n choose k) = n! / ( k! (n − k)! )    (7.62)
for the binomial coefficients. Separating the real and imaginary terms, we find
    Re (x + i y)^n = x^n − (n choose 2) x^{n−2} y² + (n choose 4) x^{n−4} y⁴ − · · · ,
    Im (x + i y)^n = n x^{n−1} y − (n choose 3) x^{n−3} y³ + (n choose 5) x^{n−5} y⁵ − · · · .    (7.63)
The first few of these harmonic polynomials were described in Example 7.35. In fact, it can
be proved that every polynomial solution to the Laplace equation is a linear combination
of the fundamental real harmonic polynomials; see Chapter 16 for full details.
7.5. Adjoints.
In Sections 2.5 and 5.6, we discovered the importance of the adjoint system A^T y = f in the analysis of systems of linear equations A x = b. Two of the four fundamental matrix subspaces are based on the transposed matrix. While the m × n matrix A defines a linear function from R^n to R^m, its transpose, A^T, has size n × m and hence characterizes a linear function in the reverse direction, from R^m to R^n.
As with most fundamental concepts for linear matrix systems, the adjoint system and
transpose operation on the coefficient matrix are the prototypes of a much more general
construction that is valid for general linear functions. However, it is not as obvious how
to transpose a more general linear operator L[ u ], e.g., a differential operator acting on
function space. In this section, we shall introduce the concept of the adjoint of a linear
function that generalizes the transpose operation on matrices. Unfortunately, most of the
interesting examples must be deferred until we develop additional analytical tools, starting
in Chapter 11.
The adjoint (and transpose) relies on an inner product structure on both the domain
and target spaces. For simplicity, we restrict our attention to real inner product spaces,
leaving the complex version to the interested reader. Thus, we begin with a linear function
L: V W that maps an inner product space V to a second inner product space W . We
distinguish the inner products on V and W (which may be different even when V and W
are the same vector space) by using a single angle bracket ⟨ v ; ṽ ⟩ for the inner product between v, ṽ ∈ V, and a double angle bracket ⟨⟨ w ; w̃ ⟩⟩ for the inner product between w, w̃ ∈ W.
With the prescription of inner products on both the domain and target spaces, the abstract
definition of the adjoint of a linear function can be formulated.
Definition 7.52. Let V, W be inner product spaces, and let L: V → W be a linear function. The adjoint of L is the function L^∗: W → V that satisfies
    ⟨⟨ L[ v ] ; w ⟩⟩ = ⟨ v ; L^∗[ w ] ⟩   for all   v ∈ V,   w ∈ W.    (7.64)
Note that the adjoint function goes in the opposite direction to L, just like the transposed matrix. Also, the left hand side of equation (7.64) indicates the inner product on
W , while the right hand side is the inner product on V which is where the respective
vectors live. In infinite-dimensional situations, the adjoint may not exist. But if it does,
then it is uniquely determined by (7.64); see Exercise .
Remark: Technically, (7.64) only defines the formal adjoint of L. For the infinite-dimensional function spaces arising in analysis, a true adjoint must satisfy certain additional requirements, [122]. However, we will suppress all such advanced analytical complications in our introductory treatment of the subject.
Lemma 7.53. The adjoint of a linear function is a linear function.
Proof: Given v ∈ V, w, z ∈ W, and scalars c, d ∈ R, we find
    ⟨ v ; L^∗[ c w + d z ] ⟩ = ⟨⟨ L[ v ] ; c w + d z ⟩⟩ = c ⟨⟨ L[ v ] ; w ⟩⟩ + d ⟨⟨ L[ v ] ; z ⟩⟩
        = c ⟨ v ; L^∗[ w ] ⟩ + d ⟨ v ; L^∗[ z ] ⟩ = ⟨ v ; c L^∗[ w ] + d L^∗[ z ] ⟩.
Since this holds for all v ∈ V, we must have
    L^∗[ c w + d z ] = c L^∗[ w ] + d L^∗[ z ],
proving linearity.
Q.E.D.
For example, let L: R^n → R^m be the linear function L[ v ] = A v given by multiplication by an m × n matrix A. We take the Euclidean dot products
    ⟨ v ; ṽ ⟩ = v^T ṽ,   v, ṽ ∈ R^n,    ⟨⟨ w ; w̃ ⟩⟩ = w^T w̃,   w, w̃ ∈ R^m,
as our inner products on both R^n and R^m. Evaluation of both sides of the adjoint equation (7.64) gives
    ⟨⟨ L[ v ] ; w ⟩⟩ = ⟨⟨ A v ; w ⟩⟩ = (A v)^T w = v^T A^T w,    ⟨ v ; L^∗[ w ] ⟩ = ⟨ v ; A^∗ w ⟩ = v^T A^∗ w.    (7.65)
Since these must agree for all v, w, cf. Exercise , the matrix A^∗ representing L^∗ is equal to the transposed matrix A^T. Therefore, the adjoint of a matrix with respect to the Euclidean inner product is its transpose: A^∗ = A^T.
Example 7.56. Let us now adopt different, weighted inner products on the domain and target spaces for the linear map L: R^n → R^m given by L[ v ] = A v. Suppose that the inner product on the domain space R^n is given by ⟨ v ; ṽ ⟩ = v^T M ṽ, while the inner product on the target space R^m is given by ⟨⟨ w ; w̃ ⟩⟩ = w^T C w̃, where M > 0 and C > 0 are positive definite matrices of respective sizes n × n and m × m.
Then, in place of (7.65), we have
hh A v ; w ii = (A v)T C w = vT AT C w,
h v ; A w i = vT M A w.
(7.66)
In applications, M plays the role of the mass matrix, and explicitly appears in the dynamical systems to be solved in Chapter 9. In particular, suppose A is square, defining a
e i = vT C v
e on both the
linear map L: R n R n . If we adopt the same inner product h v ; v
n
1 T
domain and target spaces R , then the adjoint matrix A = C A C is similar to the
transpose.
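The weighted adjoint formula is easy to test numerically. The following sketch (a minimal check, assuming NumPy is available; the random matrices and variable names are illustrative) forms A∗ = M⁻¹AᵀC and confirms the defining identity (7.64) for the weighted inner products of Example 7.56.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4

A = rng.standard_normal((m, n))          # L[v] = A v maps R^n to R^m
B1 = rng.standard_normal((n, n))
B2 = rng.standard_normal((m, m))
M = B1 @ B1.T + n * np.eye(n)            # positive definite inner product matrix on R^n
C = B2 @ B2.T + m * np.eye(m)            # positive definite inner product matrix on R^m

A_star = np.linalg.solve(M, A.T @ C)     # weighted adjoint A* = M^{-1} A^T C

v = rng.standard_normal(n)
w = rng.standard_normal(m)
lhs = (A @ v) @ C @ w                    # << A v ; w >> = (A v)^T C w
rhs = v @ M @ (A_star @ w)               # < v ; A* w > = v^T M (A* w)
print(np.isclose(lhs, rhs))              # True, up to rounding error
```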
Everything that we learned about transposes can be reinterpreted in the more general
language of adjoints. The next result generalizes the fact, (1.49), that the transpose of the
product of two matrices is the product of the transposes, in the reverse order.
Lemma 7.57. If L: V → W and M: W → Z have respective adjoints L∗: W → V
and M∗: Z → W, then the composite linear function M ∘ L: V → Z has adjoint (M ∘ L)∗ =
L∗ ∘ M∗, which maps Z to V.
Proof: Let ⟨ v ; ṽ ⟩, ⟨⟨ w ; w̃ ⟩⟩, ⟨⟨⟨ z ; z̃ ⟩⟩⟩ denote, respectively, the inner products on
V, W, Z. For v ∈ V, z ∈ Z, we compute using the definition (7.64),
⟨ v ; (M ∘ L)∗[z] ⟩ = ⟨⟨⟨ (M ∘ L)[v] ; z ⟩⟩⟩ = ⟨⟨⟨ M[ L[v] ] ; z ⟩⟩⟩
                   = ⟨⟨ L[v] ; M∗[z] ⟩⟩ = ⟨ v ; L∗[ M∗[z] ] ⟩ = ⟨ v ; (L∗ ∘ M∗)[z] ⟩.
Q.E.D.
Definition 7.58. A linear operator K: V → V on an inner product space V is called self-adjoint if K∗ = K. It is positive definite, written K > 0, if it is self-adjoint and satisfies the positivity condition
⟨ v ; K[v] ⟩ > 0        for all   0 ≠ v ∈ V.        (7.67)
In particular, if K > 0 then ker K = {0}, and so the positive definite linear system
K[u] = f with f ∈ rng K has a unique solution. The next result generalizes our basic
observation that the Gram matrices K = AᵀA, cf. (3.49), are symmetric and positive
(semi-)definite.
Theorem 7.59. Let L: V → W be a linear map between inner product spaces
with adjoint L∗: W → V. Then the composite map K = L∗ ∘ L: V → V is self-adjoint.
Moreover, K is positive definite if and only if ker L = {0}.
Proof: First, by Lemmas 7.57 and 7.54,
K∗ = (L∗ ∘ L)∗ = L∗ ∘ (L∗)∗ = L∗ ∘ L = K,
proving self-adjointness. Furthermore, for v ∈ V, the inner product
⟨ v ; K[v] ⟩ = ⟨ v ; L∗[ L[v] ] ⟩ = ⟨ L[v] ; L[v] ⟩ = ‖ L[v] ‖² ≥ 0
is strictly positive provided L[v] ≠ 0. Thus, if ker L = {0}, then the positivity condition
(7.67) holds, and conversely.
Q.E.D.
Consider the case of a linear function L: Rⁿ → Rᵐ that is represented by the m × n
matrix A. For the Euclidean dot product on the two spaces, the adjoint L∗ is represented
by the transpose Aᵀ, and hence the map K = L∗ ∘ L has matrix representation AᵀA.
Therefore, in this case Theorem 7.59 reduces to our earlier Proposition 3.32 governing the
positive definiteness of the Gram matrix product AᵀA. If we change the inner product on
the target space to ⟨⟨ w ; w̃ ⟩⟩ = wᵀ C w̃, then L∗ is represented by Aᵀ C, and hence
K = L∗ ∘ L has matrix form Aᵀ C A, which is the general symmetric, positive definite Gram
matrix constructed in (3.51) that played a key role in our development of the equations of
equilibrium in Chapter 6. Finally, if we also use the alternative inner product ⟨ v ; ṽ ⟩ =
vᵀ M ṽ on the domain space Rⁿ, then, according to (7.66), the adjoint of L has matrix
form
A∗ = M⁻¹ Aᵀ C,        and therefore        K = A∗ A = M⁻¹ Aᵀ C A        (7.68)
is a self-adjoint, positive definite matrix with respect to the weighted inner product on
Rⁿ prescribed by the positive definite matrix M. In this case, the positive definite, self-adjoint
operator K is no longer represented by a symmetric matrix. So, we did not quite
tell the truth when we said we would only allow symmetric matrices to be positive definite:
we really meant only self-adjoint matrices. The general case will be important in our
discussion of the vibrations of mass/spring chains that have unequal masses. Extensions of
these constructions to differential operators underlie the analysis of the static and dynamic
differential equations of continuum mechanics, to be studied in Chapters 11–18.
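A quick numerical experiment makes this last point concrete. The sketch below (illustrative only, assuming NumPy; the matrices are randomly generated) shows that K = M⁻¹AᵀCA is generally not a symmetric matrix, yet it is self-adjoint with respect to the weighted inner product ⟨u;v⟩ = uᵀMv, and its symmetric counterpart AᵀCA is positive definite.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 5
A = rng.standard_normal((m, n))          # generically ker A = {0} when m >= n
B1 = rng.standard_normal((n, n))
B2 = rng.standard_normal((m, m))
M = B1 @ B1.T + n * np.eye(n)            # inner product on the domain R^n
C = B2 @ B2.T + m * np.eye(m)            # inner product on the target R^m

K = np.linalg.solve(M, A.T @ C @ A)      # K = M^{-1} A^T C A, as in (7.68)

print(np.allclose(K, K.T))               # False: K is not a symmetric matrix ...
u = rng.standard_normal(n)
v = rng.standard_normal(n)
print(np.isclose(u @ M @ (K @ v), (K @ u) @ M @ v))  # ... but it is self-adjoint in the M inner product
print(np.all(np.linalg.eigvalsh(A.T @ C @ A) > 0))   # A^T C A is symmetric positive definite
```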
Minimization
In Chapter 4, we learned that the solution to a matrix system K u = f, with positive
definite coefficient matrix K > 0, can be characterized as the unique minimizer of the
quadratic function p(u) = ½ uᵀ K u − uᵀ f. There is an analogous minimization principle
that characterizes the solutions to linear systems defined by positive definite linear operators. This general result is of tremendous importance in the analysis of boundary value
problems for differential equations, and also underlies the finite element numerical solution
algorithms. Details will appear in the subsequent chapters.
Theorem 7.60. Let K: V → V be a positive definite operator on an inner product
space V. If f ∈ rng K, then the quadratic function
p(u) = ½ ⟨ u ; K[u] ⟩ − ⟨ u ; f ⟩        (7.69)
has a unique minimizer, namely the solution u = u⋆ to the linear system K[u] = f.
Proof: Since f ∈ rng K, there is a solution u⋆ with K[u⋆] = f. Substituting into (7.69), we find
p(u) = ½ ⟨ u ; K[u] ⟩ − ⟨ u ; K[u⋆] ⟩ = ½ ⟨ u − u⋆ ; K[u − u⋆] ⟩ − ½ ⟨ u⋆ ; K[u⋆] ⟩,        (7.70)
where we used linearity, along with the fact that K is self-adjoint, to identify the terms
⟨ u ; K[u⋆] ⟩ = ⟨ u⋆ ; K[u] ⟩. Since K > 0 is positive definite, the first term on the right
hand side of (7.70) is always ≥ 0; moreover, it equals its minimal value 0 if and only if
u = u⋆. On the other hand, the second term does not depend upon u at all, and hence is
a constant. Therefore, to minimize p(u) we must make the first term as small as possible,
which is accomplished by setting u = u⋆.
Q.E.D.
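In the finite-dimensional matrix setting, this minimization principle is easy to verify computationally. The following sketch (a rough illustration, assuming NumPy and SciPy; the particular matrices are arbitrary) minimizes the quadratic function numerically and compares the result with the solution of K u = f.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
K = A.T @ A                                   # a positive definite Gram matrix
f = rng.standard_normal(4)

p = lambda u: 0.5 * u @ K @ u - u @ f         # the quadratic function (7.69) in matrix form
u_min = minimize(p, np.zeros(4)).x            # numerical minimizer (BFGS by default)
u_star = np.linalg.solve(K, f)                # the solution of K u = f

print(np.allclose(u_min, u_star, atol=1e-3))  # the minimizer agrees with the solution, to optimizer tolerance
```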
Remark : For linear functions given by matrix multiplication, positive definiteness
automatically implies invertibility, and so the linear system K u = f has a solution for
every right hand side. This is no longer necessarily true when K is a positive definite
operator on an infinite-dimensional function space. Therefore, the existence of a solution
or minimizer is a significant issue. And, in fact, many modern analytical existence results
rely on such minimization principles.
Theorem 7.61. Suppose L: V → W is a linear map between inner product spaces
with ker L = {0} and adjoint map L∗: W → V. Let K = L∗ ∘ L: V → V be the associated
positive definite operator. If f ∈ rng K, then the quadratic function
p(u) = ½ ‖ L[u] ‖² − ⟨ u ; f ⟩        (7.71)
has a unique minimizer, namely the solution u = u⋆ to the linear system K[u] = f.
In particular, for the weighted matrix inner products of Example 7.56, where K = A∗ A = M⁻¹ Aᵀ C A, the minimizer of (7.71) is the solution to the linear system
K u = M⁻¹ Aᵀ C A u = f .
This also follows from our earlier finite-dimensional minimization Theorem 4.1.
This section is a preview of things to come, but the full implications will require us to
develop more analytical expertise. In Chapters 11, 15 and 18 , we will find that the most
important minimization principles for characterizing solutions to the linear boundary value
problems of physics and engineering all arise through this general, abstract construction.
Chapter 8
Eigenvalues
So far, our applications have concentrated on statics: unchanging equilibrium configurations of physical systems, such as mass/spring chains, circuits, and structures, that are
modeled by linear systems of algebraic equations. It is now time to allow motion in our
universe. In general, a dynamical system refers to the (differential) equations governing
the temporal behavior of some physical system: mechanical, electrical, chemical, fluid, etc.
Our immediate goal is to understand the behavior of the simplest class of linear dynamical
systems: first order autonomous linear systems of ordinary differential equations.
As always, complete analysis of the linear situation is an essential prerequisite to making
progress in the more complicated nonlinear realm.
We begin with a very quick review of the scalar case, whose solutions are exponential
functions. Substituting a similar exponential solution ansatz into the system leads us
immediately to the equations defining the eigenvalues and eigenvectors of the coefficient
matrix. Eigenvalues and eigenvectors are of absolutely fundamental importance in both
the mathematical theory and a very wide range of applications, including iterative systems
and numerical solution methods. Thus, to continue we need to gain a proper grounding in
their basic theory and computation.
The present chapter develops the most important properties of eigenvalues and eigenvectors; the applications to dynamical systems will appear in Chapter 9, while applications
to iterative systems and numerical methods form the topic of Chapter 10. Extensions of the
eigenvalue concept to differential operators acting on infinite-dimensional function space,
of essential importance for solving linear partial differential equations modelling continuous
dynamical systems, will be covered in later chapters.
Each square matrix has a collection of one or more complex scalars, called eigenvalues, and associated vectors, called eigenvectors.
Roughly speaking, the eigenvectors indicate directions of pure stretch and the eigenvalues the amount of stretching. Most matrices are complete, meaning that their (complex)
eigenvectors form a basis of the underlying vector space. When written in the eigenvector
basis, the matrix assumes a very simple diagonal form, and the analysis of its properties
becomes extremely simple. A particularly important class consists of the symmetric matrices,
whose eigenvectors form an orthogonal basis of R n ; in fact, this is by far the most common
way for orthogonal bases to appear. Incomplete matrices are trickier, and we relegate
them and their associated non-diagonal Jordan canonical form to the final section. The
numerical computation of eigenvalues and eigenvectors is a challenging issue, and must
be deferred until Section 10.6. Unless you are prepared to consult that section now, in
order to solve the computer-based problems in this chapter, you will need to make use of
a program that can accurately compute eigenvalues and eigenvectors of matrices.
See the footnote in Chapter 7 for an explanation of the term ansatz, or inspired guess.
A non-square matrix A does not have eigenvalues; however, we have already made
extensive use of the associated square Gram matrix K = AT A. The square roots of the
eigenvalues of K serve to define the singular values of A. Singular values and principal
component analysis are now used in an increasingly broad range of modern applications,
including statistical analysis, image processing, semantics, language and speech recognition, and learning theory. The singular values are used to define the condition number of
a matrix, which indicates the degree of difficulty of accurately solving the associated linear
system.
8.1. First Order Linear Systems of Ordinary Differential Equations
The Scalar Case
The simplest dynamical system is a single scalar, first order, homogeneous, linear ordinary differential equation
du/dt = a u.        (8.1)
Here a ∈ R is a real constant, while the unknown u(t) is a scalar function. As you learned
in first year calculus, the general solution to (8.1) is an exponential function
u(t) = c e^{a t}.        (8.2)
(8.2)
(8.3)
and so
c = b e a t0 .
We conclude that
u(t) = b ea(tt0 ) .
(8.4)
is the unique solution to the scalar initial value problem (8.1), (8.3).
Figure 8.1.   Solutions to du/dt = a u, in the three cases a < 0, a = 0, a > 0.
Example 8.1. The radioactive decay of an isotope, say Uranium 238, is governed
by the differential equation
du/dt = − γ u.        (8.5)
Here u(t) denotes the amount of the isotope remaining at time t, and the coefficient
γ > 0 governs the decay rate. The solution is given by an exponentially decaying function
u(t) = c e^{− γ t}, where c = u(0) is the initial amount of radioactive material.
The half-life t⋆ is the time it takes for half of a sample to decay, that is, when u(t⋆) =
½ u(0). To determine t⋆, we solve the algebraic equation
e^{− γ t⋆} = ½,        so that        t⋆ = (log 2)/γ .        (8.6)
At each integer multiple n t⋆ of the half-life, exactly half of the remaining isotope has decayed, i.e.,
u(n t⋆) = 2^{−n} u(0).
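As a quick sanity check on the half-life formula (8.6), the short script below (illustrative, assuming NumPy; the decay rate and initial amount are arbitrary) evaluates the decay law at successive multiples of t⋆.

```python
import numpy as np

gamma = 0.3                              # arbitrary decay rate, in units of 1/time
u0 = 100.0                               # initial amount of the isotope
t_half = np.log(2) / gamma               # half-life t* = (log 2)/gamma, as in (8.6)

u = lambda t: u0 * np.exp(-gamma * t)    # the solution u(t) = u(0) e^{-gamma t}
for n in range(4):
    print(n, u(n * t_half))              # 100, 50, 25, 12.5: halved at each multiple of t*
```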
Let us make some elementary, but pertinent observations about this simple linear
dynamical system. First of all, since the equation is homogeneous, the zero function
u(t) ≡ 0 (corresponding to c = 0) is a constant solution, known as an equilibrium solution
or fixed point, since it does not depend on t. If the coefficient a > 0 is positive, then the
solutions (8.2) are exponentially growing (in absolute value) as t → + ∞. This implies that
the zero equilibrium solution is unstable. The initial condition u(t₀) = 0 produces the zero
solution, but if we make a tiny error (either physical, numerical, or mathematical) in the
initial data, say u(t₀) = ε, then the solution u(t) = ε e^{a (t − t₀)} will eventually get very far
away from equilibrium. More generally, any two solutions with very close, but not equal,
initial data will eventually become arbitrarily far apart: | u₁(t) − u₂(t) | → ∞ as t → ∞.
One consequence is the inherent difficulty in accurately computing the long time behavior
of the solution, since small numerical errors will eventually have very large effects.
On the other hand, if a < 0, the solutions are exponentially decaying in time. In this
case, the zero solution is stable, since a small error in the initial data will have a negligible
effect on the solution. In fact, the zero solution is globally asymptotically stable. The
phrase asymptotically stable implies that solutions that start out near zero eventually
return to it; more specifically, if u(t₀) = ε is small, then u(t) → 0 as t → ∞. The adjective
globally implies that this happens no matter how large the initial data is. In fact, for
a linear system, the stability (or instability) of an equilibrium solution is always a global
phenomenon.
The borderline case is when a = 0. Then all the solutions to (8.1) are constant. In this
case, the zero solution is stable, indeed globally stable, but not asymptotically stable.
The solution to the initial value problem u(t₀) = ε is u(t) ≡ ε. Therefore, a solution that
starts out near equilibrium will remain near, but will not asymptotically return. The three
qualitatively different possibilities are illustrated in Figure 8.1.
First Order Dynamical Systems
The simplest class of dynamical systems consists of n first order ordinary differential
equations for n unknown functions,
du₁/dt = f₁(t, u₁, . . . , uₙ),        . . .        duₙ/dt = fₙ(t, u₁, . . . , uₙ),
which depend on a scalar variable t ∈ R, which we usually view as time. We will often
write the system in the equivalent vector form
du/dt = f(t, u).        (8.7)
The vector-valued solution u(t) = ( u₁(t), . . . , uₙ(t) )ᵀ serves to parametrize a curve in Rⁿ,
called a solution trajectory. A dynamical system is called autonomous if the time variable
t does not appear explicitly on the right hand side, and so the system has the form
du/dt = f(u).        (8.8)
Dynamical systems of ordinary differential equations appear in an astonishing variety of
applications, and have been the focus of intense research activity since the early days of
calculus.
We shall concentrate most of our attention on the very simplest case: a homogeneous,
linear, autonomous dynamical system
du/dt = A u,        (8.9)
in which A is a constant n × n matrix. In full detail, the system consists of n linear ordinary
differential equations
du₁/dt = a₁₁ u₁ + a₁₂ u₂ + · · · + a₁ₙ uₙ ,
du₂/dt = a₂₁ u₁ + a₂₂ u₂ + · · · + a₂ₙ uₙ ,
        ⋮                                                        (8.10)
duₙ/dt = aₙ₁ u₁ + aₙ₂ u₂ + · · · + aₙₙ uₙ ,
involving n unknown functions u1 (t), u2 (t), . . . , un (t). In the autonomous case, the coefficients aij are assumed to be (real) constants. We seek not only to develop basic solution
techniques for such dynamical systems, but to also understand their behavior from both a
qualitative and quantitative standpoint.
Drawing our inspiration from the exponential solution formula (8.2) in the scalar case,
let us investigate whether the vector system has any solutions of a similar exponential form
u(t) = e^{λ t} v,        (8.11)
in which λ is a constant scalar and v is a constant vector. Differentiating,
du/dt = d/dt ( e^{λ t} v ) = λ e^{λ t} v.
On the other hand, since e^{λ t} is a scalar, it commutes with matrix multiplication, and so
A u = A e^{λ t} v = e^{λ t} A v.
Equating these two expressions and canceling the common nonzero scalar factor e^{λ t}, we conclude that u(t) = e^{λ t} v solves the system du/dt = A u if and only if v satisfies
A v = λ v.        (8.12)
Definition 8.2. A scalar λ is called an eigenvalue of the matrix A if there is a nonzero vector v ≠ 0, called an eigenvector, that satisfies the eigenvalue equation (8.12).
The eigenvalue equation (8.12) can be rewritten in the equivalent form
(A − λ I) v = 0,        (8.13)
where I is the identity matrix of the correct size. Now, for given λ, equation (8.13) is a
homogeneous linear system for v, and always has the trivial zero solution v = 0. But we
are specifically seeking a nonzero solution! According to Theorem 1.45, a homogeneous
linear system has a nonzero solution v ≠ 0 if and only if its coefficient matrix, which in
this case is A − λ I, is singular. This observation is the key to resolving the eigenvector
equation.
Theorem 8.3. A scalar λ is an eigenvalue of the n × n matrix A if and only if
the matrix A − λ I is singular, i.e., of rank < n. The corresponding eigenvectors are the
nonzero solutions to the eigenvalue equation (A − λ I) v = 0.
We know a number of ways to characterize singular matrices, including the determinantal criterion given in Theorem 1.50. Therefore, the following result is an immediate
corollary of Theorem 8.3.
Proposition 8.4. A scalar λ is an eigenvalue of the matrix A if and only if λ is a
solution to the characteristic equation
det(A − λ I) = 0.        (8.14)
In practice, when finding eigenvalues and eigenvectors by hand, one first solves the
characteristic equation (8.14). Then, for each eigenvalue λ, one uses standard linear algebra
methods, i.e., Gaussian elimination, to solve the corresponding linear system (8.13) for the
eigenvector v.
Example 8.5. Consider the 2 × 2 matrix
A =
[ 3  1 ]
[ 1  3 ] .
We compute the determinant in the characteristic equation using (1.34):
det(A − λ I) = det
[ 3 − λ     1   ]
[   1     3 − λ ]
= (3 − λ)² − 1 = λ² − 6 λ + 8.
Note that it is not legal to write (8.13) in the form (A − λ) v = 0, since we do not know how
to subtract a scalar λ from a matrix A. Worse, if you type A − λ in Matlab, it will subtract λ
from all the entries of A, which is not what we are after!
The characteristic equation λ² − 6 λ + 8 = 0 has the two roots λ₁ = 4 and λ₂ = 2, which are the eigenvalues of A. For the first eigenvalue, the eigenvector equation (8.13) is
(A − 4 I) v =
[ −1   1 ] [ x ]     [ 0 ]
[  1  −1 ] [ y ]  =  [ 0 ] ,        or        − x + y = 0,   x − y = 0.
The general solution is x = y = a, so
v =
[ a ]         [ 1 ]
[ a ]  =  a  [ 1 ] ,
where a is an arbitrary scalar. Only the nonzero solutions count as eigenvectors, and so
the eigenvectors for the eigenvalue λ₁ = 4 must have a ≠ 0, i.e., they are all nonzero scalar
multiples of the basic eigenvector v₁ = ( 1, 1 )ᵀ.
Remark: In general, if v is an eigenvector of A for the eigenvalue λ, then so is any
nonzero scalar multiple of v. In practice, we only distinguish linearly independent eigenvectors.
Thus, in this example, we shall say v₁ = ( 1, 1 )ᵀ is the eigenvector corresponding
to the eigenvalue λ₁ = 4, when we really mean that the eigenvectors for λ₁ = 4 consist of
all nonzero scalar multiples of v₁.
Similarly, for the second eigenvalue λ₂ = 2, the eigenvector equation is
(A − 2 I) v =
[ 1  1 ] [ x ]     [ 0 ]
[ 1  1 ] [ y ]  =  [ 0 ] .
The general solution is x = − y, and so the eigenvectors are the nonzero scalar multiples of v₂ = ( −1, 1 )ᵀ. In summary, the eigenvalues and basic eigenvectors of this matrix are
λ₁ = 4,   v₁ = ( 1, 1 )ᵀ,        λ₂ = 2,   v₂ = ( −1, 1 )ᵀ.
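For comparison, the same eigenvalues and eigenvectors can be obtained with a numerical linear algebra package. The snippet below (illustrative, assuming NumPy; the ordering and normalization of the output may differ) checks the hand computation of Example 8.5.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

lam, V = np.linalg.eig(A)       # eigenvalues and unit-norm eigenvectors (as columns)
print(lam)                       # [4. 2.]
print(V)                         # columns proportional to (1, 1) and (-1, 1)

for l, v in zip(lam, V.T):       # verify the eigenvalue equation A v = lambda v
    print(np.allclose(A @ v, l * v))
```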
Example 8.6. Consider the 3 × 3 matrix
A =
[ 0  −1  −1 ]
[ 1   2   1 ]
[ 1   1   2 ] .
If, at this stage, you end up with a linear system with only the trivial zero solution, you've
done something wrong! Either you don't have a correct eigenvalue (maybe you made a mistake
setting up and/or solving the characteristic equation), or you've made an error solving the
homogeneous eigenvector system.
Using the formula (1.82) for a 3 × 3 determinant, we compute the characteristic equation
0 = det(A − λ I) = det
[ − λ     − 1      − 1   ]
[   1    2 − λ       1   ]
[   1       1     2 − λ  ]
= − λ (2 − λ)² − 1 − 1 + (2 − λ) + λ + (2 − λ) = − λ³ + 4 λ² − 5 λ + 2.
The characteristic polynomial factors as − (λ − 1)² (λ − 2), and so A has a double eigenvalue λ₁ = 1 and a simple eigenvalue λ₂ = 2.
The eigenvector equation (8.13) for the double eigenvalue λ₁ = 1 is
(A − I) v =
[ −1  −1  −1 ] [ x ]     [ 0 ]
[  1   1   1 ] [ y ]  =  [ 0 ]
[  1   1   1 ] [ z ]     [ 0 ] .
The general solution
v =
[ − a − b ]          [ −1 ]          [ −1 ]
[     a   ]   =   a [  1 ]   +   b [  0 ]
[     b   ]          [  0 ]          [  1 ]
depends upon two free variables, y = a and z = b. Any nonzero solution forms a valid
eigenvector for the eigenvalue λ₁ = 1, and so the general eigenvector is any non-zero linear
combination of the two basis eigenvectors v₁ = ( −1, 1, 0 )ᵀ, ṽ₁ = ( −1, 0, 1 )ᵀ.
On the other hand, the eigenvector equation for the simple eigenvalue λ₂ = 2 is
(A − 2 I) v =
[ −2  −1  −1 ] [ x ]     [ 0 ]
[  1   0   1 ] [ y ]  =  [ 0 ]
[  1   1   0 ] [ z ]     [ 0 ] .
The general solution
v =
[ − a ]          [ −1 ]
[   a ]   =   a [  1 ]
[   a ]          [  1 ]
depends on a single free variable z = a, and so the eigenvectors for λ₂ = 2 are the nonzero scalar multiples of v₂ = ( −1, 1, 1 )ᵀ.
In summary, the eigenvalues and (basis) eigenvectors for this matrix are
λ₁ = 1,   v₁ = ( −1, 1, 0 )ᵀ,   ṽ₁ = ( −1, 0, 1 )ᵀ,        λ₂ = 2,   v₂ = ( −1, 1, 1 )ᵀ.        (8.15)
In general, given an eigenvalue λ, the corresponding eigenspace V_λ ⊂ Rⁿ is the subspace
spanned by all its eigenvectors. Equivalently, the eigenspace is the kernel
V_λ = ker(A − λ I).        (8.16)
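Numerically, an eigenspace can be computed as a null space. The sketch below (illustrative, assuming NumPy and SciPy) recovers the eigenspace dimensions for the matrix of Example 8.6: two for the double eigenvalue λ = 1, one for the simple eigenvalue λ = 2.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[0.0, -1.0, -1.0],
              [1.0,  2.0,  1.0],
              [1.0,  1.0,  2.0]])

for lam in (1.0, 2.0):
    V_lam = null_space(A - lam * np.eye(3))   # eigenspace V_lambda = ker(A - lambda I)
    print(lam, V_lam.shape[1])                # prints 1.0 2 and 2.0 1
```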
Example 8.7. The characteristic equation of the matrix
A =
[ 1   2   1 ]
[ 1  −1   1 ]
[ 2   0   1 ]
is
0 = det(A − λ I) = − λ³ + λ² + 5 λ + 3 = − (λ + 1)² (λ − 3),
and so A has a double eigenvalue λ₁ = −1 and a simple eigenvalue λ₂ = 3. To find the eigenvectors for the double eigenvalue, we solve the homogeneous linear system with coefficient matrix
A − λ₁ I = A + I =
[ 2  2  1 ]
[ 1  0  1 ]
[ 2  0  2 ] ,
which has rank 2. Its kernel is therefore only one-dimensional, spanned by the single eigenvector v₁ = ( −2, 1, 2 )ᵀ, even though λ₁ = −1 is a double eigenvalue. The simple eigenvalue λ₂ = 3 has eigenvector v₂ = ( 2, 1, 2 )ᵀ, found by solving (A − 3 I) v = 0. Thus, the eigenvalues and eigenvectors are
λ₁ = −1,   v₁ = ( −2, 1, 2 )ᵀ,        λ₂ = 3,   v₂ = ( 2, 1, 2 )ᵀ.
Example 8.8. Finally, consider the matrix
A =
[ 1  2   0 ]
[ 0  1  −2 ]
[ 2  2  −1 ] .
The characteristic equation is
0 = det(A − λ I) = − λ³ + λ² − 3 λ − 5 = − (λ + 1) (λ² − 2 λ + 5).
The linear factor yields the eigenvalue −1. The quadratic factor leads to two complex
roots, 1 + 2 i and 1 − 2 i, which can be obtained via the quadratic formula. Hence A has
one real and two complex eigenvalues:
λ₁ = −1,        λ₂ = 1 + 2 i ,        λ₃ = 1 − 2 i .
Complex eigenvalues are as important as real eigenvalues, and we need to be able to handle
them too. To find the corresponding eigenvectors, which will also be complex, we need
to solve the usual eigenvalue equation (8.13), which is now a complex homogeneous linear
system. For example, the eigenvector(s) for λ₂ = 1 + 2 i are found by solving
(A − (1 + 2 i) I) v =
[ − 2 i      2           0      ] [ x ]     [ 0 ]
[    0     − 2 i        − 2     ] [ y ]  =  [ 0 ]
[    2       2      − 2 − 2 i   ] [ z ]     [ 0 ] .
This linear system can be solved by Gaussian elimination (with complex pivots). A simpler
approach is to work directly: the first equation − 2 i x + 2 y = 0 tells us that y = i x, while
the second equation − 2 i y − 2 z = 0 says z = − i y = x. If we trust our calculations
so far, we do not need to solve the final equation 2 x + 2 y + (− 2 − 2 i) z = 0, since we
know that the coefficient matrix is singular and hence it must be a consequence of the first
two equations. (However, it does serve as a useful check on our work.) So, the general
solution v = ( x, i x, x )ᵀ is an arbitrary constant multiple of the complex eigenvector
v₂ = ( 1, i , 1 )ᵀ.
Summarizing, the matrix under consideration has three complex eigenvalues and three
corresponding eigenvectors, each unique up to (complex) scalar multiple:
λ₁ = −1,   v₁ = ( 1, −1, −1 )ᵀ,        λ₂ = 1 + 2 i ,   v₂ = ( 1, i , 1 )ᵀ,        λ₃ = 1 − 2 i ,   v₃ = ( 1, − i , 1 )ᵀ.
Note that the third complex eigenvalue is the complex conjugate of the second, and the
eigenvectors are similarly related. This is indicative of a general fact for real matrices:
Proposition 8.9. If A is a real matrix with a complex eigenvalue λ = μ + i ν and
corresponding complex eigenvector v = x + i y, then the complex conjugate λ̄ = μ − i ν is
also an eigenvalue, with complex conjugate eigenvector v̄ = x − i y.
Proof: First take complex conjugates of the eigenvalue equation (8.12):
Ā v̄ = λ̄ v̄.
Using the fact that a real matrix is unaffected by conjugation, so Ā = A, we conclude that
A v̄ = λ̄ v̄,        (8.17)
which is the eigenvalue equation for the eigenvalue λ̄ and the eigenvector v̄.
Q.E.D.
As a consequence, when dealing with real matrices, one only needs to compute the
eigenvectors for one of each complex conjugate pair of eigenvalues. This observation effectively halves the amount of work in the unfortunate event that we are confronted with
complex eigenvalues.
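This conjugate pairing is easy to observe numerically. The following sketch (illustrative, assuming NumPy; the eigenvalue ordering returned by the solver may differ) uses the matrix of Example 8.8 and checks that conjugating an eigenvector for λ produces an eigenvector for λ̄.

```python
import numpy as np

A = np.array([[1.0, 2.0,  0.0],
              [0.0, 1.0, -2.0],
              [2.0, 2.0, -1.0]])

lam, V = np.linalg.eig(A)
print(np.round(lam, 6))        # approximately -1, 1+2j, 1-2j (in some order)

k = np.argmax(lam.imag)        # pick a genuinely complex eigenvalue
v = V[:, k]
print(np.allclose(A @ v.conj(), lam[k].conj() * v.conj()))   # conj(v) is an eigenvector for conj(lambda)
```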
Remark : The reader may recall that we said one should never use determinants in
practical computations. So why have we reverted to using determinants to find eigenvalues?
The truthful answer is that the practical computation of eigenvalues and eigenvectors never
resorts to the characteristic equation! The method is fraught with numerical traps and
inefficiencies when (a) computing the determinant leading to the characteristic equation,
then (b) solving the resulting polynomial equation, which is itself a nontrivial numerical
problem, [30], and, finally, (c) solving each of the resulting linear eigenvector systems.
Indeed, if we only know an approximation λ̃ to the true eigenvalue λ, the approximate
eigenvector system (A − λ̃ I) v = 0 has a nonsingular coefficient matrix, and hence only
admits the trivial solution v = 0, which does not even qualify as an eigenvector! Nevertheless,
the characteristic equation does give us important theoretical insight into the structure
of the eigenvalues of a matrix, and can be used on small, e.g., 2 × 2 and 3 × 3, matrices,
when exact arithmetic is employed. Numerical algorithms for computing eigenvalues and
eigenvectors are based on completely different ideas, and will be discussed in Section 10.6.
Basic Properties of Eigenvalues
If A is an n × n matrix, then its characteristic polynomial is
pA(λ) = det(A − λ I) = cₙ λⁿ + cₙ₋₁ λⁿ⁻¹ + · · · + c₁ λ + c₀ .        (8.18)
The fact that pA(λ) is a polynomial of degree n is a consequence of the general determinantal formula (1.81). Indeed, every term is plus or minus a product of matrix entries
containing one from each row and one from each column. The term corresponding to the
identity permutation is obtained by multiplying the diagonal entries together, which, in
this case, is
(a₁₁ − λ)(a₂₂ − λ) · · · (aₙₙ − λ) = (−1)ⁿ λⁿ + (−1)ⁿ⁻¹ (a₁₁ + a₂₂ + · · · + aₙₙ) λⁿ⁻¹ + · · · .        (8.19)
All of the other terms in the determinantal expansion contain at most n − 2 diagonal factors, and hence only affect the coefficients of λⁿ⁻² and lower powers. Therefore, the two leading coefficients of the characteristic polynomial are
cₙ = (−1)ⁿ,        cₙ₋₁ = (−1)ⁿ⁻¹ (a₁₁ + a₂₂ + · · · + aₙₙ) = (−1)ⁿ⁻¹ tr A,        (8.20)
where tr A, the sum of its diagonal entries, is called the trace of the matrix A. The other
coefficients cₙ₋₂, . . . , c₁ in (8.18) are more complicated combinations of the entries of A.
However, setting λ = 0 implies pA(0) = det A = c₀, and hence the constant term equals the
determinant of the matrix. In particular, if
A =
[ a  b ]
[ c  d ]
is a 2 × 2 matrix, its characteristic polynomial has the form
pA(λ) = det(A − λ I) = det
[ a − λ      b    ]
[   c      d − λ  ]
= λ² − (a + d) λ + (a d − b c) = λ² − (tr A) λ + (det A).        (8.21)
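For a concrete check of (8.21), the snippet below (illustrative, assuming NumPy) computes the characteristic polynomial coefficients of the 2 × 2 matrix from Example 8.5; note that numpy.poly returns the monic coefficients of det(λ I − A), which for a 2 × 2 matrix coincides with λ² − (tr A) λ + det A.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

print(np.poly(A))                      # [ 1. -6.  8.]  i.e.  lambda^2 - 6 lambda + 8
print(np.trace(A), np.linalg.det(A))   # 6.0 and (approximately) 8.0
```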
As a result of these considerations, the characteristic equation of an n × n matrix A
is a polynomial equation of degree n, namely pA(λ) = 0. According to the Fundamental
Theorem of Algebra (see Corollary 16.63), every (complex) polynomial of degree n can be
completely factored:
pA(λ) = (−1)ⁿ (λ − λ₁)(λ − λ₂) · · · (λ − λₙ).        (8.22)
The complex numbers λ₁, . . . , λₙ, some of which may be repeated, are the roots of the
characteristic equation pA(λ) = 0, and hence the eigenvalues of the matrix A. Therefore,
we immediately conclude:
Theorem 8.10. An n × n matrix A has at least one and at most n distinct complex
eigenvalues.
Most n × n matrices, meaning those for which the characteristic polynomial factors
into n distinct factors, have exactly n complex eigenvalues. More generally, an eigenvalue
λⱼ is said to have multiplicity m if the factor (λ − λⱼ) appears exactly m times in the
factorization (8.22) of the characteristic polynomial. An eigenvalue is simple if it has
multiplicity 1. In particular, A has n distinct eigenvalues if and only if all its eigenvalues are
simple. In all cases, when the eigenvalues are counted in accordance with their multiplicity,
every n × n matrix has a total of n, possibly repeated, eigenvalues.
An example of a matrix with just one eigenvalue, of multiplicity n, is the n × n identity
matrix I, whose only eigenvalue is λ = 1. In this case, every nonzero vector in Rⁿ is an
eigenvector of the identity matrix, and so the eigenspace is all of Rⁿ. At the other extreme,
the bidiagonal Jordan block matrix
J =
[ λ                    ]
[ 1   λ                ]
[     1   λ            ]
[         ⋱   ⋱        ]
[             1   λ    ]        (8.23)
also has only one eigenvalue, λ, again of multiplicity n. But in this case, J has only
one eigenvector (up to scalar multiple), which is the standard basis vector eₙ, and so its
eigenspace is one-dimensional.
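The following sketch (illustrative, assuming NumPy and SciPy; the size and eigenvalue are arbitrary) builds a 4 × 4 Jordan block of the form (8.23) and confirms that, although the eigenvalue has multiplicity 4, the eigenspace is only one-dimensional and is spanned by eₙ. Note that the computed eigenvalues of such a defective matrix are numerically sensitive, so they come out only approximately equal to λ.

```python
import numpy as np
from scipy.linalg import null_space

n, lam = 4, 2.0
J = lam * np.eye(n) + np.diag(np.ones(n - 1), k=-1)   # lambda on the diagonal, 1's just below it

print(np.round(np.linalg.eigvals(J), 3))   # approximately lam, repeated n times
E = null_space(J - lam * np.eye(n))        # the eigenspace ker(J - lam I)
print(E.shape[1])                          # 1: only one independent eigenvector
print(np.allclose(np.abs(E[:, 0]), np.eye(n)[:, -1]))   # and it is a multiple of e_n
```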
Remark: If λ is a complex eigenvalue of multiplicity k for the real matrix A, then its
complex conjugate λ̄ also has multiplicity k. This is because complex conjugate roots of a
real polynomial necessarily appear with identical multiplicities.
Remark: If n ≤ 4, then one can, in fact, write down an explicit formula for the
solution to a polynomial equation of degree n, and hence explicit (but not particularly
helpful) formulae for the eigenvalues of general 2 × 2, 3 × 3 and 4 × 4 matrices. As soon
as n ≥ 5, there is no explicit formula (at least in terms of radicals), and so one must
usually resort to numerical approximations. This remarkable and deep algebraic result
was proved by the young Norwegian mathematician Niels Henrik Abel in the early part of
the nineteenth century, [57].
If we explicitly multiply out the factored product (8.22) and equate the result to the
characteristic polynomial (8.18), we find that its coefficients c₀, c₁, . . . , cₙ₋₁ can be written
as certain polynomials of the roots, known as the elementary symmetric polynomials. The
first and last are of particular importance:
c₀ = λ₁ λ₂ · · · λₙ ,        cₙ₋₁ = (−1)ⁿ⁻¹ (λ₁ + λ₂ + · · · + λₙ).        (8.24)
Comparison with our previous formulae for the coefficients c₀ and cₙ₋₁ leads us to the
following useful result.
Proposition 8.11. The sum of the eigenvalues of a matrix equals its trace:
λ₁ + λ₂ + · · · + λₙ = tr A = a₁₁ + a₂₂ + · · · + aₙₙ .        (8.25)
The product of the eigenvalues equals its determinant:
λ₁ λ₂ · · · λₙ = det A.        (8.26)
Remark : For repeated eigenvalues, one must add or multiply them in the formulae
(8.25), (8.26) according to their multiplicity.
Example 8.12. The matrix
A =
[ 1   2   1 ]
[ 1  −1   1 ]
[ 2   0   1 ]
considered in Example 8.7 has trace and determinant
tr A = 1,        det A = 3.
These fix, respectively, the coefficient of λ² and the constant term in the characteristic
equation. This matrix has two distinct eigenvalues: −1, which is a double eigenvalue, and
3, which is simple. For this particular matrix, the formulae (8.25), (8.26) become
1 = tr A = (−1) + (−1) + 3,        3 = det A = (−1)(−1)(3).
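A numerical cross-check of (8.25) and (8.26) on the same matrix (illustrative, assuming NumPy; the printed eigenvalues are only approximate, since the double eigenvalue is numerically sensitive):

```python
import numpy as np

A = np.array([[1.0,  2.0, 1.0],
              [1.0, -1.0, 1.0],
              [2.0,  0.0, 1.0]])

lam = np.linalg.eigvals(A)
print(np.round(lam, 6))                          # -1 (twice) and 3, up to rounding
print(np.isclose(lam.sum(),  np.trace(A)))       # sum of eigenvalues = trace, (8.25)
print(np.isclose(lam.prod(), np.linalg.det(A)))  # product of eigenvalues = determinant, (8.26)
```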
Applying the matrix A to a vanishing linear combination c₁ v₁ + · · · + cₖ vₖ = 0 of the first k eigenvectors produces
c₁ λ₁ v₁ + · · · + cₖ₋₁ λₖ₋₁ vₖ₋₁ + cₖ λₖ vₖ = 0.        (8.27)
On the other hand, if we just multiply the original equation by λₖ, we also have
c₁ λₖ v₁ + · · · + cₖ₋₁ λₖ vₖ₋₁ + cₖ λₖ vₖ = 0.
Subtracting this from the previous equation, the final terms cancel and we are left with
the equation
c₁ (λ₁ − λₖ) v₁ + · · · + cₖ₋₁ (λₖ₋₁ − λₖ) vₖ₋₁ = 0.
This is a vanishing linear combination of the first k − 1 eigenvectors, and so, by our
induction hypothesis, can only happen if all the coeffici