
Calculus
of
Vector Functions

Third Edition
RICHARD E. WILLIAMSON
Department of Mathematics
Dartmouth College

RICHARD H. CROWELL
Department of Mathematics
Dartmouth College

HALE F. TROTTER
Department of Mathematics
Princeton University

Prentice-Hall, Inc., Englewood Cliffs, New Jersey


© 1972, 1968, 1962 by Prentice-Hall, Inc.
Englewood Cliffs, N.J.

All rights reserved. No part of this book may be


reproduced in any form or by any means without
permission in writing from the publisher.

10 9 8 7

ISBN: 0-13-112367-X

Library of Congress Catalog Card Number 75-167788

Printed in the United States of America

PRENTICE-HALL INTERNATIONAL, INC., London


PRENTICE-HALL OF AUSTRALIA, PTY. LTD., Sydney
PRENTICE-HALL OF CANADA, LTD., Toronto
PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
PRENTICE-HALL OF JAPAN, INC., Tokyo
Preface

This book is an introduction to the calculus of functions of several variables


and vector calculus, following the unifying idea that calculus deals with
linear approximations to functions. A working knowledge of one-variable
calculus is the only prerequisite. The necessary linear algebra is presented
in the first two chapters.
The emphasis in this third edition is on learning and understanding
mathematical techniques, together with their applications to specific
problems. The framework of linear algebra is used both to clarify concepts
and to emphasize algebraic techniques as useful tools. We have given
precise statements of all definitions and theorems, and have included
complete proofs of practically everything in the book. The proofs are,
however, meant to be studied only insofar as they help understanding.
With careful attention to the examples, a person can go through the text
intelligently without studying any proofs at all. The book is designed to
be flexible, thereby allowing an instructor to select a level of theoretical
emphasis appropriate to the interests and abilities of his class. It is unlikely
that anyone would follow either extreme course: covering all proofs or
including none.
In this edition, the material on linear algebra has been expanded and
reorganized as two chapters. The first covers vectors and linear functions
in Rⁿ and contains nearly all the basic algebraic ideas used in the calculus.
Chapter 2 includes a complete discussion of the solution of linear systems
with a section on applications, presents the fundamental ideas of abstract
vector spaces and dimension, and contains a brief but complete treatment
of linear differential equations with constant coefficients; a section on
complex vector spaces shows their usefulness in this connection. Other
topics such as orthonormal bases and eigenvectors are also treated briefly.
Material involving numerical calculation has been added in several
places. There are sections on Newton's method for functions of several
variables, on numerical estimation of implicitly defined functions, and on
numerical approximation of definite integrals. This material goes particu-
larly well with the use of an automatic computer. We have included
some exercises in these sections which cannot reasonably be done by
hand, but which students with access to a computer can do by writing
fairly simple programs. Of course we have also included enough numerical

exercises that do not require a computer.


We want to thank the people at many universities who have made
suggestions for improving the book. Their helpful interest has been most
welcome. Thanks are also due to Mrs. Dorothy Krieger and to Mr.
Arthur Wester of Prentice-Hall, whose efforts have helped to make the
book more attractive and useful.

R. E. W.

R. H. C.

H. F. T.
Possible Courses

of Study

We have tried to organize the text so that each section leads naturally
to the next, but it is by no means necessary to include all sections or to
take them up in strictly consecutive order. In particular, it is not necessary
to complete the linear algebra before starting on the calculus. (Students
who have already taken linear algebra can, of course, start with Chapter 3
and use the first two chapters for reference and review.) Everything
essential for Chapters 3 through 7 is contained in the first seven sections
of Chapter 1. (An exception is the use of facts about dimension in the
last section of Chapter 4.) The study of determinants can be postponed

until the section on change of variable in multiple integrals and the last
three sections on vector analysis, where they are really needed. On the
other hand, determinants can be used in practice for inverting small
(2-by-2 or 3-by-3) matrices or solving small systems of linear equations,
so that some may prefer to take them up earlier and postpone (or omit)
the row-reduction method given in the first section of Chapter 2. At the
cost of avoiding a few higher dimensional exercises later on, one may
even restrict discussion of matrix inversion to the trivial 2-by-2 case.
Other changes of order, such as taking up multiple integration early
in the course, should also cause no difficulty.
The sequences of section numbers listed below are minimal in the sense
that they contain the prerequisite material for later entries, but do not
contain everything that might be desirable in a typical course. Additional
sections can be added to make up a full one-year course suitable for par-
ticular needs. Experience has shown that for ordinary class use two
meetings should be used to cover an average section. An occasional short
section merits only one meeting, and some longer ones, in order to be
covered entirely, require three.

Contents

Introduction

Chapter 1  Vectors and Linearity
1  Vectors
2  Geometric interpretations
3  Matrices
4  Linear functions
5  Dot products
6  Euclidean geometry
7  Determinants
8  Determinant expansions

Chapter 2  Linear Algebra
1  Linear equations, inverse matrices
2  Some applications
3  Theory of linear equations
4  Vector spaces, subspaces, dimension
5  Linear functions
6  Differential operators
7  Complex vector spaces
8  Orthonormal bases
9  Eigenvectors
10  Coordinates

Chapter 3  Derivatives
1  Vector functions
2  Functions of one variable
3  Arc length
4  Line integrals
5  Partial derivatives
6  Vector partial derivatives
7  Limits and continuity
8  Differentials
9  Newton's method

Chapter 4  Vector Calculus
1  Directional derivatives
2  The gradient
3  The chain rule
4  Implicitly defined functions
5  Curvilinear coordinates
6  Inverse and implicit function theorems
7  Surfaces and tangents

Chapter 5  Real-Valued Functions
1  Extreme values
2  Quadratic polynomials
3  Taylor expansions
4  Taylor expansions and extreme values
5  Fourier series
6  Modified Fourier expansions
7  Heat and wave equations
8  Uniform convergence
9  Orthogonal functions

Chapter 6  Multiple Integration
1  Iterated integrals
2  Multiple integrals
3  Change of variable
4  Improper integrals
5  Estimates of integrals
6  Numerical integration

Chapter 7  Vector Field Theory
1  Green's theorem
2  Conservative vector fields
3  Surface integrals
4  Stokes's theorem
5  Gauss's theorem
6  The operators ∇, ∇×, and ∇·
7  Differential forms
8  The exterior derivative

Appendix
1  Introduction
2  The chain rule
3  Arc length formula
4  Convergence of Fourier series
5  Proof of the inverse and implicit function theorems
6  Proof of Lagrange's theorem
7  Proof of Taylor's theorem
8  Existence of the Riemann integral
9  The change-of-variable formula for integrals

Index
Introduction

A first course in calculus deals with real-valued functions of one


variable, that is, functions defined on all or part of the real number line R,
and having values in R. For example, a formula such as

y = x² + 3

yields a real number y for any real number x and so defines a function f
from R to R with (for instance) f(0) = 3, f(−2) = 7, f(√3) = 6, etc.
In this book we are concerned with functions of several variables whose
values may be real numbers or, more generally, may be m-tuples of real
numbers. For example, a pair of formulas

y₁ = √(x₁² + x₂² + x₃²)

y₂ = x₁x₂ + 5x₃

yields a pair of numbers (y₁, y₂) for any triple of numbers (x₁, x₂, x₃) and
so defines a function g from "3-dimensional space" to "2-dimensional
space." In particular, the formulas above give

g(0, 0, 0) = (0, 0),
g(1, 2, 3) = (√14, 17),
g(3, 2, 1) = (√14, 11).

We shall use 51" to stand for the set of all rt-tuples of real numbers.
1
(Jl is thus the same as '.R .) The domain of a function is on which it
the set
is defined, and the range or image is the set of values assumed by the
Introduction

function. We speak of functions "from ft" to ft


m " and write

/
31" ft"

to indicate that/is a function whose domain is a subset of ft" and whose


range is a subset of ft
m . ft" is then called the domain space of the function
and ft
m is called its range space. The terms "transformation" or "mapping"
are sometimes used instead of "function."
While functions of one variable are basic and very useful, there are
many situations whose mathematical formulation requires the more
general functions we consider here. For example, just as certain curves
in the plane can be represented as graphs of functions from R¹ to R¹, so

certain surfaces in 3-space can be represented as graphs of functions


from R² to R¹. The picture below illustrates the graph of a function
f: R² → R¹ defined by a formula

f(x, y) = z.

Other examples showing how curves and surfaces can be described
by functions are given at the beginning of Chapter 3.
Most of the problems and examples in this book involve func-
tions from ft" to ft m with values of m and n not more than 2 or 3,
since higher-dimensional problems are difficult to visualize and
often require inordinate amounts of computation. In the theoreti-
cal development we nevertheless provide formulations valid for
arbitrary dimensions. This is not an empty generality. An econo-
mist may wish to consider a mathematical model in which the
prices of a number of commodities are determined by a number of
other variables such as costs of production and demands. A civil
engineer may want to study the displacements produced in the many joints
of a complex structure by various combinations of loads applied at many
points. Now that automatic computers have made the arithmetic calcula-
tions feasible, problems involving dozens or even hundreds of variables
are being attacked and solved. Thus it is important to have techniques and
theorems that apply to functions from Rⁿ to Rᵐ for all values of m and n.
Most of the ideas studied in one-variable calculus reappear in a more
general form in multivariable calculus. For example, the problem of
finding a tangent line to the graph y = f(x) of a function of one variable
becomes the problem of finding a tangent plane to the graph z = g(x, y)
of a function of two variables. The tangent line has an equation of the
form y = mx + b, and the coefficient m is found by computing the
derivative of f. In the higher-dimensional case, the tangent plane has an
equation of the form z = px + qy + c, and, as we shall see, the coeffi-
cients p and q are given by the higher-dimensional derivative of g.
Differential calculus can be described as the technique of studying
general functions by approximating them with functions of the simple
form exemplified by y = mx + b or z = px + qy + c. In one dimension,


these "linear" functions are very simple indeed, but in higher dimensions
they are more complicated. The first two chapters of the book are
concerned with linear functions and some closely related topics that have
applications in many fields other than calculus.
The main purpose of the book, to which the later chapters are largely
devoted, is to study generalizations of derivative and integral to higher
dimensions. We shall be concerned with the relations between these ideas
and with how they can be used to solve a variety of problems. The formal
techniques of differentiation and integration from one-variable calculus
continue to be directly applicable in multivariable calculus as methods of
calculation. However the interpretations, and often the underlying ideas,
may be quite different. To get geometric insight into these interpretations
it is worth learning to visualize three-dimensional pictures. Two dimen-
sional ones come fairly readily because they are easy to draw on paper.
Learning to make perspective drawings is a big help in understanding
three-dimensional problems and relating them to the physical world.
The boldface letters x, a, etc., that are used to distinguish vectors from
numbers can be written longhand in several ways. Some possibilities are

x⃗, x̄, or x̲; capital letters or ordinary small letters can also be used in a

context in which they will not be confused with the usual notations for
matrices and numbers. The printing of a word in boldface type indicates
that the word is being defined at that point in the text.
1

Vectors and Linearity

SECTION 1
VECTORS

We denote by Rⁿ the set of all n-tuples (x₁, x₂, . . . , xₙ) of real numbers.
Boldface letters x, y, z, etc., will stand for n-tuples, while ordinary light-
face letters will stand for single real numbers. In particular we may write
x = (x, y) or x = (x, y, z) for general pairs and triples in order to save
writing subscripts.
For any two elements x = (x₁, x₂, . . . , xₙ) and y = (y₁, y₂, . . . , yₙ)
in Rⁿ, we define the sum x + y to be the n-tuple

(x₁ + y₁, x₂ + y₂, . . . , xₙ + yₙ).

For any real number r and n-tuple x = (x₁, x₂, . . . , xₙ), we define the
numerical multiple rx to be the n-tuple

(rx₁, rx₂, . . . , rxₙ).

For example, if x = (2, −1, 0, 3) and y = (0, 7, −2, 3), then

x + y = (2, 6, −2, 6)
and
3x = (6, −3, 0, 9).

We write −x for the numerical multiple (−1)x, and x − y as an abbrevia-
tion for x + (−y). We use 0 to denote an n-tuple consisting entirely of
zeros. The notation 0 is ambiguous since, for example, 0 may stand
for (0, 0) in one formula and for (0, 0, 0) in another. The ambiguity
seldom causes any confusion since in most contexts only one interpretation
makes sense. For instance, if z = (−2, 0, 3), then in the formula z + 0,
the 0 must stand for (0, 0, 0) since addition is defined only between n-
tuples with the same number of entries.
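In modern terms these componentwise definitions translate directly into code. The short Python sketch below is an illustration added here, not part of the original text; the function names are our own. It implements the sum and the numerical multiple and reproduces the example above.

```python
# Componentwise operations on n-tuples, as defined in the text.

def vec_add(x, y):
    """Sum x + y; defined only for n-tuples with the same length."""
    assert len(x) == len(y), "addition needs the same number of entries"
    return tuple(xi + yi for xi, yi in zip(x, y))

def vec_scale(r, x):
    """Numerical multiple rx."""
    return tuple(r * xi for xi in x)

x = (2, -1, 0, 3)
y = (0, 7, -2, 3)
print(vec_add(x, y))     # (2, 6, -2, 6)
print(vec_scale(3, x))   # (6, -3, 0, 9)
```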

The following formulas hold for arbitrary x, y, and z in JR." and


arbitrarynumbers r, s. They express laws for our new operations of
addition and numerical multiplication very closely analogous to the
familiar distributive, commutative, and associative laws for ordinary
addition and multiplication of numbers.

1. rx + sx = (r + s)x.
2. rx + ry = r(x + y).

3. r(sx) = (rs)x.

4. x + y = y + x.

5. (x + y) + z = x + (y + z).

6. x + 0 = x.

7. x + (-x) = 0.

These laws are all quite obvious consequences of the definitions of


our new operations and the laws of arithmetic. For illustration, we give a
formal proof of law 2.

Let x = (x₁, x₂, . . . , xₙ) and y = (y₁, y₂, . . . , yₙ), and let r be a
real number. Then

rx = (rx₁, rx₂, . . . , rxₙ)   [definition of numerical multiplication]
ry = (ry₁, ry₂, . . . , ryₙ),

so that

rx + ry = (rx₁ + ry₁, . . . , rxₙ + ryₙ)     [definition of addition]
        = (r(x₁ + y₁), . . . , r(xₙ + yₙ))   [distributive law for numbers]
        = r(x + y).

The word "vector" will be given a general definition as an element of
a vector space in Chapter 2, but for the present "vector" may be taken
to mean "element of Rⁿ" for some n. Numbers are sometimes called
scalars when emphasis
on the distinction between numbers and vectors is wanted. In physics, for
example, mass and energy may be referred to as scalar quantities in
distinction to vector quantities such as velocity or momentum. The term
scalar multiple is synonymous with what we have called numerical
multiple.
The vectors

e₁ = (1, 0, . . . , 0)
e₂ = (0, 1, 0, . . . , 0)
. . .
eₙ = (0, . . . , 0, 1)

have the property that, if x = (x₁, . . . , xₙ), then

x = x₁e₁ + · · · + xₙeₙ.

Because every element of Rⁿ can be so simply represented in this way, the
set of vectors {e₁, . . . , eₙ} is called the natural basis for Rⁿ. For example,
in R³ we have (1, 2, −7) = e₁ + 2e₂ − 7e₃. The entries in

x = (x₁, . . . , xₙ)

are then called the coordinates of x relative to the natural basis.


A sum of numerical multiples x₁e₁ + · · · + xₙeₙ is called a linear
combination of the vectors e₁, . . . , eₙ. More generally, a sum of multiples
a₁x₁ + · · · + aₙxₙ is called a linear combination of the vectors x₁, . . . , xₙ.
Thus, for example, the equation

(2, 3, 4) = 4(1, 1, 1) − 1(1, 1, 0) − 1(1, 0, 0)

shows the vector (2, 3, 4) represented as a linear combination of the
vectors (1, 1, 1), (1, 1, 0), and (1, 0, 0).
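Such a representation is easy to verify mechanically. The following Python fragment is illustrative only (the helper name is ours); it checks the displayed linear combination.

```python
# Form a1*x1 + ... + an*xn for n-tuples of equal length.
def lin_comb(coeffs, vectors):
    n = len(vectors[0])
    return tuple(sum(a * v[i] for a, v in zip(coeffs, vectors))
                 for i in range(n))

print(lin_comb([4, -1, -1], [(1, 1, 1), (1, 1, 0), (1, 0, 0)]))
# (2, 3, 4)
```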

EXERCISES

1. Given x = (3, −1, 0), y = (0, 1, 5), and z = (2, 5, −1), compute 3x, y + z,
and 4x − 2y + 3z. [Ans. (18, 9, −13).]

2. Find numbers a and b such that ax + by = (9, −1, 10), where x and y are
as in Problem 1. Is there more than one solution?

3. Show that no choice of numbers a and b can make ax + by = (3, 0, 0),
where x and y are as in Problem 1. For what value(s) of c (if any) can the
equation ax + by = (3, 0, c) be satisfied?

4. Write out proofs for (a) law 3 and (b) law 4 of Section 1, giving precise
justification for each step.

5. Verify that the set C[a, b] of all continuous real-valued functions defined on
the interval a ≤ x ≤ b is a vector space, with addition and numerical
multiplication defined by (f + g)(x) = f(x) + g(x) and (rf)(x) = rf(x).

6. Prove that the representation of a vector x in Rⁿ in terms of the natural basis
is unique. That is, show that if

x₁e₁ + · · · + xₙeₙ = y₁e₁ + · · · + yₙeₙ,

then xₖ = yₖ for k = 1, . . . , n.

7. Represent the first vector below as a linear combination of the remaining


vectors, either by inspection or by solving an appropriate system of equations.

(a) (2, 3, 4); (1,1,1), (1,2, 1), (-1,1,2).


(b) (2, -7); (1,1), (1, -1).

(c) (−2, 3); e₁, e₂.

SECTION 2
GEOMETRIC INTERPRETATIONS

Geometric representations of R¹ as a line, of R² as a plane, and of R³
as 3-dimensional space may be obtained by using coordinates. To represent
R¹ as a line, one must first specify a point on the line to be called the origin,
a unit of distance, and a direction on the line to be called positive. (The
opposite direction is then called negative.) Then a positive number x
corresponds to the point which is a distance x in the positive direction
from the origin. A negative number x corresponds to the point which is a
distance |x| from the origin in the negative direction. The number zero of
course corresponds to the origin. The number line is most often thought
of as horizontal with the positive direction to the right. With this standard
convention, we obtain the familiar Fig. 1, in which the arrow indicates
the positive direction.

Figure 1

In the plane, one takes an origin, a unit of distance, a pair of per-
pendicular lines (called the axes) through the origin, and a positive
direction on each axis. Given a vector in R², that is, a pair of numbers
(x₁, x₂), the procedure described in the preceding paragraph determines a
point p₁ on the first axis corresponding to the number x₁ and a point p₂
on the second axis corresponding to the number x₂. Then the vector
(x₁, x₂) corresponds to the point p in the plane whose projection on the
first axis is p₁ and whose projection on the second axis is p₂. The (per-
pendicular) projection of a point p on a line L is defined as the foot of the
perpendicular from p to L if p is not on L. If p is on L, then the projection
of p on L is p itself.

The conventional choice is to take the first axis horizontal with the
positive direction to the right, and the second axis vertical with the
positive direction upwards. This leads to the usual picture shown in Fig. 2.

Representing a vector by an arrow from the origin to the corresponding
point, as we have done in Fig. 2, often makes a better picture than simply
marking the point.
An obvious extension of the procedure works in three dimensions.
One takes an origin, three perpendicular axes through it, and a positive
direction on each. A vector in R³ is a triple of numbers (x₁, x₂, x₃) and
gives points p₁, p₂, and p₃ on the three axes. Then the point p, correspond-
ing to (x₁, x₂, x₃), is the one whose projections on the three axes are p₁, p₂,
and p₃.

There is no universally accepted convention for labeling the axes in


3-dimensional figures. Figure 3 illustrates the convention followed in this

book, but several other schemes are also in common use.

Figure 2 Figure 3

We have described how to set up a correspondence between vectors


(n-tuples of numbers) and points. We now consider the geometrical
interpretation of the vector operations of addition and numerical multipli-
cation. This time we represent a vector as an arrow from the origin to the
point with given coordinates. Figure 4 shows two vectors u = (ult u 2 )
and v = (»!, v 2 ). The sum in %2 is u + v = (w x + v x u 2 + v 2 ), so the ,

arrow representing u -f- v must be drawn as shown.


As Fig. 4 suggests, u + v is represented by an arrow from the origin
to the opposite corner of the parallelogram whose sides are the arrows
representing u and v. This geometric rule for adding vectors is often
referred to as the parallelogram law of addition. (To prove that this

Figure 4

law is a consequence of our definitions, one of course has to use some

theorems of geometry. For an outline of a proof, see Exercise 6(a).)


Another rule for adding vectors geometrically is illustrated in Fig. 5.
Starting at the end of the arrow representing u, draw an arrow equal in
length and parallel to the arrow representing v. (In other words, translate
the arrow representing v from the origin to the end of u.) Then u + v is

represented by an arrow from the origin to the end of the translation of v.

This rule can be applied to any pair of vectors, whereas the parallelogram
law does not (strictly speaking) apply if the arrows representing u and v
lie in the same straight line.
If we write w for u + v, then v = w − u. Figure 6 (which is simply
Fig. 5 relabeled) illustrates the useful fact that the difference of two vectors
w and u is represented by the arrow from the end of u to the end of w,
appropriately translated.

v (translated)        w − u (translated)

Figure 5 Figure 6

Figure 7 illustrates numerical multiplication, both by a positive number
a and a negative number b. For a positive number a, the arrow represent-
ing au has the same direction as the arrow representing u and is a times as
long. For a negative number b, bu points in exactly the opposite direction
to u and is |b| times as long. (See Exercise 6(b).)
The question of whether to represent a vector geometrically as a
point or as an arrow is mainly one of convenience.

Example 1. To find a representation for the line in R³ parallel to the
vector (1, 1, 1) and passing through the point (−1, 3, 6) we form all
multiples t(1, 1, 1) to get a line through the origin. Then the set of all
points t(1, 1, 1) + (−1, 3, 6) is a line containing the point (−1, 3, 6),
as we see by setting t = 0.
To determine a plane in R³ it is natural to start with two noncollinear
vectors x₁ and x₂ (that is, such that neither is a multiple of the other) and
consider all points ux₁ + vx₂, where u and v are numbers. The geometric
interpretation of numerical multiplication and addition of vectors shows
that the points ux₁ + vx₂ constitute what we would like to call a plane
P₀ through the origin. We then define a plane P parallel to P₀ to be the set
of all points ux₁ + vx₂ + x₀, where x₀ is some fixed vector. P and P₀ are
related as shown in Fig. 9.

Figure 9

Example 2. We represent a plane in R³ parallel to the two vectors
(1, 1, 1) and (2, −1, 3), and passing through the point (0, 1, −1). The
set of all linear combinations u(1, 1, 1) + v(2, −1, 3) is a plane P₀
through the origin, and the set of all points u(1, 1, 1) + v(2, −1, 3) +
(0, 1, −1) is a parallel plane P. That P contains the point (0, 1, −1)
becomes evident on setting u = v = 0.
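The parametric representations of Examples 1 and 2 lend themselves to direct computation. The sketch below is an illustration in Python, not part of the text; it evaluates the line and the plane at particular parameter values.

```python
# Points on the line of Example 1: t(1, 1, 1) + (-1, 3, 6).
def line_point(t):
    return (t - 1, t + 3, t + 6)

# Points on the plane of Example 2: u(1, 1, 1) + v(2, -1, 3) + (0, 1, -1).
def plane_point(u, v):
    return (u + 2*v, u - v + 1, u + 3*v - 1)

print(line_point(0))      # (-1, 3, 6): the line contains this point
print(plane_point(0, 0))  # (0, 1, -1): the plane contains this point
```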

The set of all linear combinations of a set S of vectors is called the
span of S, and we speak of the set of linear combinations as being spanned
by S. For example, if S consists of one vector x, then the line L consisting of
all numerical multiples tx is the span of S. If S = {x₁, x₂}, the set spanned
by S is the set P of all linear combinations ux₁ + vx₂. For P to be a
plane, we require that x₁ and x₂ not lie on the same line. Another way to
state this condition is that no multiple of x₁ should equal a multiple of
x₂ except for the zero multiples. More generally, we say that a set of
vectors

x₁, x₂, . . . , xₙ

is linearly independent if, whenever

c₁x₁ + · · · + cₙxₙ = 0

for some numbers c₁, . . . , cₙ, then all the c's must be zero. If on the

other hand it is possible to have a linear combination of vectors equal to


the zero vector, but with not all the coefficients equal to zero, then those
vectors are said to be linearly dependent.

Example 3. The set of three vectors

(1, 2), (−1, 1), (1, −3)

in R² is linearly dependent. For, the equation

c₁(1, 2) + c₂(−1, 1) + c₃(1, −3) = 0

is equivalent to the two equations

c₁ − c₂ + c₃ = 0
2c₁ + c₂ − 3c₃ = 0.

If we now let c₃ have any nonzero value, we can then solve for c₁ and c₂.
For example, if we set c₃ = 1, then

c₁ − c₂ = −1
2c₁ + c₂ = 3,

and we solve the two equations. Adding them gives 3c₁ = 2, or c₁ = 2/3,
while subtracting the second from two times the first gives −3c₂ = −5,
or c₂ = 5/3. Thus

(2/3)(1, 2) + (5/3)(−1, 1) + (1)(1, −3) = 0,

so the vectors are linearly dependent. Linear independence can be
checked similarly by solving a vector equation and showing that it has
only zero solutions.
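For readers with access to a computer, dependence can also be tested numerically; this anticipates the systematic methods of Chapter 2. The sketch below is illustrative only and uses the numpy library, which is not part of the text; it stacks the vectors of Example 3 as columns and compares the rank with the number of vectors.

```python
import numpy as np

vectors = [(1, 2), (-1, 1), (1, -3)]
M = np.column_stack(vectors)       # 2-by-3 matrix with the vectors as columns
print(np.linalg.matrix_rank(M))    # 2, fewer than the 3 vectors,
                                   # so the set is linearly dependent

# The coefficients found in Example 3 do give the zero vector:
c = np.array([2/3, 5/3, 1.0])
print(M @ c)                       # [0. 0.] up to rounding error
```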

A linearly independent set of vectors that spans a subset S of a vector
space V is called a basis for S. A basis for S is useful because, if x is any
vector in S, then x can be represented as a linear combination of basis
elements with uniquely determined coefficients. Thus if x₁, . . . , xₙ is a
basis for S and x is in S, we have, by the spanning property,

x = c₁x₁ + · · · + cₙxₙ,

for some constants c₁, . . . , cₙ. But the c's are completely determined.
For if we had also

x = d₁x₁ + · · · + dₙxₙ,

then subtraction of one equation from the other gives

0 = (c₁ − d₁)x₁ + · · · + (cₙ − dₙ)xₙ.

It follows from the linear independence of the x's that cₖ − dₖ = 0 for
each k, so that cₖ = dₖ.

Example 4. The natural basis for R³ is the set of vectors

e₁ = (1, 0, 0),  e₂ = (0, 1, 0),  e₃ = (0, 0, 1).

It is easy to check that these vectors are linearly independent and that they
span R³. For if

c₁e₁ + c₂e₂ + c₃e₃ = 0,

then (c₁, c₂, c₃) = (0, 0, 0). On the other hand, any vector (x, y, z)
in R³ can be written

(x, y, z) = xe₁ + ye₂ + ze₃.

EXERCISES

1. For each pair u, v of vectors in R² given below, draw the arrows representing
u, v, u + v, u − v, and u + 2v.
(a) u = (1, 0), v = (0, 1).
(b) u = (−2, 1), v = (1, 2).
(c) u = (−1, 1), v = (1, 1).

2. Let x = (1, 1) and y = (0, 1). Draw the arrows representing tx + y for the
following values of t: −1, ½, 1, 2.

3. Let u = (2, 1) and v = (−1, 2). Draw the arrows representing tu +
(1 − t)v for the following values of t: −1, 0, ½, 1, 2.

4. (a) Let u₁ = (0, 1), v₁ = (−1, 1), u₂ = (−3, 2), and v₂ = (2, 1). Sketch
the lines u₁ + tv₁ and u₂ + sv₂. Find the vector w at the point of inter-
section of the lines by finding values of t and s for which w = u₁ + tv₁ =
u₂ + sv₂. [Ans. w = (−5/3, 8/3).]
(b) Let u₁, v₁, and u₂ be as in part (a), but take v₂ = (2, −2). (Note that v₂
is then a numerical multiple of v₁.) Sketch the lines u₁ + tv₁ and
u₂ + sv₂. Show algebraically that the lines do not intersect.

5. Show that in Rⁿ, the lines represented by su₁ + u₀ and tv₁ + v₀ are the same
if and only if both v₁ and v₀ − u₀ are numerical multiples of u₁.

OVB and UPA are congruent. From this deduce that OV and UP are
parallel and of equal length. Then OVPU is a parallelogram, and the
parallelogram law follows.
(b) In Figs. 7(a) and 7(b) the points U, V, and W are constructed to have
coordinates as shown. Prove that the triangles OUP, OVQ, and OWR
are similar. Show that the angles UOP, VOQ, and WOR are therefore
equal and that the lengths OV and OW are proportional to the length
of OU as stated in the text.

7. It is not essential to use perpendicular axes in setting up coordinates in


the plane. The same procedure can be used if the projection of a point on
an axis is defined as the point of intersection of that axis and the line
through the given point and parallel to the other axis. (If the axes are
perpendicular, this is equivalent to our previous definition. Why?) It is
also possible to choose different units of distance along the two axes.
Show that the geometric interpretation of the vector operations in two
dimensions remains the same in this more general setting. How would
you extend the definition of projection to make the same generalization in
three dimensions?

8. (a) Show that if u and v are two distinct vectors, then the vectors ru +
(1 − r)v form a line through the points corresponding to u and v.
(b) If x₁ and x₂ are two vectors in Rⁿ, then the set of all vectors tx₁ +
(1 − t)x₂, where 0 ≤ t ≤ 1, is the line segment joining x₁ and x₂. A set
S in Rⁿ is convex if, whenever S contains two points, it also contains the
line segment joining them. Prove that the intersection of any collection
of convex sets is convex.

9. Represent the following lines in the form tx₁ + x₀, where t runs over all real
numbers. Sketch each line.
(a) The line in R³ parallel to (1, 2, 0) and passing through the point (1, 1, 1).
(b) The line in R² joining the points (1, 0) and (0, 1).
(c) The line in R³ joining the points (1, 0, 0) and (0, 0, 1).

10. Represent the following planes in R³ in the form ux₁ + vx₂ + x₀, where
u and v run over all real numbers. Sketch each plane.
(a) The plane parallel to the vectors (1, 1, 0) and (0, 1, 1) and passing
through the origin.
(b) The plane parallel to the vectors e₁ and e₂ and passing through the point
(0, 0, 1).
(c) The plane passing through the three points (1, 0, 0), (0, 1, 0), and
(0, 0, 1).

11. Determine whether each of the following sets of vectors is linearly dependent
or linearly independent.

(a) (1,2), (2,1).


(b) (1,2), (-3, -6).
(c) (1, -1), (0, 1), (2, 5).
(d) (1,2,0), ( -1,1, 1), (-1,0, 0).

12. Determine whether the first vector below is in the set spanned by the set S.
In each case give a geometric interpretation: the first point does (or does
not) lie on the plane (or line) spanned by S.
(a) (1, 1); S = {(−2, −2)}.
(b) (−1, 1, 3); S = {(1, 1, 1), (−1, 0, 1)}.
(c) (−1, −2, 1); S = {(3, 6, −3), (1, 2, −1)}.

13. Which of the following sets form bases for R² or R³?
(a) (1, 1), (−1, 1).
(b) (1, 2, 1), (−1, 2, 1), (0, 4, 2).
(c) (1, 1, 1), (1, 1, 0), (1, 0, 0).
(d) (1, 0, 0), (0, 2, 0), (0, 0, 3).

14. (a) Prove that two nonzero vectors x and y are linearly dependent if and
only if they lie on the same line through the origin.

(b) Prove that three nonzero vectors x, y, and z are linearly dependent if

and only if they all lie on the same plane through the origin.

SECTION 3
MATRICES

A set of equations such as

y₁ = 2x₁ + 3x₂ − 4x₃
y₂ = x₁ − x₂ + 2x₃,

in which each yᵢ is given as a sum of constant multiples of the xⱼ, defines a
function (in this example from R³ to R²) of a particularly simple kind. It
is an example of what we shall call a linear function. (The precise definition
of "linear" appears in the next section.) An understanding of these
functions is basic to the study of more general functions. Linear functions
from R¹ to R¹ are so simple that they can be taken for granted in studying
the calculus for functions of one variable. In higher dimensions, however,
linear functions can be more complicated, and the main business of this
first chapter is to develop their basic properties and to present notation
and methods of calculation for dealing with them.
In the foregoing example, the x's and y's can be thought of as place-
holders. The function is described completely by the array of coefficients

( 2   3  −4 )
( 1  −1   2 ).

Any such rectangular array of numbers is called a matrix.
A matrix with m rows and n columns is said to have shape m-by-n; the
coefficient matrix above, for example, has shape 2-by-3. Note that the
number of rows always comes before the number of columns.
The 1-by-n matrices are called n-dimensional row vectors, and n-by-1
matrices are called n-dimensional column vectors. A matrix is square if it
has the same number of rows as it has columns.
The number in the ith row and jth column of a matrix is called the ijth
entry of that matrix. Once more the row index is always put before the
column index. Two matrices are equal if and only if they have the same
shape and the entries in corresponding positions in the two matrices are
equal.
We use capital letters to denote matrices, and often use the corre-
sponding small letters with appropriate subscripts to denote their entries.
Thus we write

A = ( a₁₁ a₁₂ a₁₃ a₁₄ )
    ( a₂₁ a₂₂ a₂₃ a₂₄ )    or    A = (aᵢⱼ),
    ( a₃₁ a₃₂ a₃₃ a₃₄ )

where i = 1, 2, 3 and j = 1, 2, 3, 4. We usually write simply x₁, x₂, . . . , xₙ
for the entries of an n-dimensional column vector x rather than x₁₁,
x₂₁, . . . , xₙ₁.

The operations of addition and numerical multiplication which were
defined in the last section for vectors in Rⁿ can be extended to matrices.
If A and B have the same shape, then their sum A + B is defined as the
matrix with the same shape and ijth entry equal to aᵢⱼ + bᵢⱼ. For example

( 2 3 )  +  ( 1 −1 )  =  ( 3 2 )
( 0 4 )     ( 0  0 )     ( 0 4 ).

Addition is not defined between matrices of different shapes. Recall that
we did not define addition between elements of Rⁿ and Rᵐ with m ≠ n.
For any matrix A and number r, the numerical multiple rA is defined
as the matrix with the same shape as A and ijth entry equal to raᵢⱼ. For
example

3 ( 1 −2 )  =  ( 3 −6 )
  ( 0  4 )     ( 0 12 ).

As with vectors in :ft", we write — A for (—\)A and A — B for A +


(—1)2?. For every shape there is a zero matrix which has all its entries
equal to zero. We use to denote any zero matrix; the shape intended
will always be clear from the context.
It is if the matrices X, Y, and Z all have the same shape,
easy to see that
and / and any numbers, then the formulas 1-7 in Section all hold.
5 are 1

(The proofs are just the same as when X, Y, and Z are all in .'ft".) In other
words, according to the definition in Section 2, for any fixed m and n, the
set of in -by- n matrices forms a vector space with the operations of addition

and numerical multiplication that we have just defined.


Another operation between matrices is suggested by the way they may
be used to describe functions from Rᵐ to Rⁿ. For example, suppose we
have formulas

z₁ = 3y₁ − y₂
z₂ = 5y₁ + 2y₂,

and

y₁ = 2x₁ + 3x₂ + 4x₃
y₂ = x₁ − x₂ + 2x₃,

defining functions from R² to R² and from R³ to R², respectively. These
functions are described by the matrices A and B where

A = ( 3 −1 )    and    B = ( 2  3  4 )
    ( 5  2 )               ( 1 −1  2 ).

If we express the z's directly in terms of the x's we obtain

z₁ = 3(2x₁ + 3x₂ + 4x₃) − (x₁ − x₂ + 2x₃)
z₂ = 5(2x₁ + 3x₂ + 4x₃) + 2(x₁ − x₂ + 2x₃).

Rearranging terms gives

z₁ = (3·2 − 1·1)x₁ + (3·3 − 1·(−1))x₂ + (3·4 − 1·2)x₃
z₂ = (5·2 + 2·1)x₁ + (5·3 + 2·(−1))x₂ + (5·4 + 2·2)x₃

and we see that the resulting function is described by a matrix C with

C = ( 3·2 − 1·1   3·3 − 1·(−1)   3·4 − 1·2 )  =  (  5  10  10 )
    ( 5·2 + 2·1   5·3 + 2·(−1)   5·4 + 2·2 )     ( 12  13  24 ).

We say that C is obtained from A and B by matrix multiplication and
write C = AB.
To see how the general definition should be made, note that the ijth
entry of C is the sum of products of entries in the ith row of A and the jth
column of B. Thus c₂₁ = a₂₁b₁₁ + a₂₂b₂₁ = 5·2 + 2·1 = 12.

We give the definition of matrix multiplication in two stages. First
suppose

A = (a₁, a₂, . . . , aₖ)    and    B = ( b₁ )
                                       ( b₂ )
                                       ( .  )
                                       ( bₖ )

are row and column vectors of the same dimension. Then the product AB
is defined to be the number a₁b₁ + a₂b₂ + · · · + aₖbₖ. Now let A be an
m-by-k matrix and B be a k-by-n matrix. (It is important that the number
of columns of A be equal to the number of rows of B.) Then each row of
A is a k-dimensional row vector and each column of B is a k-dimensional
column vector. We define the matrix product AB as the m-by-n matrix
whose ijth entry is the product (in the sense just defined) of the ith row of
A and the jth column of B. The product AB always has the same number
of rows as A and the same number of columns as B. For instance, in our
example

( 3 −1 ) ( 2  3  4 )  =  (  5  10  10 )
( 5  2 ) ( 1 −1  2 )     ( 12  13  24 )

the entry in the second row and third column of the result is obtained by the
calculation

( 5  2 ) ( 4 )  =  5·4 + 2·2  =  24.
         ( 2 )

You should check that the other entries in the product can be obtained by
the rule stated above. Schematically, the ijth entry of a matrix product is
found by running along the ith row of the left factor and down the jth
column of the right factor, multiplying corresponding entries and adding
the products. The process is sometimes called row-by-column multiplication
of matrices.
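The row-by-column rule is easy to express in code. The following Python sketch is an illustration, not part of the text; it implements the definition directly and reproduces the product computed above.

```python
# Product of an m-by-k matrix A and a k-by-n matrix B,
# each given as a list of rows.
def mat_mul(A, B):
    m, k, n = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "columns of A must match rows of B"
    return [[sum(A[i][l] * B[l][j] for l in range(k))
             for j in range(n)]
            for i in range(m)]

A = [[3, -1],
     [5,  2]]
B = [[2,  3, 4],
     [1, -1, 2]]
print(mat_mul(A, B))   # [[5, 10, 10], [12, 13, 24]]
```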

The following remark is an obvious consequence of the way matrix
multiplication is defined. We state it formally for emphasis and because
we shall refer to it later.

3.1 Theorem

The ith row of a matrix product AB is equal to the ith row of A
times B. The jth column of AB is equal to A times the jth column
of B.

There are several important laws relating matrix multiplication and
the operations of matrix addition and numerical multiplication. They
hold for any number t and matrices A, B, C for which the indicated
operations are defined. (Addition is defined only between matrices of the
same shape. Multiplication is defined only if the left factor has exactly
as many columns as the right factor has rows.)

1. (A + B)C = AC + BC.
2. A(B + C) = AB + AC.
3. (tA)B = t(AB) = A(tB).
4. A(BC) = (AB)C.

According to the last law, it makes sense to talk of the product of three
matrices and simply write ABC, since the result is independent of how the
factors are grouped. In fact this 3-term associative law implies that the
result of multiplying together any finite sequence of matrices is inde-
pendent of how they are grouped. Not all the laws that hold for multi-
plication of numbers hold for multiplication of matrices. In particular,
the value of a matrix product depends on the order of the factors, and
AB is usually different from BA. It is also possible for the product of two
matrices to be a zero matrix, without either of the factors being zero.
Exercise 5 at the end of this section illustrates these points.
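A quick computation makes both failures concrete. The sketch below is illustrative Python; the matrices are our own, not those of Exercise 5. It exhibits a pair with AB ≠ BA and a nonzero matrix whose square is zero.

```python
def mat_mul(A, B):
    # zip(*B) iterates over the columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
print(mat_mul(A, B))   # [[2, 1], [1, 1]]
print(mat_mul(B, A))   # [[1, 1], [1, 2]]  -- AB and BA differ

N = [[0, 1], [0, 0]]
print(mat_mul(N, N))   # [[0, 0], [0, 0]]  -- NN = 0 although N is not 0
```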
The laws stated above are easily proved by writing out what they mean,
using the definitions of the operations, and then applying the associative,
distributive, and commutative laws of arithmetic. Number 4 is the most
complicated to prove, and we give its proof in full below. The other
proofs are left as exercises.
To prove that A(BC) = (AB)C, let A, B, and C have respective shapes
p-by-q, q-by-r, and r-by-s. Let U = BC and V = AB. (Then U has
shape q-by-s, and V has shape p-by-r.) We have to show that AU = VC.
The ijth element of AU is (by definition of matrix multiplication) equal
to Σₖ aᵢₖuₖⱼ, where the sum runs over k = 1, . . . , q. The kjth element of
U is Σₗ bₖₗcₗⱼ, where the sum runs over l = 1, . . . , r. Thus the ijth
element of AU equals

Σₖ aᵢₖ (Σₗ bₖₗcₗⱼ).

Similarly, the ijth element of VC is equal to

Σₗ vᵢₗcₗⱼ = Σₗ (Σₖ aᵢₖbₖₗ) cₗⱼ.

Both these expressions are equal to the sum

Σ aᵢₖbₖₗcₗⱼ    (1 ≤ k ≤ q, 1 ≤ l ≤ r),

and hence are equal to each other. Thus corresponding entries of AU
and VC are equal and the matrices are the same. This completes the
proof.
A square matrix of the form

( 1 0 . . . 0 )
( 0 1 . . . 0 )
( .         . )
( 0 0 . . . 1 )

that has 1's on its main diagonal and zeros elsewhere is called an identity
matrix. It has the property that

IA = A,    BI = B

for any matrices A, B such that the products are defined. Thus it is an
identity element for matrix multiplication just as the number 1 is an identity
for multiplication of numbers. There is an n × n identity matrix for every
value of n, but, as with the zero matrices, it is almost always clear from the
context what the dimension of an identity matrix must be.
If A is a square matrix and there is a matrix B (of the same size) such
that

AB = BA = I,

then we say that A is invertible and that B is an inverse of A. As we show
below in Theorem 3.3, there is at most one matrix B satisfying these
conditions, so we are justified in speaking of the inverse of a matrix. If
A is an invertible matrix, we write A⁻¹ for its inverse. For example, it is
easy to check that

( 1 2 ) (  7 −2 )  =  (  7 −2 ) ( 1 2 )  =  ( 1 0 )
( 3 7 ) ( −3  1 )     ( −3  1 ) ( 3 7 )     ( 0 1 ),

and according to the definition this shows that ( 1 2 ) is invertible and that
                                                ( 3 7 )

( 1 2 )⁻¹  =  (  7 −2 )
( 3 7 )       ( −3  1 ).

Many matrices, on the other hand, are not invertible. No zero matrix can
have an inverse, and several less obvious examples are given in the
exercises.
It is usually not easy to tell whether a large matrix is invertible, and it

can take a lot of work to compute its inverse if it has one. Determinants
can be used to give a formula for the inverse of a matrix (Theorem 8.3),
and a more effective way to compute inverses is given in Section 1 of
Chapter 2. Two-by-two matrices and a few other easy cases are discussed
in Exercises 9 to 13 of this section. The rule for finding the inverse of a
2-by-2 matrix is as follows.

3.2

( a b )⁻¹  =  1/(ad − bc) (  d −b )        if ad − bc ≠ 0.
( c d )                   ( −c  a )

Thus, for example,

(  1 3 )⁻¹  =  1/5 ( 2 −3 )  =  ( 2/5 −3/5 )
( −1 2 )           ( 1  1 )     ( 1/5  1/5 ).

It is easy to verify that

(  1 3 ) ( 2/5 −3/5 )  =  ( 2/5 −3/5 ) (  1 3 )  =  ( 1 0 )
( −1 2 ) ( 1/5  1/5 )     ( 1/5  1/5 ) ( −1 2 )     ( 0 1 ),

so that we have indeed found the inverse.
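Formula 3.2 can be packaged as a small routine. The Python sketch below is an illustration, not part of the text; it applies the formula to the example just worked.

```python
# The 2-by-2 inverse of Formula 3.2.
def inverse_2x2(a, b, c, d):
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0: the matrix is not invertible")
    return [[ d / det, -b / det],
            [-c / det,  a / det]]

print(inverse_2x2(1, 3, -1, 2))   # [[0.4, -0.6], [0.2, 0.2]],
                                  # i.e. (2/5 -3/5; 1/5 1/5)
```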
Two important properties of invertible matrices are easily proved
directly from the definition. The first one ensures that, no matter how an
inverse to a matrix A is computed, the resulting matrix A⁻¹ is always the
same.

3.3 Theorem

A matrix A has at most one inverse.

Proof. Suppose there are two matrices, B and C, such that both

AB = BA = I    and    AC = CA = I.

Then AB = AC, because both products equal I. Multiplying by B
on the left gives

BAB = BAC.

But since BA = I, substitution gives IB = IC, from which it follows
that B = C, as was to be proved.

The next theorem shows how to compute the inverse of a product of


matrices each of which is invertible.

3.4 Theorem

If A = A₁A₂ · · · Aₙ and all of A₁, . . . , Aₙ are invertible, then A is
invertible and A⁻¹ = Aₙ⁻¹Aₙ₋₁⁻¹ · · · A₂⁻¹A₁⁻¹.

Proof. In the product (Aₙ⁻¹ · · · A₁⁻¹)(A₁ · · · Aₙ), the terms
A₁⁻¹A₁ combine to give I, which may then be dropped. Then A₂⁻¹
and A₂ cancel, and so on, until I is obtained as the final result.
Similarly (A₁ · · · Aₙ)(Aₙ⁻¹ · · · A₁⁻¹) reduces to I, and this shows that
the two products are inverses of each other.

An identity matrix is a special case of what is called a diagonal matrix.
A square matrix A is diagonal if its entries off the "main diagonal" are
all zero, that is, if aᵢⱼ = 0 whenever i ≠ j. The notation diag(t₁, t₂, . . . , tₙ)
is convenient for the n × n diagonal matrix which has entries t₁, t₂, . . . , tₙ
on the diagonal. For example, diag(2, 0, −1, 3) is a notation for

( 2 0  0 0 )
( 0 0  0 0 )
( 0 0 −1 0 )
( 0 0  0 3 ).

Problems 7 and 8 show that matrix operations with diagonal matrices are
particularly simple.

EXERCISES

1. Given the matrices A, B, C, D, and G [their entries are not legible in this
copy], determine which of the following expressions are defined, and compute
those that are.
(a) 2B − 3G.        (f) CD + 3DB.
(b) AB.             (g) 2AB − 5G.
(c) BA.             (h) 2GC − 4AB.
(d) BD.             (i) CDC.
(e) DB.             (j) BCD.

[Ans. (b) (  3  7  1 ).]
          ( −2 14 −4 )

2. Show that for any matrix A and zero matrices 0 of appropriate shapes,

A0 = 0    and    0A = 0.

If A is m-by-n, for what possible shapes of zero matrices is A0 defined?
For what shapes is 0A defined? What are the shapes of the products?

3. With A, B, C, D as in Problem 1, determine what shapes X and Y would
have to have for each of the following equations to be possible. (In some
cases there may be no possible shape; in some cases there may be more than
one.)
(a) AX = B + Y.          (d) CX + DY = 0.
(b) (D + 2X)YC = 0.      (e) AX = YC.
(c) AX = YD.             (f) AX = CY.
[Ans. (d) X is 3-by-n, Y is 2-by-n.]
4. Prove the distributive law

A(B + C) = AB + AC
for matrix multiplication.

5. Let U = ( 1 −2 )  and  V = ( 2 6 ). Compute UV and VU. Are they the
           ( 2 −4 )          ( 1 3 )
same? Is it possible for the product of two matrices to be zero without
either factor being zero?

6. Let D = diag(1, 2, 3), and let X, P, and Q be the column vector and
matrices given with this exercise [not legible in this copy]. Compute DX,
DP, and DQ.

7. How may the product SD be described? (Computing the product QD using
the matrices of Exercise 6 should suggest the general rule.)

8. Using the matrices B, C, and G of Exercise 1, compute Be₁, Ce₁, Ge₁,
Be₃, Ce₃, Ge₃. Prove the general rule that for any matrix M and column
vector eⱼ of appropriate dimension, the product Meⱼ is the jth column of M.

9. What is the product diag(a₁, . . . , aₙ) diag(b₁, . . . , bₙ)? When is the
result the identity matrix? Show that diag(a₁, . . . , aₙ) has an inverse
provided none of the numbers aᵢ is zero.

10. (a) Let A = ( a b ) be a 2-by-2 matrix with ad ≠ bc, and let A⁻¹ be given
               ( c d )
by Formula 3.2. Show that AA⁻¹ = A⁻¹A = I. (This proves that A is
invertible if ad − bc, called the determinant of A, is not zero.)
(b) Try to find the inverses of the following 2-by-2 matrices using Formula
3.2:
( 1 1 ),   ( 0 1 ),   ( 2 6 ).
( 2 1 )    ( 1 0 )    ( 1 3 )
What is wrong with the last one?

11. Show that if there is any matrix X ≠ 0 such that AX = 0, then A cannot be
invertible. [Hint. Suppose B = A⁻¹, and consider (BA)X = B(AX).] Use
this result to show that diag(a₁, . . . , aₙ) is not invertible if any of the aᵢ is
zero.

12. Show that if ad = bc, then ( a b ) is not invertible.
                              ( c d )

13. If A is a square matrix, it can be multiplied by itself, and we can define
A² = AA, A³ = AAA = A²A, Aⁿ = AA · · · A (n factors). These powers
of A all have the same shape as A. Find A² and A³ for the two matrices (a)
and (b) given with this exercise [not legible in this copy].
(Note that 0 is the only number whose cube is 0. Part (b) of this problem
thus illustrates another difference between the arithmetic of numbers and of
matrices.)

14. The numerical equation a² = 1 has a = 1 and a = −1 as its only solutions.
(a) Show that if A = I or −I, then A² = I, where I is an identity matrix of
any dimension.
(b) Show that ( a  b )²  =  ( 1 0 )  if a² + bc = 1; so the equation
              ( c −a )      ( 0 1 )
A² = I has infinitely many different solutions in the set of 2-by-2
matrices.
(c) Show that every 2-by-2 matrix A for which A² = I is either I, −I, or
one of the matrices described in (b).

SECTION 4
LINEAR FUNCTIONS

The product of an m-by-n matrix and an n-dimensional column vector
(n-by-1 matrix) is an m-dimensional column vector. An n-tuple in Rⁿ
obviously corresponds to a unique n-dimensional column vector, and
vice versa. From here on we shall often simply consider elements of Rⁿ
to be column vectors. With this convention, multiplication by any given
m-by-n matrix defines a function from Rⁿ to Rᵐ. Indeed a matrix equation
such as

( y₁ )  =  ( 2  3 −4 ) ( x₁ )
( y₂ )     ( 1 −1  2 ) ( x₂ )
                       ( x₃ )

is equivalent to a set of numerical equations

y₁ = 2x₁ + 3x₂ − 4x₃
y₂ = x₁ − x₂ + 2x₃,

as may be seen by simply writing out the result of the matrix multiplication.
Thus the vector function described by a matrix amounts to multiplication
of a domain vector by the matrix.
Functions given by matrix multiplication can be characterized by some
very simple properties. Note that the definitions given below apply to
functions between any two vector spaces, although we are at present
concerned only with the spaces Rⁿ.

A function f with domain a vector space V and range a subset of a
vector space W is a linear function if the equations

f(x + y) = f(x) + f(y)
f(rx) = rf(x)

hold for all vectors x, y in V, and all numbers r.

4.1 Theorem

A function from Rⁿ to Rᵐ is linear if and only if it coincides with
multiplication by some m-by-n matrix.

Proof. We show first that if f is defined by f(x) = Ax for a fixed
matrix A, then f is linear. We must show that f(x + y) = f(x) +
f(y) and f(rx) = rf(x) for any x, y, and r. But by the definition of f

these equations amount to A(x + y) = Ax + Ay and A(rx) =
r(Ax), which hold by properties 2 and 3 for matrix multiplication.
The proof of the converse is a little more complicated. Suppose
we are given a linear function g from Rⁿ to Rᵐ. We have to find an
m-by-n matrix A such that g(x) = Ax for every column vector x in
Rⁿ. Let eⱼ be the column vector in Rⁿ that has 1 for its jth entry and
0 for all other entries. Now let A be the matrix whose jth column is
the m-dimensional vector g(eⱼ) for j = 1, 2, . . . , n. (Thus A has m
rows and n columns, as required.) Any column vector x with
entries x₁, x₂, . . . , xₙ can be written as x₁e₁ + · · · + xₙeₙ. Since
g is linear,

g(x) = g(x₁e₁ + · · · + xₙeₙ)
     = x₁g(e₁) + · · · + xₙg(eₙ).

By the definition of A we have

g(x) = ( a₁₁x₁ + · · · + a₁ₙxₙ )
       ( a₂₁x₁ + · · · + a₂ₙxₙ )
       (          .           )
       ( aₘ₁x₁ + · · · + aₘₙxₙ ).

Finally, by the definition of matrix multiplication,

g(x) = Ax.

The proof given above actually shows how to construct the matrix
corresponding to a linear function. This construction is important and
we summarize it as a theorem for emphasis.

4.2 Theorem

Let f: Rⁿ → Rᵐ be a linear function, and let A be the matrix whose
jth column is f(eⱼ). Then f(x) = Ax for every x in Rⁿ.

The result is easy to remember if we consider, for example, the case of
a function f from R² to R². The matrix A, for which f(x) = Ax, is then
2-by-2, and its columns are the result of applying A successively to

e₁ = ( 1 )    and    e₂ = ( 0 ):
     ( 0 )                ( 1 )

( a c ) ( 1 )  =  ( a ),        ( a c ) ( 0 )  =  ( c ).
( b d ) ( 0 )     ( b )         ( b d ) ( 1 )     ( d )

Example 1. Consider the function f: R² → R² described in terms of the
geometric representation of R² as a counterclockwise rotation of 30°. Any
rotation leaves the origin fixed, preserves distances, and carries figures
such as parallelograms onto congruent figures. From these properties,
and the way in which addition and numerical multiplication of vectors
can be done by geometrical constructions (Section 2), it can be shown that
a rotation is a linear function. Figure 10(a) shows the vectors e₁, e₂, f(e₁),
and f(e₂). By trigonometry we see that

f(e₁) = ( √3/2 ),    f(e₂) = ( −1/2 ).
        ( 1/2  )             ( √3/2 )

By Theorem 4.2,

f ( x )  =  ( √3/2  −1/2 ) ( x )
  ( y )     ( 1/2   √3/2 ) ( y )

for any vector ( x ) in R².
               ( y )

Example 2. Define f: R² → R² to be the linear function which multiplies
horizontal distances by 3 and vertical distances by 2. Then f(e₁) = 3e₁
and f(e₂) = 2e₂, and f has the diagonal matrix

( 3 0 )
( 0 2 ).
Figure 10

The geometrical effect of f is illustrated in Fig. 11, in which C is the unit
circle x₁² + x₂² = 1, and f(C) is the image of C under f. If ( x₁; x₂ ) is the
image of ( u₁; u₂ ) under f, then x₁ = 3u₁ and x₂ = 2u₂. Hence, if
( x₁; x₂ ) is in f(C), then

x₁²/9 + x₂²/4 = 1;

this can be recognized as the equation of an
ellipse with semimajor axis 3 and semiminor axis 2.

Theorem 4.2 says that, f/is a linear function from :K" to 3t"\ then the
;

jih column of the matrix of/is/(e ). Because we can write any vector x in
;

:K" as a linear combination

x = c^ + . . . + c„e„,

Figure 11

we can use the linearity of f to write

f(x) = c₁f(e₁) + · · · + cₙf(eₙ).

Since x is an arbitrary vector in the domain of f, the last equation expresses
an arbitrary vector in the range of f as a linear combination of the vectors
f(e₁), . . . , f(eₙ). But these vectors are just the columns of the matrix of f,
so we have the following.

4.3 Theorem

Let f: Rⁿ → Rᵐ be a linear function with matrix A. Then the
columns of A, looked at as vectors of Rᵐ, span the range of f.

Example 3. If P is a plane through the origin in R³, then P consists of
all linear combinations of two vectors y₁ and y₂ in R³. The function
f: R² → R³ defined by

f ( u )  =  uy₁ + vy₂
  ( v )

is linear (why?), and Theorem 4.2 says that the matrix of f has as columns
the two vectors

f ( 1 )  =  y₁    and    f ( 0 )  =  y₂.
  ( 0 )                    ( 1 )

The range of f is spanned by the columns of this 3-by-2 matrix.

Example 4. If L is a line consisting of all points of the form tu₁ + u₀,
with u₁ ≠ 0, and f is a linear function, then

f(tu₁ + u₀) = tf(u₁) + f(u₀)
            = tv₁ + v₀.

Thus, unless f(u₁) = 0, the image f(L) is also a line. If f(u₁) = 0, then
of course f(L) is the single point f(u₀). The equation f(u₁) = 0 cannot
hold for a nonzero vector u₁ if the linear function f is one-to-one, as
defined below.

A function f is said to be one-to-one if each point in the range of f
corresponds to exactly one point in the domain of f. In other words, f
is one-to-one if the equation f(x₁) = f(x₂) always implies that x₁ = x₂.
For example, of the two real-valued functions f(x) = x and g(x) = x²,
it is obvious that f is one-to-one. But g is not one-to-one because g(x) =
g(−x) for every real number x. For linear functions we have the following
criterion.

4.4 Theorem

A linear function f is one-to-one if and only if f(x) = 0 implies
x = 0.

Proof. For suppose that f(x) = 0 always implies x = 0. If f(x₁) =
f(x₂), the linearity of f shows that f(x₁ − x₂) = 0. But then by
assumption x₁ − x₂ = 0; so x₁ = x₂, and therefore f must be one-to-
one. Conversely, suppose that f(x₁) = f(x₂) always implies x₁ = x₂.
Then, because f(0) = 0 for a linear function, the equation f(x) = 0
can be written f(x) = f(0). But then our assumption implies x = 0.

Example 5. The counterclockwise rotation through 30° in Example 1
is certainly one-to-one. For only one vector is carried into any other
vector by the rotation. Alternatively, the zero vector is the only vector
carried into the zero vector. In algebraic terms, the rotation is proved in
Example 1 to be representable by

( x )  →  ( (√3/2)x − (1/2)y )
( y )     ( (1/2)x + (√3/2)y ).

Then Theorem 4.4 shows that the one-to-one property of f is equivalent
to the equations

(√3/2)x − (1/2)y = 0
(1/2)x + (√3/2)y = 0

having only the solution (x, y) = (0, 0). This of course can be verified
directly by solving the equations.

If f and g are any two functions (not necessarily linear) such that the
range space of f is the same as the domain space of g, we define the com-
position g ∘ f to be the function obtained by applying first f and then g.
More explicitly, if x is in the domain of f and f(x) is in the domain of g,
then g ∘ f(x) is defined as g(f(x)). If f(x) or g(f(x)) is not defined, then
g ∘ f(x) is not defined.
Composition of linear functions lies behind matrix multiplication. In
introducing the concept of matrix multiplication, we considered a function
from R² to R² given by

z₁ = 3y₁ − y₂
z₂ = 5y₁ + 2y₂

and another from R³ to R² given by

y₁ = 2x₁ + 3x₂ + 4x₃
y₂ = x₁ − x₂ + 2x₃.

We then computed that the composition of the two functions was given by
the formulas

z₁ = 5x₁ + 10x₂ + 10x₃
z₂ = 12x₁ + 13x₂ + 24x₃.

The definition of matrix multiplication was set up to give the matrix

(  5 10 10 )
( 12 13 24 )

of the composite function as the product of the matrices

( 3 −1 )    and    ( 2  3 4 )
( 5  2 )           ( 1 −1 2 )

for the original functions. The following theorem states the important
fact that the composition of linear functions is always given by matrix
multiplication.

4.5 Theorem

Let f and g be linear functions f: Rⁿ → Rᵐ and g: Rᵐ → Rᵖ given
by matrices A, B. Thus f(x) = Ax and g(y) = By for all x in Rⁿ
and y in Rᵐ. Then the composition g ∘ f is given by the matrix BA,
so that g ∘ f(x) = BAx for all x in Rⁿ.

Proof. Note that A is an m-by-n matrix and B a p-by-ni matrix, so


the matrix product BA is defined and has the appropriate shape,
32 1 'ectors and Linearity Chap. 1

namely, p-by-n. By the definition of g f, g °f(x) = g(f(x)) =


B(A\) for all x in .'){". By the associative law of matrix multiplication
this is equal to {BA)\, as was to be proved.
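Theorem 4.5 is easy to check numerically. The following Python sketch (NumPy assumed; the sketch is ours, added only for illustration) applies the two functions of the example above both ways:

    import numpy as np

    A = np.array([[2, 3, 4],
                  [1, -1, 2]])   # f maps R^3 to R^2
    B = np.array([[3, -1],
                  [5, 2]])       # g maps R^2 to R^2

    x = np.array([1.0, 2.0, 3.0])

    print(B @ (A @ x))     # apply f, then g
    print((B @ A) @ x)     # apply the single matrix BA: same vector
    print(B @ A)           # [[5, 10, 10], [12, 13, 24]], as in the text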

Example 6. Consider the function g: R² → R² defined as f ∘ f ∘ f, where f is the function of Example 1. By Theorem 4.5, g(x) = Bx, where B is the cube of the matrix of f:

    B = ( 0 −1 )
        ( 1  0 ).

Thus g(e₁) = Be₁ = ( 0 ; 1 ) = e₂, and g(e₂) = Be₂ = ( −1 ; 0 ) = −e₁. As Fig. 12 shows, g amounts geometrically to a counterclockwise rotation of 90°, which of course is what the result of three successive 30° rotations ought to be.

Figure 12

Example 7. The identity function h: Rⁿ → Rⁿ, such that h(x) = x, is a linear function. Its matrix is the n-by-n identity matrix I. Two functions f: Rⁿ → Rⁿ and g: Rⁿ → Rⁿ are said to be inverses of each other if f ∘ g and g ∘ f are both equal to the identity function. For example, if n = 2, suppose that f is the counterclockwise rotation through 30° discussed in Example 1, and suppose that g is a clockwise rotation through 30°. The functions f and g are obviously inverses of each other. The matrices of f and g are found from Fig. 12 to be

    ( √3/2  −1/2 )   and   (  √3/2  1/2 )
    ( 1/2   √3/2 )         ( −1/2  √3/2 ).

The following theorem is an immediate consequence of the definitions


and of Theorem 4.1.

4.6 Theorem

A function f from Rⁿ to Rᵐ is affine if and only if f(x) = Ax + b for some fixed m-by-n matrix A and m-dimensional column vector b.

Finding the matrix A and vector b needed to describe an affine transformation is easy. If f(x) = Ax + b, then b = f(0). Then the function g(x) = f(x) − f(0) is linear, and the matrix A is found by using Theorem 4.2.
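This recipe translates directly into a computation. Here is a short Python sketch (the particular map f below is hypothetical, chosen only to illustrate the method of recovering A and b):

    import numpy as np

    # A hypothetical affine map f(x) = Ax + b on R^2, treated as a black box.
    def f(x):
        return np.array([2.0 * x[0] - x[1] + 5.0,
                         x[0] + 3.0 * x[1] - 1.0])

    b = f(np.zeros(2))                                # b = f(0)
    g = lambda x: f(x) - b                            # g is linear
    A = np.column_stack([g(e) for e in np.eye(2)])    # columns g(e1), g(e2)
    print(A)   # [[2, -1], [1, 3]]
    print(b)   # [5, -1]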

Example 10. Let f: R² → R² be defined in terms of the standard geometric representation as a counterclockwise rotation of 90° about the center x₀ = e₁. We see that f(e₁) = e₁ (since the center of a rotation stays fixed), and f(0) = ( 1 ; −1 ). Introducing the linear function g(x) = f(x) − f(0), we have

    g(e₁) = ( 1 ; 0 ) − ( 1 ; −1 ) = ( 0 ; 1 )
and
    g(e₂) = ( 0 ; −1 ) − ( 1 ; −1 ) = ( −1 ; 0 ).

Therefore g(x) = ( 0 −1 ; 1 0 ) x for all x, and

    f(x) = ( 0 −1 ; 1 0 ) x + ( 1 ; −1 ).

As a check, let us compute

    f ( 2 ; 0 ) = ( 0 −1 ; 1 0 ) ( 2 ; 0 ) + ( 1 ; −1 ) = ( 0 ; 2 ) + ( 1 ; −1 ) = ( 1 ; 1 ),

which agrees with the geometric description of f. See Fig. 13.

Linear and affine functions are often called linear and affine transformations, though we more often use the term function in this book.

(b) What matrix corresponds to reflection in the line through the origin 135° counterclockwise from the horizontal?
(c) Compute the product of the matrices in (a) and (b) and interpret the result geometrically.

    [Ans. (b) ( 0 −1 ; −1 0 ).]

6. (a) Find the 2-by-2 matrix Mα corresponding to reflection in the line through the origin at an angle α from the horizontal. Check your result against Exercise 5 for α = 45°, α = 135°. What is Mα²?
(b) Let β be another angle and compute the product Mα Mβ. Show that this represents a rotation, and identify the angle of rotation. When does Mα Mβ = Mβ Mα?
7. (a) Show that

    U = ( 1 0  0 )        V = (  0 0 1 )
        ( 0 0 −1 )            (  0 1 0 )
        ( 0 1  0 )            ( −1 0 0 )

represent 90° rotations of R³ about the x₁-axis and x₂-axis, respectively. Find the matrix W which represents a 90° rotation about the x₃-axis. Also find U⁻¹ and V⁻¹ (which represent rotations in the opposite direction).
(b) Compute UVU⁻¹ and VUV⁻¹ and interpret the results geometrically. (You may find it helpful to manipulate an actual 3-dimensional model.)

8. Show that a function f from one vector space to another is affine if and only if

    f(rx + (1 − r)y) = rf(x) + (1 − r)f(y)

for all numbers r and all x, y in the domain of f. [Hint. Consider the function g(x) = f(x) − f(0).]

9. Let f be a linear function from a vector space V to a vector space W. Show that if x₁, . . . , xₙ are linearly independent vectors in V and f is one-to-one, then the vectors f(x₁), . . . , f(xₙ) are linearly independent in W. [Hint. If f is one-to-one and f(x) = 0, then x = 0.]

10. (a) Prove that if f is a linear function from R³ to R³ and f is one-to-one, then the image f(L) of a line L by f is also a line.
(b) Show by example that, if the linear function of part (a) fails to be one-to-one, then the image of a line by f may reduce to a point.
(c) Show that if L₁ and L₂ are parallel lines and f is a linear function, then f(L₁) and f(L₂) are parallel lines, provided that f is one-to-one.

11. Show that the composition of two affine functions from a space into itself is affine. Suppose f(x) = Ax + b, g(x) = Cx + d, and (f ∘ g)(x) = Px + q. Express P and q in terms of A, b, C, and d. When is f ∘ g the same function as g ∘ f?

12. (a) Let a and b be fixed vectors in R², and let g: R² → R² be a counterclockwise rotation of 90° with center at the origin. Find the matrix corresponding to the linear function g, and compute the affine function f = tₐ ∘ g ∘ t_b, where tₐ and t_b are the translations induced by a and b.
(b) Give a geometric interpretation of the composition f = tₐ ∘ g ∘ t_b, where a is any vector in R², b = −a, and g is any rotation about the origin. [Hint. What happens to the point corresponding to a under the function f?]

13. Show that an affine function A is one-to-one if and only if A(x) = A(0) always implies x = 0.

14. Which of the following linear functions are one-to-one?
(a) f(x₁, x₂) = (x₁ + 2x₂, 2x₁, x₂).
(b) g(x₁, x₂, x₃) = (x₂ − x₃, x₃ − x₁, x₂).
SECTION 5

DOT PRODUCTS

To allow the full application of vector ideas to Euclidean geometry, we must have a means of introducing concepts such as length, angle, and perpendicularity. We will show in this section that all these concepts can be defined if we introduce a new operation on vectors. If x = (x₁, . . . , xₙ) and y = (y₁, . . . , yₙ) are vectors in Rⁿ, we define the dot product or inner product of x and y to be the number

    x · y = x₁y₁ + . . . + xₙyₙ.

It is easy to verify (see Exercise 4) that the dot product of vectors in Rⁿ has the following properties.

5.1  Positivity: x · x > 0 except that 0 · 0 = 0.
     Symmetry: x · y = y · x.
     Additivity: (x + y) · z = x · z + y · z.
     Homogeneity: (rx) · y = r(x · y).

Because of the symmetry of the dot product, it follows immediately that additivity and homogeneity hold for the second vector also, that is,

    x · (y + z) = x · y + x · z
and
    x · (ry) = r(x · y).

Let us first of all consider the length of a given vector x in R³. If x = (x₁, x₂, x₃), we think of the length of the vector as the distance from the origin to the point with coordinates (x₁, x₂, x₃). In R³ the Pythagorean theorem gives us a simple formula for the distance (see Fig. 14). Letting |x| stand for the length of the vector x, we have

(1)    |x|² = x₁² + x₂² + x₃².

Thus we see that the length of the vector x can be expressed in terms of the dot product as |x| = √(x · x). Note that we use the same symbol for the length of a vector as for the absolute value of a number. Indeed, if we think of a number as a one-coordinate vector, its length is its absolute value. In Rⁿ we define the length of a vector by the same formula that works in R³: |x| = √(x · x).

Next we would like to express the angle between two nonzero vectors in Rⁿ. The usual convention is to take this angle θ to be in the interval 0 ≤ θ ≤ π (see Exercise 1). The solution to the problem is provided by the following theorem.

Figure 14

5.2 Theorem

If θ is the angle between x and y, then

    cos θ = (x · y) / (|x| |y|).

Proof. Let us apply the law of cosines to the triangle shown in Fig. 15. It states that

    |x − y|² = |x|² + |y|² − 2 |x| |y| cos θ,

which we can rewrite, using |x|² = x · x, as

    (x − y) · (x − y) = x · x + y · y − 2 |x| |y| cos θ.

Expanding the left-hand member, we obtain

    x · x − x · y − y · x + y · y = x · x + y · y − 2 |x| |y| cos θ.

Hence,

    2x · y = 2 |x| |y| cos θ,

and the theorem follows by dividing by 2 |x| |y|.

We see from this theorem that x · y in absolute value is at most |x| |y|, that is, |x · y| ≤ |x| |y|. This is known as the Cauchy-Schwarz inequality, and is proved more generally in Theorem 5.4.

Example 1. What is the angle between x = (1, 3) and y = (−1, 1)? We easily compute that

    x · x = 1² + 3² = 10,
    y · y = (−1)² + 1² = 2,
and
    x · y = −1 + 3 = 2.

Hence

    |x| = √10,   |y| = √2,   and   cos θ = 2/(√10 √2) = 1/√5.

By consulting a trigonometric table we find that θ = 1.1 radians approximately (or about 63°).
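Computations like the one in Example 1 are easy to script. The following Python sketch (ours, for illustration only) recovers the angle from the dot product:

    import math

    def dot(x, y):
        return sum(xi * yi for xi, yi in zip(x, y))

    x, y = (1, 3), (-1, 1)
    cos_theta = dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))
    print(math.acos(cos_theta))                 # about 1.107 radians
    print(math.degrees(math.acos(cos_theta)))   # about 63.4 degrees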

The theorem also provides a simple test for perpendicularity of two vectors. They are perpendicular if and only if θ = π/2, and hence cos θ = 0. Thus the condition for perpendicularity is simply

(2)    x · y = 0.

Example 2. Let us find a vector x of length 2 perpendicular to (1, 2, 3) and to (1, 0, −1). From geometric considerations we see that there will be two solutions since, if x is a solution, so is −x. (See Fig. 16.) We have to write down three conditions, two for the perpendicularity requirements, using (2), and a third condition to assure length 2:

    x₁ + 2x₂ + 3x₃ = 0,
    x₁ − x₃ = 0,
    x₁² + x₂² + x₃² = 4.

These equations have the pair of solutions x = ±(√(2/3), −2√(2/3), √(2/3)).

If n is a unit vector, that is, a vector of length 1, then the dot product n · x is called the coordinate of x in the direction of n. The geometric interpretation of n · x is shown in Fig. 17. For since cos θ = (n · x)/|x|, it follows that n · x is either the length of the perpendicular projection of x on the line containing n, or else its negative. The vector (n · x)n is called the component of x in the direction of n and is sometimes denoted xₙ.

Figure 17

Example 3. Suppose that the vector (1, −1, 2) represents a force F acting at a point, in the sense that the direction of the force is the direction of the vector and the magnitude of the force is equal to the length of the vector, namely, √6. In such an interpretation it is customary to picture the vector not as an arrow from the origin to the point with coordinates (1, −1, 2), but as that arrow translated parallel to itself and with its tail moved to the point of application of the force. Figure 18 illustrates one possibility.

Figure 18

The component of the force F in the direction of a unit vector n is then interpreted as the force resulting from the application of F at a point that can move only along a line parallel to n. Thus to find the component of F = (1, −1, 2) in the direction of (1, 1, 1), we first find the unit vector in that direction. Since |(1, 1, 1)| = √3, the desired unit vector is n = (1/√3, 1/√3, 1/√3). The component of F in the direction of n is then

    (F · n)n = (2/√3)n = (2/3, 2/3, 2/3).
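The projection in Example 3 can be packaged as a small routine; the following Python sketch is ours, added only to illustrate the computation:

    import math

    def component(x, d):
        """Component of x in the direction of the (not necessarily unit) vector d."""
        norm_d = math.sqrt(sum(di * di for di in d))
        n = [di / norm_d for di in d]                  # unit vector in direction d
        coord = sum(xi * ni for xi, ni in zip(x, n))   # coordinate n . x
        return [coord * ni for ni in n]

    print(component((1, -1, 2), (1, 1, 1)))   # [2/3, 2/3, 2/3]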

Any vector space on which there is defined a product with the properties 5.1 is called an inner product space. Thus Rⁿ is an inner product space, and some other examples are given in Problems 8 and 9. Inner products in spaces other than Rⁿ are used in this book only in Chapter 2, Section 4 and Chapter 5, Section 5.

In terms of the inner product we can always define the length or norm of a vector by

    |x| = √(x · x).

Then length has the following properties.

5.3  Positivity: |x| > 0 except that |0| = 0.
     Homogeneity: |rx| = |r| |x|.
     Triangle inequality: |x + y| ≤ |x| + |y|.

The proofs of the first two are easy and are left for the reader to check. The proof of the third is harder and will be taken up later, though we remark here on its geometric significance, illustrated by Fig. 19.

The fact that length has been defined in terms of an inner product leads to some properties of length that are not derivable from those already listed in 5.3. First we prove the following.

Figure 19

5.4 Cauchy-Schwarz inequality

    |x · y| ≤ |x| |y|.

Proof. We assume first that x and y are unit vectors, that is, that |x| = |y| = 1. Then

    0 ≤ |x − y|² = (x − y) · (x − y) = |x|² − 2x · y + |y|² = 2 − 2x · y,

or

    x · y ≤ 1.

Assuming that neither x nor y is zero (for the inequality obviously holds if one of them is zero), we can replace x and y by the unit vectors x/|x| and y/|y|, getting

    x · y ≤ |x| |y|.

Now replace x by −x to get

    −x · y ≤ |x| |y|.

The last two inequalities imply the Cauchy-Schwarz inequality.

Notice that the Cauchy-Schwarz inequality may be written as

    |x · y| / (|x| |y|) ≤ 1,

so there will always be an angle θ such that

    cos θ = (x · y) / (|x| |y|).

Defining the cosine of the angle θ between x and y in this way is sometimes more satisfactory than defining θ itself, though it doesn't show us in which rotational sense the angle is measured.

Using the Cauchy-Schwarz inequality, it is easy to give the deferred proof of the triangle inequality, for from

    x · y ≤ |x| |y|,

we get

    |x + y|² = (x + y) · (x + y) = |x|² + 2x · y + |y|²
             ≤ |x|² + 2 |x| |y| + |y|² = (|x| + |y|)²,

from which follows

    |x + y| ≤ |x| + |y|.

Two vectors x and y that are perpendicular with respect to an inner product are sometimes called orthogonal. Furthermore, a set S of vectors that are mutually orthogonal and have length 1 is called an orthonormal set. The idea has a useful application to finding the inverses of certain matrices. Any square matrix whose rows, or whose columns, looked at as vectors in Rⁿ, form an orthonormal set with respect to the Euclidean dot product is called an orthogonal matrix. It is very easy to find the inverse of such a matrix. Suppose the columns of

    A = ( a₁₁ a₁₂ . . . )
        ( a₂₁ a₂₂ . . . )
        ( . . .         )

form an orthonormal set. Thus, for example, a₁₁² + a₂₁² + . . . = 1, a₁₂² + a₂₂² + . . . = 1, and a₁₁a₁₂ + a₂₁a₂₂ + . . . = 0. We form the matrix Aᵗ, called the transpose of A, by reflecting A across its main diagonal. Thus

    Aᵗ = ( a₁₁ a₂₁ . . . )
         ( a₁₂ a₂₂ . . . )
         ( . . .         ).

Then the fact that the columns of A are orthonormal shows that AᵗA = I. We shall see later (Theorem 8.3) that this implies also that AAᵗ = I. A similar argument replacing orthonormal columns by orthonormal rows would show that AAᵗ = I and then (by Theorem 8.3) that AᵗA = I. Thus we have proved the following.

5.5 Theorem

Every orthogonal matrix A is invertible and A⁻¹ = Aᵗ.

Example 4. It is easy to verify that the matrix

    A = ( 1/2   −√3/2 )
        ( √3/2   1/2  )

has columns (and rows) orthonormal. The transposed matrix is

    Aᵗ = (  1/2   √3/2 )
         ( −√3/2  1/2  ),

and so A⁻¹ = Aᵗ.

The second of the above two square matrices is obtained from the first by interchanging rows and columns. When two matrices are so related we still say that one is the transpose of the other, even if they are not square. Thus if

    A = ( 1 2 3 )
        ( 4 5 6 ),

then

    Aᵗ = ( 1 4 )
         ( 2 5 )
         ( 3 6 ).

Obviously, (Aᵗ)ᵗ = A for any matrix A.
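Theorem 5.5 is easy to test numerically. The Python sketch below (NumPy assumed; ours, for illustration) uses the matrix of Example 4:

    import numpy as np

    # Columns are orthonormal, so A is orthogonal and its inverse is its transpose.
    A = np.array([[0.5, -np.sqrt(3) / 2],
                  [np.sqrt(3) / 2, 0.5]])

    print(A.T @ A)                               # the identity matrix, up to rounding
    print(np.allclose(np.linalg.inv(A), A.T))    # True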

EXERCISES
1. Show that the natural basis vectors satisfy eᵢ · eᵢ = 1, and eᵢ · eⱼ = 0 if i ≠ j.

2. (a) Find the angle between the vectors (1, 1, 1) and (1, 0, 1).
(b) Find a vector of length 1 perpendicular to both vectors in part (a).

3. Find the distance between (1, 2) and (0, 5).

4. Prove that the dot product has the properties listed in 5.1.

5. Prove the positivity and homogeneity properties of length listed in 5.3.

6. Find the coordinate of the vector x = (1, −1, 2): (a) in the direction of n = (1/√3, 1/√3, 1/√3) and (b) in the direction of the nonunit vector (1, 1, 3). (c) What is the component of x in the direction of n?

7. (a) Prove that if x is any vector in Rⁿ and n is a unit vector, then x can be written as x = y + z, where y is a multiple of n and z is perpendicular to n. [Hint. Take y to be the component of x in the direction of n.]
(b) Show that the vectors y and z of part (a) are uniquely determined. The vector z so determined is called the component of x perpendicular to n.

8. (a) Consider the vector space C[0, 1] consisting of all continuous real-valued functions defined on the interval 0 ≤ x ≤ 1. [The sum of f and g is defined by (f + g)(x) = f(x) + g(x), and the numerical multiple rf by (rf)(x) = rf(x).] Show that the product ⟨f, g⟩ defined by

    ⟨f, g⟩ = ∫₀¹ f(x)g(x) dx

is an inner product on C[0, 1].
(b) Define the norm of f by |f| = ⟨f, f⟩^(1/2). What is the norm of f(x) = x?
9. (a) Show that the product of x = (x₁, x₂) and y = (y₁, y₂) defined by

    x * y = x₁y₁ + 2x₂y₂

is an inner product.
(b) With length defined by |x|* = (x * x)^(1/2), sketch the set of points satisfying |x|* = 1.

10. Show that if |x| = |y| in an inner product space, then x + y is perpendicular to x − y.

11. Show that the matrix

    ( cos θ  −sin θ )
    ( sin θ   cos θ )

is orthogonal, and find its inverse.

12. Derive the inequality

    |x| − |y| ≤ |x − y|

from the triangle inequality, and then show that

    ||x| − |y|| ≤ |x − y|.

13. (a) Show that if A is a 2-by-2 orthogonal matrix, then |Ax| = |x| for all vectors x in R².
(b) Prove the result of part (a) for n-by-n matrices.
(c) Prove that, if A is n-by-n and |Ax| = |x| for all x, then Ax · Ay = x · y for all x and y in Rⁿ.

14. A real-valued linear function f: Rⁿ → R is sometimes called a linear functional. Show that, if f is a linear functional defined on Rⁿ, then there is a fixed vector y in Rⁿ such that f(x) = y · x for all x in Rⁿ. [Hint. What is the matrix of f?]

15. For a pair of specific matrices A and B such that the product AB is defined (say A 2-by-2 and B 2-by-3), compute AB, BA, AᵗBᵗ, and BᵗAᵗ, noting which of these products are defined.

16. (a) For any two matrices A and B, show that if AB is defined, so is BᵗAᵗ, and that BᵗAᵗ = (AB)ᵗ.
(b) Show that if A is invertible, then so is Aᵗ, and that (Aᵗ)⁻¹ = (A⁻¹)ᵗ.

SECTION 6

EUCLIDEAN GEOMETRY

In this section we develop some facts and formulas about lines and planes in R² and R³. Some of the ideas will reappear in Section 1 of Chapter 3.

Recall that if x₁ is a nonzero vector in R² or R³, then the set of all numerical multiples tx₁ is a line L₀ passing through the origin, and any line parallel to L₀ consists of the set of all points tx₁ + x₀, where x₀ is some fixed vector. An alternative way to say the same thing is: a line is the range of a function of the form f(t) = tx₁ + x₀, where x₁ ≠ 0.

Example 1. To describe a line L passing through two distinct points a₁ and a₂, take x₁ = a₂ − a₁. Then all the multiples t(a₂ − a₁) make up a line parallel to L, and the points t(a₂ − a₁) + a₁ make up L itself. For example, when t = 0 we get a₁, and when t = 1 we get a₂. See Fig. 20. Alternatively, we can write the points on L in the form ta₂ + (1 − t)a₁.

Figure 20

To determine a plane in R³, we take two noncollinear vectors x₁ and x₂ (that is, vectors such that neither is a multiple of the other) and consider all points ux₁ + vx₂, where u and v are numbers. These points form a plane P through the origin. A parallel plane will then consist of all points ux₁ + vx₂ + x₀, where x₀ is some fixed vector. We can restate the definition by saying that a plane is the range of a function g: R² → R³, where g(u, v) = ux₁ + vx₂ + x₀, and the vectors x₁, x₂ do not lie on a line.

Example 2. For three points a₁, a₂, and a₃ to determine a unique plane passing through them, they must not lie on a line. (Then the vectors a₃ − a₁ and a₂ − a₁ are not collinear. For otherwise a₃ − a₁ = r(a₂ − a₁), so a₃ = ra₂ + (1 − r)a₁, and a₃ would lie on the line determined by a₂ and a₁.) Now take x₁ = a₃ − a₁ and x₂ = a₂ − a₁. The desired plane consists of all points

    w = u(a₃ − a₁) + v(a₂ − a₁) + a₁.

To get a₁, take u = v = 0. To get a₂, take u = 0, v = 1. To get a₃, take u = 1, v = 0. See Fig. 21.

Figure 21
Using the dot product gives an alternative way to describe a plane P. Suppose x₀ is an arbitrary point on P and that p is a nonzero vector perpendicular to both x₁ and x₂ (so p · x₁ = p · x₂ = 0), where x₁ and x₂ are vectors parallel to P. Then the equation

(1)    p · (x − x₀) = 0

is satisfied by all points x = ux₁ + vx₂ + x₀ on P, because p · (ux₁ + vx₂) = up · x₁ + vp · x₂ = 0. It is also easy to check that Equation (1) can be solved for x to get x = uy₁ + vy₂ + x₀, and we leave the solution as an exercise. Figure 22 shows the relationship between the vectors. To find a vector p perpendicular to two vectors x₁ and x₂, it is convenient to use the cross-product of x₁ and x₂, defined for x₁ = (u₁, u₂, u₃) and x₂ = (v₁, v₂, v₃) by

(2)    p = (u₂v₃ − u₃v₂, u₃v₁ − u₁v₃, u₁v₂ − u₂v₁).

It is routine to check that p · x₁ = 0 and p · x₂ = 0. For example,

    p · x₁ = u₁u₂v₃ − u₁u₃v₂ + u₂u₃v₁ − u₂u₁v₃ + u₃u₁v₂ − u₃u₂v₁ = 0.

The cross-product is taken up in more detail in the next section. In the meantime we will simply use it to find a vector perpendicular to two given vectors. See Problem 13.

Example 3. Suppose we are given a plane parallel to the two vectors x₁ = (1, 2, −3) and x₂ = (2, 0, 1), and containing the point x₀ = (1, 1, 1). A vector p perpendicular to x₁ and x₂ is given by their cross-product:

    p = ((2)(1) − (0)(−3), (−3)(2) − (1)(1), (1)(0) − (2)(2))
      = (2, −7, −4).

Now writing x = (x, y, z), we require that p · (x − x₀) = 0, that is, (2, −7, −4) · (x − 1, y − 1, z − 1) = 0. According to the definition of the dot product, this last equation is

    2(x − 1) − 7(y − 1) − 4(z − 1) = 0,
or
(3)    2x − 7y − 4z + 9 = 0.

In other words, the given plane consists of all points (x, y, z) with coordinates satisfying that equation.
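The computation in Example 3 can be scripted directly from formula (2). The following Python sketch is ours, added only for illustration:

    def cross(u, v):
        return (u[1]*v[2] - u[2]*v[1],
                u[2]*v[0] - u[0]*v[2],
                u[0]*v[1] - u[1]*v[0])

    x1, x2, x0 = (1, 2, -3), (2, 0, 1), (1, 1, 1)
    p = cross(x1, x2)
    print(p)   # (2, -7, -4)
    # The plane is p . x = p . x0, i.e. 2x - 7y - 4z + 9 = 0; the constant 9 is -p.x0:
    print(-sum(pi * x0i for pi, x0i in zip(p, x0)))   # 9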

The simplest way to sketch the plane satisfying an equation like (3) is to pick three points with simple coordinates that satisfy it, say (0, 0, 9/4), (0, 9/7, 0), and (−9/2, 0, 0). Then locate the points relative to the coordinate axes and sketch the plane using the three points as references. We have purposely chosen in our example points that lie on the coordinate axes. See Fig. 23.

Figure 23

If p is a nonzero vector in R² and x₀ is a point in R², we can determine the line perpendicular to p and containing x₀ by the equation p · (x − x₀) = 0. If p = (p₁, p₂) and x₀ = (x₀, y₀), the equation becomes

    (p₁, p₂) · (x − x₀, y − y₀) = 0
or
    p₁(x − x₀) + p₂(y − y₀) = 0.

This is one of the several forms for the equation of a line in the xy-plane. The slope of the line is evidently −p₁/p₂.

The representation of a plane by an equation p · (x − x₀) = 0 is not unique, because any nonzero multiple of p can replace it, leaving the set of vectors x satisfying the equation unchanged. However, it is sometimes useful to normalize the equation of a plane by requiring that p be a unit vector. The normalized equation then becomes n · (x − x₀) = 0, where n = p/|p|. Alternatively we can write n · x = c, where c = n · x₀.

6.1 Theorem

If n · x = c is the normalized equation of a plane P and y is any point, then the distance from y to P is the absolute value of c − n · y.

Proof. By definition the distance is to be measured along a line from y perpendicular to P. This line can be represented by tn + y, and the intersection with P will occur for some t = t₀. The desired distance is then |(t₀n + y) − y| = |t₀n|, which is simply the absolute value of t₀, since n has length 1. Since t₀n + y lies in P, then n · (t₀n + y) = c. But n · n = 1, so we obtain t₀ + n · y = c, from which the theorem follows.

Example 4. We shall find the distance from (1, 1, 1) to the plane (3, 0, −4) · x = −3. The normalized equation of the plane is given by (3/5, 0, −4/5) · x = −3/5. Then

    c − n · y = −3/5 − (3/5, 0, −4/5) · (1, 1, 1) = −3/5 + 1/5 = −2/5.

Hence the distance is 2/5. Notice that the equation of the plane could also be written 3x − 4z = −3 and, in normalized form, (3/5)x − (4/5)z = −3/5.
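Theorem 6.1 gives an immediate distance routine. The Python sketch below (ours, for illustration; the function name is an invention) checks the arithmetic of Example 4:

    import math

    def distance_to_plane(y, p, c):
        """Distance from the point y to the plane p . x = c."""
        norm_p = math.sqrt(sum(pi * pi for pi in p))
        n_dot_y = sum(pi * yi for pi, yi in zip(p, y)) / norm_p
        return abs(c / norm_p - n_dot_y)

    print(distance_to_plane((1, 1, 1), (3, 0, -4), -3))   # 0.4 = 2/5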

EXERCISES
1. Write a formula in the form f(t) = tx₁ + x₀ for a line containing x₀ and parallel to x₁. In each case determine whether the point a lies on the line, that is, whether a is in the range of f.

(a) x₀ = (1, 1), x₁ = (−2, 3); a = (−3, 7).
(b) x₀ = (0, 0, 1), x₁ = (1, 1, 1); a = (5, 5, 4).
(c) x₀ = (1, 0, 2), x₁ = (2, 0, 3); a = (3, 0, 5).

2. Write a formula in the form g(u, v) = ux₁ + vx₂ + x₀ for a plane containing x₀ and parallel to x₁ and x₂. In each case determine whether the point a lies on the plane, that is, whether a is in the range of g.

(a) x₀ = (0, 0, 0), x₁ = (1, 0, 1), x₂ = (2, 3, 0); a = (4, 3, 1).
(b) x₀ = (1, −1, 1), x₁ = (1, 1, 0), x₂ = (−1, 2, 0); a = (1, 1, 2).
(c) x₀ = (0, 0, 1), x₁ = (1, 2, 1), x₂ = (2, 1, 2); a = (3, 0, 4).

3. Sketch the lines determined in R² or R³ by
(a) t(1, 2) + (−1, −1).
(b) t(1, 0, 1) + (1, 1, 0).
(c) (1, 2) · x = 2.

4. Sketch the plane determined in R³ by
(a) u(1, 2, 0) + v(2, 0, 1) + (1, 0, 0).
(b) (−1, −1, 1) · x = 1.
(c) 2x + y + z = 0.

5. Find an equation for the line in R² that is perpendicular to the line tx₁ + x₀ and passes through the point y₀.

6. Find the cosine of the angle between the planes x₁ · x = c₁ and y₁ · y = c₂.

7. Prove that if x₁ · x = c₁ and y₁ · y = c₂ are normalized equations of two planes, then the cosine of the angle between them is x₁ · y₁.

8. For each of the points and planes or lines listed below, find the distance from the point to the plane or line.
(a) (1, 0, −1); (1, 1, 1) · x = 1.
(b) (1, 0, −1); x + 2y + 3z = 1.
(c) (1, 2); (3, 4) · x = 0.

9. (a) Verify that the cross-product of x₁ and x₂ [formula (2) in the text] is perpendicular to x₂.
(b) Find a representation for the line perpendicular to the plane consisting of all points u(1, 2, 1) + v(−1, 0, 1) + (1, 1, 1) and passing through the origin.

10. (a) Find a representation for x = (x, y, z) satisfying

    2x + 3y + z = 1

by finding vectors x₁, x₂, and x₀ such that x = ux₁ + vx₂ + x₀. [Hint. Let u = y, v = z.]
(b) Do the same as in part (a) for the general equation p · (x − x₀) = 0, when p ≠ 0. [Hint. If p ≠ 0, then p has a nonzero coordinate.]

11. (a) If x is any vector in R³, show that

    x = (x · e₁)e₁ + (x · e₂)e₂ + (x · e₃)e₃.

(b) If the vector x in part (a) is a unit vector, that is, a vector u of length 1, show that u · eᵢ = cos αᵢ, where αᵢ is the angle between u and eᵢ. The coordinates cos αᵢ are called the direction cosines of u relative to the natural basis vectors eᵢ. If x is any nonzero vector, the direction cosines of x are defined to be the direction cosines of the unit vector x/|x|.
(c) Find the direction cosines of (1, 2, 1).
(d) Show that parts (a) and (b) generalize to Rⁿ.

12. Let u and v be points in Rⁿ. Show that the point ½u + ½v is the midpoint of the line segment joining u and v.

13. Let u and v be noncollinear vectors in R³. To find a vector x perpendicular to both u and v, we solve the equations u · x = 0 and v · x = 0, that is, solve

(*)    u₁x + u₂y + u₃z = 0
       v₁x + v₂y + v₃z = 0,

where u = (u₁, u₂, u₃), v = (v₁, v₂, v₃), and x = (x, y, z).
(a) Show that

    (u₂v₁ − u₁v₂)y = (u₁v₃ − u₃v₁)z
    (u₁v₂ − u₂v₁)x = (u₂v₃ − u₃v₂)z.

(b) Show that if u₁v₂ − u₂v₁ ≠ 0, then the equations (*) have a solution

    (x, y, z) = (u₂v₃ − u₃v₂, u₃v₁ − u₁v₃, u₁v₂ − u₂v₁).

(c) Show how to solve (*) if u₁v₂ − u₂v₁ = 0.

SECTION 7

DETERMINANTS

In this section we define and study a certain numerical-valued function defined on the set of all square matrices. The value of this function for a square matrix M is called the determinant of M and is written det M. Another common way to denote the determinant of a matrix in displayed form is to replace the parentheses enclosing the array of entries by vertical bars.

If A is an n-by-n matrix, we write Aᵢⱼ for the minor of the entry aᵢⱼ: the (n − 1)-by-(n − 1) matrix obtained from A by deleting the ith row and the jth column of A.
We can now make the definition of determinant:

For a 1-by-1 matrix A = (a), we define det A = a. For an n-by-n matrix A = (aᵢⱼ), i, j = 1, . . . , n, we define

7.1    det A = Σⱼ₌₁ⁿ (−1)ʲ⁺¹ a₁ⱼ det A₁ⱼ
             = a₁₁ det A₁₁ − a₁₂ det A₁₂ + . . . + (−1)ⁿ⁺¹ a₁ₙ det A₁ₙ.

In words, the formula says that det A is the sum, with alternating signs, of the elements of the first row of A, each multiplied by the determinant of its corresponding minor. For this reason the numbers

    det A₁₁, −det A₁₂, . . . , (−1)ⁿ⁺¹ det A₁ₙ

are called the cofactors of the corresponding elements of the first row of A. In general, the cofactor of the entry aᵢⱼ in A is defined to be (−1)ⁱ⁺ʲ det Aᵢⱼ; for instance, the entry a₂₁ of a matrix has cofactor (−1)²⁺¹ det A₂₁ = −det A₂₁. The factor (−1)ⁱ⁺ʲ associates plus and minus signs with det Aᵢⱼ according to the pattern

    ( + − + − . . . )
    ( − + − + . . . )
    ( + − + − . . . )
    ( . . .         )
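Definition 7.1 is directly executable. The following Python sketch (ours, for illustration; as remarked below, the method is far too slow for large n) computes determinants by expansion along the first row:

    def det(A):
        """Determinant by expansion along the first row (Definition 7.1)."""
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            # Minor A_1j: delete the first row and the jth column.
            minor = [row[:j] + row[j+1:] for row in A[1:]]
            total += (-1) ** j * A[0][j] * det(minor)   # (-1)**j = (-1)**(1+j) sign
        return total

    print(det([[1, 2], [3, 4]]))                      # -2
    print(det([[2, 3, 4], [5, 6, 7], [8, 9, 0]]))     # 30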

Example 2.

(a) det ( 1 2 ; 3 4 ) = 1(4) − 2(3) = 4 − 6 = −2.

(c) det ( a b ; c d ) = ad − bc.

The result of the last example is worth remembering as a rule of thumb: the determinant of a 2-by-2 matrix is the product of the entries on the main diagonal minus the product of the other two entries. Thus 2-by-2 determinants can usually be computed mentally, and 3-by-3 determinants in one or two lines. In principle, any determinant can be calculated from the definition, but this involves formidable amounts of arithmetic if the dimension is at all large. Some of the theorems we prove will justify other methods of calculation, which involve less arithmetic than that required in working directly from the definition for n > 3.

Determinants were originally invented (in the middle of the eighteenth century) as a means of expressing the solutions of systems of linear equations. To see how this works for two equations in two unknowns, consider the general system

    a₁₁x₁ + a₁₂x₂ = r₁
    a₂₁x₁ + a₂₂x₂ = r₂.

The variable x₂ can be eliminated by multiplying the first equation by a₂₂ and the second by a₁₂, and then taking the difference. The result is the equation

    (a₂₂a₁₁ − a₁₂a₂₁)x₁ = a₂₂r₁ − a₁₂r₂.

This equation may be written as

    x₁ det A = det B⁽¹⁾,

where A = ( a₁₁ a₁₂ ; a₂₁ a₂₂ ) is the matrix of coefficients and B⁽¹⁾ = ( r₁ a₁₂ ; r₂ a₂₂ ) is the result of replacing the first column of A by ( r₁ ; r₂ ). The reader can easily derive the equation x₂ det A = det B⁽²⁾, where B⁽²⁾ = ( a₁₁ r₁ ; a₂₁ r₂ ) is the result of substituting ( r₁ ; r₂ ) for the second column of A. As we shall see in Theorem 8.4, a similar result holds for systems of n linear equations in n unknowns, for all values of n.

Since our definition of determinants by 7.1 is inductive, most of the proofs have the same character. That is, to prove a theorem about determinants of all square matrices, we verify it for 1-by-1 (or in some cases, 2-by-2) matrices, and also show that if it is true for (n − 1)-by-(n − 1) matrices, then it holds for n-by-n matrices. In the proofs, we give only the argument for going from step n − 1 to step n. The reader should verify the propositions directly for 1-by-1 and 2-by-2 matrices; the verification is in all cases quite trivial. A, B, and C will always denote n-by-n matrices. We write aⱼ for the jth column of a matrix A. If A has n rows, then aⱼ is a vector in Rⁿ.

7.2 Theorem

If B is obtained from A by multiplying some column by a number r, then det B = r det A.

Proof. Suppose bⱼ = raⱼ while bₖ = aₖ for k ≠ j. Then in particular b₁ⱼ = ra₁ⱼ. For k ≠ j, B₁ₖ is obtained from A₁ₖ by multiplying a column by r; since B₁ₖ and A₁ₖ are (n − 1)-by-(n − 1), we have det B₁ₖ = r det A₁ₖ by the inductive assumption. On the other hand, B₁ⱼ = A₁ⱼ, and b₁ₖ = a₁ₖ for k ≠ j. Thus, whether k = j or not, b₁ₖ det B₁ₖ = ra₁ₖ det A₁ₖ. Therefore

    det B = Σₖ₌₁ⁿ (−1)ᵏ⁺¹ b₁ₖ det B₁ₖ
          = Σₖ₌₁ⁿ (−1)ᵏ⁺¹ r a₁ₖ det A₁ₖ = r det A.

7.3 Corollary

If a matrix has a zero column, then its determinant is zero.

Proof. If aⱼ = 0, then aⱼ = 0aⱼ. Then by Theorem 7.2, det A = 0 · det A = 0.

Example 3. Let

    A = (  1 2 3 )        B = (  1 6 3 )
        ( −1 2 4 )            ( −1 6 4 )
        (  0 1 2 )            (  0 3 2 ).

B is obtained from A by multiplying the second column by 3.

    det A = (1)(4 − 4) − 2(−2 − 0) + 3(−1 + 0)
          = 0 + 4 − 3 = 1
    det B = (1)(12 − 12) − 6(−2 − 0) + 3(−3 + 0)
          = 0 + 12 − 9 = 3 = 3 det A.

7.4 Theorem

Let A, B, and C be identical except in the jth column, and suppose that the jth column of C is the sum of the jth columns of A and B. Then det C = det A + det B.

Proof. We have c₁ⱼ = a₁ⱼ + b₁ⱼ, and also C₁ⱼ = A₁ⱼ = B₁ⱼ. For k ≠ j, c₁ₖ = a₁ₖ = b₁ₖ, and C₁ₖ is identical with A₁ₖ and B₁ₖ except for one column, which is the sum of the corresponding columns of A₁ₖ and B₁ₖ. Thus for k ≠ j, det C₁ₖ = det A₁ₖ + det B₁ₖ by the inductive assumption, and so

    c₁ₖ det C₁ₖ = a₁ₖ det A₁ₖ + b₁ₖ det B₁ₖ,

while

    c₁ⱼ det C₁ⱼ = a₁ⱼ det A₁ⱼ + b₁ⱼ det B₁ⱼ.

Hence

    det C = Σₖ₌₁ⁿ (−1)ᵏ⁺¹ c₁ₖ det C₁ₖ
          = Σₖ₌₁ⁿ (−1)ᵏ⁺¹ a₁ₖ det A₁ₖ + Σₖ₌₁ⁿ (−1)ᵏ⁺¹ b₁ₖ det B₁ₖ
          = det A + det B.

Taken together, Theorems 7.2 and 7.4 say that det (a₁, . . . , aₙ) is a linear function of each column separately, when the remaining columns are held fixed; we refer to this fact as Theorem 7.5. To exhibit det (x, b, c) as a linear function of x, we need merely calculate

    det ( x₁ b₁ c₁ )
        ( x₂ b₂ c₂ )
        ( x₃ b₃ c₃ )
      = x₁ det ( b₂ c₂ ; b₃ c₃ ) − b₁ det ( x₂ c₂ ; x₃ c₃ ) + c₁ det ( x₂ b₂ ; x₃ b₃ )
      = x₁(b₂c₃ − b₃c₂) − b₁(x₂c₃ − x₃c₂) + c₁(x₂b₃ − x₃b₂)
      = x₁(b₂c₃ − b₃c₂) − x₂(b₁c₃ − c₁b₃) + x₃(b₁c₂ − c₁b₂),

which is a linear function of x.

Another important property of determinants is that if any two columns (or rows) of a matrix are interchanged, then its determinant changes sign. We first prove the result for adjacent columns.

7.6 Lemma

If B is obtained from A by exchanging two adjacent columns, then det B = −det A.

Proof. Suppose A and B are the same, except that aⱼ = bⱼ₊₁ and aⱼ₊₁ = bⱼ. For k ≠ j or j + 1, we have b₁ₖ = a₁ₖ and det B₁ₖ = −det A₁ₖ by the inductive hypothesis, so (−1)ᵏ⁺¹ b₁ₖ det B₁ₖ = −(−1)ᵏ⁺¹ a₁ₖ det A₁ₖ. On the other hand, b₁ⱼ = a₁,ⱼ₊₁ and B₁ⱼ = A₁,ⱼ₊₁, so

    (−1)ʲ⁺¹ b₁ⱼ det B₁ⱼ = (−1)ʲ⁺¹ a₁,ⱼ₊₁ det A₁,ⱼ₊₁ = −(−1)ʲ⁺² a₁,ⱼ₊₁ det A₁,ⱼ₊₁.

Similarly, (−1)ʲ⁺² b₁,ⱼ₊₁ det B₁,ⱼ₊₁ = −(−1)ʲ⁺¹ a₁ⱼ det A₁ⱼ. Thus each term in the expansion of det B by 7.1 is matched by a term equal to its negative in the expansion of det A, and it follows that det B = −det A.

Example 5. Let

    A = ( 1  3 −2 )
        ( 2 −4  1 )
        ( 3  5 −2 ),

and let B be obtained from A by exchanging its second and third columns. Then

    det A = (1)(8 − 5) − (3)(−4 − 3) + (−2)(10 − (−12))
          = 3 + 21 − 44 = −20
    det B = (1)(5 − 8) − (−2)(10 − (−12)) + (3)(−4 − 3)
          = −3 + 44 − 21 = 20
          = −det A.

7.7 Theorem

If B is obtained from A by exchanging any two columns, then det B = −det A.

Proof. Suppose there are k columns between the two columns in question (so k = 0 if they are adjacent). The first column can be brought next to the second by k exchanges of adjacent columns. Then the two columns can be exchanged, and with another k exchanges of adjacent columns the second column can be put back in the original place of the first. There are 2k + 1 steps in all, and by Lemma 7.6 each step changes the sign of the determinant. Since 2k + 1 is an odd number, det B = −det A.

7.8 Theorem

If any two columns of A are identical, then det A = 0.

Proof. Exchanging the two columns gives A again. Therefore det A = −det A, and so det A = 0.

Multiplication by an n-by-n matrix gives a linear function from Rⁿ to Rⁿ, and it is natural to ask whether the determinant of the matrix is related to some geometric property of the corresponding linear function. It turns out that the determinant describes how the function affects volumes in Rⁿ. In R², of course, "volume" is area.

Example 6. Multiplication by ( 3 0 ; 0 2 ) gives a function f: R² → R² which multiplies lengths in the x-direction by 3 and in the y-direction by 2. Areas are magnified by a factor of 6, as illustrated in Fig. 24(a), which shows a unit square S and its image f(S). Note that det ( 3 0 ; 0 2 ) = 6. For another example, consider the function g given by the matrix ( 1 1 ; 0 1 ), which has determinant 1. Its effect is illustrated in Fig. 24(b). The unit square is mapped into a parallelogram with the same base and altitude, so the area remains unchanged. The composition g ∘ f multiplies areas by 6 [since f(S) has 6 times the area of S and g(f(S)) has the same area as f(S)]. The matrix of g ∘ f is given by the matrix product

    ( 1 1 ) ( 3 0 ) = ( 3 2 )
    ( 0 1 ) ( 0 2 )   ( 0 2 ),

whose determinant is also 6.

Figure 24

This behavior is general, and we record it as a theorem.

7.9 Theorem

The linear function defined by an n-by-n matrix A multiplies volumes in Rⁿ by the factor |det A|.

7.10 Product Rule

If A and B are any two square matrices of the same size, then det (AB) = (det A)(det B).

Proof. Let

    L(x₁, . . . , xₙ) = det A det (x₁, . . . , xₙ) − det (Ax₁, . . . , Axₙ),

where x₁, . . . , xₙ are vectors in Rⁿ. Clearly L is linear as a function of each vector xⱼ. Furthermore, L(e_{i1}, . . . , e_{in}) = 0 for any set {e_{i1}, . . . , e_{in}} of natural basis vectors. The reason is that if any of e_{i1}, . . . , e_{in} are the same, then both det (e_{i1}, . . . , e_{in}) and det (Ae_{i1}, . . . , Ae_{in}) are zero by Theorem 7.8. Otherwise e_{i1}, . . . , e_{in} are just e₁, . . . , eₙ in some order, and by Theorem 7.7

    det A det (e_{i1}, . . . , e_{in}) = ±det A det (e₁, . . . , eₙ) = ±det A det I
      = ±det A = ±det (Ae₁, . . . , Aeₙ)
      = det (Ae_{i1}, . . . , Ae_{in}).

But then L(b₁, . . . , bₙ) = 0, where bⱼ = Σᵢ bᵢⱼ eᵢ is the jth column of B. For, using the linearity of L,

    L(b₁, . . . , bₙ) = L(Σ b_{i1,1} e_{i1}, . . . , Σ b_{in,n} e_{in})
      = Σ . . . Σ b_{i1,1} · · · b_{in,n} L(e_{i1}, . . . , e_{in}) = 0.

Hence,

    det A det B − det AB = det A det (b₁, . . . , bₙ) − det (Ab₁, . . . , Abₙ)
      = L(b₁, . . . , bₙ) = 0.

Note that the proof just given uses only Theorems 7.5, 7.7, and 7.8 and does not use Theorem 7.9. (The point is important because the product rule is used in the proof in Section 7 of the Appendix, on which the proof of Theorem 7.9 depends.)
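The product rule is easy to test numerically. The Python sketch below (NumPy assumed; ours, for illustration) compares det (AB) with (det A)(det B) for a random pair of matrices:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 5, size=(4, 4)).astype(float)
    B = rng.integers(-5, 5, size=(4, 4)).astype(float)

    print(np.linalg.det(A @ B))
    print(np.linalg.det(A) * np.linalg.det(B))   # same value, up to rounding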

The natural unit of area in R² is given by the unit square with edges (1, 0) and (0, 1), and the natural unit of volume in R³ is given by the unit cube with edges (1, 0, 0), (0, 1, 0), (0, 0, 1). In general we take the unit of volume in Rⁿ to be that of the cube whose edges are the natural basis vectors e₁, . . . , eₙ that form the columns of the n-by-n identity matrix. Moreover, we can take any n vectors x₁, . . . , xₙ in Rⁿ and form all linear combinations t₁x₁ + . . . + tₙxₙ, where each of the real numbers t₁, . . . , tₙ satisfies the condition 0 ≤ tᵢ ≤ 1. The resulting set of points is called the parallelepiped determined by its edges x₁, . . . , xₙ. If we choose only two vectors x₁, x₂, then we speak of the parallelogram determined by x₁ and x₂. Figure 25 shows a parallelepiped.

Figure 25

7.11 Theorem

Let a₁, a₂, . . . , aₙ be n vectors in Rⁿ. Then the volume of the parallelepiped with edges a₁, . . . , aₙ is |det (a₁, . . . , aₙ)|.

Proof. The linear function f whose matrix has columns a₁, . . . , aₙ carries eⱼ into aⱼ, by Theorem 4.2. Hence f transforms the unit cube into the parallelepiped with edges aⱼ. Since it multiplies volumes by the factor |det (a₁, . . . , aₙ)|, and the cube has unit volume, the volume of the parallelepiped is |det (a₁, . . . , aₙ)|.

Example 7. Let

    a₁ = ( r₁ cos θ₁ ; r₁ sin θ₁ ),   a₂ = ( r₂ cos θ₂ ; r₂ sin θ₂ ).

The vectors have lengths r₁ and r₂, and make angles θ₁ and θ₂ with the x-axis as shown in Fig. 26. Then det (a₁, a₂) is
Fig. 26. Then det (a l5 a 2 ) is
Figure 26

    det ( r₁ cos θ₁  r₂ cos θ₂ )
        ( r₁ sin θ₁  r₂ sin θ₂ )
      = r₁r₂(cos θ₁ sin θ₂ − sin θ₁ cos θ₂)
      = r₁r₂ sin (θ₂ − θ₁).

This number may be interpreted as the product of the base r₁ by the perpendicular height r₂ sin (θ₂ − θ₁).
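Theorem 7.11 turns volume computations into determinant evaluations. The Python sketch below (NumPy assumed; ours, for illustration) computes the volume of the parallelepiped of Exercise 10 at the end of this section:

    import numpy as np

    # Edges of the parallelepiped are the columns of a 3-by-3 matrix.
    edges = np.array([[1, 0, -3],
                      [1, 1,  5],
                      [0, 2, -1]], dtype=float)
    print(abs(np.linalg.det(edges)))   # 17.0, the volume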

We have seen that the absolute value of det (a₁, . . . , aₙ) can be interpreted as a volume. The sign of this determinant also has a geometric interpretation. We say that an ordered set of vectors (a₁, . . . , aₙ) in Rⁿ has positive orientation (or is positively oriented) if det (a₁, . . . , aₙ) > 0, and has negative orientation if det (a₁, . . . , aₙ) < 0. If the determinant is equal to zero, the orientation is not defined.

Example 7 shows that in R², the sign of det (a₁, a₂) is the same as the sign of sin θ, where θ = θ₂ − θ₁ is the angle from a₁ to a₂. Thus the orientation of (a₁, a₂) is positive if some counterclockwise rotation of less than 180° will turn a₁ to the direction of a₂. The orientation is negative if a clockwise rotation is required; it is not defined if a₁ and a₂ lie in the same line. Thus in R², orientation corresponds to a direction of rotation. (Note that the orientation of (a₁, a₂) is opposite to that of (a₂, a₁).) Property 7.7 of determinants, of course, implies that the orientation of a set of vectors is always reversed if two vectors of the set are exchanged.

The interpretation of orientation in R³ is less obvious. The sets of vectors (x, y, z) and (−x, y, z) shown in Fig. 27(a) and 27(b) have opposite orientations since, by Theorem 7.2, det (−x, y, z) = −det (x, y, z). The ordered set of vectors (x, y, z) is said to form a right-handed system because, when the thumb and index finger of the right hand are made to point in the x- and y-directions, the middle finger will point in the z-direction. Similarly, (−x, y, z) form a left-handed system. In this book we have chosen to draw pictures in 3-space so that the vectors
Figure 27  (a) right-handed  (b) left-handed

e₁, e₂, e₃ form a right-handed system. Since det (e₁, e₂, e₃) = det I = 1, this implies that our right-handed system has positive orientation, and a left-handed system would have negative orientation.

Let u = (u₁, u₂, u₃) and v = (v₁, v₂, v₃) be vectors in R³. The vector with coordinates

    (u₂v₃ − u₃v₂, u₃v₁ − u₁v₃, u₁v₂ − u₂v₁)

is the cross-product u × v, as in formula (2) of Section 6. A convenient way to remember the formula for the cross-product is to think of the formal "determinant"

    det ( e₁ e₂ e₃ )
        ( u₁ u₂ u₃ )
        ( v₁ v₂ v₃ ),

whose expansion by the first row gives the coordinates of u × v as the coefficients of e₁, e₂, e₃.

Figure 28

It is sometimes appropriate to combine the ideas of volume and orientation in a single concept. We define the oriented volume determined by an n-tuple of vectors to be the ordinary volume if the orientation of the n-tuple is positive, and to be the negative of the volume if the orientation is negative. Then the oriented volume of the ordered set (a₁, . . . , aₙ) is equal to det (a₁, . . . , aₙ). The relation between oriented volume and ordinary volume is very much like the relation between directed distance on a line and ordinary distance. Indeed, oriented volume may be considered a generalization of directed distance, and we use the idea in Chapter 7, Section 7.

EXERCISES
1. Find AB, BA, and the determinants of A, B, AB, and BA when

(a) A = ( 1 −2 ; 3 1 ),   B = ( 2 3 ; 4 5 ).   [Ans. det AB = −14.]

2. Find the coefficients needed to express each of the following determinants as a linear function of the x's.

3. Show that if D is the diagonal matrix diag (r₁, . . . , rₙ), then det D = r₁r₂ . . . rₙ.

4. What is the relation between
(a) det A and det (−A)?
(b) det (a₁, a₂, . . . , aₙ₋₁, aₙ) and det (aₙ, aₙ₋₁, . . . , a₂, a₁)?

5. Verify the product rule for the pairs of matrices (a) and (b) in Problem 1.

6. Apply the product rule to show that, if A is invertible, then det A ≠ 0 and det (A⁻¹) = (det A)⁻¹.

7. Let A be an m-by-m matrix and B an n-by-n matrix. Consider the (m + n)-by-(m + n) matrix ( A 0 ; 0 B ) which has A in the upper left corner, B in the lower right corner, and zeros elsewhere. Show that its determinant is equal to (det A)(det B). [Suggestion. First consider the case where one of A or B is an identity matrix, and derive the general result from the product rule.]

8. It is geometrically clear that a rotation in R² preserves orientations, that a reflection reverses them, and that both leave areas unchanged. Verify this by finding the determinants of the associated matrices. (See Problems 4 and 6 in Section 4.)

9. By interpreting the determinant as a volume, show that |det (x₁, x₂, x₃)| ≤ |x₁| |x₂| |x₃| for any three vectors in R³, and that equality holds if and only if the vectors are mutually orthogonal.

10. Find the volume, and the area of each side, of the parallelepiped with edges (1, 1, 0), (0, 1, 2), (−3, 5, −1).
    [Ans. volume = 17, areas = √166, √66, 3.]

11. If u = (2, 1, 3), v = (0, 2, −1), and w = (1, 1, 1), compute u × v, det (u, v, w), (u × v) · w, (u × v) × w, and u × (v × w).

12. Prove that u × v = −v × u, that u × (v + w) = (u × v) + (u × w), and that (au) × v = u × (av) = a(u × v), for a real.

13. Find a representation for a line perpendicular to (2, 1, 3) and (0, 2, −1), and passing through (1, 1, 1).

14. Let P be a parallelogram determined by two vectors in R³. Let Pₓ, P_y, and P_z be the projections of P on the yz-plane, the zx-plane, and the xy-plane, respectively. If A(P) is the area of P, show that A²(P) = A²(Pₓ) + A²(P_y) + A²(P_z).

15. (a) Verify by direct coordinate computation that |u × v|² = |u|² |v|² − (u · v)².
(b) Use the result of part (a) to show that |u × v| = |u| |v| sin θ, where θ is the angle between u and v such that 0 ≤ θ ≤ π.
(c) Show that |u| |v| sin θ is the area of the parallelogram with edges u and v.

16. The complex numbers can be extended to the quaternion algebra H, which is a four-dimensional vector space with natural basis {1, i, j, k}. Thus a typical quaternion is written q = a₁ + a₂i + a₃j + a₄k, where the a's are real numbers. A product is defined in H by requiring i² = j² = k² = −1 and ij = −ji = k, jk = −kj = i, ki = −ik = j. The product of two quaternions is got by multiplying out and using the above rules for products of basis vectors. R³ can be looked at as the vector subspace S of H consisting of quaternions with "real part" equal to zero and thus with natural basis {i, j, k}.
(a) Show that the quaternion product of two elements of S is not necessarily in S.
(b) Define a product on S by first forming the quaternion product and then replacing its real part by zero. Show that the resulting product is the same as the cross-product in R³.

17. Prove the identity a × (b × c) = (a · c)b − (a · b)c for vectors a, b, and c in R³. [Hint. Choose an orthonormal set of vectors (u₁, u₂, u₃) for R³ so that (u₁, u₂, u₃) is positively oriented and

    a = a₁u₁ + a₂u₂ + a₃u₃
    b = b₁u₁ + b₂u₂
    c = c₁u₁.]
SECTION 8

DETERMINANT EXPANSIONS

In the previous section we defined the determinant of an n-by-n matrix A = (aᵢⱼ) by

    det A = Σⱼ₌₁ⁿ (−1)¹⁺ʲ a₁ⱼ det A₁ⱼ,

where, in general, Aᵢⱼ denotes the (n − 1)-by-(n − 1) minor corresponding to aᵢⱼ. In the present section we prove more general formulas of the same kind. These formulas, which apply to any n-by-n matrix A, are

8.1     det A = Σᵢ₌₁ⁿ (−1)ⁱ⁺ʲ aᵢⱼ det Aᵢⱼ,

8.1R    det A = Σⱼ₌₁ⁿ (−1)ⁱ⁺ʲ aᵢⱼ det Aᵢⱼ.

Formula 8.1 holds for each integer j between 1 and n, while Formula 8.1R holds for each integer i between 1 and n. (For i = 1, Formula 8.1R coincides with Formula 7.1 used earlier in defining determinants.) For a given j, the matrix elements aᵢⱼ on the right side of 8.1 are in the jth column of A, so 8.1 is called the expansion of det A by the jth column. Similarly, the right side of 8.1R is called the expansion of det A by the ith row. We postpone the proof of the formulas to the end of the section, first showing some of their consequences.

Example 1. Let

    A = ( 2 3 4 )
        ( 5 6 7 )
        ( 8 9 0 ).

The expansion of det A by the second row is

    −5 det ( 3 4 ; 9 0 ) + 6 det ( 2 4 ; 8 0 ) − 7 det ( 2 3 ; 8 9 )
      = (−5)(−36) + (6)(−32) − (7)(−6)
      = 180 − 192 + 42 = 30.

The expansion by the third column is

    4 det ( 5 6 ; 8 9 ) − 7 det ( 2 3 ; 8 9 ) + 0 det ( 2 3 ; 5 6 )
      = (4)(−3) − (7)(−6)
      = −12 + 42 = 30.

Formulas 8.1 and 8.1R can be useful in evaluating a determinant, but some of their theoretical consequences are more important. Let the matrix A be given and consider the expression

    Σᵢ₌₁ⁿ (−1)ⁱ⁺ʲ xᵢ det Aᵢⱼ,

where x₁, . . . , xₙ may be any set of n numbers. From 8.1 we see that this is equal to a certain determinant; in fact it is the expansion by the jth column of the matrix obtained from A by replacing the jth column with (x₁, . . . , xₙ). Now consider what happens if we take x₁, . . . , xₙ equal to the elements a₁ₖ, a₂ₖ, . . . , aₙₖ of the kth column of A. If k = j, of course we simply have the expansion of det A by the jth column. If k ≠ j, we have the determinant of a matrix with two columns (the jth and kth) identical, and by Theorem 7.8 the result is 0. We have proved:

8.2 Theorem

For any n-by-n matrix A,

    Σᵢ₌₁ⁿ (−1)ⁱ⁺ʲ aᵢₖ det Aᵢⱼ = det A if k = j,   and = 0 if k ≠ j.

An exactly similar argument, using 8.1R instead of 8.1, gives the "row" form:

8.2R  For any n-by-n matrix A,

    Σⱼ₌₁ⁿ (−1)ⁱ⁺ʲ aₖⱼ det Aᵢⱼ = det A if k = i,   and = 0 if k ≠ i.

The number (−1)ⁱ⁺ʲ det Aᵢⱼ is called the ijth cofactor of the matrix A. We shall abbreviate it as āᵢⱼ, and write Ā for the matrix with entries āᵢⱼ. Theorem 8.2 can be formulated as a statement about the matrix product ĀᵗA. The jkth entry in the product is the product of the jth row of Āᵗ (i.e., the jth column of Ā) and the kth column of A. That is, it is the sum

    Σᵢ₌₁ⁿ āᵢⱼ aᵢₖ.

By Theorem 8.2 this is det A if j = k, and 0 otherwise. Hence ĀᵗA is equal to (det A)I, a numerical multiple of the identity matrix. A similar calculation using 8.2R shows that AĀᵗ is also equal to (det A)I. If det A ≠ 0, we may divide Āᵗ by it and obtain a matrix B such that AB = BA = I. Thus we have proved

8.3 Theorem

If det A ≠ 0, then A is invertible, and A⁻¹ = (det A)⁻¹ Āᵗ, where Ā is the matrix of cofactors of A.
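Theorem 8.3 can be implemented directly. The following Python sketch (NumPy assumed; the function name is ours) builds the cofactor matrix and checks the resulting inverse on the matrix of Example 1:

    import numpy as np

    def cofactor_inverse(A):
        """Inverse via Theorem 8.3: A^{-1} = (det A)^{-1} (cofactor matrix)^t."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        cof = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
                cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
        return cof.T / np.linalg.det(A)

    A = [[2, 3, 4], [5, 6, 7], [8, 9, 0]]
    print(cofactor_inverse(A) @ A)   # the identity matrix, up to rounding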

It is an important consequence of Theorem 8.3 that, if a matrix A is known only to have a "right inverse" (AB = I) or a "left inverse" (BA = I), then A is invertible and A⁻¹ = B. This is true because the product rule for determinants applied to AB = I, or to BA = I, gives

    (det A)(det B) = det I = 1.

But then det A ≠ 0, so A is invertible. That A⁻¹ = B follows immediately. (Why?)

Example 2. We shall compute the inverse of the matrix

    A = ( 2 3 4 )
        ( 5 6 7 )
        ( 8 9 0 )

used in Example 1. Write bᵢⱼ as an abbreviation for det Aᵢⱼ; the matrix B with entries bᵢⱼ is then easily calculated to be

    B = ( −63 −56 −3 )
        ( −36 −32 −6 )
        (  −3  −6 −3 ).

To obtain the matrix of cofactors Ā, insert the factors (−1)ⁱ⁺ʲ, changing the sign of every second entry and giving

    Ā = ( −63  56 −3 )
        (  36 −32  6 )
        (  −3   6 −3 ).

Since det A = 30 by Example 1, Theorem 8.3 gives

    A⁻¹ = (1/30) Āᵗ = (1/30) ( −63  36 −3 )
                             (  56 −32  6 )
                             (  −3   6 −3 ).

Now suppose that a system of n linear equations in n unknowns is written in matrix form as Ax = b, and that the coefficient matrix A is invertible. If x is a solution, then x = (A⁻¹A)x = A⁻¹(Ax) = A⁻¹b; on the other hand, A(A⁻¹b) = (AA⁻¹)b = b. In other words, the equations have a unique solution, and it is A⁻¹b. The jth entry in the column vector A⁻¹b is the matrix product of the jth row of A⁻¹ and the vector b. If det A ≠ 0, we may express the elements of A⁻¹ in terms of cofactors of A and obtain

    (det A)⁻¹ Σᵢ₌₁ⁿ (−1)ⁱ⁺ʲ (det Aᵢⱼ) bᵢ

for this product. From 8.1 this may be recognized as (det A)⁻¹ det B⁽ʲ⁾, where B⁽ʲ⁾ is the result of replacing the jth column of A by b. We have proved:

8.4 Cramer's Rule

If the determinant of the matrix of coefficients of a system of n linear equations in n unknowns x₁, . . . , xₙ is different from zero, then there is a unique solution and it is given by

    xⱼ = det B⁽ʲ⁾ / det A,

where A is the matrix of coefficients and B⁽ʲ⁾ is the result of replacing the jth column of A by the column of numbers that make up the right side of the equations.

Example 3. Solve the system

    x₁ − 2x₂ + 4x₃ = 1
    2x₁ + 3x₂ − x₃ = 3
    x₁ − x₂ + x₃ = −2.

We have

    A = ( 1 −2  4 )        B⁽¹⁾ = (  1 −2  4 )
        ( 2  3 −1 )               (  3  3 −1 )
        ( 1 −1  1 )               ( −2 −1  1 )

    B⁽²⁾ = ( 1  1  4 )     B⁽³⁾ = ( 1 −2  1 )
           ( 2  3 −1 )            ( 2  3  3 )
           ( 1 −2  1 )            ( 1 −1 −2 ).

Expanding the determinants by their first rows gives

    det A    = (1)(2) − (−2)(3) + (4)(−5) = 2 + 6 − 20 = −12
    det B⁽¹⁾ = (1)(2) − (−2)(1) + (4)(3) = 2 + 2 + 12 = 16
    det B⁽²⁾ = (1)(1) − (1)(3) + (4)(−7) = 1 − 3 − 28 = −30
    det B⁽³⁾ = (1)(−3) − (−2)(−7) + (1)(−5) = −3 − 14 − 5 = −22.

Then x₁ = 16/(−12) = −4/3, x₂ = −30/(−12) = 5/2, and x₃ = −22/(−12) = 11/6.
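Cramer's rule is only a few lines of code. The Python sketch below (NumPy assumed; ours, for illustration) re-solves the system of Example 3:

    import numpy as np

    def cramer(A, b):
        A = np.asarray(A, dtype=float)
        d = np.linalg.det(A)
        x = []
        for j in range(A.shape[0]):
            Bj = A.copy()
            Bj[:, j] = b                  # replace the jth column by the right side
            x.append(np.linalg.det(Bj) / d)
        return x

    A = [[1, -2, 4], [2, 3, -1], [1, -1, 1]]
    b = [1, 3, -2]
    print(cramer(A, b))   # [-4/3, 5/2, 11/6]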

We have not made any assertions in this section about what happens if the determinant of the coefficient matrix of a system of equations is zero. It is an easy consequence of the product rule that a matrix with zero determinant cannot be invertible (see Problem 6 in Section 7), and it will be shown in Chapter 2, where we discuss the solution of linear systems in detail, that in this case the system of equations has either no solution or infinitely many. While Cramer's rule and the formula for A⁻¹ in terms of cofactors are important as theoretical results and are quite useful for solving systems of two or three linear equations, they are less efficient for larger systems than the methods of Section 1, Chapter 2.

Since 8.1 with j = 1 is exactly like 7.1 except that it refers to the first column instead of the first row of a matrix, it is clear that transposing a matrix (which just exchanges the roles of rows and columns) should not affect the value of the determinant. The formal statement and proof follow.

8.5 Theorem

For any square matrix A, det Aᵗ = det A.

Proof. Let B = Aᵗ. By definition of the transpose, bᵢⱼ = aⱼᵢ, and it is easy to see that B₁ⱼ = (Aⱼ₁)ᵗ. Thus by definition

    det B = b₁₁ det B₁₁ − b₁₂ det B₁₂ + . . . + (−1)ⁿ⁺¹ b₁ₙ det B₁ₙ
          = a₁₁ det (A₁₁)ᵗ − a₂₁ det (A₂₁)ᵗ + . . . + (−1)ⁿ⁺¹ aₙ₁ det (Aₙ₁)ᵗ.

The Aᵢ₁ are (n − 1)-by-(n − 1) matrices, and by the inductive hypothesis we may replace det (Aᵢ₁)ᵗ by det Aᵢ₁ in the formula above. The result is equal to det A by 8.1 (with j = 1), and we have proved det Aᵗ = det B = det A.

We may now take any theorem about determinants that involves columns of matrices and immediately derive a corresponding theorem involving rows instead, by applying the given theorem to the transposes of the matrices; in particular, 8.1R is a consequence of 8.1 and 8.5. We shall not always bother to write out these corresponding theorems, but shall refer to the "row" version of a numbered statement by using the same number with an R after it. (We have already numbered 8.1R and 8.2R to conform to this convention.) For example, Theorem 7.2R would read: If B is obtained from A by multiplying some row by the number r, then det B = r det A.

The particularly important fact that a determinant is a linear function of each of its columns was proved in the previous section. It now follows that a determinant is also a linear function of each of its rows.

The following theorem (and the corresponding theorem for rows) leads to an efficient procedure for computing the determinants of large matrices.

8.6 Theorem

If C and A are n-by-n matrices and C is obtained from A by adding a numerical multiple of one column to another, then det C = det A.

Proof. Suppose C is the same as A except that cⱼ = aⱼ + raₖ. Let B be the result of replacing aⱼ in A with raₖ. By Theorem 7.4, det C = det A + det B. By Theorem 7.2, det B is r times the determinant of a matrix with two identical columns. Therefore det B = 0 and det C = det A.

Example 4. (a) Let A be the matrix of Example 5, Section 7, and let

    C = ( 1  3 0 )
        ( 2 −4 5 )
        ( 3  5 4 ).

The third column of C is equal to the third column of A plus 2 times the first column. As in Example 5, Section 7, det A = −20. Then det C = −20. As a check,

    det C = (1)(−16 − 25) − (3)(8 − 15) + (0)(10 − (−12))
          = −41 + 21 + 0 = −20.

(b) Larger determinants can be evaluated the same way: repeated addition of multiples of one column (or row) to another produces a column containing mostly zeros, and the determinant is then expanded by that column with very little arithmetic.

8.7 Lemma

If the first column of the matrix A is eᵢ, then

    det A = (−1)ⁱ⁺¹ det Aᵢ₁.

Proof. By Definition 7.1,

    det A = Σⱼ₌₁ⁿ (−1)ʲ⁺¹ a₁ⱼ det A₁ⱼ.

If the first column of A is e₁, then a₁₁ = 1, while for j > 1 the first column of A₁ⱼ is 0, and so det A₁ⱼ = 0 by Theorem 7.3. Thus the expression for det A reduces to one term, det A₁₁, and Lemma 8.7 holds for the case i = 1. For i > 1 we need to use the inductive hypothesis that Lemma 8.7 holds for (n − 1)-by-(n − 1) matrices. In this case a₁₁ = 0 and

    det A = Σⱼ₌₂ⁿ (−1)ʲ⁺¹ a₁ⱼ det A₁ⱼ.

Each minor A₁ⱼ has eᵢ₋₁ for its first column. (Removing the top entry from the vector eᵢ in Rⁿ gives eᵢ₋₁ in Rⁿ⁻¹.) By the inductive hypothesis,

    det A₁ⱼ = (−1)ⁱ det A₁ⱼ,ᵢ₁,

where we write A₁ⱼ,ᵢ₁ for the matrix obtained from A by deleting the first row, the jth column, the ith row, and the first column. Let B = Aᵢ₁. Then (since the first column of B is formed from the second column of A, etc.) A₁ⱼ,ᵢ₁ = B₁,ⱼ₋₁ and a₁ⱼ = b₁,ⱼ₋₁. Combining the equations we have derived so far gives

    det A = Σⱼ₌₂ⁿ (−1)ʲ⁺¹ b₁,ⱼ₋₁ (−1)ⁱ det B₁,ⱼ₋₁.

By Formula 7.1 the right side of this equation is (−1)ⁱ⁺¹ det B, which is (−1)ⁱ⁺¹ det Aᵢ₁, as was to be proved.
Proof of 8.1. We first prove 8.1 for the special case j = 1. The first column of A can be written as a₁ = a₁₁e₁ + a₂₁e₂ + . . . + aₙ₁eₙ, so by linearity of det and Lemma 8.7,

    det A = det (a₁, a₂, . . . , aₙ)
          = a₁₁ det (e₁, a₂, . . . , aₙ) + a₂₁ det (e₂, a₂, . . . , aₙ)
            + . . . + aₙ₁ det (eₙ, a₂, . . . , aₙ)
          = Σᵢ₌₁ⁿ (−1)ⁱ⁺¹ aᵢ₁ det Aᵢ₁.

To prove the general case, consider the matrix B obtained from A by moving the jth column into the first position. This move requires a series of j − 1 exchanges of adjacent columns; so by Lemma 7.6, det A = (−1)ʲ⁻¹ det B. For all i, we have bᵢ₁ = aᵢⱼ and Bᵢ₁ = Aᵢⱼ. Thus we obtain

    det A = (−1)ʲ⁻¹ det B
          = (−1)ʲ⁻¹ Σᵢ₌₁ⁿ (−1)ⁱ⁺¹ bᵢ₁ det Bᵢ₁
          = Σᵢ₌₁ⁿ (−1)ⁱ⁺ʲ aᵢⱼ det Aᵢⱼ,

as was to be proved.

EXERCISES
1. Using appropriate cases of 8.1 or 8.1R, express each of the following as a linear function of the x's.

(a)

(b)

Check your answers by multiplication.

4. Solve the systems

(a) 7x + 6y = 5
    6x + 5y = -3.

(b) 2x + y = 0
    3y + z = 1
    4z + x = 2.

(c) x_1 + x_2 + x_3 + x_4 = -1
    x_1 - x_2 + 2x_4 = 0
    2x_1 - x_3 = 1
    x_2 - x_4 = 0.

5. Use the method of Example 4(b) of the text to evaluate


which is zero except for the entries on or adjacent to the diagonal, is called a tridiagonal matrix. Let d_k be the determinant of the k-by-k minor formed from the first k rows and columns of A, so, for example, d_1 = a_1 and d_2 = a_1 a_2 - b_1 c_1. Show that, for k ≥ 3,

d_k = a_k d_{k-1} - b_{k-1} c_{k-1} d_{k-2}.

(b) Consider tridiagonal matrices in which the entries on the diagonal all have the value 2 and the entries next to the diagonal all have the value 1. Let d_n be the determinant of an n-by-n matrix of this type. Find a formula for d_n. [Suggestion. Start out by seeing what happens for n = 2, 3, 4.]
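A small numerical experiment (a Python sketch using numpy, not part of the exercise; the entries are random sample values) confirms the recursion of part (a) against determinants computed directly.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    a = rng.integers(1, 5, n)       # diagonal entries a_1, ..., a_n
    b = rng.integers(1, 5, n - 1)   # entries just above the diagonal
    c = rng.integers(1, 5, n - 1)   # entries just below the diagonal
    A = np.diag(a) + np.diag(b, 1) + np.diag(c, -1)

    # d_k from the recursion d_k = a_k d_{k-1} - b_{k-1} c_{k-1} d_{k-2},
    # with d_0 = 1 so the k = 2 case comes out of the same formula.
    d = [1, a[0]]
    for k in range(2, n + 1):
        d.append(a[k - 1] * d[k - 1] - b[k - 2] * c[k - 2] * d[k - 2])

    for k in range(1, n + 1):
        assert round(np.linalg.det(A[:k, :k])) == d[k]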
2

Linear Algebra

SECTION 1

LINEAR EQUATIONS, INVERSE MATRICES

A system of linear equations is a finite set of equations

1.1    a_{11} x_1 + ... + a_{1n} x_n = b_1
       ...
       a_{m1} x_1 + ... + a_{mn} x_n = b_m,

where the a's and b's are given and the x's are to be determined. The whole system can be written in matrix form as Ax = b, where A is the m-by-n coefficient matrix with entries a_{ij}, b is a column vector in R^m, and x is a column vector in R^n. Doing the matrix multiplication in the equation shows at once that it is equivalent to the system 1.1.

It frequently happens in applications that there are as many equations as there are unknowns, so that m = n. Then, if the determinant of the coefficient matrix is not zero, the solution can be obtained by Cramer's rule of Section 8, Chapter 1. The methods of the present section do not use determinants, and we do not assume m = n; even for systems that can be solved by Cramer's rule, these methods are more efficient when n is greater than 3.

Any vector c in R^n such that Ac = b is a solution of the system. As we shall show, some systems have no solution, some have exactly one, and some have infinitely many solutions.
We say that two systems are equivalent if they have exactly the same
set of solutions. Our procedure will be to take a given system and alter it

in a sequence of steps to obtain an equivalent system for which the solutions


are obvious. We illustrate the process with an example before giving a
general description.

Example 1.

3x + 12y + 9z = 3
2x + 5y + 4z = 4
-x + 3y + 2z = -5.

Multiply the first equation by 1/3, which makes the coefficient of x equal to 1 and gives

x + 4y + 3z = 1
2x + 5y + 4z = 4
-x + 3y + 2z = -5.

Add (-2) times the first equation to the second, and replace the second equation by the result. This makes the coefficient of x in the second equation equal to 0 and gives

x + 4y + 3z = 1
   -3y - 2z = 2
-x + 3y + 2z = -5.

Add the first equation to the third, and replace the third equation by the result, to get

x + 4y + 3z = 1
   -3y - 2z = 2
    7y + 5z = -4.

Multiply the second equation by -1/3, to get

x + 4y + 3z = 1
    y + (2/3)z = -2/3
    7y + 5z = -4.

Add (-4) times the second equation to the first, and (-7) times the second equation to the third, to get

x + (1/3)z = 11/3
y + (2/3)z = -2/3
    (1/3)z = 2/3.

Multiply the third equation by 3 to get

x + (1/3)z = 11/3
y + (2/3)z = -2/3
         z = 2.

Add (-1/3) times the third equation to the first and (-2/3) times the third equation to the second to get

x = 3
y = -2
z = 2.

Clearly, this system has just one solution, namely, the column vector (3, -2, 2).
It is easy to verify by substitution in a system of equations that we

have found a solution for them. This verification of course does not rule
out the theoretical possibility that the original equations might have other
solutions as well. In fact the final system is equivalent to the original
system and has the same set of solutions. The same is true for any pair of
systems where one is obtained from the other by steps such as were used
in this example. Before we can prove this, we must first state exactly

what operations are allowed and then investigate their properties.


The operations used were "multiplying an equation by a number" and "adding a multiple of one equation to another." We prefer to give the formal definitions in terms of matrices and, accordingly, consider the general matrix equation Ax = b.

We define three types of elementary operations which can be applied to any matrix M:

An elementary multiplication replaces a row of M by a numerical multiple of the row, where the multiplier is different from 0.

An elementary modification replaces a row of M by the sum of that row


and a numerical multiple of some other row.

An elementary transposition interchanges two rows of M. We did not


use any transpositions in Example 1, but they are sometimes useful.

It is important to understand that, if an elementary operation is applied to both sides of an equation Ax = b, the result is a new matrix equation that is satisfied by every vector x that satisfied the original equation. Equally important is the fact that each elementary operation has an inverse elementary operation by which the original operation can be undone or reversed. For example, multiplication of a row by a number r ≠ 0 is reversed by multiplying the row by 1/r. Similarly, an elementary modification is reversed by subtracting the same numerical multiple of the same row instead of adding it. Finally, a transposition is reversed by interchanging the two rows again.
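The three operations are easy to mechanize. The following Python sketch (an illustration in modern notation, not part of the text) implements them on the rows of an augmented matrix [A | b] and replays the steps of Example 1 in exact rational arithmetic.

    from fractions import Fraction as F

    def scale(M, i, r):       # elementary multiplication: multiply row i by r (r != 0)
        M[i] = [r * x for x in M[i]]

    def modify(M, i, j, r):   # elementary modification: add r times row j to row i
        M[i] = [x + r * y for x, y in zip(M[i], M[j])]

    def swap(M, i, j):        # elementary transposition: interchange rows i and j
        M[i], M[j] = M[j], M[i]

    # The augmented matrix [A | b] of Example 1.
    M = [[F(3), F(12), F(9), F(3)],
         [F(2), F(5), F(4), F(4)],
         [F(-1), F(3), F(2), F(-5)]]

    scale(M, 0, F(1, 3))
    modify(M, 1, 0, F(-2)); modify(M, 2, 0, F(1))
    scale(M, 1, F(-1, 3))
    modify(M, 0, 1, F(-4)); modify(M, 2, 1, F(-7))
    scale(M, 2, F(3))
    modify(M, 0, 2, F(-1, 3)); modify(M, 1, 2, F(-2, 3))
    print(M)   # the identity matrix augmented by the solution (3, -2, 2)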
Written as a matrix equation, the original system of equations in Example 1 becomes

[  3 12  9 ]       [  3 ]
[  2  5  4 ] x  =  [  4 ]
[ -1  3  2 ]       [ -5 ]

If we apply the elementary operation of multiplying the first row by 1/3 to the matrices on both sides, we obtain

[  1  4  3 ]       [  1 ]
[  2  5  4 ] x  =  [  4 ]
[ -1  3  2 ]       [ -5 ]

which is the matrix form of the system of equations at the second step in Example 1. In the same way, each operation on the system of equations amounts to applying an elementary operation to the matrices on both sides of the equivalent matrix equation. At the final stage, the matrix equation becomes

[ 1 0 0 ]       [  3 ]
[ 0 1 0 ] x  =  [ -2 ]
[ 0 0 1 ]       [  2 ]

which, since the matrix on the left is the identity matrix, simply amounts to saying that x is equal to the vector on the right.
The theorem that justifies our method of solving linear equations is as
follows.

1.2 Theorem

If the system A_1 x = b_1 is converted to a system A_2 x = b_2 by applying a sequence of elementary operations to A_1 to get A_2 and the same sequence of elementary operations to b_1 to get b_2, then the two systems are equivalent in the sense that they have the same solutions.

Proof. If x is a solution of A_1 x = b_1, then applying an elementary operation to both sides produces a new matrix equation that is still satisfied by x. So x also satisfies the equation A_2 x = b_2 that results from a finite sequence of such operations. Conversely, having transformed A_1 x = b_1 into A_2 x = b_2 by a sequence of elementary operations, we can find a sequence of inverse elementary operations which transforms the new equation back to the original one. But then the same argument that applied in the first part of the proof allows us to conclude that every solution of A_2 x = b_2 is also a solution of A_1 x = b_1.

Example 2. We now exhibit a system of equations with infinitely many solutions. Consider the matrix equation

[  1 -2 -3 ]       [ 2 ]
[  1 -3 -8 ] x  =  [ 8 ]
[ -3  5  4 ]       [ 0 ]

Add (-1) times the first row to the second row, and then add 3 times the first row to the third row to produce zeros in the second and third entries of the first column, and obtain

[ 1 -2 -3 ]       [ 2 ]
[ 0 -1 -5 ] x  =  [ 6 ]
[ 0 -1 -5 ]       [ 6 ]

Multiply the second row by (-1) to obtain

[ 1 -2 -3 ]       [  2 ]
[ 0  1  5 ] x  =  [ -6 ]
[ 0 -1 -5 ]       [  6 ]

Add 2 times the second row to the first and then add 1 times the second row to the third to obtain

[ 1 0 7 ]       [ -10 ]
[ 0 1 5 ] x  =  [  -6 ]
[ 0 0 0 ]       [   0 ]

At the corresponding stage in Example 1, we performed an elementary multiplication to make the third entry in the third row equal to 1. We were then able to obtain the identity matrix by further elementary operations. Obviously the row of zeros prevents us from following this procedure here. Instead, we translate the final matrix equation back into a system of linear equations. The result is

x + 7z = -10
y + 5z = -6
0x + 0y + 0z = 0.

The third equation is satisfied for any values of x, y, z. The first two equations may be rewritten as x = -10 - 7z and y = -6 - 5z. Thus, for any value of z,

x = (-10 - 7z, -6 - 5z, z)

is a solution, and every solution has this form for some value of z. We have now described the set of solutions of the final system, and by Theorem 1.2 we know that this is the same as the set of solutions of the matrix equation we started with.

Example 3. Consider the matrix equation

[  1 -2 -3 ]       [ 2 ]
[  1 -3 -8 ] x  =  [ 8 ]
[ -3  5  4 ]       [ 2 ]

The matrix on the left is the same as the one in Example 2. Carrying out the same sequence of elementary operations yields

[ 1 0 7 ]       [ -10 ]
[ 0 1 5 ] x  =  [  -6 ]
[ 0 0 0 ]       [   2 ]

Whatever x is, the third entry in the product

[ 1 0 7 ]
[ 0 1 5 ] x
[ 0 0 0 ]

will be zero because the third row of the left factor is zero. Thus no value of x can give a column vector with 2 in the third row, and we conclude that the equation has no solution.

(If we translate the matrix equation back into a system of equations, we obtain

x + 7z = -10
y + 5z = -6
0x + 0y + 0z = 2.

The last equation obviously cannot be satisfied for any values of x, y, z.)

In these examples we used elementary operations to transform the original systems of equations into equivalent systems for which the solutions were easy to find. The property of the final set of equations that made the solutions obvious was that each equation involved a variable that did not appear in any of the other equations. Thus in Example 2 we found x = -10 - 7z, y = -6 - 5z, and because the first equation involved x but not y, and the second involved y but not x, we could find the values of (x, y, z) satisfying both equations by considering the equations separately. In practice, there is no difficulty (beyond the labor of doing the arithmetic) in reducing any system of equations to a system that has this "noninterference" property. For such a reduced system, it is either obvious that no solution exists, as in Example 3, or else the solution or set of solutions can be described explicitly as in Example 1 or Example 2. We leave for Section 3 the proof that this procedure can always be carried out.
Suppose we have a square matrix A and are able to convert it to the identity matrix I by some sequence of elementary operations, as in Example 1. Theorem 1.2 then asserts that, if we carry out the same sequence of operations on a vector b and get c as a result, then Ax = b if and only if Ix = c. In other words, c is the (unique) solution of Ax = b. Carrying out these same operations on the identity matrix gives a matrix C, whose jth column c_j is the solution of Ac_j = e_j, where e_j is the jth column of I. Hence AC = I. This suggests that C = A^{-1} and that we have arrived at a method for computing matrix inverses. This is so, and we state the result formally.

1.3 Theorem

If an n-by-n matrix A can be converted to the n-by-n identity matrix I by a sequence of elementary operations, then A is invertible and A^{-1} is equal to the result of applying the same sequence of operations to I.

Proof. The preceding discussion showed that if C is the result of applying to I some elementary operations that convert A to I, then AC = I. To show that A is invertible and C = A^{-1}, we must show that CA = I. The argument depends on the fact that elementary operations are reversible. Now since C was constructed by applying elementary operations to I, it follows that C can be converted back to I by elementary operations. Applying to C the argument that we previously applied to A shows that there is some matrix D such that CD = I. We now have

A = AI = A(CD) = (AC)D = ID = D;

so D = A. Thus CA = I, as was to be proved.

Example 4. We shall find the inverse of

A = [ 2  4  8 ]
    [ 1  0  0 ]
    [ 1 -3 -7 ]

We start with

[ 2  4  8 | 1 0 0 ]
[ 1  0  0 | 0 1 0 ]
[ 1 -3 -7 | 0 0 1 ]

Add -2 times the second row to the first and -1 times the second row to the third to get

[ 0  4  8 | 1 -2 0 ]
[ 1  0  0 | 0  1 0 ]
[ 0 -3 -7 | 0 -1 1 ]

Multiply the first row by 1/4 and then add 3 times the first row to the third to get

[ 0 1  2 | 1/4 -1/2 0 ]
[ 1 0  0 | 0     1  0 ]
[ 0 0 -1 | 3/4 -5/2 1 ]

Multiply the third row by -1 and then add -2 times the third row to the first to get

[ 0 1 0 | 7/4 -11/2  2 ]
[ 1 0 0 | 0     1    0 ]
[ 0 0 1 | -3/4  5/2 -1 ]

Transpose the first and second rows to get

[ 1 0 0 | 0     1    0 ]
[ 0 1 0 | 7/4 -11/2  2 ]
[ 0 0 1 | -3/4  5/2 -1 ]

The last matrix on the right is A^{-1}, as may be verified by multiplying by A.
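The whole computation is mechanical enough to program. Here is a Python sketch (ours, not the text's) that row-reduces [A | I] with exact rational arithmetic and returns the right half; it reproduces the inverse found above.

    from fractions import Fraction as F

    def inverse(A):
        # Reduce [A | I] by elementary operations; the right half becomes A^-1.
        n = len(A)
        M = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
             for i, row in enumerate(A)]
        for col in range(n):
            pivot = next(r for r in range(col, n) if M[r][col] != 0)
            M[col], M[pivot] = M[pivot], M[col]           # transposition
            M[col] = [x / M[col][col] for x in M[col]]    # multiplication
            for r in range(n):
                if r != col and M[r][col] != 0:           # modifications
                    M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
        return [row[n:] for row in M]

    for row in inverse([[2, 4, 8], [1, 0, 0], [1, -3, -7]]):
        print(row)   # rows (0, 1, 0), (7/4, -11/2, 2), (-3/4, 5/2, -1)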

EXERCISES
1. Solve the following systems of equations:


3. Solve the matrix equations:

1 2
(a)
5 6

4. For each of the following matrices A, find A^{-1} if A is invertible, and find a nonzero solution of Ax = 0 if A is not invertible.

Ans.

SECTION 2

SOME APPLICATIONS

In this section we shall look at some applications of vectors and linear equations. The selection of examples is made so as to avoid technical complications from the fields of application. Several of our examples involve the notion of a network, which we define to be a finite collection of points, or nodes, some of which may be joined to some others by line segments. It is theoretically unimportant whether a network is visualized as lying in 2-dimensional space or 3-dimensional space; we choose whichever is pictorially more convenient. Some networks are illustrated in Fig. 1.

Figure 1

Example 1. Some electrical circuits can be considered as networks in which each line represents a connecting wire with a given electrical resistance. When such a circuit is connected to a battery or similar power source, currents flow in the connections, and a value of electrical potential is established at each node. For many types of connecting wires, the current flow in a wire is proportional to the difference in the potential at its two ends. In quantitative terms,

(1)    c_{ij} = (1/r_{ij})(v_i - v_j),

where c_{ij} is the current flowing from node i to node j, r_{ij} is the resistance of the connection between nodes i and j, and v_i and v_j are the values of the electrical potential at nodes i and j. (The appropriate units of measurement are amperes for current, ohms for resistance, and volts for potential.) A negative value for the current from i to j is to be interpreted as a current from j to i.

Figure 1(a) shows a circuit with four nodes and five segments, with the resistance of each segment indicated beside it. Suppose an external power source is connected at nodes 1 and 4 to maintain the values v_1 = 12 and v_4 = 0. Since node 2 has no external connection, the current flowing in must balance the current flowing out, so that if signs are taken into account, the sum of the currents out of node 2 must be zero. Using (1), we get the equation

(1/2)(v_2 - v_1) + (v_2 - v_3) + (1/6)(v_2 - v_4) = 0.

When rewritten in the form

(1/2 + 1 + 1/6)v_2 = (1/2)v_1 + v_3 + (1/6)v_4,

we see that v_2 is a weighted average of v_1, v_3, and v_4 with coefficients which are the reciprocals of the resistances in the lines joining node 2 to the others. A similar equation will hold at any node that does not have an external connection. Thus at node 3 we get

(1/6 + 1 + 1/3)v_3 = (1/6)v_1 + v_2 + (1/3)v_4.


90 Linear Algebra Chap. 2

If we put in the assumed values v_1 = 12 and v_4 = 0, and rearrange terms, we get

(5/3)v_2 - v_3 = 6
-v_2 + (3/2)v_3 = 2.

Solving this system gives v_2 = 22/3 ≈ 7.33 and v_3 = 56/9 ≈ 6.22. Once the potentials are known, other quantities can be calculated directly. For example, the current from node 1 to node 2 is calculated as (v_1 - v_2)/r_{12} = (1/2)(12 - 7.33) = 2.33. Similarly, the current from node 1 to node 3 is (1/6)(12 - 6.22) = 0.96. The total current flowing from node 1 into the rest of the network is then 2.33 + 0.96 = 3.29, which must of course be equal to the current flowing into node 1 from the external source.
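The arithmetic is easily delegated to a machine. A Python sketch (not part of the text) solving the same pair of node equations:

    import numpy as np

    # (5/3)v2 - v3 = 6 and -v2 + (3/2)v3 = 2
    A = np.array([[5/3, -1.0],
                  [-1.0, 1.5]])
    b = np.array([6.0, 2.0])
    v2, v3 = np.linalg.solve(A, b)
    print(v2, v3)                          # about 7.33 and 6.22
    print((12 - v2) / 2 + (12 - v3) / 6)   # total current out of node 1, about 3.3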

Example 2. We can consider vectors in R^2 or R^3 as representing forces acting at some point which, for convenience, we take to be the origin. The direction of the arrow is the direction in which the force acts, and the length of the arrow is the magnitude of the force. Our fundamental physical assumption here is that, if more than one force acts at a point, then the resultant force acting at the point is represented by the sum of the separate force vectors acting there. In Fig. 2 we have shown two different pictures. The resultant arrow r is shown only in Fig. 2(a). For example, suppose that the force vectors in Fig. 2(a) lie in a plane, which we take to be R^2 with the origin at the point of action. If we have

(2)    f_1 = (-1, 3),   f_2 = (4, 3),   f_3 = (-2, -4),

(a) (b)

Figure 2


then by definition

r = f_1 + f_2 + f_3 = (1, 2).

But suppose we are given the directions of the three force vectors and are asked to find constants of proportionality that will produce a given resultant, say, r = (-1, -1). In other words, suppose we want to find nonnegative numbers c_1, c_2, c_3 such that

c_1 f_1 + c_2 f_2 + c_3 f_3 = (-1, -1).

(A negative value for some one of the c's would reverse the direction of the corresponding force.) This vector equation is equivalent to the system of equations we get by substituting the given vectors (2) that determine the force directions. We find

(3)    -c_1 + 4c_2 - 2c_3 = -1
        3c_1 + 3c_2 - 4c_3 = -1.

Since we have two equations and three unknowns, we would expect, in general, to be able to specify one of the c's and then solve for the others. However, recall that the c's are to be nonnegative. (In particular, a glance at Fig. 2(a) shows that we could not get a resultant equal to (-1, -1) unless c_3 is actually positive.) Hence, we try c_3 = 1. This choice leads to the pair of equations

-c_1 + 4c_2 = 1
 c_1 +  c_2 = 1,

which has the solution c_1 = 3/5, c_2 = 2/5. Then the triple (c_1, c_2, c_3) = (3/5, 2/5, 1) is one possible solution, and the three force vectors are

c_1 f_1 = (-3/5, 9/5),   c_2 f_2 = (8/5, 6/5),   c_3 f_3 = (-2, -4),

with magnitudes

|c_1 f_1| = (3/5)√10,   |c_2 f_2| = 2,   |c_3 f_3| = 2√5.

We could equally well have asked for an assignment of force magnitudes that would put the system in equilibrium, that is, so that the resultant is the zero vector. We would then have replaced the vector (-1, -1) on the right side of Equation (3) by (0, 0), and solved the new system in a similar way.
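The computation with c_3 fixed can be checked quickly in Python (a sketch, not part of the text):

    import numpy as np

    f1, f2, f3 = np.array([-1.0, 3.0]), np.array([4.0, 3.0]), np.array([-2.0, -4.0])
    r = np.array([-1.0, -1.0])

    c3 = 1.0                                  # chosen freely, as in the text
    A = np.column_stack([f1, f2])             # unknowns c1, c2
    c1, c2 = np.linalg.solve(A, r - c3 * f3)
    print(c1, c2)                             # 0.6 and 0.4
    print(c1 * f1 + c2 * f2 + c3 * f3)        # recovers the resultant (-1, -1)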

Example 3. In this example we consider the idea of a random walk in which an object always moves among finitely many positions along specified paths, each path having a certain probability or likelihood of being used. To be specific, we assume that the probability of leaving a certain position along some particular path is the same for all paths leading away from that position. Some sample configurations are shown in Fig. 3, and it is clear that they form networks.

(a) (b)

Figure 3

It is understood that a probability is always a number between 0 and 1 inclusive, and that the probability of a particular event is equal to the sum of the probabilities of the various distinct ways in which that event can occur. Thus in Fig. 3(a), since we assume that all paths away from a position are equally likely, it will happen that the probability of leaving a_5 along each of the two possible paths is 1/2. Similarly, each of the three paths from a_4 has probability 1/3. We further assume that transition from one position to another is such that the probability of two successive events is equal to the product of their respective probabilities. Thus going from a_2 to a_1 to a_4 has the probability (1/2)(1/3) = 1/6.
We can now ask a question such as the following: What is the probability p_k of starting at a_k and arriving at the specified position a_5 without going to a_4? We see that, starting at a_1, we can go to a_5 directly with probability 1/3, or we can go to a_2 with probability 1/3 and then go to a_5 with probability p_2. Thus

p_1 = 1/3 + (1/3)p_2.

Similarly, because going to a_4 does not occur in the events we are watching,

p_2 = (1/2)p_1 + (1/2)p_3
p_3 = (1/2)p_2.

We can rewrite these equations as

p_1 - (1/3)p_2 = 1/3
-(1/2)p_1 + p_2 - (1/2)p_3 = 0
-(1/2)p_2 + p_3 = 0

and solve them by routine methods. We get

p_1 = 3/7,   p_2 = 2/7,   p_3 = 1/7.
It appears that, the nearer we start to a_5, the more likely we are to get to a_5 without going to a_4. However, it is important to understand that, in general, the values of the probabilities depend on the entire configuration. An analysis like the one just given would be useful in distinguishing completely random behavior of a creature in a maze from purposeful or conditioned behavior.
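The "routine methods" are, of course, those of Section 1; in Python (a sketch, not part of the text) the system solves in one line:

    import numpy as np

    # p1 - (1/3)p2 = 1/3,  -(1/2)p1 + p2 - (1/2)p3 = 0,  -(1/2)p2 + p3 = 0
    A = np.array([[1.0, -1/3, 0.0],
                  [-0.5, 1.0, -0.5],
                  [0.0, -0.5, 1.0]])
    b = np.array([1/3, 0.0, 0.0])
    print(np.linalg.solve(A, b))   # [3/7, 2/7, 1/7], about [0.429, 0.286, 0.143]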

Example 4. A system of interconnected water pipes can be thought of as a network in which the nodes are joints. It is usual in such a network to assign each pipe a natural direction of flow, indicated by arrows in Fig. 4.

(a)

Figure 4

Then a positive number r will represent a rate of flow in the assigned direction, while a negative number -r will represent a flow of equal rate in the opposite direction. We shall separate the flow rates into internal rates r_k and external rates denoted by t_k. (Specifying an external rate at any joint has the effect of closing off the external pipe at that joint.) We assume that the inflow at any joint must equal the outflow. Thus at the upper left corner in Fig. 4(a) we find t_1 = r_1 + r_2, while at the lower left we find t_3 + r_2 = r_3. For the network of Fig. 4(a), the complete set of equations relating the r's and the t's can be put in the form

(4)    r_1 + r_2 = t_1
       -r_1 - r_4 + r_5 = t_2
       -r_2 + r_3 = t_3
       -r_3 + r_4 + r_6 = t_4
       r_5 + r_6 = t_5.

In vector form we can write Ar = t, where A is the 5-by-6 matrix

A = [  1  1  0  0  0  0 ]
    [ -1  0  0 -1  1  0 ]
    [  0 -1  1  0  0  0 ]
    [  0  0 -1  1  0  1 ]
    [  0  0  0  0  1  1 ]

The vector equation shows that there is a linear relation between t and r, namely, that t is a linear function of r. This implies that, if the flows r_k are all multiplied by a number c, then so are the flows t_k. Similarly, if two sets of flows r_k and r'_k are superimposed, then the resulting external flows are given by t = A(r + r'). This phenomenon is called the superposition principle for linear systems. Of course, specifying the r's will completely determine what the t's must be.

Turning the problem around, we can ask to what extent specifying the external flows at the joints will specify the flows r_k in the pipes. In particular, we can try specifying that the exterior flow t_k at each joint should be zero. This leads to the system of five equations in six unknowns

r_1 + r_2 = 0
-r_1 - r_4 + r_5 = 0
-r_2 + r_3 = 0
-r_3 + r_4 + r_6 = 0
r_5 + r_6 = 0.

We can let r_6 = a be any number; then we get r_5 = -a from the last equation. Now let r_4 = b be any number. The remaining four equations are easily solved in terms of a and b, and we find the solution

r = (-a - b, a + b, a + b, b, -a, a).

A simple check shows that these assignments for r_1 through r_6 will satisfy the above system. It follows that there are infinitely many different pipe flows that will produce zero external flow at each joint. Similarly, if we have a solution r' of the original system (4) for some given vector t on the right side, then by the superposition principle any of the solutions r above can be added to r' to give a new solution r + r'; we have

A(r + r') = Ar + Ar' = 0 + t = t.
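Both facts are easy to confirm numerically (a Python sketch, not part of the text, using the matrix A written out above):

    import numpy as np

    A = np.array([[ 1,  1,  0,  0,  0,  0],
                  [-1,  0,  0, -1,  1,  0],
                  [ 0, -1,  1,  0,  0,  0],
                  [ 0,  0, -1,  1,  0,  1],
                  [ 0,  0,  0,  0,  1,  1]], dtype=float)

    def r(a, b):
        return np.array([-a - b, a + b, a + b, b, -a, a], dtype=float)

    print(A @ r(2.0, 5.0))                    # zero external flow for any a, b
    print(A @ (r(1.0, 1.0) + r(3.0, -2.0)))   # superposition: still zero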

Example 5. The derivation of Simpson's rule for approximate integration is based on the requirement that it should give exact results when applied to quadratic polynomials. The rule gives an approximation


to the integral of a function over an interval (a - h, a + h) in terms of the value of the function at the end-points and mid-point of the interval, in the form

∫_{a-h}^{a+h} f(x) dx ≈ u f(a - h) + v f(a) + w f(a + h),

where u, v, and w are constants. If the formula is to be correct for all polynomials of degree less than or equal to 2, it must in particular be correct for the polynomials f_0(x) = 1, f_1(x) = x, and f_2(x) = x^2. Each of these gives an equation for u, v, and w. For instance, for f_0(x) = 1 we have

∫_{a-h}^{a+h} f_0(x) dx = 2h   and   f_0(a - h) = f_0(a) = f_0(a + h) = 1;

so u + v + w = 2h. Using f_1(x) = x and f_2(x) = x^2 similarly gives the equations

(a - h)u + av + (a + h)w = 2ah

and

(a^2 - 2ah + h^2)u + a^2 v + (a^2 + 2ah + h^2)w = 2a^2 h + (2/3)h^3.

These equations are easily solved (see Exercise 13) and give the result

u = w = (1/3)h,   v = (4/3)h.

We have obtained a rule which is correct for the particular polynomials f_0, f_1, and f_2. Its correctness for any quadratic polynomial follows readily. Let us write E(f) for the error committed when the rule is applied to a general function f, so

E(f) = ∫_{a-h}^{a+h} f(x) dx - (1/3)h f(a - h) - (4/3)h f(a) - (1/3)h f(a + h).

It is easy to see that E is a linear function from the vector space of polynomials to R^1, and of course E(f_0) = E(f_1) = E(f_2) = 0 by construction. If f is any quadratic polynomial, so f(x) = px^2 + qx + r, then f = pf_2 + qf_1 + rf_0, and by linearity

E(f) = pE(f_2) + qE(f_1) + rE(f_0) = 0.

The same method can be used to derive a wide variety of formulas useful in numerical analysis. (See Exercise 14.)
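For a quick check, the three equations can be solved numerically (a Python sketch, not part of the text; a and h are arbitrary sample values):

    import numpy as np

    a, h = 1.3, 0.4
    M = np.array([[1.0, 1.0, 1.0],
                  [a - h, a, a + h],
                  [(a - h) ** 2, a ** 2, (a + h) ** 2]])
    rhs = np.array([2 * h, 2 * a * h, 2 * a ** 2 * h + (2 / 3) * h ** 3])
    u, v, w = np.linalg.solve(M, rhs)
    print(u, v, w)            # agrees with h/3, 4h/3, h/3
    print(h / 3, 4 * h / 3)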

EXERCISES
1. Figure 1(b) shows an electrical network with the resistance (in ohms) of each edge marked on it. Suppose an external power supply maintains node A at a potential of 10 volts and node B at a potential of 4 volts. Following the procedure of Example 1 in the text, set up equations for the potential at the other nodes and solve them. From the results, calculate the current flowing into the network at A.

2. The edges and vertices of a 3-dimensional cube form a network with 8 nodes and 12 edges. Suppose each edge is a wire of resistance 1 ohm, and that two of the vertices have external connections that maintain one of them at a potential of 1 and the other at a potential of 0. Find the value of the potential at the other vertices, and the current flowing in the external connections, if
(a) the two vertices with external connections are at opposite corners of the cube
(b) they are at the two ends of an edge
(c) they are at opposite corners of a face of the cube.

3. (a) Suppose that three forces acting at the origin in R^3 have the same directions as (1, 0, 0), (1, 1, 0), and (1, 1, 1). Find magnitudes for the forces acting in these directions so that the resultant force vector will be (-1, 2, 4).
(b) Can any force vector be the resultant of forces acting in directions specified in part (a)?

4. Show that, if three linearly independent vectors in R^3 are used to specify the directions of three forces, then magnitudes can be assigned to these forces to make the resultant force equal to any given vector.

5. If forces act in R^2 in the directions of (2, 1), (2, 2), and (-3, -1), show that magnitudes can be assigned so that the system is in equilibrium.

6. Suppose that a random walk traverses the paths shown in Fig. 3(a). What is the probability p_k, k = 1, 2, 3, that a walk starting at a_k goes to a_4 without passing through a_5?

7. (a) Suppose that a particle traces a random walk on the paths shown in Fig. 3(b). Letting p_k be the probability of going from b_k to b_6 without going through b_5, find p_k for k = 1, 2, 3, 4.
(b) How is the result of part (a) modified if b_1 and the path leading to it are eliminated altogether?
(c) How is the result of part (a) modified if a new path is introduced between b_4 and b_6?

8. Suppose that various mixtures can be made of substances S_1, S_2, S_3, S_4 having densities 2, 17, 3, and 1 respectively, measured in grams per cubic centimeter. Suppose also that the price of each substance in cents per gram is 4, 51, 3, and 1 respectively. Is it possible to make a mixture weighing 10 grams, with a volume of 20 cubic centimeters, and costing 1 dollar?

9. Suppose that a linear function f from R^3 to R^3 is such that f(e_1) = (1, 2, -1), f(e_2) = (0, 1, 1), and f(e_3) = (1, 2, 1). Find vectors x_1, x_2, x_3 such that f(x_k) = e_k, for k = 1, 2, 3.

10. Let f be the linear function from R^3 to R^3 with matrix

\0 1 3/

Show that the points x in R^3 such that f(x) = 0 all lie on a line.



11. (a) Suppose the vector t = (t_1, t_2, t_3, t_4, t_5) in Fig. 4(a) is specified to be t = (-1, 0, 1, 2, 1). Find a vector r that determines consistent internal flow rates.
(b) Verify that the vector r = (-a - b, a + b, a + b, b, -a, a), for any a and b, is consistent with external flow t = 0 in Fig. 4(a).

12. (a) Let the external flow vector in Fig. 4(b) be given by t = (1, 1, 2, 4). Show that there is more than one consistent internal flow vector r, and find two of them.
(b) Let the external flow vector in Fig. 4(b) be given by t = (1, 0, 1, 1). Show that there is no consistent internal flow vector.

13. Carry out the solution of the equations for u, v, w given in Example 5 of the text. [Suggestion: begin by subtracting a times the first equation from the second and a^2 times the first equation from the third.]

14. By using the method of Example 5, find constants t, u, v, w such that the formula

∫_{a}^{a+3h} f(x) dx = t f(a) + u f(a + h) + v f(a + 2h) + w f(a + 3h)

is exact whenever f is a polynomial of degree less than or equal to 3.

SECTION 3

THEORY OF LINEAR EQUATIONS

In this section we are concerned with formalizing the method of solving linear equations that has been illustrated in the two preceding sections, and in proving that it always works. We begin by giving a precise definition for the "noninterference" property discussed informally after Example 3 of Section 1. At that point we expressed the idea by saying that each equation of a system contained a variable that did not occur in any other equation. The following definitions express this "noninterference" property in terms of the coefficient matrix.

An entry in a matrix is called a leading entry if it is the first nonzero entry in its row. A matrix is called reduced if it satisfies the following two conditions:

3.1 (a) Every column containing a leading entry is zero except for the leading entry.
    (b) Every leading entry is 1.

In discussing reduced matrices it is frequently necessary to distinguish


the columns that contain leading entries from those that do not. (Of
course a row contains a leading entry if and only if it is nonzero.) We
shall say that the columns that do contain leading entries are pivotal. In
a reduced matrix, each leading entry belongs to one nonzero row and one
pivotal column; we shall say that the row and column are associated.
This gives a one-to-one correspondence between nonzero rows and
pivotal columns, and it establishes the following fact which will be
referred to later.

3.2 In a reduced matrix, the number of pivotal columns equals the


number of nonzero rows.

Example 1. Consider the matrices


3.3 Theorem

Suppose A is a reduced matrix. If A has a row of zeros for which the corresponding entry in b is not zero, then the equation Ax = b has no solution. Otherwise it has at least one solution.

Proof. If any row of A is zero, then the corresponding entry in Ax will be zero, whatever x may be. Thus if the corresponding entry in b is not zero, there can be no solution.

To prove the other half of the theorem, we assume that every zero row of A (if there is any) is matched by a zero entry in b, and we show how to write down a vector x that is a solution. The vector x must of course have entries x_1, ..., x_n corresponding to the n columns of A, while b has entries b_1, ..., b_m corresponding to the m rows of A. If the jth column of A is nonpivotal, set x_j = 0. If the jth column of A is pivotal, let i be the number of the associated row, and set x_j = b_i. We claim that the vector x constructed in this way satisfies Ax = b. The ith entry in the product is Σ_{k=1}^{n} a_{ik} x_k. We must show that it equals b_i. It is zero if the ith row of A is zero, and by assumption b_i is then also zero. If the ith row is not zero, then a_{ij} = 1, where j is the associated pivotal column. By construction, x_j = b_i, so a_{ij} x_j = b_i. The terms a_{ik} x_k with k ≠ j are all zero because a_{ik} = 0 if the kth column is pivotal (because A is reduced) and x_k = 0 if the kth column is not pivotal (by construction).

Example 2. Consider

A = [ 0 0 1 5 0 ]
    [ 1 2 0 3 0 ]
    [ 0 0 0 0 0 ]
    [ 0 0 0 0 1 ]

together with two vectors b_1 and b_2 in R^4.

The third row of A is zero and the third entry of b_1 is not; so according to Theorem 3.3 the equation Ax = b_1 has no solution. On the other hand, the third entry of b_2 is zero. The pivotal columns of A are the first, third, and fifth, with associated rows the second, first, and fourth. The proof of the theorem shows that, if we construct x by making the first, third, and fifth entries equal to the second, first, and fourth entries of b_2, and making the other entries zero, then Ax will be equal to b_2. This is easily verified by doing the matrix multiplication.
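The construction in the proof is short enough to program directly (a Python sketch, not part of the text; the entries of b_2 below are hypothetical sample values with third entry zero):

    import numpy as np

    A = np.array([[0, 0, 1, 5, 0],
                  [1, 2, 0, 3, 0],
                  [0, 0, 0, 0, 0],
                  [0, 0, 0, 0, 1]], dtype=float)
    b2 = np.array([7.0, -2.0, 0.0, 4.0])   # hypothetical; third entry must be 0

    # pivotal columns 1, 3, 5 are associated with rows 2, 1, 4 (here 0-indexed)
    x = np.zeros(5)
    for col, row in [(0, 1), (2, 0), (4, 3)]:
        x[col] = b2[row]
    print(x, A @ x)   # A x reproduces b2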

The next problem is how to tell whether an equation that has a solution has more than one. Theorem 3.4 and its corollary show that the question can be reduced to a special case, and Theorem 3.5 deals with this special case.

3.4 Theorem

Suppose x_0 is a solution of Ax = b. Then x_1 is also a solution if and only if x_1 - x_0 is a solution of Ax = 0.

Proof. We have A(x_1 - x_0) = Ax_1 - Ax_0 = b - b = 0. Conversely, if A(x_1 - x_0) = 0, then Ax_1 = Ax_0 = b.

The equation Ax = 0 is often called the homogeneous equation associated with Ax = b. Observe that the homogeneous equation always has at least one solution, namely, x = 0. From 3.4 we immediately obtain the following.

Corollary

Suppose Ax = b has at least one solution. Then it has exactly one solution if and only if x = 0 is the only solution of the associated homogeneous equation.

3.5 Theorem

Suppose A is reduced. If every column of A is pivotal, then x = 0 is the only solution of the homogeneous equation Ax = 0. Otherwise the equation has solutions with x ≠ 0.

Proof. Suppose every column of A is pivotal. Let i be the number of the row associated with column j. Then a_{ij} = 1 and a_{ik} = 0 for k ≠ j (because A is reduced and all columns are pivotal). Thus the ith entry in Ax is Σ_{k=1}^{n} a_{ik} x_k = x_j; so if Ax = 0, then x_j = 0 for all j.

Conversely, if r is the number of a nonpivotal column, we can construct a nonzero solution of Ax = 0 as follows. Take x_r = 1 (which guarantees x ≠ 0) and take x_j = 0 if j is the number of any other nonpivotal column of A. If the jth column is pivotal, take x_j = -a_{ir}, where i is the number of the row associated with column j. As in the proof of Theorem 3.3, we look at the product Ax a row at a time. Zero rows of A, of course, give zero entries in the product. If the ith row of A is nonzero, we get the sum Σ_{k=1}^{n} a_{ik} x_k. If k is the number of any pivotal column except the jth (where column j is associated with row i), we have a_{ik} = 0. If k is the number of any nonpivotal column except the rth, we have x_k = 0. Thus the sum reduces to a_{ij} x_j + a_{ir} x_r = 1·(-a_{ir}) + a_{ir}·1 = 0, as required.

Example 3. Consider the homogeneous equation Ax = 0, where A is the matrix of Example 2. The second and fourth columns of A are nonpivotal. The construction given in the proof of Theorem 3.5 can thus be applied to give a solution with x_2 = 1, x_4 = 0, and one with x_4 = 1, x_2 = 0. The vectors obtained are

y = (-2, 1, 0, 0, 0)   and   z = (-3, 0, -5, 1, 0).

Any combination ry + sz is also a solution. (Why?) Using Theorem 3.4 to combine this information with the result of Example 2, we see that all vectors of the form

x + ry + sz,

where x is the particular solution constructed in Example 2, are solutions of Ax = b_2.
We have answered most of the questions about solutions of linear systems that have a reduced coefficient matrix. If we can convert any given system into a reduced one by elementary operations, then we have a general method of finding solutions of linear systems. The examples of Section 1 illustrate a reduction process that can in fact be applied successfully to any matrix.
Suppose a matrix is not reduced. Then there must be some column containing a leading entry such that either 3.1(a) or 3.1(b) (or both) is violated. If the column contains the leading entry r for the ith row, multiplying the ith row by r^{-1} will make the leading entry 1. (Since r was a leading entry, it could not be zero. Of course it might be 1 to begin with, and the multiplication would be unnecessary.) If any other entries in the column are nonzero, they can be made zero by adding suitable multiples of the ith row to the rows they are in. Any column that was "correct" before these operations must have a zero for its ith entry, and therefore is unaltered by them. We have just described a process that can be applied to any unreduced matrix and that increases the number of columns that satisfy the Conditions 3.1. If the resulting matrix is not reduced, the process can be repeated. A reduced matrix will be obtained within at most n steps, where n is the number of columns in the matrix. We have proved

3.6 Theorem

Given any matrix A, a sequence of elementary operations can be


found which converts A to a reduced matrix.

Example 4. We shall apply the method given in the proof of Theorem 3.6 to reduce a matrix whose first row is (1 3 -2 0). Column 1 does not satisfy 3.1(a), but has a leading entry of 1 in the first row; so no elementary multiplication is necessary. Subtracting 2 times row 1 from row 2, and adding row 1 to row 3, clears the other two entries of column 1 to zero. Column 2 then does not contain a leading entry. The 4 in column 3 can be converted to a 1 by multiplying the second row by 1/4. Adding 2 times row 2 to row 1 and (-2) times row 2 to row 3 then clears out the other entries in column 3. Finally, an elementary multiplication of row 3 and two elementary modifications clear out column 4, and

[ 1 3 0 0 ]
[ 0 0 1 0 ]
[ 0 0 0 1 ]

is the final result.


In working this example we used the standard procedure given in the
proof of 3.6. This of course is not the only sequence of steps that will
give a reduced matrix, and a different sequence may require less arithmetic.

For instance, consider the matrix A 2 Adding row 3 to row


. 1 and subtract-
ing 2 times row 3 from row 2 gives

1
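The standard procedure is also short enough to program. The following Python sketch (ours; it organizes the proof's column-by-column bookkeeping row by row, which produces a reduced matrix of the same kind) uses exact rational arithmetic.

    from fractions import Fraction as F

    def reduce_matrix(M):
        # Convert M to a reduced matrix (Conditions 3.1) by elementary operations.
        M = [[F(x) for x in row] for row in M]
        for i in range(len(M)):
            j = next((k for k, x in enumerate(M[i]) if x != 0), None)
            if j is None:
                continue                            # a zero row needs no work
            M[i] = [x / M[i][j] for x in M[i]]      # make the leading entry 1
            for r in range(len(M)):
                if r != i and M[r][j] != 0:         # clear the rest of column j
                    M[r] = [x - M[r][j] * y for x, y in zip(M[r], M[i])]
        return M

    for row in reduce_matrix([[0, 2, 4], [1, 1, 1], [2, 4, 8]]):
        print(row)   # a reduced matrix; ordering its rows is Exercise 11 below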

We now turn to the special case of systems of n equations in n unknowns, that is, systems with square coefficient matrices. The following theorem is the one most important in applications.

3.7 Theorem

If A is a square matrix and the only solution of the homogeneous equation Ax = 0 is x = 0, then the equation Ax = b has exactly one solution for every vector b.

Proof. By Theorem 3.6 we can convert A to a reduced matrix C and, using the same elementary operations, convert b to a vector d, so that Cx = 0 has the same solution set as Ax = 0 and Cx = d has the same solution set as Ax = b. The hypothesis therefore implies that Cx = 0 has no solution except x = 0. Since C is reduced, it follows from Theorem 3.5 that every column of C is pivotal. The number of nonzero rows of C is equal to the number of pivotal columns. Since C is square, there can be no zero rows; therefore, by Theorem 3.3, Cx = d (and hence Ax = b) has at least one solution. That there is exactly one solution follows from the corollary of Theorem 3.4.

We can now also prove the converse of Theorem 1.3, namely:

3.8 Theorem

If A is invertible, then it can be converted to I by elementary operations.

Proof. If there is an x ≠ 0 with Ax = 0, then A cannot have an inverse, since Ax = 0 would imply A^{-1}Ax = A^{-1}0, i.e., x = 0, which would be a contradiction. Otherwise A can be converted to a reduced matrix C, all of whose columns are pivotal. Every column of C then contains one 1 and has all other entries 0, and the 1's in different columns belong to different rows. Appropriately changing the order of the rows (which can be done by elementary transpositions) will give the identity matrix. Thus A can be converted to I by elementary operations.

We thus see that it is sensible to apply the method for computing inverses given by Theorem 1.3 to any square matrix A. If the method succeeds, it shows that the matrix is invertible and computes A^{-1}. If the method fails, then one obtains a reduced matrix with at least one nonpivotal column and, hence, by Theorem 3.5, a nonzero x such that Ax = 0, which demonstrates that A is not invertible.
We conclude by observing that every elementary operation, when applied to a column vector in R^n, is a linear function, called an elementary transformation. Indeed, it is obvious that (a) multiplication of a coordinate by r ≠ 0, (b) addition of a multiple of one coordinate to another, and (c) transposition of two coordinates all have the properties of linearity. It follows that each of these operations can be performed by multiplying on the left by some n-by-n elementary matrix. (The precise forms of such matrices are described in Exercise 7 at the end of this section.) This fact enables us to prove the next theorem.

3.9 Theorem

An invertible matrix A can be written as a product of elementary matrices. It follows that the linear function defined by

f(x) = Ax

can be expressed as a composition of elementary transformations.

Proof. By Theorem 3.8, the matrix A can be reduced to I by applying a product

Q = A_1 ... A_k

of elementary matrices to A. Thus QA = I. But since each elementary operation is reversible, each matrix A_i has an inverse A_i^{-1}. Then Q has an inverse

Q^{-1} = A_k^{-1} ... A_1^{-1},

and so A = Q^{-1}I = A_k^{-1} ... A_1^{-1}. This expresses A as the desired product.

EXERCISES
1. Determine which of the following matrices are reduced. For those that are
not, state exactly how they violate the conditions. For those that are
reduced, list the pivotal columns and their associated rows.


2. For each matrix of Exercise 1 which is not already reduced, find an


elementary operation which changes it to a reduced matrix.

3. Let r, s, and t be given vectors. For each of the equations Ax = r, Ax = s, Ax = t, where A is the matrix of Problem 1, determine whether the equation has no solutions, exactly one solution, or more than one solution. If there is one solution, give it; if there are more than one, give two different solutions.

4. Repeat Problem 3 using the matrices B, C, and D instead of A from


Problem 1. (Remember to get the coefficient matrix in reduced form.)

5. Show that if a square n-by-n matrix is reduced and has no all-zero row, then every row and column contains n - 1 zeros and one 1. Hence show that it can be converted to an identity matrix by elementary transpositions.

6. Determine the solution sets of the following systems of equations.

[Ans. r(1, -2, 1) + (0, -9, ¥).]

7. (a) Denote by D_i(r) a matrix which is the same as the identity matrix except for having r in place of 1 in the ith diagonal entry. Show that the matrix product

D_i(r)M

is the elementary multiplication of M that results from multiplying the ith row of M by r.

(b) Denote a matrix with 1 for its ijth entry and 0's elsewhere by E_{ij}. For example, for 3-by-3 matrices,

E_{11} = [ 1 0 0 ]
         [ 0 0 0 ]
         [ 0 0 0 ]

Show that, if M is an m-by-n matrix and E_{ij} and I are both m-by-m, then

(I + rE_{ij})M

is the elementary modification of M which results from adding r times the jth row to the ith row.

(c) Denote by T_{ij} the matrix obtained from I by exchanging the ith and jth rows. Show that exchanging the ith and jth rows of M gives T_{ij}M.

(d) Any matrix of the form (I + rE_{ij}), D_i(r), or T_{ij} is called an elementary matrix. Show that each of the three types of elementary matrices is invertible by verifying that, for i ≠ j,

(I + rE_{ij})^{-1} = (I - rE_{ij}),   D_i(r)^{-1} = D_i(1/r),   and   T_{ij}^{-1} = T_{ij}.

8. Prove that if there are more unknowns than there are equations in a linear
system, then the system has either no solutions or infinitely many.

9. Suppose A is a square matrix. Prove that A is invertible if and only if every


matrix equation Ax = b has at least one solution.

10. Let p(x) = a_0 + a_1 x + ... + a_n x^n be a polynomial of degree ≤ n. It is a well-known theorem of algebra that if there are more than n values of x which make p(x) = 0, then all of its coefficients are zero. Use this theorem to show that if x_0, ..., x_n are any n + 1 different numbers, and b_0, ..., b_n are any n + 1 numbers, then there is exactly one polynomial of degree ≤ n such that p(x_0) = b_0, ..., p(x_n) = b_n. [Hint. Show that the problem leads to a system of linear equations with a_0, ..., a_n as unknowns.]

11. A reduced matrix in which the first pivotal column (starting from the left)
is associated with the first row, the second pivotal column is associated with

the second row, etc., is said to be in echelon form. Show that:

(a) Any reduced matrix can be put in echelon form by a sequence of


elementary transpositions.
(b) If a matrix is in echelon form, then the zero rows (if there are any)
come last.

(c) A square matrix in echelon form is either an identity matrix or has at


least one zero row.

SECTION 4

VECTOR SPACES, SUBSPACES, DIMENSION

In the earlier parts of the book we have restricted ourselves to vectors in R^n. In this section we consider more general vector spaces, though some of the ideas have already been introduced with R^2 and R^3 as the main examples.

Recall that a vector x is a linear combination of the vectors x_1, ..., x_n if there are numbers r_1, ..., r_n such that

x = r_1 x_1 + ... + r_n x_n.

Example 1. Let

x_1 = (1, 0, 0),   x_2 = (0, 1, 0),   x_3 = (0, 0, 1),   x_4 = (1, 1, 1).

Then y = (2, 2, 0) is a linear combination of x_1 and x_2 because it is equal to 2x_1 + 2x_2; it is a linear combination of x_3 and x_4 because it is equal to 2x_4 - 2x_3. On the other hand, it is not a linear combination of x_2 and x_3, because rx_2 + sx_3 has a first entry of 0 whatever the values of r and s; so y = rx_2 + sx_3 is impossible.

Since 0 = 0x_1 + ... + 0x_n, the zero vector is a linear combination of any set of vectors. The linear combinations of a single vector x_1 are just the numerical multiples rx_1.

If a set of vectors lies in a plane through the origin in 3-space, then every linear combination of them lies in the same plane: recall that if x and y are in a plane through the origin, the parallelogram rule makes x + y a vector in the same plane, and any numerical multiple of x lies in the same plane because it lies in the line containing x. Any linear combination of x_1, ..., x_n is built up by multiplications and additions, and if the vectors x_1, ..., x_n lie in a plane, so do all linear combinations of them. Similarly, if x_1, ..., x_n all lie in one line through the origin (so they are all multiples of some one vector), any linear combination of them lies in the same line.

These remarks suggest the following generalization, which includes lines and planes as special cases: A subset S of a vector space V is called a linear subspace (or, frequently, simply a subspace) if every linear combination of elements of S is also in S. We assume S is non-empty.

Example 2. We list some examples of subspaces.

(a) The set of all vectors in R^n with first entry equal to 0 is a subspace, since any linear combination of such vectors will also have a 0 for its first entry.


(b) For any vector space V, V itself is a subspace. The term proper subspace is often used to refer to those subspaces that are not the whole space. In any vector space the zero vector forms a subspace all by itself. This subspace is called the trivial subspace (because it is).

(c) For any linear function f from V to W, the set N of vectors x in V with f(x) = 0 is a subspace of V called the null space of f. If x_1, ..., x_k are in N and x = Σ_{i=1}^{k} r_i x_i, then f(x) = Σ_{i=1}^{k} r_i f(x_i) = 0, because f is linear and all the f(x_i) are 0. Hence x is also in N, and so N is a subspace of V. In particular, the set of vectors (x, y, z) in R^3 such that

x + 2y + 3z = 0,

or equivalently,

(1 2 3) x = 0,

is a subspace because it is the null space of the linear function from R^3 to R^1 defined by the preceding 1-by-3 matrix.

(d) The range of a linear function f defined on a vector space is a subspace of the range space of f. The reason is that for y_1, ..., y_k to be in the range of f means that there are vectors x_1, ..., x_k in the domain of f such that f(x_i) = y_i for i = 1, ..., k. But then, by the linearity of f, an arbitrary linear combination of the y_i has the form

r_1 y_1 + ... + r_k y_k = r_1 f(x_1) + ... + r_k f(x_k)
                        = f(r_1 x_1 + ... + r_k x_k).

Because the domain of f is a vector space, r_1 x_1 + ... + r_k x_k is in it, and so r_1 y_1 + ... + r_k y_k is in the range of f. In particular, the set of all vectors in R^3 of the form Ax, where A is a fixed 3-by-2 matrix and x runs through R^2, is a subspace because it is the range of the linear function defined by A.

We define the span of a set S of vectors to be the set consisting of all linear combinations of elements of S. It is easy to show that the span of any set is a subspace. Suppose x = Σ_{i=1}^{k} r_i x_i is a linear combination of vectors x_1, ..., x_k which are in the span of S; that is, x_i = Σ_{j=1}^{n} s_{ij} u_j for some vectors u_j in S. Then

x = Σ_{i=1}^{k} r_i ( Σ_{j=1}^{n} s_{ij} u_j ) = Σ_{j=1}^{n} t_j u_j,   where t_j = Σ_{i=1}^{k} r_i s_{ij}.

This shows that x is itself in the span of S.
We recall from Section 1 of Chapter 1 the general definition of a vector space as a set V of elements with operations of addition and numerical multiplication such that for any elements x, y, z of V and real numbers r, s:

1. rx + sx = (r + s)x.
2. rx + ry = r(x + y).
3. r(sx) = (rs)x.
4. x + y = y + x.
5. (x + y) + z = x + (y + z).
6. There exists an element 0 in V such that x + 0 = x.
7. For any x in V, x + (-1)x = 0.

Now if S is any subspace of a vector space V, then the operations of addition and numerical multiplication, as given for V, always yield a result in S when they are applied to elements of S, because x + y = 1x + 1y and rx = rx + 0y are linear combinations of x and y. The laws 1 through 5 certainly hold for x, y, z in S, since they hold for all elements of V. The zero vector belongs to S, since 0 = 0x for any x in S; also, if x is in S, then so is (-1)x = -x. Thus laws 6 and 7 hold as well. In other words, we have proved the next theorem.

4.1 Theorem

Any subspace of a vector space is a vector space, with the operations


inherited from the original space.

While subspaces of R^n provide important examples of vector spaces, there are many others. We list some below.

Example 3. (a) Let C consist of all continuous real-valued functions of a real variable. Define f + g and rf in the obvious way as the functions whose values for any number x are f(x) + g(x) and rf(x), respectively. (Of course, we are using the theorems that f + g and rf are continuous if f and g are.) It is easy to verify that the laws for a vector space are satisfied.

(b) Let P be the subspace of C consisting of all polynomials, i.e., all functions f that can be expressed by a formula f(x) = a_0 + a_1 x + ... + a_k x^k for some constants a_0, ..., a_k. (What needs to be checked to verify that this is a subspace?)

(c) Let P_n be the subspace of polynomials of degree less than or equal to n, i.e., those that require no power of x higher than the nth for their expression. For k < n, P_k is a subspace of P_n, and all P_n are subspaces of P.

(d) Let C^(k) be the vector space of real-valued functions f(x) whose first k derivatives are continuous for all real x. Then C^(k+1) is a subspace of C^(k). If we denote by C^(∞) the vector space of functions having derivatives of all orders, then C^(∞) is a subspace of C^(k) for every k.

The description of lines, planes, and ordinary space as 1-, 2-, and 3-dimensional is familiar. It is possible to define the dimension of any vector space. The examples of lines and planes suggest that the span of k vectors should have dimension k. This is not quite right, since, for example, the span of two vectors that happen to lie in the same line will be only a line instead of a plane. To handle the question properly requires the concept of linear independence introduced in Chapter 1. We recall the following definition.

A set of vectors {x_1, ..., x_k} is (linearly) independent if the only set of numbers r_1, ..., r_k such that r_1 x_1 + ... + r_k x_k = 0 is the set r_1 = r_2 = ... = r_k = 0. A set of vectors is (linearly) dependent if it is not independent. For {x_1, ..., x_k} to be dependent therefore means that there are numbers r_1, ..., r_k, not all zero, such that r_1 x_1 + ... + r_k x_k = 0.

Example 4. (a) The four vectors x_1 = (2, 0, 0), x_2 = (0, -2, 0), x_3 = (0, 0, 3), x_4 = (2, -2, 3) are linearly dependent since x_1 + x_2 + x_3 - x_4 = 0. The set of three vectors x_1, x_2, x_3 is independent since rx_1 + sx_2 + tx_3 = 0 only if r = s = t = 0.

(b) A set of two vectors x, y is independent only if neither is a numerical multiple of the other. For example, if y = 3x, then 3x - y = 0, and the vectors are dependent.
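In R^n, the question of independence is the question whether the homogeneous system having the given vectors as columns admits a nonzero solution, so it can be settled by the methods of Section 3. A quick floating-point test (a Python sketch, not part of the text) uses numpy's rank computation: the vectors are independent exactly when the rank equals their number.

    import numpy as np

    # columns are x1, x2, x3, x4 of Example 4(a)
    X = np.array([[2, 0, 0, 2],
                  [0, -2, 0, -2],
                  [0, 0, 3, 3]], dtype=float)

    print(np.linalg.matrix_rank(X))          # 3 < 4: the four vectors are dependent
    print(np.linalg.matrix_rank(X[:, :3]))   # 3 = 3: x1, x2, x3 are independent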

A set of vectors which is linearly independent and spans a space V is called a basis for V.

Example 5. (a) The natural basis vectors e_1, ..., e_n, where e_i has 1 for its ith entry and 0 for all other entries, form a basis for R^n. Verification that the e_i are in fact linearly independent and span R^n is left to the reader.

(b) In the space P_n of Example 3(c), the polynomials x_0 = 1, x_1 = x, ..., x_n = x^n form a basis. Obviously, if f(x) = a_0 + a_1 x + ... + a_n x^n is a polynomial of degree less than or equal to n, it is the linear combination a_0 x_0 + ... + a_n x_n of the x's. If a_0 x_0 + ... + a_n x_n is the zero function, then (since a polynomial of degree less than or equal to n cannot have more than n roots unless its coefficients are all zero) a_0 = a_1 = ... = a_n = 0.
While it is true that every vector space has a basis, we shall usually
consider only those which have a basis with a finite number of elements.
The next theorem implies that if a space is spanned by a finite set (which is

perhaps not linearly independent), then it has a finite basis.

4.2 Theorem

Let V be the span of the vectors x_1, ..., x_n. Either V consists of the zero vector alone, or some subset of the x's is a basis for V.

Proof. If the set x_1, ..., x_n is independent, then it is itself a basis for V. Otherwise some relation r_1 x_1 + ... + r_n x_n = 0 holds, with at least one r, say r_k, different from 0. Then we can divide by r_k and obtain x_k = -(r_1/r_k)x_1 - ... - (r_n/r_k)x_n, where x_k does not appear on the right side. By substituting the right side for x_k in any linear combination of all the x's, a linear combination is obtained that does not involve x_k. In other words, x_k can be dropped and the span of the remaining vectors will still be all of V. If the resulting subset is not independent, the process can be repeated. It must end in a finite number of steps either because a basis has been obtained or because all the vectors have been discarded. The latter is possible only if the space contains only the zero vector.

The following theorem is the most important step in developing the


theory of dimension.

4.3 Theorem

If a vector space is spanned by n vectors x_1, ..., x_n, and y_1, ..., y_k is a linearly independent set of k vectors in the space, then k ≤ n.

Proof. The statement is easy to prove if n = 1. In this case, the space spanned by x_1 consists of all numerical multiples of x_1. Then for any two vectors y_1 and y_2 in the space, one must be a multiple of the other, and so y_1 and y_2 cannot be independent. We proceed by induction and assume that, in any space spanned by n - 1 vectors, no independent subset can have more than n - 1 vectors in it.

Suppose we now are given a space spanned by n vectors x_1, ..., x_n and containing k vectors y_1, ..., y_k with k > n. If we can show that the y's are dependent, then we will have shown that the statement of the theorem holds for a spanning set of n elements, and the inductive proof will be complete. Each of the vectors y_1, ..., y_{n+1} can be written as a linear combination of x_1, ..., x_n, which means that there are numbers a_{ij} such that y_i = Σ_{j=1}^{n} a_{ij} x_j, for i = 1, ..., n + 1. If the n + 1 numbers a_{i1} are all zero, then the y's all lie in the space spanned by the n - 1 vectors x_2, ..., x_n; thus by the inductive assumption the y's would then be dependent, as we want to show.

Otherwise (by renumbering, if necessary) we may suppose that a_{11} is not zero. Then define n vectors z_2, ..., z_{n+1} by setting z_i = y_i - (a_{i1}/a_{11})y_1. By using the equations giving y_i in terms of the x's, it is easy to see that

z_i = Σ_{j=2}^{n} (a_{ij} - (a_{i1}/a_{11})a_{1j}) x_j,

so that the z's are linear combinations of the n - 1 vectors x_2, ..., x_n. By the inductive assumption, there are numbers r_2, ..., r_{n+1}, not all zero, such that r_2 z_2 + ... + r_{n+1} z_{n+1} = 0. Using the definition of the z's in terms of the y's, this last relation becomes

-(1/a_{11})(r_2 a_{21} + ... + r_{n+1} a_{n+1,1})y_1 + r_2 y_2 + ... + r_{n+1} y_{n+1} = 0.

But since not all the r's are zero, this implies that y_1, ..., y_{n+1} are dependent, as we wanted to show.

4.4 Theorem

Let TF be a vector space with a basis of n elements. Then every basis


for °0 has n elements.

Proof. Let {xx and {y l5 xj C


y k } be two bases for U. Since . . .
,

both sets are independent, and both are spanning sets, Theorem 4.3
implies that k < n and n < k.

The dimension of a vector space that has a finite spanning set is the
number of elements in any basis for the space. (The dimension of the
space consisting of the zero vector alone is defined to be 0.) We write
dim (1)) for the dimension of the vector space C U. Note that Theorem 4.2
114 Linear Algebra Chap. 2

guarantees the existence of a basis, and that Theorem 4.4 guarantees that
the dimension does not depend on which basis is taken.

Example 6. (a) By Example 5(a), dim (:K") = n.

(b) By Example 5(b), dim (P n ) = n + 1.

(c) The space P of all polynomials (Example 4(b)) does not have any
finite spanning set. If it did have one with k elements, then the fact that
1, x, x2 , . . .x k are k -\-
, 1 linearly independent elements of P would
contradict Theorem 4.3.

A vector space with a finite basis is said to be finite-dimensional. As


we have just seen in Example 6(c), there are spaces which are not finite-

dimensional.
Theorem 4.2 asserts that we can get a basis from a finite spanning set by
deletingsome of its members. The next theorem shows that, in a finite-
dimensional space, we can get a basis from a linearly independent set by
adding vectors to it.

4.5 Theorem

Let 5" = {x l5 . . . , Xj.} be a linearly independent set in a vector space


C
\J. If S is not already a basis, either it can be extended to a finite
C
basis for U, or else it can be extended to an infinite sequence of
independent vectors, and "TJ is not finite-dimensional.

Proof. Suppose x1% . .


.
, x k are linearly independent but do not span
C
all of U. Then there is some vector y that is not a linear combination
of x l5 . . . , xk . set x ls
Take x trl xk = y. We claim that the . . . , ,

xk+1 is linearly independent. Suppose r 1 x l + + rk+1 x k+1 = 0. We . . .

must show that all the r's are 0. If rk+l were not 0, we could write
x A.^ x = — (/'i//'i + i)x 1 — ... — (rk lrk+1 )x k which is impossible because
. ,

x k+1 is not a linear combination of the other x's. Therefore we have


rk+i — and r^! + + rk x k = 0. Since x x x k are inde- . . . , . . . ,

pendent, the last equation implies rx = . . . = rk = 0. In other


words, if a linearly independent set does not span a space, then a
vector can be added to it so that the resulting set is also independent.
This process of adding vectors can be repeated unless a spanning
set is reached. If a spanning set is reached, then it is a basis and H) is

finite-dimensional. Otherwise an arbitrarily large independent set


can be found, and
C
U cannot be finite-dimensional. This completes
the proof of the theorem.
Sec. 4 Vector Spaces, Subspaces, Dimension 115

4.6 Theorem

If S is a subspace of a finite-dimensional space 1) with dim (IT) = n,


then S is finite-dimensional and dim (S) < n. Any basis for S can be
extended to a basis for 17, and if dim (S) = n, then S = 17.

Proof. If S consists of alone it has dimension 0. Otherwise start


with a nonzero vector x x in S and apply Theorem 4.5. Since no
independent subset of 17 can contain more than n elements, the
construction must end with afinite basis for S. This basis for S is a

linearly independent subset of 17 and can if necessary be extended


(Theorem 4.5 again) to a basis for 17. Thus we have a basis for
S that is a subset of a basis for 17. If dim (S) = n, the subset must
be the whole set; so the same set is a basis for both S and 17, and
s = i7.

The following theorem states an important property of linear functions.

4.7 Theorem

Let / be a linear function defined on a finite-dimensional vector


space. Then
dim (null space off) + dim (range of/) = dim (domain of/).

Proof. As with any theorem about dimension, the trick to proving


this is to find suitable bases for the spaces involved. Let us write 17
for the domain of/ JV for its null space, and ID for its range. Let
v 1; . . . , vk be a basis for JV, and extend it to a basis v l5 . . . , \k ,

u l5 . . . (Theorem
, ur for 17. 4.6 guarantees the possibility of this
construction.) Then dim (JV) = k and dim (17) =k+ r. Let wx =
/(Ux), . . . , w =/(u
r r ).
We claim that wl9 . . . , wr is a basis for W,
which implies that dim (ID) = r and proves the theorem. It is

obvious that the vectors w l5 . . . , wr span ID, for if y =/(x) is any


vector in the range of/ we may write

k r

x = 2 a y + 2 Mi. t i

and then

y = I aj(jd + 2
=
t' l = Z l
WW = + 2 6,-w,,
2 =1

which shows that y is a linear combination of the w's. It remains to


be shown that w l5 . . . , wr are linearly independent. Suppose
,

116 Linear Algebra Chap. 2

2 6,-w,- = 0. This means that the vector ^ t>i u i


is > n the null space
i=l i=l r

of/ and is therefore equal to some linear combination ^ a v i> so


i

i=l
fljVx + • • • + ak y k — ^i u i — • • — b Tu r = 0. Since v l5 . . . \k u 1, ,
. .
.

u r are linearly independent, this is possible only if all the 6's (and all

the a's) are zero, which shows that the w's are linearly independent.

Example 7. Suppose m < «, and define/ from Jl" to 'Ji m by letting


/(x) be the m-dimensional column vector consisting of the first m entries of
x. The range of/ is all of Jl'",and the null space, of dimension n — m,
consists of the vectors in :H" whose first m components are all 0.

The definitions of line and plane can be unified in terms of subspaces


of :K". Because a subspace always contains the zero vector, we define a
line through the origin to be a 1-dimensional subspace and a plane through
z
the origin to be a 2-dimensional subspace. Examples in 'S\ are shown in
Fig. 5(a). A line or a plane that does not pass through the origin can be
obtained from one that does by a translation, that is, by adding a fixed
vector to every vector in the subspace. We say that a subset S of a vector
space is an affine subspace if it can be expressed in the form

^+ b,

where C
U is a linear subspace and b is a single vector. The dimension of an
c
affine subspace is of course defined to be the dimension of Vf, and a
1-dimensional affine subspace is usually called aline, while a 2-dimensional
3
affine subspace is called a plane. Examples in ,'K are shown in Fig. 5(b).
Two affine subspaces are parallel if they are translates of one another.
Two of the most important ways of describing subspaces, namely, as
range or null space of a linear function, provide standard ways of describ-
ing lines and planes. The parametric representation of planes discussed in
Sec. 4 Vector Spaces, Subspaces, Dimension 1 17

Section 2 of Chapter 1 is obtained by starting with a linear function


3\
2 — > %z having a 2-dimensional range with basis x 1; x 2 . We write

L(u, v) — ux r + vx 2

and then form an affine function

A(u, v) = ux 1 + vx 2 + x

by adding a fixed vector x . The representation of a line takes a similar


form,
A(t) = tx + x ,

in which the vectors /x x form a 1 -dimensional subspace as the range of a

linear function 'J\ — > !R


3
.

To represent a line or plane through the origin as the null space of a


linear function is to consider all solution vectors x of a homogeneous
equation
L(x) = 0,

where L is linear. An afhne subspace is then obtained by adding a fixed


vector x to the result or, alternatively, by solving instead

L(x - x ) = 0.

Because L is linear, this last equation can be written as a nonhomogeneous


equation
L(x) = L(x ). (1)

Methods for solving such equations are discussed in the earlier sections
of this chapter. If x = (x, y) is 2-dimensional then equation (1) would
take the form
ax + by = c,

which is a familiar way to represent a line. Ifx = (x, y, z)is3-dimensional,


a single equation
axx + b xy + cxz = dl
3
in general has as solution a plane in 'J\ , while a system of two equations

a xx + b xy+ cx z = dx
a.,x + b y +
2 c 2z = d2

has as solution the intersection of two planes, which is either a line or a


plane.

Example 8. If we try to solve the equations

x — y + 2z = 1

x +y + 3z =
118 Linear Algebra Chap. 2

by row operations, we add the second equation to the first to get

2x + 5z = 1

x +y + 3z = 0.

Multiplying the first equation by I and subtracting from the second gives

x + \z = J

We can represent the set of all solutions of this pair of equations para-
metrically as an affine subspace by setting z — t. We find

X = + 2"/ 2

/V — — It

— A2

Z = /

Thus we have found a line of solutions; it can be described by starting


with a line through the origin having the direction of x x = (— f, — £, 1),
and then translating by adding the vector x = Q, —\, 0). The
result is shown in Fig. 6.
If we have a system of three equations, its solution set is the
intersection of the solution sets of the three individual equations.
The usual situation is that the three planes intersect in one point,
y Changing the right side of the equations shifts the planes parallel
to themselves, and in this case they will always intersect in just
one point. Another possibility is for all three planes to be parallel.
The associated homogeneous system will have a two-dimensional
solution set; the inhomogeneous system will also have a 2-
Figure 6 dimensional solution set if the three planes happen to coincide.
Otherwise there will be no solution. Another possibility (which
has no analogue in two dimensions) is that no two of the planes are parallel,
but that the planes representing the solutions of the homogeneous equations
all pass through one line. Then (unless they all pass through one line) each

pair of the planes representing the solution of the //momogeneous equations


will intersect in a line parallel to the third plane, and there will be no

common solution for all three equations.


x

Sec. 4 Vector Spaces, Subspaces, Dimension 119

EXERCISES
1. Which of the following subsets of Jl 3 are subspaces ? In each case either show
that the subset is a subspace or find some linear combination of elements of
the subset that is not in the subset.

(a) All vectors (pcltx 2 x3 ) with x 1 + x 2


,
= 0.

(b) All vectors with x3 = 0.


(c) All vectors satisfying (a) and (b).
(d) All vectors satisfying (a) or (b).
(e) All vectors with x1 = (x 2 ) 3 .

(f) All vectors satisfying (a) and (e).

2. Let x x = (1, 2, 3), x2 = (-1, 2, 1), x3 = (1, 1, 1), and x 4 = (1, 1, 0).

(a) Show that x l5 x 2 x 3 x 4


, , is a linearly dependent set by solving an appro-
priate system of equations.
(b) Express x x as a linear combination of x 2 x 3 x 4 by a method similar to
, ,

that used in part (a). [Ans. x 1 = ^x 2 + -§


3
— ^x 4 .]

3. Let C[a, b] be the vector space of continuous real-valued functions defined


on the interval [a, b]. Let C [a, b] be the set of functions /in C[a, b] such
that /(a) =f(b) = 0.

(a) Show that C [a, b] is a proper subspace of C[a, b].


(b) Whatifthecondition/(a) =f(b) = is replaced by/ (a) = \,f{b) = 0?
4. Show that the intersection of two subspaces is always a subspace.

5. Part (d) shows that the union of two subspaces is not always
of Exercise 1

a subspace. Show that the union of two subspaces is a subspace if and only
if one of them is contained in the other.

6. Show that the range of a linear function may be a proper subspace of its

range space.

7. Let Si n - — JT" be the linear function defined by multiplication by the


??i-by-n matrix A. Show that its range is the span of the columns of A
(considered as elements of &m ).

8. For any two subsets A and 3i of a vector space, let .-t — 3i be the set of all

vectors that can be expressed as a sum a + b with a in A and b in 3i.

Show that if A and 3i are subspaces then so is A - 3\.

9. Show that if is the only element in the intersection of two subspaces S, 73,
then
dim (S + TJ) = dim (S) + dim (73)

[Hint. Show that a basis for S together with a basis for 13 gives a basis for
s + t;.]

10. Show that for any two subspaces S, 73,

dim (S + 13) = dim (S) + dim (13) - dim (S n 13).

[Hint. Start with a basis for S n 13 and extend it (Theorem 4.6) to a basis
for S and a basis for 13.]
1

120 Linear Algebra Chap. 2

11. Let Jl 3 J* 2 be defined by the matrix

1 -3 2

Find a basis for the null space of/, and one for the range of/. Verify that
Theorem 4.7 holds.

12. Describe the solution set of each of the equations or systems of equations
below as an affine subspace, that is, as a translate by a specified vector of a
linear subspace with a specified basis.

(a) x +y = 1.

(b) 2jc + y =1
2x - 3y + z = 2.
(c) x + 2y + 3z = 10
Ax + 5y + 6z = 1

7;c + Sy + 9z = 12. [Arts. t(\, -2, 1) + (0, -9, ^»).]


(d) x - y + z = 2.
13. For each of the following sets of three equations in three unknowns,
determine which of the geometric possibilities discussed at the end of this
section hold.

(a) x + 2y

(b)

(c)

SECTION 5

LINEAR FUNCTIONS In Chapter 1 we saw that matrices obey some of the same rules for
addition and multiplication that ordinary numbers do. Furthermore,
we have seen that given a linear function /from :K" to %m , there is an
m-by-/7 matrix A such that/(x) = Ax for all x in J{". Using these facts
we could prove that linear functions from 'A" to %m obey algebraic rules
just as their matrices do. However, it turns out that these same rules
apply to a wider class of linear functions and not just those representable
by matrices. For this reason we shall prove the rules in general form and
then apply them in Section 6 to a systematic analysis of some linear
differential equations.
We
begin by describing the operations of addition, scalar multiplica-
tion,and composition of functions. Let / and g be functions with the
same domain and having the same vector space as range space. Then the
function/ + g is the sum of/ and g defined by

(f+g)(x)=f(x) + g{x)
Sec. 5 Linear Functions 121

for all x in the domain of both/and g. Similarly, if r is a number, then rf


is the numerical multiple of/ by r and is defined by

r/(x) = r(/(x)).
We have already defined the composition of functions in Chapter 1,

but we repeat the definition here. We require now that the range of/ be
contained in the domain of g. Then g°f is the composition of/ and g
and is defined by

It is an easy exercise to prove that sums, numerical multiples, and com-


positions of linear functions are linear.

/ g
Example 1. Suppose 5l 2 > % 2
and .'R
2
3t 2 are given by

and
'Q-C-K -3C
x\ _ /2x+y\ 12 \\lx

y) ~\x + 3yj \1 3/\ 7


Then

v+ O= / +
3x + 2y\ /3 2\/x
2x + 2j/ ~\2 2/V
Also, 3/ is given by

3jc + 3j\ /3 3Wx


3x-3j/ "\3 -3/Vk.
Finally, g °fis given, according to Theorem 4.4 of Chapter 1, by matrix
multiplication as

'•'CM'C
; x -X
.

Linear Algebra Chap. 2

in the above example the matrix of/+ g is equal to the


Notice that
matrix of/ plus the matrix of g, and that the matrix of 3/ is 3 times the
matrix off. Of course we have already proved in Chapter 1, Theorem 4.5,
that the matrix of the composition g °/is the product of the matrices of
g and fin the same order. We can state the general result as follows.

5.1 Theorem

Let/and g be linear fu nctions from %n to 31 m with matrices A and B


respectively. Then:

1 The matrix of/ + g is A + B.

2. For any number r, the matrix of //is rA.

3. If g o/is defined, its matrix is BA.

Proof. To prove statement 1 we write

(f + g)(x)=f(x) + g(x)
= Ax + Bx = (A + B)x.

The first equality holds by the definition of/+ g, the second by the
relation of matrix to function, and the last by the general rule
AC + BC — (A + B)C for matrix multiplication and addition.
To prove statement 2 we write, similarly,

rf(x) = r(Ax) = (rA)x,

where the last equality follows from the rule B(AC) = (BA)C for
matrix multiplication.
Statement 3 is just the result of Theorem 4.5, Chapter 1.

Example 2. Suppose /and g are linear functions such that

'Q-Q- 'C)-t:

;:)-(:) 'Ch:
Then the matrices of/ and g are

2 -1\ /l
and
3 0/ \0 -1
Sec. 5 Linear Functions 123

respectively, by Theorem 4.2 of Chapter 1. By Theorem 5.1, the matrix


of/+ g is thesum of the matrices of/ and g:
2 3 "'
3 ~Vf
0/ \0
°)=(
-1/ \3 -1

Thus/ + g = h is a function such that /? ( ) = ( )


and hi )
= ( )

Similarly, the matrix of/— 2g is

2 -«_/, 0j_/0 -,
3 0/ \0 -1/ \3 2

Finally, the matrix of g °/is the product

^2 -1\/1 0\ /2 1
_
,3 0/\0 -lj \3

The close connection between combinations of matrices and combina-


tions of functions from 31" to 3i m suggests that the rules for matrix algebra
should hold for operations with functions also. In fact, these rules hold
for any linear functions for which the operations are defined.

5.2 Theorem

Let r be a number and let/, g, and h be linear functions for which

both sides of the following equations are defined. Then

(1) (f + g)°h=foh+goh
(2)fo(g + h)=fog + fo h
(3) (rf)og = r(fog)=fo(rg)
(4) ho (gof) = (hog) of.

Proof. These formulas are allproved by applying the function on


either side of the equality to an arbitrary domain vector x, and then
writing out the meaning of each side. We shall prove only Equation
(4), leaving the others as exercises. To prove Equation (4) we
observe that, by definition,

ho(gof)(x) = h(gof(x))
= Kg(A*))Y
Similarly,
(hog)of(x)=hog(f(x))

= h(g(f(*)))-
124 Linear Algebra Chap. 2

The results of the two computations are the same, so the formulas we
started with must be the same. Since x is arbitrary, h ° (g of) =
(h o g) of.

(0O)
Example 3. Let C (Jl) be the vector space of infinitely often differen-
tiable real-valued functions y of the real variable x. If we let D stand for
differentiation with respect to x, then

Dy=y',
and D is a function with domain C <oo)
(3l). The familiar rules about
differentiation stating that (/ + g)' =/' + g' and (cf)' = cf imply that D
is linear. Because the meaning will always be clear, we can omit the little

circle in writing compositions, so that DD will stand for D ° D.


The compositions D 2 = DD, D = DDD, etc., 3
are all meaningful,
and we have for example

D y=y",
2
D*y=y'".
We can even define
D°y =y
for occasional convenience. Combining powers of D by addition and
numerical multiplication leads to formulas like

D -D-
2
2, (D + 1)(Z> - 2),

and applying the rules (1) and (2) of Theorem 5.2 we get

(D + l)(D - 2) =D -D- 2
2.

If we apply either side of this last equation to a function y, we of course


get the same thing:
y" _ y> _ 2y.

Differential operators such as these will be studied in more detail in the


following section.
We have seen that addition and multiplication of matrices are related
to operations on linear functions. Similarly related to the idea of an inverse
matrix is that of an inverse function. A function/has an inverse function,
denoted /_1 , if
/-i /(x) = x

for everyx in the domain of/. Applying/ to both sides of this equation
gives/°/ _1 (/(x)) =/(x); therefore, we also have
/o/-l(y) = y,

for every y =/(x) in the image of/. We leave as an exercise the proof
that:
Sec. 5 Linear Functions 125

x
5.3 If/ is linear, then so is/ .

Because we have derived the techniques of computing matrix inverses


in several stages, we summarize here what we have proved so far. At
the end of Chapter 1 , we showed that

5.4 If A is an orthogonal matrix, then A l


= A 1
, where A' is the trans-
pose of A across its main diagonal.

In Theorem 8.3 of Chapter 1, it is proved that:

5.5 If det,4 ^ 0, then A is invertible, and then A' 1 = (l/detA) A'


where A is the matrix of cofactors of elements of A.

Finally, the row reduction method of the first section of this chapter is

an efficient way to find the inverse of an invertible matrix, that is, to find
A -1 such that
A- 1 A = AA^ = 1
/,

where / is the identity matrix.


The relationship between inverse linear function and inverse matrix
is not quite so straightforward as in the case of the other operations on
functions and matrices. The following example shows why.

Example 4. Let A be an n-by-n invertible matrix. Then the linear


function/from 'Ji" to :i\ n defined by ,

f(x) = Ax,

-1
always has an inverse function/ given by

/~x (x) = A^x.


Indeed,
/-i o/( x ) = A- Ax1
= x,

for all x in .'R". However, consider the linear function g from Jl to 3l 2

defined by
fx\
g(x)
126 Linear Algebra Chap. 2

It is easy to check that g is linear and that g has an inverse function g~ x


defined on the subspace t of '.l\
2
consisting of vectors having both co-
ordinates equal. We define

)= x, for in C.
w
j

But the matrix of g is

which is not invertible because it isn't even a square matrix. The difficulty
_1 2
is that g is not defined on all of ,'il , but only on the subspace C.

In the above example the crucial characteristic of the function g was


the dimension of its range or image. The idea
important enough to is

have a name: the rank of a linear function/is the dimension of the range of
/. Thus the linear function

f =
\y) \0 3/\

(2x\

2
has its rank equal to 2 because its range is all of 'J\ . The function

1 0\

\o oj\j

2
has its rank equal to 1 because its range is just the x-axis in Si For a .

n m with matrix A, we have the following


linear function / from 'Ji to 'Ji

useful criterion.

5.6 Theorem

Let/be a linear function with matrix A. Then the rank of/is equal
to the number of linearly independent columns in A.

Proof. The columns of A span the range of/ because

f(x) = x^) + ...+ x n f(e n ),


Sec. 5 Linear Functions 127

where x = (xls . . . , x n ) and the column vectors /(ej), . . . ,/(e„)


are just the columns of A; hence, these columns span the range of/.
The number of independent columns is then the dimension of the
range of/.

Because of Theorem 5.6, we can speak of the rank of a matrix A and


define it to be either the number of linearly independent columns or else
the rank of the function /(x) = Ax.

Example 5. To find the rank of the matrix

1 7^

we apply elementary row operations to reduce the matrix to echelon form.


At each stage the new matrix is the matrix of a different linear function,
but because the row operations are all invertible, the dimension of the
range remains the same. We find that adding the second row to the third
row gives
T 7\

,2 14/

Then subtracting 2 times the first row from the third row gives

1 7\

15.
v
0/

No furtherreduction is possible; so clearly the matrix has just two linearly


independent columns. Thus the given matrix has rank 2, and the asso-
3 3
ciated linear function from :K to 'Ji has a 2-dimensional range.
Finally, we complete the discussion of the relationship between inverse
function and inverse matrix in 31". If /is a function, then / is said to be
one-to-one if each element of the range of /corresponds to exactly one
element of the domain of/ Clearly, / is one-to-one if and only if/ -1
exists. For linear functions we have the following simple fact.

5.7 Theorem

A linear function / is one-to-one if and only if x = whenever


f(x) = 0.
128 Linear Algebra Chap. 2

Proof. First assume that/(x) = implies x — 0. If/(x,) = /(x 2 ),

then, by the linearity of/,/(x x — x 2) = 0. But then by assumption


xx — x2 = 0, so xx = x 2 Thus
. /must be one-to-one. Conversely,
suppose f(Xi) = f(x 2) always implies Xj = x2 . Then, because
f(0) = for a linear function, the equation/(x) = can be written
f(x) = /(0). But then x — 0, by assumption.

It follows from Theorem 5.7 that, if/ is a one-to-one linear function


from :K" to %m , then the rank of/ equals n. The reason is that the null
spaceof/ has dimension (that is, the dimension of the origin) and so,
by Theorem 4.7, rank (/) + = n. This idea is used in the proof of the
following theorem. For linear functions from 3tB to :R", that is, for
functions representable by square matrices, we have:

5.8 Theorem

Let %n —>% n
be linear, with square matrix A. Then/ has an
inverse if and only if A is invertible.

Proof. We at the beginning of Example 4 that if A~ l


have seen
exists, does/ -1 Conversely, if/ has an inverse/ -1 then/
then so . ,

is one-to-one. By Theorem 5.7 the null space of/ has dimension 0;

so the range of/ is all of %


n
But this means that the domain of .

/ is all of % It follows that/


-1 n -1
. has a square matrix B, such that
/ (x) = Bx for all x in :K". Then
-1
the equations

BAx =/-1 °/(x) = x

ABx =/o/-i(x) = x

both hold. It follows that AB = BA; therefore A is invertible and


A- 1 = B.

EXERCISES
1. Suppose that/ and g are linear functions such that

1\ /1\ /0\ /l

'o-i- A.

y-n -CHI
Sec. 5 Linear Functions 129

Find the matrices of the following functions:

(a)/.
(bU-
(c)f + g.
(£)2f-g.
(e)<?o/.
(Ofo(f + g).
Linear Algebra Chap. 2

7. (a) Show that (D + a)(D + b) = (D + b)(D + a), where a and b are


constants,
(b) Define D + f(x) by
{D + /(x))y =/+/(*) v.
Show that
(Z> + 1)(Z> + x) * (D + *)(£> + 1)

by applying both sides to a twice-differentiable function y(x).

8. Find the rank of/, where /has matrix

(0 1 1 2
(a) (b)
U 0) 2 4y

<o o : 2

(c) | 1
I
2

,1 Oy v4

9. Let °U and '10 be finite dimensional vector spaces and let/be a linear function
with domain 1J and with range equal to 10. (Thus dim (1D) = rank (/).)
(a) Show that V contains a largest subspace S such that/, when restricted to
S, becomes one-to-one from StoW. [Hint. Let 91 be the null space of/
with basis n 1 n k Extend this basis to a basis for 1) by adding
, . . . , .

vectors s l5 . . . , s t .]

(b) Show that dim (S) = rank (/), where S is the subspace of part (a).

10. If/ is a function and y an element of the image of/, then the subset S of
is

the domain of / consisting of those x for which /(x) ^= y is called the


inverse image or pre-image of y. Show that, if/ is linear, then the inverse
image of a fixed vector is either a subspace of the domain of/ or else is a
subspace translated by a fixed vector, that is, an affine subspace.

11. What is the simplest way to find the rank of a diagonal matrix
diag («!,.. .,«„)?

12. Show that if A is an m-by-n matrix, then the rank of A is equal to n k,


where k is the dimension of the solution set of the equation Ax = 0.

SECTION 6

DIFFERENTIAL In this section we look at some vector space ideas that arise in studying
OPERATORS differential equations. The equations we shall treat will be like the
following:
/ - ly = (1)

y — 7>y = ex (2)

y" + 3y' +y = sin x, (3)

where the primes denote differentiation with respect to x. Equations of


this kind are important because they express a relation between the value
Sec. 6 Differential Operators 131

of j, its first derivative (velocity), and its second derivative (acceleration).


The problem posed by each equation is to find all functions y(x) such that
replacing y by y{x) satisfies the equation identically. Before beginning a
systematic treatment of the problem, we shall consider some examples
whose results will be useful later.

Example 1. The differential equation y' — ry = 0, where r is a


constant, can be written
/ = ry (4)

and so specifies that the rate of change of y is proportional to the value


of y for every value of the variable x. The growth of a population is
sometimes assumed to obey such a law. To find solutions we use the fact
that if y = e rx , then y' = re rx . It follows that Equation (4) is satisfied if
we take y = e rx
. More generally, if c is an arbitrary constant, then
Equation (4) is satisfied if we take

y = ce rx , (5)

because the c will cancel on both sides. In fact, Equation (5) gives the
most general solution to (4), for observe that we can write (5) in the form

e~ rx
v — c.

Differentiating with respect to x gives

(e- rxy)' =
or, using the product rule for derivatives,

e -rxy' _ re -rxy — e ~rx(y' _ ry) _ 0.

Dividing by e~ leaves y — ry = 0, which is the given Equation (4)


rx

rewritten. But now we can reverse these steps, supposing that y is some
solution. We start with
y - ry =
rx
and then multiply by e" to get

e~ Txy — re~ TXy = 0.

By the product rule, this last equation is

(e-ry)' = 0.

Integrating both sides with respect to x gives

e~ TX
y = c,

where c is a constant of integration. Multiplying both sides by e rx shows


that y must be of the form
y — ceTX .
132 Linear Algebra Chap. 2

Thus we have shown that ce rx is the most general solution of = ry /


in the sense that any particular solution can be obtained by specifying the
value of c.

The method used in the above example consists of multiplying the


expression y + ay by e ax and then recognizing the result as the derivative
(e axy)' = e axy' + aeaxy. We shall use this exponential multiplier e ax
repeatedly in what follows.

Example 2. To solve the differential equation

y - 3y = ex ,

we multiply by e~ 3x and get

£-3x,/ 2e~ 3x v = e~ 2x .

This is the same as

(e
-3Xy)' — e ~2x

Now we integrate both sides with respect to x, getting

e -zxy _ _i e -2* _|_ C)

where c is some constant of integration. Then multiplying by e 3x we obtain

for the most general solution. It is easy to verify directly, of course, that
we have indeed found some solutions, one for each value of c. What we
have shown additionally is that any solution must be of the form — \ex +
ce 3x .

Before considering more complicated examples, it will be useful to


describe some notation that is often used in solving differential equations.
We let D stand for differentiation with respect to some agreed-on variable,
say x, and interpret D+ 2, D — 1
1 , and similar expressions as linear
functions acting on suitably differentiable functions y. For example:

(D + 2)y = Dy + 2y
= / + 2y
(D 2 - \)y = D' y -y z

= D(Dy) -y=y"-y.
An is that D acts as a linear function on y;
important observation
the term linear operator sometimes used to avoid possible confusion
is

over the fact that y itself is a function of x (though not necessarily a linear
one). To see that D acts linearly, all we have to do is recall the familiar
Sec. 6 Differential Operators 133

properties of differentiation:

D{)\ + y = Dy, + Dy
s) 2

D(ky) = kDy, k constant.

These two equations express the linearity of D. From the fact that
compositions of linear functions are linear it follows that the operators
D D32
, , and in general Dn are also linear. Because numerical multiplica-
tion is a linear operation and because the sum of linear operations is

linear, the operator (D + a) is linear for all constants a. Putting these


facts together allows us to conclude that expressions such as

D +
2
a, D + aD +
2
b, (D + s)(D +
are all linear operators, with the respective interpretations

+ a)y = y" + ay
(D 2

(D + aD + b)y = y" + ay' + by


2

(D + s)(D + t)y = (D + s)(y' + ty)


= D{y' + ty) + s(y' + ty)
= y" + ty' + sy' + sty
= y" + it + s)y' + sty
= (D + (s + t)D + st)y. 2

The last example shows that, for constants s and /,

(D + s){D + t) =D + 2
(s + t)D + st,

and also that


(D + t)(D + s) =D + 2
(s + t)D + st.

Thus operators of form D + a can be multiplied like polynomials in


the
D if a is sometimes also important to be able to factor an
constant. It is

operator, for example D 2 — 1. We see immediately that for this example

Z) a - = (D-
1 \)(D + 1)

= (D+ 1)(Z>- 1).

Returning to differential equations, suppose we are given one of the


form
y" + ay' + by = 0;

Equation (3) at the beginning of this section is similar to this, with a = 3,

b = 1. Writing the equation using differential operators gives

(D 2 + aD + %= 0.
134 Linear Algebra Chap. 2

Our method of solution will be to try to factor the operator into factors
of the form (D + s) and (D + /), and then apply the exponential multiplier
method of Examples and 2 repeatedly.
1

Example 3. Suppose we want to find all functions y = y(x) that


satisfy
y" + 5/ + 6y = 0.

We write the equation in operator form as

(Z)
2
+ 5D + 6)y = 0.

Next we try to factor the operator. We see that

(D 2 + 5D + 6) = (D + 3)(D + 2);

thus we need to solve


(D + 3)(D + 2)y = 0.

To find all solutions, we suppose that y is some solution. Letting

(D + 2)y = u

for the moment, we substitute u into the previous equation and arrive at

(D + 3)w = 0.

But this equation can be solved for u if we multiply through by e 3x . We get


e Zx
Du + 3e 3x
u =
or
D(e3x u) = 0.

Therefore
e 3x u = cx ,
for some constant clf and so
u = c x e~~ 3x .

Recall now that we have temporarily set (D + 2)y = it. We then have

(D + 2)y = c x e~
3x
.

tx
Multiply this last equation by e to get

e^Dy + 2e 2xy = c x e~ x
or
D(e 2xy) = c x e~ x .

Integrating with respect to x gives

e^y = —c x e~ x + c2
or
y = — c e~ 3x + x c 2 e~ 2x .
Sec. 6 Differential Operators 135

Since the constants c x and c 2 are arbitrary anyway, we can change the
sign on the first one to get

y = c t e~ 3x + c^er 21

for the form of the most general solution.

The exponential multiplier used in the previous examples is found as


follows: (D + d)y is multiplied by e aT to produce D(e axy), that is,

e
ax
(D + a)y = D(e axy).

Repeated application of the rule leads to the following general fact.

6.1 Theorem

The differential equation

(D - r,){D - r2) . . . (D - r n )y = 0,

with ru r2 , . . . , r n all different, has its most general solution of the


form
y = c 1 e rix + c 2 e r2X + . . . + c n e TnX .

If some r's are equal, say rx = r2 = r3 = . . . = rk , we replace


~1
e r2X , er *x , . . . , e rkX by xe riX x2 e TlX
, , . . . , xk e riX , respectively, to get
the general solution.

Proof. For simplicity we shall start with the case n — 2. Given

(D - ri )(D - r 2 )y = 0,
we set

(D - r 2 )y = u.

Then the equation


(D - r x )u =
is solved by multiplying through with e~ TlX . We get

e~ riX Du — r x e~ T ^ x u =
or
D{e~ rix u) = 0.

Then integration with respect to x gives

e -TiX U _ Ci

Now
(D - r 2 )y = cx e ^x
r
.
136 Linear Algebra Chap. 2

We multiply by e r- r
to get

er* & Dy — = ~'"- )j '-


r 2 e~ riXy c 1 cj(ri

Integrating both sides of

D{e- r **y) = Cl e
(r >- r * )x
(6)
gives

Cl
e-*z*y = gin-*)*
+ C2

cl
y = e
r

r, — r..

We have assumed above that rt ^ r2 ; so the last integration is

correct as given. For neatness we replace the arbitrary constant c x by


(r x — r 2 )c 1 , which is just as arbitrary.
In case r-^ = r2 , the integration just performed is not correct.
We would have r1 — r2 = 0; so Equation (6) becomes

D{e~ rt*y) = ev
Now integration gives
g-rtxy = CiX _|_
Ca
or
y = c 1 xe TiX + c 2 e T2X ,

as stated in the theorem.


More generally, to solve

(D-r )(D-r )...(D-r


l 2 n )y
= 0,
we set

(D-r )...(D- 2 r n )y = nlf (7)

so that the previous equation becomes

(D - rjiij = 0.

Substitution of the general solution for u^ into (7) gives a new equa-
tion which we split up by setting

(D-r )...(D- 3 r n )y = u2 .

Then Equation (7) becomes

(D — r 2 )u 2 = «l9

to be solved for u z . Continuing in this way, we finally have to solve


an equation
(D - r n )y = «„_!

for y. The solution of each equation reduces to the solution of


;

Sec. 6 Differential Operators 137

a sum of equations of the form


(D — r)y = cxk e*x , s^r
or
(D — r)y = cxke Tx .

In the first case we multiply by e~ Tx . Then integration by parts


leads to a linear combination of powers of x times x
e* . In the second
case, after multiplication by e~ rx
, we only have xk on the
to integrate
right. In either case we get solutions of the form stated in the
theorem.

The key to the application of the theorem is the factorization of a


linear differential operator into factors of the form (D — r). Looked at as
a polynomial in D, the expression

D n + a^D"- + 1
. . . + ax D+ a (8)

is called the characteristic polynomial of the associated differential equation

y
(n)
+ a n_ iy ^-v + . . . + a iy '
+ a y = 0.
Notice that, to get the polynomial from the equation, we replace y
ik)

by D k
and that the term a D° = a Finding a
y corresponds to a . factor-
ization for the polynomial depends on knowing its roots, for if the
polynomial (8) has roots r u r n then it has the factored form
. . . , ,

(D - ri )(D -r )...(D- 2 r n ).

Finding the roots of a polynomial exactly is impossible in general. How-


ever, when n = 2 well-known
(or 3 or 4 for that matter), there are
formulas for the roots in terms of the coefficients. For simplicity we have
assumed that the leading coefficient of the characteristic polynomial is 1
otherwise we can divide through so that the leading coefficient becomes 1.

Example 4. Suppose we want to solve

y'» - Ay" + Ay' = 0.

Writing the characteristic polynomial

Z> 3 - 4D + 2
AD,

we observe that it can be written

D(D 2 - AD + 4) = D(D - 2)
2

The roots are and 2, where 2 is a repeated root. Thus the general
solution to the equation is a linear combination of e 0x , e 2x and xe Zx
, ,
Linear Algebra Chap. 2

and so
y = cx + c 2 e 2x + c 3 xe 2*

is the most general solution.

From here on the theory will be described in terms of vector spaces.


We have referred to the linearity of the operators (D — r) without being
specific about the vector space on which they operate. There are in fact
many possibilities, depending on the requirements of a particular problem.
One choice is to consider C (7i)
the vector space of real-valued functions of
,

x having continuous nth derivatives. If L is a linear differential operator


of order n, that is, an operator containing differentiation of order at most
n, then L acts linearly from C {n) to C (0)
, the space of continuous functions.
Another possibility is to consider L acting on C (c0) , the vector space of
functions having continuous derivatives of all orders. For our purposes
the former choice will be more natural. Observe that the functions e rx
and x k e rx of Theorem 6.1 are nevertheless members of C (co)
, and so they
are automatically also in C (n)
. Furthermore, the set JC of solutions of

(D - rO . . . (D - r n )y =
is a vector subspace of C (n)
, because JV is the null space of the linear
operator
L = (D - rx) . . . (D - rn)

acting on C (n) From


. Theorem 6.1 we can immediately conclude that Jf
has dimension at most n, the order of L, because JC is spanned by n
rx
distinct functions of the form e or x'e rx . (The possibility that r may be
a complex number is not ruled out here but will be explained in the next
section.) In fact we can prove the following theorem. The proof consists
simply of showing that the basic solutions are linearly independent, but it

requires some work.

6.2 Theorem

Let X be the subspace of C (n)


consisting of all solutions of the linear
differential equation

(D-r )...(D- y r n )y = 0.

Then JV has dimension exactly n.

Proof. We have already observed that X has dimension at most n


because itspanned by a certain set of /; elements. We can show
is

that the dimension is exactly n by reviewing the way in which the


exponential multiplier method works. We solve a succession of
Sec. 6 Differential Operators 139

differential equations
{D - rk )y = uk_ x

where uk_ x has the general form

"*-i(*) = c l x ll e rix + . . . + c k _ 1 x lk - 1 e Tk - i:r .

We apply the factor e^ nx to get as usual

D{e~ TkXy) = e~ TkX uk ^. (9)

We now proceed inductively to show that the set of functions of the


form uk has dimension k, assuming that the set of functions of the
form uk _ x has dimension k — 1. If the set of possible functions
uk _ x has dimension k — 1, then the same is true of the set of func-
tions e^ TkX u k _ x (Why?) Thus we consider the linear operator D as
.

acting from a domain of functions e~ TkXy to a range vector space of


dimension k — \. But the null space of D alone has dimension 1.
The reason is that the solutions of

D(e~ TkXy) =
are of the form
e -r k Xy _ Cf

clearly a 1 -dimensional subspace of the domain of D. Because

dim (domain) = dim (null space) + dim (range),

we have dim (domain) = + 1 (k — 1) = k. Thus the set of solu-


tions e~ TkXy of Equation (9) has dimension k. It follows that the
corresponding set of functions

y = e rkX (e- TkXy)

also has dimension k. Since we can prove this for k = 1,2, . . . , n,


we have shown that the set X of all solutions to the given differential
equation has dimension n.

We conclude the section by considering differential equations of the


form
L{y)=f,
where/is a given continuous function of x, and L has the form

L=(D- ri ){D -r2)...(D- r n ).

We have already treated the case in which/is identically zero.

Example 5. Given
y" + 2y' +y = e 3x ,
140 Linear Algebra Chap. 2

we write the characteristic polynomial D + 2D +


2
1 and factor it, putting
the equation in the form
(D + \)
2
y = e 3x .

Letting (D + \)y = u, we try to solve

(D + 1)« = e Zx .

Multiplication by e x gives
e x Du + ex u = e ix
or
D(e*w) = e 4x .

Then, integration gives


ex u = le** + Cj

or
u = £e 3x + c^ - *.
Since (Z) + l)_y = w, we have

(D + 1)7 = ie
3x
+ c^.
Again multiplying by ex , we get

e x Dy + e xy = le
ix
+ cx
or

Then

or
j = ^e * + 3
c x xe~ x + c 2 e _x .

In the above example the solution breaks naturally into a sum of two
parts, y h and y9 :

yh = c x xe~ x + c 2 e~ x ,

The function y h is called the homogeneous part of the solution because it is a


solution of the so-called homogeneous equation

L(y) =
associated with L(y) = f. The function _y„ is called a particular solution of

Uy) =f
because it is just that: a particular solution, though not the most general
one. In fact, we get y p by setting c l = c2 — in the general solution. This
breakup of the solution into two parts is an example of a general fact about
linear functions, a fact that is used in solving systems of linear algebraic
equations. The principle is important enough, and at the same time simple
enough, that we state it here also.
Sec. 6 Differential Operators 141

6.3 Theorem

Let L be a linear function. Let /be an element of the range of L,


and let y p be any element of the domain of L such that L(y v ) =f.
Then every solution y of

can be written as a sum


m =/
where y h is an element of the null space JV of L.

Proof. Suppose that L(y) =/and that also L(y p ) = f. Then, since
L is linear,
L(y - yp = ) L(y) - L(y p )

It follows that ^ — jj, is in the null space X of L, that is, _y — yv =


yh for some y h in JV\ But then y = y h + yP , as was to be shown.

The method of Example 5 can always be used to find the most general
solution to the equation L{y) =/of the form

(D-r )...(D-r n )y=f.


1

However, in some examples


the computations can be shortened by means
of Theorem which shows that yh the homogeneous part of the solution,
6. 1 , ,

can be written down as soon as we know the roots of the characteristic


polynomial. Theorem 6.3 then says that if we find the general homogen-
eous part of the solution yh (using Theorem 6.1) and can somehow find a
particular solution y p then the general solution of the given equation is
,

yh + yp . In finding yp it may be convenient to take advantage of the


linearity of L in case the right-hand side/is a sum of two or more terms.
If we want to solve
L{y) = fli/i +af 2 2 (10)

and we can find solutions yx and y 2 such that

L(yi )=fu L(y 2 )=f2 ,

then, because L is linear, the function

y = Oi7i + a 2y 2

is a solution of Equation (10). In this context, the property of linearity is

sometimes called the superposition principle because the desired solution


is found by superposition (i.e., addition) of solutions of more than one

equation.
.

142 Linear Algebra Chap. 2

Example 6. In Example 5 we found that the differential equation

(D + \)
2
y = e 3x
had the general solution

TVe
3x
+ Cyxe- X + c 2 e~ x .

When cx = c2 = we get the particular solution yx = j\.e 3x . If we now


wanted to solve
(D + \)
2
y = e 3x + 1, (11)

we would not have to start all over again, but would only have to find a
particular solution for
(D + \fy = 1.

This could be solved, of course, by using exponential multipliers, but in


equation is so simple that we can guess a solution,
this case the differential

namely, y 2 = 1. Then a particular solution of Equation (11) is y p =


tV 3x + 1 an d the general solution is
y = ^e3x + 1 + c x xe~ x + c 2 e~ x .

On the other hand, solving

(D + \fy = e 3x + e~*

requires us to find a particular solution to

(D + \fy = e~ x .

To do this we would return to the exponential multiplier method.


Other methods of solution, using "undetermined coefficients" and
"variation of parameters," are given in Exercises 10 and which follow. 1 1

These methods are sometimes more efficient for finding particular


solutions.

EXERCISES
1. For each of the following differential equations, find an appropriate
exponential multiplier and then solve by integrating both sides of the
modified equation

(a) y' + 2y = 1 (c) y' -y = ex .

(b) 2 — +y =-- x. (d) y = sin x.

2. Show that if the expression y + p(x)y is multiplied by

e $p(x) dx

the resulting product can be written in the form

d
(
e lv(x) dx 7
y\
dx
Sec. 6 Differential Operators 143

[We assume that p(x) is a continuous function of x, and that J p(x) dx is

some function with derivative p(x).]

3. Use the result of Problem 2 to find an exponential multiplier for each of the
following differential equations. Then solve the equation.

(a) / + (Ay= 1. (c) / xy

(b)
£ + xy = x. (d) / + y = 0.
4. Write each of the following differential equations in operator form, e.g.,
(D 2 + 2D t 1)/ = 0, and then factor the operator into factors of order 1.

Then find the general solution of the equation.

(a) y" + 2/ + y = 0. (e) (D 2 - \)y = 1.

(b) 2y" -y =0. (f) y" -y = e


x
.

(c) /" + 3y" + y = 0. (g) /' -y = e


x
+ 1.

(d) D{D + 3)y = 0.

5. Sketch the graph of each function of x given below. Then find a differential
equation of which each one is a solution, the equation being of the form
y" + ay' + by = 0.

(a) xe~ x . (c) 1 + x.

(b) e* + e- x . (d) 2e
2x - 3e
3x
.

6. Define the differential operator D + f(x) to act on a function y by


{D + f{x))y = y' + f{x)y.
(a) Show that in general (D + f(x))(D + g(xj) ¥= (D + g(x))(D + f(x)).
(b) Show that if/ and g are constant, then equality holds in part (a).

7. For x ^ 0, the differential equation

x y"2
+ (x 3 + x)y + (x
2
- \)y =
can be written in operator form as

(a) Show that the above equation can also be written as

(D + x)Id +-W =0.

(b) Solve the equation in part (a) by successive application of exponential


multipliers of the form given in Exercise 2.
(c) Show that

(D + x)(d + -] ^ Id +-)(D + x).


144 Linear Algebra Chap. 2

(d) Solve the differential equation

x)y-0.
K)< \(D

8. (a) Find the general solution of the differential equation y" — y — 0.


(b) Determine the constants in the general solution y(x) in part (a) so that
y(0) = and y'(0) = 1. Sketch the graph of the resulting particular
solution.
(c) Determine the constants in the general solution y(x) in part (a) so that
j(0)= 1 and y(l) = 0. Sketch the graph of the resulting particular
solution.

9. (a) Show that the characteristic equation of the differential equation


y" +y = has complex roots r1 i and r2 = — i.

(b) Using the definition


e
±ix _ cos x _j_
i sj n x ^

show that the formal solution

y(x) = cx e
tx
+ c 2 e~
tx

to the differential equation y" +y = can be put in the form

y(x) =d 1
cos x + d2 sin x.

(c) Verify directly that cos x and sin x are solutions of v" +y = 0.

10. (Method of undetermined coefficients.) Given a nonhomogeneous differen-


tial equation
(D 2 + aD + b)y =/(*), (1)

suppose that the function f{x) can itself be recognized as a particular


solution of an equation of a similar but homogeneous form, say,

(D - ri)...(D -r„)/=0.

By applying the operators (Z) — r,) to both sides of the given Equation (1),
it follows that y must also be a solution of the higher-order homogeneous
equation
(Z> - /i) . . . (D - r n )(D
2
+ aD + b)y = 0. (2)

Since we have a routine for writing down the most general solution yg of
this last equation, we can find a particular solution y p of Equation (1) by
substituting the general solution of (2) into it and seeing what conditions

result for the arbitrary constants, or"undetermined coefficients," in yg To .

save duplication, we can first eliminate from yg all terms that already occur
in the homogeneous part, yh of the solution of (1). Linear independence of
,

the basic solutions of the homogeneous equations guarantees that when yg


is substituted into (1), then coefficients of like terms must be equal on
either side. It is these equalities that are used to determine the coefficients
Sec. 6 Differential Operators 145

for y v The
. general solution of (1) is then y = yh + y v Find . the general
solutions of the following differential equations.

(a) y" -y = e 2x . (d) y' - y = x.


(b) y" -y = e
x
. (e) /' -y = e
x
+ x.

x
(c) y" + 2/ + y = e .

11. (Variation of parameters.) This method is useful for finding particular


solutions of linear differential equations with nonconstant coefficients of the
form
y" + ay' + by =/(*). (3)

Suppose that we can find the homogeneous solution

yh = C^iO) + C 2 U 2 (x).

The constants and c 2 can now be thought of as auxiliary variables or


ct
"parameters" in which we allow "variation" as functions of x. We then try
to determine c x (x) and c 2 (x) so that

y(x) = c^u^x) + c 2 (x)u 2 (x) (4)

will be a solution of the nonhomogeneous Equation (3). We compute by the


product rule
y = Cl«l + c2 u2 + C x Ux + c'2 u 2

and then require for simplicity that


.

146 Linear Algebra Chap. 2

(a) Verify that substitution of Equations (4), (6), and (7) of Exercise 11
into

/ I ay' +by =f
produces Equation (8).
(b) Verify that Equations (5) and (8) of Exercise 1 1 have the solution given
by Equation (9).

13. A pellet fired horizontally through a viscous medium has a displacement


from its initial position of the form x = x(t), where t is time. Denoting
derivatives with respect to time by dots, the velocity of the pellet at time / is

x(t) and its acceleration is x(t). From physical principles it can be shown
that, theoretically, x satisfies the differential equation

mx + kx = 0,

where m is the mass of the pellet and k is a positive constant depending on


the viscosity of the medium.

(a) Show that if the initial displacement is x(0) = and the initial velocity

is x(0) = v , then, for t > 0,

Vcfn
x(t) = -7- (1 - e- kt ' m ).
k

(b) Show that the displacement x(t) has an upper bound equal to v^mjk.
What is the effect of increasing the viscosity constant or of increasing
the mass of the pellet?
(c) Show that the velocity of the pellet decreases to zero as t increases.
(d) Show that the acceleration of the pellet is negative for t > 0. What is

the effect of an increase in k or an increase in m on the acceleration ?


(e) Sketch the graph of x(t) if v m/k = 1

14. Let D = d\dt. The first-order system of differential equations

(aD + a)y + (bD + p)z = f{t)


(cD + y)y + (dD + 6)z = g(t)

contains the purely algebraic system

ay + pz =/(/)

yy + 6z = g(t)

as a special case when a, b, c, and d are all zero. The method of elimination
can be used to solve the differential system as well as the algebraic. To find
y{t) and z(t), operate on both sides of the first equation by (dD + d),and
on the second equation by (bD + ji). Then subtract one equation from the
other. The resulting equation can be solved for y(t) since it does not contain
z. Next substitute the general solution y(t) into one of the equations and
solve that for z(t); to determine possible relations between the constants of
integration, it may be necessary to substitute y(t) and z(t) into the other
given equation. Sometimes simplifications in this procedure can be made.
Sec. 7 Complex Vector Spaces 147

(a) Use the method just described to find the general solution of the system

(£> + \)y + z = 0.

3y + (D - \)z = 0.

(b) Determine the constants in the solution of part (a) so that the initial

conditions y(0) = and z(0) = 1 are satisfied.


(c) Find the most general solution of the system

(Z> + \)y + Dz =

Dy - (D - \)z = t.

(d) Determine the constants in the solution of part (c) so that the initial

conditions y(0) = and z(0) = are satisfied.

15. Suppose that two 100-gallon tanks of salt solution have concentrations (in
pounds per gallon) of salt y(t) and z(t), respectively. Suppose that solution
is flowing from the v-tank to the z-tank at a rate of 1 gallon per minute, and

from the z-tank to the j-tank at a rate of 4 gallons per minute, and that the
overflow from the j'-tank goes down the drain, while the z-tank is kept full
by the addition of fresh water. We assume that each tank is kept thoroughly
mixed at all times.

(a) Show that y and z satisfy a system of differential equations of the type
discussed in Exercise 14. [Hint. Express Dy and Dz each as a linear
combination of y and z.]
(b) Find the general solution of the system found in part (a), and then
determine the constants in it to be consistent with initial concentrations
j(0) = ^ and z(0) = i,
(c) Draw the graphs of the particular solutions y{t) and z(t) found in part

(b) and interpret the results physically.

SECTION 7

In the earlier parts of this chapter we have always understood the vector COMPLEX VECTOR
space operation of numerical multiplication to mean multiplication by a SPACES
real number. However, we can replace real numbers by complex numbers,
and let the definition of vector space and linear function remain otherwise
the same. Then, all the theorems we have proved for real vector spaces
are still true relative to the complex numbers. To prove this, all we have
to do is observe that the only properties of real numbers that we used in
proving theorems about an abstract vector space are properties that are
shared by complex numbers. Theorems involving inner products are
another matter, which we shall discuss at the end of the section. As
motivation for considering complex vector spaces, we shall explain how
the extensionis the key to further development of the study of differential

operators begun in Section 6. First we shall review the relevant facts


about complex numbers and complex-valued functions.
The complex number z x — +
iy, with real part Re z = x and imag-

inary part Im z = y, can be identified with the vector (x, y) in 'Ji


2
for the
148 Linear Algebra Chap. 2

purpose of drawing pictures. Furthermore, complex addition is defined so


2
thatit corresponds precisely to addition of elements of .'Jl :

(x + iy) + (*' + iy') = (x + x') + i(y + /).


The relation is illustrated in Fig. 7. It is also true that numerical

(a- •
x') i (v y')

IV

Figure 7

multiplication of a complex number by a real number r corresponds to


numerical multiplication in Jl 2 :

r(x + iy) = rx -f iry.

To complete the identification of the complex numbers with the vectors of


3t 2 it is necessary to introduce an operation of vector multiplication in
Si 2 to correspond to the multiplication of complex numbers. Complex
multiplication is done by defining = —1 and computing as follows:
i
2

(x + iy)(x' + iy') = (xx' — yy') + /(*/ + yx').


The complex conjugate of x + iy, written x + iy, is

y '.l'-

Taking the conjugate z of a complex number z corresponds to reflecting


it in the horizontal axis.(See Fig. 8.) Division by a nonzero complex
number x + iy is most easily done by multiplying both numerator
and denominator by the conjugate of the denominator:

x' + iy' (x' + iy')(x — iy) (xx' + yy') + i(xy' — yx')


x + iy (x + iy)(x — iy) x
2
+f
Finally, the absolute value of x + iy, written \x + iy\, is defined
to be

\x + iy\ = v/x
z
+ y
2

X r iy
and corresponds to the length of a vector in 'Ji
2
. (See Fig. 8.)

Figure 8 Notice that absolute value and conjugate of a complex number z


Sec. 7 Complex Vector Spaces 149

are related by

The geometric significance of complex multiplication is seen best by


representing nonzero complex numbers in polar form:

y
x + iy = \x <y\ + i

x + iy\ \x + iy\)'
iy\

Because \x + iy\ = V* + y 2 2
, the numbers xj\Jx 2 +y 2
and yjs/x 2 +y 2

can be written as cosine and sine, respectively, of an angle 0, called a


polar angle of z = x + iy, and illustrated in Fig. 9. Of course, a complex

r|(cos i sin 8)

Figure 9

number has infinitely many polar angles, each pair differing by an integer
multiple of 277.

Now if z and z' are complex numbers with polar angles θ and θ', we can write their product in polar form as follows:

(|z|(cos θ + i sin θ))(|z'|(cos θ' + i sin θ'))
    = |z| |z'| ((cos θ cos θ' − sin θ sin θ') + i(cos θ sin θ' + sin θ cos θ'))
    = |z| |z'| (cos(θ + θ') + i sin(θ + θ')).

In the last step we have used the addition formulas for cosine and sine. The result of the computation is a number in polar form having absolute value |z| |z'| and polar angle θ + θ'. We conclude that the absolute value of a product of complex numbers z and z' is the product of their absolute values,

|zz'| = |z| |z'|,

and that if z and z' have polar angles θ(z) and θ(z'), then zz' has a polar angle θ(z) + θ(z'). These facts are illustrated in Fig. 10.

Figure 10
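
These rules are easy to experiment with numerically. The following is a minimal sketch using only Python's built-in complex type and the standard cmath module; the particular numbers z and w are our own choices for illustration.

```python
import cmath

z = 1 + 1j
w = -1 + 2j

# |zw| = |z| |w|
print(abs(z * w), abs(z) * abs(w))

# a polar angle of zw is theta(z) + theta(w), up to a multiple of 2*pi
print(cmath.phase(z * w), cmath.phase(z) + cmath.phase(w))

# polar form: z = |z| e^{i theta}
r, theta = cmath.polar(z)
print(r * cmath.exp(1j * theta))    # recovers z
```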

What we have just established can be expressed most conveniently in terms of the complex exponential function. We define, for real numbers θ,

e^{iθ} = cos θ + i sin θ.

Thus e^{iθ} is a complex number with |e^{iθ}| = √(sin² θ + cos² θ) = 1, and with polar angle θ. Since polar angles are added when complex numbers are multiplied, we have

e^{iθ} e^{iθ'} = e^{i(θ+θ')}.

In particular, when θ' = −θ, we get e^{iθ} e^{−iθ} = 1; therefore

1/e^{iθ} = e^{−iθ}.

These last equations are justifications for using the exponential notation; the function behaves like the real exponential, for which e^{θ} e^{θ'} = e^{θ+θ'} and 1/e^{θ} = e^{−θ}.


In terms of the exponential, the polar form of a complex number z with polar angle θ(z) becomes

z = |z| e^{iθ(z)},

and its conjugate is

z̄ = |z| e^{−iθ(z)}.

The fact that the conjugate of zw is the product of the conjugates of z and w can be established directly, or by writing

\overline{zw} = |zw| e^{−iθ(zw)} = |z| e^{−iθ(z)} |w| e^{−iθ(w)} = z̄ w̄.

In addition to its algebraic simplicity, another reason for using the complex exponential notation comes from the formulas for its derivative and integral. To differentiate or integrate a complex-valued function u(x) + iv(x) with respect to the real variable x, we simply differentiate or integrate the real and imaginary parts. By definition,

d/dx (u(x) + iv(x)) = du/dx + i dv/dx

and

∫ (u(x) + iv(x)) dx = ∫ u(x) dx + i ∫ v(x) dx.

Then the derivative of e^{ix} with respect to x is given by

d/dx e^{ix} = d/dx (cos x + i sin x)
           = −sin x + i cos x
           = i(cos x + i sin x) = i e^{ix}.

In short, we have

d/dx e^{ix} = i e^{ix}.

Similarly

∫ e^{ix} dx = (1/i) e^{ix} + c,

where c may be a real or complex constant. These are analogous to the formulas for the derivative and integral of e^{ax} when a is real. More generally, we can define

e^{(α+iβ)x} = e^{αx}(cos βx + i sin βx)

and compute

7.1    d/dx e^{(α+iβ)x} = (α + iβ) e^{(α+iβ)x}

and

7.2    ∫ e^{(α+iβ)x} dx = (1/(α + iβ)) e^{(α+iβ)x} + c,    α + iβ ≠ 0.

These computations are left as exercises.
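
Equations 7.1 and 7.2 can also be checked symbolically. Here is a minimal sketch using the sympy library (our choice, not the text's); 7.2 is verified by differentiating the claimed antiderivative rather than by symbolic integration.

```python
import sympy as sp

x = sp.symbols('x', real=True)
alpha, beta = sp.symbols('alpha beta', real=True)
r = alpha + sp.I * beta

# Equation 7.1: d/dx e^{(alpha+i beta)x} = (alpha+i beta) e^{(alpha+i beta)x}
print(sp.simplify(sp.diff(sp.exp(r * x), x) - r * sp.exp(r * x)))   # 0

# Equation 7.2: the derivative of e^{(alpha+i beta)x}/(alpha+i beta) is the integrand
F = sp.exp(r * x) / r
print(sp.simplify(sp.diff(F, x) - sp.exp(r * x)))                   # 0
```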


We are now in a position to discuss the differential equation

(D² + aD + b)y = 0

when the factored operator

(D − r₁)(D − r₂) = D² − (r₁ + r₂)D + r₁r₂

contains complex numbers r₁ and r₂. We shall see that the usual techniques, as discussed for example in Section 6, still apply. In fact, the exponential multiplier method goes over formally unchanged because of Equation 7.1; we have

D(e^{rx} y) = e^{rx}(D + r)y,

whether r is real or complex.

Example 1. Consider the differential equation y″ + y = 0. We write the equation in operator form,

(D² + 1)y = 0,

and factor D² + 1 to get

(D − i)(D + i)y = 0.

Then set

(D + i)y = u,    (1)

and try to solve

(D − i)u = 0

for u. As in the real case, we multiply by a factor designed to make the left side the derivative of a product. The same multiplier rule suggests that the correct factor is e^{−ix}, so we write

e^{−ix}(D − i)u = 0

or, since D(e^{−ix} u) = e^{−ix}(D − i)u,

D(e^{−ix} u) = 0.

We integrate both sides with respect to x to get

e^{−ix} u = c₁    or    u = c₁ e^{ix}.

Substituting this result for u into Equation (1) gives

(D + i)y = c₁ e^{ix},

which must now be solved for y. We do it by multiplying through by e^{ix} to get

e^{ix}(D + i)y = c₁ e^{2ix}

or, since D(e^{ix} y) = e^{ix}(D + i)y,

D(e^{ix} y) = c₁ e^{2ix}.

Integrating gives

e^{ix} y = (1/(2i)) c₁ e^{2ix} + c₂

or

y = (1/(2i)) c₁ e^{ix} + c₂ e^{−ix}.

On replacing the arbitrary constant c₁ by 2ic₁, we have

y = c₁ e^{ix} + c₂ e^{−ix}
  = c₁(cos x + i sin x) + c₂(cos x − i sin x)
  = (c₁ + c₂) cos x + i(c₁ − c₂) sin x.

To make the solution simpler looking, we can set d₁ = c₁ + c₂ and d₂ = i(c₁ − c₂). This involves no change in generality in the constants because, whenever d₁ and d₂ are to be real numbers, then c₁ and c₂ in general must be complex. In fact we have, solving for c₁ and c₂,

c₁ = (d₁ − id₂)/2    and    c₂ = (d₁ + id₂)/2.
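
As a quick check on Example 1, both forms of the general solution can be confirmed with sympy (a sketch; the library choice is ours):

```python
import sympy as sp

x, c1, c2 = sp.symbols('x c1 c2')
y = sp.Function('y')

# sympy returns the trigonometric form of the general solution directly
print(sp.dsolve(y(x).diff(x, 2) + y(x), y(x)))   # y(x) = C1*sin(x) + C2*cos(x)

# the complex-exponential form satisfies y'' + y = 0 as well
yc = c1 * sp.exp(sp.I * x) + c2 * sp.exp(-sp.I * x)
print(sp.simplify(yc.diff(x, 2) + yc))           # 0
```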

Example 1 shows that it is important to be able to form linear combina-


tions of elements of a vector space with complex coefficients. A complex
vector space has the same definition as a real vector space except that
numerical multiples are formed using the complex numbers C. The
definitions of linear independence and linear function are also formally
the same, with complex numbers replacing real numbers. It is worth
remarking that every complex vector space can be converted in a natural
way into a real vector space; to obtain it we simply restrict ourselves to
numerical multiplication by real numbers. As a consequence, linear
independence of a set in a complex vector space automatically implies
linear independence of the same set relative to the restricted real vector
space. The reason is that, if

c₁x₁ + … + cₙxₙ = 0
implies that all the c's are zero whenever they are chosen from the complex
numbers, then the same implication certainly holds if the c's are only
chosen from the real numbers. However, the converse statement is not
true, as the following example shows.

Example 2. The set C of complex numbers is itself a complex vector space. As a real vector space, we have seen that C can be identified with ℝ²; hence it has dimension 2 relative to the real numbers. However, relative to the complex numbers, C has dimension 1 because a linear combination of more than one complex number, with complex coefficients, can always be made to be zero without all coefficients being zero. For example, in c₁z + c₂w with z and w nonzero, choose c₁ = 1/z and c₂ = −1/w.

Turning to differential equations again, we observe that Theorems 6.1, 6.2, and 6.3 of the previous section are all true relative to the complex numbers. The reason is that, in the proofs of these theorems, no assumption is made about the numerical multipliers that occur other than the ordinary rules of arithmetic which apply to both real and complex numbers. In addition, the properties of the exponential function that are used hold for both real and complex exponentials. Therefore we can conclude that all solutions of the differential equation L(y) = 0, where

L = (D − r₁) ⋯ (D − rₙ),    (2)

are linear combinations of n functions, even when the numbers rₖ are complex. These functions are e^{rₖx} or, in the case of multiple roots, xˡ e^{rₖx}. This implies, just as for real rₖ:

7.3 Theorem

The set of solutions of L(y) = 0, with L given by Equation (2), is a vector space 𝒩 of dimension n relative to the complex numbers. 𝒩 has a basis consisting of functions of the form e^{rₖx} or xˡ e^{rₖx}, where rₖ may be complex.

Theorem 7.3 is not usually applied directly in the above form because operators such as

P(D) = D² + aD + b,

which occur in practice, most often have real numbers for the coefficients a and b. This implies that, in the factorization

D² + aD + b = (D − r₁)(D − r₂),

the complex roots rₖ always occur in complex conjugate pairs. (If r is a root of P(x) with real coefficients, then so is r̄. Why?) As a result, solutions of the differential equation also occur in pairs of the form

e^{rx},    e^{r̄x},

perhaps multiplied by some power of x. It then becomes natural to write r = α + iβ and r̄ = α − iβ. We get

c₁ e^{rx} + c₂ e^{r̄x} = c₁ e^{(α+iβ)x} + c₂ e^{(α−iβ)x}
    = e^{αx}(c₁(cos βx + i sin βx) + c₂(cos βx − i sin βx))
    = d₁ e^{αx} cos βx + d₂ e^{αx} sin βx,

where d₁ = c₁ + c₂ and d₂ = i(c₁ − c₂). The functions e^{αx} cos βx and e^{αx} sin βx are easier to interpret geometrically than are the complex exponentials that gave rise to them. Hence the solutions are often written using the trigonometric form

y = d₁ e^{αx} cos βx + d₂ e^{αx} sin βx.

Example 3. The differential equation

y″ + 2y′ + 2y = 0

has characteristic polynomial

D² + 2D + 2.

The roots of this polynomial are −1 ± i; so in factored form, the operator equation can be written

(D − (−1 − i))(D − (−1 + i))y = 0.

The complex exponential solutions are

e^{(−1−i)x},    e^{(−1+i)x}.

Translated into trigonometric form, these give

e^{−x} cos x,    e^{−x} sin x;

so the general solution can be written either

c₁ e^{(−1−i)x} + c₂ e^{(−1+i)x}

or

d₁ e^{−x} cos x + d₂ e^{−x} sin x.

The natural counterpart of Theorem 7.3 for differential equations with real coefficients is the following theorem, which guarantees a basis for the space of real solutions of L(y) = 0.

7.4 Theorem

The set of real-valued solutions of

(Dⁿ + a_{n−1}D^{n−1} + … + a₁D + a₀)y = 0,

where the aₖ are real, is a vector space ℳ of dimension n relative to the real numbers. ℳ has a basis consisting of functions of the form

xˡ e^{αx} cos βx,    xˡ e^{αx} sin βx,

where α and β are real.

Proof. We know from Theorem 7.3 that the complex solutions of the above differential equation are linear combinations of functions of the form xˡ e^{rx}, where either r is real or else there is a companion solution xˡ e^{r̄x}. We have seen that any solution, and in particular any real solution, can then be written as a linear combination of n functions of the form

xˡ e^{αx} cos βx,    xˡ e^{αx} sin βx.    (3)

Since the space of solutions has complex dimension n, any n functions that span it must be linearly independent over the complex numbers. But then these same n functions will automatically be linearly independent over the real numbers and so form a basis for ℳ. Hence n functions of the form (3) are a basis for ℳ.

We conclude the section with some additional remarks about complex


vector spaces.

Example 4. Denote by Cⁿ the set of all n-tuples of complex numbers. Then Cⁿ is easily seen to be a vector space with addition and numerical multiplication defined coordinate-wise. Unless something is stated to the contrary, it is always understood that numerical multiplication in Cⁿ is relative to the complex numbers. Then Cⁿ has e₁ = (1, 0, …, 0), …, eₙ = (0, …, 0, 1) as a basis, and so it has complex dimension n.

Example 5. Let Pₙ be the vector space of polynomials in powers of x from 1 up to xⁿ, with complex coefficients. Then Pₙ has complex dimension n + 1 because the spanning set 1, x, …, xⁿ is linearly independent. To see this, observe that

c₀ + c₁x + … + cₙxⁿ = 0    for all x

implies that the polynomial has more than n roots. This is possible only if all the coefficients are zero.

It is possible to define length and inner product (the analog of dot product in ℝⁿ) in a complex vector space. To see how this should be done, suppose that the inner product of two vectors x and y is to be a complex-valued function ⟨x, y⟩. Suppose further that ⟨x, x⟩ is to be nonnegative so that we can define the length of x by |x| = √⟨x, x⟩. If now we simply assume that ⟨x, y⟩ is complex linear in both x and y, then we would have for any complex number c,

|cx|² = ⟨cx, cx⟩ = c²⟨x, x⟩ = c²|x|².

But |cx|² and |x|² are both real numbers, while in general c², the square of a complex number, is not real. To get around this difficulty we require that ⟨x, y⟩ be conjugate symmetric,

⟨x, y⟩ = \overline{⟨y, x⟩},

instead of symmetric. Thus we define a complex inner product ⟨x, y⟩ of elements x, y in a complex vector space so that the following properties hold.

7.5    Positivity: ⟨x, x⟩ > 0, except that ⟨0, 0⟩ = 0.

       Conjugate Symmetry: ⟨x, y⟩ = \overline{⟨y, x⟩}.

       Additivity: ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.

       Homogeneity: ⟨cx, y⟩ = c⟨x, y⟩.

The conjugate symmetry implies additivity in the second vector also, so that

⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩.

However, we have conjugate homogeneity in the second variable:

⟨x, cy⟩ = \overline{⟨cy, x⟩} = \overline{c⟨y, x⟩} = c̄⟨x, y⟩.

Example 6. In the complex vector space Cⁿ, let z = (z₁, …, zₙ) and w = (w₁, …, wₙ). Define

⟨z, w⟩ = z₁w̄₁ + … + zₙw̄ₙ.

It is easy to check that this defines a complex inner product. If we define the length of z by

|z| = ⟨z, z⟩^{1/2},

then |z| turns out to have the three properties of length listed in 5.3 of Section 5, Chapter 1.

Example 7. Let ℰₙ be the vector space of exponential polynomials

p(x) = Σ_{k=−n}^{n} cₖ e^{ikx}

defined for −π ≤ x ≤ π, with complex coefficients cₖ. We can define a complex inner product on ℰₙ by

⟨p, q⟩ = ∫_{−π}^{π} p(x) \overline{q(x)} dx.

To integrate the complex function p q̄ we simply integrate its real and imaginary parts. We define length by

|p| = ⟨p, p⟩^{1/2} = (∫_{−π}^{π} |p(x)|² dx)^{1/2}.

In Examples 6 and 7, the length of a complex vector z was defined by |z| = ⟨z, z⟩^{1/2}, using a conjugate symmetric inner product. The usual properties of length are:

7.6    Positivity: |z| > 0, except that |0| = 0.

       Homogeneity: |cz| = |c| |z|.

       Triangle Inequality: |z + w| ≤ |z| + |w|.

The first is obviously satisfied because of the corresponding property of the inner product. The second follows from

|cz| = ⟨cz, cz⟩^{1/2} = [c c̄ ⟨z, z⟩]^{1/2} = |c| |z|.


As in the case of a real inner product, and with the same proof, the triangle inequality follows from the Cauchy-Schwarz inequality: |⟨z, w⟩| ≤ |z| |w|. The proof of Cauchy-Schwarz is a simple modification of the real case and is left as Exercise 11 at the end of this section. The difference between the real and complex proofs here illustrates the fact that, because a complex inner product has somewhat different properties from a real inner product, we cannot expect theorems involving inner products to extend without change from real to complex vector spaces. Complex inner products are used in this chapter in the exercises following Sections 8 and 9.

EXERCISES
1. For each of the following complex numbers z, find z̄ and |z|. Then write z in polar form and find 1/z.

(a) 1 + i.    (b) −1 + 2i.    (c) 2i.    (d) (2 + i)/i.

2. Verify that, for complex numbers z₁, z₂, and z₃, the distributive law

z₁(z₂ + z₃) = z₁z₂ + z₁z₃,

the associative laws

z₁(z₂z₃) = (z₁z₂)z₃    and    z₁ + (z₂ + z₃) = (z₁ + z₂) + z₃,

and the commutative laws

z₁z₂ = z₂z₁    and    z₁ + z₂ = z₂ + z₁

all hold.

3. (a) Verify Equation 7.1.
   (b) Verify Equation 7.2.

4. Prove directly from the definitions of conjugate and absolute value that

\overline{z₁z₂} = z̄₁ z̄₂    and    |z₁z₂| = |z₁| |z₂|.

5. Separate the real and imaginary terms in the infinite series

e^{iθ} = Σ_{k=0}^{∞} (iθ)ᵏ/k!

into two infinite series, and use the result to justify defining e^{iθ} by

e^{iθ} = cos θ + i sin θ.

6. Show that if r is complex, then (D + r) is linear as an operator on the


complex vector space consisting of continuously differentiable functions
u(x) + iv(x), where x is a real variable.

7. Find the general solutions of the following differential equations.

(a) (D² + 1)y = 1.        (d) y′ + y = 1.
(b) (D² + D + 1)y = 0.    (e) y″ + y = sin x + 1.
(c) y″ + y = sin x.       (f) y″ + y = tan x.

8. A spring vibrating in a viscous medium has a displacement from its initial position denoted by x(t), where t is time. Denoting differentiation with respect to t by a dot, the differential equation

m ẍ + k ẋ + h x = 0

can be derived from physical considerations. Here m is the mass of the spring, k is a positive constant depending on the viscosity of the medium, and h is a positive constant depending on the stiffness of the spring.

(a) By considering the roots of the characteristic polynomial, show that x(t) is oscillatory or not, depending on whether 4mh > k².
(b) Assuming m, h, and k all equal to 1, find the solution of the differential equation satisfying x(0) = 0 and ẋ(0) = 3.
(c) If in part (b) we change to k = 2, but leave the other conditions the same, find the corresponding solution to the differential equation.
(d) What is the maximum displacement from the initial position under the conditions of part (b)? Show that the oscillation tends to zero as t increases.
(e) Sketch the graph of the displacement function under the conditions of part (c). What is the maximum displacement and at what time does it occur?

9. (a) Show that a cos cx + b sin cx, where a, b, and c are real numbers, can be written in the form r cos(cx − θ), where r = √(a² + b²) and θ is an angle such that cos θ = a/r and sin θ = b/r.
   (b) The result of part (a) is useful because it shows that a linear combination of cos cx and sin cx has a graph which is the graph of cos cx shifted by a suitable phase angle θ and multiplied by an amplitude r. Sketch the graph of

(1/2) cos 2x + (1/√3) sin 2x

by first finding r and an appropriate θ.

10. Show directly that e^{αx} cos βx and e^{αx} sin βx are linearly independent relative to the real numbers by using the formulas

cos βx = (e^{iβx} + e^{−iβx})/2,    sin βx = (e^{iβx} − e^{−iβx})/(2i),

together with the fact that e^{rx} and e^{r̄x} are linearly independent relative to the complex numbers when r ≠ r̄.

11. Prove the Cauchy-Schwarz inequality

|⟨z, w⟩| ≤ |z| |w|

for complex inner products. [Start, as in the real case, by assuming |z| = |w| = 1. Express in polar form

⟨z, w⟩ = |⟨z, w⟩| e^{iθ},

so that

⟨e^{−iθ}z, w⟩ = |⟨z, w⟩|.

Then expand |e^{−iθ}z − w|² in terms of the complex inner product.]

12. Show that Cⁿ, the set of n-tuples of complex numbers, has dimension n relative to the complex numbers and dimension 2n relative to the real numbers.

13. Show that if a set of real-valued functions is linearly independent relative to the real numbers, then it is linearly independent relative to the complex numbers. (The somewhat simpler converse statement is true quite generally and has been treated in the text.)

SECTION 8

ORTHONORMAL BASES

If 𝒱 is a vector space with an inner product, and 𝒱 has a basis

u₁, u₂, …, uₙ,

then it is often desirable that the basis be an orthonormal set. This means that distinct vectors uᵢ and uⱼ are perpendicular:

uᵢ · uⱼ = 0,    i ≠ j,    (1)

and that each has length 1:

|uᵢ| = √(uᵢ · uᵢ) = 1,    i = 1, …, n.    (2)

These conditions are useful because, if a vector x in 𝒱 is expressed as a linear combination of basis vectors by

x = c₁u₁ + ⋯ + cᵢuᵢ + ⋯ + cₙuₙ,

then the coefficients cᵢ can be computed easily. In fact, we have

x · uᵢ = c₁u₁ · uᵢ + ⋯ + cᵢuᵢ · uᵢ + ⋯ + cₙuₙ · uᵢ
       = 0 + ⋯ + cᵢ + ⋯ + 0,

so that

cᵢ = x · uᵢ.    (3)

Example 1. The vectors (1, 0) and (0, 1) in ℝ² form an orthonormal basis. So does the pair of vectors (1/√2, 1/√2) and (−1/√2, 1/√2). In terms of the latter basis we can write

(x, y) = c₁(1/√2, 1/√2) + c₂(−1/√2, 1/√2).

To determine c₁ and c₂ we compute, using Equation (3),

c₁ = (x, y) · (1/√2, 1/√2) = (x + y)/√2,    c₂ = (x, y) · (−1/√2, 1/√2) = (−x + y)/√2.

In particular, if (x, y) = (1, 2), we have c₁ = 3/√2 and c₂ = 1/√2; therefore

(1, 2) = (3/√2)(1/√2, 1/√2) + (1/√2)(−1/√2, 1/√2).

Example 2. The set of functions defined for −π ≤ x ≤ π by

cos x, sin x, …, cos nx, sin nx

spans the vector space TSₙ of trigonometric sums of the form

T(x) = Σ_{k=1}^{n} (aₖ cos kx + bₖ sin kx).

A natural inner product for TSₙ is

⟨f, g⟩ = (1/π) ∫_{−π}^{π} f(x)g(x) dx,

and with respect to this inner product, the above set turns out to be orthonormal. In fact, computation shows that

(1/π) ∫_{−π}^{π} cos kx cos lx dx = (1/π) ∫_{−π}^{π} sin kx sin lx dx = { 1 if k = l, 0 if k ≠ l },

and that (1/π) ∫_{−π}^{π} cos kx sin lx dx = 0 for all k and l.

The fact that orthonormal bases are simpler to work with suggests
the following question: Given a basis for a vector space with an inner
product, is there some way to find an orthonormal basis? The answer is

yes, and we shall describe an effective procedure. First we make the


following observation.

8.1 Theorem

If u₁, …, uₙ is an orthonormal set in a vector space 𝒱, then the vectors uᵢ are linearly independent. The coefficients in the linear combination

x = c₁u₁ + … + cₙuₙ

are computed by cᵢ = x · uᵢ.

Proof. Suppose there are constants c₁, …, cₙ such that

c₁u₁ + … + cⱼuⱼ + … + cₙuₙ = 0.

If for some j we form the dot product of both sides of this equation with uⱼ, then the result is

c₁u₁ · uⱼ + … + cⱼuⱼ · uⱼ + … + cₙuₙ · uⱼ = 0.

But uₖ · uⱼ = 0 if k ≠ j, and uⱼ · uⱼ = 1; so we get cⱼ = 0. Since j is arbitrary, the u's are linearly independent.

Now suppose that x₁, …, xₙ is a linearly independent set in a vector space 𝒱; we shall describe a process for constructing an orthonormal set u₁, …, uₙ. First we pick any of the x's, say x₁, and set

u₁ = x₁/|x₁|.

Then |u₁| = |x₁|/|x₁| = 1. Now pick another of the x's, say x₂, and form its projection on u₁, that is, form the vector (x₂ · u₁)u₁. If the vectors were ordinary geometric vectors, the relationship between x₁, x₂, and u₁ would be somewhat as shown in Fig. 11.

Figure 11

The vector y₂ is defined by

y₂ = x₂ − (x₂ · u₁)u₁,

and we can check that y₂ is perpendicular to u₁, for

y₂ · u₁ = x₂ · u₁ − (x₂ · u₁)(u₁ · u₁)
        = x₂ · u₁ − x₂ · u₁ = 0.

To get a unit vector perpendicular to u₁, we let

u₂ = y₂/|y₂|.

The vector y₂ cannot be zero because, by its definition, that would imply that x₂ and u₁ were linearly dependent.

Having found u₁ and u₂, we choose x₃ and form its projection on the subspace of 𝒱 spanned by u₁ and u₂. By definition, this is the vector

p = (x₃ · u₁)u₁ + (x₃ · u₂)u₂.

We define y₃ by subtracting this projection from x₃:

y₃ = x₃ − (x₃ · u₁)u₁ − (x₃ · u₂)u₂.

As before we can check that y₃ is perpendicular to u₁ and also to u₂. Since u₁ · u₁ = 1 and u₁ · u₂ = 0, we have

y₃ · u₁ = (x₃ · u₁) − (x₃ · u₁) = 0.

Similarly

y₃ · u₂ = (x₃ · u₂) − (x₃ · u₂) = 0.

Figure 12 shows these vectors as geometric arrows. We define u₃ by setting

u₃ = y₃/|y₃|.

Once again y₃ = 0 would imply linear dependence of x₃, u₁, and u₂. But because the subspace spanned by u₁ and u₂ is the same as the one spanned by x₁ and x₂, this would imply linear dependence of the x's.

We proceed in this way, successively computing u₁, u₂, …, uₙ. To find uⱼ₊₁, we set

8.2    yⱼ₊₁ = xⱼ₊₁ − (xⱼ₊₁ · u₁)u₁ − ⋯ − (xⱼ₊₁ · uⱼ)uⱼ,

       uⱼ₊₁ = yⱼ₊₁/|yⱼ₊₁|.

As before we can verify that uⱼ₊₁ is perpendicular to u₁, u₂, …, uⱼ.

Equations 8.2 summarize what is known as the Gram-Schmidt process for finding an orthonormal set from an independent set. It can be continued until we run out of x's in the independent set.
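
Equations 8.2 translate almost line for line into code. Here is a minimal numpy sketch (the function name gram_schmidt is our own), assuming the input vectors are linearly independent:

```python
import numpy as np

def gram_schmidt(xs):
    """Orthonormalize a linearly independent list of vectors via Equations 8.2."""
    us = []
    for x in xs:
        y = np.array(x, dtype=float)
        for u in us:
            y = y - np.dot(x, u) * u          # subtract the projection on each u
        us.append(y / np.linalg.norm(y))      # normalize y to unit length
    return us
```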

The vector

(x · u₁)u₁ + … + (x · uⱼ)uⱼ

is called the projection of x on the subspace spanned by u₁, …, uⱼ. (The projection is also called the Fourier expansion of x relative to u₁, …, uⱼ. In Theorem 7.1 of Chapter 5 it is proved that the Fourier expansion of x is the linear combination of the u's which is nearest to x.)

Example 3. The vectors x₁ = (1, −1, 2) and x₂ = (1, 0, −1) span a plane P in ℝ³ because they are linearly independent. To find an orthonormal basis for P, we apply the Gram-Schmidt process to x₁ and x₂. We set

u₁ = x₁/|x₁| = (1, −1, 2)/√6 = (1/√6, −1/√6, 2/√6),

and then compute

y₂ = x₂ − (x₂ · u₁)u₁
   = (1, 0, −1) − (−1/√6)(1/√6, −1/√6, 2/√6)
   = (1, 0, −1) + (1/6, −1/6, 1/3) = (7/6, −1/6, −2/3).

Then

u₂ = y₂/|y₂| = (1/√(66/36)) (7/6, −1/6, −2/3) = (1/√66)(7, −1, −4).

Thus the plane P can be represented as all linear combinations

u x₁ + v x₂

or all linear combinations

s u₁ + t u₂.

The relationship between the two pairs of vectors is shown in Fig. 13.

Figure 13
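
Applying the gram_schmidt sketch given earlier in the section (with numpy imported as before) reproduces these vectors:

```python
u1, u2 = gram_schmidt([[1., -1., 2.], [1., 0., -1.]])
print(u1)                  # (1, -1, 2)/sqrt(6)
print(u2 * np.sqrt(66))    # [ 7. -1. -4.]
print(np.dot(u1, u2))      # 0.0, up to roundoff
```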
Sec. 8 Orthonormal Bases 165

Example 4. Let Pₙ[−1, 1] be the vector space of polynomials f(x) = a₀ + a₁x + … + aₙxⁿ defined for −1 ≤ x ≤ 1. We define an inner product by

⟨f, g⟩ = ∫_{−1}^{1} f(x)g(x) dx.

We have seen in Example 5(b) of Section 4 that the functions 1, x, …, xⁿ form a basis for Pₙ. To find an orthonormal basis, we observe that ⟨1, 1⟩ = 2 and therefore set u₁(x) = 1/√2. Then

y₂(x) = x − ⟨x, 1/√2⟩(1/√2) = x,

because ⟨x, 1/√2⟩ = ∫_{−1}^{1} (x/√2) dx = 0. We then compute

u₂(x) = x/⟨x, x⟩^{1/2} = x/(∫_{−1}^{1} x² dx)^{1/2} = √(3/2) x.

Next set

y₃(x) = x² − ⟨x², u₁(x)⟩u₁(x) − ⟨x², u₂(x)⟩u₂(x) = x² − 1/3.

Then

u₃(x) = y₃(x)/(∫_{−1}^{1} (x² − 1/3)² dx)^{1/2} = √(45/8)(x² − 1/3) = √(5/8)(3x² − 1).

The process can be continued indefinitely. (The resulting polynomials are called the normalized Legendre polynomials, and another method for computing them is given in Section 7 of Chapter 5.)
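
The same computation can be delegated to sympy; this sketch (our own) reproduces the first three normalized Legendre polynomials under the inner product of Example 4:

```python
import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    return sp.integrate(f * g, (x, -1, 1))

ortho = []
for p in [sp.Integer(1), x, x**2]:
    y = p - sum(inner(p, u) * u for u in ortho)   # Equations 8.2
    ortho.append(sp.simplify(y / sp.sqrt(inner(y, y))))

print(ortho)   # 1/sqrt(2), sqrt(3/2)*x, sqrt(5/8)*(3*x**2 - 1), in some equivalent form
```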

Of course, if we start with two bases for the same vector space 𝒱 and apply the Gram-Schmidt process to them, we will in general get different orthonormal bases. In particular, if one of the two given bases is already orthonormal and we apply the Gram-Schmidt process to the other one, we may get two different bases. The following theorem gives a condition under which the two resulting orthonormal bases are the same except perhaps for orientation. It is necessary to assume that 𝒱 is a vector space with scalar multiplication by real numbers.


8.3 Theorem

Let {u₁, …, uₙ} and {v₁, …, vₙ} be two orthonormal sets in a real vector space 𝒱. If the subspaces spanned by {u₁, …, uₖ} and {v₁, …, vₖ} are the same for k = 1, 2, …, n, then uₖ = ±vₖ for k = 1, 2, …, n.

Proof. Let vₖ be any one of the v's. Since by assumption vₖ is in the subspace spanned by {u₁, …, uₖ}, we can write

vₖ = r₁u₁ + … + rₖuₖ

for some real numbers rᵢ. However, vₖ is orthogonal to each of {u₁, …, uₖ₋₁} because, by assumption, it is orthogonal to {v₁, …, vₖ₋₁}, and these two sets span the same subspace. If we form the dot product of both sides of the above equation with uⱼ for 1 ≤ j ≤ k, we then get

0 = vₖ · uⱼ = rⱼ,    j = 1, …, k − 1,
vₖ · uₖ = rₖ.

Thus vₖ = (vₖ · uₖ)uₖ. Since both the u's and the v's have length 1, we must have

|vₖ · uₖ| = 1.

It follows that vₖ · uₖ = ±1; so vₖ = ±uₖ.

Example 5. Consider the natural basis e₁ = (1, 0, 0), e₂ = (0, 1, 0), e₃ = (0, 0, 1) together with the basis x₁ = (3, 0, 0), x₂ = (1, −1, 0), x₃ = (1, 1, 1) for ℝ³. Clearly, e₁ and x₁ span the same subspace of ℝ³. Similarly, the pairs {e₁, e₂} and {x₁, x₂} span the xy-plane, and the complete bases both span all of ℝ³. It follows from Theorem 8.3 that applying the Gram-Schmidt process to {x₁, x₂, x₃} will result in an orthonormal basis of the form {±e₁, ±e₂, ±e₃}. As a matter of fact, we find that we get {e₁, −e₂, e₃}. We leave the verification as an exercise. The vectors are shown in Fig. 14.

Figure 14

EXERCISES

1. (a) Find a vector (x, y, z) such that the triple of vectors (1, 1, 1), (−1, 1/2, 1/2), and (x, y, z) forms an orthogonal basis for ℝ³.
   (b) Normalize the basis found in part (a).

2. The vectors (1, 1, 1) and (1, 2, 1) span a plane P in ℝ³. Find an orthonormal basis for P by using the Gram-Schmidt process.

3. The three vectors (1, 1, 2, 1), (1, −1, 0, 0), and (0, 1, 0, 2) form a basis for a subspace S of ℝ⁴. Use the Gram-Schmidt process to find an orthonormal basis for S.

4. Let P₂ be the 3-dimensional vector space of quadratic polynomials f(x) defined for 0 ≤ x ≤ 1. If P₂ has the inner product

⟨f, g⟩ = ∫₀¹ f(x)g(x) dx,

find an orthonormal basis for P₂. [Hint: One basis for P₂ consists of {1, x, x²}.]

5. Show that the application of the Gram-Schmidt process to the three vectors x₁, x₂, x₃ in Example 5 of the text gives the triple e₁, −e₂, e₃.

6. Prove that applying the Gram-Schmidt process to an orthonormal set gives the same orthonormal set back again.

7. Let C[−π, π] be the vector space of complex-valued functions f(x) defined for −π ≤ x ≤ π. Let ⟨f, g⟩ be defined for f and g in C[−π, π] by

⟨f, g⟩ = ∫_{−π}^{π} f(x) \overline{g(x)} dx.

(a) Show that ⟨f, g⟩ is a complex inner product.
(b) Show that

⟨e^{imx}, e^{inx}⟩ = { 2π if m = n, 0 if m ≠ n }.

(c) Show that the vector subspaces of C[−π, π] spanned by the following two sets are the same:

cos kx, sin kx,    k = 0, 1, 2, …, n.    (1)
e^{ikx},    k = 0, ±1, ±2, …, ±n.    (2)

8. The conclusion of Theorem 8.3 is that uₖ = ±vₖ. What conclusion is possible if 𝒱 is a complex vector space? [See Exercise 11, Section 7.]

SECTION 9

EIGENVECTORS

In this section we shall find a natural way to associate a basis for a vector space 𝒱 with a given linear function f from 𝒱 to 𝒱. Suppose that there is a nonzero vector x in 𝒱 and a number λ such that

f(x) = λx.

Then x is called an eigenvector of the linear function f, and λ is called its associated eigenvalue. (The terms characteristic vector and characteristic root are sometimes used.) Since f is linear, the foregoing equation will always have the trivial solution x = 0 for any λ; so we rule that out as being uninteresting. Of course, also by the linearity of f, if x is an eigenvector corresponding to λ, then so is cx for any number c ≠ 0.

Example 1. Let f be the linear function from ℝ² to ℝ² defined by the matrix

[ 1  1 ]
[ 4  1 ]

Thus f(x, y) = (x + y, 4x + y). It is easy to verify that

f(1, 2) = (3, 6) = 3(1, 2)

and that

f(1, −2) = (−1, 2) = −(1, −2).

That is, the vector (1, 2) in ℝ² is an eigenvector corresponding to the eigenvalue 3, and the vector (1, −2) is an eigenvector corresponding to the eigenvalue −1. Of course, nonzero multiples of these two vectors will be eigenvectors with the same two eigenvalues.
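
A two-line numerical confirmation (a numpy sketch):

```python
import numpy as np

A = np.array([[1., 1.],
              [4., 1.]])
print(A @ np.array([1., 2.]))     # [3. 6.]  =  3 * (1, 2)
print(A @ np.array([1., -2.]))    # [-1. 2.] = -1 * (1, -2)
```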

Before discussing how to find eigenvectors, we shall see why they are useful. Suppose that 𝒱 is a vector space, f is a linear function from 𝒱 to 𝒱, and suppose that 𝒱 has a basis {x₁, x₂, …, xₙ} consisting of eigenvectors of f, that is,

f(xₖ) = λₖxₖ,    k = 1, 2, …, n,

for some numbers λₖ. It follows that representing an arbitrary element of 𝒱 using this basis leads to a particularly simple form for f. Suppose

x = c₁x₁ + … + cₙxₙ.

Then, using the linearity of f and the fact that the x's are eigenvectors, we have

f(x) = c₁f(x₁) + … + cₙf(xₙ)
     = c₁λ₁x₁ + … + cₙλₙxₙ.

Thus, relative to the basis of eigenvectors, the action of f is simply to multiply each coefficient by the corresponding eigenvalue. In short,

9.1    f(Σ_{k=1}^{n} cₖxₖ) = Σ_{k=1}^{n} cₖλₖxₖ,

where

9.2    f(xₖ) = λₖxₖ,    k = 1, 2, …, n.

In geometric terms, using the eigenvectors for a basis allows the action of the function f to be interpreted as a succession of numerical multiplications, one on each of the basis vectors.

Example 2. Returning to the linear function of Example 1, we express an arbitrary vector in ℝ² as a linear combination of the two eigenvectors x₁ = (1, 2) and x₂ = (1, −2):

x = c₁x₁ + c₂x₂.

The corresponding eigenvalues are 3 and −1; so from

f(x) = c₁f(x₁) + c₂f(x₂)

we get

f(x) = 3c₁x₁ − c₂x₂.

Figure 15

Figure 15 shows the effect of f on each of the two eigenvectors x₁ and x₂. It follows that the image f(x) of any vector x can be constructed by geometric methods. Furthermore, we can express the effect of f in words by saying that f is a combination of two transformations:

1. Stretching away from the line through x₂, along the lines parallel to x₁, by the factor 3.
2. Reflection in the line through x₁, along the lines parallel to x₂.

For this analysis to work, it was essential that the eigenvectors of f span all of ℝ².

Example 3. To find the eigenvectors and corresponding eigenvalues for the function f of Examples 1 and 2 (in case they had not been given), we would proceed as follows. We need to find vectors x ≠ 0 and numbers λ such that

f(x) − λx = 0.

In matrix form, this equation is

[ 1  1 ] [ x ]     [ x ]   [ 0 ]
[ 4  1 ] [ y ] − λ [ y ] = [ 0 ],

or

[ 1−λ   1  ] [ x ]   [ 0 ]
[  4   1−λ ] [ y ] = [ 0 ].    (1)

It is clear that if the foregoing 2-by-2 matrix has an inverse, then the only solutions are x = 0 and y = 0. Hence we must try to find values of λ for which the matrix fails to have an inverse. This will occur precisely when

det [ 1−λ   1  ]
    [  4   1−λ ] = 0,

that is, when λ satisfies (1 − λ)² − 4 = 0. This quadratic equation in λ has roots λ = 3 and λ = −1, as we can see by inspection or otherwise. To find eigenvectors corresponding to λ = 3 and λ = −1, we must find x and y, not both zero, satisfying Equation (1). Thus we consider

[ −2   1 ] [ x ]   [ 0 ]        [ 2  1 ] [ x ]   [ 0 ]
[  4  −2 ] [ y ] = [ 0 ]  and   [ 4  2 ] [ y ] = [ 0 ].

These equations reduce to

λ = 3:    −2x + y = 0,
λ = −1:    2x + y = 0.

It follows that there are many solutions; but all we need is one for each eigenvalue. We choose for simplicity

λ = 3:    (x, y) = (1, 2),
λ = −1:    (x, y) = (1, −2),

though any nonzero numerical multiple of either vector would do.

The method of Example 3 can be summarized as follows. To find eigenvalues of a linear function f from ℝⁿ to ℝⁿ, solve the characteristic equation

9.3    det(A − λI) = 0,

where A is the matrix of f. Then, for each eigenvalue λ₁, …, λₙ, try to find nonzero vectors x₁, …, xₙ that satisfy the matrix equation

9.4    (A − λₖI)xₖ = 0.

Because Equations 9.3 and 9.4 are expressed in terms of the matrix A of f, we sometimes refer to eigenvectors and eigenvalues of the matrix rather than of the function f. Of course Equation 9.4 is just Equation 9.2 in matrix form; but Equation 9.2 also applies to linear functions that are not representable by matrices.
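
Equations 9.3 and 9.4 can be carried out symbolically; a sympy sketch (our own), applied to the matrix of Example 3:

```python
import sympy as sp

lam = sp.symbols('lam')
A = sp.Matrix([[1, 1], [4, 1]])

p = (A - lam * sp.eye(2)).det()      # characteristic polynomial (Equation 9.3)
roots = sp.solve(p, lam)
print(roots)                         # [-1, 3]

for r in roots:                      # Equation 9.4: nullspace of A - lam*I
    print(r, (A - r * sp.eye(2)).nullspace())
```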

Example 4. This example will show one way in which a linear function can fail to have eigenvectors that provide a suitable basis for the vector space on which f acts. Consider the linear function f from ℝ² to ℝ² defined, for fixed θ, 0 ≤ θ < 2π, by

f [ x ]   [ cos θ  −sin θ ] [ x ]
  [ y ] = [ sin θ   cos θ ] [ y ].

This function carries (1, 0) into (cos θ, sin θ) and (0, 1) into (−sin θ, cos θ), and so represents a rotation counterclockwise about the origin through an angle θ. Equation 9.3 becomes

det [ cos θ − λ   −sin θ    ]
    [ sin θ       cos θ − λ ] = 0

or

λ² − 2λ cos θ + 1 = 0.

This quadratic equation for λ has only complex conjugate roots of the form

cos θ ± i sin θ = e^{±iθ}.

Since we are looking for nonzero vectors x in ℝ² satisfying

f(x) = e^{±iθ} x,

we conclude that f has no eigenvectors in ℝ², unless θ = 0 or θ = π. In the first case we get the identity transformation, which clearly has every nonzero vector as an eigenvector with eigenvalue 1. In the second case we get f(x) = −x, which is a reflection in the origin and has every nonzero vector as an eigenvector with eigenvalue −1. A moment's thought shows why, in general, a rotation cannot have eigenvectors in ℝ². See Exercise 8 for an example of a linear function which fails in a different way to provide a basis of eigenvectors.
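
Numerically, an eigenvalue routine reports exactly these complex roots for a rotation matrix (a numpy sketch with an arbitrary angle of our choosing):

```python
import numpy as np

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.linalg.eig(R)[0])     # cos(theta) +/- i*sin(theta)
print(np.exp(1j * theta))      # e^{i theta}, for comparison
```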

We conclude by considering the possibility that the eigenvectors of a linear function f from 𝒱 to 𝒱 can be chosen so that they form an orthonormal basis. Since eigenvectors are only determined to within a numerical multiple, we can always normalize them to have length 1. We can achieve the orthogonality if the function f is symmetric with respect to an inner product on 𝒱, that is,

f(x) · y = x · f(y)

for all x and y in 𝒱. In particular, if f is a linear function from ℝⁿ to ℝⁿ, it has a matrix A = (aᵢⱼ), and we can write the symmetry equation as

Ax · y = x · Ay.

Example 6. If f is a linear function from ℝ² to ℝ² with a matrix

A = [ a  b ]
    [ b  c ]

that is symmetric about its main diagonal, then f is a symmetric transformation. We can compute

Ax · y = (ax₁ + bx₂, bx₁ + cx₂) · (y₁, y₂) = ax₁y₁ + b(x₂y₁ + x₁y₂) + cx₂y₂.

Similarly, we compute

x · Ay = (x₁, x₂) · (ay₁ + by₂, by₁ + cy₂) = ax₁y₁ + b(x₁y₂ + x₂y₁) + cx₂y₂.

Since the two dot products are equal, we conclude that f is symmetric.

For a symmetric transformation we have the following theorem.

9.5 Theorem

Let f be a symmetric linear function from 𝒱 to 𝒱, where 𝒱 is a vector space with an inner product. If x₁ and x₂ are eigenvectors of f corresponding to distinct eigenvalues λ₁ and λ₂, then x₁ and x₂ are orthogonal.

Proof. We need to show that x₁ · x₂ = 0, assuming that f(x₁) = λ₁x₁ and f(x₂) = λ₂x₂. We have

f(x₁) · x₂ = (λ₁x₁) · x₂ = λ₁(x₁ · x₂)

and

x₁ · f(x₂) = x₁ · (λ₂x₂) = λ₂(x₁ · x₂).

Because f is symmetric, f(x₁) · x₂ = x₁ · f(x₂); so

λ₁(x₁ · x₂) = λ₂(x₁ · x₂).

Hence (λ₁ − λ₂)(x₁ · x₂) = 0. Since λ₁ and λ₂ are assumed not equal, we must have x₁ · x₂ = 0.

A more general version of Theorem 9.5 is given in Chapter 5, Section 7,


where it is applied to a different class of examples.

Example 7. The function g from ℝ² to ℝ² with matrix

[ 1  2 ]
[ 2  1 ]

has eigenvalues computed from Equation 9.3 by

det [ 1−λ   2  ]
    [  2   1−λ ] = 0,

that is, (1 − λ)² − 4 = 0. By inspection the eigenvalues are λ = 3 and λ = −1. (Note that the transformation g is not the same as the transformation f of Example 1, even though they have the same eigenvalues.) Equation 9.4 for the eigenvectors has two interpretations, depending on which eigenvalue is used. We have

λ = 3:    [ −2   2 ] [ x ]   [ 0 ]
          [  2  −2 ] [ y ] = [ 0 ],

λ = −1:   [ 2  2 ] [ x ]   [ 0 ]
          [ 2  2 ] [ y ] = [ 0 ].

These equations reduce to

λ = 3:    −2x + 2y = 0,
λ = −1:    2x + 2y = 0.

We find solutions

λ = 3:    (x, y) = (1, 1),
λ = −1:    (x, y) = (1, −1).

The vectors x₁ = (1, 1), x₂ = (1, −1) are clearly orthogonal, and they can be normalized to give u₁ = (1/√2, 1/√2), u₂ = (1/√2, −1/√2). Of course the normalized vectors u₁ and u₂ are eigenvectors also.

We can now take advantage of the fact that u₁ and u₂ are eigenvectors and also that they form an orthonormal set. The latter fact enables us to express any vector in ℝ² as a linear combination of u₁ and u₂ very simply. From the previous section we have

x = (x · u₁)u₁ + (x · u₂)u₂

for any x in ℝ². Now using the fact that u₁ and u₂ are eigenvectors of the linear function g, we can write

g(x) = (x · u₁)g(u₁) + (x · u₂)g(u₂).

But g(u₁) = 3u₁ and g(u₂) = −u₂, so

g(x) = 3(x · u₁)u₁ − (x · u₂)u₂.

Thus the action of g has a geometric description like that given for f in Example 2 of this section. The only difference is that the stretching and reflection take place along different lines: perpendicular lines in the case of g, nonperpendicular lines in the case of f.
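
The orthogonality guaranteed by Theorem 9.5 is easy to observe numerically; numpy's eigh routine, which is intended for symmetric matrices, returns orthonormal eigenvectors (a sketch):

```python
import numpy as np

G = np.array([[1., 2.],
              [2., 1.]])
vals, vecs = np.linalg.eigh(G)     # eigh: eigenvalues of a symmetric matrix
print(vals)                        # [-1.  3.]
print(vecs[:, 0] @ vecs[:, 1])     # 0.0: the two eigenvectors are orthogonal
```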

EXERCISES

1. The linear function f from ℝ² to ℝ² with matrix

[ 1  12 ]
[ …   … ]

has eigenvalues 7 and 5. Which of the following vectors is an eigenvector of f? For those that are, what is the corresponding eigenvalue?

(a) …    (b) …    (c) …    (d) …

2. Find all the eigenvalues of each of the linear functions defined by the following matrices.

(a) …

(b) [ …  4 ]
    [ 1  0 ]

(c) …

(d) [ 1  0  … ]
    [ 2  1  … ]
    [ 0  1  2 ]

3. For each matrix in Exercise 2, find an eigenvector corresponding to each


eigenvalue.

4. Show that 0 is an eigenvalue of a linear function f if and only if f is not one-to-one.

5. Show that if f is a one-to-one linear function having λ for an eigenvalue, then f⁻¹, the inverse of f, has 1/λ for an eigenvalue.

6. Let f be a linear function having λ for an eigenvalue.
   (a) Show that λ² is an eigenvalue for f ∘ f.
   (b) Show that λⁿ is an eigenvalue for the function got by composing f with itself n times.

7. Let C^{(∞)}(ℝ) be the vector space of infinitely often differentiable functions f(x) for x in ℝ. Then the differential operator D² acts linearly from C^{(∞)}(ℝ) to C^{(∞)}(ℝ).
   (a) Show that for any real number λ the functions cos λx and sin λx are eigenvectors of D² corresponding to the eigenvalue −λ².
   (b) Let C^{(∞)}[0, π] be the subspace of C^{(∞)}(ℝ) consisting of functions f such that f(0) = f(π) = 0. Show that if D² is restricted to acting on C^{(∞)}[0, π], then its only eigenfunctions are of the form sin kx, corresponding to λ = −k², where k is an integer.

8. Let h be the linear function from ℝ² to ℝ² with matrix

[ 1  1 ]
[ 0  1 ]

(a) Show that the eigenvectors of h span only a 1-dimensional subspace of ℝ². Thus the geometric action of h cannot be analyzed by looking only at what it does to eigenvectors.
(b) The linear function h is called a shear transformation. Give a geometric description of the action of h on ℝ².

9. (a) Find the eigenvalues and a corresponding pair of eigenvectors for the linear function f from ℝ² to ℝ² having matrix

…

(b) Show that the eigenvectors of the function f in part (a) form an orthogonal basis for ℝ², and use this fact to give a geometric description of the action of f on ℝ².
(c) Generalize the results you found for (a) and (b) to any linear function f from ℝⁿ to ℝⁿ having a diagonal matrix diag(a₁, a₂, …, aₙ).

10. Find the eigenvalues of the function g on ℝ² with matrix

[ 1  2 ]
[ 1  1 ]

Show that the corresponding eigenvectors span ℝ² and describe the action of g.

11. Let C² be the complex vector space of pairs z = (z₁, z₂) of complex numbers, and suppose that C² has the complex inner product

z · w = z₁w̄₁ + z₂w̄₂.

(a) Show that the linear function f defined on C² by the matrix

[ cos θ  −sin θ ]
[ sin θ   cos θ ]

has eigenvectors and corresponding eigenvalues

(1, −i),  λ = e^{iθ};    (1, i),  λ = e^{−iθ}.

(b) Show that if sin θ = 0, then every nonzero vector in C² is an eigenvector of the function f in part (a).
(c) Show that the eigenvectors in part (a) form a basis for C².
(d) Show that the eigenvectors in part (a) are orthogonal with respect to the complex inner product in C².

12. Explain in geometric terms why a rotation about the origin in ℝ² through an angle θ, 0 ≤ θ < 2π, has no eigenvectors in ℝ² unless θ = 0 or θ = π.

13. Suppose that 𝒲 is a complex vector space with a complex inner product, so that ⟨z, w⟩ = \overline{⟨w, z⟩}. Then a linear function f from 𝒲 to 𝒲 is called Hermitian symmetric if ⟨f(z), w⟩ = ⟨z, f(w)⟩ for z, w in 𝒲.

(a) Show that if f is Hermitian symmetric and has λ for a complex eigenvalue, then λ must actually be real.
(b) Show that if 𝒲 has complex dimension 2 and f is given by a 2-by-2 complex matrix

[ α  β ]
[ γ  δ ],

then f is Hermitian symmetric if and only if β = γ̄ and the diagonal entries α and δ are real.


14. If

x(t) = (x(t), y(t))

defines a function of the real variable t taking values in ℝ², then we define the derivative of x by

x′(t) = (x′(t), y′(t)).

(a) Let

A = [ a  c ]
    [ b  d ].

Show that the vector equation

x′ = Ax

is the same as the system of first-order differential equations

x′ = ax + cy,
y′ = bx + dy.

(b) Show that x(t) = e^{λt}c, where c is a constant vector, is a solution of the vector equation x′ = Ax if c is an eigenvector of the matrix A with corresponding eigenvalue λ.

(c) If the matrix A is

[ 1  2 ]
[ 2  1 ],

find solutions of the vector differential equation x′ = Ax in the form

x = e^{λ₁t}c₁ + e^{λ₂t}c₂.

[Note: The eigenvectors c₁ and c₂ are determined only to within a nonzero numerical multiple.]

(d) Show that the second-order differential equation

(D − r₁)(D − r₂)y = 0

is equivalent to the system of first-order equations

x′ = r₁x,
y′ = x + r₂y,

where x = (D − r₂)y. Then find the matrix A that can be used to express this system in the form of part (a).

(e) Show that the characteristic equation of the matrix A found in part (d) is the same as the characteristic equation of the operator (D − r₁)(D − r₂) as defined in Section 6.

15. (a) Show that the functions cos kx and sin kx are eigenvectors of the differential operator D² acting on C^{(∞)}. What are the corresponding eigenvalues?
    (b) If we restrict the operator D² to real-valued functions defined for −π ≤ x ≤ π, we can define an inner product by

⟨f, g⟩ = ∫_{−π}^{π} f(x)g(x) dx.

Show that the eigenvectors cos kx and sin lx are orthogonal with respect to this inner product for k, l = 1, 2, 3, …. Then use Theorem 8.1 to conclude that the eigenvectors are linearly independent.
    (c) Show that the functions e^{ikx} and e^{−ikx} are eigenvectors of the differential operator D² acting on complex-valued functions. What are the corresponding eigenvalues?
    (d) Using the complex inner product

⟨f, g⟩ = ∫_{−π}^{π} f(x) \overline{g(x)} dx,

show that the functions e^{ikx} are orthogonal, for k = 0, ±1, ±2, …. Can you conclude from Theorem 8.1 that the complex exponentials are linearly independent?

SECTION 10

COORDINATES

In this section we show how any problem about finite-dimensional vector spaces and linear functions on them can be reduced to an equivalent problem about ℝⁿ and matrices. This is done by introducing coordinates in the vector spaces. The familiar coordinates of 2- and 3-dimensional geometry may be considered a special case of what we are about to describe. The following theorem is fundamental.

10.1 Theorem

Let V = (v₁, …, vₙ) be a basis for a vector space 𝒱, so dim(𝒱) = n. For any vector x in 𝒱 there is one and only one n-tuple of numbers r₁, …, rₙ such that x = r₁v₁ + … + rₙvₙ.

Proof. Since the v's span 𝒱, x is a linear combination of them; so there is at least one n-tuple (r₁, …, rₙ) satisfying the condition. If (s₁, …, sₙ) is any other n-tuple such that

s₁v₁ + … + sₙvₙ = x = r₁v₁ + … + rₙvₙ,

then

(r₁ − s₁)v₁ + … + (rₙ − sₙ)vₙ = 0.

Since the v's are independent, the numbers rᵢ − sᵢ must all be zero, and so (s₁, …, sₙ) is the same as (r₁, …, rₙ).

Let V = (v₁, …, vₙ) be a basis for a vector space 𝒱, and let x be a vector in 𝒱. The unique n-tuple (r₁, …, rₙ) such that x = r₁v₁ + … + rₙvₙ is the n-tuple of coordinates of x with respect to the basis V. The vector in ℝⁿ with entries r₁, …, rₙ is the coordinate vector of x with respect to the basis.

Example 1. (a) The vectors v₁ = (−1, 1, 0) and v₂ = (0, −1, 1) form a basis for the subspace 𝒱 of ℝ³ consisting of all vectors the sum of whose entries is 0. The vector x = (1, −3, 2) is in 𝒱. An easy calculation shows that x = −v₁ + 2v₂, so the coordinates of x with respect to the basis v₁, v₂ are (−1, 2).

(b) In the plane, let v₁ be the vector of unit length along the horizontal axis, and v₂ the vector of unit length at an angle 60° counterclockwise from v₁. Let x be the vector of unit length 60° beyond v₂, shown in Fig. 16.

Figure 16

By geometry, CB is parallel to OA and OC is parallel to AB (why?); so OABC is a parallelogram and v₂ = x + v₁. Hence x = −v₁ + v₂, and the coordinates of x with respect to this basis are (−1, 1).
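
Finding coordinates with respect to a basis amounts to solving a linear system. A minimal numpy sketch for part (a):

```python
import numpy as np

v1 = np.array([-1., 1., 0.])
v2 = np.array([0., -1., 1.])
x = np.array([1., -3., 2.])

# solve r1*v1 + r2*v2 = x; lstsq handles the 3-equations, 2-unknowns system
B = np.column_stack([v1, v2])
r, *_ = np.linalg.lstsq(B, x, rcond=None)
print(r)    # [-1.  2.], the coordinate vector of x
```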

Coordinates provide a way of representing vectors in a concrete form suitable for calculations. They also provide a concrete way of representing linear functions. The key point is that if the value of a linear function is known for each vector of a basis, then the function is known completely. This follows from the fact that any vector x can be expressed as a linear combination r₁v₁ + ⋯ + rₙvₙ of the vectors in a basis. Then f(x) is the combination r₁f(v₁) + … + rₙf(vₙ) of the vectors f(vᵢ) with the same coefficients, because f is linear. Suppose f: 𝒱 → 𝒲 is a given linear function and that V = (v₁, …, vₙ) and W = (w₁, …, wₘ) are bases in 𝒱 and 𝒲, respectively. The possibility that 𝒱 and 𝒲 are the same space is not ruled out. The matrix of f with respect to the bases V and W is defined to be the m-by-n matrix whose jth column is the coordinate vector of f(vⱼ) with respect to the basis W. Thus if we denote by A = (aᵢⱼ) the matrix of f relative to V and W, then the entries in the jth column of A appear as the coefficients aᵢⱼ in the equation

f(vⱼ) = a₁ⱼw₁ + … + aₘⱼwₘ.

The matrix A contains all the information needed to evaluate f(x) for any x. For if x is represented as

x = r₁v₁ + … + rₙvₙ,

with coordinate vector r = (r₁, …, rₙ), then

f(x) = Σ_{j=1}^{n} rⱼ f(vⱼ) = Σ_{j=1}^{n} rⱼ Σ_{i=1}^{m} aᵢⱼwᵢ = Σ_{i=1}^{m} (Σ_{j=1}^{n} aᵢⱼrⱼ) wᵢ.

We can then recognize the coefficients in the representation of f(x) relative to W as the entries in the product Ar of the matrix A and the coordinate vector r. We remark that when 𝒱 and 𝒲 are the same space it is usually appropriate, though not logically necessary, to take V and W to be the same bases. We do this in all the examples for which 𝒱 is the same as 𝒲.

Example 2. (a) If f is the linear function from ℝⁿ to ℝᵐ given by multiplication by a matrix A, then the matrix of f with respect to the natural bases in ℝⁿ and ℝᵐ is A itself. This is just a rephrasing of Theorem 4.2 of Chapter 1.

(b) The function f from ℝ³ to ℝ³ defined by

f(x, y, z) = (y, z, x)

takes the subspace 𝒱 spanned by v₁ = (1, −1, 0) and v₂ = (0, −1, 1) into itself, and it may be considered as a linear function from 𝒱 to 𝒱. We have f(v₁) = −v₁ + v₂ and f(v₂) = −v₁ + 0v₂, so the matrix of f with respect to the basis V is

[ −1  −1 ]
[  1   0 ]

(c) Let v₁, v₂ be the basis in ℝ² taken in Example 1(b), and let f be a rotation of 60° counterclockwise. Then f(v₁) = v₂, and f(v₂), the vector x in Fig. 16, is equal to −v₁ + v₂. The matrix of f with respect to the basis V is therefore

[ 0  −1 ]
[ 1   1 ]

(d) Let P₂ be the vector space of quadratic polynomials, and let 𝒲 be ℝ⁴. Define f(p), for p a polynomial in P₂, to be the vector in ℝ⁴ with entries p(1), p(−1), p(2), and p(3). Take the polynomials 1, x, x² as a basis for P₂ and the natural basis as basis for ℝ⁴. Then f takes the polynomial 1 into the 4-tuple (1, 1, 1, 1); it takes the polynomial x into the 4-tuple (1, −1, 2, 3); and it takes x² into (1, 1, 4, 9). Its matrix with respect to the given bases is therefore

[ 1   1  1 ]
[ 1  −1  1 ]
[ 1   2  4 ]
[ 1   3  9 ]

Given a basis V = (v₁, …, vₙ) for 𝒱, we may define a function c_V from 𝒱 to ℝⁿ by setting c_V(x) equal to the coordinate vector of x with respect to V, for every x in 𝒱. The function c_V is called the coordinate map for the basis V. The range of c_V is obviously all of ℝⁿ because, for any n-tuple r₁, …, rₙ, we can take x = r₁v₁ + … + rₙvₙ; then c_V(x) is the given n-tuple. By Theorem 10.1, c_V is one-to-one. Therefore there is an inverse function c_V⁻¹ from ℝⁿ to 𝒱. It is easily seen to be given by the formula

c_V⁻¹(r₁, …, rₙ) = r₁v₁ + … + rₙvₙ.

An obvious but important fact is:

10.2 Theorem

Coordinate maps and their inverses are linear functions.

Proof. To show that c_V is linear, we must verify that c_V(rx + sy) = r c_V(x) + s c_V(y). This follows at once from the fact that if x = Σ_{i=1}^{n} aᵢvᵢ and y = Σ_{i=1}^{n} bᵢvᵢ, then rx + sy = Σ_{i=1}^{n} (raᵢ + sbᵢ)vᵢ. The linearity of c_V⁻¹ is an immediate consequence of the same equation.

Note that if 𝒱 = ℝⁿ, the coordinate map c for the natural basis is the identity function c(x) = x.

A linear function f: 𝒱 → 𝒲 which is one-to-one and whose range is all of 𝒲 is called an isomorphism, and two spaces such that there exists an isomorphism between them are said to be isomorphic. We have shown that coordinate maps are isomorphisms, which proves:

10.3 Theorem

Every n-dimensional real vector space is isomorphic to ℝⁿ.

The significance of these concepts lies in the fact that isomorphic spaces are alike in all respects, when considered as abstract vector spaces.

Any statement true for one space (provided it can be formulated entirely in terms of addition and numerical multiplication) can be carried over by an isomorphism to give a true corresponding statement for an isomorphic space. For example, if f: 𝒱 → 𝒲 is an isomorphism, then an equation

y = Σ_{i=1}^{n} aᵢxᵢ

is true in 𝒱 if and only if

f(y) = Σ_{i=1}^{n} aᵢf(xᵢ)

is true in 𝒲.

By using coordinate maps we can give an alternative description of the matrix of a linear function. Given f: 𝒱 → 𝒲, and bases V = (v₁, …, vₙ) in 𝒱 and W = (w₁, …, wₘ) in 𝒲, the composition c_W ∘ f ∘ c_V⁻¹ is a function from ℝⁿ to ℝᵐ. It is therefore given by multiplication by some m-by-n matrix A; we claim that A is precisely the same as the matrix of f with respect to the bases V and W. To prove the assertion, note that the jth column of A is (c_W ∘ f ∘ c_V⁻¹)(eⱼ) = c_W(f(vⱼ)), since c_V⁻¹(eⱼ) = vⱼ. Now c_W(f(vⱼ)) is (by definition of c_W) the coordinate vector of f(vⱼ) with respect to the basis W, and this is just how the matrix of f with respect to bases V and W was defined. This description of the matrix of a function makes the proof of the following generalization of Theorem 4.4 of Chapter 1 very easy. It simply says that, in general, matrix multiplication corresponds to composition of functions, provided one uses bases consistently.

10.4 Theorem

Let f and g be linear functions with f: 𝒰 → 𝒱 and g: 𝒱 → 𝒲. Suppose bases U, V, and W are given in 𝒰, 𝒱, and 𝒲, that A is the matrix of f with respect to the bases U and V, and that B is the matrix of g with respect to the bases V and W. Then BA is the matrix of g ∘ f with respect to the bases U and W.

Proof. By Theorem 4.3 of Chapter 1 and the characterization of matrices of functions by coordinate maps, this is simply the statement that

(c_W ∘ g ∘ c_V⁻¹) ∘ (c_V ∘ f ∘ c_U⁻¹) = c_W ∘ (g ∘ f) ∘ c_U⁻¹,

which is clear because composition of functions is associative.

The special case in which 𝒰, 𝒱, and 𝒲 are all the same space with the same basis is particularly important. If f: 𝒱 → 𝒱 has matrix A, then f ∘ f has matrix A², f ∘ f ∘ f has matrix A³, etc.

Example 3. (a) For the function f of Example 2(b), it is clear that f ∘ f ∘ f is the identity, and it is easy to verify that

[ −1  −1 ]³   [ 1  0 ]
[  1   0 ]  = [ 0  1 ].

(b) Differentiation is a linear function from the space of polynomials to itself. If we define D(p) to be p′ for elements of the space of quadratic polynomials and use the basis (1, x, x²) as in Example 2(d), then the matrix of D is

A = [ 0  1  0 ]
    [ 0  0  2 ]
    [ 0  0  0 ].

It is easy to compute that

A² = [ 0  0  2 ]
     [ 0  0  0 ]
     [ 0  0  0 ]

and A³ = 0, which corresponds to the fact that the third derivative of any quadratic polynomial is 0.
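
Both computations take one line each in numpy (a sketch):

```python
import numpy as np

M = np.array([[-1., -1.],
              [ 1.,  0.]])
print(np.linalg.matrix_power(M, 3))    # the identity: f∘f∘f is the identity map

A = np.array([[0., 1., 0.],
              [0., 0., 2.],
              [0., 0., 0.]])
print(A @ A)                           # the matrix of the second derivative
print(np.linalg.matrix_power(A, 3))    # the zero matrix: D^3 kills quadratics
```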

The matrix of a linear function with respect to a pair of bases, of course, depends on the bases. Since the matrix with respect to any pair completely determines the function, it should be possible to compute the matrix of a function with respect to one pair of bases from its matrix with respect to any other.

Let us look at an example. Suppose f has the matrix

[ 1  2  0 ]
[ 1  1  3 ]

with respect to the natural bases in ℝ³ and ℝ². Consider the bases

v₁ = …,  v₂ = …,  v₃ = …  in ℝ³    and    w₁ = …,  w₂ = …  in ℝ².

To find the matrix of f with respect to (v₁, v₂, v₃) and (w₁, w₂), we compute

f(v₁) = 5w₁,    f(v₂) = 8w₁ − 3w₂,    f(v₃) = −4w₁ + 3w₂.

Hence the required matrix for f is

[ 5   8  −4 ]
[ 0  −3   3 ]

A basis V = (v₁, …, vₙ) can be described in terms of another basis X = (x₁, …, xₙ) by using the matrix whose jth column is the coordinate vector of vⱼ with respect to X. This matrix gives a function from ℝⁿ to ℝⁿ, which can be recognized as c_X ∘ c_V⁻¹ by observing its effect on the natural basis vectors eⱼ. Multiplication by this matrix converts the V-coordinates of any vector w into the X-coordinates of w, since

c_X(w) = (c_X ∘ c_V⁻¹)(c_V(w)).

The inverse matrix gives the x's in terms of the v's and corresponds to c_V ∘ c_X⁻¹. In the example in the preceding paragraph, the matrices giving V and W in terms of the natural bases in ℝ³ and ℝ²