
ABOUT THE AUTHOR

EVAR D. NERING is Professor of Mathematics and Chairman of the Department
at Arizona State University, where he has taught since 1960. Prior to that, he
was Assistant Professor of Mathematics at the University of Minnesota
(1948-1956), and Associate Professor at the University of Arizona (1956-1960).
He received his A.B. and A.M., both in Mathematics, from Indiana University.
In 1947, he earned an A.M. from Princeton University, and received his Ph.D.
from the same university in 1948. He worked as a mathematician with Goodyear
Aircraft Corporation from 1953 to 1954.

During the summers of 1958, 1960, and 1962, Dr. Nering was a visiting lecturer
at the University of Colorado. During the summer of 1961, he served as a
Research Mathematician at the Mathematics Research Center of the University
of Wisconsin.


Linear Algebra and Matrix Theory
second edition

Evar D. Nering
Professor of Mathematics
Arizona State University

John Wiley & Sons, Inc.
New York London Sydney Toronto



Copyright © 1963, 1970 by John Wiley & Sons, Inc.

All rights reserved. No part of this book may be reproduced by any means,
nor transmitted, nor translated into a machine language without the written
permission of the publisher.

Library of Congress Catalog Card Number: 76-9(646

SBN 471 63178 7

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1
Preface to
first edition

The underlying spirit of this treatment of the theory of matrices is that of a
concept and its representation. For example, the abstract concept of an
integer is the same for all cultures and presumably should be the same to a
being from another planet. But the various symbols we write down and
carelessly refer to as "numbers" are really only representations of the
abstract numbers. These representations should be called "numerals" and
we should not confuse a numeral with the number it represents. Numerals
of different types are the inventions of various cultures and individuals, and
the superiority of one system of numerals over another lies in the ease with
which they can be manipulated and the insight they give us into the nature
of the numbers they represent.

We happen to use numerals to represent things other than numbers. For
example, we put numerals (not numbers) on the backs of football players
to represent and identify them. This does not attribute to the football
players any of the properties of the corresponding numbers, and the usual
operations of arithmetic have no meaning in this context. No one would
think of adding the halfback, 20, to the fullback, 40, to obtain the guard, 60.

Matrices are used to represent various concepts with a wide variety of
different properties. To cover these possibilities a number of different
manipulations with matrices are introduced. In each situation the appropriate
manipulations that should be performed on a matrix or a set of matrices
depend critically on the concepts represented. The student who learns the
formalisms of matrix "arithmetic" without learning the underlying concepts
is in serious danger of performing operations which have no meaning for
the problem at hand.

The formal manipulations with matrices are relatively easy to learn, and
few students have any difficulty performing the operations accurately, if
somewhat slowly. If a course in matrix theory, however, places too much
emphasis on these formal steps, the student acquires an ill-founded
self-assurance. If he makes an error exactly analogous to adding halfbacks
to fullbacks to obtain guards, he usually considers this to be a minor error
(since each step was performed accurately) and does not appreciate how
serious a blunder he has made.

In even the simplest problems matrices can appear as representing several
different types of concepts. For example, it is typical to have a problem in
which some matrices represent vectors, some represent linear transformations,
and others represent changes of bases. This alone should make it clear that
an understanding of the things represented is essential to a meaningful
manipulation of the representing symbols. In courses in vector analysis
and differential geometry many students have difficulty with the concepts
of covariant vectors and contravariant vectors. The troubles stem almost
entirely from a confusion between the representing symbols and the things
represented. As long as a student thinks of n-tuples (the representing
symbols) as being identical with vectors (the things represented) he must
think there are two different "kinds" of vectors all jumbled together and he
sees no way to distinguish between them. There are, in fact, not two "kinds"
of vectors. There are two entirely distinct vector spaces; their representations
happen to look alike, but they have to be manipulated differently.


Although the major emphasis in this treatment of matrix theory is on
concepts and proofs that some may consider abstract, full attention is given
to practical computational procedures. In different problems different
patterns of steps must be used, because the information desired is not the
same in all. Fortunately, although the patterns change, there are only a
few different types of steps. The computational techniques chosen here are
not only simple and effective, but require a variety of steps that is particularly
small. Because these same steps occur often, ample practice is provided
and the student should be able to obtain more than adequate confidence
in his skill.

A single pattern of discussion recurs regularly throughout the book.
First a concept is introduced. A coordinate system is chosen so that the
concept can be represented by n-tuples or matrices. It is shown how the
representation of the concept must be changed if the coordinate system is
chosen in a different way. Finally, an effective computational procedure is
described by which a particular coordinate system can be chosen so that the
representing n-tuples or matrices are either simple or yield significant
information. In this way computational skill is supported by an understanding
of the reasons for the various procedures and why they differ. Lack of this
understanding is the most serious single source of difficulty for many students
of matrix theory.

The material contained in the first five chapters is intended for use in a
one-semester, or one-quarter, course in the theory of matrices. The topics
have been carefully selected and very few omissions should be made. The
sections recommended for omission are designated with asterisks. The part
of Section 4 of Chapter III following the proof of the Hamilton-Cayley
Theorem can also be omitted without harm. No other omission in the first
three chapters is recommended. For a one-quarter course the following
additional omissions are recommended: Chapter IV, Sections 4 and 5;
Chapter V, Sections 3, 4, 5, and 9.
Chapter V contains two parallel developments leading to roughly analogous
results. One, through Sections 1, 6, 7, 8, 10, 11, and 12, is rather formal
and is based on the properties of matrices; the other, through Sections
1, 3, 4, 5, 9, and 11, is more theoretical and is based on the properties of
linear functionals. This latter material is, in turn, based on the material
in Sections 1-5 of Chapter IV. For this reason these omissions should not
be made in a random manner. Omissions can be made to accommodate a
one-quarter course, or for other purposes, by carrying one line of development
to its completion and curtailing the other.
The exercises are an important part of this book. The computational
exercises in particular should be worked. They are intended to illustrate the
theory, and their cumulative effect is to provide assurance and understanding.
Numbers have been chosen so that arithmetic complications are minimized.
A student who understands what he is doing should not find them lengthy
or tedious. The theoretical problems are more difficult. Frequently they
are arranged in sequence so that one exercise leads naturally into the next.
Sometimes an exercise has been inserted mainly to provide a suggestive
context for the next exercise. In other places large steps have been broken
down into a sequence of smaller steps, and these steps arranged in a sequence
of exercises. For this reason a student may find it easier to work ten exercises
in a row than to work five of them by taking every other one. These exercises
are numerous and the student should not let himself become bogged down
by them. It is more important to keep on with the pace of development
and to obtain a picture of the subject as a whole than it is to work every
exercise.
The last chapter on selected applications of linear algebra contains a lot
of material in rather compact form. For a student who has been through
the first five chapters it should not be difficult, but it is not easy reading for
someone whose previous experience with linear algebra is in a substantially

different form. Many with experience in the applications of mathematics


will be unfamiliar with the emphasis on abstract and general methods. I
have had enough experience in full-time industrial work and as a consultant
to know that these abstract ideas are fully as practical as any concrete
methods, and anyone who takes the trouble to familiarize himself with
these ideas will find that this is true. In all engineering problems and most
scientific problems it is necessary to deal with particular cases and with
particular numbers. This is usually, however, only the last phase of the
work. Initially, the problem must be dealt with in some generality until
some rather important decisions can be made (and this is by far the more
interesting and creative part of the work). At this stage methods which
lead to understanding are to be preferred to methods which obscure
understanding in unnecessary formalisms.
This book is frankly a textbook and not a treatise, and I have not attempted
to give credits and references at each point. The material is a mosaic that
draws on many sources. I must, however, acknowledge my tremendous
debt to Professor Emil Artin, who was my principal source of mathematical
education and inspiration from my junior year through my Ph.D. The
viewpoint presented here, that matrices are mere representations of things
more basic, is as close to his viewpoint as it is possible for an admiring
student to come in adopting the ideas of his teacher. During my student
days the book presenting this material in a form most in harmony with
this viewpoint was Paul Halmos' elegant treatment in Finite Dimensional
Vector Spaces, the first edition. I was deeply impressed by this book, and
its influence on the organization of my text is evident.

Evar D. Nering

Tempe, Arizona
January, 1963
Preface to
second edition

This edition differs from the first primarily in the addition of new material.
Although there are significant changes, essentially the first edition remains
intact and embedded in this book. In effect, the material carried over from
the first edition does not depend logically on the new material. Therefore,
this first edition material can be used independently for a one-semester or
one-quarter course. For such a one-semester course, the first edition usually
required a number of omissions as indicated in its preface. This omitted
material, together with the added material in the second edition, is suitable
for the second semester of a two-semester course.
The concept of duality receives considerably expanded treatment in this
second edition. Because of the aesthetic beauty of duality, it has long been
a favorite topic in abstract mathematics. I am convinced that a thorough
understanding of this concept also should be a standard tool of the applied
mathematician and others who wish to apply mathematics. Several sections
of the chapter concerning applications indicate how duality can be used.
For example, in Section 3 of Chapter V, the inner product can be used to
avoid introducing the concept of duality. This procedure is often followed
in elementary treatments of a variety of subjects because it permits doing
some things with a minimum of mathematical preparation. However, the
cost in loss of clarity is a heavy price to pay to avoid linear functionals.

Using the inner product to represent linear functionals in the vector space
overlays two different structures on the same space. This confuses concepts
that are similar but essentially different. The lack of understanding which
usually accompanies this shortcut makes facing a new context an unsure
undertaking. I think that the use of the inner product to allow the cheap
and early introduction of some manipulative techniques should be avoided.
It is far better to face the necessity of introducing linear functionals at the
earliest opportunity.
I have made a number of changes aimed at clarification and greater
precision. I am not an advocate of rigor for rigor's sake since it usually

adds nothing to understanding and is almost always dull. However, rigor


is not the same as precision, and algebra is a mathematical subject capable

of both precision and beauty. However, tradition has allowed several


situations to arise in which different words are used as synonyms, and all
are applied indiscriminately to concepts that are not quite identical. For
this reason, I have chosen to use "eigenvalue" and "characteristic value" to
denote non-identical concepts; these terms are not synonyms in this text.
Similarly, I have drawn a distinction between dual transformations and
adjoint transformations.
Many people were kind enough to give me constructive comments and
observations resulting from their use of the first edition. All these comments
were seriously considered, and many resulted in the changes made in this
edition. In addition, Chandler Davis (Toronto), John H. Smith (Boston
College), John V. Ryff (University of Washington), and Peter R. Christopher
(Worcester Polytechnic) went through the first edition or a preliminary
version of the second edition in detail. Their advice and observations were
particularly valuable to me. To each and every one who helped me with
this second edition, I want to express my debt and appreciation.

Evar D. Nering

Tempe, Arizona
September, 1969
Contents

Introduction

I Vector spaces

1. Definitions 5
2. Linear Independence and Linear Dependence 10

3. Bases of Vector Spaces 15


4. Subspaces 20

II Linear transformations and matrices

1. Linear Transformations 27
2. Matrices 37
3. Non-Singular Matrices 45
4. Change of Basis 50
5. Hermite Normal Form 53
6. Elementary Operations and Elementary Matrices 57
7. Linear Problems and Linear Equations 63
8. Other Applications of the Hermite Normal Form 68
9. Normal Forms 74
*10. Quotient Sets, Quotient Spaces 78
11. Hom(U, V) 83

III Determinants, eigenvalues, and similarity transformations 85

1. Permutations 86
2. Determinants 89
3. Cofactors 93
4. The Hamilton-Cayley Theorem 98
5. Eigenvalues and Eigenvectors 104
6. Some Numerical Examples 110
7. Similarity 113
*8. The Jordan Normal Form 118

IV Linear functionals, bilinear forms, quadratic forms 128

1. Linear Functionals 129


2. Duality 133


3. Change of Basis 134


4. Annihilators 138
5. The Dual of a Linear Transformation 142
*6. Duality of Linear Transformations 145
*7. Direct Sums 147
8. Bilinear Forms 156
9. Quadratic Forms 160
10. The Normal Form 162
11. Real Quadratic Forms 168
12. Hermitian Forms 170

V Orthogonal and unitary transformations, normal matrices 175

1. Inner Products and Orthonormal Bases 176


*2. Complete Orthonormal Sets 182
3. The Representation of a Linear Functional by an Inner Product 186
4. The Adjoint Transformation 189
5. Orthogonal and Unitary Transformations 194
6. Orthogonal and Unitary Matrices 195
7. Superdiagonal Form 199
8. Normal Matrices 201
9. Normal Linear Transformations 203
10. Hermitian and Unitary Matrices 208
11. Real Vector Spaces 209
12. The Computational Processes 213

VI Selected applications of linear algebra 219

1. Vector Geometry 220


2. Finite Cones and Linear Inequalities 229
3. Linear Programming 239
4. Applications to Communication Theory 253
5. Vector Calculus 259
6. Spectral Decomposition of Linear Transformations 270
7. Systems of Linear Differential Equations 278
8. Small Oscillations of Mechanical Systems 284
9. Representations of Finite Groups by Matrices 292
10. Application of Representation Theory to Symmetric Mechanical
Systems 312

Appendix 319
Answers to selected exercises 325
Index 345
Introduction

Many of the most important applications of mathematics involve what are


known as linear methods. The idea of what is meant by a linear method
applied to a linear problem or a linear system is so important that it deserves
attention in its own right. We try to describe intuitively what is meant by
a linear system and then give some idea of the reasons for the importance of
the concept.
As an example, consider an electrical network. When the network receives
an input, an output from the network results. As is customary, we can con-
sider combining two inputs by adding them and then putting their sum
through the system. This sum input will also produce an output. If the
output of the sum is the sum of the outputs the system is said to be additive.
We can also modify an input by changing its magnitude, by multiplying the
input by a constant factor. If the resulting output is also multiplied by the
same factor the system is said to be homogeneous. If the system is both
additive and homogeneous it is said to be linear.
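To make the two defining properties concrete, here is a minimal sketch (added for illustration, not part of the original text) that tests additivity and homogeneity numerically; the "system" is modeled simply as a Python function on lists of numbers, and the sample system and inputs are arbitrary choices.

```python
# A minimal sketch, assuming a "system" can be modeled as a function
# taking a list of numbers (the input) to a list of numbers (the output).
def is_additive(system, x, y, tol=1e-9):
    # Additive: the output of the sum equals the sum of the outputs.
    lhs = system([xi + yi for xi, yi in zip(x, y)])
    rhs = [si + ti for si, ti in zip(system(x), system(y))]
    return all(abs(a - b) < tol for a, b in zip(lhs, rhs))

def is_homogeneous(system, x, c, tol=1e-9):
    # Homogeneous: multiplying the input by a constant factor multiplies
    # the output by the same factor.
    lhs = system([c * xi for xi in x])
    rhs = [c * si for si in system(x)]
    return all(abs(a - b) < tol for a, b in zip(lhs, rhs))

# A sample linear system: doubles the first input and adds the two inputs.
linear_system = lambda x: [2 * x[0], x[0] + x[1]]
print(is_additive(linear_system, [1.0, 2.0], [3.0, -1.0]))  # True
print(is_homogeneous(linear_system, [1.0, 2.0], 5.0))       # True
```

A system that passes both tests for all inputs and all factors is linear in the sense described above.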
The simplification in the analysis of a system that results from the knowl-
edge, or the assumption, that the system is linear is enormous. If we know
the outputs for a collection of different inputs, we know the outputs for all
inputs that can be obtained by combining these inputs in various ways.
Suppose, for example, that we are considering all inputs that are periodic
functions of time with a given period. The theory of Fourier series tells us
that, under reasonable restrictions, these periodic functions can be represented
as sums of simple sine functions. Thus in analyzing the response of a linear
system to a periodic input it is sufficient to determine the response when the
input is a simple sine function.
So many of the problems that we encounter are assumed to be linear
problems and so many of the mathematical techniques developed are
inherently linear that a catalog of the possibilities would be a lengthy
undertaking. Potential theory, the theory of heat, and the theory of small
vibrations of mechanical systems are examples of linear theories. In fact,
it is not easy to find brilliantly successful applications of mathematics to
non-linear problems. In many applications the system is assumed to be
linear even though it is not. For example, the differential equation for a
simple pendulum is not linear since the restoring force is proportional to
the sine of the displacement angle. We usually replace the sine of the angle
by the angle in radians to obtain a linear differential equation. For small
angles this is a good approximation, but the real justification is that linear
methods are available and easily applied.
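For definiteness, the linearization mentioned here can be written out as follows (a standard formulation supplied for illustration, not quoted from the text), with θ the displacement angle, g the gravitational acceleration, and l the length of the pendulum:

```latex
\frac{d^2\theta}{dt^2} + \frac{g}{l}\sin\theta = 0
\qquad\longrightarrow\qquad
\frac{d^2\theta}{dt^2} + \frac{g}{l}\,\theta = 0
\qquad (\text{using } \sin\theta \approx \theta \text{ for small } \theta)
```

The second equation is linear in θ and can be handled by the linear methods discussed in this book.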
In mathematics itself the operations of differentiation and integration are
linear. The linear differential equations studied in elementary courses are
linear in the sense intended here. In this case the unknown function is the
input and the differential operator is the system. Any physical problem that
is describable by a linear differential equation, or system of linear differential

equations, is also linear.


Matrix theory, vector analysis, Fourier series, Fourier transforms, and
Laplace transforms are examples of mathematical techniques which are
particularly suited for handling linear problems. In order for the linear
theory to apply to the linear problem it is necessary that what we have called
"inputs and outputs" and "linear systems" be representable within the theory.
In Chapter I we introduce the concept of vector space. The laws of combina-
tion which will be defined for vector spaces are intended to make precise the
meaning of our vague statement, "combining these inputs in various ways."
Generally, one vector space will be used for the set of inputs and one vector
space will be used for the set of outputs. We also need something to represent
the "linear system," and for this purpose we introduce the concept of linear
transformation in Chapter II.

The next step is to introduce a practical method for performing the needed
calculations with vectors and linear transformations. We restrict ourselves
to the case in which the vector spaces are finite dimensional. Here it is
appropriate to represent vectors by n-tuples and to represent linear trans-
formations by matrices. These representations are also introduced in
Chapters I and II.
Where the vector spaces are infinite dimensional other representations are
required. In some cases the vectors may be represented by infinite sequences,
or Fourier series, or Fourier transforms, or Laplace transforms. For
example, it is now common practice in electrical engineering to represent
inputs and outputs by Laplace transforms and to represent linear systems
by still other Laplace transforms called transfer functions.
The point is that the concepts of vector spaces and linear transformations
are common to all linear methods while matrix theory applies to only those
linear problems that are finite dimensional. Thus it is of practical value to


discuss vector spaces and linear transformations as much as possible before
introducing the formalisms of n-tuples and matrices. And, generally, proofs
that can be given without recourse to n-tuples and matrices will be shorter,
simpler, and clearer.
The correspondences that can be set up between vectors and the n-tuples
which represent them, and between linear transformations and the matrices
which represent them, are not unique. Therefore, we have to study the
totality of all possible ways to represent vectors and linear transformations
and the relations between these different representations. Each possible
correspondence is called a coordinatization of the vector space, and the process
of changing from one correspondence to another is called a change of
coordinates.
Any property of the vector space or linear transformation which is
independent of any particular coordinatization is called an invariant or
geometric property. We are primarily interested in those properties of
vectors and linear transformations which are invariant, and, if we use a
coordinatization to establish such a property, we are faced with the problem
of showing that the conclusion does not depend on the particular coordina-
tization being used. This is an additional reason for preferring proofs which
do not make use of any coordinatization.
On the other hand, if a property is known to be invariant, we are free to
choose any coordinate system we wish. In such a case it is desirable and
advantageous to select a coordinate system in which the problem we wish
to handle is particularly simple, or in which the properties we wish to establish
are clearly revealed. Chapters III and V are devoted to methods for selecting
these advantageous coordinate systems.
In Chapter IV we introduce ideas which allow us to define the concept
of distance in our vector spaces. This accounts for the principal differences
between the discussions in Chapters III and V. In Chapter III there is no
restriction on the coordinate systems which are permitted. In Chapter V the
only coordinate systems permitted are "Cartesian"; that is, those in which
the theorem of Pythagoras holds. This additional restriction in permissible


coordinate systems means that it is more difficult to find advantageous
coordinate systems.
In addition to allowing us to introduce the concept of distance the material
in Chapter IV is of interest in its own right. There we study linear forms,
bilinear forms, and quadratic forms. They have application to a number of
important problems in physics, chemistry, and engineering. Here too,
coordinate systems are introduced to allow specific calculations, but proofs
given without reference to any coordinate systems are preferred.
Historically, the term "linear algebra" was originally applied to the study
of linear equations, bilinear and quadratic forms, and matrices, and their
changes under a change of variables. With the more recent studies of Hilbert
spaces and other infinite dimensional vector spaces this approach has proved
to be inadequate. New techniques have been developed which depend less
on the choice or introduction of a coordinate system and not at all upon the
use of matrices. Fortunately, in most cases these techniques are simpler than
the older formalisms, and they are invariably clearer and more intuitive.
These newer techniques have long been known to the working mathe-
matician, but until very recently a curious inertia has kept them out of books
on linear algebra at the introductory level.
These newer techniques are admittedly more abstract than the older
formalisms, but they are not more difficult. Also, we should not identify
the word "concrete" with the word "useful." Linear algebra in this more
abstract form is just as useful as in the more concrete form, and in most cases
it is easier to see how it should be applied. A problem must be understood,
formulated in mathematical terms, and analyzed before any meaningful
computation is possible. If numerical results are required, a computational
procedure must be devised to give the results with sufficient accuracy and
reasonable efficiency. All the steps to the point where numerical results are
considered are best carried out symbolically. Even though the notation and
terminology of matrix theory is well suited for computation, it is not necessar-
ily the best notation for the preliminary work.

It is a curious fact that if we look at the work of an engineer applying

matrix theory we will seldom see any matrices at all. There will be symbols
standing for matrices, and these symbols will be carried through many steps
in reasoning and manipulation. Only occasionally or at the end will any
matrix be written out in full. This is so because the computational aspects
of matrices are burdensome and unnecessary during the early phases of work
on a problem. All we need is an algebra of rules for manipulating with them.
During this phase of the work it is better to use some concept closer to the
concept in the field of application and introduce matrices at the point where
practical calculations are needed.
An additional advantage in our studying linear algebra in its invariant
form is that there are important applications of linear algebra where the under-
lying vector spaces are infinite dimensional. In these cases matrix theory
must be supplanted by other techniques. A case in point is quantum mechanics
which requires the use of Hilbert spaces. The exposition of linear algebra
given in this book provides an easy introduction to the study of such spaces.
In addition to our concern with the beauty and logic of linear algebra in
this form we are equally concerned with its utility. Although some hints
of the applicability of linear algebra are given along with its development,
Chapter VI is devoted to a discussion of some of the more interesting and
representative applications.
chapter I
Vector spaces

In this chapter we introduce the concepts of a vector space and a basis for
that vector space. We assume that there is at least one basis with a finite
number of elements, and this assumption enables us to prove that the
vector space has a vast variety of different bases but that they all have the
same number of elements. This common number is called the dimension
of the vector space.
For each choice of a basis there is a one-to-one correspondence between
the elements of the vector space and a set of objects we shall call n-tuples.
A different choice for a basis will lead to a different correspondence between
the vectors and the n-tuples. We regard the vectors as the fundamental
objects under consideration and the n-tuples as representations of the vectors.
Thus, how a particular vector is represented depends on the choice of the
basis, and these representations are non-invariant. We call the n-tuple the
coordinates of the vector it represents; each basis determines a coordinate
system.
We then introduce the concept of subspace of a vector space and develop
the algebra of subspaces. Under the assumption that the vector space is

finite dimensional, we prove that each subspace has a basis and that for
each basis of the subspace there is a basis of the vector space which includes
the basis of the subspace as a subset.

1 | Definitions

To deal with the concepts that are introduced we adopt some notational
conventions that are commonly used. We usually use sans-serif italic letters
to denote sets.

α ∈ S means α is an element of the set S.
α ∉ S means α is not an element of the set S.

S ⊂ T means S is a subset of the set T.
S ∩ T denotes the intersection of the sets S and T, the set of elements
in both S and T.
S ∪ T denotes the union of the sets S and T, the set of elements in S or T.
T − S denotes the set of elements in T but not in S. In case T is the set
of all objects under consideration, we shall call T − S the
complement of S and denote it by CS.
{S_μ | μ ∈ M} denotes a collection of sets indexed so that one set S_μ is specified
for each element μ ∈ M. M is called the index set.
⋂_{μ∈M} S_μ denotes the intersection of all the sets S_μ, μ ∈ M.
⋃_{μ∈M} S_μ denotes the union of all the sets S_μ, μ ∈ M.
∅ denotes the set with no elements, the empty set.

A set will often be specified by listing the elements in the set or by giving
a property which characterizes the elements of the set. In such cases we
use braces: {α, β} is the set containing just the elements α and β, {α | P}
is the set of all α with property P, {α_μ | μ ∈ M} denotes the set of all α_μ
corresponding to μ in the index set M. We have such frequent use for the set
of all integers or a subset of the set of all integers as an index set that we
adopt a special convention for these cases: {αᵢ} denotes a set of elements
indexed by a subset of the set of integers. Usually the same index set is used
over and over. In such cases it is not necessary to repeat the specifications
of the index set and often designation of the index set will be omitted.
Where clarity requires it, the index set will be specified. We are careful to
distinguish between the set {αᵢ} and an element αᵢ of that set.

Definition. By a field F we mean a non-empty set of elements with two laws
of combination, which we call addition and multiplication, satisfying the
following conditions:

F1. To every pair of elements a, b ∈ F there is associated a unique element,
called their sum, which we denote by a + b.
F2. Addition is associative; (a + b) + c = a + (b + c).
F3. There exists an element, which we denote by 0, such that a + 0 = a
for all a ∈ F.
F4. For each a ∈ F there exists an element, which we denote by −a, such
that a + (−a) = 0. Following usual practice we write b + (−a) = b − a.
F5. Addition is commutative; a + b = b + a.
F6. To every pair of elements a, b ∈ F there is associated a unique element,
called their product, which we denote by ab, or a · b.
F7. Multiplication is associative; (ab)c = a(bc).
F8. There exists an element different from 0, which we denote by 1,
such that a · 1 = a for all a ∈ F.
F9. For each a ∈ F, a ≠ 0, there exists an element, which we denote by
a⁻¹, such that a · a⁻¹ = 1.
F10. Multiplication is commutative; ab = ba.
F11. Multiplication is distributive with respect to addition;
(a + b)c = ac + bc.

The elements of F are called scalars, and will generally be denoted by lower
case Latin italic letters.

The rational numbers, real numbers, and complex numbers are familiar
and important examples of fields, but they do not exhaust the possibilities.
As a less familiar example, consider the set {0, 1} where addition is defined
by the rules: 0 + 0 = 1 + 1 = 0, 0 + 1 = 1; and multiplication is defined
by the rules: 0 · 0 = 0 · 1 = 0, 1 · 1 = 1. This field has but two elements,
and there are other fields with finitely many elements.
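The arithmetic of this two-element field is small enough to check exhaustively; the short sketch below (an added illustration, not from the text) verifies several of the field axioms by brute force, using the fact that the stated addition is simply addition modulo 2.

```python
# The two-element field {0, 1}: addition is addition mod 2, multiplication
# is ordinary multiplication (which already stays inside {0, 1}).
ELEMENTS = (0, 1)
add = lambda a, b: (a + b) % 2
mul = lambda a, b: a * b

for a in ELEMENTS:
    for b in ELEMENTS:
        assert add(a, b) == add(b, a) and mul(a, b) == mul(b, a)    # F5, F10
        for c in ELEMENTS:
            assert add(add(a, b), c) == add(a, add(b, c))           # F2
            assert mul(mul(a, b), c) == mul(a, mul(b, c))           # F7
            assert mul(add(a, b), c) == add(mul(a, c), mul(b, c))   # F11
print("axioms F2, F5, F7, F10, F11 hold for the field {0, 1}")
```

Note that 1 + 1 = 0 in this field, which is precisely the mild condition 1 + 1 ≠ 0, mentioned a little further on, that this example fails to satisfy.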
We do not develop the various properties of abstract fields and we are
not concerned with any specific field other than the rational numbers, the
real numbers, and the complex numbers. We find it convenient and desirable
at the moment to leave the exact nature of the field of scalars unspecified
because much of the theory of vector spaces and matrices is valid for arbitrary
fields.
The student unacquainted with the theory of abstract fields will not be
handicapped for it will be sufficient to think of F as being one of the familiar
fields. All that matters is that we can perform the operations of addition and

subtraction, multiplication and division, in the usual way. Later we have to


restrict F to either the field of real numbers or the field of complex numbers
in order to obtain certain classical results; but we postpone that moment as
long as we can. At another point we have to make a very mild assumption,
that is, 1 + 1 ≠ 0, a condition that happens to be false in the example given
above. The student interested mainly in the properties of matrices with real
or complex coefficients should consider this to be no restriction.

Definition. A vector space V over F is a non-empty set of elements, called
vectors, with two laws of combination, called vector addition (or addition)
and scalar multiplication, satisfying the following conditions:

A1. To every pair of vectors α, β ∈ V there is associated a unique vector
in V, called their sum, which we denote by α + β.
A2. Addition is associative; (α + β) + γ = α + (β + γ).
A3. There exists a vector, which we denote by 0, such that α + 0 = α
for all α ∈ V.
A4. For each α ∈ V there exists an element, which we denote by −α,
such that α + (−α) = 0.
A5. Addition is commutative; α + β = β + α.

B1. To every scalar a ∈ F and vector α ∈ V, there is associated a unique
vector, called the product of a and α, which we denote by aα.
B2. Scalar multiplication is associative; a(bα) = (ab)α.
B3. Scalar multiplication is distributive with respect to vector addition;
a(α + β) = aα + aβ.
B4. Scalar multiplication is distributive with respect to scalar addition;
(a + b)α = aα + bα.
B5. 1 · α = α (where 1 ∈ F).

We generally use lower case Greek letters to denote vectors. An exception
is the zero vector of A3. From a logical point of view we should not use the
same symbol "0" for both the zero scalar and the zero vector, but this practice
is rooted in a long tradition and it is not as confusing as it may seem at first.

The vector space axioms concerning addition alone have already appeared
in the definition of a field. A set of elements satisfying the first four axioms
is called a group. If the set of elements also satisfies A5 it is called a com-
mutative group or abelian group. Thus both fields and vector spaces are
abelian groups under addition. The theory of groups is well developed and
our subsequent discussion would be greatly simplified if we were to assume
a prior knowledge of the theory of groups. We do not assume a prior
knowledge of the theory of groups; therefore, we have to develop some of
their elementary properties as we go along, although we do not stop to point
out that what was proved is properly a part of group theory. Except for
specific applications in Chapter VI we do no more than use the term "group"
to denote a set of elements satisfying these conditions.
First, we give some examples of vector spaces. Any notation other than
"F" for a field and "V" for a vector space is used consistently in the same
way throughout the rest of the book, and these examples serve as definitions
for these notations:

(1) Let F be any field and let V = P be the set of all polynomials in an
indeterminate x with coefficients in F. Vector addition is defined to be the
ordinary addition of polynomials, and multiplication is defined to be the
ordinary multiplication of a polynomial by an element of F.
(2) For any positive integer n, let Pₙ be the set of all polynomials in x
with coefficients in F of degree ≤ n − 1, together with the zero polynomial.
The operations are defined as in Example (1).

(3) Let F = R, the field of real numbers, and take V to be the set of all

real-valued functions of a real variable. If f and g are functions we define
vector addition and scalar multiplication by the rules

(f + g)(x) = f(x) + g(x),
(af)(x) = a[f(x)].
"
1 I
Definitions

(4) Let F = R, and let V be the set of continuous real-valued functions
of a real variable. The operations are defined as in Example (3). The point
of this example is that it requires a theorem to show that A1 and B1 are
satisfied.

(5) Let F = R, and let V be the set of real-valued functions defined on
the interval [0, 1] and integrable over that interval. The operations are
defined as in Example (3). Again, the main point is to show that A1 and B1
are satisfied.

(6) Let F = R, and let V be the set of all real-valued functions of a real
variable differentiable at least m times (m a positive integer). The operations
are defined as in Example (3).

(7) Let F = R, and let V be the set of all real-valued functions differentiable
at least twice and satisfying the differential equation d²y/dx² + y = 0.

(8) Let F = R, and let V = Rⁿ be the set of all real ordered n-tuples,
α = (a₁, a₂, ..., aₙ) with aᵢ ∈ F. Vector addition and scalar multiplication
are defined by the rules

(a₁, ..., aₙ) + (b₁, ..., bₙ) = (a₁ + b₁, ..., aₙ + bₙ),
a(a₁, ..., aₙ) = (aa₁, ..., aaₙ).                                   (1.2)

We call this vector space the n-dimensional real coordinate space or the
real affine n-space. (The name "Euclidean n-space" is sometimes used, but
that term should be reserved for an affine n-space in which distance is defined.)

(9) Let Fⁿ be the set of all n-tuples of elements of F. Vector addition and
scalar multiplication are defined by the rules (1.2). We call this vector
space an n-dimensional coordinate space.
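As a small illustration of the rules (1.2), the componentwise operations of Example (8) can be written directly in code (the function names here are added for illustration only, not the author's notation):

```python
# Componentwise vector addition and scalar multiplication on real n-tuples,
# following the rules (1.2) of Example (8).
def vec_add(a, b):
    # (a1, ..., an) + (b1, ..., bn) = (a1 + b1, ..., an + bn)
    return tuple(ai + bi for ai, bi in zip(a, b))

def scal_mul(c, a):
    # c(a1, ..., an) = (c*a1, ..., c*an)
    return tuple(c * ai for ai in a)

alpha = (2, -5, 0, 1)
beta = (-3, 3, 1, -1)
print(vec_add(alpha, beta))                             # (-1, -2, 1, 0)
print(scal_mul(3, alpha))                               # (6, -15, 0, 3)
print(vec_add(scal_mul(2, alpha), scal_mul(3, beta)))   # (-5, -1, 3, -1)
```

The same two vectors appear again in Exercise 9 of this section.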

An immediate consequence of the axioms defining a vector space is that
the zero vector, whose existence is asserted in A3, and the negative vector,
whose existence is asserted in A4, are unique. Specifically, suppose 0
satisfies A3 for all vectors in V and that for some α ∈ V there is a 0' satisfying
the condition α + 0 = α + 0' = α. Then 0' = 0' + 0 = 0' + (α + (−α)) =
(0' + α) + (−α) = (α + 0') + (−α) = α + (−α) = 0. Notice that we
have proved not merely that the zero vector satisfying A3 for all α is unique;
we have proved that a vector satisfying the condition of A3 for some α must
be the zero vector, which is a much stronger statement.
Also, suppose that to a given α there were two negatives, (−α) and
(−α)', satisfying the conditions of A4. Then (−α)' = (−α)' + 0 =
(−α)' + α + (−α) = (−α) + α + (−α)' = (−α) + 0 = (−α). Both
these demonstrations used the commutative law, A5. Use of this axiom
could have been avoided, but the necessary argument would then have been
somewhat longer.

Uniqueness enables us to prove that 0α = 0. (Here is an example of the
seemingly ambiguous use of the symbol "0." The "0" on the left side is a
scalar while that on the right is a vector. However, no other interpretation
could be given the symbols and it proves convenient to conform to the
convention rather than introduce some other symbol for the zero vector.)
For each α ∈ V, α + 0 · α = 1 · α + 0 · α = (1 + 0)α = 1 · α = α. Thus
0 · α = 0. In a similar manner we can show that (−1)α = −α: α + (−1)α =
1 · α + (−1)α = (1 − 1)α = 0 · α = 0. Since the negative vector is unique
we see that (−1)α = −α. It also follows similarly that a · 0 = 0.

EXERCISES
1 to 4. What theorems must be proved in each of the Examples (4), (5), (6), and
(7) to verify A1? To verify B1? (These axioms are usually the ones which require
most specific verification. For example, if we establish that the vector space
described in Example (3) satisfies all the axioms of a vector space, then A1 and B1
are the only ones that must be verified for Examples (4), (5), (6), and (7). Why?)

5. Let P⁺ be the set of polynomials with real coefficients and positive constant
term. Is P⁺ a vector space? Why?

6. Show that if aα = 0 and a ≠ 0, then α = 0. (Hint: Use axiom F9 for fields.)

7. Show that if aα = 0 and α ≠ 0, then a = 0.

8. Show that the ξ such that α + ξ = β is (uniquely) ξ = β + (−α).

9. Let α = (2, −5, 0, 1) and β = (−3, 3, 1, −1) be vectors in the coordinate
space R⁴. Determine
(a) α + β.
(b) α − β.
(c) 3α.
(d) 2α + 3β.

10. Show that any field can be considered to be a vector space over itself.

11. Show that the real numbers can be considered to be a vector space over the
rational numbers.

12. Show that the complex numbers can be considered to be a vector space over
the real numbers.

13. Prove the uniqueness of the zero vector and the uniqueness of the negative
of each vector without using the commutative law, A5.

2 | Linear Independence and Linear Dependence

Because of the associative law for vector addition, we can omit the
parentheses from expressions like a₁α₁ + (a₂α₂ + a₃α₃) = (a₁α₁ + a₂α₂) + a₃α₃
and write them in the simpler form a₁α₁ + a₂α₂ + a₃α₃ = ∑ᵢ₌₁³ aᵢαᵢ.
It is clear that this convention can be extended to a sum of any number of
such terms provided that only a finite number of coefficients are different
from zero. Thus, whenever we write an expression like ∑ᵢ aᵢαᵢ (in which we
do not specify the range of summation), it will be assumed, tacitly if not
explicitly, that the expression contains only a finite number of non-zero
coefficients.
If β = ∑ᵢ aᵢαᵢ, we say that β is a linear combination of the αᵢ. We also
say that β is linearly dependent on the αᵢ if it can be expressed as a linear
combination of the αᵢ. An expression of the form ∑ᵢ aᵢαᵢ = 0 is called a
linear relation among the αᵢ. A relation with all aᵢ = 0 is called a trivial
linear relation; a relation in which at least one coefficient is non-zero is
called a non-trivial linear relation.

Definition. A set of vectors is said to be linearly dependent if there exists a


non-trivial linear relation among them. Otherwise, the set is said to be
linearly independent.

It should be noted that any set of vectors that includes the zero vector is
linearly dependent. A set consisting of exactly one non-zero vector is linearly
independent. For if aα = 0 with a ≠ 0, then α = 1 · α = (a⁻¹a)α =
a⁻¹(aα) = a⁻¹ · 0 = 0. Notice also that the empty set is linearly independent.

It is clear that the concept of linear independence of a set would be mean-


ingless if a vector from a set could occur arbitrarily often in a possible relation.
If a set of vectors is given, however, by itemizing the vectors in the set it is a
definite inconvenience to insist that all the vectors listed be distinct. The
burden of counting the number of times a vector can appear in a relation is
transferred to the index set. For each index in the index set, we require that
a linear relation contain but one term corresponding to that index. Similarly,
when we specify a set by itemizing the vectors in the set, we require that one and
only one vector be listed for each index in the index set. But we allow the
possibility that several indices may be used to identify the same vector. Thus
the set {α₁, α₂}, where α₁ = α₂, is linearly dependent, and any set with any
vector listed at least twice is linearly dependent. To be precise, the concept
of linear independence is a property of indexed sets and not a property of sets.
In the example given above, the relation α₁ − α₂ = 0 involves two terms in
the indexed set {αᵢ | i ∈ {1, 2}} while the set {α₁, α₂} actually contains only
one vector. We should refer to the linear dependence of an indexed set rather

than the linear dependence of a set. The conventional terminology, which


we are adopting, is inaccurate. This usage, however, is firmly rooted in

tradition and, once understood, it is a convenience and not a source of


difficulty. We speak of the linear dependence of a set, but the concept always
refers to an indexed set. For a linearly independent indexed set, no vector can
be listed twice; so in this case the inaccuracy of referring to a set rather than
an indexed set is unimportant.

The concept of linear dependence and independence is used in essentially


two ways. (1) If a set {aj of vectors is known to be linearly dependent,
there exists a non-trivial linear relation of the form ]£t a i Cf — 0- (This -i

relation is not unique, but that is usually incidental.) There is at least one
non-zero coefficient; let a k be non-zero. Then a = ^ ^ k {—af 1 a^)a. i that is fc i
;

one of the vectors of the set {aj is a linear combination of the others. (2) If a
set {aj of vectors is known to be linearly independent and a linear relation

2i a^i = is obtained, we can conclude that all a t = 0. This seemingly


trivial observation is surprisingly useful.
In Example (1) the zero vector is the polynomial with all coefficients
equal to zero. Thus the set of monomials {1, x, x², ...} is a linearly
independent set. The set {1, x, x², x² + x + 1} is linearly dependent since
1 + x + x² − (x² + x + 1) = 0. In Pₙ of Example (2), any n + 1
polynomials form a linearly dependent set.
In R³ consider the vectors {α = (1, 1, 0), β = (1, 0, 1), γ = (0, 1, 1),
δ = (1, 1, 1)}. These four vectors are linearly dependent since α + β +
γ − 2δ = 0, yet any three of these four vectors are linearly independent.
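The dependence relation above, and the independence of every three of the four vectors, can be checked mechanically; the sketch below is an added illustration (not the author's method), computing ranks by Gaussian elimination over the rationals with Python's fractions module.

```python
from fractions import Fraction
from itertools import combinations

def rank(rows):
    # Gaussian elimination over the rationals; the rank equals the number
    # of linearly independent vectors among the rows.
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

vectors = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
print(rank(vectors))                                              # 3: the four vectors are dependent
print(all(rank(list(c)) == 3 for c in combinations(vectors, 3)))  # True: every three are independent
```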

Theorem 2.1. If α is linearly dependent on {βᵢ} and each βᵢ is linearly
dependent on {γⱼ}, then α is linearly dependent on {γⱼ}.
proof. From α = ∑ᵢ bᵢβᵢ and βᵢ = ∑ⱼ cᵢⱼγⱼ it follows that α =
∑ᵢ bᵢ(∑ⱼ cᵢⱼγⱼ) = ∑ⱼ (∑ᵢ bᵢcᵢⱼ)γⱼ, a linear combination of the γⱼ. □
Theorem 2.2. A set of non-zero vectors {α₁, α₂, ...} is linearly dependent
if and only if some αₖ is a linear combination of the αⱼ with j < k.
proof. Suppose the vectors {α₁, α₂, ...} are linearly dependent. Then
there is a non-trivial linear relation among them: ∑ᵢ aᵢαᵢ = 0. Since a
positive finite number of coefficients are non-zero, there is a last non-zero
coefficient aₖ. Furthermore, k ≥ 2 since α₁ ≠ 0. Thus αₖ = ∑_{i<k} (−aₖ⁻¹aᵢ)αᵢ.

The converse is obvious. □

For any subset A of V the set of all linear combinations of vectors in A
is called the set spanned by A, and we denote it by ⟨A⟩. We also say that A
spans ⟨A⟩. It is a part of this definition that A ⊂ ⟨A⟩. We also agree that the
empty set spans the set consisting of the zero vector alone. It is readily
apparent that if A ⊂ B, then ⟨A⟩ ⊂ ⟨B⟩.
In this notation Theorem 2.1 is equivalent to the statement: If A ⊂ ⟨B⟩
and B ⊂ ⟨C⟩, then A ⊂ ⟨C⟩.

Theorem 2.3. The set {αᵢ} of non-zero vectors is linearly independent if
and only if for each k, αₖ ∉ ⟨α₁, ..., αₖ₋₁⟩. (To follow our definitions exactly,
the set spanned by {α₁, ..., αₖ₋₁} should be denoted by ⟨{α₁, ..., αₖ₋₁}⟩.
We shall use the symbol ⟨α₁, ..., αₖ₋₁⟩ instead since it is simpler and there is
no danger of ambiguity.)
proof. This is merely Theorem 2.2 in contrapositive form and stated in
new notation. □

Theorem 2.4. If B and C are any subsets such that B ⊂ ⟨C⟩, then ⟨B⟩ ⊂ ⟨C⟩.
proof. Set A = ⟨B⟩ in Theorem 2.1. Then B ⊂ ⟨C⟩ implies that ⟨B⟩ =
A ⊂ ⟨C⟩. □

Theorem 2.5. If αₖ ∈ A is dependent on the other vectors in A, then ⟨A⟩ =
⟨A − {αₖ}⟩.
proof. The assumption that αₖ is dependent on A − {αₖ} means that
A ⊂ ⟨A − {αₖ}⟩. It then follows from Theorem 2.4 that ⟨A⟩ ⊂ ⟨A − {αₖ}⟩.
The equality follows from the fact that the inclusion in the other direction
is evident. □

Theorem 2.6. For any set C, ⟨⟨C⟩⟩ = ⟨C⟩.
proof. Setting B = ⟨C⟩ in Theorem 2.4 we obtain ⟨⟨C⟩⟩ = ⟨B⟩ ⊂ ⟨C⟩.
Again, the inclusion in the other direction is obvious. □

Theorem 2.7. If a finite set A = {α₁, ..., αₙ} spans V, then every linearly
independent set contains at most n elements.
proof. Let B = {β₁, β₂, ...} be a linearly independent set. We shall
successively replace the αᵢ by the βᵢ, obtaining at each step a new n-element
set that spans V. Thus, suppose that Aₖ = {β₁, ..., βₖ, αₖ₊₁, ..., αₙ} is an
n-element set that spans V. (Our starting point, the hypothesis of the
theorem, is the case k = 0.) Since Aₖ spans V, βₖ₊₁ is dependent on Aₖ.
Thus the set {β₁, ..., βₖ, βₖ₊₁, αₖ₊₁, ..., αₙ} is linearly dependent. In any
non-trivial relation that exists the non-zero coefficients cannot be confined
to the βᵢ, for they are linearly independent. Thus one of the αᵢ (i > k) is
dependent on the others, and after reindexing {αₖ₊₁, ..., αₙ} if necessary
we may assume that it is αₖ₊₁. By Theorem 2.5 the set Aₖ₊₁ = {β₁, ...,
βₖ₊₁, αₖ₊₂, ..., αₙ} also spans V.
If there were more than n elements in B, we would in this manner arrive
at the spanning set Aₙ = {β₁, ..., βₙ}. But then the dependence of βₙ₊₁ on
Aₙ would contradict the assumed linear independence of B. Thus B contains
at most n elements. □

Theorem 2.7 is stated in slightly different forms in various books. The


essential feature of the proof is the step-by-step replacement of the vectors
in one set by the vectors in the other. The theorem is known as the Steinitz
replacement theorem.

EXERCISES
1. In the vector space P of Example (1) let p₁(x) = x² + x + 1, p₂(x) =
x² − x − 2, p₃(x) = x² + x − 1, p₄(x) = x − 1. Determine whether or not the set
{p₁(x), p₂(x), p₃(x), p₄(x)} is linearly independent. If the set is linearly dependent,
express one element as a linear combination of the others.

2. Determine ⟨{p₁(x), p₂(x), p₃(x), p₄(x)}⟩, where the pᵢ(x) are the same
polynomials as those defined in Exercise 1. (The set required is infinite, so that we
cannot list all its elements. What is required is a description; for example, "all
polynomials of a certain degree or less," "all polynomials with certain kinds of
coefficients," etc.)

3. A linearly independent set is said to be maximal if it is contained in no larger

linearly independent set. In this definition the emphasis is on the concept of set
inclusion and not on the number of elements in a set. In particular, the definition
allows the possibility that two different maximal linearly independent sets might
have different numbers of elements. Find all the maximal linearly independent

subsets of the set given in Exercise 1. How many elements are in each of them?

4. Show that no finite set spans P; that is, show that there is no maximal finite

linearly independent subset of P. Why are these two statements equivalent ?

5. In Example (2) for n = 4, find a spanning set for P4 . Find a minimal spanning
set. Use Theorem 2.7 to show that no other spanning set has fewer elements.

6. In Example (1) or (2) show that {1, x + 1, x² + x + 1, x³ + x² + x + 1,
x⁴ + x³ + x² + x + 1} is a linearly independent set.

7. In Example (1) show that the set of all polynomials divisible by x - 1 cannot
span P.

8. Determine which of the following sets in R⁴ are linearly independent over R.

(a) {(1,1,0,1), (1,-1,1,1), (2,2,1,2), (0,1,0,0)}.


(b) {(1,0,0, 1), (0,1,1,0), (1,0,1,0), (0,1,0,1)}.
(c) {(1,0,0, 1), (0,1,0,1), (0,0,1,1), (1,1,1,1)}.

9. Show that {e₁ = (1, 0, 0, ..., 0), e₂ = (0, 1, 0, ..., 0), ..., eₙ = (0, 0, 0,
..., 1)} is linearly independent in Fⁿ over F.

10. In Exercise 11 of Section 1 it was shown that we may consider the real
numbers to be a vector space over the rational numbers. Show that {1, √2} is a
linearly independent set over the rationals. (This is equivalent to showing that
√2 is irrational.) Using this result show that {1, √2, √3} is linearly independent.

11. Show that if one vector of a set is the zero vector, then the set is linearly

dependent.
12. Show that if an indexed set of vectors has one vector listed twice, the set is

linearly dependent.

13. Show that if a subset of S is linearly dependent, then S is linearly dependent.

14. Show that if a set S is linearly independent, then every subset of S is linearly

independent.

15. Show that if the set A = {α₁, ..., αₙ} is linearly independent and
{α₁, ..., αₙ, β} is linearly dependent, then β is dependent on A.

16. Show that, if each of the vectors {β₀, β₁, ..., βₙ} is a linear combination of
the vectors {α₁, ..., αₙ}, then {β₀, β₁, ..., βₙ} is linearly dependent.

3 | Bases of Vector Spaces

Definition. A linearly independent set spanning a vector space V is called a

basis or base (the plural is bases) of V.

If A = {α₁, α₂, ...} is a basis of V, by definition an α ∈ V can be written
in the form α = ∑ᵢ aᵢαᵢ. The interesting thing about a basis, as distinct from
other spanning sets, is that the coefficients are uniquely determined by α.
For suppose that we also have α = ∑ᵢ bᵢαᵢ. Upon subtraction we get the
linear relation ∑ᵢ (aᵢ − bᵢ)αᵢ = 0. Since {αᵢ} is a linearly independent
set, aᵢ − bᵢ = 0 and aᵢ = bᵢ for each i. A related fact is that a basis is a
particularly efficient spanning set, as we shall see.
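To make the uniqueness of the coefficients concrete, the following sketch (an added illustration; the particular basis and vector are taken from the R³ example of Section 2) solves for the coordinates of a vector with respect to a basis by Gauss-Jordan elimination.

```python
from fractions import Fraction

def coordinates(basis, target):
    # Solve sum_j c_j * basis[j] = target, assuming "basis" really is a basis
    # of R^n (n vectors, n components each), by Gauss-Jordan elimination.
    n = len(basis)
    # Augmented matrix: columns are the basis vectors, last column the target.
    m = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(target[i])]
         for i in range(n)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if m[i][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for i in range(n):
            if i != col and m[i][col] != 0:
                f = m[i][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[col])]
    return [m[i][n] for i in range(n)]

basis = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]            # a basis of R^3
delta = (1, 1, 1)
print([str(c) for c in coordinates(basis, delta)])   # ['1/2', '1/2', '1/2']
```

So δ = ½(α + β + γ), and no other choice of coefficients is possible.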

In Example (1) the vectors {αᵢ = xⁱ | i = 0, 1, ...} form a basis. We have
already observed that this set is linearly independent, and it clearly spans
the space of all polynomials. The space Pₙ has a basis with a finite number of
elements; {1, x, x², ..., xⁿ⁻¹}.
The vector spaces in Examples (3), (4), (5), (6), and (7) do not have bases
with a finite number of elements.
In Example (8) every Rⁿ has a finite basis consisting of {αᵢ | αᵢ = (δ₁ᵢ,
δ₂ᵢ, ..., δₙᵢ)}. (Here δᵢⱼ is the useful symbol known as the Kronecker delta.
By definition δᵢⱼ = 0 if i ≠ j and δᵢᵢ = 1.)

Theorem 3.1. If a vector space has one basis with a finite number of
elements, then all other bases are finite and have the same number of elements.
proof. Let A be a basis with a finite number n of elements, and let B be
any other basis. Since A spans V and B is linearly independent, by Theorem
2.7 the number m of elements in B must be at most n. This shows that B is
finite and m ≤ n. But then the roles of A and B can be interchanged to
obtain the inequality in the other order, so that m = n. □

A vector space with a finite basis is called a finite dimensional vector


space, and the number of elements in a basis is called the dimension of
the space. Theorem 3.1 says that the dimension of a finite dimensional vector
space is well defined. The vector space with just one element, the zero
vector, has one linearly independent subset, the empty set. The empty set
is also a spanning set and is therefore a basis of {0}. Thus {0} has dimension

zero. There are very interesting vector spaces with infinite bases ; for example,
P of Example (1). Moreover, many of the theorems and proofs we give are
also valid for infinite dimensional vector spaces. It is not our intention,

however, to deal with infinite dimensional vector spaces as such, and when-
ever we speak of the dimension of a vector space without specifying whether
it is finite or infinite dimensional we mean that the dimension is finite.

Among the examples we have discussed so far, each Pₙ and each Rⁿ is
n-dimensional. We have already given at least one basis for each. There
are many others. The bases we have given happen to be conventional and
convenient choices.

Theorem 3.2. Any n + 1 vectors in an n-dimensional vector space are


linearly dependent.
proof. Their independence would contradict Theorem 2.7.

We have already seen that the four vectors {α = (1, 1, 0), β = (1, 0, 1),
γ = (0, 1, 1), δ = (1, 1, 1)} form a linearly dependent set in R³. Since R³
is 3-dimensional we see that this must be expected for any set containing
at least four vectors from R³. The next theorem shows that each subset of
three is a basis.

Theorem 3.3. A set of n vectors in an n-dimensional vector space V is a
basis if and only if it is linearly independent.
proof. The "only if" is part of the definition of a basis. Let A =
{α₁, ..., αₙ} be a linearly independent set and let α be any vector in V.
Since {α₁, ..., αₙ, α} contains n + 1 elements it must be linearly dependent.
Any non-trivial relation that exists must contain α with a non-zero coefficient,
for if that coefficient were zero the relation would amount to a relation in A.
Thus α is dependent on A. Hence A spans V and is a basis.

Theorem 3.4. A set of n vectors in an n-dimensional vector space V is a
basis if and only if it spans V.
proof. The "only if" is part of the definition of a basis. If n vectors did
span V and were linearly dependent, then (by Theorem 2.5) a proper subset
would also span V, contrary to Theorem 2.7.

We see that a basis is a maximal linearly independent set and a minimal
spanning set. This idea is made explicit in the next two theorems.

Theorem 3.5. In a finite dimensional vector space, every spanning set
contains a basis.
proof. Let B be a set spanning V. If V = {0}, then ∅ ⊂ B is a basis of
{0}. If V ≠ {0}, then B must contain at least one non-zero vector α₁. We
now search for another vector in B which is not dependent on {α₁}. We
call this vector α₂ and search for another vector in B which is not dependent
on the linearly independent set {α₁, α₂}. We continue in this way as long as
we can, but the process must terminate as we cannot find more than n

linearly independent vectors in B. Thus suppose we have obtained the set
A = {α₁, ..., αₘ} with the property that every vector in B is linearly
dependent on A. Then because of Theorem 2.1 the set A must also span V and
it is a basis.

To drop the assumption that the vector space is finite dimensional would
change the complexion of Theorem 3.5 entirely. As it stands the theorem
is interesting but minor, and not difficult to prove. Without this assumption

the theorem would assert that every vector space has a basis since every
vector space is spanned by itself. Discussion of such a theorem is beyond the
aims of this treatment of the subject of vector spaces.

Theorem 3.6. In a finite dimensional vector space any linearly independent
set of vectors can be extended to a basis.
proof. Let A = {α₁, ..., αₙ} be a basis of V, and let B = {β₁, ..., βₘ}
be a linearly independent set (m ≤ n). The set {β₁, ..., βₘ, α₁, ..., αₙ}
spans V. If this set is linearly dependent (and it surely is if m > 0) then
some element is a linear combination of the preceding elements (Theorem
2.2). This element cannot be one of the βⱼ's, for then B would be linearly
dependent. But then this αᵢ can be removed to obtain a smaller set spanning
V (Theorem 2.5). We continue in this way, discarding elements as long as
we have a linearly dependent spanning set. At no stage do we discard one of
the βⱼ's. Since our spanning set is finite this process must terminate with a
basis containing B as a subset.

Theorem 3.6 is one of the most frequently used theorems in the book.
It is often used in the following way. A non-zero vector with a certain desired
property is selected. Since the vector is non-zero, the set consisting of that
vector alone is a linearly independent set. An application of Theorem 3.6
shows that there is a basis containing that vector. This is usually the first step
of a proof by induction in which a basis is obtained for which all the vectors
in the basis have the desired property.
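The procedure in the proofs of Theorems 3.5 and 3.6 is easy to carry out numerically. The following sketch is not from the text; it assumes the NumPy library, uses matrix rank as the test for linear independence, and extends a given linearly independent set to a basis of R⁴ by adjoining vectors from a spanning set one at a time, keeping only those that are not dependent on the vectors already chosen.

```python
import numpy as np

def extend_to_basis(independent, spanning):
    """Greedily extend a linearly independent list of vectors to a basis,
    in the spirit of Theorem 3.6: adjoin candidates from a spanning set,
    keeping only those that increase the rank (i.e. are not dependent
    on the vectors already chosen)."""
    basis = [np.asarray(v, dtype=float) for v in independent]
    for v in spanning:
        candidate = basis + [np.asarray(v, dtype=float)]
        if np.linalg.matrix_rank(np.column_stack(candidate)) > len(basis):
            basis = candidate
    return basis

# Extend {(1, 1, 0, 0)} to a basis of R^4, using the standard basis as the spanning set.
extended = extend_to_basis([(1, 1, 0, 0)], np.eye(4))
print(len(extended))                                      # 4
print(np.linalg.matrix_rank(np.column_stack(extended)))   # 4, so the result is a basis
```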
Let A = {α₁, ..., αₙ} be an arbitrary basis of V, a vector space of dimension
n over the field F. Let α be any vector in V. Since A is a spanning set, α can be
represented as a linear combination of the form α = Σⁿᵢ₌₁ aᵢαᵢ. Since A is
linearly independent this representation is unique, that is, the coefficients aᵢ
are uniquely determined by α (for the given basis A). On the other hand,
for each n-tuple (a₁, ..., aₙ) there is a vector in V of the form Σⁿᵢ₌₁ aᵢαᵢ.
Thus there is a one-to-one correspondence between the vectors in V and the
n-tuples (a₁, ..., aₙ) ∈ Fⁿ.

If α = Σⁿᵢ₌₁ aᵢαᵢ, the scalar aᵢ is called the i-th coordinate of α, and aᵢαᵢ
is called the i-th component of α. Generally, coordinates and components
depend on the choice of the entire basis and cannot be determined from

individual vectors in the basis. Because of the rather simple correspondence


between coordinates and components there is a tendency to confuse them and
to use both terms for both concepts. Since the intended meaning is usually
clear from context, this is seldom a source of difficulty.
If α = Σⁿᵢ₌₁ aᵢαᵢ corresponds to the n-tuple (a₁, ..., aₙ) and β = Σⁿᵢ₌₁ bᵢαᵢ
corresponds to the n-tuple (b₁, ..., bₙ), then α + β = Σⁿᵢ₌₁ (aᵢ + bᵢ)αᵢ
corresponds to the n-tuple (a₁ + b₁, ..., aₙ + bₙ). Also, aα = Σⁿᵢ₌₁ aaᵢαᵢ
corresponds to the n-tuple (aa₁, ..., aaₙ). Thus the definitions of vector
addition and scalar multiplication among n-tuples defined in Example (9)
correspond exactly to the corresponding operations in V among the vectors
which they represent. When two sets of objects can be put into a one-to-one
correspondence which preserves all significant relations among their elements,
we say the two sets are isomorphic; that is, they have the same form. Using
this terminology, we can say that every vector space of dimension n over a
given field F is isomorphic to the n-dimensional coordinate space Fⁿ. Two
sets which are isomorphic differ in details which are not related to their inter-
nal structure. They are essentially the same. Furthermore, since two sets
isomorphic to a third are isomorphic to each other, we see that all n-dimen-
sional vector spaces over the same field of scalars are isomorphic.
The set of n-tuples together with the rules for addition and scalar multi-
plication forms a vector space in its own right. However, when a basis is chosen
in an abstract vector space V the correspondence described above establishes
an isomorphism between V and Fⁿ. In this context we consider Fⁿ to be a
representation of V. Because of the existence of this isomorphism a study
of vector spaces could be confined to a study of coordinate spaces. However,
the exact nature of the correspondence between V and Fⁿ depends upon the
choice of a basis in V. If another basis were chosen in V a correspondence
between the α ∈ V and the n-tuples would exist as before, but the correspond-
ence would be quite different. We choose to regard the vector space V and the
vectors in V as the basic concepts and their representation by n-tuples as a tool
for computation and convenience. There are two important benefits from
this viewpoint. Since we are free to choose the basis we can try to choose a
coordinatization for which the computations are particularly simple or for
which some fact that we wish to demonstrate is particularly evident. In
fact, the choice of a basis and the consequences of a change in basis is the
central theme of matrix theory. In addition, this distinction between a
vector and its representation removes the confusion that always occurs when
we define a vector as an n-tuple and then use another n-tuple to represent it.
Only the most elementary types of calculations can be carried out in the
abstract. Elaborate or complicated calculations usually require the intro-
duction of a representing coordinate space. In particular, this will be re-
quired extensively in the exercises in this text. But the introduction of

coordinates can result in confusions that are difficult to clarify without ex-
tensive verbal description or awkward notation. Since we wish to avoid
cumbersome notation and keep descriptive material at a minimum in the
exercises, it is helpful to spend some time clarifying conventional notations
and circumlocutions that will appear in the exercises.
The introduction of a coordinate representation for V involves the selection
of a basis {α₁, ..., αₙ} for V. With this choice α₁ is represented by (1, 0,
..., 0), α₂ is represented by (0, 1, 0, ..., 0), etc. While it may be necessary
to find a basis with certain desired properties, the basis that is introduced at
first is arbitrary and serves only to express whatever problem we face in a
form suitable for computation. Accordingly, it is customary to suppress
specific reference to the basis given initially. In this context it is customary
to speak of "the vector (a₁, a₂, ..., aₙ)" rather than "the vector α whose
representation with respect to the given basis {α₁, ..., αₙ} is (a₁, a₂, ..., aₙ)."
Such short-cuts may be disgracefully inexact, but they are so common that
we must learn how to interpret them.
For example, let V be a two-dimensional vector space over R. Let A =
{α₁, α₂} be the selected basis. If β₁ = α₁ + α₂ and β₂ = −α₁ + α₂, then
B = {β₁, β₂} is also a basis of V. With the convention discussed above we
would identify α₁ with (1, 0), α₂ with (0, 1), β₁ with (1, 1), and β₂ with
(−1, 1). Thus, we would refer to the basis B = {(1, 1), (−1, 1)}. Since
α₁ = ½β₁ − ½β₂, α₁ has the representation (½, −½) with respect to the basis B.
If we are not careful we can end up by saying that "(1, 0) is represented by
(½, −½)."
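A quick numerical check of this change of coordinates (a sketch assuming the NumPy library; it is not part of the text): putting the B-basis vectors into the columns of a matrix, the B-coordinates of a vector are obtained by solving a linear system.

```python
import numpy as np

# Columns are the basis vectors beta_1 = (1, 1) and beta_2 = (-1, 1),
# written in the original coordinate system A.
B_mat = np.array([[1.0, -1.0],
                  [1.0,  1.0]])

alpha_1 = np.array([1.0, 0.0])          # alpha_1, i.e. (1, 0) relative to A

# Solving B_mat @ x = alpha_1 gives the coordinates of alpha_1 relative to B.
print(np.linalg.solve(B_mat, alpha_1))  # [ 0.5 -0.5], i.e. the representation (1/2, -1/2)
```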

EXERCISES
To show that a given set is a basis by direct appeal to the definition means
that we must show the set is linearly independent and that it spans V. In any
given situation, however, the task is very much simpler. Since V is n-dimensional
a proposed basis must have n elements. Whether this is the case can be told at a
glance. In view of Theorems 3.3 and 3.4 if a set has n elements, to show that it is a
basis it suffices to show either that it spans V or that it is linearly independent.

1. In R 3 show that {(1, 1,0), (1, 0, 1), (0, 1, 1)} is a basis by showing that it is

linearly independent.

2. Show that {(1,1, 0), (1,0, 1), (0, 1, 1)} is a basis by showing that <(1, 1, 0),
(1, 0, 1), (0, 1, 1)> contains (1, 0, 0), (0, 1,0) and (0, 0, 1). Why does this suffice?
3. In R⁴ let A = {(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 1, 0), (0, 1, 0, −1)} be a basis
(is it?) and let B = {(1, 2, −1, 1), (0, 1, 2, −1)} be a linearly independent set
(is it?). Extend B to a basis of R⁴. (There are many ways to extend B to a basis.
It is intended here that the student carry out the steps of the proof of Theorem 3.6
for this particular case.)

4. Find a basis of R 4 containing the vector (1, 2, 3, 4). (This is another even
simpler application of the proof of Theorem 3.6. This, however, is one of the most
important applications of this theorem, to find a basis containing a particular
vector.)

5. Show that a maximal linearly independent set is a basis.

6. Show that a minimal spanning set is a basis.

4 | Subspaces

Definition. A subspace W of a vector space V is a non-empty subset of V
which is itself a vector space with respect to the operations of addition and
scalar multiplication defined in V. In particular, the subspace must be a
vector space over the same field F.

The first problem that must be settled is the problem of determining the
conditions under which a subset W is in fact a subspace. It should be clear
that axioms A2, A5, B2, B3, B4, and B5 need not be checked as they are valid
in any subset of V. The most innocuous conditions seem to be A1 and B1,
but it is precisely these conditions that must be checked. If B1 holds for a
non-empty subset W, there is an α ∈ W so that 0α = 0 ∈ W. Also, for each
α ∈ W, (−1)α = −α ∈ W. Thus A3 and A4 follow from B1 in any non-
empty subset of a vector space, and it is sufficient to check that W is non-
empty and closed under addition and scalar multiplication.
The two closure conditions can be combined into one statement: if
α, β ∈ W and a, b ∈ F, then aα + bβ ∈ W. This may seem to be a small
change, but it is a very convenient form of the conditions. It is also equivalent
to the statement that all linear combinations of elements in W are also in W;
that is, ⟨W⟩ = W. It follows directly from this statement that for any
subset A, ⟨A⟩ is a subspace. Thus, instead of speaking of the subset spanned
by A, we speak of the subspace spanned by A.


Every vector space V has V and the zero space {0} as subspaces. As a rule
we are interested in subspaces other than these and to distinguish them we
call the subspaces other than V and {0} proper subspaces. In addition, if
W is a subspace we designate subspaces of W other than W and {0} as proper
subspaces of W.

In Examples (1) and (2) we can take a fixed finite set {x₁, x₂, ..., xₘ}
of elements of F and define W to be the set of all polynomials such that
p(x₁) = p(x₂) = ··· = p(xₘ) = 0. To show that W is a subspace it is
sufficient to show that the sum of two polynomials which vanish at the
xᵢ also vanishes at the xᵢ, and the product of a scalar and a polynomial
vanishing at the xᵢ also vanishes at the xᵢ. What is the situation in Pₙ if
m > n? Similar subspaces can be defined in Examples (3), (4), (5), (6),
and (7).

The space Pₘ is a subspace of P, and also a subspace of Pₙ for m ≤ n.
In Rⁿ, for each m, 0 ≤ m ≤ n, the set of all α = (a₁, a₂, ..., aₙ) such
that a₁ = a₂ = ··· = aₘ = 0 is a subspace of Rⁿ. This subspace is proper
if 0 < m < n.

Notice that the set of all n-tuples of rational numbers is a subset of Rⁿ
and it is a vector space over the rational numbers, but it is not a subspace of
Rⁿ since it is not a vector space over the real numbers. Why?

Theorem 4.1. The intersection of any collection of subspaces is a subspace.
proof. Let {W_μ | μ ∈ M} be an indexed collection of subspaces of V.
The intersection ⋂_{μ∈M} W_μ is not empty since it contains 0. Let α, β ∈ ⋂_μ W_μ and a, b ∈ F.
Then α, β ∈ W_μ for each μ ∈ M. Since W_μ is a subspace, aα + bβ ∈ W_μ for
each μ ∈ M, and hence aα + bβ ∈ ⋂_μ W_μ. Thus ⋂_μ W_μ is a subspace.

Let A be any subset of V, not necessarily a subspace. There exist subspaces
W_μ ⊂ V which contain A; in fact, V is one of them. The intersection
⋂_{A⊂W_μ} W_μ of all such subspaces is a subspace containing A. It is the
smallest subspace containing A.

Theorem 4.2. For any A ⊂ V, ⋂_{A⊂W_μ} W_μ = ⟨A⟩; that is, the smallest
subspace containing A is exactly the subspace spanned by A.
proof. Since ⋂_{A⊂W_μ} W_μ is a subspace containing A, it contains all linear
combinations of elements of A. Thus ⟨A⟩ ⊂ ⋂_{A⊂W_μ} W_μ. On the other
hand ⟨A⟩ is a subspace containing A, that is, ⟨A⟩ is one of the W_μ, and hence
⋂_{A⊂W_μ} W_μ ⊂ ⟨A⟩. Thus ⋂_{A⊂W_μ} W_μ = ⟨A⟩.

W₁ + W₂ is defined to be the set of all vectors of the form α₁ + α₂ where
α₁ ∈ W₁ and α₂ ∈ W₂.

Theorem 4.3. If W₁ and W₂ are subspaces of V, then W₁ + W₂ is a subspace
of V.
proof. If α = α₁ + α₂ ∈ W₁ + W₂, β = β₁ + β₂ ∈ W₁ + W₂, and a,
b ∈ F, then aα + bβ = a(α₁ + α₂) + b(β₁ + β₂) = (aα₁ + bβ₁) + (aα₂ +
bβ₂) ∈ W₁ + W₂. Thus W₁ + W₂ is a subspace.

Theorem 4.4. W₁ + W₂ is the smallest subspace containing both W₁ and
W₂; that is, W₁ + W₂ = ⟨W₁ ∪ W₂⟩. If A₁ spans W₁ and A₂ spans W₂, then
A₁ ∪ A₂ spans W₁ + W₂.
proof. Since 0 ∈ W₂, W₁ ⊂ W₁ + W₂. Similarly, W₂ ⊂ W₁ + W₂.
Since W₁ + W₂ is a subspace containing W₁ ∪ W₂, ⟨W₁ ∪ W₂⟩ ⊂ W₁ + W₂.
For any α ∈ W₁ + W₂, α can be written in the form α = α₁ + α₂ where
α₁ ∈ W₁ and α₂ ∈ W₂. Then α₁ ∈ W₁ ⊂ ⟨W₁ ∪ W₂⟩ and α₂ ∈ W₂ ⊂
⟨W₁ ∪ W₂⟩. Since ⟨W₁ ∪ W₂⟩ is a subspace, α = α₁ + α₂ ∈ ⟨W₁ ∪ W₂⟩. Thus
W₁ + W₂ = ⟨W₁ ∪ W₂⟩.

The second part of the theorem now follows directly. W₁ = ⟨A₁⟩ ⊂
⟨A₁ ∪ A₂⟩ and W₂ = ⟨A₂⟩ ⊂ ⟨A₁ ∪ A₂⟩ so that W₁ ∪ W₂ ⊂ ⟨A₁ ∪ A₂⟩ ⊂
⟨W₁ ∪ W₂⟩, and hence ⟨W₁ ∪ W₂⟩ = ⟨A₁ ∪ A₂⟩.
Theorem 4.5. A subspace W of an n-dimensional vector space V is a finite
dimensional vector space of dimension m ≤ n.
proof. If W = {0}, then W is 0-dimensional. Otherwise, there is a non-
zero vector α₁ ∈ W. If ⟨α₁⟩ = W, W is 1-dimensional. Otherwise, there is an
α₂ ∉ ⟨α₁⟩ in W. We continue in this fashion as long as possible. Suppose we
have obtained the linearly independent set {α₁, ..., αₖ} and that it does not
span W. Then there exists an αₖ₊₁ ∈ W, αₖ₊₁ ∉ ⟨α₁, ..., αₖ⟩. In a linear
relation of the form Σᵏ⁺¹ᵢ₌₁ aᵢαᵢ = 0 we could not have aₖ₊₁ ≠ 0, for then
αₖ₊₁ ∈ ⟨α₁, ..., αₖ⟩. But then the relation reduces to the form Σᵏᵢ₌₁ aᵢαᵢ = 0.
Since {α₁, ..., αₖ} is linearly independent, all aᵢ = 0. Thus {α₁, ..., αₖ, αₖ₊₁}
is linearly independent. In general, any linearly independent set in W that
does not span W can be expanded into a larger linearly independent set in W.
This process cannot go on indefinitely for in that event we would obtain more
than n linearly independent vectors in V. Thus there exists an m such that
⟨α₁, ..., αₘ⟩ = W. It is clear that m ≤ n.

Theorem 4.6. Given any subspace W of dimension m in an n-dimensional
vector space V, there exists a basis {α₁, ..., αₘ, αₘ₊₁, ..., αₙ} of V such that
{α₁, ..., αₘ} is a basis of W.
proof. By the previous theorem we see that W has a basis {α₁, ..., αₘ}.
This set is also linearly independent when considered in V, and hence by
Theorem 3.6 it can be extended to a basis of V. □


Theorem 4.7. If two subspaces U and W of a vector space V have the same
finite dimension and U ⊂ W, then U = W.
proof. By the previous theorem there exists a basis of U which can be
extended to a basis of W. But since dim U = dim W, the basis of W can
have no more elements than does the basis of U. This means a basis of
U is also a basis of W; that is, U = W.
Theorem 4.8. If W₁ and W₂ are any two subspaces of a finite dimensional
vector space V, then dim (W₁ + W₂) = dim W₁ + dim W₂ − dim (W₁ ∩ W₂).
proof. Let {α₁, ..., αᵣ} be a basis of W₁ ∩ W₂. This basis can be
extended to a basis {α₁, ..., αᵣ, β₁, ..., βₛ} of W₁, and also to a basis
{α₁, ..., αᵣ, γ₁, ..., γₜ} of W₂. It is clear that {α₁, ..., αᵣ, β₁, ..., βₛ, γ₁,
..., γₜ} spans W₁ + W₂; we wish to show that this set is linearly independent.

Suppose Σᵢ aᵢαᵢ + Σⱼ bⱼβⱼ + Σₖ cₖγₖ = 0 is a linear relation. Then
Σᵢ aᵢαᵢ + Σⱼ bⱼβⱼ = −Σₖ cₖγₖ. The left side is in W₁ and the right side is
in W₂, and hence both are in W₁ ∩ W₂. Each side is then expressible as a
linear combination of the {αᵢ}. Since any representation of an element as a
linear combination of the {α₁, ..., αᵣ, β₁, ..., βₛ} is unique, this means that

bⱼ = 0 for all j. By a symmetric argument we see that all cₖ = 0. Finally,
this means that Σᵢ aᵢαᵢ = 0, from which it follows that all aᵢ = 0. This
shows that the spanning set {α₁, ..., αᵣ, β₁, ..., βₛ, γ₁, ..., γₜ} is linearly
independent and a basis of W₁ + W₂. Thus dim (W₁ + W₂) = r + s + t =
(r + s) + (r + t) − r = dim W₁ + dim W₂ − dim (W₁ ∩ W₂). □


As an example, consider in R³ the subspaces W₁ = ⟨(1, 0, 2), (1, 2, 2)⟩
and W₂ = ⟨(1, 1, 0), (0, 1, 1)⟩. Both subspaces are of dimension 2. Since
W₁ ⊂ W₁ + W₂ ⊂ R³ we see that 2 ≤ dim (W₁ + W₂) ≤ 3. Because of
Theorem 4.8 this implies that 1 ≤ dim (W₁ ∩ W₂) ≤ 2. In more familiar
terms, W₁ and W₂ are planes in a 3-dimensional space. Since both planes
contain the origin, they do intersect. Their intersection is either a line or,
in case they coincide, a plane. The first problem is to find a basis for W₁ ∩ W₂.
Any α ∈ W₁ ∩ W₂ must be expressible in the forms α = a(1, 0, 2) +
b(1, 2, 2) = c(1, 1, 0) + d(0, 1, 1). This leads to the three equations:

a + b = c
2b = c + d
2a + 2b = d.

These equations have the solutions b = −3a, c = −2a, d = −4a. Thus
α = a(1, 0, 2) − 3a(1, 2, 2) = a(−2, −6, −4). As a check we also have
α = −2a(1, 1, 0) − 4a(0, 1, 1) = a(−2, −6, −4). We have determined
that {(1, 3, 2)} is a basis of W₁ ∩ W₂. Also {(1, 3, 2), (1, 0, 2)} is a basis
of W₁ and {(1, 3, 2), (1, 1, 0)} is a basis of W₂.
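The same computation can be checked mechanically. In the sketch below (not part of the text; it assumes the SymPy library) the three equations are written as a homogeneous system in a, b, c, d, and the null space of the coefficient matrix gives the one-parameter family of solutions, hence a vector spanning W₁ ∩ W₂.

```python
from sympy import Matrix

# Columns hold the coefficients of a, b, c, d in
# a(1,0,2) + b(1,2,2) - c(1,1,0) - d(0,1,1) = 0.
M = Matrix([[1, 1, -1,  0],
            [0, 2, -1, -1],
            [2, 2,  0, -1]])

for sol in M.nullspace():
    a, b, c, d = sol
    print(a, b, c, d)                                     # proportional to 1, -3, -2, -4
    print(a * Matrix([1, 0, 2]) + b * Matrix([1, 2, 2]))  # a multiple of (1, 3, 2)
```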


We are all familiar with the theorem from solid geometry to the effect
that two non-parallel planes intersect in a line, and the example above is an
illustration of that theorem. In spaces of dimension higher than 3, how-
ever, it is possible for two subspaces of dimension 2 to have but one point
in common. For example, in R⁴ the subspaces W₁ = ⟨(1, 0, 0, 0), (0, 1, 0, 0)⟩
and W₂ = ⟨(0, 0, 1, 0), (0, 0, 0, 1)⟩ are each 2-dimensional and W₁ ∩ W₂ = {0},
W₁ + W₂ = R⁴.
Those cases in which dim (W₁ ∩ W₂) = 0 deserve special mention. If
W₁ ∩ W₂ = {0} we say that the sum W₁ + W₂ is direct; W₁ + W₂ is a
direct sum of W₁ and W₂. To indicate that a sum is direct we use the notation
W₁ ⊕ W₂. For α ∈ W₁ ⊕ W₂ there exist α₁ ∈ W₁ and α₂ ∈ W₂ such that
α = α₁ + α₂. This much is true for any sum of two subspaces. If the sum is
direct, however, α₁ and α₂ are uniquely determined by α. For if α = α₁ + α₂ =
α₁′ + α₂′, then α₁ − α₁′ = α₂′ − α₂. Since the left side is in W₁ and the
right side is in W₂, both are in W₁ ∩ W₂. But this means α₁ − α₁′ = 0 and
α₂′ − α₂ = 0; that is, the decomposition of α into a sum of an element in
W₁ plus an element in W₂ is unique. If V is the direct sum of W₁ and W₂, we say
that W₁ and W₂ are complementary and that W₂ is a complementary subspace
of W₁, or a complement of W₁.


The notion of a direct sum can be extended to a sum of any finite number
of subspaces. The sum W₁ + ··· + Wₖ is said to be direct if for each i,
Wᵢ ∩ (Σⱼ≠ᵢ Wⱼ) = {0}. If the sum of several subspaces is direct, we use the
notation W₁ ⊕ W₂ ⊕ ··· ⊕ Wₖ. In this case, too, α ∈ W₁ ⊕ ··· ⊕ Wₖ
can be expressed uniquely in the form α = Σᵢ αᵢ, αᵢ ∈ Wᵢ.
Theorem 4.9. If W is a subspace of V there exists a subspace W′ such that
V = W ⊕ W′.
proof. Let {α₁, ..., αₘ} be a basis of W. Extend this linearly inde-
pendent set to a basis {α₁, ..., αₘ, αₘ₊₁, ..., αₙ} of V. Let W′ be the sub-
space spanned by {αₘ₊₁, ..., αₙ}. Clearly, W ∩ W′ = {0} and the sum
V = W + W′ is direct. □

Thus every subspace of a finite dimensional vector space has a comple-
mentary subspace. The complement is not unique, however. If for W there
exists a subspace W′ such that V = W ⊕ W′, we say that W is a direct
summand of V.
Theorem 4.10. For a sum of several subspaces of a finite dimensional
vector space to be direct it is necessary and sufficient that dim (W₁ + ··· +
Wₖ) = dim W₁ + ··· + dim Wₖ.
proof. This is an immediate consequence of Theorem 4.8 and the prin-
ciple of mathematical induction.

EXERCISES
1. Let P be the space of all polynomials with real coefficients. Determine which
of the following subsets of P are subspaces.
(a) {p(x) | p(1) = 0}.
(b) {p(x) | constant term of p(x) = 0}.
(c) {p(x) | degree of p(x) = 3}.
(d) {p(x) | degree of p(x) ≤ 3}.
(Strictly speaking, the zero polynomial does not have a degree associated with it.
It is sometimes convenient to agree that the zero polynomial has degree less than
any integer, positive or negative. With this convention the zero polynomial is
included in the set described above, and it is not necessary to add a separate
comment to include it.)
(e) {p(x) | degree of p(x) is even} ∪ {0}.
2. Determine which of the following subsets of Rⁿ are subspaces.
(a) {(x₁, x₂, ..., xₙ) | x₁ = 0}.
(b) {(x₁, x₂, ..., xₙ) | x₁ ≥ 0}.
(c) {(x₁, x₂, ..., xₙ) | x₁ + 2x₂ = 0}.
(d) {(x₁, x₂, ..., xₙ) | x₁ + 2x₂ = 1}.
(e) {(x₁, x₂, ..., xₙ) | x₁ + 2x₂ ≥ 0}.
(f) {(x₁, x₂, ..., xₙ) | mᵢ ≤ xᵢ ≤ Mᵢ, i = 1, 2, ..., n, where the mᵢ and Mᵢ
are constants}.
(g) {(x₁, x₂, ..., xₙ) | x₁ = x₂ = ··· = xₙ}.

3. What is the essential difference between the condition used to define the
subset in (c) of Exercise 2 and the condition used in (d)? Is the lack of a non-zero
constant term important in (c)?

4. What is the essential difference between the condition used to define the
subset in (c) of Exercise 2 and the condition used in (e)? What, in general, are the
differences between the conditions in (a), (c), and (g) and those in (b), (e), and (f)?

5. Show that {(1, 1,0, 0), (1, 0, 1, 1)} and {(2, -1, 3, 3), (0, 1, -1, -1)} span
the same subspace of R4 .

6. Let W be the subspace of R⁵ spanned by {(1, 1, 1, 1, 1), (1, 0, 1, 0, 1),
(0, 1, 1, 1, 0), (2, 0, 0, 1, 1), (2, 1, 1, 2, 1), (1, −1, −1, −2, 2), (1, 2, 3, 4, −1)}.
Find a basis for W and the dimension of W.
7. Show that {(1, -1,2, -3), (1, 1, 2, 0), (3, -1,6, -6)} and {(1,0,1,0),
(0, 2, 0, 3)} do not span the same subspace.
8. Let W₁ = ⟨(1, 2, 3, 6), (4, −1, 3, 6), (5, 1, 6, 12)⟩ and W₂ = ⟨(1, −1, 1, 1),
(2, −1, 4, 5)⟩ be subspaces of R⁴. Find bases for W₁ ∩ W₂ and W₁ + W₂. Extend
the basis of W₁ ∩ W₂ to a basis of W₁, and extend the basis of W₁ ∩ W₂ to a basis
of W₂. From these bases obtain a basis of W₁ + W₂.

9. Let P be the space of all polynomials with real coefficients, and let W₁ =
{p(x) | p(1) = 0} and W₂ = {p(x) | p(2) = 0}. Determine W₁ ∩ W₂ and W₁ + W₂.
(These spaces are infinite dimensional and the student is not expected to find
bases for these subspaces. What is expected is a simple criterion or description
of these subspaces.)

10. We have already seen (Section 1, Exercise 11) that the real numbers form
a vector space over the rationals. Show that {1, √2} and {1 − √2, 1 + √2}
span the same subspace.
11. Show that if W₁ and W₂ are subspaces, then W₁ ∪ W₂ is not a subspace
unless one is a subspace of the other.

12. Show that the set of all vectors (x₁, x₂, x₃, x₄) ∈ R⁴ satisfying the equations

3x₁ − 2x₂ − x₃ − 4x₄ = 0
x₁ + x₂ − 2x₃ − 3x₄ = 0

is a subspace of R⁴. Find a basis for this subspace. (Hint: Solve the equations for
x₁ and x₂ in terms of x₃ and x₄. Then specify various values for x₃ and x₄ to obtain
as many linearly independent vectors as are needed.)

13. Let S, T, and T* be three subspaces of V (of finite dimension) for which
(a) S ∩ T = S ∩ T*, (b) S + T = S + T*, (c) T ⊂ T*. Show that T = T*.

14. Show by example that it is possible to have S ⊕ T = S ⊕ T* without having
T = T*.

15. If V = W₁ ⊕ W₂ and W is any subspace of V such that W₁ ⊂ W, show that
W = (W ∩ W₁) ⊕ (W ∩ W₂). Show by an example that the condition W₁ ⊂ W
(or W₂ ⊂ W) is necessary.
chapter II
Linear transformations and matrices

In this chapter we define linear transformations and various operations:


addition of two linear transformations, multiplication of two linear trans-
formations, and multiplication of a linear transformation by a scalar.
Linear transformations are functions of vectors in one vector space U
with values which are vectors in the same or another vector space V which
preserve linear combinations. They can be represented by matrices in the
same sense that vectors can be represented by n-tuples. This representation
requires that operations of addition, multiplication, and scalar multiplication
of matrices be defined to correspond to these operations with linear trans-
formations. Thus we establish an algebra of matrices by means of the
conceptually simpler algebra of linear transformations.
The matrix representing a linear transformation of U into V depends on
the choice of a basis in U and a basis in V. Our first problem, a recurrent
problem whenever matrices are used to represent anything, is to see how a
change in the choice of bases determines a corresponding change in the
matrix representing the linear transformation. Two matrices which represent
the same linear transformation with respect to different sets of bases must
have some properties in common. This leads to the idea of equivalence
relations among matrices. The exact nature of this equivalence relation

depends on the bases which are permitted.


In this chapter no restriction is placed on the bases which are permitted
and we obtain the widest kind of equivalence. In Chapter III we identify
U and V and require that the same basis be used in both. This yields a
more restricted kind of equivalence, and a study of this equivalence is both
interesting and fruitful. In Chapter V we make further restrictions in the
permissible bases and obtain an even more restricted equivalence.
When no restriction is placed on the bases which are permitted, the

equivalence is so broad that it is relatively uninteresting. Very useful results


are obtained, however, when we are permitted to change basis only in the
image space V. In every set of mutually equivalent matrices we select one,
representative of all of them, which we call a normal form, in this case

the Hermite normal form. The Hermite normal form is one of our most
important and effective computational tools, far exceeding in utility its
application to the study of this particular equivalence relation.
The pattern we have described is worth conscious notice since it is re-
current and the principal underlying theme in this exposition of matrix
theory. We define a concept, find a representation suitable for effective
computation, change bases to see how this change affects the representation,
and then seek a normal form in each class of equivalent representations.

1 | Linear Transformations

Let U and V be vector spaces over the same field of scalars F.

Definition. A linear transformation σ of U into V is a single-valued mapping
of U into V which associates to each element α ∈ U a unique element σ(α) ∈ V
such that for all α, β ∈ U and all a, b ∈ F we have

σ(aα + bβ) = aσ(α) + bσ(β). (1.1)

We call σ(α) the image of α under the linear transformation σ. If β ∈ V,
then any vector α ∈ U such that σ(α) = β is called an inverse image of β.
The set of all α ∈ U such that σ(α) = β is called the complete inverse image
of β, and it is denoted by σ⁻¹(β). Generally, σ⁻¹(β) need not be a single
element as there may be more than one α ∈ U such that σ(α) = β.
By taking particular choices for a and b we see that for a linear trans-
formation σ(α + β) = σ(α) + σ(β) and σ(aα) = aσ(α). Loosely speaking,

the image of the sum is the sum of the images and the image of the product
is the product of the images. This descriptive language has to be interpreted
generously since the operations before and after applying the linear trans-
formation may take place in different vector spaces. Furthermore, the remark
about scalar multiplication is inexact since we do not apply the linear trans-
formation to scalars; the linear transformation is defined only for vectors
in U. Even so, the linear transformation does preserve the structural
operations in a vector space and this is the reason for its importance. Gener-
ally, in algebra a structure-preserving mapping is called a homomorphism.
To describe the special role of the elements of F in the condition σ(aα) =
aσ(α), we say that a linear transformation is a homomorphism over F, or an
F-homomorphism.
If for α ≠ β it necessarily follows that σ(α) ≠ σ(β), the homomorphism σ
is said to be one-to-one and it is called a monomorphism. If A is any subset of

U, σ(A) will denote the set of all images of elements of A; σ(A) = {β | β =
σ(α) for some α ∈ A}. σ(A) is called the image of A. σ(U) is often denoted by
Im(σ) and is called the image of σ. If Im(σ) = V we shall say that the homo-
morphism is a mapping onto V and it is called an epimorphism.
We call the set U, on which the linear transformation a is defined, the
domain of a. We call V, the set in which the images of a are defined, the
codomain of a. Strictly speaking, a linear transformation must specify
the domain and codomain as well as the mapping. For example, consider
the linear transformation that maps every vector of U onto the zero vector of
V. This mapping is called the zero mapping. If W
is any subspace of V, there is

also a zero mapping of U into W, and this mapping has the same effect on the
elements of U as the zero mapping of U into V. However, they are different
linear transformations since they have different codomains. This may seem
like an unnecessarily fine distinction. Actually, for most of this book we
could get along without this degree of precision. But the more deeply we go
into linear algebra the more such precision is needed. In this book we need
this much care when we discuss dual spaces and dual transformations in
Chapter IV.
A homomorphism that is both an epimorphism and a monomorphism is
called an isomorphism. If β ∈ V, the fact that σ is an epimorphism says that
there is an α ∈ U such that σ(α) = β. The fact that σ is a monomorphism says
that this α is unique. Thus, for an isomorphism, we can define an inverse
mapping σ⁻¹ that maps β onto α.
Theorem 1.1. The inverse σ⁻¹ of an isomorphism is also an isomorphism.
proof. Since σ⁻¹ is obviously one-to-one and onto, it is necessary only
to show that it is linear. If α′ = σ(α) and β′ = σ(β), then σ(aα + bβ) =
aα′ + bβ′ so that σ⁻¹(aα′ + bβ′) = aα + bβ = aσ⁻¹(α′) + bσ⁻¹(β′).
1

_1
For the inverse isomorphism (a) is an element of U. This conflicts with
cr
_1
the previously given definition of (a) as a complete inverse image in which
ff
-1
o-\dL) is a subset of U. However, the symbol cr , standing alone, will
always be used to denote an isomorphism, and in this case there is no diffi-
_1
culty caused by the fact that <r (a) might denote either an element or a one-
element set,

Let us give some examples of linear transformations. Let U = V = P,
the space of polynomials in x with coefficients in R. For α = Σⁿᵢ₌₀ aᵢxⁱ
define σ(α) = dα/dx = Σⁿᵢ₌₁ iaᵢxⁱ⁻¹. In calculus one of the very first things
proved about the derivative is that it is linear: d(α + β)/dx = dα/dx + dβ/dx
and d(aα)/dx = a(dα/dx). The mapping τ(α) = Σⁿᵢ₌₀ aᵢ/(i + 1) · xⁱ⁺¹ is also linear. Notice
that this is not the indefinite integral since we have specified that the constant

of integration shall be zero. Notice that σ is onto but not one-to-one and τ
is one-to-one but not onto.
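These two transformations are easy to experiment with if a polynomial is stored as its list of coefficients. The following sketch is not from the text; it is a plain Python illustration of σ and τ acting on coefficient lists.

```python
# A polynomial is represented by a coefficient list: p[i] is the coefficient of x**i.

def sigma(p):
    """Differentiation: sends sum a_i x^i to sum i*a_i x^(i-1)."""
    return [i * p[i] for i in range(1, len(p))] or [0]

def tau(p):
    """Integration with zero constant of integration: a_i x^i -> a_i/(i+1) x^(i+1)."""
    return [0] + [p[i] / (i + 1) for i in range(len(p))]

p = [5.0, 4.0, 3.0]          # 5 + 4x + 3x^2
print(sigma(p))              # [4.0, 6.0], i.e. 4 + 6x
print(sigma(tau(p)))         # [5.0, 4.0, 3.0] -- differentiating the antiderivative recovers p
print(sigma([7.0]))          # [0] -- constants are sent to 0, so sigma is not one-to-one
print(tau(p)[0])             # 0 -- every image of tau has constant term 0, so tau is not onto
```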

Let U = Rⁿ and V = Rᵐ with m ≤ n. For each α = (a₁, ..., aₙ) ∈ Rⁿ
define σ(α) = (a₁, ..., aₘ) ∈ Rᵐ. It is clear that this linear transformation
is one-to-one if and only if m = n, but it is onto. For each β = (b₁, ...,
bₘ) ∈ Rᵐ define τ(β) = (b₁, ..., bₘ, 0, ..., 0) ∈ Rⁿ. This linear transforma-
tion is one-to-one, but it is onto if and only if m = n.


Let U = V. For a given scalar a ∈ F the mapping of α onto aα is linear since

a(α + β) = aα + aβ = a(α) + a(β),
and
a(bα) = (ab)α = (ba)α = b·a(α).

To simplify notation we also denote this linear transformation by a. Linear
transformations of this type are called scalar transformations, and there is a
one-to-one correspondence between the field of scalars and the set of scalar
transformations. In particular, the linear transformation that leaves every
vector fixed is denoted by 1. It is called the identity transformation or unit
transformation. If linear transformations in several vector spaces are being
discussed at the same time, it may be desirable to identify the space on which
the identity transformation is defined. Thus 1_U will denote the identity
transformation on U.
When a basis of a finite dimensional vector space V is used to establish a
correspondence between vectors in V and n-tuples in Fⁿ, this correspondence
is an isomorphism. The required arguments have already been given in
Section 1-3. Since V and Fⁿ are isomorphic, it is theoretically possible to
discuss the properties of V by examining the properties of Fⁿ. However, there
is much interest and importance attached to concepts that are independent
of the choice of a basis. If a homomorphism or isomorphism can be defined
uniquely by intrinsic properties independent of a choice of basis the mapping is
said to be natural or canonical. In particular, any two vector spaces of
dimension n over F are isomorphic. Such an isomorphism can be established
by setting up an isomorphism between each one and Fⁿ. This isomorphism
will be dependent on a choice of a basis in each space. Such an isomorphism,
dependent upon the arbitrary choice of bases, is not canonical.
Next, let us define the various operations between linear transformations.
For each pair σ, τ of linear transformations of U into V, define σ + τ by the
rule
(σ + τ)(α) = σ(α) + τ(α) for all α ∈ U.

σ + τ is a linear transformation since

(σ + τ)(aα + bβ) = σ(aα + bβ) + τ(aα + bβ) = aσ(α) + bσ(β)
+ aτ(α) + bτ(β) = a[σ(α) + τ(α)] + b[σ(β) + τ(β)]
= a(σ + τ)(α) + b(σ + τ)(β).


Observe that addition of linear transformations is commutative: σ + τ =
τ + σ.

For each linear transformation σ and a ∈ F define aσ by the rule (aσ)(α) =
a[σ(α)]. aσ is a linear transformation.


It is not difficult to show that with these two operations the set of all linear

transformations of U into V is itself a vector space over F. This is a very


important fact and we occasionally refer to it and make use of it. However,
we wish to emphasize that we define the sum of two linear transformations
if and only if they both have the same domain and the same codomain.
It is neither necessary nor sufficient that they have the same image, or that the
image of one be a subset of the image of the other. It is simply a question of
being clear about the terminology and its meaning. The set of all linear
transformations of U into V will be denoted by Hom(U, V).
There is another, entirely new, operation that we need to define. Let W
be a third vector space over F. Let σ be a linear transformation of U into V
and τ a linear transformation of V into W. By τσ we denote the linear trans-
formation of U into W defined by the rule: (τσ)(α) = τ[σ(α)]. Notice that
in this context στ has no meaning. We refer to this operation as either
iteration or multiplication of linear transformations, and τσ is called the
product of τ and σ.

The operations between linear transformations are related by the following
rules:

1. Multiplication is associative: π(τσ) = (πτ)σ. Here π is a linear
transformation of W into a fourth vector space X.
2. Multiplication is distributive with respect to addition:

(τ₁ + τ₂)σ = τ₁σ + τ₂σ and τ(σ₁ + σ₂) = τσ₁ + τσ₂.

3. Scalar multiplication commutes with multiplication: a(τσ) = τ(aσ).


These properties are easily proved and are left to the reader.

Notice that if W ≠ U, then τσ is defined but στ is not. If all linear trans-
formations under consideration are mappings of a vector space U into itself,
then these linear transformations can be multiplied in any order. This means
that τσ and στ would both be defined, but it would not mean that τσ = στ.

The set of linear transformations of a vector space into itself is a vector
space, as we have already observed, and now we have defined a product
which satisfies the three conditions given above. Such a space is called an
associative algebra. In our case the algebra consists of linear transformations
and it is known as a linear algebra. However, the use of terms is always in
a state of flux, and today this term is used in a more inclusive sense. When
referring to a particular set with an algebraic structure, "linear algebra"
still denotes what we have just described. But when referring to an area of
study, the term "linear algebra" includes virtually every concept in which
linear transformations play a role, including linear transformations between
different vector spaces (in which the linear transformations cannot always
be multiplied), sequences of vector spaces, and even mappings of sets of
linear transformations (since they also have the structure of a vector space).
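A small illustration of iteration, not from the text: writing two linear transformations of R² into itself as plain Python functions shows concretely that τσ and στ are in general different.

```python
def sigma(v):
    """Reflection of R^2 in the line x1 = x2: (x1, x2) -> (x2, x1)."""
    x1, x2 = v
    return (x2, x1)

def tau(v):
    """Projection of R^2 onto the x1-axis: (x1, x2) -> (x1, 0)."""
    x1, x2 = v
    return (x1, 0)

def compose(f, g):
    """Return the product fg, i.e. the map alpha -> f(g(alpha))."""
    return lambda v: f(g(v))

v = (1, 2)
print(compose(tau, sigma)(v))   # (2, 0): reflect first, then project
print(compose(sigma, tau)(v))   # (0, 1): project first, then reflect -- a different result
```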

Theorem 1.2. Im(σ) is a subspace of V.
proof. If α′ and β′ are elements of Im(σ), there exist α, β ∈ U such that
σ(α) = α′ and σ(β) = β′. For any a, b ∈ F, σ(aα + bβ) = aσ(α) + bσ(β) =
aα′ + bβ′ ∈ Im(σ). Thus Im(σ) is a subspace of V. □

Corollary 1.3. If U₁ is a subspace of U, then σ(U₁) is a subspace of V.

It follows from this corollary that σ(0) = 0, where the 0 on the left denotes
the zero vector of U and the 0 on the right the zero vector of V. It is even
easier, however, to show it directly. Since σ(0) = σ(0 + 0) = σ(0) + σ(0)
it follows from the uniqueness of the zero vector that σ(0) = 0.

For the rest of this book, unless specific comment is made, we assume
that all vector spaces under consideration are finite dimensional. Let
dim U = n and dim V = m.
The dimension of the subspace Im(σ) is called the rank of the linear trans-
formation σ. The rank of σ is denoted by ρ(σ).

Theorem 1.4. ρ(σ) ≤ min {m, n}.
proof. If {α₁, ..., αₛ} is linearly dependent in U, there exists a non-trivial
relation of the form Σᵢ aᵢαᵢ = 0. But then Σᵢ aᵢσ(αᵢ) = σ(0) = 0; that is,
{σ(α₁), ..., σ(αₛ)} is linearly dependent in V. A linear transformation
preserves linear relations and transforms dependent sets into dependent
sets. Thus, there can be no more than n linearly independent elements in
Im(σ). In addition, Im(σ) is a subspace of V so that dim Im(σ) ≤ m. Thus
ρ(σ) = dim Im(σ) ≤ min {m, n}.

Theorem 1.5. If W is a subspace of V, the set σ⁻¹(W) of all α ∈ U such that
σ(α) ∈ W is a subspace of U.
proof. If α, β ∈ σ⁻¹(W), then σ(aα + bβ) = aσ(α) + bσ(β) ∈ W. Thus
aα + bβ ∈ σ⁻¹(W) and σ⁻¹(W) is a subspace.
The subspace K(σ) = σ⁻¹(0) is called the kernel of the linear transformation
σ. The dimension of K(σ) is called the nullity of σ. The nullity of σ is denoted
by ν(σ).

Theorem 1.6. ρ(σ) + ν(σ) = n.
proof. Let {α₁, ..., αᵥ, β₁, ..., βₖ} be a basis of U such that {α₁, ..., αᵥ}
is a basis of K(σ). For α = Σᵢ aᵢαᵢ + Σⱼ bⱼβⱼ ∈ U we see that σ(α) =
Σᵢ aᵢσ(αᵢ) + Σⱼ bⱼσ(βⱼ) = Σⱼ bⱼσ(βⱼ). Thus {σ(β₁), ..., σ(βₖ)} spans Im(σ).
On the other hand, if Σⱼ cⱼσ(βⱼ) = 0, then σ(Σⱼ cⱼβⱼ) = Σⱼ cⱼσ(βⱼ) = 0; that

is, Σⱼ cⱼβⱼ ∈ K(σ). In this case there exist coefficients dᵢ such that Σⱼ cⱼβⱼ =
Σᵢ dᵢαᵢ. If any of these coefficients were non-zero we would have a non-
trivial relation among the elements of {α₁, ..., αᵥ, β₁, ..., βₖ}. Hence, all
cⱼ = 0 and {σ(β₁), ..., σ(βₖ)} is linearly independent. But then it is a basis
of Im(σ) so that k = ρ(σ). Thus ρ(σ) + ν(σ) = n. □


Theorem 1.6 has an important geometric interpretation. Suppose that
a 3-dimensional vector space R³ were mapped onto a 2-dimensional vector
space R². In this case, it is simplest and sufficiently accurate to think of σ
as the linear transformation which maps (a₁, a₂, a₃) ∈ R³ onto (a₁, a₂) ∈ R²,
which we can identify with (a₁, a₂, 0) ∈ R³. Since ρ(σ) = 2, ν(σ) = 1.
Clearly, every point (0, 0, a₃) on the x₃-axis is mapped onto the origin.
Thus K(σ) is the x₃-axis, the line through the origin in the direction of the
projection, and {(0, 0, 1) = α₁} is a basis of K(σ). It should be evident
that any plane through the origin not containing K(σ) will be projected onto
the x₁x₂-plane and that this mapping is one-to-one and onto. Thus the com-
plementary subspace ⟨β₁, β₂⟩ can be taken to be any plane through the origin
not containing the x₃-axis. This illustrates the wide latitude of choice possible
for the complementary subspace ⟨β₁, ..., βₖ⟩.
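The projection just described can be checked numerically. The sketch below is not from the text; it assumes NumPy, writes the rule (a₁, a₂, a₃) ↦ (a₁, a₂) as a 2 × 3 array, and verifies ρ(σ) + ν(σ) = n.

```python
import numpy as np

# sigma: R^3 -> R^2, (a1, a2, a3) |-> (a1, a2), written out as an array.
S = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

rank = np.linalg.matrix_rank(S)
nullity = S.shape[1] - rank             # Theorem 1.6: rho(sigma) + nu(sigma) = n
print(rank, nullity)                    # 2 1

# (0, 0, 1) spans the kernel: it is sent to the zero vector.
print(S @ np.array([0.0, 0.0, 1.0]))    # [0. 0.]
```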

Theorem 1.7. A linear transformation σ of U into V is a monomorphism if
and only if ν(σ) = 0, and it is an epimorphism if and only if ρ(σ) = dim V.
proof. K(σ) = {0} if and only if ν(σ) = 0. If σ is a monomorphism, then
certainly K(σ) = {0} and ν(σ) = 0. On the other hand, if ν(σ) = 0 and
σ(α) = σ(β), then σ(α − β) = 0 so that α − β ∈ K(σ) = {0}. Thus, if
ν(σ) = 0, σ is a monomorphism.
It is but a matter of reading the definitions to see that σ is an epimorphism
if and only if ρ(σ) = dim V.

If n < dim V = m, then ρ(σ) = n − ν(σ) ≤ n < m so that
σ cannot be an epimorphism. If n > m, then ν(σ) = n − ρ(σ) ≥ n − m > 0,
so that σ cannot be a monomorphism. Any linear transformation from a
vector space into a vector space of higher dimension must fail to be an
epimorphism. Any linear transformation from a vector space into a vector
space of lower dimension must fail to be a monomorphism.

Theorem 1.8. Let U and V have the same finite dimension n. A linear
transformation σ of U into V is an isomorphism if and only if it is an epimorphism.
σ is an isomorphism if and only if it is a monomorphism.
proof. It is part of the definition of an isomorphism that it is both an
epimorphism and a monomorphism. Suppose σ is an epimorphism. ρ(σ) =
n and ν(σ) = 0 by Theorem 1.6. Hence, σ is a monomorphism. Conversely,
if σ is a monomorphism, then ν(σ) = 0 and, by Theorem 1.6, ρ(σ) = n.
Hence, σ is an epimorphism.

Thus a linear transformation σ of U into V is an isomorphism if two of the
following three conditions are satisfied: (1) dim U = dim V, (2) σ is an
epimorphism, (3) σ is a monomorphism.

Theorem 1.9. ρ(σ) = ρ(τσ) + dim {Im(σ) ∩ K(τ)}.
proof. Let τ′ be a new linear transformation defined on Im(σ), mapping
Im(σ) into W, so that for all α ∈ Im(σ), τ′(α) = τ(α). Then K(τ′) = Im(σ) ∩
K(τ) and ρ(τ′) = dim τ′[Im(σ)] = dim τσ(U) = ρ(τσ). Then Theorem 1.6
takes the form

ρ(τ′) + ν(τ′) = dim Im(σ),

or

ρ(τσ) + dim {Im(σ) ∩ K(τ)} = ρ(σ). □


Corollary 1.10. ρ(τσ) = dim {Im(σ) + K(τ)} − ν(τ).
proof. This follows from Theorem 1.9 by application of Theorem 4.8
of Chapter I.

Corollary 1.11. If K(τ) ⊂ Im(σ), then ρ(σ) = ρ(τσ) + ν(τ).

Theorem 1.12. The rank of a product of linear transformations is less than
or equal to the rank of either factor: ρ(τσ) ≤ min {ρ(τ), ρ(σ)}.
proof. The rank of τσ is the dimension of τ[σ(U)] ⊂ τ(V). Thus, consider-
ing dim σ(U) as the "n" and dim τ(V) as the "m" of Theorem 1.4, we see that
dim τσ(U) = ρ(τσ) ≤ min {dim σ(U), dim τ(V)} = min {ρ(σ), ρ(τ)}. □
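Theorem 1.12 is easy to test on examples once linear transformations are represented by matrices (Section 2). The sketch below is not from the text; it assumes NumPy and checks the inequality for one randomly chosen pair of transformations.

```python
import numpy as np

rng = np.random.default_rng(0)

# tau: R^4 -> R^3 and sigma: R^5 -> R^4, represented by random integer matrices.
T = rng.integers(-2, 3, size=(3, 4))
S = rng.integers(-2, 3, size=(4, 5))

r_t = np.linalg.matrix_rank(T)
r_s = np.linalg.matrix_rank(S)
r_ts = np.linalg.matrix_rank(T @ S)    # the product tau sigma : R^5 -> R^3

print(r_t, r_s, r_ts)
print(r_ts <= min(r_t, r_s))           # True, as Theorem 1.12 asserts
```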
Theorem 1.13. If σ is an epimorphism, then ρ(τσ) = ρ(τ). If τ is a mono-
morphism, then ρ(τσ) = ρ(σ).
proof. If σ is an epimorphism, then K(τ) ⊂ Im(σ) = V and Corollary
1.11 applies. Thus ρ(τσ) = ρ(σ) − ν(τ) = m − ν(τ) = ρ(τ). If τ is a
monomorphism, then K(τ) = {0} ⊂ Im(σ) and Corollary 1.11 applies.
Thus ρ(τσ) = ρ(σ) − ν(τ) = ρ(σ). □


Corollary 1.14. The rank of a linear transformation is not changed by
multiplication by an isomorphism (on either side).

Theorem 1.15. σ is an epimorphism if and only if τσ = 0 implies τ = 0.
τ is a monomorphism if and only if τσ = 0 implies σ = 0.
proof. Suppose σ is an epimorphism. Assume τσ is defined and τσ = 0.
If τ ≠ 0, there is a β ∈ V such that τ(β) ≠ 0. Since σ is an epimorphism,
there is an α ∈ U such that σ(α) = β. Then τσ(α) = τ(β) ≠ 0. This is a
contradiction and hence τ = 0. Now, suppose τσ = 0 implies τ = 0. If σ is
not an epimorphism then Im(σ) is a subspace of V but Im(σ) ≠ V. Let
{β₁, ..., βᵣ} be a basis of Im(σ), and extend this independent set to a basis
{β₁, ..., βᵣ, ..., βₘ} of V. Define τ(βᵢ) = βᵢ for i > r and τ(βᵢ) = 0 for

i ≤ r. Then τσ = 0 and τ ≠ 0. This is a contradiction and, hence, σ is an
epimorphism.
Now, assume τσ is defined and τσ = 0. Suppose τ is a monomorphism.
If σ ≠ 0, there is an α ∈ U such that σ(α) ≠ 0. Since τ is a monomorphism,
τσ(α) ≠ 0. This is a contradiction and, hence, σ = 0. Now assume τσ = 0
implies σ = 0. If τ is not a monomorphism there is an α ∈ V such that α ≠ 0
and τ(α) = 0. Let {α₁, ..., αₙ} be any basis of U. Define σ(αᵢ) = α for each
i. Then τσ(αᵢ) = τ(α) = 0 for all i and τσ = 0. This is a contradiction and,
hence, τ is a monomorphism.

Corollary 1.16. σ is an epimorphism if and only if τ₁σ = τ₂σ implies
τ₁ = τ₂. τ is a monomorphism if and only if τσ₁ = τσ₂ implies σ₁ = σ₂.

The statement that τ₁σ = τ₂σ implies τ₁ = τ₂ is called a right-cancellation,
and the statement that τσ₁ = τσ₂ implies σ₁ = σ₂ is called a left-cancellation.
Thus, an epimorphism is a linear transformation that can be cancelled on the
right, and a monomorphism is a linear transformation that can be cancelled
on the left.

Theorem 1.17. Let A = {α₁, ..., αₙ} be any basis of U. Let B = {β₁, ...,
βₙ} be any n vectors in V (not necessarily linearly independent). There exists
a uniquely determined linear transformation σ of U into V such that σ(αᵢ) = βᵢ
for i = 1, 2, ..., n.
proof. Since A is a basis of U, any vector α ∈ U can be expressed uniquely
in the form α = Σⁿᵢ₌₁ aᵢαᵢ. If σ is to be linear we must have

σ(α) = Σⁿᵢ₌₁ aᵢσ(αᵢ) = Σⁿᵢ₌₁ aᵢβᵢ ∈ V.

It is a simple matter to verify that the mapping so defined is linear.

Corollary 1.18. Let C = {γ₁, ..., γᵣ} be any linearly independent set in U,
where U is finite dimensional. Let D = {δ₁, ..., δᵣ} be any r vectors in V.
There exists a linear transformation σ of U into V such that σ(γᵢ) = δᵢ for
i = 1, ..., r.
proof. Extend C to a basis of U. Define σ(γᵢ) = δᵢ for i = 1, ..., r,
and define the values of σ on the other elements of the basis arbitrarily. This
will yield a linear transformation σ with the desired properties.
will yield a linear transformation a with the desired properties.

It should be clear that, if C is not already a basis, there are many ways to
define σ. It is worth pointing out that the independence of the set C is crucial
to proving the existence of the linear transformation with the desired prop-
erties. Otherwise, a linear relation among the elements of C would impose
a corresponding linear relation among the elements of D, which would mean
that D could not be arbitrary.

Theorem 1.17 establishes, for one thing, that linear transformations really
do exist. Moreover, they exist in abundance. The real utility of this theorem
and its corollary is that it enables us to establish the existence of a linear
transformation with some desirable property with great convenience. All
we have to do is to define this function on an independent set.
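A sketch of the construction in Theorem 1.17 (not from the text; plain Python with NumPy, and the prescribed images are invented for illustration): once the images of the basis vectors are fixed, the value of σ at any vector is forced by linearity.

```python
import numpy as np

# Prescribed values of sigma on a basis {alpha_1, alpha_2} of R^2, written in coordinates:
# sigma(alpha_1) = (1, 0, 2) and sigma(alpha_2) = (1, 2, 2) in V = R^3.
images = [np.array([1.0, 0.0, 2.0]), np.array([1.0, 2.0, 2.0])]

def sigma(coords):
    """Extend the prescribed values linearly: alpha = a1*alpha_1 + a2*alpha_2
    is sent to a1*sigma(alpha_1) + a2*sigma(alpha_2)."""
    return sum(a * beta for a, beta in zip(coords, images))

print(sigma((1.0, 0.0)))    # [1. 0. 2.]  -- the prescribed image of alpha_1
print(sigma((2.0, -1.0)))   # [1. -2. 2.] -- 2*(1,0,2) - (1,2,2), forced by linearity
```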
Definition. A linear transformation π of V into itself with the property that
π² = π is called a projection.

Theorem 1.19. If π is a projection of V into itself, then V = Im(π) ⊕ K(π)
and π acts like the identity on Im(π).
proof. For α ∈ V, let α₁ = π(α). Then π(α₁) = π²(α) = π(α) = α₁. This
shows that π acts like the identity on Im(π). Let α₂ = α − α₁. Then π(α₂) =
π(α) − π(α₁) = α₁ − α₁ = 0. Thus α = α₁ + α₂ where α₁ ∈ Im(π) and
α₂ ∈ K(π). Clearly, Im(π) ∩ K(π) = {0}.

Fig. 1

If S = Im(π) and T = K(π), we say that π is a projection of V onto S along
T. In the case where V is the real plane, Fig. 1 indicates the interpretation of
these words: α is projected onto a point of S in a direction parallel to T.
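Since Fig. 1 cannot be reproduced here, a small numerical stand-in may help (a sketch assuming NumPy; the particular matrix is invented for illustration and, strictly, anticipates the matrix representation of Section 2).

```python
import numpy as np

# A projection of R^2 onto the x1-axis along the line x1 = x2:
# (x1, x2) |-> (x1 - x2, 0).  Here S = Im(pi) is the x1-axis and
# T = K(pi) is the line spanned by (1, 1).
P = np.array([[1.0, -1.0],
              [0.0,  0.0]])

print(np.allclose(P @ P, P))      # True: pi^2 = pi
print(P @ np.array([3.0, 1.0]))   # [2. 0.]  -- the image lies on S
print(P @ np.array([1.0, 1.0]))   # [0. 0.]  -- (1, 1) spans the kernel T
print(P @ np.array([2.0, 0.0]))   # [2. 0.]  -- pi is the identity on S
```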

EXERCISES
1. Show that σ((x₁, x₂)) = (x₂, x₁) defines a linear transformation of R² into
itself.

2. Let σ₁((x₁, x₂)) = (x₂, −x₁) and σ₂((x₁, x₂)) = (x₁, −x₂). Determine σ₁ + σ₂,
σ₁σ₂ and σ₂σ₁.

3. Let U = V = Rⁿ and let σ((x₁, x₂, ..., xₙ)) = (x₁, x₂, ..., xₖ, 0, ..., 0)
where k < n. Describe Im(σ) and K(σ).

4. Let σ((x₁, x₂, x₃, x₄)) = (3x₁ − 2x₂ − x₃ − 4x₄, x₁ + x₂ − 2x₃ − 3x₄). Show
that σ is a linear transformation. Determine the kernel of σ.

5. Let σ((x₁, x₂, x₃)) = (2x₁ + x₂ + 3x₃, 3x₁ − x₂ + x₃, −4x₁ + 3x₂ + x₃). Find

a basis of σ(U). (Hint: Take particular values of the xᵢ to find a spanning set for
σ(U).) Find a basis of K(σ).
6. Let D denote the operator of differentiation,

D(y) = dy/dx,    D²(y) = D[D(y)] = d²y/dx²,    etc.

Show that Dⁿ is a linear transformation, and also that p(D) is a linear transforma-
tion if p(D) is a polynomial in D with constant coefficients. (Here we must assume
that the space of functions on which D is defined contains only functions differen-
tiable at least as often as the degree of p(D).)
tiable at least as often as the degree of p(D).)

7. Let U = V and let σ and τ be linear transformations of U into itself. In this
case στ and τσ are both defined. Construct an example to show that it is not
always true that στ = τσ.

8. Let U = V = P, the space of polynomials in x with coefficients in R. For
α = Σⁿᵢ₌₀ aᵢxⁱ let

σ(α) = Σⁿᵢ₌₁ iaᵢxⁱ⁻¹
and
τ(α) = Σⁿᵢ₌₀ aᵢ/(i + 1) · xⁱ⁺¹.

Show that στ = 1, but that τσ ≠ 1.

9. Show that if two scalar transformations coincide on U then the defining


scalars are equal.

10. Let σ be a linear transformation of U into V and let A = {α₁, ..., αₙ} be a
basis of U. Show that if the values {σ(α₁), ..., σ(αₙ)} are known, then the value
of σ(α) can be computed for each α ∈ U.

11. Let U and V be vector spaces of dimensions n and m, respectively, over the
same field F. We have already commented that the set of all linear transformations
of U into V forms a vector space. Give the details of the proof of this assertion.
Let A = {α₁, ..., αₙ} be a basis of U and B = {β₁, ..., βₘ} be a basis of V. Let σᵢⱼ
be the linear transformation of U into V such that

σᵢⱼ(αₖ) = 0 if k ≠ j,
σᵢⱼ(αₖ) = βᵢ if k = j.

Show that {σᵢⱼ | i = 1, ..., m; j = 1, ..., n} is a basis of this vector space.

For the following sequence of problems let dim U = n and dim V = m. Let σ
be a linear transformation of U into V and τ a linear transformation of V into W.

12. Show that ρ(σ) ≤ ρ(τσ) + ν(τ). (Hint: Let V′ = σ(U) and apply Theorem
1.6 to τ defined on V′.)
13. Show that max {0, ρ(σ) + ρ(τ) − m} ≤ ρ(τσ) ≤ min {ρ(τ), ρ(σ)}.
14. Show that max {n − m + ν(τ), ν(σ)} ≤ ν(τσ) ≤ min {n, ν(σ) + ν(τ)}. (For
m = n this inequality is known as Sylvester's law of nullity.)
15. Show that if ν(τ) = 0, then ρ(τσ) = ρ(σ).
16. It is not generally true that ν(σ) = 0 implies ρ(τσ) = ρ(τ). Construct an
example to illustrate this fact. (Hint: Let m be very large.)

17. Show that if m = n and ν(σ) = 0, then ρ(τσ) = ρ(τ).


18. Show that if σ₁ and σ₂ are linear transformations of U into V, then

ρ(σ₁ + σ₂) ≤ min {m, n, ρ(σ₁) + ρ(σ₂)}.

19. Show that |ρ(σ₁) − ρ(σ₂)| ≤ ρ(σ₁ + σ₂).

20. If S is any subspace of V there is a subspace T such that V = S ⊕ T. Then
every α ∈ V can be represented uniquely in the form α = α₁ + α₂ where α₁ ∈ S and
α₂ ∈ T. Show that the mapping π which maps α onto α₁ is a linear transformation.
Show that T is the kernel of π. Show that π² = π. The mapping π is called a
projection of V onto S along T.

21. (Continuation) Let π be a projection. Show that 1 − π is also a projection.
What is the kernel of 1 − π? Onto what subspace is 1 − π a projection? Show
that π(1 − π) = 0.

2 | Matrices

Definition. A matrix over a field F is a rectangular array of scalars. The
array will be written in the form

    [ a₁₁  a₁₂  ···  a₁ₙ ]
    [ a₂₁  a₂₂  ···  a₂ₙ ]
    [  ⋮    ⋮          ⋮  ]                                (2.1)
    [ aₘ₁  aₘ₂  ···  aₘₙ ]

whenever we wish to display all the elements in the array or show the form
of the array. A matrix with m rows and n columns is called an m × n
matrix. An n × n matrix is said to be of order n.
We often abbreviate a matrix written in the form above to [aᵢⱼ], where
the first index denotes the number of the row and the second index denotes
the number of the column. The particular letter appearing in each index
position is immaterial; it is the position that is important. With this con-
vention aᵢⱼ is a scalar and [aᵢⱼ] is a matrix. Whereas the elements aᵢⱼ and aₖₗ
need not be equal, we consider the matrices [aᵢⱼ] and [aₖₗ] to be identical
since both [aᵢⱼ] and [aₖₗ] stand for the entire matrix. As a further convenience
we often use upper case Latin italic letters to denote matrices; A = [aᵢⱼ].
Whenever we use lower case Latin italic letters to denote the scalars appearing


in the matrix, we use the corresponding upper case Latin italic letter to denote
the matrix. The matrix in which all scalars are zero is denoted by 0 (the third
use of this symbol!). The aᵢⱼ appearing in the array [aᵢⱼ] are called the
elements of [aᵢⱼ]. Two matrices are equal if and only if they have exactly
the same elements. The main diagonal of the matrix [aᵢⱼ] is the set of elements
{a₁₁, ..., aₗₗ} where l = min {m, n}. A diagonal matrix is a square matrix
in which the elements not in the main diagonal are zero.


Matrices can be used to represent a variety of different mathematical con-
cepts. The way matrices
are manipulated depends on the objects which they
represent. Considering the wide variety of situations in which matrices have
found application, there is a remarkable similarity in the operations performed
on matrices in these situations. There are differences too, however, and to
understand these differences we must understand the object represented and
what information can be expected by manipulating with the matrices. We
first investigate the properties of matrices as representations of linear trans-

formations. Not only do the matrices provide us with a convenient means of


doing whatever computation is necessary with linear transformations, but the
theory of vector spaces and linear transformations also proves to be a power-
ful tool in developing the properties of matrices.
Let U be a vector space of dimension n and V a vector space of dimension
m, both over the same field F. Let A = {α₁, …, αₙ} be an arbitrary but
fixed basis of U, and let B = {β₁, …, βₘ} be an arbitrary but fixed basis
of V. Let σ be a linear transformation of U into V. Since σ(αⱼ) ∈ V, σ(αⱼ)
can be expressed uniquely as a linear combination of the elements of B:

    σ(αⱼ) = Σᵢ₌₁ᵐ aᵢⱼ βᵢ.        (2.2)

We define the matrix representing σ with respect to the bases A and B to be
the matrix A = [aᵢⱼ].
The correspondence between linear transformations and matrices is
actually one-to-one and onto. Given the linear transformation σ, the aᵢⱼ
exist because B spans V, and they are unique because B is linearly independent.
On the other hand, let A = [aᵢⱼ] be any m × n matrix. We can define
σ(αⱼ) = Σᵢ₌₁ᵐ aᵢⱼβᵢ for each αⱼ ∈ A, and then we can extend the proposed
linear transformation to all of U by the condition that it be linear. Thus,
if ξ = Σⱼ₌₁ⁿ xⱼαⱼ, we define

    σ(ξ) = Σⱼ₌₁ⁿ xⱼ σ(αⱼ) = Σⱼ₌₁ⁿ xⱼ ( Σᵢ₌₁ᵐ aᵢⱼ βᵢ ) = Σᵢ₌₁ᵐ ( Σⱼ₌₁ⁿ aᵢⱼ xⱼ ) βᵢ.        (2.3)

σ can be extended to all of U because A spans U, and the result is well defined
(unique) because A is linearly independent.
Here are some examples of linear transformations and the matrices which
represent them. Consider the real plane R² = U = V. Let A = B = {(1, 0),
(0, 1)}. A 90° rotation counterclockwise would send (1, 0) onto (0, 1) and
it would send (0, 1) onto (−1, 0). Since σ((1, 0)) = 0 · (1, 0) + 1 · (0, 1) and
σ((0, 1)) = (−1) · (1, 0) + 0 · (0, 1), σ is represented by the matrix

    [ 0  −1 ]
    [ 1   0 ]

The elements appearing in a column are the coordinates of the image of a
basis vector under the transformation.

In general, a rotation counterclockwise through an angle θ will send
(1, 0) onto (cos θ, sin θ) and (0, 1) onto (−sin θ, cos θ). Thus this rotation
is represented by

    [ cos θ   −sin θ ]        (2.4)
    [ sin θ    cos θ ]
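As a quick numerical check of the rule that the columns of the representing matrix are the images of the basis vectors, here is a sketch assuming NumPy (not part of the text):

    import numpy as np

    def rotation_matrix(theta):
        # Column j is the image of the j-th standard basis vector.
        image_of_e1 = (np.cos(theta), np.sin(theta))
        image_of_e2 = (-np.sin(theta), np.cos(theta))
        return np.column_stack([image_of_e1, image_of_e2])

    A = rotation_matrix(np.pi / 2)   # 90 degree rotation
    print(np.round(A))               # [[ 0. -1.]
                                     #  [ 1.  0.]]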

Suppose now that τ is another linear transformation of U into V represented
by the matrix B = [bᵢⱼ]. Then for the sum σ + τ we have

    (σ + τ)(αⱼ) = σ(αⱼ) + τ(αⱼ) = Σᵢ₌₁ᵐ aᵢⱼβᵢ + Σᵢ₌₁ᵐ bᵢⱼβᵢ
                = Σᵢ₌₁ᵐ (aᵢⱼ + bᵢⱼ)βᵢ.        (2.5)

Thus σ + τ is represented by the matrix [aᵢⱼ + bᵢⱼ]. Accordingly, we
define the sum of two matrices to be the matrix obtained by the addition
of the corresponding elements in the two arrays; A + B = [aᵢⱼ + bᵢⱼ] is
the matrix corresponding to σ + τ. The sum of two matrices is defined if
and only if the two matrices have the same number of rows and the same
number of columns.

If a is any scalar, for the linear transformation aσ we have

    (aσ)(αⱼ) = a Σᵢ₌₁ᵐ aᵢⱼβᵢ = Σᵢ₌₁ᵐ (aaᵢⱼ)βᵢ.        (2.6)

Thus aσ is represented by the matrix [aaᵢⱼ]. We therefore define scalar
multiplication by the rule aA = [aaᵢⱼ].
Let W be a third vector space of dimension r over the field F, and let
C = {γ₁, …, γᵣ} be an arbitrary but fixed basis of W. If the linear trans-
formation σ of U into V is represented by the m × n matrix A = [aᵢⱼ] and the
linear transformation τ of V into W is represented by the r × m matrix
B = [bₖᵢ], what matrix represents the linear transformation τσ of U into W?

    (τσ)(αⱼ) = τ(σ(αⱼ)) = τ( Σᵢ₌₁ᵐ aᵢⱼβᵢ ) = Σᵢ₌₁ᵐ aᵢⱼ τ(βᵢ)
             = Σᵢ₌₁ᵐ aᵢⱼ ( Σₖ₌₁ʳ bₖᵢγₖ ) = Σₖ₌₁ʳ ( Σᵢ₌₁ᵐ bₖᵢaᵢⱼ ) γₖ.        (2.7)

Thus, if we define cₖⱼ = Σᵢ₌₁ᵐ bₖᵢaᵢⱼ, then C = [cₖⱼ] is the matrix representing
the product transformation τσ. Accordingly, we call C the matrix product
of B and A, in that order: C = BA.

For computational purposes it is customary to write the arrays of B
and A side by side. The element cₖⱼ of the product is then obtained by
multiplying the corresponding elements of row k of B and column j of A
and adding. We can trace the elements of row k of B with a finger of the
left hand while at the same time tracing the elements of column j of A with
a finger of the right hand. At each step we compute the product of the
corresponding elements and accumulate the sum as we go along. Using
this simple rule we can, with practice, become quite proficient, even to the
point of doing it "without hands."
Check the process on the products given in Exercise 1 at the end of this section.
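The row-by-column rule is also easy to mechanize. The following sketch (assuming NumPy; the factors are the matrices of Exercise 1(a) below) forms each cₖⱼ by tracing row k of B against column j of A and compares the result with the built-in product:

    import numpy as np

    def matrix_product(B, A):
        r, m = B.shape
        m2, n = A.shape
        assert m == m2, "columns of B must match rows of A"
        C = np.zeros((r, n))
        for k in range(r):
            for j in range(n):
                # c_kj = sum over i of b_ki * a_ij
                C[k, j] = sum(B[k, i] * A[i, j] for i in range(m))
        return C

    B = np.array([[3, 1, -2], [-5, 2, 3]])
    A = np.array([[2, 1, -3], [-1, 6, 1], [1, 0, -2]])
    assert np.array_equal(matrix_product(B, A), B @ A)
    print(matrix_product(B, A))   # [[  3.  9. -4.]
                                  #  [ -9.  7. 11.]]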
All definitions and properties we have established for linear transformations
can be carried over immediately for matrices. For example, we have:

1. 0 · A = 0. (The "0" on the left is a scalar; the "0" on the right is a
matrix with the same number of rows and columns as A.)
2. 1 · A = A.
3. A(B + C) = AB + AC.
4. (A + B)C = AC + BC.
5. A(BC) = (AB)C.

Of course, in each of the above statements we must assume the operations
proposed are well defined. For example, in 3, B and C must be the same
size and A must have the same number of columns as B and C have rows.
The rank and nullity of a matrix A are the rank and nullity of the associated
linear transformation, respectively.

Theorem 2.1. For an m x n matrix A, the rank of A plus the nullity of A


is equal to n. The rank of a product BA is less than or equal to the rank
of
either factor.
These statements have been established for linear transformations and
therefore hold for their corresponding matrices.

The rank of σ is the dimension of the subspace Im(σ) of V. Since Im(σ)
is spanned by {σ(α₁), …, σ(αₙ)}, ρ(σ) is the number of elements in a
maximal linearly independent subset of {σ(α₁), …, σ(αₙ)}. Expressed
in terms of coordinates, σ(αⱼ) = Σᵢ₌₁ᵐ aᵢⱼβᵢ is represented by the m-tuple
(a₁ⱼ, a₂ⱼ, …, aₘⱼ), which is the m-tuple in column j of the matrix [aᵢⱼ].
Thus ρ(σ) = ρ(A) is also equal to the maximum number of linearly inde-
pendent columns of A. This is usually called the column rank of the matrix
A, and the maximum number of linearly independent rows of A is called
the row rank of A. We, however, show before long that the number of
linearly independent rows in a matrix is equal to the number of linearly
independent columns. Until that time we consider "rank" and "column
rank" as synonymous.
Returning to Equation (2.3), we see that, if ξ ∈ U is represented by
(x₁, …, xₙ) and the linear transformation σ of U into V is represented
by the matrix A = [aᵢⱼ], then σ(ξ) ∈ V is represented by (y₁, …, yₘ) where

    yᵢ = Σⱼ₌₁ⁿ aᵢⱼxⱼ        (i = 1, …, m).        (2.8)

In view of the definition of matrix multiplication given by Equation (2.7)
we can interpret Equations (2.8) as a matrix product of the form

    Y = AX        (2.9)

where Y and X are the one-column matrices

    Y = (y₁, …, yₘ)   and   X = (x₁, …, xₙ).

This single matric equation contains the m equations in (2.8).


We have already used the n-tuple (x₁, …, xₙ) to represent the vector
ξ = Σⱼ₌₁ⁿ xⱼαⱼ. Because of the usefulness of equation (2.9) we also find it
convenient to represent ξ by the one-column matrix X. In fact, since it is
somewhat wasteful of space and otherwise awkward to display one-column
matrices, we use the n-tuple (x₁, …, xₙ) to represent not only the vector ξ
but also the column matrix X. With this convention [x₁ ⋯ xₙ] is a one-row
matrix and (x₁, …, xₙ) is a one-column matrix.

Notice that we have now used matrices for two different purposes, (1) to
represent linear transformations, and (2) to represent vectors. The single
matric equation Y = AX contains some matrices used in each way.
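A minimal sketch of this convention (NumPy assumed; the matrix is the 90° rotation found earlier): the same kind of array stores both the matrix representing σ and the column of coordinates of ξ, and Y = AX gives the coordinates of σ(ξ).

    import numpy as np

    A = np.array([[0, -1],
                  [1,  0]])   # represents the 90 degree rotation sigma
    X = np.array([2, 3])      # coordinates of xi with respect to the chosen basis

    Y = A @ X                 # coordinates of sigma(xi); here (-3, 2)
    print(Y)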

EXERCISES
1. Verify the matrix multiplication in the following examples:

    (a)  [  3  1  −2 ] [  2  1  −3 ]   [  3  9  −4 ]
         [ −5  2   3 ] [ −1  6   1 ] = [ −9  7  11 ]
                       [  1  0  −2 ]

    (b)  [  2  1  −3 ] [  2 ]   [ 10 ]
         [ −1  6   1 ] [  3 ] = [ 15 ]
         [  1  0  −2 ] [ −1 ]   [  4 ]

    (c)  [  3  1  −2 ] [ 10 ]   [ 37 ]
         [ −5  2   3 ] [ 15 ] = [ −8 ]
                       [  4 ]

2. Compute

    [  3  9  −4 ] [  2 ]
    [ −9  7  11 ] [  3 ]
                  [ −1 ]

Interpret the answer to this problem in terms of the computations in Exercise 1.

3. Find AB and BA if

"10 1" '


1 2 3

110 B =
5 6 7
A =
10 10 -1 -2 -3 -4
10 1 -5 -6 -7 -!

4. Let σ be a linear transformation of R² into itself that maps (1, 0) onto (3, −1)
and (0, 1) onto (−1, 2). Determine the matrix representing σ with respect to the
bases A = B = {(1, 0), (0, 1)}.

5. Let σ be a linear transformation of R² into itself that maps (1, 1) onto (2, −3)
and (1, −1) onto (4, −7). Determine the matrix representing σ with respect to the
bases A = B = {(1, 0), (0, 1)}. (Hint: We must determine the effect of σ when it is
applied to (1, 0) and (0, 1). Use the fact that (1, 0) = ½(1, 1) + ½(1, −1) and the
linearity of σ.)

6. It happens that the linear transformation defined in Exercise 4 is one-to-one,
that is, σ does not map two different vectors onto the same vector. Thus, there is
a linear transformation that maps (3, −1) onto (1, 0) and (−1, 2) onto (0, 1).
This linear transformation reverses the mapping given by σ. Determine the matrix
representing it with respect to the same bases.
7. Let us consider the geometric meaning of linear transformations. A linear
transformation of R² into itself leaves the origin fixed (why?) and maps straight
lines into straight lines. (The word "into" is required here because the image of a
straight line may be another straight line or it may be a single point.) Prove that
the image of a straight line is a subset of a straight line. (Hint: Let σ be represented
by the matrix

    A = [ a₁₁  a₁₂ ]
        [ a₂₁  a₂₂ ]

Then σ maps (x, y) onto (a₁₁x + a₁₂y, a₂₁x + a₂₂y). Now show that if (x, y)
satisfies the equation ax + by = c its image satisfies the equation
(aa₂₂ − ba₂₁)x + (a₁₁b − a₁₂a)y = (a₁₁a₂₂ − a₁₂a₂₁)c.)

8. (Continuation) We say that a straight line is mapped onto itself if every
point on the line is mapped onto a point on the line (but not all onto the same
point) even though the points on the line may be moved around.
(a) A linear transformation maps (1, 0) onto (−1, 0) and (0, 1) onto (0, −1).
Show that every line through the origin is mapped onto itself. Show that each
such line is mapped onto itself with the sense of direction inverted. This linear
transformation is called an inversion with respect to the origin. Find the matrix
representing this linear transformation with respect to the basis {(1, 0), (0, 1)}.
(b) A linear transformation maps (1, 1) onto (−1, −1) and leaves (1, −1)
fixed. Show that every line perpendicular to the line x₁ + x₂ = 0 is mapped onto
itself with the sense of direction inverted. Show that every point on the line
x₁ + x₂ = 0 is left fixed. Which lines through the origin are mapped onto them-
selves? This linear transformation is called a reflection about the line x₁ + x₂ = 0.
Find the matrix representing this linear transformation with respect to the basis
{(1, 0), (0, 1)}. Find the matrix representing this linear transformation with
respect to the basis {(1, 1), (1, −1)}.
(c) A linear transformation maps (1, 1) onto (2, 2) and (1, −1) onto (3, −3).
Show that the lines through the origin passing through the points (1, 1) and
(1, −1) are mapped onto themselves and that no other lines are mapped onto
themselves. Find the matrices representing this linear transformation with respect
to the bases {(1, 0), (0, 1)} and {(1, 1), (1, −1)}.

(d) A linear transformation leaves (1, 0) fixed and maps (0, 1) onto (1, 1). Show
that each line x₂ = c is mapped onto itself and translated within itself a distance
equal to c. This linear transformation is called a shear. Which lines through the
origin are mapped onto themselves? Find the matrix representing this linear
transformation with respect to the basis {(1, 0), (0, 1)}.
(e) A linear transformation maps (1, 0) onto (3/5, 4/5) and (0, 1) onto (−4/5, 3/5).
Show that every line through the origin is rotated counterclockwise through the
angle θ = arc cos 3/5. This linear transformation is called a rotation. Find the
matrix representing this linear transformation with respect to the basis {(1, 0),
(0, 1)}.
(f) A linear transformation maps (1, 0) onto (2/3, 2/3) and (0, 1) onto (1/3, 1/3). Show
that each point on the line 2x₁ + x₂ = 3c is mapped onto the single point (c, c).
The line x₁ − x₂ = 0 is left fixed. The only other line through the origin which
is mapped into itself is the line 2x₁ + x₂ = 0. This linear transformation is called
a projection onto the line x₁ − x₂ = 0 parallel to the line 2x₁ + x₂ = 0. Find the
matrices representing this linear transformation with respect to the bases {(1, 0),
(0, 1)} and {(1, 1), (1, −2)}.

9. (Continuation) Describe the geometric effect of each of the linear transforma-
tions of R² into itself represented by the matrices

"o r "0 0" "l r


(«) (b) (c)
1 1

"1 0" 0" r3 4-1


~b 5 5
id) (e) if) 4 3
a 1 c 5 5

(Hint: In Exercise 7 we have shown that straight lines are mapped into straight
lines. We already know that linear transformations map the origin onto the origin.
Thus it is relatively easy to determine what happens to straight lines passing through
the origin. For example, to see what happens to the x₁-axis it is sufficient to see
what happens to the point (1, 0). Among the transformations given appear a
rotation, a reflection, two projections, and one shear.)

10. (Continuation) For the linear transformations given in Exercise 9 find all

lines through the origin which are mapped onto or into themselves.
11. Let U = R² and V = R³ and σ be a linear transformation of U into V that
maps (1, 1) onto (0, 1, 2) and (−1, 1) onto (2, 1, 0). Determine the matrix that
represents σ with respect to the bases A = {(1, 0), (0, 1)} in R² and B = {(1, 0, 0),
(0, 1, 0), (0, 0, 1)} in R³. (Hint: ½(1, 1) − ½(−1, 1) = (1, 0).)

12. What is the effect of multiplying an n × n matrix A by an n × n diagonal
matrix D? What is the difference between AD and DA?
13. Let a and b be two numbers such that a ≠ b. Find all 2 × 2 matrices A
such that

    [ a  0 ] A = A [ a  0 ]
    [ 0  b ]       [ 0  b ]

14. Show that the matrix C = [aᵢbⱼ] has rank one if not all aᵢ and not all bⱼ are
zero. (Hint: Use Theorem 1.12.)

15. Let a, b, c, and d be given numbers (real or complex) and consider the
function

    f(x) = (ax + b)/(cx + d).

Let g be another function of the same form. Show that gf, where gf(x) = g(f(x)),
is a function that can also be written in the same form. Show that each of these
functions can be represented by a matrix in such a way that the matrix representing
gf is the product of the matrices representing g and f. Show that the inverse function
exists if and only if ad − bc ≠ 0. To what does the function reduce if ad − bc = 0?

16. Consider complex numbers of the form x + yi (where x and y are real
numbers and i² = −1) and represent such a complex number by the duple (x, y)
in R². Let a + bi be a fixed complex number. Consider the function f defined by
the rule

    f(x + yi) = (a + bi)(x + yi) = u + vi.

(a) Show that this function is a linear transformation of R² into itself mapping
(x, y) onto (u, v).
(b) Find the matrix representing this linear transformation with respect to the
basis {(1, 0), (0, 1)}.
(c) Find the matrix which represents the linear transformation obtained by using
c + di in place of a + bi. Compute the product of these two matrices. Do they
commute?
(d) Determine the complex number which can be used in place of a + bi to
obtain a transformation represented by this matrix product. How is this complex
number related to a + bi and c + di?
17. Show by example that it is possible for two matrices A and B to have the
same rank while A² and B² have different ranks.

3 | Non-singular Matrices

Let us consider the case where U = V; that is, we are considering trans-
formations of V into itself. Generally, a homomorphism of a set into itself
is called an endomorphism. We consider a fixed basis in V and represent
the linear transformations of V into itself with respect to that basis. In this
case the matrices are square, or n × n, matrices. Since the transformations
we are considering map V into itself, any finite number of them can be iterated
in any order. The commutative law does not hold, however. The same
remarks hold for square matrices. They can be multiplied in any order but

the commutative law does not hold. For example,

    [ 0  1 ] [ 0  0 ]   [ 0  1 ]
    [ 0  0 ] [ 0  1 ] = [ 0  0 ]

    [ 0  0 ] [ 0  1 ]   [ 0  0 ]
    [ 0  1 ] [ 0  0 ] = [ 0  0 ]
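These two products can be checked mechanically; a small sketch assuming NumPy:

    import numpy as np

    A = np.array([[0, 1],
                  [0, 0]])
    B = np.array([[0, 0],
                  [0, 1]])

    print(A @ B)   # [[0 1]
                   #  [0 0]]
    print(B @ A)   # [[0 0]
                   #  [0 0]]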

The linear transformation that leaves every element of V fixed is the
identity transformation. We denote the identity transformation by 1,
the scalar identity. Clearly, the identity transformation is represented by
the matrix I = [δᵢⱼ] for any choice of the basis. Notice that IA = AI = A
for any n × n matrix A. I is called the identity matrix, or unit matrix, of
order n. If we wish to point out the dimension of the space we write Iₙ for
the identity matrix of order n. The scalar transformation a is represented
by the matrix aI. Matrices of the form aI are called scalar matrices.

Definition. A one-to-one linear transformation σ of a vector space onto
itself is called an automorphism. An automorphism is only a special kind of
isomorphism for which the domain and codomain are the same space. If
σ(α) = α′, the mapping σ⁻¹(α′) = α is called the inverse transformation of σ.
The rotations represented in Section 2 are examples of automorphisms.


Theorem 3.1. The inverse σ⁻¹ of an automorphism σ is an automorphism.

Theorem 3.2. A linear transformation τ of an n-dimensional vector space
into itself is an automorphism if and only if it is of rank n; that is, if and only if
it is an epimorphism.

Theorem 3.3. A linear transformation σ of an n-dimensional vector space
into itself is an automorphism if and only if its nullity is 0, that is, if and only
if it is a monomorphism.
proof (of Theorems 3.1, 3.2, and 3.3). These properties have already
been established for isomorphisms.
Since it is clear that transformations of rank less than n do not have
inverses because they are not onto, we see that automorphisms are the
only linear transformations which have inverses. A linear transformation
that has an inverse is said to be non-singular or invertible; otherwise it is
said to be singular. Let A be the matrix representing the automorphism
σ, and let A⁻¹ be the matrix representing the inverse transformation σ⁻¹.
The matrix A⁻¹A represents the transformation σ⁻¹σ. Since σ⁻¹σ is the
identity transformation, we must have A⁻¹A = I. But σ is also the inverse
transformation of σ⁻¹, so that σσ⁻¹ = 1 and AA⁻¹ = I. We shall refer to
A⁻¹ as the inverse of A. A matrix that has an inverse is said to be non-
singular or invertible. Only a square matrix can have an inverse.



On the other hand, suppose that for the matrix A there exists a matrix
B such that BA = I. Since I is of rank n, A must also be of rank n and,
therefore, A represents an automorphism σ. Furthermore, the linear
transformation which B represents is necessarily the inverse transformation
σ⁻¹, since the product with σ must yield the identity transformation. Thus
B = A⁻¹. The same kind of argument shows that if C is a matrix such that
AC = I, then C = A⁻¹. Thus we have shown:

Theorem 3.4. If A and B are square matrices such that BA = I, then
AB = I. If A and B are square matrices such that AB = I, then BA = I.
In either case B is the unique inverse of A. □

Theorem 3.5. If A and B are non-singular, then (1) AB is non-singular and
(AB)⁻¹ = B⁻¹A⁻¹, (2) A⁻¹ is non-singular and (A⁻¹)⁻¹ = A, (3) for a ≠ 0,
aA is non-singular and (aA)⁻¹ = a⁻¹A⁻¹.

proof. In view of the remarks preceding Theorem 3.4 it is sufficient in
each case to produce a matrix which will act as a left inverse.
(1) (B⁻¹A⁻¹)(AB) = B⁻¹(A⁻¹A)B = B⁻¹IB = B⁻¹B = I.
(2) AA⁻¹ = I.
(3) (a⁻¹A⁻¹)(aA) = (a⁻¹a)(A⁻¹A) = I. □
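A numerical spot check of Theorem 3.5 (a sketch assuming NumPy; the particular matrices are arbitrary illustrations):

    import numpy as np

    A = np.array([[1.0, 2.0], [0.0, 1.0]])
    B = np.array([[1.0, 0.0], [2.0, 1.0]])

    AB_inv = np.linalg.inv(A @ B)
    assert np.allclose(AB_inv, np.linalg.inv(B) @ np.linalg.inv(A))   # (AB)^-1 = B^-1 A^-1
    assert np.allclose(np.linalg.inv(np.linalg.inv(A)), A)            # (A^-1)^-1 = A
    a = 3.0
    assert np.allclose(np.linalg.inv(a * A), (1 / a) * np.linalg.inv(A))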

Theorem 3.6. If A is non-singular, we can solve uniquely the equations
XA = B and AY = B for any matrix B of the proper size, but the two solutions
need not be equal.
proof. Solutions exist since (BA⁻¹)A = B(A⁻¹A) = B and A(A⁻¹B) =
(AA⁻¹)B = B. The solutions are unique since for any C having the property
that CA = B we have C = CAA⁻¹ = BA⁻¹, and similarly with any solution
of AY = B. □

As an example illustrating the last statement of the theorem, let

    A = [ 1  2 ]     A⁻¹ = [ 1  −2 ]     B = [ 1  0 ]
        [ 0  1 ]           [ 0   1 ]         [ 2  1 ]

Then

    X = BA⁻¹ = [ 1  −2 ]     and     Y = A⁻¹B = [ −3  −2 ]
               [ 2  −3 ]                        [  2   1 ]

We add the remark that for non-singular A, the solution of XA = B


exists and is unique if B has n columns, and the solution of AY = B exists
and is unique if B has n rows. The proof given for Theorem 3.6 applies
without change.

Theorem 3.7. The rank of a (not necessarily square) matrix is not changed
by multiplication by a non-singular matrix.

proof. Let A be non-singular and let B be of rank ρ. Then by Theorem
2.1 AB is of rank r ≤ ρ, and A⁻¹(AB) = B is of rank ρ ≤ r. Thus r = ρ.
The proof that BA is of rank ρ is similar. □
Theorem 1.14 states the corresponding property for linear transformations.
The existence or non-existence of the inverse of a square matrix depends
on the matrix itself and not on whether it represents a linear transformation
of a vector space into itself or a linear transformation of one vector space
into another. Thus it is convenient and consistent to extend our usage of the
term "non-singular" to include isomorphisms. Accordingly, any square
matrix with an inverse is non-singular.
Let U and V be vector spaces of dimension n over the field F. Let A =
{α₁, …, αₙ} be a basis of U and B = {β₁, …, βₙ} be a basis of V. If
ξ = Σᵢ₌₁ⁿ xᵢαᵢ is any vector in U we can define σ(ξ) to be Σᵢ₌₁ⁿ xᵢβᵢ. It is
easily seen that σ is an isomorphism and that ξ and σ(ξ) are both repre-
sented by (x₁, …, xₙ) ∈ Fⁿ. Thus any two vector spaces of the same
dimension over F are isomorphic. As far as their internal structure is con-
cerned they are indistinguishable. Whatever properties may serve to dis-
tinguish them are, by definition, not vector space properties.

EXERCISES
1. Show that the inverse of

    A = [ 1  2  3 ]        is        A⁻¹ = [ −2   0   1 ]
        [ 2  3  4 ]                        [  0   3  −2 ]
        [ 3  4  6 ]                        [  1  −2   1 ]

2. Find the square of the matrix

    A = (1/3) [ 1   2   2 ]
              [ 2  −2   1 ]
              [ 2   1  −2 ]

What is the inverse of A? (Geometrically, this matrix represents a 180° rotation
about the line containing the vector (2, 1, 1). The inverse obtained is therefore
not surprising.)
3. Compute the image of the vector (1, —2, 1) under the linear transformation
represented by the matrix
"1 2 3"

A = 2 3 4

1 2
Show that A cannot have an inverse.

4. Since

    [ x₁₁  x₁₂ ] [  3  −1 ]   [ 3x₁₁ − 5x₁₂   −x₁₁ + 2x₁₂ ]
    [ x₂₁  x₂₂ ] [ −5   2 ] = [ 3x₂₁ − 5x₂₂   −x₂₁ + 2x₂₂ ]

we can find the inverse of

    [  3  −1 ]
    [ −5   2 ]

by solving the equations

    3x₁₁ − 5x₁₂ = 1
    −x₁₁ + 2x₁₂ = 0
    3x₂₁ − 5x₂₂ = 0
    −x₂₁ + 2x₂₂ = 1.

Solve these equations and check your answer by showing that this gives the inverse
matrix.

We have not as yet developed convenient and effective methods for obtaining
the inverse of a given matrix. Such methods are developed later in this chapter
and in the following chapter. If we know the geometric meaning of the matrix,
however, it is often possible to obtain the inverse with very little work.

5. The matrix represents a rotation about the origin through the angle
θ = arc cos 3/5. What rotation would be the inverse of this rotation? What matrix
would represent this inverse rotation? Show that this matrix is the inverse of the
given matrix.

6. The matrix

    [  0  −1 ]
    [ −1   0 ]

represents a reflection about the line x₁ + x₂ = 0. What operation is the inverse
of this reflection? What matrix represents the inverse operation? Show that this
matrix is the inverse of the given matrix.

7. The matrix represents a shear. The inverse transformation is also a
shear. Which one? What matrix represents the inverse shear? Show that this
matrix is the inverse of the given matrix.

8. Show that the transformation that maps (x₁, x₂, x₃) onto (x₃, −x₁, x₂) is an
automorphism of F³. Find the matrix representing this automorphism and its
inverse with respect to the basis {(1, 0, 0), (0, 1, 0), (0, 0, 1)}.

9. Show that an automorphism of a vector space maps every subspace onto a


subspace of the same dimension.
10. Find an example to show that there exist non-square matrices A and B
such that AB = I. Specifically, show that there is an m × n matrix A and an
n × m matrix B such that AB is the m × m identity. Show that BA is not the
n × n identity. Prove in general that if m ≠ n, then AB and BA cannot both be
identity matrices.

4 | Change of Basis

We have represented vectors and linear transformations as n-tuples and
matrices with respect to arbitrary but fixed bases. A very natural question
arises: What changes occur in these representations if other choices for
bases are made? The vectors and linear transformations have meaning
independent of any particular choice of bases, independent of any coordinate
systems, but their representations are entirely dependent on the bases chosen.

Definition. Let A = {α₁, …, αₙ} and A′ = {α′₁, …, α′ₙ} be bases of the
vector space U. In a typical "change of basis" situation the representations
of various vectors and linear transformations are known in terms of the
basis A, and we wish to determine their representations in terms of the
basis A′. In this connection, we refer to A as the "old" basis and to A′ as
the "new" basis. Each α′ⱼ is expressible as a linear combination of the
elements of A; that is,

    α′ⱼ = Σᵢ₌₁ⁿ pᵢⱼαᵢ.        (4.1)

The associated matrix P = [pᵢⱼ] is called the matrix of transition from the
basis A to the basis A′.

The columns of P are the n-tuples representing the new basis vectors in
terms of the old basis. This simple observation is worth remembering, as
it is usually the key to determining P when a change of basis is made. Since
the columns of P are the representations of the basis A′ they are linearly
independent and P has rank n. Thus P is non-singular.

Now let ξ = Σᵢ₌₁ⁿ xᵢαᵢ be an arbitrary vector of U and let ξ = Σⱼ₌₁ⁿ x′ⱼα′ⱼ
be the representation of ξ in terms of the basis A′. Then

    ξ = Σⱼ₌₁ⁿ x′ⱼα′ⱼ = Σⱼ₌₁ⁿ x′ⱼ ( Σᵢ₌₁ⁿ pᵢⱼαᵢ ) = Σᵢ₌₁ⁿ ( Σⱼ₌₁ⁿ pᵢⱼx′ⱼ ) αᵢ.        (4.2)

Since the representation of ξ with respect to the basis A is unique we see
that xᵢ = Σⱼ₌₁ⁿ pᵢⱼx′ⱼ. Notice that the rows of P are used to express the
old coordinates of ξ in terms of the new coordinates. For emphasis and
contradistinction, we repeat that the columns of P are used to express the
new basis vectors in terms of the old basis vectors.

Let X = (x₁, …, xₙ) and X′ = (x′₁, …, x′ₙ) be n × 1 matrices representing
the vector ξ with respect to the bases A and A′. Then the set of relations
{xᵢ = Σⱼ₌₁ⁿ pᵢⱼx′ⱼ} can be written as the single matric equation

    X = PX′.        (4.3)
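A short sketch of equation (4.3), assuming NumPy (the basis A′ is the one used in Exercise 2 below): the columns of P are the new basis vectors written in old coordinates, and X = PX′ recovers old coordinates from new ones.

    import numpy as np

    # New basis A' expressed in the old basis A; these become the columns of P.
    alpha1_new = np.array([0.5, np.sqrt(3) / 2])
    alpha2_new = np.array([-np.sqrt(3) / 2, 0.5])
    P = np.column_stack([alpha1_new, alpha2_new])

    X_new = np.array([1.0, 2.0])   # coordinates of xi with respect to A'
    X_old = P @ X_new              # coordinates of the same vector with respect to A

    # The matrix of transition from A' back to A is P^-1.
    assert np.allclose(np.linalg.inv(P) @ X_old, X_new)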

Now suppose that we have a linear transformation σ of U into V and
that A = [aᵢⱼ] is the matrix representing σ with respect to the bases A in
U and B = {β₁, …, βₘ} in V. We shall now determine the representation
of σ with respect to the bases A′ and B.

    σ(α′ⱼ) = σ( Σₖ₌₁ⁿ pₖⱼαₖ ) = Σₖ₌₁ⁿ pₖⱼ σ(αₖ) = Σₖ₌₁ⁿ pₖⱼ ( Σᵢ₌₁ᵐ aᵢₖβᵢ )
           = Σᵢ₌₁ᵐ ( Σₖ₌₁ⁿ aᵢₖpₖⱼ ) βᵢ.        (4.4)

Since B is a basis, a′ᵢⱼ = Σₖ₌₁ⁿ aᵢₖpₖⱼ, and the matrix A′ = [a′ᵢⱼ] representing
σ with respect to the bases A′ and B is related to A by the matric equation

    A′ = AP.        (4.5)

This relation can also be demonstrated in a slightly different way. For
an arbitrary ξ = Σⱼ₌₁ⁿ xⱼαⱼ ∈ U let σ(ξ) = Σᵢ₌₁ᵐ yᵢβᵢ. Then we have

    Y = AX = A(PX′) = (AP)X′.        (4.6)

Thus AP is a matrix representing σ with respect to the bases A′ and B. Since
the matrix representing σ is uniquely determined by the choice of bases we
have A′ = AP.


Now consider the effect of a change of basis in the image space V. Thus
let B be replaced by the basis B′ = {β′₁, …, β′ₘ}. Let Q = [qᵢⱼ] be the
matrix of transition from B to B′; that is, β′ⱼ = Σᵢ₌₁ᵐ qᵢⱼβᵢ. Then if A″ =
[a″ᵢⱼ] represents σ with respect to the bases A and B′ we have

    σ(αⱼ) = Σₖ₌₁ᵐ a″ₖⱼβ′ₖ = Σₖ₌₁ᵐ a″ₖⱼ ( Σᵢ₌₁ᵐ qᵢₖβᵢ )
          = Σᵢ₌₁ᵐ ( Σₖ₌₁ᵐ qᵢₖa″ₖⱼ ) βᵢ = Σᵢ₌₁ᵐ aᵢⱼβᵢ.        (4.7)

Since the representation of σ(αⱼ) in terms of the basis B is unique we see
that A = QA″, or

    A″ = Q⁻¹A.        (4.8)

Combining these results, we see that, if both changes of bases are made at
once, the new matrix representing σ is Q⁻¹AP.
As in the proof of Theorem 1.6 we can choose a new basis A′ = {α′₁, …, α′ₙ}
of U such that the last ν = n − ρ basis elements form a basis of K(σ). Since
{σ(α′₁), …, σ(α′ᵨ)} is a basis of σ(U) and is linearly independent in V, it can
be extended to a basis B′ of V. With respect to the bases A′ and B′ we have
σ(α′ⱼ) = β′ⱼ for j ≤ ρ while σ(α′ⱼ) = 0 for j > ρ. Thus the new matrix Q⁻¹AP
representing σ is of the form

              ρ columns        ν columns
        [ 1                 |           ]
        [    ⋱              |     0     ]   ρ rows
        [          1        |           ]
        [ ------------------+---------- ]
        [         0         |     0     ]   m − ρ rows

Thus we have

Theorem 4.1. If A is any m × n matrix of rank ρ, there exist a non-
singular n × n matrix P and a non-singular m × m matrix Q such that
A′ = Q⁻¹AP has the first ρ elements of the main diagonal equal to 1, and
all other elements equal to zero.

When A and B are unrestricted we can always obtain this relatively simple
representation of a linear transformation by a proper choice of bases.
More interesting situations occur when A and B are restricted. Suppose,
for example, that we take U = V and A = B. In this case there is but one
basis to change and but one matrix of transition; that is, P = Q. In this
case it is not possible to obtain a form of the matrix representing σ as simple
as that obtained in Theorem 4.1. We say that any two matrices representing
the same linear transformation σ of a vector space V into itself are similar.
This is equivalent to saying that two matrices A and A′ are similar if and
only if there exists a non-singular matrix of transition P such that A′ =
P⁻¹AP. This case occupies much of our attention in Chapters III and V.
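Similarity is easy to illustrate numerically. In the sketch below (NumPy assumed; the transformation and basis are my own illustrative choices), the reflection about the line x₁ = x₂ is re-represented in the basis {(1, 1), (1, −1)} by conjugating with the matrix of transition whose columns are the new basis vectors.

    import numpy as np

    A = np.array([[0.0, 1.0],     # reflection about x1 = x2, in the standard basis
                  [1.0, 0.0]])
    P = np.array([[1.0, 1.0],     # columns: the new basis vectors (1,1) and (1,-1)
                  [1.0, -1.0]])

    A_prime = np.linalg.inv(P) @ A @ P
    print(A_prime)                # diag(1, -1): the reflection fixes (1,1) and negates (1,-1)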

EXERCISES
1. In P₃, the space of polynomials of degree 2 or smaller with coefficients in
F, let A = {1, x, x²}.

    A′ = {p₁(x) = x² + x + 1,  p₂(x) = x − x² − 2,  p₃(x) = x² + x − 1}

is also a basis. Find the matrix of transition from A to A′.



2. In many of the uses of the concepts of this section it is customary to take
A = {αᵢ | αᵢ = (δᵢ₁, δᵢ₂, …, δᵢₙ)} as the old basis in Rⁿ. Thus, in R² let A =
{(1, 0), (0, 1)} and A′ = {(½, √3/2), (−√3/2, ½)}. Show that

    P = [  ½     −√3/2 ]
        [ √3/2     ½   ]

is the matrix of transition from A to A′.

3. (Continuation) With A′ and A as in Exercise 2, find the matrix of transition R
from A′ to A. (Notice, in particular, that in Exercise 2 the columns of P are the
components of the vectors in A′ expressed in terms of the basis A, whereas in this exercise
the columns of R are the components of the vectors in A expressed in terms of the
basis A′. Thus these two matrices of transition are determined relative to different
bases.) Show that RP = I.

4. (Continuation) Consider the linear transformation σ of R² into itself which
maps

    (1, 0) onto (½, √3/2)
    (0, 1) onto (−√3/2, ½).

Find the matrix A that represents σ with respect to the basis A.

You should obtain A = P. However, A and P do not represent the same thing.
To see this, let ξ = (x₁, x₂) be an arbitrary vector in R² and compute σ(ξ) by means
of formula (2.9) and the new coordinates of ξ by means of formula (4.3).
A little reflection will show that the results obtained are entirely reasonable.
The matrix A represents a rotation of the real plane counterclockwise through an
angle of π/3. The matrix P represents a rotation of the coordinate axes counter-
clockwise through an angle of π/3. In the latter case the motion of the plane
relative to the coordinate axes is clockwise through an angle of π/3.

5. In R³ let A = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} and let A′ = {(0, 1, 1), (1, 0, 1),
(1, 1, 0)}. Find the matrix of transition P from A to A′ and the matrix of transition
P⁻¹ from A′ to A.

6. Let A, B, and C be three bases of V. Let P be the matrix of transition from A
to B and let Q be the matrix of transition from B to C. Is PQ or QP the matrix of
transition from A to C? Compare the order of multiplication of matrices of transi-
tion with that of matrices representing linear transformations.

7. Use the results of Exercise 6 to resolve the question raised in the parenthetical
remark of Exercise 3, and implicitly assumed in Exercise 5. If P is the matrix of
transition from A to A′ and Q is the matrix of transition from A′ to A, show that
PQ = I.

5 | Hermite Normal Form

We may also ask how much simplification of the matrix representing a
linear transformation σ of U into V can be effected by a change of basis in
V alone. Let A = {α₁, …, αₙ} be the given basis in U and let Uₖ = ⟨α₁, …, αₖ⟩.
The subspaces σ(Uₖ) of V form a non-decreasing chain of subspaces with
σ(Uₖ₋₁) ⊆ σ(Uₖ) and σ(Uₙ) = σ(U). Since σ(Uₖ) = σ(Uₖ₋₁) + ⟨σ(αₖ)⟩ we see
from Theorem 4.8 of Chapter I that dim σ(Uₖ) ≤ dim σ(Uₖ₋₁) + 1; that is,
the dimensions of the σ(Uₖ) do not increase by more than 1 at a time as k
increases. Since dim σ(Uₙ) = ρ, the rank of σ, an increase of exactly 1
must occur ρ times. For the other times, if any, we must have dim σ(Uₖ) =
dim σ(Uₖ₋₁) and hence σ(Uₖ) = σ(Uₖ₋₁). We have an increase by 1 when
σ(αₖ) ∉ σ(Uₖ₋₁) and no increase when σ(αₖ) ∈ σ(Uₖ₋₁).
Let k₁, k₂, …, kᵨ be those indices for which σ(αₖᵢ) ∉ σ(Uₖᵢ₋₁). Let
βᵢ = σ(αₖᵢ). Since βᵢ ∉ σ(Uₖᵢ₋₁) ⊇ ⟨β₁, …, βᵢ₋₁⟩, the set {β₁, …, βᵨ} is
linearly independent (see Theorem 2.3, Chapter I-2). Since {β₁, …, βᵨ} ⊆
σ(U) and σ(U) is of dimension ρ, {β₁, …, βᵨ} is a basis of σ(U). This set
can be extended to a basis B′ of V. Let us now determine the form of the
matrix A′ representing σ with respect to the bases A and B′.

Since σ(αₖᵢ) = βᵢ, column kᵢ has a 1 in row i and all other elements of
this column are 0's. For kᵢ < j < kᵢ₊₁, σ(αⱼ) ∈ σ(Uₖᵢ), so that column j
has 0's below row i. In general, there is no restriction on the elements of
column j in the first i rows. A′ thus has the form

             column k₁                column k₂
    [ 0 ⋯ 0    1    a′₁,ₖ₁₊₁  ⋯   0    a′₁,ₖ₂₊₁  ⋯ ]
    [ 0 ⋯ 0    0       0      ⋯   1    a′₂,ₖ₂₊₁  ⋯ ]        (5.1)
    [ 0 ⋯ 0    0       0      ⋯   0       0      ⋯ ]
    [                        ⋮                      ]

Once A and σ are given, the kᵢ and the set {β₁, …, βᵨ} are uniquely
determined. There may be many ways to extend this set to the basis B′,
but the additional basis vectors do not affect the determination of A′ since
every element of σ(U) can be expressed in terms of {β₁, …, βᵨ} alone. Thus
A′ is uniquely determined by A and σ.

Theorem 5.1. Given any m × n matrix A of rank ρ, there exists a non-
singular m × m matrix Q such that A′ = Q⁻¹A has the following form:
(1) There is at least one non-zero element in each of the first ρ rows of A′,
and the elements in all remaining rows are zero.
(2) The first non-zero element appearing in row i (i ≤ ρ) is a 1 appearing
in column kᵢ, where k₁ < k₂ < ⋯ < kᵨ.
(3) In column kᵢ the only non-zero element is the 1 in row i.

The form A′ is uniquely determined by A.


proof. In the applications of this theorem that we wish to make, A is
usually given alone, without reference to any bases A and B, and often without
reference to any linear transformation σ. We can, however, introduce any
two vector spaces U and V of dimensions n and m over F and let A be any
basis of U and B be any basis of V. We can consider A as defining a linear
transformation σ of U into V with respect to the bases A and B. The discussion
preceding Theorem 5.1 shows that there is at least one non-singular matrix
Q such that Q⁻¹A satisfies conditions (1), (2), and (3).

Now suppose there are two non-singular matrices Q₁ and Q₂ such that
Q₁⁻¹A = A′₁ and Q₂⁻¹A = A′₂ both satisfy the conditions of the theorem.
We wish to conclude that A′₁ = A′₂. No matter how the vector spaces U
and V are introduced and how the bases A and B are chosen, we can regard
Q₁ and Q₂ as matrices of transition in V. Thus A′₁ represents σ with respect
to bases A and B′₁, and A′₂ represents σ with respect to bases A and B′₂. But
condition (3) says that for i ≤ ρ the ith basis element in both B′₁ and B′₂ is
σ(αₖᵢ). Thus the first ρ elements of B′₁ and B′₂ are identical. Condition (1) says
that the remaining basis elements have nothing to do with determining the
coefficients in A′₁ and A′₂. Thus A′₁ = A′₂. □

We say that a matrix satisfying the conditions of Theorem 5.1 is in
Hermite normal form. Often this form is called a row-echelon form. And
sometimes the term, Hermite normal form, is reserved for a square matrix
containing exactly the numbers that appear in the form we obtained in
Theorem 5.1, with the change that the row beginning with a 1 in column kᵢ
is moved down to row kᵢ. Thus each non-zero row begins on the main
diagonal and each column with a 1 on the main diagonal is otherwise zero.
In this text we have no particular need for this special form, while the form
described in Theorem 5.1 is one of the most useful tools at our disposal.
The usefulness of the Hermite normal form depends on its form, and
the uniqueness of that form will enable us to develop effective and con-
venient short cuts for determining that form.

Definition. Given the matrix A, the matrix Aᵀ obtained from A by inter-
changing rows and columns in A is called the transpose of A. If Aᵀ = [a′ᵢⱼ],
the element a′ᵢⱼ appearing in row i, column j of Aᵀ is the element aⱼᵢ appear-
ing in row j, column i of A. It is easy to show that (AB)ᵀ = BᵀAᵀ. (See
Exercise 4.)

Proposition 5.2. The number of linearly independent rows in a matrix is
equal to the number of linearly independent columns.

proof. The number of linearly independent columns in a matrix A is its
rank ρ. The Hermite normal form A′ = Q⁻¹A corresponding to A is also
of rank ρ. For A′ it is obvious that the number of linearly independent rows
in A′ is also equal to ρ; that is, the rank of (A′)ᵀ is ρ. Since Qᵀ is non-
singular, the rank of Aᵀ = (QA′)ᵀ = (A′)ᵀQᵀ is also ρ. Thus the number
of linearly independent rows in A is ρ. □

EXERCISES
1. Which of the following matrices are in Hermite normal form?

"01001"
10 1

(a)
10
0_

"00204"
110 3
(b)
12
0_

"i o o o r
10 1

(c)
11
0_

"0101001"
10 10
(d)
10
0_

"10 10 1"

110
(e)
oooio
lj

2. Determine the rank of each of the matrices given in Exercise 1.


3. Let σ and τ be linear transformations mapping R³ into R². Suppose that for
a given pair of bases, A for R³ and B for R², σ and τ are represented by
T 1 0' 1 1"

A = and B =
1 1

respectively. Show that there is no basis B′ of R² such that B is the matrix represent-
ing σ with respect to A and B′.

4. Show that
(a) (A + B)ᵀ = Aᵀ + Bᵀ,
(b) (AB)ᵀ = BᵀAᵀ,
(c) (A⁻¹)ᵀ = (Aᵀ)⁻¹.

6 | Elementary Operations and Elementary Matrices

Our purpose in this section is to develop convenient computational
methods. We have been concerned with the representations of linear
transformations by matrices and the changes these matrices undergo when
a basis is changed. We now show that these changes can be effected by
elementary operations on the rows and columns of the matrices.
We define three types of elementary operations on the rows of a matrix A.

Type I: Multiply a row of A by a non-zero scalar.

Type II: Add a multiple of one row to another row.

Type III: Interchange two rows.

Elementary column operations are defined in an analogous way.


From a logical point of view these operations are redundant. An opera-
tion of type III can be accomplished by a combination of operations of
types I and II. It would, however, require four such operations to take the
place of one operation of type III. Since we wish to develop convenient
computational methods, it would not suit our purpose to reduce the number

of operations at our disposal. On the other hand, it would not be of much


help to extend the list of operations at this point. The student will find that,
with practice, he can combine several elementary operations into one step.
For example, such a combined operation would be the replacing of a row
by a linear combination of rows, provided that the row replaced appeared
in the linear combination with a non-zero coefficient. We leave such short
cuts to the student.
An elementary operation can also be accomplished by multiplying A on
the left by a matrix. Thus, for example, multiplying the second row by the
scalar c can be effected by the matrix

    E₂(c) = [ 1  0  ⋯  0 ]
            [ 0  c  ⋯  0 ]        (6.1)
            [ ⋮      ⋱  ⋮ ]
            [ 0  0  ⋯  1 ]

The addition of k times the third row to the first row can be effected by the
matrix

    E₃₁(k) = [ 1  0  k  ⋯  0 ]
             [ 0  1  0  ⋯  0 ]
             [ 0  0  1  ⋯  0 ]        (6.2)
             [ ⋮         ⋱  ⋮ ]
             [ 0  0  0  ⋯  1 ]

The interchange of the first and second rows can be effected by the matrix

    E₁₂ = [ 0  1  0  ⋯  0 ]
          [ 1  0  0  ⋯  0 ]
          [ 0  0  1  ⋯  0 ]        (6.3)
          [ ⋮         ⋱  ⋮ ]
          [ 0  0  0  ⋯  1 ]
These matrices corresponding to the elementary operations are called
elementary matrices. These matrices are all non-singular and their inverses
are also elementary matrices. For example, the inverses of E₂(c), E₃₁(k), and
E₁₂ are respectively E₂(c⁻¹), E₃₁(−k), and E₁₂.

Notice that the elementary matrix representing an elementary operation is
the matrix obtained by applying the elementary operation to the unit matrix.
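That observation translates directly into code. The following sketch (assuming NumPy; the helper names are my own) builds elementary matrices by applying the row operation to the identity and checks that left multiplication performs the same operation on an arbitrary matrix.

    import numpy as np

    def E_scale(n, i, c):
        """Type I: multiply row i by the non-zero scalar c (rows numbered from 1)."""
        E = np.eye(n)
        E[i - 1, i - 1] = c
        return E

    def E_add(n, j, i, k):
        """Type II: add k times row j to row i; this is E_{ji}(k) in the notation above."""
        E = np.eye(n)
        E[i - 1, j - 1] = k
        return E

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])

    assert np.allclose(E_scale(3, 2, 5.0) @ A, np.vstack([A[0], 5.0 * A[1], A[2]]))
    assert np.allclose(E_add(3, 3, 1, 2.0) @ A, np.vstack([A[0] + 2.0 * A[2], A[1], A[2]]))
    # Inverses of elementary matrices are elementary matrices of the same type:
    assert np.allclose(np.linalg.inv(E_add(3, 3, 1, 2.0)), E_add(3, 3, 1, -2.0))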

Theorem 6.1. Any non-singular matrix A can be written as a product of
elementary matrices.

proof. At least one element in the first column is non-zero or else A
would be singular. Our first goal is to apply elementary operations, if
necessary, to obtain a 1 in the upper left-hand corner. If a₁₁ = 0, we can
interchange rows to bring a non-zero element into that position. Thus we
may as well suppose that a₁₁ ≠ 0. We can then multiply the first row by
a₁₁⁻¹. Thus, to simplify notation, we may as well assume that a₁₁ = 1.
We now add −aᵢ₁ times the first row to the ith row to make every other
element in the first column equal to zero.
The resulting matrix is still non-singular since the elementary operations
applied were non-singular. We now wish to obtain a 1 in the position of
element a₂₂. At least one element in the second column other than a₁₂
is non-zero, for otherwise the first two columns would be dependent. Thus
by a possible interchange of rows, not including row 1, and multiplying the
second row by a non-zero scalar we can obtain a₂₂ = 1. We now add −aᵢ₂
times the second row to the ith row to make every other element in the second
column equal to zero. Notice that we also obtain a 0 in the position of a₁₂
without affecting the 1 in the upper left-hand corner.
We continue in this way until we obtain the identity matrix. Thus if
E₁, E₂, …, Eᵣ are elementary matrices representing the successive elementary
operations, we have

    I = Eᵣ ⋯ E₂E₁A,
or                                                        (6.4)
    A = E₁⁻¹E₂⁻¹ ⋯ Eᵣ⁻¹. □

In Theorem 5.1 we obtained the Hermite normal form A′ from the matrix
A by multiplying on the left by the non-singular matrix Q⁻¹. We see now
that Q⁻¹ is a product of elementary matrices, and therefore that A can be
transformed into Hermite normal form by a succession of elementary row
operations. It is most efficient to use the elementary row operations directly
without obtaining the matrix Q⁻¹.

We could have shown directly that a matrix can be transformed into
Hermite normal form by means of elementary row operations. We would
then be faced with the necessity of showing that the Hermite normal form
obtained is unique and not dependent on the particular sequence of oper-
ations used. While this is not particularly difficult, the demonstration is
uninteresting and unilluminating and so tedious that it is usually left as an
"exercise for the reader." Uniqueness, however, is a part of Theorem 5.1,
and we are assured that the Hermite normal form will be independent of
the particular sequence of operations chosen. This is important as many
possible operations are available at each step of the work, and we are free
to choose those that are most convenient.
Basically, the instructions for reducing a matrix to Hermite normal form
are contained in the proof of Theorem 6.1. In that theorem, however, we
were dealing with a non-singular matrix and thus were assured that we could
at certain steps obtain a non-zero element on the main diagonal. For a
singular matrix, this is not the case. When a non-zero element cannot be
obtained with the instructions given we must move our consideration to the
next column.
In the following example we perform several operations at each step to
conserve space. When several operations are performed at once, some
care must be exercised to avoid reducing the rank. This may occur, for
example, if we subtract a row from itself in some hidden fashion. In this
example we avoid this pitfall, which can occur when several operations of
type II are combined, by considering one row as an operator row and adding
multiples of it to several others.
Consider the matrix

    [  4   3   2  −1   4 ]
    [  5   4   3  −1   4 ]
    [ −2  −2  −1   2  −3 ]
    [ 11   6   4   1  11 ]

as an example.
According to the instructions for performing the elementary row oper-
ations we should multiply the first row by ¼. To illustrate another possible
way to obtain the "1" in the upper left corner, multiply row 1 by −1 and
add row 2 to row 1. Multiples of row 1 can now be added to the other rows
to obtain

    [ 1   1   1   0   0 ]
    [ 0  −1  −2  −1   4 ]
    [ 0   0   1   2  −3 ]
    [ 0  −5  −7   1  11 ]

Now, multiply row 2 by −1 and add appropriate multiples of it to the other
rows to obtain

    [ 1   0  −1  −1   4 ]
    [ 0   1   2   1  −4 ]
    [ 0   0   1   2  −3 ]
    [ 0   0   3   6  −9 ]
Finally, we obtain

    [ 1   0   0   1   1 ]
    [ 0   1   0  −3   2 ]
    [ 0   0   1   2  −3 ]
    [ 0   0   0   0   0 ]

which is the Hermite normal form described in Theorem 5.1. If desired,
Q⁻¹ can be obtained by applying the same sequence of elementary row
operations to the unit matrix. However, while the Hermite normal form
is necessarily unique, the matrix Q⁻¹ need not be unique, as the proof of
Theorem 5.1 should show.
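The reduction just carried out can be confirmed with a computer algebra system; the reduced row-echelon form computed by SymPy is exactly the Hermite normal form of Theorem 5.1. (A sketch assuming SymPy is available.)

    from sympy import Matrix

    A = Matrix([[ 4,  3,  2, -1,  4],
                [ 5,  4,  3, -1,  4],
                [-2, -2, -1,  2, -3],
                [11,  6,  4,  1, 11]])

    R, pivot_columns = A.rref()
    print(R)               # rows (1,0,0,1,1), (0,1,0,-3,2), (0,0,1,2,-3), (0,0,0,0,0)
    print(pivot_columns)   # (0, 1, 2): the columns k_1, k_2, k_3, numbered from 0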

Rather than trying to remember the sequence of elementary operations


used to reduce A to Hermite normal form, it is more efficient to perform
these operations on the unit matrix at the same time we are operating on
A. It is suggested that we arrange the work in the following way:

    [  4   3   2  −1   4    1   0   0   0 ]
    [  5   4   3  −1   4    0   1   0   0 ]   = [A, I]
    [ −2  −2  −1   2  −3    0   0   1   0 ]
    [ 11   6   4   1  11    0   0   0   1 ]

    [ 1   1   1   0   0    −1    1   0   0 ]
    [ 0  −1  −2  −1   4     5   −4   0   0 ]
    [ 0   0   1   2  −3    −2    2   1   0 ]
    [ 0  −5  −7   1  11    11  −11   0   1 ]

    [ 1   0  −1  −1   4     4   −3   0   0 ]
    [ 0   1   2   1  −4    −5    4   0   0 ]
    [ 0   0   1   2  −3    −2    2   1   0 ]
    [ 0   0   3   6  −9   −14    9   0   1 ]

    [ 1   0   0   1   1     2   −1    1   0 ]
    [ 0   1   0  −3   2    −1    0   −2   0 ]
    [ 0   0   1   2  −3    −2    2    1   0 ]
    [ 0   0   0   0   0    −8    3   −3   1 ]

In the end we obtain

    Q⁻¹ = [  2  −1   1   0 ]
          [ −1   0  −2   0 ]
          [ −2   2   1   0 ]
          [ −8   3  −3   1 ]
Verify directly that Q⁻¹A is in Hermite normal form.

If A were non-singular, the Hermite normal form obtained would be the
identity matrix. In this case Q⁻¹ would be the inverse of A. This method
of finding the inverse of a matrix is one of the easiest available for hand
computation. It is the recommended technique.
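The [A, I] device is also easy to automate. The following sketch (NumPy assumed; a simple Gauss–Jordan loop of my own, not the text's procedure verbatim) reduces [A, I] and reads off A⁻¹ from the right-hand block when A is non-singular.

    import numpy as np

    def inverse_by_row_reduction(A):
        """Reduce [A, I] to [I, A^-1] by elementary row operations (A assumed non-singular)."""
        n = A.shape[0]
        M = np.hstack([A.astype(float), np.eye(n)])
        for col in range(n):
            pivot = col + np.argmax(np.abs(M[col:, col]))   # type III: bring a non-zero entry up
            M[[col, pivot]] = M[[pivot, col]]
            M[col] /= M[col, col]                           # type I: make the pivot a 1
            for row in range(n):
                if row != col:                              # type II: clear the rest of the column
                    M[row] -= M[row, col] * M[col]
        return M[:, n:]

    A = np.array([[1, 2, 3],
                  [2, 3, 4],
                  [3, 4, 6]])
    print(np.round(inverse_by_row_reduction(A)))   # the inverse found in Exercise 1 of Section 3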


EXERCISES
1. Elementary operations provide the easiest methods for determining the rank
of a matrix. Proceed as if reducing to Hermite normal form. Actually, it is not
necessary to carry out all the steps, as the rank is usually evident long before the
Hermite normal form is obtained. Find the ranks of the following matrices:
'1 2 3
(a)

4 5 6

7 8 9
'
(b) 1

-1
-2 -3
"0 2"
(c) 1

1 3

2 3

2. Identify the elementary operations represented by the following elementary
matrices:

(a) 1

-2
(b)
"0 r
1

0"
(c) "i

3. Show that the product

    [ −1  0 ] [ 1  0 ] [ 1  −1 ] [ 1  0 ]
    [  0  1 ] [ 1  1 ] [ 0   1 ] [ 1  1 ]

is an elementary matrix. Identify the elementary operation represented by each
matrix in the product.

4. Show by an example that the


product of elementary matrices is not necessarily
an elementary matrix.

5. Reduce each of the following matrices to Hermite normal form.


(a) "2 1 3 -2'

2 -1 5 2

1 1 1 1

(b) 12 3 3 10 6"

2 10 2 3

2 2 2 1 5 5

-113 2 5 2_

6. Use elementary row operations to obtain the inverses of

    (a) [  3  −1 ]        and        (b) [ 1  2  3 ]
        [ −5   2 ]                       [ 2  3  4 ]
                                         [ 3  4  6 ]
7. (a) Show that, by using a sequence of elementary operations of type II only,

any two rows of a matrix can be interchanged with one of the two rows multiplied
by —1. (In fact, the type II operations involve no scalars other than ±1.)
(b) Using the results of part (a), show that a type III operation can be obtained
by a sequence of type II operations and a single type I operation.
(c) Show that the sign of any row can be changed by a sequence of type II
operations and a single type III operation.
8. Show that any matrix A can be reduced to the form described in Theorem 4.1

by a sequence of elementary row operations and a sequence of elementary column


operations.

7 | Linear Problems and Linear Equations


For a given linear transformation σ of U into V and a given β ∈ V the
problem of finding any or all ξ ∈ U for which σ(ξ) = β is called a linear
problem. Before providing any specific methods for solving such problems,
let us see what the set of solutions should look like.

If β ∉ σ(U), then the problem has no solution.

If β ∈ σ(U), the problem has at least one solution. Let ξ₀ be one such
solution. We call any such ξ₀ a particular solution. If ξ is any other solution,
then σ(ξ − ξ₀) = σ(ξ) − σ(ξ₀) = β − β = 0, so that ξ − ξ₀ is in the kernel
of σ. Conversely, if ξ − ξ₀ is in the kernel of σ, then σ(ξ) = σ(ξ₀ + ξ − ξ₀) =
σ(ξ₀) + σ(ξ − ξ₀) = β + 0 = β, so that ξ is a solution. Thus the set of all
solutions of σ(ξ) = β is of the form

    {ξ₀} + K(σ).        (7.1)

Since {ξ₀} contains just one element, there is a one-to-one correspondence
between the elements of K(σ) and the elements of {ξ₀} + K(σ). Thus the
size of the set of solutions can be described by giving the dimension of K(σ).
The set of all solutions of the problem σ(ξ) = β is not a subspace of U unless
β = 0. Nevertheless, it is convenient to say that the set is of dimension ν,
the nullity of σ.

Given the linear problem σ(ξ) = β, the problem σ(ξ) = 0 is called the
associated homogeneous problem. The general solution is then any particular
solution plus the solution of the associated homogeneous problem. The
solution of the associated homogeneous problem is the kernel of σ.
Now let σ be represented by the m × n matrix A = [aᵢⱼ], β be represented
by B = (b₁, …, bₘ), and ξ by X = (x₁, …, xₙ). Then the linear problem
σ(ξ) = β becomes

    AX = B        (7.2)

in matrix form, or

    Σⱼ₌₁ⁿ aᵢⱼxⱼ = bᵢ        (i = 1, …, m)        (7.3)

in the form of a system of linear equations.

Given A and B, the augmented matrix [A, B] of the system of linear
equations is defined to be

    [A, B] = [ a₁₁  ⋯  a₁ₙ  b₁ ]
             [  ⋮        ⋮   ⋮ ]        (7.4)
             [ aₘ₁  ⋯  aₘₙ  bₘ ]
Theorem 7.1. The system of simultaneous linear equations AX = B has a
solution if and only if the rank of A is equal to the rank of the augmented
matrix [A, B]. Whenever a solution exists, all solutions can be expressed in
terms of ν = n − ρ independent parameters, where ρ is the rank of A.

proof. We have already seen that the linear problem σ(ξ) = β has a
solution if and only if β ∈ σ(U). This is the case if and only if β is linearly
dependent on {σ(α₁), …, σ(αₙ)}. But this is equivalent to the condition
that B be linearly dependent on the columns of A. Thus adjoining the
column of bᵢ's to form the augmented matrix must not increase the rank.
Since the rank of the augmented matrix cannot be less than the rank of A,
we see that the system has a solution if and only if these two ranks are equal.
Now let Q be a non-singular matrix such that Q⁻¹A = A′ is in Hermite
normal form. Any solution of AX = B is also a solution of A′X = Q⁻¹AX =
Q⁻¹B = B′. Conversely, any solution of A′X = B′ is also a solution of
AX = QA′X = QB′ = B. Thus the two systems of equations are equivalent.

Now the system A′X = B′ is particularly easy to solve since the variable
xₖᵢ appears only in the ith equation. Furthermore, non-zero coefficients
appear only in the first ρ equations. The condition that β ∈ σ(U) also
takes on a form that is easily recognizable. The condition that B′ be ex-
pressible as a linear combination of the columns of A′ is simply that the
elements of B′ below row ρ be zero. The system A′X = B′ has the form

    xₖ₁ + a′₁,ₖ₁₊₁ xₖ₁₊₁ + ⋯ + a′₁,ₖ₂₊₁ xₖ₂₊₁ + ⋯ = b′₁
                   xₖ₂ + a′₂,ₖ₂₊₁ xₖ₂₊₁ + ⋯ = b′₂        (7.5)
                                  ⋮

Since each xₖᵢ appears in but one equation with unit coefficient, the remaining
n − ρ unknowns can be given values arbitrarily and the corresponding
values of the xₖᵢ computed. The n − ρ unknowns with indices not among the kᵢ
are the n − ρ parameters mentioned in the theorem. □

As an example, consider the system of equations:

    4x₁ + 3x₂ + 2x₃ −  x₄ =  4
    5x₁ + 4x₂ + 3x₃ −  x₄ =  4
    −2x₁ − 2x₂ − x₃ + 2x₄ = −3
    11x₁ + 6x₂ + 4x₃ + x₄ = 11.

The augmented matrix is

    [  4   3   2  −1   4 ]
    [  5   4   3  −1   4 ]
    [ −2  −2  −1   2  −3 ]
    [ 11   6   4   1  11 ]

This is the matrix we chose for an example in the previous section. There
we obtained the Hermite normal form

    [ 1   0   0   1   1 ]
    [ 0   1   0  −3   2 ]
    [ 0   0   1   2  −3 ]
    [ 0   0   0   0   0 ]

Thus the system of equations A′X = B′ corresponding to this augmented
matrix is

    x₁       + x₄ =  1
    x₂      − 3x₄ =  2
    x₃      + 2x₄ = −3.

It is clear that this system is very easy to solve. We can take any value
whatever for x₄ and compute the corresponding values for x₁, x₂, and x₃.
A particular solution, obtained by taking x₄ = 0, is X = (1, 2, −3, 0).

It is more instructive to write the new system of equations in the form

    x₁ =  1 −  x₄
    x₂ =  2 + 3x₄
    x₃ = −3 − 2x₄
    x₄ =        x₄.

In vector form this becomes

    (x₁, x₂, x₃, x₄) = (1, 2, −3, 0) + x₄(−1, 3, −2, 1).

We can easily verify that (−1, 3, −2, 1) is a solution of the associated
homogeneous problem. In fact, {(−1, 3, −2, 1)} is a basis for the kernel,
and x₄(−1, 3, −2, 1), for an arbitrary x₄, is a general element of the kernel.
We have, therefore, expressed the general solution as a particular solution
plus the kernel.
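The same decomposition into a particular solution plus the kernel can be produced mechanically. A sketch with SymPy (not part of the text), applied to the system just solved:

    from sympy import Matrix, symbols, linsolve

    x1, x2, x3, x4 = symbols('x1 x2 x3 x4')
    A = Matrix([[ 4,  3,  2, -1],
                [ 5,  4,  3, -1],
                [-2, -2, -1,  2],
                [11,  6,  4,  1]])
    B = Matrix([4, 4, -3, 11])

    print(linsolve((A, B), x1, x2, x3, x4))
    # the general solution in terms of the free variable x4:
    # a particular solution (1, 2, -3, 0) plus x4 * (-1, 3, -2, 1)
    print(A.nullspace())   # [Matrix([[-1], [3], [-2], [1]])]: a basis of the kernel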
The elementary row operations provide us with the recommended technique
for solving simultaneous linear equations by hand. This application is the
principal reason for introducing elementary row operations rather than
column operations.

Theorem 7.2. The equation AX = B fails to have a solution if and only if
there exists a one-row matrix C such that CA = 0 and CB = 1.

proof. Suppose the equation AX = B has a solution and a C exists
such that CA = 0 and CB = 1. Then we would have 0 = (CA)X = C(AX) =
CB = 1, which is a contradiction.
On the other hand, suppose the equation AX = B has no solution. By
Theorem 7.1 this implies that the rank of the augmented matrix [A, B] is
greater than the rank of A. Let Q be a non-singular matrix such that
Q⁻¹[A, B] is in Hermite normal form. Then if ρ is the rank of A, the (ρ + 1)st
row of Q⁻¹[A, B] must be all zeros except for a 1 in the last column. If C
is the (ρ + 1)st row of Q⁻¹, this means that

    C[A, B] = [0 ⋯ 0 1],
or
    CA = 0 and CB = 1. □

This theorem is important because it provides a positive condition for a
negative conclusion. Theorem 7.1 also provides such a positive condition
and it is to be preferred when dealing with a particular system of equations.
But Theorem 7.2 provides a more convenient condition when dealing with
systems of equations in general.
Although the systems of linear equations in the exercises that follow are
written in expanded form, they are equivalent in form to the matric equation
AX = B. From any linear problem in this set, or those that will occur later,
it is possible to obtain an extensive list of closely related linear problems
that appear to be different. For example, if AX = B is the given linear
problem with A an m × n matrix and Q is any non-singular m × m matrix,
then A′X = B′ with A′ = QA and B′ = QB is a problem with the same set of
solutions. If P is a non-singular n × n matrix, then A″X″ = B where A″ = AP
is a problem whose solution X″ is related to the solution X of the original
problem by the condition X″ = P⁻¹X.
For the purpose of constructing related exercises of the type mentioned,
it is desirable to use matrices P and
Q that do not introduce tedious numerical
calculations. It is very easy to obtain a non-singular matrix P that has only
integral elements and such that its inverse also has only integral elements.
Start with an identity matrix of the desired order and perform a sequence of
elementary operations of types II and III. As long as an operation of type I is
avoided, no fractions will be introduced. Furthermore, the inverse opera-
tions will be of types II and III so the inverse matrix will also have only
integral elements.
For convenience, some matrices with integral elements and inverses with
integral elements are listed in
an appendix. For some of the exercises that
are given later in this book, matrices of transition that satisfy special con-
ditions are also needed. These matrices, known as orthogonal and unitary
matrices, usually do not have integral elements. Simple matrices of these
types are somewhat harder to obtain. Some matrices of these types are also
listed in the appendix.

EXERCISES
1. Show that {(1,1,1, 0), (2, 1, 0, spans the subspace of
1)} all solutions of the
system of linear equations
3x x — 2x 2 — xz — 4#4 =
xi + x 2 — 2x3 — 3x4 = 0.

2. Find the subspace of all solutions of the system of linear equations


xx + 2x 2 — 3*3 +x i =
3x x — x2 + 5x3 — xi =
2x x + x2 4
= a; 0.

3. Find all solutions of the following two systems of non-homogeneous linear


equations.
(a) xx + 3x 2 + 5x3 — 2x4 = 11
3*! — 2x 2 — 7x 3 + 5*4 =
2x x + x2 + x4 = 7,
(b) xx + 3x 2 + 2x 3 + 5x4 = 10
3x x — 2x 2 — 5x3 + 4xi = —5
2x x + x2 - x3 + 5x4 = 5.
68 Linear Transformations and Matrices |
II

4. Find all solutions of the following system of non-homogeneous linear equations


ZtX-t *^2 "^3 — *

Ju i ~~ JC o ~T~ £•& o —— ""~~


^
^tJU-% J&q i~ *& q —— *""" J

5. Find all solutions of the system of equations,

7x 1 + 3x 2 + 21^3 — \3xi + x5 = —14


lOo^ + 3x 2 + 3Cte 3 — 16x4 + £5 = —23
7x 1 + 2« 2 + 21x 3 — llx4 + x5 = —16
9^! + 3z 2 + 27ic 3 - 15a; 4 + x5 = -20.
6. Theorem 7.1 states that a necessary and sufficient condition for the existence

of a solution of a system of simultaneous linear equations is that the rank of the


augmented matrix be equal to the rank of the coefficient matrix. The most efficient
way to determine the rank of each of these matrices is to reduce each to Hermite
normal form. The reduction of the augmented matrix to normal form, however,
automatically produces the reduced form of the coefficient matrix. How, and
where? How is the comparison of the ranks of the coefficient matrix and the
augmented matrix evident from the appearance of the reduced form of the aug-
mented matrix ?
7. The differential equation d 2y/dx 2 + Ay = sin x has the general solution
y = C x
sin 2x + C 2 cos 2x + | sin x. Identify the associated homogeneous prob-
lem, the solution of the associated homogeneous problem, and the particular solu-
tion.

8 I Other Applications of the Hermite Normal Form

The Hermite normal form and the elementary row operations provide
techniques for dealing with problems we have already encountered and
handled rather awkwardly.

A Standard Basis for a Subspace

Let A = {a l5 aj be a basis of U and let


. . . ,
be a subspace of U spanned W
by the set B = {/? l5 /?,.}. Since every subspace of U is spanned by a finite
. . .
,

set, it is no restriction to assume that B is finite. Let & = 2"=i ^a^i s0 tnat

(b a , . . . , b in ) is the «-tuple representing fa. Then in thematrix B = [b^]


each row is the representation of a vector in B. Now suppose an elementary
row operation is applied to B to obtain B'. Every row of B' is a linear com-
bination of the rows of and, since an elementary row operation has an B
inverse, every row of B is a linear combination of the rows of B' Thus the .

rows of B and the rows of B' represent sets spanning the same subspace W.
We can therefore reduce B to Hermite normal form and obtain a particular
set spanning W. Since the non-zero rows of the Hermite normal form are
linearly independent, they form a basis of W.
8 |
Other Applications of the Hermite Normal Form 69

Now C be another set spanning W. In a similar fashion we can con-


let

struct a matrix C whose rows represent the vectors in C and reduce this
matrix to Hermite normal form. Let C" be the Hermite normal form
obtained from C, and let B' be the Hermite normal form obtained from B.
We do not assume that B and C have the same number of elements, and there-
fore B' and C
do not necessarily have the same number of rows. However,
in each the number of non-zero rows must be equal to the dimension of W.
We claim that the non-zero rows in these two normal forms are identical.
To see this, construct a new matrix with the non-zero rows of C" written
beneath the non-zero rows of B' and reduce this matrix to Hermite normal
form. Since the rows of C
are dependent on the rows of B', the rows of C"
can be removed by elementary operations, leaving the rows of B'. Further
reduction is not possible since B' is already in normal form. But by inter-
changing rows, which are elementary operations, we can obtain a matrix in
which the non-zero rows of B' are beneath the non-zero rows of C". As
before, we can remove the rows of B' leaving the non-zero rows of as C
the normal form. Since the Hermite normal form is unique, we see that the
non-zero rows of B' and C are identical. The basis that we obtain from the
non-zero rows of the Hermite normal form is the standard basis with respect
to A for the subspace W.
This gives us an effective method for deciding when two sets span the
same subspace. For example, in Chapter 1-4, Exercise 5, we were asked to
show that {(1, 1,0, 0), (1, 0, 1, 1)} and {(2, -1, 3, 3), (0, 1, -1, -1)} span
the same space. In either case we obtain {(1, 0, 1, 1), (0, 1, —1, —1)} as the
standard basis.

The Sum of Two Subspaces

If Ax is a subset spanning W x and A 2 is a subset spanning VV2 then A x U A 2


,

spans W +W
1 2 (Chapter I, Proposition 4.4). Thus we can find a basis for
W + x VV2 by constructing a large matrix whose rows are the representations
u A 2 and reducing
of the vectors in A x it to Hermite normal form by ele-

mentary row operations.

The Characterization of a Subspace by a Set of Homogeneous


Linear Equations

We have already seen that the set of all solutions of a system of homo-
geneous linear equations is a subspace, the kernel of the linear transformation
represented by the matrix of coefficients. The method for solving such a
system which we described in Section 7 amounts to passing from a charac-
terization of a subspace as the set of all solutions of a system of equations
to its description as the set of all linear combinations of a basis. The question
70 Linear Transformations and Matrices II
|

naturally arises: If we are given a spanning set for a subspace W, how can
we find a system of simultaneous homogeneous linear equations for which W
is exactly the set of solutions ?
This
is not at all difficult and no new procedures are
required. All that
isneeded is a new look at what we have already done. Consider the homo-
geneous linear equation a x x x -\ + a n x n = 0. There is no significant
difference between the a/s and the s/s in this equation ; they appear sym-
metrically. Let us exploit this symmetry systematically.
If a 1 x 1 + + a nx n = and b x x x +
• • •
n n =
\- b x are two homo-
geneous linear equations then (a x + b )x +
x x + (a n + b n )xn = is a • • •

homogeneous linear equation as also is aa 1 x x + + aa n x n = where • • •

a e F. Thus we can consider the set of all homogeneous linear equations in


n unknowns as a vector space over F. The equation a x + +ax = x x
• • •
n n
is represented by the «-tuple (a x , . . . , a n ).
When we write a matrix to represent a system of equations and reduce that
matrix to Hermite normal form we are finding a standard basis for the sub-
space of the vector space of all homogeneous linear equations in x x x, . . . ,
n
spanned by system of equations just as we did in the first part of this
this
section for a set of vectors spanning a subspace. The rank of the system
of
equations is the dimension of the subspace of equations spanned by the given
system.
W
Now let be a subspace given by a spanning set and solve for the subspace
£ of all equations satisfied by W. Then solve for the subspace of solutions of
the system of equations £. W
must be a subspace of the set of all solutions.
Let W
be of dimension v. By Theorem 7.1 the dimension of £ is n — v.
Then, in turn, the dimension of the set of all solutions of £ is n — (n — v) = v.
Thus W
must be exactly the space of all solutions. Thus and £ characterize W
each other.
If we start with a system of equations and solve it by means of the Hermite
normal form, as described in Section 7,
in a natural way a basis we obtain
for the subspace of solutions. This basis, however, will not be the standard
basis. We can obtain full symmetry between the standard system of equations
and the standard basis by changing the definition of the standard basis.
Instead of applying the elementary row operations by starting with the left-
hand column, start with the right-hand column. If the basis obtained in this
way called the standard basis, the equations obtained will be the standard
is

equations, and the solution of the standard equations will be the standard
basis. In the following example the computations will be carried out in this
way to illustrate this idea. It is not recommended, however, that this be
generally done since accuracy with one definite routine is more important.
Let
W= <(1, 0, -3, 11, -5), (3, 2, 5, -5, 3), (1, 1,2, -4, 2), (7, 2, 12, 1, 2)>.
8 |
Other Applications of the Hermite Normal Form 71

We now find a standard basis by reducing

-3 11 -5
2 5 -5 3

2 -4 2

12 1 2

to the form
5 1

2 1

From this we see that the coefficients of our systems of equations satisfy the
conditions

2a x + 5az + a5 =
ax + 2a z + a4 =0
ax + a2 = 0.

The coefficients a x and a z can be selected arbitrarily and the others computed
from them. In particular, we have

(a x , a 2 , a 3 , a 4 , ab ) = a x (l, -1, 0, -1, -2) + a 3 (0, 0, 1, -2, -5).


The 5-tuples (1,-1,0,-1,-2) and (0,0, 1, -2, -5) represent the two
standard linear equations

xl ~ x2 ~ xi — 2x 5 =

The reader should check that the vectors in actually W satisfy these equations
and that the standard basis for is obtained. W
The Intersection of Two Subspaces
Let W x and W 2 be subspaces of U of dimensions v x and i> 2 , respectively,

and let W n x VV2 be of dimension v. Then x + VV2 is of dimension W


vi + v2 — v. Let E x and £ 2 be the spaces of equations characterizing Wx
and W 2. As we have seen E x is of dimension n — vx and £ 2 is of dimension
n — v Let the dimension of E x + £ be
2. Then E n £ is of dimension 2 />. x 2
(« - v ) + (n - v ) - p = 2n - v - v2 - p.
x 2 x
Since the vectors in Wx n W satisfy the equations in both £ and £
2
x 2,
they satisfy the equations in E x + £ Thus v < n — p. On the other hand,
2.
72 Linear Transformations and Matrices |
II

W x and VV2 both equations in E n £ so that W + VV satisfies the


satisfy the x 2 1 2
equations n£ Thus v + r — v < n — {2n — v — v — p) =
in £x 2. x 2 x 2
v + ^2 +
i
— «• A comparison of these two inequalities shows that

v — n — p and hence that W n VV is characterized by £ + £ x 2 x 2.


Given W and W the easiest way to find W n W is to determine £ x and
x 2 ,
x 2
£ and then £ + £
2 From £ + £ we can then find W n VV In effect,
x 2- x 2 x 2.
this involves solving three
systems of equations, and reducing to Hermit
normal form three times, but it is still easier than a direct assault on the
problem.
As an example consider Exercise 8 of Chapter 1-4. Let x = ((1,2, 3, 6), W
(4,-1,3,6), (5, 1,6, 12)) and 2
= ((1,-1,1,1), (2,-1,4,5)). Using W
the Hermite normal form, we find that £ x = <(— 2, — 2, 0, 1), (— 1, — 1, 1,0))
and £ 2 = <(-4, -3,0, 1), (-3,-2,1,0)). Again, using the Hermite
normal form we find that the standard basis for Ej + £ 2 is {(1, 0, 0, £),
(0,1,0,-1), (0,0,1,— |)}. And from this we find quite easily that,
W,nW 2
= ((-i,l, i, 1)>.
Let B = {/?!, ^ 2 (5 n ) be a given finite set of vectors.
, . . .
, We wish to
solve the problem posed in Theorem 2.2 of Chapter I. How do we show that
some fi k
is a linear combination of the & with i < k; or how do we show that

no ft k can be so represented ?
We are looking for a relation of the form

& i

P* = I ***&• « (8.1)

This not a meaningful numerical problem unless ^ is a given specific set.


is

This usually means that the are given in terms of some coordinate system, &
relative to some given basis. But the relation (8.1) is independent of any
coordinate system so we are free to choose a different coordinate system if this
willmake the solution any easier. It turns out that the tools to solve this
problem are available.
Let A = {<*!, a TO } be the given basis and let
. . . ,

m
Pi = Z a» a *' ;' = 1, . . . ,
n. (8.2)

If A' = {a{, . . . , a'm } is the new basis (which we have not specified yet),
we would have
m
Pi = 2 a 'n^ j =1,. . . ,n. (8.3)
i=l

What is the relation between A = [a tj ] and A' = [a'i}]7 If P is the matrix of


transition from the basis A to the basis A', by formula (4.3) we see that

A = PA'. (8.4)
8 |
Other Applications of the Hermite Normal Form 73

Since P is non-singular, it can be represented as a product of elementary


matrices. This means A' can be obtained from A by a sequence of elementary
row operations.
The solution to (8.1) is now most conveniently obtained if we take A' to be
in Hermite normal form. Suppose that A' is in Hermite normal form and
use the notation given in Theorem 5.1. Then, for Pk we would have .

Pki = <4 (8.5)


and for j between kr and kr+1 we would have
r

Pi = 2 a i><

= 2 a 'iAi (8.6)

Since k t < kr < j, this last expression is a relation of the required form.
(Actually, every linear relation that exists among the j8 < can be obtained
from those in (8.6). This assertion will not be used later in the book so we
will not take space to prove Consider it "an exercise for the reader.")
it.

Since the columns of A and


A' represent the vectors in 8, the rank of A is
equal to the number of vectors in a maximal linearly independent subset of 8.
Thus, if 8 is linearly independent the rank of A will be n, this means that the
Hermite normal form of A will either show that 6 is linearly independent
or reveal a linear relation in 8 if it is dependent.
For example, consider the {(1,0,-3,11,-5), (3,2,5,-5, 3),
set

(1, 1,2, —4, 2), (7, 2, 12, 1, 2)}. The implied context is that a basis A =
{a l5 . . . , a5 } is considered to be given and that & = a x — 3a 3 + lla 4 — 5a 5
etc. According to (8.2) the appropriate matrix is

"1-3 1 T
2

-3 12

11 -5 -4 1

-5 3 2 2_

which reduces to the Hermite normal form


"1
3"
4

2 3
1 4"

a 9
1 2"
. :

74 Linear Transformations and Matrices |


II

It is easily checked that —1(1, 0, -3, 11, -5) + -2 3


T -(3, 2, 5, -5, 3) -
9
-V -0,l,2, -4, 2) = (7, 2, 12, 1,2).

EXERCISES
1 Determine which of the following set in R4 are linearly independent over R.

(a) {(1,1,0,1), (1, -1,1,1), (2,2, 1,2), (0,1,0,0)}.


(b) {(1, 0, 0, 1), (0, 1, 1, 0), (1, 0, 1, 0), (0, 1, 0, 1)}.
(c) {(1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, 1), (1, 1, 1, 1)}.

This problem is identical to Exercise 8, Chapter 1-2.

2. Let W
be the subspace of R 5 spanned by {(1,1,1,1,1), (1,0,1,0,1),
(0,1,1,1,0), (2,0,0,1,1), (2,1,1,2,1), (1, -1, -1, -2,2), (1,2,3,4, -1)}.
Find a standard basis for W
and the dimension of W. This problem is identical
to Exercise 6, Chapter 1-4.

3. Show that {(1, -1,2, -3), (1,1,2,0), (3, -1,6, -6)} and {(1,0,1,0),
(0, 2, 0, 3)} do not span the same subspace. This problem is identical to Exercise 7,
Chapter 1-4.

4. If W x
= <(1, 1, 3, -1), (1, 0, -2, 0), (3, 2, 4, -2)> and W 2
= <(1, 0, 0, 1),
(1, 1,7, 1)> determine the dimension of W x + W 2.

5. Let W
= <(1, -1, -3, 0, 1), (2, 1,0, -1,4), (3, 1,-1,1, 8), (1, 2, 3, 2, 6)>.
Determine the standard basis for W. Find a set of linear equations which char-
acterize W.
6. Let W x
= <(1, 2, 3, 6), (4, -1, 3, 6), (5, 1, 6, 12)> and W 2
= <(1, -1, 1, 1),

(2, -1be subspaces of R4 Find bases for


, 4, 5)> x
n 2 and x + VV2 Extend
. W W W .

the basis of VV2 n 2 to a basis of x and extend W


the basis of 1 W
n 2 to a basis W W
of VV2 From these bases obtain a basis of
.
x + 2 This problem is identical W W .

to Exercise 8, Chapter 1-4.

9 I Normal Forms

To understand fully what a normal form is, we must first introduce the
concept of an equivalence relation. We say that a relation is defined in a
set if, for each pair {a, b) of elements in this set, it is decided that "a is

related to Z>" or "a is not related to 6." If a is related to b, we write a~ b.


An equivalence relation in a set S is a relation in S satisfying the following laws
Reflexive law: a~ a,
Symmetric law: If a <~ b, then b <~ a.

Transitive law : If a ~ b and b ~ c, then a ^ c.

If for an equivalence relation we have a ~ b, we say that a is equivalent to b.


9 |
Normal Forms 75

Examples. Among rational fractions we can define a\b ~ c\d (for a, b,c,d
integers) if and only if ad = be. This is the ordinary definition of equality
in rational numbers, and this relation satisfies the three conditions of an
equivalence relation.
In geometry we do not ordinarily say that a straight line is parallel to
itself. But if we agree to say that a straight line is parallel to itself, the
concept of parallelism is an equivalence relation among the straight lines
in the plane or in space.
Geometry has many equivalence relations: congruence of triangles,
similarity of triangles, the concept of projectivity in projective geometry,
etc. In dealing with time we use many equivalence relations: same hour
of the day, same day of the week, etc. An equivalence relation is like a
generalized equality. Elements which are equivalent share some common
or underlying property. As an example of this idea, consider a collection of
sets. We say that two sets are equivalent if their elements can be put into

a one-to-one correspondence; for example, a set of three battleships and a


set of three cigars are equivalent. Any set of three objects shares with
any other set of three objects a concept which we have abstracted and called
"three." All other qualities which these sets may have are ignored.
It is most natural, therefore, to group mutually equivalent elements

together into classes which we call equivalence classes. Let us be specific


about how this is done. For each a e S, let S a be the set of all elements in
S equivalent to a; that is, b e Sa if and only if b a. We wish to show that ~
the various sets we have thus defined are either disjoint or identical.
Suppose S n S b is not empty; that is, there exists a cES a nS b such
that c r^j a and c b. ~
By symmetry b <~ c, and by transitivity b a. —
If d any element of S 6
is and hence , d^b d~
a. Thus d eSa and S 6 <= Sa .

Since the relation between S and S b is symmetric we also have S <= S b and
hence Sa = S b Since a eSu we have shown, in effect, that a proposed
.

equivalence class can be identified by any element in it. An element selected


from an equivalence class will be called a representative of that class.
An equivalence relation in a set S defines a partition of that set into
equivalence classes in the following sense: (1) Every element of S is in some
equivalence class, namely, a e S (2) Two elements are in the same equiva-
.

lence class if and only if they are equivalent. (3) Non-identical equivalence
classes are disjoint. On the other hand, a partition of a set into disjoint
subsets can be used to define an equivalence relation; two elements are
equivalent if and only if they are in the same subset.
The notions of equivalence and equivalence classes are not
relations
nearly so novel as they may seem
Most students have encountered
at first.
these ideas before, although sometimes in hidden forms. For example,
we may say that two differentiable functions are equivalent if and only if
76 Linear Transformations and Matrices |
II

they have the same derivative. In calculus we use the letter "C" in describing
the equivalence classes; x + + 2z + C is the set (equiva-
for example, 3
#2
lence class) of all functions whose derivative is 3x + 2x + 2.
2

In our study of matrices we have so far encountered four different equiva-


lence relations:

I. The matrices A and B are said to be left associate if there exists a


non-singular matrix Q such that B = Q~ X A. Multiplication by Q~ x corre-
sponds to performing a sequence of elementary row operations. If A
represents a linear transformation a of U into V with respect to a basis A in U
and a basis B in V, the matrix B represents a with respect to A and a new basis
in V.
II. The matrices A and B are said to be right associate if there exists a
non-singular matrix P such that B= AP.
The matrices A and B are said
III. to be associate if there exist non-
singular matrixes P and Q such that B = Q~ X AP. The term "associate" is

not a standard term for this equivalence relation, the term most frequently
used being "equivalent." It seems unnecessarily confusing to use the same

term for one particular relation and for a whole class of relations. Moreover,
this equivalence relation is perhaps the least interesting of the equivalence
relations we shall study.
IV. The matrices A and B are said to be similar if there exists a non-
singular matrix P such that B = P~ X AP. As we have seen (Section 4) similar
matrices are representations of a single linear transformation of a vector
space into itself. This is one of the most interesting of the equivalence
relations, and Chapter III isdevoted to a study of it.
Let us show in detail that the reation we have defined as left associate
is an equivalence The matrix Q~ x appears in the definition because
relation.
Q represents the matrix of transition. However, Qr x is just another
singular matrix, so it is clearly the same thing to say that A and B are left
associate if and only if there exists a non-singular matrix Q such that B = QA.

(1) A —A ' since IA = A.


(2) If A ~ B, there is a non-singular matrix Q such that B = QA. But
then A = Q~ X B so that B ~ A.
(3) If A r^j B and B~ C, there exist non-singular matrices Q and P
such that B = QA and C PB. But then PQA = PB = C and PQ
= is

non-singular so that A ~ C.
For a given type of equivalence relation among matrices a normal form
is a particular matrix chosen from each equivalence class. It is a repre-
sentative of the entire class of equivalent matrices. In mathematics the
terms "normal" and "canonical" are frequently used to mean "standard"
in some particular sense. A normal form or canonical form is a standard
9 |
Normal Forms 77

form selected to represent a class of equivalent elements. A normal form


should be selected to have the following two properties Given any matrix :

A, (1) it should be possible by fairly direct and convenient methods to find


the normal form of the equivalence class containing A, and (2) the method
should lead to a unique normal form.
Often the definition of a normal form is compromised with respect to the
second of these desirable properties. For example, if the normal form were
a matrix with complex numbers in the main diagonal and zeros elsewhere,
to make the normal form unique would be necessary to specify the order
it

of the numbers in the main diagonal. But it is usually sufficient to know


the numbers in the main diagonal without regard to their order, so it would
be an awkward complication to have to specify their order.
Normal forms have several uses. Perhaps the most important use is that
the normal form should yield important or useful information about the
concept that the matrix represents. This should be amply illustrated in
the case of the concept of left associate and the Hermite normal form. We

introduced the Hermite normal form through linear transformations, but


we found that it yielded very useful information when the matrix was used
to represent linear equations or bases of subspaces.
Given two matrices, we can use the normal form to tell whether they are
equivalent. It is often easier to reduce each to normal form and compare

the normal forms than it is to transform one into the other. This is the case,
for example, in the application described in the
first part of Section 8.

Sometimes, knowing the general appearance of the normal form, we can


find all the information we need without actually obtaining the normal
form. This is the case for the equivalence relation we have* called associate.
The normal form for this equivalence relation is described in Theorem 4.1.
T*^re is just one normal form for each possible value of the rank. The
lia'mber of different equivalence classes is min {m, n} + 1. With this notion
of equivalence the rank of a matrix is the only property of importance. Any
two matrices of the same rank are associate. In practice we can find the rank
without actually computing the normal form of Theorem 4.1. And knowing
the rank we know the normal form.
We encounter several more equivalence relations among matrices. The
type of equivalence introduced will depend entirely on the underlying con-
cepts the matrices are used to represent. It is worth mentioning that for the
equivalence relations we introduce there is no necessity to prove, as we did
for an example above, that each is an equivalence relation. An underlying
concept will be defined without reference to any coordinate system or choice
of basis. The matrices representing this concept will transform according
to certain rules when the basis is changed. Since a given basis can be retained
the relation defined is reflexive. Since a basis changed can be changed back
78 Linear Transformations and Matrices |
II

to the original basis, the relation defined is symmetric. A basis changed once
and then changed again depends only on the final choice so that the relation is

transitive.
For a fixed basis A in U and 8 in V two different linear transformations
a and t of U into V are represented by different matrices. If it is possible,
however, to choose bases A' in U and 8' in V such that the matrix representing
r with respect to A' and 8' is the same as the matrix representing a with
respect to A and 8, then it is certainly clear that a and t share important
geometric properties.
For a fixed a two matrices A and A' representing a with respect to different
bases are related by a matrix equation of the form A' = Q~XAP. Since
A and A' represent the same linear transformation we feel that they should
have some properties in common, those dependent upon a.
These two points of view are really slightly different views of the same
kind of relationship. In the second case, we can consider A and A' as
representing two linear transformations with respect to the same basis,
instead of the same linear transformation with respect to different bases.
"1 01
For example, in R 2 the matrix represents a reflection about the
-1 0" -lj
^-axis and represents a reflection about the # 2 -axis. When both
1.

linear transformations are referred to the same coordinate system they are
different. However, for the purpose of discussing properties independent
of a coordinate system they are essentially alike. The study of equivalence
relations is motivated by such considerations, and the study of normal forms
is aimed at determining just what these common properties are that are
shared by equivalent linear transformations or equivalent matrices.
To make these ideas precise, let a and r be linear transformations of V
into itself. We say that a and t are similar if there exist bases A and 8 of V
such that the matrix representing a with respect to A is the same as the matrix
representing t with respect to 8. If A and B are the matrices representing
a and t with respect to A and P is the matrix of transition from A to 8, then
P~X BP is the matrix representing r with respect to 8. Thus a and r are
similar if P^BP = A.
In a similar way we can define the concepts of left associate, right associate,
and associate for linear transformations.

*10 I Quotient Sets, Quotient Spaces

Definition. If S is any set on which an equivalence relation is defined, the

collection of equivalence classes is called the quotient or factor set. Let S

denote the quotient set. An element of S is an equivalence class. If a is an


10 I
Quotient Sets, Quotient Spaces 79

element of S and a is the equivalence class containing a, the mapping t] that


maps a onto a is well defined. This mapping is called the canonical mapping.

Although the concept of a quotient set might appear new to some, it is


certain that almost everyone has encountered the idea before, perhaps in
one guise or another. One example occurs in arithmetic. In this setting,
let S be the set of
all formal fractions of the form afb where a and b are

integersand b ^ 0. Two such fractions, ajb and c/d, are equivalent if and
only if ad = be. Each equivalence class corresponds to a single rational
number. The rules of arithmetic provide methods of computing with rational
numbers by performing appropriate operations with formal fractions selected
from the corresponding equivalence classes.
Let U be a vector space over F and let K be a subspace of U. We shall call
two vectors a, p e U equivalent modulo K if and only if their difference lies
in K. Thus a /^- p if and only if a — p e K. We must first show this defines
<

an equivalence relation. (1) a a because a — a = ~


e K. (2)a^/?=>
a — p 6 K => p — a e K => p a. (3) {a p and p ~
y} => {a - e K and ~ ~
p — y eK}. Since K is a subspace a — 7 = (a — /?) + (0 — y) e K and,
hence, a y. Thus ~"~" is an equivalence relation.
We wish to define vector addition and scalar multiplication in U. For
a e U, SeU
denote the equivalence class containing a. a is called a
let

representative of a. Since a may contain other elements besides a, it may

happen that a^a' and yet a = a'. Let a and p be two elements in U. Since
a, /? e U, a + is defined. We wish to define a + to be the sum of a and
jS /? /5.

In order for this to be well defined we must end up with


same equivalence the
class as the sum if different representatives are chosen from a and /?. Suppose
a = a' a nd,g = /?' Then a - a' £ K, . - ? e K, and (a + $) - (a' + ?) £
/C. Thus a + = a' + /?' and the sum is well defined. Scalar multiplication
is defined similarly. For aeF, aa. is thus defined to be the equivalence class

containing ace; that is, aa = aa These operations in (Jare said to be induced


by the corresponding operation in U.

Theorem 10.1. If U is a vector space over F, and K is a subspace of U,


the quotient set U
with vector addition and scalar multiplication defined as
above is a vector space over F.
proof. We leave this as an exercise.

For any a £ U, the symbol a + K is used to denote the set of all elements
in U that can be written in the form a + y where y £ K. (Strictly speaking,
we should denote the set by {a} + K so that the plus sign combines two objects
of the same type. The notation introduced here is traditional and simpler.)
The set a + K is called a coset of K. If /? £ a + K, then p — a £ K and
80 Linear Transformations and Matrices |
II

j8 ~ a. Conversely,
if ~
a, then £ - a = y e K so /S e a + K. Thus
a + isKsimply the equivalence class a containing a. Thus a + K = /? +K
if and only ifae£ = £ + Kor£ea = a + K.

The notation a + K to denote a is convenient to use in some calcula tions.


For example, a + = (a + K) + (0 + K) = a + + K =a+ 0, and
aa = a(a + K) = aa + aK c fla + K = tfa. Notice that aa. =act. when a
and ool are considered to be elements of U and scalar mutliplication is the
induced operation, but that ail and ad. may not be the same when they are
viewed as subsets of U (for example, let a = 0). However, since acx
c: ace

the set acl determines the desired coset in U for the induced operations. Thus
we can compute U by doing the corresponding operations with
effectively in
representatives. This is precisely what is done when we compute in residue
classes of integers modulo an integer m.

Definition. U with the induced operations is called a factor space or quotient

space. In order to designate the role of the subspace K which defines the

equivalence relations, U is usually denoted by U/K.

we actually encountered
In our discussion of solutions of linear problems
quotient spaces, but the discussion was worded in such a way as to avoid
introducing this more sophisticated concept. Given the linear transformation

a of U into the kernel of a and let U = U/K be the corresponding


V, let K be
quotient space. and a 2 are solutions of the linear problem, oc(f) = 0,
If a x
then a(a. x — <x 2 ) = so that a x and a 2 are in the same coset of K. Thus
for each fi e lm(a) there corresponds precisely one coset of K. In fact
the correspondence between U/K and lm(a) is an isomorphism, a fact which
is made more precise in the following theorem.

Theorem homomorphism theorem). Let a be a linear trans-


10.2. (First
formation ofU Let K be the kernel of a. Then a can be written as the
into V.

product of a canonical mapping rj of U onto = U/K and a monomorphism


a x of U into V.
proof. The canonical mapping r\ has already been defined. To define

a lf for each a e U let (7 x (a) = cr(a) where a is any representative of a. Since

(7(a) = (r(a') a~a',


a x is well defined.
for It is easily seen that ax is a
monomorphism since a must have different values in different cosets. D
The homomorphism theorem is usually stated by saying, "The homo-
morphic image is isomorphic to the quotient space of U modulo the kernel."
Theorem (Mapping decomposition theorem). Let a be a linear
10.3.
transformation ofil into V. Let K be the kernel of a and the image of a. Then I

a can be written as the product a = iax rj, where rj is the canonical mapping of
10 I
Quotient Sets, Quotient Spaces 81

U onto U = UjK, a 1 is an isomorphism of U onto I, and i is the injection of \

into V.
proof. Let a' be the linear transformation of U onto induced by re- /

stricting the codomain of a to the image of a. By Theorem 10.2, a' can be


written in the form a' = a^.

Theorem 10.4. {Mapping factor theorem). Let S be a subspace of U and


let U =
U/S be the resulting quotient space. Let a be a linear transformation of
U into V, and let K be the kernel of a. If S <= K, then there exists a linear
transformation a x of U into V such that a = ax r\ where -n is the canonical

mapping of U onto U.
proof. For each oteU, let cr^a) = c(a) where a e a. If a' is another
representative of a, then a — a' £S <= K. Thus <r(a) = cT(a') and a x is well
defined. It is easy to check that g x is linear. Clearly, <r(<x) = o^a) = ^(^(a))
for all a e U, and a = a x rj.

We
say that a factors through (J.
Note that the homomorphism theorem is a special case of the factor theorem
in which K = S.

Theorem 10.5. {Induced mapping theorem). Let U and V be vector spaces


over F, and let r be a linear transformation of (J into V. Let U be a subspace of

U and let V be a subspace of V. If t{U ) <= V // is possible to define in a ,

natural way a mapping f of U/U into V/V such that o 2 t = fax where ax is the
canonical mapping U onto U and a 2 is the canonical mapping of V onto V.

proof. Consider a = a 2 r, which maps U into V. The kernel of a is t -1 (V ).


By assumption, U <= r_1 (V ). Hence, by the mapping factor theorem, there
is a linear transformation f such that fa x = a 2 r.

We say that f is induced by r.


Numerical calculations with quotient spaces can usually be avoided in
problems involving finite dimensional vector spaces. If U is a vector space
over F and K is a subspace of U, we know from Theorem 4.9 of Chapter I
thatK is a direct summand. Let U = K ® W. Then the canonical mapping
r\maps W
isomorphically onto U/K. Thus any calculation involving UjK
can be carried out in W.
Although there are many possible choices for the complementary subspace
W, the Hermite normal form provides a simple and effective way to select a W
and a basis for it. This typically arises in connection with a linear problem.
To see this, reexamine the proof of Theorem 5.1. There we let k± k 2 kp , , . . . ,

be those indices for which <r(a .) ^ a{Ukil). We showed there that {(![,
fc
. . .
,

k}
fi'
where ^ = cr(a .) formed a basis of a{U). {a&i a^,
fc
<x
kJ
is a basis for , . . . ,

a suitable W
which is complementary to K{a).
82 Linear Transformations and Matrices I II

Example. Consider the linear transformation a of R 5 into R 3 represented by


the matrix
1 1 1"

It is easy to determine that the kernel K of a is 2-dimensional with basis


{(1, —1, — 1, 1,0), (0, 0, —1,0, 1)}. This means that a has rank 3 and the
image of a Thus R 5 = R 5 /K is isomorphic to R 3
is all of R 3 . .

Consider the problem of solving the equation c(£) = /?, where jS is


represented by (b x b 2 b 3 ). To solve this problem we reduce the augmented
, ,

matrix
"1 1 1 bx
1 1 b2
1 0-1 b3

to the Hermite normal form


"1 -1 b3

1 1 b2

1 1 1 bx - I
K
This means the solution £ is represented by

(b 3 ,b 2 ,b x -b 3 ,0,0) + x i (l, -l,-l,l,0) + z 5 (0, 0,-1,0,1).

(ft,, bt , bx - b 3 0, 0)
,
= M0, 0,1,0,0) + 6,(0, 1,0,0,0) +b 3 (\ ,0,-1,0,0)
is a particular solution and a convenient basis for a subspace W complementary
to K is {(0, 0, 1,0, 0), (0, 1, 0, 0, 0), (1, 0, -1, 0, 0)}. a maps Z> x (0, 0,
1, 0, 0) + b 2 (0, 1, 0, 0, 0) +
b 3 (l, 0, -1, 0, 0) onto (b lt b 2 b 3 ). , Hence, W
is mapped isomorphically onto R 3 .

This example also provides an opportunity to illustrate the working of the


5
first homomorphism theorem. For any (x x x 2 x 3 x A x 5 ) e R , , , , .

- i
(*!, X 2 X 3 Xi
, , Hr Xb ) =
+ X 3 + Xs)(0, 0, 0, 0)
(X X 1 ,

+ (*2 + *4)(0, 1,0,0,0)'


+ (^-^(1, 0,-1,0,0)
+ * (1, -1, -l,l,0) + (0, 0,-1,0,1).
4 a;
5

Thus {x x , x 2 , x 3 x x 5 is mapped onto the coset (x + x + x )(0, 0, 0, 0) +


, 4, ) x 3 5 1,

(x 2 + a; 4 )(0, 1, 0, 0, 0) + (x - )(l, 0, -1, 0, 0) + K under the natural


x a; 4

homomorphism onto R 5 /K. This coset is then mapped isomorphically onto


{x x + x3 + xh x 2
, + xx x x
,
— x y ) e R3 . However, it is somewhat contrived to
,

ll|Hom(U,V) 83
work out an example of this type. The main importance of the first homo-
morphism theorem is theoretical and not computational.

*11 I Hom(U, V)

Let U and V be vector spaces over F. have already observed in Section 1 We


that the set of all linear transformations of U into Vcan be made into a
vector
space over F by defining addition and scalar multiplication appropriately.
In this section we will explore some of the elementary consequences of this
observation. We shall call this vector space Hom(U, V), "The space of all
homomorphisms of U into V."

Theorem 11.1. If dim U = n and dim V = m, then dim Hom(U, V) = mn.


proof. Let {a x a„} be a basis of U and let {&,
, . . . , . . .
, fi n } be a basis of
V. Define the linear transformation of a tj by the rule

°wO*) = <*«&
TO

= ZWr
r=l
(n.i)

Thus a it is represented by the matrix [d ri d ik ] = Au A.


it has a zero in every
position except for a column j. 1 in row /

The set {a u } is linearly independent. For if a linear relation existed among


the o u it would be of the form

1 au^ii = 0.

This means 2. .fl4 ,(T, y (a ) fc


= for alia*. But J., ^(T,,(a ft
) = 2 w aw <5
y *ft
=
Zi aikPi = 0- Since {&} is a lineary independent set, a ik = for / = 1
2, . m. Since this is true for each k, all a = and {a it } is linearly
. . ,
u
independent.
If or e Hom(U, V) and cr(a fc ) = ^tt a*Pi,
then
m / n \

m n
=2 2>«o«(a*)

i=l 3=1 J

Thus {or„} spans Hom(U, V), which is therefore of dimension mn. a


If Vx is a subspace of V, every linear transformation of U into V also
x defines
a mapping of U into V. This mapping of U into V is a linear transformation
of
84 Linear Transformations and Matrices |
II

U Thus, with each element of Hom(U, Vj) there is associated in a


into V.
natural way an element of Hom(U, V). We can identify Hom(U, Vx ) with
asubsetofHom(U, V). With this identification Horn (U, Vx ) is a subspace of
Hom(U, V).
Now let U 1 be a subspace of U. In this case we cannot consider Hom(Ux V) ,

to be a subset of Hom(U, V) since a linear transformation in Hom(L/ 1 V) ,

is not necessarily defined on But any linear transformation in all of U.


Hom(U, V) is certainly defined on U x If a e Hom(U, V) we shall consider .

the mapping obtained by applying a only to elements in U x to be a new function


and denote it by R(a). R(a) is called the restriction of a to U x We can con- .

sider R(a) to be an element of Hom(L/ x , V).

It may happen that different linear transformations defined on U produce


the same restriction on U x We . say that o x and a 2 are equivalent on L/ if and
x

only if R(a x ) = R(a 2 ). It is clear that R(a + r) = i*(<r) + i?(r) and R(aa) =

aR(a) so that the mapping of Hom(U, V) into Kom(U x V) is linear. We call ,

this mapping R, the restriction mapping.


The kernel of R is clearly the set of all linear transformations in Hom(U, V)
that vanish on U x Let us denote this kernel by U*.
.

If a is any linear transformation belonging to Hom(Ux V), it can be ,

extended to a linear transformation belonging to Hom(U, V) in many ways.


If {a x , <xj is a basis of U such that {a l5
. . . ,
ar } is a basis of U x then let . . . , ,

(7(a,) = <r(a -) for j = 1 3


r, and let crfo) be defined arbitrarily for
, . . . ,

j =r+ 1 , . . . , n. Since a is then the restriction of a, we see that R is an


epimorphism of Hom(U, V) onto Hom^, V). Since Hom(U, V) is of dimen-
sion mn and Hom(U x V) , is of dimension mr, U* is of dimension m(n — r).

Theorem 11.2. Hom(L/ x , V) is canonically isomorphic to Hom(U, V)/U*.

Note: It helps the intuitive understanding of this theorem to examine the


method by which we obtained an extension off on U x to a on U. U* is the ,

set of all extensions of a when a is the zero mapping, and one can see directly
that the dimension of U* is (n — r)m.
chapter
III Determinants,
eigenvalues,
and similarity
transforma-
tions
This chapter is devoted to the study of matrices representing linear trans-

formations of a vector space into itself. We have seen that if A represents


a linear transformation a of V into itself with respect to a basis A, and P
is the matrix of transition from A to a new basis A', then P~ XAP = A' is

the matrix representing a with respect to A'. In this case A and A' are said
to be similar and the mapping of A onto A' = P~ AP is
X
called a similarity
transformation (on the set of matrices, not on V).

Given a, we seek a basis for which the matrix representing a is particularly


simple. In practice a is given only implicitly by giving a matrix A representing
a. The problem, then, is to determine the matrix of transition P so that
P~XAP has the desired form. The matrix representing a has its simplest
form whenever a maps each basis vector onto a multiple of itself; that is,
whenever for each basis vector a there exists a scalar A such that <r(a) = Aa.
It is not always possible to find such a basis, but there are some rather general

conditions under which it is possible. These conditions include most cases


of interest in the applications of this theory to physical problems.
The problem of finding non-zero a such that cr(oc) = Aa is equivalent to
the problem of finding non-zero vectors in the kernel of a A. This is a —
linear problem and we have given practical methods for solving it. But
there is no non-zero solution to this problem unless a — A is singular. Thus
we are faced with the problem of finding those A for which a — X is singular.
The values of X for which a — X is singular are called the eigenvalues of a,
and the non-zero vectors a for which <r(a) = Aa are called eigenvectors of a.
We introduce some topics from the theory of determinants solely for
the purpose of finding the eigenvalues of a linear transformation. Were
it not for this use of determinants we would not discuss them in this book.

Thus, the treatment given them here is very brief.

85
86 Determinants, Eigenvalues, and Similarity Transformations |
III

Whenever a basis of eigenvectors exists, the use of determinants will


provide a method for finding the eigenvalues and, knowing the eigenvalues,
use of the Hermite normal form will enable us to find the eigenvectors.
This method is convenient only for vector spaces of relatively small di-
mension. For numerical work with large matrices other methods are
required.
The chapter closes with a discussion of what can be done if a basis of
eigenvectors does not exist.

1 I Permutations

To define determinants and handle them we have to know something


about permutations. Accordingly, we introduce permutations in a form
most suitable for our purposes and develop their elementary properties.
A permutation n of a set S is a one-to-one mapping of S onto itself. We
are dealing with permutations of finite sets and we take S to be the set of
the first n integers; S == {1, 2, n). Let n(i) denote the element which
. . . ,

n associates with /. Whenever we wish to specify a particular permutation


we describe it by writing the elements of S in two rows the first row con- ;

taining the elements of S in any order and the second row containing the
element n(i) directly below the element i in the first row. Thus for S =
{1, 2, 3, 4}, the permutation n for which n{\) = 2, n(2) = 4, tt-(3) = 3,
and 7r(4) = 1 can conveniently be described by the notations
,

/I 2 3 4, /2 4 1 3V ^ ,4.32
Qr
\2 4 3 1/ \4 1 2 3/ \l 2 3 4

Two permutations acting on the same set of elements can be combined


as functions. Thus, if n and a are two permutations, an will denote that
permutation mapping i onto a[n(i)]; (an)(i) = a[n(i)]. As an example,
let n denote the permutation described above and let

12 3 4
a =
13 4 2
Then

=
12 3 4
OTT
3 2 4 1

Notice particularly that an ^ na.


If n and a is a unique permutation
are two given permutations, there
p such that pn = a. Since p must satisfy the condition that p[n(i)] = a(i),
p can be described in our notation by writing the elements n(i) in the first
:

1 I
Permutations 87

row and the elements a(i) in the second row. For the n and a described
above,
/2 4 3 1'

13 4 2/.

The permutation that leaves all elements of S fixed is called the identity
permutation and will be denoted by e. For a given tt the unique permutation
7T
_1
such that tt~ x tt = e is called the inverse of it.
in S we have tt{i) > -n-(j), we say that tt
If for a pair of elements i <y
performs an inversion. Let k(Tr) denote the total number of inversions
performed by tt; we then say that tt contains k(Tr) inversions. For the
permutation tt described above, k(Tr) = 4. The number of inversions in
-1
7T is equal to the number of inversions in tt.

For a permutation tt, let sgn tt denote the number (— l) kM "Sgn" is .

an abbreviation for "signum" and we use the term "sgn 77-" to mean "the
sign of 77." If sgn tt — 1 , we say that tt is even ; if sgn tt = — 1 , we say that
tt is odd.

Theorem 1.1. Sgn air = sgn a sgn tt. •

proof, a can be represented in the form

77(0 • • •
TT(j)
<y =
OTr(i) • • •
Ott(J)

because every element of S appears in the top row. Thus, in counting the
inversions in a it is sufficient to compare 77(1) and tt(j) with OTr(i) and on(j).
For a given i < j there are four possibilities
1. i <j; tt{i) < 7r(j); ottQ) < o-tt(j): no inversions.
2. i <j; Tr(i) < tt(j); aTr(i) > ott(j): one inversion in a, one in 0-77.
3. i <y; 7r(/) > 7r(y); cnr(i) > ott{j)\ one inversion in 77, one in <77r.
4. 1 <y; 7r(/) > 7r(/); (T7r(i) < gtt(j): one inversion in 77-, one in a, and
none in cm.
Examination of the above table shows that k{oTr) differs from k(a) + k(ir)

by an even number. Thus sgn an = sgn a sgn tt. •

Theorem 1.2. If a permutation tt leaves an element of S fixed, the inversions


involving that element need not be considered in determining whether tt is even
or odd.
proof. Suppose 7r(j) = j. There are j — elements of S less than j and 1

n — j elements of S larger than j. For < j an inversion occurs if and only i

if 7r(0 > tt(j) — j. Let k be the number of elements in S preceding j for i

which 7t(/) > j. Then there must also be exactly k elements of S following i

j for which tt(J) < j. It follows that there are 2k inversions involving j.
Since their number is even they may be ignored in determining sgn tt.
88 Determinants, Eigenvalues, and Similarity Transformations |
III

Theorem 1.3. A permutation which interchanges exactly two elements of


S and leaves all other elements of S fixed is an odd permutation.
proof. Let 77 be a permutation which interchanges the elements / andy
and leaves all other elements of S fixed. According to Theorem 1.2, in
determining sgn tt we can ignore the inversions involving all elements of
S other than i and/ There is just one inversion left to consider and sgn tt =
-1.

Among other things, this shows that there is at least one odd permutation.
In addition, there is at least one even permutation. From this it is but a
step to show that the number of odd permutations is equal to the number of
even permutations.
Let a be a fixed odd permutation. If n is an even permutation, an is odd.
Furthermore, cr_1 is also odd so that to each odd permutation r there cor-
responds an even permutation o~ x t. Since a~ 1 {aTr) = tt, the mapping of
the set of even permutations into the set of odd permutations defined by
tt ->
ott is one-to-one and onto. Thus the number of odd permutations is
equal to the number of even permutations.

EXERCISES
1. Show that there are n\ permutations of n objects.

There are six permutations of three


2. objects. Determine which of them are
even and which are odd.
There are 24 permutations of four objects. By use of Theorem 1.2 and
3.

Exercise 2 we can determine the parity (evenness or oddness) of 15 of these permu-


tations without counting inversions. Determine the parity of these 1 5 permutations
by this method and the parity of the remaining nine by any other method.
4. The nine permutations of four objects that leave no object fixed can be
divided into two types of permutations, those that interchange two pairs of objects
and those that permut the four objects in some cyclic order. There are three
permutations of the first type and six of the second. Find them. Knowing the
parity of the 15 permutations that leave at least one object fixed, as in Exercise 3,
and that exactly half of the 24 permutations must be even, determine the parity
of these nine.
5. By counting the inversions determine the parity of

'12 3 4 5

2 4 5 13
Notice that permutes the objects in {1, 2, 4} among themselves and the objects
tt

in {3, 5} amongthemselves. Determine the parity of it on each of these subsets


separately and deduce the parity of tt on all of S.
2 I
Determinants 89

2 I Determinants

Let A = [a it ] be a square n x n matrix. We wish to associate with this


matrix a scalar that will in some sense measure the "size" of A and tell us
whether or not A is non-singular.

Definition. The determinant of the matrix A = [a ti ] is defined to be the


scalar det A = \a it computed according to the rule
\

det A= \a ti \
= 2 ( s 8n *) a um a **w " ' a mi(n)> (2.1)

where the sum is taken over all permutations of the elements of S = {1 «}. , . . . ,

Each term of the sum is a product of n elements, each taken from a different
row of A and from a different column of A and sgn n. The number n is called ,

the order of the determinant.

As a direct application of this definition we see that

a lx a12
= flnfloo — flioflu
12 21- (2.2)
*2X

fl u fl la tfj

= «U a 22 a33 + ^12^23^31 + fll3«2i a 32


— «12«21^33 (2.3)
'21 fl 9
— tf 13 <Z 2 2 fl 31 — Onfl23^32-

fl a
«3i

In general, a determinant of order n will be the sum of n\ products. As


n increases, the amount of computation increases astronomically. Thus it
is very desirable to develop more efficient ways of handling determinants.

Theorem 2.1. det AT = det A.


proof. In the expansion of det A each term is of the form

(Sgn 7r)a lff(1 )a 2ir (2) " ' ' a nrrM-

The rows appear


factors of this term are ordered so that the indices of the
in the usual order and the column indices appear in a permuted order. In the
A T the same factors will appear but they will be ordered
expansion of det
T that is, according to the column indices
according to the row indices of A ,

of A. Thus this same product will appear in the form


-1
(Sgn 7T )a ff -i(i) i
ia a-i(2),2 ' ' '
«T _1 (n),n-

But since sgn tt-- 1 = sgn tt, this term is identical to the one given above.
T
Thus, in fact, all the terms in the expansion of det A are equal to cor-
T
responding terms in the expansion of det A, and det A = det A.
90 Determinants, Eigenvalues, and Similarity Transformations |
III

A consequence of this discussion is that any property of determinants


developed in terms of the rows (or columns) of A will also imply a cor-
responding property in terms of the columns (or rows) of A.

Theorem If A' is the matrix obtained from A by multiplying a row


2.2.
(or column) by a scalar c, then det A' = c det A.
of A
proof. Each term of the expansion of det A contains just one element
from each row of A. Thus multiplying a row of A by c introduces the factor c
into each term of det A. Thus det A' = c det A. •

Theorem 2.3. If A' is the matrix obtained from A by interchanging any


two rows (or columns) of A, then det A' = — det A.
proof. Interchanging two rows of A has the effect of interchanging two
row indices of the elements appearing in A. If or is the permutation inter-
changing these two indices, this operation has the effect of replacing each
permutation tt by the permutation tra. Since a is an odd permutation, this
has the effect of changing the sign of every term in the expansion of det A.
Therefore, det A' = —det A.

Theorem 2.4. If A has two equal rows, det A = 0.


proof. The matrix obtained from A by interchanging the two equal
rows is identical to A, and yet, by Theorem 2.3, this operation must change
the sign of the determinant. Since the only number equal to its negative is
det A = 0.

Note: There is a minor point to be made here. If 1 + 1 = 0, the proof


of this theorem is not valid, but the theorem is still true. To see this we
return our attention to the definition of a determinant. Sgn tt == 1 for both
even and odd permutations. Then the terms in (2.1) can be grouped into
pairs of equal terms. Since the sum of each pair is 0, the determinant is 0.

Theorem 2.5. If A' is the matrix obtained from A by adding a multiple of


one row (or column) to another, then det A' det A. =
proof. Let A' be the matrix obtained from A by adding c times row k
to rowy. Then

det A' = ^ (sgn 7r)a lw(1)


• • (a jwU) + ca kirU) ) • • •
a Mk) • • •
a nw(n)
IT

= 2 (s§n ^flird) ' ' ' a trli) • • • a k7r{k) • • •


a nn(n)
IT

+ c 2 (sgn 7T)a l1Ta) • • •


a k]rU) • • •
a kjrik) • • •
a nw{n) . (2.4)
ir

The second sum on the right side of this equation is, in effect, the deter-
minant of a matrix in which rows j and k are equal. Thus it is zero. The
first term is just the expansion of det A. Therefore, det A' = det A. u

It is evident from the definition that, if / is the identity matrix, det / = 1.


2 I Determinants 91

If an elementary matrix of type I, det E = c where c is the scalar


E is

factor employed in the corresponding elementary operation. This follows


from Theorem 2.2 applied to the identity matrix.
If E is an elementary matrix of type II, det E = 1. This follows from
Theorem 2.5 applied to the identity matrix.
If E is an elementary matrix of type III, det E = — 1. This follows from
Theorem 2.3 applied to the identity matrix.

Theorem 2.6. If E is an elementary matrix and A is any matrix, then


det EA = det E • det A = det AE.
proof. This is an immediate consequence of Theorems 2.2, 2.5, 2.3, and
the values of the determinants of the corresponding elementary matrices.

Theorem 2.7. det A = if and only if A is singular.


proof. If A is non-singular, a product of elementary matrices (see
it is

Chapter II, Theorem 6.1). Repeated application of Theorem 2.6 shows


that det A
equal to the product of the determinants of the corresponding
is

elementary matrices, and hence is non-zero.


If A is singular, the rows are linearly dependent and one row is a linear
combination of the others. By repeated application of elementary operations
of type II we can obtain a matrix with a row of zeros. The determinant of
this matrix is zero, and by Theorem 2.5 so also is det A. U

Theorem 2.8. If A and B are any two matrices of order n, then det AB =
det A det B = •
det BA.
proof. If A and B are non-singular, the theorem follows by repeated
application of Theorem 2.6. If either matrix is singular, then AB and BA
are also singular and all terms are zero.

EXERCISES
1. If all elements of a matrix below the main diagonal are zero, the matrix is

said to be in superdiagonal form; that is, a^ = for / > j. If A = [a^] is in super-

diagonal form, compute det A.


2. Theorem 2.6 "provides an effective and convenient way to evaluate deter-
minants. Verify the following sequence of steps.

3 2 2 1 4 1 1 4 1

1 4 1 = - 3 2 2 = - -10 -1
-2 -4 -1 -2 -4 -1 4 1

1 4 1 1 4 1

= - -2 1 = - -2 1

4 1 3

Now use the results of Exercise 1 to evaluate the last determinant.


:

92 Determinants, Eigenvalues, and Similarity Transformations |


III

3. compute a determinant there is no need to obtain a superdiagonal


Actually, to
form. Andelementary column operations can be used as well as elementary row
operations. Any sequence of steps that will result in a form with a large number of
zero elements will be helpful. Verify the following sequence of steps.

3 2 2 3 2 2 3 2

1 4 1 = 1 4 1 = 1 3 1

-2 -4 -1 -10 -10
This last determinant can be evaluated by direct use of the definition by computing

just one product. Evaluate this determinant.


4. Evaluate the determinants
(a) 1 -2 ib) 1 2 1

-1 3 1 3 4

2 5 1 5 6

1 2 3 4
5. Consider the real plane R 2 We agree that the two points (a t a 2 ), (b lt b 2 )
.
,

suffice to describe a quadrilateral with corners at (0, 0), {a


x a 2 ), (b x b 2 ), and , ,

(a x + b x ,a 2 + b 2 ). (See Fig. 2.) Show that the area of this quadrilateral is

b\ b2

(oi + bi, U2 + b 2)

Fie. 2
3 |
Cofactors 93

Notice that the determinant can be positive or negative, and that it changes sign
if the first and second rows are interchanged. To interpret the value of the deter-
minant as an area, we must either use the absolute value of the determinant or give
an interpretation to a negative area. We make the latter choice since to take the
absolute value is to discard information. Referring to Fig. 2, we see that the
direction of rotation from (a lf the same as
a 2) to {b lt b 2 ) across the enclosed area is

the direction of rotation from the positive a^-axis to the positive # 2 -axis. To
interchange {a lt a 2 ) and (6 l9 b 2 ) would be to change the sense of rotation and the
sign of the determinant. Thus the sign of the determinant determines an orientation
of the quadrilateral on the coordinate system. Check the sign of the determinant
for choices of (a lf a 2 ) and (Jb ± b 2 ) in various quadrants and various orientations.
,

6. (Continuation) Let Ebe an elementary transformation of R 2 onto itself.


E maps the vertices of the given quadrilateral onto the vertices of another quad-
rilateral. Show that the area of the new quadrilateral is det E times the area of the
old quadrilateral.

7. Let x lt . . . , xn be a set of indeterminates. The determinant

1 xx CBj
2 «n-l

X2

1 xn xn

is called the Vandermonde determinant of order n.

(a) Show that Fis a polynomial of degree n — 1 in each indeterminate separately


and of degree n{n — l)/2 in all the indeterminates together.
(b) Show that, for each i <j, Kis by x6 — x t
divisible .

(c) Show that TJ (*,- — x t) is a polynomial of degree n — 1 in each in-


l<i<3<n
determinate separately, and of degree n(n — l)/2 in all the indeterminates together.
(d) Show that V= JJ (x, - x<).

l<i<j<n

3 I Cofactors

For a given pair i, j, consider in the expansion for det A those terms which
have aH as a factor. Det A is of the form det A = a^A^ + (terms which do
not contain a u as a factor). The scalar A u is called the cofactor of a{j .

In particular, we see that A 1X = (sgn -n)a 2tt(2) nn(n) where this ^


sum includes all permutations -n that leave 1 fixed. Each such n defines a
permutation 7/ on S' = {2, n} which coincides with tt on S. Since
. . . ,

no inversion of tt involves the element 1, we see that sgn tt = sgn tt'. Thus
A i} is a determinant, the determinant of the matrix obtained from A by
crossing out the first row and the first column of A.
94 Determinants, Eigenvalues, and Similarity Transformations |
III

A similar procedure can be used to compute the cofactors A H By a .

sequence of elementary row and column operations of type III we can


obtain a matrix in which the element a{j is moved into row 1, column 1.
By applying the observation of the previous paragraph we see that the
cofactor A u is essentially the determinant of the matrix obtained by crossing
out the row and column containing the element a tj Furthermore, we can .

keep the other rows and columns in the same relative order if the sequence
of operations we use interchanges only adjacent rows or columns. It takes
i — 1 interchanges to move the element a
ti into the first row, and it takes

j — 1 interchanges to move it into the first column. Thus A if is (— l) '-i+>'-i =


l

(_ iy+3 times the determinant of the matrix obtained by crossing out the
rthrow and theyth column of A.
Each term in the expansion of det A contains exactly one factor from each
row and each column of A. Thus, for any given row of A each term of det A
contains exactly one factor from that row. Hence, for any given i,

det A = 2 a^A^. (3.1)


i

Similarly, for any given column of A each term of det A contains exactly one
factor from that column. Hence, for any given k,
det A = 2 a ikA ih . (3.2)
3

These expansions of a determinant according to the cofactors of a row


or column reduce the problem of computing an «th order determinant to
that of computing n determinants of order n — 1. We have already given
explicit expansions for determinants of orders 2 and 3, and the technique
of expansions according to cofactors enables us to compute determinants
of higher orders. The labor of evaluating a determinant of even quite
modest order is still quite formidable, however, and we make some suggestions

as to howwork can be minimized.


the
First, observe that if any row or column has several zeros in it, expansion
according to cofactors of that row or column will require the evaluation of
only those cofactors corresponding to non-zero elements. It is clear that
the presence of several zeros in any row or column would considerably
reduce the labor. If we are not fortunate enough to find such a row or
column, we can produce a row or column with a large number of zeros by
applying some elementary operations of type II. For example, consider
the determinant
3 2-2 10

112
3
det ,4 =
-2234
115 2
3 I Cofactors 95

If the numbers appearing in the array were unwieldy, there would be no


choice but to wade in and make the best of it. The numbers in our example
are all integers, and we will not introduce fractions if we take advantage
of the l's that appear in the array. By Theorem 2.5, a sequence of elementary
operations of type II will not change the value of the determinant. Thus we
can obtain
-1 -17 4
1 -17 4
-2 -14 -4
det^ = -2 14 -4
4 13 8
13 8
1 1

Now we face We can expand the 3rd order determinants as


several options.
it stands; we can
same technique again; or we can try to remove a
try the
common factor from some row or column. We can remove the common
factor —1 from the second row and the common factor 4 from the third
column. Although 2 is factor of the second row, we cannot remove both
a 2 from the second row and a 4 from the third column. Thus we can obtain

-1 -17 -1 -17 1

det A = 4 14 = 4- 3 31

13 6 47

3 31
= 4- — -1
6 47

If we multiply the elements in row by the cofactors of the elements in


i

row k 7^ i, we get the same result as we would if the elements in row k were
equal to the elements in row /. Hence,

2a it AM = for i ^ k, (3.3)

and
2 a^A^ = for j ^ k. (3.4)

The various relations we have developed between the elements of a matrix


and their cofactors can be summarized in the form

2 A* =
«.• <5<* det A, (3.5)

2 a aA =iic djh det A. (3.6)

If A = [a{j ] is any square matrix and A tj


is the cofactor of aijy the matrix
[A tj ] T = adj A is called the adjunct of A. What we here call the "adjunct"
96 Determinants, Eigenvalues, and Similarity Transformations |
III

is traditionallly called the "adjoint." Unfortunately, the term "adjoint"


is also used to denote a linear transformation that is not represented by the
adjoint (or adjunct) matrix. A new term We shall have a
is badly needed.
use for the adjunct matrix only in this chapter. Thus, this unconventional
terminology will cause only a minor inconvenience and help to avoid con-
fusion.

Theorem 3.1. A-ad]A = (adj A) • A = (det A) •


/.

PROOF.

A •
adj A = [a it ] •
[A kl f = 2 a aA u
= (det A) •
I. (3.7)

(adj A)-A = [A kl f [««] = I *ik u ii


= (det A) -I. a (3.8)

Theorem 3.1 provides us with an effective technique for computing the


inverse of a non-singular matrix. However, it is effective only in the sense
that the inverse can be computed by a prescribed sequence of steps. The
number of steps is large for matrices of large order, and it is not sufficiently
small for matrices of low order to make it a preferred technique. The method
described in Section 6 of Chapter II is the best method that is developed in
this text. In numerical analysis where matrices of large order are inverted,
highly specialized methods are available. But a discussion of such methods
is beyond the scope of this book.

A matrix A is non-singular if and only if det A ^ 0, and in this case we


can see from the theorem that

s% = — au s\. (3.9)
det A
This is illustrated in the following example.

"
1 2 3" "-3 5 1

A = 2 1 2 adj A = -2 5 4

_-2 1 -1_ * 4 -5 -3
"-3 5 r
A~* = i -2 5 4

4 -5 -3
The relations between the elements of a matrix and their cofactors lead
to a method for solving a system of n simultaneous equations in n unknowns
3 I Cofactors 97

when the equations are independent. Suppose we are given the system of
equations

J.a i§ x s =b it (i = l,2,...,n). (3.10)


3=1

The assumption that the equations are independent is expressed in the


condition that det A ^ 0, where A = [a it ]. Let AH be the cofactor of a^.
Then for a given k
n n n i n \

22 A
/ \

2A ik { 2,a tJ xA = ik a i
Ax i

n
=2 det 4 w a;,

= det A zfc
= 2 A ncK (3.11)

Since det A y£ we see that

.2, ^ifc^i
3/i. — i=l
(3.12)
det A
The numerator can be interpreted as the cofactor expansion of the deter-
minant of the matrix obtained by replacing the kth column of A by the
column of the b t In this form the method is known as Cramer's rule.
.

Cramer's rule is convenient for systems of equations of low order, but


it fails if the system of equations is dependent or the number of equations

is different from the number of unknowns. Even in these cases Cramer's


rule can be modified to provide solutions. However, the methods we have
already developed are usually easier to apply, and the balance in their favor
increases as the order of the system of equations goes up and the nullity
increases.

EXERCISES
1. In the determinant
2 7 5 8

7-125
10 4 2

-3 6-1 2

find the cofactor of the "8" ; find the cofactor of the " -3."

2. The expansion of a determinant in terms of a row or column, as in formulas


(3.1) and (3.2), provides a convenient method for evaluating determinants. The
.

98 Determinants, Eigenvalues, and Similarity Transformations |


III

amount of work involved can be reduced if a row or column is chosen in which


some of the elements are zeros. Expand the determinant
1 3 4 -1

2 2 1

-1 1 3

3 1 2

in terms of the cofactors of the third row.

3. It is even more convenient to combine an expansion in terms of cofactors

with the method of elementary row and column operations described in Section 2.
Subtract appropriate multiples of column 2 from the other columns to obtain

1 3 7 8

2 2 2 7

-1
-3 1 2

and expand this determinant in terms of cofactors of the third row.


4. Show that det (adj A) = (det A)"- 1 .

5. Show that a matrix is non-singular if and only if its adj A is also non-singular.
6. Let A = [ciij] be an arbitrary n x n matrix and let adj A be the adjunct of A.
If X= (x lt . . . , xn ) and Y = (y lt y n ) show that
. .
,

yy(adj A)X = -

V\
For notation see pages 42 and 55.

4 The Hamilton-Cayley Theorem


I

Let p(x) = a m x m + + a be a polynomial in an indeterminate x


• • •

with scalar coefficients a^ If A is an n x n matrix, by p(A) we mean the


matrix a m A m + a m_ x A m ~ x H + a I. Notice particularly that the
constant term a must be replaced by a I so that each term of p(A) will be
a matrix. No particular problem is encountered with matric polynomials of
this form since all powers of a single matrix commute with each other.
Any polynomial identity will remain valid if the indeterminate is replaced
.

4 |
The Hamilton-Cayley Theorem 99

by a matrix, provided any scalar terms are replaced by corresponding scalar


multiples of the identity matrix.
We may also consider polynomials with matric coefficients. To make
sense, all coefficients must be matrices of the same order. We consider
only the possibility of substituting scalars for the indeterminate, and in all

manipulations with such polynomials the matric coefficients commute with


the powers of the indeterminate. Polynomials with matric coefficients can
be added and multiplied in the usual way, but the order of the factors
is important in multiplication since the coefficients may not commute. The

algebra of polynomials of this type is not simple, but we need no more than

the observation that two polynomials with matric coefficients are equal if
and only if they have exactly the same coefficients.
We avoid discussing the complications that can occur for polynomials
with matric coefficients in a matric variable.
Now we should like to consider matrices for which the elements are
polynomials. If F is the field of scalars for the set of polynomials in the
indeterminate x, let K be the set of all rational functions in x; that is, the
set of all permissible quotients of polynomials in x. It is not difficult to show
that K is a field. Thus a matrix with polynomial components is a special
case of a matrix with elements in K.
From this point of view a polynomial with matric coefficients can be
expressed as a single matrix with polynomial components. For example,

i r o 2i [2 -11 x2 + 2 2x - 1
r °i x 2
+ x + —
-1 2, -2 0_ 1 1_ -x - 2
2x + 1 2x 2
+ 1.

Conversely, a matrix in which the elements are polynomials in an indeter-


minate x can be expanded into a polynomial with matric coefficients. Since
polynomials with matric coefficients and matrices with polynomial compo-
nents can be converted into one another, we refer to both types of expressions
as polynomial matrices.

Definition. If A is any square matrix, the polynomial matrix A — xl —


C is called the characteristic matrix of A
C has the form
flu — X

floo X «2„

(4.1)

[_ "nl 'n2 a„„ — x


100 Determinants, Eigenvalues, and Similarity Transformations |
III

The determinant of C is a polynomial det C = f(x) = k n x n + k n_ xx n - x +


-
+ k of degree n;
• •
it is The
called the characteristic polynomial of A.
equation /(x) = is we should
called the characteristic equation of A. First,
observe that the coefficient of x n in the characteristic polynomial is (— l) n ,

~
the coefficient of x n x is (— l) n_1 £ n=1 a u> and the constant term k = det A.
4

Theorem 4.1. {Hamilton-Cayley theorem). If A is a square matrix and


f{x) is its characteristic polynomial, then j (A) = 0.
proof. Since C is of order n, adj C will contain polynomials in x of degree
not higher than n — 1. Hence adj C can be expanded into a polynomial
with matric coefficients of degree at most n — 1

n~x
adj C = Cn_ xx + C„_ 2 x"- 2
+ • • •
+ Cx + C x (4.2)

where each C t
is a matrix with scalar elements.
By Theorem 3.1 we have
adjC- C= det C- /=/(>;)/
= adj C •
{A - xl) = (adj C)A - (adj C)x. (4.3)
Hence,
k n Ix n + k n _Jx n - x -\ + kjx + k I

+ CV^a;"- 1 + • • •
+ C^x + CQ A. (4.4)

The expressions on the two sides of this equality are n X n polynomial


matrices. Since two polynomial matrices are equal if and only if the cor-
responding coefficients are equal, (4.4) is equivalent to the following set
of matric equations:
kn* = C n_ x
k n-\l = C n-2 + Cn _ x A

(4.5)

k I = C ^4.

Multiply each of these equations by A n


, A n~x , . . . , A, I from the right,
respectively, and add them. The terms on the right side will cancel out
leaving the zero matrix. The terms on the left add up to

k nA n + kn_x A n~x + -.. + hx A + k Q I=f(A) = 0. (4.6)

The equation m(x) = of lowest degree which A satisfies is called the


minimum equation (or minimal equation) for A m(x) is called ; the minimum
polynomial for A. Since A satisfies its characteristic equation the degree
of m(x) is not more than n. Since a linear transformation and any matrix
4 |
The Hamilton-Cayley Theorem 101

representing it satisfy the same relations, similar matrices satisfy the same
set of polynomial equations. In particular, similar matrices have the same
minimum polynomials.

Theorem 4.2. If g(x) is any polynomial with coefficients in F such that


g(A) — 0, then g(x) is divisible by the minimum polynomial for A. The
minimum polynomial is unique except for a possible non-zero scalar factor.
proof. Upon dividing g(x) by m(x) we can write g{x) in the form
g(x) = m{x) q{x) •
+ r(x), (4.7)

where q(x) is the quotient polynomial and r{x) is the remainder, which is
either identically zero or a polynomial of degree less than the degree of
is

m(x). Ifg(x) is a polynomial such that g(A) = 0, then

g{A) = = m(A) q(A) •


+ r(A) = r{A). (4.8)

This would contradict the selection of m(x) as the minimum polynomial


for A unless the remainder r(x) is identically zero. Since two polynomials
of the same lowest degree must divide each other, they must differ by a
scalar factor.

As we have pointedout, the elements of adj C are polynomials of degree at


most n —Let g(x) be the greatest common divisor of the elements of
1.

adj C. Since adj C C =/(*)/, g(x) divides /(x).

z
Theorem 4.3. h(x) = f(
——) is the minimum polynomial for A.
g(x)
proof. Let adj C= g(x)B where the elements of B have no non-scalar
common factor. Since adj C C = f{x)Iwe have h(x)
• •
g(x)I = g(x)BC. Since
g(x) j£ this yields

BC = h(x)I. (4.9)

Using B in place of adj C we can repeat the argument used in the proof of the
Hamilton-Cayley theorem to deduce that h{A) = 0. Thus h(x) is divisible
by m(x).
On the other hand, consider the polynomial m(x) — m(y). Since it is a
sum of terms of the form c^ — y*), each of which is divisible by y — x,
m(x) — m(y) is divisible by y — x:

m(x) - m(y) = (y - x) •
k(x, y). (4.10)

Replacing x by xl and y by A we have

m(xl) - m(A) = m(x)I = (A - xl) •


k(xl, A) =C k(xl, A). (4.1 1)

Multiplying by adj C we have

m(x) adj C= (adj C)C • k(xl, A) = f(x) •


k(xl, A). (4.12)
102 Determinants, Eigenvalues, and Similarity Transformations |
III

Hence,

m(x) g{x)B = h{x) g(x) •


k(xl, A), (4. 13)
or
m(x)B = h(x) •
k(xl, A). (4.14)

Since h{x) divides every element of m{x)B and the elements of B have no
non-scalar common factor, h(x) divides m(x). Thus, h{x) and m{x) differ
at most by a scalar factor.

Theorem 4.4. Each irreducible factor of the characteristic polynomial f(x)


of A is also an irreducible factor of the minimum polynomial m(x).
proof. As we have seen in the proof of the previous theorem

m(x)I =C •
k(xl, A).
Thus
det m(x)I = [m(x)] n = det C det k(xl, A) •

= f(x)- det k(xl, A). (4.15)

We see then that every irreducible factor off(x) divides [m(x)] n , and therefore
m(x) itself.

Theorem 4.4 shows that a characteristic polynomial without repeated


factors is also the minimum polynomial. As we shall see, it is the case in
which the characteristic polynomial has repeated factors that generally causes
trouble.
We now ask the converse question. Given the polynomial f(x) =
~
{—\) nx n + k n_^ x x n x + + k does there exist an n x n matrix A for
• • •
,

which f(x) is the minimum polynomial ?


Let A = {x lt <x } be any basis and define the linear transformation
.
n . . ,

a by the rules
oiu-i) = a i+1 for i < n, (4.16)
and
(-l) n cr(a n ) = -k oc
x
- k x cc 2 A:
n_ x a n .

It follows directly from the definition of a that

/(<0(«i) = (-l)MaJ + & n_ x a n + • • •


+ k t 0L 2 + k ax = 0. (4.17)

For any other basis element we have


-1
/(*)(«*) =A«)[<y
J
M] = o^[f{a)M] = 0. (4.18)

Since f(a) vanishes on the basis elements f(a) = and any matrix repre-
senting a satisfies the equation f(x) 0. =
4 |
The Hamilton-Cayley Theorem 103

On
the other hand, a cannot satisfy an equation of lower degree because
the corresponding polynomial in a applied to a x could be interpreted as
a relation among the basis elements. Thus, f(x) is a minimum polynomial
for a and for any matrix representing a. Since f(x) is of degree n, it must
also be the characteristic polynomial of any matrix representing a.
With respect to the basis A the matrix representing a is

-(-1)%
1 - (-l) n *i
1 -(-l) n k 2
A = (4.19)

1 -(-1)^^
A is called the companion matrix off(x).

Theorem 4.5. f(x) is a minimum polynomial for its companion matrix.

EXERCISES
1. Show that -x 3 + 39z - 90 is the characteristic polynomial for the matrix

"0 -90"

1 39

2. Find the characteristic polynomial for the matrix

~2 -2 3"

1 1 1

1 3 -1
and show by direct substitution that this matrix satisfies its characteristic equation.

3. Find the minimum polynomial for the matrix

^3 2 2"

1 4 1

-2 -4 -1
4. Write down a matrix which has x4 + 3*3 + 2x 2 - x + 6 = as its minimum
equation.
104 Determinants, Eigenvalues, and Similarity Transformations |
III

5. Show that if the matrix A satisfies the equation x 2 + x + 1 =0, then A is


-1
non-singular and the inverse A is expressible as a linear combination of A and /.
6. Show that no real 3x3 matrix satisfies x2 + 1 =0. Show that there are
complex 3x3 matrices which do. Show that there are real 2x2 matrices that
satisfy the equation.

Find a 2 x 2 matrix with integral elements


7. satisfying the equation x3 — 1 =0,
but not satisfying the equation x — 1 =0.
8. Show that the characteristic polynomial of

7 4-4"
4 -8 -1
-4 -1 -8
is not its minimum polynomial. What is the minimum polynomial ?

5 I Eigenvalues and Eigenvectors

Let a be a linear transformation of V into itself. It is often useful to find


subspaces of V in which a also acts as a linear transformation. If is such W
a subspace, this means that cr(VV) <= W. A subspace with this property is
called an invariant subspace of V under a. Generally, the problem of deter-
mining the properties of a on V can be reduced to the problem of determining
the properties of a on the invariant subspaces.
The simplest and most restricted case occurs when an invariant subspace
W is of dimension 1. In that case, let {a x } be a basis for W. Then, since
o{v.i) e W, there is a scalar X x such that o{vl x ) = X x <x x Also for any a e W, .

a = a x a x and hence c(a) = a x a(a x ) = a x X x y. x = X x a. In some sense the


scalar X x is characteristic of the invariant subspace W; a stretches every
vector in W
by the factor X x .

In general, a problem of finding those scalars X and associated vectors


I for which o-(£) =
A| is called an eigenvalue problem. non-zero vector A
£ is called an eigenvector of a if there exists a scalar X such that c(£) X£. =
A scalar X is called an eigenvalue of a if there exists a non-zero vector £
such that <r(£) =
Notice that the equation <r(|) = X£ is an equation in
Xg.
two one of which
variables, is a vector and the other a scalar. The solution

1 = and X any scalar is a solution we choose to ignore since it will not


lead to an invariant subspace of positive dimension. Without further
conditions we have no assurance that the eigenvalue problem has any other
solutions.
A typical and very important eigenvalue problem occurs in the solution
of partial differential equations of the form
2 2
d u d u
dx* dy 2
~ '
5 |
Eigenvalues and Eigenvectors 105

subject to the boundary conditions that w(0, y) = u(tt, y) = 0,

lim u(x, y) = 0, and u(x, 0) = f(x)


y -oo

where f(x) is The standard technique of separation of


a given function.
variables leads us to try to construct a solution which is a sum of functions
of the form XY where X is a function of x alone and Y is a function of
y alone. For this type of function, the partial differential equation becomes

dx 2 dy 2
Since
1 .^!l = _ A .<!*x
Y dy 2 X dx2
is a function of x alone and also a function of y alone, it must be a constant
(scalar) which we shall call k2 . Thus we are trying to solve the equations

— ~- —kkXX —2~-
dx 2
2
>

dy
2
k Y -

These are eigenvalue problems as we have defined the term. The vector
space is the space of infinitely differentiable functions over the real numbers
and the linear transformation is the differential operator d 2 /dx2 .

For a given value of k 2 (k > 0) the solutions would be


X= ax cos kx + a 2 sin kx,
Y= a3 e~ kv + a4 e*».

The boundary conditions w(0, y) = and liny^ u(x, y) = imply that


ai = a \ = 0. The most interesting condition for the purpose of this example
is that the boundary condition u(tt, y) = implies that k is an integer.
Thus, the eigenvalues of this eigenvalue problem are the integers, and the
corresponding eigenfunctions (eigenvectors) are of the form ak e- kv sin kx.
The fourth boundary condition leads to a problem in Fourier series; the
problem of determining the ak so that the series
oo

2 a k sin kx
represents the given function f(x) for < x < n.
Although the vector space in this example is of infinite dimension, we
restrict our attention to the eigenvalue problem in finite dimensional vector
spaces. In a finite dimensional vector space there exists a simple necessary
and sufficient condition which determines the eigenvalues of an eigenvalue
problem.
The eigenvalue equation can be written in the form (a — A)(£) = 0.
We know that there exists a non-zero vector | satisfying this condition if
106 Determinants, Eigenvalues, and Similarity Transformations |
III

and only if a — X is singular. Let A = {a l5 a n } be any basis of V and . . . ,

let A = [a ] be the matrix representing a with respect to this basis. Then


i}

A — XI = C(X) is the matrix representing a — X. Since A — XI is singular


if and only if det {A — XI) = f(X) = 0, we see that we have proved

Theorem 5.1. A scalar X isan eigenvalue of a if and only if it is a solution


of the characteristic equation of a matrix representing a.

Notice that Theorem 5.1 applies only to scalars. In particular a solution


of the characteristic equation which is not a scalar is not an eigenvalue. For
example, if the field of scalars is the field of real numbers, then non-real
complex solutions of the characteristic equation are not eigenvalues. In the
published literature on matrices the terms "proper values" and "characteristic
values" are also used to denote what we have called eigenvalues. But,
unfortunately, the same terms are often also applied to the solutions of the
characteristic equation. We call the solutions of the characteristic equation
characteristic values. Thus, a characteristic value is an eigenvalue if and only
if it is also in the given field of scalars. This distinction between eigenvalues
and characteristic values is not standard in the literature on matrices, but we
hope this or some other means of distinguishing between these concepts will
become conventional.
In abstract algebra a field is said to be algebraically closed if every poly-
nomial with coefficients in the field factors into linear factors in the field.
The field of complex numbers is algebraically closed. Though many proofs

of this assertion are known, none is elementary. It is easy to show that


algebraically closed fields exist, but it is not easy to show that a specific field is
algebraically closed.
Since for most applications of concepts using eigenvalues or characteristic
values the underlying field is either rational, real or complex, we content
ourselves with the observation that the concepts, eigenvalue and characteristic,
value, coincide if the underlying field is complex, and do not coincide if the
underlying field is rational or real.
The procedure for finding the eigenvalues and eigenvectors of a is fairly

direct. For some basis A = {<x x , . . . , <x A be


n }, let the matrix representing a.
Determine the characteristic matrix C(x) =A— xl and the characteristic
equation det {A — XI) = f(x) = 0. Solve the characteristic equation. (It is

this step that presents the difficulties. The characteristic equation may have
no solution in F. In that event the eigenvalue problem has no solution.
Even in those cases where solutions exists, finding them can present practical
difficulties.) For each solution X off(x) = 0, solve the system of homogene-

ous equations

(A - XI)X = C(X) -X=0. (5.1)


5 |
Eigenvalues and Eigenvectors 107

Since this system of equations has positive nullity, non-zero solutions exist
and we should use the Hermite normal form to find them. All solutions
are the representations of eigenvectors corresponding to a.
Generally, we are given the matrix A rather than a itself, and in this case
we regard the problem as solved when the eigenvalues and the representations
of the eigenvectors are obtained. We refer to the eigenvalues and eigenvectors
of a as eigenvalues and eigenvectors, respectively, of A.

Theorem 5.2. Similar matrices have the same eigenvalues and eigenvectors.
proof. This follows directly from the definitions since the eigenvalues
and eigenvectors are associated with the underlying linear transformation.

Theorem 5.3. Similar matrices have the same characteristic polynomial.


proof. Let A and = P~ AP be similar. Then
A' X

det (A' - xl) = det (P^AP - xl) = det {P~ {A - xI)P) = detP- X 1

det {A - xl) det P = det (A - xl) =f(x).

We polynomial of any matrix representing a the


call the characteristic
characteristic polynomial of o. Theorem 5.3 shows that the characteristic
polynomial of a linear transformation is uniquely defined.
Let S(X) be the set of all eigenvectors of a corresponding to X, together
with 0.

Theorem 5.4. S(A) is a subspace of V.


proof. If a and (3 e S(X),then
a(aa. + bp) = ao{v.) + ba{p)
= aXcn + bXfi
= X(a<x + bp). (5.2)

Hence, ace + bp e S(A) and S(X) is a subspace.

We call S(X) the eigenspace of a corresponding to X, and any subspace


of S(X) is called an eigenspace of a.
The dimension of S(X) is equal to the nullity of C(X), the characteristic
matrix of A
with X substituted for the indeterminate x. The dimension of
S(X) is called the geometric multiplicity of X. We have shown that X is also
a solution of the characteristic equation f(x) = 0. Hence, (x — A) is a
factor of f(x). If (x — X) k is a factor of f(x) while (x X) k+l is not, X is a —
root of f(x) = of multiplicity k. We refer to this multiplicity as the
algebraic multiplicity of X.

Theorem 5.5. The geometric multiplicity of X does not exceed the algebraic
multiplicity of X.

proof. Since the geometric multiplicity of X is defined independently of


any matrix representing a and the characteristic equation is the same for all
108 Determinants, Eigenvalues, and Similarity Transformations |
III

matrices representing a prove the theorem for any


it will be sufficient to

particular matrix representing a. We shall choose the matrix representing


a so that the assertion of the theorem is evident. Let r be the dimension
of S(K) and let {f 1} £ r } be a basis of S{X). This linearly independent set
. . .
,

can be extended to a basis {£ 1? £J of V. Since o(£ t) = A!< for / < r,


. . . ,

the matrix A representing a with respect to this basis has the form

'X «l,r+l

I a*2,r+l

A = 1 a r,r+l (5.3)

a r+l,r+l

a.

From the form of A it is evident that det {A — xl)=f(x) is divisible by


(x _ X) r
. Therefore, the algebraic multiplicity of I is at least r, which is the
geometric multiplicity.

Theorem 5.6. If the eigenvalues X x , . . . , X s are all different and {f ls ...,£,}


is a set of eigenvectors, ^ corresponding to Xt , then the set {£ l5 . . . , £,} is

linearly independent.
proof. Suppose the dependent and that we have reordered the
set is
eigenvectors so that the k eigenvectors are linearly independent and
first

the last s — k are dependent on them. Then


k

is = 2 a&i
where the representation is unique. Not all a t
= since | s ^ 0. Upon
applying the linear transformation a we have

There are two possibilities to be considered. If X s = 0, then none of the


A; for / < k is zero since the eigenvalues are distinct. This would imply
5 Eigenvalues and Eigenvectors
|
109
that {iu . . . , ik j is linearly dependent, contrary to assumption. If X s ^ 0,
then
k 1

<-i As
Since not and XJX Sall at =
1, this would contradict the uniqueness of ^
the representation of £ s Since we get a contradiction in any event, the .

set {£ x ... f J must be linearly independent.


, ,

EXERCISES
1. Show that X = is an eigenvalue of a matrix A if and only if A is singular.
2. Show that an eigenvector of
then I is also an eigenvector of a n for
if £ is cr,

each « > 0. If X is the eigenvalue of a corresponding to f what is the eigenvalue ,

of an corresponding to £ ?
3. Show that if £ is an eigenvector of both a and r, then £ is also an eigenvector
of aa{ for aeF) and + t. If A x is the eigenvalue ct of a and A 2 is the eigenvalue of
t corresponding to I, what are the eigenvalues of aa and a + T?
4. Show, by producing an example, that if X x and A 2 are eigenvalues of <r and a 2
x ,

respectively, it is not necessarily true that X 1 + A 2 is an eigenvalue of at + <r


2.
5. Show that if £ is an eigenvector of a, then it is also an
eigenvector of p(o)
where p{x) is a polynomial with coefficients in F. If X is an eigenvalue of a corre-
sponding to I, what is the eigenvalue of p{a) corresponding to f ?
6. Show that if a is non-singular and X is an eigenvalue of a, then A" 1 is an
_1
eigenvalue of <r . What is the corresponding eigenvector?
7. Show that if every vector in V is an eigenvector of a, then a is a scalar trans-
formation.
8. Let P n be the vector space of polynomials of degree at most n — 1 and let D ,

be the differentiation operator; that is D(t k) = kt*-1 Determine the characteristic


.

polynomial for D. From your knowledge of the differentiation operator and net
using Theorem 4.3, determine the minimum polynomial for D. What kind of
differential equation would an eigenvector of D have to satisfy? What are the
eigenvectors of D?
9.

1) is
Let A =
an eigenvector. What
[a^]. Show that if ^ «« = c independent of /, then I = (1 , 1 , . . .
,

is the corresponding eigenvalue?


10. Let W
be an invariant subspace of V under a, and let A = {a
l5 a n } be a . . . ,

basis of V such that {a x a]


J is a basis of W. Let ^ = [a i3 ] be the matrix
, . . . ,

representing a with respect to the basis A. Show that all elements in the first k
columns below the fcth row are zeros.
11. Show that if X1 and A 2 ^ X x are eigenvalues of a x and £ x and £ 2 are eigen-
vectors corresponding to X x and X 2 respectively, then f x + £ 2 is not an eigenvector.
,

12. Assume that {$ lf . . . , | lr } are eigenvectors with distinct eigenvalues. Show


that 2i =1 a -£j t is never an eigenvector unless precisely one coefficient is non-zero.
110 Determinants, Eigenvalues, and Similarity Transformations |
HI

13. Let A
be an n x n matrix with eigenvalues Alt A 2 , . . . , A„. Show that if A
is the diagonal matrix
• • ~~
"li •

A2

A =

and P = [pij] is the matrix in which column j is the M-tupIe representing an eigen-
vector corresponding to A3-, then AP = PA.
14. Use the notation of Exercise 13. Show that if A has n linearly independent
eigenvalues, then eigenvectors can be chosen so that P is non-singular. In this case
p-^AP = A.

6 I Some Numerical Examples


Since we are interested here mainly in the numerical procedures, we
start with the matrices representing the linear transformations and obtain
their eigenvalues and the representations of the eigenvectors.
Example 1. Let
"-I 2 2"

A = 2 2 2

-3 -6 -6
The first step is to obtain the characteristic matrix

-_1 _ x 2 2

C{x) = 2 2- x 2

_3 _6 -6 -

and then the characteristic polynomial

detC(a:)= -(* + 2)(a + 3)a\


Thus the eigenvalues of A are A x —2, A 2 = A3 0. The next = —3, and =
steps are to substitute, successively, the eigenvalues for x in the characteristic

matrix. Thus we have


1 2 2~

C(-2) = 2 4 2

-3 -6 -4
6 |
Some Numerical Examples jjj
The Hermite normal form obtained from C(— 2) is

1 2

_0 0_
The components of the eigenvector corresponding to X
x
= —2 are found
by solving the equations
xx + 2x 2 =0
x3 = 0.

Thus (2, —1,0) is the representation of an eigenvector corresponding to


Xx \
for simplicity we shall write ^=
(2, -1,0), identifying the vector
with its representation.
In a similar fashion we obtain
"222
C(-3)= 2 5 2

__3 _6 -3
From C(— 3) we obtain the Hermite normal form
"1 1"

0.

and hence the eigenvector £ 2 = (1,0, —1).


Similarly, from
'-1 2 2"

C(0)= 2 2 2

_3 _6 -6
we obtain the eigenvector £ 3 = (0, 1, —1).
By Theorem 5.6 the three eigenvectors obtained for the three different
eigenvalues are linearly independent.
Example 2. Let
"
1 i -r
A = -1 3 -1
_-l 2 0_
From the characteristic matrix
1 - X i -r
C(x) = -1 3 — x -1
-1 2 —x
112 Determinants, Eigenvalues, and Similarity Transformations |
III

we obtain the characteristic polynomial det C(x) = — (x — l) (x — 2).


2

Thus we have just two distinct eigenvalues; X x = K = 1 with algebraic


multiplicity two, and /l
3 = 2.

Substituting A x for x in the characteristic matrix we obtain

"
1
-1"

C(l) = -1 2 -1
-1 2 -1

The corresponding Hermite normal form is

1 -1"

1 -1

Thus it is seen that the nullity of C(l) is 1 The eigenspace S(l) is of dimension
.

1 and the geometric multiplicity of the eigenvalue 1 is 1. This shows that the
geometric multiplicity can be lower than the algebraic multiplicity. We obtain
&= (1,1,1).
The eigenvector corresponding to X z = 2 is £3 = (0, 1, 1).

EXERCISES
For each of the following matrices find all the eigenvalues and as many linearly

independent eigenvectors as possible.

"2 2"
1. 4" 2. 3

5 3 -2 3

3. "1 2" 1 -V2


2 -2 V2 4

2"
5.
'4
9 0" 6. 3 2

- -2 8 1 4 1

7 -2 -4 --1

0"
7.
"
7 4 -4" 2 —i

4 -8 -1 / 2

-4 -1 -8 9 3
7 Similarity
|
113

7 I Similarity

Generally, for a given linear transformation a we seek a basis for which


the matrix representing a has as simple a form as possible. The simplest
form is that in which the elements not on the main diagonal are zero, a
diagonal matrix. Not all linear transformations can be represented by
diagonal matrices, but relatively large classes of transformations can be
represented by diagonal matrices, and we seek conditions under which
such a representation exists.

Theorem 7.1. A linear transformation a can be represented by a diagonal


matrix if and only if there exists a basis consisting of eigenvectors
of a.
proof. Suppose there is a linearly independent set X =
{f l9 f } of . . .
,

eigenvectors and that {A l5 .X n ) are the corresponding eigenvalues.


. . ,

Then a(^) = A^ so that the matrix representing a with respect to the


basis X has the form
Xx

(7.1)

that is, cr is represented by a diagonal matrix.


Conversely, if a is represented by a diagonal matrix, the vectors in that
basis are eigenvectors.

Usually, we are not given the linear transformation a directly. We are


given a matrix A representing a with respect to an unspecified basis. In
this case Theorem 7.1 is usually worded in the form: matrix^ is similar A
to a diagonal matrix and only if there exist n linearly independent eigen-
if

vectors of A. In this form a computation is required. We must find the matrix


P such that P~X AP is a diagonal matrix.
Let the matrix A be given; that is, A represents a with respect to some
basis A = {a x , . . ., aj. Let f y =
2?=i/>««* be the representations of the
eigenvectors of A with respect to A. Then the matrix A' representing a
with respect to the basis X = {g lt ... , ^n} is P^AP = A'. By Theorem
7.1, A' is a diagonal matrix.
In Example 1 of Section 6, the matrix

-1 2

A = 2 2

-3 -6
114 Determinants, Eigenvalues, and Similarity Transformations |
III

has three linearly independent eigenvectors, £x = (2, — 1,0), £ 2 = 0> °> — 0»


and f s = (0, 1, —1). The matrix of transition P has the components of
these vectors written in its columns
0" " 1"
2 1 1 1

P= 1 1
p-l = -1 -2 -2
-1 -1 »
1 2 1

The reader should check that P~ X AP is a diagonal matrix with the eigenvalues
appearing in the main diagonal.
In Example 2 of Section 6, the matrix

1 1 -1
A = -1 3 -1
-1 2

has one linearly independent eigenvector corresponding to each of its two


eigenvalues. As there are no other eigenvalues, there does not exist a set of
three linearly independent eigenvectors. Thus, the linear transformation
represented by this matrix cannot be represented by a diagonal matrix;
A is not similar to a diagonal matrix.

Corollary 7.2. If a can be represented by a diagonal matrix D, the elements


in the main diagonal of D are the eigenvalues of a. U
Theorem 7.3. If an n X n matrix has n distinct eigenvalues, then A is

similar to a diagonal matrix.


proof. By Theorem 5.6 the n eigenvectors corresponding to the n eigen-
values of A are linearly independent and form a basis. By Theorem 7.1

the matrix representing the underlying linear transformation with respect


to this basis is a diagonal matrix. Hence, A is similar to a diagonal matrix.

Theorem 7.3 is quite practical because we expect the eigenvalues of a


randomly given matrix to be distinct; however, there are circumstances
under which the theorem does not apply. There may not be n distinct
eigenvalues, either because some have algebraic multiplicity greater than
one or because the characteristic equation does not have enough solutions in
the field. The most general statement that can be made without applying
more conditions to yield more results is

Theorem 7.4. A necessary and sufficient condition that a matrix A be


similar toa diagonal matrix is that its minimum polynomial factor into distinct
linear factors with coefficients in F.
7 Similarity
|
115
proof. Suppose first that the matrix A is similar to a diagonal matrix D.
By Theorem 5.3, A and D have the same characteristic polynomial. Since
D is a diagonal matrix the elements in the main
diagonal are the solutions
of the chracteristic equation and the characteristic polynomial must
factor
into linear factors. By Theorem 4.4 the minimum polynomial
for A must
contain each of the linear factors of f(x), although possibly with
lower
multiplicity. It can be seen, however, either from Theorem 4.3
or by direct
substitution, that D
satisfies an equation without repeated factors. Thus,
the minimum polynomial for A has distinct linear factors.
On the other hand, suppose that the minimum polynomial for A is
m(x) = {x - X x ){x - A2 ) • • • {x - Xp ) with distinct linear factors. Let
M* be the kernel of a — X t . The non-zero vectors in M are the eigenvectors t

of a corresponding to X It follows from Theorem 5.6 that a non-zero


vector in
the sum
M cannot
M +M +
x
t

2
t.

be expressed as a
(- M p is direct.
sum of vectors in ^ . M,. Hence,

Let v t = dim M„ that is, vi is the nullity of a - Xt . Since M x



• •
©
Mp c: V we have vx + • • •
+ vv ^ n. By Theorem 1.5 of Chapter II
dim (a — XJV = n — v t = Pi , By another application of the same theorem
we have dim (a - A,){(cr _ A 2
)V} > Pi
- Vj = n - (v, + Vj ).
by repeated application of the same ideas we obtain
Finally, =
dim m(a)V > n - (v x + + Vp ). Thus, v x + + v p = n. This shows
• • • • • •

that M 1 ®-"@M
9 = V. Since every vector in Visa linear combination of
eigenvectors, there exists a basis of eigenvectors. Thus, A is similar to
a
diagonal matrix.

Theorem 7.4 is important in the theory of matrices, but it does not provide
the most effective means for deciding whether a particular matrix is similar
to diagonal matrix. If we can solve the characteristic equation,
it is easier
to try to find the n linearly independent eigenvectors than it is to use Theorem
7.4 to ascertain that they do or do not exist. If we do use this theorem and
are able to conclude that a basis of eigenvectors does exist, the work done in
making this conclusion is of no help in the attempt to find the eigenvectors.
The straightforward attempt to find the eigenvectors is always conclusive
On the other hand, if it is not necessary to find the eigenvectors, Theorem 7.4
can help us make the necessary conclusion without solving the characteristic
equation.
For any square matrix A [a tj ], Tr(A) =
2?=1 a u is called the trace of A. =
It is the sum of the elements in the diagonal of A. Since Tr{AB) =
ItidU °iM = IjLiCZT-i V«) = Tr(BA),

Tr(/>-MP) = Tr(APP-i) = Tr(A). (7.2)

This shows that the trace is invariant under similarity transformations;


116 Determinants, Eigenvalues, and Similarity Transformations |
III

that is, similar matrices have the For a given linear transforma-
same trace.

tion a of V into itself, all matrices representing a have the same trace. Thus
we can define Tr(cr) to be the trace of any matrix representing a.
n ~x
Consider the coefficient of x in the expansion of the determinant of the
characteristic matrix,

#m
#22 "^ a 2n

(7.3)

— X

is from a product of n — 1 of the


~
The only way an x n x can be obtained
diagonal elements, multiplied by the scalar from the remaining diagonal
n~x B-1
is (- l)""
2?=i a u or (- l)
1
element. Thus, the coefficient of x Tr(,4). ,

If/(at) = det (A — xl) is the characteristic polynomial of A, then det A =

/(0) is the constant term of /(a;). If /(a) is factored into linear factors in the,

form
f{x) = (- l) n (x - Itf^x - A 2 )'» •{x- X P Y\ (7.4)

the constant term is YLLi K- Thus det A is the P roduct of the characteristic
values (each counted with the multiplicity with which it is a factor of the
characteristic polynomials). In a similar way it can be seen that Tr(^) is the
sum of the characteristic values (each counted with multiplicity).
We have now shown the existence of several objects associated with a
matrix, or its underlying linear transformation, which are independent of
the coordinate system. For example, the characteristic polynomial, the
determinant, and the trace are independent of the coordinate system.
Actually, this list is redundant since det A is the constant term of the char-
_1
acteristic polynomial, and Tr(^) is (- 1)" times the coefficient of a;"- 1 of the
characteristic polynomial. Functions of this type are of interest because they
contain information about the linear transformation, or the matrix, and they
are sometimes rather easy to evaluate. But this raises a host of questions.
What information do these invariants contain? Can we find a complete
list of non-redundant invariants, in the sense that any other invariant can

be computed from those in the list? While some partial answers to these
questions will be given, a systematic discussion of these questions is beyond
the scope of this book.
7 |
Similarity 117

Theorem 7.5. Let V be a vector space with a basis consisting of eigen-


vectors of g. IfW is any subspace of V invariant under a, then also has a W
basis consisting of eigenvectors of a.
proof. Let a be any vector in W. Since V has a basis of eigenvectors
of a, a can be expressed as a linear combination of eigenvectors of a. By
disregarding terms with zero coefficients, combining terms corresponding
to thesame eigenvalue, and renaming a term like a^, where £ is an eigen- t

^ 0, as an eigenvector with coefficient 1 we can represent a in


vector and a t ,

the form
r

a = 2 £<»
where the £ f are eigenvectors of a with distinct eigenvalues. Let X { be the
eigenvalue corresponding to | t We will . show that each ^ £ W.
(a l 2 )(oc —
^3) (c —
X r )(<x) is in ' • ' — W since W
is invariant under a,

and hence invariant under a — X for any scalar But then (a — X 2 )(a — X 3)
X.
• • •
(a — A r )(a) = (A x — ^X^ — X3 ) • • •
(X x — A r )| x e W, and | x £ since W
(A a — X 2 )(X 1 — X3 ) • • •
{X — X r ) ?± 0. A
x similar argument shows that each
f,eW.
Since this argument applies to any a £ W, W is spanned by eigenvectors
of (T. Thus, W has a basis of eigenvectors of a. D
Theorem 7.6. Let V be a vector space over C, the field of complex numbers.
Let a be a linear transformation of V into itself V has a basis of eigenvectors
for a if and only iffor every subspace S invariant under a there is a subspace T
invariant under a such that V = S © 7.
proof. The theorem is obviously true if V is of dimension 1. Assume
the assertions of the theorem are correct for spaces of dimension less than n,
where n is the dimension of V.

Assume first that for every subspace S invariant under a there is a com-
plementary subspace T also invariant under a. Since V is a vector space over
the complex numbers a has at least one eigenvalue X ± Let a 2 be an eigenvector .

corresponding to Xj. The subspace S 2 = <a x > is then invariant under a.


By assumption there is a subspace 7^ invariant under a such that V = S x © Tj.
Every subspace S 2 of Tx invariant under Ra is also invariant under a. Thus
there exists a subspace T2 of V invariant under a such that V = S 2 © T2 .

Now S 2 c Tx and 7a = S 2 © (T2 n TJ. (See Exercise 15, Section 1-4.) Since
T2 n 7X is invariant under c, and therefore under Ra, the induction
assumption holds for the subspace Tx Thus, Tj has a basis of eigenvectors, .

and by adjoining a x to this basis we obtain a basis of eigenvectors of V.


Now assume there is a basis of V consisting of eigenvectors of a. By
theorem 7.5 any invariant subspace S has a basis of eigenvectors. The method
118 Determinants, Eigenvalues, and Similarity Transformations |
III

of proof of Theorem 2.7 of Chapter I (the Steinitz replacement theorem)


will yield a basis of V consisting of eigenvectors of o, and this basis will con-
tain the basis of S consisting of eigenvectors. The eigenvectors adjoined
will span a subspace 7", and this subspace will be invariant under a and
complementary to S.

EXERCISES
1. For each matrix A given in the exercises of Section 6 find, when possible,
a non-singular matrix P for which P~ 1 AP is diagonal.
"1 c
2. Show that the matrix where c ?* is not similar to a diagonal matrix
1

3. Show that any 2x2 matrix satisfying x + = 2


1 is similar to the matrix
"0 -1"

4. Show that if A is non-singular, then AB is similar to BA.


5. Show that any two projections of the same rank are similar.

*8 I The Jordan Normal Form

A normal form that is obtainable in general when the field of scalars is

the field of complex numbers is known as the Jordan normal form. An


application of the Jordan normal form to power series of matrices and sys-
tems of linear differential equations is given in the chapter on applications.
Except for these applications this section can be skipped without penalty.
We assume that the field of scalars is the field of complex numbers. Thus
for any square matrix A the characteristic polynomial/^) factors into linear
factors, f(x) = (x — X-^ ri {x — A 2 ) r2 — (x — X v ) r * where X ^ A, for i ^j

t

and r is the algebraic multiplicity of the eigenvalue X The minimum poly-


t t
.

nomial m(x) for A is of the form m(x) = (x — X 1 ) Sl (x — A 2 ) S2 (x — X p ) s v • • -

where 1 < ^ < r^


In the theorems about the diagonalization of matrices we sought bases
made up of eigenvectors. Because we are faced with the possibility that
such bases do not exist, we must seek proper generalizations of the eigen-
vectors. It is more fruitful to think of the eigenspaces rather than the
eigenvectors themselves. An eigenvalue is a scalar X for which the linear
transformation a — X is singular. An eigenspace is the kernel (of positive
dimension) of the linear transformation a — X. The proper generalization
of eigenspaces turns out to be the kernels of higher powers of a — X. For a
given eigenvalue X, let A4 be the kernel of (a — X) k Thus, M° = {0} and M 1
fc
.
8 |
The Jordan Normal Form 119

is X. For a e M k (a - X)* +1 (a) = (a - X){a - Xf(<x) =


the eigenspace of ,

(a - Xj(0j= 0. Hence, M c M k+K Also, for a e M (a - Xf(a - A) (a) = fc fc+1


,

(a - A) (a) = fc+1
so that (a - A)(a) 6 M k
Hence, (a - A)/^ c M*. .
1

Since all M*
y and V is finite dimensional, the sequence of subspaces
<=

M° c M 1 cz M 2 c must eventually stop increasing. Let f be the smallest


• • •

index such that M k = M< for all k > t, and denote M* by M (A) Let m k be the .

dimension of M k and w the dimension of M (A) t .

Let ((T - XfV = k


Then k+ l =
W
(a - X) k+1 V = (a - ^)*{(ct - A)V} <=
. W
(a — X) k V = k
W
Thus, the subspaces .
k
form a decreasing sequence W
VV^W 1
^ 2 =5 W
Since the dimension of W* is « — w we see that
• • •
.
fc ,

W k
= W
for all k > t. Denote W* by
M)
W .

Theorem 8.1. V is the direct sum of M (X) and W (A) .

proof. Since (a - X)W* = (a - Xy+W = W^ = 1


W* we see that a - X
is non-singular on W* = W (A) . Now let a be any vector in V. Then
(a — A)'(a) =
an element of /5 (A) Because (c — A)' is non-singular on
is W .

W (A) there is a unique vector y e (A) such that (<r — A)'(y) = /?. Let W
a — y be denoted by (5. It is easily seen that d e M (A) Hence V = M (A) + U) . W .

Finally, since dim M (A) = w, and dim (A)


= n — m the sum is direct. W t,

In the course of defining M k and W k


we have shown that

(1) (a - X)M k+* c M* c M*+i,


(2) (<r - A)W = W^ c W*. fc 1

This shows that each is invariant under a — X. It follows Mk and W fc

immediately that each is invariant under any polynomial in a — X, and


hence also under any polynomial in a. The use we wish to make of this
observation is that if fx is any other eigenvalue, then a — jj, also maps
M (A) and W (A) into themselves.
Let A x , . . . , X v be the distinct eigenvalues of or. Let M {
be a simpler
notation for the subspace M (X .
}
defined as above, and let W, be a simpler
notation for W (A )#

Theorem 8.2. For A t -


^ X jf M< c W..
proof. Suppose a e M,. is in the kernel of a — A,-. Then

(/, - A^'a = {(a - A,) - (a - *,)}*(a)

= (a - A,)«(a) +|(_1)^W - A,)"- fc


((T - A,)*(a).

The first term is zero because a e M i9 and the others are zero because a
is in the kernel of a — A,-. Since X j — A, ^ 0, it follows that a = 0. This
means that a — A3 - is non-singular on M^, that is, a — Xj maps M. onto
120 Determinants, Eigenvalues, and Similarity Transformations |
III

itself. Thus M i
is contained in the set of images under (a — 1,)'% and hence
M c t
W,.

Theorem 8.3. V = M ©M © © Mp
± 2
• • •
.

proof. Since V = M © VV and M <=


2 2 2 W lt we have V =M 2
V^ =
Mi® {M 2 ® (W n W )}. 1 2 Continuing in the same fashion, we get V =
M © x© M p © {W n• • •
l
• • • n W p }. Thus the theorem will follow if
we can show that = x n W
n p = {0}. By an extension of remarks W • • •
W
already made (<r - A x ) (a — X v ) = q{a) is non-singular on W; that is, • • •

q(a) maps W
onto itself. For arbitrarily large k, [q{o)f also maps onto W
itself. But
contains each factor of the characteristic polynomial f(x)
<7(x)

so that for large enough k, [q{x)f is divisible by f(x). This implies that
W= {0}.

Corollary 8.4. = Stfor = 1,


t p. i

W
. . .
t ,

proof. Since =M © V © M p and (a — x


• • •
vanishes on M i5
it

follows that (a — X )^ (a — X v )^ vanishes on


x
• • •
all of V. Thus (a; —X x)
tl

• • • (x — by the minimum polynomial and s t < t


Aj,)'" is divisible t
.

On the other hand, if for a single / we have s t < t t there is an a e M i ,

such that (a — A ) Si (a) =^ 0. For all X 5 ^ A t a — A, is non-singular on


2 ,

M Hence m(a) ^ 0. This is a contradiction so that t t = s t D


{
. .

Let us return to the situation where, for the single eigenvalue X, M k is the
kernel of (a - Xf and W = k - X) k V.
(a In view of Corollary 8.4 we let
s be the smallest index such that M k = M s
for all k > s. By induction we
can construct a basis {a l9 . . . , a m } of M x such that {a 1? . . . , a TO } is a basis
of Mk .

We now proceed step by step to modify this basis. The set {α_{m_{s-1}+1}, . . . , α_{m_s}}
consists of those basis elements in M^s which are not in M^{s-1}. These
elements do not have to be replaced, but for consistency of notation we
change their names; let α_{m_{s-1}+ν} = β_{m_{s-1}+ν}. Now set (σ - λ)(β_{m_{s-1}+ν}) =
β_{m_{s-2}+ν} and consider the set {α_1, . . . , α_{m_{s-2}}} ∪ {β_{m_{s-2}+1}, . . . , β_{m_{s-2}+m_s-m_{s-1}}}.
We wish to show that this set is linearly independent.

If this set were linearly dependent, a non-trivial relation would exist and
it would have to involve at least one of the β's with a non-zero coefficient
since the set {α_1, . . . , α_{m_{s-2}}} is linearly independent. But then a non-trivial
linear combination of the β's would be an element of M^{s-2}, and (σ - λ)^{s-2}
would map this linear combination onto 0. This would mean that (σ - λ)^{s-1}
would map a non-trivial linear combination of {α_{m_{s-1}+1}, . . . , α_{m_s}} onto 0.
Then this non-trivial linear combination would be in M^{s-1}, which would
contradict the linear independence of {α_1, . . . , α_{m_s}}. Thus the set {α_1, . . . ,
α_{m_{s-2}}} ∪ {β_{m_{s-2}+1}, . . . , β_{m_{s-2}+m_s-m_{s-1}}} is linearly independent.
This linearly independent subset of M^{s-1} can be expanded to a basis of
M^{s-1}. We use β's to denote these additional elements of this basis, if any
additional elements are required. Thus we have the new basis {α_1, . . . , α_{m_{s-2}},
β_{m_{s-2}+1}, . . . , β_{m_{s-1}}} of M^{s-1}.

We now set (σ - λ)(β_{m_{s-2}+ν}) = β_{m_{s-3}+ν} and proceed as before to obtain
a new basis {α_1, . . . , α_{m_{s-3}}, β_{m_{s-3}+1}, . . . , β_{m_{s-2}}} of M^{s-2}.

Proceeding in this manner we finally get a new basis {β_1, . . . , β_m} of
M(λ) such that {β_1, . . . , β_{m_k}} is a basis of M^k and (σ - λ)(β_{m_k+ν}) = β_{m_{k-1}+ν}
for k ≥ 1. This relation can be rewritten in the form

    σ(β_{m_k+ν}) = λβ_{m_k+ν} + β_{m_{k-1}+ν}   for k ≥ 1, ν ≤ m_{k+1} - m_k.   (8.1)
for v < wv
Thus we see that in a certain sense β_{m_k+ν} is "almost" an eigenvector.
This suggests reordering the basis vectors so that {β_1, β_{m_1+1}, . . . , β_{m_{s-1}+1}}
are listed first. Next we should like to list the vectors {β_2, β_{m_1+2}, . . .}, etc.
The general idea is to list each of the first elements from each section of the
β's, then each of the second elements from each section, and continue until
a new ordering of the basis is obtained.

With the basis of M(λ) listed in this order (and assuming for the moment
that M(λ) is all of V) the matrix representing σ takes the form

    λ 1
      λ 1
        . .             all zeros           all zeros       (s rows)
          . 1
            λ
                        λ 1
                          λ .
      all zeros             . 1             all zeros       (s rows or fewer)
                              λ
      all zeros          all zeros            etc.

with all entries outside the diagonal blocks equal to zero.


Theorem 8.5. Let A be a matrix with characteristic polynomial f(x) =
(x - λ_1)^{r_1} · · · (x - λ_p)^{r_p} and minimum polynomial m(x) = (x - λ_1)^{s_1} · · ·
(x - λ_p)^{s_p}. A is similar to a matrix J with submatrices of the form

          [ λ_i   1    0   . . .   0  ]
          [  0   λ_i   1   . . .   0  ]
    B_i = [  .              .      .  ]
          [  0    0   . . .  λ_i   1  ]
          [  0    0   . . .   0   λ_i ]

along the main diagonal. All other elements of J are zero. For each λ_i there is
at least one B_i of order s_i. All other B_i corresponding to this λ_i are of order
less than or equal to s_i. The number of B_i corresponding to this λ_i is equal
to the geometric multiplicity of λ_i. The sum of the orders of all the B_i corresponding to λ_i is r_i. While the ordering of the B_i along the main diagonal
of J is not unique, the number of B_i of each possible order is uniquely determined
by A. J is called the Jordan normal form corresponding to A.


proof. From Theorem 8.3 we have V = M_1 ⊕ · · · ⊕ M_p. In the discussion preceding the statement of Theorem 8.5 we have shown that each
M_i has a basis of a special type. Since V is the sum of the M_i, the union of
these bases spans V. Since the sum is direct, the union of these bases is
linearly independent and, hence, a basis for V. This shows that a matrix
J of the type described in Theorem 8.5 does represent σ and is therefore
similar to A.

The discussion preceding the statement of the theorem also shows that
the dimensions m_{ik} of the kernels of the various (σ - λ_i)^k determine
the orders of the B_i in J. Since A determines σ and σ determines the subspaces
M_i independently of the bases employed, the B_i are uniquely determined.

Since the λ_i appear along the main diagonal of J and all other non-zero
elements of J are above the main diagonal, the number of times x - λ_i
appears as a factor of the characteristic polynomial of J is equal to the number
of times λ_i appears in the main diagonal. Thus the sum of the orders of the
B_i corresponding to λ_i is exactly r_i. This establishes all the statements of
Theorem 8.5.
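
As a quick computational check of this last point, here is a minimal sketch in Python (sympy), assuming only a square matrix A and a known eigenvalue lam; the helper name jordan_block_counts is ours. The number of blocks of order at least k belonging to lam is dim ker (A - lam I)^k - dim ker (A - lam I)^{k-1}.

    import sympy as sp

    def jordan_block_counts(A, lam):
        # counts[k-1] = number of Jordan blocks of order >= k for the eigenvalue lam,
        # computed from the kernel dimensions of (A - lam I)^k.
        n = A.shape[0]
        N = A - lam * sp.eye(n)
        counts, prev_nullity = [], 0
        for k in range(1, n + 1):
            nullity = n - (N**k).rank()      # dim ker (A - lam I)^k
            if nullity == prev_nullity:      # the kernels have stabilized
                break
            counts.append(nullity - prev_nullity)
            prev_nullity = nullity
        return counts

    # a 2 x 2 block with eigenvalue 2 gives [1, 1]: one block of order >= 1, one of order >= 2
    print(jordan_block_counts(sp.Matrix([[2, 1], [0, 2]]), 2))

From such a list the orders themselves can be read off: the number of blocks of order exactly k is counts[k-1] - counts[k], with counts[k] taken as zero past the end of the list.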

Let us illustrate the workings of the theorems of this section with some
examples. Unfortunately, it is a little difficult to construct an interesting
example of low order. Hence, we give two examples. The first example
illustrates the choice of basis as described for the space M(λ). The second
example illustrates the situation described by Theorem 8.3.

Example 1. Let

        [  1   0  -1   1   0 ]
        [ -4   1  -3   2   1 ]
    A = [ -2  -1   0   1   1 ]
        [ -3  -1  -3   4   1 ]
        [ -8  -2  -7   5   4 ]

The first step is to obtain the characteristic matrix

           [ 1 - x    0     -1      1      0   ]
           [  -4    1 - x   -3      2      1   ]
    C(x) = [  -2     -1     -x      1      1   ]
           [  -3     -1     -3    4 - x    1   ]
           [  -8     -2     -7      5    4 - x ]
Although it is tedious work we can obtain the characteristic polynomial
f(x) = (x - 2)^5. We have one eigenvalue with algebraic multiplicity 5.
What is the geometric multiplicity and what is the minimum equation for
A? Although there is an effective method for determining the minimum
equation, it is less work and less wasted effort to proceed directly with
determining the eigenvectors. Thus, from

           [ -1   0  -1   1   0 ]
           [ -4  -1  -3   2   1 ]
    C(2) = [ -2  -1  -2   1   1 ]
           [ -3  -1  -3   2   1 ]
           [ -8  -2  -7   5   2 ]

we obtain by elementary row operations the Hermite normal form

    [ 1   0   0   0   0 ]
    [ 0   1   0   1  -1 ]
    [ 0   0   1  -1   0 ]
    [ 0   0   0   0   0 ]
    [ 0   0   0   0   0 ]

From this we learn that there are two linearly independent eigenvectors
corresponding to 2. The dimension of M^1 is 2. Without difficulty we find
the eigenvectors

    α_1 = (0, -1, 1, 1, 0),
    α_2 = (0, 1, 0, 0, 1).

Now we must compute (A - 2I)^2 = (C(2))^2, and obtain

                 [  0   0   0   0   0 ]
                 [  0   0   0   0   0 ]
    (A - 2I)^2 = [ -1   0  -1   1   0 ]
                 [ -1   0  -1   1   0 ]
                 [ -1   0  -1   1   0 ]

The rank of (A - 2I)^2 is 1 and hence M^2 is of dimension 4. The α_1 and α_2
we already have are in M^2, and we must obtain two more vectors in M^2
which, together with α_1 and α_2, will form an independent set. There is
quite a bit of freedom for choice and

    α_3 = (0, 1, 0, 0, 0),
    α_4 = (-1, 0, 1, 0, 0)

appear to be as good as any.
Now (A - 2I)^3 = 0, and we know that the minimum polynomial for A
is (x - 2)^3. We have this knowledge and quite a bit more for less work
than would be required to find the minimum polynomial directly. We see,
then, that M^3 = V and we have to find another vector independent of α_1,
α_2, α_3, and α_4. Again, there are many possible choices. Some choices will
lead to a simpler matrix of transition than will others, and there seems to
be no very good way to make the choice that will result in the simplest
matrix of transition. Let us take

    α_5 = (0, 0, 0, 1, 0).
We now have the basis {α_1, α_2, α_3, α_4, α_5} such that {α_1, α_2} is a basis
of M^1, {α_1, α_2, α_3, α_4} is a basis of M^2, and {α_1, α_2, α_3, α_4, α_5} is a basis of
M^3. Following our instructions, we set β_5 = α_5. Then

    (A - 2I)(0, 0, 0, 1, 0) = (1, 2, 1, 2, 5).

Hence, we set β_3 = (1, 2, 1, 2, 5). Now we must choose β_4 so that {α_1,
α_2, β_3, β_4} is a basis for M^2. We can choose β_4 = (-1, 0, 1, 0, 0). Then

    (A - 2I)(1, 2, 1, 2, 5) = (0, 0, 1, 1, 1)   and   (A - 2I)(-1, 0, 1, 0, 0) = (0, 1, 0, 0, 1).

Hence, we choose β_1 = (0, 0, 1, 1, 1) and β_2 = (0, 1, 0, 0, 1). Thus,

        [ 0   1   0   0  -1 ]
        [ 0   2   0   1   0 ]
    P = [ 1   1   0   0   1 ]
        [ 1   2   1   0   0 ]
        [ 1   5   0   1   0 ]

is the matrix of transition that will transform A to the Jordan normal form

        [ 2   1   0   0   0 ]
        [ 0   2   1   0   0 ]
    J = [ 0   0   2   0   0 ]
        [ 0   0   0   2   1 ]
        [ 0   0   0   0   2 ]
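
These computations are easy to verify mechanically. A minimal sketch in Python with sympy, using the matrices A and P as printed above; the expected output is the characteristic polynomial (x - 2)^5, the ranks 3 and 1 of C(2) and C(2)^2, and the matrix J just displayed.

    import sympy as sp

    A = sp.Matrix([[ 1,  0, -1, 1, 0],
                   [-4,  1, -3, 2, 1],
                   [-2, -1,  0, 1, 1],
                   [-3, -1, -3, 4, 1],
                   [-8, -2, -7, 5, 4]])
    P = sp.Matrix([[0, 1, 0, 0, -1],
                   [0, 2, 0, 1,  0],
                   [1, 1, 0, 0,  1],
                   [1, 2, 1, 0,  0],
                   [1, 5, 0, 1,  0]])

    x = sp.symbols('x')
    print(sp.factor(A.charpoly(x).as_expr()))                        # (x - 2)**5
    print((A - 2*sp.eye(5)).rank(), ((A - 2*sp.eye(5))**2).rank())   # 3, 1
    print(P.inv() * A * P)                                           # the Jordan form J above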

Example 2. Let

        [ 5  -1  -3   2  -5 ]
        [ 0   2   0   0   0 ]
    A = [ 1   0   1   1  -2 ]
        [ 0  -1   0   3   1 ]
        [ 1  -1  -1   1   1 ]

The characteristic polynomial is f(x) = -(x - 2)^3 (x - 3)^2. Again we have
repeated eigenvalues, one of multiplicity 3 and one of multiplicity 2.

           [ 3  -1  -3   2  -5 ]
           [ 0   0   0   0   0 ]
    C(2) = [ 1   0  -1   1  -2 ]
           [ 0  -1   0   1   1 ]
           [ 1  -1  -1   1  -1 ]

from which we obtain the Hermite normal form

    [ 1   0  -1   0  -2 ]
    [ 0   1   0   0  -1 ]
    [ 0   0   0   1   0 ]
    [ 0   0   0   0   0 ]
    [ 0   0   0   0   0 ]

Again, the geometric multiplicity is less than the algebraic multiplicity. We
obtain the eigenvectors

    α_1 = (1, 0, 1, 0, 0),
    α_2 = (2, 1, 0, 0, 1).

Now we must compute (A - 2I)^2. We find

                 [ 1   0  -1   0  -2 ]
                 [ 0   0   0   0   0 ]
    (A - 2I)^2 = [ 0   0   0   0   0 ]
                 [ 1  -2  -1   2   0 ]
                 [ 1  -1  -1   1  -1 ]

from which we obtain the Hermite normal form

    [ 1   0  -1   0  -2 ]
    [ 0   1   0  -1  -1 ]
    [ 0   0   0   0   0 ]
    [ 0   0   0   0   0 ]
    [ 0   0   0   0   0 ]

For the third basis vector we can choose

    α_3 = (0, 1, 0, 1, 0).

Then

    (A - 2I)(0, 1, 0, 1, 0) = (1, 0, 1, 0, 0);

hence, we have β_3 = α_3, β_1 = α_1, and we can choose β_2 = α_2.

In a similar fashion we find β_4 = (-1, 0, 0, 1, 0) and β_5 = (2, 0, 0, 0, 1)
corresponding to the eigenvalue 3. β_4 is an eigenvector and (A - 3I)β_5 = β_4.
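
A similar mechanical check can be made here, as a sketch in Python (sympy). It uses A as printed above and assembles a matrix of transition from the vectors just found, in the order β_1, β_3, β_2, β_4, β_5; this particular ordering, and the resulting P, are our own assembly in the spirit of Example 1 rather than something printed in the text.

    import sympy as sp

    A = sp.Matrix([[5, -1, -3, 2, -5],
                   [0,  2,  0, 0,  0],
                   [1,  0,  1, 1, -2],
                   [0, -1,  0, 3,  1],
                   [1, -1, -1, 1,  1]])

    # columns: beta_1, beta_3, beta_2 (eigenvalue 2), then beta_4, beta_5 (eigenvalue 3)
    P = sp.Matrix([[1, 0, 2, -1, 2],
                   [0, 1, 1,  0, 0],
                   [1, 0, 0,  0, 0],
                   [0, 1, 0,  1, 0],
                   [0, 0, 1,  0, 1]])

    x = sp.symbols('x')
    print(sp.factor(A.charpoly(x).as_expr()))   # (x - 2)**3*(x - 3)**2; the text's f(x) is the negative of this
    print(P.inv() * A * P)                      # diagonal blocks [[2,1],[0,2]], [2], [[3,1],[0,3]]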
chapter IV
Linear functionals, bilinear forms, quadratic forms

In this chapter we study scalar-valued functions of vectors. Linear functionals
are linear transformations of a vector space into a vector space of dimension
1. As such they are not new to us. But because they are very important, they
have been the subject of much investigation and a great deal of special
terminology has accumulated for them.
For the first time we make use of the fact that the set of linear transforma-
tions can profitably be considered to be a vector space. For finite dimensional
vector spaces the set of linear functionals forms a vector space of the same
dimension, the dual space. We are concerned with the relations between
the structure of a vector space and its dual space, and between the representa-
tions of the various objects in these spaces.
In Chapter V we carry the vector point of view of linear functionals one
step further by mapping them into the original vector space. There is a
certain aesthetic appeal in imposing two separate structures on a single
vector space, and there is value in doing it because it motivates our con-
centration on the aspects of these two structures that either look alike or
are symmetric. For clarity in this chapter, however, we keep these two
structures separate in two different vector spaces.
Bilinear forms are functions of two vector variables which are linear in
each variable separately. A quadratic form is a function of a single vector
variable which is obtained by identifying the two variables in a bilinear
form. Bilinear forms and quadratic forms are intimately tied together,
and this is the principal reason for our treating bilinear forms in detail.
In Chapter VI we give some applications of quadratic forms to physical
problems.
If the field of scalars is the field of complex numbers, then the applications


we wish to make of bilinear forms and quadratic forms lead us to modify
the definition slightly. In this way we are led to study Hermitian forms.
Aside from their definition they present little additional difficulty.

1 | Linear Functionals

Definition. Let V be a vector space over a field of constants F. A linear
transformation φ of V into F is called a linear form or linear functional on V.

Any field can be considered to be a 1-dimensional vector space over itself
(see Exercise 10, Section 1-1). It is possible, for example, to imagine two
copies of F, one of which we label U. We retain the operation of addition in
U, but drop the operation of multiplication. We then define scalar multiplication in the obvious way: the product is computed as if both the scalar
and the vector were in the same copy of F and the product is taken to be an
element of U. Thus the concept of a linear functional is not really something
new. It is our familiar linear transformation restricted to a special case.

Linear functionals are so useful, however, that they deserve a special name
and particular study. Linear concepts appear throughout mathematics,
particularly in applied mathematics, and in all cases linear functionals play
an important part. It is usually the case, however, that special terminology
is used which tends to obscure the widespread occurrence of this concept.
The term "linear form" would be more consistent with other usage
throughout this book and the history of the theory of matrices. But the
term "linear functional" has come to be almost universally adopted.

Theorem 1.1. If V is a vector space of dimension n over F, the set of all
linear functionals on V is a vector space of dimension n.

proof. If φ and ψ are linear functionals on V, by φ + ψ we mean the
mapping defined by (φ + ψ)(α) = φ(α) + ψ(α) for all α ∈ V. For any
a ∈ F, by aφ we mean the mapping defined by (aφ)(α) = a[φ(α)] for all
α ∈ V. We must then show that with these laws for vector addition and
scalar multiplication of linear functionals the axioms of a vector space are
satisfied.

These demonstrations are not difficult and they are left to the reader.
(Remember that proving axioms A1 and B1 are satisfied really requires
showing that φ + ψ and aφ, as defined, are linear functionals.)

We call the vector space of all linear functionals on V the dual or conjugate
space of V and denote it by V̂ (pronounced "vee hat" or "vee caret"). We have
yet to show that V̂ is of dimension n. Let A = {α_1, α_2, . . . , α_n} be a basis of V.
Define φ_i by the rule that for any α = Σ_{j=1}^n a_j α_j, φ_i(α) = a_i ∈ F. We shall
call φ_i the ith coordinate function.

For any β = Σ_{j=1}^n b_j α_j we have φ_i(β) = b_i, and φ_i(α + β) =
φ_i(Σ_{j=1}^n (a_j + b_j)α_j) = a_i + b_i = φ_i(α) + φ_i(β). Also φ_i(aα) =
φ_i(Σ_{j=1}^n a a_j α_j) = a a_i = a φ_i(α). Thus φ_i is a linear functional.

Suppose that Σ_{i=1}^n b_i φ_i = 0. Then (Σ_{i=1}^n b_i φ_i)(α) = 0 for all α ∈ V.
In particular, for α_j we have (Σ_{i=1}^n b_i φ_i)(α_j) = Σ_{i=1}^n b_i φ_i(α_j) = b_j = 0. Hence,
all b_i = 0 and the set {φ_1, φ_2, . . . , φ_n} must be linearly independent. On the
other hand, for any φ ∈ V̂ and any α = Σ_{i=1}^n a_i α_i ∈ V, we have

    φ(α) = φ(Σ_{i=1}^n a_i α_i) = Σ_{i=1}^n a_i φ(α_i).    (1.1)

If we let φ(α_j) = b_j, then for Σ_{j=1}^n b_j φ_j we have

    (Σ_{j=1}^n b_j φ_j)(α) = Σ_{j=1}^n b_j φ_j(α) = Σ_{j=1}^n b_j a_j = φ(α).    (1.2)

Thus the set {φ_1, . . . , φ_n} = Â spans V̂ and forms a basis of V̂. This shows
that V̂ is of dimension n.

The basis Â of V̂ that we have constructed in the proof of Theorem 1.1
has a very special relation to the basis A. This relation is characterized by
the equations

    φ_i(α_j) = δ_ij,    (1.3)

for all i, j. In the proof of Theorem 1.1 we have shown that a basis satisfying
these conditions exists. For each i, the conditions in Equation (1.3) specify
the values of φ_i on all the vectors in the basis A. Thus φ_i is uniquely determined as a linear functional. And thus Â is uniquely determined by A and the
conditions (1.3). We call Â the basis dual to the basis A.

If φ = Σ_{i=1}^n b_i φ_i, then

    φ(α_j) = Σ_{i=1}^n b_i φ_i(α_j) = b_j,

so that, as a linear transformation, φ is represented by the 1 × n matrix
[b_1 · · · b_n]. For this reason we choose to represent the linear functionals in
V̂ by one-row matrices. With respect to the basis Â in V̂, φ = Σ_{i=1}^n b_i φ_i will
be represented by the row [b_1 · · · b_n] = B. It might be argued that, since V̂
is a vector space, the elements of V̂ should be represented by columns. But
the set of all linear transformations of one vector space into another also
forms a vector space, and we can as justifiably choose to emphasize the aspect
of V̂ as a set of linear transformations. At most, the choice of a representing
notation is a matter of taste and convenience. The choice we have made
means that some adjustments will have to be made when using the matrix
of transition to change the coordinates of a linear functional when the basis
is changed. But no choice of representing notation seems to avoid all such
difficulties, and the choice we have made seems to offer the most advantages.
If the vector ξ ∈ V is represented by the n-tuple (x_1, . . . , x_n) = X, then
we can compute φ(ξ) directly in terms of the representations:

    φ(ξ) = φ(Σ_{j=1}^n x_j α_j) = Σ_{j=1}^n x_j φ(α_j) = Σ_{j=1}^n b_j x_j = [b_1 · · · b_n] X = BX.    (1.4)
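
In coordinates, (1.4) says that evaluating a linear functional is a single row-by-column product. A tiny illustration in Python (numpy); the particular numbers are made up.

    import numpy as np

    B = np.array([[2, -1, 3]])       # row matrix representing phi in the dual basis
    X = np.array([[1], [4], [2]])    # column matrix representing xi
    print(B @ X)                     # [[4]]: phi(xi) = 2*1 + (-1)*4 + 3*2 = 4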

EXERCISES
1. Let A = {α_1, α_2, α_3} be a basis in a 3-dimensional vector space V over R.
Let Â = {φ_1, φ_2, φ_3} be the basis in V̂ dual to A. Any vector ξ ∈ V can be written in
the form ξ = x_1α_1 + x_2α_2 + x_3α_3. Determine which of the following functions
on V are linear functionals. Determine the coordinates of those that are linear
functionals in terms of the basis Â.

(a) φ(ξ) = x_1 + x_2 + x_3.

(b) φ(ξ) = (x_1 + x_2)^2.


(c) φ(ξ) = √2 x_1.

(d) φ(ξ) = x_2 - 2x_1.

(e) φ(ξ) = x_2 - 1.
2. For each of the following bases of R^3 determine the dual basis in R̂^3.

(a) {(1, 0, 0), (0, 1, 0), (0, 0, 1)}.

(b) {(1, 0, 0), (1, 1, 0), (1, 1, 1)}.

(c) {(1, 0, -1), (-1, 1, 0), (0, 1, 1)}.
3. Let V = P_n, the space of polynomials of degree less than n over R. For a
fixed a ∈ R, let φ(p) = p^{(k)}(a), where p^{(k)}(x) is the kth derivative of p(x) ∈ P_n.
Show that φ is a linear functional.

4. Let V be the space of real functions continuous on the interval [0, 1], and let
g be a fixed function in V. For each f ∈ V define

    L_g(f) = ∫_0^1 f(t)g(t) dt.

Show that L_g is a linear functional on V. Show that if L_g(f) = 0 for every g ∈ V,
then f = 0.

5. Let A = {α_1, . . . , α_n} be a basis of V and let Â = {φ_1, . . . , φ_n} be the basis of
V̂ dual to the basis A. Show that an arbitrary α ∈ V can be represented in the form

    α = Σ_{i=1}^n φ_i(α) α_i.

6. Let V be a vector space of finite dimension n ≥ 2 over F. Let α and β be two
vectors in V such that {α, β} is linearly independent. Show that there exists a
linear functional φ such that φ(α) = 1 and φ(β) = 0.

7. Let V = P_n, the space of polynomials over F of degree less than n (n ≥ 1). Let
a ∈ F be any scalar. For each p(x) ∈ P_n, p(a) is a scalar. Show that the mapping
of p(x) onto p(a) is a linear functional on P_n (which we denote by σ_a). Show that if
a ≠ b then σ_a ≠ σ_b.

8. (Continuation) In Exercise 7 we showed that for each a ∈ F there is defined
a linear functional σ_a ∈ P̂_n. Show that if n > 1, then not every linear functional in
P̂_n can be obtained in this way.

9. (Continuation) Let {a_1, . . . , a_n} be a set of n distinct scalars. Let f(x) =
(x - a_1)(x - a_2) · · · (x - a_n) and h_k(x) = f(x)/(x - a_k). Show that h_k(a_j) =
δ_{jk} f'(a_j), where f'(x) is the derivative of f(x).
10. (Continuation) For the a_k given in Exercise 9, let

    σ_k = σ_{a_k} / f'(a_k).

Show that {σ_1, . . . , σ_n} is linearly independent and a basis of P̂_n. Show that
{h_1(x), . . . , h_n(x)} is linearly independent and, hence, a basis of P_n. (Hint: Apply
σ_j to Σ_{k=1}^n b_k h_k(x).) Show that {σ_1, . . . , σ_n} is the basis dual to {h_1(x), . . . , h_n(x)}.

11. (Continuation) Let p(x) be any polynomial in P_n. Show that p(x) can be
represented in the form

    p(x) = Σ_{k=1}^n [p(a_k) / f'(a_k)] h_k(x).

(Hint: Use Exercise 5.) This formula is known as the Lagrange interpolation
formula. It yields the polynomial of least degree taking on the n specified values
{p(a_1), . . . , p(a_n)} at the points {a_1, . . . , a_n}.
12. Let W be a proper subspace of the n-dimensional vector space V. Let α_0 be
a vector in V but not in W. Show that there is a linear functional φ ∈ V̂ such that
φ(α_0) = 1 and φ(α) = 0 for all α ∈ W.
13. Let W be a proper subspace of the n-dimensional vector space V. Let ψ
be a linear functional on W. It must be emphasized that ψ is an element of Ŵ
and not an element of V̂. Show that there is at least one element φ ∈ V̂ such that
φ coincides with ψ on W.

14. Show that if α ≠ 0, there is a linear functional φ such that φ(α) ≠ 0.

15. Let α and β be vectors such that φ(β) = 0 implies φ(α) = 0. Show that α is
a multiple of β.

2 | Duality

Until now, we have encouraged an unsymmetric point of view with respect
to V and V̂. Indeed, it is natural to consider φ(α) for a chosen φ and a range
of choices for α. However, there is no reason why we should not choose a
fixed α and consider the expression φ(α) for a range of choices for φ. Since
(b_1φ_1 + b_2φ_2)(α) = (b_1φ_1)(α) + (b_2φ_2)(α), we see that α behaves like a linear
functional on V̂.

This leads us to consider the space V̂̂ of all linear functionals on V̂. Corresponding to any α ∈ V we can define a linear functional α̂ in V̂̂ by the rule
α̂(φ) = φ(α) for all φ ∈ V̂. Let the mapping defined by this rule be denoted
by J, that is, J(α) = α̂. Since J(aα + bβ)(φ) = φ(aα + bβ) = aφ(α) +
bφ(β) = aJ(α)(φ) + bJ(β)(φ) = [aJ(α) + bJ(β)](φ), we see that J is a linear
transformation mapping V into V̂̂.

Theorem 2.1. If V is finite dimensional, the mapping J of V into V̂̂ is a
one-to-one linear transformation of V onto V̂̂.
proof. Let V be of dimension n. We have already shown that J is linear
and into. If J(α) = 0 then J(α)(φ) = 0 for all φ ∈ V̂. In particular, J(α)(φ_i) = 0
for the basis of coordinate functions. Thus if α = Σ_i x_i α_i we see that

    J(α)(φ_i) = φ_i(α) = Σ_j x_j φ_i(α_j) = x_i = 0

for each i = 1, . . . , n. Thus α = 0 and the kernel of J is {0}; that is, J(V)
is of dimension n. On the other hand, if V is of dimension n, then V̂ and V̂̂
are also of dimension n. Hence J(V) = V̂̂ and the mapping is onto.

If the mapping J of V into V̂̂ is actually onto V̂̂, we say that V is reflexive.
Thus Theorem 2.1 says that a finite dimensional vector space is reflexive.
Infinite dimensional vector spaces are not reflexive, but a proof of this
assertion is beyond the scope of this book. Moreover, infinite dimensional
vector spaces of interest have a topological structure in addition to the
algebraic structure we are studying. This additional condition requires
a more restricted definition of a linear functional. With this restriction
the dual space is smaller than our definition permits. Under these conditions
it is again possible for the dual of the dual to be covered by the mapping J.
Since J is onto, we identify V and J(V), and consider V as the space of
linear functionals on V̂. Thus V and V̂ are considered in a symmetrical
position and we speak of them as dual spaces. We also drop the parentheses
from the notation, except when required for grouping, and write φα instead
of φ(α). The bases {α_1, . . . , α_n} and {φ_1, . . . , φ_n} are dual bases if and only
if φ_iα_j = δ_ij.

EXERCISES
1. Let A = {α_1, . . . , α_n} be a basis of V, and let Â = {φ_1, . . . , φ_n} be the basis of
V̂ dual to the basis A. Show that an arbitrary φ ∈ V̂ can be represented in the form

    φ = Σ_{i=1}^n φ(α_i) φ_i.
2. Let V be a vector space of finite dimension n ≥ 2 over F. Let φ and ψ be two
linear functionals in V̂ such that {φ, ψ} is linearly independent. Show that there
exists a vector α such that φ(α) = 1 and ψ(α) = 0.

3. Let φ_0 be a linear functional not in the subspace S of the space of linear
functionals V̂. Show that there exists a vector α such that φ_0(α) = 1 and φ(α) = 0
for all φ ∈ S.

4. Show that if φ ≠ 0, there is a vector α such that φ(α) ≠ 0.

5. Let φ and ψ be two linear functionals such that φ(α) = 0 implies ψ(α) = 0.
Show that ψ is a multiple of φ.

3 | Change of Basis

If the basis A' = {α'_1, α'_2, . . . , α'_n} is used instead of the basis A =
{α_1, α_2, . . . , α_n}, we ask how the dual basis Â' = {φ'_1, . . . , φ'_n} is related to
the dual basis Â = {φ_1, . . . , φ_n}. Let P = [p_ij] be the matrix of transition
from the basis A to the basis A'. Thus α'_j = Σ_{i=1}^n p_ij α_i. Since φ_i(α'_j) =
Σ_{k=1}^n p_kj φ_i(α_k) = p_ij, we see that φ_i = Σ_{j=1}^n p_ij φ'_j. This means that P^T is the
matrix of transition from the basis Â' to the basis Â. Hence, (P^T)^{-1} = (P^{-1})^T
is the matrix of transition from Â to Â'.

Since linear functionals are represented by row matrices instead of column
matrices, the matrix of transition appears in the formulas for change of
coordinates in a slightly different way. Let B = [b_1 · · · b_n] be the representation of a linear functional φ with respect to the basis Â, and let B' = [b'_1 · · · b'_n]
be its representation with respect to the basis Â'. Then

    φ = Σ_{i=1}^n b_i φ_i = Σ_{i=1}^n b_i (Σ_{j=1}^n p_ij φ'_j) = Σ_{j=1}^n (Σ_{i=1}^n b_i p_ij) φ'_j.    (3.1)

Thus,

    B' = BP.    (3.2)

We are looking at linear functionals from two different points of view.
Considered as a linear transformation, the effect of a change of coordinates
is given by formula (4.5) of Chapter II, which is identical with (3.2) above.
Considered as a vector, the effect of a change of coordinates is given by
formula (4.3) of Chapter II. In this case we would represent φ by B^T, since
vectors are represented by column matrices. Then, since (P^{-1})^T is the
matrix of transition, we would have

    B^T = (P^{-1})^T B'^T = (B'P^{-1})^T,  or  B = B'P^{-1},    (3.3)
which is equivalent to (3.2). Thus the end result is the same from either
point of view. It is this two-sided aspect of linear functionals which has
made them so important and their study so fruitful.
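
That the two points of view agree amounts to the identity B'X' = (BP)(P^{-1}X) = BX: the value of the functional does not depend on the coordinate system. A small numeric check in Python (numpy); the matrix P and the coordinates below are arbitrary made-up choices.

    import numpy as np

    B = np.array([[1.0, -2.0, 0.5]])      # functional in the old dual coordinates
    X = np.array([[3.0], [1.0], [2.0]])   # vector in the old coordinates
    P = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])       # an invertible matrix of transition

    X_new = np.linalg.solve(P, X)         # X = P X'  =>  X' = P^{-1} X
    B_new = B @ P                         # B' = B P, as in (3.2)
    print((B @ X).item(), (B_new @ X_new).item())   # the same value both times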
Example 1. In analytic geometry, a hyperplane passing through the
origin is the set of all points with coordinates (x_1, x_2, . . . , x_n) satisfying an
equation of the form b_1x_1 + b_2x_2 + · · · + b_nx_n = 0. Thus the n-tuple
[b_1 b_2 · · · b_n] can be considered as representing the hyperplane. Of course,
a given hyperplane can be represented by a family of equations, so that
there is not a one-to-one correspondence between the hyperplanes through
the origin and the n-tuples. However, we can still profitably consider the
space of hyperplanes as dual to the space of points.
Suppose the coordinate system is changed so that points now have the
coordinates (y_1, . . . , y_n), where x_i = Σ_{j=1}^n a_ij y_j. Then the equation of the
hyperplane becomes

    Σ_{i=1}^n b_i x_i = Σ_{i=1}^n b_i (Σ_{j=1}^n a_ij y_j) = Σ_{j=1}^n (Σ_{i=1}^n b_i a_ij) y_j = Σ_{j=1}^n c_j y_j = 0.    (3.4)

Thus the equation of the hyperplane is transformed by the rule c_j = Σ_{i=1}^n b_i a_ij.
Notice that while we have expressed the old coordinates in terms of the new
coordinates, we have expressed the new coefficients in terms of the old
coefficients. This is typical of related transformations in dual spaces.
Example 2. A much more illuminating example occurs in the calculus of
functions of several variables. Suppose that w is a function of the variables
x_1, x_2, . . . , x_n, w = f(x_1, x_2, . . . , x_n). Then it is customary to write down
formulas of the following form:

    dw = (∂w/∂x_1) dx_1 + (∂w/∂x_2) dx_2 + · · · + (∂w/∂x_n) dx_n,    (3.5)

and

    ∇w = (∂w/∂x_1, ∂w/∂x_2, . . . , ∂w/∂x_n).    (3.6)

dw is usually called the differential of w, and ∇w is usually called the gradient
of w. It is also customary to call ∇w a vector and to regard dw as a scalar,
approximately a small increment in the value of w.
The difficulty in regarding ∇w as a vector is that its coordinates do not
follow the rules for a change of coordinates of a vector. For example, let
us consider (x_1, x_2, . . . , x_n) as the coordinates of a vector in a linear vector
space. This implies the existence of a basis {α_1, . . . , α_n} such that the linear
combination

    ξ = Σ_{i=1}^n x_i α_i    (3.7)

is the vector with coordinates (x_1, x_2, . . . , x_n). Let {β_1, . . . , β_n} be a new
basis with matrix of transition P = [p_ij], where

    β_j = Σ_{i=1}^n p_ij α_i.    (3.8)

Then, if ξ = Σ_{j=1}^n y_j β_j is the representation of ξ in the new coordinate system,
we would have

    x_i = Σ_{j=1}^n p_ij y_j,    (3.9)

or

    x_i = Σ_{j=1}^n (∂x_i/∂y_j) y_j.    (3.10)

Let us contrast this with the formulas for changing the coordinates of ∇w.
From the calculus of functions of several variables we know that

    ∂w/∂y_j = Σ_{i=1}^n (∂w/∂x_i)(∂x_i/∂y_j).    (3.11)
This formula corresponds to (3.2). Thus ∇w changes coordinates as if it were
in the dual space.

In vector analysis it is customary to call a vector whose coordinates change
according to formula (3.10) a contravariant vector, and a vector whose
coordinates change according to formula (3.11) a covariant vector. The
reader should verify that if P = [∂x_i/∂y_j], then [∂y_j/∂x_i] = (P^T)^{-1}. Thus
(3.11) is equivalent to the formula

    ∂w/∂x_i = Σ_{j=1}^n (∂w/∂y_j)(∂y_j/∂x_i).    (3.12)
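
For a linear change of coordinates, (3.11) can be checked symbolically. A sketch in Python (sympy); the function w and the matrix P below are made-up illustrations, not taken from the text.

    import sympy as sp

    x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')
    P = sp.Matrix([[1, 2],
                   [1, 3]])                    # p_ij = dx_i/dy_j, an invertible example
    w = x1**2 + x1*x2                          # any smooth function of x1, x2

    grad_x = sp.Matrix([[sp.diff(w, x1), sp.diff(w, x2)]])   # row (dw/dx1, dw/dx2)

    change = {x1: P[0, 0]*y1 + P[0, 1]*y2,     # x_i = sum_j p_ij y_j, as in (3.9) and (3.10)
              x2: P[1, 0]*y1 + P[1, 1]*y2}
    w_y = w.subs(change)
    grad_y = sp.Matrix([[sp.diff(w_y, y1), sp.diff(w_y, y2)]])

    # (3.11) in matrix form: grad_y = grad_x * P, so the gradient transforms covariantly
    print((grad_y - grad_x.subs(change) * P).expand())       # the zero matrix

The coordinates of the point itself go the other way, X = PY, which is the contravariant rule (3.9).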

From the point of view of linear vector spaces it is a mistake to regard


both types of vectors as being in the same vector space. As a matter of fact,
their sum is not defined. It is clearer and more fruitful
to consider the co-
variant and contravariant vectors to be taken from a pair of dual spaces.
This point of view is now taken in modern treatments of advanced
calculus
and vector analysis. Further details in developing this point of view are given
in Chapter VI, Section 4.
In traditional discussions of these topics, all quantities that are represented
by n-tuples are called vectors. In fact, the n-tuples themselves are called
vectors. Also, it is customary to restrict the discussion to coordinate changes
in which both covariant and contravariant vectors transform according to
the same formulas. This amounts to having P, the matrix of transition,
satisfy the condition (P^{-1})^T = P. While this does simplify the discussion
it makes it almost impossible to understand the foundations of the subject.
Let A = {α_1, . . . , α_n} be a basis of V and let Â = {φ_1, . . . , φ_n} be the dual
basis in V̂. Let B = {β_1, . . . , β_n} be any new basis of V. We are asked to find
the dual basis B̂ in V̂. This problem is ordinarily posed by giving the representation of the β_i with respect to the basis A and expecting the representations
of the elements of the dual basis with respect to Â. Let the β_i be represented
with respect to A in the form

    β_j = Σ_{i=1}^n p_ij α_i,    (3.13)

and let

    ψ_i = Σ_{j=1}^n q_ij φ_j    (3.14)

be the representations of the elements of the dual basis B̂ = {ψ_1, . . . , ψ_n}.

Then

    δ_ki = ψ_k(β_i) = (Σ_{j=1}^n q_kj φ_j)(Σ_{l=1}^n p_li α_l) = Σ_{j=1}^n Σ_{l=1}^n q_kj p_li φ_j(α_l) = Σ_{j=1}^n q_kj p_ji.    (3.15)

In matrix form, (3.15) is equivalent to

    I = QP.    (3.16)

Q is the inverse of P. Because of (3.14), the ψ_i are represented by the rows
of Q. Thus, to find the dual basis, we write the representation of the basis B
in the columns of P, find the inverse matrix P^{-1}, and read out the representations of the basis B̂ in the rows of P^{-1}.
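
This recipe is immediate to carry out with any matrix package. A sketch in Python (sympy); the basis below, written in the columns of P, is a made-up example and not one of the exercises.

    import sympy as sp

    # coordinates of the new basis beta_1, beta_2, beta_3 relative to A, as columns
    P = sp.Matrix([[2, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])

    Q = P.inv()          # row i of Q represents psi_i relative to the dual basis of A
    print(Q)
    print(Q * P)         # the identity matrix: psi_i(beta_j) = delta_ij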

EXERCISES
1. Let A = {(1, 0, . . . , 0), (0, 1, . . . , 0), . . . , (0, 0, . . . , 1)} be a basis of R^n.
The basis of R̂^n dual to A has the same coordinates. It is of interest to see if there
are other bases of R^n for which the dual basis has exactly the same coordinates.
Let A' be another basis of R^n with matrix of transition P. What condition should
P satisfy in order that the elements of the basis dual to A' have the same coordinates
as the corresponding elements of the basis A'?

2. Let A = {α_1, α_2, α_3} be a basis of a 3-dimensional vector space V, and let Â =
{φ_1, φ_2, φ_3} be the basis of V̂ dual to A. Then let A' = {(1, 1, 1), (1, 0, 1), (0, 1, -1)}
be another basis of V (where the coordinates are given in terms of the basis A).
Use the matrix of transition to find the basis Â' dual to A'.

3. Use the matrix of transition to find the basis dual to {(1, 0, 0), (1, 1, 0), (1, 1, 1)}.

4. Use the matrix of transition to find the basis dual to {(1, 0, -1), (-1, 1, 0), (0, 1, 1)}.
5. Let B represent a linear functional φ, and X a vector ξ, with respect to dual
bases, so that BX is the value φξ of the linear functional. Let P be the matrix of
transition to a new basis, so that if X' is the new representation of ξ, then X = PX'.
By substituting PX' for X in the expression for the value of φξ, obtain another
proof that BP is the representation of φ in the new dual coordinate system.

4 | Annihilators

Definition. Let V be an n-dimensional vector space and V̂ its dual. If, for
an α ∈ V and a φ ∈ V̂, we have φα = 0, we say that φ and α are orthogonal.
Since φ and α are from different vector spaces, it should be clear that we do
not intend to say that φ and α are at "right angles."

Definition. Let W be a subset (not necessarily a subspace) of V. The set of
all linear functionals φ such that φα = 0 for all α ∈ W is called the annihilator
of W, and we denote it by W^⊥. Any φ ∈ W^⊥ is called an annihilator of W.

Theorem 4.1. The annihilator W^⊥ of W is a subspace of V̂. If W is a
subspace of dimension p, then W^⊥ is of dimension n - p.
proof. If φ and ψ are in W^⊥, then (aφ + bψ)α = aφα + bψα = 0 for
all α ∈ W. Hence, W^⊥ is a subspace of V̂.

Suppose W is a subspace of V of dimension p, and let A = {α_1, . . . , α_n}
be a basis of V such that {α_1, . . . , α_p} is a basis of W. Let Â = {φ_1, . . . , φ_n}
be the dual basis of A. For {φ_{p+1}, . . . , φ_n} we see that φ_jα_i = 0 for all
i ≤ p. Hence, {φ_{p+1}, . . . , φ_n} is a subset of the annihilator of W. On the
other hand, if φ = Σ_{i=1}^n b_iφ_i is an annihilator of W, we have φα_i = b_i = 0 for
each i ≤ p. Hence, b_i = 0 for i ≤ p and the set {φ_{p+1}, . . . , φ_n} spans W^⊥.
Thus {φ_{p+1}, . . . , φ_n} is a basis for W^⊥, and W^⊥ is of dimension n - p.
The dimension of W^⊥ is called the co-dimension of W.
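
In coordinates, the proof translates into a null-space computation: if the rows of a matrix M are the n-tuples representing a spanning set of W, then a row B represents an annihilator of W exactly when MB^T = 0. A sketch in Python (sympy); the spanning set below is a made-up example.

    import sympy as sp

    M = sp.Matrix([[1, 0, 1, 0],        # rows span a 2-dimensional subspace W of a
                   [0, 1, 0, 1]])       # 4-dimensional space

    annihilator_basis = [v.T for v in M.nullspace()]
    for row in annihilator_basis:
        print(row)                      # e.g. [-1, 0, 1, 0] and [0, -1, 0, 1]
    print(len(annihilator_basis))       # n - p = 4 - 2 = 2, as the theorem predicts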

It should also be clear from this argument that W is exactly the set of all
α ∈ V annihilated by all φ ∈ W^⊥. Thus we have

Theorem 4.2. If S is any subset of V̂, the set of all α ∈ V annihilated by all
φ ∈ S is a subspace of V, denoted by S^⊥. If S is a subspace of dimension r,
then S^⊥ is a subspace of dimension n - r.
Theorem 4.2 is really Theorem 1.16 of Chapter II in a different form. If
a linear transformation of V into another vector space W is represented by
a matrix A, then each row of A can be considered as representing a linear
functional on V. The number r of linearly independent rows of A is the
dimension of the subspace S of V̂ spanned by these linear functionals. S^⊥
is the kernel of the linear transformation and its dimension is n - r.
The symmetry in this discussion should be apparent. If φ ∈ W^⊥, then
φα = 0 for all α ∈ W. On the other hand, for α ∈ W, φα = 0 for all φ ∈ W^⊥.
Theorem 4.3. If W is a subspace, (W^⊥)^⊥ = W.

proof. By definition, (W^⊥)^⊥ = W^⊥⊥ is the set of all α ∈ V such that
φα = 0 for all φ ∈ W^⊥. Clearly, W ⊂ W^⊥⊥. Since dim W^⊥⊥ = n -
dim W^⊥ = dim W, W^⊥⊥ = W.
This also leads to a reinterpretation of the discussion in Section II-8.
A subspace W of V of dimension p can be characterized by giving its
annihilator W^⊥ ⊂ V̂ of dimension r = n - p.

Theorem 4.4. If W_1 and W_2 are two subspaces of V, and W_1^⊥ and W_2^⊥
are their respective annihilators in V̂, the annihilator of W_1 + W_2 is W_1^⊥ ∩
W_2^⊥ and the annihilator of W_1 ∩ W_2 is W_1^⊥ + W_2^⊥.

proof. If φ is an annihilator of W_1 + W_2, then φ annihilates all α ∈ W_1
and all β ∈ W_2, so that φ ∈ W_1^⊥ ∩ W_2^⊥. If φ ∈ W_1^⊥ ∩ W_2^⊥, then for all
α ∈ W_1 and β ∈ W_2 we have φα = 0 and φβ = 0. Hence, φ(aα + bβ) =
aφα + bφβ = 0 so that φ annihilates W_1 + W_2. This shows that (W_1 +
W_2)^⊥ = W_1^⊥ ∩ W_2^⊥.

The symmetry between the annihilator and the annihilated means that
the second part of the theorem follows immediately from the first. Namely,
since (W_1 + W_2)^⊥ = W_1^⊥ ∩ W_2^⊥, we have, by substituting W_1^⊥ and W_2^⊥
for W_1 and W_2, (W_1^⊥ + W_2^⊥)^⊥ = (W_1^⊥)^⊥ ∩ (W_2^⊥)^⊥ = W_1 ∩ W_2. Hence,
(W_1 ∩ W_2)^⊥ = W_1^⊥ + W_2^⊥.

Now the mechanics for finding the sum of two subspaces is somewhat
simpler than that for finding the intersection. To find the sum we merely
combine the two bases for the two subspaces and then discard dependent
vectors until an independent spanning set for the sum remains. It happens
that to find the intersection W_1 ∩ W_2 it is easier to find W_1^⊥ and W_2^⊥,
and then W_1^⊥ + W_2^⊥, and obtain W_1 ∩ W_2 as (W_1^⊥ + W_2^⊥)^⊥, than it is to
find the intersection directly.

The example in Chapter II-8, page 71, is exactly this process carried out
in detail. In the notation of this discussion, E_1 = W_1^⊥ and E_2 = W_2^⊥.
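
The whole computation (annihilate, add, annihilate again) can be sketched in a few lines of Python (sympy). The two subspaces below are made-up examples; the rows of each matrix span the subspace.

    import sympy as sp

    W1 = sp.Matrix([[1, 0, 0, 1],
                    [0, 1, 1, 0]])
    W2 = sp.Matrix([[1, 1, 1, 1],
                    [1, 0, 1, 0]])

    # rows spanning the annihilators of W1 and W2
    A1 = sp.Matrix.vstack(*[v.T for v in W1.nullspace()])
    A2 = sp.Matrix.vstack(*[v.T for v in W2.nullspace()])

    # the sum of the annihilators: stack all the rows; its annihilator is W1 intersect W2
    S = sp.Matrix.vstack(A1, A2)
    print([v.T for v in S.nullspace()])   # here a single vector, a multiple of (1, 1, 1, 1)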

Let V be a vector space, V̂ the corresponding dual vector space, and let W
be a subspace of V. Since W ⊂ V, is there any simple relation between Ŵ
and V̂? There is a relation, but it is fairly sophisticated. Any function defined
on all of V is certainly defined on any subset. A linear functional φ ∈ V̂,
therefore, defines a function on W, which we have called the restriction of φ
to W. This does not mean that V̂ ⊂ Ŵ; it means that the restriction defines
a mapping of V̂ into Ŵ.

Let us denote the restriction of φ to W by φ̄, and denote the mapping of φ
onto φ̄ by R. We call R the restriction mapping. It is easily seen that R is
linear. The kernel of R is the set of all φ ∈ V̂ such that φ(α) = 0 for all
α ∈ W. Thus K(R) = W^⊥. Since dim Ŵ = dim W = n - dim W^⊥ =
n - dim K(R), the restriction map is an epimorphism. Every linear functional
on W is the restriction of a linear functional on V.
Since K(R) = W^⊥, we have also shown that Ŵ and V̂/W^⊥ are isomorphic.
But two vector spaces of the same dimension are isomorphic in many ways.
We have done more than show that Ŵ and V̂/W^⊥ are isomorphic. We have
shown that there is a canonical isomorphism that can be specified in a natural
way, independent of any coordinate system. If φ̃ is a residue class in V̂/W^⊥,
and φ is any element of this residue class, then φ̃ and R(φ) correspond under
this natural isomorphism. If η denotes the natural homomorphism of
V̂ onto V̂/W^⊥ and τ denotes the mapping of φ̃ onto R(φ) defined above,
then R = τη, and τ is uniquely determined by R and η and this relation.

Theorem 4.5. Let W be a subspace of V and let W^⊥ be the annihilator of
W in V̂. Then Ŵ is isomorphic to V̂/W^⊥. Furthermore, if R is the restriction
map of V̂ onto Ŵ, if η is the natural homomorphism of V̂ onto V̂/W^⊥, and if τ is the
unique isomorphism of V̂/W^⊥ onto Ŵ characterized by the condition R = τη,
then τ(φ̃) = R(φ), where φ is any linear functional in the residue class φ̃ ∈
V̂/W^⊥.

EXERCISES
1. (a) Find a basis for the annihilator of W = ⟨(1, 0, -1), (1, -1, 0), (0, 1, -1)⟩.

(b) Find a basis for the annihilator of W = ⟨(1, 1, 1, 1, 1), (1, 0, 1, 0, 1),
(0, 1, 1, 1, 0), (2, 0, 0, 1, 1), (2, 1, 1, 2, 1), (1, -1, -1, -2, 2), (1, 2, 3, 4, -1)⟩. What
are the dimensions of W and W^⊥?

2. Find a non-zero linear functional which takes on the same non-zero value
for ξ_1 = (1, 2, 3), ξ_2 = (2, 1, 1), and ξ_3 = (1, 0, 1).
3. Use an argument based on the dimension of the annihilator to show that if
α ≠ 0, there is a φ ∈ V̂ such that φα ≠ 0.