LINEAR ALGEBRA AND MATRIX THEORY
Evar D. Nering
second edition
Evar D. Nering
Professor of Mathematics
Arizona State University
John Wiley & Sons, Inc.
being from another planet. But the various symbols we carelessly write down and
refer to as "numbers" are really only representations of the abstract numbers.
These representations should be called "numerals" and we should not confuse a
numeral with the number it represents. Numerals of different types are the
inventions of various cultures and individuals, and the superiority of one system
of numerals over another lies in the ease with which they can be manipulated
and the insight they give us into the nature
linear functionals. This latter material is, in turn, based on the material
in Sections 1-5 of Chapter IV. For this reason these omissions should not
be made in a random manner. Omissions can be made to accommodate a
one-quarter course, or for other purposes, by carrying one line of development
to its completion and curtailing the other.
The exercises are an important part of this book. The computational
exercises in particular should be worked. They are intended to illustrate the
theory, and their cumulative effect is to provide assurance and understanding.
Numbers have been chosen so that arithmetic complications are minimized.
A student who understands what he is doing should not find them lengthy
or tedious. The theoretical problems are more difficult. Frequently they
are arranged in sequence so that one exercise leads naturally into the next.
Sometimes an exercise has been inserted mainly to provide a suggestive
context for the next exercise. In other places large steps have been broken
down into a sequence of smaller steps, and these steps arranged in a sequence
of exercises. For this reason a student may find it easier to work ten exercises
in a row than to work five of them by taking every other one. These exercises
are numerous and the student should not let himself become bogged down
by them. It is more important to keep on with the pace of development
and to obtain a picture of the subject as a whole than it is to work every
exercise.
The last chapter on selected applications of linear algebra contains a lot
of material in rather compact form. For a student who has been through
the first five chapters it should not be difficult, but it is not easy reading for
someone whose previous experience with linear algebra is in a substantially
Evar D. Nering
Tempe, Arizona
January, 1963
Preface to
second edition
This edition differs from the first primarily in the addition of new material.
Although there are significant changes, essentially the first edition remains
intact and embedded in this book. In effect, the material carried over from
the first edition does not depend logically on the new material. Therefore,
this first edition material can be used independently for a one-semester or
one-quarter course. For such a one-semester course, the first edition usually
required a number of omissions as indicated in its preface. This omitted
material, together with the added material in the second edition, is suitable
for the second semester of a two-semester course.
The concept of duality receives considerably expanded treatment in this
second edition. Because of the aesthetic beauty of duality, it has long been
a favorite topic in abstract mathematics. I am convinced that a thorough
understanding of this concept also should be a standard tool of the applied
mathematician and others who wish to apply mathematics. Several sections
of the chapter concerning applications indicate how duality can be used.
For example, in Section 3 of Chapter V, the inner product can be used to
avoid introducing the concept of duality. This procedure is often followed
in elementary treatments of a variety of subjects because it permits doing
some things with a minimum of mathematical preparation. However, the
cost in loss of clarity is a heavy price to pay to avoid linear functionals.
Using the inner product to represent linear functionals in the vector space
overlays two different structures on the same space. This confuses concepts
that are similar but essentially different. The lack of understanding which
usually accompanies this shortcut makes facing a new context an unsure
undertaking. I think that the use of the inner product to allow the cheap
and early introduction of some manipulative techniques should be avoided.
It is far better to face the necessity of introducing linear functionals at the
earliest opportunity.
I have made a number of changes aimed at clarification and greater
precision. I am not an advocate of rigor for rigor's sake since it usually
are applied indiscriminately to concepts that are not quite identical. For
this reason, I have chosen to use "eigenvalue" and "characteristic value" to
denote non-identical concepts; these terms are not synonyms in this text.
Similarly, I have drawn a distinction between dual transformations and
adjoint transformations.
Many people were kind enough to give me constructive comments and
observations resulting from their use of the first edition. All these comments
were seriously considered, and many resulted in the changes made in this
edition. In addition, Chandler Davis (Toronto), John H. Smith (Boston
College), John V. Ryff (University of Washington), and Peter R. Christopher
(Worcester Polytechnic) went through the first edition or a preliminary
version of the second edition in detail. Their advice and observations were
particularly valuable to me. To each and every one who helped me with
this second edition, I want to express my debt and appreciation.
Evar D. Nering
Tempe, Arizona
September, 1969
Contents
Introduction
I Vector spaces
1. Definitions 5
2. Linear Independence and Linear Dependence 10
II Linear transformations and matrices
1. Linear Transformations 27
2. Matrices 37
3. Non-Singular Matrices 45
4. Change of Basis 50
5. Hermite Normal Form 53
6. Elementary Operations and Elementary Matrices 57
7. Linear Problems and Linear Equations 63
8. Other Applications of the Hermite Normal Form 68
9. Normal Forms 74
*10. Quotient Sets, Quotient Spaces 78
11. Hom(U, V) 83
85
III Determinants, eigenvalues, and similarity transformations
1. Permutations 86
2. Determinants 89
3. Cofactors 93
4. The Hamilton-Cayley Theorem 98
5. Eigenvalues and Eigenvectors 104
6. Some Numerical Examples 110
7. Similarity 113
*8. The Jordan Normal Form 118
128
IV Linear functionals, bilinear forms, quadratic forms
Appendix 319
Answers to selected exercises 325
Index 345
Introduction
The next step is to introduce a practical method for performing the needed
calculations with vectors and linear transformations. We restrict ourselves
to the case in which the vector spaces are finite dimensional. Here it is
appropriate to represent vectors by n-tuples and to represent linear trans-
formations by matrices. These representations are also introduced in
Chapters I and II.
Where the vector spaces are infinite dimensional other representations are
required. In some cases the vectors may be represented by infinite sequences,
or Fourier series, or Fourier transforms, or Laplace transforms. For
example, it is now common practice in electrical engineering to represent
inputs and outputs by Laplace transforms and to represent linear systems
by still other Laplace transforms called transfer functions.
The point is that the concepts of vector spaces and linear transformations
are common to all linear methods while matrix theory applies to only those
changes under a change of variables. With the more recent studies of Hilbert
spaces and other infinite dimensional vector spaces this approach has proved
to be inadequate. New techniques have been developed which depend less
on the choice or introduction of a coordinate system and not at all upon the
use of matrices. Fortunately, in most cases these techniques are simpler than
the older formalisms, and they are invariably clearer and more intuitive.
These newer techniques have long been known to the working mathe-
matician, but until very recently a curious inertia has kept them out of books
on linear algebra at the introductory level.
These newer techniques are admittedly more abstract than the older
formalisms, but they are not more difficult. Also, we should not identify
the word "concrete" with the word "useful." Linear algebra in this more
abstract form is just as useful as in the more concrete form, and in most cases
it is easier to see how it should be applied. A problem must be understood,
formulated in mathematical terms, and analyzed before any meaningful
computation is possible. If numerical results are required, a computational
procedure must be devised to give the results with sufficient accuracy and
reasonable efficiency. All the steps to the point where numerical results are
considered are best carried out symbolically. Even though the notation and
terminology of matrix theory is well suited for computation, it is not necessar-
ily the best notation for the preliminary work.
matrix theory we will seldom see any matrices at all. There will be symbols
standing for matrices, and these symbols will be carried through many steps
in reasoning and manipulation. Only occasionally or at the end will any
matrix be written out in full. This is so because the computational aspects
of matrices are burdensome and unnecessary during the early phases of work
on a problem. All we need is an algebra of rules for manipulating with them.
During this phase of the work it is better to use some concept closer to the
concept in the field of application and introduce matrices at the point where
practical calculations are needed.
An additional advantage in our studying linear algebra in its invariant
form is that there are important applications of linear algebra where the under-
lying vector spaces are infinite dimensional. In these cases matrix theory
must be supplanted by other techniques. A case in point is quantum mechanics
which requires the use of Hilbert spaces. The exposition of linear algebra
given in this book provides an easy introduction to the study of such spaces.
In addition to our concern with the beauty and logic of linear algebra in
this form we are equally concerned with its utility. Although some hints
of the applicability of linear algebra are given along with its development,
Chapter VI is devoted to a discussion of some of the more interesting and
representative applications.
chapter
I Vector spaces
In this chapter we introduce the concepts of a vector space and a basis for
that vector space. We assume that there is at least one basis with a finite
number of elements, and this assumption enables us to prove that the
vector space has a vast variety of different bases but that they all have the
same number of elements. This common number is called the dimension
of the vector space.
For each choice of a basis there is a one-to-one correspondence between
the elements of the vector space and a set of objects we shall call n-tuples.
A different choice for a basis will lead to a different correspondence between
the vectors and the n-tuples. We regard the vectors as the fundamental
objects under consideration and the n-tuples as representations of the vectors.
Thus, how a particular vector is represented depends on the choice of the
basis, and these representations are non-invariant. We call the n-tuple the
coordinates of the vector it represents; each basis determines a coordinate
system.
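A minimal numerical sketch of this correspondence, written in Python with numpy (the library and every name below are illustrative choices of mine, not the book's): it computes the coordinate n-tuple of one vector of R3 relative to two different bases and shows that the same vector is represented by different n-tuples.

    # Coordinates of one vector relative to two bases of R^3.
    # Illustrative sketch (assumes numpy); not from the original text.
    import numpy as np

    v = np.array([2.0, 3.0, 5.0])

    basis_A = np.eye(3)                         # the standard basis
    basis_B = np.array([[1.0, 0.0, 0.0],
                        [1.0, 1.0, 0.0],
                        [1.0, 1.0, 1.0]])       # columns are (1,1,1), (0,1,1), (0,0,1)

    # Coordinates: solve (basis matrix) @ coords = v.
    coords_A = np.linalg.solve(basis_A, v)
    coords_B = np.linalg.solve(basis_B, v)

    print(coords_A)            # [2. 3. 5.]  -- the familiar representation
    print(coords_B)            # [2. 1. 2.]  -- a different 3-tuple for the same vector
    print(basis_B @ coords_B)  # reconstructs [2. 3. 5.]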
We then introduce the concept of subspace of a vector space and develop
the algebra of subspaces. Under the assumption that the vector space is
finite dimensional, we prove that each subspace has a basis and that for
each basis of the subspace there is a basis of the vector space which includes
the basis of the subspace as a subset.
1 | Definitions
To deal with the concepts that are introduced we adopt some notational
conventions that are commonly used. We usually use sans-serif italic letters
to denote sets.
A set will often be specified by listing the elements in the set or by giving
a property which characterizes the elements of the set. In such cases we
use braces: {α, β} is the set containing just the elements α and β, {α | P}
is the set of all α with property P, {α_μ | μ ∈ M} denotes the set of all α_μ
corresponding to μ in the index set M. We have such frequent use for the set
of all integers or a subset of the set of all integers as an index set that we
adopt a special convention for these cases. {α_i} denotes a set of elements
indexed by a subset of the set of integers. Usually the same index set is used
over and over. In such cases it is not necessary to repeat the specifications
of the index set and often designation of the index set will be omitted.
Where clarity requires it, the index set will be specified. We are careful to
distinguish between the set {α_i} and an element α_i of that set.
(a + b)c = ac + bc.
The elements of F are called scalars, and will generally be denoted by lower
case Latin italic letters.
The rational numbers, real numbers, and complex numbers are familiar
and important examples of fields, but they do not exhaust the possibilities.
As a less familiar example, consider the set {0, 1} where addition is defined
by the rules: 0 + 0 = 1 + 1 = 0, 0 + 1 = 1; and multiplication is defined
by the rules: 0 · 0 = 0 · 1 = 0, 1 · 1 = 1. This field has but two elements,
and there are other fields with finitely many elements.
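A quick computational check of this two-element field, as an illustrative sketch in Python (the code and its names are mine, not the book's): it encodes the addition and multiplication tables just given and verifies the distributive law (a + b)c = ac + bc for every choice of elements.

    # Arithmetic in the field {0, 1}; illustrative sketch, not from the text.
    def add(a, b):
        return (a + b) % 2        # 0 + 0 = 1 + 1 = 0,  0 + 1 = 1

    def mul(a, b):
        return (a * b) % 2        # 0 * 0 = 0 * 1 = 0,  1 * 1 = 1

    elements = [0, 1]
    for a in elements:
        for b in elements:
            for c in elements:
                assert mul(add(a, b), c) == add(mul(a, c), mul(b, c))
    print("distributive law holds in the two-element field")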
We do not develop the various properties of abstract fields and we are
not concerned with any specific field other than the rational numbers, the
real numbers, and the complex numbers. We find it convenient and desirable
at the moment to leave the exact nature of the field of scalars unspecified
because much of the theory of vector spaces and matrices is valid for arbitrary
fields.
The student unacquainted with the theory of abstract fields will not be
handicapped for it will be sufficient to think of F as being one of the familiar
fields. All that matters is that we can perform the operations of addition and
B5. 1 · α = α (where 1 ∈ F).
The vector space axioms concerning addition alone have already appeared
in the definition of a field. A set of elements satisfying the first four axioms
is called a group. If the set of elements also satisfies A5 it is called a com-
mutative group or abelian group. Thus both fields and vector spaces are
abelian groups under addition. The theory of groups is well developed and
our subsequent discussion would be greatly simplified if we were to assume
a prior knowledge of the theory of groups. We do not assume a prior
knowledge of the theory of groups; therefore, we have to develop some of
their elementary properties as we go along, although we do not stop to point
out that what was proved is properly a part of group theory. Except for
specific applications in Chapter VI we do no more than use the term "group"
to denote a set of elements satisfying these conditions.
First, we give some examples of vector spaces. Any notation other than
"F" for a field and "V" for a vector space is used consistently in the same
way throughout the rest of the book, and these examples serve as definitions
for these notations:
(1) Let F be any field and let V = P be the set of all polynomials in an
indeterminate x with coefficients in F. Vector addition is defined to be the
ordinary addition of polynomials, and multiplication is defined to be the
ordinary multiplication of a polynomial by an element of F.
(2) For any positive integer n, let P_n be the set of all polynomials in x
with coefficients in F of degree ≤ n − 1, together with the zero polynomial.
The operations are defined as in Example (1).
(3) Let F = R, the field of real numbers, and take V to be the set of all
(f + g)(x) = f(x) + g(x),
(af)(x) = a[f(x)].
"
1 I
Definitions
(8) Let F = R, and let V = Rn be the set of all real ordered n-tuples.
We call this vector space the n-dimensional real coordinate space or the
real affine n-space. (The name "Euclidean n-space" is sometimes used, but
that term should be reserved for an affine n-space in which distance is defined.)
(9) Let Fn be the set of all n-tuples of elements of F. Vector addition and
scalar multiplication are defined by the rules (1.2). We call this vector
space an n-dimensional coordinate space.
An immediate consequence of the axioms defining a vector space is that
the zero vector, whose existence is asserted in A3, and the negative vector,
EXERCISES
1. What theorems must be proved in each of the Examples (4), (5), (6), and
(7) to verify A1? To verify B1? (These axioms are usually the ones which require
most specific verification. For example, if we establish that the vector space
described in Example (3) satisfies all the axioms of a vector space, then A1 and B1
are the only ones that must be verified for Examples (4), (5), (6), and (7). Why?)
5. Let P + be the set of polynomials with real coefficients and positive constant
term. Is P+ a vector space ? Why ?
6. Show that if aα = 0 and a ≠ 0, then α = 0. (Hint: Use axiom F9 for fields.)
space R4. Determine
(a) α + β.
(b) α − β.
(c) 3α.
(d) 2α + 3β.
10. Show that any field can be considered to be a vector space over itself.
11. Show that the real numbers can be considered to be a vector space over the
rational numbers.
12. Show that the complex numbers can be considered to be a vector space over
the real numbers.
13. Prove the uniqueness of the zero vector and the uniqueness of the negative
of each vector without using the commutative law, A5.
Because of the associative law for vector addition, we can omit the
parentheses from expressions like a_1α_1 + (a_2α_2 + a_3α_3) = (a_1α_1 + a_2α_2) + a_3α_3.
Sums of this kind can be extended to any number of
such terms provided that only a finite number of coefficients are different
from zero. Thus, whenever we write an expression like Σ_i a_iα_i (in which we
do not specify the range of summation), it will be assumed, tacitly if not
explicitly, that the expression contains only a finite number of non-zero
coefficients.
If β = Σ_i a_iα_i, we say that β is a linear combination of the α_i. We also
say that β is linearly dependent on the α_i if it can be expressed as a linear
combination of the α_i. An expression of the form Σ_i a_iα_i = 0 is called a
linear relation among the α_i. A relation with all a_i = 0 is called a trivial
linear relation; a relation in which at least one coefficient is non-zero is
It should be noted that any set of vectors that includes the zero vector is
linearly dependent. On the other hand, a single non-zero vector α forms a linearly
independent set, for if aα = 0 with a ≠ 0 then α = a^{-1}(aα) = a^{-1}·0
= 0. Notice also that the empty set is linearly independent.
relation is not unique, but that is usually incidental.) There is at least one
non-zero coefficient; let a_k be non-zero. Then α_k = Σ_{i≠k} (−a_k^{-1}a_i)α_i; that is,
one of the vectors of the set {α_i} is a linear combination of the others. (2) If a
set {α_i} of vectors is known to be linearly independent and a linear relation
We shall use the symbol ⟨α_1, . . . , α_n⟩ instead since it is simpler and there is
no danger of ambiguity.)
proof. This is merely Theorem 2.2 in contrapositive form and stated in
new notation.
Theorem 2.4. If B and C are any subsets such that B ⊂ ⟨C⟩, then ⟨B⟩ ⊂ ⟨C⟩.
proof. Set A = ⟨B⟩ in Theorem 2.1. Then B ⊂ ⟨C⟩ implies that ⟨B⟩ =
A ⊂ ⟨C⟩. □
successively replace the α_i by the β_i, obtaining at each step a new n-element
set that spans V. Thus, suppose that A_k = {β_1, . . . , β_k, α_{k+1}, . . . , α_n} is an
n-element set that spans V. (Our starting point, the hypothesis of the
theorem, is the case k = 0.) Since A_k spans V, β_{k+1} is dependent on A_k.
we may assume that it is α_{k+1}. By Theorem 2.5 the set A_{k+1} = {β_1, . . .
EXERCISES
1. In the vector space P of Example (1) let p_1(x) = x^2 + x + 1, p_2(x) =
. . . . Determine whether
{p_1(x), p_2(x), p_3(x), p_4(x)} is linearly independent. If the set is
linearly dependent,
2. Determine ⟨{p_1(x), p_2(x), p_3(x), p_4(x)}⟩, where the p_i(x) are the same poly-
cannot list all its elements. What is required is a description; for example, "all
polynomials of a certain degree or less," "all polynomials with certain kinds of
coefficients," etc.)
linearly independent set. In this definition the emphasis is on the concept of set
inclusion and not on the number of elements in a set. In particular, the definition
allows the possibility that two different maximal linearly independent sets might
have different numbers of elements. Find all the maximal linearly independent
subsets of the set given in Exercise 1. How many elements are in each of them?
4. Show that no finite set spans P; that is, show that there is no maximal finite
5. In Example (2) for n = 4, find a spanning set for P4 . Find a minimal spanning
set. Use Theorem 2.7 to show that no other spanning set has fewer elements.
7. In Example (1) show that the set of all polynomials divisible by x - 1 cannot
span P.
8. Determine which of the following sets in R4 are linearly independent over R.
9. Show that {(1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1)} is linearly independent in Fn over F.
10. In Exercise 11 of Section 1 it was shown that we may consider the real
numbers to be a vector space over the rational numbers. Show that {1, √2} is a
linearly independent set over the rationals. (This is equivalent to showing that
√2 is irrational.) Using this result show that {1, √2, √3} is linearly independent.
11. Show that if one vector of a set is the zero vector, then the set is linearly
dependent.
12. Show that if an indexed set of vectors has one vector listed twice, the set is
linearly dependent.
14. Show that if a set S is linearly independent, then every subset of S is linearly
independent.
already observed that this set is linearly independent, and it clearly spans
the space of all polynomials. The space P_n has a basis with a finite number of
elements: {1, x, x^2, . . . , x^(n-1)}.
The vector spaces in Examples (3), (4), (5), (6), and (7) do not have bases
with a finite number of elements.
In Example (8) every Rn has a finite basis consisting of {α_i | α_i = (δ_1i,
δ_2i, . . . , δ_ni)}. (Here δ_ij is the useful symbol known as the Kronecker delta.
Theorem 3.1. If a vector space has one basis with a finite number of
elements, then all other bases are finite and have the same number of elements.
proof. Let A be a basis with a finite number n of elements, and let B be
any other basis. Since A spans V and B is linearly independent, by Theorem
2.7 the number m of elements in B must be at most n. This shows that B is
finite and m ≤ n. But then the roles of A and B can be interchanged to
obtain the inequality in the other order so that m = n. □
zero. There are very interesting vector spaces with infinite bases ; for example,
P of Example (1). Moreover, many of the theorems and proofs we give are
also valid for infinite dimensional vector spaces. It is not our intention,
however, to deal with infinite dimensional vector spaces as such, and when-
ever we speak of the dimension of a vector space without specifying whether
it is finite or infinite dimensional we mean that the dimension is finite.
We have already seen that the four vectors {α_1 = (1, 1, 0), α_2 = (1, 0, 1),
is 3-dimensional we see that this must be expected for any set containing
at least four vectors from R3. The next theorem shows that each subset of
three is a basis.
Any non-trivial relation that exists must contain a with a non-zero coefficient,
for if that coefficient were zero the relation would amount to a relation in A.
Thus a is dependent on A. Hence A spans V and is a basis.
proof. The "only if" is part of the definition of a basis. If n vectors did
span V and were linearly dependent, then (by Theorem 2.5) a proper subset
would also span V, contrary to Theorem 2.7. □
spanning set. This idea is made explicit in the next two theorems.
pendent on A. Then because of Theorem 2.1 the set A must also span V and
it is a basis.
the theorem would assert that every vector space has a basis since every
vector space is spanned by itself. Discussion of such a theorem is beyond the
aims of this treatment of the subject of vector spaces.
It is often used in the following way. A non-zero vector with a certain desired
property is selected. Since the vector is non-zero, the set consisting of that
vector alone is a linearly independent set. An application of Theorem 3.6
shows that there is a basis containing that vector. This is usually the first step
of a proof by induction in which a basis is obtained for which all the vectors
in the basis have the desired property.
Let A = {α_1, . . . , α_n} be an arbitrary basis of V, a vector space of dimension
n over the field F. Let α be any vector in V. Since A is a spanning set, α can be
represented as a linear combination of the form α = Σ_{i=1}^n a_iα_i. Since A is
linearly independent this representation is unique, that is, the coefficients a_i
are uniquely determined by α (for the given basis A). On the other hand,
for each n-tuple (a_1, . . . , a_n) there is a vector in V of the form Σ_{i=1}^n a_iα_i.
sets which are isomorphic differ in details which are not related to their inter-
nal structure. They are essentially the same. Furthermore, since two sets
isomorphic to a third are isomorphic to each other we see that all n-dimen-
sional vector spaces over the same field of scalars are isomorphic.
The set of n-tuples together with the rules for addition and scalar multi-
plication forms a vector space in its own right. However, when a basis is chosen
in an abstract vector space V the correspondence described above establishes
an isomorphism between V and Fn. In this context we consider Fn to be a
representation of V. Because of the existence of this isomorphism a study
of vector spaces could be confined to a study of coordinate spaces. However,
the exact nature of the correspondence between V and Fn depends upon the
choice of a basis in V. If another basis were chosen in V a correspondence
between the α ∈ V and the n-tuples would exist as before, but the correspond-
ence would be quite different. We choose to regard the vector space V and the
vectors in V as the basic concepts and their representation by n-tuples as a tool
for computation and convenience. There are two important benefits from
this viewpoint. Since we are free to choose the basis we can try to choose a
coordinatization for which the computations are particularly simple or for
which some fact that we wish to demonstrate is particularly evident. In
fact, the choice of a basis and the consequences of a change in basis is the
central theme of matrix theory. In addition, this distinction between a
vector and its representation removes the confusion that always occurs when
we define a vector as an n-tuple and then use another n-tuple to represent it.
Only the most elementary types of calculations can be carried out in the
abstract. Elaborate or complicated calculations usually require the intro-
duction of a representing coordinate space. In particular, this will be re-
quired extensively in the exercises in this text. But the introduction of
coordinates can result in confusions that are difficult to clarify without ex-
tensive verbal description or awkward notation. Since we wish to avoid
cumbersome notation and keep descriptive material at a minimum in the
exercises, it is helpful to spend some time clarifying conventional notations
and circumlocutions that will appear in the exercises.
The introduction of a coordinate representation for V involves the selection
of a basis {α_1, . . . , α_n} for V. With this choice α_1 is represented by (1, 0, . . . , 0),
Such short-cuts may be disgracefully inexact, but they are so common that
we must learn how to interpret them.
For example, let V be a two-dimensional vector space over R. Let A =
{α_1, α_2} be the selected basis. If β_1 = α_1 + α_2 and β_2 = −α_1 + α_2, then
B = {β_1, β_2} is also a basis of V. With the convention discussed above we
would identify α_1 with (1, 0), α_2 with (0, 1), β_1 with (1, 1), and β_2 with
(−1, 1). Thus, we would refer to the basis B = {(1, 1), (−1, 1)}. Since
α_1 = (1/2)β_1 − (1/2)β_2, α_1 has the representation (1/2, −1/2) with respect to the basis B.
If we are not careful we can end up by saying that "(1, 0) is represented by
(1/2, −1/2)."
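The danger described above is easy to check mechanically. A small illustrative sketch in Python with numpy (all names are mine, not the book's): it solves for the coordinates of α_1 = (1, 0) relative to the basis B = {(1, 1), (−1, 1)} and recovers the representation (1/2, −1/2).

    # Coordinates of (1, 0) relative to the basis B = {(1, 1), (-1, 1)}.
    # Illustrative sketch (assumes numpy); not from the original text.
    import numpy as np

    B = np.array([[1.0, -1.0],    # columns are the basis vectors (1, 1) and (-1, 1)
                  [1.0,  1.0]])
    alpha1 = np.array([1.0, 0.0])

    coords = np.linalg.solve(B, alpha1)
    print(coords)                 # [ 0.5 -0.5]: the representation (1/2, -1/2)
    print(B @ coords)             # reconstructs (1, 0)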
EXERCISES
To show that a given set is a basis by direct appeal to the definition means
that we must show the set is linearly independent and that it spans V. In any
given situation, however, the task is very much simpler. Since V is ^-dimensional
a proposed basis must have n elements. Whether this is the case can be told at a
glance. In view of Theorems 3.3 and 3.4 if a set has n elements, to show that it is a
basis it suffices to show either that it spans V or that it is linearly independent.
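A numerical check of this kind of statement is also possible, although it is not a substitute for the hand computations the exercises ask for. An illustrative sketch in Python with numpy (my own code, not the book's): it verifies that the set of Exercise 1 below has rank 3, and is therefore linearly independent and a basis of R3.

    # Checking that {(1,1,0), (1,0,1), (0,1,1)} is a basis of R^3.
    # Illustrative sketch (assumes numpy); not from the original text.
    import numpy as np

    vectors = np.array([[1.0, 1.0, 0.0],
                        [1.0, 0.0, 1.0],
                        [0.0, 1.0, 1.0]])

    # Three vectors in a 3-dimensional space form a basis exactly when the
    # matrix having them as rows has rank 3.
    print(np.linalg.matrix_rank(vectors) == 3)    # True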
1. In R 3 show that {(1, 1,0), (1, 0, 1), (0, 1, 1)} is a basis by showing that it is
linearly independent.
2. Show that {(1,1, 0), (1,0, 1), (0, 1, 1)} is a basis by showing that <(1, 1, 0),
(1, 0, 1), (0, 1, 1)> contains (1, 0, 0), (0, 1,0) and (0, 0, 1). Why does this suffice?
3. In R4 let A = {(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 1, 0), (0, 1, 0, −1)} be a basis
(is it?) and let B = {(1, 2, −1, 1), (0, 1, 2, −1)} be a linearly independent set
(is it?). Extend B to a basis of R4. (There are many ways to extend B to a basis.
It is intended here that the student carry out the steps of the proof of Theorem 3.6
for this particular case.)
4. Find a basis of R 4 containing the vector (1, 2, 3, 4). (This is another even
simpler application of the proof of Theorem 3.6. This, however, is one of the most
important applications of this theorem, to find a basis containing a particular
vector.)
4 | Subspaces
Definition. A subspace W of a vector space V is a non-empty subset of V
which is itself a vector space with respect to the operations of addition and
scalar multiplication defined in V. In particular, the subspace must be a
vector space over the same field F.
The first problem that must be settled is the problem of determining the
conditions under which a subset W is in fact a subspace. It should be clear
that axioms A2, A5, B2, B3, B4, and B5 need not be checked as they are valid
in any subset of V. The most innocuous conditions seem to be A1 and B1,
but it is precisely these conditions that must be checked. If B1 holds for a
non-empty subset W, there is an α ∈ W so that 0α = 0 ∈ W. Also, for each
α ∈ W, (−1)α = −α ∈ W. Thus A3 and A4 follow from B1 in any non-
empty subset of a vector space and it is sufficient to check that W is non-
empty and closed under addition and scalar multiplication.
The two closure conditions can be combined into one statement: if
α, β ∈ W and a, b ∈ F, then aα + bβ ∈ W. This may seem to be a small
change, but it is a very convenient form of the conditions. It is also equivalent
to the statement that all linear combinations of elements in W are also in W;
that is, ⟨W⟩ = W.
It follows directly from this statement that for any
sufficient to show that the sum of two polynomials which vanish at the
x_i also vanishes at the x_i, and that the product of a scalar and a polynomial
which vanishes at the x_i also vanishes at the x_i. Similar subspaces can be defined in examples (3), (4), (5), (6),
and (7).
aα ∈ W_1 and aα ∈ W_2.
of V.
proof. If α = α_1 + α_2 ∈ W_1 + W_2, β = β_1 + β_2 ∈ W_1 + W_2, and a,
b ∈ F, then aα + bβ = a(α_1 + α_2) + b(β_1 + β_2) = (aα_1 + bβ_1) + (aα_2 + bβ_2) ∈
W_1 + W_2. □
Theorem 4.4. W_1 + W_2 is the smallest subspace containing both W_1 and
W_2. If A_1 spans W_1 and A_2 spans W_2, then A_1 ∪ A_2 spans W_1 + W_2.
α_1 ∈ W_1 and α_2 ∈ W_2. Then α_1 ∈ W_1 ⊂ ⟨W_1 ∪ W_2⟩ and α_2 ∈ W_2 ⊂
⟨W_1 ∪ W_2⟩. Since ⟨W_1 ∪ W_2⟩ is a subspace, α = α_1 + α_2 ∈ ⟨W_1 ∪ W_2⟩. Thus
W_1 + W_2 = ⟨W_1 ∪ W_2⟩.
W_1 = ⟨A_1⟩ ⊂ ⟨A_1 ∪ A_2⟩ and W_2 = ⟨A_2⟩ ⊂ ⟨A_1 ∪ A_2⟩, so that W_1 ∪ W_2 ⊂ ⟨A_1 ∪ A_2⟩ ⊂
relation of the form Σ_{i=1}^{k+1} a_iα_i = 0 we could not have a_{k+1} ≠ 0, for then
α_{k+1} ∈ ⟨α_1, . . . , α_k⟩. But then the relation reduces to the form Σ_{i=1}^k a_iα_i = 0.
{α_1, . . . , α_r, γ_1, . . . , γ_t} of W_2. It is clear that the set {α_1, . . . , α_r,
β_1, . . . , β_s, γ_1, . . . , γ_t} spans W_1 + W_2.
b_j = 0
for all j. By a symmetric argument we see that all c_k = 0. Finally,
this means that Σ_i a_iα_i = 0, from which it follows that all a_i = 0. This
shows that the spanning set {α_1, . . . , α_r, β_1, . . . , β_s, γ_1, . . . , γ_t} is linearly
a(1, 0, 2) + b(1, 2, 2) = c(1, 1, 0) + d(0, 1, 1). This leads to the three equations:
a + b = c
2b = c + d
2a + 2b = d.
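The same system can be solved mechanically. An illustrative sketch in Python with numpy (the code and names are mine, not the book's): it finds a non-trivial solution (a, b, c, d) of the homogeneous form of the system above and exhibits the resulting vector, which lies in both subspaces.

    # Intersection of <(1,0,2), (1,2,2)> and <(1,1,0), (0,1,1)> in R^3.
    # Illustrative sketch (assumes numpy); not from the original text.
    import numpy as np

    # Columns: spanning vectors of W1, then the negatives of those of W2.
    M = np.column_stack([(1, 0, 2), (1, 2, 2), (-1, -1, 0), (0, -1, -1)]).astype(float)

    # M has rank 3, so its null space is one-dimensional; the last right-singular
    # vector spans it and gives coefficients (a, b, c, d).
    _, _, vt = np.linalg.svd(M)
    a, b, c, d = vt[-1]

    print(a * np.array([1, 0, 2]) + b * np.array([1, 2, 2]))   # a vector in W1 ∩ W2
    print(c * np.array([1, 1, 0]) + d * np.array([0, 1, 1]))   # the same vector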
and W_2 = ⟨(0, 0, 1, 0), (0, 0, 0, 1)⟩ are each 2-dimensional and W_1 ∩ W_2 = {0},
W_1 + W_2 = R4.
Those cases in which dim (W_1 ∩ W_2) = 0 deserve special mention. If
W_1 ∩ W_2 = {0} we say that the sum W_1 + W_2 is direct: W_1 + W_2 is a
direct sum of W_1 and W_2. To indicate that a sum is direct we use the notation
W_1 ⊕ W_2. For α ∈ W_1 ⊕ W_2 there exist α_1 ∈ W_1 and α_2 ∈ W_2 such that
α = α_1 + α_2. This much is true for any sum of two subspaces. If the sum is
direct, however, α_1 and α_2 are uniquely determined by α. For if α = α_1 + α_2 =
α_1′ + α_2′, then α_1 − α_1′ = α_2′ − α_2. Since the left side is in W_1 and the
right side is in W_2, both are in W_1 ∩ W_2. But this means that α_1 − α_1′ = 0 and
α_2′ − α_2 = 0. If V = W_1 ⊕ W_2 we say
that W_1 and W_2 are complementary and that W_2 is a complementary subspace
of W_1, or a complement of W_1.
The notion of a direct sum can be extended to a sum of any finite number
of subspaces. The sum W_1 + · · · + W_k is said to be direct if for each i,
W_i ∩ (Σ_{j≠i} W_j) = {0}. If the sum of several subspaces is direct, we use the
notation W_1 ⊕ W_2 ⊕ · · · ⊕ W_k. In this case, too, α ∈ W_1 ⊕ · · · ⊕ W_k
can be expressed uniquely in the form α = Σ_i α_i, α_i ∈ W_i.
Theorem 4.9. If W is a subspace of V there exists a subspace W′ such that
V = W ⊕ W′.
proof. Let {α_1, . . . , α_m} be a basis of W. Extend this linearly inde-
dim (W_1 ⊕ · · · ⊕ W_k) = dim W_1 + · · · + dim W_k.
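The proof's basis-extension argument, and the dimension count just stated, can be imitated numerically. An illustrative sketch in Python with numpy (my own code and names): starting from a basis of a subspace W of R4 it adjoins standard unit vectors that keep the set independent, producing a complement W′ with dim W + dim W′ = 4.

    # Building a complement of W = <(1,1,0,0), (0,0,1,1)> in R^4 by basis extension.
    # Illustrative sketch (assumes numpy); not from the original text.
    import numpy as np

    basis_W = [np.array([1.0, 1.0, 0.0, 0.0]),
               np.array([0.0, 0.0, 1.0, 1.0])]

    extension = []
    current = list(basis_W)
    for e in np.eye(4):                      # try the standard unit vectors in turn
        candidate = current + [e]
        if np.linalg.matrix_rank(np.array(candidate)) == len(candidate):
            current = candidate              # e is independent of what we have so far
            extension.append(e)

    print(len(basis_W) + len(extension))               # 4 = dim W + dim W'
    print(np.linalg.matrix_rank(np.array(current)))    # 4: the combined set is a basis of R^4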
EXERCISES
1 Let P be the space of all polynomials with real coefficients. Determine which
of the following subsets of P are subspaces.
(a) {p(x) | p(1) = 0}.
(b) {p(x) | constant term of p(x) = 0}.
(c) {p(x) | degree of p(x) = 3}.
(d) {p(x) | degree of p(x) ≤ 3}.
(Strictly speaking, the zero polynomial does not have a degree associated with it.
It is sometimes convenient to agree that the zero polynomial has degree less than
any integer, positive or negative. With this convention the zero polynomial is
included in the set described above, and it is not necessary to add a separate
comment to include it.)
(e) {p(x) | degree of p(x) is even} ∪ {0}.
2. Determine which of the following subsets of Rn are subspaces.
(a) {(x_1, x_2, . . . , x_n) | x_1 = 0}.
(c) {(x_1, x_2, . . . , x_n) | x_1 + 2x_2 = 0}.
are constants}.
(g) {(x_1, x_2, . . . , x_n) | x_1 = x_2 = · · · = x_n}.
3. What is the essential difference between the condition used to define the
subset in (c) of Exercise 2 and the condition used in (d)? Is the lack of a non-zero
4. What is the essential difference between the condition used to define the
subset in (c) of Exercise 2 and the condition used in (e)? What, in general, are the
differences between the conditions in (a), (c), and (g) and those in (b), (e), and (f)?
5. Show that {(1, 1,0, 0), (1, 0, 1, 1)} and {(2, -1, 3, 3), (0, 1, -1, -1)} span
the same subspace of R4 .
6. Let W be the subspace of R5 spanned by {(1, 1, 1, 1, 1), (1, 0, 1, 0, 1),
(0, 1, 1, 1, 0), (2, 0, 0, 1, 1), (2, 1, 1, 2, 1), (1, −1, −1, −2, 2), (1, 2, 3, 4, −1)}.
Find a basis for W and the dimension of W.
7. Show that {(1, -1,2, -3), (1, 1, 2, 0), (3, -1,6, -6)} and {(1,0,1,0),
(0, 2, 0, 3)} do not span the same subspace.
8. Let W_1 = ⟨(1, 2, 3, 6), (4, −1, 3, 6), (5, 1, 6, 12)⟩ and W_2 = ⟨(1, −1, 1, 1),
(2, −1, 4, 5)⟩ be subspaces of R4. Find bases for W_1 ∩ W_2 and W_1 + W_2. Extend
the basis of W_1 ∩ W_2 to a basis of W_1, and extend the basis of W_1 ∩ W_2 to a basis
of W_2. From these bases obtain a basis of W_1 + W_2.
9. Let P be the space of all polynomials with real coefficients, and let W_1 =
{p(x) | p(1) = 0} and W_2 = {p(x) | p(1) = 0}. Determine W_1 ∩ W_2 and W_1 + W_2.
(These spaces are infinite dimensional and the student is not expected to find
bases for these subspaces. What is expected is a simple criterion or description
of these subspaces.)
10. We have already seen (Section 1, Exercise 11) that the real numbers form
a vector space over the rationals. Show that {1, √2} and {1 − √2, 1 + √2}
span the same subspace.
11. Show that if W_1 and W_2 are subspaces, then W_1 ∪ W_2 is not a subspace
unless one is a subspace of the other.
3x_1 − 2x_2 − x_3 − 4x_4 = 0
is a subspace of R4. Find a basis for this subspace. (Hint: Solve the equations for
x_1 and x_2 in terms of x_3 and x_4. Then specify various values for x_3 and x_4 to obtain
as many linearly independent vectors as are needed.)
13. Let S, T, and T* be three subspaces of V (of finite dimension) for which
(a) S ∩ T = S ∩ T*, (b) S + T = S + T*, (c) T ⊂ T*. Show that T = T*.
15. If V = W_1 ⊕ W_2 and W is any subspace of V such that W_1 ⊂ W, show that
W = (W ∩ W_1) ⊕ (W ∩ W_2). Show by an example that the condition W_1 ⊂ W
(or W_2 ⊂ W) is necessary.
chapter
II Linear
transformations
and matrices
the Hermite normal form. The Hermite normal form is one of our most
important and effective computational tools, far exceeding in utility its
application to the study of this particular equivalence relation.
The pattern we have described is worth conscious notice since it is re-
current and the principal underlying theme in this exposition of matrix
theory. We define a concept, find a representation suitable for effective
computation, change bases to see how this change affects the representation,
and then seek a normal form in each class of equivalent representations.
1 | Linear Transformations
Let U and V be vector spaces over the same field of scalars F.
then any vector α ∈ U such that σ(α) = ᾱ is called an inverse image of ᾱ.
The set of all α ∈ U such that σ(α) = ᾱ is called the complete inverse image
of ᾱ, and it is denoted by σ^{-1}(ᾱ). Generally, σ^{-1}(ᾱ) need not be a single
element as there may be more than one α ∈ U such that σ(α) = ᾱ.
By taking particular choices for a and b we see that for a linear trans-
formation σ(α + β) = σ(α) + σ(β) and σ(aα) = aσ(α). Loosely speaking,
the image of the sum is the sum of the images and the image of the product
isthe product of the images. This descriptive language has to be interpreted
generously since the operations before and after applying the linear trans-
formation may take place in different vector spaces. Furthermore, the remark
about scalar multiplication is inexact since we do not apply the linear trans-
formation to scalars; the linear transformation is defined only for vectors
in U. Even so, the linear transformation does preserve the structural
operations in a vector space and this is the reason for its importance. Gener-
ally, in algebra a structure-preserving mapping is called a homomorphism.
To describe the special role of the elements of F in the condition σ(aα) =
aσ(α), we say that a linear transformation is a homomorphism over F, or an
F-homomorphism.
If for α ≠ β it necessarily follows that σ(α) ≠ σ(β), the homomorphism σ
is said to be one-to-one and it is called a monomorphism. If A is any subset of
also a zero mapping of U into W, and this mapping has the same effect on the
elements of U as the zero mapping of U into V. However, they are different
linear transformations since they have different codomains. This may seem
like an unnecessarily fine distinction. Actually, for most of this book we
could get along without this degree of precision. But the more deeply we go
into linear algebra the more such precision is needed. In this book we need
this much care when we discuss dual spaces and dual transformations in
Chapter IV.
A homomorphism that is both an epimorphism and a monomorphism is
called an isomorphism. If ᾱ ∈ V, the fact that σ is an epimorphism says that
there is an α ∈ U such that σ(α) = ᾱ. The fact that σ is a monomorphism says
that this α is unique. Thus, for an isomorphism, we can define an inverse
mapping σ^{-1} that maps ᾱ onto α.
Theorem 1.1. The inverse σ^{-1} of an isomorphism is also an isomorphism.
proof. Since σ^{-1} is obviously one-to-one and onto, it is necessary only
to show that it is linear. If ᾱ = σ(α) and β̄ = σ(β), then σ(aα + bβ) =
aᾱ + bβ̄ so that σ^{-1}(aᾱ + bβ̄) = aα + bβ = aσ^{-1}(ᾱ) + bσ^{-1}(β̄).
For the inverse isomorphism σ^{-1}(ᾱ) is an element of U. This conflicts with
the previously given definition of σ^{-1}(ᾱ) as a complete inverse image, in which
σ^{-1}(ᾱ) is a subset of U. However, the symbol σ^{-1}, standing alone, will
always be used to denote an isomorphism, and in this case there is no diffi-
culty caused by the fact that σ^{-1}(ᾱ) might denote either an element or a one-
element set.
proved about the derivative is that it is linear: d(α + β)/dx = dα/dx + dβ/dx
and d(aα)/dx = a(dα/dx). The mapping τ(α) = Σ_{i=0}^n (a_i/(i + 1)) x^{i+1} is also linear. Notice
that this is not the indefinite integral since we have specified that the constant
of integration shall be zero. Notice that σ is onto but not one-to-one and τ
is one-to-one but not onto.
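On the finite dimensional spaces P_n these two mappings become matrices acting on coefficient vectors, and the contrast between them is then visible by direct computation. An illustrative sketch in Python with numpy (my own code, not the book's; here differentiation is taken as a map from P_4, polynomials of degree at most 3, into P_3): composing in one order gives the identity, in the other order it does not.

    # Differentiation and "integration with zero constant" on coefficient vectors.
    # Coefficients are listed (a0, a1, a2, a3) for a0 + a1 x + a2 x^2 + a3 x^3.
    # Illustrative sketch (assumes numpy); not from the original text.
    import numpy as np

    # D : P4 -> P3,  (a0, a1, a2, a3) |-> (a1, 2 a2, 3 a3)
    D = np.array([[0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 2.0, 0.0],
                  [0.0, 0.0, 0.0, 3.0]])

    # T : P3 -> P4,  (b0, b1, b2) |-> (0, b0, b1/2, b2/3)
    T = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.5, 0.0],
                  [0.0, 0.0, 1.0 / 3.0]])

    print(np.allclose(D @ T, np.eye(3)))   # True: differentiating an integral gives the identity
    print(np.allclose(T @ D, np.eye(4)))   # False: the constant term is lost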
τ + σ.
For each linear transformation σ and a ∈ F define aσ by the rule (aσ)(α) = a[σ(α)].
Notice that if W ≠ U, then τσ is defined but στ is not. If all linear trans-
formations under consideration are mappings of a vector space U into itself,
then these linear transformations can be multiplied in any order. This means
that τσ and στ would both be defined, but it would not mean that τσ = στ.
The set of linear transformations of a vector space into itself is a vector
space, as we have already observed, and now we have defined a product
which satisfies the three conditions given above. Such a space is called an
associative algebra. In our case the algebra consists of linear transformations
and it is known as a linear algebra. However, the use of terms is always in
a state of flux, and today this term is used in a more inclusive sense. When
referring to a particular set with an algebraic structure, "linear algebra"
still denotes what we have just described. But when referring to an area of
study, the term "linear algebra" includes virtually every concept in which
linear transformations play a role, including linear transformations between
different vector spaces (in which the linear transformations cannot always
be multiplied), sequences of vector spaces, and even mappings of sets of
linear transformations (since they also have the structure of a vector space).
It follows from this corollary that σ(0) = 0, where 0 denotes the zero vector
of U and the zero vector of V. It is even easier, however, to show it directly.
Since σ(0) = σ(0 + 0) = σ(0) + σ(0) it follows from the uniqueness of the
For the rest of this book, unless specific comment is made, we assume
that all vector spaces under consideration are finite dimensional. Let
dim U = n and dim V = m.
The dimension of the subspace Im(σ) is called the rank of the linear trans-
formation σ. The rank of σ is denoted by ρ(σ).
Theorem 1.5. If W is a subspace of V, the set σ^{-1}(W) of all α ∈ U such that
σ(α) ∈ W is a subspace of U.
is, Σ_i c_iβ_i ∈ K(σ). In this case there exist coefficients d_i such that Σ_i c_iβ_i =
Σ_i d_iα_i. If any of these coefficients were non-zero we would have a non-
trivial relation among the elements of {α_1, . . . , α_ν, β_1, . . . , β_k}. Hence, all
c_i = 0 and {σ(β_1), . . . , σ(β_k)} is linearly independent. But then it is a basis
space R2. In this case, it is simplest and sufficiently accurate to think of σ
as the linear transformation which maps (a_1, a_2, a_3) ∈ R3 onto (a_1, a_2) ∈ R2.
Clearly, every point (0, 0, a_3) on the x_3-axis is mapped onto the origin.
Thus K(σ) is the x_3-axis, the line through the origin in the direction of the
projection, and {(0, 0, 1) = α_1} is a basis of K(σ). It should be evident
that any plane through the origin not containing K(σ) will be projected onto
the x_1x_2-plane and that this mapping is one-to-one and onto. Thus the com-
plementary subspace ⟨β_1, β_2⟩ can be taken to be any plane through the origin
not containing the x_3-axis. This illustrates the wide latitude of choice possible
for the complementary subspace ⟨β_1, . . . , β_p⟩.
ν(σ) = 0, σ is a monomorphism.
It is but a matter of reading the definitions to see that σ is an epimorphism
if and only if ρ(σ) = dim V.
Theorem 1.8. Let U and V have the same finite dimension n. A linear
Im(σ) into W so that for all α ∈ Im(σ), τ′(α) = τ(α). Then K(τ′) = Im(σ) ∩
K(τ) and ρ(τ′) = dim τ[Im(σ)] = dim τσ(U) = ρ(τσ). Then Theorem 1.6
{β_1, . . . , β_n} of V. Define τ(β_i) = β_i for i ≤ r and τ(β_i) = 0 for i > r.
σ(α) = Σ_{i=1}^n x_iσ(α_i) = Σ_{i=1}^n x_iβ_i
and define the values of σ on the other elements of the basis arbitrarily. This
will yield a linear transformation σ with the desired properties.
It should be clear that, if C is not already a basis, there are many ways to
define σ. It is worth pointing out that the independence of the set C is crucial
to proving the existence of the linear transformation with the desired prop-
erties.Otherwise, a linear relation among the elements of C would impose
a corresponding linear relation among the elements of D, which would mean
that D could not be arbitrary.
Theorem 1.17 establishes, for one thing, that linear transformations really
do exist. Moreover, they exist in abundance. The real utility of this theorem
and its corollary is that it enables us to establish the existence of a linear
transformation with some desirable property with great convenience. All
we have to do is to define this function on an independent set.
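In coordinate form this freedom is exercised as follows: once the images of a basis are prescribed, the representing matrix, and hence the value on every vector, is determined. An illustrative sketch in Python with numpy (the data and names are my own invention, not the book's): it prescribes images for the basis {(1, 2), (0, 1)} of R2 and computes the matrix of the resulting transformation with respect to the standard bases.

    # A linear transformation is determined by its values on a basis.
    # Illustrative sketch (assumes numpy); not from the original text.
    import numpy as np

    # Basis vectors of R^2 (as columns) and their prescribed images in R^3 (as columns).
    B      = np.column_stack([(1.0, 2.0), (0.0, 1.0)])
    images = np.column_stack([(1.0, 0.0, 1.0), (2.0, 1.0, 0.0)])

    # If A is the matrix of sigma in the standard bases, then A @ B = images.
    A = images @ np.linalg.inv(B)
    print(A)                            # the 3 x 2 matrix representing sigma
    print(A @ np.array([1.0, 2.0]))     # recovers (1, 0, 1)
    print(A @ np.array([0.0, 1.0]))     # recovers (2, 1, 0)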
Definition. A linear transformation π of V into itself with the property that
π² = π is called a projection.
Fig. 1
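A concrete instance of this definition, closely related to the projection of R3 discussed earlier in this section: an illustrative sketch in Python with numpy (my own code, not the book's), verifying the defining property π² = π for the matrix of the projection onto the x_1x_2-plane along the x_3-axis.

    # The projection of R^3 onto the x1x2-plane along the x3-axis satisfies P^2 = P.
    # Illustrative sketch (assumes numpy); not from the original text.
    import numpy as np

    P = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0]])

    print(np.allclose(P @ P, P))           # True: P is idempotent
    print(P @ np.array([4.0, -2.0, 7.0]))  # [ 4. -2.  0.]: the x3-component is annihilated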
EXERCISES
1. Show that σ((x_1, x_2)) = (x_2, x_1) defines a linear transformation of R2 into
itself.
a basis of σ(U). (Hint: Take particular values of the x_i to find a spanning set for
σ(U).) Find a basis of K(σ).
6. Let D denote the operator of differentiation,
D(y) = dy/dx, D²(y) = D[D(y)] = d²y/dx², etc.
Show that D^n is a linear transformation, and also that p(D) is a linear transforma-
tion if p(D) is a polynomial in D with constant coefficients. (Here we must assume
that the space of functions on which D is defined contains only functions differen-
tiable at least as often as the degree of p(D).)
σ(Σ_{i=0}^n a_i x^i) = Σ_{i=0}^n (a_i/(i + 1)) x^{i+1}
basis of U. Show that if the values {σ(α_1), . . . , σ(α_n)} are known, then the value
of σ(α) can be computed for each α ∈ U.
11. Let U and V be vector spaces of dimensions n and m, respectively, over the
same field F. We have already commented that the set of all linear transformations
of U into V forms a vector space. Give the details of the proof of this assertion.
Let A = {α_1, . . . , α_n} be a basis of U and B = {β_1, . . . , β_m} be a basis of V. Let σ_ij
be defined by
σ_ij(α_k) = 0 if k ≠ j,
σ_ij(α_k) = β_i if k = j.
For the following sequence of problems let dim U = n and dim V = m. Let σ
be a linear transformation of U into V and τ a linear transformation of V into W.
12. Show that ρ(σ) ≤ ρ(τσ) + ν(τ). (Hint: Let V′ = σ(U) and apply Theorem
1.6 to τ defined on V′.)
13. Show that max {0, ρ(σ) + ρ(τ) − m} ≤ ρ(τσ) ≤ min {ρ(τ), ρ(σ)}.
14. Show that max {n − m + ν(τ), ν(σ)} ≤ ν(τσ) ≤ min {n, ν(σ) + ν(τ)}. (For
m = n this inequality is known as Sylvester's law of nullity.)
15. Show that if ν(τ) = 0, then ρ(τσ) = ρ(σ).
16. It is not generally true that ν(σ) = 0 implies ρ(τσ) = ρ(τ). Construct an
example to illustrate this fact. (Hint: Let m be very large.)
2 | Matrices
Definition. A matrix over a field F is a rectangular array of scalars. The
array will be written in the form
[ a_11  a_12  · · ·  a_1n ]
[ a_21  a_22  · · ·  a_2n ]
[  .     .            .   ]
[ a_m1  a_m2  · · ·  a_mn ]     (2.1)
whenever we wish to display all the elements in the array or show the form
of the array. A matrix with m rows and n columns is called an m x n
matrix. An n × n matrix is said to be of order n.
We often abbreviate a matrix written in the form above to [a_ij] where
the first index denotes the number of the row and the second index denotes
the number of the column. The particular letter appearing in each index
position is immaterial; it is the position that is important. With this con-
vention a_ij is a scalar and [a_ij] is a matrix. Whereas the elements a_ij and a_kl
need not be equal, we consider the matrices [a_ij] and [a_kl] to be identical
since both [a_ij] and [a_kl] stand for the entire matrix. As a further convenience
we often use upper case Latin italic letters to denote matrices; A = [a_ij].
Whenever we use lower case Latin italic letters to denote the scalars appearing
in the matrix, we use the corresponding upper case Latin italic letter to denote
the matrix. The matrix in which all scalars are zero is denoted by 0 (the third
use of this symbol!). The a_ij appearing in the array [a_ij] are called the
elements of [a_ij]. Two matrices are equal if and only if they have exactly
the same elements. The main diagonal of the matrix [a_ij] is the set of elements
{a_11, . . . , a_ll} where l = min {m, n}. A diagonal matrix is a square matrix
= Σ_{j=1}^n x_j ( Σ_{i=1}^m a_ij β_i ) = Σ_{i=1}^m ( Σ_{j=1}^n a_ij x_j ) β_i.     (2.3)
σ can be extended to all of U because A spans U, and the result is well defined
(unique) because A is linearly independent.
Here are some examples of linear transformations and the matrices which
represent them. Consider the real plane R2 = U = V. Let A = B = {(1, 0),
(0, 1)}. A 90° rotation counterclockwise would send (1, 0) onto (0, 1) and
it would send (0, 1) onto (−1, 0). Since σ((1, 0)) = 0 · (1, 0) + 1 · (0, 1) and
σ((0, 1)) = −1 · (1, 0) + 0 · (0, 1), σ is represented by the matrix
[ 0  −1 ]
[ 1   0 ].
A rotation through the angle θ is represented by
[ cos θ   −sin θ ]
[ sin θ    cos θ ].     (2.4)
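A numerical check of (2.4), as an illustrative sketch in Python with numpy (the code and names are mine, not the book's): it builds the rotation matrix for an angle θ, confirms that θ = 90° reproduces the matrix displayed above, and applies it to the two basis vectors.

    # The matrix (2.4) of a counterclockwise rotation through the angle theta.
    # Illustrative sketch (assumes numpy); not from the original text.
    import numpy as np

    def rotation(theta):
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    R = rotation(np.pi / 2)                       # a 90 degree rotation
    print(np.round(R))                            # [[ 0. -1.] [ 1.  0.]] as in the text
    print(np.round(R @ np.array([1.0, 0.0])))     # (1, 0) is sent to (0, 1)
    print(np.round(R @ np.array([0.0, 1.0])))     # (0, 1) is sent to (-1, 0)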
= Σ_{i=1}^m (a_ij + b_ij) β_i.     (2.5)
and only if the two matrices have the same number of rows and the same
number of columns.
If a is any scalar, for the linear transformation aσ we have
(aσ)(α_j) = Σ_{i=1}^m (a · a_ij) β_i.
and A side by side. The element c_kj of the product is then obtained by
multiplying the corresponding elements of row k of B and column j of A
and adding. We can trace the elements of row k of B with a finger of the
left hand while at the same time tracing the elements of column j of A with
a finger of the right hand. At each step we compute the product of the
corresponding elements and accumulate the sum as we go along. Using
this simple rule we can, with practice, become quite proficient, even to the
point of doing "without hands."
Check the process in the following examples:
[worked numerical examples of matrix products; the entries are not legible in the scan]
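The row-by-column rule just described can be written out in a few lines. An illustrative sketch in Python (my own code, with a small example of my own choosing, not the book's): it accumulates the products of row k of B with column j of A exactly as described, and the result can be checked by hand.

    # The row-by-column rule for the product C = BA, written out explicitly.
    # Illustrative sketch; not from the original text.
    def matmul(B, A):
        m, n = len(B), len(A[0])
        assert len(A) == len(B[0]), "columns of B must match rows of A"
        C = [[0] * n for _ in range(m)]
        for k in range(m):            # row k of B ...
            for j in range(n):        # ... against column j of A
                for i in range(len(A)):
                    C[k][j] += B[k][i] * A[i][j]   # accumulate the running sum
        return C

    B = [[1, 2],
         [0, 1]]
    A = [[3, -1],
         [4,  2]]
    print(matmul(B, A))   # [[11, 3], [4, 2]]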
All definitions and properties we have established for linear transformations
can be carried over immediately for matrices. For example, we have:
1. 0 · A = 0. (The "0" on the left is a scalar, the "0" on the right is a
matrix with the same number of rows and columns as A.)
2. 1 · A = A.
3. A(B+ C)= AB + AC.
4. {A + B)C = AC + BC.
5. A(BC) = (AB)C.
Of course, in each of the above statements we must assume the operations
proposed are well defined. For example, in 3, B and C must be the same
size and A must have the same number of columns as B and C have rows.
The rank and nullity of a matrix A are the rank and nullity of the associated
linear transformation, respectively.
The rank of σ is the dimension of the subspace Im(σ) of V. Since Im(σ)
is spanned by {σ(α_1), . . . , σ(α_n)}, ρ(σ) is the number of elements in a
maximal linearly independent subset of {σ(α_1), . . . , σ(α_n)}.
y_i = Σ_{j=1}^n a_ij x_j     (i = 1, . . . , m).     (2.8)
Y = AX     (2.9)
where Y is the one-column matrix of the y_i and X is the one-column matrix of the x_j.
ξ = Σ_i x_iα_i. Because of the usefulness of equation (2.9) we also find it
convenient to represent ξ by the one-column matrix X. In fact, since it is
Notice that we have now used matrices for two different purposes, (1) to
represent linear transformations, and (2) to represent vectors. The single
matric equation Y = AX contains some matrices used in each way.
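Equation (2.9) in use, as an illustrative sketch in Python with numpy (the matrix A and all names are arbitrary choices of mine, not the book's): a vector ξ is represented by the one-column matrix X of its coordinates, and Y = AX gives the coordinates of σ(ξ).

    # Y = AX: the matrix A acts on the column of coordinates X.
    # Illustrative sketch (assumes numpy); A is an arbitrary choice of mine.
    import numpy as np

    A = np.array([[1.0,  2.0, 0.0],
                  [3.0, -1.0, 4.0]])    # represents some sigma : U (dim 3) -> V (dim 2)

    X = np.array([[ 2.0],
                  [ 1.0],
                  [-1.0]])              # coordinates of a vector xi of U, as a column

    Y = A @ X                           # coordinates of sigma(xi), as a column
    print(Y)                            # [[ 4.] [ 1.]]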
EXERCISES
1. Verify the matrix multiplication in the following examples:
[the matrices in parts (a), (b), and (c) are not legible in the scan]
4
2. Compute
[the matrices of this product are not legible in the scan]
3. Find AB and BA if
[the matrices A and B are not legible in the scan]
and (0, 1) onto (−1, 2). Determine the matrix representing σ with respect to the
bases A = B = {(1, 0), (0, 1)}.
5. Let σ be a linear transformation of R2 into itself that maps (1, 1) onto (2, −3)
and (1, −1) onto (4, −7). Determine the matrix representing σ with respect to the
bases A = B = {(1, 0), (0, 1)}. (Hint: We must determine the effect of σ when it is
applied to (1, 0) and (0, 1). Use the fact that (1, 0) = ½(1, 1) + ½(1, −1) and the
linearity of σ.)
that is, σ does not map two different vectors onto the same vector. Thus, there is
a linear transformation that maps (3, −1) onto (1, 0) and (−1, 2) onto (0, 1).
This linear transformation reverses the mapping given by σ. Determine the matrix
representing it with respect to the same bases.
7. Let us consider the geometric meaning of linear transformations. A linear
transformation of R2 into itself leaves the origin fixed (why?) and maps straight
lines into straight lines. (The word "into" is required here because the image of a
straight line may
be another straight line or it may be a single point.) Prove that
the image of a straight line is a subset of a straight line. (Hint: Let σ be represented
by the matrix
A =
[ a_11  a_12 ]
[ a_21  a_22 ].
Then σ maps (x, y) onto (a_11x + a_12y, a_21x + a_22y). Now show that if (x, y)
satisfies the equation ax + by = c its image satisfies the equation
(aa_22 − ba_21)x + (a_11b − a_12a)y = (a_11a_22 − a_12a_21)c.)
8. (Continuation) We say that a straight line is mapped onto itself if every
point on the line is mapped onto a point on the line (but not all onto the same
point on the line is mapped onto a point on the line (but not all onto the same
point) even though the points on the line may be moved around.
(a) A linear transformation maps (1, 0) onto (−1, 0) and (0, 1) onto (0, −1).
Show that every line through the origin is mapped onto itself. Show that each
such line is mapped onto itself with the sense of direction inverted. This linear
transformation is called an inversion with respect to the origin. Find the matrix
representing this linear transformation with respect to the basis
{(1,0), (0, 1)}.
(b) A linear transformation maps (1, 1) onto (−1, −1) and leaves (1, −1)
fixed. Show that every line perpendicular to the line x_1 + x_2 = 0 is mapped onto
itself with the sense of direction inverted. Show that every point on the line
x_1 + x_2 = 0 is left fixed. Which lines through the origin are mapped onto them-
selves? This linear transformation is called a reflection about the line x_1 + x_2 = 0.
Find the matrix representing this linear transformation with respect to the basis
{(1,0), (0, 1)}. Find the matrix representing this linear transformation with
respect to the basis {(1, 1), (1, —1)}.
(c) A linear transformation maps (1, 1) onto (2, 2) and (1, −1) onto (3, −3).
Show that the lines through the origin and passing through the points
(1,1) and
(1, -1) are mapped onto themselves and that no other lines are mapped onto
themselves. Find the matrices representing this linear transformation with respect
to the bases {(1, 0), (0, 1)} and {(1, 1), (1, -1)}.
(d) A linear transformation leaves (1 , 0) fixed and maps (0, 1) onto (1 , 1). Show
that each line x2 = c is mapped onto itself and translated within itself a distance
equal to c. This linear transformation is called a shear. Which lines through the
origin are mapped onto themselves? Find the matrix representing this linear
transformation with respect to the basis {(1,0), (0, 1)}.
(e) A linear transformation maps (1, 0) onto (3/5, 4/5) and (0, 1) onto (−4/5, 3/5).
Show that every line through the origin is rotated counterclockwise through the
angle θ = arc cos 3/5. This linear transformation is called a rotation. Find the
matrix representing this linear transformation with respect to the basis {(1, 0),
(0, 1)}.
(f) A linear transformation maps the plane into the line x_1 = x_2 in such a way
that each point on the line 2x_1 + x_2 = 3c is mapped onto the single point (c, c).
The line x_1 − x_2 = 0 is left fixed. The only other line through the origin which
(Hint: In Exercise 7 we have shown that straight lines are mapped into straight
lines. We already know that linear transformations map the origin onto the origin.
Thus it is relatively easy to determine what happens to straight lines passing through
the origin. For example, to see what happens to the x_1-axis it is sufficient to see
what happens to the point (1, 0). Among the transformations given appear a
rotation, a reflection, two projections, and one shear.)
10. (Continuation) For the linear transformations given in Exercise 9 find all
lines through the origin which are mapped onto or into themselves.
11. Let U = R² and V = R³ and σ be a linear transformation of U into V that
maps (1, 1) onto (0, 1, 2) and (−1, 1) onto (2, 1, 0). Determine the matrix that
represents σ with respect to the bases A = {(1, 0), (0, 1)} in R² and B = {(1, 0, 0), (0, 1, 0),
(0, 0, 1)} in R³. (Hint: express (1, 0) and (0, 1) in terms of (1, 1) and (−1, 1).)
14. Show that the matrix C = [a_i b_j] has rank one if not all a_i and not all b_j are
zero. (Hint: Use Theorem 1.12.)
15. Let a, b, c, and d be given numbers (real or complex) and consider the
function

f(x) = (ax + b) / (cx + d).

Let g be another function of the same form. Show that gf, where gf(x) = g(f(x)),
is a function that can also be written in the same form. Show that each of these
functions can be represented by a matrix in such a way that the matrix representing
gf is the product of the matrices representing g and f. Show that the inverse function
exists if and only if ad − bc ≠ 0. To what does the function reduce if ad − bc = 0?
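For readers who wish to experiment, the following short Python sketch illustrates the point of Exercise 15; the function names and the sample coefficients are illustrative assumptions, not part of the exercise.

# Composing two fractional linear functions corresponds to multiplying
# the 2 x 2 matrices [[a, b], [c, d]] that represent them (a sketch).

def moebius(a, b, c, d):
    """Return the function x -> (a*x + b)/(c*x + d)."""
    return lambda x: (a * x + b) / (c * x + d)

def compose_matrices(g, f):
    """Multiply the 2 x 2 matrices representing g and f."""
    (a, b), (c, d) = g
    (p, q), (r, s) = f
    return ((a * p + b * r, a * q + b * s),
            (c * p + d * r, c * q + d * s))

f_mat = ((1, 2), (3, 4))       # f(x) = (x + 2)/(3x + 4)
g_mat = ((2, 1), (5, 3))       # g(x) = (2x + 1)/(5x + 3)
gf_mat = compose_matrices(g_mat, f_mat)

f = moebius(*f_mat[0], *f_mat[1])
g = moebius(*g_mat[0], *g_mat[1])
gf = moebius(*gf_mat[0], *gf_mat[1])
x = 0.7
assert abs(g(f(x)) - gf(x)) < 1e-12    # g(f(x)) agrees with the product matrix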
16. Consider complex numbers of the form x + yi (where x and y are real
numbers and i² = −1) and represent such a complex number by the pair (x, y)
in R². Let a + bi be a fixed complex number. Consider the function f defined by
the rule

f(x + yi) = (a + bi)(x + yi) = u + vi.

(a) Show that this function is a linear transformation of R² into itself mapping
(x, y) onto (u, v).
(b) Find the matrix representing this linear transformation with respect to the
basis {(1, 0), (0, 1)}.
(c) Find the matrix which represents the linear transformation obtained by using
c + di in place of a + bi. Compute the product of these two matrices. Do they
commute?
(d) Determine the complex number which can be used in place of a + bi to
obtain a transformation represented by this matrix product. How is this complex
number related to a + bi and c + di?
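A similar sketch, again with illustrative names and using the numpy library, shows the matrix that arises in Exercise 16 under the assumption that f is multiplication by a + bi; with respect to the basis {(1, 0), (0, 1)} this matrix is [[a, −b], [b, a]].

import numpy as np

def mat(a, b):
    """Matrix of multiplication by a + bi acting on (x, y)."""
    return np.array([[a, -b], [b, a]], dtype=float)

M1 = mat(2.0, 3.0)           # represents 2 + 3i
M2 = mat(1.0, -4.0)          # represents 1 - 4i

z = (2 + 3j) * (1 - 4j)      # = 14 - 5i
assert np.allclose(M1 @ M2, mat(z.real, z.imag))   # product matrix represents the product
assert np.allclose(M1 @ M2, M2 @ M1)                # these matrices commute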
17. Show by example that it is possible for two matrices A and B to have the
same rank while A² and B² have different ranks.
3 I Non-singular Matrices
Let us consider the case where U = V, that is, we are considering trans-
formations of V into itself. Generally, a homomorphism of a set into itself
is called an endomorphism. We consider a fixed basis in V and represent
the linear transformation of V into itself with respect to that basis. In this
case the matrices are square, or n × n, matrices. Since the transformations
we are considering map V into itself, any finite number of them can be iterated
in any order. The commutative law does not hold, however. The same
remarks hold for square matrices. They can be multiplied in any order but
Definition. A one-to-one linear transformation σ of a vector space onto
itself is called an automorphism. An automorphism is only a special kind of
isomorphism for which the domain and codomain are the same space. If
σ(α) = α', the mapping σ^{-1}(α') = α is called the inverse transformation of σ.
it is an epimorphism.
Theorem 3.3. A linear transformation σ of an n-dimensional vector space
into itself is an automorphism if and only if its nullity is 0, that is, if and only
if it is a monomorphism.
proof (of Theorems 3.1, 3.2, and 3.3). These properties have already
been established for isomorphisms.
Since it is clear that transformations of rank less than n do not have
inverses because they are not onto, we see that automorphisms are the
only linear transformations which have inverses. A linear transformation
that has an inverse is said to be non-singular or invertible; otherwise it is
said to be singular. Let A be the matrix representing the automorphism
σ, and let A^{-1} be the matrix representing the inverse transformation σ^{-1}. Then

(2) AA^{-1} = I.

On the other hand, suppose that for the matrix A there exists a matrix
B such that BA = I. Since I is of rank n, A must also be of rank n and is
therefore non-singular. Given B, the equation XA = B has the solution
X = BA^{-1}, and the equation AY = B has the solution Y = A^{-1}B since
A(A^{-1}B) = (AA^{-1})B = B. The solutions are unique since for any C having the property
that CA = B we have C = CAA^{-1} = BA^{-1}, and similarly with any solution
of AY = B.
As an example illustrating the last statement of the theorem, let

A = [ 1  2 ]     A^{-1} = [ 1  -2 ]     B = [ 1  0 ]
    [ 0  1 ]              [ 0   1 ]         [ 2  1 ]

Then

X = BA^{-1} = [ 1  -2 ]     and     Y = A^{-1}B = [ -3  -2 ]
              [ 2  -3 ]                           [  2   1 ]
EXERCISES
1. Show that the inverse of

A = [ 1  2  3 ]                  [ -2   0   1 ]
    [ 2  3  4 ]    is   A^{-1} = [  0   3  -2 ]
    [ 3  4  6 ]                  [  1  -2   1 ]
2. Let

A = (1/3) [ 1   2   2 ]
          [ 2  -2   1 ]
          [ 2   1  -2 ]
What is the inverse of Al (Geometrically, this matrix represents a 180° rotation
about the line containing the vector (2, 1, 1). The inverse obtained is therefore
not surprising.)
3. Compute the image of the vector (1, —2, 1) under the linear transformation
represented by the matrix
"1 2 3"
A = 2 3 4
1 2
Show that A cannot have an inverse.
4. Since

[ x_11  x_12 ] [  3  -1 ]   =   [ 3x_11 − 5x_12    −x_11 + 2x_12 ]
[ x_21  x_22 ] [ -5   2 ]       [ 3x_21 − 5x_22    −x_21 + 2x_22 ]

we can find the inverse of

[  3  -1 ]
[ -5   2 ]

by solving the equations

3x_11 − 5x_12 = 1
−x_11 + 2x_12 = 0
3x_21 − 5x_22 = 0
−x_21 + 2x_22 = 1.

Solve these equations and check your answer by showing that this gives the inverse
matrix.
We have not as yet developed convenient and effective methods for obtaining
the inverse of a given matrix. Such methods are developed later in this chapter
and in the following chapter. If we know the geometric meaning of the matrix,
however, it is often possible to obtain the inverse with very little work.
5. The matrix

[ 3/5  -4/5 ]
[ 4/5   3/5 ]

represents a rotation about the origin through the angle θ = arccos 3/5. What rotation
would be the inverse of this rotation? What matrix would represent this inverse
rotation? Show that this matrix is the inverse of the given matrix.
r ° _i i
6. The matrix
•
7. The matrix
n n represents a shear. The inverse transformation is also a
shear. Which one? What matrix represents the inverse shear? Show that this
matrix is the inverse of the given matrix.
4 I Change of Basis
Definition. Let A = {α_1, ..., α_n} and A' = {α'_1, ..., α'_n} be bases of the
vector space U, and let

α'_j = Σ_{i=1}^{n} p_ij α_i.     (4.1)

The associated matrix P = [p_ij] is called the matrix of transition from the
basis A to the basis A'.
The columns of P are the n-tuples representing the new basis vectors in
terms of the old basis. This simple observation is worth remembering as
it is usually the key to determining P when a change of basis is made. Since
the columns of P are the representations of the basis A' they are linearly
independent and P has rank n. Thus P is non-singular.
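The observation about the columns of P can be checked numerically; the following Python sketch (the numpy library and the chosen bases are illustrative assumptions) does so for one concrete change of basis.

import numpy as np

old_basis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
new_basis = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]

# Column j of P holds the coordinates of the j-th new basis vector
# relative to the old basis (here the old basis is the standard one).
P = np.column_stack(new_basis)
assert np.linalg.matrix_rank(P) == 2          # P is non-singular

# Coordinates transform by X = P X': a vector with new coordinates (2, 3).
x_new = np.array([2.0, 3.0])
x_old = P @ x_new
assert np.allclose(x_old, 2 * new_basis[0] + 3 * new_basis[1])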
Now let ξ = Σ_{i=1}^{n} x_i α_i be an arbitrary vector of U and let ξ = Σ_{j=1}^{n} x'_j α'_j
be its representation in terms of the new basis. Then

ξ = Σ_{j=1}^{n} x'_j α'_j = Σ_{j=1}^{n} x'_j ( Σ_{i=1}^{n} p_ij α_i ) = Σ_{i=1}^{n} ( Σ_{j=1}^{n} p_ij x'_j ) α_i.     (4.2)

Since the representation of ξ in terms of the basis A is unique, we have
x_i = Σ_{j=1}^{n} p_ij x'_j; that is, X = PX', where X and X' are the n-tuples
representing ξ with respect to A and A'.
Now let σ be a linear transformation of U into V represented by the matrix
A = [a_ij] with respect to the bases A and B. Then

σ(α'_j) = σ( Σ_{i=1}^{n} p_ij α_i ) = Σ_{i=1}^{n} p_ij σ(α_i) = Σ_{k=1}^{m} ( Σ_{i=1}^{n} a_ki p_ij ) β_k.     (4.4)

Thus AP is a matrix representing σ with respect to the bases A' and B. Since
the matrix representing σ is uniquely determined by the choice of bases, the
matrix representing σ with respect to A' and B is AP. A similar argument
applied to a change of basis in V, with matrix of transition Q, shows that the
new matrix representing σ with respect to A and B' is Q^{-1}A.
Combining these results, we see that, if both changes of bases are made at
once, the new matrix representing a is Q~ X AP.
As in the proof of Theorem 1.6 we can choose a new basis A' = {α'_1, ..., α'_n}
of U such that the last ν = n − ρ basis elements form a basis of K(σ). Since
{σ(α'_1), ..., σ(α'_ρ)} is a basis of σ(U) and is linearly independent in V, it can
be extended to a basis B' of V. With respect to the bases A' and B' we have
σ(α'_j) = β'_j for j ≤ ρ while σ(α'_j) = 0 for j > ρ. Thus the new matrix Q^{-1}AP
representing σ is of the form

        ρ columns   ν columns
      [    I           0     ]   ρ rows
      [    0           0     ]   m − ρ rows

Thus we have Theorem 4.1, which asserts that bases can always be chosen
so that the matrix representing σ has this form.
When A andB are unrestricted we can always obtain this relatively simple
representation of a linear transformation by a proper choice of bases.
More interesting situations occur when A and B are restricted. Suppose,
for example, that we take U = V and A = B. In this case there is but one
basis to change and but one matrix of transition, that is, P = Q. In this
case it is not possible to obtain a form of the matrix representing a as simple
as that obtained in Theorem 4.1. We say that any two matrices representing
the same linear transformation σ of a vector space V into itself are similar.
This is equivalent to saying that two matrices A and A' are similar if and
only if there exists a non-singular matrix of transition P such that A' =
P^{-1}AP. This case occupies much of our attention in Chapters III and V.
EXERCISES
1. In P_3, the space of polynomials of degree 2 or smaller with coefficients in
F, let A = {1, x, x²} and

A' = {p_1(x) = x² + x + 1,  p_2(x) = x − x² − 2,  p_3(x) = x² + x − 1}.
P = [ 1/2   -√3/2 ]
    [ √3/2    1/2 ]
from A' to A. (Notice, in particular, that in Exercise 2 the columns of P are the
components of the vectors in A' expressed in terms of basis A, whereas in this exercise
the columns of R are the components of the vectors in A expressed in terms of the
basis A'. Thus these two matrices of transition are determined relative to different
bases.) Show that RP = I.
4. (Continuation) Consider the linear transformation σ of R² into itself which
maps

(1, 0) onto (1/2, √3/2)
5. In R³ let A = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} and let A' = {(0, 1, 1), (1, 0, 1),
(1, 1, 0)}. Find the matrix of transition P from A to A' and the matrix of transition
P^{-1} from A' to A.
Use the results of Exercise 6 to resolve the question raised in the parenthetical
7.
a fc
The subspaces 0(Uk ) of V form a non-decreasing chain of subspaces with
).
«(Uk-i) <= <y(Uk ) and o"(U n ) = cr(U). Since a(Uk ) = + (a(oik )) we see a^)
from Theorem 4.8 of Chapter I that dim a(Uk ) < dim o^L/^) + 1 that is, ;
can be extended to a basis B' of V. Let us now determine the form of the
matrix A' representing a with respect to the bases A and 8'.
o(ct. ^ =
ft, column k t has a 1 in row i and all other elements of
Sincek
this column are O's. For k t <j < k i+1 <r(oc ) e <r(C so that column j ,
3 fc .)
column column
*1 /c
2
r
• •
1 "l,fcl+l •
.. "l,fc 2 +l
• • • •
1 a 2,fc 2 +l
• • •
(5.1)
determined. There may be many ways to extend this set to the basis 6',
but the additional basis vectors do not affect the determination of A' since
every element of a{U) can be expressed in terms of {ft, ...
ft} alone. Thus ,
and V are introduced and how the bases A and 8 are chosen we can regard
Q x and Q 2 as matrices of transition in V. Thus A[ represents a with respect
to bases A and 8^ and A'2 represents a with respect to bases A and & 2 But .
condition (3) says that for i <p the rth basis element in both 8^ and B'
2
is
<7(<x .). Thus the first p elements of B[ and B' are identical. Condition (1) says
fc 2
that the remaining basis elements have nothing to do with determining the
coefficients in A[ and A'2 Thus A'x = A 2
. .
is moved down to row k t Thus each non-zero row begins on the main
.
diagonal and each column with a 1 on the main diagonal is otherwise zero.
In this text we have no particular need for this special form while the form
described in Theorem 5.1 is one of the most useful tools at our disposal.
The usefulness of the Hermite normal form depends on its form, and
the uniqueness of that form will enable us to develop effective and con-
venient short cuts for determining that form.
Exercise 4.)
EXERCISES
1. Which of the following matrices are in Hermite normal form?
"01001"
10 1
(a)
10
0_
"00204"
110 3
(b)
12
0_
"i o o o r
10 1
(c)
11
0_
"0101001"
10 10
(d)
10
0_
"10 10 1"
110
(e)
oooio
lj
T 1 0' 1 1"
A = and B =
1 1
respectively. Show that there is no basis 8' of R 2 such that B is the matrix represent-
ing a with respect to A and 8'.
4. Show that
(a) (A + B)^T = A^T + B^T,
(b) (AB)^T = B^T A^T,
         [ 1  0  0  ⋯  0 ]
         [ 0  c  0  ⋯  0 ]
E_2(c) = [ 0  0  1  ⋯  0 ]     (6.1)
         [ ⋯             ]
         [ 0  0  0  ⋯  1 ]
The addition of k times the third row to the first row can be effected by the
matrix

[ 1  0  k  0  ⋯  0 ]
[ 0  1  0  0  ⋯  0 ]
[ 0  0  1  0  ⋯  0 ]
[ ⋯                ]
[ 0  0  0  0  ⋯  1 ]

The interchange of the first and second rows can be effected by the matrix

        [ 0  1  0  ⋯  0 ]
        [ 1  0  0  ⋯  0 ]
E_12 =  [ 0  0  1  ⋯  0 ]     (6.3)
        [ ⋯             ]
        [ 0  0  0  ⋯  1 ]
We now add −a_i1 times the first row to the ith row to make every other
element in the first column equal to zero.
The resulting matrix is still non-singular since the elementary operations
applied were non-singular. We now wish to obtain a 1 in the position of
element a_22. At least one element in the second column other than a_12
is non-zero, for otherwise the first two columns would be dependent. Thus
by a possible interchange of rows, not including row 1, and multiplying the
second row by a non-zero scalar, we can obtain a_22 = 1. We now add −a_i2
times the second row to the ith row to make every other element in the second
column equal to zero. Notice that we also obtain a 0 in the position of a_12
without affecting the 1 in the upper left-hand corner.
We continue in this way until we obtain the identity matrix. Thus if
E_1, E_2, ..., E_r are elementary matrices representing the successive elementary
operations, we have

I = E_r ⋯ E_2 E_1 A,     (6.4)

or

A = E_1^{-1} E_2^{-1} ⋯ E_r^{-1}.

In Theorem 5.1 we obtained the Hermite normal form A' from the matrix
A by multiplying on the left by the non-singular matrix Q^{-1}. We see now
that Q^{-1} can be taken to be a product of elementary matrices. In the following
example we avoid this pitfall, which can occur when several operations of
type III are combined, by considering one row as an operator row and adding
multiples of it to several others.
Consider the matrix

    [  4   3   2  -1   4 ]
A = [  5   4   3  -1   4 ]
    [ -2  -2  -1   2  -3 ]
    [ 11   6   4   1  11 ]

as an example.
According to the instructions for performing the elementary row oper-
ations we should multiply the first row by 1/4. To illustrate another possible
way to obtain the "1" in the upper left corner, multiply row 1 by −1 and
add row 2 to row 1. Multiples of row 1 can now be added to the other rows
to obtain

[ 1   1   1   0   0 ]
[ 0  -1  -2  -1   4 ]
[ 0   0   1   2  -3 ]
[ 0  -5  -7   1  11 ]

and then

[ 1   0  -1  -1   4 ]
[ 0   1   2   1  -4 ]
[ 0   0   1   2  -3 ]
[ 0   0   3   6  -9 ]

Finally, we obtain

[ 1   0   0   1   1 ]
[ 0   1   0  -3   2 ]
[ 0   0   1   2  -3 ]
[ 0   0   0   0   0 ]

To obtain Q^{-1} at the same time, we perform the same row operations on

         [  4   3   2  -1   4   1   0   0   0 ]
[A, I] = [  5   4   3  -1   4   0   1   0   0 ]
         [ -2  -2  -1   2  -3   0   0   1   0 ]
         [ 11   6   4   1  11   0   0   0   1 ]

The successive stages of the reduction are

[ 1   1   1   0   0   -1   1   0   0 ]
[ 0  -1  -2  -1   4    5  -4   0   0 ]
[ 0   0   1   2  -3   -2   2   1   0 ]
[ 0  -5  -7   1  11   11 -11   0   1 ]

[ 1   0  -1  -1   4    4  -3   0   0 ]
[ 0   1   2   1  -4   -5   4   0   0 ]
[ 0   0   1   2  -3   -2   2   1   0 ]
[ 0   0   3   6  -9  -14   9   0   1 ]

[ 1   0   0   1   1    2  -1   1   0 ]
[ 0   1   0  -3   2   -1   0  -2   0 ]
[ 0   0   1   2  -3   -2   2   1   0 ]
[ 0   0   0   0   0   -8   3  -3   1 ]

so that

         [  2  -1   1   0 ]
Q^{-1} = [ -1   0  -2   0 ]
         [ -2   2   1   0 ]
         [ -8   3  -3   1 ]

Verify directly that Q^{-1}A is in Hermite normal form.
If A were non-singular, the Hermite normal form obtained would be the
identity matrix. In this case Q^{-1} would be the inverse of A. This method
of finding the inverse of a matrix is one of the easiest available for hand
computation. It is the recommended technique.
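The following Python sketch carries out this technique mechanically; the numpy library, the partial pivoting step, and the function name are assumptions of ours and are not part of the text.

import numpy as np

def inverse_by_row_reduction(A):
    """Row-reduce [A, I]; when the left half becomes I, the right half is A^-1."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))   # choose a usable pivot row
        M[[col, pivot]] = M[[pivot, col]]
        M[col] /= M[col, col]                           # scale the pivot row to get a 1
        for row in range(n):
            if row != col:
                M[row] -= M[row, col] * M[col]          # clear the rest of the column
    return M[:, n:]

A = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 6]])         # the matrix of Exercise 1
assert np.allclose(A @ inverse_by_row_reduction(A), np.eye(3))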
EXERCISES
1. Elementary operations provide the easiest methods for determining the rank
4 5 6
7 8 9
'
(b) 1
-1
-2 -3
"0 2"
(c) 1
1 3
2 3
(a) 1
-2
(b)
"0 r
1
0"
(c) "i
"-1 0"
_
1 0" "1 -r "1 0"
1 1 1 i i i
2 -1 5 2
1 1 1 1
(b) 12 3 3 10 6"
2 10 2 3
2 2 2 1 5 5
-113 2 5 2_
2 3 4
3 4 6.
7. (a) Show that, by using a sequence of elementary operations of type II only,
any two rows of a matrix can be interchanged with one of the two rows multiplied
by —1. (In fact, the type II operations involve no scalars other than ±1.)
(b) Using the results of part (a), show that a type III operation can be obtained
by a sequence of type II operations and a single type I operation.
(c) Show that the sign of any row can be changed by a sequence of type II
operations and a single type III operation.
8. Show that any matrix A can be reduced to the form described in Theorem 4.1
between the elements of K(σ) and the elements of {ξ_0} + K(σ). Thus the
size of the set of solutions can be described by giving the dimension of K(σ).
The set of all solutions of the problem σ(ξ) = β is not a subspace of U unless
β = 0. Given the linear problem σ(ξ) = β, the problem σ(ξ) = 0 is called the
associated homogeneous problem.
The linear problem σ(ξ) = β becomes

AX = B     (7.2)

in matrix form, and the augmented matrix of the system is

          [ a_11  ⋯  a_1n  b_1 ]
[A, B] =  [  ⋯              ⋯  ]     (7.4)
          [ a_m1  ⋯  a_mn  b_m ]

The linear problem σ(ξ) = β, or AX = B, has a
solution if and only if the rank of A is equal to the rank of the augmented
matrix [A, B]. Whenever a solution exists, all solutions can be expressed in
terms of ν = n − ρ independent parameters, where ρ is the rank of A.
proof. We have already seen that the linear problem <r(|) = |8 has a
solution if and only if e o(U). This is the case if and only if is linearly
dependent on MoO, cr(a n )}. But this is equivalent to the condition
. . . ,
Now the system A'X = B' is particularly easy to solve since the variable
xk appears only in the ith equation. Furthermore, non-zero coefficients
.
appear only in the first p equations. The condition that /? e a(U) also
takes on a form that is easily recognizable. The condition that B' be ex-
pressible as a linear combination of the columns of A' is simply that the
elements of B' below row p be zero. The system A'X = B' has the form
+ ^l.fcj+i^fci+i +" • •
+ ai,fc 2+ i#fc 2+ i + (7.5)
+ a' +l^fc +l
2,fc 2
vCt.
2
Since each x k appears in but one equation with unit coefficient, the remaining
.
4x_1 + 3x_2 + 2x_3 − x_4 = 4
5x_1 + 4x_2 + 3x_3 − x_4 = 4
−2x_1 − 2x_2 − x_3 + 2x_4 = −3
11x_1 + 6x_2 + 4x_3 + x_4 = 11.

The augmented matrix of this system is

[  4   3   2  -1   4 ]
[  5   4   3  -1   4 ]
[ -2  -2  -1   2  -3 ]
[ 11   6   4   1  11 ]

This is the matrix we chose for an example in the previous section. There
we obtained the Hermite normal form

[ 1  0  0   1   1 ]
[ 0  1  0  -3   2 ]
[ 0  0  1   2  -3 ]
[ 0  0  0   0   0 ]

which is the augmented matrix of the equivalent system

x_1 + x_4 = 1
x_2 − 3x_4 = 2
x_3 + 2x_4 = −3.
It is clear that this system is very easy to solve. We can take any value
for x_4 and solve for the remaining unknowns. The solutions are

(x_1, x_2, x_3, x_4) = (1, 2, −3, 0) + x_4(−1, 3, −2, 1).
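The following sketch, which assumes the sympy library, merely verifies this description of the solution set for the system above.

import sympy as sp

A = sp.Matrix([[4, 3, 2, -1],
               [5, 4, 3, -1],
               [-2, -2, -1, 2],
               [11, 6, 4, 1]])
B = sp.Matrix([4, 4, -3, 11])

x4 = sp.symbols('x4')
particular = sp.Matrix([1, 2, -3, 0])        # one particular solution
kernel_dir = sp.Matrix([-1, 3, -2, 1])       # spans the kernel of A
x = particular + x4 * kernel_dir
assert (A * x - B).expand() == sp.zeros(4, 1)   # every such x solves the system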
C[A, B] = [0  ⋯  0  1],

or

CA = 0 and CB = 1.
EXERCISES
1. Show that {(1, 1, 1, 0), (2, 1, 0, 1)} spans the subspace of all solutions of the
system of linear equations

3x_1 − 2x_2 − x_3 − 4x_4 = 0
x_1 + x_2 − 2x_3 − 3x_4 = 0.
The Hermite normal form and the elementary row operations provide
techniques for dealing with problems we have already encountered and
handled rather awkwardly.
set, it is no restriction to assume that B is finite. Let & = 2"=i ^a^i s0 tnat
rows of B and the rows of B' represent sets spanning the same subspace W.
We can therefore reduce B to Hermite normal form and obtain a particular
set spanning W. Since the non-zero rows of the Hermite normal form are
linearly independent, they form a basis of W.
struct a matrix C whose rows represent the vectors in C and reduce this
matrix to Hermite normal form. Let C" be the Hermite normal form
obtained from C, and let B' be the Hermite normal form obtained from B.
We do not assume that B and C have the same number of elements, and there-
fore B' and C
do not necessarily have the same number of rows. However,
in each the number of non-zero rows must be equal to the dimension of W.
We claim that the non-zero rows in these two normal forms are identical.
To see this, construct a new matrix with the non-zero rows of C" written
beneath the non-zero rows of B' and reduce this matrix to Hermite normal
form. Since the rows of C
are dependent on the rows of B', the rows of C"
can be removed by elementary operations, leaving the rows of B'. Further
reduction is not possible since B' is already in normal form. But by inter-
changing rows, which are elementary operations, we can obtain a matrix in
which the non-zero rows of B' are beneath the non-zero rows of C". As
before, we can remove the rows of B' leaving the non-zero rows of as C
the normal form. Since the Hermite normal form is unique, we see that the
non-zero rows of B' and C are identical. The basis that we obtain from the
non-zero rows of the Hermite normal form is the standard basis with respect
to A for the subspace W.
This gives us an effective method for deciding when two sets span the
same subspace. For example, in Chapter 1-4, Exercise 5, we were asked to
show that {(1, 1,0, 0), (1, 0, 1, 1)} and {(2, -1, 3, 3), (0, 1, -1, -1)} span
the same space. In either case we obtain {(1, 0, 1, 1), (0, 1, —1, —1)} as the
standard basis.
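The test can be mechanized; the sketch below (sympy assumed) applies it to the two spanning sets just mentioned.

import sympy as sp

B = sp.Matrix([[1, 1, 0, 0], [1, 0, 1, 1]])
C = sp.Matrix([[2, -1, 3, 3], [0, 1, -1, -1]])

rref_B, _ = B.rref()      # the reduced (Hermite normal) form of each matrix
rref_C, _ = C.rref()
# Both give the rows (1, 0, 1, 1) and (0, 1, -1, -1), the standard basis,
# so the two sets span the same subspace.
assert rref_B == rref_C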
spans W +W
1 2 (Chapter I, Proposition 4.4). Thus we can find a basis for
W + x VV2 by constructing a large matrix whose rows are the representations
u A 2 and reducing
of the vectors in A x it to Hermite normal form by ele-
We have already seen that the set of all solutions of a system of homo-
geneous linear equations is a subspace, the kernel of the linear transformation
represented by the matrix of coefficients. The method for solving such a
system which we described in Section 7 amounts to passing from a charac-
terization of a subspace as the set of all solutions of a system of equations
to its description as the set of all linear combinations of a basis. The question
naturally arises: If we are given a spanning set for a subspace W, how can
we find a system of simultaneous homogeneous linear equations for which W
is exactly the set of solutions ?
This
is not at all difficult and no new procedures are
required. All that
isneeded is a new look at what we have already done. Consider the homo-
geneous linear equation a_1x_1 + ⋯ + a_nx_n = 0. There is no significant
difference between the a_i's and the x_i's in this equation; they appear sym-
metrically. Let us exploit this symmetry systematically.
If a_1x_1 + ⋯ + a_nx_n = 0 and b_1x_1 + ⋯ + b_nx_n = 0 are two homo-
geneous linear equations, then (a_1 + b_1)x_1 + ⋯ + (a_n + b_n)x_n = 0 is a
homogeneous linear equation satisfied by every common solution of the first two.
equations, and the solution of the standard equations will be the standard
basis. In the following example the computations will be carried out in this
way to illustrate this idea. It is not recommended, however, that this be
generally done since accuracy with one definite routine is more important.
Let
W= <(1, 0, -3, 11, -5), (3, 2, 5, -5, 3), (1, 1,2, -4, 2), (7, 2, 12, 1, 2)>.
We reduce the matrix

[ 1  0  -3  11  -5 ]
[ 3  2   5  -5   3 ]
[ 1  1   2  -4   2 ]
[ 7  2  12   1   2 ]

to the form

[ 2  0  5  0  1 ]
[ 1  0  2  1  0 ]
[ 1  1  0  0  0 ]
[ 0  0  0  0  0 ]

From this we see that the coefficients of our systems of equations satisfy the
conditions

2a_1 + 5a_3 + a_5 = 0
a_1 + 2a_3 + a_4 = 0
a_1 + a_2 = 0.

The coefficients a_1 and a_3 can be selected arbitrarily and the others computed
from them. In particular, we have

x_1 − x_2 − x_4 − 2x_5 = 0
x_3 − 2x_4 − 5x_5 = 0.

The reader should check that the vectors in W actually satisfy these equations
and that the standard basis for W is obtained.
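The computation above can be checked as follows; the sketch assumes the sympy library and simply solves Ba = 0 for the coefficient vectors a = (a_1, ..., a_5), where the rows of B are the spanning vectors of W.

import sympy as sp

B = sp.Matrix([[1, 0, -3, 11, -5],
               [3, 2, 5, -5, 3],
               [1, 1, 2, -4, 2],
               [7, 2, 12, 1, 2]])

for a in B.nullspace():                      # each null-space vector gives one equation
    assert B * a == sp.zeros(4, 1)

# The two equations found above correspond to the coefficient vectors below.
for coeffs in ([1, -1, 0, -1, -2], [0, 0, 1, -2, -5]):
    assert B * sp.Matrix(coeffs) == sp.zeros(4, 1)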
The Intersection of Two Subspaces
Let W x and W 2 be subspaces of U of dimensions v x and i> 2 , respectively,
no ft k can be so represented ?
We are looking for a relation of the form
& i
P* = I ***&• « (8.1)
This usually means that the are given in terms of some coordinate system, &
relative to some given basis. But the relation (8.1) is independent of any
coordinate system so we are free to choose a different coordinate system if this
willmake the solution any easier. It turns out that the tools to solve this
problem are available.
Let A = {<*!, a TO } be the given basis and let
. . . ,
m
Pi = Z a» a *' ;' = 1, . . . ,
n. (8.2)
If A' = {a{, . . . , a'm } is the new basis (which we have not specified yet),
we would have
m
Pi = 2 a 'n^ j =1,. . . ,n. (8.3)
i=l
A = PA'. (8.4)
Pi = 2 a i><
= 2 a 'iAi (8.6)
Since k t < kr < j, this last expression is a relation of the required form.
(Actually, every linear relation that exists among the j8 < can be obtained
from those in (8.6). This assertion will not be used later in the book so we
will not take space to prove Consider it "an exercise for the reader.")
it.
(1, 1, 2, −4, 2), (7, 2, 12, 1, 2)}. The implied context is that a basis A =
{α_1, ..., α_5} is considered to be given and that β_1 = α_1 − 3α_3 + 11α_4 − 5α_5,
etc. According to (8.2) the appropriate matrix is

[  1   3   1   7 ]
[  0   2   1   2 ]
[ -3   5   2  12 ]
[ 11  -5  -4   1 ]
[ -5   3   2   2 ]

Reducing this matrix to Hermite normal form exhibits the dependence of β_4
on β_1, β_2, and β_3.
EXERCISES
1. Determine which of the following sets in R⁴ are linearly independent over R.
2. Let W
be the subspace of R 5 spanned by {(1,1,1,1,1), (1,0,1,0,1),
(0,1,1,1,0), (2,0,0,1,1), (2,1,1,2,1), (1, -1, -1, -2,2), (1,2,3,4, -1)}.
Find a standard basis for W
and the dimension of W. This problem is identical
to Exercise 6, Chapter 1-4.
3. Show that {(1, -1,2, -3), (1,1,2,0), (3, -1,6, -6)} and {(1,0,1,0),
(0, 2, 0, 3)} do not span the same subspace. This problem is identical to Exercise 7,
Chapter 1-4.
4. If W x
= <(1, 1, 3, -1), (1, 0, -2, 0), (3, 2, 4, -2)> and W 2
= <(1, 0, 0, 1),
(1, 1,7, 1)> determine the dimension of W x + W 2.
5. Let W
= <(1, -1, -3, 0, 1), (2, 1,0, -1,4), (3, 1,-1,1, 8), (1, 2, 3, 2, 6)>.
Determine the standard basis for W. Find a set of linear equations which char-
acterize W.
6. Let W x
= <(1, 2, 3, 6), (4, -1, 3, 6), (5, 1, 6, 12)> and W 2
= <(1, -1, 1, 1),
9 I Normal Forms
To understand fully what a normal form is, we must first introduce the
concept of an equivalence relation. We say that a relation is defined in a
set if, for each pair {a, b) of elements in this set, it is decided that "a is
Examples. Among rational fractions we can define a\b ~ c\d (for a, b,c,d
integers) if and only if ad = be. This is the ordinary definition of equality
in rational numbers, and this relation satisfies the three conditions of an
equivalence relation.
In geometry we do not ordinarily say that a straight line is parallel to
itself. But if we agree to say that a straight line is parallel to itself, the
concept of parallelism is an equivalence relation among the straight lines
in the plane or in space.
Geometry has many equivalence relations: congruence of triangles,
similarity of triangles, the concept of projectivity in projective geometry,
etc. In dealing with time we use many equivalence relations: same hour
of the day, same day of the week, etc. An equivalence relation is like a
generalized equality. Elements which are equivalent share some common
or underlying property. As an example of this idea, consider a collection of
sets. We say that two sets are equivalent if their elements can be put into
Since the relation between S_a and S_b is symmetric we also have S_b ⊆ S_a and
hence S_a = S_b. Since a ∈ S_a we have shown, in effect, that a proposed
lence class if and only if they are equivalent. (3) Non-identical equivalence
classes are disjoint. On the other hand, a partition of a set into disjoint
subsets can be used to define an equivalence relation; two elements are
equivalent if and only if they are in the same subset.
The notions of equivalence relations and equivalence classes are not
nearly so novel as they may seem at first. Most students have encountered
these ideas before, although sometimes in hidden forms. For example,
we may say that two differentiable functions are equivalent if and only if
they have the same derivative. In calculus we use the letter "C" in describing
the equivalence classes; for example, x³ + x² + 2x + C is the set (equiva-
lence class) of all functions whose derivative is 3x² + 2x + 2.
not a standard term for this equivalence relation, the term most frequently
used being "equivalent." It seems unnecessarily confusing to use the same
term for one particular relation and for a whole class of relations. Moreover,
this equivalence relation is perhaps the least interesting of the equivalence
relations we shall study.
IV. The matrices A and B are said to be similar if there exists a non-
singular matrix P such that B = P~ X AP. As we have seen (Section 4) similar
matrices are representations of a single linear transformation of a vector
space into itself. This is one of the most interesting of the equivalence
relations, and Chapter III isdevoted to a study of it.
Let us show in detail that the relation we have defined as left associate
is an equivalence relation. The matrix Q^{-1} appears in the definition because
Q represents the matrix of transition. However, Q^{-1} is just another non-
singular matrix, so it is clearly the same thing to say that A and B are left
associate if and only if there exists a non-singular matrix Q such that B = QA.
non-singular so that A ~ C.
For a given type of equivalence relation among matrices a normal form
is a particular matrix chosen from each equivalence class. It is a repre-
sentative of the entire class of equivalent matrices. In mathematics the
terms "normal" and "canonical" are frequently used to mean "standard"
in some particular sense. A normal form or canonical form is a standard
the normal forms than it is to transform one into the other. This is the case,
for example, in the application described in the
first part of Section 8.
to the original basis, the relation defined is symmetric. A basis changed once
and then changed again depends only on the final choice so that the relation is
transitive.
For a fixed basis A in U and 8 in V two different linear transformations
a and t of U into V are represented by different matrices. If it is possible,
however, to choose bases A' in U and 8' in V such that the matrix representing
r with respect to A' and 8' is the same as the matrix representing a with
respect to A and 8, then it is certainly clear that a and t share important
geometric properties.
For a fixed a two matrices A and A' representing a with respect to different
bases are related by a matrix equation of the form A' = Q~XAP. Since
A and A' represent the same linear transformation we feel that they should
have some properties in common, those dependent upon a.
These two points of view are really slightly different views of the same
kind of relationship. In the second case, we can consider A and A' as
representing two linear transformations with respect to the same basis,
instead of the same linear transformation with respect to different bases.
"1 01
For example, in R 2 the matrix represents a reflection about the
-1 0" -lj
^-axis and represents a reflection about the # 2 -axis. When both
1.
linear transformations are referred to the same coordinate system they are
different. However, for the purpose of discussing properties independent
of a coordinate system they are essentially alike. The study of equivalence
relations is motivated by such considerations, and the study of normal forms
is aimed at determining just what these common properties are that are
shared by equivalent linear transformations or equivalent matrices.
To make these ideas precise, let a and r be linear transformations of V
into itself. We say that a and t are similar if there exist bases A and 8 of V
such that the matrix representing a with respect to A is the same as the matrix
representing t with respect to 8. If A and B are the matrices representing
a and t with respect to A and P is the matrix of transition from A to 8, then
P~X BP is the matrix representing r with respect to 8. Thus a and r are
similar if P^BP = A.
In a similar way we can define the concepts of left associate, right associate,
and associate for linear transformations.
integers and b ≠ 0. Two such fractions, a/b and c/d, are equivalent if and
only if ad = bc. Each equivalence class corresponds to a single rational
number. The rules of arithmetic provide methods of computing with rational
numbers by performing appropriate operations with formal fractions selected
from the corresponding equivalence classes.
Let U be a vector space over F and let K be a subspace of U. We shall call
two vectors a, p e U equivalent modulo K if and only if their difference lies
in K. Thus a /^- p if and only if a — p e K. We must first show this defines
<
happen that a^a' and yet a = a'. Let a and p be two elements in U. Since
a, /? e U, a + is defined. We wish to define a + to be the sum of a and
jS /? /5.
For any a £ U, the symbol a + K is used to denote the set of all elements
in U that can be written in the form a + y where y £ K. (Strictly speaking,
we should denote the set by {a} + K so that the plus sign combines two objects
of the same type. The notation introduced here is traditional and simpler.)
The set a + K is called a coset of K. If /? £ a + K, then p — a £ K and
j8 ~ a. Conversely,
if ~
a, then £ - a = y e K so /S e a + K. Thus
a + isKsimply the equivalence class a containing a. Thus a + K = /? +K
if and only ifae£ = £ + Kor£ea = a + K.
the set acl determines the desired coset in U for the induced operations. Thus
we can compute U by doing the corresponding operations with
effectively in
representatives. This is precisely what is done when we compute in residue
classes of integers modulo an integer m.
space. In order to designate the role of the subspace K which defines the
we actually encountered
In our discussion of solutions of linear problems
quotient spaces, but the discussion was worded in such a way as to avoid
introducing this more sophisticated concept. Given the linear transformation
a can be written as the product a = iax rj, where rj is the canonical mapping of
into V.
proof. Let a' be the linear transformation of U onto induced by re- /
mapping of U onto U.
proof. For each oteU, let cr^a) = c(a) where a e a. If a' is another
representative of a, then a — a' £S <= K. Thus <r(a) = cT(a') and a x is well
defined. It is easy to check that g x is linear. Clearly, <r(<x) = o^a) = ^(^(a))
for all a e U, and a = a x rj.
We
say that a factors through (J.
Note that the homomorphism theorem is a special case of the factor theorem
in which K = S.
natural way a mapping f of U/U into V/V such that o 2 t = fax where ax is the
canonical mapping U onto U and a 2 is the canonical mapping of V onto V.
be those indices for which <r(a .) ^ a{Ukil). We showed there that {(![,
fc
. . .
,
k}
fi'
where ^ = cr(a .) formed a basis of a{U). {a&i a^,
fc
<x
kJ
is a basis for , . . . ,
a suitable W
which is complementary to K{a).
matrix
"1 1 1 bx
1 1 b2
1 0-1 b3
1 1 b2
1 1 1 bx - I
K
This means the solution ξ_0 is represented by

(b_3, b_2, b_1 − b_3, 0, 0) = b_1(0, 0, 1, 0, 0) + b_2(0, 1, 0, 0, 0) + b_3(1, 0, −1, 0, 0).

This is a particular solution, and a convenient basis for a subspace W complementary
to K is {(0, 0, 1, 0, 0), (0, 1, 0, 0, 0), (1, 0, −1, 0, 0)}. σ maps b_1(0, 0, 1, 0, 0) +
b_2(0, 1, 0, 0, 0) + b_3(1, 0, −1, 0, 0) onto (b_1, b_2, b_3). Hence, W
is mapped isomorphically onto R³.
- i
(*!, X 2 X 3 Xi
, , Hr Xb ) =
+ X 3 + Xs)(0, 0, 0, 0)
(X X 1 ,
work out an example of this type. The main importance of the first homo-
morphism theorem is theoretical and not computational.
*11 I Hom(U, V)
°wO*) = <*«&
TO
= ZWr
r=l
(n.i)
1 au^ii = 0.
m n
=2 2>«o«(a*)
i=l 3=1 J
only if R(a x ) = R(a 2 ). It is clear that R(a + r) = i*(<r) + i?(r) and R(aa) =
set of all extensions of a when a is the zero mapping, and one can see directly
that the dimension of U* is (n — r)m.
chapter III
Determinants, eigenvalues, and similarity transformations
This chapter is devoted to the study of matrices representing linear trans-
the matrix representing σ with respect to A'. In this case A and A' are said
to be similar and the mapping of A onto A' = P^{-1}AP is called a similarity
transformation (on the set of matrices, not on V).
1 I Permutations
taining the elements of S in any order and the second row containing the
element π(i) directly below the element i in the first row. Thus for S =
{1, 2, 3, 4}, the permutation π for which π(1) = 2, π(2) = 4, π(3) = 3,
and π(4) = 1 can conveniently be described by the notations

( 1 2 3 4 )      ( 2 4 1 3 )      ( 4 1 3 2 )
( 2 4 3 1 )  or  ( 4 1 2 3 )  or  ( 1 2 3 4 ).

Let σ be the permutation

σ = ( 1 2 3 4 )
    ( 1 3 4 2 )

Then

σπ = ( 1 2 3 4 )
     ( 3 2 4 1 )
row and the elements σ(i) in the second row. For the π and σ described
above,

σ = ( 2 4 3 1 )
    ( 3 2 4 1 ).
The permutation that leaves all elements of S fixed is called the identity
permutation and will be denoted by ε. For a given π the unique permutation
π^{-1} such that π^{-1}π = ε is called the inverse of π.
If for a pair of elements i < j in S we have π(i) > π(j), we say that π
performs an inversion. Let k(π) denote the total number of inversions
performed by π; we then say that π contains k(π) inversions. For the
permutation π described above, k(π) = 4. The number of inversions in
π^{-1} is equal to the number of inversions in π.
We define sgn π, the sign of π, to be (−1)^{k(π)}; "sgn" is
an abbreviation for "signum" and we use the term "sgn π" to mean "the
sign of π." If sgn π = 1, we say that π is even; if sgn π = −1, we say that
π is odd.
Write σ in the form

σ = ( π(i)  ⋯  π(j)  ⋯ )
    ( σπ(i) ⋯  σπ(j) ⋯ )

because every element of S appears in the top row. Thus, in counting the
inversions in σ it is sufficient to compare π(i) and π(j) with σπ(i) and σπ(j).
For a given i < j there are four possibilities:
1. i < j; π(i) < π(j); σπ(i) < σπ(j): no inversions.
2. i < j; π(i) < π(j); σπ(i) > σπ(j): one inversion in σ, one in σπ.
3. i < j; π(i) > π(j); σπ(i) > σπ(j): one inversion in π, one in σπ.
4. i < j; π(i) > π(j); σπ(i) < σπ(j): one inversion in π, one in σ, and
none in σπ.
Examination of the above table shows that k(σπ) differs from k(σ) + k(π)
by an even number; hence sgn (σπ) = sgn σ · sgn π.
Suppose π leaves the element j fixed and that exactly k elements of S preceding j satisfy
π(i) > j. Then there must also be exactly k elements of S following j
for which π(i) < j. It follows that there are 2k inversions involving j.
Since their number is even they may be ignored in determining sgn π.
Among other things, this shows that there is at least one odd permutation.
In addition, there is at least one even permutation. From this it is but a
step to show that the number of odd permutations is equal to the number of
even permutations.
Let σ be a fixed odd permutation. If π is an even permutation, σπ is odd.
Furthermore, σ^{-1} is also odd so that to each odd permutation τ there cor-
responds an even permutation σ^{-1}τ. Since σ^{-1}(σπ) = π, the mapping of
the set of even permutations into the set of odd permutations defined by
π → σπ is one-to-one and onto. Thus the number of odd permutations is
equal to the number of even permutations.
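The following Python sketch (the names are illustrative) computes k(π) and sgn π directly from the definitions above and checks the multiplicative property on the permutations used in this section.

def inversions(p):
    """Number of pairs i < j with p[i] > p[j]."""
    return sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])

def sgn(p):
    return -1 if inversions(p) % 2 else 1

pi = (2, 4, 3, 1)          # the permutation described above: 1->2, 2->4, 3->3, 4->1
sigma = (1, 3, 4, 2)
sigma_pi = tuple(sigma[pi[i] - 1] for i in range(4))   # first pi, then sigma

assert inversions(pi) == 4 and sgn(pi) == 1
assert sgn(sigma_pi) == sgn(sigma) * sgn(pi)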
EXERCISES
1. Show that there are n\ permutations of n objects.
π = ( 1 2 3 4 5 )
    ( 2 4 5 1 3 )

Notice that π permutes the objects in {1, 2, 4} among themselves and the objects
in {3, 5} among themselves.
2 I Determinants
det A = |a_ij| = Σ_π (sgn π) a_1π(1) a_2π(2) ⋯ a_nπ(n),     (2.1)

where the sum is taken over all permutations π of the elements of S = {1, ..., n}.
Each term of the sum is a product of n elements, each taken from a different
row of A and from a different column of A, and the factor sgn π. The number n is called
the order of the determinant. For n = 2,

| a_11  a_12 |
| a_21  a_22 |  =  a_11 a_22 − a_12 a_21,     (2.2)

and for n = 3,

| a_11  a_12  a_13 |
| a_21  a_22  a_23 |  =  a_11 a_22 a_33 + a_12 a_23 a_31 + a_13 a_21 a_32
| a_31  a_32  a_33 |       − a_11 a_23 a_32 − a_12 a_21 a_33 − a_13 a_22 a_31.
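Definition (2.1) can be evaluated literally for small matrices; the following Python sketch (illustrative only, and far too slow for large n) does so and reproduces (2.2).

from itertools import permutations

def det_by_definition(A):
    """Sum sgn(pi) * a_{1,pi(1)} ... a_{n,pi(n)} over all permutations pi."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if p[i] > p[j]:
                    sign = -sign            # count inversions to get sgn(pi)
        term = sign
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

A = [[1, 2], [3, 4]]
assert det_by_definition(A) == 1 * 4 - 2 * 3     # a_11 a_22 - a_12 a_21 = -2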
But since sgn π^{-1} = sgn π, this term is identical to the one given above.
Thus, in fact, all the terms in the expansion of det A^T are equal to cor-
responding terms in the expansion of det A, and det A^T = det A.
The second sum on the right side of this equation is, in effect, the deter-
minant of a matrix in which rows j and k are equal. Thus it is zero. The
first term is just the expansion of det A. Therefore, det A' = det A. u
Theorem 2.8. If A and B are any two matrices of order n, then det AB =
det A det B = •
det BA.
proof. If A and B are non-singular, the theorem follows by repeated
application of Theorem 2.6. If either matrix is singular, then AB and BA
are also singular and all terms are zero.
EXERCISES
1. If all elements of a matrix below the main diagonal are zero, the matrix is
3 2 2 1 4 1 1 4 1
1 4 1 = - 3 2 2 = - -10 -1
-2 -4 -1 -2 -4 -1 4 1
1 4 1 1 4 1
= - -2 1 = - -2 1
4 1 3
3 2 2 3 2 2 3 2
1 4 1 = 1 4 1 = 1 3 1
-2 -4 -1 -10 -10
This last determinant can be evaluated by direct use of the definition by computing
-1 3 1 3 4
2 5 1 5 6
1 2 3 4
5. Consider the real plane R². We agree that the two points (a_1, a_2) and (b_1, b_2),
together with the origin and the point (a_1 + b_1, a_2 + b_2), determine a parallelogram
(Fig. 2). Show that the determinant

| a_1  a_2 |
| b_1  b_2 |

is the area of this parallelogram.
Notice that the determinant can be positive or negative, and that it changes sign
if the first and second rows are interchanged. To interpret the value of the deter-
minant as an area, we must either use the absolute value of the determinant or give
an interpretation to a negative area. We make the latter choice since to take the
absolute value is to discard information. Referring to Fig. 2, we see that the
direction of rotation from (a lf the same as
a 2) to {b lt b 2 ) across the enclosed area is
the direction of rotation from the positive a^-axis to the positive # 2 -axis. To
interchange {a lt a 2 ) and (6 l9 b 2 ) would be to change the sense of rotation and the
sign of the determinant. Thus the sign of the determinant determines an orientation
of the quadrilateral on the coordinate system. Check the sign of the determinant
for choices of (a lf a 2 ) and (Jb ± b 2 ) in various quadrants and various orientations.
,
| 1  x_1  x_1²  ⋯  x_1^{n-1} |
| 1  x_2  x_2²  ⋯  x_2^{n-1} |
| ⋯                          |   =   ∏ (x_j − x_i),   the product taken over 1 ≤ i < j ≤ n.
| 1  x_n  x_n²  ⋯  x_n^{n-1} |
3 I Cofactors
For a given pair i, j, consider in the expansion for det A those terms which
have a_ij as a factor. Det A is of the form det A = a_ij A_ij + (terms which do
not contain a_ij as a factor). The scalar A_ij is called the cofactor of a_ij.
Since no inversion of π involves the element 1, we see that sgn π = sgn π'. Thus
A_11 is a determinant, the determinant of the matrix obtained from A by
crossing out the first row and the first column of A.
keep the other rows and columns in the same relative order if the sequence
of operations we use interchanges only adjacent rows or columns. It takes
i − 1 interchanges to move the element a_ij into the first row, and j − 1
interchanges to move it into the first column. Thus A_ij is
(−1)^{i+j} times the determinant of the matrix obtained by crossing out the
ith row and the jth column of A.
Each term in the expansion of det A contains exactly one factor from each
row and each column of A. Thus, for any given row of A each term of det A
contains exactly one factor from that row. Hence, for any given i,

det A = Σ_j a_ij A_ij.     (3.1)

Similarly, for any given column of A each term of det A contains exactly one
factor from that column. Hence, for any given k,

det A = Σ_i a_ik A_ik.     (3.2)
112
3
det ,4 =
-2234
115 2
-1 -17 -1 -17 1
det A = 4 14 = 4- 3 31
13 6 47
3 31
= 4- — -1
6 47
row k ≠ i, we get the same result as we would if the elements in row k were
equal to the elements in row i. Hence,

Σ_j a_ij A_kj = 0     for i ≠ k,     (3.3)

and

Σ_i a_ij A_ik = 0     for j ≠ k.     (3.4)

Combining these relations with (3.1) and (3.2), we have

Σ_j a_ij A_kj = δ_ik det A,     (3.5)
PROOF.

A · adj A = [a_ij] · [A_kl]^T = [ Σ_j a_ij A_kj ] = (det A) · I.     (3.7)
Thus the element in row i and column j of A^{-1} is A_ji / det A.     (3.9)

This is illustrated in the following example.
"
1 2 3" "-3 5 1
A = 2 1 2 adj A = -2 5 4
_-2 1 -1_ * 4 -5 -3
"-3 5 r
A~* = i -2 5 4
4 -5 -3
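The example can be verified with the sketch below, which assumes the sympy library; adjugate here means the matrix adj A of Theorem 3.1.

import sympy as sp

A = sp.Matrix([[1, 2, 3], [2, 1, 2], [-2, 1, -1]])
adjA = A.adjugate()
assert adjA == sp.Matrix([[-3, 5, 1], [-2, 5, 4], [4, -5, -3]])
assert A * adjA == A.det() * sp.eye(3)      # A * adj A = (det A) I, with det A = 5
assert A.inv() == adjA / A.det()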
The relations between the elements of a matrix and their cofactors lead
to a method for solving a system of n simultaneous equations in n unknowns
when the equations are independent. Suppose we are given the system of
equations

Σ_{j=1}^{n} a_ij x_j = b_i,     i = 1, ..., n.

Multiplying the ith equation by A_ik and summing over i, we obtain

Σ_{i=1}^{n} A_ik ( Σ_{j=1}^{n} a_ij x_j ) = Σ_{j=1}^{n} ( Σ_{i=1}^{n} A_ik a_ij ) x_j = (det A) x_k = Σ_{i=1}^{n} A_ik b_i.     (3.11)

Since the equations are independent, det A ≠ 0 and

x_k = ( Σ_{i=1}^{n} A_ik b_i ) / det A.     (3.12)
The numerator can be interpreted as the cofactor expansion of the deter-
minant of the matrix obtained by replacing the kth column of A by the
column of the b_i. In this form the method is known as Cramer's rule.
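A direct implementation of Cramer's rule, with numpy assumed and a small illustrative system of our own choosing, looks as follows.

import numpy as np

def cramer(A, b):
    """x_k = det(A with column k replaced by b) / det A."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for k in range(len(b)):
        Ak = A.copy()
        Ak[:, k] = b                 # replace column k by the right-hand side
        x[k] = np.linalg.det(Ak) / d
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
assert np.allclose(A @ cramer(A, b), b)      # solution is (1, 3)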
EXERCISES
1. In the determinant

|  2   7   5   8 |
|  7  -1   2   5 |
|  1   0   4   2 |
| -3   6  -1   2 |

find the cofactor of the "8"; find the cofactor of the "−3."
2 2 1
-1 1 3
3 1 2
with the method of elementary row and column operations described in Section 2.
Subtract appropriate multiples of column 2 from the other columns to obtain
1 3 7 8
2 2 2 7
-1
-3 1 2
5. Show that a matrix is non-singular if and only if its adj A is also non-singular.
6. Let A = [ciij] be an arbitrary n x n matrix and let adj A be the adjunct of A.
If X= (x lt . . . , xn ) and Y = (y lt y n ) show that
. .
,
yy(adj A)X = -
V\
For notation see pages 42 and 55.
algebra of polynomials of this type is not simple, but we need no more than
the observation that two polynomials with matric coefficients are equal if
and only if they have exactly the same coefficients.
We avoid discussing the complications that can occur for polynomials
with matric coefficients in a matric variable.
Now we should like to consider matrices for which the elements are
polynomials. If F is the field of scalars for the set of polynomials in the
indeterminate x, let K be the set of all rational functions in x; that is, the
set of all permissible quotients of polynomials in x. It is not difficult to show
that K is a field. Thus a matrix with polynomial components is a special
case of a matrix with elements in K.
From this point of view a polynomial with matric coefficients can be
expressed as a single matrix with polynomial components. For example,
[  1  0 ]        [  0  2 ]       [ 2  -1 ]     [ x² + 2          2x − 1  ]
[ -1  2 ] x²  +  [ -2  0 ] x  +  [ 1   1 ]  =  [ −x² − 2x + 1   2x² + 1  ]
f(x) = det (A − xI) =

| a_11 − x    a_12      ⋯     a_1n    |
|   a_21    a_22 − x    ⋯     a_2n    |     (4.1)
|    ⋯                                |
|   a_n1      a_n2      ⋯   a_nn − x  |

is a polynomial of degree n in x. The coefficient of x^n is (−1)^n,
the coefficient of x^{n−1} is (−1)^{n−1} Σ_{i=1}^{n} a_ii, and the constant term k_0 = det A.
Let C = A − xI denote the characteristic matrix and write

adj C = C_{n−1}x^{n−1} + C_{n−2}x^{n−2} + ⋯ + C_1 x + C_0,     (4.2)

where each C_i is a matrix with scalar elements.
By Theorem 3.1 we have

adj C · C = (det C) I = f(x)I = adj C · (A − xI) = (adj C)A − (adj C)x.     (4.3)

Hence,

k_n I x^n + k_{n−1} I x^{n−1} + ⋯ + k_1 I x + k_0 I
    = −C_{n−1} x^n + (C_{n−1}A − C_{n−2}) x^{n−1} + ⋯ + (C_1 A − C_0) x + C_0 A.     (4.4)

Equating the matric coefficients of like powers of x, we obtain

k_n I = −C_{n−1},
k_{n−1} I = C_{n−1}A − C_{n−2},
  ⋯
k_1 I = C_1 A − C_0,
k_0 I = C_0 A.     (4.5)
representing it satisfy the same relations, similar matrices satisfy the same
set of polynomial equations. In particular, similar matrices have the same
minimum polynomials.
where q(x) is the quotient polynomial and r{x) is the remainder, which is
either identically zero or a polynomial of degree less than the degree of
Theorem 4.3. h(x) = f(x)/g(x) is the minimum polynomial for A.
proof. Let adj C = g(x)B where the elements of B have no non-scalar
common factor. Since adj C · C = f(x)I we have h(x)g(x)I = g(x)BC. Since
g(x) ≠ 0 this yields

BC = h(x)I.     (4.9)

Using B in place of adj C we can repeat the argument used in the proof of the
Hamilton-Cayley theorem to deduce that h(A) = 0. Thus h(x) is divisible
by m(x).
On the other hand, consider the polynomial m(x) — m(y). Since it is a
sum of terms of the form c^ — y*), each of which is divisible by y — x,
m(x) — m(y) is divisible by y — x:
m(x) - m(y) = (y - x) •
k(x, y). (4.10)
Hence,
Since h{x) divides every element of m{x)B and the elements of B have no
non-scalar common factor, h(x) divides m(x). Thus, h{x) and m{x) differ
at most by a scalar factor.
m(x)I =C •
k(xl, A).
Thus
det m(x)I = [m(x)] n = det C det k(xl, A) •
We see then that every irreducible factor off(x) divides [m(x)] n , and therefore
m(x) itself.
σ by the rules

σ(α_i) = α_{i+1}     for i < n,     (4.16)

and

(−1)^n σ(α_n) = −k_0 α_1 − k_1 α_2 − ⋯ − k_{n−1} α_n.

Since f(σ) vanishes on the basis elements, f(σ) = 0 and any matrix repre-
senting σ satisfies the equation f(x) = 0.
On
the other hand, a cannot satisfy an equation of lower degree because
the corresponding polynomial in a applied to a x could be interpreted as
a relation among the basis elements. Thus, f(x) is a minimum polynomial
for a and for any matrix representing a. Since f(x) is of degree n, it must
also be the characteristic polynomial of any matrix representing a.
With respect to the basis A the matrix representing σ is

     [ 0  0  ⋯  0  −(−1)^n k_0     ]
     [ 1  0  ⋯  0  −(−1)^n k_1     ]
A =  [ 0  1  ⋯  0  −(−1)^n k_2     ]     (4.19)
     [ ⋯                           ]
     [ 0  0  ⋯  1  −(−1)^n k_{n−1} ]

A is called the companion matrix of f(x).
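The following sketch (numpy assumed; the polynomial is chosen to match the first exercise below up to sign) builds a companion matrix and checks numerically that it satisfies its characteristic equation, as the Hamilton-Cayley theorem asserts.

import numpy as np

# Companion matrix of the monic polynomial x^3 - 39x + 90.
C = np.array([[0.0, 0.0, -90.0],
              [1.0, 0.0, 39.0],
              [0.0, 1.0, 0.0]])

coeffs = np.poly(C)          # characteristic polynomial, highest power first: [1, 0, -39, 90]

# Evaluate the characteristic polynomial at the matrix itself.
f_of_C = sum(c * np.linalg.matrix_power(C, len(coeffs) - 1 - i)
             for i, c in enumerate(coeffs))
assert np.allclose(f_of_C, np.zeros((3, 3)))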
EXERCISES
1. Show that −x³ + 39x − 90 is the characteristic polynomial for the matrix

[ 0  0  -90 ]
[ 1  0   39 ]
[ 0  1    0 ]
~2 -2 3"
1 1 1
1 3 -1
and show by direct substitution that this matrix satisfies its characteristic equation.
^3 2 2"
1 4 1
-2 -4 -1
4. Write down a matrix which has x4 + 3*3 + 2x 2 - x + 6 = as its minimum
equation.
7 4-4"
4 -8 -1
-4 -1 -8
is not its minimum polynomial. What is the minimum polynomial ?
Since

(1/Y)(d²Y/dy²) = −(1/X)(d²X/dx²)

is a function of x alone and also a function of y alone, it must be a constant
(scalar) which we shall call k². Thus we are trying to solve the equations

d²X/dx² = −k²X,        d²Y/dy² = k²Y.
These are eigenvalue problems as we have defined the term. The vector
space is the space of infinitely differentiable functions over the real numbers
and the linear transformation is the differential operator d 2 /dx2 .
2 a k sin kx
represents the given function f(x) for < x < n.
Although the vector space in this example is of infinite dimension, we
restrict our attention to the eigenvalue problem in finite dimensional vector
spaces. In a finite dimensional vector space there exists a simple necessary
and sufficient condition which determines the eigenvalues of an eigenvalue
problem.
The eigenvalue equation can be written in the form (a — A)(£) = 0.
We know that there exists a non-zero vector | satisfying this condition if
this step that presents the difficulties. The characteristic equation may have
no solution in F. In that event the eigenvalue problem has no solution.
Even in those cases where solutions exists, finding them can present practical
difficulties.) For each solution X off(x) = 0, solve the system of homogene-
ous equations
Since this system of equations has positive nullity, non-zero solutions exist
and we should use the Hermite normal form to find them. All solutions
are the representations of eigenvectors corresponding to a.
Generally, we are given the matrix A rather than a itself, and in this case
we regard the problem as solved when the eigenvalues and the representations
of the eigenvectors are obtained. We refer to the eigenvalues and eigenvectors
of a as eigenvalues and eigenvectors, respectively, of A.
Theorem 5.2. Similar matrices have the same eigenvalues and eigenvectors.
proof. This follows directly from the definitions since the eigenvalues
and eigenvectors are associated with the underlying linear transformation.
det (A' − xI) = det (P^{-1}AP − xI) = det (P^{-1}(A − xI)P) = det P^{-1} · det (A − xI) · det P = det (A − xI).
Theorem 5.5. The geometric multiplicity of X does not exceed the algebraic
multiplicity of X.
the matrix A representing a with respect to this basis has the form
'X «l,r+l
I a*2,r+l
A = 1 a r,r+l (5.3)
a r+l,r+l
a.
linearly independent.
proof. Suppose the dependent and that we have reordered the
set is
eigenvectors so that the k eigenvectors are linearly independent and
first
is = 2 a&i
where the representation is unique. Not all a t
= since | s ^ 0. Upon
applying the linear transformation a we have
<-i As
Since not and XJX Sall at =
1, this would contradict the uniqueness of ^
the representation of £ s Since we get a contradiction in any event, the .
EXERCISES
1. Show that X = is an eigenvalue of a matrix A if and only if A is singular.
2. Show that if ξ is an eigenvector of σ, then ξ is also an eigenvector of σ^n.
If λ is the eigenvalue of σ corresponding to ξ, what is the eigenvalue
of σ^n corresponding to ξ?
3. Show that if ξ is an eigenvector of both σ and τ, then ξ is also an eigenvector
of aσ (for a ∈ F) and σ + τ. If λ_1 is the eigenvalue of σ and λ_2 is the eigenvalue of
τ corresponding to ξ, what are the eigenvalues of aσ and σ + τ?
4. Show, by producing an example, that if X x and A 2 are eigenvalues of <r and a 2
x ,
polynomial for D. From your knowledge of the differentiation operator and net
using Theorem 4.3, determine the minimum polynomial for D. What kind of
differential equation would an eigenvector of D have to satisfy? What are the
eigenvectors of D?
9. Let A = [a_ij]. Show that if Σ_j a_ij = c independent of i, then ξ = (1, 1, ..., 1)
is an eigenvector. What is the corresponding eigenvalue?
representing a with respect to the basis A. Show that all elements in the first k
columns below the fcth row are zeros.
11. Show that if X1 and A 2 ^ X x are eigenvalues of a x and £ x and £ 2 are eigen-
vectors corresponding to X x and X 2 respectively, then f x + £ 2 is not an eigenvector.
,
13. Let A be an n × n matrix with eigenvalues λ_1, λ_2, ..., λ_n. Show that if Λ
is the diagonal matrix

     [ λ_1              ]
Λ =  [      λ_2         ]
     [           ⋱      ]
     [              λ_n ]

and P = [p_ij] is the matrix in which column j is the n-tuple representing an eigen-
vector corresponding to λ_j, then AP = PΛ.
14. Use the notation of Exercise 13. Show that if A has n linearly independent
eigenvectors, then the eigenvectors can be chosen so that P is non-singular. In this case
P^{-1}AP = Λ.
6 | Some Numerical Examples

Example 1. Let

A = [ -1   2   2 ]
    [  2   2   2 ]
    [ -3  -6  -6 ]

The first step is to obtain the characteristic matrix

        [ -1 − x     2        2    ]
C(x) =  [   2      2 − x      2    ]
        [  -3       -6     -6 − x  ]

The eigenvalues are the solutions of det C(x) = 0; they are λ_1 = −2, λ_2 = −3,
and λ_3 = 0. For λ_1 = −2,

         [  1   2   2 ]
C(-2) =  [  2   4   2 ]
         [ -3  -6  -4 ]
The Hermite normal form obtained from C(−2) is

[ 1  2  0 ]
[ 0  0  1 ]
[ 0  0  0 ]

The components of the eigenvector corresponding to λ_1 = −2 are found
by solving the equations

x_1 + 2x_2 = 0
x_3 = 0.

Thus we obtain the eigenvector ξ_1 = (−2, 1, 0). For λ_2 = −3,

         [  2   2   2 ]
C(-3) =  [  2   5   2 ]
         [ -3  -6  -3 ]

From C(−3) we obtain the Hermite normal form

[ 1  0  1 ]
[ 0  1  0 ]
[ 0  0  0 ]

and the eigenvector ξ_2 = (1, 0, −1). Finally, for λ_3 = 0,

        [ -1   2   2 ]
C(0) =  [  2   2   2 ]
        [ -3  -6  -6 ]

and we obtain the eigenvector ξ_3 = (0, 1, −1).
By Theorem 5.6 the three eigenvectors obtained for the three different
eigenvalues are linearly independent.
Example 2. Let

A = [  1   1  -1 ]
    [ -1   3  -1 ]
    [ -1   2   0 ]

From the characteristic matrix

        [ 1 − x     1      -1  ]
C(x) =  [  -1     3 − x    -1  ]
        [  -1       2      -x  ]

we find that the characteristic values are 1, 1, and 2; the eigenvalue 1 has
algebraic multiplicity 2. For λ_1 = 1,

        [  0   1  -1 ]
C(1) =  [ -1   2  -1 ]
        [ -1   2  -1 ]

and the Hermite normal form obtained from C(1) is

[ 1  0  -1 ]
[ 0  1  -1 ]
[ 0  0   0 ]

Thus it is seen that the nullity of C(1) is 1. The eigenspace S(1) is of dimension
1 and the geometric multiplicity of the eigenvalue 1 is 1. This shows that the
geometric multiplicity can be lower than the algebraic multiplicity. We obtain
ξ_1 = (1, 1, 1).
The eigenvector corresponding to λ_3 = 2 is ξ_3 = (0, 1, 1).
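For comparison, the following sketch (numpy assumed) treats Example 2 numerically and confirms that the eigenvalue 1 has geometric multiplicity 1.

import numpy as np

A = np.array([[1.0, 1.0, -1.0],
              [-1.0, 3.0, -1.0],
              [-1.0, 2.0, 0.0]])

vals, vecs = np.linalg.eig(A)                # eigenvalues 1, 1, 2 (in some order)

# Geometric multiplicity of the eigenvalue 1: nullity of A - I.
nullity = 3 - np.linalg.matrix_rank(A - np.eye(3))
assert nullity == 1
assert np.allclose(A @ np.array([1.0, 1.0, 1.0]), np.array([1.0, 1.0, 1.0]))
assert np.allclose(A @ np.array([0.0, 1.0, 1.0]), 2 * np.array([0.0, 1.0, 1.0]))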
EXERCISES
For each of the following matrices find all the eigenvalues and as many linearly
"2 2"
1. 4" 2. 3
5 3 -2 3
2"
5.
'4
9 0" 6. 3 2
- -2 8 1 4 1
7 -2 -4 --1
0"
7.
"
7 4 -4" 2 —i
4 -8 -1 / 2
-4 -1 -8 9 3
7 I Similarity
In Example 1 of Section 6 we found three linearly independent eigenvectors of
the matrix

A = [ -1   2   2 ]          (7.1)
    [  2   2   2 ]
    [ -3  -6  -6 ]

Taking these eigenvectors as the columns of P we obtain

P = [  1  -2   0 ]                [ -1  -2  -2 ]
    [  0   1   1 ]      P^{-1} =  [ -1  -1  -1 ]
    [ -1   0  -1 ]                [  1   2   1 ]

The reader should check that P^{-1}AP is a diagonal matrix with the eigenvalues
appearing in the main diagonal.
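The check suggested above can also be done numerically; the sketch below (numpy assumed) forms P from computed eigenvectors and verifies that P^{-1}AP is diagonal.

import numpy as np

A = np.array([[-1.0, 2.0, 2.0],
              [2.0, 2.0, 2.0],
              [-3.0, -6.0, -6.0]])

vals, P = np.linalg.eig(A)                   # columns of P are eigenvectors
D = np.linalg.inv(P) @ A @ P
assert np.allclose(D, np.diag(vals), atol=1e-9)   # eigenvalues -3, -2, 0 on the diagonal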
In Example 2 of Section 6, the matrix

A = [  1   1  -1 ]
    [ -1   3  -1 ]
    [ -1   2   0 ]

has only two linearly independent eigenvectors, since the eigenvalue 1 has geometric
multiplicity 1. Hence there is no basis of eigenvectors and A is not similar to a
diagonal matrix.
be expressed as a sum of vectors in the M_i. Hence, the sum M_1 ⊕ ⋯ ⊕ M_p
is direct and M_1 ⊕ ⋯ ⊕ M_p = V. Since every vector in V is a linear combination of
eigenvectors, there exists a basis of eigenvectors. Thus, A is similar to a
diagonal matrix.
Theorem 7.4 is important in the theory of matrices, but it does not provide
the most effective means for deciding whether a particular matrix is similar
to diagonal matrix. If we can solve the characteristic equation,
it is easier
to try to find the n linearly independent eigenvectors than it is to use Theorem
7.4 to ascertain that they do or do not exist. If we do use this theorem and
are able to conclude that a basis of eigenvectors does exist, the work done in
making this conclusion is of no help in the attempt to find the eigenvectors.
The straightforward attempt to find the eigenvectors is always conclusive
On the other hand, if it is not necessary to find the eigenvectors, Theorem 7.4
can help us make the necessary conclusion without solving the characteristic
equation.
For any square matrix A = [a_ij], Tr(A) = Σ_{i=1}^{n} a_ii is called the trace of A.
It is the sum of the elements in the main diagonal of A. Since

Tr(AB) = Σ_i Σ_j a_ij b_ji = Σ_j Σ_i b_ji a_ij = Tr(BA),

we have Tr(P^{-1}AP) = Tr(APP^{-1}) = Tr(A);
that is, similar matrices have the same trace. For a given linear transforma-
tion σ of V into itself, all matrices representing σ have the same trace. Thus
we can define Tr(σ) to be the trace of any matrix representing σ.
n ~x
Consider the coefficient of x in the expansion of the determinant of the
characteristic matrix,
#m
#22 "^ a 2n
(7.3)
— X
f(0) is the constant term of f(x). If f(x) is factored into linear factors in the
form

f(x) = (−1)^n (x − λ_1)^{s_1} (x − λ_2)^{s_2} ⋯ (x − λ_p)^{s_p},     (7.4)

the constant term is ∏_{i=1}^{p} λ_i^{s_i}. Thus det A is the product of the characteristic
values (each counted with the multiplicity with which it is a factor of the
characteristic polynomial). In a similar way it can be seen that Tr(A) is the
sum of the characteristic values (each counted with multiplicity).
We have now shown the existence of several objects associated with a
matrix, or its underlying linear transformation, which are independent of
the coordinate system. For example, the characteristic polynomial, the
determinant, and the trace are independent of the coordinate system.
Actually, this list is redundant since det A is the constant term of the char-
acteristic polynomial, and Tr(A) is (−1)^{n−1} times the coefficient of x^{n−1} of the
characteristic polynomial.
characteristic polynomial. Functions of this type are of interest because they
contain information about the linear transformation, or the matrix, and they
are sometimes rather easy to evaluate. But this raises a host of questions.
What information do these invariants contain? Can we find a complete
list of non-redundant invariants, in the sense that any other invariant can
be computed from those in the list? While some partial answers to these
questions will be given, a systematic discussion of these questions is beyond
the scope of this book.
the form
r
a = 2 £<»
where the £ f are eigenvectors of a with distinct eigenvalues. Let X { be the
eigenvalue corresponding to | t We will . show that each ^ £ W.
(a l 2 )(oc —
^3) (c —
X r )(<x) is in ' • ' — W since W
is invariant under a,
and hence invariant under a — X for any scalar But then (a — X 2 )(a — X 3)
X.
• • •
(a — A r )(a) = (A x — ^X^ — X3 ) • • •
(X x — A r )| x e W, and | x £ since W
(A a — X 2 )(X 1 — X3 ) • • •
{X — X r ) ?± 0. A
x similar argument shows that each
f,eW.
Since this argument applies to any a £ W, W is spanned by eigenvectors
of (T. Thus, W has a basis of eigenvectors of a. D
Theorem 7.6. Let V be a vector space over C, the field of complex numbers.
Let a be a linear transformation of V into itself V has a basis of eigenvectors
for a if and only iffor every subspace S invariant under a there is a subspace T
invariant under a such that V = S © 7.
proof. The theorem is obviously true if V is of dimension 1. Assume
the assertions of the theorem are correct for spaces of dimension less than n,
where n is the dimension of V.
Assume first that for every subspace S invariant under σ there is a complementary
subspace T also invariant under σ. Since V is a vector space over
the complex numbers, σ has at least one eigenvalue λ_1. Let α_1 be an eigenvector
corresponding to λ_1.
Now S_2 ⊂ T_1 and T_1 = S_2 ⊕ (T_2 ∩ T_1). (See Exercise 15, Section 1-4.) Since
T_2 ∩ T_1 is invariant under σ, and therefore under the restriction of σ to T_1, the
induction assumption holds for the subspace T_1. Thus, T_1 has a basis of eigenvectors.
EXERCISES
1. For each matrix A given in the exercises of Section 6 find, when possible,
a non-singular matrix P for which P^{-1}AP is diagonal.
"1 c
2. Show that the matrix where c ?* is not similar to a diagonal matrix
1
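The following sketch is not part of the exercises: it shows, assuming NumPy, how Exercise 1 can be checked numerically and why the matrix of Exercise 2 fails. The matrix A below is an arbitrary diagonalizable example, not one of the matrices of Section 6.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])           # distinct eigenvalues, hence diagonalizable
    w, P = np.linalg.eig(A)              # columns of P are eigenvectors
    print(np.round(np.linalg.inv(P) @ A @ P, 10))   # diagonal, with the eigenvalues

    # Exercise 2: for c != 0 the matrix [[1, c], [0, 1]] has only one independent
    # eigenvector, so no such P exists.
    B = np.array([[1.0, 5.0],
                  [0.0, 1.0]])
    w, Q = np.linalg.eig(B)
    print(np.linalg.matrix_rank(Q))      # 1, not 2: Q is singular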
(σ - λ)^{k+1}(α) = 0, so that (σ - λ)(α) ∈ M_k. Hence, (σ - λ)M_{k+1} ⊂ M_k.
Since all M_k ⊂ V and V is finite dimensional, the sequence of subspaces
M_1 ⊂ M_2 ⊂ ⋯ cannot increase indefinitely. There is a smallest
index t such that M_k = M_t for all k ≥ t, and we denote M_t by M^{(λ)}. Let m_k be the
dimension of M_k. Similarly, W_k = W_t for all k ≥ t. Denote W_t by W^{(λ)}.
For β ∈ W^{(λ)} there is a unique vector γ ∈ W^{(λ)} such that (σ - λ)^t(γ) = β. Let
α - γ be denoted by δ. It is easily seen that δ ∈ M^{(λ)}. Hence V = M^{(λ)} + W^{(λ)}.
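A minimal numerical sketch (not part of the original text), assuming NumPy, of how the kernels M_k of (σ - λ)^k grow and then stabilize at M^{(λ)}. The matrix A is a hypothetical example chosen only for illustration.

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 0.0, 2.0]])
    lam = 2.0
    N = A - lam * np.eye(3)
    n = A.shape[0]

    for k in range(1, n + 1):
        # dim M_k = n - rank((A - lam*I)^k); the sequence increases, then stays constant.
        rank_k = np.linalg.matrix_rank(np.linalg.matrix_power(N, k))
        print(k, n - rank_k)             # prints 2, 3, 3: here M_2 = M_3 = M^(lambda)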
The first term is zero because α ∈ M_i, and the others are zero because α
is in the kernel of σ - λ_j. Since λ_j - λ_i ≠ 0, it follows that α = 0. This
means that σ - λ_j is non-singular on M_i, that is, σ - λ_j maps M_i onto
itself. Thus M_i is contained in the set of images under (σ - λ_j)^k, and hence
M_i ⊂ W_j.
Theorem 8.3. V = M_1 ⊕ M_2 ⊕ ⋯ ⊕ M_p.
q(σ) maps W onto itself. For arbitrarily large k, [q(σ)]^k also maps W onto
itself. But q(x) contains each factor of the characteristic polynomial f(x),
so that for large enough k, [q(x)]^k is divisible by f(x). This implies that
W = {0}.
Let us return to the situation where, for the single eigenvalue λ, M_k is the
kernel of (σ - λ)^k and W_k = (σ - λ)^k V. In view of Corollary 8.4 we let
s be the smallest index such that M_k = M_s for all k ≥ s. By induction we
can construct a basis {α_1, . . . , α_{m_s}} of M_s such that {α_1, . . . , α_{m_k}} is a basis
of M_k.
We now proceed step by step to modify this basis. The set {α_{m_{s-1}+1}, . . . , α_{m_s}}
consists of those basis elements in M_s which are not in M_{s-1}. These
elements do not have to be replaced, but for consistency of notation we
change their names; let α_{m_{s-1}+ν} = β_{m_{s-1}+ν}. Now set (σ - λ)(β_{m_{s-1}+ν}) =
β_{m_{s-2}+ν} and consider the set {α_1, . . . , α_{m_{s-2}}} ∪ {β_{m_{s-2}+1}, . . . , β_{m_{s-2}+(m_s - m_{s-1})}}.
If a non-trivial linear combination of these β's lay in M_{s-2}, then (σ - λ)^{s-2}
would map this linear combination onto 0. This would mean that (σ - λ)^{s-1}
would map a non-trivial linear combination of {α_{m_{s-1}+1}, . . . , α_{m_s}} onto 0.
We now set (σ - λ)(β_{m_{s-2}+ν}) = β_{m_{s-3}+ν} and proceed as before to obtain
a new basis {α_1, . . . , α_{m_{s-3}}} ∪ {β_{m_{s-3}+1}, . . . , β_{m_{s-2}}} of M_{s-2}.
The general idea is to list each of the first elements from each section of the
β's, then each of the second elements from each section, and continue until
a new ordering of the basis is obtained.
With the basis of M^{(λ)} listed in this order (and assuming for the moment
that M^{(λ)} is all of V) the matrix representing σ takes the form of blocks

    [ λ  1  0  ⋯  0 ]
    [ 0  λ  1  ⋯  0 ]
    [ ⋮           ⋮ ]
    [ 0  0  ⋯  λ  1 ]
    [ 0  0  ⋯  0  λ ]

arranged along the main diagonal, the first block having s rows, with all
elements outside these blocks equal to zero.
Theorem 8.5. Let A be a matrix with characteristic polynomial

    f(x) = (-1)^n (x - λ_1)^{s_1} ⋯ (x - λ_p)^{s_p}.

Then A is similar to a matrix J with submatrices of the form

    B_i = [ λ_i  1   0   ⋯  0  ]
          [ 0   λ_i  1   ⋯  0  ]
          [ ⋮                ⋮ ]
          [ 0    0   ⋯  λ_i  1 ]
          [ 0    0   ⋯   0  λ_i ]

along the main diagonal. All other elements of J are zero. For each λ_i the
number of blocks B_i with λ_i on the diagonal is equal to the geometric
multiplicity of λ_i. The sum of the orders of all the B_i corresponding to λ_i is
equal to the algebraic multiplicity s_i. Although the form of J is not unique,
the number of B_i of each possible order is uniquely determined.
Let us illustrate the workings of the theorems of this section with some
examples. Unfortunately, it is a little difficult to construct an interesting
example of low order. Hence, we give two examples. The first example
illustrates the choice of basis as described for the space M^{(λ)}. The second
example illustrates the situation described by Theorem 8.3.
Example 1. Let

    A = [  1   0  -1   1   0 ]
        [ -4   1  -3   2   1 ]
        [ -2  -1   0   1   1 ]
        [ -3  -1  -3   4   1 ]
        [ -8  -2  -7   5   4 ]
The first step is to obtain the characteristic matrix
    C(x) = [ 1-x    0   -1    1     0  ]
           [ -4    1-x  -3    2     1  ]
           [ -2    -1   -x    1     1  ]
           [ -3    -1   -3   4-x    1  ]
           [ -8    -2   -7    5    4-x ]
Although it is tedious work we can obtain the characteristic polynomial
f(x) = (x - 2)^5. We have one eigenvalue with algebraic multiplicity 5.
    C(2) = [ -1   0  -1   1   0 ]
           [ -4  -1  -3   2   1 ]
           [ -2  -1  -2   1   1 ]
           [ -3  -1  -3   2   1 ]
           [ -8  -2  -7   5   2 ]
From this we learn that there are two linearly independent eigenvectors
corresponding to 2. The dimension of M_1 is 2. Without difficulty we find
the eigenvectors

    α_1 = (0, -1, 1, 1, 0),
    α_2 = (0, 1, 0, 0, 1).
0"
{A - 2/)
2
= -1 0-1 1
-1 0-1 1 o
1 -1 1
lead to a simpler matrix of transition than will others, and there seems to
be no very good way to make the choice that will result in the simplest
matrix of transition. Let us take
α_5 = (0, 0, 0, 1, 0).
We now have the basis {α_1, α_2, α_3, α_4, α_5} such that {α_1, α_2} is a basis
    (A - 2I)(0, 0, 0, 1, 0)^T = (1, 2, 1, 2, 5)^T,
    (A - 2I)(1, 2, 1, 2, 5)^T = (0, 0, 1, 1, 1)^T.
"0
1
-1"
2 1
P= 1 1 1
1 2 1
1 5 1
is the matrix of transition that will transform A to the Jordan normal form
"2 1
0~
2 1
= 2
2 1
_0 2_
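The following is not from the book: a numerical check of Example 1, assuming NumPy. It recovers the orders of the Jordan blocks from the ranks of powers of A - 2I (the number of blocks of order at least k is rank (A - 2I)^{k-1} - rank (A - 2I)^k) and verifies that the matrix of transition P above carries A to the Jordan form.

    import numpy as np

    A = np.array([[ 1,  0, -1,  1,  0],
                  [-4,  1, -3,  2,  1],
                  [-2, -1,  0,  1,  1],
                  [-3, -1, -3,  4,  1],
                  [-8, -2, -7,  5,  4]], dtype=float)
    P = np.array([[0, 1, 0, 0, -1],
                  [0, 2, 0, 1,  0],
                  [1, 1, 0, 0,  1],
                  [1, 2, 1, 0,  0],
                  [1, 5, 0, 1,  0]], dtype=float)

    N = A - 2 * np.eye(5)
    ranks = [np.linalg.matrix_rank(np.linalg.matrix_power(N, k)) for k in range(4)]
    print(ranks)                         # [5, 3, 1, 0]: two blocks, of orders 3 and 2

    J = np.linalg.inv(P) @ A @ P         # reproduces the Jordan normal form above
    print(np.round(J).astype(int))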
Example 2. Let
"5 -1 -3 2 -5"
A = 1 1 1 -2
-1 3 1
1 -1 -1 1 1
"3 -1 -3 2 -5~
C(2) = 1 -1 1 -2
0-1 1 1
1 -1 -1 1 -1_
1 -1 -2~
1 -1
1
0_
α_2 = (2, 1, 0, 0, 1).
~1 -1 -2"
(A - 2I)^2 =
1 -2 -1 2
1 -1 -1 1 -1
~l 0-1 -2"
1 -1 -1
= 1
_0_ _0_
hence, we have β_3 = α_3, β_1 = α_1, and we can choose β_2 = α_2.
have been the subject of much investigation and a great deal of special
terminology has accumulated for them.
For the first time we make use of the fact that the set of linear transforma-
tions can profitably be considered to be a vector space. For finite dimensional
vector spaces the set of linear functionals forms a vector space of the same
dimension, the dual space. We are concerned with the relations between
the structure of a vector space and its dual space, and between the representa-
tions of the various objects in these spaces.
In Chapter V we carry the vector point of view of linear functionals one
step further by mapping them into the original vector space. There is a
certain aesthetic appeal in imposing two separate structures on a single
vector space, and there is value in doing it because it motivates our con-
centration on the aspects of these two structures that either look alike or
are symmetric. For clarity in this chapter, however, we keep these two
structures separate in two different vector spaces.
Bilinear forms are functions of two vector variables which are linear in
each variable separately. A quadratic form is a function of a single vector
variable which is obtained by identifying the two variables in a bilinear
form. Bilinear forms and quadratic forms are intimately tied together,
and this is the principal reason for our treating bilinear forms in detail.
In Chapter VI we give some applications of quadratic forms to physical
problems.
If the field of scalars is the field of complex numbers, then the applications
1 | Linear Functionals
and the vector were in the same copy of F and the product taken to be an
element of U. Thus the concept of a linear functional is not really something
new. It is our familiar linear transformation restricted to a special case.
Linear functionals are so useful, however, that they deserve a special name
and particular study. Linear concepts appear throughout mathematics,
particularly in applied mathematics, and in all cases linear functionals play
mapping defined by (φ + ψ)(α) = φ(α) + ψ(α) for all α ∈ V. For any
a ∈ F, by aφ we mean the mapping defined by (aφ)(α) = a[φ(α)] for all
α ∈ V. We must then show that with these laws for vector addition and
scalar multiplication of linear functionals the axioms of a vector space are
satisfied.
These demonstrations are not difficult and they are left to the reader.
(Remember that proving axioms A1 and B1 are satisfied really requires
showing that φ + ψ and aφ, as defined, are linear functionals.)
We call the vector space of all linear functionals on V the dual or conjugate
space of V and denote it by V̂ (pronounced "vee hat" or "vee caret"). We have
yet to show that V̂ is of dimension n. Let A = {α_1, α_2, . . . , α_n} be a basis of V.
Define φ_i by the rule that for any α = Σ_{j=1}^n a_j α_j ∈ V, φ_i(α) = a_i. We shall
call φ_i the i-th coordinate function.
For any β = Σ_{j=1}^n b_j α_j we have φ_i(β) = b_i, and φ_i(α + β) =
φ_i(Σ_{j=1}^n (a_j + b_j) α_j) = a_i + b_i = φ_i(α) + φ_i(β). Thus φ_i is a linear functional.
Suppose that Σ_{i=1}^n b_i φ_i = 0. Then (Σ_{i=1}^n b_i φ_i)(α) = 0 for all α ∈ V.
In particular, for α_j we have (Σ_{i=1}^n b_i φ_i)(α_j) = Σ_{i=1}^n b_i φ_i(α_j) = b_j = 0. Hence,
all b_i = 0 and the set {φ_1, φ_2, . . . , φ_n} must be linearly independent. On the
other hand, for any φ ∈ V̂ and any α = Σ_{i=1}^n a_i α_i ∈ V, we have
    φ(α) = Σ_{i=1}^n a_i φ(α_i) = Σ_{i=1}^n φ(α_i) φ_i(α).
for all i, j. In the proof of Theorem 1.1 we have shown that a basis satisfying
these conditions exists. For each i, the conditions in Equation (1.3) specify
the values of φ_i on all the vectors in the basis A. Thus φ_i is uniquely determined
as a linear functional. And thus Â is uniquely determined by A and the
conditions (1.3). We call Â the basis dual to the basis A.
    φ(α_j) = Σ_{i=1}^n b_i φ_i(α_j) = b_j,

so that for ξ = Σ_{j=1}^n x_j α_j,

    φ(ξ) = Σ_{j=1}^n b_j x_j = [b_1 ⋯ b_n] X = BX.    (1.4)
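A small numerical illustration (not part of the text), assuming NumPy, of representing a linear functional by the row matrix B of its coordinates, so that its value on a vector is the matrix product BX as in (1.4). The basis and the functional below are arbitrary choices.

    import numpy as np

    # A basis A of R^3, written as the columns of a matrix.
    A = np.array([[1.0, 1.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])

    # A linear functional phi on R^3, given by its values on the standard basis.
    phi = np.array([2.0, -1.0, 3.0])      # phi(v) = 2 v1 - v2 + 3 v3

    B = phi @ A                           # B = [phi(alpha_1) ... phi(alpha_n)]

    X = np.array([1.0, 2.0, -1.0])        # coordinates of a vector xi relative to A
    xi = A @ X                            # the vector itself

    print(phi @ xi, B @ X)                # the two values agree: phi(xi) = BX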
EXERCISES
1. Let A = {α_1, α_2, α_3} be a basis in a 3-dimensional vector space V over R.
Let Â = {φ_1, φ_2, φ_3} be the basis in V̂ dual to A. Any vector ξ ∈ V can be written in
on V are linear functionals. Determine the coordinates of those that are linear
.A.
4. Let V be the space of real functions continuous on the interval [0, 1], and let
    L_g(f) = ∫_0^1 f(t) g(t) dt.
Show that L_g is a linear functional on V. Show that if L_g(f) = 0 for every g ∈ V,
then f = 0.
V̂ dual to the basis A. Show that an arbitrary α ∈ V can be represented in the form

    α = Σ_{i=1}^n φ_i(α) α_i.
6. Let V be a vector space of finite dimension n ≥ 2 over F. Let α and β be two
vectors in V such that {α, β} is linearly independent. Show that there exists a
linear functional φ such that φ(α) = 1 and φ(β) = 0.
7. Let V = P_n, the space of polynomials over F of degree less than n (n ≥ 1). Let
a ∈ F be any scalar. For each p(x) ∈ P_n, p(a) is a scalar. Show that the mapping
8. (Continuation) In Exercise 7 we showed that for each a ∈ F there is defined
a linear functional σ_a ∈ P̂_n. Show that if n > 1, then not every linear functional in
P̂_n can be obtained in this way.
Show that {σ_{a_1}, . . . , σ_{a_n}} is the basis dual to {h_1(x), . . . , h_n(x)}, and that

    p(x) = Σ_{k=1}^n p(a_k) h_k(x).
(Hint: Use Exercise 5.) This formula is known as the Lagrange interpolation
formula. It yields the polynomial of least degree taking on the n specified values
{p(a_1), . . . , p(a_n)} at the points {a_1, . . . , a_n}.
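Outside the text: a brief numerical check of the Lagrange interpolation formula, assuming NumPy. The points a_k and the polynomial p are arbitrary illustrative choices.

    import numpy as np

    a = np.array([0.0, 1.0, 2.0, 4.0])    # distinct points a_1, ..., a_n
    p = np.poly1d([1.0, -2.0, 3.0, 5.0])  # an arbitrary polynomial of degree n - 1

    def h(k, x):
        # The k-th Lagrange polynomial: h_k(a_j) = 1 if j = k and 0 otherwise.
        factors = [(x - a[j]) / (a[k] - a[j]) for j in range(len(a)) if j != k]
        return np.prod(factors, axis=0)

    x = np.linspace(-1.0, 5.0, 7)
    interp = sum(p(a[k]) * h(k, x) for k in range(len(a)))
    print(np.allclose(interp, p(x)))      # True: p(x) = sum_k p(a_k) h_k(x)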
12. Let W be a proper subspace of the n-dimensional vector space V. Let α_0 be
a vector in V but not in W. Show that there is a linear functional φ ∈ V̂ such that
φ(α_0) = 1 and φ(α) = 0 for all α ∈ W.
13. Let W be a proper subspace of the n-dimensional vector space V. Let ψ
15. Let α and β be vectors such that φ(β) = 0 implies φ(α) = 0. Show that α is
a multiple of β.
2 | Duality
Thus Theorem 2.1 says that a finite dimensional vector space is reflexive.
Infinite dimensional vector spaces are not reflexive, but a proof of this
assertion is beyond the scope of this book. Moreover, infinite dimensional
vector spaces of interest have a topological structure in addition to the
algebraic structure we are studying. This additional condition requires
a more restricted definition of a linear functional. With this restriction
the dual space is smaller than our definition permits. Under these conditions
it is again possible for the dual of the dual to be covered by the mapping J.
Since J is onto, we identify V and J(V), and consider V as the space of
linear functionals on V̂. Thus V and V̂ are considered in a symmetrical
position and we speak of them as dual spaces. We also drop the parentheses
from the notation, except when required for grouping, and write φα instead
of φ(α). The bases {α_1, . . . , α_n} and {φ_1, . . . , φ_n} are dual bases if and only
if φ_i α_j = δ_ij.
EXERCISES
1. Let A = {α_1, . . . , α_n} be a basis of V, and let Â = {φ_1, . . . , φ_n} be the basis of
V̂ dual to the basis A. Show that an arbitrary φ ∈ V̂ can be represented in the form
2. Let V be a vector space of finite dimension n ≥ 2 over F. Let φ and ψ be two
linear functionals in V̂ such that {φ, ψ} is linearly independent. Show that there
exists a vector α such that φ(α) = 1 and ψ(α) = 0.
3. Let φ_0 be a linear functional not in the subspace S of the space of linear
functionals V̂. Show that there exists a vector α such that φ_0(α) = 1 and φ(α) = 0
for all φ ∈ S.
3 | Change of Basis
If the basis A' = {α'_1, α'_2, . . . , α'_n} is used instead of the basis A =
{α_1, α_2, . . . , α_n}, we ask how the dual basis Â' = {φ'_1, . . . , φ'_n} is related to
the dual basis Â = {φ_1, . . . , φ_n}. Let P = [p_ij] be the matrix of transition
from the basis A to the basis A'. Thus α'_j = Σ_{i=1}^n p_ij α_i. Since φ_i(α'_j) = p_ij,
matrix of transition from the basis Â' to the basis Â. Hence, (P^T)^{-1} = (P^{-1})^T
is the matrix of transition from Â to Â'.
    φ = Σ_{j=1}^n ( Σ_{i=1}^n b_i p_ij ) φ'_j.    (3.1)
Thus,
B' = BP. (3.2)
    B^T = (P^{-1})^T B'^T = (B'P^{-1})^T,

or

    B = B'P^{-1},    (3.3)
which is equivalent to (3.2). Thus the end result is the same from either
point of view. It is this two-sided aspect of linear functionals which has
made them so important and their study so fruitful.
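A numerical sketch (not from the text), assuming NumPy, contrasting how the coordinates of a vector and of a linear functional change under a matrix of transition P, as in X = PX' and B' = BP. The matrix, functional, and vector are arbitrary choices.

    import numpy as np

    P = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 3.0],
                  [1.0, 0.0, 1.0]])       # matrix of transition (non-singular)

    B = np.array([[1.0, -2.0, 3.0]])      # row of coordinates of a functional phi
    X = np.array([[2.0], [0.0], [-1.0]])  # column of coordinates of a vector xi

    X_new = np.linalg.inv(P) @ X          # X = P X'  =>  X' = P^{-1} X
    B_new = B @ P                         # B' = B P

    # The value phi(xi) = BX does not depend on the choice of basis.
    print((B @ X).item(), (B_new @ X_new).item())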
Example 1. In analytic geometry, a hyperplane passing through the
origin is the set of all points with coordinates (x_1, x_2, . . . , x_n) satisfying an
equation of the form

    Σ_{i=1}^n b_i x_i = 0.

If the coordinates are changed by x_i = Σ_{j=1}^n a_ij y_j, this becomes

    Σ_{i=1}^n b_i ( Σ_{j=1}^n a_ij y_j ) = Σ_{j=1}^n ( Σ_{i=1}^n b_i a_ij ) y_j = Σ_{j=1}^n b'_j y_j = 0.    (3.4)
    dw = (∂w/∂x_1) dx_1 + (∂w/∂x_2) dx_2 + ⋯ + (∂w/∂x_n) dx_n,    (3.5)

and

    ∇w = ( ∂w/∂x_1, ∂w/∂x_2, . . . , ∂w/∂x_n ).

dw is usually called the differential of w, and ∇w is usually called the gradient
of w. It is also customary to call ∇w a vector and to regard dw as a scalar,
approximately a small increment in the value of w.
The difficulty in regarding ∇w as a vector is that its coordinates do not
follow the rules for a change of coordinates of a vector. For example, let
    ξ = Σ_{i=1}^n x_i α_i,    (3.7)

    β_j = Σ_{i=1}^n p_ij α_i,    (3.8)

or

    x_i = Σ_{j=1}^n p_ij y_j.    (3.10)
Let us contrast this with the formulas for changing the coordinates of ∇w.
From the calculus of functions of several variables we know that

    ∂w/∂y_j = Σ_{i=1}^n (∂w/∂x_i)(∂x_i/∂y_j).    (3.11)
This formula corresponds to (3.2). Thus ∇w changes coordinates as if it were
in the dual space.
In vector analysis it is customary to call a vector whose coordinates change
according to formula (3.10) a contravariant vector, and a vector whose
coordinates change according to formula (3.11) a covariant vector.
The reader should verify that if P = [∂x_i/∂y_j], then [∂y_j/∂x_i] = (P^T)^{-1}.
Thus (3.11) is equivalent to the formula

    ∂w/∂x_j = Σ_{i=1}^n (∂w/∂y_i)(∂y_i/∂x_j).    (3.12)
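A small numerical check (not from the text), assuming NumPy, that gradient components follow the covariant rule (3.11), transforming with P^T rather than with P^{-1}. The function w and the matrix P are arbitrary choices.

    import numpy as np

    P = np.array([[2.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])       # old coordinates from new: x = P y

    c = np.array([3.0, -1.0, 2.0])        # w(x) = c . x, so grad_x w = c everywhere

    y = np.array([1.0, 2.0, -1.0])
    x = P @ y

    grad_x = c
    grad_y = P.T @ grad_x                 # rule (3.11): dw/dy_j = sum_i (dw/dx_i)(dx_i/dy_j)

    # Direct check: as a function of y, w is (P^T c) . y, whose gradient is P^T c.
    print(np.allclose(grad_y, P.T @ c), np.isclose(np.dot(c, x), np.dot(P.T @ c, y)))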
    β_j = Σ_{i=1}^n p_ij α_i,    (3.13)

and let

    ψ_i = Σ_{j=1}^n q_ij φ_j.    (3.14)
Then

    δ_ik = ψ_i β_k = Σ_{j=1}^n q_ij p_jk,    (3.15)

so that

    I = QP.    (3.16)
EXERCISES
1. Let A = {(1, 0, . . . , 0), (0, 1, . . . , 0), . . . , (0, 0, . . . , 1)} be a basis of R^n.
The basis of R^n dual to A has the same coordinates. It is of interest to see if there
are other bases of R^n for which the dual basis has exactly the same coordinates.
Let A' be another basis of R^n with matrix of transition P. What condition should
P satisfy in order that the elements of the basis dual to A' have the same coordinates
as the corresponding elements of the basis A'?
be another basis of V (where the coordinates are given in terms of the basis A).
Use the matrix of transition to find the basis Â' dual to A'.
3. Use the matrix of transition to find the basis dual to {(1,0,0), (1, 1,0),
(1,1,1)}.
4. Use the matrix of transition to find the basis dual to {(1, 0, —1), ( — 1, 1, 0),
(0,1,1)}.
5. Let B represent a linear functional φ, and X a vector ξ with respect to dual
bases, so that BX is the value φξ of the linear functional. Let P be the matrix of
transition to a new basis, so that if X' is the new representation of ξ, then X = PX'.
By substituting PX' for X in the expression for the value of φξ, obtain another
proof that BP is the representation of φ in the new dual coordinate system.
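Not part of the exercises: a computational sketch, assuming NumPy, of the method suggested in Exercises 3 and 4. If the columns of P are the new basis vectors (in the coordinates of the standard basis of R^3), the rows of P^{-1} represent the dual basis, since row i of P^{-1} applied to column j of P gives δ_ij.

    import numpy as np

    # The basis of Exercise 3, written as the columns of the matrix of transition P.
    P = np.array([[1.0, 1.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])

    dual = np.linalg.inv(P)                   # each row represents one dual-basis functional
    print(dual)
    print(np.allclose(dual @ P, np.eye(3)))   # True: phi'_i(alpha'_j) = delta_ij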
4 | Annihilators
Definition. Let V be an n-dimensional vector space and V̂ its dual. If, for
an α ∈ V and a φ ∈ V̂, we have φα = 0, we say that φ and α are orthogonal.
Since φ and α are from different vector spaces, it should be clear that we
do not intend to say that φ and α are at "right angles."
Definition. Let W be a subset (not necessarily a subspace) of V. The set of
all linear functionals φ such that φα = 0 for all α ∈ W is called the annihilator
of W, and we denote it by W^⊥. Any φ ∈ W^⊥ is called an annihilator of W.
Suppose W is a subspace of V of dimension p, and let A = {α_1, . . . , α_n}
be a basis of V such that {α_1, . . . , α_p} is a basis of W. Let Â = {φ_1, . . . , φ_n}
be the dual basis of A. For {φ_{p+1}, . . . , φ_n} we see that φ_j α_i = 0 for all
i ≤ p. Hence, {φ_{p+1}, . . . , φ_n} is a subset of the annihilator of W. On the
other hand, if φ = Σ_{i=1}^n b_i φ_i is an annihilator of W, we have φα_i = b_i = 0
for each i ≤ p, so that φ is a linear combination of the set {φ_{p+1}, . . . , φ_n}.
Thus this set spans W^⊥. Hence, {φ_{p+1}, . . . , φ_n} is a basis for W^⊥,
and W^⊥ is of dimension n - p. The dimension of W^⊥ is called the
codimension of W.
It should also be clear from this argument that W is exactly the set of all
α ∈ V annihilated by all φ ∈ W^⊥. Thus we have
proof. If φ is an annihilator of W_1 + W_2, then φ annihilates all α ∈ W_1
and all β ∈ W_2, since W_1 and W_2 are contained in W_1 + W_2. Conversely, if
φ annihilates both W_1 and W_2, then for α ∈ W_1 and β ∈ W_2 we have φα = 0
and φβ = 0. Hence, φ(aα + bβ) = aφα + bφβ = 0, so that φ annihilates
W_1 + W_2. This shows that (W_1 + W_2)^⊥ = W_1^⊥ ∩ W_2^⊥.
The symmetry between the annihilator and the annihilated means that
the second part of the theorem follows immediately from the first. Namely,
since (W_1 + W_2)^⊥ = W_1^⊥ ∩ W_2^⊥, we have, by substituting W_1^⊥ and W_2^⊥
for W_1 and W_2, (W_1^⊥ + W_2^⊥)^⊥ = (W_1^⊥)^⊥ ∩ (W_2^⊥)^⊥ = W_1 ∩ W_2. Hence,
(W_1 ∩ W_2)^⊥ = W_1^⊥ + W_2^⊥.
Now the mechanics for finding the sum of two subspaces is somewhat
simpler than that for finding the intersection. To find the sum we merely
combine the two bases for the two subspaces and then discard dependent
vectors until an independent spanning set for the sum remains. It happens
that to find the intersection W_1 ∩ W_2 it is easier to find W_1^⊥ and W_2^⊥
and then W_1^⊥ + W_2^⊥ and obtain W_1 ∩ W_2 as (W_1^⊥ + W_2^⊥)^⊥, than it is to
find the intersection directly.
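A rough computational sketch (not from the text), assuming NumPy, of this recipe in R^n. Annihilators are computed as null spaces via the singular value decomposition; the two subspaces below are arbitrary illustrative choices.

    import numpy as np

    def annihilator(W, tol=1e-10):
        # Rows spanning {b : b . w = 0 for every row w of W}, i.e. the null space of W.
        W = np.atleast_2d(W)
        _, s, Vt = np.linalg.svd(W)
        rank = int(np.sum(s > tol))
        return Vt[rank:]

    def intersection(W1, W2):
        # Basis (as rows) of W1 ∩ W2, obtained as (W1-perp + W2-perp)-perp.
        summed = np.vstack([annihilator(W1), annihilator(W2)])
        return annihilator(summed)

    W1 = np.array([[1.0, 0.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0, 0.0]])
    W2 = np.array([[1.0, 1.0, 1.0, 1.0],
                   [0.0, 0.0, 1.0, 1.0]])
    print(intersection(W1, W2))           # one row: here W1 ∩ W2 is one-dimensional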
The example in Chapter II-8, page 71, is exactly this process carried out
in detail. In the notation of this discussion Ê_1 = W_1^⊥ and Ê_2 = W_2^⊥.
Let V be a vector space, V̂ the corresponding dual vector space, and let W
be a subspace of V. Since W ⊂ V, is there any simple relation between Ŵ
and V̂? There is a relation, but it is fairly sophisticated. Any function defined
on all of V is certainly defined on any subset. A linear functional φ ∈ V̂,
linear. The kernel of R is the set of all φ ∈ V̂ such that φ(α) = 0 for all
But two vector spaces of the same dimension are isomorphic in many ways.
We have done more than show that Ŵ and V̂/W^⊥ are isomorphic. We have
shown that there is a canonical isomorphism that can be specified in a natural
way independent of any coordinate system. If φ̄ is a residue class in V̂/W^⊥,
and φ is any element of this residue class, then φ̄ and R(φ) correspond under
this natural isomorphism. If η denotes the natural homomorphism of
V̂ onto V̂/W^⊥ and τ denotes the mapping of φ̄ onto R(φ) defined above,
then R = τη, and τ is uniquely determined by R and η and this relation.
EXERCISES
1. (a) Find a basis for the annihilator of W = ⟨(1, 0, -1), (1, -1, 0), (0, 1, -1)⟩.
(b) Find a basis for the annihilator of W = ⟨(1, 1, 1, 1, 1), (1, 0, 1, 0, 1),
(0, 1, 1, 1, 0), (2, 0, 0, 1, 1), (2, 1, 1, 2, 1), (1, -1, -1, -2, 2), (1, 2, 3, 4, -1)⟩. What
are the dimensions of W and W^⊥?
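Not part of the exercise set: a numerical sketch for part (a), assuming NumPy and using the same null-space idea as in the earlier sketch; applying the identical computation to the seven vectors of part (b) gives that annihilator as well.

    import numpy as np

    W = np.array([[1.0, 0.0, -1.0],
                  [1.0, -1.0, 0.0],
                  [0.0, 1.0, -1.0]])      # the spanning vectors of W, as rows

    _, s, Vt = np.linalg.svd(W)
    rank = int(np.sum(s > 1e-10))
    print(rank, Vt[rank:])                # dim W = 2; the single remaining row spans W-perp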