
Coding the Matrix

Linear Algebra through Applications to Computer Science

Edition 1

PHILIP N. KLEIN
Brown University

Newtonian Press, 2013


The companion website is at codingthematrix.com. There you
will find, in digital form, the data, examples, and support code
you need to solve the problems given in the book. Auto-grading
will be provided for many of the problems.
Contents

0 The Function (and other mathematical and computational preliminaries) 1


0.1 Set terminology and notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
0.2 Cartesian product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
0.3 The function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
0.3.1 Functions versus procedures, versus computational problems . . . . . . . 4
0.3.2 The two computational problems related to a function . . . . . . . . . . . 5
0.3.3 Notation for the set of functions with given domain and co-domain . . . . 6
0.3.4 Identity function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
0.3.5 Composition of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
0.3.6 Associativity of function composition . . . . . . . . . . . . . . . . . . . . . 7
0.3.7 Functional inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
0.3.8 Invertibility of the composition of invertible functions . . . . . . . . . . . 10
0.4 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
0.4.1 Probability distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
0.4.2 Events, and adding probabilities . . . . . . . . . . . . . . . . . . . . . . . 14
0.4.3 Applying a function to a random input . . . . . . . . . . . . . . . . . . . 14
0.4.4 Perfect secrecy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
0.4.5 Perfect secrecy and invertible functions . . . . . . . . . . . . . . . . . . . 18
0.5 Lab: Introduction to Python—sets, lists, dictionaries, and comprehensions . . . . 19
0.5.1 Simple expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
0.5.2 Assignment statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
0.5.3 Conditional expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
0.5.4 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
0.5.5 Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
0.5.6 Tuples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
0.5.7 Other things to iterate over . . . . . . . . . . . . . . . . . . . . . . . . . . 34
0.5.8 Dictionaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
0.5.9 Defining one-line procedures . . . . . . . . . . . . . . . . . . . . . . . . . . 40
0.6 Lab: Python—modules and control structures—and inverse index . . . . . . . . . 42
0.6.1 Using existing modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
0.6.2 Creating your own modules . . . . . . . . . . . . . . . . . . . . . . . . . . 43
0.6.3 Loops and conditional statements . . . . . . . . . . . . . . . . . . . . . . . 44
0.6.4 Grouping in Python using indentation . . . . . . . . . . . . . . . . . . . . 45

0.6.5 Breaking out of a loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46


0.6.6 Reading from a file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
0.6.7 Mini-search engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
0.7 Review questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
0.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

1 The Field 51
1.1 Introduction to complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
1.2 Complex numbers in Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
1.3 Abstracting over fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
1.4 Playing with C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
1.4.1 The absolute value of a complex number . . . . . . . . . . . . . . . . . . . 56
1.4.2 Adding complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
1.4.3 Multiplying complex numbers by a positive real number . . . . . . . . . . 59
1.4.4 Multiplying complex numbers by a negative number: rotation by 180 degrees 60
1.4.5 Multiplying by i: rotation by 90 degrees . . . . . . . . . . . . . . . . . . . 61
1.4.6 The unit circle in the complex plane: argument and angle . . . . . . . . . 63
1.4.7 Euler’s formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
1.4.8 Polar representation for complex numbers . . . . . . . . . . . . . . . . . . 66
1.4.9 The First Law of Exponentiation . . . . . . . . . . . . . . . . . . . . . . . 67
1.4.10 Rotation by τ radians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
1.4.11 Combining operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
1.4.12 Beyond two dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
1.5 Playing with GF (2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
1.5.1 Perfect secrecy revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
1.5.2 Network coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
1.6 Review questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
1.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

2 The Vector 79
2.1 What is a vector? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.2 Vectors are functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.2.1 Representation of vectors using Python dictionaries . . . . . . . . . . . . 83
2.2.2 Sparsity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
2.3 What can we represent with vectors? . . . . . . . . . . . . . . . . . . . . . . . . . 84
2.4 Vector addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
2.4.1 Translation and vector addition . . . . . . . . . . . . . . . . . . . . . . . . 86
2.4.2 Vector addition is associative and commutative . . . . . . . . . . . . . . . 87
2.4.3 Vectors as arrows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
2.5 Scalar-vector multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
2.5.1 Scaling arrows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
2.5.2 Associativity of scalar-vector multiplication . . . . . . . . . . . . . . . . . 92
2.5.3 Line segments through the origin . . . . . . . . . . . . . . . . . . . . . . . 92
2.5.4 Lines through the origin . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
2.6 Combining vector addition and scalar multiplication . . . . . . . . . . . . . . . . 94

2.6.1 Line segments and lines that don’t go through the origin . . . . . . . . . . 94
2.6.2 Distributive laws for scalar-vector multiplication and vector addition . . . 96
2.6.3 First look at convex combinations . . . . . . . . . . . . . . . . . . . . . . 97
2.6.4 First look at affine combinations . . . . . . . . . . . . . . . . . . . . . . . 99
2.7 Dictionary-based representations of vectors . . . . . . . . . . . . . . . . . . . . . 99
2.7.1 Setter and getter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
2.7.2 Scalar-vector multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . 101
2.7.3 Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
2.7.4 Vector negative, invertibility of vector addition, and vector subtraction . . 103
2.8 Vectors over GF (2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
2.8.1 Perfect secrecy re-revisited . . . . . . . . . . . . . . . . . . . . . . . . . . 104
2.8.2 All-or-nothing secret-sharing using GF (2) . . . . . . . . . . . . . . . . . . 105
2.8.3 Lights Out . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
2.9 Dot-product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
2.9.1 Total cost or benefit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
2.9.2 Linear equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
2.9.3 Measuring similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
2.9.4 Dot-product of vectors over GF (2) . . . . . . . . . . . . . . . . . . . . . . 119
2.9.5 Parity bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
2.9.6 Simple authentication scheme . . . . . . . . . . . . . . . . . . . . . . . . . 120
2.9.7 Attacking the simple authentication scheme . . . . . . . . . . . . . . . . . 122
2.9.8 Algebraic properties of the dot-product . . . . . . . . . . . . . . . . . . . 123
2.9.9 Attacking the simple authentication scheme, revisited . . . . . . . . . . . 125
2.10 Our implementation of Vec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
2.10.1 Syntax for manipulating Vecs . . . . . . . . . . . . . . . . . . . . . . . . . 126
2.10.2 The implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
2.10.3 Using Vecs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
2.10.4 Printing Vecs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
2.10.5 Copying Vecs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
2.10.6 From list to Vec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
2.11 Solving a triangular system of linear equations . . . . . . . . . . . . . . . . . . . 129
2.11.1 Upper-triangular systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
2.11.2 Backward substitution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
2.11.3 First implementation of backward substitution . . . . . . . . . . . . . . . 131
2.11.4 When does the algorithm work? . . . . . . . . . . . . . . . . . . . . . . . 133
2.11.5 Backward substitution with arbitrary-domain vectors . . . . . . . . . . . 133
2.12 Lab: Comparing voting records using dot-product . . . . . . . . . . . . . . . . . . 134
2.12.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
2.12.2 Reading in the file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
2.12.3 Two ways to use dot-product to compare vectors . . . . . . . . . . . . . . . 135
2.12.4 Policy comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
2.12.5 Not your average Democrat . . . . . . . . . . . . . . . . . . . . . . . . . . 136
2.12.6 Bitter Rivals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
2.12.7 Open-ended study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
2.13 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

2.14 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

3 The Vector Space 143


3.1 Linear combination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
3.1.1 Definition of linear combination . . . . . . . . . . . . . . . . . . . . . . . . 143
3.1.2 Uses of linear combinations . . . . . . . . . . . . . . . . . . . . . . . . . . 144
3.1.3 From coefficients to linear combination . . . . . . . . . . . . . . . . . . . . 146
3.1.4 From linear combination to coefficients . . . . . . . . . . . . . . . . . . . . 147
3.2 Span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
3.2.1 Definition of span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
3.2.2 A system of linear equations implies other equations . . . . . . . . . . . . 149
3.2.3 Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
3.2.4 Linear combinations of linear combinations . . . . . . . . . . . . . . . . . 152
3.2.5 Standard generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
3.3 The geometry of sets of vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
3.3.1 The geometry of the span of vectors over R . . . . . . . . . . . . . . . . . 155
3.3.2 The geometry of solution sets of homogeneous linear systems . . . . . . . 157
3.3.3 The two representations of flats containing the origin . . . . . . . . . . . . 159
3.4 Vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
3.4.1 What’s common to the two representations? . . . . . . . . . . . . . . . . . 160
3.4.2 Definition and examples of vector space . . . . . . . . . . . . . . . . . . . 161
3.4.3 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
3.4.4 *Abstract vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
3.5 Affine spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
3.5.1 Flats that don’t go through the origin . . . . . . . . . . . . . . . . . . . . 164
3.5.2 Affine combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
3.5.3 Affine spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
3.5.4 Representing an affine space as the solution set of a linear system . . . . . 170
3.5.5 The two representations, revisited . . . . . . . . . . . . . . . . . . . . . . 171
3.6 Linear systems, homogeneous and otherwise . . . . . . . . . . . . . . . . . . . . . 176
3.6.1 The homogeneous linear system corresponding to a general linear system 176
3.6.2 Number of solutions revisited . . . . . . . . . . . . . . . . . . . . . . . . . 178
3.6.3 Towards intersecting a plane and a line . . . . . . . . . . . . . . . . . . . 179
3.6.4 Checksum functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
3.7 Review questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
3.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182

4 The Matrix 185


4.1 What is a matrix? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
4.1.1 Traditional matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
4.1.2 The matrix revealed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
4.1.3 Rows, columns, and entries . . . . . . . . . . . . . . . . . . . . . . . . . . 188
4.1.4 Our Python implementation of matrices . . . . . . . . . . . . . . . . . . . 189
4.1.5 Identity matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
4.1.6 Converting between matrix representations . . . . . . . . . . . . . . . . . 190

4.1.7 matutil.py . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191


4.2 Column space and row space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
4.3 Matrices as vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
4.4 Transpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
4.5 Matrix-vector and vector-matrix multiplication in terms of linear combinations . 194
4.5.1 Matrix-vector multiplication in terms of linear combinations . . . . . . . . 194
4.5.2 Vector-matrix multiplication in terms of linear combinations . . . . . . . 195
4.5.3 Formulating expressing a given vector as a linear-combination as a matrix-
vector equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
4.5.4 Solving a matrix-vector equation . . . . . . . . . . . . . . . . . . . . . . . 198
4.6 Matrix-vector multiplication in terms of dot-products . . . . . . . . . . . . . . . 200
4.6.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
4.6.2 Example applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
4.6.3 Formulating a system of linear equations as a matrix-vector equation . . . 204
4.6.4 Triangular systems and triangular matrices . . . . . . . . . . . . . . . . . 206
4.6.5 Algebraic properties of matrix-vector multiplication . . . . . . . . . . . . 207
4.7 Null space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
4.7.1 Homogeneous linear systems and matrix equations . . . . . . . . . . . . . 208
4.7.2 The solution space of a matrix-vector equation . . . . . . . . . . . . . . . 210
4.7.3 Introduction to error-correcting codes . . . . . . . . . . . . . . . . . . . . 211
4.7.4 Linear codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
4.7.5 The Hamming Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
4.8 Computing sparse matrix-vector product . . . . . . . . . . . . . . . . . . . . . . . 213
4.9 The matrix meets the function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
4.9.1 From matrix to function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
4.9.2 From function to matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
4.9.3 Examples of deriving the matrix . . . . . . . . . . . . . . . . . . . . . . . 215
4.10 Linear functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
4.10.1 Which functions can be expressed as a matrix-vector product . . . . . . . 218
4.10.2 Definition and simple examples . . . . . . . . . . . . . . . . . . . . . . . . 219
4.10.3 Linear functions and zero vectors . . . . . . . . . . . . . . . . . . . . . . . 221
4.10.4 What do linear functions have to do with lines? . . . . . . . . . . . . . . . 221
4.10.5 Linear functions that are one-to-one . . . . . . . . . . . . . . . . . . . . . 222
4.10.6 Linear functions that are onto? . . . . . . . . . . . . . . . . . . . . . . . . 223
4.10.7 A linear function from F^C to F^R can be represented by a matrix . . . . . 224
4.10.8 Diagonal matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
4.11 Matrix-matrix multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
4.11.1 Matrix-matrix multiplication in terms of matrix-vector and vector-matrix
multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
4.11.2 Graphs, incidence matrices, and counting paths . . . . . . . . . . . . . . . 229
4.11.3 Matrix-matrix multiplication and function composition . . . . . . . . . . 233
4.11.4 Transpose of matrix-matrix product . . . . . . . . . . . . . . . . . . . . . 236
4.11.5 Column vector and row vector . . . . . . . . . . . . . . . . . . . . . . . . 237
4.11.6 Every vector is interpreted as a column vector . . . . . . . . . . . . . . . 238
4.11.7 Linear combinations of linear combinations revisited . . . . . . . . . . . . 238

4.12 Inner product and outer product . . . . . . . . . . . . . . . . . . . . . . . . . . . 239


4.12.1 Inner product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
4.12.2 Outer product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
4.13 From function inverse to matrix inverse . . . . . . . . . . . . . . . . . . . . . . . 240
4.13.1 The inverse of a linear function is linear . . . . . . . . . . . . . . . . . . . 240
4.13.2 The matrix inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
4.13.3 Uses of matrix inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
4.13.4 The product of invertible matrices is an invertible matrix . . . . . . . . . 244
4.13.5 More about matrix inverse . . . . . . . . . . . . . . . . . . . . . . . . . . 246
4.14 Lab: Error-correcting codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
4.14.1 The check matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
4.14.2 The generator matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
4.14.3 Hamming’s code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
4.14.4 Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
4.14.5 Error syndrome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
4.14.6 Finding the error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
4.14.7 Putting it all together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
4.15 Lab: Transformations in 2D geometry . . . . . . . . . . . . . . . . . . . . . . . . 252
4.15.1 Our representation for points in the plane . . . . . . . . . . . . . . . . . . 252
4.15.2 Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
4.15.3 Image representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
4.15.4 Loading and displaying images . . . . . . . . . . . . . . . . . . . . . . . . 256
4.15.5 Linear transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
4.15.6 Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
4.15.7 Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
4.15.8 Rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
4.15.9 Rotation about a center other than the origin . . . . . . . . . . . . . . . . 258
4.15.10 Reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
4.15.11 Color transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
4.15.12 Reflection more generally . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
4.16 Review questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
4.17 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

5 The Basis 269


5.1 Coordinate systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
5.1.1 René Descartes’ idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
5.1.2 Coordinate representation . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
5.1.3 Coordinate representation and matrix-vector multiplication . . . . . . . . 270
5.2 First look at lossy compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
5.2.1 Strategy 1: Replace vector with closest sparse vector . . . . . . . . . . . . 271
5.2.2 Strategy 2: Represent image vector by its coordinate representation . . . 272
5.3 Two greedy algorithms for finding a set of generators . . . . . . . . . . . . . . . . 274
5.3.1 Grow algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
5.3.2 Shrink algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
5.3.3 When greed fails . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276

5.4 Minimum Spanning Forest and GF (2) . . . . . . . . . . . . . . . . . . . . . . . . 277


5.4.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
5.4.2 The Grow and Shrink algorithms for Minimum Spanning Forest . . . . . . 279
5.4.3 Formulating Minimum Spanning Forest in linear algebra . . . . . . . . . . 280
5.5 Linear dependence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
5.5.1 The Superfluous-Vector Lemma . . . . . . . . . . . . . . . . . . . . . . . . 282
5.5.2 Defining linear dependence . . . . . . . . . . . . . . . . . . . . . . . . . . 283
5.5.3 Linear dependence in Minimum Spanning Forest . . . . . . . . . . . . . . 284
5.5.4 Properties of linear (in)dependence . . . . . . . . . . . . . . . . . . . . . . 285
5.5.5 Analyzing the Grow algorithm . . . . . . . . . . . . . . . . . . . . . . . . 287
5.5.6 Analyzing the Shrink algorithm . . . . . . . . . . . . . . . . . . . . . . . . 287
5.6 Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
5.6.1 Defining basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
5.6.2 The standard basis for F^D . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
5.6.3 Towards showing that every vector space has a basis . . . . . . . . . . . . 292
5.6.4 Any finite set of vectors contains a basis for its span . . . . . . . . . . . . 292
5.6.5 Can any linearly independent subset of vectors belonging to V be extended
to form a basis for V? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
5.7 Unique representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
5.7.1 Uniqueness of representation in terms of a basis . . . . . . . . . . . . . . . 294
5.8 Change of basis, first look . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
5.8.1 The function from representation to vector . . . . . . . . . . . . . . . . . 295
5.8.2 From one representation to another . . . . . . . . . . . . . . . . . . . . . 295
5.9 Perspective rendering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
5.9.1 Points in the world . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
5.9.2 The camera and the image plane . . . . . . . . . . . . . . . . . . . . . . . 297
5.9.3 The camera coordinate system . . . . . . . . . . . . . . . . . . . . . . . . 299
5.9.4 From the camera coordinates of a point in the scene to the camera coordi-
nates of the corresponding point in the image plane . . . . . . . . . . . . 300
5.9.5 From world coordinates to camera coordinates . . . . . . . . . . . . . . . 302
5.9.6 ... to pixel coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
5.10 Computational problems involving finding a basis . . . . . . . . . . . . . . . . . . 304
5.11 The Exchange Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
5.11.1 The lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
5.11.2 Proof of correctness of the Grow algorithm for MSF . . . . . . . . . . . . 306
5.12 Lab: Perspective rectification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
5.12.1 The camera basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
5.12.2 The whiteboard basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
5.12.3 Mapping from pixels to points on the whiteboard . . . . . . . . . . . . . . 310
5.12.4 Mapping a point not on the whiteboard to the corresponding point on the
whiteboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
5.12.5 The change-of-basis matrix . . . . . . . . . . . . . . . . . . . . . . . . . . 312
5.12.6 Computing the change-of-basis matrix . . . . . . . . . . . . . . . . . . . . 313
5.12.7 Image representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
5.12.8 Synthesizing the perspective-free image . . . . . . . . . . . . . . . . . . . . 317

5.13 Review questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319


5.14 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319

6 Dimension 330
6.1 The size of a basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
6.1.1 The Morphing Lemma and its implications . . . . . . . . . . . . . . . . . 330
6.1.2 Proof of the Morphing Lemma . . . . . . . . . . . . . . . . . . . . . . . . 331
6.2 Dimension and rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
6.2.1 Definitions and examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
6.2.2 Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
6.2.3 Dimension and rank in graphs . . . . . . . . . . . . . . . . . . . . . . . . 337
6.2.4 The cardinality of a vector space over GF (2) . . . . . . . . . . . . . . . . 338
6.2.5 Any linearly independent set of vectors belonging to V can be extended to
form a basis for V . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
6.2.6 The Dimension Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
6.2.7 The Grow algorithm terminates . . . . . . . . . . . . . . . . . . . . . . . . 340
6.2.8 The Rank Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
6.2.9 Simple authentication revisited . . . . . . . . . . . . . . . . . . . . . . . . 343
6.3 Direct sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
6.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
6.3.2 Generators for the direct sum . . . . . . . . . . . . . . . . . . . . . . . . . 346
6.3.3 Basis for the direct sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
6.3.4 Unique decomposition of a vector . . . . . . . . . . . . . . . . . . . . . . . 347
6.3.5 Complementary subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
6.4 Dimension and linear functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
6.4.1 Linear function invertibility . . . . . . . . . . . . . . . . . . . . . . . . . . 350
6.4.2 The largest invertible subfunction . . . . . . . . . . . . . . . . . . . . . . 351
6.4.3 The Kernel-Image Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 353
6.4.4 Linear function invertibility, revisited . . . . . . . . . . . . . . . . . . . . 354
6.4.5 The Rank-Nullity Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 355
6.4.6 Checksum problem revisited . . . . . . . . . . . . . . . . . . . . . . . . . . 355
6.4.7 Matrix invertibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
6.4.8 Matrix invertibility and change of basis . . . . . . . . . . . . . . . . . . . 357
6.5 The annihilator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
6.5.1 Conversions between representations . . . . . . . . . . . . . . . . . . . . . 358
6.5.2 The annihilator of a vector space . . . . . . . . . . . . . . . . . . . . . . . 361
6.5.3 The Annihilator Dimension Theorem . . . . . . . . . . . . . . . . . . . . . 363
6.5.4 From generators for V to generators for V^o, and vice versa . . . . . . . . . 363
6.5.5 The Annihilator Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
6.6 Review questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
6.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365

7 Gaussian elimination 374


7.1 Echelon form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
7.1.1 From echelon form to a basis for row space . . . . . . . . . . . . . . . . . 376
7.1.2 Rowlist in echelon form . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
7.1.3 Sorting rows by position of the leftmost nonzero . . . . . . . . . . . . . . 378
7.1.4 Elementary row-addition operations . . . . . . . . . . . . . . . . . . . . . 379
7.1.5 Multiplying by an elementary row-addition matrix . . . . . . . . . . . . . 380
7.1.6 Row-addition operations preserve row space . . . . . . . . . . . . . . . . . 381
7.1.7 Basis, rank, and linear independence through Gaussian elimination . . . . 383
7.1.8 When Gaussian elimination fails . . . . . . . . . . . . . . . . . . . . . . . 383
7.1.9 Pivoting, and numerical analysis . . . . . . . . . . . . . . . . . . . . . . . 384
7.2 Gaussian elimination over GF (2) . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
7.3 Using Gaussian elimination for other problems . . . . . . . . . . . . . . . . . . . 385
7.3.1 There is an invertible matrix M such that MA is in echelon form . . . . . 386
7.3.2 Computing M without matrix multiplications . . . . . . . . . . . . . . . . 387
7.4 Solving a matrix-vector equation using Gaussian elimination . . . . . . . . . . . . 390
7.4.1 Solving a matrix-vector equation when the matrix is in echelon form—the
invertible case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
7.4.2 Coping with zero rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
7.4.3 Coping with irrelevant columns . . . . . . . . . . . . . . . . . . . . . . . . 391
7.4.4 Attacking the simple authentication scheme, and improving it . . . . . . . 392
7.5 Finding a basis for the null space . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
7.6 Factoring integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
7.6.1 First attempt at factoring . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
7.7 Lab: Threshold Secret-Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
7.7.1 First attempt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
7.7.2 Scheme that works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
7.7.3 Implementing the scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
7.7.4 Generating u . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
7.7.5 Finding vectors that satisfy the requirement . . . . . . . . . . . . . . . . . 400
7.7.6 Sharing a string . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
7.8 Lab: Factoring integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
7.8.1 First attempt to use square roots . . . . . . . . . . . . . . . . . . . . . . . 401
7.8.2 Euclid’s algorithm for greatest common divisor . . . . . . . . . . . . . . . 402
7.8.3 Using square roots revisited . . . . . . . . . . . . . . . . . . . . . . . . . . 402
7.9 Review questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409

8 The Inner Product 417


8.1 The fire engine problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
8.1.1 Distance, length, norm, inner product . . . . . . . . . . . . . . . . . . . . 418
8.2 The inner product for vectors over the reals . . . . . . . . . . . . . . . . . . . . . 419
8.2.1 Norms of vectors over the reals . . . . . . . . . . . . . . . . . . . . . . . . 419
8.3 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
8.3.1 Properties of orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . 422
8.3.2 Decomposition of b into parallel and perpendicular components . . . . . . 423

8.3.3 Orthogonality property of the solution to the fire engine problem . . . . . 425
8.3.4 Finding the projection and the closest point . . . . . . . . . . . . . . . . . 426
8.3.5 Solution to the fire engine problem . . . . . . . . . . . . . . . . . . . . . . 427
8.3.6 *Outer product and projection . . . . . . . . . . . . . . . . . . . . . . . . 428
8.3.7 Towards solving the higher-dimensional version . . . . . . . . . . . . . . . 429
8.4 Lab: machine learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
8.4.1 The data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
8.4.2 Supervised learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
8.4.3 Hypothesis class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
8.4.4 Selecting the classifier that minimizes the error on the training data . . . 432
8.4.5 Nonlinear optimization by hill-climbing . . . . . . . . . . . . . . . . . . . . 433
8.4.6 Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
8.4.7 Gradient descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
8.5 Review questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
8.6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438

9 Orthogonalization 440
9.1 Projection orthogonal to multiple vectors . . . . . . . . . . . . . . . . . . . . . . 441
9.1.1 Orthogonal to a set of vectors . . . . . . . . . . . . . . . . . . . . . . . . . 441
9.1.2 Projecting onto and orthogonal to a vector space . . . . . . . . . . . . . . 442
9.1.3 First attempt at projecting orthogonal to a list of vectors . . . . . . . . . 443
9.2 Projecting orthogonal to mutually orthogonal vectors . . . . . . . . . . . . . . . . 445
9.2.1 Proving the correctness of project orthogonal . . . . . . . . . . . . . . 446
9.2.2 Augmenting project orthogonal . . . . . . . . . . . . . . . . . . . . . . 448
9.3 Building an orthogonal set of generators . . . . . . . . . . . . . . . . . . . . . . . 450
9.3.1 The orthogonalize procedure . . . . . . . . . . . . . . . . . . . . . . . . 450
9.3.2 Proving the correctness of orthogonalize . . . . . . . . . . . . . . . . . . 452
9.4 Solving the Computational Problem closest point in the span of many vectors . . 454
9.5 Solving other problems using orthogonalize . . . . . . . . . . . . . . . . . . . . . 455
9.5.1 Computing a basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
9.5.2 Computing a subset basis . . . . . . . . . . . . . . . . . . . . . . . . . . . 456
9.5.3 augmented orthogonalize . . . . . . . . . . . . . . . . . . . . . . . . . . 456
9.5.4 Algorithms that work in the presence of rounding errors . . . . . . . . . . 457
9.6 Orthogonal complement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
9.6.1 Definition of orthogonal complement . . . . . . . . . . . . . . . . . . . . . 457
9.6.2 Orthogonal complement and direct sum . . . . . . . . . . . . . . . . . . . 458
9.6.3 Normal to a plane in R^3 given as span or affine hull . . . . . . . . . . . . 459
9.6.4 Orthogonal complement and null space and annihilator . . . . . . . . . . 460
9.6.5 Normal to a plane in R^3 given by an equation . . . . . . . . . . . . . . . . 460
9.6.6 Computing the orthogonal complement . . . . . . . . . . . . . . . . . . . 461
9.7 The QR factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
9.7.1 Orthogonal and column-orthogonal matrices . . . . . . . . . . . . . . . . . 462
9.7.2 Defining the QR factorization of a matrix . . . . . . . . . . . . . . . . . . 463
9.7.3 Requiring A to have linearly independent columns . . . . . . . . . . . . . . 463
9.8 Using the QR factorization to solve a matrix equation Ax = b . . . . . . . . . . 464

9.8.1 The square case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
9.8.2 Correctness in the square case . . . . . . . . . . . . . . . . . . . . . . . . 465
9.8.3 The least-squares problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
9.8.4 The coordinate representation in terms of the columns of a column-orthogonal
matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
9.8.5 Using QR solve when A has more rows than columns . . . . . . . . . . . 468
9.9 Applications of least squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
9.9.1 Linear regression (Line-fitting) . . . . . . . . . . . . . . . . . . . . . . . . 468
9.9.2 Fitting to a quadratic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
9.9.3 Fitting to a quadratic in two variables . . . . . . . . . . . . . . . . . . . . 470
9.9.4 Coping with approximate data in the industrial espionage problem . . . . 471
9.9.5 Coping with approximate data in the sensor node problem . . . . . . . . 472
9.9.6 Using the method of least squares in the machine-learning problem . . . . 473
9.10 Review questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
9.11 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475

10 Special Bases 483


10.1 Closest k-sparse vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
10.2 Closest vector whose representation with respect to a given basis is k-sparse . . . 484
10.2.1 Finding the coordinate representation in terms of an orthonormal basis . 485
10.2.2 Multiplication by a column-orthogonal matrix preserves norm . . . . . . . 485
10.3 Wavelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487
10.3.1 One-dimensional “images” of different resolutions . . . . . . . . . . . . . . 487
10.3.2 Decomposing V_n as a direct sum . . . . . . . . . . . . . . . . . . . . . . . 489
10.3.3 The wavelet bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
10.3.4 The basis for V_1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
10.3.5 General n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
10.3.6 The first stage of wavelet transformation . . . . . . . . . . . . . . . . . . . 492
10.3.7 The subsequent levels of wavelet decomposition . . . . . . . . . . . . . . . 493
10.3.8 Normalizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
10.3.9 The backward transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
10.3.10 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
10.4 Polynomial evaluation and interpolation . . . . . . . . . . . . . . . . . . . . . . . 496
10.5 Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
10.6 Discrete Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
10.6.1 The Laws of Exponentiation . . . . . . . . . . . . . . . . . . . . . . . . . 501
10.6.2 The n stopwatches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
10.6.3 Discrete Fourier space: Sampling the basis functions . . . . . . . . . . . . 503
10.6.4 The inverse of the Fourier matrix . . . . . . . . . . . . . . . . . . . . . . . 504
10.6.5 The Fast Fourier Transform Algorithm . . . . . . . . . . . . . . . . . . . . 506
10.6.6 Deriving the FFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
10.6.7 Coding the FFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
10.7 The inner product for the field of complex numbers . . . . . . . . . . . . . . . . . 509
10.8 Circulant matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
10.8.1 Multiplying a circulant matrix by a column of the Fourier matrix . . . . . 513

10.8.2 Circulant matrices and change of basis . . . . . . . . . . . . . . . . . . . . 515


10.9 Lab: Using wavelets for compression . . . . . . . . . . . . . . . . . . . . . . . . . 515
10.9.1 Unnormalized forward transform . . . . . . . . . . . . . . . . . . . . . . . 517
10.9.2 Normalization in the forward transform . . . . . . . . . . . . . . . . . . . 518
10.9.3 Compression by suppression . . . . . . . . . . . . . . . . . . . . . . . . . . 519
10.9.4 Unnormalizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
10.9.5 Unnormalized backward transform . . . . . . . . . . . . . . . . . . . . . . 519
10.9.6 Backward transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520
10.9.7 Auxiliary procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520
10.9.8 Two-dimensional wavelet transform . . . . . . . . . . . . . . . . . . . . . 521
10.9.9 Forward two-dimensional transform . . . . . . . . . . . . . . . . . . . . . 522
10.9.10 More auxiliary procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
10.9.11 Two-dimensional backward transform . . . . . . . . . . . . . . . . . . . . . 524
10.9.12 Experimenting with compression of images . . . . . . . . . . . . . . . . . . 525
10.10 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
10.11 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527

11 The Singular Value Decomposition 529


11.1 Approximation of a matrix by a low-rank matrix . . . . . . . . . . . . . . . . . . 529
11.1.1 The benefits of low-rank matrices . . . . . . . . . . . . . . . . . . . . . . . 529
11.1.2 Matrix norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530
11.2 The trolley-line-location problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
11.2.1 Solution to the trolley-line-location problem . . . . . . . . . . . . . . . . . 532
11.2.2 Rank-one approximation to a matrix . . . . . . . . . . . . . . . . . . . . . 536
11.2.3 The best rank-one approximation . . . . . . . . . . . . . . . . . . . . . . . 536
11.2.4 An expression for the best rank-one approximation . . . . . . . . . . . . . 537
11.2.5 The closest one-dimensional affine space . . . . . . . . . . . . . . . . . . . 539
11.3 Closest dimension-k vector space . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
11.3.1 A Gedanken algorithm to find the singular values and vectors . . . . . . . 540
11.3.2 Properties of the singular values and right singular vectors . . . . . . . . 541
11.3.3 The singular value decomposition . . . . . . . . . . . . . . . . . . . . . . . 542
11.3.4 Using right singular vectors to find the closest k-dimensional space . . . . 544
11.3.5 Best rank-k approximation to A . . . . . . . . . . . . . . . . . . . . . . . 547
11.3.6 Matrix form for best rank-k approximation . . . . . . . . . . . . . . . . . 548
11.3.7 Number of nonzero singular values is rank A . . . . . . . . . . . . . . . . 548
11.3.8 Numerical rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
11.3.9 Closest k-dimensional affine space . . . . . . . . . . . . . . . . . . . . . . 549
11.3.10 Proof that U is column-orthogonal . . . . . . . . . . . . . . . . . . . . . . 550
11.4 Using the singular value decomposition . . . . . . . . . . . . . . . . . . . . . . . . 551
11.4.1 Using SVD to do least squares . . . . . . . . . . . . . . . . . . . . . . . . 552
11.5 PCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552
11.6 Lab: Eigenfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
11.7 Review questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556
11.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556

12 The Eigenvector 560


12.1 Modeling discrete dynamic processes . . . . . . . . . . . . . . . . . . . . . . . . . 560
12.1.1 Two interest-bearing accounts . . . . . . . . . . . . . . . . . . . . . . . . . 560
12.1.2 Fibonacci numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561
12.2 Diagonalization of the Fibonacci matrix . . . . . . . . . . . . . . . . . . . . . . . 564
12.3 Eigenvalues and eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565
12.3.1 Similarity and diagonalizability . . . . . . . . . . . . . . . . . . . . . . . . 567
12.4 Coordinate representation in terms of eigenvectors . . . . . . . . . . . . . . . . . 569
12.5 The Internet worm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570
12.6 Existence of eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572
12.6.1 Positive-definite and positive-semidefinite matrices . . . . . . . . . . . . . 572
12.6.2 Matrices with distinct eigenvalues . . . . . . . . . . . . . . . . . . . . . . 573
12.6.3 Symmetric matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574
12.6.4 Upper-triangular matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 575
12.6.5 General square matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576
12.7 Power method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577
12.8 Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578
12.8.1 Modeling population movement . . . . . . . . . . . . . . . . . . . . . . . . 578
12.8.2 Modeling Randy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579
12.8.3 Markov chain definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 580
12.8.4 Modeling spatial locality in memory fetches . . . . . . . . . . . . . . . . . 580
12.8.5 Modeling documents: Hamlet in Wonderland . . . . . . . . . . . . . . . . 581
12.8.6 Modeling lots of other stuff . . . . . . . . . . . . . . . . . . . . . . . . . . 582
12.8.7 Stationary distributions of Markov chains . . . . . . . . . . . . . . . . . . 583
12.8.8 Sufficient condition for existence of a stationary distribution . . . . . . . . 583
12.9 Modeling a web surfer: PageRank . . . . . . . . . . . . . . . . . . . . . . . . . . 583
12.10 *The determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584
12.10.1 Areas of parallelograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584
12.10.2 Volumes of parallelepipeds . . . . . . . . . . . . . . . . . . . . . . . . . . . 586
12.10.3 Expressing the area of a polygon in terms of areas of parallelograms . . . 587
12.10.4 The determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 590
12.10.5 *Characterizing eigenvalues via the determinant function . . . . . . . . . 592
12.11 *Proofs of some eigentheorems . . . . . . . . . . . . . . . . . . . . . . . . . . 593
12.11.1 Existence of eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593
12.11.2 Diagonalization of symmetric matrices . . . . . . . . . . . . . . . . . . . . 594
12.11.3 Triangularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596
12.12 Lab: PageRank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
12.12.1 Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
12.12.2 Working with a Big Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . 601
12.12.3 Implementing PageRank Using The Power Method . . . . . . . . . . . . . 602
12.12.4 The Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604
12.12.5 Handling queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604
12.12.6 Biasing the pagerank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
12.12.7 Optional: Handling multiword queries . . . . . . . . . . . . . . . . . . . . 606
12.13 Review questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606

12.14 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607

13 The Linear Program 612


13.1 The diet problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 612
13.2 Formulating the diet problem as a linear program . . . . . . . . . . . . . . . . . . 613
13.3 The origins of linear programming . . . . . . . . . . . . . . . . . . . . . . . . . . 614
13.3.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
13.3.2 Linear programming in different forms . . . . . . . . . . . . . . . . . . . . 615
13.3.3 Integer linear programming . . . . . . . . . . . . . . . . . . . . . . . . . . 616
13.4 Geometry of linear programming: polyhedra and vertices . . . . . . . . . . . . . 616
13.5 There is an optimal solution that is a vertex of the polyhedron . . . . . . . . . . 620
13.6 An enumerative algorithm for linear programming . . . . . . . . . . . . . . . . . 620
13.7 Introduction to linear-programming duality . . . . . . . . . . . . . . . . . . . . . 621
13.8 The simplex algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 623
13.8.1 Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624
13.8.2 Representing the current solution . . . . . . . . . . . . . . . . . . . . . . . 624
13.8.3 A pivot step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 625
13.8.4 Simple example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 627
13.9 Finding a vertex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 630
13.10 Game theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632
13.11 Formulation as a linear program . . . . . . . . . . . . . . . . . . . . . . . . . 635
13.12 Nonzero-sum games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 636
13.13 Lab: Learning through linear programming . . . . . . . . . . . . . . . . . . . . 636
13.13.1 Reading in the training data . . . . . . . . . . . . . . . . . . . . . . . . . . 637
13.13.2 Setting up the linear program . . . . . . . . . . . . . . . . . . . . . . . . . 638
13.13.3 Main constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639
13.13.4 Nonnegativity constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . 640
13.13.5 The matrix A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640
13.13.6 The right-hand side vector b . . . . . . . . . . . . . . . . . . . . . . . . . 640
13.13.7 The objective function vector c . . . . . . . . . . . . . . . . . . . . . . . . 641
13.13.8 Putting it together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641
13.13.9 Finding a vertex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641
13.13.10 Solving the linear program . . . . . . . . . . . . . . . . . . . . . . . . . . 641
13.13.11 Using the result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641
13.14 Compressed sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 642
13.14.1 More quickly acquiring an MRI image . . . . . . . . . . . . . . . . . . . . 642
13.14.2 Computation to the rescue . . . . . . . . . . . . . . . . . . . . . . . . . . 643
13.14.3 Onwards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644
13.15 Review questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644
13.16 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644
Introduction

Tourist on Fifty-Seventh Street, Manhattan: “Pardon me—could you tell me how to get to Carnegie Hall?”
Native New Yorker: “Practice, practice!”

There’s a scene in the movie The Matrix in which Neo is strapped in a chair and Morpheus
inserts into a machine what looks like a seventies-era videotape cartridge. As the tape plays,
knowledge of how to fight streams into Neo’s brain. After a very short time, he has become an
expert.
I would be delighted if I could strap my students into chairs and quickly stream knowledge
of linear algebra into their brain, but brains don’t learn that way. The input device is rarely the
bottleneck. Students need lots of practice—but what kind of practice?
No doubt students need to practice the basic numerical calculations, such as matrix-matrix
multiplication, that underlie elementary linear algebra and that seem to fill all their time in
traditional cookbook-style courses on linear algebra. No doubt students need to find proofs and
counterexamples to exercise their understanding of the abstract concepts of which linear algebra
is constructed.
However, they also need practice in using linear algebra to think about problems in other
domains, and in actually using linear-algebra computations to address these problems. These
are the skills they most need from a linear-algebra class when they go on to study other topics
such as graphics and machine learning. This book is aimed at students of computer science; such
students are best served by seeing applications from their field because these are the applications
that will be most meaningful for them.
Moreover, a linear-algebra instructor whose pupils are students of computer science has a
special advantage: her students are computationally sophisticated. They have a learning modal-
ity that most students don’t—they can learn through reading, writing, debugging, and using
computer programs.
For example, there are several ways of writing a program for matrix-vector or matrix-matrix
multiplication, each providing its own kernel of insight into the meaning of the operation—and
the experience of writing such programs is more effective in conveying this meaning and cementing
the relationships between the operations than spending the equivalent time carrying out hand
calculations.
Computational sophistication also helps students in the more abstract, mathematical aspects
of linear algebra. Acquaintance with object-oriented programming helps a student grasp the
notion of a field—a set of values together with operations on them. Acquaintance with subtyping
prepares a student to understand that some vector spaces are inner product spaces. Familiarity
with loops or recursion helps a student understand procedural proofs, e.g. of the existence of a
basis or an orthogonal basis.
Computational thinking is the term suggested by Jeannette Wing, former head of the National
Science Foundation’s directorate on Computer and Information Science and Engineering, to refer
to the skills and concepts that a student of computer science can bring to bear. For this book,
computational thinking is the road to mastering elementary linear algebra.

Companion website
The companion website is at codingthematrix.com. There you will find, in digital form, the
data, examples, and support code you need to solve the problems given in the book.

Intended audience
This book is accessible to a student who is an experienced programmer. Most students who take
my course have had at least two semesters of introductory computer science, or have previously
learned programming on their own. In addition, it is desirable that the student has had some
exposure (in prior semesters or concurrently) to proof techniques such as are studied in a Discrete
Math course.
The student’s prior programming experience can be in pretty much any programming lan-
guage; this book uses Python, and the first two labs are devoted to bringing the student up to
speed in Python programming. Moreover, the programs we write in this book are not particularly
sophisticated. For example, we provide stencil code that obviates the need for the student to
have studied object-oriented programming.
Some sections of the text, marked with *, provide supplementary mathematical material but
are not crucial for the reader’s understanding.

Labs
An important part of the book is the labs. For each chapter, there is a lab assignment in which
the student is expected to write small programs and use some modules we provide, generally
to carry out a task or series of tasks related to an application of the concepts recently covered
or about to be covered. Doing the labs “keeps it real”, grounding the student’s study of linear
algebra in getting something done, something meaningful in its own right but also illustrative of
the concepts.
In my course, there is a lab section each week, a two-hour period in which the students carry
out the lab assignment. Course staff are available during this period, not to supervise but to assist
when necessary. The goal is to help the students move through the lab assignment efficiently, and
to get them unstuck when they encounter obstacles. The students are expected to have prepared
by reviewing the previous week’s course material and reading through the lab assignment.
Most students experience the labs as the most fun part of the course—it is where they
discover the power of the knowledge they are acquiring to help them accomplish something that
has meaning in the world of computer science.

Programming language
The book uses Python, not a programming language with built-in support for vectors and matri-
ces. This gives students the opportunity to build vectors and matrices out of the data structures
that Python does provide. Using their own implementations of vector and matrix provides trans-
parency. Python does provide complex numbers, sets, lists (sequences), and dictionaries (which
we use for representing functions). In addition, Python provides comprehensions, expressions
that create sets, lists, or dictionaries using a simple and powerful syntax that resembles the
mathematical notation for defining sets. Using this syntax, many of the procedures we write
require only a single line of code.
Students are not expected to know Python at the beginning; the first two labs form an
introduction to Python, and the examples throughout the text reinforce the ideas.

Vector and Matrix representations


The traditional concrete representation for a vector is as a sequence of field elements. This book
uses that representation but also uses another, especially in Python programs: a vector as a
function mapping a finite set D to a field. Similarly, the traditional representation for a matrix
is as a two-dimensional array or grid of field elements. We use this representation but also use
another: a matrix as a function from the Cartesian product R × C of two finite sets to a field.
These more general representations allow the vectors and matrices to be more directly con-
nected to the application. For example, it is traditional in information retrieval to represent a
document as a vector in which, for each word, the vector specifies the number of occurrences of
the word in the document. In this book, we define such a vector as a function from the domain
D of English words to the set of real numbers. Another example: when representing, say, a
1024 × 768 black-and-white image as a vector, we define the vector as a function from the domain
D = {1, . . . , 1024} × {1, . . . , 768} to the real numbers. The function specifies, for each pixel (i, j),
the image intensity of that pixel.
From the programmer’s perspective, it is certainly more convenient to directly index vectors
by strings (in the case of words) or tuples (in the case of pixels). However, a more important
advantage is this: having to choose a domain D for vectors gets us thinking about the application
from the vector perspective.
Another advantage is analogous to that of type-checking in programs or unit-checking in
physical calculations. For an R × C matrix A, the matrix-vector product Ax is only legal if x is
a C-vector; the matrix-matrix product AB is only legal if C is the set of row-labels of B. These
constraints further reinforce the meanings of the operations.
Finally, allowing arbitrary finite sets (not just sequences of consecutive integers) to label the
elements helps make it clear that the order of elements in a vector or matrix is not always (or
even often) significant.

Fundamental Questions
The book is driven not just by applications but also by fundamental questions and computational
problems that arise in studying these applications. Here are some of the fundamental questions:

• How can we tell whether a solution to a linear system is unique?


• How can we find the number of solutions to a linear system over GF (2)?
• How can we tell if a set V of vectors is equal to the span of vectors v1 , . . . , vn ?

• For a system of linear equations, what other linear equations are implied?

• How can we tell if a matrix is invertible?


• Can every vector space be represented as the solution set of a homogeneous linear system?

Fundamental Computational Problems


There are a few computational problems that are central to linear algebra. In the book, these
arise in a variety of forms as we examine various applications, and we explore the connections
between them. Here are examples:

• Find the solution to a matrix equation M x = b.


• Find the vector x minimizing the distance between M x and b.

• Given vector b, find the closest vector to b whose representation in a given basis is k-sparse.

• Find the solution to a matrix inequality M x ≤ b.

• Given a matrix M , find the closest matrix to M whose rank is at most k.

Multiple representations
The most important theme of this book is the idea of multiple different representations for the
same object. This theme should be familiar to computer scientists. In linear algebra, it arises
again and again:
• Representing a vector space by generators or by homogeneous linear equations.
• Different bases for the same vector space.
• Different data structures used to represent a vector or a matrix.

• Different decompositions of a matrix.



Multiple fields
In order to illustrate the generality of the ideas of linear algebra and in order to address a broader
range of applications, the book deals with three different fields: the real numbers, the complex
numbers, and the finite field GF (2). Most examples are over the real numbers because they are
most familiar to the reader. The complex numbers serve as a warm-up for vectors since they
can be used to represent points in the plane and transformations on these points. The complex
numbers also come up in the discussion of the finite Fourier transform and in eigenvalues. The
finite field GF (2) comes up in many applications involving information, such as encryption,
authentication, checksums, network coding, secret-sharing, and error-correcting codes.
The multiple fields also help to illustrate the idea of an inner-product space. There is a very simple
inner product for vectors over the reals; there is a slightly more complicated inner product for
vectors over the complex numbers; and there is no inner-product for vectors over a finite field.

Acknowledgements
Thanks to my Brown University colleague John F. Hughes, computer scientist and recovering
mathematician. I have learned a great deal from our conversations, and this book owes much to
him.
Thanks to my Brown University colleague Dan Abramovich, mathematician, who has shared
his insights on the tradeoffs between abstraction and simplicity of presentation.
Thanks to the students who have served as teaching assistants for the Brown University
course on which this book is based and who have helped prepare some of the problems and labs,
especially Sarah Meikeljohn, Shay Mozes, Olga Ohrimenko, Matthew Malin, Alexandra Berke,
Anson Rosenthal, and Eli Fox-Epstein.
Thanks to Rosemary Simpson for her work on the index for this book.
Thanks to the creator of xkcd, Randall Munroe, for giving permission to include some of his
work in this book.
Thanks, finally, to my family for their support and understanding.
Chapter 0

The Function (and other


mathematical and computational
preliminaries)

Later generations will regard


Mengenlehre [set theory] as a disease
from which one has recovered.
attributed to Poincaré

The basic mathematical concepts that inform our study of vectors and matrices are sets,
sequences (lists), functions, and probability theory.
This chapter also includes an introduction to Python, the programming language we use to (i)
model the mathematical objects of interest, (ii) write computational procedures, and (iii) carry
out data analyses.

0.1 Set terminology and notation


The reader is likely to be familiar with the idea of a set, a collection of mathematical objects in
which each object is considered to occur at most once. The objects belonging to a set are its
elements. We use curly braces to indicate a set specified by explicitly enumerating its elements.
For example, {♥, ♠, ♣, ♦} is the set of suits in a traditional deck of cards. The order in which
elements are listed is not significant; a set imposes no order among its elements.
The symbol ∈ is used to indicate that an object belongs to a set (equivalently, that the set
contains the object). For example, ♥ ∈ {♥, ♠, ♣, ♦}.
One set S1 is contained in another set S2 (written S1 ⊆ S2 ) if every element of S1 belongs
to S2 . Two sets are equal if they contain exactly the same elements. A convenient way to prove
that two sets are equal consists of two steps: (1) prove the first set is contained in the second,
and (2) prove the second is contained in the first.


A set can be infinite. In Chapter 1, we discuss the set R, which consists of all real numbers,
and the set C, which consists of all complex numbers.
If a set S is not infinite, we use |S| to denote its cardinality, the number of elements it contains.
For example, the set of suits has cardinality 4.

0.2 Cartesian product


One from column A, one from column B.

The Cartesian product of two sets A and B is the set of all pairs (a, b) where a ∈ A and b ∈ B.

Example 0.2.1: For the sets A = {1, 2, 3} and B = {♥, ♠, ♣, ♦}, the Cartesian product is

{(1, ♥), (2, ♥), (3, ♥), (1, ♠), (2, ♠), (3, ♠), (1, ♣), (2, ♣), (3, ♣), (1, ♦), (2, ♦), (3, ♦)}

Quiz 0.2.2: What is the cardinality of A × B in Example 0.2.1 (Page 2)?

Answer

|A × B| = 12.

Proposition 0.2.3: For finite sets A and B, |A × B| = |A| · |B|.

Quiz 0.2.4: What is the cardinality of {1, 2, 3, . . . , 10, J, Q, K} × {♥, ♠, ♣, ♦}?

Answer

We use Proposition 0.2.3. The cardinality of the first set is 13, and the cardinality of the
second set is 4, so the cardinality of the Cartesian product is 13 · 4, which is 52.

The Cartesian product is named for René Descartes, whom we shall discuss in Chapter 6.
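As a small preview of the lab at the end of this chapter, the 52-card deck of Quiz 0.2.4 can be built in Python with a set comprehension; the following is only an illustrative sketch, and the variable names are ours:

>>> ranks = {1,2,3,4,5,6,7,8,9,10,'J','Q','K'}
>>> suits = {'♥','♠','♣','♦'}
>>> deck = {(r,s) for r in ranks for s in suits}
>>> len(deck)
52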

0.3 The function


Mathematicians never die—they just lose function.

Loosely speaking, a function is a rule that, for each element in some set D of possible inputs,
assigns a possible output. The output is said to be the image of the input under the function
and the input is a pre-image of the output. The set D of possible inputs is called the domain of
the function.
Formally, a function is a (possibly infinite) set of pairs (a, b) no two of which share the same
first entry.

Example 0.3.1: The doubling function with domain {1, 2, 3, . . .} is

{(1, 2), (2, 4), (3, 6), (4, 8), . . .}
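In Python, which we use throughout the book, a function with a finite domain can be represented by a dictionary. Here is a minimal sketch, not from the text, of the doubling function restricted to the domain {1, 2, 3, 4}:

>>> double = {1:2, 2:4, 3:6, 4:8}
>>> double[3]
6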

The domain can itself consist of pairs of numbers.

Example 0.3.2: The multiplication function with domain {1, 2, 3, . . .} × {1, 2, 3, . . .} looks
something like this:

{((1, 1), 1), ((1, 2), 2), . . . , ((2, 1), 2), ((2, 2), 4), ((2, 3), 6), . . .}

For a function named f , the image of q under f is denoted by f (q). If r = f (q), we say that
q maps to r under f . The notation for “q maps to r” is q ↦ r. (This notation omits specifying
the function; it is useful when there is no ambiguity about which function is intended.)
It is convenient when specifying a function to specify a co-domain for the function. The
co-domain is a set from which the function’s output values are chosen. Note that one has some
leeway in choosing the co-domain since not all of its members need be outputs.
The notation
f : D −→ F
means that f is a function whose domain is the set D and whose co-domain (the set of possible
outputs) is the set F . (More briefly: “a function from D to F ”, or “a function that maps D to
F .”)

Example 0.3.3: Caesar was said to have used a cryptosystem in which each letter was replaced
with the one three steps forward in the alphabet (wrapping around for X,Y, and Z).a Thus the
plaintext MATRIX would be encrypted as the cyphertext PDWULA. The function that maps
each plaintext letter to its cyphertext replacement could be written as

A ↦ D, B ↦ E, C ↦ F, D ↦ G, . . . , W ↦ Z, X ↦ A, Y ↦ B, Z ↦ C

This function’s domain and co-domain are both the alphabet {A, B, . . . , Z}.
a Some imaginary historians have conjectured that Caesar’s assassination can be attributed to his use of

such a weak cryptosystem.

Example 0.3.4: The cosine function, cos, maps from the set of real numbers (indicated by R)
to the set of real numbers. We would therefore write

cos : R −→ R

Of course, the outputs of the cos function do not include all real numbers, only those between -1
and 1.

The image of a function f is the set of images of all domain elements. That is, the image of
f is the set of elements of the co-domain that actually occur as outputs. For example, the image
of Caesar’s encryption function is the entire alphabet, and the image of the cosine function is the
set of numbers between -1 and 1.

Example 0.3.5: Consider the function prod that takes as input a pair of integers greater than
1 and outputs their product. The domain (set of inputs) is the set of pairs of integers greater
than 1. We choose to define the co-domain to be the set of all integers greater than 1. The
image of the function, however, is the set of composite integers since no domain element maps
to a prime number.

0.3.1 Functions versus procedures, versus computational problems


There are two other concepts that are closely related to functions and that enter into our story,
and we must take some care to distinguish them.

• A procedure is a precise description of a computation; it accepts inputs (called arguments)


and produces an output (called the return value).

Example 0.3.6: This example illustrates the Python syntax for defining procedures:

def mul(p,q): return p*q

In the hope of avoiding confusion, we diverge from the common practice of referring to
procedures as “functions”.
• A computational problem is an input-output specification that a procedure might be re-
quired to satisfy.

Example 0.3.7: – input: a pair (p, q) of integers greater than 1


– output: the product pq

Example 0.3.8:
– input: an integer m greater than 1
– output: a pair (p, q) of integers whose product is m

How do these concepts differ from one another?


• Unlike a procedure, a function or computational problem does not give us any idea how
to compute the output from the input. There are often many different procedures that
satisfy the same input-output specification or that implement the same function. For
integer multiplication, there is ordinary long multiplication (you learned this in elementary
school), the Karatsuba algorithm (used by Python for long-integer multiplication), the
faster Schönhage-Strassen algorithm (which uses the Fast Fourier Transform, discussed in
Chapter 10), and the even faster Fürer algorithm, which was discovered in 2007.
• Sometimes the same procedure can be used for different functions. For example, the Python
procedure mul can be used for multiplying negative integers and numbers that are not
integers.
• Unlike a function, a computational problem need not specify a unique output for every
input; for Example 0.3.8 (Page 4), if the input is 12, the output could be (2, 6) or (3, 4) or
(4, 3) or (6, 2).

0.3.2 The two computational problems related to a function


All the king’s horses and all the king’s men
Couldn’t put Humpty together again.

Although function and computational problem are defined differently, they are clearly related.
For each function f , there is a corresponding computational problem:
The forward problem: Given an element a of f ’s domain, compute f (a), the image of a under f .
Example 0.3.7 (Page 4) is the computational problem that corresponds in this sense to the
function defined in Example 0.3.2 (Page 3).
However, there is another computational problem associated with a function:
The backward problem: Given an element r of the co-domain of the function, compute any
pre-image (or report that none exists).
How very different are these two computational problems? Suppose there is a procedure
P (x) for computing the image under f of any element of the domain. An obvious procedure for
computing the pre-image of r is to iterate through each of the domain elements q, and, one by
one, apply the procedure P (x) on q to see if the output matches r.
This approach seems ridiculously profligate—even if the domain is finite, it might be so large
that the time required for solving the pre-image problem would be much more than that for
P (x)—and yet there is no better approach that works for all functions.
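Here is a minimal sketch of that brute-force search; the procedure name preimage and the small example domain are illustrative, not part of the text:

def preimage(P, domain, r):
    # Search the whole domain for an element whose image under P matches r.
    for q in domain:
        if P(q) == r: return q
    return None   # no pre-image exists

# Example: factoring 12 by brute force over a small domain of pairs.
pairs = [(p,q) for p in range(2,13) for q in range(2,13)]
print(preimage(lambda pq: pq[0]*pq[1], pairs, 12))   # (2, 6)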
Indeed, consider Example 0.3.7 (Page 4) (integer multiplication) and Example 0.3.8 (Page
4) (integer factoring). The fact that integer multiplication is computationally easy while integer
factoring is computationally difficult is in fact the basis for the security of the RSA cryptosystem,
which is at the heart of secure commerce over the world-wide web.
And yet, as we will see in this book, finding pre-images can be quite useful. What is one to
do?
In this context, the generality of the concept of function is also a weakness. To misquote
Spiderman,

With great generality comes great computational difficulty.


This principle suggests that we consider the pre-image problem not for arbitrary functions but
for specific families of functions. Yet here too there is a risk. If the family of functions is
too restrictive, the existence of fast procedures for solving the pre-image problem will have
no relevance to real-world problems. We must navigate between the Scylla of computational
intractability and the Charybdis of inapplicability.
In linear algebra, we will discover a sweet spot. The family of linear functions, which are
introduced in Chapter 4, manage to model enough of the world to be immensely useful. At the
same time, the pre-image problem can be solved for such functions.

0.3.3 Notation for the set of functions with given domain and co-
domain
For sets D and F , we use the notation F^D to denote the set of all functions from D to F . For example,
the set of functions from the set W of words to the set R of real numbers is denoted R^W .
This notation derives from a mathematical “pun”:

Fact 0.3.9: For any finite sets D and F , |F^D| = |F|^|D| .

0.3.4 Identity function


For any domain D, there is a function idD : D −→ D called the identity function for D, defined
by
idD (d) = d
for every d ∈ D.

0.3.5 Composition of functions


The operation functional composition combines two functions to get a new function. We will later
define matrix multiplication in terms of functional composition. Given two functions f : A −→ B
and g : B −→ C, the function g ◦f , called the composition of g and f , is a function whose domain
is A and its co-domain is C. It is defined by the rule

(g ◦ f )(x) = g(f (x))

for every x ∈ A.
If the image of f is not contained in the domain of g then g ◦ f is not a legal expression.

Example 0.3.10: Say the domain and co-domains of f and g are R, and f (x) = x + 1 and
g(y) = y^2 . Then g ◦ f (x) = (x + 1)^2 .
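A small sketch, not from the text, of Example 0.3.10 in Python; the helper compose is our own name for a procedure that forms g ◦ f :

def compose(g, f):
    # Return a procedure that computes g(f(x)).
    return lambda x: g(f(x))

f = lambda x: x + 1
g = lambda y: y*y
print(compose(g, f)(3))   # (3 + 1)**2, which is 16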

Figure 1: This figure represents the composition of the functions f, g, h. Each function is repre-
sented by arrows from circles representing its domain to circles representing its co-domain. The
composition of the three functions is represented by following three arrows.

Example 0.3.11: Define the function

f : {A, B, C, . . . , Z} −→ {0, 1, 2, . . . , 25}

by
A ↦ 0, B ↦ 1, C ↦ 2, · · · , Z ↦ 25
Define the function g as follows. The domain and co-domain of g are both the set {0, 1, 2, . . . , 25},
and g(x) = (x + 3) mod 26. For a third function h, the domain is {0, . . . , 25} and the co-domain
is {A, . . . , Z}, and 0 ↦ A, 1 ↦ B, etc. Then h ◦ (g ◦ f ) is a function that implements the Caesar
cypher as described in Example 0.3.3 (Page 3).
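Since the three functions of this example have finite domains, they can be written as Python dictionaries and composed with a dictionary comprehension. The following sketch is illustrative only; the variable names are not from the text:

alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
f = {letter: i for i, letter in enumerate(alphabet)}   # A -> 0, ..., Z -> 25
g = {i: (i+3) % 26 for i in range(26)}                 # advance three positions
h = {i: letter for i, letter in enumerate(alphabet)}   # 0 -> A, ..., 25 -> Z
caesar = {x: h[g[f[x]]] for x in f}                    # h ◦ (g ◦ f)
print(''.join(caesar[c] for c in 'MATRIX'))            # PDWULA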

For building intuition, we can use a diagram to represent composition of functions with finite
domains and co-domains. Figure 1 depicts the three functions of Example 0.3.11 (Page 7) being
composed.

0.3.6 Associativity of function composition


Next we show that composition of functions is associative:

Proposition 0.3.12 (Associativity of composition): For functions f, g, h,

h ◦ (g ◦ f ) = (h ◦ g) ◦ f

if the compositions are legal.

Proof

Let x be any member of the domain of f .

(h ◦ (g ◦ f ))(x) = h((g ◦ f )(x))     by definition of h ◦ (g ◦ f )
                  = h(g(f (x)))        by definition of g ◦ f
                  = (h ◦ g)(f (x))     by definition of h ◦ g
                  = ((h ◦ g) ◦ f )(x)  by definition of (h ◦ g) ◦ f

Associativity means that parentheses are unnecessary in composition expressions: since h ◦ (g ◦ f )
is the same as (h ◦ g) ◦ f , we can write either of them as simply h ◦ g ◦ f .

0.3.7 Functional inverse


Let us take the perspective of a lieutenant of Caesar who has received a cyphertext: PDWULA.
To obtain the plaintext, the lieutenant must find for each letter in the cyphertext the letter that
maps to it under the encryption function (the function of Example 0.3.3 (Page 3)). That is, he
must find the letter that maps to P (namely M), the letter that maps to D (namely A), and
so on. In doing so, he can be seen to be applying another function to each of the letters of
the cyphertext, specifically the function that reverses the effect of the encryption function. This
function is said to be the functional inverse of the encryption function.
For another example, consider the functions f and h in Example 0.3.11 (Page 7): f is a
function from {A, . . . , Z} to {0, . . . , 25} and h is a function from {0, . . . , 25} to {A, . . . , Z}. Each
one reverses the effect of the other. That is, h ◦ f is the identity function on {A, . . . , Z}, and
f ◦ h is the identity function on {0, . . . , 25}. We say that h is the functional inverse of f . There
is no reason for privileging f , however; f is the functional inverse of h as well.
In general,

Definition 0.3.13: We say that functions f and g are functional inverses of each other if

• f ◦ g is defined and is the identity function on the domain of g, and


• g ◦ f is defined and is the identity function on the domain of f .
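For a finite invertible function represented as a Python dictionary, the inverse is obtained by swapping keys and values; a minimal sketch, with illustrative names:

>>> f = {'A':0, 'B':1, 'C':2}
>>> f_inverse = {value:key for key, value in f.items()}
>>> f_inverse[2]
'C'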

Not every function has an inverse. A function that has an inverse is said to be invertible.
Examples of noninvertible functions are shown in Figures 2 and 3.

Definition 0.3.14: Consider a function f : D −→ F . We say that f is one-to-one if for every


x, y ∈ D, f (x) = f (y) implies x = y. We say that f is onto if, for every z ∈ F , there exists
x ∈ D such that f (x) = z.

Figure 2: A function f : U → V is depicted that is not onto, because the fourth element of the
co-domain is not the image under f of any element.

Figure 3: A function f : U → V is depicted that is not one-to-one, because the third element of
the co-domain is the image under f of more than one element.

Example 0.3.15: Consider the function prod defined in Example 0.3.5 (Page 4). Since a prime
number has no pre-image, this function is not onto. Since there are multiple pairs of integers,
e.g. (2, 3) and (3, 2), that map to the same integer, the function is also not one-to-one.

Lemma 0.3.16: An invertible function is one-to-one.

Proof

Suppose f is not one-to-one, and let x1 and x2 be distinct elements of the domain such
that f (x1 ) = f (x2 ). Let y = f (x1 ). Assume for a contradiction that f is invertible. The
definition of inverse implies that f −1 (y) = x1 and also f −1 (y) = x2 , but both cannot be
true. !

Lemma 0.3.17: An invertible function is onto.

Proof

Suppose f is not onto, and let ŷ be an element of the co-domain such that ŷ is not the
image of any domain element. Assume for a contradiction that f is invertible. Then ŷ has
an image x̂ under f −1 . The definition of inverse implies that f (x̂) = ŷ, a contradiction. !

Theorem 0.3.18 (Function Invertibility Theorem): A function is invertible iff it is one-


to-one and onto.

Proof

Lemmas 0.3.16 and 0.3.17 show that an invertible function is one-to-one and onto. Suppose
conversely that f is a function that is one-to-one and onto. We define a function g whose
domain is the co-domain of f as follows:
For each element ŷ of the co-domain of f , since f is onto, f ’s domain contains
some element x̂ for which f (x̂) = ŷ; we define g(ŷ) = x̂.

We claim that g ◦ f is the identity function on f ’s domain. Let x̂ be any element of f ’s


domain, and let ŷ = f (x̂). Because f is one-to-one, x̂ is the only element of f ’s domain
whose image under f is ŷ, so g(ŷ) = x̂. This shows g ◦ f is the identity function.
We also claim that f ◦ g is the identity function on g’s domain. Let ŷ be any element of
g’s domain. By the definition of g, f (g(ŷ)) = ŷ. !

Lemma 0.3.19: Every function has at most one functional inverse.

Proof

Let f : U → V be an invertible function. Suppose that g1 and g2 are inverses of f . We show


that, for every element v ∈ V , g1 (v) = g2 (v), so g1 and g2 are the same function.
Let v ∈ V be any element of the co-domain of f . Since f is onto (by Lemma 0.3.17),
there is some element u ∈ U such that v = f (u). By definition of inverse, g1 (v) = u and
g2 (v) = u. Thus g1 (v) = g2 (v). !

0.3.8 Invertibility of the composition of invertible functions


In Example 0.3.11 (Page 7), we saw that the composition of three functions is a function that
implements the Caesar cypher. The three functions being composed are all invertible, and the
result of composition is also invertible. This is not a coincidence:

Lemma 0.3.20: If f and g are invertible functions and f ◦ g exists then f ◦ g is invertible and
(f ◦ g)−1 = g −1 ◦ f −1 .

Figure 4: The top part of this figure shows two invertible functions f and g, and their composition
f ◦ g. Note that the composition f ◦ g is invertible. This illustrates Lemma 0.3.20. The bottom
part of this figure shows g −1 , f −1 and (f ◦ g)−1 . Note that (f ◦ g)−1 = g −1 ◦ f −1 . This illustrates
Lemma 0.3.20.

Problem 0.3.21: Prove Lemma 0.3.20.

Problem 0.3.22: Use diagrams like those of Figures 1, 2, and 3 to specify functions g and f
that are a counterexample to the following:
False Assertion 0.3.23: Suppose that f and g are functions and f ◦ g is invertible. Then f
and g are invertible.


0.4 Probability

Random Number (http://xkcd.com/221/)

One important use of vectors and matrices arises in probability. For example, this is how they
arise in Google’s PageRank method. We will therefore study very rudimentary probability theory
in this course.
In probability theory, nothing ever happens—probability theory is just about what could
happen, and how likely it is to happen. Probability theory is a calculus of probabilities. It is
used to make predictions about a hypothetical experiment. (Once something actually happens,
you use statistics to figure out what it means.)

0.4.1 Probability distributions


A function Pr(·) from a finite domain Ω to the set R+ of nonnegative reals is a (discrete) probability
distribution if ∑_{ω∈Ω} Pr(ω) = 1. We refer to the elements of the domain as outcomes. The image
of an outcome under Pr(·) is called the probability of the outcome. The probabilities are supposed
to be proportional to the relative likelihoods of outcomes. Here I use the term likelihood to mean
the common-sense notion, and probability to mean the mathematical abstraction of it.

Psychic, http://xkcd.com/628/

Uniform distributions
For the simplest examples, all the outcomes are equally likely, so they are all assigned the same
probabilities. In such a case, we say that the probability distribution is uniform.

Example 0.4.1: To model the flipping of a single coin, Ω = {heads, tails}. We assume that the
two outcomes are equally likely, so we assign them the same probability: Pr(heads) = Pr(tails).
Since we require the sum to be 1, Pr(heads) = 1/2 and Pr(tails) = 1/2. In Python, we would
write the probability distribution as
>>> Pr = {'heads':1/2, 'tails':1/2}

Example 0.4.2: To model the roll of a single die, Ω = {1, 2, 3, 4, 5, 6}, and Pr(1) = Pr(2) =
· · · = Pr(6). Since the probabilities of the six outcomes must sum to 1, each of these probabilities
must be 1/6. In Python,

>>> Pr = {1:1/6, 2:1/6, 3:1/6, 4:1/6, 5:1/6, 6:1/6}

Example 0.4.3: To model the flipping of two coins, a penny and a nickel,
Ω = {HH, HT, T H, T T }, and each of the outcomes has the same probability, 1/4. In Python,

>>> Pr = {('H', 'H'):1/4, ('H', 'T'):1/4, ('T','H'):1/4, ('T','T'):1/4}

Nonuniform distributions
In more complicated situations, different outcomes have different probabilities.

Example 0.4.4: Let Ω = {A, B, C, . . . , Z}, and let’s assign probabilities according to how
likely you are to draw each letter at the beginning of a Scrabble game. Here is the number of
tiles with each letter in Scrabble:
A 9 B 2 C 2 D 4
E 12 F 2 G 3 H 2
I 9 J 1 K 1 L 1
M 2 N 6 O 8 P 2
Q 1 R 6 S 4 T 6
U 4 V 2 W 2 X 1
Y 2 Z 1

The likelihood of drawing an R is twice that of drawing a G, thrice that of drawing a C, and
six times that of drawing a Z. We need to assign probabilities that are proportional to these
likelihoods. We must have some number c such that, for each letter, the probability of drawing
that letter should be c times the number of copies of that letter.

Pr[drawing letter X] = c · number of copies of letter X

Summing over all letters, we get

1 = c · total number of tiles

Since the total number of tiles is 95, we define c = 1/95. The probability of drawing an E is
therefore 12/95, which is about .126. The probability of drawing an A is 9/95, and so on. In
Python, the probability distribution is
{'A':9/95, 'B':2/95, 'C':2/95, 'D':4/95, 'E':12/95, 'F':2/95,
'G':3/95, 'H':2/95, 'I':9/95, 'J':1/95, 'K':1/95, 'L':1/95,
'M':2/95, 'N':6/95, 'O':8/95, 'P':2/95, 'Q':1/95, 'R':6/95,
'S':4/95, 'T':6/95, 'U':4/95, 'V':2/95, 'W':2/95, 'X':1/95,
'Y':2/95, 'Z':1/95}
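Rather than typing each probability by hand, one could build the same dictionary from the tile counts; this sketch is ours, using a dictionary comprehension of the kind covered in the lab at the end of this chapter:

tiles = {'A':9, 'B':2, 'C':2, 'D':4, 'E':12, 'F':2, 'G':3, 'H':2, 'I':9,
         'J':1, 'K':1, 'L':1, 'M':2, 'N':6, 'O':8, 'P':2, 'Q':1, 'R':6,
         'S':4, 'T':6, 'U':4, 'V':2, 'W':2, 'X':1, 'Y':2, 'Z':1}
total = sum(tiles.values())                               # 95 tiles in all
Pr = {letter: count/total for letter, count in tiles.items()}
print(Pr['E'])                                            # 12/95, about 0.126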

0.4.2 Events, and adding probabilities


In Example 0.4.4 (Page 13), what is the probability of drawing a vowel from the bag?
A set of outcomes is called an event. For example, the event of drawing a vowel is represented
by the set {A, E, I, O, U }.

Principle 0.4.5 (Fundamental Principle of Probability Theory): The probability of


an event is the sum of probabilities of the outcomes making up the event.

According to this principle, the probability of a vowel is

9/95 + 12/95 + 9/95 + 8/95 + 4/95

which is 42/95.
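With the Pr dictionary of Example 0.4.4 (or the one built from tile counts above), this sum is one line of Python; the snippet is only illustrative:

vowels = {'A','E','I','O','U'}
print(sum(Pr[letter] for letter in vowels))   # 42/95, about 0.442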

0.4.3 Applying a function to a random input


Now we think about applying a function to a random input. Since the input to the function is
random, the output should also be considered random. Given the probability distribution of the
input and a specification of the function, we can use probability theory to derive the probability
distribution of the output.

Example 0.4.6: Define the function f : {1, 2, 3, 4, 5, 6} −→ {0, 1} by


"
0 if x is even
f (x) =
1 if x is odd
CHAPTER 0. THE FUNCTION 15

Consider the experiment in which we roll a single die (as in Example 0.4.2 (Page 13)), yielding
one of the numbers in {1, 2, 3, 4, 5, 6}, and then we apply f (·) to that number, yielding either
a 0 or a 1. What is the probability function for the outcome of this experiment?
The outcome of the experiment is 0 if the rolled die shows 2, 4, or 6. As discussed in
Example 0.4.2 (Page 13), each of these possibilies has probability 1/6. By the Fundamental
Principle of Probability Theory, therefore, the output of the function is 0 with probability 1/6 +
1/6 + 1/6, which is 1/2. Similarly, the output of the function is 1 with probability 1/2. Thus
the probability distribution of the output of the function is {0: 1/2., 1:1/2.}.
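The same derivation can be carried out in Python by summing, for each output value, the probabilities of its pre-images. This is a sketch with our own variable names, not code from the text:

Pr_die = {1:1/6, 2:1/6, 3:1/6, 4:1/6, 5:1/6, 6:1/6}
f = lambda x: 0 if x % 2 == 0 else 1
Pr_out = {v: sum(p for x, p in Pr_die.items() if f(x) == v) for v in {0, 1}}
print(Pr_out)   # {0: 0.5, 1: 0.5}, up to round-off error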

Quiz 0.4.7: Consider the flipping of a penny and a nickel, described in Example 0.4.3 (Page
13). The outcome is a pair (x, y) where each of x and y is 'H' or 'T' (heads or tails). Define
the function
f : {(’H’, ’H’), (’H’, ’T’), (’T’,’H’), (’T’,’T’)} −→ {0, 1, 2}
by
f ((x, y)) = the number of H’s represented
Give the probability distribution for the output of the function.

Answer

{0: 1/4., 1:1/2., 2:1/4.}

Example 0.4.8 (Caesar plays Scrabble): Recall that the function f defined in Exam-
ple 0.3.11 (Page 7) maps A to 0, B to 1, and so on. Consider the experiment in which f
is applied to a letter selected randomly according to the probability distribution described in
Example 0.4.4 (Page 13). What is the probability distribution of the output?
Because f is an invertible function, there is one and only one input for which the output is 0,
namely A. Thus the probability of the output being 0 is exactly the same as the probability of
the input being A, namely 9/95. Similarly, for each of the integers 0 through 25 comprising the
co-domain of f , there is exactly one letter that maps to that integer, so the probability of that
integer equals the probability of that letter. The probability distribution is thus

{0:9/95., 1:2/95., 2:2/95., 3:4/95., 4:12/95., 5:2/95.,


6:3/95., 7:2/95., 8:9/95., 9:1/95., 10:1/95., 11:1/95.,
12:2/95., 13:6/95., 14:8/95., 15:2/95., 16:1/95., 17:6/95.,
18:4/95., 19:6/95., 20:4/95., 21:2/95., 22:2/95., 23:1/95.,
24:2/95., 25:1/95.}

The previous example illustrates that, if the function is invertible, the probabilities are pre-
served: the probabilities of the various outputs match the probabilities of the inputs. It follows
that, if the input is chosen according to a uniform distribution, the distribution of the output is
also uniform.

Example 0.4.9: In Caesar’s Cyphersystem, one encrypts a letter by advancing it three posi-
tions. Of course, the number k of positions by which to advance need not be three; it can
be any integer from 0 to 25. We refer to k as the key. Suppose we select the key k ac-
cording to the uniform distribution on {0, 1, . . . , 25}, and use it to encrypt the letter P. Let
w : {0, 1, . . . , 25} −→ {A, B, . . . , Z} be the function mapping the key to the cyphertext:

w(k) = h(f (P ) + k mod 26)


= h(15 + k mod 26)

The function w(·) is invertible. The input is chosen according to the uniform distribution, so the
distribution of the output is also uniform. Thus when the key is chosen randomly, the cyphertext
is equally likely to be any of the twenty-six letters.

0.4.4 Perfect secrecy

Cryptography (http://xkcd.com/153/)

We apply the idea of Example 0.4.9 (Page 16) to some even simpler cryptosystems. A cryp-
tosystem must satisfy two obvious requirements:
• the intended recipient of an encrypted message must be able to decrypt it, and

• someone for whom the message was not intended should not be able to decrypt it.
The first requirement is straightforward. As for the second, we must dispense with a miscon-
ception about security of cryptosystems. The idea that one can keep information secure by
not revealing the method by which it was secured is often called, disparagingly, security through
obscurity. This approach was critiqued in 1881 by a professor of German, Jean-Guillame-Hubert-
Victor-François-Alexandre-August Kerckhoffs von Niewenhof, known as August Kerckhoffs. The
Kerckhoffs Doctrine is that the security of a cryptosystem should depend only on the secrecy of
the key used, not on the secrecy of the system itself.
There is an encryption method that meets Kerckhoffs’ stringent requirement. It is utterly
unbreakable if used correctly.1 Suppose Alice and Bob work for the British military. Bob is the
commander of some troops stationed in Boston harbor. Alice is the admiral, stationed several
miles away. At a certain moment, Alice must convey a one-bit message p (the plaintext) to Bob:
whether to attack by land or by sea (0=land, 1=sea). Their plan, agreed upon in advance, is
that Alice will encrypt the message, obtaining a one-bit cyphertext c, and send the cyphertext c
to Bob by hanging one or two lanterns (say, one lantern = 0, two lanterns = 1). They are aware
that the fate of a colony might depend on the secrecy of their communication. (As it happens,
a rebel, Eve, knows of the plan and will be observing.)
Let’s go back in time. Alice and Bob are consulting with their cryptography expert, who
suggests the following scheme:

Bad Scheme: Alice and Bob randomly choose k from {♣, ♥, ♠} according to the uniform
probability function (pr(♣) = 1/3, pr(♥) = 1/3, pr(♠) = 1/3). Alice and Bob must both
know k but must keep it secret. It is the key.
When it is time for Alice to use the key to encrypt her plaintext message p, obtaining the
cyphertext c, she refers to the following table:
p k c
0 ♣ 0
0 ♥ 1
0 ♠ 1
1 ♣ 1
1 ♥ 0
1 ♠ 0

The good news is that this cryptosystem satisfies the first requirement of cryptosystems: it will
enable Bob, who knows the key k and receives the cyphertext c, to determine the plaintext p.
No two rows of the table have the same k-value and c-value.
The bad news is that this scheme leaks information to Eve. Suppose the message turns out
to be 0. In this case, c = 0 if k = ♣ (which happens with probability 1/3), and c = 1 if k = ♥
or k = ♠ (which, by the Fundamental Principle of Probability Theory, happens with probability
2/3). Thus in this case c = 1 is twice as likely as c = 0. Now suppose the message turns out to
be 1. In this case, a similar analysis shows that c = 0 is twice as likely as c = 1.
Therefore, when Eve sees the cyphertext c, she learns something about the plaintext p. Learn-
ing c doesn’t allow Eve to determine the value of p with certainty, but she can revise her estimate
of the chance that p = 0. For example, suppose that, before seeing c, Eve believed p = 0 and
p = 1 were equally likely. If she sees c = 1 then she can infer that p = 0 is twice as likely as
p = 1. The exact calculation depends on Bayes’ Rule, which is beyond the scope of this analysis
but is quite simple.
1 For an historically significant occurrence of the former Soviet Union failing to use it correctly, look up VENONA.


Confronted with this argument, the cryptographer changes the scheme simply by removing
♠ as a possible value for k.
Good Scheme: Alice and Bob randomly choose k from {♣, ♥} according to the uniform
probability function (pr(♣) = 1/2, pr(♥) = 1/2)
When it is time for Alice to encrypt her plaintext message p, obtaining the cyphertext c,
she uses the following table:
p k c
0 ♣ 0
0 ♥ 1
1 ♣ 1
1 ♥ 0

0.4.5 Perfect secrecy and invertible functions


Consider the functions
f0 : {♣, ♥} −→ {0, 1}
and
f1 : {♣, ♥} −→ {0, 1}
defined by
f0 (x) = encryption of 0 when the key is x
f1 (x) = encryption of 1 when the key is x
Each of these functions is invertible. Consequently, for each function, if the input x is chosen
uniformly at random, the output will also be distributed according to the uniform distribution.
This in turn means that the probability distribution of the output does not depend on whether 0
or 1 is being encrypted, so knowing the output gives Eve no information about which is being
encrypted. We say the scheme achieves perfect secrecy.
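As dictionaries, the two functions might look like this; the dictionary form is our own sketch, following the tables above:

f0 = {'♣':0, '♥':1}   # cyphertext when the plaintext is 0, keyed by the key k
f1 = {'♣':1, '♥':0}   # cyphertext when the plaintext is 1, keyed by the key k

Each is one-to-one and onto {0, 1}, hence invertible, which is exactly what the argument above uses.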

0.5 Lab: Introduction to Python—sets, lists, dictionaries, and


comprehensions

Python http://xkcd.com/353/

We will be writing all our code in Python (Version 3.x). In writing Python code, we empha-
size the use of comprehensions, which allow one to express computations over the elements
of a set, list, or dictionary without a traditional for-loop. Use of comprehensions leads to
more compact and more readable code, code that more clearly expresses the mathematical
idea behind the computation being expressed. Comprehensions might be new to even some
readers who are familiar with Python, and we encourage those readers to at least skim the
material on this topic.

To start Python, simply open a console (also called a shell or a terminal or, under
Windows, a “Command Prompt” or “MS-DOS Prompt”), and type python3 (or perhaps
just python) to the console (or shell or terminal or Command Prompt) and hit the Enter
key. After a few lines telling you what version you are using (e.g., Python 3.4.1), you should
see >>> followed by a space. This is the prompt; it indicates that Python is waiting for you
to type something. When you type an expression and hit the Enter key, Python evaluates
the expression and prints the result, and then prints another prompt. To get out of this
environment, type quit() and Enter, or Control-D. To interrupt Python when it is running
too long, type Control-C.
This environment is sometimes called a REPL, an acronym for “read-eval-print loop.” It
reads what you type, evaluates it, and prints the result if any. In this assignment, you will
interact with Python primarily through the REPL. In each task, you are asked to come up
with an expression of a certain form.
There are two other ways to run Python code. You can import a module from within the
REPL, and you can run a Python script from the command line (outside the REPL). We
will discuss modules and importing in the next lab assignment. This will be an important
part of your interaction with Python.

0.5.1 Simple expressions


Arithmetic and numbers
You can use Python as a calculator for carrying out arithmetic computations. The binary
operators +, *, -, / work as you would expect. To take the negative of a number, use -
as a unary operator (as in -9). Exponentiation is represented by the binary operator **,
and truncating integer division is //. Finding the remainder when one integer is divided by
another (modulo) is done using the % operator. As usual, ** has precedence over * and /
and //, which have precedence over + and -, and parentheses can be used for grouping.
To get Python to carry out a calculation, type the expression and press the Enter/Return
key:
>>> 44+11*4-6/11.
87.454545454545454
>>>

Python prints the answer and then prints the prompt again.

Task 0.5.1: Use Python to find the number of minutes in a week.

Task 0.5.2: Use Python to find the remainder of 2304811 divided by 47 without using the
modulo operator %. (Hint: Use //.)

Python uses a traditional programming notation for scientific notation. The notation
6.022e23 denotes the value 6.022 × 10^23 , and 6.626e-34 denotes the value 6.626 × 10^-34 . As
we will discover, since Python uses limited-precision arithmetic, there are round-off errors:

>>> 1e16 + 1
1e16

Strings
A string is a series of characters that starts and ends with a single-quote mark. Enter a
string, and Python will repeat it back to you:
>>> 'This sentence is false.'
'This sentence is false.'
You can also use double-quote marks; this is useful if your string itself contains a single-quote
mark:

>>> "So's this one."


"So's this one."

Python is doing what it usually does: it evaluates (finds the value of) the expression it is
given and prints the value. The value of a string is just the string itself.

Comparisons and conditions and Booleans


You can compare values (strings and numbers, for example) using the operators ==, < , >,
<=, >=, and !=. (The operator != is inequality.)
>>> 5 == 4
False
>>> 4 == 4
True
The value of such a comparison is a Boolean value (True or False). An expression whose
value is a boolean is called a Boolean expression.
Boolean operators such as and and or and not can be used to form more complicated
Boolean expressions.

>> True and False


False
>>> True and not (5 == 4)
True

Task 0.5.3: Enter a Boolean expression to test whether the sum of 673 and 909 is divisible
by 3.

0.5.2 Assignment statements


The following is a statement, not an expression. Python executes it but produces neither an
error message nor a value.

>>> mynum = 4+1


The result is that henceforth the variable mynum is bound to the value 5. Consequently, when
Python evaluates the expression consisting solely of mynum, the resulting value is 5. We say
therefore that the value of mynum is 5.
A bit of terminology: the variable being assigned to is called the left-hand side of an
assignment, and the expression whose value is assigned is called the right-hand side.
A variable name must start with a letter and must exclude certain special symbols such
as the dot (period). The underscore is allowed in a variable name. A variable can be bound
to a value of any type. You can rebind mynum to a string:

>>> mynum = 'Brown'

This binding lasts until you assign some other value to mynum or until you end your Python
session. It is called a top-level binding. We will encounter cases of binding variables to values
where the bindings are temporary.
It is important to remember (and second nature to most experienced programmers) that
an assignment statement binds a variable to the value of an expression, not to the expression
itself. Python first evaluates the right-hand side and only then assigns the resulting value
to the left-hand side. This is the behavior of most programming languages.
Consider the following assignments.

>>> x = 5+4
>>> y = 2 * x
>>> y
18
>>> x = 12
>>> y
18
In the second assignment, y is assigned the value of the expression 2 * x. Since x is bound to 9,
the value of that expression is 18, so y is bound to 18. In the third assignment, x is bound to 12. This does
not change the fact that y is bound to 18.

0.5.3 Conditional expressions


There is a syntax for conditional expressions:
⟨expression⟩ if ⟨condition⟩ else ⟨expression⟩

The condition should be a Boolean expression. Python evaluates the condition; depending
on whether it is True or False, Python then evaluates either the first or second expression,
and uses the result as the result of the entire conditional expression.
For example, the value of the expression x if x>0 else -x is the absolute value of x.

Task 0.5.4: Assign the value -9 to x and 1/2 to y. Predict the value of the following
expression, then enter it to check your prediction:
2**(y+1/2) if x+10<0 else 2**(y-1/2)

0.5.4 Sets
Python provides some simple data structures for grouping together multiple values, and
integrates them with the rest of the language. These data structures are called collections.
We start with sets.
A set is an unordered collection in which each value occurs at most once. You can use
curly braces to give an expression whose value is a set. Python prints sets using curly braces.
>>> {1+2, 3, "a"}
{'a', 3}
>>> {2, 1, 3}
{1, 2, 3}

Note that duplicates are eliminated and that the order in which the elements of the output
are printed does not necessarily match the order of the input elements.
The cardinality of a set S is the number of elements in the set. In Mathese we write
|S| for the cardinality of set S. In Python, the cardinality of a set is obtained using the
procedure len(·).

>>> len({'a', 'b', 'c', 'a', 'a'})


3

Summing
The sum of the elements of a collection of values is obtained using the procedure sum(·).

>>> sum({1,2,3})
6
If for some reason (we’ll see one later) you want to start the sum not at zero but at some
other value, supply that value as a second argument to sum(·):
>>> sum({1,2,3}, 10)
16

Testing set membership


Membership in a set can be tested using the in operator and the not in operator. If S is a
set, x in S is a Boolean expression that evaluates to True if the value of x is a member of
the set S, and False otherwise. The value of a not in expression is just the opposite:

>>> S={1,2,3}
>>> 2 in S
True
>>> 4 in S
False
>>> 4 not in S
True

Set union and intersection


The union of two sets S and T is a new set that contains every value that is a member of S
or a member of T (or both). Python uses the vertical bar | as the union operator:

>>> {1,2,3} | {2,3,4}


{1, 2, 3, 4}
The intersection of S and T is a new set that contains every value that is a member of both
S and T . Python uses the ampersand & as the intersection operator:

>>> {1,2,3} & {2,3,4}


{2, 3}

Mutating a set
A value that can be altered is a mutable value. Sets are mutable; elements can be added
and removed using the add and remove methods:

>>> S={1,2,3}
>>> S.add(4)
>>> S.remove(2)
>>> S
{1, 3, 4}
The syntax using the dot should be familiar to students of object-oriented programming
languages such as Java and C++. The operations add(·) and remove(·) are methods.
You can think of a method as a procedure that takes an extra argument, the value of the
expression to the left of the dot.
Python provides a method update(...) to add to a set all the elements of another
collection (e.g. a set or a list):
>>> S.update({4, 5, 6})
>>> S
{1, 3, 4, 5, 6}
Similarly, one can intersect a set with another collection, removing from the set all elements
not in the other collection:

>>> S.intersection_update({5,6,7,8,9})
>>> S
{5, 6}

Suppose two variables are bound to the same value. A mutation to the value made
through one variable is seen by the other variable.
>>> T=S
>>> T.remove(5)
>>> S
{6}
This behavior reflects the fact that Python stores only one copy of the underlying data
structure. After Python executes the assignment statement T=S, both T and S point to the
same data structure. This aspect of Python will be important to us: many different variables
can point to the same huge set without causing a blow-up of storage requirements.
Python provides a method for copying a collection such as a set:
>>> U=S.copy()
>>> U.add(5)
>>> S
{6}

The assignment statement binds U not to the value of S but to a copy of that value, so
mutations to the value of U don’t affect the value of S.

Set comprehensions
Python provides for expressions called comprehensions that let you build collections out
of other collections. We will be using comprehensions a lot because they are useful in con-
structing an expression whose value is a collection, and they mimic traditional mathematical
notation. Here’s an example:

>>> {2*x for x in {1,2,3} }


{2, 4, 6}

This is said to be a set comprehension over the set {1,2,3}. It is called a set comprehension
because its value is a set. The notation is similar to the traditional mathematical notation
for expressing sets in terms of other sets, in this case {2x : x ∈ {1, 2, 3}}. To compute the
value, Python iterates over the elements of the set {1,2,3}, temporarily binding the control
variable x to each element in turn and evaluating the expression 2*x in the context of that
binding. Each of the values obtained is an element of the final set. (The bindings of x during
the evaluation of the comprehension do not persist after the evaluation completes.)

Task 0.5.5: Write a comprehension over {1, 2, 3, 4, 5} whose value is the set consisting of
the squares of the first five positive integers.

Task 0.5.6: Write a comprehension over {0, 1, 2, 3, 4} whose value is the set consisting of
the first five powers of two, starting with 20 .

Using the union operator | or the intersection operator &, you can write set expressions
for the union or intersection of two sets, and use such expressions in a comprehension:
>>> {x*x for x in S | {5, 7}}
{1, 25, 49, 9}

By adding the phrase if ⟨condition⟩ at the end of the comprehension (before the closing
brace “}”), you can skip some of the values in the set being iterated over:

>>> {x*x for x in S | {5, 7} if x > 2}


{9, 49, 25}

I call the conditional clause a filter.


You can write a comprehension that iterates over the Cartesian product of two sets:

>>> {x*y for x in {1,2,3} for y in {2,3,4}}


{2, 3, 4, 6, 8, 9, 12}

This comprehension constructs the set of the products of every combination of x and y. I
call this a double comprehension.

Task 0.5.7: The value of the previous comprehension,


{x*y for x in {1,2,3} for y in {2,3,4}}
is a seven-element set. Replace {1,2,3} and {2,3,4} with two other three-element sets
so that the value becomes a nine-element set.

Here is an example of a double comprehension with a filter:


>>> {x*y for x in {1,2,3} for y in {2,3,4} if x != y}
{2, 3, 4, 6, 8, 12}

Task 0.5.8: Replace {1,2,3} and {2,3,4} in the previous comprehension with two dis-
joint (i.e. non-overlapping) three-element sets so that the value becomes a five-element
set.

Task 0.5.9: Assume that S and T are assigned sets. Without using the intersection oper-
ator &, write a comprehension over S whose value is the intersection of S and T. Hint: Use
a membership test in a filter at the end of the comprehension.
Try out your comprehension with S = {1,2,3,4} and T = {3,4,5,6}.

Remarks
The empty set is represented by set(). You would think that {} would work but, as we will
see, that notation is used for something else.
You cannot make a set that has a set as element. This has nothing to do with Cantor’s
Paradox—Python imposes the restriction that the elements of a set must not be mutable,
and sets are mutable. The reason for this restriction will be clear to a student of data
structures from the error message in the following example:
>>> {{1,2},3}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'

There is a nonmutable version of set called frozenset. Frozensets can be elements of sets.
However, we won’t be using them.
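For the curious, here is what that looks like (a small illustration; the display order may vary, and we won't need frozensets later):

>>> {frozenset({1,2}), 3}
{frozenset({1, 2}), 3}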

0.5.5 Lists
Python represents sequences of values using lists. In a list, order is significant and repeated
elements are allowed. The notation for lists uses square brackets instead of curly braces.
The empty list is represented by [].

>>> [1,1+1,3,2,3]
[1, 2, 3, 2, 3]
There are no restrictions on the elements of lists. A list can contain a set or another list.
>>> [[1,1+1,4-1],{2*2,5,6}, "yo"]
[[1, 2, 3], {4, 5, 6}, 'yo']

However, a set cannot contain a list since lists are mutable.


The length of a list, obtained using the procedure len(·), is the number of elements in
the list, even though some of those elements may themselves be lists, and even though some
elements might have the same value:

>>> len([[1,1+1,4-1],{2*2,5,6}, "yo", "yo"])


4
As we saw in the section on sets, the sum of elements of a collection can be computed using
sum(·)

>>> sum([1,1,0,1,0,1,0])
4
>>> sum([1,1,0,1,0,1,0], -9)
-5
In the second example, the second argument to sum(·) is the value to start with.

Task 0.5.10: Write an expression whose value is the average of the elements of the list
[20, 10, 15, 75].

List concatenation
You can combine the elements in one list with the elements in another list to form a new
list (without changing the original lists) using the + operator.

>>> [1,2,3]+["my", "word"]


[1, 2, 3, 'my', 'word']
>>> mylist = [4,8,12]
>>> mylist + ["my", "word"]
[4, 8, 12, 'my', 'word']
>>> mylist
[4, 8, 12]

You can use sum(·) on a collection of lists, obtaining the concatenation of all the lists, by
providing [] as the second argument.
>>> sum([ [1,2,3], [4,5,6], [7,8,9] ])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'list'
>>> sum([ [1,2,3], [4,5,6], [7,8,9] ], [])
[1, 2, 3, 4, 5, 6, 7, 8, 9]

List comprehensions
Next we discuss how to write a list comprehension (a comprehension whose value is a list).
In the following example, a list is constructed by iterating over the elements in a set.
>>> [2*x for x in {2,1,3,4,5} ]
[2, 4, 6, 8, 10]

Note that the order of elements in the resulting list might not correspond to the order of
elements in the set since the latter order is not significant.
You can also use a comprehension that constructs a list by iterating over the elements
in a list:

>>> [ 2*x for x in [2,1,3,4,5] ]


[4, 2, 6, 8, 10]

Note that the list [2,1,3,4,5] specifies the order among its elements. In evaluating the
comprehension Python iterates through them in that order. Therefore the order of elements
in the resulting list corresponds to the order in the list iterated over.
You can also write list comprehensions that iterate over multiple collections using two
control variables. As I mentioned in the context of sets, I call these “double comprehensions”.
Here is an example of a list comprehension over two lists.

>>> [ x*y for x in [1,2,3] for y in [10,20,30] ]


[10, 20, 30, 20, 40, 60, 30, 60, 90]
The resulting list has an element for every combination of an element of [1,2,3] with an
element of [10,20,30].
We can use a comprehension over two sets to form the Cartesian product.
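For instance (a small illustration, separate from the task below, which asks for lists built from specific lists), here is the set of all pairs drawn from two small sets; the order in which Python displays the elements may differ:

>>> {(x,y) for x in {1,2} for y in {8,9}}
{(1, 8), (1, 9), (2, 8), (2, 9)}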

Task 0.5.11: Write a double list comprehension over the lists ['A','B','C'] and [1,2,3]
whose value is the list of all possible two-element lists [letter, number]. That is, the value is

[['A', 1], ['A', 2], ['A', 3], ['B', 1], ['B', 2],['B', 3],
['C', 1], ['C', 2], ['C', 3]]

Task 0.5.12: Suppose LofL has been assigned a list whose elements are themselves lists
of numbers. Write an expression that evaluates to the sum of all the numbers in all the
lists. The expression has the form
sum([sum(...
and includes one comprehension. Test your expression after assigning [[.25, .75, .1],
[-1, 0], [4, 4, 4, 4]] to LofL. Note that your expression should work for a list of any
length.

Obtaining elements of a list by indexing

[Comic: "Donald Knuth", http://xkcd.com/163/]

There are two ways to obtain an individual element of a list. The first is by indexing. As in
some other languages (Java and C++, for example) indexing is done using square brackets
around the index. Here is an example. Note that the first element of the list has index 0.
>>> mylist[0]
4
>>> ['in','the','CIT'][1]
'the'

Slices: A slice of a list is a new list consisting of a consecutive subsequence of elements


of the old list, namely those indexed by a range of integers. The range is specified by a
colon-separated pair i:j, where i is the index of the first element of the slice and j is one
past the index of the last. Thus mylist[1:3] is the list consisting of elements 1 and 2 of
mylist.

Prefixes: If the first element i of the pair is 0, it can be omitted, so mylist[:2] consists
of the first 2 elements of mylist. This notation is useful for obtaining a prefix of a list.

Suffixes: If the second element j of the pair is the length of the list, it can be omitted, so
mylist[1:] consists of all elements of mylist except element 0.

>>> L = [0,10,20,30,40,50,60,70,80,90]
>>> L[:5]
[0, 10, 20, 30, 40]

>>> L[5:]
[50, 60, 70, 80, 90]

Slices that skip You can use a colon-separated triple a:b:c if you want the slice to include
every cth element. For example, here is how you can extract from L the list consisting of
even-indexed elements and the list consisting of odd-indexed elements:

>>> L[::2]
[0, 20, 40, 60, 80]
>>> L[1::2]
[10, 30, 50, 70, 90]

Obtaining elements of a list by unpacking


The second way to obtain individual elements is by unpacking. Instead of assigning a list to
a single variable as in mylist =[4,8,12], one can assign to a list of variables:

>>> [x,y,z] = [4*1, 4*2, 4*3]


>>> x
4
>>> y
8

I called the left-hand side of the assignment a “list of variables,” but beware: this is a
notational fiction. Python does not allow you to create a value that is a list of variables.
The assignment is simply a convenient way to assign to each of the variables appearing in
the left-hand side.

Task 0.5.13: Find out what happens if the length of the left-hand side list does not match
the length of the right-hand side list.

Unpacking can similarly be used in comprehensions:

>>> listoflists = [[1,1],[2,4],[3, 9]]


>>> [y for [x,y] in listoflists]
[1, 4, 9]
Here the two-element list [x,y] iterates over all elements of listoflists. This would result
in an error message if some element of listoflists were not a two-element list.

Mutating a list: indexing on the left-hand side of =


You can mutate a list, replacing its ith element, using indexing on the left-hand side of the
=, analogous to an assignment statement:

>>> mylist = [30, 20, 10]


>>> mylist[1] = 0
>>> mylist
[30, 0, 10]
Slices can also be used on the left-hand side but we will not use this.
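For completeness, here is what assigning to a slice looks like (just an illustration; as noted, we will not rely on it):

>>> mylist = [30, 0, 10]
>>> mylist[0:2] = [1, 2]
>>> mylist
[1, 2, 10]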

0.5.6 Tuples
Like a list, a tuple is an ordered sequence of elements. However, tuples are immutable so
they can be elements of sets. The notation for tuples is the same as that for lists except
that ordinary parentheses are used instead of square brackets.

>>> (1,1+1,3)
(1, 2, 3)
>>> {0, (1,2)} | {(3,4,5)}
{(1, 2), 0, (3, 4, 5)}

Obtaining elements of a tuple by indexing and unpacking


You can use indexing to obtain an element of a tuple.

>>> mytuple = ("all", "my", "books")


>>> mytuple[1]
'my'
>>> (1, {"A", "B"}, 3.14)[2]
3.14

You can also use unpacking with tuples. Here is an example of top-level variable assignment:
>>> (a,b) = (1,5-3)
>>> a
1
In some contexts, you can get away without the parentheses, e.g.
>>> a,b = (1,5-3)
or even
>>> a,b = 1,5-3

You can use unpacking in a comprehension:

>>> [y for (x,y) in [(1,'A'),(2,'B'),(3,'C')] ]


['A', 'B', 'C']

Task 0.5.14: Suppose S is a set of integers, e.g. {−4, −2, 1, 2, 5, 0}. Write a triple
comprehension whose value is a list of all three-element tuples (i, j, k) such that i, j, k are
elements of S whose sum is zero.

Task 0.5.15: Modify the comprehension of the previous task so that the resulting list does
not include (0, 0, 0). Hint: add a filter.

Task 0.5.16: Further modify the expression so that its value is not the list of all such
tuples but is the first such tuple.

The previous task provided a way to compute three elements i, j, k of S whose sum is
zero—if there exist three such elements. Suppose you wanted to determine if there were a
hundred elements of S whose sum is zero. What would go wrong if you used the approach
used in the previous task? Can you think of a clever way to quickly and reliably solve the
problem, even if the integers making up S are very large? (If so, see me immediately to
collect your Ph.D.)

Obtaining a list or set from another collection


Python can compute a set from another collection (e.g. a list) using the constructor set(·).
Similarly, the constructor list(·) computes a list, and the constructor tuple(·) computes
a tuple.

>>> set([0,1,2,3,4,5,6,7,8,9])
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> set([1,2,3])
{1, 2, 3}
>>> list({1,2,3})
[1, 2, 3]
>>> set((1,2,3))
{1, 2, 3}

Task 0.5.17: Find an example of a list L such that len(L) and len(list(set(L))) are
different.

0.5.7 Other things to iterate over


Tuple comprehensions—not! Generators
One would expect to be able to create a tuple using the usual comprehension syntax, e.g.
(i for i in [1,2,3]) but the value of this expression is not a tuple. It is a generator.

Generators are a very powerful feature of Python but we don’t study them here. Note,
however, that one can write a comprehension over a generator instead of over a list or set
or tuple. Alternatively, one can use set(·) or list(·) or tuple(·) to transform a generator
into a set or list or tuple.
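For example (a small illustration), a generator expression can be converted into a tuple or a list:

>>> tuple(i for i in [1,2,3])
(1, 2, 3)
>>> list(2*i for i in [1,2,3])
[2, 4, 6]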

Ranges
A range plays the role of a list consisting of the elements of an arithmetic progression. For
any integer n, range(n) represents the sequence of integers from 0 through n − 1. For exam-
ple, range(10) represents the integers from 0 through 9. Therefore, the value of the following
comprehension is the sum of the squares of these integers: sum({i*i for i in range(10)}).
Even though a range represents a sequence, it is not a list. Generally we will either
iterate through the elements of the range or use set(·) or list(·) to turn the range into a
set or list.

>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Task 0.5.18: Write a comprehension over a range of the form range(n) such that the
value of the comprehension is the set of odd numbers from 1 to 99.

You can form a range with one, two, or three arguments. The expression range(a,b)
represents the sequence of integers a, a + 1, a + 2, . . . , b − 1. The expression range(a,b,c)
represents a, a + c, a + 2c, . . . (stopping just before b).
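For example (a quick illustration):

>>> list(range(5, 10))
[5, 6, 7, 8, 9]
>>> list(range(5, 20, 4))
[5, 9, 13, 17]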

Zip
Another collection that can be iterated over is a zip. A zip is constructed from other
collections all of the same length. Each element of the zip is a tuple consisting of one
element from each of the input collections.

>>> list(zip([1,3,5],[2,4,6]))
[(1, 2), (3, 4), (5, 6)]
>>> characters = ['Neo', 'Morpheus', 'Trinity']
>>> actors = ['Keanu', 'Laurence', 'Carrie-Anne']
>>> set(zip(characters, actors))
{('Trinity', 'Carrie-Anne'), ('Neo', 'Keanu'), ('Morpheus', 'Laurence')}
>>> [character+' is played by '+actor
... for (character,actor) in zip(characters,actors)]
['Neo is played by Keanu', 'Morpheus is played by Laurence',
'Trinity is played by Carrie-Anne']

Task 0.5.19: Assign to L the list consisting of the first five letters ['A','B','C','D','E'].
Next, use L in an expression whose value is
[(0, 'A'), (1, 'B'), (2, 'C'), (3, 'D'), (4, 'E')]
Your expression should use a range and a zip, but should not use a comprehension.

Task 0.5.20: Starting from the lists [10, 25, 40] and [1, 15, 20], write a compre-
hension whose value is the three-element list in which the first element is the sum of 10
and 1, the second is the sum of 25 and 15, and the third is the sum of 40 and 20. Your
expression should use zip but not list.

reversed
To iterate through the elements of a list L in reverse order, use reversed(L), which does
not change the list L:

>>> [x*x for x in reversed([4, 5, 10])]


[100, 25, 16]

0.5.8 Dictionaries
We will often have occasion to use functions with finite domains. Python provides collec-
tions, called dictionaries, that are suitable for representing such functions. Conceptually,
a dictionary is a set of key-value pairs. The syntax for specifying a dictionary in terms of
its key-value pairs therefore resembles the syntax for sets—it uses curly braces—except that
instead of listing the elements of the set, one lists the key-value pairs. In this syntax, each
key-value pair is written using colon notation: an expression for the key, followed by the
colon, followed by an expression for the value:

key : value

The function f that maps each letter in the alphabet to its rank in the alphabet could be
written as
{'A':0, 'B':1, 'C':2, 'D':3, 'E':4, 'F':5, 'G':6, 'H':7, 'I':8,
'J':9, 'K':10, 'L':11, 'M':12, 'N':13, 'O':14, 'P':15, 'Q':16,
'R':17, 'S':18, 'T':19, 'U':20, 'V':21, 'W':22, 'X':23, 'Y':24,
'Z':25}

As in sets, the order of the key-value pairs is irrelevant, and the keys must be immutable
(no sets or lists or dictionaries). For us, the keys will mostly be integers, strings, or tuples
of integers and strings.
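For instance, here is a small made-up dictionary whose keys are tuples (the order in which Python displays the pairs may differ):

>>> {(0,0):'origin', (1,0):'east', (0,1):'north'}
{(0, 0): 'origin', (1, 0): 'east', (0, 1): 'north'}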
The keys and values can be specified with expressions.

>>> {2+1:'thr'+'ee', 2*2:'fo'+'ur'}


{3: 'three', 4: 'four'}

To each key in a dictionary there corresponds only one value. If a dictionary is given multiple
values for the same key, only one value will be associated with that key.

>>> {0:'zero', 0:'nothing'}


{0: 'nothing'}

Indexing into a dictionary


Obtaining the value corresponding to a particular key uses the same syntax as indexing a
list or tuple: right after the dictionary expression, use square brackets around the key:
>>> {4:"four", 3:'three'}[4]
'four'
>>> mydict = {'Neo':'Keanu', 'Morpheus':'Laurence',
'Trinity':'Carrie-Anne'}
>>> mydict['Neo']
'Keanu'
If the key is not represented in the dictionary, Python considers it an error:

>>> mydict['Oracle']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'Oracle'

Testing dictionary membership


You can check whether a key is in a dictionary using the in operator we earlier used for
testing membership in a set:
>>> 'Oracle' in mydict
False
>>> mydict['Oracle'] if 'Oracle' in mydict else 'NOT PRESENT'
'NOT PRESENT'
>>> mydict['Neo'] if 'Neo' in mydict else 'NOT PRESENT'
'Keanu'

Lists of dictionaries

Task 0.5.21: Suppose dlist is a list of dictionaries and k is a key that appears in all the
dictionaries in dlist. Write a comprehension that evaluates to the list whose ith element
is the value corresponding to key k in the ith dictionary in dlist.
Test your comprehension with some data. Here are some example data.

dlist = [{'James':'Sean', 'director':'Terence'}, {'James':'Roger',
         'director':'Lewis'}, {'James':'Pierce', 'director':'Roger'}]
k = 'James'

Task 0.5.22: Modify the comprehension in Task 0.5.21 to handle the case in which k
might not appear in all the dictionaries. The comprehension evaluates to the list whose ith
element is the value corresponding to key k in the ith dictionary in dlist if that dictionary
contains that key, and 'NOT PRESENT' otherwise.
Test your comprehension with k = 'Bilbo' and k = 'Frodo' and with the following
list of dictionaries:

dlist = [{'Bilbo':'Ian','Frodo':'Elijah'},
{'Bilbo':'Martin','Thorin':'Richard'}]

Mutating a dictionary: indexing on the left-hand side of =


You can mutate a dictionary, mapping a (new or old) key to a given value, using the syntax
used for assigning a list element, namely using the index syntax on the left-hand side of an
assignment:

>>> mydict['Agent Smith'] = 'Hugo'


>>> mydict['Neo'] = 'Philip'
>>> mydict
{'Neo': 'Philip', 'Agent Smith': 'Hugo', 'Trinity': 'Carrie-Anne',
'Morpheus': 'Laurence'}

Dictionary comprehensions
You can construct a dictionary using a comprehension.
>>> { k:v for (k,v) in [(3,2),(4,0),(100,1)] }
{3: 2, 4: 0, 100: 1}
>>> { (x,y):x*y for x in [1,2,3] for y in [1,2,3] }
{(1, 2): 2, (3, 2): 6, (1, 3): 3, (3, 3): 9, (3, 1): 3,
(2, 1): 2, (2, 3): 6, (2, 2): 4, (1, 1): 1}

Task 0.5.23: Using range, write a comprehension whose value is a dictionary. The keys
should be the integers from 0 to 99 and the value corresponding to a key should be the
square of the key.

The identity function on a set D is the function with the following spec:
• input: an element x of D

• output: x
That is, the identity function simply outputs its input.

Task 0.5.24: Assign some set to the variable D, e.g. D ={'red','white','blue'}.


Now write a comprehension that evaluates to a dictionary that represents the identity func-
tion on D.

Task 0.5.25: Using the variables base=10 and digits=set(range(base)), write a dic-
tionary comprehension that maps each integer between zero and nine hundred ninety nine
to the list of three digits that represents that integer in base 10. That is, the value should be

{0: [0, 0, 0], 1: [0, 0, 1], 2: [0, 0, 2], 3: [0, 0, 3], ...,
10: [0, 1, 0], 11: [0, 1, 1], 12: [0, 1, 2], ...,
999: [9, 9, 9]}
Your expression should work for any base. For example, if you instead assign 2 to base and
assign {0,1} to digits, the value should be

{0: [0, 0, 0], 1: [0, 0, 1], 2: [0, 1, 0], 3: [0, 1, 1],


..., 7: [1, 1, 1]}

Comprehensions that iterate over dictionaries


You can write list comprehensions that iterate over the keys or the values of a dictionary,
using keys() or values():
>>> [2*x for x in {4:'a',3:'b'}.keys() ]
[6, 8]
>>> [x for x in {4:'a', 3:'b'}.values()]
['b', 'a']

Given two dictionaries A and B, you can write comprehensions that iterate over the union
or intersection of the keys, using the union operator | and intersection operator & we learned
about in Section 0.5.4.

>>> [k for k in {'a':1, 'b':2}.keys() | {'b':3, 'c':4}.keys()]


['a', 'c', 'b']
>>> [k for k in {'a':1, 'b':2}.keys() & {'b':3, 'c':4}.keys()]
['b']
Often you’ll want a comprehension that iterates over the (key, value) pairs of a dictionary,
using items(). Each pair is a tuple.

>>> [myitem for myitem in mydict.items()]


[('Neo', 'Philip'), ('Morpheus', 'Laurence'),
('Trinity', 'Carrie-Anne'), ('Agent Smith', 'Hugo')]
Since the items are tuples, you can access the key and value separately using unpacking:

>>> [k + " is played by " + v for (k,v) in mydict.items()]


['Neo is played by Philip', 'Agent Smith is played by Hugo',
'Trinity is played by Carrie-Anne', 'Morpheus is played by Laurence']
>>> [2*k+v for (k,v) in {4:0,3:2, 100:1}.items() ]
[8, 8, 201]

Task 0.5.26: Suppose d is a dictionary that maps some employee IDs (a subset of the
integers from 0 to n − 1) to salaries. Suppose L is an n-element list whose ith element is
the name of employee number i. Your goal is to write a comprehension whose value is a
dictionary mapping employee names to salaries. You can assume that employee names are
distinct.
Test your comprehension with the following data:
id2salary = {0:1000.0, 3:990, 1:1200.50}
names = ['Larry', 'Curly', '', 'Moe']

0.5.9 Defining one-line procedures


The procedure twice : R −→ R that returns twice its input can be written in Python as
follows:
def twice(z): return 2*z

The word def introduces a procedure definition. The name of the function being defined is
twice. The variable z is called the formal argument to the procedure. Once this procedure
is defined, you can invoke it using the usual notation: the name of the procedure followed
by an expression in parentheses, e.g. twice(1+2).
The value 3 of the expression 1+2 is the actual argument to the procedure. When the
procedure is invoked, the formal argument (the variable) is temporarily bound to the actual
argument, and the body of the procedure is executed. At the end, the binding of the actual
argument is removed. (The binding was temporary.)

Task 0.5.27: Try entering the definition of twice(z). After you enter the definition, you
will see the ellipsis. Just press enter. Next, try invoking the procedure on some actual
arguments. Just for fun, try strings or lists. Finally, verify that the variable z is now not
bound to any value by asking Python to evaluate the expression consisting of z.

Task 0.5.28: Define a one-line procedure nextInts(L) specified as follows:


• input: list L of integers

• output: list of integers whose ith element is one more than the ith element of L
• example: input [1, 5, 7], output [2, 6, 8].

Task 0.5.29: Define a one-line procedure cubes(L) specified as follows:

• input: list L of numbers


• output: list of numbers whose ith element is the cube of the ith element of L
• example: input [1, 2, 3], output [1, 8, 27].

Task 0.5.30: Define a one-line procedure dict2list(dct,keylist) with this spec:

• input: dictionary dct, list keylist consisting of the keys of dct


• output: list L such that L[i] = dct[keylist[i]] for i = 0, 1, 2, . . . , len(keylist) − 1

• example: input dct={'a':'A', 'b':'B', 'c':'C'} and keylist=['b','c','a'],


output ['B', 'C', 'A']

Task 0.5.31: Define a one-line procedure list2dict(L, keylist) specified as follows:


• input: list L, list keylist of immutable items

• output: dictionary that maps keylist[i] to L[i] for i = 0, 1, 2, . . . , len(L) − 1


• example: input L=['A','B','C'] and keylist=['a','b','c'],
output {'a':'A', 'b':'B', 'c':'C'}

Hint: Use a comprehension that iterates over a zip or a range.



Task 0.5.32: Write a procedure all_3_digit_numbers(base, digits) with the follow-
ing spec:

• input: a positive integer base and the set digits which should be {0, 1, 2, . . . , base−1}.
• output: the set of all three-digit numbers where the base is base
For example,

>>> all_3_digit_numbers(2, {0,1})


{0, 1, 2, 3, 4, 5, 6, 7}
>>> all_3_digit_numbers(3, {0,1,2})
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26}
>>> all_3_digit_numbers(10, {0,1,2,3,4,5,6,7,8,9})
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
...
985, 986, 987, 988, 989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999}

0.6 Lab: Python—modules and control structures—and inverse index
In this lab, you will create a simple search engine. One procedure will be responsible for
reading in a large collection of documents and indexing them to facilitate quick responses to
subsequent search queries. Other procedures will use the index to answer the search queries.
The main purpose of this lab is to give you more Python programming practice.

0.6.1 Using existing modules


Python comes with an extensive library, consisting of components called modules. In order
to use the definitions defined in a module, you must either import the module itself or import
the specific definitions you want to use from the module. If you import the module, you
must refer to a procedure or variable defined therein by using its qualified name, i.e. the
name of the module followed by a dot followed by the short name.
For example, the library math includes many mathematical procedures such as square-root,
cosine, and natural logarithm, and mathematical constants such as π and e.

Task 0.6.1: Import the math module using the command


>>> import math

Call the built-in procedure help(modulename) on the module you have just imported:
>>> help(math)
This will cause the console to show documentation on the module. You can move forward
by typing f and backward by typing b, and you can quit looking at the documentation by
typing q.
Use procedures defined by the math module to compute the square root of 3, and raise
it to the power of 2. The result might not be what you expect. Keep in mind that Python
represents nonintegral real numbers with limited precision, so the answers it gives are only
approximate.
Next compute the square root of -1, the cosine of π, and the natural logarithm of e.
The short name of the square-root function is sqrt so its qualified name is math.sqrt.
The short names of the cosine and the natural logarithm are cos and log, and the short
names of π and e are pi and e.

The second way to bring a procedure or variable from a module into your Python envi-
ronment is to specifically import the item itself from the module, using the syntax

from ⟨module name⟩ import ⟨short name⟩


after which you can refer to it using its short name.

Task 0.6.2: The module random defines a procedure randint(a,b) that returns an inte-
ger chosen uniformly at random from among {a, a + 1, . . . , b}. Import this procedure using
the command

>>> from random import randint


Try calling randint a few times. Then write a one-line procedure movie_review(name)
that takes as argument a string naming a movie, and returns a string review selected uni-
formly at random from among two or more alternatives (Suggestions: “See it!”, “A gem!”,
“Ideological claptrap!”)

0.6.2 Creating your own modules


You can create your own modules simply by entering the text of your procedure definitions
and variable assignments in a file whose name consists of the module name you choose,
followed by .py. Use a text editor such as Kate or Vim or, my personal favorite, Emacs.
The file can itself contain import statements, enabling the code in the file to make use
of definitions from other modules.

If the file is in the current working directory when you start up Python, you can import
the module.ᵃ

Task 0.6.3: In Tasks 0.5.30 and 0.5.31 of Lab 0.5, you wrote procedures dict2list(dct,
keylist) and list2dict(L, keylist). Download the file dictutil.py from the
resource page for Coding the Matrix. Edit the file, replacing each occurrence of pass with
the appropriate statement. Import this module, and test the procedures. We might have
occasion to use this module in the future.
ᵃ There is an environment variable, PYTHONPATH, that governs the sequence of directories in which Python searches for modules.

Reloading
You will probably find it useful when debugging your own module to be able to edit it and
load the edited version into your current Python session. Python provides the procedure
reload(module) in the module imp. To import this procedure, use the command

>>> from imp import reload


Note that if you import a specific definition using the from ... import ... syntax
then you cannot reload it.

Task 0.6.4: Edit dictutil.py. Define a procedure listrange2dict(L) with this spec:

• input: a list L
• output: a dictionary that, for i = 0, 1, 2, . . . , len(L) − 1, maps i to L[i]

You can write this procedure from scratch or write it in terms of list2dict(L, keylist).
Use the statement
>>> reload(dictutil)

to reload your module, and then test listrange2dict on the list ['A','B','C'].

0.6.3 Loops and conditional statements


Comprehensions are not the only way to loop over elements of a set, list, dictionary, tuple,
range, or zip. For the traditionalist programmer, there are for-loops: for x in {1,2,3}: print(x).
In this statement, the variable x is bound to each of the elements of the set in turn, and the
statement print(x) is executed in the context of that binding.
There are also while-loops: while v[i] == 0: i = i+1.
There are also conditional statements (as opposed to conditional expressions):
if x > 0: print("positive")
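For instance, here is how those two kinds of statements behave at the interactive prompt (the list v and the value of x are made up for this illustration):

>>> v = [0, 0, 0, 7, 5]
>>> i = 0
>>> while v[i] == 0: i = i+1
...
>>> i
3
>>> x = v[i]
>>> if x > 0: print("positive")
...
positive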

0.6.4 Grouping in Python using indentation


You will sometimes need to define loops or conditional statements in which the body consists
of more than one statement. Most programming languages have a way of grouping a series
of statements into a block. For example, C and Java use curly braces around the sequence
of statements.
Python uses indentation to indicate grouping of statements. All the statements form-
ing a block should be indented the same number of spaces. Python is very picky
about this. Python files we provide will use four spaces to indent. Also, don’t mix tabs
with spaces in the same block. In fact, I recommend you avoid using tabs for indentation
with Python.
Statements at the top level should have no indentation. The group of statements forming
the body of a control statement should be indented more than the control statement. Here’s
an example:

for x in [1,2,3]:
    y = x*x
    print(y)
This prints 1, 4, and 9. (After the loop is executed, y remains bound to 9 and x remains
bound to 3.)

Task 0.6.5: Type the above for-loop into Python. You will see that, after you enter the
first line, Python prints an ellipsis (...) to indicate that it is expecting an indented block of
statements. Type a space or two before entering the next line. Python will again print the
ellipsis. Type a space or two (same number of spaces as before) and enter the next line.
Once again Python will print an ellipsis. Press enter, and Python should execute the loop.

The same use of indentation can be used in conditional statements and in procedure
definitions.

def quadratic(a,b,c):
    discriminant = math.sqrt(b*b - 4*a*c)
    return ((-b + discriminant)/(2*a), (-b - discriminant)/(2*a))

You can nest as deeply as you like:

def print_greater_quadratic(L):
    for a, b, c in L:
        plus, minus = quadratic(a, b, c)
        if plus > minus:
            print(plus)
        else:
            print(minus)

Many text editors help you handle indentation when you write Python code. For example,
if you are using Emacs to edit a file with a .py suffix, after you type a line ending with a
colon and hit return, Emacs will automatically indent the next line the proper amount,
making it easy for you to start entering lines belonging to a block. After you enter each line
and hit Return, Emacs will again indent the next line. However, Emacs doesn’t know when
you have written the last line of a block; when you need to write the first line outside of
that block, you should hit Delete to unindent.

0.6.5 Breaking out of a loop


As in many other programming languages, when Python executes the break statement,
the loop execution is terminated, and execution continues immediately after the innermost
nested loop containing the statement.

>>> s = "There is no spoon."


>>> for i in range(len(s)):
... if s[i] == 'n':
... break
...
>>> i
9

0.6.6 Reading from a file


In Python, a file object is used to refer to and access a file. The expression
open('stories_small.txt') returns a file object that allows access to the file with the
name given. You can use a comprehension or for-loop to loop over the lines in the file

>>> f = open('stories_big.txt')
>>> for line in f:
...     print(line)
or, if the file is not too big, use list(·) to directly obtain a list of the lines in the file, e.g.
>>> f = open('stories_small.txt')
>>> stories = list(f)
>>> len(stories)
50
In order to read from the file again, one way is to first create a new file object by calling
open again.
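For instance (assuming the file stories_small.txt from this lab is present in the current directory):

>>> f = open('stories_small.txt')
>>> first = list(f)
>>> f = open('stories_small.txt')   # a fresh file object starts reading from the beginning
>>> second = list(f)
>>> first == second
True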

0.6.7 Mini-search engine


Now, for the core of the lab, you will be writing a program that acts as a sort of search
engine.

Given a file of “documents” where each document occupies a line of the file, you are to
build a data structure (called an inverse index) that allows you to identify those documents
containing a given word. We will identify the documents by document number: the document
represented by the first line of the file is document number 0, that represented by the second
line is document number 1, and so on.
You can use a method defined for strings, split(), which splits the string at spaces into
substrings, and returns a list of these substrings:
>>> mystr = 'Ask not what you can do for your country.'
>>> mystr.split()
['Ask', 'not', 'what', 'you', 'can', 'do', 'for', 'your', 'country.']
Note that the period is considered part of a substring. To make this lab easier, we have
prepared a file of documents in which punctuation is separated from words by spaces.
Often one wants to iterate through the elements of a list while keeping track of the indices
of the elements. Python provides enumerate(L) for this purpose.

>>> list(enumerate(['A','B','C']))
[(0, 'A'), (1, 'B'), (2, 'C')]
>>> [i*x for (i,x) in enumerate([10,20,30,40,50])]
[0, 20, 60, 120, 200]
>>> [i*s for (i,s) in enumerate(['A','B','C','D','E'])]
['', 'B', 'CC', 'DDD', 'EEEE']

Task 0.6.6: Write a procedure makeInverseIndex(strlist) that, given a list of strings


(documents), returns a dictionary that maps each word to the set consisting of the document
numbers of documents in which that word appears. This dictionary is called an inverse index.
(Hint: use enumerate.)

Task 0.6.7: Write a procedure orSearch(inverseIndex, query) which takes an inverse


index and a list of words query, and returns the set of document numbers specifying all
documents that contain any of the words in query.

Task 0.6.8: Write a procedure andSearch(inverseIndex, query) which takes an in-


verse index and a list of words query, and returns the set of document numbers specifying
all documents that contain all of the words in query.

Try out your procedures on these two provided files:


• stories_small.txt

• stories_big.txt

0.7 Review questions


• What does the notation f : A −→ B mean?
• What are the criteria for f to be an invertible function?
• What is associativity of functional composition?
• What are the criteria for a function to be a probability function?
• What is the Fundamental Principle of Probability Theory?
• If the input to an invertible function is chosen randomly according to the uniform distri-
bution, what is the distribution of the output?

0.8 Problems
Python comprehension problems
For each of the following problems, write the one-line procedure using a comprehension.

Problem 0.8.1: increments(L)


input: list L of numbers
output: list of numbers in which the ith element is one plus the ith element of L.
Example: increments([1,5,7]) should return [2,6,8].

Problem 0.8.2: cubes(L)


input: list L of numbers
output: list of numbers in which the ith element is the cube of the ith element of L.
Example: given [1, 2, 3] return [1, 8, 27].

Problem 0.8.3: tuple_sum(A, B)


input: lists A and B of the same length, where each element in each list is a pair (x, y) of numbers
output: list of pairs (x, y) in which the first element of the ith pair is the sum of the first element
of the ith pair in A and the first element of the ith pair in B (and similarly for the second elements)
example: given lists [(1, 2), (10, 20)] and [(3, 4), (30, 40)], return [(4, 6), (40, 60)].

Problem 0.8.4: inv_dict(d)


input: dictionary d representing an invertible function f
output: dictionary representing the inverse of f, the returned dictionary’s keys are the values of

d and its values are the keys of d


example: given an English-French dictionary
{'thank you': 'merci', 'goodbye': 'au revoir'}
return a French-English dictionary
{'merci':'thank you', 'au revoir':'goodbye'}

Problem 0.8.5: First write a procedure row(p, n) with the following spec:
• input: integer p, integer n

• output: n-element list such that element i is p + i

• example: given p = 10 and n = 4, return [10, 11, 12, 13]

Next write a comprehension whose value is a 15-element list of 20-element lists such that the
jth element of the ith list is i + j. You can use row(p, n) in your comprehension.
Finally, write the same comprehension but without using row(p, n). Hint: replace the call
to row(p, n) with the comprehension that forms the body of row(p, n).

Functional inverse
Problem 0.8.6: Is the following function invertible? If yes, explain why. If not, can you change
domain and/or codomain of the function to make it invertible? Provide the drawing.

Problem 0.8.7: Is the following function invertible? If yes, explain why. If not, can you change
domain and/or codomain of the function to make it invertible? Provide the drawing.

Functional composition
Problem 0.8.8: Let f : R → R where f(x) = abs(x). Is there a choice of domain and co-domain
for the function g with rule g(x) = √x such that g ◦ f is defined? If so, specify it.
If not, explain why not. Could you change domain and/or codomain of f or g so that g ◦ f will
be defined?

Problem 0.8.9: Consider functions f and g in the following figure:

Is f ◦ g defined? If so, draw it, otherwise explain why not.

Probability
Problem 0.8.10: A function f (x) = x+1 with domain {1, 2, 3, 5, 6} and codomain {2, 3, 4, 6, 7}
has the following probability function on its domain: Pr(1) = 0.5, Pr(2) = 0.2 and Pr(3) =
Pr(5) = Pr(6) = 0.1. What is the probability of getting an even number as an output of f (x)?
An odd number?

Problem 0.8.11: A function g(x) = x mod 3 with domain {1, 2, 3, 4, 5, 6, 7} and codomain
{0, 1, 2} has the following probability function on its domain: Pr(1) = Pr(2) = Pr(3) = 0.2 and
Pr(4) = Pr(5) = Pr(6) = Pr(7) = 0.1. What is the probability of getting 1 as an output of
g(x)? What is the probability of getting 0 or 2?
Chapter 1

The Field

...the different branches of Arithmetic—Ambition, Distraction, Uglification, and Derision.
Lewis Carroll, Alice in Wonderland

We introduce the notion of a field, a collection of values with a plus operation and a times
operation. The reader is familiar with the field of real numbers but perhaps not with the field
of complex numbers or the field consisting just of zero and one. We discuss these fields and give
examples of applications.

1.1 Introduction to complex numbers


If you stick to real numbers, there are no solutions to the equation x² = −1. To fill this void,
mathematicians invented i. That’s a bold letter i, and it’s pronounced “i”, but it is usually
defined as the square root of minus 1.

[Comic: Guest Week: Bill Amend (excerpt), http://xkcd.com/824]

By definition,

i² = −1

Multiplying both sides by 9, we get

9i² = −9

which can be transformed to

(3i)² = −9

Thus 3i is a solution to the equation x² = −9. Similarly, for any positive number b, the solution
to x² = −b is √b times i. The product of a real number and i is called an imaginary number.
What about the equation (x − 1)² = −9? We can solve this by setting x − 1 = 3i, which yields
x = 1 + 3i. The sum of a real number and an imaginary number is called a complex number. A
complex number has a real part and an imaginary part.

[Comic: Math Paper, http://xkcd.com/410]

1.2 Complex numbers in Python


Python supports complex numbers. The square root of -9, the imaginary number 3i, is written
3j.
>>> 3j
3j
Thus j plays the role of i. (In electrical engineering, i means "current".)
The square root of -1, the imaginary number i, is written 1j so as to avoid confusion with
the variable j.
>>> j
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'j' is not defined
>>> 1j
1j
Since Python allows the use of + to add a real number and an imaginary number, you can write
the complex solution to (x − 1)² = −9 as 1+3j:
>>> 1+3j
(1+3j)

In fact, the operators +, -, *, /, and ** all work with complex numbers. When you add two
complex numbers, the real parts are added and the imaginary parts are added.
>>> (1+3j) + (10+20j)
(11+23j)
>>> x=1+3j
>>> (x-1)**2
(-9+0j)
Python considers the value (-9+0j) to be a complex number even though its imaginary part is
zero.
As in ordinary arithmetic, multiplication has precedence over addition; exponentiation has
precedence over multiplication. These precedence rules are illustrated in the following evalua-
tions.
>>> 1+2j*3
(1+6j)
>>> 4*3j**2
(-36+0j)

You can obtain the real and imaginary parts of a complex number using a dot notation.
>>> x.real
1.0
>>> x.imag
3.0
It is not an accident that the notation is that used in object-oriented programming languages
to access instance variables (a.k.a. member variables). The complex numbers form a class in
Python.

>>> type(1+2j)
<class 'complex'>
This class defines the procedures (a.k.a. methods, a.k.a. member functions) used in arithmetic
operations on complex numbers.

1.3 Abstracting over fields


In programming languages, use of the same name (e.g. +) for different procedures operating
on values of different datatypes is called overloading. Here’s an example of why it’s useful in
the present context. Let us write a procedure solve1(a,b, c) to solve an equation of the form
ax + b = c where a is nonzero:

>>> def solve1(a,b,c): return (c-b)/a


It’s a pretty simple procedure. It’s the procedure you would write even if you had never heard
of complex numbers. Let us use it to solve the equation 10x + 5 = 30:

>>> solve1(10, 5, 30)


2.5
The remarkable thing, however, is that the same procedure can be used to solve equations
involving complex numbers. Let’s use it to solve the equation (10 + 5i)x + 5 = 20:

>>> solve1(10+5j, 5, 20)


(1.2-0.6j)
The procedure works even with complex arguments because the correctness of the procedure
does not depend on what kind of numbers are supplied to it; it depends only on the fact that the
divide operator is the inverse of the multiply operator and the subtract operator is the inverse of
the add operator.
The power of this idea goes well beyond this simple procedure. Much of linear algebra—
concepts, theorems, and, yes, procedures—works not just for the real numbers but also for the
complex numbers and for other kinds of numbers as well. The strategy for achieving this is
simple:

• The concepts, theorems, and procedures are stated in terms of the arithmetic operators +,
-, *, and /.
• They assume only that these operators satisfy certain basic laws, such as commutativity
(a + b = b + a) and distributivity (a(b + c) = ab + ac).

Because the concepts, theorems, and procedures rely only on these basic laws, we can “plug in”
any system of numbers, called a field.¹ Different fields arise in different applications.
In this book, we illustrate the generality of linear algebra with three different fields.

• R, the field of real numbers,

• C, the field of complex numbers, and


• GF (2), a field that consists of 0 and 1.

In object-oriented programming, one can use the name of a class to refer to the set of instances
of that class, e.g. we refer to instances of the class Rectangle as Rectangles. In mathematics,
one uses the name of the field, e.g. R or GF (2), to refer also to the set of values.

1.4 Playing with C


Because each complex number z consists of two ordinary numbers, z.real and z.imag, it is
traditional to think of z as specifying a point, a location, in the plane (the complex plane).
¹ For the reader who knows about object-oriented programming, a field is analogous to a class satisfying an interface that requires it to possess methods for the arithmetic operators.



[Figure: a complex number z plotted in the complex plane; its horizontal coordinate is z.real and its vertical coordinate is z.imag]

To build intuition, let us use a set S of complex numbers to represent a black-and-white


image. For each location in the complex plane where we want a dot, we include the corresponding
complex number in S. The following figure shows S = {2 + 2i, 3 + 2i, 1.75 + 1i, 2 + 1i, 2.25 +
1i, 2.5 + 1i, 2.75 + 1i, 3 + 1i, 3.25 + 1i}:

Task 1.4.1: First, assign to the variable S a list or set consisting of the complex numbers listed
above.
We have provided a module plotting for showing points in the complex plane. The module
defines a procedure plot. Import this class from the module as follows:
>>> from plotting import plot
Next, plot the points in S as follows:
>>> plot(S, 4)

Python should open a browser window displaying the points of S in the complex plane. The
first argument to plot is a collection of complex numbers (or 2-tuples). The second argument
sets the scale of the plot; in this case, the window can show complex numbers whose real and
imaginary parts have absolute value less than 4. The scale argument is optional and defaults to
1, and there is another optional argument that sets the size of the dots.

1.4.1 The absolute value of a complex number


The absolute value of a complex number z, written |z| (and, in Python, abs(z)) is the distance
from the origin to the corresponding point in the complex plane.

[Figure: the line segment from the origin to z has length |z|; its horizontal leg has length z.real and its vertical leg has length z.imag]

By the Pythagorean Theorem, |z|² = (z.real)² + (z.imag)².


>>> abs(3+4j)
5.0
>>> abs(1+1j)
1.4142135623730951

Definition 1.4.2: The conjugate of a complex number z, written z̄, is defined as z.real − z.imag i.

In Python, we write z.conjugate().


>>> (3+4j).conjugate()
(3-4j)

Using the fact that i² = −1, we can get a formula for |z|² in terms of z and z̄:

|z|² = z · z̄   (1.1)

Proof

z · z̄ = (z.real + z.imag i) · (z.real − z.imag i)
      = z.real · z.real − z.real · z.imag i + z.imag i · z.real − z.imag i · z.imag i
      = z.real² − z.imag i · z.imag i
      = z.real² − z.imag · z.imag i²
      = z.real² + z.imag · z.imag

where the last equality uses the fact that i² = −1. □

1.4.2 Adding complex numbers


Suppose we add a complex number, say 1 + 2i, to each complex number z in S. That is, we
derive a new set by applying the following function to each element of S:

f (z) = 1 + 2i + z

This function increases each real coordinate (the x coordinate) by 1 and increases each imaginary
coordinate (the y coordinate) by 2. The effect is to shift the picture one unit to the right and
two units up:

This transformation of the numbers in S is called a translation. A translation has the form

f (z) = z0 + z (1.2)

where z0 is a complex number. Translations can take the picture anywhere in the complex plane.
For example, adding a number z0 whose real coordinate is negative would have the effect of
translating the picture to the left.

Task 1.4.3: Create a new plot using a comprehension to provide a set of points derived from
S by adding 1 + 2i to each:

>>> plot({1+2j+z for z in S}, 4)

Quiz 1.4.4: The “left eye” of the set S of complex numbers is located at 2 + 2i. For what
value of z0 does the translation f (z) = z0 + z move the left eye to the origin?

Answer

z0 = −2 − 2i. That is, the translation is f (z) = −2 − 2i + z.

Problem 1.4.5: Show that, for any two distinct points z1 and z2 ,

• there is a translation that maps z1 to z2 ,


• there is a translation that maps z2 to z1 , and
• there is no translation that both maps z1 to z2 and z2 to z1 .

Complex numbers as arrows It is helpful to visualize a translation f (z) by an arrow. The


tail of the arrow is located at any point z in the complex plane; the head of the arrow is then
located at the point f (z), the translation of z. Of course, this representation is not unique.
Since a translation has the form f (z) = z0 + z, we represent the translation by the complex
number z0 . It is therefore appropriate to represent the complex number z0 by an arrow.

[Figure: an arrow representing the translation by z0; its tail lies at a point z and its head at z0 + z]

Again, the representation is not unique. For example, the vector z0 = 5 − 2i can be represented
by an arrow whose tail is at 0 + 0i and whose head is at 5 − 2i, or one whose tail is at 1 + 1i and
whose head is at 6 − 1i, or....


Figure 1.1: This figure illustrates the geometric interpretation of complex-number addition.

Problem 1.4.6: Draw a diagram representing the complex number z0 = −3 + 3i using two
arrows with their tails located at different points.

Composing translations, adding arrows Let f1(z) = z1 + z and f2(z) = z2 + z be two
translations. Then their composition is also a translation:

(f2 ◦ f1)(z) = f2(f1(z))
            = f2(z1 + z)
            = z2 + z1 + z

and is defined by z ↦ (z2 + z1) + z. The idea that two translations can be collapsed into one is
illustrated by Figure 1.1, in which each translation is represented by an arrow.
The translation arrow labeled by z1 takes a point (in this case, the origin) to another point,
which in turn is mapped by z2 to a third point. The arrow mapping the origin to the third point
is the composition of the two other translations, so, by the reasoning above, is z1 + z2 .

1.4.3 Multiplying complex numbers by a positive real number


Now suppose we halve each complex number in S:

g(z) = (1/2) z
This operation simply halves the real coordinate and the imaginary coordinate of each complex
number. The effect on the picture is to move all the points closer from the origin but also closer
to each other:

This operation is called scaling. The scale of the picture has changed. Similarly, doubling each
complex number moves the points farther from the origin and from each other.

Task 1.4.7: Create a new plot titled “My scaled points” using a comprehension as in Task 1.4.3.
The points in the new plot should be halves of the points in S.

1.4.4 Multiplying complex numbers by a negative number: rotation by 180 degrees
Here is the result of multiplying each complex number by -1:

Think of the points as drawn on a shape that rotates about the origin; this picture is the result
of rotating the shape by 180 degrees.

1.4.5 Multiplying by i: rotation by 90 degrees


“The number you have dialed is imaginary. Please rotate your phone by ninety degrees
and try again.”

How can we rotate the shape by only 90 degrees?

For this effect, a point located at (x, y) must be moved to (−y, x). The complex number located
at (x, y) is x + iy. Now is our chance to use the fact that i² = −1. We use the function

h(z) = i · z

Multiplying x + iy by i yields ix + i²y, which is ix − y, which is the complex number represented
by the point (−y, x).

Task 1.4.8: Create a new plot in which the points of S are rotated by 90 degrees and scaled
by 1/2. Use a comprehension in which the points of S are multiplied by a single complex number.

Task 1.4.9: Using a comprehension, create a new plot in which the points of S are rotated by
90 degrees, scaled by 1/2, and then shifted down by one unit and to the right two units. Use
a comprehension in which the points of S are multiplied by one complex number and added to
another.

Task 1.4.10: We have provided a module image with a procedure file2image(filename)


that reads in an image stored in a file in the .png format. Import this procedure and invoke it,
providing as argument the name of a file containing an image in this format, assigning the returned
value to variable data. An example grayscale image, img01.png, is available for download.
The value of data is a list of lists, and data[y][x] is the intensity of pixel (x,y). Pixel
(0,0) is at the bottom-left of the image, and pixel (width-1, height-1) is at the top-right.
The intensity of a pixel is a number between 0 (black) and 255 (white).
Use a comprehension to assign to a list pts the set of complex numbers x + yi such that the
image intensity of pixel (x, y) is less than 120, and plot the list pts.

Task 1.4.11: Write a Python procedure f(z) that takes as argument a complex number z so
that when f (z) is applied to each of the complex numbers in S, the set of resulting numbers
is centered at the origin. Write a comprehension in terms of S and f whose value is the set of
translated points, and plot the value.

Task 1.4.12: Repeat Task 1.4.8 with the points in pts instead of the points in S.

1.4.6 The unit circle in the complex plane: argument and angle
We shall see that it is not a coincidence that rotation by 180 or 90 degrees can be represented
by complex multiplication: any rotation can be so represented. However, it is convenient to use
radians instead of degrees to measure the angle of rotation.

The argument of a complex number on the unit circle


Consider the unit circle—the circle of radius one, centered at the origin of the complex plane.

[Figure: the unit circle with points labeled by their arguments: 0, π/4, π/2, 3π/4, π, 5π/4, 3π/2, and 7π/4 radians]

A point z on the circle is represented by the distance an ant would have to travel counterclockwise
along the circle to get to z if the ant started at 1 + 0i, the rightmost point of the circle. We call
this number the argument of z.

[Figure: a point on the unit circle with argument π/4, measured counterclockwise from the point with argument 0]

Example 1.4.13: Since the circumference of the circle is 2π, the point halfway around the
circle has an argument of π, and the point one-eighth of the way around has an argument of
π/4.

The angle formed by two complex numbers on the unit circle


We have seen how to label points on the unit circle by distances. We can similarly assign a
number to the angle formed by the line segments from the origin to two points z1 and z2 on
the circle. The angle, measured in radians, is the distance along the circle traversed by an ant
walking counterclockwise from z2 to z1 .

[Figure: two points z1 and z2 on the unit circle, with the angle formed at the origin by the line segments to them]

Example 1.4.14: Let z1 be the point on the circle that has argument (5/16)π, and let z2 be the
point on the circle that has argument (3/16)π. An ant starting at z2 and traveling to z1 would travel
a distance of (1/8)π counterclockwise along the circle, so (1/8)π is the angle between the origin-to-z1
line segment and the origin-to-z2 line segment.

Remark 1.4.15: The argument of z is the angle formed by z with 1 + 0i.

1.4.7 Euler’s formula


He calculated just as men breathe, as
eagles sustain themselves in the air.
Said of Leonhard Euler

We turn to a formula due to Leonhard Euler, a remarkable mathematician who contributed to
the foundation for many subfields of mathematics: number theory and algebra, complex analysis,
calculus, differential geometry, fluid mechanics, topology, graph theory, and even music theory
and cartography. Euler's formula states that, for any real number θ, e^{iθ} is the point z on the
unit circle with argument θ. Here e is the famous transcendental number 2.718281828....
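As a quick numerical check in Python (using the complex exponential from the cmath module; the stray terms of size roughly 1e-16 are floating-point roundoff):

>>> from cmath import exp
>>> from math import pi
>>> exp(1j*pi/2)
(6.123233995736766e-17+1j)
>>> exp(1j*pi)
(-1+1.2246467991473532e-16j)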

Example 1.4.16: The point −1 + 0i has argument π. Plugging π into Euler's formula yields
the surprising equation e^{iπ} + 1 = 0.

[Comic: e to the π times i, http://xkcd.com/179/]

Task 1.4.17: From the module math, import the definitions e and pi. Let n be the integer
20. Let w be the complex number e^{2πi/n}. Write a comprehension yielding the list consisting of
w^0, w^1, w^2, . . . , w^{n−1}. Plot these complex numbers.

1.4.8 Polar representation for complex numbers


Euler’s formula gives us a convenient representation for complex numbers that lie on the unit
circle. Now consider any complex number z. Let L be the line segment in the complex plane
from the origin to z, and let z′ be the point at which this line segment intersects the unit circle.

[Figure: a complex number z, the line segment of length r from the origin to z, and the point z′ where that segment crosses the unit circle]

Let r be the length of the line segment to z. Viewing z′ as the result of scaling down z, we have

z′ = (1/r) z

Let θ be the argument of z′. Euler's formula tells us that z′ = e^{θi}. We therefore obtain

z = re^{θi}

The astute student might recognize that r and θ are the polar coordinates of z. In the context
of complex numbers, we define the argument of z to be θ, and we define the absolute value of z
(written |z|) to be r.

1.4.9 The First Law of Exponentiation


When powers multiply, their exponents add:

e^u e^v = e^{u+v}

We can use this rule to help us understand how to rotate a complex number z. We can write

z = re^{θi}

where r = |z| and θ = arg z.

1.4.10 Rotation by τ radians


Let τ be a number of radians. The rotation of z by τ should have the same absolute value as z
but its argument should be τ more than that of z, i.e. it should be re^{(θ+τ)i}. How do we obtain
this number from z?

re^{(θ+τ)i} = re^{θi} e^{τi}
            = z e^{τi}

Thus the function that rotates by τ is simply

f(z) = z e^{τi}
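Here is a minimal sketch of this rotation in Python, again using cmath.exp for the complex exponential (the tiny real part in the output is floating-point roundoff):

>>> from cmath import exp
>>> from math import pi
>>> def rotate(z, tau): return z * exp(1j*tau)   # rotate z about the origin by tau radians
...
>>> rotate(2+0j, pi/2)
(1.2246467991473532e-16+2j)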

Task 1.4.18: Recall from Task 1.4.1 the set S of complex numbers. Write a comprehension
whose value is the set consisting of rotations by π/4 of the elements of S. Plot the value of this
comprehension.

Task 1.4.19: Similarly, recall from Task 1.4.10 the list pts of points derived from an image.
Plot the rotation by π/4 of the complex numbers comprising pts.

1.4.11 Combining operations


Task 1.4.20: Write a comprehension that transforms the set pts by translating it so the image
is centered, then rotating it by π/4, then scaling it by half. Plot the result.

Because the complex numbers form a field, familiar algebraic rules can be used. For example,
a · (b · z) = (a · b) · z. Using this rule, two scaling operations can be combined into one; scaling
by 2 and then by 3 is equivalent to scaling by 6.
Similarly, since rotation is carried out by multiplication, two rotations can be combined into
one; rotating by π/4 (multiplying by e^{(π/4)i}) and then rotating by π/3 (multiplying by e^{(π/3)i}) is
equivalent to multiplying by e^{(π/4)i} · e^{(π/3)i}, which is equal to e^{(π/4)i + (π/3)i}, i.e. rotating by π/4 + π/3.
Since scaling and rotation both consist of multiplication, a rotation and a scaling can be
combined: rotating by π/4 (multiplying by e^{(π/4)i}) and then scaling by 1/2 is equivalent to multiplying
by (1/2) e^{(π/4)i}.
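As a small illustration (our own sketch, not the book's code), the combined rotation-and-scaling is a single multiplication:

    from math import pi
    from cmath import exp

    # rotating by pi/4 and then scaling by 1/2 is multiplication by (1/2)e^{(pi/4)i}
    multiplier = 0.5 * exp((pi/4)*1j)

    def rotate_and_scale(z):
        return multiplier * z

    print(rotate_and_scale(2+2j))   # approximately sqrt(2) i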

1.4.12 Beyond two dimensions


The complex numbers are so convenient for transforming images—and, more generally, sets of
points in the plane—that one might ask whether there is a similar approach to operating on points
in three dimensions. We discuss this in the next chapter.

1.5 Playing with GF (2)


GF (2) is short for Galois Field 2. Galois was a mathematician, born in 1811, who while in his
teens essentially founded the field of abstract algebra. He died in a duel at age twenty.

The field GF (2) is very easy to describe. It has two elements, 0 and 1. Arithmetic over GF (2)
can be summarized in two small tables:
    ×  0  1        +  0  1
    0  0  0        0  0  1
    1  0  1        1  1  0
Addition is modulo 2. It is equivalent to exclusive-or. In particular, 1 + 1 = 0.
Subtraction is identical to addition. The negative of 1 is again 1, and the negative of 0 is
again 0.
Multiplication in GF (2) is just like ordinary multiplication of 0 and 1: multiplication by 0
yields 0, and 1 times 1 is 1. You can divide by 1 (as usual, you get the number you started with)
but dividing by zero is illegal (as usual).
We provide a module, GF2, with a very simple implementation of GF (2). It defines a value,
one, that acts as the element 1 of GF (2). Ordinary zero plays the role of the element 0 of GF (2).
(For visual consistency, the module defines zero to be the value 0.)

>>> from GF2 import one


>>> one*one
one
>>> one*0
0
>>> one + 0
one
>>> one+one
0
>>> -one
one

1.5.1 Perfect secrecy revisited


In Chapter 0, we described a cryptosystem that achieves perfect secrecy (in transmitting a single
bit). Alice and Bob randomly choose the key k uniformly from {♣, ♥}. Subsequently, Alice uses
the following encryption function to transform the plaintext bit p to a cyphertext bit c:
p k c
0 ♣ 0
0 ♥ 1
1 ♣ 1
1 ♥ 0
The encryption method is just GF (2) addition in disguise! When we replace ♣ with 0 and ♥
with 1, the encryption table becomes the addition table for GF (2):
p k c
0 0 0
0 1 1
1 0 1
1 1 0
For each plaintext p ∈ GF (2), the function k ↦ k + p (mapping GF (2) to GF (2)) is invertible


(hence one-to-one and onto). Therefore, when the key k is chosen uniformly at random, the
cyphertext is also distributed uniformly. This shows that the scheme achieves perfect secrecy.

Using integers instead of GF (2)


Why couldn’t Alice and Bob use, say, ordinary integers instead of GF (2)? After all, for each
x ∈ Z, the function y ↦ x + y mapping Z to Z is also invertible. The reason this cannot work
as a cryptosystem is that there is no uniform distribution over Z, so the first step—choosing a
key—is impossible.

Encrypting long messages


How, then, are we to encrypt a long message? Students of computer science know that a long
message can be represented by a long string of bits. Suppose the message to be encrypted will
consist of n bits. Alice and Bob should select an equally long sequence of key bits k1 . . . kn . Now,
once Alice has selected the plaintext p1 . . . pn , she obtains the cyphertext c1 . . . cn one bit at a
time:

c1 = k1 + p1
c2 = k2 + p2
...
cn = kn + pn

We argue informally that this system has perfect secrecy. The earlier argument shows that each
bit ci of cyphertext tells Eve nothing about the corresponding bit pi of plaintext; certainly the
bit ci tells Eve nothing about any of the other bits of plaintext. From this we infer that the
system has perfect secrecy.
Our description of the multi-bit system is a bit cumbersome, and the argument for perfect
secrecy is rather sketchy. In Chapter 2, we show that using vectors over GF (2) simplifies the
presentation.
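Here is a minimal sketch of the bit-by-bit scheme using plain Python lists of 0s and 1s (the helper names are ours; the vectors of Chapter 2 will make this cleaner):

    import random

    def random_key(n):
        # choose n key bits uniformly at random
        return [random.randint(0, 1) for _ in range(n)]

    def encrypt(p, k):
        # add each plaintext bit to its key bit over GF(2) (exclusive-or)
        return [(pb + kb) % 2 for pb, kb in zip(p, k)]

    p = [0, 1, 1, 0, 1]
    k = random_key(len(p))
    c = encrypt(p, k)
    assert encrypt(c, k) == p   # decryption is the same operation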

The one-time pad


The cryptosystem we have described is called the one-time pad. As suggested by the name, it is
crucial that each bit of key be used only once, i.e. that each bit of plaintext be encrypted with
its bit of key. This can be a burden for two parties that are separated for long periods of time
because the two parties must agree before separating on many bits of key.
Starting in 1930, the Soviet Union used the one-time pad for communication. During World
War II, however, they ran out of bits of key and began to re-use some of the bits. The US
and Great Britain happened to discover this; they exploited it (in a top-secret project codenamed
VENONA) to partially decrypt some 1% of the encrypted messages, revealing, for example, the
involvement of Julius Rosenberg and Alger Hiss in espionage.

Problem 1.5.1: An 11-symbol message has been encrypted as follows. Each symbol is repre-
sented by a number between 0 and 26 (A ↦ 0, B ↦ 1, . . . , Z ↦ 25, space ↦ 26). Each number
is represented by a five-bit binary sequence (0 ↦ 00000, 1 ↦ 00001, ..., 26 ↦ 11010). Finally,
the resulting sequence of 55 bits is encrypted using a flawed version of the one-time pad: the
key is not 55 random bits but 11 copies of the same sequence of 5 random bits. The cyphertext
is
10101 00100 10101 01011 11001 00011 01011 10101 00100 11001 11010
Try to find the plaintext.

1.5.2 Network coding


Consider the problem of streaming video through a network. Here is a simple example network:

[Figure: a small example network, with source node s at the top and customer nodes c and d at the bottom.]
The node at the top labeled s needs to stream a video to each of the two customer nodes, labeled
c and d, at the bottom. Each link in the network has a capacity of 1 megabit per second. The
video stream, however, requires 2 megabits per second. If there were only one customer, this
would be no problem; as shown below, the network can handle two simultaneous 1-megabit-per-
second streams from s to c:

[Figure: the network carrying two one-megabit streams, b1 and b2, along separate paths from s to c.]
A million times a second, one bit b1 is sent along one path and another bit b2 is sent along
another path. Thus the total rate of bits delivered to the customer is 2 megabits per second.
However, as shown below, we can’t use the same scheme to deliver two bitstreams to each of
two customers because the streams contend for bandwidth on one of the network links.

[Figure: with two customers, the two bit streams contend for the same link in the middle of the network.]
GF (2) to the rescue! We can use the fact that network nodes can do a tiny bit (!) of
computation. The scheme is depicted here:

[Figure: the network-coding scheme: the centermost node computes b1 + b2 and forwards that single bit toward both c and d.]
At the centermost node, the bits b1 and b2 arrive and are combined by GF (2) addition to obtain
a single bit. That single bit is transmitted as shown to the two customers c and d. Customer c
receives bit b1 and the sum b1 + b2 , so can also compute the bit b2 . Customer d receives bit b2
and the sum b1 + b2 , so can also compute the bit b1 .
We have shown that a network that appears to support streaming only one megabit per second
to a pair of customers actually supports streaming two megabits per second. This approach to
routing can of course be generalized to larger networks and more customers; the idea is called
network coding.
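To make the recovery step concrete, here is a tiny sketch over GF (2), using exclusive-or for addition (our own illustration, not the book's code):

    b1, b2 = 1, 0        # the two bits s wants to stream
    mixed = b1 ^ b2      # the centermost node computes b1 + b2 over GF(2)

    # customer c receives b1 and the mixed bit, and recovers b2:
    assert mixed ^ b1 == b2
    # customer d receives b2 and the mixed bit, and recovers b1:
    assert mixed ^ b2 == b1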

1.6 Review questions


• Name three fields.
• What is the conjugate of a complex number? What does it have to do with the absolute
value of a complex number?
• How does complex-number addition work?
• How does complex-number multiplication work?
• How can translation be defined in terms of complex numbers?
• How can scaling be defined in terms of complex numbers?
• How can rotation by 180 degrees be defined in terms of complex numbers?
• How can rotation by 90 degrees be defined in terms of complex numbers?
• How does addition of GF (2) values work?
• How does multiplication of GF (2) values work?

1.7 Problems
Python comprehension problems
Write each of the following three procedures using a comprehension:

Problem 1.7.1: my filter(L, num)


input: list of numbers and a positive integer.
output: list of numbers not containing a multiple of num.
example: given list = [1,2,4,5,7] and num = 2, return [1,5,7].

Problem 1.7.2: my lists(L)


input: list L of non-negative integers.
output: a list of lists: for every element x in L create a list containing 1, 2, . . . , x.
example: given [1,2,4] return [[1],[1,2],[1,2,3,4]]. example: given [0] return [[]].

Problem 1.7.3: my function composition(f,g)


input: two functions f and g, represented as dictionaries, such that g ◦ f exists.
output: dictionary that represents the function g ◦ f .
example: given f = {0:’a’, 1:’b’} and g = {’a’:’apple’, ’b’:’banana’}, return {0:’apple’, 1:’banana’}.

Python loop problems


For procedures in the following five problems, use the following format:

def <ProcedureName>(L):
current = ...
for x in L:
current = ...
return current

The value your procedure initially assigns to current turns out to be the return value in
the case when the input list L is empty. This provides us insight into how the answer should be
defined in that case. Note: You are not allowed to use Python built-in procedures sum(·) and
min(·).

Problem 1.7.4: mySum(L)


Input: list of numbers
Output: sum of numbers in the list

Problem 1.7.5: myProduct(L)


input: list of numbers
output: product of numbers in the list

Problem 1.7.6: myMin(L)


input: list of numbers
output: minimum number in the list

Problem 1.7.7: myConcat(L)


input: list of strings
output: concatenation of all the strings in L

Problem 1.7.8: myUnion(L)


input: list of sets
output: the union of all sets in L.

In each of the above problems, the value of current is combined with an element of L
using some operation ⊙. In order that the procedure return the correct result, current should
be initialized with the identity element for the operation ⊙, i.e. the value i such that i ⊙ x = x
for any value x.
It is a consequence of the structure of the procedure that, when the input list is empty, the
output value is the initial value of current (since in this case the body of the loop is never
executed). It is convenient to define this to be the correct output!

Problem 1.7.9: Keeping in mind the comments above, what should be the value of each of
the following?

1. The sum of the numbers in an empty set.


2. The product of the numbers in an empty set.
3. The minimum of the numbers in an empty set.
4. The concatenation of an empty list of strings.

5. The union of an empty list of sets.


What goes wrong when we try to apply this reasoning to define the intersection of an empty list
of sets?

Complex addition practice


Problem 1.7.10: Each of the following problems asks for the sum of two complex numbers.
For each, write the solution and illustrate it with a diagram like that of Figure 1.1. The arrows
you draw should (roughly) correspond to the vectors being added.

a. (3 + 1i) + (2 + 2i)
b. (−1 + 2i) + (1 − 1i)

c. (2 + 0i) + (−3 + .001i)


d. 4(0 + 2i) + (.001 + 1i)

Multiplication of exponentials
Problem 1.7.11: Use the First Law of Exponentiation (Section 1.4.9) to express the product
of two exponentials as a single exponential. For example, e^{(π/4)i} e^{(π/4)i} = e^{(π/2)i}.
a. e^{1i} e^{2i}

b. e^{(π/4)i} e^{(2π/3)i}
c. e^{−(π/4)i} e^{(2π/3)i}

Combining operations on complex numbers


Problem 1.7.12: Write a procedure transform(a,b, L) with the following spec:

• input: complex numbers a and b, and a list L of complex numbers


• output: the list of complex numbers obtained by applying f (z) = az + b to each complex
number in L

Next, for each of the following problems, explain which values to choose for a and b in order
to achieve the specified transformation. If there is no way to achieve the transformation, explain why.

a. Translate z one unit up and one unit to the right, then rotate ninety degrees clockwise, then
scale by two.
b. Scale the real part by two and the imaginary part by three, then rotate by forty-five degrees
counterclockwise, and then translate down two units and left three units.

GF (2) arithmetic
Problem 1.7.13: For each of the following problems, calculate the answer over GF (2).

a. 1 + 1 + 1 + 0
b. 1 · 1 + 0 · 1 + 0 · 0 + 1 · 1

c. (1 + 1 + 1) · (1 + 1 + 1 + 1)

Network coding
Problem 1.7.14: Copy the example network used in Section 1.5.2. Suppose the bits that need
to be transmitted in a given moment are b1 = 1 and b2 = 1. Label each link of the network with
the bit transmitted across it according to the network-coding scheme. Show how the customer
nodes c and d can recover b1 and b2 .
Chapter 2

The Vector

One of the principal objects of


theoretical research in my department of
knowledge is to find the point of view
from which the subject appears in its
greatest simplicity.
Josiah Willard Gibbs

Josiah Gibbs, the inventor of modern vector analysis, was up against stiff competition. The
dominant system of analysis, quaternions, had been invented by Sir William Rowan Hamilton.
Hamilton had been a bona fide prodigy. By age five, he was reported to have learned Latin,
Greek, and Hebrew. By age ten, he had learned twelve languages, including Persian, Arabic,
Hindustani and Sanskrit.
Hamilton was a Trinity man. His uncle (who raised him) had gone to Trinity College in
Dublin, and Hamilton matriculated there. He was first in every subject. However, he did not
complete college; while still an undergraduate, he was appointed Professor of Astronomy.
Among Hamilton’s contributions to mathematics is his elegant theory of quaternions. We
saw in Chapter 1 that the field of complex numbers makes it simple to describe transformations
on points in the plane, such as translations, rotations, and scalings. Hamilton struggled to find
a similar approach. When the solution came to him while he was walking along Dublin’s Royal
Canal with his wife, he committed a particularly egregious form of vandalism, carving the defining
equations in the stone of Brougham Bridge.
Hamilton described his epiphany in a letter to a friend:
And here there dawned on me the notion that we must admit, in some sense, a fourth
dimension of space for the purpose of calculating with triples ... An electric circuit
seemed to close, and a spark flashed forth.
Quaternions occupied much of Hamilton’s subsequent life.

Josiah Willard Gibbs, on the other hand, was a Yale man. His father, Josiah Willard Gibbs,
was a professor at Yale, and the son matriculated there at age fifteen. He got his Ph.D. at Yale,

tutored at Yale, and spent three years studying in Europe, after which he returned to become a
professor at Yale and remained there for the rest of his life. He developed vector analysis as an
alternative to quaternions.

Figure 2.1: Josiah Willard Gibbs, the inventor of vector analysis

For twenty years vector analysis did not appear in published form (the primary source was
unpublished notes) until Gibbs finally agreed to publish a book on the topic. It began to displace
the theory of quaternions because it was more convenient to use.
However, it had the drawback of having been invented by an American. The eminent British
physicist Peter Guthrie Tait, a former student of Hamilton and a partisan of quaternions, attacked
mercilessly, writing, for example,
“Professor Willard Gibbs must be ranked as one of the retarders of ... progress in
virtue of his pamphlet on Vector Analysis; a sort of hermaphrodite monster.”
Tait, Elementary Treatise on Quaternions
Today, quaternions are still used, especially in representing rotations in three dimensions. They
have their advocates in computer graphics and computer vision. However, it is safe to say that, in
the end, vector analysis won out. It is used in nearly every field of science and engineering, in
economics, in mathematics, and, of course, in computer science.

2.1 What is a vector?


The word vector comes from the Latin for “carrier”. We don’t plan to study pests; the term
comes from a vector’s propensity to move something from one location to another.
In some traditional math courses on linear algebra, we are taught to think of a vector as a
list of numbers:
[3.14159, 2.718281828, −1.0, 2.0]

You need to know this way of writing a vector because it is commonly used.1 Indeed, we will
sometimes represent vectors using Python’s lists.

Definition 2.1.1: A vector with four entries, each of which is a real number, is called a 4-vector
over R.

The entries of a vector must all be drawn from a single field. As discussed in the previous
chapter, three examples of fields are R, C, and GF (2). Therefore we can have vectors over each
of these fields.

Definition 2.1.2: For a field F and a positive integer n, a vector with n entries, each belonging
to F, is called an n-vector over F. The set of n-vectors over F is denoted F^n.

For example, the set of 4-vectors over R is written R^4.


This notation might remind you of the notation F^D for the set of functions from D to F. In-
deed, I suggest you interpret F^d as shorthand for F^{0,1,2,3,...,d−1}. According to this interpretation,
F^d is the set of functions from {0, 1, . . . , d − 1} to F.
For example, the 4-vector we started with, [3.14159, 2.718281828, −1.0, 2.0], is in fact the
function

0 ↦ 3.14159
1 ↦ 2.718281828
2 ↦ −1.0
3 ↦ 2.0

2.2 Vectors are functions

1 Often parentheses are used instead of brackets.



excerpt from Matrix Revisited (http://xkcd.com/566/)


Once we embrace this interpretation—once we accept that vectors are functions—a world of
applications opens to us.

Example 2.2.1: Documents as vectors: Here’s an example from a discipline called infor-
mation retrieval that addresses the problem of finding information you want from a corpus of
documents.
Much work in information retrieval has been based on an extremely simple model that dis-
regards grammar entirely: the word-bag model of documents. A document is considered just a
multiset (also called a bag) of words. (A multiset is like a set but can contain more than one
copy of an element. The number of copies is called the multiplicity of the element.)
We can represent a bag of words by a function f whose domain is the set of words and whose
co-domain is the set of real numbers. The image of a word is its multiplicity. Let WORDS be
the set of words (e.g. English words). We write

f : WORDS −→ R

to indicate that f maps from WORDS to R.


Such a function can be interpreted as representing a vector. We would call it a WORDS-vector
over R.

Definition 2.2.2: For a finite set D and a field F, a D-vector over F is a function from D to
F.

This is a computer scientist’s definition; it lends itself to representation in a data structure.


It differs in two important ways from a mathematician’s definition.

• I require the domain D to be finite. This has important mathematical consequences: we will
state theorems that would not be true if D were allowed to be infinite. There are important
mathematical questions that are best modeled using functions with infinite domains, and
you will encounter them if you continue in mathematics.
• The traditional, abstract approach to linear algebra does not directly define vectors at all.
Just as a field is defined as a set of values with some operations (+, -, *, /) that satisfy
certain algebraic laws, a vector space is defined as a set with some operations that satisfy
certain algebraic laws; then vectors are the things in that set. This approach is more
general but it is more abstract, hence harder for some people to grasp. If you continue in
mathematics, you will become very familiar with the abstract approach.

Returning to the more concrete approach we take in this book, according to the notation
from Section 0.3.3, we use F^D to denote the set of functions with domain D and co-domain F,
i.e. the set of all D-vectors over F.

Example 2.2.3: To illustrate this notation for vectors as functions, consider the following:
(a.) R^WORDS : The set of all WORDS-vectors over R, seen in Example 2.2.1 (Page 82).
(b.) GF (2)^{0,1,...,n−1} : The set of all n-vectors over GF (2)

2.2.1 Representation of vectors using Python dictionaries


We will sometimes use Python’s lists to represent vectors. However, we have decreed that a
vector is a function with finite domain, and Python’s dictionaries are a convenient representation
of functions with finite domains. Therefore we often use dictionaries in representing vectors.
For example, the 4-vector of Section 2.1 could be represented as {0:3.14159, 1:2.718281828,
2:-1.0, 3:2.0}.
In Example 2.2.1 (Page 82) we discussed the word-bag model of documents, in which a
document is represented by a WORDS-vector over R. We could represent such a vector as a
dictionary but the dictionary would consist of perhaps two hundred thousand key-value pairs.
Since a typical document uses a small subset of the words in WORDS, most of the values would
be equal to zero. In information-retrieval, one typically has many documents; representing each
of them by a two-hundred-thousand-element dictionary would be profligate. Instead, we adopt
the convention of allowing the omission of key-value pairs whose values are zero. This is called a
sparse representation. For example, the document “The rain in Spain falls mainly on the plain”
would be represented by the dictionary

{’on’: 1, ’Spain’: 1, ’in’: 1, ’plain’: 1, ’the’: 2, ’mainly’: 1,


’rain’: 1, ’falls’: 1}

There is no need to explicitly represent the fact that this vector assigns zero to ’snow’, ’France’,
’primarily’, ’savannah’, and the other elements of WORDS.
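For instance, such a sparse dictionary can be built directly from the text of the document with Python's collections.Counter; this is just an illustrative sketch, not part of the book's support code:

    from collections import Counter

    doc = "The rain in Spain falls mainly on the plain"
    # lower-case the words so that 'The' and 'the' are counted together
    wordbag = dict(Counter(doc.lower().split()))
    # {'the': 2, 'rain': 1, 'in': 1, 'spain': 1, 'falls': 1, 'mainly': 1, 'on': 1, 'plain': 1}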

2.2.2 Sparsity
A vector most of whose values are zero is called a sparse vector. If no more than k of the
entries are nonzero, we say the vector is k-sparse. A k-sparse vector can be represented using
space proportional to k. Therefore, for example, when we represent a corpus of documents by
WORDS-vectors, the storage required is proportional to the total number of words comprising all
the documents.
Vectors that represent data acquired via physical sensors (e.g. images or sound) are not likely
to be sparse. In a future chapter, we will consider a computational problem in which the goal,
given a vector and a parameter k, is to find the “closest” k-sparse vector. After we learn what it
means for vectors to be close, it will be straightforward to solve this computational problem.
A solution to this computational problem would seem to be the key to compressing images
and audio segments, i.e. representing them compactly so more can be stored in the same amount
of computer memory. This is correct, but there is a hitch: unfortunately, the vectors representing
images or sound are not even close to sparse vectors. In Section 5.2, we indicate the way around
this obstacle. In Chapter 10, we explore some compression schemes based on the idea.

In Chapter 4, we introduce matrices and their representation. Because matrices are often
sparse, in order to save on storage and computational time we will again use a dictionary repre-
sentation in which zero values need not be represented.
However, many matrices arising in real-world problems are not sparse in the obvious sense.
In Chapter 11, we investigate another form of sparsity for matrices, low rank. Low-rank matrices
arise in analyzing data to discover factors that explain the data. We consider a computational
problem in which the goal, given a matrix and a parameter k, is to find the closest matrix whose
rank is at most k. We show that linear algebra provides a solution for this computational problem.
It is at the heart of a widely used method called principal component analysis, and we will explore
some of its applications.

2.3 What can we represent with vectors?


We’ve seen two examples of what we can represent with vectors: multisets and sets. Now I want
to give some more examples.

Binary string An n-bit binary string 10111011, e.g. the secret key to a cryptosystem, can
be represented by an n-vector over GF (2), [1, 0, 1, 1, 1, 0, 1, 1]. We will see how some simple
cryptographic schemes can be specified and analyzed using linear algebra.

Attributes In learning theory, we will consider data sets in which each item is represented by
a collection of attribute names and attribute values. This collection is in turn represented by a
function that maps attribute names to the corresponding values.
For example, perhaps the items are congresspeople. Each congressperson is represented by his
or her votes on a set of bills. A single vote is represented by +1, -1, or 0 (aye, nay, or abstain).
We will see in Lab 2.12 a method for measuring the difference between two congresspersons’
voting policies.
Perhaps the items are consumers. Each consumer is represented by his or her age, education
level, and income, e.g.
>>> Jane = {'age':30, 'education level':16, 'income':85000}
Given data on which consumers liked a particular product, one might want to come up with a
function that predicted, for a new consumer vector, whether the consumer would like the product.
This is an example of machine learning. In Lab 8.4, we will consider vectors that describe tissue
samples, and use a rudimentary machine-learning technique to try to predict whether a cancer
is benign or malignant.

State of a system We will also use functions/vectors to represent different states of an evolving
system. The state of the world might be represented, for example, by specifying the population
of each of the five most populous countries:
{'China':1341670000, 'India':1192570000, 'US':308745538,
'Indonesia':237556363, 'Brazil':190732694}
We will see in Chapter 12 that linear algebra provides a way to analyze a system that evolves
over time according to simple known rules.

Probability distribution Since a finite probability distribution is a function from a finite


domain to the real numbers, e.g.
{1:1/6, 2:1/6, 3:1/6, 4:1/6, 5:1/6, 6:1/6}
it can be considered a vector. We will see in Chapter 12 that linear algebra provides a way to
analyze a random process that evolves over time according to simple probabilistic rules. One
such random process underlies the original definition of PageRank, the method by which Google
ranks pages.

Image A black-and-white 1024 × 768 image can be viewed as a function from the set of pairs
{(i, j) : 0 ≤ i < 1024, 0 ≤ j < 768} to the real numbers, and hence as a vector. The
pixel-coordinate pair (i, j) maps to a number, called the intensity of pixel (i, j). We will study
several applications of representing images by vectors, e.g. subsampling, blurring, searching for
a specified subimage, and face detection.

Example 2.3.1: As an example of a black and white image, consider a 4x8 gradient, repre-
sented as a vector in dictionary form (and as an image), where 0 is black and 255 is white:
{(0,0): 0, (0,1): 0, (0,2): 0, (0,3): 0,
(1,0): 32, (1,1): 32, (1,2): 32, (1,3): 32,
(2,0): 64, (2,1): 64, (2,2): 64, (2,3): 64,
(3,0): 96, (3,1): 96, (3,2): 96, (3,3): 96,
(4,0): 128, (4,1): 128, (4,2): 128, (4,3): 128,
(5,0): 160, (5,1): 160, (5,2): 160, (5,3): 160,
(6,0): 192, (6,1): 192, (6,2): 192, (6,3): 192,
(7,0): 224, (7,1): 224, (7,2): 224, (7,3): 224}

Point in space We saw in Chapter 1 that points in the plane could be represented by com-
plex numbers. Here and henceforth, we use vectors to represent points in the plane, in three
dimensions, and in higher-dimensional spaces.

Task 2.3.2: In this task, we will represent a vector using a Python list.
In Python, assign to the variable L a list of 2-element lists:

>>> L = [[2, 2], [3, 2], [1.75, 1], [2, 1], [2.25, 1], [2.5, 1], [2.75, 1], [3, 1], [3.25, 1]]

Use the plot module described in Task 1.4.1 to plot these 2-vectors.
>>> plot(L, 4)

Unlike complex numbers, vectors can represent points in a higher-dimensional space, e.g. a
three-dimensional space:

2.4 Vector addition


We have seen examples of what vectors can represent. Now we study the operations performed
with vectors. We have seen that vectors are useful for representing geometric points. The
concept of a vector originated in geometry, and it is in the context of geometry that the basic
vector operations are most easily motivated. We start with vector addition.

2.4.1 Translation and vector addition


We saw in Chapter 1 that translation was achieved in the complex plane by a function f (z) = z0 +z
that adds a complex number z0 to its input complex number; here we similarly achieve translation
by a function f (v) = v0 + v that adds a vector to its input vector.

Definition 2.4.1: Addition of n-vectors is defined in terms of addition of corresponding entries:

[u1 , u2 , . . . , un ] + [v1 , v2 , . . . , vn ] = [u1 + v1 , u2 + v2 , . . . , un + vn ]

For 2-vectors represented in Python as 2-element lists, the addition procedure is as follows:

def add2(v,w):
return [v[0]+w[0], v[1]+w[1]]

Quiz 2.4.2: Write the translation “go east one mile and north two miles” as a function from
2-vectors to 2-vectors, using vector addition. Next, show the result of applying this function to
the vectors [4, 4] and [−4, −4].

Answer

f (v) = [1, 2] + v

f ([4, 4]) = [5, 6]


f ([−4, −4]) = [−3, −2]

Since a vector such as [1, 2] corresponds to a translation, we can think of the vector as
“carrying” something from one point to another, e.g. from [4, 4] to [5, 6] or from [−4, −4] to
[−3, −2]. This is the sense in which a vector is a carrier.

Task 2.4.3: Recall the list L defined in Task 2.3.2. Enter the procedure definition for 2-vector
addition, and use a comprehension to plot the points obtained from L by adding [1, 2] to each:

>>> plot([add2(v, [1,2]) for v in L], 4)

Quiz 2.4.4: Suppose we represent n-vectors by n-element lists. Write a procedure addn to
compute the sum of two vectors so represented.

Answer

def addn(v, w): return [x+y for (x,y) in zip(v,w)]


or
def addn(v, w): return [v[i]+w[i] for i in range(len(v))]

Every field F has a zero element, so the set FD of D-vectors over F necessarily has a zero
vector, a vector all of whose entries have value zero. I denote this vector by 0D , or merely by 0
if it is not necessary to specify D.
The function f (v) = v + 0 is a translation that leaves its input unchanged.

2.4.2 Vector addition is associative and commutative


Two properties of addition in a field are associativity

(x + y) + z = x + (y + z)

and commutativity
x+y =y+x
Since vector addition is defined in terms of an associative and commutative operation, it too is
associative and commutative:

Proposition 2.4.5 (Associativity and Commutativity of Vector Addition): For any



vectors u, v, w,
(u + v) + w = u + (v + w)
and
u+v =v+u

2.4.3 Vectors as arrows


Like complex numbers in the plane, n-vectors over R can be visualized as arrows in R^n. The
2-vector [3, 1.5] can be represented by an arrow with its tail at the origin and its head at (3, 1.5)

or, equivalently, by an arrow whose tail is at (−2, −1) and whose head is at (1, 0.5):

Exercise 2.4.6: Draw a diagram representing the vector [−2, 4] using two different arrows.

In three dimensions, for example, the vector [1, 2, 3] can be represented by an arrow whose
tail is at the origin and whose head is at [1, 2, 3]

or by an arrow whose tail is at [0, 1, 0] and whose head is at [1, 3, 3]:

Like complex numbers, addition of vectors over R can be visualized using arrows. To add u
and v, place the tail of v’s arrow on the head of u’s arrow, and draw a new arrow (to represent
the sum) from the tail of u to the head of v.

[Figure: arrows for u and v placed head to tail, with a third arrow for u + v from the tail of u to the head of v.]

We can interpret this diagram as follows: the translation corresponding to u can be composed

with the translation corresponding to v to obtain the translation corresponding to u + v.

Exercise 2.4.7: Draw a diagram illustrating [−2, 4] + [1, 2].

2.5 Scalar-vector multiplication


We saw in Chapter 1 that scaling could be represented in the complex plane by a function
f (z) = r z that multiplies its complex-number input by a positive real number r, and that
multiplying by a negative number achieves a simultaneous scaling and rotation by 180 degrees.
The analogous operation for vectors is called scalar-vector multiplication. In the context of
vectors, a field element (e.g. a number) is called a scalar because it can be be used to scale a
vector via multiplication. In this book, we typically use Greek letters (e.g. α, β, γ) to denote
scalars.

Definition 2.5.1: Multiplying a vector v by a scalar α is defined as multiplying each entry of


v by α:
α [v1 , v2 , . . . , vn ] = [α v1 , α v2 , . . . , α vn ]

Example 2.5.2: 2 [5, 4, 10] = [2 · 5, 2 · 4, 2 · 10] = [10, 8, 20]

Quiz 2.5.3: Suppose we represent n-vectors by n-element lists. Write a procedure


scalar_vector_mult(alpha, v) that multiplies the vector v by the scalar alpha.

Answer

def scalar_vector_mult(alpha, v):


return [alpha*v[i] for i in range(len(v))]

Task 2.5.4: Plot the result of scaling the vectors in L by 0.5, then plot the result of scaling
them by -0.5.

How shall we interpret an expression such as 2 [1, 2, 3] + [10, 20, 30]? Do we carry out the
scalar-vector multiplication first or the vector addition? Just as multiplication has precedence
over addition in ordinary arithmetic, scalar-vector multiplication has precedence over vector
addition. Thus (unless parentheses indicate otherwise) scalar-vector multiplication happens first,
and the result of the expression above is [2, 4, 6] + [10, 20, 30], which is [12, 24, 36].

2.5.1 Scaling arrows


Scaling a vector over R by a positive real number changes the length of the corresponding arrow
without changing its direction. For example, an arrow representing the vector [3, 1.5] is this:

and an arrow representing two times this vector is this:

The vector [3, 1.5] corresponds to the translation f (v) = [3, 1.5] + v, and two times this vector
([6, 3]) corresponds to a translation in the same direction but twice as far.
Multiplying a vector by a negative number negates all the entries. As we have seen in
connection with complex numbers, this reverses the direction of the corresponding arrow. For
example, negative two times [3, 1.5] is [−6, −3], which is represented by the arrow

2.5.2 Associativity of scalar-vector multiplication


Multiplying a vector by a scalar and then multiplying the result by another scalar can be sim-
plified:

Proposition 2.5.5 (Associativity of scalar-vector multiplication): α(βv) = (αβ)v

Proof

To show that the left-hand side equals the right-hand side, we show that each entry of the
left-hand side equals the corresponding entry of the right-hand side. For each element k of
the domain D, entry k of βv is βv[k], so entry k of α(βv) is α(βv[k]). Entry k of (αβ)v is
(αβ)v[k]. By the field's associative law, α(βv[k]) and (αβ)v[k] are equal. !

2.5.3 Line segments through the origin


Let v be the 2-vector [3, 2] over R. Consider the set of scalars {0, 0.1, 0.2, 0.3, . . . , 0.9, 1.0}. For
each scalar α in this set, α v is a vector that is somewhat shorter than v but points in the same
direction:

The following plot shows the points obtained by multiplying each of the scalars by v:

plot([scalar_vector_mult(i/10, v) for i in range(11)], 5)

Hmm, seems to be tracing out the line segment from the origin to the point (3, 2). What if
we include as scalar multipliers all the real numbers between 0 and 1? The set of points

{α v : α ∈ R, 0 ≤ α ≤ 1}

forms the line segment between the origin and v. We can visualize this by plotting not all such
points (even Python lacks the power to process an uncountably infinite set of points) but a
sufficiently dense sample, say a hundred points:
plot([scalar_vector_mult(i/100, v) for i in range(101)], 5)

2.5.4 Lines through the origin


As long as we have permitted an infinite set of scalars, let's go all out. What shape do we obtain
when α ranges over all real numbers? The scalars bigger than 1 give rise to somewhat larger
copies of v. The negative scalars give rise to vectors pointing in the opposite direction. Putting
these together, we see that the points of

{αv : α ∈ R}

form the (infinite) line through the origin and through v:

Review question: Express the line segment between the origin and another point through
the origin as a set of scalar multiples of a single vector.
Review question: Express a line through the origin as the set of scalar multiples of a single
vector.

2.6 Combining vector addition and scalar multiplication


2.6.1 Line segments and lines that don’t go through the origin
Great—we can describe the set of points forming a line or line segment through the origin. It
would be a lot more useful if we could describe the set of points forming an arbitrary line or line
segment; we could then, for example, plot street maps:

We already know that the points forming the segment from [0, 0] to [3, 2] are {α [3, 2] : α ∈
R, 0 ≤ α ≤ 1}. By applying the translation [x, y] )→ [x + 0.5, y + 1] to these points,

we obtain the line segment from [0.5, 1] to [3.5, 3]:

Thus the set of points making up this line segment is:

{α [3, 2] + [0.5, 1] : α ∈ R, 0 ≤ α ≤ 1}

Accordingly, we can plot the line segment using the following statement:
plot([add2(scalar_vector_mult(i/100., [3,2]), [0.5,1]) for i in range(101)], 4)

We can similarly represent the entire line through two given points. For example, we know
that the line through [0, 0] and [3, 2] is {α [3, 2] : α ∈ R}. Adding [0.5, 1] to each point in this
set gives us the line through [0.5, 1] and [3.5, 3]: {[0.5, 1] + α [3, 2] : α ∈ R}.

Exercise 2.6.1: Given points u = [2, 3] and v = [5, 7] in R^2, what is the point w such that
the origin-to-w line segment can be translated to yield the u-to-v line segment? And what is
the translation vector that is applied to both endpoints?

Exercise 2.6.2: Given a pair of points, u = [1, 4], v = [6, 3] in R^2, write a mathematical
expression giving the set of points making up the line segment between the points.

2.6.2 Distributive laws for scalar-vector multiplication and vector addition

To get a better understanding of this formulation of line segments and lines, we make use of two
properties that arise in combining scalar-vector multiplication and vector addition. Both arise
from the distributive law for fields, x(y + z) = xy + xz.

Proposition 2.6.3 (Scalar-vector multiplication distributes over vector addition):

α(u + v) = αu + αv (2.1)

Example 2.6.4: As an example, consider the multiplication:

2 ([1, 2, 3] + [3, 4, 4]) = 2 [4, 6, 7] = [8, 12, 14]



which is the same as:

2 ([1, 2, 3] + [3, 4, 4]) = 2 [1, 2, 3] + 2 [3, 4, 4] = [2, 4, 6] + [6, 8, 8] = [8, 12, 14]

Proof

We use the same approach as used in the proof of Proposition 2.5.5. To show that the
left-hand side of Equation 2.1 equals the right-hand side, we show that each entry of the
left-hand side equals the corresponding entry of the right-hand side.
For each element k of the domain D, entry k of (u + v) is u[k] + v[k], so entry k of
α (u + v) is α (u[k] + v[k]).
Entry k of α u is αu[k] and entry k of α v is α v[k], so entry k of α u+α v is α u[k]+α v[k].
Finally, by the distributive law for fields, α(u[k] + v[k]) = α u[k] + α v[k]. !

Proposition 2.6.5 (scalar-vector multiplication distributes over scalar addition):

(α + β)u = αu + βu

Problem 2.6.6: Prove Proposition 2.6.5.

2.6.3 First look at convex combinations


It might seem odd that the form of the expression for the set of points making up the [0.5, 1]-to-
[3.5, 3] segment, {α [3, 2] + [0.5, 1] : α ∈ R, 0 ≤ α ≤ 1}, mentions one endpoint but not the other.
This asymmetry is infelicitous. Using a bit of vector algebra, we can obtain a nicer expression:

α [3, 2] + [0.5, 1] = α ([3.5, 3] − [0.5, 1]) + [0.5, 1]


= α [3.5, 3] − α [0.5, 1] + [0.5, 1] by Proposition 2.6.3
= α [3.5, 3] + (1 − α) [0.5, 1] by Proposition 2.6.5
= α [3.5, 3] + β [0.5, 1]

where β = 1 − α. We now can write an expression for the [0.5, 1]-to-[3.5, 3] segment

{α [3.5, 3] + β [0.5, 1] : α, β ∈ R, α, β ≥ 0, α + β = 1}

that is symmetric in the two endpoints.


An expression of the form α u + β v where α, β ≥ 0 and α + β = 1 is called a convex
combination of u and v. Based on the example, we are led to the following assertion, which is
true for any pair u, v of distinct n-vectors over R:

Proposition 2.6.7: The u-to-v line segment consists of the set of convex combinations of u
and v.

Example 2.6.8: The table below shows some convex combinations of pairs of 1- and 2-vectors
over R:

1. u1 = [2], v1 = [12]
2. u2 = [5, 2], v2 = [10, −6]

                 α = 1     α = .75     α = .5      α = .25     α = 0
                 β = 0     β = .25     β = .5      β = .75     β = 1
    αu1 + βv1    [2]       [4.5]       [7]         [9.5]       [12]
    αu2 + βv2    [5, 2]    [6.25, 0]   [7.5, −2]   [8.75, −4]  [10, −6]
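The entries in such a table can be checked with a few lines of Python; here is a sketch that reuses the scalar_vector_mult and addn procedures from Quizzes 2.5.3 and 2.4.4 (the name convex_comb is ours):

    def convex_comb(alpha, u, v):
        # alpha*u + (1-alpha)*v
        return addn(scalar_vector_mult(alpha, u), scalar_vector_mult(1-alpha, v))

    u2, v2 = [5, 2], [10, -6]
    print([convex_comb(a, u2, v2) for a in [1, .75, .5, .25, 0]])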

Task 2.6.9: Write a Python procedure segment(pt1, pt2) that, given points represented as
2-element lists, returns a list of a hundred points spaced evenly along the line segment whose
endpoints are the two points.
Plot the hundred points resulting when pt1 = [3.5, 3] and pt2 = [0.5, 1].

Example 2.6.10: Let's consider the convex combinations of a pair of vectors u and v that
represent face images.

[Figure: the two face images u and v.]

For example, with scalars 1/2 and 1/2, the convex combination, which is the average of the two
images, is (1/2) u + (1/2) v.

[Figure: the average of the two face images.]

To represent the “line segment” between the two face images, we can take a number of convex
combinations:

1u + 0v, (7/8)u + (1/8)v, (6/8)u + (2/8)v, (5/8)u + (3/8)v, (4/8)u + (4/8)v, (3/8)u + (5/8)v, (2/8)u + (6/8)v, (1/8)u + (7/8)v, 0u + 1v

[Figure: the nine images corresponding to these convex combinations.]

By using these images as frames in a video, we get the effect of a crossfade.

2.6.4 First look at affine combinations


What about the infinite line through [0.5, 1] and [3.5, 3]? We saw that this line consists of the
points of {[0.5, 1] + α [3, 2] : α ∈ R}. Using a similar argument, we can rewrite this set as

{α [3.5, 3] + β [0.5, 1] : α ∈ R, β ∈ R, α + β = 1}

An expression of the form α u + β v where α + β = 1 is called an affine combination of u and v.


Based on the example, we are led to the following assertion:

Hypothesis 2.6.11: The line through u and v consists of the set of affine combinations of u
and v.

In Chapter 3, we will explore affine and convex combinations of more than two vectors.

2.7 Dictionary-based representations of vectors


In Section 2.2, I proposed that a vector is a function from some domain D to a field. In Sec-
tion 2.2.1, I proposed to represent such a function using a Python dictionary. It is convenient to
define a Python class Vec so that an instance has two fields (also known as instance variables,
also known as attributes):

• f, the function, represented by a Python dictionary, and


• D, the domain of the function, represented by a Python set.
We adopt the convention described in Section 2.2.1 in which entries with value zero may be
omitted from the dictionary f . This enables sparse vectors to be represented compactly.
It might seem a bad idea to require that each instance of Vec keep track of the domain.
For example, as I pointed out in Section 2.2.1, in information retrieval one typically has many
documents, each including only a very small subset of words; it would waste memory to duplicate
the entire list of allowed words with each document. Fortunately, as we saw in Section 0.5.4,
Python allows many variables (or instance variables) to point to the same set in memory. Thus,
if we are careful, we can ensure that all the vectors representing documents point to the same
domain.
The Python code required to define the class Vec is

class Vec:
def __init__(self, labels, function):
self.D = labels
self.f = function

Once Python has processed this definition, you can create an instance of Vec like so:
>>> Vec({'A','B','C'}, {'A':1})
The first argument is assigned to the new instance’s D field, and the second is assigned to the
f field. The value of this expression will be the new instance. You can assign the value to a
variable
>>> v = Vec({'A','B','C'}, {'A':1})
and subsequently access the two fields of v, e.g.:

>>> for d in v.D:


... if d in v.f:
... print(v.f[d])
...
1.0

Quiz 2.7.1: Write a procedure zero_vec(D) with the following spec:

• input: a set D
• output: an instance of Vec representing a D-vector all of whose entries have value zero

Answer

Exploiting the sparse-representation convention, we can write the procedure like this:

def zero_vec(D): return Vec(D, {})


Without the convention, one could write it like this:
def zero_vec(D): return Vec(D, {d:0 for d in D})

The procedure zero_vec(D) is defined in the provided file vecutil.py.

2.7.1 Setter and getter


In the following quizzes, you will write procedures that work with the class-based representation
of vectors. In a later problem, you will incorporate some of these procedures into a module that
defines the class Vec.
The following procedure can be used to assign a value to a specified entry of a Vec v:

def setitem(v, d, val): v.f[d] = val


The second argument d should be a member of the domain v.D. The procedure can be used, for
example, as follows:
>>> setitem(v, 'B', 2.)

Quiz 2.7.2: Write a procedure getitem(v, d) with the following spec:


• input: an instance v of Vec, and an element d of the set v.D

• output: the value of entry d of v


Write your procedure in a way that takes into account the sparse-representation convention.
Hint: the procedure can be written in one-line using a conditional expression (Section 0.5.3).
You can use your procedure to obtain the 'A' entry of the vector v we defined earlier:

>>> getitem(v, 'A')


1

Answer

The following solution uses a conditional expression:

def getitem(v,d): return v.f[d] if d in v.f else 0

Using an if-statement, you could write it like this:


def getitem(v,d):
if d in v.f:
return v.f[d]
else:
return 0

2.7.2 Scalar-vector multiplication


Quiz 2.7.3: Write a procedure scalar_mul(v, alpha) with the following spec:

• input: an instance of Vec and a scalar alpha


• output: a new instance of Vec that represents the scalar-vector product alpha times v.
There is a nice way to ensure that the output vector is as sparse as the input vector, but you are
not required to ensure this. You can use getitem(v, d) in your procedure but are not required
to. Be careful to ensure that your procedure does not modify the vector it is passed as argument;
it creates a new instance of Vec. However, the new instance should point to the same set D as
the old instance.

Try it out on the vector v:


>>> scalar_mul(v, 2)
<__main__.Vec object at 0x10058cd10>
Okay, that’s not so enlightening. Let’s look at the dictionary of the resulting Vec:
>>> scalar_mul(v, 2).f
{'A': 2.0, 'C': 0, 'B': 4.0}

Answer

The following procedure does not preserve sparsity.


def scalar_mul(v, alpha):
return Vec(v.D, {d:alpha*getitem(v,d) for d in v.D})

To preserve sparsity, you can instead write

def scalar_mul(v, alpha):


return Vec(v.D, {d:alpha*value for d,value in v.f.items()})

2.7.3 Addition
Quiz 2.7.4: Write a procedure add(u, v) with the following spec:

• input: instances u and v of Vec


• output: an instance of Vec that is the vector sum of u and v

Here’s an example of the procedure being used:


>>> u = Vec(v.D, {'A':5., 'C':10.})
>>> add(u,v)
<__main__.Vec object at 0x10058cd10>
>>> add(u,v).f
{'A': 6.0, 'C': 10.0, 'B': 2.0}

You are encouraged to use getitem(v, d) in order to tolerate sparse representations. You are
encouraged not to try to make the output vector sparse. Finally, you are encouraged to use a
dictionary comprehension to define the dictionary for the new instance of Vec.

Answer

def add(u, v):
    return Vec(u.D, {d:getitem(u,d)+getitem(v,d) for d in u.D})

2.7.4 Vector negative, invertibility of vector addition, and vector subtraction

The negative of a vector v is the vector −v obtained by negating each element of v. If we
interpret v as an arrow, its negative −v is the arrow of the same length pointed in the exactly
opposite direction.
If we interpret v as a translation (e.g. “Go east two miles and north three miles”), its negative
(e.g., “Go east negative two miles and north negative three miles”) is the inverse translation.
Applying one translation and then another leaves you back where you started.
Vector subtraction is defined in terms of vector addition and negative: u − v is defined as
u + (−v). This definition is equivalent to the obvious definition of vector subtraction: subtract
corresponding elements.
Vector subtraction is the inverse of vector addition. For some vector w, consider the function

f (v) = v + w

that adds w to its input and the function

g(v) = v − w

that subtracts w from its input. One function translates its input by w and the other translates
its input by −w. These functions are inverses of each other. Indeed,

(g ◦ f )(v) = g(f (v))


= g(v + w)
= v+w−w
= v

Quiz 2.7.5: Write a Python procedure neg(v) with the following spec:
• input: an instance v of Vec
• output: a dictionary representing the negative of v
Here’s an example of the procedure being used:

>>> neg(v).f
{'A': -1.0, 'C': 0, 'B': -2.0}

There are two ways to write the procedure. One is by explicitly computing the .f field of the
output vector using a comprehension. The other way is by using an appropriate call to the
procedure scalar_mul you defined in Quiz 2.7.3.

Answer

def neg(v):
return Vec(v.D, {d:-getitem(v, d) for d in v.D})
or
def neg(v):
return Vec(v.D, {key:-value for key, value in v.f.items()})
or
def neg(v): return scalar_mul(v, -1)
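Although the quizzes above ask only for neg, a subtraction procedure follows directly from the definition u − v = u + (−v); here is a one-line sketch in terms of add and neg (the name sub is ours):

    def sub(u, v):
        # vector subtraction: u - v = u + (-v)
        return add(u, neg(v))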

2.8 Vectors over GF (2)


So far we have studied only vectors over R. In this section, we consider vectors over GF (2), and
give some example applications. Remember from Section 1.5 that GF (2) is a field in which the
only values are 0 and 1, and adding 1 and 1 gives 0, and subtracting is the same as adding.
For the sake of brevity, we will sometimes write specific n-vectors over GF (2) as n-bit binary
strings. For example, we write 1101 for the 4-vector whose only zero is in its third entry.

Quiz 2.8.1: GF (2) vector addition practice: What is 1101 + 0111? (Note: it is the same as
1101 − 0111.)

Answer
1010

2.8.1 Perfect secrecy re-revisited


Recall Alice and Bob and their need for perfect secrecy. We saw in Section 1.5.1 that encrypting
a single-bit plaintext consisted of adding that bit to a single-bit key, using GF (2) addition. We
saw also that, to encrypt a sequence of bits of plaintext, it sufficed to just encrypt each bit with
its own bit of key. That process can be expressed more compactly using addition of vectors over
GF (2).
Suppose Alice needs to send a ten-bit plaintext p to Bob.

Vernam’s cypher: Alice and Bob randomly choose a 10-vector k.


Alice computes the cyphertext c according to the formula

c=p+k

where the sum is a vector sum.



The first thing to check is that this cryptosystem is decryptable—that Bob, who knows k and c,
can recover p. He does so using the equation

p=c−k (2.2)

Example 2.8.2: For example, Alice and Bob agree on the following 10-vector as a key:

k = [0, 1, 1, 0, 1, 0, 0, 0, 0, 1]

Alice wants to send this message to Bob:

p = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]

She encrypts it by adding p to k:

c = k + p = [0, 1, 1, 0, 1, 0, 0, 0, 0, 1] + [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]

c = [0, 1, 1, 1, 0, 1, 0, 1, 0, 0]
When Bob receives c, he decrypts it by adding k:

c + k = [0, 1, 1, 1, 0, 1, 0, 1, 0, 0] + [0, 1, 1, 0, 1, 0, 0, 0, 0, 1] = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]

which is the original message.
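This example can be carried out with the Vec class and the add procedure of Section 2.7, together with one from the GF2 module. The following is a rough sketch; list2vec is a small helper we define inline here for building a Vec from a list:

    from GF2 import one

    def list2vec(L):
        # build a Vec whose domain is {0, 1, ..., len(L)-1}
        return Vec(set(range(len(L))), {i: x for i, x in enumerate(L)})

    k = list2vec([0, one, one, 0, one, 0, 0, 0, 0, one])    # the key
    p = list2vec([0, 0, 0, one, one, one, 0, one, 0, one])  # the plaintext
    c = add(k, p)           # encryption: c = k + p
    decrypted = add(c, k)   # adding k again recovers p, since k + k = 0 over GF(2)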

Next we check that the system satisfies perfect secrecy. The argument should be familiar.
For each plaintext p, the function k )→ k + p is one-to-one and onto, hence invertible. Since the
key k is chosen uniformly at random, therefore, the cyphertext c is also distributed uniformly.

2.8.2 All-or-nothing secret-sharing using GF (2)


I have a secret: the midterm exam. I’ve represented it as an n-vector v over GF (2). I want to
provide it to my two teaching assistants, Alice and Bob (A and B), so they can administer the
midterm while I’m taking a vacation. However, I don’t completely trust them. One TA might
be bribed by a student into giving out the exam ahead of time, so I don’t want to simply provide
each TA with the exam.
I therefore want to take precautions. I provide pieces to the TAs in such a way that the two
TAs can jointly reconstruct the secret but neither TA alone gains any information
whatsoever.
Here's how I do that. I choose an n-vector vA over GF (2) randomly according to the uniform
distribution. (That involves a lot of coin-flips.) I then compute another n-vector vB by

vB := v − vA

Finally, I provide Alice with vA and Bob with vB , and I leave for vacation.

When the time comes to administer the exam, the two TAs convene to reconstruct the exam
from the vectors they have been given. They simply add together their vectors

vA + v B

The definition of vB ensures that this sum is in fact the secret vector v.
How secure is this scheme against a single devious TA? Assume that Alice is corrupt and
wants to sell information about the exam. Assume that Bob is honest, so Alice cannot get his
help in her evil plan. What does Alice learn from her piece, vA , about the exam v? Since Alice’s
piece was chosen uniformly at random, she learns nothing from it.
Now suppose instead that Bob is the corrupt TA. What does he learn from his piece, vB ,
about the exam? Define the function f : GF (2)^n −→ GF (2)^n by

f (x) = v − x

The function g(y) = v + y is the inverse of f , so f is an invertible function.2 Therefore, since


vB = f (vA ) and vA is chosen according to the uniform distribution, the distribution of vB is also
uniform. This shows that Bob learns nothing about the secret from his piece. The secret-sharing
scheme is secure.
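Here is a rough sketch of the splitting step, using plain Python lists of bits with addition and subtraction over GF (2) realized as arithmetic modulo 2 (the helper name is ours):

    import random

    def split_secret(v):
        # split the bit list v into two shares that only jointly reveal v
        vA = [random.randint(0, 1) for _ in v]       # uniformly random share
        vB = [(x - y) % 2 for x, y in zip(v, vA)]    # vB = v - vA over GF(2)
        return vA, vB

    v = [1, 0, 1, 1, 0]
    vA, vB = split_secret(v)
    assert [(a + b) % 2 for a, b in zip(vA, vB)] == v   # the shares reconstruct v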
The company RSA recently introduced a product based on this idea:

The main idea is to split each password into two parts, and store the two parts on two different
servers. An attack on only one server does not compromise the security of the passwords.

Problem 2.8.3: Explain how to share an n-bit secret among three TAs so that a cabal con-
sisting of any two of them learns nothing about the secret.

2.8.3 Lights Out


Vectors over GF (2) can be used to analyze a puzzle called Lights Out. It is a five-by-five grid of
lighted buttons.
2 In fact, because we are using GF (2), it turns out that g is the same function as f , but that is not important

here.

Initially some lights are on and some are off. When you push a button, you switch the corre-
sponding light (from on to off or vice versa) but you also switch the lights of the button’s four
neighbors. The goal is to turn out all the lights.
Solving this problem is a computational problem:

Computational Problem 2.8.4: Solving Lights Out:


Given an initial configuration of lights, find a sequence of button-pushes that turns out all the
lights, or report that none exists.

This Computational Problem raises a question:

Question 2.8.5: Is there a way to solve the puzzle for every possible starting configuration?

Of course, the Question and the Computational Problem can be studied not just for the tradi-
tional five-by-five version of Lights Out, but for a version of any dimensions.
What do vectors have to do with Lights Out? The state of the puzzle can be represented by
a vector over GF (2) with one entry for each of the grid positions. A convenient domain is the
set of tuples (0,0), (0,1), ..., (4,3), (4,4). We adopt the convention of representing a light that is
on by a one and a light that is off by a zero. Thus the state of the puzzle in the picture is
{(0,0):one, (0,1):one, (0,2):one, (0,3):one, (0,4):0,
(1,0):one, (1,1):one, (1,2):one, (1,3):0, (1,4):one,
(2,0):one, (2,1):one, (2,2):one, (2,3):0, (2,4):one,
(3,0):0, (3,1):0, (3,2):0, (3,3):one, (3,4):one,
(4,0):one, (4,1):one, (4,2):0, (4,3):one, (4,4):one}
Let s denote this vector.

A move consists of pushing a button, which changes the state of the puzzle. For example,
pushing the top-left button (0,0) flips the light at (0,0), at (0,1), and at (1,0). Therefore this
change can be represented by the “button vector”
{(0,0):one, (0,1):one, (1,0):one}

Let v0,0 denote this vector.


The new state resulting when you start at state s and then push button (0, 0) is represented by the vector s + v0,0 . Why?
• For each entry (i, j) for which v0,0 is zero, entries (i, j) in s and s + v0,0 are the same.

• For each entry (i, j) for which v0,0 is one, entries (i, j) in s and s + v0,0 differ.
We chose the vector v0,0 to have ones in exactly the positions that change when you push
button (0, 0).
The state of the puzzle is now represented by the vector s + v0,0 .

Next we will push the button at (1, 1), which flips the lights at (1,1), (0,1), (1,0), (2,1), and
(1,2). This button corresponds to the vector

{(1,1):one, (0,1):one, (1,0):one, (2,1):one, (1,2):one}


which we denote by v1,1 . Before the button is pushed, the state is represented by the vector
s + v0,0 . After the button is pushed, the state is represented by the vector s + v0,0 + v1,1 .

Summarizing, executing a move—pushing a button—means updating the state of the puzzle as:
new state := old state + button vector
where by addition we mean addition of vectors over GF (2). Thus a button vector can be viewed
as a translation.
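Here is a quick sketch of a move in code, using plain Python dictionaries in place of the book's Vec class and 0/1 in place of the GF (2) values; gf2_add is an illustrative helper.

def gf2_add(s, v, D):
    """Entry-wise addition over GF(2): flip s at exactly the positions where v is one."""
    return {d: (s.get(d, 0) + v.get(d, 0)) % 2 for d in D}

D = {(i, j) for i in range(5) for j in range(5)}   # the 5x5 grid positions
s = {(0, 0): 1, (0, 1): 1, (1, 0): 0}               # a (partial) state; missing entries are 0
v00 = {(0, 0): 1, (0, 1): 1, (1, 0): 1}             # button vector for the corner button (0, 0)
new_state = gf2_add(s, v00, D)                      # state after pushing button (0, 0)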
Here is an example of solving an instance of the 3 × 3 puzzle:
[Diagram: a 3 × 3 state, plus a 3 × 3 move (button vector), equals the resulting 3 × 3 new state, each drawn as a grid of lit and unlit buttons: state + move = new state.]
Returning to the 5 × 5 case, there are twenty-five buttons, and for each button there is a
corresponding button vector. We can use these vectors to help us solve the puzzle. Given an
initial state of the lights, the goal is to find a sequence of button-pushes that turn off all the
lights. Translated into the language of vectors, the problem is: given a vector s representing the
initial state, select a sequence of button vectors v1 , . . . , vm such that
(· · · ((s + v1 ) + v2 ) · · · ) + vm = the zero vector
By the associativity of vector addition, the parenthesization on the left-hand side is irrelevant,
so we can instead write
s + v1 + v2 + · · · + vm = the all-zeroes vector
Let us add s to both sides. Since a vector plus itself is the all-zeroes vector, and adding the
all-zeroes vector does nothing, we obtain
v1 + v2 + · · · + vm = s

If a particular button’s vector appears twice on the left-hand side, the two occurrences cancel
each other out. Thus we can restrict attention to solutions in which each button vector appears
at most once.
By the commutativity of vector addition, the order of addition is irrelevant. Thus finding
out how to get from a given initial state of the lights to the completely dark state is equivalent
to selecting a subset of the button vectors whose sum is the vector s corresponding to the given
initial state. If we could solve this puzzle, just think of the energy savings!
For practice, let’s try the 2×2 version of the puzzle. The button vectors for the 2×2 puzzle are:

[Diagram: the four 2 × 2 button vectors, each drawn as a grid of dots]
where the black dots represent ones.


Quiz 2.8.6: Find the subset of the button vectors whose sum is

Answer

[Diagram: the target 2 × 2 configuration expressed as the sum of two of the button vectors]

Now that we know how to model Lights Out in terms of vectors, we can see Computational
Problem 2.8.4 (Solving Lights Out) as a special case of a more general problem:

Computational Problem 2.8.7: Representing a given vector as a sum of a subset of other


given vectors over GF (2)

• input: a vector s and a list L of vectors over GF (2)


• output: A subset of the vectors in L whose sum is s, or a report that there is no such
subset.

There is a brute-force way to compute a solution to this problem: try each possible subset
of vectors in L. The number of possibilities is 2^|L| , 2 to the power of the cardinality of L. For
example, in the Lights Out problem, L consists of twenty-five vectors, one for each of the buttons.
The number of possible subsets is 2^25 , which is 33,554,432.
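Here is a sketch of that brute-force search, with vectors as lists of bits and arithmetic mod 2; the helper names are illustrative, and itertools.combinations enumerates the subsets. For twenty-five button vectors the search is feasible but slow, which is exactly the motivation for the faster algorithm mentioned below.

from itertools import combinations

def gf2_sum(vecs, n):
    """Sum a collection of n-bit vectors (lists of 0/1) over GF(2)."""
    total = [0] * n
    for v in vecs:
        total = [(t + x) % 2 for t, x in zip(total, v)]
    return total

def subset_summing_to(s, L):
    """Return a subset of the vectors in L whose GF(2) sum is s, or None if no such subset exists."""
    n = len(s)
    for k in range(len(L) + 1):
        for subset in combinations(L, k):
            if gf2_sum(subset, n) == s:
                return list(subset)
    return None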
However, there is a much slicker way to compute the solution. In Chapter 3, we introduce
a still more general problem, Computational Problem 3.1.8. In Chapter 7, we will describe an
algorithm to solve it. The algorithm is relevant not only to Lights Out problems but to other,
perhaps more serious problems such as factoring integers.

2.9 Dot-product
For two D-vectors u and v, the dot-product is the sum of the product of corresponding entries:
u · v = ∑_{k ∈ D} u[k] v[k]

For example, for traditional vectors u = [u1 , . . . , un ] and v = [v1 , . . . , vn ],

u · v = u1 v1 + u2 v2 + · · · + un vn

Note that the output is a scalar, not a vector. For this reason, the dot-product is sometimes
called the scalar product of vectors.

Example 2.9.1: Consider the dot-product of [1, 1, 1, 1, 1] with [10, 20, 0, 40, −100]. To find
the dot-product, we can write the two vectors so that corresponding entries are aligned, multiply
the pairs of corresponding entries, and sum the resulting products.

1 1 1 1 1
•10 20 0 40 -100
10 + 20 + 0 + 40 + (-100) = -30
In general, the dot-product of an all-ones vector with a second vector equals the sum of entries
of the second vector:

[1, 1, . . . , 1] · [v1 , v2 , . . . , vn ] = 1 · v1 + 1 · v2 + · · · + 1 · vn
= v1 + v2 + · · · + vn

Example 2.9.2: Consider the dot-product of [0, 0, 0, 0, 1] with [10, 20, 0, 40, −100].
0 0 0 0 1
• 10 20 0 40 -100
0 + 0 + 0 + 0 + (-100) = -100
In general, if only one entry of u, say the ith entry, is 1, and all other entries of u are zero, u · v
is the ith entry of v:

[0, 0, · · · , 0, 1, 0, · · · , 0, 0] · [v1 , v2 , · · · , vi−1 , vi , vi+1 , . . . , vn ]


= 0 · v1 + 0 · v2 + · · · + 0 · vi−1 + 1 · vi + 0 · vi+1 + · · · + 0 · vn
= 1 · vi
= vi

Quiz 2.9.3: Express the average of the entries of an n-vector v as a dot-product.



Answer

Let u be the vector in which every entry is 1/n. Then u · v is the average of the entries of
v.

Quiz 2.9.4: Write a procedure list_dot(u, v) with the following spec:


• input: equal-length lists u and v of field elements

• output: the dot-product of u and v interpreted as vectors


Use the sum(·) procedure together with a list comprehension.

Answer

def list_dot(u, v): return sum([u[i]*v[i] for i in range(len(u))])


or
def list_dot(u, v): return sum([a*b for (a,b) in zip(u,v)])

2.9.1 Total cost or benefit


Example 2.9.5: Suppose D is a set of foods, e.g. four ingredients of beer:

D = {hops, malt, water, yeast}

A cost vector maps each food to a price per unit amount:

cost = Vec(D, {hops : $2.50/ounce, malt : $1.50/pound, water : $0.006, yeast : $0.45/gram})

A quantity vector maps each food to an amount (e.g. measured in pounds). For example, here
is the amount of each of the four ingredients going into about six gallons of stout:
quantity = Vec(D, {hops : 6 ounces, malt : 14 pounds, water : 7 gallons, yeast : 11 grams})
The total cost is the dot-product of cost with quantity:

cost · quantity = $2.50 · 6 + $1.50 · 14 + $0.006 · 7 + $0.45 · 11 = $40.992

A value vector maps each food to its caloric content per pound:

value = {hops : 0, malt : 960, water : 0, yeast : 3.25}

The total calories represented by six gallons of stout is the dot-product of value with quantity:

value · quantity = 0 · 6 + 960 · 14 + 0 · 7 + 3.25 · 11 = 13475.75
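To check these numbers, here is a minimal sketch using plain Python dictionaries rather than the Vec class (units ignored); dict_dot is an illustrative helper, not the book's implementation.

def dict_dot(u, v):
    """Dot-product of two vectors represented as dictionaries with the same keys."""
    return sum(u[k] * v[k] for k in u)

cost     = {'hops': 2.50, 'malt': 1.50, 'water': 0.006, 'yeast': 0.45}
quantity = {'hops': 6,    'malt': 14,   'water': 7,     'yeast': 11}
value    = {'hops': 0,    'malt': 960,  'water': 0,     'yeast': 3.25}

print(dict_dot(cost, quantity))    # 40.992 (up to floating-point rounding)
print(dict_dot(value, quantity))   # 13475.75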



2.9.2 Linear equations


Definition 2.9.6: A linear equation is an equation of the form a · x = β, where a is a vector,
β is a scalar, and x is a vector variable.

The scalar β is called the right-hand side of the linear equation because it is conventionally
written on the right of the equals sign.

Example 2.9.7: Sensor node energy utilization: Sensor networks are made up of small, cheap
sensor nodes. Each sensor node consists of some hardware components (e.g., radio, temperature
sensor, memory, CPU). Often a sensor node is battery-driven and located in a remote place, so
designers care about each component’s power consumption. Define

D = {radio, sensor, memory, CPU}

The function mapping each hardware component to its power consumption is a vector that we
will call rate:

rate = Vec(D, {memory : 0.06W, radio : 0.1W, sensor : 0.004W, CPU : 0.0025W})

The function mapping each component to the amount of time it is on during a test period is a
vector that we will call duration:

duration = Vec(D, {memory : 1.0s, radio : 0.2s, sensor : 0.5s, CPU : 1.0s})

The total energy consumed by the sensor node during the test period is the dot-product of rate
and duration:
duration · rate = 0.0845J
measured in Joules (equivalently, Watt-seconds)

>>> D = {'memory', 'radio', 'sensor', 'CPU'}


>>> rate = Vec(D, {'memory':0.06, 'radio':0.1, 'sensor':0.004, 'CPU':0.0025})
>>> duration = Vec(D, {'memory':1.0, 'radio':0.2, 'sensor':0.5, 'CPU':1.0})
>>> rate*duration
0.0845

Now suppose that in reality we don’t know the power consumption of each hardware com-
ponent; the values of the entries of rate are unknowns. Perhaps we can calculate these values
by testing the total power consumed during each of several test periods. Suppose that there are
three test periods. For i = 1, 2, 3, we have a vector durationi giving the amount of time each
hardware component is on during test period i, and a scalar βi giving the total energy consumed during
the test period. We consider rate a vector-valued variable, and we write down what we know in
terms of three linear equations involving that variable:

duration1 · rate = β1
duration2 · rate = β2
duration3 · rate = β3

Can we compute the entries of rate from these equations? This amounts to two questions:
1. Is there an algorithm to find a vector that satisfies these linear equations?

2. Is there exactly one solution, one vector that satisfies the linear equations?
Even if there is an algorithm to compute some vector that satisfies the linear equations, we
cannot be sure that the solution we compute is in fact the vector we are seeking unless there is
only one vector that satisfies the equations.

Example 2.9.8: Here are some duration vectors:

>>> duration1 = Vec(D, {'memory':1.0, 'radio':0.2, 'sensor':0.5, 'CPU':1.0})


>>> duration2 = Vec(D, {'sensor':0.2, 'CPU':0.4})
>>> duration3 = Vec(D, {'memory':0.3, 'CPU':0.1})
Can we find a vector rate such that duration1*rate = 0.11195, duration2*rate = 0.00158,
and duration3*rate = 0.02422? And is there only one such vector?

Quiz 2.9.9: Using the data in the following table, calculate the rate of energy consumption of
each of the hardware components. The table specifies for each of four test periods how long each
hardware component operates and how much energy is consumed by the sensor node.
radio sensor memory CPU TOTAL ENERGY CONSUMED
test 0 1.0 sec 1.0 sec 0 sec 0 sec 1.5 J
test 1 2.0 sec 1.0 sec 0 0 2.5 J
test 2 0 0 1.0 sec 1.0 sec 1.5 J
test 3 0 0 0 1.0 sec 1 J

Answer

radio sensor memory CPU


1W 0.5 W 0.5 W 1W

Definition 2.9.10: In general, a system of linear equations (often abbreviated linear system)
is a collection of equations:

a1 · x = β1
a2 · x = β2
⋮
am · x = βm          (2.3)

where x is a vector variable. A solution is a vector x̂ that satisfies all the equations.

With these definitions in hand, we return to the two questions raised in connection with esti-
mating the energy consumption of sensor-node components. First, the question of uniqueness:

Question 2.9.11: Uniqueness of solution to a linear system


For a given linear system (such as 2.3), how can we tell if there is only one solution?

Second, the question of computing a solution:

Computational Problem 2.9.12: Solving a linear system

• input: a list of vectors a1 , . . . , am , and corresponding scalars β1 , . . . , βm (the right-hand


sides)

• output: a vector x̂ satisfying the linear system 2.3 or a report that none exists.

Computational Problem 2.8.7, Representing a given vector as a sum of a subset of other given
vectors over GF (2), will turn out to be a special case of this problem. We will explore the
connections in the next couple of chapters. In later chapters, we will describe algorithms to solve
the computational problems.

2.9.3 Measuring similarity


Dot-product can be used to measure the similarity between vectors over R.

Comparing voting records


In Lab 2.12, you will compare the voting records of senators using dot-product. The domain D
is a set of bills the Senate voted on. Each senator is represented by a vector that maps a bill to
{+1, −1, 0}, corresponding to Aye, Nay, or Abstain. Consider the dot-product of two Senators,
e.g. (then) Senator Obama and Senator McCain. For each bill, if the two senators agreed (if
they both voted in favor or both voted against), the product of the corresponding entries is 1. If
the senators disagreed (if one voted in favor and one voted against), the product is -1. If one or
both abstained, the product is zero. Adding up these products gives us a measure of how much
they agree: the higher the sum, the greater the agreement. A positive sum indicates general
agreement, and a negative sum indicates general disagreement.

Comparing audio segments


Suppose you have a short audio clip and want to search for occurrences of it in a longer audio
segment. How would you go about searching for the needle (the short audio clip) in the long
segment (the haystack)?
We pursue this idea in the next example. In preparation, we consider a simpler problem:
measuring the similarity between two audio segments of the same length.
Mathematically, an audio segment is a waveform, a continuous function of time:

The value of the function is amplitude. The amplitude oscillates between being a positive number
and being a negative number. How positive and how negative depends on the volume of the audio.
On a digital computer, the audio segment is represented by a sequence of numbers, values of
the continuous function sampled at regular time intervals, e.g. 44,100 times a second:

Let’s first consider the task of comparing two equally long audio segments. Suppose we have two
segments, each consisting of n samples, represented as n-vectors u and v.

5 -6 9 -9 -5 -9 -5 5 -8 -5 -9 9 8 -5 -9 6 -2 -4 -9 -1 -1 -9 -3

5 -3 -9 0 -1 3 0 -2 -1 6 0 0 -4 5 -7 1 -9 0 -1 0 9 5 -3
One simple way to compare them is using the dot-product ∑_{i=1}^{n} u[i] v[i]. Term i in this sum is
positive if u[i] and v[i] have the same sign, and negative if they have opposite signs. Thus, once
again, the greater the agreement, the greater the value of the dot-product.
Nearly identical audio segments (even if they differ in loudness) will produce a higher value
than different segments. However, the bad news is that if the two segments are even slightly
off in tempo or pitch, the dot-product will be small, probably close to zero. (There are other
techniques to address differences of this kind.)

Finding an audio clip


Back to the problem of finding a needle (a short audio clip) in a haystack (a long audio segment).
Suppose, for example, that the haystack consists of 23 samples and the needle consists of 11
samples.
Suppose we suspect that samples 12 through 22 of the long segment match up with the samples
comprising the short clip. To verify that suspicion, we can form the vector consisting of
samples 12 through 22 of the long segment, and compute the dot-product of that vector with the
short clip:

5 -6 9 -9 -5 -9 -5 5 -8 -5 -9 9 8 -5 -9 6 -2 -4 -9 -1 -1 -9 -3

2 7 4 -3 0 -1 -6 4 5 -8 -9

Of course, we ordinarily have no idea where in the long segment we might find the short clip.
It might start at position 0, or position 1, or ... or 12. There are 23 − 11 + 1 possible starting
positions (not counting those positions too close to the end for the short clip to appear there).
We can evaluate each of these possibilities by computing an appropriate dot-product.

[Illustration: the 11-sample needle slid along the 23-sample haystack, one alignment for each possible starting position, with a dot-product computed at each alignment.]

For this example, the long segment consists of 23 numbers and the short clip consists of 11
numbers, so we end up with 23 − 11 + 1 = 13 dot-products. We put these thirteen numbers in an output
vector.

Quiz 2.9.13: Suppose the haystack is [1, −1, 1, 1, 1, −1, 1, 1, 1] and the needle is [1, −1, 1, 1, −1, 1].
Compute the dot-products and indicate which position achieves the best match.

Answer

The dot-products are [2, 2, 0, 0], so the best matches start at positions 0 and 1 of the haystack.

Quiz 2.9.14: This method of searching is not universally applicable. Say we wanted to locate
the short clip [1, 2, 3] in the longer segment [1, 2, 3, 4, 5, 6]. What would the dot-product method
select as the best match?

Answer

There are 4 possible starts to our vector, and taking the dot product at each yields the
following vector:

[1 + 4 + 9, 2 + 6 + 12, 3 + 8 + 15, 4 + 10 + 18] = [14, 20, 26, 32]

By that measure, the best match starts at position 3 (the slice [4, 5, 6]), which is obviously not right.

Now you will write a program to carry out these dot-products.

Quiz 2.9.15: Write a procedure dot_product_list(needle,haystack) with the following


spec:

• input: a short list needle and a long list haystack, both containing numbers
• output: a list of length len(haystack)-len(needle)+1 such that entry i of the output list
equals the dot-product of the needle with the equal-length sublist of haystack starting at
position i
Your procedure should use a comprehension and use the procedure list_dot(u,v) from Quiz 2.9.4.
Hint: you can use slices as described in Section 0.5.5.

Answer

def dot_product_list(needle, haystack):
    s = len(needle)
    return [list_dot(needle, haystack[i:i+s]) for i in range(len(haystack)-s+1)]

First look at linear filters


In Section 2.9.3, we compared a short needle to a slice of a long haystack by turning the slice
into a vector and taking the dot-product of the slice with the needle. Here is another way to
compute the same number: we turn the needle into a longer vector by padding it with zeroes,
and then calculate the dot-product of the padded vector with the haystack:

5 -6 9 -9 -5 -9 -5 5 -8 -5 -9 9 8 -5 -9 6 -2 -4 -9 -1 -1 -9 -3

0 0 0 0 0 0 0 0 0 0 2 7 4 -3 0 -1 -6 4 5 -8 -9 0 0

We can similarly compute the dot-products corresponding to other alignments of the needle
vector with the haystack vector. This process is an example of applying a linear filter. The short
clip plays the role of the kernel of the filter. In a more realistic example, both the needle vector
and the haystack vector would be much longer. Imagine if the haystack were of length 5,000,000
and the needle were of length 50,000. We would have to compute almost 5,000,000 dot-products,
each involving about 50,000 nonzero numbers. This would take quite a while.
Fortunately, there is a computational shortcut. In Chapter 4, we observe that matrix-vector
multiplication is a convenient notation for computing the output vector w from the input vector u
and the kernel. In Chapter 10, we give an algorithm for quickly computing all these dot-products.
The algorithm draws on an idea we study further in Chapter 12.
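Before moving on, here is a small check of the padding idea above: the dot-product of the needle with a slice of the haystack equals the dot-product of the zero-padded needle with the whole haystack. list_dot is the procedure from Quiz 2.9.4 (repeated here for self-containment), and padded_needle is an illustrative helper.

def list_dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def padded_needle(needle, offset, total_len):
    """Pad the needle with zeroes so it lines up with position offset of the haystack."""
    return [0] * offset + needle + [0] * (total_len - offset - len(needle))

haystack = [5, -6, 9, -9, -5, -9, -5, 5, -8, -5, -9, 9, 8, -5, -9, 6, -2, -4, -9, -1, -1, -9, -3]
needle   = [2, 7, 4, -3, 0, -1, -6, 4, 5, -8, -9]

offset = 10
a = list_dot(needle, haystack[offset:offset + len(needle)])             # dot-product with a slice
b = list_dot(padded_needle(needle, offset, len(haystack)), haystack)    # dot-product with the padded needle
assert a == b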

2.9.4 Dot-product of vectors over GF (2)


We have seen some applications of dot-products of vectors over R. Now we consider dot-products
of vectors over GF (2).

Example 2.9.16: Consider the dot-product of 11111 and 10101:



1 1 1 1 1
• 1 0 1 0 1
1 + 0 + 1 + 0 + 1 = 1

Next, consider the dot-product of 11111 and 00101:

1 1 1 1 1
• 0 0 1 0 1
0 + 0 + 1 + 0 + 1 = 0

In general, when you take the dot-product of an all-ones vector with a second vector, the value
is the parity of the second vector: 0 if the number of ones is even, 1 if the number of ones is
odd.

2.9.5 Parity bit


When data are stored or transmitted, errors can occur. Often a system is designed to detect such
errors if they occur infrequently. The most basic method of error detection is a parity check bit.
To reliably transmit an n-bit sequence, one computes one additional bit, the parity bit, as the
parity of the n-bit sequence, and sends that along with the n-bit sequence.
For example, the PCI (peripheral component interconnect) bus in a computer has a PAR line
that transmits the parity bit. A mismatch generally causes a processor interrupt.
Parity check has its weaknesses:
• If there are exactly two bit errors (more generally, an even number of errors), the parity
check will not detect the problem. In Section 3.6.4, I discuss checksum functions, which do
a better job of catching errors.

• In case there is a single error, parity check doesn’t tell you in which bit position the error
has occurred. In Section 4.7.3, I discuss error-correcting codes, which can locate the error.
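Here is a minimal sketch of computing and checking a parity bit; the parity is just the sum of the bits mod 2 (equivalently, the dot-product with the all-ones vector over GF (2)), and the helper names are illustrative.

def parity(bits):
    """Parity of a bit sequence: 0 if the number of ones is even, 1 if odd."""
    return sum(bits) % 2

def transmit(bits):
    """Append the parity bit to the message."""
    return bits + [parity(bits)]

def check(received):
    """Return True if the received word passes the parity check."""
    return parity(received) == 0    # message bits plus parity bit should sum to 0 mod 2

word = transmit([1, 0, 1, 1, 0])    # parity bit is 1
assert check(word)
word[2] ^= 1                        # a single bit error...
assert not check(word)              # ...is detected
word[3] ^= 1                        # a second error...
assert check(word)                  # ...goes undetected, as noted above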

2.9.6 Simple authentication scheme


We consider schemes that enable a human to log onto a computer over an insecure network.
Such a scheme is called an authentication scheme since it provides a way for the human to give
evidence that he is who he says he is. The most familiar such scheme is based on passwords:
Harry, the human, sends his password to Carole, the computer, and the computer verifies that
it is the correct password.
This scheme is a disaster if there is an eavesdropper, Eve, who can read the bits going over the
network. Eve need only observe one log-on before she learns the password and can subsequently
log on as Harry.
A scheme that is more secure against eavesdroppers is a challenge-response scheme: A human
tries to log on to Carole. In a series of trials, Carole repeatedly asks the human questions that
someone not possessing the password would be unlikely to answer correctly. If the human answers
each of several questions correctly, Carole concludes that the human knows the password.
Here is a simple challenge-response scheme. Suppose the password is an n-bit string, i.e. an
n-vector x̂ over GF (2), chosen uniformly at random. In the ith trial, Carole selects a nonzero
n-vector x̂ over GF (2), chosen uniformly at random. In the ith trial, Carole selects a nonzero
n-vector ai , a challenge vector, and sends it to the human. The human sends back a single bit βi ,
which is supposed to be the dot-product of ai and the password x̂, and Carole checks whether
βi = ai · x̂. If the human passes enough trials, Carole concludes that the human knows the
password, and allows the human to log in.

Figure 2.2: Password Reuse (http://xkcd.com/792/)

Example 2.9.17: The password is x̂ = 10111. Harry initiates log-in. In response, Carole
selects the challenge vector a1 = 01011 and sends it to Harry. Harry computes the dot-product
a1 · x̂:
0 1 0 1 1
• 1 0 1 1 1
0 + 0 + 0 + 1 + 1 = 0
and responds by sending the resulting bit β1 = 0 back to Carole.
Next, Carole sends the challenge vector a2 = 11110 to Harry. Harry computes the dot-
product a2 · x̂:
1 1 1 1 0
• 1 0 1 1 1
1 + 0 + 1 + 1 + 0 = 1
and responds by sending the resulting bit β2 = 1 back to Carole.
This continues for a certain number k of trials. Carole lets Harry log in if β1 = a1 · x̂, β2 =
a2 · x̂, . . . , βk = ak · x̂.
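Here is a small simulation sketch of the scheme, with bits held in lists and dot-products taken mod 2; the procedure names are illustrative, not from the book's support code.

import random

def gf2_dot(a, x):
    """Dot-product over GF(2)."""
    return sum(ai * xi for ai, xi in zip(a, x)) % 2

password = [1, 0, 1, 1, 1]                      # Harry's secret x-hat, as in Example 2.9.17

def challenge(n):
    """Carole picks a nonzero challenge vector uniformly at random."""
    while True:
        a = [random.randint(0, 1) for _ in range(n)]
        if any(a):
            return a

def trial():
    a = challenge(len(password))         # Carole's challenge
    beta = gf2_dot(a, password)          # Harry's response
    return beta == gf2_dot(a, password)  # Carole's check

assert all(trial() for _ in range(20))   # Harry, who knows the password, always passes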

2.9.7 Attacking the simple authentication scheme


We consider how Eve might attack this scheme. Suppose she eavesdrops on m trials in which
Harry correctly responds. She learns a sequence of challenge vectors a1 , a2 , . . . , am and the
corresponding response bits β1 , β2 , . . . , βm . What do these tell Eve about the password?
Since the password is unknown to Eve, she represents it by a vector-valued variable x. Since
Eve knows that Harry correctly computed the response bits, she knows that the following linear
equations are true:
a1 · x = β1
a2 · x = β2
⋮
am · x = βm          (2.4)
Perhaps Eve can compute the password by using an algorithm for Computational Problem 2.9.12,
solving a linear system! Well, perhaps she can find some solution to the system of equations but
is it the correct one? We need to consider Question 2.9.11: does the linear system have a unique
solution?
Perhaps uniqueness is too much to hope for. Eve would likely be satisfied if the number of
solutions were not too large, as long as she could compute them all and then try them out one
by one. Thus we are interested in the following Question and Computational Problem:

Question 2.9.18: Number of solutions to a linear system over GF (2)


How many solutions are there to a given linear system over GF (2)?

Computational Problem 2.9.19: Computing all solutions to a linear system over GF (2)
Find all solutions to a given linear system over GF (2).

However, Eve has another avenue of attack. Perhaps even without precisely identifying the
password, she can use her knowledge of Harry’s response bits to derive the answers to future
challenges! For which future challenge vectors a can the dot-products with x be computed from
the m equations? Stated more generally:

Question 2.9.20: Does a system of linear equations imply any other linear equations? If so,
what other linear equations?

We next study properties of dot-product, one of which helps address this Question.

2.9.8 Algebraic properties of the dot-product


In this section we introduce some simple but powerful algebraic properties of the dot-product.
These hold regardless of the choice of field (e.g. R or GF (2)).

Commutativity When you take a dot-product of two vectors, the order of the two does not
matter:

Proposition 2.9.21 (Commutativity of dot-product): u · v = v · u

Commutativity of the dot-product follows from the fact that scalar-scalar multiplication is com-
mutative:

Proof

[u1 , u2 , . . . , un ] · [v1 , v2 , . . . , vn ] = u1 v1 + u2 v2 + · · · + un vn
= v1 u1 + v2 u2 + · · · + vn un
= [v1 , v2 , . . . , vn ] · [u1 , u2 , . . . , un ]

□

Homogeneity The next property relates dot-product to scalar-vector multiplication: multiplying
one of the vectors in a dot-product by a scalar is equivalent to multiplying the value of the
dot-product by that scalar.

Proposition 2.9.22 (Homogeneity of dot-product): (α u) · v = α (u · v)

Problem 2.9.23: Prove Proposition 2.9.22.

Problem 2.9.24: Show that (α u) · (α v) = α (u · v) is not always true by giving a counterex-


ample.

Distributivity The final property relates dot-product to vector addition.

Proposition 2.9.25 (Dot-product distributes over vector addition): (u + v) · w =


u·w+v·w

Proof

Write u = [u1 , . . . , un ], v = [v1 , . . . , vn ] and w = [w1 , . . . , wn ].

(u + v) · w = ([u1 , . . . , un ] + [v1 , . . . , vn ]) · [w1 , . . . , wn ]


= [u1 + v1 , . . . , un + vn ] · [w1 , . . . , wn ]
= (u1 + v1 )w1 + · · · + (un + vn )wn
= u1 w1 + v1 w1 + · · · + un wn + vn wn
= (u1 w1 + · · · + un wn ) + (v1 w1 + · · · + vn wn )
= [u1 , . . . , un ] · [w1 , . . . , wn ] + [v1 , . . . , vn ] · [w1 , . . . , wn ]

Problem 2.9.26: Show by giving a counterexample that (u + v) · (w + x) = u · w + v · x is


not true.

Example 2.9.27: We first give an example of the distributive property for vectors over the
reals: [27, 37, 47] · [2, 1, 1] = [20, 30, 40] · [2, 1, 1] + [7, 7, 7] · [2, 1, 1]:

20 30 40
• 2 1 1
20 · 2 + 30 · 1 + 40 · 1 = 110

7 7 7
• 2 1 1
7·2 + 7·1 + 7·1 = 28
27 37 47
• 2 1 1
27 · 2 + 37 · 1 + 47 · 1 = 138

2.9.9 Attacking the simple authentication scheme, revisited


I asked in Section 2.9.7 whether Eve can use her knowledge of Harry’s responses to some challenges
to derive the answers to others. We address that question by using the distributive property for
vectors over GF (2).

Example 2.9.28: This example builds on Example 2.9.17 (Page 122). Carole had previously
sent Harry the challenge vectors 01011 and 11110, and Eve had observed that the response bits
were 0 and 1. Suppose Eve subsequently tries to log in as Harry, and Carole happens to send
her as a challenge vector the sum of 01011 and 11110. Eve can use the distributive property to
compute the dot-product of this sum with the password x even though she does not know the
password:
(01011 + 11110) · x = 01011 · x + 11110 · x
= 0 + 1
= 1
Since you know the password, you can verify that this is indeed the correct response to the
challenge vector.

This idea can be taken further. For example, suppose Carole sends a challenge vector that
is the sum of three previously observed challenge vectors. Eve can compute the response bit
(the dot-product with the password) as the sum of the responses to the three previous challenge
vectors.
Indeed, the following math shows that Eve can compute the right response to the sum of any
number of previous challenges for which she has the right response:
if    a1 · x = β1
and   a2 · x = β2
⋮
and   ak · x = βk
then  (a1 + a2 + · · · + ak ) · x = (β1 + β2 + · · · + βk )
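In code, once Eve has identified a subset of observed challenges whose sum is the new challenge, forging the response bit is a one-line computation; finding that subset is the separate, harder problem discussed below. A small sketch (the helper name is illustrative):

def forged_response(observed, subset_indices):
    """observed is a list of (challenge, response) pairs over GF(2);
    return the response to the challenge that is the sum of the chosen challenges."""
    return sum(observed[i][1] for i in subset_indices) % 2

observed = [([0, 1, 0, 1, 1], 0), ([1, 1, 1, 1, 0], 1)]
# The sum of these two challenges is 10101; the correct response is 0 + 1 = 1, as in Example 2.9.28.
assert forged_response(observed, [0, 1]) == 1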

Problem 2.9.29: Eve knows the following challenges and responses:

challenge response
110011 0
101010 0
111011 1
001100 1

Show how she can derive the right responses to the challenges 011101 and 000100.

Imagine that Eve has observed hundreds of challenges a1 , . . . , an and responses β1 , . . . , βn ,


and that she now wants to respond to the challenge a. She must try to find a subset of a1 , . . . , an
whose sum equals a.
Question 2.9.20 asks: Does a system of linear equations imply any other linear equations?
The example suggests a partial answer:
if    a1 · x = β1
and   a2 · x = β2
⋮
and   ak · x = βk
then  (a1 + a2 + · · · + ak ) · x = (β1 + β2 + · · · + βk )

Therefore, from observing challenge vectors and the response bits, Eve can derive the response
to any challenge vector that is the sum of any subset of previously observed challenge vectors.
That presumes, of course, that she can recognize that the new challenge vector can be ex-
pressed as such a sum, and determine which sum! This is precisely Computational Problem 2.8.7.
We are starting to see the power of computational problems in linear algebra; the same
computational problem arises both in solving a puzzle and in attacking an authentication scheme!
Of course, there are many other settings in which this problem arises.

2.10 Our implementation of Vec


In Section 2.7, we gave the definition of a rudimentary Python class for representing vectors, and
we developed some procedures for manipulating this representation.

2.10.1 Syntax for manipulating Vecs


We will expand our class definition of Vec to provide some notational conveniences:

operation syntax
vector addition u+v
vector negation -v
vector subtraction u-v
scalar-vector multiplication alpha*v
division of a vector by a scalar v/alpha
dot-product u*v
getting value of an entry v[d]
setting value of an entry v[d] = ...
testing vector equality u == v
pretty-printing a vector print(v)
copying a vector v.copy()

In addition, if an expression has as a result a Vec instance, the value of the expression will be
presented not as an obscure Python incantation
>>> v
<__main__.Vec object at 0x10058cad0>
but as an expression whose value is a vector:
>>> v
Vec({'A', 'B', 'C'},{'A': 1.0})

2.10.2 The implementation


In Problem 2.14.10, you will implement Vec. However, since this book is not about the intricacies
of defining classes in Python, you need not write the class definition; it will be provided for you.
All you need to do is fill in the missing bodies of some procedures, most of which you wrote in
Section 2.7.

2.10.3 Using Vecs


You will write the bodies of named procedures such as setitem(v, d, val) and add(u,v) and
scalar_mul(v, alpha). However, in actually using Vecs in other code, you must use operators
instead of named procedures, e.g.
>>> v['a'] = 1.0
instead of
>>> setitem(v, 'a', 1.0)
and
>>> b = b - (b*v)*v
instead of
>>> b = add(b, neg(scalar_mul(v, dot(b,v))))

In fact, in code outside the vec module that uses Vec, you will import just Vec from the vec
module:
from vec import Vec
so the named procedures will not be imported into the namespace. Those named procedures in
the vec module are intended to be used only inside the vec module itself.

2.10.4 Printing Vecs


The class Vec defines a procedure that turns an instance into a string for the purpose of printing:
>>> print(v)

A B C
------
1 0 0
The procedure for pretty-printing a vector v must select some order on the domain v.D. Ours uses
sorted(v.D, key=hash), which agrees with numerical order on numbers and with alphabetical
order on strings, and which does something reasonable on tuples.

2.10.5 Copying Vecs


The Vec class defines a .copy() method. This method, called on an instance of Vec, returns a
new instance that is equal to the old instance. It shares the domain .D with the old instance,
but has a new function .f that is initially equal to that of the old instance.
Ordinarily you won’t need to copy Vecs. The scalar-vector multiplication and vector addition
operations return new instances of Vec and do not mutate their inputs.

2.10.6 From list to Vec


The Vec class is a useful way of representing vectors, but it is not the only such representation. As
mentioned in Section 2.1, we will sometimes represent vectors by lists. A list L can be viewed as a
function from {0, 1, 2, . . . , len(L) − 1}, so it is possible to convert from a list-based representation
to a dictionary-based representation.

Quiz 2.10.1: Write a procedure list2vec(L) with the following spec:


• input: a list L of field elements
• output: an instance v of Vec with domain {0, 1, 2, . . . , len(L) − 1} such that v[i] = L[i]
for each integer i in the domain

Answer

def list2vec(L):
    return Vec(set(range(len(L))), {k:x for k,x in enumerate(L)})


or

def list2vec(L):
    return Vec(set(range(len(L))), {k:L[k] for k in range(len(L))})

This procedure facilitates quickly creating small Vec examples. The procedure definition is
included in the provided file vecutil.py.

2.11 Solving a triangular system of linear equations


As a step towards Computational Problem 2.9.12 (Solving a linear system), we describe an
algorithm for solving a system if the system has a special form.

2.11.1 Upper-triangular systems


An upper-triangular system of linear equations has the form

[ a11 , a12 , a13 , a14 , ··· a1,n−1 , a1,n ] · x = β1


[ 0, a22 , a23 , a24 , ··· a2,n−1 , a2,n ] · x = β2
[ 0, 0, a33 , a34 , ··· a3,n−1 , a3,n ] · x = β3
..
.
[ 0, 0, 0, 0, ··· an−1,n−1 , an−1,n ] · x = βn−1
[ 0, 0, 0, 0, ··· 0, an,n ] · x = βn

That is,

• the first vector need not have any zeroes,


• the second vector has a zero in the first position,

• the third vector has zeroes in the first and second positions,

• the fourth vector has zeroes in the first, second, and third positions,

..
.

• the n − 1st vector is all zeroes except possibly for the n − 1st and nth entries, and
• the nth vector is all zeroes except possibly for the nth entry.

Example 2.11.1: Here’s an example using 4-vectors:

[ 1, 0.5, −2, 4 ] · x = −8
[ 0, 3, 3, 2 ]·x = 3
[ 0, 0, 1, 5 ] · x = −4
[ 0, 0, 0, 2 ]·x = 6

The right-hand sides are -8, 3, -4, and 6.

The origin of the term upper-triangular system should be apparent by considering the positions
of the nonzero entries: they form a triangle:

Writing x = [x1 , x2 , x3 , x4 ] and using the definition of dot-product, we can rewrite this system
as four ordinary equations in the (scalar) unknowns x1 , x2 , x3 , x4 :
1x1 + 0.5x2 − 2x3 + 4x4 = −8
3x2 + 3x3 + 2x4 = 3
1x3 + 5x4 = −4
2x4 = 6

2.11.2 Backward substitution


This suggests a solution strategy. First, solve for x4 using the fourth equation. Plug the resulting
value for x4 into the third equation, and solve for x3 . Plug the values for x3 and x4 into the
second equation and solve for x2 . Plug the values for x2 , x3 , and x4 into the first equation and
solve for x1 . In each iteration, only one variable needs to be solved for.
Thus the above system is solved as follows:
2x4 = 6
so x4 = 6/2 = 3

1x3 = −4 − 5x4 = −4 − 5(3) = −19


so x3 = −19/1 = −19

3x2 = 3 − 3x3 − 2x4 = 3 − 3(−19) − 2(3) = 54


so x2 = 54/3 = 18

1x1 = −8 − 0.5x2 + 2x3 − 4x4 = −8 − 0.5(18) + 2(−19) − 4(3) = −67


so x1 = −67/1 = −67
The algorithm I have illustrated is called backward substitution (“backward” because it starts
with the last equation and works its way towards the first).

Quiz 2.11.2: Using the above technique, solve the following system by hand:

2x1 + 3x2 − 4x3 = 10


1x2 + 2x3 = 3
5x3 = 15

Answer

x3 = 15/5 = 3
x2 = 3 − 2x3 = −3
x1 = (10 + 4x3 − 3x2 )/2 = (10 + 12 + 9)/2 = 31/2

Exercise 2.11.3: Solve the following system:

1x1 − 3x2 − 2x3 = 7


2x2 + 4x3 = 4
−10x3 = 12

2.11.3 First implementation of backward substitution


There is a convenient way to express this algorithm in terms of vectors and dot-products. The
procedure initializes the solution vector x to the all-zeroes vector. The procedure will populate
x entry by entry, starting at the last entry. By the beginning of the iteration in which xi will be
populated, entries xi+1 , xi+2 , . . . , xn will have already been populated and the other entries are
zero, so the procedure can use a dot-product to calculate the part of the expression that involves
variables whose values are already known:

aii · (value of xi) = βi − (expression involving known variables)

so

value of xi = (βi − (expression involving known variables)) / aii
Using this idea, let’s write a procedure triangular_solve_n(rowlist, b) with the following
spec:
• input: for some integer n, a triangular system consisting of a list rowlist of n-vectors, and
a length-n list b of numbers
• output: a vector x̂ such that, for i = 0, 1, . . . , n − 1, the dot-product of rowlist[i] with x̂
equals b[i]
The n in the name indicates that this procedure requires each of the vectors in rowlist to have
domain {0, 1, 2, . . . , n − 1}. (We will later write a procedure without this requirement.)

Here is the code:


def triangular_solve_n(rowlist, b):
    D = rowlist[0].D
    n = len(D)
    assert D == set(range(n))
    x = zero_vec(D)
    for i in reversed(range(n)):
        x[i] = (b[i] - rowlist[i] * x)/rowlist[i][i]
    return x

Exercise 2.11.4: Enter triangular_solve_n into Python and try it out on the example
system above.

2.11.4 When does the algorithm work?


The backward substitution algorithm does not work on all upper triangular systems of equations.
If rowlist[i][i] is zero for some i, the algorithm will fail. We must therefore require when
using this algorithm that these entries are not zero. Thus the spec given above is incomplete.
If these entries are nonzero so the algorithm does succeed, it will have found the only solution
to the system of linear equations. The proof is by induction; it is based on the observation that
the value assigned to a variable in each iteration is the only possible value for that variable that
is consistent with the values assigned to variables in previous iterations.

Proposition 2.11.5: For a triangular system specified by a length-n list rowlist of n-vectors
and an n-vector b, if rowlist[i][i] ≠ 0 for i = 0, 1, . . . , n − 1 then the solution found by
triangular_solve_n(rowlist, b) is the only solution to the system.

On the other hand,

Proposition 2.11.6: For a length-n list rowlist of n-vectors, if rowlist[i][i] = 0 for some
integer i then there is a vector b for which the triangular system has no solution.

Proof

Let k be the largest integer less than n such that rowlist[k][k] = 0. Define b to be a
vector whose entries are all zero except for entry k which is nonzero. The algorithm iterates
i = n − 1, n − 2, . . . , k + 1. In each of these iterations, the value of x before the iteration is
the zero vector, and b[i] is zero, so x[i] is assigned zero. Moreover, in each of these iterations,
the value assigned is the only possible value consistent with the values assigned to variables
in previous iterations.
Finally, the algorithm gets to i = k. The equation considered at this point is

rowlist[k][k]*x[k]+rowlist[k][k+1]*x[k+1]+ · · · +rowlist[k][n-1]*x[n-1] = nonzero

but the variables x[k+1], x[k+2], x[n-1] have all been forced to be zero, and rowlist[k][k]
is zero, so the left-hand side of the equation is zero, so the equation cannot be satisfied. □

2.11.5 Backward substitution with arbitrary-domain vectors


Next we write a procedure triangular_solve(rowlist, label_list, b) to solve a triangular
system in which the domain of the vectors in rowlist need not be {0, 1, 2, . . . , n − 1}. What
does it mean for a system to be triangular? The argument label_list is a list that specifies an
ordering of the domain. For the system to be triangular,
• the first vector in rowlist need not have any zeroes,

• the second vector has a zero in the entry labeled by the first element of label_list,

• the third vector has zeroes in the entries labeled by the first two elements of label_list,
and so on.
The spec of the procedure is:
• input: for some positive integer n, a list rowlist of n Vecs all having the same n-element
domain D, a list label_list consisting of the elements of D, and a list b consisting of n
numbers such that, for i = 0, 1, . . . , n − 1,
– rowlist[i][label_list[j]] is zero for j = 0, 1, 2, . . . , i − 1 and is nonzero for j = i
• output: the Vec x such that, for i = 0, 1, . . . , n − 1, the dot-product of rowlist[i] and x
equals b[i].
The procedure involves making small changes to the procedure given in Section 2.11.3.
Here I illustrate how the procedure is used.
>>> label_list = ['a','b','c','d']
>>> D = set(label_list)
>>> rowlist=[Vec(D,{'a':4, 'b':-2,'c':0.5,'d':1}), Vec(D,{'b':2,'c':3,'d':3}),
Vec(D,{'c':5, 'd':1}), Vec(D,{'d':2.})]
>>> b = [6, -4, 3, -8]
>>> triangular_solve(rowlist, label_list, b)
Vec({'d', 'b', 'c', 'a'},{'d': -4.0, 'b': 1.9, 'c': 1.4, 'a': 3.275})
Here is the code for triangular_solve. Note that it uses the procedure zero_vec(D).
def triangular_solve(rowlist, label_list, b):
    D = rowlist[0].D
    x = zero_vec(D)
    for j in reversed(range(len(D))):
        c = label_list[j]
        row = rowlist[j]
        x[c] = (b[j] - x*row)/row[c]
    return x
The procedures triangular_solve(rowlist, label_list, b) and
triangular_solve_n(rowlist, b) are provided in the module triangular.

2.12 Lab: Comparing voting records using dot-product


In this lab, we will represent a US senator’s voting record as a vector over R, and will use
dot-products to compare voting records. For this lab, we will just use a list to represent a
vector.

2.12.1 Motivation
These are troubled times. You might not have noticed from atop the ivory tower, but take
our word for it that the current sociopolitical landscape is in a state of abject turmoil. Now
is the time for a hero. Now is the time for someone to take up the mantle of protector, of
the people’s shepherd. Now is the time for linear algebra.
In this lab, we will use vectors to evaluate objectively the political mindset of the senators
who represent us. Each senator’s voting record can be represented as a vector, where each
element of that vector represents how that senator voted on a given piece of legislation. By
looking at the difference between the “voting vectors” of two senators, we can dispel the fog
of politics and see just where our representatives stand.
Or, rather, stood. Our data are a bit dated. On the bright side, you get to see how
Obama did as a senator. In case you want to try out your code on data from more recent
years, we will post more data files on resources.codingthematrix.com.

2.12.2 Reading in the file


As in the last lab, the information you need to work with is stored in a whitespace-
delimited text file. The senatorial voting records for the 109th Congress can be found
in voting_record_dump109.txt.
Each line of the file represents the voting record of a different senator. In case you’ve
forgotten how to read in the file, you can do it like this:

>>> f = open('voting_record_dump109.txt')
>>> mylist = list(f)

You can use the split(·) procedure to split each line of the file into a list; the first
element of the list will be the senator’s name, the second will be his/her party affiliation (R
or D), the third will be his/her home state, and the remaining elements of the list will be
that senator’s voting record on a collection of bills. A “1” represents a ’yea’ vote, a “-1” a
’nay’, and a “0” an abstention.

Task 2.12.1: Write a procedure create_voting_dict(strlist) that, given a list of
strings (voting records from the source file), returns a dictionary that maps the last name
of a senator to a list of numbers representing that senator’s voting record. You will need to
use the built-in procedure int(·) to convert a string representation of an integer (e.g. ‘1’)
to the actual integer (e.g. 1).

2.12.3 Two ways to use dot-product to compare vectors


Suppose u and v are two vectors. Let’s take the simple case (relevant to the current lab) in
which the entries are all 1, 0, or -1. Recall that the dot-product of u and v is defined as
u · v = ∑_k u[k] v[k]

Consider the kth entry. If both u[k] and v[k] are 1, the corresponding term in the sum is 1.
If both u[k] and v[k] are -1, the corresponding term in the sum is also 1. Thus a term in the
sum that is 1 indicates agreement. If, on the other hand, u[k] and v[k] have different signs,
the corresponding term is -1. Thus a term in the sum that is -1 indicates disagreement. (If
one or both of u[k] and v[k] are zero then the term is zero, reflecting the fact that those
entries provide no evidence of either agreement or disagreement.) The dot-product of u and
v therefore is a measure of how much u and v are in agreement.

2.12.4 Policy comparison


We would like to determine just how like-minded two given senators are. We will use the
dot-product of vectors u and v to judge how often two senators are in agreement.

Task 2.12.2: Write a procedure policy_compare(sen_a, sen_b, voting_dict) that,
given two names of senators and a dictionary mapping senator names to lists representing
voting records, returns the dot-product representing the degree of similarity between two
senators’ voting policies.

Task 2.12.3: Write a procedure most_similar(sen, voting_dict) that, given the name
of a senator and a dictionary mapping senator names to lists representing voting records,
returns the name of the senator whose political mindset is most like the input senator
(excluding, of course, the input senator him/herself).

Task 2.12.4: Write a very similar procedure least_similar(sen, voting_dict) that
returns the name of the senator whose voting record agrees the least with the senator whose
name is sen.

Task 2.12.5: Use these procedures to figure out which senator is most like Rhode Island
legend Lincoln Chafee. Then use these procedures to see who disagrees most with Pennsyl-
vania’s Rick Santorum. Give their names.

Task 2.12.6: How similar are the voting records of the two senators from your favorite
state?

2.12.5 Not your average Democrat



Task 2.12.7: Write a procedure find_average_similarity(sen, sen_set, voting_dict)
that, given the name sen of a senator, compares that senator’s voting record to the voting
records of all senators whose names are in sen_set, computing a dot-product for each, and
then returns the average dot-product.
Use your procedure to compute which senator has the greatest average similarity with
the set of Democrats (you can extract this set from the input file).

In the last task, you had to compare each senator’s record to the voting record of each
Democrat senator. If you were doing the same computation with, say, the movie preferences
of all Netflix subscribers, it would take far too long to be practical.
Next we see that there is a computational shortcut, based on an algebraic property of
the dot-product: the distributive property:

(v1 + v2 ) · x = v1 · x + v2 · x

Task 2.12.8: Write a procedure find_average_record(sen_set, voting_dict) that,
given a set of names of senators, finds the average voting record. That is, perform vector
addition on the lists representing their voting records, and then divide the sum by the number
of vectors. The result should be a vector.
Use this procedure to compute the average voting record for the set of Democrats, and
assign the result to the variable average_Democrat_record. Next find which senator’s
voting record is most similar to the average Democrat voting record. Did you get the same
result as in Task 2.12.7? Can you explain?

2.12.6 Bitter Rivals


Task 2.12.9: Write a procedure bitter_rivals(voting_dict) to find which two sena-
tors disagree the most.

This task again requires comparing each pair of voting records. Can this be done faster than
the obvious way? There is a slightly more efficient algorithm, using fast matrix multiplication.
We will study matrix multiplication later, although we won’t cover the theoretically fast
algorithms.

2.12.7 Open-ended study


You have just coded a set of simple yet powerful tools for sifting the truth from the sordid
flour of contemporary politics. Use your new abilities to answer at least one of the following
questions (or make up one of your own):

• Who/which is the most Republican/Democratic senator/state?


• Is John McCain really a maverick?

• Is Barack Obama really an extremist?


• Which two senators are the most bitter rivals?

• Which senator has the most political opponents? (Assume two senators are opponents
if their dot-product is very negative, i.e. is less than some negative threshold.)

2.13 Review Questions


• What is vector addition?
• What is the geometric interpretation of vector addition?

• What is scalar-vector multiplication?


• What is the distributive property that involves scalar-vector multiplication but not vector
addition?

• What is the distributive property that involves both scalar-vector multiplication and vector
addition?

• How is scalar-vector multiplication used to represent the line through the origin and a given
point?

• How are scalar-vector multiplication and vector addition used to represent the line through
a pair of given points?

• What is dot-product?
• What is the homogeneity property that relates dot-product to scalar-vector multiplication?

• What is the distributive property that relates dot-product to vector addition?
• What is a linear equation (expressed using dot-product)?
• What is a linear system?
• What is an upper-triangular linear system?

• How can one solve an upper-triangular linear system?

2.14 Problems
Vector addition practice
Problem 2.14.1: For vectors v = [−1, 3] and u = [0, 4], find the vectors v + u, v − u, and
3v − 2u. Draw these vectors as arrows on the same graph.

Problem 2.14.2: Given the vectors v = [2, −1, 5] and u = [−1, 1, 1], find the vectors v + u,
v − u, 2v − u, and v + 2u.

Problem 2.14.3: For the vectors v = [0, one, one] and u = [one, one, one] over GF (2), find
v + u and v + u + u.

Expressing one GF (2) vector as a sum of others


Problem 2.14.4: Here are six 7-vectors over GF (2):
a= 1100000 d= 0001100
b= 0110000 e= 0000110
c= 0011000 f= 0000011
For each of the following vectors u, find a subset of the above vectors whose sum is u, or report
that no such subset exists.

1. u = 0010010

2. u = 0100010

Problem 2.14.5: Here are six 7-vectors over GF (2):


a= 1110000 d= 0001110
b= 0111000 e= 0000111
c= 0011100 f= 0000011
For each of the following vectors u, find a subset of the above vectors whose sum is u, or report
that no such subset exists.
1. u = 0010010

2. u = 0100010

Finding a solution to linear equations over GF (2)


Problem 2.14.6: Find a vector x = [x1 , x2 , x3 , x4 ] over GF (2) satisfying the following linear
equations:

1100 · x = 1
1010 · x = 1
1111 · x = 1

Show that x + 1111 also satisfies the equations.

Formulating equations using dot-product


Problem 2.14.7: Consider the equations

2x0 + 3x1 − 4x2 + x3 = 10


x0 − 5x1 + 2x2 + 0x3 = 35
4x0 + x1 − x2 − x3 = 8

Your job is not to solve these equations but to formulate them using dot-product. In particular,
come up with three vectors v1, v2, and v3 represented as lists so that the above equations are
equivalent to

v1 · x = 10
v2 · x = 35
v3 · x = 8

where x is a 4-vector over R.

Plotting lines and line segments


Problem 2.14.8: Use the plot module to plot

(a) a substantial portion of the line through [-1.5,2] and [3,0], and
(b) the line segment between [2,1] and [-2,2].

For each, provide the Python statements you used and the plot obtained.

Practice with dot-product


Problem 2.14.9: For each of the following pairs of vectors u and v over R, evaluate the
expression u · v:
(a) u = [1, 0], v = [5, 4321]
(b) u = [0, 1], v = [12345, 6]

(c) u = [−1, 3], v = [5, 7]


(d) u = [−√2/2, √2/2], v = [√2/2, −√2/2]

Writing procedures for the Vec class



Problem 2.14.10: Download the file vec.py to your computer, and edit it. The file defines
procedures using the Python statement pass, which does nothing. You can import the vec
module and create instances of Vec but the operations such as * and + currently do nothing.
Your job is to replace each occurrence of the pass statement with appropriate code. Your code
for a procedure can include calls to others of the seven. You should make no changes to the
class definition.

Docstrings At the beginning of each procedure body is a multi-line string (delimited by


triple quote marks). This is called a documentation string (docstring). It specifies what the
procedure should do.

Doctests The documentation string we provide for a procedure also includes examples of the
functionality that procedure is supposed to provide to Vecs. The examples show an interaction
with Python: statements and expressions are evaluated by Python, and Python’s responses are
shown. These examples are provided to you as tests (called doctests). You should make sure that
your procedure is written in such a way that the behavior of your Vec implementation matches
that in the examples. If not, your implementation is incorrect.a
Python provides convenient ways to test whether a module such as vec passes all its doctests.
You don’t even need to be in a Python session. From a console, make sure your current working
directory is the one containing vec.py, and type

python3 -m doctest vec.py


to the console, where python3 is the name of your Python executable. If your implementation
passes all the tests, this command will print nothing. Otherwise, the command prints information
on which tests were failed.
You can also test a module’s doctest from within a Python session:

>>> import doctest


>>> doctest.testfile("vec.py")

Assertions For most of the procedures to be written, the first statement after the docstring
is an assertion. Executing an assertion verifies that the condition is true, and raises an error if
not. The assertions are there to detect errors in the use of the procedures. Take a look at the
assertions to make sure you understand them. You can take them out, but you do so at your
own risk.

Arbitrary set as domain: Our vector implementation allows the domain to be, for example,
a set of strings. Do not make the mistake of assuming that the domain consists of integers. If
your code includes len or range, you’re doing it wrong.

Sparse representation: Your procedures should be able to cope with our sparse represen-
tation, i.e. an element in the domain v.D that is not a key of the dictionary v.f. For example,
getitem(v, k) should return a value for every domain element even if k is not a key of v.f.

However, your procedures need not make any effort to retain sparsity when adding two vectors.
That is, for two instances u and v of Vec, it is okay if every element of u.D is represented
explicitly in the dictionary of the instance u+v.
Several other procedures need to be written with the sparsity convention in mind. For
example, two vectors can be equal even if their .f fields are not equal: one vector’s .f field can
contain a key-value pair in which the value is zero, and the other vector’s .f field can omit this
particular key. For this reason, the equal(u, v) procedure needs to be written with care.
Note: The examples provided for each procedure are supposed to test that procedure; however, note that, since equality is used in tests for procedures other than equal(u,v), a bug in your definition for equal(u,v) could cause another procedure's test to fail.
Chapter 3

The Vector Space

[Geometry of the ancients] ... is so


exclusively restricted to the consideration
of figures that it can exercise the
understanding only on condition of
greatly fatiguing the imagination....
René Descartes, Discourse on Method

In the course of discussing applications of vectors in the previous chapter, we encountered four
Questions. We will soon encounter two more. However, we won’t answer any of the Questions in
this chapter; instead, we will turn them into newer and deeper Questions. The answers will come
in Chapters 5 and 6. In this chapter, we will encounter the concept of vector spaces, a concept
that underlies the answers and everything else we do in this book.

3.1 Linear combination


3.1.1 Definition of linear combination
Definition 3.1.1: Suppose v1 , . . . , vn are vectors. We define a linear combination of v1 , . . . , vn
to be a sum
α 1 v1 + · · · + αn vn
where α1 , . . . , αn are scalars. In this context, we refer to α1 , . . . , αn as the coefficients in this
linear combination. In particular, α1 is the coefficient of v1 in the linear combination, α2 is the
coefficient of v2 , and so on.

Example 3.1.2: One linear combination of [2, 3.5] and [4, 10] is

−5 [2, 3.5] + 2 [4, 10]


which is equal to [−5 · 2, −5 · 3.5] + [2 · 4, 2 · 10], which is equal to [−10, −17.5] + [8, 20], which
is [−2, 2.5].
Another linear combination of the same vectors is

0 [2, 3.5] + 0 [4, 10]

which is equal to the zero vector [0, 0].
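This arithmetic is easy to check in Python. Here is a small illustrative snippet (not part of the book's support code) that computes a linear combination of 2-vectors represented as lists:

def lin_comb2(coeffs, vecs):
    # entry i of the result is the sum over j of coeffs[j] * vecs[j][i]
    return [sum(c * v[i] for c, v in zip(coeffs, vecs)) for i in range(len(vecs[0]))]

print(lin_comb2([-5, 2], [[2, 3.5], [4, 10]]))   # prints [-2, 2.5]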

If all the coefficients in a linear combination are zero, we say that it is a trivial linear combination.

3.1.2 Uses of linear combinations


Example 3.1.3: Stock portfolios: Let D be the set of stocks. A D-vector over R represents a
portfolio, i.e. it maps each stock to a number of shares owned.
Suppose that there are n mutual funds. For i = 1, . . . , n, each share of mutual fund i
represents ownership of a certain amount of each stock, and can therefore be represented by a
D-vector vi . Let αi be the number of shares of mutual fund i that you own. Then your total
implied ownership of stocks is represented by the linear combination

α 1 v1 + · · · + αn vn

Example 3.1.4: Diet design: In the 1930’s and 1940’s the US military wanted to find the
minimum-cost diet that would satisfy a soldier’s nutritional requirements. An economist, George
Stigler, considered seventy-seven different foods (wheat flour, evaporated milk, cabbage ...) and
nine nutritional requirements (calories, Vitamin A, riboflavin...). For each food, he calculated
how much a unit of that food satisfied each of nine nutritional requirements. The results can be
represented by seventy-seven 9-vectors vi , one for each food.
A possible diet is represented by an amount of each food: one pound wheat flour, half a
pound of cabbage, etc. For i = 1, . . . , 77, let αi be the amount of food i specified by the diet.
Then the linear combination
α1 v1 + · · · + α77 v77
represents the total nutritional value provided by that diet.
In Chapter 13, we will study how to find the minimum-cost diet achieving specified nutritional
goals.

Example 3.1.5: Average face: As mentioned in Section 2.3, black-and-white images, e.g. of
faces, can be stored as vectors. A linear combination of three such vectors, with coefficients 1/3
and 1/3 and 1/3, yields an average of the three faces.

[Figure: three face images, each scaled by 1/3 and summed, giving the average face: (1/3)(face 1) + (1/3)(face 2) + (1/3)(face 3) = average face.]

The idea of average faces arises later in the book, when we describe a method for face detection.

Example 3.1.6: Products and resources: The JunkCo factory makes things using five re-
sources: metal, concrete, plastic, water, and electricity. Let D be this set of resources. The
factory has the ability to make five different products.

Here is a fabricated table that shows how much of each resource is used in making each product,
on a per-item basis:

metal concrete plastic water electricity


garden gnome 0 1.3 .2 .8 .4
hula hoop 0 0 1.5 .4 .3
slinky .25 0 0 .2 .7
silly putty 0 0 .3 .7 .5
salad shooter .15 0 .5 .4 .8

The ith product’s resource utilization is stored in a D-vector vi over R. For example, a gnome
is represented by

vgnome = Vec(D,{'concrete':1.3,'plastic':.2,'water':.8,'electricity':.4})

Suppose the factory plans to make αgnome garden gnomes, αhoop hula hoops, αslinky slinkies,
αputty silly putties, and αshooter salad shooters. The total resource utilization is expressed as a
linear combination

αgnome vgnome + αhoop vhoop + αslinky vslinky + αputty vputty + αshooter vshooter

For example, suppose JunkCo decides to make 240 gnomes, 55 hoops, 150 slinkies, 133 putties,
and 90 shooters. Here’s how the linear combination can be written in Python using our Vec
class:

>>> D = {'metal','concrete','plastic','water','electricity'}
>>> v_gnome = Vec(D,{'concrete':1.3,'plastic':.2,'water':.8,'electricity':.4})
>>> v_hoop = Vec(D, {'plastic':1.5, 'water':.4, 'electricity':.3})
>>> v_slinky = Vec(D, {'metal':.25, 'water':.2, 'electricity':.7})
>>> v_putty = Vec(D, {'plastic':.3, 'water':.7, 'electricity':.5})
>>> v_shooter = Vec(D, {'metal':.15, 'plastic':.5, 'water':.4,'electricity':.8})

>>> print(240*v_gnome + 55*v_hoop + 150*v_slinky + 133*v_putty + 90*v_shooter)

plastic metal concrete water electricity


-----------------------------------------
215 51 312 373 356

We build on this example in the next section.

3.1.3 From coefficients to linear combination


For a length-n list [v1 , . . . , vn ] of vectors, there is a function f that maps each length-n list
[α1 , . . . , αn ] of coefficients to the corresponding linear combination α1 v1 + · · · + αn vn . As
discussed in Section 0.3.2, there are two related computational problems, the forward problem
(given an element of the domain, find the image under the function) and the backward problem
(given an element of the co-domain, find any pre-image if there is one).
Solving the forward problem is easy.

Quiz 3.1.7: Define a procedure lin_comb(vlist, clist) with the following spec:

• input: a list vlist of vectors, a list clist of the same length consisting of scalars
• output: the vector that is the linear combination of the vectors in vlist with corresponding
coefficients clist

Answer

def lin_comb(vlist,clist):
return sum([coeff*v for (coeff,v) in zip(clist, vlist)])
or

def lin_comb(vlist,clist):
return sum([clist[i]*vlist[i] for i in range(len(vlist))])

For example, the JunkCo factory can use this procedure for the forward problem: given an
amount of each product, the factory can compute how much of each resource will be required.

3.1.4 From linear combination to coefficients


Suppose, however, you are an industrial spy. Your goal is to figure out how many garden gnomes
the JunkCo factory is manufacturing. To do this, you can sneakily observe how much of each
resource the factory is consuming. That is, you can acquire the vector b that is the output of
the function f .
The first question is: can you solve the backward problem? That is, can you obtain a pre-
image of b under f ? The second question is: how can we tell whether there is a single solution?
If there are multiple pre-images of b, we cannot be confident that we have calculated the true
number of garden gnomes.
The first question is a computational problem:

Computational Problem 3.1.8: Expressing a given vector as a linear combination of other


given vectors

• input: a vector b and a list [v1 , . . . , vn ] of n vectors


• output: a list [α1 , . . . , αn ] of coefficients such that

b = α1 v 1 + · · · + α n v n

or a report that none exists.

In Chapter 4, we will see that finding a linear combination of given vectors v1 , . . . , vn that equals
a given vector b is equivalent to solving a linear system. Therefore the above Computational
Problem is equivalent to Computational Problem 2.9.12, Solving a system of linear equations,
and the question of whether there is at most a single solution is equivalent to Question 2.9.11,
Uniqueness of solutions to systems of linear equations.

Example 3.1.9: (Lights Out) We saw in Section 2.8.3 that the state of the Lights Out puzzle
could be represented by a vector over GF (2), and that each button corresponds to a “button”
vector over GF (2).
Let s denote the initial state of the puzzle. We saw that finding a solution to the puzzle
(which buttons to push in order to turn off all the lights) is equivalent to finding a subset of the
button vectors whose sum is s.
We can in turn formulate this problem using the notion of linear combinations. Over GF (2),
the only coefficients are zero and one. A linear combination of the twenty-five button vectors

α0,0 v0,0 + α0,1 v0,1 + · · · + α4,4 v4,4

is the sum of some subset of button vectors, namely those whose corresponding coefficients are
one.
Our goal, then, is to find a linear combination of the twenty-five button vectors whose value
is s:
s = α0,0 v0,0 + α0,1 v0,1 + · · · + α4,4 v4,4 (3.1)

That is, once again we must solve Computational Problem 3.1.8.

Quiz 3.1.10: For practice, we use the 2 × 2 version of Lights Out. Show how to express a
given state s as a linear combination of the four button vectors. [In the original, s and the
button vectors are drawn as 2 × 2 grids of dots.]

Answer

s = 1 (button vector 1) + 0 (button vector 2) + 0 (button vector 3) + 1 (button vector 4)

3.2 Span
3.2.1 Definition of span
Definition 3.2.1: The set of all linear combinations of vectors v1 , . . . , vn is called the span of
these vectors, and is written Span {v1 , . . . , vn }.

For vectors over infinite fields such as R or over C, the span is usually an infinite set. In the
next section, we discuss the geometry of such a set. For vectors over GF (2), a finite field, the
span is finite.

Quiz 3.2.2: How many vectors are in Span {[1, 1], [0, 1]} over the field GF (2)?

Answer

The linear combinations are

0 [1, 1] + 0 [0, 1] = [0, 0]


0 [1, 1] + 1 [0, 1] = [0, 1]
1 [1, 1] + 0 [0, 1] = [1, 1]
1 [1, 1] + 1 [0, 1] = [1, 0]

Thus there are four vectors in the span.
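One way to convince yourself is to enumerate the span by brute force. The following small script (an illustration only, not part of the book's code) tries every choice of coefficients over GF (2), doing the arithmetic mod 2:

from itertools import product

def span_gf2(vecs):
    # all linear combinations of vecs with coefficients in {0, 1}, arithmetic mod 2
    n = len(vecs[0])
    return {tuple(sum(c * v[i] for c, v in zip(coeffs, vecs)) % 2 for i in range(n))
            for coeffs in product([0, 1], repeat=len(vecs))}

print(sorted(span_gf2([[1, 1], [0, 1]])))   # prints [(0, 0), (0, 1), (1, 0), (1, 1)] -- four vectors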



Quiz 3.2.3: How many vectors are in Span {[1, 1]} over the field GF (2)?

Answer

The linear combinations are

0 [1, 1] = [0, 0]
1 [1, 1] = [1, 1]

Thus there are two vectors in the span.

Quiz 3.2.4: How many vectors are in the span of an empty set of 2-vectors?

Answer

Don’t make the mistake of thinking that there are no linear combinations, i.e. no assignments
of numbers to coefficients. There is one such assignment: the empty assignment. Taking
the sum of this empty set of vectors (and thinking back to Problem 1.7.9), we obtain [0, 0].

Quiz 3.2.5: How many vectors are in the span of the 2-vector [2, 3] over R?

Answer

There are an infinite number. The span is {α [2, 3] : α ∈ R}, which, as we saw in
Section 2.5.3, consists of the points on the line through the origin and [2, 3].

Quiz 3.2.6: For which 2-vector v over R does Span {v} consist of a finite number of vectors?

Answer

The zero vector [0, 0].

3.2.2 A system of linear equations implies other equations



Example 3.2.7: Recall the simple authentication scheme from Section 2.9.6. The secret pass-
word is a vector x̂ over GF (2). The computer tests the human’s knowledge of the password by
sending a challenge vector a; the human must respond with the dot-product a · x̂.
Meanwhile, the eavesdropper, Eve, is observing all their communication. Suppose Eve has
observed the challenges a1 = [1, 1, 1, 0, 0], a2 = [0, 1, 1, 1, 0], a3 = [0, 0, 1, 1, 1] and the corre-
sponding responses β1 = 1, β2 = 0, β3 = 1. For what possible challenge vectors can Eve derive
the right response?
We consider all linear combinations of a1 , a2 , a3 . Since there are three vectors, there are
three coefficients α1 , α2 , α3 to choose. For each coefficient αi , there are two choices, 0 and 1.
Therefore there are eight vectors in the span. Here is a table of them:

0 [1, 1, 1, 0, 0] + 0 [0, 1, 1, 1, 0] + 0 [0, 0, 1, 1, 1] = [0, 0, 0, 0, 0]


1 [1, 1, 1, 0, 0] + 0 [0, 1, 1, 1, 0] + 0 [0, 0, 1, 1, 1] = [1, 1, 1, 0, 0]
0 [1, 1, 1, 0, 0] + 1 [0, 1, 1, 1, 0] + 0 [0, 0, 1, 1, 1] = [0, 1, 1, 1, 0]
1 [1, 1, 1, 0, 0] + 1 [0, 1, 1, 1, 0] + 0 [0, 0, 1, 1, 1] = [1, 0, 0, 1, 0]
0 [1, 1, 1, 0, 0] + 0 [0, 1, 1, 1, 0] + 1 [0, 0, 1, 1, 1] = [0, 0, 1, 1, 1]
1 [1, 1, 1, 0, 0] + 0 [0, 1, 1, 1, 0] + 1 [0, 0, 1, 1, 1] = [1, 1, 0, 1, 1]
0 [1, 1, 1, 0, 0] + 1 [0, 1, 1, 1, 0] + 1 [0, 0, 1, 1, 1] = [0, 1, 0, 0, 1]
1 [1, 1, 1, 0, 0] + 1 [0, 1, 1, 1, 0] + 1 [0, 0, 1, 1, 1] = [1, 0, 1, 0, 1]

If the challenge is in the span, Eve can calculate the right response to it. For example, suppose
the challenge is [1, 0, 1, 0, 1], the last vector in the table. We see from the table that

[1, 0, 1, 0, 1] = 1 [1, 1, 1, 0, 0] + 1 [0, 1, 1, 1, 0] + 1 [0, 0, 1, 1, 1]

Therefore

[1, 0, 1, 0, 1] · x̂ = (1 [1, 1, 1, 0, 0] + 1 [0, 1, 1, 1, 0] + 1 [0, 0, 1, 1, 1]) · x̂


= 1 [1, 1, 1, 0, 0] · x̂ + 1 [0, 1, 1, 1, 0] · x̂ + 1 [0, 0, 1, 1, 1] · x̂ by distributivity
= 1 ([1, 1, 1, 0, 0] · x̂) + 1 ([0, 1, 1, 1, 0] · x̂) + 1 ([0, 0, 1, 1, 1] · x̂) by homogeneity
= 1β1 + 1β2 + 1β3
=1·1+1·0+1·1
=0

More generally, if you know that a vector x̂ satisfies linear equations


a1 · x = β1
...
am · x = βm

over any field then you can calculate the dot-product with x̂ of any vector a that is in the span

of a1 , . . . , am .
Suppose a = α1 a1 + · · · + αm am . Then

a · x = (α1 a1 + · · · + αm am ) · x
= α1 a1 · x + · · · + αm am · x by distributivity
= α1 (a1 · x) + · · · + αm (am · x) by homogeneity
= α1 β1 + · · · + αm βm

This math addresses Question 2.9.20: Does a system of linear equations imply any other linear
equations? If so, what other linear equations? The system of linear equations implies a linear
equation of the form a · x = β for every vector a in the span of a1 , . . . , am .
But we have only partially answered the Question, for we have not yet shown that these are
the only linear equations implied by the system. We will show this in a later chapter.

Example 3.2.8: (Attacking the simple authentication scheme:) Suppose Eve has already seen
a collection of challenge vectors a1 , . . . , am for which she knows the responses. She can answer
any challenge in Span {a1 , . . . , am }. Does that include all possible challenges? This is equivalent
to asking if GF (2)n equals Span {a1 , . . . , am }.
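Once Eve knows coefficients expressing a challenge as a linear combination of the observed challenges, computing the response takes one line. Here is a small illustrative snippet (plain Python with mod-2 arithmetic, rather than the book's GF (2) class):

def derived_response(alphas, betas):
    # response to the challenge alphas[0]*a1 + ... + alphas[m-1]*am is
    # alphas[0]*betas[0] + ... + alphas[m-1]*betas[m-1], computed over GF(2)
    return sum(alpha * beta for alpha, beta in zip(alphas, betas)) % 2

print(derived_response([1, 1, 1], [1, 0, 1]))   # prints 0, matching the calculation above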

3.2.3 Generators
Definition 3.2.9: Let V be a set of vectors. If v1 , . . . , vn are vectors such that V = Span {v1 , . . . , vn }
then we say {v1 , . . . , vn } is a generating set for V, and we refer to the vectors v1 , . . . , vn as
generators for V.

Example 3.2.10: Let V be the set {00000, 11100, 01110, 10010, 00111, 11011, 01001, 10101}
of 5-vectors over GF (2). We saw in Example 3.2.7 (Page 150) that these eight vectors are exactly
the span of 11100, 01110, and 00111. Therefore 11100, 01110, and 00111 form a generating
set for V.

Example 3.2.11: I claim that {[3, 0, 0], [0, 2, 0], [0, 0, 1]} is a generating set for R3 . To prove
that claim, I must show that the set of linear combinations of these three vectors is equal to R3 .
That means I must show two things:
1. Every linear combination is a vector in R3 .

2. Every vector in R3 is a linear combination.


The first statement is pretty obvious since R3 includes all 3-vectors over R. To prove the second
statement, let [x, y, z] be any vector in R3 . I must demonstrate that [x, y, z] can be written as

a linear combination, i.e. I must specify the coefficients in terms of x, y, and z. Here goes:

[x, y, z] = (x/3) [3, 0, 0] + (y/2) [0, 2, 0] + z [0, 0, 1]

3.2.4 Linear combinations of linear combinations


I claim that another generating set for R3 is {[1, 0, 0], [1, 1, 0], [1, 1, 1]}. This time, I prove that
their span includes all of R3 by writing each of the three vectors in Example 3.2.11 (Page 151)
as a linear combination:

[3, 0, 0] = 3 [1, 0, 0]
[0, 2, 0] = −2 [1, 0, 0] + 2 [1, 1, 0]
[0, 0, 1] = 0 [1, 0, 0] − 1 [1, 1, 0] + 1 [1, 1, 1]

Why is that sufficient? Because each of the old vectors can in turn be written as a linear
combination of the new vectors, I can convert any linear combination of the old vectors into a
linear combination of the new vectors. We saw in Example 3.2.11 (Page 151) that any 3-vector
[x, y, z] can be written as a linear combination of the old vectors, hence it can be written as a
linear combination of the new vectors.
Let’s go through that explicitly. First we write [x, y, z] as a linear combination of the old
vectors:
[x, y, z] = (x/3) [3, 0, 0] + (y/2) [0, 2, 0] + z [0, 0, 1]
Next, we replace each old vector with an equivalent linear combination of the new vectors:
[x, y, z] = (x/3) (3 [1, 0, 0]) + (y/2) (−2 [1, 0, 0] + 2 [1, 1, 0]) + z (−1 [1, 1, 0] + 1 [1, 1, 1])

Next, we multiply through, using associativity of scalar-vector multiplication (Proposition 2.5.5)


and the fact that scalar multiplication distributes over vector addition (Proposition 2.6.3):

[x, y, z] = x [1, 0, 0] − y [1, 0, 0] + y [1, 1, 0] − z [1, 1, 0] + z [1, 1, 1]

Finally, we collect like terms, using the fact that scalar-vector multiplication distributes over
scalar addition (Proposition 2.6.5):

[x, y, z] = (x − y) [1, 0, 0] + (y − z) [1, 1, 0] + z [1, 1, 1]

We have shown that an arbitrary vector in R3 can be written as a linear combination of [1, 0, 0],
[1, 1, 0], and [1, 1, 1]. This shows that R3 is a subset of Span {[1, 0, 0], [1, 1, 0], [1, 1, 1]}.
Of course, every linear combination of these vectors belongs to R3 , which means
Span {[1, 0, 0], [1, 1, 0], [1, 1, 1]} is a subset of R3 . Since each of these two sets is a subset of the
other, they are equal.
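Here is a quick numerical sanity check of the final formula (a small illustrative snippet using plain Python lists):

x, y, z = 7.0, -2.0, 3.5
# entry-by-entry computation of (x-y)[1,0,0] + (y-z)[1,1,0] + z[1,1,1]
combo = [(x - y)*e + (y - z)*f + z*g
         for e, f, g in zip([1, 0, 0], [1, 1, 0], [1, 1, 1])]
print(combo)   # prints [7.0, -2.0, 3.5]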

Quiz 3.2.12: Write each of the old vectors [3, 0, 0], [0, 2, 0], and [0, 0, 1] as a linear combination
of new vectors [2, 0, 1], [1, 0, 2], [2, 2, 2], and [0, 1, 0].

Answer

[3, 0, 0] = 2 [2, 0, 1] − 1 [1, 0, 2] + 0 [2, 2, 2]
[0, 2, 0] = −(2/3) [2, 0, 1] − (2/3) [1, 0, 2] + 1 [2, 2, 2]
[0, 0, 1] = −(1/3) [2, 0, 1] + (2/3) [1, 0, 2] + 0 [2, 2, 2]

3.2.5 Standard generators


We saw a formula expressing [x, y, z] as a linear combination of the vectors [3, 0, 0], [0, 2, 0], and
[0, 0, 1]. The formula was particularly simple because of the special form of those three vectors.
It gets even simpler if instead we use [1, 0, 0], [0, 1, 0], and [0, 0, 1]:

[x, y, z] = x [1, 0, 0] + y [0, 1, 0] + z [0, 0, 1]

The simplicity of this formula suggests that these vectors are the most “natural” generators for
R3 . Indeed, the coordinate representation of [x, y, z] in terms of these generators is [x, y, z].
We call these three vectors the standard generators for R3 . We denote them by e0 , e1 , e2
(when it is understood we are working with vectors in R3 ).
When we are working with, for example, R4 , we use e0 , e1 , e2 , e3 to refer to [1, 0, 0, 0],
[0, 1, 0, 0], [0, 0, 1, 0],[0, 0, 0, 1].
For any positive integer n, the standard generators for Rn are:

e0 = [1, 0, 0, 0, . . . , 0]
e1 = [0, 1, 0, 0, . . . , 0]
e2 = [0, 0, 1, 0, . . . , 0]
...
en−1 = [0, 0, 0, 0, . . . , 1]

where ei is all zeroes except for a 1 in position i.


Naturally, for any finite domain D and field F, there are standard generators for FD . We
define them as follows. For each k ∈ D, ek is the function {k : 1}. That is, ek maps k to 1 and
maps all other domain elements to zero.
It is easy to prove that what we call “standard generators” for FD are indeed generators for
FD. We omit the proof since it is not very illuminating.

Quiz 3.2.13: Write a procedure standard(D, one) that, given a domain D and given the
number one for the field, returns the list of standard generators for RD . (The number one is
provided as an argument so that the procedure can support use of GF (2).)

Answer

>>> def standard(D, one): return [Vec(D, {k:one}) for k in D]

Example 3.2.14: (Solvability of 2 × 2 Lights Out:) Can 2 × 2 Lights Out be solved from every
starting configuration? This is equivalent to asking whether the 2 × 2 button vectors

[drawn in the original as four 2 × 2 grids of dots]

are generators for GF (2)D , where D = {(0, 0), (0, 1), (1, 0), (1, 1)}.
To prove that the answer is yes, it suffices to show that each of the standard generators can
be written as a linear combination of the button vectors:
e0,0 = 1 v0,0 + 1 v0,1 + 1 v1,0 + 0 v1,1
e0,1 = 1 v0,0 + 1 v0,1 + 0 v1,0 + 1 v1,1
e1,0 = 1 v0,0 + 0 v0,1 + 1 v1,0 + 1 v1,1
e1,1 = 0 v0,0 + 1 v0,1 + 1 v1,0 + 1 v1,1

[In the original, each standard generator and each button vector is drawn as a 2 × 2 grid of dots; only the coefficient patterns (1, 1, 1, 0), (1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1) survive in the text, and the labeling above follows the ordering of D.]

Exercise 3.2.15: For each of the subproblems, you are to investigate whether the given vectors
span R2 . If possible, write each of the standard generators for R2 as a linear combination of the
given vectors. If doing this is impossible for one of the subproblems, you should first add one
additional vector and then do it.

1. [1, 2], [3, 4]


2. [1, 1], [2, 2], [3, 3]
3. [1, 1], [1, −1], [0, 1]

Exercise 3.2.16: You are given the vectors [1, 1, 1], [0.4, 1.3, −2.2]. Add one additional vector
and express each of the standard generators for R3 as a linear combination of the three vectors.

3.3 The geometry of sets of vectors


In Chapter 2, we saw how to write lines and line segments in terms of vectors. In a physical
simulation or graphics application, we might need to manipulate higher-dimensional geometrical
objects such as planes—perhaps we need to represent a wall or the surface of a table, or perhaps
we are representing the surface of a complicated three-dimensional object by many flat polygons
glued together. In this section, we informally investigate the geometry of the span of vectors over
R, and, as a bonus, the geometry of other kinds of sets of vectors.

3.3.1 The geometry of the span of vectors over R


Consider the set of all linear combinations of a single nonzero vector v:

Span {v} = {α v : α ∈ R}

We saw in Section 2.5.3 that this set forms the line through the origin and the point v. A line
is a one-dimensional geometrical object.
An even simpler case is the span of an empty set of vectors. We saw in Quiz 3.2.4 that the
span consists of exactly one vector, the zero vector. Thus in this case the span consists of a point,
which we consider a zero-dimensional geometrical object.
What about the span of two vectors? Perhaps it is a two-dimensional geometric object, i.e.
a plane?

Example 3.3.1: What is Span {[1, 0], [0, 1]}? These vectors are the standard generators for
R2 , so every 2-vector is in the span. Thus Span {[1, 0], [0, 1]} includes all points in the Euclidean
plane.

Example 3.3.2: What is Span {[1, 2], [3, 4]}? You might have shown in Exercise 3.2.15 that
the standard generators for R2 can be written as linear combinations of these vectors, so again
we see that the set of linear combinations of the two vectors includes all points in the plane.

Example 3.3.3: What about the span of two 3-vectors? The linear combinations of [1, 0, 1.65]
and [0, 1, 1] form a plane through the origin; part of this plane is shown below:

We can use these two vectors in plotting the plane. Here is a plot of the points in the set
{α [1, 0, 1.65] + β [0, 1, 1] : α ∈ {−5, −4, . . . , 3, 4},
β ∈ {−5, −4, . . . , 3, 4}}:
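Such a plot can be generated with any 3D plotting tool; here is one possible sketch using matplotlib (this is not the book's plot module):

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D   # registers the '3d' projection on older matplotlib versions

# the points alpha*[1, 0, 1.65] + beta*[0, 1, 1] for alpha, beta in {-5, -4, ..., 3, 4}
pts = [(a, b, 1.65*a + b) for a in range(-5, 5) for b in range(-5, 5)]
ax = plt.figure().add_subplot(projection='3d')
ax.scatter([p[0] for p in pts], [p[1] for p in pts], [p[2] for p in pts])
plt.show()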

Example 3.3.4: Do every two distinct vectors span a plane? What about Span {[1, 2], [2, 4]}?
For any pair of coefficients α1 and α2 ,

α1 [1, 2] + α2 [2, 4] = α1 [1, 2] + α2 (2 [1, 2])


= α1 [1, 2] + (α2 · 2)[1, 2]
= (α1 + 2α2 )[1, 2]

This shows that Span {[1, 2], [2, 4]} = Span {[1, 2]}, and we know from Section 2.5.3 that
Span {[1, 2]} forms a line, not a plane.

These examples lead us to believe that the span of two vectors over R forms a plane or a
lower-dimensional object (a line or a point). Note that the span of any collection of vectors
must include the origin, because the trivial linear combination (all coefficients equal to zero) is
included in the set.
The pattern begins to become clear:
• The span of zero vectors forms a point—a zero-dimensional object—which must be the
origin.

• The span of one vector forms a line through the origin—a one-dimensional object—or a
point, the origin.
• The span of two vectors forms a plane through the origin—a two-dimensional object—or a
line through the origin or a point, the origin.

A geometric object such as a point, a line, or a plane is called a flat. There are higher-
dimensional flats too. All of R3 is a three-dimensional flat. Although it is hard to envision, one
can define a three-dimensional flat within four-dimensional space, R4 , and so on.
Generalizing from our observations, we are led to hypothesize:

Hypothesis 3.3.5: The span of k vectors over R forms a k-dimensional flat containing the
origin or a flat of lower dimension containing the origin.

Observing this pattern raises the following Question:

Question 3.3.6: How can we tell if the span of a given collection of k vectors forms a k-
dimensional object? More generally, given a collection of vectors, how can we predict the
dimensionality of the span?

The question will be answered starting in Chapter 6.

3.3.2 The geometry of solution sets of homogeneous linear systems


Perhaps a more familiar way to specify a plane is with an equation, e.g. {(x, y, z) ∈ R3 :
ax + by + cz = d}. For now, we want to focus on planes that contain the origin (0, 0, 0). For
(0, 0, 0) to satisfy the equation ax + by + cz = d, it must be that d equals zero.

Example 3.3.7: The plane depicted earlier, Span {[1, 0, 1.65], [0, 1, 1]}, can be represented as

{(x, y, z) ∈ R3 : 1.65x + 1y − 1z = 0}

We can rewrite the equation using dot-product, obtaining

{[x, y, z] ∈ R3 : [1.65, 1, −1] · [x, y, z] = 0}

Thus the plane is the solution set of a linear equation with right-hand side zero.

Definition 3.3.8: A linear equation with right-hand side zero is a homogeneous linear equation.

Example 3.3.9: The line

can be represented as Span {[3, 2]} but it can also be represented as

{[x, y] ∈ R2 : 2x − 3y = 0}

That is, the line is the solution set of a homogeneous linear equation.

Example 3.3.10: This line

can be represented as Span {[−1, −2, 2]}. It can also be represented as the solution set of a pair
of homogeneous linear equations

{[x, y, z] ∈ R3 : [4, −1, 1] · [x, y, z] = 0, [0, 1, 1] · [x, y, z] = 0}

That is, the line consists of the set of triples [x, y, z] that satisfy both of these two homogeneous
linear equations.

Definition 3.3.11: A linear system (collection of linear equations) with all right-hand sides
zero is called a homogeneous linear system.

Generalizing from our two examples, we are led to hypothesize:

Hypothesis 3.3.12: A flat containing the origin is the solution set of a homogeneous linear
system.

We are not yet in a position to formally justify our hypotheses or even to formally define flat.
We are working towards developing the notions that underlie that definition.

3.3.3 The two representations of flats containing the origin


A well-established theme in computer science is the usefulness of multiple representations for the
same data. We have seen two ways to represent a flat containing the origin:

• as the span of some vectors, and


• as the solution set to a homogeneous linear system.
Each of these representations has its uses. Or, to misquote Hat Guy,
Different tasks call for different representations.

Suppose you want to find the plane containing two given lines, the line Span {[4, −1, 1]} and the
line Span {[0, 1, 1]}.

Since the lines are represented as spans, it is easy to obtain the solution: The plane containing
these two lines is Span {[4, −1, 1], [0, 1, 1]}:

On the other hand, suppose you want to find the intersection of two given planes, the plane
{[x, y, z] : [4, −1, 1] · [x, y, z] = 0} and the plane {[x, y, z] : [0, 1, 1] · [x, y, z] = 0}:

Since each plane is represented as the solution set of a homogeneous linear equation, it is
easy to obtain the solution. The set of points that belong to both planes is the set of vectors
satisfying both equations: {[x, y, z] : [4, −1, 1] · [x, y, z] = 0, [0, 1, 1] · [x, y, z] = 0}.

Since each representation is useful, we would like to be able to transform from one repre-
sentation to another. Is this possible? Can any set represented as the span of vectors also be
represented as the solution set of a homogeneous linear system? What about the other way
round? We further discuss these conversion problems in Section 6.5. We first need to better
understand the underlying mathematics.

3.4 Vector spaces


3.4.1 What’s common to the two representations?
Our goal is understanding the connection between these two representations. We will see that a
subset V of FD , whether V is the span of some D-vectors over F or the solution set of a linear
system, has three properties:

Property V1: V contains the zero vector,


Property V2: For every vector v, if V contains v then it contains α v for every scalar α, and
Property V3: For every pair u and v of vectors, if V contains u and v then it contains u + v.

First suppose V = Span {v1 , . . . , vn }. Then V satisfies


• Property V1 because the zero vector equals 0 v1 + · · · + 0 vn

• Property V2 because

if v = β1 v1 + · · · + βn vn then α v = α β1 v1 + · · · + α βn vn

• Property V3 because

if u = α1 v1 + · · · + αn vn
and v = β 1 v1 + · · · + βn vn
then u+v = (α1 + β1 )v1 + · · · + (αn + βn )vn

Now suppose V is the solution set {x : a1 · x = 0, ..., am · x = 0} of a linear system. Then


V satisfies

• Property V1 because
a1 · 0 = 0, ..., am · 0 = 0

• Property V2 because

if a1 · v = 0, ..., am · v = 0
then α (a1 · v) = 0, ··· , α (am · v) = 0
so a1 · (α v) = 0, ··· , am · (α v) = 0

• Property V3 because

if a1 · u = 0, ..., am · u = 0
and a1 · v = 0, ..., am · v = 0
then a1 · u + a 1 · v = 0, ..., am · u + a m · v = 0
so a1 · (u + v) = 0, ..., am · (u + v) = 0

3.4.2 Definition and examples of vector space


We use Properties V1, V2, and V3 to define a notion that encompasses both kinds of represen-
tations: spans of vectors, and solution sets of homogeneous linear systems.

Definition 3.4.1: A set V of vectors is called a vector space if it satisfies Properties V1, V2,
and V3.

Example 3.4.2: We have seen that the span of some vectors is a vector space.

Example 3.4.3: We have seen that the solution set of a homogeneous linear system is a vector
space.

Example 3.4.4: A flat (such as a line or a plane) that contains the origin can be written as
the span of some vectors or as the solution set of a homogeneous linear system, and therefore
such a flat is a vector space.

The statement “If V contains v then it contains α v for every scalar α” is expressed in Mathese
as
“V is closed under scalar-vector multiplication.”

The statement “if V contains u and v then it contains u + v” is expressed in Mathese as

“V is closed under vector addition.”


(In general, we say a set is closed under an operation if the set contains any object produced by
that operation using inputs from the set.)
What about FD itself?

Example 3.4.5: For any field F and any finite domain D, the set FD of D-vectors over F
is a vector space. Why? Well, FD contains a zero vector and is closed under scalar-vector
multiplication and vector addition. For example, R2 and R3 and GF (2)4 are all vector spaces.

What is the smallest subset of FD that is a vector space?

Proposition 3.4.6: For any field F and any finite domain D, the singleton set consisting of
the zero vector 0D is a vector space.

Proof

The set {0D } certainly contains the zero vector, so Property V1 holds. For any scalar α,
α 0D = 0D , so Property V2 holds: {0D } is closed under scalar-vector multiplication. Finally,
0D + 0D = 0D , so Property V3 holds: {0D } is closed under vector addition. □

Definition 3.4.7: A vector space consisting only of a zero vector is a trivial vector space.

Quiz 3.4.8: What is the minimum number of vectors whose span is {0D }?

Answer

The answer is zero. As we discussed in the answer to Quiz 3.2.4, {0D } equals the span of
the empty set of D-vectors. It is true, as discussed in the answer to Quiz 3.2.6, that {0D } is
the span of {0D }, but this just illustrates that there are different sets with the same span.
We are often interested in the set with the smallest size.

3.4.3 Subspaces
Definition 3.4.9: If V and W are vector spaces and V is a subset of W, we say V is a subspace
of W.

Remember that a set is considered a subset of itself, so one subspace of W is W itself.

Example 3.4.10: The only subspace of {[0, 0]} is itself.

Example 3.4.11: The set {[0, 0]} is a subspace of {α [2, 1] : α ∈ R}, which is in turn a
subspace of R2 .

Example 3.4.12: The set R2 is not a subspace of R3 since R2 is not contained in R3 ; indeed,
R2 consists of 2-vectors and R3 contains no 2-vectors.

Example 3.4.13: What vector spaces are contained in R2 ?


• The smallest is {[0, 0]}.

• The largest is R2 itself.


• For any nonzero vector [a, b], the line through the origin and [a, b], Span {[a, b]}, is a vector
space.
Does R2 have any other subspaces? Suppose V is a subspace of R2 . Assume it has some nonzero
vector [a, b], and assume it also has some other vector [c, d] such that [c, d] is not in Span {[a, b]}.
We prove that in this case V = R2 .
Lemma 3.4.14: ad ≠ bc

Proof

Since [a, b] ≠ [0, 0], either a ≠ 0 or b ≠ 0 (or both).

Case 1: a ≠ 0. In this case, define α = c/a. Since [c, d] is not in Span {[a, b]}, it must be
that [c, d] ≠ α [a, b]. Because c = α a, it must be that d ≠ α b. Substituting c/a for α, we
infer that d ≠ (c/a) b. Multiplying through by a, we infer that ad ≠ cb.
Case 2: b ≠ 0. In this case, define α = d/b. Since [c, d] ≠ α [a, b], we infer that c ≠ α a.
Substituting for α and multiplying through by b, we infer that ad ≠ cb. □
Now we show that V = R2 . To show that, we show that every vector in R2 can be written
as a linear combination of just two vectors in V, namely [a, b] and [c, d].
Let [p, q] be any vector in R2 . Define α = (dp − cq)/(ad − bc) and β = (aq − bp)/(ad − bc). Then

α [a, b] + β [c, d] = (1/(ad − bc)) [(pd − qc)a + (aq − bp)c, (pd − qc)b + (aq − bp)d]
                    = (1/(ad − bc)) [adp − bcp, adq − bcq]
                    = [p, q]

We have shown that [p, q] is equal to a linear combination of [a, b] and [c, d]. Since [p, q] is an
arbitrary element of R2 , we have shown that R2 = Span {[a, b], [c, d]}.
Since V contains [a, b] and [c, d] and is closed under scalar-vector multiplication and vector
addition, it contains all of Span {[a, b], [c, d]}. This proves that V contains all of R2 . Since every
vector in V belongs to R2 , V is also a subset of R2 . Since each of V and R2 is a subset of the
other, they must be equal.
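The formulas for α and β above are the 2 × 2 case of Cramer's rule. Here is a small illustrative snippet (plain Python) that computes them and checks the result on one example:

def coeffs_2d(a, b, c, d, p, q):
    # alpha, beta with [p, q] = alpha*[a, b] + beta*[c, d], assuming ad - bc is nonzero
    det = a*d - b*c
    return (d*p - c*q) / det, (a*q - b*p) / det

alpha, beta = coeffs_2d(1, 2, 3, 1, 5, 5)       # [a,b] = [1,2], [c,d] = [3,1], [p,q] = [5,5]
print([alpha*1 + beta*3, alpha*2 + beta*1])     # prints [5.0, 5.0]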

We came to the concept of vector space by considering two ways of forming a set:
• as the span of some vectors, and

• as the solution set of a homogeneous linear system.


Each of these is a vector space. In particular, each is a subspace of FD for some field F and some
domain D.
What about the converse?

Question 3.4.15: Can any subspace of FD be expressed as the span of a finite set of vectors?

Question 3.4.16: Can any subspace of FD be expressed as the solution set of a homogeneous
linear system?

We will see in Chapter 6 that the answers are yes, and yes. Establishing this, however,
requires we learn some more mathematics.

3.4.4 *Abstract vector spaces


I am tempted to state more simply that any vector space can be expressed as the span of a finite
number of vectors and as the solution set of a homogeneous linear system. However, that is not
true according to the formal definitions of Mathematics.
In this book, I have defined a vector as a function from a finite domain D to a field F.
However, modern mathematics tends to define things in terms of the axioms they satisfy rather
than in terms of their internal structure. (I informally raised this idea in discussing the notion
of a field.)
Following this more abstract approach, one does not define the notion of a vector; instead, one
defines a vector space over a field F to be any set V that is equipped with an addition operation and
a scalar-multiplication operation (satisfying certain axioms) and that satisfies Properties V1, V2,
and V3. The elements of V, whatever they happen to be, play the role of vectors.
This definition avoids committing to a specific internal structure for vectors and consequently
allows for a much broader class of mathematical objects to be considered vectors. For example,
the set of all functions from R to R is a vector space according to the abstract definition. The
question of whether a subspace of this space is the span of a finite set of vectors is a deeper
mathematical question than we can address in this book.
I avoid the abstract approach in this book because I find that the more concrete notion of
vector is helpful in developing intuition. However, if you go deeper into mathematics, you should
expect to encounter this approach.

3.5 Affine spaces


What about points, lines, planes, etc. that do not include the origin?

3.5.1 Flats that don’t go through the origin


In Section 2.6.1, we observed that a line segment not through the origin could be obtained
from a line segment through the origin by translation, i.e. by applying a function such as
f ([x, y]) = [x, y] + [0.5, 1].
How can we represent a line not through the origin? Two approaches were outlined in
Section 2.6.4. First, we start with a line that does go through the origin

We now know that the points of this line form a vector space V.
We can choose a vector a and add it to every vector in V:

In Mathese, we would write the resulting set as

{a + v : v ∈ V}

We will abbreviate this set expression as a + V.


The resulting set is a line that goes through a (and not through the origin):

Now let’s carry out the same process on a plane.

Example 3.5.1: There is one plane through the points u1 = [1, 0, 4.4], u2 = [0, 1, 4], and
u3 = [0, 0, 3]:

How can we write the set of points in the plane as a translation of a vector space?
Define a = u2 − u1 and b = u3 − u1 , and let V be the vector space Span {a, b}. Then the
points of V form a plane:

Now consider the set


u1 + V
Intuitively, the translation of a plane remains a plane. Note in addition that u1 + V contains

• the point u1 since V contains the zero vector,


• the point u2 since V contains u2 − u1 , and
• the point u3 since V contains u3 − u1 .

Since the plane u1 + V contains u1 , u2 , and u3 , it must be the unique plane through those
points.

3.5.2 Affine combinations


In Section 2.6.4, we saw another way to write the line through points u and v: as the set of affine
combinations of u and v. Here we generalize that notion as well.

Definition 3.5.2: A linear combination α1 u1 + · · · + αn un is called an affine combination if


the coefficients sum to one.

Example 3.5.3: The linear combination 2 [10., 20.] + 3 [0, 10.] + (−4) [30., 40.] is an affine
combination of the vectors because 2 + 3 + (−4) = 1.

Example 3.5.4: In Example 3.5.1 (Page 165), we wrote the plane through u1 , u2 , and u3 as

u1 + V

where V = Span {u2 − u1 , u3 − u1 }.


The vectors in V are the vectors that can be written as linear combinations

α (u2 − u1 ) + β (u3 − u1 )

so the vectors in u1 + V are the vectors that can be written as

u1 + α (u2 − u1 ) + β (u3 − u1 )

which can be rewritten as


(1 − α − β) u1 + αu2 + β u3
Let γ = 1 − α − β. Then the above expression can be rewritten as the affine combination

γ u1 + α u2 + β u3

That is, the vectors in u1 + V are exactly the set of all affine combinations of u1 , u2 , and u3 .
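A quick numerical check (plain Python, using the vectors from Example 3.5.1 (Page 165) and arbitrarily chosen coefficients) that the two descriptions agree up to round-off:

u1, u2, u3 = [1, 0, 4.4], [0, 1, 4], [0, 0, 3]
alpha, beta = 0.25, -1.5
gamma = 1 - alpha - beta
translated = [u1[i] + alpha*(u2[i] - u1[i]) + beta*(u3[i] - u1[i]) for i in range(3)]
affine     = [gamma*u1[i] + alpha*u2[i] + beta*u3[i] for i in range(3)]
print(all(abs(s - t) < 1e-12 for s, t in zip(translated, affine)))   # prints True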

The set of all affine combinations of a collection of vectors is called the affine hull of that
collection.

Example 3.5.5: What is the affine hull of {[0.5, 1], [3.5, 3]}? We saw in Section 2.6.4 that the
set of affine combinations,

{α [3.5, 3] + β [0.5, 1] : α ∈ R, β ∈ R, α + β = 1}

is the line through [0.5, 1] and [3.5, 3].

Example 3.5.6: What is the affine hull of {[1, 2, 3]}? It is the set of linear combinations
α [1, 2, 3] where the coefficients sum to one—but there is only one coefficient, α, so we require
α = 1. Thus the affine hull consists of a single vector, [1, 2, 3].

In the examples, we have seen,

• the affine hull of a one-vector collection is a single point (the one vector in the collection),
i.e. a 0-dimensional object;
• the affine hull of a two-vector collection is a line (the line through the two vectors), i.e. a
1-dimensional object;
• the affine hull of a three-vector collection is a plane (the plane through the three vectors),
i.e. a 2-dimensional object.
However, let’s not jump to conclusions.

Example 3.5.7: What is the affine hull of {[2, 3], [3, 4], [4, 5]}?

These points all lie on a line. The affine hull is therefore that line, rather than a plane.

Like the span of vectors, the affine hull of vectors can end up being a lower-dimensional object
than you would predict just from the number of vectors. Just as we asked in Question 3.3.6 about
spans, we might ask a new question: how can we predict the dimensionality of an affine hull? In
Example 3.5.1 (Page 165), the affine hull of u1 , u2 , u3 is the translation of Span {u2 −u1 , u3 −u1 },
so, our intuition tells us, the dimensionality of the affine hull is the same as that of Span {u2 −
u1 , u3 − u1 }. Thus, in this case, the question about affine hull is not really a new question.
More generally, we will see in Section 3.5.3 that every affine hull of some vectors is the translation
of the span of some other vectors, so questions about the dimensionality of the former can be
replaced with questions about the dimensionality of the latter.

3.5.3 Affine spaces


Definition 3.5.8: An affine space is the result of translating a vector space. That is, a set A
is an affine space if there is a vector a and a vector space V such that

A = {a + v : v ∈ V}

i.e. A = a + V.

A flat, it can now be told, is just an affine space that is a subset of Rn for some n.

Example 3.5.9: We saw in Example 3.5.1 (Page 165) that the plane through the points u1 =
[1, 0, 4.4], u2 = [0, 1, 4], and u3 = [0, 0, 3] can be written as the result of adding u1 to each
point in the span of u2 − u1 and u3 − u1 . Since Span {u2 − u1 , u3 − u1 } is a vector space, it
follows that the plane through u1 , u2 , and u3 is an affine space.

We also saw in Section 3.5.2 that the plane is the set of affine combinations of u1 , u2 , and u3 .
Thus, in this case at least, the affine combination of the vectors is an affine space. Is this true
generally?

Lemma 3.5.10: For any vectors u1 , . . . , un ,


{α1 u1 + · · · + αn un : α1 + · · · + αn = 1} = {u1 + v : v ∈ Span {u2 − u1 , . . . , un − u1 }}   (3.2)

In words, the affine hull of u1 , . . . , un equals the set obtained by adding u1 to each vector in
the span of u2 − u1 , . . . , un − u1 .

The lemma shows that the affine hull of vectors is an affine space. Knowing this will help us, for
example, learn how to find the intersection of a plane with a line.
The proof follows the calculations in Example 3.5.4 (Page 166).

Proof

Every vector in Span {u2 − u1 , . . . , un − u1 } can be written in the form

α2 (u2 − u1 ) + · · · + αn (un − u1 )

so every vector in the right-hand side of Equation 3.2 can be written in the form

u1 + α2 (u2 − u1 ) + · · · + αn (un − u1 )

which can be rewritten (using homogeneity and distributivity as in Example 3.5.1 (Page
165)) as
(1 − α2 − · · · − αn ) u1 + α2 u2 + · · · + αn un (3.3)
which is an affine combination of u1 , u2 , . . . , un since the coefficients sum to one. Thus
every vector in the right-hand side of Equation 3.2 is in the left-hand side.
Conversely, for every vector α1 u1 + α2 u2 + · · · + αn un in the left-hand side, since
α1 + · · · + αn = 1, we infer α1 = 1 − α2 − · · · − αn , so the vector can be written as in Line 3.3,
which shows that the vector is in the right-hand side. □

We now have two representations of an affine space:


• as a + V where V is the span of some vectors, and

• as the affine hull of some vectors.


These representations are not fundamentally different; as we have seen, it is easy to convert be-
tween one representation and the other. Next, we discuss a representation that is quite different.

3.5.4 Representing an affine space as the solution set of a linear system


In Section 3.3.2, we saw examples in which a flat containing the origin could be represented as
the solution set of a homogeneous linear system. Here we represent a flat not containing the
origin as the solution set of a linear system that is not homogeneous.

Example 3.5.11: We saw in Example 3.5.1 (Page 165) that the plane through the points
[1, 0, 4.4], [0, 1, 4], and [0, 0, 3] is the affine hull of those points. However, the plane is also the
solution set of the equation 1.4x + y − z = −3, i.e. the plane is

{[x, y, z] ∈ R3 : [1.4, 1, −1] · [x, y, z] = −3}

Example 3.5.12: We saw in Section 2.6.4 (see also Example 3.5.5 (Page 167)) that the line

through [0.5, 1] and [3.5, 3] consists of the set of all affine combinations of [0.5, 1] and [3.5, 3].
This line is also the solution set of the equation 2x − 3y = −2, i.e. the set

{[x, y] ∈ R2 : [2, −3] · [x, y] = −2}

Example 3.5.13: The line

can be represented as the set of all affine combinations of [1, 2, 1] and [0, 0, 3], two points on the
line. The line is also the solution set of the linear system consisting of the equations 4x − y + z = 3 and y + z = 3,

i.e. the set


{[x, y, z] ∈ R3 : [4, −1, 1] · [x, y, z] = 3, [0, 1, 1] · [x, y, z] = 3}

3.5.5 The two representations, revisited


As we saw in Section 3.3.3 in the context of flats containing the origin, having two representations
can be useful.

Example 3.5.14: Suppose you are given two lines

and want to find the plane containing the two lines.


The first line is Span {[4, −1, 1]}. The second line is Span {[0, 1, 1]}. Therefore the plane
containing these two lines is Span {[4, −1, 1], [0, 1, 1]}:

Next we give an example using the second kind of representation.

Example 3.5.15: Now you are given two planes through the origin:

and your goal is to find the intersection.


The first plane is {[x, y, z] : [4, −1, 1] · [x, y, z] = 0}. The second plane is {[x, y, z] :
[0, 1, 1] · [x, y, z] = 0}. We are representing each plane as the solution set of a linear system with
right-hand sides zero. The set of points comprising the intersection is exactly the set of points
that satisfy both equations,

{[x, y, z] : [4, −1, 1] · [x, y, z] = 0, [0, 1, 1] · [x, y, z] = 0}

This set of points forms a line, but to draw the line it is helpful to find its representation as the
span of a vector. We will learn later how to go from a linear system with zero right-hand sides
to a set of generators for the solution set. It turns out the solution set is Span {[1, 2, −2]}.

Because different representations facilitate different operations, it is useful to be able to


convert between different representations of the same geometric object. We’ll illustrate this
using an example that arises in computer graphics. A scene is often constructed of thousands of
triangles. How can we test whether a beam of light strikes a particular triangle, and, if so, where
on that triangle?
Let’s say the triangle’s corners are located at the vectors v0 , v1 , and v2 . Then the plane
containing the triangle is the affine hull of these vectors.
Next, suppose a beam of light originates at a point b, and heads in the direction of the arrow
representing the vector d. The beam of light forms the ray consisting of the set of points

{b + α d : α ∈ R, α ≥ 0}

which in turn forms part of the line

{b + α d : α ∈ R}

So far we are using the first kind of representation for the triangle, for the plane containing
the triangle, for the ray and for the line containing the ray. To find out whether the beam of
light strikes the triangle, we find the intersection of

• the plane containing the triangle, and

• the line containing the ray of light.


Usually, the intersection will consist of a single point. We can then test whether that point lies
in the triangle and whether it belongs to the ray.
But how can we find the intersection of the plane and the line? We use the second kind of
representation:

• we find the representation of the plane as the solution set of one linear system, and

• we find the representation of the line as the solution set of another linear system.
The set of points belonging to both the plane and the line is exactly the set of points in the
solution sets of both linear systems, which is the set of points in the solution set of the new linear
system consisting of all the equations in both of the two original linear systems.

Example 3.5.16: Suppose the vertices of the triangle are the points [1, 1, 1], [2, 2, 3], and
[−1, 3, 0].

The triangle looks like this:

The ray of light originates at p = [−2.2, 0.8, 3.1] and moves in the direction d = [1.55, 0.65, −0.7].

We can see from the picture that the ray does indeed intersect the triangle, but how can a
computer discover that?
Here we show the plane containing the triangle:

We will later learn to find a linear equation whose solution space is the plane. One such
equation turns out to be [5, 3, −4] · [x, y, z] = 4.
We will later learn to find a linear system whose solution space is the line containing the ray
of light. One such system, it turns out, is

[0.275..., −0.303..., 0.327...] · [x, y, z] = 0.1659...


[0, 0.536..., 0.498...] · [x, y, z] = 1.975...

To find the intersection of the plane and the line, we put all these linear equations together,
obtaining

[5, 3, −4] · [x, y, z] = 4


[0.275..., −0.303..., 0.327...] · [x, y, z] = 0.1659...
[0, 0.536..., 0.498...] · [x, y, z] = 1.975...

The solution set of this combined linear system consists of the points belonging to both the
plane and the line. We are using the second kind of representation. In this case, the solution set
consists of just one point. To find that point, we convert back to the first kind of representation.

We will later learn algorithms to solve a linear system. The solution turns out to be w =
[0.9, 2.1, 1.7]. The point w, therefore is the intersection of the plane and the line.

Once we have found the point of intersection of the line and the plane, how do we find out
whether the intersection belongs to the triangle and the ray? For this, we return to the first kind
of representation.

Example 3.5.17: The point of intersection lies in the plane and is therefore an affine combi-
nation of the vertices:

w = α0 [1, 1, 1] + α1 [2, 2, 3] + α2 [−1, 3, 0]

which means that the coefficients sum up to one. The point is in the triangle if it is a convex
combination of the vertices. We will learn that in this case there is only one way to represent
the point as an affine combination, and we will learn how to find the coefficients:

w = 0.2 [1, 1, 1] + 0.5 [2, 2, 3] + 0.3 [−1, 3, 0]

Since the coefficients are nonnegative, we know that the point of intersection is indeed in the
triangle.
There is one more thing to check. We should check that the intersection point lies in the
'half' of the line that comprises the ray. The ray is the set of points {p + α d : α ∈ R, α ≥ 0}.
The line is the set of points {p + α d : α ∈ R}. To see if the intersection point w is in
the ray, we find the unique value of α such that w = p + α d, and we check that this value is
nonnegative.
The vector equation w = p + α d is equivalent to three scalar equations, one for each of the
entries of the vector. To find the value of α, let’s just consider the first entry. The first entry
of w is 0.9, the first entry of p is −2.2, and the first entry of d is 1.55, so α must satisfy the

equation
0.9 = −2.2 + α 1.55
which we can solve, obtaining α = 2. Since the value of α is nonnegative, the intersection point
does indeed belong to the ray.
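These final checks are easy to script. Here is a small illustrative snippet (plain Python, with the numbers from this example) that recovers α from the first entries and confirms the nonnegativity conditions:

w = [0.9, 2.1, 1.7]                         # the intersection point found above
p = [-2.2, 0.8, 3.1]                        # origin of the ray
d = [1.55, 0.65, -0.7]                      # direction of the ray
alpha = (w[0] - p[0]) / d[0]                # solve w[0] = p[0] + alpha*d[0] for alpha
print(alpha, alpha >= 0)                    # prints 2.0 True -- the point lies on the ray
print(all(c >= 0 for c in [0.2, 0.5, 0.3])) # prints True -- the affine coefficients are nonnegative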

In this example, we needed to convert between the two kinds of representations of a flat, (1) as
a set of linear combinations and (2) as the solution set of a linear system.

3.6 Linear systems, homogeneous and otherwise


In Section 3.4, we saw that the solution set of a homogeneous linear system is a vector space.
What about the solution set of an arbitrary linear system? Is that an affine space? Yes, with an
exception: the case in which the solution set is empty.

3.6.1 The homogeneous linear system corresponding to a general linear


system
In Section 2.9.2, we considered the problem of calculating the rate of power consumption for
hardware components of a sensor node. We formulated this as the problem of finding a solution
to a linear system over R, and we asked (Question 2.9.11): how can we tell if there is only one
solution?
In Section 2.9.7, we considered an attack on a simple authentication scheme. We found a
way in which Eve, an eavesdropper, might calculate the password from observing authentication
trials. We formulated this as the problem of finding a solution to a system of linear equations
over GF (2), and we asked (Question 2.9.18): how many solutions are there to a given linear
system over GF (2)?
We shall see that, in each of these applications, the first question can be addressed by studying
the corresponding system of homogeneous linear equations, i.e. where each right-hand side is
replaced by a zero.

Lemma 3.6.1: Let u1 be a solution to the system of linear equations

a1 · x = β1
...
am · x = βm     (3.4)

Then another vector u2 is also a solution if and only if the difference u2 − u1 is a solution to
the system of corresponding homogeneous equations

a1 · x = 0
...
am · x = 0     (3.5)

Proof

For i = 1, . . . , m, we have ai · u1 = βi , so ai · u2 = βi iff ai · u2 − ai · u1 = 0 iff ai · (u2 − u1 ) = 0. □

The set of solutions to a homogeneous linear system is a vector space V. We can restate the
assertion of Lemma 3.6.1:

u2 is a solution to the original linear system (3.4) if and only if u2 − u1 is in V


where V is the solution set of the homogeneous linear system (3.5).

Substituting v for u2 − u1 (which implies u2 = u1 + v), we reformulate it as:

u1 + v is a solution to the original linear system if and only if v is in V

which can be reworded as:

{solutions to original linear system} = {u1 + v : v ∈ V} (3.6)

The set on the right-hand side is an affine space!

Theorem 3.6.2: For any linear system, the set of solutions either is empty or is an affine space.

Proof

If the linear system has no solution, the solution set is empty. If it has at least one solution
u1 then the solution set is {u1 + v : v ∈ V}. □

We asked in Question 3.4.16 whether every vector space is the solution space of a homogeneous
system (and indicated that the answer is yes). An analogous question is Is every affine space the
solution set of a linear system? That the answer is yes follows from the fact that the answer to
the previous question is yes.

Example 3.6.3: The solution set of the linear system

[0, 0] · x = 1

is the empty set.



The solution set of the linear system

[1, 0] · x = 2
[0, 1] · x = 5

is the singleton set {[2, 5]}, which can be written as

{[2, 5] + v : v ∈ {[0, 0]}}

The solution set of the linear system

[2, −5] · x = 1
[4, −10] · x = 2

is the set {[−2, −1] + α [2.5, 1] : α ∈ R}, which can be written as

{[−2, −1] + v : v ∈ Span {[2.5, 1]}}
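A quick check (plain Python) that every vector of the form [−2, −1] + α [2.5, 1] satisfies both equations of the last system:

for alpha in [0, 1, -3.5]:
    x = [-2 + alpha*2.5, -1 + alpha*1]
    print(2*x[0] - 5*x[1], 4*x[0] - 10*x[1])   # prints 1.0 2.0 for every alpha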

3.6.2 Number of solutions revisited


We can now give a partial answer to Question 2.9.11 (How can we tell if a linear system has only
one solution?):

Corollary 3.6.4: Suppose a linear system has a solution. The solution is unique if and only if
the only solution to the corresponding homogeneous linear system is the zero vector.

The question about uniqueness of solution is therefore replaced with

Question 3.6.5: How can we tell if a homogeneous linear system has only a trivial solution?

Moreover, Question 2.9.18 (How many solutions are there to a given system of linear equations
over GF (2)?) is partially addressed by Equation 3.6, which tells us that the number of solu-
tions equals |V|, the cardinality of the vector space consisting of solutions to the corresponding
homogeneous system.
The question about counting solutions to a linear system over GF (2) thus becomes

Question 3.6.6: How can we find the number of solutions to a homogeneous linear system
over GF (2)?

In addressing these questions, we will make use of the fact that the solution set for a homogeneous
linear system is a vector space.

3.6.3 Towards intersecting a plane and a line


Here’s an example of how Theorem 3.6.2 could help us: an approach to compute the intersection
of a plane and a line:

Step 1: Since the plane is an affine space, we hope to represent it as the solution set of a linear
system.
Step 2: Since the line is an affine space, we hope to represent it as the solution set of a second
linear system.

Step 3: Combine the two linear systems to form a single linear system consisting of all the linear
equations from the two. The solutions to the combined linear system are the points that
are on both the plane and the line.

The solution set of the combined linear system might consist of many vectors (if the line lies
within the plane) or just one (the point at which the line intersects the plane).
This approach sounds promising—but so far we don’t know how to carry it out. Stay tuned.

3.6.4 Checksum functions


This section gives another application of homogeneous linear equations.
A checksum for a big chunk of data or program is a small chunk of data used to verify that
the big chunk has not been altered. For example, here is a fragment of the download page for
Python:

With each downloadable Python release is listed the checksum and the size.
A checksum function is a function that maps a large file of data to a small chunk of data, the
checksum. Since the number of possible checksums is much smaller than the number of possible
files, there is no one-to-one checksum function: there will always be pairs of distinct files that map
to the same checksum. The goal of using a checksum function is to detect accidental corruption
of a file during transmission or storage.
Here we seek a function such that a random corruption is likely detectable: for any file F , a
random change to the file probably leads to a change in the checksum.
We describe an impractical but instructive checksum function. The input is a “file” repre-
sented as an n-bit vector over GF (2). The output is a 64-vector. The function is specified by
sixty-four n-vectors a1 , . . . , a64 . The function is then defined as follows:

    x ↦ [a1 · x, . . . , a64 · x]

Suppose p is a “file”. We model corruption as the addition of a random n-vector e (the error),
so the corrupted version of the file is p + e. We want to find a formula for the probability that
the corrupted file has the same checksum as the original file.
The checksum for the original file is [β1 , . . . , βm ] (here m = 64), where βi = ai · p for i = 1, . . . , m. For
i = 1, . . . , m, bit i of the checksum of the corrupted file is ai · (p + e). Since dot-product distributes
over vector addition (Proposition 2.9.25), this is equal to ai · p + ai · e. Thus bit i of the checksum
of the corrupted file equals that of the original file if and only if ai · p + ai · e = ai · p—that is,
if and only if ai · e = 0.

Thus the entire checksum of the corrupted file is the same as that of the original if and only
if ai · e = 0 for i = 1, . . . , m, if and only if e belongs to the solution set for the homogeneous
linear system

a1 · x = 0
..
.
am · x = 0

The probability that a random n-vector e belongs to the solution set is


number of vectors in solution set
number of n-vectors over GF (2)

We know that the number of n-vectors over GF (2) is 2n . To calculate this probability, therefore,
we once again need an answer to Question 3.6.6: How can we find the number of solutions to a
homogeneous linear system over GF (2)?
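To make the scheme concrete, here is a minimal sketch (not part of the book's support code) using plain Python lists of 0/1 integers in place of vectors over GF (2); the sizes n = 12 and four checksum vectors are illustrative stand-ins for a large n and sixty-four vectors.

import random

n = 12          # "file" length; the text imagines a much larger n
num_vecs = 4    # the text uses 64 checksum vectors

def dot_gf2(a, b):
    # dot-product over GF(2)
    return sum(x * y for x, y in zip(a, b)) % 2

def checksum(a_list, x):
    # x |-> [a1 . x, ..., ak . x]
    return [dot_gf2(a, x) for a in a_list]

random.seed(1)
a_list = [[random.randint(0, 1) for _ in range(n)] for _ in range(num_vecs)]
p = [random.randint(0, 1) for _ in range(n)]        # the original "file"
e = [random.randint(0, 1) for _ in range(n)]        # a random corruption
corrupted = [(pi + ei) % 2 for pi, ei in zip(p, e)]

# The corrupted file has the same checksum exactly when ai . e = 0 for every i:
same = checksum(a_list, p) == checksum(a_list, corrupted)
print(same, all(dot_gf2(a, e) == 0 for a in a_list))   # the two booleans agree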

3.7 Review questions


• What is a linear combination?

• What are coefficients?


• What is the span of vectors?

• What are standard generators?

• What are examples of flats?

• What is a homogeneous linear equation?


• What is a homogeneous linear system?

• What are the two kinds of representations of flats containing the origin?
• What is a vector space?
• What is a subspace?
• What is an affine combination?

• What is the affine hull of vectors?


• What is an affine space?

• What are the two kinds of representations of flats not containing the origin?
• Is the solution set of a linear system always an affine space?

3.8 Problems
Vec review
Vectors in containers
Problem 3.8.1:
1. Write and test a procedure vec_select using a comprehension for the following computational problem:

• input: a list veclist of vectors over the same domain, and an element k of the
domain
• output: the sublist of veclist consisting of the vectors v in veclist where v[k] is
zero

2. Write and test a procedure vec_sum using the built-in procedure sum(·) for the following:

• input: a list veclist of vectors, and a set D that is the common domain of these
vectors
• output: the vector sum of the vectors in veclist.
Your procedure must work even if veclist has length 0.
Hint: Recall from the Python Lab that sum(·) optionally takes a second argument, which
is the element to start the sum with. This can be a vector.
Disclaimer: The Vec class is defined in such a way that, for a vector v, the expression 0 +
v evaluates to v. This was done precisely so that sum([v1,v2,... vk]) will correctly
evaluate to the sum of the vectors when the number of vectors is nonzero. However, this
won’t work when the number of vectors is zero.

3. Put your procedures together to obtain a procedure vec_select_sum for the following:

• input: a set D, a list veclist of vectors with domain D, and an element k of the
domain
• output: the sum of all vectors v in veclist where v[k] is zero

Problem 3.8.2: Write and test a procedure scale_vecs(vecdict) for the following:
• input: A dictionary vecdict mapping positive numbers to vectors (instances of Vec)
• output: a list of vectors, one for each item in vecdict. If vecdict contains a key k
mapping to a vector v, the output should contain the vector (1/k)v

Linear combinations
Constructing the span of given vectors over GF (2)
Problem 3.8.3: Write a procedure GF2_span with the following spec:
• input: a set D of labels and a list L of vectors over GF (2) with label-set D

• output: the list of all linear combinations of the vectors in L


(Hint: use a loop (or recursion) and a comprehension. Be sure to test your procedure on examples
where L is an empty list.)

Problem 3.8.4: Let a, b be real numbers. Consider the equation z = ax+by. Prove that there
are two 3-vectors v1 , v2 such that the set of points [x, y, z] satisfying the equation is exactly
the set of linear combinations of v1 and v2 . (Hint: Specify the vectors using formulas involving
a, b.)

Problem 3.8.5: Let a, b, c be real numbers. Consider the equation z = ax+by +c. Prove that
there are three 3-vectors v0 , v1 , v2 such that the set of points [x, y, z] satisfying the equation is
exactly
{v0 + α1 v1 + α2 v2 : α1 ∈ R, α2 ∈ R}
(Hint: Specify the vectors using formulas involving a, b, c.)

Sets of linear combinations and geometry

Figure 3.1: Figures for Problem 3.8.6 (panels (a) and (b); the images are not reproduced here).

Problem 3.8.6: Express the line segment in Figure 3.1(a) using a set of linear combinations.
Do the same for the plane containing the triangle in Figure 3.1(b).

Vector spaces
Problem 3.8.7: Prove or give a counterexample: “{[x, y, z] : x, y, z ∈ R, x + y + z = 1} is
a vector space.”

Problem 3.8.8: Prove or give a counterexample: “{[x, y, z] : x, y, z ∈ R and x + y + z = 0}


is a vector space.”

Problem 3.8.9: Prove or give a counterexample: “{[x1 , x2 , x3 , x4 , x5 ] : x1 , x2 , x3 , x4 , x5 ∈


R, x2 = 0 or x5 = 0} is a vector space.”

Problem 3.8.10: Explain your answers.

1. Let V be the set of 5-vectors over GF (2) that have an even number of 1’s. Is V a vector
space?
2. Let V be the set of 5-vectors over GF (2) that have an odd number of 1’s. Is V a vector
space?
Chapter 4

The Matrix

Neo: What is the Matrix?


Trinity: The answer is out there, Neo,
and it’s looking for you, and it will find
you if you want it to.
The Matrix, 1999

4.1 What is a matrix?


4.1.1 Traditional matrices
Traditionally, a matrix over F is a two-dimensional array whose entries are elements of F. Here
is a matrix over R:

    [ 1  2  3 ]
    [ 10 20 30 ]

This matrix has two rows and three columns, so we call it a 2 × 3 matrix. It is traditional to
refer to the rows and columns by numbers. Row 1 is [1, 2, 3] and row 2 is [10, 20, 30];
column 1 is [1, 10], column 2 is [2, 20], and column 3 is [3, 30].
In general, a matrix with m rows and n columns is called an m × n matrix. For a matrix A,
the i, j element is defined to be the element in the ith row and the j th column, and is traditionally
written Ai,j or Aij . We will often use the Pythonese notation, A[i, j].
Row i is the vector

    [ A[i, 0], A[i, 1], A[i, 2], · · · , A[i, n − 1] ]

and column j is the vector

    [ A[0, j], A[1, j], A[2, j], · · · , A[m − 1, j] ]


Representing a traditional matrix by a list of row-lists


How can we represent a matrix? Perhaps the first representation that comes to mind is a list
of row-lists: each row of the matrix A is represented by a list of numbers, and the matrix is
represented by a list L of these lists. That is, a list L such that

A[i, j] = L[i][j] for every 0 ≤ i < m and 0 ≤ j < n


For example, the matrix

    [ 1  2  3 ]
    [ 10 20 30 ]

would be represented by [[1,2,3],[10,20,30]].

Quiz 4.1.1: Write a nested comprehension whose value is the list-of-row-lists representation of a
3 × 4 matrix all of whose elements are zero:

    [ 0 0 0 0 ]
    [ 0 0 0 0 ]
    [ 0 0 0 0 ]

Hint: first write a comprehension for a typical row, then use that expression in a comprehension
for the list of lists.

Answer

>>> [[0 for j in range(4)] for i in range(3)]


[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

Representing a traditional matrix by a list of column-lists


As you will see, one aspect of matrices that makes them so convenient and beautiful is the duality
between rows and columns. Anything you can do with columns, you can do with rows. Thus we
can represent a matrix A by a list of column-lists; that is, a list L such that

A[i, j] = L[j][i] for every 0 ≤ i < m and 0 ≤ j < n


For example, the matrix

    [ 1  2  3 ]
    [ 10 20 30 ]

would be represented by [[1,10],[2,20],[3,30]].

Quiz 4.1.2: Write a nested comprehension whose value is the list-of-column-lists representation of
a 3 × 4 matrix whose i, j element is i − j:

    [ 0 −1 −2 −3 ]
    [ 1  0 −1 −2 ]
    [ 2  1  0 −1 ]

Figure 4.1: The Matrix Revisited (excerpt) http://xkcd.com/566/

Hint: First write a comprehension for column j, assuming j is bound to an integer. Then use that
expression in a comprehension in which j is the control variable.

Answer

>>> [[i-j for i in range(3)] for j in range(4)]


[[0, 1, 2], [-1, 0, 1], [-2, -1, 0], [-3, -2, -1]]

4.1.2 The matrix revealed


We will often use the traditional notation in examples. However, just as we find it helpful to
define vectors whose entries are identified by elements of an arbitrary finite set, we would like to
be able to refer to a matrix's rows and columns using arbitrary finite sets.
As we have defined a D-vector over F to be a function from a set D to F, so we define an R × C
matrix over F to be a function from the Cartesian product R × C to F. We refer to the elements of
R as row labels and we refer to the elements of C as column labels.

Example 4.1.3: Here is an example in which R = {'a', 'b'} and C = {'#', '@', '?'}:

@ # ?
a 1 2 3
b 10 20 30

The column labels are given atop the columns, and the row labels are listed to the left of the
rows.
Formally, this matrix is a function from R × C to R. We can represent the function using
Python’s dictionary notation:

{('a','@'):1, ('a','#'):2, ('a', '?'):3, ('b', '@'):10, ('b', '#'):20,
('b','?'):30}

4.1.3 Rows, columns, and entries


Much of the power of matrices comes from our ability to interpret the rows and columns of a
matrix as vectors. For the matrix of Example 4.1.3 (Page 187):

• row 'a' is the vector Vec({'@', '#', '?'}, {'@':1, '#':2, '?':3})
• row 'b' is the vector Vec({'@', '#', '?'}, {'@':10, '#':20, '?':30})
• column '#' is the vector Vec({'a','b'}, {'a':2, 'b':20})
• column '@' is the vector Vec({'a','b'}, {'a':1, 'b':10})

Quiz 4.1.4: Give a Python expression using Vec for column '?'.

Answer

Vec({'a','b'}, {'a':3, 'b':30})

For an R × C matrix M , and for r ∈ R and c ∈ C, the r, c element of M is defined to be


whatever the pair (r, c) maps to, and is written Mr,c or M [r, c]. The rows and columns are
defined as follows:

• For r ∈ R, row r is the C-vector such that, for each element c ∈ C, entry c is M [r, c], and

• for c ∈ C, column c is the R-vector such that, for each element r ∈ R, entry r is M [r, c].
We denote row r of M by M [r, :] or Mr,: and we denote column c of M by M [:, c] or M:,c .

Dict-of-rows representation
Since I have said that each row of a matrix is a vector, we can represent each row by an instance
of Vec. To map row-labels to the rows, we use a dictionary. I call this representation a rowdict.
For example, the rowdict representation of the matrix of Example 4.1.3 (Page 187) is:
{'a': Vec({'#', '@', '?'}, {'@':1, '#':2, '?':3}),
'b': Vec({'#', '@', '?'}, {'@':10, '#':20, '?':30})}

Dict-of-columns representation
The duality of rows and colums suggests a representation consisting of a dictionary mapping
column-labels to the columns represented as instances of Vec. I call this representation a coldict.

Quiz 4.1.5: Give a Python expression whose value is the coldict representation of the matrix
of Example 4.1.3 (Page 187).

Answer

{'#': Vec({'a','b'}, {'a':2, 'b':20}),


'@': Vec({'a','b'}, {'a':1, 'b':10}),
'?': Vec({'a','b'}, {'a':3, 'b':30})}

4.1.4 Our Python implementation of matrices


We have defined several different representations of matrices, and will later define still more. It
is convenient, however, to define a class Mat, analogous to our vector class Vec, for representing
matrices. An instance of Mat will have two fields:
• D, which will be bound to a pair (R, C) of sets (unlike Vec, in which D is a single set);
• f, which will be bound to a dictionary representing the function that maps pairs (r, c) ∈
R × C to field elements.
We will follow the sparsity convention we used in representing vectors: entries of the matrix
whose values are zero need not be represented in the dictionary. Sparsity for matrices is more
important than for vectors since matrices tend to be much bigger: a C-vector has |C| entries but
an R × C matrix has |R| · |C| entries.
One key difference between our representations of vectors and matrices is the use of the D
field. In a vector, the value of D is a set, and the keys of the dictionary are elements of this set.
In a matrix, the value of D is a pair (R, C) of sets, and the keys of the dictionary are elements
of the Cartesian product R × C. The reason for this choice is that storing the entire set R × C
would require too much space for large sparse matrices.
The Python code required to define the class Mat is
class Mat:
    def __init__(self, labels, function):
        self.D = labels
        self.f = function
Once Python has processed this definition, you can create an instance of Mat like so:
>>> M=Mat(({'a','b'}, {'@', '#', '?'}), {('a','@'):1, ('a','#'):2,
('a','?'):3, ('b','@'):10, ('b','#'):20, ('b','?'):30})
As with Vec, the first argument is assigned to the new instance’s D field, and the second is
assigned to the f field.
As with Vec, we will write procedures to manipulate instances of Mat, and eventually give
a more elaborate class definition for Mat, one that allows use of operators such as * and that
includes pretty printing as in

>>> print(M)
# @ ?
---------
a | 2 1 3
b | 20 10 30

4.1.5 Identity matrix


Definition 4.1.6: For a finite set D, the D × D identity matrix is the matrix whose row-label
set and column-label set are both D, and in which entry (d, d) is a 1 for every d ∈ D (and all
other entries are zero). We denote it by 1D . Usually the set D is clear from the context, and
the identity matrix is written 1, without the subscript.

For example, here is the {'a','b','c'}×{'a','b','c'} identity matrix:

a b c
-------
a | 1 0 0
b | 0 1 0
c | 0 0 1

Quiz 4.1.7: Write an expression for the {'a','b','c'}×{'a','b','c'} identity matrix rep-
resented as an instance of Mat.

Answer

Mat(({'a','b','c'},{'a','b','c'}),{('a','a'):1,('b','b'):1,('c','c'):1})

Quiz 4.1.8: Write a one-line procedure identity(D) that, given a finite set D, returns the
D × D identity matrix represented as an instance of Mat.

Answer

def identity(D): return Mat((D,D), {(d,d):1 for d in D})

4.1.6 Converting between matrix representations


Since we will be using different matrix representations, it is convenient to be able to convert
between them.

Quiz 4.1.9: Write a one-line procedure mat2rowdict(A) that, given an instance of Mat, re-
turns the rowdict representation of the same matrix. Use dictionary comprehensions.

>>> mat2rowdict(M)
{'a': Vec({'@', '#', '?'},{'@': 1, '#': 2, '?': 3}),
'b': Vec({'@', '#', '?'},{'@': 10, '#': 20, '?': 30})}
Hint: First write the expression whose value is the row r Vec; the f field's value is defined by
a dictionary comprehension. Second, use that expression in a dictionary comprehension in which
r is the control variable.

Answer

Assuming r is bound to one of M’s row-labels, row r is the value of the expression

Vec(A.D[1],{c:A[r,c] for c in A.D[1]})

We want to use this expression as the value corresponding to key r in a dictionary compre-
hension:
{r:... for r in A.D[0]}

Putting these two expressions together, we define the procedure as follows:

def mat2rowdict(A):
return {r:Vec(A.D[1],{c:A[r,c] for c in A.D[1]}) for r in A.D[0]}

Quiz 4.1.10: Write a one-line procedure mat2coldict(A) that, given an instance of Mat,
returns the coldict representation of the same matrix. Use dictionary comprehensions.
>>> mat2coldict(M)
{'@': Vec({'a', 'b'},{'a': 1, 'b': 10}),
'#': Vec({'a', 'b'},{'a': 2, 'b': 20}),
'?': Vec({'a', 'b'},{'a': 3, 'b': 30})}

Answer

def mat2coldict(A):
return {c:Vec(A.D[0],{r:A[r,c] for r in A.D[0]}) for c in A.D[1]}

4.1.7 matutil.py
The file matutil.py is provided. We will be using this module in the future. It contains the
procedure identity(D) from Quiz 4.1.8 and the conversion procedures from Section 4.1.6. It
also contains the procedures rowdict2mat(rowdict) and coldict2mat(coldict), which are the
inverses, respectively, of mat2rowdict(A) and mat2coldict(A). 1 It also contains the procedure
listlist2mat(L) that, given a list L of lists of field elements, returns an instance of Mat whose
rows correspond to the lists that are elements of L. This procedure is convenient for easily creating
small example matrices:
>>> A=listlist2mat([[10,20,30,40],[50,60,70,80]])
>>> print(A)

0 1 2 3
-------------
0 | 10 20 30 40
1 | 50 60 70 80

4.2 Column space and row space


Matrices serve many roles, but one is a way of packing together vectors. There are two ways of
interpreting a matrix as a bunch of vectors: a bunch of columns and a bunch of rows.
Correspondingly, there are two vector spaces associated with a matrix:

Definition 4.2.1: For a matrix M ,

• the column space of M , written Col M , is the vector space spanned by the columns of M ,
and

• the row space of M , written Row M , is the vector space spanned by the rows of M .

Example 4.2.2: The column space of

    [ 1  2  3 ]
    [ 10 20 30 ]

is Span {[1, 10], [2, 20], [3, 30]}. In this
case, the column space is equal to Span {[1, 10]} since [2, 20] and [3, 30] are scalar multiples of
[1, 10].
The row space of the same matrix is Span {[1, 2, 3], [10, 20, 30]}. In this case, the span is
equal to Span {[1, 2, 3]} since [10, 20, 30] is a scalar multiple of [1, 2, 3].

We will get a deeper understanding of the significance of the column space and row space in
Sections 4.5.1, 4.5.2 and 4.10.6. In Section 4.7, we will learn about one more important vector
space associated with a matrix.
1 For each of the procedures rowdict2mat(rowdict) and coldict2mat(coldict), the argument can be either a
dictionary of vectors or a list of vectors.

4.3 Matrices as vectors


Presently we will describe the operations that make matrices useful. First, we observe that a
matrix can be interpreted as a vector. In particular, an R × S matrix over F is a function
from R × S to F, so it can be interpreted as an R × S-vector over F. Using this interpretation,
we can perform the usual vector operations on matrices, scalar-vector multiplication and vector
addition. Our full implementation of the Mat class will include these operations. (We won’t be
using dot-product with matrices).

Quiz 4.3.1: Write the procedure mat2vec(M) that, given an instance of Mat, returns the cor-
responding instance of Vec. As an example, we show the result of applying this procedure to the
matrix M given in Example 4.1.3 (Page 187):

>>> print(mat2vec(M))
('a', '#') ('a', '?') ('a', '@') ('b', '#') ('b', '?') ('b', '@')
------------------------------------------------------------------
2 3 1 20 30 10

Answer

def mat2vec(M):
return Vec({(r,s) for r in M.D[0] for s in M.D[1]}, M.f)

We won’t need mat2vec(M) since Mat will include vector operations.

4.4 Transpose
Transposing a matrix means swapping its rows and columns.

Definition 4.4.1: The transpose of a P × Q matrix M , written M T , is the Q × P matrix such


that (M T )j,i = Mi,j for every i ∈ P, j ∈ Q.

Quiz 4.4.2: Write the procedure transpose(M) that, given an instance of Mat representing a
matrix, returns the representation of the transpose of that matrix.

>>> print(transpose(M))
a b
------
# | 2 20
@ | 1 10
? | 3 30

Answer

def transpose(M):
    return Mat((M.D[1], M.D[0]), {(q,p):v for (p,q),v in M.f.items()})

We say a matrix M is a symmetric matrix if M T = M .


Example 4.4.3: The matrix

    [ 1 2 ]
    [ 3 4 ]

is not symmetric but the matrix

    [ 1 2 ]
    [ 2 4 ]

is symmetric.

4.5 Matrix-vector and vector-matrix multiplication in terms of linear combinations
What do we do with matrices? Mostly we multiply them by vectors. There are two ways to
multiply a matrix by a vector: matrix-vector multiplication and vector-matrix multiplication. For
each, I will give two equivalent definitions of multiplication: one in terms of linear combinations
and one in terms of dot-products. The reader needs to absorb all these definitions because
different contexts call for different interpretations.

4.5.1 Matrix-vector multiplication in terms of linear combinations


Definition 4.5.1 (Linear-combinations definition of matrix-vector multiplication):
Let M be an R × C matrix over F. Let v be a C-vector over F. Then M ∗ v is the linear
combination

    ∑_{c∈C} v[c] (column c of M )

If M is an R × C matrix but v is not a C-vector then the product M ∗ v is illegal.


In the traditional-matrix case, if M is an m × n matrix over F then M ∗ v is legal only if v
is an n-vector over F. That is, the number of columns of the matrix must match the number of
entries of the vector.

Example 4.5.2: Let’s consider an example using a traditional matrix:


    [ 1  2  3 ]
    [ 10 20 30 ]  ∗ [7, 0, 4] = 7 [1, 10] + 0 [2, 20] + 4 [3, 30]
                              = [7, 70] + [0, 0] + [12, 120] = [19, 190]
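Here is a minimal sketch of this definition in code. It is not the book's implementation—the full Mat class will provide the * operator—and it assumes the Vec class from the module vec and the procedures listlist2mat and mat2coldict from matutil; the name lin_comb_mat_vec_mult is just illustrative.

from vec import Vec
from matutil import listlist2mat, mat2coldict

def lin_comb_mat_vec_mult(M, v):
    assert M.D[1] == v.D            # the product is legal only if v is a C-vector
    cols = mat2coldict(M)
    return sum(v[c] * cols[c] for c in M.D[1])   # sum of v[c] (column c of M)

M = listlist2mat([[1, 2, 3], [10, 20, 30]])
v = Vec({0, 1, 2}, {0: 7, 1: 0, 2: 4})
print(lin_comb_mat_vec_mult(M, v))   # entries 19 and 190, as computed above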

Example 4.5.3: What about

    [ 1  2  3 ]
    [ 10 20 30 ]

times the vector [7, 0]? This is illegal: you

can’t multiply a 2 × 3 matrix with a 2-vector. The matrix has three columns but the vector has
two entries.

Example 4.5.4: Now we do an example with a matrix with more interesting row and column
labels:

        @  #  ?             @    #   ?            a     b
    a [ 2  1  3 ]    ∗    [ 0.5  5  −1 ]    =   [ 3.0  30.0 ]
    b [ 20 10 30 ]

Example 4.5.5: Lights Out: In Example 3.1.9 (Page 147), we saw that a solution to a Lights
Out puzzle (which buttons to press to turn out the lights) is a linear combination of “button
vectors.” Now we can write such a linear combination as a matrix-vector product where the
columns of the matrix are button vectors.
For example, the linear combination

    1 · (button vector 1) + 0 · (button vector 2) + 0 · (button vector 3) + 1 · (button vector 4)

can be written as the matrix-vector product

    [ button vector 1 | button vector 2 | button vector 3 | button vector 4 ] ∗ [1, 0, 0, 1]

(the pictures of the four button vectors are not reproduced here).

4.5.2 Vector-matrix multiplication in terms of linear combinations


We have seen a definition of matrix-vector multiplication in terms of linear combinations of
columns of a matrix. We now define vector-matrix multiplication in terms of linear combinations
of the rows of a matrix.

Definition 4.5.6 (Linear-combinations definition of vector-matrix multiplication):


Let M be an R × C matrix. Let w be an R-vector. Then w ∗ M is the linear combination
    ∑_{r∈R} w[r] (row r of M )

If M is an R × C matrix but w is not an R-vector then the product w ∗ M is illegal.


This is a good moment to point out that matrix-vector multiplication is different from vector-
matrix multiplication; in fact, often M ∗v is a legal product but v∗M is not or vice versa. Because
we are used to assuming commutativity when we multiply numbers, the noncommutativity of
multiplication between matrices and vectors can take some getting used to.

Example 4.5.7:
               [ 1  2  3 ]
    [3, 4] ∗   [ 10 20 30 ]  = 3 [1, 2, 3] + 4 [10, 20, 30]
                             = [3, 6, 9] + [40, 80, 120] = [43, 86, 129]

Example 4.5.8: What about

    [3, 4, 5] ∗ [ 1  2  3 ]
                [ 10 20 30 ]  ?

This is illegal: you can't multiply a
3-vector and a 2 × 3 matrix. The number of entries of the vector must match the number of
rows of the matrix.

Remark 4.5.9: Transpose swaps rows and columns. The rows of M are the columns of M T .
We could therefore define w ∗ M as M T ∗ w. However, implementing it that way would be a
mistake—transpose creates a completely new matrix, and, if the matrix is big, it is inefficient to
do that just for the sake of computing a vector-matrix product.
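In that spirit, here is a sketch that computes w ∗ M from the rows directly (via matutil's mat2rowdict) rather than by building the transpose; as before, the module names and the procedure name are assumptions, and the full Mat class will provide * for this.

from vec import Vec
from matutil import listlist2mat, mat2rowdict

def lin_comb_vec_mat_mult(w, M):
    assert M.D[0] == w.D            # the product is legal only if w is an R-vector
    rows = mat2rowdict(M)
    return sum(w[r] * rows[r] for r in M.D[0])   # sum of w[r] (row r of M)

M = listlist2mat([[1, 2, 3], [10, 20, 30]])
print(lin_comb_vec_mat_mult(Vec({0, 1}, {0: 3, 1: 4}), M))  # [43, 86, 129], as in Example 4.5.7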

Example 4.5.10: In Section 3.1.2, we gave examples of applications of linear combinations.


Recall the JunkCo factory data table from Example 3.1.6 (Page 145):
metal concrete plastic water electricity
garden gnome 0 1.3 .2 .8 .4
hula hoop 0 0 1.5 .4 .3
slinky .25 0 0 .2 .7
silly putty 0 0 .3 .7 .5
salad shooter .15 0 .5 .4 .8
Corresponding to each product is a vector. In Example 3.1.6 (Page 145), we defined the
vectors
v gnome, v hoop, v slinky, v putty, and v shooter,
each with domain
{’metal’,’concrete’,’plastic’,’water’,’electricity’}
We can construct a matrix M whose rows are these vectors:
>>> rowdict = {'gnome':v_gnome, 'hoop':v_hoop, 'slinky':v_slinky,
'putty':v_putty, 'shooter':v_shooter}
>>> M = rowdict2mat(rowdict)
>>> print(M)

plastic metal concrete water electricity


------------------------------------------
putty | 0.3 0 0 0.7 0.5
gnome | 0.2 0 1.3 0.8 0.4
slinky | 0 0.25 0 0.2 0.7


hoop | 1.5 0 0 0.4 0.3
shooter | 0.5 0.15 0 0.4 0.8

In that example, JunkCo decided on quantities αgnome , αhoop , αslinky , αputty , αshooter for the prod-
ucts. We saw that the vector giving the total utilization of each resource, a vector whose
domain is {metal, concrete, plastic, water, electricity}, is a linear combination of the rows of
the table where the coefficient for product p is αp .
We can obtain the total-utilization vector as a vector-matrix product

[αgnome , αhoop , αslinky , αputty , αshooter ] ∗ M (4.1)

Here’s how we can compute the total utilization in Python using vector-matrix multiplication.
Note the use of the asterisk * as the multiplication operator.

>>> R = {'gnome', 'hoop', 'slinky', 'putty', 'shooter'}


>>> u = Vec(R, {'putty':133, 'gnome':240, 'slinky':150, 'hoop':55,
'shooter':90})
>>> print(u*M)

plastic metal concrete water electricity


-----------------------------------------
215 51 312 373 356

4.5.3 Formulating the problem of expressing a given vector as a linear combination as a matrix-vector equation
We have learned that a linear combination can be expressed as a matrix-vector or vector-matrix
product. We now use that idea to reformulate the problem of expressing a given vector as a
linear-combination.

Example 4.5.11: Recall the industrial espionage problem of Section 3.1.4: given the JunkCo
factory data table, and given the amount of resources consumed, compute the quantity of the
products produced. Let b be the vector of resources consumed. Define x to be a vector variable.
In view of Equation 4.1, we obtain a vector-matrix equation:

x∗M =b

Solving the industrial espionage problem amounts to solving this equation.

Example 4.5.12: In Example 3.1.9 (Page 147), we said that, for a given initial state s of
Lights Out, the problem of figuring out which buttons to push to turn all lights out could be
expressed as the problem of expressing s as a linear combination (over GF (2)) of the button
vectors. In Example 4.5.5 (Page 195), we further pointed out that the linear combination of
button vectors could be written as a matrix-vector product B ∗ x where B is a matrix whose
columns are the button vectors. Thus the problem of finding the correct coefficients can be
expressed as the problem of finding a vector x such that B ∗ x = s.
Here we give a Python procedure to create a dictionary of button-vectors for n × n Lights
Out. Note that we use the value one defined in the module GF2.
def button_vectors(n):
    D = {(i,j) for i in range(n) for j in range(n)}
    vecdict = {(i,j): Vec(D, dict([((x,j),one) for x in range(max(i-1,0), min(i+2,n))]
                                + [((i,y),one) for y in range(max(j-1,0), min(j+2,n))]))
               for (i,j) in D}
    return vecdict

Entry (i, j) of the returned dictionary is the button-vector corresponding to button (i, j).
Now we can construct the matrix B whose columns are button-vectors for 5 × 5 Lights Out:

>>> B = coldict2mat(button_vectors(5))
Suppose we want to find out which button vectors to press when the puzzle starts from a
particular configuration, e.g. when only the middle light is on. We create a vector s representing
that configuration:

>>> s = Vec(B.D[0], {(2,2):one})


Now we need to solve the equation B ∗ x = s.

4.5.4 Solving a matrix-vector equation


In each of the above examples—and in many more applications—we face the following compu-
tational problem.

Computational Problem 4.5.13: Solving a matrix-vector equation

• input: an R × C matrix A and an R-vector b


• output: the C-vector x̂ such that A ∗ x̂ = b

Though we have specified the computational problem as solving an equation of the form A∗x = b,
an algorithm for this problem would also suffice to solve a vector-matrix equation of the form
x ∗ A = b since we could apply the algorithm to the transpose AT of A.

Example 4.5.14: In Example 3.4.13 (Page 162), we considered Span {[a, b], [c, d]} where
a, b, c, d ∈ R.
1. We showed that, if [c, d] is not in Span {[a, b]}, then ad ≠ bc.

2. If that is the case, we showed that, for every vector [p, q] in R², there are coefficients α
   and β such that
       [p, q] = α [a, b] + β [c, d]                                        (4.2)

In Part 2, we actually gave formulas for α and β in terms of p, q, a, b, c, d:
α = (dp − cq)/(ad − bc) and β = (aq − bp)/(ad − bc).
Note that Equation 4.2 can be rewritten as a matrix-vector equation:

    [ a c ]
    [ b d ]  ∗ [α, β] = [p, q]

Thus the formulas for α and β give an algorithm for solving a matrix-vector equation in which
the matrix is 2 × 2 and the second column is not in the span of the first.
For example, to solve the matrix equation

    [ 1 2 ]
    [ 3 4 ]  ∗ [α, β] = [−1, 1],

we set α = (4 · (−1) − 2 · 1)/(1 · 4 − 2 · 3) = −6/−2 = 3 and β = (1 · 1 − 3 · (−1))/(1 · 4 − 2 · 3) = 4/−2 = −2.
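As a quick sanity check, the 2 × 2 recipe can be written as a tiny procedure (a sketch with plain floats, assuming ad − bc is nonzero):

def solve_2x2(a, b, c, d, p, q):
    # solve alpha*[a,b] + beta*[c,d] = [p,q], assuming ad - bc != 0
    det = a*d - b*c
    return (d*p - c*q)/det, (a*q - b*p)/det

print(solve_2x2(1, 3, 2, 4, -1, 1))   # (3.0, -2.0), matching alpha = 3, beta = -2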

In later chapters, we will study algorithms for this computational problem. For now, I have
provided a module solver that implements these algorithms. It contains a procedure
solve(A, b) with the following spec:
• input: an instance A of Mat, and an instance v of Vec
• output: a vector u such that A ∗ u = v (to within some error tolerance) if there is any such
vector u
Note that the output vector might not be a solution to the matrix-vector equation. In particular,
if there is no solution to the matrix-vector equation, the vector returned by solve(A,b) is not
a solution. You should therefore check each answer u you get from solve(A,b) by comparing
A*u to b.
Moreover, if the matrix and vector are over R, the calculations use Python’s limited-precision
arithmetic operations. Even if the equation A ∗ x = b has a solution, the vector u returned might
not be an exact solution.

Example 4.5.15: We use solve(A,b) to solve the industrial espionage problem. Suppose
we observe that JunkCo uses 51 units of metal, 312 units of concrete, 215.4 units of plastic,
373.1 units of water, and 356 units of electricity. We represent these observations by a vector b:
>>> C = {'metal','concrete','plastic','water','electricity'}
>>> b = Vec(C, {'water':373.1,'concrete':312.0,'plastic':215.4,
'metal':51.0,'electricity':356.0})
We want to solve the vector-matrix equation x ∗ M = b where M is the matrix defined in
Example 4.5.10 (Page 196). Since solve(A,b) solves a matrix-vector equation, we supply the
transpose of M as the first argument A:

>>> solution = solve(M.transpose(), b)



>>> print(solution)

putty gnome slinky hoop shooter


--------------------------------
133 240 150 55 90
Does this vector solve the equation? We can test it by computing the residual vector (often
called the residual):

>>> residual = b - solution*M


If the solution were exact, the residual would be the zero vector. An easy way to see if the
residual is almost the zero vector is to calculate the sum of squares of its entries, which is just
its dot-product with itself:

>>> residual * residual


1.819555009546577e-25
About 10−25 , so zero for our purposes!
However, we cannot yet truly be confident we have penetrated the secrets of JunkCo. Perhaps
the solution we have computed is not the only solution to the equation! More on this topic later.

Example 4.5.16: Continuing with Example 4.5.12 (Page 197), we use solve(A,b) to solve
5 × 5 Lights Out starting from a state in which only the middle light is on:

>>> s = Vec(B.D[0], {(2,2):one})


>>> sol = solve(B, s)
You can check that this is indeed a solution:
>>> B*sol == s
True

Here there is no issue of accuracy since elements of GF (2) are represented precisely. Moreover,
for this problem we don’t care if there are multiple solutions to the equation. This solution tells
us one collection of buttons to press:

>>> [(i,j) for (i,j) in sol.D if sol[i,j] == one]


[(4,0),(2,2),(4,1),(3,2),(0,4),(1,4),(2,3),(1,0),(0,1),(2,0),(0,2)]

4.6 Matrix-vector multiplication in terms of dot-products


We will also define matrix-vector product in terms of dot-products.

4.6.1 Definitions

Definition 4.6.1 (Dot-Product Definition of Matrix-Vector Multiplication): If M


is an R × C matrix and u is a C-vector then M ∗ u is the R-vector v such that v[r] is the
dot-product of row r of M with u.

Example 4.6.2: Consider the matrix-vector product


 
    [ 1  2 ]
    [ 3  4 ]  ∗ [3, −1]
    [ 10 0 ]

The product is a 3-vector. The first entry is the dot-product of the first row, [1, 2], with [3, −1],
which is 1 · 3 + 2 · (−1) = 1. The second entry is the dot-product of the second row, [3, 4], with
[3, −1], which is 3 · 3 + 4 · (−1) = 5. The third entry is 10 · 3 + 0 · (−1) = 30. Thus the product
is [1, 5, 30].
 
    [ 1  2 ]
    [ 3  4 ]  ∗ [3, −1] = [ [1, 2] · [3, −1], [3, 4] · [3, −1], [10, 0] · [3, −1] ] = [1, 5, 30]
    [ 10 0 ]
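Here is a sketch of this definition in code, again assuming the vec and matutil modules and an illustrative procedure name; entry r of the result is the dot-product of row r of M with u.

from vec import Vec
from matutil import listlist2mat, mat2rowdict

def dot_prod_mat_vec_mult(M, u):
    assert M.D[1] == u.D
    rows = mat2rowdict(M)
    return Vec(M.D[0], {r: rows[r] * u for r in M.D[0]})   # Vec * Vec is dot-product

M = listlist2mat([[1, 2], [3, 4], [10, 0]])
print(dot_prod_mat_vec_mult(M, Vec({0, 1}, {0: 3, 1: -1})))   # [1, 5, 30]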

Vector-matrix multiplication is defined in terms of dot-products with the columns.

Definition 4.6.3 (Dot-Product Definition of Vector-Matrix Multiplication): If M


is an R × C matrix and u is a R-vector then u ∗ M is the C-vector v such that v[c] is the
dot-product of u with column c of M .

4.6.2 Example applications


Example 4.6.4: You are given a high-resolution image. You would like a lower-resolution
version to put on your web page so the page will load more quickly.

You therefore seek to downsample the image.

Each pixel of the low-res image (represented as a solid rectangle) corresponds to a little grid of
pixels of the high-res image (represented as dotted rectangles). The intensity value of a pixel of
the low-res image is the average of the intensity values of the corresponding pixels of the high-res
image.
Let’s represent the high-res image as a vector u. We saw in Quiz 2.9.3 that averaging can
be expressed as a dot-product. In downsampling, for each pixel of the low-res image to be
created, the intensity is computed as the average of a subset of the entries of u; this, too, can
be expressed as a dot-product. Computing the low-res image thus requires one dot-product for
each pixel of that image.
Employing the dot-product definition of matrix-vector multiplication, we can construct a
matrix M whose rows are the vectors that must be dotted with u. The column-labels of M are
the pixel coordinates of the high-res image. The row-labels of M are the pixel coordinates of the
low-res image. We write v = M ∗ u where v is a vector representing the low-res image.
Suppose the high-res image has dimensions 3000 × 2000 and our goal is to create a low-res
image with dimensions 750×500. The high-res image is represented by a vector u whose domain
is {0, 1, . . . , 2999} × {0, 1, . . . , 1999} and the low-res image is represented by a vector v whose
domain is {0, 1, . . . , 749} × {0, 1, . . . , 499}.
The matrix M has column-label set {0, 1, . . . , 2999} × {0, 1, . . . , 1999} and row-label set
{0, 1, . . . , 749} × {0, 1, . . . , 499}. For each low-res pixel coordinate pair (i, j), the corresponding
row of M is the vector that is all zeroes except for the 4 × 4 grid of high-res pixel coordinates

(4i, 4j), (4i, 4j + 1), (4i, 4j + 2), (4i, 4j + 3), (4i + 1, 4j), (4i + 1, 4j + 1), . . . , (4i + 3, 4j + 3)
where the values are 1/16.
Here is the Python code to construct the matrix M .
D_high = {(i,j) for i in range(3000) for j in range(2000)}
D_low = {(i,j) for i in range(750) for j in range(500)}
M = Mat((D_low, D_high),
        {((i,j), (4*i+m, 4*j+n)): 1./16 for m in range(4) for n in range(4)
                                        for i in range(750) for j in range(500)})
However, you would never actually want to create this matrix! I provide the code just for
illustration.

Example 4.6.5: You are given an image and a set of pixel-coordinate pairs forming regions in
the image, and you wish to produce a version of the image in which the regions are blurry.
Perhaps the regions are faces, and you want to blur them to protect the subjects' privacy. Once
again, the transformation can be formulated as matrix-vector multiplication M ∗ v. (Once again,
there is no reason you would actually want to construct the matrix explicitly, but the existence of
such a matrix is useful in quickly computing the transformation, as we will discuss in Chapter 10.)
This time, the input image and output image have the same dimensions. For each pixel that needs
to be blurred, the intensity is computed as an average of the intensities of many nearby pixels.
Once again, we use the fact that an average can be computed as a dot-product and that matrix-vector
multiplication can be interpreted as carrying out many dot-products, one for each row of the matrix.
Averaging treats all nearby pixels equally. This tends to produce undesirable visual artifacts
and is not a faithful analogue of the kind of blur we see with our eyes. A Gaussian blur more
heavily weights very nearby pixels; the weights go down (according to a specific formula) with
distance from the center.

Whether blurring is done using simple averaging or weighted averaging, the transformation is an
example of a linear filter, as mentioned in Section 2.9.3.
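For illustration only, here is a sketch of how such a matrix could be built using simple averaging over a (2k + 1) × (2k + 1) window for the pixels in a given region, with an identity row for every other pixel. The procedure name is hypothetical, and it assumes the Mat class from the module mat; as noted above, in practice you would not construct this matrix explicitly.

from mat import Mat

def blur_matrix(nrows, ncols, region, k=2):
    D = {(i, j) for i in range(nrows) for j in range(ncols)}
    f = {}
    for (i, j) in D:
        if (i, j) in region:
            nbrs = [(i+di, j+dj) for di in range(-k, k+1) for dj in range(-k, k+1)
                    if (i+di, j+dj) in D]
            for p in nbrs:
                f[((i, j), p)] = 1/len(nbrs)   # average of nearby intensities
        else:
            f[((i, j), (i, j))] = 1            # a pixel outside the region is unchanged
    return Mat((D, D), f)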

Example 4.6.6: As in Section 2.9.3, searching for an audio clip within an audio segment can
be formulated as finding many dot-products, one for each of the possible locations of the audio
clip or subimage. It is convenient to formulate finding these dot-products as a matrix-vector
product.
Suppose we are trying to find the sequence [0, 1, −1] in the longer sequence

[0, 0, −1, 2, 3, −1, 0, 1, −1, −1]



We need to compute one dot-product for each of the possible positions of the short sequence
within the long sequence. The long sequence has ten entries, so there are ten possible positions
for the short sequence, hence ten dot-products to compute.

You might think a couple of these positions are not allowed since these positions do not leave
enough room for matching all the entries of the short sequence. However, we adopt a wrap-around
convention: we look for the short sequence starting at the end of the long sequence, and wrapping
around to the beginning. It is exactly as if the long sequence were written on a circular strip
(the figure showing the circular strip is not reproduced here).
We formulate computing the ten dot-products as a product of a ten-row matrix with the ten-
element long sequence:
 
    [  0  1 −1  0  0  0  0  0  0  0 ]
    [  0  0  1 −1  0  0  0  0  0  0 ]
    [  0  0  0  1 −1  0  0  0  0  0 ]
    [  0  0  0  0  1 −1  0  0  0  0 ]
    [  0  0  0  0  0  1 −1  0  0  0 ]  ∗ [0, 0, −1, 2, 3, −1, 0, 1, −1, −1]
    [  0  0  0  0  0  0  1 −1  0  0 ]
    [  0  0  0  0  0  0  0  1 −1  0 ]
    [  0  0  0  0  0  0  0  0  1 −1 ]
    [ −1  0  0  0  0  0  0  0  0  1 ]
    [  1 −1  0  0  0  0  0  0  0  0 ]

The product is the vector [1, −3, −1, 4, −1, −1, 2, 0, −1, 0]. The second-biggest dot-product,
2, indeed occurs at the best-matching position, though the biggest dot-product, 4, occurs at a
not-so-great match.
Why adopt the wrap-around convention? It allows us to use a remarkable algorithm to
compute the matrix-vector product much more quickly than would seem possible. The Fast
Fourier Transform (FFT) algorithm, described in Chapter 10, makes use of the fact that the
matrix has a special form.
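Here is a minimal sketch (plain Python lists, no matrices) that computes the same ten wrap-around dot-products directly:

def circular_dot_products(short, long):
    # entry i is the dot-product of short with long starting at position i, wrapping around
    n = len(long)
    return [sum(short[k] * long[(i + k) % n] for k in range(len(short)))
            for i in range(n)]

print(circular_dot_products([0, 1, -1], [0, 0, -1, 2, 3, -1, 0, 1, -1, -1]))
# [1, -3, -1, 4, -1, -1, 2, 0, -1, 0]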

4.6.3 Formulating a system of linear equations as a matrix-vector equation
In Section 2.9.2, we defined a linear equation as an equation of the form a · x = β, and we defined
a system of linear equations as a collection of such equations:
a1 · x = β1
a2 · x = β2
..
.
am · x = βm
Using the dot-product definition of matrix-vector multiplication, we can rewrite this system of
equations as a single matrix-vector equation. Let A be the matrix whose rows are a1 , a2 , . . . , am .

Let b be the vector [β1 , β2 , . . . , βm ]. Then the system of linear equations is equivalent to the
matrix-vector equation A ∗ x = b.

Example 4.6.7: Recall that in Example 2.9.7 (Page 113) we studied current consumption of
hardware components in sensor nodes. Define D = {’radio’, ’sensor’, ’memory’, ’CPU’}.
Our goal was to compute a D-vector that, for each hardware component, gives the current drawn
by that component.
We have five test periods. For i = 0, 1, 2, 3, 4, there is a vector durationi giving the amount
of time each hardware component is on during test period i.

>>> D = {'radio', 'sensor', 'memory', 'CPU'}


>>> v0 = Vec(D, {'radio':.1, 'CPU':.3})
>>> v1 = Vec(D, {'sensor':.2, 'CPU':.4})
>>> v2 = Vec(D, {'memory':.3, 'CPU':.1})
>>> v3 = Vec(D, {'memory':.5, 'CPU':.4})
>>> v4 = Vec(D, {'radio':.2, 'CPU':.5})

We are trying to compute a D-vector rate such that


v0*rate = 140, v1*rate = 170, v2*rate = 60, v3*rate = 170, and v4*rate = 250
We can formulate this system of equations as a matrix-vector equation:
 
    [ v0 ]
    [ v1 ]
    [ v2 ]  ∗ x = [140, 170, 60, 170, 250]
    [ v3 ]
    [ v4 ]

where the matrix's rows are v0, . . . , v4 and the unknown x is the D-vector rate.

To carry out the computation in Python, we construct the vector

>>> b = Vec({0, 1, 2, 3, 4},{0: 140.0, 1: 170.0, 2: 60.0, 3: 170.0, 4: 250.0})


and construct a matrix A whose rows are v0, v1, v2, v3, and v4:
>>> A = rowdict2mat([v0,v1,v2,v3,v4])

Next we solve the matrix-vector equation A*x=b:


>>> rate = solve(A, b)

obtaining the vector


Vec(D, {’radio’:500, ’sensor’:250, ’memory’:100, ’CPU’:300})

Now that we recognize that systems of linear equations can be formulated as matrix-vector
equations, we can reformulate problems and questions involving linear equations as problems
involving matrix-vector equations:

• Solving a linear system (Computational Problem 2.9.12) becomes solving a matrix equation
(Computational Problem 4.5.13).

• The question how many solutions are there to a linear system over GF (2) (Question 2.9.18),
which came up in connection with attacking the authentication scheme (Section 2.9.7),
becomes the question how many solutions are there to a matrix-vector equation over GF (2).
• Computational Problem 2.9.19, computing all solutions to a linear system over GF (2),
becomes computing all solutions to a matrix-vector equation over GF (2).

4.6.4 Triangular systems and triangular matrices


In Section 2.11, we described an algorithm to solve a triangular system of linear equations. We
have just seen that a system of linear equations can be formulated as a matrix-vector equation.
Let’s see what happens when we start with a triangular system.

Example 4.6.8: Reformulating the triangular system of Example 2.11.1 (Page 130) as a
matrix-vector equation, we obtain
 
    [ 1  0.5  −2  4 ]
    [ 0   3    3  2 ]  ∗ x = [−8, 3, −4, 6]
    [ 0   0    1  5 ]
    [ 0   0    0  2 ]

Because we started with a triangular system, the resulting matrix has a special form: the first
entry of the second row is zero, the first and second entries of the third row are zero, and the
first and second and third entries of the fourth row are zero. Since the nonzero entries form a
triangle, the matrix itself is called a triangular matrix.

Definition 4.6.9: An n × n upper-triangular matrix A is a matrix with the property that


Aij = 0 for i > j.

Note that the entries forming the triangle can be zero or nonzero.
The definition applies to traditional matrices. To generalize to our matrices with arbitrary
row- and column-label sets, we specify orderings of the label-sets.

Definition 4.6.10: Let R and C be finite sets. Let LR be a list of the elements of R, and let
LC be a list of the elements of C. An R × C matrix A is triangular with respect to LR and LC
if
A[LR [i], LC [j]] = 0
for i > j.

Example 4.6.11: The {a, b, c} × {@, #, ?} matrix

        @  #  ?
    a [ 0  2  3 ]
    b [ 10 20 30 ]
    c [ 0  35 0 ]

is triangular with respect to [b, a, c] and [@, ?, #]. We can see this by reordering the rows
and columns according to the list orders:

        @  ?  #
    b [ 10 30 20 ]
    a [ 0  3  2 ]
    c [ 0  0  35 ]

To facilitate viewing a matrix with reordered rows and columns, the class Mat will provide a
pretty-printing method that takes two arguments, the lists LR and LC :
>>> A = Mat(({'a','b','c'}, {'#', '@', '?'}),
... {('a','#'):2, ('a','?'):3,
... ('b','@'):10, ('b','#'):20, ('b','?'):30,
... ('c','#'):35})
>>>
>>> print(A)

# ? @
----------
a | 2 3 0
b | 20 30 10
c | 35 0 0

>>> A.pp(['b','a','c'], ['@','?','#'])

@ ? #
----------
b | 10 30 20
a | 0 3 2
c | 0 0 35
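Here is a small sketch of a checker for Definition 4.6.10; it assumes the full Mat class, so that M[r,c] works even for entries omitted from the sparse representation, and the procedure name is just illustrative.

def is_triangular(M, LR, LC):
    # True if every entry below the diagonal (with respect to the orderings) is zero
    return all(M[LR[i], LC[j]] == 0
               for i in range(len(LR)) for j in range(len(LC)) if i > j)

# For the matrix A of Example 4.6.11:
#   is_triangular(A, ['b','a','c'], ['@','?','#'])   should evaluate to True
#   is_triangular(A, ['a','b','c'], ['@','#','?'])   should evaluate to False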

Problem 4.6.12: (For the student with knowledge of graph algorithms) Design an algorithm
that, for a given matrix, finds a list of row-labels and a list of column-labels with respect to
which the matrix is triangular (or report that no such lists exist).

4.6.5 Algebraic properties of matrix-vector multiplication


We use the dot-product interpretation of matrix-vector multiplication to derive two crucial prop-
erties. We will use the first property in the next section, in characterizing the solutions to a

matrix-vector equation and in error-correcting codes.

Proposition 4.6.13: Let M be an R × C matrix.


• For any C-vector v and any scalar α,

M ∗ (α v) = α (M ∗ v) (4.3)

• For any C-vectors u and v,

M ∗ (u + v) = M ∗ u + M ∗ v (4.4)

Proof

To show Equation 4.3 holds, we need only show that, for each r ∈ R, entry r of the left-hand
side equals entry r of the right-hand side. By the dot-product interpretation of matrix-vector
multiplication,

• entry r of the left-hand side equals the dot-product of row r of M with αv, and

• entry r of the right-hand side equals α times the dot-product of row r of M with v.
These two quantities are equal by the homogeneity of dot-product, Proposition 2.9.22.
The proof of Equation 4.4 is similar; we leave it as an exercise. □

Problem 4.6.14: Prove Equation 4.4.

4.7 Null space


4.7.1 Homogeneous linear systems and matrix equations
In Section 3.6, we introduced homogeneous linear systems, i.e. systems of linear equations in
which all right-hand side values were zero. Such a system can of course be formulated as a
matrix-vector equation A ∗ x = 0 where the right-hand side is a zero vector.

Definition 4.7.1: The null space of a matrix A is the set {v : A ∗ v = 0}. It is written
Null A.

Since Null A is the solution set of a homogeneous linear system, it is a vector space (Sec-
tion 3.4.1).

 
Example 4.7.2: Let

    A = [ 1 4 5 ]
        [ 2 5 7 ]
        [ 3 6 9 ]

Since the sum of the first two columns equals the
third column, A ∗ [1, 1, −1] is the zero vector. Thus [1, 1, −1] is in Null A. By Equation 4.3,
for any scalar α, A ∗ (α [1, 1, −1]) is also the zero vector, so α [1, 1, −1] is also in Null A. For
example, [2, 2, −2] is in Null A.
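A quick check in code (assuming the full Mat and Vec classes, so that the * operator and scalar multiplication work, and listlist2mat from matutil):

from vec import Vec
from matutil import listlist2mat

A = listlist2mat([[1, 4, 5], [2, 5, 7], [3, 6, 9]])
v = Vec({0, 1, 2}, {0: 1, 1: 1, 2: -1})
zero = Vec({0, 1, 2}, {})
print(A * v == zero)          # True: [1, 1, -1] is in Null A
print(A * (2 * v) == zero)    # True: [2, 2, -2] is in Null A as well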

Problem 4.7.3: For each of the given matrices, find a nonzero vector in the null space of the
matrix.
1. [ 1 0 1 ]

2. [ 2 0 0 ]
   [ 0 1 1 ]

3. [ 1 0 0 ]
   [ 0 0 0 ]
   [ 0 0 1 ]

Here we make use of Equation 4.4:

Lemma 4.7.4: For any R × C matrix A and C-vector v, a vector z is in the null space of A
if and only if A ∗ (v + z) = A ∗ v.

Proof

The statement is equivalent to the following statements:

1. if the vector z is in the null space of A then A ∗ (v + z) = A ∗ v;

2. if A ∗ (v + z) = A ∗ v then z is in the null space of A.


For simplicity, we prove these two statements separately.
1. Suppose z is in the null space of A. Then

A ∗ (v + z) = A ∗ v + A ∗ z = A ∗ v + 0 = A ∗ v

2. Suppose A ∗ (v + z) = A ∗ v. Then

      A ∗ (v + z) = A ∗ v
    A ∗ v + A ∗ z = A ∗ v
            A ∗ z = 0                                      □

4.7.2 The solution space of a matrix-vector equation


In Lemma 3.6.1 (Section 3.6.1), we saw that two solutions to a system of linear equations differ
by a vector that solves the corresponding system of homogeneous equations. We restate and
reprove the result in terms of matrix-vector equations:

Corollary 4.7.5: Suppose u1 is a solution to the matrix equation A ∗ x = b. Then u2 is also


a solution if and only if u1 − u2 belongs to the null space of A.

Proof

Since A ∗ u1 = b, we know that

A ∗ u2 = b if and only if A ∗ u2 = A ∗ u1 .
Applying Lemma 4.7.4 with v = u2 and z = u1 − u2 , we infer:

A ∗ u2 = A ∗ u1 if and only if u1 − u2 is in the null space of A.


Combining these two statements proves the corollary. □

While studying a method for calculating the rate of power consumption for hardware components
(Section 2.9.2), we asked about uniqueness of a solution to a system of linear equations. We saw
in Corollary 3.6.4 that uniqueness depended on whether the corresponding homogeneous system
has only the trivial solution. Here is the same corollary, stated in matrix terminology:

Corollary 4.7.6: Suppose a matrix-vector equation A ∗ x = b has a solution. The solution is


unique if and only if the null space of A consists solely of the zero vector.

Thus uniqueness of a solution comes down to the following question:

Question 4.7.7: How can we tell if the null space of a matrix consists solely of the zero vector?

This is just a restatement, using matrix terminology, of Question 3.6.5: How can we tell if a
homogeneous linear system has only the trivial solution?
While studying an attack on an authentication scheme in Section 2.9.7, we became interested
in counting the solutions to a system of linear equations over GF (2) (Question 2.9.18). In
Section 3.6.1 we saw that this was equivalent to counting the solutions to a homogeneous system
(Question 3.6.6). Here we restate this problem in terms of matrices:

Question 4.7.8: How can we find the cardinality of the null space of a matrix over GF (2)?

4.7.3 Introduction to error-correcting codes


Richard Hamming was getting, he later recalled, “very an-
noyed.” He worked for Bell Laboratories in New Jersey but
needed to use a computer located in New York. This was a
very early computer, built using electromechanical relays, and
it was somewhat unreliable. However, the computer could de-
tect when an error occured, and when it did, it would restart
the current computation. After three tries, however, it would
go on to the next computation.
Hamming was, in his words, “low man on the totem pole”,
so he didn’t get much use of the computer during the work
week. However, nobody else was using it during the weekend.
Hamming was allowed to submit a bunch of computations on
Friday afternoon; the computer would run them during the
weekend, and Hamming would be able to collect the results.
However, he came in one Monday to collect his results, and
found that something went wrong, and all the computations
failed. He tried again the following weekend—the same thing happened. Peeved, he asked himself:
if the computer can detect that its input has an error, why can’t it tell me where the error is?
Hamming had long known one solution to this problem: replication. If you are worried about
occasional bit errors, write your bit string three times: for each bit position, if the three bit
strings differ in that position, choose the bit that occurs twice. However, this solution uses more
bits than necessary.
As a result of this experience, Hamming invented error-correcting codes. The first code he
invented is now called the Hamming code and is still used, e.g. in flash memory. He and other
researchers subsequently discovered many other error-correcting codes. Error-correcting codes
are ubiquitous today; they are used in many kinds of transmission (including WiFi, cell phones,
communication with satellites and spacecraft, and digital television) and storage (RAM, disk
drives, flash memory, CDs, and DVDs).
The Hamming code is what we now call a linear binary block code:
• linear because it is based on linear algebra,

• binary because the input and output are assumed to be in binary, and

• block because the code involves a fixed-length sequence of bits.


The transmission or storage of data is modeled by a noisy channel, a tube through which you
can push vectors but which sometimes flips bits. A block of bits is represented by a vector over
GF (2). A binary block code defines a function f : GF (2)^m −→ GF (2)^n . (In the Hamming code,
m is 4 and n is 7.)
(Diagram: the 4-bit word 0101 is encoded as the 7-bit codeword 1101101 and sent over the noisy
channel; the received vector 1111101 is then decoded back to 0101.)

When you have a block of m bits you want to be reliably received at the other end, you first use
f to transform it to an n-vector, which you then push through the noisy channel. At the other
end of the noisy channel, the recipient gets an n-vector that might differ from the original in
some bit positions; the recipient must somehow figure out which bits were changed as the vector
passed through the noisy channel.
We denote by C the set of encodings, the image of f —the set of n-vectors that can be injected
into the noisy channel. The vectors of C are called codewords.

4.7.4 Linear codes


Let c denote the codeword injected into the noisy channel, and let c̃ denote the vector (not
necessarily a codeword) that comes out the other end. Ordinarily, c̃ differs from c only in a small
number of bit positions, the positions in which the noisy channel introduced errors. We write
c̃ = c + e
where e is the vector with 1’s in the error positions. We refer to e as the error vector.
The recipient gets c̃ and needs to figure out e in order to figure out c. How?
In a linear code, the set C of codewords is the null space of a matrix H. This simplifies the
job of the recipient. Using Equation 4.4, we see
H ∗ c̃ = H ∗ (c + e) = H ∗ c + H ∗ e = 0 + H ∗ e = H ∗ e
because c is in the null space of H.
Thus the recipient knows something useful about e: she knows H ∗ e (because it is the same
as H ∗ c̃, which she can compute). The vector H ∗ e is called the error syndrome. If the error
syndrome is the zero vector then the recipient assumes that e is all zeroes, i.e. that no error
has been introduced. If the error syndrome is a nonzero vector then the recipient knows that
an error has occurred, i.e. that e is not all zeroes. The recipient needs to figure out e from the
vector H ∗ e. The method for doing this depends on the particular code being used.

4.7.5 The Hamming Code


In the Hamming code, the codewords are 7-vectors, and
 
        H = [ 0 0 0 1 1 1 1 ]
            [ 0 1 1 0 0 1 1 ]
            [ 1 0 1 0 1 0 1 ]
Notice anything special about the columns and their order?
Now suppose that the noisy channel introduces at most one bit error. Then e has only one 1.
Can you determine the position of the bit error from the matrix-vector product H ∗ e?

Example 4.7.9: Suppose e has a 1 in its third position, e = [0, 0, 1, 0, 0, 0, 0]. Then H ∗ e is
the third column of H, which is [0, 1, 1].

As long as e has at most one bit error, the position of the bit can be determined from H ∗ e.
This shows that the Hamming code allows the recipient to correct one-bit errors.

Quiz 4.7.10: Suppose H ∗ e is [1, 1, 0]. What is e?

Answer

[0, 0, 0, 0, 0, 1, 0].

Quiz 4.7.11: Show that the Hamming code does not allow the recipient to correct two-bit
errors: give two different error vectors, e1 and e2 , each with at most two 1’s, such that H ∗ e1 =
H ∗ e2 .

Answer

There are many acceptable answers, e.g. e1 = [1, 1, 0, 0, 0, 0, 0] and e2 = [0, 0, 1, 0, 0, 0, 0] or


e1 = [0, 0, 1, 0, 0, 1, 0] and e2 = [0, 1, 0, 0, 0, 0, 1].

Next we show that the Hamming code allows detection of errors as long as the number of
errors is no more than two. Remember that the recipient assumes that no error has occurred if
H ∗ e is the zero vector. Is there a way to set exactly two 1’s in e so as to achieve H ∗ e = 0?
When e has two 1’s, H ∗ e is the sum of the two corresponding columns of H. If the sum of
two columns is 0 then (by GF (2) arithmetic) the two columns must be equal. Since the columns
of H are all distinct, no two of them sum to zero, so a two-bit error always produces a nonzero
syndrome and is therefore detected.

Example 4.7.12: Suppose e = [0, 0, 1, 0, 0, 0, 1]. Then H ∗ e = [0, 1, 1] + [1, 1, 1] = [1, 0, 0], which is column 4 of H.

Note, however, that a two-bit error can get misinterpreted as a one-bit error. In the example, if the
recipient assumes at most one error, she will conclude that the error vector is e = [0, 0, 0, 1, 0, 0, 0].
In Lab 4.14, we will implement the Hamming code and try it out.
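To make the syndrome-decoding idea concrete, here is a minimal sketch of one-bit error correction using plain Python lists over GF (2). It is only an illustration of the idea, not the Vec/Mat-based implementation you will write in Lab 4.14; the example codeword [1,1,1,0,0,0,0] is simply one vector in the null space of H, chosen for this sketch.

H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]

def syndrome(c_tilde):
    # compute H * c_tilde over GF(2)
    return [sum(H[i][j] * c_tilde[j] for j in range(7)) % 2 for i in range(3)]

def correct(c_tilde):
    # assume at most one bit error; flip the bit whose column of H equals the syndrome
    s = syndrome(c_tilde)
    if s == [0, 0, 0]:
        return c_tilde
    for j in range(7):
        if [H[i][j] for i in range(3)] == s:
            return [bit ^ (1 if k == j else 0) for k, bit in enumerate(c_tilde)]

c = [1, 1, 1, 0, 0, 0, 0]            # a codeword: H * c is the zero vector
received = [1, 1, 1, 0, 1, 0, 0]     # the noisy channel flipped bit 5
print(correct(received))             # recovers [1, 1, 1, 0, 0, 0, 0]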

4.8 Computing sparse matrix-vector product


For computing products of matrices with vectors, we could use the linear-combinations or dot-
products definitions but they are not very convenient for exploiting sparsity.
By combining the definition of dot-product with the dot-product definition of matrix-vector
multiplication, we obtain the following equivalent definition.

Definition 4.8.1 (Ordinary Definition of Matrix-Vector Multiplication): If M is an
R × C matrix and u is a C-vector then M ∗ u is the R-vector v such that, for each r ∈ R,

        v[r] = Σ_{c ∈ C} M[r, c] u[c]                  (4.5)

The most straightforward way to implement matrix-vector multiplication based on this defi-
nition is:
1   for each i in R:
2       v[i] := Σ_{j ∈ C} M[i, j] u[j]
However, this doesn’t take advantage of the fact that many entries of M are zero and do not
even appear in our sparse representation of M . We could try implementing the sum in Line 2
in a clever way, omitting those terms corresponding to entries of M that do not appear in our
sparse representation. However, our representation does not support doing this efficiently. The
more general idea is sound, however: iterate over the entries of M that are actually represented.
The trick is to initialize the output vector v to the zero vector, and then iterate over the
nonzero entries of M , adding terms as specified by Equation 4.5.

1   initialize v to the zero vector
2   for each pair (i, j) such that the sparse representation specifies M[i, j]:
3       v[i] = v[i] + M[i, j] u[j]

A similar algorithm can be used to compute a vector-matrix product.
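Here is one way the algorithm above might look in Python, assuming the sparse representation is a dictionary mapping (row-label, column-label) pairs to the nonzero entries and the vector is a dictionary mapping column labels to values; this is a sketch of the idea, not the Mat class's own implementation.

def sparse_mat_vec(R, M_entries, u):
    v = {r: 0 for r in R}                 # initialize v to the zero vector
    for (i, j), M_ij in M_entries.items():
        v[i] += M_ij * u.get(j, 0)        # add M[i,j] * u[j], as in Equation 4.5
    return v

# example usage
R = {'a', 'b'}
M_entries = {('a', '#'): 1, ('a', '?'): 3, ('b', '@'): 20}
u = {'#': 2, '@': 1, '?': -1}
v = sparse_mat_vec(R, M_entries, u)
print(v['a'], v['b'])                     # -1 20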

Remark 4.8.2: This algorithm makes no effort to exploit sparsity in the vector. When doing
matrix-vector or vector-matrix multiplication, it is not generally worthwhile to try to exploit
sparsity in the vector.

Remark 4.8.3: There could be zeroes in the output vector but such zeroes are considered
“accidental” and are so rare that it is not worth trying to notice their occurrence.

4.9 The matrix meets the function


4.9.1 From matrix to function
For every matrix M , we can use matrix-vector multiplication to define a function x ↦ M ∗ x.
The study of the matrix M is in part the study of this function, and vice versa. It is convenient
to have a name by which we can refer to this function. There is no traditional name for this
function; just in this section, we will refer to it by fM . Formally, we define it as follows: if M is
an R × C matrix over a field F then the function fM : FC −→ FR is defined by fM (x) = M ∗ x.
This is not a traditional definition in linear algebra; I introduce it here for pedagogical pur-
poses.

Example 4.9.1: Let M be the matrix

             #   @   ?
        a    1   2   3
        b   10  20  30

Then the domain of the function fM is R^{#,@,?} and the co-domain is R^{a,b}. The image, for example, of the vector

             #   @   ?
             2   2  -2

is the vector

             a   b
             0   0
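As a quick illustration (a sketch using plain dictionaries rather than the book's Mat and Vec classes), the function fM for the matrix above can be computed directly from the matrix's entries:

M = {('a', '#'): 1,  ('a', '@'): 2,  ('a', '?'): 3,
     ('b', '#'): 10, ('b', '@'): 20, ('b', '?'): 30}

def f_M(x):
    # matrix-vector multiplication by iterating over the entries of M
    v = {'a': 0, 'b': 0}
    for (r, c), value in M.items():
        v[r] += value * x[c]
    return v

print(f_M({'#': 2, '@': 2, '?': -2}))   # {'a': 0, 'b': 0}, as in the example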

Problem 4.9.2: Recall that M^T is the transpose of M. The function corresponding to M^T is f_{M^T}.

1. What is the domain of f_{M^T}?

2. What is the co-domain?

3. Give a vector in the domain of f_{M^T} whose image is the all-zeroes vector.

4.9.2 From function to matrix


Suppose we have a function fM : FA −→ FB corresponding to some matrix M but we don’t
happen to know the matrix M . We want to compute the matrix M such that fM (x) = M ∗ x.
Let’s first figure out the column-label set for M . Since the domain of fM is FA , we know that
x is an A-vector. For the product M ∗ x to even be legal, we need the column-label set of M to
be A.
Since the co-domain of fM is FB , we know that the result of multiplying M by x must be a
B-vector. In order for that to be the case, we need the row-label set of M to be B.
So far, so good. We know M must be a B × A matrix. But what should its entries be? To
find them, we use the linear-combinations definition of matrix-vector product.
Remember the standard generators for FA : for each element a ∈ A, there is a generator ea
that maps a to one and maps every other element of A to zero. By the linear-combinations
definition, M ∗ ea is column a of M . This shows that column a of M must equal fM (ea ).
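This observation translates directly into a procedure for recovering M from f. The sketch below assumes f takes and returns dictionaries keyed by A and B respectively; the helper name function_to_matrix is ours, not part of the book's support code.

def function_to_matrix(A, f):
    M = {}
    for a in A:
        e_a = {x: (1 if x == a else 0) for x in A}   # standard generator e_a
        column = f(e_a)                              # column a of M is f(e_a)
        for b, value in column.items():
            M[(b, a)] = value
    return M

# example: the function that scales the 'x'-coordinate by 2
s = lambda v: {'x': 2 * v['x'], 'y': v['y']}
print(function_to_matrix({'x', 'y'}, s))
# entries: ('x','x'): 2, ('x','y'): 0, ('y','x'): 0, ('y','y'): 1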

4.9.3 Examples of deriving the matrix


In this section, we give some examples illustrating how one derives the matrix from a function,
assuming that there is some matrix M such that the function is x )→ M ∗ x. Warning: In at
least one of these examples, that assumption is not true.

Example 4.9.3: Let s(·) be the function from R2 to R2 that scales the x-coordinate by 2.

Assume that s([x, y]) = M ∗ [x, y] for some matrix M. The image of [1, 0] is [2, 0] and the image of [0, 1] is [0, 1], so

        M = [ 2 0 ]
            [ 0 1 ]

Example 4.9.4: Let r90 (·) be the function from R2 to R2 that rotates points in 2D by ninety
degrees counterclockwise around the origin.

Let’s assume for now that r90 ([x, y]) = M ∗ [x, y] for some matrix M . To find M , we find the
image under this function of the two standard generators [1, 0] and [0, 1].
Rotating the point [1, 0] by ninety degrees about the origin yields [0, 1], so this must be the
first column of M .
Rotating the point [0, 1] by ninety degrees yields [−1, 0], so this must be the second column of M. Therefore

        M = [ 0  −1 ]
            [ 1   0 ]

Example 4.9.5: For an angle θ, let rθ (·) be the function from R2 to R2 that rotates points
around the origin counterclockwise by θ. Assume rθ ([x, y]) = M ∗ [x, y] for some matrix M .
Rotating the point [1, 0] by θ gives us the point [cos θ, sin θ], which must therefore be the
first column of M .

(Figure: rotating the point (1, 0) counterclockwise by θ yields the point (cos θ, sin θ), so rθ([1, 0]) = [cos θ, sin θ].)
Rotating the point [0, 1] by θ gives us the point [− sin θ, cos θ], so this must be the second
column of M .
(Figure: rotating the point (0, 1) counterclockwise by θ yields the point (−sin θ, cos θ), so rθ([0, 1]) = [−sin θ, cos θ].)

Thus

        M = [ cos θ  −sin θ ]
            [ sin θ   cos θ ]

For example, for rotating by thirty degrees, the matrix is

        [ √3/2   −1/2 ]
        [  1/2   √3/2 ]

Finally, we have caught up with complex numbers, for which rotation by a given angle is simply multiplication (Section 1.4.10).
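For readers who want to experiment, here is a small sketch (using plain Python lists rather than the Mat class) that builds the rotation matrix for an angle θ and applies it to a point:

from math import cos, sin, pi

def rotation_matrix(theta):
    return [[cos(theta), -sin(theta)],
            [sin(theta),  cos(theta)]]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

M30 = rotation_matrix(pi / 6)                     # rotation by thirty degrees
print(mat_vec(M30, [1, 0]))                       # approximately [0.866, 0.5]
print(mat_vec(rotation_matrix(pi / 2), [1, 0]))   # approximately [0, 1], i.e. r90([1, 0])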

(Comic: “Matrix Transform”, http://xkcd.com/824)

Example 4.9.6: Let t(·) be the function from R2 to R2 that translates a point one unit to the
right and two units up.

Assume that t([x, y]) = M ∗ [x, y] for some matrix M. The image of [1, 0] is [2, 2] and the image of [0, 1] is [1, 3], so

        M = [ 2 1 ]
            [ 2 3 ]

4.10 Linear functions


In each of the examples, we assumed that the function could be expressed in terms of matrix-
vector multiplication, but this assumption turns out not to be valid in all these examples. How
can we tell whether a function can be so expressed?

4.10.1 Which functions can be expressed as a matrix-vector product


In Section 3.4, we identified three properties, Property V1, Property V2, and Property V3, that
hold of

• the span of some vectors, and



• the solution set of a homogeneous linear system.


We called any set of vectors satisfying Properties V1, V2, and V3 a vector space.
Here we take a similar approach. In Section 4.6.5, we proved two algebraic properties of
matrix-vector multiplication. We now use those algebraic properties to define a special kind of
function, linear functions.

4.10.2 Definition and simple examples


Definition 4.10.1: Let U and V be vector spaces over a field F. A function f : U −→ V is
called a linear function if it satisfies the following two properties:

Property L1: For any vector u in the domain of f and any scalar α in F,

f (α u) = α f (u)

Property L2: For any two vectors u and v in the domain of f ,

f (u + v) = f (u) + f (v)

(A synonym for linear function is linear transformation.)


Let M be an R × C matrix over a field F, and define

f : FC −→ FR

by f (x) = M ∗ x. The domain and co-domain are vector spaces. By Proposition 4.6.13, the
function f satisfies Properties L1 and L2. Thus f is a linear function. We have proved:

Proposition 4.10.2: For any matrix M , the function x ↦ M ∗ x is a linear function.

Here is a special case.

Lemma 4.10.3: For any C-vector a over F, the function f : FC −→ F defined by f (x) = a · x
is a linear function.

Proof

Let A be the {0} × C matrix whose only row is a. Then f (x) = A ∗ x, so the lemma follows
from Proposition 4.10.2. □

Bilinearity of dot-product Lemma 4.10.3 states that, for any vector w, the function x ↦
w · x is a linear function of x. Thus the dot-product function f (x, y) = x · y is linear in its
first argument (i.e. if we plug in a vector for the second argument). By the symmetry of the
dot-product (Proposition 2.9.21), the dot-product function is also linear in its second argument.
We say that the dot-product function is bilinear to mean that it is linear in each of its arguments.

Example 4.10.4: Let F be any field. The function from F2 to F defined by (x, y) ↦ x + y is
a linear function. You can prove this using bilinearity of dot-product.

Quiz 4.10.5: Show that the function with domain R2 defined by [x, y] ↦ xy is not a linear
function by giving inputs for which the function violates either Property L1 or Property L2.

Answer

f ([1, 1] + [1, 1]) = f ([2, 2]) = 4, but

f ([1, 1]) + f ([1, 1]) = 1 + 1 = 2,

so Property L2 is violated.

Quiz 4.10.6: Show that rotation by ninety degrees, r90 (·), is a linear function.

Answer

The scalar-multiplication property, Property L1, is proved as follows:

α f ([x, y]) = α [−y, x]


= [−αy, αx]
= f ([αx, αy])
= f (α [x, y])

The vector-addition property, Property L2, is proved similarly:

f ([x1 , y1 ]) + f ([x2 , y2 ]) = [−y1 , x1 ] + [−y2 , x2 ]


= [−(y1 + y2 ), x1 + x2 ]
= f ([x1 + x2 , y1 + y2 ])

Exercise 4.10.7: Define g : R2 −→ R3 by g([x, y]) = [x, y, 1]. Is g a linear function? If so,
prove it. If not, give a counterexample.

Exercise 4.10.8: Define h : R2 −→ R2 to be the function that takes a point [x, y] to its
reflection about the y-axis. Give an explicit (i.e. algebraic) definition of h. Is it a linear
function? Explain your answer.

Problem 4.10.9: In at least one of the examples in Section 4.9.3, the function cannot be
written as f (x) = M ∗ x. Which one? Demonstrate using a numerical example that the
function does not satisfy the Properties L1 and L2 that define linear functions.

4.10.3 Linear functions and zero vectors


Lemma 4.10.10: If f : U −→ V is a linear function then f maps the zero vector of U to the
zero vector of V.

Proof

Let 0 denote the zero vector of U, and let 0V denote the zero vector of V.

f (0) = f (0 + 0) = f (0) + f (0)

Subtracting f (0) from both sides, we obtain

0V = f (0)

Definition 4.10.11: Analogous to the null space of a matrix (Definition 4.7.1), we define the
kernel of a linear function f to be {v : f (v) = 0}. We denote the kernel of f by Ker f .

Lemma 4.10.12: The kernel of a linear function is a vector space.

Problem 4.10.13: Prove Lemma 4.10.12 by showing that Ker f satisfies Properties V1, V2,
and V3 of vector spaces (Section 3.4).

4.10.4 What do linear functions have to do with lines?


Suppose f : U −→ V is a linear function. Let u1 and u2 be two vectors in U, and consider a
linear combination α1 u1 + α2 u2 and its image under f .

f (α1 u1 + α2 u2 ) = f (α1 u1 ) + f (α2 u2 )     by Property L2
                   = α1 f (u1 ) + α2 f (u2 )     by Property L1

We interpret this as follows: the image of a linear combination of u1 and u2 is the corresponding
linear combination of f (u1 ) and f (u2 ).
What are the geometric implications?
Let’s focus on the case where the domain U is Rn . The line through the points u1 and u2 is
the affine hull of u1 and u2 , i.e. the set of all affine combinations:

{α1 u1 + α2 u2 : α1 , α2 ∈ R, α1 + α2 = 1}

What is the set of images under f of all these affine combinations? It is

{f (α1 u1 + α2 u2 ) : α1 , α2 ∈ R, α1 + α2 = 1}

which is equal to
{α1 f (u1 ) + α2 f (u2 ) : α1 , α2 ∈ R, α1 + α2 = 1}
which is the set of all affine combinations of f (u1 ) and f (u2 ).
This shows:

The image under f of the line through u1 and u2 is the “line” through f (u1 ) and f (u2 ).

The reason for the scare-quotes is that f might map u1 and u2 to the same point! The set of
affine combinations of two identical points is the set consisting just of that one point.
The argument we have given about the image of a linear combination can of course be extended
to handle a linear combination of more than two vectors.

Proposition 4.10.14: For a linear function f , for any vectors u1 , . . . , un in the domain of f
and any scalars α1 , . . . , αn ,

f (α1 u1 + · · · + αn un ) = α1 f (u1 ) + · · · + αn f (un )

Therefore the image under a linear function of any flat is another flat.

4.10.5 Linear functions that are one-to-one


Using the notion of kernel, we can give a nice criterion for whether a linear function is one-to-one.

Lemma 4.10.15 (One-to-One Lemma): A linear function is one-to-one if and only if its
kernel is a trivial vector space.

Proof

Let f : V −→ W be a linear function. We prove two directions.


Suppose Ker f contains some nonzero vector v, so f (v) = 0W . By Lemma 4.10.10,
f (0) = 0W as well, so f is not one-to-one.
Suppose Ker f = {0}. Let v1 , v2 be any vectors such that f (v1 ) = f (v2 ). Then
f (v1 ) − f (v2 ) = 0W so, by linearity, f (v1 − v2 ) = 0W , so v1 − v2 ∈ Ker f . Since Ker f
consists solely of 0, it follows that v1 − v2 = 0, so v1 = v2 . □

This simple lemma gives us a fresh perspective on the question of uniqueness of solution to
a linear system. Consider the function f (x) = A ∗ x. Solving a linear system A ∗ x = b can
be interpreted as finding a pre-image of b under f . If a pre-image exists, it is guaranteed to be
unique if f is one-to-one.

4.10.6 Linear functions that are onto?


The One-to-One Lemma gives us a nice criterion for determining whether a linear function is
one-to-one. What about onto?
Recall that the image of a function f with domain V is the set {f (v) : v ∈ V}. Recall that
a function f being onto means that the image of the function equals the co-domain.

Question 4.10.16: How can we tell if a linear function is onto?

When f : V −→ W is a linear function, we denote the image of f by Im f . Thus asking


whether f is onto is asking whether Im f = W.

Example 4.10.17: (Solvability of Lights Out) Can 3 × 3 Lights Out be solved from any initial
configuration? (Question 2.8.5).
As we saw in Example 4.5.5 (Page 195), we can use a matrix to address Lights Out. We
construct a matrix M whose columns are the button vectors:
 
        M = [ button vector (0,0) | button vector (0,1) | · · · | button vector (2,2) ]

The set of solvable initial configurations (those from which it is possible to turn out all lights) is
the set of all linear combinations of these columns, the column space of the matrix. We saw in
Example 3.2.14 (Page 154) that, in the case of 2 × 2 Lights Out, every initial configuration is
solvable. What about 3 × 3 Lights Out?
Let D = {(0, 0), . . . , (2, 2)}. Let f : GF (2)D −→ GF (2)D be defined by f (x) = M ∗ x.
The set of solvable initial configurations is the image of f . The set of all initial configurations is
the co-domain of f . Therefore, the question of whether every position is solvable is exactly the
question of whether f is onto.

We can make one step towards answering Question 4.10.16.

Lemma 4.10.18: The image of a linear function is a subspace of the function’s co-domain.

Proof

Let f : V −→ W be a linear function. Clearly Im f is a subset of W. To show that Im f is


a subspace of W, we must show that Im f satisfies Properties V1, V2, and V3 of a vector
space.

• V1: We saw in Lemma 4.10.10 that f maps the zero vector of V to the zero vector of
W, so the zero vector of W belongs to Im f .

• V2: Let w be a vector in Im f . By definition of Im f , there must be a vector v in V


such that f (v) = w. For any scalar α,

α w = α f (v) = f (α v)

so α w is in Im f .

• V3: Let w1 and w2 be vectors in Im f . By definition of Im f , there must be vectors


v1 and v2 in V such that f (v1 ) = w1 and f (v2 ) = w2 . By Property L2 of linear
functions, w1 + w2 = f (v1 ) + f (v2 ) = f (v1 + v2 ), so w1 + w2 is in Im f .

The complete answer to Question 4.10.16 must wait until Chapter 6.

4.10.7 A linear function from FC to FR can be represented by a matrix


Suppose f : FC −→ FR is a linear function. We can use the method of Section 4.9.2 to obtain a
matrix M : for each c ∈ C, column c of M is the image under f of the standard generator ec .
How do we know the resulting matrix M satisfies f (x) = M ∗ x? Linearity! For any vector
x ∈ FC , for each c ∈ C, let αc be the value of entry c of x. Then x = Σ_{c ∈ C} αc ec . Because f is
linear, f (x) = Σ_{c ∈ C} αc f (ec ).
On the other hand, by the linear-combinations definition of matrix-vector multiplication,
M ∗ x is the linear combination of M ’s columns where the coefficients are the scalars αc (for
c ∈ C). We defined M to be the matrix whose column c is f (ec ) for each c ∈ C, so M ∗ x also
equals Σ_{c ∈ C} αc f (ec ). This shows that f (x) = M ∗ x for every vector x ∈ FC .
We summarize this result in a lemma.

Lemma 4.10.19: If f : FC −→ FR is a linear function then there is an R × C matrix M over


F such that f (x) = M ∗ x for every vector x ∈ FC .

4.10.8 Diagonal matrices


Let d1 , . . . , dn be real numbers. Let f : Rn −→ Rn be the function such that f ([x1 , . . . , xn ]) =
[d1 x1 , . . . , dn xn ]. The matrix corresponding to this function is
 
        [ d1           ]
        [    d2        ]
        [       ...    ]
        [           dn ]
Such a matrix is called a diagonal matrix because the only entries allowed to be nonzero form a
diagonal.

Definition 4.10.20: For a domain D, a D × D matrix M is a diagonal matrix if M [r, c] = 0


for every pair r, c ∈ D such that r ≠ c.

Diagonal matrices are very important in Chapters 11 and 12.

Quiz 4.10.21: Write a procedure diag(D, entries) with the following spec:

• input: a set D and a dictionary entries mapping D to elements of a field


• output: a diagonal matrix such that entry (d,d) is entries[d]

Answer

def diag(D, entries):
    return Mat((D,D), {(d,d):entries[d] for d in D})
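For example (assuming the Mat class from the book's support code, as in the answer above):

D = {'a', 'b', 'c'}
A = diag(D, {'a': 1, 'b': 2, 'c': 3})   # the D x D matrix with 1, 2, 3 on the diagonal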

A particularly simple and useful diagonal matrix is the identity matrix, defined in Sec-
tion 4.1.5. For example, here is the {a, b, c} × {a, b, c} identity matrix:
a b c
-------
a | 1 0 0
b | 0 1 0
c | 0 0 1
Recall that we refer to it as 1D or just 1.
Why is it called the identity matrix? Consider the function f : FD −→ FD defined by
f (x) = 1 ∗ x. Since 1 ∗ x = x, the function f is the identity function on FD .

4.11 Matrix-matrix multiplication


We can also multiply a pair of matrices. Suppose A is an R × S matrix and B is an S × T matrix.
Then it is legal to multiply A times B, and the result is an R × T matrix. The traditional way of
writing “A times B” is simply AB, with no operator in between the matrices.

(In our Mat class implementing matrices, however, we will use the * operator to signify
matrix-matrix multiplication.)
Note that the product AB is different from the product BA, and in fact one product might
be legal while the other is illegal. Matrix multiplication is not commutative.

4.11.1 Matrix-matrix multiplication in terms of matrix-vector and vector-


matrix multiplication
We give two equivalent definitions of matrix-matrix multiplication, one in terms of matrix-vector
multiplication and one in terms of vector-matrix multiplication.

Definition 4.11.1 (Vector-matrix definition of matrix-matrix multiplication): For


each row-label r of A,
row r of AB = (row r of A) ∗ B (4.6)

Example 4.11.2: Here is a matrix A that differs only slightly from the 3 × 3 identity matrix:

        A = [ 1 0 0 ]
            [ 2 1 0 ]
            [ 0 0 1 ]

Consider the product AB where B is a 3 × n matrix. In order to use the vector-matrix definition
of matrix-matrix multiplication, we think of A as consisting of three rows:

        A = [ 1 0 0 ]
            [ 2 1 0 ]
            [ 0 0 1 ]

Row i of the matrix-matrix product AB is the vector-matrix product

(row i of A) ∗ B

This product, according to the linear-combinations definition of vector-matrix multiplication, is


the linear combination of the rows of B in which the coefficients are the entries of row i of A.
Writing B in terms of its rows,

        B = [ b1 ]
            [ b2 ]
            [ b3 ]

Then

        row 1 of AB = 1 b1 + 0 b2 + 0 b3 = b1
        row 2 of AB = 2 b1 + 1 b2 + 0 b3 = 2 b1 + b2
        row 3 of AB = 0 b1 + 0 b2 + 1 b3 = b3

The effect of left-multiplication by A is adding twice row 1 to row 2.

Matrix A in Example 4.11.2 (Page 226) is an elementary row-addition matrix, a matrix that
is an identity matrix plus at most one off-diagonal nonzero entry. Multiplying on the left by an
elementary row-addition matrix adds a multiple of one row to another. We make use of these
matrices in an algorithm in Chapter 7.
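The effect of left-multiplication by A can be checked with a short sketch using plain lists of rows (a throwaway helper for illustration, not the Mat class):

def mat_mat(A, B):
    # row i of AB is the linear combination of B's rows with coefficients from row i of A
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 0, 0],
     [2, 1, 0],
     [0, 0, 1]]
B = [[1, 2],
     [3, 4],
     [5, 6]]
print(mat_mat(A, B))   # [[1, 2], [5, 8], [5, 6]]: twice row 1 has been added to row 2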

Definition 4.11.3 (Matrix-vector definition of matrix-matrix multiplication): For


each column-label s of B,

column s of AB = A ∗ (column s of B) (4.7)

Example 4.11.4: Let

        A = [  1  2 ]
            [ −1  1 ]

and let B be the matrix whose columns are [4, 3], [2, 1], and [0, −1], i.e.

        B = [ 4  2   0 ]
            [ 3  1  −1 ]
Now AB is the matrix whose column i is the result of multiplying A by column i of B. Since
A ∗ [4, 3] = [10, −1], A ∗ [2, 1] = [4, −1], and A ∗ [0, −1] = [−2, −1],
        AB = [ 10   4  −2 ]
             [ −1  −1  −1 ]
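The same product can be computed column by column, directly following the matrix-vector definition; here is a small sketch with plain lists rather than the Mat class:

def mat_vec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

A = [[1, 2],
     [-1, 1]]
B_columns = [[4, 3], [2, 1], [0, -1]]
AB_columns = [mat_vec(A, col) for col in B_columns]
print(AB_columns)   # [[10, -1], [4, -1], [-2, -1]], the columns of AB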

Example 4.11.5: The matrix from Example 4.9.5 (Page 216) that rotates points in R2 by thirty degrees is

        A = [ cos θ  −sin θ ] = [ √3/2   −1/2 ]
            [ sin θ   cos θ ]   [  1/2   √3/2 ]

We form a matrix B whose columns are the points in R2 belonging to the list L in Task 2.3.2:

        B = [ 2  3  1.75  2  2.25  2.5  2.75  3  3.25 ]
            [ 2  2  1     1  1     1    1     1  1    ]

Now AB is the matrix whose column i is the result of multiplying column i of B on the left by
A, i.e. rotating the ith point in L by thirty degrees.

Example 4.11.6: In Example 3.2.11 (Page 151), I state equations showing that the “old”
vectors [3, 0, 0], [0, 2, 0], and [0, 0, 1] can each be written as a linear combination of “new”
vectors [1, 0, 0], [1, 1, 0], and [1, 1, 1]:

[3, 0, 0] =  3 [1, 0, 0] + 0 [1, 1, 0] + 0 [1, 1, 1]
[0, 2, 0] = −2 [1, 0, 0] + 2 [1, 1, 0] + 0 [1, 1, 1]
[0, 0, 1] =  0 [1, 0, 0] − 1 [1, 1, 0] + 1 [1, 1, 1]

We rewrite these equations using the linear-combinations definition of matrix-vector multiplication:

        [3, 0, 0] = [ 1 1 1 ] ∗ [3, 0, 0]
                    [ 0 1 1 ]
                    [ 0 0 1 ]

        [0, 2, 0] = [ 1 1 1 ] ∗ [−2, 2, 0]
                    [ 0 1 1 ]
                    [ 0 0 1 ]

        [0, 0, 1] = [ 1 1 1 ] ∗ [0, −1, 1]
                    [ 0 1 1 ]
                    [ 0 0 1 ]

We combine these three equations to form one equation, using the matrix-vector definition of
matrix-matrix multiplication:

        [ 3 0 0 ]   [ 1 1 1 ] [ 3 −2  0 ]
        [ 0 2 0 ] = [ 0 1 1 ] [ 0  2 −1 ]
        [ 0 0 1 ]   [ 0 0 1 ] [ 0  0  1 ]

The matrix-vector and vector-matrix definitions suggest that matrix-matrix multiplication


exists simply as a convenient notation for a collection of matrix-vector products or vector-matrix
products. However, matrix-matrix multiplication has a deeper meaning, which we discuss in
Section 4.11.3.
Meanwhile, note that by combining a definition of matrix-matrix multiplication with a def-
inition of matrix-vector or vector-matrix multiplication, you can get finer-grained definitions of
matrix-matrix multiplication. For example, by combining the matrix-vector definition of matrix-
matrix multiplication with the dot-product definition of matrix-vector multiplication, we get:

Definition 4.11.7 (Dot-product definition of matrix-matrix multiplication): Entry rc


of AB is the dot-product of row r of A with column c of B.

In Problem 4.17.19, you will work with United Nations (UN) voting data. You will build
a matrix A, each row of which is the voting record of a different country in the UN. You will
use matrix-matrix multiplication to find the dot-product of the voting records of every pair of
countries. Using these data, you can find out which pairs of countries are in greatest disagreement.
You will find that matrix-matrix multiplication can take quite a long time! Researchers have
discovered faster algorithms for matrix-matrix multiplication that are especially helpful when the
matrices are very large and dense and roughly square. (Strassen’s algorithm was the first and is
still the most practical.)
There is an even faster algorithm to compute all the dot-products approximately (but accu-
rately enough to find the top few pairs of countries in greatest disagreement).

4.11.2 Graphs, incidence matrices, and counting paths


In the movie Good Will Hunting, at the end of class the professor announces,

“I also put an advanced Fourier system on the main hallway chalkboard. I’m hoping
that one of you might prove it by the end of the semester. Now the person to do so will
not only be in my good graces but will also go on to fame and fortune, having their
accomplishment recorded and their name printed in the auspicious MIT Tech. Former
winners include Nobel laureates, Fields Medal winners, renowned astrophysicists, and
lowly MIT professors.”
Must be a tough problem, eh?
The hero, Will Hunting, who works as a janitor at MIT, sees the problem and surreptitiously
writes down the solution.

The class is abuzz over the weekend—who is the mysterious student who cracked this problem?
The problem has nothing to do with Fourier systems. It has to do with representing and
manipulating a graph using a matrix.

Graphs
Informally, a graph has points, called vertices or nodes, and links, called edges. Here is a diagram
of the graph appearing in Will’s problem, but keep in mind that the graph doesn’t specify the
geometric positions of the nodes or edges—just which edges connect which nodes.

The nodes of this graph are labeled 1, 2, 3, and 4. There are two edges with endpoints 2 and 3,
one edge with endpoints 1 and 2, and so on.

Adjacency matrix
The first part of Will’s problem is to find the adjacency matrix of the graph. The adjacency
matrix A of a graph G is the D × D matrix where D is the set of node labels. In Will’s graph,
D = {1, 2, 3, 4}. For any pair i, j of nodes, A[i, j] is the number of edges with endpoints i and j.
Therefore the adjacency matrix of Will’s graph is
1 2 3 4
1 0 1 0 1
2 1 0 2 1
3 0 2 0 0
4 1 1 0 0

Note that this matrix is symmetric. This reflects the fact that if an edge has endpoints i and
j then it has endpoints j and i. Much later we discuss directed graphs, for which things are a bit
more complicated.
Note also that the diagonal elements of the matrix are zeroes. This reflects the fact that
Will’s graph has no self-loops. A self-loop is an edge whose two endpoints are the same node.
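Here is a small sketch of building the adjacency matrix of Will's graph from its list of edges, using a dictionary of counts (the book's Mat class could be used instead):

nodes = {1, 2, 3, 4}
edges = [(1, 2), (2, 3), (2, 3), (2, 4), (1, 4)]   # Will's graph has two edges between 2 and 3

A = {(i, j): 0 for i in nodes for j in nodes}
for (i, j) in edges:
    A[(i, j)] += 1
    A[(j, i)] += 1      # symmetric: an edge with endpoints i and j also has endpoints j and i

print(A[(2, 3)], A[(1, 3)])   # 2 0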

Walks
The second part of the problem addresses walks in the graph. A walk is a sequence of alternating
nodes and edges
v0 e0 v1 e1 · · · ek−1 vk
in which each edge appears immediately between its two endpoints, that is, edge ei has endpoints vi and vi+1.
Here is a diagram of Will’s graph with the edges labeled:

and here is the same diagram with the walk 3 c 2 e 4 e 2 shown:

Note that a walk can use an edge multiple times. Also, a walk need not visit all the nodes of
a graph. (A walk that visits all nodes is a traveling-salesperson tour; finding the shortest such
tour is a famous ex