Lecture Notes Math 307
I Linear Equations
In the example above, the augmented matrix is
\[
\left[\begin{array}{ccc|c} 1 & 2 & 1 & 0 \\ 1 & 5 & 1 & 1 \end{array}\right].
\]
The reduced row echelon form is
\[
\left[\begin{array}{ccc|c} 1 & 0 & 1 & 2/7 \\ 0 & 1 & 0 & 1/7 \end{array}\right],
\]
which leads to a family of solutions (one for each value of the parameter s)
\[
x = \begin{bmatrix} 2/7 \\ 1/7 \\ 0 \end{bmatrix} + s \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}.
\]
First, we define the matrix A and the vector b in MATLAB/Octave. Here is the input (after the prompt symbol >) and the output (without a prompt symbol).

>A=[1 1 1;1 1 -1;1 -1 1]
A =

   1   1   1
   1   1  -1
   1  -1   1
>b=[3;1;1]
b =

   3
   1
   1

Notice that the entries on the same row are separated by spaces (or commas) while rows are separated by semicolons. In MATLAB/Octave, column vectors are n by 1 matrices and row vectors are 1 by n matrices. The semicolons in the definition of b make it a column vector. In MATLAB/Octave, X' denotes the transpose of X. Thus we get the same result if we define b as

>b=[3 1 1]'
b =

   3
   1
   1

The solution can be found by computing the inverse of A and multiplying:

>x = A^(-1)*b
x =

   1
   1
   1

However, if A is a large matrix we don't want to actually calculate the inverse. The syntax for solving a system of equations efficiently is

>x = A\b
x =

   1
   1
   1
If you try this with a singular matrix A, MATLAB/Octave will complain and print a warning message. If you see the warning, the answer is not reliable! You can always check to see that x really is a solution by computing Ax.
>A*x
ans =

   3
   1
   1
As expected, the result is b. By the way, you can check to see how much faster A\b is than A^(-1)*b by using the functions tic() and toc(). The function tic() starts the clock, and toc() stops the clock and prints the elapsed time. To try this out, let's make A and b really big with random entries.
A=rand(1000,1000); b=rand(1000,1);
Here we are using the MATLAB/Octave command rand(m,n) that generates an m × n matrix with random entries chosen between 0 and 1. Each time rand is used it generates new numbers. Notice the semicolon ; at the end of the inputs. This suppresses the output. Without the semicolon, MATLAB/Octave would start writing the 1,000,000 random entries of A to our screen! Now we are ready to time our calculations.
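A minimal sketch of such a timing comparison is shown below; this exact snippet is illustrative rather than taken from the notes, and the elapsed times it prints will depend on your machine:

>tic(); x1 = A\b; toc();        % time the efficient solve
>tic(); x2 = A^(-1)*b; toc();   % time computing the inverse and multiplying

The A\b version should come out noticeably faster, since it performs Gaussian elimination rather than forming the inverse matrix.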
Now let's try to solve Ax = b with
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.
\]
This time the matrix A is singular and doesn't have an inverse. Recall that the determinant of a singular matrix is zero, so we can check by computing it.

>A=[1 2 3; 4 5 6; 7 8 9];
>det(A)
ans = 0

However we can still try to solve the equation Ax = b using Gaussian elimination.

>b=[1;1;1];
>rref([A b])
ans =

   1   0  -1  -1
   0   1   2   1
   0   0   0   0

Letting x3 = s be a parameter, and proceeding as you learned in previous courses, we arrive at the general solution
\[
x = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + s \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}.
\]
If instead we take
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},
\]
then

>rref([1 2 3 1;4 5 6 1;7 8 9 0])
ans =

   1   0  -1   0
   0   1   2   0
   0   0   0   1

The last row of this reduced matrix corresponds to the equation 0 = 1, so this system of equations has no solution.
To bring the system into upper triangular form, we need to swap the second and third rows. Here is the MATLAB/Octave code.

>temp=C(3,:);C(3,:)=C(2,:);C(2,:)=temp
C =

   1   1   1   3
   0  -2   0  -2
   0   0  -2  -2
When you draw the vector as an arrow on the plane, this norm is the Euclidean distance between the tip and the tail. This leads to the formula
‖a‖ = √((−4)² + 3²) = 5.
> norm(a)
ans = 5

The formula is easily generalized to n dimensions. If x = [x₁, x₂, ..., xₙ]ᵀ then
‖x‖ = √(|x₁|² + |x₂|² + ⋯ + |xₙ|²).
The absolute value signs in this formula, which might seem superfluous, are put in to make the formula correct when the components are complex numbers. So, for example,
‖[i, 1]ᵀ‖ = √(|i|² + |1|²) = √(1 + 1) = √2.
Does MATLAB/Octave give this answer too? There are situations where other ways of measuring the norm of a vector are more natural. Suppose that the tip and tail of the vector a = [−4, 3]ᵀ are locations in a city where you can only walk along the streets and avenues.
If you defined the norm to be the shortest distance that you can walk to get from the tail to the tip, the answer would be
‖a‖₁ = |−4| + |3| = 7.
This norm is called the 1-norm and can be calculated in MATLAB/Octave by adding 1 as an extra argument in the norm function.

> norm(a,1)
ans = 7

The 1-norm is also easily generalized to n dimensions. If x = [x₁, x₂, ..., xₙ]ᵀ then ‖x‖₁ = |x₁| + |x₂| + ⋯ + |xₙ|.
Another norm that is often used measures the largest component in absolute value. This norm is called the infinity norm. For a = [−4, 3]ᵀ we have ‖a‖∞ = max(|−4|, |3|) = 4.
To compute this norm in MATLAB/Octave we use inf as the second argument in the norm function.

> norm(a,inf)
ans = 4

Here are three properties that the norms we have defined all have in common:
1. For every vector x and every number s, ‖sx‖ = |s| ‖x‖.
2. The only vector with norm zero is the zero vector, that is, ‖x‖ = 0 if and only if x = 0.
3. For all vectors x and y, ‖x + y‖ ≤ ‖x‖ + ‖y‖. This inequality is called the triangle inequality. It says that the length of the longest side of a triangle is smaller than the sum of the lengths of the two shorter sides.

What is the point of introducing many ways of measuring the length of a vector? Sometimes one of the non-standard norms has natural meaning in the context of a given problem. For example, when we study stochastic matrices, we will see that multiplication of a vector by a stochastic matrix preserves the 1-norm of the vector. So in this situation it is natural to use 1-norms. However, in this course we will almost always use the standard Euclidean norm. If v is a vector then ‖v‖ (without any subscripts) will always denote the standard Euclidean norm.
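As a quick numerical illustration of the triangle inequality (this snippet is not part of the original notes; any vectors would do):

>x=rand(5,1); y=rand(5,1);
>[norm(x+y), norm(x)+norm(y)]   % the first entry never exceeds the second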
Taking this point of view, we would define the norm of A to be √(1² + 2² + 0² + 2²) = 3. In fact, the norm computed in this way is sometimes used for matrices. It is called the Hilbert–Schmidt norm. For a general matrix A = [a_{i,j}], the formula for the Hilbert–Schmidt norm is
\[
\|A\|_{HS} = \sqrt{\sum_i \sum_j |a_{i,j}|^2}\,.
\]
The Hilbert–Schmidt norm does measure the size of a matrix in some sense. It has the advantage of being easy to compute from the entries a_{i,j}. But it is not closely tied to the action of A as a linear transformation. When A is considered as a linear transformation or operator, acting on vectors, there is another norm that is more natural to use. Starting with a vector x, the matrix A transforms it to the vector Ax. We want to say that a matrix is big if it increases the size of vectors, in other words, if ‖Ax‖ is big compared to ‖x‖. So it is natural to consider the stretching ratio ‖Ax‖/‖x‖. Of course, this ratio depends on x, since some vectors get stretched more than others by A. Also, the ratio is not defined if x = 0. But in this case Ax = 0 too, so there is no stretching.
We now define the matrix norm of A to be the largest of these ratios,
\[
\|A\| = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|}\,.
\]
This norm measures the maximum factor by which A can stretch the length of a vector. It is sometimes called the operator norm. Since ‖A‖ is defined to be the maximum of a collection of stretching ratios, it must be bigger than or equal to any particular stretching ratio. In other words, for any non-zero vector x we know ‖A‖ ≥ ‖Ax‖/‖x‖, or
‖Ax‖ ≤ ‖A‖ ‖x‖.
This is how the matrix norm is often used in practice. If we know ‖x‖ and the matrix norm ‖A‖, then we have an upper bound on the norm of Ax. In fact, since the maximum of a collection of numbers is the smallest number that is larger than or equal to every number in the collection (draw a picture on the number line to see this), the matrix norm ‖A‖ is the smallest number that is bigger than ‖Ax‖/‖x‖ for every choice of non-zero x. Thus ‖A‖ is the smallest number C for which ‖Ax‖ ≤ C‖x‖ for every x. An equivalent definition for ‖A‖ is
\[
\|A\| = \max_{x:\ \|x\| = 1} \|Ax\|\,.
\]
Why do these definitions give the same answer? The reason is that the quantity ‖Ax‖/‖x‖ does not change if we multiply x by a non-zero scalar (convince yourself!). So, when calculating the maximum over all non-zero vectors in the first expression for ‖A‖, all the vectors pointing in the same direction will give the same value for ‖Ax‖/‖x‖. This means that we need only pick one vector in any given direction, and might as well choose the unit vector. For this vector, the denominator is equal to one, so we can ignore it. Here is another way of saying this. Consider the image of the unit sphere under A. This is the set of vectors {Ax : ‖x‖ = 1}. The length of the longest vector in this set is ‖A‖.
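One way to get a feeling for this definition is to sample many random unit vectors and record the largest stretching ratio found, comparing it with MATLAB/Octave's norm(A). The snippet below is only an illustration (it uses the 2 × 2 matrix introduced in the example that follows), and the sampled maximum will be slightly below the true norm:

>A=[1 2; 0 2];
>best=0;
>for k=1:10000
>  x=randn(2,1); x=x/norm(x);    % a random unit vector
>  best=max(best, norm(A*x));    % largest stretching factor seen so far
>end
>[best, norm(A)]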
The picture below is a sketch of the unit sphere (circle) in two dimensions, and its image under
\[
A = \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix}.
\]
This image is an ellipse.
[Figure: the unit circle and its image under A, an ellipse; the longest vector in the ellipse has length ‖A‖.]
The norm of the matrix is the distance from the origin to the point on the ellipse farthest from the origin. In this case this turns out to be
\[
\|A\| = \sqrt{9/2 + (1/2)\sqrt{65}}\,.
\]
It's hard to see how this expression can be obtained from the entries of the matrix. There is no easy formula. However, if A is a diagonal matrix the norm is easy to compute. To see this, let's consider a diagonal matrix
\[
A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]
If
\[
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\]
then
\[
Ax = \begin{bmatrix} 3x_1 \\ 2x_2 \\ x_3 \end{bmatrix},
\]
so that
\[
\|Ax\|^2 = 9|x_1|^2 + 4|x_2|^2 + |x_3|^2 \le 9\big(|x_1|^2 + |x_2|^2 + |x_3|^2\big) = 9\|x\|^2.
\]
This implies that for any unit vector x, ‖Ax‖ ≤ 3, and taking the maximum over all unit vectors x yields ‖A‖ ≤ 3. On the other hand, the maximum of ‖Ax‖ over all unit vectors x is larger than the value of ‖Ax‖ for any particular unit vector. In particular, if
\[
e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
\]
then ‖A‖ ≥ ‖Ae₁‖ = 3. Thus we see that ‖A‖ = 3. In general, the matrix norm of a diagonal matrix with diagonal entries λ₁, λ₂, ..., λₙ is the largest value of |λₖ|.
The MATLAB/Octave code for a diagonal matrix with diagonal entries 3, 2 and 1 is diag([3 2 1]), and the expression for the norm of A is norm(A). So for example

>norm(diag([3 2 1]))
ans = 3
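You can also use norm to check the value claimed above for the 2 × 2 example (an illustrative check; both commands should print approximately 2.92):

>norm([1 2; 0 2])
>sqrt(9/2 + sqrt(65)/2)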
Here we used the fact that if x ranges over all non-zero vectors so does y = Ax, and that the minimum of a collection of positive numbers is one divided by the maximum of their reciprocals. Thus the smallest stretching factor for A is 1/‖A⁻¹‖. This leads to the following formula for the condition number of an invertible matrix:
\[
\mathrm{cond}(A) = \|A\|\,\|A^{-1}\|.
\]
In our applications we will use the condition number as a measure of how accurately we can solve the equations that come up. Now let us try to see why the condition number of A is a good measure of how accurately we can solve the equation Ax = b. Starting with Ax = b, we change the right side to b′ = b + Δb. The new solution is x′ = A⁻¹(b + Δb) = x + Δx, where x = A⁻¹b is the original solution and the change in the solutions is Δx = A⁻¹Δb. Now the absolute errors ‖Δb‖ and ‖Δx‖ are not very meaningful, since an absolute error ‖Δb‖ = 100 is not very large if ‖b‖ = 1,000,000, but is large if ‖b‖ = 1. What we really care about are the relative errors ‖Δb‖/‖b‖ and ‖Δx‖/‖x‖. Can we bound the relative error in the solution in terms of the relative error in the equation? The answer is yes. Beginning with
\[
\|\Delta x\|\,\|b\| = \|A^{-1}\Delta b\|\,\|Ax\| \le \|A^{-1}\|\,\|\Delta b\|\,\|A\|\,\|x\|,
\]
we can divide by ‖b‖ ‖x‖ to obtain
\[
\frac{\|\Delta x\|}{\|x\|} \le \|A^{-1}\|\,\|A\|\,\frac{\|\Delta b\|}{\|b\|} = \mathrm{cond}(A)\,\frac{\|\Delta b\|}{\|b\|}.
\]
This equation gives the real meaning of the condition number. If the condition number is near to 1 then the relative error of the solution is about the same as the relative error in the equation. However, a large condition number means that a small relative error in the equation can lead to a large relative error in the solution. In MATLAB/Octave the condition number is computed using cond(A).

> A=[2 0; 0 0.5];
> cond(A)
ans = 4
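Here is a small illustrative experiment (not from the notes) showing how a badly conditioned matrix amplifies relative errors:

>A=[1 1; 1 1.0001];              % a nearly singular matrix
>cond(A)                         % the condition number is large
>b=[2; 2.0001]; x=A\b;           % the exact solution is [1;1]
>db=[1e-5; 0];                   % a tiny change in the right-hand side
>dx=A\(b+db)-x;                  % the resulting change in the solution
>[norm(db)/norm(b), norm(dx)/norm(x)]   % relative errors: the second is much larger

The ratio of the two relative errors is bounded by cond(A), in agreement with the inequality above.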
How to create a column vector or a matrix with more than one row:
When the semicolon ; is used inside square brackets, it terminates rows. For example,
a=[1;1;1] creates a column vector with three rows
B=[1 2 3; 4 5 6] creates a 2 by 3 matrix
When a matrix (or a vector) is followed by a single quote (or apostrophe) ', MATLAB flips rows with columns, that is, it generates the transpose. When the original matrix is a simple row vector, the apostrophe operator turns the vector into a column vector. For example,
a=[1 1 1]' creates a column vector with three rows
B=[1 2 3; 4 5 6]' creates a 3 by 2 matrix whose first row is 1 4

How to use specialized matrix functions:
rand(n,m) returns an n-by-m matrix with random numbers between 0 and 1.
How to extract elements or submatrices from a matrix:
A(i,j) returns the entry of the matrix A in the i-th row and the j-th column
A(i,:) returns a row vector containing the i-th row of A
A(:,j) returns a column vector containing the j-th column of A
A(i:j,k:m) returns a matrix containing a specific submatrix of the matrix A. Specifically, it returns all rows between the i-th and the j-th rows of A, and all columns between the k-th and the m-th columns of A.
How to perform specific operations on a matrix:
det(A) returns the determinant of the (square) matrix A
rref(A) returns the reduced row echelon form of the matrix A
norm(V) returns the 2-norm (Euclidean norm) of the vector V
norm(V,1) returns the 1-norm of the vector V
norm(V,inf) returns the infinity norm of the vector V
I.2 Interpolation
Prerequisites and Learning Goals
From your work in previous courses, you should be able to
compute the determinant of a square matrix; apply the basic linearity properties of the determinant, and explain what its value means about existence and uniqueness of solutions.
After completing this section, you should be able to:

Give a definition of interpolation function and explain the idea of getting a unique interpolation function by restricting the class of functions under consideration.
Define the problem of Lagrange interpolation and express it in terms of a system of equations where the unknowns are the coefficients of a polynomial of given degree; set up the system in matrix form using the Vandermonde matrix; derive the formula for the determinant of the Vandermonde matrix; explain why a solution to the Lagrange interpolation problem always exists.
Explain why Lagrange interpolation is not a practical method for large numbers of points.
Define the mathematical problem of interpolation using splines, and compare and contrast it with Lagrange interpolation.
Explain how minimizing the bending energy leads to a description of the shape of the spline as a piecewise polynomial function.
Express the interpolation problem of cubic splines in terms of a system of equations where the unknowns are related to the coefficients of the cubic polynomials.
Given a set of points, use MATLAB/Octave to calculate and plot the interpolating polynomial in Lagrange interpolation and the piecewise function for cubic splines.
Use the MATLAB/Octave functions linspace, vander, polyval, zeros and ones.
Use m-files in MATLAB/Octave.
I.2.1 Introduction
Suppose we are given some points (x1 , y1 ), . . . , (xn , yn ) in the plane, where the points xi are all distinct.
Our task is to find a function f(x) that passes through all these points. In other words, we require that f(xᵢ) = yᵢ for i = 1, ..., n. Such a function is called an interpolating function. Problems like this arise in practical applications in situations where a function is sampled at a finite number of points. For example, the function could be the shape of the model we have made for a car. We take a bunch of measurements (x₁, y₁), ..., (xₙ, yₙ) and send them to the factory. What's the best way to reproduce the original shape? Of course, it is impossible to reproduce the original shape with certainty. There are infinitely many functions going through the sampled points.

To make our problem of finding the interpolating function f(x) have a unique solution, we must require something more of f(x): either that f(x) lies in some restricted class of functions, or that f(x) is the function that minimizes some measure of badness. We will look at both approaches.
The n equations p(xᵢ) = yᵢ are n linear equations for these unknown coefficients, which we may write as
\[
\begin{bmatrix}
x_1^{n-1} & x_1^{n-2} & \cdots & x_1^2 & x_1 & 1 \\
x_2^{n-1} & x_2^{n-2} & \cdots & x_2^2 & x_2 & 1 \\
\vdots & \vdots & & \vdots & \vdots & \vdots \\
x_n^{n-1} & x_n^{n-2} & \cdots & x_n^2 & x_n & 1
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{n-1} \\ a_n \end{bmatrix}
=
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.
\]
Thus we see that the problem of Lagrange interpolation reduces to solving a system of linear equations. If this system has a unique solution, then there is exactly one polynomial p(x) of degree n−1 running through our points. The matrix for this system of equations has a special form and is called a Vandermonde matrix. To decide whether the system of equations has a unique solution we need to determine whether the Vandermonde matrix is invertible or not. One way to do this is to compute the determinant. It turns out that the determinant of a Vandermonde matrix has a particularly simple form, but it's a little tricky to see this. The 2 × 2 case is simple enough:
\[
\det\begin{bmatrix} x_1 & 1 \\ x_2 & 1 \end{bmatrix} = x_1 - x_2.
\]
To go on to the 3 × 3 case we won't simply expand the determinant, but recall that the determinant is unchanged under row (and column) operations of the type add a multiple of one row (column) to another. Thus if we start with a 3 × 3 Vandermonde determinant, add −x₁ times the second column to the first, and then add −x₁ times the third column to the second, the determinant doesn't change and we find that
\[
\det\begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ x_3^2 & x_3 & 1 \end{bmatrix}
= \det\begin{bmatrix} 0 & x_1 & 1 \\ x_2^2 - x_1 x_2 & x_2 & 1 \\ x_3^2 - x_1 x_3 & x_3 & 1 \end{bmatrix}
= \det\begin{bmatrix} 0 & 0 & 1 \\ x_2^2 - x_1 x_2 & x_2 - x_1 & 1 \\ x_3^2 - x_1 x_3 & x_3 - x_1 & 1 \end{bmatrix}.
\]
Now we can take advantage of the zeros in the first row, and calculate the determinant by expanding along the top row. This gives
\[
\det\begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ x_3^2 & x_3 & 1 \end{bmatrix}
= \det\begin{bmatrix} x_2^2 - x_1 x_2 & x_2 - x_1 \\ x_3^2 - x_1 x_3 & x_3 - x_1 \end{bmatrix}
= \det\begin{bmatrix} x_2(x_2 - x_1) & x_2 - x_1 \\ x_3(x_3 - x_1) & x_3 - x_1 \end{bmatrix}.
\]
Now, we recall that the determinant is linear in each row separately. This implies that
\[
\det\begin{bmatrix} x_2(x_2 - x_1) & x_2 - x_1 \\ x_3(x_3 - x_1) & x_3 - x_1 \end{bmatrix}
= (x_2 - x_1)\det\begin{bmatrix} x_2 & 1 \\ x_3(x_3 - x_1) & x_3 - x_1 \end{bmatrix}
= (x_2 - x_1)(x_3 - x_1)\det\begin{bmatrix} x_2 & 1 \\ x_3 & 1 \end{bmatrix}.
\]
But the determinant on the right is a 2 × 2 Vandermonde determinant that we have already computed. Thus we end up with the formula
\[
\det\begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ x_3^2 & x_3 & 1 \end{bmatrix}
= (x_2 - x_1)(x_3 - x_1)(x_2 - x_3).
\]
The same type of calculation works for any n. The general formula is
\[
\det\begin{bmatrix}
x_1^{n-1} & x_1^{n-2} & \cdots & x_1^2 & x_1 & 1 \\
x_2^{n-1} & x_2^{n-2} & \cdots & x_2^2 & x_2 & 1 \\
\vdots & \vdots & & \vdots & \vdots & \vdots \\
x_n^{n-1} & x_n^{n-2} & \cdots & x_n^2 & x_n & 1
\end{bmatrix}
= \epsilon \prod_{i>j} (x_i - x_j),
\]
where ε = (−1)^{n(n−1)/2}. It can be proved by induction using the same strategy as we used for the 3 × 3 case. The product on the right is the product of all differences xᵢ − xⱼ. This product is non-zero, since we are assuming that all the points xᵢ are distinct. Thus the Vandermonde matrix is invertible, and a solution to the Lagrange interpolation problem always exists.

Now let's use MATLAB/Octave to see how this interpolation works in practice. We begin by putting some points xᵢ into a vector X and the corresponding points yᵢ into a vector Y.
>X=[0 0.2 0.4 0.6 0.8 1.0] >Y=[1 1.1 1.3 0.8 0.4 1.0]
We can use the plot command in MATLAB/Octave to view these points. The command plot(X,Y) will pop open a window and plot the points (xᵢ, yᵢ) joined by straight lines. In this case we are not interested in joining the points (at least not with straight lines) so we add a third argument: 'o' plots the points as little circles. (For more information you can type help plot on the MATLAB/Octave command line.) Thus we type
The axis command adjusts the axis. Normally when you issue a new plot command, the existing plot is erased. The hold on prevents this, so that subsequent plots are all drawn on the same graph. The original behaviour is restored with hold off. When you do this you should see a graph appear that looks something like this.
[Figure: the six data points (xᵢ, yᵢ) plotted as circles.]
Now let's compute the interpolation polynomial. Luckily there are built-in functions in MATLAB/Octave that make this very easy. To start with, the function vander(X) returns the Vandermonde matrix corresponding to the points in X. So we define

>V=vander(X)
V =

   0.00000   0.00000   0.00000   0.00000   0.00000   1.00000
   0.00032   0.00160   0.00800   0.04000   0.20000   1.00000
   0.01024   0.02560   0.06400   0.16000   0.40000   1.00000
   0.07776   0.12960   0.21600   0.36000   0.60000   1.00000
   0.32768   0.40960   0.51200   0.64000   0.80000   1.00000
   1.00000   1.00000   1.00000   1.00000   1.00000   1.00000
We saw above that the coefficients of the interpolation polynomial are given by the solution a to the equation Va = y. We find those coefficients using

>a=V\Y'

Let's have a look at the interpolating polynomial. The MATLAB/Octave function polyval(a,X) takes a vector X of x values, say x₁, x₂, ..., x_k, and returns a vector containing the values p(x₁), p(x₂), ..., p(x_k), where p is the polynomial whose coefficients are in the vector a, that is, p(x) = a₁xⁿ⁻¹ + a₂xⁿ⁻² + ⋯ + a_{n−1}x + a_n. So plot(X,polyval(a,X)) would be the command we want, except that with the present definition of X this would only plot the polynomial at the interpolation points. What we want is to plot the polynomial for all points, or at least for a large number. The command linspace(0,1,100) produces a vector of 100 linearly spaced points between 0 and 1, so the following commands do the job.

>XL=linspace(0,1,100);
>YL=polyval(a,XL);
>plot(XL,YL);
>hold off
[Figure: the interpolating polynomial plotted through the six data points.]
The MATLAB/Octave commands for this example are in lagrange.m. Unfortunately, things get worse when we increase the number of interpolation points. One clue that there might be trouble ahead is that even for only six points the condition number of V is quite high (try it!). Let's see what happens with 18 points. We will take the x values to be equally spaced between 0 and 1. For the y values we will start off by taking yᵢ = sin(2πxᵢ). We repeat the steps above.

>X=linspace(0,1,18);
>Y=sin(2*pi*X);
>plot(X,Y,'o')
>axis([-0.1 1.1 -1.5 1.5])
>hold on
>V=vander(X);
>a=V\Y';
>XL=linspace(0,1,500);
>YL=polyval(a,XL);
>plot(XL,YL);

The resulting picture looks okay.
[Figure: the 18 data points on the sine curve and the interpolating polynomial through them.]
But look what happens if we change one of the y values just a little. We add 0.02 to the fifth y value, redo the Lagrange interpolation and plot the new values in red.
>Y(5) = Y(5)+0.02;
>plot(X(5),Y(5),'or')
>a=V\Y';
>YL=polyval(a,XL);
>plot(XL,YL,'r');
>hold off

The resulting graph makes a wild excursion and even though it goes through the given points, it would not be a satisfactory interpolating function in a practical situation.
[Figure: the perturbed Lagrange interpolation (red) oscillating wildly, compared with the original curve.]
If we try to go to 20 points equally spaced between 0 and 1, the Vandermonde matrix is so ill conditioned that MATLAB/Octave considers it to be singular.
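To see the ill conditioning directly, you can compute the condition numbers yourself (illustrative commands; the 18-point value is the one quoted later in this section when we compare with splines):

>cond(vander([0 0.2 0.4 0.6 0.8 1.0]))   % already large for six points
>cond(vander(linspace(0,1,18)))          % roughly 1.9e+14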
It consists of a bendable but stiff strip held in position by a series of weights called ducks. We will try to make a mathematical model of such a device. We begin again with points (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ) in the plane. Again we are looking for a function f(x) that goes through all these points. This time, we want to find the function that has the same shape as a real draughtsman's spline. We will imagine that the given points are the locations of the ducks. Our first task is to identify a large class of functions that represent possible shapes for the spline. We will write down three conditions for a function f(x) to be acceptable.

Since the spline has no breaks in it, the function f(x) should be continuous. Moreover f(x) should pass through the given points.

Condition 1: f(x) is continuous and f(xᵢ) = yᵢ for i = 1, ..., n.

The next condition reflects the assumption that the strip is stiff but bendable. If the strip were not stiff, say it were actually a rubber band that is just stretched between the ducks, then our resulting function would be a straight line between each duck location (xᵢ, yᵢ). At each duck location there would be a sharp bend in the function. In other words, even though the function itself would be continuous, the first derivative would be discontinuous at the duck locations. We will interpret the words bendable but stiff to mean that the first derivative of f(x) exists. This leads to our second condition.
Condition 2: The first derivative f′(x) exists and is continuous everywhere, including each interior duck location xᵢ.

In between the duck locations we will assume that f(x) is perfectly smooth and that higher derivatives behave nicely when we approach the duck locations from the right or the left. This leads to

Condition 3: For x in between the duck points xᵢ, the higher order derivatives f″(x), f‴(x), ... all exist and have left and right limits as x approaches each xᵢ.

In this condition we are allowing for the possibility that f″(x) and higher order derivatives have a jump at the duck locations. This happens if the left and right limits are different. The set of functions satisfying conditions 1, 2 and 3 are all the possible shapes of the spline. How do we decide which one of these shapes is the actual shape of the spline? To do this we need to invoke a bit of the physics of bendable strips. The bending energy E[f] of a strip whose shape is described by the function f is given by the integral
\[
E[f] = \int_{x_1}^{x_n} \big(f''(x)\big)^2\, dx.
\]
The actual spline will relax into the shape that makes E[f] as small as possible. Thus, among all the functions satisfying conditions 1, 2 and 3, we want to choose the one that minimizes E[f]. This minimization problem is similar to ones considered in calculus courses, except that instead of real numbers, the variables in this problem are functions f satisfying conditions 1, 2 and 3. In calculus, the minimum is calculated by setting the derivative to zero. A similar procedure is described in the next section. Here is the result of that calculation:

Let F(x) be the function describing the shape that makes E[f] as small as possible. In other words, F(x) satisfies conditions 1, 2 and 3, and if f(x) also satisfies conditions 1, 2 and 3, then E[F] ≤ E[f]. Then, in addition to conditions 1, 2 and 3, F(x) satisfies

Condition a: In each interval (xᵢ, xᵢ₊₁), the function F(x) is a cubic polynomial. In other words, for each interval there are coefficients Aᵢ, Bᵢ, Cᵢ and Dᵢ such that F(x) = Aᵢx³ + Bᵢx² + Cᵢx + Dᵢ for all x between xᵢ and xᵢ₊₁. The coefficients can be different for different intervals.

Condition b: The second derivative F″(x) is continuous.

Condition c: When x is an endpoint (either x₁ or xₙ) then F″(x) = 0.

As we will see, there is exactly one function satisfying conditions 1, 2, 3, a, b and c.
Here is the calculation, written out in the case of three duck points x₁ < x₂ < x₃. Let h(x) be any function that satisfies conditions 2 and 3 and vanishes at the duck points, so that h(xᵢ) = 0 for every i. Then F(x) + th(x) still satisfies conditions 1, 2 and 3 for every number t, so E[F] ≤ E[F + th]. In other words, the function t ↦ E[F + th] has a minimum at t = 0, and therefore its derivative there is zero:
\[
0 = \frac{d}{dt} E[F + th]\Big|_{t=0}
  = \frac{d}{dt} \int_{x_1}^{x_3} \big(F''(x) + t h''(x)\big)^2\, dx \,\Big|_{t=0}
  = 2\int_{x_1}^{x_2} F''(x) h''(x)\, dx + 2\int_{x_2}^{x_3} F''(x) h''(x)\, dx.
\]
I.2 Interpolation We divide by 2 and integrate by parts in each integral. This gives 0 = F (x)h (x)
x=x2 x=x1 x2
x1
x=x3 x=x2
x3
F (x)h (x)dx
x2
In each boundary term we have to take into account the possibility that F (x) is not continuous across the points xi . Thus we have to use the appropriate limit from the left or the right. So, for the rst boundary term F (x)h (x)
x=x2 x=x1
Notice that since h (x) is continuous across each xi we need not distinguish the limits from the left and the right. Expanding and combining the boundary terms we get 0 = F (x1 +)h (x1 ) + F (x2 ) F (x2 +) h (x2 ) + F (x3 )h (x3 )
x2 x3
x1
F (x)h (x)dx
F (x)h (x)dx
x2
Now we integrate by parts again. This time the boundary terms all vanish because h(xi ) = 0 for every i. Thus we end up with the equation 0 = F (x1 +)h (x1 ) + F (x2 ) F (x2 +) h (x2 ) + F (x3 )h (x3 )
x2
+
x1
F (x)h(x)dx
x3
F (x)h(x)dx
(I.2)
x2
as desired. Recall that this equation has to be true for every choice of h satisfying conditions 1, 2 and 3. For different choices of h(x) we can extract different pieces of information about the minimizer F(x). To start, we can choose h that is zero everywhere except in the open interval (x₁, x₂). For all such h we then obtain 0 = ∫_{x₁}^{x₂} F″″(x)h(x) dx. This can only happen if F″″(x) = 0 for x₁ < x < x₂.
Thus we conclude that the fourth derivative F″″(x) is zero in the interval (x₁, x₂). Once we know that F″″(x) = 0 in the interval (x₁, x₂), then by integrating both sides we can conclude that F‴(x) is constant. Integrating again, we find F″(x) is a linear polynomial. By integrating four times, we see that F(x) is a cubic polynomial in that interval. When doing the integrals, we must not extend the domain of integration over the boundary point x₂, since F″″(x) may not exist (let alone be zero) there. Similarly F″″(x) must also vanish in the interval (x₂, x₃), so F(x) is a (possibly different) cubic polynomial in the interval (x₂, x₃).
(An aside: to understand better why the polynomials might be different in the intervals (x₁, x₂) and (x₂, x₃), consider the function g(x) (unrelated to the spline problem) given by
\[
g(x) = \begin{cases} 0 & \text{for } x_1 < x < x_2 \\ 1 & \text{for } x_2 < x < x_3 \end{cases}
\]
Then g′(x) = 0 in each interval, and an integration tells us that g is constant in each interval. However, g′(x₂) does not exist, and the constants are different.)

We have established that F(x) satisfies condition a. Now that we know that F″″(x) vanishes in each interval, we can return to (I.2) and write it as
\[
0 = -F''(x_1+)h'(x_1) + \big(F''(x_2-) - F''(x_2+)\big)h'(x_2) + F''(x_3-)h'(x_3).
\]
Now choose h(x) with h′(x₁) = 1 and h′(x₂) = h′(x₃) = 0. Then the equation reads
F″(x₁+) = 0.
Similarly, choosing h(x) with h′(x₃) = 1 and h′(x₁) = h′(x₂) = 0 we obtain
F″(x₃−) = 0.
This establishes condition c. Finally, choosing h(x) with h′(x₂) = 1 and h′(x₁) = h′(x₃) = 0 we obtain
F″(x₂−) − F″(x₂+) = 0.
In other words, F″ must be continuous across the interior duck position. This shows that condition b holds, and the derivation is complete. This calculation is easily generalized to the case where there are n duck positions x₁, ..., xₙ. A reference for this material is Essentials of Numerical Analysis, with Pocket Calculator Demonstrations, by Henrici.
Secondly, we require continuity of the first derivative across the interior xᵢ's. This translates to p′ᵢ(xᵢ₊₁) = p′ᵢ₊₁(xᵢ₊₁), or
3aᵢ(xᵢ₊₁ − xᵢ)² + 2bᵢ(xᵢ₊₁ − xᵢ) + cᵢ = cᵢ₊₁
for i = 1, ..., n−2, giving an additional n−2 equations. Next, we require continuity of the second derivative across the interior xᵢ's. This translates to p″ᵢ(xᵢ₊₁) = p″ᵢ₊₁(xᵢ₊₁), or
6aᵢ(xᵢ₊₁ − xᵢ) + 2bᵢ = 2bᵢ₊₁
for i = 1, ..., n−2, once more giving an additional n−2 equations. Finally, we require that p″₁(x₁) = p″ₙ₋₁(xₙ) = 0. This yields two more equations,
2b₁ = 0
6aₙ₋₁(xₙ − xₙ₋₁) + 2bₙ₋₁ = 0,
for a total of 3(n−1) equations for the same number of variables.

We now specialize to the case where the distances between the points xᵢ are equal. Let L = xᵢ₊₁ − xᵢ be the common distance. Then the equations read
aᵢL³ + bᵢL² + cᵢL = yᵢ₊₁ − yᵢ
3aᵢL² + 2bᵢL + cᵢ − cᵢ₊₁ = 0
6aᵢL + 2bᵢ − 2bᵢ₊₁ = 0
for i = 1, ..., n−2, together with
aₙ₋₁L³ + bₙ₋₁L² + cₙ₋₁L = yₙ − yₙ₋₁
2b₁ = 0
6aₙ₋₁L + 2bₙ₋₁ = 0.

We make one more simplification. After multiplying some of the equations with suitable powers of L, we can write these as equations for αᵢ = aᵢL³, βᵢ = bᵢL² and γᵢ = cᵢL. They have a very simple block structure. For example, when n = 4 the matrix form of the equations is
\[
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
3 & 2 & 1 & 0 & 0 & -1 & 0 & 0 & 0 \\
6 & 2 & 0 & 0 & -2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 3 & 2 & 1 & 0 & 0 & -1 \\
0 & 0 & 0 & 6 & 2 & 0 & 0 & -2 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 6 & 2 & 0
\end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \beta_1 \\ \gamma_1 \\ \alpha_2 \\ \beta_2 \\ \gamma_2 \\ \alpha_3 \\ \beta_3 \\ \gamma_3 \end{bmatrix}
=
\begin{bmatrix} y_2 - y_1 \\ 0 \\ 0 \\ y_3 - y_2 \\ 0 \\ 0 \\ y_4 - y_3 \\ 0 \\ 0 \end{bmatrix}.
\]
Notice that the matrix in this equation does not depend on the points (xᵢ, yᵢ). It has a 3 × 3 block structure.
Once we have solved the equation for the coefficients αᵢ, βᵢ and γᵢ, the function F(x) in the interval (xᵢ, xᵢ₊₁) is given by
\[
F(x) = p_i(x) = \alpha_i\left(\frac{x - x_i}{L}\right)^3 + \beta_i\left(\frac{x - x_i}{L}\right)^2 + \gamma_i\left(\frac{x - x_i}{L}\right) + y_i.
\]
Now let us use MATLAB/Octave to plot a cubic spline. To start, we will do an example with four interpolation points. The matrix S in the equation is defined by

>N=[1 1 1;3 2 1;6 2 0];
>M=[0 0 0;0 0 -1; 0 -2 0];
>Z=zeros(3,3);
>T=[0 0 0;0 2 0; 0 0 0];
>V=[1 1 1;0 0 0;6 2 0];
>S=[N M Z; Z N M; T Z V]
S =

   1   1   1   0   0   0   0   0   0
   3   2   1   0   0  -1   0   0   0
   6   2   0   0  -2   0   0   0   0
   0   0   0   1   1   1   0   0   0
   0   0   0   3   2   1   0   0  -1
   0   0   0   6   2   0   0  -2   0
   0   0   0   0   0   0   1   1   1
   0   2   0   0   0   0   0   0   0
   0   0   0   0   0   0   6   2   0
Here we used the function zeros(n,m) which defines an n × m matrix filled with zeros. To proceed we have to know what points we are trying to interpolate. We pick four (x, y) values and put them in vectors. Remember that we are assuming that the x values are equally spaced.

>X=[1, 1.5, 2, 2.5];
>Y=[0.5, 0.8, 0.2, 0.4];

We plot these points on a graph.

>plot(X,Y,'o')
>hold on

Now let's define the right side of the equation

>b=[Y(2)-Y(1),0,0,Y(3)-Y(2),0,0,Y(4)-Y(3),0,0];

and solve the equation for the coefficients.

>a=S\b';

Now let's plot the interpolating function in the first interval. We will use 50 closely spaced points to get a smooth looking curve.

>XL = linspace(X(1),X(2),50);

Put the first set of coefficients (α₁, β₁, γ₁, y₁) into a vector

>p = [a(1) a(2) a(3) Y(1)];
Now we put the values p₁(x) into the vector YL. First we define the values (x − x₁)/L and put them in the vector XLL. To get the values x − x₁ we want to subtract the vector with X(1) in every position from XL. The vector with X(1) in every position can be obtained by taking a vector with 1 in every position (in MATLAB/Octave this is obtained using the function ones(n,m)) and multiplying by the number X(1). Then we divide by the (constant) spacing between the xᵢ values.
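Following the same pattern as the commands used below for the other intervals, the corresponding commands for this first interval would be:

>L = X(2)-X(1);                     % the common spacing between the x values
>XLL = (XL - X(1)*ones(1,50))/L;    % the values (x - x_1)/L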
Now we evaluate the polynomial p1 (x) and plot the resulting points.
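Mirroring the commands used below for the other two intervals, this step would be:

>YL = polyval(p,XLL);
>plot(XL,YL);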
To complete the plot, we repeat these steps for the intervals (x₂, x₃) and (x₃, x₄).
>XL = linspace(X(2),X(3),50);
>p = [a(4) a(5) a(6) Y(2)];
>XLL = (XL - X(2)*ones(1,50))/L;
>YL = polyval(p,XLL);
>plot(XL,YL);

>XL = linspace(X(3),X(4),50);
>p = [a(7) a(8) a(9) Y(3)];
>XLL = (XL - X(3)*ones(1,50))/L;
>YL = polyval(p,XLL);
>plot(XL,YL);
[Figure: the cubic spline through the four points (1, 0.5), (1.5, 0.8), (2, 0.2), (2.5, 0.4).]
I have automated the procedure above and put the result in two files splinemat.m and plotspline.m. splinemat(n) returns the 3(n−1) × 3(n−1) matrix used to compute a spline through n points, while plotspline(X,Y) plots the cubic spline going through the points in X and Y. If you put these files in your MATLAB/Octave directory you can use them like this:
>splinemat(3)
ans =

   1   1   1   0   0   0
   3   2   1   0   0  -1
   6   2   0   0  -2   0
   0   0   0   1   1   1
   0   2   0   0   0   0
   0   0   0   6   2   0
and

>plotspline(X,Y)
to produce the plot above. Let's use these functions to compare the cubic spline interpolation with the Lagrange interpolation by using the same points as we did before. Remember that we started with the points

>X=linspace(0,1,18);
>Y=sin(2*pi*X);

Let's plot the spline interpolation of these points.

>plotspline(X,Y);

Here is the result with the Lagrange interpolation added (in red). The red (Lagrange) curve covers the blue one and it's impossible to tell the curves apart.
[Figure: the spline and Lagrange interpolations of the 18 sine points; the two curves coincide.]
Now we move one of the points slightly, as before. >Y(5) = Y(5)+0.02; Again, plotting the spline in blue and the Lagrange interpolation in red, here are the results.
[Figure: the spline (blue) and Lagrange (red) interpolations after perturbing one point; only the Lagrange curve oscillates wildly.]
This time the spline does a much better job! Let's check the condition number of the matrix for the splines. Recall that there are 18 points.

>cond(splinemat(18))
ans = 32.707
Recall the Vandermonde matrix had a condition number of 1.8822e+14. This shows that the system of equations for the splines is very much better conditioned, by 13 orders of magnitude!!
for l=[1:k-1]
  S(3*k-2:3*k,3*l-2:3*l) = Z;
end
S(3*k-2:3*k,3*k-2:3*k) = L;
S(3*k-2:3*k,3*k+1:3*k+3) = M;
for l=[k+2:n-1]
  S(3*k-2:3*k,3*l-2:3*l) = Z;
end
end
S(3*(n-1)-2:3*(n-1),1:3)=T;
for l=[2:n-2]
  S(3*(n-1)-2:3*(n-1),3*l-2:3*l) = Z;
end
S(3*(n-1)-2:3*(n-1),3*(n-1)-2:3*(n-1))=V;
end
function plotspline(X,Y)
n=length(X);
L=X(2)-X(1);
S=splinemat(n);
b=zeros(3*(n-1),1);
for k=[1:n-1]
  b(3*k-2)=Y(k+1)-Y(k);
  b(3*k-1)=0;
  b(3*k)=0;
end
a=S\b;
npoints=50;
XL=[];
YL=[];
for k=[1:n-1]
  XL = [XL linspace(X(k),X(k+1),npoints)];
  p = [a(3*k-2),a(3*k-1),a(3*k),Y(k)];
  XLL = (linspace(X(k),X(k+1),npoints) - X(k)*ones(1,npoints))/L;
  YL = [YL polyval(p,XLL)];
end
plot(X,Y,'o')
How to create a matrix by concatenating other matrices:
C=[A B] takes two matrices A and B and creates a new matrix C by concatenating A and B horizontally.
Other specialized matrix functions:
zeros(n,m) creates an n-by-m matrix filled with zeros
ones(n,m) creates an n-by-m matrix filled with ones
vander(X) creates the Vandermonde matrix corresponding to the points in the vector X. Note that the columns of the Vandermonde matrix are powers of the vector X.

Other useful functions and commands:
polyval(a,X) takes a vector X of x values and returns a vector containing the values of a polynomial p evaluated at the x values. The coefficients of the polynomial p (in descending powers) are the values in the vector a.
sin(X) takes a vector X of values x and returns a vector containing the values of the function sin x
plot(X,Y) plots vector Y versus vector X. Points are joined by a solid line. To change line types (solid, dashed, dotted, etc.) or plot symbols (point, circle, star, etc.), include an additional argument. For example, plot(X,Y,'o') plots the points as little circles.
This is a differential equation where the unknown quantity to be found is a function f(x). The functions q(x) and r(x) are given (known) functions. As differential equations go, this is a very simple one. For one thing, it is an ordinary differential equation (ODE), because it only involves one independent variable x. But the finite difference methods we will introduce can also be applied to partial differential equations (PDE).

It can be useful to have a picture in your head when thinking about an equation. Here is a situation where an equation like the one we are studying arises. Suppose we want to find the shape of a stretched hanging cable. The cable is suspended above the points x = 0 and x = 1 at heights of A and B respectively and hangs above the interval 0 ≤ x ≤ 1. Our goal is to find the height f(x) of the cable above the ground at every point x between 0 and 1.
[Figure: the hanging cable above the interval 0 ≤ x ≤ 1, suspended at heights A (at x = 0) and B (at x = 1).]
The loading of the cable is described by a function 2r(x) that takes into account both the weight of the cable and any additional load. Assume that this is a known function. The height function f(x) is the function that minimizes the sum of the stretching energy and the gravitational potential energy, given by
\[
E[f] = \int_0^1 \Big( \big(f'(x)\big)^2 + 2 r(x) f(x) \Big)\, dx,
\]
subject to the condition that f(0) = A and f(1) = B. An argument similar (but easier) to the one we did for splines shows that the minimizer satisfies the differential equation f″(x) = r(x). So we end up with the special case of our original equation where q(x) = 0. Actually, this special case can be solved by simply integrating twice and adjusting the constants of integration to ensure f(0) = A and f(1) = B. For example, when r(x) = r is constant and A = B = 1, the solution is f(x) = 1 − rx/2 + rx²/2. We can use this exact solution to compare against the approximate solution that we will compute.
I.3.2 Discretization
In the finite difference approach to solving differential equations approximately, we want to approximate a function by a vector containing a finite number of sample values. Pick equally spaced points xₖ = k/N, k = 0, ..., N between 0 and 1. We will represent a function f(x) by its values fₖ = f(xₖ) at these points. Let
\[
F = \begin{bmatrix} f_0 \\ f_1 \\ \vdots \\ f_N \end{bmatrix}.
\]
[Figure: a function f(x) together with its sampled values f0, ..., f8 at the points x0, ..., x8.]
At this point we throw away all the other information about the function, keeping only the values at the sampled points.
[Figure: the sampled values f0, ..., f8 alone, with the rest of the function discarded.]
If this is all we have to work with, what should we use as an approximation to f′(x)? It seems reasonable to use the slopes of the line segments joining our sampled points.
[Figure: the line segments joining successive sampled points; their slopes approximate f′(x).]
Notice, though, that there is one slope for every interval (xi , xi+1 ) so the vector containing the slopes has one fewer entry than the vector F. The formula for the slope in the interval
(xᵢ, xᵢ₊₁) is (fᵢ₊₁ − fᵢ)/Δx, where the distance Δx = xᵢ₊₁ − xᵢ (in this case Δx = 1/N). Thus the vector containing the slopes is
\[
F' = (\Delta x)^{-1}
\begin{bmatrix} f_1 - f_0 \\ f_2 - f_1 \\ f_3 - f_2 \\ \vdots \\ f_N - f_{N-1} \end{bmatrix}
= (\Delta x)^{-1}
\begin{bmatrix}
-1 & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & -1 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & -1 & 1 & \cdots & 0 & 0 \\
\vdots & & & \ddots & \ddots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & -1 & 1
\end{bmatrix}
\begin{bmatrix} f_0 \\ f_1 \\ f_2 \\ \vdots \\ f_N \end{bmatrix}
= (\Delta x)^{-1} D_N F,
\]
where D_N is the N × (N+1) finite difference matrix in the formula above. The vector F′ is our approximation to the first derivative function f′(x).
To approximate the second derivative f″(x), we repeat this process to define the vector F″. There will be one entry in this vector for each adjacent pair of slopes, that is, each adjacent pair of entries of F′. These are naturally labelled by the interior points x₁, x₂, ..., x_{N−1}. Thus we obtain
\[
F'' = (\Delta x)^{-2} D_{N-1} D_N F
= (\Delta x)^{-2}
\begin{bmatrix}
1 & -2 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & -2 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & & \ddots & \ddots & \ddots & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & -2 & 1
\end{bmatrix}
\begin{bmatrix} f_0 \\ f_1 \\ f_2 \\ \vdots \\ f_N \end{bmatrix}.
\]
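If you want to look at these finite difference matrices in MATLAB/Octave, one convenient way (an illustration, not part of the original example) is to build them with diff applied to the identity matrix:

>N=5;
>DN = diff(eye(N+1));    % the N by (N+1) matrix with rows [-1 1 0 ...]
>DN1 = diff(eye(N));     % the (N-1) by N matrix of the same form
>DN1*DN                  % each row has the pattern [1 -2 1]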
Let rₖ = r(xₖ) be the sampled points for the load function r(x) and define the vector approximation for r at the interior points
\[
r = \begin{bmatrix} r_1 \\ \vdots \\ r_{N-1} \end{bmatrix}.
\]
The reason we only define this vector for interior points is that that is where F″ is defined. Now we can write down the finite difference approximation to f″(x) = r(x) as (Δx)⁻² D_{N−1}D_N F = r, or
D_{N−1} D_N F = (Δx)² r.
This is a system of N−1 equations in N+1 unknowns. To get a unique solution, we need two more equations. That is where the boundary conditions come in! We have two boundary conditions, which in this case can simply be written as f₀ = A and f_N = B. Combining these
with the N−1 equations for the interior points, we may rewrite the system of equations as
\[
\begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & -2 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & -2 & 1 & \cdots & 0 & 0 & 0 \\
0 & 0 & 1 & -2 & \cdots & 0 & 0 & 0 \\
\vdots & & & \ddots & \ddots & \ddots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & -2 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{bmatrix} F
=
\begin{bmatrix} A \\ (\Delta x)^2 r_1 \\ (\Delta x)^2 r_2 \\ \vdots \\ (\Delta x)^2 r_{N-1} \\ B \end{bmatrix}.
\]
Note that it is possible to incorporate other types of boundary conditions by simply changing the first and last equations.
Let's define L to be the (N+1) × (N+1) matrix of coefficients for this equation, so that the equation has the form LF = b. The first thing to do is to verify that L is invertible, so that we know that there is a unique solution to the equation. It is not too difficult to compute the determinant if you recall that the elementary row operations that add a multiple of one row to another do not change the value of the determinant. Using only this type of elementary row operation, we can reduce L to an upper triangular matrix whose diagonal entries are 1, −2, −3/2, −4/3, −5/4, ..., −N/(N−1), 1. The determinant is the product of these entries, and this equals ±N. Since this value is not zero, the matrix L is invertible. It is worthwhile pointing out that a change in boundary conditions (for example, prescribing the values of the derivative f′(0) and f′(1) rather than f(0) and f(1)) results in a different matrix L that may fail to be invertible. We should also ask about the condition number of L to determine how large the relative error of the solution can be. We will compute this using MATLAB/Octave below.

Now let's use MATLAB/Octave to solve this equation. We will start with the test case where r(x) = 1 and A = B = 1. In this case we know that the exact solution is f(x) = 1 − x/2 + x²/2. We will work with N = 50. Notice that, except for the first and last rows, L has a constant value of −2 on the diagonal, and a constant value of 1 on the off-diagonals immediately above and below. Before proceeding, we introduce the MATLAB/Octave command diag. For any vector D, diag(D) is a diagonal matrix with the entries of D on the diagonal. So for example

>D=[1 2 3 4 5];
>diag(D)
ans =

   1   0   0   0   0
   0   2   0   0   0
   0   0   3   0   0
   0   0   0   4   0
   0   0   0   0   5
An optional second argument offsets the diagonal. So, for example

>D=[1 2 3 4];
>diag(D,1)
ans =

   0   1   0   0   0
   0   0   2   0   0
   0   0   0   3   0
   0   0   0   0   4
   0   0   0   0   0
>diag(D,-1)
ans =

   0   0   0   0   0
   1   0   0   0   0
   0   2   0   0   0
   0   0   3   0   0
   0   0   0   4   0
Now returning to our matrix L, we can define it as

>N=50;
>L=diag(-2*ones(1,N+1)) + diag(ones(1,N),1) + diag(ones(1,N),-1);
>L(1,1) = 1;
>L(1,2) = 0;
>L(N+1,N+1) = 1;
>L(N+1,N) = 0;

The condition number of L for N = 50 is
>cond(L)
ans = 1012.7

We will denote the right side of the equation by b. To start, we will define b to be (Δx)²r(xᵢ) and then adjust the first and last entries to account for the boundary values. Recall that r(x) is the constant function 1, so its sampled values are all 1 too.

>dx = 1/N;
>b=ones(N+1,1)*dx^2;
>A=1; B=1;
>b(1) = A;
>b(N+1) = B;

Now we solve the equation for F.

>F=L\b;

The x values are N+1 equally spaced points between 0 and 1,

>X=linspace(0,1,N+1);

Now we plot the result.

>plot(X,F)
[Figure: the computed solution F plotted against X.]
Let's superimpose the exact solution in red.

>hold on
>plot(X,ones(1,N+1)-X/2+X.^2/2,'r')

(The . before an operator tells MATLAB/Octave to apply that operator element by element, so X.^2 returns an array with each element the corresponding element of X squared.)
[Figure: the computed solution with the exact solution superimposed in red.]
The two curves are indistinguishable. What happens if we increase the load at a single point? Recall that we have set the loading function r(x) to be 1 everywhere. Let's increase it at just one point. Adding, say, 5 to one of the values of r is the same as adding 5(Δx)² to the right side b. So the following commands do the job. We are changing b₁₁, which corresponds to changing r(x) at x = 0.2.

>b(11) = b(11) + 5*dx^2;
>F=L\b;
>hold on
>plot(X,F);

Before looking at the plot, let's do this one more time, this time making the cable really heavy at the same point.
>b(11) = b(11) + 50*dx^2;
>F=L\b;
>hold on
>plot(X,F);

Here is the resulting plot.
[Figure: the computed solutions with the extra load added at x = 0.2.]
So far we have only considered the case of our equation f″(x) + q(x)f(x) = r(x) where q(x) = 0. What happens when we add the term containing q? We must sample the function q(x) at the interior points and add the corresponding vector. Since we multiplied the equations for the interior points by (Δx)², we must do the same to these terms. Thus we must add the term
\[
(\Delta x)^2
\begin{bmatrix} 0 \\ q_1 f_1 \\ q_2 f_2 \\ \vdots \\ q_{N-1} f_{N-1} \\ 0 \end{bmatrix}
= (\Delta x)^2
\begin{bmatrix}
0 & 0 & 0 & \cdots & 0 & 0 \\
0 & q_1 & 0 & \cdots & 0 & 0 \\
0 & 0 & q_2 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & q_{N-1} & 0 \\
0 & 0 & 0 & \cdots & 0 & 0
\end{bmatrix} F.
\]
In other words, we replace the matrix L in our equation with L + (Δx)²Q, where Q is the (N+1) × (N+1) diagonal matrix with the interior sampled points of q(x) on the diagonal.
I'll leave it to a homework problem to incorporate this change in a MATLAB/Octave calculation. One word of caution: the matrix L by itself is always invertible (with reasonable condition number). However L + (Δx)²Q may fail to be invertible. This reflects the fact that the original differential equation may fail to have a solution for some choices of q(x) and r(x).
\[
\frac{\partial T}{\partial t}(x,t) = \frac{K_0}{C\rho\,\Delta x}\left(\frac{\partial T}{\partial x}(x+\Delta x, t) - \frac{\partial T}{\partial x}(x, t)\right).
\]
Taking the limit as Δx goes to zero we obtain
\[
\frac{\partial T}{\partial t}(x,t) = k \frac{\partial^2 T}{\partial x^2}(x,t),
\]
where k = K₀/(Cρ) is a constant. This partial differential equation is known as the heat equation and describes how the temperature along a one-dimensional rod evolves.
We can also include other effects. If there were a temperature source or sink, S(x,t), then this will contribute to the local change in temperature:
\[
\frac{\partial T}{\partial t}(x,t) = k \frac{\partial^2 T}{\partial x^2}(x,t) + S(x,t).
\]
And if we also allow the rod to cool down along its length (because, say, the surrounding air is a different temperature than the rod), then the differential equation becomes
\[
\frac{\partial T}{\partial t}(x,t) = k \frac{\partial^2 T}{\partial x^2}(x,t) - HT(x,t) + S(x,t),
\]
where H is a constant (here we have assumed that the surrounding air temperature is zero). In certain cases we can think about what the steady state of the rod will be. That is, after a sufficiently long time (so that things have had plenty of time for the heat to move around and for things to heat up/cool down), the temperature will cease to change in time. Once this steady state is reached, things become independent of time, and the differential equation becomes
\[
0 = k \frac{d^2 T}{dx^2}(x) - HT(x) + S(x),
\]
which is of the same form as the ordinary differential equation that we considered at the start of this section.
Chapter II
Subspaces, Bases and Dimension
Write down a vector as a linear combination of a set of vectors.
Define linear independence for a collection of vectors.
Define a basis for a vector subspace.
Know the definitions of vector addition and scalar multiplication for vector spaces of functions.
Decide whether a given collection of vectors forms a subspace.
Recast the dependence or independence of a collection of vectors in Rⁿ or Cⁿ as a statement about existence of solutions to a system of linear equations.
Decide if a collection of vectors are dependent or independent.
Define the span of a collection of vectors; show that given a set of vectors v₁, ..., v_k the span span(v₁, ..., v_k) is a subspace.
Describe the significance of the two parts (independence and span) of the definition of a basis.
Check if a collection of vectors is a basis.
Show that any basis for a subspace has the same number of elements.
Show that any set of k linearly independent vectors v₁, ..., v_k in a k dimensional subspace S is a basis of S.
Define the dimension of a subspace.
In three dimensional space R3 , examples of subspaces are lines and planes through the origin. If we add or scalar multiply two vectors lying on the same line (or plane) the resulting vector remains on the same line (or plane). Additional examples of subspaces are the trivial subspace, containing the single vector 0, as well as the whole space itself.
Here is another example of a subspace. The set of n × n matrices can be thought of as an n²-dimensional vector space. Within this vector space, the set of symmetric matrices (satisfying Aᵀ = A) is a subspace. To see this, suppose A₁ and A₂ are symmetric. Then, using the linearity property of the transpose, we see that
(c₁A₁ + c₂A₂)ᵀ = c₁A₁ᵀ + c₂A₂ᵀ = c₁A₁ + c₂A₂,
which shows that c₁A₁ + c₂A₂ is symmetric too.

We have encountered subspaces of functions in the section on interpolation. In Lagrange interpolation we considered the set of all polynomials of degree at most m. This is a subspace of the space of functions, since adding two polynomials of degree at most m results in another polynomial, again of degree at most m, and scalar multiplication of a polynomial of degree at most m yields another polynomial of degree at most m. Another example of a subspace of functions is the set of all functions y(t) that satisfy the differential equation y″(t) + y(t) = 0. To check that this is a subspace, we must verify that if y₁(t) and y₂(t) both solve the differential equation, then so does c₁y₁(t) + c₂y₂(t) for any choice of scalars c₁ and c₂.
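This follows from the linearity of differentiation; writing it out,
\[
(c_1 y_1 + c_2 y_2)'' + (c_1 y_1 + c_2 y_2) = c_1\big(y_1'' + y_1\big) + c_2\big(y_2'' + y_2\big) = c_1 \cdot 0 + c_2 \cdot 0 = 0.
\]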
\[
\sum_{i=1}^{k} c_i v_i = c_1 v_1 + c_2 v_2 + \cdots + c_k v_k
\]
for some choice of numbers c₁, c₂, ..., c_k. The vectors v₁, ..., v_k are called linearly dependent if there exist numbers c₁, c₂, ..., c_k that are not all zero, such that the linear combination
\[
\sum_{i=1}^{k} c_i v_i = 0.
\]
On the other hand, the vectors are called linearly independent if the only linear combination of the vectors equaling zero has every cᵢ = 0. In other words,
\[
\sum_{i=1}^{k} c_i v_i = 0 \quad\text{implies}\quad c_1 = c_2 = \cdots = c_k = 0.
\]
For example, the vectors
\[
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 7 \\ 1 \\ 7 \end{bmatrix}
\]
are linearly dependent because
\[
1\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 6\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} - \begin{bmatrix} 7 \\ 1 \\ 7 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.
\]
If v₁, ..., v_k are linearly dependent, then at least one of the vᵢ's can be written as a linear combination of the others. To see this, suppose that c₁v₁ + c₂v₂ + ⋯ + c_kv_k = 0 with not all of the cᵢ's zero. Then we can solve for any of the vᵢ's whose coefficient cᵢ is not zero. For instance, if c₁ is not zero we can write
v₁ = −(c₂/c₁)v₂ − (c₃/c₁)v₃ − ⋯ − (c_k/c₁)v_k.
This means any linear combination we can make with the vectors v₁, ..., v_k can be achieved without using v₁, since we can simply replace the occurrence of v₁ with the expression on the right. Sometimes it helps to have a geometrical picture. In three dimensional space R³, three vectors are linearly dependent if they lie in the same plane.

The columns of a matrix in echelon form are linearly independent if and only if every column is a pivot column. We illustrate this with two examples.
Here is an example of a matrix in echelon form where each column is a pivot column:
\[
\begin{bmatrix} 1 & * & * \\ 0 & 2 & * \\ 0 & 0 & 3 \end{bmatrix},
\]
where * denotes an arbitrary entry. To see that the columns are linearly independent, suppose that
\[
c_1 \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + c_2 \begin{bmatrix} * \\ 2 \\ 0 \end{bmatrix} + c_3 \begin{bmatrix} * \\ * \\ 3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.
\]
Then, equating the bottom entries, we find 3c₃ = 0, so c₃ = 0. But once we know c₃ = 0, the equation reads
\[
c_1 \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + c_2 \begin{bmatrix} * \\ 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.
\]
The same reasoning now gives c₂ = 0 and then c₁ = 0, so the columns are independent.
Similarly, for a matrix in echelon form (even if, as in the example below, it is not completely reduced), the pivot columns are linearly independent. For example, the first, second and fifth columns in the matrix
\[
\begin{bmatrix}
1 & 1 & 1 & 1 & 0 \\
0 & 1 & 2 & 5 & 5 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]
are independent. However, the non-pivot columns can be written as linear combinations of the pivot columns. For example,
\[
\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}
= -\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},
\]
so if there are non-pivot columns, then the set of all columns is linearly dependent. This is particularly easy to see for a matrix in reduced row echelon form, like
\[
\begin{bmatrix}
1 & 0 & 1 & 1 & 0 \\
0 & 1 & 2 & 5 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}.
\]
In this case the pivot columns are standard basis vectors (see below), which are obviously independent. It is easy to express the other columns as linear combinations of these. Recall that for a matrix U in echelon form, the presence or absence of non-pivot columns determines whether the homogeneous equation Ux = 0 has any non-zero solutions. By the discussion above, we can say that the columns of a matrix U in echelon form are linearly dependent exactly when the homogeneous equation Ux = 0 has a non-zero solution. In fact, this is true for any matrix. Suppose that the vectors v₁, ..., v_k are the columns of a matrix A, so that
A = [v₁ | v₂ | ⋯ | v_k].
If we put the coefficients c₁, c₂, ..., c_k into a vector
\[
c = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_k \end{bmatrix},
\]
then Ac = c₁v₁ + c₂v₂ + ⋯ + c_kv_k is the linear combination of the columns v₁, ..., v_k with coefficients cᵢ. Now it follows directly from the definition of linear dependence that the columns of A are linearly dependent if there is a non-zero solution c to the homogeneous equation Ac = 0. On the other hand, if the only solution to the homogeneous equation is c = 0, then the columns v₁, ..., v_k are linearly independent.

To compute whether a given collection of vectors is dependent or independent, we can place them in the columns of a matrix A and reduce to echelon form. If the echelon form has only pivot columns, then the vectors are independent. On the other hand, if the echelon form has some non-pivot columns, then the equation Ac = 0 has some non-zero solutions and so the vectors are dependent. Let's try this with the vectors in the example above in MATLAB/Octave.

>V1=[1 1 1]';
>V2=[1 0 1]';
>V3=[7 1 7]';
>A=[V1 V2 V3]
A =

   1   1   7
   1   0   1
   1   1   7
>rref(A)
ans =

   1   0   1
   0   1   6
   0   0   0
Since the third column is a non-pivot column, the vectors are linearly dependent.
II.1.3 Span
Given a collection of vectors v₁, ..., v_k we may form a subspace of all possible linear combinations. This subspace is called span(v₁, ..., v_k), or the space spanned by the vᵢ's. It is a subspace because if we start with any two elements of span(v₁, ..., v_k), say c₁v₁ + c₂v₂ + ⋯ + c_kv_k and d₁v₁ + d₂v₂ + ⋯ + d_kv_k, then a linear combination of these linear combinations is again a linear combination, since
s₁(c₁v₁ + c₂v₂ + ⋯ + c_kv_k) + s₂(d₁v₁ + d₂v₂ + ⋯ + d_kv_k) = (s₁c₁ + s₂d₁)v₁ + (s₁c₂ + s₂d₂)v₂ + ⋯ + (s₁c_k + s₂d_k)v_k.
For example, the span of the three vectors
\[
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
\]
is the whole three dimensional space, because every vector is a linear combination of these. The span of the four vectors
\[
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
\]
is the same.
II.1.4 Basis
A collection of vectors v₁, ..., v_k contained in a subspace V is called a basis for that subspace if

1. span(v₁, ..., v_k) = V, and
2. v₁, ..., v_k are linearly independent.

Condition (1) says that any vector in V can be written as a linear combination of v₁, ..., v_k. Condition (2) says that there is exactly one way of doing this. Here is the argument. Suppose there are two ways of writing the same vector v ∈ V as a linear combination:
v = c₁v₁ + c₂v₂ + ⋯ + c_kv_k
v = d₁v₁ + d₂v₂ + ⋯ + d_kv_k.
Then by subtracting these equations, we obtain
0 = (c₁ − d₁)v₁ + (c₂ − d₂)v₂ + ⋯ + (c_k − d_k)v_k.
Linear independence now says that every coefficient in this sum must be zero. This implies c₁ = d₁, c₂ = d₂, ..., c_k = d_k.
Example: Rⁿ has the standard basis e₁, e₂, ..., eₙ where
\[
e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad
e_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad
e_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.
\]
Another basis for R² is
\[
\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ -1 \end{bmatrix}.
\]
To see this, notice that saying that any vector y can be written in a unique way as c₁[1; 1] + c₂[1; −1] is the same as saying that the equation
\[
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = y
\]
always has a unique solution. This is true.

A basis for the vector space P₂ of polynomials of degree at most two is given by {1, x, x²}. These polynomials clearly span P₂ since every polynomial p ∈ P₂ can be written as a linear combination p(x) = c₀·1 + c₁x + c₂x². To show independence, suppose that c₀·1 + c₁x + c₂x² is the zero polynomial. This means that c₀·1 + c₁x + c₂x² = 0 for every value of x. Taking the first and second derivatives of this equation yields that c₁ + 2c₂x = 0 and 2c₂ = 0 for every value of x. Substituting x = 0 into each of these equations, we find c₀ = c₁ = c₂ = 0.

Notice that if we represent the polynomial p(x) = c₀·1 + c₁x + c₂x² ∈ P₂ by the vector of coefficients [c₀; c₁; c₂] ∈ R³, then the vector space operations in P₂ are mirrored perfectly in R³. In other words, adding or scalar multiplying polynomials in P₂ is the same as adding or scalar multiplying the corresponding vectors of coefficients in R³. This sort of correspondence can be set up whenever we have a basis v₁, v₂, ..., v_k for a vector space V. In this case every vector v has a unique representation c₁v₁ + c₂v₂ + ⋯ + c_kv_k, and we can represent the vector v ∈ V by the vector [c₁; c₂; ...; c_k] ∈ Rᵏ (or Cᵏ). In some sense this says that we can always think of finite dimensional vector spaces as being copies of Rⁿ or Cⁿ. The only catch is that the correspondence that gets set up between vectors in V and vectors in Rⁿ or Cⁿ depends on the choice of basis.
It is intuitively clear that, say, a plane in three dimensions will always have a basis of two vectors. Here is an argument that shows that any two bases for a subspace S of Rk or Ck will always have the same number of elements. Let v1 , . . . , vn and w1 , . . . , wm be two bases
58
II.1 Subspaces, basis and dimension for a subspace S. Lets try to show that n must be the same as m. Since the vi s span V we can write each wi as a linear combination of vi s. We write
n
wj =
i=1
ai,j vi
for each j = 1, . . . , m. Lets put all the coecients into an n m matrix A = [ai,j ]. If we form the matrix k m matrix W = [w1 |w2 | |wm ] and the k n matrix V = [v1 |v2 | |vm ] then the equation above can be rewritten W =VA 1 1 1 4 0 , 1 and 2 , 2 To understand this construction consider the two bases 1 0 6 1 for subspace in R3 (in fact this subspace is the plane through the origin with normal vector a 1 1). Then we may write 1 4 1 1 2 = 6 0 2 1 6 1 0 1 1 1 2 = 0 + 2 1 1 1 0 and the equation W = V A for this example reads 4 1 1 1 2 2 = 0 1 6 1 2 2 6 1 1 0 Returning now to the general case, suppose that m > n. Then A has more columns than rows. So its echelon form must have some non-pivot columns which implies that there must be some non-zero solution to Ac = 0. Let c = 0 be such a solution. Then W c = V Ac = 0 But this is impossible because the columns of W are linearly dependent. So it cant be true that m > n. Reversing the roles of V and W we nd that n > m is impossible too. So it must be that m = n. We have shown that any basis for a subspace S has the same number of elements. Thus it makes sense to dene the dimension of a subspace S to be the number of elements in any
59
II Subspaces, Bases and Dimension basis for S. Here is one last fact about bases: any set of k linearly independent vectors {v1 , . . . , vk } in a k dimensional subspace S automatically spans S and is therefore a basis. To see this (in the case that S is a subspace of Rn or Cn ) we let {w1 , . . . , wk } be a basis for S, which also will have k elements. Form V = [v1 | |vk ] and W = [w1 | |wk ]. Then the construction above gives V = W A for a k k matrix A. The matrix A must be invertible. Otherwise there would be a non-zero solution c to Ac = 0. This would imply V c = W Ac = 0 contradicting the independence of the rows of V . Thus we can write W = V A1 which shows that every wk is a linear combination of vi s. This shows that the vi s must span S because every vector in S is a linear combination of the basis vectors wk s which in turn are linear combinations of the vi s. As an example of this, consider again the space P2 of polynomials of degree at most 2. We claim that the polynomials {1, (x a), (x a)2 } (for any constant a) form a basis. We already know that the dimension of this space is 3, so we only need to show that these three polynomials are independent. The argument for that is almost the same as before.
60
II.2 The four fundamental subspaces for a matrix Be able to compute the rank of a matrix. State the formulas for the dimension of each of the four subspaces and be able to explain why they are true. Explain what it means for two subspaces to be orthogonal (V W ) and for one subspace to be the orthogonal complement of another (V = W ). State which of the fundamental subspaces are orthogonal to each other and explain why, verify the orthogonality relations in examples, and use the orthogonality relation for R(A) to test whether the equation Ax = b has a solution. Use MATLAB/Octave to compute the inner product of two vectors and the angle between them. Be familiar with the MATLAB/Octave function eye().
There are two important subspaces associated to any matrix. Let A be an n m matrix. If x is m dimensional, then Ax makes sense and is a vector in n dimensional space. The rst subspace associated to A is the nullspace (or kernel) of A denoted N (A) (or Ker(A)). It is dened as all vectors x solving the homogeneous equation for A, that is N (A) = {x : Ax = 0} This is a subspace because if Ax1 = 0 and Ax2 = 0 then A(c1 x1 + c2 x2 ) = c1 Ax1 + c2 Ax2 = 0 + 0 = 0. The nullspace is a subspace of m dimensional space Rm . The second subspace is the range (or column space) of A denoted R(A) (or C(A)). It is dened as all vectors of the form Ax for some x. From our discussion above, we see that R(A) is the the span (or set of all possible linear combinations) of its columns. This explains the name column space. The range is a subspace of n dimensional space Rn . The four fundamental subspaces for a matrix are the nullspace N (A) and range R(A) for A together with the nullspace N (AT ) and range R(AT ) for the transpose AT .
61
To calculate a basis for the nullspace N (A) and determine its dimension we need to nd the solutions to Ax = 0. To do this we rst reduce A to reduced row echelon form U and solve U x = 0 instead, since this has the same solutions as the original equation. >A=[1 3 3 10;2 6 -1 -1;1 3 1 4]; >rref(A) ans = 1 0 0 3 0 0 0 1 0 1 3 0
We now divide the variables into basic variables, corresponding to pivot columns, and free variables, corresponding to non-pivot columns. In this example the basic variables are x1 and x3 while the free variables are x2 and x4 . The free variables are the parameters in the solution. We can solve for the basic variables in terms of the free ones, giving x3 = 3x4 and x1 = 3x2 x4 . This leads to x1 3x2 x4 3 1 x2 1 0 x2 = x3 3x4 = x2 0 + x4 3 x4 x4 0 1 3 1 1 0 The vectors and span the nullspace since every element of N (A) is a linear 0 3 0 1 combination of them. They are also linearly independent because if the linear combination
62
II.2 The four fundamental subspaces for a matrix on the right of the equation above is zero, then by looking at the second entry of the vector (corresponding to the rst free variable) we nd x2 = 0 and looking at the last entry (corresponding to the second free variable) we nd x4 = 0. So both coecients must be zero. To nd a basis for N (A) in general we rst compute U = rref(A) and determine which variables are basic and which are free. For each free variable we form a vector as follows. First put a 1 in the position corresponding to that free variable and a zero in every other free variable position. Then ll in the rest of the vector in such a way that U x = 0. (This is easy to do!) The set all such vectors - one for each free variable - is a basis for N (A).
This immediately explains why the N (A) = N (U ), because if Ax = 0 then U x = E 1 Ax = 0 and conversely if Ax = 0 then U x = EAx = 0. What is this matrix E? It can be thought of as a matrix record of the Gaussian elimination steps taken to reduce A to U . It turns out performing an elementary row operation is the same as multiplying on the left by an invertible square matrix. This invertible square matrix, called an elementary matrix, is obtained by doing the row operation in question to the identity matrix. Suppose we start with the matrix >A=[1 3 3 10;2 6 -1 -1;1 3 1 4] A = 1 2 1 3 6 3 3 -1 1 10 -1 4
The rst elementary row operation that we want to do is to subtract twice the rst row from the second row. Lets do this to the 3 3 identity matrix I (obtained with eye(3) in MATLAB/Octave) and call the result E1 >E1 = eye(3) E1 = 1 0 0 0 1 0 0 0 1
63
which is the result of doing that elementary row operation to A. Lets do one more step. The second row operation we want to do is to subtract the rst row from the third. Thus we dene
>E2 = eye(3) E2 = 1 0 0 0 1 0 0 0 1
>E2(3,:) = E2(3,:)-E2(1,:) E2 = 1 0 -1 0 1 0 0 0 1
and we nd
64
II.2 The four fundamental subspaces for a matrix >E2*E1*A ans = 1 0 0 3 0 0 3 -7 -2 10 -21 -6
which is one step further along in the Gaussian elimination process. Continuing in this way until we eventually arrive at U so that Ek Ek1 E2 E1 A = U
1 1 1 1 Thus A = EU with E = E1 E2 Ek1 Ek . For the example above it turns out that
1 3 6 E = 2 1 18 1 1 9
>U=rref(A) U = 1 0 0 3 0 0 0 1 0 1 3 0
65
II Subspaces, Bases and Dimension If we do a partial elimination then at each step we can write A = E U where U is the resulting matrix at the point we stopped, and E is obtained from the Gaussian elimination step up to that point. A common place to stop is when U is in echelon form, but the entries above the pivots have not been set to zero. If we can achieve this without doing any row swaps along the way then E turns out to be lower triangular matrix. Since U is upper triangular, this is called the LU decomposition of A.
66
The nal subspace to consider is N (AT ). From our work above we know that dim(N (AT )) = n dim(R(AT )) = n r(A). Finding a basis is trickier. It might be easiest to nd the reduced row echelon form of AT . But if we insist on using A = EU or AT = U T E T we could proceed by multiplying on the right be the inverse of E T . This gives AT (E T )1 = U T Now notice that the last n r(A) columns of U T are zero, since U is in reduced row echelon form. So the last n r(A) columns of (E T )1 are in the the nullspace of AT . They also have to be independent, since (E T )1 is invertible. Thus the last n r(A) of (E T )1 form a basis for N (AT ). From a practical point of view, this is not so useful since we have to compute the inverse of a matrix. It might be just as easy to reduce AT . (Actually, things are slightly better if we use the LU decomposition. The same argument shows that the last n r(A) columns of (LT )1 also form a basis for N (AT ). But LT is an upper triangular matrix, so its inverse is faster to compute.)
67
xT y = x1 x2
xn
xi yi
i=1
Some important properties of the inner product are symmetry xy =yx and linearity (c1 x1 + c2 x2 ) y = c1 x1 y + c2 x2 y. The (Euclidean) norm, or length, of a vector is given by x =
n
xx =
x2 i
i=1
An important property of the norm is that x = 0 implies that x = 0. The geometrical meaning of the inner product is given by x y = x y cos() where is the angle between the vectors. The angle can take values from 0 to . The CauchySchwarz inequality states |x y| x y . It follows from the previous formula because | cos()| 1. The only time that equality occurs in the CauchySchwarz inequality, that is x y = x y , is when cos() = 1 and is either 0 or . This means that the vectors are pointed in the same or in the opposite directions.
68
II.2 The four fundamental subspaces for a matrix The vectors x and y are orthogonal if x y = 0. Geometrically this means either that one of the vectors is zero or that they are at right angles. This follows from the formula above, since cos() = 0 implies = /2. Another way to see that x y = 0 means that vectors are orthogonal is from Pythagoras formula. If x and y are at right angles then x 2 + y 2 = x + y 2 .
x+y y x
+ y
To compute the inner product of (column) vectors X and Y in MATLAB/Octave we use the formula x y = xT y. Thus the inner product can be computed using X*Y. (If X and Y are row vectors, the formula is X*Y.) The norm of a vector X is computed by norm(X). In MATLAB/Octave inverse trig functions are computed with asin(), acos() etc. So the angle between column vectors X and Y could be computed as
> acos(X*Y/(norm(X)*norm(Y)))
Two subspaces V and W are said to be orthogonal if every vector in V is orthogonal to every vector in W . In this case we write V W .
T S
69
II Subspaces, Bases and Dimension In this gure V W and also S T . A related concept is the orthogonal complement. The orthogonal complement of V , denoted V is the subspace containing all vectors orthogonal to V . In the gure W = V but T = S since T contains only some of the vectors orthogonal to S.
,
If we take the orthogonal complement of V we get back the original space V : This is certainly plausible from the pictures. It is also obvious that V (V ) , since any vector in V is perpendicular to vectors in V . If there were a vector in (V ) not contained in V we could subtract its projection onto V (dened in the next chapter) and end up with a non-zero vector in (V ) that is also in V . Such a vector would be orthogonal to itself, which is impossible. This shows that (V ) = V. One consequence of this formula is that V = W implies V = W . Just take the orthogonal complement of both sides and use (W ) = W .
70
II.2 The four fundamental subspaces for a matrix vector of the form AT y. So (AT y) x = y (Ax) = 0 for every y. Pick y = Ax. Then (Ax) (Ax) = Ax 2 = 0. This implies Ax = 0 so x N (A). We can conclude that R(AT ) N (A). These two inclusions establish that N (A) = R(AT ) . Lets verify these orthogonality relations in 1 1 A= 2 an example. Let 2 1 1 3 0 1 5 1 2 1 0 rref(AT ) = 0 0 0 1 0 0 1 1 0 0
Then
1 3 1 0 N (A) = span , 0 1 0 1 2 1 1 , 3 R(A) = span 2 5 1 T N (A ) = span 1 1 0 1 T 0 , 1 R(A ) = span 1 3 1 0 We can now verify directly that every vector in the basis for N (A) is orthogonal to every vector in the basis for R(AT ), and similarly for N (AT ) and R(A). Does the equation 2 Ax = 1 3
have a solution? We can use the ideas above to answer this question easily. We are really 2 1 is contained in R(A). But, according to the orthogonality relations, this asking whether 3 71
II Subspaces, Bases and Dimension 2 1 is contained in N (AT ) . This is easy to check. Simply is the same as asking whether 3 compute the dot product 2 1 1 1 = 2 1 + 3 = 0. 3 1
72
State Ohms law for a resistor. State Kirchhos laws for a resistor network. After completing this section, you should be able to
Be able to write down the incidence matrix of a directed graph, and to draw the graph given the incidence matrix. Dene the Laplace operator or Laplacian for a graph and be able to write it down. When the edges of a graph represent resistors or batteries in a circuit, you should be able to interpret each of the four subspaces associated with the incidence matrix and their dimensions in terms of voltage and current vectors, and verify their orthogonality relations. write down Ohms law for all the edges of the graph in matrix form using the Laplacian. express the connection between Kirchos law and the nullspace of the Laplacian. use the voltage-to-current map to calculate the voltages and currents in the network when a battery is attached . use the voltage-to-current map to calculate the eective resistance between two nodes in the network. Re-order rows and columns of a matrix and extract submatrices in MATLAB/Octave.
73
1 2 2 4 3
Graphs come up in many applications. For example, the nodes could represent computers and the arrows internet connections. Or the nodes could be factories and the arrows represent movement of goods. We will mostly focus on a single interpretation where the edges represent resistors or batteries hooked up in a circuit. In this interpretation we will be assigning a number to each edge to indicate the amount of current owing through that edge. This number can be positive or negative. The arrows indicate the direction associated to a positive current. The incidence matrix of a graph is an n m matrix, where n is the number of edges and m is the number of vertices. We label the rows by the edges in the graph and the columns by the vertices. Each row of the matrix corresponds to an edge in the graph. It has a 1 in the place corresponding to the vertex where the arrow starts and a 1 in the place corresponding to the vertex where the arrow ends. Here is the incidence matrix for the illustrated graph. 1 2 3 4
1 1 1 0 0 2 0 1 1 0 0 3 0 1 1 4 0 1 0 1 5 1 0 0 1 The columns of the matrix have the following interpretation. The column representing a given vertex has a +1 for each arrow coming in to that vertex and a 1 for each arrow leaving the vertex. Given an incidence matrix, the corresponding graph can easily be drawn. What is the graph for 1 1 0 0 1 1? 1 0 1 (Answer: a triangular loop.)
74
For a graph that has several disconnected pieces, Dv = 0 will force v to be constant on each connected component of the graph. Each connected component will contribute one basis vector to N (D). This is the vector that is equal to 1 on that component and zero everywhere else. Thus dim(N (D)) will be equal to the number of disconnected pieces in the graph.
75
II Subspaces, Bases and Dimension b1 b2 example the edges 1 , 4 , 5 form a loop, so if b = b3 is a voltage dierence vector then b4 b5 v2 v1 v3 v2 b1 + b4 + b5 = 0 We can check this directly in the example. Since b = Dv = v4 v3 v4 v2 v1 v4 we check that (v2 v1 ) + (v4 v2 ) + (v1 v4 ) = 0. In the example graph there are three loops, namely 1 , 4 , 5 and 2 , 3 , 4 and 1 , 2 , 3 , 5 . The corresponding equations that the components of a vector b must satisfy to be in the range of D are b1 + b4 + b5 = 0 b1 + b2 + b3 + b5 = 0 b2 + b3 b4 = 0
Notice the minus sign in the second equation corresponding to a backwards arrow. However these equations are not all independent, since the third is obtained by adding the rst two. There are two independent equations that the components of b must satisfy. Since R(D) is 3 dimensional, there can be no additional constraints. y1 y2 Now we wish to nd interpretations for the null space and the range of DT . Let y = y3 y4 y5 5 which we interpret as being an assignment of a current to each edge in be a vector in R y5 y1 y y2 y4 the graph. Then D T y = 1 y2 y3 . This vector assigns to each node the amount of y3 + y4 y5 current collecting at that node.
II.3 Graphs and Networks vector lies in the null space of D T : 1 0 1 0 0 0 1 1 1 0 1 0 0 0 0 = 0 1 1 0 0 0 1 0 0 0 1 1 1 1 0 1 1 1 The current vectors corresponding to the other two loops are 1 and 1. However 1 0 0 1 these three vectors are not linearly independent. Any choice of two of these vectors are independent, and form a basis.
x1 x This is the set of vectors in R4 of the form 2 = D T y. With our interpretation these are x3 x4 vectors which measure how the currents in y are building up or draining away from each node. Since the current that is building up at one node must have come from some other nodes, it must be that x1 + x2 + x3 + x4 = 0 In our example, this can be checked directly. This one condition in R4 results in a three dimensional subspace.
77
R1 2
R5 R 4
R2
3 4 R3
If we begin by an assignment of voltage to every node, and put these numbers in a vector v R4 . Then Dv R5 represents the vector of voltage dierences for each of the edges. Given the resistance Ri for each edge, we can now invoke Ohms law to compute the current owing through each edge. For each edge, Ohms law states that (V )i = ji Ri , where (V )i is the voltage drop across the edge, ji is the current owing through that edge, and Ri is the resistance. Solving for the current we obtain
1 ji = Ri (V )i .
Notice that the voltage drop (V )i in this formula is exactly the ith component of the vector Dv. So if we collect all the currents owing along each edge in a vector j indexed by the edges, then Ohms law for all the edges can be written as j = R1 Dv where R1 0 0 0 0 0 R2 0 0 0 R=0 0 R3 0 0 0 0 0 R4 0 0 0 0 0 R5
78
II.3 Graphs and Networks Finally, if we multiply j by the matrix DT the resulting vector J = DT j = D T R1 Dv has one entry for each node, representing the total current owing in or out of that node along the edges that connect to it. The matrix L = D T R1 D appearing in this formula is called the Laplacian. It is similar to the second derivative matrix that appeared when we studied nite dierence approximations. One important property of the Laplacian is symmetry, that is the fact that LT = L. To see this recall that the transpose of a product of matrices is the product of the transposes in reverse order ((ABC)T = C T B T AT ). This implies that LT = (D T R1 D)T = DT R1 D = L Here we used that D T
T T
Lets determine the entries of L. To start we consider the case where all the resistances have the same value 1 so that R = R1 = I. In this case L = DT D. Lets start with the example graph above. Then 1 1 0 0 1 0 0 0 1 2 1 0 1 0 1 1 0 1 1 0 1 0 1 3 1 1 0 L= 0 1 1 = 0 1 0 1 1 0 0 2 1 0 1 0 1 0 0 1 1 1 1 1 1 3 1 0 0 1
Notice that the ith diagonal entry is the total number of edges connected to the ith node. The i, j entry is 1 if the ith node is connected to the jth node, and 0 otherwise. This pattern describes the Laplacian L for any graph. To see this, write D = [d1 |d2 |d3 | |dm ] Then the i, j entry of DT D is dT dj . Recall that di has an entry of 1 for every edge leaving i the ith node, and a 1 for every edge coming in. So dT di , the diagonal entries of DT D, are i the sum of (1)2 , with one term for each edge connected to the ith node. This sum gives the total number of edges connected to the ith node. To see this in the example graph, lets consider the rst node. This node has two edges connected to it and 1 0 d1 = 0 0 1 79
II Subspaces, Bases and Dimension Thus the 1, 1 entry of the Laplacian is dT d1 = (1)2 + 12 = 2 1 On the other hand, if i = j then the vectors di and dj have a non-zero entry in the same position only if one of the edges leaving the ith node is coming in to the jth node or vice versa. For a graph with at most one edge connecting any two nodes (we usually assume this) this means that dT dj will equal 1 if the ith and jth nodes are connected by an edge, and i zero otherwise. For example, in the graph above the rst edge leaves the rst node, so that d1 has a 1 in the rst position. This rst edge comes in to the second node so d2 has a +1 in the rst position. Otherwise, there is no overlap in these vectors, since no other edges touch both these nodes. Thus 1 1 dT d2 = 1 0 0 0 1 0 = 1 1 1 0 What happens if the resistances are not all equal to one? In this case we must replace D with R1 D in the calculation above. This multiplies the kth row of D with k = 1/Rk . Making this change in the calculations above leads to the following prescription for calculating the entries of L. The diagonal entries are given by Li,i =
k
Where the sum goes over all edges touching the ith node. When i = j then Li,j = k 0 if nodes i and j are connected with edge k if nodes i and j are not connected
80
II.3 Graphs and Networks To see N (L) = N (D) we start with a vector v N (D). Then Dv = 0 implies Lv = D T R1 Dv = DT R1 0 = 0. This show that v N (L) too, that is, N (D) N (L) To show the opposite inclusion we rst note that the matrix R1 can be factored into a product of invertible matrices R1 = R1/2 R1/2 where R1/2 is the diagonal matrix with diagonal entries 1/ Ri . This is possible because each Ri is a positive number. Also, since R1/2 is a diagonal matrix it is equal to its transpose, that is, R1/2 = (R1/2 )T .
Now suppose that Lv = 0. This can be written D T (R1/2 )T R1/2 Dv = 0. Now we multiply on the left with vT . This gives vT DT (R1/2 )T R1/2 Dv = (R1/2 Dv)T R1/2 Dv = 0 But for any vector w, the number wT w is the dot product of w with itself which is equal to the length of w squared. Thus the equation above can be written R1/2 Dv
2
=0
This implies that R1/2 Dv = 0. Finally, since R1/2 is invertible, this yields Dv = 0. We have shown that any vector in N (L) also is contained in N (D). Thus N (L) N (D) and together with the previous inclusion this yields N (L) = N (D).
1 R1 R5 R 4 3 4 R3 2 R2
The terminals of a battery are kept at a xed voltage. Thus the voltages v1 and v2 are now known, say, v1 = b1 v2 = b2
81
II Subspaces, Bases and Dimension Of course, it is only voltage dierences that have physical meaning, so we could set b1 = 0. Then b2 would be the voltage of the battery. At the rst and second nodes there now will be current owing in and out from the battery. Lets call these currents J1 and J2 . At all the other nodes the total current owing in and out is still zero, as before. How are the equations for the circuit modied? For simplicity lets set all the resistances Ri = 1. The new equations are 2 1 0 1 b1 J1 1 b2 J2 3 1 1 = 0 1 2 1 v3 0 1 1 1 3 v4 0
Two of the voltages v1 and v2 have changed their role in these equations from being unknowns to being knowns. On the other hand, the rst two currents, which were originally known quantities (namely zero) are now unknowns. Since the current owing into the network should equal the current owing out, we expect J1 J2 that J1 = J2 . This follows from the orthogonality relations for L. The vector is 0 0 contained in R(L). But R(L) = N (LT ) = N (L) (since L = LT ). But we know that N (L) consists of all constant vectors. Hence J1 1 J2 1 = J1 + J2 = 0 0 1 1 0 To solve this system of equations we write it in block matrix form A BT B C b J = v 0 0 1 1 1 J= J1 J2 2 1 1 3 0 0
where A= and
2 1 1 3 b1 b2
B= v3 v4
C=
b=
v=
0=
82
II.3 Graphs and Networks We can solve the second equation for v. Since C is invertible v = C 1 Bb Using this value of v in the rst equation yields J = (A B T C 1 B)b The matrix A B T C 1 B is the voltage-to-current map. In our example A B T C 1 B = (8/5) 1 1 1 1
This can be deduced from two facts: (i) AB T C 1 B is symmetric and (ii) R(AB T C 1 B) = 1 span . You are asked to carry this out in a homework problem. 1 Notice that this form of the matrix implies that if b1 = b2 then the currents are zero. b Another way of seeing this is to notice that if b1 = b2 then 1 is orthogonal to the range b2 of A B T C 1 B by (ii) and hence in the nullspace N (A B T C 1 B). The number R= 1
is the ratio of the applied voltage to the resulting current, is the eective resistance of the network between the two nodes. So in our example circuit, the eective resistance between nodes 1 and 2 is 5/8. If the battery voltages are b1 = 0 and b2 = b then the voltages at the remaining nodes are v3 0 4/5 = C 1 B = b v4 b 3/5
83
1 R1 2 R2 3
with i = 1/Ri , as always. We want the eective resistance between nodes 1 and 3. Although it is not strictly necessary, it is easier to see what the submatrices A, B and C are if we reorder the vertices so that the ones we are connecting, namely 1 and 3, come rst. This reshues the rows and columns of L yielding 1 3 2
Here we have labelled the re-ordered rows and columns with the nodes they represent. Now the desired submatrices are A= and A B T C 1 B = 1 0 0 2 B = 1 2 C = 1 + 2
1 1 0 1 3 0 2 2 2 1 2 1 + 2
2 1 2 1 1 1 2 1 1 1 0 = 2 1 0 2 1 + 2 1 2 2 1 + 2 1
84
8 4 3 5 1 2
We will use MATLAB/Octave to solve this problem. To begin we dene the Laplace matrix L. Since each node has three edges connecting it, and all the resistances are 1, the diagonal entries are all 3. The o-diagonal entries are 1 or 0, depending on whether the corresponding nodes are connected or not. >L=[3 -1 0 -1 0 -1 3 -1 0 0 -1 0 0 0 3 -1 0 0 -1 0 0 -1 -1 0 0 0;-1 3 -1 -1 0;-1 0 -1 3 0 0 -1;0 -1 0 0 -1 3 -1;0 0 0 -1 -1 0 0 3 0 0 -1 0 0; 0 -1; -1 0; -1 3];
We want to nd the eective resistance between 1 and 7. To compute the submatrices A, B and C it is convenient to re-order the nodes so that 1 and 7 come rst. In MATLAB/Octave, this can be achieved with the following statement. >L=L([1,7,2:6,8],[1,7,2:6,8]); In this statement the entries in the rst bracket [1,7,2:6,8] indicates the new ordering of the rows. Here 2:6 stands for 2,3,4,5,6. The second bracket indicates the re-ordering of the columns, which is the same as for the rows in our case. Now it is easy to extract the submatrices A, B and C and compute the voltage-to-current map DN >N = length(L); >A = L(1:2,1:2); >B = L(3:N,1:2); >C = L(3:N,3:N); >DN = A - B*C^(-1)*B;
85
II Subspaces, Bases and Dimension The eective resistance is the reciprocal of the rst entry in DN. The command format rat gives the answer in rational form. (Note: this is just a rational approximation to the oating point answer, not an exact rational arithmetic as in Maple or Mathematica.) >format rat >R = 1/DN(1,1) R = 5/6
86
Chapter III
Orthogonality
87
III Orthogonality
III.1 Projections
Prerequisites and Learning Goals
After completing this section, you should be able to
write down the denition of an orthogonal projection matrix use propeties of a projection matrix to deduce facts like the orthogonality of the null space and range. compute the orthogonal projection matrix whose range in the span of a given collection of vectors. use orthogonal projection matrices to decompose a vector in to components parallel to and perpendicular to a given subspace. use least squares to compute approximate solutions to systems of equations with no solutions. perform least squares calculations in applications where overdetermined systems arise.
Lets review why this formula is true, using properties of the dot product. Here is a diagram of the situation.
88
To get the vector p start with the unit vector a/ a and stretch it by an amount p . This gives 1 a = aaT x p= p a a 2 This can be written p = P x where P is the projection matrix above. Notice that the matrix P satises P 2 = P since P2 = In addition, P T = P since PT = 1 1 1 (aaT )T = (aT )T AT = aaT = P 2 2 a a a 2 1 1 1 aaT aaT = a a 2 aT aaT = P 4 4 a a a 2
We will discuss the signicance of the two properties P 2 = P and P T = P below. 1 1 1 in the direction of a = 2? Lets calculate Example: What is the projection of x = 1 1 2 = P and P T = P . the projection matrix P and compute P x and verify that P >x = [1 1 1]; >a = [1 2 -1]; >P = (a*a)^(-1)*a*a P = 0.16667 0.33333 -0.16667 >P*x ans = 0.33333 0.66667 -0.33333 >P*P 0.33333 0.66667 -0.33333 -0.16667 -0.33333 0.16667
89
III Orthogonality
ans = 0.16667 0.33333 -0.16667 0.33333 0.66667 -0.33333 -0.16667 -0.33333 0.16667
x p q
Thus we can write q = x P x = (I P )x = Qx where Notice that, like the matrix P , the matrix Q also satises Q2 = Q and QT = Q since Q2 = (I P )(I P ) = I 2P + P 2 = I P = Q and QT = I T P T = I P = Q. Continuing with the example above, if we want to compute the projection matrix onto the plane perpendicular to a we compute Q = I P . Then Qx is the projection of x onto the plane. We can also check that Q2 = Q. > Q = eye(3) - P Q = 0.83333 -0.33333 0.16667 -0.33333 0.33333 0.33333 0.16667 0.33333 0.83333 Q=I P
90
III.1 Projections
>Q*x ans = 0.66667 0.33333 1.33333 >Q^2 ans = 0.83333 -0.33333 0.16667 -0.33333 0.33333 0.33333 0.16667 0.33333 0.83333
= (P a + Qb) (P a + Qb) = P a
+ Qb
91
III Orthogonality since the cross terms vanish. Let P be an orthogonal projection. Lets show that given any vector x, the vector P x is the vector in R(P ) that is closest to x. First we compute the square of the distance from P x to x. This is given by P x x 2 = Qx 2 Now let P y be any other vector in the range R(P ). Then the square of the distance from P y to x is Py x
2
= P y (P + Q)x
= P (y x) + Qx
= P (y x)
+ Qx
We want to determine x so that Ax is as close as possible to b. In other words, we want to minimize Ax b . This will happen when Ax is the projection of b onto R(A), that is, Ax = P b, where P is the projection matrix. In this case Qb = (I P )b is orthogonal to R(A). But (I P )b = b Ax. Therefore (and this is also clear from the picture), we see that Ax b is orthogonal to R(A). But the vectors orthogonal to R(A) are exactly the vectors in N (AT ). Thus the vector we are looking for will satisfy AT (Ax b) = 0 or the equation AT Ax = AT b This is the least squares equation, and a solution to this equation is called a least squares solution. (Aside: We can also use Calculus to derive the least squares equation. We want to minimize Ax b 2 . Computing the gradient and setting it to zero results in the same equations.)
92
III.1 Projections It turns out that the least squares equation always has a solution. Another way of saying this is R(AT ) = R(AT A). Instead of checking this, we can verify that the orthogonal complements N (A) and N (AT A) are the same. But this is something we showed before, when we considered the incidence matrix D for a graph. If x solves the least squares equation, the vector Ax is the projection of b onto the range R(A), since Ax is the closest vector to x in the range of A. In the case where AT A is invertible (this happens when N (A) = N (AT A) = {0}), we can obtain a formula for the projection. Starting with the least squares equation we multiply by (AT A)1 to obtain x = (AT A)1 AT b so that Ax = A(AT A)1 AT b. Thus the projection matrix is given by P = A(AT A)1 AT Notice that the formula for the projection onto a line through a is a special case of this, since then AT A = a 2 . It is worthwhile pointing out that if we say that the solution of the least squares equation gives the best approximation to a solution, what we really mean is that it minimizes the distance, or equivalently, its square Ax b
2
((Ax)i bi )2 .
There are other ways of measuring how far Ax is from b, for example the so-called L1 norm Ax b
1
|(Ax)i bi |
Minimizing the L1 norm will result in a dierent best solution, that may be preferable under some circumstances. However, it is much more dicult to nd!
III.1.4 Polynomial t
Suppose we have some data points (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ) and we want to t a polynomial p(x) = a1 xm1 + a2 xm2 + + am1 x + am through them. This is like the Lagrange interpolation problem we considered before, except that now we assume that n > m. This means that in general there will no such polynomial. However we can look for the least squares solution. To begin, lets write down the equations that express the desired equalities p(xi ) = yi for i = 1, . . . m. These can be written in matrix form
93
III Orthogonality
or Aa = y. where A is a submatrix of the Vandermonde matrix. To nd the least squares approximation we solve AT Aa = AT y. In a homework problem, you are asked to do this using MATLAB/Octave. In case where the polynomial has degree one this is a straight line t, and the equation we want to solve are x1 1 y1 x2 1 y2 a1 = . . . a2 . . . xn 1 yn
y1 x1 1 y2 x2 1 a . . 1 . . . a2 . . . . . . . = . . . . . . . . . . . . am . . . . . . yn xn 1
These equations will not have a solution (unless the points really do happen to lie on the same line.) To nd the least squares solution, we compute x1 1 x1 x2 xn x2 1 xi x2 i . = 1 1 1 . xi n . xn 1 and x1 x2 1 1 This results in the familiar equations x2 i xi which are easily solved. y1 xn y2 .= 1 . . yn xi n a1 = a2 xi y i yi
xi yi yi
94
III.1 Projections dierence in scores is the dierence in values. So, if the seasons games had the following results 1 vs. 2 30 40 1 vs. 2 20 40 2 vs. 3 10 0 3 vs. 1 0 5 3 vs. 2 5 5 then the vi s would satisfy the equations v2 v1 = 10
v2 v3 = 0
v1 v3 = 5
v2 v3 = 10
v2 v1 = 20
Of course, there is no solution to these equations. Nevertheless we can nd the least squares solution. The matrix form of the equations is Dv = b with 1 1 D= 0 1 0 1 0 1 0 1 1 0 1 1 1 10 20 b = 10 5 0
D T Dv = D T v 3 2 1 35 2 4 2 v = 40 1 2 3 5
Before going on, notice that D is an incidence matrix. What is the graph? (Answer: the nodes are the teams and they are joined by an edge with the arrow pointing from the losing team to the winning team. This graph may have more than one edge joining to nodes, if two teams play more than once. This is sometimes called a multi-graph.). We saw that in this situation N (D) is not empty, but contains vectors whose entries are all the same. The situation is the same as for resistances, it is only dierences in vi s that have a meaning. We can solve this equation in MATLAB/Octave. The straightforward way is to compute >L = [3 -2 -1;-2 4 -2;-1 -2 3];
95
III Orthogonality >b = [-35; 40; -5]; >rref([L b]) ans = 1.00000 0.00000 0.00000 0.00000 1.00000 0.00000 -1.00000 -1.00000 0.00000 -7.50000 6.25000 0.00000
We can choose s so that the vi for one of the teams is zero. This is like grounding a node 0 in a circuit. So, by choosing s = 7.5, s = 6.25 and s = 0 we obtain the solutions 13.75, 7.5 13.75 7.5 0 or 6.25 . 6.25 0 Actually, it is easier to compute a solution with one of the vi s equal to zero directly. If 0 v v = v2 then v2 = 2 satises the equation L2 v2 = b2 where the matrix L2 is the bottom v3 v3 right 2 2 block of L and b2 are the last two entries of b. >L2 = L(2:3,2:3); >b2 = b(2:3); >L2\b2 ans = 13.7500 7.5000
As expected, the solution is not unique. The general solution, depending on the parameter s is 7.5 1 v = s 1 + 6.25 1 0
We can try this on real data. The football scores for the 2007 CFL season can be found at http://www.cfl.ca/index.php?module=sked&func=view&year=2007. The dierences in scores for the rst 20 games are in cfl.m. The order of the teams is BC, Calgary, Edmonton, Hamilton, Montreal, Saskatchewan, Toronto, Winnipeg. Repeating the computation above for this data we nd the ranking to be (running the le cfl.m)
96
III.1 Projections v = 0.00000 -12.85980 -17.71983 -22.01884 -11.37097 -1.21812 0.87588 -20.36966 Not very impressive, if you consider that the second-lowest ranked team (Winnipeg) ended up in the Grey Cup game!
97
III Orthogonality
0 0 e3 = 1 , . . .
If you expand a vector in an orthonormal basis, its very easy to nd the coecients in the expansion. Suppose v = c1 q1 + c2 q2 + + cn qn
98
III.2 Orthonormal bases and Orthogonal Matrices for some orthonormal basis q1 , q2 , . . .. Then, if we take the dot product of both sides with qk , we get qk v = c1 qk q1 + c2 qk q2 + + ck qk qk + cn qk qn = ck = 0 + 0 + + ck + + 0
This gives a convenient formula for each ck . For example, in the expansion 1 1 1 1 1 + c2 = c1 1 2 2 1 2 we have 1 1 3 1 c1 = = 2 2 1 2 1 1 1 1 c2 = = 1 2 2 2 Notice also that the norm of v is easily expressed in terms of the coecients ci . We have v
2
= c2 + c2 + + c2 1 2 n
= (c1 q1 + c2 q2 + + cn qn ) (c1 q1 + c2 q2 + + cn qn )
=vv
Another way of saying this is that the vector c = [c1 , c2 , . . . , cn ] of coecients has the same norm as v.
This is the same as saying that the columns of Q form an orthonormal basis.
Another way of recognizing orthogonal matrices is by their action on vectors. Suppose Q is orthogonal. Then Qv
2
99
III Orthogonality This implies that Qv = v . In other words, orthogonal matrices dont change the lengths of vectors. The converse is also true. If a matrix Q doesnt change the lengths of vectors then it must be orthogonal. To see this, suppose that Qv = v for every v. Then the calculation above shows that v (QT Qv) = v v for every v. Applying this to v + w we nd (v + w) QT Q(v + w) = (v + w) (v + w) Expanding, this gives v (QT Qv) + w (QT Qw) + v (QT Qw) + w (QT Qv) = v v + w w + v w + w v Since v (QT Qv) = v v and w (QT Qw) = w w we can cancel these terms. Also w (QT Qv) = ((QT Q)T w) v = (QT Qw) v = v (QT Qw). So on each side of the equation, the two remaining terms are the same. Thus v (QT Qw) = v w This equation holds for every choice of vectors v and w. If v = ei and w = ej then the left side is the i, jth matrix element Qi,j of Q while the right side is the ei ej , which is i, jth matrix element of the identity matrix. Thus QT Q = I and Q is orthogonal. We can recast the problem of nding coecients c1 , c2 , . . . , cn in the expansion v = c1 q1 + c2 q2 + + cn qn in an orthonormal basis as the solution of the matrix equation Qc = v where Q is the orthogonal matrix whose columns contain the orthonormal basis vectors. The solution is obtained by multiplying by QT . Since QT Q = I multiplying both sides by QT gives c = QT v. The fact that c = v follows from the length preserving property of orthogonal matrices. QT Q Recall that for square matrices a left inverse is automatically also a right inverse. So if = I then QQT = I too. This means that QT is an orthogonal matrix whenever Q is. This proves the (non-obvious) fact that if the columns of an square matrix form an orthonormal basis, then so do the rows! A set G of invertible matrices is called a (matrix) group if 1. I G (G contains the identity matrix) 2. If A, B G then AB G (G is closed under matrix multiplication) 3. If A G then A is invertible and A1 G (G is closed under taking the inverse) In a homework problem, you are asked to verify that the set of n n orthogonal matrices is a group.
100
101
III Orthogonality The modulus of a complex number, denoted |z| is simply the length of the corresponding vector in two dimensions. If z = x + iy |z| = |x + iy| = An important property is |zw| = |z||w| The complex conjugate of a complex number z, denoted z , is the reection of z across the x axis. Thus x + iy = x iy. Thus complex conjugate is obtained by changing all the is to is. We have zw = z w and z = |z|2 z This last equality is useful for simplifying fractions of complex numbers by turning the denominator into a real number, since zw z = w |w|2 For example, to simplify (1 + i)/(1 i) we can write 1+i (1 + i)2 1 1 + 2i = = =i 1i (1 i)(1 + i) 2 A complex number z is real (i.e. the y part in x + iy is zero) whenever z = z. We also have the following formulas for the real and imaginary part. If z = x + iy then x = (z + z )/2 and y = (z z )/(2i) We dene the exponential, eit , of a purely imaginary number it to be the number eit = cos(t) + i sin(t) lying on the unit circle in the complex plane. The complex exponential satises the familiar rule ei(s+t) = eis eit since by the addition formulas for sine and cosine ei(s+t) = cos(s + t) + i sin(s + t) = cos(s) cos(t) sin(s) sin(t) + i(sin(s) cos(t) + cos(s) sin(t)) = (cos(s) + i sin(s))(cos(t) + i sin(t)) = eis eit The exponential of a number that has both a real and imaginary part is dened in the natural way. ea+ib = ea eib = ea (cos(b) + i sin(b)) x2 + y 2
102
III.3 Complex vector spaces and inner product The derivative of a complex exponential is given by the formula d (a+ib)t e = (a + ib)e(a+ib)t dt while the anti-derivative, for (a + ib) = 0 is e(a+ib)t dt = 1 e(a+ib)t + C (a + ib)
zn
w, z = w z =
i=1
w i zi
103
III Orthogonality With this denition the norm of z is always positive since
n
z, z = z
=
i=1
|zi |2
For complex matrices and vectors we have to modify the rule for bringing a matrix to the other side of an inner product. w, Az = wT Az = (AT w)T z = (A w)
T T T
= A w, z
This leads to the denition of the adjoint of a matrix A = A . (In physics you will also see the notation A .) With this notation w, Az = A w, z . The complex analogue of an orthogonal matrix is called a unitary matrix. A unitary matrix U is a square matrix satisfying U U = U U = I. Notice that a unitary matrix with real entries is an orthogonal matrix since in that case U = U T . The columns of a unitary matrix form an orthonormal basis (with respect to the complex inner product.) MATLAB/Octave deals seamlessly with complex matrices and vectors. Complex numbers can be entered like this >z= 1 + 2i z = 1 + 2i
T
There is a slight danger here in that if i has be dened to be something else (e.g. i =16) then z=i would set z to be 16. You could use z=1i to get the desired result, or use the alternative syntax >z= complex(0,1) z = 0 + 1i
104
III.3 Complex vector spaces and inner product The functions real(z), imag(z), conj(z), abs(z) compute the real part, imaginary part, conjugate and modulus of z. The function exp(z) computes the complex exponential if z is complex. If a matrix A has complex entries then A is not the transpose, but the adjoint (conjugate transpose).
z = 1 + 0i 0 + 1i
z ans = 1 - 0i 0 - 1i
>z*z ans = 2
(Warning: the function dot in Octave does not compute the correct inner product for complex vectors (it doesnt take the complex conjugate). This is probably a bug. In MATLAB dot works correctly for complex vectors.)
105
III Orthogonality
f, g =
a
f (x)g(x)dx
= f, f =
a
|f (x)|2 dx
For real valued functions we can ignore the complex conjugate. Example: the inner product of f (x) = 1 + ix and g(x) = x2 over the interval [0, 1] is 1 + ix, x2 =
0 1
(1 + ix) x2 dx =
1 0
(1 ix) x2 dx =
1 0
x2 ix3 dx =
1 1 i 3 4
It will often happen that a function, like f (x) = x is dened for all real values of x. In this case we can consider inner products and norms for any interval [a, b] including semi-innite and innite intervals, where a may be or b may be +. Of course the values of the inner product an norm depend on the choice of interval. There are technical complications when dealing with spaces of functions. In this course we will deal with aspects of the subject where these complications dont play an important role. However, it is good to aware that they exist, so we will mention a few. One complication is that the integral dening the inner product may not exist. For example for the interval (, ) = R the norm of f (x) = x is innite since
|x|2 dx =
Even if the interval is nite, like [0, 1], the function might have a spike. For example, if f (x) = 1/x then 1 1 dx = |x|2 0 106
III.3 Complex vector spaces and inner product too. To overcome this complication we agree to restrict our attention to square integrable functions. For any interval [a, b], these are the functions f (x) for which |f (x)|2 is integrable. They form a vector space that is usually denoted L2 ([a, b]). It is an example of a Hilbert space and is important in Quantum Mechanics. The L in this notation indicates that the integrals should be dened as Lebesgue integrals rather than as Riemann integrals usually taught in elementary calculus courses. This plays a role when discussing convergence theorems. But for any functions that come up in this course, the Lebesgue integral and the Riemann integral will be the same. The question of convergence is another complication that arises in innite dimensional vector spaces of functions. When discussing innite orthonormal bases, innite linear combinations of vectors (functions) will appear. There are several possible meanings for an equation like
ci i (x) = (x).
i=0
since we are talking about convergence of an innite series of functions. The most obvious interpretation is that for every xed value of x the innite sum of numbers on the left hand side equals the number on the right. Here is another interpretation: the dierence of and the partial sums zero when measured in the L2 norm, that is
N N N i=0 ci i
tends to
lim
i=0
ci i = 0
With this denition, it might happen that there are individual values of x where the rst equation doesnt hold. This is the meaning that we will give to the equation.
107
III Orthogonality
= en , en =
a b
=
a b
=
a
=L
108
III.4 Fourier series This shows that en = orthogonal. L for every n. Next we check that if n = m then en and em are
b
en , em =
a b
=
a
Here we used that e2i(mn)b/L = e2i(mn)(ba+a)/L = e2i(mn) e2i(mn)a/L = e2i(mn)a/L . This shows that the functions {. . . , e2 , e1 , e0 , e1 e2 , . . .} form an orthonormal set (scaled by L). To show these functions form a basis we have to verify that they span the space L2 [a, b]. In other words, we must show that any function f L2 [a, b] can be written as an innite linear combination f (x) =
cn en (x) =
cn e2inx/L .
n=
n=
This is a bit tricky, since it involves innite series of functions. For a nite dimensional space, to show that an orthogonal set forms a basis, it suces to count that there are the same number of elements in an orthogonal set as there are dimensions in the space. For an innite dimensional space this is no longer true. For example, the set of en s with n even is also an innite orthonormal set, but it doesnt span all of L2 [a, b]. In this course, we will simply accept that it is true that {. . . , e2 , e1 , e0 , e1 e2 , . . .} span L2 [a, b]. Once we accept this fact, it is very easy to compute the coecients in a Fourier expansion. The procedure is the same as in nite dimensions. Starting with f (x) =
n=
cn en (x)
we simply take the inner product of both sides with em . The only term in the innite sum that survives is the one with n = m. Thus em , f = and we obtain the formula cm =
n=
cn em , en = cm L
1 L
b a
e2imx/L f (x)dx
109
III Orthogonality
To obtain this form, recall that e2inx/L = cos(2nx/L) i sin(2nx/L) Using this formula we nd
n= n=1 n=1 n=1 n=1
cn e
2nx/L
= c0 + = c0 + = c0 +
cn e
2nx/L
cn e2nx/L
n=1
cn (cos(2nx/L) + i sin(2nx/L)) +
cn (cos(2nx/L) i sin(2nx/L))
Thus the real form of the Fourier series holds with a0 = 2c0 an = cn + cn Equivalently a0 2 an bn + for n > 0 cn = 2 2i an bn for n < 0. cn = 2 2i c0 = The coecients an and bn in the real form of the Fourier series can also be obtained directly. The set of functions {1/2, cos(2x/L), cos(4x/L), cos(6x/L), . . . , sin(2x/L), sin(4x/L), sin(6x/L), . . .} also form an orthogonal basis where each vector has norm an = 2 L
b
for
n>0
bn = icn icn
for n > 0.
cos(2nx/L)f (x)
a
110
sin(2nx/L)f (x)
for n = 1, 2, . . .. The desire to have the formula for an work out for n = 0 is the reason for dividing by 2 in the constant term a0 /2 in the real form of the Fourier series. One advantage of the real form of the Fourier series is that if f (x) is a real valued function, then the coecients an and bn are real too, and the Fourier series doesnt involve any complex numbers. However, it is often to calculate the coecients cn because exponentials are easier to integrate than sines and cosines.
III.4.3 An example
Lets compute the Fourier coecients for the square wave function. In this example L = 1. f (x) = 1 if 0 x 1/2 1 if 1/2 < x 1
c0 =
0
f (x)dx =
0
1dx
1dx = 0
1/2
Otherwise, we have
1
cn =
0
1/2
=
0
ei2nx dx
ei2nx x=1/2 ei2nx = i2n x=0 i2n in 2 2e = 2in 0 if n is even = 2/in if n is odd Thus we conclude that f (x) =
n= n odd
2 i2nx e in
To see how well this series is approximating f (x) we go back to the real form of the series. Using an = cn + cn and bn = icn icn we nd that an = 0 for all n, bn = 0 for n even and
111
n=1 n odd
4 sin(2nx) = n
n=0
We can use MATLAB/Octave to see how well this series is converging. The le ftdemo1.m contains a function that take an integer N as an argument and plots the sum of the rst 2N + 1 terms in the Fourier series above. Here is a listing: function ftdemo1(N) X=linspace(0,1,1000); F=zeros(1,1000); for n=[0:N] F = F + 4*sin(2*pi*(2*n+1)*X)/(pi*(2*n+1)); end plot(X,F) end Here are the outputs for N = 0, 1, 2, 10, 50:
1.5 1.5
0.5
0.5
-0.5
-0.5
-1
-1
0.2
0.4
0.6
0.8
1.2
0.2
0.4
0.6
0.8
1.2
0.5
0.5
-0.5
-0.5
-1
-1
-1.5 -0.2
0.2
0.4
0.6
0.8
1.2
-1.5 -0.2
0.2
0.4
0.6
0.8
1.2
112
0.5
-0.5
-1
-1.5 -0.2
0.2
0.4
0.6
0.8
1.2
v = c1 v1 + + cn vn =
ci vi
i=1
then, taking the inner product of v with itself, and using the fact that the basis is orthonormal, we obtain
n n n
v, v =
i=1 j=1
ci cj vi , vj =
i=1
|ci |2
cn en (x)
|f (x)|2 dx = f, f =
1 0 1dx
|cn |2
1=
n= n odd
4 =2 2 n2
n=
n=0 n odd
4 8 = 2 2 n2
or
113
III Orthogonality y(t) for t [0, T ], where y(t) is the air pressure at a point in space, for example, at your eardrum. A complex exponential e2it = cos(2t) i sin(2t) can be thought of as a pure oscillation with frequency . It is a periodic function whose values are repeated when t increases by 1 . If t has units of time (seconds) then has units of Hertz (cycles per second). In other words, in one second the function e2it cycles though its values times. The Fourier basis functions can be written as e2in t with n = n/T . Thus Fouriers theorem states that for t [0, T ] y(t) =
n=
cn e2in t .
In other words, the audio signal y(t) can be synthesized as a superposition of pure oscillations with frequencies n = n/T . The coecients cn describe how much of the frequency n is present in the signal. More precisely, writing the complex number cn as cn = |cn |e2in we have cn e2in t = |cn |e2i(n t+n ) . Thus |cn | represents the amplitude of the oscillation with frequency n while n represents a phase shift. A frequency-amplitude plot for y(t) is a plot of the points (n , |cn |). It should be thought of as a graph of the amplitude as a function of frequency and gives a visual representation of how much of each frequency is present in the signal. If y(t) is dened for all values of t we can use any interval that we want and expand the restriction of y(t) to this interval. Notice that the frequencies n = n/T in the expansion will be dierent for dierent values of T . Example: Lets illustrate this with the function y(t) = e2it and intervals [0, T ]. This function is itself a pure oscillation with frequency = 1. So at rst glance one would expect that there will be only one term in the Fourier expansion. This will turn out to be correct if number 1 is one of the available frequencies, that is, if there is some value of n for which n = n/T = 1. (This happens if T is an integer.) Otherwise, it is still possible to reconstruct y(t), but more frequencies will be required. In this case we would expect that |cn | should be large for n close to 1. Lets do the calculation. Fix T . Lets rst consider the case when T is an integer. Then cn = 1 T 1 = T = 1
1 2T i(1n/T ) T 0 T 0
e2i(T n)
e0
114
III.4 Fourier series as expected. Now lets look at what happens when T is not an integer. Then cn = = 1 T
T 0
e2int/T e2it dt
1 e2i(T n) 1 2i(T n)
We can use MATLAB/Octave to do an amplitude-frequency plot. Here are the commands for T = 10.5 and T = 100.5 N=[-200:200]; T=10.5; omega=N/T; absc=sqrt(2-2*cos(2*pi*T*(1-omega)))./(2*pi*T*abs(1-omega)); plot(omega,absc) T=100.5; omega=N/T; absc=sqrt(2-2*cos(2*pi*T*(1-omega)))./(2*pi*T*abs(1-omega)); hold on; plot(omega,absc, r) Here is the result
0.6
0.5
0.4
0.3
0.2
0.1
0 -20
-15
-10
-5
10
15
20
115
III Orthogonality
III.5.1 Denition
In the previous section we saw that the functions ek (x) = e2ikx for k Z form an innite orthonormal basis for the Hilbert space of functions L2 ([0, 1]). Now we will introduce a discrete, nite dimensional version of this basis. To motivate the denition of this basis, imagine taking a function dened on the interval [0, 1] and sampling it at the point at the N points 0, 1/N, 2/N, . . . , j/N, . . . , (N 1)/N . If we do this to the basis functions ek (x) we end up with vectors ek given by e2i0k/N e2ik/N e2i2k/N . . . e2i(N 1)k/N 1 k N 2k N . . .
(N 1)k
where
ek =
N = e2i/N The complex number N lies on the unit circle, that is, |N | = 1. Moreover N is a primitive j N N th root of unity. This means that N = 1 and N = 1 unless j is a multiple of N .
k+N k N k Because N = N N = N we see that ek+N = ek . Thus, although the vectors ek are dened for every integer k, they start repeating themselves after N steps. Thus there are only N distinct vectors, e0 , e1 , . . . , eN 1 .
116
III.5 The Discrete Fourier Transform These vectors, ek for k = 0, . . . , N 1 form an orthogonal basis for CN . To see this we use the formula for the sum of a geometric series: N 1 N r=1 j N r = 1r r=1 j=0 1r Using this formula, we compute
N 1 j=0 N 1 j=0 (lk)j
ek , el =
lj N kj N =
Now we can expand any vector f CN in this basis. Actually, to make our discrete Fourier transform agree with MATLAB/Octave we divide each basis vector by N . Then we obtain 1 f= N where ck = ek , f =
N 1 j=0
cj ej
N 1 j=0
e2ikj/N fj
The map that send the vector f to the vector of coecients c = [c0 , . . . , cN 1 ]T is the discrete Fourier transform. We can write this in matrix form as c = Ff, f = F 1 c
where the matrix F 1 has the vectors ek as its columns. Since this vectors are an orthogonal basis, the inverse is the transpose, up to a factor of N . Explicitly 1 1 F = 1 . . . 1 N 2 N . . . 1 2 N 4 N . . .
2(N 1)
1
N N 1 2(N 1) N . . .
117
N 1 N 1 N
(N 1)(N 1)
and
1 1 1 1 = N . . .
1 N N 2 . . .
1 N 2 N 4 . . .
1 N N 1 N 2(N 1) . . . N (N 1)(N 1)
1 N N 1 N 2(N 1)
III Orthogonality The matrix F = N 1/2 F is a unitary matrix (F 1 = F ). Recall that unitary matrices preserve the length of complex vectors. This implies that the lengths of the vectors f = [f0 , f1 , . . . , fN 1 ] and c = [c0 , c1 , . . . , cN 1 ] are related by N c or N
N 1 k=0 2
= f
|ck |2 =
N 1 k=0
|fk |2
Multiplying an N N matrix with a vector of length N normally requires N 2 multiplications, since each entry of the product requires N , and there are N entries. It turns out that the discrete Fourier transform, that is, multiplication by the matrix F , can be carried out using only N log2 (N ) multiplications (at least if N is a power of 2). The algorithm that achieves this is called the Fast Fourier Transform, or FFT. This represents a tremendous saving in time: calculations that would require weeks of computer time can be carried out in seconds.
The basic idea of the FFT is to break the sum dening the Fourier coecients ck into a sum of the even terms and a sum of the odd terms. Each of these turns out to be (up to a factor we can compute) a discrete Fourier transform of half the length. This idea is then applied recursively. Starting with N = 2n and halving the size of the Fourier transform at each step, it takes n = log2 (N ) steps to arrive at Fourier transforms of length 1. This is where the log2 (N ) comes in.
To simplify the notation, we will ignore the factor of 1/N in the denition of the discrete Fourier transform (so one should divide by N at the end of the calculation.) We now also assume that N = 2n so that we can divide N by 2 repeatedly. The basic formula, splitting the sum for ck into a
sum over odd and even $j$'s, is
$$c_k = \sum_{j=0}^{N-1} e^{-2\pi i kj/N} f_j = \sum_{\substack{j=0\\ j\ \mathrm{even}}}^{N-1} e^{-2\pi i kj/N} f_j + \sum_{\substack{j=0\\ j\ \mathrm{odd}}}^{N-1} e^{-2\pi i kj/N} f_j$$
$$= \sum_{j=0}^{N/2-1} e^{-2\pi i k(2j)/N} f_{2j} + \sum_{j=0}^{N/2-1} e^{-2\pi i k(2j+1)/N} f_{2j+1}$$
$$= \sum_{j=0}^{N/2-1} e^{-2\pi i kj/(N/2)} f_{2j} + e^{-2\pi i k/N}\sum_{j=0}^{N/2-1} e^{-2\pi i kj/(N/2)} f_{2j+1}.$$
Notice that the two sums on the right are discrete Fourier transforms of length $N/2$. To continue, it is useful to write the integers $j$ in base 2. Let's assume that $N = 2^3 = 8$. Once you understand this case, the general case $N = 2^n$ will be easy. Recall that
0 = 000, 1 = 001, 2 = 010, 3 = 011, 4 = 100, 5 = 101, 6 = 110, 7 = 111 (base 2).
The even $j$'s are the ones whose binary expansions have the form $**0$, while the odd $j$'s have binary expansions of the form $**1$. For any pattern of bits like $**0$, I will use the notation $F^{\langle\mathrm{pattern}\rangle}$ to denote the discrete Fourier transform where the input data is given by all the $f_j$'s whose $j$'s have binary expansion fitting the pattern. Here are some examples. To start, $F^{***}_k = c_k$ is the original discrete Fourier transform, since every $j$ fits the pattern $***$. In this example $k$ ranges over $0, \ldots, 7$, that is, the values start repeating after that. Only even $j$'s fit the pattern $**0$, so $F^{**0}$ is the discrete Fourier transform of the even $j$'s, given by
$$F^{**0}_k = \sum_{j=0}^{N/2-1} e^{-2\pi i kj/(N/2)} f_{2j}.$$
Here $k$ runs from 0 to 3 before the values start repeating. Similarly, $F^{*00}$ is a transform of length $N/4 = 2$ given by
$$F^{*00}_k = \sum_{j=0}^{N/4-1} e^{-2\pi i kj/(N/4)} f_{4j}.$$
In this case $k = 0, 1$ and then the values repeat. Finally, the only $j$ matching the pattern $010$ is $j = 2$, so $F^{010}_k$ is a transform of length one given by
$$F^{010}_k = \sum_{j=0}^{N/8-1} e^{-2\pi i kj/(N/8)} f_2 = f_2.$$
In this notation the basic splitting formula above reads
$$F^{***}_k = F^{**0}_k + \bar\omega_N^{\,k}\, F^{**1}_k.$$
Recall that $\omega_N = e^{2\pi i/N}$, so $\bar\omega_N = e^{-2\pi i/N}$. Let's look at this equation when $k = 0$. We will represent the formula by the following diagram.
[Diagram: a butterfly combining $F^{**0}_k$ and $F^{**1}_k$, with twiddle factors $\bar\omega_N^0, \bar\omega_N^1, \bar\omega_N^2, \bar\omega_N^3$, to produce $F^{***}_k$ for $k = 0, 1, 2, 3$.]
Now when we get to $k = 4$, we recall that $F^{**0}$ and $F^{**1}$ are discrete Fourier transforms of length $N/2 = 4$. Therefore, by periodicity, $F^{**0}_4 = F^{**0}_0$, $F^{**0}_5 = F^{**0}_1$, and so on. So in the formula $F^{***}_4 = F^{**0}_4 + \bar\omega_N^4 F^{**1}_4$ we may replace $F^{**0}_4$ and $F^{**1}_4$ with $F^{**0}_0$ and $F^{**1}_0$ respectively. Making such replacements, we complete the first part of the diagram as follows.
[Diagram: the final stage of the FFT, combining the length-$N/2$ transforms $F^{**0}$ and $F^{**1}$ with twiddle factors $\bar\omega_N^0, \ldots, \bar\omega_N^7$ to produce $F^{***}_k$ for $k = 0, \ldots, 7$.]
To move to the next level we analyze the discrete Fourier transforms on the left of this diagram in the same way. This time we use the basic formula for the transform of length $N/2$, namely
$$F^{**0}_k = F^{*00}_k + \bar\omega_{N/2}^{\,k}\, F^{*10}_k$$
and
$$F^{**1}_k = F^{*01}_k + \bar\omega_{N/2}^{\,k}\, F^{*11}_k.$$
The resulting diagram shows how to go from the length-two transforms to the final transform on the right.
[Diagram: the last two stages of the FFT, combining the length-2 transforms with twiddle factors $\bar\omega_{N/2}^k$ and then $\bar\omega_N^k$ to produce the final transform on the right.]
Now we go down one more level. Each transform of length two can be constructed from transforms of length one, i.e., from the original data in some order. We complete the diagram as follows. Here we have inserted the value $N = 8$.
[Diagram: the complete $N = 8$ FFT, built from the input data in bit-reversed order $f_0, f_4, f_2, f_6, f_1, f_5, f_3, f_7$, using twiddle factors $\bar\omega_2^k$, $\bar\omega_4^k$ and $\bar\omega_8^k$ in the successive stages.]
Notice that the $f_j$'s on the left of the diagram are in bit-reversed order. In other words, if we reverse the order of the bits in the binary expansion of the $j$'s, the resulting numbers are ordered from 0 (000) to 7 (111). Now we can describe the algorithm for the fast Fourier transform. Starting with the original data $[f_0, \ldots, f_7]$ we arrange the values in bit-reversed order. Then we combine them pairwise, as indicated by the left side of the diagram, to form the transforms of length 2. To do this we need to compute $\bar\omega_2 = e^{-i\pi} = -1$. Next we combine the transforms of length 2 according to the middle part of the diagram to form the transforms of length 4. Here we use that $\bar\omega_4 = e^{-i\pi/2} = -i$. Finally we combine the transforms of length 4 to obtain the transform of length 8. Here we need $\bar\omega_8 = e^{-i\pi/4} = 2^{-1/2} - i\,2^{-1/2}$. The algorithm for values of $N$ other than 8 is entirely analogous. For $N = 2$ or 4 we stop at the first or second stage. For larger values of $N = 2^n$ we simply add more stages. How many multiplications do we need to do? Well, there are $N = 2^n$ multiplications per stage of the algorithm (one for each circle on the diagram), and there are $n = \log_2(N)$ stages. So the number of multiplications is $2^n\cdot n = N\log_2(N)$.
As an example let us compute the discrete Fourier transform with N = 4 of the data [f0 , f1 , f2 , f3 ] = [1, 2, 3, 4]. First we compute the bit reversed order of 0 = (00), 1 = (01), 2 = (10), 3 = (11) to be (00) = 0, (10) = 2, (01) = 1, (11) = 3. We then do the rest of the computation right on the diagram as follows.
[Diagram: with the inputs in bit-reversed order $f_0 = 1$, $f_2 = 3$, $f_1 = 2$, $f_3 = 4$, the length-2 transforms are $1+3 = 4$, $1-3 = -2$, $2+4 = 6$, $2-4 = -2$; combining these with the twiddle factors $1$ and $-i$ gives $c_0 = 4+6 = 10$, $c_1 = -2+2i$, $c_2 = 4-6 = -2$, $c_3 = -2-2i$.]
The MATLAB/Octave command for computing the fast Fourier transform is fft. Let's verify the computation above.
> fft([1 2 3 4])
ans =
   10 + 0i   -2 + 2i   -2 + 0i   -2 - 2i
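The inverse transform $f = F^{-1}c$ is implemented by the companion command ifft. A one-line check (my own, not from the notes) confirms that it undoes fft:
> ifft(fft([1 2 3 4]))   % returns 1 2 3 4 again (any imaginary part is zero up to roundoff)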
A function $y(t)$ with period $T$ has a Fourier series expansion
$$y(t) = \sum_{n=-\infty}^{\infty} c_n e^{2\pi i\nu_n t} = \sum_{n=-\infty}^{\infty} c_n e^{2\pi i n t/T}, \qquad \nu_n = n/T.$$
If y(t) represents the sound of a musical instrument, then the frequency-amplitude plot gives a visual representation of the strengths of the various frequencies present in the sound. Of course, for an actual instrument there is no formula for y(t) and the best we can do is to measure this function at a discrete set of points. To do this we pick a sampling frequency Fs samples/second. Then we measure the function y(t) at times t = tj = j/Fs , j = 1, . . . N , where N = Fs T (so that tN = T ) and put the results yj = y(tj ) in a vector y = [y1 , y2 , . . . , yN ]T . How can we make an approximate frequency-amplitude plot with this information? The key is to realize that the coecients in the discrete Fourier transform of y can be used to approximate the Fourier series coecients cn . To see this, do a Riemann sum approximation
of the integral in the formula for $c_n$. Using the equally spaced points $t_j$ with $\Delta t_j = 1/F_s$, and recalling that $N = TF_s$, we obtain
$$c_n = \frac{1}{T}\int_0^T e^{-2\pi i nt/T}y(t)\,dt \approx \frac{1}{T}\sum_{j=0}^{N-1} e^{-2\pi i nt_j/T}y(t_j)\,\Delta t_j = \frac{1}{TF_s}\sum_{j=0}^{N-1} e^{-2\pi i nj/N}y_j = \frac{1}{N}\tilde c_n,$$
where $\tilde c_n$ is the $n$th coefficient in the discrete Fourier transform of $y$.
The frequency corresponding to $c_n$ is $\nu_n = n/T = nF_s/N$. So, for an approximate frequency-amplitude plot, we can plot the points $(nF_s/N, |\tilde c_n|/N)$. However, it is important to realize that the approximation $c_n \approx \tilde c_n/N$ is only good for small $n$. The reason is that the Riemann sum will do a worse job of approximating the integral when the integrand is oscillating rapidly, that is, when $n$ is large. So we should only plot a restricted range of $n$. In fact, it never makes sense to plot more than $N/2$ points. The reason for this is that $\tilde c_{n+N} = \tilde c_n$ and, for $y$ real valued, $\tilde c_{-n} = \overline{\tilde c_n}$. These facts imply that $|\tilde c_n| = |\tilde c_{N-n}|$, so that the values of $|\tilde c_n|$ in the range $[0, N/2-1]$ are the same as the values in $[N/2, N-1]$, with the order reversed.
To compare the meanings of the coefficients $c_n$ and $\tilde c_n$ it is instructive to consider the formulas (both exact) for the Fourier series and the discrete Fourier transform of $y_j = y(t_j)$:
$$y_j = y(t_j) = \sum_{n=-\infty}^{\infty} c_n e^{2\pi i nt_j/T} = \sum_{n=-\infty}^{\infty} c_n e^{2\pi i nj/N}, \qquad y_j = \frac{1}{N}\sum_{n=0}^{N-1}\tilde c_n e^{2\pi i nj/N}.$$
The coefficients $c_n$ and $\tilde c_n/N$ are close for $n$ close to 0, but then their values diverge so that the infinite sum and the finite sum above both give the same answer. Now let's try to make a frequency-amplitude plot using MATLAB/Octave for a sampled flute contained in the audio file F6.baroque.au available at http://www.phys.unsw.edu.au/music/flute/baroque/sounds/F6.baroque.au. This file contains a sampled baroque flute playing the note F6, which has a frequency of 1396.91 Hz. The sampling rate is Fs = 22050 samples/second. Audio processing is one area where MATLAB and Octave are different. The Octave code to load the file F6.baroque.au is
y=loadaudio('F6.baroque','au',8);
while the MATLAB code is
y=auread('F6.baroque.au');
After this step the sampled values are loaded in the vector y. Now we compute the FFT of y and store the resulting values $\tilde c_n$ in a vector tildec. Then we compute a vector omega containing the frequencies and make a plot of these frequencies against $|\tilde c_n|/N$. We plot the first Nmax=N/4 values.
tildec = fft(y);
N=length(y);
Fs=22050;
omega=[0:N-1]*Fs/N;
Nmax=floor(N/4);
plot(omega(1:Nmax), abs(tildec(1:Nmax)/N));
Here is the result.
[Plot: the frequency-amplitude plot for the sampled flute.]
Notice the large spike at 1396 corresponding to the note F6. Smaller spikes appear at the overtone series, but evidently these are quite small for a flute.
Chapter IV
Eigenvalues and Eigenvectors
- write down the definition of eigenvalues and eigenvectors and be able to compute them using the standard procedure.
- use the MATLAB/Octave commands poly and roots to compute the characteristic polynomial and its roots, and eig to compute the eigenvalues and eigenvectors.
- write down the definitions of algebraic and geometric multiplicities of eigenvalues when there are repeated eigenvalues.
- use eigenvalues and eigenvectors to perform matrix diagonalization.
- recognize the form of the Jordan Canonical Form for non-diagonalizable matrices.
- explain the relationship between eigenvalues and the determinant and trace of a matrix.
- use eigenvalues to compute powers of a diagonalizable matrix.
IV.1.1 Definition
Let $A$ be an $n\times n$ matrix. A number $\lambda$ and a non-zero vector $v$ are an eigenvalue-eigenvector pair for $A$ if
$$Av = \lambda v.$$
Although $v$ is required to be non-zero, $\lambda = 0$ is possible. If $v$ is an eigenvector, so is $sv$ for any number $s \ne 0$. Rewrite the eigenvalue equation as
$$(\lambda I - A)v = 0.$$
Then we see that $v$ is a non-zero vector in the nullspace $N(\lambda I - A)$. Such a vector only exists if $\lambda I - A$ is a singular matrix, or equivalently if
$$\det(\lambda I - A) = 0.$$
- Compute the characteristic polynomial $p(\lambda) = \det(\lambda I - A)$.
- Find all the zeros (roots) of $p(\lambda)$. This is equivalent to completely factoring $p(\lambda)$ as
$$p(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n).$$
Such a factorization always exists if we allow the possibility that the zeros $\lambda_1, \lambda_2, \ldots$ are complex numbers. But it may be hard to find. In this factorization there may be repetitions in the $\lambda_i$'s. The number of times a $\lambda_i$ is repeated is called its algebraic multiplicity.
- For each distinct $\lambda_i$ find $N(\lambda_i I - A)$, that is, all the solutions of
$$(\lambda_i I - A)v = 0.$$
The non-zero solutions are the eigenvectors for $\lambda_i$.
IV.1.3 Example 1
This is the typical case where all the eigenvalues are distinct. Let
$$A = \begin{bmatrix} 3 & -6 & -7\\ 1 & 8 & 5\\ -1 & -2 & 1\end{bmatrix}.$$
Then
$$\det(\lambda I - A) = \lambda^3 - 12\lambda^2 + 44\lambda - 48.$$
This can be factored as
$$\lambda^3 - 12\lambda^2 + 44\lambda - 48 = (\lambda - 2)(\lambda - 4)(\lambda - 6).$$
So the eigenvalues are 2, 4 and 6. These steps can be done with MATLAB/Octave using poly and roots. If A is a square matrix, the command poly(A) computes the characteristic polynomial, or rather, its coefficients.
> A=[3 -6 -7; 1 8 5; -1 -2 1];
> p=poly(A)
p =
   1.0000  -12.0000   44.0000  -48.0000
Recall that the coefficient of the highest power comes first. The function roots takes as input a vector representing the coefficients of a polynomial and returns the roots.
>roots(p)
ans =
   6.0000
   4.0000
   2.0000
To find the eigenvector(s) for $\lambda_1 = 2$ we must solve the homogeneous equation $(2I - A)v = 0$. Recall that eye(n) is the $n\times n$ identity matrix $I$.
>rref(2*eye(3) - A)
ans =
   1   0  -1
   0   1   1
   0   0   0
From this we can read off the solution
$$v_1 = \begin{bmatrix}1\\ -1\\ 1\end{bmatrix}.$$
Similarly we find for $\lambda_2 = 4$ and $\lambda_3 = 6$ that the corresponding eigenvectors are
$$v_2 = \begin{bmatrix}1\\ 1\\ -1\end{bmatrix} \qquad v_3 = \begin{bmatrix}-2\\ 1\\ 0\end{bmatrix}.$$
The three eigenvectors v1 , v2 and v3 are linearly independent and form a basis for R3 . The MATLAB/Octave command for nding eigenvalues and eigenvectors is eig. The command eig(A) lists the eigenvalues
>eig(A)
ans =
   4.0000
   2.0000
   6.0000
while the variant [V,D] = eig(A) returns a matrix V whose columns are eigenvectors and a diagonal matrix D whose diagonal entries are the eigenvalues.
>[V,D] = eig(A)
V =
   5.7735e-01   5.7735e-01  -8.9443e-01
   5.7735e-01  -5.7735e-01   4.4721e-01
  -5.7735e-01   5.7735e-01   2.2043e-16
D =
   4.00000   0.00000   0.00000
   0.00000   2.00000   0.00000
   0.00000   0.00000   6.00000
Notice that the eigenvectors have been normalized to have length one. Also, since they have been computed numerically, they are not exactly correct. The entry 2.2043e-16 (i.e., 2.2043 1016 ) should actually be zero.
IV.1.4 Example 2
This example has a repeated eigenvalue. Let
$$A = \begin{bmatrix} 1 & 1 & 0\\ 0 & 2 & 0\\ 0 & 1 & 1\end{bmatrix}.$$
Then
$$\det(\lambda I - A) = \lambda^3 - 4\lambda^2 + 5\lambda - 2 = (\lambda - 1)^2(\lambda - 2).$$
In this example the eigenvalues are 1 and 2, but the eigenvalue 1 has algebraic multiplicity 2.
From this it is easy to see that there are two linearly independent eigenvectors for this eigenvalue:
$$v_1 = \begin{bmatrix}1\\0\\0\end{bmatrix} \quad\text{and}\quad w_1 = \begin{bmatrix}0\\0\\1\end{bmatrix}.$$
In this case we say that the geometric multiplicity is 2. In general, the geometric multiplicity is the number of independent eigenvectors, or equivalently the dimension of $N(\lambda I - A)$. The eigenvalue $\lambda_2 = 2$ has eigenvector
$$v_2 = \begin{bmatrix}1\\1\\1\end{bmatrix}.$$
So, although this example has repeated eigenvalues, there still is a basis of eigenvectors.
IV.1.5 Example 3
Here is an example where the geometric multiplicity is less than the algebraic multiplicity. If
$$A = \begin{bmatrix}2 & 1 & 0\\ 0 & 2 & 1\\ 0 & 0 & 2\end{bmatrix}$$
then the characteristic polynomial is $\det(\lambda I - A) = (\lambda - 2)^3$, so there is one eigenvalue $\lambda_1 = 2$ with algebraic multiplicity 3. To find the eigenvectors we compute
$$2I - A = \begin{bmatrix}0 & -1 & 0\\ 0 & 0 & -1\\ 0 & 0 & 0\end{bmatrix}.$$
From this we see that there is only one independent solution,
$$v_1 = \begin{bmatrix}1\\0\\0\end{bmatrix}.$$
Thus the geometric multiplicity $\dim(N(2I - A))$ is 1. What does MATLAB/Octave do in this situation?
>A=[2 1 0; 0 2 1; 0 0 2];
>[V D] = eig(A)
V =
   1.00000  -1.00000   1.00000
   0.00000   0.00000  -0.00000
   0.00000   0.00000   0.00000
D =
   2   0   0
   0   2   0
   0   0   2
It simply returned the same eigenvector three times. In this example, there does not exist a basis of eigenvectors.
IV.1.6 Example 4
Finally, here is an example where the eigenvalues are complex, even though the matrix has real entries. Let
$$A = \begin{bmatrix}0 & -1\\ 1 & 0\end{bmatrix}.$$
Then $\det(\lambda I - A) = \lambda^2 + 1$, which has no real roots. However $\lambda^2 + 1 = (\lambda + i)(\lambda - i)$, so the eigenvalues are $\lambda_1 = i$ and $\lambda_2 = -i$. The eigenvectors are found with the same procedure as before, except that now we must use complex arithmetic. So for $\lambda_1 = i$ we compute
$$iI - A = \begin{bmatrix}i & 1\\ -1 & i\end{bmatrix}.$$
There is a trick for computing the null space of a singular $2\times 2$ matrix. Since the two rows must be multiples of each other (in this case the second row is $i$ times the first row), we simply
need to find a vector $\begin{bmatrix}a\\ b\end{bmatrix}$ with $ia + b = 0$. This is easily achieved by flipping the entries in the first row and changing the sign of one of them. Thus
$$v_1 = \begin{bmatrix}1\\ -i\end{bmatrix}.$$
If a matrix has real entries, then the eigenvalues and eigenvectors occur in conjugate pairs. This can be seen directly from the eigenvalue equation $Av = \lambda v$. Taking complex conjugates (and using that the conjugate of a product is the product of the conjugates) we obtain $\bar A\bar v = \bar\lambda\bar v$. But if $A$ is real then $\bar A = A$, so $A\bar v = \bar\lambda\bar v$, which shows that $\bar\lambda$ and $\bar v$ are also an eigenvalue-eigenvector pair. From this discussion it follows that $v_2$ is the complex conjugate of $v_1$:
$$v_2 = \begin{bmatrix}1\\ i\end{bmatrix}.$$
As before, we multiply by $A$ to get one equation, then multiply by $\lambda_3$ to get another equation. Subtracting the resulting equations gives
$$c_1(\lambda_1 - \lambda_3)v_1 + c_2(\lambda_2 - \lambda_3)v_2 = 0.$$
But we already know that $v_1$ and $v_2$ are independent. Therefore $c_1(\lambda_1 - \lambda_3) = c_2(\lambda_2 - \lambda_3) = 0$. Since $\lambda_1 - \lambda_3 \ne 0$ and $\lambda_2 - \lambda_3 \ne 0$, this implies $c_1 = c_2 = 0$ too. Therefore $v_1$, $v_2$ and $v_3$ are independent. Repeating this argument, we eventually find that all the eigenvectors $v_1, \ldots, v_n$ are independent.
In Example 2 above, we saw that it might be possible to have a basis of eigenvectors even when there are repeated eigenvalues. For some classes of matrices (for example symmetric matrices ($A^T = A$) or orthogonal matrices) a basis of eigenvectors always exists, whether or not there are repeated eigenvalues. We will consider this in more detail later in the course.
When $\epsilon = 0$ this matrix has a single eigenvalue $\lambda = 1$ and only one eigenvector $v_1 = \begin{bmatrix}1\\0\end{bmatrix}$. What happens when we change $\epsilon$ slightly? Then the eigenvalues change to $1$ and $1+\epsilon$, and being distinct, they must have independent eigenvectors. A short calculation reveals that they are
$$v_1 = \begin{bmatrix}1\\0\end{bmatrix} \qquad v_2 = \begin{bmatrix}1\\ \epsilon\end{bmatrix}.$$
These two eigenvectors are almost, but not quite, dependent. When $\epsilon$ becomes zero they collapse and point in the same direction. In general, if you start with a matrix with repeated eigenvalues and too few eigenvectors, and change the entries of the matrix a little, some of the eigenvectors (the ones corresponding to the eigenvalues whose algebraic multiplicity is higher than the geometric multiplicity) will split into several eigenvectors that are almost parallel.
IV.1.9 Diagonalization
Suppose $A$ is an $n\times n$ matrix with eigenvalues $\lambda_1, \ldots, \lambda_n$ and a basis of eigenvectors $v_1, \ldots, v_n$. Form the matrix with the eigenvectors as columns:
$$S = \begin{bmatrix} v_1 & v_2 & \cdots & v_n\end{bmatrix}.$$
Then
$$AS = \begin{bmatrix} Av_1 & Av_2 & \cdots & Av_n\end{bmatrix} = \begin{bmatrix} \lambda_1 v_1 & \lambda_2 v_2 & \cdots & \lambda_n v_n\end{bmatrix} = \begin{bmatrix} v_1 & v_2 & \cdots & v_n\end{bmatrix}D = SD,$$
where $D$ is the diagonal matrix with the eigenvalues on the diagonal. Since the columns of $S$ are independent, the inverse exists and we can write
$$A = SDS^{-1}, \qquad\text{or equivalently}\qquad S^{-1}AS = D.$$
This is called diagonalization. Notice that the matrix $S$ is exactly the one returned by the MATLAB/Octave call [S D] = eig(A).
>A=[1 2 3; 4 5 6; 7 8 9];
>[S D] = eig(A);
>S*D*S^(-1)
ans =
   1.0000   2.0000   3.0000
   4.0000   5.0000   6.0000
   7.0000   8.0000   9.0000
where each submatrix $B_i$ (called a Jordan block) has a single eigenvalue on the diagonal and 1's on the superdiagonal:
$$B_i = \begin{bmatrix} \lambda_i & 1 & 0 & \cdots & 0\\ 0 & \lambda_i & 1 & \cdots & 0\\ 0 & 0 & \lambda_i & \ddots & \vdots\\ \vdots & & & \ddots & 1\\ 0 & 0 & 0 & \cdots & \lambda_i\end{bmatrix}.$$
If all the blocks are of size $1\times 1$ then there are no 1's and the matrix is diagonalizable.
Thus (taking $A = SD$ and $B = S^{-1}$)
$$\mathrm{tr}(A) = \mathrm{tr}(SDS^{-1}) = \mathrm{tr}(S^{-1}SD) = \mathrm{tr}(D) = \lambda_1 + \lambda_2 + \cdots + \lambda_n.$$
Thus the trace of a diagonalizable matrix is the sum of the eigenvalues. Again, this is true for non-diagonalizable matrices as well, and can be seen from the Jordan Canonical Form.
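A quick numerical check of this (and of the companion fact, mentioned in the learning goals, that the determinant is the product of the eigenvalues) can be done in MATLAB/Octave with the matrix from Example 1; this snippet is my own illustration.
>A=[3 -6 -7; 1 8 5; -1 -2 1];
>trace(A), sum(eig(A))      % both equal 12 = 2 + 4 + 6
>det(A),   prod(eig(A))     % both equal 48 = 2 * 4 * 6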
$$A^k = (SDS^{-1})^k = SD^kS^{-1}$$
because all of the factors $S^{-1}S$ cancel. Since powers of the diagonal matrix $D$ are given by
$$D^k = \begin{bmatrix} \lambda_1^k & 0 & 0 & \cdots & 0\\ 0 & \lambda_2^k & 0 & \cdots & 0\\ 0 & 0 & \lambda_3^k & \cdots & 0\\ \vdots & & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & \lambda_n^k\end{bmatrix},$$
this formula provides an effective way to understand and compute $A^k$ for large $k$.
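For instance (an illustrative check of my own, not from the original notes), we can compare computing $A^{10}$ directly with computing it through the diagonalization, using the matrix from Example 1:
>A=[3 -6 -7; 1 8 5; -1 -2 1];
>[S D] = eig(A);
>norm(S*D^10*S^(-1) - A^10)    % tiny: A^k = S D^k S^(-1)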
we find
$$\lambda_1\langle v_1, v_2\rangle = \lambda_2\langle v_1, v_2\rangle.$$
Here $\lambda_1$ should appear as $\bar\lambda_1$, but we already know that the eigenvalues are real, so $\bar\lambda_1 = \lambda_1$. This can be written
$$(\lambda_1 - \lambda_2)\langle v_1, v_2\rangle = 0,$$
and since $\lambda_1 - \lambda_2 \ne 0$ this implies $\langle v_1, v_2\rangle = 0$.
This calculation shows that if $A$ has distinct eigenvalues then the eigenvectors are all orthogonal, and by rescaling them, we can obtain an orthonormal basis of eigenvectors. In fact, even if $A$ has repeated eigenvalues, it is still true that an orthonormal basis of eigenvectors exists.
If $A$ is real and symmetric, then the eigenvectors can be chosen to be real. One way to see this is to notice that once we know that $\lambda$ is real, all the calculations involved in computing the nullspace of $\lambda I - A$ only involve real numbers. This implies that the matrix that diagonalizes $A$ can be chosen to be an orthogonal matrix.
If $A$ has complex entries but satisfies $A^* = A$ it is called Hermitian. (Recall that $A^* = \bar A^T$.) The argument above is still valid for Hermitian matrices and shows that all the eigenvalues are real. There also exists an orthonormal basis of eigenvectors. However, in contrast to the case where $A$ is real and symmetric, the eigenvectors may have complex entries. Thus a Hermitian matrix may be diagonalized by a unitary matrix.
If $A$ is any matrix with real entries, then $A + A^T$ is real symmetric. (The matrix $A^TA$ is also real symmetric, and has the additional property that all the eigenvalues are positive.) We can use this to produce random symmetric matrices in MATLAB/Octave like this:
>A = rand(4,4);
>A = A+A'
A =
   0.043236   1.240654   0.658890   0.437168
   1.240654   1.060615   0.608234   0.911889
   0.658890   0.608234   1.081767   0.706712
   0.437168   0.911889   0.706712   1.045293
The eigenvalues are real, as expected. Also, the eigenvectors contained in the columns of the matrix V have been normalized. Thus V is orthogonal:
>V'*V
ans =
   1.0000e+00   9.1791e-17  -1.4700e-16   1.3204e-16
   6.5066e-17   1.0000e+00  -1.0012e-16   2.2036e-17
  -1.4700e-16  -1.0432e-16   1.0000e+00  -1.0330e-16
   1.4366e-16   2.2036e-17  -1.2617e-16   1.0000e+00
Normalize the result to unit length, and repeat. This generates a sequence of vectors $x_0, x_1, x_2, \ldots$. It turns out that these vectors converge to the eigenvector corresponding to the eigenvalue whose absolute value is the largest.
To verify this claim, let's first find a formula for $x_k$. At each stage of this process, we are multiplying by $A$ and then by some number. Thus $x_k$ must be a multiple of $A^kx_0$. Since the resulting vector has unit length, that number must be $1/\|A^kx_0\|$. Thus
$$x_k = \frac{A^kx_0}{\|A^kx_0\|}.$$
We know that $A$ has a basis of eigenvectors $v_1, v_2, \ldots, v_n$. Order them so that $|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$. (We are assuming here that $|\lambda_1| \ne |\lambda_2|$, otherwise the power method runs into difficulty.) We may expand our initial vector $x_0$ in this basis:
$$x_0 = c_1v_1 + c_2v_2 + \cdots + c_nv_n.$$
We need $c_1 \ne 0$ for this method to work, but if $x_0$ is chosen at random this is almost always true. Since $A^kv_i = \lambda_i^kv_i$ we have
$$A^kx_0 = c_1\lambda_1^kv_1 + c_2\lambda_2^kv_2 + \cdots + c_n\lambda_n^kv_n = \lambda_1^k\bigl(c_1v_1 + c_2(\lambda_2/\lambda_1)^kv_2 + \cdots + c_n(\lambda_n/\lambda_1)^kv_n\bigr) = \lambda_1^k(c_1v_1 + \epsilon_k),$$
where $\epsilon_k \to 0$ as $k \to \infty$. This is because $|\lambda_i/\lambda_1| < 1$ for every $i > 1$, so the powers tend to zero. Thus $\|A^kx_0\| = |\lambda_1|^k\|c_1v_1 + \epsilon_k\|$, so that
$$x_k = \frac{A^kx_0}{\|A^kx_0\|} = \left(\frac{\lambda_1}{|\lambda_1|}\right)^k\frac{c_1v_1 + \epsilon_k}{\|c_1v_1 + \epsilon_k\|} \longrightarrow (\pm 1)^k\,\frac{c_1v_1}{\|c_1v_1\|}.$$
We have shown that $x_k$ converges, except for a possible sign flip at each stage, to a normalized eigenvector corresponding to $\lambda_1$. The sign flip is present exactly when $\lambda_1 < 0$. Knowing $v_1$ (or a multiple of it) we can find $\lambda_1$ with
$$\lambda_1 = \frac{\langle v_1, Av_1\rangle}{\|v_1\|^2}.$$
This gives a method for finding the largest eigenvalue (in absolute value) and the corresponding eigenvector. Let's try it out.
IV.2 Power Method for Computing Eigenvalues >A = [4 1 3;1 3 2; 3 2 5]; >x=rand(3,1); >for k = [1:10] >y=A*x; >x=y/norm(y) >end x = 0.63023 0.38681 0.67319 x = 0.58690 0.37366 0.71828 x = 0.57923 0.37353 0.72455 x = 0.57776 0.37403 0.72546 x = 0.57745 0.37425 0.72559 x = 0.57738 0.37433 0.72561 x =
0.57736 0.37435 0.72562 x = 0.57735 0.37436 0.72562 x = 0.57735 0.37436 0.72562 x = 0.57735 0.37436 0.72562 This gives the eigenvector. We compute the eigenvalue with
>x'*A*x/norm(x)^2
ans = 8.4188
Let's check:
>[V D] = eig(A)
V =
   0.577350   0.577350   0.577350
   0.441225  -0.815583   0.374359
  -0.687013  -0.038605   0.725619
D =
   1.19440   0.00000   0.00000
   0.00000   2.38677   0.00000
   0.00000   0.00000   8.41883
As expected, we have computed the largest eigenvalue and eigenvector. Of course, a serious program that uses this method would not just iterate a fixed number (above it was 10) of times, but would check for convergence, perhaps by checking whether $\|x_k - x_{k-1}\|$ was less than some small number, and stopping when this was achieved.
So far, the power method only computes the eigenvalue with the largest absolute value, and the corresponding eigenvector. What good is that? Well, it turns out that with an additional twist we can compute the eigenvalue closest to any number $s$. The key observation is that the eigenvalues of $(A - sI)^{-1}$ are exactly $(\lambda_i - s)^{-1}$ (unless, of course, $A - sI$ is not invertible; but then $s$ is an eigenvalue itself and we can stop looking). Moreover, the eigenvectors of $A$ and $(A - sI)^{-1}$ are the same. Let's see why this is true. If $Av = \lambda v$ then
$$(A - sI)v = (\lambda - s)v.$$
Now if we multiply both sides by $(A - sI)^{-1}$ and divide by $\lambda - s$ we get $(\lambda - s)^{-1}v = (A - sI)^{-1}v$. These steps can be run backwards to show that if $(\lambda - s)^{-1}$ is an eigenvalue of $(A - sI)^{-1}$ with eigenvector $v$, then $\lambda$ is an eigenvalue of $A$ with the same eigenvector. Now start with an arbitrary vector $x_0$ and define
$$x_{k+1} = \frac{(A - sI)^{-1}x_k}{\|(A - sI)^{-1}x_k\|}.$$
Then $x_k$ will converge to the eigenvector $v_i$ of $(A - sI)^{-1}$ for which $|\lambda_i - s|^{-1}$ is the largest. But, since the eigenvectors of $A$ and $(A - sI)^{-1}$ are the same, $v_i$ is also an eigenvector of $A$. And since $|\lambda_i - s|^{-1}$ is largest when $\lambda_i$ is closest to $s$, we have computed the eigenvector $v_i$ of $A$ for which $\lambda_i$ is closest to $s$. We can now compute $\lambda_i$ by comparing $Av_i$ with $v_i$.
Here is a crucial point: when computing $(A - sI)^{-1}x_k$ in this procedure, we should not actually compute the inverse. We don't need to know the whole matrix $(A - sI)^{-1}$, but just the vector $(A - sI)^{-1}x_k$. This vector is the solution $y$ of the linear equation $(A - sI)y = x_k$. In MATLAB/Octave we would therefore use something like (A - s*eye(n))\Xk. Let's try to compute the eigenvalue of the matrix $A$ above closest to 3.
>A = [4 1 3;1 3 2; 3 2 5];
>x=rand(3,1);
>for k = [1:10]
>y=(A-3*eye(3))\x;
>x=y/norm(y)
>end
x = 0.649008 -0.756516 0.080449 x = -0.564508 0.824051 0.047657 x = 0.577502 -0.815593 -0.036045 x = -0.576895 0.815917 0.038374 x = 0.577253 -0.815659 -0.038454 x = -0.577311 0.815613 0.038562 x = 0.577338 -0.815593 -0.038590 x =
-0.577346 0.815587 0.038600 x = 0.577349 -0.815585 -0.038603 x = -0.577350 0.815584 0.038604 This gives the eigenvector. Now we can find the eigenvalue
> lambda = x'*A*x/norm(x)^2
lambda = 2.3868
Comparing with the results of eig above, we see that we have computed the middle eigenvalue and eigenvector.
Consider the numbers
$$\frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^n, \qquad n = 1, 2, 3, \ldots.$$
When $n$ is large, these numbers are almost integers:
>format long;
>((1+sqrt(5))/2)^30/sqrt(5)
ans = 832040.000000241
Compare this with the Fibonacci sequence, where each number in the sequence is obtained by adding the previous two. If you go far enough along in this sequence you will encounter
... 832040 1346269 2178309 ...
and you can check (without using MATLAB/Octave, I hope) that the third number is the sum of the previous two. But why should powers of the golden ratio be very nearly, but not quite, equal to Fibonacci numbers? The reason is that the Fibonacci sequence is defined by a recursion relation. For the Fibonacci sequence $F_0, F_1, F_2, \ldots$ the recursion relation is
$$F_{n+1} = F_n + F_{n-1}.$$
This equation, together with the identity $F_n = F_n$, can be written in matrix form as
$$\begin{bmatrix}F_{n+1}\\ F_n\end{bmatrix} = \begin{bmatrix}1 & 1\\ 1 & 0\end{bmatrix}\begin{bmatrix}F_n\\ F_{n-1}\end{bmatrix}.$$
Thus, taking $n = 1$, we find
$$\begin{bmatrix}F_2\\ F_1\end{bmatrix} = \begin{bmatrix}1 & 1\\ 1 & 0\end{bmatrix}\begin{bmatrix}F_1\\ F_0\end{bmatrix}.$$
Similarly
$$\begin{bmatrix}F_3\\ F_2\end{bmatrix} = \begin{bmatrix}1 & 1\\ 1 & 0\end{bmatrix}\begin{bmatrix}F_2\\ F_1\end{bmatrix} = \begin{bmatrix}1 & 1\\ 1 & 0\end{bmatrix}^2\begin{bmatrix}F_1\\ F_0\end{bmatrix},$$
and continuing like this we find
$$\begin{bmatrix}F_{n+1}\\ F_n\end{bmatrix} = \begin{bmatrix}1 & 1\\ 1 & 0\end{bmatrix}^n\begin{bmatrix}F_1\\ F_0\end{bmatrix}.$$
Finally, since $F_0 = 0$ and $F_1 = 1$ we can write
$$\begin{bmatrix}F_{n+1}\\ F_n\end{bmatrix} = \begin{bmatrix}1 & 1\\ 1 & 0\end{bmatrix}^n\begin{bmatrix}1\\ 0\end{bmatrix}.$$
We can diagonalize the matrix to get a formula for the Fibonacci numbers. The eigenvalues and eigenvectors of $\begin{bmatrix}1 & 1\\ 1 & 0\end{bmatrix}$ are
$$\lambda_1 = \frac{1+\sqrt5}{2},\quad v_1 = \begin{bmatrix}\frac{1+\sqrt5}{2}\\[2pt] 1\end{bmatrix} \qquad\text{and}\qquad \lambda_2 = \frac{1-\sqrt5}{2},\quad v_2 = \begin{bmatrix}\frac{1-\sqrt5}{2}\\[2pt] 1\end{bmatrix}.$$
This implies
$$\begin{bmatrix}1 & 1\\ 1 & 0\end{bmatrix}^n = \begin{bmatrix}\lambda_1 & \lambda_2\\ 1 & 1\end{bmatrix}\begin{bmatrix}\lambda_1^n & 0\\ 0 & \lambda_2^n\end{bmatrix}\begin{bmatrix}\lambda_1 & \lambda_2\\ 1 & 1\end{bmatrix}^{-1} = \frac{1}{\sqrt5}\begin{bmatrix}\lambda_1 & \lambda_2\\ 1 & 1\end{bmatrix}\begin{bmatrix}\lambda_1^n & 0\\ 0 & \lambda_2^n\end{bmatrix}\begin{bmatrix}1 & -\lambda_2\\ -1 & \lambda_1\end{bmatrix}.$$
$$F_n = \frac{1}{\sqrt5}\bigl(\lambda_1^n - \lambda_2^n\bigr).$$
Since $\lambda_2 \approx -0.6180339880$ is smaller than 1 in absolute value, the powers $\lambda_2^n$ become small very quickly as $n$ becomes large. This explains why
$$F_n \approx \frac{1}{\sqrt5}\lambda_1^n$$
for large $n$. If we want to use MATLAB/Octave to compute Fibonacci numbers, we don't need to bother diagonalizing the matrix.
>[1 1;1 0]^30*[1;0]
ans =
   1346269
    832040
produces the same Fibonacci numbers as above.
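As a check (my own snippet, not from the notes), the eigenvalue formula for $F_n$ reproduces the same numbers as the matrix power:
>l1 = (1+sqrt(5))/2; l2 = (1-sqrt(5))/2;
>(l1^30 - l2^30)/sqrt(5)       % gives 832040, the Fibonacci number F_30
>[1 1;1 0]^30*[1;0]            % top entry F_31 = 1346269, bottom entry F_30 = 832040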
so given three initial values $x_0$, $x_1$ and $x_2$ we can find the rest by computing powers of a $3\times 3$ matrix. In the next section we will solve a recurrence relation that arises in Quantum Mechanics.
$$\sum_n |\psi_n|^2 < \infty.$$
This condition is trivially satisfied if $\psi = \{0, 0, 0, \ldots\}$, so we specifically exclude this case. (In fact $\psi$ is actually the eigenvector of an infinite matrix, so this is just the condition that eigenvectors must be non-zero.) Given an energy $E$, it is always possible to find a wave function $\psi$ to satisfy conditions (1) and (2). However, for most energies $E$, none of these $\psi$'s will be getting small for large $n$, so the normalization condition (3) will not be satisfied. There is only a discrete set of energies $E$ for which a bound state satisfying all three conditions exists. In other words, the energy $E$ is quantized. If $E$ is one of the allowed energy values and $\psi$ is the corresponding bound state, then the numbers $|\psi_n|^2/N^2$ are interpreted as the probabilities of finding an electron with energy $E$ at the $n$th site. These numbers add up to 1, consistent with the interpretation as probabilities. Notice that if $\psi$ is a bound state with energy $E$, then so is any non-zero multiple $a\psi = \{a\psi_0, a\psi_1, a\psi_2, \ldots\}$. Replacing $\psi$ with $a\psi$ has no effect on the probabilities because $N$ changes to $|a|N$, so the $a$'s cancel in $|\psi_n|^2/N^2$.
IV.4 The Anderson Tight Binding Model
So far we are able to compute $x_n$ (and thus $\psi_n$) satisfying conditions (1) and (2) for any values of $V_1, V_2, \ldots$. Condition (3) is a statement about the large $n$ behaviour of $\psi_n$. This can be very difficult to determine, unless we know more about the values $V_n$.
The large $n$ behaviour of $x_n$ can be computed using the eigenvalues and eigenvectors of $A(E)$. Suppose they are $\lambda_1, v_1$ and $\lambda_2, v_2$. Then we expand $x_1 = a_1v_1 + a_2v_2$ and conclude that
$$x_n = A(E)^{n-1}(a_1v_1 + a_2v_2) = a_1\lambda_1^{n-1}v_1 + a_2\lambda_2^{n-1}v_2.$$
Keep in mind that all the quantities in this equation depend on $E$. Our goal is to choose $E$ so that the $x_n$ become small for large $n$. Before computing the eigenvalues of $A(E)$, let's note that $\det(A(E)) = 1$. This implies that
$$\lambda_1\lambda_2 = 1.$$
Suppose the eigenvalues are complex. Then, since $A(E)$ has real entries, they must be complex conjugates. Thus $\lambda_2 = \bar\lambda_1$ and $1 = \lambda_1\lambda_2 = \lambda_1\bar\lambda_1 = |\lambda_1|^2$. This means that $\lambda_1$ and $\lambda_2$ lie on the unit circle in the complex plane. In other words, $\lambda_1 = e^{i\theta}$ and $\lambda_2 = e^{-i\theta}$ for some $\theta$. This implies that $\lambda_1^{n-1} = e^{i(n-1)\theta}$ is also on the unit circle, and is not getting small. Similarly $\lambda_2^{n-1}$ is not getting small. So complex eigenvalues will never lead to bound states. In fact, complex eigenvalues correspond to scattering states, and the energy values for which the eigenvalues are complex are the energies at which the electron can move freely through the crystal.
Suppose the eigenvalues are real. If $|\lambda_1| > 1$ then $|\lambda_2| = 1/|\lambda_1| < 1$ and vice versa. So one of the products $\lambda_1^{n-1}$, $\lambda_2^{n-1}$ will be growing large, and one will be getting small. So the only way that $x_n$ can be getting small is if the coefficient $a_1$ or $a_2$ sitting in front of the growing product is zero. Now let us actually compute the eigenvalues. They are
$$\lambda_\pm = \frac{E \pm \sqrt{E^2 - 4}}{2}.$$
If $-2 < E < 2$ then the eigenvalues are complex, so there are no bound states. The interval $[-2, 2]$ is the conduction band, where the electrons can move through the crystal.
If $E = \pm 2$ then there is only one eigenvalue, namely $\pm 1$. In this case there actually is only one eigenvector, so our analysis doesn't apply. However, there are no bound states in this case either. Now let us consider the case $E < -2$. Then the large eigenvalue is $\lambda_1 = (E - \sqrt{E^2-4})/2$ and the small eigenvalue is $\lambda_2 = (E + \sqrt{E^2-4})/2$, with corresponding eigenvectors $v_1$ and $v_2$. We must now compute $a_1$ and determine when it is zero. We have $[v_1\,|\,v_2]\begin{bmatrix}a_1\\ a_2\end{bmatrix} = x_1$. This is a $2\times 2$ matrix equation that we can easily solve for $\begin{bmatrix}a_1\\ a_2\end{bmatrix}$. A short calculation shows that $a_1 = 0$ whenever
$$(a + E)\bigl(E - \sqrt{E^2-4}\bigr) - 2 = 0.$$
Let's consider the case $a = 5$ and plot this function of $E$ on the interval $[-10, -2]$. To see if it crosses the axis, we also plot the function zero.
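The plotting commands themselves are not shown in the notes at this point; one way to produce such a plot (a sketch with my own choice of grid, using the built-in fzero to locate the crossing) is:
E = linspace(-10,-2,500);
g = (5+E).*(E - sqrt(E.^2-4)) - 2;   % left side of the bound state condition with a = 5
plot(E,g)
hold on
plot(E,zeros(1,length(E)));          % the zero function
fzero(@(E) (5+E)*(E-sqrt(E^2-4))-2, [-6 -5])   % returns approximately -5.2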
[Plot: the function $(5+E)(E - \sqrt{E^2-4}) - 2$ on the interval $[-10, -2]$, together with the zero function; the graph crosses the axis once, near $E = -5.2$.]
We can see that there is a single bound state in this interval, just below $-5$. In fact, the solution is $E = -5.2$. The case $E > 2$ is similar. This time we end up with
$$(a + E)\bigl(E + \sqrt{E^2-4}\bigr) - 2 = 0.$$
When $a = 5$ this never has a solution for $E > 2$. In fact, the left side of this equation is bigger than $(5+2)(2+0) - 2 = 12$ and so can never equal zero. In conclusion, if $V_1 = 5$ and all the other $V_n$'s are zero, then there is exactly one bound state, with energy $E = -5.2$. Here is a diagram of the energy spectrum for this potential.
[Diagram: the energy axis $E$, with a single bound state at $E = -5.2$ and the conduction band covering the interval $[-2, 2]$.]
For the bound state energy $E = -5.2$, the corresponding wave function $\psi$, and thus the probability that the electron is located at the $n$th lattice point, can now also be computed. The evaluation of the infinite sum that gives the normalization constant $N^2$ can be done using a geometric series.
Notice that the matrix $T(E)$ has determinant 1 since it is a product of matrices with determinant 1. So, as above, the eigenvalues $\lambda_1$ and $\lambda_2$ are either real with $\lambda_2 = 1/\lambda_1$, or complex conjugates on the unit circle. As before, the conduction bands are the energies $E$ for which the eigenvalues of $T(E)$ are complex conjugates. It turns out that this happens exactly when
$$|\mathrm{tr}(T(E))| \le 2.$$
To see this, start with the characteristic polynomial for $T(E)$,
$$\det(\lambda I - T(E)) = \lambda^2 - \mathrm{tr}(T(E))\lambda + \det(T(E))$$
(see homework problem). Since $\det(T(E)) = 1$ the eigenvalues are given by
$$\lambda = \frac{\mathrm{tr}(T(E)) \pm \sqrt{\mathrm{tr}(T(E))^2 - 4}}{2}.$$
When $|\mathrm{tr}(T(E))| < 2$ the quantity under the square root sign is negative, and so the eigenvalues have a non-zero imaginary part. Let's use MATLAB/Octave to plot the values of $\mathrm{tr}(T(E))$ as a function of $E$. For convenience we first define a function that computes the matrices $A(z)$. To do this we type the following lines into a file called A.m in our working directory.
function A=A(Z)
  A=[Z -1; 1 0];
end
Next we start with a range of $E$ values and define another vector T that contains the corresponding values of $\mathrm{tr}(T(E))$.
N=100;
E=linspace(-1,6,N);
T=[];
for e = E
  T=[T trace(A(4-e)*A(3-e)*A(2-e)*A(1-e))];
end
Finally, we plot T against E. At the same time, we plot the constant functions 2 and -2.
plot(E,T)
hold on
plot(E,2*ones(1,N));
plot(E,-2*ones(1,N));
axis([-1,6,-10,10])
On the resulting picture the energies where $\mathrm{tr}(T(E))$ lies between $-2$ and $2$ have been highlighted.
[Plot: $\mathrm{tr}(T(E))$ as a function of $E$ on $[-1, 6]$, with the horizontal lines $\pm 2$; the energy ranges where the curve lies between $-2$ and $2$ are highlighted.]
We see that there are four conduction bands for this crystal.
- write down the definition of a stochastic matrix and its properties
- explain why the probabilities in a random walk approach limiting values
- write down the stochastic matrix for a random walk and calculate the limiting probabilities
- use stochastic matrices to solve practical Markov chain problems
- write down the stochastic matrix associated with the Google PageRank algorithm for a given damping factor $\alpha$, and compute the ranking of the sites for a specified internet
- use the Metropolis algorithm to produce a stochastic matrix with a predetermined limiting probability distribution
In the diagram below there are three sites labelled 1, 2 and 3. Think of a walker moving from site to site. At each step the walker either stays at the same site, or moves to one of the other sites according to a set of fixed probabilities. The probability of moving to the $i$th site from the $j$th site is denoted $p_{i,j}$. These numbers satisfy $0 \le p_{i,j} \le 1$ because they are probabilities (0 means no chance and 1 means for sure). On the diagram they label the arrows indicating the relevant transitions. Since the walker has to go somewhere at each step, the sum of all the probabilities leaving a given site must be one. Thus for every $j$,
$$\sum_i p_{i,j} = 1.$$
[Diagram: three sites 1, 2 and 3 with arrows between them labelled by the transition probabilities $p_{i,j}$, including the loops $p_{1,1}$, $p_{2,2}$ and $p_{3,3}$.]
Let $x_{n,i}$ be the probability that the walker is at site $i$ after $n$ steps. We collect these probabilities into a sequence of vectors called state vectors. Each state vector contains the probabilities for the $n$th step in the walk:
$$x_n = \begin{bmatrix}x_{n,1}\\ x_{n,2}\\ \vdots\\ x_{n,k}\end{bmatrix}.$$
The probability that the walker is at site $i$ after $n+1$ steps can be calculated from the probabilities for the previous step. It is the sum over all sites of the probability that the walker was at that site, times the probability of moving from that site to the $i$th site. Thus
$$x_{n+1,i} = \sum_j p_{i,j}\,x_{n,j}.$$
This can be written in matrix form as
$$x_n = Px_{n-1}$$
where $P = [p_{i,j}]$. Using this relation repeatedly we find
$$x_n = P^nx_0,$$
where $x_0$ contains the probabilities at the beginning of the walk. The matrix $P$ has two properties:
1. All entries of $P$ are non-negative.
2. Each column of $P$ sums to 1.
A matrix with these properties is called a stochastic matrix. The goal is to determine where the walker is likely to be located after many steps. In other words, we want to find the large $n$ behaviour of $x_n = P^nx_0$. Let's look at an example. Suppose there are three sites, the transition probabilities are given by
$$P = \begin{bmatrix}0.5 & 0.2 & 0.1\\ 0.4 & 0.2 & 0.8\\ 0.1 & 0.6 & 0.1\end{bmatrix}$$
and the walker starts at site 1, so that the initial state vector is
$$x_0 = \begin{bmatrix}1\\0\\0\end{bmatrix}.$$
Now let's use MATLAB/Octave to calculate the subsequent state vectors for n = 1, 10, 100, 1000.
>P=[.5 .2 .1; .4 .2 .8; .1 .6 .1];
>X0=[1; 0; 0];
>P*X0
ans =
   0.50000
   0.40000
   0.10000
>P^10*X0
ans =
   0.24007
   0.43961
   0.32032
>P^100*X0
ans =
   0.24000
   0.44000
   0.32000
>P^1000*X0
The state vectors converge. Let's see what happens if the initial vector is different, say with equal probabilities of being at the second and third sites.
The limit is the same. Of course, we know how to compute high powers of a matrix using the eigenvalues and eigenvectors. A little thought would lead us to suspect that $P$ has an eigenvalue of 1 that is largest in absolute value, and that the corresponding eigenvector is the limiting vector, up to a multiple. Let's check:
>eig(P) ans = 1.00000 0.35826 -0.55826 >P*[0.24000; 0.44000; 0.32000] ans = 0.24000 0.44000 0.32000
$$\lim_{n\to\infty} x_n = c_1v_1.$$
Since each $x_n$ is a state vector, so is the limit $c_1v_1$. This allows us to compute $c_1$ easily: it is the reciprocal of the sum of the entries of $v_1$. In particular, if we chose $v_1$ to be a state vector then $c_1 = 1$.
Now we will go through the properties above and explain why they are true.
(1) $P$ preserves state vectors: Suppose $x$ is a state vector, that is, $x$ has non-negative entries which sum to 1. Then $Px$ has non-negative entries too, since all the entries of $P$ are non-negative. Also
$$\sum_i (Px)_i = \sum_i\sum_j P_{i,j}x_j = \sum_j\sum_i P_{i,j}x_j = \sum_j x_j\sum_i P_{i,j} = \sum_j x_j = 1.$$
Next, 1 is an eigenvalue of $P$: Since $P$ has columns adding up to 1, $P^T$ has rows that add up to 1. This fact can be expressed as the matrix equation
$$P^T\begin{bmatrix}1\\1\\\vdots\\1\end{bmatrix} = \begin{bmatrix}1\\1\\\vdots\\1\end{bmatrix}.$$
But this equation says that 1 is an eigenvalue for $P^T$. The key point here is that $P$ and $P^T$ have the same eigenvalues. To see this, recall that $\det(A) = \det(A^T)$. This implies that
$$\det(\lambda I - P) = \det((\lambda I - P)^T) = \det(\lambda I - P^T),$$
so $P$ and $P^T$ have the same characteristic polynomial. Since the eigenvalues are the zeros of the characteristic polynomial, they must be the same for $P$ and $P^T$. (Notice that this does not say that the eigenvectors are the same.) Therefore 1 is an eigenvalue for $P$ as well.
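We can see both facts at work on the $3\times 3$ example above (an illustrative check of my own):
>P=[.5 .2 .1; .4 .2 .8; .1 .6 .1];
>sum(P)                 % each column sums to 1
>P'*[1;1;1]             % returns [1;1;1], so 1 is an eigenvalue of P'
>eig(P), eig(P')        % the same eigenvalues (possibly listed in a different order)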
Next we show that every eigenvalue $\lambda$ of $P$ satisfies $|\lambda| \le 1$. To show that this is true, we use the 1-norm introduced at the beginning of the course. Recall that the norm $\|\cdot\|_1$ is defined by
$$\left\|\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}\right\|_1 = |x_1| + |x_2| + \cdots + |x_n|.$$
Multiplication by $P$ decreases the length of vectors if we use this norm to measure length. In other words,
$$\|Px\|_1 \le \|x\|_1$$
for any vector $x$. This follows from the calculation (almost the same as the one above, and again using the fact that the columns of $P$ are positive and sum to 1)
$$\|Px\|_1 = \sum_i |(Px)_i| = \sum_i\Bigl|\sum_j P_{i,j}x_j\Bigr| \le \sum_i\sum_j P_{i,j}|x_j| = \sum_j |x_j|\sum_i P_{i,j} = \sum_j |x_j| = \|x\|_1.$$
Now suppose that $\lambda$ is an eigenvalue, so that $Pv = \lambda v$ for some non-zero $v$. Then
$$|\lambda|\,\|v\|_1 = \|\lambda v\|_1 = \|Pv\|_1 \le \|v\|_1.$$
Since $\|v\|_1 \ne 0$ we may divide by $\|v\|_1$ to obtain $|\lambda| \le 1$.
(3) The eigenvector $v_1$ (or some multiple of it) has all non-negative entries: We can give a partial proof of this, in the situation where the eigenvalues other than $\lambda_1 = 1$ obey the strict inequality $|\lambda_i| < 1$. In this case the power method implies that starting with any initial vector $x_0$ the vectors $P^nx_0$ converge to a multiple $c_1v_1$ of $v_1$. If we choose the initial vector $x_0$ to have only positive entries, then every vector in the sequence $P^nx_0$ has only non-negative entries. This implies that the limiting vector must have non-negative entries.
(3) and (4) and the condition that $P^n$ has all positive entries: Saying that $P^n$ has all positive entries means that there is a non-zero probability of moving between any two sites in $n$ steps. The fact that in this case all the eigenvalues other than $\lambda_1 = 1$ obey the strict inequality $|\lambda_i| < 1$ follows from a famous theorem in linear algebra called the Perron-Frobenius theorem. Although we won't be able to prove the Perron-Frobenius theorem, we can give some examples to show that if the condition that $P^n$ has all positive entries for some $n$ is violated, then (3) and (4) need not hold.
This matrix, $P = \begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}$, represents a random walk with two sites that isn't very random. Starting at site one, the walker moves to site two with probability 1, and vice versa. The powers $P^n$ of $P$ are equal to $I$ or $P$ depending on whether $n$ is even or odd. So $P^n$ doesn't converge: it is equal to $\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}$ for even $n$ and $\begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}$ for odd $n$. The eigenvalues of $P$ are easily computed to be $1$ and $-1$. They both have modulus one.
For the second example, consider a random walk where the sites can be divided into two sets $A$ and $B$ and the probability of going from any site in $B$ to any site in $A$ is zero. In this case the $i,j$ entries of $P^n$ with the $i$th site in $A$ and the $j$th site in $B$ are always zero. Also, applying $P$ to any state vector drains probability from $A$ to $B$ without sending any back. This means that in the limit $P^nx_0$ (that is, the eigenvector $v_1$) will have zero entries for all sites in $A$. For example, consider a three-site random walk (the very first example above) where there is no chance of ever going back to site 1. The matrix
$$P = \begin{bmatrix}1/3 & 0 & 0\\ 1/3 & 1/2 & 1/2\\ 1/3 & 1/2 & 1/2\end{bmatrix}$$
corresponds to such a walk. We can check
>P=[1/3 0 0;1/3 1/2 1/2; 1/3 1/2 1/2];
>[V D] = eig(P)
V =
   0.00000   0.00000   0.81650
   0.70711   0.70711  -0.40825
   0.70711  -0.70711  -0.40825
D =
   1.00000   0.00000   0.00000
   0.00000   0.00000   0.00000
   0.00000   0.00000   0.33333
The eigenvector corresponding to the eigenvalue 1 is $\begin{bmatrix}0\\ 1/2\\ 1/2\end{bmatrix}$ (after normalization to make it a state vector).
where $Q$ is the matrix in which each entry is equal to $1/N$ ($N$ is the total number of sites). If $x$ is a state vector, then $Qx$ is a vector with each entry equal to $1/N$. The value of $\alpha$ used in practice is about 0.85. The final ranking of a page can then be defined to be the corresponding entry in the eigenvector $v_1$ for $S$. The matrix $S$ has all non-zero entries
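To make this concrete, here is a small hypothetical example (my own sketch, not from the notes). It assumes the combination S = alpha*P + (1-alpha)*Q, which is the standard choice consistent with a damping factor $\alpha \approx 0.85$, and ranks a three-page web by the invariant state vector of $S$.
% Hypothetical 3-page internet: P(i,j) is the probability of following a link from page j to page i
P = [0   1/2 1/3;
     1/2 0   1/3;
     1/2 1/2 1/3];            % each column sums to 1
N = 3; alpha = 0.85;
Q = ones(N,N)/N;              % the matrix with every entry 1/N
S = alpha*P + (1-alpha)*Q;    % assumed form of the PageRank matrix
[V,D] = eig(S);
[m,k] = max(abs(diag(D)));    % locate the eigenvalue 1
v = V(:,k); v = v/sum(v)      % invariant state vector = ranking of the three pages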
possible to construct a function $p$ that is large on strings that are actually English. Here is the output of a random walk that is attempting to maximize this function.
Let $Q$ denote the adjusted matrix. Then $q_{i,j}\pi_j = q_{j,i}\pi_i$ as required, but $Q$ is not necessarily a stochastic matrix. However, we have the freedom to change the diagonal entries of $Q$. So set the diagonal entries of $Q$ equal to
$$q_{i,i} = 1 - \sum_{j\ne i} q_{j,i}.$$
Then the columns of $Q$ add up to 1 as required. The only thing left to check is that the entries we just defined are non-negative. Since we have always made the adjustment so that $q_{i,j} \le p_{i,j}$, we have
$$q_{i,i} \ge 1 - \sum_{j\ne i} p_{j,i} = p_{i,i} \ge 0.$$
In practice we may not be presented with the state vector $\pi$ but with a positive vector $p = [p_1, p_2, \ldots, p_n]^T$ whose maximal component we want to find. Although in principle it is easy to obtain $\pi$ as $\pi = p/(\sum_i p_i)$, in practice there may be so many terms in the sum that it is impossible to compute. However, notice that the crucial equation $q_{i,j}\pi_j = q_{j,i}\pi_i$ is true for $\pi_i$ and $\pi_j$ if and only if it is true for $p_i$ and $p_j$. Thus we may use the values of $p$ instead.
Notice that if one of $p_{i,j}$ or $p_{j,i}$ is 0 then $q_{i,j} = q_{j,i} = 0$. So it is possible that this algorithm would yield $Q = I$. This is indeed a stochastic matrix where every state vector is an invariant distribution, but it is not very useful. To avoid examples like this, one could restrict the use of the Metropolis algorithm to matrices $P$ where $p_{i,j}$ and $p_{j,i}$ are always either both zero or both non-zero.
IV.5.6 An example
In this example we will use the Metropolis algorithm to try to maximize a function $f(x, y)$ defined in the square $-1 \le x \le 1$ and $-1 \le y \le 1$. We put down a uniform $(2N+1)\times(2N+1)$ grid and take the resulting lattice points to be our sites. The initial stochastic matrix $P$ is the one that gives equal probabilities for travelling from a site to each neighbouring site. We use the Metropolis algorithm to modify this, obtaining a stochastic matrix $Q$ whose invariant distribution is proportional to the sampled values of $f$. We then use $Q$ to produce a random path that has a high probability of converging to the maximum of $f$. To begin we write a MATLAB/Octave function f.m that defines a Gaussian function $f$ with a peak at $(0, 0)$.
function f=f(x,y)
  a = 10;
  f = exp(-(a*x)^2-(a*y)^2);
end
IV Eigenvalues and Eigenvectors Next we write a function p.m such that p(i,j,k,l,N) denes the matrix element for stochastic matrix P corresponding to the sites (i, j) and (k, l) when the grid size is (2N + 1) (2N + 1). function p = p(i,j,k,l,N) % [i,j] and [k,l] are two points on a grid with % -N <= i,j,k,l <= N. The probabilities are those for % a uniform random walk, taking into account the edges % Note: with our convention, p([i,j],[k,l],N) is the % probability of going from [k,l] to [i,j] % Note: if i,j,k,l are out of range, then p=0 if( ([k,l]==[N,N] | [k,l]==[N,-N] | [k,l]==[-N,N] | [k,l]==[-N,-N]) & abs(i-k)+abs(j-l)==1) % starting point [k,l] is a corner: p = 1/2; elseif( (k==N | k==-N | l==N | l==-N) & abs(i-k)+abs(j-l)==1) % starting point [k,l] is on the edge: p = 1/3; elseif(abs(i-k)+abs(j-l)==1) % starting point is inside p = 1/4; else p = 0; end % handle out of range cases: if(i<-N | i>N | j<-N | j>N | k<-N | k>N | l<-N | l>N ) p=0; end end The next function q.m implements the Metropolis algorithm to give the matrix elements of Q. function qq = q(i,j,k,l,N) if(i~=k | j~=l) if(p(i,j,k,l,N)==0)
IV.5 Markov Chains qq = 0; elseif(p(i,j,k,l,N)*f(k/N,l/N) > f(i/N,j/N)*p(k,l,i,j,N)) qq = p(i,j,k,l,N)*f(i/N,j/N)/f(k/N,l/N); else qq = p(i,j,k,l,N); end else qq = 1 - q(k+1,l,k,l,N) - q(k-1,l,k,l,N) - q(k,l+1,k,l,N) - q(k,l-1,k,l,N); end Finally, here are the commands that compute a random walk dened by Q % set the grid size N=50; % set the number of iterations niter=1000 % pick starting grid point [i,j] at random k=discrete_rnd(1,[-N:N],ones(1,2*N+1)/(2*N+1)); l=discrete_rnd(1,[-N:N],ones(1,2*N+1)/(2*N+1)); % or start in a corner (if these are uncommented) k=N; l=N; % initialize: X and Y contain the path % F contains f values along the path X=zeros(1,niter); Y=zeros(1,niter); F=zeros(1,niter); % the main loop for count=1:niter % pick the direction to go in the grid, % according to the probabilites in the stochastic matrix q probs = [q(k,l+1,k,l,N),q(k+1,l,k,l,N),q(k,l-1,k,l,N),q(k-1,l,k,l,N),q(k,l,k,l,N)]; dn = discrete_rnd(1,[1,2,3,4,5],probs); switch dn case 1 l=l+1; case 2 k=k+1; case 3
IV Eigenvalues and Eigenvectors l=l-1; case 4 k=k-1; end % calculate X, Y and F values for the new grid point X(count)=k/N; Y(count)=l/N; F(count)=f(k/N,l/N); end % plot the path subplot(1,2,1); plot(X,Y); axis([-1,1,-1,1]); axis equal %plot the values of F along the path subplot(1,2,2); plot(F); The resulting plot looks like
[Plot: left panel, the random path traced out on the grid in $[-1,1]\times[-1,1]$; right panel, the values of $f$ along the path.]
is the largest absolute value of a diagonal entry, that is, the largest value of $|\lambda_i|$. Since, for a diagonal matrix, the diagonal entries $\lambda_i$ are also the eigenvalues of $D$, it is natural to conjecture that for any matrix $A$ the matrix norm $\|A\|$ is the largest absolute value of an eigenvalue of $A$. Let's test this conjecture on a $2\times 2$ example.
A=[1 2; 3 4]
A =
   1   2
   3   4
eig(A)
ans =
  -0.37228
   5.37228
norm(A)
ans = 5.4650
Since $5.37228 \ne 5.4650$ we see that the conjecture is false. But before giving up on this idea, let's try one more time.
B=[1 3; 3 4]
B =
   1   3
   3   4
eig(B)
ans =
  -0.85410
   5.85410
norm(B)
ans = 5.8541
This time the norm and the largest absolute value of an eigenvalue are both equal to 5.8541.
The difference between these two examples is that $B$ is a Hermitian matrix (in fact, a real symmetric matrix) while $A$ is not. It turns out that for any Hermitian matrix (recall this means $A^* = A$) the matrix norm is equal to the largest eigenvalue in absolute value. This is related to the fact that every Hermitian matrix $A$ can be diagonalized by a unitary $U$: $A = UDU^*$ with $D$ diagonal with the eigenvalues on the diagonal and $U$ unitary. The point is that multiplying $A$ by a unitary matrix doesn't change the norm. Thus $D$ has the same norm as $UD$, which has the same norm as $UDU^* = A$. But the norm of $D$ is the largest eigenvalue in absolute value.
The singular value decomposition is a decomposition similar to this that is valid for any matrix. For an arbitrary matrix it takes the form
$$A = U\Sigma V^*$$
where $U$ and $V$ are unitary and $\Sigma$ is a diagonal matrix with positive entries on the diagonal. These positive numbers are called the singular values of $A$. As we will see, it is the largest singular value that is equal to the matrix norm. The matrix $A$ does not have to be a square matrix. In this case, the unitary matrices $U$ and $V$ are not only different, but have different sizes. The matrix $\Sigma$ has the same dimensions as $A$.
Eigenvectors belonging to distinct eigenvalues are orthogonal.
This means that if all the eigenvalues of a Hermitian matrix $A$ are distinct, we can choose the corresponding eigenvectors to form an orthonormal basis. The matrix $U$ with this orthonormal basis as its columns is unitary (that is, $U^* = U^{-1}$) and diagonalizes $A$. Thus $U^*AU = U^{-1}AU = D$, where $D$ is a diagonal matrix with the eigenvalues of $A$ as diagonal entries.
The goal of this section is to show that any Hermitian matrix $A$ can be diagonalized by a unitary. In other words, there exists a unitary matrix $U$ such that $U^*AU = U^{-1}AU = D$, where $D$ is a diagonal matrix. This equation is a compact way of saying that the columns of $U$ form a basis of eigenvectors, with eigenvalues given by the diagonal entries of $D$. Since $U$ is unitary, this basis of eigenvectors is orthonormal.
The argument we present works whether or not A has repeated eigenvalues and also gives a new proof of the fact that the eigenvalues are real.
To begin we show that if $A$ is any $n\times n$ matrix (not necessarily Hermitian), then there exists a unitary matrix $U$ such that $U^*AU$ is upper triangular. To see this, start with any eigenvector $v$ of $A$ with eigenvalue $\lambda$. (Every matrix has at least one eigenvalue.) Normalize $v$ so that $\|v\| = 1$. Now choose an orthonormal basis $q_2, \ldots, q_n$ for the orthogonal complement of the subspace spanned by $v$, so that $v, q_2, \ldots, q_n$ form an orthonormal basis for all of $\mathbb{C}^n$. Form the matrix $U_1$ with these vectors as columns. Then, using $*$ to denote a number whose value we do not need to know,
$$U_1^*AU_1 = \begin{bmatrix} \lambda & * & \cdots & *\\ 0 & & & \\ \vdots & & A_2 & \\ 0 & & & \end{bmatrix},$$
since the first column has entries $\langle v, Av\rangle = \lambda$ and $\langle q_j, Av\rangle = \lambda\langle q_j, v\rangle = 0$.
Here $A_2$ is an $(n-1)\times(n-1)$ matrix.
Repeating the same procedure with $A_2$ we can construct an $(n-1)\times(n-1)$ unitary matrix $\tilde U_2$ with
$$\tilde U_2^*A_2\tilde U_2 = \begin{bmatrix} \lambda_2 & * & \cdots & *\\ 0 & & & \\ \vdots & & A_3 & \\ 0 & & & \end{bmatrix}.$$
Then it is not hard to see that
$$U_2^*U_1^*AU_1U_2 = \begin{bmatrix} \lambda & * & * & \cdots & *\\ 0 & \lambda_2 & * & \cdots & *\\ 0 & 0 & & & \\ \vdots & \vdots & & A_3 & \\ 0 & 0 & & & \end{bmatrix}.$$
Continuing in this way, we find unitary matrices $U_3, U_4, \ldots, U_{n-1}$ so that
$$U_{n-1}^*\cdots U_2^*U_1^*AU_1U_2\cdots U_{n-1} = \begin{bmatrix} \lambda & * & \cdots & *\\ 0 & \lambda_2 & \cdots & *\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n\end{bmatrix}.$$
Define $U = U_1U_2\cdots U_{n-1}$. Then $U$ is unitary, since the product of unitary matrices is unitary, and $U^* = U_{n-1}^*\cdots U_2^*U_1^*$. Thus the equation above says that $U^*AU$ is upper triangular, that is,
$$U^*AU = \begin{bmatrix} \lambda & * & \cdots & *\\ 0 & \lambda_2 & \cdots & *\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n\end{bmatrix}.$$
Notice that if we take the adjoint of this equation, we get
$$U^*A^*U = \begin{bmatrix} \bar\lambda & 0 & \cdots & 0\\ * & \bar\lambda_2 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ * & * & \cdots & \bar\lambda_n\end{bmatrix}.$$
Now let's return to the case where $A$ is Hermitian. Then $A^* = A$, so the matrices appearing in the previous two equations are equal. Thus
$$\begin{bmatrix} \lambda & * & \cdots & *\\ 0 & \lambda_2 & \cdots & *\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n\end{bmatrix} = \begin{bmatrix} \bar\lambda & 0 & \cdots & 0\\ * & \bar\lambda_2 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ * & * & \cdots & \bar\lambda_n\end{bmatrix}.$$
This implies that all the entries denoted $*$ must actually be zero. It also shows that $\bar\lambda_i = \lambda_i$ for every $i$. In other words, Hermitian matrices can be diagonalized by a unitary matrix, and all the eigenvalues are real.
With these definitions
$$U\Sigma V^* = AV\Sigma^{-1}\Sigma V^* = AVV^* = A,$$
so we have produced the singular value decomposition. Notice that $U$ is in fact the unitary matrix that diagonalizes $AA^*$, since
$$U\Sigma^2U^* = AV\Sigma^{-1}\Sigma^2\Sigma^{-1}V^*A^* = AVV^*A^* = AA^*.$$
Moreover, this formula shows that the eigenvalues of $AA^*$ are the same as those of $A^*A$, since they are the diagonal elements of $\Sigma^2$. In MATLAB/Octave, the singular value decomposition is computed using [U,S,V]=svd(A). Let's try this on the example above:
[U S V] = svd([1 1;1 -1;0 1])
U =
   0.57735  -0.70711  -0.40825
  -0.57735  -0.70711   0.40825
   0.57735   0.00000   0.81650
S =
   1.73205   0.00000
   0.00000   1.41421
   0.00000   0.00000
V =
   0  -1
   1  -0
$$\|A\|_{HS} = \Bigl(\sum_i \sigma_i^2\Bigr)^{1/2}.$$
Notice that if we define the vector of singular values $\sigma = [\sigma_1, \ldots, \sigma_n]^T$ then $\|A\| = \|\sigma\|_\infty$ and $\|A\|_{HS} = \|\sigma\|_2$. By taking other vector norms for $\sigma$, like the 1-norm, or the $p$-norm for other values of $p$, we can define a whole new family of norms for matrices. These norms are called Schatten $p$-norms and are useful in some situations.
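These identities are easy to confirm in MATLAB/Octave (an illustrative check of my own, using the matrix from the svd example above; note that MATLAB/Octave call the Hilbert-Schmidt norm the Frobenius norm):
>A=[1 1;1 -1;0 1];
>s = svd(A)                  % the vector of singular values [sqrt(3); sqrt(2)]
>norm(A), max(s)             % both equal 1.7321
>norm(A,'fro'), norm(s)      % both equal 2.2361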