
Numerical Methods

(Hand Out)
Prepared by: Hailu Getachew
Mechanical Engineering Department
Technology Faculty
AAU
December, 2006
Contents
1. Mathematical Modeling, Number System and Errors
2. Solution of Non-linear Equation:
a. Bisection method;
b. Newton's method;
c. Secant method;

3. Curve Fitting:
a. Least square Regression;
b. Interpolations;
c. Fourier Approximations

4. Solutions of Systems of Linear Algebraic Equations:


a. Matrix-Inversion;
b. Gauss-Seidel Iteration;
c. Gaussian-Elimination;
d. LU-Decomposition

5. Numerical Differentiation & Integration:


a. Trapezoidal-Rule; Simpson's Rule;
b. Gauss-Quadrature;
c. Romberg's Integration

6. Eigen Values and Eigen Vectors


7. Numerical Solution of ODEs:
a. Euler's method;
b. Runge-Kutta method

8. Solution of PDE by FDM


2. Root finding in one dimension (Solution of Nonlinear Equation)
2.1 Why?
Solutions x = x0 to equations of the form f(x) = 0 are often required where it is impossible
or infeasible to find an analytical expression for them. If the scalar function f
depends on m independent variables x1, x2, …, xm, then the solution set of f = 0 will in general
describe an (m−1)-dimensional surface in m-dimensional space. Alternatively we may consider the vector function
f(x) = 0, the solutions of which typically collapse to particular values of x. For this course
we restrict our attention to a single independent variable x and seek solutions to f(x) = 0.
2.2 Bisection
This is the simplest method for finding a root to an equation. As we shall see, it is also
the most robust. One of the main drawbacks is that we need two initial guesses xa and xb
which bracket the root: let fa = f(xa) and fb = f(xb) such that fa fb <= 0. An example of this is
shown graphically in figure 1 . Clearly, if fa fb = 0 then one or both of xa and xb must be a
root of f(x) = 0.

Figure 1: Graphical representation of the bisection method showing two initial guesses (xa
and xb bracketing the root).
The basic algorithm for the bisection method relies on repeated application of
Let xc = (xa+xb)/2,
if fc = f(xc) = 0 then x = xc is an exact solution,
elseif fa fc < 0 then the root lies in the interval (xa,xc),
else the root lies in the interval (xc,xb).
By replacing the interval (xa,xb) with either (xa,xc) or (xc,xb) (whichever brackets the root),
the error in our estimate of the solution to f(x) = 0 is, on average, halved. We repeat this
interval halving until either the exact root has been found or the interval is smaller than
some specified tolerance.
2.2.1 Convergence
Since the interval (xa,xb) always brackets the root, we know that the error in using either xa
or xb as an estimate for the root at the nth iteration must be en < |xa − xb|. Now since the interval
(xa,xb) is halved for each iteration, then
en+1 ~ en/2. (3)
More generally, if xn is the estimate for the root x* at the nth iteration, then the error in
this estimate is
εn = xn − x*. (4)
In many cases we may express the error at the (n+1)th time step in terms of the error at the
nth time step as
|εn+1| ~ C|εn|^p. (5)
Indeed this criterion applies to all techniques discussed in this course, but in many cases it
applies only asymptotically as our estimate xn converges on the exact solution. The
exponent p in equation (5) gives the order of the convergence. The larger the value of p,
the faster the scheme converges on the solution, at least provided εn+1 < εn. For first
order schemes (i.e. p = 1), |C| < 1 is required for convergence.
For the bisection method we may estimate εn as en. The form of equation (3) then
suggests p = 1 and C = 1/2, showing the scheme is first order and converges linearly.
Indeed convergence is guaranteed: a root of f(x) = 0 will always be found provided f(x) is
continuous over the initial interval.
2.2.2 Criteria
In general, a numerical root finding procedure will not find the exact root being sought
(ε = 0); rather it will find some suitably accurate approximation to it. In order to prevent
the algorithm continuing to refine the solution for ever, it is necessary to place some
conditions under which the solution process is to be finished or aborted. Typically this
will take the form of an error tolerance on en = |xa − xb| (the width of the current interval), on the value of fc, or both.
For some methods it is also important to ensure the algorithm is converging on a solution
(i.e. |εn+1| < |εn| for suitably large n), and that this convergence is sufficiently rapid to
attain the solution in a reasonable span of time. The guaranteed convergence of the
bisection method does not require such safety checks which, combined with its extreme
simplicity, is one of the reasons for its widespread use despite being relatively slow to
converge.
2.3 Newton-Raphson
Consider the Taylor Series expansion of f(x) about some point x = x0:
f(x) = f(x0) + (x − x0)f'(x0) + ½(x − x0)²f"(x0) + O(|x − x0|³). (6)
Setting the quadratic and higher terms to zero and solving the linear approximation of
f(x) = 0 for x gives
x1 = x0 − f(x0)/f'(x0). (7)
Subsequent iterations are defined in a similar manner as
xn+1 = xn − f(xn)/f'(xn). (8)
Geometrically, xn+1 can be interpreted as the value of x at which the line passing through
the point (xn, f(xn)) and tangent to the curve f(x) at that point crosses the x axis. Figure 3
provides a graphical interpretation of this.
Figure 3: Graphical interpretation of the Newton-Raphson algorithm.

When it works, Newton-Raphson converges much more rapidly than bisection or
linear interpolation. However, if f' vanishes at an iteration point, or indeed even between
the current estimate and the root, then the method will fail to converge. A graphical
interpretation of this is given in figure 4.

Figure4: Divergence of the Newton Raphson algorithm due to the presence of a turning
point close to the root.

2.3.1 Convergence
To study how the Newton-Raphson scheme converges, expand f(x) around the root
x = x*,
f(x) = f(x*) + (x − x*)f'(x*) + ½(x − x*)²f''(x*) + O(|x − x*|³), (9)
and substitute into the iteration formula. This then shows
εn+1 = [f''(x*)/2f'(x*)] εn² + O(εn³), (10)
since f(x*) = 0. Thus, by comparison with (5), there is second order (quadratic)


convergence. The presence of the f' term in the denominator shows that the scheme will
not converge if f' vanishes in the neighborhood of the root.
2.4 Secant (chord)
This method is essentially the same as Newton-Raphson except that the derivative f'(x) is
approximated by a finite difference based on the current and the preceding estimate for
the root, i.e.
f'(xn) ≈ [f(xn) − f(xn−1)]/(xn − xn−1), (11)
and this is substituted into the Newton-Raphson algorithm (8) to give
xn+1 = xn − f(xn)(xn − xn−1)/[f(xn) − f(xn−1)]. (12)
This formula is identical to that for the Linear Interpolation method discussed in
section3.3 . The difference is that rather than replacing one of the two estimates so that
the root is always bracketed, the oldest point is always discarded in favour of the new.
This means it is not necessary to have two initial guesses bracketing the root, but on the
other hand, convergence is not guaranteed. A graphical representation of the method
working is shown in figure 5 and failure to converge in figure 6 . In some cases,
swapping the two initial guesses x0 and x1 will change the behaviour of the method from
convergent to divergent.
Figure5: Convergence on the root using the secant method.

Figure6: Divergence using the secant method.


2.4.1 Convergence
The order of convergence may be obtained in a similar way to the earlier methods.
Expanding around the root x = x* for xn and xn−1 gives
f(xn) = f(x*) + εn f'(x*) + ½εn² f''(x*) + O(|εn|³), (13a)
f(xn−1) = f(x*) + εn−1 f'(x*) + ½εn−1² f''(x*) + O(|εn−1|³), (13b)
and substituting into the iteration formula
εn+1 ≈ [f''(x*)/2f'(x*)] εn εn−1. (14)
Note that this expression for εn+1 includes both εn and εn−1. In general we would like it in
terms of εn only. The form of this expression suggests a power law relationship. By
writing
|εn+1| ~ C|εn|^p, (15)
and substituting into the error evolution equation (14) gives
C|εn|^p ~ [f''(x*)/2f'(x*)] |εn| (|εn|/C)^(1/p), (16)
which we equate with our assumed relationship to show
p = 1 + 1/p, the positive root of which is p = (1 + √5)/2. (17)
Thus the method is of non-integer order 1.61803… (the golden ratio). As with Newton-
Raphson, the method may diverge if f' vanishes in the neighbourhood of the root.

2.5 Examples
Consider the equation
f(x) = cos x − 1/2. (21)
2.5.1 Bisection method
Initial guesses x = 0 and x = π/2.
Expect linear convergence: |εn+1| ~ |εn|/2.
Iteration Error en+1/en
0 -0.261799 -0.500001909862
1 0.130900 -0.4999984721161
2 -0.0654498 -0.5000015278886
3 0.0327250 -0.4999969442322
4 -0.0163624 -0.5000036669437
5 0.00818126 -0.4999951107776
6 -0.00409059 -0.5000110008581
7 0.00204534 -0.4999755541866
8 -0.00102262 -0.5000449824959
9 0.000511356 -0.4999139542706
10 -0.000255634 -0.5001721210794
11 0.000127861 -0.4996574405018
12 -0.0000638867 -0.5006848060707
13 0.0000319871 -0.4986322611303
14 -0.0000159498 -0.5027411
15 0.00000801862

2.5.2 Newton-Raphson
Initial guess: x = π/2.
Note that we cannot use x = 0 as the derivative vanishes there.
Expect quadratic convergence: εn+1 ~ cεn².
Iteration Error en+1/en en+1/en2
0 0.0235988 0.00653855280777 0.2770714107399
1 0.000154302 0.0000445311143083 0.2885971297087
2 0.00000000687124 0.000000014553 -
3 1.0E-15
4 Machine accuracy
Solution found to roundoff error (O(10⁻¹⁵)) in three iterations.
2.5.3 Secant method
Initial guesses x = 0 and x = π/2.
Expect convergence: εn+1 ~ cεn^1.618.
Iteration Error en+1/en |en+1|/|en|1.618
0 -0.261799 0.1213205550823 0.2777
1 -0.0317616 -0.09730712558561 0.8203
2 0.00309063 -0.009399086917554 0.3344
3 -0.0000290491 0.0008898244696049 0.5664
4 -0.0000000258486 -0.000008384051747483 0.4098
5 0.000000000000216716
6 Machine accuracy
Convergence substantially faster than linear interpolation.
2.6 C++ Program
#include <iostream>
#include <cmath>
using namespace std;

int i;
float error, es = 1e-6;                  // stopping tolerance on successive estimates
const float pi = 3.141592653;

float f(float x)                         // f(x) = cos x - 1/2; the root in [0, pi/2] is pi/3
{
 float a = cos(x) - 0.5;
 return a;
}

float df(float x)                        // derivative f'(x) = -sin x
{
 float a = -sin(x);
 return a;
}

float Bisection(void)
{
 float xp = 0, xl, xu, xr, fl, fr;
 xl = 0;                                 // lower bound of the bracketing interval
 fl = f(xl);
 xu = pi/2.0;                            // upper bound: f(xl)*f(xu) < 0
 i = 0;
 error = 1.0;
 do
 {
  xr = (xl + xu)/2;                      // midpoint
  fr = f(xr);
  if (fl*fr < 0.0)
  { xu = xr; }                           // root lies in (xl, xr)
  else
  { xl = xr;                             // root lies in (xr, xu)
    fl = fr; }
  if (i > 0) error = fabs(xr - xp);
  xp = xr;
  ++i;
 } while (i <= 15 && error > es);
 return xr;
}

float Newton_Raphson(void)
{
 float xp = 0, xi = pi/2.0;              // initial guess (x = 0 would give f'(x) = 0)
 i = 0;
 error = 1.0;
 do
 {
  xi = xi - f(xi)/df(xi);                // Newton-Raphson update
  if (i > 0) error = fabs(xi - xp);
  xp = xi;
  ++i;
 } while (i <= 15 && error > es);
 return xi;
}

float Secant(void)
{
 float xim, xi, xp, fim, fi;
 xim = 0;                                // two initial guesses; they need not bracket the root
 fim = f(xim);
 xi = pi/2.0;
 fi = f(xi);
 i = 0;
 error = 1.0;
 do
 {
  if (fim != fi)
  {
   /* if fim = fi then either the method has converged (xim = xi)
      or will diverge from this point */
   xp = xi;
   xi = xi - fi*(xi - xim)/(fi - fim);   // secant update
   xim = xp;
   fim = fi;
   fi = f(xi);
   error = fabs(xi - xim);
  }
  ++i;
 } while (i < 15 && error > es);
 return xi;
}

int main()
{
 cout << "Bisection:      " << Bisection()      << endl;
 cout << "Newton-Raphson: " << Newton_Raphson() << endl;
 cout << "Secant:         " << Secant()         << endl;
 return 0;
}

3. Curve Fitting
a. Least square Regression;
b. Interpolations;
c. Fourier Approximations

Curve fitting is a method of adjusting a function to a series of data points. When the data
originate from experimental apparatus and, consequently, carry some amount of error,
a sensible strategy is not to force the function through every point but to make it follow the
trend of the data by minimizing the discrepancy between the data and the adjusted function.

The Least-Squares Regression


The most usual technique to attain this objective is known as Least-Squares
Regression. This method is based on minimizing the error between the data and the
adjusted function, the error (residual) at each point being defined as
ei = yi,measured − yi,model.
The strategy is to minimize the sum of the squares of the residuals between the
experimental values and the calculated values:
Sr = Σ ei² = Σ (yi,measured − yi,model)².
This criterion has a number of advantages including the fact that it yields a unique
solution for a given set of data. The parameter values are obtained by setting the
first derivatives of Sr with respect to each parameter to zero. This way it is possible to obtain a system of n equations and n
unknowns, n being the number of parameters of the model equation.

Polynomial Regression
Linear Regression
The simplest example of a least-squares approximation is fitting a straight line to a set
of paired observations. The mathematical expression for a straight line is
y = a0 + a1x + e.
The least-squares method can be readily extended to fit the data to a higher-order
polynomial. Supposing a second-order polynomial,
y = a0 + a1x + a2x² + e,
for this case the sum of the squares of the residuals is
Sr = Σ (yi − a0 − a1xi − a2xi²)².
Differentiating with respect to each of the parameters,
∂Sr/∂a0 = −2 Σ (yi − a0 − a1xi − a2xi²),
∂Sr/∂a1 = −2 Σ xi (yi − a0 − a1xi − a2xi²),
∂Sr/∂a2 = −2 Σ xi² (yi − a0 − a1xi − a2xi²).
Setting each derivative to zero and collecting like terms gives the normal equations
n a0 + (Σxi) a1 + (Σxi²) a2 = Σyi,
(Σxi) a0 + (Σxi²) a1 + (Σxi³) a2 = Σxiyi,
(Σxi²) a0 + (Σxi³) a1 + (Σxi⁴) a2 = Σxi²yi.
In this case the solution to the problem is easily found by solving a system of 3
equations with 3 unknowns. This can be generalized to higher-order polynomials.
Example: Use Least-squares regression to fit a straight line to the following data.
x y Raw Data

0 0
1 1.5
2 2
3 2.1
4 4
5 5.6
6 7
7 8
8 9
9 9

i     xi    yi     xi²    xi·yi   y adjusted   error
1     0     0      0      0       -0.084       0.084
2     1     1.5    1      1.5     1.0057       0.4943
3     2     2      4      4       2.0954       -0.0954
4     3     2.1    9      6.3     3.1851       -1.0851
5     4     4      16     16      4.2748       -0.2748
6     5     5.6    25     28      5.3645       0.2355
7     6     7      36     42      6.4542       0.5458
8     7     8      49     56      7.5439       0.4561
9     8     9      64     72      8.6336       0.3664
10    9     9      81     81      9.7233       -0.7233
Sum   45    48.2   285    306.8

a1= 1.089697
a0= -0.08364
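
The slope and intercept above follow from the normal equations for a straight line, a1 = (nΣxiyi − ΣxiΣyi)/(nΣxi² − (Σxi)²) and a0 = (Σyi − a1Σxi)/n. As an illustrative check (this sketch is not part of the original handout), the short C++ program below reproduces a1 ≈ 1.0897 and a0 ≈ −0.0836 from the raw data:

#include <iostream>
using namespace std;

int main()
{
 // data from the example above
 double x[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
 double y[] = {0, 1.5, 2, 2.1, 4, 5.6, 7, 8, 9, 9};
 int n = 10;

 double sx = 0, sy = 0, sxx = 0, sxy = 0;
 for (int k = 0; k < n; ++k)
 {
  sx += x[k]; sy += y[k]; sxx += x[k]*x[k]; sxy += x[k]*y[k];
 }
 // normal equations for y = a0 + a1*x
 double a1 = (n*sxy - sx*sy)/(n*sxx - sx*sx);
 double a0 = sy/n - a1*sx/n;
 cout << "a1 = " << a1 << ", a0 = " << a0 << endl;   // 1.089697 and -0.083636
 return 0;
}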

Fitted Curve
Exponential Regression
The exponential model is used in many fields of engineering to characterize quantities
that increase or decrease at a rate directly proportional to their own magnitude. Their
mathematical representation is the following:
y = a1·e^(b1·x),
and it can be linearized by taking the natural logarithm to yield
ln(y) = ln(a1) + b1·x.
This is a mathematical formulation that already obeys the linear mathematical expression.
Example: Fit an exponential model to the following data.

x y
0.4 750
0.8 1000
1.2 1400
1.6 2000
2 2700
2.3 3750

x      y       ln(y)     x²      x·ln(y)    a1·exp(b1·x)
0.4    750     6.6201    0.16    2.648      727.4511
0.8    1000    6.9078    0.64    5.5262     1018.643
1.2    1400    7.2442    1.44    8.6931     1426.397
1.6    2000    7.6009    2.56    12.161     1997.37
2      2700    7.901     4       15.802     2796.9
2.3    3750    8.2295    5.29    18.928     3600.316
Sums:  8.3     11600     44.503  14.09      63.759

Fitting the straight line ln(y) = ln(a1) + b1·x to the (x, ln(y)) data gives
slope = 0.8417 and intercept = 6.2529, so that
b1 = 0.8417
a1 = e^6.2529 = 519.5
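
The same coefficients can be recovered directly from the raw data. The following C++ fragment is an illustrative sketch (not taken from the handout) that performs the logarithmic transformation and the straight-line fit in one pass:

#include <iostream>
#include <cmath>
using namespace std;

int main()
{
 // data from the example above
 double x[] = {0.4, 0.8, 1.2, 1.6, 2.0, 2.3};
 double y[] = {750, 1000, 1400, 2000, 2700, 3750};
 int n = 6;

 // fit ln(y) = ln(a1) + b1*x by ordinary least squares
 double sx = 0, sly = 0, sxx = 0, sxly = 0;
 for (int k = 0; k < n; ++k)
 {
  double ly = log(y[k]);
  sx += x[k]; sly += ly; sxx += x[k]*x[k]; sxly += x[k]*ly;
 }
 double b1 = (n*sxly - sx*sly)/(n*sxx - sx*sx);
 double lna1 = sly/n - b1*sx/n;
 cout << "b1 = " << b1 << ", a1 = " << exp(lna1) << endl;   // approximately 0.8417 and 519.5
 return 0;
}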
Power Regression
Other mathematical representations, such as the power equation, can also be easily linearized:
y = a2·x^(b2).
It can be linearized by taking its base-10 logarithm to yield
log(y) = log(a2) + b2·log(x).
Saturation-Growth-Rate
Another non-linear example is the one that has a saturation-growth-rate shape. The
formulation for this model is the following:
y = a3·x/(b3 + x),
which can be linearized by inverting the last equation to yield
1/y = 1/a3 + (b3/a3)·(1/x).
Non-linear Regression
There are many cases in engineering where non-linear models must be fit to data. As
with linear least squares, nonlinear regression is based on determining the values of the
parameters that minimize the sum of the squares of the residuals. However, for the
nonlinear case, the solution must proceed in an iterative fashion. The Gauss-Newton
method is one algorithm for minimizing the sum of the squares of the residuals between
data and nonlinear equations. The key concept underlying the technique is that a Taylor
series expansion is used to express the original nonlinear equation in an approximate
linear form. Then, least-squares theory can be used to obtain new estimates of the
parameters that move in the direction of minimizing the residual. To illustrate how this is
done, first the relationship between the nonlinear equation and the data can be expressed
generally as
yi = f(xi; a0, a1, …, am) + ei,
or in an abbreviated form by omitting the parameters,
yi = f(xi) + ei.
Applying the Taylor series expansion, and assuming dependency on only two parameters,
f(xi)j+1 ≅ f(xi)j + (∂f(xi)j/∂a0)Δa0 + (∂f(xi)j/∂a1)Δa1,
where j is the initial guess, j+1 the prediction, Δa0 = a0,j+1 − a0,j and Δa1 = a1,j+1 − a1,j.
Substituting the last equation into the previous one results in
yi − f(xi)j = (∂f(xi)j/∂a0)Δa0 + (∂f(xi)j/∂a1)Δa1 + ei,
or in matrix form,
{D} = [Zj]{ΔA} + {E},
where [Zj] is the matrix of the partial derivatives of the function evaluated at the initial
guess j, with n rows (n being the number of points) and m columns (m the number of parameters),
{D} contains the differences between the measured and the function values,
and the vector {ΔA} contains the changes in the parameter values.
Applying linear least-squares theory to this linearized model gives the normal equations
[ZjT Zj]{ΔA} = {ZjT D},
which are solved for {ΔA} and used to update the parameter estimates.
This system can now be solved repetitively until the solution converges, that is, until the
relative error between two successive estimates is less than the stopping criterion.
Interpolation
Interpolation is a method of estimating intermediate values between precise data points.
Linear Interpolation
The simplest form of interpolation is to connect two data points with a straight line,
called linear interpolation. If f(x0) and f(x1) are known at x0 and x1, we can estimate the value
of f(x) for any point x between x0 and x1 using
f1(x) = f(x0) + [f(x1) − f(x0)]/(x1 − x0) · (x − x0).
The smaller the interval between the data points (x0 and x1), the better the approximation,
i.e. as the interval decreases, a continuous function will be better approximated by a
straight line.
Example: Estimate the natural logarithm of 2 using linear interpolation, if ln1=0 and
ln4=1.386.
Solution: f1(2) = 0 + [(1.386 − 0)/(4 − 1)](2 − 1) = 0.462. The true value is ln 2 = 0.6931, so the
linear estimate carries a considerable error because the interval (from 1 to 4) is wide.
Quadratic Interpolation
This method uses three data points and connects these points with a parabola. If f(x0),
f(x1) and f(x2) are known, then we can estimate f(x) for any point in between x0 and x2 using
f2(x) = b0 + b1(x − x0) + b2(x − x0)(x − x1),
with b0 = f(x0), b1 = [f(x1) − f(x0)]/(x1 − x0) and b2 = {[f(x2) − f(x1)]/(x2 − x1) − b1}/(x2 − x0).
Example: Estimate the natural logarithm of 2 using quadratic interpolation, if ln1=0,


ln4=1.386 and ln6=1.792.
Solution: b0 = 0, b1 = (1.386 − 0)/(4 − 1) = 0.462, b2 = [(1.792 − 1.386)/(6 − 4) − 0.462]/(6 − 1) = −0.0518,
so f2(2) = 0 + 0.462(2 − 1) − 0.0518(2 − 1)(2 − 4) = 0.5658, considerably closer to ln 2 = 0.6931 than the
linear estimate.

Fourier Approximation
Sinusoidal Fitting
This is a particular case of the least squares fit, since the following equation can be
considered as a linear least squares model,
y = A0 + A1·cos(ω0 t) + B1·sin(ω0 t) + e,
which is just another example of the general model
y = a0 z0 + a1 z1 + a2 z2 + … + e,
where z0 = 1, z1 = cos(ω0 t), z2 = sin(ω0 t) and all the other z's = 0; thus, our goal is to determine
coefficient values that minimize
Sr = Σ [yi − (A0 + A1·cos(ω0 ti) + B1·sin(ω0 ti))]².
The following steps are similar to the ones described before. It is necessary to know ω0.

4. Linear equations ( Solutions of Systems of Linear Algebraic Equations)


Solving equations of the form Ax = r is central to many numerical algorithms. There are a
number of methods which may be used: some are direct and algebraically exact, while others are
iterative in nature and provide only approximate solutions. Which is best will depend on the
structure of A, the context in which it is to be solved and its size compared with the
available computer resources.
Matrix Inversion…..
(note is given in class)

Gauss-Seidel with Relaxation


Gauss-Seidel is the most commonly used iterative method. In this method each equation
is solved for one of the unknowns. To start the solution process we must have
initial guesses for the unknowns (here we assume they are zero). After having the new
estimate for x1, we can substitute this value into the second equation and obtain x2, and so on
until we have all the new estimates (x1 to xn) for the first iteration. To stop the process
the following criterion can be used for every unknown:
|xi(k) − xi(k−1)| / |xi(k)| · 100% < εs.
Being an iterative method it may or may not converge to the solution. A sufficient, but
not necessary, criterion for convergence is that the system is diagonally dominant
(the absolute value of the diagonal coefficient in each equation must be larger than the sum of the
absolute values of the other coefficients in that equation).
The programmed method includes relaxation to enhance convergence. After each new
value of x is computed, that new value can be modified by a weighted average of the
previous and present results:
xi,new = λ·xi,new + (1 − λ)·xi,old,
where λ is a weighting factor (between 0 and 2).

If λ = 1 the result is unmodified.
If λ > 1 the system is overrelaxed; there is an implicit assumption that the new value is
moving in the correct direction and the process is accelerated.
If λ < 1 the system is underrelaxed, this being typically employed to make a
nonconverging system converge or to hasten convergence by damping out oscillations.
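
As a concrete illustration, the C++ fragment below sketches Gauss-Seidel with relaxation for a small diagonally dominant test system; the matrix, relaxation factor λ = 1.2 and tolerance are illustrative choices rather than values prescribed by the handout.

#include <iostream>
#include <cmath>
#include <algorithm>
using namespace std;

int main()
{
 const int n = 3;
 double A[n][n] = {{ 4, -1,  0},
                   {-1,  4, -1},
                   { 0, -1,  4}};      // diagonally dominant, so convergence is assured
 double b[n] = {2, 4, 10};
 double x[n] = {0, 0, 0};              // initial guesses
 double lambda = 1.2;                  // relaxation factor (1 < lambda < 2 over-relaxes)
 double tol = 1e-8;

 for (int iter = 0; iter < 100; ++iter)
 {
  double maxChange = 0;
  for (int i = 0; i < n; ++i)
  {
   double sum = b[i];
   for (int j = 0; j < n; ++j)
    if (j != i) sum -= A[i][j]*x[j];           // newest available values are used immediately
   double xNew = sum/A[i][i];
   xNew = lambda*xNew + (1 - lambda)*x[i];     // weighted average of new and old values
   maxChange = max(maxChange, fabs(xNew - x[i]));
   x[i] = xNew;
  }
  if (maxChange < tol) break;                  // stopping criterion on the change in x
 }
 cout << x[0] << " " << x[1] << " " << x[2] << endl;
 return 0;
}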

4.1 Gauss elimination


This is what you would probably do if you were computing the solution of a non-trivial
system by hand. For the system
a11x1 + a12x2 + … + a1nxn = r1,
a21x1 + a22x2 + … + a2nxn = r2,
…
an1x1 + an2x2 + … + annxn = rn, (27)
we first divide the first row by a11 and then subtract a21 times the new first row from the
second row, a31 times the new first row from the third row … and an1 times the new first
row from the nth row. This gives a system in which the first column below the leading 1 is zero:
x1 + (a12/a11)x2 + … + (a1n/a11)xn = r1/a11,
a′22x2 + … + a′2nxn = r′2,
…
a′n2x2 + … + a′nnxn = r′n, (28)
where a′ij = aij − (ai1/a11)a1j and r′i = ri − (ai1/a11)r1.
By repeating this process for rows 3 to n, this time using the new contents of element 2,2,
we gradually replace the region below the leading diagonal with zeros. Once the system
has been reduced to upper triangular form, (29)
the final solution may be obtained by back substitution:
xn = r′n/a′nn,   xi = (r′i − Σj>i a′ij xj)/a′ii for i = n−1, …, 1. (30)
If the arithmetic is exact, and the matrix A is not singular, then the answer computed in
this manner will be exact (provided no zeros appear on the diagonal - see below).
However, as computer arithmetic is not exact, there will be some truncation and rounding
error in the answer. The cumulative effect of this error may be very significant if the loss
of precision is at an early stage in the computation. In particular, if a numerically small
number appears on the diagonal of a row, then its use in the elimination of subsequent
rows may lead to differences being computed between very large and very small values
with a consequential loss of precision. For example, if a22 − (a21/a11)a12 were very small,
10⁻⁶ say, and both a23 − (a21/a11)a13 and a33 − (a31/a11)a13 were 1, say, then at the next stage of the
computation the 3,3 element would involve calculating the difference between 1/10⁻⁶ = 10⁶
and 1. If single precision arithmetic (representing real values using approximately six
significant digits) were being used, the result would be simply 10⁶ (the 1 being lost) and subsequent
calculations would be unaware of the contribution of a33 to the solution. A more extreme
case which may often occur is if, for example, a22 − (a21/a11)a12 is zero: unless something is
done it will not be possible to proceed with the computation!
A zero value occurring on the leading diagonal does not mean the matrix is singular.
Consider, for example, the system

(31)
,
the solution of which is obviously x1 = x2 = x3 = 1. However, if we were to apply the
Gauss Elimination outlined above, we would need to divide through by a11 = 0. Clearly
this leads to difficulties!
4.2 Pivoting
One of the ways around this problem is to ensure that small values (especially zeros) do
not appear on the diagonal and, if they do, to remove them by rearranging the matrix and
vectors. In the example given in ( 31 ) we could simply interchange rows one and two to
produce

(32)
,
or columns one and two to give

(33)
,
either of which may then be solved using standard Gauss Elimination.
More generally, suppose at some stage during a calculation we have

(34)

where the element 2,5 (201) is numerically the largest value in the second row and the
element 6,2 (155) the numerically largest value in the second column. As discussed
above, the very small 10⁻⁶ value for element 2,2 is likely to cause problems. (In an
extreme case we might even have the value 0 appearing on the diagonal - clearly
something must be done to avoid a divide by zero error occurring!) To remove this
problem we may again rearrange the rows and/or columns to bring a larger value into the
2,2 element.
4.2.1 Partial pivoting
In partial or column pivoting, we rearrange the rows of the matrix and the right-hand side
to bring the numerically largest value in the column onto the diagonal. For our example
matrix the largest value is in element 6,2 and so we simply swap rows 2 and 6 to give

(35)

.
Note that our variables remain in the same order which simplifies the implementation of
this procedure. The right-hand side vector, however, has been rearranged. Partial pivoting
may be implemented for every step of the solution process, or only when the diagonal
values are sufficiently small as to potentially cause a problem. Pivoting for every step
will lead to smaller errors being introduced through numerical inaccuracies, but the
continual reordering will slow down the calculation.
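
The elimination and back-substitution steps, together with partial pivoting, can be sketched compactly in C++ as follows; the 3×3 test system (which has a zero in position 1,1 and exact solution x1 = x2 = x3 = 1) is an illustrative choice, not the matrix of equation (31).

#include <iostream>
#include <cmath>
#include <utility>
using namespace std;

const int N = 3;

// solve A x = r in place by Gauss elimination with partial (row) pivoting
void gaussSolve(double A[N][N], double r[N], double x[N])
{
 for (int i = 0; i < N; ++i)
 {
  // partial pivoting: bring the largest |A[k][i]|, k >= i, onto the diagonal
  int p = i;
  for (int k = i + 1; k < N; ++k)
   if (fabs(A[k][i]) > fabs(A[p][i])) p = k;
  for (int j = 0; j < N; ++j) swap(A[i][j], A[p][j]);
  swap(r[i], r[p]);

  // eliminate column i below the diagonal
  for (int k = i + 1; k < N; ++k)
  {
   double m = A[k][i]/A[i][i];
   for (int j = i; j < N; ++j) A[k][j] -= m*A[i][j];
   r[k] -= m*r[i];
  }
 }
 // back substitution
 for (int i = N - 1; i >= 0; --i)
 {
  double s = r[i];
  for (int j = i + 1; j < N; ++j) s -= A[i][j]*x[j];
  x[i] = s/A[i][i];
 }
}

int main()
{
 double A[N][N] = {{0, 2, 1}, {1, 1, 1}, {2, 1, 3}};   // note the zero in position 1,1
 double r[N] = {3, 3, 6};                              // exact solution x1 = x2 = x3 = 1
 double x[N];
 gaussSolve(A, r, x);
 cout << x[0] << " " << x[1] << " " << x[2] << endl;
 return 0;
}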
4.2.2 Full pivoting
The philosophy behind full pivoting is much the same as that behind partial pivoting. The
main difference is that the element brought onto the diagonal is the numerically largest value in
the column or row containing the value to be replaced, which may require interchanging columns
as well as rows. In our example above the magnitude of element 2,5 (201) is
the greatest in either row 2 or column 2, so we shall rearrange the columns to bring this
element onto the diagonal. This will also entail a rearrangement of the solution vector x.
The rearranged system becomes

(36)

.
The ultimate degree of accuracy can be provided by rearranging both rows and columns
so that the numerically largest value in the submatrix not yet processed is brought onto
the diagonal. In our example above, the largest value is 6003 occurring at position 4,6 in
the matrix. We may bring this onto the diagonal for the next step by interchanging
columns one and six and rows two and four. The order in which we do this is
unimportant. The final result is
(37)

.
Again this process may be undertaken for every step, or only when the value on the
diagonal is considered too small relative to the other values in the matrix.
If it is not possible to rearrange the columns or rows to remove a zero from the diagonal,
then the matrix A is singular and no solution exists.
4.3 LU factorization
A frequently used form of Gauss Elimination is LU Factorisation also known as LU
Decomposition or Crout Factorisation. The basic idea is to find two matrices L and U
such that LU = A, where L is a lower triangular matrix (zero above the leading diagonal)
and U is an upper triangular matrix (zero below the diagonal). Note that this
decomposition is underspecified in that we may choose the relative scale of the two
matrices arbitrarily. By convention, the L matrix is scaled to have a leading diagonal of
unit values. Once we have computed L and U we need solve only Ly = b and then Ux = y, a
procedure requiring O(n²) operations compared with O(n³) operations for the full Gauss
elimination. While the factorisation process itself requires O(n³) operations, this need be done
only once whereas we may wish to solve Ax = b for a whole range of b.
Since we have decided the diagonal elements lii in the lower triangular matrix will always
be unity, it is not necessary for us to store these elements and so the matrices L and U can
be stored together in an array the same size as that used for A. Indeed, in most
implementations the factorisation will simply overwrite A.
The basic decomposition algorithm for overwriting A with L and U may be expressed as
# Factorisation
FOR i=1 TO n
FOR p=i TO n
a(i,p) = a(i,p) − SUM[k=1..i−1] a(i,k)·a(k,p)
NEXT p
FOR q=i+1 TO n
a(q,i) = ( a(q,i) − SUM[k=1..i−1] a(q,k)·a(k,i) ) / a(i,i)
NEXT q
NEXT i
# Forward Substitution
FOR i=1 TO n
FOR q=n+1 TO n+m
a(i,q) = a(i,q) − SUM[k=1..i−1] a(i,k)·a(k,q)
NEXT q
NEXT i
# Back Substitution
FOR i=n TO 1 STEP -1
FOR q=n+1 TO n+m
a(i,q) = ( a(i,q) − SUM[k=i+1..n] a(i,k)·a(k,q) ) / a(i,i)
NEXT q
NEXT i

This algorithm assumes the right-hand side(s) are initially stored in the same array
structure as the matrix and are positioned in the column(s) n+1 (to n+m for m right-hand
sides). To improve the efficiency of the computation for right-hand sides known in
advance, the forward substitution loop may be incorporated into the factorisation loop.
Figure 10 indicates how the LU Factorisation process works. We want to find vectors liT
and uj such that aij = liT·uj. When we are at the stage of calculating the ith element of uj, we
will already have the i nonzero elements of liT and the first i−1 elements of uj. The ith
element of uj may therefore be chosen simply as uj(i) = aij − liT·uj, where the dot product is
calculated assuming uj(i) is zero.
As with normal Gauss Elimination, the potential occurrence of small or zero values on
the diagonal can cause computational difficulties. The solution is again pivoting - partial
pivoting is normally all that is required. However, if the matrix is to be used in its
factorised form, it will be essential to record the pivoting which has taken place. This
may be achieved by simply recording the row interchanges for each i in the above
algorithm and using the same row interchanges on the right-hand side when using L in
subsequent forward substitutions.
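
A minimal C++ sketch of the factorise-then-substitute procedure is given below. It follows the convention used above (unit diagonal on L, A overwritten by L and U) but, for brevity, performs no pivoting, so the small test matrix is chosen not to need it.

#include <iostream>
using namespace std;

const int N = 3;

// LU factorisation with unit diagonal on L, overwriting A with L and U,
// followed by forward and back substitution for a single right-hand side b
void luSolve(double A[N][N], double b[N], double x[N])
{
 // factorisation: row i of U, then column i of L
 for (int i = 0; i < N; ++i)
 {
  for (int p = i; p < N; ++p)              // U(i,p)
   for (int k = 0; k < i; ++k) A[i][p] -= A[i][k]*A[k][p];
  for (int q = i + 1; q < N; ++q)          // L(q,i)
  {
   for (int k = 0; k < i; ++k) A[q][i] -= A[q][k]*A[k][i];
   A[q][i] /= A[i][i];
  }
 }
 // forward substitution: solve L y = b (y overwrites b)
 for (int i = 0; i < N; ++i)
  for (int k = 0; k < i; ++k) b[i] -= A[i][k]*b[k];
 // back substitution: solve U x = y
 for (int i = N - 1; i >= 0; --i)
 {
  x[i] = b[i];
  for (int k = i + 1; k < N; ++k) x[i] -= A[i][k]*x[k];
  x[i] /= A[i][i];
 }
}

int main()
{
 double A[N][N] = {{4, 3, 0}, {3, 4, -1}, {0, -1, 4}};
 double b[N] = {7, 6, 3};                  // exact solution x = (1, 1, 1)
 double x[N];
 luSolve(A, b, x);
 cout << x[0] << " " << x[1] << " " << x[2] << endl;
 return 0;
}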

Numerical Differentiation & Integration:

5. Numerical integration
There are two main reasons for you to need to do numerical integration: analytical
integration may be impossible or infeasible, or you may wish to integrate tabulated data
rather than known functions. In this section we outline the main approaches to numerical
integration. Which is preferable depends in part on the results required, and in part on the
function or data to be integrated.
5.1 Trapezium rule
Consider the Taylor Series expansion integrated from x0 to x0+Δx:
∫ from x0 to x0+Δx of f(x) dx = ½[f(x0) + f(x0+Δx)]Δx − (Δx³/12)f"(x0) + O(Δx⁴). (46)
The approximation represented by ½[f(x0) + f(x0+Δx)]Δx is called the Trapezium Rule,
based on its geometric interpretation as shown in figure 12.

Figure12: Graphical interpretation of the trapezium rule.


As we can see from equation (46), the error in the Trapezium Rule is proportional to
Δx³. Thus, if we were to halve Δx, the error would be decreased by a factor of eight.
However, the size of the domain would be halved, thus requiring the Trapezium Rule to
be evaluated twice and the contributions summed. The net result is the error decreasing
by a factor of four rather than eight. The Trapezium Rule used in this manner is
sometimes termed the Compound Trapezium Rule, but more often simply the Trapezium
Rule. In general it consists of the sum of integrations over a smaller distance Δx to obtain
a smaller error.
Suppose we need to integrate from x0 to x1. We shall subdivide this interval into n steps of
size Δx = (x1 − x0)/n as shown in figure 13.
Figure13: Compound Trapezium Rule.
The Compound Trapezium Rule approximation to the integral is therefore
∫ from x0 to x1 of f(x) dx ≈ Δx[½f(x0) + f(x0+Δx) + f(x0+2Δx) + … + f(x1−Δx) + ½f(x1)]. (47)
While the error for each step is O(Δx³), the cumulative error is n times this, or O(Δx²) ~
O(n⁻²).
The above analysis assumes Δx is constant over the interval being integrated. This is not
necessary, and an extension of this procedure to utilise a smaller step size Δxi in regions
of high curvature would reduce the total error in the calculation, although it would remain
O(Δx²). We would choose to reduce Δx in the regions of high curvature as we can see
from equation (46) that the leading order truncation error is scaled by f".
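
As an illustration, the compound rule can be coded in a few lines of C++; the integrand sin x on [0, π] (exact integral 2) is an arbitrary test case.

#include <iostream>
#include <cmath>
using namespace std;

double f(double x) { return sin(x); }     // test integrand: exact integral over [0, pi] is 2

// compound trapezium rule for f on [a, b] with n equal steps
double trapezium(double a, double b, int n)
{
 double dx = (b - a)/n;
 double sum = 0.5*(f(a) + f(b));          // end points carry weight 1/2
 for (int i = 1; i < n; ++i)
  sum += f(a + i*dx);                     // interior points carry weight 1
 return sum*dx;
}

int main()
{
 const double pi = 3.141592653589793;
 cout << trapezium(0.0, pi, 64) << " " << trapezium(0.0, pi, 128) << endl;
 // halving dx reduces the error by roughly a factor of 4 (second-order accuracy)
 return 0;
}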

5.2 Simpson's rule


An alternative approach to decreasing the step size Δx for the integration is to increase
the accuracy of the functions used to approximate the integrand. Integrating the Taylor
series over an interval 2Δx shows
∫ from x0 to x0+2Δx of f(x) dx = (Δx/3)[f(x0) + 4f(x0+Δx) + f(x0+2Δx)] − (Δx⁵/90)f⁗(x0) + … (50)
Whereas the error in the Trapezium rule was O(Δx³), Simpson's rule is two orders more
accurate at O(Δx⁵), giving exact integration of cubics.
To improve the accuracy when integrating over larger intervals, the interval x0 to x1 may
again be subdivided into n steps. The three-point evaluation for each subinterval requires
that there are an even number of subintervals. Hence we must be able to express the
number of intervals as n = 2m. The Compound Simpson's rule is then
∫ from x0 to x1 of f(x) dx ≈ (Δx/3)[f(x0) + 4f(x0+Δx) + 2f(x0+2Δx) + 4f(x0+3Δx) + … + 4f(x1−Δx) + f(x1)], (51)
and the corresponding error is O(nΔx⁵) or O(Δx⁴).
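
A matching C++ sketch of the compound Simpson's rule is given below (same illustrative test integral as before); note that the number of steps must be even.

#include <iostream>
#include <cmath>
using namespace std;

double f(double x) { return sin(x); }     // test integrand: exact integral over [0, pi] is 2

// compound Simpson's rule: n must be even
double simpson(double a, double b, int n)
{
 double dx = (b - a)/n;
 double sum = f(a) + f(b);
 for (int i = 1; i < n; ++i)
  sum += (i % 2 ? 4.0 : 2.0)*f(a + i*dx); // weight 4 on odd points, 2 on even points
 return sum*dx/3.0;
}

int main()
{
 const double pi = 3.141592653589793;
 cout << simpson(0.0, pi, 8) << " " << simpson(0.0, pi, 16) << endl;
 // halving dx reduces the error by roughly a factor of 16 (fourth-order accuracy)
 return 0;
}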
5.3 Gauss Quadrature
By careful selection of the points at which the function is evaluated it is possible to
increase the precision for a given number of function evaluations. The Mid-point rule is
an example of this: with just a single function evaluation it obtains the same order of
accuracy as the Trapezium Rule (which requires two points).
One widely used example of this is Gauss quadrature, which enables exact integration of
cubics with only two function evaluations (in contrast Simpson's Rule, which is also
exact for cubics, requires three function evaluations). Gauss quadrature has the formula
∫ from x0 to x0+Δx of f(x) dx ≈ (Δx/2)[f(x0 + (½ − 1/(2√3))Δx) + f(x0 + (½ + 1/(2√3))Δx)]. (56)
In general it is possible to choose M function evaluations per interval to obtain a formula
exact for all polynomials of degree 2M−1 and less.
The Gauss Quadrature accurate to order 2M−1 may be determined using the same
approach required for the two-point scheme. This may be derived by comparing the
Taylor Series expansion for the integral with that for the points x0+αΔx and x0+βΔx:
∫ from x0 to x0+Δx of f(x) dx ≈ (Δx/2)[f(x0+αΔx) + f(x0+βΔx)]. (57)
Equating the various terms reveals
α + β = 1,
(α² + β²)/4 = 1/6, (58)
the solution of which gives the positions stated in equation (56).
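
The two-point rule of equation (56) applied on n sub-intervals can be sketched in C++ as follows (again using sin x on [0, π] as an arbitrary test case).

#include <iostream>
#include <cmath>
using namespace std;

double f(double x) { return sin(x); }     // test integrand: exact integral over [0, pi] is 2

// two-point Gauss quadrature applied on n sub-intervals of [a, b]
double gauss2(double a, double b, int n)
{
 double dx = (b - a)/n;
 double c = 1.0/(2.0*sqrt(3.0));          // offset of the two points from the interval mid-point
 double sum = 0;
 for (int i = 0; i < n; ++i)
 {
  double x0 = a + i*dx;
  sum += f(x0 + (0.5 - c)*dx) + f(x0 + (0.5 + c)*dx);
 }
 return 0.5*dx*sum;
}

int main()
{
 const double pi = 3.141592653589793;
 cout << gauss2(0.0, pi, 4) << " " << gauss2(0.0, pi, 8) << endl;
 return 0;
}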
5.4 Romberg integration
With the Compound Trapezium Rule we know from the earlier error analysis that the error in some
estimate T(Δx) of the integral I using a step size Δx goes like cΔx² as Δx → 0, for some
constant c. Likewise the error in T(Δx/2) will be cΔx²/4. From this we may construct a
revised estimate T(1)(Δx/2) for I as a weighted mean of T(Δx) and T(Δx/2):
T(1)(Δx/2) = μT(Δx/2) + (1 − μ)T(Δx)
= μ[I + cΔx²/4 + O(Δx⁴)] + (1 − μ)[I + cΔx² + O(Δx⁴)]. (53)
By choosing the weighting factor μ = 4/3 we eliminate the leading order (O(Δx²)) error
terms, relegating the error to O(Δx⁴). Thus we have
T(1)(Δx/2) = [4T(Δx/2) − T(Δx)]/3. (54)
Comparison with equation (51) shows that this formula is precisely that for Simpson's
Rule.
This same process may be carried out to higher orders using Δx/4, Δx/8, … to eliminate
the higher order error terms. For the Trapezium Rule the errors are all even powers of Δx
and as a result it can be shown that
T(m)(Δx/2) = [2^2m T(m−1)(Δx/2) − T(m−1)(Δx)]/(2^2m − 1). (55)
A similar process may also be applied to the Compound Simpson's Rule.
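
A C++ sketch of Romberg integration built on the compound Trapezium Rule is given below; the array size and test integrand are illustrative choices.

#include <iostream>
#include <cmath>
using namespace std;

double f(double x) { return sin(x); }     // test integrand: exact integral over [0, pi] is 2

// Romberg integration built on the compound trapezium rule
double romberg(double a, double b, int levels)
{
 const int MAXL = 16;
 double R[MAXL][MAXL];
 int n = 1;
 R[0][0] = 0.5*(b - a)*(f(a) + f(b));     // trapezium rule with one interval
 for (int m = 1; m < levels; ++m)
 {
  // halve the step size, re-using the previously computed points
  n *= 2;
  double dx = (b - a)/n, sum = 0;
  for (int i = 1; i < n; i += 2) sum += f(a + i*dx);
  R[m][0] = 0.5*R[m-1][0] + dx*sum;
  // Richardson extrapolation: T(k) = [4^k T(k-1)(dx/2) - T(k-1)(dx)] / (4^k - 1)
  for (int k = 1; k <= m; ++k)
  {
   double p = pow(4.0, k);
   R[m][k] = (p*R[m][k-1] - R[m-1][k-1])/(p - 1.0);
  }
 }
 return R[levels-1][levels-1];
}

int main()
{
 const double pi = 3.141592653589793;
 cout << romberg(0.0, pi, 6) << endl;     // converges rapidly towards 2
 return 0;
}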
5.8 Example of numerical integration
Consider the integral
∫ from 0 to π of sin x dx = 2, (59)
which may be integrated numerically using any of the methods described in the previous
sections. Table 2 gives the error in the numerical estimates for the Trapezium Rule,
Midpoint Rule, Simpson's Rule and Gauss Quadrature. The results are presented in terms
of the number of function evaluations required. The calculations were performed in
double precision.

No. intervals   Trapezium Rule     Simpson's Rule     Gauss Quadrature
1               —                  —                  —
2               -2.00000000E+00    —                  -6.41804253E-02
4               -4.29203673E-01    9.43951023E-02     -3.05477319E-03
8               -1.03881102E-01    4.55975498E-03     -1.79666460E-04
16              -2.57683980E-02    2.69169948E-04     -1.10640837E-05
32              -6.42965622E-03    1.65910479E-05     -6.88965642E-07
64              -1.60663902E-03    1.03336941E-06     -4.30208237E-08
128             -4.01611359E-04    6.45300022E-08     -2.68818500E-09
256             -1.00399815E-04    4.03225719E-09     -1.68002278E-10
512             -2.50997649E-05    2.52001974E-10     -1.04984909E-11
1024            -6.27492942E-06    1.57500679E-11     -6.55919762E-13
2048            -1.56873161E-06    9.82769421E-13     —
4096            -3.92182860E-07    —                  —
8192            -9.80457133E-08    —                  —
16384           -2.45114248E-08    —                  —
32768           -6.12785222E-09    —                  —
65536           -1.53194190E-09    —                  —
131072          -3.82977427E-10    —                  —

6. Eigen Values and Eigen Vectors

7. Solution of first order ordinary differential equations


7.1 Taylor series
The key idea behind numerical solution of odes is the combination of function values at
different points or times to approximate the derivatives in the required equation. The
manner in which the function values are combined is determined by the Taylor Series
expansion for the point at which the derivative is required. This gives us a finite
difference approximation to the derivative.
7.2 Finite difference
Consider a first order ODE of the form
y' = f(t, y), (60)
subject to some boundary/initial condition y(t=t0) = c. The finite difference solution of this
equation proceeds by discretising the independent variable t to t0, t0+Δt, t0+2Δt, t0+3Δt,
… We shall denote the exact solution at some t = tn = t0+nΔt by yn = y(t=tn) and our
approximate solution by Yn. We then look to solve
Y'n = f(tn,Yn) (61)
at each of the points in the domain.
If we take the Taylor Series expansion for the grid points in the neighbourhood of some
point t = tn,
Yn+i = Yn + iΔt Y'n + (iΔt)²/2 Y"n + …, (62)
we may then take linear combinations of these expansions to obtain an approximation for
the derivative Y'n at t = tn, viz.
Y'n ≈ (1/Δt) Σi αi Yn+i. (63)
The linear combination is chosen so as to eliminate the term in Yn, requiring
Σi αi = 0, (64)
and, depending on the method, possibly some of the terms of higher order. We shall look
at various strategies for choosing the αi in the following sections. Before doing so, we need
to consider the error associated with approximating yn by Yn.
7.3 Truncation error
The global truncation error at the nth step is the cumulative total of the truncation error
at the previous steps and is
En = Yn − yn. (65)
In contrast, the local truncation error for the nth step is
en = Yn − yn*, (66)
where yn* is the exact solution of our differential equation but with the initial condition
yn−1* = Yn−1. Note that En is not simply the sum of the en. It also depends on the stability of the
method (see section 7.7 for details) and we aim for En = O(en).
7.4 Euler method
The Euler method is the simplest finite difference scheme to understand and implement.
By approximating the derivative in (61) as
Y'n ≈ (Yn+1 − Yn)/Δt, (67)
in our differential equation for Yn we obtain
Yn+1 = Yn + Δt f(tn,Yn). (68)
Given the initial/boundary condition Y0 = c, we may obtain Y1 from Y0 + Δt f(t0,Y0), Y2
from Y1 + Δt f(t1,Y1) and so on, marching forwards through time. This process is shown
graphically in figure 18.
Figure 18: Sketch of the function y(t) (dark line) and the Euler method solution (arrows).
Each arrow is tangent to the solution of (60) passing through the point located at the
start of the arrow. Note that this point need not be on the desired y(t) curve.
The Euler method is termed an explicit method because we are able to write down an
explicit solution for Yn+1 in terms of "known" values at tn. Inspection of our
approximation for Y'n shows the error term is of order Δt² in our time step formula. This
shows that the Euler method is a first order method. Moreover it can be shown that if
Yn = yn + O(Δt²), then Yn+1 = yn+1 + O(Δt²) provided the scheme is stable (see section 7.7).
7.5 Implicit methods
The Euler method outlined in the previous section may be summarised by the update
formula Yn+1 = g(Yn,tn,Δt). In contrast, implicit methods have Yn+1 on both sides:
Yn+1 = h(Yn,Yn+1,tn,Δt), for example. Such implicit methods are computationally more
expensive for a single step than explicit methods, but offer advantages in terms of
stability and/or accuracy in many circumstances. Often the computational expense per
step is more than compensated for by it being possible to take larger steps (see section
7.7).
7.5.1 Backward Euler
The backward Euler method is almost identical to its explicit relative, the only difference
being that the derivative Y'n is approximated by
Y'n ≈ (Yn − Yn−1)/Δt, (69)
to give the evolution equation
Yn+1 = Yn + Δt f(tn+1,Yn+1). (70)
This is shown graphically in figure 19.
Figure 19: Sketch of the function y(t) (dark line) and the backward Euler method solution (arrows).
Each arrow is tangent to the solution of (60) passing through the point located at the
end of the arrow. Note that this point need not be on the desired y(t) curve.
The dependence of the right-hand side on the variables at tn+1 rather than tn means that it is
not, in general, possible to give an explicit formula for Yn+1 only in terms of Yn and tn+1. (It
may, however, be possible to recover an explicit formula for some functions f.)
As the derivative Y'n is approximated only to the first order, the Backward Euler method
has errors of O(Δt²), exactly as for the Euler method. The solution process will, however,
tend to be more stable and hence accurate, especially for stiff problems (problems where
f' is large). An example of this is shown in figure 20.

Figure20: Comparison of ordinary differential equation solvers for a stiff problem.


7.7 Stability
The stability of a method can be even more important than its accuracy as measured by
the order of the truncation error. Suppose we are solving
y' = y, (82)
for some complex . The exact solution is bounded (i.e. does not increase without limit)
provided Re <= 0. Substituting this into the Euler method shows
Yn+1 = (1 + t)Yn = (1 + t)2Yn-1 = … = (1 + t)n+1Y0. (83)
If Yn is to remain bounded for increasing n and given Re < 0 we require
|1 + t| <= 1. (84)
If we choose a time step t which does not satisfy ( 84 ) then Yn will increase without
limit. This condition ( 84 ) on t is very restrictive if <<0 as it demonstrates the Euler
method must use very small time steps t < 2||-1 if the solution is to converge on y = 0.
The reason why we consider the behaviour of equation (82) is that it is a model for the
behaviour of small errors. Suppose that at some stage during the solution process our
approximate solution is ŷ = y + ε where ε is the (small) error. Substituting this into our
differential equation of the form y' = f(t,y) and using a Taylor Series expansion gives
ε' ≈ (∂f/∂y) ε. (85)
Thus, to the leading order, the error obeys an equation of the form given by (82), with
λ = ∂f/∂y. As it is desirable for errors to decrease (and thus the solution remain stable)
rather than increase (and the solution be unstable), the limit on the time step suggested by
(84) applies for the application of the Euler method to any ordinary differential
equation. A consequence of the decay of errors present at one time step as the solution
process proceeds is that memory of a particular time step's contribution to the global
truncation error decays as the solution advances through time. Thus the global truncation
error is dominated by the local truncation error(s) of the most recent step(s) and
O(En) = O(en).
In comparison, solution of ( 82 ) by the Backward Euler method
Yn+1 = Yn + tYn+1, (86)
can be rearranged for Yn+1 and
Yn+1 = Yn/(1 - t) = Yn-1/(1 - t)2 = … = Y0/(1 - t)n+1, (87)
which will be stable provided
|1 - t| > 1. (88)
For Re <= 0 this is always satisfied and so the Backward Euler method is
unconditionally stable.
The Crank-Nicholson method may be analysed in a similar fashion with
(1t/2)Yn+1 = (1+t/2)Yn, (89)
to arrive at
Yn+1 = [(1+t/2)/(1 - t/2)]n+1 Y0, (90)
with the magnitude of the term in square brackets always less than unity for Re < 0.
Thus, like Backward Euler, Crank-Nicholson is unconditionally stable.
In general, explicit methods require less computation per step, but are only conditionally
stable and so may require far smaller step sizes than an implicit method of nominally the
same order.
7.8 Predictor-corrector methods
Predictor-corrector methods try to combine the advantages of the simplicity of explicit
methods with the improved stability and accuracy of implicit methods. They achieve this
by using an explicit method to predict the solution Yn+1(p) at tn+1 and then utilise
f(tn+1,Yn+1(p)) as an approximation to f(tn+1,Yn+1) to correct this prediction using something
similar to an implicit step.
7.8.1 Improved Euler method
The simplest of these methods combines the Euler method as the predictor,
Yn+1(1) = Yn + Δt f(tn,Yn), (91)
and then the Backward Euler to give the corrector,
Yn+1(2) = Yn + Δt f(tn+1,Yn+1(1)). (92)
The final solution is the mean of these:
Yn+1 = (Yn+1(1) + Yn+1(2))/2. (93)
To understand the stability of this method we again use y' = λy, so that the three steps
described by equations (91) to (93) become
Yn+1(1) = Yn + λΔtYn, (94a)
Yn+1(2) = Yn + λΔtYn+1(1)
= Yn + λΔt(Yn + λΔtYn)
= (1 + λΔt + λ²Δt²)Yn, (94b)
Yn+1 = (Yn+1(1) + Yn+1(2))/2
= [(1 + λΔt)Yn + (1 + λΔt + λ²Δt²)Yn]/2
= (1 + λΔt + ½λ²Δt²)Yn
= (1 + λΔt + ½λ²Δt²)^(n+1) Y0. (94c)
Convergence requires |1 + λΔt + ½λ²Δt²| < 1 (for Re(λ) < 0), which in turn restricts
Δt < 2|λ|⁻¹. Thus the stability of this method, commonly known as the Improved Euler
method, is identical to the Euler method. This is not surprising as it is limited by the
stability of the initial predictive step. The accuracy of the method is, however, second
order, as may be seen by comparison of (94c) with the Taylor Series expansion.
7.8.2 Runge-Kutta methods
The Improved Euler method is the simplest of a family of similar predictor corrector
methods following the form of a single predictor step and one or more corrector steps.
The corrector step may be repeated a fixed number of times, or until the estimate for Yn+1
converges to some tolerance.
One subgroup of this family are the Runge-Kutta methods which use a fixed number of
corrector steps. The Improved Euler method is the simplest of this subgroup. Perhaps the
most widely used of these is the fourth order method:
k(1) = Δt f(tn, Yn), (95a)
k(2) = Δt f(tn + ½Δt, Yn + ½k(1)), (95b)
k(3) = Δt f(tn + ½Δt, Yn + ½k(2)), (95c)
k(4) = Δt f(tn + Δt, Yn + k(3)), (95d)
Yn+1 = Yn + (k(1) + 2k(2) + 2k(3) + k(4))/6. (95e)
In order to analyse this we need to construct Taylor Series expansions for
k(2) = Δt f(tn+½Δt, Yn+½k(1)) = Δt[f(tn,Yn) + (Δt/2)(∂f/∂t + f ∂f/∂y)] + …, and similarly for k(3) and k(4).
This is then compared with a full Taylor Series expansion for Yn+1 up to fourth order,
requiring Y" = df/dt = ∂f/∂t + f ∂f/∂y,
Y"' = d²f/dt² = ∂²f/∂t² + 2f ∂²f/∂t∂y + (∂f/∂t)(∂f/∂y) + f² ∂²f/∂y² + f (∂f/∂y)², and
similarly for Y"". All terms up to order Δt⁴ can be shown to match, with the error coming
in at Δt⁵.
8. Partial differential equations
8.1 Laplace equation
Consider the Laplace equation in two dimensions,
∂²φ/∂x² + ∂²φ/∂y² = 0, (101)
in some rectangular domain described by x in [x0,x1], y in [y0,y1]. Suppose we discretise
the solution onto an m+1 by n+1 rectangular grid (or mesh) given by xi = x0 + iΔx,
yj = y0 + jΔy where i=0,m, j=0,n. The mesh spacing is Δx = (x1−x0)/m and Δy = (y1−y0)/n.
Let φij = φ(xi,yj) be the exact solution at the mesh point i,j, and Φij ≈ φij be the approximate
solution at that mesh point.
By considering the Taylor Series expansion for φ about some mesh point i,j,
φi+1,j = φij + Δx ∂φ/∂x + (Δx²/2) ∂²φ/∂x² + (Δx³/6) ∂³φ/∂x³ + O(Δx⁴), (102a)
φi−1,j = φij − Δx ∂φ/∂x + (Δx²/2) ∂²φ/∂x² − (Δx³/6) ∂³φ/∂x³ + O(Δx⁴), (102b)
φi,j+1 = φij + Δy ∂φ/∂y + (Δy²/2) ∂²φ/∂y² + (Δy³/6) ∂³φ/∂y³ + O(Δy⁴), (102c)
φi,j−1 = φij − Δy ∂φ/∂y + (Δy²/2) ∂²φ/∂y² − (Δy³/6) ∂³φ/∂y³ + O(Δy⁴), (102d)
it is clear that we may approximate ∂²φ/∂x² and ∂²φ/∂y² to second order using the four
adjacent mesh points to obtain the finite difference approximation
(Φi+1,j − 2Φij + Φi−1,j)/Δx² + (Φi,j+1 − 2Φij + Φi,j−1)/Δy² = 0 (103)
for the internal points 0<i<m, 0<j<n. In addition to this we will have either Dirichlet,
Neumann or mixed boundary conditions to specify the boundary values of Φij. The system
of linear equations described by (103) in combination with the boundary conditions may
be solved in a variety of ways.
8.1.1 Direct solution
Provided the boundary conditions are linear in Φ, our finite difference approximation is
itself linear and the resulting system of equations may be solved directly using Gauss
Elimination as discussed in section 4.1. This approach may be feasible if the total number
of mesh points (m+1)(n+1) required is relatively small, but as the matrix A used to
represent the complete system will have [(m+1)(n+1)]2 elements, the storage and
computational cost of such a solution will become prohibitive even for relatively modest
m and n.
The structure of the system ensures A is relatively sparse, consisting of a tridiagonal core
with one nonzero diagonal above and another below this. These nonzero diagonals are
offset by either m or n from the leading diagonal. Provided pivoting (if required) is
conducted in such a way that it does not place any nonzero elements outside this band
then solution by Gauss Elimination or LU Decomposition will only produce nonzero
elements inside this band, substantially reducing the storage and computational
requirements (see section 4.4). Careful choice of the order of the matrix elements (i.e. by
x or by y) may help reduce the size of this matrix so that it need contain only O(m³)
elements for a square domain.
Because of the wide spread need to solve Laplace's and related equations, specialised
solvers have been developed for this problem. One of the best of these is Hockney's
method for solving Ax = b which may be used to reduce a block tridiagonal matrix (and
the corresponding right-hand side) of the form

(104)

,
into a block diagonal matrix of the form

(105)

,
where the diagonal blocks are themselves block tridiagonal matrices and I is an identity matrix. This
process may be performed iteratively to reduce an n-dimensional finite difference
approximation to Laplace's equation to a tridiagonal system of equations with n−1
applications. The computational cost is O(p log p), where p is the total number of mesh
points. The main drawback of this method is that the boundary conditions must be able to
be cast into the block tridiagonal format.
8.1.2 Relaxation
An alternative to direct solution of the finite difference equations is an iterative numerical
solution. These iterative methods are often referred to as relaxation methods as an initial
guess at the solution is allowed to slowly relax towards the true solution, reducing the
errors as it does so. There are a variety of approaches with differing complexity and
speed. We shall introduce these methods before looking at the basic mathematics behind
them.
8.1.2.1 Jacobi
The Jacobi Iteration is the simplest approach. For clarity we consider the special case
when Δx = Δy. To find the solution for a two-dimensional Laplace equation simply:
1. Initialise Φij to some initial guess.
2. Apply the boundary conditions.
3. For each internal mesh point set
Φ*ij = (Φi+1,j + Φi−1,j + Φi,j+1 + Φi,j−1)/4. (106)
4. Replace the old solution Φ with the new estimate Φ*.
5. If the solution does not satisfy the tolerance, repeat from step 2.
The coefficients in the expression (here all 1/4) used to calculate the refined estimate are
often referred to as the stencil or template. Higher order approximations may be obtained
by simply employing a stencil which utilises more points. Other equations (e.g. the bi-
harmonic equation, ∇⁴φ = 0) may be solved by introducing a stencil appropriate to that
equation.
While very simple and cheap per iteration, the Jacobi Iteration is very slow to converge,
especially for larger grids. Corrections to errors in the estimate ij diffuse only slowly
from the boundaries taking O(max(m,n)) iterations to diffuse across the entire mesh.
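
A minimal C++ sketch of the Jacobi Iteration for Laplace's equation on a square mesh is given below; the mesh size, boundary values (Φ = 1 on one edge, 0 elsewhere) and tolerance are illustrative choices.

#include <iostream>
#include <cmath>
#include <algorithm>
using namespace std;

const int M = 20;                          // interior resolution of the illustrative mesh

int main()
{
 // solve Laplace's equation on a square with phi = 1 on the top edge, 0 elsewhere
 double phi[M+1][M+1] = {0}, phiNew[M+1][M+1] = {0};
 for (int i = 0; i <= M; ++i) phi[i][M] = phiNew[i][M] = 1.0;   // boundary condition

 for (int iter = 0; iter < 5000; ++iter)
 {
  double maxChange = 0;
  for (int i = 1; i < M; ++i)
   for (int j = 1; j < M; ++j)
   {
    // Jacobi stencil: average of the four neighbours (dx = dy)
    phiNew[i][j] = 0.25*(phi[i+1][j] + phi[i-1][j] + phi[i][j+1] + phi[i][j-1]);
    maxChange = max(maxChange, fabs(phiNew[i][j] - phi[i][j]));
   }
  for (int i = 1; i < M; ++i)
   for (int j = 1; j < M; ++j)
    phi[i][j] = phiNew[i][j];              // replace the old solution with the new estimate
  if (maxChange < 1e-6) { cout << "converged after " << iter+1 << " iterations\n"; break; }
 }
 cout << "phi at the centre = " << phi[M/2][M/2] << endl;
 return 0;
}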
8.1.2.2 Gauss-Seidel
The Gauss-Seidel Iteration is very similar to the Jacobi Iteration, the only difference
being that the new estimate *ij is returned to the solution ij as soon as it is completed,
allowing it to be used immediately rather than deferring its use to the next iteration. The
advantages of this are:
Less memory required (there is no need to store *).
Faster convergence (although still relatively slow).
On the other hand, the method is less amenable to vectorisation as, for a given iteration,
the new estimate of one mesh point is dependent on the new estimates for those already
scanned.
8.1.2.4 Successive Over Relaxation (SOR)
It has been found that the errors in the solution obtained by the preceding
methods decrease only slowly, and often in a monotonic manner. Hence, rather
than setting
Φ*ij = (Φi+1,j + Φi−1,j + Φi,j+1 + Φi,j−1)/4,
for each internal mesh point, we use
Φ*ij = (1 − σ)Φij + σ(Φi+1,j + Φi−1,j + Φi,j+1 + Φi,j−1)/4, (107)
for some value σ. The optimal value of σ will depend on the problem being solved and
may vary as the iteration process converges. Typically, however, a value of around 1.2 to
1.4 produces good results. In some special cases it is possible to determine an optimal
value analytically.
8.1.3 Multigrid*
The big problem with relaxation methods is their slow convergence. If σ = 1 then
application of the stencil removes all the error in the solution at the wave length of the
mesh for that point, but has little impact on larger wave lengths. This may be seen if we
consider the one-dimensional equation d²φ/dx² = 0 subject to φ(x=0) = 0 and φ(x=1) = 1.
Suppose our initial guess for the iterative solution is that Φi = 0 for all internal mesh
points. With the Jacobi Iteration the correction to the internal points diffuses only slowly
along from x = 1.
Multigrid methods try to improve the rate of convergence by considering the problem on
a hierarchy of grids. The larger wave length errors in the solution are dissipated on a
coarser grid while the shorter wave length errors are dissipated on a finer grid. For the
example considered above, the solution would converge in one complete Jacobi multigrid
iteration, compared with the slow asymptotic convergence above.
For linear problems, the basic multigrid algorithm for one complete iteration may be
described as
1. Select the initial finest grid resolution p = P0, set b(p) = 0 and make some initial guess
at the solution φ(p).
2. If at the coarsest resolution (p = 0) then solve A(p)φ(p) = b(p) exactly and jump to step 7.
3. Relax the solution at the current grid resolution, applying boundary conditions.
4. Calculate the error r = A(p)φ(p) − b(p).
5. Coarsen the error, b(p−1) ← r, to the next coarser grid and decrement p.
6. Repeat from step 2.
7. Refine the correction to the next finest grid, φ(p+1) = φ(p+1) + φ(p), and increment p.
8. Relax the solution at the current grid resolution, applying boundary conditions.
9. If not at the current finest grid (P0), repeat from step 7.
10. If not at the final desired grid, increment P0 and repeat from step 7.
11. If not converged, repeat from step 2.
Typically the relaxation steps will be performed using Successive Over Relaxation with
Red-Black ordering and some relaxation coefficient σ. The hierarchy of grids is normally
chosen to differ in dimensions by a factor of 2 in each direction. The factor σ is typically
less than unity and effectively damps possible instabilities in the convergence. The
refining of the correction to a finer grid will be achieved by (bi-)linear or higher order
interpolation, and the coarsening may simply be by sub-sampling or averaging the error
vector r.
It has been found that the number of iterations required to reach a given level of
convergence is more or less independent of the number of mesh points. As the number of
operations per complete iteration for n mesh points is O(n) + O(n/2^d) + O(n/2^2d) + …,
where d is the number of dimensions in the problem, then it can be seen that the
Multigrid method may often be faster than a direct solution (which will require O(n³),
O(n²) or O(n log n) operations, depending on the method used). This is particularly true if
n is large or there are a large number of dimensions in the problem. For small problems,
the coefficient in front of the n for the Multigrid solution may be relatively large so that
direct solution may be faster.
A further advantage of Multigrid and other iterative methods when compared with direct
solution, is that irregular shaped domains or complex boundary conditions are
implemented more easily. The difficulty with this for the Multigrid method is that care
must be taken in order to ensure consistent boundary conditions in the embedded
problems.
8.1.4 The mathematics of relaxation*
In principle, relaxation methods, which are the basis of the Jacobi, Gauss-Seidel,
Successive Over Relaxation and Multigrid methods, may be applied to any system of
linear equations to iteratively improve an approximation to the exact solution. The basis
for this is identical to the Direct Iteration method described in section 3.6. We start by
writing the vector function
f(x) = Ax − b, (108)
and search for the vector of roots to f(x) = 0 by writing
xn+1 = g(xn), (109)
where
g(x) = D⁻¹{[A+D]x − b}, (110)
with D a diagonal matrix (zero for all off-diagonal elements) which may be chosen
arbitrarily. We may analyse this system by following our earlier analysis for the Direct
Iteration method (section 3.6). Let us assume the exact solution is x* = g(x*), then
εn+1 = xn+1 − x*
= D⁻¹{[A+D]xn − b} − D⁻¹{[A+D]x* − b}
= D⁻¹[A+D](xn − x*)
= D⁻¹[A+D]εn
= {D⁻¹[A+D]}^(n+1) ε0.
From this it is clear that convergence will be linear and requires
||εn+1|| = ||Bεn|| < ||εn||, (111)
where B = D⁻¹[A+D], for some suitable norm. As any error vector εn may be written as a
linear combination of the eigen vectors of our matrix B, it is sufficient for us to consider
the eigen value problem
Bεn = λεn, (112)
and require max(||) to be less than unity. In the asymptotic limit, the smaller the
magnitude of this maximum eigen value the more rapid the convergence. The
convergence remains, however, linear.
Since we have the ability to choose the diagonal matrix D, and since it is the eigen values
of B = D1[A+D] rather than A itself which are important, careful choice of D can aid the
speed at which the method converges. Typically this means selecting D so that the
diagonal of B is small.
8.1.4.1 Jacobi and Gauss-Seidel for Laplace equation*
The structure of the finite difference approximation to Laplace's equation lends itself to
these relaxation methods. In one dimension the approximation Φi+1 − 2Φi + Φi−1 = 0 gives a system
AΦ = b in which A is the tridiagonal matrix with −2 on the diagonal and 1 on the sub- and
super-diagonals (the right-hand side b containing the boundary values), (113)
and both Jacobi and Gauss-Seidel iterations take D as 2I (I is the identity matrix) on the
diagonal to give B = D⁻¹[A+D] as the tridiagonal matrix with 0 on the diagonal and ½ on the
sub- and super-diagonals. (114)
The eigen values λ of this matrix are given by the roots of
det(B − λI) = 0. (115)
In this case the determinant may be obtained using the recurrence relation
det(B − λI)(n) = −λ det(B − λI)(n−1) − ¼ det(B − λI)(n−2), (116)
where the subscript gives the size of the matrix B. From this we may see
det(B − λI)(1) = −λ,
det(B − λI)(2) = λ² − ¼,
det(B − λI)(3) = −λ³ + ½λ,
det(B − λI)(4) = λ⁴ − ¾λ² + 1/16,
det(B − λI)(5) = −λ⁵ + λ³ − (3/16)λ,
det(B − λI)(6) = λ⁶ − (5/4)λ⁴ + (3/8)λ² − 1/64,
… (117)
which may be solved to give the eigen values
λ(1) = 0,
λ²(2) = 1/4,
λ²(3) = 0, 1/2,
λ²(4) = (3 ± √5)/8,
λ²(5) = 0, 1/4, 3/4,
… (118)
It can be shown that for a system of any size following this general form, all the eigen
values satisfy |λ| < 1, thus proving the relaxation method will always converge. As we
increase the number of mesh points, the number of eigen values increases and gradually
fills up the range |λ| < 1, with the numerically largest eigen values becoming closer to
unity. As the largest eigen values approach unity, the convergence of the relaxation
method slows considerably for large problems. A similar analysis may be applied to
Laplace's equation in two or more dimensions, although the expressions for the
determinant and eigen values are correspondingly more complex.
The eigen values of largest magnitude are associated with error components that vary
over large distances (many mesh points), and it is these components that decay most
slowly. The multigrid approach enables the solution to converge using a much smaller
system of equations, and hence smaller eigen values, for the larger distances, bypassing
the slow convergence of the basic relaxation method.
8.1.4.2 Successive Over Relaxation for Laplace equation*
The analysis of the Jacobi and Gauss-Seidel iterations may be applied equally well to
Successive Over Relaxation. The main difference is that D = (2/σ)I so that

B = D^-1[A+D] = (σ/2)A + I = [ 1-σ   σ/2    0    0   ...
                               σ/2   1-σ   σ/2   0   ...
                                0    σ/2   1-σ   σ/2  ...
                                         ...              ] ,   (119)
and the corresponding eigen values λ are related to those tabulated above through
[(λ-1)/σ + 1]^2 taking the tabulated values. Thus if σ is chosen inappropriately, the eigen
values of B will exceed unity in magnitude and the relaxation method will diverge. On the
other hand, careful choice of σ will allow the eigen values of B to be smaller than those
for Jacobi and Gauss-Seidel, thus increasing the rate of convergence.
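Equivalently, λ = 1 - σ(1 - μ), where μ is the corresponding Jacobi eigen value. The sketch below (NumPy; the relaxation parameter is written sigma) checks this relation numerically within the simplified diagonal-D framework of (119); it is only a consistency check of that framework, not a full analysis of SOR.

import numpy as np

# Check of the simplified SOR iteration matrix of equation (119):
# B = D^-1[A+D] with D = (2/sigma) I, i.e. B = (sigma/2) A + I for the 1-D
# Laplace matrix A.  Its eigenvalues should be 1 - sigma*(1 - mu), where mu
# are the Jacobi eigenvalues (sigma = 1 recovers the Jacobi matrix itself).
n = 5
A = -2.0*np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
mu = np.sort(np.linalg.eigvals(0.5*A + np.eye(n)).real)      # Jacobi eigenvalues

for sigma in (0.5, 1.0, 1.8):
    B = (sigma/2.0)*A + np.eye(n)
    lam = np.sort(np.linalg.eigvals(B).real)
    ok = np.allclose(lam, np.sort(1.0 - sigma*(1.0 - mu)))
    print(f"sigma = {sigma}:  max|lambda| = {max(abs(lam)):.3f}   relation holds: {ok}")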
8.1.4.3 Other equations*
Relaxation methods may be applied to other differential equations or more general
systems of linear equations in a similar manner. As a rule of thumb, the solution will
converge if the A matrix is diagonally dominant, i.e. the numerically largest values occur
on the diagonal. If this is not the case, SOR can still be used, but it may be necessary to
choose σ < 1, whereas for Laplace's equation σ >= 1 produces a better rate of
convergence.
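This rule of thumb is easy to test for a given matrix; one possible (deliberately simple) formalisation is sketched below, where the helper name diagonally_dominant is purely illustrative.

import numpy as np

# Rule-of-thumb check: does every row of A have its numerically largest
# entry on the diagonal (here in the weak sense |a_ii| >= sum of the
# remaining |a_ij| in the row)?
def diagonally_dominant(A):
    d   = np.abs(np.diag(A))
    off = np.abs(A).sum(axis=1) - d
    return bool(np.all(d >= off))

A = np.array([[-2.,  1.,  0.],
              [ 1., -2.,  1.],
              [ 0.,  1., -2.]])
print(diagonally_dominant(A))        # True: relaxation is expected to converge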
8.1.5 FFT*
One of the most common ways of solving Laplace's equation is to take the Fourier
transform of the equation to convert it into wave number space and there solve the
resulting algebraic equations. This conversion process can be very efficient if the Fast
Fourier Transform algorithm is used, allowing a solution to be evaluated with O(n log n)
operations.
In its simplest form the FFT algorithm requires there to be n = 2^p mesh points in the
direction(s) to be transformed. The efficiency of the algorithm is achieved by first
calculating the transform of pairs of points, then of pairs of transforms, then of pairs of
pairs and so on up to the full resolution. The idea is to divide and conquer! Details of the
FFT algorithm may be found in any standard text.
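As an illustration of the divide-and-conquer idea only (in practice a library routine such as numpy.fft would be used), a minimal recursive radix-2 transform might look as follows.

import numpy as np

# Minimal recursive radix-2 FFT: the transform of n = 2^p points is built
# from the transforms of the even- and odd-indexed halves.
def fft(x):
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n == 1:
        return x
    even = fft(x[0::2])                                 # transform of even-indexed points
    odd  = fft(x[1::2])                                 # transform of odd-indexed points
    w = np.exp(-2j*np.pi*np.arange(n//2)/n)             # "twiddle" factors
    return np.concatenate([even + w*odd, even - w*odd])

x = np.random.rand(16)                                  # n = 2^4 points
print(np.allclose(fft(x), np.fft.fft(x)))               # True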
8.2 Poisson equation
The Poisson equation ∇^2u = f(x) may be treated using the same techniques as Laplace's
equation. It is simply necessary to set the right-hand side to f, scaled suitably to reflect
any scaling in A.
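As a sketch of how this works in practice, the pure-Python Gauss-Seidel sweep below solves a Poisson problem on the unit square whose exact solution is known; the mesh size, number of sweeps and right-hand side are arbitrary choices made for the illustration.

import numpy as np

# Gauss-Seidel relaxation for the Poisson equation u_xx + u_yy = f on the
# unit square with u = 0 on the boundary, using the five-point stencil.
m = 16                                    # number of intervals, h = 1/m
h = 1.0/m
x = np.linspace(0.0, 1.0, m + 1)
X, Y = np.meshgrid(x, x, indexing="ij")
f = -2.0*np.pi**2 * np.sin(np.pi*X)*np.sin(np.pi*Y)   # chosen so that
u_exact = np.sin(np.pi*X)*np.sin(np.pi*Y)             # u_exact solves the problem

u = np.zeros((m + 1, m + 1))              # initial guess; boundary stays zero
for sweep in range(500):
    for i in range(1, m):
        for j in range(1, m):
            u[i, j] = 0.25*(u[i-1, j] + u[i+1, j] +
                            u[i, j-1] + u[i, j+1] - h*h*f[i, j])

print("max error =", np.abs(u - u_exact).max())       # limited by the h^2 truncation error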
8.3 Diffusion equation
Consider the two-dimensional diffusion equation

∂u/∂t = D(∂^2u/∂x^2 + ∂^2u/∂y^2), (120)
subject to u(x,y,t) = 0 on the boundaries x=0,1 and y=0,1. Suppose the initial conditions
are u(x,y,t=0) = u0(x,y) and we wish to evaluate the solution for t > 0. We shall explore
some of the options for achieving this in the following sections.
8.3.1 Semi-discretisation
One of the simplest and most useful approaches is to discretise the equation in space and
then solve a system of (coupled) ordinary differential equations in time in order to
calculate the solution. Using a square mesh of step size Δx = Δy = 1/m, and taking the
diffusivity D = 1, we may utilise our earlier approximation for the Laplacian operator
(equation (103)) to obtain

∂ui,j/∂t = (ui-1,j + ui+1,j + ui,j-1 + ui,j+1 - 4ui,j)/Δx^2 (121)
for the internal points i=1,m-1 and j=1,m-1. On the boundaries (i=0,j), (i=m,j), (i,j=0) and
(i,j=m) we simply have uij = 0. If Uij represents our approximation of u at the mesh points
xij, then we must simply solve the (m-1)^2 coupled ordinary differential equations

dUi,j/dt = (Ui-1,j + Ui+1,j + Ui,j-1 + Ui,j+1 - 4Ui,j)/Δx^2. (122)
In principle we may utilise any of the time stepping algorithms discussed in earlier
lectures to solve this system. As we shall see, however, care needs to be taken to ensure
the method chosen produces a stable solution.
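For reference, a vectorised evaluation of the right-hand side of (122) might look as follows (NumPy, with U stored as an (m+1) by (m+1) array including the boundary points; the name dUdt is purely illustrative). Any of the time-stepping methods of the earlier lectures could then be applied to this system.

import numpy as np

# Right-hand side of the semi-discretised diffusion equation (122):
# dU_ij/dt = (U_i-1,j + U_i+1,j + U_i,j-1 + U_i,j+1 - 4 U_ij)/dx^2 at the
# interior points, with U = 0 held on the boundary rows and columns.
def dUdt(U, dx):
    F = np.zeros_like(U)
    F[1:-1, 1:-1] = (U[:-2, 1:-1] + U[2:, 1:-1] +
                     U[1:-1, :-2] + U[1:-1, 2:] -
                     4.0*U[1:-1, 1:-1]) / dx**2
    return F                         # boundary entries of F remain zero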
8.3.2 Euler method
Applying the Euler method Yn+1 = Yn + Δt f(Yn,tn) to our spatially discretised diffusion
equation gives

U(n+1)i,j = U(n)i,j + μ[U(n)i-1,j + U(n)i+1,j + U(n)i,j-1 + U(n)i,j+1 - 4U(n)i,j], (123)
where the Courant number

μ = Δt/Δx^2, (124)
describes the size of the time step relative to the spatial discretisation. As we shall see,
stability of the solution depends on μ, in contrast to an ordinary differential equation
where it depends on the time step Δt only.
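Written out, one explicit Euler update reduces to the stencil operation sketched below (the step is expressed directly in terms of the Courant number μ, called mu in the code; the mesh size and initial condition are arbitrary choices for the illustration).

import numpy as np

# One Euler update (123) for the 2-D diffusion problem, written directly in
# terms of the Courant number mu = dt/dx^2; u = 0 is held on the boundary.
def euler_step(U, mu):
    V = U.copy()
    V[1:-1, 1:-1] = U[1:-1, 1:-1] + mu*(U[:-2, 1:-1] + U[2:, 1:-1] +
                                        U[1:-1, :-2] + U[1:-1, 2:] -
                                        4.0*U[1:-1, 1:-1])
    return V

m, mu = 20, 0.2                      # mu < 1/4, so the scheme is stable (section 8.3.3)
x = np.linspace(0.0, 1.0, m + 1)
U = np.outer(np.sin(np.pi*x), np.sin(np.pi*x))          # initial condition u0(x,y)
for n in range(100):
    U = euler_step(U, mu)
print("max |U| after 100 steps =", np.abs(U).max())     # decays, as diffusion should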
8.3.3 Stability
Stability of the Euler method solving the diffusion equation may be analysed in a similar
way to that for ordinary differential equations. We start by asking the question "does the
Euler method converge as t->infinity?" The exact solution will have u -> 0 and the
numerical solution must also do this if it is to be stable.
We choose
U(0)i,j = sin(αi) sin(βj), (125)
for some α and β chosen as multiples of π/m to satisfy u = 0 on the boundaries.
Substituting this into ( 123 ) gives
U(1)i,j = sin(αi)sin(βj) + μ{sin[α(i+1)]sin(βj) + sin[α(i-1)]sin(βj)
+ sin(αi)sin[β(j+1)] + sin(αi)sin[β(j-1)] - 4 sin(αi)sin(βj)}
= sin(αi)sin(βj) + μ{[sin(αi)cos(α) + cos(αi)sin(α)]sin(βj) + [sin(αi)cos(α)
- cos(αi)sin(α)]sin(βj)
+ sin(αi)[sin(βj)cos(β) + cos(βj)sin(β)] + sin(αi)[sin(βj)cos(β)
- cos(βj)sin(β)] - 4 sin(αi)sin(βj)}
= sin(αi)sin(βj) + 2μ{sin(αi)cos(α)sin(βj) + sin(αi)sin(βj)cos(β) - 2 sin(αi)sin(βj)}
= sin(αi)sin(βj){1 + 2μ[cos(α) + cos(β) - 2]}
= sin(αi)sin(βj){1 - 4μ[sin^2(α/2) + sin^2(β/2)]}. (126)
Applying this at consecutive times shows the solution at time tn is
U(n)i,j = sin(αi)sin(βj) {1 - 4μ[sin^2(α/2) + sin^2(β/2)]}^n, (127)
which then requires |1 - 4μ[sin^2(α/2) + sin^2(β/2)]| < 1 for this to converge as n -> infinity.
For this to be satisfied for arbitrary α and β we require μ < 1/4. Thus we must ensure
Δt < Δx^2/4. (128)
A doubling of the spatial resolution gives four times as many mesh points and requires a
factor of four more time steps, so overall the expense of the computation increases
sixteen-fold.
The analysis for the diffusion equation in one or three dimensions may be carried out in a
similar manner.
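The threshold (128) is easy to observe numerically. The sketch below advances a single Fourier mode with the Euler scheme for two values of μ either side of 1/4 and compares the growth with the amplification factor of (127); the particular mode and step count are arbitrary choices for the illustration.

import numpy as np

# Numerical check of (127)-(128): a single Fourier mode sin(alpha*i)sin(beta*j)
# is multiplied by g = 1 - 4*mu*[sin^2(alpha/2) + sin^2(beta/2)] each Euler
# step, so the solution grows without bound once |g| > 1 (i.e. mu > 1/4).
def euler_step(U, mu):
    V = U.copy()
    V[1:-1, 1:-1] = U[1:-1, 1:-1] + mu*(U[:-2, 1:-1] + U[2:, 1:-1] +
                                        U[1:-1, :-2] + U[1:-1, 2:] -
                                        4.0*U[1:-1, 1:-1])
    return V

m = 16
i = np.arange(m + 1)
alpha = beta = (m - 1)*np.pi/m                          # a "rough" mode, alpha close to pi
U0 = np.outer(np.sin(alpha*i), np.sin(beta*i))

for mu in (0.24, 0.26):
    g = 1.0 - 4.0*mu*(np.sin(alpha/2)**2 + np.sin(beta/2)**2)
    U = U0.copy()
    for n in range(50):
        U = euler_step(U, mu)
    print(f"mu = {mu}:  predicted |g|^50 = {abs(g)**50:.3e}"
          f"   measured max|U| = {np.abs(U).max():.3e}")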
8.3.4 Model for general initial conditions
Our analysis of the Euler method for solving the diffusion equation in section 8.3.3
assumed initial conditions of the form sin(kπx/Lx) sin(lπy/Ly), where k, l are integers and
Lx, Ly are the dimensions of the domain. In addition to satisfying the boundary conditions,
these initial conditions represent a set of orthogonal functions which may be used to
construct any arbitrary initial conditions as a Fourier series. Now, since the diffusion
equation is linear, and as our stability analysis of the previous section shows the
conditions under which the solution for each Fourier mode is stable, we can see that the
equation ( 128 ) applies equally for arbitrary initial conditions.
8.3.5 Crank-Nicholson
The implicit Crank-Nicholson method is significantly better in terms of stability than the
Euler method for ordinary differential equations. For partial differential equations such as
the diffusion equation we may analyse this in the same manner as the Euler method of
section 8.3.3.
For simplicity, consider the one-dimensional diffusion equation

∂u/∂t = ∂^2u/∂x^2, (129)

with u(x=0,t) = u(x=1,t) = 0, and apply the standard spatial discretisation for the
curvature term to obtain

U(n+1)i - (μ/2)[U(n+1)i-1 - 2U(n+1)i + U(n+1)i+1] = U(n)i + (μ/2)[U(n)i-1 - 2U(n)i + U(n)i+1], (130)
for the internal points i = 1,…,m-1. Solution of this expression will involve the solution of
a tridiagonal system for this one-dimensional problem.
To test the stability we again choose a Fourier mode. Here we have only one spatial
dimension so we use U(0)i = sin(αi), which satisfies the boundary conditions if α is a
multiple of π/m. Substituting this into (130) we find

U(n)i = sin(αi) {[1 - 2μ sin^2(α/2)] / [1 + 2μ sin^2(α/2)]}^n. (131)
Since the term [1 - 2μ sin^2(α/2)]/[1 + 2μ sin^2(α/2)] has magnitude less than unity for
all μ > 0, the Crank-Nicholson method is unconditionally stable. The step size Δt may be
chosen on the grounds of truncation error independently of Δx.
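A minimal sketch of one-dimensional Crank-Nicholson time stepping is given below (NumPy; a dense solve is used purely for clarity, whereas in practice the tridiagonal system would be solved directly, and the mesh size, μ and initial condition are arbitrary choices). With μ = 2, well above the explicit Euler limit of 1/4, the numerical decay still tracks the exact solution closely.

import numpy as np

# One-dimensional Crank-Nicholson stepping for u_t = u_xx with u(0)=u(1)=0:
#   (I - mu/2 T) U^(n+1) = (I + mu/2 T) U^(n),   T = tridiag(1, -2, 1).
# A dense solve is used purely for clarity; in practice the tridiagonal
# system would be solved directly in O(m) operations.
m  = 20
dx = 1.0/m
mu = 2.0                             # dt = mu*dx^2, well above the Euler limit of 1/4
x  = np.linspace(0.0, 1.0, m + 1)
U  = np.sin(np.pi*x)                 # initial condition, satisfying the boundary conditions

ni = m - 1                           # number of interior points
T  = -2.0*np.eye(ni) + np.eye(ni, k=1) + np.eye(ni, k=-1)
L  = np.eye(ni) - 0.5*mu*T           # implicit (left-hand side) operator
R  = np.eye(ni) + 0.5*mu*T           # explicit (right-hand side) operator

for n in range(50):
    U[1:-1] = np.linalg.solve(L, R @ U[1:-1])           # boundaries stay at zero

dt = mu*dx**2
print("numerical max(U) :", U.max())
print("exact    max(u)  :", np.exp(-np.pi**2*50*dt))    # exact decay of the sin(pi x) mode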