Numerical Methods – Part 2
Lecture notes by C. J. Budd
Chapter 6
6.1 Introduction
An initial value problem is an ordinary differential equation of the form
du/dt = f(t, u),  where u(t_0) = u_0 is given and u ∈ IR^n.    (6.1)

In the special case when f(t, u) = f(t) we have

u(t) = u_0 + ∫_{t_0}^{t} f(s) ds,
and the task of evaluating this integral accurately is called quadrature. To solve any differential equation
we need to put it into the standard form given by (6.1). Any equation involving higher derivatives
can be reformulated as such a vector equation. For example, suppose w satisfies the second order differential equation

d²w/dt² = w.

Let u_1 = w, u_2 = dw/dt. Then

du_1/dt = u_2,   du_2/dt = u_1.

Thus if u = (u_1, u_2)^T we have

du/dt = [0 1; 1 0] u ≡ Au.
Initial value problems (IVPs) come in various forms and there is no such thing as a perfect all-purpose IVP solver. MATLAB offers you quite a choice. I will try to show you how to choose which one to use for a given problem.
We say that an ODE problem is
3. Non-stiff if all components of the equation evolve on the same timescale. This occurs (roughly)
if the Jacobian matrix ∂f /∂u has all its eigenvalues of similar size.
4. Stiff if different components of the system evolve on different time scales. These are very common
in chemical reactions with reactions going on at different rates and in ODEs resulting from spatial
discretisations of PDEs. They also occur in PDEs where different modes (Fourier modes) evolve
at very different rates. Stiff problems are much harder to solve numerically than non-stiff ones.
Hamiltonian equations arise very commonly in rigid body mechanics, celestial mechanics (astron-
omy) and molecular dynamics. To solve a Hamiltonian equation accurately over long time periods
we must use special numerical methods, such as symplectic or reversible methods.
In all cases, the IVP solver attempts to use the best method to keep the (estimated) error within a prescribed tolerance. Whilst it is essential to have some form of error control, no such method is infallible. The numerical solution of IVPs is well covered in many texts, for example A. Iserles, “A First Course in the Numerical Analysis of Differential Equations”.
6.2 Quadrature
Suppose that f(t) is an arbitrary function; how accurately can we find

u = ∫_a^b f(t) dt ?    (6.2)
MATLAB determines this integral approximately by using a composite Simpson’s rule. The idea behind
this is as follows:
Suppose we take an interval of length 2h; without loss of generality this is the interval [0, 2h]. This gives

∫_0^{2h} f dx ≈ (h/3) [f(0) + 4f(h) + f(2h)] ≡ S_h.
This approximation is unreasonably accurate. It can be shown (see Fröberg, “Numerical Analysis”) that

|S_h − ∫_0^{2h} f(t) dt| = (h^5/90) max |f^{(iv)}(ξ)|,  where 0 < ξ < 2h.    (6.3)
The local error is proportional to h^5 and to f^{(iv)}. The traditional use of Simpson’s rule to evaluate (6.2) over [a, b] breaks this interval into sub-intervals of length 2h (with h constant between a and b) and adds up the results, to give a total error estimated by

(h^4/90) |b − a| max |f^{(iv)}|.
MATLAB is more intelligent than this: it uses an adaptive version of Simpson’s rule. In this procedure h is chosen carefully over each sub-interval to keep the error estimated by (6.3) less than a user-specified tolerance. In particular h is small where f varies rapidly and f^{(iv)} is large. The integrals over each sub-interval are then combined to give the total. This method is especially effective if f has a singularity.
The procedure uses the instruction

>> i = quad(@fun, a, b, tol)

where fun is the function to be integrated and tol is the tolerance. There is another MATLAB routine, quadl, which approximates f by a higher order polynomial. This is more accurate if f is smooth but unreliable if f has singularities.
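For example, a quick comparison of the two routines on a smooth integrand (the integrand and tolerance here are illustrative):

f = @(t) exp(-t.^2);
i1 = quad(f, 0, 1, 1e-8)     % adaptive Simpson
i2 = quadl(f, 0, 1, 1e-8)    % adaptive Lobatto: higher order, good for smooth f
% both should return 0.746824..., the value of the integral of exp(-t^2)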
6.3.1 The Forward Euler method

The simplest method of solving the IVP (6.1) is the Forward Euler method. Taking a fixed step size h and setting U^n ≈ u((n − 1)h), it works as follows:

• We set U^1 = u_0 ;
• for n = 1, 2, 3, . . . we compute

U^{n+1} = U^n + h f((n − 1)h, U^n).    (6.4)
This method is easy to use and each step is fast, as no equations need to be solved and there is only one function evaluation per step. The Forward Euler method is still used when f is hard to evaluate and there are a large number of simultaneous equations. Problems of this kind arise in weather forecasting. The main problem with this method is that there are often severe restrictions on the size of h.
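A minimal sketch of the method in MATLAB (the right-hand side, step size and initial data are illustrative; they match Example 3 below):

f = @(t,u) -2*t*u.^2;            % the ODE du/dt = -2 t u^2
h = 0.01; N = round(1/h) + 1;    % U(N) will approximate u(1)
t = (0:N-1)'*h;
U = zeros(N,1);
U(1) = 1;                        % U^1 = u_0
for n = 1:N-1
    U(n+1) = U(n) + h*f(t(n), U(n));   % one Forward Euler step
end
% U(N) approximates u(1) = 1/2 with O(h) error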
Local Error.
At each stage of the method a small error is made. These errors accumulate over successive intervals to give an overall error. If the exact solution u(t) is substituted into the equation (6.4), there is a mismatch E between the two sides, called the local truncation error or LTE. This is a good estimate for the error made by the method at each step. It is estimated by

|E| = |u(nh) − u((n − 1)h) − h f((n − 1)h, u((n − 1)h))| = (h²/2) |u″(ξ)|,    (6.5)

where (n − 1)h < ξ < nh.
As in quadrature, the local truncation error is proportional to a power of h (here h²) and to a higher derivative of u (here u″). So, if u″ is large (rapid change), we must take h small to give a small error. The smaller the value of h, the more accurate the answer will be. (We will see later that h is also restricted by stability considerations.)
Global Error
Each time the method is applied an error is made, and the errors accumulate over all the calculations. If we want to approximate u(T) then we need to make T/h + 1 ≡ N calculations, so that U^N ≈ u(T). We call ε the global error, where

ε = |U^N − u(T)|,   N = T/h + 1.

This can be crudely estimated by

ε ≈ (N h²/2) max |u″| ≈ (T h/2) max |u″|.

We note that the overall error is proportional to h and to max |u″|. We say that this is an O(h) or a first order method.
The estimate we have obtained implies that the global error grows in proportion to both h and T. In fact, this is rarely observed. In the worst case, for some problems the error grows in proportion to e^T; for periodic problems it can grow like T²; and for problems with a lot of symmetry the errors can accumulate much more slowly, so that the overall error may not grow at all with T.
We start by looking at the worst case analysis.
We say that f is globally Lipschitz if there is a constant L such that

|f(t, u) − f(t, v)| ≤ L |u − v|  for all t, u and v.
It can then be shown (see A. Iserles, “A First Course in the Numerical Analysis of Differential Equations”) that if |u″| ≤ M then the global error has the upper bound

ε ≤ (hM/2L) (e^{LT} − 1).    (6.7)
If T is fixed and h → 0 then the upper bound given in (6.7) shows that ε is proportional to h in this
limit. In practice the Forward Euler method is rarely used in this way. We often take h fixed and let
T → ∞. This is very dangerous as errors can accumulate rapidly making the method unusable. In
this case we need a more careful control on the global growth of errors. This is the philosophy behind
geometric integration based methods.
Variable time-stepping
As in the quadrature process, Euler’s method can be used with a variable step size. This allows some control over the local error. To do this we take a sequence of (small) time steps h_k ≡ t_{k+1} − t_k, so that the solution is approximated at times t_k with U^k ≈ u(t_k). If we have an estimate for u″(ξ), ξ ∈ [t_k, t_{k+1}], then h_k is chosen so that at each step

E = (h_k² |u″|)/2 < TOL,
where TOL is specified by the user. Of course, without knowing u(t) we do not know u″ in general. We can either estimate an upper bound a priori, before the calculation, or we can try to deduce its value a posteriori from the numerical calculation. This latter procedure often uses an error estimate called the Milne device. (See Iserles.)
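A sketch of how such a posteriori step-size control might look for Euler’s method; here u″ is estimated from the change in f along a trial step (this simple estimate, and all names, are illustrative, and this is not the Milne device itself):

f = @(t,u) -2*t*u.^2;
TOL = 1e-4; t = 0; U = 1; h = 0.1;
while t < 1
    upp = (f(t+h, U+h*f(t,U)) - f(t,U))/h;          % estimate of u'' = df/dt
    h = min(sqrt(2*TOL/max(abs(upp),eps)), 1 - t);  % enforce h^2 |u''|/2 < TOL
    U = U + h*f(t,U);                               % one Euler step
    t = t + h;
end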
6.3.2 The Runge-Kutta method
A much more locally accurate method is the Runge–Kutta method. Its most famous form is called the explicit fourth order Runge–Kutta or RK4 method. Suppose that the ODE is du/dt = f(t, u). Then, if we know U^n and set t = (n − 1)h, the value of U^{n+1} is given by the sequence of operations
k_1 = h f(t, U^n)
k_2 = h f(t + h/2, U^n + k_1/2)
k_3 = h f(t + h/2, U^n + k_2/2)
k_4 = h f(t + h, U^n + k_3)

U^{n+1} = U^n + (k_1 + 2k_2 + 2k_3 + k_4)/6.
It can be shown that there is a value C, which depends on f in a complex way, such that the local truncation error E = |u(t + h) − U^{n+1}| is bounded by

E ≤ C h^5.

Over a large number of steps these errors accumulate as before to give a global error ε of the form

ε ∼ C (e^{LT} − 1) h^4.

The error is proportional to h^4, hence the name: an order 4 method. The error for a given h is much smaller than for the Forward Euler method. This method is VERY widely favoured as
1. It is easy to use and no equations need to be solved at each stage.
3. The method cannot be used for stiff problems unless h is very small.
I like to think of the RK4 method as being like a Ford Fiesta. It is easy to use, everyone uses it
and it is good value for money. If you are going to the shops (i.e. solving a relatively straightforward
problem) it is the method to use. However, it won’t handle rough country nor would you want it for a
very long drive.
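A direct transcription of the RK4 formulas into MATLAB might look as follows (the right-hand side, step size and initial data are illustrative):

f = @(t,u) -2*t*u.^2;
h = 0.1; N = 11;
t = (0:N-1)'*h; U = zeros(N,1); U(1) = 1;
for n = 1:N-1
    k1 = h*f(t(n),       U(n));
    k2 = h*f(t(n) + h/2, U(n) + k1/2);
    k3 = h*f(t(n) + h/2, U(n) + k2/2);
    k4 = h*f(t(n) + h,   U(n) + k3);
    U(n+1) = U(n) + (k1 + 2*k2 + 2*k3 + k4)/6;
end
% U(N) approximates u(1) = 1/2 with O(h^4) global error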
To improve the accuracy, the RK4 method is often used together with another higher order method to estimate the local error and choose h accordingly. Suppose that the value of U^{n+1} is given by an RK4 method. We could also use a different, higher order method to calculate a separate value Û^{n+1}. Both U^{n+1} and Û^{n+1} approximate u(t + h), so if h is small then

|C| h^5 ≈ ‖U^{n+1} − Û^{n+1}‖,

and the difference between the two calculations gives an estimate for the local error of the RK4 method.
We can now use this estimate as follows. Given U^n and h:

1. Using U^n and step size h, calculate U^{n+1}, Û^{n+1} and E_{n+1} = ‖U^{n+1} − Û^{n+1}‖.
2. If TOL/32 < E_{n+1} < TOL then accept the step.
3. If E_{n+1} < TOL/32 then accept the step but double h for the next step (the factor 32 = 2^5 reflects the h^5 local error).
4. If E_{n+1} > TOL then halve h and repeat the step.
This method keeps the local errors below the specified tolerance and also (due to 3) makes an efficient choice of step size. However, it does not control the growth of errors. MATLAB uses such a Runge–Kutta 4(5) pair, of the form developed by Dormand & Prince, in the ode45 routine. There is a similar (cheaper but less accurate) Runge–Kutta 2(3) pair implemented in the routine ode23. In these routines you can set both the Absolute Tolerance (as above) and the Relative Tolerance TOL/‖u‖. You can use the ode45 routine as follows:

>> [t, U] = ode45(@fun, trange, u0, options)
Here t is the vector of times at which the approximate solution vector U is given. Because the step-size
h is chosen at each stage of the algorithm these times will not (necessarily) be equally spaced. The
function f (t, u) is specified by ‘fun’ and u0 is the vector of initial conditions. The time interval for the
calculation is given by trange. Setting
trange = [a b]
means that the ode45 routine will integrate the ODE between a and b, giving its output at the (variable)
time steps it computes. Alternatively
trange = [a: c: b]
leads to output at the points a, a + c, a + 2c, etc. The options argument is optional, but it allows control over the operation of the ode45 routine. In particular you can set the tolerances and/or request statistics on the solution. The options are set by using the odeset routine. For example, if you want an absolute tolerance of 10^{−12} you type

>> options = odeset('AbsTol', 1e-12)
By default the absolute tolerance is 10^{−6} and the relative tolerance is 10^{−3}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% %
% Example 3: Use of ode45 %
% %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% %
% Consider the ODE du/dt = -2t u^2, u(0) = 1 %
% ========================== %
% %
% This has the solution u(t) = 1/(1 + t^2) %
% ================== %
% %
% So that u(1) = 1/2 %
% %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% %
% We can solve this using the ode45 method with %
% tolerances 1e-12,to find a numerical approximation for u(1)%
% %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
options = odeset('AbsTol',1e-12,'RelTol',1e-12);
trange = [0:0.05:1];
unit = [1];
frk = @(t,u) -2*t*u.^2;           % the right-hand side f(t,u) = -2 t u^2
[t,U] = ode45(frk,trange,unit,options);
s = size(U);
U(s(1),1)                         % the approximation to u(1) = 1/2
6.3.3 Hamiltonian problems

An important class of problems are the Hamiltonian systems

q̇ = ∂H/∂p,   ṗ = −∂H/∂q.

For example, the equation ü + f(u) = 0 is Hamiltonian. Multiplying the differential equation by u̇ and integrating with respect to time we find that

u̇²/2 + F(u) = H,  a constant, where F′(u) = f(u).
More generally, in an autonomous Hamiltonian system, the Hamiltonian H is a constant for all times.
This is an example of a conservation law. Many physical systems conserve particular quantities over
all times. An excellent example of this is the solar system considered in isolation with the rest of the
universe. This obeys a complicated set of differential equations with complex (and indeed chaotic)
solutions. However the total energy, angular momentum and linear momentum are conserved for all
time.
To retain the correct dynamics of such a system in a numerical approximation it is essential that the
numerical approximation either exactly conserves the same invariants or (more usually) the approximate
equivalent of any conserved quantity varies from a constant by a small but bounded amount for all times.
If this occurs then the approximate solution is likely to be much closer to the true solution for all times.
The Forward Euler method, RK4 and ode45 are not good at preserving such conserved quantities. For
example if we consider the system
ü + f(u) = 0,   u̇²/2 + F(u) = H.
An application of the Forward Euler method with U^n ≈ u((n − 1)h), V^n ≈ u̇((n − 1)h) gives

U^{n+1} = U^n + h V^n,   V^{n+1} = V^n − h f(U^n),

so that

H_{n+1} = (V^{n+1})²/2 + F(U^{n+1})
        = [V^n − h f(U^n)]²/2 + F(U^n + h V^n)
        = H_n + (h²/2) [f(U^n)² + (V^n)² ∂f/∂u] + O(h³).

Thus H changes by

H_{n+1} − H_n = (h²/2) [f(U^n)² + (V^n)² ∂f/∂u] + O(h³)
at each iteration. If ∂f/∂u > 0 then H is not conserved but increases at each iteration as the errors accumulate, so that the global error in H grows like nh². In the RK4 method we have the better result that

H_{n+1} − H_n = h^5 C_n.

Again, in general the values of C_n are positive for many problems and the errors in H accumulate over a large number of iterations, with global errors growing like nh^5. A widely used method which avoids many of these problems and has excellent conservation properties is the Störmer–Verlet (SV) method. This is the method of choice for simulations of celestial and molecular dynamics. Not only is it (much) better than RK4 for long-time integrations, it is also much cheaper and easier to code up, as it only requires one function evaluation per time step. It is also explicit, has global errors O(h²) and is symmetric. The disadvantage of the SV method is that it can only be used for a certain class of problems (those with a separable Hamiltonian) and it is not a black-box code, i.e. it requires some thought to use. However, this is precisely what mathematicians are paid to do!
For the problem ü + f(u) = 0 the SV method takes the following form:

U* = U^n + (h/2) V^n
V^{n+1} = V^n − h f(U*)
U^{n+1} = U* + (h/2) V^{n+1}
This appears to be worse than RK4. However, unlike RK4, the errors DO NOT accumulate and tend to cancel out. It can be shown that if H is the exact Hamiltonian then

|H_n − H| ≤ D h²  for all n,

where D does not depend on n. So, although H is not exactly conserved, the method stays close to it for all time.
We now apply this to an example with f(u) = u³.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% %
% Example 4: Stormer-Verlet method for %
% %
% u’’ + u^3 = 0, u(0) = 1, u’(0) = 0 %
% %
% In the exact eqn H is constant where %
% %
% H = (u’)^2/2 + u^4/4 %
% %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
t(1) = 0;
U(1) = 1;
V(1) = 0;
h = 0.1;
H(1) = 1/4;
for i=2:100
    t(i) = t(i-1) + h;
    Ustar = U(i-1) + 0.5*h*V(i-1);   % half step: U* = U + (h/2) V
    V(i) = V(i-1) - h*Ustar^3;       % full step: V = V - h f(U*), f(u) = u^3
    U(i) = Ustar + 0.5*h*V(i);       % half step: U = U* + (h/2) V
    H(i) = V(i)^2/2 + U(i)^4/4;
end
plot(t,H)
[Figures: the computed phase plane, with u̇ plotted against u, giving a closed curve through (1, 0); and H(t) plotted against t for 0 ≤ t ≤ 10, oscillating between 0.25 and about 0.252 but remaining bounded for all time.]
6.4 Stiff Differential Equations
6.4.1 Definition
In a stiff differential equation the solution components evolve on very different timescales. This causes a problem, as a numerical method such as ode45 may choose a step size to resolve the most rapidly evolving component, even if its contribution to the solution is negligible. This leads to very small step sizes, highly inefficient computations and long waits for the user! The reason for this is an instability in the method, whereby a small error may grow rapidly with each step.

To see this, consider the linear system u̇ = Au. A perturbation e to a solution also satisfies ė = Ae, so that

e(t) = e^{At} e_0.

If A is diagonalisable, A = UΛU^{−1}, then

e(t) = U e^{Λt} U^{−1} e_0,

where Λ is the diagonal matrix of the eigenvalues λ_i of A, with corresponding eigenvectors φ_i satisfying Aφ_i = λ_i φ_i.
DEFINITION
The system is stiff if the matrix A has eigenvalues λ_i whose absolute values differ greatly in size.

Typically, in an application, a ratio of the largest over the smallest absolute value of the eigenvalues of over 10 is considered to lead to a stiff system.
In a physical system the components for which |λ_j| is large, and the real part of λ_j is negative, decay rapidly and are not seen in the solution apart from some initial transients. It is therefore somewhat paradoxical that it is precisely these components which lead to instabilities in the numerical scheme. This is another way of saying that the solution has components that evolve at very different rates.
Consider now what happens when we apply the Forward Euler method

U^{n+1} = U^n + h f(U^n)

to this linear system, so that

U^{n+1} = U^n + h A U^n = (I + hA) U^n.

Therefore

U^n = (I + hA)^{n−1} U^1.

But I + hA has the same eigenvectors φ_j as A and has eigenvalues

1 + h λ_j.

The component of U^n (and in particular of any error) in the direction φ_j therefore grows or decays as

(1 + h λ_j)^{n−1},

or more precisely as |1 + h λ_j|^{n−1}. In particular, the errors caused by truncation error or by rounding error only decay if |1 + h λ_j| < 1 for all λ_j.
We now find an extraordinary paradox. Suppose that λ_j is real and that λ_j = −µ with µ ≫ 1. Then

e^{λ_j t} = e^{−µt} ≪ 1  if t ≫ 1.

However,

|1 + h λ_j|^n = |1 − µh|^n ≫ 1  if µh > 2 and n ≫ 1.

So the most rapidly decaying components of the perturbation to the continuous solution are the components of U^n which grow most rapidly.
We say the Forward Euler method with step size h is stable if, given a matrix A whose eigenvalues λ_j all have negative real part, we have |1 + h λ_j| < 1 for all λ_j. In particular, if the λ_j are all real then the method is stable only if

h < 2 / max |λ_j|.

This is the restriction on h which ensures that solution errors do not grow. If h is larger than this bound then errors grow and the method is unstable.
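A small experiment illustrating this bound (the equation and step sizes are illustrative): for u̇ = −50u the bound is h < 2/50 = 0.04.

lambda = -50;
for h = [0.035 0.045]        % one step size below the bound, one above
    U = 1;
    for n = 1:200
        U = U + h*lambda*U;  % Forward Euler step for u' = lambda u
    end
    fprintf('h = %5.3f gives U = %g after 200 steps\n', h, U)
end
% h = 0.035 gives a negligible value (stable); h = 0.045 blows up (unstable).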
Here we see the problem. A component in the direction of φj with large |λj | and with real part λj < 0
dominates the choice of step size even though this component is very small.
• The LOCAL (TRUNCATION) ERROR that is made at each stage of the calculation is given by

h² |u″| / 2.

So the error made at each stage depends on the SOLUTION. If all of the eigenvalues of A have negative real part then this error is ultimately dominated by the component of the solution that decays most slowly, which is in turn determined by the eigenvalue of A with the largest real part. So it is the eigenvalue with real part closest to zero, and typically this is the eigenvalue with the smallest modulus, that controls the accuracy.
• In contrast the GROWTH of the error as the iteration proceeds depends on the eigenvalue of A
with the largest modulus, even though this eigenvalue may have a large negative real part and
thus not contribute to the solution in any way.
There are therefore two restrictions on h: it must be small both for accuracy at each stage and for stability, to stop the errors growing. Stiffness arises when the restriction on h for stability is much more severe than the restriction for accuracy.
This introduces us to the ideas of stability and instability. It is surprisingly hard to give a precise definition of what we mean by instability in a numerical method which accounts for all cases. Later on we will give a precise definition which covers certain cases, but for the present we will use the following.
INFORMAL DEFINITION OF INSTABILITY
A numerical method to solve a differential equation is unstable if the errors it makes (or indeed its
solution) grow more rapidly than the underlying solution. If the errors decay then it is stable.
Exercise. Check that the definition of stability for the Forward Euler method is consistent with this informal definition.
Returning to the Forward Euler example: if h > 0 and λ = p + iq then |1 + hλ| < 1 implies that

2hp + h²p² + h²q² < 0,

i.e.

2p + hp² + hq² < 0,

so that the method is stable if (p, q) lies in the circle of radius 1/h, centred on (−1/h, 0), shown in Figure 6.5.

[Figure: the (p, q) plane; the disc of radius 1/h centred on (−1/h, 0) is marked STABLE, the region outside it UNSTABLE.]

In this figure
the shaded region shows the values of p and q for which the numerical method is stable. Recall that
the original differential equation is stable provided that p lies in the half-plane p < 0. The shaded
region only occupies a fraction of this half-plane, although the size of the shaded region increases as
h → 0. Thus, for a fixed value of h the numerical method will only have errors which do not grow if
the eigenvalues of A are severely constrained. Unfortunately this is often not the case, particularly in
discretisations of partial differential equations.
An alternative is the Backward Euler method, in which we take

U^{n+1} = U^n + h f(t_{n+1}, U^{n+1}).

Applied to the linear system u̇ = Au this gives

U^{n+1} = (I − hA)^{−1} U^n.

Now, if the eigenvalues of A are λ_j, those of (I − hA)^{−1} are (1 − hλ_j)^{−1}, with the same eigenvectors. If every λ_j has negative real part then |1 − hλ_j| > 1 for every h > 0, so all error components decay: there is no stability restriction on h.
The Backward Euler method is very reliable and has other nice properties (a maximum principle) which
make it especially suitable for solving PDEs. The main disadvantage of using it is that we must solve a nonlinear equation to find U^{n+1}. This is an example of the basic principle that there is no such thing as a free lunch! The penalty for a stable method is the need to do more work. (Note, however, that the Störmer–Verlet method is a good approximation to a free lunch.) Finding U^{n+1} when the system has a high dimension (e.g. 10^4) is a considerable task, especially as this calculation must be done quickly and often. Usually we use an iterative method such as the Newton–Raphson or Broyden method. Fortunately a good initial guess is available, namely a value Û^{n+1} obtained by taking one step of an explicit method (e.g. Forward Euler) applied to U^n. This procedure is called a predictor-corrector method, in which the explicit method predicts the value Û^{n+1} which is then corrected (usually by an iterative method) to give U^{n+1}.
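A sketch of one Backward Euler step for a scalar ODE, with a Forward Euler predictor and a Newton–Raphson corrector (the right-hand side and all names are illustrative):

f  = @(u) -u.^3;                  % example right-hand side
df = @(u) -3*u.^2;                % its derivative, needed by Newton
h = 0.1; Un = 1;
U = Un + h*f(Un);                 % predictor: one Forward Euler step
for it = 1:5                      % corrector: Newton on U - Un - h f(U) = 0
    r = U - Un - h*f(U);
    U = U - r/(1 - h*df(U));
end
Unp1 = U;                         % the Backward Euler value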
[Figure: the stability region of the Backward Euler method in the (p, q) plane. The method is stable everywhere outside the disc of radius 1/h centred on (1/h, 0), which meets the p axis at 0 and 2/h; in particular the whole half-plane p < 0 is stable.]
Next consider the (implicit) Trapezium Rule

U^{n+1} = U^n + (h/2) [f(U^n) + f(U^{n+1})].

Applied to u̇ = Au, the component of the error in the direction φ_j is multiplied at each step by the factor

(1 + hλ_j/2) / (1 − hλ_j/2).

If we take λ = p + iq then

θ ≡ |1 + hλ/2|² / |1 − hλ/2|² = [(1 + ph/2)² + q²h²/4] / [(1 − ph/2)² + q²h²/4].

Thus

θ < 1  if  (1 + ph/2)² + q²h²/4 < (1 − ph/2)² + q²h²/4,  i.e. if p < 0.
The stability region of the numerical method is thus the half-plane p < 0, illustrated in Figure 6.7, which is identical to the region of stability of the underlying differential equation. As a consequence the dynamics of the solution of the trapezium rule exactly mirrors the true dynamics. This is an excellent state of affairs.
The local truncation error (LTE) of the Trapezium Rule is given by

LTE = (h³ |u‴|) / 12.

If h is constant the overall error is then proportional to h². To estimate the step-size we keep LTE < TOL, much as before.
The Trapezium Rule, together with a third order method to control the local error, is implemented in the MATLAB routine ode23t. The (second order implicit) Trapezium Rule is a good one to use if high accuracy is not essential: it is very reliable and relatively cheap, but the overall error is quite high compared to (say) ode45.
Figure 6.5: Stability Region for the Trapezium rule and the Implicit Mid-Point rule.
The Implicit Mid-point rule is the simplest example of a sequence of implicit Runge–Kutta methods called Gauss–Legendre methods. If you want to solve an ODE very accurately for long times with excellent stability, but regardless of cost, then these are the methods to use. Gauss–Legendre methods are the Lamborghinis of the numerical ODE world. They are expensive and hard to drive, but they have style! They are implemented in Fortran in the excellent AUTO code.
6.4.6 Multi-step methods
These are the most widely used methods for stiff problems and include Adams and BDF methods and
the Trapezium rule. There is a vast literature on them, see Iserles. Multi-step methods are very flexible
to use and are relatively easy to analyse. They take the form
Σ_{l=0}^{k} a_l U^{n−l} = h Σ_{l=0}^{k} b_l f_{n−l}.
In these methods you use information from previous time steps and (in an explicit method) only one new function evaluation to find U^n. Thus they are cheaper than RK4 and potentially more accurate. Some properties of these methods are as follows (a concrete example is sketched after this list).
• The method has order p if the truncation error at each stage is given by C h^{p+1} u^{(p+1)} and the overall method has error of order p, so that the global error is proportional to h^p.
• These methods are implicit if b_0 ≠ 0 and explicit if b_0 = 0. Explicit methods give U^n directly, whereas implicit methods need an equation to be solved.
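For instance, a sketch of the two-step Adams–Bashforth method (explicit, order 2, with a_0 = 1, a_1 = −1, b_0 = 0, b_1 = 3/2, b_2 = −1/2), applied to the illustrative equation u̇ = −u:

f = @(t,u) -u;
h = 0.1; N = 101;
t = (0:N-1)'*h; U = zeros(N,1);
U(1) = 1;
U(2) = U(1) + h*f(t(1),U(1));   % start-up step (here by Forward Euler)
for n = 3:N
    U(n) = U(n-1) + h*(3*f(t(n-1),U(n-1)) - f(t(n-2),U(n-2)))/2;
end
% U(N) approximates u(10) = exp(-10) with O(h^2) global error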
DEFINITION
A multi-step method is A-stable if, when used to solve the equation

u′ = Au,

where all eigenvalues of A have negative real part, we have |U^n| → 0 as n → ∞ for all values of h > 0. More precisely, a method is A-stable if the roots z of the polynomial equation

Σ_l a_l z^{−l} − hλ Σ_l b_l z^{−l} = 0

have modulus less than or equal to one whenever the real part of λ is less than or equal to zero.
A-stability is a very desirable property for stiff problems, and the Trapezium Rule has it. However, the Trapezium Rule is almost unique in having this property, as the following theorem (the second Dahlquist barrier) shows:

Theorem. Only implicit methods of order less than or equal to 2 can be A-stable.
Implicit multi-step methods involve solving nonlinear equations. As before, the usual way to do this is a predictor-corrector method: you generate an approximation Û^n for U^n using an explicit predictor method and then correct this to find U^n. An easy corrector is an iterative one. Suppose we set U^n_0 = Û^n, where Û^n is a predicted value obtained using an explicit method. We now perform the following iteration to find a sequence of approximations U^n_r to U^n:

U^n_r = −[ Σ_{l=1}^{k} a_l U^{n−l} − h Σ_{l=0}^{k} b_l f̃_{n−l} ] / a_0,

where

f̃_k = f(t_k, U^k_{r−1}).
As in the Runge–Kutta methods, it is common to use two multi-step methods simultaneously: one to perform the numerical solution, the other to give an estimate of the error. In the celebrated Gear solvers (named after their inventor W. Gear) the method chooses which multi-step method to use at each step from a range of methods of different orders (and stability ranges), based on an estimate of the error from the two methods (the latter is called the Milne device).
MATLAB has an excellent stiff Gear solver given by ode15s . It is used in exactly the same way as
ode45 and is the routine to use for most problems, if you don’t know much about their structure.
An important subset of multi-step methods are the BDF (backward differentiation formula) methods. The BDFn method has order n (global errors proportional to h^n), and BDF1 is just the Backward Euler method. The important Fortran ODE code DDASSL and ode15s are both based on BDF methods and their extensions. The first three BDF methods are given by:

U^n − U^{n−1} = h f_n                                                      (BDF1)
U^n − (4/3) U^{n−1} + (1/3) U^{n−2} = (2/3) h f_n                          (BDF2)
U^n − (18/11) U^{n−1} + (9/11) U^{n−2} − (2/11) U^{n−3} = (6/11) h f_n     (BDF3)
Closely related are the differential-algebraic equations (DAEs), which take the form

ṗ = f(p, q),   0 = g(p, q).

If we need to differentiate the constraint only once with respect to t to obtain a differential equation for q, we call problems of this form index-1 problems. If we have to differentiate twice with respect to t to get a differential equation for q we call this an index-2 problem, and if three differentiations are needed, an index-3 problem. Electronics and chemistry tend to lead to index-1 problems, fluid mechanics to index-2 problems and robotics to index-3 problems. MATLAB can handle index-1 problems but has difficulties with problems of a higher index. Full details on the theory and computation of DAEs are given in the very readable book by U. Ascher and L. Petzold.
It is not at all obvious how to apply an explicit method to solve such equations, especially to ensure
that the constraint g(p, q) = 0 is met at each stage.
However, it is easy to code these up using a BDF method. For example, if we apply BDF2 to the DAE above we get

p_n − (4/3) p_{n−1} + (1/3) p_{n−2} = (2/3) h f(p_n, q_n),
0 = g(p_n, q_n).
As before we have to solve nonlinear equations to find p_n and q_n, but these equations are no worse to solve than before. In DDASSL these equations are solved using quasi-Newton methods, in which conjugate-gradient methods are used to speed up the linear algebra.
The code ode15s can solve DAEs of the form

M u̇ = f(u),

where the mass matrix M may be singular.
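A sketch of how a simple index-1 DAE might be passed to ode15s through a singular mass matrix; the particular DAE, and all names, are illustrative:

M = [1 0; 0 0];                              % singular mass matrix
f = @(t,u) [-u(1) + u(2); u(1) + u(2) - 1];  % u1' = -u1 + u2, 0 = u1 + u2 - 1
options = odeset('Mass', M);
u0 = [0; 1];                                 % consistent with the constraint
[t,U] = ode15s(f, [0 5], u0, options);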
Chapter 7
7.1 Introduction
Two point boundary value problems (2pt BVPs) take the form:

du/dx = f(x, u),  u ∈ IR^n,  a ≤ x ≤ b,

g_1(u(a)) = α,   g_2(u(b)) = β.

Such problems arise in several ways.
1. As one-dimensional elliptic equations, which arise in their own right as descriptions of physical problems: for example, the equation for the deflection of the Euler strut, and the equation for rock deformation (see work by CJB and Professor Giles Hunt).
2. As steady states of parabolic or hyperbolic partial differential equations, for example the limit as t → ∞ of the solutions of

∂u/∂t = ∂u/∂x − f(x, u).

[Often a good way to solve the steady state problem is to convert it to such a time-dependent problem.]

3. As travelling wave solutions of partial differential equations. For example, consider the equation

∂θ/∂t + θ ∂θ/∂z = ε ∂²θ/∂z².
If we look for a travelling wave solution with

θ(z, t) = u(x),  where x = z − ct,

and

θ(−∞, t) = 1,  θ(∞, t) = −1,

then u satisfies the BVP

−c du/dx + u du/dx = ε d²u/dx²,  u(−∞) = 1,  u(∞) = −1,

which is a two-point boundary value problem.
A special (and very hard) case of two-point BVPs arises when a = −∞ and/or b = ∞. Here we look for ground state, homoclinic or heteroclinic solutions. These are very common in PDEs as self-similar solutions, or in quantum mechanics as bound states. We have just seen such a problem in the travelling wave example.
There is a huge difference between BVPs and IVPs. In general an initial value problem (with, say, Lipschitz f) always has a unique solution. However, a BVP may have one, several (sometimes infinitely many) or no solutions.
The theory of such problems is very subtle. An excellent description of it, and of numerical methods, is given in the book on 2pt BVPs by Ascher, Mattheij and Russell. There are many methods for solving BVPs. These include shooting methods, finite difference methods, finite element methods, spectral methods and collocation methods. The latter are used in the MATLAB code bvp4c and in Fortran codes such as COLSYS, MOVCOL and AUTO. They are especially suitable for use with adaptive and non-uniform meshes. In this course we will look at shooting methods and finite difference methods.
7.2 Shooting methods

The idea behind shooting methods is simple. Let y be a solution of the initial value problem

dy/dx = f(x, y),  y(a) = z ∈ IR^n.    (7.1)
Suppose we take general initial conditions y(a) = z. The initial value problem (7.1) can always be
solved for such general values so that y(b) is then a function F(z) of z. If we can find a suitable vector
z so that
g1 (z) = α and also g2 (F(z)) = β
then we will have found a solution of the corresponding boundary value problem by simply setting
u(x) ≡ y(x).
The procedure works as follows
1. At the point x = a we have g_1(z) = α. As g_1 ∈ IR^m, this condition fixes m components of the vector z, say z_1, . . . , z_m. The other n − m components (z_{m+1}, . . . , z_n) can be chosen freely.
2. Next use an initial value solver such as ode15s to calculate y(b) ≡ F(z).
3. To satisfy the boundary conditions we must have g2 (F(z)) = β. However, as g2 ∈ IRn−m we have
n − m (usually nonlinear) equations to be satisfied for the n − m unknowns zm+1 , . . . , zn .
These nonlinear equations can be solved using a nonlinear solver such as the Newton-Raphson
algorithm or a MATLAB routine such as fzero or fminsearch.
Shooting methods perform well for problems without boundary or internal layers. For those with
boundary layers the effects of exponentially growing terms make these methods hopelessly ill conditioned
and unusable. There are extensions of shooting methods called multi-shooting methods which can be
used for problems with boundary layers. However these are very problem dependent and are not easy
to use.
Example 1
The easiest example of an application of these methods is to linear boundary value problems of the form

a(x) d²u/dx² + b(x) du/dx + c(x) u = d(x)

with the boundary conditions u(0) = 0, u(1) = 0. We consider the initial value problem

a y″ + b y′ + c y = d    (7.2)

with y(0) = z_1 and y′(0) = z_2. The first boundary condition on u forces z_1 = 0; however, z_2 is arbitrary at this stage. Because of the linearity of (7.2) it follows that there are constants p and q so that

y(1) = p + q z_2.

We can find p and q from two integrations of (7.2) (say with z_2 = 0 and z_2 = 1), and the second boundary condition u(1) = 0 is then satisfied by taking z_2 = −p/q, as in the sketch below.
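A sketch of this in MATLAB (the coefficient functions are illustrative; here a = b = c = d = 1, i.e. y″ + y′ + y = 1):

odefun = @(x,y) [y(2); 1 - y(2) - y(1)];   % first order system for (y, y')
[~, y0] = ode45(odefun, [0 1], [0; 0]);    % integrate with z2 = 0
[~, y1] = ode45(odefun, [0 1], [0; 1]);    % integrate with z2 = 1
p = y0(end,1);                             % since y(1) = p + q z2
q = y1(end,1) - p;
z2 = -p/q;                                 % choose z2 so that y(1) = 0
[x, y] = ode45(odefun, [0 1], [0; z2]);    % y(:,1) now solves the BVP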
Example 2
Suppose now that we want to solve the nonlinear boundary value problem

u″ + u² = 1,  u(0) = u(1) = 0.

As before, we solve the initial value problem with y(0) = 0, y′(0) = z_2 and seek a value of z_2 for which F(z_2) ≡ y(1) = 0.

[Figure: F(z_2) plotted against z_2 for 0 ≤ z_2 ≤ 50, with a root at z_2 = 33.4618.]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% %
% Shooting code %
% %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function F=funn(x)
[t,u] = ode15s(@u2,[0 1],[0; x]);   % integrate the IVP with u(0) = 0, u'(0) = x
s = size(t);
ss = s(1);
F = u(ss,1);                        % return the value y(1)
--------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% %
% u’’ + u^2 = 1 %
% %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function f=u2(t,u)
f=zeros(2,1);
f(1) = u(2);        % u1' = u2
f(2) = 1-u(1)^2;    % u2' = 1 - u1^2, i.e. u'' + u^2 = 1
--------------------------------------------------------
One way to find z_2 is to use the MATLAB command fzero, given an initial guess of z_2 = 30. Notice the way that this code uses the function funn computed using ode15s.

>> z2 = fzero('funn',30)

z2 =
    33.4618
[Figure: the resulting solution u, and its derivative du/dx, plotted against x on [0, 1].]
An alternative method, which exploits the structure of the problem, is to use a Newton–Raphson method. As F(z_2) = y(1) it follows that

dF/dz_2 = dy(1)/dz_2 = φ(1),

where the function φ(x) satisfies the variational equation

φ″ + 2yφ = 0,  φ(0) = 0,  φ′(0) = 1,

which we can integrate, together with the original equation, using ode15s. We then use a Newton–Raphson iteration to improve the guess via the equation

z_2 → z_2 − F(z_2)/φ(1).
7.3 Finite difference methods

Suppose instead that we place a uniform mesh x_i = a + (i − 1)h, i = 1, . . . , N, h = (b − a)/(N − 1), on [a, b], and let U_i ≈ u(x_i). We can then use Taylor expansion to approximate u″ by

u″(x_i) ≈ (U_{i+1} − 2U_i + U_{i−1}) / h².    (7.5)

If A is the N × N tri-diagonal matrix with interior rows (1, −2, 1)/h², and with first and last rows set to impose U_1 = α and U_N = β, then

A U = [α, f_2, . . . , f_{N−1}, β]^T    (7.8)

is a discretisation of the second order BVP with Dirichlet boundary conditions which is given by

u″ = f(x),  u(a) = α,  u(b) = β.

Replacing U by u in the expression (7.8) implies that we would make an error of order h² as h → 0. Note that the structure of A forces U_1 = α and U_N = β. The vector U can then be found via the operation

U = A^{−1} [α, f_2, . . . , f_{N−1}, β]^T.
Provided that the problem is not too irregular it can then be shown that in the limit of N → ∞ (h → 0)
we have
Ui = u(xi ) + O(h2 ) as h → 0.
The solution of the BVP thus becomes (under discretisation) the problem of solving the linear system
(7.8).
If f = [α, f_2, . . . , f_{N−1}, β]^T then this can be done in MATLAB via the command A\f.
Whilst this is simple to use, it is not especially efficient, as the backslash operator does not in general exploit the special tri-diagonal structure of the matrix A. A much more efficient algorithm, of complexity O(N), is the Thomas algorithm, which exploits this structure and is based on the LU decomposition. This algorithm is described in Assignment 4. (Note: use of the MATLAB sparse matrix routines will speed things up here.)
Efficiency is important, as the relatively large error of O(N^{−2}) in this method means that we have to take a large value of N to get a reasonably small error. More careful discretisations lead to smaller errors at the expense of more work. A sketch of the whole procedure is given below.
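A sketch of the discretisation (7.8) in MATLAB, using a sparse tri-diagonal matrix (the right-hand side and boundary values are illustrative):

N = 101; h = 1/(N-1);
x = (0:h:1)';
alpha = 0; beta = 0;
e = ones(N,1);
A = spdiags([e -2*e e], -1:1, N, N)/h^2;   % interior rows (1,-2,1)/h^2
A(1,:) = 0; A(1,1) = 1;                    % first row forces U1 = alpha
A(N,:) = 0; A(N,N) = 1;                    % last row forces UN = beta
F = -exp(x); F(1) = alpha; F(N) = beta;    % example: u'' = -exp(x)
U = A\F;                                   % sparse backslash uses the banded structure
plot(x,U)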
Now suppose that the boundary condition at x = a involves the derivative u′(a). Problems such as this arise frequently in models of heat conduction and electrical flow. For example, the boundary condition u′(a) = 0 corresponds to a thermal or electrical insulator. There are many ways to deal with such a derivative boundary condition. In the simplest we approximate

u′(a) ≈ (U_2 − U_1)/h.    (7.9)

The expression (7.9) is only accurate to O(h) as an approximation to u′(a). We can improve this by using the approximation

u′(a) ≈ (4U_2 − U_3 − 3U_1)/2h.
Exercise. Show that this expression is accurate to O(h²).
This leads to a slightly different matrix A. Unfortunately the resulting matrix is no longer tri-
diagonal and we cannot use the Thomas algorithm to invert it.
A special case of Neumann boundary conditions arises when u′(a) = 0. In this case we can introduce a “ghost” point U_0 approximating u(a − h). The boundary condition implies that, to O(h³), we have U_0 = U_2. At the point x = a we then have

u″(a) ≈ (U_0 + U_2 − 2U_1)/h² = (2U_2 − 2U_1)/h².    (7.11)
This approximation to u′′ (a) is correct to O(h2 ). (Can you show this?)
The resulting matrix A is then given by

A = (1/h²) ×
    [ −2   2
       1  −2   1
           1  −2   1
               .    .    . ]

Note that in this case A is tri-diagonal, and we can again solve the linear system by using the Thomas algorithm.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% %
% Code to solve the Neumann two-point boundary value %
% problem: %
% %
% u’’ = -exp(x), u’(0) = 0, u(1) = 0. %
% %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
N = 101;
h = 1/(N-1);
x = [0:h:1];
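The rest of this script was not reproduced in these notes; a sketch of how it might assemble and solve the system, using the ghost-point matrix above (variable names illustrative):

e = ones(N,1);
A = spdiags([e -2*e e], -1:1, N, N)/h^2;
A(1,2) = 2/h^2;                 % ghost point row: u'(0) = 0 gives U0 = U2
A(N,:) = 0; A(N,N) = 1;         % Dirichlet condition u(1) = 0
F = -exp(x'); F(N) = 0;         % right-hand side of u'' = -exp(x)
U = A\F;
plot(x,U)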
[Figure: the computed solution, decreasing from its maximum at x = 0 (where the Neumann condition holds) down to the Dirichlet condition u(1) = 0.]
A second example occurs for BVPs with periodic boundary conditions, for which u(a) = u(b) and u′(a) = u′(b). Periodic boundary conditions often arise when looking at those BVPs that describe the travelling wave solutions of hyperbolic equations. They also arise naturally when solving BVPs on circles or spheres, a natural example of the latter being weather forecasting on the
whole globe. Often BVPs with periodic boundary conditions are solved using spectral methods in which
the solution is expressed as a combination of trigonometric functions. However they can also be solved
by using finite difference methods. Exploiting the periodicity (U_0 = U_N and U_{N+1} = U_1) we have

A = (1/h²) ×
    [ −2   1                 1
       1  −2   1
           .    .    .
                1  −2   1
       1             1  −2 ]
The resulting discretisation of u′′ then has an error of O(h2 ) as before. The matrix A is not tri-diagonal,
but its special periodic structure means that it can be inverted quickly by using the FFT.
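A sketch of this FFT inversion. Note that the periodic matrix A is singular (constant vectors lie in its null space), so the right-hand side must have zero mean and we fix the mean of U; everything here is illustrative:

N = 64; h = 2*pi/N;
x = (0:N-1)'*h;
f = cos(x);                                % a zero-mean right-hand side
k = (0:N-1)';
lambda = (2*cos(2*pi*k/N) - 2)/h^2;        % eigenvalues of the periodic A
fhat = fft(f);
Uhat = zeros(N,1);
Uhat(2:end) = fhat(2:end)./lambda(2:end);  % skip the singular k = 0 mode
U = real(ifft(Uhat));                      % periodic solution of u'' = f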
A harder class of problems are the singularly perturbed convection-diffusion equations, of the form (for example)

ε u″ + u′ = f(x),  with 0 < ε ≪ 1.

These arise in fluid mechanics problems, where ε is a measure of the viscosity of the fluid. The most accurate discretisation of u′ is given by the central difference discretisation

u′ ≈ δU/2h ≡ (U_{i+1} − U_{i−1}) / 2h.
A simple calculation shows that the central difference gives an error O(h²) when approximating u′. However, this discretisation of u′ leads to problems. In particular, A may be nearly singular and ill-conditioned. If ε is zero then A has an eigenvector of the form e = (1, −1, 1, −1, 1, −1, . . .)^T with zero eigenvalue, so that A is singular. (Check this!)
For small ε, A continues to have an eigenvector similar to e with a small eigenvalue. When inverting A the contribution of e to the solution is greatly amplified, and can lead to an oscillatory contribution to the solution. This instability is BAD, but it can easily be detected as it is on the grid scale, i.e. all oscillations in the solution are on the same scale as the mesh.
Oscillations can be avoided by taking h small enough to avoid this instability, so that

h ≤ 2ε.

An alternative is to approximate u′ by the one-sided upwind difference

u′ ≈ (U_{i+1} − U_i)/h,

which is only accurate to O(h). However, the resulting matrix A is much better conditioned, and inverting it does not lead to spurious oscillations in the solution. What this approximation is doing is exploiting a natural flow of information in the system.
This is not always a good solution, as often we can’t tell in advance whether we need to use the upwind difference or the downwind difference (U_i − U_{i−1})/h. However, like a stiff solver, the use of the upwind difference allows us to use a step size governed by accuracy rather than stability. An example of the effect of using these different discretisations is given in Assignment 4.
An important class of nonlinear two-point BVPs (called semilinear problems) take the form

u″ + f(x, u) = 0,  u(a) = α,  u(b) = β,    (7.14)

and we need to develop techniques to solve them. Two examples of such problems are given by the differential equation

u″ + e^{u/(1+u)} = 0,  u(a) = u(b) = 0,

which models combustion in a chemically reacting material, and

u″ + (2/x) u′ + k(x) u^5 = 0,  u′(0) = 0,  u(∞) = 0,  u > 0,

which describes the curvature of space by a spherical body.
Discretising the differential equation (7.14) leads to a set of nonlinear equations of the form

A U + f(U) = 0,    (7.15)

where f(U) is the vector with components f_i ≡ f(U_i).
As with most nonlinear problems, we solve this system by iteration, starting with an initial guess, although, be warned, the system may have one, none or many solutions. Perhaps the most effective such method is the Newton–Raphson algorithm. Suppose that U^{(n)} is an approximate solution of (7.15); we define the residual R^{(n)} by

A U^{(n)} + f(U^{(n)}) = R^{(n)}.    (7.16)

The size of R^{(n)} is a measure of the quality of this solution. Now let J be the Jacobian of f evaluated at U^{(n)} (here a diagonal matrix with entries ∂f/∂u (x_i, U_i^{(n)})), and define the linear operator L by

Lϕ ≡ Aϕ + Jϕ.
The Newton–Raphson iteration updates U^{(n)} to U^{(n+1)} via the iteration

U^{(n+1)} = U^{(n)} − W.

Here the vector W ≡ L^{−1} R^{(n)} can be found by solving the linear system

A W + J W = R^{(n)}.
This algorithm converges rapidly (often in about 5 iterations) if the initial guess U^{(0)} is close to the true solution. Finding such an initial guess can be difficult for a general problem and often requires some a priori knowledge of the solution. A sketch of the whole iteration is given below.
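A sketch of the Newton–Raphson iteration for the combustion problem above, with u(0) = u(1) = 0 (the mesh, initial guess and iteration count are illustrative):

N = 101; h = 1/(N-1);
e = ones(N,1);
A = spdiags([e -2*e e], -1:1, N, N)/h^2;
A(1,:) = 0; A(1,1) = 1;                 % boundary rows enforce
A(N,:) = 0; A(N,N) = 1;                 % u(0) = u(1) = 0
f  = @(u) exp(u./(1+u));                % the combustion nonlinearity
df = @(u) exp(u./(1+u))./(1+u).^2;      % its derivative
U = zeros(N,1);                         % initial guess
for it = 1:10
    F = f(U); F([1 N]) = 0;             % no nonlinear term in boundary rows
    R = A*U + F;                        % the residual (7.16)
    J = spdiags(df(U), 0, N, N);
    J(1,1) = 0; J(N,N) = 0;
    U = U - (A + J)\R;                  % the Newton update
end
plot((0:h:1)', U)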
The Fortran code AUTO uses the Newton–Raphson method, coupled to a path-following (homotopy) method, to find the solution of the nonlinear systems arising from discretisations of nonlinear BVPs obtained by using collocation.