APMA 0160 (A. Yew) Spring 2011
Curve fitting: least squares methods
Curve fitting is a problem that arises very frequently in science and engineering.
Suppose that from some experiment n observations, i.e. values of a dependent variable y measured at
specified values of an independent variable x, have been collected. In other words, we have a set of n
data points
$(x_1, y_1),\ (x_2, y_2),\ (x_3, y_3),\ \dots,\ (x_n, y_n)$
The first step in constructing a mathematical model of the underlying physical process is to plot these data points and postulate a functional form $f(x)$ to describe the general trend in the data.
Some simple functions commonly used to fit data are:
• straight line: $f(x) = ax + b$
• parabola: $f(x) = ax^2 + bx + c$
• polynomial: $f(x) = a_m x^m + a_{m-1} x^{m-1} + \cdots + a_2 x^2 + a_1 x + a_0$ (includes the previous two cases)
• exponential: $f(x) = c\exp(ax)$
• Gaussian, e.g. $f(x) = c\exp(-bx^2)$
• sine or cosine, e.g. $f(x) = a\cos(bx) + c$
The coefficients $a$, $b$, $c$, etc. in the formula for $f(x)$ are parameters that we can adjust.
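For instance, plotting the data before choosing a model form makes the general trend visible. A minimal sketch in Python with matplotlib; the data values below are made up purely for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data values, for illustration only
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([1.1, 1.8, 3.2, 4.1, 5.2, 5.8, 7.1])

plt.plot(x, y, 'o')              # scatter plot of the raw data points
plt.xlabel('x')
plt.ylabel('y')
plt.title('Roughly linear trend, so postulate f(x) = ax + b')
plt.show()
```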
Of course, since there are inevitable measurement errors in the data, in general we would not expect $f(x)$ to fit the data perfectly. The best we can do is try to choose the parameters of the function so as to minimize the fitting error: the distance between the data values $y_i$ and the $y$-values $f(x_i)$ on the fitted curve. The residuals are defined to be the differences between the observed $y$-values and those given by the fitted curve at the $x$-values where the data was originally collected:
$$r_i = y_i - f(x_i) \qquad \text{for } i = 1, 2, \dots, n.$$
The length-$n$ array of $r_i$ values is called the residual vector $r$, and we aim to minimize the norm of this vector. Recall from last lecture the three vector norms that are most widely used in applications; they give rise to the following three standard error measures:
• Average error: $E_1(f) = \frac{1}{n}\|r\|_1 = \frac{1}{n}\sum_{i=1}^{n} |r_i| = \frac{1}{n}\sum_{i=1}^{n} |y_i - f(x_i)|$
• Root-mean-square error: $E_2(f) = \frac{1}{\sqrt{n}}\|r\|_2 = \left(\frac{1}{n}\sum_{i=1}^{n} |r_i|^2\right)^{1/2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2}$
• Maximum error: $E_\infty(f) = \|r\|_\infty = \max_{i=1,2,\dots,n} |r_i| = \max_{i=1,2,\dots,n} |y_i - f(x_i)|$
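In code, all three error measures are one line each once the residual vector is formed. A minimal sketch in Python/NumPy; the data and the candidate fit $f(x) = 2x + 1$ are assumptions made up for illustration:

```python
import numpy as np

# Made-up data and a candidate fitted function f(x) = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
f = lambda t: 2.0 * t + 1.0

r = y - f(x)                              # residual vector: r_i = y_i - f(x_i)
n = len(r)

E1   = np.linalg.norm(r, 1) / n           # average error
E2   = np.linalg.norm(r, 2) / np.sqrt(n)  # root-mean-square error
Einf = np.linalg.norm(r, np.inf)          # maximum error
print(E1, E2, Einf)
```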
Suppose that the formula for $f$ contains the parameters $\alpha_1, \alpha_2, \dots, \alpha_K$. Then the error quantity $E(f)$ that we wish to minimize will depend on these parameters, so we write it as $E(\alpha_1, \alpha_2, \dots, \alpha_K)$. From multivariable calculus we know that to minimize $E(\alpha_1, \alpha_2, \dots, \alpha_K)$, we should solve the $K$ equations
$$\frac{\partial E}{\partial \alpha_1} = 0, \qquad \frac{\partial E}{\partial \alpha_2} = 0, \qquad \dots, \qquad \frac{\partial E}{\partial \alpha_K} = 0$$
for $\alpha_1, \alpha_2, \dots, \alpha_K$.
If we choose the parameters of f in order to minimize the root-mean-square error, then the process is
called “least squares fitting”.
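When the model is nonlinear in its parameters (the exponential, Gaussian, and sinusoidal forms listed earlier), the least squares minimization is usually carried out numerically. A sketch using scipy.optimize.curve_fit, which minimizes the sum of squared residuals; the data values and starting guess are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

# Model that is nonlinear in the parameter a: f(x) = c*exp(a*x)
def f(x, a, c):
    return c * np.exp(a * x)

# Made-up data values
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.2, 1.9, 2.8, 4.6, 7.3])

popt, pcov = curve_fit(f, x, y, p0=(1.0, 1.0))  # least-squares estimates of (a, c)
print(popt)
```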
Minimizing the root-mean-square error $E_2(f) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2}$ is equivalent to minimizing
$$\|r\|_2^2 = \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2$$
Write
$$E(\alpha_1, \alpha_2, \dots, \alpha_K) = \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2$$
Then
$$\frac{\partial E}{\partial \alpha_k} = \sum_{i=1}^{n} 2\bigl(y_i - f(x_i)\bigr)\cdot\left(-\frac{\partial f(x_i)}{\partial \alpha_k}\right),$$
so we need to solve the equations
$$(\star)\qquad \sum_{i=1}^{n} \bigl(f(x_i) - y_i\bigr)\,\frac{\partial f(x_i)}{\partial \alpha_k} = 0, \qquad k = 1, 2, \dots, K$$
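To see $(\star)$ in action for a concrete model, one can let a computer algebra system carry out the differentiation for the straight-line fit $f(x) = ax + b$ (so $K = 2$). A sketch using SymPy, with made-up data values:

```python
import sympy as sp

a, b = sp.symbols('a b')

# Made-up data values
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

# E(a, b) = sum of squared residuals for f(x) = a*x + b
E = sum((y - (a * x + b))**2 for x, y in zip(xs, ys))

# dE/da = 0 and dE/db = 0 are exactly the equations (*) with K = 2
sol = sp.solve([sp.diff(E, a), sp.diff(E, b)], [a, b])
print(sol)   # least-squares slope a and intercept b
```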
When $f(x)$ is a polynomial of degree $m$ (with $m + 1$ coefficients), $(\star)$ can be written as a linear system $M\alpha = \beta$, where
$$
M = \begin{pmatrix}
\sum_{i=1}^{n} x_i^{2m} & \sum_{i=1}^{n} x_i^{2m-1} & \cdots & \sum_{i=1}^{n} x_i^{m} \\
\sum_{i=1}^{n} x_i^{2m-1} & \sum_{i=1}^{n} x_i^{2m-2} & \cdots & \sum_{i=1}^{n} x_i^{m-1} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{n} x_i^{m} & \sum_{i=1}^{n} x_i^{m-1} & \cdots & n
\end{pmatrix},
\qquad
\beta = \begin{pmatrix}
\sum_{i=1}^{n} x_i^{m} y_i \\
\sum_{i=1}^{n} x_i^{m-1} y_i \\
\vdots \\
\sum_{i=1}^{n} y_i
\end{pmatrix},
$$
and $\alpha = (a_m, a_{m-1}, \dots, a_1, a_0)^{T}$ is the vector of parameters (i.e. the coefficients of the polynomial) that we solve for.
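As a sketch, $M$ and $\beta$ can be assembled entry by entry in NumPy and the solution checked against numpy.polyfit, which performs the same degree-$m$ least squares polynomial fit; the data values and degree below are made up for illustration:

```python
import numpy as np

# Made-up data and polynomial degree
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.0, 0.6, 0.9, 1.8, 3.1])
m = 2                                    # fit a parabola

# M[j, k] = sum_i x_i^(2m - j - k),  beta[j] = sum_i x_i^(m - j) * y_i
M = np.array([[np.sum(x**(2*m - j - k)) for k in range(m + 1)]
              for j in range(m + 1)])
beta = np.array([np.sum(x**(m - j) * y) for j in range(m + 1)])

alpha = np.linalg.solve(M, beta)         # coefficients (a_m, ..., a_1, a_0)
print(alpha)
print(np.polyfit(x, y, m))               # should agree with alpha
```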