Module 4
Curve Fitting
***
The process of constructing an approximate curve $y = f(x)$ that best fits a given discrete set
of points $(x_i, y_i)$, $i = 1, 2, 3, \ldots, n$, is called curve fitting. Curve fitting and interpolation are closely
related procedures. In interpolation, the fitted function must pass through all the given data points,
whereas curve fitting systematically fits a unique curve to the data points, which may or may not
lie on the fitted curve. The difference between interpolation and curve fitting, when attempting to
fit a linear function, is illustrated in the adjoining figure.
Curve fitting is a method of finding a suitable relation or law of the form $y = f(x)$ for a set of
observed values $(x_i, y_i)$, $i = 1, 2, \ldots, n$.
Such a relation connecting $x$ and $y$ is known as an empirical law.
The relation is used to predict or estimate the value of $y$ for a given value of $x$. The
method of least squares is as follows.
Suppose $y = f(x)$ is an approximate relation that fits the given data $(x_i, y_i)$,
$i = 1, 2, \ldots, n$. Then the $y_i$ are called observed values and the $Y_i = f(x_i)$ are called expected values. The
differences $R_i = y_i - Y_i$ are called the residuals or errors of estimate.
The method of least squares provides a relationship y = f (x) such that the sum of the squares
of the residuals is least.
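The quantity being minimised can be made concrete with a short sketch. The helper name `sum_squared_residuals` is hypothetical, and the data and the line $y = 0.7x + 12.3$ are borrowed from Example 4.1.2 of these notes; the second candidate line is an arbitrary comparison chosen only for illustration.

```python
def sum_squared_residuals(xs, ys, f):
    """Return S = sum of (y_i - f(x_i))**2 over all data points."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))


# Data of Example 4.1.2.
xs = [5, 10, 15, 20, 25]
ys = [16, 19, 23, 26, 30]

# S for the least squares line of Example 4.1.2 ...
s_fitted = sum_squared_residuals(xs, ys, lambda x: 0.7 * x + 12.3)
# ... and for an arbitrary nearby line (assumed here purely for comparison).
s_other = sum_squared_residuals(xs, ys, lambda x: 0.7 * x + 12.0)

print(s_fitted, s_other)
```

The least squares line should give the smaller of the two values of $S$.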
Consider a set of n given values (x, y) for fitting the straight line y = ax + b where a and b are
parameters to be determined. The residual R = y − (ax + b) is the difference between the observed
and estimated values of y. By the method of least squares we find parameters a and b such that the
sum of squares of the residuals is minimum (least).
Let
\[ S = \sum_{1}^{n} R^2, \]
i.e.,
\[ S = \sum_{1}^{n} [y - (ax + b)]^2. \]
Treating $S$ as a function of the two parameters $a$ and $b$, the necessary conditions for $S$ to be a minimum
are $\partial S/\partial a = 0$ and $\partial S/\partial b = 0$, i.e.,
\[ \sum_{1}^{n} 2[y - (ax + b)](-x) = 0 \]
and
\[ \sum_{1}^{n} 2[y - (ax + b)](-1) = 0. \]
Dividing both the equations by 2, we have
\[ -\sum_{1}^{n} xy + \sum_{1}^{n} ax^2 + \sum_{1}^{n} bx = 0 \]
\[ -\sum_{1}^{n} y + \sum_{1}^{n} ax + \sum_{1}^{n} b = 0 \]
But $\sum_{1}^{n} b = b + b + b + \cdots$ ($n$ times) $= nb$, and hence we have
\[ a\sum x^2 + b\sum x = \sum xy \]
\[ a\sum x + nb = \sum y \]
These equations are called normal equations for fitting the straight line y = ax+b in the least squares
sense. By solving these, we obtain the value of a and b.
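As a minimal sketch, the two normal equations can be solved in closed form once the four sums are known. The function name `fit_line` is hypothetical, and the small data set in the usage lines is made up for illustration (its points lie exactly on $y = 2x + 1$).

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a*x + b.

    Solves the normal equations
        a*sum(x^2) + b*sum(x) = sum(x*y)
        a*sum(x)   + n*b      = sum(y)
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    a = (n * sxy - sx * sy) / det
    b = (sy * sxx - sx * sxy) / det
    return a, b


# Illustrative data lying exactly on y = 2x + 1.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Writing the solution in terms of the sums keeps the code parallel to the hand computation: the same table of $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$ is formed first and the two unknowns follow from it.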
Solution. The normal equations for fitting the straight line $y = ax + b$ are
\[ \sum y = a\sum x + nb \]
\[ \sum xy = a\sum x^2 + b\sum x \qquad (n = 8) \]
56a + 8b = 40
524a + 56b = 364
a = 0.636363636 ≈ 0.64
b = 0.545454545 ≈ 0.55
y = 0.64x + 0.55
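The pair of equations above can be solved by Cramer's rule, which is one convenient way to check the arithmetic. The helper `solve_2x2` is a hypothetical name introduced only for this sketch.

```python
def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*u + a12*v = b1, a21*u + a22*v = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the system has no unique solution")
    u = (b1 * a22 - a12 * b2) / det
    v = (a11 * b2 - b1 * a21) / det
    return u, v


# 56a + 8b = 40 and 524a + 56b = 364, as in the solution above.
a, b = solve_2x2(56, 8, 40, 524, 56, 364)
print(round(a, 2), round(b, 2))
```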
Example 4.1.2. Find the equation of the best fitting straight line for the following data and hence
estimate the value of the dependent variable corresponding to the value 30 of the independent
variable.
x 5 10 15 20 25
y 16 19 23 26 30
Solution. Let y = ax + b be the equation of the best fitting straight line. The associated normal
equations are as follows.
\[ \sum y = a\sum x + nb \]
\[ \sum xy = a\sum x^2 + b\sum x \qquad (n = 5) \]
  x      y      xy     $x^2$
  5     16      80      25
 10     19     190     100
 15     23     345     225
 20     26     520     400
 25     30     750     625
$\sum x = 75$, $\sum y = 114$, $\sum xy = 1885$, $\sum x^2 = 1375$
75a + 5b = 114
1375a + 75b = 1885
On solving we have,
a = 0.7
b = 12.3
Thus, by substituting these values in $y = ax + b$, we obtain the equation of the best fitting straight
line in the form $y = 0.7x + 12.3$. Further, when $x = 30$ we obtain $y = 0.7(30) + 12.3 = 33.3$.
Example 4.1.3. Fit a straight line to the following data.
Year                   1961  1971  1981  1991  2001
Production (in tons)      8    10    12    10    16
Also find the expected production in the year 2006.
Solution. Let the year and the production be represented by the variables $x$ and $y$ respectively. We shall
fit the straight line in the form $y = a + bx$. Since the values of $x$ are large in magnitude, we
shift the origin to a convenient value near the middle of the data. Let
$X = x - 1981$; the line of fit is then $y = a + bX$.
The normal equations associated with y = a + bX are as follows.
\[ \sum y = na + b\sum X \]
\[ \sum Xy = a\sum X + b\sum X^2 \qquad (n = 5) \]
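The shift of origin can be sketched numerically as follows. This is only an illustration under the substitution $X = x - 1981$ used above; the variable names are hypothetical, and the printed values are what this sketch computes from the tabulated data rather than figures quoted from the notes.

```python
years = [1961, 1971, 1981, 1991, 2001]
production = [8, 10, 12, 10, 16]

origin = 1981                      # convenient origin near the middle
X = [x - origin for x in years]    # shifted variable X = x - 1981

n = len(X)
sX = sum(X)
sy = sum(production)
sXX = sum(v * v for v in X)
sXy = sum(v * y for v, y in zip(X, production))

# Normal equations for y = a + b*X:
#   sum(y)   = n*a     + b*sum(X)
#   sum(X*y) = a*sum(X) + b*sum(X^2)
det = n * sXX - sX * sX
b = (n * sXy - sX * sy) / det
a = (sy - b * sX) / n

estimate_2006 = a + b * (2006 - origin)
print(a, b, estimate_2006)
```

Because the shifted values are symmetric about zero, $\sum X = 0$ and the two normal equations decouple, which is the practical benefit of choosing the origin in the middle.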
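Consider a set of $n$ given values $(x, y)$ for fitting the curve $y = ax^2 + bx + c$.
The residual $R = y - (ax^2 + bx + c)$ is the difference between the observed and estimated values
of $y$. We have to find parameters $a$, $b$, $c$ such that the sum of the squares of the residuals is the least.
Let
\[ S = \sum_{1}^{n} [y - (ax^2 + bx + c)]^2. \]
As in the straight-line case, the necessary conditions for $S$ to be a minimum are
$\partial S/\partial a = 0$, $\partial S/\partial b = 0$ and $\partial S/\partial c = 0$.
But $\sum_{1}^{n} c = c + c + c + \cdots$ ($n$ times) $= nc$, and hence we have
\[ a\sum x^4 + b\sum x^3 + c\sum x^2 = \sum x^2 y \]
\[ a\sum x^3 + b\sum x^2 + c\sum x = \sum xy \]
\[ a\sum x^2 + b\sum x + nc = \sum y \]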
These equations are called normal equations for fitting the second degree parabola $y = ax^2 + bx + c$
in the least squares sense. By solving these we obtain the values of $a$, $b$, $c$.
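A minimal sketch of how the three normal equations might be assembled and solved in code is given below. The names `solve_linear` and `fit_parabola` are hypothetical, and Gaussian elimination with partial pivoting is simply one reasonable choice of solver, not something prescribed by these notes. The usage lines take their data from Example 4.1.4.

```python
def solve_linear(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


def fit_parabola(xs, ys):
    """Return (a, b, c) for the least squares parabola y = a*x^2 + b*x + c."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    # Rows follow the three normal equations in the order given above.
    matrix = [[s4, s3, s2], [s3, s2, s1], [s2, s1, n]]
    rhs = [sx2y, sxy, sy]
    a, b, c = solve_linear(matrix, rhs)
    return a, b, c


# Data of Example 4.1.4.
a, b, c = fit_parabola([1, 2, 3, 4, 5], [10, 12, 13, 16, 19])
print(round(a, 4), round(b, 4), round(c, 4))
```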
Note 4.1.1. The normal equations for fitting a straight line or parabola can be written instantly
from the desired equation of the curve as follows.
We first apply summation ($\sum$) to the desired equation, keeping the constants $a$, $b$, $c$ outside the
summation; sums of pure constant terms such as $\sum a$, $\sum b$, $\sum c$ are to be written as
$na$, $nb$, $nc$ respectively.
We then multiply the given equation by the independent variable $x$ and apply summation again.
This suffices for fitting a straight line. However, in the case of a parabola we must also multiply by
$x^2$ and apply summation.
Example 4.1.4. Fit a second degree parabola $y = ax^2 + bx + c$ in the least squares sense for the
following data and hence estimate $y$ at $x = 6$.
x 1 2 3 4 5
y 10 12 13 16 19
The associated normal equations are
\[ \sum y = a\sum x^2 + b\sum x + nc \]
\[ \sum xy = a\sum x^3 + b\sum x^2 + c\sum x \]
\[ \sum x^2 y = a\sum x^4 + b\sum x^3 + c\sum x^2 \qquad (n = 5) \]
55a + 15b + 5c = 70
225a + 55b + 15c = 232
979a + 225b + 55c = 906
On solving we have
a = 0.2857 ≈ 0.29
b = 0.4857 ≈ 0.49
c = 9.4
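The example also asks for an estimate of $y$ at $x = 6$. A short sketch of that evaluation, using the coefficients found above and Horner's scheme, is shown here; the function name `parabola` is hypothetical and the printed value is the output of this sketch, not a figure quoted from the notes.

```python
# Coefficients as obtained in the solution above.
a, b, c = 0.2857, 0.4857, 9.4


def parabola(x):
    """Evaluate a*x^2 + b*x + c by Horner's scheme."""
    return (a * x + b) * x + c


y6 = parabola(6)
print(round(y6, 1))
```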
x 1 2 3 4 5 6 7 8 9
y 2 6 7 8 10 11 11 10 9
Solution. Let $y = a + bx + cx^2$ be the parabola of fit. The associated normal equations are
\[ \sum y = na + b\sum x + c\sum x^2 \]
\[ \sum xy = a\sum x + b\sum x^2 + c\sum x^3 \]
\[ \sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4 \qquad (n = 9) \]
9a + 45b + 285c = 74
45a + 285b + 2025c = 421
285a + 2025b + 15333c = 2771
On solving we have,
a = −0.92857 ≈ −0.93
b = 3.52316 ≈ 3.52
c = −0.26731 ≈ −0.27
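Consider $y = ae^{bx}$. Taking logarithms (to the base $e$) on both sides we get
\[ \log_e y = \log_e a + bx \log_e e. \]
But $\log_e e = 1$, so
\[ \log_e y = \log_e a + bx \]
or
\[ Y = A + BX \qquad (4.1.1) \]
where $Y = \log_e y$, $A = \log_e a$, $B = b$ and $X = x$. This is a straight line in $X$ and $Y$, and the
associated normal equations are
\[ \sum Y = nA + B\sum X \qquad (4.1.2) \]
\[ \sum XY = A\sum X + B\sum X^2 \qquad (4.1.3) \]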
Solving (4.1.2) and (4.1.3) we obtain $A$ and $B$. But $\log_e a = A \Rightarrow a = e^A$; also $b = B$. Substituting
these values in $y = ae^{bx}$ we get the curve of best fit in the required form.
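The log transformation and the return to the original parameters can be sketched in a few lines. The name `fit_exponential` is hypothetical; the sketch assumes every $y$ value is positive, since otherwise $\log_e y$ is undefined. The usage lines take their data from Example 4.1.7.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a*exp(b*x) by least squares on Y = ln(y); returns (a, b)."""
    if any(y <= 0 for y in ys):
        raise ValueError("all y values must be positive to take logarithms")
    Y = [math.log(y) for y in ys]
    n = len(xs)
    sx = sum(xs)
    sY = sum(Y)
    sxx = sum(x * x for x in xs)
    sxY = sum(x * v for x, v in zip(xs, Y))
    det = n * sxx - sx * sx
    B = (n * sxY - sx * sY) / det   # slope of Y = A + B*X, equal to b
    A = (sY - B * sx) / n           # intercept, equal to ln(a)
    return math.exp(A), B


# Data of Example 4.1.7 (petals, flowers).
a, b = fit_exponential([5, 6, 7, 8, 9, 10], [133, 55, 23, 7, 2, 2])
print(math.log(a), b)
```

Note that this minimises the squared residuals of $\log_e y$, not of $y$ itself, which is exactly what the linearised normal equations (4.1.2) and (4.1.3) do.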
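Consider $y = ax^b$. Taking logarithms (to the base $e$) on both sides we get
\[ \log_e y = \log_e a + b\log_e x \]
or
\[ Y = A + BX \qquad (4.1.4) \]
where $Y = \log_e y$, $A = \log_e a$, $B = b$ and $X = \log_e x$.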
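For the power curve both variables are transformed before the straight-line fit. The following is a minimal sketch, with the hypothetical name `fit_power` and the assumption that all $x$ and $y$ values are positive. The usage data are those of Example 4.1.8.

```python
import math


def fit_power(xs, ys):
    """Fit y = a*x**b by least squares on Y = ln(y) against X = ln(x)."""
    if any(x <= 0 for x in xs) or any(y <= 0 for y in ys):
        raise ValueError("x and y must be positive to take logarithms")
    X = [math.log(x) for x in xs]
    Y = [math.log(y) for y in ys]
    n = len(X)
    sX = sum(X)
    sY = sum(Y)
    sXX = sum(u * u for u in X)
    sXY = sum(u * v for u, v in zip(X, Y))
    det = n * sXX - sX * sX
    B = (n * sXY - sX * sY) / det   # slope, equal to the exponent b
    A = (sY - B * sX) / n           # intercept, equal to ln(a)
    return math.exp(A), B


# Data of Example 4.1.8.
a, b = fit_power([1, 2, 3, 4, 5], [0.5, 2, 4.5, 8, 12.5])
print(round(a, 4), round(b, 4))
```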
  x      y       $Y = \log_e y$    $xY$       $x^2$
  0     8.12     2.0943             0          0
  2    10        2.3026             4.6052     4
  4    31.82     3.4601            13.8404    16
$\sum x = 6$, $\sum Y = 7.8570$, $\sum xY = 18.4456$, $\sum x^2 = 20$
3A + 6b = 7.8570
6A + 20b = 18.4456
On solving we have,
A = 1.9361
b = 0.34145 ≈ 0.3415
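The remaining step is the back-transformation $a = e^A$. A brief sketch of it, with the fitted curve evaluated next to the tabulated data, is given below; the function name `fitted` is hypothetical and the numbers printed are computed by this sketch from the values of $A$ and $b$ found above.

```python
import math

A = 1.9361   # intercept of the linearised fit, from the solution above
b = 0.3415   # slope of the linearised fit, rounded as above

a = math.exp(A)   # undo Y = log_e y, so a = e^A


def fitted(x):
    """Value of the fitted curve y = a*exp(b*x)."""
    return a * math.exp(b * x)


# Compare the fitted curve with the tabulated (x, y) values.
for x, y in [(0, 8.12), (2, 10), (4, 31.82)]:
    print(x, y, round(fitted(x), 2))
```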
Example 4.1.7. Fit an exponential curve of the form $y = ae^{bx}$ by the method of least squares for
the following data.
No. of petals 5 6 7 8 9 10
No. of flowers 133 55 23 7 2 2
Solution. Note: The preliminary steps are to be retraced as in the previous problem. We shall
prepare the relevant table with reference to the same. (n = 6)
  x      y     $Y = \log_e y$    $xY$       $x^2$
  5    133     4.8903            24.4515     25
  6     55     4.0073            24.0438     36
  7     23     3.1355            21.9485     49
  8      7     1.9459            15.5672     64
  9      2     0.6931             6.2379     81
 10      2     0.6931             6.9310    100
$\sum x = 45$, $\sum Y = 15.3652$, $\sum xY = 99.1799$, $\sum x^2 = 355$
6A + 45b = 15.3652
45A + 355b = 99.1799
On solving we have,
A = 9.4433
b = −0.9177
Example 4.1.8. Fit a least squares geometric curve $y = ax^b$ for the following data.
x 1 2 3 4 5
y 0.5 2 4.5 8 12.5
\[ \sum Y = nA + b\sum X \]
\[ \sum XY = A\sum X + b\sum X^2 \qquad (n = 5) \]
  x      y      $X = \log_e x$    $Y = \log_e y$    $XY$      $X^2$
  1     0.5     0                 -0.6931           0         0
  2     2       0.6931             0.6931           0.4804    0.4804
  3     4.5     1.0986             1.5041           1.6524    1.2069
  4     8       1.3863             2.0794           2.8827    1.9218
  5    12.5     1.6094             2.5257           4.0649    2.5902
$\sum X = 4.7874$, $\sum Y = 6.1092$, $\sum XY = 9.0804$, $\sum X^2 = 6.1993$
5A + 4.7874b = 6.1092
4.7874A + 6.1993b = 9.0804
Example 4.1.9. Fit a curve of the form $y = ab^x$ for the data and hence find the estimate of $y$
when $x = 8$.
x 1 2 3 4 5 6 7
y 87 97 113 129 202 195 193
\[ \sum Y = nA + B\sum x \]
\[ \sum xY = A\sum x + B\sum x^2 \qquad (n = 7) \]
  x      y     $Y = \log_e y$    $xY$       $x^2$
  1     87     4.4659             4.4659      1
  2     97     4.5747             9.1494      4
  3    113     4.7274            14.1822      9
  4    129     4.8598            19.4392     16
  5    202     5.3083            26.5415     25
  6    195     5.2730            31.6380     36
  7    193     5.2627            36.8389     49
$\sum x = 28$, $\sum Y = 34.4718$, $\sum xY = 142.2551$, $\sum x^2 = 140$
7A + 28B = 34.4718
28A + 140B = 142.2551
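One way the computation could be completed is sketched below, assuming the usual linearisation $Y = \log_e y$, $A = \log_e a$, $B = \log_e b$, so that $a = e^A$ and $b = e^B$. The name `fit_ab_x` is hypothetical, and the values printed come from this sketch rather than from the notes.

```python
import math


def fit_ab_x(xs, ys):
    """Fit y = a*b**x by least squares on ln(y) = ln(a) + x*ln(b)."""
    if any(y <= 0 for y in ys):
        raise ValueError("all y values must be positive to take logarithms")
    Y = [math.log(y) for y in ys]
    n = len(xs)
    sx = sum(xs)
    sY = sum(Y)
    sxx = sum(x * x for x in xs)
    sxY = sum(x * v for x, v in zip(xs, Y))
    det = n * sxx - sx * sx
    B = (n * sxY - sx * sY) / det   # B = ln(b)
    A = (sY - B * sx) / n           # A = ln(a)
    return math.exp(A), math.exp(B)


# Data of Example 4.1.9.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [87, 97, 113, 129, 202, 195, 193]
a, b = fit_ab_x(xs, ys)
y8 = a * b ** 8   # estimate of y at x = 8
print(round(a, 2), round(b, 4), round(y8, 1))
```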
Problem 4.1.1. Find the equation of the best fitting straight line for the following data [1 to 6].
1.  x     1     2     3     4     5
    y    14    13     9     5     2
2.  x     0     1     2     3     4     5
    y     9     8    24    28    26    20
3.  x     0     1     2     3     4     5     6
    y     2     1     3     2     4     3     5
4.  x    62      64      65      69      70      71      72
    y    65.7    66.8    67.2    69.3    69.8    70.5    70.9
5.  x     1     2     3     4     5     6     7
    y    80    90    92    83    94    99    92
6.  Year (x)                             1911  1921  1931  1941  1951
    Production (y) (in thousand tons)       8    10    12    10     6
Fit a parabola of second degree for the following data [7 to 10].
7.  x     0     1     2     3     4     5     6
    y    14    18    23    29    36    40    46
8.  x    10    20    30    40    50    60
    y   157   179   210   252   302   361
9.  x     0     1     2     3     4
    y     1     5    10    22    38
10. x     1     2     3     4     5
    y    25    28    33    39    46
11. Fit a curve of the form $y = ab^x$ for the data and hence estimate $y$ when $x = 8$.
x 0 1 2 3 4 5 6
y 32 47 65 92 132 190 275
13. Find the equation of the best fitting curve in the form $y = ae^{bx}$ for the data
x 0 2 4
y 5.02 10 31.62
ANSWERS:
6. $y = 0.16x - 302.56$
7. $y = 0.083x^2 + 4.96x + 13.46$
8. $y = 0.046x^2 - 0.84x + 143.67$
11. $y = 32.15(1.43)^x$, 562
12. $y = 2.98x^{0.31}$