Curve Fitting
Part 5
Describes techniques to fit curves (curve fitting) to
discrete data to obtain intermediate estimates.
There are two general approaches for curve fitting:
• Least Squares regression:
Data exhibit a significant degree of scatter. The
strategy is to derive a single curve that represents
the general trend of the data.
• Interpolation:
Data are very precise. The strategy is to pass a
curve or a series of curves through each of the
points.
Introduction
In engineering, two types of applications are encountered:
– Trend analysis. Predicting values of the dependent variable;
may include extrapolation beyond the data points or
interpolation between data points.
– Hypothesis testing. Comparing existing mathematical
model with measured data.
(Figure: (a) least-squares regression, (b) linear interpolation, (c) curvilinear interpolation.)
Mathematical Background
• Arithmetic mean. The sum of the individual data
points (yi) divided by the number of points (n):

$\bar{y} = \frac{\sum y_i}{n}, \quad i = 1, \dots, n$

• Standard deviation. The most common measure
of spread for a sample:

$S_y = \sqrt{\frac{S_t}{n-1}}, \qquad S_t = \sum (y_i - \bar{y})^2$
Mathematical Background (cont’d)
• Variance. Representation of spread by the square
of the standard deviation:

$S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1} \quad\text{or}\quad S_y^2 = \frac{\sum y_i^2 - \left(\sum y_i\right)^2 / n}{n-1}$

• Coefficient of variation. Quantifies the spread of
the data relative to the mean:

$\text{c.v.} = \frac{S_y}{\bar{y}} \times 100\%$
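These definitions can be checked with a short Python sketch; as sample data it uses the y-values from the straight-line example later in this chapter.

```python
import math

# Sample data: the y-values from the straight-line regression example below
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
n = len(y)

y_bar = sum(y) / n                          # arithmetic mean
S_t = sum((yi - y_bar) ** 2 for yi in y)    # total sum of squares about the mean
s_y = math.sqrt(S_t / (n - 1))              # standard deviation
variance = S_t / (n - 1)                    # variance = s_y squared
cv = s_y / y_bar * 100                      # coefficient of variation, in percent

print(round(y_bar, 6), round(S_t, 4), round(s_y, 4), round(cv, 2))
```

The printed values match the example's statistics (ȳ = 3.428571, St = 22.7143, sy = 1.9457).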
Normal Distribution
Chapter 17
Least-Squares Regression
Linear Regression
Fitting a straight line to a set of paired
observations: (x1, y1), (x2, y2),…,(xn, yn).
y = a0+ a1 x + e
a1 - slope
a0 - intercept
e - error, or residual, between the model and
the observations
Linear Regression: Residual
Linear Regression: Question
How to find a0 and a1 so that the error would be
minimum?
Linear Regression:
Criteria for a “Best” Fit
One criterion is to minimize the sum of the residuals:

$\min \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)$

This criterion is inadequate: positive and negative residuals cancel, so a poor fit (e.g., one with $e_1 = -e_2$) can still give a sum of zero.
Linear Regression:
Criteria for a “Best” Fit
Another criterion is to minimize the sum of the absolute values of the residuals:

$\min \sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|$
Linear Regression:
Criteria for a “Best” Fit
A third criterion is the minimax criterion: minimize the largest residual,

$\min \max_i |e_i| = \min \max_i |y_i - a_0 - a_1 x_i|$
Linear Regression:
Least Squares Fit
$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_{i,\text{measured}} - y_{i,\text{model}})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$

$\min S_r = \min \sum_{i=1}^{n} e_i^2 = \min \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$
Yields a unique line for a given set of data.
Linear Regression:
Least Squares Fit
$\min S_r = \min \sum_{i=1}^{n} e_i^2 = \min \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$

The coefficients a0 and a1 that minimize Sr must satisfy
the following conditions:

$\frac{\partial S_r}{\partial a_0} = 0, \qquad \frac{\partial S_r}{\partial a_1} = 0$
Linear Regression:
Determination of ao and a1
$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i) = 0$

$\frac{\partial S_r}{\partial a_1} = -2 \sum (y_i - a_0 - a_1 x_i)\, x_i = 0$

Expanding, and noting that $\sum a_0 = n a_0$, gives the normal equations:

$n a_0 + \left(\sum x_i\right) a_1 = \sum y_i$

$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i$

2 equations with 2 unknowns, which can be solved
simultaneously.
Linear Regression:
Determination of ao and a1
y = a0 + a1 x
a1 - slope
a0 - intercept

$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad a_0 = \bar{y} - a_1 \bar{x}$
Error of Linear Regression
$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2 \quad$ (analogous to $S_t$ for the standard deviation)

Standard error of the estimate (the spread around the regression line):

$s_{y/x} = \sqrt{\frac{S_r}{n-2}}$
Error of Linear Regression
(Figures: fits with small residual error vs. large residual error.)
Error Quantification of Linear
Regression
• Total sum of the squares around the mean for the
dependent variable, y:

$S_t = \sum (y_i - \bar{y})^2$

• Sum of the squares of residuals around the regression
line:

$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$
Error Quantification of Linear
Regression
• St-Sr quantifies the improvement or error reduction due
to describing data in terms of a straight line rather than
as an average value.
$r^2 = \frac{S_t - S_r}{S_t}$

r² : coefficient of determination
r : correlation coefficient
For a perfect fit:
• Sr= 0 and r = r2 =1, signifying that the line explains 100
percent of the variability of the data.
• For r = r2 = 0, Sr = St, the fit represents no improvement.
Least Squares Fit of a Straight Line:
Example
Fit a straight line to the x and y values in the following
Table:
xi    yi     xi·yi   xi²
 1     0.5    0.5     1
 2     2.5    5.0     4
 3     2.0    6.0     9
 4     4.0   16.0    16
 5     3.5   17.5    25
 6     6.0   36.0    36
 7     5.5   38.5    49

$\sum x_i = 28, \quad \sum y_i = 24.0, \quad \sum x_i y_i = 119.5, \quad \sum x_i^2 = 140$

$\bar{x} = \frac{28}{7} = 4, \qquad \bar{y} = \frac{24}{7} = 3.428571$
Least Squares Fit of a Straight Line:
Example (cont’d)
$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \frac{7(119.5) - 28(24)}{7(140) - 28^2} = 0.8392857$

$a_0 = \bar{y} - a_1 \bar{x} = 3.428571 - 0.8392857(4) = 0.07142857$

y = 0.07142857 + 0.8392857 x
Least Squares Fit of a Straight Line:
Example (Error Analysis)
xi    yi     (yi − ȳ)²   ei²
 1     0.5    8.5765     0.1687
 2     2.5    0.8622     0.5625
 3     2.0    2.0408     0.3473
 4     4.0    0.3265     0.3265
 5     3.5    0.0051     0.5896
 6     6.0    6.6122     0.7972
 7     5.5    4.2908     0.1993

$S_t = \sum (y_i - \bar{y})^2 = 22.7143, \qquad S_r = \sum e_i^2 = 2.9911$

$r^2 = \frac{S_t - S_r}{S_t} = \frac{22.7143 - 2.9911}{22.7143} = 0.868, \qquad r = \sqrt{0.868} = 0.932$

The results indicate that 86.8% of the original uncertainty
has been explained by the linear regression.
Least Squares Fit of a Straight Line:
Example (Error Analysis)
• The standard deviation (quantifies the spread around the
mean):

$s_y = \sqrt{\frac{S_t}{n-1}} = \sqrt{\frac{22.7143}{7-1}} = 1.9457$

• The standard error of the estimate (quantifies the spread
around the regression line):

$s_{y/x} = \sqrt{\frac{S_r}{n-2}} = \sqrt{\frac{2.9911}{7-2}} = 0.7735$

Because $s_{y/x} < s_y$, the linear regression model has merit: the
spread around the line is smaller than the spread around the mean.
Algorithm for linear regression
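The algorithm can be sketched in a few lines of plain Python (no libraries assumed), combining the formulas for the coefficients with the error statistics above; it is applied here to the worked example's data.

```python
import math

def linear_regression(x, y):
    """Least-squares straight-line fit y = a0 + a1*x, with error statistics."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi ** 2 for xi in x)
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)    # slope
    a0 = sy / n - a1 * sx / n                         # intercept = y_bar - a1*x_bar
    y_bar = sy / n
    S_t = sum((yi - y_bar) ** 2 for yi in y)          # spread about the mean
    S_r = sum((yi - a0 - a1 * xi) ** 2
              for xi, yi in zip(x, y))                # spread about the line
    r2 = (S_t - S_r) / S_t                            # coefficient of determination
    s_yx = math.sqrt(S_r / (n - 2))                   # standard error of the estimate
    return a0, a1, r2, s_yx

# Data from the worked example above
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
a0, a1, r2, s_yx = linear_regression(x, y)
print(a0, a1, r2, s_yx)
```

The function reproduces the example's results: a0 = 0.07142857, a1 = 0.8392857, r² = 0.868, sy/x = 0.7735.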
Linearization of Nonlinear Relationships
• Linear regression: the relationship between the
dependent and independent variables is linear.
• However, a few types of nonlinear functions can be
transformed into linear regression problems.
The exponential equation.
The power equation.
The saturation-growth-rate equation.
(Figures: exponential, simple power, and saturation-growth-rate equations and their linearized forms.)
Linearization of Nonlinear Relationships
1. The exponential equation.
$y = a_1 e^{b_1 x} \quad\Rightarrow\quad \ln y = \ln a_1 + b_1 x$
Linearization of Nonlinear Relationships
2. The power equation
$y = a_2 x^{b_2} \quad\Rightarrow\quad \log y = \log a_2 + b_2 \log x$
Linearization of Nonlinear Relationships
3. The saturation-growth-rate equation
$y = a_3 \frac{x}{b_3 + x} \quad\Rightarrow\quad \frac{1}{y} = \frac{1}{a_3} + \frac{b_3}{a_3} \cdot \frac{1}{x}$
Example
Fit the power equation $y = a_2 x^{b_2}$ to the data in the following table.

Taking logarithms: $\log y = \log(a_2 x^{b_2}) = \log a_2 + b_2 \log x$.
Let $Y = \log y$, $X = \log x$, $a_0 = \log a_2$, and $a_1 = b_2$, so that $Y = a_0 + a_1 X$.

xi    yi     X = log xi   Y = log yi
 1     0.5    0.000       −0.301
 2     1.7    0.301        0.230
 3     3.4    0.477        0.531
 4     5.7    0.602        0.756
 5     8.4    0.699        0.924
Σ     15     19.7          2.079        2.141
Example
Xi    Yi      X* = log Xi   Y* = log Yi   X*·Y*    X*²
 1     0.5     0.0000       −0.3010       0.0000   0.0000
 2     1.7     0.3010        0.2304       0.0694   0.0906
 3     3.4     0.4771        0.5315       0.2536   0.2276
 4     5.7     0.6021        0.7559       0.4551   0.3625
 5     8.4     0.6990        0.9243       0.6460   0.4886
Sum   15     19.700          2.079        2.141    1.424    1.169

$a_1 = \frac{n \sum X_i^* Y_i^* - \sum X_i^* \sum Y_i^*}{n \sum X_i^{*2} - \left(\sum X_i^*\right)^2} = \frac{5(1.424) - 2.079(2.141)}{5(1.169) - 2.079^2} = 1.75$

$a_0 = \bar{Y}^* - a_1 \bar{X}^* = 0.4282 - 1.75(0.41584) = -0.300$
Example
$y = 0.5\, x^{1.75}$, since $\log y = -0.300 + 1.75 \log x$ and $a_2 = 10^{-0.300} \approx 0.5$
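The transformed fit can be sketched in Python (base-10 logarithms, as in the example), reusing the straight-line formulas on the log-transformed data:

```python
import math

# Data from the power-equation example
x = [1, 2, 3, 4, 5]
y = [0.5, 1.7, 3.4, 5.7, 8.4]

# Linearize: log10(y) = log10(a2) + b2 * log10(x)
X = [math.log10(xi) for xi in x]
Y = [math.log10(yi) for yi in y]

n = len(X)
sX, sY = sum(X), sum(Y)
sXY = sum(a * b for a, b in zip(X, Y))
sXX = sum(a ** 2 for a in X)

b2 = (n * sXY - sX * sY) / (n * sXX - sX ** 2)  # slope of the transformed fit
log_a2 = sY / n - b2 * sX / n                   # intercept = log10(a2)
a2 = 10 ** log_a2                               # back-transform the intercept

print(round(b2, 4), round(log_a2, 4), round(a2, 4))
```

This reproduces b2 ≈ 1.75 and log a2 ≈ −0.300, i.e. a2 ≈ 0.5.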
Polynomial Regression
• Some engineering data is poorly represented by a
straight line.
• For these cases a curve is better suited to fit the data.
• The least squares method can readily be extended to fit
the data to higher order polynomials.
Polynomial Regression (cont’d)
A parabola is preferable
Polynomial Regression (cont’d)
• A 2nd-order polynomial (quadratic) is defined by:

$y = a_0 + a_1 x + a_2 x^2 + e$

• The residuals between the model and the data:

$e_i = y_i - a_0 - a_1 x_i - a_2 x_i^2$

• The sum of squares of the residuals:

$S_r = \sum e_i^2 = \sum \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)^2$
Polynomial Regression (cont’d)
$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0$

$\frac{\partial S_r}{\partial a_1} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2)\, x_i = 0$

$\frac{\partial S_r}{\partial a_2} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2)\, x_i^2 = 0$

These yield the normal equations:

$n a_0 + \left(\sum x_i\right) a_1 + \left(\sum x_i^2\right) a_2 = \sum y_i$

$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 + \left(\sum x_i^3\right) a_2 = \sum x_i y_i$

$\left(\sum x_i^2\right) a_0 + \left(\sum x_i^3\right) a_1 + \left(\sum x_i^4\right) a_2 = \sum x_i^2 y_i$

3 linear equations with 3 unknowns (a0, a1, a2), which can be
solved simultaneously.
Polynomial Regression (cont’d)
• A 3×3 system of equations must be solved to
determine the coefficients of the polynomial:

$\begin{bmatrix} n & \sum x_i & \sum x_i^2 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{bmatrix}$

• The standard error and the coefficient of determination:

$s_{y/x} = \sqrt{\frac{S_r}{n-3}}, \qquad r^2 = \frac{S_t - S_r}{S_t}$
Polynomial Regression (cont’d)
General: the mth-order polynomial:

$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m + e$

• A system of (m+1)×(m+1) linear equations must be solved to
determine the coefficients of the mth-order polynomial.

• The standard error:

$s_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}$

• The coefficient of determination:

$r^2 = \frac{S_t - S_r}{S_t}$
Polynomial Regression- Example
Fit a second order polynomial to data:
xi    yi      xi²   xi³    xi⁴     xi·yi    xi²·yi
 0     2.1     0      0      0       0.0      0.0
 1     7.7     1      1      1       7.7      7.7
 2    13.6     4      8     16      27.2     54.4
 3    27.2     9     27     81      81.6    244.8
 4    40.9    16     64    256     163.6    654.4
 5    61.1    25    125    625     305.5   1527.5
Σ    15    152.6    55    225     979      585.6   2488.8

$\sum x_i = 15, \; \sum y_i = 152.6, \; \sum x_i^2 = 55, \; \sum x_i^3 = 225, \; \sum x_i^4 = 979, \; \sum x_i y_i = 585.6, \; \sum x_i^2 y_i = 2488.8$

$\bar{x} = \frac{15}{6} = 2.5, \qquad \bar{y} = \frac{152.6}{6} = 25.433$
Polynomial Regression- Example
(cont’d)
• The system of simultaneous linear equations:

$\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{bmatrix}$

$a_0 = 2.47857, \qquad a_1 = 2.35929, \qquad a_2 = 1.86071$

$y = 2.47857 + 2.35929\, x + 1.86071\, x^2$

$S_t = \sum (y_i - \bar{y})^2 = 2513.39, \qquad S_r = \sum e_i^2 = 3.74657$
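A minimal Python sketch that builds and solves these normal equations for the example data; a small Gaussian-elimination solver is included so that no library is assumed.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):                 # eliminate below the pivot
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# Data from the polynomial-regression example
xs = [0, 1, 2, 3, 4, 5]
ys = [2.1, 7.7, 13.6, 27.2, 40.9, 61.1]

# Build the normal equations for a quadratic fit
S = [sum(x ** k for x in xs) for k in range(5)]                 # S[k] = sum of x^k
T = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
A = [[S[0], S[1], S[2]], [S[1], S[2], S[3]], [S[2], S[3], S[4]]]
a0, a1, a2 = solve3(A, T)
print(round(a0, 5), round(a1, 5), round(a2, 5))
```

The solver reproduces the example's coefficients a0 = 2.47857, a1 = 2.35929, a2 = 1.86071.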
Polynomial Regression- Example
(cont’d)
xi    yi      y_model    ei²        (yi − ȳ)²
 0     2.1     2.4786    0.14332     544.42889
 1     7.7     6.6986    1.00286     314.45929
 2    13.6    14.6400    1.08158     140.01989
 3    27.2    26.3030    0.80491       3.12229
 4    40.9    41.6870    0.61951     239.22809
 5    61.1    60.7930    0.09439    1272.13489
Σ    15    152.6                    3.74657    2513.39333

• The standard error of estimate:

$s_{y/x} = \sqrt{\frac{3.74657}{6-3}} = 1.12$

• The coefficient of determination:

$r^2 = \frac{2513.39 - 3.74657}{2513.39} = 0.99851, \qquad r = \sqrt{r^2} = 0.99925$
Other Regression Techniques
• Multiple linear regression
• General linear least-squares
• Nonlinear regression