Curve Fitting (Regression)
• Describes techniques to fit curves (curve fitting) to
discrete data to obtain intermediate estimates.
• There are two general approaches to curve fitting:
– Data exhibit a significant degree of scatter. The strategy is
to derive a single curve that represents the general trend of
the data.
– Data are very precise. The strategy is to pass a curve or a
series of curves through each of the points.
Figure: least-squares curve fitting, linear interpolation, and curvilinear interpolation.
Mathematical Background
• Arithmetic mean, ȳ. The sum of the individual data points (yi) divided by the number of points (n): ȳ = Σyi / n.
• Standard deviation, Sy. The most common measure of spread for a sample:
Sy = sqrt( Σ(yi - ȳ)² / (n - 1) )   or   Sy = sqrt( (Σyi² - (Σyi)²/n) / (n - 1) )
3
• Variance, Sy². Representation of spread by the square of the standard deviation:
Sy² = Σ(yi - ȳ)² / (n - 1), where n - 1 is the number of degrees of freedom.
• Coefficient of variation, c.v. Quantifies the spread of the data relative to the mean:
c.v. = (Sy / ȳ) × 100%
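A minimal sketch of these definitions in Python (the sample values below are made up for illustration):

import math

def describe(y):
    # Arithmetic mean, standard deviation, variance, and coefficient of variation
    n = len(y)
    ybar = sum(y) / n                          # arithmetic mean
    st = sum((yi - ybar) ** 2 for yi in y)     # sum of squares about the mean
    sy2 = st / (n - 1)                         # variance, with n - 1 degrees of freedom
    sy = math.sqrt(sy2)                        # standard deviation
    cv = sy / ybar * 100                       # coefficient of variation, in percent
    return ybar, sy, sy2, cv

y = [6.495, 6.595, 6.615, 6.635, 6.485, 6.555]   # made-up sample
print(describe(y))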
Least Squares Regression
Linear Regression
• Fitting a straight line to a set of paired
observations: (x1, y1), (x2, y2),…,(xn, yn).
y = a0 + a1x + e
a1 - slope
a0 - intercept
e - error, or residual, between the model and the observations
Criteria for a “Best” Fit
• Minimize the sum of the residual errors for all available data:
Σei = Σ(yi - a0 - a1xi), summed over i = 1, …, n, where n = total number of points
• However, this is an inadequate criterion, as is minimizing the sum of the absolute values of the residuals (see the numerical sketch below).
Figures: minimizing the sum of the residual errors for all available data (not adequate); minimizing the sum of the absolute values of the residual errors for all available data (not adequate).
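A small numerical sketch of the first point in Python (made-up data; any straight line that passes through the point (x̄, ȳ) makes the sum of the residuals exactly zero, so that criterion cannot single out a best line):

x = [1.0, 2.0, 3.0, 4.0]          # made-up data, roughly on y = 1 + 2x
y = [3.1, 4.9, 7.2, 8.8]

def residual_sum(a0, a1):
    return sum(yi - (a0 + a1 * xi) for xi, yi in zip(x, y))

xm, ym = sum(x) / len(x), sum(y) / len(y)
# Two very different lines through (xm, ym) both give a residual sum of ~0,
# so the criterion cannot tell a good fit from a bad one.
print(residual_sum(ym - 2.0 * xm, 2.0))   # slope 2
print(residual_sum(ym - 9.0 * xm, 9.0))   # slope 9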
• The best strategy is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model:
Sr = Σei² = Σ(yi,measured - yi,model)² = Σ(yi - a0 - a1xi)²
• This strategy yields a unique line for any given set of data.
Least-Squares Fit of a Straight Line
Normal equations, which can be solved simultaneously:
n a0 + (Σxi) a1 = Σyi
(Σxi) a0 + (Σxi²) a1 = Σxiyi
Solving for the slope and the intercept:
a1 = (n Σxiyi - Σxi Σyi) / (n Σxi² - (Σxi)²)
a0 = ȳ - a1 x̄
where x̄ and ȳ are the mean values of x and y.
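A sketch of solving the normal equations simultaneously (assumes NumPy is available; the paired observations are made up for illustration):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up data
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5])
n = len(x)

# Normal equations as a 2x2 linear system A [a0, a1]^T = b
A = np.array([[n,       x.sum()],
              [x.sum(), (x**2).sum()]])
b = np.array([y.sum(), (x * y).sum()])
a0, a1 = np.linalg.solve(A, b)

# Same result from the closed-form expressions above
a1_cf = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x**2).sum() - x.sum()**2)
a0_cf = y.mean() - a1_cf * x.mean()
print(a0, a1, a0_cf, a1_cf)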
“Goodness” of Our Fit
If
• St is the total sum of the squares around the mean for the dependent variable, y: St = Σ(yi - ȳ)², and
• Sr is the sum of the squares of the residuals around the regression line: Sr = Σ(yi - a0 - a1xi)²,
then St - Sr quantifies the improvement, or error reduction, due to describing the data in terms of a straight line rather than as an average value.
r² - coefficient of determination
sqrt(r²) = r - correlation coefficient
r² - Coefficient of Determination
r² = (St - Sr) / St
• For a perfect fit, Sr = 0 and r = r² = 1, signifying that the line explains 100 percent of the variability of the data.
• For r = r² = 0, Sr = St and the fit represents no improvement.
• A correlation coefficient, r, greater than 0.8 is
generally described as strong, whereas a
correlation less than 0.5 is generally described
as weak.
Algorithm for Least-Squares Linear Regression
Sumx = 0 : Sumy = 0 : St = 0
Sumxy = 0 : Sumx2 = 0 : Sr = 0
' Calculate Sumx, Sumy, Sumxy, and Sumx2
For i = 1 To n
  Sumx = Sumx + x(i)
  Sumy = Sumy + y(i)
  Sumxy = Sumxy + x(i) * y(i)
  Sumx2 = Sumx2 + x(i)^2
Next i
' Calculate the mean values of x and y
xm = Sumx / n : ym = Sumy / n
' Calculate the constants of the line equation, a0 and a1
a1 = (n * Sumxy - Sumx * Sumy) / (n * Sumx2 - Sumx^2)
a0 = ym - a1 * xm
Algorithm for Least-Squares Linear Regression (Cont.)
' Calculate St and Sr
For i = 1 To n
  St = St + (y(i) - ym)^2
  Sr = Sr + (y(i) - a1 * x(i) - a0)^2
Next i
' Calculate r2
r2 = (St - Sr) / St
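The same algorithm as a runnable Python sketch (a direct translation of the pseudocode above; the sample data at the end are made up for illustration):

def linear_regression(x, y):
    # Least-squares fit of y = a0 + a1*x; returns (a0, a1, r2)
    n = len(x)
    sumx = sum(x)
    sumy = sum(y)
    sumxy = sum(xi * yi for xi, yi in zip(x, y))
    sumx2 = sum(xi ** 2 for xi in x)
    # Mean values of x and y
    xm = sumx / n
    ym = sumy / n
    # Constants of the line equation
    a1 = (n * sumxy - sumx * sumy) / (n * sumx2 - sumx ** 2)
    a0 = ym - a1 * xm
    # St (spread about the mean) and Sr (spread about the regression line)
    st = sum((yi - ym) ** 2 for yi in y)
    sr = sum((yi - a1 * xi - a0) ** 2 for xi, yi in zip(x, y))
    # Coefficient of determination
    r2 = (st - sr) / st
    return a0, a1, r2

x = [1, 2, 3, 4, 5, 6, 7]                      # made-up data
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
print(linear_regression(x, y))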
Linearization of Non-linear Regression
Non-linear → Linear: perform the regression for xi and (ln yi).
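A common model that linearizes this way (assumed here for illustration) is the exponential model:
y = a e^(bx)   →   ln y = ln a + b x
so a straight-line regression of ln yi versus xi gives b as the slope and ln a as the intercept.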
Linearization of Non-linear Regression
Non-linear → Linear: perform the regression for ln(xi) and ln(yi).
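A common model that linearizes this way (assumed here for illustration) is the power model:
y = a x^b   →   ln y = ln a + b ln x
so a straight-line regression of ln yi versus ln xi gives b as the slope and ln a as the intercept.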
Linearization of Non-linear Regression (Assignment)
Non-linear: ?   →   Linear: ?   Perform the regression for: ?
Regression Analysis in Excel
Excel has several built-in regression models for correlating two variables.
Regression Analysis in Excel
Here is an example of a sample data set and
the plot of a "best-fit" straight line through
the data
Regression Analysis in Excel
Correlation Coefficient, r
Coefficient of Determination, r² or R²
Questions