Curve Fitting
We often have a set of data, y, that is a function of some independent variable, x, but the underlying relationship f(x) is unknown:
• Measured data
• Tabulated data
Such a function may be obtained through curve fitting, or approximation, of the data.
Curve fitting is a procedure where a function is used to fit a given set of data in the "best" possible manner without having to match the data exactly.
Two categories of curve fitting:
• Least‐squares regression
o Noisy data – uncertainty in the y value for a given x value
o Want "good" agreement between the curve f(x) and the data points; the curve may not pass through any data point
• Polynomial interpolation
o Data points are known exactly – noiseless data
o Resulting curve passes through all data points
Linear Regression
• Noisy data: y values at known x values
• Suspect the relationship between y and x is linear
• Assume y ≈ a0 + a1 x
• Determine a0 and a1 that define the "best-fit" line for the data
• The simplest example is fitting a straight line to a set of paired observations:
(x1, y1), (x2, y2), . . . , (xn, yn). The mathematical expression for the straight line
is:
$$y = a_0 + a_1 x + e \qquad (1)$$
where a0 and a1 are coefficients representing the intercept and the slope, respectively, and e is the error, or residual, between the model and the observations.
Rearranging Eq. (1), the residual is
$$e = y - a_0 - a_1 x \qquad (2)$$
One strategy for fitting a “best” line through the data would be to minimize the sum
of the residual errors for all the available data, as in
$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n}\left(y_i - a_0 - a_1 x_i\right) \qquad (3)$$
n = total number of points
One way to remove the effect of the signs is to minimize the sum of the squares of the residuals:
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left(y_i - a_0 - a_1 x_i\right)^2 \qquad (4)$$
The sum of the squares of the residuals is a function of the two fitting
parameters, a0 and a1, Sr(a0,a1).
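As a quick illustration of Eq. (4), the sketch below evaluates Sr for two trial coefficient pairs; the data and the trial values of a0 and a1 are made up for illustration, and the pair giving the smaller Sr is the better fit in the least-squares sense.
% Minimal sketch: evaluate the objective Sr of Eq. (4) for trial coefficients.
% The data and the trial (a0, a1) pairs are made up for illustration.
x = [1 2 3 4 5];
y = [2.1 3.9 6.2 7.8 10.1];
Sr = @(a0, a1) sum((y - a0 - a1*x).^2);   % sum of squared residuals, Eq. (4)
Sr(0, 2)      % trial line y = 2*x
Sr(0.1, 2)    % another trial line; the smaller Sr is the better least-squares fit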
Sr is minimized by setting its partial derivatives with respect to a0 and a1 to zero and solving for a0 and a1:
$$\frac{\partial S_r}{\partial a_0} = -2\sum_{i=1}^{n}\left(y_i - a_0 - a_1 x_i\right) = 0$$
$$\frac{\partial S_r}{\partial a_1} = -2\sum_{i=1}^{n}\left[\left(y_i - a_0 - a_1 x_i\right)x_i\right] = 0$$
Setting these derivatives equal to zero results in a minimum Sr.
The equations can then be expressed as a set of two simultaneous linear equations with two unknowns (a0 and a1):
$$n\,a_0 + \left(\sum_{i=1}^{n} x_i\right) a_1 = \sum_{i=1}^{n} y_i \qquad (5)$$
$$\left(\sum_{i=1}^{n} x_i\right) a_0 + \left(\sum_{i=1}^{n} x_i^2\right) a_1 = \sum_{i=1}^{n} x_i y_i \qquad (6)$$
These are called the normal equations.
The equations can be solved simultaneously:
$$a_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} \qquad (7)$$
$$a_0 = \bar{y} - a_1\,\bar{x} \qquad (8)$$
where $\bar{x}$ and $\bar{y}$ are the means of x and y, respectively.
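As a quick check of Eqs. (7) and (8), the short sketch below computes the slope and intercept directly from the sums; the data values are made up for illustration.
% Minimal sketch of Eqs. (7) and (8); the data are made up for illustration.
x = [1 2 3 4 5];
y = [2.1 3.9 6.2 7.8 10.1];
n = length(x);
a1 = (n*sum(x.*y) - sum(x)*sum(y))/(n*sum(x.^2) - sum(x)^2);  % slope, Eq. (7)
a0 = mean(y) - a1*mean(x);                                    % intercept, Eq. (8)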
Quantification of Error of Linear Regression
Recall that the sum of the squares of the residuals about the regression line is defined as:
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left(y_i - a_0 - a_1 x_i\right)^2 \qquad (9)$$
St, the total sum of the squares about the mean, represents the discrepancy between the data and a single estimate of the measure of central tendency, the mean:
$$S_t = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 \qquad (10)$$
[Figure: data points (x1, y1), (x2, y2), (x3, y3), ..., (xn, yn) plotted against x and y with the regression line; the residual at each point is $E_i = y_i - a_0 - a_1 x_i$ and $S_r = \sum_{i=1}^{n}\left(y_i - a_0 - a_1 x_i\right)^2$.]
[Figure: the same data points plotted with the mean $\bar{y}$; the spread about the mean is $S_t = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$.]
St is the magnitude of the residual error associated with the dependent
variable prior to regression. After performing the regression, we can compute
Sr, the sum of the squares of the residuals around the regression line.
The difference between the two quantities, St - Sr, quantifies the improvement or
error reduction due to describing the data in terms of a straight line rather than
as an average value. Because the magnitude of this quantity is scale
dependent, the difference is normalized to St to yield
$$r^2 = \frac{S_t - S_r}{S_t}$$
r2 is called the coefficient of determination and r is the correlation coefficient.
For a perfect fit, Sr = 0 and r2 = 1, signifying that the line explains 100% of the
variability of the data. For r2 = 0, Sr = St and the fit represents no improvement.
An alternative formulation for r that is more convenient for computer
implementation is:
$$r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$$
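The coefficient of determination can be computed either from its definition, r2 = (St - Sr)/St, or from the alternative formula above; the sketch below (with made-up illustrative data) computes it both ways, which agree up to rounding.
% Minimal sketch: r^2 from its definition and from the alternative formula.
% The data are made up for illustration.
x = [1 2 3 4 5]; y = [2.1 3.9 6.2 7.8 10.1]; n = length(x);
a1 = (n*sum(x.*y) - sum(x)*sum(y))/(n*sum(x.^2) - sum(x)^2);
a0 = mean(y) - a1*mean(x);
St = sum((y - mean(y)).^2);       % spread about the mean, Eq. (10)
Sr = sum((y - a0 - a1*x).^2);     % spread about the regression line, Eq. (9)
r2_def = (St - Sr)/St             % definition of r^2
r2_alt = ((n*sum(x.*y) - sum(x)*sum(y))/ ...
    (sqrt(n*sum(x.^2) - sum(x)^2)*sqrt(n*sum(y.^2) - sum(y)^2)))^2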
function [a, r2] = linregr(x,y)
% linregr: linear regression curve fitting
% [a, r2] = linregr(x,y): Least squares fit of straight
% line to data by solving the normal equations
% input:
% x = independent variable
% y = dependent variable
% output:
% a = vector of slope, a(1), and intercept, a(2)
% r2 = coefficient of determination
n = length(x);
if length(y)~=n, error('x and y must be same length');
end
x = x(:); y = y(:); % convert to column vectors
sx = sum(x); sy = sum(y);
sx2 = sum(x.*x); sxy = sum(x.*y); sy2 = sum(y.*y);
a(1) = (n*sxy-sx*sy)/(n*sx2-sx^2);
a(2) = sy/n-a(1)*sx/n;
r2 = ((n*sxy-sx*sy)/sqrt(n*sx2-sx^2)/sqrt(n*sy2-sy^2))^2;
% create plot of data and best fit line
xp = linspace(min(x),max(x),2);
yp = a(1)*xp+a(2);
plot(x,y,'o',xp,yp)
grid on
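A possible call to linregr is sketched below; it assumes linregr.m is saved on the MATLAB path, and the data values are made up for illustration. The returned a(1) is the slope, a(2) the intercept, and r2 the coefficient of determination.
>> x = [1 2 3 4 5 6 7];
>> y = [0.8 2.3 2.9 4.1 4.8 6.2 7.1];
>> [a, r2] = linregr(x,y)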
LINEARIZATION OF NONLINEAR RELATIONSHIPS
Not all data can be explained by a linear relationship to the independent variable. A common example is the exponential model
$$y = \alpha_1 e^{\beta_1 x}$$
Linearize the fitting equation by taking the natural logarithm of both sides:
$$\ln y = \ln \alpha_1 + \beta_1 x$$
where ln y is linear in x with slope $\beta_1$ and intercept $\ln \alpha_1$.
For the power model
$$y = \alpha_2 x^{\beta_2}$$
linearize the fitting equation by taking the logarithm of both sides:
$$\log y = \log \alpha_2 + \beta_2 \log x$$
For the saturation-growth-rate model
$$y = \alpha_3 \frac{x}{\beta_3 + x}$$
linearize the fitting equation by inverting it:
$$\frac{1}{y} = \frac{1}{\alpha_3} + \frac{\beta_3}{\alpha_3}\,\frac{1}{x}$$
In each case, a linear least-squares fit of the transformed data yields the model parameters, as shown in the sketch below.
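As a sketch of how linearization is used in practice, the example below assumes an exponential model y = alpha1*exp(beta1*x); ln y is fitted against x with a straight line using polyfit, and the model parameters are then recovered. The data values are made up for illustration.
% Minimal sketch: fit y = alpha1*exp(beta1*x) by linearization (made-up data).
x = [0 0.5 1.0 1.5 2.0];
y = [1.1 1.8 3.1 4.9 8.2];      % roughly exponential data, for illustration
p = polyfit(x, log(y), 1);      % straight-line fit of ln(y) versus x
beta1 = p(1);                   % slope of the transformed fit
alpha1 = exp(p(2));             % intercept is ln(alpha1), so alpha1 = exp(p(2))
yfit = alpha1*exp(beta1*x);     % exponential model evaluated at the data points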
Polynomial Regression
The linear least-squares regression can be extended to fit a set of n data points (x1, y1), …, (xn, yn) with an mth-degree polynomial of the form
$$y = a_m x^m + \dots + a_2 x^2 + a_1 x + a_0$$
with a total error
$$E = \sum_{i=1}^{n}\left[y_i - \left(a_m x_i^m + \dots + a_2 x_i^2 + a_1 x_i + a_0\right)\right]^2$$
The coefficients am, …, a2, a1, a0 are determined such that E is minimized. A necessary condition for E to attain a minimum is that its partial derivative with respect to each of these coefficients vanishes, that is:
$$\frac{\partial E}{\partial a_k} = 0, \qquad k = 0, 1, \dots, m$$
Manipulation of these equations yields a system of m + 1 linear equations to be solved for am, …, a2, a1, a0:
$$\begin{bmatrix} n & \sum x_i & \cdots & \sum x_i^m \\ \sum x_i & \sum x_i^2 & \cdots & \sum x_i^{m+1} \\ \vdots & \vdots & \ddots & \vdots \\ \sum x_i^m & \sum x_i^{m+1} & \cdots & \sum x_i^{2m} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^m y_i \end{bmatrix}$$
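For a general degree m, the normal equations above can be assembled from the power sums and solved with the backslash operator. The sketch below is a general-purpose illustration (not the textbook's code); the data and degree shown are only example values.
% Minimal sketch: mth-degree least-squares fit by solving the normal equations.
% x, y and the degree m are assumed inputs; the values below are examples only.
x = [0 0.3 0.6 0.9 1.2]; y = [3.6 4.8 5.9 7.6 10.9]; m = 2;
n = length(x);
A = zeros(m+1); b = zeros(m+1,1);
for r = 1:m+1
    for c = 1:m+1
        A(r,c) = sum(x.^(r+c-2));    % power sums; A(1,1) = n
    end
    b(r) = sum((x.^(r-1)).*y);
end
a = A\b;    % a(1) = a0, a(2) = a1, ..., a(m+1) = am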
Quadratic Least-Squares Regression
The objective is to fit a set of n data points (x1, y1), …, (xn, yn) with a second-degree polynomial
$$y = a_2 x^2 + a_1 x + a_0$$
with the total error
$$E = \sum_{i=1}^{n}\left(y_i - a_2 x_i^2 - a_1 x_i - a_0\right)^2$$
The coefficients a2, a1, a0 are determined by solving a system of three linear equations:
$$\begin{bmatrix} n & \sum x_i & \sum x_i^2 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{bmatrix}$$
The user-defined function QuadraticRegression uses the quadratic least-squares regression approach to find the second-degree polynomial that best fits a set of data. The coefficients a2, a1, a0 are found by writing the system above in matrix form and applying the built-in backslash "\" operator in MATLAB. The function also returns a plot of the data and the best quadratic polynomial fit.
function [a2, a1, a0] = QuadraticRegression(x,y)
%
% QuadraticRegression uses quadratic least-squares approximation to fit a
% data by a 2nd-degree polynomial in the form y = a2*x^2 + a1*x + a0.
%
% [a2, a1, a0] = QuadraticRegression(x,y), where
%
% x, y are n-dimensional row or column vectors of data,
%
% a2, a1 and a0 are the coefficients that describe the quadratic fit.
%
n = length(x);
Sumx = sum(x); Sumy = sum(y);
Sumx2 = sum(x.^2); Sumx3 = sum(x.^3); Sumx4 = sum(x.^4);
Sumxy = sum(x.*y); Sumx2y = sum(x.*x.*y);
% Form the coefficient matrix and the vector of right-hand sides
A = [n Sumx Sumx2;Sumx Sumx2 Sumx3;Sumx2 Sumx3 Sumx4];
b = [Sumy;Sumxy;Sumx2y];
w = A\b; % Solve for the coefficients
a2 = w(3); a1 = w(2); a0 = w(1);
% Plot the data and the quadratic fit
xx = linspace(x(1),x(end)); % Generate 100 points for plotting purposes
p = zeros(100,1); % Pre-allocate
for i = 1:100
p(i) = a2*xx(i)^2 + a1*xx(i) + a0; % Calculate 100 points
end
plot(x,y,'o')
hold on
plot(xx,p)
end
MATLAB Built-In Function polyfit
A brief description of the MATLAB built-in function polyfit is given as:
POLYFIT Fit polynomial to data.
P = POLYFIT(X,Y,N) finds the coefficients of a polynomial P(X) of
degree N that fits the data Y best in a least-squares sense. P is
a row vector of length N + 1 containing the polynomial
coefficients in descending powers :
P(1)*X^N + P(2)*X^(N-1) + ⋯ + P(N)*X + P(N + 1).
EXAMPLE
Using the polyfit function, find and plot the second-degree polynomial that best fits the data x = 0, 0.3, 0.6, 0.9, 1.2 and y = 3.6, 4.8, 5.9, 7.6, 10.9. Apply the user-defined function QuadraticRegression and compare the results.
Solution
>> x = 0:0.3:1.2; y = [3.6 4.8 5.9 7.6 10.9];
>> P = polyfit(x,y,2)
P = 3.8095 1.2286 3.7657 % Coefficients of the second-deg polynomial fit
>> xi = linspace(0,1.2); % Generate 100 points for plotting purposes
>> yi = polyval(P,xi); % Evaluate the polynomial at these points
>> plot(xi,yi,x,y,'o')
Execution of QuadraticRegression yields:
>> [a2, a1, a0] = QuadraticRegression(x,y)
a2 =
3.8095
a1 =
1.2286
a0 =
3.7657
The coefficients of the second-degree polynomial are precisely those returned by polyfit.