Goldsman ISyE 6739 Linear Regression
REGRESSION
12.1 Simple Linear Regression Model
12.2 Fitting the Regression Line
12.3 Inferences on the Slope Parameter β₁
12.1 Simple Linear Regression Model
Suppose we have a data set with the following paired
observations:
(x1, y1), (x2, y2), . . . , (xn, yn)
Example:
xi = height of person i
yi = weight of person i
Can we make a model expressing yi as a function of
xi ?
Estimate yᵢ for fixed xᵢ. Let's model this with the simple linear regression equation

    yᵢ = β₀ + β₁xᵢ + εᵢ,

where β₀ and β₁ are unknown constants and the error terms are usually assumed to be

    ε₁, . . . , εₙ iid N(0, σ²), so that yᵢ ∼ N(β₀ + β₁xᵢ, σ²).
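To make the model concrete, here is a minimal Python sketch (my own illustration, not from the slides) that simulates data from yᵢ = β₀ + β₁xᵢ + εᵢ with iid normal errors; the parameter values are made up for the demo.

```python
import random

def simulate_slr(beta0, beta1, sigma, xs, seed=1):
    """Draw y_i = beta0 + beta1*x_i + eps_i, with eps_i iid N(0, sigma^2)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical parameter values, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = simulate_slr(beta0=2.0, beta1=0.5, sigma=0.1, xs=xs)
```

With a small σ the simulated points hug the line 2.0 + 0.5x; increasing `sigma` spreads them out, exactly the contrast the next figure shows.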
[Figure: data scattered about the line y = β₀ + β₁x, once with high σ² and once with low σ².]
Warning! Look at the data before you fit a line to it:

[Figure: a scatterplot with clear curvature. It doesn't look very linear!]
Example: monthly production and electric usage at a car plant.

    Month   Production xᵢ   Electric Usage yᵢ
            ($ million)     (million kWh)
    Jan     4.5             2.5
    Feb     3.6             2.3
    Mar     4.3             2.5
    Apr     5.1             2.8
    May     5.6             3.0
    Jun     5.0             3.1
    Jul     5.3             3.2
    Aug     5.8             3.5
    Sep     4.7             3.0
    Oct     5.6             3.3
    Nov     4.9             2.7
    Dec     4.2             2.5
[Figure: scatterplot of yᵢ (2.2 to 3.4) versus xᵢ (3.5 to 6.0) for the twelve monthly observations.]

Great... but how do you fit the line?
12.2 Fitting the Regression Line

Fit the regression line y = β₀ + β₁x to the data

    (x₁, y₁), . . . , (xₙ, yₙ)

by finding the best match between the line and the data. The best choice of β₀, β₁ is the one that minimizes

    Q = Σᵢ₌₁ⁿ (yᵢ − (β₀ + β₁xᵢ))² = Σᵢ₌₁ⁿ εᵢ².
This is called the least-squares fit. Let's solve:

    ∂Q/∂β₀ = −2 Σᵢ (yᵢ − (β₀ + β₁xᵢ)) = 0
    ∂Q/∂β₁ = −2 Σᵢ xᵢ(yᵢ − (β₀ + β₁xᵢ)) = 0

Setting the derivatives to zero gives the normal equations:

    Σᵢ yᵢ = nβ̂₀ + β̂₁ Σᵢ xᵢ
    Σᵢ xᵢyᵢ = β̂₀ Σᵢ xᵢ + β̂₁ Σᵢ xᵢ²

After a little algebra, we get

    β̂₁ = [n Σᵢ xᵢyᵢ − (Σᵢ xᵢ)(Σᵢ yᵢ)] / [n Σᵢ xᵢ² − (Σᵢ xᵢ)²]

    β̂₀ = ȳ − β̂₁x̄, where ȳ = (1/n) Σᵢ yᵢ and x̄ = (1/n) Σᵢ xᵢ.
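These closed-form solutions are easy to code directly; here is a small Python sketch (my own illustration; `fit_line` is a hypothetical helper name, not from the slides).

```python
def fit_line(xs, ys):
    """Least-squares estimates from the normal equations:
    b1 = [n*sum(xy) - sum(x)*sum(y)] / [n*sum(x^2) - sum(x)^2],  b0 = ybar - b1*xbar.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    b1 = (n * sxy - sx * sy) / (n * sx2 - sx * sx)
    b0 = sy / n - b1 * sx / n
    return b0, b1

# On noise-free data the fit recovers the line exactly: y = 1 + 2x.
b0, b1 = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

Feeding it points that lie exactly on a line is a handy sanity check, since then β̂₀ and β̂₁ must equal the true intercept and slope.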
Let's introduce some more notation:

    Sₓₓ = Σᵢ (xᵢ − x̄)² = Σᵢ xᵢ² − n x̄² = Σᵢ xᵢ² − (Σᵢ xᵢ)²/n

    Sₓᵧ = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) = Σᵢ xᵢyᵢ − n x̄ȳ = Σᵢ xᵢyᵢ − (Σᵢ xᵢ)(Σᵢ yᵢ)/n

These are called sums of squares.
Then, after a little more algebra, we can write

    β̂₁ = Sₓᵧ / Sₓₓ.

Fact: If the εᵢs are iid N(0, σ²), it can be shown that β̂₀ and β̂₁ are the MLEs for β₀ and β₁, respectively. (See the text for an easy proof.)

Anyhow, the fitted regression line is

    ŷ = β̂₀ + β̂₁x.
Fix a specific value x of the explanatory variable; the equation then gives a fitted value ŷ|x = β̂₀ + β̂₁x for the dependent variable y.

[Figure: the fitted line ŷ = β̂₀ + β̂₁x, with the fitted value ŷ|x marked above the point x.]
For the actual data points xᵢ, the fitted values are ŷᵢ = β̂₀ + β̂₁xᵢ.

    observed values: yᵢ = β₀ + β₁xᵢ + εᵢ
    fitted values:   ŷᵢ = β̂₀ + β̂₁xᵢ

Let's estimate the error variation σ² by considering the deviations between the yᵢ and the ŷᵢ:

    SSE = Σᵢ (yᵢ − ŷᵢ)² = Σᵢ (yᵢ − (β̂₀ + β̂₁xᵢ))²
        = Σᵢ yᵢ² − β̂₀ Σᵢ yᵢ − β̂₁ Σᵢ xᵢyᵢ.
It turns out that σ̂² = SSE/(n − 2) is a good estimator for σ².

Example (car plant energy usage): n = 12, Σᵢ xᵢ = 58.62, Σᵢ yᵢ = 34.15, Σᵢ xᵢ² = 291.231, Σᵢ yᵢ² = 98.697, Σᵢ xᵢyᵢ = 169.253, so

    β̂₁ = 0.49883,  β̂₀ = 0.4090,

and the fitted regression line is

    ŷ = 0.409 + 0.499x,  e.g., ŷ|5.5 = 3.1535.

What about something like ŷ|10.0? Beware: x = 10.0 lies far outside the range of the observed xᵢ, so the fitted line may not apply there.
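The estimates above can be recomputed directly from the summary sums; this quick Python check (my own, not part of the slides) reproduces them to within rounding.

```python
# Car-plant example: recompute beta1-hat and beta0-hat from the summary sums.
n = 12
sum_x, sum_y = 58.62, 34.15
sum_x2, sum_xy = 291.231, 169.253

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n
yhat_55 = b0 + b1 * 5.5   # fitted value at x = 5.5
```

The results agree with the slide's 0.4090 and 0.49883 to about three decimal places (small differences are rounding in the quoted sums).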
12.3 Inferences on the Slope Parameter β₁

Recall that β̂₁ = Sₓᵧ/Sₓₓ, where Sₓₓ = Σᵢ (xᵢ − x̄)² and

    Sₓᵧ = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) = Σᵢ (xᵢ − x̄)yᵢ − ȳ Σᵢ (xᵢ − x̄) = Σᵢ (xᵢ − x̄)yᵢ,

since Σᵢ (xᵢ − x̄) = 0.
Since the yᵢs are independent with yᵢ ∼ N(β₀ + β₁xᵢ, σ²) (and the xᵢs are constants), we have

    E[β̂₁] = (1/Sₓₓ) E[Sₓᵧ] = (1/Sₓₓ) Σᵢ (xᵢ − x̄) E[yᵢ] = (1/Sₓₓ) Σᵢ (xᵢ − x̄)(β₀ + β₁xᵢ)

          = (1/Sₓₓ) [β₀ Σᵢ (xᵢ − x̄) + β₁ Σᵢ (xᵢ − x̄)xᵢ]    (first sum is 0)

          = (β₁/Sₓₓ) Σᵢ (xᵢ − x̄)xᵢ = (β₁/Sₓₓ) (Σᵢ xᵢ² − n x̄²) = β₁ Sₓₓ/Sₓₓ = β₁.

So β̂₁ is an unbiased estimator of β₁.
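Unbiasedness can also be seen empirically. This Monte Carlo sketch (assumed parameter values, my own illustration) averages β̂₁ = Sₓᵧ/Sₓₓ over many simulated data sets; the average should land very close to the true β₁.

```python
import random

def slope_estimate(xs, ys):
    """beta1-hat = Sxy / Sxx."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sxx

rng = random.Random(0)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
beta0, beta1, sigma = 1.0, 0.7, 0.5   # true values, chosen for the demo
reps = 2000
estimates = [
    slope_estimate(xs, [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs])
    for _ in range(reps)
]
mean_b1 = sum(estimates) / reps   # should be close to beta1 = 0.7
```

The spread of the 2000 estimates around 0.7 also previews the next slide: their sample variance is close to σ²/Sₓₓ.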
Further, since β̂₁ is a linear combination of independent normals, β̂₁ is itself normal. We can also derive

    Var(β̂₁) = (1/Sₓₓ²) Var(Sₓᵧ) = (1/Sₓₓ²) Σᵢ (xᵢ − x̄)² Var(yᵢ) = σ²/Sₓₓ.

Thus, β̂₁ ∼ N(β₁, σ²/Sₓₓ).
While we're at it, we can do the same kind of thing with the intercept parameter β₀:

    β̂₀ = ȳ − β̂₁x̄.

Thus, E[β̂₀] = E[ȳ] − x̄ E[β̂₁] = β₀ + β₁x̄ − x̄β₁ = β₀. Similar to before, since β̂₀ is a linear combination of independent normals, it is also normal. Finally,

    Var(β̂₀) = (Σᵢ xᵢ² / (nSₓₓ)) σ².
Proof: First,

    Cov(ȳ, β̂₁) = (1/Sₓₓ) Cov(ȳ, Σᵢ (xᵢ − x̄)yᵢ)
                = (1/Sₓₓ) Σᵢ (xᵢ − x̄) Cov(ȳ, yᵢ)
                = (1/Sₓₓ) Σᵢ (xᵢ − x̄) (σ²/n) = 0.

Then

    Var(β̂₀) = Var(ȳ − β̂₁x̄)
             = Var(ȳ) + x̄² Var(β̂₁) − 2x̄ Cov(ȳ, β̂₁)    (last term is 0)
             = σ²/n + x̄² σ²/Sₓₓ
             = σ² (Sₓₓ + n x̄²) / (nSₓₓ).

Thus, β̂₀ ∼ N(β₀, σ² Σᵢ xᵢ² / (nSₓₓ)), since Sₓₓ + n x̄² = Σᵢ xᵢ².
Back to β̂₁ ∼ N(β₁, σ²/Sₓₓ) . . .

    (β̂₁ − β₁) / √(σ²/Sₓₓ) ∼ N(0, 1).

It turns out that:

    (1) SSE/σ² = (n − 2)σ̂²/σ² ∼ χ²(n − 2);
    (2) σ̂² is independent of β̂₁.
Therefore,

    [(β̂₁ − β₁) / (σ/√Sₓₓ)] / √(σ̂²/σ²)  =  N(0, 1) / √(χ²(n − 2)/(n − 2))  ∼ t(n − 2),

i.e.,

    (β̂₁ − β₁) / (σ̂/√Sₓₓ) ∼ t(n − 2).
[Figure: the t(n − 2) density, with critical values −t_{α/2,n−2} and t_{α/2,n−2} cutting off α/2 in each tail.]
Two-sided confidence interval for β₁:

    1 − α = Pr(−t_{α/2,n−2} ≤ (β̂₁ − β₁)/(σ̂/√Sₓₓ) ≤ t_{α/2,n−2})
          = Pr(β̂₁ − t_{α/2,n−2} σ̂/√Sₓₓ ≤ β₁ ≤ β̂₁ + t_{α/2,n−2} σ̂/√Sₓₓ).

One-sided CIs for β₁:

    β₁ ∈ (−∞, β̂₁ + t_{α,n−2} σ̂/√Sₓₓ)
    β₁ ∈ (β̂₁ − t_{α,n−2} σ̂/√Sₓₓ, ∞)
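Putting the pieces together for the car-plant data, here is a Python sketch of the two-sided 95% CI for β₁ (my own illustration; the critical value t_{0.025,10} ≈ 2.228 is an assumed table lookup rather than something computed here).

```python
import math

# Summary sums for the car-plant example (Section 12.2).
n = 12
sum_x, sum_y = 58.62, 34.15
sum_x2, sum_y2, sum_xy = 291.231, 98.697, 169.253

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n
sxx = sum_x2 - sum_x ** 2 / n
sse = sum_y2 - b0 * sum_y - b1 * sum_xy          # SSE via the shortcut formula
sigma2_hat = sse / (n - 2)                       # sigma^2-hat = SSE/(n-2)

t_crit = 2.228                                   # assumed t_{0.025,10} from a t table
half_width = t_crit * math.sqrt(sigma2_hat / sxx)
lo, hi = b1 - half_width, b1 + half_width        # 95% CI for beta1
```

Since the resulting interval excludes 0, the data give fairly strong evidence that production level really does affect electric usage.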