CHAPTER 4: REGRESSION MODELS
Discussion Questions and Problems
4.1 What is the meaning of least squares in a regression model?
Least squares is the criterion used to fit a regression model: the regression line chosen is the one with the minimum sum of squared errors. Given a set of data points, they are first plotted on a graph, creating a scatter plot. Many lines could be drawn through these points to describe the relationship between the variables, but the best regression line is the one with the least total error, hence "least squares" regression. Why must the errors be squared? Error = Actual value − Predicted value. Since an error can be positive or negative, errors could cancel out when summed together. To avoid this, the errors are squared before summing.
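The least-squares slope and intercept can be computed directly. A minimal sketch, using the X/Y pairs from Problem 4-10 later in this chapter:

```python
# Least-squares sketch: fit Y = b0 + b1*X by minimizing the sum of squared
# errors. Data are the X/Y pairs from Problem 4-10.
x = [3, 4, 7, 6, 8, 5]
y = [3, 6, 7, 5, 10, 8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
# Sum of squared errors for the fitted line
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
print(b0, b1, sse)  # 1.0 1.0 12.0
```

Any other line through these points would produce a larger sum of squared errors than this one.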
4.2 Discuss the use of dummy variables in regression analysis.
In regression analysis it is often assumed that all the variables are quantitative, for example, how many years of experience an employee has or how old a house is. However, qualitative variables may also be examined, and this is what a dummy variable represents. A dummy variable can be used, for instance, to indicate whether an employee has a college degree or what condition a house is in at the time of sale. A dummy variable is therefore not measured on a numeric scale and is also called an indicator variable or binary variable. The number of dummy variables must be one less than the number of categories of the qualitative variable. For example, when deciding whether an employee with a college education will be better for the company overall:
X3 = 1 if the employee has a college degree
X3 = 0 otherwise
This variable can then be factored into the regression equation to determine whether employees with a college education are more productive for the company than those without one.
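A hedged sketch of this coding rule (the three-category "condition" variable is a made-up illustration, not from the text): a qualitative variable with k categories needs k − 1 dummy variables.

```python
# Dummy-variable coding sketch: a 3-category qualitative variable ("condition",
# a hypothetical example) is encoded with 2 dummy variables; the omitted
# category ("fair") is the baseline, coded (0, 0).
def encode(condition):
    x1 = 1 if condition == "excellent" else 0
    x2 = 1 if condition == "good" else 0
    return (x1, x2)

print(encode("excellent"))  # (1, 0)
print(encode("good"))       # (0, 1)
print(encode("fair"))       # (0, 0)
```

Using all k categories as dummies would make one column perfectly predictable from the others, which is why one category is always left out as the baseline.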
4.3 Discuss how the coefficient of determination and the coefficient of correlation are
related and how they are used in regression analysis.
Both the coefficient of determination and the coefficient of correlation measure the strength of the linear relationship between the variables. The coefficient of determination, r², is the proportion of the variability in Y that is explained by the regression equation. It ranges from 0 to 1, and the closer it is to 1 the stronger the relationship. The coefficient of correlation, r, ranges from −1 to +1 and can be found by taking the square root of the coefficient of determination. A positive r is associated with a positive slope, and a negative r with a negative slope.
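This relationship can be sketched with the SST/SSR values computed for Problem 4-13 later in this chapter:

```python
import math

# r^2 is the fraction of variability in Y explained by the regression;
# r is its square root, taking the sign of the slope (positive here).
sst = 998.0       # total sum of squares, Problem 4-13
ssr = 845.659     # regression sum of squares, Problem 4-13
r_squared = ssr / sst
r = math.sqrt(r_squared)
print(round(r_squared, 2), round(r, 2))  # 0.85 0.92
```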
4.4 Explain how a scatter diagram can be used to identify the type of regression to use.
A scatter diagram is a graph of the data. It can be used to find the relationship between the variables. From the resulting plot, the appropriate type of regression can be identified: whether the relationship is linear or nonlinear, or whether there is no apparent relationship at all. Specifics such as whether the relationship is positive or negative can also be assessed from the scatter diagram.
4.5
When it is not clear which variables should be included in or excluded from a regression model, the adjusted r² should be used. It helps in determining whether adding more variables to the model is useful. It gives a better picture than the ordinary r² because it starts to fall when more than the necessary number of variables are added to the model, thus helping decide how many variables should be used.
4.6
The F test is used to determine whether there is a statistically significant relationship between the dependent variable and the independent variable(s). A large F value (with a correspondingly small significance level) shows that the model is significant and there is a relationship between the independent and dependent variables, while a small F value shows that the model is not significant.
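The decision rule can be sketched as follows (the F values here are illustrative; the critical value comes from an F table for the chosen significance level):

```python
# F-test decision rule sketch: a model is judged significant when the
# computed F statistic exceeds the critical value from an F table.
def is_significant(f_stat, f_critical):
    return f_stat > f_critical

print(is_significant(38.9, 5.59))  # True  -> significant model
print(is_significant(2.1, 5.59))   # False -> not significant
```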
4.10
a)
[Scatter plot of demand for bass drums vs. number of Green Shades TV appearances]
b)
Y     X     Ŷ = 1 + X   (Y − Ȳ)²     (Y − Ŷ)²    (Ŷ − Ȳ)²
3     3     4           12.25         1            6.25
6     4     5            0.25         1            2.25
7     7     8            0.25         1            2.25
5     6     7            2.25         4            0.25
10    8     9           12.25         1            6.25
8     5     6            2.25         4            0.25
Ȳ = 6.5                 SST = 29.5   SSE = 12.0   SSR = 17.5
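The three sums of squares above can be checked with a short computation (the fitted values use the regression line Ŷ = 1 + X from this problem):

```python
# Sum-of-squares decomposition for Problem 4-10: SST = SSR + SSE.
x = [3, 4, 7, 6, 8, 5]
y = [3, 6, 7, 5, 10, 8]
y_bar = sum(y) / len(y)           # 6.5
y_hat = [1 + xi for xi in x]      # fitted values from Y-hat = 1 + X
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
print(sst, sse, ssr)  # 29.5 12.0 17.5
```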
c)
If the Green Shades performed on TV 6 times during the last month, then using the regression model Ŷ = 1 + X, the demand for bass drums is Ŷ = 1 + 6 = 7 drum sets.
4.12
             df    SS     MS     F      Significance F
Regression    1    17.5   17.5   5.83   0.073
Residual      4    12.0    3.0
Total         5    29.5

            Coefficient   Standard Error   t Stat   p-value
Intercept   1             2.385            0.419    0.697
Sales       1             0.414            2.415    0.073

The regression line is Ŷ = 1 + X. The relationship is not statistically significant at the 0.05 level (p = 0.073 > 0.05).
4.13
A)
Regression model: b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²

Y      X      (x − x̄)(y − ȳ)   (x − x̄)²
93     98       236.444          285.23
78     77         4.111           16.901
84     88        34.444           47.457
73     80         6.667            1.235
84     96        74.444          221.679
64     61       301.667          404.457
64     66       226.667          228.346
95     95       222.222          192.901
76     69        36.333          146.679
711    730     1143             1544.9

ΣX = 730 (first-test grades), ΣY = 711 (final averages)
b1 = 1143 / 1544.9 = 0.74
b0 = (711/9) − 0.74(730/9) = 18.99
Ŷ = 18.99 + 0.74x
B)
Ŷ = 18.99 + 0.74(83) = 80.41
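The slope, intercept, and prediction above can be reproduced from the raw grades:

```python
# Least-squares fit for Problem 4-13: first-test grade (x) vs. final average (y).
x = [98, 77, 88, 80, 96, 61, 66, 95, 69]
y = [93, 78, 84, 73, 84, 64, 64, 95, 76]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
print(round(b1, 2), round(b0, 2))  # 0.74 18.99
# Prediction for a first-test grade of 83; ~80.4 (the text's 80.41 uses
# the rounded coefficients).
print(round(b0 + b1 * 83, 2))
```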
C)
(y − ȳ)²      (ŷ − ȳ)²
196           156.135
  1             9.252
 25            25.977
 36             0.676
 25           121.345
225           221.396
225           124.994
256           105.592
  9            80.291
SST = 998     SSR = 845.659

Given the formula SST = Σ(y − ȳ)², SST equals 998. Given the formula SSR = Σ(ŷ − ȳ)², SSR equals 845.659.
r² = SSR/SST = 845.659/998 = 0.8473 ≈ 0.85
r = √0.8473 = 0.92048 ≈ 0.92
4.14
MSE= SSE/(n-k-1)
(y − ȳ)²    (ŷ − ȳ)²        (y − ŷ)²
196         156.135           2.2617
  1           9.252           4.1689
 25          25.977           0.0009
 36           0.676          28.8106
 25         121.345          36.1959
225         221.396           0.0143
225         124.994          14.5871
256         105.592          32.7596
  9          80.291          35.5335
SST = 998   SSR = 845.659   SSE = 152.341
MSE= 152.341/(9-1-1) = 21.76
MSR=SSR/k
MSR= 845.659/1= 845.659
F= 845.659/21.76= 38.9
The critical value is F = 5.59. Since 38.9 > 5.59, the result is statistically significant: there is a relationship between the first test grade and the final average from Problem 4.13.
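The MSE, MSR, and F computation above can be sketched as:

```python
# F statistic for Problem 4-14: F = MSR / MSE, with n = 9 observations
# and k = 1 independent variable.
sst, ssr = 998.0, 845.659
sse = sst - ssr                  # 152.341
n, k = 9, 1
mse = sse / (n - k - 1)          # mean squared error
msr = ssr / k                    # mean square due to regression
f = msr / mse
print(round(mse, 2), round(f, 1))  # 21.76 38.9
```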
4.16
The formula is Ŷ = 13,473 + 37.65x
A)
x = 1,860
Ŷ = 13,473 + 37.65(1,860) = $83,502
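A one-line sketch of this prediction:

```python
# Predicted selling price from the model Y-hat = 13,473 + 37.65x,
# where x is square footage.
def predicted_price(sq_ft):
    return 13473 + 37.65 * sq_ft

print(round(predicted_price(1860), 2))  # 83502.0
```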
B)
The predicted selling price for a house of 1,860 square feet is $83,502. The prediction is based on other homes sold within the neighborhood, not just the square footage, so a particular home may sell for either below or above this predicted price.
C) Other quantitative variables may include the location and the square footage of the lot of the house being sold. The number of bedrooms, the number of bathrooms, and whether the house is one story or two stories can also be included in the model.
D) With a coefficient of correlation of r = 0.63, the coefficient of determination for the model is r² = 0.63² = 0.3969.
4.17
A)
Distance traveled: x2 = 300 miles
Days out of town: x1 = 5 days
Ŷ = $90.00 + $48.50(5) + $0.40(300)
Expected expenses = $452.50
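A short sketch of the expense model above:

```python
# Travel-expense model: expenses = $90.00 + $48.50*(days) + $0.40*(miles).
def expected_expenses(days, miles):
    return 90.00 + 48.50 * days + 0.40 * miles

print(round(expected_expenses(5, 300), 2))  # 452.5
```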
B)
The reimbursement request was for $685, while the expected expenses from the model were $452.50. Since the request exceeds the model's estimate, Williams should provide receipts for the 5-day trip to justify the high expenses.
C)
Travel expenses can vary across several variables, including food, gas, rental vehicles, and hotels. Business trips may also involve meetings, conferences, or events. Only 46% of the variation in cost is accounted for by the proposed model; it is not efficient because the remaining variation is due to other variables not included.
4.24
Use the data in Problem 4-22 and develop a regression model to predict
selling price based on the square footage, number of bedrooms, and age.
Use this to predict the selling price of a 10-year-old, 2,000-square-foot
house with 3 bedrooms.
SUMMARY OUTPUT (square footage, bedrooms, age)

Regression Statistics
Multiple R          0.941348
R Square            0.886137
Adjusted R Square   0.859861
Standard Error      13439.77
Observations        17

ANOVA
             df    SS         MS         F        Significance F
Regression    3    1.83E+10   6.09E+09   33.724   2.12E-06
Residual     13    2.35E+09   1.81E+08
Total        16    2.06E+10

             Coefficients   Standard Error   t Stat    P-value    Lower 95%    Upper 95%
Intercept    82185.65       23008.77          3.5719   0.0034      32478.23    131893.1
Sq Footage   25.9408        9.5830            2.7069   0.0180       5.2379      46.6437
Bedrooms     -2151.75       8826.09          -0.2438   0.8112     -21219.39    16915.86
Age Years    -1711.55       327.19           -5.2311   0.0002      -2418.39    -1004.70
SUMMARY OUTPUT (square footage and age only)

Regression Statistics
Multiple R          0.941072
R Square            0.885616
Adjusted R Square   0.869276
Standard Error      12980.45
Observations        17

ANOVA
             df    SS         MS         F         Significance F
Regression    2    1.83E+10   9.13E+09   54.1975   2.56E-07
Residual     14    2.36E+09   1.68E+08
Total        16    2.06E+10

             Coefficients   Standard Error   t Stat    P-value    Lower 95%    Upper 95%
Intercept    79391.75       19269.82          4.1200   0.0010     38062.11     120721.4
Sq Footage   24.3190        6.6624            3.6502   0.0026     10.0296       38.6085
Age Years    -1712.22       315.998          -5.4184   9.06E-05   -2389.96     -1034.47
The p-value for bedrooms is 0.81 > 0.15, so I will exclude it.
The new formula is: Selling price = 79391 + 24(sq ft) − 1712(years)
= 79391 + (24 × 2000) − (1712 × 10)
= 79391 + 48000 − 17120
= $110,271
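The prediction from the reduced model can be sketched as (using the same rounded coefficients as above):

```python
# Reduced model for Problem 4-24 (bedrooms dropped), rounded coefficients.
def selling_price(sq_ft, age_years):
    return 79391 + 24 * sq_ft - 1712 * age_years

# 10-year-old, 2,000-square-foot house
print(selling_price(2000, 10))  # 110271
```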
4.27
A sample of 20 automobiles was taken, and the miles per gallon (MPG),
horsepower, and the total weight were recorded. Develop a linear
regression model to predict MPG, using horsepower as the only independent
variable. Develop another model with weight as the independent variable.
Which of these two models is better? Explain.
The horsepower vs. MPG model is better in that it is more predictive, because its r² value of 0.7702 (compared with 0.7326 for the weight model) is closer to 1. More of the points lie close to the regression line, meaning there is less error with this set of data.
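The comparison rule used here (pick the single-variable model with the higher r²) can be sketched as:

```python
# With one independent variable per model, the model with the higher r^2
# explains more of the variability in MPG.
r_squared = {"horsepower": 0.7702, "weight": 0.7326}
best = max(r_squared, key=r_squared.get)
print(best)  # horsepower
```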
4.29
Use the data in problem 4-27 to find the best quadratic regression model.
(There is more than one to consider.) How does this compare to the models
in 4-27?
a.
Based on the r² values, the horsepower quadratic appears to be more predictive, with less room for error, since it has the highest r² value, 0.8096, which is the closest to 1.
Horsepower quadratic: Ŷ = 66.84 − 0.5769x + 0.0016x²
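Evaluating the quadratic at a sample horsepower value (100 hp is an arbitrary illustration, not from the problem data):

```python
# Horsepower quadratic with the rounded coefficients quoted above.
def mpg_quadratic(hp):
    return 66.84 - 0.5769 * hp + 0.0016 * hp ** 2

print(round(mpg_quadratic(100), 2))  # 25.15
```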
SUMMARY OUTPUT (horsepower quadratic)

Regression Statistics
Multiple R          0.899791
R Square            0.809624
Adjusted R Square   0.785827
Standard Error      3.960713
Observations        19

ANOVA
             df    SS         MS         F         Significance F
Regression    2    1067.425   533.7125   34.0221   1.73E-06
Residual     16     250.996    15.6872
Total        18    1318.421

                 Coefficients   Standard Error   t Stat    P-value    Lower 95%    Upper 95%
Intercept        66.8404        10.2044           6.5502   6.68E-06   45.2081      88.4728
Horsepower       -0.5769        0.2135           -2.7025   0.0157     -1.0295      -0.1244
Horsepower^2     0.001603       0.00105           1.5263   0.1465     -0.000623     0.003829
Weight Quadratic
SUMMARY OUTPUT (weight quadratic)

Regression Statistics
Multiple R          0.880033
R Square            0.774458
Adjusted R Square   0.746265
Standard Error      4.311027
Observations        19

ANOVA
             df    SS          MS         F         Significance F
Regression    2    1021.062    510.5309   27.4701   6.7E-06
Residual     16     297.3593    18.5850
Total        18    1318.421

            Coefficients   Standard Error   t Stat    P-value    Lower 95%    Upper 95%
Intercept   84.2072        14.6618           5.7433   3.02E-05   53.1255      115.2889
Weight      -0.030712      0.010187         -3.0148   0.0082     -0.05231     -0.00912
Weight^2    3.45E-06       1.7E-06           2.0373   0.0585     -1.4E-07     7.05E-06
Ŷ = 84.207 − 0.0307x + 0.00000345x²
Case Study
-The first concern is the maintenance cost: as the age of an aircraft increases, so does its cost of maintenance.
-Maintenance should be examined for both airlines, as maintenance costs vary greatly between them.
-The data provided does not yield conclusive results; maintenance cost seems to depend on the airline rather than on the age of the aircraft.
-Northern Airline appears more efficient, as its cost of maintenance shows little variation from year to year.
-Southeast Airline shows a steady increase in engine and airframe maintenance costs.
Based on the overall data, Southeast Airline seems more efficient at repairs that require immediate attention. Northern Airline, however, with its higher costs, is better suited for preventive maintenance.