Selvanathan 6e - 19 - PPT

CHAPTER 19

Multiple regression
Chapter outline
19.1 Model and required conditions
19.2 Estimating the coefficients and assessing the model
19.3 Regression diagnostics – II
19.4 Regression diagnostics – III (time series)
Learning objectives
LO1 Develop a multiple regression model, use a computer and statistical software to estimate the model, and interpret the estimated coefficients.
LO2 Understand the adjusted coefficient of determination and assess the fit of the model.
LO3 Test the significance of the individual coefficients and the overall utility of the model.
LO4 Use the estimated regression model to make predictions.
LO5 Perform diagnostic checks of the regression model assumptions.
19.5

Introduction
The simple linear regression model was used to analyse how one numerical variable (the dependent variable y) is related to one other numerical variable (the independent variable x).

Multiple regression allows for any number of independent variables.

We expect to develop models that fit the data better than would a simple linear regression model.
19.6

19.1 The model and required conditions

We now assume we have k independent variables potentially related to the one dependent variable. This relationship is represented by the first-order linear model:

y = β0 + β1x1 + β2x2 + … + βkxk + ε

where y is the dependent variable, x1, x2, …, xk are the independent variables, β0, β1, …, βk are the coefficients and ε is the error variable.

In the one-variable, two-dimensional case we drew a regression line; here we imagine a response surface.
19.7

The simple linear regression model allows for one independent variable x:

y = β0 + β1x + ε    (the deterministic part, y = β0 + β1x, is a straight line)

The multiple linear regression model allows for more than one independent variable, for example:

y = β0 + β1x1 + β2x2 + ε

Note how the straight line becomes a plane: the deterministic part, y = β0 + β1x1 + β2x2, is a response surface over the (x1, x2) space.
19.8

Required conditions for the error variable ε

(1) The mean of ε is zero: E(ε) = 0.
(2) The standard deviation of ε is a constant (σε).
(3) The errors are independent.
(4) The errors are independent of the independent variable x.
(5) The error ε is normally distributed.

These conditions are required in order to
• estimate the model coefficients with desirable properties
• test hypotheses about the model coefficients
• assess the resulting model.
19.9

19.2 Estimating the coefficients and assessing the model…

Estimating the model…
The sample regression equation is expressed as:

ŷ = β̂0 + β̂1x1 + β̂2x2 + … + β̂kxk

We will use computer output to assess the model.

Assessing the model…
• How well does the model fit the data?
• Is it useful?
• Are any required conditions violated?
19.10

Estimating the coefficients and assessing the model…

Employ the model…

ŷ = β̂0 + β̂1x1 + β̂2x2 + … + β̂kxk

• Interpreting the coefficients
• Making predictions using the prediction equation
• Estimating the expected value of the dependent variable.
19.11

Regression analysis steps…


1. Use a computer and software to generate the
estimated coefficients and the statistics required to
assess the model.
2. Diagnose violations of required conditions. If there
are problems, attempt to remedy them.
3. Assess the fitness of the model.
• standard error of estimate
• coefficient of determination
• F-test of the analysis of variance.
4. If 1, 2 and 3 are OK, use the model to predict or
estimate the expected value of the dependent
variable.
19.12

Example 1 - Selecting sites for a motel chain (Example 19.1, p770)

The Holiday Inns group is planning an expansion. The management wishes to predict which sites are likely to be profitable. Several areas in which predictors of profitability can be identified are:
• competition
• market awareness
• demand generators
• demographics
• physical quality.
19.13

Example 1…
Dependent variable: Margin (profitability).
Independent variables:
• Competition – Rooms: number of hotel/motel rooms within 5 km of the site
• Market awareness – Nearest: distance to the nearest Holiday Inn
• Customers (demand generators) – Office: office space; University: university enrolment
• Community (demographics) – Income: median household income
• Physical – Distance: distance to downtown.
19.14

Example 1…

Data were collected from 100 randomly selected Holiday Inns and run for the following suggested model:

Margin = β0 + β1Rooms + β2Nearest + β3Office + β4Enrolment + β5Income + β6Distance + ε
19.15

Excel output

This is the sample regression equation (sometimes called the prediction equation):

MARGIN = 38.664 – 0.0076ROOMS + 1.656NEAREST + 0.198OFFICE + 0.213ENROLMT + 0.366INCOME – 0.142DISTTWN

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.7231
R Square            0.5229
Adjusted R Square   0.4921
Standard Error      5.5248
Observations        100

ANOVA (let us assess this equation)
             df   SS        MS       F       Significance F
Regression   6    3110.80   518.47   16.99   0.00
Residual     93   2838.66   30.52
Total        99   5949.46

                    Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept           38.6643        6.9690           5.5480    0.0000    24.8252     52.5034
Number (x1)         -0.0076        0.0013           -6.0586   0.0000    -0.0101     -0.0051
Nearest (x2)        1.6564         0.6345           2.6105    0.0105    0.3964      2.9164
Office Space (x3)   0.1980         0.0342           5.7933    0.0000    0.1302      0.2659
Enrollment (x4)     0.2131         0.1338           1.5921    0.1148    -0.0527     0.4788
Income (x5)         0.3660         0.1271           2.8803    0.0049    0.1137      0.6184
Distance (x6)       -0.1424        0.1119           -1.2725   0.2064    -0.3647     0.0798
19.16
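The same estimation can be reproduced outside Excel. Below is a minimal sketch using Python's statsmodels, assuming the Example 1 data sit in a file named hotels.csv with columns Margin, Rooms, Nearest, Office, Enrolment, Income and Distance (hypothetical file and column names).

import pandas as pd
import statsmodels.api as sm

# Hypothetical file and column names for the Example 1 data set
df = pd.read_csv("hotels.csv")
X = sm.add_constant(df[["Rooms", "Nearest", "Office", "Enrolment", "Income", "Distance"]])  # adds the intercept term
y = df["Margin"]

model = sm.OLS(y, X).fit()   # least-squares estimation of the coefficients
print(model.summary())       # coefficients, R^2, adjusted R^2 and the ANOVA F-test

The summary() table plays the same role as the Excel SUMMARY OUTPUT above.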

Model Assessment…

We will assess the model in three ways:


• Standard error of estimate,
• Coefficient of determination, and
• F-test of the analysis of variance.
19.17

Standard error of estimate…

In multiple regression, the standard error of estimate is defined as:

sε = √( SSE / (n – k – 1) )

where n is the sample size and k is the number of independent variables in the model. We compare this value with the mean value of y:

sε = 5.5248 compared to ȳ = 45.739

It seems the standard error of estimate is not particularly small. What can we conclude?
19.18
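As a quick worked check, sε can be recomputed from the SSE and degrees of freedom reported in the ANOVA table above:

import math

SSE = 2838.66              # residual sum of squares from the ANOVA table
n, k = 100, 6              # sample size and number of independent variables
s_e = math.sqrt(SSE / (n - k - 1))
print(round(s_e, 4))       # ~5.5248, matching 'Standard Error' in the Excel output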

Coefficient of determination…
Again, the coefficient of determination is defined as:

R² = 1 – SSE/SST = SSR/SST

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.7231
R Square            0.5229
Adjusted R Square   0.4921
Standard Error      5.5248
Observations        100

This means that 52.29% of the variation in the operating margin is explained by the six independent variables, but 47.71% remains unexplained.
19.19

Adjusted R² value…

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.7231
R Square            0.5229
Adjusted R Square   0.4921   ← What's this?
Standard Error      5.5248
Observations        100

The 'adjusted' R² is the coefficient of determination adjusted for degrees of freedom. It takes into account the sample size n and the number of independent variables k, and is given by:

Adjusted R² = 1 – [SSE/(n – k – 1)] / [SST/(n – 1)] = 1 – (1 – R²)(n – 1)/(n – k – 1)
19.20
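Plugging the Example 1 values into this formula reproduces the Excel figure (a small worked check):

R2, n, k = 0.5229, 100, 6
adj_R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)
print(round(adj_R2, 4))    # ~0.4921, matching 'Adjusted R Square' in the output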

Testing the validity of the model…

In a multiple regression model (i.e. more than one independent variable), we utilise an analysis of variance technique to test the overall validity/utility of the model. Here's the idea:

H0: β1 = β2 = … = βk = 0
HA: At least one βi is not equal to zero.

If the null hypothesis is true, none of the independent variables is linearly related to y, and so the model is invalid. If at least one βi ≠ 0, the model does have some validity and it is useful.
19.21

Testing the validity of the model…

To test these hypotheses we perform an analysis of variance procedure.

The F-test
• Construct the F-statistic: F = MSR/MSE, where MSR = SSR/k and MSE = SSE/(n – k – 1).
• Rejection region: F > Fα,k,n–k–1 (the required conditions must be satisfied).

SST = [variation in y] = SSR + SSE. A large F results from a large SSR; in that case much of the variation in y is explained by the regression model, the null hypothesis is rejected, and the model is useful. A worked check of this calculation follows.
19.22
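The F statistic and a critical value can be checked directly from the ANOVA components (a sketch using scipy for the F distribution; the numbers are those reported for Example 1):

from scipy import stats

SSR, SSE = 3110.80, 2838.66
n, k = 100, 6
MSR = SSR / k                              # 518.47
MSE = SSE / (n - k - 1)                    # 30.52
F = MSR / MSE                              # ~16.99
F_crit = stats.f.ppf(0.95, k, n - k - 1)   # critical value at alpha = 0.05 (about 2.2; the slides quote 2.17)
print(round(F, 2), round(F_crit, 2))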

Testing the validity of the model…

ANOVA table for regression analysis:

Source of variation   Degrees of freedom   Sums of squares   Mean squares            F-statistic
Regression            k                    SSR               MSR = SSR/k             F = MSR/MSE
Error                 n – k – 1            SSE               MSE = SSE/(n – k – 1)
Total                 n – 1                SST

A large value of F indicates that most of the variation in y is explained by the regression model and that the model is valid or useful. A small value of F indicates that most of the variation in y is unexplained.
19.23

Example 1…

Excel provides the following ANOVA results. The SS column contains SSR, SSE and SST, the MS column contains MSR and MSE, F = MSR/MSE, and 'Significance F' is the p-value of the F-test.

ANOVA
             df   SS        MS       F       Significance F
Regression   6    3110.80   518.47   16.99   0.00
Residual     93   2838.66   30.52
Total        99   5949.46
19.24

Example 1…
Excel provides the following ANOVA results:

ANOVA
             df   SS        MS       F       Significance F
Regression   6    3110.80   518.47   16.99   0.00
Residual     93   2838.66   30.52
Total        99   5949.46

Fα,k,n–k–1 = F0.05,6,100–6–1 = 2.17

F = 16.99 > 2.17

Also, the p-value (Significance F) = 0.00. Clearly, p-value = 0.00 < 0.05 = α, so the null hypothesis is rejected.

Conclusion: There is sufficient evidence to reject the null hypothesis in favour of the alternative hypothesis. That is, at least one of the βi is not equal to zero. Thus, at least one independent variable is linearly related to y.

This linear regression model is useful.
19.25

Table 19.1 Summary

SSE     sε      R²           F       Assessment of model
0       0       1            ∞       Perfect
Small   Small   Close to 1   Large   Good
Large   Large   Close to 0   Small   Poor
                0            0       Useless

Once we're satisfied that the model fits the data as well as possible, and that the required conditions are satisfied, we can interpret and test the individual coefficients and use the model to predict and estimate…
19.26

Interpreting the coefficients

• β̂0 = 38.66. This is the intercept, the value of y when all the variables take the value zero. Since the data ranges of the independent variables do not cover the value zero, do not interpret the intercept.
• β̂1 = –0.0076. In this model, for each additional 1 000 rooms within 5 km of the Holiday Inn, the operating margin decreases on average by 7.6% (assuming the other variables are held constant).
• β̂2 = 1.656. In this model, for each additional km that the nearest competitor is from the Holiday Inn, the average operating margin increases by 1.65% (holding the other variables constant).
19.27

Interpreting the coefficients…

• β̂3 = 0.198. For each additional 1 000 square metres of office space, the average increase in operating margin will be 0.198%.
• β̂4 = 0.213. For each additional thousand students enrolled, MARGIN increases on average by 0.21%.
• β̂5 = 0.366. For each additional $1 000 increase in median household income, MARGIN increases on average by 0.37%.
• β̂6 = –0.142. For each additional km to downtown, MARGIN decreases by 0.14% on average.
19.28

Testing the coefficients

For each independent variable, we can test to determine whether there is enough evidence of a linear relationship between it and the dependent variable for the entire population:

H0: βi = 0
HA: βi ≠ 0    (for i = 1, 2, …, k)

using

t = (β̂i – βi) / s_β̂i

as our t test statistic (with n – k – 1 degrees of freedom).
19.29

Testing the coefficients…

The hypotheses for each βi are:
H0: βi = 0
HA: βi ≠ 0

Test statistic:  t = (β̂i – βi) / s_β̂i,   d.f. = n – k – 1

Excel output:
                    Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept           38.6643        6.9690           5.5480    0.0000    24.8252     52.5034
Number (x1)         -0.0076        0.0013           -6.0586   0.0000    -0.0101     -0.0051
Nearest (x2)        1.6564         0.6345           2.6105    0.0105    0.3964      2.9164
Office Space (x3)   0.1980         0.0342           5.7933    0.0000    0.1302      0.2659
Enrollment (x4)     0.2131         0.1338           1.5921    0.1148    -0.0527     0.4788
Income (x5)         0.3660         0.1271           2.8803    0.0049    0.1137      0.6184
Distance (x6)       -0.1424        0.1119           -1.2725   0.2064    -0.3647     0.0798
19.30
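Each t statistic in the output is simply the estimated coefficient divided by its standard error (under H0: βi = 0), and the p-value comes from a t distribution with n – k – 1 = 93 degrees of freedom. A small worked check for the Nearest coefficient:

from scipy import stats

beta_hat, se = 1.6564, 0.6345               # Nearest (x2) from the Excel output
t_stat = beta_hat / se                      # ~2.61
df = 100 - 6 - 1
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-tailed p-value, ~0.0105
print(round(t_stat, 4), round(p_value, 4))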

Testing the coefficients… INTERPRET

Conclusion:
The number of hotel/motel rooms within 5 km, the distance to the nearest competitor (Holiday Inn), the amount of office space and median household income are linearly related to the operating margin.
There is no evidence to infer that university enrolment and distance to the town centre are linearly related to the operating margin.
The number of hotel/motel rooms within 5 km and the distance to downtown have a negative effect on the operating margin, while all other variables have a positive effect.
19.31

Using the regression equation

The model can be used by:
• producing a prediction interval for a particular value of y, for a given set of values of the xi
• producing an interval estimate for the expected value of y, for a given set of values of the xi.

The model can also be used to learn about the relationships between the independent variables xi and the dependent variable y, by interpreting the coefficients βi.
19.32

Example 1

Predict the MARGIN of an inn at a site with the


following characteristics:
• 3 815 rooms within 5 km
• closest competitor 0.9 km away
• 476 000 square metres of office space
• 24 500 university students
• $38 000 median household income
• 17.9 km distance to downtown centre.

MARGIN = 38.66 – 0.0076(3815) + 1.656(0.9) + 0.198(47.6)

+0.213(24.5) + 0.366(38) – 0.142(17.9) = 37.08


19.33

Example 1 - Solution COMPUTE

Using Excel (Data Analysis Plus):

We add one row (our given values for the independent variables) to the bottom of our data set. Then we use:
Add-Ins > Data Analysis Plus > Prediction Interval
to obtain the predicted value and the intervals. The Excel output is:

Prediction Interval: Margin (y)
Predicted value                        37.07624
Prediction Interval
  Lower limit                          25.35321
  Upper limit                          48.79927
Interval Estimate of Expected Value
  Lower limit                          32.94539
  Upper limit                          41.20709
19.34
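Outside Excel, the same two intervals can be obtained from a fitted statsmodels model. A sketch, assuming the fitted result named model from the earlier sketch and the hypothetical column names used there (the new site's values are taken from the slide above):

import pandas as pd

# New site, in the same units as the regression data
new_site = pd.DataFrame({"const": [1.0], "Rooms": [3815], "Nearest": [0.9],
                         "Office": [47.6], "Enrolment": [24.5],
                         "Income": [38], "Distance": [17.9]})

pred = model.get_prediction(new_site)
frame = pred.summary_frame(alpha=0.05)
print(frame[["mean", "mean_ci_lower", "mean_ci_upper"]])   # interval estimate of the expected value
print(frame[["obs_ci_lower", "obs_ci_upper"]])             # prediction interval for an individual site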

Prediction interval… INTERPRET

We predict that the operating margin of this site will fall between 25.3 and 48.8.

If management defines a profitable inn as one with an operating margin greater than 50%, and an unprofitable inn as one with an operating margin below 30%, they will pass on this site, since the entire prediction interval is below 50%.
19.35

Confidence interval INTERPRET

The expected operating margin of all sites that fit this category is estimated to be between 32.9 and 41.2.

We interpret this to mean that if we built inns on an infinite number of sites that fit the category described, the mean operating margin would fall between 32.9 and 41.2. In other words, the average inn would not be profitable either…
19.36

19.3 Regression diagnostics – II

The required conditions for the model must be checked. Calculate the residuals and check the following:
• Is the error variable non-normal? Draw the histogram of the residuals.
• Is the error variance constant? Plot the residuals versus the predicted values of y.
• Are the errors independent (time-series data)? Plot the residuals versus the time periods.
• Are there observations that are inaccurate or do not belong to the target population? Double-check the accuracy of outliers and influential observations.
19.37
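These checks can be produced directly from the residuals of a fitted model. A minimal sketch using matplotlib, assuming a fitted statsmodels result named model as in the earlier sketch:

import matplotlib.pyplot as plt

residuals = model.resid
fitted = model.fittedvalues

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(residuals, bins=15)                  # check normality of the errors
axes[0].set_title("Histogram of residuals")
axes[1].scatter(fitted, residuals)                # check for constant error variance
axes[1].axhline(0, linestyle="--")
axes[1].set_title("Residuals vs predicted values")
plt.show()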

Regression Diagnostics – II…

• Multiple regression models have a problem that


simple regressions do not, namely multicollinearity.

• It happens when the independent variables are


highly correlated.

• We’ll explore this concept through the following


example…
19.38

Example 2

A real estate agent believes that the selling price of a house can be predicted using the house size, number of bedrooms and lot size. A random sample of 100 houses was drawn and the data recorded.

Price    Bedrooms   H size   Lot size
124100   3          129      390
218300   4          208      660
117800   3          125      375
.        .          .        .
.        .          .        .

Analyse the relationship between house prices and the three variables: house size, number of bedrooms and lot size.
19.39

Example 2: Solution IDENTIFY

The proposed model is

PRICE = β0 + β1BEDROOMS + β2H-SIZE + β3LOTSIZE + ε
19.40

Example 2: Solution IDENTIFY

Using Excel (Data Analysis): Data > Data Analysis > Regression

The F-test indicates the model is valid… but the t-tests suggest that none of the variables is related to the selling price.
19.41

Example 2: Solution IDENTIFY

Unlike the t-tests in the multiple regression model, the three t-tests for the significance of the correlation coefficients tell us that the number of bedrooms, the house size and the lot size are all linearly related to the price…
19.42

Example 2: Solution… IDENTIFY

How do we account for this apparent contradiction? The answer is that the three independent variables are correlated with each other!

(This is reasonable: larger houses have more bedrooms and are situated on larger lots, and smaller houses have fewer bedrooms and are located on smaller lots.)

Multicollinearity has affected the t-tests so that they imply that none of the independent variables is linearly related to price when, in fact, all are.
19.43

Example 2: Solution… IDENTIFY

When regressing the price on each independent variable alone, it is found that each variable is strongly related to the selling price. Multicollinearity is the source of this problem.

Multicollinearity causes two kinds of difficulties:
• The t statistics appear to be too small.
• The β coefficients cannot be interpreted as 'slopes'.

To overcome the multicollinearity problem:
• one can drop one of the correlated variables, or
• the variables can be expressed as deviations from their respective means, and these mean-deviated variables can be used in the regression model estimated without a constant term.
19.44
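Multicollinearity is usually diagnosed by examining the pairwise correlations among the independent variables or their variance inflation factors (VIFs). A sketch, assuming the Example 2 data sit in a file named houses.csv with hypothetical columns Bedrooms, HouseSize and LotSize:

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("houses.csv")                 # hypothetical file name
X = df[["Bedrooms", "HouseSize", "LotSize"]]

print(X.corr())                                # high pairwise correlations signal trouble

X_const = sm.add_constant(X)
for i, name in enumerate(X_const.columns):
    if name != "const":
        print(name, variance_inflation_factor(X_const.values, i))   # VIFs well above 10 are a common warning sign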

Remedying violations of required conditions

Non-normality or heteroscedasticity can be remedied


using transformations on the y variable.
The transformations can improve the linear
relationship between the dependent variable and the
independent variables.
Many computer software systems allow us to make the
transformations easily.
19.45

Remedying violations of required conditions

A brief list of transformations (a short sketch follows this list):
y′ = log y (for y > 0)
• Use when σε increases with y, or
• Use when the error distribution is positively skewed.
y′ = y²
• Use when σ²ε is proportional to E(y), or
• Use when the error distribution is negatively skewed.
y′ = y½ (for y > 0)
• Use when σ²ε is proportional to E(y).
y′ = 1/y
• Use when σ²ε increases significantly when y increases beyond some value.
19.46
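As an illustration, the log transformation can be applied before refitting, and predictions can be back-transformed afterwards. A minimal sketch on a few toy (time, mark) pairs like those in Example 3:

import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({"time": [40, 45, 50, 55, 60], "mark": [20, 24, 26, 30, 32]})  # toy values

df["log_mark"] = np.log(df["mark"])          # y' = log y, used when sigma_e grows with y
X = sm.add_constant(df[["time"]])

model_log = sm.OLS(df["log_mark"], X).fit()  # fit on the transformed scale
predicted = np.exp(model_log.predict(X))     # back-transform predictions to the original scale
print(predicted.round(2))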

Example 3: The effect of time limits on quiz marks (Example 19.9, p800)

A statistics lecturer wanted to know whether the time limit affects the marks on a quiz. A random sample of 100 students was split into five groups. Each student did a quiz, but each group was given a different time limit. See the data below. Analyse these results and include diagnostics.

Time    40   45   50   55   60
Marks   20   24   26   30   32
        23   26   25   32   31
        .    .    .    .    .
        .    .    .    .    .
19.47

Example 3 - Solution

The model tested: MARK = β0 + β1TIME + ε

[Histogram of the residuals: the errors seem to be normally distributed.]

This model is useful and provides a good fit:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.86254
R Square            0.743974
Adjusted R Square   0.741362
Standard Error      2.304609
Observations        100

ANOVA
             df   SS       MS         F          Significance F
Regression   1    1512.5   1512.5     284.7743   9.42E-31
Residual     98   520.5    5.311224
Total        99   2033

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   -2.2           1.64582          -1.33672   0.184409   -5.46608    1.066077
Time        0.55           0.032592         16.87526   9.42E-31   0.485322    0.614678
19.48

Example 3 - Solution…

[Plot of standardised residuals versus predicted mark.]

The standard error of estimate seems to increase with the predicted value of y. Two transformations are used to remedy this problem:
1. y′ = loge y
2. y′ = 1/y
19.49

Example 3 - Solution…

Let us see what happens when a transformation is applied:

[Left panel: the original data, where 'mark' is a function of 'time'. Right panel: the modified data, where LogMark is a function of 'time'. For example, the marks 23 and 18 recorded at time = 40 become loge23 = 3.135 and loge18 = 2.89.]
19.50

Example 3 - Solution…

The new regression analysis and diagnostics are:

The model tested: LOGMARK = β′0 + β′1TIME + ε′

Predicted LogMark = 2.1295 + 0.0217 time

This model is useful and provides a good fit:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.878300
R Square            0.771412
Adjusted R Square   0.769079
Standard Error      0.084437
Observations        100

ANOVA
             df   SS         MS        F        Significance F
Regression   1    2.357901   2.35790   330.72   3.58E-33
Residual     98   0.698705   0.00713
Total        99   3.056606

            Coefficients   Standard Error   t Stat     P-value       Lower 95%   Upper 95%
Intercept   2.129582       0.060300         35.31632   1.51409E-57   2.009918    2.249246
Time        0.021716       0.001194         18.18566   3.58062E-33   0.019346    0.024086
19.51

Example 3 - Solution…

[Histogram of the residuals: the errors seem to be normally distributed.]

[Plot of standardised residuals versus predicted LogMark: the standard error still changes with the predicted y, but the change is smaller than before.]
19.52

Example 3 - Solution…

How do we use the modified model to predict?

Let TIME = 55 minutes:

LogMark = 2.1295 + 0.0217 time = 2.1295 + 0.0217(55) = 3.323

To find the predicted mark, take the antilog:

Mark = antiloge 3.323 = e^3.323 = 27.743
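This back-transformation is easy to verify (a small worked check of the calculation above):

import numpy as np

log_mark = 2.1295 + 0.0217 * 55             # predicted LogMark at TIME = 55
mark = np.exp(log_mark)                     # antilog base e returns the prediction to the original scale
print(round(log_mark, 3), round(mark, 2))   # ~3.323 and ~27.7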
19.53

19.4 Regression diagnostics – III (time series)
Durbin–Watson test
This test detects first-order autocorrelation between consecutive residuals in a time series of the form

et = ρ et−1 + vt,   t = 2, 3, …, n

and tests the hypotheses:

H0: ρ = 0
HA: ρ ≠ 0

If the null hypothesis is rejected, we conclude that autocorrelation exists and the error variables are not independent.
19.54

Durbin-Watson (DW) Test…

The test statistic for the Durbin-Watson (DW) test is

d = Σ(t=2 to n) (et – et−1)² / Σ(t=1 to n) et²,   and d ≈ 2(1 – ρ)

where et is the residual at time t. Since –1 ≤ ρ ≤ 1, we have 0 ≤ d ≤ 4. Therefore, if d = 2 there is no evidence of autocorrelation.
19.55
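The statistic is easy to compute from a series of residuals, either directly from the definition or with statsmodels' helper (a sketch with illustrative residual values):

import numpy as np
from statsmodels.stats.stattools import durbin_watson

residuals = np.array([1.2, 0.8, 1.1, -0.3, -0.9, -1.2, 0.4, 0.7])   # illustrative values only

d = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)         # the definition above
print(round(d, 3), round(durbin_watson(residuals), 3))               # the two calculations agree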

Durbin-Watson (DW) Test…

Positive first-order autocorrelation occurs when consecutive residuals tend to be similar. Then the value of d is small (< 2).
[Plot: residuals versus time showing runs of similar residuals – positive first-order autocorrelation.]

Negative first-order autocorrelation occurs when consecutive residuals tend to differ markedly. Then the value of d is large (> 2).
[Plot: residuals versus time alternating in sign – negative first-order autocorrelation.]
19.56

Durbin-Watson (DW) Test…

The statistic d ranges from 0 to 4. Small values of d (d < 2) indicate positive first-order autocorrelation; large values of d (d > 2) imply negative first-order autocorrelation.
19.57

Durbin-Watson (one-tail) test

To test for positive first-order autocorrelation:
[Number line from 0 to 4: below dL, positive first-order autocorrelation exists; between dL and dU, the test is inconclusive; above dU, positive first-order autocorrelation does not exist.]

• If d < dL, we conclude that there is enough evidence to support positive first-order autocorrelation.
• If d > dU, we conclude that there is not enough evidence to support positive first-order autocorrelation.
• If dL ≤ d ≤ dU, the test is inconclusive.

dL and dU can be read from Table 11, Appendix B.
19.58

Durbin–Watson (one-tail) test

To test for negative first-order autocorrelation:
[Number line from 0 to 4: below 4 – dU, negative first-order autocorrelation does not exist; between 4 – dU and 4 – dL, the test is inconclusive; above 4 – dL, negative first-order autocorrelation exists.]

• If d > 4 – dL, we conclude that there is enough evidence to support negative first-order autocorrelation.
• If d < 4 – dU, we conclude that there is not enough evidence to support negative first-order autocorrelation.
• If 4 – dU ≤ d ≤ 4 – dL, the test is inconclusive.

dL and dU can be read from Table 11, Appendix B.
19.59

Durbin–Watson (two-tail) test

To test for first-order autocorrelation (positive or negative):
[Number line from 0 to 4: below dL and above 4 – dL, first-order autocorrelation exists; between dL and dU, and between 4 – dU and 4 – dL, the test is inconclusive; between dU and 4 – dU, there is no evidence of first-order autocorrelation.]

• If d < dL or d > 4 – dL, first-order autocorrelation exists.
• If d falls between dL and dU or between 4 – dU and 4 – dL, the test is inconclusive.
• If d falls between dU and 4 – dU, there is no evidence of first-order autocorrelation.

dL and dU can be read from Table 11, Appendix B. A sketch of this decision rule in code follows.
19.60
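For completeness, the two-tail decision rule can be written as a small helper; dL and dU would be read from Table 11 in Appendix B (the values below are illustrative only):

def dw_two_tail_decision(d, d_lower, d_upper):
    """Two-tail Durbin-Watson decision rule (0 <= d <= 4)."""
    if d < d_lower or d > 4 - d_lower:
        return "first-order autocorrelation exists"
    if d_lower <= d <= d_upper or 4 - d_upper <= d <= 4 - d_lower:
        return "test is inconclusive"
    return "no evidence of first-order autocorrelation"

print(dw_two_tail_decision(0.59, 1.10, 1.54))   # illustrative d, dL and dU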

Durbin–Watson test using Excel COMPUTE

Step 1: Using Data Analysis, run the regression and obtain the residuals:
Data > Data Analysis > Regression (tick the Residuals box, then OK)

Step 2: Use Data Analysis Plus and the residual output from Step 1 to obtain the DW statistic:
Add-Ins > Data Analysis Plus > Durbin–Watson statistic > Highlight the range of residuals from the regression run > OK
19.61

Example 4 IDENTIFY

How does the weather affect the sales of lift tickets at a ski resort? Data on ticket sales for the past 20 years, along with the total snowfall and the average temperature during July in each year, were collected.

The model hypothesised was

TICKETS = β0 + β1SNOWFALL + β2TEMPERATURE + ε

Regression analysis yielded the following results:
19.62

Example 4 – Solution… COMPUTE

Both the coefficient of determination and the p-value of the F-test indicate the model is poor…

Neither variable is linearly related to ticket sales…
19.63

Example 4 - Solution… INTERPRET

The model seems to be very poor:
• The fit of the model is very low (R² = 0.12).
• The model is not valid (Significance F = 0.33).
• None of the variables seems to be linearly related to sales.

Diagnosis of the required conditions resulted in the following findings:
19.64

Example 4 – Solution…

The histogram of residuals reveals that the errors may be normally distributed…
19.65

Example 4 – Solution… COMPUTE

In the plot of residuals versus predicted values (testing for heteroscedasticity), the error variance appears to be constant…
19.66

Example 4 – Solution… COMPUTE

Durbin-Watson (DW) test


Apply the Durbin-Watson Statistic from Data Analysis Plus
to the entire list of residuals.
19.67

Example 4 - Solution… INTERPRET

To test for positive first-order autocorrelation with α = 0.05, we find in Table 11(a) in Appendix B:
dL = 1.10 and dU = 1.54

The null and alternative hypotheses are:
H0: There is no first-order autocorrelation.
HA: There is positive first-order autocorrelation.

The rejection region is d < dL = 1.10. Since d = 0.59, we reject the null hypothesis and conclude that there is enough evidence to infer that positive first-order autocorrelation exists.
19.68

Example 4 – Solution… INTERPRET

Autocorrelation usually indicates that the model needs to include an independent variable that has a time-ordered effect on the dependent variable.

The simplest such independent variable represents the time periods. We included a third independent variable that records the number of years since the year the data were gathered. Thus, YEARS = 1, 2, ..., 20. The new model is

TICKETS = β0 + β1SNOWFALL + β2TEMPERATURE + β3YEARS + ε


19.69
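Adding a time trend and refitting is straightforward. A sketch, assuming the Example 4 data sit in a file named tickets.csv with hypothetical columns Tickets, Snowfall and Temperature, ordered by year:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

df = pd.read_csv("tickets.csv")               # hypothetical file with 20 yearly observations
df["Years"] = np.arange(1, len(df) + 1)       # time trend: 1, 2, ..., 20

X = sm.add_constant(df[["Snowfall", "Temperature", "Years"]])
model2 = sm.OLS(df["Tickets"], X).fit()

print(model2.rsquared, model2.f_pvalue)       # fit and overall validity of the new model
print(durbin_watson(model2.resid))            # re-check for first-order autocorrelation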

Example 4 – Solution… COMPUTE

The fit of the model is high. The model is valid…

Snowfall and time (our new variable) are linearly related to ticket sales; temperature is not…
19.70

Example 4 – Solution… INTERPRET

If we re-run the Durbin-Watson statistic against the


residuals from our Regression analysis,

we can conclude that there is not enough evidence to


infer the presence of first-order autocorrelation.
(Determining dL and dU is left as an exercise for the
reader…)

Hence, we have improved our model dramatically!


19.71

Example 4 – Solution… INTERPRET

All the required conditions are met for this model.


The fit of this model is high: R2 = 0.74.
The model is useful as p-value (F-test) = 0.0001 is very low.
SNOWFALL and YEARS are linearly related to ticket sales.
TEMPERATURE is not linearly related to ticket sales.
19.72

Example 4 - Solution INTERPRET

Notice that the model has improved dramatically.


The F-test tells us that the model is valid. The t-tests tell
us that both the amount of snowfall and time are
significantly linearly related to the number of lift tickets.
This information could prove useful in advertising for the
resort. For example, if there has been a recent snowfall,
the resort could emphasise that in its advertising.
If no new snow has fallen, it may emphasise its snow-making facilities.
