Adekunle Onaopepo 2561632
Ghareh Gozlou Samira 2561142
Problem 6
a)
The matrix of correlations is shown in the table below.

                        mpg    cylinders displacement   horsepower       weight acceleration         year       origin
mpg                  1.0000      -0.7776      -0.8051      -0.7784      -0.8322       0.4233       0.5805       0.5652
cylinders           -0.7776       1.0000       0.9508       0.8429       0.8975      -0.5046      -0.3456      -0.5689
displacement        -0.8051       0.9508       1.0000       0.8972       0.9329      -0.5438      -0.3698      -0.6145
horsepower          -0.7784       0.8429       0.8972       1.0000       0.8645      -0.6891      -0.4163      -0.4551
weight              -0.8322       0.8975       0.9329       0.8645       1.0000      -0.4168      -0.3091      -0.5850
acceleration         0.4233      -0.5046      -0.5438      -0.6891      -0.4168       1.0000       0.2903       0.2127
year                 0.5805      -0.3456      -0.3698      -0.4163      -0.3091       0.2903       1.0000       0.1815
origin               0.5652      -0.5689      -0.6145      -0.4551      -0.5850       0.2127       0.1815       1.0000
Taking mpg as the reference variable, without loss of generality:
The correlation between a variable and itself is 1.0.
The correlation (mpg, X), with X in {cylinders, displacement, horsepower, weight}, is negative, showing an inverse relation between the variables.
The correlation (mpg, X), with X in {acceleration, year, origin}, is positive, showing a direct relation between the variables.
The correlation between year and origin, 0.18, is close to zero, which implies that the two variables have almost no linear relationship with one another.
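The properties listed above can be checked with a minimal sketch. The data here are hypothetical (not taken from the Auto data set) and the helper name pearson is our own. It shows a correlation of 1 for a variable with itself, a negative correlation for an inverse linear relation, and a correlation near zero for two variables that are clearly related, just not linearly.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, chosen only for illustration.
x = [i / 10 for i in range(-10, 11)]   # values symmetric around zero
y_lin = [-2.0 * v + 1.0 for v in x]    # exact decreasing linear relation
y_sq = [v ** 2 for v in x]             # determined by x, but not linearly

r_self = pearson(x, x)      # a variable with itself
r_lin = pearson(x, y_lin)   # inverse linear relation
r_sq = pearson(x, y_sq)     # non-linear relation

print(f"self: {r_self:.4f}, linear: {r_lin:.4f}, quadratic: {r_sq:.4f}")
```

The quadratic case is the reason a small correlation such as 0.18 is best read as a statement about linear association only.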
b)
                 mpg ~ cylinders        mpg ~ displacement     mpg ~ horsepower       mpg ~ year
RSE              4.914                  4.635                  4.906                  6.363
R squared        0.6047 (Significant)   0.6482 (Significant)   0.6059 (Significant)   0.337
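It is clear from the results above that mpg as a response has a strong linear relationship with cylinders, displacement and horsepower, each of which explains 60 to 65% of its variance. The relationship with year is much weaker (R squared 0.337), although year still explains about a third of the variance in mpg.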
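Since R squared measures explained variance rather than significance, it can help to convert it into a test statistic. The sketch below assumes the standard identity for a regression with one predictor, F = R^2 / (1 - R^2) * (residual degrees of freedom), and uses the 390 residual degrees of freedom reported for the simple regressions in part c). The variable and function names are our own.

```python
# R-squared values reported for the four simple regressions on mpg.
r_squared = {
    "cylinders": 0.6047,
    "displacement": 0.6482,
    "horsepower": 0.6059,
    "year": 0.337,
}

# Residual degrees of freedom reported in the regression summaries.
RESIDUAL_DF = 390


def f_statistic(r2, df):
    """F-statistic of a one-predictor regression computed from R-squared."""
    return r2 / (1.0 - r2) * df


f_stats = {name: f_statistic(r2, RESIDUAL_DF) for name, r2 in r_squared.items()}

for name, f in f_stats.items():
    print(f"mpg ~ {name}: F = {f:.1f} on 1 and {RESIDUAL_DF} DF")
```

For 1 and 390 degrees of freedom the 5% critical value of F is about 3.9, so each of these values can be compared against that threshold.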
c)
The 95% confidence intervals for all parameter estimates are:

(Intercept)    [ -26.349864469 , -8.087004775 ]
cylinders      [  -1.129001385 ,  0.142248747 ]
displacement   [   0.005119788 ,  0.034671499 ]
horsepower     [  -0.044058392 ,  0.010156103 ]
weight         [  -0.007756074 , -0.005192013 ]
acceleration   [  -0.113769257 ,  0.274920933 ]
year           [   0.650551315 ,  0.850994041 ]
origin         [   0.879280169 ,  1.973000822 ]
The interval widths suggest that the intercept estimate (β0) has the highest standard error in the group and the weight estimate (β4) the lowest. In other words, the 95% confidence interval for β0 covers a much wider range of values than the interval for β4.
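The comparison of interval widths can be made explicit with a short sketch. It uses the intervals listed above and approximates the t quantile by the standard normal quantile (about 1.96), which is an assumption on our side that is reasonable for several hundred degrees of freedom. The names widths, approx_se and contains_zero are our own.

```python
from statistics import NormalDist

# 95% confidence intervals as reported above: (lower, upper).
intervals = {
    "(Intercept)": (-26.349864469, -8.087004775),
    "cylinders": (-1.129001385, 0.142248747),
    "displacement": (0.005119788, 0.034671499),
    "horsepower": (-0.044058392, 0.010156103),
    "weight": (-0.007756074, -0.005192013),
    "acceleration": (-0.113769257, 0.274920933),
    "year": (0.650551315, 0.850994041),
    "origin": (0.879280169, 1.973000822),
}

# Normal approximation to the t quantile used for a 95% interval.
z = NormalDist().inv_cdf(0.975)

# Width of each interval and the standard error it implies.
widths = {name: hi - lo for name, (lo, hi) in intervals.items()}
approx_se = {name: w / (2.0 * z) for name, w in widths.items()}

# Parameters whose interval includes zero.
contains_zero = [name for name, (lo, hi) in intervals.items() if lo <= 0.0 <= hi]

for name in intervals:
    print(f"{name:13s} width = {widths[name]:.6f}  approx. SE = {approx_se[name]:.6f}")
print("Intervals containing zero:", contains_zero)
```

Note that the widths depend on the units of each predictor, so they compare the precision of the estimates and not the importance of the variables.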
Multiple Linear Regression
Residual standard error: 3.328 on 384 degrees of freedom
Multiple R-squared: 0.8215
Simple Linear Regression (X as cylinders, displacement, horsepower and year respectively)

mpg ~ cylinders      Residual standard error: 4.914 on 390 degrees of freedom   Multiple R-squared: 0.6047
mpg ~ displacement   Residual standard error: 4.635 on 390 degrees of freedom   Multiple R-squared: 0.6482
mpg ~ horsepower     Residual standard error: 4.906 on 390 degrees of freedom   Multiple R-squared: 0.6059
mpg ~ year           Residual standard error: 6.363 on 390 degrees of freedom   Multiple R-squared: 0.337
The model fit is worse in each simple linear regression: the residual standard error is higher and the R-squared is lower for cylinders, displacement, horsepower and year than in the multiple regression model. This implies that the multiple linear regression describes the data better.
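As a consistency check on the simple regression summaries above, the residual standard error and R-squared of each model can be turned back into the total sum of squares of mpg, which has to be the same for every model because the response is the same. The sketch assumes the usual definitions RSE^2 = RSS / df and R^2 = 1 - RSS / TSS; the function name implied_tss is our own.

```python
# (residual standard error, R-squared) from the four simple regressions.
fits = {
    "cylinders": (4.914, 0.6047),
    "displacement": (4.635, 0.6482),
    "horsepower": (4.906, 0.6059),
    "year": (6.363, 0.337),
}

# Residual degrees of freedom shared by all four simple regressions.
RESIDUAL_DF = 390


def implied_tss(rse, r2, df):
    """Total sum of squares implied by RSE^2 = RSS/df and R^2 = 1 - RSS/TSS."""
    rss = rse ** 2 * df
    return rss / (1.0 - r2)


tss = {name: implied_tss(rse, r2, RESIDUAL_DF) for name, (rse, r2) in fits.items()}

# Relative spread of the four implied values; small means the summaries agree.
mean_tss = sum(tss.values()) / len(tss)
spread = (max(tss.values()) - min(tss.values())) / mean_tss

for name, value in tss.items():
    print(f"mpg ~ {name}: implied TSS = {value:.1f}")
print(f"relative spread = {spread:.5f}")
```

The four implied values agree up to the rounding of the reported figures, so the reported pairs of RSE and R-squared are mutually consistent.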
d)
The residual plot in the upper left illustration suggests a slight non-linearity, but the fit is still generally acceptable.
The residual plot also marks observations 323, 326 and 327 as outliers.
The leverage plot in the lower left illustration shows that observation 14 has unusually high leverage.
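To make the notion of leverage concrete, the sketch below computes the leverage statistic for a regression with a single predictor, h_i = 1/n + (x_i - mean)^2 / sum_j (x_j - mean)^2. This is a simplification: the multiple regression analysed here uses the diagonal of the hat matrix instead. The data are hypothetical, and the cutoff of twice the average leverage (p + 1)/n is a common rule of thumb that we assume for illustration.

```python
def leverages(xs):
    """Leverage of each observation in a simple linear regression on xs."""
    n = len(xs)
    mean = sum(xs) / n
    sxx = sum((x - mean) ** 2 for x in xs)
    return [1.0 / n + (x - mean) ** 2 / sxx for x in xs]


# Hypothetical predictor values; the last one lies far from the rest.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]

h = leverages(x)

# Average leverage is (p + 1) / n, with p = 1 predictor here.
average = 2.0 / len(x)

# Rule of thumb (an assumption): flag points above twice the average leverage.
high = [i for i, value in enumerate(h) if value > 2.0 * average]

for i, value in enumerate(h):
    print(f"observation {i}: leverage = {value:.4f}")
print("high leverage observations:", high)
```

A point with high leverage is unusual in its predictor values, which is a different property from being an outlier in the response.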
e)
In this case we chose pairwise combinations of variables with a non-linear transformation of displacement, in the form:
cylinders ~ weight + exp(displacement)
year ~ cylinders + sqrt(displacement)
weight ~ year + displacement^3
The residuals vs fitted values plot was selected since it is the best one for evaluating non-linearity. Illustrations 1 and 3 show that the model fit degrades and is not the best fit. Illustration 2 is still somewhat acceptable, except that the residuals at the upper end of the fitted values deviate substantially from the mean, so it is debatable.
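The effect of transforming a predictor can be illustrated with a minimal sketch on hypothetical data (not the Auto data set). The response is generated exactly as 2 + 3 * sqrt(x), and a straight line is fitted once on x and once on sqrt(x). The helper fit_line is our own least-squares routine for a single predictor.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of ys on xs; returns (intercept, slope, rss)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, rss


# Hypothetical data following a square-root relation without noise.
x = [float(i) for i in range(1, 26)]
y = [2.0 + 3.0 * math.sqrt(v) for v in x]

# Fit on the raw predictor and on the transformed predictor.
_, _, rss_raw = fit_line(x, y)
b0, b1, rss_sqrt = fit_line([math.sqrt(v) for v in x], y)

print(f"RSS with x:       {rss_raw:.6f}")
print(f"RSS with sqrt(x): {rss_sqrt:.6f}")
print(f"fit on sqrt(x): intercept = {b0:.4f}, slope = {b1:.4f}")
```

When the transformation matches the shape of the relation, the residuals lose their curved pattern, which is what the residuals vs fitted values plot is meant to reveal.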