R2 INTERPRETATION
• Coefficient of determination: $r^2 = 0.7493$, or 74.93%.
• We can conclude that 74.93% of the total sum of squares (the total
variation in tip amount) is explained by using the estimated regression
equation to predict the tip amount.
• The remaining 25.07% is unexplained variation (error).
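As a minimal sketch of where such a figure comes from, $r^2$ can be computed as 1 − SSE/SST after fitting a least-squares line. The function names `fit_line` and `r_squared` and the small data set are hypothetical, chosen only for illustration; they are not the tip data behind the 74.93% figure.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def r_squared(x, y):
    """Share of the total sum of squares explained by the fitted line."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                        # total
    return 1 - sse / sst


# Hypothetical data, not the tip data from the example.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2 = r_squared(x, y)  # about 0.6: 60% of the variation in y is explained
```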
Sum of Squared Error
$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
• A measure of the variability of the observations about the regression
line
Mean Squared Error
• MSE ($s^2$) is an estimate of $\sigma^2$, the variance of the error term $\varepsilon$.
• In other words, it measures how spread out the data points are around the regression line.
• MSE is SSE divided by its degrees of freedom, which is n − 2 because we lose
two degrees of freedom by estimating the slope and the intercept.
• $\mathrm{MSE} = s^2 = \mathrm{SSE}/(n-2)$
• Why divide by n − 2 and not just n? Remember, we are using sample
data. That is also why we write $s^2$ and not $\sigma^2$.
• This is why MSE is not simply the average of the squared residuals.
• If we were using population data, we would just divide by N and it would
simply be the average of the squared residuals.
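The difference between dividing by n − 2 and dividing by n can be sketched as follows. The function name `mean_squared_error` and the residuals are hypothetical illustrations, not values from the tip example.

```python
def mean_squared_error(residuals, n_params=2):
    """SSE divided by its degrees of freedom, n - n_params.

    n_params defaults to 2 because a simple regression estimates
    a slope and an intercept.
    """
    n = len(residuals)
    sse = sum(e ** 2 for e in residuals)
    return sse / (n - n_params)


# Hypothetical residuals from a fitted line (not the tip data).
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]

mse = mean_squared_error(residuals)                      # SSE / (n - 2)
naive = sum(e ** 2 for e in residuals) / len(residuals)  # SSE / n, plain average
```

Here SSE is 2.4 and n is 5, so the MSE is 2.4 / 3 = 0.8, while the plain average of the squared residuals is 2.4 / 5 = 0.48, which understates the error variance.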
Standard error of the estimate / standard
deviation of data points
• The standard error of the estimate $s$ (or just "standard error") estimates $\sigma$, the
standard deviation of the error term $\varepsilon$. Taking the square root puts us back in unsquared units.
• It is roughly the typical distance an observation falls from the regression line, in units
of the dependent variable.
• Since the MSE is $s^2$, the standard error is just the square root of MSE.
• $s = \sqrt{\mathrm{MSE}} = \sqrt{\mathrm{SSE}/(n-2)}$
• $s = \sqrt{7.5187} = 2.742$
• So the typical distance of the data points from the fitted line is about 2.74 dollars.
• You can think of $s$ as a measure of how well the regression model makes
predictions. It can also be used to construct prediction intervals.
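The arithmetic above can be checked with a short sketch. The MSE value is taken from the text; the variable names are illustrative.

```python
import math

mse = 7.5187          # MSE (s squared) from the text
s = math.sqrt(mse)    # standard error of the estimate, in dollars

print(round(s, 3))    # 2.742
```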
Statistically significant
• How much variance in the dependent variable is explained by the
model / independent variable?
• For this we look at the value of $R^2$ or adjusted $R^2$.
• Does a statistically significant linear relationship exist between the
independent and dependent variables?
• Is the overall F-test or t-test significant? (In simple regression the two are
equivalent.)
• Can we reject the null hypothesis that the true slope of the regression line,
estimated by $b_1$, is zero?
• Does the confidence interval for the slope contain zero?
Confidence Interval
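• We are 95% confident that the actual population value (here, the true slope)
falls within this interval.
• Calculated using the t-value:
• $b_1 \pm t_{\alpha/2}\, s_{b_1}$
where $b_1$ is the point estimator of the slope, $s_{b_1}$ is the standard error
(estimated standard deviation) of the slope, and $t_{\alpha/2}\, s_{b_1}$ is the margin of error.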
Standard Deviation of the slope
• $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$
• $= 2.742 / \sqrt{4206}$
• $= 0.04228$
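A short sketch of the same calculation, using the two figures quoted on this slide. The variable names are illustrative; `sxx` stands for the sum of squared deviations of x about its mean.

```python
import math

s = 2.742      # standard error of the estimate, from the text
sxx = 4206     # sum of (x_i - x_bar)^2, from the text

s_b1 = s / math.sqrt(sxx)   # standard error of the slope

print(round(s_b1, 5))       # 0.04228
```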
Confidence Interval for slope
• $0.1462197 \pm t_{0.05/2} \times 0.04228$
• $0.1462197 \pm 2.776 \times 0.04228$
• $0.1462197 \pm 0.11737$
• $(0.02885, 0.2636)$
We are 95% confident that the interval (0.02885, 0.2636) contains the
true slope of the regression line.
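The interval can be reproduced with a minimal sketch. All three inputs come from the text; the critical value 2.776 is hard-coded rather than looked up, and it corresponds to 4 degrees of freedom at a two-sided 5% level.

```python
b1 = 0.1462197    # estimated slope, from the text
s_b1 = 0.04228    # standard error of the slope, from the text
t_crit = 2.776    # t critical value from the text (4 degrees of freedom)

margin = t_crit * s_b1                     # margin of error
lower, upper = b1 - margin, b1 + margin    # 95% confidence interval

contains_zero = lower <= 0 <= upper        # False for these figures

print(round(margin, 5), round(lower, 5), round(upper, 4))
```

Because both endpoints are positive, the interval excludes zero.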
Does the interval contain zero?
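• (0.02885, 0.2636)
• Hypotheses: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$, where $\beta_1$ is the true slope estimated by $b_1$.
• Can we reject the null hypothesis that the slope is zero?
• The null hypothesis is that the slope of the regression line is zero, and
therefore that no linear relationship exists between the two
variables.
• The interval does not contain zero, so we reject the null hypothesis at the 5% significance level.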
Test statistic
• $t = \dfrac{b_1}{s_{b_1}} = \dfrac{0.1462197}{0.04228} = 3.4584$
• Compare $t$ with $t_{\text{critical}}$
• 3.4584 > 2.776, so the result is significant and we reject the null hypothesis.
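The test can be sketched in a few lines using the figures from the text. The last line relies on the standard result that, in simple linear regression, the overall F statistic equals the square of the slope's t statistic; the F value itself does not appear in the text.

```python
b1 = 0.1462197    # estimated slope, from the text
s_b1 = 0.04228    # standard error of the slope, from the text
t_crit = 2.776    # two-sided 5% critical value from the text

t_stat = b1 / s_b1                   # about 3.458
reject_null = abs(t_stat) > t_crit   # True: the slope differs significantly from zero

f_stat = t_stat ** 2                 # about 11.96, the equivalent F statistic
```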
Pearson correlation coefficient
t value from Pearson correlation
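A standard identity for simple regression links the correlation coefficient to the slope's t statistic: $t = r\sqrt{n-2}/\sqrt{1-r^2}$. The sketch below applies it to the $r^2$ reported earlier. The sample size n = 6 is an assumption, inferred from the 4 degrees of freedom behind the critical value 2.776; it is not stated in the text.

```python
import math

r_squared = 0.7493   # coefficient of determination, from the text
n = 6                # assumption: inferred from 4 degrees of freedom (n - 2 = 4)

r = math.sqrt(r_squared)   # positive root, because the estimated slope is positive
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
```

Under these assumptions the result is close to 3.46, matching the slope's t statistic up to rounding in the inputs.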