Business Analytics
Fall 2021
Week-04
Making Numerical Predictions
Simple Linear Regression
Question: How much? or How many?
Questions like these call for regression algorithms.
Regression algorithms make numerical predictions,
such as:
What will next month’s car sales be?
What will next week’s temperature be?
They help answer any question that asks for a
number.
Predict the values of a dependent (response) variable
based on values of at least one independent
(explanatory) variable
Scatterplot
Examine possible relationships between two numerical
variables
Types of Regression Models
[Figure: four scatterplot panels: Positive Linear Relationship, Negative Linear Relationship, Relationship NOT Linear, No Relationship]
Simple Linear Regression Model
Simple: One (independent) variable
Linear: The Relationship between the Variables is
Described by a Linear Function
The Model Treats One Variable (Y) as Depending on the Other (X):
a Change in X Is Associated with a Change in Y
(the regression by itself does not prove causation)
Introduction to Linear Regression (cont.)
Any straight line can be represented by an
equation of the form Y = a + bX, where ‘b’ and
‘a’ are constants.
The value of ‘b’ is called the slope constant and
determines the direction and degree to which the
line is tilted.
The value of ‘a’ is called the Y-intercept and
determines the point where the line crosses the
Y-axis.
How well a set of data points fits a straight line can be
measured by calculating the vertical distance between each data
point and the line.
The total error between the data points and the line is obtained by
squaring each distance and then summing the squared values.
Regression equation: the line that minimizes this sum of squared
errors (the least-squares line).
Error (residual) = Actual Y – Predicted Y
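To make the "minimize the sum of squared errors" idea concrete, here is a minimal Python sketch using the salary data from Example 1 below (X = years of experience, Y = salary). The helper name `sse` is our own, and the two comparison lines are arbitrary choices for illustration. It scores the line Y = 9.18 + 4.80X, which the worked example arrives at, against a slightly steeper and a slightly flatter line; the least-squares line should give the smallest total.

```python
# Salary data from Example 1 (X = years of experience, Y = salary in $1,000s)
xs = [2, 3, 5, 13, 8, 16, 11, 1, 9]
ys = [15, 28, 42, 64, 50, 90, 58, 8, 54]


def sse(a: float, b: float, xs: list[float], ys: list[float]) -> float:
    """Sum of squared errors for the line Y = a + bX, where error = actual - predicted."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Least-squares line from the worked example vs. two hand-picked alternatives
fitted = sse(9.18, 4.80, xs, ys)
steeper = sse(9.18, 5.00, xs, ys)
flatter = sse(9.18, 4.60, xs, ys)
print(f"fitted={fitted:.2f} steeper={steeper:.2f} flatter={flatter:.2f}")
```

Changing the slope in either direction increases the total squared error, which is what "least squares" means.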
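Regression Equation
Calculating a and b
$$b = \frac{\sum XY - n\,\bar{X}\,\bar{Y}}{\sum X^2 - n\,\bar{X}^2} \qquad a = \bar{Y} - b\,\bar{X}$$
where X̄ = Avg(X), Ȳ = Avg(Y), and n = number of data points.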
Simple Linear Regression:
Example 1
• Predict salary (the dependent variable) from years of
experience (the independent variable).
Example Solution
Finding the regression equation
X    Y    X²    XY
2    15
3    28
5    42
13   64
8    50
16   90
11   58
1    8
9    54
n = 9 (number of data points)
Avg(X) = ?   Avg(Y) = ?   Sum(X²) = ?   Sum(XY) = ?
Example Solution
Finding the regression equation
X    Y    X²    XY
2    15   4     30
3    28   9     84
5    42   25    210
13   64   169   832
8    50   64    400
16   90   256   1440
11   58   121   638
1    8    1     8
9    54   81    486
n = 9
Avg(X) = 7.56     Sum(X²) = 730
Avg(Y) = 45.44    Sum(XY) = 4128
b = 4.80
a = 45.44 – (4.80 x 7.56)
a = 9.18 (carrying unrounded values of Avg(X), Avg(Y), and b; the rounded values shown give 9.15)
Y = 9.18 + 4.80 X
Sal = 9.18 + 4.80 (Exp)
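The hand calculation above can be checked with a short Python sketch. It uses the standard least-squares formulas b = (Sum(XY) – n·Avg(X)·Avg(Y)) / (Sum(X²) – n·Avg(X)²) and a = Avg(Y) – b·Avg(X), which rely on exactly the quantities tabulated above. The function name `fit_simple_regression` is hypothetical, chosen for this illustration.

```python
def fit_simple_regression(xs, ys):
    """Return (a, b) for Y = a + bX using the least-squares formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Slope: b = (Sum(XY) - n*Avg(X)*Avg(Y)) / (Sum(X2) - n*Avg(X)^2)
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    # Intercept: a = Avg(Y) - b*Avg(X)
    a = y_bar - b * x_bar
    return a, b


xs = [2, 3, 5, 13, 8, 16, 11, 1, 9]       # years of experience
ys = [15, 28, 42, 64, 50, 90, 58, 8, 54]  # salary ($1,000s)
a, b = fit_simple_regression(xs, ys)
print(f"Y = {a:.2f} + {b:.2f} X")  # Y = 9.18 + 4.80 X
```

Because the function keeps full precision internally, it lands on a ≈ 9.18 directly instead of the 9.15 obtained by plugging in the rounded averages.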
Interpretation of Results: Example
Interpreting the slope
Y = 9.18 + 4.80 X
Sal = 9.18 + 4.80 (Exp)
The slope of 4.80 means that for each increase of
one unit in X, we predict the average of Y to
increase by an estimated 4.80 units.
The equation estimates that for each additional year
of experience, the expected salary increases by
$4,800 (4.80 x $1,000, since salary is recorded in thousands of dollars).
Making predictions
Sal = 9.18 + 4.80 (Exp)
What is the predicted salary for someone with
6 years of experience?
14 years of experience?
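Making a prediction is just substituting X into the fitted equation. A minimal sketch, where `predict_salary` is a hypothetical helper name and salary is in $1,000s as in the example:

```python
def predict_salary(years_experience: float) -> float:
    """Predicted salary in $1,000s from Sal = 9.18 + 4.80 (Exp)."""
    return 9.18 + 4.80 * years_experience


for years in (6, 14):
    print(f"{years} years -> {predict_salary(years):.2f} thousand dollars")
# 6 years -> 37.98 thousand dollars
# 14 years -> 76.38 thousand dollars
```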
Example Solution
Predicted Values & Residuals (errors)
Y = 9.18 + 4.80 X
X    Y    Pred Y   Residual (Y - Pred Y)
2    15   18.78    -3.78
3    28   23.58    4.42
5    42   33.18    8.82
13   64   71.58    -7.58
8    50   47.58    2.42
16   90   85.98    4.02
11   58   61.98    -3.98
1    8    13.98    -5.98
9    54   52.38    1.62
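The predicted values and residuals in the table can be reproduced with a few lines of Python. This is a sketch using the rounded equation Y = 9.18 + 4.80X; with exact least-squares coefficients the residuals would sum to zero, so with rounded ones the sum only comes out close to zero.

```python
xs = [2, 3, 5, 13, 8, 16, 11, 1, 9]       # years of experience
ys = [15, 28, 42, 64, 50, 90, 58, 8, 54]  # salary ($1,000s)

predicted = [9.18 + 4.80 * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]  # error = actual - predicted

print(" X   Y  Pred Y  Residual")
for x, y, p, e in zip(xs, ys, predicted, residuals):
    print(f"{x:>2} {y:>3} {p:7.2f} {e:9.2f}")

# Exact least-squares residuals sum to zero; rounded coefficients leave a small remainder
print(f"Sum of residuals: {sum(residuals):.2f}")
```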
What is the predicted salary for experience of 10
years? What is the prediction error?
Evaluating Regression Results
The ability of the regression equation to accurately
predict the Y values is measured by first computing
the proportion of the Y-score variability that
is predicted by the regression equation and the
proportion that is not predicted.
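Measures of Variation:
The Sum of Squares
(continued)
[Figure: deviations of an observation Yi at Xi from the regression line and from the mean Ȳ, plotted on X and Y axes]
$$SST = \sum (Y_i - \bar{Y})^2 \qquad SSR = \sum (\hat{Y}_i - \bar{Y})^2 \qquad SSE = \sum (Y_i - \hat{Y}_i)^2$$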
SST = Total Sum of Squares
Measures the variation of the Yi values around
their mean, Ȳ
SSR = Regression Sum of Squares
Explained variation attributable to the relationship
between X and Y
SSE = Error Sum of Squares
Variation attributable to factors other than the
relationship between X and Y
Measures of Variation:
The Sum of Squares
SST = SSR + SSE
Total Sample Variability = Explained Variability + Unexplained Variability
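The decomposition SST = SSR + SSE can be verified numerically for the salary example. A minimal sketch using only the standard library; the slope is computed in the equivalent deviation form b = Σ(X – X̄)(Y – Ȳ) / Σ(X – X̄)², which gives the same value as the Sum(XY)/Sum(X²) formula. The SST and SSR it produces should agree with the Excel figures quoted for this example (SSR = 4980.907, SST = 5226.222).

```python
xs = [2, 3, 5, 13, 8, 16, 11, 1, 9]       # years of experience
ys = [15, 28, 42, 64, 50, 90, 58, 8, 54]  # salary ($1,000s)
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept (deviation form)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sxy / sxx
a = y_bar - b * x_bar
y_hat = [a + b * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained by the line
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained (residual)

print(f"SST={sst:.2f} SSR={ssr:.2f} SSE={sse:.2f}")  # SST=5226.22 SSR=4980.91 SSE=245.32
print(f"SSR + SSE = {ssr + sse:.2f}")
```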
Measures of Variation
The Sum of Squares: Example
Excel Output for Salary example
[Figure: Excel output for the salary example with SSR, SSE, and SST marked]
The Coefficient of Determination
$$r^2 = \frac{SSR}{SST} = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}}$$
Measures the proportion of variation in Y that
is explained by the independent variable X in
the regression model
Excel Output: Salary example
R-sq = SSR/SST
= 4980.907/5226.222
= 0.953
The Coefficient of Determination: r²
Recall:
The coefficient of determination r² is the
proportion of variability in the response variable
“explained” by the regression.
It’s another way of asking, “By introducing this
other variable, how much better is my estimate
than it would be if I simply used the average to
make my estimate?”
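The "how much better than using the average" reading translates directly into code: compare the squared error of a predictor that always guesses Ȳ (which equals SST) with the squared error of the regression line (SSE). A minimal sketch on the salary data, where r² = 1 – SSE/SST, which equals SSR/SST because SST = SSR + SSE:

```python
xs = [2, 3, 5, 13, 8, 16, 11, 1, 9]       # years of experience
ys = [15, 28, 42, 64, 50, 90, 58, 8, 54]  # salary ($1,000s)
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

# Baseline: ignore X and predict every salary with the average of Y
sse_mean_only = sum((y - y_bar) ** 2 for y in ys)  # equals SST
# Regression: predict with the fitted line
sse_line = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # equals SSE

r_squared = 1 - sse_line / sse_mean_only
print(f"r^2 = {r_squared:.3f}")  # r^2 = 0.953
```

Using experience removes about 95% of the squared error left by the "just use the average" guess.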
Example: In a study of bone density as a function
of body weight, a correlation coefficient r of 0.6 is noted, so r² = 0.6² = 0.36.
Interpretation 1 (full credit): “About 36% of the
variability in bone density is explained by the
linear regression of bone density on body weight.”
Interpretation 2 (less credit): “About 36% of
variability in bone density is accounted for by body
weight.”
Inferences about the Slope:
F Test
F Test for a Population Slope
Is there a linear relationship between Y and X?
A significant F-ratio (p-value < 0.05) indicates that
the equation predicts a significant portion of the
variability in the Y scores (more than would be
expected by chance alone).
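For simple linear regression the F-ratio is conventionally computed as F = MSR / MSE = (SSR / 1) / (SSE / (n – 2)); this is the standard textbook definition, not something read off the slide's Excel output. The sketch below computes it for the salary example using only the standard library. The p-value would come from the F distribution with 1 and n – 2 degrees of freedom (for example `scipy.stats.f.sf(f_stat, 1, n - 2)` if SciPy is installed); the sketch stops at the statistic.

```python
xs = [2, 3, 5, 13, 8, 16, 11, 1, 9]       # years of experience
ys = [15, 28, 42, 64, 50, 90, 58, 8, 54]  # salary ($1,000s)
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sxy / sxx

ssr = b * sxy                            # explained sum of squares
sst = sum((y - y_bar) ** 2 for y in ys)  # total sum of squares
sse = sst - ssr                          # unexplained sum of squares

msr = ssr / 1        # one predictor -> 1 degree of freedom
mse = sse / (n - 2)  # n - 2 residual degrees of freedom
f_stat = msr / mse
print(f"F = {f_stat:.2f} with (1, {n - 2}) degrees of freedom")  # F = 142.13 with (1, 7) degrees of freedom
```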
Important concepts
Calculating ‘a’ and ‘b’ using formula, & writing the
regression equation.
Interpreting the slope (with + or – sign).
Using the equation, making a prediction &
calculating prediction errors (residuals)
From regression output: R-sq, SSR, SSE, SST, F-
statistic and its significance, Slope and intercept
values.
Calculating & Interpreting the r-squared value
The Simple Linear Regression
• Example 2
A car dealer wants to find the relationship between the odometer reading and
the selling price of used cars. A random sample is selected, and the data recorded.
Car   Odometer (x, independent variable)   Price (y, dependent variable)
1     37388                                14636
2     44758                                14122
3     45833                                14016
4     30862                                15590
5     31705                                15568
6     34010                                14718
. . .  . . .                               . . .
Find the regression line.
Interpret the slope.
Scatterplot
[Figure: scatterplot of Price (y) against Odometer reading (x) for the sampled cars]
Excel Output of car price example
Note: R² can also be obtained from the ANOVA table.
SS Regression = SSR = 2052339
SS Total = SST = 2318134
R² = SSR/SST = 2052339 / 2318134 = 0.8853
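The same ANOVA figures also give SSE by subtraction, since SST = SSR + SSE. A minimal sketch with the two values reported in the Excel output:

```python
# ANOVA sums of squares reported in the Excel output for the car price example
ssr = 2052339  # SS Regression
sst = 2318134  # SS Total

sse = sst - ssr        # SS Residual (error)
r_squared = ssr / sst  # proportion of price variation explained by odometer reading
print(f"SSE = {sse}, R^2 = {r_squared:.4f}")  # SSE = 265795, R^2 = 0.8853
```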
Homework Problem: Predict annual sales based upon
store area. Use both a calculator and MS Excel.
Data for 7 Stores:
Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
QUESTIONS
1. What is the correlation coefficient? SSR, SST, SSE?
2. What is the regression equation?
3. Interpret the slope.
4. Interpret the R-sq value.
5. Interpret the F-statistic.
6. What’s the prediction error for Store 3 and Store 6?
7. Consider a store with 2000 square feet of area. Predict annual sales for
this case…
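To cross-check calculator and Excel answers, here is a minimal standard-library Python sketch that computes the numbers behind questions 1 to 7 (the interpretations in questions 3 to 5 are still up to you). The function name `regression_summary` and the dictionary keys are hypothetical choices for this illustration; work the steps by hand first and use the script only to confirm.

```python
import math

# Homework data: store area (square feet) and annual sales ($000)
area = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def regression_summary(xs, ys):
    """Simple linear regression quantities for checking hand or Excel work."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx
    a = y_bar - b * x_bar
    ssr = b * sxy    # explained sum of squares
    sse = sst - ssr  # unexplained sum of squares
    return {
        "n": n,
        "a": a,
        "b": b,
        "r": sxy / math.sqrt(sxx * sst),  # correlation coefficient
        "sst": sst,
        "ssr": ssr,
        "sse": sse,
        "r_squared": ssr / sst,
        "f_stat": ssr / (sse / (n - 2)),
    }


stats = regression_summary(area, sales)
for name, value in stats.items():
    print(f"{name:>10}: {value:,.4f}")

# Residuals for stores 3 and 6 (list index = store number - 1)
for store in (3, 6):
    x, y = area[store - 1], sales[store - 1]
    predicted = stats["a"] + stats["b"] * x
    print(f"Store {store}: predicted {predicted:,.1f}, residual {y - predicted:,.1f}")

# Prediction for a 2,000 square-foot store
print(f"2,000 sq ft: {stats['a'] + stats['b'] * 2000:,.1f} ($000)")
```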