Unit 4: Regression Analysis
S. Nakale
University of Namibia
(School of Accounting)
2021 Academic Year
Learning outcome
At the end of this lecture, you must be able to:
Explain the meaning of regression analysis and identify
practical examples where regression analysis is used
Construct and interpret simple linear regression equations
Assess the goodness of fit for linear regression equations
Use the linear regression equations for prediction purposes
Recall from Business Statistics A
Measures of association between two variables:
Covariance: a measure of the linear association between two variables.
Correlation coefficient: a measure of the strength of the linear relationship between two variables.
Linear Regression Analysis
• Regression Analysis is a statistical procedure which is
used to develop an equation showing the relationship
between independent variable(s) and a dependent
variable.
– Dependent variable: variable being predicted.
– Independent variable(s): the variable or variables being
used to predict the dependent variable.
• Note: the dependent variable is denoted by \(y\), while the independent variables are denoted
by \(x_1, x_2, \ldots, x_k\) if there are k independent variables.
Simple vs. Multiple Linear Regression Analysis
• Simple Linear Regression Analysis is a type of regression analysis
involving one independent variable and one dependent variable
and the relationship is approximated by a straight line i.e. linear
equation.
• Regression Analysis involving two or more independent
variables is called Multiple Regression Analysis (not covered in this
course).
Simple Linear Regression Equation
• The equation that describes how the expected value of the
dependent variable is related to the independent variable is called
the Simple Linear Regression Equation and is given by:
\[ E(y) = \beta_0 + \beta_1 x \]
where \(\beta_0\) is the y-intercept of the regression line, \(\beta_1\) is the slope and
\(E(y)\) is the mean or expected value of the dependent variable for a
given value of the independent variable.
Estimated Simple Linear Regression Equation
• In practice, the population parameters 𝛽0 and 𝛽1 are unknown and
must be estimated using the sample data. Thus, sample statistics 𝑏0
and 𝑏1 are computed and used as estimates of the population
parameters 𝛽0 and 𝛽1 .
• The Estimated Simple Linear Regression Equation is given by:
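\[ \hat{y} = b_0 + b_1 x \]
where \(\hat{y}\) is the estimated value of the dependent variable for a given value of \(x\).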
The method of Least Squares
• The Least Squares Method is a procedure for using sample data to estimate the simple
linear regression equation.
• It provides values for \(b_0\) and \(b_1\) that minimise the sum of squares of the deviations between
the observed values of the dependent variable \(y_i\) and the estimated values of the
dependent variable \(\hat{y}_i\), i.e. \(\min \sum (y_i - \hat{y}_i)^2\).
• Using differential calculus, it can be shown that the values of \(b_0\) and \(b_1\) that minimise the
sum of squares of the deviations, i.e. \(\sum (y_i - \hat{y}_i)^2\), are given by:
\[ b_1 = \frac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n} \]
\[ b_0 = \frac{\sum y_i}{n} - b_1 \left( \frac{\sum x_i}{n} \right) \]
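The two least squares formulas can be checked numerically. The following is a minimal Python sketch, not part of the original lecture: the function name `least_squares` and the small data set are illustrative assumptions, and the code simply applies the computational formulas for \(b_1\) and \(b_0\) given in this unit.

```python
def least_squares(x, y):
    """Return (b0, b1) for the estimated line y-hat = b0 + b1 * x.

    Applies the computational formulas from this unit:
      b1 = [sum(xy) - sum(x) * sum(y) / n] / [sum(x^2) - (sum(x))^2 / n]
      b0 = sum(y) / n - b1 * sum(x) / n
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denominator = sum_x2 - sum_x ** 2 / n
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = (sum_xy - sum_x * sum_y / n) / denominator
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Made-up data lying exactly on the line y = 1 + 2x (illustration only)
b0, b1 = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```

Because the illustrative points lie exactly on a straight line, the sketch recovers the intercept 1 and the slope 2 without error; with real sample data the line will only approximate the points.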
Interpreting the estimated slope coefficient (𝑏1)
• If the sign of 𝑏1 is positive, an increase in the independent variable is
expected to result in an increase in the dependent variable.
• If the sign of 𝑏1 is negative, an increase in the independent variable is
expected to result in a decrease in the dependent variable.
Assessing the goodness of fit of the estimated linear
regression equation
• The coefficient of determination measures the goodness of fit of the estimated regression equation, i.e. how well the
estimated regression equation fits the data.
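• To calculate the coefficient of determination (denoted by \(r^2\)), define the following:
– Sum of Squares due to Error: \( SSE = \sum (y_i - \hat{y}_i)^2 \)
– Sum of Squares due to Regression: \( SSR = \sum (\hat{y}_i - \bar{y})^2 = \dfrac{\left[ \sum x_i y_i - (\sum x_i \sum y_i)/n \right]^2}{\sum x_i^2 - (\sum x_i)^2/n} \)
– Total Sum of Squares: \( TSS = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \dfrac{(\sum y_i)^2}{n} \)
– Note: TSS = SSR + SSE
\[ \text{coefficient of determination} = r^2 = \frac{SSR}{TSS} = \frac{\dfrac{\left[ \sum x_i y_i - (\sum x_i \sum y_i)/n \right]^2}{\sum x_i^2 - (\sum x_i)^2/n}}{\sum y_i^2 - \dfrac{(\sum y_i)^2}{n}} \]
\[ 0 \le r^2 \le 1 \]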
• Commonly expressed as percentage.
• Interpreted as the percentage of the variability in the dependent variable that can be explained by the estimated linear
regression equation between the independent and the dependent variable.
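The decomposition TSS = SSR + SSE and the ratio SSR/TSS can be illustrated with a short Python sketch. It is not part of the original lecture: the function name `sums_of_squares` and the four data points are made up for illustration. SSR and TSS use the computational formulas from this unit, while SSE is computed directly from the residuals so that the identity can be checked.

```python
def sums_of_squares(x, y):
    """Return (SSR, SSE, TSS) for a simple linear regression of y on x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Numerator and denominator of the slope formula
    s_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    s_xx = sum(a * a for a in x) - sum_x ** 2 / n
    b1 = s_xy / s_xx
    b0 = sum_y / n - b1 * sum_x / n
    ssr = s_xy ** 2 / s_xx                       # computational formula for SSR
    tss = sum(b * b for b in y) - sum_y ** 2 / n  # computational formula for TSS
    # SSE from its definition: squared gaps between observed and fitted values
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return ssr, sse, tss


# Made-up data for illustration only
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
ssr, sse, tss = sums_of_squares(x, y)
r2 = ssr / tss
print(round(ssr, 2), round(sse, 2), round(tss, 2))  # 18.05 0.7 18.75
print(round(r2, 3))  # 0.963
```

Here SSR + SSE = 18.05 + 0.70 = 18.75 = TSS, and about 96% of the variation in the illustrative y values is explained by the fitted line.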
Relationship between coefficient of determination and
correlation coefficient
If a (simple linear) regression analysis has already been performed and the
coefficient of determination (\(r^2\)) computed, the sample correlation coefficient
(\(r_{xy}\)) between the independent and the dependent variable can be
computed by:
\[ r_{xy} = (\text{sign of } b_1) \sqrt{r^2} \]
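As a small illustration of this rule, the sketch below takes the square root of \(r^2\) and attaches the sign of the slope. The function name `correlation_from_r2` is a hypothetical helper introduced for this example, not something from the lecture.

```python
import math


def correlation_from_r2(r2, b1):
    """Return r_xy = (sign of b1) * sqrt(r2) for a simple linear regression."""
    if not 0 <= r2 <= 1:
        raise ValueError("r2 must lie between 0 and 1")
    # copysign gives the magnitude sqrt(r2) the sign of the slope b1
    return math.copysign(math.sqrt(r2), b1)


# Same r2, opposite slopes: only the sign of the correlation changes
r_positive = correlation_from_r2(0.81, 2.5)   # approximately  0.9
r_negative = correlation_from_r2(0.81, -2.5)  # approximately -0.9
print(round(r_positive, 2), round(r_negative, 2))
```

The sign matters because \(r^2\) alone cannot distinguish an upward-sloping relationship from a downward-sloping one.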
Exercise 15
Armand’s Pizza is located in a five-state area. Armand’s most successful locations are near college
campuses. The managers believe that quarterly sales for these restaurants are related to the size of
the student population.
Suppose the following data were collected from a sample of 6 Armand’s Pizza restaurants located
near college campuses:
Restaurant   Student Population (1000s)   Quarterly Sales (N$ 1000s)
1            2                            58
2            6                            105
3            8                            118
4            12                           117
5            20                           169
6            26                           202
Required:
a. Which variable is the dependent variable and which variable is the independent variable?
b. Construct a scatterplot for the data.
c. Use the least squares method to estimate the simple linear regression equation showing how the
dependent variable is related to the independent variable.
d. Provide an interpretation of the slope coefficient of the estimated linear regression equation.
e. Use the estimated regression equation from (c) to predict the quarterly sales of pizza when the
student population is 7000.
f. Calculate and interpret the coefficient of determination for the estimated regression.
g. Calculate and interpret the correlation coefficient between the dependent and the
independent variable.
Exercise 15 Solution
a. Which variable is the dependent variable and which variable is the independent variable?
Dependent: Quarterly sales
Independent: Student population
Why?
b. Construct a scatterplot for the data.
[Scatterplot: Quarterly sales (N$ 1000s) on the vertical axis (0 to 250) against Student population (1000s) on the horizontal axis (0 to 30)]
What can you say about the relationship between student population and quarterly sales?
Exercise 15 Solution
c. Use the least squares method to estimate the simple linear regression equation showing how the
dependent variable is related to the independent variable.
x     y     xy                  x^2           y^2
2     58    (2)(58) = 116       2^2 = 4       58^2 = 3364
6     105   (6)(105) = 630      6^2 = 36      105^2 = 11025
8     118   (8)(118) = 944      8^2 = 64      118^2 = 13924
12    117   (12)(117) = 1404    12^2 = 144    117^2 = 13689
20    169   (20)(169) = 3380    20^2 = 400    169^2 = 28561
26    202   (26)(202) = 5252    26^2 = 676    202^2 = 40804
\(\sum x = 74\), \(\sum y = 769\), \(\sum xy = 11726\), \(\sum x^2 = 1324\), \(\sum y^2 = 111367\)
\[ b_1 = \frac{11726 - (74)(769)/6}{1324 - (74)^2/6} = 5.45 \]
\[ b_0 = \frac{769}{6} - 5.45 \left( \frac{74}{6} \right) = 60.95 \]
\[ \hat{y} = 60.95 + 5.45x \]
Exercise 15 Solution
d. Provide an interpretation of the slope coefficient of the estimated linear regression equation.
If the student population increases by 1000 students, the quarterly sales will be expected to increase
by (1000)(5.45)= N$ 5450, holding all other factors constant.
e. Use the estimated regression equation from (c) to predict the quarterly sales of pizza when the student
population is 7000.
\(\hat{y} = 60.95 + 5.45(7) = 99.1\), therefore the predicted quarterly sales are \(99.1 \times 1000\) = N$ 99 100.
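The hand calculation for parts (c) and (e) can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic, not part of the original solution; the variable names are illustrative, and the data are the six observations from Exercise 15.

```python
# Exercise 15 data: student population (1000s) and quarterly sales (N$ 1000s)
x = [2, 6, 8, 12, 20, 26]
y = [58, 105, 118, 117, 169, 202]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Least squares estimates from the computational formulas
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

# Prediction for a student population of 7000, i.e. x = 7 (in 1000s)
y_hat_7 = b0 + b1 * 7

print(sum_x, sum_y, sum_xy, sum_x2)  # 74 769 11726 1324
print(round(b1, 2), round(b0, 2))    # 5.45 60.95
print(round(y_hat_7, 1))             # 99.1, i.e. about N$ 99 100
```

Note that x is entered in thousands of students, so a population of 7000 corresponds to x = 7, and the prediction is read in thousands of N$.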
Exercise 15 Solution
f. Calculate and interpret the coefficient of determination for the estimated regression.
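\(\sum x = 74\), \(\sum y = 769\), \(\sum xy = 11726\), \(\sum x^2 = 1324\), \(\sum y^2 = 111367\)
\[ r^2 = \frac{\dfrac{\left[ 11726 - (74)(769)/6 \right]^2}{1324 - (74)^2/6}}{111367 - (769)^2/6} = \frac{12216.54}{12806.83} = 0.95 \]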
About 95% of the variation in the quarterly sales can be explained by the variation in the student
population.
g. Calculate and interpret the correlation coefficient between the dependent and the independent
variable.
\[ r_{xy} = +\sqrt{0.95} = 0.97 \]
There is a relatively strong positive linear association between the quarterly sales and student
population.
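The figures in parts (f) and (g) can be checked from the column totals with a short Python sketch. It is offered as an arithmetic check under the lecture's own formulas, with illustrative variable names. It also shows a rounding effect: the solution takes the square root of the rounded value 0.95, whereas the unrounded \(r^2\) gives a slightly larger correlation.

```python
import math

# Column totals from the Exercise 15 working table
n = 6
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 74, 769, 11726, 1324, 111367

s_xy = sum_xy - sum_x * sum_y / n          # numerator of the slope formula
ssr = s_xy ** 2 / (sum_x2 - sum_x ** 2 / n)
tss = sum_y2 - sum_y ** 2 / n
r2 = ssr / tss

# Correlation takes the sign of the slope, which is the sign of s_xy
r_from_rounded_r2 = math.copysign(math.sqrt(round(r2, 2)), s_xy)
r_unrounded = math.copysign(math.sqrt(r2), s_xy)

print(round(ssr, 2), round(tss, 2))  # 12216.54 12806.83
print(round(r2, 2))                  # 0.95
print(round(r_from_rounded_r2, 2))   # 0.97 (as in the solution)
print(round(r_unrounded, 2))         # 0.98 (no intermediate rounding)
```

Either value supports the same conclusion of a strong positive linear association; the difference comes only from when the rounding is done.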
Additional Exercises (A)
It is common knowledge that it pays to advertise. Suppose an investigation is undertaken to determine the
effect of advertisement on sales in Small and Medium Enterprises (SMEs) during the first wave of Covid-19. Eight SMEs
are randomly selected, and their advertising expenditure and profit for the first month of the first wave of
Covid-19 are recorded:
SME   Advertisement expenditure (N$)   Profit (N$)
1     1500                             6300
2     2300                             6900
3     2600                             7700
4     1050                             4500
5     650                              4600
6     2100                             7500
7     400                              1500
8     1300                             4000
a. Which variable is the dependent variable and which variable is the independent variable?
b. Construct a scatterplot for the data.
c. Use the least squares method to estimate the simple linear regression equation showing how the
dependent variable is related to the independent variable.
d. Provide an interpretation of the slope coefficient of the estimated linear regression equation.
e. Use the estimated regression equation to predict the dependent variable when the independent
variable is N$ 1400.
f. Calculate and interpret the coefficient of determination for the estimated regression.
g. Calculate and interpret the correlation coefficient between the dependent and the
independent variable.
Additional Exercises (A)
a. Which variable is the dependent variable and which variable is the independent
variable?
Dependent variable: Profit
Independent variable: Advertisement expenditure
b. Construct a scatterplot for the data.
[Scatterplot: Profit (N$) on the vertical axis (0 to 9000) against Advertisement expenditure (N$) on the horizontal axis (0 to 3000)]
Additional Exercises (A)
c. Use the least squares method to estimate the simple linear regression equation showing how the
dependent variable is related to the independent variable.
x       y       xy          x^2        y^2
1500    6300    9450000     2250000    39690000
2300    6900    15870000    5290000    47610000
2600    7700    20020000    6760000    59290000
1050    4500    4725000     1102500    20250000
650     4600    2990000     422500     21160000
2100    7500    15750000    4410000    56250000
400     1500    600000      160000     2250000
1300    4000    5200000     1690000    16000000
\(\sum x = 11900\), \(\sum y = 43000\), \(\sum xy = 74605000\), \(\sum x^2 = 22085000\), \(\sum y^2 = 262500000\)
\[ b_1 = \frac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n} = \frac{74605000 - (11900)(43000)/8}{22085000 - (11900)^2/8} = 2.43 \]
\[ b_0 = \frac{\sum y}{n} - b_1 \left( \frac{\sum x}{n} \right) = \frac{43000}{8} - 2.43 \left( \frac{11900}{8} \right) = 1760.38 \]
\[ \hat{y} = 1760.38 + 2.43x \]
Additional Exercises (A)
d. Provide an interpretation of the slope coefficient of the estimated linear regression equation.
If advertisement expenditure increases by N$ 1, the profit is expected to increase by N$ 2.43, holding
all other factors constant.
e. Use the estimated regression equation to predict the dependent variable when the independent
variable is N$ 1400.
\(\hat{y} = 1760.38 + 2.43(1400) = 5162.38\), i.e. a predicted profit of N$ 5162.38.
Additional Exercises (A)
f. Calculate and interpret the coefficient of determination for the estimated regression.
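\(\sum x = 11900\), \(\sum y = 43000\), \(\sum xy = 74605000\), \(\sum x^2 = 22085000\), \(\sum y^2 = 262500000\)
\[ r^2 = \frac{SSR}{TSS} = \frac{\dfrac{\left[ \sum x_i y_i - (\sum x_i \sum y_i)/n \right]^2}{\sum x_i^2 - (\sum x_i)^2/n}}{\sum y_i^2 - \dfrac{(\sum y_i)^2}{n}} = \frac{\dfrac{\left[ 74605000 - (11900)(43000)/8 \right]^2}{22085000 - (11900)^2/8}}{262500000 - \dfrac{(43000)^2}{8}} = 0.82 \]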
About 82% of the variation in the SMEs’ profits can be explained by the estimated regression equation
between advertisement expenditure and profit. This indicates a relatively good fit for the data.
g. Calculate and interpret the correlation coefficient between the dependent and the independent
variable.
\[ r_{xy} = +\sqrt{r^2} = \sqrt{0.82} = 0.91 \]
There is a relatively strong positive linear relationship between advertisement expenditure and profit.
Additional Exercises (B)
Major hotels frequently provide special rates for business travelers. The lowest rates are charged when
reservations are made 14 days in advance. The following table reports the business rates (x) and the 14-day
advance super saver rates (y) for one night at a sample of six ITT Sheraton Hotels (Sky Magazine, January
2005).
Hotel location   Business rate (x)   14-day advance super saver rate (y)
A                89                  81
B                130                 115
C                98                  89
D                149                 138
E                199                 149
F                114                 94
a) Use the least squares method and estimate the regression equation relating business rates and the 14-
day advance rates.
b) The ITT Sheraton Hotel offers a business rate of N$ 140 per night. Estimate the 14-day advance super-
saver rate at this hotel.
c) Compute the coefficient of determination and comment on the goodness of fit.
d) Calculate and interpret the correlation coefficient between the business rates and the 14-day advance
super saver rates.
Additional Exercises (B) solution
a) Use the least squares method and estimate the regression equation relating business rates and the 14-
day advance rates.
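x      y      xy       x^2      y^2
89     81     7209     7921     6561
130    115    14950    16900    13225
98     89     8722     9604     7921
149    138    20562    22201    19044
199    149    29651    39601    22201
114    94     10716    12996    8836
\(\sum x = 779\), \(\sum y = 666\), \(\sum xy = 91810\), \(\sum x^2 = 109223\), \(\sum y^2 = 77788\)
\[ b_1 = \frac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n} = \frac{91810 - (779)(666)/6}{109223 - (779)^2/6} = 0.66 \]
\[ b_0 = \frac{\sum y}{n} - b_1 \left( \frac{\sum x}{n} \right) = \frac{666}{6} - 0.66 \left( \frac{779}{6} \right) = 25.31 \]
\[ \hat{y} = 25.31 + 0.66x \]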
b) The ITT Sheraton Hotel offers a business rate of N$ 140 per night. Estimate the 14-day advance super-
saver rate at this hotel.
\(\hat{y} = 25.31 + 0.66(140) = 117.71\), i.e. an estimated rate of N$ 117.71.
Additional Exercises (B) solution
c. Compute the coefficient of determination and comment on the goodness of fit.
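\(\sum x = 779\), \(\sum y = 666\), \(\sum xy = 91810\), \(\sum x^2 = 109223\), \(\sum y^2 = 77788\)
\[ r^2 = \frac{SSR}{TSS} = \frac{\dfrac{\left[ \sum x_i y_i - (\sum x_i \sum y_i)/n \right]^2}{\sum x_i^2 - (\sum x_i)^2/n}}{\sum y_i^2 - \dfrac{(\sum y_i)^2}{n}} = \frac{\dfrac{\left[ 91810 - (779)(666)/6 \right]^2}{109223 - (779)^2/6}}{77788 - \dfrac{(666)^2}{6}} = 0.91 \]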
About 91% of the variation in the 14-day advance super saver rates is explained by the estimated regression equation
between the business rates and the 14-day advance super saver rates. This indicates a relatively good fit.
d. Calculate and interpret the correlation coefficient between the business rates and the 14-day advance
super saver rates.
\[ r_{xy} = +\sqrt{0.91} = 0.95 \]
There is a strong positive linear relationship between the business rates (x) and the 14-day advance super saver
rates (y).
Additional Exercises (C)
An accountant at a manufacturing company wants to establish the relationship
between the production cost per unit (y) and the production volume (x). Consider the
following sample of production volumes and production cost per unit data:
Production volume (units)   Production cost per unit (N$)
4000                        72
4500                        72
5500                        59
6000                        58
7000                        56
7500                        55
9000                        54
a. Use the least squares method to estimate the simple linear regression equation showing how the
dependent variable is related to the independent variable. Interpret the slope coefficient.
b. Calculate and interpret the coefficient of determination for the estimated regression.
c. Calculate and interpret the correlation coefficient between the dependent and the
independent variable.
Additional Exercises (C)
a. Use the least squares method to estimate the simple linear regression equation showing how the
dependent variable is related to the independent variable. Interpret the slope coefficient.
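x       y     xy        x^2         y^2
4000    72    288000    16000000    5184
4500    72    324000    20250000    5184
5500    59    324500    30250000    3481
6000    58    348000    36000000    3364
7000    56    392000    49000000    3136
7500    55    412500    56250000    3025
9000    54    486000    81000000    2916
\(\sum x = 43500\), \(\sum y = 426\), \(\sum xy = 2575000\), \(\sum x^2 = 288750000\), \(\sum y^2 = 26290\)
\[ b_1 = \frac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n} = \frac{2575000 - (43500)(426)/7}{288750000 - (43500)^2/7} = -0.004 \]
\[ b_0 = \frac{\sum y}{n} - b_1 \left( \frac{\sum x}{n} \right) = \frac{426}{7} - (-0.004) \left( \frac{43500}{7} \right) = 85.71 \]
\[ \hat{y} = 85.71 - 0.004x \]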
If production volume increases by 1 unit, the production cost per unit
will be expected to decrease by N$0.004, holding everything else
constant.
Additional Exercises (C)
b. Calculate and interpret the coefficient of determination for the estimated regression.
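\(\sum x = 43500\), \(\sum y = 426\), \(\sum xy = 2575000\), \(\sum x^2 = 288750000\), \(\sum y^2 = 26290\)
\[ r^2 = \frac{SSR}{TSS} = \frac{\dfrac{\left[ \sum x_i y_i - (\sum x_i \sum y_i)/n \right]^2}{\sum x_i^2 - (\sum x_i)^2/n}}{\sum y_i^2 - \dfrac{(\sum y_i)^2}{n}} = \frac{\dfrac{\left[ 2575000 - (43500)(426)/7 \right]^2}{288750000 - (43500)^2/7}}{26290 - \dfrac{(426)^2}{7}} = 0.78 \]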
About 78% of the variation in the production cost per unit can be explained by the
estimated regression equation between production volume and production cost per
unit. This indicates a good fit.
c. Calculate and interpret the correlation coefficient between the dependent and the independent
variable.
\[ r_{xy} = -\sqrt{0.78} = -0.88 \]
There is a relatively strong negative linear association between production volume and
production cost per unit i.e. as production volume increases, production cost per unit
decreases, holding all other factors constant.
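This exercise has a negative slope, so it is a useful check on the sign rule for the correlation coefficient. The Python sketch below is an arithmetic check with illustrative variable names, not part of the original solution. It also shows that, because the slope is very small, rounding it to -0.004 before computing the intercept shifts the intercept noticeably.

```python
import math

# Additional Exercises (C) data: production volume (units), cost per unit (N$)
volume = [4000, 4500, 5500, 6000, 7000, 7500, 9000]
cost = [72, 72, 59, 58, 56, 55, 54]

n = len(volume)
sum_x, sum_y = sum(volume), sum(cost)
s_xy = sum(a * b for a, b in zip(volume, cost)) - sum_x * sum_y / n
s_xx = sum(a * a for a in volume) - sum_x ** 2 / n
s_yy = sum(b * b for b in cost) - sum_y ** 2 / n

b1 = s_xy / s_xx
r2 = s_xy ** 2 / (s_xx * s_yy)        # equals SSR / TSS
r = math.copysign(math.sqrt(r2), b1)  # correlation takes the sign of b1

# Intercept with the slope rounded to -0.004 (as in the hand calculation)
b0_rounded_slope = sum_y / n - round(b1, 3) * sum_x / n
# Intercept with the unrounded slope
b0_unrounded = sum_y / n - b1 * sum_x / n

print(round(b1, 4))                # -0.0039
print(round(r2, 2), round(r, 2))   # 0.78 -0.88
print(round(b0_rounded_slope, 2))  # 85.71
print(round(b0_unrounded, 2))
```

With the unrounded slope the intercept is roughly 85.2 instead of 85.71. When x values are in the thousands and the slope is small, keeping more decimal places in \(b_1\) gives a more accurate fitted line.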
Reference
Anderson, D.R., Sweeney, D.J. and Williams, T.A., 2011. Statistics for Business and Economics, eleventh edition.
Wegner, T., 2016. Applied Business Statistics: Methods and Excel-Based Applications, fourth edition.