Econometrics for Management
(MGMT3071)
Chapter Two:
Regression Analysis
Teklebirhan Alemnew (Assistant Professor)
AAU, 2023
2.1. Introduction
As you know, economic theories are mainly concerned with the
relationships between variables.
These relationships can be stated in mathematical terms which
show the functional relationship of variables.
These functional relationships define how one variable depends on
the other variable(s) in a specific functional form.
The specific functional forms may be linear, quadratic,
logarithmic, exponential, or any other form.
By: Teklebirhan A. 2
The two types of regression analysis are:
a) Simple linear regression analysis, also known as two-variable
regression, in which the dependent variable is linearly related to a
single explanatory variable.
b) Multiple linear regression analysis, in which the regressand is
related to two or more regressors.
2.2. The Concept of Regression Analysis
The main goal of any econometric analysis is to establish an
acceptable empirical causal relationship between variables.
Regression analysis is concerned with the study of the dependence
of one variable (the dependent variable) on one or more other
variables (the explanatory variable(s)).
In other words, regression analysis is concerned with describing
and evaluating the relationship between a given variable and one or
more other variables.
The objective of regression analysis is to estimate and/or predict
the unknown (population) mean value of the dependent variable in
terms of the known values of the explanatory variables.
For instance, an economist may be interested in studying the
dependence of household monthly consumption expenditure on
household monthly disposable income.
That is, our concern might be with predicting the average
consumption expenditure knowing household monthly disposable
income.
Such an analysis is helpful in estimating the marginal propensity to
consume (MPC), that is, average change in consumption
expenditure for, say, a unit change in disposable income.
Regression Line
The line that passes through the average level of consumption
expenditure for each level of household income is known as the
regression line. It shows how the average consumption expenditure
increases with the household’s income.
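The idea of a line through the average consumption at each income level can be sketched in a few lines of Python. The household figures below are hypothetical values invented for illustration only; they are not data from this chapter.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (income, consumption) pairs: three households per income level.
households = [
    (80, 55), (80, 65), (80, 75),
    (100, 65), (100, 80), (100, 95),
    (120, 79), (120, 90), (120, 101),
]

# Group consumption expenditure by income level.
by_income = defaultdict(list)
for income, consumption in households:
    by_income[income].append(consumption)

# The regression line passes through these conditional means.
conditional_means = {
    income: mean(values) for income, values in sorted(by_income.items())
}

for income, average in conditional_means.items():
    print(f"income = {income}: average consumption = {average}")
```

Individual households scatter around each average; the regression line describes only how the average moves as income rises.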
What is the difference between Regression and Correlation Analysis?
Regression analysis is closely related to correlation analysis, but
conceptually the two are very different.
Statistical relationships (Regression analysis) by themselves
cannot logically imply causation.
To ascribe causality, one must appeal to ‘a priori’ or theoretical
considerations.
The primary objective of correlation analysis is to measure the
strength or degree of linear association between two variables.
However, in regression analysis, we try to predict the average
value of the dependent variable on the basis of fixed values of the
explanatory variables.
Terminologies in Regression Theory
The variables in a regression relation consist of dependent and
explanatory variables.
The dependent variable is the variable whose variation is being
explained by the other variable(s).
The explanatory variable is the variable whose variation is used to
explain the variation in the dependent variable.
The following is a representative list of the various terminologies
used in regression analysis:
Dependent Variable       Explanatory Variable
Explained variable       Independent variable
Predictand               Predictor
Regressand               Regressor
Response                 Stimulus
Endogenous               Exogenous
Outcome                  Covariate
Controlled variable      Control variable
Note that regression analysis can be simple or multiple depending
on the number of explanatory variables included in the analysis.
That is, if we are studying the dependence of one variable on only a
single explanatory variable, such as the dependence of
consumption expenditure on the level of real income, such a study
is known as simple, or two-variable, regression analysis.
However, if we are studying the dependence of one variable on
more than one explanatory variable, such as the dependence of
crop yield on rainfall, labor spent, farm size, fertilizer, etc., it is
known as multiple regression analysis.
Types of Regression Models
[Figure: types of regression models]
2.3. Correlation Analysis
Correlation is a bivariate analysis that measures the strength of
association between two variables and the direction of the
relationship - without being able to infer causal relationships.
In terms of the strength of relationship, the value of the
correlation coefficient varies between +1 and -1.
A value of ± 1 indicates a perfect degree of association
between the two variables.
As the correlation coefficient value goes towards 0, the
relationship between the two variables will be weaker.
The direction of the relationship is indicated by the sign of the
coefficient:
a + sign indicates a positive relationship, and
a – sign indicates a negative relationship.
In general, correlation:
measures the direction and strength of the linear relationship
between two quantitative variables;
is represented by ‘r’;
makes no assumption of causality;
assumes a linear association between the two variables.
Scatter plot
Linear relationships implying straight line association are
visualized with scatter plots.
Consider the following cigarette data set (n = 11):
X = per capita cigarette consumption (CIG)
Y = lung cancer mortality per 100,000 in 1950 (LUNGCA)
. scatter LUNGCA CIG
[Figure: scatter plot of LUNGCA against CIG]
Assess the scatter plot for functional form, direction of
association, outliers, and strength of relation:
Form: linear
Direction: positive association
Outliers: no clear outliers
Strength: difficult to determine by eye
The eye is not a good judge of strength.
[Figure: identical data sets on differently scaled axes; in one panel
the relation appears weak, in the other it appears strong]
The difference in apparent strength is an artifact of the axis
scaling.
The pattern of data is indicative of the type of relationship
between your two variables:
Positive relationship
Negative relationship
No relationship
[Figure: Positive relationship]
[Figure: Negative relationship, income and illiteracy rates (%);
x-axis: Income, y-axis: Rate of illiteracy (%)]
[Figure: No relationship]
In statistics, four types of correlation measures are commonly used:
a) Pearson correlation (simple correlation coefficient, r),
b) Spearman rank correlation,
c) Kendall rank correlation, and
d) Point-biserial correlation.
a) Pearson Correlation
It is also called Simple Correlation coefficient (r) or product
moment correlation coefficient.
Pearson r correlation is the most widely used correlation statistic
to measure the degree of the relationship between linearly related
variables.
For example, in the fertilizer market, if we want to measure how
two fertilizers are related to each other, Pearson r correlation is
used to measure the degree of relationship between the two.
It measures the nature and strength of association between two
quantitative variables.
The value of r ranges between -1 and +1.
The sign of r denotes the nature of the association, while the
magnitude of r denotes its strength.
If the sign is positive, the relation is direct (an increase in one
variable is associated with an increase in the other variable, and a
decrease in one variable is associated with a decrease in the other
variable).
If the sign is negative, the relation is inverse or indirect (an
increase in one variable is associated with a decrease in the other).
[Photo: Karl Pearson, 1857 - 1936]
[Figure: Correlational Direction and Strength]
The following formula is used to calculate the Pearson ‘r’
correlation:
$$ r = \frac{1}{n-1}\sum_{i=1}^{n} z_{X_i}\, z_{Y_i}, \qquad z_{X_i} = \frac{X_i - \bar{X}}{s_X}, \quad z_{Y_i} = \frac{Y_i - \bar{Y}}{s_Y} $$
z-scores quantify distance above or below the mean in standard
deviation units.
When z-scores track in the same direction ⟹ products are positive.
When z-scores track in opposite directions ⟹ products are negative.
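As a minimal sketch of the z-score idea, the Python snippet below computes Pearson's r as the sum of z-score products divided by n - 1. The function name `pearson_r` is hypothetical, and the six (X, Y) pairs are used purely for illustration (they match the values in this chapter's worked OLS table); sample standard deviations (n - 1 divisor) are assumed.

```python
from statistics import mean, stdev

def pearson_r(x_values, y_values):
    """Pearson r as the sum of z-score products divided by n - 1."""
    n = len(x_values)
    x_bar, y_bar = mean(x_values), mean(y_values)
    s_x, s_y = stdev(x_values), stdev(y_values)  # sample standard deviations
    z_x = [(x - x_bar) / s_x for x in x_values]
    z_y = [(y - y_bar) / s_y for y in y_values]
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

# Illustrative data: six paired observations.
X = [5, 4, 8, 10, 13, 14]
Y = [4, 4, 7, 8, 9, 10]

r = pearson_r(X, Y)
print(f"r = {r:.4f}")
```

Here every pair of z-scores has the same sign (or is zero), so all products are non-negative and r comes out close to +1.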
Types of research questions a Pearson correlation can examine:
Is there a relationship between job satisfaction, as measured by
the JSS, and income, measured in dollars?
Is there a statistically significant relationship between age, as
measured in years, and height, measured in inches?
Is there a relationship between temperature, measured in
degrees Fahrenheit, and ice cream sales, measured by income?
Is there a statistically significant relationship between
fertilizer, as measured in Kg, and crop productivity, measured
in Quintal?
Assumptions
For the Pearson r correlation, both variables should be normally
distributed (normally distributed variables have a bell-shaped
curve).
Other assumptions include linearity and homoscedasticity.
Linearity assumes a straight line relationship between each of the
two variables and homoscedasticity assumes that data is equally
distributed about the regression line.
STATA Output – Correlation coefficient (Pearson)
. pwcorr LUNGCA CIG, obs sig star(1)

             |   LUNGCA      CIG
-------------+------------------
      LUNGCA |   1.0000
             |
             |       11
             |
         CIG |  0.7373*  1.0000
             |   0.0096
             |       11       11

NB: Non-significant correlation does not imply no association.
r = 0.74 indicates a strong, positive association, significant at the
1% level of significance.
b) Spearman Rank Correlation
Spearman rank correlation is a non-parametric test that is used to
measure the degree of association between two variables.
The Spearman rank correlation test does not carry any
assumptions about the distribution of the data and is the
appropriate correlation analysis when the variables are measured
on a scale that is at least ordinal.
The following formula is used to calculate the Spearman rank
correlation:
$$ \rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} $$
where
ρ = Spearman rank correlation
d_i = the difference between the ranks of corresponding variables
n = number of observations
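The rank-difference calculation can be sketched as follows. The helper names `ranks` and `spearman_rho` and the five data points are hypothetical, chosen only for illustration; the simple formula 1 - 6Σd²/(n(n² - 1)) is exact only when there are no tied values, which the sketch assumes.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x_values, y_values):
    """Spearman rank correlation via the sum of squared rank differences."""
    n = len(x_values)
    rank_x, rank_y = ranks(x_values), ranks(y_values)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Hypothetical data: years of education and starting salary.
education_years = [10, 12, 14, 16, 18]
starting_salary = [150, 300, 250, 500, 400]

rho = spearman_rho(education_years, starting_salary)
print(f"rho = {rho:.2f}")
```

Only the ordering of the observations matters here: replacing the salaries with any other values that keep the same ranking leaves rho unchanged.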
Types of research questions a Spearman Correlation can examine:
Is there a statistically significant relationship between
participants’ level of education (high school, bachelor’s, or
graduate degree) and their starting salary?
Is there a statistically significant relationship between worker’s
productivity and worker’s age?
. spearman LUNGCA CIG, stats(rho obs p) star(0.01)

 Number of obs =      11
Spearman's rho =  0.8428

Test of Ho: LUNGCA and CIG are independent
    Prob > |t| =  0.0011

Spearman's rho = 0.84 is statistically significant at the 1% level of
significance.
2.4. Population Regression Function Versus
Sample Regression Function
Population Regression Function (PRF)
The economic theory of consumption (in its simplest form) can be
modeled in stochastic form as follows:
$$ Y_i = \alpha + \beta X_i + U_i $$
The econometric model given above is called the population
regression model or, simply, the population model.
This population regression model is called the true relationship
because Y, X and U represent their respective population values,
and α and β are called the true parameters.
Therefore, right now, our major task is to estimate the population
regression function (PRF) on the basis of the sample regression
function (SRF).
2.5. Methods of Estimation: The Classical Simple Linear
Regression Analysis
Specifying the model is the first stage of any econometric
application. The next step is the estimation of the numerical values
of the parameters of economic relationships.
The parameters of the simple linear regression model can be
estimated by three commonly used estimation methods:
1. Ordinary Least Squares (OLS)
2. Method of Moments (MM)
3. Maximum Likelihood Method (MLM)
Because its estimators have desirable properties (linearity,
unbiasedness, and minimum variance), OLS is the most popular
method for estimating regression parameters.
2.5.1. The Basic Assumptions of the Classical Linear
Regression Analysis (OLS) to estimate SLRM & MLRM
The method of OLS is attributed to Carl Friedrich Gauss, a
German Mathematician.
OLS is an econometric method used to derive estimates of the
parameters of economic relationships from statistical observations.
However, it works under some restrictive assumptions.
The most important of these assumptions are discussed below.
A model is termed linear if it is linear in its parameters.
The homoscedasticity assumption implies that the values of Y
corresponding to various values of X have constant variance.
The normality assumption on the error term u_i is required mainly
for hypothesis testing (inference).
9) No model specification error: the econometric model is correctly
specified
No omission of relevant variable(s),
No inclusion of unnecessary variable(s),
No adoption of a wrong functional form.
Otherwise, OLS estimators may be biased and inconsistent.
10) Variability in the values of X
The ‘X’ values in a given sample must not all be the same.
11) Absence of high multicollinearity among explanatory variables
(specific to multiple regression models – Chapter 3)
There is no perfect linear relationship among the explanatory
variables; they are not perfectly correlated with each other.
NB:
If these assumptions are not satisfied, results obtained by
applying OLS can be misleading.
2.5.2. Estimation of SLRM by Ordinary Least
Square (OLS) Method
Obs.     Y     X    XY    X²   y = Y − Ȳ   x = X − X̄    xy    x²
1.       4     5    20    25      -3          -4        12    16
2.       4     4    16    16      -3          -5        15    25
3.       7     8    56    64       0          -1         0     1
4.       8    10    80   100       1           1         1     1
5.       9    13   117   169       2           4         8    16
6.      10    14   140   196       3           5        15    25
Sums    42    54   429   570       0           0        51    84
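As a hedged illustration, the sketch below applies the standard deviation-form OLS formulas, slope = Σxy / Σx² and intercept = Ȳ − slope · X̄, to the six observations in the table above. Reading the first data column as Y and the second as X is an assumption inferred from the column sums, and the function name `ols_simple` is hypothetical.

```python
def ols_simple(x_values, y_values):
    """Return (alpha_hat, beta_hat) for Y = alpha + beta * X + u by OLS."""
    n = len(x_values)
    x_bar = sum(x_values) / n
    y_bar = sum(y_values) / n
    # Sums of cross-products and squares in deviation form.
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_values, y_values))
    sum_xx = sum((x - x_bar) ** 2 for x in x_values)
    beta_hat = sum_xy / sum_xx            # slope
    alpha_hat = y_bar - beta_hat * x_bar  # intercept
    return alpha_hat, beta_hat

# Observations from the table (assumed: Y is the first column, X the second).
Y = [4, 4, 7, 8, 9, 10]
X = [5, 4, 8, 10, 13, 14]

alpha_hat, beta_hat = ols_simple(X, Y)
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
```

With these data the slope equals 51/84, the ratio of the last two column sums in the table.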
2.6. Alternative Functional Forms and
Interpretation of OLS Estimates for SLRM
Model        If X increases by   Then Y will change by (approximately)
Linear       1 unit              β units
Linear-Log   1%                  β/100 units
Log-Linear   1 unit              100·β %
Log-Log      1%                  β %
2.7. Decomposition of the Variation of Y and
“Goodness of Fit” of an Estimated Model
2.8. Evaluation of an Estimated Model for SLRM
& MLRM
After estimating a model, the next stage is to evaluate it.
Evaluating the model means examining the ‘goodness’ of the
estimated model.
There are three criteria for judging the ‘goodness’ of an estimated
econometric model:
Economic criterion,
Statistical criterion (first-order tests), and
Econometric criterion (second-order tests).
2.8.1. Econometric Criterion: Statistical Desirable
Properties of OLS Estimators and the Gauss-Markov
Theorem
There are traditional criteria based on which the closeness of an
estimate to the true population parameter can be determined.
These are called desirable properties of Estimators (or estimates).
Desirable properties of estimators fall into two categories:
1) Finite (small sample) desirable properties of estimators and
2) Infinite (large sample) or asymptotic properties of estimators.
1. Finite (Small Sample) Properties of Estimators
The desirable attributes of estimators in small samples are:
a) Unbiasedness
b) Minimum variance
c) Efficiency (= a) + b))
d) Minimum mean square error (MMSE)
e) Linearity
f) Best linear unbiased estimator (BLUE) - Gauss-Markov Theorem
An estimator is called BLUE if it is linear, unbiased, and has
minimum variance among all linear unbiased estimators.
2) Large-Sample (Asymptotic) Properties of Estimators
It often happens that an estimator does not satisfy one or more of
the desirable statistical properties in small samples.
But as the sample size increases indefinitely, the estimator
possesses several desirable statistical properties.
These properties are known as the large-sample, or asymptotic,
properties.
Asymptotic (large sample) desirable properties of estimators are:
Asymptotic unbiasedness
Consistency (bias and variance both tend to zero as ‘n’ increases)
Asymptotic efficiency (consistent, with minimum asymptotic variance)
2.8.2. Statistical Inference: Statistical Test of
Significance of OLS Estimators (First Order Tests)
In this section, we shall develop statistical criteria for the
evaluation of an estimated model.
Statistical criteria are developed based on statistical and
probability theories.
The application of statistical criteria to judge the goodness of a
model is known as tests of statistical significance (TSS) or
first-order tests of a model.
Thus, with these critical values the rejection and acceptance
regions for the null-hypothesis will be:
In statistics, the process of estimating an interval of values
between which the true values of the population parameters are
expected to lie based on the sampling distribution of the sample
estimates is called interval estimation.
The method used depends on the sample size:
1) Confidence interval from the Standard Normal Distribution
(Z-Distribution), for large samples
2) Confidence interval from the Student’s t-distribution, for small
samples.
Confidence interval from the Standard Normal Distribution
(Z-Distribution)
$$ \hat{\beta} - 1.96\, se(\hat{\beta}) \le \beta \le \hat{\beta} + 1.96\, se(\hat{\beta}) $$
The meaning of this confidence interval is that, in repeated
sampling, 95% of the intervals constructed in this way will contain
the true value of the unknown parameter β.
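A 95% interval from the Z-distribution can be sketched as below, using the standard form estimate ± critical value × standard error. The six observations are the chapter's worked OLS data, used here only to show the mechanics; with a sample this small one would normally take the critical value from the t-distribution with n − 2 degrees of freedom instead, so treat the numbers as illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Worked-example data (assumed: Y = consumption, X = income).
Y = [4, 4, 7, 8, 9, 10]
X = [5, 4, 8, 10, 13, 14]
n = len(X)

x_bar, y_bar = sum(X) / n, sum(Y) / n
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sum_xx = sum((x - x_bar) ** 2 for x in X)

beta_hat = sum_xy / sum_xx
alpha_hat = y_bar - beta_hat * x_bar

# Residual variance with n - 2 degrees of freedom, then the slope's standard error.
residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(X, Y)]
sigma2_hat = sum(e ** 2 for e in residuals) / (n - 2)
se_beta = sqrt(sigma2_hat / sum_xx)

# Two-sided 95% critical value of the standard normal distribution (about 1.96).
z_crit = NormalDist().inv_cdf(0.975)
lower = beta_hat - z_crit * se_beta
upper = beta_hat + z_crit * se_beta

print(f"95% CI for beta: ({lower:.3f}, {upper:.3f})")
```

Because the interval excludes zero, the slope would also be judged statistically significant at the 5% level under the same normal approximation.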
2.9. Prediction using Simple Linear
Regression Model
Reporting the Results of Regression Analysis
The results of the regression analysis derived are reported in
conventional formats.
It is not sufficient merely to report the estimates of the β’s.
There are two conventional ways to report a regression result:
a) Equation form, i.e., by substituting the estimated coefficients
into the regression model, and
b) Table form
b) Table Form
In this case, the estimated coefficients, the corresponding
t-statistics, and some other indicators are presented in tabular form.
Example: The estimated regression result of our consumption
function can be presented in a table as follows:

                              (1)
                              Consumption Expenditure (in ETB)
Monthly Income (in ETB)       0.607***
                              (10.94)
Constant                      1.536*
                              (2.84)
Observations                  6
R²                            0.968

t statistics in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001
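The figures in the reported table can be reproduced with a short sketch, assuming they come from the six observations in this chapter's worked OLS example (Y = consumption, X = income). The formulas are the standard ones for simple regression: R² as explained over total variation, and each t-statistic as the estimate divided by its standard error. The variable names are illustrative.

```python
from math import sqrt

# Worked-example data (assumed: Y = consumption, X = income).
Y = [4, 4, 7, 8, 9, 10]
X = [5, 4, 8, 10, 13, 14]
n = len(X)

x_bar, y_bar = sum(X) / n, sum(Y) / n
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sum_xx = sum((x - x_bar) ** 2 for x in X)
sum_yy = sum((y - y_bar) ** 2 for y in Y)       # total variation in Y

beta_hat = sum_xy / sum_xx
alpha_hat = y_bar - beta_hat * x_bar

rss = sum_yy - beta_hat * sum_xy                # residual sum of squares
r_squared = 1 - rss / sum_yy
sigma2_hat = rss / (n - 2)                      # estimated error variance

se_beta = sqrt(sigma2_hat / sum_xx)
se_alpha = sqrt(sigma2_hat * sum(x ** 2 for x in X) / (n * sum_xx))

t_beta = beta_hat / se_beta
t_alpha = alpha_hat / se_alpha

print(f"Monthly income  {beta_hat:.3f}  (t = {t_beta:.2f})")
print(f"Constant        {alpha_hat:.3f}  (t = {t_alpha:.2f})")
print(f"Observations    {n}")
print(f"R-squared       {r_squared:.3f}")
```

The coefficients, t-statistics, and R² printed here match the values shown in the table form above.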
End of Chapter Two