Introduction to Econometrics for
Finance (Econ 552)
Graduate Studies
Department of Accounting & Finance
Biruk Birhanu (Assistant Professor)
Department of Economics
Wolkite University
April, 2019
Wolkite
Schedule
Class lecturing April 1-4, 2019
Practice Session April 5, 2019
Mid Term Exam April 11, 2019
Individual Assignment Presentation April 08, 2019
Final Exam 19, 2019
Wolkite University Biruk Birhanu (Asst. Professor)
Content
Chapter One
Introduction to Econometrics
1.1. The purpose and applications of econometrics
1.2. The kinds of problems handled by econometrics
1.3. The link between economic theory, mathematics, statistics and
econometrics
1. What is Econometrics ?
• Meaning, Scope and Objectives…
• Econometrics may be defined as the social science in which the tools
of economic theory, mathematics and statistical inference are
applied to the analysis of economic phenomena (Goldberger, 1964).
• Econometrics means economic measurement.
• Econometrics is concerned with the empirical determination of
economic laws. Example: the law of consumption, the law of asset
pricing.
• Econometrics is the application of mathematical statistics to economic
data to lend empirical support to the models constructed by mathematical
economics and to obtain numerical results (Samuelson, Koopmans and
Stone, 1954).
• The origins of econometrics are rooted in economics. However, the main
techniques employed for studying economic problems are of equal importance in
financial applications.
• Financial econometrics will be defined as the application of statistical
techniques to problems in finance.
• Financial econometrics can be useful for:
Testing theories in finance,
Determining asset prices or returns,
Testing hypotheses concerning the relationships between variables,
Examining the effect on financial markets of changes in economic
conditions, and
Forecasting future values of financial variables for financial
decision-making.
• Is financial econometrics different from ‘economic econometrics’?
• The tools commonly used in financial applications are fundamentally
the same as those used in economic applications, BUT:
-The emphasis and the sets of problems differ.
Problems in finance: repaying debt, financial recording, funding
long-term investment
Problems in economics: unemployment, economic growth, etc.
-Finance relies on financial data, economics on macroeconomic data.
-Accordingly, economic and financial data differ in measurement error,
frequency, revision, and availability.
Why a separate subject ?
• Economic Theory: makes statements or hypotheses that are mostly
qualitative in nature.
• Mathematical Economics: expresses economic theory in mathematical
form (equations) without regard to measurability or empirical
verification of the theory.
• Economic Statistics: is mainly concerned with collecting, processing
and presenting economic data. It is not concerned with using the
collected data to test economic theories.
• Mathematical Statistics: provides many of the tools for economic studies,
but econometrics supplies the latter with many special methods of
quantitative analysis based on economic data.
• Interdisciplinary nature of econometrics: econometrics draws on
economic theory, mathematical economics, economic statistics, and
mathematical statistics.
[Figure: diagram linking Econometrics to Economic Theory, Mathematical
Economics, Economic Statistics, and Mathematical Statistics.]
• It is an amalgamation of different disciplines.
• Even though it is derived from several disciplines:
Economic Theory | Statistics | Economic Statistics | Mathematical
Economics | Mathematical Statistics
and follows common assumptions, econometrics is not statistics.
• Econometrics is not the same as Mathematical Economics.
• The statistical problems of econometrics are simply problems of
mathematical statistics, but one dominating characteristic of econometric
problems is that they normally deal with non-experimental data.
1.1. Objectives/Uses/Goals of econometrics ?
• Estimation/measurement of economic parameters or relationships,
which may be needed for policy- or decision-making;
• Testing (& possibly refining) economic theory;
• Forecasting/prediction of future values of economic magnitudes and
• Evaluation of policies/programs.
Purpose of econometrics ? Two main purposes are…
First:- To give empirical content to economic theory, and
Second:- To subject economic theory to potentially falsifying tests.
1.2. The kinds of problems handled by econometrics
Value of econometrics: In finance the common problems…
• Testing whether the Capital Asset Pricing Model (CAPM) or Arbitrage Pricing
Theory (APT) represent superior models for the determination of returns on risky
assets;
• Measuring and forecasting the volatility of bond returns;
• Modelling long-term relationships between prices and exchange rates;
• Testing the hypothesis that earnings or dividend announcements have no effect
on stock prices;
• Forecasting the correlation between the stock indices of two countries.
Types of Data
• There are broadly three types of data that can be employed in quantitative analysis of financial
problems: time series data, cross-sectional data, and panel data.
1. Time series data: as the name suggests, are data that have been collected over a period of time
on one or more variables.
-It is also generally a requirement that all data used in a model be
of the same frequency of observation.
-The data may be quantitative (e.g. exchange rates, prices, number of
shares outstanding) or qualitative (e.g. the day of the week, a credit rating).
• Problems that could be tackled using time series data:
-How the value of a country’s stock index has varied with that country’s macroeconomic
fundamentals
-The relationship between the interest rate and investment performance
-Does a trade deficit affect a country’s exchange rate?
2. Cross-sectional data: are data on one or more variables collected at
a single point in time. For example, the data might be on;
-Interest rate variation in commercial banks.
-Income tax collection performance in regional states.
• Problems that could be tackled using cross-sectional data:
-The relationship between company size and the return to
investing in its shares
-The relationship between a country’s GDP level and the
probability that the government will default on its sovereign
debt.
3. Panel data: have the dimensions of both time series and
cross-sections,
e.g. the tax performance on different particulars over ten years.
• Fortunately, virtually all of the standard techniques and analysis in
econometrics are equally valid for time series and cross-sectional
data.
• Data can also be classified in other ways:
Continuous data: height, weight, skull circumference
Discrete data: the number of daily admissions to Wolkite University
Cardinal, ordinal and nominal numbers (read by yourself)
Methodology in Econometrics
• How do econometricians proceed in their analysis of an economic
problem? That is, what is their methodology?
• Broadly speaking, traditional econometric methodology proceeds
along the following lines:
[Figure: flowchart of the steps in econometric methodology.]
• But the steps are not as easy as depicted. What if the model is not
statistically adequate? The steps may then go like this:
[Figure: flowchart of the steps when the model must be revised.]
What are these steps?
1. Statement of Theory or Hypothesis
Example: Keynes postulated that the marginal propensity to consume
(MPC), the rate of change of consumption for a unit (say, a dollar) change in income, is
greater than zero but less than 1:
$$0 < MPC < 1$$
2. Specification of the Mathematical Model of Consumption
A mathematical economist might suggest the following form of the Keynesian
consumption function:
$$Y = \beta_1 + \beta_2 X, \qquad 0 < \beta_2 < 1$$
where Y is consumption expenditure, X is income, and $\beta_2$ is the MPC.
3. Specification of the Econometric Model of Consumption
To allow for the inexact relationships between economic variables, the
econometrician would modify the deterministic consumption function:
$$Y = \beta_1 + \beta_2 X + u$$
• where u, known as the disturbance, or error, term, is a random (stochastic) variable that has
well-defined probabilistic properties. The disturbance term u may well represent all those factors
that affect consumption but are not taken into account explicitly.
• The above equation is an example of an econometric model. More technically, it is an example of
a linear regression model, which is the major concern of this course.
4. Obtaining Data
• To estimate the econometric model given above, that is, to obtain the
numerical values of β1 and β2, we need data (time series, panel, or cross-sectional).
5. Estimation of the Econometric Model
• The numerical estimates of the parameters give empirical content to the consumption function.
• The statistical technique of regression analysis is the main tool used to obtain the estimates.
Using this technique and the data given, the estimated consumption function takes the form
$$\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X$$
with an estimated slope (MPC) of about 0.70.
• In simple terms we can say that, according to our data, the average,
or mean, consumption expenditure went up by about 70 cents for a
dollar’s increase in real income.
6. Hypothesis Testing
• As noted earlier, Keynes expected the MPC to be positive but less
than 1. In our example we found the MPC is about 0.70. But before
we accept this finding as confirmation of Keynesian consumption
theory, we must enquire whether this estimate is sufficiently below
unity to convince us that this is not a chance occurrence or peculiarity
of the particular data we have used.
• In other words, is 0.70 statistically less than 1? If it is, it may support Keynes’
theory.
• Such confirmation or refutation of economic theories on the basis of
sample evidence is based on a branch of statistical theory known as statistical
inference (hypothesis testing).
7. Forecasting or Prediction
• If the chosen model does not refute the hypothesis or theory under
consideration, we may use it to predict the future value(s) of the dependent, or
forecast, variable Y on the basis of known or expected future value(s) of the
explanatory, or predictor, variable X.
8. Use of the Model for Control or Policy Purposes
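• From the regression, simple arithmetic gives the income level X needed
for a target expenditure of 4900:
$$4900 = \hat{\beta}_1 + \hat{\beta}_2 X \quad\Rightarrow\quad X \approx 7197$$
• That is, an income level of about 7197 billion dollars, given an MPC
of about 0.70, will produce an expenditure of about 4900 billion dollars.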
• As these calculations suggest, an estimated model may be used for control,
or policy, purposes. By appropriate fiscal and monetary policy mix, the
government can manipulate the control variable X to produce the desired
level of the target variable Y.
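The forecasting and control steps can be sketched in a few lines of Python. This is only an illustration: the intercept and slope below are assumed values, not taken from these slides, chosen so that the MPC is about 0.70 and the figures quoted above (income of about 7197 for an expenditure of 4900) are reproduced. The function names are hypothetical.

```python
# Assumed estimated consumption function: Y_hat = B1_HAT + B2_HAT * X.
# Both coefficients are illustrative assumptions, not results from the slides.
B1_HAT = -184.08   # assumed intercept (billions of dollars)
B2_HAT = 0.7064    # assumed slope, i.e. the estimated MPC


def predict_consumption(income: float) -> float:
    """Step 7 (forecasting): mean consumption for a given income level."""
    return B1_HAT + B2_HAT * income


def income_for_target(target_consumption: float) -> float:
    """Step 8 (control): invert the fitted line to find the required income."""
    return (target_consumption - B1_HAT) / B2_HAT


required_income = income_for_target(4900.0)
print(round(required_income))                        # about 7197
print(round(predict_consumption(required_income)))   # back to the 4900 target
```

Under these assumed coefficients, inverting the fitted line is all the "simple arithmetic" amounts to: the policy maker picks the target Y and solves for the X that delivers it.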
Content
Chapter Two
Estimation of Two Variable Regression Model
2.1. The method of OLS (Second)
2.2. The classical Linear Regression Model (First)
2.3. The Assumptions (CLRM)
2.4. Properties of Least Square Estimates
2.5. The Coefficient of Determination (R-square) and (adj R-square)
2.6. Hypothesis Testing and Confidence Interval
2.1. The classical Linear Regression Model
What is regression?
• The term regression was introduced by Francis Galton…. Tendency ?
• The dictionary definition of “regression” is ‘backward movement, a
retreat, a return to an earlier stage of development’. Paradoxical as
it may sound, regression analysis as it is currently used has nothing to
do with regression as dictionaries define the term.
• Regression analysis is concerned with the study of the dependence
of one variable, the dependent variable (Y), on one or more other
variables, the explanatory variables (x1, x2, x3…), with a view to
estimating and/or predicting the (population) mean or average value
of the former in terms of the known or fixed (in repeated sampling)
values of the latter.
• If we are studying the dependence of a variable on only a single
explanatory variable, such as that of consumption expenditure on
real income, such a study is known as simple, or two-variable,
regression analysis.
• If we are studying the dependence of one variable on more than one
explanatory variable, such as the dependence of crop yield on rainfall,
temperature, sunshine, and fertilizer, it is known as multiple
regression analysis.
• In other words, in two-variable regression there is only one
explanatory variable, whereas in multiple regression there is more
than one explanatory variable.
• At this point we need to distinguish between two types of
relationship:
-A deterministic or mathematical relationship, and
-A stochastic or statistical relationship, which does not give
unique values of Y for given values of X but can be described
exactly in probabilistic terms.
Example: Suppose that the relationship between sales (Y) and
advertising expenditure (X) is
$$Y = 2500 + 100X - X^2$$
• This equation is a deterministic relationship.
• The sales for different levels of advertising expenditure can be
determined exactly.
On the other hand, suppose that the relationship between sales (Y) and
advertising expenditure (X) is
$$Y = 2500 + 100X - X^2 + u$$
where u = +500 with probability ½ and u = −500 with probability ½.
• Then the values of Y for different values of X cannot be determined
exactly but can be described probabilistically.
• For instance, suppose the relationship between Y and X is given by
$$Y = 2 + X + u$$
where the error term u has a known probability distribution, N(0, 1).
• Then for each value of X, Y will have a normal distribution with mean
2 + X. The normal distribution, sometimes called the “bell curve”, is
often used to represent real-valued random variables whose distributions
are not known.
• The line drawn in the figure is the deterministic relationship
$$Y = 2 + X$$
[Figure: A stochastic relationship. The line Y = 2 + X, with the possible
values of Y for a given value of X shown on vertical lines.]
• The actual values of Y for each X will be some points on the vertical lines
shown. The relationship between Y and X in such cases is called a
stochastic or statistical relationship.
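The sales example above can be simulated as a rough sketch. The function names, the seed, and the choice X = 20 are arbitrary assumptions made for illustration; only the two equations come from the slides.

```python
import random


def sales_deterministic(x: float) -> float:
    """Deterministic relationship: Y = 2500 + 100X - X^2."""
    return 2500 + 100 * x - x ** 2


def sales_stochastic(x: float, rng: random.Random) -> float:
    """Same curve plus u, where u is +500 or -500 with probability 1/2 each."""
    u = rng.choice([500, -500])
    return sales_deterministic(x) + u


rng = random.Random(0)   # fixed seed, an arbitrary choice for reproducibility
x = 20
print(sales_deterministic(x))                 # 4100, the same on every call

draws = [sales_stochastic(x, rng) for _ in range(1000)]
print(sorted(set(draws)))                     # only two values are possible
print(sum(draws) / len(draws))                # average is near the deterministic value
```

For a given X the deterministic function always returns one number, while the stochastic one returns one of two values; we cannot say which, but we can say each occurs about half the time.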
• Why should we add an error term, or what are the sources of the error
term?
• There are three main sources of the error term (u):
-Unpredictable element or randomness of human behaviour,
-Effect of a large number of variables that have been omitted, and
-Measurement error in Y.
• Thus the simple regression model can be written as
$$Y_i = \alpha + \beta X_i + u_i, \qquad i = 1, 2, 3, \ldots, n$$
and rearranged for the error term as
$$u_i = Y_i - \alpha - \beta X_i$$
Regression versus Correlation?
• The correlation between two variables measures the degree of linear
association between them.
• If it is stated that y and x are correlated, it means that y and x are
being treated in a completely symmetrical way. Thus, it is not implied
that changes in x cause changes in y, or indeed that changes in y
cause changes in x. Rather, it is simply stated that there is evidence
for a linear relationship between the two variables, and that
movements in the two are on average related to an extent given by
the correlation coefficient.
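The symmetry of correlation, in contrast to regression, can be checked numerically. The sketch below uses made-up data and hypothetical helper names; it relies on the standard Pearson correlation formula and the usual OLS slope formula.

```python
from statistics import mean


def correlation(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5


def ols_slope(dep, indep):
    """Slope from regressing `dep` on `indep` (with an intercept)."""
    md, mi = mean(dep), mean(indep)
    num = sum((i - mi) * (d - md) for i, d in zip(indep, dep))
    den = sum((i - mi) ** 2 for i in indep)
    return num / den


# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

print(correlation(x, y), correlation(y, x))   # identical: x and y are treated symmetrically
print(ols_slope(y, x), ols_slope(x, y))       # different: regression treats them asymmetrically
```

Swapping the two variables leaves the correlation unchanged but changes the regression slope, because regression designates one variable as dependent and the other as explanatory.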
2.2. The Method of OLS
• Methods of estimation:
-OLS (will be covered in class)
-Method of moments (Reading Assignment)
-Maximum Likelihood Estimation (Reading Assignment)
The method of OLS
• Under the stochastic assumptions we will see later, the method of
ordinary least squares (OLS) has some attractive statistical properties
that have made it one of the most powerful and popular methods of
regression analysis.
• OLS requires that we choose $\hat{\alpha}$ and $\hat{\beta}$ as sample estimates of
$\alpha$ and $\beta$, respectively, so that the sum of squared residuals
$\sum \hat{u}_i^2$ is as small as possible.
• To understand this method, we first explain the ordinary least squares
principle.
• From the two-variable population regression function (PRF) we
have
$$Y_i = \alpha + \beta X_i + u_i$$
• However, the PRF is not directly observable. We
estimate it from the sample regression function (SRF):
$$Y_i = \hat{\alpha} + \hat{\beta} X_i + \hat{u}_i$$
$$Y_i = \hat{Y}_i + \hat{u}_i$$
• To understand how the SRF is determined we first rearrange the above
equation as
$$\hat{u}_i = Y_i - \hat{\alpha} - \hat{\beta} X_i$$
• which is the difference between the actual and estimated values of Y.
• Now, given n pairs of observations on Y and X, we would like to determine
the SRF in such a manner that it is as close as possible to the actual Y. To
this end, we may adopt the following criterion: choose the SRF in such a
way that the sum of the residuals $\sum \hat{u}_i = \sum (Y_i - \hat{Y}_i)$
is as small as possible.
[Figure: Least-squares criterion. Scatter of Y against X with the SRF and
the residuals $\hat{u}_1, \hat{u}_2, \hat{u}_3, \hat{u}_4$ at $X_1, X_2, X_3, X_4$.]
• If we adopt the criterion of minimizing $\sum \hat{u}_i = \sum (Y_i - \hat{Y}_i)$,
all the residuals receive the same weight in the sum, although
$\hat{u}_2$ and $\hat{u}_3$ are much closer to the SRF than $\hat{u}_1$ and $\hat{u}_4$.
• In other words, all the residuals receive equal importance no matter
how close or how widely scattered the individual observations are
from the SRF.
• As a result, the algebraic sum of the residuals can be small (even zero)
although $\hat{u}_1$ and $\hat{u}_4$ are scattered more widely about the SRF
than $\hat{u}_2$ and $\hat{u}_3$.
• We can avoid this problem if we adopt the least-squares criterion,
which states that the SRF can be fixed in such a way that
$$\sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$$
is as small as possible, where $\hat{u}_i^2$ are the squared residuals.
Estimation Procedure
• The method of least squares requires that we choose $\hat{\alpha}$
and $\hat{\beta}$ as sample estimates of $\alpha$ and $\beta$, respectively, so that
$\sum \hat{u}_i^2$ is as small as possible.
• For this purpose we minimize the residual sum of squares (RSS), $\sum \hat{u}_i^2$.
• Here we apply differential calculus and equate the
first-order partial derivatives to zero.
• The OLS method calculates the best-fitting line
for the observed data by minimizing the sum of the squares of the
vertical deviations from each data point to the line (RSS).
• The reason for using the squares of the residuals is to prevent
negative residuals from cancelling positive ones: because the
deviations are first squared and then summed, there is no
cancellation between positive and negative values.
• The OLS technique is as follows:
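• Minimize
$$RSS = \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$$
• First-order condition (FOC) with respect to $\hat{\alpha}$:
$$
\begin{aligned}
\frac{\partial \sum_{i=1}^{n} \hat{u}_i^2}{\partial \hat{\alpha}}
  = \frac{\partial \sum_{i=1}^{n} (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2}{\partial \hat{\alpha}} &= 0 \\
2 \sum_{i=1}^{n} (Y_i - \hat{\alpha} - \hat{\beta} X_i)(-1) &= 0 \\
\sum_{i=1}^{n} Y_i - \sum_{i=1}^{n} \hat{\alpha} - \sum_{i=1}^{n} \hat{\beta} X_i &= 0 \\
\sum_{i=1}^{n} Y_i - n\hat{\alpha} - \hat{\beta} \sum_{i=1}^{n} X_i &= 0 \\
n\bar{Y} - n\hat{\alpha} - \hat{\beta}\, n\bar{X} &= 0
  \qquad \text{since } \sum_{i=1}^{n} X_i = n\bar{X} \text{ and } \sum_{i=1}^{n} Y_i = n\bar{Y}
\end{aligned}
$$
Dividing both sides by n gives
$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$$
• First-order condition (FOC) with respect to $\hat{\beta}$:
$$
\begin{aligned}
\frac{\partial \sum_{i=1}^{n} \hat{u}_i^2}{\partial \hat{\beta}}
  = \frac{\partial \sum_{i=1}^{n} (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2}{\partial \hat{\beta}} &= 0 \\
2 \sum_{i=1}^{n} (Y_i - \hat{\alpha} - \hat{\beta} X_i)(-X_i) &= 0 \\
\sum_{i=1}^{n} (Y_i - \hat{\alpha} - \hat{\beta} X_i) X_i &= 0 \\
\sum_{i=1}^{n} Y_i X_i - \sum_{i=1}^{n} \hat{\alpha} X_i - \sum_{i=1}^{n} \hat{\beta} X_i^2 &= 0 \\
\sum_{i=1}^{n} Y_i X_i - \hat{\alpha} \sum_{i=1}^{n} X_i - \hat{\beta} \sum_{i=1}^{n} X_i^2 &= 0
\end{aligned}
$$
Since $\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$,
$$
\begin{aligned}
\sum_{i=1}^{n} Y_i X_i &= (\bar{Y} - \hat{\beta}\bar{X}) \sum_{i=1}^{n} X_i + \hat{\beta} \sum_{i=1}^{n} X_i^2 \\
\sum_{i=1}^{n} Y_i X_i &= \bar{Y} \sum_{i=1}^{n} X_i - \hat{\beta}\bar{X} \sum_{i=1}^{n} X_i + \hat{\beta} \sum_{i=1}^{n} X_i^2 \\
\sum_{i=1}^{n} Y_i X_i - \bar{Y} \sum_{i=1}^{n} X_i &= \hat{\beta} \sum_{i=1}^{n} X_i^2 - \hat{\beta}\bar{X} \sum_{i=1}^{n} X_i \\
\sum_{i=1}^{n} Y_i X_i - n\bar{X}\bar{Y} &= \hat{\beta} \left( \sum_{i=1}^{n} X_i^2 - \bar{X} \sum_{i=1}^{n} X_i \right) \\
\sum_{i=1}^{n} Y_i X_i - n\bar{X}\bar{Y} &= \hat{\beta} \left( \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right)
\end{aligned}
$$
thus
$$\hat{\beta} = \frac{\sum_{i=1}^{n} Y_i X_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}$$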
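• Alternatively,
$$\hat{\beta} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum x_i y_i}{\sum x_i^2},
\qquad \text{where } x_i = X_i - \bar{X} \text{ and } y_i = Y_i - \bar{Y}$$
• Let us define the deviations of the variables from their means using
small letters as follows:
A. $\sum y_i^2 = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - n\bar{Y}^2$
B. $\sum x_i^2 = \sum (X_i - \bar{X})^2 = \sum X_i^2 - n\bar{X}^2$
C. $\sum x_i y_i = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - n\bar{X}\bar{Y}$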
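The two expressions for the slope and the formula for the intercept can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name and the data are hypothetical, and the data are constructed to lie exactly on Y = 1 + 2X so the answer is known in advance.

```python
from statistics import mean


def ols_two_variable(x, y):
    """Return (alpha_hat, beta_hat) for Y = alpha + beta*X + u by least squares."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    # Raw-sums form: (sum XY - n*Xbar*Ybar) / (sum X^2 - n*Xbar^2)
    beta_raw = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (
        sum(xi ** 2 for xi in x) - n * x_bar ** 2
    )
    # Deviation form: sum(x_i * y_i) / sum(x_i^2), small letters = deviations from means
    beta_dev = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    assert abs(beta_raw - beta_dev) < 1e-9   # the two formulas agree
    alpha_hat = y_bar - beta_dev * x_bar     # alpha_hat = Ybar - beta_hat * Xbar
    return alpha_hat, beta_dev


# Hypothetical data lying exactly on Y = 1 + 2X.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
alpha_hat, beta_hat = ols_two_variable(x, y)
print(alpha_hat, beta_hat)

# The first-order conditions say the residuals sum to zero and are orthogonal to X.
residuals = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]
print(sum(residuals), sum(r * xi for r, xi in zip(residuals, x)))
```

The last two printed sums correspond to the two first-order conditions in the derivation; with real (noisy) data they are still zero up to rounding, even though the individual residuals are not.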
2.3. The assumptions underlying the classical linear
regression model
• The model yt = α + βxt + ut that has been derived above, together
with the assumptions listed below, is known as the classical linear
regression model (CLRM).
• Data for xt is observable, but since yt also depends on ut, it is
necessary to be specific about how the ut are generated.
• The set of assumptions shown below are usually made concerning
the uts, the unobservable error or disturbance terms. Note that no
assumptions are made concerning their observable counterparts, the
estimated model’s residuals.
• Our interest is not only in estimating $\alpha$ and $\beta$ but also in drawing
inferences about their true (population) values.
• Therefore, the assumptions we make about the $X_i$ and $u_i$ are
extremely critical to the valid interpretation of the regression
estimates.
• The assumptions underlying the standard, or classical, linear
regression model (CLRM) are:
1. Zero mean value of the disturbance ($u_i$): given the value of X, the
mean or expected value of the disturbance term is zero. Technically,
the conditional mean value of $u_i$ is zero. Symbolically,
$$E(u_i) = 0 \text{ for all } i, \quad \text{or} \quad E(u_i \mid X_i) = 0$$
2. Homoscedasticity or equal variance of $u_i$: given the value of
X, the variance of $u_i$ is the same for all observations. That is, the
conditional variances of $u_i$ are identical. Symbolically,
$$
\begin{aligned}
\operatorname{var}(u_i \mid X_i) &= E[u_i - E(u_i \mid X_i)]^2 \\
&= E(u_i^2 \mid X_i) \quad \text{because of assumption 1 (zero mean)} \\
&= \sigma^2 \quad \text{for all } i
\end{aligned}
$$
3. Independence or no autocorrelation between the disturbance terms:
given any two X values $X_i$ and $X_j$ ($i \neq j$), the correlation between $u_i$
and $u_j$ is zero. Symbolically,
$$
\begin{aligned}
\operatorname{cov}(u_i, u_j \mid X_i, X_j) &= E\{[u_i - E(u_i)] \mid X_i\}\{[u_j - E(u_j)] \mid X_j\} \\
&= E(u_i \mid X_i)(u_j \mid X_j) = 0
\end{aligned}
$$
where i and j are different observations.
4. Independence of $u_j$ and $X_j$: that is, $u_j$ and $X_j$ are independent, or
$\operatorname{cov}(u_j, X_j) = 0$ for all j.
$$
\begin{aligned}
\operatorname{cov}(u_j, X_j) &= E[u_j - E(u_j)][X_j - E(X_j)] \\
&= E[u_j (X_j - E(X_j))] && \text{since } E(u_j) = 0 \\
&= E(u_j X_j) - E(X_j) E(u_j) && \text{since } E(X_j) \text{ is non-stochastic} \\
&= E(u_j X_j) && \text{since } E(u_j) = 0 \\
&= 0 && \text{by assumption}
\end{aligned}
$$
Assumption (4) says that the disturbance term and the explanatory
variable are uncorrelated.
5. Normality: that is, $u_i$ are normally distributed for all i. In conjunction
with assumptions 1, 2, and 3, this implies that $u_i$ are independently and
normally distributed with mean zero and a common variance $\sigma^2$. We
write this as
$$u_i \sim IN(0, \sigma^2)$$
6. Linear regression model: that is, the regression model is linear in
parameters but it may not be linear in variables. That is, $\alpha$ and $\beta$
appear with power 1 only and are not multiplied or divided by any
other parameter (as in $\alpha \cdot \beta$, $\alpha / \beta$, etc.).
7. The X values are fixed in a repeated sampling; values taken by the
regressor X are considered fixed in repeated samples. More technically,
X is assumed to be non-stochastic.
8. The number of observations (n ) must be greater than the number
of parameters (k) to be estimated. Alternatively the number of
observations should be greater than the number of explanatory
variables.
9. Variability of values; That is, X’s in a given sample must not be all
the same. Technically, the Var (X) must be a finite positive number.
10. The regression model is correctly specified; alternatively, there is
no specification bias or error in the model used in the empirical
analysis. That is, variables to be included in the model, the functional
form, and statistical assumptions should be correct.
2.4. Properties of Least Square Estimates
• The main desirable property of the least squares estimator is that it is
the Best Linear Unbiased Estimator (BLUE).
• The least squares estimator is unbiased if its bias is zero, that is, if
$E(\hat{\beta}) = \beta$. This means that the average of the estimates converges to the
true value of the parameter as the number of samples (each of a given
finite size n) increases.
• An unbiased estimator gives ‘on average’ the true value of the
parameter. Unbiasedness is a desirable property but not particularly
important by itself. It becomes important when combined with small
variance.
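• Consider the following simple regression model:
$$Y_i = \beta X_i + u_i, \qquad i = 1, 2, \ldots, n$$
For simplicity we have omitted the constant term.
$E(Y_i) = \beta X_i$ and $\operatorname{Var}(Y_i) = \sigma^2$. Let us prove that $\hat{\beta}$ is the BLUE of $\beta$.
The proofs use the deviation form of the estimator, with $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$.
1. Linearity of $\hat{\beta}$: in the stochastic variable $Y_i$ or $\varepsilon_i$.
$$\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\sum x_i (Y_i - \bar{Y})}{\sum x_i^2} = \frac{\sum x_i Y_i}{\sum x_i^2} - \frac{\bar{Y} \sum x_i}{\sum x_i^2}$$
Since $\sum x_i = 0$,
$$\hat{\beta} = \frac{\sum x_i Y_i}{\sum x_i^2} = \sum K_i Y_i, \qquad \text{where } K_i = \frac{x_i}{\sum x_i^2}$$
$$\hat{\beta} = K_1 Y_1 + K_2 Y_2 + \cdots + K_n Y_n$$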
2. Unbiasedness of $\hat{\beta}$, which is given by $\sum K_i Y_i$:
$$
\begin{aligned}
\hat{\beta} &= \sum K_i Y_i = \sum K_i (\alpha + \beta X_i + \varepsilon_i) \\
\hat{\beta} &= \alpha \sum K_i + \beta \sum K_i X_i + \sum K_i \varepsilon_i, \qquad \text{where } \sum K_i = 0 \text{ and } \sum K_i X_i = 1 \\
\hat{\beta} &= \beta + \sum K_i \varepsilon_i
\end{aligned}
$$
Taking expected values,
$$
\begin{aligned}
E(\hat{\beta}) &= E(\beta) + E\left(\sum K_i \varepsilon_i\right) \\
E(\hat{\beta}) &= \beta + \sum K_i E(\varepsilon_i), \qquad \text{where } E(\varepsilon_i) = 0 \\
E(\hat{\beta}) &= \beta
\end{aligned}
$$
• Hence $\hat{\beta}$ is an unbiased linear estimator.
• The bias of an estimator is defined as the difference between its
expected value and the true parameter. That is,
$$\text{Bias} = E(\hat{\beta}) - \beta$$
3. Efficiency: suppose $\tilde{\beta}$ is another unbiased linear estimator of $\beta$.
Then $\operatorname{Var}(\tilde{\beta}) \geq \operatorname{Var}(\hat{\beta})$.
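Unbiasedness is a statement about repeated sampling, which a small Monte Carlo experiment can illustrate. The sketch below is an assumption-laden illustration, not a proof: the "true" values of alpha, beta and sigma, the fixed X values, the number of replications and the seed are all arbitrary choices.

```python
import random
from statistics import mean

ALPHA, BETA, SIGMA = 1.0, 2.0, 1.0      # assumed "true" population values
X = [float(i) for i in range(1, 11)]    # regressor held fixed in repeated sampling


def beta_hat(x, y):
    """OLS slope in deviation form: sum(x_i * y_i) / sum(x_i^2)."""
    x_bar, y_bar = mean(x), mean(y)
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den


rng = random.Random(42)                  # arbitrary seed for reproducibility
estimates = []
for _ in range(2000):                    # 2000 repeated samples, each of size n = 10
    y = [ALPHA + BETA * xi + rng.gauss(0.0, SIGMA) for xi in X]
    estimates.append(beta_hat(X, y))

print(mean(estimates))                   # should be close to BETA
print(min(estimates), max(estimates))    # individual estimates still vary
```

Any single estimate can be above or below the true slope; it is the average over many samples that settles near it, which is what $E(\hat{\beta}) = \beta$ means.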
Assignment,
Prove the following theorems
1. The assumption of Common variance or homoscedasticity.
2. The assumption of independence or no autocorrelation between two
error terms; that is 𝑢𝑖 and 𝑢𝑗 are independent for all 𝑖 ≠ 𝑗.
2.5. The Coefficient of Determination (R-square)
and (adj R-square)
• The coefficient of determination r2 (two-variable case) or R2
(multiple regression) is a summary measure that tells how well the
sample regression line fits the data.
[Figure: overlapping circles Y and X, with the overlap increasing from left to right.]
• In this figure the circle Y represents variation in the dependent variable Y and the
circle X represents variation in the explanatory variable X.
• The overlap of the two circles (the shaded area) indicates the extent to which the
variation in Y is explained by the variation in X (say, via an OLS regression).
• The greater the extent of the overlap, the greater the variation in Y that is explained
by X.
• The $r^2$ is simply a numerical measure of this overlap.
• In the figure, as we move from left to right, the area of the overlap increases, that
is, successively a greater proportion of the variation in Y is explained by X. In
short, $r^2$ increases.
• When there is no overlap, $r^2$ is obviously zero, but when the overlap is complete,
$r^2$ is 1, since 100 percent of the variation in Y is explained by X. As we shall show
shortly, $r^2$ lies between 0 and 1.
• We wish to characterize to what extent the variables included in X
(excluding the constant, if there is one) explain Y. Let us use the
following graph:
[Figure: graph of the variation in Y around the regression line.]
Decomposing the variation in Y:
• One measure of the variation in Y is the sum of its squared deviations
around its sample mean, often described as the Total Sum of
Squares, TSS.
• TSS, the total sum of squares of Y can be decomposed into ESS, the
‘explained’ sum of squares, and RSS, the residual (‘unexplained’) sum
of squares.
TSS = ESS + RSS
Residual and Goodness of Fit
• In the decomposition of TSS, the last (cross-product) term equals zero. How?
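One way to answer the question, sketched under the notation used earlier ($\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$ and $\hat{u}_i = Y_i - \hat{Y}_i$), is to expand TSS and apply the two first-order conditions of OLS. The slide's own equations are not reproduced here, so this derivation is a reconstruction and may differ in layout from the original.

```latex
\begin{aligned}
TSS = \sum (Y_i - \bar{Y})^2
  &= \sum \left[ (\hat{Y}_i - \bar{Y}) + \hat{u}_i \right]^2 \\
  &= \underbrace{\sum (\hat{Y}_i - \bar{Y})^2}_{ESS}
   + \underbrace{\sum \hat{u}_i^2}_{RSS}
   + 2 \sum (\hat{Y}_i - \bar{Y}) \hat{u}_i .
\end{aligned}

% Because \bar{Y} = \hat{\alpha} + \hat{\beta}\bar{X}, we have
% \hat{Y}_i - \bar{Y} = \hat{\beta}(X_i - \bar{X}), so the last term is
\begin{aligned}
2 \sum (\hat{Y}_i - \bar{Y}) \hat{u}_i
  &= 2 \hat{\beta} \sum X_i \hat{u}_i - 2 \hat{\beta} \bar{X} \sum \hat{u}_i = 0,
\end{aligned}
% since the first-order conditions give \sum \hat{u}_i = 0 and \sum X_i \hat{u}_i = 0.
```

The cross-product term vanishes only because the line was fitted by OLS with an intercept; for an arbitrary line the decomposition TSS = ESS + RSS need not hold.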
• Coefficient of Determination ($R^2$): the proportion of the variation in
the dependent variable that is explained by the model:
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$$
• The OLS regression coefficients are chosen in such a way as to
minimize the sum of the squares of the residuals.
• Thus it automatically follows that they maximize R2.
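The decomposition TSS = ESS + RSS and the resulting coefficient of determination can be computed directly. This is a small sketch with made-up data and standard-library Python only; the variable names are illustrative.

```python
from statistics import mean

# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.5, 5.5, 8.5, 9.5]

x_bar, y_bar = mean(x), mean(y)
beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
alpha_hat = y_bar - beta_hat * x_bar
fitted = [alpha_hat + beta_hat * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
ess = sum((fi - y_bar) ** 2 for fi in fitted)            # explained sum of squares
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual sum of squares

r_squared = ess / tss
print(tss, ess + rss)             # agree up to floating-point rounding
print(r_squared, 1 - rss / tss)   # two equivalent ways of computing r-squared
```

Because TSS is fixed by the data, choosing the coefficients to minimize RSS is the same as choosing them to maximize ESS / TSS.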
Assignment 2
• What is the difference between $r^2$ and adjusted $r^2$?
2.6. Hypothesis Testing and Confidence Interval
Statistical Inference
• The only way to know the true mean (or any other parameter) of a
population is by surveying the entire population.
• Unfortunately, when we take only a sample of the population and
generalize its known characteristics to the population, there
is always uncertainty about whether the characteristics of our sample
reflect the characteristics of the population.
• This is true even if we have a representative sample and followed the
best sampling methods available.
Hypothesis Testing in Regression Analysis
• A hypothesis is an assumption we make about a population
parameter.
• The procedure for testing a hypothesis concerning the value of a
population parameter involves the following six steps.
First: Formulate the null and alternative hypotheses
• To test the statistical reliability of our estimates, we apply some rule which
enables us to decide whether to accept or reject them. The best way to
make such a decision would be to compare the estimate with
the true value of the population parameter.
• However, the population parameter is unknown.
• Under these circumstances how are we going to make the decision
whether to accept or reject the sample estimate given that we do
not have the appropriate yardstick (that is the population parameter)
for making comparison required?
• To bypass such difficulty we make some assumption about the value
of the population parameter and use our sample estimate in order to
decide whether our assumption is acceptable or not.
• The hypothesis which we wish to test (on the basis of the evidence of
our sample estimate) is called the null hypothesis; it typically states
that the true population parameter is zero.
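The alternative hypothesis ($H_1$) is the counterpart proposition of the null
hypothesis. The form in which we express the alternative hypothesis is
important in defining the location of the rejection region or critical
region of the test. It may take one of the following forms.
$$H_0: \beta = 0 \ (\text{or } H_0: \beta = \beta_0) \qquad H_1: \beta \neq 0 \ (\text{or } H_1: \beta \neq \beta_0)$$
$$H_0: \beta \geq 0 \ (\text{or } H_0: \beta \geq \beta_0) \qquad H_1: \beta < 0 \ (\text{or } H_1: \beta < \beta_0)$$
$$H_0: \beta \leq 0 \ (\text{or } H_0: \beta \leq \beta_0) \qquad H_1: \beta > 0 \ (\text{or } H_1: \beta > \beta_0)$$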
• The acceptance region includes the values of the test statistic that have a high
probability of being observed if the null hypothesis is true, and the rejection
region or critical region includes the values that are highly
unlikely, that is, they have a low probability of being observed.
Second: Choose the level of significance of the test
• In making a decision one can never be 100% sure that one will make
the right decision, because we are liable to commit one of the
following types of errors:
Type I Error: We reject the null hypothesis when it is actually
true (we would have expected Prob > 0.05)
Type II Error: We accept the null hypothesis when it is actually
false (we would have expected Prob < 0.05)
• To decide whether to accept or reject our hypothesis, it is
customary in econometrics to choose the 5% or 1% level of
significance. At the 5% level this means that in making our decision we
allow (tolerate) being “wrong” five times out of a hundred, that is,
rejecting the hypothesis when it is actually true.
Third: Choose the location of the critical region
• In applied econometrics it has become customary to perform a two-tail
test. The choice of a two-tail test implies no a priori knowledge regarding
the sign of the coefficient whose significance is being tested.
• However, a one-tail test would be appropriate for variables where economic
theory provides an a priori expectation regarding the sign of the
coefficients of the economic relation.
• For example, if we choose the 5% level of significance ($\alpha$) in a
two-tail test, each tail will include the area (probability) 0.025 ($\alpha/2$).
Fourth: Choose the appropriate test statistic (Z, t or F) and find
from the relevant tables the critical value(s) of the chosen statistic,
that is, the value(s) that define the boundary of the critical region.
Fifth: Compute from the sample observations the observed value (or
sample value or empirical value) of the chosen statistic, using the
relevant formula.
Sixth: Compare the sample value of the chosen statistic with the
theoretical (tabular) value(s) that define the critical region. If the
observed value of the statistic falls in the critical region we reject the
null hypothesis. Otherwise we do not reject the null hypothesis.
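Steps five and six can be sketched in a few lines of Python. This is only an illustration: the function name `two_tail_t_test` is hypothetical, and the numbers are borrowed from the worked example in this chapter (slope estimate 2.88, standard error 0.85, and the tabulated critical value 2.069 for 23 degrees of freedom), testing the null hypothesis that the slope is zero.

```python
def two_tail_t_test(estimate, std_error, t_critical, hypothesised=0.0):
    """Compute the observed t value and compare it with the critical value."""
    # Fifth step: observed value of the test statistic.
    t_observed = (estimate - hypothesised) / std_error
    # Sixth step: reject H0 if the observed value falls in either tail.
    reject_null = abs(t_observed) > t_critical
    return t_observed, reject_null


# Values taken from the regression example in this chapter (n = 25, 23 df).
t_obs, reject = two_tail_t_test(2.88, 0.85, 2.069)
print(round(t_obs, 3), reject)
```

Because the critical value is passed in as an argument, the same sketch works for a 1% test or a different number of degrees of freedom once the appropriate table value is looked up.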
Confidence Interval
• A confidence interval expresses the uncertainty of an estimate: it is a
range around the sample mean within which we think the population
mean is likely to fall.
• The larger our sample, the smaller the confidence interval. The larger
the sample, the more sure we are that the sample mean
approximates the population mean.
• There are different confidence intervals, depending on how sure we
want to be that the population mean falls within the CI. The more
sure we want to be, the wider the range of the confidence interval
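• We use the t-distribution to establish a confidence interval for $\beta_1$. The t-statistic for $\hat\beta_1$ is;
$t = \frac{\hat\beta_1 - \beta_1}{Se(\hat\beta_1)} = \frac{\text{estimator} - \text{parameter}}{\text{estimated standard error of estimator}}$
with $(n-k)$ df, so that
$\Pr\left(-t_{\alpha/2} \le t \le t_{\alpha/2}\right) = 1 - \alpha$
Where;
$t$ in the middle of this double inequality is the t-value given above.
$t_{\alpha/2}$ is the value of the t-variable obtained from the t-distribution for
the $\alpha/2$ level of significance and $n-k$ degrees of freedom.
Substituting the t-value and rearranging gives;
$\Pr\left(-t_{\alpha/2} \le \frac{\hat\beta_1 - \beta_1}{Se(\hat\beta_1)} \le t_{\alpha/2}\right) = 1 - \alpha$
$P\left(\hat\beta_1 - t_{\alpha/2}\,Se(\hat\beta_1) \le \beta_1 \le \hat\beta_1 + t_{\alpha/2}\,Se(\hat\beta_1)\right) = 1 - \alpha$
$P\left(\hat\beta_1 - t_{0.025}\,Se(\hat\beta_1) \le \beta_1 \le \hat\beta_1 + t_{0.025}\,Se(\hat\beta_1)\right) = 0.95$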
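• Arguing analogously, we can write the 95% confidence interval for the
constant term ($\alpha$) as follows (in $t_{\alpha/2}$ the $\alpha$ is the significance level, not the intercept);
$P\left(\hat\alpha - t_{\alpha/2}\,se(\hat\alpha) \le \alpha \le \hat\alpha + t_{\alpha/2}\,se(\hat\alpha)\right) = 0.95$
$\alpha = \hat\alpha \pm t_{0.025}\,se(\hat\alpha)$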
• In both cases the width of the confidence interval is proportional to
the standard errors of the estimator.
• That is, the larger the standard error the larger will be the width of
the confidence interval. Put differently, the larger the standard errors
of the estimator, the greater is the uncertainty of estimating the true
value of the parameter.
Example:- Suppose we have estimated and obtained the following
regression line for a sample of 25 observations.
$\hat Y = 89 + 2.88X$
Se: (38.4) (0.85)
A. Find the 95% confidence interval for the parameter estimators 𝛽1
and 𝛼. Interpret also the confidence intervals.
B. Calculate the t-statistics for the parameter estimator 𝛽 and 𝛼 and
make inference whether the parameters are statistically significant or
not.
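A. First, from the table we get the critical value $t_{\alpha/2} = t_{0.025} = 2.069$
for 23 degrees of freedom. Second, substituting these values into
equation (2.25d) gives the 95% confidence interval for the parameter $\beta_1$.
$P\left(\hat\beta_1 - t_{0.025}\,Se(\hat\beta_1) \le \beta_1 \le \hat\beta_1 + t_{0.025}\,Se(\hat\beta_1)\right) = 0.95$, or $\beta_1 = \hat\beta_1 \pm t_{0.025}\,Se(\hat\beta_1)$
$2.88 - 2.069(0.85) \le \beta_1 \le 2.88 + 2.069(0.85)$, or $2.88 \pm (2.069)(0.85)$
That is,
$2.88 - 1.75865 \le \beta_1 \le 2.88 + 1.75865$, or $2.88 \pm 1.75865$
$1.12135 \le \beta_1 \le 4.63865$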
• The interpretation of this 95% confidence interval is that, in the
long run, in 95 out of 100 cases intervals like (1.1214, 4.6386) will
contain the true $\beta_1$.
• But note that we cannot say that the probability is 95% that the
specific interval (1.1214 to 4.6386) contains the true $\beta_1$, because this
interval is now fixed and no longer random. Therefore, $\beta_1$ either lies
in it or does not. The probability that the specified fixed interval
includes the true $\beta_1$ is therefore 1 or 0.
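The interval arithmetic above can be checked with a short Python sketch. The helper name `confidence_interval` is hypothetical; the inputs are the slope estimate, its standard error and the tabulated critical value from the example.

```python
def confidence_interval(estimate, std_error, t_critical):
    """Return (lower, upper) for estimate +/- t_critical * std_error."""
    half_width = t_critical * std_error
    return estimate - half_width, estimate + half_width


# Slope estimate 2.88, standard error 0.85, t(0.025, 23 df) = 2.069.
lower, upper = confidence_interval(2.88, 0.85, 2.069)
print(f"{lower:.5f} <= beta_1 <= {upper:.5f}")

# A larger standard error widens the interval, as noted in the text.
wide_lower, wide_upper = confidence_interval(2.88, 1.70, 2.069)
print(wide_upper - wide_lower > upper - lower)
```

The second call doubles the standard error purely for illustration, to show that the width of the interval is proportional to the standard error.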
• Please work through the remaining questions yourself.
Chapter Three
Multiple Linear Regression
3.1. Method of Ordinary Least Squares revised
3.2. Partial Correlation Coefficients & their Interpretation
3.3. Coefficient of Multiple Determination
3.4. Properties of Least Squares and Gauss-Markov Theorem
3.5. Hypothesis Testing in Multiple Linear Regression
3.6. Predictions using Multiple Linear Regression
3.1. Method of Ordinary Least Squares revised
• In multiple regression we study the relationship between Y and a
number of explanatory variables, X1, X2, X3,…..,Xk .
• For instance, in demand studies we study the relationship between
quantity demanded of a good Y and price of the good, prices of the
substitute goods, and the consumer’s income.
• The model we assume is;
$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i$; where $i = 1, 2, \dots, n$
• In this section we start our analysis with the case of two explanatory
variables and then present the formulas for the case of k explanatory
variables. Consider the model
$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$; where $i = 1, 2, \dots, n$    (3.1)
• The assumptions we have made about the error terms $u_i$ imply that;
$E(u) = 0$, $cov(X_1, u) = 0$, $cov(X_2, u) = 0$
• As in the case of the simple regression model discussed in the
previous section, we can replace the assumptions by their sample
counterparts.
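• Let $\hat\alpha$, $\hat\beta_1$, and $\hat\beta_2$ be the estimators of $\alpha$, $\beta_1$, and $\beta_2$ respectively.
• The sample counterpart of $u_i$ is the residual; $\hat u_i = Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}$
• The three equations to determine $\hat\alpha$, $\hat\beta_1$, and $\hat\beta_2$ are obtained by
replacing the population assumptions by their sample counterparts;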
• Population assumptions and their sample counterparts
$E(u) = 0$:    $(1/n)\sum \hat u_i = 0$ or $\sum \hat u_i = 0$
$cov(X_1, u) = 0$:    $(1/n)\sum X_{1i}\hat u_i = 0$ or $\sum X_{1i}\hat u_i = 0$
$cov(X_2, u) = 0$:    $(1/n)\sum X_{2i}\hat u_i = 0$ or $\sum X_{2i}\hat u_i = 0$
• These equations can also be obtained by the Least Squares
method and are referred to as the ‘normal equations'.
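• We should choose the estimators $\hat\alpha$, $\hat\beta_1$, and $\hat\beta_2$ of $\alpha$, $\beta_1$, and $\beta_2$ so as to minimize,
$\sum \hat u_i^2 = \sum (Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})^2$
• To minimize, we differentiate this expression with respect to $\hat\alpha$, $\hat\beta_1$, and $\hat\beta_2$ and set each partial derivative equal to zero;
$\frac{\partial \sum \hat u_i^2}{\partial \hat\alpha} = 0$; $\frac{\partial \sum \hat u_i^2}{\partial \hat\beta_1} = 0$, and $\frac{\partial \sum \hat u_i^2}{\partial \hat\beta_2} = 0$
• To proceed with the derivation, we expand the right hand side of the
above equation and get;
$\sum \hat u_i^2 = \sum Y_i^2 + n\hat\alpha^2 + \hat\beta_1^2 \sum X_{1i}^2 + \hat\beta_2^2 \sum X_{2i}^2 - 2\hat\alpha \sum Y_i - 2\hat\beta_1 \sum X_{1i}Y_i - 2\hat\beta_2 \sum X_{2i}Y_i + 2\hat\alpha\hat\beta_1 \sum X_{1i} + 2\hat\alpha\hat\beta_2 \sum X_{2i} + 2\hat\beta_1\hat\beta_2 \sum X_{1i}X_{2i}$
Then the partial derivatives of $\sum \hat u_i^2$ with respect to $\hat\alpha$, $\hat\beta_1$ and $\hat\beta_2$ are
$\frac{\partial \sum \hat u_i^2}{\partial \hat\alpha} = -2\sum Y_i + 2n\hat\alpha + 2\hat\beta_1 \sum X_{1i} + 2\hat\beta_2 \sum X_{2i} = 0$
$\frac{\partial \sum \hat u_i^2}{\partial \hat\beta_1} = -2\sum X_{1i}Y_i + 2\hat\alpha \sum X_{1i} + 2\hat\beta_1 \sum X_{1i}^2 + 2\hat\beta_2 \sum X_{1i}X_{2i} = 0$
$\frac{\partial \sum \hat u_i^2}{\partial \hat\beta_2} = -2\sum X_{2i}Y_i + 2\hat\alpha \sum X_{2i} + 2\hat\beta_2 \sum X_{2i}^2 + 2\hat\beta_1 \sum X_{1i}X_{2i} = 0$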
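• These three equations, as mentioned earlier, are called “normal equations”.
• Dividing the first of them by 2 and rearranging,
$\sum Y_i = n\hat\alpha + \hat\beta_1 \sum X_{1i} + \hat\beta_2 \sum X_{2i}$
• Dividing both sides by n will give;
$\bar Y = \hat\alpha + \hat\beta_1 \bar X_1 + \hat\beta_2 \bar X_2$
Where,
$\bar Y = \frac{1}{n}\sum Y_i$, $\bar X_1 = \frac{1}{n}\sum X_{1i}$, $\bar X_2 = \frac{1}{n}\sum X_{2i}$
Rearranging the above equation we obtain the value of the intercept term ($\hat\alpha$) as;
$\hat\alpha = \bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2$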
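• From the remaining two normal equations;
$\sum X_1 Y = \hat\alpha \sum X_1 + \hat\beta_1 \sum X_1^2 + \hat\beta_2 \sum X_1 X_2$
$\sum X_2 Y = \hat\alpha \sum X_2 + \hat\beta_1 \sum X_1 X_2 + \hat\beta_2 \sum X_2^2$
• Substituting the value of $\hat\alpha$ into the first of these equations (with $\sum X_1 = n\bar X_1$)
we get;
$\sum X_1 Y = n\bar X_1(\bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2) + \hat\beta_1 \sum X_1^2 + \hat\beta_2 \sum X_1 X_2$
We can simplify this equation by the use of the following six notations.
Let us define;
1. $\sum x_1 y = \sum X_1 Y - n\bar X_1 \bar Y$
2. $\sum x_2 y = \sum X_2 Y - n\bar X_2 \bar Y$
3. $\sum y^2 = \sum Y^2 - n\bar Y^2$
4. $\sum x_1^2 = \sum X_1^2 - n\bar X_1^2$
5. $\sum x_1 x_2 = \sum X_1 X_2 - n\bar X_1 \bar X_2$
6. $\sum x_2^2 = \sum X_2^2 - n\bar X_2^2$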
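• Proof:
$\sum x_1 y = \sum (Y - \bar Y)(X_1 - \bar X_1)$. Expanding the right hand side gives
$\sum x_1 y = \sum X_1 Y - \bar Y \sum X_1 - \bar X_1 \sum Y + n\bar X_1 \bar Y$
Recall that $\sum X_1 = n\bar X_1$ and $\sum Y = n\bar Y$. Substituting these into the above expression
we get
$\sum x_1 y = \sum X_1 Y - \bar Y n\bar X_1 - \bar X_1 n\bar Y + n\bar X_1 \bar Y$
Now the last two expressions cancel out and we remain only with
$\sum x_1 y = \sum X_1 Y - n\bar X_1 \bar Y$; which is the same as the first notation.
Returning to the normal equation with $\hat\alpha$ substituted;
$\sum X_1 Y = n\bar X_1 \bar Y - \hat\beta_1 n\bar X_1^2 - \hat\beta_2 n\bar X_1 \bar X_2 + \hat\beta_1 \sum X_1^2 + \hat\beta_2 \sum X_1 X_2$
$\sum X_1 Y - n\bar X_1 \bar Y = \hat\beta_1\left(\sum X_1^2 - n\bar X_1^2\right) + \hat\beta_2\left(\sum X_1 X_2 - n\bar X_1 \bar X_2\right)$
Substituting the first, fourth, and fifth notations gives
$\sum x_1 y = \hat\beta_1 \sum x_1^2 + \hat\beta_2 \sum x_1 x_2$
and, in the same way, the other normal equation becomes
$\sum x_2 y = \hat\beta_1 \sum x_1 x_2 + \hat\beta_2 \sum x_2^2$
• Now we can solve these two normal equations; that is
$\sum x_1 y = \hat\beta_1 \sum x_1^2 + \hat\beta_2 \sum x_1 x_2$
$\sum x_2 y = \hat\beta_1 \sum x_1 x_2 + \hat\beta_2 \sum x_2^2$
• To get $\hat\beta_1$ and $\hat\beta_2$.
• First, multiply the first equation by $\sum x_2^2$ and the second by $\sum x_1 x_2$;
$\left(\sum x_1 y = \hat\beta_1 \sum x_1^2 + \hat\beta_2 \sum x_1 x_2\right) \times \sum x_2^2$
$\left(\sum x_2 y = \hat\beta_1 \sum x_1 x_2 + \hat\beta_2 \sum x_2^2\right) \times \sum x_1 x_2$
• Then solving these equations using the procedure for solving
simultaneous equations we get;
$\sum x_2^2 \sum x_1 y = \hat\beta_1 \sum x_1^2 \sum x_2^2 + \hat\beta_2 \sum x_1 x_2 \sum x_2^2$
$\sum x_1 x_2 \sum x_2 y = \hat\beta_1 \left(\sum x_1 x_2\right)^2 + \hat\beta_2 \sum x_2^2 \sum x_1 x_2$
• Subtracting the second from the first in turn gives us;
$\sum x_2^2 \sum x_1 y - \sum x_1 x_2 \sum x_2 y = \hat\beta_1\left[\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2\right]$
Finally, solving for $\hat\beta_1$ we obtain the expression as;
$\hat\beta_1 = \frac{\sum x_2^2 \sum x_1 y - \sum x_1 x_2 \sum x_2 y}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$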
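• Secondly, multiply the first equation by $\sum x_1 x_2$ and the second by $\sum x_1^2$;
$\left(\sum x_1 y = \hat\beta_1 \sum x_1^2 + \hat\beta_2 \sum x_1 x_2\right) \times \sum x_1 x_2$
$\left(\sum x_2 y = \hat\beta_1 \sum x_1 x_2 + \hat\beta_2 \sum x_2^2\right) \times \sum x_1^2$
• Then solving these equations using the procedure for solving
simultaneous equations we have;
$\sum x_1 x_2 \sum x_1 y = \hat\beta_1 \sum x_1^2 \sum x_1 x_2 + \hat\beta_2 \left(\sum x_1 x_2\right)^2$
$\sum x_1^2 \sum x_2 y = \hat\beta_1 \sum x_1^2 \sum x_1 x_2 + \hat\beta_2 \sum x_1^2 \sum x_2^2$
• Subtracting the first from the second, it can be simplified to
$\sum x_1^2 \sum x_2 y - \sum x_1 x_2 \sum x_1 y = \hat\beta_2\left[\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2\right]$
• Finally, we obtain the expression for $\hat\beta_2$ as;
$\hat\beta_2 = \frac{\sum x_1^2 \sum x_2 y - \sum x_1 x_2 \sum x_1 y}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$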
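The closed-form expressions for the two slope estimators and the intercept can be sketched in Python as follows. This is an illustration only: the function name `ols_two_regressors` and the small data set are hypothetical, and the data are generated exactly from Y = 2 + 3·X1 − X2 so that the formulas should recover those three numbers.

```python
def ols_two_regressors(y, x1, x2):
    """Estimate alpha, beta1, beta2 from sums of deviations from the means."""
    n = len(y)
    y_bar, x1_bar, x2_bar = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Deviations from sample means (the lower-case variables in the text).
    dy = [v - y_bar for v in y]
    d1 = [v - x1_bar for v in x1]
    d2 = [v - x2_bar for v in x2]
    s11 = sum(a * a for a in d1)               # sum of x1 squared
    s22 = sum(a * a for a in d2)               # sum of x2 squared
    s12 = sum(a * b for a, b in zip(d1, d2))   # sum of x1 * x2
    s1y = sum(a * b for a, b in zip(d1, dy))   # sum of x1 * y
    s2y = sum(a * b for a, b in zip(d2, dy))   # sum of x2 * y
    denom = s11 * s22 - s12 ** 2
    if denom == 0:
        raise ValueError("x1 and x2 are exactly collinear")
    beta1 = (s22 * s1y - s12 * s2y) / denom
    beta2 = (s11 * s2y - s12 * s1y) / denom
    alpha = y_bar - beta1 * x1_bar - beta2 * x2_bar
    return alpha, beta1, beta2


# Hypothetical data with no error term: Y = 2 + 3*X1 - X2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [2 + 3 * a - b for a, b in zip(x1, x2)]
alpha_hat, beta1_hat, beta2_hat = ols_two_regressors(y, x1, x2)
print(alpha_hat, beta1_hat, beta2_hat)
```

The zero-denominator check corresponds to exact collinearity between the two regressors, in which case the formulas have no solution.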
• In multiple regression each regression coefficient measures the
change in the dependent variable for a unit increase in that
independent/explanatory variable, say $X_1$, holding other variables
constant.
Hence, the coefficients $\beta_1$ and $\beta_2$ are called partial regression coefficients.
• Naturally, if we add more factors to our model that are useful for
explaining Y, then more of the variation in Y can be explained.
• Thus, multiple regression analysis can be used to build better models
for predicting the dependent variables.
• Therefore, the computational procedure in multiple regression is as
follows;
1. Obtain all the means; $\bar Y$, $\bar X_1$, $\bar X_2$
2. Obtain all sums of squares of deviations from respective means; $\sum x_1^2$, $\sum x_2^2$, $\sum y^2$
and the sums of products of deviations $\sum x_1 x_2$, $\sum x_1 y$, $\sum x_2 y$
3.2. Partial Correlation Coefficients & Their Interpretation
If we have an explained variable $Y$ and three explanatory variables $X_1$, $X_2$, $X_3$ and
$r_{Y1}^2$, $r_{Y2}^2$ and $r_{Y3}^2$ are the squares of the simple correlations between $Y$ and $X_1$, $X_2$ and $X_3$
respectively, then $r_{Y1}^2$, $r_{Y2}^2$ and $r_{Y3}^2$ measure the proportion of the variance in $Y$ that $X_1$
alone, $X_2$ alone or $X_3$ alone explain.
On the other hand, $R_{Y.123}^2$ measures the proportion of the variance of $Y$ that $X_1$, $X_2$ and
$X_3$ explain together.
We would also like to measure how much $X_2$ explains after $X_1$ is included, and how
much $X_3$ explains after $X_1$ and $X_2$ are included.
These are measured by the partial coefficients of determination $r_{Y2.1}^2$ and
$r_{Y3.12}^2$ respectively.
The variables written in the subscript after the dot are the variables already
included. Therefore, with three explanatory variables the partial correlations are
$r_{Y1.2}$, $r_{Y1.3}$, $r_{Y2.1}$, $r_{Y2.3}$, $r_{Y3.1}$, and $r_{Y3.2}$.
These are called partial correlations of the first order. We also have three partial
correlation coefficients of the second order: $r_{Y1.23}$, $r_{Y2.13}$, and $r_{Y3.12}$. In both cases
the variables after the dot are always the variables already included in the regression
equation.
The order of the partial correlation coefficient depends on the number of variables after
the dot. The usual convention is to denote simple and partial correlations by a small $r$
and multiple correlations by a capital $R$.
3.2.1. Computation of the partial correlation coefficients
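For this we use the relationship between $r^2$ and $t^2$.
For example, to compute $r_{Y2.3}^2$ we have to consider the multiple regression of $Y$ on $X_2$
and $X_3$. Let the estimated regression equation be
$\hat Y = \hat\alpha + \hat\beta_2 X_2 + \hat\beta_3 X_3$
Let $t_2 = \hat\beta_2 / SE(\hat\beta_2)$ from this equation, where
$Se(\hat\beta_2) = \sqrt{Var(\hat\beta_2)}$
$Var(\hat\beta_2) = \sigma_u^2 \frac{1}{\sum x_i^2}$ (this is the simple-regression form; with $X_3$ also in the regression, $\sum x_i^2$ is replaced by $\sum x_{2i}^2 (1 - r_{23}^2)$, where $r_{23}$ is the correlation between $X_2$ and $X_3$), and we can obtain $\sigma_u^2$ from its estimator
$\hat\sigma_u^2 = \frac{\sum \hat u_i^2}{n - k}$
Then
$r_{Y2.3}^2 = \frac{t_2^2}{t_2^2 + d.f.}$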
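As a minimal sketch of the relationship between the squared partial correlation and the squared t-ratio, the conversion can be written as a one-line Python function. The function name `partial_r_squared` and the input values (a t-ratio of 2.0 with 12 degrees of freedom) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def partial_r_squared(t_stat, degrees_of_freedom):
    """Squared partial correlation from a coefficient's t-ratio: t^2 / (t^2 + d.f.)."""
    return t_stat ** 2 / (t_stat ** 2 + degrees_of_freedom)


# Hypothetical values: t = 2.0 with 12 degrees of freedom gives 4 / (4 + 12).
r2_partial = partial_r_squared(2.0, 12)
print(r2_partial)  # 0.25
```

Since the t-ratio enters squared, the sign of the coefficient does not affect the result; a larger t-ratio for a given number of degrees of freedom gives a larger squared partial correlation.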
3.2.2. Interpretation of partial correlations
Partial correlations are very important in deciding whether or not to include more
explanatory variables. For instance, suppose that we have two explanatory variables $X_1$
and $X_2$, and $r_{Y2}^2$ is very high, say 0.95, and $r_{Y2.1}^2$ is very low, say 0.01.
What this means is that if $X_2$ alone is used to explain $Y$, it can do a good job. But after
$X_1$ is included, $X_2$ does not help any more in explaining $Y$; that is, $X_1$ has already done
the job of $X_2$. In this case there is no use in including $X_2$.
• In fact we can have a situation where, for instance
$r_{Y1}^2 = 0.95$ and $r_{Y2}^2 = 0.96$. But, $r_{Y1.2} = 0.1$ and $r_{Y2.1} = 0.1$
• In this case each explanatory variable is highly correlated with $Y$, but
the partial correlations are both very low.
This is called multicollinearity.
Example: GDP = F(skilled L, unskilled L)
3.3. Coefficient of Multiple Determination
• In the two-variable case we saw that $r^2$ is defined as a measure of
the goodness of fit of the regression equation; that is, the proportion of
the total variation of the dependent variable explained by the single
explanatory variable $x_1$.
• This notion of $r^2$ can be easily extended to the regression model
containing more than one explanatory variable.
• Thus in the three variable model we would like to know the variation
in Y explained by $x_1$ and $x_2$ jointly. The quantity that gives us this
information is called the multiple coefficient of determination.
• The analogous expressions in multiple regressions are;
A. Residual sum of squares (RSS) = $\sum y^2 - \hat\beta_1 \sum x_1 y - \hat\beta_2 \sum x_2 y$
B. Regression/explained sum of squares (ESS) = $\hat\beta_1 \sum x_1 y + \hat\beta_2 \sum x_2 y$
C. $R_{Y.X_1X_2}^2 = \frac{\hat\beta_1 \sum x_1 y + \hat\beta_2 \sum x_2 y}{\sum y^2}$
$R_{Y.X_1X_2}^2$ is called the multiple coefficient of determination and its positive square root
is called the multiple correlation coefficient. The first subscript is the explained variable.
The subscripts after the dot are the explanatory variables.
$R^2 = \frac{\hat\beta_1 \sum x_1 y + \hat\beta_2 \sum x_2 y}{\sum y^2}$
$\bar R^2 = 1 - \frac{\sum \hat u_i^2 / (n-k)}{\sum y^2 / (n-1)}$ is the adjusted $R^2$. The term adjusted implies
adjusted for the degrees of freedom associated with the sums of squares
entering into the equation.
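The two goodness-of-fit measures can be sketched in Python as below. The function names and all input numbers are hypothetical; the adjusted measure is written in the equivalent form 1 − (1 − R²)(n − 1)/(n − k), which follows from RSS = (1 − R²)·Σy².

```python
def r_squared(beta1, beta2, s1y, s2y, syy):
    """Explained sum of squares divided by total sum of squares."""
    return (beta1 * s1y + beta2 * s2y) / syy


def adjusted_r_squared(r2, n, k):
    """1 - [RSS/(n-k)] / [TSS/(n-1)], rewritten in terms of R^2."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


# Hypothetical inputs: slope estimates and sums of deviation products.
r2 = r_squared(beta1=0.5, beta2=2.0, s1y=100.0, s2y=10.0, syy=80.0)
r2_adj = adjusted_r_squared(r2, n=15, k=3)
print(r2, round(r2_adj, 4))
```

With k greater than 1 the adjusted value is always below the unadjusted one, which is the penalty for the degrees of freedom used up by the extra parameters.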
Exercise 6: From the following data estimate the partial regression coefficients, their
standard errors, and the adjusted and unadjusted $R^2$ values using intermediate values.
Where, $n = 15$
$\bar Y = 367.693$, $\bar X_1 = 402.760$, $\bar X_2 = 8.0$
$\sum (Y_i - \bar Y)^2 = 66042.269$
$\sum (X_{1i} - \bar X_1)^2 = 84855.096$
$\sum (X_{2i} - \bar X_2)^2 = 280.0$
$\sum (Y_i - \bar Y)(X_{1i} - \bar X_1) = 74778.346$
$\sum (Y_i - \bar Y)(X_{2i} - \bar X_2) = 4250.9$
$\sum (X_{1i} - \bar X_1)(X_{2i} - \bar X_2) = 4796.0$
3.4. Properties of Least Squares and Gauss-Markov Theorem
• The errors $u_i$ are again due to measurement errors in Y and errors in
the specification of the relationship between Y and the X’s.
• For reasons already spelled out, we assume that the $u_i$ follow the
normal distribution with zero mean and constant variance $\sigma^2$.
• We continue to operate within the framework of the classical linear
regression model (CLRM) introduced in Chapter 2. Specifically, we
assume;
1. Zero mean of $u_i$; $E(u_i) = 0$
2. Homoscedasticity; $var(u_i) = \sigma^2$ for all $i$
3. No serial correlation; $u_i$ and $u_j$ are independent for all $i \neq j$
4. Zero covariance between $u_i$ and $X_j$; that is, $u_i$ and $X_j$ are independent for all $i$ and $j$
5. Normality; $u_i$ are normally distributed for all $i$.
6. The X values are fixed in repeated sampling, or X is assumed to be non-stochastic.
7. The number of observations ($n$) must be greater than the number of parameters ($k$) to
be estimated.
8. Variability of X values. That is, the X's in a given sample must not all be the same.
Technically, var(X) must be a finite positive number.
9. No exact collinearity between X variables; no exact linear relationship between $X_1$
and $X_2$.
10. The multiple regression model is linear in parameters.
11. No specification bias; the model is correctly specified.
Under the first four assumptions, we can show that the method of least squares gives
estimators $\hat\alpha, \hat\beta_1, \hat\beta_2, \dots, \hat\beta_k$ that are unbiased and have the minimum variance among the
class of linear unbiased estimators. That is, following the same discussion as in the former
chapter, the OLS estimators of the partial regression coefficients are best linear
unbiased estimators (BLUE). With the normality assumption added, they have the minimum
variance among all unbiased estimators, linear or not.
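Moreover, the estimators $\hat\alpha$, $\hat\beta_1$, and $\hat\beta_2$ are themselves normally distributed with means
equal to the true $\alpha$, $\beta_1$, and $\beta_2$ and with their respective variances. One can also show that, upon replacing $\sigma^2$
by its unbiased estimator $\hat\sigma^2$ in the computation of the standard errors, each of the
following variables;
$t^* = \frac{\hat\alpha - \alpha^*}{se(\hat\alpha)}$
$t^* = \frac{\hat\beta_1 - \beta_1^*}{se(\hat\beta_1)}$
$t^* = \frac{\hat\beta_2 - \beta_2^*}{se(\hat\beta_2)}$
follows the t distribution with $n - 3$ degrees of freedom ($n - k$ in general).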
The normality assumption is needed for the tests of significance and confidence intervals. It is not
needed to prove the optimal properties of the least squares estimators. In addition to
these assumptions, which are similar to those we make in the case of simple regression,
we also assume that $X_1, X_2, \dots, X_k$ are not collinear, that is, there is no deterministic
linear relationship among them (the no exact collinearity assumption above).
A case where there is an exact linear relationship between explanatory variables (for instance, $X_2 = 2X_1$) is known
as exact or perfect collinearity. In the case of two variables we considered, the exact
relationship implies that the correlation coefficient between $X_1$ and $X_2$ is +1 or -1. In our
analysis in this unit we rule out perfect collinearity, but not the case where the correlation
between the variables is high but not perfect. When it comes to the analysis of the effect
of several variables $X_1, X_2, \dots, X_k$ on Y, we have to distinguish between joint effects and
partial effects.
3.5. Hypothesis Testing in Multiple Linear Regression
• Reading Assignment
Compare and contrast the difference in hypothesis testing in
simple and multiple linear regression model.
3.6. Predictions using Multiple Linear Regression Model
• The formulas for forecasting/prediction of the dependent variable for
any values of the explanatory variables in multiple regressions are
similar to those in the case of simple regression except that to
compute the standard error of the predicted value we need the
variance and covariance of all the regression coefficients.
• Again we will present the expression for the standard error in the
case of two explanatory variables and then the expression can be
extended for the general case of k explanatory variables.
• Let the estimated multiple regression equation be;
$\hat Y = \hat\alpha + \hat\beta_1 X_1 + \hat\beta_2 X_2$
Now consider forecasting the value $Y_0$ of $Y$ given the values $X_{10}$ of $X_1$ and $X_{20}$
of $X_2$, respectively. These could be values at some future date. Then we have
$Y_0 = \alpha + \beta_1 X_{10} + \beta_2 X_{20} + u_0$
Consider;
$\hat Y_0 = \hat\alpha + \hat\beta_1 X_{10} + \hat\beta_2 X_{20}$
The prediction error is
$\hat Y_0 - Y_0 = (\hat\alpha - \alpha) + (\hat\beta_1 - \beta_1) X_{10} + (\hat\beta_2 - \beta_2) X_{20} - u_0$
Exercise 7: Again consider the example in the former section. The estimated regression
is
$\hat Y = 4.0 + 0.7 X_1 + 0.2 X_2$
Consider the prediction of $Y$ for $X_{10} = 12$ and $X_{20} = 7$. We have
$\hat Y_0 = 4.0 + 0.7(12) + 0.2(7) = 13.8$
Note that;
$X_{10} - \bar X_1 = 12 - 10 = 2$
$X_{20} - \bar X_2 = 7 - 5 = 2$
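The point prediction in Exercise 7 can be reproduced with a small Python sketch. The function name `predict` is hypothetical; the coefficients and the values of the explanatory variables are those given in the exercise. Only the point forecast is computed here, not its standard error.

```python
def predict(alpha_hat, beta1_hat, beta2_hat, x1_0, x2_0):
    """Point prediction from a fitted equation with two explanatory variables."""
    return alpha_hat + beta1_hat * x1_0 + beta2_hat * x2_0


# Estimated equation from Exercise 7, evaluated at X1 = 12 and X2 = 7.
y0_hat = predict(4.0, 0.7, 0.2, x1_0=12, x2_0=7)
print(round(y0_hat, 1))  # 13.8
```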
Chapter Four
Problems of Measurement, Specification, Estimation and Their
Solutions
4.1. Heteroscedasticity
4.2. Autocorrelation
4.3. Multicollinearity
4.4. Distributed lag models and Expectations
Introduction
• Recall that five assumptions were made relating to the classical linear regression
model (CLRM). These were required to show that the estimation technique,
ordinary least squares (OLS), had a number of desirable properties, and also so
that hypothesis tests regarding the coefficient estimates could validly be
conducted. Specifically, it was assumed that:
1. $E(u_t) = 0$
2. $var(u_t) = \sigma^2 < \infty$
3. $cov(u_i, u_j) = 0$
4. $cov(u_t, x_t) = 0$
5. $u_t \sim N(0, \sigma^2)$
• These assumptions will now be studied further, in particular looking
at the following:
● How can violations of the assumptions be detected?
● What are the most likely causes of the violations in practice?
● What are the consequences for the model if an assumption is
violated but this fact is ignored and the researcher proceeds
regardless?
• The answer to the last of these questions is that, in general, the
model could encounter any combination of three problems:
● the coefficient estimates ($\hat\beta$s) are wrong
● the associated standard errors are wrong
● the distributions that were assumed for the test statistics are
inappropriate.
1. Assumption 1: E(ut ) = 0
• The first assumption required is that the average value of the errors is zero.
• In fact, if a constant term is included in the regression equation, this assumption
will never be violated.
• But what if financial theory suggests that, for a particular application, there
should be no intercept so that the regression line is forced through the origin? If
the regression did not include an intercept, and the average value of the errors
was nonzero, several undesirable consequences could arise.
First, $R^2$, defined as ESS/TSS, can be negative, implying that the sample
average, $\bar y$, ‘explains’ more of the variation in y than the explanatory
variables.
Second, and more fundamentally, a regression with no intercept
parameter could lead to potentially severe biases in the slope
coefficient estimates.
• The solid line shows the regression estimated including a constant term, while
the dotted line shows the effect of suppressing (i.e. setting to zero) the constant
term. The effect is that the estimated line in this case is forced through the
origin, so that the estimate of the slope coefficient ($\hat\beta$) is biased.
• Additionally, $R^2$ and $\bar R^2$ are usually meaningless in such a context. This arises
since the mean value of the dependent variable, $\bar y$, will not be equal to the mean
of the fitted values from the model, i.e. the mean of $\hat y$, if there is no constant
in the regression.
2. Assumption 2: var(ut ) = σ2 <∞
• It has been assumed thus far that the variance of the errors is
constant, $\sigma^2$. This is known as the assumption of homoscedasticity.
• If the errors do not have a constant variance, they are said to be
heteroscedastic.
• To consider one illustration of heteroscedasticity, suppose that a
regression had been estimated and the residuals, $\hat u_t$, have been
calculated and then plotted against one of the explanatory variables,
$x_{2t}$.
It is clearly evident that the errors in the figure are heteroscedastic; that is,
although their mean value is roughly constant, their variance is increasing
systematically with $x_{2t}$.
• How can one tell whether the errors are heteroscedastic or not?
Option One: Graphical method. However, a plot does not reveal the
cause or form of the heteroscedasticity.
• It is also possible that the variance of the errors changes over time
rather than systematically with one of the explanatory variables
(ARCH/GARCH); the GARCH model allows the conditional variance to
depend upon its own previous lags.
Option Two: A formal statistical test, such as the Goldfeld-Quandt (1965) test.
• The null hypothesis is that the variances of the disturbances are
equal, which can be written $H_0: \sigma_1^2 = \sigma_2^2$, against a two-sided
alternative.
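A rough Python sketch of the idea behind the Goldfeld-Quandt test is given below. It is not taken from the slides: it assumes the usual form of the test, in which the sample is split into two sub-samples, a regression is run on each, and the test statistic is the ratio of the two residual variances with the larger one in the numerator (compared with an F critical value under the null). The function names and the data are hypothetical, and a simple one-regressor model is used to keep the code short.

```python
def simple_ols_residuals(x, y):
    """Residuals from regressing y on a constant and a single regressor x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [b - intercept - slope * a for a, b in zip(x, y)]


def goldfeld_quandt(x, y, k=2):
    """Ratio of the larger to the smaller residual variance of the two halves."""
    half = len(x) // 2
    variances = []
    for xs, ys in ((x[:half], y[:half]), (x[half:], y[half:])):
        resid = simple_ols_residuals(xs, ys)
        # Residual variance: RSS divided by (observations - parameters).
        variances.append(sum(e * e for e in resid) / (len(xs) - k))
    return max(variances) / min(variances)


# Hypothetical data: the disturbance is ten times larger in the second half.
x = [float(i) for i in range(1, 13)]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
y = [1 + 2 * a + e for a, e in zip(x, noise)]
gq = goldfeld_quandt(x, y)
print(round(gq, 2))
```

A ratio far above 1, judged against the relevant F table value, would be evidence against equal variances. In this constructed data set the second-half disturbances are exactly ten times the first-half ones, so the ratio is about 100.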
• What happens if the errors are heteroscedastic, but this fact is
ignored and the researcher proceeds with estimation and inference?
First: In this case, OLS estimators will still give unbiased (and also
consistent) coefficient estimates, but they are no longer BLUE-
that is, they no longer have the minimum variance among the
class of unbiased estimators.
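Second: the standard errors from the usual OLS formula could be wrong,
so inferences based on them may be misleading. In general, they will be
too large for the intercept when the errors are heteroscedastic.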
• What can be done if the form (i.e. the cause) of the heteroscedasticity is
known?
Option one: An alternative estimation method which takes this into account
can be used. One possibility is called generalised least squares (GLS). GLS
can be viewed as OLS applied to transformed data that satisfy the OLS
assumptions.
Option two: Transforming the variables into logs or reducing by some
other measure of ‘size’.
Option three: Using heteroscedasticity-consistent standard error estimates.
Most standard econometrics software packages have an option (usually
called something like ‘robust’) that allows the user to employ standard error
estimates that have been modified to account for the heteroscedasticity
following White (1980).
3. Assumption 3: $cov(u_i, u_j) = 0$ for $i \neq j$
• The third assumption made about the disturbance terms is that the
covariance between the error terms over time (or cross-sectionally, for
that type of data) is zero.
• In other words, it is assumed that the errors are uncorrelated with
one another. If the errors are not uncorrelated with one another, it
would be stated that they are ‘autocorrelated’ or that they are
‘serially correlated’.
• Again, the population disturbances cannot be observed, so tests for
autocorrelation are conducted on the residuals, $\hat u$.
• Before one can proceed to see how formal tests for autocorrelation
are formulated, the concept of the lagged value of a variable needs to
be defined.
3.1. The concept of a lagged value
• The lagged value of a variable (which may be $y_t$, $x_t$, or $u_t$) is simply
the value that the variable took during a previous period. So for
example, the value of $y_t$ lagged one period, written $y_{t-1}$, can be
constructed by shifting all of the observations forward one period in a
spreadsheet.
• The first difference of a variable is its change from one period to the
next, $\Delta y_t = y_t - y_{t-1}$.
• Note that when one-period lags or first differences of a variable are
constructed, the first observation is lost.
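The construction of a lag and a first difference can be sketched in Python as follows. The function names and the four-observation series are hypothetical; `None` marks the observation that is lost.

```python
def lag(series, periods=1):
    """Shift the series forward so that position t holds the value from t - periods."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    return [None] * periods + series[:-periods]


def first_difference(series):
    """Change from the previous period; the first observation has no predecessor."""
    return [None] + [curr - prev for prev, curr in zip(series, series[1:])]


# Hypothetical series of four observations.
y = [10, 12, 11, 15]
print(lag(y))               # [None, 10, 12, 11]
print(first_difference(y))  # [None, 2, -1, 4]
```

Both results have the same length as the original series, with the first entry missing, which is why one observation is lost in any regression that uses them.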
3.2. Graphical tests for autocorrelation
In order to test for autocorrelation, the first step is to
consider possible relationships between the current residual, $\hat u_t$,
and the immediately previous one, $\hat u_{t-1}$, via a graphical
exploration. Thus $\hat u_t$ is plotted against $\hat u_{t-1}$, and $\hat u_t$ is plotted
over time.
When most points in the plot of $\hat u_t$ against $\hat u_{t-1}$ lie in the first and
third quadrants, the case is known as positive autocorrelation, since on
average if the residual at time t−1 is positive, the residual at
time t is likely to be also positive; similarly, if the residual at
t−1 is negative, the residual at t is also likely to be negative.
The time series plot shows that a positively autocorrelated series of residuals
will not cross the time-axis very frequently.
The figures show negative autocorrelation, indicated by an alternating pattern
in the residuals. This case is known as negative autocorrelation since on
average if the residual at time t−1 is positive, the residual at time t is
likely to be negative; similarly, if the residual at t−1 is negative, the
residual at t is likely to be positive.
• The figures show no pattern in the residuals at all: this is what is desirable to see.
• In the plot of $\hat u_t$ against $\hat u_{t-1}$, the points are randomly spread across
all four quadrants, and the time series plot of the residuals does not cross
the x-axis either too frequently or too little.
• Test?
Durbin-Watson (DW) is a test for first order autocorrelation,
i.e. it tests only for a relationship between an error and its
immediately previous value.
$DW \approx 2(1 - \hat\rho)$, where $\hat\rho$ is the estimated correlation between $\hat u_t$ and $\hat u_{t-1}$.
$\hat\rho = 0$, DW = 2: This is the case where there is no autocorrelation
in the residuals. So roughly speaking, the null hypothesis would
not be rejected if DW is near 2, i.e. there is little evidence of
autocorrelation.
$\hat\rho = 1$, DW = 0: This corresponds to the case where there is
perfect positive autocorrelation in the residuals.
$\hat\rho = -1$, DW = 4: This corresponds to the case where there is
perfect negative autocorrelation in the residuals.
So, to reiterate, the null hypothesis is rejected and the existence of positive autocorrelation presumed
if DW is less than the lower critical value; the null hypothesis is rejected and the existence of negative
autocorrelation presumed if DW is greater than 4 minus the lower critical value; the null hypothesis is
not rejected and no significant residual autocorrelation is presumed if DW is between the upper and 4
minus the upper limits.
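The decision rule can be sketched in Python. The slides give only the approximation DW ≈ 2(1 − ρ̂); the sketch below assumes the standard definition of the statistic, the sum of squared changes in the residuals divided by the sum of squared residuals. The function names, the residual series and the critical values (lower 1.0, upper 1.4) are hypothetical; in practice the critical values come from DW tables.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def dw_decision(dw, d_lower, d_upper):
    """Apply the decision rule using tabulated lower and upper critical values."""
    if dw < d_lower:
        return "reject H0: positive autocorrelation"
    if dw > 4 - d_lower:
        return "reject H0: negative autocorrelation"
    if d_upper < dw < 4 - d_upper:
        return "do not reject H0"
    return "inconclusive"


# Hypothetical residuals: one persistent series and one alternating series.
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
for resid in (persistent, alternating):
    dw = durbin_watson(resid)
    print(round(dw, 3), dw_decision(dw, d_lower=1.0, d_upper=1.4))
```

The "inconclusive" branch covers values of DW between the lower and upper critical values (or between 4 minus the upper and 4 minus the lower), where the rule above neither rejects nor fails to reject.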
• Conditions for DW to be a valid test?
1. There must be a constant term in the regression
2. The regressors must be non-stochastic
3. There must be no lags of the dependent variable in the regression.
• Another test for autocorrelation: the Breusch–Godfrey test
• Why might lags be required in a regression?
• Lagged values of the explanatory variables or of the dependent
variable (or both) may capture important dynamic structure in the
dependent variable that might be caused by a number of factors. Two
possibilities that are relevant in finance are as follows:
1. Inertia of the dependent variable: Often a change in the value of
one of the explanatory variables will not affect the dependent
variable immediately during one time period, but rather with a lag
over several time periods.
2. Overreactions: It is sometimes argued that financial markets
overreact to good and to bad news.
…trend
• Moving from a purely static model to one which allows for lagged
effects is likely to reduce, and possibly remove, serial correlation
which was present in the static model’s residuals.
3. Omission of relevant variables which are themselves
autocorrelated: if a variable that is an important determinant
of movements in y has not been included in the model, and that
variable is itself autocorrelated, the residuals from the
estimated model will be serially correlated.
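The effect of moving from a static to a dynamic model can be illustrated with simulated data. The sketch below assumes NumPy is available; the data-generating process, its coefficients, and the helper names are all hypothetical choices made for illustration. It reports the first-order residual autocorrelation rather than DW for both models, because DW is not valid when a lagged dependent variable is included.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500

# Hypothetical data-generating process with inertia in y:
# y_t = 1 + 0.5 x_t + 0.8 y_{t-1} + e_t
x = rng.normal(size=T)
e = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 1.0 + 0.5 * x[t] + 0.8 * y[t - 1] + e[t]


def ols_residuals(y, X):
    """Residuals from an OLS regression of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def rho_hat(u):
    """Estimated first-order autocorrelation of a residual series."""
    return float(np.sum(u[1:] * u[:-1]) / np.sum(u ** 2))


# Drop the first observation so both models use the same sample.
const = np.ones(T - 1)
X_static = np.column_stack([const, x[1:]])            # y_t on x_t only
X_dynamic = np.column_stack([const, x[1:], y[:-1]])   # adds y_{t-1}

u_static = ols_residuals(y[1:], X_static)
u_dynamic = ols_residuals(y[1:], X_dynamic)

print("static model  rho_hat:", round(rho_hat(u_static), 3))
print("dynamic model rho_hat:", round(rho_hat(u_dynamic), 3))
```

In this setup the static model leaves the omitted dynamics in its residuals, so their estimated autocorrelation is strongly positive, while the dynamic model's residuals show little autocorrelation.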
4. Assumption 4: the disturbances are
normally distributed
• Recall that the normality assumption (u_t ∼ N(0, σ²)) is required in
order to conduct single or joint hypothesis tests about the model
parameters.
• A normal distribution is symmetric about its mean, while a skewed
distribution will not be, but will have one tail longer than the other.
• A normal distribution is fully characterised by its first two
moments, the mean and the variance.
• The standardised third and fourth moments of a distribution are
known as its skewness and kurtosis. Skewness measures the extent to
which a distribution is not symmetric about its mean value and
kurtosis measures how fat the tails of the distribution are.
…trend
• A leptokurtic distribution is one which has fatter tails
and is more peaked at the mean than a normally
distributed random variable with the same mean and
variance, while a platykurtic distribution will be less
peaked at the mean, will have thinner tails, and more of
the distribution in the shoulders than a normal.
• A leptokurtic distribution is far more likely to characterise
financial (and economic) time series, and to characterise the
residuals from a financial time series model.
[Figure: A normal versus a skewed distribution]
…trend
• One of the most commonly applied tests for normality is the
Bera-Jarque (hereafter BJ) test, also known as the Jarque–Bera test.
• If the residuals are normally distributed, the histogram should be
bell-shaped and the Bera-Jarque statistic would not be significant.
This means that the p-value given at the bottom of the normality test
screen should be greater than 0.05 for the null of normality not to
be rejected at the 5% level.
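The test statistic can be computed directly from the residuals. The sketch below assumes the usual form of the statistic, W = T[b1²/6 + (b2 − 3)²/24], where b1 is the coefficient of skewness and b2 the coefficient of kurtosis, compared with a χ²(2) distribution under the null of normality. This formula is not written out on the slide. The function name and both data sets are hypothetical.

```python
import math


def bera_jarque(residuals):
    """Return (skewness, kurtosis, test statistic, asymptotic p-value)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [u - mean for u in residuals]
    m2 = sum(d ** 2 for d in dev) / n   # second central moment (variance)
    m3 = sum(d ** 3 for d in dev) / n   # third central moment
    m4 = sum(d ** 4 for d in dev) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    stat = n * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)
    # For a chi-squared distribution with 2 degrees of freedom,
    # the upper-tail probability is exactly exp(-stat / 2).
    p_value = math.exp(-stat / 2)
    return skew, kurt, stat, p_value


# Hypothetical residuals: a symmetric series, then the same series
# with one extreme observation appended.
clean = [-1.0, -0.5, 0.0, 0.5, 1.0] * 4
with_outlier = clean + [8.0]

for label, data in (("clean", clean), ("with outlier", with_outlier)):
    skew, kurt, stat, p = bera_jarque(data)
    print(label, "skew=%.2f kurt=%.2f stat=%.2f p=%.4f" % (skew, kurt, stat, p))
```

The χ²(2) reference distribution is an asymptotic result, so p-values from samples as small as these should be read with caution.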
…trend
• What should be done if evidence of non-normality is found?
In economic or financial modelling, it is quite often the case that one or
two very extreme residuals cause a rejection of the normality
assumption. Such observations would appear in the tails of the
distribution, and would therefore cause û⁴ (the fourth power of the
residual), which enters into the definition of kurtosis, to be very large.
Such observations that do not fit in with the pattern of the remainder of
the data are known as outliers.
• If this is the case, one way to improve the chances of error normality
is to use dummy variables or some other method to effectively
remove those observations.
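The effect of a dummy variable on an outlier can be seen in a small example. The sketch below assumes NumPy is available, and the data, the position of the extreme observation, and the helper name are hypothetical. A dummy that equals one for a single observation and zero elsewhere forces that observation's residual to zero, which is the sense in which the observation is effectively removed.

```python
import numpy as np

# Hypothetical data: a linear relationship with small disturbances...
x = np.arange(10, dtype=float)
noise = np.array([0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, -0.1])
y = 2.0 + 0.5 * x + noise
# ...plus one extreme observation at index 6.
y[6] += 5.0

# Dummy variable: 1 for the outlying observation, 0 otherwise.
dummy = np.zeros(10)
dummy[6] = 1.0


def ols(y, X):
    """Return OLS coefficients and residuals for y regressed on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


X_plain = np.column_stack([np.ones(10), x])
X_dummy = np.column_stack([np.ones(10), x, dummy])

b_plain, u_plain = ols(y, X_plain)
b_dummy, u_dummy = ols(y, X_dummy)

print("residual at outlier, no dummy:  ", round(float(u_plain[6]), 3))
print("residual at outlier, with dummy:", round(float(u_dummy[6]), 3))
print("dummy coefficient:              ", round(float(b_dummy[2]), 3))
```

The dummy coefficient measures how far the extreme observation lies from the line fitted to the remaining data.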
Assignment
• Multicollinearity