1. Cross Sectional Econometrics - 2017

The document provides a comprehensive overview of basic econometrics, emphasizing its application in analyzing cross-sectional data. It covers key topics such as model specification, estimation methods, hypothesis testing, and the use of linear regression with dummy variables. The author highlights the historical development of econometrics and its relationship with economic theory, statistics, and mathematics, illustrating the importance of econometric analysis in economic policymaking.

Revision on Basic Econometrics with a Focus on Cross-Sectional Data

Zerayehu Sime Eshete (PhD), Email: [Link]@[Link]
PhD Program, Economics Department, Addis Ababa University
Outline

1. Understanding Econometrics
2. Model Specification
3. Estimation Methods
4. Hypothesis Testing
5. Linear Regression with Dummy Variables

1. Understanding Econometrics
• The history of econometrics traces back to the late nineteenth century.
• Jan Tinbergen and Ragnar Frisch are the two founding fathers of econometrics.
• Traditionally, econometrics focused on aggregate economic relationships.
• Since the 1970s, econometric methods have increasingly been employed in micro-economic models describing individual, household, or firm behaviour (Verbeek 2004).
Ragnar Frisch, along with Jan Tinbergen, pioneered the development of mathematical formulations of economics. He coined the term econometrics for studies in which he used statistical methods to describe economic systems. He is best known for his contributions to dynamic economic modeling, and in 1933 he presented the first mathematical economic model that could describe fluctuations in the business cycle. His later work concerned models for economic planning.

Jan Tinbergen, a Dutch economist, is noted for his development of econometric models. He was the co-winner (with Ragnar Frisch) of the first Nobel Prize for Economics, in 1969. Because of the political nature of his economic analyses, Tinbergen was one of the first to show that a government with multiple policy objectives must be able to draw on multiple economic policy tools to achieve the desired results. Among his major works are Statistical Testing of Business Cycles (1938), Econometrics (1942), Economic Policy (1956), and Income Distribution (1975).
• Econometrics uses economic theory, mathematics, and
statistical inference to quantify economic phenomena.
In other words, it turns theoretical economic models
into useful tools for economic policymaking.
• The objective of econometrics is to convert qualitative
statements into quantitative statements.
• As Stock and Watson (2007) put it, “econometric
methods are used in many branches of economics,
including finance, labor economics, macroeconomics,
microeconomics, and economic policy.” Economic
policy decisions are rarely made without econometric
analysis to assess their impact.
• Consequently, econometrics is the interaction of
economic theory, observed data and statistical
methods. It is the interaction of these three that
makes econometrics interesting, challenging and,
perhaps, difficult.

Why Econometrics? Why combine them?

Econometrics complements Economics
• While the boundaries between economics and econometrics may overlap, there are certainly some key differences between the two that you should be aware of. While economics covers qualitative relationships, econometrics focuses almost purely on quantitative studies.
• Economic theory makes statements or hypotheses that are mostly qualitative in nature (e.g., the law of demand); the law does not provide any numerical measure of the relationship. That is the job of the econometrician.

Econometrics complements Statistics
• Statistics is about analyzing data; econometrics is the application of statistical methods to economic data. Statisticians use sampling to make statistical inferences about large populations. Econometricians, on the other hand, examine counterfactuals to make causal inferences. Econometrics is the application of statistics to the study of economic and financial data, while statistics is much broader and is a branch of applied mathematics.
• Economic statistics is mainly concerned with collecting, processing, and presenting economic data in the form of charts and tables (descriptive analysis). It does not go further than that; going further is the job of the econometrician.

Econometrics complements Mathematics
• Econometrics and mathematical economics are fields within economics that involve the quantification of economic theories, but they are different. Econometrics deals with the use of statistical and mathematical tools to analyze trends and predict future outcomes. Mathematical economics involves the application of mathematical models in the analysis of economic concepts.
• The main concern of mathematical economics is to express economic theory in mathematical form without regard to measurability or empirical verification of the theory. Econometrics is mainly interested in the empirical verification of economic theory.
How does Mathematics work?

• A mathematical model establishes an exact linear relationship between Y and X: Ŷ = â + b̂X.
• The slope-intercept form of a straight line is one of the most common forms used to represent the equation of a line. The slope-intercept formula can be used to find the equation of a line when given the slope of the straight line and the y-intercept (the y-coordinate of the point where the line intersects the y-axis).
• The equation of a line is the equation that is satisfied by each point that lies on that line.

 i    Y    X
 A    0    1
 B    3    2
 C    6    3
 D    9    4
 E   12    5
 F   15    6
 G   18    7
 H   21    8
 I   24    9
 J   27   10

b̂ = ΔY/ΔX = 3,   â = Ȳ − b̂X̄ = 13.5 − 3(5.5) = −3,   so Ŷ = −3 + 3X.

N.B.: The slope is constant (= 3), indicating that the effect of a single-unit change in X on Y is constant. So the model or equation is called a linear model, presenting a straight line graphically. The line passes through every paired dot, so the mathematical model is an exact science (dealing with an exact relationship between Y and X).
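The exactness of the mathematical model can be checked directly: every pair in the table satisfies Ŷ = −3 + 3X. A minimal Python sketch (data and coefficients taken from the table above, no libraries needed):

```python
# Data from the table: an exact linear relationship between Y and X
Y = [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Slope from any two points (constant, because the model is linear)
b = (Y[1] - Y[0]) / (X[1] - X[0])          # = 3.0
# Intercept from the sample means: a = Ybar - b * Xbar
a = sum(Y) / len(Y) - b * sum(X) / len(X)  # = 13.5 - 3 * 5.5 = -3.0

# Every point lies exactly on the line: the model is deterministic
assert all(y == a + b * x for x, y in zip(X, Y))
print(a, b)  # -3.0 3.0
```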
• But this mathematical model does not work with real data, since reality is not exact. So how can we develop a linear relationship between Y and X and compute the slope and intercept?
• It is possible to compute them by the regression (econometric) method. It is real-data modelling, an empirical model. Let us take a simple regression model to simplify the procedure:

Y = â + b̂X + e,   where e is the error term,
b̂ = Cov(Y, X) / Var(X),
â = Ȳ − b̂X̄.

• For multiple regression this formula does not work, so we need a more general model.
• Regression analysis is based upon a functional relationship among variables and, further, assumes that the relationship is linear.
• This linearity assumption is required because, for the most part, the theoretical statistical properties of non-linear estimation are not yet well worked out by mathematicians and econometricians.
• This presents us with some difficulties in economic analysis, because many of our theoretical models are nonlinear.
• Econometric Model: applying the formulas to real data (the worked example):

  i     Y    X   (Y-Ȳ)  (Y-Ȳ)²  (X-X̄)  (X-X̄)²  (Y-Ȳ)(X-X̄)    Ŷ      e
  A    15   17    1.5    2.25    3.6   12.96      5.4      13.20   1.80
  B    23   10    9.5   90.25   -3.4   11.56    -32.3      13.78   9.22
  C    12   26   -1.5    2.25   12.6  158.76    -18.9      12.44  -0.44
  D     7    7   -6.5   42.25   -6.4   40.96     41.6      14.03  -7.03
  E    10   11   -3.5   12.25   -2.4    5.76      8.4      13.70  -3.70
  F    14    5    0.5    0.25   -8.4   70.56     -4.2      14.20  -0.20
  G    21   16    7.5   56.25    2.6    6.76     19.5      13.28   7.72
  H     6   22   -7.5   56.25    8.6   73.96    -64.5      12.77  -6.77
  I    19   11    5.5   30.25   -2.4    5.76    -13.2      13.70   5.30
  J     8    9   -5.5   30.25   -4.4   19.36     24.2      13.86  -5.86
 Sum  135  134    0    322.5     0    406.4    -34.0      135.0    0.0
 Mean 13.5 13.4   0     32.25    0     40.64    -3.4       13.5    0.0

Ȳ = ΣY/n = 13.5,   X̄ = ΣX/n = 13.4
Var(Y) = Σ(Y − Ȳ)²/n = 32.25,   Var(X) = Σ(X − X̄)²/n = 40.64,   Cov(Y, X) = Σ(Y − Ȳ)(X − X̄)/n = −3.4

b = Cov(Y, X) / Var(X) = −0.0836614
a = Ȳ − bX̄ = 14.62106
Ŷ = a + bX = 14.62106 − 0.0836614X
e = Y − Ŷ
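The column sums above translate directly into code. A minimal Python sketch reproducing the worked example (data taken from the table; moments divided by n, as in the table):

```python
# Data from the worked example above
Y = [15, 23, 12, 7, 10, 14, 21, 6, 19, 8]
X = [17, 10, 26, 7, 11, 5, 16, 22, 11, 9]
n = len(Y)

Ybar = sum(Y) / n            # 13.5
Xbar = sum(X) / n            # 13.4

# Population-style moments, as in the table (divide by n)
cov_yx = sum((y - Ybar) * (x - Xbar) for y, x in zip(Y, X)) / n   # -3.4
var_x  = sum((x - Xbar) ** 2 for x in X) / n                      # 40.64

b = cov_yx / var_x           # slope
a = Ybar - b * Xbar          # intercept

print(round(b, 7), round(a, 5))  # -0.0836614 14.62106
```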
        Y     X    Yhat      e
       15    17   13.19   1.81
       23    10   13.78   9.22
       12    26   12.44  -0.44
        7     7   14.03  -7.03
       10    11   13.70  -3.70
       14     5   14.20  -0.20
       21    16   13.28   7.72
        6    22   12.77  -6.77
       19    11   13.70   5.30
        8     9   13.86  -5.86
 Sum  135   134   134.9    0.0
 Mean 13.5  13.4   13.5    0.0

• Plotting the actual Y against X, and Ŷ against X, we find a scatter plot and a line graph, respectively. [Figure: scatter of actual Y and fitted line Ŷ against X]
• The mean value of the actual and fitted values of Y is always equal.
• The sum and mean of the error terms are zero.
• "Error", or residual: it is not an error in the sense of a mistake. The error term is put into the estimating equation to capture missing variables and errors of measurement that may have occurred in the dependent variable.
• The absolute value of a residual measures the vertical distance between the actual value of y and the estimated value of y.
• In other words, it measures the vertical distance between the actual data point and the predicted point on the line, as can be seen on the graph at point X0.
• Econometrics / regression analysis is an inexact science (an inexact relationship between Y and X).
Properties of the regression line
• The line passes through the sample means of Y and X. So we call it a mean regression.
• The expected value / mean value of Y is equal to Ŷ.
• The mean value of the residuals eᵢ is zero.
• The residuals are uncorrelated with Ŷ.
• The residuals are uncorrelated with Xᵢ.
• The line minimizes the sum of squared differences between observed values and predicted values.
• The regression constant (b₀) is equal to the y-intercept of the linear regression.
• The regression coefficient (b₁) is the slope of the regression line, which is equal to the average change in the dependent variable (Y) for a unit change in the independent variable (X).

Y = β₀ + β₁X + e
E[Y] = E[β₀ + β₁X + e] = β₀ + β₁X = Ŷ
E[Ŷe] = 0 = ΣŶe
E[Xe] = 0 = ΣXe

A mathematical model is a deterministic model, whereas an econometric model is a stochastic model.
• To generalize, a regression based on a sample can be expressed as follows:

Y = β̂₀ + β̂₁X₁ + β̂₂X₂ + ... + β̂ₖXₖ + e,   i.e.,   Y = Ŷ + e

• The hat on the Y indicates that it is an estimate. We say "on average" because the relationship between Y and X is inexact: not all the data points lie exactly on the regression line.
• The error term is a surrogate or proxy for all the omitted or neglected variables that may affect Y but are not (or cannot be) included in the regression model.

Common names for the components:
Y:  Dependent variable, Explained variable, Predictand, Regressand, Response, Endogenous, Outcome
X:  Explanatory / Independent variable, Predictor, Regressor, Stimulus, Exogenous, Covariate
Ŷ:  Fitted / estimated value, the systematic or deterministic component
e:  Error term, Random / Stochastic component, Disturbance, Residual, Unexplained, White Noise, Shocks

The error term captures the following issues:
✓ Vagueness of theory, so that the specified behavior of Y may be incomplete.
✓ Unavailability of data where there is no quantitative information.
✓ Stronger interest in core variables than in peripheral variables.
✓ Intrinsic randomness in human behavior.
✓ Gaps between proxy and actual variables, leading to a problem of errors of measurement.
✓ Unknown functional relationship between Y and X.
• To sum up, we infer about population parameters based on sample statistics:

Sample regression:      Y = β̂₀ + β̂₁X₁ + β̂₂X₂ + ... + β̂ₖXₖ + e
Population regression:  Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε

A sample coefficient is:                    A population coefficient is:
• a statistic                               • a parameter
• estimable                                 • the true value
• varying with the sample                   • not varying with the sample (constant)
• computed from observed sample data        • a hypothetical concept, never observed directly

Statistical inference is the attempt to make a statement about a population using only sample data, that is, a subset of that population. The goal of statistical inference is to use sample data to estimate a parameter (a numerical characteristic of the population) or to determine whether to believe a claim that has been made about the population. We never actually observe the parameter we are interested in; instead, we use an estimate of the parameter based on data from a sample.
• There are different types of statistical inference that are extensively used for drawing conclusions:
a) One-sample hypothesis testing
b) Confidence intervals
c) Pearson correlation
d) Bivariate regression
e) Multivariate regression
f) Chi-square statistics and contingency tables
g) ANOVA or t-tests

Summary
• The primary objective of correlation analysis is to measure the strength or degree of linear association between two variables. It is symmetric: Corr(X, Y) = Corr(Y, X).
• In regression analysis there is an asymmetry in the way the dependent and explanatory variables are treated.
• The dependent variable is assumed to be statistical, random, or stochastic, that is, to have a probability distribution.
• The explanatory variables, on the other hand, are assumed to have fixed values.
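The symmetry of correlation versus the asymmetry of regression can be seen in a few lines of Python (a sketch using the small dataset from the earlier worked example):

```python
Y = [15, 23, 12, 7, 10, 14, 21, 6, 19, 8]
X = [17, 10, 26, 7, 11, 5, 16, 22, 11, 9]
n = len(Y)
Ybar, Xbar = sum(Y) / n, sum(X) / n

cov   = sum((y - Ybar) * (x - Xbar) for y, x in zip(Y, X)) / n
var_y = sum((y - Ybar) ** 2 for y in Y) / n
var_x = sum((x - Xbar) ** 2 for x in X) / n

# Correlation is symmetric: Corr(X, Y) == Corr(Y, X) by construction
corr = cov / (var_y * var_x) ** 0.5

# Regression is asymmetric: the slope of Y on X differs from the slope of X on Y
slope_y_on_x = cov / var_x
slope_x_on_y = cov / var_y
assert abs(slope_y_on_x - slope_x_on_y) > 1e-6
# ...but the product of the two slopes equals the squared correlation
assert abs(slope_y_on_x * slope_x_on_y - corr ** 2) < 1e-9
```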
2. Model Specification

Model Specification in Econometrics
• Model specification is the process of determining which independent variables to include in, and exclude from, a regression equation. How do you choose the best regression model?
• Model selection is a crucial process in statistics. If you do not select the correct model, you have made a specification error, which can invalidate your results.
• Specification error occurs when the chosen independent variables and/or their functional form (i.e., curvature and interactions) inaccurately portray the real relationship present in the data.
• Specification error can cause bias, which can exaggerate, understate, or entirely hide the presence of underlying relationships.

Econometric Methodology: It has 3 parts

Model Specification:
1. Statement of theory or hypothesis
2. Specification of the mathematical model of the theory
3. Specification of the statistical, or econometric, model
Estimation Process:
4. Obtaining the data
5. Estimation of the parameters of the econometric model
Diagnostic Tests:
6. Hypothesis testing
7. Forecasting or prediction
8. Using the model for control or policy purposes
• The analyst needs to reach a Goldilocks balance by including the correct number of independent variables in the regression equation:
❖ Too few: underspecified models tend to be biased.
❖ Too many: overspecified models tend to be less precise.
❖ Just right: models with the correct terms are unbiased and the most precise.
• In addition, your regression equation should contain both the interest variables and the control variables that affect the dependent variable, to avoid biased results.

The Goldilocks principle is named by analogy to the children's story "The Three Bears", in which a young girl named Goldilocks tastes three different bowls of porridge and finds she prefers porridge that is neither too hot nor too cold, but has just the right temperature. In our case, it balances unbiasedness and precision.
• Some of the controlling mechanisms check the specification before estimation; others are used to test it after estimation.
• In checking the specification, we need to address the following:
❖ The model does not exclude a core or relevant variable. E.g., excluding labour from a model of Ethiopian GDP.
❖ The model does not include irrelevant or superfluous variables. E.g., including the Ebola dissemination rate in a model of Ethiopian GDP.
❖ The functional form of the model is suitably chosen. E.g., the effect of consumption on utility cannot be captured by a linear model.
❖ Do not regress a variable on its own components. E.g., regressing aggregate demand on consumption, investment, government expenditure, and net exports.
• Example: Suppose that you are assigned to examine the effect of foreign grants on GDP. Here you have two variables (GDP and foreign grants) as the dependent and independent (interest) variables.
• But the point is that it is not only foreign grants that influence GDP; there are some other variables (control variables). The question is how you choose them as determinant factors.
• Both theory and empirics are important in this regard. Let us base the specification on the Lucas growth model:

Y = f(Labour, Capital, Human Capital) .............................. (1)

• It is possible to substitute investment for capital, replacing a stock concept with a flow concept so that all the data are annual:

Y = f(Labour, Investment, Human Capital) ........................... (2)
• From the neoclassical perspective, what we save is equal to what we invest. We can then replace investment by saving, but this does not reflect the African context: in Africa, investment exceeds saving, which calls for foreign aid. Foreign aid can in turn be decomposed into two parts (foreign loans and foreign grants) based on the interest rate. Thus,

Y = f(Labour, Saving, Foreign Grant, Foreign Loan, Human Capital) .............. (3)

• In the same manner, human capital can be captured by education and health. How do we measure them? Expenditure on education and health can serve as proxy variables. Thus,

Y = f(Labour, Saving, Foreign Grant, Foreign Loan, Health Exp, Education Exp) .. (4)
3. Estimation Methods

Data and Estimation Methods
• Once we define this general specification, we can proceed to establish the mathematical and econometric models as follows:

Mathematical:   GDP = β₀ + β₁Lab + β₂Sav + β₃Fg + β₄Fl + β₅He + β₆Ee
Econometric:    GDP = β₀ + β₁Lab + β₂Sav + β₃Fg + β₄Fl + β₅He + β₆Ee + e ..........(5)

• Then we set the variable definitions and expected signs as follows:

No.  Variable                Definition and Measurement   Type   Expected sign   Sources from Literature
1    GDP                     NA                                  NA
2    Labour
3    Saving
4    Foreign Loan
5    Foreign Grant
6    Health Expenditure
7    Education Expenditure
• Estimating the coefficients of the linear model can be done by the OLS, MLE, and MM methods.
❖ The sign of each coefficient indicates the direction of the relationship between a predictor variable and the response variable.
❖ The coefficient value represents the mean change in the response given a one-unit change in the predictor.
• First, we discuss estimation by Ordinary Least Squares (OLS), which minimizes the residual sum of squares. This yields the famous Gauss estimator.
• Second, we derive estimates of the regression coefficients using Maximum Likelihood Estimation (MLE), assuming normal errors. This also leads to the Gauss estimator.
• Finally, we derive the Method of Moments (MM) estimator.

Finally, we must note that there are other methods for determining the regression line, preferred in different contexts. Examples are generalized least squares, maximum likelihood estimation, Bayesian regression, kernel regression, and Gaussian process regression.
Ordinary Least Squares (OLS) estimator of regression coefficients

• Now we show the classic way (Gauss 1809; Legendre 1805) to estimate regression coefficients by the method of ordinary least squares (OLS).
• Goal: choose the regression coefficients so as to minimize the squared error between the observations and the prediction.
• Why the sum of squares? Because the summation of the error terms gives us zero; to escape this trap, we use the squared errors.
• It is possible to present this in two approaches (the summation approach and the matrix approach).
• Steps:
1. Solve for the error term of the regression equation
2. Square both sides
3. Insert the summation
4. Apply the first-order condition
5. Solve for the coefficients
(A) Estimation using the Summation Approach

• To get the optimal estimates of the coefficients, we apply the First-Order Condition of the minimization problem. Let us take a multiple regression with two X's (all sums run over i = 1, …, n):

Y = β̂₀ + β̂₁X₁ + β̂₂X₂ + e
e = Y − (β̂₀ + β̂₁X₁ + β̂₂X₂)
Σe² = Σ[Y − (β̂₀ + β̂₁X₁ + β̂₂X₂)]²

First-order condition with respect to β̂₀:

∂(Σe²)/∂β̂₀ = Σ 2[Y − (β̂₀ + β̂₁X₁ + β̂₂X₂)](−1) = 0
⇒ Σ[Y − β̂₀ − β̂₁X₁ − β̂₂X₂] = 0
⇒ ΣY − nβ̂₀ − β̂₁ΣX₁ − β̂₂ΣX₂ = 0
⇒ ΣY = nβ̂₀ + β̂₁ΣX₁ + β̂₂ΣX₂ ...(1)
First-order condition with respect to β̂₁:

∂(Σe²)/∂β̂₁ = Σ 2[Y − (β̂₀ + β̂₁X₁ + β̂₂X₂)](−X₁) = 0
⇒ Σ[YX₁ − β̂₀X₁ − β̂₁X₁² − β̂₂X₁X₂] = 0
⇒ ΣYX₁ = β̂₀ΣX₁ + β̂₁ΣX₁² + β̂₂ΣX₁X₂ ...(2)

First-order condition with respect to β̂₂:

∂(Σe²)/∂β̂₂ = Σ 2[Y − (β̂₀ + β̂₁X₁ + β̂₂X₂)](−X₂) = 0
⇒ Σ[YX₂ − β̂₀X₂ − β̂₁X₁X₂ − β̂₂X₂²] = 0
⇒ ΣYX₂ = β̂₀ΣX₂ + β̂₁ΣX₁X₂ + β̂₂ΣX₂² ...(3)
Combining the three equations:

ΣY   = nβ̂₀   + β̂₁ΣX₁   + β̂₂ΣX₂    ...(1)
ΣYX₁ = β̂₀ΣX₁ + β̂₁ΣX₁²  + β̂₂ΣX₁X₂  ...(2)
ΣYX₂ = β̂₀ΣX₂ + β̂₁ΣX₁X₂ + β̂₂ΣX₂²   ...(3)

In matrix form:

[ ΣY   ]   [ n     ΣX₁     ΣX₂   ] [ β̂₀ ]
[ ΣYX₁ ] = [ ΣX₁   ΣX₁²    ΣX₁X₂ ] [ β̂₁ ]
[ ΣYX₂ ]   [ ΣX₂   ΣX₁X₂   ΣX₂²  ] [ β̂₂ ]

[ β̂₀ ]   [ n     ΣX₁     ΣX₂   ]⁻¹ [ ΣY   ]
[ β̂₁ ] = [ ΣX₁   ΣX₁²    ΣX₁X₂ ]   [ ΣYX₁ ]
[ β̂₂ ]   [ ΣX₂   ΣX₁X₂   ΣX₂²  ]   [ ΣYX₂ ]

That is, B̂ = (X′X)⁻¹(X′Y). Note that X₀ is a column of 1's, and that X′X is a symmetric matrix.
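The normal equations and their matrix solution B̂ = (X′X)⁻¹(X′Y) can be sketched with NumPy. The dataset below is hypothetical and noise-free (Y = 2 + 3X₁ − 1.5X₂ with no error term), so OLS must recover the coefficients exactly:

```python
import numpy as np

# Hypothetical noise-free data: Y = 2 + 3*X1 - 1.5*X2
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([5.0, 3.0, 8.0, 1.0, 7.0, 2.0])
Y = 2 + 3 * X1 - 1.5 * X2

# Design matrix: first column of 1's (X0), then the regressors
X = np.column_stack([np.ones_like(X1), X1, X2])

# Normal equations: (X'X) B = X'Y  ->  B = (X'X)^(-1) (X'Y)
XtX = X.T @ X           # symmetric 3x3 matrix
XtY = X.T @ Y
B = np.linalg.solve(XtX, XtY)

print(B)  # approximately [ 2.   3.  -1.5]
```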
(B) Estimation using the Matrix Approach

Y = β̂₀ + β̂₁X₁ + β̂₂X₂ + e = Xβ̂ + e
e = Y − Xβ̂

RSS = Σe² = (Y − Xβ̂)′(Y − Xβ̂)
    = Y′Y − Y′Xβ̂ − β̂′X′Y + β̂′X′Xβ̂
    = Y′Y − 2β̂′X′Y + β̂′X′Xβ̂

∂RSS/∂β̂ = −2X′Y + 2X′Xβ̂ = 0
⇒ X′Y = X′Xβ̂
⇒ β̂_OLS = (X′X)⁻¹(X′Y)

Summary:
• The OLS estimators are expressed solely in terms of the observable quantities (i.e., X and Y).
• They are point estimators: given the sample, each estimator provides only a single (point) value of the relevant population parameter.
• We will later consider the so-called interval estimators, which provide a range of possible values for the unknown population parameters.
Example:
• Examining the determinants of SAT scores
• Get data from Lawrence C. Hamilton (chapter 6).
• Use the file [Link] from [Link]
. describe csat expense percent income high college region . sum csat expense percent income high college region

storage display value Variable Obs Mean Std. Dev. Min Max
variable name type format label variable label
csat 51 944.098 66.93497 832 1093
csat int %9.0g Mean composite SAT score expense 51 5235.961 1401.155 2960 9259
expense int %9.0g Per pupil expenditures prim&sec percent 51 35.76471 26.19281 4 81
percent byte %9.0g % HS graduates taking SAT income 51 33.95657 6.423134 23.465 48.618
income double %10.0g Median household income, $1,000 high 51 76.26078 5.588741 64.3 86.6
high float %9.0g % adults HS diploma
college float %9.0g % adults college degree college 51 20.02157 4.16578 12.3 33.3
region byte %9.0g region Geographical region region 50 2.54 1.128662 1 4

• Using OLS, we have the following results:

. reg csat expense percent income high college [Link]

      Source        SS        df       MS            Number of obs =      50
                                                     F(8, 41)      =   52.51
       Model   194023.719      8  24252.9649         Prob > F      =  0.0000
    Residual   18937.6605     41  461.894159         R-squared     =  0.9111
                                                     Adj R-squared =  0.8937
       Total    212961.38     49  4346.15061         Root MSE      =  21.492

        csat       Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

     expense    -.002021     .00424    -0.48   0.636    -.0105839    .0065419
     percent   -3.007647   .2328838   -12.91   0.000    -3.477965   -2.537329
      income   -.1674421   1.035771    -0.16   0.872    -2.259224    1.924339
        high    1.814731   1.184555     1.53   0.133    -.5775255    4.206988
     college    4.670564   1.708108     2.73   0.009     1.220969    8.120159

      region
     N. East    69.45333   14.95479     4.64   0.000     39.25151    99.65514
       South    25.39701   13.32343     1.91   0.064    -1.510213    52.30423
     Midwest    34.57704     9.5368     3.63   0.001     15.31709      53.837
       _cons    808.0206   79.79478    10.13   0.000     646.8718    969.1694

Annotations: the upper-left block is the Analysis of Variance (ANOVA) table; the upper-right block reports measures of goodness of fit of the model to the data; the lower block is the estimated regression equation with the associated tests.
Maximum Likelihood Estimation (MLE)
• As its name suggests, maximum likelihood estimation involves finding the value of the parameter that maximizes the likelihood function (or, equivalently, maximizes the log-likelihood function). This value is called the maximum likelihood estimate, or MLE.
• The method of maximum likelihood (ML) is a method of point estimation with some stronger theoretical properties than the method of OLS.
• Following the properties of the error term, the Yᵢ are normally and independently distributed with mean β₁ + β₂Xᵢ (in matrix form, Xβ̂) and variance σ²:

Y = Xβ̂ + e
E(Y) = Xβ̂,   Var(Y) = σ² = Σ(Y − Ŷ)²/n = Σ(Y − Xβ̂)²/n = Σe²/n
Y ~ N(Xβ̂, σ²)

• As a result, the joint probability density function of Y₁, Y₂, . . . , Yₙ, given the preceding mean and variance, can be written as

f(Y₁, Y₂, . . . , Yₙ | Xβ̂, σ²)
• But in view of the independence of the Y's, this joint probability density function can be written as a product of n individual density functions:

f(Y₁, Y₂, . . . , Yₙ | Xβ̂, σ²)
  = f(Y₁ | Xβ̂, σ²) f(Y₂ | Xβ̂, σ²) ... f(Yₙ | Xβ̂, σ²) ................(1)

f(Yᵢ) = (1 / (σ√(2π))) exp{ −(1/2) [(Y − Xβ̂)/σ]² } ..................(2)

• which is the density function of a normally distributed variable with the given mean and variance. Substituting (2) for each Yᵢ into (1) gives

f(Y₁, Y₂, ..., Yₙ) = (1 / (σⁿ (√(2π))ⁿ)) exp{ −(1/2) Σ[(Y − Xβ̂)/σ]² } .......(3)
• If Y₁, Y₂, . . . , Yₙ are known or given, but the β's and σ² are not known, the function in (3) is called a likelihood function, denoted by LF(β's, σ²):

LF(β's, σ²) = (1 / (σⁿ (√(2π))ⁿ)) exp{ −(1/2) Σ[(Y − Xβ̂)/σ]² } .......(4)

Taking logs:

ln LF(β | Y, X) = −n ln σ − (n/2) ln(2π) − (1/2) Σ[(Y − Xβ̂)/σ]²
ln LF(β | Y, X) = −(n/2) ln σ² − (n/2) ln(2π) − (1/(2σ²)) Σ(Y − Xβ̂)² .............(5)
• Differentiating (5) partially with respect to β1, β2, and σ 2, we obtain
ln LF = −(n/2) ln σ² − (n/2) ln(2π) − (1/(2σ²)) Σ(Yi − Xiβ)² ................(5)

In matrix form, expanding the sum of squares:

ln LF = −(n/2) ln σ² − (n/2) ln(2π) − (1/(2σ²))[Y′Y − Y′Xβ − β′X′Y + β′X′Xβ]

∂ln LF/∂β = −(1/(2σ̂²))[−2X′Y + 2X′Xβ̂] = 0   →   β̂_MLE = (X′X)⁻¹X′Y

∂ln LF/∂σ² = −n/(2σ̂²) + (1/(2σ̂⁴)) Σ(Yi − Xiβ̂)² = 0

• The first-order conditions for the β's are precisely the normal equations of least-squares theory obtained in OLS. Therefore, the ML estimators of the coefficients, the β̃'s, are the same as the OLS estimators, the β̂'s.

• Examining the last likelihood, we see that the last term enters with a
negative sign. Therefore, maximizing amounts to minimizing this term,
which is precisely the least-squares approach.
∂ln LF/∂σ² = −n/(2σ̃²) + (1/(2σ̃⁴)) Σ(Yi − Xiβ̂)² = 0

Multiplying through by 2σ̃²:

−n + (1/σ̃²) Σ(Yi − Xiβ̂)² = 0   →   n = (1/σ̃²) Σ(Yi − Xiβ̂)²

σ̃² = Σ(Yi − Xiβ̂)²/n = Σei²/n = e′e/n

From this, it is obvious that the ML estimator σ̃² differs from the OLS estimator σ̂² = RSS/(n − k), which was shown to be an unbiased estimator of σ².

• Thus, the ML estimator of σ² is biased. The magnitude of this bias can be easily determined. Taking the mathematical expectation on both sides of

σ̃² = Σei²/n

and using E(Σei²) = (n − k)σ², we obtain

E(σ̃²) = (1/n) E(Σei²) = [(n − k)/n] σ² = σ² − (k/n)σ²

(in the two-variable model, k = 2, so E(σ̃²) = σ² − (2/n)σ²).

• This shows that σ̃² is biased downward (i.e., it underestimates the true σ²) in small samples.
• But notice that as n, the sample size, increases indefinitely, the second term, the bias factor, tends to zero.
• Therefore, asymptotically (i.e., in a very large sample), σ̃² is unbiased too; that is, lim E(σ̃²) = σ² as n → ∞.
• It can further be proved that σ̃² is also a consistent estimator; that is, as n increases indefinitely, σ̃² converges to its true value σ².
• Using MLE, we have the following result based on these Stata commands:

  ml model lf lfols (xb: csat = expense percent income high college [Link]) (lnsigma:)
  ml maximize
  display exp([lnsigma]_cons)

Iteration 5:   log likelihood = -249.92494  (not concave)
Iteration 6:   log likelihood = -242.76569  (not concave)
Iteration 7:   log likelihood = -237.31405  (not concave)
Iteration 8:   log likelihood = -234.3235
Iteration 9:   log likelihood = -226.66329
Iteration 10:  log likelihood = -219.40916
Iteration 11:  log likelihood = -219.3691
Iteration 12:  log likelihood = -219.36905
Iteration 13:  log likelihood = -219.36905

Number of obs = 50,  Log likelihood = -219.36905,  Wald chi2(8) = 512.27,  Prob > chi2 = 0.0000

  csat     |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
  ---------+----------------------------------------------------------------
  expense  |  -.002021   .0038395   -0.53   0.599    -.0095463    .0055043
  percent  | -3.007647   .2108853  -14.26   0.000    -3.420975   -2.594319
  income   | -.1674421   .9379304   -0.18   0.858    -2.005752    1.670868
  high     |  1.814731    1.07266    1.69   0.091    -.2876436    3.917106
  college  |  4.670564   1.546758    3.02   0.003     1.638974    7.702154
  region   |
   N. East |  69.45333   13.54214    5.13   0.000     42.91122    95.99543
   South   |  25.39701   12.06488    2.11   0.035     1.750272    49.04375
   Midwest |  34.57704    8.63594    4.00   0.000     17.65091    51.50318
  _cons    |  808.0206   72.25725   11.18   0.000      666.399    949.6422
  lnsigma  |
   _cons   |  2.968442         .1   29.68   0.000     2.772446    3.164439
Method of Moments
• This is the 3rd method of estimation (there are simple and generalized methods of moments).
• Let's focus on the simple one, using the first and second sample moments.
• It is based on imposing two moment conditions:
Y = ˆ0 + ˆ1 X 1 + e  Y = ˆ n + ˆ  X ................(1)
0 1 1

M 1:  e = 0 =  Y − ˆ0 − ˆ1 X 1   YX = ˆ  X + ˆ  X ........(2)


0 1 1 1
2

 Y − ˆ − ˆ X  = 0
0 1 1
 Y  n
 =
 X   ˆ 
 
1 0

  YX    X  X   ˆ 
2
 Y = ˆ n + ˆ  X ................(1)
0 1 1
1 1 1
−1
M 2 :  eX = 0 =  Y − ˆ − ˆ X  X
1 0 1 1 1
 ˆ   n
0
 =
 X   Y 
 
1

     X  X    YX 
ˆ 2

 Y − ˆ − ˆ X  X = 0
0 1 1 1
1 1 1

Bols = ( X ' X ) −1 ( X 'Y )


 YX = ˆ  X + ˆ  X ........(2)
0 1 1 1
2
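A sketch (with made-up data) of solving the two moment equations directly and confirming that they reproduce the OLS coefficients:

```python
import numpy as np

# Sketch (made-up data): solving the two sample moment conditions
# sum(e) = 0 and sum(e*x) = 0 reproduces the OLS intercept and slope.
rng = np.random.default_rng(2)
n = 100
x = rng.uniform(0, 10, n)
y = 4.0 + 0.8 * x + rng.normal(0, 1.0, n)

# Moment equations: [n, sum(x); sum(x), sum(x^2)] b = [sum(y); sum(x*y)]
A = np.array([[n, x.sum()], [x.sum(), (x ** 2).sum()]])
m = np.array([y.sum(), (x * y).sum()])
b_mm = np.linalg.solve(A, m)

# OLS on the design matrix [1, x] for comparison
X = np.column_stack([np.ones(n), x])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

assert np.allclose(b_mm, b_ols)
```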

GMM estimation

• OLS is an MM estimator. The Stata command is as follows:

  gmm (csat - {xb: expense percent income high college [Link] _cons}), instruments(expense percent income high college [Link])

• The OLS estimator is a one-step GMM estimator, but we did not bother to specify the one-step option, because the model is just identified.

Number of parameters = 9,  Number of moments = 9
Initial weight matrix: Unadjusted,  GMM weight matrix: Robust,  Number of obs = 50

           |               Robust
           |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
  ---------+----------------------------------------------------------------
  expense  |  -.002021   .0032485   -0.62   0.534     -.008388     .004346
  percent  | -3.007647   .2135325  -14.09   0.000    -3.426163   -2.589131
  income   | -.1674421   1.083308   -0.15   0.877    -2.290688    1.955803
  high     |  1.814731   .9298204    1.95   0.051    -.0076834    3.637146
  college  |  4.670564   1.448589    3.22   0.001     1.831381    7.509746
  region   |
   N. East |  69.45333   16.29745    4.26   0.000     37.51091    101.3957
   South   |  25.39701   11.34163    2.24   0.025     3.167819     47.6262
   Midwest |  34.57704   8.556918    4.04   0.000     17.80579     51.3483
  _cons    |  808.0206   61.44765   13.15   0.000     687.5854    928.4558
Differences among OLS, MLE and MM
• In statistics we often encounter OLS and MLE. "OLS" stands for "ordinary least squares," while "MLE" stands for "maximum likelihood estimation." The two are closely related in regression with Gaussian errors.
• If the OLS assumptions hold, both give the same coefficient estimates.
• Maximum likelihood estimation recovers the full set of parameters of the assumed (normal) distribution, which can then be used for prediction.
• OLS is specific to linear regression, whereas MLE can be applied to various statistical models.
• MM starts from moment conditions implied by the model's assumptions.
• The ordinary least squares (OLS) method is tailored to the linear regression model. If the data are not too unusual, it should give a decent result. The OLS method makes no assumption about the probabilistic nature of the variables and is considered deterministic.
• The maximum likelihood estimation (MLE) method is a more general approach, probabilistic by nature, that is not limited to linear regression models.
• OLS provides unbiased estimates, while ML provides asymptotically efficient estimates.

Interval Estimation as an Alternative
• Suppose that you obtained an MPC estimate of 0.5091. This is called a point (single) estimate of the unknown population parameter. A confidence interval is that point estimate plus and minus a margin reflecting the sampling variation in the estimate.
• How reliable is this estimate? Because of sampling fluctuations, a single estimate is likely to differ from the true value.
• But note that in repeated sampling its mean value is expected to be equal to the true value. [Note: E(β̂2) = β2.]
• Now, in statistics the reliability of a point estimator is measured by its standard error.
• Therefore, instead of relying on the point estimate alone, we may construct an interval around the point estimator, say within two or three standard errors on either side of the point estimator, such that this interval has, say, 95 percent probability of including the true parameter value.
• Once a particular interval is computed, however, we cannot make the probabilistic statement that the probability is 1 − α that the given fixed interval includes the true β2. In this situation β2 is either in the fixed interval or outside it, so the probability is either 1 or 0. Thus, for our hypothetical consumption–income example, if the 95% confidence interval were obtained as (0.4268 ≤ β2 ≤ 0.5914), we cannot say the probability is 95% that this interval includes the true β2.
CONFIDENCE INTERVALS FOR REGRESSION COEFFICIENTS
• With the normality assumption for the error terms, the OLS coefficients are themselves normally distributed with the means and variances given therein.

Pr(−t_{α/2} ≤ t ≤ t_{α/2}) = 1 − α

Pr(−t_{α/2} ≤ (β̂ − β)/se(β̂) ≤ t_{α/2}) = 1 − α

Pr[β̂ − t_{α/2}·se(β̂) ≤ β ≤ β̂ + t_{α/2}·se(β̂)] = 1 − α
• The width of the confidence interval is proportional to the standard error of the
estimator. That is, the larger the standard error, the larger is the width of the
confidence interval.
• Put differently, the larger the standard error of the estimator, the greater is the
uncertainty of estimating the true value of the unknown parameter.
• Thus, the standard error of an estimator is often described as a measure of the
precision of the estimator, i.e., how precisely the estimator measures the true
population value.
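A minimal sketch of such an interval for a slope coefficient, using simulated (hypothetical) data and, for simplicity, the large-sample critical value 1.96 in place of the exact t quantile:

```python
import numpy as np

# Sketch (hypothetical data): a confidence interval for the slope,
# beta_hat +/- crit * se(beta_hat). The large-sample critical value 1.96
# stands in for the exact t quantile.
rng = np.random.default_rng(3)
n = 500
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, 2.0, n)

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
s2 = e @ e / (n - 2)                             # unbiased sigma^2 estimate
se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])  # standard error of the slope

crit = 1.96                                       # approx t_{0.025} for large n
lower, upper = beta_hat[1] - crit * se, beta_hat[1] + crit * se
assert lower < beta_hat[1] < upper                # interval brackets the estimate
```

Note how the interval's width, 2·crit·se, grows with the standard error, exactly as the bullet above describes.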
Assumptions and Properties with OLS
• To accept the OLS results, the following assumptions of the linear model must be satisfied; the BLUE (CAN) properties then follow in the small-sample (large-sample) case.
(A) Key assumptions in OLS Regression Analysis

Y = ˆ0 + ˆ1 X 1 + ˆ2 X 2 + ...ˆk X k + e


About e
About Y 1) The mean value of error
terms is zero
1)There is no linear Y = X +e 2) The Variance of error term is
relationship between constant, Homoscedasticity
Yt and Yt-1 About X 3) There is not autocorrelation
2)There is no 1) There is no measurement error in X among error terms
measurement error in Y 2) There is no linear relationship among Xs (No 4) Error term is normally
3)Y is linear in Coefficient Multicollinearity), full rank assumption distributed with zero mean
3) There is no linear relationship among Xs and Error term and constant variance
(No endogeneity problem)

• Assumption 1: The disturbance is assumed to have conditional expected value zero at every
observation. This conditional mean assumption states, in words, that no observations on x convey
information about the expected value of the disturbance.

E ( i / X ) = 0
• The zero conditional mean implies that the unconditional mean is also zero, The converse is not true

E ( i ) = E X  E ( i / X )  = E X  0 = 0
• This assumption also implies that

E ( i X ) = 0
E (Y / X ) =  0 + 1 X

• Assumption 2: The error term ε has the same variance given any value of the explanatory variables (i.e., homoskedasticity) across observations:

Var(εi | X) = E(εi² | X) = σ²

• Assumption 3: The error terms are not correlated across observations (i.e., no autocorrelation):

Cov(εi, εj | X) = E(εiεj | X) = 0,   i ≠ j

• Together these imply the covariance matrix

E(εε′ | X) = [ E(ε1ε1 | X)  E(ε1ε2 | X)  …  E(ε1εn | X)
              E(ε2ε1 | X)  E(ε2ε2 | X)  …  E(ε2εn | X)
              ⋮
              E(εnε1 | X)  E(εnε2 | X)  …  E(εnεn | X) ]

           = [ σ²  0   …  0
               0   σ²  …  0
               ⋮
               0   0   …  σ² ]  =  σ²I

Disturbances that meet the assumptions of homoscedasticity and nonautocorrelation are sometimes called spherical disturbances.

• Assumption 4: The residuals are normal:

ε | X ~ N(0, σ²I)
• Assumption 5: Full rank / no multicollinearity: There is no exact linear relationship among any of the independent variables in the model. This assumption is necessary for estimation of the parameters of the model.
• Hence, X has full column rank: the columns of X are linearly independent and there are at least K observations. This assumption is known as an identification condition.
• This assumption packs three assumptions in itself:
✓ (1) There should not be any perfect multicollinearity between any of the regressors.
✓ (2) The number of observations (n, the rows of matrix X) should be greater than the number of regressors (k, the columns of matrix X): n > k.
✓ (3) For the case of simple linear regression, the values of the regressor x should not all be the same.
• X values are fixed in repeated sampling: the values taken by the regressor X are considered fixed in repeated samples. More technically, X is assumed to be nonstochastic. The X values in a given sample must not all be the same; technically, var(X) must be a finite positive number.

• Assumption 6: Exogeneity of the independent variables: E [εi | xj1, xj2, . . . , xj K] = 0. This states that
the expected value of the disturbance at observation i in the sample is not a function of the
independent variables observed at any observation, including this one. This means that the
independent variables will not carry useful information for prediction of εi.

Cov(εi, X) = 0
• Assumption 7: Linearity: The model specifies a linear relationship between Y and Xs. In the
regression context, linearity refers to the manner in which the parameters and the disturbance enter
the equation, not necessarily to the relationship among the variables. So, The regression model is
linear in the parameters and error term.

Y= 0 +1X+

Why Should We Care About the Classical OLS Assumptions?

• Efficiency: - If these assumptions hold true, the OLS procedure creates the best possible
estimates, BLUE. In statistics, estimators that produce unbiased estimates that have the smallest
variance are referred to as being “efficient.” Efficiency is a statistical concept that compares the
quality of the estimates calculated by different procedures while holding the sample size
constant. OLS is the most efficient linear regression estimator when the assumptions hold true.
• Convergence: - Another benefit of satisfying these assumptions is that as the sample size
increases to infinity, the coefficient estimates converge on the actual population parameters.
• Reliability: - If your error term also follows the normal distribution, you can safely use
hypothesis testing to determine whether the independent variables and the entire model are
statistically significant. You can also produce reliable confidence intervals and prediction
intervals.

(B) Properties of OLS in Finite/ Small Sample (BLUE)
Linear
• This property is more concerned with the estimator rather than the original equation that is being
estimated.
• In assumption A1, the focus was that the linear regression should be “linear in parameters.”
However, the linear property of OLS estimator means that OLS belongs to that class of estimators,
which are linear in Y, the dependent variable.

β̂_ols = (X′X)⁻¹(X′Y) = CY,   where C = (X′X)⁻¹X′
• Note that OLS estimators are linear only with respect to the dependent variable and not necessarily
with respect to the independent variables.
• The linear property of OLS estimators doesn't depend only on the "linear in parameters" assumption, but also on the other assumptions on the error term and on X.

The bias measures the difference between the expected value of an estimator and the true value of the parameter. If a treatment effect from linear regression is biased, it means we have an inaccurate causal effect. The variance measures the spread of the estimates (which are random variables) around their expected values; the higher the variance, the less the precision of the estimates.

(B) Unbiasedness
• If you look at the regression equation, you will find an error term associated with the equation being estimated. This makes the dependent variable random, and since the estimator uses the dependent variable, the estimator is also a random variable.
• Therefore, before describing what unbiasedness is, it is important to mention that unbiasedness is a property of the estimator and not of any sample.
• Unbiasedness is one of the most desirable properties of any estimator: the estimator should ideally be an unbiased estimator of the true parameter/population values.

From the population, Y = Xβ + ε. Then

β̂_ols = (X′X)⁻¹(X′Y)
β̂_ols = (X′X)⁻¹[X′(Xβ + ε)]
β̂_ols = (X′X)⁻¹(X′X)β + (X′X)⁻¹X′ε

E(β̂_ols) = (X′X)⁻¹(X′X)β + (X′X)⁻¹X′E(ε)
E(β̂_ols) = β

(C) Best: Minimum Variance
• The efficiency property of any estimator says that the estimator is the minimum-variance unbiased estimator.
• As a result, efficient estimators are more likely to give better and more accurate results than other estimators having higher variance. In short:
✓ If the estimator is unbiased but doesn't have the least variance, it's not the best.
✓ If the estimator has the least variance but is biased, it's again not the best.
✓ If the estimator is both unbiased and has the least variance, it's the best estimator.

From the population, Y = Xβ + ε. Then

β̂ = (X′X)⁻¹(X′Y) = (X′X)⁻¹X′(Xβ + ε) = β + (X′X)⁻¹X′ε

β̂ − β = (X′X)⁻¹X′ε

Var(β̂) = E[(β̂ − β)(β̂ − β)′] = E[(X′X)⁻¹X′εε′X(X′X)⁻¹]
        = (X′X)⁻¹X′ E(εε′) X(X′X)⁻¹
        = σ²(X′X)⁻¹(X′X)(X′X)⁻¹
        = σ²(X′X)⁻¹

• Once we compute this variance, we need to prove that it is the minimum among all linear unbiased estimators. Since any other linear estimator b is linear in Y, it can be written as b = CY for some matrix C, which possibly is a function of X. Let C = D + A, where A = (X′X)⁻¹X', so that AY = β̂. Then

b = CY = (D + A)Y = D(Xβ + ε) + β̂ = DXβ + Dε + β̂

E(b) = DXβ + E(Dε) + E(β̂) = DXβ + β

Since both b and β̂ are unbiased and E(Dε | X) = D·E(ε | X) = 0, unbiasedness of b requires DXβ = 0. For this to be true for any given β, it is necessary that DX = 0. Then

b − β = (D + A)ε

Var(b) = Var[(D + A)ε] = (D + A) Var(ε) (D + A)′
       = σ²(D + A)(D + A)′
       = σ²(DD′ + DA′ + AD′ + AA′)

With DX = 0, DA′ = DX(X′X)⁻¹ = 0 and AD′ = 0, so

Var(b) = σ²[DD′ + (X′X)⁻¹] ≥ σ²(X′X)⁻¹ = Var(β̂)

since DD′ is positive semidefinite.
(C) Properties of OLS in Infinite/Large Samples (CAN)

A) Consistency of the Least Squares Estimator of β


• An estimator is said to be consistent if its value approaches the actual, true parameter
(population) value, as the sample size increases. An estimator is consistent if it satisfies two
conditions:
✓ a. It is asymptotically unbiased
✓ b. Its variance converges to 0 as the sample size increases.
• Both these hold true for OLS estimators and, hence, they are consistent estimators. For an
estimator to be useful, consistency is the minimum basic requirement. Since there may be
several such estimators, asymptotic efficiency also is considered. Asymptotic efficiency is the
sufficient condition that makes OLS estimators the best estimators

• β̂ is a consistent estimator of β.

From the population, Y = Xβ + ε. Then

β̂ = (X′X)⁻¹(X′Y) = (X′X)⁻¹X′(Xβ + ε) = β + (X′X)⁻¹X′ε

For a large sample, rewrite this as

β̂ = β + (X′X/n)⁻¹(X′ε/n)

Letting plim(X′X/n) = Q (a finite, positive definite matrix):

plim β̂ = β + Q⁻¹ · plim(X′ε/n) = β + Q⁻¹ · 0 = β
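Consistency can be illustrated by simulation (hypothetical data): at a large sample size, the slope estimate sits very close to the true value.

```python
import numpy as np

# Sketch (hypothetical data): at a large sample size the slope estimate sits
# very close to the true value, illustrating plim beta_hat = beta.
rng = np.random.default_rng(5)

def slope_error(n, beta1=0.7):
    x = rng.uniform(0, 1, n)
    y = 1.0 + beta1 * x + rng.normal(0, 1.0, n)
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return abs(b[1] - beta1)

err_large = slope_error(200_000)
# sd of the slope here is roughly sqrt(12/n), about 0.008
assert err_large < 0.05
```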
(B) Asymptotic Normality of the Least Squares Estimator
• Consistency is an improvement over unbiasedness.
• To derive the asymptotic distribution of the least squares estimator, we will make use of some basic central limit theorems.

For a large sample,

β̂ − β = (X′X/n)⁻¹(X′ε/n)

√n(β̂ − β) = (X′X/n)⁻¹(X′ε/√n) = Q⁻¹(X′ε/√n)

√n(β̂ − β) →d N(0, σ²Q⁻¹)
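A Monte Carlo sketch (hypothetical fixed design) of this result: the sample covariance of √n(β̂ − β) across replications approximates σ²Q⁻¹.

```python
import numpy as np

# Sketch (hypothetical fixed design): the Monte Carlo covariance of
# sqrt(n)*(beta_hat - beta) approximates sigma^2 * Q^{-1}, where Q = X'X/n.
rng = np.random.default_rng(6)
n, reps, sigma = 100, 4000, 1.5
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
beta = np.array([1.0, 0.3])
Q_inv = np.linalg.inv(X.T @ X / n)

scaled = np.array([
    np.sqrt(n) * (np.linalg.solve(X.T @ X, X.T @ (X @ beta + rng.normal(0, sigma, n))) - beta)
    for _ in range(reps)
])
mc_cov = np.cov(scaled, rowvar=False)   # Monte Carlo covariance matrix
target = sigma ** 2 * Q_inv             # theoretical sigma^2 * Q^{-1}
assert np.allclose(mc_cov, target, rtol=0.15)
```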

Asymptotic distribution of b with independent observations: if {εi} are independently distributed with mean zero and finite variance σ², and xik is such that the Grenander conditions are met, then

β̂ ~a N(β, (σ²/n)Q⁻¹)

For the asymptotic variance, with β̂ − β = Q⁻¹(X′ε/n):

Var(β̂) = plim_{n→∞} E[(β̂ − β)(β̂ − β)′]
        = plim Q⁻¹ E[(X′ε/n)(ε′X/n)] Q⁻¹
        = σ² plim Q⁻¹ (X′X/n²) Q⁻¹
        = (σ²/n) plim Q⁻¹ (X′X/n) Q⁻¹
        = (σ²/n) Q⁻¹QQ⁻¹
        = (σ²/n) Q⁻¹

Grenander conditions: conditions on the regressors under which the OLS estimator will be consistent. The Grenander conditions are weaker than the assumption on the regressor X that lim_{n→∞} (X′X)/n is a fixed positive definite matrix, which is a common starting assumption. See Greene, 2nd ed., 1993, p. 295.
(D) Properties of Maximum Likelihood Estimation (MLE)
• An alternative to the least-squares method is the method of maximum likelihood (ML).
• MLE is a method for estimating the parameters of a statistical model. Given the distribution of a statistical model f(y; θ) with unknown deterministic parameter θ, MLE estimates θ by maximizing the probability f(y; θ) of the observations y.
• To use this method, however, one must make an assumption about the probability distribution of the disturbance term. In the regression context, the assumption most popularly made is that the error term follows the normal distribution.
• Maximum likelihood estimators (MLEs) are most attractive because of their large-sample or asymptotic properties:
✓ Consistency
✓ Asymptotic normality
✓ Asymptotic efficiency
✓ Invariance

LF(β, σ²) = [1/(σⁿ(2π)^(n/2))] exp{−(1/2) Σ[(Yi − Xiβ)/σ]²}

ln LF(θ | Y, X) = −n ln σ − (n/2) ln(2π) − (1/2) Σ[(Yi − Xiβ)/σ]²
ln LF(θ | Y, X) = −(n/2) ln σ² − (n/2) ln(2π) − (1/(2σ²)) Σ(Yi − Xiβ)²
ln LF = −(n/2) ln σ² − (n/2) ln(2π) − (1/(2σ²))[Y′Y − Y′Xβ − β′X′Y + β′X′Xβ]

∂ln LF/∂β = −(1/(2σ̂²))[−2X′Y + 2X′Xβ̂] = 0   →   β̂_MLE = (X′X)⁻¹X′Y

∂ln LF/∂σ² = −n/(2σ̂²) + (1/(2σ̂⁴)) Σ(Yi − Xiβ̂)² = 0   →   σ̂² = Σe²/n = e′e/n
• Maximum likelihood estimators (MLEs) are most attractive because of their
large sample or asymptotic properties.
• Under regularity, the maximum likelihood estimator (MLE) has the
following asymptotic properties:
✓ Consistency
✓ Asymptotic normality
✓ Asymptotic efficiency
✓ Invariance

(A) Consistency
• We say that the sample coefficient estimate is consistent if it converges to the population coefficient in probability as n → ∞, where the population coefficient is the 'true' unknown parameter of the distribution of the sample:

β̂ →p β  as  n → ∞     (Assignment: prove it)

(B) Asymptotic normality
• We say that the sample coefficient is asymptotically normal if

√n(β̂ − β) →d N(0, σ²)     (Assignment: prove it)

• Here σ² is the asymptotic variance of the estimate β̂.
• Asymptotic normality says that the estimator not only converges to the unknown parameter, but converges fast enough, at a rate 1/√n.
(C) Asymptotic Efficiency
• An estimator is asymptotically efficient if it is consistent, asymptotically normally distributed (CAN), and has an asymptotic covariance matrix that is not larger than the asymptotic covariance matrix of any other consistent, asymptotically normally distributed estimator. Its variance also attains the Cramér–Rao lower bound.
• Let's start with the Hessian matrix. Writing

ln LF = −(n/2) ln σ̂² − (n/2) ln(2π) − (1/(2σ̂²))[Y′Y − Y′Xβ̂ − β̂′X′Y + β̂′X′Xβ̂],   θ = (β̂, σ̂²)′,

the matrix of second derivatives is

∂²ln LF/∂θ∂θ′ = [ ∂²ln LF/∂β̂∂β̂′     ∂²ln LF/∂β̂∂σ̂²  ]
                [ ∂²ln LF/∂σ̂²∂β̂′    ∂²ln LF/∂(σ̂²)²  ]

              = [ −X′X/σ̂²             −X′(Y − Xβ̂)/σ̂⁴                       ]
                [ −(Y − Xβ̂)′X/σ̂⁴     n/(2σ̂⁴) − (Y − Xβ̂)′(Y − Xβ̂)/σ̂⁶    ]

(Assignment: prove it.)
• Derive the information matrix, which is the negative of the expected value of the second derivatives:

I(θ) = −E[∂²ln LF/∂θ∂θ′]

     = −E[ −X′X/σ²             −X′(Y − Xβ)/σ⁴                    ]
          [ −(Y − Xβ)′X/σ⁴     n/(2σ⁴) − (Y − Xβ)′(Y − Xβ)/σ⁶  ]

     = [ X′X/σ²    0        ]
       [ 0         n/(2σ⁴)  ]

(Assignment: prove it.)

• Derive the Cramér–Rao lower bound, which is the inverse of the information matrix:

CRLB = [I(θ)]⁻¹ = [ σ²(X′X)⁻¹    0       ]
                  [ 0            2σ⁴/n   ]

The upper-left block is the variance of the MLE coefficient estimator, Var(β̂) = σ²(X′X)⁻¹; the lower-right block is the variance of the MLE estimator of S². (Assignment: prove them.)

The CRLB gives the smallest variance that can be attained by a consistent estimator; in this regard, the MLE estimators are considered efficient. The variance of any unbiased estimator is greater than or equal to the CRLB. The variance of the MLE coefficient estimator is the same as that of OLS, but the (asymptotic) variance of S² for MLE is lower than that of OLS:

Var(S²)_MLE = 2σ⁴/n   ≤   Var(S²)_OLS = 2σ⁴/(n − K)
(D) Invariance
• The invariance property is a mathematical result of the method of computing MLEs; it is not a statistical
result as such.
• More formally, the MLE is invariant to one-to one transformations of θ. Any transformation that is not one to
one either renders the model inestimable if it is one to many or imposes restrictions if it is many to one.
• Some theoretical aspects of this feature are discussed in Davidson and MacKinnon (2004, pp. 446, 539–540).
For the practitioner, the result can be extremely useful.
• For example, when a parameter appears in a likelihood function in the form 1/θj, it is usually worthwhile to
reparametrize the model in terms of γj = 1/θj. In an important application, Olsen (1978) used this result to
great advantage.
Example: Log-Likelihood Function and Likelihood Equations for the Normal Distribution
• In sampling from a normal distribution with mean μ and variance σ², the log-likelihood function is

ln L(μ, σ²) = −(n/2) ln(2π) − (n/2) ln σ² − (1/(2σ²)) Σ(yi − μ)²

and the likelihood equations for μ and σ² are

∂ln L/∂μ = (1/σ²) Σ(yi − μ) = 0,   ∂ln L/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ(yi − μ)² = 0

• Suppose that the normal log-likelihood in the above Example is parameterized in
terms of the precision parameter, θ^2 = 1/σ^2.
• The log-likelihood becomes

ln L(μ, θ²) = −(n/2) ln(2π) + (n/2) ln θ² − (θ²/2) Σ(yi − μ)²

and, by the invariance property, the MLE of θ² is simply 1/σ̂².

Algebraic Aspects of the Least
Squares Solution
(A) Residual Maker Matrix (M)

The fitted model for each observation is

Y1 = β̂0 + β̂1X11 + β̂2X12 + … + β̂kX1k + e1
Y2 = β̂0 + β̂1X21 + β̂2X22 + … + β̂kX2k + e2
Y3 = β̂0 + β̂1X31 + β̂2X32 + … + β̂kX3k + e3
⋮
Yn = β̂0 + β̂1Xn1 + β̂2Xn2 + … + β̂kXnk + en

or, in matrix form, Y = Xβ̂ + e. The residual maker matrix follows from

e = Y − Xβ̂
e = Y − X(X′X)⁻¹X′Y
e = [I − X(X′X)⁻¹X′]Y
e = MY,   where M = I − X(X′X)⁻¹X′

M has the following properties (Assignment: prove them):
A) Symmetric: M = M′
B) Idempotent: M = M²
C) MX = 0
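These three properties are easy to verify numerically for any full-rank design matrix; a sketch with a random (hypothetical) X:

```python
import numpy as np

# Sketch (hypothetical full-rank design): numerical check that
# M = I - X(X'X)^{-1}X' is symmetric, idempotent, and annihilates X.
rng = np.random.default_rng(7)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

assert np.allclose(M, M.T)      # symmetric: M = M'
assert np.allclose(M, M @ M)    # idempotent: M = M^2
assert np.allclose(M @ X, 0)    # MX = 0
```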
(B) Projection Matrix

Y = Xβ̂ + e = Ŷ + e
e = MY,   where M = I − X(X′X)⁻¹X′
Ŷ = Y − e = Y − MY = (I − M)Y = PY
where P = I − M = X(X′X)⁻¹X′

Properties (Assignment: prove them):
✓ P is symmetric
✓ P is idempotent
✓ PX = X
✓ P and M are orthogonal: PM = MP = 0
✓ Y = PY + MY = projection + residual

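A numerical sketch (hypothetical data) of the projection identities, including the Pythagorean decomposition Y′Y = Ŷ′Ŷ + e′e:

```python
import numpy as np

# Sketch (hypothetical data): P and M split Y into fitted values and residuals,
# the two pieces are orthogonal, and Y'Y = Yhat'Yhat + e'e (Pythagoras).
rng = np.random.default_rng(8)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P

assert np.allclose(P @ X, X)                  # PX = X
assert np.allclose(P @ M, np.zeros((n, n)))   # PM = 0
assert np.allclose(P @ Y + M @ Y, Y)          # Y = projection + residual
assert np.isclose(Y @ Y, (P @ Y) @ (P @ Y) + (M @ Y) @ (M @ Y))
```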

( C ) Mean Deviation Maker Matrix
• This matrix is useful for transforming data by expressing a variable in deviations from its mean. It is defined as

M⁰ = I − (1/n) ii′

where i is an n × 1 column of ones.
• For an example of how this matrix is used, consider transforming a single variable x. In the single-variable case, the sum of squared deviations about the mean is given by (Greene, 2003, p. 808; Searle, 1982, p. 68)

Σ(xi − x̄)² = (M⁰x)′(M⁰x) = x′(M⁰)′M⁰x

• It can easily be shown that M⁰ is symmetric, so that (M⁰)′ = M⁰, and idempotent. Therefore

x′(M⁰)′M⁰x = x′M⁰M⁰x = x′M⁰x = Σ(xi − x̄)²

• For two variables x and y, the sum of cross products in deviations from their means is given by (Greene, 2003, p. 809)

x′M⁰y = Σ(xi − x̄)(yi − ȳ)

(Assignment: prove these properties of the M⁰ matrix.)
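A quick numerical check (hypothetical data) of these deviation-form identities:

```python
import numpy as np

# Sketch (hypothetical data): M0 = I - (1/n) i i' converts quadratic forms into
# sums of squares and cross products in deviations from the mean.
rng = np.random.default_rng(9)
n = 25
x = rng.normal(size=n)
y = rng.normal(size=n)
M0 = np.eye(n) - np.ones((n, n)) / n

assert np.allclose(M0, M0.T)     # symmetric
assert np.allclose(M0, M0 @ M0)  # idempotent
assert np.isclose(x @ M0 @ x, np.sum((x - x.mean()) ** 2))
assert np.isclose(x @ M0 @ y, np.sum((x - x.mean()) * (y - y.mean())))
```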


(D) Applying Both Projection and Residual Matrices to Regression

• In matrix form, using both the projection and residual maker matrices:

Y = Ŷ + e
e = MY,   where M = I − X(X′X)⁻¹X′
Ŷ = (I − M)Y = PY,   where P = X(X′X)⁻¹X′

Then Y = PY + MY.

• The Pythagorean theorem at work in the sums of squares:

Y′Y = Y′P′PY + Y′M′MY = Ŷ′Ŷ + e′e

• In manipulating equations involving least squares results, the following equivalent expressions for the sum of squared residuals are often useful:

e′e = Y′M′MY = Y′MY = Y′e = e′Y
e′e = Y′Y − Ŷ′Ŷ = Y′Y − β̂′X′Xβ̂

N.B.: β̂ = (X′X)⁻¹(X′Y) implies (X′X)β̂ = X′Y, so

e′e = Y′Y − β̂′X′Y = Y′Y − Y′Xβ̂
[Link]@[Link], Economics Department, Addis 69
Ababa University
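These equivalent expressions are easy to verify numerically. A sketch with simulated data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 0.5, -0.2]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)          # OLS: (X'X) b = X'Y
e = Y - X @ b
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

# Each expression below equals the sum of squared residuals e'e.
assert np.isclose(e @ e, Y @ M @ Y)                    # Y'MY
assert np.isclose(e @ e, Y @ e)                        # Y'e
assert np.isclose(e @ e, Y @ Y - b @ X.T @ X @ b)      # Y'Y - b'X'Xb
assert np.isclose(e @ e, Y @ Y - b @ (X.T @ Y))        # Y'Y - b'X'Y
```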
(E) Estimating the Variance of the Estimator
• The least squares residuals are

e = MY = M(Xβ + ε) = Mε,  since MX = 0

• An estimator of σ² will be based on the sum of squared residuals:

e'e = ε'Mε
E[e'e | X] = E[ε'Mε | X]

• The scalar ε'Mε is a 1 × 1 matrix, so it is equal to its trace. Using the result on cyclic permutations,

E[tr(ε'Mε) | X] = E[tr(Mεε') | X]

• Since M is a function of X, the result is

E[tr(Mεε') | X] = tr(M · E[εε' | X]) = tr(M · σ²I) = σ² tr(M)
• The trace of M is

tr(M) = tr[Iₙ − X(X'X)⁻¹X']
tr(M) = tr(Iₙ) − tr[X(X'X)⁻¹X']
tr(M) = tr(Iₙ) − tr(I_k)     (by cyclic permutation, tr[X(X'X)⁻¹X'] = tr[(X'X)⁻¹X'X])
tr(M) = n − k

• Therefore,

E[e'e | X] = (n − k)σ²

• So the natural estimator e'e/n is biased toward zero, although the bias becomes smaller as the sample size increases.
• An unbiased estimator of σ² is

s² = e'e / (n − k)

• The estimator is unbiased unconditionally as well.
• The standard error of the regression is s, the square root of s². With s², we can then compute Est. Var[β̂ | X] = s²(X'X)⁻¹.
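The estimator s² and the estimated covariance matrix can be sketched as follows, using simulated data with a known σ (illustrative only):

```python
import numpy as np

# Simulate a model with known sigma = 2 and estimate sigma^2.
rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=2.0, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)     # OLS estimates
e = y - X @ b                             # residuals
s2 = (e @ e) / (n - k)                    # unbiased estimator of sigma^2
est_var_b = s2 * np.linalg.inv(X.T @ X)   # Est. Var[b | X]
se_b = np.sqrt(np.diag(est_var_b))        # standard errors of the coefficients
```

With n = 200, s² should land near the true value of 4, though any single draw will deviate from it.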
4. Hypothesis Testing

Goodness of Fit of the Model and ANOVA

• TSS = ESS + RSS decomposes the total sum of squares (TSS) into two components: the explained sum of squares (ESS) and the residual sum of squares (RSS). The study of these components of TSS is known as the analysis of variance (ANOVA) from the regression viewpoint.

Y = β̂0 + β̂1X1 + β̂2X2 + ... + β̂kXk + e
Y = Ŷ + e

In deviation form (y = Y − Ȳ, ŷ = Ŷ − Ȳ, and ē = 0):

y = ŷ + e
Σy² = Σ(ŷ + e)²
Σy² = Σŷ² + Σe² + 2Σŷe
Σy² = Σŷ² + Σe² + 0     (since Σŷe = 0)
Σy² = Σŷ² + Σe²
TSS = ESS + RSS

Dividing through by n,

Σy²/n = Σŷ²/n + Σe²/n
Var(Y) = Var(Ŷ) + Var(e)
• If the regression contains a constant term, then the residuals will sum to zero and the mean of the predicted values of Yᵢ will equal the mean of the actual values. Subtracting Ȳ from both sides:

Yᵢ − Ȳ = (Ŷᵢ − Ȳ) + eᵢ
M⁰Y = M⁰Xβ̂ + M⁰e

• Intuitively, the regression would appear to fit well if the deviations of y from its mean are more largely accounted for by deviations of x from its mean than by the residuals. Since both terms in this decomposition sum to zero, to quantify the fit we use the sums of squares instead:

Y'M⁰Y = β̂'X'M⁰Xβ̂ + e'e

R² = ESS/TSS = Var(Ŷ)/Var(Y) = β̂'X'M⁰Xβ̂ / Y'M⁰Y = Ŷ'M⁰Y / Y'M⁰Y     Assignment: Prove it.

• where M⁰ is the n × n idempotent matrix that transforms observations into deviations from sample means. The column of M⁰X corresponding to the constant term is zero, and, since the residuals already have mean zero, M⁰e = e.
The Coefficient of Determination (R²)

• As we have shown, R² must be between 0 and 1, and it measures the proportion of the total variation in y that is accounted for by variation in the regressors.
• It equals zero if the regression is a horizontal line, that is, if all the elements of β̂ except the constant term are zero. In this case, the predicted values of y are always ȳ, so deviations of x from its mean do not translate into different predictions for y. As such, x has no explanatory power.
• The other extreme, R² = 1, occurs if the values of x and y all lie in the same hyperplane (on a straight line for a two-variable regression) so that the residuals are all zero. If all the values of yᵢ lie on a vertical line, then R² has no meaning and cannot be computed.
• Regression analysis is often used for forecasting. In this case, we are interested in how well the regression model predicts movements in the dependent variable.
• There are some problems with the use of R² in analyzing goodness of fit. The first concerns the number of degrees of freedom used up in estimating the parameters: R² will never decrease when another variable is added to a regression equation.

TSS = ESS + RSS
1 = ESS/TSS + RSS/TSS = R² + RSS/TSS
R² = 1 − RSS/TSS
Adj. R² = 1 − (RSS/(n − k)) / (TSS/(n − 1))
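Both R² and the adjusted R² can be computed directly from the sums of squares; a sketch with simulated data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 3                               # k counts the intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
tss = ((y - y.mean()) ** 2).sum()          # total sum of squares
rss = e @ e                                # residual sum of squares

r2 = 1 - rss / tss
adj_r2 = 1 - (rss / (n - k)) / (tss / (n - 1))
```

Note that Adj. R² ≤ R² always holds here, since (n − 1)/(n − k) ≥ 1 inflates the RSS term.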
Summary of ANOVA

Source   SS                df      Mean Square
ESS      β̂'X'Y − nȲ²      k − 1   (β̂'X'Y − nȲ²)/(k − 1)
RSS      e'e               n − k   e'e/(n − k)
TSS      Y'Y − nȲ²         n − 1   (Y'Y − nȲ²)/(n − 1)

Assignment: Prove the ESS, RSS, and TSS expressions. Hint: use the concepts of the Mean Deviation Maker matrix, the Residual Maker matrix, and the Prediction (Projection) Maker matrix.
(a) Testing the Overall Significance of the Sample Regression

Y = β0 + β1X1 + β2X2 + ... + βkXk + ε
Y = β̂0 + β̂1X1 + β̂2X2 + ... + β̂kXk + e

• So far we have tested the significance of the estimated partial regression coefficients individually, that is, under the separate hypothesis that each true population partial regression coefficient is zero.
• The overall test is also known as the joint or simultaneous test, or the F test.
• This joint hypothesis can be tested by the analysis of variance (ANOVA) technique.
• It addresses the question: are all the independent variables jointly capable of explaining the variation in Y?
• The null hypothesis states that all explanatory variables are jointly statistically insignificant and unable to explain Y.

a) Formulate the hypotheses:
H0: β1 = β2 = ... = βk = 0   (equivalently, R² = 0)
H1: at least one βⱼ ≠ 0      (equivalently, R² ≠ 0)
b) Compute the test statistic:
F_cal(k−1, n−k) = [ESS/(k − 1)] / [RSS/(n − k)]
c) Read the table: F_tab(k−1, n−k)
d) Decision rule: if F_cal > F_tab, reject H0; or if p-value < α, reject H0.
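Steps (b) through (d) can be sketched numerically; the data are simulated for illustration, and SciPy's F distribution supplies the p-value:

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(3)
n, k = 60, 4                                   # k includes the intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, 0.0, -0.4]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
tss = ((y - y.mean()) ** 2).sum()
rss = e @ e
ess = tss - rss

# Overall F statistic and its upper-tail p-value.
F_cal = (ess / (k - 1)) / (rss / (n - k))
p_value = f.sf(F_cal, k - 1, n - k)
```

Rejecting H0 when `p_value < 0.05` is equivalent to comparing `F_cal` with the 5% critical value from the F(k−1, n−k) table.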
(b) Test for an Individual Explanatory Variable

Y = β0 + β1X1 + β2X2 + ... + βkXk + ε
Y = β̂0 + β̂1X1 + β̂2X2 + ... + β̂kXk + e

• T-tests are used in linear regression to determine whether a particular variable is statistically significant in the model.
• A statistically significant variable is one that has a strong relationship with the dependent variable and contributes significantly to the accuracy of the model.
• T-tests can also be used to compare the significance of different variables in the model, which helps to identify which variables are most important for predicting the dependent variable.

Taking X1 as an example:
a) Formulate the hypotheses:
H0: β1 = 0
H1: β1 ≠ 0
b) Compute the test statistic:
t_cal = (β̂1 − β1)/se(β̂1) = β̂1 / √Var(β̂1)   under H0: β1 = 0
c) Read the table: t_tab(α/2, n−k)
d) Decision rule: if |t_cal| > t_tab, reject H0; or if p-value < α, reject H0.
(c) Testing the Equality of Two Regression Coefficients

Y = β0 + β1X1 + β2X2 + ... + βkXk + ε

• Here the aim is to test the hypothesis that two slope coefficients are equal.
• Such a null hypothesis is of practical importance.
• For example, consider the demand function for a commodity where Y = amount of the commodity demanded, X2 = price of the commodity, X3 = income of the consumer, and X4 = wealth of the consumer. The null hypothesis that the income and wealth coefficients are equal means income and wealth affect demand identically. Or, if Y and the X's are expressed in logarithmic form, the null hypothesis implies that the income and wealth elasticities of consumption are the same.

Taking β1 and β2 as an example:
a) Formulate the hypotheses:
H0: β1 = β2, i.e., β1 − β2 = 0
H1: β1 ≠ β2, i.e., β1 − β2 ≠ 0
b) Compute the test statistic:
t_cal = [(β̂1 − β̂2) − (β1 − β2)] / se(β̂1 − β̂2)
t_cal = (β̂1 − β̂2) / √[Var(β̂1) + Var(β̂2) − 2cov(β̂1, β̂2)]
c) Read the table: t_tab(α/2, n−k)
d) Decision rule: if |t_cal| > t_tab, reject H0; or if p-value < α, reject H0.
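The standard error of the difference uses both variances and the covariance from the estimated covariance matrix s²(X'X)⁻¹. A sketch with simulated data in which the two slopes are equal by construction:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(4)
n, k = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.6, 0.6]) + rng.normal(size=n)   # beta1 = beta2 = 0.6

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = (e @ e) / (n - k)
V = s2 * XtX_inv                         # Est. Var[b | X]

# t statistic for H0: beta1 = beta2.
diff = b[1] - b[2]
se_diff = np.sqrt(V[1, 1] + V[2, 2] - 2 * V[1, 2])
t_cal = diff / se_diff
p_value = 2 * t.sf(abs(t_cal), n - k)    # two-sided p-value
```

Since the null is true in this simulation, the p-value should usually be large, so H0 is typically not rejected.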
(d) Restricted Least Squares: Testing Linear Equality Restrictions

• There are occasions where economic theory may suggest that the coefficients in a regression model satisfy some linear equality restrictions. For instance, consider the Cobb–Douglas production function:

Y = β̂0 X1^β̂1 X2^β̂2 e^u
ln Y = ln β̂0 + β̂1 ln X1 + β̂2 ln X2 + u
ln Y = c + β̂1 ln X1 + β̂2 ln X2 + u

• Now if there are constant returns to scale (an equiproportional change in output for an equiproportional change in the inputs), economic theory would suggest that β1 + β2 = 1, which is an example of a linear equality restriction.
• How does one find out whether there are constant returns to scale, that is, whether the restriction is valid?
• There are two approaches: the t-test approach (unrestricted or unconstrained regression) and the F-test approach (restricted least squares).

The t-test approach:
a) Formulate the hypotheses:
H0: β1 + β2 = 1
H1: β1 + β2 ≠ 1
b) Compute the test statistic:
t_cal = [(β̂1 + β̂2) − (β1 + β2)] / se(β̂1 + β̂2)
t_cal = [(β̂1 + β̂2) − 1] / √[Var(β̂1) + Var(β̂2) + 2cov(β̂1, β̂2)]
c) Read the table: t_tab(α/2, n−k)
d) Decision rule: if |t_cal| > t_tab, reject H0; or if p-value < α, reject H0.
(e) The F-Test Approach: Restricted Least Squares

• The preceding t test is a kind of postmortem examination, because we try to find out whether the linear restriction is satisfied after estimating the "unrestricted" regression.
• A direct approach would be to incorporate the restriction into the estimating procedure at the outset. In the present example, this can be done easily as follows:

ln Y = c + β̂1 ln X1 + β̂2 ln X2 + u .......... (UR)
Impose β1 = 1 − β2:
ln Y = c + (1 − β2) ln X1 + β̂2 ln X2 + u
ln Y = c + ln X1 − β2 ln X1 + β̂2 ln X2 + u
ln Y − ln X1 = c + β̂2 (ln X2 − ln X1) + u
ln(Y/X1) = c + β̂2 ln(X2/X1) + u .......... (R)

• where Y/X1 is the output–labor ratio and X2/X1 is the capital–labor ratio, quantities of great economic importance.
• Once we estimate β2 from the transformed (restricted) regression, β1 can easily be retrieved from the relation β1 = 1 − β2.

b) Compute the test statistic:

F_cal = [(RSS_R − RSS_UR)/m] / [RSS_UR/(n − k)]
or
F_cal = [(R²_UR − R²_R)/m] / [(1 − R²_UR)/(n − k)]

• where m is the number of restrictions (m = 1 in this case) and k is the number of parameters in the unrestricted regression.
N.B.: F should be positive. If F is not significant, do not reject H0.
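The restricted-vs-unrestricted comparison can be sketched with simulated Cobb–Douglas data in which constant returns to scale actually hold (all values illustrative):

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(5)
n = 100
lnX1 = rng.normal(size=n)
lnX2 = rng.normal(size=n)
# True coefficients 0.6 and 0.4 sum to 1: CRS holds by construction.
lnY = 0.3 + 0.6 * lnX1 + 0.4 * lnX2 + rng.normal(scale=0.5, size=n)

# Unrestricted regression: lnY on constant, lnX1, lnX2.
Xu = np.column_stack([np.ones(n), lnX1, lnX2])
bu = np.linalg.solve(Xu.T @ Xu, Xu.T @ lnY)
rss_ur = np.sum((lnY - Xu @ bu) ** 2)

# Restricted regression (beta1 + beta2 = 1 imposed):
# regress (lnY - lnX1) on a constant and (lnX2 - lnX1).
Xr = np.column_stack([np.ones(n), lnX2 - lnX1])
br = np.linalg.solve(Xr.T @ Xr, Xr.T @ (lnY - lnX1))
rss_r = np.sum(((lnY - lnX1) - Xr @ br) ** 2)

m, k = 1, 3                                    # one restriction; k unrestricted parameters
F_cal = ((rss_r - rss_ur) / m) / (rss_ur / (n - k))
p_value = f.sf(F_cal, m, n - k)
```

Because the restricted model minimizes the same objective over a smaller parameter space, RSS_R ≥ RSS_UR always, so F_cal is nonnegative.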
5. Linear Regression with Dummy Variable

5. Linear Regression with Dummy Variables
• Using categorical data in Multiple Regression Models is a powerful method to include non-
numeric data types into a regression model.
• In a regression model, these values can be represented by dummy variables - variables containing
values such as 1 or 0 representing the presence or absence of the categorical value.
• When including dummy variables in a regression model, however, one should be careful of the Dummy Variable Trap.
• The Dummy Variable trap is a scenario in which the independent variables are multicollinear - a
scenario in which two or more variables are highly correlated; in simple terms one variable can be
predicted from the others.
• To avoid this trap, we can model dummy variables with two options:
• Model I: Without Constant Term
• Model II: With Constant Term, dropping one dummy category

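The trap, and the two ways around it, can be sketched with hypothetical wage data for three regions (the numbers are made up for illustration):

```python
import numpy as np

# Hypothetical data: wages for three regions, coded 0/1/2.
region = np.array([0, 0, 1, 1, 1, 2, 2])
wage = np.array([10.0, 12.0, 20.0, 22.0, 21.0, 30.0, 32.0])
D = (region[:, None] == np.arange(3)).astype(float)   # full set of dummies

# Model I: no constant, all three dummies -- coefficients are group means.
b1 = np.linalg.lstsq(D, wage, rcond=None)[0]          # -> [11., 21., 31.]

# Model II: constant plus two dummies (drop D0, the reference category)
# -- intercept is the reference mean, slopes are marginal differences.
X2 = np.column_stack([np.ones(len(wage)), D[:, 1:]])
b2 = np.linalg.lstsq(X2, wage, rcond=None)[0]         # -> [11., 10., 20.]

# Including the constant AND all three dummies is perfectly collinear
# (D0 + D1 + D2 equals the column of ones): the dummy variable trap.
trap = np.column_stack([np.ones(len(wage)), D])
assert np.linalg.matrix_rank(trap) < trap.shape[1]
```

Both models fit the same group means; they only parameterize them differently.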
• Dummy variables (also known as binary, indicator, dichotomous, discrete, or categorical variables)
are a way of incorporating qualitative information into regression analysis.
• Qualitative data, unlike continuous data, tell us simply whether the individual observation
belongs to a particular category.

• Example: To assess the effect of geographical location on teachers' wages, 51 areas are classified into three geographical regions: (1) Northeast and North Central (21 states in all), (2) South (17 states in all), and (3) West (13 states in all).

In the case of dropping one of the dummy categories, the coefficients tell us the marginal difference in Y for each dummy category compared with the reference category. The constant term captures the mean value of Y for the reference category. Assignment: Prove that the coefficients equal the marginal differences in Y across categories.
• In the example, to distinguish the three regions we used only two dummy variables, D2 and D3. Why did we not use three dummies to distinguish the three regions? Suppose we do that and write the model with all three dummies plus an intercept.
• Inserting all dummy categories together with the intercept leads to a case of perfect collinearity, that is, an exact linear relationship among the variables (the dummies sum to the column of ones). So we are forced to drop one of the dummy categories, as stated above. But if you are interested in keeping all the categories, drop the constant term instead.
• In the case of dropping the constant term, the coefficients tell us the mean value of Y for each dummy category. Assignment: Prove that the coefficients equal the mean values of Y per category.
Assignment:
• Assume a multiple regression model with n = 16, n1 = 4, n2 = 6, n3 = 3, and n4 = 3.

a) Show that the OLS estimates are the group mean values when there is no reference category (no constant term):

Y = β̂1X1 + β̂2X2 + β̂3X3 + β̂4X4 + ε

Solution:
β̂ = ( ΣY1/n1, ΣY2/n2, ΣY3/n3, ΣY4/n4 )' = ( Ȳ1, Ȳ2, Ȳ3, Ȳ4 )'

b) Show that the OLS estimates are the marginal differences relative to the reference category when there is a constant term:

Y = β̂0 + β̂2X2 + β̂3X3 + β̂4X4 + ε

Solution:
β̂ = ( Ȳ1, Ȳ2 − Ȳ1, Ȳ3 − Ȳ1, Ȳ4 − Ȳ1 )'
Summary:
• Dummy variables can be incorporated in regression models just as easily as quantitative variables.
As a matter of fact, a regression model may contain regressors that are all exclusively dummy, or
qualitative, in nature. Such models are called Analysis of Variance (ANOVA) models.
• Regression models containing an admixture of quantitative and qualitative variables are called
analysis of covariance (ANCOVA) models. ANCOVA models are an extension of the ANOVA models
in that they provide a method of statistically controlling the effects of quantitative regressors,
called covariates or control variables, in a model that includes both quantitative and qualitative,
or dummy, regressors.
• If a qualitative variable has more than one category, as in our illustrative example, the choice of
the benchmark category is strictly up to the researcher. Sometimes the choice of the benchmark
is dictated by the particular problem at hand.
• Do not interpret the coefficient of a dummy as the effect of a one-unit change; instead, phrase it as "being in category A increases or decreases Y by this amount."
• Interaction effects can also be captured by using dummy variables. In other words, there may be interaction between the two qualitative variables D2 and D3, so their effect on mean Y may not be simply additive, as above, but multiplicative as well, achieved by including the product D2·D3 as an additional regressor.

Exercise I: Linear model
• Question 1: Suppose that you are assigned to conduct research assessing the effect of price on the efficiency of firms. The table below gives information on price and efficiency:

Firm:        A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
Efficiency:  401  132  140  190  230  390  125  260  270  276  280  290  300  280  275
Price:       77   22   24   30   32   82   20   55   60   62   69   75   77   75   70

• Given the above information, show all the necessary steps to:
a) Compute the coefficient estimates (slope and intercept)
b) Figure out the error term for each firm
c) Compute the TSS, ESS, and RSS
d) Compute the coefficient of determination
e) Compute the standard error of each coefficient
f) Compute the lower and upper confidence limits at 5%
g) Compute the F test and t test and make a decision about the model and policy
• Question 2: Assume that you are assigned to conduct research about the rural–urban migration problem in Addis Ababa with a sample size of 3240, and to investigate the relationship between laborers' hourly wage Y and their age X. Preliminary analysis of the sample data produces the following information (lowercase letters denote deviations from the mean):

Σyᵢ² = 78434.97      Σxᵢ² = 25526.17      Σxᵢyᵢ = 3666.426
ΣYᵢ = 34379.16       ΣXᵢ = 96143.00       ΣYᵢ² = 443227.1
ΣXᵢ² = 2878451.0     ΣXᵢYᵢ = 1023825.0    Σε̂ᵢ² = 77908.35

✓ (a) Compute the value of the coefficient of determination and briefly explain what it means.
✓ (b) Is age statistically significant in explaining the variation in hourly wage? Which hypothesis do you support?
✓ (c) Calculate the value of the F-statistic.
• Question 3: The model below is to be estimated from a sample of 20 observations. Preliminary summarized information from the data is given below in matrix and summation form. Note that the inverse matrix (x'x)⁻¹ is for the explanatory variables in deviation form, while X'Y is the column matrix of the explanatory variables and the dependent variable, which is not in deviation form.

Yᵢ = β̂0 + β̂1X1ᵢ + β̂2X2ᵢ + β̂3X3ᵢ + eᵢ

            [  0.10  −0.12  −0.03 ]            [   810 ]
(x'x)⁻¹ =   [ −0.12   0.04   0.02 ]    X'Y =   [  4960 ]
            [ −0.03   0.02   0.08 ]            [  7340 ]
                                               [ 10900 ]

ΣYᵢ = 810,  X̄1 = 120,  X̄2 = 180,  X̄3 = 260,  ΣYᵢ² = 49500,
x = Xᵢ − X̄ᵢ,  y = Yᵢ − Ȳ

a) Find the OLS estimates of the model.
b) Compute the standard errors of the slope coefficients.
c) Compute and interpret the coefficient of determination.
d) Test the significance of the slope coefficients at the 5% level of significance using the t-test (critical value of t: 2.12).
e) Test the joint significance of the slope coefficients at the 5% level of significance (critical value of F: 3.63).
MANY THANKS
