1. Cross-Sectional Econometrics - 2017
1. Understanding Econometrics
2. Model Specification
3. Estimation Methods
4. Hypothesis Testing
5. Linear Regression with Dummy Variables
Jan Tinbergen, a Dutch economist noted for his development of econometric models, was the
co-winner (with Ragnar Frisch) of the first Nobel Prize for Economics, in 1969. Reflecting the policy
orientation of his economic analyses, Tinbergen was one of the first to show that a government with
multiple policy objectives must be able to draw on multiple economic policy tools to achieve the
desired results. Among his major works are Statistical Testing of Business Cycles (1938), Econometrics
(1942), Economic Policy (1956), and Income Distribution (1975).
Zerayehu Sime Eshete (PhD), Email: [Link]@[Link], Economics Department, Addis Ababa University
• Econometrics uses economic theory, mathematics, and
statistical inference to quantify economic phenomena.
In other words, it turns theoretical economic models
into useful tools for economic policymaking.
• The objective of econometrics is to convert qualitative
statements into quantitative statements.
• As Stock and Watson (2007) put it, “econometric
methods are used in many branches of economics,
including finance, labor economics, macroeconomics,
microeconomics, and economic policy.” Economic
policy decisions are rarely made without econometric
analysis to assess their impact.
• Consequently, econometrics is the interaction of
economic theory, observed data and statistical
methods. It is the interaction of these three that
makes econometrics interesting, challenging and,
perhaps, difficult.
For reference, the sample variance and covariance formulas used throughout are:

$$\operatorname{Var}(Y)=\frac{\sum (Y-\bar{Y})^2}{n},\qquad \operatorname{Var}(X)=\frac{\sum (X-\bar{X})^2}{n},\qquad \operatorname{Cov}(Y,X)=\frac{\sum (Y-\bar{Y})(X-\bar{X})}{n}$$

$$\bar{Y}=\frac{\sum Y}{n},\qquad \bar{X}=\frac{\sum X}{n}$$
• The error term was put into the estimating equation to capture missing variables and errors in measurement that may have occurred in the dependent variable.
• The absolute value of a residual measures the vertical distance between the actual value of y and the estimated value of ŷ.
• In other words, it measures the vertical distance between the actual data point and the predicted point on the line, as can be seen on the graph at point X0.
• Econometrics / regression analysis is an inexact science (the relationship between Y and X is inexact).
[Figure: scatter plot of actual Y and fitted ŷ against X.]
Properties of the regression line
• The line passes through the sample means of Y and X; hence we call it a mean regression.
• The expected/mean value of Y is equal to Ŷ:

$$E(Y)=E(\beta_0+\beta_1 X+e)=\beta_0+\beta_1 X=\hat{Y}$$

• The mean value of the residuals $e_i$ is zero.
• The residuals are uncorrelated with Ŷ: $E(\hat{Y}e)=0$.
• The residuals are uncorrelated with $X_i$.
• The line minimizes the sum of squared differences between observed and predicted values for the model $Y=\beta_0+\beta_1 X+e$.
• The regression constant ($b_0$) is the y-intercept of the linear regression.
• The regression coefficient ($b_1$) is the slope of the regression line, which equals the average change in the dependent variable (Y) for a unit change in the independent variable (X).
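The properties above can be verified numerically. The sketch below fits a simple OLS line to simulated data (the data, seed, and sample size are illustrative, not from the text) and checks that the residuals have mean zero, are uncorrelated with X and ŷ, and that the line passes through the sample means.

```python
# Numerical check of the regression-line properties, on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0, 10, n)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, n)

# OLS slope and intercept via the usual covariance/variance formulas
b1 = np.cov(X, Y, bias=True)[0, 1] / np.var(X)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X
e = Y - Y_hat

print(abs(e.mean()))                          # mean of residuals ~ 0
print(abs(np.sum(e * X)))                     # residuals uncorrelated with X
print(abs(np.sum(e * Y_hat)))                 # residuals uncorrelated with Y-hat
print(abs(b0 + b1 * X.mean() - Y.mean()))     # line passes through (X-bar, Y-bar)
```

All four printed quantities are zero up to floating-point error, confirming the listed properties hold by construction of the OLS fit.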
$$\hat{Y}=\hat{\beta}_0+\hat{\beta}_1 X_1+\hat{\beta}_2 X_2+\dots+\hat{\beta}_k X_k,\qquad Y=\hat{Y}+e$$

The hat on the Y indicates that it is an estimate. We say "on average" because the relationship between Y and X is inexact: not all the data points lie exactly on the regression line.
The error term captures the following issues:
✓ Vagueness of theory, so that the behavior of Y may be incomplete.
✓ Unavailability of data, i.e., no quantitative information.
✓ Stronger interest in core variables than in peripheral variables.
✓ Intrinsic randomness in human behavior.
✓ Gap between proxy and actual variables, leading to a problem of errors of measurement.
✓ Omitted or neglected variables that may affect Y but are not (or cannot be) included in the regression model.
✓ Unknown functional relationship between Y and X.

Terminology: the same objects go by many names.
Y (explained): Dependent variable, Explained variable, Predictand, Regressand, Response, Endogenous, Outcome
X (explanatory): Independent variable, Explanatory variable, Predictor, Regressor, Stimulus, Exogenous, Covariate
e (error): Error term, Random / stochastic component, Disturbance, Residual, Unexplained, White noise, Shocks
Ŷ (fitted): Fitted value / estimated, Systematic, Deterministic component
• To sum up, we infer about a population parameter based on a sample statistic:

Sample regression: $Y=\hat{\beta}_0+\hat{\beta}_1 X_1+\hat{\beta}_2 X_2+\dots+\hat{\beta}_k X_k+e$
Population regression: $Y=\beta_0+\beta_1 X_1+\beta_2 X_2+\dots+\beta_k X_k+\varepsilon$

Statistical inference is the attempt to make a statement about a population using only sample data, a subset of that population. The goal of statistical inference is to use sample data to estimate a parameter (a numerical characteristic of the population) or to determine whether to believe a claim that has been made about the population. We never actually observe the parameter we are interested in; instead, we use an estimate of the parameter based on data from a sample.
Summary
• The primary objective of correlation analysis is to measure the strength or degree of linear association between two variables. It is symmetric: Corr(X, Y) = Corr(Y, X).
• In regression analysis there is an asymmetry in the way the dependent and explanatory variables are
treated.
• The dependent variable is assumed to be statistical, random, or stochastic, that is, to have a probability
distribution.
• The explanatory variables, on the other hand, are assumed to have fixed values.
2. Model Specification
The Goldilocks principle is named by analogy to the children's story "The Three Bears", in which a
young girl named Goldilocks tastes three different bowls of porridge and finds she prefers porridge that
is neither too hot nor too cold, but has just the right temperature. In our case, it balances unbiasedness
and precision.
• Then, we set variable definitions and expected signs as follows:

No.  Variable                Definition  Type and Measurement  Expected Sign  Sources from Literature
1    GDP                     NA          NA
2    Labour
3    Saving
4    Foreign Loan
5    Foreign Grant
6    Health Expenditure
7    Education Expenditure
• Estimating the coefficients of the linear model can be done by the OLS, MLE, and MM methods.
❖ The sign of each coefficient indicates the direction of the relationship between a predictor variable and the response variable.
❖ The coefficient value represents the mean change in the response given a one-unit change in the predictor.
• First, we discuss estimation by Ordinary Least Squares (OLS), minimizing the residual sum of squares. This yields the famous Gauss estimator.
• Second, we derive estimates of the regression coefficients using Maximum Likelihood Estimation (MLE), assuming normal errors. This also leads to the Gauss estimator.
• Finally, we derive the Method of Moments (MM) estimator.
Note: there are other methods for determining the regression line, preferred in different contexts, such as generalized least squares, maximum likelihood estimation, Bayesian regression, kernel regression, and Gaussian process regression.
• Now we show the classic way (Gauss 1809; Legendre 1805) to estimate regression coefficients by the method of ordinary least squares (OLS).
• Goal: choose the regression coefficients so as to minimize the squared error between the observations and the prediction.
• Why squared? Because the sum of the error terms is zero; to escape this trap, we use the squared errors.
• It is possible to present this in two approaches (the summation approach and the matrix approach).
• Steps:
1. Solve for the error term of the regression equation
2. Square both sides
3. Insert the summation
4. Apply the first-order conditions
5. Solve for the coefficients
Applying the steps to the two-regressor model:

$$Y=\hat{\beta}_0+\hat{\beta}_1 X_1+\hat{\beta}_2 X_2+e \quad\Rightarrow\quad e=Y-\hat{\beta}_0-\hat{\beta}_1 X_1-\hat{\beta}_2 X_2$$

$$\sum_{i=1}^{n} e^2=\sum_{i=1}^{n}\left(Y-\hat{\beta}_0-\hat{\beta}_1 X_1-\hat{\beta}_2 X_2\right)^2$$

First-order condition with respect to $\hat{\beta}_0$:

$$\frac{\partial \sum e^2}{\partial \hat{\beta}_0}=\sum 2\left(Y-\hat{\beta}_0-\hat{\beta}_1 X_1-\hat{\beta}_2 X_2\right)(-1)=0
\quad\Rightarrow\quad \sum Y-n\hat{\beta}_0-\hat{\beta}_1\sum X_1-\hat{\beta}_2\sum X_2=0$$

$$\sum Y=n\hat{\beta}_0+\hat{\beta}_1\sum X_1+\hat{\beta}_2\sum X_2 \quad\ldots(1)$$
First-order conditions with respect to $\hat{\beta}_1$ and $\hat{\beta}_2$:

$$\frac{\partial \sum e^2}{\partial \hat{\beta}_1}=\sum 2\left(Y-\hat{\beta}_0-\hat{\beta}_1 X_1-\hat{\beta}_2 X_2\right)(-X_1)=0
\quad\Rightarrow\quad \sum YX_1-\hat{\beta}_0\sum X_1-\hat{\beta}_1\sum X_1^2-\hat{\beta}_2\sum X_1X_2=0$$

$$\frac{\partial \sum e^2}{\partial \hat{\beta}_2}=\sum 2\left(Y-\hat{\beta}_0-\hat{\beta}_1 X_1-\hat{\beta}_2 X_2\right)(-X_2)=0
\quad\Rightarrow\quad \sum YX_2-\hat{\beta}_0\sum X_2-\hat{\beta}_1\sum X_1X_2-\hat{\beta}_2\sum X_2^2=0$$

This gives the remaining normal equations:

$$\sum YX_1=\hat{\beta}_0\sum X_1+\hat{\beta}_1\sum X_1^2+\hat{\beta}_2\sum X_1X_2 \quad\ldots(2)$$

$$\sum YX_2=\hat{\beta}_0\sum X_2+\hat{\beta}_1\sum X_1X_2+\hat{\beta}_2\sum X_2^2 \quad\ldots(3)$$

In matrix form, the system (1)-(3) is

$$\begin{pmatrix} n & \sum X_1 & \sum X_2\\ \sum X_1 & \sum X_1^2 & \sum X_1X_2\\ \sum X_2 & \sum X_1X_2 & \sum X_2^2 \end{pmatrix}
\begin{pmatrix}\hat{\beta}_0\\ \hat{\beta}_1\\ \hat{\beta}_2\end{pmatrix}=
\begin{pmatrix}\sum Y\\ \sum YX_1\\ \sum YX_2\end{pmatrix}
\quad\Rightarrow\quad \hat{B}=(X'X)^{-1}X'Y$$
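The matrix solution can be sketched numerically. The snippet below builds the design matrix for a two-regressor model on simulated data (variable names, seed, and sizes are illustrative), solves the normal equations, and cross-checks the result against numpy's least-squares routine.

```python
# Sketch of B-hat = (X'X)^(-1) X'Y for a two-regressor model, simulated data.
import numpy as np

rng = np.random.default_rng(1)
n = 100
X1 = rng.normal(5, 2, n)
X2 = rng.normal(3, 1, n)
Y = 1.0 + 2.0 * X1 - 1.5 * X2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), X1, X2])   # design matrix with a constant
XtX = X.T @ X                               # the 3x3 matrix of sums in (1)-(3)
XtY = X.T @ Y                               # the vector (sum Y, sum YX1, sum YX2)
B = np.linalg.solve(XtX, XtY)               # B-hat = (X'X)^(-1) X'Y

# Cross-check against numpy's least-squares routine
B_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(B, B_lstsq))
```

Solving `XtX @ B = XtY` with `np.linalg.solve` is numerically preferable to forming the explicit inverse, though both implement the same estimator.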
Stata `describe` and `summarize` output for the states data:

Variable   Type    Format  Label                              Obs  Mean      Std. Dev.  Min     Max
csat       int     %9.0g   Mean composite SAT score           51   944.098   66.93497   832     1093
expense    int     %9.0g   Per pupil expenditures prim&sec    51   5235.961  1401.155   2960    9259
percent    byte    %9.0g   % HS graduates taking SAT          51   35.76471  26.19281   4       81
income     double  %10.0g  Median household income, $1,000    51   33.95657  6.423134   23.465  48.618
high       float   %9.0g   % adults HS diploma                51   76.26078  5.588741   64.3    86.6
college    float   %9.0g   % adults college degree            51   20.02157  4.16578    12.3    33.3
region     byte    %9.0g   Geographical region                50   2.54      1.128662   1       4

Regression equation with associated tests (region dummies):

region     Coef.      Std. Err.  t      P>|t|  [95% Conf. Interval]
N. East    69.45333   14.95479   4.64   0.000  39.25151    99.65514
South      25.39701   13.32343   1.91   0.064  -1.510213   52.30423
Midwest    34.57704   9.5368     3.63   0.001  15.31709    53.837
_cons      808.0206   79.79478   10.13  0.000  646.8718    969.1694
The OLS variance estimator uses

$$\hat{\sigma}^2=\frac{\sum(Y-\hat{Y})^2}{n}=\frac{\sum(Y-X\hat{\beta})^2}{n}=\frac{\sum e^2}{n}$$

For MLE, with normal errors the joint density of the sample is

$$f(Y_1,Y_2,\dots,Y_n)=\frac{1}{\sigma^n(2\pi)^{n/2}}\exp\!\left(-\frac{\sum (Y-X\beta)^2}{2\sigma^2}\right)\quad\ldots(3)$$

Viewed as a function of the parameters, this is the likelihood function:

$$LF(\beta,\sigma^2)=\frac{1}{\sigma^n(2\pi)^{n/2}}\exp\!\left(-\frac{\sum (Y-X\hat{\beta})^2}{2\sigma^2}\right)\quad\ldots(4)$$

Taking logs:

$$\ln LF(\cdot\,|\,Y,X)=-n\ln\sigma-\frac{n}{2}\ln(2\pi)-\frac{\sum (Y-X\hat{\beta})^2}{2\sigma^2}$$

$$\ln LF(\cdot\,|\,Y,X)=-\frac{n}{2}\ln\sigma^2-\frac{n}{2}\ln(2\pi)-\frac{\sum (Y-X\hat{\beta})^2}{2\sigma^2}\quad\ldots(5)$$

Expanding the quadratic form:

$$\ln LF=-\frac{n}{2}\ln\sigma^2-\frac{n}{2}\ln(2\pi)-\frac{Y'Y-Y'X\hat{\beta}-\hat{\beta}'X'Y+\hat{\beta}'X'X\hat{\beta}}{2\sigma^2}$$

Maximizing with respect to $\sigma^2$:

$$\frac{\partial \ln LF}{\partial \hat{\sigma}^2}=-\frac{n}{2\hat{\sigma}^2}+\frac{1}{2\hat{\sigma}^4}\left(Y-X\hat{\beta}\right)'\left(Y-X\hat{\beta}\right)=0
\quad\Rightarrow\quad \hat{\sigma}^2=\frac{e'e}{n}$$
Stata MLE output (Number of obs = 50); the Stata commands are given at the link: [Link]

csat      Coef.     Std. Err.  z      P>|z|  [95% Conf. Interval]
lnsigma
_cons     2.968442  .1         29.68  0.000  2.772446   3.164439
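Under normal errors the log-likelihood in (5) is maximized at the OLS coefficients with σ̂² = e′e/n. The sketch below (simulated data; names and perturbations are illustrative) evaluates ln LF at that point and checks that perturbing either β̂ or σ̂² lowers the log-likelihood.

```python
# Closed-form MLE for the normal linear model: beta-hat is the OLS estimator
# and sigma^2-hat = e'e/n. Verified by perturbing the maximizer.
import numpy as np

def loglik(beta, sig2, X, Y):
    """Log-likelihood (5) for the normal linear regression model."""
    n = len(Y)
    resid = Y - X @ beta
    return (-0.5 * n * np.log(sig2) - 0.5 * n * np.log(2 * np.pi)
            - (resid @ resid) / (2 * sig2))

rng = np.random.default_rng(2)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # MLE of beta = OLS estimator
e = Y - X @ beta_hat
sig2_hat = (e @ e) / n                          # MLE of sigma^2 (divides by n)

best = loglik(beta_hat, sig2_hat, X, Y)
worse1 = loglik(beta_hat + 0.1, sig2_hat, X, Y)   # shift the coefficients
worse2 = loglik(beta_hat, sig2_hat * 1.5, X, Y)   # inflate the variance
print(best > worse1 and best > worse2)
```

Because ln LF is concave in β for fixed σ² and unimodal in σ² at β̂, any perturbation of the maximizer lowers the log-likelihood, which is what the final check confirms.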
Method of Moments
• This is the third method of estimation (Simple and Generalized Method of Moments).
• Let's focus on the simple one, using the first and second moments.
• It is based on imposing two assumptions (moment conditions):

$$M_1:\ \sum e=0 \;\Rightarrow\; \sum\left(Y-\hat{\beta}_0-\hat{\beta}_1X_1\right)=0 \;\Rightarrow\; \sum Y=n\hat{\beta}_0+\hat{\beta}_1\sum X_1\quad\ldots(1)$$

$$M_2:\ \sum eX=0 \;\Rightarrow\; \sum\left(Y-\hat{\beta}_0-\hat{\beta}_1X_1\right)X_1=0 \;\Rightarrow\; \sum YX_1=\hat{\beta}_0\sum X_1+\hat{\beta}_1\sum X_1^2\quad\ldots(2)$$

In matrix form:

$$\begin{pmatrix} n & \sum X_1\\ \sum X_1 & \sum X_1^2\end{pmatrix}\begin{pmatrix}\hat{\beta}_0\\ \hat{\beta}_1\end{pmatrix}=\begin{pmatrix}\sum Y\\ \sum YX_1\end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix}\hat{\beta}_0\\ \hat{\beta}_1\end{pmatrix}=\begin{pmatrix} n & \sum X_1\\ \sum X_1 & \sum X_1^2\end{pmatrix}^{-1}\begin{pmatrix}\sum Y\\ \sum YX_1\end{pmatrix}$$
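The simple MM estimator can be sketched by solving the sample analogues of the two moment conditions directly (simulated data; names are illustrative). Since the conditions coincide with the OLS normal equations, the two estimators agree exactly.

```python
# Simple method-of-moments estimator: impose sample analogues of E[e]=0 and
# E[eX]=0 and solve the resulting 2x2 system.
import numpy as np

rng = np.random.default_rng(3)
n = 60
X1 = rng.uniform(0, 5, n)
Y = 2.0 + 1.5 * X1 + rng.normal(0, 1, n)

# Moment conditions (1) and (2), written as A b = c
A = np.array([[n,         X1.sum()],
              [X1.sum(), (X1 ** 2).sum()]])
c = np.array([Y.sum(), (Y * X1).sum()])
b_mm = np.linalg.solve(A, c)

# OLS gives the same answer: OLS is an MM estimator
b_ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X1]), Y, rcond=None)
print(np.allclose(b_mm, b_ols))
```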
Stata GMM estimation:

Number of parameters = 9
Number of moments = 9
Initial weight matrix: Unadjusted    Number of obs = 50
GMM weight matrix: Robust

• OLS is an MM estimator.
• The Stata command is as follows:
  gmm (csat - {xb: expense percent income high college [Link] _cons}), instruments(expense percent income high college [Link])

                      Robust
           Coef.      Std. Err.  z       P>|z|  [95% Conf. Interval]
expense    -.002021   .0032485   -0.62   0.534  -.008388    .004346
percent    -3.007647  .2135325   -14.09  0.000  -3.426163   -2.589131
income     -.1674421  1.083308   -0.15   0.877  -2.290688   1.955803
high       1.814731   .9298204   1.95    0.051  -.0076834   3.637146
college    4.670564   1.448589   3.22    0.001  1.831381    7.509746
$$E(\varepsilon_i\,|\,X)=0$$

• The zero conditional mean implies that the unconditional mean is also zero; the converse is not true:

$$E(\varepsilon_i)=E_X\!\left[E(\varepsilon_i\,|\,X)\right]=E_X[0]=0$$

• This assumption also implies that $E(\varepsilon_i X)=0$, so that

$$E(Y\,|\,X)=\beta_0+\beta_1X,\qquad \varepsilon\,|\,X\sim N(0,\sigma^2)$$
• Assumption 5: Full rank / no multicollinearity: There is no exact linear relationship among any of the independent variables in the model. This assumption is necessary for estimation of the parameters of the model.
• Hence, X has full column rank: the columns of X are linearly independent and there are at least K observations. This assumption is known as an identification condition.
• This assumption packs three assumptions in itself:
✓ (1) There should not be any perfect multicollinearity between any of the regressors.
✓ (2) The number of observations (n, rows of matrix X) should be greater than the number of regressors (k, columns of matrix X): n > k.
✓ (3) For the case of simple linear regression, all the values of the regressor X should not be the same. Technically, Var(X) must be a finite positive number.
Note: X values are fixed in repeated sampling; the values taken by the regressor X are considered fixed in repeated samples. More technically, X is assumed to be nonstochastic, and the X values in a given sample must not all be the same.
• Assumption 6: No correlation between the error term and the regressors: $\operatorname{Cov}(\varepsilon_i, X)=0$.
• Assumption 7: Linearity: The model specifies a linear relationship between Y and the Xs. In the regression context, linearity refers to the manner in which the parameters and the disturbance enter the equation, not necessarily to the relationship among the variables. So, the regression model is linear in the parameters and the error term:

$$Y=\beta_0+\beta_1X+\varepsilon$$
• Efficiency: - If these assumptions hold true, the OLS procedure creates the best possible
estimates, BLUE. In statistics, estimators that produce unbiased estimates that have the smallest
variance are referred to as being “efficient.” Efficiency is a statistical concept that compares the
quality of the estimates calculated by different procedures while holding the sample size
constant. OLS is the most efficient linear regression estimator when the assumptions hold true.
• Convergence: - Another benefit of satisfying these assumptions is that as the sample size
increases to infinity, the coefficient estimates converge on the actual population parameters.
• Reliability: - If your error term also follows the normal distribution, you can safely use
hypothesis testing to determine whether the independent variables and the entire model are
statistically significant. You can also produce reliable confidence intervals and prediction
intervals.
A) Consistency of the Least Squares Estimator

From the population, $Y=X\beta+\varepsilon$ with $E(\varepsilon\,|\,X)=0$, so $\operatorname{plim}\dfrac{X'\varepsilon}{n}=0$. Then

$$\hat{\beta}=(X'X)^{-1}X'Y=(X'X)^{-1}X'(X\beta+\varepsilon)=\beta+(X'X)^{-1}X'\varepsilon$$

$$\hat{\beta}-\beta=(X'X)^{-1}X'\varepsilon
\qquad\Rightarrow\qquad
\hat{\beta}=\beta+\left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n}$$

Taking probability limits, with $\operatorname{plim}\dfrac{X'X}{n}=Q$:

$$\operatorname{plim}_{n\to\infty}\hat{\beta}=\beta+Q^{-1}\operatorname{plim}_{n\to\infty}\frac{X'\varepsilon}{n}=\beta+Q^{-1}\cdot 0=\beta$$
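Consistency can be illustrated by simulation: as the sample size grows, β̂ concentrates around the true β. The sketch below (simulated data; the sample sizes and tolerance are illustrative) compares the estimation error at a small and a very large n.

```python
# Illustration of consistency: beta-hat approaches beta as n grows.
import numpy as np

rng = np.random.default_rng(4)
beta = np.array([1.0, 2.0])   # true population coefficients

def beta_hat(n):
    """OLS estimate from one simulated sample of size n."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    eps = rng.normal(size=n)
    Y = X @ beta + eps
    return np.linalg.solve(X.T @ X, X.T @ Y)

err_small = np.abs(beta_hat(50) - beta).max()
err_large = np.abs(beta_hat(100_000) - beta).max()
print(err_small, err_large)   # the large-sample error is tiny
```

With n = 100,000 the sampling standard error is roughly 1/√n ≈ 0.003, so the large-sample estimate sits very close to the true β, as the plim result predicts.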
B) Asymptotic Normality of the Least Squares Estimator
• Consistency is an improvement over unbiasedness.
• To derive the asymptotic distribution of the least squares estimator, we make use of some basic central limit theorems. For large samples:

$$\hat{\beta}=\beta+\left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n}
\qquad\Rightarrow\qquad
\sqrt{n}\left(\hat{\beta}-\beta\right)=\left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{\sqrt{n}}$$

Since $\operatorname{plim}\dfrac{X'X}{n}=Q$ and a central limit theorem gives $\dfrac{X'\varepsilon}{\sqrt{n}}\xrightarrow{d}N(0,\sigma^2 Q)$, it follows that

$$\sqrt{n}\left(\hat{\beta}-\beta\right)\xrightarrow{d}N\!\left(0,\sigma^2Q^{-1}\right)$$

The asymptotic variance follows:

$$\operatorname{Var}(\hat{\beta})=\operatorname{plim}_{n\to\infty}\left(\hat{\beta}-\beta\right)\left(\hat{\beta}-\beta\right)'
=Q^{-1}\,\operatorname{plim}_{n\to\infty}\!\left(\frac{X'\varepsilon}{n}\,\frac{\varepsilon'X}{n}\right)Q^{-1}
=\frac{\sigma^2}{n}Q^{-1}$$
Grenander conditions: conditions on the regressors under which the OLS estimator will be consistent. The Grenander conditions are weaker than the assumption that $\lim_{n\to\infty}X'X/n$ is a fixed positive definite matrix, which is a common starting assumption. See Greene, 2nd ed., 1993, p. 295.
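The limiting distribution √n(β̂ − β) → N(0, σ²Q⁻¹) can be checked by Monte Carlo. In the sketch below (simulated data; replication counts are illustrative) the single regressor is standardized so Q = E[x²] = 1 and σ² = 1, so the scaled estimation error should have mean near 0 and variance near 1.

```python
# Monte Carlo sketch of sqrt(n)(b1-hat - b1) -> N(0, sigma^2 / Q) with Q = 1.
import numpy as np

rng = np.random.default_rng(5)
n, reps, b1 = 200, 5000, 0.7
draws = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)            # E[x^2] = 1, so Q = 1
    y = b1 * x + rng.normal(size=n)   # sigma^2 = 1; no intercept for simplicity
    b1_hat = (x @ y) / (x @ x)        # OLS slope without a constant
    draws[r] = np.sqrt(n) * (b1_hat - b1)

print(draws.mean())   # should be near 0
print(draws.var())    # should be near sigma^2 / Q = 1
```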
(D) Properties of Maximum Likelihood Estimation (MLE)
• An alternative to the least-squares method is the method of maximum likelihood (ML).
• MLE is a method for estimating the parameters of a statistical model. Given the distribution of a statistical model f(y; θ) with unknown deterministic parameter θ, MLE estimates θ by maximizing the probability f(y; θ) given the observations y.
• To use this method, however, one must make an assumption about the probability distribution of the disturbance term.
• In the regression context, the assumption most popularly made is that the error term follows the normal distribution:

$$LF(\beta,\sigma^2)=\frac{1}{\sigma^n(2\pi)^{n/2}}\exp\!\left(-\frac{\sum (Y-X\hat{\beta})^2}{2\sigma^2}\right)$$

$$\ln LF(\cdot\,|\,Y,X)=-n\ln\sigma-\frac{n}{2}\ln(2\pi)-\frac{\sum(Y-X\hat{\beta})^2}{2\sigma^2}
=-\frac{n}{2}\ln\sigma^2-\frac{n}{2}\ln(2\pi)-\frac{\sum(Y-X\hat{\beta})^2}{2\sigma^2}$$

$$\ln LF=-\frac{n}{2}\ln\sigma^2-\frac{n}{2}\ln(2\pi)-\frac{Y'Y-Y'X\hat{\beta}-\hat{\beta}'X'Y+\hat{\beta}'X'X\hat{\beta}}{2\sigma^2}
\quad\Rightarrow\quad \hat{\sigma}^2=\frac{e'e}{n}$$

• Maximum likelihood estimators (MLEs) are most attractive because of their large-sample (asymptotic) properties.
• Maximum likelihood estimators (MLEs) are most attractive because of their
large sample or asymptotic properties.
• Under regularity, the maximum likelihood estimator (MLE) has the
following asymptotic properties:
✓ Consistency
✓ Asymptotic normality
✓ Asymptotic efficiency
✓ Invariance
Consistency: $\hat{\theta}\to\theta$ as $n\to\infty$. Assignment: Prove it.

The log-likelihood, with $\theta=(\beta,\sigma^2)$:

$$\ln LF=-\frac{n}{2}\ln\hat{\sigma}^2-\frac{n}{2}\ln(2\pi)-\frac{Y'Y-Y'X\hat{\beta}-\hat{\beta}'X'Y+\hat{\beta}'X'X\hat{\beta}}{2\hat{\sigma}^2}$$

Score and Hessian (Assignment: Prove them):

$$\frac{\partial\ln LF}{\partial\hat{\beta}}=\frac{X'(Y-X\hat{\beta})}{\hat{\sigma}^2},\qquad
\frac{\partial\ln LF}{\partial\hat{\sigma}^2}=-\frac{n}{2\hat{\sigma}^2}+\frac{(Y-X\hat{\beta})'(Y-X\hat{\beta})}{2\hat{\sigma}^4}$$

$$\frac{\partial^2\ln LF}{\partial\hat{\beta}\,\partial\hat{\beta}'}=-\frac{X'X}{\hat{\sigma}^2},\qquad
\frac{\partial^2\ln LF}{\partial\hat{\beta}\,\partial\hat{\sigma}^2}=-\frac{X'(Y-X\hat{\beta})}{\hat{\sigma}^4},\qquad
\frac{\partial^2\ln LF}{\partial(\hat{\sigma}^2)^2}=\frac{n}{2\hat{\sigma}^4}-\frac{(Y-X\hat{\beta})'(Y-X\hat{\beta})}{\hat{\sigma}^6}$$

The Cramer-Rao lower bound is the inverse information matrix:

$$CRLB=I(\theta)^{-1}=\begin{pmatrix}\sigma^2(X'X)^{-1} & 0\\[2pt] 0 & \dfrac{2\sigma^4}{n}\end{pmatrix}$$
The CRLB gives the smallest variance that can be attained by a consistent estimator; in this regard, the MLE estimators are considered efficient. The variance of any unbiased estimator is greater than or equal to the CRLB. The variance of the MLE coefficient is the same as that of OLS, but the variance of S² for MLE is lower than that of OLS:

$$\operatorname{Var}(S^2)_{MLE}=\frac{2\sigma^4}{n},\qquad \operatorname{Var}(S^2)_{OLS}=\frac{2\sigma^4}{n-K}$$
(D) Invariance
• The invariance property is a mathematical result of the method of computing MLEs; it is not a statistical
result as such.
• More formally, the MLE is invariant to one-to-one transformations of θ. Any transformation that is not one-to-one either renders the model inestimable (if it is one-to-many) or imposes restrictions (if it is many-to-one).
• Some theoretical aspects of this feature are discussed in Davidson and MacKinnon (2004, pp. 446, 539–540).
For the practitioner, the result can be extremely useful.
• For example, when a parameter appears in a likelihood function in the form 1/θj, it is usually worthwhile to
reparametrize the model in terms of γj = 1/θj. In an important application, Olsen (1978) used this result to
great advantage.
Example: Log-Likelihood Function and Likelihood Equations for the Normal Distribution
• In sampling from a normal distribution with mean μ and variance σ 2, the log-likelihood function and the
likelihood equations for μ and σ 2 are
$$Y=X\hat{\beta}+e,\qquad Y_i=\hat{\beta}_0+\hat{\beta}_1X_{i1}+\hat{\beta}_2X_{i2}+\dots+\hat{\beta}_kX_{ik}+e_i$$

$$e=Y-X\hat{\beta}=Y-X(X'X)^{-1}X'Y$$

C) MX = 0
(B) Projection Matrix

$$Y=X\hat{\beta}+e=\hat{Y}+e$$

$$e=MY \quad\text{where}\quad M=I-X(X'X)^{-1}X' \quad\text{(the residual maker)}$$

$$\hat{Y}=Y-e=Y-MY=(I-M)Y=PY \quad\text{where}\quad P=X(X'X)^{-1}X'$$

Properties (Assignment: prove these properties of the P and M matrices):
✓ P is symmetric
✓ P is idempotent
✓ M and P are orthogonal
✓ PX = X

• For an example of how this matrix is used, consider the case when we want to transform a single variable x. In the single-variable case, the sum of squared deviations about the mean is given by (Greene, 2003, p. 808; Searle, 1982, p. 68)

$$\sum_{i=1}^{n}(x_i-\bar{x})^2=x'M^0x \quad\text{where}\quad M^0=I-\frac{1}{n}ii'$$
• It can easily be shown that M⁰ is symmetric and idempotent, so that (M⁰)′ = M⁰ and M⁰M⁰ = M⁰. Therefore $\sum(x_i-\bar{x})^2=(M^0x)'(M^0x)=x'M^0x$.
• For two variables x and y, the sums of squares and cross products in deviations from their means are given by (Greene, 2003, p. 809)

$$\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})=x'M^0y$$

• In matrix form, using both the projection and residual maker matrices:

$$Y=\hat{Y}+e,\qquad e=MY \ \text{ where } M=I-X(X'X)^{-1}X',\qquad \hat{Y}=(I-M)Y=PY \ \text{ where } P=X(X'X)^{-1}X'$$

• The Pythagorean theorem at work in the sums of squares:

$$Y=PY+MY \quad\Rightarrow\quad Y'Y=Y'P'PY+Y'M'MY=\hat{Y}'\hat{Y}+e'e$$

• In manipulating equations involving least squares results, the following equivalent expressions for the sum of squared residuals are often useful:

$$e'e=Y'M'MY=Y'MY=Y'e=e'Y$$

$$e'e=Y'Y-\hat{Y}'\hat{Y}=Y'Y-\hat{\beta}'X'X\hat{\beta}=Y'Y-\hat{\beta}'X'Y=Y'Y-Y'X\hat{\beta}$$

N.B.: $\hat{\beta}=(X'X)^{-1}(X'Y)\ \Rightarrow\ (X'X)\hat{\beta}=X'Y$.
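The projection-matrix properties can be checked numerically on a small simulated design matrix (sizes and seed are illustrative):

```python
# Numerical check of the P and M matrix properties: P symmetric, P idempotent,
# PX = X, and M orthogonal to P.
import numpy as np

rng = np.random.default_rng(6)
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection ("hat") matrix
M = np.eye(n) - P                      # residual maker

print(np.allclose(P, P.T))        # symmetric
print(np.allclose(P @ P, P))      # idempotent
print(np.allclose(P @ X, X))      # PX = X
print(np.allclose(M @ P, 0))      # M and P orthogonal
```

All four checks print True, matching the properties listed on the slide.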
(E) Estimating the Variance of the Estimator
• The least squares residuals are

$$e=MY=M(X\beta+\varepsilon)=M\varepsilon \quad\text{since}\quad MX=0$$

• An estimator of σ² will be based on the sum of squared residuals:

$$e'e=\varepsilon'M\varepsilon,\qquad E[e'e\,|\,X]=E[\varepsilon'M\varepsilon\,|\,X]$$

• The scalar ε′Mε is a 1 × 1 matrix, so it is equal to its trace. Using the result on cyclic permutations:

$$E[\operatorname{tr}(\varepsilon'M\varepsilon)\,|\,X]=E[\operatorname{tr}(M\varepsilon\varepsilon')\,|\,X]$$

• Since M is a function of X, the result is

$$E[\operatorname{tr}(M\varepsilon\varepsilon')\,|\,X]=\operatorname{tr}\!\left(M\,E[\varepsilon\varepsilon'\,|\,X]\right)=\operatorname{tr}(M\sigma^2I)=\sigma^2\operatorname{tr}(M)
\quad\Rightarrow\quad E[e'e]=(n-k)\sigma^2$$

Decomposition of the variation in Y:

$$Y=\hat{Y}+e \quad\Rightarrow\quad Y-\bar{Y}=(\hat{Y}-\bar{Y})+e \quad\text{(since }\bar{e}=0\text{)}$$

$$\sum y^2=\sum\hat{y}^2+\sum e^2+2\sum\hat{y}e=\sum\hat{y}^2+\sum e^2 \quad\text{(the cross term is zero)}$$

$$TSS=ESS+RSS,\qquad \operatorname{Var}(Y)=\operatorname{Var}(\hat{Y})+\operatorname{Var}(e),\qquad M^0Y=M^0X\hat{\beta}+M^0e$$

• Intuitively, the regression would appear to fit well if the deviations of y from its mean are more largely accounted for by deviations of x from its mean than by the residuals. Since both terms in this decomposition sum to zero, to quantify this fit we use the sums of squares instead.
• Here M⁰ is the n × n idempotent matrix that transforms observations into deviations from sample means. The column of M⁰X corresponding to the constant term is zero, and, since the residuals already have mean zero, M⁰e = e.
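The key fact behind the unbiased variance estimator is tr(M) = n − k, so E[e′e] = (n − k)σ² motivates s² = e′e/(n − k). A quick numerical check on simulated data (sizes and seed illustrative):

```python
# Check that tr(M) = n - k, which underlies s^2 = e'e / (n - k).
import numpy as np

rng = np.random.default_rng(7)
n, k = 30, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.arange(1.0, k + 1) + rng.normal(size=n)

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # residual maker
e = M @ Y                                           # residuals via M
s2 = (e @ e) / (n - k)                              # unbiased estimator of sigma^2

print(round(np.trace(M)))                           # tr(M) = n - k
print(np.allclose(e, Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]))
```

The trace equals n − k = 26 here, and the residuals computed via M coincide with those from a direct least-squares fit.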
The coefficient of determination (R²)
• As we have shown, it must be between 0 and 1, and it measures the proportion of the total variation in y that is accounted for by variation in the regressors.
• It equals zero if the regression is a horizontal line, that is, if all the elements of b except the constant term are zero. In this case, the predicted values of y are always ȳ, so deviations of x from its mean do not translate into different predictions for y. As such, x has no explanatory power.
• There are some problems with the use of R² in analyzing goodness of fit. The first concerns the number of degrees of freedom used up in estimating the parameters: R² will never decrease when another variable is added to a regression equation.
$$TSS=ESS+RSS \quad\Rightarrow\quad 1=\frac{ESS}{TSS}+\frac{RSS}{TSS}=R^2+\frac{RSS}{TSS}$$

$$R^2=1-\frac{RSS}{TSS},\qquad \text{Adj. }R^2=1-\frac{RSS/(n-k)}{TSS/(n-1)}$$

• The other extreme, R² = 1, occurs if the values of x and y all lie in the same hyperplane (on a straight line for a two-variable regression) so that the residuals are all zero. If all the values of yi lie on a vertical line, then R² has no meaning and cannot be computed.
• Regression analysis is often used for forecasting. In this case, we are interested in how well the regression model predicts movements in the dependent variable.
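The R² and adjusted R² formulas can be sketched directly from the TSS = ESS + RSS decomposition (simulated data; sizes and coefficients are illustrative):

```python
# Computing R^2 and adjusted R^2 from the sums-of-squares decomposition.
import numpy as np

rng = np.random.default_rng(8)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 0.8, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b
e = Y - Y_hat

TSS = np.sum((Y - Y.mean()) ** 2)       # total sum of squares
ESS = np.sum((Y_hat - Y.mean()) ** 2)   # explained sum of squares
RSS = np.sum(e ** 2)                    # residual sum of squares

R2 = 1 - RSS / TSS
adj_R2 = 1 - (RSS / (n - k)) / (TSS / (n - 1))

print(np.isclose(TSS, ESS + RSS))       # the decomposition holds
print(R2, adj_R2)                       # adjusted R^2 penalizes extra regressors
```

Note that adjusted R² is always below R² whenever k > 1 and the fit is not perfect, which is the degrees-of-freedom penalty discussed above.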
Summary of ANOVA
When one of the dummy categories is dropped, the coefficients tell us the marginal difference of Y for each dummy category compared with the reference category (the constant term). The constant term captures the mean value of Y for the reference category. Assignment: prove that the coefficients equal the marginal differences of Y across categories.
• In the example, to distinguish the three regions we used only two dummy variables, D2 and D3. Why did we not use three dummies to distinguish the three regions? Suppose we did that and wrote the model accordingly.
• Inserting all dummy categories along with an intercept leads to a case of perfect collinearity, that is, exact linear relationships among the variables. (Why?) So we are forced to drop one of the dummy categories, as stated above. But if you are interested in keeping all of them, drop the constant term instead.
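The dummy-variable trap can be demonstrated directly: with an intercept and all three region dummies, the dummies sum to the constant column, so X has deficient rank and X′X is singular. (The three-region assignment below is hypothetical illustration data.)

```python
# Dummy-variable trap: intercept + full set of dummies is perfectly collinear.
import numpy as np

regions = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])   # three regions, 9 observations
D = np.eye(3)[regions]                             # full set of region dummies

X_trap = np.column_stack([np.ones(9), D])          # intercept + ALL dummies
X_ok = np.column_stack([np.ones(9), D[:, 1:]])     # drop one reference category

print(np.linalg.matrix_rank(X_trap))   # 3 < 4 columns: perfect collinearity
print(np.linalg.matrix_rank(X_ok))     # 3 = full column rank, estimable
```

Because D1 + D2 + D3 equals the constant column, `X_trap` has rank 3 despite having 4 columns; dropping one category (or the constant) restores full column rank.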
i A B C D E F G H I J K L M N O
Efficiency 401 132 140 190 230 390 125 260 270 276 280 290 300 280 275
Price 77 22 24 30 32 82 20 55 60 62 69 75 77 75 70
• Given the above information, show all the necessary Steps to prove:
a) Compute estimate of coefficients (Slope and Intercept)
b) Figure out the error terms for each firm
c) Compute the TSS, ESS, and RSS
d) Compute the coefficient of determination
e) Compute the standard error of each coefficient
f) Compute the lower and upper confidence interval at 5%
g) Compute the F test and t test and make a decision about the model and policy
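Steps (a)-(d) can be sketched on the data above using the simple-regression formulas from the text. (The sketch regresses Price on Efficiency, an assumed direction for the exercise; the remaining steps follow the same pattern.)

```python
# Sketch of steps (a)-(d) for the efficiency/price data (15 firms).
import numpy as np

eff = np.array([401, 132, 140, 190, 230, 390, 125, 260, 270, 276,
                280, 290, 300, 280, 275], dtype=float)
price = np.array([77, 22, 24, 30, 32, 82, 20, 55, 60, 62,
                  69, 75, 77, 75, 70], dtype=float)

x, y = eff - eff.mean(), price - price.mean()   # deviations from means
b1 = (x @ y) / (x @ x)                          # (a) slope
b0 = price.mean() - b1 * eff.mean()             # (a) intercept
e = price - (b0 + b1 * eff)                     # (b) residuals per firm

TSS = y @ y                                     # (c) total sum of squares
RSS = e @ e                                     # (c) residual sum of squares
ESS = TSS - RSS                                 # (c) explained sum of squares
R2 = ESS / TSS                                  # (d) coefficient of determination

print(b1, b0, R2)
```

The slope is positive, as the data suggest (more efficient firms command higher prices); standard errors, confidence intervals, and the F and t tests in (e)-(g) follow from RSS and the formulas in the text.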
• Question 2: Assume that you are assigned to conduct research about the rural-urban migration problem in Addis Ababa with a sample size of 3240, and to investigate the relationship between laborers' hourly wage (Y) and their age (X). Preliminary analysis of the sample data produces the following sample information:
$$\sum_{i=1}^{N}y_i^2=78434.97,\qquad \sum_{i=1}^{N}x_i^2=25526.17,\qquad \sum_{i=1}^{N}x_iy_i=3666.426$$

$$\sum_{i=1}^{N}Y_i=34379.16,\qquad \sum_{i=1}^{N}X_i=96143.00,\qquad \sum_{i=1}^{N}Y_i^2=443227.1$$

$$\sum_{i=1}^{N}X_i^2=2878451.0,\qquad \sum_{i=1}^{N}X_iY_i=1023825.0,\qquad \sum_{i=1}^{N}\hat{\varepsilon}_i^2=77908.35$$

(lower-case x and y denote deviations from the sample means)
✓ (a) Compute the value of the coefficient of determination and briefly explain what it means
✓ (b) Is age statistically significant in explaining the variation in hourly wage? Which hypothesis do you support?
✓ (c) Calculate the value of the F-statistic
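A sketch of how the reported sums answer (a) and (c), using the simple-regression formulas (lower-case x and y are deviations from means, n = 3240; the slope formula and the one-regressor F-statistic are standard results consistent with the text):

```python
# Sketch for Question 2 using only the reported deviation sums.
n = 3240
sum_y2 = 78434.97      # sum of y_i^2 (deviations of wage)
sum_x2 = 25526.17      # sum of x_i^2 (deviations of age)
sum_xy = 3666.426      # sum of x_i * y_i

b1 = sum_xy / sum_x2                       # slope estimate
R2 = sum_xy ** 2 / (sum_x2 * sum_y2)       # (a) coefficient of determination
F = (R2 / (1 - R2)) * (n - 2)              # (c) F-statistic with 1 and n-2 df

print(b1, R2, F)
```

The very small R² says that age alone explains well under 1% of the variation in hourly wage, yet with n = 3240 the F-statistic is still large, so age is statistically significant even though its explanatory power is modest; that tension is the point of part (b).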