UNIT FOUR: VIOLATIONS OF ECONOMETRIC ASSUMPTIONS (TOPICS IN MULTIPLE REGRESSION)
4.0 Aims and objectives
The aim of this unit is to show the reader what is meant by violation of the basic econometric assumptions that form the basis of the classical linear regression model. After completing this unit, the student will understand:
- the sources of the violations
- the consequences of each problem
- the various ways of detecting each problem
- the alternative approaches to solving each problem
4.1 Introduction
Recall that in the classical model we have assumed
a) Zero mean of the random term
b) Constant variance of the error term (i.e., the assumption of homoscedasticity)
c) No autocorrelation of the error term
d) Normality of the error term
e) No multicollinearity among the explanatory variables.
It was on the basis of these assumptions that we estimated the model and tested its significance. But the question is: what would be the implications if some or all of these assumptions were violated? That is, if the assumptions are not fulfilled, what will be the outcome? In this unit we discuss violations of the more important of these assumptions.
4.2 The assumption of zero expected disturbances
This assumption is imposed by the stochastic nature of economic relationships, without which it would be impossible to estimate them with the ordinary rules of mathematics. The assumption implies that the observations of Y and X must be scattered around the line in a random way (and hence the estimated line Ŷ = β̂₀ + β̂₁X is a good approximation of the true line). It defines the relationship connecting Y and X 'on the average'. The alternative possible assumptions are either E(U) > 0 or E(U) < 0. Assume that for some reason the U's did not have an average value of zero but tended, most of them, to be positive. This would imply that the observations of Y and X would lie above the true line.
It can be shown that by using these observations we would get a bad estimate of the true line: if the true line lies below or above the observations, the estimated line is biased. A figure of this case (not reproduced here) would show that the estimated line Ŷ is not a good approximation to the true line, E(Y).
Note that there is no test for the verification of this assumption, because the assumption E(U) = 0 is forced upon us if we are to establish the true relationship. That is, we set E(U) = 0 at the outset of our estimation procedure. Its plausibility should be examined in each particular case on a priori grounds. In any econometric application we must be sure that the following conditions are fulfilled so as to be safe from violating the assumption E(U) = 0:
i) All the important variables have been included in the function.
ii) There are no systematically positive or systematically negative errors of measurement in the dependent variable.

4.3 The Assumption of Homoscedasticity


A) The Nature of Heteroscedasticity
The assumption of homoscedasticity (or constant variance) about the random variable U is that its probability distribution remains the same over all observations of X, and in particular that the variance of each Uᵢ is the same for all values of the explanatory variable. Symbolically,
Var(Uᵢ) = E[Uᵢ − E(Uᵢ)]² = E(Uᵢ²) = σᵤ², a constant.
If this is not satisfied in any particular case, we say that the U's are heteroscedastic; that is, Var(Uᵢ) = σᵤᵢ², not constant. The meaning of homoscedasticity is that the variation of each Uᵢ around its zero mean does not depend on the value of X; that is, σᵤᵢ² ≠ f(Xᵢ). Consider the following diagram.
Note that if σᵤ² is not constant but its value depends on X, we may write σᵤᵢ² = f(Xᵢ). As shown in the diagrams there are various forms of heteroscedasticity. For example, in figure (c) the variance of Uᵢ decreases as X increases. In figure (b) we picture the case of (monotonically) increasing variance of the Uᵢ's: as X increases, so does the variance of U. This is a common form of heteroscedasticity assumed in econometric applications. That is, the larger the independent variable, the larger the variance of the associated disturbance. Various examples can be cited in support of this argument. For instance, if consumption is a function of the level of income, at higher levels of income (the independent variable) there is greater scope for the consumer to act on whims and deviate by larger amounts from the specified consumption relationship. The following diagram depicts this case.

Cons

Income
Low High
income income

Figure 4.3: Increasing variance of U


Furthermore, suppose we have a cross-section sample of family budgets from which we want to measure the savings function, Saving = f(Income). In this case the assumption of constant variance of the U's is not appropriate, because high-income families show a much greater variability in their saving behavior than do low-income families. Families with high income tend to stick to a certain standard of living, and when their income falls they cut down their savings rather than their consumption expenditure; this is not the case in low-income families. Hence, the variance of the Uᵢ's increases as income increases.
Note, however, that heteroscedasticity is chiefly a problem of cross-sectional rather than time series data. That is, the problem is more serious in cross-section data.
B) Causes of Heteroscedasticity
Heteroscedasticity can arise for several reasons. The first is the presence of outliers (i.e., values extreme relative to the majority of observations on a variable). The inclusion or exclusion of such an observation, especially if the sample size is small, can substantially alter the results of a regression analysis. With outliers it would be hard to maintain the assumption of homoscedasticity.

Another source of heteroscedasticity is violation of the assumption that the regression model is correctly specified. Very often what looks like heteroscedasticity is due to the fact that some important variables are omitted from the model. In such a situation the residuals obtained from the regression may give the distinct impression that the error variance is not constant; if the omitted variables are included in the model, the impression may disappear.
In summary, we may say that on a priori grounds there are reasons to believe that the assumption of homoscedasticity may often be violated in practice. It is therefore important to examine the consequences of heteroscedasticity.
C) The Consequences of Heteroscedasticity
If the assumption of homoscedastic disturbances is not fulfilled, we have the following consequences:
i) If U is heteroscedastic, the OLS estimators do not have the minimum variance property in the class of unbiased estimators; that is, they are inefficient in small samples. Furthermore, they remain inefficient in large samples.
ii) The coefficient estimates are still statistically unbiased. That is, the expected value of the β̂'s equals the true parameters: E(β̂ᵢ) = βᵢ.
iii) The prediction (of Y for a given value of X) is inefficient because of its high variance. This is because the variance of the prediction includes the variance of U and the variances of the parameter estimates, which are not minimal due to the incidence of heteroscedasticity.
In any case, how does one detect whether the problem really exists?
D) Detecting the Problem of Heteroscedasticity
The usual first step in attacking this problem is to determine whether or not heteroscedasticity actually exists. There are several tests for this, based on the examination of the OLS residuals Ûᵢ. In this respect the rules are:
i) Visual Inspection of Residuals
This is a post mortem approach in the sense that there is no a priori information as to the existence of heteroscedasticity. The approach examines whether the residuals depict some systematic pattern or not. To this end the residuals are plotted against the dependent or independent variable to which the disturbance variance is suspected to be related. Although the Ûᵢ² are not the same thing as the Uᵢ², they can be used as proxies, especially if the sample size is sufficiently large.
The following figure shows the plot of Ûᵢ² against Ŷᵢ (the estimated Yᵢ from the regression line) or X, the idea being to find out whether the estimated mean value of Y or X is systematically related to the squared residual.
In figure (a) we see that there is no systematic relationship between the two variables, suggesting that perhaps no heteroscedasticity is present in the data. Figures (b) and (c), however, suggest a linear relationship between the two variables; in particular, figure (c) suggests that the heteroscedastic variance may be proportional to the value of Y or X. Figures (d) and (e) indicate a quadratic relationship between Ûᵢ² and Ŷᵢ or X. Such knowledge may help us transform our data in such a manner that, in the regression on the transformed data, the variance of the disturbance is homoscedastic. Note that this visual inspection method is also known as the informal method. The tests that follow are formal methods.
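To make the informal method concrete, the following is a minimal sketch of such a residual plot in Python. It assumes y and x are NumPy arrays holding the sample observations (hypothetical names); any standard OLS routine would serve equally well.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_squared_residuals(y, x):
    # Fit Y = b0 + b1*X by OLS.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    u_hat_sq = (y - y_hat) ** 2
    # Plot the squared residuals against the fitted values (or against x).
    plt.scatter(y_hat, u_hat_sq)
    plt.xlabel("fitted values (Y hat)")
    plt.ylabel("squared residuals")
    plt.show()
```

A fan or funnel shape in this plot corresponds to the patterned figures (b)-(e) described above.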
ii) Park Test
Park formalizes the graphical method by suggesting that σᵢ² is some function of the explanatory variable Xᵢ. The functional form he suggested is
Var(Uᵢ) = σᵢ² = σ²Xᵢ^β e^(Vᵢ) .......................................(4.1)
where Vᵢ is a stochastic disturbance term.
The logarithmic form of (4.1) may be written as
ln σᵢ² = ln σ² + β ln Xᵢ + Vᵢ .......................................(4.2)
Since σᵢ² is generally not known, Park suggests using Ûᵢ² as a proxy and running the regression
ln Ûᵢ² = ln σ² + β ln Xᵢ + Vᵢ = α + β ln Xᵢ + Vᵢ .......................................(4.3)
If β turns out to be statistically significant, it suggests that heteroscedasticity is present in the data; if it turns out to be insignificant, we may accept the assumption of homoscedasticity. The Park test is thus a two-stage procedure. In the first stage we run the OLS regression disregarding the heteroscedasticity question and obtain the residuals Ûᵢ; in the second stage we run the regression stated in (4.3).
Example: Consider a relationship between compensation (Y) and productivity (X). To illustrate the Park approach, the following regression function is used:
Yᵢ = β₀ + β₁Xᵢ + Uᵢ .......................................(4.4)
Suppose data on Y and X give the following result:
Ŷ = 1992.34 + 0.23Xᵢ
s.e. = (936.48) (0.09)
t = (2.13) (2.33)   r² = 0.44
Suppose that the residuals obtained from the above regression are regressed on Xᵢ as suggested in (4.3), giving the following results:
ln Ûᵢ² = 35.82 − 2.81 ln Xᵢ
s.e. = (38.32) (4.22)
t = (0.93) (−0.67)   r² = 0.46
As the t value shows, the coefficient of ln Xᵢ is not significant; that is, there is no statistically significant relationship between the two variables. Following the Park test, one may conclude that there is no heteroscedasticity in the error variance.
Although empirically appealing, the Park test has some problems. For instance, the error term Vᵢ entering (4.3) may not satisfy the OLS assumptions and may itself be heteroscedastic. Nonetheless, as a strictly exploratory method, one may use the Park test.
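As an illustration of the two-stage procedure, here is a minimal Python sketch of the Park test, assuming y and x are NumPy arrays with strictly positive x (hypothetical names); it is a sketch of the computation in (4.3), not a substitute for a full diagnostic routine.

```python
import numpy as np
from scipy import stats

def park_test(y, x):
    # Stage 1: OLS of Y on X, disregarding heteroscedasticity.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_sq = (y - X @ beta) ** 2
    # Stage 2: regress ln(u_hat^2) on ln(X), as in (4.3).
    Z = np.column_stack([np.ones_like(x), np.log(x)])
    g, *_ = np.linalg.lstsq(Z, np.log(u_sq), rcond=None)
    resid = np.log(u_sq) - Z @ g
    n = len(y)
    s2 = resid @ resid / (n - 2)
    se_slope = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
    t_stat = g[1] / se_slope
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
    # A significant slope suggests heteroscedasticity.
    return g[1], t_stat, p_value
```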
iii) Spearman's Rank Correlation Test
This test requires calculating the rank correlation coefficient, which can be used to detect heteroscedasticity. The rank correlation coefficient is given by
rₛ = 1 − 6[Σdᵢ² / (n(n² − 1))] .............................................(4.5)
where dᵢ = the difference between the ranks assigned to the two characteristics of the i-th individual or phenomenon, and n = the number of individuals or phenomena ranked. The steps required in this test are as follows.
Assume Yᵢ = β₀ + β₁Xᵢ + Uᵢ.
Step 1. Fit the regression to the data on Y and X and obtain the residuals Ûᵢ.
Step 2. Ignoring the sign of the Ûᵢ, that is, taking their absolute values |Ûᵢ|, rank both |Ûᵢ| and Xᵢ (or Ŷᵢ) in ascending or descending order and compute the Spearman rank correlation coefficient given in (4.5).
Step 3. Assuming that the population rank correlation coefficient ρₛ is zero and n > 8, the significance of the sample rₛ can be tested by the t test:
t = rₛ√(n − 2) / √(1 − rₛ²) ........................................... (4.6)
with df = n − 2.
If the computed t value exceeds the critical t value, we may accept the hypothesis of heteroscedasticity; otherwise we may reject it. If the regression model involves more than one X variable, rₛ can be computed between |Ûᵢ| and each of the X variables separately and tested for statistical significance by the t test given above.

Example: To illustrate the rank correlation test, consider the regression Yᵢ = β₀ + β₁Xᵢ + Uᵢ. Suppose 10 observations are used to estimate this equation. The following table applies the rank correlation approach to test the hypothesis of heteroscedasticity. Notice that columns 6 and 7 rank |Ûᵢ| and Xᵢ in ascending order.
Table 4.1 Rank Correlation Test of Heteroscedasticity

Observation   Y      X      Ŷ       |Û| = |Y − Ŷ|   Rank of |Û|   Rank of X   d (difference of the two rankings)   d²
1             12.4   12.1   11.37   1.03             9             4           5                                    25
2             14.4   21.4   15.64   1.24            10             9           1                                     1
3             14.6   18.4   14.40   0.20             4             7          −3                                     9
4             16.0   21.7   15.78   0.22             5            10          −5                                    25
5             11.3   12.5   11.56   0.26             6             5           1                                     1
6             10.0   10.4   10.59   0.59             7             2           5                                    25
7             16.2   20.8   15.37   0.83             8             8           0                                     0
8             10.4   10.2   10.50   0.10             3             1           2                                     4
9             13.1   16.0   13.16   0.06             2             6          −4                                    16
10            11.3   12.0   11.33   0.03             1             3          −2                                     4
TOTAL                                                                          0                                   110

Applying formula (4.5) we obtain
rₛ = 1 − 6[110 / (10(100 − 1))] = 0.33
Applying the t test given in (4.6), we obtain
t = 0.33√8 / √(1 − 0.11) = 0.99
Note that for 8 (= 10 − 2) df this t value is not significant even at the 10% level of significance. Thus there is no evidence of a systematic relationship between the explanatory variable and the absolute values of the residuals, which suggests that there is no heteroscedasticity.
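A compact sketch of steps 1-3, assuming y and x are NumPy arrays; scipy's rankdata handles the ranking (ties are averaged, so (4.5) is then only approximate).

```python
import numpy as np
from scipy import stats

def spearman_het_test(y, x):
    # Steps 1-2: OLS residuals in absolute value, then ranks.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    abs_u = np.abs(y - X @ beta)
    d = stats.rankdata(abs_u) - stats.rankdata(x)
    n = len(y)
    rs = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))   # formula (4.5)
    # Step 3: t test with n - 2 degrees of freedom, formula (4.6).
    t_stat = rs * np.sqrt(n - 2) / np.sqrt(1 - rs ** 2)
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
    return rs, t_stat, p_value
```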
iv) The Goldfeld–Quandt Test
This test is applicable to large samples: the observations must be at least twice as many as the parameters to be estimated. The test assumes normality and serially independent disturbance terms Uᵢ. Consider the model
Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + … + βₖXₖᵢ + Uᵢ
and suppose that the test is to assess whether heteroscedasticity exists or not. The hypotheses to be tested are
H₀: the Uᵢ's are homoscedastic
H₁: the Uᵢ's are heteroscedastic (with increasing variance)
To test this, Goldfeld and Quandt perform the following steps.
Step I: The observations are ordered according to the magnitude of the independent variable thought to be related to the variance of the disturbances.
Step II: A certain number of central observations (denoted c) are omitted, leaving two equal-sized groups of observations, one corresponding to low values of the chosen independent variable and the other to high values. The central observations are omitted to sharpen, or accentuate, the difference between the small-variance group and the large-variance group.
Step III: We fit a separate regression to each sub-sample and obtain the sum of squared residuals from each:
ΣÛ₁² = residual sum of squares from the sub-sample of low X values, with [(n − c)/2] − k degrees of freedom, where k is the total number of parameters in the model;
ΣÛ₂² = residual sum of squares from the sub-sample of high X values, with the same degrees of freedom, [(n − c)/2] − k.
If each of these sums is divided by the appropriate degrees of freedom, we obtain estimates of the variances of the U's in the two sub-samples.
Step IV: Compute the ratio of the two variances:
F* = [ΣÛ₂² / ({(n − c)/2} − k)] / [ΣÛ₁² / ({(n − c)/2} − k)] = ΣÛ₂² / ΣÛ₁² .........................................(4.7)
This ratio has an F distribution with numerator and denominator degrees of freedom each equal to (n − c − 2k)/2, where n = total number of observations, c = central observations omitted, and k = number of parameters estimated in each regression. If the two variances are the same (that is, if the U's are homoscedastic) the value of F* will tend to one. If the variances differ, F* will be large (given that, by the design of the test, ΣÛ₂² > ΣÛ₁²). Generally, the observed F* is compared with the theoretical value of F with (n − c − 2k)/2 degrees of freedom at a chosen level of significance. The theoretical value of F (obtained from the F tables) is the value that defines the critical region of the test.
If F* > F we accept that there is heteroscedasticity (that is, we reject the null hypothesis of no difference between the variances of the U's in the two sub-samples). If F* < F, we accept that the U's are homoscedastic (in other words, we do not reject the null hypothesis). The higher the observed F* ratio, the stronger the heteroscedasticity of the U's.
Example: Suppose that we have data on consumption expenditure in relation to income for a cross-section of 30 families. We postulate that consumption expenditure is linearly related to income but that heteroscedasticity is present in the data. Suppose further that the middle 4 observations are dropped after the necessary reordering of the data, and that separate regressions on the two sets of 13 observations give
F* = (1536.8 / 11) / (377.17 / 11) = 4.07
From the F table in the appendix, the critical F value for 11 numerator and 11 denominator df at the 5% level is 2.82. Since the estimated F* value exceeds the critical value, we may conclude that there is heteroscedasticity in the error variance.
Note, however, that the ability of the Goldfeld–Quandt test to perform successfully depends on how c is chosen. Moreover, its success depends on identifying the correct X (i.e., independent) variable with which to order the observations. These limitations can be avoided if we consider the Breusch–Pagan–Godfrey (BPG) test.
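The following sketch implements steps I-IV for the two-variable case, with c, the number of central observations to omit, chosen by the user (argument names are hypothetical).

```python
import numpy as np
from scipy import stats

def _rss(y, x):
    # Residual sum of squares from a simple OLS fit.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    return u @ u

def goldfeld_quandt(y, x, c):
    # Step I: order the observations by the suspect regressor.
    order = np.argsort(x)
    y, x = y[order], x[order]
    # Step II: drop the c central observations.
    m = (len(y) - c) // 2
    k = 2                                # parameters in each sub-regression
    rss_low = _rss(y[:m], x[:m])         # low-X sub-sample
    rss_high = _rss(y[-m:], x[-m:])      # high-X sub-sample
    # Steps III-IV: variance ratio, formula (4.7).
    F = rss_high / rss_low
    df = m - k
    p_value = stats.f.sf(F, df, df)
    return F, df, p_value
```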
v) Breusch–Pagan–Godfrey (BPG) Test
This test is relevant for a very wide class of alternative hypotheses, namely that the variance is some function of a linear combination of known variables. The generality of the test is its strength: it does not require prior knowledge of the functional form involved.
To illustrate the test, consider the k-variable linear regression model
Yᵢ = β₀ + β₁X₁ᵢ + … + βₖXₖᵢ + Uᵢ ..........................................(4.8)
Assume that the error variance σᵢ² is described as
σᵢ² = f(α₁ + α₂Z₂ᵢ + … + αₘZₘᵢ) ..........................................(4.9)
that is, σᵢ² is some function of the non-stochastic variables Z; some or all of the X's can serve as Z's. Specifically, assume that
σᵢ² = α₁ + α₂Z₂ᵢ + … + αₘZₘᵢ ..........................................(4.10)
that is, σᵢ² is a linear function of the Z's.
If α₂ = α₃ = … = αₘ = 0, then σᵢ² = α₁, which is constant. Therefore, to test whether σᵢ² is homoscedastic, one tests the hypothesis that α₂ = α₃ = … = αₘ = 0. The actual test procedure is as follows.

Step1.Estimate Yi = 0 + 1X1i + + kXki + Ui by OLS and obtain the residualsU 1 ,U 2 , ,


^ ^ U^ n

10
~
Step2. Obtain σ =
2
∑ U^ 2i /n . Note that this is the maximum likelihood estimator of  .2

(Recall from unit two previous discussion that the OLS estimator i 2
is ∑ ^ 2 /( n−k )
U i

Step3. Construct variables Pi defined as


^ 2 ~2
Pi = U i / σ
~2
Which is simply each residual squared divided by σ
Step 4. Regress Pi thus constructed on Z’s as
Pi = 0 + 1Z1i + mZmi + Vi
Where Vi is the residual term of this regression
Step 5. Obtain the ESS (explained sum of squares) from the above equation and define
1
 = 2 (ESS)
Assuming Ui are normally distributed, one can show that if there is homoscedasticity and if the
sample size n increases indefinitely, then
 ~ X2m-1
that is,  follows assymptoticaly the chi-square distribution with (m-1) degrees of freedom
Therefore, if in an application the computed  (= X2) exceeds the critical X2 value at the chosen
level of significance, one can reject the hypothesis of homoscedasticity; otherwise one does not
reject it.
Example: Suppose we have 30 observations on Y and X that give the following results.
Step 1: Ŷ = 9.29 + 0.64Xᵢ
s.e. = (5.2) (0.03)
RSS = 2361.15
Step 2: σ̃² = ΣÛᵢ²/30 = 2361.15/30 = 78.71
Step 3: pᵢ = Ûᵢ²/σ̃²; that is, divide each squared residual Ûᵢ² obtained from the regression in step 1 by 78.71 to construct the variable pᵢ.
Step 4: Assuming that the pᵢ are linearly related to Xᵢ (= Zᵢ), we obtain the regression result
p̂ᵢ = −0.74 + 0.01Xᵢ
ESS = 10.42
Step 5: Θ = ½(ESS) = 5.21
From the chi-square table we find that for 1 df the 5% critical chi-square value is 3.84. Thus the observed chi-square value is significant at the 5% level of significance, suggesting that the error variance is heteroscedastic.
Note that the BPG test is asymptotic, that is, a large-sample test. In small samples the test is sensitive to the assumption that the disturbances Vᵢ are normally distributed.
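Here is a sketch of steps 1-5 for the simplest case, in which the only Z variable is X itself (so m = 2 and Θ has one degree of freedom). A ready-made version of this test is also available in statsmodels (statsmodels.stats.diagnostic.het_breuschpagan) for those who prefer not to hand-roll it.

```python
import numpy as np
from scipy import stats

def bpg_test(y, x):
    n = len(y)
    X = np.column_stack([np.ones_like(x), x])
    # Step 1: OLS residuals.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_sq = (y - X @ beta) ** 2
    # Step 2: ML estimate of sigma^2 (divides by n, not n - k).
    sigma2_ml = u_sq.mean()
    # Step 3: construct p_i.
    p = u_sq / sigma2_ml
    # Step 4: regress p on the Z's (here just a constant and X).
    g, *_ = np.linalg.lstsq(X, p, rcond=None)
    fitted = X @ g
    ess = np.sum((fitted - p.mean()) ** 2)   # explained sum of squares
    # Step 5: theta = ESS / 2, asymptotically chi-square with m - 1 df.
    theta = ess / 2
    m = 2                                    # constant plus one Z variable
    p_value = stats.chi2.sf(theta, df=m - 1)
    return theta, p_value
```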
E) Remedial Measures – Solutions for Heteroscedastic Disturbances
As we have seen, heteroscedasticity does not destroy the unbiasedness and consistency properties of the OLS estimators, but the estimators are no longer efficient, not even asymptotically (i.e., in large samples). This lack of efficiency makes the usual hypothesis-testing procedures of dubious value. Therefore remedial measures are clearly called for. When heteroscedasticity is established on the basis of any test, the appropriate solution is to transform the original model in such a way that the transformed disturbance term has constant variance, and then apply the method of classical least squares to the transformed model. The adjustment of the model depends on the particular form of heteroscedasticity; that is, the transformation is based on plausible assumptions about the heteroscedasticity pattern.
Assumption one: Given the model Yᵢ = β₀ + β₁Xᵢ + Uᵢ, suppose we assume the error variance is proportional to Xᵢ²; that is,
E(Uᵢ²) = σ²Xᵢ²
If, as a matter of 'speculation' or on the basis of graphical inspection, it is believed that the variance of Uᵢ is proportional to the square of the explanatory variable X, one may transform the original model as follows. Divide the original model through by Xᵢ to obtain
Yᵢ/Xᵢ = β₀/Xᵢ + β₁ + Uᵢ/Xᵢ = β₀(1/Xᵢ) + β₁ + Vᵢ ............................................... (4.11)
where Vᵢ is the transformed disturbance term, equal to Uᵢ/Xᵢ. Now it is easy to verify that
E(Vᵢ²) = E(Uᵢ/Xᵢ)² = (1/Xᵢ²)E(Uᵢ²)
Since by assumption E(Uᵢ²) = σ²Xᵢ², substituting gives
E(Vᵢ²) = (1/Xᵢ²)(σ²Xᵢ²) = σ²
Thus the variance of Vᵢ is homoscedastic, and one may proceed to apply OLS to the transformed equation. Notice that in the transformed regression the intercept term β₁ is the slope coefficient of the original equation and the slope coefficient β₀ is the intercept of the original model. Therefore, to get back to the original model we shall have to multiply the estimated (4.11) by Xᵢ.
Assumption two: Given the model Yᵢ = β₀ + β₁Xᵢ + Uᵢ, suppose we assume the error variance to be proportional to Xᵢ; that is,
E(Uᵢ²) = σ²Xᵢ
This calls for a square-root transformation. If graphical inspection suggests that the variance of Uᵢ is proportional to Xᵢ, the original model can be transformed by dividing it through by √Xᵢ:
Yᵢ/√Xᵢ = β₀/√Xᵢ + β₁(Xᵢ/√Xᵢ) + Uᵢ/√Xᵢ = β₀(1/√Xᵢ) + β₁√Xᵢ + Vᵢ,  or  Yᵢ* = β₀* + β₁*Xᵢ* + Vᵢ ............................(4.12)
where Vᵢ = Uᵢ/√Xᵢ and Xᵢ > 0.
Given assumption two, one can readily verify that E(Vᵢ²) = σ², a homoscedastic situation:
Var(Vᵢ) = E(Vᵢ²) = E(Uᵢ/√Xᵢ)² = (1/Xᵢ)E(Uᵢ²) = (1/Xᵢ)(σ²Xᵢ) = σ²
Therefore one may proceed to apply OLS to the transformed equation. Note an important feature of the transformed model: it has no intercept term. Therefore one will have to use the regression-through-the-origin model to estimate β₀ and β₁. Having run the regression on the transformed model (4.12), one can get back to the original model simply by multiplying it by √Xᵢ.
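Both transformations amount to weighted least squares and are easy to carry out directly. The sketch below assumes y and x are NumPy arrays with x > 0 and that one of the two variance patterns above has been judged plausible.

```python
import numpy as np

def wls_var_prop_x_squared(y, x):
    # Assumption one: E(U^2) = sigma^2 * x^2. Regress y/x on [1, 1/x];
    # the coefficient on the constant is the original slope b1, and the
    # coefficient on 1/x is the original intercept b0 (equation 4.11).
    X = np.column_stack([np.ones_like(x), 1.0 / x])
    b, *_ = np.linalg.lstsq(X, y / x, rcond=None)
    b1, b0 = b[0], b[1]
    return b0, b1

def wls_var_prop_x(y, x):
    # Assumption two: E(U^2) = sigma^2 * x. Regress y/sqrt(x) on
    # [1/sqrt(x), sqrt(x)] with no intercept (equation 4.12).
    s = np.sqrt(x)
    X = np.column_stack([1.0 / s, s])
    b, *_ = np.linalg.lstsq(X, y / s, rcond=None)
    b0, b1 = b[0], b[1]
    return b0, b1
```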

Assumption three: A log transformation such as
ln Yᵢ = β₀ + β₁ ln Xᵢ + Uᵢ
very often reduces heteroscedasticity when compared with the regression Yᵢ = β₀ + β₁Xᵢ + Uᵢ. This result arises because the log transformation compresses the scales in which the variables are measured. For example, it reduces a ten-fold difference between two values (such as between 8 and 80) into roughly a two-fold difference (because ln 80 ≈ 4.38 and ln 8 ≈ 2.08).
To conclude, the remedial measures explained above through transformation show that we are essentially speculating about the nature of σᵢ². Note also that the OLS estimators obtained from the transformed equation are BLUE. Which of the transformations discussed will work depends on the nature of the problem and the severity of the heteroscedasticity. Moreover, in a multiple regression model we may not know a priori which of the X variables should be chosen for transforming the data. In addition, the log transformation is not applicable if some of the Y and X values are zero or negative. Furthermore, the t tests, F tests, etc. are valid only in large samples when the regression is conducted on transformed variables.
Check Your Progress 1
1. State with brief reasons whether the following statements are true, false, or uncertain:
a) In the presence of heteroscedasticity, OLS estimators are biased as well as inefficient.
b) If heteroscedasticity is present, the conventional t and F tests are invalid.
2. State three consequences of heteroscedasticity.
3. List and explain the steps of the BPG test.
4. Suppose that you have data on personal saving and personal income for Ethiopia for a 31-year period. Assume that graphical inspection suggests the Uᵢ's are heteroscedastic, so you want to employ the Goldfeld–Quandt test. You order the observations in ascending order of income and omit the nine central observations. Applying OLS to each subset, you obtain the following results:
a) For subset I: Ŝ₁ = −738.84 + 0.008Iᵢ,  ΣÛ₁² = 144,771.5
b) For subset II: Ŝ₂ = 1141.07 + 0.029Iᵢ,  ΣÛ₂² = 769,899.2
Is there any evidence of heteroscedasticity?

4.4 AUTOCORRELATION
A. The Nature of Autocorrelation
An important assumption of the classical linear model is that there is no autocorrelation or serial correlation among the disturbances Uᵢ entering the population regression function. This assumption implies that the covariance of Uᵢ and Uⱼ is equal to zero; that is,
Cov(Uᵢ, Uⱼ) = E{[Uᵢ − E(Uᵢ)][Uⱼ − E(Uⱼ)]} = E(UᵢUⱼ) = 0 (for i ≠ j)
If this assumption is violated, the disturbances are said to be autocorrelated. This could arise for several reasons.
i) Spatial autocorrelation: In regional cross-section data, a random shock affecting
economic activity in one region may cause economic activity in an adjacent region to
change because of close economic ties between the regions. Shocks due to weather
similarities might also tend to cause the error terms between adjacent regions to be related.
ii) Prolonged influence of shocks: In time series data, random shocks (disturbances) have effects that often persist over more than one time period. An earthquake, flood, strike, or war, for example, will probably affect the economy's operation in subsequent periods as well.
iii) Inertia: Past actions often have a strong effect on current actions, so that a positive disturbance in one period is likely to influence activity in succeeding periods.
iv) Data manipulation: Published data often undergo interpolation or smoothing, procedures that average true disturbances over successive time periods.
v) Misspecification: An omitted relevant independent variable that is autocorrelated will make the disturbance (associated with the misspecified model) autocorrelated. An incorrect functional form or a misspecification of the equation's dynamics could do the same. In these instances the appropriate procedure is to correct the misspecification.
Note that autocorrelation is a special case of correlation. Autocorrelation refers to the relationship not between two (or more) different variables, but between the successive values of the same variable (in this section we are particularly interested in the autocorrelation of the U's). Note also that the terms autocorrelation and serial correlation are treated synonymously.
Since autocorrelated errors arise most frequently in time series models, the discussion in the rest of this unit is couched in terms of time series data.
There are a number of time-series patterns or processes that can be used to model correlated errors. The most common is known as the first-order autoregressive, or AR(1), process. Consider
Yₜ = β₀ + β₁Xₜ + Uₜ
where t denotes the observation at time t (i.e., time series data). One can assume that the disturbances are generated as follows:
Uₜ = ρUₜ₋₁ + εₜ
where ρ is known as the coefficient of autocovariance and εₜ is a stochastic disturbance satisfying the standard OLS assumptions, namely
E(εₜ) = 0
Var(εₜ) = σε²
Cov(εₜ, εₜ₊ₛ) = 0 for s ≠ 0
where the subscript s represents the period of lag. The above specification is of first order because Uₜ is regressed on itself lagged one period (the coefficient ρ is the first-order coefficient of autocorrelation). Note that the specification postulates that the movement or shift in Uₜ consists of two parts: a part ρUₜ₋₁, which accounts for the systematic shift, and a part εₜ, which is purely random.
Relationships between the Uₜ's:
Cov(Uₜ, Uₜ₋₁) = E{[Uₜ − E(Uₜ)][Uₜ₋₁ − E(Uₜ₋₁)]} = E[UₜUₜ₋₁]
Substituting Uₜ = ρUₜ₋₁ + εₜ, we obtain
E[(ρUₜ₋₁ + εₜ)Uₜ₋₁] = ρE[U²ₜ₋₁] + E[εₜUₜ₋₁]
Note that E(εₜ) = 0 and εₜ is independent of Uₜ₋₁, so E(εₜUₜ₋₁) = 0. Since, with the assumption of homoscedasticity (i.e., constant variance), Var(Uₜ) = Var(Uₜ₋₁) = σᵤ², the result is
Cov(Uₜ, Uₜ₋₁) = ρσᵤ²
The correlation of Uₜ and Uₜ₋₁ is therefore (recall the course Statistics for Economists):
Corr(Uₜ, Uₜ₋₁) = Cov(Uₜ, Uₜ₋₁) / √(Var(Uₜ)Var(Uₜ₋₁)) = ρσᵤ²/σᵤ² = ρ,  where −1 < ρ < 1
Hence ρ (rho) is the simple correlation of the successive errors of the original model. Note that when ρ > 0 successive errors are positively correlated, and when ρ < 0 successive errors are negatively correlated. It can be shown that Corr(Uₜ, Uₜ₋ₛ) = ρˢ (where s represents the period of lag). This implies that the correlation (be it negative or positive) between any two periods diminishes as s increases.
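The result Corr(Uₜ, Uₜ₋ₛ) = ρˢ is easy to verify by simulation. The following short script (with an arbitrary ρ = 0.7) generates a long AR(1) series and compares the sample correlations with ρˢ.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.7, 100_000
eps = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]      # U_t = rho*U_{t-1} + eps_t

for s in (1, 2, 3):
    corr = np.corrcoef(u[s:], u[:-s])[0, 1]
    print(f"s={s}: sample corr = {corr:.3f}, rho**s = {rho ** s:.3f}")
```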
b) Consequences of Autocorrelation
When the disturbance term exhibits serial correlation, the values as well as the standard errors of the parameter estimates are affected.
i) If the disturbances are correlated, the previous values of the disturbances have some information to convey about the current disturbances. If this information is ignored, the sample data are clearly not being used with maximum efficiency. However, the parameter estimates are not statistically biased even when the residuals are serially correlated; that is, the OLS parameter estimates are statistically unbiased in the sense that their expected values equal the true parameters.
ii) The variance of the random term U may be seriously underestimated. In particular, the underestimation of the variance of U will be more serious in the case of positive autocorrelation of the error term Uₜ. With positive first-order autocorrelated errors, fitting an OLS estimating line can give an estimate quite wide of the mark; the high variation in these estimates causes the variance of the OLS estimators to be greater than it would have been had the errors been distributed randomly. A figure of positively autocorrelated errors (not reproduced here) would show the OLS estimating line giving a better fit to the data than the true relationship. This reveals why, in this context, r² is overestimated while σᵤ² (and the variance of the OLS estimators) is underestimated. When the standard errors of the β̂'s are biased downwards, the confidence intervals are much narrower than they should be; moreover, parameter estimates of irrelevant explanatory variables may appear highly significant. In other words, the estimated errors Ûᵢ are closer to the regression line than the U's are to the true line, and thus we would have a serious underestimation of σᵤ².

iii) Predictions based on ordinary least squares estimates will be inefficient with autocorrelated errors; that is, they will have a larger variance than predictions based on estimates obtained from other econometric techniques. Recall that the variance of the forecast depends on the variances of the coefficient estimates and the variance of U. Since these variances are not minimal compared with those of other techniques, the standard error of the forecast (from OLS) will not have the least value when the U's are autocorrelated.

c) Testing (Detecting) for Autocorrelation


Autocorrelation is potentially a serious problem. Hence it is essential to find out whether autocorrelation exists in a given situation. Consider the following commonly used tests of serial correlation.
Note that since the population disturbances Uₜ cannot be observed directly, we use their proxy, the residuals Ûₜ, which can be obtained from the usual OLS procedure. Examination of the Ûₜ can provide useful information not only about autocorrelation but also about heteroscedasticity, model inadequacy, or specification bias.
i) Graphical Method
Some rough idea about the existence of autocorrelation may be gained by plotting the residuals either against time or against their own lagged values. For instance, suppose plotting the residuals against their lagged values produces the relationship depicted below.

Figure 4.9: Plot of Ûₜ against Ûₜ₋₁

As the figure reveals, most of the residuals are bunched in the first and third quadrants, suggesting very strongly that there is positive correlation in the residuals. However, the graphical method just discussed is essentially subjective or qualitative in nature. There are quantitative tests that can be used to supplement this purely qualitative approach.
ii) Durbin–Watson d Test
The most celebrated test for detecting serial correlation is the one developed by the statisticians Durbin and Watson. It is popularly known as the Durbin–Watson d statistic, which is defined as
d = Σₜ₌₂ⁿ (Ûₜ − Ûₜ₋₁)² / Σₜ₌₁ⁿ Ûₜ² ............................................(4.13)
which is simply the ratio of the sum of squared differences in successive residuals to the residual sum of squares, RSS. Note that in the numerator of the d statistic the number of observations is n − 1, because one observation is lost in taking successive differences. Expanding the above formula allows us to obtain, approximately,
d ≈ 2(1 − ρ̂) ......................................................... (4.14)
Although they are not checked routinely, it is important to note the assumptions underlying the d statistic:
a) the regression model includes an intercept term;
b) the explanatory variables are non-stochastic, or fixed in repeated sampling;
c) the disturbances Uₜ are generated by the first-order autoregressive scheme Uₜ = ρUₜ₋₁ + εₜ;
d) the regression model does not include lagged value(s) of the dependent variable among the explanatory variables;
e) there are no missing observations in the data.
Note from the Durbin–Watson statistic that for positive autocorrelation (ρ > 0), successive disturbance values will tend to have the same sign, and the quantities (Uₜ − Uₜ₋₁)² will tend to be small relative to the squares of the actual values of the disturbances. We can therefore expect the value of the expression in (4.13) to be low. Indeed, in the extreme case ρ = 1 it is possible that Uₜ = Uₜ₋₁ for all t, so that the minimum possible value of d is zero. For negative autocorrelation, since positive disturbance values now tend to be followed by negative ones and vice versa, the quantities (Uₜ − Uₜ₋₁)² will tend to be large relative to the squares of the U's; hence the value of (4.13) now tends to be high, approaching its maximum of 4 in the extreme case ρ = −1. When ρ = 0, we should expect (4.14) to take a value in the neighborhood of 2. Notice that when ρ = 0 the AR(1) equation reduces to Uₜ = εₜ for all t, so that Uₜ takes on all the properties of εₜ; in particular, it is no longer autocorrelated. Thus in the absence of autocorrelation we can expect (4.14) to take a value close to 2; when negative autocorrelation is present, a value in excess of 2 and perhaps as high as 4; and when positive autocorrelation is present, a value lower than 2 and perhaps close to zero.
The Durbin–Watson test tests the hypothesis H₀: ρ = 0 (implying that the error terms are not autocorrelated with a first-order scheme) against the alternative. However, the sampling distribution of the d statistic depends on the sample size n, the number of explanatory variables k, and also on the actual sample values of the explanatory variables. Thus the critical values at which we might, for example, reject the null hypothesis at the 5 percent level of significance depend very much on the sample we have chosen. It is impracticable to tabulate critical values for all possible sets of sample values. What is possible, however, is, for given values of n and k, to find upper and lower bounds such that the actual critical values for any set of sample values will fall within these known limits. Tables are available which give these upper and lower bounds for various values of n and k and for specified levels of significance. (The Durbin–Watson table is given in the appendices.)
The Durbin–Watson test procedure for testing the null hypothesis ρ = 0 against the alternative hypothesis of positive autocorrelation is illustrated in the figure below. Under the null hypothesis, the actual sampling distribution of d, for the given n and k and for the given sample X values, is shown by the unbroken curve. It is such that 5 percent of the area beneath it lies to the left of the point d*, i.e., P(d < d*) = 0.05. If d* were known, we would reject the null hypothesis at the 5 percent level of significance if, for our sample, d < d*. Unfortunately, for the reason given above, d* is unknown. The broken curves labeled d_L and d_U represent, for given values of n and k, the lower and upper limits of the sampling distribution of d, within which the actual sampling distribution must lie whatever the sample X values.

Figure 4.10: Distributions of d_L and d_U (broken curves) around the actual distribution of d, with the critical points d*_L, d* and d*_U marked on the horizontal axis running from 0 to 4

The points d*_U and d*_L are such that the areas under the respective d_U and d_L curves to the left of these points are in each case 5 percent of the total area, i.e., P(d_L < d*_L) = P(d_U < d*_U) = 0.05. It is the points d*_U and d*_L, representing the upper and lower bounds to the unknown d*, that are tabulated for varying values of n and k. Clearly, if the sample value of the Durbin–Watson statistic lies to the left of d*_L it must also lie to the left of d*, while if it lies to the right of d*_U it must also lie to the right of d*. However, there is an inconclusive region: if d lies between d*_L and d*_U we cannot know whether it lies to the left or right of d*.
The decision criterion for the Durbin–Watson test is therefore of the following form:
- for d < d*_L, reject the null hypothesis of no autocorrelation in favor of positive autocorrelation;
- for d > d*_U, do not reject the null hypothesis, i.e., there is insufficient evidence to suggest positive autocorrelation;
- for d*_L < d < d*_U, the test is inconclusive.
Because of the symmetry of the distribution illustrated in the previous figure, it is also possible to use the tables for d*_L and d*_U to test the null hypothesis of no autocorrelation against the alternative hypothesis of negative autocorrelation, i.e., ρ < 0. The decision criterion then takes the form:
- for d > 4 − d*_L, reject the null hypothesis of no autocorrelation in favor of negative autocorrelation;
- for d < 4 − d*_U, do not reject the null hypothesis, i.e., there is insufficient evidence to suggest negative autocorrelation;
- for 4 − d*_U < d < 4 − d*_L, the test is inconclusive.
Note that the tables for d*_U and d*_L are constructed to facilitate the use of one-tail rather than two-tail tests. The following summary presents the actual test procedure and shows that the limits of d are 0 and 4.

Note: H₀: no positive autocorrelation; H₀*: no negative autocorrelation.
- 0 ≤ d < d_L: reject H₀ (evidence of positive autocorrelation)
- d_L ≤ d ≤ d_U: zone of indecision
- d_U < d < 4 − d_U: do not reject H₀ or H₀* (or both)
- 4 − d_U ≤ d ≤ 4 − d_L: zone of indecision
- 4 − d_L < d ≤ 4: reject H₀* (evidence of negative autocorrelation)

From the above presentation we can develop the following rule of thumb: if d is found to be close to 2 in an application, one may assume that there is no first-order autocorrelation, either positive or negative. If d is close to 0, it is because ρ is close to 1, indicating strong positive autocorrelation in the residuals. Similarly, the closer d is to 4, the greater the evidence of negative serial correlation, because ρ is then close to −1.
Example: Suppose that in a regression involving 50 observations and 4 regressors the estimated d was 1.43. From the Durbin–Watson table we find that at the 5% level the critical d values are d_L = 1.38 and d_U = 1.72 (the reader should check this by referring to the Durbin–Watson table attached in the appendix). On the basis of the d test we cannot say whether there is positive autocorrelation or not, because the estimated d value lies in the indecisive range.
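Computing d from (4.13) is straightforward once the OLS residuals are in hand; the sketch below also recovers ρ̂ through the approximation (4.14). (statsmodels provides an equivalent durbin_watson function that takes the residual series directly.)

```python
import numpy as np

def durbin_watson(y, x):
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    d = np.sum(np.diff(u) ** 2) / np.sum(u ** 2)   # formula (4.13)
    rho_hat = 1 - d / 2                            # from d = 2(1 - rho_hat)
    return d, rho_hat
```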
d) Remedial Measures
Since in the presence of serial correlation the OLS estimators are inefficient, it is essential to seek remedial measures.
If the source of the problem is suspected to be the omission of important variables, the solution is to include those omitted variables. If the source of the problem is believed to be misspecification of the model, the solution is to determine the appropriate mathematical form.
If the above approaches are ruled out, the appropriate procedure is to transform the original data so that we can come up with a new form (or model) which satisfies the assumption of no serial correlation. Of course, the transformation depends on the nature of the serial correlation. Suppose the serial correlation is assumed to follow the first-order autoregressive scheme, namely
Uₜ = ρUₜ₋₁ + εₜ ....................................................(4.15)
In this case the serial correlation problem can be satisfactorily resolved if ρ, the coefficient of autocorrelation, is known.
Consider the following two-variable model:
Yₜ = β₀ + β₁Xₜ + Uₜ .......................................................(4.16)
For time t − 1 the model is
Yₜ₋₁ = β₀ + β₁Xₜ₋₁ + Uₜ₋₁ .......................................................(4.17)
Multiplying both sides of (4.17) by ρ, we obtain
ρYₜ₋₁ = ρβ₀ + ρβ₁Xₜ₋₁ + ρUₜ₋₁ .....................................................(4.18)
Subtracting (4.18) from (4.16) gives
Yₜ − ρYₜ₋₁ = β₀(1 − ρ) + β₁(Xₜ − ρXₜ₋₁) + (Uₜ − ρUₜ₋₁) = β₀(1 − ρ) + β₁(Xₜ − ρXₜ₋₁) + εₜ ................................(4.19)
The transformed model can be expressed as
Yₜ* = β₀* + β₁*Xₜ* + εₜ ................................(4.20)
where Yₜ* = Yₜ − ρYₜ₋₁, β₀* = β₀(1 − ρ) and Xₜ* = Xₜ − ρXₜ₋₁.
Since εₜ (which equals Uₜ − ρUₜ₋₁, from [4.15]) satisfies the OLS assumptions, one can proceed to apply OLS to the transformed variables Y* and X* and obtain estimators with all the optimum properties, namely BLUE. Running the regression on (4.19), the transformed model, is tantamount to using generalized least squares (GLS).
Note that although the procedure discussed above is straightforward to apply, it is generally difficult to implement because ρ, the population autocorrelation coefficient, is rarely known in practice. Therefore alternative methods need to be devised.
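When ρ is known, the generalized difference transform in (4.19)-(4.20) is only a few lines of code. The sketch below drops the first observation, as the derivation above implies, and recovers β₀ from β₀* = β₀(1 − ρ).

```python
import numpy as np

def gls_known_rho(y, x, rho):
    # Generalized differences; the first observation is lost.
    y_star = y[1:] - rho * y[:-1]        # Y*_t = Y_t - rho*Y_{t-1}
    x_star = x[1:] - rho * x[:-1]        # X*_t = X_t - rho*X_{t-1}
    X = np.column_stack([np.ones_like(x_star), x_star])
    b, *_ = np.linalg.lstsq(X, y_star, rcond=None)
    b0 = b[0] / (1 - rho)                # recover beta0 from beta0*(1 - rho)
    b1 = b[1]
    return b0, b1
```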
The Cochrane–Orcutt Iterative Procedure
This procedure estimates ρ from the estimated residuals Ûₜ, so that information about the unknown ρ can be obtained. To explain the method, consider the two-variable model
Yₜ = β₀ + β₁Xₜ + Uₜ ............................................. (4.21)
and assume that Uₜ is generated by the AR(1) scheme, namely
Uₜ = ρUₜ₋₁ + εₜ ............................................ (4.22)
Cochrane and Orcutt then recommend the following steps to estimate ρ.
Step 1: Estimate the two-variable model by the standard OLS routine and obtain the residuals Ûₜ.
Step 2: Using the estimated residuals, run the following regression:
Ûₜ = ρ̂Ûₜ₋₁ + vₜ
Step 3: Using the ρ̂ obtained from the step 2 regression, run the generalized difference equation similar to (4.20):
(Yₜ − ρ̂Yₜ₋₁) = β₀(1 − ρ̂) + β₁(Xₜ − ρ̂Xₜ₋₁) + (Uₜ − ρ̂Uₜ₋₁),  or  Yₜ* = β₀* + β₁*Xₜ* + Uₜ*
Step 4: Since a priori it is not known whether the ρ̂ obtained from step 2 is the best estimate of ρ, substitute the values of β̂₀* and β̂₁* obtained from the step 3 regression into the original regression (4.21) and obtain the new residuals, say Ûₜ**, as
Ûₜ** = Yₜ − β̂₀* − β̂₁*Xₜ
Note that these can be easily computed since Yₜ, Xₜ, β̂₀* and β̂₁* are all known.
Step 5: Now estimate the regression
Ûₜ** = ρ̂̂Ûₜ₋₁** + wₜ
where ρ̂̂ is the second-round estimate of ρ.
Since we do not know whether this second-round estimate of ρ is the best estimate, we can go on to a third-round estimate, and so on. That is why the Cochrane–Orcutt method is called iterative. But how long should we go on? The general procedure is to stop iterating when the successive estimates of ρ converge (i.e., differ by a very small amount). We then use the chosen ρ̂ to transform the model and apply a kind of GLS estimation that minimizes the problem of autocorrelation.
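A minimal sketch of the iteration described in steps 1-5, for the two-variable model; tol and max_iter are hypothetical convergence controls. (statsmodels' GLSAR class offers an iterative fit in the same spirit.)

```python
import numpy as np

def cochrane_orcutt(y, x, tol=1e-5, max_iter=50):
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # Step 1: initial OLS
    rho = 0.0
    for _ in range(max_iter):
        u = y - X @ beta                            # residuals from (4.21)
        rho_new = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])   # Steps 2 and 5
        # Step 3: generalized difference regression with the current rho.
        y_star = y[1:] - rho_new * y[:-1]
        x_star = x[1:] - rho_new * x[:-1]
        Xs = np.column_stack([np.ones_like(x_star), x_star])
        b, *_ = np.linalg.lstsq(Xs, y_star, rcond=None)
        # Step 4: recover the original-model coefficients.
        beta = np.array([b[0] / (1 - rho_new), b[1]])
        if abs(rho_new - rho) < tol:                # stop when rho converges
            break
        rho = rho_new
    return beta, rho_new
```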
Check Your Progress 2
1. State whether the following statements are true or false. Briefly justify your answer.
a) When autocorrelation is present, OLS estimators are biased as well as inefficient.
b) In the presence of autocorrelation, the conventionally computed variances and standard errors of the forecast values are inefficient.
2. Given a sample of 50 observations and four explanatory variables, what can you say about autocorrelation if
i) d = 1.05   ii) d = 1.40
3. Suppose Yᵢ = β₀ + β₁Xᵢ + Uᵢ, and assume that Uᵢ is generated by the AR(1) scheme. Show the Cochrane–Orcutt procedure for dealing with the autocorrelation.
4. Suppose that a researcher used 20 years of data on imports and GDP of Ethiopia. Applying OLS to the observations, she obtained the following import function:
M̂ = −2461 + 0.28G
ΣÛₜ² = 573,069
Σ(Ûₜ − Ûₜ₋₁)² = 537,192
where M = imports and G = GDP.
Use the Durbin–Watson test to examine the problem of autocorrelation.
4.5 MULTICOLLINEARITY
a) The nature of the problem
One of the assumptions of the classical linear regression model (CLRM) is that there is no perfect multicollinearity among the regressors included in the regression model. Note that although the assumption is said to be violated only in the case of exact multicollinearity (i.e., an exact linear relationship among some of the regressors), the presence of multicollinearity (an approximate linear relationship among some of the regressors) leads to estimating problems important enough to warrant treating it as a violation of the classical linear regression model.
Multicollinearity does not depend on any theoretical or actual linear relationship among any of the regressors; it depends on the existence of an approximate linear relationship in the data set at hand. Unlike most other estimating problems, this problem is caused by the particular sample available. Multicollinearity in the data can arise for several reasons. For example, the independent variables may all share a common time trend, one independent variable might be the lagged value of another that follows a trend, some independent variables may have varied together because the data were not collected from a wide enough base, or there could in fact exist some kind of approximate relationship among some of the regressors.
Note that the existence of multicollinearity seriously affects the parameter estimates. Intuitively, when any two explanatory variables are changing in nearly the same way, it becomes extremely difficult to establish the influence of each regressor on the dependent variable separately. That is, if two explanatory variables change by the same proportion, the influence of one of the explanatory variables on the dependent variable may be erroneously attributed to the other. Their effects cannot be sensibly investigated, due to the high intercorrelation.
In general, the problem of multicollinearity arises when the individual effects of the explanatory variables cannot be isolated and the corresponding parameter magnitudes cannot be determined with the desired degree of precision. Though it is quite frequent in cross-section data as well, it tends to be a more common and more serious problem in time series data.
b) Consequences of Multicollinearity
In the case of near or high multicollinearity, one is likely to encounter the following consequences:
i) Although BLUE, the OLS estimators have large variances and covariances, making precise estimation difficult. This is clearly seen from the formula for the variance of the estimators. For example, in a multiple linear regression with two explanatory variables, Var(β̂₁) can be written as
Var(β̂₁) = σ² / [Σx₁ᵢ²(1 − r₁₂²)]
It is apparent from this formula that as r₁₂ (the coefficient of correlation between X₁ and X₂) tends towards 1, that is, as collinearity increases, the variance of the estimator increases. The same holds for Var(β̂₂) and Cov(β̂₁, β̂₂).
ii) Because of consequence (i), the confidence intervals tend to be much wider, leading to the acceptance of the 'zero null hypothesis' (i.e., that the true population coefficient is zero).
iii) Because of consequence (i), the t ratios of one or more coefficients tend to be statistically insignificant.
iv) Although the t ratios of one or more coefficients are statistically insignificant, R², the overall measure of goodness of fit, can be very high. This is the basic symptom of the problem.
v) The OLS estimators and their standard errors can be sensitive to small changes in the data. That is, when a few observations are added or dropped, the pattern of relationships may change and affect the results.
vi) Forecasting is still possible if the nature of the collinearity remains the same within the new (future) sample observations. That is, if collinearity exists in the data of the past 15 years' sample, and if the collinearity is expected to be the same for the future sample period, then forecasting will not be a problem.

c) Detecting Multicollinearity
Note that multicollinearity is a question of degree and not of kind. The meaningful distinction is not between the presence and the absence of multicollinearity, but between its various degrees. Multicollinearity is a feature of the sample and not of the population. Therefore, we do not 'test for multicollinearity' but can, if we wish, measure its degree in any particular sample. The following are some rules of thumb, informal and formal, for detecting multicollinearity.
i) High R² but few significant t ratios: If R² is high, say in excess of 0.8, the F test in most cases will reject the hypothesis that the partial slope coefficients are simultaneously equal to zero, but the individual t tests will show that none or very few of the partial slope coefficients are statistically different from zero.
ii) High pair-wise correlations among regressors: If the pair-wise correlation coefficient between two regressors is high, say in excess of 0.8, then multicollinearity is a serious problem.
iii) Auxiliary regressions: Since multicollinearity arises because one or more of the regressors are exact or approximate linear combinations of the other regressors, one way of finding out which X variable is related to the other X variables is to regress each Xᵢ on the remaining X variables and compute the corresponding R², which will help us decide about the problem. For example, consider the following auxiliary regression:
Xₖ = α₁X₁ + α₂X₂ + … + αₖ₋₁Xₖ₋₁ + V
If the R² of this regression is high, it implies that Xₖ is highly correlated with the rest of the explanatory variables, and one may consider dropping Xₖ from the model.
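The auxiliary-regression rule is easy to automate: regress each column of the regressor matrix on all the others and report the auxiliary R² (the familiar variance inflation factor is 1/(1 − R²)). The sketch below assumes X is an n × k array of regressors without the constant column.

```python
import numpy as np

def auxiliary_r_squared(X):
    # X is an n-by-k array of regressors (no constant column).
    n, k = X.shape
    r2_list = []
    for j in range(k):
        xj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ b
        r2 = 1 - (resid @ resid) / np.sum((xj - xj.mean()) ** 2)
        r2_list.append(r2)               # high R^2 flags near-collinearity
    return r2_list
```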
d) Remedial Measures
The existence of multicollinearity in a data set does not necessarily mean that the coefficient estimators in which the researcher is interested have unacceptably high variances. Thus, the econometrician need not worry about multicollinearity if the R² from the regression exceeds the R² of any independent variable regressed on the other independent variables, nor if the t statistics are all greater than 2. Because multicollinearity is essentially a sample problem there are no infallible guides; however, one can try the following rules of thumb, their success depending on the severity of the collinearity problem.
a) Obtain more data: Because multicollinearity is essentially a data problem, additional data that do not contain the multicollinearity feature could solve the problem. For example, in the three-variable model we saw that
Var(β̂₁) = σ² / [Σx₁ᵢ²(1 − r₁₂²)]
Now as the sample size increases, Σx₁ᵢ² will generally increase. Thus, for any given r₁₂, the variance of β̂₁ will decrease, thereby decreasing the standard error, which will enable us to estimate β₁ more precisely.
b) Drop a variable: When faced with severe multicollinearity, one of the 'simplest' things to do is to drop one of the collinear variables. But note that in dropping a variable from the model we may be committing a specification bias or specification error, which arises from incorrect specification of the model used in the analysis. Thus, if economic theory requires some variable to be included in the model, dropping it because of a multicollinearity problem would constitute specification bias, because we would be dropping a variable whose true coefficient in the equation being estimated is not zero.
c) Transformation of variables: In time series analysis, one reason for high multicollinearity between two variables is that over time both variables tend to move in the same direction. One way of minimizing this dependence is to transform the variables. Suppose
Yₜ = β₀ + β₁X₁ₜ + β₂X₂ₜ + Uₜ
This relation must also hold at time t − 1, because the origin of time is arbitrary anyway. Therefore we have
Yₜ₋₁ = β₀ + β₁X₁,ₜ₋₁ + β₂X₂,ₜ₋₁ + Uₜ₋₁
Subtracting the second equation from the first gives
Yₜ − Yₜ₋₁ = β₁(X₁ₜ − X₁,ₜ₋₁) + β₂(X₂ₜ − X₂,ₜ₋₁) + Vₜ
where Vₜ = Uₜ − Uₜ₋₁. This is known as the first-difference form, because we run the regression not on the original variables but on the differences of successive values of the variables. The first-difference regression model often reduces the severity of multicollinearity because, although the levels of X₁ and X₂ may be highly correlated, there is no a priori reason to believe that their differences will also be highly correlated. A sketch of this transformation is given below.
d) Formalize relationships among regressors: If it is believed that the multicollinearity
arises not from an unfortunate data set but from an actual approximate linear relationship
among some of the regressors, this relationship could be formalized and the estimation
could then proceed in the context of a simultaneous equation estimation problem.
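The following is a minimal sketch of the first-difference transformation of remedy (c), again with simulated data in place of an actual time series; the differenced model is estimated without an intercept, as in the derivation above:

    # Sketch: estimating the first-difference form to reduce collinearity.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    t = np.arange(40.0)
    x1 = t + rng.normal(size=40)            # both regressors trend upward,
    x2 = 2.0 * t + rng.normal(size=40)      # so their levels are collinear
    y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=40)

    dy, dx1, dx2 = np.diff(y), np.diff(x1), np.diff(x2)
    # No constant: differencing removes the intercept from the model.
    diff_model = sm.OLS(dy, np.column_stack([dx1, dx2])).fit()
    print(diff_model.params)                # estimates of beta1 and beta2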
Check your progress 3
1. State with reasons whether the following statements are true, false or uncertain
a) Despite perfect multicollinearity, OLS estimators are BLUE
b) If an auxiliary regression shows that a particular R 2 is high, there is definite
evidence of high collinearity.
2. In data involving economic time series such as GDP, income, prices, unemployment, etc.
multicollinearity is usually suspected. Why?
3. State three remedial measures if multicollinearity is detected.
4.7 ANSWERS TO CHECK YOUR PROGRESS
Answer to check your progress 1
1 a) False. Though OLS estimates are inefficient in the presence of
heteroscedasticity, they are still statistically unbiased.
b) True, because the OLS estimates do not have the minimum variance.
The answers to questions 2 and 3 are discussed in the text.
4. F* = ΣÛ2² / ΣÛ1² = 769,899.2 / 144,771.6 ≈ 5.32
The theoretical (table) value of F at the 5 percent level of significance (where n = 31,
c = 9 and k = 2, so that df = (n − c − 2k)/2 = 9 in each group) is 3.18.
Given that F* > F0.05, we reject the assumption of homoscedasticity.
Answer to check your progress 2
1 a) False, because OLS estimates are unbiased.
b) True (see the explanation in the text)
2 a) Note that for n = 50 and k = 4, dL is 1.38.
Since d = 1.05 is less than dL, it suggests the existence of positive autocorrelation.
b) dL = 1.38, dU = 1.72
Since 1.38 < d = 1.40 < 1.72, it lies in the indecisive range.
3. The answer is discussed in the text.
4. d* = Σ(Ût − Ût-1)² / ΣÛt² = 537,192 / 573,069 = 0.937
From the Durbin-Watson table, with a 5 percent level of significance, n = 20 and k = 1, we find
that dL = 1.20 and dU = 1.41. Since d* = 0.937 is less than dL = 1.20, we conclude that there is
positive autocorrelation in the import function.
Answer to check your progress 3
1. a) False. Under perfect multicollinearity the OLS estimators are indeterminate (they
cannot even be computed), so the BLUE property does not apply; it is only under high
but imperfect multicollinearity that OLS estimators remain BLUE.
b) True (refer to the text for the explanation)
2. This is because the variables are highly interrelated. For example, an increase in income
brings about an increase in GDP. Moreover, an increase in unemployment usually brings
about a decline in prices.
3. Refer to the text for the answer.
4.8 MODEL EXAMINATION QUESTIONS
1. True or false, and explain where necessary
a) In the presence of heteroscedasticity the usual OLS method always overestimates the
standard errors of estimators.
b) If a regression model is mis-specified, the OLS residuals will show a distinct pattern.
c) The Durbin-Watson d test assumes that the variance of the error term Ut is
homoscedastic.
d) In the case of high multicollinearity, it is not possible to assess the individual
significance of one or more partial regression coefficients.
2. Consider the following model
Yt = β0 + β1Xt + β2Xt-1 + β3Xt-2 + β4Xt-3 + Ut
where Y = consumption, X = income and t = time. Note that the model implies that
consumption expenditure at time t is a function of current income, Xt, and previous
periods' incomes.
a) Would you expect multicollinearity in such a model?
b) If collinearity is expected, how would you resolve the problem?
3. You are given the following data:
ΣÛ1² based on the first 30 observations = 55, df = 25
ΣÛ2² based on the last 30 observations = 140, df = 25
Carry out the Goldfeld-Quandt test of heteroscedasticity at the 5% level of significance.
4. Given a sample of 50 observations and 4 explanatory variables, what can you say about
autocorrelation if
a) d = 2.50 b) d = 3.97
5) Answer the following
a) Discuss the causes of heteroscedasticity
b) State the consequences of autocorrelation
c) Explain 3 remedial measures suggested to overcome multicollinearity.
UNIT 5: FURTHER TOPICS IN REGRESSION
5.1 INTRODUCTION
As mentioned in the previous section, this unit deals with the role of qualitative
explanatory variables in regression analysis and the functional forms of some non-linear
regression models. It will be shown that the introduction of qualitative variables, often called
dummy variables, makes the linear regression model an extremely flexible tool that is capable
of handling many interesting problems encountered in empirical studies. After a brief
introduction to such binary variables, the functional forms of regression models (i.e., regression
models that may be non-linear in the variables but are linear in the parameters) will be
discussed. Double-log, semilog and reciprocal models will be shown. We will see their special
features and functional forms.
5.2 MODELS WITH BINARY REGRESSORS
5.2.1 The Nature of Dummy Variables
In regression analysis it frequently happens that the dependent variable is influenced not only
by variables which can be readily quantified on some well-defined scale (e.g. income, output,
price, etc.) but also by variables which are essentially qualitative in nature, such as color,
sex, race, religion, a change in policy, or nationality. Since such variables may have an
influence on the dependent variable, they should be included in the model. How can one
include such variables as explanatory variables in the model?
Since such qualitative variables usually indicate the presence or absence of an "attribute"
(e.g., male or female, black or white), one method of quantifying such attributes is by
constructing artificial variables which take on values of 1 or 0, 0 indicating the absence of an
attribute and 1 indicating the presence of that attribute.
Example. If an individual is male = 1
female = 0
Variables which assume such 0 and 1 values are called dummy variables or binary variables or
qualitative variables or categorical variables or dichotomous variables.
Now let us take some examples with quantitative and qualitative explanatory variables.
Example 1: Suppose a researcher wants to find out whether sex makes any difference in a
college teacher’s salary, assuming that all other variables such as age, education level,
experience etc are held constant.
The model can be formulated as follows:
Yi = β0 + β1Di + Ui
where Yi = annual salary of a college teacher
Di = 1 if male college teacher
   = 0 otherwise (i.e., female teacher)
The above model is like the two-variable regression models discussed in unit 2, except that
instead of a quantitative variable X we have a dummy variable D. (The disturbance term
satisfies all the assumptions of the CLRM.)

Mean salary of female college teacher E(Yi/Di = 0) =


β0

Mean salary of male college teacher E(Yi/Di = 1) =


β0 + β1

The intercept term


β 0 gives mean salary of female college teachers and the slope coefficient

β 1 tells by how much the mean salary of a male college teacher differs from the mean salary of

his female counterpart.


Consider the following hypothetical data on starting salaries of college teachers by sex
Starting salary Sex
(Y) (1 = male, 0 = female)
22,000 1
19,000 0
18,000 0
21,700 1
18,500 0
21,000 1
20,500 1
17,000 0
17,500 0
21,200 1
The results of the regression analysis are presented as follows:
Ŷi = 18,000 + 3,280Di
se = (311.7) (440.9)
t = (57.74) (7.44)
R2 = 0.8737
The above results show that the estimated mean salary of female college teachers is birr
18,000 (= β̂0) and that of male teachers is birr 21,280 (= β̂0 + β̂1).
Since β̂1 is statistically significant, the results indicate that the mean salaries of the two
categories are different; actually the female teachers' average salary is lower than that of their
male counterparts. If all other variables are held constant, this points to sex discrimination in
the salaries of the two sexes.
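The regression above can be reproduced with a few lines of Python (a sketch using the ten observations from the table; the OLS routine is from the statsmodels library):

    # Sketch: OLS on a 0-1 sex dummy reproduces the two group means.
    import numpy as np
    import statsmodels.api as sm

    salary = np.array([22000, 19000, 18000, 21700, 18500,
                       21000, 20500, 17000, 17500, 21200])
    male = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 1])

    model = sm.OLS(salary, sm.add_constant(male)).fit()
    print(model.params)   # intercept = 18,000 (mean female salary),
                          # slope = 3,280 (male-female differential)

Note how the intercept equals the female group mean and the slope the difference between the two group means, exactly as the algebra of the dummy-variable model implies.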
The above regression can be shown graphically:
[Figure: two horizontal salary levels, β̂0 = 18,000 for female teachers and β̂0 + β̂1 = 21,280 for male teachers, the gap between them being β̂1 = 3,280]
Figure 5.1: Female and male teachers' salary functions


Example 2: Let us consider a regression with quantitative and qualitative explanatory
variables, by including one quantitative explanatory variable in the model given in example 1
above:
Yi = β0 + β1Di + β2Xi + Ui
where Yi = annual salary of a college teacher
Xi = years of teaching experience
Di = 1 if male
   = 0 otherwise
The female teacher is known as the base category since it is assigned the value of 0.
Note that the assignment of the 1 and 0 values to two categories, such as male and female, is
arbitrary in the sense that in our example we could have assigned D = 1 for female and D = 0
for male. But in interpreting the results of models which use dummy variables it is
critical to know how the 1 and 0 values are assigned.

The coefficient
β 0 (intercept) is the intercept term for the base category. The coefficient β 1
attached to the dummy variable D can be called the differential intercept coefficient because it
tells by how much the value of the intercept term of the category that receives the value of 1
differs from the intercept coefficient of the base category.
The other important point is the number of dummy variables to be included in the model. If
a qualitative variable has m categories, introduce only m − 1 dummy variables. In the above
examples, sex has two categories, and hence we introduced only a single dummy variable. If
this rule is not followed, we shall fall into what might be called the dummy-variable trap, that
is, a situation of perfect multicollinearity.
Example 3: Let us take an example of a regression on one quantitative variable and one
qualitative variable with more than two classes. Suppose we want to regress the annual
expenditure on health care by an individual on the income and education of the individual. Now
the variable education is qualitative in nature. We can have, as an example, three mutually
exclusive levels of education.
- Less than high school
- High school
- College
The number of dummies = 3 – 1 = 2. (Note the rule)
Let us consider the "less than high school education" category as the base category. The model
can be formulated as follows:
Yi = β0 + β1D1i + β2D2i + β3Xi + Ui
where Yi = annual expenditure on health care
Xi = annual income
D1 = 1 if high school education
   = 0 otherwise
D2 = 1 if college education
   = 0 otherwise
Assuming E(Ui) = 0, the mean health care expenditure functions for the three levels of
education are:
E(Yi/D1 = 0, D2 = 0, Xi) = β0 + β3Xi, for less than high school education
E(Yi/D1 = 1, D2 = 0, Xi) = (β0 + β1) + β3Xi, for high school education
E(Yi/D1 = 0, D2 = 1, Xi) = (β0 + β2) + β3Xi, for college education

[Figure: three parallel lines in expenditure-income space with common slope β3; the base category (less than high school) has intercept β0, the high school line lies β1 above it, and the college line β2 above it]
Figure 5.2: Expenditure on health care in relation to income for three levels of education

The intercept
β 0 is the intercept of the base category. The differential intercepts β 1 and β 2 tells
by how much the intercepts of the other two categories differ from the intercept of the base
category.
The technique of dummy variables can be easily extended to handle more than one qualitative
variable. If you consider example 1 above, it is possible to introduce another dummy variable,
for example the color of the teacher, as an explanatory variable. Hence we would have an
additional dummy variable for color, i.e.
D2 = 1 if white and 0 otherwise
Therefore, it is possible to include more than one quantitative variable and more than two
qualitative variables in our linear regression model.
Check Your Progress 5.2.1
1. Suppose that the salary of economics graduate students depends on their degree
qualification (whether a candidate has a Ph.D degree or not).
a) Specify the model with salary as the dependent variable and degree qualification as
the explanatory variable
b) Find E(Yi/Xi = 0) and E(Yi/Xi = 1)
2. Suppose that a researcher wants to regress the annual salaries of economics graduates on
the number of years of experience and education level of the graduates (Here we have
three levels of education, namely. BA, MSC and Ph.D)
a) How many dummy variables will be included in the model
b) Specify the model considering BA as the base category
c) Find the mean values of the annual salaries corresponding to different values of the
regressors.
5.3 NON-LINEAR REGRESSION MODELS
The purpose of this section is to introduce models that are linear in the parameters but
non-linear in the variables.
5.3.1 Non-Linear Relationships in Economics
The assumption of a linear relationship between the dependent and the explanatory variables
may not be acceptable for many economic relationships. Given the complexity of the real world
we expect non-linearities in most economic relationships.
Example 1: Cost functions are usually non-linear (e.g., the U-shaped average total cost curve, with ATC on the vertical axis and the quantity of output on the horizontal axis).
Example 2: Production functions (e.g., the S-shaped total product (TP) curve, with product Y on the vertical axis and input on the horizontal axis).
Fig 5.3: Examples of non-linear relationships in economics
Other economic functions like demand, supply, income-consumption curves, etc can also be
non-linear.
5.3.2 Specification and Estimation of Non-Linear Models
Now let us consider some of the commonly used regression models that may be non-linear in
the variables but are linear in the parameters.
5.3.2.1 Transformation of the Polynomials
Some of the most common forms of non-linear economic relationships can be expressed by
polynomials.

E(Yi/D1 = 1, D2 = 0, Xi) = (
β 0 + β 1 ) + β 3 X for high school education
i,

E(Yi/D1 = 0, D2 = 1, Xi) = (
β 0 + β 2 )+ β 3 X , for college education
i

Example 1: Y = β0 + β1X1 + β2X1² + β3X1³ + U
If we consider the U-shaped average cost curve:
C = β0 + β1X − β2X² + β3X³ + U
where C = total cost and X = output.
To fit this model we need to transform some of the variables.
Let X² = Z and X³ = W, with U the error term.
Then the above model becomes
C = β0 + β1X − β2Z + β3W + U
Now we can proceed with the application of OLS to the above linear relationship.
Example 2: Suppose we have data on the yield of wheat and the amount of fertilizer applied.
Assume that increased amounts of fertilizer begin to burn the crop, causing the yield to decline.
Y X X2
55 1 1
70 2 4
75 3 9
65 4 16
60 5 25
We want to fit the second degree equation
Yi = β0 + β1X1i + β2X1i² + Ui
Let X1i² = Wi
Then Yi = β0 + β1X1i + β2Wi + Ui
This is linear both in terms of the parameters and the variables, so we apply OLS to the above
function. The results are presented as follows:
Ŷi = 36 + 24.07Xi − 3.9Xi²
se =     (6.471)  (1.059)
t =       3.72    −3.71
It is possible to test the significance of Xi²:
H0: β2 = 0
H1: β2 < 0
t = (β̂2 − β2)/S(β̂2) = −3.90/1.059 = −3.71
t0.05(5−3) = 2.92
Decision: we reject H0. Since β̂2 is significant, Xi² should be retained in the model. This
implies that the relationship between yield and the amount of fertilizer has to be estimated by a
second degree equation.
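As a sketch, the second-degree equation can be fitted to the five observations in the table by constructing the squared regressor explicitly:

    # Sketch: fitting the quadratic yield-fertilizer equation by OLS.
    import numpy as np
    import statsmodels.api as sm

    y = np.array([55, 70, 75, 65, 60])     # yield
    x = np.array([1, 2, 3, 4, 5])          # fertilizer

    X = sm.add_constant(np.column_stack([x, x**2]))  # columns: 1, X, X^2
    model = sm.OLS(y, X).fit()
    print(model.params)    # approximately 36, 24.07, -3.93
    print(model.tvalues)   # t-ratios, matching those reported above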
5.3.2.2 Double-Log or log-Log Models
This model is very common in economics. Consider the following model:
Yi = β0 X1i^β1 X2i^β2
This can be transformed into linear form by using logarithms:
lnYi = lnβ0 + β1lnX1 + β2lnX2
Since both the dependent and the explanatory variables are expressed in terms of logarithms,
the model is known as a double-log or log-log or log-linear model.
If we include the disturbance term:
Yi = β0 X1i^β1 X2i^β2 e^U
which may alternatively be expressed as
lnYi = lnβ0 + β1lnX1 + β2lnX2 + U
This model is linear in the parameters and can be estimated by OLS if the assumptions of the
classical linear regression model are fulfilled.
Suppose lnYi = Y*, lnX1 = X1*, lnX2 = X2* and lnβ0 = β0*.
Then Y* = β0* + β1X1* + β2X2* + U, which is linear both in terms of the parameters and the
variables.

Example 3: The following table shows the yearly outputs of an industry and the amount of
inputs (labor and capital) used for eight firms.
Output Labor Capital
(Q) (L) (K)
100 1 2.0
120 1.3 2.2
140 1.8 2.3
150 2.0 1.5
165 2.5 2.8
190 3.0 3.0
200 3.0 3.3
220 4.0 3.4
The objective is to estimate the Cobb-Douglas production function for the industry on the basis
of this random sample of eight firms. The estimated production function is
lnQ = 4.3900 + 0.4349 lnL + 0.3395 lnK
se =           (0.1118)     (0.2683)
R2 = 0.99
The model can be written in its original form as follows:
Q = antilog(4.3900) L^0.4349 K^0.3395
Q = 80.64 L^0.4349 K^0.3395
Interpretation: One attractive feature of the log-log model, which has made it popular in
applied work, is that the coefficients β1 and β2 measure the elasticity of output with respect to
labor and capital, respectively.
β̂1 = 0.4349 implies that a one percent increase in the labor input will result in a 0.4349 percent
increase in the level of output, assuming that capital is held constant.
β̂2 = 0.3395 implies that a one percent increase in the amount of capital will increase the
level of output by 0.3395 percent, assuming that labor is constant.
Note that the sum of the elasticities (β1 + β2) indicates the type of returns to scale. The returns
to scale show the responsiveness of output when all inputs are changed proportionately:
If β1 + β2 = 1: constant returns to scale
If β1 + β2 > 1: increasing returns to scale
If β1 + β2 < 1: decreasing returns to scale
(You may refer to the course Microeconomic Theory I.)
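A sketch of the estimation, using the eight-firm sample above and the log-log transformation:

    # Sketch: Cobb-Douglas estimation via the double-log model.
    import numpy as np
    import statsmodels.api as sm

    Q = np.array([100, 120, 140, 150, 165, 190, 200, 220])
    L = np.array([1.0, 1.3, 1.8, 2.0, 2.5, 3.0, 3.0, 4.0])
    K = np.array([2.0, 2.2, 2.3, 1.5, 2.8, 3.0, 3.3, 3.4])

    X = sm.add_constant(np.column_stack([np.log(L), np.log(K)]))
    model = sm.OLS(np.log(Q), X).fit()
    b0, b1, b2 = model.params
    print(b1, b2)        # output elasticities of labor and capital
    print(b1 + b2)       # returns to scale: =1 constant, >1 increasing, <1 decreasing
    print(np.exp(b0))    # the multiplicative constant of the original form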


Check Your Progress 5.3.2
The following table shows the demand (Y) for a commodity and its price (X1).
Y       X1
543     61
580     54
618     50
695     43
724     38
812     36
887     28
991     23
1186    19
1940    10
a) Estimate the demand function Y = β0X1^β1 e^U, or lnY = lnβ0 + β1lnX1 + U
b) Interpret β̂1
5.3.2.3 Semilog Models: Log-lin and Lin-log models
Semilog models are those whose dependent or explanatory variable is written in the log form.

Example 1: 1. lnYi =
β0 + β1 X + U
i i

2. Yi = 0 + 1 lnXi + Ui
The above models are called semilog models. The first is called a log-lin model and the
second is known as a lin-log model. The names given to the above models are based on
whether the dependent variable or the explanatory variable is in the log form.
Now let us consider the log-lin model (model 1 above):
lnYi = β0 + β1Xi + Ui
β1 measures the relative change in Y for a given absolute change in X, that is
β1 = (relative change in Y) / (absolute change in X)
Multiplying the relative change in Y by 100 gives the percentage change in Y for an
absolute change in X.

Example: ln
G N^ P t = 6.96 + 0.027T
Where GNP = real gross
(0.015) (0.012) national product
T – time (in years)
r2 = 0.95
F1.13 = 260.34
The above result shows that the real GNP of the country was growing at the rate of 2.70 percent
per year (for the sample period). It is possible to estimate a linear trend model
G N^ P t = 1040.11 + 35 T
(18.9) (2.07)
r2 = 0.95
F1.13 = 284.7
This model implies that for the sample period the real GNP was growing at the constant
absolute amount of about $35 billion a year. The choice between the log-lin and linear model
will depend up on whether one is interested in the relative or the absolute change in the GNP.
NB: you can not compare the r2 values of the two models since the dependent variables are
different.
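A minimal sketch of the log-lin trend regression (the GNP series below is made up purely for illustration, since the original data are not reproduced in this unit):

    # Sketch: log-lin model; the slope estimates the per-period growth rate.
    import numpy as np
    import statsmodels.api as sm

    gnp = np.array([1040, 1068, 1097, 1127, 1157, 1188, 1220,
                    1253, 1287, 1322, 1358, 1394, 1432, 1471, 1510], float)
    T = np.arange(1, len(gnp) + 1)

    model = sm.OLS(np.log(gnp), sm.add_constant(T)).fit()
    print(f"growth rate = {100 * model.params[1]:.2f} percent per period")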
5.3.2.4 Reciprocal Models
The function defined as
Yi = β0 + β1(1/Xi) + Ui
is known as a reciprocal model. Although this model is non-linear in the variable X, because X
enters inversely or reciprocally, the model is linear in β0 and β1 and is therefore a linear
regression model. The method of OLS can be applied to estimate the model.
If we let 1/Xi = Zi, the model becomes
Yi = β0 + β1Zi + Ui, which is linear both in terms of the parameters and the variables.
The above model shows that as X increases indefinitely, the term β1(1/X) approaches zero and
Y approaches the limiting or asymptotic value β0. Some examples are shown below.

[Figure: three panels of Y plotted against X. Panel (a): β1 > 0, β0 > 0, the curve falls toward the asymptote β0. Panel (b): β1 > 0, β0 < 0, the curve falls, crosses the X axis and approaches the negative asymptote β0. Panel (c): β1 < 0, the curve rises toward the asymptote β0, cutting the X axis at −β1/β0.]
Figure 5.4: the reciprocal model Yi = β0 + β1(1/X)

We can have examples for each of the above functions (fig. a, b and c)
1. The average fixed cost (AFC) curve relates the average fixed cost of production to the level
of output. As indicated in fig. (a), the AFC declines continuously as output increases.
2. The Phillips curve, which relates the unemployment rate to the rate of inflation, is a
good example of fig. (b) above.
3. The reciprocal model of fig. (c) is appropriate for the Engel expenditure curve, which relates
a consumer's expenditure on a commodity to his total expenditure or income. (A sketch of
fitting such a model appears below.)
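As a sketch, the Phillips-curve version of the reciprocal model can be estimated by simply transforming the regressor; here we use the 1950-1966 wage/unemployment data given in question 3 of section 5.7 below:

    # Sketch: reciprocal model Y = b0 + b1*(1/X) estimated by OLS.
    import numpy as np
    import statsmodels.api as sm

    y = np.array([1.8, 8.5, 8.4, 4.5, 4.3, 6.9, 8.0, 5.0, 3.6,
                  2.6, 2.6, 4.2, 3.6, 3.7, 4.8, 4.3, 4.6])   # wage change, %
    x = np.array([1.4, 1.1, 1.5, 1.5, 1.2, 1.0, 1.1, 1.3, 1.8,
                  1.9, 1.5, 1.4, 1.8, 2.1, 1.5, 1.3, 1.4])   # unemployment, %

    z = 1.0 / x                                 # the transformed regressor
    model = sm.OLS(y, sm.add_constant(z)).fit()
    print(model.params)   # [b0, b1]; b0 is the asymptotic floor of Y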
SUMMARY ON FUNCTIONAL FORMS

Model        Equation               Slope (dy/dx)     Elasticity (dy/dx)(x/y)
Linear       Y = β0 + β1X           β1                β1(x/y)
Log-linear   lnY = β0 + β1lnX       β1(y/x)           β1
Log-lin      lnY = β0 + β1X         β1y               β1x
Lin-log      Y = β0 + β1lnX         β1(1/x)           β1(1/y)
Reciprocal   Y = β0 + β1(1/x)       −β1(1/x²)         −β1(1/(xy))

Note that if the values of x and y are not given, the elasticity is often calculated at the mean
values, x̄ and ȳ.

Check Your Progress 5.3.3
1. The reciprocal model Ŷt = −1.43 + 8.72(1/xt), r2 = 0.38
   se = (2.07) (2.85), F1,15 = 9.39
is estimated from given data,
where Yt = annual percentage change in the wage rate
x = unemployment rate
a) Which of the panels in fig. 5.4 fits this model?
b) What does the value β̂0 = −1.43 indicate?


5.5 ANSWERS TO CHECK YOUR PROGRESS
5.2.1
1 a) Yi = β0 + β1Xi + Ui
b) E(Yi/Xi = 0) = β0
   E(Yi/Xi = 1) = β0 + β1
2. a) Number of dummy variables = number of categories − 1 = 3 − 1 = 2
b) Yi = β0 + β1D1i + β2D2i + β3Xi + Ui
where Yi = annual salary
Xi = years of experience
D1 = 1 if Ph.D degree
   = 0 otherwise
D2 = 1 if M.Sc degree
   = 0 otherwise
c) E(Yi/D1 = 1, D2 = 0, Xi) = (β0 + β1) + β3Xi
E(Yi/D1 = 0, D2 = 0, Xi) = β0 + β3Xi
E(Yi/D1 = 0, D2 = 1, Xi) = (β0 + β2) + β3Xi
5.3.2
a) lnY = 9.121 – 0.69 ln X
(10.07) (0.02)
R2 = 0.992
b) An increase in price by one percent will decrease the demand for the commodity by
0.69 percent.
5.3.3
(a) fig (b)
(b) –1.43 is the wage floor. It shows that as X increases indefinitely the percentage
decrease in wages will not be more than 1.43 percent per year.
5.7 MODEL EXAMINATION QUESTIONS
1. Explain the role of qualitative explanatory variables in regression analysis.
2. The following regression explains the determination of moonlighters' hourly wages.
Wm = 37.07 + 0.403W0 − 90.06ra + 75.51U + 47.33H + 113.64re + 2.26A
se =        (0.062)    (24.47)   (21.60)  (23.42)  (27.62)    (0.94)
R2 = 0.34, df = 311
where Wm = moonlighting wage (cents/hour)
Wo = primary wage (cents/hour)
ra = race: = 0 if white
           = 1 if non-white
U = urban: = 0 if non-urban
           = 1 if urban
H = high school: = 0 if non-graduate
                 = 1 if high school graduate
re = region: = 0 if non-western resident
             = 1 if western resident
A = age, in years
From the above equation derive the hourly wage equations for the following types of
moonlighters.
a) White, non-urban, western resident, and high school graduate.
b) Nonwhite, urban, non-western resident, and non-high school graduate.
c) White, non-urban, non-west resident, and high school graduate.
d) White, non-urban, non-west, non-graduate
e) Nonwhite, urban, west, high school graduate (when the dummies are equal to 1)
f) What do you understand about the statistical significance of the variables in the above
model?
g) Interpret the coefficients.
3. The following table gives data on the annual percentage change in wage rates (Y) and the
unemployment rate (X) for a country for the period 1950-1966.
Percentage increase Unemployment (%)
Year in wage rates (Y) X
1950 1.8 1.4
1951 8.5 1.1
1952 8.4 1.5
1953 4.5 1.5
1954 4.3 1.2
1955 6.9 1.0
1956 8.0 1.1
1957 5.0 1.3
1958 3.6 1.8
1959 2.6 1.9
1960 2.6 1.5
1961 4.2 1.4
1962 3.6 1.8
1963 3.7 2.1
1964 4.8 1.5
1965 4.3 1.3
1966 4.6 1.4
Use these data to fit the following model:
Yt = β0 + β1(1/xt) + Ut
4. We have seen the following growth model:
ln(GN̂P)t = 6.96 + 0.027T, se = (0.015) (0.0017), r2 = 0.95, F1,13 = 260.34
and the linear trend model:
GN̂Pt = 1040.11 + 35T, se = (18.86) (2.07), r2 = 0.95, F1,13 = 284.74
Which model do you prefer? Why?
5. The demand function for coffee is estimated as follows:
Ŷt = 2.69 − 0.4795Xt, r2 = 0.6628
se = (0.1216) (0.1140)
where Yt = cups per person per day and Xt = average retail price of coffee.
Find the price elasticity of demand.
UNIT 6: INTRODUCTION TO SIMULTANEOUS EQUATION

6.1 Introduction
The application of least squares to a single equation assumes, among others, that the
explanatory variables are truly exogenous, i.e., that there is one-way causation between the
dependent variable (Y) and the explanatory variables (X). When this is not the case, the
function cannot be treated in isolation as a single equation model; it belongs to a wider system
of equations which describes the relationships among all the relevant variables. In such cases
we must use a multi-equation model which would include separate equations in which Y and X
would appear as endogenous variables. A system describing the joint dependence of variables
is called a system of simultaneous equations.
6.2. SIMULTANEOUS DEPENDENCE OF ECONOMIC VARIABLES
In the single-equation models discussed in the previous units the cause-and-effect relationship
is unidirectional: the explanatory variables are the cause and the dependent variable is the
effect.
However, there are situations where there is a two-way flow of influence among economic
variables; that is, one economic variable affects another economic variable(s) and is, in turn,
affected by it (them). In such cases we need to consider more than one equation and thus come
up with simultaneous equation models, in which there is one regression equation for each
jointly determined (endogenous) variable.
The first thing we need to answer is the question: what happens if the parameters of each
equation are estimated by applying, say, the method of OLS, disregarding the other equations in
the system? Recall that one of the crucial assumptions of the method of OLS is that the
explanatory X variables are either non-stochastic or, if stochastic (random), distributed
independently of the stochastic disturbance term. If neither of these conditions is met, then the
least-squares estimators are not only biased but also inconsistent; that is, as the sample size
increases indefinitely, the estimators do not converge to their true (population) values.
For example, consider the following hypothetical system of equations:
Y1i = β10 + β12Y2i + γ11X1i + U1i .......... (6.1)
Y2i = β20 + β21Y1i + γ21X1i + U2i .......... (6.2)
where Y1 and Y2 are mutually dependent, or endogenous, variables (i.e., whose values are
determined within the model), X1 is an exogenous variable (whose values are determined
outside the model), and U1 and U2 are stochastic disturbance terms. The variables Y1 and Y2
are both stochastic. Therefore, unless it can be shown that the stochastic explanatory variable
Y2 in (6.1) is distributed independently of U1 and the stochastic explanatory variable Y1 in (6.2)
is distributed independently of U2, application of classical OLS to these equations individually
will lead to inconsistent estimates.
Example: Recall that the price of a commodity and the quantity (bought and sold) are
determined by the intersection of the demand and supply curves for that commodity. Consider
the following linear demand and supply models:
Demand function:       Qt^d = α0 + α1Pt + U1t .......... (6.3)
Supply function:       Qt^s = β0 + β1Pt + U2t .......... (6.4)
Equilibrium condition: Qt^d = Qt^s .......... (6.5)
where Qt^d = quantity demanded, Qt^s = quantity supplied, P = price and t = time.
Note that P and Q are jointly dependent variables. If U1t changes because of changes in other
variables affecting Qt^d (such as income and tastes), the demand curve shifts. Recall that such a
shift in demand changes both P and Q. Similarly, a change in U2t (because of changes in
weather and the like) will shift the supply curve, again affecting both P and Q. Because of this
simultaneous dependence between Q and P, U1t and Pt in (6.3) and U2t and Pt in (6.4) cannot be
independent.
Therefore, a regression of Q on P as in (6.3) would violate an important assumption of the
classical linear regression model, namely the assumption of no correlation between the
explanatory variable(s) and the disturbance term. In summary, the above discussion reveals
that, in contrast to single equation models, in simultaneous equation models more than one
dependent, or endogenous, variable is involved, necessitating as many equations as the number
of endogenous variables. As a consequence such an endogenous explanatory variable becomes
stochastic and is usually correlated with the disturbance term of the equation in which it
appears as an explanatory variable.
Recall that the variables entering a simultaneous equation model are of two types:
endogenous and predetermined variables. Endogenous variables are those variables whose
values are determined inside the model. Predetermined variables, on the other hand, are those
whose values are determined outside the model. Predetermined variables are divided into
exogenous and lagged endogenous variables. Although non-economic variables such as rainfall
and weather are clearly exogenous or predetermined, the model builder must exercise great care
in classifying economic variables as endogenous or predetermined. Consider the Keynesian
model of income determination:
Consumption function: Ct = β0 + β1Yt + Ut, 0 < β1 < 1 .......... (6.6)
Income identity:      Yt = Ct + It .......... (6.7)
In this model C (consumption) and Y (income) are endogenous variables. Investment (I), on the
other hand, is treated as an exogenous variable. Note that if there were lagged values of the
consumption and income variables (i.e., Ct-1 and Yt-1) they would have been treated as lagged
endogenous, and hence predetermined, variables.
Consider the problem of estimating the consumption function, regressing consumption on
income. Suppose the disturbance in the consumption function jumps up. This directly increases
consumption, which through the equilibrium condition increases income. But income is the
independent variable in the consumption function (6.6). Thus, the disturbance in the
consumption function and the regressor are positively correlated. An increase in the
disturbance term (directly implying an increase in consumption) is accompanied by an increase
in income (also implying an increase in consumption). When estimating the influence of
income on consumption, however, the OLS technique attributes both of these increases in
consumption (instead of just the latter) to the accompanying increase in income. This implies
that the OLS estimator of the marginal propensity to consume (β1) is biased upward, even
asymptotically.
Both equations 6.6 and 6.7 are structural, or behavioral, equations because they portray the
structure of an economy, with equation (6.7) being an identity. The β's are known as the
structural parameters or coefficients. From the structural equations one can solve for the
endogenous variables and derive reduced-form equations and the associated reduced-form
coefficients. A reduced-form equation is one that expresses an endogenous variable solely in
terms of the predetermined variables and the stochastic disturbances.
If equation (6.6) is substituted into equation (6.7) and we solve for Y, we obtain
Yt = β0/(1 − β1) + 1/(1 − β1) It + Ut/(1 − β1)
   = π0 + π1It + Wt .......... (6.8)
where π0 = β0/(1 − β1), π1 = 1/(1 − β1) and Wt = Ut/(1 − β1).
Equation (6.8) is a reduced-form equation; it expresses the endogenous variable Y solely as a
function of the exogenous (or predetermined) variable I and the stochastic disturbance term U.
π0 and π1 are the associated reduced-form coefficients.
Substituting the value of Y from equation (6.8) into Yt of equation (6.6), we obtain another
reduced-form equation:
Ct = π2 + π3It + Wt .......... (6.9)
where π2 = β0/(1 − β1), π3 = β1/(1 − β1) and Wt = Ut/(1 − β1).
The reduced-form coefficients (the π's) are also known as impact, or short-run, multipliers,
because they measure the immediate impact on the endogenous variable of a unit change in the
value of the exogenous variable. If in the preceding Keynesian model the investment
expenditure (I) is increased by, say, $1 and if the marginal propensity to consume (i.e., β1) is
assumed to be 0.8, then from π1 of (6.8) we obtain π1 = 1/(1 − 0.8) = 5. This result means that
increasing investment by $1 will immediately (i.e., in the current time period) lead to an
increase in income of $5, that is, a fivefold increase.
Notice an interesting feature of the reduced-form equations. Since only the predetermined
variables and stochastic disturbances appear on the right side of these equations, and since the
predetermined variables are assumed to be uncorrelated with the disturbance terms, the OLS
method can be applied to estimate the coefficients of the reduced-form equations (the π's). This
will be the case if a researcher is only interested in predicting the endogenous variables, or only
wishes to estimate the size of the multipliers (i.e., the π's).
Note that since the reduced-form coefficients can be estimated by the OLS method, and since
these coefficients are combinations of the structural coefficients, the possibility exists that the
structural coefficients can be "retrieved" from the reduced-form coefficients, and it is in the
estimation of the structural parameters that we may be ultimately interested. Unfortunately,
retrieving the structural coefficients from the reduced-form coefficients is not always possible;
this problem is one way of viewing the identification problem.

6.3. THE IDENTIFICATION PROBLEM


By the identification problem we mean whether numerical estimates of the parameters of a
structural equation can be obtained from the estimated reduced-form coefficients. If this can be
done, we say that the particular equation is identified. If this cannot be done, then we say that
the equation under consideration is unidentified, or under identified.
Note that the identification problem is a mathematical (as opposed to statistical) problem
associated with simultaneous equation systems. It is concerned with the question of the
possibility or impossibility of obtaining meaningful estimates of the structural parameters.
An identified equation may be either exactly (or fully or just) identified or over identified. It is
said to be exactly identified if unique numerical values of the structural parameters can be
obtained, and over identified if more than one numerical value can be obtained for some of the
parameters of the structural equations. The circumstances under which each of these cases
occurs will be shown in the following discussion.
a) Under Identification
Consider the demand-and-supply model (6.3) and (6.4), together with the market-clearing, or
equilibrium, condition (6.5) that demand is equal to supply. By the equilibrium condition (i.e.,
Qt^d = Qt^s) we obtain
α0 + α1Pt + U1t = β0 + β1Pt + U2t .......... (6.10)
Solving (6.10) using the substitution technique employed in (6.8) and (6.9), we obtain the
equilibrium price
Pt = π0 + Vt .......... (6.11)
where π0 = (β0 − α0)/(α1 − β1)
      Vt = (U2t − U1t)/(α1 − β1)
Substituting Pt from (6.11) into (6.3) or (6.4), we obtain the following equilibrium quantity:
Qt = π1 + Wt .......... (6.12)
where π1 = (α1β0 − α0β1)/(α1 − β1)
      Wt = (α1U2t − β1U1t)/(α1 − β1)

Note that π0 and π1 (the reduced-form coefficients) contain all four structural parameters α0,
α1, β0 and β1, but there is no way in which the four structural unknowns can be estimated from
only two reduced-form coefficients. Recall from high school algebra that to estimate four
unknowns we must have four (independent) equations and, in general, to estimate k unknowns
we must have k (independent) equations. What all this means is that, given time series data on
P (price) and Q (quantity) and no other information, there is no way the researcher can
guarantee whether he/she is estimating the demand function or the supply function. That is, a
given Pt and Qt represent simply the point of intersection of the appropriate demand and supply
curves, because of the equilibrium condition that demand is equal to supply.
b) Just or Exact Identification
The reason we could not identify the preceding demand function or the supply function was
that the same variables P and Q are present in both functions and there is no additional
information. But suppose we consider the following demand and supply model.
Demand function: Qt = α0 + α1Pt + α2It + U1t,  α1 < 0, α2 > 0 .......... (6.13)
Supply function:  Qt = β0 + β1Pt + β2Pt-1 + U2t,  β1 > 0, β2 > 0 .......... (6.14)
where I = income of the consumer, an exogenous variable
Pt-1 = price lagged one period, usually incorporated in the model to explain the supply of
many agricultural commodities.
Note that Pt-1 is a predetermined variable because its value is known at time t.
By the market-clearing mechanism we have
α0 + α1Pt + α2It + U1t = β0 + β1Pt + β2Pt-1 + U2t .......... (6.15)
Solving this equation, we obtain the following equilibrium price:
Pt = π0 + π1It + π2Pt-1 + Vt .......... (6.16)
where π0 = (β0 − α0)/(α1 − β1),  π1 = −α2/(α1 − β1)
      π2 = β2/(α1 − β1),        Vt = (U2t − U1t)/(α1 − β1)
Substituting the equilibrium price (6.16) into the demand or supply equation (6.13) or (6.14),
we obtain the corresponding equilibrium quantity:
Qt = π3 + π4It + π5Pt-1 + Wt .......... (6.17)
where the reduced-form coefficients are
π3 = (α1β0 − α0β1)/(α1 − β1),  π4 = −α2β1/(α1 − β1)
π5 = α1β2/(α1 − β1),           Wt = (α1U2t − β1U1t)/(α1 − β1)

The demand-and-supply model given in equations (6.13) and (6.14) contains six structural
coefficients (α0, α1, α2, β0, β1 and β2), and there are six reduced-form coefficients (π0, π1, π2,
π3, π4 and π5) to estimate them. Thus, we have six equations in six unknowns, and normally
we should be able to obtain unique estimates. Therefore, the parameters of both the demand
and supply equations can be identified, and the system as a whole is identified.
c) Over identification
Note that for certain goods and services, the wealth of the consumer is another important
determinant of demand. Therefore, the demand function (6.13) can be modified as follows,
keeping the supply function as before:
Demand function: Qt = α0 + α1Pt + α2It + α3Rt + U1t .......... (6.18)
Supply function:  Qt = β0 + β1Pt + β2Pt-1 + U2t .......... (6.19)
where R represents wealth.
Equating demand to supply, we obtain the following equilibrium price and quantity:
Pt = π0 + π1It + π2Rt + π3Pt-1 + Vt .......... (6.20)
Qt = π4 + π5It + π6Rt + π7Pt-1 + Wt .......... (6.21)
where π0 = (β0 − α0)/(α1 − β1),  π1 = −α2/(α1 − β1)
      π2 = −α3/(α1 − β1),        π3 = β2/(α1 − β1)
      π4 = (α1β0 − α0β1)/(α1 − β1),  π5 = −α2β1/(α1 − β1)
      π6 = −α3β1/(α1 − β1),      π7 = α1β2/(α1 − β1)
      Vt = (U2t − U1t)/(α1 − β1),  Wt = (α1U2t − β1U1t)/(α1 − β1)
The demand-and-supply model in (6.18) and (6.19) contains seven structural coefficients, but
there are eight reduced-form coefficients (i.e., π0 through π7) to estimate them. Notice that the
number of equations is greater than the number of unknowns. As a result, unique estimation of
all the parameters of our model is not possible. For example, one can solve for β1 in the
following two ways:
β1 = π6/π2 or β1 = π5/π1
That is, there are two estimates of the price coefficient in the supply function, and there is no
guarantee that these two values or solutions will be identical. Moreover, since the two estimates
of β1 will generally differ, the indeterminacy will be transmitted to the other estimates. Note
that the supply function is exactly identified in the system (6.13) and (6.14) but over identified
in the system (6.18) and (6.19), although in both cases the supply function remains the same.
This is because we have "too much", or an over-sufficiency of, information to identify the
supply curve. The over-sufficiency of information results from the fact that in the model (6.13)
and (6.14) the exclusion of the income variable from the supply function was enough to
identify it, but in the model (6.18) and (6.19) the supply function excludes not only the income
variable but also the wealth variable. In other words, in the latter model we put "too many"
restrictions on the supply function by requiring it to exclude more variables than necessary to
identify it. However, this situation does not imply that over identification is necessarily bad,
since the problem of too much information can be handled.
Notice that this situation is the opposite of the case of under identification, where there is too
little information. The only way in which the structural parameters of unidentified (or under
identified) equations can be identified (and thus be capable of being estimated) is through the
imposition of further restrictions, or the use of more extraneous information. Such restrictions,
of course, must be imposed only if their validity can be defended.
In a simple example such as the foregoing, it is easy to check for identification; in more
complicated systems, however, it is not so easy. This time-consuming procedure can be
avoided by resorting to either the order condition or the rank condition of identification.
Although the order condition is easy to apply, it provides only a necessary condition for
identification. The rank condition, on the other hand, is both a necessary and sufficient
condition for identification. [Note: the order and rank conditions for identification will not be
discussed, since the objective of this unit is to briefly introduce the reader to simultaneous
equations. For a detailed and advanced discussion readers can refer to the reference list stated
at the end of this unit.]
6.4 A TEST OF SIMULTANEITY
If there is no simultaneous equation, or simultaneity, problem, the OLS estimators are
consistent and efficient. On the other hand, if there is simultaneity, OLS estimators are not even
consistent, so that other estimation methods must be looked for. If we apply these alternative
methods when there is in fact no simultaneity, the result will not be efficient. This suggests that
we should check for the simultaneity problem before we discard OLS in favor of the
alternatives.
A test of simultaneity is essentially a test of whether an (endogenous) regressor is correlated
with the error term. If it is, the simultaneity problem exists, in which case alternatives to OLS
must be found; if it is not, we can use OLS. To find out which is the case in a concrete
situation, we can use Hausman's specification error test.
Hausman Specification Test
Consider the following two-equation model:
Demand function: Qt = α0 + α1Pt + α2It + α3Rt + U1t .......... (6.22)
Supply function:  Qt = β0 + β1Pt + U2t .......... (6.23)
Assume that I and R are exogenous; of course, P and Q are endogenous.
Now consider the supply function (6.23). If there is no simultaneity problem (i.e., P and Q are
mutually independent), Pt and U2t should be uncorrelated. On the other hand, if there is
simultaneity, Pt and U2t will be correlated. To find out which is the case, the Hausman test
proceeds as follows:
First, from (6.22) and (6.23) we obtain the following reduced-form equations:
Pt = π0 + π1It + π2Rt + Vt .......... (6.24)
Qt = π3 + π4It + π5Rt + Wt .......... (6.25)
where V and W are the reduced-form error terms. Estimating (6.24) by OLS we obtain
P̂t = π̂0 + π̂1It + π̂2Rt .......... (6.26)
Therefore Pt = P̂t + V̂t .......... (6.27)
where P̂t are the estimated Pt and V̂t are the estimated residuals. Substituting (6.27) into (6.23)
we get:
Qt = β0 + β1P̂t + β1V̂t + U2t .......... (6.28)
Now, under the null hypothesis that there is no simultaneity, the correlation between V̂t and U2t
should be zero, asymptotically. Thus, if we run the regression (6.28) and find that the
coefficient of V̂t in (6.28) is statistically zero, we can conclude that there is no simultaneity
problem.
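The test can be sketched in a few lines of Python; the data below are simulated so that P really is endogenous, and the code uses the common equivalent form of (6.28) in which Q is regressed on P itself plus the first-stage residual, so that the t-ratio on the residual directly tests for simultaneity:

    # Sketch of the Hausman simultaneity test with simulated data.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 200
    I = rng.normal(size=n)                          # exogenous income
    R = rng.normal(size=n)                          # exogenous wealth
    u1, u2 = rng.normal(size=n), rng.normal(size=n)
    P = 1.0 + 0.5 * I + 0.3 * R + 0.5 * (u1 - u2)   # price depends on u2,
    Q = 2.0 + 0.8 * P + u2                          # so P is endogenous in (6.23)

    # Step 1: reduced-form regression of P on the exogenous variables (6.24).
    stage1 = sm.OLS(P, sm.add_constant(np.column_stack([I, R]))).fit()
    v_hat = stage1.resid

    # Step 2: regress Q on P and v-hat; a significant coefficient on v_hat
    # signals correlation between P and the supply-equation error.
    X2 = sm.add_constant(np.column_stack([P, v_hat]))
    stage2 = sm.OLS(Q, X2).fit()
    print(stage2.tvalues[2])   # t-ratio on v_hat: large => simultaneity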
6.5 APPROACHES TO ESTIMATION
At the outset it may be noted that the estimation problem is rather complex because there are a
variety of estimation techniques with varying statistical properties. In view of the introductory
nature of this unit we shall consider very briefly the following techniques.
a) The method of Indirect Least Squares (ILS)
For just or exactly identified structural equations, the method of obtaining estimates of the
structural coefficients from the OLS estimates of the reduced-form coefficients is known as the
method of indirect least squares (ILS). ILS involves the following three steps:
Step I: We first obtain the reduced-form equations.
Step II: We apply OLS to the reduced-form equations individually.
Step III: We obtain estimates of the original structural coefficients from the estimated
reduced-form coefficients obtained in step II.
b) The method of Two-Stage Least Squares (2SLS)
This method is applied in estimating an over identified equation. Theoretically, two-stage least
squares may be considered an extension of the ILS method. The 2SLS method boils down to
the application of ordinary least squares in two stages. That is, in the first stage we apply least
squares to the reduced-form equations in order to obtain estimates of the endogenous variables
appearing on the right-hand side of the equation; we then replace those variables with their
estimated values and apply OLS to the transformed original equation to obtain estimates of the
structural parameters.
Note, however, that since 2SLS is equivalent to ILS in the just-identified case, it is usually
applied uniformly to all identified equations in the system. [For a detailed discussion of this
method readers may refer to the reference list stated at the end of this unit.]
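A sketch of 2SLS "by hand" for a supply equation with an endogenous price, again on simulated data (dedicated instrumental-variables routines should be preferred in practice, since this naive second stage reports incorrect standard errors):

    # Sketch: two-stage least squares for Q = b0 + b1*P with instruments I, R.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 200
    I = rng.normal(size=n)
    R = rng.normal(size=n)
    u1, u2 = rng.normal(size=n), rng.normal(size=n)
    P = 1.0 + 0.5 * I + 0.3 * R + 0.5 * (u1 - u2)   # endogenous regressor
    Q = 2.0 + 0.8 * P + u2                          # structural equation

    # Stage 1: regress P on all the predetermined variables.
    Z = sm.add_constant(np.column_stack([I, R]))
    P_hat = sm.OLS(P, Z).fit().fittedvalues

    # Stage 2: replace P by its fitted value and apply OLS.
    tsls = sm.OLS(Q, sm.add_constant(P_hat)).fit()
    print(tsls.params)   # slope should be close to the true value 0.8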
Check Your Progress

1. What do we mean by simultaneous equation?

2. When do we say a model is over identified? What is the consequence of over


identification?

3. Briefly explain the ILS and 2SLS approaches to estimation

4. Consider the following hypothetical model.


It = α0 + α1Yt + Ut
Yt = Qt + It
Write the reduced-form equations for Yt and It.
6.7 ANSWERS TO CHECK YOUR PROGRESS
Refer to the text for questions 1-3.
4. Yt = α0/(1 − α1) + 1/(1 − α1) Qt + 1/(1 − α1) Ut
   It = α0/(1 − α1) + α1/(1 − α1) Qt + 1/(1 − α1) Ut

6.8 MODEL EXAMINATION


1. What is the economic meaning of the imposition of "zero restrictions" on the parameters of a
model?
2. Consider the following extended Keynesian model of income determination.
Ct = β0 + β1Yt + β2It + U1t
It = α0 + α1Yt-1 + U2t
Tt = γ0 + γ1Yt + U3t
In the model identify the endogenous and exogenous variables.
3. State whether each of the following statements are true or false.
a) The method of OLS can be applied to estimate a structural equation in a simultaneous
equation model.
b) In case an equation is not identified, 2SLS is not applicable.
4. The model
Y1t = β10 + β12Y2t + γ11X1t + U1t
Y2t = β20 + β21Y1t + U2t
produces the following reduced-form equations:
Y1t = 4 + 8X1t
Y2t = 2 + 12X1t
a) Which structural coefficients, if any, can be estimated from the reduced form
coefficients?
b) Show that the reduced form parameters measure the total effect of a change in the
exogenous variables.
UNIT 7: REGRESSION ON DUMMY DEPENDENT VARIABLE
7.1 INTRODUCTION
Binary dependent variables are extremely common in the social sciences. Suppose we want to
study the labor-force participation of adult males as a function of the unemployment rate,
average wage rate, family income, education, etc. A person either is in the labor force or not.
Hence, the dependent variable, labor-force participation, can take only two values: 1 if the
person is in the labor force and 0 if he or she is not. Consider another example: a family may or
may not own a house. If it owns a house, the variable takes the value 1, and 0 if it does not.
There are several such examples where the dependent variable is dichotomous. A unique
feature of all the examples is that the dependent variable is of the type that elicits a yes or no
response; that is, it is dichotomous in nature. Now, before we discuss the estimation of models
involving dichotomous response variables, let us briefly discuss the concept of qualitative
response models.
7.2 QUALITATIVE RESPONSE MODELS (QRM)
These are models in which the dependent variable is a discrete outcome.
Example: Y = β0 + β1X1 + β2X2 + U
Y = 1, if individual i attended college
  = 0, otherwise
In the above example the dependent variable Y takes on only two values (i.e., 0 and 1).
Conventional regression cannot be used to analyze a qualitative dependent variable model;
such models are analyzed in the general framework of probability models.
7.2.1 Categories of Qualitative Response Models (QRM)
Two broad categories of QRM
A. Binomial Model
The choice is between two alternatives
B. Multinomial models
The choice is between more than two alternatives
Example: Y = 1, occupation is farming
= 2, occupation is carpentry
= 3, occupation is fishing
Let us define some important terminologies
i. Binary variables: are variables that have two categories and are often used to indicate that
an event has occurred or that some characteristic is present.
Example: - Decision to participate in the labor force/or not to participate
-Decision to vote or not to vote
ii. Ordinal variables: these are variables that have categories that can be ranked.
Example: rank to indicate political orientation
Y = 1, radical
  = 2, liberal
  = 3, conservative
Rank according to educational attainment:
Y = 1, primary education
  = 2, secondary education
  = 3, university education
iii. Nominal variables: these variables occur when there are multiple outcomes that cannot be
ordered.
Example: occupation can be grouped as farming, fishing, carpentry, etc.
Y = 1, farming
  = 2, fishing
  = 3, carpentry
  = 4, livestock
(Note that the numbers are assigned arbitrarily.)
iv. Count variables: These variables indicate the number of times some event has occurred.
Example: how many strikes have occurred.
Now let us turn our attention to the four most commonly used approaches to estimating binary
response models (Type of binomial models).
1. Linear probability models
2. The logit model
3. The probit model
4. The tobit (censored regression) model.
7.3 THE LINEAR PROBABILITY MODEL (LPM)
The linear probability model is the regression model applied to a binary dependent variable. To
fix ideas, consider the following simple model:
Yi = β0 + β1Xi + Ui .......... (1)
where X = family income
Y = 1 if the family owns a house
  = 0 if the family does not own a house
Ui is the disturbance term.
The independent variable Xi can be a discrete or a continuous variable, and the model can be
extended to include other additional explanatory variables.
The above model expresses the dichotomous Yi as a linear function of the explanatory variable
Xi. Such models are called linear probability models (LPM) since E(Yi/Xi), the conditional
expectation of Yi given Xi, can be interpreted as the conditional probability that the event will
occur given Xi; that is, Pr(Yi = 1/Xi). Thus, in the preceding case, E(Yi/Xi) gives the
probability of a family owning a house whose income is the given amount Xi. The justification
of the name LPM can be seen as follows.
Assuming E(Ui) = 0, as usual (to obtain unbiased estimators), we obtain
E(Yi/Xi) = β0 + β1Xi .......... (2)

Now, letting Pi = probability that Yi = 1 (that is, that the event occurs) and 1 − Pi = probability
that Yi = 0 (that is, that the event does not occur), the variable Yi has the following
distribution:
Yi Probability
0 1−Pi
1 Pi
Total 1
Therefore, by the definition of mathematical expectation, we obtain
E(Yi) = 0(1 − Pi) + 1(Pi) = Pi .......... (3)
Now, comparing (2) with (3), we can equate
E(Yi/Xi) = β0 + β1Xi = Pi .......... (4)
That is, the conditional expectation of model (1) can, in fact, be interpreted as the conditional
probability of Yi.
Since the probability Pi must lie between 0 and 1, we have the restriction 0 ≤ E(Yi/Xi) ≤ 1; that
is, the conditional expectation, or conditional probability, must lie between 0 and 1.
Problems with the LPM
While the interpretation of the parameters is unaffected by having a binary outcome, several
assumptions of the LPM are necessarily violated.
1. Heteroscedasticity
The variance of the disturbance term depends on the X's and is thus not constant. Let us see
this as follows. We have the following probability distribution for U:
Yi        Ui                 Probability
0         −β0 − β1Xi         1 − Pi
1         1 − β0 − β1Xi      Pi
Now by definition Var(Ui) = E[Ui − E(Ui)]² = E(Ui²), since E(Ui) = 0 by assumption.
Therefore, using the preceding probability distribution of Ui, we obtain
Var(Ui) = E(Ui²) = (−β0 − β1Xi)²(1 − Pi) + (1 − β0 − β1Xi)²(Pi)
        = (−β0 − β1Xi)²(1 − β0 − β1Xi) + (1 − β0 − β1Xi)²(β0 + β1Xi)
        = (β0 + β1Xi)(1 − β0 − β1Xi)
or Var(Ui) = E(Yi/Xi)[1 − E(Yi/Xi)] = Pi(1 − Pi)
This shows that the variance of Ui is heteroscedastic because it depends on the conditional
expectation of Y, which, of course, depends on the value taken by X. Thus the OLS estimator
of β is inefficient and the standard errors are biased, resulting in incorrect tests.
2. Non-normality of Ui
Although OLS does not require the disturbances (U's) to be normally distributed, we assumed
them to be so distributed for the purpose of statistical inference, that is, hypothesis testing, etc.
But the assumption of normality for Ui is no longer tenable for the LPM because, like Yi, Ui
takes on only two values:
Ui = Yi − β0 − β1Xi
Now when Yi = 1, Ui = 1 − β0 − β1Xi
and when Yi = 0, Ui = −β0 − β1Xi
Obviously Ui cannot be assumed to be normally distributed. Recall, however, that normality is
not required for the OLS estimates to be unbiased.
3. Non-Sensical Predictions
The LPM produces predicted values outside the normal range of probabilities (0, 1). It predicts
values of Y that are negative or greater than 1. This is the real problem with the OLS
estimation of the LPM.
4. Functional Form
Since the model is linear, a unit increase in X results in a constant change of β1 in the
probability of the event, holding all other variables constant. The increase is the same
regardless of the current value of X. In many applications this is unrealistic. When the outcome
is a probability, it is often substantively reasonable that the effects of the independent variables
will have diminishing returns as the predicted probability approaches 0 or 1.
Remark: Because of the above-mentioned problems, the LPM is not recommended for
empirical work. (The simulation sketch below illustrates problem 3.)
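A small simulation (with hypothetical data) makes problem 3 concrete: fitting the LPM by OLS routinely produces fitted "probabilities" below 0 or above 1:

    # Sketch: the LPM can predict probabilities outside [0, 1].
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 100
    x = rng.uniform(0, 40, size=n)                  # e.g. family income
    p_true = np.clip(0.05 * x - 0.4, 0, 1)          # true P(Y = 1 | X)
    y = rng.binomial(1, p_true)                     # observed 0-1 outcome

    lpm = sm.OLS(y, sm.add_constant(x)).fit()
    fitted = lpm.fittedvalues
    print("fitted values below 0:", int((fitted < 0).sum()))
    print("fitted values above 1:", int((fitted > 1).sum()))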
Check Your Progress 7.1
1. Explain the binary or dichotomous variables.
2. Differentiate among binary, ordinal and nominal variables.
3. What is a linear probability model (LPM)? What are the shortcomings of this model?
7.4 THE LOGIT MODEL
We have seen that the LPM has many problems, such as non-normality of Ui,
heteroscedasticity of Ui, the possibility of Ŷi lying outside the 0-1 range, and generally lower
R² values. But these problems are surmountable. The fundamental problem with the LPM is
that it is not logically a very attractive model, because it assumes that Pi = E(Y = 1/X) increases
linearly with X; that is, the marginal or incremental effect of X remains constant throughout.
Example: The LPM estimated by OLS (on home ownership) is given as follows:
Ŷi = −0.9457 + 0.1021Xi
se = (0.1228)  (0.0082)
t = (−7.6984) (12.515)
R2 = 0.8048
The above regression is interpreted as follows:
- The intercept of −0.9457 gives the "probability" that a family with zero income will
own a house. Since this value is negative, and since a probability cannot be negative, we
treat this value as zero.
- The slope value of 0.1021 means that for a unit change in income, on average the
probability of owning a house increases by 0.1021, or about 10 percent. This is so
whatever the income level, which seems patently unrealistic. In reality one would expect
Pi to be non-linearly related to Xi.
Therefore, what we need is a (probability) model that has the following two features:
1. As Xi increases, Pi = E(Y = 1/X) increases but never steps outside the 0-1 interval.
2. The relationship between Pi and Xi is non-linear, that is, "one which approaches zero at
slower and slower rates as Xi gets small and approaches one at slower and slower rates
as Xi gets very large".
Geometrically, the model we want would look something like fig 7.1 below.
[Figure: an S-shaped curve rising from 0 toward 1 as X increases from −∞ to +∞]
Fig 7.1 A Cumulative Distribution Function (CDF)
The above S-shaped curve is very similar to the cumulative distribution function (CDF) of a random variable. (Note that the CDF of a random variable X is simply the probability that it takes a value less than or equal to x0, where x0 is some specified numerical value of X. In short, F(X), the CDF of X, is F(X = x0) = P(X ≤ x0). Please refer to your statistics for economists text.)
Therefore, one can easily use the CDF to model regressions where the response variable is dichotomous, taking 0-1 values.
The CDFs commonly chosen to represent the 0-1 response models are:
a) the logistic - which gives rise to the logit model
b) the normal - which gives rise to the probit (or normit) model
Now let us see how one can estimate and interpret the logit model.
Recall that the LPM (for home ownership) was

Pi = E(Y = 1/Xi) = β0 + β1Xi

where X is income and Y = 1 means the family owns a house. Now consider the following representation of home ownership:

Pi = E(Y = 1/Xi) = 1 / (1 + e^-(β0 + β1Xi))

or, more compactly,

Pi = 1 / (1 + e^-Zi),   where Zi = β0 + β1Xi
This equation represents what is known as the (cumulative) logistic distribution function. Since the equation is non-linear in both X and the β's, we cannot use the familiar OLS procedure to estimate the parameters. It can, however, be linearized as follows:

1 - Pi = 1 / (1 + e^Zi)

Pi / (1 - Pi) = (1 + e^Zi) / (1 + e^-Zi) = e^Zi

Now Pi/(1 - Pi) is simply the odds ratio in favor of owning a house - the ratio of the probability that a family will own a house to the probability that it will not own a house.
Taking the natural log of the odds ratio we obtain

Li = ln[Pi / (1 - Pi)] = Zi = β0 + β1Xi

L (the log of the odds ratio) is linear in X as well as in the β's (the parameters). L is called the logit, and hence the name logit model.
The interpretation of the logit model is as follows:
β1 - the slope - measures the change in L for a unit change in X.
β0 - the intercept - tells the value of the log-odds in favor of owning a house if income is zero. Like most interpretations of intercepts, this one may not have any physical meaning.
Now for estimation purposes, let us write the logit model as

Li = ln[Pi / (1 - Pi)] = β0 + β1Xi + Ui

To estimate the above model we need values of Xi and Li. Standard OLS cannot be applied to individual observations, since the corresponding values of L are meaningless: when Yi = 1, L = ln(1/0), and when Yi = 0, L = ln(0/1), neither of which is defined.
Estimation is therefore by the maximum likelihood method (because of its mathematical complexities we will not discuss the method here; a sketch of how it is done in practice follows below).
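As an illustration of how this is done in practice, here is a minimal sketch in Python using statsmodels, whose Logit class carries out the maximum likelihood estimation internally; the simulated data and variable names are illustrative assumptions, not from the text.

```python
# Minimal sketch: estimate a logit model by maximum likelihood.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
income = rng.uniform(0, 40, size=200)              # hypothetical regressor
p_true = 1 / (1 + np.exp(-(-4 + 0.25 * income)))
y = rng.binomial(1, p_true)                        # binary response

X = sm.add_constant(income)
logit_res = sm.Logit(y, X).fit()                   # ML estimation, not OLS
print(logit_res.summary())
print(logit_res.predict(X)[:5])                    # fitted Pi, always in (0, 1)
```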
Example: Logit estimates. Assume that Y is related to the variables Xi as follows:

Yi = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + Ui

The logit estimation results are presented as:

Ŷi = -10.84 - 0.74X1 - 11.6X2 - 5.7X3 - 1.3X4 + 2.5X5
t = (-3.20) (-2.51) (-3.01) (-2.4) (-1.37) (1.62)
The variables X1, X2 and X3 are statistically significant at the 99% level; the variable X4 is significant at the 90% level. The estimated result shows that the variables X1, X2 and X3 have a negative effect on the probability of the event occurring (i.e., Y = 1), while X5 (with coefficient β5 > 0) has a positive effect on the probability of the event occurring.
Note: The parameters of the model are not the same as the marginal effects we are used to when analyzing OLS. In the logit model the marginal effect of X on the probability is β·Pi(1 - Pi), which varies with the level of the predicted probability.
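A quick numerical illustration of this point, using assumed (hypothetical) coefficient values rather than estimates from the text:

```python
# Hypothetical logit coefficients: the marginal effect beta1 * P * (1 - P)
# changes with the level of X, unlike the constant slope of the LPM.
import numpy as np

beta0, beta1 = -4.0, 0.25          # assumed values, for illustration only
for x in (5.0, 16.0, 30.0):
    p = 1 / (1 + np.exp(-(beta0 + beta1 * x)))
    print(f"X={x:5.1f}  P={p:.3f}  marginal effect={beta1 * p * (1 - p):.4f}")
```

The effect is largest near P = 0.5 and shrinks as P approaches 0 or 1, which is exactly the diminishing-returns behavior the LPM cannot capture.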
7.5 THE PROBIT MODEL
The estimating model that emerges from the normal CDF is popularly known as the probit
model.
Here the observed dependent variable Y takes on one of the values 0 and 1 according to the following criterion. Define a latent variable Y* such that

Yi* = Xiβ + εi

Yi = 1 if Yi* > 0
Yi = 0 if Yi* ≤ 0
The latent variable Y* is continuous (-∞ < Y* < ∞) and generates the observed binary variable Y, which can be observed in two states:
i) if the event occurs, Y takes the value 1
ii) if the event does not occur, Y takes the value 0
The latent variable is assumed to be a linear function of the observed X's through the structural model.
Example: Let Y measure whether one is employed or not; it is a binary variable taking the values 0 and 1. Y* measures the willingness to participate in the labor market; it changes continuously and is unobserved. If X is the wage rate, then as X increases the willingness to participate in the labor market will increase (Y*, the willingness to participate, cannot be observed). The individual's decision changes (Y becomes zero) if the wage rate is below the critical point.
Since Y* is continuous, the model avoids the problems inherent in the LPM (i.e., non-normality of the error term and heteroscedasticity).
However, since the latent dependent variable is unobserved, the model cannot be estimated using OLS. Maximum likelihood is used instead.
Most often, the choice is between normal errors and logistic errors, resulting in the probit (normit) and logit models, respectively. The coefficients derived from the maximum likelihood (ML) function will be the coefficients of the probit model if we assume a normal distribution; if we assume that the appropriate distribution of the error term is logistic, the coefficients from the ML function will be those of the logit model. In both cases, as with the LPM, it is assumed that E[εi/Xi] = 0.
In the probit model, it is assumed that Var(εi/Xi) = 1; in the logit model, it is assumed that Var(εi/Xi) = π²/3. Hence the estimates of the parameters (β's) from the two models are not directly comparable.
But as Amemiya suggests, a logit estimate of a parameter multiplied by 0.625 gives a fairly good approximation of the probit estimate of the same parameter. Similarly, the coefficients of the LPM and the logit model are related as follows:
βLPM = 0.25 βLogit, except for the intercept
βLPM = 0.25 βLogit + 0.5 for the intercept
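The following minimal sketch checks Amemiya's rule of thumb on simulated data; the use of statsmodels and the data-generating values are our own assumptions, not part of the text.

```python
# Minimal sketch: fit logit and probit to the same data and compare
# 0.625 * (logit coefficients) with the probit coefficients.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))

X = sm.add_constant(x)
logit_b = sm.Logit(y, X).fit(disp=0).params
probit_b = sm.Probit(y, X).fit(disp=0).params

print("logit:        ", logit_b)
print("probit:       ", probit_b)
print("0.625 * logit:", 0.625 * logit_b)   # roughly equal to the probit row
```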
Summary
- Logit function:

P(Y = 1/X) = e^(α + βXi) / [1 + e^(α + βXi)] = 1 / (1 + e^-(α + βXi))

(we obtain the second expression by dividing both the numerator and the denominator by e^(α + βXi))
- Probit function:

P(Y = 1/X) = Φ(α + βXi)

where Φ(·) is the normal cumulative distribution function, i.e., the CDF corresponding to the normal density

f(x) = [1/(σ√(2π))] exp[-(x - μ)²/(2σ²)]

with μ = 0 and σ = 1 for the standard normal used in the probit model.
Therefore, it is possible to avoid the problems of nonsensical predictions and of a constant impact of X on the dependent variable (i.e., the impact will not be constant), since both models are non-linear.
Check Your Progress 7.2
1. Explain the differences between the LPM and the logit or probit models.
2. Specify the mathematical form of both the probit and logit models.
3. Explain or outline the similarities and differences between the probit and logit models.
7.6 THE TOBIT MODEL
An extension of the probit model is the tobit model developed by James Tobin. To explain this
model, let us consider the home ownership example.
Suppose we want to find out the amount of money a consumer spends on buying a house in relation to his or her income and other economic variables. Now we have a problem: if a consumer does not purchase a house, we have no data on housing expenditure for such consumers; we have such data only on consumers who actually purchase a house.
Thus consumers are divided into two groups: one consisting of, say, N1 consumers about whom we have information on the regressors (say income, interest rate, etc.) as well as the regressand (amount of expenditure on housing), and another consisting of, say, N2 consumers about whom we have information only on the regressors but not on the regressand. A sample in which information on the regressand is available only for some observations is known as a censored sample. Therefore, the tobit model is also known as a censored regression model.
Mathematically, we can express the tobit model as

Yi = β0 + β1X1i + Ui   if RHS > 0
Yi = 0                 otherwise

where RHS = right-hand side (i.e., β0 + β1X1i + Ui).
The method of maximum likelihood can be used to estimate the parameters of such models; a sketch follows below.
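Since ready-made tobit routines are less common, the sketch below writes out the censored-at-zero log-likelihood directly and maximizes it with scipy; the simulated data, variable names, and starting values are all illustrative assumptions.

```python
# Minimal sketch: tobit (censored-at-zero) estimation by maximum likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=300)
y_star = -5 + 1.5 * x + rng.normal(scale=2, size=300)   # latent expenditure
y = np.maximum(y_star, 0)                               # observed, censored at 0

def neg_loglik(theta):
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)                 # keeps sigma positive
    xb = b0 + b1 * x
    ll = np.where(y > 0,
                  norm.logpdf(y, loc=xb, scale=sigma),   # uncensored part
                  norm.logcdf(-xb / sigma))              # censored part
    return -ll.sum()

res = minimize(neg_loglik, x0=np.array([0.0, 1.0, 0.0]), method="BFGS")
print("beta0, beta1:", res.x[:2], " sigma:", np.exp(res.x[2]))
```

OLS on the same censored sample would give biased estimates, which is the motivation for the likelihood approach.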
7.8 ANSWERS TO CHECK YOUR PROGRESS QUESTIONS
Answers to check your progress questions in this unit are already discussed in the text.
7.10 MODEL EXAMINATION QUESTIONS
1. When do we use models like LPM, logit and probit?
2. The LPM is the simplest of the above three models. But it has several limitations.
Discuss.
3. Can we use the standard OLS method to estimate the probit and logit models? Why?
4. Why do we call the tobit model a censored regression model?
5. Specify the mathematical form of the tobit model and discuss how one can estimate such models.
UNIT 8: TIME SERIES ECONOMETRICS (A BRIEF INTRODUCTION)
8.0 AIMS AND OBJECTIVES
The aim of this unit is to extend the discussion of regression analysis by incorporating a brief
discussion of time series econometrics.
After the student has completed this unit, he/she will be able to:
 understand the concept of stationarity
 formulate and conduct the ADF test
 distinguish between trend stationary and difference stationary processes
 understand the relationship between spurious regression and integration
 specify an error correction model.
8.1 INTRODUCTION
Recall from our unit one discussion that one of the two important types of data used in empirical analysis is time series data. Time series data have become so frequently and intensively used in empirical research that econometricians have recently begun to pay very careful attention to such data.
In this very brief discussion we first define the concept of a stationary time series and then develop tests to find out whether a time series is stationary. In this connection we introduce some related concepts, such as unit roots. We then distinguish between trend stationary and difference stationary time series. A common problem in regressions involving time series data is the phenomenon of spurious regression, so an introduction to this concept is made. Finally, the concept of cointegration is stated and its importance in empirical research pointed out.
8.2 STATIONARITY AND UNIT ROOTS
Any time series data can be thought of as being generated by a stochastic or random process. A type of stochastic process that has received a great deal of attention from time series analysts is the so-called stationary stochastic process.
Broadly speaking, a stochastic process is said to be stationary if its mean and variance are constant over time and the value of the covariance between two time periods depends only on the distance or lag between the two periods and not on the actual time at which the covariance is computed. A non-stationary series, on the other hand, has no long-run mean to which the variable returns, and its variance grows without bound as time goes by.
For many time series, however, stationarity is unlikely to hold. If this is the case, the conventional hypothesis testing procedures based on t, F, chi-square and other tests become suspect. In other words, if the variables in the model are non-stationary, the result is spurious regression: the fact that the variables share a common trend will tend to produce a significant relationship between them. The relationship exhibits contemporaneous correlation as a result of the common trend rather than a true causal relationship. Hence, with non-stationary variables, OLS generates misleading results.
Studies have developed different mechanisms that enable non-stationary variables to attain stationarity. It has been argued that if a variable has a deterministic trend (i.e., a trend that is perfectly predictable rather than variable or stochastic), including a trend variable in the regression removes the trend component and makes the variable stationary. For example, in the regression of personal consumption expenditure (PCE) on income (PDI), if we observe a very high R2, which is typically the case, it may reflect not the true degree of association between the two variables but simply the common trend present in them: with time, the two variables move together. To avoid such spurious association, the common practice is to regress PCE on PDI and t (time), the trend variable. The coefficient of PDI obtained from this regression now represents the net influence of PDI on PCE, the trend effect having been removed. In other words, the explicit introduction of the trend variable in the regression has the effect of detrending (i.e., removing the influence of trend from) both PCE and PDI. Such a process is called trend stationary, since the deviation from the trend is stationary.
However, most time series have the characteristic of a stochastic trend (that is, the trend is itself variable and therefore cannot be predicted with certainty). In such cases, in order to avoid the problems associated with spurious regression, pre-testing the variables for the existence of unit roots (i.e., non-stationarity) becomes compulsory. In general, if a variable has a stochastic trend, it needs to be differenced in order to attain stationarity. Such a process is called a difference stationary process.
In this regard, the Dickey-Fuller (DF) test enables us to assess the existence of stationarity. The simplest DF test starts with the following first-order autoregressive model:

Yt = ρYt-1 + Ut ……………….. (8.1)

Subtracting Yt-1 from both sides gives

Yt - Yt-1 = ρYt-1 - Yt-1 + Ut
ΔYt = (ρ - 1)Yt-1 + Ut
ΔYt = δYt-1 + Ut ……………….. (8.2)

where ΔYt = Yt - Yt-1 and δ = ρ - 1.
The test for stationarity is conducted on the parameter δ. If δ = 0 (or ρ = 1), it implies that ΔYt = Ut and hence the variable Y is not stationary (has a unit root). In time series econometrics, a time series that has a unit root is known as a random walk, because the change in Y (ΔYt) is purely the result of the error term Ut. Thus, a random walk is an example of a non-stationary time series.
For the test of stationarity the hypothesis is formulated as follows:
H0: δ = 0 (or ρ = 1)
H1: δ < 0 (or ρ < 1)
Note that (8.2) is appropriate only when the series Yt has a zero mean and no trend term. But it is impossible to know in advance whether the true process for Yt has zero mean and no trend term. For this reason, including a constant (drift) and a time trend in the regression is recommended. Thus (8.2) is expanded to the following form:

ΔYt = α + δYt-1 + βT + Ut ……………….. (8.3)

where α is the constant (drift) term and T is the trend element.
Here as well the parameter δ is used in testing for stationarity. Rejecting the null hypothesis (H0: δ = 0) implies that there exists stationarity: ΔYt is influenced by Yt-1 in addition to Ut, so the change in Yt does not follow a random walk. Note that accepting the null hypothesis suggests the existence of a unit root (non-stationarity).
The DF test has a serious limitation in that it suffers from residual autocorrelation: it is inappropriate to use the DF distribution in the presence of autocorrelated errors. To amend this weakness, the DF model is augmented with additional lagged first differences of the dependent variable. This is called the Augmented Dickey-Fuller (ADF) test; the augmented regression avoids autocorrelation among the residuals. Incorporating lagged first differences of Yt in (8.3) gives the following ADF model:

ΔYt = α + βT + δYt-1 + Σ(i=1 to k) γi ΔYt-i + Ut ……………….. (8.4)

where k is the lag length.
Now the test for stationarity is free from the problem of residual autocorrelation, and the hypothesis test (just like the above) can be conducted.
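In practice the ADF regression and its test statistic are available in standard software; the following is a minimal sketch in Python's statsmodels on a simulated random walk (the data and seed are illustrative assumptions, not from the text).

```python
# Minimal sketch: ADF test with drift and trend, matching equation (8.4).
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
y = np.cumsum(rng.normal(size=200))      # random walk: has a unit root

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="ct")
print("ADF statistic:", round(stat, 3))
print("5% critical value:", crit["5%"])
# If the statistic is not more negative than the critical value, we fail to
# reject H0: the series is non-stationary. Re-running the test on np.diff(y)
# should reject H0, indicating the series is I(1).
```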
Example: Let us illustrate the ADF test using the personal consumption expenditure (PCE) data of Ethiopia. Suppose that the regression of PCE corresponding to (8.4) gave the following results:

ΔPCEt = 233.08 + 1.64T - 0.06PCEt-1 + ΔPCEt-1 ……………….. (8.5)

For our purpose the important thing is the τ (tau) statistic of the PCEt-1 variable; its critical values are specially tabulated, which helps to test the hypothesis stated earlier. Suppose the calculated τ value does not exceed (in absolute value) its table value; in that case we fail to reject the null hypothesis, which indicates that the PCE time series is not stationary. If it is not stationary, using the variable at levels will lead to spurious regression results. As has been stated earlier, if a variable is not stationary at levels, we need to conduct the test on the variable in its difference form. If a variable that is not stationary in levels appears to be stationary after differencing n times, the variable is said to be integrated of order n; symbolically we write I(n). Suppose we repeat the preceding exercise using the first difference of PCE (i.e., ΔPCEt = PCEt - PCEt-1) as the dependent variable. If the test result allows us to reject the null hypothesis, we conclude that PCE is integrated of order one, I(1). Note from our discussion that applying OLS to stationary variables yields non-spurious results. Therefore, before a regression using time series variables is performed, the stationarity of all the variables must first be checked.
Note that taking the variables in difference form captures only the dynamic interaction among the variables, with no information about the long-run relationship. However, if the variables that are non-stationary separately share the same trend, this indicates that the variables have a stationary linear combination. This in turn implies that the variables are cointegrated, i.e., there exists a long-run equilibrium relationship among the variables.
Check Your Progress 1
1. Distinguish between trend stationary process (TSP) and a difference stationary process
(DSP)?
2. What is meant by stationarity and unit roots?
3. What is meant by integrated time series?
4. Discuss the concept of spurious regression
5. Explain the concept of ADF tests
8.3 COINTEGRATION ANALYSIS AND ERROR CORRECTION MECHANISM
Cointegration among variables reflects the presence of a long-run relationship in the system. We need to test for cointegration because differencing the variables to attain stationarity generates a model that does not show the long-run behavior of the variables. Hence, testing for cointegration is the same as testing for a long-run relationship.
There are two approaches used in testing for cointegration: i) the Engle-Granger (two-step) algorithm and ii) the Johansen approach.
The Engle-Granger (EG) method requires that, for cointegration to exist, all the variables must be integrated of the same order. Once the variables are found to have the same order of integration, the next step is testing for cointegration. This requires generating the residuals from the estimated static equation and testing their stationarity. By doing so we are testing whether the deviations from the long run (captured by the error term) are stationary or not. If the residuals are found to be stationary, the variables are cointegrated, which in turn ensures that deviations from the long-run equilibrium relationship die out with time.
Example: Suppose we regress PCE on PDI and obtain the following estimated relationship between the two:

PCEt = β0 + β1PDIt + Ut ……………….. (8.6)

To identify whether PCE and PDI are cointegrated (i.e., have a stationary linear combination) or not, we write (8.6) as follows:

Ut = PCEt - β0 - β1PDIt ……………….. (8.7)

The purpose of (8.7) is to check whether Ut [i.e., the linear combination (PCEt - β0 - β1PDIt)] is I(0), or stationary. Using the procedure stated in the earlier subunit for testing stationarity, if we reject the null hypothesis then we say that the variables PCE and PDI are cointegrated.
If the variables are cointegrated, the regression on the levels of the two variables, as in (8.6), is meaningful (i.e., not spurious); and we do not lose any valuable long-term information, as we would if we used their first differences instead.
In short, provided we check that the residuals are stationary, the traditional regression methodology that we have learned so far (including the t and F tests) is applicable to data involving time series.
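A minimal sketch of the two-step procedure in Python, on simulated series built to share a common trend (all names and parameter values are illustrative assumptions, not from the text):

```python
# Minimal sketch: Engle-Granger two-step cointegration test.
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(4)
pdi = np.cumsum(rng.normal(size=300))            # I(1) "income" series
pce = 10 + 0.8 * pdi + rng.normal(size=300)      # cointegrated with pdi

step1 = sm.OLS(pce, sm.add_constant(pdi)).fit()  # static long-run regression (8.6)
resid = step1.resid                              # estimated equilibrium error

stat = adfuller(resid, regression="n")[0]        # no constant: residuals have mean 0
print("ADF statistic on residuals:", round(stat, 3))
# A strongly negative statistic rejects the unit-root null for the residuals,
# indicating that pce and pdi are cointegrated.
```

Strictly speaking, the residual-based test should use Engle-Granger critical values rather than the ordinary ADF tables; statsmodels' coint function applies the appropriate values.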
We just showed that PCE and PDI are cointegrated, that is, there is a long-term equilibrium relationship between the two. Of course, in the short run there may be disequilibrium. Therefore, one can treat the error term in (8.7) as the "equilibrium error". We can use this error term to tie the short-run behavior of PCE to its long-run value. In other words, the presence of cointegration makes it possible to model the variables (in first differences) through the error correction model (ECM). In the model, a one-period lagged value of the residual serves as the error correction term, and its coefficient captures the speed of adjustment to the long-run equilibrium. The following specification shows, with the PCE/PDI example, how the ECM works:

ΔPCEt = α0 + α1ΔPDIt + α2Ût-1 + εt ……………….. (8.8)

where Ût-1 is the one-period lagged value of the residual from regression (8.6) and εt is the error term with the usual properties.
In (8.8) PDI captures the short run disturbances in PDI whereas the error correction term
U^ t−1

captures the adjustment toward the long-run equilibrium. If 2 is statistical significant (and has
to be negative between 0 and –1), it tells us what proportion of the disequilibrium in PCE in
one period is corrected in the next period.
Example: Suppose we obtain the following result:

ΔPĈEt = 11.69 + 0.29ΔPDIt - 0.08Ût-1 ……………….. (8.9)
t = (5.32) (4.17) (-2.3)

where the figures in parentheses are t-values.
The result shows that short-run changes in PDI have a significant positive effect on PCE and that about 0.08 (or 8%) of the discrepancy (or deviation) between the actual and the long-run, or equilibrium, value of PCE is eliminated or corrected each year. (Note that the error correction term captures the speed of adjustment to the long-run equilibrium.)
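Continuing the simulated PCE/PDI sketch from the cointegration example above, the ECM in (8.8) can be estimated by OLS on the differenced data plus the lagged residual (again a sketch under the same illustrative assumptions):

```python
# Minimal sketch: error correction model, reusing pce, pdi and resid from
# the Engle-Granger sketch above.
import numpy as np
import statsmodels.api as sm

d_pce = np.diff(pce)                   # delta-PCE_t
d_pdi = np.diff(pdi)                   # delta-PDI_t
ec_term = resid[:-1]                   # U-hat_{t-1}: lagged equilibrium error

X = sm.add_constant(np.column_stack([d_pdi, ec_term]))
ecm = sm.OLS(d_pce, X).fit()
print(ecm.params)   # coefficient on ec_term should be negative: the share of
                    # last period's disequilibrium corrected this period
```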
However, the use of the Engle-Granger method is criticized for its failure on some issues that are addressed by the Johansen approach. Interested readers can find a detailed discussion of this advanced approach in Harris (1995).
Check Your Progress 2
1. Discuss the concept of cointegration
2. Explain the error correction mechanism (ECM). What is its relation with cointegration?
8.5 ANSWER TO CHECK YOUR PROGRESS
The answers for all questions are found in the discussion under sub units 8.2 and 8.3
8.6 MODEL EXAMINATION
1. Outline the Engle Granger Method for cointegration
2. A time series that has a unit root is called a random walk. Explain
Discuss the following:
3. Why do we need to incorporate a one-period lagged value of the error term in the ECM?
4. Suppose the coefficient of Ût-1 in an ECM regression is -0.47. Interpret the result.
Name________________________
Id. No_______________________
P.O.Box______________________
City (Town)___________________
Region (Zone)_________________
Wolaita Sodo University
DEPARTMENT OF AGRICULTURAL ECONOMICS
Worksheet for Introduction to Econometrics (Econ. 331)
This is a test paper you are expected to do on your own. It carries 15 points. The test paper
should be completed and mailed to the School of Distance and Continuing Education for
evaluation. Do not try to complete the worksheet until you have covered all the lessons and
exercises in the course material.
Any questions in the course that you have not been able to understand should be stated on a
separate sheet of paper and attached to this worksheet. Your tutor will clarify them for you.
After completing this test paper, be certain to write your Name, Id.No and Address on the first
page. Only your Name and Id.No on the other pages.
Part I: Attempt any three of the following
1. Econometrics is considered as an integration of economic theory, mathematical economics and statistics, but is entirely different from each one of them. Explain.
2. The following represents the true relationship between the independent variables
X1, X2, X3, and the dependent variable Y
Yi= bo+b1X1i+b2X2i+b3X3i+Ui
where Y=Quantity demanded
X1=price of the commodity.
X2=price of the other commodities
X3=Income of the consumer
Ui=disturbance term
i) Is the above relation exact? Why?
ii) What is the economic meaning of the coefficients?
iii) What will be the expected sign of the coefficients?
iv) What will be the expected size (magnitude) of the coefficients?
3. When do we use models like LPM, logit and probit?
4. The LPM is the simplest of the above three models. But it has several limitations. Discuss.
5. Explain the concept of simultaneous equations. When do we need them?
6. What is spurious regression?
Part II: Workout Questions (attempt any three of the following)
1. There are occasions when the two variable linear regression model assumes the
following form:
Yi=Xi+Ei
where  is the parameter and E is the disturbance term. In this model the intercept term
is zero. The model is therefore known as regression through the origin.
For this model show that
i) the least squares estimator is

β̂ = ΣXiYi / ΣXi²

ii) Var(β̂) = σu² / ΣXi²

where σu² is estimated by σ̂u² = Σei² / (n - 1), and ei represents the residuals.
2. The following data are based on the consumption expenditure (Y) and income (X) of 15 households for a particular month:
ΣX = 1922   ΣY = 1823   ΣXY = 1,838,678
ΣX² = 2,541,333   ΣY² = 1,789,940
i) Obtain marginal propensity to consume and level of autonomous consumption
ii) Construct a 95% confidence interval for the coefficients
iii) Test the significance of the coefficients at 5% level
iv) Comment on goodness of fit.
3. The quantity demanded of a commodity (Y) is assumed to be a linear function of its price (X). The following results have been obtained from a sample of 10 observations:

Price in Birr (X):     15   13   12   12    9    7    7    4    6    3
Quantity in kg (Y):   760  775  780  785  790  795  800  810  830  840
Making use of the above information
i) Estimate the linear relationship and interpret the results.
ii) Estimate the standard errors of the regression coefficients.
iii) What percent of the variation in quantity demanded is explained by the regression
line?
iv) Compute the price elasticity of demand at X=12 and Y=780
v) What is the average price elasticity of demand?
vi) Forecast the demand at a price level of 10 birr and set a 95% confidence limit for
the forecasted value.
vii) Test the significance of the regression coefficients.
viii) Conduct tests of significance for r and R2
ix) Present the result of your analysis.
4. Given the following observations on output (Y), labor input (X1) and capital input (X2) for 12 firms (X1, X2 and Y are measured in arbitrary units):

Firm:           1   2   3   4   5   6   7   8   9  10  11  12
Output:        14  18  23  39  24  60  56  65  76  15  27  35
Labor input:   11  13  22  45  30  60  62  57  76  15  28  34
Capital input: 30  24  31  27  31  56  42  90  80  18  30  20

a) Estimate the multiple linear regression model using OLS.
b) Fit the Cobb-Douglas production function to the above data.
c) Compare and contrast each function with respect to labor and capital elasticities, marginal products, and the response of output to the inputs.
d) Which of the two functions is satisfactory on the basis of economic theory and traditional statistical theory?
e) What do your results suggest regarding returns to scale?
f) Test the assumption of constant returns to scale.
5. The following table shows the levels of output (Y), labor input (X1) and capital input (X2) of 12 firms measured in arbitrary units:

ΣX2 = 110   ΣX2² = 980      ΣYX1 = 40,834
ΣX1 = 647   ΣX1² = 34,843   ΣYX2 = 6,7100
ΣY = 757    ΣY² = 48,143    ΣX1X2 = 5,783

i) Estimate the output function Y = b0 + b1X1 + b2X2 + U
ii) What is the economic meaning of the coefficients
iii) Compute the standard errors of the coefficients
iv) Run tests of significance of the coefficients
v) Compute the coefficient of multiple determination and interpret it
vi) Conduct the overall significance test and interpret your result.