UNIT 4: VIOLATIONS OF ECONOMETRIC ASSUMPTIONS
(VIOLATIONS OF THE ASSUMPTIONS OF THE CLASSICAL LINEAR REGRESSION)
4.0 Aims and objectives
The aim of this unit is to show the reader what is meant by violation of the basic econometric assumptions that form the basis of the classical linear regression model. After completing this unit the student will understand:
the sources of each problem
the consequences of each problem
the various ways of detecting each problem
the alternative approaches to solving each problem
4.1 Introduction
Recall that in the classical model we have assumed:
a) Zero mean of the random term
b) Constant variance of the error term (i.e., the assumption of homoscedasticity)
c) No autocorrelation of the error term
d) Normality of the error term
e) No multicollinearity among the explanatory variables.
It was on the basis of these assumptions that we estimated the model and tested its significance. But the question is: what would be the implication if some or all of these assumptions were violated? That is, if the assumptions are not fulfilled, what will be the outcome? In this unit we discuss violations of those assumptions that matter most in practice.
4.2 The assumption of zero expected disturbances
This assumption is imposed by the stochastic nature of economic relationships, which otherwise would be impossible to estimate with the ordinary rules of mathematics. The assumption implies that the observations of Y and X must be scattered around the line in a random way, so that the estimated line Ŷ = β̂0 + β̂1X is a good approximation of the true line. This defines the relationship connecting Y and X on the average. The alternative possible assumptions are either E(U) > 0 or E(U) < 0. Assume that for some reason the U's did not have an average value of zero, but that most of them tended to be positive. This would imply that the observations of Y and X would lie above the true line.
It can be shown that by using these observations we would get a bad estimate of the true line: if the true line lies below or above the observations, the estimated line will be biased.
[Figure: the estimated line Ŷ lies above the true line E(Y) when E(U) > 0.]
The figure shows that the estimated line Ŷ is not a good approximation to the true line E(Y).
Note that there is no test for the verification of this assumption because the assumption E(U) =
0 is forced upon us if we are to establish the true relationship. That is, we set E(U) = 0 at the
outset of our estimation procedure. Its plausibility should be examined in each particular case
on a priori grounds. In any econometric application we must be sure that the following things
are fulfilled so as to be safe from violating the assumption of E(U) = 0
i) All the important variables have been included in the function.
ii) There are no systematically positive or systematically negative errors of measurement in the
dependent variable.
4.3 HETEROSCEDASTICITY
A) The Nature of Heteroscedasticity
Heteroscedasticity arises when the variance of the disturbance term is not constant across observations. A common pattern is that the larger the value of an explanatory variable, the larger the variance of the associated disturbance. Various examples can be stated in support of this argument. For
instance, if consumption is a function of the level of income, at higher levels of income (the
independent variable) there is greater scope for the consumer to act on whims and deviate by
larger amounts from the specified consumption relationship. The following diagram depicts this
case.
[Figure: consumption plotted against income; the scatter of the observations widens as we move from low-income to high-income levels.]
Another source of heteroscedasticity arises from violating the assumption that the regression model is correctly specified. Very often what looks like heteroscedasticity may be due to the fact that some important variables are omitted from the model. In such a situation the residuals obtained from the regression may give the distinct impression that the error variance is not constant. But if the omitted variables are included in the model, the impression may disappear.
In summary, we may say that on a priori grounds there are reasons to believe that the assumption of homoscedasticity may often be violated in practice. It is therefore important to examine the consequences of heteroscedasticity.
C) The Consequences of Heteroscedasticity
If the assumption of homoscedastic disturbances is not fulfilled, we have the following consequences:
i) If U is heteroscedastic, the OLS estimates do not have the minimum variance property in the class of unbiased estimators; that is, they are inefficient in small samples. Furthermore, they are inefficient in large samples.
ii) The coefficient estimates would still be statistically unbiased; that is, the expected value of each β̂ equals the true parameter β.
iii) The estimated variances (and hence the standard errors) of the coefficients will be biased, so that the usual t and F tests of significance give misleading results.
Detecting Heteroscedasticity
i) Graphical method: an informal way of detecting heteroscedasticity is to estimate the model by OLS, obtain the squared residuals Ûi², and plot them against the estimated dependent variable or against the independent variable to which it is suspected the disturbance variance is related. Although the Ûi² are not the same thing as the Ui², they can be used as proxies for them, especially if the sample size is sufficiently large.
The following figure shows the plot of Ûi² against Ŷi, the estimated Yi from the regression line, or against X, the idea being to find out whether the estimated mean value of Y, or the value of X, is systematically related to the squared residuals.
In figure (a) we see no systematic relationship between the two variables, suggesting that perhaps no heteroscedasticity is present in the data. Figures (b) and (c), however, suggest a linear relationship between the two variables; figure (c) in particular suggests that the heteroscedastic variance may be proportional to the value of Y or X. Figure (d) suggests a quadratic relationship. Knowledge of such patterns is useful because it suggests the transformation the data should undergo.
ii) The Park Test
The Park test formalizes the graphical method by suggesting that the error variance σi² is some function of the explanatory variable, for example ln σi² = ln σ² + β ln Xi + vi. Since σi² is generally unknown, the squared residuals Ûi² are used as its proxy. In the first step we run the original regression, disregarding the heteroscedasticity question. We obtain Ûi from this regression, and then in the second step we regress ln Ûi² on ln Xi; if β turns out to be statistically significant, heteroscedasticity is present in the data.
iii) The Spearman Rank Correlation Test
This test is based on the Spearman rank correlation coefficient, defined as
rs = 1 − 6[Σdi² / (n(n² − 1))] .............................................(4.5)
where di = the difference in the ranks assigned to two different characteristics of the i-th individual or phenomenon, and n = the number of individuals or phenomena ranked. The steps required in this test are as follows.
Assume Yi = β0 + β1Xi + Ui
Step 1. Fit the regression to the data on Y and X and obtain the residuals Ûi.
Step 2. Ignoring the sign of Ûi, that is, taking their absolute value |Ûi|, rank both |Ûi| and Xi (or Ŷi) in ascending or descending order and compute the Spearman rank correlation coefficient given in (4.5).
Step 3. Assuming that the population rank correlation coefficient ρs is zero and n > 8, the significance of the sample rs can be tested by the t test as follows:
t = rs√(n − 2) / √(1 − rs²) .............................................(4.6)
with n − 2 degrees of freedom.
If the computed t value exceeds the critical t value, we may accept the hypothesis of heteroscedasticity; otherwise we may reject it. If the regression model involves more than one X variable, rs can be computed between |Ûi| and each of the X variables separately, and each coefficient can be tested for statistical significance by the t test given above.
Example: Suppose that for n = 10 observations the ranks of |Ûi| and Xi yield rank differences di summing to 0, with Σdi² = 110. Then
rs = 1 − 6[110 / (10(100 − 1))] = 0.33
Applying the t test given in (4.6), we obtain:
t = (0.33)√8 / √(1 − 0.11) = 0.99
Note that for 8 (= 10 − 2) df this t value is not significant even at the 10% level of significance. Thus there is no evidence of a systematic relationship between the explanatory variable and the absolute values of the residuals, which suggests that there is no heteroscedasticity.
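The mechanics of the test are easy to script. The following is a minimal sketch in Python; the data are invented for illustration, and everything beyond equations (4.5) and (4.6) themselves (the variable names, the use of scipy's rankdata) is an assumption of the sketch, not part of the text:

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical data (e.g. income X and consumption Y for n = 10 households).
X = np.array([80., 100., 120., 140., 160., 180., 200., 220., 240., 260.])
Y = np.array([70., 65., 90., 95., 110., 115., 120., 140., 155., 150.])

# Step 1: fit Y on X by OLS and obtain the residuals.
A = np.column_stack([np.ones_like(X), X])
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
u_hat = Y - A @ beta

# Step 2: rank |u_hat| and X, then compute Spearman's r_s from eq. (4.5).
n = len(X)
d = rankdata(np.abs(u_hat)) - rankdata(X)
rs = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

# Step 3: t test of H0: population rank correlation = 0 (valid for n > 8), eq. (4.6).
t = rs * np.sqrt(n - 2) / np.sqrt(1 - rs**2)
print(f"rs = {rs:.3f}, t = {t:.3f}  (compare with the critical t for n - 2 df)")
```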
iv) The Goldfeld-Quandt Test
This test is applicable to large samples. The observations must be at least twice as many as the parameters to be estimated. The test assumes normality and serially independent disturbance terms Ui. Consider the following:
Yi = β0 + β1X1i + β2X2i + … + βkXki + Ui
Furthermore, suppose that the test is to assess whether heteroscedasticity exists or not. The hypothesis to be tested is
H0: the Ui are homoscedastic
H1: the Ui are heteroscedastic (with increasing variance)
To test this, Goldfeld and Quandt perform the following steps.
Step I: The observations are ordered according to the magnitude of the independent variable thought to be related to the variance of the disturbances.
Step II: A certain number of central observations (denoted by c) are omitted, leaving two equal-sized groups of observations, one group corresponding to low values of the chosen independent variable and the other group corresponding to high values. Note that the central observations are omitted to sharpen or accentuate the difference between the small-variance group and the large-variance group.
Step III: We fit a separate regression to each sub-sample and obtain the sum of squared residuals from each of them:
ΣÛ1² = residual sum of squares from the sub-sample of low values of X, with [(n − c)/2] − k degrees of freedom, where k is the total number of parameters in the model;
ΣÛ2² = residual sum of squares from the sub-sample of high values of X, with the same degrees of freedom, [(n − c)/2] − k.
If each of these sums is divided by the appropriate degrees of freedom, we obtain estimates of the variances of the U's in the two sub-samples.
Step IV: Compute the ratio of the two variances:
F* = [ΣÛ2² / ({(n − c)/2} − k)] / [ΣÛ1² / ({(n − c)/2} − k)] .........................................(4.7)
F* has an F distribution with numerator and denominator degrees of freedom each equal to (n − c − 2k)/2, where n = total number of observations, c = central observations omitted, and k = number of parameters estimated from each regression. If the two variances are the same (that is, if the Û's are homoscedastic), the value of F* will tend to one. If the variances differ, F* will have a large value (given that, by the design of the test, ΣÛ2² > ΣÛ1²). Generally, the observed F* is compared with the theoretical value of F with (n − c − 2k)/2 degrees of freedom at a chosen level of significance. The theoretical value of F (obtained from the F tables) defines the critical region of the test.
If F* > F we accept that there is heteroscedasticity (that is, we reject the null hypothesis of no difference between the variances of the U's in the two sub-samples). If F* < F, we accept that the U's are homoscedastic (in other words, we accept the null hypothesis). The higher the observed F* ratio, the stronger the heteroscedasticity of the U's.
Example: Suppose that we have data on consumption expenditure in relation to income for a cross-section of 30 families. Suppose we postulate that consumption expenditure is linearly related to income but that heteroscedasticity is present in the data. Suppose further that the middle 4 observations are dropped after the necessary reordering of the data, and suppose we
obtain the following result after performing a separate regression on each group of 13 observations:
F* = (1536.8 / 11) / (377.17 / 11) = 4.07
Note from the F table in the appendix that the critical F value for 11 numerator and 11 denominator df at the 5% level is 2.82. Since the estimated F* value exceeds the critical value, we may conclude that there is heteroscedasticity in the error variance.
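A sketch of the same mechanics in Python may also help; the data here are simulated so that the error variance rises with X, and the sample sizes mirror the example above (n = 30, c = 4, k = 2). None of this code comes from the text:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(0)

# Simulated sample whose error variance rises with X, mimicking the set-up above.
n, c, k = 30, 4, 2                       # observations, dropped centre, parameters
X = np.sort(rng.uniform(1, 10, n))
Y = 2 + 3 * X + rng.normal(0, 0.5 * X)   # error s.d. proportional to X

def rss(Xs, Ys):
    """Residual sum of squares from an OLS fit of Ys on a constant and Xs."""
    A = np.column_stack([np.ones_like(Xs), Xs])
    beta, *_ = np.linalg.lstsq(A, Ys, rcond=None)
    return np.sum((Ys - A @ beta) ** 2)

m = (n - c) // 2                         # size of each sub-sample (13 here)
rss1 = rss(X[:m], Y[:m])                 # low-X group
rss2 = rss(X[-m:], Y[-m:])               # high-X group

df = m - k                               # [(n - c)/2] - k degrees of freedom
F_star = (rss2 / df) / (rss1 / df)       # eq. (4.7)
F_crit = f_dist.ppf(0.95, df, df)
print(f"F* = {F_star:.2f}, 5% critical F({df},{df}) = {F_crit:.2f}")
```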
Note, however, that the ability of the Goldfeld-Quandt test to perform successfully depends on how c is chosen. Moreover, its success depends on identifying the correct X (i.e., independent) variable with which to order the observations. This limitation of the test can be avoided if we consider the Breusch-Pagan-Godfrey (BPG) test.
V) The Breusch-Pagan-Godfrey (BPG) Test
This test is relevant for a very wide class of alternative hypotheses, namely that the variance is some function of a linear combination of known variables. The generality of the test is both its strength (it does not require prior knowledge of the functional form involved) and its weakness (a test that exploited a known functional form would be more powerful).
To illustrate this test, consider the k-variable linear regression model
Yi = β0 + β1X1i + … + βkXki + Ui ..........................................(4.8)
Assume that the error variance σi² is described as
σi² = f(α1 + α2Z2i + … + αmZmi) ..........................................(4.9)
that is, σi² is some function of the non-stochastic variables Z; some or all of the X's can serve as Z's. Specifically, assume that
σi² = α1 + α2Z2i + … + αmZmi ..........................................(4.10)
that is, σi² is a linear function of the Z's.
If α2 = α3 = … = αm = 0, then σi² = α1, which is a constant. Therefore, to test whether σi² is homoscedastic, one tests the hypothesis that α2 = α3 = … = αm = 0. The actual test procedure is as follows.
Step 1: Estimate the model (4.8) by OLS and obtain the residuals Û1, Û2, …, Ûn.
Step 2: Obtain σ̃² = ΣÛi²/n. Note that this is the maximum likelihood estimator of σ². (Recall from the previous discussion that the OLS estimator of σ² is ΣÛi²/(n − k).)
Step 3: Construct the variables pi = Ûi²/σ̃², that is, each squared residual divided by σ̃².
Step 4: Regress the pi on the Z's and obtain the explained sum of squares (ESS) from this regression.
Step 5: Compute Θ = ½(ESS). Assuming the Ui are normally distributed, it can be shown that, if there is homoscedasticity and the sample size increases indefinitely, Θ follows a chi-square distribution with m − 1 degrees of freedom.
Example: Suppose the step-4 regression with a single Z variable (here X itself) gives
pi = −0.74 + 0.01Xi with ESS = 10.42, so that Θ = ½(ESS) = 5.21
From the chi-square table we find that for 1 df the 5% critical chi-square value is 3.84. Thus the observed chi-square value is significant at the 5% level of significance, and we conclude that there is heteroscedasticity.
Note that the BPG test is asymptotic; that is, it is a large-sample test. In small samples the test is sensitive to the assumption that the disturbances vi are normally distributed.
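The five steps translate directly into code. Below is a minimal sketch, again on simulated data with a single Z variable (X itself); the names and the simulation design are assumptions of the sketch:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)

# Simulated data whose error variance rises with X; the only Z variable is X itself.
n = 100
X = rng.uniform(1, 10, n)
Y = 1 + 2 * X + rng.normal(0, 0.3 * X)

# Step 1: OLS of Y on X; obtain the residuals.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
u = Y - A @ beta

# Step 2: maximum likelihood estimate of the error variance.
sigma2_ml = np.sum(u ** 2) / n

# Step 3: construct p_i = u_i^2 / sigma2_ml.
p = u ** 2 / sigma2_ml

# Step 4: regress p on the Z's (a constant and X); take the explained sum of squares.
g, *_ = np.linalg.lstsq(A, p, rcond=None)
ess = np.sum((A @ g - p.mean()) ** 2)

# Step 5: theta = ESS / 2 is asymptotically chi-square with m - 1 = 1 df.
theta = ess / 2
print(f"theta = {theta:.2f}, 5% critical chi2(1) = {chi2.ppf(0.95, 1):.2f}")
```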
D) Remedial Measures: Solutions for Heteroscedastic Disturbances
As we have seen, heteroscedasticity does not destroy the unbiasedness and consistency properties of the OLS estimators, but the estimators are no longer efficient, not even asymptotically (i.e., in large samples). This lack of efficiency makes the usual hypothesis-testing procedures of dubious value. Therefore, remedial measures are clearly called for. When heteroscedasticity is established on the basis of a test, the appropriate solution is to transform the original model in such a way that the transformed disturbance term has constant variance. We may then apply the method of classical least squares to the transformed model. The adjustment of the model depends on the particular form of heteroscedasticity; that is, the transformation is based on plausible assumptions about the heteroscedasticity pattern.
Assumption one: Given the model Yi = β0 + β1Xi + Ui, suppose that the error variance is proportional to Xi²; that is,
E(Ui²) = σ²Xi²
If, as a matter of speculation or on the basis of graphical inspection of the residuals, it is believed that the variance of Ui is proportional to the square of the explanatory variable X, one may transform the original model as follows. Divide the original model through by Xi to obtain
Yi/Xi = β0/Xi + β1 + Ui/Xi = β0(1/Xi) + β1 + Vi ...............................................(4.11)
where Vi is the transformed disturbance term, equal to Ui/Xi. Now it is easy to verify that
E(Vi²) = E(Ui/Xi)² = (1/Xi²)E(Ui²) = (1/Xi²)σ²Xi² = σ²
so the transformed disturbance is homoscedastic, and OLS can be applied to the transformed model, regressing Yi/Xi on 1/Xi.
Assumption two: Suppose instead that the error variance is proportional to Xi itself; that is,
E(Ui²) = σ²Xi
In this case the original model can be transformed by dividing it through by √Xi. That is,
Yi/√Xi = β0/√Xi + β1√Xi + Ui/√Xi = β0(1/√Xi) + β1√Xi + Vi ............................(4.12)
which may be written compactly as Yi* = β0X0i* + β1X1i* + Vi, with Yi* = Yi/√Xi, X0i* = 1/√Xi and X1i* = √Xi,
where Vi = Ui/√Xi and Xi > 0.
Given assumption two, one can readily verify that E(Vi²) = σ², a homoscedastic situation. That is,
Var(Vi) = E(Vi²) = E(Ui/√Xi)² = (1/Xi)E(Ui²)
Since by assumption E(Ui²) = σ²Xi, it follows that
Var(Vi) = (1/Xi)σ²Xi = σ²
Therefore, one may proceed to apply OLS to the transformed equation. Note an important feature of the transformed model: it has no ordinary intercept term. Therefore, one will have to use the regression-through-the-origin model to estimate β0 and β1. Having run the regression on the transformed model (4.12), one can get back to the original model simply by multiplying it through by √Xi.
Note, however, that these transformations break down when some of the X values are zero or negative. Besides, the t tests, F tests, etc. are valid only in large samples when the regression is conducted on transformed variables.
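To see the transformation at work, here is a small Python sketch under assumption one (E(Ui²) = σ²Xi²); the simulated data and all names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated model in which Var(U_i) is proportional to X_i^2.
n = 50
X = rng.uniform(1, 10, n)
Y = 4 + 1.5 * X + rng.normal(0, 0.4 * X)   # error s.d. proportional to X

# Transform per eq. (4.11): Y/X = b0*(1/X) + b1 + V.
# In the transformed regression the intercept estimates b1 and the
# coefficient on 1/X estimates b0.
A = np.column_stack([np.ones(n), 1.0 / X])
coef, *_ = np.linalg.lstsq(A, Y / X, rcond=None)
b1_hat, b0_hat = coef
print(f"b0 = {b0_hat:.2f} (true 4.0), b1 = {b1_hat:.2f} (true 1.5)")
```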
Check Your Progress 1
1. State with brief reason whether the following statements are true, false, or uncertain:
a) In the presence of heteroscedasticity OLS estimators are biased as well as inefficient
b) If heteroscedasticity is present, the conventional t and F tests are invalid
2. State three consequences of heteroscedasticity
3. List and explain the steps of the BPG test
4. Suppose that you have data on personal saving and personal income of Ethiopia for a 31-year period. Assume that graphical inspection suggests that the Ui are heteroscedastic, so that you want to employ the Goldfeld-Quandt test. Suppose you ordered the observations in ascending order of income and omitted the nine central observations. Applying OLS to each subset, you obtained the following results:
a) For subset I: Ŝ1 = −738.84 + 0.008Ii, with ΣÛ1² = 144,771.5
b) For subset II: Ŝ2 = 1141.07 + 0.029Ii, with ΣÛ2² = 769,899.2
Is there any evidence of heteroscedasticity?
4.4 AUTOCORRELATION
A. The Nature of Autocorrelation
An important assumption of the classical linear model is that there is no autocorrelation or serial correlation among the disturbances Ui entering into the population regression function. This assumption implies that the covariance of Ui and Uj is equal to zero. That is,
Cov(Ui, Uj) = E{[Ui − E(Ui)][Uj − E(Uj)]} = E(UiUj) = 0 (for i ≠ j)
If this assumption is violated, the disturbances are said to be autocorrelated. This could arise for several reasons.
i) Spatial autocorrelation: In regional cross-section data, a random shock affecting
economic activity in one region may cause economic activity in an adjacent region to
change because of close economic ties between the regions. Shocks due to weather
similarities might also tend to cause the error terms between adjacent regions to be related.
ii) Prolonged influence of shocks: In time series data, random shocks (disturbances) have effects that often persist over more than one time period. An earthquake, flood, strike, or war, for example, will probably affect the economy's operation in subsequent periods as well.
iii) Inertia: Past actions often have a strong effect on current actions, so that a positive disturbance in one period is likely to influence activity in succeeding periods.
iv) Data manipulation: Published data often undergo interpolation or smoothing, procedures that average true disturbances over successive time periods.
v) Misspecification: An omitted relevant independent variable that is autocorrelated will make the disturbance (associated with the misspecified model) autocorrelated. An incorrect functional form or a misspecification of the equation's dynamics could do the same. In these instances the appropriate procedure is to correct the misspecification.
Note that autocorrelation is a special case of correlation. Autocorrelation refers to the relationship not between two (or more) different variables, but between successive values of the same variable (in this section we are particularly interested in the autocorrelation of the U's). Moreover, note that the terms autocorrelation and serial correlation are treated synonymously.
Since auto correlated errors arise most frequently in time series models, the discussion in the
rest of this unit is couched in terms of time series data.
There are a number of time-series patterns or processes that can be used to model correlated errors. The most common is what is known as the first-order autoregressive, or AR(1), process. Consider
Yt = β0 + β1Xt + Ut
where t denotes the observation at time t (i.e., time series data). With this, one can assume that the disturbances are generated as follows:
Ut = ρUt−1 + εt
where ρ is known as the coefficient of autocovariance and εt is a stochastic disturbance satisfying the standard OLS assumptions, namely
E(εt) = 0
Var(εt) = σε²
Cov(εt, εt+s) = 0 for s ≠ 0
where the subscript s represents the period of lag.
The above specification is of first order because Ut is regressed on itself lagged one period (ρ is the first-order coefficient of autocorrelation). Note that the specification postulates that the movement or shift in Ut consists of two parts: a part ρUt−1, which accounts for the systematic shift, and a part εt, which is purely random.
Relationship between the Ut's:
Cov(Ut, Ut−1) = E{[Ut − E(Ut)][Ut−1 − E(Ut−1)]} = E[UtUt−1]
Substituting Ut = ρUt−1 + εt, we obtain
E[(ρUt−1 + εt)Ut−1] = ρE[U²t−1] + E[εtUt−1]
Note that εt is independent of Ut−1 and E(εt) = 0, thus E(εtUt−1) = 0. Since with the assumption of homoscedasticity (i.e., constant variance) Var(Ut) = Var(Ut−1) = σu², the result is
Cov(Ut, Ut−1) = ρσu²
Now the correlation of Ut and Ut−1 is given by (recall what we discussed in the course Statistics for Economists):
Corr(Ut, Ut−1) = Cov(Ut, Ut−1) / √(Var(Ut)Var(Ut−1)) = ρσu²/σu² = ρ
where −1 < ρ < 1.
Hence ρ (rho) is the simple correlation of the successive errors of the original model.
Note that when ρ > 0 successive errors are positively correlated, and when ρ < 0 successive errors are negatively correlated. It can be shown that Corr(Ut, Ut−s) = ρ^s (where s represents the period of lag). This implies that the correlation (be it negative or positive) between any two periods diminishes as the lag s increases.
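The decay Corr(Ut, Ut−s) = ρ^s can be checked numerically. A minimal simulation sketch in Python (ρ = 0.7 is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)
rho, n = 0.7, 100_000

# Simulate U_t = rho * U_{t-1} + e_t with white-noise e_t.
e = rng.normal(size=n)
U = np.zeros(n)
for t in range(1, n):
    U[t] = rho * U[t - 1] + e[t]

# The sample correlation between U_t and U_{t-s} should be close to rho**s.
for s in (1, 2, 3):
    corr = np.corrcoef(U[s:], U[:-s])[0, 1]
    print(f"s = {s}: sample corr = {corr:.3f}, rho**s = {rho ** s:.3f}")
```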
b) Consequences of Autocorrelation
When the disturbance term exhibits serial correlation, the values as well as the standard errors of the parameter estimates are affected.
i) If disturbances are correlated, the previous values of the disturbances have some information to convey about the current disturbance. If this information is ignored, the sample data are clearly not being used with maximum efficiency. However, the parameter estimates do not acquire statistical bias even when the residuals are serially correlated; that is, the OLS parameter estimates are statistically unbiased in the sense that their expected values are equal to the true parameters.
ii) The variance of the random term U may be seriously underestimated. In particular, the underestimation of the variance of U will be more serious in the case of positive autocorrelation of the error term Ut. With positive first-order autocorrelated errors, fitting an OLS estimating line can give an estimate quite wide of the mark; the high variation in such estimates causes the variance of the OLS estimators to be greater than it would have been had the errors been distributed randomly. The following figure illustrates positively autocorrelated errors.
[Figure: observations generated with positively autocorrelated errors, together with the true line and the fitted OLS line.]
Notice from the diagram that the OLS estimating line gives a better fit to the data than the true relationship. This reveals why, in this context, r² is overestimated while σu² (and the estimated variance of the OLS β̂'s) is underestimated. When the standard errors of the β̂'s are biased downwards, the confidence intervals are much too narrow. Moreover, the parameter estimates of irrelevant explanatory variables may appear highly significant.
iii) Predictions based on ordinary least squares estimates will be inefficient with autocorrelated errors; that is, they have a larger variance than predictions based on estimates obtained from other econometric techniques. Recall that the variance of the
forecast depends on the variances of the coefficient estimates and the variance of U. Since these variances are not minimal compared with those of other techniques, the standard error of the forecast (from OLS) will not have the least value when the U's are autocorrelated.
c) Detecting Autocorrelation
Tests for autocorrelation are based on the residuals Ût, which can be obtained from the usual OLS procedure. The examination of the Ût can provide useful information not only about autocorrelation but also about heteroscedasticity, model inadequacy, or specification bias.
i) Graphical Method
Some rough idea about the existence of autocorrelation may be gained by plotting the residuals either against time or against their own lagged values.
For instance, suppose that plotting the residual against its lagged value produces the following relationship.
[Figure 4.9: scatter of Ût against Ût−1, with most points in the first and third quadrants.]
As the figure reveals, most of the residuals are bunched in the first and third quadrants, suggesting very strongly that there is positive correlation in the residuals. However, the graphical method we have just discussed is essentially subjective or qualitative in nature. There are, however, quantitative tests that can be used to supplement the purely qualitative approach.
ii) Durbin-Watson d Test
The most celebrated test for detecting serial correlation is the one developed by the statisticians Durbin and Watson. It is popularly known as the Durbin-Watson d statistic, defined as
d = Σ(from t = 2 to n) (Ût − Ût−1)² / Σ Ût² ............................................(4.13)
which is simply the ratio of the sum of squared differences in successive residuals to the residual sum of squares, RSS. Note that in the numerator of the d statistic the number of observations is n − 1, because one observation is lost in taking successive differences. Note that expanding the above formula allows us to obtain
d ≈ 2(1 − ρ̂) ............................................(4.14)
where ρ̂ = ΣÛtÛt−1/ΣÛt² is the estimated first-order autocorrelation coefficient of the residuals. Since −1 ≤ ρ̂ ≤ 1, d must lie between 0 and 4. In the absence of autocorrelation we can expect equation (4.14) to take a value close to 2; when negative autocorrelation is present, a value in excess of 2, perhaps as high as 4; and when positive autocorrelation is present, a value lower than 2, perhaps close to zero.
The Durbin-Watson test tests the hypothesis H0: ρ = 0 (implying that the error terms are not autocorrelated with a first-order scheme) against the alternative. However, the sampling distribution of the d statistic depends on the sample size n, the number of explanatory variables k, and also on the actual sample values of the explanatory variables. Thus the critical values at which we might, for example, reject the null hypothesis at the 5 percent level of significance depend very much on the sample we have chosen. Notice that it is impracticable to tabulate critical values for all possible sets of sample values. What is possible, however, is, for given values of n and k, to find upper and lower bounds such that the actual critical value for any set of sample values will fall within these known limits. Tables are available which give these upper and lower bounds for various values of n and k and for specified levels of significance. (The Durbin-Watson table is given in the appendix.)
The Durbin-Watson test procedure for testing the null hypothesis ρ = 0 against the alternative hypothesis of positive autocorrelation is illustrated in the figure below.
Under the null hypothesis the actual sampling distribution of d, for the given n and k and for the given sample X values, is such that 5 percent of the area beneath it lies to the left of the point d*, i.e., P(d < d*) = 0.05. If d* were known, we would reject the null hypothesis at the 5 percent level of significance whenever the sample d < d*. Unfortunately, for the reason given above, d* is unknown. The curves labelled dL and dU represent, for given values of n and k, the lower and upper limits within which the actual sampling distribution of d must lie, whatever the sample X values.
[Figure 4.10: Distribution of dL and dU — the curves for dL, d and dU, with the 5 percent points d*L, d* and d*U marked on the horizontal axis, which runs from 0 to 4.]
The points d*U and d*L are such that the areas under the respective dU and dL curves to the left of these points are in each case 5 percent of the total area, i.e., P(dL < d*L) = P(dU < d*U) = 0.05. It is the points d*U and d*L, representing the upper and lower bounds on the unknown d*, that are tabulated for varying values of n and k. Clearly, if the sample value of the Durbin-Watson statistic lies to the left of d*L it must also lie to the left of d*, while if it lies to the right of d*U it must also lie to the right of d*. However, there is an inconclusive region, since if d lies between d*L and d*U we cannot know whether it lies to the left or the right of d*.
The decision criterion for the Durbin-Watson test is therefore of the following form:
- for d < d*L, reject the null hypothesis of no autocorrelation in favor of positive autocorrelation;
- for d > d*U, do not reject the null hypothesis, i.e., there is insufficient evidence to suggest positive autocorrelation;
- for d*L < d < d*U, the test is inconclusive.
Because of the symmetry of the distribution illustrated in the figure, it is also possible to use the tables for d*L and d*U to test the null hypothesis of no autocorrelation against the alternative hypothesis of negative autocorrelation, i.e. ρ < 0. The decision criterion then takes the form:
- for d > 4 − d*L, reject the null hypothesis of no autocorrelation in favor of negative autocorrelation;
- for d < 4 − d*U, do not reject the null hypothesis, i.e., there is insufficient evidence to suggest negative autocorrelation;
- for 4 − d*U < d < 4 − d*L, the test is inconclusive.
Note that the tables for d*U and d*L are constructed to facilitate one-tail rather than two-tail tests. The following summary of the test procedure shows that the limits of d are 0 and 4:
[Diagram: d runs from 0 to 4 — reject H0 for d < dL; inconclusive for dL ≤ d ≤ dU; reject neither H0 nor H0* for dU < d < 4 − dU; inconclusive for 4 − dU ≤ d ≤ 4 − dL; reject H0* for d > 4 − dL.]
Note: H0: no positive autocorrelation; H0*: no negative autocorrelation.
From the above presentation we can develop the following rule of thumb: if d is found to be close to 2 in an application, one may assume that there is no first-order autocorrelation, either positive or negative. If d is close to 0, it is because ρ̂ is close to 1, indicating strong positive autocorrelation in the residuals. Similarly, the closer d is to 4, the greater the evidence of negative serial correlation, because ρ̂ is then close to −1.
Example: Suppose that in a regression involving 50 observations and 4 regressors the estimated d was 1.43. From the Durbin-Watson table we find that at the 5% level the critical d values are dL = 1.38 and dU = 1.72 (the reader should check this by referring to the Durbin-Watson table attached in the appendix). On the basis of the d test we cannot say whether there is positive autocorrelation or not, because the estimated d value lies in the inconclusive range.
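Computing d itself is a one-liner. A small Python sketch follows; the AR(1) residual series is simulated purely to illustrate that positive autocorrelation pushes d below 2:

```python
import numpy as np

def durbin_watson(u):
    """d = sum of squared successive differences of residuals / residual sum of squares."""
    return np.sum(np.diff(u) ** 2) / np.sum(u ** 2)

# Toy check: strongly positively autocorrelated residuals give d well below 2.
rng = np.random.default_rng(4)
n = 500
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + rng.normal()
print(f"d = {durbin_watson(u):.2f}  (approximately 2 * (1 - 0.8) = 0.4)")
```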
d) Remedial Measures
Since in the presence of serial correlation the OLS estimators are inefficient, it is essential to seek remedial measures.
If the source of the problem is suspected to be the omission of important variables, the solution is to include those omitted variables. Similarly, if the source of the problem is believed to be misspecification of the model, the solution is to determine the appropriate mathematical form.
If the above approaches are ruled out, the appropriate procedure is to transform the original data so as to come up with a new form (or model) that satisfies the assumption of no serial correlation. Of course, the transformation depends on the nature of the serial correlation. Suppose the serial correlation follows the first-order autoregressive scheme, namely
Ut = ρUt−1 + εt ....................................................(4.15)
In this case the serial correlation problem can be satisfactorily resolved if ρ, the coefficient of autocorrelation, is known.
Consider the following two-variable model:
Yt = β0 + β1Xt + Ut .......................................................(4.16)
For time t − 1 the model is
Yt−1 = β0 + β1Xt−1 + Ut−1 .......................................................(4.17)
Multiplying both sides by ρ, we obtain
ρYt−1 = ρβ0 + ρβ1Xt−1 + ρUt−1 .....................................................(4.18)
Subtracting (4.18) from (4.16) gives
Yt − ρYt−1 = β0(1 − ρ) + β1(Xt − ρXt−1) + (Ut − ρUt−1)
= β0(1 − ρ) + β1(Xt − ρXt−1) + εt ................................(4.19)
The transformed model can be expressed as
Y*t = β*0 + β*1X*t + εt ................................(4.20)
where Y*t = Yt − ρYt−1, β*0 = β0(1 − ρ), β*1 = β1 and X*t = Xt − ρXt−1.
Since εt (which equals Ut − ρUt−1, from [4.15]) satisfies the OLS assumptions, one can proceed to apply OLS to the transformed variables Y* and X* and obtain estimators with all the optimum properties, namely BLUE. Running the regression on the transformed model (4.20) is tantamount to using generalized least squares (GLS).
Note that although the procedure discussed above is straightforward to apply, it is generally difficult to use because ρ, the population autocorrelation coefficient, is rarely known in practice. Therefore, alternative methods need to be devised.
The Cochrane-Orcutt Iterative Procedure
Step 1: Estimate the original model by OLS,
Yt = β0 + β1Xt + Ut ....................................................(4.21)
and obtain the residuals Ût.
Step 2: Using the estimated residuals, run the regression Ût = ρ̂Ût−1 + vt and obtain ρ̂, an estimate of ρ.
Step 3: Using the ρ̂ obtained from the step-2 regression, run the generalized difference equation analogous to (4.20):
Yt − ρ̂Yt−1 = β0(1 − ρ̂) + β1(Xt − ρ̂Xt−1) + εt
and obtain the estimates β̂*0 and β̂*1.
Step 4: Since a priori it is not known that the ρ̂ obtained from the step-2 regression is the best estimate of ρ, substitute the values of β̂*0 and β̂*1 obtained in step 3 into the original regression (4.21) and obtain the new residuals, say Û**t, as
Û**t = Yt − β̂*0 − β̂*1Xt
Note that these can be easily computed, since Yt, Xt, β̂*0 and β̂*1 are all known.
Step 5: Now estimate the regression
Û**t = ρ̂*Û**t−1 + wt
which gives ρ̂*, a second-round estimate of ρ. The procedure is repeated until successive estimates of ρ differ by a negligibly small amount.
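A compact Python sketch of the iteration may make the loop structure clearer; the data are simulated, and details such as the stopping rule (a fixed number of passes here) are simplifying assumptions:

```python
import numpy as np

def cochrane_orcutt(X, Y, n_iter=10):
    """Iterative Cochrane-Orcutt sketch for Y = b0 + b1*X with AR(1) errors."""
    A = np.column_stack([np.ones_like(X), X])
    beta, *_ = np.linalg.lstsq(A, Y, rcond=None)     # step 1: OLS on the original model
    rho = 0.0
    for _ in range(n_iter):
        u = Y - A @ beta                             # residuals from the original model
        rho = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])   # step 2/5: regress u_t on u_{t-1}
        Ys = Y[1:] - rho * Y[:-1]                    # step 3: generalized differences
        Xs = X[1:] - rho * X[:-1]
        As = np.column_stack([np.ones_like(Xs), Xs])
        g, *_ = np.linalg.lstsq(As, Ys, rcond=None)
        beta = np.array([g[0] / (1 - rho), g[1]])    # recover b0 from b0*(1 - rho)
    return beta, rho

# Toy data with AR(1) errors (rho = 0.6).
rng = np.random.default_rng(5)
n = 200
X = rng.uniform(0, 10, n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()
Y = 1 + 2 * X + u

beta, rho = cochrane_orcutt(X, Y)
print(f"b0 = {beta[0]:.2f}, b1 = {beta[1]:.2f}, rho = {rho:.2f}")
```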
Check Your Progress 2
1. State with brief reason whether the following statements are true or false:
a) In the presence of autocorrelation, OLS estimators are biased.
b) In the presence of autocorrelation, the usual t and F tests of significance are invalid.
2. In a regression involving 50 observations and 4 explanatory variables, what can you conclude about autocorrelation if a) d = 1.05 b) d = 1.40?
3. Suppose Yi = β0 + β1Xi + Ui. Assume that Ui is generated by the AR(1) scheme. Show how the Cochrane-Orcutt procedure is used to deal with autocorrelation.
4. Suppose that a researcher used 20 years of data on imports and GDP of Ethiopia. Applying OLS to the observations, she obtained the following import function:
M̂ = −2461 + 0.28G
with ΣÛt² = 573,069 and Σ(Ût − Ût−1)² = 537,192
where M = imports and G = GDP.
Use the Durbin-Watson test to examine the problem of autocorrelation.
4.5 MULTICOLLINEARITY
a) The Nature of the Problem
One of the assumptions of the classical linear regression model (CLRM) is that there is no perfect multicollinearity among the regressors included in the regression model. Note that although the assumption is strictly violated only in the case of exact multicollinearity (i.e., an exact linear relationship among some of the regressors), the presence of near multicollinearity (an approximate linear relationship among some of the regressors) leads to estimating problems important enough to warrant treating it as a violation of the classical linear regression model.
Multicollinearity does not depend on any theoretical or actual linear relationship among the regressors in the population; it depends on the existence of an approximate linear relationship in the data set at hand. Unlike most other estimating problems, this problem is caused by the particular sample available. Multicollinearity in the data can arise for several reasons. For example, the independent variables may all share a common time trend; one independent variable might be the lagged value of another that follows a trend; some independent variables may have varied together because the data were not collected from a wide enough base; or there could in fact exist some kind of approximate relationship among some of the regressors.
Note that the existence of multicollinearity seriously affects the parameter estimates. Intuitively, when any two explanatory variables change in nearly the same way, it becomes extremely difficult to establish the influence of each regressor on the dependent variable separately. That is, if two explanatory variables change by the same proportion, the influence on the dependent variable of one of them may be erroneously attributed to the other. Their effects cannot be sensibly investigated, due to the high intercorrelation.
In general, the problem of multicollinearity arises when the individual effects of the explanatory variables cannot be isolated and the corresponding parameter magnitudes cannot be determined with the desired degree of precision. Though it is quite frequent in cross-section data as well, it tends to be a more common and more serious problem in time series data.
b) Consequences of Multicollinearity
In the case of near or high multicollinearity, one is likely to encounter the following consequences:
i) Although BLUE, the OLS estimators have large variances and covariances, making precise estimation difficult. This is clearly seen from the formula for the variance of the estimators. For the multiple linear regression with two regressors, Var(β̂1) can be written as
Var(β̂1) = σ² / [Σx1i²(1 − r12²)]
It is apparent from this formula that as r12 (the coefficient of correlation between X1 and X2) tends towards 1, that is, as collinearity increases, the variance of the estimator increases. The same holds for Var(β̂2) and Cov(β̂1, β̂2).
ii) Because of consequence (i), the confidence intervals tend to be much wider, leading to acceptance of the "zero null hypothesis" (i.e., that the true population coefficient is zero).
iii) Because of consequence (i), the t ratios of one or more coefficients tend to be statistically insignificant.
iv) Although the t ratios of one or more coefficients are statistically insignificant, R², the overall measure of goodness of fit, can be very high. This is the basic symptom of the problem.
v) The OLS estimators and their standard errors can be sensitive to small changes in the data. That is, when a few observations are added or removed, the pattern of relationships may change and affect the results.
vi) Forecasting is still possible if the nature of the collinearity remains the same within the new (future) sample observations. That is, if collinearity exists in the data of the past 15 years' sample, and if the collinearity is expected to be the same for the future sample period, then forecasting will not be a problem.
c) Detecting Multicollinearity
Note that multicollinearity is a question of degree and not of kind. The meaningful distinction is not between the presence and the absence of multicollinearity, but between its various degrees. Multicollinearity is a feature of the sample and not of the population. Therefore, we do not test for multicollinearity, but can, if we wish, measure its degree in any particular sample. The following are some rules of thumb and formal rules for the detection of multicollinearity.
i) High R² but few significant t ratios: If R² is high, say in excess of 0.8, the F test will in most cases reject the hypothesis that the partial slope coefficients are simultaneously equal to zero, but the individual t tests will show that none or very few of the partial slope coefficients are statistically different from zero.
ii) High pair-wise correlations among regressors: If the pair-wise correlation coefficient between two regressors is high, say in excess of 0.8, then multicollinearity is a serious problem.
iii) Auxiliary regressions: Since multicollinearity arises because one or more of the regressors are exact or approximate linear combinations of the other regressors, one way of finding out which X variable is related to the other X variables is to regress each Xi on the remaining X variables and compute the corresponding R², which will help us decide about the problem. For example, consider the following auxiliary regression:
Xk = α1X1 + α2X2 + … + αk−1Xk−1 + V
If the R² of this regression is high, it implies that Xk is highly correlated with the rest of the explanatory variables, and one may consider dropping Xk from the model.
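This auxiliary-regression check is straightforward to automate. A minimal Python sketch follows; the three regressors are constructed so that X3 is almost a linear combination of X1 and X2, and all names are illustrative:

```python
import numpy as np

def auxiliary_r2(X):
    """R^2 from regressing each column of X on a constant and all the other columns."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ b
        out.append(1 - resid @ resid / np.sum((y - y.mean()) ** 2))
    return out

# Toy regressors: X3 is almost an exact linear combination of X1 and X2.
rng = np.random.default_rng(6)
X1 = rng.normal(size=100)
X2 = rng.normal(size=100)
X3 = 0.5 * X1 + 0.5 * X2 + 0.01 * rng.normal(size=100)

for j, r2 in enumerate(auxiliary_r2(np.column_stack([X1, X2, X3])), start=1):
    print(f"X{j} regressed on the others: R^2 = {r2:.3f}")
```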
d) Remedial Measures
The existence of multicollinearity in a data set does not necessarily mean that the coefficient estimators in which the researcher is interested have unacceptably high variances. Thus, the econometrician need not worry about multicollinearity if the R² from the regression exceeds the R² of any independent variable regressed on the other independent variables, nor if the t statistics are all greater than 2. Because multicollinearity is essentially a sample problem, there are no infallible guides; however, one can try the following rules of thumb, their success depending on the severity of the collinearity problem.
a) Obtain more data: Because multicollinearity is essentially a data problem, additional data that do not contain the collinearity feature could solve the problem. For example, in the three-variable model we saw that
Var(β̂1) = σ² / [Σx1i²(1 − r12²)]
Now as the sample size increases, Σx1i² will generally increase. Thus, for any given r12, the variance of β̂1 will decrease, thereby decreasing the standard error, which will enable us to estimate β1 more precisely.
b) Drop a variable: When faced with severe multicollinearity, one of the simplest things to do is to drop one of the collinear variables. But note that in dropping a variable from the model we may be committing a specification bias or specification error. Specification bias arises from incorrect specification of the model used in the analysis. Thus, if economic theory requires some variable to be included in the model, dropping it because of a multicollinearity problem would constitute specification bias, because we would be dropping a variable whose true coefficient in the equation being estimated is not zero.
c) Transformation of variables: In time series analysis, one reason for high multicollinearity between two variables is that over time both variables tend to move in the same direction. One way of minimizing this dependence is to transform the variables. Suppose
Yt = β0 + β1X1t + β2X2t + Ut
This relation must also hold at time t − 1, because the origin of time is arbitrary anyway. Therefore we have
Yt−1 = β0 + β1X1,t−1 + β2X2,t−1 + Ut−1
Subtracting the second equation from the first gives
Yt − Yt−1 = β1(X1t − X1,t−1) + β2(X2t − X2,t−1) + Vt
where Vt = Ut − Ut−1.
This is known as the first-difference form because we run the regression not on the original variables but on the differences of successive values of the variables. The first-difference regression model often reduces the severity of multicollinearity because, although the levels of X1 and X2 may be highly correlated, there is no a priori reason to believe that their differences will also be highly correlated.
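A quick numerical illustration of this point in Python, using two artificially trending series (a sketch of the tendency, not a general claim):

```python
import numpy as np

rng = np.random.default_rng(9)

# Two regressors that share a common time trend, hence highly correlated in levels.
t = np.arange(100, dtype=float)
X1 = t + rng.normal(0, 1, 100)
X2 = t + rng.normal(0, 1, 100)

print(f"correlation in levels:            {np.corrcoef(X1, X2)[0, 1]:.3f}")
print(f"correlation in first differences: {np.corrcoef(np.diff(X1), np.diff(X2))[0, 1]:.3f}")
```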
d) Formalize relationships among regressors: If it is believed that the multicollinearity arises not from an unfortunate data set but from an actual approximate linear relationship among some of the regressors, this relationship could be formalized and the estimation could then proceed in the context of a simultaneous-equations estimation problem.
Check your progress 3
1. State with reasons whether the following statements are true, false or uncertain
a) Despite perfect multicollinearity, OLS estimators are BLUE
b) If an auxiliary regression shows that a particular R 2 is high, there is definite
evidence of high collinearity.
2. In data involving economic time series such as GDP, income, prices, unemployment, etc.
multicollinearity is usually suspected. Why?
3. State three remedial measures to be taken if multicollinearity is detected
4.7 ANSWERS TO CHECK YOUR PROGRESS
Answers to Check Your Progress 1
1. a) False. Though OLS estimates are inefficient in the presence of heteroscedasticity, they are still statistically unbiased.
b) True, because the OLS estimates do not have the minimum variance.
The answers to questions 2 and 3 are discussed in the text.
4. F* = ΣÛ2²/ΣÛ1² = 769,899.2 / 144,771.5 = 5.32
The theoretical (table) value of F at the 5 percent level of significance (with n = 31, c = 9 and k = 2, so that each sub-sample has 9 df) is 3.18. Given that F* > F0.05, we reject the assumption of homoscedasticity.
Answers to Check Your Progress 2
1. a) False, because OLS estimates are unbiased.
b) True (see the explanation in the text).
2. a) Note that for n = 50 and k = 4, dL is 1.38. Since d = 1.05 is less than dL, this suggests the existence of positive autocorrelation.
b) dL = 1.38, dU = 1.72. Since 1.38 < d = 1.40 < 1.72, d lies in the inconclusive range.
3. The answer is discussed in the text.
4. d* = Σ(Ût − Ût−1)² / ΣÛt² = 537,192 / 573,069 = 0.937
From the Durbin-Watson table, at the 5 percent level of significance with n = 20 and k = 1, we find that dL = 1.20 and dU = 1.41. Since d* = 0.937 is less than dL = 1.20, we conclude that there is positive autocorrelation in the import function.
Answers to Check Your Progress 3
1. a) True (refer to the text for the explanation)
b) True (refer to the text for the explanation)
2. This is because such variables are highly interrelated. For example, an increase in income brings about an increase in GDP. Moreover, an increase in unemployment usually brings about a decline in prices.
3. Refer to the text for the answer.
4.8 MODEL EXAMINATION QUESTIONS
1. True or false? Explain where necessary.
a) In the presence of heteroscedasticity the usual OLS method always overestimates the standard errors of the estimators.
b) If a regression model is mis-specified, the OLS residuals will show a distinct pattern.
c) The Durbin-Watson d test assumes that the variance of the error term Ut is homoscedastic.
d) In cases of high multicollinearity, it is not possible to assess the individual significance of one or more partial regression coefficients.
2. Consider the following model:
Yt = β0 + β1Xt + β2Xt−1 + β3Xt−2 + β4Xt−3 + Ut
where Y = consumption, X = income and t = time. Note that the model implies that consumption expenditure at time t is a function of current income Xt and of previous periods' incomes.
a) Would you expect multicollinearity in such a model?
b) If collinearity is expected, how would you resolve the problem?
3. You are given the following data:
ΣÛ1², based on the first 30 observations, = 55 with df = 25
ΣÛ2², based on the last 30 observations, = 140 with df = 25
Carry out the Goldfeld-Quandt test of heteroscedasticity at the 5% level of significance.
4. Given a sample of 50 observations and 4 explanatory variables, what can you say about autocorrelation if a) d = 2.50 b) d = 3.97?
5. Answer the following:
a) Discuss the causes of heteroscedasticity.
b) State the consequences of autocorrelation.
c) Explain three remedial measures suggested to overcome multicollinearity.
UNIT 5: FURTHER TOPICS IN REGRESSION
5.1 INTRODUCTION
As mentioned in the previous section, this unit deals with the role of qualitative explanatory variables in regression analysis and with the functional forms of some non-linear regression models. It will be shown that the introduction of qualitative variables, often called dummy variables, makes the linear regression model an extremely flexible tool that is capable of handling many interesting problems encountered in empirical studies. After a brief introduction to such binary variables, the functional forms of regression models (i.e., regression models that may be non-linear in the variables but are linear in the parameters) will be discussed. Double-log, semi-log and reciprocal models will be shown, along with their special features and functional forms.
5.2 MODELS WITH BINARY REGRESSORS
5.2.1 The Nature of Dummy Variables
In regression analysis it frequently happens that the dependent variable is influenced not only by variables which can be readily quantified on some well-defined scale (e.g., income, output, price) but also by variables which are essentially qualitative in nature, for example color, sex, race, religion, a change in policy, or nationality. Since such variables may have an influence on the dependent variable, they should be included in the model. How can one include such variables as explanatory variables in the model?
Since such qualitative variables usually indicate the presence or absence of an attribute (e.g., male or female, black or white), one method of quantifying them is by constructing artificial variables which take on the values 1 or 0, with 0 indicating the absence of an attribute and 1 indicating its presence.
Example: If an individual is male, D = 1; if female, D = 0.
Variables which assume such 0 and 1 values are called dummy variables or binary variables or
qualitative variables or categorical variables or dichotomous variables.
Now let us take some examples with a single quantitative explanatory variable and two or more
qualitative explanatory variables.
Example 1: Suppose a researcher wants to find out whether sex makes any difference in a college teacher's salary, assuming that all other variables such as age, education level and experience are held constant.
The model can be formulated as follows:
Yi = β0 + β1Di + Ui
where Y is the annual salary and D = 1 if the teacher is male, 0 if female.
β1 tells by how much the mean salary of a male college teacher differs from the mean salary of a female college teacher. Suppose the estimated regression, with salary measured in thousands of birr, is
Ŷi = 18.00 + 3.28Di
se = (0.32) (0.44)
t = (57.74) (7.439), R² = 0.8737
The above results show that the estimated mean salary of female college teachers is birr 18,000, and that of male teachers is birr 3,280 higher, i.e., birr 21,280. Since β̂1 is statistically significant, the results indicate that the mean salaries of the two categories are different: the female teacher's average salary is lower than that of her male counterpart. If all other variables are held constant, there is sex discrimination in the salaries of the two sexes.
[Figure: mean salary by sex — β0 = 18,000 for females and β0 + β1 = 21,280 for males, the difference being β1 = 3,280.]
Example 2: Now consider a model with one quantitative and one qualitative explanatory variable, say salary in relation to years of experience (X) and sex (D):
Yi = β0 + β1Di + β2Xi + Ui
The coefficient β0 (the intercept) is the intercept term for the base category. The coefficient β1 attached to the dummy variable D can be called the differential intercept coefficient, because it tells by how much the intercept of the category that receives the value 1 differs from the intercept of the base category.
The other important point concerns the number of dummy variables to be included in the model: if a qualitative variable has m categories, introduce only m − 1 dummy variables. In the above examples, sex has two categories, and hence we introduced only a single dummy variable. If this rule is not followed, we shall fall into what might be called the dummy-variable trap, that is, a situation of perfect multicollinearity.
Example 3: Let us take an example of regression on one quantitative variable and one qualitative variable with more than two classes. Suppose we want to regress the annual expenditure on health care by an individual on the income and education of the individual. The variable education is qualitative in nature. We can have, as an example, three mutually exclusive levels of education:
- Less than high school
- High school
- College
The number of dummies = 3 − 1 = 2. (Note the rule.)
Let us consider the less-than-high-school category as the base category. The model can be formulated as follows:
Yi = β0 + β1D1i + β2D2i + β3Xi + Ui
where D1 = 1 for high school education (0 otherwise), D2 = 1 for college education (0 otherwise), and X = income.
Assuming E(Ui) = 0, the mean health care expenditure functions for the three levels of education are:
E(Yi | D1 = 0, D2 = 0, Xi) = β0 + β3Xi, for less than high school education
E(Yi | D1 = 1, D2 = 0, Xi) = (β0 + β1) + β3Xi, for high school education
E(Yi | D1 = 0, D2 = 1, Xi) = (β0 + β2) + β3Xi, for college education
[Figure 5.2: Expenditure on health care in relation to income (X) for three levels of education — three parallel lines whose intercepts differ by β1 and β2, the college-education line lying highest.]
The intercept β0 is the intercept of the base category. The differential intercepts β1 and β2 tell by how much the intercepts of the other two categories differ from the intercept of the base category.
The technique of dummy variables can easily be extended to handle more than one qualitative variable. In example 1 above it is possible to introduce another dummy variable, for example the color of the teacher, as an explanatory variable, i.e.
D2 = 1 if white and 0 otherwise.
Therefore, it is possible to include more than one quantitative variable and more than two qualitative variables in a linear regression model, as the sketch below illustrates.
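The following minimal Python sketch mirrors the set-up of example 3 (health expenditure on income plus two education dummies); the data are simulated and the true coefficients are chosen arbitrarily, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200

# Hypothetical sample: income plus education in three categories;
# 'less than high school' is the base, so we create m - 1 = 2 dummies.
income = rng.uniform(1000, 5000, n)
edu = rng.integers(0, 3, n)          # 0 = < high school, 1 = high school, 2 = college
D1 = (edu == 1).astype(float)
D2 = (edu == 2).astype(float)

# Data generated with differential intercepts 50 (high school) and 120 (college).
Y = 100 + 50 * D1 + 120 * D2 + 0.05 * income + rng.normal(0, 20, n)

# Including all three dummies plus a constant would be the dummy-variable trap.
A = np.column_stack([np.ones(n), D1, D2, income])
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
print("b0, b1, b2, b3 =", np.round(beta, 3))
```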
Check Your Progress 5.2.1
1. Suppose that the salary of economics graduates depends on their degree qualification (whether a candidate has a Ph.D. degree or not).
a) Specify the model with salary as the dependent variable and degree qualification as the explanatory variable.
b) Find E(Yi | Xi = 0) and E(Yi | Xi = 1)
2. Suppose that a researcher wants to regress the annual salaries of economics graduates on the number of years of experience and the education level of the graduates (here we have three levels of education, namely BA, MSc and Ph.D.).
a) How many dummy variables will be included in the model?
b) Specify the model, considering BA as the base category.
c) Find the mean values of the annual salaries corresponding to the different values of the regressors.
5.3 NON-LINEAR REGRESSION MODELS
The purpose of this section is to introduce you to models that are linear in the parameters but non-linear in the variables.
5.3.1 Non-Linear Relationships in Economics
The assumption of a linear relationship between the dependent and the explanatory variables may not be acceptable for many economic relationships. Given the complexity of the real world, we expect non-linearities in most economic relationships.
Example 1: Cost functions are usually non-linear.
[Figure: a U-shaped average total cost (ATC) curve plotted against quantity of output, and an S-shaped total product (TP) curve plotted against input.]
Other economic functions, like demand, supply and income-consumption curves, can also be non-linear.
5.3.2 Specification and Estimation of Non-Linear Models
Now let us consider some of the commonly used regression models that may be non-linear in
the variables but are linear in the parameters.
5.3.2.1 Transformation of the Polynomials
Some of the most common forms of non-linear economic relationships can be expressed by
polynomials.
Example 1: Y = β0 + β1X + β2X² + β3X³ + … + U
Consider a cubic total cost function:
C = β0 + β1X − β2X² + β3X³ + U
where C = total cost and X = output.
To fit this model we need to transform some of the variables. Let X² = Z and X³ = W, with U the error term. Then the above model becomes
C = β0 + β1X − β2Z + β3W + U
Now we can proceed with the application of OLS to the above linear relationship
Example 2: Suppose we have data on the yield of wheat and the amount of fertilizer applied, and assume that beyond some point additional fertilizer begins to burn the crop, causing the yield to decline:
Y    X    X²
55   1    1
70   2    4
75   3    9
65   4   16
60   5   25
We want to fit the second-degree equation
Yi = β0 + β1X1i + β2X1i² + Ui
Let X1i² = Wi. Then
Yi = β0 + β1X1i + β2Wi + Ui
This is linear both in terms of the parameters and the variables, so we apply OLS. The results are as follows:
Ŷi = 36 + 24.07Xi − 3.90Xi²
se =      (6.471)  (1.059)
t  =       3.72    −3.71
It is possible to test the significance of Xi²:
H0: β2 = 0
H1: β2 < 0
t = (β̂2 − β2)/S(β̂2) = −3.90/1.059 = −3.71
t0.05(5 − 3) = 2.92
Decision: we reject H0. Since β̂2 is significant, Xi² should be retained in the model. This implies that the relationship between yield and the amount of fertilizer has to be estimated by a second-degree equation.
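The same estimates can be reproduced numerically. A minimal Python sketch using the yield and fertilizer data given above:

```python
import numpy as np

# Yield (Y) and fertilizer (X) data from the example above.
Y = np.array([55., 70., 75., 65., 60.])
X = np.array([1., 2., 3., 4., 5.])

# Transform: let W = X**2 and fit Y = b0 + b1*X + b2*W by ordinary OLS.
A = np.column_stack([np.ones_like(X), X, X ** 2])
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
print("b0, b1, b2 =", np.round(beta, 2))   # approximately 36.00, 24.07, -3.93
```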
5.3.2.2 Double-Log or Log-Log Models
This model is very common in economics. Consider the following model:
Yi = β0 X1i^β1 X2i^β2
This can be transformed into linear form by using logarithms:
lnYi = lnβ0 + β1lnX1i + β2lnX2i
Since both the dependent and the explanatory variables are expressed in terms of logarithms, the model is known as a double-log, log-log or log-linear model.
If we include the disturbance term,
Yi = β0 X1i^β1 X2i^β2 e^U
lnYi = lnβ0 + β1lnX1i + β2lnX2i + U
This model is linear in the parameters and can be estimated by OLS if the assumptions of the classical linear regression model are fulfilled. Writing Yi* = lnYi, X1i* = lnX1i, X2i* = lnX2i and β0* = lnβ0, we have
Yi* = β0* + β1X1i* + β2X2i* + U
which is linear both in terms of the parameters and the variables.
Example 3: The following table shows the yearly outputs of an industry and the amount of
inputs (labor and capital) used for eight firms.
Output Labor Capital
(Q) (L) (K)
100 1 2.0
120 1.3 2.2
140 1.8 2.3
150 2.0 1.5
165 2.5 2.8
190 3.0 3.0
200 3.0 3.3
220 4.0 3.4
The objective is to estimate the Cobb-Douglas production function for the industry on the basis of the random sample of eight firms. The estimated production function is
logQ = 4.3900 + 0.4349X1 + 0.3395X2
or, with X1 = logL and X2 = logK,
logQ = 4.3900 + 0.4349 logL + 0.3395 logK
The model can be written in its original form as follows:
Q = antilog(4.3900) L^0.4349 K^0.3395 = 80.64 L^0.4349 K^0.3395
The model in its complete form can be given as
Q = 80.64 L^0.4349 K^0.3395
          (0.1118)  (0.2683)
R² = 0.99
Interpretation: One attractive feature of the log-log model, which has made it popular in applied work, is that the coefficients β1 and β2 measure the elasticities of output with respect to L and K (labor and capital).
β̂1 = 0.4349 implies that a one percent increase in labor input will result in a 0.4349 percent increase in the output level, assuming that capital is held constant.
β̂2 = 0.3395 implies that a one percent increase in the amount of capital will increase the level of output by 0.3395 percent, assuming that labor is constant.
Note that the sum of the elasticities (β1 + β2) indicates the type of returns to scale. The returns to scale show the responsiveness of output when all inputs are changed proportionately.
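The double-log estimation is easy to reproduce in outline. The Python sketch below runs OLS on the logarithms of the eight observations in the table; it simply prints whatever elasticities those eight data points imply:

```python
import numpy as np

# Output, labor and capital for the eight firms in the table above.
Q = np.array([100., 120., 140., 150., 165., 190., 200., 220.])
L = np.array([1.0, 1.3, 1.8, 2.0, 2.5, 3.0, 3.0, 4.0])
K = np.array([2.0, 2.2, 2.3, 1.5, 2.8, 3.0, 3.3, 3.4])

# Double-log form: ln Q = ln b0 + b1 ln L + b2 ln K, linear in the parameters.
A = np.column_stack([np.ones_like(Q), np.log(L), np.log(K)])
(ln_b0, b1, b2), *_ = np.linalg.lstsq(A, np.log(Q), rcond=None)

print(f"b1 (output elasticity of labor)   = {b1:.4f}")
print(f"b2 (output elasticity of capital) = {b2:.4f}")
print(f"b1 + b2 (returns to scale)        = {b1 + b2:.4f}")
print(f"b0 = {np.exp(ln_b0):.2f}")   # back to Q = b0 * L**b1 * K**b2
```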
Check Your Progress 5.3.2
The following data give the quantity demanded of a commodity (Y) at various prices (X):
Y:  695  724  812  887  991  1186  1940
X:   43   38   36   28   23    19    10
a) Fit the double-log model lnY = lnβ0 + β1lnX + U
b) Interpret β1
5.3.2.3 Semilog Models: Log-Lin and Lin-Log Models
Semilog models are those in which either the dependent or an explanatory variable appears in log form.
Example:
1. lnYi = β0 + β1Xi + Ui
2. Yi = β0 + β1lnXi + Ui
The above models are called semilog models. We call the first model the log-lin model, and the second is known as the lin-log model; the names are based on whether the dependent variable or the explanatory variable appears in log form.
Now let us consider the log-lin model (model 1 above):
lnYi = β0 + β1Xi + Ui
Here β1 = (relative change in Y)/(absolute change in X). Multiplying the relative change in Y by 100 gives the percentage change in Y for a unit absolute change in X.
Example: ln GNP̂t = 6.96 + 0.027T
(0.015) (0.0017)
where GNP = real gross national product and T = time (in years)
r² = 0.95, F1,13 = 260.34
The above result shows that the real GNP of the country was growing at the rate of 2.70 percent
per year (for the sample period). It is possible to estimate a linear trend model
GNP̂t = 1040.11 + 35T
(18.9) (2.07)
r² = 0.95, F1,13 = 284.7
This model implies that for the sample period the real GNP was growing at the constant
absolute amount of about $35 billion a year. The choice between the log-lin and linear model
will depend up on whether one is interested in the relative or the absolute change in the GNP.
NB: you cannot compare the r² values of the two models since the dependent variables are different.
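The comparison can be sketched in Python as follows. The GNP series used here is hypothetical, since the unit does not list the underlying observations; the point is only the mechanics of fitting the two trend models.

import numpy as np
import statsmodels.api as sm

# Hypothetical GNP series.
gnp = np.array([1100., 1160., 1210., 1280., 1330., 1380., 1440.,
                1500., 1550., 1620., 1690., 1750., 1820., 1900., 1970.])
T = sm.add_constant(np.arange(1., len(gnp) + 1.))

loglin = sm.OLS(np.log(gnp), T).fit()   # lnGNP = b0 + b1*T
linear = sm.OLS(gnp, T).fit()           # GNP = a0 + a1*T

print("growth rate per period: %.2f%%" % (100 * loglin.params[1]))
print("absolute change per period:", linear.params[1])
# NB: the two r-squared values are not comparable (different regressands).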
5.3.2.4 Reciprocal Models
The function defined as
Yi = β0 + β1(1/Xi) + Ui
is known as a reciprocal model. Although this model is nonlinear in the variable X, it is linear in the parameters. Letting Zi = 1/Xi gives
Yi = β0 + β1 Zi + Ui
which is linear both in terms of the parameters and the variables.
The above model shows that as X increases indefinitely, the term β1(1/X) approaches zero and Y approaches the limiting (asymptotic) value β0.
[Figure: three panels, (a), (b) and (c), showing the possible shapes of the reciprocal model for different signs of β0 and β1.]
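A minimal sketch of estimating the reciprocal model by the transformation Z = 1/X described above; the X and Y values are hypothetical.

import numpy as np
import statsmodels.api as sm

# Hypothetical data with an AFC-like shape: Y falls toward an asymptote.
X = np.array([1., 2., 4., 6., 8., 10., 15., 20.])
Y = np.array([62., 40., 30., 27., 25., 24., 23., 22.])

Z = sm.add_constant(1.0 / X)     # transform Z = 1/X
fit = sm.OLS(Y, Z).fit()

b0, b1 = fit.params
print("asymptote b0:", b0)       # the value Y approaches as X grows large
print("coefficient on 1/X:", b1)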
We can have examples for each of the above functions (fig. a, b and c)
1. The average fixed cost curve relates the average fixed cost of production to the level of
output. As it is indicated in fig. (a) the AFC declines continuously as output increases.
2. The Phillips curve, which relates the unemployment rate to the rate of inflation, is a good example of fig (b) above.
3. The reciprocal model of fig (c) is an appropriate Engel expenditure curve, which relates a consumer's expenditure on a commodity to his total expenditure or income.
SUMMARY ON FUNCTIONAL FORMS
Model        Equation               Slope (dY/dX)    Elasticity (dY/dX · X/Y)
Linear       Y = β0 + β1X           β1               β1(X/Y)
Log-linear   lnY = β0 + β1 lnX      β1(Y/X)          β1
Log-lin      lnY = β0 + β1X         β1Y              β1X
Lin-log      Y = β0 + β1 lnX        β1(1/X)          β1(1/Y)
Reciprocal   Y = β0 + β1(1/X)       −β1(1/X²)        −β1(1/(XY))
Note that if the values of X and Y are not given, the elasticity is often calculated at the mean values x̄ and ȳ.
1. a) Yi = β0 + β1 Xi + Ui
b) E(Yi/Xi = 0) = β0
E(Yi/Xi = 1) = β0 + β1
2. a) Number of dummy variables = number of categories − 1 = 3 − 1 = 2
b) Yi = β0 + β1 D1i + β2 D2i + β3 Xi + Ui
c) E(Yi/D1 = 1, D2 = 0, Xi) = (β0 + β1) + β3 Xi
E(Yi/D1 = 0, D2 = 0, Xi) = β0 + β3 Xi
E(Yi/D1 = 0, D2 = 1, Xi) = (β0 + β2) + β3 Xi
5.3.2
a) lnY = 9.121 − 0.69 lnX
(10.07) (0.02)
R² = 0.992
b) An increase in price by one percent will decrease the demand for the commodity by 0.69 percent.
5.3.3
(a) fig (b)
(b) −1.43 is the wage floor. It shows that as X increases indefinitely, the percentage decrease in wages will not be more than 1.43 percent per year.
5.7 MODEL EXAMINATION QUESTIONS
1. Explain the role of qualitative explanatory variables in regression analysis.
2. The following regression explains the determination of moonlighters' hourly wages:
Wm = 37.07 + 0.403W0 − 90.06ra + 75.51U + 47.33H + 113.64re + 2.26A
(0.062) (24.47) (21.60) (23.42) (27.62) (0.94)
R² = 0.34, df = 311
where Wm = moonlighting wage (cents/hour)
W0 = primary wage (cents/hour)
ra = race: = 0 if white, = 1 if non-white
U = urban: = 0 if non-urban, = 1 if urban
H = high school: = 0 if non-graduate, = 1 if high school graduate
re = region: = 0 if non-western resident, = 1 if western resident
A = age, in years
From the above equation derive the hourly wage equations for the following types of
moonlighters.
a) White, non-urban, western resident, and high school graduate.
b) Nonwhite, urban, non-western resident, and non-high school graduate.
c) White, non-urban, non-west resident, and high school graduate.
d) White, non-urban, non-west, non-graduate
e) Nonwhite, urban, west, high school graduate (when the dummies are equal to 1)
f) What do you understand about the statistical significance of the variables in the above
model?
g) Interpret the coefficients.
3. The following table gives data on the annual percentage change in wage rates (Y) and the unemployment rate (X) for a country for the period 1950 to 1966.
Percentage increase Unemployment (%)
Year in wage rates (Y) X
1950 1.8 1.4
1951 8.5 1.1
1952 8.4 1.5
1953 4.5 1.5
1954 4.3 1.2
1955 6.9 1.0
1956 8.0 1.1
1957 5.0 1.3
1958 3.6 1.8
1959 2.6 1.9
1960 2.6 1.5
1961 4.2 1.4
1962 3.6 1.8
1963 3.7 2.1
1964 4.8 1.5
1965 4.3 1.3
1966 4.6 1.4
Use these data to fit the following model
Yt = β0 + β1(1/Xt) + Ut
4. We have seen the following growth model
ln GNP̂t = 6.96 + 0.027T      r² = 0.95
(0.015) (0.0017)              F1,13 = 260.34
and that of the linear trend model
GNP̂t = 1040.11 + 35T         r² = 0.95
(18.86) (2.07)                F1,13 = 284.74
Which model do you prefer? Why?
5. The demand function for coffee is estimated as follows
Ŷt = 2.69 − 0.4795Xt      r² = 0.6628
(0.1216) (0.1140)
where Yt = cups per person per day and Xt = average retail price of coffee
Find the price elasticity of demand.
UNIT 6: INTRODUCTION TO SIMULTANEOUS EQUATION
6.1 Introduction
The application of least squares to a single equation assumes, among others, that the explanatory variables are truly exogenous, i.e., that there is one-way causation between the dependent variable (Y) and the explanatory variables (X). In many economic situations, however, this assumption does not hold: the function cannot then be treated in isolation as a single-equation model, but belongs to a wider system of equations which describes the relationships among all the relevant variables. In such cases we must use a multi-equation model, which would include separate equations in which Y and X would appear as endogenous variables. A system describing the joint dependence of variables is called a system of simultaneous equations.
6.2. SIMULTANEOUS DEPENDENCE OF ECONOMIC VARIABLES
In the single equations discussed in the previous units the cause-and-effect relationship is unidirectional: the explanatory variables are the cause and the dependent variable is the effect. However, there are situations where there is a two-way flow of influence among economic variables; that is, one economic variable affects another economic variable(s) and is, in turn, affected by it (them). In such cases we need to consider more than one equation and thus come up with simultaneous equation models, in which there is one equation for each of the jointly determined (endogenous) variables.
The first thing we need to answer is the question of what happens if the parameters of each
equation are estimated by applying, say, the method of OLS, disregarding other equations in
the system? Recall that one of the crucial assumptions of the method of OLS is that the
explanatory X variables are either non-stochastic or, if stochastic (random), distributed independently of the stochastic disturbance term. If neither of these conditions is met, then the least-squares estimators are not only biased but also inconsistent; that is, as the sample size increases indefinitely, the estimators do not converge to their true (population) values.
For example, consider the following hypothetical system of equation
Y1i = β10 + β12Y2i + γ11X1i + U1i ...(6.1)
Y2i = β20 + β21Y1i + γ21X1i + U2i ...(6.2)
Where Y1 and Y2 are mutually dependent or endogenous, variables (i.e. whose value are
determined with in the model) and X 1 an exogenous variable (whose value are determined out
side the model) and where U1 and U2 are stochastic disturbance terms, the variables Y 1 and Y2
are both stochastic. Therefore, unless it can be shown that the stochastic explanatory variable
Y2 in (6.1) is distributed independently of U 1 and the stochastic explanatory variable Y 1 in (6.2)
in distributed independently of U2, application of classical OLS to these equations individually
will lead to inconsistent estimates.
Example. Recall that price of a commodity and the quantity (bought and sold) are determined
by the intersection of the demand and supply curves for that commodity. Consider the
following linear demand and supply models.
Demand function:       Qtᵈ = α0 + α1Pt + U1t ...(6.3)
Supply function:       Qtˢ = β0 + β1Pt + U2t ...(6.4)
Equilibrium condition: Qtᵈ = Qtˢ ...(6.5)
where Qtᵈ = quantity demanded, Qtˢ = quantity supplied, P = price and t = time
Note that P and Q are jointly dependent variables. If U1t changes because of changes in other variables affecting Qtᵈ (such as income and tastes), the demand curve shifts. Recall that such a shift in demand changes both P and Q. Similarly, a change in U2t (because of changes in weather and the like) will shift the supply curve, again affecting both P and Q. Because of this simultaneous dependence between Q and P, U1t and Pt in (6.3) and U2t and Pt in (6.4) cannot be independent.
Therefore a regression of Q on P as in (6.3) would violate an important assumption of the
classical linear regression model, namely, the assumption of no correlation between the
explanatory variable(s) and the disturbance term. In summary, the above discussion reveals that
in contrast to single equation models, in simultaneous equation models more than one
dependent, or endogenous, variable is involved, necessitating as many equations as the number
of endogenous variables. As a consequence such an endogenous explanatory variable becomes
stochastic and is usually correlated with the disturbance term of the equation in which it
appears as an explanatory variable.
Recall that the variables entering a simultaneous equation model are of two types: They are
called endogenous and predetermined variables. Endogenous variables are those variables
whose values are determined inside the model. Predetermined variables on the other hand, are
those whose values are determined outside the model. Predetermined variables are divided into
exogenous and lagged endogenous variables. Although non economic variables such as rainfall
and weather are clearly exogenous or predetermined, the model builder must exercise great care
in classifying economic variables as endogenous or predetermined. Consider the Keynesian
model of income determination
Consumption function: Ct = β0 + β1Yt + Ut, 0 < β1 < 1 ...(6.6)
Income identity: Yt = Ct + It ...(6.7)
In this model C (consumption) and Y (income) are endogenous variables. Investment (I), on the
other hand is treated as exogenous variable. Note that if there were lagged values of
consumption and income variables (i.e., C t-1 and Yt-1) they would have been treated as lagged
endogenous and hence predetermined variables.
Consider the problem of estimating the consumption function by regressing consumption on income. Suppose the disturbance in the consumption function jumps up. This directly increases consumption, which through the equilibrium condition increases income. But income is the independent variable in the consumption function (6.6). Thus, the disturbance in the consumption function and the regressor are positively correlated: an increase in the disturbance term (directly implying an increase in consumption) is accompanied by an increase in income (also implying an increase in consumption). When estimating the influence of income on consumption, however, the OLS technique attributes both of these increases in consumption (instead of just the latter) to the accompanying increase in income. This implies that the OLS estimator of the marginal propensity to consume (β1) is biased upward, even asymptotically.
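This upward bias is easy to illustrate by simulation. The following Python sketch generates data from a hypothetical Keynesian system with a true MPC of 0.8 (all parameter values here are assumptions chosen for illustration) and shows that the OLS estimate exceeds the true value even in a very large sample.

import numpy as np

rng = np.random.default_rng(0)
b0, b1 = 10.0, 0.8                 # assumed structural parameters
n = 10_000

I = rng.uniform(50, 100, n)        # exogenous investment
U = rng.normal(0, 5, n)            # disturbance of the consumption function
Y = (b0 + I + U) / (1 - b1)        # reduced form for income
C = b0 + b1 * Y + U                # consumption function

# OLS of C on Y: the regressor Y is correlated with U.
X = np.column_stack([np.ones(n), Y])
beta_hat = np.linalg.lstsq(X, C, rcond=None)[0]
print("true MPC: 0.8, OLS estimate:", beta_hat[1])  # exceeds 0.8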
Both equations (6.6) and (6.7) are structural or behavioral equations because they portray the structure of an economy, equation (6.7) being an identity. The β's are known as the
structural parameters or coefficients. From the structural equations one can solve for the
endogenous variables and derive a reduced-form equations and the associated reduced form
coefficients. A reduced form equation is one that expresses an endogenous variable solely in
terms of the predetermined variables and the stochastic disturbances.
If equation (6.6) is substituted into equation (6.7) and we solve for Y, we obtain the following:
Yt = β0/(1−β1) + [1/(1−β1)]It + Ut/(1−β1)
   = π0 + π1It + Wt ...(6.8)
where π0 = β0/(1−β1), π1 = 1/(1−β1) and Wt = Ut/(1−β1)
Equation (6.8) is a reduced-form equation; it expresses the endogenous variable Y solely as a function of the exogenous (or predetermined) variable I and the stochastic disturbance term U. π0 and π1 are the associated reduced-form coefficients.
Substituting the value of Yt from equation (6.8) into equation (6.6), we obtain another reduced-form equation:
Ct = π2 + π3It + Wt ...(6.9)
where π2 = β0/(1−β1), π3 = β1/(1−β1) and Wt = Ut/(1−β1)
The reduced-form coefficients (the π's) are also known as impact, or short-run, multipliers, because they measure the immediate impact on the endogenous variable of a unit change in the value of the exogenous variable. If in the preceding Keynesian model the investment expenditure (I) is increased by, say, $1 and if the marginal propensity to consume (i.e., β1) is assumed to be 0.8, then from π1 of (6.8) we obtain π1 = 1/(1−0.8) = 5. This result means that increasing investment by $1 will immediately (i.e., in the current time period) lead to an increase in income of $5, that is, a fivefold increase.
Notice an interesting feature of the reduced-form equations. Since only the predetermined
variables and stochastic disturbances appear on the right side of these equations, and since the
predetermined variables are assumed to be uncorrelated with the disturbance terms, the OLS
method can be applied to estimate the coefficients of the reduced-form equations (the s). This
will be the case if a researcher is only interested in predicting the endogenous variables, only
wishes to estimate the size of the multipliers (i.e. the s)
Note that since the reduced form coefficients can be estimated by the OLS method and these
coefficients are combinations of the structural coefficients, the possibility exist that the
structural coefficients can be retrieved from the reduced-form coefficients, and it is in the
estimation of the structural parameters that we may be ultimately interested. Unfortunately,
retrieving the structural coefficients from the reduced form coefficients is not always possible;
this problem is one way of viewing the identification problem.
6.3 THE IDENTIFICATION PROBLEM
a) Under Identification
Consider the demand-and-supply model (6.3) and (6.4), together with the market-clearing, or equilibrium, condition (6.5) that demand is equal to supply. By the equilibrium condition (i.e., Qtᵈ = Qtˢ) we obtain the reduced forms
Pt = π0 + Vt  and  Qt = π1 + Wt
where π0 = (β0−α0)/(α1−β1) and π1 = (α1β0−α0β1)/(α1−β1).
Note that π0 and π1 (the reduced-form coefficients) contain all four structural parameters: α0, α1, β0 and β1. But there is no way in which the four structural unknowns can be estimated from only two reduced-form coefficients. Recall from high school algebra that to estimate four unknowns we must have four (independent) equations and, in general, to estimate k unknowns we must have k (independent) equations. What all this means is that, given time series data on P (price) and Q (quantity) and no other information, there is no way the researcher can guarantee whether he/she is estimating the demand function or the supply function. That is, a given Pt and Qt represent simply the point of intersection of the appropriate demand and supply curves, because of the equilibrium condition that demand is equal to supply.
b) Just or Exact Identification
The reason we could not identify the preceding demand function or the supply function was
that the same variables P and Q are present in both functions and there is no additional
information. But suppose we consider the following demand and supply model.
Demand function: Qt = α0 + α1Pt + α2It + U1t,  α1 < 0, α2 > 0 ...(6.13)
Supply function: Qt = β0 + β1Pt + β2Pt-1 + U2t,  β1 > 0, β2 > 0 ...(6.14)
where I = income of the consumer, an exogenous variable, and
Pt-1 = price lagged one period, usually incorporated in the model to explain the supply of many agricultural commodities.
Note that Pt-1 is a predetermined variable because its value is known at time t.
By the market-clearing mechanism we have
α0 + α1Pt + α2It + U1t = β0 + β1Pt + β2Pt-1 + U2t ...(6.15)
Solving this equation, we obtain the following equilibrium price:
Pt = π0 + π1It + π2Pt-1 + Vt ...(6.16)
where π0 = (β0−α0)/(α1−β1), π1 = −α2/(α1−β1),
π2 = β2/(α1−β1) and Vt = (U2t−U1t)/(α1−β1)
Substituting the equilibrium price (6.16) into the demand or supply equation of (6.13) or (6.14)
we obtain the corresponding equilibrium quantity:
Qt = π3 + π4It + π5Pt-1 + Wt ...(6.17)
where the reduced-form coefficients are
π3 = (α1β0−α0β1)/(α1−β1), π4 = −α2β1/(α1−β1),
π5 = α1β2/(α1−β1) and Wt = (α1U2t−β1U1t)/(α1−β1)
The demand-and-supply model given in equations (6.13) and (6.14) contains six structural coefficients (α0, α1, α2, β0, β1 and β2), and there are six reduced-form coefficients (π0, π1, π2, π3, π4 and π5) to estimate them. Thus we have six equations in six unknowns, and normally we should be able to obtain unique estimates. Therefore, the parameters of both the demand and supply equations can be identified, and the system as a whole is identified.
c) Over identification
Note that for certain goods and services, wealth of the consumer is another important
determinant of demand. Therefore, the demand function (6.13) can be modified as follows,
keeping the supply function as before:
Demand function: Qt = α0 + α1Pt + α2It + α3Rt + U1t ...(6.18)
Supply function: Qt = β0 + β1Pt + β2Pt-1 + U2t ...(6.19)
where R represents wealth.
Equating demand to supply, we obtain the following equilibrium price and quantity:
Pt = π0 + π1It + π2Rt + π3Pt-1 + Vt ...(6.20)
Qt = π4 + π5It + π6Rt + π7Pt-1 + Wt ...(6.21)
where π0 = (β0−α0)/(α1−β1), π1 = −α2/(α1−β1),
π2 = −α3/(α1−β1), π3 = β2/(α1−β1),
π4 = (α1β0−α0β1)/(α1−β1), π5 = −α2β1/(α1−β1),
π6 = −α3β1/(α1−β1), π7 = α1β2/(α1−β1),
Wt = (α1U2t−β1U1t)/(α1−β1) and Vt = (U2t−U1t)/(α1−β1)
The demand-and-supply model in (6.18) and (6.19) contains seven structural coefficients, but there are eight equations to estimate them: the eight reduced-form coefficients given above (i.e., π0 to π7). Notice that the number of equations is greater than the number of unknowns. As a result, unique estimation of all the parameters of our model is not possible. For example, one can solve for β1 in the following two ways:
β1 = π6/π2  or  β1 = π5/π1
That is, there are two estimates of the price coefficient in the supply function, and there is no guarantee that these two values or solutions will be identical. Moreover, since β1 enters the other reduced-form coefficients, this ambiguity will be transmitted to the other estimates. Note that the supply function is exactly identified in the system (6.13) and (6.14) but not in the system (6.18) and (6.19), although in both cases the supply function remains the same. This is because we have too much, or an over sufficiency, of information to identify the supply curve. The over sufficiency of information results from the fact that in the model (6.13) and (6.14) the exclusion of the income variable from the supply function was enough to identify it, whereas in the model (6.18) and (6.19) the supply function excludes not only the income variable but also the wealth variable. In other words, in the latter model we put too many restrictions on the supply function by requiring it to exclude more variables than necessary to identify it. However, this situation does not imply that over identification is necessarily bad, since the problem of too much information can be handled.
Notice that the situation is the opposite of the case of under identification where there is too
little information. The only way in which the structural parameters of unidentified (or under
identified) equations can be identified (and thus be capable of being estimated) is through
imposition of further restrictions, or use of more extraneous information. Such restrictions, of
course, must be imposed only if their validity can be defended.
In a simple example such as the foregoing, it is easy to check for identification; in more complicated systems, however, it is not so easy. This time-consuming procedure can be avoided by resorting to either the order condition or the rank condition of identification. Although the order condition is easy to apply, it provides only a necessary condition for identification. The rank condition, on the other hand, is both a necessary and a sufficient condition for identification. [Note: the order and rank conditions for identification will not be discussed here, since the objective of this unit is to briefly introduce the reader to simultaneous equations. For a detailed and advanced discussion, readers can refer to the reference list stated at the end of this unit.]
6.4 A TEST OF SIMULTANEITY
If there is no simultaneity problem, the OLS estimators are consistent and efficient. On the other hand, if there is simultaneity, the OLS estimators are not even consistent, so other estimation methods must be looked for. If we apply these alternative methods when there is in fact no simultaneity, the result will not be efficient. This suggests that we should check for the simultaneity problem before we discard OLS in favor of the alternatives.
A test of simultaneity is essentially a test of whether (an endogenous) regressor is correlated with the error term. If it is, the simultaneity problem exists, in which case alternatives to OLS must be found; if it is not, we can use OLS. To find out which is the case in a concrete situation, we can use Hausman's specification error test.
Hausman Specification Test
Consider the following two-equation model
Demand function: Qt = α0 + α1Pt + α2It + α3Rt + U1t ...(6.22)
Supply function: Qt = β0 + β1Pt + U2t ...(6.23)
Assume that I and R are exogenous; of course, P and Q are endogenous.
Now consider the supply function (6.23). If there is no simultaneity problem (i.e., P and Q are mutually independent), Pt and U2t should be uncorrelated. On the other hand, if there is simultaneity, Pt and U2t will be correlated. To find out which is the case, the Hausman test proceeds as follows:
First, from (6.22) and (6.23) we obtain the following reduced form equations
Pt = π0 + π1It + π2Rt + Vt ...(6.24)
Qt = π3 + π4It + π5Rt + Wt ...(6.25)
where V and W are the reduced-form error terms. Estimating (6.24) by OLS we obtain
P̂t = π̂0 + π̂1It + π̂2Rt ...(6.26)
Therefore Pt = P̂t + V̂t ...(6.27)
where P̂t are the estimated Pt and V̂t are the estimated residuals. Substituting (6.27) into (6.23) we get:
Qt = β0 + β1P̂t + β1V̂t + U2t ...(6.28)
Now, under the null hypothesis that there is no simultaneity, the correlation between V̂t and U2t should be zero, asymptotically. Thus if we run the regression (6.28) and find that the coefficient of V̂t in (6.28) is statistically zero, we can conclude that there is no simultaneity problem.
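The two-step procedure can be coded as in the following sketch; the function name and the variables Q, P, I, R are illustrative assumptions. This variant enters Pt itself together with V̂t in the second stage, which is equivalent to the version in (6.28).

import numpy as np
import statsmodels.api as sm

def hausman_simultaneity_test(Q, P, I, R):
    # Step 1: reduced form (6.24); P on a constant, I and R.
    stage1 = sm.OLS(P, sm.add_constant(np.column_stack([I, R]))).fit()
    v_hat = stage1.resid
    # Step 2: supply regression augmented with the residuals V_hat.
    stage2 = sm.OLS(Q, sm.add_constant(np.column_stack([P, v_hat]))).fit()
    # A significant t on V_hat signals simultaneity (P is endogenous).
    return stage2.tvalues[-1], stage2.pvalues[-1]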
6.5 APPROACHES TO ESTIMATION
At the outset it may be noted that the estimation problem is rather complex because there are a
variety of estimation techniques with varying statistical properties. In view of the introductory
nature of this unit we shall consider very briefly the following techniques.
a) The method of Indirect Least Squares (ILS)
For a just or exactly identified structural equation, the method of obtaining the estimates of the
structural coefficients from the OLS estimators of the reduced form coefficients is known as the
method of indirect least squares (ILS). ILS involves the following three steps
Step I: - We first obtain the reduced form equations.
Step II: - Apply OLS to the reduced form equations individually.
Step III: - Obtain estimates of the original structural coefficients from the estimated reduced
form coefficients obtained in step II.
b) The method of two stage least squares (2SLS)
This method is applied in estimating an over identified equation. Theoretically, the two stages
least squares may be considered as an extension of ILS method. The 2SLS method boils down
to the application of ordinary list squares in two stages. That is, in the first stage, we apply least
squares to the reduced form equations in order to obtain an estimate of the exact and random
components of the endogenous variables appearing in the right hand side of the equation with
their estimated value and then we apply OLS to the transformed original equation to obtain
estimates of the structural parameters.
Note, however, that since 2SLS is equivalent to ILS in the just-identified case, it is usually applied uniformly to all identified equations in the system. [For a detailed discussion of this method readers may refer to the reference list stated at the end of this unit.]
Check Your Progress
The reduced-form equations of the Keynesian model are
Yt = α0/(1−α1) + [1/(1−α1)]It + [1/(1−α1)]Ut
Ct = α0/(1−α1) + [α1/(1−α1)]It + [1/(1−α1)]Ut
UNIT 7: REGRESSION ON DUMMY DEPENDENT VARIABLES
7.1 INTRODUCTION
In many economic applications the dependent variable itself is qualitative. Consider, for example, home ownership: a family may or may not own a house. If it owns a house, the variable takes a value 1, and 0 if it does not.
There are several such examples where the dependent variable is dichotomous. A unique
feature of all the examples is that the dependent variable is of the type that elicits a yes or no
response; that is, it is dichotomous in nature. Now before we discuss the estimation of models
involving dichotomous response variables, let us briefly discuss the concept of qualitative
response models:
7.2 QUALITATIVE RESPONSE MODELS (QRM)
These are models in which the dependent variable is a discrete outcome.
Example 1: Y = β0 + β1X1 + β2X2
Y = 1, if individual i attended college
= 0, otherwise
In the above example the dependent variable Y takes on only two values (i.e., 0 and 1).
Conventional regression cannot be used to analyze a qualitative dependent variable model.
The models are analyzed in a general framework of probability models.
7.2.1 Categories of Qualitative Response Models (QRM)
Two broad categories of QRM
A. Binomial Model
The choice is between two alternatives
B. Multinomial models
The choice is between more than two alternatives
Example: Y = 1, occupation is farming
= 2, occupation is carpentry
= 3, occupation is fishing
Let us define some important terminologies
i. Binary variables: are variables that have two categories and are often used to indicate that
an event has occurred or that some characteristic is present.
Example: - Decision to participate in the labor force/or not to participate
-Decision to vote or not to vote
ii. Ordinal variables:- these are variables that have categories that can be ranked.
Example: Rank to indicate political orientation
Y = 1, radical
= 2, liberal
= 3, conservative
- Rank according to education attainment
Y = 1, primary education
= 2, secondary education
= 3, university education
iii. Nominal variables: These variables occur when there are multiple outcomes that cannot be
ordered.
Example: Occupation can be grouped as farming, fishing, carpentry etc.
Y = 1 farming
= 2 fishing Note that numbers are
assigned arbitrarily
= 3 carpentry
= 4 Livestock
iv. Count variables: These variables indicate the number of times some event has occurred.
Example: how many strikes have occurred.
Now let us turn our attention to the four most commonly used approaches to estimating binary
response models (Type of binomial models).
1. Linear probability models
2. The logit model
3. The probit model
4. The tobit (censored regression) model.
7.3 THE LINEAR PROBABILITY MODEL (LPM)
The linear probability model is the regression model applied to a binary dependent variable. To
fix ideas, consider the following simple model:
Yi = β0 + β1Xi + Ui ...(1)
E(Yi/Xi) = β0 + β1Xi ...(2)
Now, letting Pi = probability that Yi = 1 (that is, that the event occurs) and 1 P i = probability
that Yi = 0 (that is, that the event does not occur), the variable Y i has the following
distributions:
Yi Probability
0 1−Pi
1 Pi
Total 1
Therefore, by the definition of mathematical expectation, we obtain
E(Yi) = 0(1 − Pi) + 1(Pi) = Pi ...(3)
Now, comparing (2) with (3), we can equate
E(Yi/Xi) = β0 + β1Xi = Pi ...(4)
That is, the conditional expectation of the model (1) can, in fact, be interpreted as the
conditional probability of Yi.
Since the probability Pi must lie between 0 and 1, we have the restriction 0 ≤ E(Yi/Xi) ≤ 1; that is, the conditional expectation, or conditional probability, must lie between 0 and 1.
Problems with the LPM
While the interpretation of the parameters is unaffected by having a binary outcome, several
assumptions of the LPM are necessarily violated.
1. Heteroscedasticity
The variance of the disturbance terms depends on the Xs and is thus not constant. Let us see
this as follows. We have the following probability distributions for U.
Yi                   Ui                    Probability
0                    −β0 − β1Xi            1 − Pi
1                    1 − β0 − β1Xi         Pi
Now by definition Var(Ui) = E[Ui − E(Ui)]² = E(Ui²), since E(Ui) = 0 by assumption.
Therefore, using the preceding probability distribution of Ui, we obtain
Var(Ui) = E(Ui²) = (−β0 − β1Xi)²(1 − Pi) + (1 − β0 − β1Xi)²(Pi)
= (−β0 − β1Xi)²(1 − β0 − β1Xi) + (1 − β0 − β1Xi)²(β0 + β1Xi)
= (β0 + β1Xi)(1 − β0 − β1Xi)
= Pi(1 − Pi)
That is, the variance of Ui depends on Xi and is therefore heteroscedastic. As a result, the OLS estimator of β is inefficient and the standard errors are biased, resulting in incorrect tests.
2. Non-normality of Ui
Although OLS does not require the disturbance (Us) to be normally distributed, we assumed
them to be so distributed for the purpose of statistical inference, that is, hypothesis testing, etc.
But the assumption of normality for U i is no longer tenable for the LPMs because like Y i, Ui
takes on only two values.
Ui = Yi − β0 − β1Xi
Now when Yi = 1, Ui = 1 − β0 − β1Xi
and when Yi = 0, Ui = −β0 − β1Xi
3. Constant marginal effects
Since the model is linear, a unit increase in X results in a constant change of β1 in the probability of an event, holding all other variables constant. The increase is the same regardless of the current value of X. In many applications this is unrealistic. When the outcome is a probability, it is often substantively reasonable that the effects of the independent variables will have diminishing returns as the predicted probability approaches 0 or 1.
Remark: Because of the above-mentioned problems, the LPM is not recommended for empirical work.
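One of these defects, fitted "probabilities" outside the 0-1 interval (taken up in section 7.4), is easy to see in practice. In this sketch the income and ownership data are hypothetical.

import numpy as np
import statsmodels.api as sm

# Hypothetical income (thousands) and home-ownership indicator.
income = np.array([8., 10., 12., 14., 16., 18., 20., 22., 24., 26.])
owns = np.array([0., 0., 0., 0., 1., 0., 1., 1., 1., 1.])

lpm = sm.OLS(owns, sm.add_constant(income)).fit()
print(lpm.fittedvalues)   # the extreme fitted "probabilities" fall
                          # below 0 and above 1 with these data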
Check Your Progress 7.1
1. Explain the binary or dichotomous variables.
2. Differentiate among binary, ordinal and nominal variables.
3. What is a linear probability model (LPM)? What are the shortcomings of this model?
7.4 THE LOGIT MODEL
We have seen that the LPM has many problems, such as non-normality of Ui, heteroscedasticity of Ui, the possibility of Ŷi lying outside the 0-1 range, and generally lower R² values. But these problems are surmountable.
problems are surmountable. The fundamental problem with the LPM is that it is not logically a
very attractive model because it assumes that P i = E(Y = 1/X) increases linearly with X, that is,
the marginal or incremental effect of X remains constant throughout.
Example: The LPM estimated by OLS (on home ownership) is given as follows:
Ŷi = −0.9457 + 0.1021Xi
(0.1228) (0.0082)
t = (−7.6984) (12.515)
R² = 0.8048
The above regression is interpreted as follows
- The intercept of −0.9457 gives the "probability" that a family with zero income will own a house. Since this value is negative, and since a probability cannot be negative, we treat this value as zero.
- The slope value of 0.1021 means that for a unit change in income, on average the probability of owning a house increases by 0.1021, or about 10 percent. This is so at any level of income, which seems patently unrealistic. In reality one would expect that Pi is non-linearly related to Xi.
Therefore, what we need is a (probability) model that has the following two features:
1. As Xi increases, Pi = E(Y = 1/X) increases but never steps outside the 0-1 interval.
64
2. The relationship between Pi and Xi is non-linear, that is, one which approaches zero at
slower and slower rates as Xi gets small and approaches one at slower and slower rates
as Xi gets very large
Geometrically, the model we want would look like the S-shaped curve in Fig 7.1 below.
[Fig 7.1: A cumulative distribution function (CDF), rising from 0 toward 1 as X increases from −∞ to +∞]
The above S-shaped curve is very similar to the cumulative distribution function (CDF) of a random variable. (Note that the CDF of a random variable X is simply the probability that it takes a value less than or equal to x0, where x0 is some specified numerical value of X. In short, F(X), the CDF of X, is F(x0) = P(X ≤ x0). Please refer to your statistics text for economists.)
Therefore, one can easily use the CDF to model regressions where the response variable is
dichotomous, taking 0-1 values.
The CDFs commonly chosen to represent the 0-1 response models are:
a) the logistic which gives rise to the logit model
b) the normal which gives rise to the probit (or normit) model
Now let us see how one can estimate and interpret the logit model.
Recall that the LPM was (for home ownership)
Pi = E(Y = 1/Xi) = β0 + β1Xi
where X is income and Y = 1 means the family owns a house. Now consider the following representation of home ownership:
Pi = E(Y = 1/Xi) = 1/(1 + e^−(β0+β1Xi))
or
Pi = 1/(1 + e^−Zi), where Zi = β0 + β1Xi
This equation represents what is known as the (cumulative) logistic distribution function. The equation is nonlinear in both the X and the β's, which means we cannot use the familiar OLS procedure to estimate the parameters. It can, however, be linearized as follows:
1 − Pi = 1/(1 + e^Zi)
so that
Pi/(1 − Pi) = (1 + e^Zi)/(1 + e^−Zi) = e^Zi
Now Pi/(1 − Pi) is simply the odds ratio in favor of owning a house: the ratio of the probability that a family will own a house to the probability that it will not own a house.
Taking the natural log of the odds ratio we obtain
Li = ln[Pi/(1 − Pi)] = Zi = β0 + β1Xi
L (the log of the odds ratio) is linear in X as well as in the β's (the parameters). L is called the logit, and hence the name logit model.
The interpretation of the logit model is as follows:
β1, the slope, measures the change in L for a unit change in X.
β0, the intercept, tells the value of the log-odds in favor of owning a house if income is zero. Like most interpretations of intercepts, this interpretation may not have any physical meaning.
Now, for estimation purposes, let us write the logit model as
Li = ln[Pi/(1 − Pi)] = β0 + β1Xi + Ui
To estimate the above model we need values of Xi and Li. Standard OLS cannot be applied to individual observations, since the values of L are then meaningless (e.g., L = ln(1/0) when Pi = 1 and L = ln(0/1) when Pi = 0).
Therefore estimation is by the maximum likelihood method (because of its mathematical complexity we will not discuss the method here).
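In practice the maximum likelihood estimation is done by software. Here is a minimal sketch with the statsmodels package, reusing the hypothetical home-ownership data from the LPM sketch above.

import numpy as np
import statsmodels.api as sm

income = np.array([8., 10., 12., 14., 16., 18., 20., 22., 24., 26.])
owns = np.array([0., 0., 0., 0., 1., 0., 1., 1., 1., 1.])

logit = sm.Logit(owns, sm.add_constant(income)).fit()
print(logit.params)   # b0 and b1 of the logit L = b0 + b1*X (log-odds scale)
# Unlike the LPM, predicted probabilities always lie inside (0, 1):
print(logit.predict(sm.add_constant(income)))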
Example: Logit estimates. Assume that a latent variable Y* is linearly related to the variables Xi as follows:
Yi* = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + Ui
Y = 1 if Yi* > 0
= 0 if Yi* ≤ 0
The latent variable Y* is continuous (−∞ < Y* < ∞). It generates the observed binary variable Y.
An observed variable, Y can be observed in two states:
i) if an event occurs it takes a value of 1
ii) if an event does not occur it takes a value of 0
The latent variable is assumed to be a linear function of the observed X s through the structural
model.
Example:
Let Y measures whether one is employed or not. It is a binary variable taking values 0 and 1.
Y* - measures the willingness to participate in the labor market. This changes continuously and
is unobserved. If X is a wage rate, then as X increases the willingness to participate in the labor
market will increase. (Y* - the willingness to participate cannot be observed). The decision of
the individual will be changed (becomes zero) if the wage rate is below the critical point.
Since Y* is continuous the model avoids the problems inherent in the LPM model (i.e., the
problem of non-normality of the error term and heteroscedasticity)
However, since the latent dependent variable is unobserved the model cannot be estimated
using OLS. Maximum likelihood can be used instead.
Most often, the choice is between normal errors and logistic errors, resulting in the probit
(normit) and logit models, respectively. The coefficients derived from the maximum likelihood
(ML) function will be the coefficients for the probit model, if we assume a normal distribution.
If we assume that the appropriate distribution of the error term is a logistic distribution, the
coefficients that we get from the ML function will be the coefficient of the logit model. In both
cases, as with the LPM, it is assumed that E(εi/Xi) = 0.
In the probit model it is assumed that Var(εi/Xi) = 1, while in the logit model it is assumed that Var(εi/Xi) = π²/3. Hence the estimates of the parameters (the β's) from the two models are not directly comparable.
But as Amemiya suggests, a logit estimate of a parameter multiplied by 0.625 gives a fairly good approximation of the probit estimate of the same parameter. Similarly, the coefficients of the LPM and logit models are related as follows:
β(LPM) = 0.25 β(logit), except for the intercept
β(LPM) = 0.25 β(logit) + 0.5 for the intercept
Summary
- Logit function:
P(Y = 1/X) = e^(α+βXi)/(1 + e^(α+βXi)) = 1/(1 + e^(−α−βXi))
(we obtain the second expression by dividing both the numerator and the denominator by e^(α+βXi))
- Probit function:
P(Y = 1/X) = Φ(α + βXi)
where Φ(·) is the cumulative normal distribution function, based on the normal density
f(x) = [1/(σ√(2π))] exp[−(1/2)((x−μ)/σ)²]
Therefore, since both models are nonlinear, it is possible to avoid both the problem of nonsensical results and the problem of a constant impact of X on the dependent variable.
Check Your Progress 7.2
1. Explain the differences between the LPM and the logit or probit models.
2. Specify the mathematical form of both the probit and logit models.
3. Explain or outline the similarities and differences between the probit and logit models.
7.6 THE TOBIT MODEL
An extension of the probit model is the tobit model developed by James Tobin. To explain this
model, let us consider the home ownership example.
Suppose we want to find out the amount of money the consumer spends in buying a house in
relation to his or her income and other economic variables. Now we have a problem. If a
consumer does not purchase a house, obviously we have no data on housing expenditure for
such consumers; we have such data only on consumers who actually purchase a house.
Thus consumers are divided into two groups: one consisting of, say, N1 consumers about whom we have information on the regressors (say income, interest rate, etc.) as well as the regressand (the amount of expenditure on housing), and another consisting of, say, N2 consumers about whom we have information only on the regressors but not on the regressand. A sample in which information on the regressand is available only for some observations is known as a censored sample. Therefore, the tobit model is also known as a censored regression model.
Mathematically, we can express the tobit model as
Yi = β0 + β1X1i + Ui, if RHS > 0
   = 0, otherwise
where RHS = right-hand side.
The method of maximum likelihood can be used to estimate the parameters of such models.
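statsmodels has no built-in tobit estimator, but the censored log-likelihood can be maximized directly, as in the following sketch for a model censored at zero. The function tobit_fit is an illustrative assumption, not a library routine.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_fit(y, X):
    """Left-censoring at zero: y = X b + u if X b + u > 0, else y = 0.
    X is (n, k) including a constant; returns (beta, sigma)."""
    def negloglik(theta):
        beta, sigma = theta[:-1], np.exp(theta[-1])   # keep sigma > 0
        xb = X @ beta
        cens, obs = y <= 0, y > 0
        ll = norm.logcdf(-xb[cens] / sigma).sum()     # censored: P(Y* <= 0)
        ll += (norm.logpdf((y[obs] - xb[obs]) / sigma)
               - np.log(sigma)).sum()                 # uncensored density
        return -ll
    start = np.append(np.linalg.lstsq(X, y, rcond=None)[0], 0.0)
    res = minimize(negloglik, start, method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])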
7.8 ANSWERS TO CHECK YOUR PROGRESS QUESTIONS
Answers to check your progress questions in this unit are already discussed in the text.
7.10 MODEL EXAMINATION QUESTIONS
1. When do we use models like LPM, logit and probit?
2. The LPM is the simplest of the above three models. But it has several limitations.
Discuss.
3. Can we use the standard OLS method to estimate the probit and logit models? Why?
4. Why do we call the tobit model a censored regression model?
5. Specify the mathematical form of the tobit model and discuss how one can estimate
such models?
UNIT 8: TIME SERIES ECONOMETRICS (A BRIEF INTRODUCTION)
8.0 AIMS AND OBJECTIVES
The aim of this unit is to extend the discussion of regression analysis by incorporating a brief
discussion of time series econometrics.
After the student has completed this unit, he/she will be able to:
understand the concept of stationarity
formulate and conduct the ADF test
distinguish between a trend stationary and a difference stationary process
understand the relationship between spurious regression and integration
specify an error correction model.
8.1 INTRODUCTION
Recall from our unit one discussion that one of the two important types of data used in empirical analysis is time series data. Time series data have become so frequently and intensively used in empirical research that econometricians have recently begun to pay very careful attention to such data.
In this brief discussion we first define the concept of a stationary time series and then develop tests to find out whether a time series is stationary. In this connection we introduce some related concepts, such as unit roots. We then distinguish between trend stationary and difference stationary time series. A common problem in regressions involving time series data is the phenomenon of spurious regression, so an introduction to this concept will be made. At last, the concept of cointegration will be stated and its importance in empirical research pointed out.
8.2 STATIONARITY AND UNIT ROOTS
Any time series data can be thought of as being generated by a stochastic or random process. A
type of stochastic process that has received a great deal of attention by time series analysis is
the so-called stationary stochastic process.
Broadly speaking, a stochastic process is said to be stationary if its mean and variance are constant over time and the value of the covariance between two time periods depends only on the distance or lag between the two periods, and not on the actual time at which the covariance is computed. A non-stationary series, on the other hand, has no long-run mean to which the variable returns, and its variance grows without bound as time goes by.
For many time series, however, stationarity is unlikely to exist. If this is the case, the conventional hypothesis testing procedures based on the t, F, chi-square and other tests are suspect. In other words, if the variables in the model are non-stationary, the result is spurious regression: the fact that the variables share a common trend will tend to produce an apparently significant relationship between them. Nonetheless, the relationship exhibits contemporaneous correlation as a result of the common trend rather than a true causal relationship. Hence, with non-stationary variables, conducting OLS generates misleading results.
Studies have developed different mechanisms that enable non-stationary variables to attain stationarity. It has been argued that if a variable has a deterministic trend (i.e., if the trend is perfectly predictable rather than variable or stochastic), including a trend variable in the regression removes the trend component and makes the variable stationary. For example, in the regression of consumption expenditure (PCE) on income (PDI), if we observe a very high r², which is typically the case, it may reflect, not the true degree of association between the two variables, but simply the common trend present in them. That is, with time the two variables move together. To avoid such spurious association, the common practice is to regress PCE on PDI and t (time), the trend variable. The coefficient of PDI obtained from this regression now represents the net influence of PDI on PCE, having removed the trend effect. In other words, the explicit introduction of the trend variable in the regression has the effect of detrending (i.e., removing the influence of trend from) both PCE and PDI. Such a process is called trend stationary, since the deviation from the trend is stationary.
However, most time series data have a stochastic trend (that is, the trend is variable and therefore cannot be predicted with certainty). In such cases, in order to avoid the problem associated with spurious regression, pre-testing the variables for the existence of unit roots (i.e., non-stationarity) becomes compulsory. In general, if a variable has a stochastic trend, it needs to be differenced in order to obtain stationarity. Such a process is called a difference stationary process.
In this regard, the Dickey Fuller (DF) test enables us to assess the existence of stationarity. The
simplest DF test starts with the following first order autoregressive model.
Yt = ρYt-1 + Ut ...(8.1)
Subtracting Yt-1 from both sides gives
Yt − Yt-1 = ρYt-1 − Yt-1 + Ut
ΔYt = (ρ − 1)Yt-1 + Ut
ΔYt = δYt-1 + Ut ...(8.2)
where ΔYt = Yt − Yt-1 and δ = ρ − 1
The test for stationarity is conducted on the parameter δ. If δ = 0 (or ρ = 1), it implies that ΔYt = Ut and hence the variable Y is not stationary (has a unit root). In time series econometrics, a time series that has a unit root is known as a random walk, because the change in Y (ΔYt) is then purely a result of the error term Ut. A random walk is thus an example of a non-stationary time series.
For the test of stationarity the hypothesis is formulated as follows:
H0: δ = 0 (or ρ = 1)
H1: δ < 0 (or ρ < 1)
Note that (8.2) is appropriate only when the series Yt has a zero mean and no trend. But it is impossible to know whether the true Yt has zero mean and no trend. For this reason, including a constant (drift) and a time trend in the regression is recommended. Thus (8.2) is expanded to the following form:
ΔYt = α + δYt-1 + βT + Ut ...(8.3)
where α = the constant term and T = the trend element.
Here as well, the parameter δ is used in testing for stationarity. Rejecting the null hypothesis (H0: δ = 0) implies that stationarity exists; that is, ΔYt is also influenced by Yt-1 in addition to Ut, so the change in Yt (i.e., ΔYt) does not follow a random walk. Note that accepting the null hypothesis suggests the existence of a unit root (non-stationarity).
The DF test has a serious limitation in that it suffers from residual autocorrelation: it is inappropriate to use the DF distribution in the presence of autocorrelated errors. To amend this weakness, the DF model is augmented with additional lagged first differences of the dependent variable. This gives the Augmented Dickey-Fuller (ADF) test, whose regression model avoids autocorrelation among the residuals. Incorporating lagged first differences of Yt in (8.3) gives the following ADF model:
ΔYt = α + βT + δYt-1 + γ1ΔYt-1 + ... + γkΔYt-k + Ut ...(8.4)
where k is the lag length.
Now the test for stationarity is free from the problem of residual autocorrelation, and the hypothesis test (just like the above) can be conducted.
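In practice the ADF regression (8.4) and its test statistic are computed by software. The following sketch uses the adfuller function from statsmodels; `series` is assumed to be a one-dimensional array of time series observations.

from statsmodels.tsa.stattools import adfuller

def adf_report(series):
    # regression="ct" includes the drift and trend terms of (8.3)/(8.4);
    # the lag length k is chosen automatically by information criterion.
    stat, pvalue, usedlag, nobs, crit, _ = adfuller(series, regression="ct")
    print("tau statistic:", stat, "p-value:", pvalue)
    print("critical values:", crit)
    # Failing to reject H0 (delta = 0) indicates a unit root: difference
    # the series and test again to find the order of integration.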
Example: Let us illustrate the ADF test using Personal Consumption Expenditure (PCE) data for Ethiopia. Suppose that the regression of PCE corresponding to (8.4) gave the following result:
ΔPCEt = 233.08 + 1.64T − 0.06PCEt-1 + γ1ΔPCEt-1 ...(8.5)
For our purpose the important thing is the τ (tau) statistic of the PCEt-1 variable; a special table of critical values is used to test the hypothesis stated earlier. Suppose the calculated τ value does not exceed its table value; in that case we fail to reject the null hypothesis, which indicates that the PCE time series is not stationary. If it is not stationary, using the variable at levels will lead to spurious regression results. As has been stated earlier, if a variable is not stationary at levels, we need to conduct the test on the variable in its difference form. If a variable that is not stationary in levels appears to be stationary after the nth difference, the variable is said to be integrated of order n; symbolically we write I(n). Suppose we repeat the preceding exercise using the first difference of PCE (i.e., ΔPCEt = PCEt − PCEt-1) as the regressand. If the test result allows us to reject the null hypothesis, we conclude that PCE is integrated of order one, I(1). Note from our discussion that the application of OLS to stationary variables will bring about non-spurious results. Therefore, before a regression that makes use of time series variables is performed, the stationarity of all the variables must first be checked.
Note that taking the variables in difference form captures only the dynamic interaction among the variables, with no information about the long-run relationship. However, if the variables that are non-stationary separately share the same trend, this indicates that the variables have a stationary linear combination. This in turn implies that the variables are cointegrated, i.e., that there exists a long-run equilibrium (relationship) among the variables.
Check Your Progress 1
1. Distinguish between trend stationary process (TSP) and a difference stationary process
(DSP)?
2. What is meant by stationarity and unit roots?
3. What is meant by integrated time series?
4. Discuss the concept of spurious regression
5. Explain the concept of ADF tests
8.3 COINTEGRATION ANALYSIS AND ERROR CORRECTION MECHANISM
Cointegration among the variables reflects the presence of long run relationship in the system.
We need to test for cointegration because differencing the variables to attain stationarity
generates a model that does not show the long run behavior of the variables. Hence, testing for
cointegration is the same as testing for long run relationship.
There are two approaches used in testing for cointegration: i) the Engle-Granger (two-step) algorithm and ii) the Johansen approach.
The Engle-Granger (EG) method requires that, for cointegration to exist, all the variables must be integrated of the same order. Once the variables are found to have the same order of integration, the next step is testing for cointegration. This requires generating the residuals from the estimated static equation and testing their stationarity. By doing so we are testing whether the deviations from the long-run relationship (captured by the error term) are stationary or not. If the residuals are found to be stationary, it implies that the variables are cointegrated. This in turn ensures that the deviation from the long-run equilibrium relationship dies out with time.
Example: Suppose we regress PCE on PDI to find out the following estimated relationship
between the two.
PCEt = β0 + β1PDIt + Ut ...(8.6)
To identify whether PCE and PDI are cointegrated (i.e., have a stationary linear combination) or not, we write (8.6) as follows:
Ut = PCEt − β0 − β1PDIt ...(8.7)
The purpose of (8.7) is to find out whether Ut [i.e., the linear combination (PCE − β0 − β1PDI)] is I(0), or stationary. Using the procedure stated in the earlier subunit for testing stationarity, if we reject the null hypothesis then we say that the variables PCE and PDI are cointegrated.
If variables are cointegrated, the regression on the levels of the two variables as in (8.6), is
meaningful (i.e., not spurious); and we do not lose any valuable long term information, which
would result if we were to use their first differences instead.
In short, provided we check that the residuals are stationary, the traditional regression
methodology that we have learned so far (including t and F tests) is applicable to data involving
time series.
We just showed that PCE and PDI are cointegrated, that is there is a long-term equilibrium
relationship between the two. Of course, in the short run there may be disequilibrium.
Therefore, one can treat the error term in (8.7) as the "equilibrium error". We can use this error term to tie the short-run behavior of PCE to its long-run value. In other words, the presence of cointegration makes it possible to model the variables (in first differences) through the error correction model (ECM). In this model a one-period lagged value of the residual serves as the error correction term, and its coefficient captures the speed of adjustment to the long-run equilibrium. The following model specification shows, with the PCE/PDI example, how the ECM works:
ΔPCEt = α0 + α1ΔPDIt + α2Ût-1 + εt ...(8.8)
where Ût-1 is the one-period lagged value of the residual from regression (8.6) and εt is a random error term.
In (8.8), ΔPDIt captures the short-run disturbances in PDI, whereas the error correction term Ût-1 captures the adjustment toward the long-run equilibrium. If α2 is statistically significant (it has to be negative and between 0 and −1), it tells us what proportion of the disequilibrium in PCE in one period is corrected in the next period.
Example: Suppose we obtain the following result:
ΔPCEt = 11.69 + 0.29ΔPDIt − 0.08Ût-1 ...(8.9)
The result shows that short-run changes in PDI have a significant positive effect on PCE and that about 0.08 (or 8 percent) of the discrepancy (or deviation) between the actual and the long-run, or equilibrium, value of PCE is eliminated or corrected each year. (Note that the error correction term captures the speed of adjustment to the long-run equilibrium.)
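The Engle-Granger two-step procedure and the ECM (8.8) can be sketched as follows; `pce` and `pdi` are assumed to be equally long arrays of I(1) observations. The coint function applies the appropriate Engle-Granger critical values to the residual-based test.

import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint

def engle_granger_ecm(pce, pdi):
    # Step 1: residual-based cointegration test with EG critical values.
    t_stat, pvalue, _ = coint(pce, pdi)
    print("Engle-Granger test p-value:", pvalue)
    # Long-run (static) regression (8.6) and its residuals.
    u_hat = sm.OLS(pce, sm.add_constant(pdi)).fit().resid
    # Step 2: ECM (8.8) in first differences, with u_hat lagged one period.
    d_pce, d_pdi = np.diff(pce), np.diff(pdi)
    X = sm.add_constant(np.column_stack([d_pdi, u_hat[:-1]]))
    ecm = sm.OLS(d_pce, X).fit()
    return ecm   # last coefficient = speed of adjustment (alpha2)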
However, the Engle-Granger method is criticized for its failure on some issues that are addressed by the Johansen approach. Interested readers can find a detailed discussion of this more advanced approach in Harris (1995).
Check Your Progress 2
1. Discuss the concept of cointegration
2. Explain the error correction mechanism (ECM). What is its relation with cointegration?
8.5 ANSWERS TO CHECK YOUR PROGRESS
The answers for all questions are found in the discussion under sub units 8.2 and 8.3
8.6 MODEL EXAMINATION
1. Outline the Engle-Granger method for cointegration.
2. "A time series that has a unit root is called a random walk." Explain.
Discuss the following:
3. Why do we need to incorporate a one-period lagged value of the error term in the ECM?
Any questions in the course that you have not been able to understand should be stated on a
separate sheet of paper and attached to this worksheet. Your tutor will clarify them for you.
After completing this test paper, be certain to write your Name, Id.No and Address on the first
page. Only your Name and Id.No on the other pages.
Part I: Attempt any three of the following
1. Econometrics is considered an integration of economic theory, mathematical economics and statistics, but is entirely different from each one of them. Explain.
2. The following represents the true relationship between the independent variables
X1, X2, X3, and the dependent variable Y
Yi= bo+b1X1i+b2X2i+b3X3i+Ui
where Y=Quantity demanded
X1=price of the commodity.
X2=price of the other commodities
X3=Income of the consumer
Ui=disturbance term
i) Is the above relation exact? Why?
ii) What is the economic meaning of the coefficients?
iii) What will be the expected sign of the coefficients?
iv) What will be the expected size (magnitude) of the coefficients?
3. When do we use models like LPM, logit and probit?
4. The LPM is the simplest of the above three models. But it has several limitations. Discuss.
5. Explain the concept of simultaneous equation. When do we need it
6. What is spurious regression.
Part II: Workout Questions (attempt any three of the following)
1. There are occasions when the two variable linear regression model assumes the
following form:
Yi = βXi + Ei
where β is the parameter and E is the disturbance term. In this model the intercept term is zero. The model is therefore known as regression through the origin.
For this model show that
β̂ = ΣXiYi / ΣXi²
and
Var(β̂) = σu² / ΣXi², with σ̂u² = Σei² / (n − 1)
vii) Test the significance of the regression coefficients.
viii) Conduct tests of significance for r and R2
ix) Present the result of your analysis.
4. Given the following observations on output (Y), labor input (X1) and capital input (X2) for 12 firms (X1, X2 and Y are measured in arbitrary units):
Firm          1  2  3  4  5  6  7  8  9  10  11  12
Output 14 18 23 39 24 60 56 65 76 15 27 35
Labor input 11 13 22 45 30 60 62 57 76 15 28 34
Capital input 30 24 31 27 31 56 42 90 80 18 30 20