
Unit – II

Problems in Regression Analysis


Dr. Vikram K. Joshi
M.Sc.(Stats); M.A.(Econometrics)
MBA; Ph.D (Bus. Eco)
Faculty Member, DMT, RCOEM, Nagpur
Unit Objective
• Understand the various problems in regression analysis and their remedies.

Unit Contents
⮚ Problems in Regression Analysis:
• Multicollinearity: Nature, problem and remedies
• Autocorrelation: Nature, problem and remedies
• Heteroscedasticity: Nature, problem and remedies
• Specification error: Nature, problem and remedies


Multicollinearity

• An assumption of the CLRM requires that there be no exact linear relationships
among the sample values of the explanatory variables (the Xs).

• So, when the explanatory variables are very highly correlated with each other
(correlation coefficients very close to +1 or -1), the problem of
multicollinearity occurs.


Perfect Multicollinearity
• When there is a perfect linear relationship.
• Assume we have the following model:
Y= β0 + β1X1 + β2X2+e
where the sample values for X1 and X2 are:

X1 1 2 3 4 5 6
X2 2 4 6 8 10 12

• We observe that X2=2X1


• Therefore, although it seems that there are two explanatory variables in fact
it is only one.
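
The point can be illustrated numerically. Below is a minimal sketch (not from the slides, using numpy with the X1, X2 values above) showing that the exact dependence X2 = 2X1 makes the X'X matrix singular, so the OLS normal equations have no unique solution:

```python
# Minimal sketch: perfect multicollinearity makes X'X singular.
import numpy as np

x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
x2 = 2 * x1                                        # exact linear dependence: X2 = 2*X1
X = np.column_stack([np.ones_like(x1), x1, x2])    # design matrix with an intercept

XtX = X.T @ X
print(np.linalg.matrix_rank(X))   # 2, not 3: one column is redundant
print(np.linalg.det(XtX))         # (numerically) 0, so (X'X)^-1 does not exist
```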



Perfect Multicollinearity

• When this occurs, the equation
  δ1X1 + δ2X2 = 0
  can be satisfied for non-zero values of both δ1 and δ2.
  In our case we have
  (-2)X1 + (1)X2 = 0
  with δ1 = -2 and δ2 = 1.
• Obviously, if the only solution is
  δ1 = δ2 = 0
  (usually called the trivial solution), then the two variables are linearly
  independent and there is no problematic multicollinearity.


Perfect Multicollinearity

• In the case of more than two explanatory variables, one variable may be expressible
as an exact linear function of one or more, or even all, of the other variables.

• So, if we have 5 explanatory variables, perfect multicollinearity exists when
  δ1X1 + δ2X2 + δ3X3 + δ4X4 + δ5X5 = 0
  holds for values of the δ's that are not all zero.


Imperfect Multicollinearity
• Imperfect multicollinearity (or near multicollinearity) exists when the
explanatory variables in an equation are correlated, but this correlation is less
than perfect.

• This can be expressed as:


X2 = X1 + v
where v is a random variable that can be viewed as the ‘error’ in the exact linear
relationship.



Practical Consequences of Multicollinearity
1. Although BLUE, the OLS estimators have large variances and covariances,
making precise estimation difficult.
2. Because of consequence 1, the confidence intervals tend to be much wider,
leading to the acceptance of the “zero null hypothesis” more readily.
3. Also because of consequence 1, the t ratio of one or more coefficients tends
to be statistically insignificant.
4. Although the t ratios of one or more coefficients may be statistically insignificant,
R2, the overall measure of goodness of fit, can be very high.
5. The OLS estimators and their standard errors can be sensitive to small
changes in the data.



Consequences of Multicollinearity
• The Variance Inflation Factor: VIFj = 1 / (1 − R2j), where R2j is the R2 from regressing Xj on the other regressors.

  R2j      VIFj
  0        1
  0.5      2
  0.8      5
  0.9      10
  0.95     20
  0.975    40
  0.99     100
  0.995    200
  0.999    1000
Consequences of Multicollinearity
The Variance Inflation Factor

• VIF values that exceed 10 are generally viewed as evidence of the existence
of problematic multicollinearity.

• This happens for R2j > 0.9
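
As an illustration, the following sketch (illustrative data and variable names, not from the slides) computes VIFs with statsmodels; a VIF above 10 for a regressor flags problematic multicollinearity:

```python
# Minimal sketch: VIF_j = 1 / (1 - R2_j) for each regressor, via statsmodels.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=100)   # nearly collinear with x1
x3 = rng.normal(size=100)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF for each column of the design matrix (index 0 is the constant)
for i, name in enumerate(X.columns):
    print(name, variance_inflation_factor(X.values, i))
```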



Detection of Multicollinearity
• Important points to bear in mind while testing for the presence of multicollinearity:

1. Multicollinearity is a question of degree and not of kind. Hence the meaningful
distinction is not between the presence and the absence of multicollinearity, but
between its various degrees.

2. Since multicollinearity refers to the condition of the explanatory variables, which
are assumed to be non-stochastic, it is a feature of the sample and not of the
population.


Detection of Multicollinearity - Rules
1. High R2 but few significant t ratios.
2. High pair-wise correlations among regressors.
3. Examination of partial correlations (r1.23 or r12.3, etc.).
4. Auxiliary regressions (one or more of the regressors are exact or approximate
   linear combinations of the other regressors), e.g. X2 = α + β X1 or X4 = α + β X3.
5. Eigenvalues and condition index (computed by EViews or Stata):
   CI = √(Max eigenvalue / Min eigenvalue) = √k
   (A CI between 10 and 30, or higher, indicates the presence of multicollinearity.)
6. Tolerance (TOLj = 1 − R2j = 1/VIFj) and the variance inflation factor.
7. Scatterplot of the regressors.


Remedial Measures
1. Use of a priori information
   If Yi = β1 + β2X2i + β3X3i + ui
   where Y = consumption, X2 = income and X3 = wealth.
   Suppose a priori we believe that β3 = 0.10β2 (the wealth coefficient is one-tenth
   the income coefficient); then the equation may be written as
   Yi = β1 + β2Xi + ui, where Xi = X2i + 0.1X3i.
2. Combining cross-sectional and time series data
3. Dropping a variable(s) and avoiding a specification bias
4. Transformation of variables (first difference transformation)
5. Additional or new data
6. Other methods (principal components, factor analysis or ridge regressions)


Heteroskedasticity
• What is Heteroskedasticity?
Hetero (different or unequal) is the opposite of Homo (same or equal)…
Skedastic means spread or scatter…
Homoskedasticity = equal spread
Heteroskedasticity = unequal spread

• Assumption of the CLRM states that the disturbances should have a constant
(equal) variance independent of t:
Var(ut)=σ2

Therefore, having an equal variance means that the disturbances are homoskedastic.
Heteroskedasticity
• What is Heteroskedasticity?
If the homoskedasticity assumption is violated then

Var(ut)=σt2

Where the only difference is the subscript t, attached to the σt2, which means
that the variance can change for every different observation in the sample
t=1, 2, 3, 4, …, n.

Look at the following graphs…



Heteroskedasticity
• What is Heteroskedasticity?

[Figure: homoskedastic residuals - the spread of the residuals is constant across observations.]

[Figures: heteroskedastic residuals - the variance of the residuals changes across observations.]


Heteroskedasticity – Reasons
or Why E(ui2) = σi2
• Error-learning models: as people learn, their errors of behaviour become smaller
or more consistent over time.
• As incomes grow, people have more scope for choice about the disposition of their
income.
• Improvement in data collecting techniques is likely to decrease σi2.
• Due to presence of outliers.
• The regression model is not correctly specified.
• Skewness in the distribution.
• Incorrect data transformation or incorrect functional form.



Detecting Heteroskedasticity
There are two ways in general.
• The first is the informal way which is done through graphs and therefore we
call it the graphical method.

• The second is through formal tests for heteroskedasticity, like the following
ones:
1. The Park Test
2. The Glejser Test
3. Spearman’s Rank Correlation Test
4. The Goldfeld-Quandt Test
5. The Breusch-Pagan-Godfrey Test
6. White’s Test



Detecting Heteroskedasticity
⮚ Graphical Method
• We plot the square of the obtained residuals against the fitted Y, the X's, or t,
and examine the patterns.

[Figures: plots of squared residuals against fitted Y, the X's, or t, showing different patterns.]




Detecting Heteroskedasticity
⮚ Formal Methods
• The Park Test
Step 1: Estimate the model by OLS and obtain the residuals.

Step 2: Run the following auxiliary regression:
        ln ui2 = α + β ln Xi + vi

Step 3: If β turns out to be statistically significant, it suggests that
heteroscedasticity is present in the data.
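
A minimal sketch of the Park test in Python (statsmodels, with simulated data; not the slides' own example):

```python
# Park test sketch: regress ln(residual^2) on ln(X) and check the slope.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 200)
y = 2 + 3 * x + rng.normal(scale=x, size=200)      # error variance grows with x

# Step 1: OLS on the original model, keep the residuals
res = sm.OLS(y, sm.add_constant(x)).fit()
u = res.resid

# Step 2: auxiliary regression ln(u^2) = a + b*ln(x) + v
aux = sm.OLS(np.log(u**2), sm.add_constant(np.log(x))).fit()

# Step 3: a significant slope suggests heteroskedasticity
print(aux.params[1], aux.pvalues[1])
```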



Detecting Heteroskedasticity
⮚ Formal Methods
• The Glejser Test
• Similar in spirit to the Park Test.
Step 1: Estimate the model by OLS and obtain the residuals.

Step 2: Run the following auxiliary regression:
        | ui | = α + β Xi + vi

Step 3: If β turns out to be statistically significant, it suggests that
heteroscedasticity is present in the data.
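
A corresponding sketch of the Glejser test (again with simulated, illustrative data):

```python
# Glejser test sketch: regress |residuals| on X and check the slope.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 200)
y = 1 + 2 * x + rng.normal(scale=0.5 * x, size=200)

u = sm.OLS(y, sm.add_constant(x)).fit().resid        # Step 1: residuals
aux = sm.OLS(np.abs(u), sm.add_constant(x)).fit()    # Step 2: |u| on X
print(aux.pvalues[1])                                # Step 3: significance of beta
```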



Detecting Heteroskedasticity
⮚ Formal Methods
• Spearman's Rank Correlation Test (rs)
• The Goldfeld-Quandt Test
[Test formulas not recoverable from the slides.]


Detecting Heteroskedasticity
⮚ Formal Methods
• The Breusch-Pagan-Godfrey Test
Step 1: Estimate the model by OLS and obtain the residuals.
        Assume Yi = β0 + β1Xi + ui.
Step 2: Obtain σ2 = Σ ui2 / n.
Step 3: Construct the variable pi defined as
        pi = ui2 / σ2
        and regress it on the Xi's:
        pi = α1 + α2 Xi + vi, where vi is the residual term.
Step 4: Obtain the ESS (explained sum of squares) from the above regression and define
        Ѳ = ESS / 2 = Σ (p̂i − p̄)2 / 2.
        Assuming the ui are normally distributed, if Ѳ > the χ2 critical value at the
        5% level of significance, the hypothesis of homoscedasticity is rejected.
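
For reference, statsmodels ships a ready-made Breusch-Pagan test; the sketch below (illustrative data) applies it rather than carrying out the manual steps:

```python
# Breusch-Pagan test sketch via statsmodels.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 200)
y = 1 + 2 * x + rng.normal(scale=x, size=200)

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

# Returns (LM statistic, LM p-value, F statistic, F p-value)
lm, lm_pval, f, f_pval = het_breuschpagan(res.resid, X)
print(lm, lm_pval)    # a small p-value rejects homoscedasticity
```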
Detecting Heteroskedasticity
⮚ Formal Methods
• White Test
Step 1: Estimate the model by OLS and obtain the residuals.
        Assume Yi = β1 + β2X2i + β3X3i + ui.

Step 2: Run the auxiliary regression of the squared residuals on the regressors,
        their squares and their cross-products:
        ui2 = α1 + α2X2i + α3X3i + α4X2i2 + α5X3i2 + α6X2iX3i + vi

Step 3: Compute n * R2, where n and R2 are from the auxiliary regression.

Step 4: If n * R2 > the χ2 critical value at the 5% level of significance, the
hypothesis of homoscedasticity is rejected. Thus, there is evidence of heteroscedasticity.
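
A minimal sketch of White's test using statsmodels' het_white, which constructs the auxiliary regression with squares and cross-products automatically (illustrative data):

```python
# White's test sketch via statsmodels.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(4)
x2 = rng.uniform(1, 10, 300)
x3 = rng.uniform(1, 10, 300)
y = 1 + 2 * x2 + 3 * x3 + rng.normal(scale=x2, size=300)

X = sm.add_constant(np.column_stack([x2, x3]))
res = sm.OLS(y, X).fit()

# Returns (LM statistic, LM p-value, F statistic, F p-value); LM statistic = n * R^2
lm, lm_pval, f, f_pval = het_white(res.resid, X)
print(lm, lm_pval)
```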
Remedial Measures - Heteroscedasticity

• We have two different approaches:

(a) Weighted Least Squares (when σt2 is known)

(b) Heteroskedasticity-Consistent Estimation Methods (when σt2 is unknown)


Remedial Measures - Heteroscedasticity

(a) Weighted Least Squares (when σt2 is known)

The WLS procedure assigns weights, wt, that adjust our variables.
Define wt = 1/σt and rewrite the original model as:

wtYt = β1wt + β2X2twt + β3X3twt + … + βkXktwt + utwt

If we define wtYt = Y*t and wtXit = X*it, we get:

Y*t = β*1 + β*2X*2t + β*3X*3t + … + β*kX*kt + u*t
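
A minimal WLS sketch (illustrative data, assuming σt is proportional to X so the weights 1/σt2 are known up to scale):

```python
# Weighted least squares sketch: weights proportional to 1/sigma_t^2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, 200)
y = 1 + 2 * x + rng.normal(scale=x, size=200)   # sd of u_t proportional to x

X = sm.add_constant(x)
sigma = x                                       # assumed known form of sigma_t
wls_res = sm.WLS(y, X, weights=1.0 / sigma**2).fit()
print(wls_res.params, wls_res.bse)
```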



Remedial Measures - Heteroscedasticity

(b) Heteroskedasticity-Consistent Estimation Methods (when σt2 is unknown)

• In such cases, White's heteroscedasticity-consistent variance and standard error
estimation method, provided by various computer packages, can be used.

• The estimates are obtained by the usual OLS method; the resulting standard errors
are also known as robust standard errors.
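
A minimal sketch comparing usual and White-robust (HC) standard errors in statsmodels (illustrative data):

```python
# OLS with heteroskedasticity-consistent ("robust") standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(1, 10, 200)
y = 1 + 2 * x + rng.normal(scale=x, size=200)

X = sm.add_constant(x)
ols_res = sm.OLS(y, X).fit()                    # usual OLS standard errors
robust_res = sm.OLS(y, X).fit(cov_type="HC1")   # White-robust standard errors
print(ols_res.bse, robust_res.bse)
```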



Autocorrelation
What is Autocorrelation?
• Assumption of the CLRM states that the covariances and correlations between
different disturbances are all zero:
cov(ut, us)=0 for all t≠s

• This assumption states that the disturbances ut and us are independently
distributed, which is called serial independence.


Autocorrelation
What is Autocorrelation?
• If this assumption is no longer valid, then the disturbances are not pairwise
independent, but pairwise autocorrelated (or Serially Correlated).

• This means that an error occurring at period t may be carried over to the next
period t+1.

• Autocorrelation is most likely to occur in time series data.

• In cross-sectional data we can change the arrangement of the data without altering
the results.


Causes of Autocorrelation
⮚ Omitted variables
• Suppose Yt is related to X2t and X3t, but we wrongfully do not include X3t in
our model.

• The effect of X3t will be captured by the disturbances ut.

• If X3t like many economic series exhibit a trend over time, then X3t depends
on X3t-1, X3t -2 and so on.

• Similarly then ut depends on ut-1, ut-2 and so on.



Causes of Autocorrelation
⮚ Misspecification
• Suppose Yt is related to X2t with a quadratic relationship:
• Yt = β1 + β2X22t + ut

• But we wrongfully assume and estimate a straight line:


• Yt = β1 + β2X2t + ut

• Then the error term obtained from the straight line will depend on X22t.



Causes of Autocorrelation

⮚ Systematic errors in measurement


• Suppose a company updates its inventory at a given period in time.

• If a systematic error occurred then the cumulative inventory stock will exhibit
accumulated measurement errors.

• These errors will show up as autocorrelated disturbances.


First-Order Autocorrelation

• The simplest and most commonly observed is the first-order autocorrelation.

• Consider the multiple regression model:

Yt = β1 + β2X2t + β3X3t + β4X4t + … + βkXkt + ut

in which the current observation of the error term ut is a function of the
previous (lagged) observation of the error term:

ut = ρut-1 + et
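
A small simulation sketch (illustrative, not from the slides) of such AR(1) disturbances:

```python
# First-order autocorrelated errors: u_t = rho * u_{t-1} + e_t, e_t white noise.
import numpy as np

rng = np.random.default_rng(7)
rho, n = 0.7, 200
e = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + e[t]

# sample estimate of rho from the simulated disturbances
print(np.corrcoef(u[1:], u[:-1])[0, 1])
```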



First-Order Autocorrelation

• The coefficient ρ is called the first-order autocorrelation coefficient and takes
values from -1 to +1.

• It is obvious that the size of ρ will determine the strength of the serial
correlation.

• We can have three different cases.


First-Order Autocorrelation

(a) If ρ is zero, we have no autocorrelation.

(b) If ρ approaches unity, the value of the previous observation of the error
becomes more important in determining the value of the current error, and
therefore a high degree of autocorrelation exists. In this case we have positive
autocorrelation.

(c) If ρ approaches -1, we have a high degree of negative autocorrelation.


Consequences of Autocorrelation

1. The OLS estimators are still unbiased and consistent. This is because neither
unbiasedness nor consistency depends on the no-autocorrelation assumption, which
is the assumption violated here.

2. The OLS estimators will be inefficient and therefore no longer BLUE.

3. The estimated variances of the regression coefficients will be biased and
inconsistent, and therefore hypothesis testing is no longer valid. In most cases,
R2 will be overestimated and the t-statistics will tend to be higher.


Detecting Autocorrelation

⮚ There are two ways in general.


• The first is the informal way which is done through graphs and therefore we
call it the graphical method.

• The second is through formal tests for autocorrelation, like the following
ones:

1. The Runs Test


2. The Durbin Watson d Test
3. The Breusch-Godfrey Test



Detecting Autocorrelation
⮚ Graphical method.

Fig: Positive Serial Correlation


Detecting Autocorrelation
⮚ Graphical method.

Fig: Negative Serial Correlation


Detecting Autocorrelation
⮚ Formal method
• The Runs Test
Step 1: Estimate the model by OLS and obtain the residuals

Step 2: Give the signs (+ or -) to the obtained residuals


Now let N = total number of observations = N1 + N2
N1 = number of + symbols (i.e., + residuals)
N2 = number of – symbols (i.e., - residuals)
R = number of runs
Step 3: The null hypothesis is that the successive outcomes (here, residuals) are
independent and the number of runs is normally distributed with
        Mean E(R) = 2N1N2/N + 1 and
        Variance σR2 = 2N1N2(2N1N2 − N) / [N²(N − 1)]
If R lies in the confidence interval E(R) − 1.96σR ≤ R ≤ E(R) + 1.96σR, do not reject
the null hypothesis of randomness with 95% confidence.
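
A minimal sketch of the runs test coded directly from the formulas above (the residuals shown are illustrative placeholders; in practice they come from an estimated regression):

```python
# Runs test sketch using the mean and variance formulas above.
import numpy as np

resid = np.array([1.2, 0.8, 0.5, -0.3, -0.9, -1.1, 0.4, 0.7, -0.2, -0.6,
                  0.9, 1.1, -0.4, -0.8, 0.3, 0.6, -0.5, -0.7, 0.2, 0.4])

signs = resid > 0
n1, n2 = signs.sum(), (~signs).sum()           # number of + and - residuals
n = n1 + n2
runs = 1 + np.sum(signs[1:] != signs[:-1])     # R = number of runs

er = 2 * n1 * n2 / n + 1                                    # E(R)
var_r = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))  # sigma_R^2
sd_r = np.sqrt(var_r)

# do not reject randomness if R falls inside E(R) +/- 1.96 * sigma_R
print(runs, er - 1.96 * sd_r, er + 1.96 * sd_r)
```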
Detecting Autocorrelation
⮚ Formal method
• Durbin-Watson d Test
The following assumptions should be satisfied:
1. The regression model includes a constant
2. Autocorrelation is assumed to be of first-order only
3. The error term ui is assumed to be normally distributed
4. The equation does not include a lagged dependent variable as an explanatory
variable



Detecting Autocorrelation
⮚ Formal method
• Durbin-Watson d Test
The d statistic is defined as
        d = Σ (ut − ut-1)² / Σ ut² ≈ 2(1 − ρ̂)
computed from the OLS residuals, so d ≈ 2 when ρ̂ = 0, d approaches 0 as ρ̂ approaches +1,
and d approaches 4 as ρ̂ approaches -1.


Detecting Autocorrelation
⮚ Formal method
• Durbin-Watson d Test

0 to dL:         reject H0 - evidence of positive autocorrelation
dL to dU:        zone of indecision
dU to 4 − dU:    do not reject H0 - no autocorrelation
4 − dU to 4 − dL: zone of indecision
4 − dL to 4:     reject H0 - evidence of negative autocorrelation
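
A minimal sketch (illustrative data) computing the d statistic with statsmodels:

```python
# Durbin-Watson d statistic from OLS residuals: d near 2 suggests no first-order
# autocorrelation, near 0 positive, near 4 negative autocorrelation.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(8)
n = 200
x = np.arange(n, dtype=float)
e = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + e[t]        # AR(1) disturbances
y = 1 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(res.resid))         # well below 2 for these data
```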



Detecting Autocorrelation
⮚ Formal method
• Durbin-Watson d Test
Drawbacks of the DW test
1. It may give inconclusive results
2. It is not applicable when a lagged dependent variable is used
3. It cannot take into account higher orders of autocorrelation


Detecting Autocorrelation
⮚ Formal method
• The Breusch-Godfrey (BG) Test
It is a Lagrange Multiplier Test that resolves the drawbacks of the DW test.

Consider the model:


Yt = β1 + β2X2t + β3X3t + β4X4t + … + βkXkt + ut
where:
ut = ρ1ut-1 + ρ2ut-2 + ρ3ut-3 +… + ρput-p + et



Detecting Autocorrelation
⮚ Formal method
• The Breusch-Godfrey (BG) Test
Combining those two we get:
Yt = β1 + β2X2t + β3X3t + β4X4t + … + βkXkt +
+ ρ1ut-1 + ρ2ut-2 + ρ3ut-3 + … + ρput-p + et

The null and the alternative hypotheses are:


H0: ρ1= ρ2=…= ρp=0 no autocorrelation
Ha: at least one of the ρ’s is not zero, thus, autocorrelation



Detecting Autocorrelation
⮚ Formal method
• The Breusch-Godfrey (BG) Test
Step 1: Estimate the model and obtain the residuals

Step 2: Run the full LM model with the number of lags used being determined
by the assumed order of autocorrelation.

Step 3: Compute the LM statistic = (n − p)R2 from the LM model, where p is the number
of lags, and compare it with the chi-square critical value.

Step 4: Conclude
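
A minimal sketch of the BG test using statsmodels' acorr_breusch_godfrey (illustrative data; nlags is the assumed order p of autocorrelation):

```python
# Breusch-Godfrey LM test sketch via statsmodels.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(9)
n = 200
x = rng.normal(size=n)
e = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + e[t]        # AR(1) disturbances
y = 1 + 2 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()

# Returns (LM statistic, LM p-value, F statistic, F p-value)
lm, lm_pval, f, f_pval = acorr_breusch_godfrey(res, nlags=2)
print(lm, lm_pval)    # a small p-value is evidence of autocorrelation
```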



Specification Errors
Introduction
▪ Before any equation can be estimated, it must be completely specified.
▪ Broadly speaking, specifying an econometric equation consists of the
following:

• Choosing the “correct” explanatory variables


• Choosing the “correct” functional form
• Choosing the “correct” form of the error term



Specification Errors
Specification error can arise in a number of ways:

(i) Omission of a relevant explanatory variable - underfitting

(ii) Inclusion of an irrelevant explanatory variable - overfitting

(iii) Adopting the wrong functional form



Specification Errors
(i) Omission of a relevant explanatory variable - underfitting

"True Model": Yi = β1 + β2X2i + β3X3i + ui

Under-fitted Model: Yi = b1 + b2X2i + ei
(X3 is omitted from the under-fitted model)

In general, E(b2) ≠ β2: the estimate of β2 is biased unless X2 and X3 are uncorrelated.


Specification Errors
(ii) Inclusion of an irrelevant explanatory variable - overfitting

"True Model": Yi = β1 + β2X2i + ui

Over-fitted Model: Yi = b1 + b2X2i + b3X3i + ei
(the irrelevant X3i is included in the over-fitted model)

b2 is still unbiased and consistent, so E(b2) = β2.
However, the estimates are now inefficient (their variances are generally larger than
in the correctly specified model).
Specification Errors
(iii) Adopting the wrong functional form

For example, suppose we estimate a linear model:
Yi = β1 + β2Xi + ui
but the true model is a log-linear model:
ln Yi = β1 + β2 ln Xi + ui

The mis-specification then arises because we estimate the "wrong" functional form.
Mis-Specification Tests
Mis-specification generally occurs when:

• We omit a relevant variable, or


• We include an irrelevant variable, or
• We use an incorrect functional form

In most circumstances we do not know what the "true" model is. How can we determine,
therefore, whether the model we estimate is correctly specified?


Mis-Specification Tests
Preliminary Analysis (informal Tests)

⮚ Variables based on economic theory (if possible)


• Observe sign and significance of coefficients; what happens when an
additional variable is added or deleted?

• Does adjusted R2 or R2 increase when more variables are added?

• Look at the pattern of the residuals (if there are noticeable patterns, it is
possible that the model has been mis-specified).


Mis-Specification Tests
Formal Test - Ramsey's RESET Test

• A more formal test of mis-specification.
• It uses proxy variables.
• RESET test: the proxies are based on the predicted value of Y.


Example
Suppose we estimate the following model:
Yi = b1 + b2X2i + ui
and want to test for mis-specification.

The RESET test uses the predicted values Ŷi and creates various powers of Ŷi
(for example Ŷi2 and Ŷi3).

Adding these powers to the original model, we then estimate a new model:
Yi = b1 + b2X2i + b3Ŷi2 + b4Ŷi3 + ui
Example
Perform an F-test on the significance of the additional variables.

If the additional variables are significant, this is evidence of mis-specification.

Cautionary Note
RESET is easy to apply but cannot tell us the reason for the mis-
specification (i.e. omitted variable or functional form)
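
A minimal sketch of the RESET procedure carried out "by hand" as described above (illustrative data; the true relationship is made quadratic so a linear fit is mis-specified):

```python
# Ramsey RESET sketch: add powers of the fitted values and F-test their joint significance.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
x = rng.uniform(1, 5, 200)
y = 1 + 0.5 * x**2 + rng.normal(scale=0.5, size=200)   # true relationship is quadratic

X = sm.add_constant(x)
restricted = sm.OLS(y, X).fit()          # the (possibly mis-specified) linear model

yhat = restricted.fittedvalues
X_aug = np.column_stack([X, yhat**2, yhat**3])
unrestricted = sm.OLS(y, X_aug).fit()    # model augmented with yhat^2 and yhat^3

# F-test of the joint significance of the added powers
f_stat, p_value, df_diff = unrestricted.compare_f_test(restricted)
print(f_stat, p_value)                   # a small p-value is evidence of mis-specification
```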

