Topic:
Multicollinearity
Outline
What is multicollinearity
Types of multicollinearity
Causes of multicollinearity
Consequences of multicollinearity
Detection methods of multicollinearity
Remedies for multicollinearity
Example
What is Multicollinearity?
In statistics, multicollinearity is a phenomenon in which one
predictor variable in a multiple regression model can be linearly
predicted from the others with a substantial degree of accuracy
Example: predicting a person's age from both weight and height.
Since weight and height are themselves strongly correlated, the two
predictors carry overlapping information
Types of Multicollinearity
There are two types of multicollinearity:
Perfect multicollinearity
High or non-perfect multicollinearity
Perfect Multicollinearity
Perfect multicollinearity refers to a situation in which two or more
explanatory variables in a multiple regression model are perfectly
correlated
Consider the model:
Y_i = β0 + β1 X1i + β2 X2i + u_i
If X2i = λ X1i for some constant λ, the model collapses to:
Y_i = β0 + (β1 + λβ2) X1i + u_i
Thus only the combination (β1 + λβ2) would be estimable. We cannot get
the estimates of β1 and β2 separately. In this case, there is perfect
multicollinearity because X1 and X2 are perfectly correlated
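To see this numerically, here is a minimal Python sketch (with made-up data) showing that an exact linear relation between predictors makes the normal-equations matrix X'X singular, so OLS has no unique solution:

```python
# Minimal sketch with synthetic data: when X2 = 2*X1 exactly, the design
# matrix loses a rank and X'X cannot be inverted.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=20)
x2 = 2 * x1                                  # X2 is an exact multiple of X1
X = np.column_stack([np.ones(20), x1, x2])   # constant, X1, X2

print(np.linalg.matrix_rank(X))              # prints 2, not 3
try:
    np.linalg.inv(X.T @ X)                   # normal equations: no unique solution
except np.linalg.LinAlgError as err:
    print("X'X is singular:", err)
```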
High or Non-Perfect Multicollinearity
Some degree of multicollinearity is almost always present in practice.
High (non-perfect) multicollinearity occurs when there are strong, but
not exact, correlations among the predictor variables, leading to
unreliable and unstable estimates of the regression coefficients
How Does Multicollinearity Occur?
It can be caused by incorrect use of dummy variables, known as the
dummy variable trap (see the sketch after this list)
It can be caused by including a variable that is computed from other
variables in the data set
It can also result from including the same kind of variable twice
It generally occurs when the predictor variables are highly
correlated with each other
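As an illustration of the first cause, here is a minimal Python sketch (hypothetical category data) of the dummy variable trap: creating a dummy for every category while keeping the intercept makes the dummies sum to the constant column, which is perfect multicollinearity:

```python
# Minimal sketch of the dummy variable trap with made-up categories.
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "east", "north", "south"]})

trap = pd.get_dummies(df["region"])            # one dummy per category
print(trap.sum(axis=1).tolist())               # every row sums to 1 = the intercept column

safe = pd.get_dummies(df["region"], drop_first=True)  # drop one category to avoid the trap
print(safe.columns.tolist())
```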
Consequences of Multicollinearity
Even in the presence of multicollinearity, the OLS estimators remain
BLUE (best linear unbiased) and consistent
Standard errors of the estimates tend to be large
The probability of committing a Type II error increases, because
inflated standard errors make the t-ratios small
Estimates of the parameters and their standard errors become
sensitive to small changes in the data and in the specification of the
model
Detection Methods of Multicollinearity
Correlation Matrix
Tolerance Measure
Variance Inflation Factor (VIF)
Condition Index (CI)
What is a Correlation Matrix?
A correlation matrix is a table showing the correlation
coefficients between pairs of variables. High pairwise correlations
among the predictors signal potential multicollinearity
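For example, a correlation matrix can be computed directly in Python; the file name and column labels below are assumptions for illustration:

```python
# Minimal sketch: inspect pairwise correlations among the predictors.
# Coefficients close to +1 or -1 between predictors flag possible collinearity.
import pandas as pd

df = pd.read_csv("data.csv")            # assumed data file with predictors X1, X2, X3
print(df[["X1", "X2", "X3"]].corr())    # Pearson correlation matrix
```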
What is the Tolerance Measure?
The tolerance of a predictor is the proportion of its variance that
is not accounted for by the other predictor variables:
Tolerance_j = 1 − R²_j
where R²_j comes from regressing predictor j on all the other
predictors. Tolerance values of 0.10 or less are most commonly cited
as problematic, although 0.20 has also been suggested as a cutoff
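A minimal sketch of computing the tolerance of one predictor (file and column names are assumptions): regress that predictor on the remaining ones and take 1 − R²:

```python
# Minimal sketch: tolerance of X1 = 1 - R^2 from the auxiliary regression
# of X1 on the other predictors.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("data.csv")                      # assumed data file
aux = sm.OLS(df["X1"], sm.add_constant(df[["X2", "X3"]])).fit()
tolerance_x1 = 1 - aux.rsquared
print(f"Tolerance(X1) = {tolerance_x1:.3f}")      # 0.10 or less is problematic
```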
What is the Variance Inflation Factor (VIF)?
In statistics, the variance inflation factor (VIF) quantifies the
severity of multicollinearity in an ordinary least squares regression
analysis. It provides an index that measures how much the variance of
an estimated regression coefficient is inflated because of collinearity:
VIF_j = 1 / (1 − R²_j) = 1 / Tolerance_j
A VIF above 5 (or, by a more lenient rule of thumb, 10) indicates a
multicollinearity problem
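Since the VIF is the reciprocal of the tolerance, it can be computed from the same auxiliary regressions; statsmodels also provides a helper, as in this sketch (file and column names assumed):

```python
# Minimal sketch: VIF for each predictor via statsmodels.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("data.csv")                      # assumed data file
X = sm.add_constant(df[["X1", "X2", "X3"]])
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))  # above 10 signals trouble
```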
Condition Index (CI)
The condition index measures the sensitivity of the regression
estimates to small changes in the data. It is defined as the square
root of the ratio of the largest to the smallest eigenvalue of the
cross-product matrix of the explanatory variables:
CI = sqrt(λ_max / λ_min)
If the CI lies between 10 and 30, there is moderate multicollinearity;
if it exceeds 30, multicollinearity is severe
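A minimal sketch of the computation (predictor columns scaled to unit length, a common convention; file and column names are assumptions):

```python
# Minimal sketch: condition index from the eigenvalues of X'X.
import numpy as np
import pandas as pd

df = pd.read_csv("data.csv")                      # assumed data file
X = df[["X1", "X2", "X3"]].to_numpy(dtype=float)
X = X / np.linalg.norm(X, axis=0)                 # scale columns to unit length
eig = np.linalg.eigvalsh(X.T @ X)                 # eigenvalues, ascending order
ci = np.sqrt(eig[-1] / eig[0])
print(f"Condition index = {ci:.1f}")              # 10-30 moderate, above 30 severe
```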
Remedies for Multicollinearity
The remedies for multicollinearity range from modifying the
regression variables to using specialized estimation procedures:
Use weighted least squares
Use ridge regression (see the sketch after this list)
Omit one or more of the highly correlated independent variables
Retain the highly correlated variables and use the model for
prediction only, without interpreting the individual coefficients
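As one example of a specialized estimation procedure, here is a minimal ridge regression sketch; the penalty strength alpha is a tuning choice, and the file and column names are assumptions:

```python
# Minimal sketch: ridge regression shrinks coefficients via an L2 penalty,
# which stabilizes the estimates when predictors are highly correlated.
import pandas as pd
from sklearn.linear_model import Ridge

df = pd.read_csv("data.csv")                      # assumed data file
model = Ridge(alpha=1.0).fit(df[["X1", "X2", "X3"]], df["Y"])
print(model.intercept_, model.coef_)
```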
Example
Klein and Goldberger attempted to fit the following consumption
model to U.S. economy data for 1936-1949:
Y = β0 + β1 X1 + β2 X2 + β3 X3 + u
where Y = consumption, X1 = wage income, X2 = nonwage, nonfarm
income, and X3 = farm income. Since X1, X2, and X3 are expected to be
highly collinear, the task is to detect the multicollinearity and
reduce it. Raw data are given for consumption, wage income, nonwage
(nonfarm) income, and farm income
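The slides carry out the analysis in SPSS; the following sketch mirrors the same fit in Python (the data file name is an assumption, since the raw data appear only on the slide):

```python
# Minimal sketch: fit the consumption model by OLS and inspect the output.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("klein_goldberger.csv")          # assumed data file
X = sm.add_constant(df[["X1", "X2", "X3"]])
fit = sm.OLS(df["Y"], X).fit()
print(fit.summary())   # large standard errors despite a high R^2 hint at collinearity
```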
Data
Analyze > Regression > Linear
Statistics…
Output
Remedy by Omitting a Variable
Analyze > Regression > Linear
Statistics…
Output
Remedy Through Weighted Least Squares
By using a weighted average (a weighted least squares sketch is given below)
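A minimal sketch of weighted least squares in Python; the weighting scheme shown is purely illustrative, not the one used on the slides:

```python
# Minimal sketch: weighted least squares via statsmodels.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("klein_goldberger.csv")          # assumed data file
X = sm.add_constant(df[["X1", "X2", "X3"]])
w = 1.0 / (df["X1"] ** 2)                         # assumed weights (illustrative only)
fit = sm.WLS(df["Y"], X, weights=w).fit()
print(fit.params)
```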
Output