Econometrics for Financial Analysis
Theme 5: Diagnostics and Coefficient Tests for Regression Models
•Instructor: Dr. Esmat Kamel, MA, PhD, Autonoma University of Madrid
•References used to prepare the slides
•Wooldridge, J. (2019) Introductory Econometrics: A Modern Approach. 7th edition. Southwestern Cengage.
•Email: [email protected]
Application file: GDP time series.xlsx
The data are a time series and should be declared as a time series in Stata.
First glance at the Data Editor and inspect the variables to see whether adjustments are needed or values are missing.
The variables all grow roughly exponentially, for example GDP (Gross Domestic Product), FDI (Foreign Direct Investment), remittances and Suez Canal revenues, so each one is converted to natural logarithms:
gen lnGDP=log(GDP)
gen lnFDI=log(FDI)
gen lnRemittences=log(Remittences)
gen lnSuezCanalRevenues=log(SuezCanalRevenues)
5.1 Application
Descriptive data
The summarize command (abbreviated sum) reports descriptive statistics for each variable:
Number of observations (Obs)
Mean
Std. Dev.
Min
Max
Command: summarize
sum lnGDP lnFDI lnRemittences lnSuezCanalRevenues
Why can the trade balance not be logged?
It can take negative values, and the logarithm of a negative value is undefined.
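The point about negative values can be illustrated outside Stata. The following is a minimal Python sketch, not part of the slides' workflow; the trade-balance figures and the helper name safe_log are hypothetical. It mirrors Stata's behaviour, where log() of a non-positive number yields a missing value.

```python
import math

def safe_log(x):
    """Return the natural log of x, or None when x is not strictly positive."""
    if x <= 0:
        return None  # the log is undefined for zero and negative values
    return math.log(x)

# Hypothetical trade-balance values (a surplus, then a deficit); not from the data file
trade_balance = [120.5, -340.2]
logged = [safe_log(v) for v in trade_balance]
print(logged)  # the deficit entry comes back as None
```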
5.1.1Application
Descriptive data
The list command lists all variables and observations.
Command: list
list lnGDP lnFDI lnRemittences lnSuezCanalRevenues
5.1.2 Report dataset as time series and Graphical presentations
Report the data as a time series and display information about it (the data must already be declared as a time series, for example with tsset on the time variable).
Command: tsreport
Starting period = 2013q2
Ending period = 2021q1
Number of obs = 32
Number of gaps = 0
Plot a graphical presentation of the time series.
Command: tsline GDP
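The tsreport figures can be cross-checked by hand: with no gaps, the number of observations must equal the number of quarters between the starting and ending periods. A minimal Python sketch of that count (the function name is hypothetical):

```python
def quarters_between(start_year, start_q, end_year, end_q):
    """Count quarterly periods from start to end, both ends inclusive."""
    return (end_year - start_year) * 4 + (end_q - start_q) + 1

# 2013q2 through 2021q1, as reported by tsreport
n_periods = quarters_between(2013, 2, 2021, 1)
print(n_periods)  # 32, matching Number of obs with zero gaps
```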
5.1.4 Graphical presentations for all variables
Command: tsline lnSuezCanalRevenues
Command: tsline lnFDI
5.1.3 Graphical presentations for all variables together
Command: tsline lnGDP lnFDI lnRemittences lnSuezCanalRevenues
5.1.4 Plotting other Types of Graphs: Histogram and Box Plot
Command: histogram lnFDI
The shape of the histogram (its density, skewness and kurtosis) does not suggest that lnFDI follows a normal distribution, but this should be verified with a formal normality test.
Command: graph box lnGDP
The box height shows the spread of the central 50% of the data. The line inside the box marks the median, the middle value of the ordered data. The whiskers extend to the smallest and largest values that are not flagged as outliers; potential outliers appear beyond the whiskers.
5.2 Running the Regression Model and Interpretation
Command: regress lnGDP lnFDI lnRemittences lnSuezCanalRevenues
Output: Time variable: time; Linear regression
Because all variables are in logs, the coefficients are read as elasticities:
A 1% increase in FDI is associated with a 0.232% increase in GDP, significant at the 5% level.
A 1% increase in remittances is associated with a 1.182% increase in GDP, significant at the 1% level.
A 1% increase in Suez Canal revenues is associated with a 0.825% increase in GDP, but the effect is not statistically significant.
The t-statistics of lnFDI (2.10), lnRemittences (5.36) and the constant (-2.57) fall outside the critical interval, so H0: coefficient = 0 is rejected at the 0.05, 0.01 and 0.05 levels respectively. The exception is lnSuezCanalRevenues, whose t-statistic of 1.04 falls inside the critical interval and is therefore insignificant.
Overall goodness of fit: R^2 = 72.9%, that is, 72.9% of the variability in the dependent variable is explained by the three independent variables. Prob > F is 0.000, so the model is jointly significant.
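Since every variable enters the model in logs, each slope can be read as an elasticity. The following minimal Python sketch, using the coefficients 0.232 and 1.182 reported in the slides, compares the "beta percent" reading with the exact percentage change implied by the log-log form. The function name is hypothetical.

```python
def pct_change_in_y(elasticity, pct_change_in_x):
    """Exact % change in y implied by a log-log slope for a given % change in x."""
    return ((1 + pct_change_in_x / 100) ** elasticity - 1) * 100

# Coefficients as reported in the slides' coefficient table
for name, beta in [("lnFDI", 0.232), ("lnRemittences", 1.182)]:
    exact = pct_change_in_y(beta, 1.0)
    print(name, "approx:", beta, "exact:", round(exact, 3))
```

For a 1% change the two readings are almost identical; the gap widens for large changes, which is why the elasticity reading is usually stated for small percentage changes.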
5.3 Estimating Residuals and Residuals Diagnostics
Directly after the regress command, estimate the residuals for the data set.
Command: predict residuals, residuals
A normality test can now check whether the residuals follow a normal distribution.
Testing for normality via the skewness and kurtosis test (sktest), which is in the spirit of the Jarque-Bera test
Command: sktest residuals
H0: Normality
H1: No normality
Interpretation: we fail to reject H0 because the p-value of the chi-squared statistic is greater than 0.10, so the residuals are consistent with normality.
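To make the logic of the normality test concrete, here is a minimal Python sketch of the textbook Jarque-Bera statistic, JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is skewness and K is kurtosis. The residuals below are hypothetical, and Stata's sktest applies its own adjustments, so its output will not match this formula exactly.

```python
import math

def jarque_bera(values):
    """Return (JB statistic, p-value) from population moments of the sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)  # closed-form upper tail of chi-square with 2 df
    return jb, p_value

# Hypothetical residuals for illustration only; not taken from the regression
resid = [-2.0, -1.0, 0.0, 1.0, 2.0]
jb, p = jarque_bera(resid)
print(round(jb, 4), round(p, 4))  # p above 0.10 means H0: normality is not rejected
```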
5.3.1 Estimating Residuals and Residuals Diagnostics
When the error variance is not constant, the model suffers from heteroskedasticity. Various tests are available, including:
White heteroskedasticity test
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Command: estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of lnGDP
H0: Constant variance [homoskedasticity]
H1: Non constant variance [heteroskedasticity]
chi2(1) = 2.36
Prob > chi2 = 0.1249
Interpretation: the p-value is 0.1249, so we fail to reject H0, which means the error variance is constant (homoskedasticity).
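The reported Prob > chi2 can be reproduced from the test statistic. For one degree of freedom the chi-square upper tail equals erfc(sqrt(x / 2)). This is a minimal Python sketch; the 0.05 threshold is an assumed conventional significance level and the function name is hypothetical.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

p = chi2_1df_pvalue(2.36)   # chi2(1) = 2.36 from estat hettest
reject_h0 = p < 0.05        # assumed 5% significance level
print(round(p, 3), reject_h0)  # roughly 0.125; H0 of constant variance is not rejected
```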
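5.3.2 Treatment for Residual Diagnostics
One way to address heteroskedasticity is to re-estimate the model; here the glm (generalized linear model) command is used:
Command: glm lnGDP lnFDI lnRemittences lnSuezCanalRevenues, family(gaussian) link(identity)
With family(gaussian) and link(identity), glm reproduces the OLS coefficients, so the interpretation of the model is the same as before; no correction was needed because there was no heteroskedasticity in the first place. When heteroskedasticity is present, robust standard errors (option vce(robust)) are a common remedy.
Generalized linear models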
lnGDP                 Coef.     St.Err.   t-value   p-value   [95% Conf. Interval]   Sig
lnFDI                 .232      .11       2.10      .035      .016       .448        **
lnRemittences         1.182     .221      5.36      0         .749       1.614       ***
lnSuezCanalRevenues   .825      .797      1.04      .301      -.737      2.386
Constant              -10.432   4.054     -2.57     .01       -18.378    -2.486      **

Mean dependent var    6.520     SD dependent var     0.373
Number of obs         32        Chi-square           75.388
Prob > chi2           0.000     Akaike crit. (AIC)   -7.086
*** p<.01, ** p<.05, * p<.1
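The t-values and 95% intervals in the coefficient table follow directly from each coefficient and its standard error. The minimal Python sketch below recomputes them; the critical value 1.96 is an assumption (the normal approximation, which is what the reported intervals appear to use). Small differences from the table come from rounding in the displayed coefficients.

```python
Z_CRIT = 1.96  # assumed normal critical value for a 95% interval

# (coefficient, standard error) as displayed in the table
rows = {
    "lnFDI": (0.232, 0.11),
    "lnRemittences": (1.182, 0.221),
    "lnSuezCanalRevenues": (0.825, 0.797),
    "Constant": (-10.432, 4.054),
}

def t_and_interval(coef, se):
    """Return the t-value and the lower/upper bounds of the 95% interval."""
    return coef / se, coef - Z_CRIT * se, coef + Z_CRIT * se

results = {name: t_and_interval(c, s) for name, (c, s) in rows.items()}
for name, (t, lo, hi) in results.items():
    # A coefficient is significant at 5% when its interval excludes zero
    print(name, round(t, 2), round(lo, 3), round(hi, 3), not (lo <= 0 <= hi))
```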
5.4 Coefficient Diagnostics (Multicollinearity)
Multicollinearity in a model is detected through two checks:
1) The correlation matrix, constructed to detect suspected linear dependence between two or more independent variables.
2) The VIF (Variance Inflation Factor), whose centered value should not exceed 10; above that the case is classified as severe multicollinearity.
Command correlation matrix:
pwcorr lnGDP lnFDI lnRemittences lnSuezCanalRevenues
Variables                 (1)     (2)     (3)     (4)
(1) lnGDP                 1.000
(2) lnFDI                 0.439   1.000
(3) lnRemittences         0.823   0.296   1.000
(4) lnSuezCanalRevenues   0.573   0.162   0.602   1.000
Coefficients exceeding 0.7, such as the 0.823 between lnRemittences and lnGDP, flag a strong linear association. That pair involves the dependent variable; among the independent variables the largest correlation is 0.602 (lnRemittences and lnSuezCanalRevenues). Suspected multicollinearity still needs to be inspected through the VIF.
5.4.1 Coefficient Diagnostics (Multicollinearity)
2) The VIF (Variance Inflation Factor), whose centered value should not exceed 10; above that the case is classified as severe multicollinearity.
Command VIF:
regress lnGDP lnFDI lnRemittences lnSuezCanalRevenues
vif
Variable              VIF     1/VIF
lnRemittences         1.674   .597
lnSuezCanalRevenues   1.569   .637
lnFDI                 1.097   .912
Mean VIF              1.447   .
As per the mean VIF and the centered values, there are no evident signs of multicollinearity between the independent variables.
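The VIF output can be checked with simple arithmetic: 1/VIF is the tolerance, and since VIF = 1 / (1 - R^2) for the auxiliary regression of one regressor on the others, 1 - 1/VIF recovers that auxiliary R^2. A minimal Python sketch using the reported VIF values, with 10 as the threshold stated above:

```python
# VIF values as reported by the vif command
vif = {
    "lnRemittences": 1.674,
    "lnSuezCanalRevenues": 1.569,
    "lnFDI": 1.097,
}

SEVERE_THRESHOLD = 10  # rule of thumb quoted in the slides

tolerance = {name: 1 / v for name, v in vif.items()}   # the 1/VIF column
aux_r2 = {name: 1 - 1 / v for name, v in vif.items()}  # R^2 of each auxiliary regression
mean_vif = sum(vif.values()) / len(vif)
severe = [name for name, v in vif.items() if v > SEVERE_THRESHOLD]

print(round(mean_vif, 3), severe)  # no variable exceeds the threshold
```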