Spatial Regression
Praktikum 6 | Statistika Spasial
rahmaanisa@[Link]
Spatial autocorrelation in residuals
Spatial error model
• Incorporates spatial effects through error term
y x
W
• Where:
is the vector of error terms, spatially weighted using
the weights matrix (W)
is the spatial error coefficient
ξ is a vector of uncorrelated error terms
• If there is no spatial correlation between the errors,
then = 0
Spatial autocorrelation in Dependent Variable
Spatial lag model
• Incorporates spatial effects by including a spatially
lagged dependent variable as an additional predictor
y Wy x
• Where:
Wy is the spatially lagged DVs for weight s matrix W
x is a matrix of observatio ns on the explanatory variables
is a vector of error terms
is the spatial coefficient
• If there is no spatial dependence, and y does no
depend on neighboring y values, = 0
Spatial Model
y Wy XB u; u Wu
~ N (0, 2 I )
0 0 0
0 0 0
OLS SEM SAR
y XB y XB u y Wy XB
Spatial Regression in R
Example: Housing Prices in Boston
CRIM per capita crime rate by town
ZN proportion of residential land zoned for lots over 25,000 ft2
INDUS proportion of non-retail business acres per town
CHAS Charles River dummy variable (=1 if tract bounds river; 0 otherwise)
NOX Nitrogen oxide concentration (parts per 10 million)
RM average number of rooms per dwelling
AGE proportion of owner-occupied units built prior to 1940
DIS weighted distances to five Boston employment centres
RAD index of accessibility to radial highways
TAX full-value property-tax rate per $10,000
PTRATIO pupil-teacher ratio by town
B 1000(Bk - 0.63)2 where Bk is the proportion of blacks by town
LSTAT % lower status of the population
MEDV Median value of owner-occupied homes in $1000's
Spatial Regression in R
1. Read in [Link]
2. Define neighbors (k nearest w/point data)
3. Create weights matrix
4. Moran’s test of DV, Moran scatterplot
5. Run OLS regression
6. Check residuals for spatial dependence
7. Determine which SR model to use w/LM tests
8. Run spatial regression model
Boston Data Set
• Silahkan download di alamat:
[Link]
• Pilih file dengan nama: [Link]
• Extract di working directory Anda masing2
Load the packages
library(rgdal)
library(maptools)
library(spdep)
Define neighbors and create weights matrix
> setwd("D:\\[Link]\\Statistika Spasial (S2)\\TA 2017-
2018\\Praktkum RA\\Reference\\Elisabeth Root")
> boston<-readOGR(dsn="boston",layer="boston")
> class(boston)
> boston$LOGMEDV<-log(boston$CMEDV)
> coords<-coordinates(boston)
> IDs<-[Link](as(boston, "[Link]"))
> bost_kd1<-dnearneigh(coords, d1=0, d2=3.973, [Link]=IDs)
> plot(boston)
> plot(bost_kd1, coords, add=T)
> bost_kd1_w<- nb2listw(bost_kd1)
Moran’s I on the DV
> [Link](boston$LOGMEDV, listw=bost_kd1_w)
Moran's I test under randomisation
data: boston$LOGMEDV
weights: bost_kd1_w
Moran I statistic standard deviate = 24.5658, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.3273430100 -0.0019801980 0.0001797138
Moran Plot for the DV
> [Link](boston$LOGMEDV,
bost_kd1_w,
labels=[Link](boston$ID))
OLS Regression
bostlm<-lm(LOGMEDV~RM + LSTAT + CRIM + ZN + CHAS + DIS, data=boston)
Residuals:
Min 1Q Median 3Q Max
-0.71552 -0.11248 -0.02159 0.10678 0.93024
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.8718878 0.1316376 21.817 < 2e-16 ***
RM 0.1153095 0.0172813 6.672 6.70e-11 ***
LSTAT -0.0345160 0.0019665 -17.552 < 2e-16 ***
CRIM -0.0115726 0.0012476 -9.276 < 2e-16 ***
ZN 0.0019330 0.0005512 3.507 0.000494 ***
CHAS 0.1342672 0.0370521 3.624 0.000320 ***
DIS -0.0302262 0.0066230 -4.564 6.33e-06 ***
---
Residual standard error: 0.2081 on 499 degrees of freedom
Multiple R-squared: 0.7433, Adjusted R-squared: 0.7402
F-statistic: 240.8 on 6 and 499 DF, p-value: < 2.2e-16
Checking residuals for spatial autocorrelation
> boston$lmresid<-residuals(bostlm)
> [Link](bostlm, bost_kd1_w)
Global Moran's I for regression residuals
Moran I statistic standard deviate = 5.8542, p-value = 2.396e-09
alternative hypothesis: greater
sample estimates:
Observed Moran's I Expectation Variance
0.0700808323 -0.0054856590 0.0001666168
Determining the type of dependence
> [Link](bostlm, bost_kd1_w, test="all")
Lagrange multiplier diagnostics for spatial dependence
LMerr = 26.1243, df = 1, p-value = 3.201e-07
LMlag = 46.7233, df = 1, p-value = 8.175e-12
RLMerr = 5.0497, df = 1, p-value = 0.02463
RLMlag = 25.6486, df = 1, p-value = 4.096e-07
SARMA = 51.773, df = 2, p-value = 5.723e-12
• Robust tests used to find a proper alternative
• Only use robust forms when BOTH LMErr and LMLag are significant
One more diagnostic…
> [Link](“lmtest”)
> library(lmtest)
> bptest(bostlm)
studentized Breusch-Pagan test
data: bostlm
BP = 70.9173, df = 6, p-value = 2.651e-13
• Indicates errors are heteroskedastic
• Not surprising since we have spatial dependence
Running a spatial lag model
> bostlag<-lagsarlm(LOGMEDV~RM + LSTAT + CRIM + ZN + CHAS + DIS,
data=boston, bost_kd1_w)
Type: lag
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.94228260 0.19267675 10.0805 < 2.2e-16
RM 0.10158292 0.01655116 6.1375 8.382e-10
LSTAT -0.03227679 0.00192717 -16.7483 < 2.2e-16
CRIM -0.01033127 0.00120283 -8.5891 < 2.2e-16
ZN 0.00166558 0.00052968 3.1445 0.001664
CHAS 0.07238573 0.03608725 2.0059 0.044872
DIS -0.04285133 0.00655158 -6.5406 6.127e-11
Rho: 0.34416, LR test value:37.426, p-value:9.4936e-10
Asymptotic standard error: 0.051967
z-value: 6.6226, p-value: 3.5291e-11
Wald statistic: 43.859, p-value: 3.5291e-11
Log likelihood: 98.51632 for lag model
ML residual variance (sigma squared): 0.03944, (sigma: 0.1986)
AIC: -179.03, (AIC for lm: -143.61)
A few more diagnostics
LM test for residual autocorrelation
test value: 1.9852, p-value: 0.15884
> [Link](bostlag)
studentized Breusch-Pagan test
data:
BP = 60.0237, df = 6, p-value = 4.451e-11
• LM test suggests there is no more spatial autocorrelation in the
data
• BP test indicates remaining heteroskedasticity in the residuals
• Most likely due to misspecification
Running a spatial error model
> bosterr<-errorsarlm(LOGMEDV~RM + LSTAT + CRIM + ZN + CHAS + DIS,
data=boston, listw=bost_kd1_w)
Type: error
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.96330332 0.13381870 22.1442 < 2.2e-16
RM 0.09816980 0.01700824 5.7719 7.838e-09
LSTAT -0.03413153 0.00194289 -17.5674 < 2.2e-16
CRIM -0.01055839 0.00125282 -8.4277 < 2.2e-16
ZN 0.00200686 0.00062018 3.2359 0.001212
CHAS 0.06527760 0.03766168 1.7333 0.083049
DIS -0.02780598 0.01064794 -2.6114 0.009017
Lambda: 0.59085, LR test value: 24.766, p-value: 6.4731e-07
Asymptotic standard error: 0.086787
z-value: 6.8081, p-value: 9.8916e-12
Wald statistic: 46.35, p-value: 9.8918e-12
Log likelihood: 92.18617 for error model
ML residual variance (sigma squared): 0.03989, (sigma: 0.19972)
AIC: -166.37, (AIC for lm: -143.61)
Latihan
• Input data berikut:
kemiskinan<-[Link]("[Link]
• Input data bobot berikut:
bobot<-[Link]("[Link]
Memodelkan dengan regresi biasa (OLS)
[Link]<-kemiskinan[,-(1:4)]
[Link]<-lm(Y~.,data=[Link])
[Link]<-residuals([Link])
Uji asumsi
library(nortest)
library(car)
[Link]([Link])
hist([Link])
qqnorm([Link],datax=T)
qqline(rnorm(length([Link]),
mean([Link]),sd([Link])),datax=T, col="red")
durbinWatsonTest([Link])
[Link]([Link])
Uji Asumsi
Uji LM
bobot<-[Link]("[Link]
bobot<-[Link](bobot)
LM<-[Link]([Link], mat2listw(bobot, style="W"),
test=c("LMerr", "LMlag","RLMerr","RLMlag","SARMA"))
Lagrange multiplier diagnostics for spatial dependence
data:
model: lm(formula = Y ~ ., data = [Link])
weights: mat2listw(bobot, style = "W")
LMerr = 15.399, df = 1, p-value = 8.702e-05
LMlag = 20.303, df = 1, p-value = 6.61e-06
RLMerr = 0.10805, df = 1, p-value = 0.7424
RLMlag = 5.0118, df = 1, p-value = 0.02518
SARMA = 20.411, df = 2, p-value = 3.697e-05
Model SAR
sar1<-lagsarlm(Y~.,
data=[Link],
mat2listw(bobot))
err.sar1<-residuals(sar1)
summary(sar1)
Next Steps
Diagnostics Checking for all models
Compare the goodness of fits
Select the best model