A) ESTIMATION
Price: > quantile(PRICE, c(0.25,0.50, 0.75))
25% 50% 75%
65000 85900 120000
This estimate shows that the median house price is about US$85,900, and that 75% of the houses sold for US$120,000 or less.
Area: > quantile(AREA, c(0.25,0.50, 0.75))
25% 50% 75%
1560 2056 2544
This estimate shows that 25% of the houses have a floor area of 1,560 square feet or less.
Baths: > quantile(BATHS, c(0.25,0.50, 0.75))
25% 50% 75%
2 2 3
Here the median is 2 bathrooms, so at least half of the houses have 2 bathrooms or fewer, and 75% of the houses have 3 bathrooms or fewer.
Room: > quantile(ROOMS, c(0.25,0.50, 0.75))
25% 50% 75%
6 7 7
These quantiles show that 25% of the houses have 6 rooms or fewer and 75% have 7 rooms or fewer.
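The reading of the quartiles above can be checked with a small sketch. The price vector below is hypothetical toy data, not the dataset used in this report; it only illustrates that a quantile is a cut-off below which a given share of observations falls, and that the 50% quantile is the median rather than the mean.

```r
# Hypothetical toy prices (NOT the data used in this report)
price <- c(50000, 65000, 70000, 85900, 90000, 120000, 150000)

# Same call pattern as in the report
q <- quantile(price, c(0.25, 0.50, 0.75))
print(q)

# Share of observations at or below each quartile cut-off
share_below <- sapply(q, function(cut) mean(price <= cut))
print(share_below)

# The 50% quantile is the median; the mean is usually a different number
print(c(median = median(price), mean = mean(price)))
```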
> jarque.bera.test(PRICE)
Jarque Bera Test
data: PRICE
X-squared = 112.24, df = 2, p-value < 2.2e-16
> jarque.bera.test(AREA)
Jarque Bera Test
data: AREA
X-squared = 44.701, df = 2, p-value = 1.965e-10
> jarque.bera.test(AGE)
Jarque Bera Test
data: AGE
X-squared = 1996, df = 2, p-value < 2.2e-16
> jarque.bera.test(BATHS)
Jarque Bera Test
data: BATHS
X-squared = 23.699, df = 2, p-value = 7.143e-06
> jarque.bera.test(ROOMS)
Jarque Bera Test
data: ROOMS
X-squared = 28.464, df = 2, p-value = 6.594e-07
> jarque.bera.test(NBH)
Jarque Bera Test
data: NBH
X-squared = 32.047, df = 2, p-value = 1.099e-07
B) COMPLIANCE WITH ECONOMETRIC ASSUMPTIONS AND CORRECTIVE MEASURES
1) NORMALITY OF RESIDUALS
> mod1<-lm(LPRICE~LAREA+BATHS+ROOMS)
> summary(mod1)
Call:
lm(formula = LPRICE ~ LAREA + BATHS + ROOMS)
Residuals:
Min 1Q Median 3Q Max
-1.3816 -0.1749 0.0157 0.2191 0.8234
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.32791 0.48986 14.959 < 2e-16 ***
LAREA 0.43676 0.07298 5.984 5.87e-09 ***
BATHS 0.22445 0.03352 6.695 9.80e-11 ***
ROOMS 0.03142 0.02440 1.288 0.199
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3047 on 317 degrees of freedom
Multiple R-squared: 0.5209, Adjusted R-squared: 0.5164
F-statistic: 114.9 on 3 and 317 DF, p-value: < 2.2e-16
> jarque.bera.test(mod1$residuals)
Jarque Bera Test
data: mod1$residuals
X-squared = 32.792, df = 2, p-value = 7.574e-08
> shapiro.test(mod1$residuals)
Shapiro-Wilk normality test
data: mod1$residuals
W = 0.98262, p-value = 0.00063
> mod2<-lm(LPRICE~LAREA+BATHS+ROOMS+
+ LDIST+AGE+NBH)
> summary(mod2)
Call:
lm(formula = LPRICE ~ LAREA + BATHS + ROOMS + LDIST + AGE + NBH)
Residuals:
Min 1Q Median 3Q Max
-1.42352 -0.17924 0.00028 0.18119 0.77862
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.0990456 0.5873475 10.384 < 2e-16 ***
LAREA 0.5459062 0.0684483 7.975 2.86e-14 ***
BATHS 0.0993813 0.0348267 2.854 0.004610 **
ROOMS 0.0510629 0.0231256 2.208 0.027962 *
LDIST 0.0692899 0.0378531 1.830 0.068124 .
AGE -0.0034331 0.0005587 -6.145 2.43e-09 ***
NBH -0.0257627 0.0073965 -3.483 0.000566 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2795 on 314 degrees of freedom
Multiple R-squared: 0.6008, Adjusted R-squared: 0.5931
F-statistic: 78.75 on 6 and 314 DF, p-value: < 2.2e-16
> jarque.bera.test(mod2$residuals)
Jarque Bera Test
data: mod2$residuals
X-squared = 58.129, df = 2, p-value = 2.385e-13
> shapiro.test(mod2$residuals)
Shapiro-Wilk normality test
data: mod2$residuals
W = 0.9794, p-value = 0.0001447
In both models, the Jarque-Bera and Shapiro-Wilk tests give p-values below 0.10, so the null hypothesis of normality is rejected: the residuals of mod1 and mod2 are not normally distributed.
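As an illustration of what the Jarque-Bera statistic measures, the following sketch computes it by hand from the sample skewness S and kurtosis K as JB = n/6 * (S^2 + (K - 3)^2 / 4) and compares it with a chi-squared distribution with 2 degrees of freedom. The helper name jb_manual and the simulated samples are assumptions made for this example only; they are not part of the original analysis.

```r
# Hand-rolled Jarque-Bera statistic (hypothetical helper, for illustration)
jb_manual <- function(x) {
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)                    # second central moment
  skew <- mean(d^3) / m2^1.5         # sample skewness
  kurt <- mean(d^4) / m2^2           # sample kurtosis (3 under normality)
  stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  p <- pchisq(stat, df = 2, lower.tail = FALSE)
  c(statistic = stat, p.value = p)
}

set.seed(1)
normal_draws <- rnorm(500)   # simulated sample from a normal distribution
skewed_draws <- rexp(500)    # simulated, strongly right-skewed sample

res_normal <- jb_manual(normal_draws)
res_skewed <- jb_manual(skewed_draws)
print(res_normal)
print(res_skewed)

# Decision rule at the 10% level: a p-value below 0.10 REJECTS normality
print(res_skewed[["p.value"]] < 0.10)
```

A small p-value is therefore evidence against normality, which is the direction used in the interpretation above.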
> mod1b<-lm(LPRICE~LAREA+ BATHS+ ROOMS, [Link])
> summary(mod1b)
Call:
lm(formula = LPRICE ~ LAREA + BATHS + ROOMS, data = [Link])
Residuals:
Min 1Q Median 3Q Max
-0.73326 -0.18159 0.00152 0.20035 0.75464
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.00310 0.44332 15.797 < 2e-16 ***
LAREA 0.49010 0.06614 7.410 1.22e-12 ***
BATHS 0.20715 0.03020 6.860 3.76e-11 ***
ROOMS 0.02807 0.02173 1.291 0.198
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2704 on 309 degrees of freedom
Multiple R-squared: 0.5885, Adjusted R-squared: 0.5845
F-statistic: 147.3 on 3 and 309 DF, p-value: < 2.2e-16
> jarque.bera.test(mod1b$residuals)
Jarque Bera Test
data: mod1b$residuals
X-squared = 0.80192, df = 2, p-value = 0.6697
> shapiro.test(mod1b$residuals)
Shapiro-Wilk normality test
data: mod1b$residuals
W = 0.99641, p-value = 0.7064
> mod2b<-lm(LPRICE~LAREA+BATHS+ROOMS+
+ LDIST+AGE+NBH, [Link])
> summary(mod2b)
Call:
lm(formula = LPRICE ~ LAREA + BATHS + ROOMS + LDIST + AGE + NBH,
data = [Link])
Residuals:
Min 1Q Median 3Q Max
-0.74173 -0.17704 -0.00688 0.16910 0.67034
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.0272615 0.5330123 11.308 < 2e-16 ***
LAREA 0.5646898 0.0627870 8.994 < 2e-16 ***
BATHS 0.1120188 0.0319593 3.505 0.000525 ***
ROOMS 0.0426382 0.0211510 2.016 0.044685 *
LDIST 0.0652203 0.0348153 1.873 0.061977 .
AGE -0.0025515 0.0005727 -4.455 1.18e-05 ***
NBH -0.0286521 0.0067773 -4.228 3.12e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.252 on 306 degrees of freedom
Multiple R-squared: 0.6461, Adjusted R-squared: 0.6391
F-statistic: 93.09 on 6 and 306 DF, p-value: < 2.2e-16
> jarque.bera.test(mod2b$residuals)
Jarque Bera Test
data: mod2b$residuals
X-squared = 0.99374, df = 2, p-value = 0.6084
> shapiro.test(mod2b$residuals)
Shapiro-Wilk normality test
data: mod2b$residuals
W = 0.99605, p-value = 0.6254
Here the tests were repeated after removing the outlying observations (the residual degrees of freedom fall from 317 to 309 in the first model and from 314 to 306 in the second, that is, 8 observations were dropped). In both models, the Jarque-Bera and Shapiro-Wilk tests now give p-values above 10%, so normality of the residuals is not rejected and this assumption no longer poses a problem.
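The report does not say how the outlying observations were identified. One common possibility, shown here only as a sketch on simulated data, is to drop observations whose standardized residual exceeds a cut-off in absolute value and then refit. The cut-off of 2.5, the variable names, and the contaminated data are all assumptions for this illustration.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.3)
y[1:4] <- y[1:4] - 3                 # contaminate four observations on purpose
dat <- data.frame(x = x, y = y)

fit_all <- lm(y ~ x, data = dat)

# Hypothetical rule: keep observations with |standardized residual| <= 2.5
keep <- abs(rstandard(fit_all)) <= 2.5
dat_clean <- dat[keep, ]
fit_clean <- lm(y ~ x, data = dat_clean)

print(c(dropped = sum(!keep)))

# Shapiro-Wilk p-values before and after removing the flagged observations
print(shapiro.test(residuals(fit_all))$p.value)
print(shapiro.test(residuals(fit_clean))$p.value)
```

Any such rule should be reported alongside the results, since the choice of cut-off changes which observations are excluded.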
2) NO HETEROSKEDASTICITY
GOLDFELD-QUANDT TEST
> gqtest(mod1b)
Goldfeld-Quandt test
data: mod1b
GQ = 1.2658, df1 = 153, df2 = 152, p-value = 0.07342
alternative hypothesis: variance increases from segment 1 to 2
> gqtest(mod2b)
Goldfeld-Quandt test
data: mod2b
GQ = 1.3037, df1 = 150, df2 = 149, p-value = 0.05308
alternative hypothesis: variance increases from segment 1 to 2
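The null hypothesis of homoskedasticity is rejected at the 10% level in both models, since both p-values (0.073 and 0.053) are below 10%; this test therefore points to heteroskedasticity, with the variance increasing from the first segment to the second. At the 5% level neither rejection would hold.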
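The idea behind the Goldfeld-Quandt statistic can be sketched in base R: split the sample in two, fit the regression on each part, and compare the two residual variances with an F ratio. The simulated data and names below are assumptions for illustration; the split at the midpoint with no omitted central observations is a simplifying choice for this sketch.

```r
set.seed(7)
n <- 200
x <- sort(runif(n, 1, 10))
y <- 2 + 0.5 * x + rnorm(n, sd = 0.2 * x)   # error spread grows with x
dat <- data.frame(x = x, y = y)

half <- n / 2
fit_low  <- lm(y ~ x, data = dat[1:half, ])          # first segment
fit_high <- lm(y ~ x, data = dat[(half + 1):n, ])    # second segment

# Residual variance estimate in each segment
s2_low  <- sum(residuals(fit_low)^2)  / df.residual(fit_low)
s2_high <- sum(residuals(fit_high)^2) / df.residual(fit_high)

# F ratio; large values suggest the variance increases from segment 1 to 2
gq <- s2_high / s2_low
p_value <- pf(gq, df.residual(fit_high), df.residual(fit_low),
              lower.tail = FALSE)

print(c(GQ = gq, p.value = p_value))
print(p_value < 0.10)   # TRUE means homoskedasticity is rejected at 10%
```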
BREUSCH-PAGAN TEST
> bptest(mod1b)
studentized Breusch-Pagan test
data: mod1b
BP = 6.9233, df = 3, p-value = 0.07438
> bptest(mod2b)
studentized Breusch-Pagan test
data: mod2b
BP = 17.644, df = 6, p-value = 0.007186
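For reference, the studentized (Koenker) form of the Breusch-Pagan statistic can be reproduced by hand as n times the R-squared of an auxiliary regression of the squared residuals on the model regressors, compared with a chi-squared distribution whose degrees of freedom equal the number of regressors. The data and variable names below are simulated assumptions, not the housing data.

```r
set.seed(11)
n <- 300
x1 <- runif(n, 1, 5)
x2 <- rnorm(n)
y <- 1 + 0.8 * x1 - 0.3 * x2 + rnorm(n, sd = 0.3 * x1)  # variance depends on x1
fit <- lm(y ~ x1 + x2)

# Auxiliary regression: squared residuals on the original regressors
u2 <- residuals(fit)^2
aux <- lm(u2 ~ x1 + x2)

bp <- n * summary(aux)$r.squared          # LM statistic
df_bp <- length(coef(aux)) - 1            # number of regressors, no intercept
p_value <- pchisq(bp, df = df_bp, lower.tail = FALSE)

print(c(BP = bp, df = df_bp, p.value = p_value))
```

Read with the same 10% rule used elsewhere in this report, a p-value below 0.10 rejects homoskedasticity.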
WHITE TEST
> # White without cross terms
> resid1_b<-mod1b$residuals^2
> resid2_b<-mod2b$residuals^2
> summary(mod2b)
Call:
lm(formula = LPRICE ~ LAREA + BATHS + ROOMS + LDIST + AGE + NBH,
data = [Link])
Residuals:
Min 1Q Median 3Q Max
-0.74173 -0.17704 -0.00688 0.16910 0.67034
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.0272615 0.5330123 11.308 < 2e-16 ***
LAREA 0.5646898 0.0627870 8.994 < 2e-16 ***
BATHS 0.1120188 0.0319593 3.505 0.000525 ***
ROOMS 0.0426382 0.0211510 2.016 0.044685 *
LDIST 0.0652203 0.0348153 1.873 0.061977 .
AGE -0.0025515 0.0005727 -4.455 1.18e-05 ***
NBH -0.0286521 0.0067773 -4.228 3.12e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.252 on 306 degrees of freedom
Multiple R-squared: 0.6461, Adjusted R-squared: 0.6391
F-statistic: 93.09 on 6 and 306 DF, p-value: < 2.2e-16
> LAREA2<-[Link]$LAREA^2
> BATHS2<-[Link]$BATHS^2
> ROOMS2<-[Link]$ROOMS^2
> LDIST2<-[Link]$LDIST^2
> AGE2<-[Link]$AGE^2
> NBH2<-[Link]$NBH^2
> white1<-lm(resid1_b~LAREA2+BATHS2+ROOMS2)
> summary(white1)
Call:
lm(formula = resid1_b ~ LAREA2 + BATHS2 + ROOMS2)
Residuals:
Min 1Q Median 3Q Max
-0.09904 -0.06011 -0.03400 0.02786 0.46894
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0787373 0.0769670 -1.023 0.3071
LAREA2 0.0032548 0.0015299 2.127 0.0342 *
BATHS2 -0.0001027 0.0023977 -0.043 0.9659
ROOMS2 -0.0008266 0.0005538 -1.492 0.1366
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0953 on 309 degrees of freedom
Multiple R-squared: 0.02156, Adjusted R-squared: 0.01206
F-statistic: 2.269 on 3 and 309 DF, p-value: 0.0805
> linearHypothesis(white1,
+ "LAREA2+BATHS2+ROOMS2=0")
Linear hypothesis test
Hypothesis:
LAREA2 + BATHS2 + ROOMS2 = 0
Model 1: restricted model
Model 2: resid1_b ~ LAREA2 + BATHS2 + ROOMS2
Res.Df RSS Df Sum of Sq F Pr(>F)
1 310 2.8234
2 309 2.8062 1 0.017144 1.8877 0.1705
> white2<-lm(resid2_b~LAREA2+BATHS2+ROOMS2+
+ +LDIST2+AGE2+NBH2)
> summary(white2)
Call:
lm(formula = resid2_b ~ LAREA2 + BATHS2 + ROOMS2 + +LDIST2 +
AGE2 + NBH2)
Residuals:
Min 1Q Median 3Q Max
-0.09878 -0.05058 -0.02451 0.02088 0.45139
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.531e-02 8.429e-02 -0.775 0.43903
LAREA2 4.153e-03 1.308e-03 3.175 0.00165 **
BATHS2 -1.507e-03 2.174e-03 -0.693 0.48886
ROOMS2 -5.691e-04 5.015e-04 -1.135 0.25735
LDIST2 -7.351e-04 5.572e-04 -1.319 0.18805
AGE2 1.519e-07 1.195e-06 0.127 0.89893
NBH2 -7.280e-04 3.918e-04 -1.858 0.06414 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.08029 on 306 degrees of freedom
Multiple R-squared: 0.05616, Adjusted R-squared: 0.03765
F-statistic: 3.035 on 6 and 306 DF, p-value: 0.006748
> linearHypothesis(white2,
+ "LAREA2+BATHS2+ROOMS2+
+ LDIST2+AGE2+NBH2=0")
Linear hypothesis test
Hypothesis:
LAREA2 + BATHS2 + ROOMS2 + LDIST2 + AGE2 + NBH2 = 0
Model 1: restricted model
Model 2: resid2_b ~ LAREA2 + BATHS2 + ROOMS2 + +LDIST2 + AGE2 + NBH2
Res.Df RSS Df Sum of Sq F Pr(>F)
1 307 1.9736
2 306 1.9726 1 0.0010397 0.1613 0.6883
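A note on reading these auxiliary regressions: a hypothesis written as a single string with plus signs imposes one restriction on the sum of the coefficients (hence Df = 1 in the output above), which is not the same as testing that all of them are jointly zero. The White test is usually read from the joint F-statistic of the auxiliary regression or from the LM form n * R-squared. The sketch below computes the LM form from the values reported above (n = 313, which follows from 309 residual degrees of freedom plus 4 coefficients); it is a worked calculation under that reading, not a rerun of the original data.

```r
# Values taken from the summary() output of the auxiliary regressions
n <- 313                 # 309 residual df + 4 estimated coefficients
r2_white1 <- 0.02156     # Multiple R-squared of white1
r2_white2 <- 0.05616     # Multiple R-squared of white2

# LM form of the White test: n * R^2, chi-squared with q degrees of freedom,
# where q is the number of regressors in the auxiliary regression
lm_white1 <- n * r2_white1
lm_white2 <- n * r2_white2
p_white1 <- pchisq(lm_white1, df = 3, lower.tail = FALSE)
p_white2 <- pchisq(lm_white2, df = 6, lower.tail = FALSE)

print(c(LM = lm_white1, p.value = p_white1))
print(c(LM = lm_white2, p.value = p_white2))
```

These p-values are close to those of the joint F-statistics in the summaries above (0.0805 and 0.006748), so at the 10% level this reading points to heteroskedasticity in both models.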
HARRISON-McCABE TEST
> ## 4.2.3 perform Harrison-McCabe test
> # H0: The residuals are homoskedastic.
> hmctest(mod1b)
Harrison-McCabe test
data: mod1b
HMC = 0.46083, p-value = 0.185
> hmctest(mod2b)
Harrison-McCabe test
data: mod2b
HMC = 0.46755, p-value = 0.214
In this case the null hypothesis of homoskedasticity is not rejected in either model, because both p-values (0.185 and 0.214) are above 10%.
3) NO MULTICOLLINEARITY
> # Variance inflation factor (VIF)
> # vif(mco3) <- only for lm objects
> vif(mod1b)
LAREA BATHS ROOMS
2.192242 2.354008 1.664775
> vif(mod2b)
LAREA BATHS ROOMS LDIST AGE NBH
2.274695 3.036072 1.815373 1.361321 1.418591 1.061939
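For this assumption there is no sign of problematic multicollinearity: every VIF is well below the usual rule-of-thumb thresholds of 5 or 10 (the largest is 3.04, for BATHS in mod2b). The VIF is a descriptive measure, not a test, so no p-value is involved.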
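To make the VIF concrete, the sketch below computes it by hand as 1 / (1 - R_j^2), where R_j^2 comes from regressing each explanatory variable on the others. The simulated regressors and their names are assumptions chosen only to mimic correlated housing characteristics.

```r
set.seed(3)
n <- 250
area  <- rnorm(n)
baths <- 0.7 * area + rnorm(n, sd = 0.7)   # correlated with area
rooms <- 0.5 * area + rnorm(n, sd = 0.9)   # correlated with area
X <- data.frame(area, baths, rooms)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing variable j on the rest
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

print(vif_manual)
print(all(vif_manual < 5))   # rule-of-thumb check, not a formal test
```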
4) MODEL SPECIFICATION
# H0: There is no quadratic or cubic misspecification in the model
> resettest(fgls1 , power=2, type="fitted")
RESET test
data: fgls1
RESET = 0.1663, df1 = 1, df2 = 307, p-value = 0.6837
> resettest(fgls2 , power=2, type="fitted")
RESET test
data: fgls2
RESET = 0.58144, df1 = 1, df2 = 304, p-value = 0.4463
The null hypothesis is not rejected in either model, since both p-values (0.6837 and 0.4463) are above 0.10: the RESET test finds no evidence that squared fitted values are missing from the specification.
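The mechanics of the RESET test with power 2 and fitted values can be sketched as follows: add the squared fitted values to the regression and test whether that extra term is significant. The simulated data (deliberately quadratic, so the test should reject) and the names are assumptions for illustration only.

```r
set.seed(5)
n <- 200
x <- runif(n, 0, 4)
y <- 1 + 0.5 * x + 0.4 * x^2 + rnorm(n, sd = 0.5)  # true relation is quadratic

base <- lm(y ~ x)                 # linear specification under test
yhat2 <- fitted(base)^2           # squared fitted values
aug <- lm(y ~ x + yhat2)          # augmented regression

# F test of the added term: H0 is that its coefficient is zero
reset <- anova(base, aug)
p_value <- reset$`Pr(>F)`[2]

print(reset)
print(p_value < 0.10)   # TRUE means the linear specification is rejected
```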
> # Rainbow analysis
> # H0: The model is linear (in the variables)
> # H1: The model is not linear (in the variables)
> rain <- raintest(mod1b)
> rain
Rainbow test
data: mod1b
Rain = 1.4732, df1 = 157, df2 = 152, p-value = 0.00836
> rain2 <- raintest(mod2b)
> rain2
Rainbow test
data: mod2b
Rain = 1.5486, df1 = 157, df2 = 149, p-value = 0.003656
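Given these p-values (0.008 and 0.004, both below 10%), the null hypothesis of linearity is rejected for both models: the Rainbow test indicates that the relationship is not linear in the variables, which may reflect omitted nonlinear (for example quadratic) components.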
> # Model comparison by the Akaike criterion
> AIC(fgls1, fgls2)
df AIC
fgls1 6 63.115603
fgls2 9 5.904562
The best model is the one with the lowest value, so model 2 (fgls2) is preferred.
> # Model comparison by the Schwarz criterion
> BIC(fgls1, fgls2)
df BIC
fgls1 6 85.59282
fgls2 9 39.62039
This criterion also selects model 2 as the best model, because it is a minimization criterion and model 2 has the lower value.
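Both criteria penalize the log-likelihood by the number of estimated parameters k: AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n). The sketch below reproduces them by hand for two ordinary lm fits on simulated data; the helper ic and the models small and large are hypothetical stand-ins, not the fgls1 and fgls2 objects of this report.

```r
set.seed(9)
n <- 150
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 0.6 * x1 + 0.4 * x2 + rnorm(n)

small <- lm(y ~ x1)         # hypothetical smaller model
large <- lm(y ~ x1 + x2)    # hypothetical larger model

# Information criteria from the log-likelihood (hypothetical helper)
ic <- function(fit) {
  ll <- logLik(fit)
  k <- attr(ll, "df")       # estimated parameters, including the error variance
  c(AIC = -2 * as.numeric(ll) + 2 * k,
    BIC = -2 * as.numeric(ll) + log(nobs(fit)) * k)
}

ic_small <- ic(small)
ic_large <- ic(large)
print(rbind(small = ic_small, large = ic_large))   # lower is better
```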
> accuracy(fgls1)
ME RMSE MAE MPE MAPE MASE
Training set -0.0009993451 0.2668775 0.2155773 -0.06307038 1.89046 0.6384774
> accuracy(fgls2)
ME RMSE MAE MPE MAPE MASE
Training set -0.002286489 0.2455381 0.1969613 -0.06517669 1.727098 0.5833423
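Three of the in-sample accuracy measures in these tables can be computed directly from the residuals: ME is the mean error, RMSE the root mean squared error, and MAE the mean absolute error; lower RMSE and MAE indicate a closer fit. The sketch below uses a simulated lm fit as a stand-in, since the fgls objects are not defined in this section.

```r
set.seed(21)
n <- 120
x <- rnorm(n)
y <- 2 + 0.7 * x + rnorm(n, sd = 0.4)
fit <- lm(y ~ x)             # hypothetical stand-in model

e <- y - fitted(fit)         # in-sample errors (actual minus fitted)

measures <- c(
  ME   = mean(e),            # mean error
  RMSE = sqrt(mean(e^2)),    # root mean squared error
  MAE  = mean(abs(e))        # mean absolute error
)
print(measures)
```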