0% found this document useful (0 votes)
7 views5 pages

PH6205 RTutorial 4

This is the tutorial 4 for the course PH6205 at City University of Hong Kong.

Uploaded by

xuehrcityu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views5 pages

PH6205 RTutorial 4

This is the tutorial 4 for the course PH6205 at City University of Hong Kong.

Uploaded by

xuehrcityu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 5

PH6205 R Tutorial 4

Haoran Xue

(Last Updated February 25, 2025)

1 Scottish Lip Cancer Data


We analyze the Scottish Lip Cancer Data. First we load the data into R:
Lip.data = read.csv("/Users/x/Dropbox/PH6205/Slides4/SC_lip_cancer.csv")
head(Lip.data)

## id district obs pop propag


## 1 1 Caithness 11 83190 10
## 2 2 Sutherland 5 37521 16
## 3 3 Ross-Cromarty 15 129271 10
## 4 4 Banff-Buchan 39 231337 16
## 5 5 Nairn 3 29374 10
## 6 6 Skye-Lochalsh 9 28324 16

1.1 Poisson Regression


Next, we fit the Poisson Regression of obs on propag, with log(pop) as the offset:
obs = Lip.data$obs
pop = Lip.data$pop
propag = Lip.data$propag
poisson.model = glm(obs ~ propag, offset = log(pop), family = poisson(link = "log"))
summary(poisson.model)

##
## Call:
## glm(formula = obs ~ propag, family = poisson(link = "log"), offset = log(pop))
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -10.82457 0.07006 -154.50 <2e-16 ***
## propag 0.08114 0.00603 13.46 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 423.92 on 55 degrees of freedom
## Residual deviance: 256.92 on 54 degrees of freedom
## AIC: 468.89
##
## Number of Fisher Scoring iterations: 5

1
1.2 Plot for Checking Overdispersion
We can draw the scatter plot of (y − λ̂)2 versus λ̂ to visually check for overdispersion:
lambda.hat = fitted(poisson.model)
plot(lambda.hat,(obs - lambda.hat)ˆ2)
abline(0,1,col = "red",lwd = 2)
500
(obs − lambda.hat)^2

400
300
200
100
0

0 10 20 30 40

lambda.hat
Here the red solid line is y = x. Most of the points are above the red line, i.e. (y − λ̂)2 > λ̂ in general. This
indicates the existence of overdispersion.
We can also calculate the dispersion parameter as
phi = sum((obs - lambda.hat)ˆ2/lambda.hat)/(56-1-1)
phi

## [1] 5.430562

1.3 Quasi-Poisson Regression


We fit a Quasi-Poisson Regression to deal with the overdispersion.
Quasipoisson.model = glm(obs ~ propag, offset = log(pop), family = quasipoisson)
summary(Quasipoisson.model)

##
## Call:
## glm(formula = obs ~ propag, family = quasipoisson, offset = log(pop))
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -10.82457 0.16326 -66.301 < 2e-16 ***
## propag 0.08115 0.01405 5.774 3.91e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##

2
## (Dispersion parameter for quasipoisson family taken to be 5.430563)
##
## Null deviance: 423.92 on 55 degrees of freedom
## Residual deviance: 256.92 on 54 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5

1.4 Negative Binomial Regression


We can also fit a Negative Binomial Regression to deal with the overdispersion. We use the function
glm.nb() from the R package MASS to fit the Negative Binomial Regression model.
library(MASS)
NB.model = glm.nb(formula = obs ~ propag + offset(log(pop)))
summary(NB.model)

##
## Call:
## glm.nb(formula = obs ~ propag + offset(log(pop)), init.theta = 2.531752157,
## link = log)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -10.66858 0.15792 -67.558 < 2e-16 ***
## propag 0.08499 0.01408 6.036 1.58e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(2.5318) family taken to be 1)
##
## Null deviance: 93.005 on 55 degrees of freedom
## Residual deviance: 63.170 on 54 degrees of freedom
## AIC: 356.58
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 2.532
## Std. Err.: 0.642
##
## 2 x log-likelihood: -350.575

2 Vietnam Highly Pathogenic Avian Influenza Data


We use the Vietnam Highly Pathogenic Avian Influenza Data to illustrate the Zero-inflated Poisson
Regression model. First we load the data:
VN.new = read.csv("/Users/x/Dropbox/PH6205/Slides4/Vietnam2003HPAI.csv")
head(VN.new)

## obs pop pop.dens road.dens river.dens elevation


## 1 0 1 0.5731923 -0.9532340 -0.58138623 0.4925043
## 2 0 1 0.7068947 -0.7887648 -0.72589578 0.4782148
## 3 0 1 0.4386707 0.3990558 -0.24816734 0.3940719

3
## 4 0 1 0.7793921 0.2686765 0.40521494 0.4497510
## 5 0 1 0.6405167 -0.7609713 0.12034335 0.4716434
## 6 0 1 0.2975146 -0.1706405 -0.06887103 0.3827288
obs = VN.new$obs
pop = VN.new$pop
pop.dens = VN.new$pop.dens
road.dens = VN.new$road.dens
river.dens = VN.new$river.dens
elevation = VN.new$elevation

We can use the table() function to show the counts for different values of obs:
table(obs)

## obs
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## 1407 156 67 39 23 18 15 14 10 8 3 7 5 1 1 1
## 16 17 18 21
## 2 1 2 1
We can see, out of 1781 districts, there are 1407 report obs as 0. Next we draw to bar plot for better
visualization. (Note: the codes for drawing bar plot are not required.)
barplot(c((table(obs))[1:19],0,0,(table(obs))[20]),names.arg = 0:21,xlab = "obs",ylab = "Count")
1400
1000
Count

600
200
0

0 2 4 6 8 10 12 14 16 18 20

obs
We fit a Zero-inflated Poisson Regression model. We use the function zeroinfl() from the R package
pscl.
library(pscl)

## Classes and Methods for R originally developed in the


## Political Science Computational Laboratory
## Department of Political Science
## Stanford University (2002-2015),

4
## by and under the direction of Simon Jackman.
## hurdle and zeroinfl functions by Achim Zeileis.
ZIP.VN = zeroinfl(obs ~ road.dens + river.dens + elevation | pop.dens,
offset = log(pop),dist = c("poisson"))
summary(ZIP.VN)

##
## Call:
## zeroinfl(formula = obs ~ road.dens + river.dens + elevation | pop.dens,
## offset = log(pop), dist = c("poisson"))
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -4.10910 -0.37482 -0.17808 -0.06144 17.42802
##
## Count model coefficients (poisson with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.97356 0.07965 -24.778 < 2e-16 ***
## road.dens -0.12954 0.02251 -5.755 8.64e-09 ***
## river.dens 0.28272 0.08009 3.530 0.000416 ***
## elevation -1.00963 0.09860 -10.239 < 2e-16 ***
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.0195 0.1690 6.032 1.62e-09 ***
## pop.dens 3.1881 0.3105 10.268 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 13
## Log-likelihood: -1425 on 6 Df

You might also like