PH6205 R Tutorial 3

Haoran Xue

(Last Updated February 18, 2025)

CHD Example
Let's reproduce the results for the CHD example from our lecture.

Load Data into R


First, we need to load the CHD data into R. The dataset is named CHDData.csv. On my computer, the path to this dataset is "/Users/x/Library/CloudStorage/Dropbox/PH6205/Slides3"; you need to change the path according to the location of the dataset on your own computer when you read the data.
We use the function read.csv().
CHD.data = read.csv("/Users/x/Library/CloudStorage/Dropbox/PH6205/Slides3/CHDData.csv")
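If you prefer not to type the full path, a minimal alternative (not part of the original tutorial) is to put CHDData.csv in your current working directory and read it by file name; getwd() shows the current working directory and setwd() changes it:
getwd()                             # show the current working directory
# setwd("path/to/your/folder")      # change it if needed (placeholder path)
CHD.data = read.csv("CHDData.csv")  # read the file by name from the working directory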

We use the head() function to show the first 6 rows of the CHD data:
head(CHD.data)

##   ID AGE CHD    Sex
## 1  1  20   0 Female
## 2  2  23   0   Male
## 3  3  24   0   Male
## 4  4  25   0 Female
## 5  5  25   1   Male
## 6  6  26   0   Male
Extract variables CHD and Age:
CHD = CHD.data$CHD
Age = CHD.data$AGE
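Before fitting any model, a quick sanity check (not part of the original tutorial) is to tabulate the binary outcome and summarize Age:
table(CHD)    # counts of CHD = 0 and CHD = 1
summary(Age)  # minimum, quartiles, mean, and maximum of Age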

Linear Regression Is Not Appropriate


Run the linear regression of CHD ∼ Age:
lm.CHDAge = lm(CHD~Age)

Now let's draw the scatter plot of CHD versus Age, using red to denote samples with CHD and blue to
denote samples without CHD. (Note: This part of the code is not required.)
# Indices of samples with CHD (ind1) and without CHD (ind2)
ind1 = which(CHD == 1)
ind2 = which(CHD == 0)
# Plot samples without CHD in blue, then add samples with CHD in red
plot(Age[ind2], CHD[ind2],
     ylim = c(0, 1), xlim = range(Age), col = "blue",
     xlab = "Age", ylab = "CHD")
points(Age[ind1], CHD[ind1], col = "red")
# Overlay the fitted line from the linear regression
abline(lm.CHDAge)

[Figure: scatter plot of CHD (0 or 1) versus Age (about 20 to 70), with samples without CHD in blue, samples with CHD in red, and the fitted linear regression line.]
Show the details of the linear regression results:
summary(lm.CHDAge)

##
## Call:
## lm(formula = CHD ~ Age)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.85793 -0.33992 -0.07274 0.31656 0.99269
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.537960 0.168809 -3.187 0.00193 **
## Age 0.021811 0.003679 5.929 4.57e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.429 on 98 degrees of freedom
## Multiple R-squared: 0.264, Adjusted R-squared: 0.2565
## F-statistic: 35.15 on 1 and 98 DF, p-value: 4.575e-08
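One way to see why the linear model is not appropriate (a supplementary check, not part of the original tutorial) is to predict CHD at ages outside the observed range: a straight line is unbounded, so the predictions can fall below 0 or above 1 and cannot be interpreted as probabilities.
# Predicted CHD from the linear model at Age = 10 and Age = 80;
# with the estimates above these are roughly -0.32 and 1.21, i.e. outside [0, 1]
predict(lm.CHDAge, newdata = data.frame(Age = c(10, 80)))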

Simple Logistic Regression


Let’s fit the simple logistic regression of CHD ∼ Age:
logistic.CHDAge = glm(CHD ~ Age, family = binomial)
summary(logistic.CHDAge)

##
## Call:
## glm(formula = CHD ~ Age, family = binomial)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.30945 1.13365 -4.683 2.82e-06 ***
## Age 0.11092 0.02406 4.610 4.02e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 136.66 on 99 degrees of freedom
## Residual deviance: 107.35 on 98 degrees of freedom
## AIC: 111.35
##
## Number of Fisher Scoring iterations: 4
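As a supplementary step (not shown in the original tutorial), the Age coefficient can be reported as an odds ratio by exponentiating: exp(0.11092) is roughly 1.12, meaning each additional year of age multiplies the odds of CHD by about 1.12.
exp(coef(logistic.CHDAge)["Age"])             # odds ratio per one-year increase in Age
exp(confint.default(logistic.CHDAge, "Age"))  # Wald-type 95% confidence interval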

Multiple Logistic Regression


Let’s fit the multiple logistic regression of CHD ∼ Age + Sex:
Sex = CHD.data$Sex
logistic.CHDAgeSex = glm(CHD ~ Age + Sex, family = binomial)
summary(logistic.CHDAgeSex)

##
## Call:
## glm(formula = CHD ~ Age + Sex, family = binomial)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.43517 1.19000 -4.567 4.94e-06 ***
## Age 0.11041 0.02407 4.588 4.48e-06 ***
## SexMale 0.20176 0.54014 0.374 0.709
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 136.66 on 99 degrees of freedom
## Residual deviance: 107.21 on 97 degrees of freedom
## AIC: 113.21
##
## Number of Fisher Scoring iterations: 4
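To illustrate how the fitted model can be used (a supplementary sketch, not part of the original tutorial), we can compute the predicted CHD probability for a 50-year-old male; plugging the estimates above into the inverse logit gives a probability of roughly 0.57.
# Predicted probability of CHD for a 50-year-old male
predict(logistic.CHDAgeSex,
        newdata = data.frame(Age = 50, Sex = "Male"),
        type = "response")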

Compare Nested Models Using LRT Test


We use the function lrtest() from the R package lmtest to perform the LRT test. First, you should install the
package lmtest, similar to how we installed the package car in R Tutorial 2. After successfully
installing lmtest, we load the package using the function library():
library(lmtest)

## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
##     as.Date, as.Date.numeric
Now we can use the function lrtest() to perform the LRT test:
lrtest(logistic.CHDAge, logistic.CHDAgeSex)

## Likelihood ratio test
##
## Model 1: CHD ~ Age
## Model 2: CHD ~ Age + Sex
##   #Df  LogLik Df Chisq Pr(>Chisq)
## 1   2 -53.677
## 2   3 -53.607  1  0.14     0.7083
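To connect this output to the definition of the LRT (a supplementary sketch, not part of the original tutorial): the test statistic is twice the difference in log-likelihoods, 2 x (-53.607 - (-53.677)) = 0.14, and the p-value is the upper tail of a chi-squared distribution with 1 degree of freedom.
# Reproduce the LRT statistic and p-value by hand
LRT.stat = 2 * (as.numeric(logLik(logistic.CHDAgeSex)) - as.numeric(logLik(logistic.CHDAge)))
LRT.stat                                      # about 0.14
pchisq(LRT.stat, df = 1, lower.tail = FALSE)  # about 0.71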

HL Test
We use the function hoslem.test() from the R package ResourceSelection to perform the Hosmer-Lemeshow
(HL) test. First, you should install the package ResourceSelection, similar to how we installed the
packages car and lmtest. After successfully installing ResourceSelection, we load the package using the
function library():
library(ResourceSelection)

## ResourceSelection 0.3-6 2023-06-27


The HL test takes two sets of values: the first is the binary outcomes, and the second is the
predicted probabilities p̂i's. The function fitted() can be used to get the predicted probabilities:
fitted(logistic.CHDAgeSex)

## 1 2 3 4 5 6 7
## 0.03816625 0.06333496 0.07020957 0.06447555 0.07776843 0.08606576 0.08606576
## 8 9 10 11 12 13 14
## 0.10509814 0.10509814 0.11594475 0.12775098 0.12775098 0.12775098 0.10690499
## 15 16 17 18 19 20 21
## 0.12775098 0.12775098 0.15444378 0.15444378 0.14288754 0.16941897 0.18552758
## 22 23 24 25 26 27 28
## 0.15695020 0.15695020 0.15695020 0.18552758 0.17211894 0.17211894 0.22123058
## 29 30 31 32 33 34 35
## 0.18842604 0.22123058 0.24083707 0.20589388 0.24083707 0.22453305 0.26159750
## 36 37 38 39 40 41 42
## 0.28347926 0.24434039 0.26529728 0.30643177 0.33038578 0.28736807 0.35525318
## 43 44 45 46 47 48 49
## 0.35525318 0.35525318 0.35525318 0.38092752 0.33461756 0.38092752 0.40728524
## 50 51 52 53 54 55 56
## 0.35963241 0.40728524 0.40728524 0.43418763 0.43418763 0.46148346 0.46148346
## 57 58 59 60 61 62 63
## 0.48901221 0.43887755 0.48901221 0.51660777 0.51660777 0.51660777 0.54410241
## 64 65 66 67 68 69 70
## 0.54410241 0.54410241 0.57133084 0.57133084 0.59813416 0.62436343 0.62436343
## 71 72 73 74 75 76 77
## 0.60270860 0.60270860 0.67457219 0.65420877 0.69832867 0.69832867 0.72106802
## 78 79 80 81 82 83 84
## 0.72106802 0.72106802 0.70233019 0.74272480 0.70233019 0.74272480 0.74272480
## 85 86 87 88 89 90 91
## 0.70233019 0.76325219 0.76325219 0.76325219 0.78262110 0.74635133 0.76668026
## 92 93 94 95 96 97 98
## 0.80081893 0.81784789 0.83372326 0.80384284 0.84847137 0.86212770 0.86212770
## 99 100
## 0.87473498 0.91568744
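As a side note (not part of the original tutorial), for a model fitted with glm(), fitted() returns the same values as predict() with type = "response", i.e., the estimated probabilities for the original observations:
# The two calls should agree
all.equal(fitted(logistic.CHDAgeSex),
          predict(logistic.CHDAgeSex, type = "response"))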
Now we perform the HL test: the first argument, CHD, is the set of binary outcomes, and the second argument,
fitted(logistic.CHDAgeSex), is the set of predicted probabilities:
hoslem.test(CHD,fitted(logistic.CHDAgeSex))

##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: CHD, fitted(logistic.CHDAgeSex)
## X-squared = 1.7167, df = 8, p-value = 0.9885
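The 8 degrees of freedom come from the grouping used by the test: hoslem.test() uses g = 10 groups by default, and the df equal g - 2. If you want a different number of groups (a supplementary option, not part of the original tutorial), you can set the g argument explicitly:
# The same test with the default number of groups written out explicitly
hoslem.test(CHD, fitted(logistic.CHDAgeSex), g = 10)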
