0% found this document useful (0 votes)

29 views36 pages

05 GeneralizedLinearModels

This document discusses different types of generalized linear models (GLMs) including logistic regression, Poisson regression, gamma regression, and hurdle models. It provides examples of how to run these models in R and interpret the outputs.

Uploaded by

Cotta Lee

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

29 views36 pages

05 GeneralizedLinearModels

Uploaded by

Cotta Lee

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 36

Chapter 5: Generalized Linear Modeling

Tyson S. Barrett
Summer 2017
Utah State University

1
Introduction

Logistic Regression

Poisson Regression

Beta Regression

Conclusions

2
Introduction

3
Good Quote

“You must stick to your conviction, but be

ready to abandon your assumptions.”
— Dennis Waitley

4
GLMs

Generalized Linear Models (GLMs):

1. Are extensions of linear regression to areas where

assumptions of normality and homoskedasticity do not hold
2. There are several versions of GLM’s, each for different
types and distributions of outcomes.

We are going to go through several of the most common GLMs.

5
Types

We discuss:

1. Logistic Regression
2. Poisson Regression
3. GLM with Gamma distribution
4. Negative binomial
5. Beta Regression

6
Logistic Regression

7
Logistic Regression

For binary outcomes (e.g., yes or no, correct or incorrect, sick

or healthy)

logit(Y ) = β0 + β1 X1 + ... +

where logit(Y ) = ln Prob(Y =1)
1−Prob(Y =1)

8
Prep Data

## First creating binary depression variable

## Use mutate()
df <- df %>%
mutate(dep = dpq010 + dpq020 + dpq030 + dpq040 + dpq050 +
dpq060 + dpq070 + dpq080 + dpq090) %>%
mutate(dep2 = ifelse(dep >= 10, 1,
ifelse(dep < 10, 0, NA)))
## Fix some placeholders
df <- df %>%
mutate(asthma = washer(mcq010, 9),
asthma = washer(asthma, 2, value = 0)) %>%
mutate(sed = washer(pad680, 9999, 7777))

Note:

1. IF depression ≥ 10 then dep2 is 1,

2. IF dpression < 10, then dep2 is 0,
3. ELSE dep2 is NA.
9
Running Logistic Regression

• βs are in “log-odds”
• e β is an “odds ratio”

In R, this is simple.

10
Running Logistic Regression

l_fit <- glm(dep2 ~ asthma + sed + race + famsize,

data = df,
family = "binomial")
summary(l_fit)

11
Running Logistic Regression

##
## Call:
## glm(formula = dep2 ~ asthma + sed + race + famsize, family = "binomia
## data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.7831 -0.4479 -0.4078 -0.3645 2.5471
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.6203555 0.2380770 -11.006 < 2e-16 ***
## asthma 0.5688452 0.1276326 4.457 8.32e-06 ***
## sed 0.0005638 0.0002610 2.160 0.0307 *
## raceOtherHispanic 0.7162568 0.2328673 3.076 0.0021 **
## raceWhite 0.1287059 0.2116414 0.608 0.5431
## raceBlack 0.0189205 0.2205461 0.086 0.9316
## raceOther -0.4901414 0.2570123 -1.907 0.0565 . 12
Output of Logistic Regression

We used glm() (stands for generalized linear model)

• The key to making it logistic, since you can use glm() for
a linear model using maximum likelihood instead of lm()
with least squares, is family = "binomial"
• Default link in “binomial” is logit.
• Can also do probit to use probit regression.

13
Poisson Regression

14
Poisson Regression

Again, use the glm() function.

• The difference here is we will be using an outcome that is a

count variable.
• For example, the sedentary variable (sed) that we have in
df is a count of the minutes of sedentary activity.

15
Running Poisson Regression

p_fit <- glm(sed ~ asthma + race + famsize,

data = df,
family = "poisson")
summary(p_fit)

16
Running Poisson Regression

##
## Call:
## glm(formula = sed ~ asthma + race + famsize, family = "poisson",
## data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -27.362 -8.430 -1.477 5.823 34.507
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.6499871 0.0035550 1589.31 <2e-16 ***
## asthma 0.0614965 0.0021434 28.69 <2e-16 ***
## raceOtherHispanic 0.1393438 0.0040940 34.04 <2e-16 ***
## raceWhite 0.3484622 0.0033438 104.21 <2e-16 ***
## raceBlack 0.3400346 0.0034430 98.76 <2e-16 ***
## raceOther 0.3557953 0.0036273 98.09 <2e-16 ***
## famsize -0.0188673 0.0005488 -34.38 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 496351 on 4436 degrees of freedom
## Residual deviance: 475428 on 4430 degrees of freedom
## (195 observations deleted due to missingness)
## AIC: 508999
##
## Number of Fisher Scoring iterations: 5

17
Running Poisson Regression

Sedentary may be over-dispersed:

750
count

500

250

0
0 400 800 1200
Minutes of Sedentary Behavior
18
Running Poisson Regression

and so other methods related to poisson may be

necessary.

• See gamma, hurdle models, and negative

binomial models next

19
Gamma

• very similar to poisson but does not require

integers and can handle more dispersion.
• the outcome must have values > 0.

20
Gamma

## Adjust sed
df$sed_gamma <- df$sed + .01
g_fit <- glm(sed_gamma ~ asthma + race + famsize,
data = df,
family = "Gamma")
summary(g_fit)

21
Gamma

##
## Call:
## glm(formula = sed_gamma ~ asthma + race + famsize, family = "Gamma",
## data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.3589 -0.4613 -0.0845 0.2926 1.6868
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.567e-03 1.132e-04 31.515 < 2e-16 ***
## asthma -1.604e-04 5.865e-05 -2.735 0.00626 **
## raceOtherHispanic -4.874e-04 1.309e-04 -3.723 0.00020 ***
## raceWhite -1.090e-03 1.078e-04 -10.115 < 2e-16 ***
## raceBlack -1.068e-03 1.102e-04 -9.697 < 2e-16 ***
## raceOther -1.110e-03 1.145e-04 -9.695 < 2e-16 ***
## famsize 5.107e-05 1.552e-05 3.289 0.00101 ** 22
Two-Part or Hurdle Models

• Use the pscl package to run a hurdle model.

• These models are built for situations where there is a count
variable with many zeros (“zero-inflated”).
• The hurdle model makes slightly different assumptions
regarding the zeros than the pure negative binomial that we
present next.
• The hurdle consists of two models: one for whether the
person had a zero or more (binomial) and if more than zero,
how many (poisson).

To run a hurdle model, we are going to make a sedentary

variable with many more zeros to illustrate and then we will run
a hurdle model.
23
Two-Part or Hurdle Models

## Zero inflated sedentary (don't worry too much about the specifics)
df$sed_zero <- ifelse(sample(1:100,
size = length(df$sed),
replace=TRUE) %in% c(5,10,11,20:25), 0,
df$sed)
## Hurdle model
library(pscl)
h_fit = hurdle(sed_zero ~ asthma + race + famsize,
data = df)
summary(h_fit)

24
Two-Part or Hurdle Models

##
## Call:
## hurdle(formula = sed_zero ~ asthma + race + famsize, data = df)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -3.9248 -1.4783 -0.2191 1.2563 11.0364
##
## Count model coefficients (truncated poisson with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.6727458 0.0036627 1548.80 <2e-16 ***
## asthma 0.0627030 0.0022628 27.71 <2e-16 ***
## raceOtherHispanic 0.1201634 0.0042592 28.21 <2e-16 ***
## raceWhite 0.3248979 0.0034416 94.40 <2e-16 ***
## raceBlack 0.3337217 0.0035384 94.31 <2e-16 ***
## raceOther 0.3359265 0.0037427 89.75 <2e-16 ***
## famsize -0.0200684 0.0005781 -34.71 <2e-16 ***
## Zero hurdle model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.84791 0.23592 12.072 <2e-16 ***
## asthma -0.20907 0.13695 -1.527 0.1269
## raceOtherHispanic -0.57535 0.25379 -2.267 0.0234 *
## raceWhite -0.48597 0.22052 -2.204 0.0275 *
## raceBlack -0.31269 0.22953 -1.362 0.1731
## raceOther -0.37082 0.24153 -1.535 0.1247
## famsize -0.05421 0.03545 -1.529 0.1262
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 12
## Log-likelihood: -2.307e+05 on 14 Df

25
Hurdle Models

Notice that the output has two parts:

1. “Count model coefficients (truncated poisson with log

link):” and
2. “Zero hurdle model coefficients (binomial with logit link):”.

Together they tell us about the relationship between the

predictors and a count variable with many zeros.

26
Negative Binomial

• negative binomial is also for zero-inflated count variables.

• It makes slightly different assumptions than the hurdle and
doesn’t use a two-part approach.
• Use the MASS package and the glm.nb() function.
library(MASS)
fit_nb <- glm.nb(sed_zero ~ asthma + race + famsize,
data = df)
summary(fit_nb)

27
Negative Binomial

Note that this model is not really appropriate

because our data is somewhat contrived.

28
Beta Regression

29
Beta Regression

• For outcomes that are bound between a lower and upper

bound
• For example, if we are looking at test scores that are bound
between 0 and 100.
• It is a very flexible method and allows for some extra
analysis regarding the variation.

30
Running Beta Regression

• Use the betareg package.

• But first, we are going to reach a little and create a ficticiously bound
variable in the data set.

## Variable bound between 0 and 1

df$beta_var <- sample(seq(.05, .99, by = .01),
size = length(df$asthma),
replace = TRUE)
library(betareg)
fit_beta <- betareg(beta_var ~ asthma + race + famsize,
data = df)
summary(fit_beta)

31
Running Beta Regression

##
## Call:
## betareg(formula = beta_var ~ asthma + race + famsize, data = df)
##
## Standardized weighted residuals 2:
## Min 1Q Median 3Q Max
## -2.0364 -0.6739 -0.0598 0.6311 2.9235
##
## Coefficients (mean model with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.195018 0.063399 3.076 0.0021 **
## asthma -0.057137 0.043789 -1.305 0.1920
## raceOtherHispanic -0.053873 0.072158 -0.747 0.4553
## raceWhite -0.025079 0.058871 -0.426 0.6701
## raceBlack -0.059966 0.061116 -0.981 0.3265
## raceOther -0.077520 0.065502 -1.183 0.2366
## famsize -0.009472 0.010843 -0.874 0.3824
##
## Phi coefficients (precision model with identity link):
## Estimate Std. Error z value Pr(>|z|)
## (phi) 2.45394 0.04465 54.95 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Type of estimator: ML (maximum likelihood)
## Log-likelihood: 83.2 on 8 Df
## Pseudo R-squared: 0.00112
## Number of iterations: 15 (BFGS) + 1 (Fisher scoring)

32
Beta Regression

• The output provides coefficients and the “Phi” coefficients.

• Both are important parts of using beta regression but we
are not going to discuss it here.

33
Conclusions

34
Conclusions

There are many resources available to learn more about each of

these GLM’s.
As for now, we are going to move on to more complex modeling
where there are clustering or repeated measures in the data.

35
Conclusions

One of the great things about R is that most modeling is very

similar to the basic lm() function.

• In all of these GLM’s the arguments are nearly all the same:
• a formula,
• the data, and
• family of model.

• As you’ll see for Multilevel and Other Models chapters, this

does not change much.
• Having a good start with basic models and GLM’s gets you
ready for nearly every other modeling type in R.

Logistic Regression (With R) : 1 Theory
No ratings yet
Logistic Regression (With R) : 1 Theory
15 pages
Logit Probit
No ratings yet
Logit Probit
66 pages
Multilevel Modeling: GEE vs ME
No ratings yet
Multilevel Modeling: GEE vs ME
35 pages
Logit Regression - R Data Analysis Examples
No ratings yet
Logit Regression - R Data Analysis Examples
12 pages
Sestrada Logistic Regression in R 02172023
No ratings yet
Sestrada Logistic Regression in R 02172023
25 pages
16 GLM2
No ratings yet
16 GLM2
29 pages
Week6 2 GLM2
No ratings yet
Week6 2 GLM2
26 pages
Count Data
No ratings yet
Count Data
5 pages
Stepwise Regression Techniques Explained
No ratings yet
Stepwise Regression Techniques Explained
15 pages
04.laboratory Exercise IV
80% (5)
04.laboratory Exercise IV
5 pages
Seu Ds610 Mod03
No ratings yet
Seu Ds610 Mod03
45 pages
R Course
No ratings yet
R Course
7 pages
Statistical Modelling Assignment II
No ratings yet
Statistical Modelling Assignment II
3 pages
GLM in R
No ratings yet
GLM in R
6 pages
Review of Logistic and Poisson Regression Models
No ratings yet
Review of Logistic and Poisson Regression Models
15 pages
Countdata2018 2
No ratings yet
Countdata2018 2
23 pages
Regression Basics for Epidemiologists
No ratings yet
Regression Basics for Epidemiologists
18 pages
Count Data Regression in R
No ratings yet
Count Data Regression in R
13 pages
James W. Hardin, Joseph M. Hilbe - Generalized Linear Models and Extensions-Stata Press (2018)
100% (1)
James W. Hardin, Joseph M. Hilbe - Generalized Linear Models and Extensions-Stata Press (2018)
789 pages
Stata Commands for Data Analysis
No ratings yet
Stata Commands for Data Analysis
8 pages
Categorical Data Analysis: GLM Basics
100% (1)
Categorical Data Analysis: GLM Basics
53 pages
07 GLM
No ratings yet
07 GLM
49 pages
2 Modele Lineare
No ratings yet
2 Modele Lineare
43 pages
Run Commands
No ratings yet
Run Commands
8 pages
R Data Analysis Techniques
No ratings yet
R Data Analysis Techniques
6 pages
STATA Logistic Regression Guide
No ratings yet
STATA Logistic Regression Guide
4 pages
2-06 Non-Linear Models - Logged Variables and Standardized Coefficients
No ratings yet
2-06 Non-Linear Models - Logged Variables and Standardized Coefficients
28 pages
4 - Logistic Reg 1
No ratings yet
4 - Logistic Reg 1
30 pages
Binary Logistic Regression - 6.2
No ratings yet
Binary Logistic Regression - 6.2
34 pages
Logistic Regression Lab Guide
No ratings yet
Logistic Regression Lab Guide
10 pages
R Regression Analysis Guide
No ratings yet
R Regression Analysis Guide
16 pages
R Illustration 2021 Logistic Regression
No ratings yet
R Illustration 2021 Logistic Regression
18 pages
ECON 2220-E Final Assignment Guide
No ratings yet
ECON 2220-E Final Assignment Guide
5 pages
Group 4
No ratings yet
Group 4
9 pages
ESB2021 Resit With Solution
No ratings yet
ESB2021 Resit With Solution
9 pages
15multiple Linear Regression
No ratings yet
15multiple Linear Regression
168 pages
Linear Regression Analysis in R
100% (1)
Linear Regression Analysis in R
15 pages
R Lab 4
No ratings yet
R Lab 4
7 pages
GLM & Berkeley Admissions Analysis
No ratings yet
GLM & Berkeley Admissions Analysis
24 pages
R Codes
No ratings yet
R Codes
5 pages
Logistic Regression Derivation Explained
No ratings yet
Logistic Regression Derivation Explained
6 pages
Multiple Linear Regression - Prof. Sami Day 1
No ratings yet
Multiple Linear Regression - Prof. Sami Day 1
58 pages
Lec-03 LogisticRegression
No ratings yet
Lec-03 LogisticRegression
32 pages
Stat 5700 HW 2
No ratings yet
Stat 5700 HW 2
15 pages
Logistic Regression for Students
No ratings yet
Logistic Regression for Students
10 pages
Lecture 3
No ratings yet
Lecture 3
18 pages
Generalized Linear Models Guide
No ratings yet
Generalized Linear Models Guide
12 pages
Diabetes Progression Analysis with LASSO
No ratings yet
Diabetes Progression Analysis with LASSO
11 pages
Statistical Modelling For Biomedical Researchers
100% (2)
Statistical Modelling For Biomedical Researchers
544 pages
T3. Logistic Regressions
No ratings yet
T3. Logistic Regressions
3 pages
HW4 Solutions: Problem 6.2
No ratings yet
HW4 Solutions: Problem 6.2
8 pages
Computer Lab 3 MM
No ratings yet
Computer Lab 3 MM
38 pages
PH6205 RTutorial 4
No ratings yet
PH6205 RTutorial 4
5 pages
REgression 1
No ratings yet
REgression 1
19 pages
Lect 12
No ratings yet
Lect 12
36 pages
09 AdvancedPlotting
No ratings yet
09 AdvancedPlotting
57 pages
Tidyverse Data Manipulation Guide
No ratings yet
Tidyverse Data Manipulation Guide
44 pages
01 IntroSlides
No ratings yet
01 IntroSlides
43 pages
R Function Writing Guide
No ratings yet
R Function Writing Guide
36 pages
Representation of Negative Numbers and
No ratings yet
Representation of Negative Numbers and
26 pages
Functional Bid Landscape Forecasting
No ratings yet
Functional Bid Landscape Forecasting
16 pages
Understanding Bayesian Classifiers
No ratings yet
Understanding Bayesian Classifiers
13 pages
Solution Manual For Digital Signal Processing 4th Edition
No ratings yet
Solution Manual For Digital Signal Processing 4th Edition
13 pages
TCS Codevita 2025 Questions
No ratings yet
TCS Codevita 2025 Questions
24 pages
MCQS
No ratings yet
MCQS
2 pages
YOLO Models for Outdoor Obstacle Detection
No ratings yet
YOLO Models for Outdoor Obstacle Detection
20 pages
Cryptography Coursework Guide
No ratings yet
Cryptography Coursework Guide
8 pages
A Review On Sentiment Analysis Using Machine Learning
No ratings yet
A Review On Sentiment Analysis Using Machine Learning
5 pages
Student Database STD ID STD Name Marks Percentage Grade Remark Grade Description Marks
No ratings yet
Student Database STD ID STD Name Marks Percentage Grade Remark Grade Description Marks
18 pages
BR100 Application Setup Document
No ratings yet
BR100 Application Setup Document
1 page
Ann Joshua
No ratings yet
Ann Joshua
20 pages
AI WK 11 Lec 21 22 Student
No ratings yet
AI WK 11 Lec 21 22 Student
23 pages
Max-Distance Sub-Sequence - en
No ratings yet
Max-Distance Sub-Sequence - en
5 pages
IoT Anomaly Detection Lab
No ratings yet
IoT Anomaly Detection Lab
8 pages
Quantum Error Correction Overview
No ratings yet
Quantum Error Correction Overview
22 pages
AI Model Exam for CSE Students
No ratings yet
AI Model Exam for CSE Students
4 pages
Multidimensional Item Response Theory 1st Edition Wes Bonifay PDF Download
No ratings yet
Multidimensional Item Response Theory 1st Edition Wes Bonifay PDF Download
45 pages
Unit Iv
No ratings yet
Unit Iv
23 pages
Constrained Location Algorithm Using TDOA
No ratings yet
Constrained Location Algorithm Using TDOA
3 pages
MATH39001 - Combinatorics and Graph Theory - Exam - Jan-2011
No ratings yet
MATH39001 - Combinatorics and Graph Theory - Exam - Jan-2011
8 pages
(H) 2019 LPP
No ratings yet
(H) 2019 LPP
7 pages
Understanding Radial-Basis Function Networks
0% (1)
Understanding Radial-Basis Function Networks
11 pages
Second Quantization: K K K K K K K
No ratings yet
Second Quantization: K K K K K K K
2 pages
ICIP 2024: Image Processing Conference
No ratings yet
ICIP 2024: Image Processing Conference
2 pages
Goals of Machine Learning in Artificial Intelligence
No ratings yet
Goals of Machine Learning in Artificial Intelligence
3 pages
Introduction To Applied Geostatistics For Geomodelling
No ratings yet
Introduction To Applied Geostatistics For Geomodelling
75 pages
Phone Number Letter Combinations
No ratings yet
Phone Number Letter Combinations
9 pages
Generative AI For Models Development
No ratings yet
Generative AI For Models Development
8 pages
SY IQC Lab Manual
No ratings yet
SY IQC Lab Manual
39 pages