Section 8 P
Konstantin Kashin
Thanks to Jen Pan, Brandon Stewart, Iain Osgood, and Patrick Lam for contributing to this material.
Administrative Issues Zero-Inflated Logistic Regression Counts: Poisson Model Counts: Negative Binomial Model
Outline
Administrative Issues
Replication Paper
Zero-Inflated Logistic Regression
Why Zero-Inflation?
The problem is that some people didn’t even fish! These people
systematically catch zero fish.
The Model
We’re going to assume that whether or not the person fished is the
outcome of a Bernoulli trial.
$$
Y_i = \begin{cases}
0 & \text{with probability } \psi_i \\
\text{Logistic} & \text{with probability } 1 - \psi_i
\end{cases}
$$
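A minimal simulation sketch of this two-stage process may help fix ideas. The function name `simulate_zi_logit` and the values of `psi` and `pi` are hypothetical, chosen only for illustration; `pi` stands for the logistic probability of a 1 among those who did fish.

```python
import random

def simulate_zi_logit(psi, pi, n, rng):
    """Draw n outcomes: a structural zero with probability psi,
    otherwise a Bernoulli(pi) draw from the logistic part."""
    draws = []
    for _ in range(n):
        if rng.random() < psi:
            draws.append(0)  # did not fish: systematically zero
        else:
            draws.append(1 if rng.random() < pi else 0)
    return draws

rng = random.Random(8)   # fixed seed so the sketch is reproducible
psi, pi = 0.3, 0.6       # hypothetical probabilities
y = simulate_zi_logit(psi, pi, 100_000, rng)
share_ones = sum(y) / len(y)
print(share_ones, (1 - psi) * pi)  # simulated vs. implied P(Y = 1)
```

The share of ones should land close to (1 − ψ)π, which is why zeros are more common than the logistic part alone would predict.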
The Model
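$$
P(Y_i = y_i \mid \beta, \psi_i) = \begin{cases}
\psi_i + (1 - \psi_i)\left(1 - \frac{1}{1 + e^{-X_i\beta}}\right) & \text{if } y_i = 0 \\
(1 - \psi_i)\left(\frac{1}{1 + e^{-X_i\beta}}\right) & \text{if } y_i = 1
\end{cases}
$$
$$
\psi_i = \frac{1}{1 + e^{-z_i\gamma}}
$$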
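The likelihood:
$$
L(\beta, \gamma \mid Y) = \prod_{i=1}^{n} \left[\psi_i + (1 - \psi_i)\left(1 - \frac{1}{1 + e^{-X_i\beta}}\right)\right]^{1 - Y_i} \left[(1 - \psi_i)\left(\frac{1}{1 + e^{-X_i\beta}}\right)\right]^{Y_i}, \qquad \psi_i = \frac{1}{1 + e^{-z_i\gamma}}
$$
The log-likelihood:
$$
\ln L(\beta, \gamma \mid Y) = \sum_{i=1}^{n} \left\{ Y_i \ln\left[(1 - \psi_i)\left(\frac{1}{1 + e^{-X_i\beta}}\right)\right] + (1 - Y_i) \ln\left[\psi_i + (1 - \psi_i)\left(1 - \frac{1}{1 + e^{-X_i\beta}}\right)\right] \right\}
$$
$$
= \sum_{i=1}^{n} \left\{ Y_i \ln\left[\left(1 - \frac{1}{1 + e^{-z_i\gamma}}\right)\left(\frac{1}{1 + e^{-X_i\beta}}\right)\right] + (1 - Y_i) \ln\left[\frac{1}{1 + e^{-z_i\gamma}} + \left(1 - \frac{1}{1 + e^{-z_i\gamma}}\right)\left(1 - \frac{1}{1 + e^{-X_i\beta}}\right)\right] \right\}
$$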
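The log-likelihood above translates almost line by line into code. The following is a sketch rather than the code used to produce the estimates in these slides; the function names and the toy data are hypothetical. In practice one would hand the negative of this function to a numerical optimizer.

```python
import math

def inv_logit(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def zi_logit_loglik(beta, gamma, X, Z, y):
    """Log-likelihood of the zero-inflated logistic model.

    beta  : coefficients of the logistic part (one per column of X)
    gamma : coefficients of the zero-inflation part (one per column of Z)
    X, Z  : lists of covariate rows (a leading 1.0 gives an intercept)
    y     : list of 0/1 outcomes
    """
    total = 0.0
    for x_i, z_i, y_i in zip(X, Z, y):
        psi = inv_logit(sum(g * z for g, z in zip(gamma, z_i)))  # P(structural zero)
        pi = inv_logit(sum(b * x for b, x in zip(beta, x_i)))    # P(Y = 1 | fished)
        if y_i == 1:
            total += math.log((1.0 - psi) * pi)
        else:
            total += math.log(psi + (1.0 - psi) * (1.0 - pi))
    return total

# Hypothetical toy data: three observations, intercept plus one covariate each
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
Z = [[1.0, 1.0], [1.0, 0.0], [1.0, 3.0]]
y = [0, 1, 0]
ll = zi_logit_loglik([0.0, 0.0], [0.0, 0.0], X, Z, y)
print(ll)
```

With all coefficients at zero, both ψ and the logistic probability equal 0.5, so each observed 1 contributes ln(0.25) and each observed 0 contributes ln(0.75).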
out$par
[1] 1.507470 -2.686476 1.447307 1.876404 -1.247189
These numbers don’t mean a lot to us, so we can plot the predicted
probabilities of a group having not fished.
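[Figure: densities of the predicted probability of not having fished, for groups of One Person, Two People, Three People, and Four People. x-axis: Probability; y-axis: Density.]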
Counts: Poisson Model
Poisson Distribution
[Figure: Poisson probabilities. x-axis: y, from 0 to 10; y-axis: Pr(Y=y).]
One more time, the probability mass function (PMF) for a random
variable Y that is distributed Pois(λ):
$$
\Pr(Y = y) = \frac{\lambda^y}{y!} e^{-\lambda}
$$
Using a little bit of series trickery (the Taylor series $e^{\lambda} = \sum_{k=0}^{\infty} \lambda^k / k!$), it isn’t too hard to show that
$$
E[Y] = \sum_{y=0}^{\infty} y \cdot \frac{\lambda^y}{y!} e^{-\lambda} = \lambda.
$$
It also turns out that Var(Y) = λ, a feature of the model we will discuss
later on.
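Both moments can be checked numerically by summing over a truncated support. This is a sketch only: the rate `lam = 3.5` and the truncation point of 200 are arbitrary choices, not values from the slides.

```python
import math

def poisson_pmf(y, lam):
    """Pr(Y = y) for Y ~ Pois(lam), computed on the log scale
    so that lam**y and y! never overflow."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

lam = 3.5                # hypothetical rate
support = range(0, 200)  # the tail mass beyond 200 is negligible for this rate

total = sum(poisson_pmf(y, lam) for y in support)
mean = sum(y * poisson_pmf(y, lam) for y in support)
var = sum((y - mean) ** 2 * poisson_pmf(y, lam) for y in support)
print(total, mean, var)  # probabilities sum to 1; mean and variance both match lam
```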
Poisson data arise when some discrete event occurs (possibly multiple
times) at a constant rate over a fixed time period.
$$
Y_i \sim \text{Pois}(\lambda_i)
$$
$$
\lambda_i = \exp(X_i \beta)
$$
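Combining the stochastic and systematic components gives the log-likelihood $\sum_i \left[ y_i X_i\beta - \exp(X_i\beta) - \ln y_i! \right]$. A minimal sketch of that function follows; the name `poisson_loglik` and the toy data are hypothetical, and a real analysis would maximize it numerically or use a packaged GLM routine.

```python
import math

def poisson_loglik(beta, X, y):
    """Log-likelihood of a Poisson regression with lambda_i = exp(X_i beta)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_i))  # linear predictor X_i beta
        lam = math.exp(eta)                          # rate for observation i
        total += y_i * eta - lam - math.lgamma(y_i + 1)
    return total

# Hypothetical toy data: intercept plus one covariate, three observations
X = [[1.0, 0.1], [1.0, 0.2], [1.0, 0.3]]
y = [2, 5, 4]
ll_zero = poisson_loglik([0.0, 0.0], X, y)  # every rate equals 1
ll_alt = poisson_loglik([1.0, 0.5], X, y)
print(ll_zero, ll_alt)
```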
The data: the number of Republican deaths for every month from
1969, the beginning of sustained violence, to 2001 (at which point,
most organized violence had subsided). Also, the unemployment rates
in the two main religious communities.
> summary(mod)$coefficients
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.295875 0.1805327 7.178064 7.070547e-13
cathunemp 1.406498 0.6689819 2.102445 3.551432e-02
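Because the link is exponential, the coefficients are easier to read as expected counts. The sketch below plugs the two estimates from the output above into λ = exp(Xβ). The unemployment values 0.10, 0.20, and 0.30 are hypothetical, and treating `cathunemp` as a proportion is an assumption rather than something stated in the slides.

```python
import math

# Point estimates copied from the coefficient table above
b0, b1 = 1.295875, 1.406498

def expected_deaths(cathunemp):
    """Expected monthly count implied by the fitted model at a given cathunemp."""
    return math.exp(b0 + b1 * cathunemp)

for u in (0.10, 0.20, 0.30):  # hypothetical values, assuming a proportion scale
    print(u, expected_deaths(u))

# A 0.10 increase in cathunemp multiplies the expected count by exp(0.10 * b1),
# whatever the starting level
ratio = expected_deaths(0.30) / expected_deaths(0.20)
print(ratio)
```

This ignores estimation uncertainty; simulating from the sampling distribution of the coefficients would give intervals around these point predictions.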
[Figure: scatterplot of repdeaths (y-axis, 0 to 50) against cathunemp (x-axis).]
[Figure: two histograms, one of E[Y|U] (x-axis from 4.4 to 5.2) and one of Y|U (x-axis from 0 to 15); y-axes labeled Density and Frequency.]
Overdispersion
[Figure: scatterplot of repdeaths (y-axis, 0 to 50) against cathunemp (x-axis).]
Counts: Negative Binomial Model
The variance of a count equals its mean, as the Poisson distribution
requires, only if the probability of an event occurring at any moment is
independent of whether an event has occurred at any other moment, and if
the occurrence rate is constant.
The trick is to assume that λ varies, within the same observation span,
according to a new parameter we will call ζ.
Alternative Parameterization
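$$
Y_i \mid \lambda_i, \zeta_i \sim \text{Poisson}(\zeta_i \lambda_i)
$$
$$
\zeta_i \sim \text{Gamma}\left(\frac{1}{\sigma^2 - 1}, \frac{1}{\sigma^2 - 1}\right)
$$
Note that this Gamma distribution has a mean of 1. Therefore,
Poisson($\zeta_i \lambda_i$) has mean $\lambda_i$. Note that the variance of the Gamma
distribution is $\sigma^2 - 1$. This means that as $\sigma^2$ goes to 1, the distribution
of $\zeta_i$ collapses to a spike over 1.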
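The two claims about the Gamma distribution of ζ can be checked by simulation. This sketch assumes the shape and rate parameterization, so the scale passed to Python's `random.gammavariate` is the reciprocal of the rate; the value `sigma2 = 1.5`, the seed, and the number of draws are arbitrary choices.

```python
import random

random.seed(0)                 # fixed seed so the sketch is reproducible
sigma2 = 1.5                   # hypothetical value of sigma^2
shape = 1.0 / (sigma2 - 1.0)   # first Gamma parameter
scale = sigma2 - 1.0           # reciprocal of the rate 1 / (sigma^2 - 1)

n = 100_000
draws = [random.gammavariate(shape, scale) for _ in range(n)]
zeta_mean = sum(draws) / n
zeta_var = sum((d - zeta_mean) ** 2 for d in draws) / n
print(zeta_mean, zeta_var)     # close to 1 and to sigma2 - 1
```

Rerunning with `sigma2` closer to 1 shrinks the spread of the draws around 1, which is the collapse to a spike described above.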
Alternative Parameterization
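$$
Y_i \sim \text{Negbin}(\lambda_i, \sigma^2)
$$
where
$$
f_{nb}(y_i \mid \lambda_i, \sigma^2) = \frac{\Gamma\left(\frac{\lambda_i}{\sigma^2 - 1} + y_i\right)}{y_i!\,\Gamma\left(\frac{\lambda_i}{\sigma^2 - 1}\right)} \left(\frac{\sigma^2 - 1}{\sigma^2}\right)^{y_i} \left(\sigma^2\right)^{-\frac{\lambda_i}{\sigma^2 - 1}}
$$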
Notes:
1. $\lambda_i > 0$ and $\sigma > 1$
2. $E[Y_i] = \lambda_i$ and $\text{Var}[Y_i] = \lambda_i \sigma^2$. What value of $\sigma^2$ would be
evidence against overdispersion?
3. We still have the same old systematic component: $\lambda_i = \exp(X_i \beta)$.
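The mean and variance in note 2 can be verified directly from the density by summing over a truncated support. A sketch under stated assumptions: the function name `negbin_pmf`, the values `lam = 4.0` and `sigma2 = 3.0`, and the truncation point are all hypothetical.

```python
import math

def negbin_pmf(y, lam, sigma2):
    """Negative binomial probability in the (lambda, sigma^2) parameterization,
    computed on the log scale for numerical stability."""
    r = lam / (sigma2 - 1.0)
    log_p = (math.lgamma(r + y) - math.lgamma(y + 1) - math.lgamma(r)
             + y * math.log((sigma2 - 1.0) / sigma2)
             - r * math.log(sigma2))
    return math.exp(log_p)

lam, sigma2 = 4.0, 3.0   # hypothetical parameter values
support = range(0, 500)  # the tail mass beyond 500 is negligible here

total = sum(negbin_pmf(y, lam, sigma2) for y in support)
mean = sum(y * negbin_pmf(y, lam, sigma2) for y in support)
var = sum((y - mean) ** 2 * negbin_pmf(y, lam, sigma2) for y in support)
print(total, mean, var)  # compare with 1, lam, and lam * sigma2
```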
Estimates
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.2959 0.1805 7.178 7.07e-13 ***
cathunemp 1.4065 0.6690 2.102 0.0355 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Theta: 0.8551
Std. Err.: 0.0754
Overdispersion Handled!
[Figure: scatterplot of repdeaths (y-axis, 0 to 50) against cathunemp (x-axis).]
Other Models