Count Data Models
CIVL 7012/8012
In Today’s Class
• Count data models
• Poisson Models
• Overdispersion
• Negative binomial distribution models
• Comparison
• Zero-inflated models
• R-implementation
Count Data
➢ In many phenomena the dependent variable is of the count
type, such as:
➢The number of patents received by a firm in a year
➢The number of visits to a dentist in a year
➢The number of speeding tickets received in a year
➢ The underlying variable is discrete, taking only a finite number of
non-negative integer values.
➢In many cases the count is 0 for several observations
➢Each count is measured over a certain finite time
period.
Models for Count Data
➢ Poisson Probability Distribution: Regression models based on this
probability distribution are known as Poisson Regression Models
(PRM).
➢ Negative Binomial Probability Distribution: An alternative to PRM is
the Negative Binomial Regression Model (NBRM), used to remedy
some of the deficiencies of the PRM.
Can we apply OLS to count data?
Patent data from 181 firms
LR90: log(R&D expenditure)
Industry dummies
• AEROSP: Aerospace
• CHEMIST: Chemistry
• Computer: Computer science
• Machines: Instrumental engineering
• Vehicles: Automotive engineering
• Reference: Food, fuel, and others
Country dummies
• Japan
• US
• Reference: European countries
Inferences from the example (1)
• R&D expenditure has a positive influence
– A 1% increase in R&D expenditure increases the expected number of
patents by about 0.73 patents, ceteris paribus
• Chemistry firms received on average 47 more patents than
the reference category
• Similarly, firms in the vehicles industry received on average 191 fewer
patents than the reference category
• The country dummy suggests that on average US firms
received 77 fewer patents than the reference
category
Inferences from the example (2)
• OLS may not be appropriate as the number of patents
received by firms is usually a small number
Inferences from the example (2)
• The histogram is highly skewed to the right
• Coefficient of skewness: 3.3
• Coefficient of kurtosis: 14
• For a normal distribution
– Skewness is 0 and kurtosis is 3
• OLS is therefore not appropriate for count data like these
Poisson distribution ≈ normal distribution (large mean)
Mean        Counts               Zeros             Suggested model
Small mean  Small count numbers  Many zeroes       Poisson regression
Large mean  Large count numbers  Few or no zeroes  OLS regression
Poisson Regression Models (1)
➢ If a discrete random variable Y follows the Poisson distribution, its
probability density function (PDF) is given by:
\[
f(Y = y_i) = \Pr(Y = y_i) = \frac{e^{-\lambda_i}\,\lambda_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots
\]
where f(Y = y_i) denotes the probability that the discrete random
variable Y takes the non-negative integer value y_i,
and λ_i is the parameter of the Poisson distribution.
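As a quick check of the formula above, the following R sketch evaluates the Poisson probabilities directly and compares them with base R's `dpois()`. The value `lambda = 2.5` is an arbitrary illustration, not a value from these slides.

```r
# Poisson probabilities from the formula, checked against dpois().
# lambda = 2.5 is an arbitrary illustrative value (assumption).
lambda <- 2.5
y <- 0:10

p_formula <- exp(-lambda) * lambda^y / factorial(y)
p_builtin <- dpois(y, lambda)

# The two columns should agree up to floating-point error.
print(round(cbind(y, p_formula, p_builtin), 4))
print(all.equal(p_formula, p_builtin))
```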
Poisson Regression Models (2)
➢ Equidispersion: A unique feature of the Poisson distribution is that
the mean and the variance of a Poisson-distributed variable are the
same
➢ If variance > mean, there is overdispersion
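The equidispersion property can be illustrated numerically. The sketch below computes the mean and variance of a Poisson variable from its probabilities; `lambda = 4` is an assumed value chosen only for illustration.

```r
# Mean and variance of a Poisson variable computed from its probabilities.
# lambda = 4 is an arbitrary illustrative value (assumption).
lambda <- 4
y <- 0:200                       # the tail beyond 200 is negligible for lambda = 4
p <- dpois(y, lambda)

pois_mean <- sum(y * p)
pois_var  <- sum((y - pois_mean)^2 * p)

# Both quantities should equal lambda up to numerical error.
cat("mean =", pois_mean, " variance =", pois_var, "\n")
```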
Poisson Regression Models (3)
➢ The Poisson regression model can be written as:
\[
y_i = E(y_i) + u_i = \lambda_i + u_i
\]
• where the y_i are independently distributed as Poisson
random variables with mean λ_i for each individual, expressed as:
\[
\lambda_i = E(y_i \mid X_i) = \exp[B_1 + B_2 X_{2i} + \cdots + B_k X_{ki}] = \exp(BX)
\]
➢ Taking the exponential of BX guarantees that the mean value
of the count variable, λ_i, is positive.
➢ For estimation purposes, the model, estimated by ML, can be
written as:
\[
y_i = \frac{e^{-\lambda_i}\,\lambda_i^{y_i}}{y_i!} + u_i, \qquad y_i = 0, 1, 2, \ldots
\]
with λ_i = exp(BX).
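A minimal sketch of fitting such a model in R, assuming simulated data: the variable names, sample size, and true coefficients (0.5 and 0.8) are hypothetical and chosen only to show that the fitted mean is the exponential of the linear predictor and is therefore positive.

```r
# Poisson regression on simulated data (all names and values are illustrative).
set.seed(1)
n <- 500
x <- runif(n, 0, 2)
lambda_true <- exp(0.5 + 0.8 * x)        # mean is exp(linear predictor), so > 0
y <- rpois(n, lambda_true)

fit <- glm(y ~ x, family = poisson)      # log link is the default for poisson
print(coef(fit))                         # estimates should be near 0.5 and 0.8

# Fitted means equal exp(B1 + B2 * x) and are therefore always positive.
lambda_hat <- exp(coef(fit)[1] + coef(fit)[2] * x)
print(all.equal(unname(lambda_hat), unname(fitted(fit))))
print(all(lambda_hat > 0))
```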
Solution
• Apply maximum likelihood approach
• Log of likelihood function
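For reference, the standard Poisson likelihood and log-likelihood, with λ_i = exp(BX_i), can be written as follows. The notation L(B) and LL(B) is introduced here for illustration and may differ from the notation used on the original slide.

```latex
L(B) = \prod_{i=1}^{n} \frac{e^{-\lambda_i}\,\lambda_i^{y_i}}{y_i!},
\qquad \lambda_i = \exp(B X_i)

LL(B) = \ln L(B) = \sum_{i=1}^{n} \left[ -\exp(B X_i) + y_i\, B X_i - \ln(y_i!) \right]
```

The maximum likelihood estimate of B is the value that maximizes LL(B).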
Elasticity
• To provide some insight into the implications
of parameter estimation results, elasticities
are computed to determine the marginal
effects of the independent variables.
• Elasticities provide an estimate of the impact
of a variable on the expected frequency and
are interpreted as the effect of a 1% change in
the variable on the expected frequency λ_i
Elasticity-Example
• For example, an elasticity of –1.32 is interpreted to
mean that a 1% increase in the variable reduces the
expected frequency by 1.32%.
• Elasticities are the correct way of evaluating the
relative impact of each variable in the model.
• Suitable for continuous variables
• Calculated for each individual observation
• Can be calculated as an average for the sample
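For the exponential mean function λ_i = exp(BX_i), the elasticity with respect to a continuous variable x_ik takes a simple form. The expression below is the standard result; the symbol E and the sample-average notation are introduced here for illustration.

```latex
E^{\lambda_i}_{x_{ik}} = \frac{\partial \lambda_i}{\partial x_{ik}} \cdot \frac{x_{ik}}{\lambda_i}
                       = B_k\, \lambda_i \cdot \frac{x_{ik}}{\lambda_i}
                       = B_k\, x_{ik}

\bar{E}_{x_k} = \frac{1}{n} \sum_{i=1}^{n} B_k\, x_{ik}
```

When a regressor enters the model in logarithmic form, ln(x_ik), the same derivation gives an elasticity equal to the coefficient B_k itself.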
Pseudo Elasticity
• What happens for discrete (dummy) variables?
• The pseudo-elasticity gives the incremental
change in frequency caused by changes in the
indicator variables.
Poisson Regression Goodness of
fit measures
• Likelihood ratio test statistics
• Rho-square statistics
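The two measures are commonly defined as below. The subscripts R and U (restricted and unrestricted model) and the baseline LL(0) are notation assumed here; conventions for the baseline differ, for example all parameters set to zero or a constant-only model.

```latex
X^2 = -2\left[ LL(B_R) - LL(B_U) \right]
\quad \text{(chi-square distributed, with degrees of freedom equal to the number of restrictions)}

\rho^2 = 1 - \frac{LL(B)}{LL(0)}
```

A larger X² gives stronger evidence against the restricted model, and ρ² closer to 1 indicates a better fit relative to the baseline.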
Patent Data with Poisson Model
The LR90 coefficient suggests that a 1% increase in R&D expenditure
increases the expected number of patents by about 0.86%.
For the machines dummy: the expected number of patents in the
machines category is 100(exp(0.6464) − 1) ≈ 90.9% higher than in
the reference category.
Likelihood ratio test statistic: 2(−5081.331 − (−15822.38)) = 21482.10,
which shows overall model significance.
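The two calculations on this slide can be reproduced with a few lines of R. This is only an arithmetic sketch using the numbers quoted above; the variable names are hypothetical, and treating −15822.38 as the restricted-model log-likelihood is an assumption based on how the statistic is written.

```r
# Values quoted on the slide.
b_machines <- 0.6464
ll_full    <- -5081.331      # log-likelihood of the fitted model
ll_null    <- -15822.38      # log-likelihood of the restricted model (assumed)

pct_change <- 100 * (exp(b_machines) - 1)
lr_stat    <- 2 * (ll_full - ll_null)

cat("Percent change for machines dummy:", round(pct_change, 2), "\n")
cat("Likelihood ratio statistic:", round(lr_stat, 2), "\n")
```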
Poisson Regression Coefficient Interpretation
Example 1: yi ~ Poisson(exp(2.5 + 0.18Xi))
exp(0.18) ≈ 1.197
A one-unit increase in X increases the average number of y by about 19.7%.
Example 2: yi ~ Poisson(exp(2.5 − 0.18Xi))
exp(−0.18) ≈ 0.835
A one-unit increase in X decreases the average number of y by about 16.5%.
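The multiplicative interpretation can be verified directly: the ratio of expected counts at X + 1 and X equals the exponential of the coefficient, whatever the starting value of X. In the sketch below, `x0 = 3` is an arbitrary starting value assumed for illustration.

```r
# Expected counts for Example 1 at two values of X that differ by one unit.
# x0 = 3 is an arbitrary illustrative value (assumption).
b0 <- 2.5
b1 <- 0.18
x0 <- 3
lambda_at_x0      <- exp(b0 + b1 * x0)
lambda_at_x0plus1 <- exp(b0 + b1 * (x0 + 1))

ratio <- lambda_at_x0plus1 / lambda_at_x0   # equals exp(b1) for any x0
print(c(ratio = ratio, exp_b1 = exp(b1)))
print(100 * (exp(c(0.18, -0.18)) - 1))      # percent change for Examples 1 and 2
```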
Safety Example (1)
Safety Example (2)
• Mathematical expression
Safety Example (3)
• The model contains a constant and four variables
– two average annual daily traffic (AADT) variables, median
width, and number of driveways.
• The mainline AADT appears to have a smaller influence
than the minor road AADT, contrary to what is
expected.
• Also, as median width increases, accidents decrease.
• Finally, the number of driveways close to the
intersection increases the number of intersection
injury accidents.
• The signs of the estimated parameters are in line with
expectation.
Elasticity
• 1% increase in AADT of the major road
increases the expected frequency by 1.045%
• 1% increase in median width decreases the
expected frequency by 0.228%
Limitations
• Poisson regression is a powerful tool
• But like any other model, it has limitations
• Three common analysis errors
– Failure to check whether equidispersion holds (overdispersion or underdispersion)
– Failure to recognize that the data are truncated
– Failure to recognize that the data contain a preponderance of zeros
Equidispersion Test (1)
Equidispersion can be tested as follows:
➢ 1. Estimate Poisson regression model and obtain the
predicted value of Y.
➢ 2. Subtract the predicted value from the actual value of Y to
obtain the residuals, ei.
➢ 3. Square the residuals and subtract the actual Y from them,
giving ei² − Yi.
➢ 4. Regress the result from (3) on the predicted value of Y
squared.
➢ 5. If the slope coefficient in this regression is statistically
significant, reject the assumption of equidispersion.
Equidispersion Test (2)
➢ 6. If the regression coefficient in (5) is positive and
statistically significant, there is overdispersion. If it is
negative and statistically significant, there is underdispersion.
In either case, reject the
Poisson model. However, if this coefficient is statistically
insignificant, you need not reject the PRM.
➢ Standard errors can be corrected by quasi-maximum likelihood
estimation (QMLE) or by the generalized linear model (GLM) method.
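The test steps above can be sketched in R as follows. The data are simulated and deliberately overdispersed, so all names and parameter values are hypothetical; running the auxiliary regression without an intercept is an assumption, since the steps do not state whether one is included. The `quasipoisson` family is shown as one base R route to dispersion-adjusted standard errors.

```r
# Auxiliary-regression check for equidispersion on simulated data.
# The data are deliberately overdispersed (negative binomial), so the
# check should flag overdispersion. All names and values are illustrative.
set.seed(42)
n <- 1000
x <- runif(n, 0, 2)
y <- rnbinom(n, size = 1, mu = exp(0.5 + 0.6 * x))

fit  <- glm(y ~ x, family = poisson)     # step 1: Poisson regression
yhat <- fitted(fit)                      #         predicted values
e    <- y - yhat                         # step 2: residuals
z    <- e^2 - y                          # step 3: squared residuals minus actual Y

# Step 4: regress z on yhat^2. Omitting the intercept is an assumption here.
aux <- lm(z ~ 0 + I(yhat^2))
print(summary(aux)$coefficients)         # steps 5-6: sign and significance of slope

# One way to obtain dispersion-adjusted standard errors in base R.
fit_quasi <- glm(y ~ x, family = quasipoisson)
print(summary(fit_quasi)$dispersion)     # values well above 1 suggest overdispersion
```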
Patent Example Equidispersion
Overdispersion
• Observed variance > theoretical variance
• The variation in the data exceeds what the Poisson model predicts
Var(Y) = μ + α · f(μ), where α is the dispersion parameter
• α = 0 indicates equidispersion (Poisson model)
• α > 0 indicates overdispersion (common in practice; negative binomial)
• α < 0 indicates underdispersion (not common)
Negative Binomial vs. Poisson
Zeros        Mean        Counts                             Suggested model
Many zeroes  Small mean  Small count numbers                Poisson regression
Many zeroes  Small mean  More variability in count numbers  NB regression
Negative Binomial vs. Poisson
Zeros             Mean        Suggested model
Many zeroes       Large mean  NB regression
Few or no zeroes  Large mean  OLS regression
Negative Binomial Regression Model
NB Probability Distribution
• One formulation of the negative binomial distribution
can be used to model count data with over-dispersion
Negative Binomial Regression Models
➢ For the Negative Binomial Probability Distribution (NBPD), we have:
\[
\sigma^2 = \mu + \frac{\mu^2}{r}, \qquad \mu > 0,\ r > 0
\]
where σ² is the variance, μ is the mean and r is a parameter of the
model.
➢ The variance is always larger than the mean, in contrast to the Poisson
PDF (PPD).
➢ The NBPD is thus more suitable than the PPD for overdispersed count data.
➢ As r → ∞ and p (the probability of success) → 1, the NBPD
approaches the Poisson PDF, assuming the mean μ stays constant.
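The variance relationship and the Poisson limit can be checked by simulation. In the sketch below, `mu = 4` and `r = 2` are assumed illustrative values; base R's `rnbinom()` is used in its mean parameterization, where the `size` argument plays the role of r.

```r
# Simulated check of the negative binomial variance formula.
# mu = 4 and r = 2 are arbitrary illustrative values (assumptions).
set.seed(7)
mu <- 4
r  <- 2
y  <- rnbinom(100000, size = r, mu = mu)   # 'size' plays the role of r

cat("sample mean      :", mean(y), "\n")
cat("sample variance  :", var(y), "\n")
cat("mu + mu^2 / r    :", mu + mu^2 / r, "\n")

# With a very large r the variance approaches the mean (Poisson limit).
y_big_r <- rnbinom(100000, size = 1e6, mu = mu)
cat("variance, r = 1e6:", var(y_big_r), "\n")
```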
NB of the Patent Data
NB of the Safety Example
Implementation in R
Poisson Model
glm(Y ~ X, family = poisson)
Negative Binomial Model (package MASS)
glm.nb(Y ~ X)
Hurdle Models (package pscl)
hurdle(Y ~ X | X1, link = "logit", dist = "poisson")
hurdle(Y ~ X | X1, link = "logit", dist = "negbin")
Zero-Inflated Models (package pscl)
zeroinfl(Y ~ X | X1, link = "logit", dist = "poisson")
zeroinfl(Y ~ X | X1, link = "logit", dist = "negbin")
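A minimal end-to-end sketch of the first two calls, assuming simulated overdispersed counts: the data, variable names, and parameter values are hypothetical, and comparing the fits by AIC is one common choice rather than the method prescribed in these slides.

```r
# Poisson vs. negative binomial fits on simulated overdispersed counts.
# Data, names, and parameter values are illustrative assumptions.
library(MASS)                            # provides glm.nb()

set.seed(123)
n <- 500
X <- runif(n, 0, 2)
Y <- rnbinom(n, size = 1.5, mu = exp(0.3 + 0.7 * X))

fit_pois <- glm(Y ~ X, family = poisson)
fit_nb   <- glm.nb(Y ~ X)

print(AIC(fit_pois, fit_nb))             # lower AIC indicates the preferred fit
print(fit_nb$theta)                      # estimated NB parameter (r in these slides)
print(summary(fit_nb)$coefficients)
```

Because the simulated counts are overdispersed, the negative binomial fit would be expected to have the lower AIC in this example.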