L19 CountDataModels v2

Uploaded by

Gerbaba Guta

Count Data Models

CIVL 7012/8012

In Today’s Class
• Count data models
• Poisson Models
• Overdispersion
• Negative binomial distribution models
• Comparison
• Zero-inflated models
• R-implementation

Count Data
➢ In many phenomena the dependent variable is of the count type, such as:
➢ The number of patents received by a firm in a year
➢ The number of visits to a dentist in a year
➢ The number of speeding tickets received in a year
➢ The underlying variable is discrete, taking only a finite number of non-negative integer values (0, 1, 2, ...).
➢ In many cases the count is 0 for several observations.
➢ Each count is measured over a certain finite time period.

Models for Count Data

➢ Poisson Probability Distribution: Regression models based on this
probability distribution are known as Poisson Regression Models
(PRM).

➢ Negative Binomial Probability Distribution: An alternative to the PRM is
the Negative Binomial Regression Model (NBRM), used to remedy
some of the deficiencies of the PRM, notably its assumption that the mean and variance are equal.

Can we apply OLS to count data?

Patent data from 181 firms

LR90: log(R&D expenditure)

Industry dummies
• AEROSP: Aerospace
• CHEMIST: Chemistry
• Computer: Computer science
• Machines: Instrument engineering
• Vehicles: Automotive engineering
• Reference: Food, fuel, and others
Country dummies
• Japan
• US
• Reference: European countries

Inferences from the example (1)

• R&D expenditure has a positive influence
– A 1% increase in R&D expenditure increases the average number of patents by about 0.73, ceteris paribus
• The chemistry industry received about 47 more patents than the reference category
• Similarly, the vehicles industry received about 191 fewer patents than the reference category
• The country dummy suggests that, on average, US firms received about 77 fewer patents than the reference category

Inferences from the example (2)

• OLS may not be appropriate because the number of patents received by a firm is usually small

Inferences from the example (3)

• The histogram of patent counts is highly skewed to the right
• Coefficient of skewness: 3.3
• Coefficient of kurtosis: 14
• For a normal distribution
– Skewness is 0 and kurtosis is 3
• The counts are far from normal, so we cannot rely on OLS for this count data
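The skewness and kurtosis figures above are standardized third and fourth central moments. Below is a minimal base-R sketch of the simple moment-based versions, applied to simulated right-skewed counts rather than the patent data. It assumes the uncorrected estimator because the slide does not say which one was used. `skewness()` and `kurtosis()` are hypothetical helper names defined here, not base R functions.

```r
# Moment-based sample skewness and kurtosis (no small-sample correction).
# Population values for a normal distribution are 0 and 3.
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5  # third central moment / sd^3
}

kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2    # fourth central moment / sd^4
}

# Simulated right-skewed counts (illustrative only, not the patent data)
set.seed(1)
counts <- rnbinom(500, size = 0.5, mu = 5)
c(skewness = skewness(counts), kurtosis = kurtosis(counts))
```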
Poisson Distribution

Figure: Poisson distribution vs. approximately normal distribution
• Small mean, small count numbers, many zeroes: Poisson regression
• Large mean, large count numbers, few/no zeroes (approximately normal): OLS regression

Poisson Regression Models (1)

➢ If a discrete random variable Y follows the Poisson distribution, its
probability density function (PDF) is given by:

$$f(Y = y_i) = \Pr(Y = y_i) = \frac{e^{-\lambda}\lambda^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots$$

where $f(Y = y_i)$ denotes the probability that the discrete random
variable Y takes the non-negative integer value $y_i$,
and λ is the parameter of the Poisson distribution.
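The PDF above can be checked numerically. This small base-R sketch compares the formula with R's built-in `dpois()`. It uses an illustrative λ = 2, which does not come from the slides, and `poisson_pmf()` is a hypothetical helper name.

```r
# Poisson probability computed directly from the formula
# Pr(Y = y) = exp(-lambda) * lambda^y / y!
poisson_pmf <- function(y, lambda) {
  exp(-lambda) * lambda^y / factorial(y)
}

lambda <- 2                      # illustrative value, not from the slides
y <- 0:6
p_manual <- poisson_pmf(y, lambda)
p_builtin <- dpois(y, lambda)    # base R (stats) implementation

round(cbind(y, p_manual, p_builtin), 4)
```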

Poisson Regression Models (2)

➢ Equidispersion: A distinctive feature of the Poisson distribution is that
the mean and the variance of a Poisson-distributed variable are
equal: $E(Y) = \mathrm{Var}(Y) = \lambda$
➢ If variance > mean, there is overdispersion; if variance < mean, there is underdispersion
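Equidispersion is easy to see by simulation. In this sketch, a large Poisson sample drawn with an illustrative λ = 3 has a sample mean and variance that are both close to λ, so the variance-to-mean ratio is close to 1.

```r
# Simulate Poisson draws and compare the sample mean and variance
set.seed(42)
lambda <- 3                    # illustrative value
y <- rpois(100000, lambda)

c(mean = mean(y), variance = var(y))  # both close to 3
var(y) / mean(y)                      # dispersion ratio, close to 1
```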

Poisson Regression Models (3)

➢ The Poisson regression model can be written as:

$$y_i = E(y_i) + u_i = \lambda_i + u_i$$

• where the $y_i$ are independently distributed as Poisson random variables with mean $\lambda_i$ for each individual, expressed as:

$$\lambda_i = E(y_i \mid X_i) = \exp(B_1 + B_2 X_{2i} + \cdots + B_k X_{ki}) = \exp(X_i B)$$

➢ Taking the exponential of $X_i B$ guarantees that the mean value of the count variable, $\lambda_i$, is positive.
➢ For estimation purposes, the model, estimated by maximum likelihood (ML), can be written as:

$$y_i = \frac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!} + u_i, \quad \lambda_i = \exp(X_i B), \quad y_i = 0, 1, 2, \ldots$$
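Here is a minimal simulation sketch of this model with one uniform regressor and hypothetical coefficients B1 = 0.5 and B2 = 0.8. The data are drawn with $\lambda_i = \exp(B_1 + B_2 X_i)$, and `glm()` with `family = poisson` (log link by default) recovers the coefficients by ML.

```r
set.seed(123)
n <- 5000
x <- runif(n, 0, 2)
B1 <- 0.5; B2 <- 0.8               # hypothetical true coefficients
lambda <- exp(B1 + B2 * x)         # exp() keeps every mean strictly positive
y <- rpois(n, lambda)

fit <- glm(y ~ x, family = poisson)  # log link is the poisson() default
coef(fit)                            # close to 0.5 and 0.8
```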

Solution
• Apply maximum likelihood approach

• Log of likelihood function
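For the model above, with $\lambda_i = \exp(X_i B)$, the Poisson log-likelihood is $\ln L(B) = \sum_i [-\lambda_i + y_i X_i B - \ln(y_i!)]$. The sketch below maximizes it directly with `optim()` on simulated data, using hypothetical true coefficients 0.3 and 0.6, and checks the result against `glm()`. It illustrates the ML idea only. `glm()` itself uses iteratively reweighted least squares, and `poisson_loglik()` and `poisson_score()` are hypothetical helper names.

```r
# Poisson log-likelihood with lambda_i = exp(X_i B):
#   ln L(B) = sum_i [ -lambda_i + y_i * X_i B - log(y_i!) ]
poisson_loglik <- function(B, X, y) {
  eta <- drop(X %*% B)                 # linear predictor X_i B
  sum(-exp(eta) + y * eta - lfactorial(y))
}

# Gradient of ln L(B): X' (y - lambda)
poisson_score <- function(B, X, y) {
  drop(crossprod(X, y - exp(drop(X %*% B))))
}

set.seed(7)
n <- 2000
x <- rnorm(n)
X <- cbind(1, x)                       # intercept column plus regressor
y <- rpois(n, exp(0.3 + 0.6 * x))      # hypothetical true coefficients

# optim() minimizes, so pass the negative log-likelihood and its gradient
ml <- optim(c(0, 0),
            fn = function(B) -poisson_loglik(B, X, y),
            gr = function(B) -poisson_score(B, X, y),
            method = "BFGS", control = list(reltol = 1e-12))

fit <- glm(y ~ x, family = poisson)
rbind(optim = ml$par, glm = unname(coef(fit)))
c(optim = -ml$value, glm = as.numeric(logLik(fit)))
```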


Elasticity
• To provide some insight into the implications
of parameter estimation results, elasticities
are computed to determine the marginal
effects of the independent variables.
• Elasticities provide an estimate of the impact
of a variable on the expected frequency and
are interpreted as the effect of a 1% change in
the variable on the expected frequency $\lambda_i$.

Elasticity Example
• For example, an elasticity of –1.32 means that a 1% increase in the variable reduces the expected frequency by 1.32%.
• Elasticities are a standard way of evaluating the relative impact of each variable in the model.
• Suitable for continuous variables
• Calculated for each individual observation
• Can be averaged over the sample
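For the log-linear Poisson mean $\lambda_i = \exp(X_i B)$, the elasticity with respect to a continuous variable $x_{ik}$ works out to $(\partial\lambda_i/\partial x_{ik})(x_{ik}/\lambda_i) = \beta_k x_{ik}$, so it differs across observations. When a variable enters in logs, as LR90 does in the patent example, the coefficient itself is the elasticity. The sketch below uses simulated data with a hypothetical covariate, and `poisson_elasticity()` is a hypothetical helper name.

```r
# Elasticity of lambda_i = exp(X_i B) with respect to a continuous x_ik:
#   E_ik = (d lambda_i / d x_ik) * (x_ik / lambda_i) = beta_k * x_ik
poisson_elasticity <- function(beta_k, x_k) beta_k * x_k

set.seed(11)
x <- runif(1000, 1, 10)                  # hypothetical continuous covariate
y <- rpois(1000, exp(0.2 - 0.15 * x))
fit <- glm(y ~ x, family = poisson)
b <- coef(fit)

e_i <- poisson_elasticity(b[["x"]], x)   # one elasticity per observation
mean(e_i)                                # sample-average elasticity

# Check for observation 1: a 1% increase in x changes lambda by about e_i[1] %
lam <- function(xv) exp(b[["(Intercept)"]] + b[["x"]] * xv)
pct_1 <- 100 * (lam(x[1] * 1.01) - lam(x[1])) / lam(x[1])
c(numerical = pct_1, elasticity = e_i[1])
```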

Pseudo Elasticity
• What about discrete (dummy) variables?
• The pseudo-elasticity gives the incremental
change in expected frequency caused by a change
in an indicator variable (from 0 to 1).
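With a dummy coefficient β, switching the indicator from 0 to 1 multiplies λ by exp(β). The sketch below shows two common ways of reporting this effect. The first is the percent change 100(exp(β) − 1), which the patent slide uses. The second is the (exp(β) − 1)/exp(β) form of pseudo-elasticity found in some transportation texts; the slides do not state which definition they intend. Both helper names are hypothetical, and β = 0.6464 is the Machines coefficient from the patent Poisson model.

```r
# Effect of an indicator variable (0 -> 1) in lambda_i = exp(X_i B)
pct_change_from_dummy <- function(beta) 100 * (exp(beta) - 1)

# Pseudo-elasticity in the (exp(beta) - 1) / exp(beta) form: the same change,
# expressed relative to lambda when the indicator equals 1
pseudo_elasticity <- function(beta) (exp(beta) - 1) / exp(beta)

beta <- 0.6464               # Machines dummy, patent Poisson model
pct_change_from_dummy(beta)  # about 90.9 (%)
pseudo_elasticity(beta)      # about 0.476
```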

Poisson Regression Goodness-of-Fit Measures
• Likelihood ratio test statistic

• Rho-square statistic
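Both measures can be computed from two log-likelihoods. The likelihood ratio statistic is LR = −2[LL(0) − LL(β)], compared with a χ² distribution whose degrees of freedom equal the number of restricted parameters. The rho-square statistic is ρ² = 1 − LL(β)/LL(0). Here LL(β) is the log-likelihood of the estimated model and LL(0) that of the restricted (e.g., intercept-only) model. The sketch plugs in the two log-likelihoods reported for the patent Poisson model. It assumes that −15822.38 is the restricted model and uses 8 degrees of freedom (LR90 plus seven dummies) for illustration. The helper names are hypothetical.

```r
# Goodness-of-fit helpers (hypothetical names):
#   ll_restricted = LL(0), log-likelihood of the restricted model
#   ll_full       = LL(beta), log-likelihood of the estimated model
lr_stat <- function(ll_restricted, ll_full) -2 * (ll_restricted - ll_full)
rho_square <- function(ll_restricted, ll_full) 1 - ll_full / ll_restricted

ll0 <- -15822.38   # reported restricted log-likelihood (patent Poisson model)
llb <- -5081.331   # reported log-likelihood at convergence

lr <- lr_stat(ll0, llb)      # about 21482.1
rho <- rho_square(ll0, llb)  # about 0.679

# p-value, assuming 8 restricted slope coefficients
p_value <- pchisq(lr, df = 8, lower.tail = FALSE)
c(LR = lr, rho_square = rho, p_value = p_value)
```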

Patent Data with Poisson Model

The LR90 coefficient suggests that a 1%
increase in R&D expenditure increases the
expected number of patents received
by about 0.86%

For the Machines dummy

The expected number of patents received by
the Machines category is
$100(e^{0.6464} - 1) \approx 90.9\%$ higher than
in the reference category

See the likelihood ratio test statistic

$LR = 2[-5081.331 - (-15822.38)] \approx 21482.1$

which shows that the model is significant overall

Poisson Regression Coefficient Interpretation

Example 1: $y_i \sim \text{Poisson}(\exp(2.5 + 0.18X_i))$
$e^{0.18} \approx 1.197$
A one-unit increase in X increases the average number of y by about 19.7%

Example 2: $y_i \sim \text{Poisson}(\exp(2.5 - 0.18X_i))$
$e^{-0.18} \approx 0.835$
A one-unit increase in X decreases the average number of y by about 16.5%
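The arithmetic behind both examples is a one-liner. In a log-linear mean, a one-unit increase in X multiplies the expected count by exp(B2). This sketch reproduces the two factors and checks Example 1 by computing its mean at X = 0 and X = 1. The helper names are hypothetical.

```r
# In a log-linear Poisson mean exp(B1 + B2 * X), raising X by one unit
# multiplies the expected count by exp(B2) (the incidence rate ratio).
rate_ratio <- function(b) exp(b)
pct_change <- function(b) 100 * (exp(b) - 1)

rate_ratio(c(0.18, -0.18))  # about 1.197 and 0.835
pct_change(c(0.18, -0.18))  # about +19.7 and -16.5 (%)

# Direct check with Example 1's mean at X = 0 and X = 1
mean_y <- function(b1, b2, x) exp(b1 + b2 * x)
ratio_ex1 <- mean_y(2.5, 0.18, 1) / mean_y(2.5, 0.18, 0)
ratio_ex1  # equals exp(0.18)
```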

Safety Example (1)


Safety Example (2)

• Mathematical expression

Safety Example (3)

• The model contains a constant and four variables: two average annual daily traffic (AADT) variables (mainline and minor road), median width, and number of driveways.
• The mainline AADT appears to have a smaller influence than the minor road AADT, contrary to what is expected.
• As median width increases, accidents decrease.
• Finally, a larger number of driveways close to the intersection increases the number of intersection injury accidents.
• The signs of the estimated parameters are in line with expectation.

Elasticity

• A 1% increase in the AADT of the major road increases the expected frequency by 1.045%

• A 1% increase in median width decreases the expected frequency by 0.228%

Limitations
• Poisson regression is a powerful tool
• But, like any other model, it has limitations
• Three common analysis errors
– Failure to recognize that the data violate equidispersion (e.g., are overdispersed)
– Failure to recognize that the data are truncated
– Failure to recognize that the data contain a preponderance of zeros

Equidispersion Test (1)

Equidispersion can be tested as follows:
➢ 1. Estimate the Poisson regression model and obtain the predicted value of Y.
➢ 2. Subtract the predicted value from the actual value of Y to obtain the residuals, $e_i$.
➢ 3. Square the residuals, and subtract the actual value of Y from them.
➢ 4. Regress the result from (3) on the squared predicted value of Y.
➢ 5. If the slope coefficient in this regression is statistically significant, reject the assumption of equidispersion.

Equidispersion Test (2)

➢ 6. If the regression coefficient in (5) is positive and
statistically significant, there is overdispersion; if it is
negative, there is underdispersion. In either case, reject the
Poisson model. However, if this coefficient is statistically
insignificant, you need not reject the PRM.
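The steps above translate directly into a short base-R function. This sketch fits the auxiliary regression without an intercept, which is an assumption because the slides do not specify it. `equidispersion_test()` is a hypothetical helper name. The data are simulated: one sample is equidispersed and the other is overdispersed.

```r
# Regression-based equidispersion test following steps 1-6
equidispersion_test <- function(fit) {
  y <- fit$y                     # observed counts used in the Poisson fit
  yhat <- fitted(fit)            # step 1: predicted values of Y
  e <- y - yhat                  # step 2: residuals
  z <- e^2 - y                   # step 3: squared residuals minus actual Y
  aux <- lm(z ~ 0 + I(yhat^2))   # step 4: regress on squared predictions
  summary(aux)$coefficients      # steps 5-6: slope, SE, t value, p-value
}

set.seed(2024)
x <- runif(3000)
mu <- exp(1 + x)
y_pois <- rpois(3000, mu)                   # equidispersed counts
y_nb <- rnbinom(3000, size = 1.5, mu = mu)  # overdispersed counts

res_pois <- equidispersion_test(glm(y_pois ~ x, family = poisson))
res_nb <- equidispersion_test(glm(y_nb ~ x, family = poisson))
res_pois  # slope near 0
res_nb    # positive, significant slope: overdispersion
```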

➢ Standard errors can be corrected by the method of quasi-maximum likelihood estimation (QMLE) or by the generalized linear model (GLM) method.
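In base R, the GLM-style correction can be sketched with `family = quasipoisson`. It keeps the Poisson point estimates and multiplies their standard errors by √φ, where φ is the Pearson χ² statistic divided by the residual degrees of freedom. The QMLE (robust, sandwich) alternative would need an add-on package and is not shown here. The data are simulated and overdispersed.

```r
set.seed(99)
x <- runif(2000)
y <- rnbinom(2000, size = 1, mu = exp(1 + x))  # overdispersed counts

fit_p <- glm(y ~ x, family = poisson)
fit_q <- glm(y ~ x, family = quasipoisson)

phi <- summary(fit_q)$dispersion  # Pearson chi-square / residual df
se <- cbind(poisson = coef(summary(fit_p))[, "Std. Error"],
            quasi   = coef(summary(fit_q))[, "Std. Error"])
se   # same estimates, standard errors inflated by sqrt(phi)
phi
```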

Patent Example Equidispersion


Overdispersion
• Observed variance > theoretical variance

• The variation in the data is beyond what the Poisson model predicts

$$\mathrm{Var}(Y) = \mu + \alpha f(\mu), \quad \alpha: \text{dispersion parameter}$$

• α = 0 indicates equidispersion (Poisson model)

• α > 0 indicates overdispersion (common in practice; negative binomial)

• α < 0 indicates underdispersion (not common)
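Here is a simulation sketch of the variance function above. It assumes the common choice f(μ) = μ², under which α corresponds to `1 / size` in R's `rnbinom()` parameterization. The values μ = 4 and α = 0.5 are hypothetical.

```r
# Var(Y) = mu + alpha * f(mu), assuming f(mu) = mu^2
set.seed(5)
mu <- 4
alpha <- 0.5                      # hypothetical dispersion parameter
y <- rnbinom(200000, size = 1 / alpha, mu = mu)

c(sample_var = var(y), theory = mu + alpha * mu^2)  # both near 12
var(y) / mean(y)                  # well above 1: overdispersion
```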

Negative Binomial vs. Poisson

• Many zeroes, small mean, small count numbers: Poisson regression
• Many zeroes, small mean, more variability in count numbers: NB regression

Negative Binomial vs. Poisson

• Many zeroes, large mean: NB regression
• Few/no zeroes, large mean: OLS regression

Negative Binomial Regression Model

NB Probability Distribution
• One formulation of the negative binomial distribution
can be used to model count data with overdispersion.
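One widely used formulation (often called NB2) has mean $\lambda_i$ and dispersion parameter α, with θ = 1/α. It may differ in form from the one intended on this slide:

$$P(y_i) = \frac{\Gamma(y_i + \theta)}{\Gamma(\theta)\, y_i!}\left(\frac{\theta}{\theta + \lambda_i}\right)^{\theta}\left(\frac{\lambda_i}{\theta + \lambda_i}\right)^{y_i}$$

The sketch evaluates this formula on the log scale and checks it against R's `dnbinom()` in its `size`/`mu` parameterization (size = 1/α). `nb_pmf()` is a hypothetical helper name, and λ = 3, α = 0.8 are illustrative values.

```r
# NB2 probability with mean lambda and dispersion alpha (theta = 1 / alpha)
nb_pmf <- function(y, lambda, alpha) {
  theta <- 1 / alpha
  exp(lgamma(y + theta) - lgamma(theta) - lfactorial(y) +
        theta * log(theta / (theta + lambda)) +
        y * log(lambda / (theta + lambda)))
}

y <- 0:10
lambda <- 3; alpha <- 0.8          # illustrative values
p <- nb_pmf(y, lambda, alpha)
all.equal(p, dnbinom(y, size = 1 / alpha, mu = lambda))
```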

Negative Binomial Regression Models

➢ For the Negative Binomial Probability Distribution (NBPD), we have:

$$\sigma^2 = \mu + \frac{\mu^2}{r}; \quad \mu > 0,\; r > 0$$

where σ² is the variance, μ is the mean and r is a parameter of the
model.
➢ The variance is always larger than the mean, in contrast to the Poisson
PDF (PPD).
➢ The NBPD is thus more suitable than the PPD for overdispersed count data.
➢ As r → ∞ and p (the probability of success) → 1, the NBPD
approaches the Poisson PDF, assuming the mean μ stays constant.
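Both claims above can be checked numerically. With μ held fixed, the variance computed from the NB probabilities matches μ + μ²/r, and the largest gap between the NB and Poisson probabilities shrinks as r grows. In R's `dnbinom()`, the `size` argument plays the role of r. The sketch uses the illustrative values μ = 2 and r ∈ {1, 10, 100, 10000}.

```r
mu <- 2
y <- 0:200                     # wide enough that the truncated tail is negligible
r_values <- c(1, 10, 100, 1e4)

nb_var <- sapply(r_values, function(r) {
  p <- dnbinom(y, size = r, mu = mu)
  sum((y - mu)^2 * p)          # variance computed from the probabilities
})
max_gap <- sapply(r_values, function(r) {
  max(abs(dnbinom(y, size = r, mu = mu) - dpois(y, mu)))
})

data.frame(r = r_values, nb_var, formula = mu + mu^2 / r_values, max_gap)
```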

NB of the Patent Data


NB of the Safety Example


Implementation in R
Poisson Model
glm(Y ~ X, family = poisson)
Negative Binomial Model (MASS package)
library(MASS)
glm.nb(Y ~ X)
Hurdle Models (pscl package; regressors after | enter the zero part)
library(pscl)
hurdle(Y ~ X | X1, link = "logit", dist = "poisson")
hurdle(Y ~ X | X1, link = "logit", dist = "negbin")
Zero-Inflated Models (pscl package)
zeroinfl(Y ~ X | X1, link = "logit", dist = "poisson")
zeroinfl(Y ~ X | X1, link = "logit", dist = "negbin")

Poisson Regression Analysis in SPSS
No ratings yet
Poisson Regression Analysis in SPSS
34 pages
Cópia de Aula5 - Contagem
No ratings yet
Cópia de Aula5 - Contagem
28 pages
Chapter 11 Generalized
No ratings yet
Chapter 11 Generalized
28 pages
Bayesian Poisson Regression Guide
No ratings yet
Bayesian Poisson Regression Guide
122 pages
The Poisson Regression Model
No ratings yet
The Poisson Regression Model
6 pages
CE687A Lecture23
No ratings yet
CE687A Lecture23
32 pages
Count Models Poisson NB
No ratings yet
Count Models Poisson NB
10 pages
Poisson Regression - Stata Data Analysis Examples
No ratings yet
Poisson Regression - Stata Data Analysis Examples
12 pages
Shorten - Count Data Analysis
No ratings yet
Shorten - Count Data Analysis
24 pages
CS109/Stat121/AC209/E-109 Data Science: Statistical Models
No ratings yet
CS109/Stat121/AC209/E-109 Data Science: Statistical Models
26 pages
Countdata2018 2
No ratings yet
Countdata2018 2
23 pages
Poisson Regression
No ratings yet
Poisson Regression
12 pages
Modeling Count Data
No ratings yet
Modeling Count Data
6 pages
Lecture 11: Alternatives To OLS With Limited Dependent Variables, Part 2
No ratings yet
Lecture 11: Alternatives To OLS With Limited Dependent Variables, Part 2
42 pages
Poisson Regression Guide
No ratings yet
Poisson Regression Guide
15 pages
TCRM CountData
No ratings yet
TCRM CountData
43 pages
1 s2.0 S0378375805000285 Main
No ratings yet
1 s2.0 S0378375805000285 Main
14 pages
Count Data Models in SAS
No ratings yet
Count Data Models in SAS
12 pages
Zero-Inflated Count Models with COUNTREG
No ratings yet
Zero-Inflated Count Models with COUNTREG
11 pages
Poisson and Negative Binomial Model Fitting
No ratings yet
Poisson and Negative Binomial Model Fitting
15 pages
Poisson Regression Analysis in Stata
No ratings yet
Poisson Regression Analysis in Stata
23 pages
QM Formula and Statistical Concepts
No ratings yet
QM Formula and Statistical Concepts
31 pages
Understanding Probability and Distributions
No ratings yet
Understanding Probability and Distributions
7 pages
Actuarial Society of India: Examinations
No ratings yet
Actuarial Society of India: Examinations
6 pages
Chap11 Generalized Linear Models For Nonnormal Response
No ratings yet
Chap11 Generalized Linear Models For Nonnormal Response
41 pages
Experiment 6
No ratings yet
Experiment 6
7 pages
Comprehensive Econometric Models Summary
No ratings yet
Comprehensive Econometric Models Summary
3 pages
Overdispersion Models and Estimation
No ratings yet
Overdispersion Models and Estimation
20 pages
Count Data Models Explained
No ratings yet
Count Data Models Explained
7 pages
Words of Wisdom
No ratings yet
Words of Wisdom
17 pages
An Illustrated Guide To The Poisson Regression Model - by Sachin Date - Towards Data Science
No ratings yet
An Illustrated Guide To The Poisson Regression Model - by Sachin Date - Towards Data Science
25 pages
STAT270 Formula Booklet Vretta Updated
No ratings yet
STAT270 Formula Booklet Vretta Updated
10 pages
16 GLM2
No ratings yet
16 GLM2
29 pages
Chapter-09 Count Data, Poisson Regression, and Log-Linear Model
No ratings yet
Chapter-09 Count Data, Poisson Regression, and Log-Linear Model
15 pages
ST2187 - Block 6 Common Probability Distributions in Business Applications
No ratings yet
ST2187 - Block 6 Common Probability Distributions in Business Applications
15 pages
Statistical Modeling Notes
No ratings yet
Statistical Modeling Notes
25 pages
Module01 ProbabilityAndHypothesisTesting
No ratings yet
Module01 ProbabilityAndHypothesisTesting
62 pages
Error and Uncertainty: General Statistical Principles
No ratings yet
Error and Uncertainty: General Statistical Principles
8 pages
CS1B April 2024
No ratings yet
CS1B April 2024
9 pages
D2 Basic Stat
No ratings yet
D2 Basic Stat
53 pages
Analyzing Random Samples and Hypotheses
No ratings yet
Analyzing Random Samples and Hypotheses
45 pages
Week6 2 GLM2
No ratings yet
Week6 2 GLM2
26 pages
Probability Distributions in Modeling
No ratings yet
Probability Distributions in Modeling
9 pages
Understanding Standard Curve Statistics
No ratings yet
Understanding Standard Curve Statistics
2 pages
Poisson Regression Models
No ratings yet
Poisson Regression Models
14 pages
Lect 12
No ratings yet
Lect 12
36 pages
Lecture 6
No ratings yet
Lecture 6
76 pages
Probability Distributions Overview
No ratings yet
Probability Distributions Overview
45 pages
Normal, Binomial, Poisson, and Exponential Distributions
No ratings yet
Normal, Binomial, Poisson, and Exponential Distributions
39 pages
MS Theory Exam Study Guide
No ratings yet
MS Theory Exam Study Guide
50 pages
A-Level Statistics Revision Guide
No ratings yet
A-Level Statistics Revision Guide
9 pages
2021 Stat Notes
No ratings yet
2021 Stat Notes
162 pages
Artificial Intelligence and Machine Learning
No ratings yet
Artificial Intelligence and Machine Learning
55 pages
Probability and Statistics Questions Guide
No ratings yet
Probability and Statistics Questions Guide
23 pages
AP Statistics Study Guide
100% (2)
AP Statistics Study Guide
12 pages
Lecture2 Math ML Review
No ratings yet
Lecture2 Math ML Review
87 pages
Review Statistics
No ratings yet
Review Statistics
24 pages
Modeling Count Data. ISBN 1107611253, 978-1107611252
100% (27)
Modeling Count Data. ISBN 1107611253, 978-1107611252
23 pages
Collins Statis 2
No ratings yet
Collins Statis 2
147 pages
Econometrics: Dr. Sayyid Salman Rizavi
0% (1)
Econometrics: Dr. Sayyid Salman Rizavi
23 pages
Chapter 7 Confidence Interval and Sample Mean A
No ratings yet
Chapter 7 Confidence Interval and Sample Mean A
37 pages
Multiple Linear Regression: Chapter 12
No ratings yet
Multiple Linear Regression: Chapter 12
49 pages
MANOVA Analysis in SPSS
No ratings yet
MANOVA Analysis in SPSS
5 pages
Least Squares for Engineers
No ratings yet
Least Squares for Engineers
13 pages
Outreg 2
No ratings yet
Outreg 2
34 pages
4th Semester - Econometrics For Finance-S12025
No ratings yet
4th Semester - Econometrics For Finance-S12025
8 pages
Fundamentals of Statistical Signal Processing Estimation 3001q9c4fj
No ratings yet
Fundamentals of Statistical Signal Processing Estimation 3001q9c4fj
5 pages
2023 Past Year Question Paper
No ratings yet
2023 Past Year Question Paper
6 pages
Point Estimation of Parameters and Sampling Distributions: Chapter 7 (Cont)
No ratings yet
Point Estimation of Parameters and Sampling Distributions: Chapter 7 (Cont)
14 pages
Regularization: Ridge Regression and The LASSO: Statistics 305: Autumn Quarter 2006/2007
No ratings yet
Regularization: Ridge Regression and The LASSO: Statistics 305: Autumn Quarter 2006/2007
56 pages
Sample Size Determination and Confidence Interval Derivation For Exponential Distribution
No ratings yet
Sample Size Determination and Confidence Interval Derivation For Exponential Distribution
6 pages
Mangaldan Population Statistics Lesson
100% (1)
Mangaldan Population Statistics Lesson
14 pages
Loss Given Default As A Function of The Default Rate
No ratings yet
Loss Given Default As A Function of The Default Rate
6 pages
Correlation and Regression 2020
No ratings yet
Correlation and Regression 2020
63 pages
Economtric 2 Eqution
No ratings yet
Economtric 2 Eqution
64 pages
V64i04 PDF
No ratings yet
V64i04 PDF
34 pages
Powerpoint - Regression and Correlation Analysis
100% (1)
Powerpoint - Regression and Correlation Analysis
38 pages
Efron 1994
100% (1)
Efron 1994
14 pages
Data Analysis: Knowledge and Attitude Stats
No ratings yet
Data Analysis: Knowledge and Attitude Stats
3 pages
ML4 Linear Models
No ratings yet
ML4 Linear Models
34 pages
Curve Fitting
No ratings yet
Curve Fitting
21 pages
Bayesian Logistic Regression Overview
No ratings yet
Bayesian Logistic Regression Overview
4 pages
Chapter 15
No ratings yet
Chapter 15
43 pages
Module Details 51110818
No ratings yet
Module Details 51110818
4 pages
Regression Analysis Essentials
No ratings yet
Regression Analysis Essentials
43 pages
Time Series Analysis by State Space Methods 2nd Edition Durbin J. - Own The Complete Ebook With All Chapters in PDF Format
100% (8)
Time Series Analysis by State Space Methods 2nd Edition Durbin J. - Own The Complete Ebook With All Chapters in PDF Format
47 pages
Reporting Uniaxial Strength Data and Estimating Weibull Distribution Parameters For Advanced Ceramics
No ratings yet
Reporting Uniaxial Strength Data and Estimating Weibull Distribution Parameters For Advanced Ceramics
17 pages
NMEA Tool for Enhanced GPS Accuracy
No ratings yet
NMEA Tool for Enhanced GPS Accuracy
8 pages
Properties of Estimators Classnotes
No ratings yet
Properties of Estimators Classnotes
13 pages