PH 1700 Session 4b: Poisson - Point and Interval Estimates and One-Sample Inference
Rosner 6.9 & 7.10
Point Estimation and Exact Interval Methods for the Poisson
Chapter 6, Section 9
Summary
• Poisson
• Point Estimation
• Exact Interval Estimation
• One Sample Inference
• SMR
Estimation for Poisson
• The Poisson distribution is used to model the number of occurrences of an event in a given time period.
• We can often apply the Poisson distribution to events occurring among people who are followed over a given time.
• A common unit of follow-up time is the person-year: one person followed for one year.
• Example: a study of 10 people, each followed for 2 years, has a total of 20 person-years.
Point Estimate for Poisson
• Assume the number of events $X$ over $T$ person-years is Poisson distributed with parameter $\mu = \lambda T$. An unbiased estimator of $\lambda$ is given by $\hat{\lambda} = X/T$, where $X$ is the observed number of events over $T$ person-years (the entire study).
• If $\lambda$ is the incidence rate per person-year, $T$ is the number of person-years of follow-up, and we assume a Poisson distribution, then $E(X) = \lambda T$, and therefore $\hat{\lambda}$ is unbiased:
$$E(\hat{\lambda}) = E(X)/T = \lambda T / T = \lambda$$
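A quick simulation can make the unbiasedness argument concrete. The sketch below assumes a hypothetical true rate λ = 0.0001 per person-year and T = 120,000 person-years (so μ = λT = 12, matching the scale of the Woburn example that follows). It draws many Poisson counts and checks that the average of λ̂ = X/T lands close to λ. Python's standard library has no Poisson sampler, so the helper `poisson_draw` (a name introduced here) uses Knuth's multiplication method, which is adequate for moderate μ.

```python
import math
import random

def poisson_draw(mu, rng):
    """Draw one Poisson(mu) count with Knuth's multiplication method (fine for moderate mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_rate_estimates(lam, T, n_sims, seed=1700):
    """Return n_sims simulated values of lambda-hat = X / T, with X ~ Poisson(lam * T)."""
    rng = random.Random(seed)
    return [poisson_draw(lam * T, rng) / T for _ in range(n_sims)]

lam_true = 0.0001      # hypothetical true rate per person-year
T = 120_000            # person-years of follow-up
estimates = simulate_rate_estimates(lam_true, T, n_sims=20_000)
mean_estimate = sum(estimates) / len(estimates)
print(f"average lambda-hat = {mean_estimate:.7f} (true lambda = {lam_true})")
```

Any single λ̂ can be well away from λ. Only the average over many repetitions is expected to sit near λ, and that is exactly what "unbiased" means.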
Example: Woburn, MA, 1970s: Excess Cancer Risk in Childhood Leukemia
• In the book A Civil Action, the people in the town feared that a
contaminated water supply caused cancer
• Translating the question into a statistical framework:
• 12 cases of childhood leukemia (<19 y.o.) diagnosed from 1970-1979
• Total of 12000 childhood residents (<19 y.o.)
• National average is 5 cases per 100,000 person years
• Is the cancer risk in the town different from the national average?
Finding the Point Estimate for the town
• 10 years and 12,000 people, so the study covered
$T = 10 \times 12{,}000 = 120{,}000$ person-years
(an approximation; more on this when we study survival analysis)
• Estimated incidence: $\hat{\lambda} = 12/120{,}000 = 0.0001$ events per person-year
• Rescaled: $T = 1.2$ (in units of 100,000 person-years) and $\hat{\lambda} = 10$ events per 100,000 person-years
• What is the uncertainty around that estimate?
Exact Interval Estimation for Poisson
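• An exact 100%(1 − $\alpha$) confidence interval for the Poisson parameter $\lambda$ is given by $(\mu_1/T, \mu_2/T)$, where $\mu_1$ and $\mu_2$ satisfy
$$\Pr(X \ge x \mid \mu = \mu_1) = \frac{\alpha}{2} = \sum_{k=x}^{\infty} \frac{e^{-\mu_1}\mu_1^{k}}{k!} = 1 - \sum_{k=0}^{x-1} \frac{e^{-\mu_1}\mu_1^{k}}{k!}$$
and
$$\Pr(X \le x \mid \mu = \mu_2) = \frac{\alpha}{2} = \sum_{k=0}^{x} \frac{e^{-\mu_2}\mu_2^{k}}{k!}$$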
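The two defining equations above can be solved numerically instead of read from a table. The sketch below is a minimal Python version that uses only the standard library. It evaluates the Poisson CDF term by term, in log space to avoid overflow, and finds μ1 and μ2 by bisection. The helper names (`poisson_cdf`, `bisect_root`, `exact_poisson_ci`) and the search bracket `2 * x + 20` are assumptions made for this illustration. They are not part of Rosner's presentation or of Stata.

```python
import math

def poisson_cdf(k, mu):
    """Pr(X <= k) for X ~ Poisson(mu); each term is computed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0
    return sum(math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1)) for j in range(k + 1))

def bisect_root(f, lo, hi, tol=1e-10):
    """Bisection for f(m) = 0 on [lo, hi]; assumes f(lo) and f(hi) have opposite signs."""
    lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_ci(x, alpha=0.05):
    """Exact 100%(1 - alpha) CI (mu1, mu2) for a Poisson mean, given an observed count x."""
    upper_bracket = 2 * x + 20  # generous search bracket for typical alpha values
    # Lower limit: Pr(X >= x | mu1) = alpha/2, i.e. 1 - Pr(X <= x - 1 | mu1) = alpha/2
    if x == 0:
        mu1 = 0.0
    else:
        mu1 = bisect_root(lambda m: 1 - poisson_cdf(x - 1, m) - alpha / 2, 0.0, upper_bracket)
    # Upper limit: Pr(X <= x | mu2) = alpha/2
    mu2 = bisect_root(lambda m: poisson_cdf(x, m) - alpha / 2, 0.0, upper_bracket)
    return mu1, mu2

mu1, mu2 = exact_poisson_ci(12)
print(f"95% CI for mu: ({mu1:.2f}, {mu2:.2f})")
print(f"95% CI for lambda per 100,000 person-years: ({mu1 / 1.2:.2f}, {mu2 / 1.2:.2f})")
```

Bisection works here because Pr(X ≥ x | μ) increases with μ and Pr(X ≤ x | μ) decreases with μ, so each equation has a single root inside the bracket.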
Example: Leukemia
• 12 cases in 1.2 (100,000 person-years)
• What is the 95% confidence interval for μ? For λ?
• With only 12 cases, the count is small, so we use exact methods.
• For μ, use Table 7, page 881: look under the 95% column in the row where x = 12.
[Excerpt of Table 7, page 881]
Example: Leukemia - Confidence Intervals
• The 95% CI for 𝜇 is (6.20, 20.96)
• For λ = μ/T, convert from μ: T = 1.2, so the 95% CI for λ is (6.20/1.2, 20.96/1.2) = (5.2, 17.5) cases per 100,000 person-years.
• Since the 95% CI does not include the national rate of 5 per 100,000 person-years, 5 is not a plausible value for the town's rate. We can say that the town's leukemia rate is significantly higher than the national rate (at the 5% level).
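The conversion from μ to λ and the comparison with the national rate are plain arithmetic. The short Python sketch below spells them out using the Table 7 limits quoted above. The variable names are illustrative only.

```python
# Table 7 limits for mu with x = 12, and T expressed in units of 100,000 person-years
mu_lower, mu_upper = 6.20, 20.96
T = 1.2
national_rate = 5.0  # cases per 100,000 person-years

# lambda = mu / T, so each CI limit is divided by T
lam_lower, lam_upper = mu_lower / T, mu_upper / T
print(f"95% CI for lambda: ({lam_lower:.1f}, {lam_upper:.1f}) per 100,000 person-years")

# The national rate is "plausible" only if it falls inside the interval
if lam_lower <= national_rate <= lam_upper:
    print("National rate lies inside the CI: not significantly different at the 5% level")
else:
    print("National rate lies outside the CI: significantly different at the 5% level")
```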
• This can also be done with Stata
Example: Leukemia – Using Stata for Poisson Distribution Estimation
• cii means 120000 12, poisson
                                             -- Poisson Exact --
    Variable |   Exposure        Mean    Std. Err.    [95% Conf. Interval]
             |     120000       .0001     .0000289     .0000517    .0001747
• From Stata, the 95% CI for λ is (5.2/10^5, 17.5/10^5) cases per person-year, which is equivalent to λ in (5.2, 17.5) per 10^5 person-years.
One-Sample Inference for the Poisson Distribution
Section 7.10
Example 7.57 (page 259): Occupational Health: Rubber Workers
• Starting on January 1, 1964, 8418 white male rubber workers aged 40-84 were followed for 10 years for various mortality outcomes, and their mortality was compared with US white male mortality rates in 1968. Four deaths due to Hodgkin's disease were observed, compared with 3.3 deaths expected from US mortality rates. Is this difference significant?
Example 7.57 (page 259): Occupational Health: Rubber Workers, Continued
• Let
X = total number of deaths in the study population
Y_i = 1 if individual i dies during the study period, 0 otherwise
p_i = probability of death for the ith individual
Therefore Y_i ~ Bernoulli(p_i), and
$X = \sum_{i=1}^{n} Y_i$, with $n = 8418$
$E(X) = E\left(\sum_i Y_i\right) = \sum_i E(Y_i) = \sum_i p_i$
• Under H0 (the death rate of rubber workers equals that of the US general population), the expected number of events is μ0, computed from US mortality rates:
$H_0: E(X) = \mu_0$
• If the disease is rare, X approximately follows a Poisson(μ0) distribution under H0, and we test H0: μ = μ0 vs. H1: μ ≠ μ0.
So if we have…
• One variable of interest? _Yes_
• We have one-sample? _Yes_
• Assume that underlying distribution is normal or CLT holds? _No_
• Is underlying distribution binomial? _No_
• Is underlying distribution Poisson? _Yes_
• Then we use a one-sample Poisson test.
Mortality rates and the Poisson
• The Poisson distribution can be used as a model for the counts of
events, such as death, occurring rarely in a population.
• Useful when the probability of the event, such as mortality, is not
constant for everyone in the population being considered
• When the probability is not constant, then the binomial distribution
does not apply
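To see why a sum of Bernoulli variables with unequal, small probabilities still behaves like a Poisson, the sketch below computes the exact distribution of X = Σ Y_i by convolution and compares it with Poisson(Σ p_i). The individual risks are hypothetical. There are 8418 workers whose death probabilities vary linearly from 0.5 to 1.5 times the average, chosen so that they sum to 3.3 expected deaths. These are not the study's actual rates.

```python
import math

def poisson_binomial_pmf(probs, max_k):
    """Exact Pr(X = k), k = 0..max_k, for X = sum of independent Bernoulli(p_i)."""
    dist = [1.0] + [0.0] * max_k  # distribution of an empty sum: X = 0 with probability 1
    for p in probs:
        # Add one more Bernoulli(p): new X = k if (old X = k, Y = 0) or (old X = k - 1, Y = 1).
        # Update from high k down so dist[k - 1] still holds the old value.
        for k in range(max_k, 0, -1):
            dist[k] = dist[k] * (1 - p) + dist[k - 1] * p
        dist[0] *= (1 - p)
    return dist

def poisson_pmf(k, mu):
    """Pr(X = k) for X ~ Poisson(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)

# Hypothetical, unequal individual risks that sum to 3.3 expected deaths
n, expected = 8418, 3.3
base = expected / n
probs = [base * (0.5 + i / (n - 1)) for i in range(n)]

max_k = 15
exact = poisson_binomial_pmf(probs, max_k)
approx = [poisson_pmf(k, sum(probs)) for k in range(max_k + 1)]
for k in range(8):
    print(f"k={k}: sum of Bernoullis {exact[k]:.5f}   Poisson {approx[k]:.5f}")
```

The binomial would need a single common p. Here no such p exists, yet the Poisson with mean Σ p_i still matches the exact distribution closely because every p_i is tiny.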
One sample test: Poisson Distribution Critical
Value Method (as opposed to p-value method)
• Use the Poisson exact confidence interval method.
• If X is a Poisson random variable with expected value µ, then to test the hypothesis
H0: µ = µ0 versus H1: µ ≠ µ0
using a two-sided test at level α:
construct the two-sided 100% × (1 − α) confidence interval for µ based on the observed value x: (c1, c2)
• If µ0 < c1 or µ0 > c2 (outside the interval), then reject H0
• If c1 ≤ µ0 ≤ c2 (inside the confidence interval), then do not reject H0
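Because Pr(X ≥ x | µ) increases with µ and equals α/2 at c1, µ0 falls below c1 exactly when Pr(X ≥ x | µ0) < α/2. Symmetrically, µ0 exceeds c2 exactly when Pr(X ≤ x | µ0) < α/2. The Python sketch below uses this equivalent check so that it does not need to compute (c1, c2) explicitly. The function names are illustrative assumptions, not Stata or Rosner notation.

```python
import math

def poisson_cdf(k, mu):
    """Pr(X <= k) for X ~ Poisson(mu) (direct sum; fine for moderate mu)."""
    if k < 0:
        return 0.0
    return sum(math.exp(-mu) * mu**j / math.factorial(j) for j in range(k + 1))

def outside_exact_ci(x, mu0, alpha=0.05):
    """True if mu0 lies outside the exact 100%(1 - alpha) CI for mu built from count x."""
    upper_tail = 1 - poisson_cdf(x - 1, mu0)  # Pr(X >= x | mu0); below alpha/2 means mu0 < c1
    lower_tail = poisson_cdf(x, mu0)          # Pr(X <= x | mu0); below alpha/2 means mu0 > c2
    return upper_tail < alpha / 2 or lower_tail < alpha / 2

# Rubber workers: x = 4 Hodgkin's deaths observed, mu0 = 3.3 expected
print("Reject H0" if outside_exact_ci(4, 3.3) else "Fail to reject H0")
```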
Example: Occupational Health Rubber workers –
Critical value Method
• µ0 = 3.3; x = 4; α = 0.05, 1 − α = 0.95
• The exact 95% CI for µ with x = 4 is (1.09, 10.24).
• 1.09 < 3.3 < 10.24, therefore we fail to reject H0.
• The rate of Hodgkin's disease among rubber workers is not significantly different from the national Hodgkin's disease mortality rate.
Example: Occupational Health Rubber workers
Confidence Interval for Poisson in Stata
. cii means 84180 4, poisson
                                                         -- Poisson Exact --
    Variable |   Exposure        Mean    Std. Err.    [95% Conf. Interval]
-------------+---------------------------------------------------------------
             |      84180    .0000475     .0000238     .0000129    .0001217
• Stata gives the CI for λ as a rate per person-year. To convert it into a CI for 𝜇, multiply by T = 84,180 person-years (8418 workers × 10 years):
• 0.0000129 × 84,180 = 1.0859
• 0.0001217 × 84,180 = 10.2447
• CI for 𝜇: (1.09, 10.24)
Small Sample inference – p-value
• Let µ be the expected value of a Poisson distribution. To test the hypothesis H0: µ = µ0 versus H1: µ ≠ µ0:
• Compute x = observed number of deaths in the study population
• Under H0, the random variable X follows a Poisson distribution with parameter µ0, and the exact two-sided p-value is given by
• min[2 Pr(X ≤ x), 1] if x < µ0
• min[2 Pr(X ≥ x), 1] if x ≥ µ0
• With $\Pr(X = x) = \dfrac{e^{-\mu_0}\mu_0^{x}}{x!}$ = display poissonp(µ0, x) in Stata
• And Pr(X ≤ x) = display poisson(µ0, x) in Stata
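The p-value rule above can be sketched in a few lines of standard-library Python. This is a minimal version under the stated rule (double the tail on the side of x, capped at 1). The function names are hypothetical. It is applied to the Hodgkin's disease (x = 4, µ0 = 3.3) and bladder cancer (x = 21, µ0 = 18.1) examples from this section.

```python
import math

def poisson_cdf(k, mu):
    """Pr(X <= k) for X ~ Poisson(mu) (direct sum; fine for moderate mu)."""
    if k < 0:
        return 0.0
    return sum(math.exp(-mu) * mu**j / math.factorial(j) for j in range(k + 1))

def exact_poisson_p_value(x, mu0):
    """Two-sided exact p-value for H0: mu = mu0, given an observed count x."""
    if x < mu0:
        tail = poisson_cdf(x, mu0)          # Pr(X <= x)
    else:
        tail = 1 - poisson_cdf(x - 1, mu0)  # Pr(X >= x) = 1 - Pr(X <= x - 1)
    return min(2 * tail, 1.0)

p_hodgkin = exact_poisson_p_value(4, 3.3)
p_bladder = exact_poisson_p_value(21, 18.1)
print(f"Hodgkin's disease: p = {p_hodgkin:.3f}")
print(f"Bladder cancer:    p = {p_bladder:.4f}")
```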
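Equivalently (Rosner, pages 260 and 261):
$$p = \min\left(2 \times \sum_{k=0}^{x} \frac{e^{-\mu_0}\mu_0^{k}}{k!},\ 1\right) \quad \text{if } x < \mu_0$$
$$p = \min\left(2 \times \left[1 - \sum_{k=0}^{x-1} \frac{e^{-\mu_0}\mu_0^{k}}{k!}\right],\ 1\right) \quad \text{if } x \ge \mu_0$$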
Example: Occupational Health- p-value
• Back to the Rubber workers, with 4 deaths, and 3.3 expected from
the US mortality rate.
$X = 4 > \mu_0 = 3.3$; therefore we use
$$p\text{-value} = \min\left(2 \times \left[1 - \sum_{k=0}^{3} \frac{e^{-3.3}\,3.3^{k}}{k!}\right],\ 1\right)$$
Example Occupational Health - p-value in Stata
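display poissonp(3.3,0)
display poissonp(3.3,1)
display poissonp(3.3,2)
display poissonp(3.3,3)
thus p-value = min(2 × [1 − (sum of the four probabilities above)], 1), or in one step: display min(2*(1 - poisson(3.3,3)), 1)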
Example: Occupational Health-
Interpretation of p-value
• With a p-value of 0.839, there is no evidence that mortality from Hodgkin's disease among rubber workers differs significantly from the US mortality rate.
Standardized mortality ratio (SMR)
• Another way to compare the mortality rate of a
sample with that of the population would be the
standardized mortality ratio (SMR).
• The standardized mortality ratio is 100% × observed/expected number of deaths.
• The expected number of deaths is computed assuming no difference in mortality rates between the sample and the general population.
• Standardized morbidity ratio is the alternative name for the same ratio when the outcome of interest is not death.
SMR interpretation
• Interpreted similarly to an odds ratio
• SMR >100% implies increased risk in the sample
• SMR < 100% implies decreased risk in the sample
• And SMR = 100% implies neither an increased nor decreased risk in the
sample compared to the general population
Example: Occupational Health -SMR
• Recall that we observed 4 deaths in our sample and expected 3.3 based on US mortality rates for Hodgkin's disease.
• SMR = 100% × (4/3.3) = 121%
• We can reframe our Poisson test in terms of the SMR:
• H0: SMR = 100%
• H1: SMR ≠ 100%
• Since we already performed the test, we can say that the SMR for Hodgkin's disease in our sample of rubber workers is not significantly different from 100%.
One-Sample Inference for the Poisson (Large Sample)
• If the expected number of events (deaths) under the null hypothesis is large enough, we can use a normal approximation.
• This approximation is useful only if µ0 > 10.
• It uses a test statistic that approximately follows a chi-square distribution with 1 degree of freedom.
Large sample test for Poisson µ
• Compute x = number of observed events in the study sample
• Compute the test statistic (see next slide):
$$X^2 = \frac{(x-\mu_0)^2}{\mu_0} = \mu_0\left(\frac{SMR}{100} - 1\right)^2,$$
which approximately follows $\chi^2_1$ under $H_0$
• For a two-sided test at level α, reject H0 if $X^2 > \chi^2_{1,1-\alpha}$
• Fail to reject H0 if $X^2 \le \chi^2_{1,1-\alpha}$
• The approximate p-value is given by $\Pr(\chi^2_1 > X^2)$
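The large-sample test can be sketched in standard-library Python. The sketch relies on the fact that a χ²₁ variable is the square of a standard normal Z. The critical value is therefore z²₁₋α/₂, and Pr(χ²₁ > t) = Pr(|Z| > √t) = erfc(√(t/2)). The function name is hypothetical, and the numbers come from the bladder cancer example later in this section (21 observed, 18.1 expected).

```python
import math
from statistics import NormalDist

def large_sample_poisson_test(x, mu0, alpha=0.05):
    """Chi-square (1 df) approximation to the one-sample Poisson test (intended for mu0 > 10)."""
    smr = 100 * x / mu0                    # standardized mortality ratio, in percent
    chi2_stat = (x - mu0) ** 2 / mu0       # X^2 = (x - mu0)^2 / mu0
    z = NormalDist().inv_cdf(1 - alpha / 2)
    critical = z ** 2                      # chi-square(1) critical value = z_{1-alpha/2}^2
    # Pr(chi2_1 > t) = Pr(|Z| > sqrt(t)) = erfc(sqrt(t / 2))
    p_value = math.erfc(math.sqrt(chi2_stat / 2))
    return smr, chi2_stat, critical, p_value

mu0 = 18.1
smr, chi2_stat, critical, p_value = large_sample_poisson_test(21, mu0)
print(f"SMR = {smr:.0f}%, X^2 = {chi2_stat:.5f}, critical value = {critical:.4f}, p = {p_value:.4f}")
print("Reject H0" if chi2_stat > critical else "Fail to reject H0")
```

Writing the statistic as μ0(SMR/100 − 1)² yields the same number, which the test below checks.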
Just a note:
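Remember that $SMR = 100 \times x/\mu_0$, so
$$\mu_0\left(\frac{SMR}{100} - 1\right)^2 = \mu_0\left(\frac{x}{\mu_0} - 1\right)^2 = \mu_0\left(\frac{x-\mu_0}{\mu_0}\right)^2 = \frac{(x-\mu_0)^2}{\mu_0} = X^2$$
Remember that if $x \sim N(\mu_0, \sigma^2 = \mu_0)$ (approximately, for large $\mu_0$), then
$$\frac{x-\mu_0}{\sqrt{\mu_0}} \sim N(0,1) \quad \text{and therefore} \quad \frac{(x-\mu_0)^2}{\mu_0} \sim \chi^2_1$$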
Large Sample Approximate CI
• The 100% × (1 − α) confidence interval for µ can be approximated by
$x \pm z_{1-\alpha/2}\sqrt{x}$
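A minimal Python sketch of the approximate interval x ± z₁₋α/₂√x is shown below. It uses `statistics.NormalDist` for the normal quantile. The function name is hypothetical, and the count x = 21 is borrowed from the bladder cancer example that follows purely to show the arithmetic.

```python
import math
from statistics import NormalDist

def approx_poisson_ci(x, alpha=0.05):
    """Large-sample CI for a Poisson mean: x +/- z_{1-alpha/2} * sqrt(x)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * math.sqrt(x)            # the Poisson variance is estimated by x itself
    return x - half_width, x + half_width

lo, hi = approx_poisson_ci(21)
print(f"Approximate 95% CI for mu: ({lo:.2f}, {hi:.2f})")
```

The interval is symmetric around x by construction. The exact interval is not, which is one reason the exact method is preferred when µ is small.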
Example: Occupational Health - Rubber Workers and Bladder Cancer
• We observe 21 bladder cancer deaths in our sample, while 18.1 deaths are expected based on US mortality rates. Are there significantly more deaths in our sample?
• SMR = 100% × (21/18.1) = 116%
• x = 21, which is ≥ µ0 = 18.1, so we use the upper tail
• Exact method (p-value):
$$p = \min\left(2 \times \left[1 - \sum_{k=0}^{20} \frac{e^{-18.1}\,18.1^{k}}{k!}\right],\ 1\right)$$
= min[2 × (1 − display poisson(18.1,20)), 1]
Example Occupational Health – p-value
$$\min\left(2 \times \left[1 - \sum_{k=0}^{20} \frac{e^{-18.1}\,18.1^{k}}{k!}\right],\ 1\right)$$
= min[2 × (1 − display poisson(18.1,20)), 1], so we get
min(2 × (1 − .72270), 1) = 0.5546
Thus, by the exact test, the mortality rate from bladder cancer in our sample of rubber workers is not significantly different from that of the general population.
Example: Occupational Health-
Comparing Methods
• Using the approximate method, we get
$$X^2 = \frac{(x-\mu_0)^2}{\mu_0} = \frac{(21-18.1)^2}{18.1} = \frac{2.9^2}{18.1} = 0.46464$$
• Critical value for the test: invchi2(1,0.95) = 3.8415
• Or the p-value for the test: chi2tail(1,0.46464) = 0.4955
• By the approximate method, there is no significant difference (0.46464 < 3.8415).
• Compare to the exact p-value = 0.5546
• In general, exact methods are strongly preferred for inference concerning the Poisson distribution.
Checklist for tests of hypothesis
• Identify the variable of interest
• Identify the parameter(s) of interest
• State the null and alternative hypotheses
• Identify the type I error level
• Identify the test statistic (you can use the flow chart in the back of the
textbook)
• Identify the distribution of the test statistic (a known probability
distribution)
• Determine the decision rule (do a graph!)
• Calculate the test statistic
• Report the test statistic, df, CI & p-value
• Make a decision
• Conclude and interpret
When do I use Poisson, Revisited
• Testing mortality/morbidity ratios or rare incidence rates
• When the mortality rate varies across individuals in the sample
• Any other situation in which the Poisson applies and we are testing the Poisson parameter (e.g., modeling counts)
• Use the Poisson test
• Exact method – small sample (µ0 < 10)
• Large-sample approximation with chi-square (less accurate), (µ0 > 10)
• However, the exact method is often preferred, even when the approximation is adequate.
Summary
• Poisson
• Point Estimation
• Exact Interval Estimation
• One Sample Inference
• SMR