Mixed Poisson Regression Models with Covariate Dependent Rates
Author(s): Peiming Wang, Martin L. Puterman, Iain Cockburn and Nhu Le
Source: Biometrics, Vol. 52, No. 2 (Jun., 1996), pp. 381-400
Published by: International Biometric Society
Stable URL: http://www.jstor.org/stable/2532881
Accessed: 24/09/2013 14:11
Mixed Poisson Regression Models With Covariate Dependent Rates*

Peiming Wang,¹ Martin L. Puterman,² Iain Cockburn,² and Nhu Le³

¹Nanyang Business School, Nanyang Technological University, Nanyang Avenue, Singapore 2264
²Faculty of Commerce and Business Administration, University of British Columbia, 2053 Main Mall, Vancouver, B.C., Canada V6T 1Z2
³Division of Epidemiology and Cancer Prevention, British Columbia Cancer Agency, 600 W 10th Avenue, Vancouver, B.C., Canada V5Z 4E6
SUMMARY
This paper studies a class of Poisson mixture models that includes covariates in rates. This model contains Poisson regression and independent Poisson mixtures as special cases. Estimation methods based on the EM and quasi-Newton algorithms, properties of these estimates, a model selection procedure, residual analysis, and a goodness-of-fit test are discussed. A Monte Carlo study investigates implementation and model choice issues. This methodology is used to analyze seizure frequency and Ames salmonella assay data.
1. Introduction
Finite mixture models have been widely studied and applied (cf. Titterington, Smith, and Makov, 1985, pp. 16-21). However, with few exceptions, it appears that the effect of covariates on distribution parameters has not been considered. It is the purpose of this paper to do this. Such a generalization is important because it allows inference when finite mixture models are appropriate for describing the data. For example, in a clinical trial, covariates might represent treatment and other risk factors, and the data might be best described by a mixture because of unobserved heterogeneous groups in the population.
Poisson regression has been recognized as an important tool for analyzing the effects of covariates on count data. However, observed count data analyzed under such models often exhibit overdispersion so that, conditional on covariates, the Poisson assumption of mean-variance equality is no longer valid. Consequently, many authors have studied the effects of overdispersion on inferences made under a Poisson model (e.g., Cox, 1983; Breslow, 1990b; Breslow and Clayton, 1993; Brillinger, 1986; Lawless, 1987a; McCullagh and Nelder, 1989; Cameron and Trivedi, 1990). On the other hand, several approaches have been suggested for dealing with this problem, including quasi-likelihood (Williams, 1982; Breslow, 1984, 1990a), random effect models (Schall, 1991), and mixture models (Manton, Woodbury, and Stallard, 1981; Hinde, 1982; Lawless, 1987b; Follmann and Lambert, 1989).
The quasi-likelihood and random effect models regard overdispersion as a nuisance factor which interferes with inference. Many studies of these models suggest that the presence of overdispersion may result in less efficient regression coefficient estimates, but these estimates may consistently estimate the true coefficients provided the regression specification is correct (Cox, 1983; Lawless, 1987b). In contrast, we regard overdispersion as an intrinsic aspect of the data which requires modelling. By using finite mixture models to explain overdispersion, we adopt the viewpoint that there are a finite number of unobservable categories of observations which may be characterized by different values of regression coefficients. Our studies of these models show that the presence of overdispersion in the usual Poisson regression may have some implications which are not consistent with those suggested by the previous studies. For example, unlike the implications of the quasi-likelihood approach (Cox, 1983; McCullagh and Nelder, 1989), the models proposed in this paper suggest that overdispersion may result in either seriously biased parameter or standard error estimates or both. This means that using different models to adjust for overdispersion in statistical analysis may lead to different conclusions. Our models may provide further insight into the data while at the same time providing a plausible explanation for overdispersion.

* This research is partially supported by NSERC grant 0GP0005527.

Key words: Count data; EM algorithm; Epilepsy; Mixture models; Overdispersion; Poisson regression; Residual analysis.
This paper focuses on Poisson mixture models. In applications without covariates, the use of mixtures to account for overdispersion has received considerable attention because of their simplicity and appeal (cf. Simar, 1976; Laird, 1978; Lindsay, 1983; Leroux, 1989; Leroux and Puterman, 1992). In this paper, we generalize this approach to incorporate covariates. This generalization is important for applications because it allows covariates to account for categorical as well as continuous measurements. An unpublished Ph.D. dissertation (Wang, 1994) studies mixture models which include covariates in the mixing probabilities, and in both the mixing probabilities and Poisson rates, respectively. It also studies finite binomial mixtures which include covariates in the mixing probabilities and binomial parameters through a logit link function.
Our motivation comes from analysis of daily seizure frequency in epileptic children. Many subject matter investigations (Hopkins, Davies, and Dobson, 1985; Milton et al., 1987; Olson, Iasemidis, and Sackellares, 1989; Tauboll, Lundervold, and Gjerstad, 1989) have noted overdispersion in such data and have had difficulty developing appropriate statistical models. Recent papers by Albert (1991) and Le, Leroux, and Puterman (1992) suggest that such data may be well described by using a finite mixture approach. These papers focus on analysis of seizure count data in the absence of covariate information. However, because we need to analyze seizure frequency data in a clinical trial setting, we require methods which allow investigation of covariate effects in Poisson rates. Wang (1994) provides many other applications.
To justify the need for models for overdispersion, we test whether observed data is overdispersed with respect to Poisson regression. Score tests for overdispersion have been developed by Fisher (1950), Williams (1982), Collings and Margolin (1985), Cameron and Trivedi (1986, 1990), Zelterman and Chen (1988), Dean and Lawless (1989), and Dean (1992). Here we use test statistics proposed by Dean (1992), who develops a unifying theory for these score tests.
In Section 2, we describe the proposed model and several special cases. In addition, we discuss identification issues. Parameter estimation based on the EM and quasi-Newton algorithms is discussed in Section 3. Monte Carlo methods are used to validate the algorithm and provide insight into the precision of estimates. In Section 4, we discuss a model selection procedure which first determines the number of components by selection criteria based on data, and then carries out inference about regression parameters by likelihood ratio tests. Residual analysis and a goodness-of-fit test for the proposed model are also described in this section. Section 5 illustrates the procedure with seizure and Ames salmonella assay data. Concluding remarks appear in Section 6.
2. Mixed Poisson Regression Models
2.1 The Model
Let the random variable Y_i denote the ith response variable, and let {(y_i, t_i, x_i), i = 1, …, n} denote observations, where y_i is the observed value of Y_i, t_i is a non-negative quantity representing the time or extent of exposure, and x_i is a k-dimensional covariate vector corresponding to the regression part of the model. Usually the first element of x_i is a 1 corresponding to an intercept. Our mixed Poisson regression model assumes:

(1) The unobserved mixing process can occupy any one of c states, where c is finite and unknown;
(2) For each observed count, y_i, there is an unobserved random variable, A_i, representing the component at which y_i is generated. Further, the (Y_i, A_i) are pairwise independent;
(3) A_i follows a discrete distribution with c points of support, and Pr(A_i = j) = p_j, where Σ_{j=1}^c p_j = 1;
(4) Conditional on A_i = j, Y_i follows a Poisson distribution which we denote by

    f_j(y_i | x_i, t_i, α_j) ≡ Po(y_i | λ_ij) = (λ_ij^{y_i} / y_i!) exp(−λ_ij)   (1)

where

    λ_ij = t_i λ_j(x_i, α_j) = t_i exp(α_j′ x_i),  for j = 1, …, c,
with α = (α_1′, …, α_c′)′ denoting unknown parameters, and α_j = (α_{j1}, …, α_{jk})′, j = 1, …, c. Note that we could also choose other positive link functions.

The above assumptions define the unconditional distribution of observations, y_i, as a finite Poisson mixture in which the mixing probabilities, p_j, are constant, and the component distributions are Poisson distributions with mean, λ_ij, which is determined by the exposure, t_i, and by the Poisson rate, λ_j(x_i, α_j), which is related to the covariates x_i through an exponential link function.

Note that our model allows some coefficients to be constant across components, i.e., α_{jl} = α_l for j = 1, …, c, or to be zero in one or several covariates, i.e., α_{jl} = 0 for some j, j = 1, …, c.
Under the above assumptions the probability function of Y_i satisfies

    f(y_i | x_i, t_i, α, p) = Σ_{j=1}^c p_j Po(y_i | λ_ij)   (2)

where Po(y_i | λ_ij) is given by (1), and p = (p_1, …, p_c)′ is an unknown parameter vector.

We may equivalently view the model as arising from the following sampling scheme. Observations are independent. For observation i, component j is chosen according to a multinomial distribution with probability p_j. Subsequently, y_i is generated from a Poisson distribution with mean λ_ij.
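As a concrete illustration, the mixture probability function (2) is straightforward to evaluate numerically. The sketch below (ours, not from the paper; the function name is hypothetical) computes Σ_j p_j Po(y_i | λ_ij) with λ_ij = t_i exp(α_j′ x_i):

```python
import numpy as np
from scipy.stats import poisson

def mixed_poisson_pmf(y, x, t, alpha, p):
    """Probability of count y under the c-component mixture (2).

    alpha: (c, k) array of regression coefficients, one row per component.
    p:     (c,) array of mixing probabilities summing to one.
    x:     (k,) covariate vector; t: exposure.
    """
    rates = t * np.exp(alpha @ x)          # lambda_ij = t_i * exp(alpha_j' x_i)
    return float(np.sum(p * poisson.pmf(y, rates)))
```

With c = 1 this reduces to the ordinary Poisson regression likelihood contribution, matching the first special case listed below (2).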
Note that this model includes many previously studied models as special cases.

* Choosing c = 1 yields a Poisson regression model.
* Setting x_i ≡ 1 and t_i ≡ 1 for all i yields an independent Poisson mixture model (Simar, 1976; Leroux, 1989).
* Letting the Poisson rates have common regression parameters and different intercepts yields a Poisson regression with a random intercept which follows a discrete mixing distribution. Follmann and Lambert (1989) studied a binomial variant of this model.
* Setting c = 2 and λ_1(x_i, α_1) ≡ 0 yields a Poisson regression model with an extra mass at 0.
For the above model, the unconditional mean and variance of Y_i are, respectively,

    E(Y_i) = E(E(Y_i | A_i)) = Σ_{j=1}^c p_j λ_ij   (3)

and

    Var(Y_i) = E(Var(Y_i | A_i)) + Var(E(Y_i | A_i))
             = Σ_{j=1}^c p_j λ_ij + { Σ_{j=1}^c p_j λ_ij² − (Σ_{j=1}^c p_j λ_ij)² }.   (4)

Obviously, Var(E(Y_i | A_i)) = 0 if and only if

    λ_i1 = λ_i2 = ⋯ = λ_ic.

This implies that the mixture model is able to cope with excess variation among Y_1, …, Y_n due to heterogeneity in the population.
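The moment formulas (3) and (4) are easy to sanity-check numerically; a minimal sketch (the function name is ours):

```python
import numpy as np

def mixture_moments(rates, p):
    """Mean and variance of Y_i from (3)-(4), where rates[j] = lambda_ij."""
    rates = np.asarray(rates, dtype=float)
    p = np.asarray(p, dtype=float)
    mean = np.sum(p * rates)                      # E(Y_i) = sum_j p_j lambda_ij
    var = mean + np.sum(p * rates**2) - mean**2   # adds Var(E(Y_i | A_i))
    return mean, var
```

When all the λ_ij coincide the variance reduces to the mean (the Poisson case); otherwise it exceeds the mean, which is precisely how the mixture accommodates overdispersion.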
2.2 Identifiability
To be able to reliably estimate the parameters of (2), we require that the mixture be identifiable; that is, two sets of parameters which do not agree after permutation cannot yield the same mixture distribution. Without covariates, Teicher (1961) proves that the class of finite mixtures of Poisson distributions is identifiable. For finite binomial mixtures with covariates, Follmann and Lambert (1991) give sufficient conditions for identifiability. For finite Poisson mixtures with covariates, we extend the definition of identifiability as follows.

DEFINITION. Consider the collection of probability models {f(y_1 | x_1, t_1, α, p), …, f(y_n | x_n, t_n, α, p)}, with the restriction λ_11 < ⋯ < λ_1c, parameter space C × A × P, sample spaces Y_1, …, Y_n, and fixed covariate vectors x_1, …, x_n, where x_i ∈ R^k for i = 1, …, n. The collection of probability models is identifiable if, for (c, α, p), (c*, α*, p*) ∈ C × A × P,

    f(y_i | x_i, t_i, α, p) = f(y_i | x_i, t_i, α*, p*)   (5)

for all y_i ∈ Y_i, i = 1, …, n, implies (c, α, p) = (c*, α*, p*).
Note that the order restriction in the definition means that two models are equivalent if they agree up to permutations of parameters.

We now provide a sufficient condition for identifiability. Suppose that (c, α, p), (c*, α*, p*) satisfy (5). This then implies that for each i and all y_i ∈ Y_i,

    Σ_{j=1}^c p_j Po(y_i | λ_ij) = Σ_{j=1}^{c*} p_j* Po(y_i | λ_ij*)   (6)

where p_j and λ_ij = t_i λ_j(x_i, α_j) are defined above. Note that each side of (6) may be regarded as a finite Poisson mixture without covariates. Teicher's result implies that

    c = c*,  p_j = p_j*,  and  λ_ij = λ_ij*

for i = 1, …, n and j = 1, …, c. By the definition of the model, we obtain

    exp(α_j′ x_i) = exp(α_j*′ x_i)  for j = 1, …, c.   (7)

From (7) we obtain

    (α_j − α_j*)′ x_i = 0  for j = 1, …, c and i = 1, …, n,

or

    (α_j − α_j*)′ X = 0  for j = 1, …, c,   (8)

where X = (x_1, …, x_n). Hence a sufficient condition for identifiability is that X is a full rank matrix, in which case (8) implies that α = α*. This means that in an ANOVA structure, we might have to reparameterize the model to ensure identifiability.
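The full-rank condition on X = (x_1, …, x_n) is straightforward to verify numerically; a small sketch (the helper name is ours):

```python
import numpy as np

def identifiable_design(X):
    """Sufficient condition for identifiability: the n x k matrix of
    covariate vectors has full column rank k, so (8) forces alpha = alpha*."""
    X = np.asarray(X, dtype=float)
    return np.linalg.matrix_rank(X) == X.shape[1]
```

With a redundant ANOVA coding (e.g., an intercept plus a dummy for every level of a factor), the columns are linearly dependent, the check fails, and the model must be reparameterized, which matches the ANOVA remark above.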
3. Parameter Estimation
3.1 An Algorithm
For a fixed number of components c, we obtain maximum likelihood estimates of the parameters in the above model using both the EM algorithm (Dempster, Laird, and Rubin, 1977) and a quasi-Newton approach (Nash, 1990). As first suggested by Dempster et al. (1977) for mixture model estimation, we implement the EM algorithm by treating unobservable component membership of the observations as missing data. We discuss the choice of the number of components below.

Suppose that (Y, X, T) = {(y_i, x_i, t_i); i = 1, …, n} is the observed data generated by the above mixture model. Let (Y, Z, X, T) = {(y_i, z_i, x_i, t_i); i = 1, …, n} denote the complete data for the mixture, where the unobserved quantity z_i = (z_i1, …, z_ic)′ satisfies

    z_ij = 1 if A_i = j, and z_ij = 0 otherwise.
The log likelihood for the complete data is

    L(Y, Z, α, p | X, T) = Σ_{i=1}^n Σ_{j=1}^c z_ij log(p_j) + Σ_{i=1}^n Σ_{j=1}^c z_ij log Po(y_i | λ_ij).

The EM approach finds the maximum likelihood estimates using an iterative procedure consisting of two steps: an E-step and an M-step. At the E-step, it replaces the missing data by its expectation conditional on the observed data. At the M-step, it finds the parameter estimates which maximize the expected log likelihood for the complete data, conditional on the expected values of the missing data. In our case, this procedure can be stated as follows.

E-step. Given α^(0) and p^(0), replace the missing data Z by its expectation conditioned on these initial values of the parameters and the observed data, (Y, X, T). In this case, the conditional expectation of the jth component of z_i equals the probability that the observation y_i was generated by the jth component of the mixture distribution, conditional on the parameters, the data, and the covariates. Denote the conditional expectation of the jth component of z_i by ẑ_ij(α^(0), p^(0)). Then

    ẑ_ij(α^(0), p^(0)) = p_j^(0) Po(y_i | λ_ij^(0)) / Σ_{l=1}^c p_l^(0) Po(y_i | λ_il^(0)),  i = 1, …, n, j = 1, …, c.   (9)
M-step. Given conditional probabilities ẑ_i(α^(0), p^(0)) = (ẑ_i1, …, ẑ_ic)′, i = 1, …, n, obtain estimates of the parameters by maximizing, with respect to α and p,

    Q(α, p | α^(0), p^(0)) = E{L(Y, Z, α, p | X, T) | Y, α^(0), p^(0), X, T} = Q_1 + Q_2

where

    Q_1 = Σ_{i=1}^n Σ_{j=1}^c ẑ_ij(α^(0), p^(0)) log(p_j)  and
    Q_2 = Σ_{i=1}^n Σ_{j=1}^c ẑ_ij(α^(0), p^(0)) log Po(y_i | λ_ij).

The estimated parameters, α̂ and p̂, satisfy the following M-step equations:

    ∂Q_1/∂p_j = Σ_{i=1}^n (ẑ_ij/p_j − ẑ_ic/p_c) = 0  for j = 1, …, c − 1,   (10)

    ∂Q_2/∂α = Σ_{i=1}^n Σ_{j=1}^c ẑ_ij ∂ log Po(y_i | λ_ij)/∂α = 0.   (11)

Solving (10) yields

    p̂_j = (1/n) Σ_{i=1}^n ẑ_ij  for j = 1, …, c − 1.   (12)

Since a closed form solution of equation (11) is unavailable, we use a quasi-Newton approach (Nash, 1990) to obtain estimates. Hence we implement the E- and M-steps in the following way to obtain parameter estimates.
Step 0: Specify starting values α^(0) = (α_1^(0), …, α_c^(0)) and p^(0) = (p_1^(0), …, p_c^(0)), and two tolerances ε_0 and ε;

Step 1: (E-step) Compute ẑ_i = (ẑ_i1, …, ẑ_ic)′, (i = 1, …, n), using (9). To avoid overflow in the calculation of ẑ_ij, we divide both the numerator and denominator in (9) by the largest term in the sum in the denominator;

Step 2: (M-step)
(a) Find values of p̂ using (12);
(b) Find values of α̂ to solve (11) using a quasi-Newton algorithm;

Step 3: If at least one of the following conditions is true, set α^(0) = α̂ and p^(0) = p̂, and go to Step 1; otherwise, stop.
(1) ‖α̂ − α^(0)‖ = Σ_{j=1}^c Σ_{l=1}^k |α̂_jl − α_jl^(0)| > ε;
(2) ‖p̂ − p^(0)‖ = Σ_{j=1}^c |p̂_j − p_j^(0)| > ε;
(3) |l(Y, α̂, p̂, X, T) − l(Y, α^(0), p^(0), X, T)| > ε_0, where l(Y, α, p, X, T) is the observed log likelihood function.
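The E- and M-steps can be sketched end to end in a few dozen lines. The following Python implementation is ours (not the authors' FORTRAN code): scipy's BFGS routine stands in for the quasi-Newton solver of (11), the E-step is done on the log scale (equivalent to the overflow guard in Step 1), and a simple log-likelihood stopping rule stands in for Step 3.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

def em_mixed_poisson(y, X, t, alpha0, p0, tol=1e-6, max_iter=500):
    """EM for the c-component mixed Poisson regression model (a sketch).

    y: (n,) counts; X: (n, k) covariates; t: (n,) exposures;
    alpha0: (c, k) starting coefficients; p0: (c,) starting mixing weights.
    """
    y, X, t = np.asarray(y, float), np.asarray(X, float), np.asarray(t, float)
    n, k = X.shape
    alpha = np.asarray(alpha0, float).copy()
    p = np.asarray(p0, float).copy()
    c = len(p)
    ll_old = -np.inf
    for _ in range(max_iter):
        # E-step (9): posterior component probabilities, on the log scale
        # to avoid overflow.
        log_rates = np.log(t)[:, None] + X @ alpha.T       # log lambda_ij
        logw = np.log(p)[None, :] + poisson.logpmf(y[:, None], np.exp(log_rates))
        logw -= logw.max(axis=1, keepdims=True)
        z = np.exp(logw)
        z /= z.sum(axis=1, keepdims=True)
        # M-step (12): closed-form update of the mixing probabilities.
        p = z.mean(axis=0)
        # M-step (11): no closed form, so maximize Q2 in alpha numerically.
        def neg_Q2(a_flat):
            a = a_flat.reshape(c, k)
            lr = np.log(t)[:, None] + X @ a.T
            # log Po(y | lambda) up to the constant -log(y!), which does not
            # affect the maximizer.
            return -np.sum(z * (y[:, None] * lr - np.exp(lr)))
        alpha = minimize(neg_Q2, alpha.ravel(), method="BFGS").x.reshape(c, k)
        # Observed log likelihood, used as the stopping rule.
        rates = t[:, None] * np.exp(X @ alpha.T)
        ll = np.sum(np.log(poisson.pmf(y[:, None], rates) @ p))
        if abs(ll - ll_old) < tol:
            break
        ll_old = ll
    return alpha, p, ll
```

As the text notes, the observed likelihood is guaranteed to be non-decreasing across these iterations but may converge to a local maximum or saddle point, so starting values matter.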
Dempster et al. (1977) and Wu (1983) discussed the convergence properties of the EM algorithm in a general setting. Since Q(α, p | α^(0), p^(0)) and its first order partial derivatives are continuous in α, p, α^(0), and p^(0), applying Wu's results (1983) lets us conclude that the sequence of the observed likelihood l(Y, α^(m), p^(m), X, T) converges to a local maximum or saddle point regardless of starting values. Note that for some starting values the stopping criteria in Step 3 might never be satisfied. Also l(Y, α, p | X, T) need not, in general, be globally concave. For these reasons, we need to choose initial values carefully in order to increase the chance that the algorithm converges to the global maximum. Although we use the following approach, we also suggest use of other information and intuition from particular applications to choose starting values.

We assume that c is known. The first step of our approach divides the data {y_1, …, y_n} into c groups in terms of its percentiles and fits the data to a c-component independent Poisson mixture model without covariates by choosing initial values based on the percentile information. The second step, if necessary, fits the data to a mixed Poisson regression model containing only one more covariate in the Poisson rate, in such a way that the initial values of the parameters included in the previous mixture model equal the estimates of the corresponding parameters from the previous fitting model, and initial values of the parameters not in the previous fitting model are set to a small value, say 0.00001. This process is iterated until a complete set of initial values for the mixture model is obtained. The motivation of this ad hoc approach is based on cluster analysis. At each iteration we use different criteria to classify the data. First, the data is classified in terms of its percentiles. Then the data is classified in terms of an independent Poisson mixture model, and subsequently in terms of mixed Poisson regression models. Note that choosing a complete set of initial values for a mixture model step by step in such a way guarantees that the likelihood values will increase at each step. Also our approach obtains maximum likelihood estimates for a sequence of nested mixture models.
We use an example to explain this approach. Suppose that we need to choose initial values to fit a three-component mixture model with covariates x_i = (1, d_i). First, we find the 16.5, 33.0, 49.5, 66.0, and 82.5 percentiles of the observations {y_1, …, y_n}, denoted q_1-q_5, respectively, and fit the data to a three-component independent Poisson mixture model (x_i = (1)) with the initial values of α_{1,1}, α_{2,1}, and α_{3,1} equal to log(q_1), log(q_3), and log(q_5), respectively, and the initial values of p_1, p_2, and p_3 all equal to 1/3. Note that under this specification and an exponential link function, the initial values of λ_j(x_i, α_j), (j = 1, 2, 3), are equal to q_1, q_3, and q_5 with the same mixing probabilities 1/3. Then we fit the data to the three-component Poisson mixture model with x_i = (1, d_i) by choosing the initial values of α_{1,2}, α_{2,2}, and α_{3,2} equal to 0.00001 and the initial values of the other parameters equal to the estimates of the corresponding parameters of the first fitting model.
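For the three-component case just described, the first-stage starting values can be generated as follows. This is a sketch: the generalization of the 16.5, …, 82.5 grid to 2c − 1 evenly spaced percentiles for arbitrary c is our choice, as is the guard against taking log of a zero count.

```python
import numpy as np

def percentile_starts(y, c=3):
    """Stage-one starting values: intercepts log q1, log q3, ..., from the
    odd entries of 2c - 1 evenly spaced percentiles, equal mixing weights."""
    step = 100.0 / (2 * c)
    grid = np.arange(1, 2 * c) * step              # c = 3: 16.7, 33.3, ..., 83.3
    q = np.percentile(np.asarray(y, dtype=float), grid)
    intercepts = np.log(np.maximum(q[::2], 0.5))   # odd percentiles q1, q3, q5
    return intercepts, np.full(c, 1.0 / c)
```

These intercepts and weights seed the no-covariate fit; each subsequent fit then adds one covariate initialized at 0.00001, exactly as in the stepwise scheme above.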
Note that the EM algorithm converges linearly (McCullagh and Nelder, 1989; Redner and Walker, 1984). On the other hand, a quasi-Newton algorithm which maximizes the observed log likelihood function has a higher convergence rate, though it requires good starting values in order to converge. Hence, to use the complementary strengths of both the EM and quasi-Newton algorithms, we modify the above Step 3 as follows:

Step 3'. (a) If at least one of the following conditions is true, set α^(0) = α̂ and p^(0) = p̂, and go to Step 1; otherwise, go to (b).
(1) ‖α̂ − α^(0)‖ = Σ_{j=1}^c Σ_{l=1}^k |α̂_jl − α_jl^(0)| > ε;
(2) ‖p̂ − p^(0)‖ = Σ_{j=1}^c |p̂_j − p_j^(0)| > ε;
(3) |l(Y, α̂, p̂, X, T) − l(Y, α^(0), p^(0), X, T)| > ε_0.
(b) Maximize the observed log likelihood function l(Y, α, p, X, T) using the quasi-Newton algorithm (Nash, 1990) with α̂ and p̂ as initial values. Then, stop.

When the number of components c is known, asymptotic normality of √n((α̂, p̂) − (α, p)) is easily proved under standard regularity conditions (Lehmann, 1983). To approximate standard errors, we compute se(α̂_jl) and se(p̂_j) from the diagonal elements of the inverse of the (c·k + (c − 1))-dimensional observed information matrix, with c fixed at ĉ, which is defined as

    −∂²l(Y, α, p, X, T)/∂(α, p)² = −( ∂²l/∂α²    ∂²l/∂α∂p
                                      ∂²l/∂p∂α   ∂²l/∂p²  ).

The quasi-Newton algorithm yields approximate standard errors of the estimates.
3.2 A Monte Carlo Study
We use Monte Carlo methods to examine the performance of the above algorithm. In particular, we wished to verify our code, determine the precision of estimates, and investigate some model selection criteria. Using a three-component mixture model, we analyzed 100 replicates, each with 100 observations.

Two different approaches for choosing initial values are compared in the study. In one, we use the true parameter values of the model generating the observations as initial values in order to determine performance of the algorithm in the best case. The other uses the true parameter values of α_{1,1}, α_{2,1}, and α_{3,1} as initial values, chooses initial values of p_1, p_2, and p_3 according to the approach described in Section 3.1, and fits the samples to a three-component independent Poisson mixture model. Then, following the approach of Section 3.1, we choose a complete set of initial values for the parameters of the model generating the samples. These two different approaches of choosing initial values lead to essentially the same estimates. The model for the simulation is described below.

We set t_i = 1 and choose Poisson rates to depend on one time-dependent covariate d_i, i.e., x_i = (1, d_i), where d_i = 0.2 for i = 1, …, 10, d_i = 0.4 for i = 11, …, 20, etc., and α = (α_1, α_2, α_3), where α_1′ = (2.8, −2.9), α_2′ = (2.6, 0.4), and α_3′ = (3.6, 0.2). We choose mixing probabilities p_1 = 0.5156, p_2 = 0.3127, and p_3 = 0.1717. Choosing an exponential link function yields the Poisson rates

    λ_1(x_i, α_1) = exp(2.8 − 2.9 d_i),
    λ_2(x_i, α_2) = exp(2.6 + 0.4 d_i), and
    λ_3(x_i, α_3) = exp(3.6 + 0.2 d_i).

We chose the above parameter values so that the Poisson rate functions do not cross each other. We would expect that in this case the algorithm would perform well.

Results of the Monte Carlo study are presented in Table 1. The table shows that the sample means of estimates are very close to the true parameter values, suggesting that the global maximum of the observed likelihood is reached. Further, the standard deviations are relatively small, indicating parameters are estimated precisely.
Table 1
Means (ave) and standard deviations (std) for Monte Carlo study

Component   Mixing probability¹          Poisson rate
    j       p_j     ave     std      α_{j1}   ave    std     α_{j2}    ave    std
    1      0.5156  0.5325  0.0501     2.8     2.80   0.04     -2.9    -2.90   0.11
    2      0.3127  0.2997  0.0480     2.6     2.62   0.02      0.4     0.39   0.01
    3      0.1717  0.1678  0.0365     3.6     3.60   0.01      0.2     0.20   0.01

¹ Note that p_j (j = 1, 2, 3) are reparametrized as p_1 = exp(Q_1)/(1 + exp(Q_1) + exp(Q_2)), p_2 = exp(Q_2)/(1 + exp(Q_1) + exp(Q_2)), and p_3 = 1/(1 + exp(Q_1) + exp(Q_2)).
Our implementation of the algorithm used FORTRAN-77 on a Sun SPARCstation 1+. The average number of iterations of the EM algorithm is 4.75 under the stopping criterion ε = 0.01, and the average time is 6.65 seconds.
4. Implementation Issues
4.1 Model Selection
To apply the mixed Poisson regression model, we must determine the number of components c, and we require a method for inference about model parameters. When c is known, inference for the parameters can be based on likelihood ratio tests. In practice, however, this is rarely the case. When c is unknown, standard likelihood ratio tests are no longer valid for determining c or testing hypotheses about parameter values. This is because mixing probabilities may lie on the boundary of the parameter space when the "true" number of components is less than the fitted number of components. Hence the usual regularity conditions for the chi-squared approximation to the likelihood ratio test statistic do not hold (McLachlan and Basford, 1988). We propose the following approach for model selection.
Two widely used model selection criteria are Akaike's Information Criterion (AIC) (Akaike, 1973, 1974) and the Bayesian Information Criterion (BIC) (Schwarz, 1978). The AIC has its basis in decision theory and arises from maximizing an estimated expected log likelihood. It penalizes the log likelihood by the number of parameters fit to the data to avoid overfitting. The BIC comes from placing a prior distribution on the parameter space including all dimensions and models considered. It is asymptotically equivalent to choosing the model with the highest posterior probability of being correct. Quantitatively, BIC penalizes the log likelihood by the number of parameters fit to the data multiplied by one half the log of the number of observations. McLachlan and Basford (1988) and Leroux and Puterman (1992) discuss the use of AIC and BIC to determine the number of components in a finite mixture model without covariates. Leroux (1992) established consistency of parameter estimates under these penalized likelihood criteria. We define the AIC and BIC criteria for the mixture model as follows:

* AIC: choose the model for which l_c(X) − a_c(X) is largest;
* BIC: choose the model for which l_c(X) − (1/2)(log(n)) a_c(X) is largest,

where l_c(X) is the maximum log likelihood of the mixture with c components and covariates X, a_c(X) = c·k + (c − 1) where k is the dimension of α_j, and n is the total number of observations.
Using the BIC (AIC), our model selection approach consists of two stages. At the first stage, we determine c to maximize BIC (AIC) values for the saturated mixture models that contain all possible covariates. In practice c < 4 usually suffices. Although we compute both AIC and BIC values in our applications, we recommend using BIC because our Monte Carlo studies suggest that BIC is more reliable. At the second stage, our model selection approach depends on our analysis objectives. If our goal is inference about model parameters, we may carry out likelihood ratio tests for nested c-component mixture models. If the goal is choosing an appropriate model to fit the data, we select a model to maximize BIC (AIC) values among c-component mixture models. Since this selection method is heuristic and only gives a guideline in applications, some other specific concerns in model selection should be taken into account from case to case. For instance, in some applications the number of components and some parameters in a mixture may be explicitly or implicitly determined by underlying theory.
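Both criteria reduce to a few lines once the maximized log likelihood is available; a sketch (the function name is ours), with a_c(X) = c·k + (c − 1):

```python
import numpy as np

def aic_bic(loglik, c, k, n):
    """Penalized log likelihoods of Section 4.1: larger is better.

    c: number of components, k: coefficients per component,
    n: number of observations; a_c = c*k + (c - 1) free parameters.
    """
    a_c = c * k + (c - 1)
    return loglik - a_c, loglik - 0.5 * np.log(n) * a_c
```

In the two-stage procedure above, one would compute these for the saturated models with c = 1, 2, 3, … and keep the c with the largest BIC before moving to likelihood ratio tests among nested c-component models.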
In the Monte Carlo studies discussed in Section 3.2, we computed both AIC and BIC values for all possible mixed two- to four-component models. The study shows that AIC and BIC are reliable methods for choosing the correct models. AIC chose the correct model 96% of the time. When AIC failed to select the correct model, it always chose a model with too many components, suggesting that AIC may underpenalize the number of parameters in the mixtures. On the other hand, BIC always chose the correct models, suggesting that BIC may not overpenalize the number of parameters. Note that these results are quite optimistic because the components were well separated in the Monte Carlo study. The examples in the next section will exhibit this procedure in practice.
4.2 Overdispersion
To determine whether the data is overdispersed with respect to the Poisson distribution in Poisson regression models, we use three score test statistics proposed by Dean (1992). They test the hypothesis of no overdispersion against alternatives representing different forms of overdispersion. The test statistics are

    P_a = Σ_{i=1}^n {(y_i − μ̂_i)² − y_i} / {2 Σ_{i=1}^n μ̂_i²}^{1/2},

    P_b = Σ_{i=1}^n {(y_i − μ̂_i)² − y_i + ĥ_i μ̂_i} / {2 Σ_{i=1}^n μ̂_i²}^{1/2},  and

    P_c = (1/√(2n)) Σ_{i=1}^n {(y_i − μ̂_i)² − y_i} / μ̂_i,

corresponding to the following specifications of overdispersion:

(a) E(y_i) = μ_i, Var(y_i) = μ_i(1 + τμ_i) for τ small;
(b) E(y_i) = μ_i, Var(y_i) = μ_i(1 + τμ_i);
(c) E(y_i) = μ_i, Var(y_i) = μ_i(1 + τ).

In these formulae μ̂_i is the estimated mean value for the ith observation based on the Poisson regression, and ĥ_i is the ith diagonal element of the hat matrix from that fit. Under H_0: τ = 0, each statistic asymptotically follows a standard normal distribution. Note that the difference between (a) and (b) is that the former has the approximate forms for the first two moments, whereas the latter has the exact ones.
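Given a fitted Poisson regression with means μ̂_i, statistics of this type are simple to compute. The sketch below is our reading of Dean's (1992) P_a (alternative Var = μ(1 + τμ)) and P_c (alternative Var = μ(1 + τ)); the exact-moment variant P_b is omitted, so treat this as an approximation rather than a definitive implementation of the procedure.

```python
import numpy as np

def dean_score_tests(y, mu_hat):
    """Score statistics P_a and P_c against Poisson regression.

    Both are asymptotically N(0, 1) under H0: tau = 0, so values well above
    the upper normal quantiles indicate overdispersion.
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu_hat, dtype=float)
    num = (y - mu) ** 2 - y                        # common numerator
    Pa = num.sum() / np.sqrt(2.0 * np.sum(mu ** 2))
    Pc = np.sum(num / mu) / np.sqrt(2.0 * len(y))
    return Pa, Pc
```

For equidispersed Poisson data both statistics hover near zero, while mixture-generated counts of the kind modelled in this paper push them sharply positive.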
4.3 Residual Analysis and Goodness-of-fit
Once a mixed Poisson regression model has been fit to a set of observations, it is essential to check the quality of the fit. For this purpose, we consider Pearson, deviance, and likelihood residuals for mixed Poisson regression models. The Pearson residuals satisfy

    r_Pi = (y_i − μ̂_i) / √V̂(Y_i),

where μ̂_i = Σ_{j=1}^c p̂_j λ̂_ij and V̂(Y_i) = μ̂_i + Σ_{j=1}^c p̂_j λ̂_ij² − μ̂_i² is the estimated variance obtained from (4). Note that the sum of squared Pearson residuals, X², gives the Pearson goodness-of-fit statistic for the mixed Poisson regression model.

The deviance residual is defined as

    r_di = sign(y_i − μ̂_i) √d_i,

where d_i = 2{l(y_i, y_i) − l(μ̂_i, y_i)}, l(y_i, y_i) = log Po(y_i | y_i), and l(μ̂_i, y_i) is the log likelihood of the mixed Poisson regression model for y_i. Note that the sum of the squared deviance residuals, D, equals the deviance for the mixed Poisson regression model.

The likelihood residual is derived by comparing the deviance obtained on fitting a mixed Poisson regression model to the complete set of n cases with the deviance obtained when the same model is fitted to n − 1 cases, excluding the ith. This gives rise to a quantity that measures the change in the deviance when each case in turn is excluded from the data set. Thus, the likelihood residual for the ith case is defined as

    r_li = sign(y_i − μ̂_i) √(D − D_(i)),

where D and D_(i) are the deviances based on n and n − 1 cases, respectively.

Because these residuals asymptotically follow the standard normal distribution as μ̂_i → ∞, and large residuals may indicate poorly fitting observations (Pierce and Schafer, 1986), we use index plots of residuals for detection of outliers. Note that plots of residuals versus fitted values are not useful here because they may contain patterns corresponding to different y_i values.
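The Pearson residuals and the X² statistic follow directly from (3) and (4); a minimal sketch (ours), with rates[i, j] holding the fitted λ̂_ij:

```python
import numpy as np

def pearson_residuals(y, rates, p):
    """Pearson residuals for a fitted mixture; their squared sum is the
    Pearson goodness-of-fit statistic X^2."""
    y = np.asarray(y, dtype=float)
    rates = np.asarray(rates, dtype=float)     # rates[i, j] = fitted lambda_ij
    p = np.asarray(p, dtype=float)
    mu = rates @ p                             # mu_i from (3)
    var = mu + (rates ** 2) @ p - mu ** 2      # V(Y_i) from (4)
    return (y - mu) / np.sqrt(var)
```

An index plot of these values (residual against observation number) is the diagnostic recommended above.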
To examinehow the ithobservationaffectsthe set ofparameterestimates,we definethe following
quantity
{11 (^ - (i))/se() 11+ 11(P -P(i)/se(p) 11}
1
mI{Y
J
s'
c6 -~~ jjI
e(t31jy) __
PJ -
8e(pj) f ,
(13)
where α̂ and p̂, and α̂_(i) and p̂_(i), are the maximum likelihood parameter estimates of the mixed Poisson regression model based on the complete data set of n cases and on the data set of n − 1 cases excluding the ith case, respectively; the division is componentwise, ‖·‖ denotes the sum of the absolute components, and m is the total number of parameters in the model. Because each term in (13) measures a relative change in an individual coefficient, w_i can be interpreted as an average relative coefficient change for the set of estimates. This is a useful quantity for assessing the extent to which the set of parameter estimates is affected by the exclusion of the ith observation. Relatively large values of this quantity indicate that the corresponding observations are influential and cause instability in the fitted model. An index plot of w_i is the most useful way of presenting these values. Computing these values requires fitting the model n times; however, good starting values are available, so our algorithm terminates quickly.
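Given full-sample and leave-one-out estimates (here hypothetical numbers, since refitting the model is outside the scope of this sketch), the influence measure (13) reduces to simple arithmetic:

```python
import numpy as np

def avg_rel_coef_change(theta_full, se_full, theta_loo):
    """Average relative coefficient change w_i of equation (13).

    theta_full : (m,) estimates from all n cases (regression
                 coefficients and mixing probabilities stacked)
    se_full    : (m,) their standard errors
    theta_loo  : (m,) estimates with the ith case deleted
    """
    m = theta_full.size
    return float(np.sum(np.abs(theta_full - theta_loo) / se_full) / m)

# hypothetical estimates before and after deleting one observation
theta = np.array([2.84, 1.30, -0.41, -0.43, 0.28])
se    = np.array([0.24, 0.49, 0.09, 0.14, 0.05])
loo   = np.array([2.80, 1.25, -0.40, -0.45, 0.27])
w = avg_rel_coef_change(theta, se, loo)
```

An index plot of w over i = 1, ..., n, as recommended above, then flags observations whose deletion shifts the estimates by many standard errors.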
In order to evaluate the extent to which a mixed Poisson regression model fits a set of data, we use the Pearson statistic X² for a goodness-of-fit test. Pearson's statistic is asymptotically distributed as χ² with n − m degrees of freedom under the null hypothesis. We illustrate residual analysis and a goodness-of-fit test below.
5. Examples
In this section, we apply the mixture models to two examples. We also compare the method proposed in this paper to approaches proposed by McCullagh and Nelder (1989), Breslow (1984), and Lawless (1987b).
5.1 Seizure Frequency in a Clinical Trial
In this subsection we analyze data from a clinical trial carried out at British Columbia's Children's Hospital which investigated the effect of intravenous gammaglobulin (IVIG) on suppression of epileptic seizures. Subjects were randomized into two groups. After 28 days of baseline observation, the treatment group received monthly infusions of IVIG while the control group received "best available therapy." The primary end point of the trial was daily seizure frequency. The principal data source was a daily seizure diary which contained the number of hours of parental observation and the number of seizures of each type during the observation period.
We use Poisson regression to analyze the series of myoclonic seizure counts from a single subject receiving IVIG. Data extracted from the seizure diary were the daily counts, y_i, and the hours of parental observation, t_i, for the ith day (Figure 1). As covariates we use treatment (x_i1), trend (x_i2), and treatment-trend interaction (x_i3), where

x_i1 = 1 if there is a treatment (i > 28), and 0 otherwise (i ≤ 28),   (14)

x_i2 = log(i),   (15)

and

x_i3 = x_i1 x_i2.   (16)
Figure 1. Daily seizure counts.
The second column in Table 2 reports results of fitting the data using Poisson regression with covariates (14), (15), and (16), and a log link function. The data are overdispersed with respect to the Poisson distribution, since each of the overdispersion tests is highly significant (P_a = 16.18, P_b = 16.22, and P_c = 36.33). This suggests the inadequacy of the usual Poisson regression model.
We apply the mixture model assuming that:
(1) each daily observed seizure count, y_i, is associated with time exposure (observation hours), t_i, and covariates x_i = (x_i1, x_i2, x_i3);
(2) the daily seizure counts are independent and follow a mixed Poisson regression model with conditional mean equal to the product of the observation time t_i and the Poisson rate (number of seizures per hour)

λ_j(x_i, α_j) = exp(α_j0 + α_j1 x_i1 + α_j2 x_i2 + α_j3 x_i3),
Table 2
Parameter estimates (standard errors) from five estimation methods for seizure data

                Poisson      Method I         Method II    Method III   Mixed Poisson regression
Covariate       regression   (McCullagh &     (Breslow,    (Breslow,    Comp 1      Comp 2
                             Nelder, 1989)    1984)        1984)
Intercept       2.118        2.118            2.148        2.129        2.8450      2.0704
                (0.0815)     (0.1897)         (0.5539)     (0.3846)     (0.2360)    (0.0890)
x1              4.132        4.132            4.656        3.757        1.3020      7.4318
                (0.3032)     (0.7059)         (1.094)      (0.8322)     (0.4904)    (0.5095)
x2              -0.2257      -0.2257          -0.2412      -0.2408      -0.4063     -0.2707
                (0.0329)     (0.0766)         (0.2191)     (0.1523)     (0.0909)    (0.0377)
x3              -1.320       -1.320           -1.440       -1.221       -0.4309     -2.2762
                (0.0800)     (0.1863)         (0.3098)     (0.2316)     (0.1385)    (0.1377)
Mixing          NA           NA               NA           NA           0.2762      0.7238
probabilities
Unexplained     1.0          5.4206           0.8631       0.4051       NA          NA
variance
where i = 1, ..., 140, j = 1, ..., c, and c is the number of components in the mixture model. This model allows the treatment, trend, and interaction of the treatment and trend to affect the Poisson rate, and the regression coefficients to vary between components.
Table 3 provides the results of fitting these models. Among the three saturated mixture models, both AIC and BIC suggest a two-component model. Within the class of two-component models we can carry out likelihood ratio tests for treatment, trend, and interaction effects respectively. For example, to test for any treatment effect, i.e., H0: α_j1 = α_j3 = 0 for j = 1, 2, we find that the likelihood ratio statistic equals 2 × (445.30 − 376.18) = 138.24 > χ²_{4,0.99} = 13.277. Thus, there is strong evidence of a treatment effect.
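The likelihood ratio computation above can be reproduced directly; the log likelihoods are taken from Table 3, and the critical value χ²_{4,0.99} = 13.277 is the one quoted in the text:

```python
# log likelihoods from Table 3: the two-component model without the
# treatment terms (x1, x3) and the saturated two-component model
ll_reduced = -445.30
ll_full = -376.18

lr = 2.0 * (ll_full - ll_reduced)   # likelihood ratio statistic, 4 df
crit = 13.277                       # chi-square 0.99 quantile with 4 df
reject = lr > crit                  # True: strong evidence of treatment effect
```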
Table 3
Mixed Poisson regression model estimates for seizure data

Component    Mixing        Poisson rate estimates                    Log
number (j)   probability   (α̂_j0, α̂_j1, α̂_j2, α̂_j3)                  likelihood   AIC        BIC

One-component mixture
1                          2.118, 4.132, -0.2257, -1.320             -583.16    -587.16    -593.04

Two-component mixtures
1            0.4128        1.2183                                    -700.10    -703.10    -707.41
2            0.5872        -1.1571
1            0.3715        1.8959, -1.2761                           -462.79    -467.79    -475.14
2            0.6285        1.3777, -3.1018
1            0.4007        2.3702, -0.3864                           -445.30    -450.30    -457.65
2            0.5993        4.9182, -1.4524
1            0.3736        2.9919, -0.4732, -0.4718                  -426.21    -433.21    -443.51
2            0.6264        2.1791, -2.3248, -0.3379
1            0.2761        2.8450, 1.3020, -0.4063, -0.4309          -376.18    -385.18    -398.41
2            0.7239        2.0704, 7.4318, -0.2707, -2.2762

Three-component mixture
1            0.2742        2.8440, 1.2938, -0.4054, -0.4294          -375.29    -389.29    -409.88
2            0.0277        2.0809, -28.767, -0.3928, 5.4488
3            0.6981        2.0694, 7.3197, -0.2648, -2.2478

Bold numbers are the parameter estimates, log-likelihood, AIC, and BIC values of the selected model.
For the saturated two-component mixed Poisson regression model, the Pearson goodness-of-fit statistic X² is 134.0 with 131 degrees of freedom. This value does not exceed the upper 95% critical point of the χ²₁₃₁ distribution, suggesting that there is no evidence of lack of fit. The Pearson, deviance, and likelihood residuals from the fitted model are displayed in Figure 2. Observe that the Pearson residuals do not appear to follow a normal distribution; further, the deviance residuals and likelihood residuals are very similar, and for both the 61st observation is far distant from the remaining observations, suggesting that it may be an outlier. On omitting this observation, the parameter estimates change little; the deviance reduction is r²_{L,61} = (−3.14)² = 9.86, while X² is only reduced by 0.8. This means that the 61st observation is an outlier, but it may not have great impact on the overall fit of the mixed Poisson regression model to the data.
Figure 2. Residuals and average relative coefficient changes for seizure data.
For detection of influential observations, the average relative coefficient changes w_i are displayed in Figure 2. This plot shows that the sixth observation is the only influential observation. On omitting the sixth observation, the average relative coefficient change for each parameter estimate is about 20%, and the new parameter estimates become

α̂_1 = (2.2701, 1.8800, −0.2006, −0.6373),
α̂_2 = (2.0045, 7.4989, −0.2444, −2.3026),
p̂_1 = 0.2740 and p̂_2 = 0.7260.

Note that the change in the parameter estimates of the first component is relatively larger than that in the other parameter estimates. After excluding the sixth observation we reanalyze the data. The selection criteria again choose a saturated two-component model as the best fitting model.
We now interpret the original fitted model. In it the mixing probabilities equal 0.2761 and 0.7239 and the respective conditional rate functions are

λ̂_1(x_i, α̂_1) = exp(2.8450 + 1.3020 x_i1 − 0.4063 x_i2 − 0.4309 x_i3)

and

λ̂_2(x_i, α̂_2) = exp(2.0704 + 7.4318 x_i1 − 0.2707 x_i2 − 2.2762 x_i3).
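These fitted rate functions are easy to evaluate numerically. The sketch below (Python; the treated-versus-untreated comparison at the same day is an illustrative counterfactual, not part of the trial design) shows that switching the treatment indicators on lowers the estimated hourly rate in both components:

```python
import math

def rate(alpha, day, treated):
    """Fitted hourly seizure rate exp(a0 + a1*x1 + a2*x2 + a3*x3)."""
    x1 = 1.0 if treated else 0.0
    x2 = math.log(day)
    x3 = x1 * x2                 # treatment-trend interaction
    a0, a1, a2, a3 = alpha
    return math.exp(a0 + a1 * x1 + a2 * x2 + a3 * x3)

alpha1 = (2.8450, 1.3020, -0.4063, -0.4309)  # component 1 ("good days")
alpha2 = (2.0704, 7.4318, -0.2707, -2.2762)  # component 2

# hourly rates on day 30: treated vs (counterfactually) untreated
r1_on, r1_off = rate(alpha1, 30, True), rate(alpha1, 30, False)
r2_on, r2_off = rate(alpha2, 30, True), rate(alpha2, 30, False)
```

Because log(30) ≈ 3.4 exceeds the ratio of the treatment main effect to the (negative) interaction coefficient in each component, the net treatment contribution is negative on day 30, consistent with Figure 3.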
For instance, α̂_11 = 1.3020 is the estimated treatment effect conditional on the patient being in underlying state one, while α̂_21 = 7.4318 is the effect when the patient is in state two. Zeger, Liang, and Albert (1988) and Neuhaus, Kalbfleisch, and Hauck (1991) distinguish conditional and unconditional interpretations of regression parameters.
Figure 3 provides the estimated hourly seizure rate corresponding to each component (the solid line is the rate for component one and the dotted line for component two) and the observed hourly seizure rate y_i/t_i. Observe that after the treatment both hourly rates are lower and the trend is less steep than at baseline, suggesting that this patient benefited from IVIG therapy. Figure 4 depicts the estimated mean E(Y_i) (the solid line) and variance Var(Y_i) (the dotted line) for the fitted model obtained through (3) and (4). Observe that after the treatment the variance becomes much closer to the mean, suggesting the patient's situation becomes more stable. Further, the variance exceeds the mean throughout, with the greatest difference in the baseline period. The "bumpiness" in these quantities is due to the nonconstant exposure. Note also that there is no obvious parametric relationship between the estimated mean and variance.
Although no empirical verification of classification is available from the data, we note that the clinical investigators conducting this study found the two-component model plausible. They said that they have observed subjects to have "bad days" and "good days" with no obvious explanation of this effect. We believe our model captures this aspect of the data and by doing so provides a clinically meaningful explanation of overdispersion. Note that Figure 3 also classifies the days in terms of the estimated posterior probabilities. Those observations marked as "1" form a group which is characterized by the Poisson rate function λ̂_1(x_i, α̂_1), while those marked as "2" form another group which is characterized by λ̂_2(x_i, α̂_2). In this sense, our model consists of two Poisson regression models, describing the seizure rate on "bad days" and "good days" respectively.
For the purpose of comparison, we also fit the data with previously suggested methods. Table 2 reports parameter estimates for these methods. Note that Methods I and II are quasi-likelihood approaches assuming variance functions Var(Y_i) = σ² E(Y_i) and Var(Y_i) = E(Y_i) + σ² E(Y_i)², respectively. Method III assumes that E(Y_i) = Λ_i is a random variable, with log(Λ_i) = x_i β + ε_i, where x_i are covariates, β are unknown regression parameters, and ε_i are random error terms having mean 0 and a constant unknown variance σ². The unknown parameter σ² in these models is called unexplained variance. Estimation for these models is discussed by McCullagh and Nelder (1989) and Breslow (1984).
From Table 2 we find that using different methods for overdispersion may lead to either different parameter estimates or different standard errors or both. For instance, the coefficient estimate for treatment is 4.132 by Method I, 4.656 by Method II, 3.757 by Method III, and 1.3020 for component 1 and 7.4318 for component 2 in our model. Further, the ratio of estimate to standard error for treatment is 5.85 under Method I, 4.26 under Method II, 4.51 under Method III, and 2.66 for component 1 and 14.59 for component 2 under our mixture. This implies that these methods disagree with respect to the significance of the treatment effect. Compared with the three methods for overdispersion, the proposed model has smaller confidence intervals for parameter estimates.
5.2 Ames Salmonella Assay Data
The assay data analyzed in this section were presented by Margolin et al. (1981) and analyzed by Breslow (1984) and Lawless (1987b) using quasi-likelihood and negative binomial approaches respectively. Table 4 shows the number of revertant colonies (y_i) observed on each of three replicate plates tested at each of six dose levels of quinoline (d_i).
Lawless (1987b) defined the expected frequency of revertants as

E(Y_i | d_i) = λ(d_i) = exp(α_0 + α_1 d_i + α_2 log(d_i + 10)),

while Breslow (1984) assumed E(Y_i | d_i) = λ(d_i). At issue is whether a mutagenic effect is present. This corresponds to testing the hypothesis that α_2 = 0. The data are overdispersed relative to Poisson regression, since each of the three tests for overdispersion is highly significant (P_a = 5.628, P_b = 5.656, and P_c = 5.607).
To account for overdispersion, Breslow (1984) assumed a variance function Var(Y_i) = λ(d_i) + σ² λ(d_i)², and obtained parameter estimates by using weighted least squares combined with the method of moments. Similarly, Lawless (1987b) fitted the data with a negative binomial model in which the variance function is Var(Y_i) = λ(d_i) + σ² λ(d_i)², and obtained parameter estimates by maximum likelihood. Parameter estimates (standard errors) are reported in Table 5.
Our analysis of the data using mixed Poisson regression models follows. We assume that:
Figure 3. Estimated hourly seizure rates and classification of seizure data according to the estimated posterior probabilities based on the fitted model.
Figure 4. Estimated mean (solid line) and variance (dotted line) based on the fitted model for seizure data.
Table 4
Number of revertant colonies of salmonella (y_i)

                          Dose of quinoline d_i (μg/plate)
                          0     10    33    100   333   1000
Observed number of        15    16    16    27    33    20
colonies                  21    18    26    41    38    27
                          29    21    33    60    41    42
Table 5
Parameter estimates (standard errors) from five estimation methods for Ames salmonella assay data

                 Poisson regression            Quasi-          Negative        Mixed Poisson regression
                 Complete     Without          likelihood      binomial
Covariate        data         observation 12   (Breslow, 1984) (Lawless, 1987b) Comp 1      Comp 2
Intercept        2.173        2.308            2.203           2.203           1.9094      2.4768
                 (0.2183)     (0.2266)         (0.3634)        (0.359)         (0.2674)    (0.2753)
Dose             -0.001013    -0.000750        -0.000974       -0.000980       -0.001260   (common)
                 (0.000245)   (0.000265)       (0.000437)      (0.000381)      (0.000275)
log(Dose + 10)   0.3198       0.2632           0.3110          0.313           0.3640      (common)
                 (0.05698)    (0.0607)         (0.09901)       (0.0868)        (0.0665)
Mixing           NA           NA               NA              NA              0.8173      0.1827
probabilities
Unexplained      1.0          1.0              0.07181         0.0488          NA          NA
variance

In the mixed Poisson regression model the dose and log(Dose + 10) coefficients are common to both components.
(1) the number of observed revertant colonies, y_i, is associated with covariates x_i = (1, d_i, log(d_i + 10)), and t_i = 1;
(2) the y_i are independent and follow a mixed Poisson regression model with Poisson rates

λ_j(x_i, α_j) = exp(α_j0 + α_j1 d_i + α_j2 log(d_i + 10)),

where i = 1, ..., 18 and j = 1, ..., c.
Table 6 shows the results of fitting these models. Among the three saturated models, both AIC and BIC select two-component mixtures. To test mutagenic effects, we compare the saturated model with the one without covariate log(d_i + 10) using a likelihood ratio test. Since the chi-square test statistic equals 2 × (68.81 − 60.90) = 15.82 > χ²_{2,0.99} = 9.21, mutagenic effects are significant. Further, the similar regression coefficient estimates for each component in the saturated model suggest common regression coefficients for both components. This is indeed confirmed by the likelihood ratio test (the chi-square test statistic is 2 × 0.01 = 0.02 < χ²_{2,0.99} = 9.21). Hence we choose to represent the data by the two-component mixture with common regression coefficients and component dependent intercepts.
After fitting the two-component mixed Poisson regression model to the data, the Pearson goodness-of-fit statistic X² is 16.2 with 13 degrees of freedom, suggesting that there is no evidence of lack of fit in the mixed Poisson regression model.
The fitted model may be interpreted as follows. In it, the mixing probabilities equal 0.8173 and 0.1827 and the respective conditional rate functions are:

λ̂_1(x_i, α̂_1) = exp(1.9094 − 0.00126 d_i + 0.3640 log(d_i + 10)) and
λ̂_2(x_i, α̂_2) = exp(2.4768 − 0.00126 d_i + 0.3640 log(d_i + 10)).

This model indicates that mutagenic effects are the same for both components. This model may also be regarded as a Poisson regression with a random intercept following a discrete mixing distribution with two points of support. Figure 5 shows the classification for the data in which each observation is identified with either of the two components in the mixture according to the estimated posterior probabilities defined by (9). This plot may provide a way to visualize overdispersion for the data.
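The posterior classification rule can be sketched as follows (Python; the count and the component rates in the example are hypothetical, not fitted values, and the weighting p_j f(y; t λ_ij) normalized over components is the standard Bayes form corresponding to (9)):

```python
import math

def poisson_pmf(y, mu):
    """Po(y | mu) via the log pmf, for numerical stability."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def posterior_probs(y, t, p, lam_i):
    """Posterior probability that (y, t) came from each component j:
    p_j Po(y | t * lam_ij) normalized over j."""
    w = [pj * poisson_pmf(y, t * l) for pj, l in zip(p, lam_i)]
    s = sum(w)
    return [v / s for v in w]

# hypothetical plate: 33 revertants, unit exposure, two component rates
post = posterior_probs(33, 1.0, (0.8173, 0.1827), (22.0, 39.0))
label = 1 + post.index(max(post))   # classify to the highest posterior
```

Even though the second component has the smaller prior weight here, a count much closer to its rate pulls the posterior toward it, which is exactly how the plates plotted as "2" in Figure 5 are identified.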
Table 6
Mixed Poisson regression model estimates for Ames salmonella assay data

Component    Mixing        Poisson rate estimates              Log
number (j)   probability   (α̂_j0, α̂_j1, α̂_j2)                  likelihood   AIC       BIC

One-component mixture
1                          2.173, -0.001013, 0.3198            -68.13    -71.13    -72.47

Two-component mixtures
1            0.6145        3.0779                              -68.93    -71.93    -73.27
2            0.3855        3.7112
1            0.5617        2.9886, 0.000188                    -68.81    -73.81    -76.04
2            0.4383        3.6428, 0.000082
1            0.8132        1.9125, -0.001247, 0.3623           -60.90    -67.90    -71.02
2            0.1868        2.4064, -0.001294, 0.3790
1            0.8173        1.9094, -0.001260, 0.3640           -60.91    -65.91    -68.14
2            0.1827        2.4768

Three-component mixture
1            0.5918        1.8484, -0.001190, 0.3640           -60.78    -71.78    -76.68
2            0.3241        2.8535, -0.000154, 0.1476
3            0.0841        5.9320, -0.000100, -0.3895

Bold numbers are the parameter estimates, log-likelihood, AIC, and BIC values of the selected model.
From it we conjecture that the three observations classified as component 2 may be outliers, and that overdispersion may be due to these three observations.
Residual plots are displayed in Figure 6. The plot of average relative coefficient changes shows that the 12th observation is influential. On omitting the 12th observation, the estimates of the intercepts in the two components are 2.2242 and 2.5460 respectively; the estimates of the other common regression parameters are -0.00067 and 0.2430 respectively; and the estimates of the mixing probabilities for the two components are 0.5644 and 0.4356 respectively. Note that the new intercept estimates are very close, suggesting that the data excluding the 12th observation may not be overdispersed. In fact, we fit this data with a Poisson regression model, and find that there is no strong evidence of overdispersion (P_a = 1.6142, P_b = 1.6132, and P_c = 1.8543). This shows that extra-Poisson variation may be caused by outliers as suggested above.
Table 5 shows that the regression coefficient estimates, α̂_1 and α̂_2, do not vary drastically across models, but their standard errors do. For instance, the value of α̂_12/se(α̂_12) varies between 0.3640/0.0665 = 5.4737 under the mixed Poisson regression model and 0.3110/0.09901 = 3.1411 under the quasi-likelihood model. Thus, although all four models show that mutagenic effects are significant, they disagree as to the significance of the effects. Note that in this application, the confidence intervals under the mixed Poisson regression model are much smaller than under either the quasi-likelihood or negative binomial model. Hence effects are estimated more precisely. For example, a 95% confidence interval for the coefficient of log(dose + 10) is 0.3640 ± 0.1303 under the mixed Poisson regression, 0.3110 ± 0.1941 under quasi-likelihood, and 0.313 ± 0.1701 under the negative binomial model. This suggests that using different models to account for overdispersion may lead to different conclusions in other applications. However, there is no guarantee that the standard errors under the finite mixture model will always be the smallest.
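The quoted confidence-interval half-widths follow from the standard errors in Table 5 via the usual normal-approximation formula 1.96 × se:

```python
# 95% confidence half-widths for the log(dose + 10) coefficient,
# using the standard errors reported in Table 5
half_mixed = 1.96 * 0.0665    # mixed Poisson regression -> ~0.1303
half_quasi = 1.96 * 0.09901   # quasi-likelihood (Breslow, 1984) -> ~0.1941
half_negbin = 1.96 * 0.0868   # negative binomial (Lawless, 1987b) -> ~0.1701
narrowest = min(half_mixed, half_quasi, half_negbin)
```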
This example illustrates a difference between our approach and the usual approaches for accounting for overdispersion. Since the variance exceeds the mean, methods which correct for this by increasing the variance may lead to less significant regression coefficient estimates. Our approach has a different effect. By allowing for a random intercept with finite support, the mixed Poisson regression model estimates coefficient effects with smaller error.
6. Conclusion
This paper provides a mixed Poisson regression model in which the rates of the component distributions depend on covariates. This model can be used to explain overdispersion in Poisson regression models.
Figure 5. Means and classification for assay data according to the estimated posterior probabilities based on the fitted model.
Figure 6. Residuals and average relative coefficient changes for assay data.
The EM algorithm has been applied to find the MLEs of the parameters in the mixed models. The results of simulations have shown that our code works efficiently. In addition, we discussed how to use AIC and BIC for model choice and residual analysis for outlier detection. Two examples illustrate the use of these models and provide some interesting results. In the first application the regression coefficients vary across components, whereas the second example has common regression coefficients and random intercepts.
ACKNOWLEDGEMENTS
We wish to thank Professors N. M. Laird and B. Leroux for helpful comments on an early version of this paper. This research has been partially supported by NSERC grant A5527. We thank Professor Breslow for providing us with his code for estimating parameters of the quasi-likelihood model.
RÉSUMÉ
This paper studies a class of mixed Poisson models whose rates depend on covariates. Mixtures of independent Poisson distributions and Poisson regression models are special cases. The discussion covers quasi-Newton and EM-based estimation methods, a model selection procedure, residual analysis, and goodness-of-fit tests. Implementation and model-choice performance are explored by a Monte Carlo method. The methodology is then used to study, on the one hand, epileptic seizure frequencies and, on the other, Salmonella mutation frequencies in the Ames mutagenicity assay.
CODES
FORTRAN codes for algorithms and the seizure data are available from http://markov.commerce.ubc.ca/marty/home.html.
REFERENCES
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory, B. N. Petrov and F. Csaki (eds), 267-281. Budapest: Akademiai Kiado.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control AC-19, 716-723.
Albert, P. S. (1991). A two-state Markov model for a time series of epileptic seizure counts. Biometrics 47, 1371-1381.
Breslow, N. E. (1984). Extra-Poisson variation in log-linear models. Applied Statistics 33, 38-44.
Breslow, N. E. (1990a). Tests of hypotheses in overdispersed Poisson regression and other quasi-likelihood methods. Journal of the American Statistical Association 85, 565-571.
Breslow, N. E. (1990b). Further studies in variability of pock counts. Statistics in Medicine 9, 615-626.
Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88, 9-25.
Brillinger, D. R. (1986). The natural variability of vital rates and associated statistics (with discussion). Biometrics 42, 693-734.
Cameron, A. C. and Trivedi, P. K. (1990). Regression-based tests for overdispersion in the Poisson model. Journal of Econometrics 46, 347-364.
Cameron, A. C. and Trivedi, P. K. (1986). Econometric models based on count data: Comparisons and applications of some estimators and tests. Journal of Applied Econometrics 1, 29-53.
Collings, B. J. and Margolin, B. H. (1985). Testing goodness-of-fit for the Poisson assumption when observations are not identically distributed. Journal of the American Statistical Association 80, 411-418.
Cox, D. R. (1983). Some remarks on overdispersion. Biometrika 70, 269-274.
Dean, C. and Lawless, J. F. (1989). Tests for detecting overdispersion in Poisson regression models. Journal of the American Statistical Association 84, 467-472.
Dean, C. (1992). Testing for overdispersion in Poisson and binomial regression models. Journal of the American Statistical Association 87, 451-457.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B 39, 1-38.
Fisher, R. A. (1950). The significance of deviations from expectation in a Poisson series. Biometrics 6, 17-24.
Follmann, D. A. and Lambert, D. (1989). Generalizing logistic regression by nonparametric mixing. Journal of the American Statistical Association 84, 294-300.
Follmann, D. A. and Lambert, D. (1991). Identifiability of finite mixtures of logistic regression models. Journal of Statistical Planning and Inference 27, 375-381.
Hinde, J. P. (1982). Compound Poisson regression models. In GLIM 82, R. Gilchrist (ed), 109-121. New York: Springer.
Hopkins, A., Davies, P., and Dobson, C. (1985). Mathematical models of patterns of seizures: Their use in the evaluation of drugs. Archives of Neurology 42, 463-467.
Laird, N. M. (1978). Nonparametric maximum likelihood estimation of a mixing distribution. Journal of the American Statistical Association 73, 805-811.
Lawless, J. F. (1987a). Regression methods for Poisson process data. Journal of the American Statistical Association 82, 808-815.
Lawless, J. F. (1987b). Negative binomial and mixed Poisson regression. The Canadian Journal of Statistics 15, 209-225.
Le, N. D., Leroux, B. G., and Puterman, M. L. (1992). Exact likelihood evaluation in a Markov mixture model for time series of seizure counts. Biometrics 48, 317-323.
Lehmann, E. L. (1983). Theory of Point Estimation. New York: Wiley.
Leroux, B. G. (1989). Maximum likelihood estimation for mixture distributions and hidden Markov models. Ph.D. dissertation, University of British Columbia, Vancouver, B.C., Canada.
Leroux, B. G. (1992). Consistent estimation of a mixing distribution. Annals of Statistics 20, 1350-1360.
Leroux, B. G. and Puterman, M. L. (1992). Maximum penalized likelihood estimation for independent and Markov dependent mixture models. Biometrics 48, 545-558.
Lindsay, B. G. (1983). The geometry of mixture likelihoods: A general theory. The Annals of Statistics 11, 86-94.
Manton, K. G., Woodbury, M. A., and Stallard, E. (1981). A variance components approach to categorical data models with heterogeneous cell populations: Analysis of spatial gradients in lung cancer mortality rates in North Carolina counties. Biometrics 37, 259-269.
Margolin, B. H., Kaplan, N., and Zeiger, E. (1981). Statistical analysis of the Ames salmonella/microsome test. Proceedings of the National Academy of Sciences U.S.A. 76, 3779-3783.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd edition. London: Chapman and Hall.
McLachlan, G. J. and Basford, K. E. (1988). Mixture Models. New York: Marcel Dekker.
Milton, J. G., Gotman, J., Remillard, G. M., and Andermann, F. (1987). Timing of seizure recurrence in adult epileptic patients: A statistical analysis. Epilepsia 28, 471-478.
Nash, J. C. (1990). Compact Numerical Methods for Computers, 2nd edition. Bristol: Adam Hilger.
Neuhaus, J. M., Kalbfleisch, J. D., and Hauck, W. W. (1991). A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data. International Statistical Review 59, 25-35.
Olson, L. D., Iasemidis, L. D., and Sackellares, J. C. (1989). Evidence that interseizure intervals exhibit low dimensional nonrandom dynamics. Epilepsia 30, 644 (abstract).
Pierce, D. A. and Schafer, W. (1986). Residuals in generalized linear models. Journal of the American Statistical Association 81, 977-986.
Redner, R. A. and Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review 26, 195-239.
Schall, R. (1991). Estimation in generalized linear models with random effects. Biometrika 78, 719-727.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6, 461-464.
Simar, L. (1976). Maximum likelihood estimation of a compound Poisson process. The Annals of Statistics 4, 1200-1209.
Tauboll, E., Lundervold, A., and Gjerstad, L. (1989). Characterization of seizure patterns in epilepsy. Epilepsia 30, 644 (abstract).
Teicher, H. (1961). Identifiability of mixtures. The Annals of Mathematical Statistics 32, 244-248.
Titterington, D. M., Smith, A. F. M., and Makov, U. E. (1985). Statistical Analysis of Finite Mixture Distributions. New York: John Wiley & Sons.
Wang, P. (1994). Mixed regression models for discrete data. Unpublished Ph.D. dissertation, University of British Columbia, Vancouver, B.C., Canada.
Williams, D. A. (1982). Extra-binomial variation in logistic linear models. Applied Statistics 31, 144-148.
Wu, C. F. J. (1983). On the convergence properties of the EM algorithm. The Annals of Statistics 11, 95-103.
Zeger, S. L., Liang, K. Y., and Albert, P. S. (1988). Models for longitudinal data: A generalized estimating equation approach. Biometrics 44, 1049-1060.
Zelterman, D. and Chen, C. F. (1988). Homogeneity tests against central-mixture alternatives. Journal of the American Statistical Association 83, 179-182.
Received October 1993; revised October 1994; accepted March 1995.