Power and sample-size calculation for Poisson regression
6 February 2003, revised 7 Feb
Peter Lane, Research Statistics Unit
Background
Standard software such as nQuery and Pass do not provide for the calculation of
power or sample size from Poisson regression. There may be specialist packages,
but I am not aware of any. The GSK Technical Reference Document on sample
size also does not mention Poisson models. The two approaches I have used are to
use a Normal approximation, and to use simulation.
One reference is: David F Signorini (1991), Sample size for Poisson regression.
Biometrika, 78, 446-450. But this deals specifically with a quantitative
explanatory variable and is concerned with the distribution of values of that
variable.
Normal approximation
In a simple Poisson regression model, the response is assumed to have a Poisson
distribution with a separate mean for each treatment group. The model can be
extended to include other explanatory variables, including covariates and class
effects, in the same way as in a linear model except that it is usual to combine the
effects multiplicatively. This is done because it has been found in practice to
provide concise explanation of variation in data with Poisson distributions, and
because count data can often be modelled in terms of probabilities which
combine multiplicatively. It is effected in practice by using a logarithmic "link
function", with effects contributing to an additive "linear predictor" on this
transformed scale.
In the simple model with two treatment groups, we have
yij ~ P(μi), i=1,2, j=1… ni
The estimate of each mean μi is
mi = Σ(yij, j=1… ni)/ni
which has an approximate Normal distribution by the Central Limit Theorem
mi ~ N(μi, μi/ni)
so the difference between the two means is also approximately Normal
m2– m1 ~ N(μ2–μ1, (μ1+μ2)/n), if n1 = n2 = n
If there is overdispersion, the variance of this estimate is modified to
φ(μ1+μ2)/n
where is an estimate of overdispersion. I recommend the deviance estimate,
φD = (residual deviance) / (residual d.f.)
which corresponds to the DSCALE option of the MODEL statement in Proc
Mixed of SAS. The alternative is the Pearson estimate, corresponding to the
PSCALE option.
With a modest number of patients in each arm, say ni >10, the Normal
approximation will be good, as long as the means are not too close to 0, say μi >5.
(I will produce a table indicating how good this approximation is for a range of
combinations of n and μ.) The standard sample-size formula for a t-test with
Normal data can then be applied, giving the number required for each arm, with
equal allocation, as
n = (Z(1–β)+Z(1–α/2))2 * φ(μ1+μ2)/( μ2–μ1)2
where Z(p) is the Normal equivalent deviate for cumulative probability p, α and β
are the Type I and II Errors. The required values of μ1 and μ2 depend on the type
of test to be carried out.
Unequal allocation
If the allocation ratio n2/n1 is r, the formula becomes
n1 = (Z(1–β)+Z(1–α/2))2 * φ(μ1+μ2/r)/( μ2–μ1)2
So, for example, for r=2, so that there are more observations on the test drug than
for the reference, the total number required is
n1+n2 = (Z(1–β)+Z(1–α/2))2 * 3φ(μ1+μ2/2)/( μ1–μ2)2
which is about 13% higher (taking μ1=μ2) than for r=1. For r=3 it is about 33%
higher than r=1.
Use of the formula
It is important to identify the response variable carefully. The Poisson model is
appropriate for analysing actual counts, because the Poisson distribution is the
distribution of the counts resulting from observing a Poisson process over a fixed
period of time. If the counts are scaled, to give a rate for a different interval of
time than what was observed, then the distribution will no longer be Poisson. If
the counts are scaled to give a rate by dividing by ρ, the mean of the rate will be
μ/ρ but the variance will be μ/ρ2; so the distribution could be modelled as Poisson
with underdispersion (φ=1/ρ). But it is better to model the original counts
themselves.
In a standard two-sided test of non-equality, or difference, between two
treatments (conventionally referred to as a test of superiority), it is usual to
calculate sample size on the basis of a pre-assigned value for a difference between
the means, called the clinically relevant difference, denoted here by δ=μ2–μ1. For
Normal data, all that is actually required is this difference; but for Poisson data,
we need both means (giving the formula as above) or δ and one mean, say μ1 for a
reference treatment (placebo or standard active), for which the formula becomes
n = (Z(1–β)+Z(1–α/2))2 * φ(2μ1+δ)/δ2.
A test of superiority is just a one-sided test of difference. The formula is
n = (Z(1–β)+Z(1–α))2 * φ(2μ1+δ)/δ2,
but in practice, for example for drug submissions to regulators, the Type I Error is
required to be half that for the corresponding test of non-equality, so the same
sample size is needed.
In a test of equivalence between two treatments, it is usual to calculate sample
size based on equality of the means. This is done with an "intersection-union"
test, which is described as TOST or two one-sided tests, which requires the
specification of a tolerance, τ, representing the maximum acceptable value of the
difference between the means. The formula for sample size is similar to that for a
non-equality test, but the Type II Error is divided by 2 and the Type I Error
multiplied by 2:
n = (Z(1–β/2)+Z(1–α))2 * 2φμ1/τ 2.
(But see the modification below, in the section on sensitivity analysis).
The Type I Error for this combined test is actually the same as that for each one-
sided test, but in practice for drug submissions (as laid out in ICH E9) it is
required to be half of what would be used for two-sided tests, so the Z(1–α)
component of this formula will be the same as the corresponding component in a
test of non-equality.
In a test of non-inferiority, sample size is again usually calculated on the basis of
equal means for the two treatments. This test is just a one-sided version of the
equivalence test, and the formula becomes
n = (Z(1–β)+Z(1–α))2 * 2φμ1/τ 2.
If a difference δ is assumed between the treatments (still with δ=μ2–μ1, so that δ
is positive if the test mean is greater than the reference mean), the formula
becomes
n = (Z(1–β)+Z(1–α))2 * φ(2μ1+δ)/(τ+δ) 2.
Effect of covariates
If effects of covariates are to be included in a Poisson regression, they should be
combined multiplicatively. With a linear model, the power of the tests described
above would be decreased slightly by the inclusion of covariates, because there
are fewer d.f. for estimating the variance; but this effect is minimal when there are
many patients. With a Poisson model, there is actually no effect on the power,
though in practice the overdispersion parameter has to be estimated in the same
way as the variance in a Normal model. The means used in the calculations are
also not affected, as long as the usual assumption is made in advance, that the
covariates have zero effect. The fact that the model will actually be analysed with
a log link function does not affect the power calculation, because the same model
is being hypothesized if the means are expressed on the natural scale as when they
are expressed on a log scale.
Sensitivity analysis
As with any model, it is important to conduct a sensitivity analysis to establish
how the required sample size, or the power with a given sample size, changes as
the assumed values change. An extra consideration in this model is the
overdispersion factor. It should also be borne in mind that observed values of a
mean that are greater than the hypothesized clinically relevant difference will be
associated with a greater variance as well. In a non-equality, superiority or non-
inferiority test the extra variance will be more than compensated for by the extra
difference between the means. But, in an equivalence test, if the test mean proves
larger than the reference mean, the variance will also be larger than used in the
formula, so it would be prudent to adjust the formula to
n = (Z(1–β/2)+Z(1–α))2 * 2φ(μ1+τ)/τ 2.
Analysis
In the analysis, the log link has to be used, in order to include the effect of
covariates. With this parameterization, the tolerance level to use in a non-
inferiority test needs to be specified on the log scale as well. It is not practicable
to specify a value on the scale of the response variable itself. Instead, the
tolerance level should be specified as a percentage of the mean of the reference
treatment.