SAMPLE SIZE
ESTIMATION
Community Medicine
North Bengal Medical College
Research Process
Research Planning
Hypothesis and Aims
Research Design
Data Collection
Organization and Presentation
Data Analysis
Interpretation and Conclusion
Publication
Research Design
Study Type and Design
Sampling Method
Sample Size
Why Estimate Sample Size?
Too small a sample: may fail to answer the
question, or answer it imprecisely
Too large a sample: may answer the
question but be logistically difficult or
costly
The goal:
to estimate an appropriate number of
participants given the study design that
will give reasonably precise values with
adequate power.
So, the
Sample -
Must be of optimum size & should be large
enough to give valid estimate about
population characteristics.
There is no magic number that we can
point to as an optimum sample size.
Nor can we say what percentage of the
population should be sampled.
Factors Determining Sample Size
Nature of universe
Type of study
Sampling technique
Design effect
Confidence interval
Magnitude of problem
Precision & power of the study
Anticipated dropouts
SOME TERMINOLOGIES
- For Understanding Sample Size
Estimation
Hypotheses
Hypothesis: a prediction about the outcome of
research
Hypothesis testing is a procedure that uses
sample data to evaluate a hypothesis about a
population parameter (e.g. mean, standard
deviation, proportion)
Briefly, we make a decision about the
hypothesis on the basis of our sample data.
Types of Hypotheses
Null Hypothesis (H0):
a statement that usually claims a zero difference,
which the researcher tries to disprove, reject or nullify.
(The mean weights of males and females are not different.)
Alternative Hypothesis (H1):
the statement we actually want to test;
usually postulates a non-zero difference or relationship
(The mean weights of males and females are different.)
Directional Hypotheses
H0: $\mu_1 = \mu_2$
H1: $\mu_1 \neq \mu_2$ (two-sided test)
H1: $\mu_1 > \mu_2$ or $\mu_1 < \mu_2$ (one-sided test)
Errors in Hypothesis
Testing
Type I (α) Error & Confidence Level
Type I Error (α error)
Rejecting H0 when it is actually true
Concluding a difference when actually no
difference exists
Confidence level (of a confidence interval):
The probability that an estimate of a
population parameter is within certain
specified limits of the true value;
commonly denoted by 1 - α.
p Value and Significance Level
p-value:
The probability of obtaining a sample with a test statistic
like ours, or a more extreme one, provided that H0 is true.
It is compared with the chosen type I (α) error rate; it is
not itself the probability that H0 is true.
The smaller the p-value, the stronger the evidence against H0.
Significance level (α):
An arbitrary probability threshold declared a priori.
Cut-off point for the p-value, below which H0 will be
rejected.
Typically set at 5% (i.e. α = 0.05; confidence level 1 - α = 0.95)
Type II (β) Error and Power
Type II Error (β error)
Accepting (failing to reject) H0 when it is actually false
Concluding no difference when one does
exist
Power:
Probability of detecting a difference if one
exists
It is commonly denoted by 1 - β
Commonly used power: 80% or 90%
Interpretation: 80% power means there is an
80% chance that a true effect/difference will
be found and a 20% chance that a true effect
will be missed (β error).
Zα and Zβ
Zα: Z value required for a chosen level of α (Type I error), two-sided
Zβ: Z value required for a chosen level of β (Type II error),
OR a chosen level of power (1 - β)

Level of α (p value)    Zα
0.05                    1.96
0.01                    2.58
0.001                   3.29

Level of β              Zβ
0.10                    1.28
0.15                    1.04
0.20                    0.84
0.25                    0.67
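The tabulated Z values can be reproduced from the standard normal distribution. The sketch below is illustrative only: the function names `z_alpha` and `z_beta` are hypothetical and not part of the slides, and it assumes that α is split across two tails while β is taken in a single tail.

```python
from statistics import NormalDist


def z_alpha(alpha: float, two_sided: bool = True) -> float:
    """Z value for a chosen type I error level (two-sided by default)."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


def z_beta(beta: float) -> float:
    """Z value for a chosen type II error level (power = 1 - beta)."""
    return NormalDist().inv_cdf(1 - beta)


# Print the values for the levels listed in the table.
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: Z = {z_alpha(alpha):.2f}")
for beta in (0.10, 0.15, 0.20, 0.25):
    print(f"beta = {beta}: Z = {z_beta(beta):.2f}")
```

Rounded to two decimals, these agree with the commonly quoted 1.96 for α = 0.05 and 0.84 for β = 0.20.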
Precision
A measure of how close an estimate is to
the true value of a population parameter.
It may be expressed in absolute terms or
relative to the estimate.
Desired width of the confidence interval
for sample estimate
Design Effect
A measure of variability due to selection
of study subjects by any sampling
method other than simple random
sampling.
Thus ultimately the calculated sample
size is multiplied by 2 (usually) to get
the same precision as simple random
sampling
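Standard Deviation (σ or SD)
Most frequently used measure of dispersion of
data
(Root-Mean-Square Deviation)
SD is given by the formula
$SD = \sqrt{\sum (X - \bar{X})^2 / (n - 1)}$
When the sample size is > 30, the denominator n is often used
instead of n - 1, since the two give nearly the same value.
The larger the SD, the greater the dispersion of data around
the mean.
SD does not fall systematically as the sample size increases;
it is the standard error (SD/√n) that decreases.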
Standard Error
When studying a population or universe, many
different samples can be chosen out of it.
If we calculate the mean of each sample, we would
see that the sample means differ,
though all the samples have been drawn from
the same universe.
The mean of all the sample means will approximate
the population mean. The standard deviation of
the sample means is the standard error,
estimated from a single sample by the formula $SE = SD / \sqrt{n}$.
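As a small illustration of the two quantities, the sketch below computes the SD (with the n - 1 denominator) and the standard error for a set of made-up values. The data and the helper name `standard_error` are assumptions for demonstration, not taken from the slides.

```python
import math
import statistics


def standard_error(values: list[float]) -> float:
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical measurements, for illustration only.
values = [10, 12, 14, 16, 18]

sd = statistics.stdev(values)  # uses the n - 1 denominator
se = standard_error(values)
print(f"mean = {statistics.mean(values)}, SD = {sd:.2f}, SE = {se:.2f}")
```

With a larger sample drawn from the same population, the SD would stay roughly the same while the SE shrinks with √n.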
Sample Size
Calculations
Sample Size: for various types
of study
Cross sectional Study
One sample situation
Two sample situation
Case Control Study
Cohort Study
Experimental Study
Cross-Sectional :
One Sample
Situations
Outcome measure is dichotomous
variable (proportion)
Estimating a population proportion with
specified absolute precision
Estimating a population proportion with
specified relative precision
Outcome measure is continuous
variable (mean, standard
deviation)
Estimating a population
proportion with specified absolute
precision
Required information and notation
(a) Anticipated population proportion: P
(b) Confidence level: 100(1 - α)%
(c) Absolute precision required on either
side of the proportion (in percentage
points): d
Sample size: $n = z_{1-\alpha/2}^2 \, P(1-P) / d^2$
Example: Problem 1
A local health department wishes to
estimate the prevalence of tuberculosis
among children under five years of age in
its locality.
How many children should be included in
the sample so that the prevalence may be
estimated to within 5 percentage points of
the true value with 95 % confidence, if it is
known that the true rate is unlikely to
exceed 20 %?
Problem 1: Solution
(a) Anticipated population proportion: 20%
(b) Confidence level: 95%, so $z_{1-\alpha/2} = 1.96$
(c) Absolute precision (15% to 25%): 5 percentage points
Sample size: $n = z_{1-\alpha/2}^2 \, P(1-P) / d^2 = (1.96)^2 \times 0.2 \times 0.8 / (0.05)^2 = 245.9$
i.e. 246 children
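The arithmetic of Problem 1 can be checked with a few lines of code. This is a minimal sketch of the absolute-precision formula; the function name `n_proportion_absolute` is hypothetical, and the Z value is computed from the normal distribution rather than taken as 1.96.

```python
import math
from statistics import NormalDist


def n_proportion_absolute(p: float, d: float, confidence: float = 0.95) -> float:
    """Unrounded sample size to estimate proportion p within +/- d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z**2 * p * (1 - p) / d**2


# Problem 1: P = 0.20, d = 0.05, 95% confidence.
n = n_proportion_absolute(0.20, 0.05)
print(f"unrounded n = {n:.1f}, rounded up = {math.ceil(n)}")
```

Rounding up rather than to the nearest integer is the conservative choice, since a fractional participant cannot be recruited.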
Estimating a population
proportion with specified
relative precision
Required information and notation
(a) Anticipated population proportion: P
(b) Confidence level: 100(1 - α)%
(c) Relative precision: ε
Sample size: $n = z_{1-\alpha/2}^2 \, (1-P) / (\varepsilon^2 P)$
Example:
Problem 2
An investigator seeks to estimate the
proportion of children in the country who
are receiving appropriate childhood
vaccinations (Immunization coverage).
How many children must be studied if the
resulting estimate is to fall within 10 % of
the true proportion with 95 % confidence?
(The vaccination coverage is expected to
be 50%.)
Problem 2: Solution
(a) Anticipated population proportion: 50%
(b) Confidence level: 95%, so $z_{1-\alpha/2} = 1.96$
(c) Relative precision: 10% (of 50%, i.e. 45% to 55%)
Sample size: $n = z_{1-\alpha/2}^2 \, (1-P) / (\varepsilon^2 P) = (1.96)^2 \times 0.5 / [(0.1)^2 \times 0.5] = 384.2$
i.e. about 384 children
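The relative-precision formula can be sketched in the same way. The function name `n_proportion_relative` is hypothetical; the function returns the unrounded value so that the rounding convention stays visible.

```python
from statistics import NormalDist


def n_proportion_relative(p: float, eps: float, confidence: float = 0.95) -> float:
    """Unrounded sample size to estimate p within a relative precision eps."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z**2 * (1 - p) / (eps**2 * p)


# Problem 2: P = 0.50, relative precision 10%, 95% confidence.
n = n_proportion_relative(0.50, 0.10)
print(f"unrounded n = {n:.1f}")
```

The unrounded value is just above 384; the slide reports 384 (nearest integer), whereas rounding up would give 385.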
Continuous Outcome
Variable
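Required information and notation
(a) Anticipated population SD: σ
(b) Confidence level: 100(1 - α)%
(c) Relative precision ε, converted to the units of the variable: d = ε × anticipated mean
Sample size: $n = z_{1-\alpha/2}^2 \, \sigma^2 / d^2$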
Example:
Problem 3
Calculate the sample size to obtain an
estimate of Hb % in a community ,
where Hb % in the community is 10.4
gm % & SD is 2.1 gm %.
The chosen confidence level is 95 %
and relative precision (allowable error)
is 5 %
Problem 3: Solution
(a) Anticipated population SD: 2.1
(b) Confidence level: 95%, so $z_{1-\alpha/2} = 1.96$
(c) Relative precision (5% of the mean, 10.4): d = 0.52
Sample size: $n = z_{1-\alpha/2}^2 \, \sigma^2 / d^2 = (1.96)^2 \times (2.1)^2 / (0.52)^2 = 62.65$
i.e. 63
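A short sketch of the calculation for a continuous outcome follows. It assumes, as the worked numbers imply, that the 5% relative precision is applied to the anticipated mean (10.4) to obtain the allowable error of 0.52; the function name `n_mean` is hypothetical.

```python
import math
from statistics import NormalDist


def n_mean(sd: float, d: float, confidence: float = 0.95) -> float:
    """Unrounded sample size to estimate a mean within +/- d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z**2 * sd**2 / d**2


# Problem 3: SD = 2.1, mean = 10.4, relative precision 5%.
d = 0.05 * 10.4  # allowable error in the units of the variable
n = n_mean(2.1, d)
print(f"d = {d:.2f}, unrounded n = {n:.2f}, rounded up = {math.ceil(n)}")
```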
Cross-Sectional:
Two Sample
Situations
Estimating the difference between
two population proportions with
specified absolute precision
Estimating the difference between
two population proportions with
specified relative precision
Estimating difference between two
population proportions with
specified absolute precision
Required information and notation
a. Anticipated population proportions: P1 and P2
b. Confidence level: 100(1 - α)%
c. Absolute precision required on either side
of the true value of the difference between
the proportions (in percentage points): d
d. Intermediate value: $V = P_1(1-P_1) + P_2(1-P_2)$
Sample size (in each group):
$n = z_{1-\alpha/2}^2 \, [P_1(1-P_1) + P_2(1-P_2)] / d^2$
or
$n = z_{1-\alpha/2}^2 \, V / d^2$
Example:
Problem 4
In a pilot study of 50 agricultural workers in an
irrigation project, it was observed that 40% had
active Schistosomiasis.
A similar pilot study of 50 agricultural workers not
employed on the irrigation project demonstrated
that 32% had active Schistosomiasis.
If an epidemiologist would like to carry out a larger
study to estimate the Schistosomiasis risk
difference to within 5 percentage points of the
true value with 95% confidence, how many people
must be studied in each of the two groups?
Problem 4: Solution
a. Anticipated population proportions: 40%, 32%
b. Confidence level: 95%, so $z_{1-\alpha/2} = 1.96$
c. Absolute precision (in percentage points): 5
d. Intermediate value: V = 0.4 × 0.6 + 0.32 × 0.68 ≈ 0.46
Sample size: $n = z_{1-\alpha/2}^2 \, V / d^2 = (1.96)^2 \times 0.46 / (0.05)^2$
= 707 in each group.
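The two-sample calculation can be sketched as below. The helper names `variance_term` and `n_difference` are hypothetical. Note that the slide rounds V to 0.46 before applying the formula, which gives 707; carrying the unrounded V = 0.4576 through gives 704.

```python
import math
from statistics import NormalDist


def variance_term(p1: float, p2: float) -> float:
    """Intermediate value V = P1(1 - P1) + P2(1 - P2)."""
    return p1 * (1 - p1) + p2 * (1 - p2)


def n_difference(v: float, d: float, confidence: float = 0.95) -> float:
    """Unrounded per-group size to estimate P1 - P2 within +/- d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z**2 * v / d**2


# Problem 4: P1 = 0.40, P2 = 0.32, d = 0.05, 95% confidence.
v = variance_term(0.40, 0.32)
print(f"V = {v:.4f}")
print("n with V rounded to 0.46:", math.ceil(n_difference(0.46, 0.05)))
print("n with unrounded V:", math.ceil(n_difference(v, 0.05)))
```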
Case Control Study

              Exposed    Unexposed
Disease          a           b
No disease       c           d

Odds Ratio (OR) = ad/bc
Sample Size: Case Control Study
Required information and notation
a. Two of the following should be known:
Anticipated probability of "exposure" for people
with the disease [a/(a + b)]: P1
Anticipated probability of "exposure" for people
without the disease [c/(c + d)]: P2
Anticipated odds ratio: OR
b. Confidence level: 100(1 - α)%
c. Relative precision: ε
Sample size (in each group):
$n = z_{1-\alpha/2}^2 \, \{ 1/[P_1(1-P_1)] + 1/[P_2(1-P_2)] \} / [\log_e(1-\varepsilon)]^2$
The sample size can also be read from the appropriate
table for case control studies
Example:
Problem 5
In an area cholera is posing a serious public
health problem; about 30 % of the population
are believed to be using water from
contaminated sources.
A case-control study of the association between
cholera and exposure to contaminated water is
to be undertaken in the area to estimate the
odds ratio to within 25 % of the true value,
which is believed to be approximately 2, with 95
% confidence.
What sample sizes would be needed in the
cholera and control groups?
Problem 5: Solution
Anticipated probability of "exposure" given "disease" (P1): ?
(not given directly; implied by the odds ratio and P2)
Anticipated probability of "exposure" given "no disease" (P2),
approximated by the overall exposure rate: 30%
Anticipated odds ratio: 2
Confidence level: 95%
Relative precision: 25%
Applying the formula, a sample size of 408 would be needed in
each group.
This can also be read from the appropriate table of sample sizes
for case control studies
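The figure of 408 can be reproduced with the sketch below. It assumes the case-control formula n = z² {1/[P1(1-P1)] + 1/[P2(1-P2)]} / [ln(1-ε)]², and obtains the unknown P1 from the odds ratio and P2 using the standard identity P1 = OR·P2 / (OR·P2 + 1 - P2). That identity and the function names are not stated on the slides and are introduced here for illustration.

```python
import math
from statistics import NormalDist


def exposure_in_cases(odds_ratio: float, p2: float) -> float:
    """P1 implied by the odds ratio and the exposure rate in controls."""
    return odds_ratio * p2 / (odds_ratio * p2 + (1 - p2))


def n_case_control(p1: float, p2: float, eps: float,
                   confidence: float = 0.95) -> float:
    """Unrounded per-group size to estimate OR within relative precision eps."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    variance = 1 / (p1 * (1 - p1)) + 1 / (p2 * (1 - p2))
    return z**2 * variance / math.log(1 - eps) ** 2


# Problem 5: P2 = 0.30, OR = 2, relative precision 25%.
p1 = exposure_in_cases(2, 0.30)
n = n_case_control(p1, 0.30, 0.25)
print(f"P1 = {p1:.3f}, rounded-up n per group = {math.ceil(n)}")
```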
If the number of cases is large, take the same
number of controls;
Cases : Controls = 1 : 1
If the number of cases is small, the number
of controls may be two to four times the
number of cases.
Cases : Controls = 1 : 2 / 3 / 4
Cohort Study

              Disease    No disease
Exposed          a           b
Unexposed        c           d

Relative Risk = [a/(a + b)] / [c/(c + d)]
Sample Size: Cohort study
Required information and notation
a. Two of the following should be known:
Anticipated probability of disease in people
exposed to the factor of interest: P1
Anticipated probability of disease in people not
exposed to the factor of interest: P2
Anticipated relative risk: RR
b. Confidence level: 100(1 - α)%
c. Relative precision: ε
Sample size (in each group):
$n = z_{1-\alpha/2}^2 \, [ (1-P_1)/P_1 + (1-P_2)/P_2 ] / [\log_e(1-\varepsilon)]^2$
The sample size can also be read from the
appropriate table for cohort studies
Example:
Problem 6
An epidemiologist is planning a study to
investigate the possibility that a certain lung
disease is linked with exposure to a recently
identified air pollutant.
What sample size would be needed in each of
two groups, exposed and not exposed, if the
researcher wishes to estimate the relative risk to
within 50 % of the true value (which is believed
to be approximately 2) with 95 % confidence?
The disease is present in 20% of people who are
not exposed to the air pollutant.
Problem 6: Solution
Anticipated probability of disease given "exposure": ?
(not stated; implied by the relative risk)
Anticipated probability of disease given "no exposure": 20%
Anticipated relative risk: 2
Confidence level: 95%
Relative precision: 50%
Applying the formula, a sample size of 44 would be
needed in each group.
This can also be read from the appropriate table of
sample sizes for cohort studies
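A sketch of the cohort calculation follows. It assumes the formula n = z² [(1-P1)/P1 + (1-P2)/P2] / [ln(1-ε)]² and takes P1 = RR × P2, which follows from the definition of relative risk; the function name `n_cohort` is hypothetical.

```python
import math
from statistics import NormalDist


def n_cohort(p1: float, p2: float, eps: float, confidence: float = 0.95) -> float:
    """Unrounded per-group size to estimate RR within relative precision eps."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    variance = (1 - p1) / p1 + (1 - p2) / p2
    return z**2 * variance / math.log(1 - eps) ** 2


# Problem 6: P2 = 0.20, RR = 2 (so P1 = 0.40), relative precision 50%.
p2 = 0.20
p1 = 2 * p2
n = n_cohort(p1, p2, 0.50)
print(f"P1 = {p1:.2f}, rounded-up n per group = {math.ceil(n)}")
```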
Sample Size:
Experimental
Studies
Considerations:
Here the purpose is to test null hypothesis,
thus sample size calculation requires
specification of limits of errors one is
willing to accept in accepting or rejecting
null hypothesis (type I & II error).
Outcome Measure
- Dichotomous variable OR Continuous
variable.
Effect size of clinical importance
Effect Size of Clinical
Importance
This is the smallest difference between
the group means or proportions (or odds
ratio / relative risk closest to unity)
which would be considered to be
clinically or biologically important.
The sample size should be set so that if
such a difference exists, it is very likely
that a statistically significant result
would be obtained
Outcome Measure: Dichotomous
Variable
Required information and notation
a. Test value of the difference between
the population proportions under the
null hypothesis: P1 - P2 = 0
b. Anticipated values of the population
proportions: P1 and P2
c. Level of significance: 100α%
d. Power of the test: 100(1 - β)%
e. Alternative hypothesis:
either (P1 - P2) > 0 or (P1 - P2) < 0 (for one-sided
test)
or P1 - P2 ≠ 0 (for two-sided test)
Sample size (in each group):
$n = (Z_{\alpha} + Z_{\beta})^2 \, (P_1 Q_1 + P_2 Q_2) / (P_1 - P_2)^2$
where Q1 is 1 - P1 and Q2 is 1 - P2
Example:
Problem 7
Estimate the sample size for a trial to
study the effects of a new treatment
over standard treatment to reduce
the 5 year mortality in patients with
a particular cancer.
The success rate of the standard
drug is 55% and with the new drug it
is expected to be 70%.
Problem 7: Solution
Zα at 95% confidence limit = 1.96
When the power of the trial is 80%, β = 0.2 and Zβ = 0.84
P1 = 0.7, Q1 = 0.3; P2 = 0.55, Q2 = 0.45
$n = (Z_{\alpha} + Z_{\beta})^2 \, (P_1 Q_1 + P_2 Q_2) / (P_1 - P_2)^2$
$= (1.96 + 0.84)^2 \times \{(0.7 \times 0.3) + (0.55 \times 0.45)\} / (0.7 - 0.55)^2 = 159.4$
= 160 each in the study and control groups
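The trial calculation for a dichotomous outcome can be sketched as follows. The function name `n_trial_proportions` is hypothetical, and the sketch assumes a two-sided α, with Z values computed from the normal distribution rather than rounded to 1.96 and 0.84.

```python
import math
from statistics import NormalDist


def n_trial_proportions(p1: float, p2: float, alpha: float = 0.05,
                        power: float = 0.80) -> float:
    """Unrounded per-group size to detect P1 vs P2 (two-sided alpha)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_a + z_b) ** 2 * variance / (p1 - p2) ** 2


# Problem 7: success rates 70% (new drug) and 55% (standard drug).
n = n_trial_proportions(0.70, 0.55)
print(f"unrounded n = {n:.1f}, rounded up = {math.ceil(n)} per group")
```

Raising the power to 90% or narrowing the difference between P1 and P2 increases the required sample size, which can be explored by changing the arguments.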
Outcome Measure: Continuous
Variable
Required information and notation
a. Estimate of variability (SD) of individual values: σ
b. Magnitude of difference that is desired to
detect: δ
c. Level of significance: 100α%
d. Power of the test: 100(1 - β)%
Sample size (in each group): $n = 2 (Z_{\alpha} + Z_{\beta})^2 \, \sigma^2 / \delta^2$
Example:
Problem 8
Estimate the sample size for a trial to
study the effects of a new drug over the
standard drug to reduce the morbidity in
patients with COPD.
The standard deviation of FEV1 with the
standard drug is 0.4 L, and the
difference to be detected between mean FEV1
values of the treatment group and control group
is 150 ml (0.15 L).
Problem 8: Solution
σ = standard deviation of FEV1 (from a previous
study, SD of FEV1 = 0.40 L)
Zα (value of Z for α) = 1.96 (p = 0.05, 95%
confidence desired, two-tailed test)
Zβ (value of Z for β) = 0.84 (20% β error,
thus 80% power desired)
δ (difference to be detected) = 150 ml (0.15 L)
or larger difference between mean FEV1 values
of experimental group and control group.
Applying the formula
$n = 2 (Z_{\alpha} + Z_{\beta})^2 \, \sigma^2 / \delta^2 = 2 \times (1.96 + 0.84)^2 \times (0.40)^2 / (0.15)^2 = 111.5$
Rounded up, n = 112
So, 112 subjects per group; hence total
112 × 2 = 224 subjects
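The arithmetic for the continuous-outcome trial can be verified with the sketch below. It assumes the formula n = 2 (Zα + Zβ)² σ² / δ² with σ = 0.40 L and δ = 0.15 L expressed in the same unit; the function name `n_trial_means` is hypothetical. With these inputs the formula evaluates to about 111.5 to 111.6 (depending on how the Z values are rounded), i.e. 112 per group after rounding up.

```python
import math
from statistics import NormalDist


def n_trial_means(sd: float, delta: float, alpha: float = 0.05,
                  power: float = 0.80) -> float:
    """Unrounded per-group size to detect a mean difference delta."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return 2 * (z_a + z_b) ** 2 * sd**2 / delta**2


# Problem 8: SD = 0.40 L, difference to detect = 0.15 L.
n = n_trial_means(0.40, 0.15)
per_group = math.ceil(n)
print(f"unrounded n = {n:.1f}, per group = {per_group}, total = {2 * per_group}")
```

Both σ and δ must be in the same unit (litres here); mixing millilitres and litres would change the result by several orders of magnitude.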
Recommended Reading
SAMPLE SIZE
DETERMINATION IN HEALTH
STUDIES- A Practical Manual
S. K. Lwanga & S. Lemeshow
Free Sample Size Software
Epi- Info:
http://wwwn.cdc.gov/epiinfo/
WinPepi:
http://www.brixtonhealth.com/pepi4windows.html
Thank You