Sampling
What is sampling?
Sampling involves the selection of a number of study units from a defined study population.
Often the population is too large for us to consider collecting information from all its members.
Instead we select a sample of individuals hoping that the sample is representative of the
population.
When taking a sample, we will be confronted with the following questions:
a) What is the group of people from which we want to draw a sample?
b) How many people do we need in our sample?
c) How will these people be selected?
Definitions
Target population (reference population): The population about which an
investigator wishes to draw a conclusion.
Study population (population sampled): Population from which the sample actually
was drawn and about which a conclusion can be made. For practical reasons the
study population is often more limited than the target population. In some
instances, the target population and the population sampled are identical.
Sampling unit: The unit of selection in the sampling process. For example, in a
sample of districts, the sampling unit is a district; in a sample of persons, a person,
etc.
Study unit: The unit on which the observations will be collected. For example,
persons in a study of disease prevalence, or households, in a study of family size.
N.B. The sampling unit is not necessarily the same as the study unit.
Sample design: The scheme for selecting the sampling units from the study population.
Sampling frame: The list of units from which the sample is to be selected.
Research methodology
The existence of an adequate and up-to-date sampling frame often
defines the study population.
Sampling methods
An important issue influencing the choice of the most appropriate sampling method is
whether a sampling frame is available, that is, a listing of all the units that compose the study
population.
a) Non-probability sampling methods
Examples:
1. Convenience sampling: a method in which, for convenience's sake, the study
units that happen to be available at the time of data collection are selected.
2. Quota sampling: a method that ensures that a certain number of sample units
from different categories with specific characteristics appear in the sample so
that all these characteristics are represented. In this method the investigator
interviews as many people in each category of study unit as can be found until
the quota is filled.
3. Purposeful sampling strategies for qualitative studies: Qualitative research
methods are typically used when focusing on a limited number of informants,
whom we select strategically so that their in-depth information will give optimal
insight into an issue about which little is known. This is called purposeful
sampling.
The above sampling methods do not claim to be representative of the entire
population.
Random sampling strategies to collect quantitative data: If the aim of a study is to
measure variables distributed in a population (e.g., diseases) or to test hypotheses about
which factors are contributing significantly to a certain problem, we have to be sure that we
can generalise the findings obtained from a sample to the total study population. Then,
purposeful sampling methods are inadequate, and probability or random sampling methods
have to be used.
b) Probability sampling methods: They involve random selection procedures to ensure
that each unit of the sample is chosen on the basis of chance. All units of the study
population should have an equal or at least a known chance of being included in the
sample.
1. Simple Random Sampling (SRS): This is the most basic scheme of random
sampling. To select a simple random sample you need to:
• Make a numbered list of all the units in the population from which you
want to draw a sample. Each unit on the list should be numbered in
sequence from 1 to N (where N is the size of the population).
• Decide on the size of the sample.
• Select the required number of sampling units, using a “lottery” method
or a table of random numbers.
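The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `simple_random_sample`, the population size of 1000 and the sample size of 100 are assumptions made for the example, and the standard-library random number generator stands in for the lottery or the table of random numbers.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw unit numbers from the list 1..N without replacement."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample size must be between 1 and N")
    rng = random.Random(seed)              # seed only makes the draw repeatable
    frame = range(1, population_size + 1)  # step 1: numbered list 1..N
    # step 3: every unit has the same chance of being selected
    return sorted(rng.sample(frame, sample_size))


# step 2: decide on the size of the sample (hypothetical N = 1000, n = 100)
chosen = simple_random_sample(population_size=1000, sample_size=100, seed=1)
print(chosen[:10])
```

Because the selection is made without replacement, no unit can appear in the sample twice.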
2. Systematic Sampling: Individuals are chosen at regular intervals (for example,
every 5th, 10th, etc.) from the sampling frame. Ideally we randomly select a number
to tell us where to start selecting individuals from the list. For example, a
systematic sample is to be selected from 1000 students of a school. The sample
size is decided to be 100. The sampling fraction is: 100/1000 = 1/10. The number
of the first student to be included in the sample is chosen randomly by picking one
out of the first ten pieces of paper, numbered 1 to 10. If number 5 is picked, every
tenth student will be included in the sample, starting with student number 5, until
100 students are selected. Students with the following numbers will be included in
the sample: 5, 15, 25, 35, 45, . . . , 985, 995.
• Systematic sampling is usually less time consuming and easier to
perform than SRS.
• It provides a good approximation to SRS.
• It should not be used if there is any sort of cyclic pattern in the ordering
of the subjects on the list.
• Unlike SRS, systematic sampling can be conducted without a sampling
frame (useful in some situations where a sampling frame is not readily
available).
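The school example can be reproduced with a short Python sketch. The function name `systematic_sample` and its parameters are assumptions made for illustration, and the sketch assumes that the population size is an exact multiple of the sample size, as in the example (1000 and 100).

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every k-th unit, where k = N / n, from a random start in 1..k."""
    if population_size % sample_size != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    interval = population_size // sample_size   # 1000 / 100 = 10
    if start is None:
        # pick the starting number at random from the first interval
        start = random.Random(seed).randint(1, interval)
    if not 1 <= start <= interval:
        raise ValueError("start must lie within the first interval")
    return [start + i * interval for i in range(sample_size)]


# The example from the text: number 5 was picked as the starting point.
students = systematic_sample(1000, 100, start=5)
print(students[:5], students[-2:])  # [5, 15, 25, 35, 45] [985, 995]
```

Leaving `start` unset lets the function choose the starting number at random, which is the usual practice.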
4. Stratified sampling: If it is important that the sample includes representative
groups of study units with specific characteristics (for example, residents from
urban and rural areas), then the sampling frame must be divided into groups, or
strata, according to these characteristics. Random or systematic samples of a
predetermined size will then have to be obtained from each group (stratum).
This is called stratified sampling.
Some of the reasons for stratifying the population may be:
• Different sampling schemes may be used in different strata, e.g. urban
and rural.
• Conditions may suggest that prevalence rates will vary between strata: the
overall estimate for the whole population will be more precise if
stratification is used.
• Administrative reasons may make it easier to carry out the survey through
an organization with a regional structure.
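As a minimal sketch of stratified sampling, the Python code below divides a frame into strata and draws a simple random sample of a predetermined size from each. The frame of 1000 residents (300 urban, 700 rural), the stratum sample sizes and the function name `stratified_sample` are all hypothetical values chosen for illustration.

```python
import random


def stratified_sample(frame, stratum_of, sizes, seed=None):
    """Draw a simple random sample of a fixed size from each stratum.

    frame      -- list of sampling units
    stratum_of -- function returning the stratum label of a unit
    sizes      -- dict mapping each stratum label to its sample size
    """
    rng = random.Random(seed)
    strata = {}
    for unit in frame:                      # divide the frame into strata
        strata.setdefault(stratum_of(unit), []).append(unit)
    # sample independently within each stratum
    return {label: rng.sample(strata[label], n) for label, n in sizes.items()}


# Hypothetical frame: residents 1-300 are urban, 301-1000 are rural.
frame = [(i, "urban" if i <= 300 else "rural") for i in range(1, 1001)]
picked = stratified_sample(frame, lambda unit: unit[1],
                           {"urban": 30, "rural": 70}, seed=1)
print(len(picked["urban"]), len(picked["rural"]))  # 30 70
```

Here the stratum sample sizes are proportional to the stratum sizes, but other allocations could be passed in through `sizes`.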
5. Cluster sampling: When a list of groupings of study units is available (e.g.
villages, etc.) or can be easily compiled, a number of these groupings can be
randomly selected. The selection of groups of study units (clusters) instead of
the selection of study units individually is called cluster sampling. Clusters are
often geographic units (e.g. districts, villages) or organizational units (e.g.
clinics).
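A minimal Python sketch of cluster sampling follows. It assumes the simplest case, in which every study unit in a selected cluster is included; the village names, household labels and the function name `cluster_sample` are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters; keep every study unit in them.

    clusters -- dict mapping a cluster name to its list of study units
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # the units of selection are clusters
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical list of 20 villages with 50 households each.
villages = {
    f"village_{v:02d}": [f"v{v:02d}_household_{h}" for h in range(1, 51)]
    for v in range(1, 21)
}
selected = cluster_sample(villages, n_clusters=4, seed=1)
print(sorted(selected))
```

Only the list of villages is needed before the selection; the households have to be listed only for the villages actually chosen.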
6. Multi-Stage Sampling: This method is appropriate when the population is large
and widely scattered. The number of stages of sampling is the number of times
a sampling procedure is carried out.
• The primary sampling unit (PSU) is the sampling unit (or unit of
selection in the sampling procedure) in the first sampling stage.
• The secondary sampling unit (SSU) is the sampling unit in the second sampling stage,
etc.
For example, after selection of a sample of clusters (e.g. households), further sampling of
individuals may be carried out within each household selected. This constitutes two-stage
sampling, with the PSU being households and the SSU being individuals.
Advantages: less costly, we only need to draw up a list of individuals in the clusters
actually selected, and we can do that when we arrive there.
Disadvantage: less precise than SRS.
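The two-stage example, with households as the PSU and individuals as the SSU, might be sketched in Python as follows. The household data, the number of individuals taken per household and the function name `two_stage_sample` are assumptions made purely for illustration.

```python
import random


def two_stage_sample(households, n_households, n_per_household, seed=None):
    """Stage 1: sample households (PSU). Stage 2: sample individuals (SSU)."""
    rng = random.Random(seed)
    stage_one = rng.sample(sorted(households), n_households)
    result = {}
    for household in stage_one:
        # the list of members is needed only for the households selected
        members = households[household]
        k = min(n_per_household, len(members))  # small households: take everyone
        result[household] = rng.sample(members, k)
    return result


# Hypothetical population: 100 households with 1 to 5 members each.
households = {
    f"hh_{h:03d}": [f"hh_{h:03d}_person_{p}" for p in range(1, (h % 5) + 2)]
    for h in range(1, 101)
}
sample = two_stage_sample(households, n_households=10, n_per_household=2, seed=1)
for household, people in sample.items():
    print(household, people)
```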
a) Sampling error (i.e., random error)
When we take a sample, our results will not exactly equal the true values for the whole
population. That is, our results will be subject to error. This error has two components:
sampling error and non-sampling error.
Random error, the opposite of reliability (i.e., precision or repeatability), consists of random
deviations from the true value, which can occur in any direction.
Sampling error (random error) can be reduced by increasing the size of the sample.
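The effect of sample size on sampling error can be illustrated with a small simulation. The sketch below is not part of the original method: it builds a hypothetical population of 10,000 blood pressure values, repeatedly draws simple random samples of different sizes, and measures how much the sample means vary from one sample to the next. All names and numbers are assumptions.

```python
import random
import statistics


def spread_of_sample_means(population, sample_size, repeats=1000, seed=0):
    """Standard deviation of the sample mean over many repeated samples."""
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)
        means.append(sum(sample) / sample_size)
    return statistics.stdev(means)


# Hypothetical population of 10,000 systolic blood pressure values (mmHg).
rng = random.Random(42)
population = [rng.gauss(120, 15) for _ in range(10_000)]

spread = {n: spread_of_sample_means(population, n) for n in (25, 100, 400)}
for n, value in spread.items():
    print(f"sample size {n:>3}: spread of sample means = {value:.2f}")
```

The spread shrinks as the sample grows, roughly in proportion to one over the square root of the sample size, so quadrupling the sample about halves the sampling error.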
Reliability (or precision): This refers to the repeatability of a measure, i.e., the degree of
closeness between repeated measurement of the same value. Reliability addresses the
question, if the same thing is measured several times, how close are the measurements to
each other?
The sources of variation resulting in poor reliability include:
a) Variation in the characteristic of the subject being measured, e.g. blood pressure
b) The measuring instruments, e.g. questionnaires
c) The persons collecting the information (observer variation):
• Inter-observer variation: differences between observers in measuring the same
observation
• Intra-observer variation: differences in measuring the same observation by the same
observer on different occasions.
b) Non-sampling error (i.e., bias)
Bias, the opposite of validity, consists of systematic deviations from the true value, always in
the same direction.
It is possible to eliminate or reduce the non-sampling error (bias) by careful design of the
sampling procedure.
Validity: This refers to the degree of closeness between a measurement and the true value
of what is being measured. Validity addresses the question, how close is the measured
value to the true value?
To be accurate, a measuring device must be both valid and reliable. However, if one cannot
have both, validity is more important in situations when we are interested in the absolute
value of what is being measured. Reliability on the other hand is more important when it is
not essential to know the absolute value, but rather we are interested in finding out if there is
a trend, or to rank values.
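The distinction between validity and reliability can be made concrete with a simulated example. The sketch below compares two hypothetical weighing scales: one reads 3 kg too high but is very repeatable (reliable, not valid), the other is correct on average but noisy (valid, not reliable). The true value, the bias and the noise levels are invented for illustration.

```python
import random
import statistics


def measure(true_value, bias, noise_sd, repeats, rng):
    """Repeated readings = true value + systematic bias + random error."""
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(repeats)]


rng = random.Random(0)
true_value = 70.0  # hypothetical true weight in kg

reliable_not_valid = measure(true_value, bias=3.0, noise_sd=0.2, repeats=500, rng=rng)
valid_not_reliable = measure(true_value, bias=0.0, noise_sd=3.0, repeats=500, rng=rng)

for name, readings in [("reliable, not valid", reliable_not_valid),
                       ("valid, not reliable", valid_not_reliable)]:
    mean_error = statistics.mean(readings) - true_value  # systematic deviation
    spread = statistics.stdev(readings)                  # repeatability
    print(f"{name}: mean error = {mean_error:+.2f} kg, spread = {spread:.2f} kg")
```

The first scale would still rank people correctly by weight or show a trend, while the second gives the right value only when many readings are averaged.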
Examples of types of bias in sampling include:
Bias resulting from incompleteness of the sampling frame, accessibility bias, seasonality
bias, self-reporting bias, volunteer bias, non-response bias, etc.
Non-response bias arises from failure to obtain information on some of the subjects included in
the sample to be studied. It results in significant bias when the following two conditions are
both met:
1. When non-respondents constitute a significant proportion of the sample.
2. When non-respondents differ significantly from respondents.
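Why both conditions matter can be seen from a simple identity: the mean over the whole sample is the weighted average of the respondent and non-respondent means, so the error made by using respondents only equals the non-response proportion multiplied by the difference between the two groups. The Python sketch below applies this identity to hypothetical prevalence figures; the function name and all numbers are assumptions.

```python
def nonresponse_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Respondent mean minus the mean of the full sample.

    full-sample mean = r * mean_R + (1 - r) * mean_NR
    so bias          = (1 - r) * (mean_R - mean_NR)
    """
    return (1 - response_rate) * (mean_respondents - mean_nonrespondents)


# Hypothetical disease prevalence: 20% among respondents, 40% among non-respondents.
large_gap = nonresponse_bias(0.85, 0.20, 0.40)  # 15% non-response, groups differ
no_gap = nonresponse_bias(0.85, 0.20, 0.20)     # 15% non-response, groups alike
few_missing = nonresponse_bias(0.99, 0.20, 0.40)  # groups differ, 1% non-response

print(round(large_gap, 3), round(no_gap, 3), round(few_missing, 3))
```

The bias is negligible when either factor is small, and it does not depend on the sample size at all.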
The issue of non-response should be considered during the planning stage of the
study:
a) Non-response should be kept to a minimum (e.g. below 15%).
Methods that may help in maintaining non-response at a low level could be:
• Training data collectors to initiate contact with study subjects in a respectful way and
convince them about the importance of the given study (this minimizes the refusal type
of non-response)
• Offering incentives to encourage participation (this should be done by taking account
of the potential problems that may arise in conducting future research)
• Making repeated attempts (at least 3 times) to contact study subjects who were
absent at the time of the initial visit.
b) The number of non-responses should be documented according to type, so as to
facilitate an assessment of the extent of bias introduced by non-response.
c) As much information as possible should be collected on non-respondents, so as to see
in what ways they may differ from respondents.
• Selection bias cannot be corrected by increasing the size of the sample. Why? How
can this type of bias be removed?