Sampling Methods.
Presenter: Dr. Raksha Khaveyya. PK.
Moderator: Dr. Sangeethapriya. AS.
Outline.
• Common terminologies
• Probability sampling
• Non-probability sampling
Common terminologies.
• Universe (whole population): the entire group under study; the complete set of
individuals, objects or scores in which we are interested.
• Sampling frame: a list of all individuals in the whole population, from which the
sample is drawn.
• Sampling unit: each member of the whole population.
• Sample: a small representative part of the whole population.
Probability sampling.
Simple random sampling
• Each member of the population has an equal chance of being chosen, which makes the
sample likely, though not guaranteed, to be representative of the population.
• Applicable when the population is small, homogeneous and readily available.
• A complete list of the population must be available as the sampling frame.
• First, all sampling units are assigned numbers.
• Then the sample is selected using a random number table or the lottery method.
• Disadvantage: the selected sample may not be truly representative when the sample
size is small.
Random number table.
• A random number table typically contains 10,000 random digits from 0 to 9, written
in pairs, arranged in groups of five and displayed in rows.
• Because they are randomly ordered, no individual digit can be predicted from
knowledge of any other digit or group of digits.
Random number table method.
• Each sampling unit is assigned a number.
• Assigned numbers must all have the same number of digits. If 10 persons are to be
chosen out of 50, the numbering should be 01, 02, 03 and so on up to 50.
• Decide the direction in which the numbers will be read (down, up, left or right).
• Place a point blindly on the random number table.
• The number closest to that point is the starting point.
• Finally, read the numbers in the direction decided beforehand, using the required
number of digits.
• The sampling units whose assigned numbers match the numbers read from the table are
selected.
• Any number occurring for the second time is ignored.
• Any 2-digit number greater than the total number of sampling units is also ignored;
skip on to the next number.
• Continue in this way until the entire sample has been selected.
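The reading rules above can be sketched in Python. This is only an illustration: the "table" is a hypothetical string of pseudo-random digits read left to right, and the name `read_table` is made up for this example.

```python
import random

def read_table(digits, population_size, sample_size):
    """Select unit numbers by reading fixed-width numbers from a digit string."""
    width = len(str(population_size))       # 50 units -> 2-digit numbers (01..50)
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        if number == 0 or number > population_size:
            continue                        # not an assigned number: skip to the next
        if number in chosen:
            continue                        # second occurrence: ignore
        chosen.append(number)
        if len(chosen) == sample_size:
            break                           # entire sample selected
    return chosen

# Hypothetical stand-in for a printed table: 1,000 pseudo-random digits.
rng = random.Random(1)
table = "".join(str(rng.randrange(10)) for _ in range(1000))
sample = read_table(table, population_size=50, sample_size=10)
print(sample)
```

For a hand-checkable case, `read_table("0307990350120012", 50, 3)` reads 03, 07, 99, 03, 50 and returns `[3, 7, 50]`: 99 is skipped as too large and the second 03 is ignored.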
Lottery method.
• Here each member of the population is assigned a number.
• The numbers are written on chits of paper or cards, which are placed in a bowl or jar
and thoroughly mixed. Cards may also be shuffled.
• The researcher then picks, without looking, as many cards as the sample size
requires.
• Population members whose numbers are drawn are included in the sample.
SRS with replacement.
• A member once selected as a study subject is returned to the study population before
the next draw.
• A member has the chance of being selected again, i.e., more than once.
• We can pick a card from the bowl and put it back into the bowl after noting the number
on the card.
SRS without replacement.
• A member once selected is not returned to the study population before the next draw.
• A member's chance of being selected is restricted to only once.
• We put the card aside after drawing it from the bowl.
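The lottery draw, with and without replacement, can be sketched with Python's standard `random` module. The bowl of 50 numbered cards and the sample size of 10 are assumed values for illustration.

```python
import random

rng = random.Random(42)               # fixed seed only so the sketch is repeatable
cards = list(range(1, 51))            # assumed bowl: cards numbered 1..50

# With replacement: each card goes back into the bowl, so repeats are possible.
with_replacement = rng.choices(cards, k=10)

# Without replacement: each drawn card is set aside, so no number repeats.
without_replacement = rng.sample(cards, k=10)

print(with_replacement)
print(without_replacement)
```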
Systematic random sampling.
• Used for large, scattered, heterogeneous populations.
• All sampling units are assigned numbers.
• A random starting point is chosen first; then every nth unit is selected.
• n is the sampling interval = total population / sample size.
• Thus the 1st unit is chosen at random and the others systematically, as every nth unit.
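A minimal sketch of the procedure, assuming units are numbered 1 to N and the random start is drawn from the first interval (the function name and the figures 1000 and 50 are illustrative):

```python
import random

def systematic_sample(population_size, sample_size, rng=random):
    """Return unit numbers (1-based) chosen by systematic random sampling."""
    interval = population_size // sample_size      # sampling interval n
    start = rng.randint(1, interval)               # random start within the first interval
    return [start + i * interval for i in range(sample_size)]

units = systematic_sample(population_size=1000, sample_size=50, rng=random.Random(7))
print(units[:5])
```

With 1000 units and a sample of 50 the interval is 20, so the selected units are always 20 apart, whatever the random start.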
Stratified random sampling.
• For heterogeneous populations (when we want to know the distribution according to a
particular variable).
• First, the heterogeneous population is divided into small homogeneous groups called
strata.
• From each stratum, the required number of sampling units is taken by simple or
systematic random sampling, in proportion to the stratum's original size.
• Strata should be mutually exclusive (no member should belong to more than one group)
and exhaustive (all members should belong to some group).
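Proportional allocation can be sketched as follows. The strata names and sizes are hypothetical, and simple random sampling is used within each stratum; with rounding, the stratum sample sizes may not always add up exactly to the requested total.

```python
import random

def stratified_sample(strata, sample_size, rng=random):
    """strata: dict mapping stratum name -> list of units (mutually exclusive, exhaustive)."""
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(sample_size * len(units) / total)   # share proportional to stratum size
        sample[name] = rng.sample(units, k)           # simple random sample within the stratum
    return sample

# Hypothetical population of 1000 split into three strata.
strata = {
    "stratum_A": [f"A{i}" for i in range(600)],
    "stratum_B": [f"B{i}" for i in range(300)],
    "stratum_C": [f"C{i}" for i in range(100)],
}
chosen = stratified_sample(strata, sample_size=50, rng=random.Random(3))
print({name: len(units) for name, units in chosen.items()})
```

Here a sample of 50 is split 30, 15 and 5 across the three strata, mirroring their 60%, 30% and 10% shares of the population.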
Cluster sampling.
• The population of interest is divided into geographically distinct groups (clusters),
such as neighbourhoods or families.
• Cluster: a randomly selected group. Used when the units of the population form natural
groups or clusters such as blocks, wards, villages, slums etc. When the clusters are
geographical areas, it is called area sampling.
• The 30-cluster sampling technique: the 30 × 7 sample design developed by WHO.
• From the list of all clusters, select 30 clusters = 1st step (primary sampling unit, PSU).
• Select 7 interview sites from each cluster = 2nd step (secondary sampling unit, SSU);
this is 2-stage cluster sampling.
• Used for evaluation of immunization coverage of districts, attitudes of people towards
immunization, contraception, intervention programs etc.
• Advantages: suitable for a large geographical area where no list of households exists,
time saving, less costly, sample size is less.
• Disadvantages:
(1) Gives a higher standard error than other sampling designs, so a larger sample size is
needed for reliable estimates of population characteristics.
(2) Cost may increase because the clusters are scattered. If the cost of the increased
sample size exceeds the cost of carrying out unclustered sampling, clustering should not
be done.
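The two-step 30 × 7 selection can be sketched as below. This is a simplification that assumes simple random sampling at both steps and a household list within each cluster; the village and household names are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters=30, n_per_cluster=7, rng=random):
    """clusters: dict mapping cluster name -> list of households in that cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)        # 1st step: select clusters (PSUs)
    return {c: rng.sample(clusters[c], n_per_cluster)        # 2nd step: select units (SSUs)
            for c in chosen}

# Hypothetical frame: 100 villages with 40 households each.
villages = {f"village_{v:03d}": [f"v{v:03d}_hh{h:02d}" for h in range(40)]
            for v in range(100)}
sample = two_stage_cluster_sample(villages, rng=random.Random(11))
print(len(sample), sum(len(hh) for hh in sample.values()))
```

The result holds 30 clusters with 7 households each, 210 households in all.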
Intra-class / intra-cluster correlation coefficient (ICC).
• Participants from the same cluster are likely to be somewhat similar to one another.
• Hence, selecting an additional member from the same cluster will not add much new
information.
• Including participants from new clusters is more useful.
• Thus, it is better to include more clusters than more participants within a single
cluster.
• ICC: the common correlation among pairs of observations from the same cluster.
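One common way to quantify this is the design effect, 1 + (m − 1) × ICC, where m is the number of participants per cluster. The formula is a standard result that the slides do not state, and the ICC value of 0.1 below is an assumed figure used only for illustration.

```python
def design_effect(cluster_size, icc):
    """Standard design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations the clustered sample is roughly worth."""
    return n_total / design_effect(cluster_size, icc)

icc = 0.1  # assumed value, not taken from the slides
many_small = effective_sample_size(210, cluster_size=7, icc=icc)    # 30 clusters of 7
few_large = effective_sample_size(210, cluster_size=30, icc=icc)    # 7 clusters of 30
print(round(many_small, 1), round(few_large, 1))
```

Under these assumptions, the same 210 participants carry more information when spread over 30 clusters of 7 than when concentrated in 7 clusters of 30.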
Selection of clusters from the list of PSUs.
• Simple/systematic random sampling.
• Probability proportionate to population size (PPS):
• A list of villages, towns or wards with their respective population/household numbers
is prepared.
• Say 10 clusters have to be taken out of 30.
• The cumulative population of the 30 clusters is calculated and divided by 10 = sampling
interval (SI).
• One random number equal to or less than SI is selected from a random number table =
random start (RS).
• The first village/town whose cumulative population equals or exceeds RS is the 1st
cluster.
• Subsequent clusters are located the same way at RS + SI, RS + 2 × SI, and so on.
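The PPS steps can be sketched as follows. The village names and populations are hypothetical, and the later clusters are located at RS + SI, RS + 2 × SI and so on, which is the usual continuation of the procedure. Note that a very large village can be selected more than once.

```python
import random
from itertools import accumulate

def pps_select(populations, n_select, rng=random):
    """populations: list of (name, population). Returns the names of the selected clusters."""
    cumulative = list(accumulate(p for _, p in populations))
    interval = cumulative[-1] // n_select                # sampling interval (SI)
    start = rng.randint(1, interval)                     # random start (RS), 1 <= RS <= SI
    selected = []
    for i in range(n_select):
        target = start + i * interval                    # RS, RS + SI, RS + 2*SI, ...
        # first cluster whose cumulative population equals or exceeds the target
        idx = next(j for j, c in enumerate(cumulative) if c >= target)
        selected.append(populations[idx][0])
    return selected

# Hypothetical list of 30 villages with made-up populations.
pop_rng = random.Random(5)
villages = [(f"village_{i:02d}", pop_rng.randint(200, 3000)) for i in range(1, 31)]
print(pps_select(villages, n_select=10, rng=random.Random(0)))
```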
Selection of individuals from each selected cluster.
• Simple one-stage cluster sample: 1st stage: clusters are selected; 2nd stage: all units
in the selected clusters are included.
• Simple two-stage cluster sample: 1st stage: clusters are selected; 2nd stage: units are
selected by simple/systematic random sampling.
• Multistage sample: more than 2 stages are involved; 1st stage: clusters are selected;
2nd stage: clusters are stratified; 3rd stage: simple/systematic random sampling.
Multistage sampling.
• Carried out in several stages, in large country-wide surveys (e.g., anemia or hookworm
surveys).
• Any type of probability sampling technique can be applied at each stage, and the
technique can be different at each stage.
• Example: from India, 5 states are selected; from each state, 3 districts; from each
district, 2 blocks. The survey is done only in the selected blocks, which are taken to
represent the entire population of India.
• Reduces the workload of the investigator.
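The states, districts and blocks example can be sketched as nested random selection. Simple random sampling is assumed at every stage, and the frame of 10 states, 6 districts per state and 8 blocks per district is hypothetical.

```python
import random

def multistage_sample(country, n_states=5, n_districts=3, n_blocks=2, rng=random):
    """country: nested dict, state -> district -> list of blocks."""
    selected = []
    for state in rng.sample(sorted(country), n_states):                   # stage 1
        for district in rng.sample(sorted(country[state]), n_districts):  # stage 2
            for block in rng.sample(country[state][district], n_blocks):  # stage 3
                selected.append((state, district, block))
    return selected

# Hypothetical sampling frame.
country = {
    f"state_{s}": {
        f"district_{s}_{d}": [f"block_{s}_{d}_{b}" for b in range(8)]
        for d in range(6)
    }
    for s in range(10)
}
blocks = multistage_sample(country, rng=random.Random(2))
print(len(blocks))
```

Only 5 × 3 × 2 = 30 blocks need to be surveyed.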
Multiphase sampling.
• Part of the information is collected from the whole sample and another part from a
subsample.
• Example: 20 fever cases first undergo clinical examination and basic blood tests; those
with high ESR are then given the Widal / MP test; those who test negative undergo another
set of tests.
• Less costly and less laborious.
Lot quality assurance sampling (LQAS).
• The technique was developed in the 1920s to control the quality of output in industrial
production processes.
• A small independent random sample is drawn from a manufactured batch (lot) and tested
for quality; if the number of defective items in the sample exceeds a pre-determined
criterion, the lot is rejected.
• In the health sector, LQAS is used to identify communities with unacceptably low
immunization coverage, a worrying level of disease prevalence etc.
• It does not give the exact prevalence, but the probability that a particular area has an
inadequate level of immunization or a high prevalence of a particular disease.
• Whole district = supervision unit (areas to be chosen for assessment by LQAS).
• Each community = supervision area (the community where the assessment will be done).
• A minimum of 19 items is chosen from each supervision area in order to assess an
indicator.
• Combined sample size across all supervision areas = 95 or more, which allows the
performance of the indicators to be calculated at the project level.
• 5-6 supervision areas are ideal.
• Can be used to assess binary outcomes only.
• Output: expressed as the % of clients who received a service in a defined period of
time.
• Good = maintain the program at its current level and identify best practices to help
other programs.
• Below average = identify the reasons and develop solutions.
• Advantages:
• A small sample size is needed.
• Less expensive.
• Results are locally relevant and can be used in district-level annual planning and
decision making.
• Disadvantages:
• Not good for calculating exact coverage in a supervision area, but can calculate coverage
for an entire program.
• Not good for setting priorities among supervision areas with little difference in
coverage.
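Because LQAS classifies an area rather than measuring it exactly, its behaviour can be illustrated with a binomial calculation. The sketch below assumes a binary outcome, a sample of 19 per supervision area, and a hypothetical decision rule of 13: the area is flagged as below target when fewer than 13 of the 19 sampled clients are covered. The decision rule is an assumed value, not taken from the slides.

```python
from math import comb

def prob_classified_low(true_coverage, sample_size=19, decision_rule=13):
    """P(fewer than `decision_rule` of `sample_size` sampled clients are covered)."""
    return sum(
        comb(sample_size, k) * true_coverage**k * (1 - true_coverage)**(sample_size - k)
        for k in range(decision_rule)
    )

# Probability of being flagged at a few assumed true coverage levels.
for coverage in (0.5, 0.8, 0.95):
    print(coverage, round(prob_classified_low(coverage), 3))
```

The probability of being flagged falls as true coverage rises, which is why the method separates clearly good from clearly poor areas but cannot rank areas whose coverage is similar.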
Non-probability sampling.
Judgement Sampling.
• The researcher selects a sample deliberately, on the basis of his/her own judgement.
• In other words, the researcher selects the subjects he/she considers representative of
the population with regard to the characteristics under study.
• Less time consuming and useful at times, but can be misleading, as proper representation
cannot be ensured.
• For example, a researcher may decide to draw the entire sample from one "representative"
block, even though the population includes several blocks in the city.
Snowball sampling (network sampling).
• Applied when the target population is hard to reach, such as drug addicts, commercial sex
workers, individuals with HIV/AIDS, etc.
• Also applied when the disease in question is rare and study subjects are hard to find.
• One study subject is asked to identify other persons with the same exposure, in order to
find the next subject.
• The researcher then goes to the identified person and continues in the same way until the
required sample size is obtained.
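The referral chain can be sketched as a walk over a referral network. The network and participant labels below are hypothetical, and the breadth-first order is just one possible way of following up referrals.

```python
from collections import deque

def snowball_sample(referrals, seed, sample_size):
    """referrals: dict mapping a participant -> list of people he/she can identify."""
    sample, queue, seen = [], deque([seed]), {seed}
    while queue and len(sample) < sample_size:
        person = queue.popleft()
        sample.append(person)                     # enrol the current subject
        for contact in referrals.get(person, []):
            if contact not in seen:               # follow each referral only once
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network.
referrals = {
    "P1": ["P2", "P3"],
    "P2": ["P4"],
    "P3": ["P4", "P5"],
    "P4": [],
    "P5": ["P6"],
}
print(snowball_sample(referrals, seed="P1", sample_size=4))  # ['P1', 'P2', 'P3', 'P4']
```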
Convenience Sampling.
• The researcher selects a sample by choosing those who are convenient and easy to
select.
• In this method, the subjects are either the easiest to select or the most likely to
respond in the study.
• For example, if the researcher aims to study occupational hazards, he/she may go to a
nearby industry and interview a few workers who are available.
• It is generally the common method for selecting participants in a focus group discussion
(FGD).
Quota Sampling.
• Researchers are given quotas to fill from different strata of the population, keeping the
quota proportions the same as those observed in the population.
• Example: in a village where the Hindu and Muslim populations are 60% and 40%
respectively, a researcher using quota sampling can select participants of his/her own
choice in the same 6:4 ratio (Hindu:Muslim).
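The 6:4 example can be sketched as filling fixed quotas from whoever the researcher happens to meet. The total sample size of 10 and the stream of candidates are hypothetical.

```python
def fill_quotas(candidates, quotas):
    """candidates: iterable of (person, group) in the order the researcher meets them.
    quotas: dict mapping group -> number of participants required."""
    remaining = dict(quotas)
    sample = []
    for person, group in candidates:
        if remaining.get(group, 0) > 0:       # quota for this group still open
            sample.append((person, group))
            remaining[group] -= 1
        if not any(remaining.values()):       # all quotas filled
            break
    return sample

quotas = {"Hindu": 6, "Muslim": 4}            # 60% and 40% of an assumed sample of 10
# Hypothetical order in which villagers are encountered.
candidates = [(f"person_{i}", "Hindu" if i % 2 else "Muslim") for i in range(1, 31)]
sample = fill_quotas(candidates, quotas)
print(len(sample))
```

Selection within each quota is by the researcher's choice rather than by chance, which is what makes this a non-probability method.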
Other Non-probability sampling methods.
• Self-selection sampling: participants take part in the research on their own, as
volunteers.
• Extreme (deviant case) sampling: participants are selected on the basis of highly unusual
manifestations of the phenomenon of interest.
• Intensity sampling: a few select cases that manifest the phenomenon intensely (but are not
extreme cases) are chosen to obtain rich information from them.
• Criterion sampling: all cases that meet some specific criteria of interest are selected for
in-depth investigation, to gain detailed knowledge about a particular type of case and to
identify all sources of variation.