LECTURE 10: SAMPLING METHODS
Introduction to Sampling
It is a common practice in journalism that pictures or stories of one or two
individuals are presented to demonstrate the behaviour of individuals in entire
groups or societies. These stories could be about homelessness, domestic violence,
coping with disaster, overcoming adversity to succeed in education or career and
abuse of alcohol. However, we do not know whether the one or two participants
interviewed for a story are like the rest of the members of the societies or countries
reported on, or whether they are just two people who caught the eye of this one reporter. In
other words, we don’t know how generalizable their stories are, and if we don’t have
confidence in generalizability, then the validity of this account of how members of a
particular culture or society behave is suspect.
In this lecture, you will learn about sampling methods, the procedures that primarily
determine the generalizability of research findings. We first review the rationale for
using sampling in social research and consider circumstances when sampling is not
necessary. The lecture then turns to specific sampling methods and when they are
most appropriate.
Objectives
1. Define a sample
2. Define sample components and the population
3. Explain the purpose of sampling in research
4. Explain the methods of sampling in research
Definition of a Sample
The group of people (such as traders, farmers, teachers or abandoned children) or
units (such as schools, health facilities, water facilities or business premises) that you
are interested in studying constitutes your target population. Usually there is not
enough time, money or staff to allow you to involve every individual case, person,
or unit in the target population in your study. Often when you conduct research you
pick only a part of the target population for observation and analysis. It is only when
we conduct a census that we normally seek and record information about each
person or unit in a target population. We refer to the part of the target population
that we select for study as a sample.
Sampling refers to a planned way of selecting study subjects. When research
focuses on people, sampling refers to the process of determining who will be
studied. However, when research is about organizations, programmes, or other
aggregates of individuals in formal or informal groups, it means the process of
determining which group or organization or programme will be studied. In a study
of things such as farm produce marketed in an area, cultural artefacts like films and
diaries, or modes of transport available between two small towns, sampling would
be defined as the process of determining what things to study. Whatever the units
of analysis – things, people, groups, geographic zones, organizations or institutions
– sampling refers to the selection of the ‘units’ to be studied.
Define Sample Components and the Population
Let’s say that we are designing a survey about substance use among students in a
college named Sparrow University College. We don’t have the time or resources to
study the entire student population of the university college, even though that
population is the set of individuals or other entities to which we wish to be able to
generalize our findings. So instead, we resolve to study a sample, a subset of this
population. The individual members of this sample are called elements, or
elementary units.
In many studies, we sample directly from the elements in the population of interest.
We may survey a sample of the entire population of students in a school, based on
a list obtained from the registrar’s office. This list, from which the elements of the
population are selected, is termed the sampling frame. The students who are
selected and interviewed from that list are the elements.
In some studies, the entities that can be reached easily are not the same as the
elements from which we want information, but they include those elements. For
example, we may have a list of households but not a list of the entire population of
a town, even though the adults are the elements that we actually want to sample.
In this situation, we could draw a sample of households so that we can identify the
adult individuals in these households. The households are termed enumeration
units, and the adults in the households are the elements (Levy & Lemeshow 1999).
Sometimes, the individuals or other entities from which we collect information are
not actually the elements in our study. For example, a researcher might sample
schools for a survey about educational practices and then interview a sample of
teachers in each sampled school to obtain the information about educational
practices. Both the schools and the teachers are termed sampling units, because
we sample from both (Levy & Lemeshow 1999). The schools are selected in the first
stage of the sample, so they are the primary sampling units (in this case, they are
also the elements in the study). The teachers are secondary sampling units (but
they are not elements, because they are used to provide information about the
entire school).
It is important to know exactly what population a sample can represent when you
select or evaluate sample components. In a survey of “health care workers in
Kenya,” the general population may reasonably be construed as all individuals in
Kenya whose primary occupation is the provision of health care services. But always
be alert to ways in which the population may have been narrowed by the sample
selection procedures to only some of the occupations it covers (e.g., nurses,
paramedics, surgeons, General Practitioners, dentists, ophthalmologists,
gynaecologists, orthopaedists, etc.). For example, perhaps only nurses serving in
Kenya were surveyed. The population for a study is then the aggregation of elements
that we actually focus on and sample from (here, nurses), not some larger
aggregation that we really wish we could have studied (in this case, all health
care workers).
Purposes of Sampling
Usually when we decide to carry out research, we begin with the assumption that a
part of the population of interest to us reflects the same characteristics as the
entire population of which it is a part. We also recognize that we may not be able to
effectively study every person, group, or item in a population that we are interested
in. For example, even if there were adequate time, money, and staff to conduct
interviews with the subjects, not everyone would be reachable or available to be
interviewed. There are therefore two main reasons why we need to select samples
in research.
Representative Results and Control of Bias
The purpose and general value of research is that it helps bring about a stronger
understanding of the world in which we live. In order to achieve this goal it is
important that what is studied represents something greater than itself in the larger
society. When, for example, we select subjects (say, widows, orphans, or people with
visual disabilities) from a larger universe of possible subjects, this should allow the
evidence from the sample to be representative of the whole population. For
instance, when we are investigating the patterns of career choices among
graduates of the social work or sociology programme of the University of Nairobi,
we should be able to use evidence from a sample of the graduates to represent all
the graduates of the programme.
In research we usually sample so as to obtain representative results and to reduce
bias. For example, if we only interviewed literate people concerning their opinions
on a development programme, we would have biased the study in favour of literate
people. The same would occur if we undertook a study that only reflected the
unique experiences of particular widows, orphans, or individuals with visual
disabilities.
As we strive to obtain representative results, we are reducing or avoiding bias. The
researcher therefore must select subjects and observe them in a way that will not
bias the findings. A researcher needs to ensure that there is no bias in the
distribution of subjects between groups. For example, if age, level of education, and
sex are used as variables, then the subjects drawn from the various age, sex, and
education categories should be proportionately distributed. Therefore, if we
select a sample with a very large number of illiterate people and a very small
number of college-educated people or high school graduates, or vice versa, we would be
increasing bias rather than reducing it. The preferred method of reducing sampling
bias is to select subjects randomly and, where subjects are compared across groups,
to randomly assign them to those groups.
Sample generalizability depends on sample quality, which is determined by the
amount of sampling error—the difference between the characteristics of a sample
and the characteristics of the population from which it was selected. The larger the
sampling error, the less representative the sample—and thus the less generalizable
the findings. To assess sample quality when you are planning or evaluating a study,
ask yourself these questions:
From what population were the cases selected?
What method was used to select cases from this population?
Do the cases that were studied represent, in the aggregate, the population
from which they were selected?
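To make sampling error concrete, the short Python sketch below builds a hypothetical population of student ages (the numbers are randomly generated for illustration, not data from any study), draws a simple random sample of 100, and reports the difference between the sample mean and the population mean. The function name `sampling_error` is our own label, not a standard library routine.

```python
import random
import statistics

def sampling_error(population, sample):
    """Sampling error of the mean: sample mean minus population mean."""
    return statistics.mean(sample) - statistics.mean(population)

# Hypothetical population: ages of 2,000 students, invented for illustration.
rng = random.Random(42)  # fixed seed so the example is reproducible
population = [rng.randint(18, 30) for _ in range(2000)]

# Simple random sample of 100 students, drawn without replacement.
sample = rng.sample(population, 100)

error = sampling_error(population, sample)
print(f"population mean = {statistics.mean(population):.2f}")
print(f"sample mean     = {statistics.mean(sample):.2f}")
print(f"sampling error  = {error:+.2f}")
```

Running it with different seeds gives different sampling errors; a well-drawn sample keeps them small, but rarely exactly zero.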
Making Inferences
Another purpose of sampling is to enable researchers to make inferences
(judgements, estimates or predictions) from findings based on a sample to the
larger population from which that sample is drawn. When we draw samples
according to the rules of probability (which means that every member of the
population has a known probability of being selected into the sample), we can make
inferences from the findings on the sample to the larger population from which the
sample was drawn. This ability to make inferences from a sample to a much larger
population means that the findings from a sample of a limited size can be used to
predict what the findings would have been for the whole population.
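One common way of turning a sample finding into a statement about the population is to attach a margin of error to it. The sketch below assumes a simple random sample and uses the standard normal-approximation interval for a proportion (z = 1.96 for roughly 95% confidence); the poll figures are hypothetical and the function name `estimate_proportion` is our own.

```python
import math

def estimate_proportion(successes, n, z=1.96):
    """Estimate a population proportion from a simple random sample.

    Returns (p_hat, lower, upper) using the normal-approximation interval
    p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n); z = 1.96 is about 95%.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)

# Hypothetical opinion poll: 540 of 1,000 randomly sampled voters favour candidate A.
p, low, high = estimate_proportion(540, 1000)
print(f"estimate {p:.3f}, 95% interval {low:.3f} to {high:.3f}")
```

The interval only has this interpretation when the sample was drawn by a probability method; it says nothing about bias from an incomplete frame or nonresponse.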
A representative sample is a sample that “looks like” the population from which
it was selected in all respects that are potentially relevant to the study. The
distribution of characteristics among the elements of a representative sample is the
same as the distribution of those characteristics among the total population. In an
unrepresentative sample, some characteristics are overrepresented or
underrepresented.
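Whether a sample "looks like" its population on a given characteristic can be checked by comparing the share of each category in the sample with its share in the population. The sketch below does this for one characteristic; the population and sample are hypothetical lists, and the helper names are our own.

```python
from collections import Counter

def composition(values):
    """Each category's share of the list (shares sum to 1)."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

def representation_gaps(population, sample):
    """Sample share minus population share for each population category.

    Positive values mean the category is overrepresented in the sample;
    negative values mean it is underrepresented.
    """
    pop_shares = composition(population)
    sample_shares = composition(sample)
    return {category: sample_shares.get(category, 0.0) - share
            for category, share in pop_shares.items()}

# Hypothetical population: 60% women and 40% men.
population = ["female"] * 600 + ["male"] * 400
# Hypothetical unrepresentative sample: 30 women and 70 men.
sample = ["female"] * 30 + ["male"] * 70

for category, gap in representation_gaps(population, sample).items():
    print(f"{category}: {gap:+.2f}")
```

In a representative sample every gap would be close to zero; here women are underrepresented by about 30 percentage points and men overrepresented by the same amount.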
By using a sample of limited size as a basis for predicting population characteristics,
we are able to save the costs of conducting research on the whole population.
Scientifically selected samples can predict measures for a much larger population
accurately, within a margin of error that can be estimated. An illustration of how we can make inferences from samples
about populations from which such samples are drawn is the use of opinion polls to
predict outcomes of presidential elections in modern democracies. Corporate
groups also use the results of market surveys to predict consumer behaviour and to
design their marketing strategies.
Consider a Census
In some circumstances, it may be feasible to skirt the issue of generalizability by
conducting a census (studying the entire population of interest) rather than
drawing a sample. This is what national governments try to do every 10 years
with their national censuses. A census is research in which information is obtained
through the responses that all available members of an entire population give to
questions. Censuses also include studies of all the employees (or students) in small
organizations, studies comparing all 47 counties in Kenya, and studies of the entire
population of a particular type of organization in some area.
However, in comparison with, say, the U.S. Census and similar efforts in other
countries, states or counties, and cities, the population that is studied in these other
censuses is relatively small. The reason that social scientists do not often attempt
to collect data from all the members of some large population is simply that doing
so would be too expensive and time consuming—and they can do almost as well
with a sample. Some social scientists conduct research with data from the National
Censuses, but it is the government that collects the data and it is the taxpayer that
pays for the effort.
General Guidelines on Sampling
There are three general considerations that guide decisions about sampling:
1. Limitations in terms of resources: Studying everyone in a population
would reduce the need to develop a sample but would also
tremendously increase the effort and cost of data collection. Thus one
of the main reasons for sampling is to serve as a way to contain costs.
2. Similarities and differences among subjects: If the subjects in a study
are similar to each other (homogeneous) in background, they may tend
to be more representative of each other in a number of ways. If one is
studying vegetable farmers in an area, they are likely to share many
similarities among themselves. You may therefore draw a smaller and
less complex sample from this category of farmers. However, if the
subjects are heterogeneous (e.g. parents sending their children to a
community school), you may need to have a sample design that
captures representatives from different types of subjects in the
population.
3. Types of analyses to be carried out after collecting data: If you plan to
study many different variables in your analysis, you may need a larger
sample that is designed to get sufficient variation within those
variables. For example, if you want to study the public perceptions of
‘youth employment programme’ (Kazi Kwa Vijana/NYS Programmes) as
a means of addressing youth unemployment, you may need to have
varied categories of respondents. These would include parents,
teachers, policy makers, policy implementers, female youth, male
youth, youth with college education, youth with high school education,
youth with primary education, etc. If you plan to focus on a particular
group in your analysis, you will need to be sure that your sample
includes enough representatives from that group.
Methods of Sampling
We have two broad categories of sampling methods in research. These are the
probability and non-probability sampling methods.
Probability Sampling
This sampling method is also known as the chance method of sampling. It means that
every member of the study population has a known chance (probability) of being
selected into the sample. To draw a probability sample you need to define or
identify the population from which the study subjects would be drawn. A population
is a collection of all the units or elements (either known or unknown) from which a
sample is drawn. Probability of selection means the likelihood that an element
will be selected from the population for inclusion in the sample. In a census of all
elements of a population, the probability that any particular element will be
selected is 1.0. If half of the elements in the population are sampled on the basis of
chance (say, by tossing a coin), the probability of selection for each element is one
half, or .5. As the size of the sample as a proportion of the population decreases, so
does the probability of selection.
For example, our population could be all handicraft traders operating along street
‘A’. From this population we could use a characteristic, such as the type of goods sold
(for example, footwear or household decorations), to help us draw up a list of all the units or elements to form
our sampling frame. A sampling frame is the list of all elements or units from
which a sample is selected.
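The probability-of-selection arithmetic described above can be written as a one-line function. This sketch assumes simple random sampling without replacement, where every element in the sampling frame has the same chance, n / N; other designs can give different elements different (but still known) probabilities.

```python
def selection_probability(sample_size, population_size):
    """Chance that any one element is selected under simple random sampling
    without replacement: every element has the same probability, n / N."""
    if population_size <= 0 or not 0 <= sample_size <= population_size:
        raise ValueError("need population_size > 0 and 0 <= sample_size <= population_size")
    return sample_size / population_size

print(selection_probability(500, 500))  # census: 1.0
print(selection_probability(250, 500))  # half of the elements: 0.5
print(selection_probability(50, 500))   # a 10 per cent sample: 0.1
```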
Random sampling refers to a method of sampling that relies on a random, or
chance, selection method so that every element of the sampling frame has a known
probability of being selected. There is a natural tendency to confuse the concept of
random sampling, in which cases are selected only on the basis of chance, with a
haphazard method of sampling. On first impression, “leaving things up to chance”
seems to imply not exerting any control over the sampling method. But to ensure
that nothing but chance influences the selection of cases, the researcher must
proceed very methodically, leaving nothing to chance except the selection of the
cases themselves. The researcher must follow carefully controlled procedures if a
purely random process is to occur. In fact, when reading about sampling methods,
do not assume that a random sample was obtained just because the researcher
used a random selection method at some point in the sampling process. Look for
two particular problems: selecting elements from an incomplete list of the
total population and failing to obtain an adequate response rate.
If the sampling frame is incomplete, a sample selected randomly from that list will
not really be a random sample of the population. You should always consider the
adequacy of the sampling frame. Even for a simple population such as a university’s
student body, the registrar’s list is likely to be at least a bit out-of-date at any given
time. For example, some students will have dropped out, but their status will not yet
be officially recorded. Although you may judge the amount of error introduced in
this particular situation to be negligible, the problems are greatly compounded for a
larger population. The sampling frame for a city, state, or nation is always likely to
be incomplete because of constant migration into and out of the area. Even
unavoidable omissions from the sampling frame can bias a sample against
particular groups within the population.
A very inclusive sampling frame may still yield systematic bias if many sample
members cannot be contacted or refuse to participate. Nonresponse is a major
hazard in survey research because non-respondents are likely to differ
systematically from those who take the time to participate. You should not assume
that findings from a randomly selected sample will be generalizable to the
population from which the sample was selected if the rate of nonresponse is
considerable (certainly not if it is much above 30%). In general, both the size of the
sample and the homogeneity (sameness) of the population affect the degree of
error due to chance; the proportion of the population that the sample represents
does not, unless that proportion is large. To elaborate:
The larger the sample, the more confidence we can have in the sample’s
representativeness. If we randomly pick 5 people to represent the entire
population of our city, our sample is unlikely to be very representative of the
entire population in terms of age, gender, race, attitudes, and so on. But if we
randomly pick 100 people, the odds of having a representative sample are
much better; with a random sample of 1,000, the odds become very good
indeed.
The more homogeneous the population, the more confidence we can have in
the representativeness of a sample of any particular size. Let’s say we plan to
draw samples of 50 from each of two communities to estimate mean family
income. One community is very diverse, with family incomes varying from
$12,000 to $85,000. In the other, more homogeneous community, family
incomes are concentrated in a narrow range, from $41,000 to $64,000. The
estimated mean family income based on the sample from the homogeneous
community is more likely to be representative than is the estimate based on
the sample from the more heterogeneous community. With less variation to
represent, fewer cases are needed to represent the homogeneous
community.
The fraction of the total population that a sample contains does not affect the
sample’s representativeness unless that fraction is large. We can regard any
sampling fraction less than 2% with about the same degree of confidence
(Sudman 1976). In fact, sample representativeness is not likely to increase
much until the sampling fraction is quite a bit higher. Other things being
equal, a sample of 1,000 from a population of 1 million (with a sampling
fraction of 0.001, or 0.1%) is much better than a sample of 100 from a
population of 10,000 (although the sampling fraction for this smaller sample
is 0.01, or 1%, which is 10 times higher). The size of the samples is what
makes representativeness more likely, not the proportion of the whole that
the sample represents.
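All three points can be checked with the textbook formula for the standard error of a sample mean under simple random sampling without replacement, which includes the finite population correction sqrt((N - n) / (N - 1)). A smaller standard error means the sample estimate is likely to land closer to the population value. The inputs below are assumptions made for illustration: a standard deviation of 0.5 stands for a characteristic held by about half the population; the income standard deviations assume incomes spread evenly across the quoted ranges (range / sqrt(12)); and the community size of 10,000 is hypothetical.

```python
import math

def standard_error_of_mean(sd, n, population_size):
    """Standard error of a sample mean under simple random sampling without
    replacement: (sd / sqrt(n)) * sqrt((N - n) / (N - 1)).

    The second factor (finite population correction) stays close to 1
    whenever the sampling fraction n / N is small.
    """
    if population_size < 2 or not 1 <= n <= population_size:
        raise ValueError("need population_size >= 2 and 1 <= n <= population_size")
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return sd / math.sqrt(n) * fpc

# 1. Sample size: a 50/50 characteristic (sd = 0.5) in a city of 1,000,000.
for n in (5, 100, 1000):
    print(f"n = {n:>4}: standard error = {standard_error_of_mean(0.5, n, 1_000_000):.4f}")

# 2. Homogeneity: samples of 50 families; incomes assumed evenly spread over each range.
diverse_sd = (85_000 - 12_000) / math.sqrt(12)
homogeneous_sd = (64_000 - 41_000) / math.sqrt(12)
print(f"diverse community:     {standard_error_of_mean(diverse_sd, 50, 10_000):,.0f}")
print(f"homogeneous community: {standard_error_of_mean(homogeneous_sd, 50, 10_000):,.0f}")

# 3. Sampling fraction: 1,000 of 1,000,000 (0.1%) versus 100 of 10,000 (1%).
print(f"1,000 of 1,000,000: {standard_error_of_mean(0.5, 1000, 1_000_000):.4f}")
print(f"100 of 10,000:      {standard_error_of_mean(0.5, 100, 10_000):.4f}")
```

The larger sample wins in case 3 even though its sampling fraction is ten times smaller, because sample size enters through sqrt(n) while the correction factor barely changes. When n equals N (a census), the formula gives a standard error of zero.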
Probability Sampling Methods
Probability sampling methods are those in which the probability of selection is
known and is not zero (so there is some chance of selecting each element). These
methods randomly select elements and therefore have no systematic bias;
nothing but chance determines which elements are included in the sample. This
feature of probability samples makes them much more desirable than
nonprobability samples when the goal is to generalize to a larger population. There
are several methods for drawing random samples. These are:
Simple random sampling (SRS): This is a method of sampling in which the units
in a sampling frame are numbered and then drawn into the sample if their numbers match
a set of randomly selected numbers. The random numbers must be drawn from the same
range as the numbers assigned to the elements in the sampling frame. For example, if
you have a total of 50 units or elements in a sampling frame, numbered 1 to 50, the
random numbers must also fall between 1 and 50, and you keep drawing distinct random
numbers until you reach the desired sample size. If, for instance, we desire a 20 per
cent sample, the sample size would be 10 out of the 50 cases, and we would draw 10
different random numbers between 1 and 50; the units carrying those numbers make up
our sample.
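The numbered-frame procedure above can be sketched in Python with the standard library's `random.Random.sample`, which draws distinct values without replacement. The frame of 50 labelled units is hypothetical, and the function name is our own.

```python
import random

def simple_random_sample(frame, sample_size, seed=None):
    """Simple random sample from a sampling frame.

    Units are numbered 1..N, `sample_size` distinct random numbers are drawn
    from that same range, and the units carrying those numbers form the
    sample, so every unit has the same chance of selection.
    """
    rng = random.Random(seed)
    numbered = dict(enumerate(frame, start=1))  # number -> unit
    chosen_numbers = rng.sample(range(1, len(frame) + 1), sample_size)
    return [numbered[number] for number in sorted(chosen_numbers)]

# Hypothetical sampling frame of 50 units; a 20 per cent sample is 10 units.
frame = [f"unit-{i:02d}" for i in range(1, 51)]
sample = simple_random_sample(frame, 10, seed=7)
print(sample)
```

Fixing the seed makes the draw reproducible, which is useful for documenting how a sample was selected.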
Systematic random sampling: The more common form of selection for a
probability sample is to select every kth person or unit once you have made a
random start. You may decide to interview every fifth person on the list, and this
would constitute a systematic random sample. You need to define a sampling
interval (k) by dividing the population (or sampling frame) size by the desired size
of the sample. For example, if you want to select 50 cases from among a population of
500 primary school teachers in an educational district, the sampling interval is
500 / 50 = 10, so you would pick a random starting point among the first ten teachers
and then select every tenth teacher after it. Remember that the sampling frame list
must not be ordered in a cyclical pattern that could coincide with the sampling
interval, for example a list in which a head teacher and a deputy head teacher are
listed before every 50 ordinary teachers, since this may introduce a systematic
distortion and bias the sample.
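A minimal sketch of systematic sampling with a random start follows. It assumes the interval is found by integer division of the frame size by the sample size, which is a simplification when the two do not divide evenly. The teacher list is hypothetical and the function name is our own.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Select every k-th unit after a random start.

    The interval k is len(frame) // sample_size (integer division, a
    simplification). The start is chosen at random from the first k units.
    """
    if not 0 < sample_size <= len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    interval = len(frame) // sample_size
    start = random.Random(seed).randrange(interval)  # 0-based position of the random start
    return frame[start::interval][:sample_size]

# Hypothetical frame of 500 primary school teachers; 50 are needed, so k = 10.
teachers = [f"teacher-{i:03d}" for i in range(1, 501)]
sample = systematic_sample(teachers, 50, seed=3)
print(len(sample), sample[:3])
```

Because selection follows the order of the list, any periodic pattern in the frame passes straight into the sample, which is why the ordering warning above matters.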
Stratified random sampling: A stratified random sample is used when you know
in advance that the population in question contains a number of non-overlapping
sub-groups (strata). These are samples that are randomly drawn from sampling
frames that have been stratified by characteristics such as sex, age, occupation,
business size etc. Each of these strata forms a homogeneous sub-group from which
random samples are separately selected. For example, suppose you were interested
in farmers’ outputs, and farmers grew the same crop in three different ecological
zones, then you might decide to take a random sample of farmers within each of
the three zones.
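The farmers example can be sketched as a separate simple random sample within each stratum. The zone names, the number of farmers per zone and the 10-per-zone sample sizes are all hypothetical, and `stratified_sample` is our own helper name.

```python
import random

def stratified_sample(units, stratum_of, sizes, seed=None):
    """Draw a separate simple random sample within each stratum.

    `stratum_of(unit)` returns the stratum a unit belongs to, and `sizes`
    maps each stratum to the number of units to draw from it.
    """
    rng = random.Random(seed)
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)  # group units by stratum
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

# Hypothetical farmers, each tagged with one of three ecological zones.
zones = ["highland"] * 120 + ["midland"] * 200 + ["lowland"] * 80
farmers = [(f"farmer-{i:03d}", zone) for i, zone in enumerate(zones, start=1)]

sample = stratified_sample(farmers, lambda f: f[1],
                           {"highland": 10, "midland": 10, "lowland": 10}, seed=5)
for zone, chosen in sample.items():
    print(zone, [farmer_id for farmer_id, _ in chosen])
```

Since the strata do not overlap, every farmer can be selected in exactly one zone's draw.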
Proportionate probability sampling: This is a probability sampling method that
you would use when you decide to select proportionately to the size of the strata,
clusters, or areas. For instance, depending on how we classify the study population,
we may find that some segments are larger or smaller than the rest. For example,
among the households in a block of apartments, we may find fewer three-bedroom units
than two-bedroom or one-bedroom units. If our study focuses on renting preferences for
families moving into an upgraded urban settlement, we may need to take account of
this difference in the number of housing units in each category (classified by
number of bedrooms), so that each category is represented in the sample in proportion
to its size.
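Selecting proportionately to size starts by working out how many units each category should contribute. The sketch below splits a total sample across categories in proportion to their sizes, using the largest-remainder method so the numbers add up exactly; the apartment counts are hypothetical. The resulting numbers would then be drawn at random within each category.

```python
def proportionate_allocation(stratum_sizes, total_sample):
    """Split `total_sample` across strata in proportion to stratum size.

    Uses the largest-remainder method so the allocations sum exactly to
    `total_sample`.
    """
    population = sum(stratum_sizes.values())
    exact = {name: total_sample * size / population for name, size in stratum_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    shortfall = total_sample - sum(allocation.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name], reverse=True)
    for name in by_remainder[:shortfall]:
        allocation[name] += 1
    return allocation

# Hypothetical apartment block: 60 one-bedroom, 30 two-bedroom, 10 three-bedroom units.
units = {"1-bedroom": 60, "2-bedroom": 30, "3-bedroom": 10}
print(proportionate_allocation(units, 20))
```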
Cluster Sampling
Cluster sampling is useful when a sampling frame of elements is not available, as
is often the case for large populations spread out across a wide geographic area or
among many different organizations. Cluster sampling is sampling in which
elements are selected in two or more stages, with the first stage being the random
selection of naturally occurring clusters and the last stage being the random
selection of elements within clusters. A cluster is a naturally occurring, mixed
aggregate of elements of the population, with each element appearing in one, and
only one, cluster. Schools could serve as clusters for sampling students, blocks
could serve as clusters for sampling city residents, counties could serve as clusters
for sampling the general population, and businesses could serve as clusters for
sampling employees.
Drawing a cluster sample is, at least, a two-stage procedure. First, the researcher
draws a random sample of clusters. A list of clusters should be much easier to
obtain than a list of all the individuals in each cluster in the population. Next, the
researcher draws a random sample of elements within each selected cluster.
Because only a fraction of the total clusters are involved, obtaining the sampling
frame at this stage should be much easier.
In a cluster sample of city residents, for example, blocks could be the first-stage
clusters. A research assistant could walk around each selected block and record the
addresses of all occupied dwelling units. Or, in a cluster sample of students, a
researcher could contact the schools selected in the first stage and make
arrangements with the registrar to obtain lists of students at each school. Cluster
samples often involve multiple stages, with clusters within clusters, as when a
national sample of individuals might involve first sampling states, then geographic
units within those states, then dwellings within those units, and finally, individuals
within the dwellings. In multistage cluster sampling, the clusters at the first stage of
sampling are termed the primary sampling units (Levy & Lemeshow 1999).
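The two-stage procedure (random clusters first, then random elements within each selected cluster) can be sketched as follows. For simplicity the example holds every block's dwelling list in memory; in practice only the lists for the selected blocks would need to be compiled. The blocks, dwellings and function name are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage cluster sample.

    Stage 1: randomly select `n_clusters` clusters (primary sampling units).
    Stage 2: within each selected cluster, randomly select up to
    `n_per_cluster` of its elements. `clusters` maps each cluster name to
    the list of its elements; each element belongs to one cluster only.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1
    sample = {}
    for name in chosen:
        members = clusters[name]
        sample[name] = rng.sample(members, min(n_per_cluster, len(members)))  # stage 2
    return sample

# Hypothetical city blocks (clusters), each with 20 occupied dwelling units.
blocks = {
    f"block-{b}": [f"block-{b}/dwelling-{d}" for d in range(1, 21)]
    for b in range(1, 31)
}
sample = two_stage_cluster_sample(blocks, n_clusters=5, n_per_cluster=4, seed=11)
for block, dwellings in sample.items():
    print(block, dwellings)
```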
Non-probability Sampling
Many groups may be interesting to study but for various reasons, no sampling
frame can ever be developed for them. Imagine, for example, that you want to study
the daily experiences of train commuters between Thika and Nairobi. Certainly
there is no list available of individuals who commute by train daily between these
two towns. However, you may be able to draw together a sample useful for your
purposes by using a form of non-probability sampling. A sample that does not follow
the rules of probability sampling (the chance that a case will be selected into the
sample is not known) is a non-probability sample.
Convenience Sampling: Consider, for example, that you have no list of daily train
commuters between Nairobi and Thika so you rule out the possibility of drawing a
sample randomly based on a list of train commuters. You may then decide to obtain
a sample of train commuters by selecting individuals that are conveniently available
either at the point of boarding or alighting at Nairobi or Thika train stations.
Likewise, you may select a convenience sample made up of the first 150 commuters
on the Nairobi-Thika route who are willing to be interviewed. However, you have no
way of estimating the representativeness of convenience samples and therefore
cannot accurately predict the experiences of the wider population of commuters from
such a sample.
Purposive or Judgemental Sampling: A purposive sample is a form of non-
probability sample in which the subjects selected seem to meet the study’s needs.
This form of sampling generally considers the most common or typical
characteristics of the population or group it is desired to sample, tries to figure out
where such individuals or units can be found and then tries to study them. For
example, in order to study the training needs of kindergarten school teachers in
a region of Kenya, you may purposively select kindergarten school teachers
who are yet to undergo professional training. Another approach would be to look for
atypical or deviant cases, such as teachers who have already been trained, to
allow for comparison with untrained cases.
Quota Sampling: Quota samples are non-probability samples in which sub-
samples are selected from clearly defined groups. In quota sampling, groups are
defined and the sizes are specified, and then individuals who fit those descriptions
are selected to fill the quotas wherever they can be found. For example, if we know
that 40 percent of entrepreneurs in a given study population are women we would
draw a sample in which 40 percent of the respondents will be women. In quota
sampling, you identify a range of characteristics that would influence the behaviour or
relationship you are investigating, and then you determine the quota sizes based on
the distribution of those characteristics. For example, if you want to investigate how
individual entrepreneur characteristics would affect business performance you can
list the entrepreneur characteristics such as age, levels of education, sex, race, and
ethnicity. You then determine the distribution of these characteristics among the
entrepreneurs and use that distribution as your quota sizes.
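The quota-filling step can be sketched as accepting people in the order they are found until each quota is full. Nothing in the procedure is random, which is what makes the result a non-probability sample. The 40/60 split follows the example above; the stream of people is hypothetical.

```python
def fill_quotas(candidates, category_of, quotas):
    """Fill fixed quotas from candidates in the order they are found.

    A candidate is accepted only if the quota for their category is not yet
    full. Returns the selected people and any unfilled quota counts.
    """
    remaining = dict(quotas)
    selected = []
    for person in candidates:
        if not any(remaining.values()):
            break  # every quota is already full
        category = category_of(person)
        if remaining.get(category, 0) > 0:
            selected.append(person)
            remaining[category] -= 1
    return selected, remaining

# Hypothetical: 40 per cent of entrepreneurs are women, so a sample of 10
# needs 4 women and 6 men. People are approached in the order they turn up.
arrivals = [(f"person-{i}", "female" if i % 3 == 0 else "male") for i in range(30)]
selected, unfilled = fill_quotas(arrivals, lambda p: p[1], {"female": 4, "male": 6})
print(len(selected), unfilled)
```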
Snowball Sampling: In snowball sampling, you first find a few subjects who are
characterized by the qualities you seek, interview them, and then ask them for
names of other people whom they know who have the same qualities or other
qualities that interest you. These could be individuals who are engaged in a practice
or activity such as water harvesting, new farming methods, and use of energy
saving stoves. In this manner, you accumulate more and more respondents by
using each respondent you get as a source of new names for your sample. A
snowball sample therefore is built from the subjects suggested by previous subjects.
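Snowball sampling can be sketched as following referrals outward from the first subjects, interviewing people in the order they are reached (a breadth-first walk) and adding each new name only once. The referral network below, among farmers practising water harvesting, is entirely hypothetical.

```python
from collections import deque

def snowball_sample(seeds, referrals, target_size):
    """Build a snowball sample by following referrals.

    `seeds` are the first subjects found; `referrals` maps each subject to
    the people they name. Subjects are interviewed in the order reached,
    and each name enters the queue only once.
    """
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)  # interview this person
        for name in referrals.get(person, []):
            if name not in seen:
                seen.add(name)
                queue.append(name)
    return sample

# Hypothetical referral network among farmers practising water harvesting.
referrals = {
    "Wanjiru": ["Otieno", "Kamau"],
    "Otieno": ["Achieng", "Kamau"],
    "Kamau": ["Mutua"],
    "Achieng": [],
    "Mutua": ["Wanjiru", "Njeri"],
}
print(snowball_sample(["Wanjiru"], referrals, target_size=5))
```

Who ends up in the sample depends on who knows whom, so people outside the starting subjects' networks can never be reached.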
Summary
In this lecture we have examined the meaning of samples and the purposes of
sampling in research. We have also defined related concepts such as population,
sampling frame, sampling units and sample size. We then examined methods of drawing
an appropriate sample and distinguished between probability and non-probability
samples. Finally, we have emphasized that the design and objectives of your study
will determine the type of sample you need.