U3-1
Sampling
We have seen that the reason for collecting data from a
sample is to get some idea about the population that the
sample represents.
Until now, we have usually assumed that the samples
from which we measure variables are representative of
the whole population.
U3-2
Sampling
However, this may not always be the case. We need to
pay careful attention to the way we collect our samples
in order to make them useful (representative of the
population which we wish to examine).
If our data do not fairly represent the larger group, then
we cannot generalize any of our conclusions to that
population.
U3-3
Example
A local newspaper publishes the following survey:
“Do you think marijuana should be legalized?”
It lists a phone number to call to vote Yes and another
number to vote No.
The next day they print the results:
“A total of 1193 people responded to the survey and 83%
(990) voted Yes, while only 17% (203) voted No.”
U3-4
Voluntary Response Sample
The paper writes an editorial using this as evidence to
appeal to the government to legalize the currently
illegal drug.
Is anything wrong with this survey?
The process of collecting data from a sample to make
statements about the population is called sampling.
The sampling method used in the previous example is
called a voluntary response sample.
U3-5
Voluntary Response Sample
Such a sample consists of people who choose to
include themselves in the sample by responding to a
question or survey. The problem with this type of
sampling is that the data mostly represent the opinions
of people who feel strongly about the subject in question.
U3-6
Voluntary Response Sample
People who feel strongly about marijuana use are
likely those who think it should be legalized. Those in
favour of change often feel more passionately than
those in favour of keeping things the same. Someone
who is happy with the way things are is not likely to
take the time (and often spend money) to call in and
answer a survey question.
U3-7
Voluntary Response Sample
As such, these results are unlikely to be representative
of the opinions of the general public.
In reality, the percentage of people favouring the
legalization of marijuana is likely much lower than 83%.
Voluntary response surveys are almost always
unreliable and should be avoided if possible.
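To see how self-selection can distort a result, here is a minimal simulation
sketch. Every number in it (the true level of support, the chance that a
supporter or an opponent calls in) is a hypothetical value chosen only for
illustration, and the function name simulate_call_in_poll is our own.

```python
import random

def simulate_call_in_poll(pop_size, true_yes_share, p_call_yes, p_call_no, seed=0):
    """Return (yes_calls, no_calls) for a simulated call-in poll."""
    rng = random.Random(seed)
    yes_calls = no_calls = 0
    for _ in range(pop_size):
        favours = rng.random() < true_yes_share
        # Each person decides for themselves whether to call in;
        # this self-selection is what makes it a voluntary response sample.
        p_call = p_call_yes if favours else p_call_no
        if rng.random() < p_call:
            if favours:
                yes_calls += 1
            else:
                no_calls += 1
    return yes_calls, no_calls

# Hypothetical values: 40% truly in favour, and supporters are
# assumed to be far more likely to call than opponents.
yes_calls, no_calls = simulate_call_in_poll(
    pop_size=100_000, true_yes_share=0.40, p_call_yes=0.03, p_call_no=0.004)
observed_yes_share = yes_calls / (yes_calls + no_calls)
print(f"true share: 0.40, share among callers: {observed_yes_share:.2f}")
```

Under these assumed call-in rates, the share of Yes votes among callers comes
out far above the true share, even though no individual answer is false.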
U3-8
Sampling
Up until now, we have assumed that the data come
from a “good” (representative) sample. We see now
that we must examine the sampling method used
before we can generalize our conclusions to the
population. If we are selecting the sample ourselves,
then we must do so in a way that enables us to use the
sample data to make reliable statements about the
entire population.
U3-9
Example
A research company is conducting a survey at local
shopping malls asking shoppers about their buying
habits.
The company would like to know if consumers have
changed their spending behaviour since the recent
decline in the economy.
An employee of the company selects shoppers and
asks them to respond to the survey.
U3-10
Convenience Sample
This is called a convenience sample.
This type of sample chooses individuals who are
easiest to reach.
The surveyor is more likely to ask friendly-looking
people to respond to the questionnaire.
U3-11
Convenience Sample
Going to a mall and surveying people who seem easy
to approach certainly makes things easier for the
research firm and the employee, but it makes the
sample virtually useless.
The sample is supposed to represent all consumers,
but it really represents only those who are already
out shopping (so spending levels will appear higher
than they are) and who seem friendly.
U3-12
Sampling Bias
Both voluntary response samples and convenience
samples are biased.
The design of a study is biased if it systematically
favours certain outcomes over others.
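"Systematically" means the error does not average out over repeated samples.
The sketch below illustrates this with a made-up population in which people
found at the mall spend more than everyone else. The spending figures, the
20% mall share, and the helper average_estimate are all assumptions for
illustration only.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: (monthly_spending, is_at_mall) pairs.
population = []
for _ in range(10_000):
    at_mall = rng.random() < 0.2
    spending = rng.gauss(600 if at_mall else 400, 100)
    population.append((spending, at_mall))

true_mean = statistics.mean(s for s, _ in population)
everyone = [s for s, _ in population]
mall_shoppers = [s for s, at_mall in population if at_mall]

def average_estimate(pool, n=50, repetitions=500):
    """Average of the sample means over many repeated samples from pool."""
    return statistics.mean(
        statistics.mean(rng.sample(pool, n)) for _ in range(repetitions))

srs_average = average_estimate(everyone)            # chosen by chance
convenience_average = average_estimate(mall_shoppers)  # only people at the mall

print(f"true mean:            {true_mean:.0f}")
print(f"SRS average:          {srs_average:.0f}")
print(f"convenience average:  {convenience_average:.0f}")
```

The randomly chosen samples scatter around the true mean, while the mall-only
samples miss it in the same direction every time. That consistent miss is bias.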
U3-13
Sampling
The solution for the biased types of samples we have
seen is to choose our samples in such a way that
neither the sampler nor the respondents choose the
sample units.
We will choose the sample “by chance”; that is, in a
way that does not favour any of the potential units.
Choosing the sample in this manner attacks bias by
giving all individuals (men, women, rich, poor,
young, old, black, white) an equal chance of being
selected.
U3-14
Simple Random Sampling
The easiest way to choose such a sample is to “put
names in a hat” and to choose n of them. This is the
framework behind simple random sampling.
An outcome is called random if it has two or more
possible values with a known probability of being
observed.
U3-15
Simple Random Sampling
A simple random sample (SRS) of size n consists of n
individuals from the population chosen in such a way
that every group of n individuals has an equal chance
to be the sample actually selected.
It follows that each individual has an equal chance of
being selected into the sample.
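For a very small population we can check this definition by listing every
possible sample. The sketch below uses five made-up names and n = 2; it
assigns each possible sample the same probability and then adds up each
person's chance of being included.

```python
from fractions import Fraction
from itertools import combinations

population = ["Ann", "Bob", "Cam", "Dee", "Eli"]  # hypothetical names
n = 2

# Every possible group of n individuals (order does not matter).
all_samples = list(combinations(population, n))

# In an SRS, each group is equally likely to be the sample selected.
prob_each_sample = Fraction(1, len(all_samples))

# A person's chance of selection is the total probability of the
# samples that contain that person.
inclusion_prob = {
    person: sum(prob_each_sample for s in all_samples if person in s)
    for person in population
}

print(len(all_samples))   # 10 possible samples
print(inclusion_prob)     # every person: 2/5, which equals n/N
```

Each of the 10 possible pairs has probability 1/10, and each person appears in
4 of them, so every individual is selected with probability 4/10 = n/N.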
U3-16
Random Sampling
The “pull names from a hat” idea is what we use to
motivate our SRS method, but in practice, we can use
computer software to randomly select our sample.
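As one possible illustration, Python's standard random module can draw the
names from the hat. The sampling frame of 500 ID numbers and the seed value
are assumptions made for this sketch.

```python
import random

# Hypothetical sampling frame: ID numbers for a population of 500 units.
sampling_frame = list(range(1, 501))
n = 10

rng = random.Random(2024)  # fixed seed so the same draw can be reproduced

# random.sample draws without replacement, so no unit is chosen twice
# and every group of n units is equally likely.
srs = rng.sample(sampling_frame, n)
print(sorted(srs))
```

Any statistical package offers an equivalent function; the important point is
that the software, not the sampler or the respondents, chooses the units.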
U3-17
Stratified Random Sampling
A simple random sample gives each unit in the
population an equal chance of being selected, but this
isn’t the case for all sampling methods.
Sometimes our population is naturally divided into
groups of similar individuals.
U3-18
Stratified Random Sampling
Suppose we wish to conduct a survey asking
Canadians what their disposable income is. Different
provinces have different levels of income and taxation,
so it is natural to divide the country into 13 strata (ten
provinces, three territories). A stratum is a group of
similar individuals.
Within each of the strata we take an SRS of size n_i.
U3-19
Stratified Random Sampling
Note that our total sample is not an SRS. Not every
possible group of n individuals in the population has
the same chance of being chosen. In addition, not
even every individual has the same chance of being
chosen (unless the sample size in each province is
proportional to its population size).
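The following sketch shows the idea with three made-up strata instead of 13.
The stratum sizes, the sample sizes, and the helper names stratified_sample
and inclusion_probabilities are hypothetical. A unit in stratum i is selected
with probability n_i / N_i, where N_i is the number of units in that stratum.

```python
import random

rng = random.Random(7)

# Hypothetical strata: name -> list of unit IDs (sizes are made up).
strata = {
    "A": [f"A{i}" for i in range(6000)],
    "B": [f"B{i}" for i in range(3000)],
    "C": [f"C{i}" for i in range(1000)],
}

def stratified_sample(strata, sizes):
    """Draw an independent SRS of size sizes[name] within each stratum."""
    return {name: rng.sample(units, sizes[name])
            for name, units in strata.items()}

def inclusion_probabilities(strata, sizes):
    """Chance that any one unit in stratum i is selected: n_i / N_i."""
    return {name: sizes[name] / len(units)
            for name, units in strata.items()}

equal_sizes = {"A": 100, "B": 100, "C": 100}        # same n_i everywhere
proportional_sizes = {"A": 180, "B": 90, "C": 30}   # n_i proportional to N_i

sample = stratified_sample(strata, equal_sizes)
p_equal = inclusion_probabilities(strata, equal_sizes)
p_proportional = inclusion_probabilities(strata, proportional_sizes)

print(p_equal)          # unequal chances: units in small strata are favoured
print(p_proportional)   # the same chance (0.03) in every stratum
```

With equal sample sizes, a unit in the smallest stratum is much more likely to
be chosen than a unit in the largest one. With proportional sizes, every unit
has the same chance, yet the sample is still not an SRS, because a group made
up entirely of units from one stratum can never be selected.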
U3-20
Stratified Random Sampling
Simple random sampling would likely have given us a
diverse sample among the provinces, but in this
setting stratification is even better: it guarantees
that every province and territory is represented, and
it lets us choose how many units we want from each
group.
U3-21
Multistage Sampling
Suppose a polling firm would like to conduct a door-
to-door survey of urban Canadians, asking their
opinions on the environment. It is not realistic to
select an SRS of n people from across the country
and visit each one; this would cost too much.
Instead, we might select an SRS of Canadian cities,
then an SRS of neighbourhoods in each selected city,
then an SRS of blocks of houses in each selected
neighbourhood, and finally survey the occupants of
the households on those blocks.
U3-22
Multistage Sampling
This type of sampling is called multistage sampling.
Again, this type of sampling does not produce an
SRS. Not every combination of units has a chance of
being selected. Not every household will have the
same chance of being included either, unless we
carefully design the sample to make this happen.
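A multistage design can be sketched as nested random draws. The frame below
(10 cities, 6 neighbourhoods per city, 5 blocks per neighbourhood, 8 households
per block) is entirely made up, as is the function name multistage_sample.

```python
import random

rng = random.Random(11)

# Hypothetical frame: city -> neighbourhood -> block -> list of households.
frame = {
    f"city{c}": {
        f"nbhd{h}": {
            f"block{b}": [f"c{c}-n{h}-b{b}-house{k}" for k in range(8)]
            for b in range(5)
        }
        for h in range(6)
    }
    for c in range(10)
}

def multistage_sample(frame, n_cities, n_nbhds, n_blocks):
    """SRS of cities, then of neighbourhoods within each chosen city,
    then of blocks within each chosen neighbourhood; every household
    on a chosen block is surveyed."""
    households = []
    for city in rng.sample(sorted(frame), n_cities):
        nbhds = frame[city]
        for nbhd in rng.sample(sorted(nbhds), n_nbhds):
            blocks = nbhds[nbhd]
            for block in rng.sample(sorted(blocks), n_blocks):
                households.extend(blocks[block])
    return households

selected = multistage_sample(frame, n_cities=3, n_nbhds=2, n_blocks=2)
print(len(selected))  # 3 cities x 2 neighbourhoods x 2 blocks x 8 households = 96
```

Because this made-up frame is perfectly balanced, each household has the same
chance of selection, (3/10)(2/6)(2/5) = 0.04. With real cities of different
sizes the chances would differ unless the design corrects for it. Either way,
the selected households are clustered in only 3 cities, so most combinations
of 96 households can never occur.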
U3-23
Multistage Sampling
While a stratified random sample can be viewed as
“more powerful” than an SRS (bias is more
strategically eliminated), a multistage sample is not
quite as “good” as an SRS. It is often our only
option, though, given time and cost constraints. It
should be noted, however, that these types of samples
can serve our purposes very well if conducted
properly.
U3-24
Simple Random Sample
U3-25
Stratified Random Sample
U3-26
Multistage Random Sampling
U3-27
Example
Even when a sample is carefully selected, the data
can still be collected in a biased fashion.
A group of lobbyists opposed to euthanasia
wishes to dissuade the government from
legalizing assisted suicide in Canada. A
telephone survey is conducted. Respondents
from all walks of life are included in the sample
– men and women, young and old, with different
backgrounds and incomes.
U3-28
Leading Question
The question asked is as follows:
“Should it be legal to kill a living human at any stage of
his or her life?”
Not surprisingly, 93% of respondents said No. The group
presented its findings to the government saying “The
current law, with the support of 93% of Canadians, must
be kept.”
The flaws in this conclusion are obvious. The results
were obtained (“fabricated”) using a leading question.
U3-29
Sampling
The wording of questions is only one thing that can
influence the respondent. The tone or brevity of the
interviewer, as well as the interviewer’s race or
gender, can also bias a respondent’s answer.
In addition, the respondent may be unable to recall
the information requested, making his or her answer a
“guess”. Often, the person being surveyed will also
lie, to avoid revealing unflattering opinions or
behaviours.
U3-30
Nonresponse & Undercoverage
Two other problems:
Nonresponse occurs when respondents refuse to
answer the question(s). This is the case with many
phone surveys. People hang up, not wanting to be
bothered.
Undercoverage results when some units in the
population have no chance of being included in the
sample. For example, phone surveys (even if done
randomly) exclude a certain percentage of the
population – those without phones (or with unlisted
numbers).
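The effect of undercoverage can be sketched with a made-up population in
which people without a listed phone tend to have lower incomes. The 90%
coverage rate and the income figures are assumptions for illustration only.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical population: (income, has_listed_phone) pairs.
population = []
for _ in range(20_000):
    has_phone = rng.random() < 0.9
    income = rng.gauss(50_000 if has_phone else 30_000, 8_000)
    population.append((income, has_phone))

true_mean = statistics.mean(income for income, _ in population)

# The phone survey can only reach people with a listed phone,
# even though the sample itself is drawn at random.
phone_frame = [income for income, has_phone in population if has_phone]
phone_sample_mean = statistics.mean(rng.sample(phone_frame, 1_000))

print(f"true mean income:        {true_mean:.0f}")
print(f"phone survey mean income: {phone_sample_mean:.0f}")
```

The random draw is carried out correctly, but the estimate is still too high
because part of the population had no chance of being selected.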
U3-31
Sampling
Important things to remember in selecting a “good”
sample:
• Select the sample in an unbiased and representative
manner.
• Train interviewers properly.
• Use good (non-leading, easy-to-understand) wording
in questions.
• Try to make nonresponse a non-issue.
U3-32
Sampling
Important things to remember in selecting a “good”
sample (cont’d):
• Include all population units in your possible
sample.
• Avoid voluntary response and convenience samples;
they are not appropriate.
• If you can do something better than an SRS, do it
(make the sample more representative of the
population).