ADVANCED RESEARCH METHODOLOGY
SAMPLING AND ITS TECHNIQUES
Assignment 3
Nayab Khan (Mphil)
11/1/2016
TABLE OF CONTENTS
1. What Is a Sample?
1.1. Attributes of a Sample
1.2. Representative Sample
2. Determining Sample Size
2.1. Why Determine Sample Size?
2.2. Sample Size Determination
2.3. How to Determine Sample Size
3. Calculating Sample Size
3.1. Approaches to Calculate Sample Size
3.2. Sample Size to Estimate a Population Mean
4. Basic Terms and Notations
5. Types of Sampling
6. Methods of Sampling
7. Event Sampling Methodology
8. Response Rate
9. Sampling Strategies
10. Errors in Sampling
10.1. Bias in Sampling
10.2. Statistical Errors
11. Software Used for Calculating Sample Size
1. WHAT IS A SAMPLE?
A sample is “a smaller (but hopefully representative) collection of units from a
population used to determine truths about that population” (Field, 2005).
A sample is a “subgroup of a population” (Frey et al. 125).
1.1. ATTRIBUTES OF A SAMPLE
Every individual in the chosen population should have an equal chance of being
included in the sample.
Ideally, choice of one participant should not affect the chance of another's
selection (hence we try to select the sample randomly – thus, it is important to
note that random sampling does not describe the sample or its size as much as
it describes how the sample is chosen).
1.2. REPRESENTATIVE SAMPLE
A sample whose characteristics correspond to, or reflect, those of the original
population or reference population.
To ensure representativeness, the sample may be either completely random or
stratified depending upon the conceptualized population and the sampling
objective.
The aim of any sample is to represent the characteristics of the sampling frame.
There are a number of different methods used to generate a sample.
As a researcher, you will have to select the most appropriate method to meet the
requirements of your research.
2. DETERMINING SAMPLE SIZE
2.1. WHY DETERMINE SAMPLE SIZE?
It saves resources (time, money) and reduces workload.
It gives results with a known accuracy that can be calculated mathematically.
The population of interest is usually too large to attempt to survey all of its
members.
A carefully chosen sample can be used to represent the population.
The sample reflects the characteristics of the population from which it is drawn.
It is necessary when it is impossible to study the whole population.
2.2. SAMPLE SIZE DETERMINATION
Sample size determination is the act of choosing the number of observations
or replicates to include in a statistical sample. The sample size is an important feature of
any empirical study in which the goal is to make inferences about a population from a
sample. In practice, the sample size used in a study is determined based on the expense of
data collection and the need to have sufficient statistical power. In complicated studies
there may be several different sample sizes involved in the study: for example, in a
stratified survey there would be different sample sizes for each stratum. In a census, data
are collected on the entire population; hence the sample size is equal to the population
size. In experimental design, where a study may be divided into different treatment
groups, there may be different sample sizes for each group.
Sample sizes may be chosen in several different ways:
Experience - For example, include those items readily available or convenient to
collect. A choice of small sample sizes, though sometimes necessary, can result in
wide confidence intervals or risks of errors in statistical hypothesis testing.
Using a target variance for an estimate to be derived from the sample eventually
obtained.
Using a target for the power of a statistical test to be applied once the sample is
collected.
2.3. HOW TO DETERMINE SAMPLE SIZE
1. Specify the margin of error, for example: “We need a margin of error less than 2.5%”.
Typical surveys have margins of error ranging from less than 1% to about 4%; we can
choose any margin of error we like, but we need to specify it.
2. Choose a confidence level. 95% confidence intervals are typical but not in any way
mandatory; we could use 90%, 99% or something else entirely. For this example, we
assume 95%.
3. Make a prior guess of the proportion. This may be guided by past surveys or general
knowledge of public opinion. Let’s suppose the answer is 30%.
3. CALCULATING SAMPLE SIZE
3.1. APPROACHES TO CALCULATE SAMPLE SIZE
There are two approaches to sample size calculations:
• Precision-based: With what precision do you want to estimate the proportion, mean
difference . . . (or whatever it is you are measuring)?
• Power-based: How small a difference is it important to detect and with what degree of
certainty?
Generally, the sample size for any study depends on:[1]
Acceptable level of significance
Power of the study
Expected effect size
Underlying event rate in the population
Standard deviation in the population.
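As a rough illustration of the power-based approach, the sketch below uses the standard normal-approximation formula for comparing two group means with equal group sizes and a common standard deviation. The formula choice, the function name `n_per_group_two_means`, and the example inputs are assumptions for illustration, not taken from the sources cited here; the point is to show how the significance level, power, effect size and standard deviation listed above combine into one number.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_means(sigma: float, delta: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means (equal group sizes, common known SD).

    Normal-approximation formula (an assumption, not from the source):
        n = 2 * (z_{1-alpha/2} + z_{power})**2 * sigma**2 / delta**2
    The result is rounded up to the next whole participant.
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)


# Hypothetical inputs: SD of 10 units, smallest important difference of 5 units.
print(n_per_group_two_means(sigma=10, delta=5))  # 63
```

Because the difference to detect enters the formula squared, halving it roughly quadruples the required sample size.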
Before calculating a sample size, you need to determine a few things about the target population
and the sample you need:
1. Population Size: How many total people fit your demographic? For instance, if you
want to know about mothers living in the US, your population size would be the total
number of mothers living in the US. Don’t worry if you are unsure about this number. It
is common for the population to be unknown or approximated.
2. Margin of Error (Confidence Interval): No sample will be perfect, so you need to
decide how much error to allow. The confidence interval determines how much higher or
lower than the population mean you are willing to let your sample mean fall. If you’ve
ever seen a political poll on the news, you’ve seen a confidence interval. It will look
something like this: “68% of voters said yes to Proposition Z, with a margin of error of
+/- 5%.”
3. Confidence Level: How confident do you want to be that the actual mean falls within
your confidence interval? The most common confidence levels are 90%, 95%, and 99%.
4. Standard Deviation: How much variance do you expect in your responses? Since
we haven’t actually administered our survey yet, the safe decision is to use 0.5 (for a
proportion, p̂ = 0.5 maximizes p̂(1 − p̂)); this is the most forgiving number and ensures
that your sample will be large enough.
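The four inputs above can be combined in a short calculator. This is a minimal sketch, assuming the proportion formula n₀ = z²·p̂(1 − p̂)/ME², with the “0.5” from item 4 used as the prior proportion p̂. The finite population correction n = n₀ / (1 + (n₀ − 1)/N), applied only when a population size is given, is a standard adjustment added here and is not described in the text; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist
from typing import Optional


def survey_sample_size(margin_of_error: float, confidence: float = 0.95,
                       p_hat: float = 0.5,
                       population: Optional[int] = None) -> int:
    """Sample size for estimating a proportion (hypothetical helper).

    margin_of_error: e.g. 0.05 for +/- 5 percentage points
    confidence:      e.g. 0.90, 0.95 or 0.99
    p_hat:           prior guess of the proportion; 0.5 is the most conservative
    population:      total population size N, or None if unknown or very large
    """
    # Two-sided z-score for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z ** 2 * p_hat * (1 - p_hat) / margin_of_error ** 2
    if population is not None:
        # Finite population correction (an added assumption, not in the text).
        n0 = n0 / (1 + (n0 - 1) / population)
    return ceil(n0)


print(survey_sample_size(0.05))                      # 385
print(survey_sample_size(0.05, population=10_000))   # 370
```

When the population size is unknown, leaving `population` as `None` gives the slightly larger, conservative answer.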
We already know that, for a 95% confidence level, the margin of error is 1.96 times the
standard error, and that the standard error is

$$SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

In general, the formula is

$$ME = z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where:
• ME is the desired margin of error
• z is the z-score, e.g. 1.645 for a 90% confidence interval, 1.96 for a 95% confidence interval,
2.58 for a 99% confidence interval (see Table 8.2, page 369)
• p̂ is our prior judgment of the correct value of p
• n is the sample size (to be found)
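Solving the margin-of-error formula ME = z√(p̂(1 − p̂)/n) for n gives the sample size directly. The worked numbers simply plug in the illustrative choices from Section 2.3 (a 2.5% margin of error, 95% confidence so z = 1.96, and a prior guess p̂ = 0.30); rounding up is the usual convention so that the achieved margin of error does not exceed the target.

```latex
ME = z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
\quad\Longrightarrow\quad
n = \frac{z^{2}\,\hat{p}(1-\hat{p})}{ME^{2}}

n = \frac{(1.96)^{2}(0.30)(0.70)}{(0.025)^{2}}
  = \frac{3.8416 \times 0.21}{0.000625}
  = \frac{0.806736}{0.000625}
  \approx 1290.78 \;\Rightarrow\; n = 1291
```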
3.2. SAMPLE SIZE TO ESTIMATE A POPULATION MEAN
The issues are similar if we are designing a survey or an experiment to estimate a population mean. In
this case, the formula is

$$ME = t\,\frac{s}{\sqrt{n}}$$

where:
• ME is the desired margin of error
• t is the t-score that we use to calculate the confidence interval, which depends on both the degrees of
freedom and the desired confidence level
• s is the standard deviation
• n is the sample size we want to find.
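Because t depends on the degrees of freedom, and therefore on the unknown n, a common shortcut is to solve with the z-score first: n = (z·s/ME)². The sketch below takes that large-sample approximation (an assumption; for small results one would recompute with t at n − 1 degrees of freedom). The function name and the pilot values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(s: float, margin_of_error: float,
                         confidence: float = 0.95) -> int:
    """Approximate n so that z * s / sqrt(n) <= margin_of_error.

    Uses the z-score in place of the t-score (large-sample approximation),
    which avoids needing the degrees of freedom before n is known.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * s / margin_of_error) ** 2)


# Hypothetical pilot SD of 10 units and a desired margin of error of 2 units.
print(sample_size_for_mean(s=10, margin_of_error=2))  # 97
```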
4. BASIC TERMS AND NOTATIONS
Target Population:
The population to be studied/ to which the investigator wants to generalize his results
Sampling Unit:
Smallest unit from which sample can be selected
Sampling frame
List of all the sampling units from which sample is drawn
Sampling scheme
Method of selecting sampling units from sampling frame
Population
A population can be defined as including all people or items with the characteristic one
wishes to understand.
Parameters
Numerical characteristic of a population
Statistics
Numerical characteristic of a sample
Data
The measurements that are collected by the investigator
Population notation
σ: The known standard deviation of the population.
σ²: The known variance of the population.
P: The true population proportion.
N: The number of observations in the population.
Sample notation
x̄: The sample estimate of the population mean.
s: The sample estimate of the standard deviation of the population.
s²: The sample estimate of the population variance.
p: The proportion of successes in the sample.
n: The number of observations in the sample.
SD: The standard deviation of the sampling distribution.
SE: The standard error. (This is an estimate of the standard deviation of the sampling
distribution.)
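The sample notation can be made concrete with Python’s standard `statistics` module. The five measurements are hypothetical; `statistics.variance` and `statistics.stdev` use the n − 1 divisor, which is what makes them sample estimates (s², s) rather than population parameters (σ², σ). The standard error is then s/√n.

```python
import math
import statistics

# Hypothetical sample of n = 5 measurements.
sample = [4, 8, 6, 5, 7]

n = len(sample)
x_bar = statistics.mean(sample)   # x̄: estimates the population mean
s2 = statistics.variance(sample)  # s²: sample variance, n - 1 divisor
s = statistics.stdev(sample)      # s: sample standard deviation
se = s / math.sqrt(n)             # SE: estimated SD of the sampling distribution of x̄

print(f"n={n}, mean={x_bar:.2f}, s^2={s2:.2f}, s={s:.4f}, SE={se:.4f}")
# n=5, mean=6.00, s^2=2.50, s=1.5811, SE=0.7071
```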
5. TYPES OF SAMPLING
There are two basic types of sampling, which further have many sub-methods for
sampling.
1. Probability Sampling
A probability sampling scheme is one in which every unit in the population has a
chance (greater than zero) of being selected in the sample, and this probability can
be accurately determined.
Methods include random sampling, systematic sampling, and stratified
sampling.
They are considered to be:
Objective
Empirical
Scientific
Quantitative
Representative
2. Non-Probability Sampling
Any sampling method where some elements of population have no chance of
selection (these are sometimes referred to as 'out of coverage'/'under-covered'), or
where the probability of selection can't be accurately determined. It involves the
selection of elements based on assumptions regarding the population of interest,
which forms the criteria for selection. Hence, because the selection of elements is
nonrandom, non-probability sampling does not allow the estimation of sampling
errors.
Methods include convenience sampling, judgment sampling, quota
sampling, and snowball sampling
They are considered to be:
Interpretive
Subjective
Not scientific
Qualitative
Unrepresentative
6. METHODS OF SAMPLING
6.1. PROBABILITY SAMPLING METHODS
Simple random sampling. Simple random sampling refers to any sampling method that
has the following properties.
The population consists of N objects.
The sample consists of n objects.
If all possible samples of n objects are equally likely to occur, the sampling
method is called simple random sampling.
There are many ways to obtain a simple random sample. One way would be the lottery
method. Each of the N population members is assigned a unique number. The numbers
are placed in a bowl and thoroughly mixed. Then, a blind-folded researcher selects n
numbers. Population members having the selected numbers are included in the sample.
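The lottery method can be sketched with `random.Random.sample`, which draws without replacement so that every subset of size n is equally likely. The function name and the numbered population are hypothetical; the seed only makes the draw repeatable.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely.

    Mirrors the lottery method: each population member is one ticket in the bowl.
    """
    rng = random.Random(seed)  # seeded generator makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical population of N = 100 members numbered 1..100.
members = list(range(1, 101))
print(simple_random_sample(members, 10, seed=42))
```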
Stratified sampling. With stratified sampling, the population is divided into groups,
based on some characteristic. Then, within each group, a probability sample (often a
simple random sample) is selected. In stratified sampling, the groups are called strata.
As an example, suppose we conduct a national survey. We might divide the population
into groups or strata, based on geography - north, east, south, and west. Then, within each
stratum, we might randomly select survey respondents.
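A minimal sketch of the regional example, assuming each respondent carries a region label and that the same number is drawn from every stratum (equal allocation; proportional allocation is a common alternative). All names and data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Simple random sample of n_per_stratum units from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # group units by stratum
    sample = []
    for name in sorted(strata):                # every stratum contributes
        members = strata[name]
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


# Hypothetical respondents, each tagged with one of four regions.
regions = ["north", "east", "south", "west"]
people = [(f"person{i}", regions[i % 4]) for i in range(200)]
chosen = stratified_sample(people, stratum_of=lambda p: p[1], n_per_stratum=5, seed=1)
print(len(chosen))  # 20
```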
Cluster sampling. With cluster sampling, every member of the population is assigned to
one, and only one, group. Each group is called a cluster. A sample of clusters is chosen,
using a probability method (often simple random sampling). Only individuals within
sampled clusters are surveyed.
Note the difference between cluster sampling and stratified sampling. With stratified
sampling, the sample includes elements from each stratum. With cluster sampling, in
contrast, the sample includes elements only from sampled clusters.
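In code, the difference is that cluster sampling randomizes over whole groups and then keeps every member of the chosen groups. This sketch assumes clusters are given as a dictionary of cluster name to member list; the schools and pupils are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and survey every member in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # probability sample of clusters
    return {name: clusters[name] for name in chosen}   # all members of chosen clusters


# Hypothetical schools (clusters), each with 30 pupils.
schools = {f"school{k}": [f"s{k}-{j}" for j in range(30)] for k in range(12)}
sampled = cluster_sample(schools, n_clusters=3, seed=7)
print(sorted(sampled), sum(len(v) for v in sampled.values()))  # 3 school names, then 90
```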
Multistage sampling. With multistage sampling, we select a sample by using
combinations of different sampling methods.
For example, in Stage 1, we might use cluster sampling to choose clusters from a
population. Then, in Stage 2, we might use simple random sampling to select a subset of
elements from each chosen cluster for the final sample.
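The two-stage example can be sketched as cluster sampling followed by simple random sampling inside each chosen cluster. The function name, the data layout (cluster name to member list) and the school data are hypothetical.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: random sample of clusters. Stage 2: SRS within each chosen cluster."""
    rng = random.Random(seed)
    stage1 = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in stage1
    }


# Hypothetical schools (clusters), each with 30 pupils.
schools = {f"school{k}": [f"s{k}-{j}" for j in range(30)] for k in range(12)}
result = two_stage_sample(schools, n_clusters=3, n_per_cluster=5, seed=7)
print(sum(len(v) for v in result.values()))  # 15
```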
Systematic random sampling. With systematic random sampling, we create a list of
every member of the population. From the list, we randomly select the first sample
element from the first k elements on the population list, where k is the sampling
interval (roughly N/n). Thereafter, we select every kth element on the list.
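A minimal sketch of this procedure, assuming the interval is k = ⌊N/n⌋ and that n ≤ N; the function name and the population list are hypothetical.

```python
import random


def systematic_sample(population_list, n, seed=None):
    """Random start among the first k units, then every kth unit (assumes n <= N)."""
    k = len(population_list) // n      # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)           # random index in 0..k-1
    return population_list[start::k][:n]


members = list(range(1, 101))          # hypothetical list of N = 100 members
print(systematic_sample(members, 10, seed=3))
```

Only the starting point is random; after that the list order fixes the sample, so a list with a periodic pattern that matches k can bias the result.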
6.2. NON-PROBABILITY SAMPLING METHODS
Voluntary sample. A voluntary sample is made up of people who self-select into
the survey. Often, these folks have a strong interest in the main topic of the
survey.
Suppose, for example, that a news show asks viewers to participate in an online
poll. This would be a voluntary sample. The sample is chosen by the viewers, not
by the survey administrator.
Convenience sample. A convenience sample is made up of people who are easy
to reach.
Consider the following example. A pollster interviews shoppers at a local mall. If
the mall was chosen because it was a convenient site from which to solicit survey
participants and/or because it was close to the pollster's home or business, this
would be a convenience sample.
Quota sampling
This method of sampling is often used by market researchers. Interviewers are
given a quota of subjects of a specified type to attempt to recruit. For example, an
interviewer might be told to go out and select 20 adult men and 20 adult women,
10 teenage girls and 10 teenage boys so that they could interview them about their
television viewing. There are several flaws with this method, but most
importantly, it is not truly random.[2]
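The quota example above can be sketched as accepting people in the order the interviewer meets them until each quota is full. The stream of passers-by and the category labels are hypothetical; because nothing is randomized, who ends up in the sample depends entirely on the order of encounter, which is the flaw noted above.

```python
def fill_quotas(stream, category_of, quotas):
    """Accept people in the order they are encountered until every quota is full.

    stream:      iterable of people in the order an interviewer meets them
    category_of: function mapping a person to a quota category
    quotas:      dict of category -> number required
    """
    remaining = dict(quotas)  # places still open in each quota
    accepted = []
    for person in stream:
        category = category_of(person)
        if remaining.get(category, 0) > 0:
            accepted.append(person)
            remaining[category] -= 1
            if not any(remaining.values()):  # all quotas filled: stop recruiting
                break
    return accepted


categories = ["adult man", "adult woman", "teen girl", "teen boy", "child"]
passers_by = [(f"p{i}", categories[i % 5]) for i in range(500)]
quotas = {"adult man": 20, "adult woman": 20, "teen girl": 10, "teen boy": 10}
recruited = fill_quotas(passers_by, category_of=lambda p: p[1], quotas=quotas)
print(len(recruited))  # 60
```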
Snowball sampling
This method is commonly used in social sciences when investigating hard-to-reach
groups. Existing subjects are asked to nominate further subjects known to
them, so the sample increases in size like a rolling snowball. For example, when
carrying out a survey of risk behaviours amongst intravenous drug users,
participants may be asked to nominate other users to be interviewed.
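Snowball recruitment behaves like a breadth-first walk over a referral network. This sketch assumes referrals are known as a dictionary of person to nominees and that recruitment stops at a target size; the network is hypothetical.

```python
from collections import deque


def snowball_sample(seeds, referrals, target_size):
    """Breadth-first recruitment: each participant nominates people they know."""
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)                  # interview this participant
        for nominee in referrals.get(person, []):
            if nominee not in seen:            # avoid recruiting anyone twice
                seen.add(nominee)
                queue.append(nominee)
    return sample


# Hypothetical referral network (who nominates whom).
network = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"]}
print(snowball_sample(["A"], network, target_size=5))  # ['A', 'B', 'C', 'D', 'E']
```

People with no path from the initial seeds can never be reached, which is one reason snowball samples are not representative.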
7. EVENT SAMPLING METHODOLOGY
ESM is a new sampling method that allows researchers to study ongoing experiences and
events, which vary across and within days, in their naturally occurring environment.
Because ESM involves frequent sampling of events, it enables researchers to measure the
typology of activity and detect the temporal and dynamic fluctuations of work
experiences. The popularity of ESM as a research design has increased in recent years
because it addresses a shortcoming of cross-sectional research: researchers can now
detect intra-individual variation across time, which they were previously unable to do.
In ESM, participants are asked to record their experiences and perceptions in a paper or
electronic diary.
8. RESPONSE RATE
Response rate (also known as completion rate or return rate) in survey research refers to
the number of people who answered the survey divided by the number of people in the
sample. It is usually expressed in the form of a percentage. The term is also used in direct
marketing to refer to the number of people who responded to an offer.
8.1. IMPORTANCE
A survey’s response rate is the result of dividing the number of people who were
interviewed by the total number of people in the sample who were eligible to participate
and should have been interviewed.[1] A low response rate can give rise to sampling bias if
the nonresponse is unequal among the participants regarding exposure and/or outcome.
Such bias is known as non-response bias.
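The response rate defined above is a simple ratio expressed as a percentage. This sketch uses the eligible-sample definition from this subsection; the function name and the survey counts are hypothetical.

```python
def response_rate(completed: int, eligible: int) -> float:
    """Response rate (%) = completed interviews / eligible sample members * 100."""
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    return 100 * completed / eligible


# Hypothetical survey: 1,200 eligible people sampled, 300 completed interviews.
print(response_rate(300, 1200))  # 25.0
```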
For many years, a survey's response rate was viewed as an important indicator of survey
quality. Many observers presumed that higher response rates assure more accurate survey
results (Aday 1996; Babbie 1990; Backstrom and Hursh 1963; Rea and Parker 1997). But
because measuring the relation between nonresponse and the accuracy of a survey
statistic is complex and expensive, few rigorously designed studies provided empirical
evidence to document the consequences of lower response rates until recently.
Such studies have finally been conducted in recent years, and several conclude that the
expense of increasing the response rate frequently is not justified given the difference in
survey accuracy.
9. STRATEGIES FOR SAMPLING
There are three broad approaches to selecting a sample for a qualitative study.
Convenience sample
This is the least rigorous technique, involving the selection of the most accessible
subjects. It is the least costly to the researcher, in terms of time, effort and money, but
may result in poor quality data and lacks intellectual credibility. There is an element of
convenience sampling in many qualitative studies, but a more thoughtful approach to
selection of a sample is usually justified.
Judgment sample
Also known as a purposeful sample, this is the most common sampling technique. The
researcher actively selects the most productive sample to answer the research question.
This can involve developing a framework of the variables that might influence an
individual's contribution and will be based on the researcher's practical knowledge of the
research area, the available literature and evidence from the study itself. This is a more
intellectual strategy than the simple demographic stratification of epidemiological
studies, though age, gender and social class might be important variables. If the subjects
are known to the researcher, they may be stratified according to known public attitudes or
beliefs. It may be advantageous to study a broad range of subjects (maximum variation
sample), outliers (deviant sample), subjects who have specific experiences (critical case
sample[6]) or subjects with special expertise (key informant sample). Subjects may be able
to recommend useful potential candidates for study (snowball sample). During
interpretation of the data it is important to consider subjects who support emerging
explanations and, perhaps more importantly, subjects who disagree (confirming and
disconfirming samples).
Theoretical sample
The iterative process of qualitative study design means that samples are usually theory
driven to a greater or lesser extent. Theoretical sampling necessitates building
interpretative theories from the emerging data and selecting a new sample to examine and
elaborate on this theory. It is the principal strategy for the grounded theoretical approach[3]
but will be used in some form in most qualitative investigations necessitating
interpretation.
10. ERRORS IN SAMPLING
In statistics, sampling error is the error caused by observing a sample instead of the whole
population. The sampling error is the difference between a sample statistic used to
estimate a population parameter and the actual but unknown value of the parameter
(Bunns & Grove, 2009).
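Sampling error can be made visible by simulation: draw many simple random samples from a known population and measure how far each sample mean lands from the true mean. The population of 10,000 numbered units, the sample sizes and the number of repetitions are hypothetical choices; the expected pattern is that the average error shrinks as n grows.

```python
import random

population = list(range(1, 10_001))        # hypothetical population of 10,000 units
mu = sum(population) / len(population)     # population parameter (5000.5)


def mean_abs_sampling_error(n, reps=200, seed=0):
    """Average |sample mean - population mean| over repeated simple random samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = rng.sample(population, n)
        total += abs(sum(sample) / n - mu)  # sampling error of this one sample
    return total / reps


for n in (10, 100, 1000):
    print(n, round(mean_abs_sampling_error(n), 1))
```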
There are five common errors of sampling, as follows:
Population Specification Error: This error occurs when the researcher does not
understand who she should survey. For example, imagine a survey about breakfast cereal
consumption. Who should she survey? It might be the entire family, the mother, or the
children. The mother probably makes the purchase decision, but the children influence
her choice.
Sample Frame Error: A frame error occurs when the wrong sub-population is used to
select a sample. A classic frame error occurred in the 1936 presidential election between
Roosevelt and Landon. The sample frame was from car registrations and telephone
directories. In 1936, many Americans did not own cars or telephones and those who did
were largely Republicans. The results wrongly predicted a Republican victory.
Selection Error: This occurs when respondents self-select their participation in the study,
so that only those who are interested respond. Selection error can be controlled by going to
extra lengths to get participation. A typical survey process includes initiating pre-survey
contact requesting cooperation, actual surveying, post survey follow-up if a response is
not received, a second survey request, and finally interviews using alternate modes such
as telephone or person to person.
Non-Response: Non-response errors occur when respondents are different from those
who do not respond. This may occur because either the potential respondent was not
contacted or they refused to respond. The extent of this non-response error can be
checked through follow-up surveys using alternate modes.
Sampling Errors: These errors occur because of variation in the number or
representativeness of the sample that responds. Sampling errors can be controlled by (1)
careful sample designs, (2) large samples, and (3) multiple contacts to assure
representative response.
10.1. BIAS IN SAMPLING
There are five important potential sources of bias that should be considered when selecting a
sample, by whatever method.
1. Any changes from the pre-agreed sampling rules can introduce bias
2. Bias is introduced if people in hard-to-reach groups are omitted
3. Replacing selected individuals with others, for example if they are difficult to contact, also
introduces bias
4. It is important to try to maximize the response rate to a survey; low response rates can
introduce bias
5. If an out-of-date list is used as the sample frame, it may also introduce bias, if it excludes
people who have recently moved to an area, for example.
10.2. STATISTICAL ERRORS
Type I error, also known as a “false positive”: the error of rejecting a null hypothesis
when it is actually true. In other words, this is the error of accepting an alternative
hypothesis (the real hypothesis of interest) when the results can be attributed to
chance. Plainly speaking, it occurs when we observe a difference when in truth
there is none (or, more specifically, no statistically significant difference). So the
probability of making a Type I error in a test with rejection region R is $P(R \mid H_0 \text{ is true})$.
Type II error, also known as a “false negative”: the error of not rejecting a null
hypothesis when the alternative hypothesis is the true state of nature. In other words,
this is the error of failing to accept an alternative hypothesis when you don't have
adequate power. Plainly speaking, it occurs when we fail to observe a difference
when in truth there is one. So the probability of making a Type II error in a test with
rejection region R is $1 - P(R \mid H_a \text{ is true})$. The power of the test is
$P(R \mid H_a \text{ is true})$.
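Both probabilities can be estimated by simulation as the share of samples whose test statistic falls in the rejection region R. This sketch assumes a two-sided one-sample z-test with known σ = 1, n = 25, H₀: mean = 0, α = 0.05, and a hypothetical alternative mean of 0.5; all of these settings are illustrative choices, not from the text.

```python
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% rejection region: |z| > 1.96


def rejection_rate(true_mean, n=25, sigma=1.0, reps=10_000, seed=0):
    """Share of simulated samples whose z statistic falls in the rejection region R.

    H0: mean = 0, tested with a z-test and known sigma (illustrative assumptions).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xs = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (sum(xs) / n - 0.0) / (sigma / n ** 0.5)  # z statistic under H0
        if abs(z) > Z_CRIT:
            rejections += 1
    return rejections / reps


alpha_hat = rejection_rate(true_mean=0.0)  # estimates P(R | H0 is true), near 0.05
power_hat = rejection_rate(true_mean=0.5)  # estimates P(R | Ha is true), roughly 0.70
print(alpha_hat, power_hat, 1 - power_hat)  # last value estimates the Type II error rate
```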
11. SOFTWARE USED FOR CALCULATING SAMPLE SIZE
The Survey System
Raosoft, Inc.
Vanderbilt
Power-analysis