Sampling and Sampling Distributions
Chap 3
Each slide has its own narration in an audio file.
For the explanation of any slide, click on the audio icon to start it.
Professor Friedman's Statistics Course by H & L Friedman is licensed under a
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
• Sampling is a technique that is used to
select a sample out of a population. It is a
process of gathering information from part
of the population.
Common terminologies of
sampling
• Population (universe) – the complete
collection of items or individuals
under study. A characteristic of a
population is known as a parameter.
• Sample – a subset of a
population, ideally a
representative group of the study
population. A characteristic of a
sample is known as a statistic.
• Census – it is a complete
enumeration or measurement of
every individual or item in the
population. It is gathering
information from all elements of a
population.
• Sampling error – it is the
difference between the population
parameter and the observed
probability sample statistic.
• Non-sampling error – it is an error
that occurs in the collection,
recording and computation of data.
• Sampling with replacement – a
sampling procedure in which sample
items are returned to the population;
as a result, there is a possibility of
their being chosen again in the
sample.
• Sampling without replacement –
a sampling procedure in which
sample items are not returned to the
population; as a result, none of these
can be selected in the sample again.
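The difference between the two procedures can be seen in a minimal Python sketch. The population of ten numbered items, the sample size of 5, and the fixed seed are assumptions made only for illustration.

```python
import random

population = list(range(1, 11))  # hypothetical population of 10 labeled items
rng = random.Random(42)          # fixed seed so the run is repeatable

# With replacement: each draw is made from the full population,
# so the same item may appear more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: a drawn item is not returned,
# so every item in the sample is distinct.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Repeats can show up only in the first list; the second list never contains the same item twice.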
• Why sampling (reasons for
sampling)
• Decision makers need information
about the population, i.e., census
information. However, taking a
census is often:
• Too expensive and time
consuming to provide information
when it is needed.
• Impossible to undertake, because of
the destructive nature of some tests
or because physically checking all items in
the population is impractical.
• It is not feasible to include the
whole population when determining
on the phases of something.
Sampling and non-
sampling errors
• Sampling error: in the sampling
approach, the findings of the survey
depend entirely on the
information generated from
those elements of the population
that are included in the investigation
(generalizations about
the population are based on
information gained from only part
of the population).
• It is, however, obvious that
information derived from a
sample leaves some room for a
wrong decision, inasmuch
as information about a part cannot be
a perfect substitute for information
covering the whole. This may eventually
lead to errors. Errors of this kind
(errors that result from
the sampling approach itself) are what we
call sampling errors.
• Non-sampling error: errors of this
kind can occur
irrespective of the approach used
(in both sampling and a
census). The term refers to the persistent
tendency of the findings to deviate
from the actual (true) information
due to bias and mistakes.
Sampling technique
• Sampling technique refers to the
procedures to be followed in
selecting the sample cases
(population elements to be included
in the sample) from among all
population elements. In sampling,
the resulting sample cases should
be able to represent the
population from which they are
drawn.
• Probability (random) sampling
technique: a sampling technique
that provides every element of the
population with a known non-zero
chance of being included in the
sample.
– systematic random sample.
•Choose the first element randomly, then
every kth observation, where k = N/n
– stratified random sample.
•The population is sub-divided based on a
characteristic and a simple random
sample is conducted within each stratum
– cluster sample
•First take a random sample of clusters
from the population of clusters. Then take a
simple random sample within each selected cluster.
Examples: election districts, orchards.
• There are four kinds of probability
sampling techniques.
• Simple random sampling: a
sampling technique in which
every element of
the population has an equal chance of
being included in the sample and
every possible sample of size n has the
same chance of being chosen.
• Systematic random sampling
technique: a sampling method in
which the sample consists of every ith element
of the population, counted from a
randomly chosen starting element.
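A minimal Python sketch of systematic selection follows. The function name systematic_sample, the frame of 100 numbered units, and the seed are hypothetical, and the sketch assumes n is no larger than N so that the interval k = N/n is at least 1.

```python
import random

def systematic_sample(population, n, rng=random):
    """Pick a random start in the first k items, then take every k-th item."""
    N = len(population)
    k = N // n                  # sampling interval, k = N/n (integer part)
    start = rng.randrange(k)    # random position within the first interval
    return [population[start + i * k] for i in range(n)]

population = list(range(1, 101))    # hypothetical frame of N = 100 units
sample = systematic_sample(population, n=10, rng=random.Random(1))
print(sample)
```

Only the starting position is random; once it is fixed, the rest of the sample is determined by the interval k.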
• Stratified random sampling
technique: a random sampling
technique in which the population
is divided into two or more
non-overlapping groups (strata) and
sample cases are then drawn
from each stratum by simple
random or systematic
random sampling.
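Stratified selection can be sketched in Python as below. The helper stratified_sample, the student data, and the choice of 5 cases per stratum are assumptions for illustration; the sketch uses simple random sampling inside each stratum.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, rng=random):
    """Draw a simple random sample of fixed size from every stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # each unit falls in exactly one stratum
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 60 students tagged by year of study.
students = [(i, "first-year" if i < 40 else "second-year") for i in range(60)]
picked = stratified_sample(students, stratum_of=lambda s: s[1],
                           n_per_stratum=5, rng=random.Random(7))
for stratum, members in picked.items():
    print(stratum, [i for i, _ in members])
```

Every stratum contributes cases to the sample, which is what distinguishes this design from cluster sampling.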
• Cluster sampling: a random
sampling technique in which the
population is divided into two
or more non-overlapping groups
(clusters), some of these
clusters are selected at random, and
sample elements are then selected
from the selected clusters by
simple random or another random
sampling technique.
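The two stages of cluster sampling can be sketched in Python. The helper cluster_sample, the six election districts of 50 voters each, and the sample sizes are hypothetical values chosen for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, rng=random):
    """Stage 1: randomly pick clusters. Stage 2: SRS inside each picked cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical population: 6 election districts with 50 voters each.
districts = {f"district-{d}": [f"voter-{d}-{v}" for v in range(50)]
             for d in range(1, 7)}
picked = cluster_sample(districts, n_clusters=2, n_per_cluster=5,
                        rng=random.Random(3))
for district, voters in picked.items():
    print(district, voters)
```

Only the selected clusters are visited, so most clusters contribute no cases at all.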
• Nonprobability Samples – based on
convenience or judgment
– Convenience (or chunk) sample -
students in a class, chosen for ease of access
– Judgment sample - based on the
researcher’s judgment as to what
constitutes “representativeness”
– Quota sample - interviewers are given
quotas based on, for instance,
demographics
– Snowball sample – members of the sample
recruit further sample members
Sampling Distribution
of X̅
• The sample mean, X̅, is a random variable.
• There are a lot of different values of X̅
• Every sample we collect has a different X̅
• There is only one population mean, µ.
Sampling Distribution
of X̅
• If each of you in the class collected your
own data you would each get a different X̅
• Approximately 95% of your X̅’s would be close to µ,
within ±2 standard errors (the s.d. of X̅)
• This is because X̅ is a normally distributed
random variable – for large samples
• X̅ follows a normal distribution centered
about µ
• This is known as the Central Limit
Theorem
Central Limit Theorem
• Consider that you take samples from a
given population
• Even if a population distribution is
non-normal, the sampling distribution of X̅
may be considered to be approximately
normal for large samples.
– What’s large? At least 30; some say 50.
Central Limit Theorem
(cont’d)
This “hypothetical” sampling distribution
of the mean, as n gets large, has the
following properties:
•E(X̅) = μ. It has a mean equal to the
population mean.
•It has a standard deviation (called the
standard error of the mean, σ_X̅) equal
to the population standard deviation
divided by √n: σ_X̅ = σ/√n.
•It is normally distributed.
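These properties can be checked by simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 and s.d. 1 (a strongly skewed choice), and the sample size of 50, the 5,000 repetitions, and the seed are arbitrary.

```python
import random
import statistics

rng = random.Random(0)
n = 50               # size of each sample
num_samples = 5000   # number of repeated samples (an arbitrary choice)

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

standard_error = sigma / n ** 0.5
within_2se = sum(abs(m - mu) <= 2 * standard_error
                 for m in sample_means) / num_samples

print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("s.d. of sample means:", round(statistics.stdev(sample_means), 3))
print("sigma / sqrt(n):     ", round(standard_error, 3))
print("share within 2 s.e.: ", round(within_2se, 3))
```

The mean of the sample means should land near µ, their s.d. near σ/√n, and roughly 95% of them within two standard errors of µ, even though the population itself is far from normal.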
Central Limit Theorem
(cont’d)
This means that, for large samples,
the sampling distribution of the
mean (X̅) can be approximated by a
normal distribution with mean μ and
standard deviation σ/√n.
Example
• Suppose that a population
consists of 5 elements and one
wishes to take a random sample of
2. There are 5C2 = 10 possible samples
that might be selected.
• Consider the population (N=5): 1,
2, 3, 4, 5
…Very Small
Example…
• Since we know the entire population, we
can compute the population parameters:
• μ = 3.0
• σ = √( Σ(Xi − μ)² / N ) = √(10/5) = √2 = 1.41
• [Note: N, not n-1. This is the formula for
computing the population standard
deviation, σ.]
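The N versus n-1 distinction can be checked with Python's standard statistics module, where pstdev divides by N and stdev divides by n-1. This is a small sketch using the five population values from the example.

```python
import statistics

population = [1, 2, 3, 4, 5]

mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # divides by N (population formula)
s = statistics.stdev(population)        # divides by n - 1 (sample formula)

print(mu)                # 3.0
print(round(sigma, 2))   # 1.41
print(round(s, 2))       # 1.58
```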
Example…
• All possible samples of size n=2:
Sample    Mean (X̅)
(1,2)     1.5
(1,3)     2.0
(1,4)     2.5
(1,5)     3.0
(2,3)     2.5
(2,4)     3.0
(2,5)     3.5
(3,4)     3.5
(3,5)     4.0
(4,5)     4.5
Total     30.0
• The average of all the possible sample
means is E(X̅)=30.0 / 10 = 3.0. So,
E(X̅)=μ. This property is called
unbiasedness.
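The enumeration can be reproduced with a short Python sketch. It lists every sample of size 2 drawn without replacement from the population 1, 2, 3, 4, 5 and averages the sample means.

```python
from itertools import combinations
from statistics import fmean

population = [1, 2, 3, 4, 5]

# Every possible sample of size 2, drawn without replacement (5C2 = 10 samples).
samples = list(combinations(population, 2))
sample_means = [fmean(s) for s in samples]

print(len(samples))          # 10
print(sum(sample_means))     # 30.0
print(fmean(sample_means))   # 3.0, equal to the population mean
```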
The Sampling
Distribution of X̅
• By the Central Limit Theorem, X̅
follows a normal distribution (for
large n): X̅ ~ N(μ, σ²/n), i.e., mean μ
and standard deviation σ/√n
The Sampling Distribution of
X̅
• Since this is a normal distribution, we
can standardize it (transform to Z)
just like any other normal distribution.
Z = (X̅ − E(X̅)) / σ_X̅ = (X̅ − μ) / (σ/√n)
• If n is large, say 30 or more, use s as
an estimate of σ.
Example- Steel Chains
• Suppose you have steel chains with an
average breaking strength of μ=200
lbs. with a σ=10 lbs., and you take a
sample of n=100 chains.
• What is the probability that the sample
mean breaking strength will be 195
lbs. or less? This is the same as
asking: What proportion of the sample
means will be X̅ =195 lbs. or less?
• >= 190, >=220
Example- Steel Chains
• Solution (Draw a picture!)
•Z = (X̅ − μ) / (σ/√n) = (195 − 200) / (10/√100) = −5
• Ans: The probability is close to zero.
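The same calculation can be sketched in Python with the standard library's NormalDist. It assumes, as the slide does, that the sample mean is approximately normal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 200, 10, 100        # values from the steel-chain example
standard_error = sigma / sqrt(n)   # 10 / 10 = 1 lb.

z = (195 - mu) / standard_error
prob = NormalDist().cdf(z)         # P(Z <= z) under the standard normal

print(z)      # -5.0
print(prob)   # about 2.9e-07
```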
Example- Hybrid
Motors
• In a large automobile manufacturing
company, the life of hybrid motors is
normally distributed with a mean of
100,000 miles and a standard deviation
of 10,000 miles.
• (a) What is the probability that a
randomly selected hybrid motor has a
life between 90,000 miles and 110,000
miles per year?
(b) If a random sample of 100 motors is
selected, what is the probability that
the sample mean will be below 98,000
miles per year?
Example- Hybrid
Motors
• SOLUTION (Of course, DRAW A PICTURE!)
• (a) Convert to Z using the formula:
Z = (Xi − μ) / σ
Z = (90,000 − 100,000) / 10,000 = −1
Z = (110,000 − 100,000) / 10,000 = +1
• Thus, we have to find how much area lies between −1
and +1 of the Z-distribution
Answer = .6826
Example- Hybrid Motors
• (b) SOLUTION (Of course, DRAW A PICTURE!)
• Here we are looking at the sampling distribution of
the mean. Sample means follow a different distribution
and to convert to Z, we use the following formula.
Z = (X̅ − μ) / (σ/√n) = (98,000 − 100,000) / (10,000/√100) = −2
• Ans: The probability the sample mean will be below
98,000/year is .5 − .4772 = .0228.
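Parts (a) and (b) differ only in which distribution is used, as the following Python sketch shows. It relies on the standard library's NormalDist; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100_000, 10_000, 100

# (a) One motor: use the population distribution itself.
single = NormalDist(mu, sigma)
p_a = single.cdf(110_000) - single.cdf(90_000)

# (b) Mean of 100 motors: same center, but the spread shrinks to sigma / sqrt(n).
mean_of_100 = NormalDist(mu, sigma / sqrt(n))
p_b = mean_of_100.cdf(98_000)

print(round(p_a, 4))   # 0.6827 (a four-place Z table gives .6826)
print(round(p_b, 4))   # 0.0228
```

A shortfall of 2,000 miles is unremarkable for a single motor (0.2 standard deviations) but is two standard errors for the mean of 100 motors.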
Other Sampling
Distributions
• We have been looking at the
relationship between X̅ and µ.
• Of course, statisticians will often be
interested in estimating other
parameters such as the population
proportion (P), the population
standard deviation (σ), the population
median, etc.
• In each case we use a statistic from a
sample to estimate the parameter.
Each of these statistics has its own
sampling distribution.
Implications
• The relationship between X̅ and µ is the
foundation of statistical inference
• Statistical inference includes estimation (of
µ) and testing hypotheses (about µ)
• Since there are so many X̅’s – as many as
there are possible samples - we use the X̅
value we happened to get as a tool to make
inferences about the only true mean, µ.
• Without actually conducting a census we
can never know µ with 100% certainty