Hypothesis and Sampling
The document discusses fundamental concepts in probability and statistics, including the addition theorem, conditional probability, and the distinction between independent and dependent events. It also covers sampling methods, types of sampling, and the importance of sample surveys in making inferences about populations. Additionally, it explains hypothesis testing, including null and alternative hypotheses, test statistics, and types of errors in statistical testing.
The addition theorem is also known as the theorem of total probability.
If A and B are mutually exclusive events, then $P(A \cap B) = 0$ and, therefore,
the addition rule simplifies to $P(A \cup B) = P(A) + P(B)$.
Conditional Probability
Given two events A and B, each with a positive probability of occurring, the
probability that A occurs given that B has occurred (A conditioned on B) is
equal to $P(A|B) = \frac{P(A \cap B)}{P(B)}$.
Similarly, the probability that B occurs given that A has occurred (B conditioned
on A) is equal to $P(B|A) = \frac{P(A \cap B)}{P(A)}$.
Independent vs Dependent Events
Two events, A and B, are independent if $P(A|B) = P(A)$ or, equivalently,
$P(B|A) = P(B)$. Otherwise, the events are dependent.
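The definitions of conditional probability and independence can be checked by direct counting on a small sample space. The sketch below assumes two fair dice as a hypothetical example (the dice, the event names and the helper functions are illustrative and do not come from the text); it uses exact fractions so the comparisons are not affected by rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair dice (36 equally likely points).
OMEGA = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event), where the event is a predicate on a single outcome."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))

def cond_prob(a, b):
    """P(A|B) = P(A and B) / P(B), defined only when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(lambda w: a(w) and b(w)) / p_b

def independent(a, b):
    """A and B are independent when P(A|B) = P(A)."""
    return cond_prob(a, b) == prob(a)

def first_is_six(w):
    return w[0] == 6

def sum_is_seven(w):
    return w[0] + w[1] == 7

def sum_is_twelve(w):
    return w[0] + w[1] == 12

print(cond_prob(sum_is_seven, first_is_six))    # P(sum = 7 | first die = 6)
print(independent(sum_is_seven, first_is_six))  # independence check
print(independent(sum_is_twelve, first_is_six))
```

In this example a sum of seven is independent of the first die showing six, while a sum of twelve is not, because knowing the first die is six changes the chance of reaching twelve.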
Multiplication Theorem of Probability
The probability of simultaneous occurrence of any two events A and B is given by
$P(A \cap B) = P(A)\,P(B|A)$, if $P(A) \ne 0$
or, $P(A \cap B) = P(B)\,P(A|B)$, if $P(B) \ne 0$.
If A and B are independent events, then the probability that A and B both
occur equals the product of the probability of A and the probability of B; that
is, $P(A \cap B) = P(A)\,P(B)$.
1 Sampling
Introduction
A sample survey is a method of drawing an inference about the characteristics
of a population or universe by observing a part of the population. For example,
when one has to make an inference about a large lot and it is not practicable to
examine each individual member of the lot, one always takes the help of sample
surveys; that is to say, one examines only a few members of the lot and, on the
basis of this sample information, one makes decisions about the whole lot. Thus,
a person wanting to purchase a basket of oranges may examine a few oranges
from the basket and on that basis make a decision about the whole basket.
Types of Sampling
Sampling is first broadly classified as Subjective and Objective.
Any type of sampling which depends upon the personal judgment or discretion
of the sampler himself is called Subjective. But the sampling method which is
fixed by a sampling rule or is independent of the sampler's own judgment is
Objective sampling. Objective sampling is again classified into subgroups:
i. non-probabilistic, ii. probabilistic and mixed sampling.
In non-probabilistic objective sampling, there is a fixed sampling rule but there
is no probability attached to the mode of selection, e.g. selecting every 5th
individual from a list. If, however, the selection of the first individual is made in
such a manner that each of the first 10 gets an equal chance of being selected, it
becomes a case of mixed sampling. If for each individual there is a definite
pre-assigned probability of being selected, the sampling is said to be probabilistic.
Population: The collection of all units of a specified type in a given region at
a particular point or period of time is termed a population or universe. For
example, we may have a population of persons, families, farms, houses or
automobiles in a region, or a population of trees or birds in a forest, etc.
A population is said to be a finite population or an infinite population according
as the number of units in it is finite or infinite.
Sampling Unit: The sampling units must cover the entire population and must be
distinct and non-overlapping, in the sense that every element of the population
belongs to one and only one sampling unit. [...] In a crop survey, a sampling unit
may be defined to be a farm or a group of farms owned or operated [...].
Sampling Frame: To draw a sample, it is essential to have a frame of all the
sampling units belonging to the population to be studied, with their proper
identification particulars; such a frame is called the sampling frame. This may
be a list of units with their identification particulars.
As the sampling frame forms the basic material from which a sample is drawn,
it should be ensured that the frame contains all the sampling units of the
population under consideration but excludes units of any other population.
Sample: A sample is a subset of a population selected to obtain information
concerning the characteristics of the population. In other words, one or more
sampling units selected from a population according to some specified procedure
are said to constitute a sample.
Random sample: A random or probability sample is a sample drawn in such
a manner that each unit in the population has a predetermined probability of
selection.
Sample space: The collection of all possible samples (sequences or sets) is called
the sample space.
Sampling design: The combination of the sample space and the associated
probability measure is called a sampling design.
Parameter: Statistical constants of the population, which are usually unknown.
Statistic: In practice parameter values are not known and their estimates based
on the sample values are generally used. Thus a statistic, which may be regarded
as an estimate of the parameter obtained from the sample, is a function of
sample values only.
Estimator: An estimator is a statistic obtained by a specified procedure for
estimating a population parameter. The estimator is a random variable, as its
value differs from sample to sample and the samples are selected with specified
probabilities. The particular value which the estimator takes for a given
sample is known as an estimate.
Sampling distribution: From each of the possible samples drawn from a finite
population of size N, one can compute a statistic, which will obviously vary from
sample to sample. The aggregate of the various values of the statistic under
consideration so obtained (one from each sample) may be grouped into a
frequency distribution, which is known as the sampling distribution of the statistic.
Standard Error: The standard deviation of the sampling distribution of a
statistic is known as its standard error.
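For a very small population the sampling distribution can be written out in full, which makes the definition of the standard error concrete. The sketch below is an illustration only: the five-unit population, the choice of the sample mean as the statistic, and the function names are assumptions, and samples are taken without replacement.

```python
from itertools import combinations
from statistics import mean, pstdev

def sampling_distribution_of_mean(population, n):
    """Sample mean of every possible size-n sample drawn without replacement."""
    return [mean(s) for s in combinations(population, n)]

def standard_error(population, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return pstdev(sampling_distribution_of_mean(population, n))

population = [2, 4, 6, 8, 10]   # hypothetical finite population, N = 5
means = sampling_distribution_of_mean(population, 2)

print(sorted(means))                  # one value of the statistic per sample
print(mean(means))                    # centre of the sampling distribution
print(standard_error(population, 2))  # its standard deviation
```

Here the ten possible samples of size 2 give ten sample means; their average coincides with the population mean, and their standard deviation is the standard error of the sample mean for this design.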
Sampling and Complete Enumeration
The total count of all units of the population for a certain characteristic is
known as complete enumeration, also termed census survey. The money, manpower
and time required for carrying out complete enumeration will generally
be large, and there are many situations with limited means where complete
enumeration will not be possible and where recourse to selection of a few units will be
helpful. When only a part, called a sample, is selected from the population and
examined, it is called sample enumeration or sample survey.
A sample survey will usually be less expensive than a census survey, and the
desired information will be obtained in less time. This does not imply that economy is
the only consideration in conducting a sample survey. It is equally important that
the degree of accuracy of results is also maintained. Occasionally, the technique
of sample survey is applied to verify the results obtained from census
surveys. The main advantages or merits of sample survey over census survey
may be outlined as follows:
* Reduced cost of survey,
* Greater speed of getting results,
* Greater accuracy of results,
* Greater scope, and
* Adaptability.
Sample survey has its own limitations, and the advantages of sampling over
complete enumeration can be derived only if
* the units are drawn in a scientific manner,
* an appropriate sampling technique is used, and
* the size of units selected in the sample is adequate.
Basic principles of sample surveys
The two basic principles for sample surveys are
1. Validity
2. Optimization
By validity, we mean that the sample should be so selected that the results
could be interpreted objectively in terms of probability. The principle will be
satisfied by selecting a probability sample, which ensures that there is some
definite, pre-assigned probability for each individual of the population.
Efficiency is measured by the inverse of the sampling variance of the estimator.
Cost is measured by the expenditure incurred in terms of money or man-hours.
The principle of optimization ensures that a given level of efficiency will be
reached with minimum cost or that the maximum possible efficiency will be
attained with a given level of cost.
Sampling and non-sampling errors
The error which arises due to only a sample (a part of the population) being used
to estimate the population parameters and draw inferences about the population
is termed sampling error or sampling fluctuation. Whatever may be the
degree of cautiousness in selecting a sample, there will always be a difference
between the parameter and its corresponding estimate. This error is inherent
and unavoidable in any and every sampling scheme. A sample with the smallest
sampling error will always be considered a good representative of the population.
This error can be reduced by increasing the size of the sample (number of
units selected in the sample). In fact, the sampling error decreases in inverse
proportion to the square root of the sample size. When the sample survey
becomes a census survey, the sampling error becomes zero.
The non-sampling errors primarily arise at the following stages:
* Failure to measure some of the units in the selected sample
* Observational errors due to defective measurement technique
* Errors introduced in editing, coding and tabulating the results.
Non-sampling errors are present in both the complete enumeration survey and
the sample survey. In practice, the census survey results may suffer from
non-sampling errors although these may be free from sampling error. The
non-sampling error is likely to increase with increase in sample size, while sampling
error decreases with increase in sample size.
Simple Random Sampling
A procedure for selecting a sample of size n out of a finite population of size
N in which each of the possible distinct samples has an equal chance of being
selected is called random sampling or simple random sampling. We may have
two distinct types of simple random sampling as follows:
* Simple random sampling with replacement (srswr).
* Simple random sampling without replacement (srswor).
In sampling with replacement a unit is selected from the population consisting
of N units, its content noted and then returned to the population before the
next draw is made, and the process is repeated n times to give a sample of
size n. In this method, at each draw, each of the N units of the population gets
the same probability $1/N$ of being selected. Here the same unit of the population
may occur more than once in the sample.
In simple random sampling without replacement a unit is selected, its content
noted, and the unit is not returned to the population before the next draw is made.
The process is repeated n times to give a sample of n units. In this method,
at the r-th draw, each of the $N - r + 1$ units of the population gets the same
probability $1/(N - r + 1)$ of being included in the sample. Here no unit of the
population can occur more than once in the sample.
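The two draw-by-draw procedures can be sketched in a few lines. The code below is illustrative rather than prescriptive: the function names, the seed and the ten-unit population are assumptions, and the standard library's `random.sample` would give an equivalent without-replacement sample in a single call.

```python
import random

def srswr(population, n, rng):
    """With replacement: n independent draws, each unit has probability 1/N."""
    return [rng.choice(population) for _ in range(n)]

def srswor(population, n, rng):
    """Without replacement: a selected unit is removed before the next draw."""
    pool = list(population)
    sample = []
    for _ in range(n):
        # At the r-th draw the pool holds N - r + 1 units, each equally likely.
        sample.append(pool.pop(rng.randrange(len(pool))))
    return sample

rng = random.Random(0)        # fixed seed only so the sketch is reproducible
units = list(range(1, 11))    # hypothetical population of N = 10 labelled units

with_repl = srswr(units, 5, rng)
without_repl = srswor(units, 5, rng)
print(with_repl)      # may contain repeated units
print(without_repl)   # never contains a repeated unit
```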
Stratified Random Sampling
If the population is very heterogeneous and considerations of cost limit the size
of the sample, it may be found impossible to get a sufficiently precise estimate
by taking a simple random sample from the entire population. For this, one
possible way to estimate the population mean or total with greater precision is
to divide the population into several groups (sub-populations or classes; these
sub-populations are non-overlapping), each of which is more homogeneous than the
entire population, and draw a random sample of predetermined size from each
one of the groups. The groups into which the population is divided are called
strata, or each group is called a stratum, and the whole procedure of dividing the
population into the strata and then drawing a random sample from each one of
the strata is called stratified random sampling.
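A minimal sketch of the procedure follows: one simple random sample per stratum, then a population-mean estimate that weights each stratum's sample mean by the stratum's share of the population. The weighted estimator is the usual textbook choice but is not spelled out in the text above, and the strata, the sample sizes and the function names are hypothetical.

```python
import random
from statistics import mean

def stratified_sample(strata, sizes, rng):
    """Draw a simple random sample (without replacement) from every stratum."""
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

def stratified_mean(strata, sample):
    """Weight each stratum's sample mean by its share of the population."""
    total = sum(len(units) for units in strata.values())
    return sum(len(strata[h]) / total * mean(sample[h]) for h in strata)

# Hypothetical population: farm sizes in two internally homogeneous strata.
strata = {"small": [1, 2, 2, 3, 2, 1], "large": [40, 42, 41, 39]}
rng = random.Random(1)   # fixed seed only for reproducibility

sample = stratified_sample(strata, {"small": 3, "large": 2}, rng)
estimate = stratified_mean(strata, sample)
print(sample)
print(estimate)
```

Because each stratum is internally homogeneous, the estimate stays close to the population mean whichever units happen to be drawn, which is the gain in precision that motivates stratification.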
Principal Reasons for Stratification:
* To gain in precision, divide a heterogeneous population into strata in such
a way that each stratum is internally homogeneous.
* To accommodate administrative convenience (cost consideration), field
work is organized by strata, which usually results in saving in cost and
effort.
* To obtain separate estimates for strata.
* We can accommodate different sampling plans in different strata.
* We can have data of known precision for certain subdivisions, treating each
subdivision as a population in its own right.
2 Testing of Hypothesis
2.1 Statistical Hypothesis
A statistical hypothesis is a statement about the nature of a population. It is
often stated in terms of a population parameter.
To test a statistical hypothesis, we must decide whether that hypothesis appears
to be consistent with the data of the sample.
2.1.1 Null Hypothesis
A tobacco firm claims that it has discovered a new way of curing tobacco leaves
that will result in a mean nicotine content of a cigarette of 1.5 milligrams or less.
A researcher is skeptical of this claim and indeed believes that the mean will
exceed 1.5 milligrams. To disprove the claim of the tobacco firm, the researcher
has decided to test its hypothesis that the mean is less than or equal to 1.5
milligrams.
This starting hypothesis to be tested is called the null hypothesis and is denoted
by $H_0$. Symbolically,
$H_0: \mu \le 1.5$,
where $\mu$ denotes the mean nicotine content per cigarette.
2.1.2 Alternative Hypothesis
The alternative hypothesis, which we denote $H_1$ (or, sometimes, $H_a$), contains
the values of the parameter that we consider plausible if we reject the null
hypothesis.
Our null hypothesis is that $\mu \le 1.5$. What's the alternative?
The researcher believes that the mean nicotine content exceeds 1.5 milligrams, so
his/her alternative hypothesis can be written symbolically as
$H_1: \mu > 1.5$
2.1.3 Test Statistic
A test statistic is a statistic whose value is determined from the sample data.
Depending on the value of this test statistic, the null hypothesis will be rejected
or not. To test $H_0: \mu \le 1.5$, a random sample of cigarettes cured by the new
method should be chosen and their nicotine content measured.
The decision of whether to reject the null hypothesis is based on the value of a
test statistic.
2.1.4 Critical Region
The critical region, also called the rejection region, is that set of values of the
test statistic for which the null hypothesis is rejected.
In the cigarette example being considered, the test statistic might be the
average nicotine content of the sample of cigarettes.
The statistical test would then reject the null hypothesis when this test statistic
was sufficiently larger than 1.5.
2.1.5 Statistical Test
The statistical test of the null hypothesis $H_0$ is completely specified once the
test statistic and the critical region are specified.
If TS denotes the test statistic and C denotes the critical region, then the
statistical test of the null hypothesis $H_0$ is as follows:
Reject $H_0$ if TS is in C,
Do not reject $H_0$ if TS is not in C.
For instance, in the nicotine example we have been considering, if
$\sigma = 0.8$ and $n = 36$,
then one possible test of the null hypothesis is
Reject $H_0$ if [...],
Do not reject $H_0$ otherwise.
[Figure: two panels. "$H_0$ is consistent with the data": $P(\bar{X} > 1.7) > 0.05$. "$H_0$ is not consistent with the data": $P(\bar{X} > 1.8) < 0.05$.]
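The two tail probabilities quoted for sample means of 1.7 and 1.8 can be checked numerically. The sketch below assumes that the garbled "=08" in the example stands for a known standard deviation of 0.8 with n = 36, and that the sample mean is normally distributed with mean 1.5 at the boundary of the null hypothesis; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def right_tail_p_value(sample_mean, mu0, sigma, n):
    """P(X-bar >= observed value) when the population mean equals mu0."""
    z = sqrt(n) * (sample_mean - mu0) / sigma
    return 1.0 - NormalDist().cdf(z)

# Assumed inputs: mu0 = 1.5 (boundary of H0), sigma = 0.8, n = 36.
p_17 = right_tail_p_value(1.7, mu0=1.5, sigma=0.8, n=36)
p_18 = right_tail_p_value(1.8, mu0=1.5, sigma=0.8, n=36)
print(round(p_17, 4))   # above 0.05: consistent with H0
print(round(p_18, 4))   # below 0.05: not consistent with H0
```

Under these assumptions a sample mean of 1.7 lies 1.5 standard errors above 1.5, and a sample mean of 1.8 lies 2.25 standard errors above it, which is why only the second falls below the 0.05 threshold.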
Interpretation
The rejection of the null hypothesis $H_0$ is a strong statement that $H_0$ does not
appear to be consistent with the observed data.
The result that $H_0$ is not rejected is a weak statement that should be interpreted
to mean that $H_0$ is consistent with the data.
2.1.6 Two Types of Errors
Thus, in any procedure for testing a given null hypothesis, two different types
of errors can result:
Type I error: if the test rejects $H_0$ when $H_0$ is true. (False positive)
Type II error: if the test does not reject $H_0$ when $H_0$ is false. (False negative)
2.1.7 Objective of a Statistical Test
Now, it must be understood that the objective of a statistical test of the null
hypothesis $H_0$ is not to determine whether $H_0$ is true, but rather to determine
whether its truth is consistent with the resultant data.
Therefore, given this objective, it is reasonable that $H_0$ should be rejected only
if the sample data are very unlikely when $H_0$ is true.
2.1.8 Level of Significance
This is accomplished by specifying a small value $\alpha$ and then
requiring that the test have the property that, whenever $H_0$ is true, the probability
that it will be rejected is less than or equal to $\alpha$.
The value $\alpha$, called the level of significance of the test, is usually set in advance,
with commonly chosen values being $\alpha = 0.10$, $0.05$, and $0.01$.
For instance, suppose that $H_0$ is a hypothesis about whether a new method of
production is superior to the one presently in use. Since a rejection of $H_0$ would result
in a change of method, we would want the probability of rejecting $H_0$ when $H_0$ is true
to be small; that is, we would want a small value of $\alpha$.
Summary
$H_0$: $\theta$ lies in $R$
$H_1$: $\theta$ does not lie in $R$.
* Determine the probability distribution of the point estimator when $H_0$ is
true.
* Specify the critical region so that the probability that the estimator will
fall in that region when $H_0$ is true is less than or equal to $\alpha$.
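The steps above can be sketched for the right-tailed nicotine example. The sketch assumes a normally distributed sample mean with known standard deviation 0.8 and n = 36 (the reading of the garbled example values), a significance level of 0.05, and a hypothetical true mean of 1.9 for the Type II error calculation; none of the function names come from the text.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution

def critical_value(mu0, sigma, n, alpha):
    """Smallest c such that P(X-bar >= c) <= alpha when the mean is mu0."""
    return mu0 + Z.inv_cdf(1 - alpha) * sigma / sqrt(n)

def prob_reject(true_mean, c, sigma, n):
    """P(X-bar >= c) when the population mean is true_mean."""
    return 1 - Z.cdf(sqrt(n) * (c - true_mean) / sigma)

c = critical_value(mu0=1.5, sigma=0.8, n=36, alpha=0.05)
type_i = prob_reject(1.5, c, 0.8, 36)        # at the boundary of H0
type_ii = 1 - prob_reject(1.9, c, 0.8, 36)   # hypothetical true mean of 1.9
print(round(c, 4), round(type_i, 4), round(type_ii, 4))
```

For any mean inside the null region (at most 1.5) the rejection probability is no larger than at the boundary, so fixing it at the boundary keeps the Type I error at or below the chosen level throughout the region.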
2.2 Tests Concerning the Mean of a Normal Population
2.2.1 Case of Known Variance: Z-test
One-Sample Z-test
* We have a random sample from a normal population;
* interested in testing hypotheses concerning the population mean;
* population variance is known;
* sample size does not matter.
$Z = \sqrt{n}\,(\bar{X} - \mu_0)/\sigma$
Two-Sided Z-test
$H_0: \mu = \mu_0$ vs $H_1: \mu \ne \mu_0$
[Figure: axis divided into three regions: reject $H_0$, do not reject $H_0$, reject $H_0$.]
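A small sketch of the two-sided Z-test follows. It assumes the usual rule of rejecting when the absolute value of Z is at least $z_{\alpha/2}$, the upper $\alpha/2$ point of the standard normal distribution; the sample values, the hypothesized mean and the known standard deviation are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_sided_z_test(sample, mu0, sigma, alpha=0.05):
    """Return (Z, reject) for H0: mu = mu0 against H1: mu != mu0, sigma known."""
    n = len(sample)
    z = sqrt(n) * (mean(sample) - mu0) / sigma
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return z, abs(z) >= z_crit

sample = [10.2, 9.8, 10.6, 10.4]   # hypothetical measurements
z, reject = two_sided_z_test(sample, mu0=10.0, sigma=0.5)
print(round(z, 3), reject)
```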
One-Sided Z-test (Right-Tailed)
$H_0: \mu \le \mu_0$ vs $H_1: \mu > \mu_0$
2.2.2 Case of Unknown Variance: t-test
One-Sample t-test
* We have a random sample from a normal population;
* interested in testing hypotheses concerning the population mean;
* population variance is unknown;
* sample size is small.
$T = \sqrt{n}\,(\bar{X} - \mu_0)/S$
Two-Sided t-test
$H_0: \mu = \mu_0$ vs $H_1: \mu \ne \mu_0$
[Figure: axis divided into three regions: reject $H_0$, do not reject $H_0$, reject $H_0$.]
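The t statistic can be computed with the standard library alone, with the critical value read from a t table. In the sketch below the data are hypothetical, the function names are illustrative, and the critical value 2.262 is the tabulated upper 2.5% point of the t distribution with 9 degrees of freedom, supplied by hand because Python's standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """T = sqrt(n) (X-bar - mu0) / S, with S the sample standard deviation."""
    n = len(sample)
    return sqrt(n) * (mean(sample) - mu0) / stdev(sample)

def two_sided_t_test(sample, mu0, t_crit):
    """Reject H0: mu = mu0 when |T| >= t_crit (table value for n - 1 df)."""
    t = t_statistic(sample, mu0)
    return t, abs(t) >= t_crit

# Hypothetical nicotine measurements (mg), n = 10.
sample = [1.6, 1.4, 1.7, 1.5, 1.8, 1.6, 1.5, 1.7, 1.6, 1.6]
t, reject = two_sided_t_test(sample, mu0=1.5, t_crit=2.262)
print(round(t, 3), reject)
```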
One-Sided t-test (Right-Tailed)
$H_0: \mu \le \mu_0$ vs $H_1: \mu > \mu_0$
[...]
The appropriate significance-level-$\alpha$ test is as follows:
Reject $H_0$ if $|TS| \ge z_{\alpha/2}$,
Do not reject $H_0$ otherwise.
2.3.2 Case of Unknown Variances: Large Sample Test
Z-test
* We have independent samples from two normal populations;
* interested in testing hypotheses concerning the respective population means;
* population variances are unknown;
* sample sizes are large (at least 20).
When $\mu_x = \mu_y$, the test statistic TS, given by
$TS = \frac{\bar{X} - \bar{Y}}{\sqrt{S_x^2/n + S_y^2/m}}$,
where $n$ and $m$ are the two sample sizes and $S_x^2$, $S_y^2$ the two sample variances,
will have an approximately standard normal distribution.
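A sketch of this large-sample statistic follows, using the same rejection rule as the known-variance case, namely rejecting when the absolute value of TS is at least $z_{\alpha/2}$. The two samples are artificial and chosen only so that the result is easy to verify by hand; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def two_sample_z(x, y, alpha=0.05):
    """TS = (X-bar - Y-bar) / sqrt(Sx^2/n + Sy^2/m); reject if |TS| >= z_{alpha/2}."""
    n, m = len(x), len(y)
    ts = (mean(x) - mean(y)) / sqrt(variance(x) / n + variance(y) / m)
    return ts, abs(ts) >= NormalDist().inv_cdf(1 - alpha / 2)

# Artificial samples of size 40 each: means 1 and 2, equal sample variances.
x = [0, 2] * 20
y = [1, 3] * 20
ts, reject = two_sample_z(x, y)
print(round(ts, 3), reject)
```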
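2.3.3 Case of Unknown Variances: Small Sample Test
* We have independent samples from two normal populations;
* interested in testing hypotheses concerning the respective population means;
* population variances are unknown but approximately equal;
* sample sizes are small.
The estimator $S_p^2$ defined by
$S_p^2 = \frac{(n-1)S_x^2 + (m-1)S_y^2}{n + m - 2}$
is called the pooled estimator of the common unknown variance.
To test $H_0: \mu_x = \mu_y$ against $H_1: \mu_x \ne \mu_y$ we use the test statistic
$TS = \frac{\bar{X} - \bar{Y}}{\sqrt{S_p^2\,(1/n + 1/m)}}$
When $H_0$ is true, TS has a t distribution with $n + m - 2$ degrees of
freedom. The significance-level-$\alpha$ test is then to
Reject $H_0$ if $|TS| \ge t_{n+m-2,\,\alpha/2}$,
Not reject $H_0$ otherwise.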
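The pooled-variance t statistic can be sketched as below. The two small samples are hypothetical, the function names are illustrative, and the critical value for the resulting degrees of freedom has to be looked up in a t table (for 6 degrees of freedom at the 5% level, two-sided, the tabulated value is 2.447).

```python
from math import sqrt
from statistics import mean, variance

def pooled_variance(x, y):
    """Pooled estimator of the common variance of two samples."""
    n, m = len(x), len(y)
    return ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)

def pooled_t(x, y):
    """Return (TS, degrees of freedom) for H0: mu_x = mu_y."""
    n, m = len(x), len(y)
    ts = (mean(x) - mean(y)) / sqrt(pooled_variance(x, y) * (1 / n + 1 / m))
    return ts, n + m - 2

x = [1, 2, 3, 4, 5]   # hypothetical sample, n = 5
y = [2, 4, 6]         # hypothetical sample, m = 3
ts, df = pooled_t(x, y)
reject = abs(ts) >= 2.447   # table value t_{6, 0.025}
print(round(ts, 3), df, reject)
```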
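2.3.4 Case of Paired Samples
We wish to test $H_0: \mu_x = \mu_y$ against $H_1: \mu_x \ne \mu_y$, where the two
samples consist of paired data values $(X_i, Y_i)$, $i = 1, \ldots, n$. We can test this null
hypothesis by looking at the differences between the data values in a pairing,
$D_i = X_i - Y_i$.
The hypothesis that $\mu_x = \mu_y$ is therefore equivalent to the
hypothesis that $\mu_d = 0$. Thus we can test the hypothesis that the population
means are equal by testing $H_0: \mu_d = 0$. Assuming
that the random variables $D_1, \ldots, D_n$ constitute a sample from a
normal population, we can test this null hypothesis by using the t-test. Under
$H_0$, the test statistic, TS, is given by
$TS = \sqrt{n}\,\bar{D}/S_d$
The significance-level-$\alpha$ test will be to
Reject $H_0$ if $|TS| \ge t_{n-1,\,\alpha/2}$,
Not reject $H_0$ otherwise.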
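The paired test reduces to a one-sample t-test on the differences, which the following sketch makes explicit. The paired measurements are hypothetical, the function name is illustrative, and 3.182 is the tabulated upper 2.5% point of the t distribution with 3 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Return (TS, degrees of freedom) with TS = sqrt(n) * D-bar / S_d."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]   # differences D_i = X_i - Y_i
    n = len(d)
    return sqrt(n) * mean(d) / stdev(d), n - 1

before = [12, 15, 11, 14]   # hypothetical first measurement on each unit
after = [10, 12, 10, 11]    # hypothetical second measurement on the same unit
ts, df = paired_t(before, after)
reject = abs(ts) >= 3.182   # table value t_{3, 0.025}
print(round(ts, 3), df, reject)
```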
Summary for Testing Population Mean(s)
Population | Sample type     | Sample size    | Variance(s)    | Test
Normal     | single          | small or large | known          | Z
Normal     | single          | small          | unknown        | t
Normal     | single          | large          | unknown        | Z
Normal     | two independent | small or large | known          | Z
Normal     | two independent | large          | unknown        | Z
Normal     | two independent | small          | unknown, equal | t
Normal     | two related     | small          | unknown        | paired t