Hypothesis and Sampling

The document discusses fundamental concepts in probability and statistics, including the addition theorem, conditional probability, and the distinction between independent and dependent events. It also covers sampling methods, types of sampling, and the importance of sample surveys in making inferences about populations. Additionally, it explains hypothesis testing, including null and alternative hypotheses, test statistics, and types of errors in statistical testing.
The addition theorem is also known as the theorem of total probability. If A and B are mutually exclusive events, then $P(A \cap B) = 0$ and, therefore, the addition rule $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ simplifies to

$$P(A \cup B) = P(A) + P(B).$$

Conditional Probability

Given two events A and B, each with a positive probability of occurring, the probability that A occurs given that B has occurred (A conditioned on B) is equal to

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

Similarly, the probability that B occurs given that A has occurred (B conditioned on A) is equal to

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$

Independent vs Dependent Events

Two events, A and B, are independent if $P(A \mid B) = P(A)$ or, equivalently, $P(B \mid A) = P(B)$. Otherwise, the events are dependent.

Multiplication Theorem of Probability

The probability of simultaneous occurrence of any two events A and B is given by

$$P(A \cap B) = P(A)\,P(B \mid A), \quad \text{if } P(A) \neq 0,$$

or

$$P(A \cap B) = P(B)\,P(A \mid B), \quad \text{if } P(B) \neq 0.$$

If A and B are independent events, then the probability that A and B both occur equals the product of the probability of A and the probability of B; that is,

$$P(A \cap B) = P(A)\,P(B).$$

1 Sampling

Introduction

A sample survey is a method of drawing an inference about the characteristics of a population or universe by observing a part of the population. For example, when one has to make an inference about a large lot and it is not practicable to examine each individual member of the lot, one takes the help of sample surveys; that is to say, one examines only a few members of the lot and, on the basis of this sample information, makes a decision about the whole lot. Thus, a person wanting to purchase a basket of oranges may examine a few oranges from the basket and on that basis make a decision about the whole basket.

Types of Sampling

Sampling is first broadly classified as Subjective and Objective. Any type of sampling which depends upon the personal judgment or discretion of the sampler himself is called Subjective. A sampling method which is fixed by a sampling rule, or is independent of the sampler's own judgment, is Objective sampling. Objective sampling is further classified as (i) non-probabilistic, (ii) probabilistic and (iii) mixed sampling.

In non-probabilistic objective sampling, there is a fixed sampling rule but there is no probability attached to the mode of selection, e.g. selecting every 5th individual from a list. If, however, the selection of the first individual is made in such a manner that each of the first 10 gets an equal chance of being selected, it becomes a case of mixed sampling. If for each individual there is a definite pre-assigned probability of being selected, the sampling is said to be probabilistic.

Population: The collection of all units of a specified type in a given region at a particular point or period of time is termed a population or universe. Thus, we may consider a population of persons, families, farms, houses or automobiles in a region, or a population of trees or birds in a forest, etc. A population is said to be a finite population or an infinite population according as the number of units in it is finite or infinite.

Sampling Unit: The sampling units must cover the entire population, and they must be distinct and non-overlapping in the sense that every element of the population belongs to one and only one sampling unit. In a crop survey, for example, the sampling unit may be taken to be a farm or a group of farms owned or operated by a household.

Sampling Frame: It is essential to have a frame of all the sampling units belonging to the population to be studied, with their proper identification particulars; such a frame is called the sampling frame. This may be a list of units with their identification particulars. As the sampling frame forms the basic material from which a sample is drawn, it should be ensured that the frame contains all the sampling units of the population under consideration but excludes units of any other population.

Sample: A sample is a subset of a population selected to obtain information concerning the characteristics of the population. In other words, one or more sampling units selected from a population according to some specified procedure are said to constitute a sample.
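The fixed selection rule and its randomized-start variant described under Types of Sampling can be sketched against a sampling frame as follows. This is only a minimal illustration: the frame of 50 numbered units, the interval `k` and the function name `systematic_sample` are assumptions made for the example, and the random start is drawn from the first `k` positions of the frame.

```python
import random

def systematic_sample(frame, k, start=None, rng=None):
    """Select every k-th unit from the sampling frame.

    With a fixed start this is a purely rule-based (non-probabilistic)
    selection; with start=None the starting position is drawn at random
    from the first k units, which corresponds to mixed sampling.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    if start is None:
        rng = rng or random.Random()
        start = rng.randrange(k)  # each of the first k units is equally likely
    if not 0 <= start < k:
        raise ValueError("start must lie within the first k positions")
    return frame[start::k]

# Hypothetical sampling frame: identification numbers 1..50.
frame = list(range(1, 51))

fixed = systematic_sample(frame, k=5, start=4)               # units 5, 10, ..., 50
mixed = systematic_sample(frame, k=5, rng=random.Random(0))  # random start
```

Because every selected unit comes from the frame, the quality of the sample depends directly on the frame being complete and free of units from other populations.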
Random sample: A random or probability sample is a sample drawn in such a manner that each unit in the population has a predetermined probability of selection.

Sample space: The collection of all possible samples, sequences or sets is called the sample space.

Sampling design: The combination of the sample space and the associated probability measure is called a sampling design.

Parameter: Statistical constants of the population, which are usually unknown.

Statistic: In practice parameter values are not known and their estimates based on the sample values are generally used. Thus a statistic, which may be regarded as an estimate of the parameter obtained from the sample, is a function of sample values only.

Estimator: An estimator is a statistic obtained by a specified procedure for estimating a population parameter. The estimator is a random variable, as its value differs from sample to sample and the samples are selected with specified probabilities. The particular value which the estimator takes for a given sample is known as an estimate.

Sampling distribution: For each sample of size n drawn from a finite population of size N, one can compute a statistic, which will obviously vary from sample to sample. The aggregate of the various values of the statistic under consideration so obtained (one from each sample) may be grouped into a frequency distribution, which is known as the sampling distribution of the statistic.

Standard Error: The standard deviation of the sampling distribution of a statistic is known as its standard error.

Sampling and Complete Enumeration

The total count of all units of the population for a certain characteristic is known as complete enumeration, also termed census survey.
The money, manpower and time required for carrying out complete enumeration will generally be large, and there are many situations with limited means where complete enumeration will not be possible and where recourse to selection of a few units will be helpful. When only a part, called a sample, is selected from the population and examined, it is called sample enumeration or sample survey.

A sample survey will usually be less expensive than a census survey, and the desired information will be obtained in less time. This does not imply that economy is the only consideration in conducting a sample survey. It is equally important that the degree of accuracy of results is also maintained. Occasionally, the technique of sample survey is applied to verify the results obtained from census surveys.

The main advantages or merits of sample survey over census survey may be outlined as follows:
* Reduced cost of survey
* Greater speed of getting results
* Greater accuracy of results
* Greater scope, and
* Adaptability

Sample survey has its own limitations, and the advantages of sampling over complete enumeration can be derived only if
* the units are drawn in a scientific manner,
* an appropriate sampling technique is used, and
* the size of units selected in the sample is adequate.

Basic principles of sample surveys

The two basic principles for sample surveys are
1. Validity
2. Optimization

By validity, we mean that the sample should be so selected that the results could be interpreted objectively in terms of probability. The principle will be satisfied by selecting a probability sample, which ensures that there is some definite, pre-assigned probability for each individual of the population.

Efficiency is measured by the inverse of the sampling variance of the estimator. Cost is measured by the expenditure incurred in terms of money or man-hours. The principle of optimization ensures that a given level of efficiency will be reached with minimum cost, or that the maximum possible efficiency will be attained with a given level of cost.
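The notions of sampling distribution, standard error and efficiency used above can be illustrated by enumerating every possible sample from a very small population. The sketch below is only an illustration under stated assumptions: the five population values, the sample size of 2 and the function name are hypothetical, and samples are taken without replacement with every distinct sample equally likely.

```python
from itertools import combinations
from statistics import mean, pstdev

def sampling_distribution_of_mean(population, n):
    """Return the sample mean of every distinct sample of size n
    (drawn without replacement, all samples equally likely)."""
    return [mean(sample) for sample in combinations(population, n)]

population = [2, 4, 6, 8, 10]  # hypothetical finite population, N = 5
means = sampling_distribution_of_mean(population, n=2)

expected_value = mean(means)          # centre of the sampling distribution
standard_error = pstdev(means)        # its standard deviation
efficiency = 1 / standard_error ** 2  # inverse of the sampling variance
```

For this population the ten possible sample means average to 6, which is the population mean, and their standard deviation, the standard error, is the square root of 3 (about 1.73).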
Sampling and non-sampling errors

The error which arises because only a sample (a part of the population) is used to estimate the population parameters and draw inferences about the population is termed sampling error or sampling fluctuation. Whatever may be the degree of cautiousness in selecting a sample, there will always be a difference between the parameter and its corresponding estimate. This error is inherent and unavoidable in any and every sampling scheme. A sample with the smallest sampling error will always be considered a good representative of the population. This error can be reduced by increasing the size of the sample (number of units selected in the sample). In fact, the sampling error is inversely proportional to the square root of the sample size. When the sample survey becomes a census survey, the sampling error becomes zero.

The non-sampling errors primarily arise at the following stages:
* Failure to measure some of the units in the selected sample
* Observational errors due to defective measurement technique
* Errors introduced in editing, coding and tabulating the results.

Non-sampling errors are present in both the complete enumeration survey and the sample survey. In practice, the census survey results may suffer from non-sampling errors although these may be free from sampling error. The non-sampling error is likely to increase with increase in sample size, while sampling error decreases with increase in sample size.

Simple Random Sampling

A procedure for selecting a sample of size n out of a finite population of size N in which each of the possible distinct samples has an equal chance of being selected is called random sampling or simple random sampling. We may have two distinct types of simple random sampling as follows:
* Simple random sampling with replacement (srswr).
* Simple random sampling without replacement (srswor).

In sampling with replacement a unit is selected from the population consisting of N units, its content noted and then returned to the population before the next draw is made, and the process is repeated n times to give a sample of size n. In this method, at each draw, each of the N units of the population gets the same probability $1/N$ of being selected. Here the same unit of the population may occur more than once in the sample.

In simple random sampling without replacement a unit is selected, its content noted and the unit is not returned to the population before the next draw is made. The process is repeated n times to give a sample of n units. In this method, at the $r$-th draw, each of the $N - r + 1$ units of the population gets the same probability $1/(N - r + 1)$ of being included in the sample. Here any unit of the population cannot occur more than once in the sample.

Stratified Random Sampling

If the population is very heterogeneous and considerations of cost limit the size of the sample, it may be found impossible to get a sufficiently precise estimate by taking a simple random sample from the entire population. For this, one possible way to estimate the population mean or total with greater precision is to divide the population into several groups (sub-populations or classes; these sub-populations are non-overlapping), each of which is more homogeneous than the entire population, and to draw a random sample of predetermined size from each one of the groups. The groups into which the population is divided are called strata, each group is called a stratum, and the whole procedure of dividing the population into the strata and then drawing a random sample from each one of the strata is called stratified random sampling.

Principal Reasons for Stratification:
* To gain in precision, divide a heterogeneous population into strata in such a way that each stratum is internally homogeneous.
* To accommodate administrative convenience (cost consideration), field work is organized by strata, which usually results in saving in cost and effort.
* To obtain separate estimates for strata.
* We can accommodate different sampling plans in different strata.
* We can have data of known precision for certain subdivisions, treating each subdivision as a population in its own right.

2 Testing of Hypothesis

2.1 Statistical Hypothesis

A statistical hypothesis is a statement about the nature of a population. It is often stated in terms of a population parameter. To test a statistical hypothesis, we must decide whether that hypothesis appears to be consistent with the data of the sample.

2.1.1 Null Hypothesis

A tobacco firm claims that it has discovered a new way of curing tobacco leaves that will result in a mean nicotine content of a cigarette of 1.5 milligrams or less. A researcher is skeptical of this claim and indeed believes that the mean will exceed 1.5 milligrams. To disprove the claim of the tobacco firm, the researcher has decided to test its hypothesis that the mean is less than or equal to 1.5 milligrams. This starting hypothesis to be tested is called the null hypothesis and is denoted by $H_0$. Symbolically,

$$H_0: \mu \le 1.5,$$

where $\mu$ denotes the mean nicotine content per cigarette.

2.1.2 Alternative Hypothesis

The alternative hypothesis, which we denote $H_1$ (or, sometimes, $H_a$), contains the values of the parameter that we consider plausible if we reject the null hypothesis. Our null hypothesis is that $\mu \le 1.5$. What is the alternative? The researcher believes that the mean nicotine content exceeds 1.5 milligrams, so his/her alternative hypothesis can be written symbolically as

$$H_1: \mu > 1.5.$$

2.1.3 Test Statistic

A test statistic is a statistic whose value is determined from the sample data. Depending on the value of this test statistic, the null hypothesis will be rejected or not. To test $H_0: \mu \le 1.5$, a random sample of cigarettes cured by the new method should be chosen and their nicotine content measured. The decision of whether to reject the null hypothesis is based on the value of a test statistic.

2.1.4 Critical Region

The critical region, also called the rejection region, is that set of values of the test statistic for which the null hypothesis is rejected. In the cigarette example being considered, the test statistic might be the average nicotine content of the sample of cigarettes. The statistical test would then reject the null hypothesis when this test statistic was sufficiently larger than 1.5.

2.1.5 Statistical Test

The statistical test of the null hypothesis $H_0$ is completely specified once the test statistic and the critical region are specified. If TS denotes the test statistic and C denotes the critical region, then the statistical test of the null hypothesis $H_0$ is as follows:

Reject $H_0$ if TS is in C.
Do not reject $H_0$ if TS is not in C.

For instance, in the nicotine example we have been considering, if $\sigma = 0.8$ and $n = 36$, then one possible test of the null hypothesis is to reject $H_0$ when the sample mean $\bar{X}$ is at or above a chosen cutoff, and not to reject $H_0$ otherwise.

[Figure: $H_0$ is consistent with the data, $P(\bar{X} > 1.7) > 0.05$.]
[Figure: $H_0$ is not consistent with the data, $P(\bar{X} > 1.8) < 0.05$.]

Interpretation

The rejection of the null hypothesis $H_0$ is a strong statement that $H_0$ does not appear to be consistent with the observed data. The result that $H_0$ is not rejected is a weak statement that should be interpreted to mean that $H_0$ is consistent with the data.

2.1.6 Two Types of Errors

Thus, in any procedure for testing a given null hypothesis, two different types of errors can result:

Type I error: the test rejects $H_0$ when $H_0$ is true (false positive).
Type II error: the test does not reject $H_0$ when $H_0$ is false (false negative).

2.1.7 Objective of a Statistical Test

Now, it must be understood that the objective of a statistical test of the null hypothesis $H_0$ is not to determine whether $H_0$ is true, but rather to determine whether its truth is consistent with the resultant data. Therefore, given this objective, it is reasonable that $H_0$ should be rejected only if the sample data are very unlikely when $H_0$ is true.

2.1.8 Level of Significance

The classical way of accomplishing this is to specify a small value $\alpha$ and then require that the test have the property that, whenever $H_0$ is true, its probability of being rejected is less than or equal to $\alpha$. The value $\alpha$, called the level of significance of the test, is usually set in advance, with commonly chosen values being 0.10, 0.05, and 0.01.

For instance, suppose that $H_0$ is a hypothesis about whether a new method of production is superior to the one presently in use, so that a rejection of $H_0$ would result in a change of method. Since the probability of rejecting $H_0$ when $H_0$ is true is at most $\alpha$, the choice of $\alpha$ reflects how small we would want that probability to be.

Summary

$H_0: \theta$ lies in $R$ versus $H_1: \theta$ does not lie in $R$.
* Determine the probability distribution of the point estimator when $H_0$ is true.
* Specify the critical region so that the probability that the estimator will fall in that region when $H_0$ is true is less than or equal to $\alpha$.

2.2 Tests Concerning the Mean of a Normal Population

2.2.1 Case of Known Variance: Z-test

One Sample Z-test
* We have a random sample from a normal population;
* interested in testing hypotheses concerning the population mean;
* population variance is known;
* sample size does not matter.

$$Z = \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{\sigma}$$

Two-Sided Z-test

$$H_0: \mu = \mu_0 \quad \text{vs} \quad H_1: \mu \neq \mu_0$$

[Figure: rejection regions of the two-sided Z-test: reject $H_0$ in either tail, do not reject $H_0$ in between.]

One-Sided Z-test (Right-Tailed)

$$H_0: \mu \le \mu_0 \quad \text{vs} \quad H_1: \mu > \mu_0$$

2.2.2 Case of Unknown Variance: t-test

One Sample t-test
* We have a random sample from a normal population;
* interested in testing hypotheses concerning the population mean;
* population variance is unknown;
* sample size is small.

$$T = \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{S}$$

Two-Sided t-test

$$H_0: \mu = \mu_0 \quad \text{vs} \quad H_1: \mu \neq \mu_0$$

[Figure: rejection regions of the two-sided t-test: reject $H_0$ in either tail, do not reject $H_0$ in between.]

One-Sided t-test (Right-Tailed)

$$H_0: \mu \le \mu_0 \quad \text{vs} \quad H_1: \mu > \mu_0$$

The appropriate significance-level-$\alpha$ test is as follows:
Reject $H_0$ if $|TS| \ge z_{\alpha/2}$.
Do not reject $H_0$ otherwise.

2.3.2 Case of Unknown Variances: Large Sample Test

Z-test
* We have independent samples from two normal populations;
* interested in testing hypotheses concerning the respective population means;
* population variances are unknown;
* sample sizes are large (at least 20).

When $\mu_x = \mu_y$, the test statistic TS, given by

$$TS = \frac{\bar{X} - \bar{Y}}{\sqrt{S_x^2/n + S_y^2/m}},$$

will have an approximately standard normal distribution. Here $n$ and $m$ are the two sample sizes and $S_x^2$, $S_y^2$ are the sample variances.

2.3.3 Case of Unknown Variances: Small Sample Test

* We have independent samples from two normal populations;
* interested in testing hypotheses concerning the respective population means;
* population variances are unknown but approximately equal;
* sample sizes are small.

The estimator $S_p^2$ defined by

$$S_p^2 = \frac{(n-1)\,S_x^2 + (m-1)\,S_y^2}{n + m - 2}$$

is called the pooled estimator of the common unknown variance. To test $H_0: \mu_x = \mu_y$ against $H_1: \mu_x \neq \mu_y$ we use the test statistic

$$TS = \frac{\bar{X} - \bar{Y}}{\sqrt{S_p^2\,(1/n + 1/m)}}.$$

When $H_0$ is true, TS has a t-distribution with $n + m - 2$ degrees of freedom. The significance-level-$\alpha$ test is then to

Reject $H_0$ if $|TS| \ge t_{n+m-2,\,\alpha/2}$.
Not reject $H_0$ otherwise.

2.3.4 Case of Paired-sample

Consider testing $H_0: \mu_x = \mu_y$ against $H_1: \mu_x \neq \mu_y$ where the two samples are paired. We can test this null hypothesis by looking at the differences between the data values in a pairing, $D_i = X_i - Y_i$. The hypothesis that $\mu_x = \mu_y$ is therefore equivalent to the hypothesis that $\mu_d = 0$. Thus we can test the hypothesis that the population means are equal by testing $H_0: \mu_d = 0$ against $H_1: \mu_d \neq 0$. Assuming that the random variables $D_1, \dots, D_n$ constitute a sample from a normal population, we can test this null hypothesis by using the t-test. Under $H_0$, the test statistic, TS, is given by

$$TS = \frac{\sqrt{n}\,\bar{D}}{S_d}.$$

The significance-level-$\alpha$ test will be to

Reject $H_0$ if $|TS| \ge t_{n-1,\,\alpha/2}$.
Not reject $H_0$ otherwise.

Summary for Testing Population Mean(s)

| Population | Sample Type | Sample Size | Variance(s) | Test |
|---|---|---|---|---|
| Normal | single | small or large | known | Z |
| Normal | single | small | unknown | t |
| Normal | single | large | unknown | Z |
| Normal | two independent | small or large | known | Z |
| Normal | two independent | large | unknown | Z |
| Normal | two independent | small | unknown, equal | t |
| Normal | two related | small | unknown | paired-t |
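The test statistics summarized above can be sketched in Python as follows. This is a minimal illustration using only the standard library, not a definitive implementation: the function names are hypothetical, and the right-tailed check uses the nicotine example's values ($\sigma = 0.8$, $n = 36$, $\mu_0 = 1.5$) with an assumed level $\alpha = 0.05$. The standard library provides the normal distribution through `statistics.NormalDist` but has no t distribution, so t critical values would have to be taken from a table.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def one_sample_z(xbar, mu0, sigma, n):
    """Z = sqrt(n) * (xbar - mu0) / sigma, for known population variance."""
    return sqrt(n) * (xbar - mu0) / sigma

def right_tail_p(z):
    """P(Z > z) under the standard normal distribution."""
    return 1 - NormalDist().cdf(z)

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance; returns (TS, df)."""
    n, m = len(x), len(y)
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / n + 1 / m)), n + m - 2

def paired_t(x, y):
    """Paired t statistic on differences D_i = X_i - Y_i; returns (TS, df)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sqrt(n) * mean(d) / sqrt(variance(d)), n - 1

# Nicotine example: H0: mu <= 1.5, right-tailed test at an assumed alpha.
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value
for xbar in (1.7, 1.8):
    z = one_sample_z(xbar, mu0=1.5, sigma=0.8, n=36)
    print(xbar, round(z, 2), round(right_tail_p(z), 4), z >= z_crit)
```

With these values a sample mean of 1.7 gives a tail probability of about 0.067, so $H_0$ is not rejected at the 0.05 level, while a sample mean of 1.8 gives about 0.012, so $H_0$ is rejected. This agrees with the two probabilities quoted in the nicotine example.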
