Advanced Sampling Theory with Applications
How Michael 'selected' Amy
Volume I
by
Sarjinder Singh
St. Cloud State University,
Department of Statistics,
St. Cloud, MN, U.S.A.
Springer-Science+Business Media, B.V.
A C.I.P. Catalogue record for this book is available from the Library of Congress.
PREFACE
1 BASIC CONCEPTS AND MATHEMATICAL NOTATION
1.0 Introduction
1.1 Population
1.1.1 Finite population
1.1.2 Infinite population
1.1.3 Target population
1.1.4 Study population
1.2 Sample
1.3 Examples of populations and samples
1.4 Census
1.5 Relative aspects of sampling versus census
1.6 Study variable
1.7 Auxiliary variable
1.8 Difference between study variable and auxiliary variable
1.9 Parameter
1.10 Statistic
1.11 Statistics
1.12 Sample selection
1.12.1 Chit method or Lottery method
1.12.1.1 With replacement sampling
1.12.1.2 Without replacement sampling
1.12.2 Random number table method
1.12.2.1 Remainder method
1.13 Probability sampling
1.14 Probability of selecting a sample
1.15 Population mean/total
1.16 Population moments
1.17 Population standard deviation
1.18 Population coefficient of variation
1.19 Relative mean square error
1.20 Sample mean
1.21 Sample variance
1.22 Estimator
1.23 Estimate
1.24 Sample space
1.25 Univariate random variable
1.25.1 Qualitative random variables
2.0 Introduction
2.1 Simple random sampling with replacement
2.2 Simple random sampling without replacement
2.3 Estimation of population proportion
2.4 Searls' estimator of population mean
2.5 Use of distinct units in the WR sample at the estimation stage
2.5.1 Estimation of mean
2.5.2 Estimation of finite population variance
2.6 Estimation of total or mean of a subgroup (domain) of a population
2.7 Dealing with a rare attribute using inverse sampling
2.8 Controlled sampling
2.9 Determinant sampling
Exercises
Practical problems
5 USE OF AUXILIARY INFORMATION: PROBABILITY PROPORTIONAL TO SIZE AND WITHOUT REPLACEMENT (PPSWOR) SAMPLING
5.11 Ordered and unordered estimators
5.11.1 Ordered estimators
5.11.2 Unordered estimators
5.12 Rao-Hartley-Cochran (RHC) sampling strategy
5.13 Unbiased strategies using IPPS sampling schemes
5.13.1 Estimation of population mean using a ratio estimator
5.13.2 Estimation of finite population variance
5.14 Godambe's strategy: Estimation of parameters in survey sampling
5.14.1 Optimal estimating function
5.14.2 Regression type estimators
5.14.3 Singh's strategy in two-dimensional space
5.14.4 Godambe's strategy for linear Bayes and optimal estimation
5.15 Unified theory of survey sampling
5.15.1 Class of admissible estimators
5.15.2 Estimator
5.15.3 Admissible estimator
5.15.4 Strictly admissible estimator
5.15.5 Linear estimators of population total
5.15.6 Admissible estimators of variances of estimators of total
5.15.6.1 Condition for the unbiased estimator of variance
5.15.6.2 Admissible and unbiased estimator of variance
5.15.6.3 Fixed size sampling design
5.15.6.4 Horvitz and Thompson estimator and its variance in two forms
5.15.7 Polynomial type estimators
5.15.8 Alternative optimality criterion
5.15.9 Sufficient statistic in survey sampling
5.16 Estimators based on conditional inclusion probabilities
5.17 Current topics in survey sampling
5.17.1 Survey design
5.17.2 Data collection and processing
5.17.3 Estimation and analysis of data
5.18 Miscellaneous discussions/topics
5.18.1 Generalized IPPS designs
5.18.2 Tam's optimal strategies
5.18.3 Use of ranks in sample selection
5.18.4 Prediction approach
5.18.5 Total of bottom (or top) percentiles of a finite population
5.18.6 General form of estimator of variance
5.18.7 Poisson sampling
5.18.8 Cosmetic calibration
5.18.9 Mixing of non-parametric models in survey sampling
5.19 Golden Jubilee Year 2003 of the linear regression estimator
Exercises
Practical Problems
13 MISCELLANEOUS TOPICS
APPENDIX
TABLES
POPULATIONS
BIBLIOGRAPHY
AUTHOR INDEX
HANDY SUBJECT INDEX
ADDITIONAL INFORMATION
PREFACE
I have provided a summary of my book, from which a statistician can reach a fruitful decision by making a comparison in his or her mind with the existing books on the international market.
[Summary table: counts of contents (pages, examples, exercises, references, etc.) for the title pages, dedication, table of contents, preface, Chapters 1 through 13, Appendix, Bibliography, Author Index, Subject Index, and Related Books.]
This book also covers, in a very simple and compact way, many new topics not yet available in any book on the international market. A few of these interesting topics are: median estimation under single phase and two-phase sampling, the difference between the lower level and higher level calibration approaches, calibration weights and design weights, estimation of parametric functions, hidden gangs in finite populations, compromised imputation, variance estimation using distinct units, a general class of estimators of population mean and variance, a wider class of estimators of population mean and variance, power transformation estimators, estimators based on the mean of non-sampled units of the auxiliary character, ratio and regression type estimators for estimating the finite population variance similar to those proposed by Isaki (1982), unbiased estimators of mean and variance under Midzuno's scheme of sampling, and the usual and modified jackknife variance estimators, among others.
Preface XXIII
This book has 459 tables, figures, maps, and graphs to explain the exercises and theory in a simple way. The collection of 1179 references (assembled over more than ten years from journals available in India, Australia, Canada, and the USA) is a vital resource for researchers. The most interesting part is the method of notation, along with complete proofs of the basic theorems. From my experience and discussions with several research workers in survey sampling, I found that most people dislike the form or method of notation used by different writers in the past. In this book I have tried to keep the notation simple, neat, and understandable. I used data relating to the United States of America and other countries of the world, so that international students should find it interesting and easy to understand. I am confident that the book will find a good place and reputation in the international market, as there is currently no book which is so thorough and simple in its presentation of the subject of survey sampling.
The objective, style, and pattern of this book are quite different from other books available in the market. This book will be helpful to:
In this book I have begun each chapter with basic concepts and complete derivations of the theorems or results. I end each chapter by filling the gap between the origin of each topic and the recent references. In each chapter I provide exercises which summarize the research papers. Thus this book not only gives the basic techniques of sampling theory but also reviews most of the research papers available in the literature related to sampling theory. It will also serve as an umbrella of references under different topics in sampling theory, in addition to clarifying the basic mathematical derivations. In short, it is an advanced book, but it provides exposure to elementary ideas too. It is a much better restatement of the existing knowledge available in journals and books. I have used data, graphs, tables, and pictures to make sampling techniques clear to learners.
EXERCISES
At the end of each chapter I have provided exercises, and their solutions are given through references to the related research papers. The exercises can be used to clarify or relate the classroom work to other possibilities in the literature.
PRACTICAL PROBLEMS
At the end of each chapter I have provided practical problems which enable students and teachers to do additional exercises with real data.
I have taken real data related to the United States of America and many other countries around the world. These data are freely available in libraries for public use, and they have been provided in the Appendix of this book for the convenience of the readers. This will be interesting to international students.
SOLUTION MANUAL
I am working on a complete solution manual to the practical problems and selected theoretical exercises given at the end of the chapters.
ABOUT THE AUTHOR
I was born in the village of Ajnoud, in the district of Ludhiana, in the state of Punjab, India, in 1963. I received my primary education at the Govt. Primary School, Ajnoud; the Govt. Middle School, Bilga; and the Govt. High School, Sahnewal, which are near my birthplace. I did my undergraduate work at Govt. College Karamsar, Rarra Sahib. I still remember bicycling about 15 km to college daily along the canal banks. It was fun, and that life has never come back. I completed my M.Sc. and Ph.D. degrees in statistics at the Punjab Agricultural University (PAU), Ludhiana, spending most of that time in room no. 46 of hostel no. 5.
At present I am an Assistant Professor at St. Cloud State University, St. Cloud, MN, USA, and recently introduced the idea of obtaining the exact traditional linear regression estimator using the calibration approach. From 2001 to 2002 I did postdoctoral work at Carleton University, Canada. From 2000 to 2001 I was a Visiting Instructor at the University of Saskatchewan, Canada. From 1999 to 2000 I was a Visiting Instructor at the University of Southern Maine, USA, where I taught several courses to undergraduate and graduate students and introduced the idea of compromised imputation in survey sampling. From 1998 to 1999 I was a Visiting Scientist at the University of Windsor, Canada. From 1996 to 1998 I was Research Officer-II in the Methodology Division of the Australian Bureau of Statistics, where I developed the higher order calibration approach for estimating the variance of the GREG and introduced the concept of hidden gangs in finite populations. From 1995 to 1996 I was a Research Assistant at Monash University, Australia. From 1991 to 1995 I was a Research Fellow, Assistant Statistician, and then Assistant Professor at PAU, Ludhiana, India, and was awarded a Ph.D. in statistics in 1991. I have published over 80 research papers in reputed journals of statistics and energy science. I am also co-author of a monograph entitled Energy in Punjab Agriculture, published by the Indian Council of Agricultural Research, New Delhi.
ACKNOWLEDGEMENTS
Indeed, the words at my command are not adequate to convey my feelings of gratitude toward the late Prof. Ravindra Singh for his constant, untiring, and ever encouraging support since 1996, when I started writing this book. Prof. Ravindra Singh passed away on Feb. 4, 2003, which is a great loss to his erstwhile students and colleagues, including me. He was my major advisor for my Ph.D. and was closely associated with my research work. Since 1996 Mr. Stephen Horn, my supervisor at the Australian Bureau of Statistics, always encouraged me to complete this book, and I appreciate his sincere co-operation, contribution, and kindness in joint research papers, as well as his guidance in completing this book. The help of Prof. M.L. King, Monash University, is also appreciated. I started writing this book while staying with Dr. Jaswinder Singh, his wife Dr. Rajvinder Kaur, and their daughter Miss
Jasraj Kaur in Australia during 1996. For almost seven years I worked day and night on this book, and during May-July 2003 I rented a room near an Indian restaurant in Malton, Canada, to save cooking time and spend most of my time on this book.
Thanks are due to Prof. Ragunath Arnab, University of Durban-Westville, for help in completing the work in Chapter 10 related to his contribution to successive sampling, and for completing some joint research papers. The help of Prof. H.P. Singh, Vikram University, in joint publications is also duly acknowledged.
The contribution of the late Prof. D.S. Tracy, University of Windsor, in reading a few chapters of the very early draft of the manuscript is duly acknowledged. The contribution of Ms. Margot Siekman, University of Southern Maine, in reading a few chapters is also duly acknowledged. Thanks are also due to a professional editor, Kathleen Prendergast, University of Saskatchewan, for critically checking the grammar and punctuation of a few chapters. Prof. M. Bickis, University of Saskatchewan, really helped me in my career when I was on the road looking for a job, going from university to university in Canada. The help of Prof. Silvia Valdes and Ms. Laurie McDermott, University of Southern Maine, has been much appreciated. Thanks are also due to Professor Patrick Farrell, Carleton University, for giving me a chance to work with him as a postdoctoral fellow. Thanks are also due to Prof. David Robinson at SCSU for providing a very peaceful work environment in the department. The aid of one Stat 321 student, Miss Kok Yuin Ong, in cross-checking all the solved numerical examples, and of a professional English editor, Mr. Eric Westphal, in reading the entire manuscript at SCSU, is much appreciated. Thanks are also due to a professional editor, Dr. M. Cole from England, for editing the complete manuscript and bringing it into its present form. The help of Mary Shrode and Mitra Sangrovla, Learning Resources and Technology Service, SCSU, in drawing a few illustrations using the NOVA Art Explosion 600,000-image collection is duly acknowledged.
The permission of Dimitri Chappas, NOAA / National Climatic Data Center, to print a few maps is also duly acknowledged. Free access to the data given in the Appendix, from Agricultural Statistics and the Statistical Abstracts of the United States, is also duly acknowledged. I would also like to extend my thanks to the Editor, James Finlay, the Associate Editor, Inge Hardon, and the reviewers for bringing the original version of the manuscript into its present form and into the public domain.
Note that I used Excel to solve the numerical examples, so with a hand calculator there may be some discrepancies in the results after one or two decimal places. Further note that the names used in the examples, such as Amy, Bob, Mr. Bean, etc., are generic and are not intended to resemble any real people. I would also like to submit that all opinions and methods of presentation in this book are solely the author's and are not necessarily representative of any institute or organization. I have tried to collect all recent and old papers, but if you have any published related paper and would like it to be highlighted in the next volume of my book, please feel free to mail a copy to me, and it will be my pleasure to give a suitable place to your paper. To my knowledge this will be the first book in survey sampling open to everyone to share contributions, irrespective of your designation, status, scientific group, journal name, or any other distinguishing characteristic. Your opinions are most welcome, and any suggestions for improvement will be much appreciated via e-mail.
Sarjinder Singh (B.Sc., M.Sc., Ph.D., Gold Medalist, and Post Doctorate)
Assistant Professor, Department of Statistics, St. Cloud State University,
St. Cloud, MN, 56301-4498, USA. E-mail: [email protected]
1. BASIC CONCEPTS AND MATHEMATICAL NOTATION
1.0 INTRODUCTION
In this chapter we introduce some basic concepts and mathematical notation, which should be known to every survey statistician. The meaning and use of these terms are supported by using them in the subsequent chapters.
1.1 POPULATION
1.1.1 FINITE POPULATION
If the number of objects or units in the population is countable, it is said to be a finite population. For example, the houses in a suburb form a finite population.
1.1.3 TARGET POPULATION
A finite or infinite population about which we require information is called the target population. For example, all 18 year old girls in the United States.
1.1.4 STUDY POPULATION
This is the basic finite set of individuals we intend to study. For example, all 18 year old girls whose permanent address is in New York.
1.5 RELATIVE ASPECTS OF SAMPLING VERSUS CENSUS
The following table provides some of the major differences between a sample and a census.
Aspect                      Sample                                      Census
Cost                        Less                                        More
Effort                      Less                                        More
Time consumed               Less                                        More
Errors                      May be predicted with certain confidence    No such errors
Accuracy of measurements    More                                        Less
1.6 STUDY VARIABLE
The variable of interest, or the variable about which we want to draw some inference, is called a study variable. Its value for the i-th unit is generally denoted by y_i. For example, the life of the bulbs produced by a certain plant can be taken as a study variable.
1.7 AUXILIARY VARIABLE
A variable having a direct or indirect relationship to the study variable is called an auxiliary variable. The value of an auxiliary variable for the i-th unit is generally denoted by x_i or z_i, etc. For example, the time or money spent on producing each bulb by the plant to maintain quality can be taken as an auxiliary variable.
1.8 DIFFERENCE BETWEEN STUDY VARIABLE AND AUXILIARY VARIABLE
The main differences between the study variable and the auxiliary variable are as follows:
Factor                        Study variable                    Auxiliary variable
Cost                          More                              Less
Effort                        More                              Less
Sources of availability       Current surveys or experiments    Current or past surveys, books, journals, etc.
Interest of an investigator   More                              Less
Error in measurement          More                              Less
Sources of error              More                              Fewer
Notation                      Y                                 X, Z
1.9 PARAMETER
An unknown quantity, which may vary over different sets of values forming
population is called a parameter. Any function of population values of a variable is
called a parameter. It is generally denoted by O .
Mathematically, suppose a population Ω consists of N units and the value of its i-th unit is Y_i. Then any function of the Y_i values is a parameter, i.e.,
Parameter = f(Y_1, Y_2, ..., Y_N).   (1.9.1)
For example, if Y_i denotes the total life time of the i-th bulb, then the average life time of the bulbs produced by the company is a parameter and is given by
Parameter = (1/N)(Y_1 + Y_2 + ... + Y_N).   (1.9.2)
1.10 STATISTIC
I: 11 STATISTICS
1.12 SAMPLE SELECTION
A sample can be selected from a population in many ways. In this chapter, we will discuss only two simple methods of sample selection. As the reader becomes familiar with sample selection, more complicated schemes will be discussed in the following chapters.
1.12.1 CHIT METHOD OR LOTTERY METHOD
Suppose we have N = 10,000 blocks in New York City. We wish to draw a sample of n = 100 blocks to draw an inference about a character under study, e.g., the average amount of alcohol used, or the number of bulbs produced by a certain company used in each block. Assign numbers to the 10,000 blocks, write these numbers on chits, and fold them in such a way that all chits look identical. Put all the chits in a box. Then there are two possibilities:
1.12.1.1 WITH REPLACEMENT SAMPLING
Select one chit out of the 10,000 chits in the box and note the number of the block written on it. This is the first unit selected in the sample. Before selecting the second chit, we replace the first chit in the box and mix it thoroughly with the other chits. Then select the second chit and note the number of the block written on it. This is the second unit selected in the sample. Repeat the process until 100 chits have been selected. Note that since each chit is selected after replacing the previous chit in the box, some chits may be selected more than once. Such a sampling procedure is called Simple Random Sampling With Replacement, or simply SRSWR sampling. Let us explain it with a population consisting of a few blocks as follows:
In general, the total number of samples of size n drawn with replacement from a population of size N is N^n, and it is denoted by s(n). Thus
s(n) = N^n.   (1.12.1)
Now imagine the situation: 'How many WR samples, each of n = 100 blocks, are possible out of N = 10,000 blocks?'
1.12.1.2 WITHOUT REPLACEMENT SAMPLING
In the case of without replacement sampling, we do not replace the chit while selecting the next chit; i.e., the number of chits in the box goes on decreasing as we go on selecting chits. Hence, there is no chance for a chit to be selected more than once. Such a sampling procedure is called Simple Random Sampling Without Replacement, or simply SRSWOR sampling. Let us explain it as follows: Suppose a population consists of N = 3 blocks A, B, and C. We wish to draw all possible unordered samples of size n = 2. Evidently, the possible samples are AB, AC, and BC. Thus a total of 3 samples of size 2 can be drawn from the population of size 3, which in fact is given by 3C2 = 3. In general, the total number of samples of size n drawn without replacement from a population of size N is given by the binomial coefficient NCn. Thus
s(n) = NCn = N! / (n!(N - n)!),   (1.12.2)
where n! = n(n - 1)(n - 2)...2.1 and 0! = 1.
Now think again: 'How many WOR samples, each of n = 100 blocks, are possible out of N = 10,000 blocks?'
1.12.2 RANDOM NUMBER TABLE METHOD
Note that it is a very cumbersome job to make identical chits if the size of the population is very large. In such situations, another method of sample selection is based on the use of a random number table. A random number table is a set of numbers used for drawing random samples. The numbers are usually compiled by a process involving a chance element and, in their simplest form, consist of a series of the digits 0 to 9 occurring at random with equal probability.
As mentioned above, in this table the numbers from 0 to 9 are written both in columns and rows. For the purpose of illustration, we used Pseudo-Random Numbers (PRN), generated by using the UNIF subroutine following Bratley, Fox, and Schrage (1983), as given in Table 1 of the Appendix. We generally apply the following rules to select a sample:
Rule 1. First we write all random numbers into groups of columns, as already done in Table 1 of the Appendix. We take as many columns in each group as there are digits in the population size.
Rule 2. List all the individuals or units in the population and assign them the numbers 1, 2, 3, ..., N.
Rule 3. Randomly select any starting point in the table of random numbers. Write down all the numbers less than or equal to N that follow the starting point until we obtain n numbers. If we are using SRSWOR sampling, discard any number that is repeated in the random number table. If we are using SRSWR sampling, retain the repeated numbers.
Rule 4. Select those units that are assigned the numbers listed in Rule 3. This will constitute the required random sample.
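Rules 1 through 4 can be sketched in code. In the sketch below, the population size, sample size, and random seed are hypothetical stand-ins for a printed random number table:

```python
import random

N, n = 200, 5    # hypothetical population and sample sizes
random.seed(1)   # fixed seed: a reproducible stand-in for a printed table

# Rules 1 and 3: read three-digit random numbers (000-999) one at a time.
digits = (random.randint(0, 999) for _ in range(100_000))

# Rules 3 and 4: keep numbers in 1..N; discard repeats for SRSWOR sampling.
sample = []
for r in digits:
    if 1 <= r <= N and r not in sample:
        sample.append(r)
    if len(sample) == n:
        break

print(sample)  # n distinct unit numbers between 1 and N
```

For SRSWR sampling the `r not in sample` test would simply be dropped, so repeated numbers are retained.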
In the case of SRSWOR sampling, the numbers 039 and 048 would not be repeated; i.e., we take every unit only once, so we continue by selecting two more distinct random numbers, 078 and 163.
Although the above method of selecting a sample using a random number table is very efficient, it may lead to many rejections of the random numbers; therefore we discuss a shortcut method called the remainder method.
1.12.2.1 REMAINDER METHOD
Using the above example, if any selected three-digit random number is greater than 225, divide it by 225. We choose the serial number from 1 through 224 corresponding to the remainder when it is not zero, and the serial number 225 when the remainder is zero. However, it is necessary to reject the numbers from 901 to 999 (besides 000) in adopting this procedure, as otherwise units with serial numbers 1 to 99 would have a larger probability of selection (5/999), while those with serial numbers 100 to 225 would have probability only equal to 4/999. If we use this procedure, with the same three-figure random numbers as given in columns 1 to 3, 4 to 6, etc., we obtain the sample of units assigned the numbers given below. Again, in SRSWR sampling the numbers that give rise to the same remainder are not discarded, while in the SRSWOR sampling procedure such numbers are discarded. Thus an SRSWR sample is as given below:
Units selected in the sample
138  151  099  025  014  022  197  176  111  209  042  194
015  049  095  040  027  124  116  097  126  142  073  158
108  053  046  001  207  156  201  027  111  209  065  184
Note that in the SRSWR sample only one unit, 209, is repeated; thus for SRSWOR sampling we continue to apply the remainder approach until another distinct unit is selected, which is 089 in this case. Further note that the first random number, 992, was discarded as required by this rule.
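The remainder method described above can be sketched as a small function. The helper name below is ours; the rejection rule and the mapping follow the text for N = 225:

```python
N = 225

def remainder_select(r):
    """Map a three-digit random number r to a serial number 1..N.

    Returns None when r must be rejected: 000, or 901-999 (i.e., r > 4N),
    so that every serial number keeps the same selection probability.
    """
    if r == 0 or r > 4 * N:
        return None
    rem = r % N
    return rem if rem != 0 else N   # remainder 0 corresponds to serial number N

print(remainder_select(992))  # None (rejected, as 992 is in 901-999)
print(remainder_select(138))  # 138 (already within 1..225)
print(remainder_select(363))  # 138, since 363 % 225 = 138
print(remainder_select(450))  # 225, since the remainder is zero
```

With the rejection rule, each serial number 1 through 225 is reachable from exactly four of the admissible three-digit numbers, which is what makes the selection probabilities equal.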
1.14 PROBABILITY OF SELECTING A SAMPLE
Every sample selected from the population has some known probability of being selected on any occasion. It is generally denoted by the symbol p_t or p(t). For example, the probability of selecting a sample using
with replacement sampling is p_t = 1/N^n, t = 1, 2, ..., N^n,   (1.14.1)
and using
without replacement sampling is p_t = 1/NCn, t = 1, 2, ..., NCn.   (1.14.2)
The following table describes the differences between the with replacement and without replacement sampling procedures.

With replacement sampling                      Without replacement sampling
Cheaper                                        Costly
A few units may be selected more than once     A unit can get selected only once
Less efficient                                 More efficient
Number of possible samples s(n) = N^n          Number of possible samples s(n) = NCn
1.15 POPULATION MEAN AND TOTAL
Let Y_i, i = 1, 2, ..., N, denote the value of the i-th unit in a population; then the population mean is defined as
Ȳ = (1/N)(Y_1 + Y_2 + ... + Y_N) = (1/N) Σ_{i=1}^{N} Y_i,   (1.15.1)
and the population total is defined as
Y = (Y_1 + Y_2 + ... + Y_N) = Σ_{i=1}^{N} Y_i = NȲ.   (1.15.2)
The units of measurement of the population mean are the same as those of the actual data. For example, if the i-th unit, Y_i, for all i, is measured in dollars, then the population mean, Ȳ, is also measured in dollars.
1.17 POPULATION STANDARD DEVIATION
The positive square root of the population variance is called the population standard deviation, and it is denoted by σ_y. The units of measurement of σ_y will again be the same as those of the actual data. For instance, in the above example, the units of measurement of σ_y will be dollars.
1.18 POPULATION COEFFICIENT OF VARIATION
The ratio of the standard deviation to the population mean is called the coefficient of variation. It is denoted by C_y; that is,
C_y = σ_y / Ȳ.   (1.18.1)
Evidently C_y is a unit-free number. It is useful for comparing the variability in two different populations having different units of measurement, e.g., $ and kg. It is also called the relative standard error (RSE). Sometimes we also consider C_y = S_y / Ȳ.
1.19 RELATIVE MEAN SQUARE ERROR
The relative mean square error is defined as the square of the coefficient of variation C_y and is generally written as RMSE. Mathematically,
RMSE = C_y² = σ_y² / Ȳ².   (1.19.1)
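The population quantities defined in Sections 1.15 through 1.19 can be computed in a few lines. The five values below are a hypothetical population, used only to illustrate the formulas:

```python
import math

# Hypothetical population of N = 5 values (say, lifetimes in hours)
Y = [10, 12, 9, 14, 15]
N = len(Y)

mean = sum(Y) / N                              # population mean (1.15.1)
total = sum(Y)                                 # population total (1.15.2)
sigma2 = sum((y - mean) ** 2 for y in Y) / N   # population variance (divisor N)
sigma = math.sqrt(sigma2)                      # population standard deviation
cv = sigma / mean                              # coefficient of variation (1.18.1)
rmse = cv ** 2                                 # relative mean square error (1.19.1)

print(mean, total)  # 12.0 60
print(sigma, cv, rmse)
```

Note that `cv` and `rmse` are unit-free, as the text observes, while `mean`, `total`, and `sigma` carry the units of the data.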
1.20 SAMPLE MEAN
Let y_i, i = 1, 2, ..., n, denote the value of the i-th unit selected in the sample; then the sample mean is defined as
ȳ = (1/n) Σ_{i=1}^{n} y_i.   (1.20.1)
1.21 SAMPLE VARIANCE
The sample variance is defined as
s_y² = (1/(n - 1)) Σ_{i=1}^{n} (y_i - ȳ)².   (1.21.1)
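A quick numerical sketch of the sample mean (1.20.1) and sample variance (1.21.1), for a hypothetical sample of n = 4 values; note the divisor n - 1 in the variance:

```python
# Hypothetical sample of n = 4 observed values
y = [10, 14, 9, 15]
n = len(y)

ybar = sum(y) / n                                      # sample mean (1.20.1)
s2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)       # sample variance (1.21.1)

print(ybar)  # 12.0
print(s2)    # 26/3 = 8.666..., up to float rounding
```

Unlike the population variance of Section 1.17, which divides by N, the sample variance divides by n - 1; this is what makes s_y² an unbiased estimator of the finite population variance under SRSWOR, as shown in Chapter 2.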
Remark 1.1. The population mean Ȳ and population variance σ_y², etc., are unknown quantities (parameters) and can be denoted by the symbol θ. The sample mean ȳ and sample variance s_y², etc., are known after sampling; they are called statistics and can be denoted by θ̂. Also note that the sample standard deviation (or standard error) and the sample coefficient of variation can be defined as s_y = √(s_y²) and C_y = s_y / ȳ, respectively. Note that the standard error is a statistic, whereas the standard deviation is a parameter.
1.22 ESTIMATOR
A statistic θ̂_t obtained from the values in the sample s is also called an estimator of the population parameter θ. Note that the notations θ̂_t, θ̂, and θ̂_n have the same meaning. For example, the notations ȳ_t, ȳ, and ȳ_n have the same meaning, and s_y² and s_y(t)² have the same meaning. We choose according to our requirements for a given topic or exercise.
1.23 ESTIMATE
Any numeric value obtained from the sample information is called an estimate of the population parameter. It is also called a statistic.
1.24 SAMPLE SPACE
A pictorial representation of a sample space, for the experiment of tossing two coins, is given in Figure 1.24.1.
[Figure 1.24.1. Tree diagram for tossing two coins (first coin, then second coin): the sample space consists of 2 × 2 = 4 outcomes, HH, HT, TH, and TT.]
1.25 UNIVARIATE RANDOM VARIABLE
A random variable is a real-valued function defined on the sample space Ψ. It is generally of two types:
1.25.1 QUALITATIVE RANDOM VARIABLES
Qualitative random variables assume values that are not necessarily numerical but can be categorized. For example, Gender has two possible values, Male and Female. These can be arbitrarily coded numerically as Female = 0 and Male = 1; such coded variables are called nominal variables. In another example, consider Grades, which can take five possible values: A, B, C, D, and F. These five categories can be arbitrarily coded numerically as A = 4, B = 3, C = 2, D = 1, and F = 0. Note that here the magnitude of the coding tells us the quality of the Grade: a Grade with code 3 is better than a Grade with code 2. Such a coded variable is called an ordinal variable. Also note that in the case of the nominal variable, coding Male = 1 and Female = 0 does not mean that males are superior to females. Adding, subtracting, or averaging such qualitative variables has no meaning. Thus qualitative variables are of two types: (a) nominal variables and (b) ordinal variables. Pie charts or bar charts are generally used to present qualitative variables.
Quantitative random variables can take numerical values for which adding, subtracting, or averaging does have meaning. Examples of quantitative variables are weight, height, number of students, etc. In general, two types of quantitative random variables are available: (a) discrete random variables and (b) continuous random variables. A random variable is said to be continuous if it can take all possible values between certain limits. For example, the height of a student can be 5.6 feet.
[Diagram: a random variable is classified as either qualitative or quantitative.]
Note that Age itself is a quantitative variable, whereas Age Groups is a qualitative variable. Pie charts, bar charts, dot plots, line charts, stem and leaf plots, histograms, and box plots are generally used to present quantitative variables.
1.28 EXPECTED VALUE AND VARIANCE OF A UNIVARIATE RANDOM VARIABLE
If a discrete random variable x takes all possible values x_i with probability mass function P(x_i) in the sample space Ψ, then its expected value is
E(x) = Σ_i x_i P(x_i),   (1.28.1)
and its variance is
V(x) = Σ_i {x_i - E(x)}² P(x_i),   (1.28.2)
or, equivalently,
V(x) = Σ_i x_i² P(x_i) - {E(x)}².   (1.28.3)
Sometimes (1.28.2) is called a formula by definition, and that in (1.28.3) is called a computing formula.
or, equivalently,
V(x) = ∫_a^b x² f(x) dx - {E(x)}².   (1.28.6)
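The definition formula and the computing formula for the variance always agree; a small numerical check for a hypothetical probability mass function:

```python
# Hypothetical pmf: P(x = 1) = 0.1, P(x = 2) = 0.2, P(x = 3) = 0.3, P(x = 4) = 0.4
xs = [1, 2, 3, 4]
ps = [0.1, 0.2, 0.3, 0.4]   # probabilities sum to 1

E = sum(x * p for x, p in zip(xs, ps))                    # expected value
V_def = sum((x - E) ** 2 * p for x, p in zip(xs, ps))     # definition formula
V_comp = sum(x * x * p for x, p in zip(xs, ps)) - E ** 2  # computing formula

print(E)      # 3.0, up to float rounding
print(V_def, V_comp)  # the two variance formulas agree
```

The computing formula is usually more convenient by hand because it avoids subtracting E(x) inside the sum, although with floating point it can lose accuracy when E(x)² is much larger than the variance.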
In this case there are a countable number of points x_1, x_2, ..., along with their associated probabilities.
[Figure: graph of the distribution function plotted against x = 1, 2, ..., 5.]
by f(x) = dF(x)/dx. The c.d.f. F(x) is a non-decreasing function of x and is continuous on the right. Also note that F(-∞) = 0, F(+∞) = 1, 0 ≤ F(x) ≤ 1, and
P(a ≤ x ≤ b) = ∫_a^b f(x) dx = F(b) - F(a).
For example, if x is a continuous random variable with probability density function (p.d.f.)
f(x) = 1 if 0 < x < 1,
       0 otherwise,   (1.29.2)
then its cumulative distribution function (c.d.f.) is given by
F(x) = 0 if x < 0,
       x if 0 ≤ x ≤ 1,   (1.29.3)
       1 if x > 1,
and its graphical representation is given in Figure 1.29.2.
[Figure 1.29.2. Graph of the c.d.f. F(x): 0 for x < 0, rising linearly from 0 to 1 on [0, 1], and equal to 1 for x > 1.]
Example 1.30.1. A discrete random variable x has the following probability mass function:
Select a random sample of three units using the method of random numbers.
Sol ution: The cumulative distribution function of the random variable X is given
by
We used the first six columns of the Pseudo-Random Number (PRN) Table 1 given in the Appendix, multiplied by 10^-6, as the randomly selected values of F(x). Then the integral value of the random variable x selected in the sample is obtained using
In the case of with replacement sampling, the value x = 3 has been selected twice, whereas for WOR sampling we have to continue the process until three distinct values of x are selected.
Example 1.30.2. If x follows a binomial distribution with parameters N and p, that is, x ~ B(N, p), say N = 10 and p = 0.4, select an SRSWR sample of n = 4 units by using the random number method.
Chapter 1: Basic concepts and mathematical notation 17
We used the three columns from 7th to 9th of the Pseudo-Random Number (PRN) Table 1 given in the Appendix, multiplied by 10^-3, as the randomly selected values of F(x). Then the integral value of the random variable x selected in the sample is
number drawn from the Pseudo-Random Number (PRN) Table 1 given in the Appendix.
Then the value of the random variable x selected in the sample is given by
F(x) = (1/16)(x − 1)^4 if 1 ≤ x ≤ 3, and 1 if x > 3.
Select a sample of n = 10 units by using SRSWR sampling.
Solution. We are given F(x) = (1/16)(x − 1)^4, which implies that x = 2[F(x)]^{1/4} + 1. By using the first three columns of the Pseudo-Random Numbers (PRN) Table 1 given in the Appendix, multiplied by 10^-3, we obtain the observed values of F(x) and the sampled values of x as:
F(x)        x
0.992 2.995988
0.588 2.751356
0.601 2.760956
0.549 2.721563
0.925 2.961397
0.014 1.687958
0.697 2.827419
0.872 2.932676
0.626 2.778990
0.236 2.393985
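The inversion x = 2[F(x)]^{1/4} + 1 can be checked numerically; the sketch below reproduces the sampled values from the observed F(x) values.

```python
def inverse_cdf(F):
    """Invert F(x) = (x - 1)**4 / 16 on 1 <= x <= 3: x = 2 * F**0.25 + 1."""
    return 2 * F ** 0.25 + 1

observed_F = [0.992, 0.588, 0.601, 0.549, 0.925]
sample = [round(inverse_cdf(F), 6) for F in observed_F]
print(sample)  # first two values: 2.995988, 2.751356, matching the table
```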
Using the three columns multiplied by 10^-3, say 7th to 9th, of the Pseudo-Random Numbers (PRN) Table 1 given in the Appendix, the first five observed values of F(x) are given by 0.622, 0.771, 0.917, 0.675 and 0.534. Thus the five sampled values from the above distribution are
F(x)        x
0.622 0.403214
0.771 1.141487
0.917 3.747745
0.675 0.612801
0.534 0.107222
Note that we have used the tan function in radians and π = 4 tan^{-1}(1).
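The sampled values in the table above come from inverting the standard Cauchy c.d.f. F(x) = 1/2 + (1/π) tan^{-1}(x), so x = tan(π(F − 1/2)); a quick numerical check:

```python
import math

def inverse_cauchy_cdf(F):
    """Invert the standard Cauchy c.d.f. F(x) = 0.5 + atan(x)/pi."""
    return math.tan(math.pi * (F - 0.5))

for F in [0.622, 0.771, 0.917, 0.675, 0.534]:
    print(round(inverse_cauchy_cdf(F), 6))
```

The first value, for F = 0.622, agrees with the tabulated 0.403214.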
Solution. The dist ribution of x is uniform between 5 and 10, so its probability
distribution function is
F(x) = P[X ≤ x] = ∫_5^x f(t) dt = (1/5)(x − 5), (1.30.9)
which implies that
x = 5[F(x) + 1]. (1.30.10)
Using the three columns multiplied by 10^-3, say 7th to 9th, of the Pseudo-Random Number Table 1 given in the Appendix, the first five observed values of F(x) are
given by 0.622, 0.771, 0.917, 0.675 and 0.534 . Thus the sampled five values from
the above distribution are given by
F(x)        x
0.622 8.110
0.771 8.855
0.917 9.585
0.675 8.375
0.534 7.670
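A minimal sketch of the inversion x = 5[F(x) + 1] for the uniform distribution on (5, 10):

```python
def inverse_uniform_cdf(F, a=5, b=10):
    """Invert F(x) = (x - a)/(b - a); for a = 5, b = 10 this is x = 5*(F + 1)."""
    return a + (b - a) * F

sampled = [inverse_uniform_cdf(F) for F in [0.622, 0.771, 0.917, 0.675, 0.534]]
print([round(v, 3) for v in sampled])  # -> [8.11, 8.855, 9.585, 8.375, 7.67]
```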
If X and Y are discrete random variables, the probability that X will take on the value x and Y will take on the value y, written P(X = x, Y = y) = p(x, y), is called the joint probability distribution function of a bivariate random variable.
DISCRETE RANDOM VARIABLES
( a ) p(x, y) ≥ 0 for each pair of values (x, y) within its domain; and
( b ) Σ_x Σ_y p(x, y) = 1, where the sum extends over all possible pairs (x, y).
If X and Y are discrete random variables and p(x, y) is the value of the joint probability distribution at (x, y), the function given by
p_x(x) = Σ_y p(x, y) (1.34.1)
for each x within the range of X is called the marginal distribution of X, and the function
p_y(y) = Σ_x p(x, y) (1.34.2)
for each y within the range of Y is called the marginal distribution of Y.
Let p(x, y) denote the joint probability mass function (p.m.f.) of two random variables X and Y. Also, let F(x, y) denote the cumulative mass function (c.m.f.) of X and Y. It is well known that the distribution of the marginal distribution function (m.d.f.) P_y(y) for any joint probability mass function of X and Y is rectangular (or uniform) in the range [0, 1]. Random numbers in the random number table also follow the same distribution. Then to find out the value of y one solves the equation (1.35.1) below.
The known form of the joint mass function p(x, y) [one can choose any suitable form for p(x, y)] can be substituted in (1.35.1). The value y* of y so obtained is used to find the value x* of x. For this we use the conditional mass function of x given y = y*, since the distribution of the conditional mass function will also be uniform in [0, 1]. Thus another random number R₁ is drawn and the value x* of x is determined from the equation
(1.35.2)
1.36 CONTINUOUS BIVARIATE RANDOM VARIABLE
A bivariate function with values f(x, y), defined over the two-dimensional plane, is called a joint probability density function of the continuous random variables X and Y if and only if
A bivariate function can serve as the joint probability density function of a pair of continuous random variables X and Y if and only if its values, f(x, y), satisfy the conditions:
( a ) f(x, y) ≥ 0 for each pair of values (x, y) within its domain; (1.37.1)
( b ) ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} f(x, y) dx dy = 1. (1.37.2)
If X and Y are continuous random variables, the function given by
F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f(s, t) ds dt (1.38.1)
for −∞ < x < +∞, −∞ < y < +∞, where f(s, t) is the value of the joint probability density of X and Y at the point (s, t), is called the joint distribution function, or the joint cumulative distribution, of X and Y.
If X and Y are continuous random variables and f(x, y) is the value of the joint probability density function, then the cumulative marginal probability distribution function of y is given by
F_y(y) = ∫_{−∞}^{y} ∫_{−∞}^{+∞} f(x, t) dx dt (1.39.1)
for −∞ < y < +∞, and the cumulative marginal probability distribution function of x is given by
F_x(x) = ∫_{−∞}^{x} ∫_{−∞}^{+∞} f(t, y) dy dt. (1.39.2)
In general, let f(x, y) denote the joint probability density function (p.d.f.) of two continuous random variables X and Y. Also let F(x, y) denote the cumulative density function (c.d.f.) of X and Y. It is well known that the distribution of the marginal distribution function (m.d.f.) F_y(y) for any joint probability density function of X and Y is rectangular (or uniform) in the range [0, 1]. Random numbers in the random number table also follow the same distribution. To find out the value of y, one solves the equation (1.40.1) below.
y = y*, since the distribution of the conditional distribution function will also be uniform in [0, 1]. Thus another random number R₁ is drawn and the value x* of X is determined from the equation:
Example 1.40.1. If the joint probability density function of two continuous random variables x and y is given by
f(x, y) = (2/3)(x + 2y) for 0 < x < 1, 0 < y < 1, and 0 otherwise,
then select six pairs of observations (x, y) by using the Random Number Table method.
Solution. We have
F_y(y) = ∫_0^y ∫_0^1 f(x, t) dx dt = ∫_0^y (2/3)(1/2 + 2t) dt = (y + 2y²)/3.
Let 0 < R₁ < 1 be any other random number, say obtained by using the 7th to 9th columns of the Pseudo-Random Numbers given in Table 1 of the Appendix; then the value of x is given by solving the integral
F(x | y = y*) = ∫_0^x f(t | y*) dt = R₁, that is, ∫_0^x 2(1.99 + t)/4.98 dt = 0.622,
or, equivalently, solving the quadratic equation x² + 3.98x − 4.98R₁ = 0, which implies
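The two-step procedure can be sketched as follows. This is a sketch under the derivation that the marginal c.d.f. is F_y(y) = (y + 2y²)/3 and the normalized conditional c.d.f. of x given y* is (x² + 4y*x)/(1 + 4y*) = R₁ (with y* = 0.995 this gives 2y* = 1.99); the text's table of six pairs is not reproduced here.

```python
import math

def sample_pair(R, R1):
    """Two-step sampling from f(x, y) = (2/3)*(x + 2*y) on the unit square.

    Step 1: solve F_y(y) = (y + 2*y**2)/3 = R, i.e. 2*y**2 + y - 3*R = 0.
    Step 2: solve F(x | y) = (x**2 + 4*y*x)/(1 + 4*y) = R1, i.e.
            x**2 + 4*y*x - (1 + 4*y)*R1 = 0.
    """
    y = (-1 + math.sqrt(1 + 24 * R)) / 4
    x = (-4 * y + math.sqrt(16 * y * y + 4 * (1 + 4 * y) * R1)) / 2
    return x, y

x, y = sample_pair(0.992, 0.622)   # first random numbers from the two column sets
print(round(y, 3), round(x, 3))    # y* = 0.995, x in (0, 1)
```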
1.41 UNBIASEDNESS
where P_t denotes the probability of selecting the t-th sample from the population, and Σ_{t=1}^{s(n)} P_t = 1. Note that the total number of possible samples in the case of SRSWR sampling is s(n) = N^n, and in the case of SRSWOR sampling it is s(n) = NC_n.
For example:
( i ) The sample mean ȳ_t is an unbiased estimator of the population mean Ȳ under both SRSWR and SRSWOR sampling
E(ȳ_t) = Σ_{t=1}^{s(n)} P_t ȳ_t = Ȳ. (1.41.2)
Show that the sample mean ȳ_t is an unbiased estimator of the population mean Ȳ under both SRSWR and SRSWOR sampling, and that the sample variance s_y² is an unbiased estimator of the population mean squared error S_y² under SRSWOR sampling and of the population variance σ_y² under SRSWR sampling, respectively.
The total number of all possible samples is s(n) = NC_n = 4C_2 = 6 and P_t = 1/6.
Now we have the following table .
[Table: samples, sampled units, sample means, sample variances, and probability of each sample under SRSWOR]
The above table shows that the distribution of sample means is symmetric and that of sample variances is skewed to the right in the case of without replacement sampling.
Case II. Suppose we are drawing all possible samples of size n = 2 by using SRSWR sampling.
The total number of all possible samples is s(n) = N^n = 4² = 16 and P_t = 1/16 for all t = 1, 2, ..., 16.
t    Sample              ȳ_t                    s_y(t)²   P_t
1    (A, A) or (1, 1)    ȳ1 = (1 + 1)/2 = 1.0   0.0       1/16
2    (A, B) or (1, 2)    ȳ2 = (1 + 2)/2 = 1.5   0.5       1/16
3    (A, C) or (1, 3)    ȳ3 = (1 + 3)/2 = 2.0   2.0       1/16
4    (A, D) or (1, 4)    ȳ4 = (1 + 4)/2 = 2.5   4.5       1/16
5    (B, A) or (2, 1)    ȳ5 = (2 + 1)/2 = 1.5   0.5       1/16
6    (B, B) or (2, 2)    ȳ6 = (2 + 2)/2 = 2.0   0.0       1/16
7    (B, C) or (2, 3)    ȳ7 = (2 + 3)/2 = 2.5   0.5       1/16
8    (B, D) or (2, 4)    ȳ8 = (2 + 4)/2 = 3.0   2.0       1/16
9    (C, A) or (3, 1)    ȳ9 = (3 + 1)/2 = 2.0   2.0       1/16
10   (C, B) or (3, 2)    ȳ10 = (3 + 2)/2 = 2.5  0.5       1/16
11   (C, C) or (3, 3)    ȳ11 = (3 + 3)/2 = 3.0  0.0       1/16
12   (C, D) or (3, 4)    ȳ12 = (3 + 4)/2 = 3.5  0.5       1/16
13   (D, A) or (4, 1)    ȳ13 = (4 + 1)/2 = 2.5  4.5       1/16
14   (D, B) or (4, 2)    ȳ14 = (4 + 2)/2 = 3.0  2.0       1/16
15   (D, C) or (4, 3)    ȳ15 = (4 + 3)/2 = 3.5  0.5       1/16
16   (D, D) or (4, 4)    ȳ16 = (4 + 4)/2 = 4.0  0.0       1/16
Thus the expected value of the sample mean ȳ_t is given by
E(ȳ_t) = (1/N^n) Σ_{t=1}^{N^n} ȳ_t = (1/16) Σ_{t=1}^{16} ȳ_t = (1/16)(1 + 1.5 + ... + 4) = 40/16 = 2.5 = Ȳ,
and that of the sample variance s_y(t)² is given by
E[s_y(t)²] = (1/N^n) Σ_{t=1}^{N^n} s_y(t)² = (1/16) Σ_{t=1}^{16} s_y(t)² = (1/16)(0 + 0.5 + ... + 2 + 0.5) = 20/16 = 1.25 = σ_y².
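Case II can be verified by enumerating all N^n = 16 with-replacement samples:

```python
from itertools import product
from statistics import mean, variance

population = [1, 2, 3, 4]                        # units A, B, C, D
samples = list(product(population, repeat=2))    # all 16 SRSWR samples of size 2

sample_means = [mean(s) for s in samples]
sample_vars = [variance(s) for s in samples]     # divisor n - 1

E_mean = sum(sample_means) / len(samples)        # 2.5 = population mean
E_var = sum(sample_vars) / len(samples)          # 1.25 = population variance
sigma2 = sum((y - 2.5) ** 2 for y in population) / len(population)

print(E_mean, E_var, sigma2)
```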
Then we have the following new term .
1.41.1.1 BIAS
It is the difference between the expected value of a statistic θ̂_t and the actual value of the parameter θ, that is,
B(θ̂_t) = E(θ̂_t) − θ. (1.41.5)
Thus an estimator θ̂_t is unbiased if E(θ̂_t) = θ, which is obvious by setting B(θ̂_t) = 0.
1.41.2 CONSISTENCY
There are several definitions of the consistency of any statistic, but we will use the simplest. An estimator θ̂_t of the population parameter θ is said to be consistent if
lim_{n→∞} θ̂_t = θ. (1.41.6)
For example:
( i ) The sample mean ȳ_t (or simply ȳ) is a consistent estimator of the finite population mean, Ȳ.
( ii ) The sample mean squared error s_y² is a consistent estimator of the population mean squared error S_y².
1.41.3 SUFFICIENCY
An estimator θ̂_t is said to be sufficient for a parameter θ if the distribution of a sample y1, y2, ..., yn given θ̂_t does not depend on θ. The distribution of θ̂_t then contains all the information in the sample relevant to the estimation of θ, and knowledge of θ̂_t and its sampling distribution is 'sufficient' to give that information. In general, a set of estimators or statistics θ̂1, θ̂2, ..., θ̂k are 'jointly sufficient' for parameters θ1, θ2, ..., θk if the distribution of sample values given θ̂1, θ̂2, ..., θ̂k does not depend on these θ1, θ2, ..., θk.
1.41.4 EFFICIENCY
Before defining the term efficiency, we shall discuss two more terms, viz., variance
and mean square error of the estimator.
1.41.4.1 VARIANCE
The variance of an estimator θ̂_t is defined as
V(θ̂_t) = E[θ̂_t − E(θ̂_t)]². (1.41.7)
1.41.4.2 MEAN SQUARE ERROR
The mean square error of an estimator θ̂_t is defined as MSE(θ̂_t) = E(θ̂_t − θ)² = V(θ̂_t) + {B(θ̂_t)}². If θ̂_t is unbiased, then
MSE(θ̂_t) = V(θ̂_t).
Thus if θ̂1 and θ̂2 are two different estimators of the parameter θ, then the estimator θ̂1 is said to be more efficient than the estimator θ̂2 if and only if
MSE(θ̂1) < MSE(θ̂2).
1.42 RELATIVE EFFICIENCY
RE = MSE(θ̂2) × 100 / MSE(θ̂1). (1.42.1)
1.43 RELATIVE BIAS
The ratio of the absolute value of the bias in an estimator to the square root of the mean square error of the estimator is called the relative bias. It is defined as:
RB = |B(θ̂1)| / √MSE(θ̂1), (1.43.1)
where B(θ̂1) = E(θ̂1) − θ, and the relative bias is independent of the units of measurement of the original data.
1.44 JACKKNIFE METHOD OF VARIANCE ESTIMATION
If θ̂1, θ̂2, ..., θ̂n are independently distributed random variables with E(θ̂j) = θ for all j, and θ̂ = (1/n) Σ_{j=1}^n θ̂j, then
v(θ̂) = (1/(n(n − 1))) Σ_{j=1}^n (θ̂j − θ̂)² (1.44.1)
is an unbiased estimator of V(θ̂). If θ̂j = θ̂(j) is the j-th estimator of θ obtained by dropping the j-th unit from the sample of size n, then such a method of variance estimation is called the Jackknife method of variance estimation, and the estimator of variance takes the form
v_Jack(θ̂) = ((n − 1)/n) Σ_{j=1}^n (θ̂(j) − θ̂(.))² (1.44.2)
where θ̂(.) = (1/n) Σ_{j=1}^n θ̂(j).
For example, if θ̂ = ȳ = (1/n) Σ_{i=1}^n y_i is an estimator of the population mean Ȳ under SRSWR sampling, then θ̂(j) = ȳ(j) = (1/(n − 1)) Σ_{i≠j} y_i denotes the estimator of the population mean Ȳ obtained by dropping the j-th unit from the sample. Clearly, we can write
Also
where f = n/N.
Note that it is not always possible to adjust the Jackknife estimator of variance to make it unbiased for other sampling schemes available in the literature.
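A sketch of the Jackknife estimator (1.44.2) applied to the sample mean; for the mean under SRSWR it reduces to s²/n, as the identities above indicate. The data values are illustrative.

```python
def jackknife_variance(y):
    """v_Jack = ((n - 1)/n) * sum_j (theta_(j) - theta_(.))**2 for theta = sample mean."""
    n = len(y)
    total = sum(y)
    leave_one_out = [(total - yj) / (n - 1) for yj in y]   # theta_(j)
    theta_dot = sum(leave_one_out) / n                     # theta_(.)
    return (n - 1) / n * sum((t - theta_dot) ** 2 for t in leave_one_out)

y = [1, 2, 3, 4]          # illustrative data
n = len(y)
ybar = sum(y) / n
s2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
print(jackknife_variance(y), s2 / n)   # both equal s^2/n for the sample mean
```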
1.45 LOSS FUNCTION
(1.46.1)
holds for all possible values of the characteristic under study. Now an estimator θ̂1 belonging to Γ is said to be admissible in Γ if there exists no other estimator in Γ which is better than θ̂1.
A sample survey is a survey which is carried out using sampling methods, i.e., in
which only a portion and not the whole population is surveyed.
Example 1.48.1. Select all possible SRSWR samples each of two units from the
population consisting of four units 1,3,5 and 7.
( a ) Construct the sampling distribution of the sample means.
( b ) Construct the sampling distribution of the sample variances.
Solution. The list of 16 samples of size 2 from the population, and the mean and variance of each sample, is given in the following table.
Sample:    1,1  1,3  1,5  1,7  3,1  3,3  3,5  3,7  5,1  5,3  5,5  5,7  7,1  7,3  7,5  7,7
Mean:      1    2    3    4    2    3    4    5    3    4    5    6    4    5    6    7
Variance:  0    2    8    18   2    0    2    8    8    2    0    2    18   8    2    0
( a ) The relative frequency distribution of the sample means is
Sample mean    Frequency    Relative frequency
1              1            0.0625
2              2            0.1250
3              3            0.1875
4              4            0.2500
5              3            0.1875
6              2            0.1250
7              1            0.0625
[Figure: bar chart of the sampling distribution of the sample means]
( b ) The relative frequency distribution of the sample variances is
Sample variance    Frequency    Relative frequency
0                  4            0.250
2                  6            0.375
8                  4            0.250
18                 2            0.125
[Figure: bar chart of the sampling distribution of the sample variances]
1.49 SAMPLING FRAME
A sample space ψ (or S) of identifiable units or elements of the population to be surveyed is called a sampling frame. It may be a discrete space, such as households or individuals, or a continuous space, such as area under a particular crop.
Let ψ = {t}, t = 1, 2, ..., s(n), be a specified space of samples, B_t be a Borel set in ψ, and P_t be the probability measure defined on B_t; then the triplet (ψ, B_t, P_t) is called a sample survey design.
In general, two types of errors, which arise during the process of sampling, have been observed in actual practice in the estimators:
( a ) Sampling errors; ( b ) Non-sampling errors.
Let us briefly explain these errors.
An error which arises due to sampling is called a sampling error. Let us explain this
with the help of the following example. For a population of size N = 4 , let the units
be A = 1 , B = 2, C = 3, and D = 4. The population mean is given by, Y = 2.5.
There are NC_n = 4C_2 = 6 possible samples, each of size n = 2. The units selected in the six samples are: (A, B), (A, C), (A, D), (B, C), (B, D), and (C, D). Thus the six sample means are given by:
Sample:   (A, B)   (A, C)   (A, D)   (B, C)   (B, D)   (C, D)
or:       (1, 2)   (1, 3)   (1, 4)   (2, 3)   (2, 4)   (3, 4)
Mean:     1.5      2.0      2.5      2.5      3.0      3.5
If we compare each of the sample means with the population mean separately, then we have the following cases: error of (A, B) = |1.5 − 2.5| = 1.0; error of (A, C) = |2.0 − 2.5| = 0.5; error of (A, D) = |2.5 − 2.5| = 0.0; error of (B, C) = |2.5 − 2.5| = 0.0; error of (B, D) = |3.0 − 2.5| = 0.5; error of (C, D) = |3.5 − 2.5| = 1.0. Note that we are measuring only two units out of four, i.e., we have only partial information in the sample, and therefore sampling error arises. One measure of the sampling error is the variance of the estimator. For example, the variance of the sample mean estimator, ȳ_t, is
The people from whom we get the information are called the respondents and the
people in the sample from whom we do not get information are called non-
respondents . The error which arises, when we fail to get the information is called
non-response error and the phenomenon is called non-response. This error arises
because we are not able to cover the whole sample. For example, suppose we want to interview 100 farmers and 5 of them do not allow us to interview them; then we interview only 95, so the sample is not complete. Such errors are called non-response errors.
1.51.2 MEASUREMENT ERRORS
The errors that we bring in measuring the characters are called measurement errors.
For example, suppose we want to measure the age of the respondents. Among the
respondents, some may report their age less than their actual age. These types of
errors are called measurement errors.
The errors which arise due to missing some numbers because of the non-availability of data, or due to recording some numbers wrongly while making a table, are called tabulation errors.
After the table is formed, we start our calculations. The errors committed in calculations are known as computational errors.
1.52 POINT ESTIMATOR
A point estimator endeavours to give the best single estimated value of the
parameter. For example, the average height of school children is 5.3 feet.
Thus there are two cases: If V(ȳ_t) is known, then for a large sample a (1 − α)100% confidence interval estimate for the population mean Ȳ is given by
ȳ_t ± z_{α/2} √V(ȳ_t), (1.54.4)
where z_{α/2} values are given in Table 3 of the Appendix; and if V(ȳ_t) is unknown, then for a small sample a (1 − α)100% confidence interval estimate for the population mean Ȳ is given by
ȳ_t ± t_{α/2}(df = n − 1) √v̂(ȳ_t), (1.54.5)
where t_{α/2}(df = n − 1) values are given in Table 2 of the Appendix, and df stands for degrees of freedom. Note that if α = 0.05 it represents a (1 − α)100% = (1 − 0.05)100% = 95% confidence interval.
( a ) When the population variance is known: The lower and upper limits are
( b ) When the population variance is not known: The lower and upper limits are
All possible samples, sample means and variances, lower and upper limits of the 95% confidence interval, and their coverage are given in the following table.
Thus we observed that the population mean Ȳ = 6.29 lies between L1 and U1 in 20 out of a total of 21 cases, and hence the observed proportion of confidence intervals containing the population mean = 20/21 = 0.9524. In other words, in 95.24% of cases the population mean lies within the confidence interval estimates when the variance is known. Thus the observed percentage is very close to the expected coverage of 95%.
Also we observed that, when the population variance is not known, the population mean lies between L2 and U2 16 times, and hence the observed proportion of the confidence interval estimates containing the population mean = 16/21 = 0.7619; that is, only 76.19% of the time does the population mean lie within the confidence interval estimates when the variance is unknown. Here the observed percentage of coverage is lower than the expected coverage of 95%. This may be due to the very small sample and population sizes. In practice, as the sample size becomes large (How large? Just smile, because there is no unique answer), the observed proportion of coverage in both cases converges to 95%.
0, that is, y_i = 1 (if i ∈ A) and y_i = 0 (if i ∈ A^c), then the sample mean ȳ_s also becomes the sample proportion p, as follows:
ȳ_t = (1/n) Σ_{i=1}^n y_i = (1/n)(1 + 0 + 1 + 0 + 0 + 1) = n_1/n = p, (1.56.1)
where n_1 denotes the number of units of the sample in the group A, and n denotes the total number of units in the sample. Note that the value of the sample proportion also lies between 0 and 1, that is, 0 ≤ p ≤ 1.
Example 1.57.1. Consider a class consisting of 6 students. Their names and major
are given in the following table:
Name      Major
Amy Math
Bob English
Chris Math
Don English
Erin Math
Frank English
( b ) How many SRSWOR samples, each of four units, will there be?
The possible combinations of choosing 4 objects out of 6 objects are given by:
6C4 = 6!/(4!(6 − 4)!) = (6 × 5 × 4 × 3 × 2 × 1)/(4 × 3 × 2 × 1 × 2 × 1) = 15.
Note that each combination can be taken as a without replacement sample, so the total number of distinct samples will be 15.
( c ) Sampling distribution of the estimate of proportion: Let us construct those 15 samples as follows:
The above table shows that the distribution of the estimates of proportion is symmetric, or approximately normal.
σ² = V(p̂) = [Σ_i P_i p̂_i²] − {E(p̂)}² = [P_1 p̂_1² + P_2 p̂_2² + P_3 p̂_3²] − {E(p̂)}²
= [(3/15) × 0.25² + (9/15) × 0.50² + (3/15) × 0.75²] − (0.5)²
= [0.0125 + 0.15 + 0.1125] − 0.25 = 0.275 − 0.25 = 0.025.
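The sampling distribution of the estimated proportion in Example 1.57.1 can be enumerated directly (taking the proportion of Math majors; since the class splits 3–3, either major yields the same distribution):

```python
from itertools import combinations

majors = {"Amy": "Math", "Bob": "English", "Chris": "Math",
          "Don": "English", "Erin": "Math", "Frank": "English"}

samples = list(combinations(majors, 4))     # all 15 SRSWOR samples of 4 students
props = [sum(majors[s] == "Math" for s in sample) / 4 for sample in samples]

E_p = sum(props) / len(props)                         # 0.5
V_p = sum(p ** 2 for p in props) / len(props) - E_p ** 2   # 0.025
print(len(samples), E_p, V_p)
```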
Example 1.57.2. Consider a class of 16 students taking a statistics course; their names, marks, and major subjects are given in the following table:
Solution. We have
Name      Marks (y)    y²
Ruth 92 8464
Ryan 97 9409
Tim 68 4624
Raul 62 3844
Marla 97 9409
Erin 68 4624
Judy 76 5776
Troy 75 5625
Tara 51 2601
Lisa 94 8836
John 70 4900
Cher 89 7921
Lona 62 3844
Gina 63 3969
Jeff 48 2304
Sara 97 9409
Sum                    95559
So our SRSWOR sample consists of four students = {Judy, Tara, Jeff, Ruth} .
Judy 76 5776
Tara 51 2601
Jeff 48 2304
Ruth 92 8464
Sum 267 19145
Thus n = 4, Σ_{i=1}^n y_i = 267 and Σ_{i=1}^n y_i² = 19145.
( b ) Sample mean:
ȳ_t = (Σ_{i=1}^n y_i)/n = 267/4 = 66.75 (statistic),
which is an estimate of the population mean.
Note that an estimator of the population total Y = Σ_{i=1}^N Y_i will be given by Ŷ = N ȳ_t = 16 × 66.75 = 1068 (statistic).
s_y² = [Σ_{i=1}^n y_i² − (Σ_{i=1}^n y_i)²/n]/(n − 1) = [19145 − (267)²/4]/(4 − 1) = 440.91 (statistic).
( e ) An estimator of the variance of the estimator of the population mean is
v̂(ȳ_t) = ((N − n)/(Nn)) s_y² = ((16 − 4)/(16 × 4)) × 440.91 = 82.67 (statistic).
( f ) Here the 95% confidence interval is given by
ȳ_t ± 1.96 √v̂(ȳ_t), or 66.75 ± 1.96 √82.67, or 66.75 ± 17.82, or [48.93, 84.57].
Yes, the true population mean Ȳ = 75.56 lies in the 95% confidence interval estimate. The interpretation of a 95% confidence interval is that we are 95% sure that the true mean lies within the two limits of this interval estimate. Note that an interval estimate is a statistic.
where t_{α/2}(df = n − 1) = t_{0.025}(df = 3) = 3.182 is taken from Table 2 of the Appendix. Yes, again the true population mean lies in this 95% confidence interval, and its interpretation is the same as above. Again note that an interval estimate is a statistic.
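Parts (b) through (e) of this example can be recomputed as follows, feeding the variance estimate of part (e) into the large-sample interval (1.54.4); note that s_y² = 440.9167 is truncated to 440.91 in the text.

```python
import math

y = [76, 51, 48, 92]       # marks of Judy, Tara, Jeff, Ruth
N, n = 16, len(y)

ybar = sum(y) / n                                           # 66.75
s2 = (sum(v * v for v in y) - sum(y) ** 2 / n) / (n - 1)    # 440.9167
v_hat = (N - n) / (N * n) * s2                              # 82.67
half_width = 1.96 * math.sqrt(v_hat)

print(ybar, round(s2, 2), round(v_hat, 2))
print(f"95% CI: [{ybar - half_width:.2f}, {ybar + half_width:.2f}]")
```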
3.
( a ) Let us give an upper case 'FLAG' of 1 to English majors and 0 to Math majors in the whole population; then we have
1    Ruth     Math       0
2    Ryan     Math       0
3    Tim      English    1
4    Raul     Math       0
5    Marla    English    1
6    Erin     Math       0
7    Judy     English    1
8    Troy     English    1
9    Tara     Math       0
10   Lisa     Math       0
11   John     Math       0
12   Cher     English    1
13   Lona     Math       0
14   Gina     Math       0
15   Jeff     Math       0
16   Sara     Math       0
Population Proportion:
P = (Σ_{i=1}^N FLAG_i)/N = (No. of students with English major)/(Total no. of students) = 5/16 = 0.3125 (parameter).
( b ) Let us now give the same lower case 'flag' to the students in the sample.
Name     Major      flag
Judy     English    1
Tara     Math       0
Jeff     Math       0
Ruth     Math       0
Sum                 1
Note that a proportion can never be negative, so the lower limit has been changed to 0.
Caution! It must be noted that we have a very small sample here, but in practice, when we deal with the problem of estimation of a proportion, a minimum sample size of 30 units is recommended from large populations. Note that instead of using 'FLAG' or 'flag', sometimes we assign the codes 0 or 1 directly to the variable Y or X.
Sample Number    Sample                       p̂
1                Cher, John, Marla, Sara      0.25
2                Erin, Judy, Raul, Tara       0.25
3                Gina, Lisa, Ruth, Tim        0.25
4                Jeff, Lona, Ryan, Troy       0.25
(ȳ_t − Ȳ)²:    161.03    127.91    13.61    25.60    Sum = 328.18
MSE(ȳ_t) = Σ_{t=1}^4 P_t (ȳ_t − Ȳ)² = (1/4) × 328.18 = 82.045,
where Ȳ = 75.56.
Thus we have
E(ȳ_t) = Σ_{t=1}^{13} P_t ȳ_t = (1/13) × 962.3 = 74.02,
and
B(ȳ_t) = E(ȳ_t) − Ȳ = 74.02 − 75.56 = −1.54,
V(ȳ_t) = Σ_{t=1}^{13} P_t {ȳ_t − E(ȳ_t)}² = (1/13) × 237.8077 = 18.2929,
and
MSE(ȳ_t) = Σ_{t=1}^{13} P_t (ȳ_t − Ȳ)² = (1/13) × 268.76956 = 20.675.
Although John's sampling scheme is less biased, it has a much larger mean square error than Mike's sampling scheme. Thus we shall prefer Mike's sampling scheme over John's sampling scheme. Also note that the relative efficiency of Mike's sampling scheme over John's sampling scheme is given by
RE = MSE(ȳ_t)_John × 100 / MSE(ȳ_t)_Mike = 82.045 × 100 / 20.675 = 396.83%.
Thus one can say that Mike's sampling plan is almost four times as efficient as John's sampling scheme.
1.58 RELATIVE STANDARD ERROR
where RV(θ̂) = V(θ̂)/[E(θ̂)]² denotes the relative variance of the estimator θ̂. Another well known name for the relative standard error is the coefficient of variation.
1.59 AUXILIARY INFORMATION
[Figure: scatter plots of Y against X illustrating ρ_xy = +1 and ρ_xy = −1]
Note that a similar scatter plot can be made from sample values to find the sign of the sample correlation coefficient r_xy.
( c ) The population regression coefficient of Y on X is defined as
β = Cov(X, Y)/V(X). (1.59.5)
For simple random sampling, it is given by
β = S_xy / S_x². (1.59.6)
A biased estimator of β is given by
b = s_xy / s_x², (1.59.7)
which in fact represents the change in the study variable Y with a unit change in the auxiliary variable X. Note that the sign of β (or b) is the same as that of ρ_xy (or r_xy).
From the above table, Σ_{i=1}^N (Y_i − Ȳ)² = 88, Σ_{i=1}^N (X_i − X̄)² = 52 and Σ_{i=1}^N (Y_i − Ȳ)(X_i − X̄) = 65, so that
S_y² = (1/(N − 1)) Σ_{i=1}^N (Y_i − Ȳ)² = 88/(5 − 1) = 22,
S_x² = (1/(N − 1)) Σ_{i=1}^N (X_i − X̄)² = 52/(5 − 1) = 13,
S_xy = (1/(N − 1)) Σ_{i=1}^N (Y_i − Ȳ)(X_i − X̄) = 65/(5 − 1) = 16.25,
ρ_xy = S_xy / √(S_x² S_y²) = 16.25/√(13 × 22) = 0.960,
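These computations can be checked numerically from the stated sums (the raw table of the five units is not reproduced here):

```python
import math

# Given in the text: N = 5, sum (Y - Ybar)^2 = 88, sum (X - Xbar)^2 = 52,
# sum (Y - Ybar)(X - Xbar) = 65.
N = 5
Syy, Sxx, Sxy_sum = 88, 52, 65

S2_y = Syy / (N - 1)          # 22.0
S2_x = Sxx / (N - 1)          # 13.0
S_xy = Sxy_sum / (N - 1)      # 16.25
rho = S_xy / math.sqrt(S2_x * S2_y)   # about 0.96
beta = S_xy / S2_x            # 1.25

print(S2_y, S2_x, S_xy, round(rho, 3), beta)
```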
[Table: unit-wise computations (continued on the next page)]
E(x̄) = 19 = X̄, that is, the sample mean x̄ is unbiased for the population mean of the auxiliary variable;
E(s_y²) = 22.00 = S_y², that is, the sample variance s_y² is unbiased for the population S_y² of the study variable;
E(s_x²) = 13.00 = S_x², that is, the sample variance s_x² is unbiased for the population S_x² of the auxiliary variable;
E(s_xy) = 16.25 = S_xy, that is, the sample covariance s_xy is unbiased for the population S_xy of both variables;
E(r_xy) = 0.964 ≠ ρ_xy, that is, the sample r_xy is biased for the population ρ_xy, and
B(r_xy) = E(r_xy) − ρ_xy = 0.964 − 0.960 = 0.004;
and
E(b) = 1.40 ≠ β, that is, the sample b is biased for the population β, and
B(b) = E(b) − β = 1.40 − 1.25 = 0.15.
( c ) The covariance between ȳ and x̄ is defined as:
Cov(ȳ, x̄) = E[{ȳ − E(ȳ)}{x̄ − E(x̄)}] = E[(ȳ − Ȳ)(x̄ − X̄)] = Σ_s P_s (ȳ_s − Ȳ)(x̄_s − X̄).
Now we have
( d ) Now we have
((N − n)/(Nn)) S_xy = ((5 − 3)/(5 × 3)) × 16.25 = 2.16667.
Thus we have
Cov(ȳ, x̄) = ((N − n)/(Nn)) S_xy = ((1 − f)/n) S_xy, where f = n/N.
If x and y are two random variables and c and d are two real constants, then:
( a ) V(cx) = c² V(x). (1.60.1)
( c ) If x = Σ_{i=1}^n x_i, where the x_i are also random variables, then we have
( d ) If x = Σ_{i=1}^n c_i x_i, where the c_i are real constants, then we have
V(x) = V(Σ_{i=1}^n c_i x_i) = Σ_{i=1}^n c_i² V(x_i).
( e ) If x = Σ_{i=1}^n c_i x_i and y = Σ_{i=1}^n d_i y_i, where the c_i and d_i are real constants, then we have
These are parameters which deal with arranging the data in ascending or descending order, and we introduce a few of them here as follows:
It is a measure which divides the population into exactly two equal parts, and it is denoted by M_y. Its analogue from the sample is called the sample median, and is denoted by M̂_y. A pictorial representation is given below:
[Diagram: ordered data running from the minimum value to the maximum value, with the median dividing it into two equal parts]
( ii ) If the sample size n is even, then the average of the values at the (n/2)th and (n/2 + 1)th positions of the ordered data is called the sample median. As an illustration, consider a sample consisting of n = 6 (even) observations: 50, 90, 30, 60, 70 and 20. The first step is to arrange the data in ascending order: 20, 30, 50, 60, 70, 90. The second step is to pick two values: one at the (n/2)th = (6/2)th = 3rd position = 50, and the second at the (n/2 + 1)th = (6/2 + 1)th = 4th position = 60. Then the average of these two values, (50 + 60)/2 = 55, is the sample median.
These are three measures which divide the population into four equal parts. The i-th quartile is represented by Q_i, i = 1, 2, 3. A pictorial representation is given below:
[Diagram: ordered data from the minimum value to the maximum value, divided into four equal parts by Q_1, Q_2 and Q_3]
Note that the second quartile Q_2 is the median. The first quartile Q_1 is the median of the data less than or equal to the second quartile Q_2, and the third quartile Q_3 is the median of the data greater than or equal to the second quartile Q_2. Thus finding the three quartiles requires finding the median three times from the given ordered data. The population interquartile range is defined as: Q_3 − Q_1. The sample analogues of the population quartiles are called sample quartiles and are denoted by Q̂_i, i = 1, 2, 3, and the sample interquartile range is defined as: Q̂_3 − Q̂_1, which is a measure of variation in the data set.
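The rules above for the sample median and quartiles can be sketched as:

```python
def median(data):
    """Median of a sorted copy: middle value, or average of the two middle values."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(data):
    """Quartiles following the rule in the text: Q2 is the median, Q1 the median
    of the values <= Q2, and Q3 the median of the values >= Q2."""
    s = sorted(data)
    q2 = median(s)
    q1 = median([v for v in s if v <= q2])
    q3 = median([v for v in s if v >= q2])
    return q1, q2, q3

sample = [50, 90, 30, 60, 70, 20]   # the data from the median illustration
print(median(sample))               # -> 55.0
print(quartiles(sample))            # -> (30, 55.0, 70)
```

The interquartile range for this sample is 70 − 30 = 40.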
1.61.3 POPULATION PERCENTILES
These are 99 measures which divide the population into 100 equal parts. The i-th population percentile is represented by P_i, i = 1, 2, ..., 99, and its pictorial representation is given below:
[Diagram: ordered data divided into 100 equal parts of 1% each]
and its sample analogue is called the sample mode and is denoted by M̂_0. As an illustration, for the data set 60, 70, 30, 60, 30, 30, 80, 30, the mode is 30, because it occurs most frequently.
1.62 DEFINITION(S) OF STATISTICS
There are several definitions of statistics, and we list a few of them as follows:
A few people have the following types of views in their minds about statistics:
( a ) Statistics can prove anything;
( b ) There are three types of lies --- lies, damned lies, and statistics;
( c ) Statistics are like clay, of which one can make a God or a devil as one pleases;
( d ) It is only a tool, and cannot prove or disprove anything.
Statistics has scope in almost every field into which our social setup is divided, for example: Trade, Industry, Commerce, Economics, Biology, Botany, Astronomy, Physics, Chemistry, Education, Medicine, Sociology, Psychology, Religious studies, Meteorology, National defence, and Business: Production, Sales, Purchase, Finance, Accounting, Quality control, etc.
EXERCISES
Exercise 1.1. Define the terms population, parameter, sample, and statistic.
Exercise 1.3. Describe the relationship between the variance and the mean squared error of an estimator. Hence deduce the term relative efficiency.
Exercise 1.4. You are required to plan a sample survey to study the environmental
activities of a business in the United States. Suggest a suitable survey plan on the
following points : ( a ) sampling units; ( b ) sampling frame ; ( c ) method of
sampling; and ( d ) method of collecting information. Prepare a suitable
questionnaire which may be used to collect the required information.
Exercise 1.5. Define population, sampling unit and sampling frame for conducting
surveys on each of the following subjects. Mention other possible sampling units,
if any, in each case and discuss their relative merits .
( a ) Housing conditions in the United States.
( b ) Study of incidence of lung cancer and heart attacks in the United States .
(c) Measurement of the volume of timber available in the forests of Canberra .
( d ) Study of the birth rate in India.
( e ) Study of nutrient contents of food consumed by the residents of California.
( f) Labour manpower of large businesses in Canada .
( g ) Estimation of population density in India.
Exercise 1.8. Show that the sample variance s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 can be put in
different ways as
s_y^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right] = \frac{1}{n-1}\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n(n-1)},
and that, analogously, the population mean square S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 can be written as
S_y^2 = \frac{1}{N-1}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right] = \frac{1}{N-1}\sum_{i=1}^{N} Y_i^2 - \frac{\left(\sum_{i=1}^{N} Y_i\right)^2}{N(N-1)}.
Exercise 1.10. Construct a sample space and tree diagram for each one of the
following situations:
( a) Toss a fair coin; ( b ) Toss two fair coins ; ( c ) Toss a fair die; ( d ) Toss a fair
coin and a fair die; (e) Toss two fair dice ; and (f) Toss a fair die and a fair coin .
Exercise 1.11. State what type of variable each of the following is. If a variable is
quantitative, say whether it is discrete or continuous; and if the variable is
qualitative say whether it is nominal or ordinal.
1 Religious preference.
2 Amount of water in a glass.
3 Master card number.
4 Number of students in a class of 32 who turn in assignments on time.
5 Brand of personal computer.
6 Amount of fluid dispensed by a machine used to fill cups with chocolate.
7 Number of graduate applications in statistics each year at the SCSU .
8 Amount of time required to drive a car for 35 miles.
9 Room temperature recorded every half hour.
10 Weight of letters to be mailed.
11 Taste of milk.
12 Occupation list.
13 Coded numbers for different colors, e.g., Red--1, Green--2, and Pink--3.
14 Average daily low temperature per year in the city of St. Cloud.
15 Nationality of the students in your university.
16 Phone number.
17 Rent paid by the tenant.
18 Frog jump in cms.
19 Colors of marbles .
PRACTICAL PROBLEMS
Practical 1.1. From a population of size 5 how many samples of size 2 can be
drawn by using ( a ) SRSWR and ( b ) SRSWOR sampling?
Practical 1.2. Mr. Bean selects all possible samples of two units from a population
consisting of four units, viz. 10, 15, 20, 25, by using SRSWOR sampling. He noted
that the harmonic mean of this population is given by
H_Y = \frac{N}{\sum_{i=1}^{N} 1/Y_i} = \frac{4}{1/10 + 1/15 + 1/20 + 1/25} = 15.58442.
The total number of possible samples is {}^{N}C_{n} = {}^{4}C_{2} = 6, and these samples are given by
(10, 15), (10, 20), (10, 25), (15, 20), (15, 25) and (20, 25).
The harmonic means H_t = n / \sum_{i \in s_t} (1/y_i) for these samples are, respectively,
H_1 = 2/(1/10 + 1/15) = 12,        H_2 = 2/(1/10 + 1/20) = 13.33333,
H_3 = 2/(1/10 + 1/25) = 14.28571,  H_4 = 2/(1/15 + 1/20) = 17.14286,
H_5 = 2/(1/15 + 1/25) = 18.75,     H_6 = 2/(1/20 + 1/25) = 22.22222.
Mr. Bean took the harmonic mean of these six sample harmonic means, as follows :
HM = \frac{6}{\sum_{t=1}^{6} 1/H_t} = \frac{6}{1/12 + 1/13.33333 + 1/14.28571 + 1/17.14286 + 1/18.75 + 1/22.22222}
= 15.58442 = H_Y.
( a ) Check whether the sample harmonic mean is an unbiased estimator of the
population harmonic mean, that is, whether E(\hat{H}) = H_Y.
Hint: Expected value: E(\hat{H}) = \sum_{t=1}^{s(n)} p_t \hat{H}_t,
with p_t = \frac{1}{s(n)} \; \forall \; t = 1, 2, ..., s(n), where s(n) = 6 is the number of possible samples.
( c ) Find the bias, variance, and mean square error of the estimator \hat{H}.
( d ) Does the relation MSE(\hat{H}) = V(\hat{H}) + \{B(\hat{H})\}^2 hold?
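Mr. Bean's enumeration is easy to verify with a short script (a sketch in Python; the population 10, 15, 20, 25 and the sample size n = 2 are those of the practical, and the script also evaluates E(\hat{H}) for part ( a )):

```python
from itertools import combinations

def harmonic_mean(values):
    """Harmonic mean: the number of values divided by the sum of reciprocals."""
    return len(values) / sum(1.0 / v for v in values)

population = [10, 15, 20, 25]                 # Mr. Bean's population
samples = list(combinations(population, 2))   # all 4C2 = 6 SRSWOR samples

sample_hms = [harmonic_mean(s) for s in samples]
pop_hm = harmonic_mean(population)               # population harmonic mean H_Y
hm_of_hms = harmonic_mean(sample_hms)            # Mr. Bean's quantity
expected_hm = sum(sample_hms) / len(sample_hms)  # E(H-hat) with p_t = 1/6

print(round(pop_hm, 5), round(hm_of_hms, 5), round(expected_hm, 5))
# → 15.58442 15.58442 16.28902
```

The harmonic mean of the six sample harmonic means reproduces H_Y = 15.58442, but the expected value E(\hat{H}) = 16.28902 does not, which settles part ( a ).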
Practical 1.3. Suppose that a population consists of 5 units given by: 10, 15, 20, 25,
and 30. Select all possible samples of 3 units using SRSWR and SRSWOR
sampling.
( a ) Show that the sample mean is an unbiased estimator of population mean in
each case.
( b ) Show that the sample variance is an unbiased estimator of the population
variance under SRSWR sampling, and of the population mean square under
SRSWOR sampling.
( c ) Also plot the sampl ing distribution of sample mean and sample variance in
each situation.
( d ) Find the variance of the sample mean under SRSWOR sampling using the
definition of variance. Show all steps.
( e ) Also compute \left(\frac{N-n}{Nn}\right)S_y^2 and comment on it.
Practical 1.4. Repeat Mr. Bean's exercise with the geometric mean (GM), defined
for a sample as GM = \left(\prod_{i=1}^{n} y_i\right)^{1/n}, and comment on the results.
Practical 1.6. Suppose an urn contains N balls of which Np are black and Nq are
white, so that p + q = 1. The probability that, if n balls are drawn without
replacement, exactly x of them will be black is given by
P(x) = \frac{{}^{Np}C_{x}\;{}^{Nq}C_{n-x}}{{}^{N}C_{n}},
such that 0 \le x \le Np and 0 \le n - x \le Nq. Using the concept of the c.d.f., select a
sample of three units by using without replacement sampling.
Hint: Hypergeometric distribution.
Use the first 6 columns, multiplied by 10^{-6}, as the values of the cumulative
distribution function (c.d.f.) F(x) of the random variable x, and select a random
sample of 15 units by using with replacement sampling.
Hint: x = 100 + \tan[\pi(F(x) - 0.5)].
66 Advanced sampling theory with applications
population is given by
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}, \quad -\infty < x < +\infty,
with \mu = 100 and \sigma = 2.5. Use the first 6 columns, multiplied by 10^{-6}, as the values
of the cumulative distribution function (c.d.f.) F(x) of the random variable x, and
select a random sample of 15 units by using with replacement sampling.
Hint: x = \mu + z\sigma, where z \sim N(0, 1).
Practical 1.12. In the hope of preventing ecological damage from oil spills, a
biochemical company is developing an enzyme to break up oil into less harmful
chemicals. The table below shows the time it took for the enzyme to break up oil
samples at different temperatures. The researcher plans to use these data in
statistical analysis:
( a ) If you are a consultant, which variable will you consider dependent and which
independent? Denote your dependent variable by Y and your independent variable
by X.
( b ) Assuming that these six observations form a population, compute the following
parameters:
\bar{Y}, \; \bar{X}, \; S_y^2, \; S_x^2, \; S_y, \; S_x, \; C_y = \frac{S_y}{\bar{Y}}, \; C_x = \frac{S_x}{\bar{X}}, \; S_{xy}, \; \rho_{xy} = \frac{S_{xy}}{S_x S_y}, \; \beta = \frac{S_{xy}}{S_x^2}, \; \text{and} \; K = \rho_{xy}\frac{C_y}{C_x}.
( g ) Construct a 95% confidence interval assuming that the population mean square
is unknown and the sample size is small. Does the population mean fall in it?
Interpret it.
( h ) Find the variance of the estimator of the proportion of countries having a
suicide rate of more than 25%.
Practical 1.15. Consider a population consisting of the following six units:
Assume that these 10 observations form a sample and compute the following statistics:
\bar{y}, \; \bar{x}, \; s_y^2, \; s_x^2, \; s_y, \; s_x, \; c_y = \frac{s_y}{\bar{y}}, \; c_x = \frac{s_x}{\bar{x}}, \; s_{xy}, \; r_{xy} = \frac{s_{xy}}{s_x s_y}, \; \text{and} \; b = \frac{s_{xy}}{s_x^2}.
Practical 1.17. The following data show the daily temperatures in New York over
a period of two weeks:
Find the following: sample size; sample mean; median; mode; first quartile ; second
quartile; third quartile; minimum value; maximum value; and interquartile range .
Practical 1.18. Construct scatter diagrams and find the linear correlation coefficient
for each of the following five samples, each of five units, and comment on the
different situations that arise:
Practical 1.19. The following balloon is filled with five gases with different
atomic numbers and atomic weights.
( a ) Find the average atomic weight of all the gases in the balloon;
( b ) Find the population variance \sigma^2 of the atomic weight of all the gases in the
balloon;
( c ) Select all possible with replacement samples each consisting of two gases;
( d ) Estimate the average atomic weight from each one of the 25 samples;
( e ) Construct a frequency distribution table of all possible sample means;
( f ) Construct a histogram. Is it symmetric?;
( g ) Find the expected value of all sample means of atomic weights from the
frequency distribution table you developed ;
( h ) Find the variance of all the sample means of atomic weights from the
frequency distribution table you developed.
Practical 1.20. Consider a sample y_1, y_2, ..., y_n and let \bar{y}_k and s_k^2 denote the sample
mean and variance, respectively, of the first k observations.
( a ) Show that
s_{k+1}^2 = \frac{k-1}{k}s_k^2 + \frac{1}{k+1}\left(y_{k+1} - \bar{y}_k\right)^2.
( b ) Suppose that a sample of 15 observations has sample mean 12.60 and sample
standard deviation 0.50. If the 16th observation of the data set is 10.2, what will be
the values of the sample mean and sample standard deviation for all 16
observations?
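The recursion in part ( a ) can be checked, and part ( b ) computed, with a short sketch (Python; the function name is illustrative):

```python
def update_stats(k, mean_k, s2_k, y_next):
    """Update the sample mean and variance (divisor k - 1) of the first k
    observations when the (k+1)-th observation y_next arrives."""
    mean_next = (k * mean_k + y_next) / (k + 1)
    s2_next = ((k - 1) / k) * s2_k + (y_next - mean_k) ** 2 / (k + 1)
    return mean_next, s2_next

# Part (b): 15 observations with mean 12.60 and sd 0.50; the 16th is 10.2.
mean16, s2_16 = update_stats(15, 12.60, 0.50 ** 2, 10.2)
print(round(mean16, 2), round(s2_16 ** 0.5, 4))  # → 12.45 0.7703
```

So the updated mean is 12.45 and the updated standard deviation is about 0.77.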
2. SIMPLE RANDOM SAMPLING
2.0 INTRODUCTION
Simple Random Sampling (SRS) is the simplest and most common method of
selecting a sample, in which the sample is selected unit by unit, with equal
probability of selection for each unit at each draw. In other words, simple random
sampling is a method of selecting a sample s of n units from a population \Omega of
size N by giving equal probability of selection to all units. It is a sampling scheme
in which all possible combinations of n units may be formed from the population
of N units with the same chance of selection.
As discussed in Chapter 1:
( a ) If a unit is selected, observed, and replaced in the population before the next
draw is made and the procedure is repeated n times, it gives rise to a simple
random sample of n units. This procedure is known as simple random sampling
with replacement and is denoted as SRSWR.
( b ) If a unit is selected, observed, and not replaced in the population before
making the next draw, and the procedure is repeated until n distinct units are
selected, ignoring all repetitions, it is called simple random sampling without
replacement and is denoted by SRSWOR. Let us discuss the properties of the
estimators of population mean, variance, and proportion in each of these cases.
E(\bar{y}_n) = E\left[\frac{1}{n}\sum_{i=1}^{n} y_i\right] = \frac{1}{n}\sum_{i=1}^{n} E(y_i). \quad (2.1.1)
Now y_i is a random variable, and each unit has been selected by SRSWR sampling;
therefore y_i can take the values Y_1, Y_2, ..., Y_N with probabilities 1/N, 1/N, ..., 1/N. By
the definition of the expected value we have
E(y_i) = \frac{1}{N}\sum_{j=1}^{N} Y_j = \bar{Y}.
Thus (2.1.1) implies
E(\bar{y}_n) = \frac{1}{n}\sum_{i=1}^{n}\bar{Y} = \bar{Y}. \quad (2.1.2)
Proof. We have
E(\hat{Y}_n) = E[N\bar{y}_n] = N E(\bar{y}_n) = N\bar{Y} = Y, \quad (2.1.3)
which proves the corollary.
Theorem 2.1.2. The variance of the estimator \bar{y}_n of the population mean \bar{Y} is
V(\bar{y}_n) = \frac{\sigma_y^2}{n}, \quad \text{where} \quad \sigma_y^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right]. \quad (2.1.4)
Proof. We have
V(\bar{y}_n) = V\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(y_i). \quad (2.1.5)
By the definition of variance we have
V(y_i) = E[y_i - E(y_i)]^2 = E(y_i^2) - \{E(y_i)\}^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2
= \frac{1}{N}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right] = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \sigma_y^2.
Using (2.1.5) we have V(\bar{y}_n) = \sigma_y^2/n. Hence the theorem.
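Theorem 2.1.2 can be verified by brute-force enumeration of every with-replacement sample of a small population (a sketch; the population values here are arbitrary):

```python
from itertools import product

population = [1.0, 2.0, 3.0]
N, n = len(population), 2

# All N**n equally likely SRSWR samples and their sample means.
sample_means = [sum(s) / n for s in product(population, repeat=n)]

Ybar = sum(population) / N
sigma2 = sum((y - Ybar) ** 2 for y in population) / N   # sigma_y^2, divisor N

E_mean = sum(sample_means) / len(sample_means)
V_mean = sum((m - E_mean) ** 2 for m in sample_means) / len(sample_means)
print(E_mean, V_mean, sigma2 / n)   # V(ybar_n) equals sigma_y^2 / n
```

The exact variance over all 9 samples agrees with \sigma_y^2/n, as the theorem asserts.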
where
s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y}_n)^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}_n^2\right].
Note that
E(\bar{y}_n^2) = V(\bar{y}_n) + \bar{Y}^2 = \frac{\sigma_y^2}{n} + \bar{Y}^2
and
E(s_y^2) = E\left[\frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}_n^2\right)\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} E(y_i^2) - nE(\bar{y}_n^2)\right]
= \frac{1}{n-1}\left[n\left(\frac{1}{N}\sum_{i=1}^{N} Y_i^2\right) - n\left(\frac{\sigma_y^2}{n} + \bar{Y}^2\right)\right]
= \frac{n}{n-1}\left[\frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 - \frac{\sigma_y^2}{n}\right] = \frac{n}{n-1}\left[\sigma_y^2 - \frac{\sigma_y^2}{n}\right] = \sigma_y^2.
Therefore
E[v(\bar{y}_n)] = \frac{1}{n}E(s_y^2) = \frac{\sigma_y^2}{n} = V(\bar{y}_n).
Hence the theorem.
Corollary 2.1.2. The variance of the estimator \hat{Y}_n = N\bar{y}_n of the population total is
V(\hat{Y}_n) = N^2 V(\bar{y}_n).
Theorem 2.1.4. Under SRSWR sampling, while estimating the population mean (or
total), the minimum sample size for which the relative standard error (RSE) does not
exceed \phi is given by
n \ge \frac{\sigma_y^2}{\phi^2\bar{Y}^2}. \quad (2.1.8)
Proof. The relative standard error of the estimator \bar{y}_n is given by
RSE(\bar{y}_n) = \frac{\sqrt{V(\bar{y}_n)}}{\bar{Y}} = \frac{\sigma_y}{\sqrt{n}\,\bar{Y}}.
We need an estimator \bar{y}_n such that RSE(\bar{y}_n) \le \phi, which implies that
\frac{\sigma_y^2}{n\bar{Y}^2} \le \phi^2, \quad \text{or} \quad n \ge \frac{\sigma_y^2}{\phi^2\bar{Y}^2}.
Hence the theorem.
Remark 2.1: If \phi = \frac{e}{Z_{\alpha/2}\bar{Y}} with e = Z_{\alpha/2}\frac{\sigma_y}{\sqrt{n}}, then P\left[\left|\bar{y}_n - \bar{Y}\right| \le e\right] = 1 - \alpha.
Example 2.1.1. In 1995, a fisherman selected an SRSWR sample of six kinds of
fish out of the 69 kinds of fish available at the Atlantic and Gulf coasts, as given below:

Sample unit      y_i        y_i^2
1                13859      192071881
2                3489       12173121
3                2319       5377761
4                3688       13601344
5                16238      263672644
6                3688       13601344
Sum              43281      500498095

Thus
\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{43281}{6} = 7213.5, \qquad s_y^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}_n^2\right] = 37658120.3,
and
v(\bar{y}_n) = \frac{s_y^2}{n} = \frac{37658120.3}{6} = 6276353.38.
Using Table 2 from the Appendix, the 95% confidence interval for the average
number of fish is given by
\bar{y}_n \pm t_{0.05/2}(df = 6 - 1)\sqrt{v(\bar{y}_n)}, \quad \text{or} \quad 7213.5 \pm 2.571\sqrt{6276353.38}, \quad \text{or} \quad [772.46, 13654.53].
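The computations of Example 2.1.1 can be reproduced with a short sketch (2.571 is the tabulated t value with 5 degrees of freedom used in the text):

```python
import math

y = [13859, 3489, 2319, 3688, 16238, 3688]  # the six sampled catch counts
n = len(y)

ybar = sum(y) / n
s2 = (sum(v * v for v in y) - n * ybar ** 2) / (n - 1)
v_ybar = s2 / n                             # estimated variance of the mean, SRSWR

t = 2.571                                   # t_{0.025} with 6 - 1 = 5 df
half = t * math.sqrt(v_ybar)
print(round(ybar, 1), round(s2, 1), round(v_ybar, 2))
# → 7213.5 37658120.3 6276353.38
print(round(ybar - half, 2), round(ybar + half, 2))
```

The interval agrees with the text up to rounding in the last digit of the upper limit.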
Example 2.1.2. We wish to estimate the average number of fish in each one of the
species groups caught by marine recreational fishermen at the Atlantic and Gulf
coasts. There are 69 species groups caught during 1995 as shown in the population
4 in the Appendix. What is the minimum number of species groups to be selected
by SRSWR sampling to attain the accuracy of relative standard error 30%?
Given: S_y^2 = 37199578 and Y = 311528, so that \bar{Y} = Y/N = 311528/69 = 4514.898. With \sigma_y^2 = \frac{N-1}{N}S_y^2 and \phi = 0.30,
n \ge \frac{\sigma_y^2}{\phi^2\bar{Y}^2} = \frac{(68/69) \times 37199578}{0.30^2 \times 4514.898^2} = 19.98 \approx 20.
Thus a sample of size n = 20 units is required to attain 30% relative standard error
of the estimator of population mean under SRSWR sampling.
Example 2.1.3. Select an SRSWR sample of twenty units from population 4 given
in the Appendix . Collect the information on the number of fish during 1995 in each
of the species group selected in the sample. Estimate the average number of fish in
each one of the species groups caught by marine recreational fishermen at Atlantic
and Gulf coasts during 1995. Construct the 95% confidence interval for the average
number of fish in each species group available in the United States.
Solution. The population size is N = 69, thus we used the first two columns of the
Pseudo-Random Number (PRN) Table 1 given in the Appendix to select 20 random
numbers between 1 and 69. The random numbers so selected are 58, 60, 54, 01, 69,
62, 23, 64, 46, 04, 32, 47, 57, 56, 57, 60, 33, 05, 22, and 38.
Example 2.1.4. The depth y of the roots of plants in a field is uniformly distributed
between 5 cm and 8 cm with the probability density function
f(y) = \frac{1}{3}, \quad 5 < y < 8.
We wish to estimate the average length of roots of the plants with an accuracy of
relative standard error of 5%, what is the required minimum with replacement
sample size n?
Example 2.1.5. The depth y of the roots of plants in a field is uniformly distributed
between 5 cm and 8 cm with the probability density function
A (1 - \alpha)100% confidence interval for the average depth of roots in the field is
\bar{y}_n \pm t_{\alpha/2}(df = n - 1)\sqrt{v(\bar{y}_n)}.
Using Table 2 from the Appendix, the 95% confidence interval estimate of the
average depth of the roots is given by
\bar{y}_n \pm t_{0.025}(df = 7 - 1)\sqrt{v(\bar{y}_n)}, \quad \text{or} \quad 6.8711 \pm 2.447\sqrt{0.1309}, \quad \text{or} \quad [5.9857, 7.756].
Theorem 2.1.5. The covariance between two sample means \bar{y}_n and \bar{x}_n under
SRSWR sampling is:
Cov(\bar{y}_n, \bar{x}_n) = \frac{\sigma_{xy}}{n}, \quad (2.1.10)
where
\sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})(X_i - \bar{X}).
Proof. We have
Cov(\bar{y}_n, \bar{x}_n) = Cov\left(\frac{1}{n}\sum_{i=1}^{n} y_i, \frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Cov(y_i, x_i). \quad (2.1.11)
Now
Cov(y_i, x_i) = E(y_i x_i) - E(y_i)E(x_i). \quad (2.1.12)
The random variables (y_i x_i), y_i and x_i can take any one of the values (Y_j X_j), Y_j and
X_j for j = 1, 2, ..., N, each with probability 1/N. Thus we have
Cov(y_i, x_i) = \frac{1}{N}\sum_{j=1}^{N} Y_j X_j - \bar{Y}\bar{X} = \frac{1}{N}\sum_{j=1}^{N}(Y_j - \bar{Y})(X_j - \bar{X}) = \sigma_{xy},
so that (2.1.11) gives Cov(\bar{y}_n, \bar{x}_n) = \sigma_{xy}/n.
Further, for the sample covariance s_{xy} = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i x_i - n\bar{y}_n\bar{x}_n\right] we have
E(s_{xy}) = \frac{1}{n-1}\left\{\sum_{i=1}^{n} E(y_i x_i) - nE(\bar{y}_n\bar{x}_n)\right\}
= \frac{1}{n-1}\left\{\frac{n}{N}\sum_{i=1}^{N} Y_i X_i - n\left(Cov(\bar{y}_n, \bar{x}_n) + \bar{Y}\bar{X}\right)\right\}
= \frac{1}{n-1}\left\{\frac{n}{N}\sum_{i=1}^{N} Y_i X_i - n\left(\frac{\sigma_{xy}}{n} + \bar{Y}\bar{X}\right)\right\} = \frac{n}{n-1}\left\{\frac{1}{N}\sum_{i=1}^{N} Y_i X_i - \bar{Y}\bar{X} - \frac{\sigma_{xy}}{n}\right\}
= \frac{n}{n-1}\left\{\sigma_{xy} - \frac{\sigma_{xy}}{n}\right\} = \sigma_{xy}.
Hence the theorem.
Theorem 2.2.1. The sample mean \bar{y}_n = \frac{1}{n}\sum_{i=1}^{n} y_i is an unbiased estimator of the
population mean \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i.
Proof. We have to show that E(\bar{y}_n) = \bar{Y}. It is interesting to note that this result can
be proved by using three different methods as shown below.
Method I. Write
\bar{y}_n = \frac{1}{n}\sum_{i=1}^{N} t_i Y_i, \quad (2.2.1)
where t_i takes the value 1 if the i-th population unit is selected in the sample and
0 otherwise.
Note that Y_i is a fixed value in the population for the i-th unit; therefore, the
expected value of (2.2.1) is given by
E(\bar{y}_n) = \frac{1}{n}\sum_{i=1}^{N} E(t_i)\,Y_i. \quad (2.2.2)
Note that {}^{(N-1)}C_{(n-1)} is the number of samples in which a given population unit can
occur out of all {}^{N}C_{n} SRSWOR samples, and therefore the probability that the i-th
population unit is selected in the sample is \frac{{}^{(N-1)}C_{(n-1)}}{{}^{N}C_{n}} = \frac{n}{N}. So the random
variable t_i takes the value 1 with probability \frac{n}{N} and 0 with probability \left(1 - \frac{n}{N}\right).
Thus the expected value of t_i is
E(t_i) = 1 \times \frac{n}{N} + 0 \times \left(1 - \frac{n}{N}\right) = \frac{n}{N},
and hence
E(\bar{y}_n) = \frac{1}{n}\sum_{i=1}^{N}\frac{n}{N}Y_i = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y}.
Method II. We can also prove the same result as follows.
In (2.2.4) the sample value y_i is a random variable and can take any population
value Y_i, i = 1, 2, ..., N, with probability 1/N.
Thus we have
E(y_i) = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y}.
Method III. To prove the above result by another method, let us consider
(\bar{y}_n)_t = the sample mean based on the t-th sample selected from the population.
Note that there are {}^{N}C_{n} possible samples, and the probability of selecting the t-th
sample is
P_t = 1/{}^{N}C_{n}.
By the definition of expected value, we have
E(\bar{y}_n) = \sum_{t=1}^{{}^{N}C_{n}} P_t(\bar{y}_n)_t = \frac{1}{n\left({}^{N}C_{n}\right)}\sum_{t=1}^{{}^{N}C_{n}}\left(\sum_{i=1}^{n} y_i\right)_t.
As an illustration, consider a population of N = 4 units A, B, C, and D, and all
{}^{4}C_{2} = 6 samples of n = 2 units:

Sample no.        1           2           3           4           5           6
Sampled units     (A, B)      (A, C)      (A, D)      (B, C)      (B, D)      (C, D)
Population units  (Y_1, Y_2)  (Y_1, Y_3)  (Y_1, Y_4)  (Y_2, Y_3)  (Y_2, Y_4)  (Y_3, Y_4)
The values of the units in a sample are denoted y_1 and y_2. Thus we have
\sum_{t=1}^{6}\left(\sum_{i=1}^{2} y_i\right)_t = (Y_1 + Y_2) + (Y_1 + Y_3) + (Y_1 + Y_4) + (Y_2 + Y_3) + (Y_2 + Y_4) + (Y_3 + Y_4)
= 3(Y_1 + Y_2 + Y_3 + Y_4) = \sum_{i=1}^{N}\left\{{}^{(N-1)}C_{(n-1)}\right\}Y_i,
so that E(\bar{y}_n) = \frac{{}^{(N-1)}C_{(n-1)}}{n\left({}^{N}C_{n}\right)}\sum_{i=1}^{N} Y_i = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y}.
Theorem 2.2.2. The probability that any population unit is selected in the
sample at any particular draw is equal to the inverse of the population size, that is,
Probability of selecting the i-th unit at a given draw = \frac{1}{N}. \quad (2.2.5)
Proof. Let us consider that at the r-th draw, the i-th population unit Y_i is selected. This
is possible only if this unit has not been selected in the previous (r - 1) draws. Let
us now consider the draws one by one.
First draw: The probability for the particular unit Y_i to get selected on the first
draw out of N units is 1/N. Note that the probability that Y_i is not selected on
the first draw, from a population of N units, is 1 - 1/N = (N - 1)/N.
Second draw: The probability that a particular unit is selected on the second draw
(if it is not already selected on the first draw) is the product of two probabilities,
namely
(Probability that Y_i is not selected on the first draw) \times (Probability that Y_i is
selected on the second draw).
Therefore the probability that Y_i is selected on the second draw is equal to
\frac{N-1}{N} \times \frac{1}{N-1} = \frac{1}{N}.
Note that the probability that Y_i is not selected on the second draw out of the
remaining (N - 1) population units is equal to
1 - \frac{1}{N-1} = \frac{N-2}{N-1}.
Third draw: The probability that a particular population unit is selected on the third
draw (if it is not selected on the first or second draw) is the product of three
probabilities:
(Probability that Y_i is not selected on the first draw) \times (Probability that Y_i is not
selected on the second draw) \times (Probability that Y_i is selected on the third draw)
= \frac{N-1}{N} \times \frac{N-2}{N-1} \times \frac{1}{N-2} = \frac{1}{N}.
Note that the probability that Y_i is not selected on the third draw out of the (N - 2)
remaining population units is equal to
1 - \frac{1}{N-2} = \frac{N-3}{N-2}.
This procedure continues up to (r - 1) draws.
r-th draw: The probability that Y_i is not selected up to the (r - 1)-th draw is given by
\frac{N-1}{N} \times \frac{N-2}{N-1} \times \cdots \times \frac{N-(r-1)}{N-(r-2)} = \frac{N-r+1}{N}.
The probability that Y_i is selected at the r-th draw [assuming that it is not selected at
any of the previous (r - 1) draws] is equal to
\frac{1}{N-(r-1)} = \frac{1}{N-r+1}.
So we obtain the probability that a particular unit Y_i is selected at the r-th draw as
\frac{N-r+1}{N} \times \frac{1}{N-r+1} = \frac{1}{N}.
Hence the theorem.
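Theorem 2.2.2 can be confirmed exactly, rather than by simulation, by enumerating all orders in which N units can be drawn without replacement (a sketch with N = 5):

```python
from itertools import permutations

N = 5
orders = list(permutations(range(N)))   # all N! equally likely draw orders

# Count how often a fixed unit (unit 0) appears at each draw r.
for r in range(N):
    hits = sum(1 for order in orders if order[r] == 0)
    # hits / N! = (N-1)! / N! = 1/N at every draw
    assert hits * N == len(orders)
print("P(unit selected at draw r) = 1/N for every r")
```

Every draw position contains a fixed unit in exactly (N-1)! of the N! orders, i.e., with probability 1/N.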
where S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 and f = n/N denotes the finite population correction
factor (f.p.c.).
Proof. We have
\bar{y}_n = \frac{1}{n}\sum_{i=1}^{N} t_i Y_i,
where t_i is a random variable that takes the value 1 if the i-th unit is included in the
sample, and otherwise takes the value 0. Since Y_i is fixed for the i-th unit in the
population, we have
V(\bar{y}_n) = \frac{1}{n^2}\left[\sum_{i=1}^{N} Y_i^2\,V(t_i) + \sum_{i \ne j}^{N} Y_i Y_j\,Cov(t_i, t_j)\right]. \quad (2.2.8)
In (2.2.8) we need to determine V(t_i) and Cov(t_i, t_j). Note that the distributions
of t_i and t_i^2 are
t_i = 1 with probability n/N and 0 with probability (1 - n/N), and likewise
t_i^2 = 1 with probability n/N and 0 with probability (1 - n/N).
We have
V(t_i) = E[t_i - E(t_i)]^2 = E(t_i^2) - \{E(t_i)\}^2 = \frac{n}{N} - \frac{n^2}{N^2} = \frac{n(N-n)}{N^2}. \quad (2.2.9)
The probability that both the i-th and j-th units are included in the sample is
\frac{{}^{(N-2)}C_{(n-2)}}{{}^{N}C_{n}} = \frac{n(n-1)}{N(N-1)},
and otherwise the probability is 1 - \frac{n(n-1)}{N(N-1)}; therefore
t_i t_j = 1 with probability \frac{n(n-1)}{N(N-1)} and 0 with probability \left\{1 - \frac{n(n-1)}{N(N-1)}\right\}.
Now we have
E(t_i t_j) = 1 \times \frac{n(n-1)}{N(N-1)} + 0 \times \left\{1 - \frac{n(n-1)}{N(N-1)}\right\} = \frac{n(n-1)}{N(N-1)},
so that
Cov(t_i, t_j) = E(t_i t_j) - E(t_i)E(t_j) = \frac{n(n-1)}{N(N-1)} - \frac{n}{N}\cdot\frac{n}{N} = \frac{n}{N}\left[\frac{n-1}{N-1} - \frac{n}{N}\right] = -\frac{n(N-n)}{N^2(N-1)}. \quad (2.2.10)
V(\bar{y}_n) = \frac{1}{n^2}\left[\sum_{i=1}^{N} Y_i^2\left\{\frac{n(N-n)}{N^2}\right\} + \sum_{i \ne j}^{N} Y_i Y_j\left\{-\frac{n(N-n)}{N^2(N-1)}\right\}\right]
= \frac{N-n}{nN^2}\left[\sum_{i=1}^{N} Y_i^2 - \frac{1}{N-1}\sum_{i \ne j}^{N} Y_i Y_j\right]. \quad (2.2.11)
Note that
\bar{Y}^2 = \left(\frac{1}{N}\sum_{i=1}^{N} Y_i\right)^2 = \frac{1}{N^2}\left[\sum_{i=1}^{N} Y_i^2 + \sum_{i \ne j}^{N} Y_i Y_j\right], \quad (2.2.12)
so we obtain
\sum_{i \ne j}^{N} Y_i Y_j = N^2\bar{Y}^2 - \sum_{i=1}^{N} Y_i^2.
On substituting (2.2.12) in (2.2.11) we obtain
V(\bar{y}_n) = \frac{N-n}{nN^2}\left[\sum_{i=1}^{N} Y_i^2 - \frac{1}{N-1}\left(N^2\bar{Y}^2 - \sum_{i=1}^{N} Y_i^2\right)\right]
= \frac{N-n}{nN^2}\left[\left(1 + \frac{1}{N-1}\right)\sum_{i=1}^{N} Y_i^2 - \frac{N^2}{N-1}\bar{Y}^2\right]
= \frac{N-n}{nN(N-1)}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right] = \frac{N-n}{nN}S_y^2 = \left(\frac{1-f}{n}\right)S_y^2,
where
S_y^2 = \frac{1}{N-1}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right].
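As with the SRSWR case, the SRSWOR variance formula can be checked by enumerating all samples of a small population (a sketch; any small population will do):

```python
from itertools import combinations

population = [1.0, 2.0, 3.0, 4.0]
N, n = len(population), 2

means = [sum(s) / n for s in combinations(population, n)]   # all NCn samples
E_mean = sum(means) / len(means)
V_mean = sum((m - E_mean) ** 2 for m in means) / len(means)

Ybar = sum(population) / N
S2 = sum((y - Ybar) ** 2 for y in population) / (N - 1)     # mean square S_y^2
print(V_mean, (N - n) / (N * n) * S2)   # the two numbers agree
```

The exact variance of the six sample means equals ((N - n)/(Nn)) S_y^2, as derived above.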
Now
E(s_y^2) = E\left[\frac{1}{n-1}\left\{\sum_{i=1}^{n} y_i^2 - n\bar{y}_n^2\right\}\right] = \frac{1}{n-1}\left[E\left(\sum_{i=1}^{n} y_i^2\right) - nE(\bar{y}_n^2)\right]
= \frac{n}{n-1}\left[\frac{1}{N}\sum_{i=1}^{N} Y_i^2 - E(\bar{y}_n^2)\right]. \quad (2.2.15)
Note that E(\bar{y}_n^2) = V(\bar{y}_n) + \{E(\bar{y}_n)\}^2 = \frac{N-n}{Nn}S_y^2 + \bar{Y}^2 and each unit Y_i is selected
with probability 1/N; thus (2.2.15) becomes
Theorem 2.2.5. Under SRSWOR sampling, while estimating the population mean (or
total), the minimum sample size for which the relative standard error (RSE) does not
exceed \phi is
n \ge \left[\frac{1}{N} + \frac{\phi^2\bar{Y}^2}{S_y^2}\right]^{-1}. \quad (2.2.17)
Proof. We need an estimator \bar{y}_n such that RSE(\bar{y}_n) \le \phi, which implies that
\sqrt{\left(\frac{1}{n} - \frac{1}{N}\right)\frac{S_y^2}{\bar{Y}^2}} \le \phi, \quad \text{or} \quad \left(\frac{1}{n} - \frac{1}{N}\right)\frac{S_y^2}{\bar{Y}^2} \le \phi^2, \quad \text{or} \quad n \ge \left[\frac{1}{N} + \frac{\phi^2\bar{Y}^2}{S_y^2}\right]^{-1}.
Hence the theorem.
Remark 2.2: If \phi = \frac{e}{Z_{\alpha/2}\bar{Y}} with e = Z_{\alpha/2}\sqrt{\frac{1-f}{n}}\,S_y, then P\left[\left|\bar{y}_n - \bar{Y}\right| \le e\right] = 1 - \alpha.
Example 2.2.2. A fisherman recruiting company, XYZ, selected an SRSWOR
sample of six kinds of fish out of the 69 kinds of fish available at the Atlantic and
Gulf coasts, as below:

Sample unit      y_i        y_i^2
1                16855      284091025
2                10940      119683600
3                4793       22972849
4                2146       4605316
5                3816       14561856
6                935        874225
Sum              39485      446788871
s_y^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2\right] = \frac{1}{6-1}\left[446788871 - \frac{(39485)^2}{6}\right] = 37388933.4.
Thus
v(\bar{y}_n) = \left(\frac{1-f}{n}\right)s_y^2 = \left(\frac{1 - 0.0869}{6}\right) \times 37388933.4 = 5689972.5.
Using Table 2 from the Appendix, the 95% confidence interval for the average
number of fish is given by
\bar{y}_n \pm t_{0.025}(df = 6 - 1)\sqrt{v(\bar{y}_n)}, \quad \text{or} \quad 6580.83 \pm 2.571\sqrt{5689972.5}, \quad \text{or} \quad [448.05, 12713.61].
( c ) An estimate of the total number of fish is given by
\hat{Y} = N\bar{y}_n = 69 \times 6580.83 = 454077.27.
( d ) The 95% confidence interval for the total number of fish is given by
N \times [448.05, 12713.61], \quad \text{or} \quad 69 \times [448.05, 12713.61], \quad \text{or} \quad [30915.45, 877239.09].
Exa mple 2.2.3. We wish to estimate the average number of fish in each one of the
species groups caught by marine recreational fishermen at the Atlantic and Gulf
coasts. There were 69 species groups caught during 1995, as shown in population 4
in the Appendix. What is the minimum number of species groups to be
selected by SRSWOR sampling to attain the accuracy of relative standard error
30% ?
Given: S_y^2 = 37199578 and Y = 311528, so that \bar{Y} = 311528/69 = 4514.898.
n \ge \left[\frac{1}{N} + \frac{\phi^2\bar{Y}^2}{S_y^2}\right]^{-1} = \left[\frac{1}{69} + \frac{0.3^2 \times 4514.898^2}{37199578}\right]^{-1} = 15.67 \approx 16.
Thus a minimum sample of size n = 16 units is required to attain a 30% relative
standard error of the estimator of the population total or mean under SRSWOR
sampling.
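The two minimum sample size formulas, (2.1.8) for SRSWR and (2.2.17) for SRSWOR, can be packaged as small functions; run on the data of Examples 2.1.2 and 2.2.3 they reproduce n = 20 and n = 16 (a sketch, taking \sigma_y^2 = (N-1)S_y^2/N):

```python
import math

def min_n_srswor(N, S2, Ybar, phi):
    """Minimum n with relative standard error at most phi under SRSWOR, eq. (2.2.17)."""
    return math.ceil(1.0 / (1.0 / N + phi ** 2 * Ybar ** 2 / S2))

def min_n_srswr(N, S2, Ybar, phi):
    """SRSWR analogue, eq. (2.1.8), with sigma_y^2 = (N - 1) S_y^2 / N."""
    sigma2 = (N - 1) / N * S2
    return math.ceil(sigma2 / (phi ** 2 * Ybar ** 2))

N, S2, Y_total = 69, 37199578.0, 311528.0
Ybar = Y_total / N
print(min_n_srswor(N, S2, Ybar, 0.30), min_n_srswr(N, S2, Ybar, 0.30))  # → 16 20
```

The SRSWOR size is never larger than the SRSWR size, since the 1/N term (the finite population correction) only reduces the requirement.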
Solution. The population size is N = 69; therefore we used the second and third
columns of the Pseudo-Random Number (PRN) Table 1 given in the Appendix to
select 16 random numbers between 1 and 69. The random numbers so selected are
01, 49, 25, 14, 20, 36, 42, 44, 65, 26, 40, 66, 17, 08, 33, and 53.
Sr. No.  Species group              y_i     (y_i - \bar{y}_n)   (y_i - \bar{y}_n)^2
01       Sharks, other              2016    -937.8125      879492.2852
08       Toadfishes                 1632    -1321.8125     1747188.2850
14       Sculpins                   71      -2882.8125     8310607.9100
17       Temperate basses, other    23      -2930.8125     8589661.9100
20       Sea basses, other          2068    -885.8125      784663.7852
25       Florida pompano            644     -2309.8125     5335233.7850
26       Jacks, other               1625    -1328.8125     1765742.6600
33       Snappers, other            492     -2461.8125     6060520.7850
36       Grunts, other              3379    425.1875       180784.4102
40       Red porgy                  230     -2723.8125     7419154.5350
42       Spotted seatrout           24615   21661.1875     469207043.9000
44       Sand seatrout              4355    1401.1875      1963326.4100
49       Black drum                 1595    -1358.8125     1846371.4100
53       Barracuda                  908     -2045.8125     4185348.7850
65       Winter flounder            2324    -629.8125      396663.7852
66       Flounders, other           1284    -1669.8125     2788273.7850
Sum                                 47261   0.0000         521460078.4000
An estimate of the average number of fish in each species group during 1995 is
\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{47261}{16} = 2953.813.
Now
s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y}_n)^2 = \frac{521460078.4}{16-1} = 34764005.23,
and the estimate of the variance of the estimator \bar{y}_n is
v(\bar{y}_n) = \left(\frac{1-f}{n}\right)s_y^2 = \left(\frac{1 - 16/69}{16}\right) \times 34764005.23 = 1668924.16.
A (1 - \alpha)100% confidence interval for the average number of fish in each one of the
species groups caught during 1995 by marine recreational fishermen in the United
States is
\bar{y}_n \pm t_{\alpha/2}(df = n - 1)\sqrt{v(\bar{y}_n)}.
Using Table 2 from the Appendix, the 95% confidence interval is given by
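The point estimates of Example 2.2.4 can be reproduced directly from the column totals of the table (a sketch):

```python
total_y = 47261.0
sum_sq_dev = 521460078.4     # sum of (y_i - ybar_n)^2 from the table
N, n = 69, 16

ybar = total_y / n
s2 = sum_sq_dev / (n - 1)
f = n / N
v_ybar = (1 - f) / n * s2    # variance estimate with the f.p.c.
print(ybar, round(s2, 2), round(v_ybar, 2))
# → 2953.8125 34764005.23 1668924.16
```

The half-width of the 95% interval then follows by multiplying the square root of v(ybar) by the tabulated t value with 15 degrees of freedom.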
Example 2.2.5. The distribution of yield (kg/ha) y of a crop in 1000 plots has a
Cauchy distribution:
f(y) = \frac{1}{\pi\left[1 + (y - 10)^2\right]}, \quad -\infty < y < +\infty.
We wish to estimate the average yield with an accuracy of relative standard error of
0.15%. What is the minimum sample size n required while using SRSWOR
sampling?
Solution. Since the true mean and variance of a variable having Cauch y distribution
are unknown, therefore it is not possible to find the required sample size under such
a distribution.
Example 2.2.6. The distribution of yield (kg/ha) y of a crop in 1000 plots has a
logistic distribution
f(y) = \frac{1}{4\beta_*}\,\mathrm{sech}^2\left\{\frac{1}{2}\left(\frac{y - \alpha_*}{\beta_*}\right)\right\}
with \alpha_* = 40 and \beta_* = 2.5.
( a ) Find the minimum sample size n required to estimate the average yield
with an accuracy of relative standard error of 5%.
( b ) Select a sample of the required size and construct a 95% confidence interval for
the average yield.
( c ) Does the true average yield lie in the 95% confidence interval?
Solution. ( a ) We know that the mean and variance of a logistic distribution are
given by
Mean = \alpha_* = 40
and
Variance = \sigma_y^2 = \frac{\beta_*^2\pi^2}{3} = \frac{2.5^2 \times 3.14159^2}{3} = 20.56.
Also we are given N = 1000; thus
S_y^2 = \frac{N}{N-1}\sigma_y^2 = \frac{1000}{1000-1} \times 20.56 = 20.5806.
Thus the minimum sample size required for \phi = 0.05 is given by
n \ge \left[\frac{1}{N} + \frac{\phi^2\bar{Y}^2}{S_y^2}\right]^{-1} = \left[\frac{1}{1000} + \frac{(0.05 \times 40)^2}{20.5806}\right]^{-1} = 5.11 \approx 5.
We know that the cumulative distribution function for the logistic distribution is
F(y) = \left[1 + \exp\left\{-\left(\frac{y - \alpha_*}{\beta_*}\right)\right\}\right]^{-1}.
Using the last three columns of the Pseudo-Random Number (PRN) Table 1 given
in the Appendix, multiplied by 10^{-3}, we obtain five values of F(y) and the
corresponding values of y as given below:
\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{195.375}{5} = 39.075.
We use the alternative method to find s_y^2, given by
s_y^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}_n^2\right] = \frac{1}{5-1}\left[7661.202 - 5 \times 39.075^2\right] = 6.7309.
Using Table 2 from the Appendix the 95% confidence interval is given by
( c ) Yes, the resultant 95% confidence interval estimate contains the true average
yield \alpha_* = 40.
Theorem 2.2.6. The covariance between the two sample means \bar{y}_n and \bar{x}_n under
SRSWOR sampling is:
Cov(\bar{y}_n, \bar{x}_n) = \left(\frac{1-f}{n}\right)S_{xy}, \quad (2.2.18)
where
S_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})(X_i - \bar{X}).
Proof. We have
Cov(\bar{y}_n, \bar{x}_n) = Cov\left(\frac{1}{n}\sum_{i=1}^{n} y_i, \frac{1}{n}\sum_{i=1}^{n} x_i\right) = Cov\left(\frac{1}{n}\sum_{i=1}^{N} t_i Y_i, \frac{1}{n}\sum_{i=1}^{N} t_i X_i\right), \quad (2.2.19)
where t_i is a random variable that takes the value 1 if the i-th unit is included in the
sample, and otherwise takes the value 0. Note that the pair Y_i and X_i is fixed for the
i-th unit. Thus
Cov(\bar{y}_n, \bar{x}_n) = \frac{1}{n^2}\left[E\left\{\sum_{i=1}^{N} t_i Y_i\right\}\left\{\sum_{i=1}^{N} t_i X_i\right\} - E\left\{\sum_{i=1}^{N} t_i Y_i\right\}E\left\{\sum_{i=1}^{N} t_i X_i\right\}\right]. \quad (2.2.20)
In (2.2.20) we need to determine E(t_i^2) and E(t_i t_j). Note that t_i^2 takes the value 1
with probability n/N and 0 with probability (1 - n/N). The probability that both the
i-th and j-th units are included in the sample is
\frac{{}^{(N-2)}C_{(n-2)}}{{}^{N}C_{n}} = \frac{n(n-1)}{N(N-1)},
and otherwise the probability is 1 - \frac{n(n-1)}{N(N-1)}; therefore
t_i t_j = 1 with probability \frac{n(n-1)}{N(N-1)} and 0 with probability \left\{1 - \frac{n(n-1)}{N(N-1)}\right\}.
Now we have
E(t_i t_j) = 1 \times \frac{n(n-1)}{N(N-1)} + 0 \times \left\{1 - \frac{n(n-1)}{N(N-1)}\right\} = \frac{n(n-1)}{N(N-1)}. \quad (2.2.22)
Thus (2.2.20) becomes
Cov(\bar{y}_n, \bar{x}_n) = \frac{1}{n^2}\left[\left\{\frac{n}{N}\sum_{i=1}^{N} Y_i X_i + \frac{n(n-1)}{N(N-1)}\sum_{i \ne j}^{N} Y_i X_j\right\} - \left\{\frac{n}{N}\sum_{i=1}^{N} Y_i\right\}\left\{\frac{n}{N}\sum_{i=1}^{N} X_i\right\}\right]. \quad (2.2.23)
Note that \sum_{i \ne j}^{N} Y_i X_j = N^2\bar{Y}\bar{X} - \sum_{i=1}^{N} Y_i X_i, so on substitution and simplification
Cov(\bar{y}_n, \bar{x}_n) = \frac{1}{n^2}\left[\frac{n(N-n)}{N(N-1)}\sum_{i=1}^{N} Y_i X_i - \frac{n(N-n)}{N-1}\bar{Y}\bar{X}\right]
= \frac{N-n}{nN(N-1)}\left[\sum_{i=1}^{N} Y_i X_i - N\bar{Y}\bar{X}\right] = \frac{N-n}{nN}\cdot\frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})(X_i - \bar{X}) = \left(\frac{1-f}{n}\right)S_{xy}.
Hence the theorem.
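Theorem 2.2.6 can also be verified by enumeration over a small bivariate population (a sketch; the Y and X values are arbitrary):

```python
from itertools import combinations

Y = [1.0, 2.0, 3.0, 4.0]
X = [2.0, 3.0, 5.0, 7.0]
N, n = len(Y), 2

pairs = list(combinations(range(N), n))          # all NCn SRSWOR samples
ybars = [sum(Y[i] for i in s) / n for s in pairs]
xbars = [sum(X[i] for i in s) / n for s in pairs]

Ey, Ex = sum(ybars) / len(pairs), sum(xbars) / len(pairs)
cov_direct = sum(a * b for a, b in zip(ybars, xbars)) / len(pairs) - Ey * Ex

Ybar, Xbar = sum(Y) / N, sum(X) / N
Sxy = sum((Y[i] - Ybar) * (X[i] - Xbar) for i in range(N)) / (N - 1)
f = n / N
print(cov_direct, (1 - f) / n * Sxy)   # the two numbers agree
```

The exact covariance of the six pairs of sample means matches ((1 - f)/n) S_xy.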
Theorem 2.2.7. Under SRSWOR sampling, an unbiased estimator of the covariance
Cov(\bar{y}_n, \bar{x}_n) is
\widehat{Cov}(\bar{y}_n, \bar{x}_n) = \left(\frac{1-f}{n}\right)s_{xy}, \quad (2.2.24)
where
s_{xy} = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i x_i - n\bar{y}_n\bar{x}_n\right].
Proof. We have to show that
E\left[\widehat{Cov}(\bar{y}_n, \bar{x}_n)\right] = Cov(\bar{y}_n, \bar{x}_n).
We have
E\left[\widehat{Cov}(\bar{y}_n, \bar{x}_n)\right] = E\left[\frac{1-f}{n}s_{xy}\right] = \frac{1-f}{n}E(s_{xy}).
Now
E(s_{xy}) = \frac{n}{n-1}\left[\frac{1}{n}\sum_{i=1}^{n} E(y_i x_i) - E(\bar{y}_n\bar{x}_n)\right].
Now
E(\bar{y}_n\bar{x}_n) = Cov(\bar{y}_n, \bar{x}_n) + E(\bar{y}_n)E(\bar{x}_n) = \frac{N-n}{Nn}S_{xy} + \bar{Y}\bar{X},
and each pair of units Y_i and X_i gets selected with probability 1/N; therefore we
have
E(s_{xy}) = \frac{n}{n-1}\left[\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{N}\sum_{j=1}^{N} Y_j X_j\right) - \left\{\frac{N-n}{Nn}S_{xy} + \bar{Y}\bar{X}\right\}\right]
= \frac{n}{n-1}\left[\frac{1}{N}\sum_{i=1}^{N} Y_i X_i - \bar{Y}\bar{X} - \frac{N-n}{Nn}S_{xy}\right]
= \frac{n}{n-1}\left[\frac{N-1}{N}S_{xy} - \frac{N-n}{Nn}S_{xy}\right] = S_{xy}.
Thus we obtain
E\left[\widehat{Cov}(\bar{y}_n, \bar{x}_n)\right] = Cov(\bar{y}_n, \bar{x}_n).
Hence the theorem.
Example 2.2.7. Consider the joint probability density function of two continuous
random variables x and y:
f(x, y) = \frac{2}{3}(x + 2y), \quad 0 < x < 1, \; 0 < y < 1,
and 0 otherwise.
( a ) Select six pairs of observations (y, x) by using the Random Number Table
method.
( b ) Estimate the value of the covariance between x and y.
Solution. ( a ) See Chapter 1.
( b ) Estimate of covariance:

R_1      y        R_2      x
0.992    0.995    0.622    0.423
0.588    0.722    0.771    0.514
0.601    0.732    0.917    0.600
0.549    0.691    0.675    0.456
0.925    0.954    0.534    0.368
0.014    0.039    0.513    0.355
So we obtain
y        (y - \bar{y})   x        (x - \bar{x})   (y - \bar{y})(x - \bar{x})
0.995    0.306167        0.423    -0.030          -0.009080
0.722    0.033167        0.514    0.061           0.002034
0.732    0.043167        0.600    0.147           0.006360
0.691    0.002167        0.456    0.003           0.000000
0.954    0.265167        0.368    -0.085          -0.022450
0.039    -0.649833       0.355    -0.098          0.063467
Sum: 4.133   0.000000    2.716    0.000           0.040335
Thus an estimate of the covariance between the two sample means is given by
\widehat{Cov}(\bar{y}_n, \bar{x}_n) = \left(\frac{1-f}{n}\right)s_{xy} = \left(\frac{1-f}{n}\right)\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x}).
Let N be the total number of units in the population \Omega and N_a be the number of
units possessing a certain attribute, A (say). Then the population proportion is the ratio
of the number of units possessing the attribute A to the total number of units in the
population, i.e., P_y = N_a/N. Thus we have the following theorem:
Theorem 2.3.1. The population proportion P_y is a special case of the population
mean \bar{Y}.
We will discuss the problem of estimation of the population proportion using SRSWR
and SRSWOR sampling.
Case I. When the sample is drawn using simple random sampling with replacement
(SRSWR sampling), we have the following theorems.
\sigma_y^2 = \frac{1}{N}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right].
Note that
y_i = 1 if the i-th unit possesses the attribute A, and 0 otherwise,
and
y_i^2 = 1 if the i-th unit possesses the attribute A, and 0 otherwise.
So that
\sigma_y^2 = \frac{1}{N}\left[N_a - NP_y^2\right] = P_y - P_y^2 = P_y(1 - P_y) = P_y Q_y.
Thus we have
V(p_y) = \frac{P_y Q_y}{n}.
Hence the theorem.
Proof. We have to show that E[v(p_y)] = V(p_y), or in other words we have to show
that
E\left(\frac{p_y q_y}{n-1}\right) = \frac{P_y Q_y}{n}.
Now we know that s_y^2/n is an unbiased estimator of \sigma_y^2/n.
Defining
y_i = 1 if the i-th sampled unit \in A, and 0 otherwise, and likewise
y_i^2 = 1 if the i-th sampled unit \in A, and 0 otherwise,
and letting r denote the number of sampled units possessing the attribute A, so that
p_y = r/n, we obtain
s_y^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - np_y^2\right] = \frac{1}{n-1}\left[r - np_y^2\right] = \frac{n}{n-1}\left[\frac{r}{n} - p_y^2\right] = \frac{n}{n-1}p_y(1 - p_y) = \frac{np_y q_y}{n-1}.
So that
\frac{s_y^2}{n} = \frac{p_y q_y}{n-1}.
Hence the theorem.
We need RSE(p_y) \le \phi, which implies that
\sqrt{\frac{Q_y}{nP_y}} \le \phi, \quad \text{or} \quad \frac{Q_y}{nP_y} \le \phi^2, \quad \text{or} \quad n \ge \frac{Q_y}{\phi^2 P_y}. \quad (2.3.4)
Hence the theorem.
Example 2.3.1. We wish to estimate the proportion of the number of fish in the group Herrings caught by marine recreational fishermen at the Atlantic and Gulf coasts. There are 30,027 fish out of a total of 311,528 fish caught during 1995, as shown in population 4 in the Appendix. What is the minimum number of fish to be selected by SRSWR sampling to attain a 5% relative standard error of the estimator of population proportion?
Solution. We have

P_y = \frac{30027}{311528} = 0.0964 \quad \text{and} \quad Q_y = 1 - P_y = 1 - 0.0964 = 0.9036.

Thus for \phi = 0.05, we have

n \ge \frac{Q_y}{\phi^2 P_y} = \frac{0.9036}{0.05^2 \times 0.0964} = 3749.4 \approx 3750.

Thus a minimum sample of size n = 3750 fish is required to attain a 5% relative standard error of the estimator of population proportion under SRSWR sampling.
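The minimum sample size rule above is easy to automate. The following is a minimal sketch (not code from the book) of the SRSWR formula n >= Q_y / (phi^2 P_y):

```python
import math

# Minimum SRSWR sample size to attain relative standard error phi
# when estimating a population proportion P (Q = 1 - P).
def min_n_srswr(P, phi):
    Q = 1.0 - P
    return math.ceil(Q / (phi ** 2 * P))

n = min_n_srswr(0.0964, 0.05)  # the Herrings example: 3750
```

Note that the required n grows like 1/phi^2, so halving the target relative standard error quadruples the sample size.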
Example 2.3.2. A fisherman visited the Atlantic and Gulf coasts and caught 4000 fish one by one. He noted the species group of each fish caught by him and put back that fish in the sea before making the next catch. He observed that 400 fish belong to the group Herrings.
( a ) Estimate the proportion of fish in the group Herrings living in the Atlantic and Gulf coasts.
( b ) Construct the 95% confidence interval.
Solution. We are given n = 4000 and r = 400.
( a ) An estimate of the proportion of the fish in the Herrings group is given by

p_y = \frac{r}{n} = \frac{400}{4000} = 0.1.

( b ) Under SRSWR sampling an estimate of the V(p_y) is given by

v(p_y) = \frac{p_y q_y}{n-1} = \frac{0.1 \times 0.9}{4000 - 1} = 2.2505 \times 10^{-5}.

A (1-\alpha)100\% confidence interval for the true proportion P_y is given by

p_y \pm z_{\alpha/2}\sqrt{v(p_y)}.

Thus the 95% confidence interval for the proportion of fish belonging to the Herrings group is given by

p_y \pm 1.96\sqrt{v(p_y)}, \quad \text{or} \quad 0.1 \pm 1.96\sqrt{2.2505 \times 10^{-5}}, \quad \text{or} \quad [0.0907, 0.1092].
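The interval above can be sketched as a small helper; this is a hedged illustration, not the book's code:

```python
import math

# Large-sample confidence interval for a proportion under SRSWR:
# p_y +/- z * sqrt(p_y * q_y / (n - 1)).
def srswr_ci(r, n, z=1.96):
    p = r / n
    v = p * (1 - p) / (n - 1)  # v(p_y) under SRSWR
    half = z * math.sqrt(v)
    return p - half, p + half

lo, hi = srswr_ci(400, 4000)  # the Herrings example
```

The endpoints agree with the worked example up to rounding of the printed interval.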
( a ) Select a sample of the required size, and estimate the proportion of plants with height more than 15 cm.
( b ) Construct a 95% confidence interval estimate, assuming that your sample size is large, and interpret your results.
Solution. We know that if y has the uniform distribution function

f(y) = \frac{1}{b-a} \quad \forall\; a < y < b,

then the proportion of plants with height more than 15 cm is given by

P_y = \int_{15}^{20} f(y)\,dy = \int_{15}^{20} \frac{1}{15}\,dy = \frac{1}{15}(20 - 15) = \frac{5}{15} = 0.3333,

and the variance

\sigma_y^2 = P_y(1 - P_y) = 0.3333(1 - 0.3333) = 0.2222.
( a ) We need \phi = 0.45, thus the required minimum sample size is given by n \ge Q_y/(\phi^2 P_y) = 0.6667/(0.45^2 \times 0.3333) = 9.88 \approx 10. The selected values of F(y), the corresponding heights y = 5 + 15 F(y), and the indicator of height more than 15 cm are:

  F(y)     y        Indicator (y > 15)
  0.954    19.31    1
  0.183    7.75     0
  0.448    11.72    0
  0.171    7.57     0
  0.567    13.51    0
  0.737    16.06    1
  0.856    17.84    1
  0.233    8.50     0
  0.895    18.43    1
  0.263    8.95     0
Thus an estimate of the proportion P_y is given by p_y = 4/10 = 0.4.
Chapter 2: Simple Random Sampling 99
Case II. When a sample is drawn using SRSWOR sampling, we have the following theorems.
Theorem 2.3.6. The unbiased estimator of the population proportion P_y is given by p_y = r/n, and

V(\bar{y}_n) = \frac{N-n}{Nn} S_y^2, \quad \text{where} \quad S_y^2 = \frac{1}{N-1}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right].
Again we define

y_i = \begin{cases} 1 & \text{if the } i\text{th unit possesses the attribute } A, \\ 0 & \text{otherwise}, \end{cases} \quad \text{and} \quad y_i^2 = \begin{cases} 1 & \text{if the } i\text{th unit possesses the attribute } A, \\ 0 & \text{otherwise}. \end{cases}

So

S_y^2 = \frac{1}{N-1}\left(N_A - N P_y^2\right) = \frac{N}{N-1}\left(P_y - P_y^2\right) = \frac{N P_y Q_y}{N-1} = S_p^2.

Hence we have

V(p_y) = \frac{N-n}{Nn} S_p^2 = \frac{N-n}{Nn} \times \frac{N}{N-1} P_y Q_y = \frac{(N-n)\, P_y Q_y}{n(N-1)},

which proves the theorem.
Now we know that \frac{N-n}{Nn}\, s_y^2 is an unbiased estimator of \frac{N-n}{Nn}\, S_y^2.

Changing

y_i = \begin{cases} 1 & \text{if the } i\text{th population unit} \in A, \\ 0 & \text{otherwise}, \end{cases} \quad \text{and} \quad y_i^2 = \begin{cases} 1 & \text{if the } i\text{th population unit} \in A, \\ 0 & \text{otherwise}, \end{cases}

makes

\frac{N-n}{Nn}\, S_y^2 = \frac{N-n}{n(N-1)}\, P_y Q_y.
Similarly, if we make the changes

y_i = \begin{cases} 1 & \text{if the } i\text{th sampled unit} \in A, \\ 0 & \text{otherwise}, \end{cases} \quad \text{and} \quad y_i^2 = \begin{cases} 1 & \text{if the } i\text{th sampled unit} \in A, \\ 0 & \text{otherwise}, \end{cases}

then

\frac{N-n}{Nn}\, s_p^2 = \frac{N-n}{Nn} \times \frac{n\, p_y q_y}{n-1} = \frac{N-n}{N(n-1)}\, p_y q_y. \qquad (2.3.6)

Hence the theorem.
Note that we need an estimator p_y such that RSE(p_y) \le \phi, which implies that

\left(\frac{N-n}{Nn}\right)\frac{N Q_y}{(N-1) P_y} \le \phi^2, \quad \text{or} \quad n \ge \frac{N Q_y}{\phi^2 (N-1) P_y + Q_y}.

Hence the theorem.
Example 2.3.4. We wish to estimate the proportion of the number of fish in the group Herrings caught by marine recreational fishermen at the Atlantic and Gulf coasts. There are 30,027 fish out of a total of 311,528 fish caught during 1995, as shown in population 4 in the Appendix. What is the minimum number of fish to be selected by SRSWOR sampling to attain the accuracy of a 5% relative standard error?
Solution. We have

P_y = \frac{30027}{311528} = 0.0964 \quad \text{and} \quad Q_y = 1 - P_y = 1 - 0.0964 = 0.9036.

Thus for \phi = 0.05, we have

n \ge \frac{N Q_y}{\phi^2 (N-1) P_y + Q_y} = \frac{311528 \times 0.9036}{0.05^2 (311528 - 1) \times 0.0964 + 0.9036} = 3704.8 \approx 3705.
Thus a minimum sample of size n = 3705 fish is required to attain 5% relative
standard error of the estimator of population proportion under SRSWOR sampling.
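The SRSWOR rule differs from the SRSWR one only through the finite-population correction. A minimal sketch (not the book's code):

```python
import math

# Minimum SRSWOR sample size: n >= N*Q / (phi^2*(N-1)*P + Q).
def min_n_srswor(P, phi, N):
    Q = 1.0 - P
    return math.ceil(N * Q / (phi ** 2 * (N - 1) * P + Q))

n = min_n_srswor(0.0964, 0.05, 311528)  # the Herrings example: 3705
```

As N grows large the formula tends to the SRSWR answer Q/(phi^2 P), which is why 3705 is only slightly below the 3750 obtained earlier.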
Example 2.3.5. A fisherman visited the Atlantic and Gulf coasts and caught 4000 fish. He noted the species group of each fish caught by him. He observed that 400 fish belong to the group Herrings.
( a ) Estimate the proportion of fish in the group Herrings living in the Atlantic and Gulf coasts.
( b ) Construct the 95% confidence interval.
Given: Total number of fish living in the coast = 311528.
Solution. We are given N = 311,528, n = 4,000 and r = 400 .
( a ) An estimate of the proportion of the fish in the Herrings group is

p_y = \frac{r}{n} = \frac{400}{4000} = 0.1.

( b ) Under SRSWOR sampling, an estimate of the V(p_y) is given by

v(p_y) = \frac{N-n}{N} \cdot \frac{p_y q_y}{n-1} = \frac{311528 - 4000}{311528} \times \frac{0.1 \times 0.9}{4000 - 1} = 2.2216 \times 10^{-5}.
A (1-\alpha)100\% confidence interval for the true proportion P_y is given by

p_y \pm z_{\alpha/2}\sqrt{v(p_y)}.

Thus the 95% confidence interval for the proportion of fish belonging to the Herrings group is given by

p_y \pm 1.96\sqrt{v(p_y)}, \quad \text{or} \quad 0.1 \pm 1.96\sqrt{2.2216 \times 10^{-5}}, \quad \text{or} \quad [0.0908, 0.1092].
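The only difference from Example 2.3.2 is the finite population correction (N - n)/N, which shrinks the variance slightly. A hedged sketch comparing the two variance estimates:

```python
# Variance estimate of a sample proportion under SRSWR and SRSWOR;
# the SRSWOR version carries the finite population correction (N - n)/N.
def v_srswr(p, n):
    return p * (1 - p) / (n - 1)

def v_srswor(p, n, N):
    return (N - n) / N * p * (1 - p) / (n - 1)

vr = v_srswr(0.1, 4000)            # SRSWR:  ~2.2506e-05
vo = v_srswor(0.1, 4000, 311528)   # SRSWOR: ~2.2217e-05, always smaller
```

With n = 4000 out of N = 311,528 the correction is only about 1.3%, which is why the two confidence intervals are nearly identical.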
Example 2.3.6. In a field there are 1,000 plants and the distribution of their height is given by the probability mass function
( a ) Select a random sample of n = 10 units and estimate the proportion of plants with height more than or equal to 225 cm.
( b ) Construct a 95% confidence interval, assuming that it is a large sample.
Using the first three columns, multiplied by 10^{-3}, of the Pseudo-Random Number (PRN) Table 1 given in the Appendix, we obtain the 10 values of F(y) and y as:

  F(y)     y       Indicator (y >= 225)
  0.992    225     1
  0.588    100     0
  0.601    150     0
  0.549    100     0
  0.925    225     1
  0.014    Discard this number
  0.697    150     0
  0.872    200     0
  0.626    150     0
  0.236    50      0
  0.884    225     1
( a ) An estimate of the proportion of plants with height 225 cm or more is p_y = 3/10 = 0.3.
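The inverse-CDF step used in this example is worth making explicit. The pmf itself is not reproduced in this excerpt, so the probabilities below are purely hypothetical; only the mechanism (mapping a pseudo-random number through the cumulative distribution) matches the example:

```python
import bisect
import random

# Hypothetical pmf over the heights seen in the example's table; the point is
# the inverse-CDF mapping of a pseudo-random number u in (0, 1] to a height.
heights = [50, 100, 150, 200, 225]
pmf = [0.25, 0.30, 0.20, 0.15, 0.10]  # assumed probabilities, summing to 1

cum = []
total = 0.0
for p in pmf:
    total += p
    cum.append(total)

def draw(u):
    """Return the smallest height whose cumulative probability covers u."""
    return heights[bisect.bisect_left(cum, u)]

rng = random.Random(7)
sample = [draw(rng.random()) for _ in range(10)]
p_hat = sum(y >= 225 for y in sample) / len(sample)  # proportion of tall plants
```

In the book's example the random numbers are read from the PRN table rather than generated, but the mapping through F(y) is the same.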
Theorem 2.4.1. The minimum mean squared error of the estimator \bar{y}_{searl} is

\mathrm{Min.MSE}(\bar{y}_{searl}) = V(\bar{y}_n)\,/\,\{1 + V(\bar{y}_n)/\bar{Y}^2\}. \qquad (2.4.2)
Proof. We have

\mathrm{MSE}(\bar{y}_{searl}) = E[\bar{y}_s - \bar{Y}]^2 = E[\lambda\bar{y}_n - \bar{Y}]^2 = E[\lambda\bar{y}_n - E(\lambda\bar{y}_n) + E(\lambda\bar{y}_n) - \bar{Y}]^2
= E[\lambda\{\bar{y}_n - E(\bar{y}_n)\} + \lambda E(\bar{y}_n) - \bar{Y}]^2 = E[\lambda\{\bar{y}_n - E(\bar{y}_n)\} + (\lambda - 1)\bar{Y}]^2
= \lambda^2 E\{\bar{y}_n - E(\bar{y}_n)\}^2 + (\lambda - 1)^2\bar{Y}^2 + 2(\lambda - 1)\bar{Y}\lambda\, E\{\bar{y}_n - E(\bar{y}_n)\}
= \lambda^2 V(\bar{y}_n) + (\lambda - 1)^2\bar{Y}^2.

Minimising with respect to \lambda gives \lambda = \bar{Y}^2/\{\bar{Y}^2 + V(\bar{y}_n)\}, so that

\mathrm{Min.MSE}(\bar{y}_{searl}) = \lambda^2 V(\bar{y}_n) + (\lambda - 1)^2\bar{Y}^2
= \left\{\frac{\bar{Y}^2}{\bar{Y}^2 + V(\bar{y}_n)}\right\}^2 V(\bar{y}_n) + \left\{\frac{\bar{Y}^2}{\bar{Y}^2 + V(\bar{y}_n)} - 1\right\}^2 \bar{Y}^2
= \frac{\bar{Y}^4 V(\bar{y}_n)}{\{\bar{Y}^2 + V(\bar{y}_n)\}^2} + \frac{\bar{Y}^2\{V(\bar{y}_n)\}^2}{\{\bar{Y}^2 + V(\bar{y}_n)\}^2} = \frac{\bar{Y}^2 V(\bar{y}_n)\{\bar{Y}^2 + V(\bar{y}_n)\}}{\{\bar{Y}^2 + V(\bar{y}_n)\}^2}
= \frac{\bar{Y}^2 V(\bar{y}_n)}{\bar{Y}^2 + V(\bar{y}_n)} = \frac{V(\bar{y}_n)}{1 + V(\bar{y}_n)/\bar{Y}^2}. \qquad (2.4.6)

Hence the theorem.
Theorem 2.4.2. Under SRSWR sampling, the minimum mean squared error of the Searls' estimator is

\mathrm{Min.MSE}(\bar{y}_{searl}) = n^{-1}\sigma_y^2\,/\,\{1 + n^{-1}\sigma_y^2/\bar{Y}^2\}. \qquad (2.4.7)

Proof. Obvious from (2.4.2), because under SRSWR sampling we have V(\bar{y}_n) = n^{-1}\sigma_y^2.
Theorem 2.4.3. The relative efficiency of the Searls' estimator \bar{y}_{searl} with respect to the usual estimator \bar{y}_n, under SRSWR sampling, is given by

\mathrm{RE} = 1 + \sigma_y^2/(n\bar{Y}^2). \qquad (2.4.8)

Thus the relative gain of the Searls' estimator is inversely proportional to the sample size, n. In other words, as n \to \infty, the value of RE \to 1.
Proof. It follows from the definition of the relative efficiency. Note that the relative efficiency of the Searls' estimator with respect to the usual estimator is given by

\mathrm{RE} = \frac{\mathrm{MSE}(\bar{y}_n)}{\mathrm{MSE}(\bar{y}_{searl})} = \frac{V(\bar{y}_n)}{\mathrm{Min.MSE}(\bar{y}_{searl})} = \frac{n^{-1}\sigma_y^2}{n^{-1}\sigma_y^2\left(1 + n^{-1}\bar{Y}^{-2}\sigma_y^2\right)^{-1}} = 1 + n^{-1}\bar{Y}^{-2}\sigma_y^2. \qquad (2.4.9)

Hence the theorem.
Theorem 2.4.4. Under SRSWOR sampling , the minimum mean squared error of
the Searls' estimator is
Theorem 2.4.5. The relative efficiency of the Searls' estimator \bar{y}_{searl} with respect to \bar{y}_n, under SRSWOR, is given by

\mathrm{RE} = 1 + \left(\frac{1}{n} - \frac{1}{N}\right) C_y^2.

Thus the relative gain in efficiency of the Searls' estimator is inversely proportional to the sample size n. In other words, as n \to N the value of RE \to 1.

Proof. We have

\mathrm{RE} = \frac{V(\bar{y}_n)}{\mathrm{Min.MSE}(\bar{y}_{searl})} = 1 + \left(\frac{1}{n} - \frac{1}{N}\right)\frac{S_y^2}{\bar{Y}^2} = 1 + \left(\frac{1}{n} - \frac{1}{N}\right) C_y^2.
Example 2.4.1. We wish to estimate the average number of fish in each one of the species groups caught by marine recreational fishermen at the Atlantic and Gulf coasts. There were 69 species caught during 1995, as shown in population 4 of the Appendix. We selected a sample of 20 units by SRSWR sampling. What is the gain in efficiency owed to the Searls' estimator over the sample mean?
Given: S_y^2 = 37199578 and Y = 311528.
Solution. We are given N = 69, S_y^2 = 37199578 and Y = 311528; thus

\bar{Y} = \frac{Y}{N} = \frac{311528}{69} = 4514.898,

and

\sigma_y^2 = \frac{N-1}{N} S_y^2 = \frac{69-1}{69} \times 37199578 = 36660453.68.
The relative efficiency of the Searls' estimator \bar{y}_{searl} with respect to the usual estimator \bar{y}_n, under SRSWR, is given by

\mathrm{RE} = \left[1 + \frac{\sigma_y^2}{n\bar{Y}^2}\right] \times 100 = \left[1 + \frac{36660453.68}{20 \times 4514.898^2}\right] \times 100 = 108.99\%.
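The arithmetic of the example can be reproduced directly; a sketch using (2.4.8):

```python
# Relative efficiency of the Searls' estimator under SRSWR, Example 2.4.1:
# RE = 1 + sigma_y^2 / (n * Ybar^2), expressed as a percentage.
N, S2, Y = 69, 37199578, 311528
Ybar = Y / N                       # 4514.898...
sigma2 = (N - 1) / N * S2          # 36660453.68...
RE = (1 + sigma2 / (20 * Ybar ** 2)) * 100
```

The roughly 9% gain comes from the large coefficient of variation of this population combined with the small sample of 20 units.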
Find the relative efficiency of the Searls' estimator over the usual estimator based on a sample of 5 or 20 units, respectively.
Solution. We are given a = 200 and b = 500, therefore the population mean
Searls (1967), Reddy (1978a) and Arnholt and Hebert (1995) studied the properties of this estimator and found that it is useful if C_y is large and the sample size is small.
We shall discuss the problem of estimation of the finite population mean and variance by using only distinct units from the SRSWR sample. However, before going further we shall discuss some results which will be helpful in deriving the results for distinct units in an SRSWR sample. Basu (1958) introduced the concept of sufficiency in sampling from finite populations. According to him, for every ordered sample s^o there exists an unordered sample s^{uo}, which is obtained from s^o by ignoring the information concerning the order in which the labels occur. The data obtained from the sample s^{uo} can be represented as

d^{uo} = \{y_i : i \in s^{uo}\}, \qquad (2.5.1)

and the data obtained from the ordered sample as

d^{o} = \{y_i : i \in s^{o}\}. \qquad (2.5.2)

Then the probability of observing the ordered data d^o given the unordered data d^{uo} is

P(d^o \mid d^{uo}) = \frac{p(s^o)}{\sum p(s^o)}, \qquad (2.5.3)

where \sum is the summation over all those ordered samples s^o which result in the unordered sample s^{uo}. Since the probability P(d^o \mid d^{uo}) is independent of any population parameters, the unordered statistic d^{uo} is a sufficient statistic for any population parameter.
Let us now first state the Rao–Blackwell theorem, which is based on the results of Rao (1945) and Blackwell (1947).
Theorem 2.5.1. Let \hat{\theta}_o = \theta(d^o) be an estimator of \theta constructed from ordered data, and let \hat{\theta}_s = E[\hat{\theta}_o \mid d^{uo}] be its version based on the unordered data. Then:
( a ) We have

E(\hat{\theta}_s) = \sum_{s^{uo}} \left\{\sum_{s^o} \hat{\theta}_o(d^o)\, P(d^o \mid d^{uo})\right\} p(s^{uo}) = \sum_{s^o} \hat{\theta}_o(d^o)\, p(s^o) = E(\hat{\theta}_o).

( b ) We have

\mathrm{MSE}(\hat{\theta}_o) = E[\hat{\theta}_o - \theta]^2 = E[\hat{\theta}_o - \hat{\theta}_s + \hat{\theta}_s - \theta]^2
= E(\hat{\theta}_o - \hat{\theta}_s)^2 + E(\hat{\theta}_s - \theta)^2 + 2E(\hat{\theta}_o - \hat{\theta}_s)(\hat{\theta}_s - \theta)
= E(\hat{\theta}_o - \hat{\theta}_s)^2 + \mathrm{MSE}(\hat{\theta}_s) + 0.

Hence the theorem.
Now we will discuss the problem of estimation of the mean and variance on the basis of distinct units in the sample. Clearly a unit can be repeated only in WR sampling schemes; hence we are dealing only with the SRSWR sampling scheme. Suppose v denotes the number of distinct units in the sample of size n drawn from the population of N units by using the SRSWR scheme.
The distribution of distinct units in the sample was first developed by Feller (1957) as follows.
For estimating the population mean \bar{Y} by using information only from the distinct units we have the following theorems.
Proof. Following Raj and Khamis (1958), let E_2 and E_1 be the expected values defined for a given sample (fixed number of distinct units) and for all possible samples, respectively; then, taking expected values on both sides of (2.5.1.1), we have
Theorem 2.5.1.2. The variance of the unbiased estimator \bar{y}_v based on distinct units is
(2.5.1.3)
Proof. Suppose V_2 and V_1 denote the variance for the given sample (fixed number of distinct units) and over all possible samples; then we have
It is interesting to note that as the sample size n drawn with SRSWR sampling approaches the population size N, the magnitude of the relative efficiency also increases. The reason for the increase in the relative efficiency may be that the increase in sample size also increases the probability of repetition of units in SRSWR sampling.
The relative efficiency under the Feller (1957) distribution is given by

\mathrm{RE} = \frac{V(\bar{y}_n)}{V(\bar{y}_v)} = \frac{(N-1)\, N^{n-1}}{n \sum_{j=1}^{N-1} j^{\,n-1}}, \qquad (2.5.1.6)

which is free from any population parameter but depends upon the population size and sample size.
The following table shows the percent relative efficiency of distinct units based estimators with respect to the estimators based on SRSWR sampling for different values of sample sizes n and population size N = 10.
"
"
:: oi
BenetitOf\use ' distinct -units
>,J Sample size ( n )
J
J 2 3 4 5 6 7 8 9
J(n-l)
I I I I I 1 1 1 I
2 2 4 8 16 32 64 128 256
3 3 9 27 81 243 729 2187 6561
4 4 16 64 256 1024 4096 16384 65536
5 5 25 125 625 3125 15625 78125 390625
6 6 36 216 1296 7776 46656 279936 1679616
7 7 49 343 2401 16807 11 7649 823543 5764801
8 8 64 512 4096 32768 262144 2097152 16777216
9 9 81 729 6561 59049 531441 4782969 43046721
Sum I" 45 """ 285 2025 15333' 120825 978405 8080425 67731333
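The table entries and the relative-efficiency expression (2.5.1.6), as reconstructed here, can be regenerated directly. A sketch for checking the printed values:

```python
# RE = (N - 1) * N**(n-1) / (n * sum_{j=1}^{N-1} j**(n-1)),  per (2.5.1.6).
def re_distinct(N, n):
    return (N - 1) * N ** (n - 1) / (n * sum(j ** (n - 1) for j in range(1, N)))

# Column sums of the j**(n-1) table for N = 10, n = 2..9.
col_sums = [sum(j ** (n - 1) for j in range(1, 10)) for n in range(2, 10)]
```

At n = 2 the two estimators coincide (RE = 1), and the efficiency of the distinct-units estimator grows with n, in line with the remark above the table.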
Theorem 2.5.1.4. ( a ) Show that an alternative estimator of the population mean based on distinct units is

\bar{y}_s = f_1(v)\bar{y}_v + f_2(v),

where f_1(v) and f_2(v) are suitably chosen constants such that \bar{y}_s is an unbiased estimator of \bar{Y} and its variance is minimum. Now from the property of unbiasedness we have

E(\bar{y}_s) = E[f_1(v)\bar{y}_v + f_2(v)] = f_1(v)\bar{Y} + f_2(v) = \bar{Y}. \qquad (2.5.1.11)

This implies that

f_2(v) = [1 - f_1(v)]\bar{Y}. \qquad (2.5.1.12)

Evidently the value of f_2(v) contains the unknown value \bar{Y}, so the exact value of f_2(v) is not known unless f_1(v) = 1, which implies f_2(v) = 0. Thus if we choose f_1(v) = 1, then f_2(v) = 0, which means a better estimator of the population mean \bar{Y} would be \bar{y}_v = v^{-1}\sum_{i=1}^{v} y_i. In practical situations, sometimes a priori information or knowledge of \bar{X} (say) is available about the population mean \bar{Y} from past surveys or pilot surveys. In such situations, the value of f_2(v) is given by

f_2(v) = [1 - f_1(v)]\bar{X}. \qquad (2.5.1.13)

Thus if we choose f_2(v) as given in (2.5.1.13), then the bias in the estimator \bar{y}_s will be minimum. Unfortunately, f_2(v) depends upon the value of f_1(v) too. The best method to choose f_1(v) is such that the variance of \bar{y}_s is minimum. Now the variance of the estimator \bar{y}_s is given by

V(\bar{y}_s) = E_1 V_2(\bar{y}_s) + V_1 E_2(\bar{y}_s) = E_1 V_2[f_1(v)\bar{y}_v + f_2(v)] + V_1 E_2[f_1(v)\bar{y}_v + f_2(v)].
If no such information about \bar{Y} is available, then we have \bar{X} = 0 and the above estimator reduces to

\bar{y}_2 = \frac{(Nv)/(N-v)}{E[(Nv)/(N-v)]}\,\bar{y}_v. \qquad (2.5.1.18)

Pathak (1961) has shown that
Theorem 2.5.1.5. Show that if the square of the population coefficient of variation C_y^2 = S_y^2/\bar{Y}^2 exceeds (n-1), then the estimator \bar{y}_2 = (v/E(v))\bar{y}_v is more efficient than \bar{y}_v.
Proof. We know the exact expression (2.5.1.25) for V(\bar{y}_2), due to Pathak (1961), in terms of N, n, S_y^2 and \bar{Y}^2. Using it, the difference of the variances can be written as

V(\bar{y}_v) - V(\bar{y}_2) = C_1 S_y^2 - C_2 \bar{Y}^2 \ (\text{say}). \qquad (2.5.1.26)

Now the estimator \bar{y}_2 is better than \bar{y}_v if V(\bar{y}_v) - V(\bar{y}_2) > 0, or if (S_y^2/\bar{Y}^2) > (C_2/C_1).
The approximate values of C_1 and C_2 for large populations, correct up to terms of order N^{-2}, are given by

C_1 = \frac{1}{2nN} + \frac{5(n-1)}{12nN^2} \quad \text{and} \quad C_2 = \frac{n-1}{2nN} - \frac{(n-1)(n-2)}{3nN^2},

and thus (C_2/C_1) \approx (n-1). Hence the theorem.
Theorem 2.5.1.6. If squared error is the loss function, then show that \bar{y}_v is admissible amongst all functions of \bar{y}_v and v.
Proof. Let t = \bar{y}_v + l(\bar{y}_v, v) be a function of \bar{y}_v and v. Suppose that the estimator t is uniformly better than \bar{y}_v, and let R(t) be the quadratic risk function for the estimator t. Then the estimator t will be uniformly better than the estimator \bar{y}_v if
\sigma_y^2 = N^{-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2

using distinct units in a sample of n units drawn by using SRSWR sampling. The usual estimator of \sigma_y^2 is given by

s_v^2 = \left[1 - \frac{C_v(n-1)}{C_v(n)}\right] s_d^2, \qquad (2.5.2.1)

where

s_d^2 = \begin{cases} (v-1)^{-1}\sum_{i=1}^{v}\left(y_i - \bar{y}_v\right)^2 & \text{if } v > 1, \\ 0 & \text{otherwise}, \end{cases} \qquad (2.5.2.2)

and
Proof. Suppose we have any convex loss function and let T be an ordered sufficient statistic; then by the Rao–Blackwell theorem we have
To prove that the estimator at (2.5.2.1) is uniformly better than s_y^2, let us consider the following cases:
If v = 1, i.e., only one unit has been selected in the sample of two units drawn by SRSWR, then (2.5.2.4) is obviously zero. Suppose \sum_{I} and \sum_{II} denote the summations over all integral values of \alpha(i) such that the following equalities hold:

\sum_{i=1}^{v}\alpha(i) = n, \ \alpha(i) > 0 \ \text{for } i = 1, 2, \ldots, v, \quad \text{and} \quad \sum_{j=1}^{v}\alpha(j) = (n-2), \ \alpha(j) \ge 0,

and

E\left[(y_1 - y_2)^2 \mid T\right] = \sum_{i \ne j}\left(y_{(i)} - y_{(j)}\right)^2 P\left[x_1 = x_{(i)},\, x_2 = x_{(j)} \mid T\right].

On substituting the value of P[x_1 = x_{(i)}, x_2 = x_{(j)} \mid T] from (2.5.2.8), we obtain
14 IN 1213.024
47 WA 1100.745
22 MI 323.028
42 TN 553.266
23 MN 1354.768
48 WV 99.277
06 CO 315.809
07 CT 7.130
21 MA 7.590
31 NM 140.582
36 OK 612.108
16 KS 1049.834
27 NE 1337.852
10 GA 939.460
18 LA 282.565
26 MT 292.965
( a ) Estimate the average real estate farm loans in the United States using information from distinct units only.
( b ) Estimate the finite population variance of the real estate loans in the US using information from distinct units only.
( c ) Estimate the average real estate loans and its finite population variance by including repeated units in the sample. Comment on the results.
Solution. Here n = 20 and v = 17, and on the basis of distinct units information, we have
( b ) An estimator of the finite population variance \sigma_y^2 based on distinct units information is given by
s_v^2 = \left[1 - \frac{C_v(n-1)}{C_v(n)}\right] s_d^2.

Now

s_d^2 = \frac{1}{v-1}\sum_{i=1}^{v}\left(y_i - \bar{y}_v\right)^2 = \frac{3911380}{17-1} = 244461.25,

and

C_v(n) = v^n - \binom{v}{1}(v-1)^n + \cdots + (-1)^{v-1}\binom{v}{v-1}\,1^n.

C_v(n) = C_{17}(20) = 17^{20} - \binom{17}{1}(17-1)^{20} + \binom{17}{2}(17-2)^{20} - \binom{17}{3}(17-3)^{20} + \binom{17}{4}(17-4)^{20}
- \binom{17}{5}(17-5)^{20} + \binom{17}{6}(17-6)^{20} - \binom{17}{7}(17-7)^{20} + \binom{17}{8}(17-8)^{20}
- \binom{17}{9}(17-9)^{20} + \binom{17}{10}(17-10)^{20} - \binom{17}{11}(17-11)^{20} + \binom{17}{12}(17-12)^{20}
- \binom{17}{13}(17-13)^{20} + \binom{17}{14}(17-14)^{20} - \binom{17}{15}(17-15)^{20} + \binom{17}{16}(17-16)^{20}
= 2.6366 \times 10^{20},

and

C_v(n-1) = C_{17}(19) = 17^{19} - \binom{17}{1}(17-1)^{19} + \binom{17}{2}(17-2)^{19} - \binom{17}{3}(17-3)^{19} + \binom{17}{4}(17-4)^{19}
- \binom{17}{5}(17-5)^{19} + \binom{17}{6}(17-6)^{19} - \binom{17}{7}(17-7)^{19} + \binom{17}{8}(17-8)^{19} - \cdots
= 4.4805 \times 10^{18}.

Thus

s_v^2 = \left(1 - \frac{4.4805 \times 10^{18}}{2.6366 \times 10^{20}}\right) \times 244461.25 = 240307.004.
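The alternating sums above are an inclusion-exclusion count: C_v(n) is the number of ways n SRSWR draws can produce exactly the v observed distinct units. A sketch (not the book's code) that reproduces the example:

```python
from math import comb

# C_v(n) = sum_{k=0}^{v-1} (-1)^k * comb(v, k) * (v - k)**n,
# the number of onto (surjective) assignments of n draws to v distinct units.
def C(v, n):
    return sum((-1) ** k * comb(v, k) * (v - k) ** n for k in range(v))

def s_v2(sd2, v, n):
    # s_v^2 = [1 - C_v(n-1)/C_v(n)] * s_d^2    (2.5.2.1)
    return (1 - C(v, n - 1) / C(v, n)) * sd2

est = s_v2(244461.25, 17, 20)  # reproduces ~240307 from the example
```

Python's exact integer arithmetic avoids the overflow that plagues these huge terms (here of order 10^20) in fixed-width languages.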
2.6 ESTIMATION OF TOTAL OR MEAN OF A SUBGROUP (DOMAIN) OF A POPULATION
Sometimes we are interested in estimating the total or mean value of a variable of interest within a subgroup or part of the population. Such a part or subgroup of a population is called the domain of interest. For example, in a state-wide survey, a district may be considered as a domain. After completing the survey sampling process for the whole population, one may be interested in estimating the mean or total of a particular subgroup of the population.
Let D be the domain of interest and N_D be the number of units in this domain. Let Y_D = \sum_{i=1}^{N_D} Y_i and \bar{Y}_D = N_D^{-1}\sum_{i=1}^{N_D} Y_i be the total and mean for the domain D, respectively. Suppose we selected an SRSWOR sample s of n units from the entire population, and n_D \le n units out of the selected units are from the domain D of interest. In certain situations the value of N_D is known, and in other situations the value of N_D is unknown. We shall discuss both situations as follows. Define a variable

y_i^{*} = \begin{cases} Y_i & \text{if } i \in D, \\ 0 & \text{if } i \notin D. \end{cases} \qquad (2.6.1)

Then we have \sum_{i=1}^{N} y_i^{*} = Y_D = Y^{*} \ (\text{say}).
i=1
Theorem 2.6.2. The variance of the estimator YD under SRSWOR sampling is:
2(
V(YD) = N I- f)sb, where Sb=_I_f~Y;'2_N-I(~y;,)2J. (2 .6.3)
n N -I 1;=1 ;=1
Proof. Obvious from the results ofSRS WOR sampling.
v(\hat{Y}_D) = \frac{N^2(1-f)}{n} s_D^{*2}, \quad \text{where} \quad s_D^{*2} = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^{*2} - n^{-1}\left(\sum_{i=1}^{n} y_i^{*}\right)^2\right]. \qquad (2.6.4)

Proof. Obvious.
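The y* device makes domain estimation an ordinary SRSWOR problem. A minimal sketch, with made-up data purely for illustration:

```python
# Domain total via the y* device: y*_i = y_i for units in D, 0 otherwise;
# the estimator is N times the sample mean of y* over the full sample.
def domain_total(sample, in_domain, N):
    ystar = [y if d else 0.0 for y, d in zip(sample, in_domain)]
    return N * sum(ystar) / len(sample)

est = domain_total([4.0, 7.0, 1.0, 3.0], [True, False, True, False], N=100)  # 125.0
```

Note that the divisor is the full sample size n, not n_D: the zeros for out-of-domain units are genuine observations of y*, which is what keeps the estimator unbiased without knowing N_D.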
Lemma 2.6.1. If i \in s_D indicates that the i\text{th} unit is in the sampled subgroup of D, then we have

\Pr(i \in s_D \mid n_D > 0) = \frac{n_D}{N_D}. \qquad (2.6.5)

Proof. We have

\Pr(i \in s_D \mid n_D > 0) = \frac{\Pr(i \in s_D,\, n_D)}{\Pr(n_D)} = \frac{\text{Number of samples of size } n_D \text{ with } i \in s_D}{\text{Number of samples of size } n_D}

= \frac{\text{Number of ways } (n_D - 1) \text{ can be chosen from } (N_D - 1) \text{ and } (n - n_D) \text{ from } (N - N_D)}{\text{Number of ways } n_D \text{ can be chosen from } N_D \text{ and } (n - n_D) \text{ from } (N - N_D)} = \frac{n_D}{N_D}.
= E_2\left[\frac{N_D}{n_D}\sum_{i \in D} I_i Y_i \;\middle|\; n_D > 0\right], \quad \text{where} \quad I_i = \begin{cases} 1 & \text{if } i \in s_D, \\ 0 & \text{otherwise}. \end{cases}

Proof. We have

\frac{1}{n_D} = \frac{1}{n\left(\frac{n_D}{n}\right)} = \frac{1}{n\left(\frac{n_D}{n} - P_D\right) + n P_D}

(2.6.8)
Hence the lemma. For more details about the expected values of an inverse random variable one can refer to Stephan (1945).
Theorem 2.6.5. Show that the variance of the estimator \hat{Y}_D, when N_D is known, is

V(\hat{Y}_D) \approx \Pr(n_D > 0)\left[\left\{\frac{P_D N^2(1-f)}{n} + \frac{N^2}{n^2}(1 - P_D)\right\} S_D^2 + \Pr(n_D = 0)\, Y_D^2\right]. \qquad (2.6.9)

Proof. We have

V(\hat{Y}_D) = E_1\left[V_2(\hat{Y}_D \mid n_D)\right] + V_1\left[E_2(\hat{Y}_D \mid n_D)\right]. \qquad (2.6.10)

Now

V_2(\hat{Y}_D \mid n_D) = \begin{cases} \dfrac{N_D^2}{n_D}\left(1 - \dfrac{n_D}{N_D}\right) S_D^2 & \text{if } n_D > 0, \\ 0 & \text{if } n_D = 0, \end{cases} \quad \text{and} \quad E_2(\hat{Y}_D \mid n_D) = \begin{cases} Y_D & \text{if } n_D > 0, \\ 0 & \text{if } n_D = 0. \end{cases}

Now

V_1\left[E_2(\hat{Y}_D \mid n_D)\right] = V_1\left[Y_D\, I(n_D > 0)\right] = Y_D^2\, V_1\left[I(n_D > 0)\right] = Y_D^2 \Pr(n_D > 0)\left(1 - \Pr(n_D > 0)\right)
= Y_D^2 \Pr(n_D > 0)\Pr(n_D = 0), \qquad (2.6.12)

and

E_1\left[V_2(\hat{Y}_D \mid n_D)\right] = \sum_{j=1}^{\min(N_D,\,n)} \Pr(n_D = j \mid n_D > 0)\, \frac{N_D^2}{j}\left(1 - \frac{j}{N_D}\right) S_D^2 \, \Pr(n_D > 0)

\approx N_D^2 S_D^2\left\{\frac{1}{n P_D} + \frac{1 - P_D}{n^2 P_D^2} - \frac{1}{N_D}\right\}\Pr(n_D > 0), \quad \text{where} \quad P_D = \frac{N_D}{N},

\approx \left\{\frac{P_D N^2(1-f)}{n} + \frac{N^2}{n^2}(1 - P_D)\right\} S_D^2 \, \Pr(n_D > 0). \qquad (2.6.13)
where

\hat{P}_D = \frac{n_D}{n} \quad \text{and} \quad s_D^{*2} = \frac{1}{n_D - 1}\left[\sum_{i=1}^{n_D} y_i^{*2} - n_D^{-1}\left(\sum_{i=1}^{n_D} y_i^{*}\right)^2\right].
P(n = t) = \frac{\binom{NP}{m-1}\binom{NQ}{t-m}}{\binom{N}{t-1}} \times \frac{NP - m + 1}{N - t + 1}, \quad t = m, m+1, \ldots, m + NQ. \qquad (2.7.1)

Such a distribution is called the negative hypergeometric distribution, and we have \sum_{t=m}^{m+NQ} P(n = t) = 1. Then we have the following theorem.
An unbiased estimate of P^2 is

\widehat{P^2} = \frac{(N-1)(m-1)(m-2)}{N(n-1)(n-2)} + \frac{m-1}{N(n-1)},

and an estimate of the V(\hat{p}) is given by

v(\hat{p}) = \hat{p}^2 - \widehat{P^2} = \frac{(m-1)^2}{(n-1)^2} - \frac{(N-1)(m-1)(m-2)}{N(n-1)(n-2)} - \frac{m-1}{N(n-1)}.
estimate
P ( II; Ill , P ) = II - J
I P 11/Q /1- 11/ cl or /I = Ill, III + I, III + 2,...
( m- I
Then we have the following theorem.
Theorem 2.7.3. An unbiased estimator of the required proportion P of a rare attribute is

\hat{p} = \frac{m-1}{n-1}, \qquad (2.7.4)

and an estimator of the V(\hat{p}) is given by

v(\hat{p}) = \frac{\hat{p}(1 - \hat{p})}{n-2}. \qquad (2.7.5)

Proof. Obvious for large N from the previous theorems.
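The unbiasedness of (m-1)/(n-1) in the large-N (negative binomial) case is easy to check by simulation; this is an illustrative sketch, not code from the book:

```python
import random

# Inverse sampling in the large-N limit: draw Bernoulli(P) trials until the
# m-th unit with the rare attribute appears; return the number of draws n.
def inverse_sample(P, m, rng):
    count, n = 0, 0
    while count < m:
        n += 1
        if rng.random() < P:
            count += 1
    return n

rng = random.Random(42)
P, m = 0.1, 5
ests = [(m - 1) / (inverse_sample(P, m, rng) - 1) for _ in range(5000)]
mean_est = sum(ests) / len(ests)  # should be close to P = 0.1
```

The naive ratio m/n is biased upward for such stopping rules; replacing it by (m-1)/(n-1) removes the bias exactly.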
While using the simple random sampling and without replacement (SRSWOR) design, the number of possible samples, \binom{N}{n}, is very large, even for moderate sample and population sizes. For example:

Number of samples \binom{N}{n}

  n \ N:    30             40
  5         142,506        658,008
  10        30,045,015     847,660,528
  15        155,117,520    40,225,345,056
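The entries of the table are exact binomial coefficients and can be checked directly:

```python
from math import comb

# Number of possible SRSWOR samples of size n from N units.
table = {(n, N): comb(N, n) for n in (5, 10, 15) for N in (30, 40)}
```

Even at these modest sizes the counts reach the tens of billions, which is the motivation for controlled designs that restrict attention to a much smaller set of samples.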
Sometimes in field surveys, all the possible samples are not equally preferable from the operational point of view, because a few of them may be inaccessible, expensive, or inconvenient, etc. It is therefore advantageous if the sampling design is such that the total number of possible samples is much less than \binom{N}{n}, while retaining the unbiasedness properties of the sample mean and sample variance for their respective population parameters. The idea goes back to Neyman's (1923) notation for causal effects in randomized experiments and Fisher's (1925) proposal to actually randomize treatments to units. Neyman (1923) appears to have been the first to provide a mathematical analysis for a randomized experiment with explicit notation for the potential outcomes, implicitly making the stability assumption. This notation became standard for work in randomized experiments (e.g., Pitman, 1937; Welch, 1937; McCarthy, 1939; Anscombe, 1948; Kempthorne, 1952; Brillinger, Jones, and Tukey, 1978; Hodges and Lehmann, 1970, and dozens of other places, often assuming constant treatment effects as in Cox, 1958, and sometimes being used quite informally as in Freedman, Pisani, and Purves, 1978). Neyman's formalism was a major advance because it allowed explicit probabilistic inferences to be drawn from data, where the probabilities were explicitly defined by the randomized assignment mechanism. Independently and nearly simultaneously, Fisher (1925) invented a somewhat different method of inference for randomized experiments, also based on the special class of randomized assignment mechanisms. Fisher's test and the resulting 'significance levels' (i.e., p values) remain the accepted rigorous standard for the analysis of randomized clinical trials at the end of the twentieth century, the so-called 'intent to treat' analyses. The notion of the central role of randomized experiments seems to have been 'in the air' in the 1920s, but Fisher was the first to combine physical randomization with a theoretical analysis tied to it. A review on randomization is also available by Fienberg and Tanur (1987). These
ideas were primarily associated with the notion of fairness and objectivity in their earlier work. The role of the International Statistical Institute in the earlier work related to sample surveys was reviewed by Smith and Sugden (1985). Fienberg and Tanur (1987) explored some of the developments following from this earlier pioneering work, with an emphasis on the parallels between the methodologies in the design of experiments and the design of sample surveys. Chakrabarti (1963) initiated the idea that the results on the existence and construction of balanced sampling designs can be easily translated into the language of design theory by using the correspondence between sampling designs and block designs. Bellhouse (1984a) also worked on these lines and has shown that a systematic application of the treatments minimises the variance of the treatment constant averaged over the applications of the treatment. The lack of cross reference in the review papers by Cox (1984) and Smith (1984) suggested that the specialisation extends even to compartmentalisation within the minds and professional lives of outstanding investigators, for both these authors have been steeped in the tradition of parallels. For example, consider a balanced incomplete block design (BIBD) with standard parameters (b, v, r, k, \lambda), where v denotes the number of varieties, b the number of blocks, k the block size, r the number of times each treatment occurs, and \lambda the number of times any pair of treatments occurs together in a block. In practice
a block of the design is taken as a sample. Define

I_{si} = \begin{cases} 1 & \text{if } i \in s, \\ 0 & \text{if } i \notin s, \end{cases} \quad \text{such that} \quad E(I_{si}) = \frac{r}{b},

and

I_{sij} = \begin{cases} 1 & \text{if } i, j \in s, \\ 0 & \text{otherwise}, \end{cases} \quad \text{such that} \quad E(I_{sij}) = \frac{\lambda}{b},

such that

E(\bar{y}) = E\left[\frac{1}{n}\sum_{i=1}^{N} Y_i I_{si}\right] = \frac{1}{n}\sum_{i=1}^{N} Y_i E(I_{si}) = \frac{1}{n}\sum_{i=1}^{N} Y_i \frac{r}{b} = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y},

because vr = bk.
Similarly, using r(k-1) = \lambda(v-1) and bk = vr, we have

E\left(\sum_{i \ne j} y_i y_j\right) = E\left(\sum_{i \ne j} Y_i Y_j I_{sij}\right) = \frac{\lambda}{b}\sum_{i \ne j} Y_i Y_j = \frac{r(k-1)}{b(v-1)}\sum_{i \ne j} Y_i Y_j = \frac{k(k-1)}{v(v-1)}\sum_{i \ne j} Y_i Y_j = \frac{n(n-1)}{N(N-1)}\sum_{i \ne j} Y_i Y_j.
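These expectations can be checked numerically on a concrete design. The block list below is a hypothetical choice, not one from the book: the Fano plane, a BIBD with b = 7, v = 7, r = 3, k = 3, lambda = 1, used as a controlled design in which only the 7 blocks are selectable samples:

```python
from itertools import combinations

# Fano plane blocks (0-indexed); b = 7 blocks over v = 7 units.
blocks = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
          (1, 4, 6), (2, 3, 6), (2, 4, 5)]
b, v, r, k, lam = 7, 7, 3, 3, 1

for i in range(v):                       # each unit appears in r blocks
    assert sum(i in blk for blk in blocks) == r
for i, j in combinations(range(v), 2):   # each pair appears in lambda blocks
    assert sum(i in blk and j in blk for blk in blocks) == lam

# Unbiasedness of the sample mean: averaging the block means over the b
# equally likely blocks recovers the population mean, since E(I_si) = r/b.
Y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]
pop_mean = sum(Y) / v
avg_block_mean = sum(sum(Y[i] for i in blk) / k for blk in blocks) / b
```

Only 7 of the \binom{7}{3} = 35 possible samples are ever selected, yet the sample mean remains exactly unbiased, which is the point of Theorem 2.8.1 below.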
Theorem 2.8.1. Under a controlled sampling design, the sample mean and sample variance remain unbiased for their respective parameters.
2.9 DETERMINANT SAMPLING
Subramani and Tracy (1993) used the concept of incomplete block designs in sample surveys and introduced a new sampling scheme called determinant sampling. This scheme totally ignores the units close to each other for selection in the sample. In the preceding discussion, the units which are close to each other in some sense are called contiguous units. Chakrabarti (1963) excluded contiguous units when translating the results of sampling designs to experimental designs, since these units have a tendency to provide identical information, which may be induced by factors like time, category, or location. As an example, in socio-economic surveys people
have a tendency to exhibit similar expenditure patterns on household items during different weeks of the month. Moreover, people belonging to the same income category class have a greater tendency to have similar expenditure patterns. With regard to the factor location, residents of a specific area show similar symptoms of a disease caused by environmental pollution, as of some infectious disease. Similarly, in crop field surveys contiguous farms and fields should be avoided. Because of this limitation, Rao (1975, 1987) has suggested that if contiguous units occur in any observed sample, they may be collapsed into a single unit, with the corresponding response as the average observed response over these units. An estimate of the unknown parameter is then recommended on the basis of such a reduced sample. The situations for getting more information on the population by avoiding pairs of contiguous units in the observed sample are well summarised by Hedayat, Rao, and Stufken (1988). Tracy and Osahan (1994a) further extended their work for other sampling schemes.
EXERCISES
Exercise 2.1. Define simple random sampling. Is the sample mean a consistent or unbiased estimator of the population mean? Derive the variance of the estimator using ( a ) SRSWR sampling, ( b ) SRSWOR sampling. Also derive an unbiased estimator of the variance in each situation.
Exercise 2.2. A population consists of N units, the value of one unit being known to be Y_1. An SRSWOR sample of (n-1) units is drawn from the remaining (N-1) population units. Show that the estimator

\hat{Y}_1 = Y_1 + (N-1)\bar{y}_{n-1},

where \bar{y}_{n-1} = (n-1)^{-1}\sum_{i=1}^{n-1} y_i, is an unbiased estimator of the population total, Y = \sum_{i=1}^{N} Y_i, but the variance of the estimator \hat{Y}_1 is not less than the variance of the estimator \hat{Y}_2 = N\bar{y}_n, where \bar{y}_n = n^{-1}\sum_{i=1}^{n} y_i is an estimator of the population mean based on the sample of size n selected from the population of N units. In other words, the estimator \hat{Y}_1 is no more efficient than \hat{Y}_2. Give reasons.
Hint: By setting V(\hat{Y}_1) \ge V(\hat{Y}_2) we obtain N > n, which is always true for SRSWOR.
Exercise 2.3. Suppose that in a list of N businesses, serially numbered, k businesses are found to be dead and t new businesses have come into existence, making the total number of businesses (N - k + t). Give a simple procedure for selecting a business with equal probability from the (N - k + t) businesses, avoiding renumbering of the original businesses, and show that the newly developed procedure achieves equal probability for the new businesses too.
Hint: Using SRSWOR sampling, the probability of selecting each unit will be (N - k + t)^{-1}.
Exercise 2.4. Show that the bias in the Searls' estimator, defined as \bar{y}_{searl} = \lambda\bar{y}_n, is

B(\bar{y}_{searl}) = -\bar{Y}\, V(\bar{y}_n)\,/\,\{\bar{Y}^2 + V(\bar{y}_n)\}.

Hence deduce its values under SRSWR and SRSWOR.
Hint: Reddy (1978a).
Exercise 2.5. An analogue of the Searls' estimator for estimating the population proportion is defined as \hat{p}_{searl} = \gamma\, p_y, where \gamma is a constant. Find the minimum mean square error of the estimator \hat{p}_{searl} under SRSWR and SRSWOR sampling. Also study the bias of the estimator in each situation.
Hint: Conti (1995).
Exercise 2.6. Consider the estimator

\hat{\lambda} = \left[1 + \left(\frac{N-1}{Nn}\right)\left(s_y^2/\bar{y}_n^2\right)\right]^{-1}.

Show that \hat{\lambda} is a consistent estimator of the optimum value of \lambda. Also calculate the bias and mean squared error, to the first order of approximation, of the estimator of the population mean defined as \bar{y}_0 = \hat{\lambda}\bar{y}_n. Deduce the results for estimating the population proportion with the estimator \hat{p}_{searl} = \hat{\gamma}\, p_y, where \hat{\gamma} is a consistent estimator of \gamma.
Hint: Mangat, Singh, and Singh (1991).
Exercise 2.7. Show that: ( a ) under SRSWR sampling s_y^2 is an unbiased estimator of \sigma_y^2; ( b ) under SRSWOR sampling s_y^2 is an unbiased estimator of S_y^2.
Exercise 2.8. Define the Searls' estimator of the population mean. Show that the relative efficiency of the Searls' estimator is a decreasing function of sample size under ( a ) SRSWR, ( b ) SRSWOR sampling designs.
Exercise 2.9. Show that the probability of selecting the i-th unit at the s-th draw remains the same under SRSWR and SRSWOR and is given by 1/N.
Exercise 2.10. Why is the Searls' estimator not useful in actual practice? Suggest some modifications to make it practicable.
Hint: Use λ̂ in place of λ.
130 Advanced sampling theory with applications
Exercise 2.11. In the case of SRSWR sampling, if there are two characters y and x, the covariance between y and x is defined as
σ_xy = (1/N) Σ_{i=1}^{N} (Y_i − Ȳ)(X_i − X̄).
The usual estimator of σ_xy is given by
s_xy = {1/(n − 1)} Σ_{i=1}^{n} (y_i − ȳ)(x_i − x̄).
Show that an estimator better than s_xy based only on distinct units is:
Exercise 2.12. (a) Show that the usual estimator of the population total (namely Nȳ) in SRSWOR has minimum average mean squared error, over permutations of the values attached to the units, in the general class of linear translation invariant estimators of the population total Y.
(b) Show that for SRSWOR sampling of size n, the estimator which minimises the average mean squared error, over permutations of the values attached to the units, in the class of all linear estimators is given by
Ŷ_e = (N/n)(1 + δ)⁻¹ Σ_{i∈s} y_i, where δ = {(N − n)/((N − 1)n)} C_y²
and C_y is the known population coefficient of variation.
Hint: Ramakrishnan and Rao (1975).
Exercise 2.13. Let a finite population consist of N units. To every unit there is attached a characteristic y. The characteristics are assumed to be measured on a given scale with distinct points Y₁, Y₂, ..., Y_t. Let N_t be the number of units associated with scale point Y_t, with N = Σ N_t. A simple random sample of size n is drawn, where n_t is the number of times the value Y_t is observed in the sample.
Hint: Hartley and Rao (1968).
Exercise 2.14. Suppose we selected a sample of size n such that the i-th unit of the population occurs f_i times in the sample. Assume that n₁ of these units (n₁ < n) are selected with frequency one. Evidently n = n₁ + Σ_{i=1}^{r} f_i, where r is the number of units occurring f_i times in the sample. Let d (= n₁ + r) be the number of distinct units in the sample. The d units are measured by one set of investigators and the r repeated units by another set, preferably by the supervising staff. The measurements of the d units are denoted by x₁, x₂, ..., x_{n₁} for the non-repeated ones and x_{n₁+1}, x_{n₁+2}, ..., x_{n₁+r} for the repeated ones. The measurements of the r repeated units are denoted by z₁, z₂, ..., z_r. Using the above information and notation, study the asymptotic properties of the following estimators of population mean:
(a) x̄_d = (1/d)(n₁ x̄_{n₁} + r x̄_r);
(b) z̄_R = z̄_r (x̄_d / x̄_r);
(c) z̄_lr = z̄_r + β̂(x̄_d − x̄_r).
Exercise 2.15. Discuss the problem of the estimation of domain total in survey
sampling. Derive the estimator of domain total and find its variance under different
situations.
Exercise 2.16. Under SRSWR sampling, show that the following estimators of the finite population variance σ_y², based on the d distinct units in the sample and their sample variance s_d², are unbiased:
(a) v̂₁ = {(N^n − N(N − 1)^{n−1})/(N^{n−1}(N − 1))} s_d²;
(b) v̂₂ = [(1/d − 1/N) + N^{1−n}(1 − 1/N)] s_d²;
(c) v̂₃ = {C_v(n − 1)/C_v(n)} s_d²;
(d) v̂₄ = {(N^n − N(N − 1)^{n−1})/(N^{n−1}(N − 1))}{1 − C_v(n − 1)/C_v(n)} s_d².
Exercise 2.17. Discuss the method and theory of the estimation of rare attributes in survey sampling.
Exercise 2.18. Write a program in FORTRAN or SAS to find the values of the coefficients C_v(n − 1) and C_v(n). Test the results for n = 5 and v = 3, with all steps, using your desk calculator.
Exercise 2.19. Consider the estimator of population mean ȳ_new = Σ_{t=1}^{n} c_t y_t, where c_t is a constant depending on the t-th draw and y_t is the value of y on the unit selected at the t-th draw.
(a) Show that ȳ_new is unbiased for the population mean Ȳ if and only if Σ_{t=1}^{n} c_t = 1.
(c) Show that V(ȳ_new) is minimised subject to the condition Σ c_t = 1 if and only if c_t = 1/n, t = 1, 2, ..., n.
Hint: Σ c_t² ≥ (Σ c_t)²/n = 1/n, and equality holds if and only if c_t = 1/n.
(a) Show that both estimators ȳ_n and ȳ_d are unbiased for the population mean Ȳ.
Exercise 2.21. Discuss controlled sampling. Show that the sample mean and sample variance remain unbiased for their respective parameters.
Exercise 2.22. Discuss the concept of a rare attribute and give a possible solution using inverse sampling.
PRACTICAL PROBLEMS
Practical 2.1. Consider the problem of estimation of the total number of fish caught by marine recreational fishermen on the Atlantic and Gulf coasts. We know that there were 69 species caught during 1992, as shown in population 4 in the Appendix. What is the minimum number of species groups to be selected by SRSWR sampling to attain an accuracy of 12% relative standard error?
Given: σ_y² = 31,010,599 and Ȳ = 291,882.
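Under SRSWR, V(ȳ) = σ_y²/n, so RSE(ȳ) = C_y/√n and the minimum size is n = (C_y/φ)², rounded up. A small sketch (the function name is illustrative), which can be applied directly to the figures quoted for Practical 2.1:

```python
import math

def min_n_srswr(sigma2, mean, rse):
    """Smallest n with RSE(ybar) = (sigma/mean)/sqrt(n) <= rse under SRSWR,
    where sigma2 is the population variance and mean the population mean."""
    cv = math.sqrt(sigma2) / mean          # population coefficient of variation
    return math.ceil((cv / rse) ** 2)

n_fish = min_n_srswr(31_010_599, 291_882, 0.12)   # Practical 2.1 figures
```

The same one-liner answers any "minimum n for a given RSE" question under with-replacement sampling; only the variance, mean, and target RSE change.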
Practical 2.2. Your supervisor has suggested that you think about the problem of estimation of the total number of fish caught by marine recreational fishermen on the Atlantic and Gulf coasts. He told you that there were 69 species caught during 1993, as shown in population 4 in the Appendix. He needs your help in deciding the sample size using the SRSWOR design with a relative standard error of 25%. How can your knowledge of statistics help him?
Given: S_y² = 39,881,874 and Ȳ = 316,784.
Practical 2.3. The demand for Bluefish has been found to be highest in certain markets. In order to supply these types of fish, the estimation of the proportion of Bluefish is an important issue. On the Atlantic and Gulf coasts, in a large sample of 311,528 fish there were shown to be 10,940 Bluefish caught during 1995. What is the minimum number of fish to be selected by SRSWR sampling to attain an accuracy of 12% relative standard error?
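For a proportion under SRSWR, V(p̂) = P(1 − P)/n, so RSE(p̂)² = (1 − P)/(nP) and the minimum size is n = (1 − P)/(P·φ²), rounded up. Taking the large observed sample above as giving P, a sketch (function name illustrative):

```python
import math

def min_n_prop_srswr(p, rse):
    """Smallest SRSWR size with RSE(p_hat) = sqrt((1 - p)/(n p)) <= rse."""
    return math.ceil((1 - p) / (p * rse ** 2))

# Practical 2.3: P is approximated from the large 1995 sample.
n_bluefish = min_n_prop_srswr(10_940 / 311_528, 0.12)
```

Note how strongly the answer depends on P: rare attributes (small P) force very large samples for the same relative precision, which is the motivation for inverse sampling discussed elsewhere in this chapter.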
Practical 2.4. John considers the problem of estimation of the total number of fish caught by marine recreational fishermen on the Atlantic and Gulf coasts. There were 69 species caught during 1994, as shown in population 4 in the Appendix. John selected a sample of 20 units by SRSWR sampling. What will be his gain in efficiency if he considers the Searls' estimator instead of the usual estimator?
Given: S_y² = 49,829,270 and Ȳ = 341,856.
Practical 2.5. Select an SRSWR sample of twenty units from population 4 given in the Appendix. Collect the information on the number of fish during 1994 in each of the species groups selected in the sample. Estimate the average number of fish caught by marine recreational fishermen on the Atlantic and Gulf coasts during 1994. Construct a 95% confidence interval for the average number of fish in each species group of the United States.
Practical 2.7. Select an SRSWR sample of 20 states using the Random Number Table method from population 1 of the Appendix. Note the frequency of each state selected in the sample. Construct a new sample by keeping only distinct states and collect the information about the nonreal estate farm loans in these states. From the information collected in the sample:
(a) Estimate the average nonreal estate farm loans in the United States using information from distinct units only.
(b) Estimate the finite population variance of the nonreal estate loans in the United States using distinct units only.
(c) Estimate the average nonreal estate loans and its finite population variance by including repeated units in the sample. Comment on the results.
Practical 2.8. A fisherman visited the Atlantic and Gulf coasts and caught 6,000 fish one by one. He noted the species group of each fish caught by him and put that fish back in the sea before making the next catch. He observed that 700 fish belong to the group Herrings.
(a) Estimate the proportion of fish in the group Herrings living on the Atlantic and Gulf coasts.
(b) Construct the 95% confidence interval.
Practical 2.9. During 1995 Michael visited the Atlantic and Gulf coasts and caught 7,000 fish. He observed the species group of each one of the fish caught by him using SRSWOR sampling and found that 1,068 fish belong to the group Red snapper.
(a) Estimate the proportion of fish in the group Red snapper living on the Atlantic and Gulf coasts.
(b) Construct the 95% confidence interval.
Given: Total number of fish living on the coasts = 311,528.
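Michael's figures can be worked through as follows, using one common textbook form of the SRSWOR variance estimator, v(p̂) = (1 − n/N)·p̂(1 − p̂)/(n − 1); treat the divisor convention as an assumption:

```python
import math

n, a, N = 7_000, 1_068, 311_528        # catch size, Red snapper count, total fish
p_hat = a / n                          # estimated proportion of Red snapper
fpc = 1 - n / N                        # finite-population correction for SRSWOR
se = math.sqrt(fpc * p_hat * (1 - p_hat) / (n - 1))
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)
```

Here the fpc barely matters (n/N ≈ 2%), but for samples that are a large fraction of the population it shrinks the interval noticeably.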
Practical 2.10. Following the instructions of an ABC company, select an SRSWR sample of 25 units from population 1 by using the 4th and 5th columns of the Pseudo-Random Numbers (PRN) given in Table 1 of the Appendix. Record the states selected more than once in the sample. Reduce the sample size by keeping each state only once in the sample and collect the information about the real estate farm loans in these states. Use this information to:
(a) Estimate the average real estate farm loans in the United States using information from distinct units only.
(b) Estimate the finite population variance of the real estate loans in the US using information from distinct units only.
(c) Estimate the average real estate loans and its finite population variance by including repeated units in the sample. Comment on the results.
Practical 2.11. Think of a practical situation where you have to estimate the total of a variable or characteristic of a subgroup (domain) of a population. Take a sample of reasonable size from the population under study and collect the information from the units selected in the sample. Apply the appropriate formulae to construct the 95% confidence interval estimate.
Practical 2.12. A practical situation arises where you have to estimate the proportion of a rare attribute in a population, e.g., extramarital relations. Collect the information from the units selected in the sample through inverse sampling from the population under study. Apply the appropriate formulae to construct the 95% confidence intervals for the proportion of the rare attribute in the population.
Practical 2.13. A sample of 30 out of 100 managers was taken, and they were asked whether or not they usually take work home. The responses of these managers are given below, where 'Yes' indicates they usually take work home and 'No' means they do not.
Construct 95% confidence intervals for the proportion of all managers who take work home using the following sampling schemes:
(a) Simple Random Sampling With Replacement;
(b) Simple Random Sampling Without Replacement.
Practical 2.14. From a list of 80,000 farms in a state, a sample of 2,100 farms was
selected by SRSWOR sampling. The data for the number of cattle for the sample
were as follows:
Σ_{i=1}^{n} y_i = 38,000 and Σ_{i=1}^{n} y_i² = 920,000.
Estimate from the sample the total number of cattle in the state and the average number of cattle per farm, along with their standard errors, coefficients of variation, and 95% confidence intervals.
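All the requested quantities follow from the two sample sums; a sketch under SRSWOR (variable names illustrative):

```python
import math

N, n = 80_000, 2_100
sum_y, sum_y2 = 38_000.0, 920_000.0
ybar = sum_y / n                              # average cattle per farm
s2 = (sum_y2 - n * ybar ** 2) / (n - 1)       # sample variance s_y^2
total = N * ybar                              # estimated total cattle
se_mean = math.sqrt((1 / n - 1 / N) * s2)     # SE of ybar with fpc
se_total = N * se_mean                        # SE of the estimated total
cv_pct = 100 * se_mean / ybar                 # coefficient of variation (%)
ci_total = (total - 1.96 * se_total, total + 1.96 * se_total)
```

The identity Σ(y_i − ȳ)² = Σy_i² − nȳ² is what lets the variance be recovered from the published sums without the raw data.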
Practical 2.15. At St. Cloud State University, the length of hairs, Y, on the heads of girls is assumed to be uniformly distributed between 5 cm and 25 cm with the probability density function
f(y) = 1/20, 5 < y < 25.
(a) We wish to estimate the average length of hairs with an accuracy of 5% relative standard error; what is the required minimum number of hairs to be taken from the girls?
(b) Select a sample of the required size, and use it to construct a 95% confidence interval for the average length of hairs.
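Part (a) follows from the coefficient of variation of the uniform law: for U(5, 25), μ = 15 and σ² = (25 − 5)²/12, so n = (C_y/0.05)², rounded up. A sketch covering both parts (the normal-approximation interval in (b) is a simplifying assumption):

```python
import math
import random

a, b = 5.0, 25.0
mu, sigma2 = (a + b) / 2, (b - a) ** 2 / 12   # mean 15, variance 400/12
cv = math.sqrt(sigma2) / mu                   # population CV
n = math.ceil((cv / 0.05) ** 2)               # minimum size for 5% RSE

# (b) draw a sample of that size and build a 95% confidence interval
rng = random.Random(7)
sample = [rng.uniform(a, b) for _ in range(n)]
ybar = sum(sample) / n
s = math.sqrt(sum((v - ybar) ** 2 for v in sample) / (n - 1))
half = 1.96 * s / math.sqrt(n)
ci = (ybar - half, ybar + half)
```

A repeated run with fresh seeds would show the interval covering μ = 15 in roughly 95% of replications, which is a useful self-check on the whole recipe.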
Practical 2.16. The distribution of weight y shipped to 1,000 locations has a logistic distribution
f(y) = {1/(4β*)} sech²{(y − α*)/(2β*)}
with α* = 10 and β* = 0.5.
(a) Find the value of the minimum sample size n required to estimate the average weight shipped with an accuracy of 0.05% standard error.
(b) Select a sample of the required size and construct a 95% confidence interval for the average weight shipped.
(c) Does the true weight lie in the 95% confidence interval?
Practical 2.17. Assume that the life of every person is made up of an infinite number of good and bad events. Count the total number of good and bad events that you remember happening to you. Estimate the proportion of good events in your life. Construct a 95% confidence interval estimate. Name the sampling scheme you adopted to estimate the proportion of good happenings, and comment.
Practical 2.18. Assume that everyone dreams an infinite number of times during sleeping hours over a lifetime. Count the number of good and bad dreams in your life that you remember. Estimate the proportion of good dreams and construct a 95% confidence interval estimate. Name the sampling scheme you followed to estimate the proportion of good dreams, and comment.
Practical 2.19. Dr. Dreamer believes that if a person has good dreams during sleeping hours then he or she is mentally healthier and a more pleasant person. You are instructed to report stories of your dreams to the doctor until you have had 15 good dreams. Find Dr. Dreamer's 95% confidence interval estimate of the proportion of good dreams in your life. Can you be considered a pleasant person? Comment and list the sampling scheme used.
3. USE OF AUXILIARY INFORMATION: SIMPLE RANDOM
SAMPLING
3.0 INTRODUCTION
It is well known that suitable use of auxiliary information in probability sampling results in considerable reduction in the variance of the estimators of population parameters, viz. the population mean (or total), median, variance, regression coefficient, and population correlation coefficient, etc. In this chapter we will consider the problem of estimation of different population parameters of interest to survey statisticians using known auxiliary information under SRSWOR and SRSWR sampling schemes only. Before proceeding further it is necessary to define some notation and expected values, which will be useful throughout this chapter.
Assume that a simple random sample (SRS) of size n is drawn from the given population of N units. Let the value of the study variable Y and the auxiliary variable X for the i-th unit (i = 1, 2, ..., N) of the population be denoted by Y_i and X_i, and for the i-th unit in the sample (i = 1, 2, ..., n) by y_i and x_i, respectively. From the sample observations we have
ȳ = (1/n) Σ_{i=1}^{n} y_i , x̄ = (1/n) Σ_{i=1}^{n} x_i , s_y² = {1/(n − 1)} Σ_{i=1}^{n} (y_i − ȳ)² , s_x² = {1/(n − 1)} Σ_{i=1}^{n} (x_i − x̄)² ,
and
s_xy = {1/(n − 1)} Σ_{i=1}^{n} (y_i − ȳ)(x_i − x̄).
For the population, let
μ_rs = {1/(N − 1)} Σ_{i=1}^{N} (Y_i − Ȳ)^r (X_i − X̄)^s , and λ_rs = μ_rs/(μ₂₀^{r/2} μ₀₂^{s/2}).
Note that μ₂₀ = S_y², μ₀₂ = S_x² and μ₁₁ = S_xy, so that C_y² = S_y²/Ȳ² = μ₂₀/Ȳ², C_x² = S_x²/X̄² = μ₀₂/X̄², and ρ_xy = S_xy/(S_x S_y) = μ₁₁/(√μ₂₀ √μ₀₂).
Let us define
ε₀ = ȳ/Ȳ − 1, ε₁ = x̄/X̄ − 1, ε₃ = s_x²/S_x² − 1, and ε₄ = s_xy/S_xy − 1.
The next section is devoted to estimating the population mean in the presence of known auxiliary information. Several estimators of the population mean are available in the literature and we will discuss some of them.
3.2.1 RATIO ESTIMATOR
Cochran (1940) was the first to show the contribution of known auxiliary information in improving the efficiency of the estimator of the population mean Ȳ in survey sampling. Assuming that the population mean X̄ of the auxiliary variable is known, he introduced a ratio estimator of the population mean Ȳ defined as
Chapter 3: Use of auxiliary information: Simple random sampling 139
ȳ_R = ȳ (X̄/x̄). (3.2.1.1)
Theorem 3.2.1.1. The bias in the ratio estimator ȳ_R of the population mean Ȳ, to the first order of approximation, is
B(ȳ_R) = {(1 − f)/n} Ȳ (C_x² − ρ_xy C_y C_x). (3.2.1.2)
Proof. In terms of ε₀ and ε₁ we have ȳ_R = Ȳ(1 + ε₀)(1 + ε₁)⁻¹. Assuming |ε₁| < 1 and using the binomial expansion of the term (1 + ε₁)⁻¹ we have
ȳ_R = Ȳ(1 + ε₀ − ε₁ + ε₁² − ε₀ε₁ + O(ε₁)), (3.2.1.4)
where O(ε₁) denotes the higher order terms of ε₁. Note that, since |ε₁| < 1, ε₁^g → 0 as g > 1 increases. Therefore the terms in (3.2.1.4) with higher powers of ε₁ are negligible and can be ignored. Now taking expected values on both sides of (3.2.1.4) and using the results from section 3.1 we obtain
E(ȳ_R) = Ȳ + {(1 − f)/n} Ȳ (C_x² − ρ_xy C_y C_x).
Thus the bias in the estimator ȳ_R to the first order of approximation is given by (3.2.1.2). Hence the theorem.
Theorem 3.2.1.2. The mean squared error of the ratio estimator ȳ_R of the population mean Ȳ, to the first order of approximation, is given by
MSE(ȳ_R) = {(1 − f)/n} Ȳ² [C_y² + C_x² − 2ρ_xy C_y C_x]. (3.2.1.6)
Proof. By the definition of mean squared error (MSE) and using (3.2.1.4) we have
MSE(ȳ_R) = E[ȳ_R − Ȳ]² ≈ E[Ȳ(1 + ε₀ − ε₁ + ε₁² − ε₀ε₁ + O(ε₁)) − Ȳ]² ≈ Ȳ² E[ε₀ − ε₁ + ε₁² − ε₀ε₁]².
Again neglecting higher order terms and using results from section 3.1, the MSE to the first order of approximation is given by (3.2.1.6).
By substituting the values of C_y, C_x and ρ_xy in (3.2.1.6), one can easily see that the mean squared error of the estimator ȳ_R, to the first order of approximation, can also be written as
MSE(ȳ_R) = {(1 − f)/n}[S_y² + R²S_x² − 2R S_xy], where R = Ȳ/X̄.
Theorem 3.2.1.3. An estimator of the mean squared error of the ratio estimator ȳ_R, to the first order of approximation, is
{(1 − f)/n}[s_y² + r²s_x² − 2r s_xy], where r = ȳ/x̄.
Theorem 3.2.1.4. Another form of the estimator of the mean squared error of the ratio estimator ȳ_R, to the first order of approximation, is
Theorem 3.2.1.5. The ratio estimator ȳ_R is more efficient than the sample mean ȳ if
ρ_xy (C_y/C_x) > 1/2. (3.2.1.10)
Proof. The proof follows from the fact that the ratio estimator ȳ_R is more efficient than the sample mean ȳ if
MSE(ȳ_R) < V(ȳ),
or if C_y² + C_x² − 2ρ_xy C_y C_x < C_y², i.e., if ρ_xy > C_x/(2C_y).
In the condition (3.2.1.10), if we assume that C_y ≈ C_x, then it holds for all values of the correlation coefficient ρ_xy in the range (0.5, 1.0]. A Monte Carlo study of the ratio estimator is available from Rao and Beegle (1967). Thus we have the following theorem.
Theorem 3.2.1.6. The ratio estimator ȳ_R is more efficient than the sample mean ȳ if ρ_xy > 0.5, i.e., if the correlation between X and Y is positive and high.
Example 3.2.1.1. Mr. Bean was interested in estimating the average amount of real estate farm loans (in $000) during 1997 in the United States. He took an SRSWOR sample of eight states from population 1 given in the Appendix. From the states selected in the sample he gathered the following information.

State: CA, GA, LA, MS, NM, PA, TX, VT
Nonreal estate farm loans (X, $000): 3928.732, 540.696, 405.799, 549.551, 274.035, 298.351, 3520.361, 19.363
Real estate farm loans (Y, $000): 1343.461, 939.460, 282.565, 627.013, 140.582, 756.169, 1248.761, 57.747

The average amount $878.16 of nonreal estate farm loans (in $000) for the year 1997 is known. Apply the ratio method of estimation for estimating the average amount of the real estate farm loans (in $000) during 1997. Also find an estimator of the mean squared error of the ratio estimator and hence deduce a 95% confidence interval.
Solution. Thus we have n = 8,
s_x² = Σ(x_i − x̄)²/(8 − 1) = 17,382,362.33/7 = 2,483,194.6, x̄ = Σx_i/8 = 9536.888/8 = 1192.11,
s_y² = Σ(y_i − ȳ)²/(8 − 1) = 1,675,478.89/7 = 239,354.1, ȳ = Σy_i/8 = 5395.758/8 = 674.469,
s_xy = Σ(y_i − ȳ)(x_i − x̄)/(8 − 1) = 4,474,294.08/7 = 639,184.86, and r = ȳ/x̄ = 674.469/1192.11 = 0.5658.
We are given X̄ = 878.16, N = 50 and f = 0.16.
Thus the ratio estimate of the average amount of real estate farm loans during 1997, Ȳ (say), is given by
ȳ_R = ȳ(X̄/x̄) = 674.469 × (878.16/1192.11) = 496.86.
Using Table 2 given in the Appendix, the 95% confidence interval is given by
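Mr. Bean's computation can be replayed in a few lines (a sketch mirroring Example 3.2.1.1; small differences from the book's figures come from its intermediate rounding):

```python
x = [3928.732, 540.696, 405.799, 549.551, 274.035, 298.351, 3520.361, 19.363]
y = [1343.461, 939.460, 282.565, 627.013, 140.582, 756.169, 1248.761, 57.747]
X_bar, N = 878.16, 50                 # known population mean of x, population size
n = len(y)
f = n / N                             # sampling fraction
xbar, ybar = sum(x) / n, sum(y) / n
r = ybar / xbar                       # sample ratio
s2x = sum((v - xbar) ** 2 for v in x) / (n - 1)
s2y = sum((v - ybar) ** 2 for v in y) / (n - 1)
sxy = sum((u - ybar) * (v - xbar) for u, v in zip(y, x)) / (n - 1)
y_ratio = ybar * X_bar / xbar                                   # ratio estimate
mse_hat = (1 - f) / n * (s2y + r ** 2 * s2x - 2 * r * sxy)      # estimated MSE
```

With `mse_hat` in hand, the 95% interval is y_ratio ± t × √mse_hat, with t read from Table 2 of the Appendix.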
Example 3.2.1.2. After applying the ratio method of estimation, Mr. Bean wants to know if he achieved any gain in efficiency by using the ratio estimator. The amount of real and nonreal estate farm loans (in $000) during 1997 in 50 different states of the United States has been presented in population 1 of the Appendix. Find the relative efficiency of the ratio estimator, for estimating the average amount of real estate farm loans during 1997 by using known information on nonreal estate farm loans during 1997, with respect to the usual estimator of population mean, given that the sample size is eight units.
Solution. From the description of the population, we have Y_i = amount (in $000) of real estate farm loans in different states during 1997, X_i = amount (in $000) of nonreal estate farm loans in different states during 1997, Ȳ = 555.43, X̄ = 878.16, S_y² = 342021.5, C_x² = 1.5256, C_y² = 1.1086, ρ_xy = 0.8038, and N = 50.
Thus we have
MSE(ȳ_R) = {(1 − f)/n} Ȳ² [C_y² + C_x² − 2ρ_xy C_y C_x] = 17,606.39.
Also
V(ȳ) = {(1 − f)/n} S_y² = 35,912.26.
Thus the percent relative efficiency (RE) of the ratio estimator ȳ_R with respect to the usual estimator ȳ is given by
RE = V(ȳ) × 100/MSE(ȳ_R) = (35,912.26/17,606.39) × 100 = 203.97%,
which shows that the ratio estimator is more efficient than the usual estimator of population mean. It should be noted that the relative efficiency does not depend upon the sample size.
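The efficiency comparison reduces to a few arithmetic steps; a sketch using the population constants quoted above (the common factor (1 − f)/n cancels in the ratio, which is why the RE is free of the sample size):

```python
import math

N, n = 50, 8
f = n / N
Ybar, Sy2 = 555.43, 342021.5
Cy2, Cx2, rho = 1.1086, 1.5256, 0.8038
factor = (1 - f) / n
mse_ratio = factor * Ybar ** 2 * (Cy2 + Cx2 - 2 * rho * math.sqrt(Cy2 * Cx2))
v_mean = factor * Sy2                       # variance of the sample mean
re_pct = 100 * v_mean / mse_ratio           # percent relative efficiency
```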
Theorem 3.2.1.7. The minimum sample size for the relative standard error (RSE) to be less than or equal to a fixed value φ is given by
n ≥ [φ²Ȳ²/(S_y² + R²S_x² − 2R S_xy) + 1/N]⁻¹. (3.2.1.11)
Proof. We require
(1/n − 1/N)(C_y² + C_x² − 2ρ_xy C_x C_y) ≤ φ²,
or 1/n ≤ φ²/(C_y² + C_x² − 2ρ_xy C_x C_y) + 1/N, or n ≥ [φ²Ȳ²/(S_y² + R²S_x² − 2R S_xy) + 1/N]⁻¹.
Hence the theorem.
Example 3.2.1.3. Mr. Bean wishes to estimate the average real estate farm loans in the United States with the help of the ratio method of estimation by using known information about the nonreal estate farm loans as shown in population 1 in the Appendix. What is the minimum sample size required for the relative standard error (RSE) to be equal to 12.5%?
Solution. From the description of population 1 given in the Appendix, we have
N = 50, Ȳ = 555.43, X̄ = 878.16, S_y² = 342021.5, S_x² = 1176526, S_xy = 509910.41, R = Ȳ/X̄ = 555.43/878.16 = 0.63249, and φ = 0.125; thus the minimum sample size is
n ≥ [φ²Ȳ²/(S_y² + R²S_x² − 2R S_xy) + 1/N]⁻¹
= [0.125² × (555.43)²/(342021.5 + 0.63249² × 1176526 − 2 × 0.63249 × 509910.41) + 1/50]⁻¹
= 20.51 ≈ 21.
Solution. Note that the population size is 50. Mr. Bean started with the first two columns of the Pseudo-Random Numbers (PRN) given in Table 1 of the Appendix and selected the following 21 distinct random numbers between 1 and 50: 01, 23, 46, 04, 32, 47, 33, 05, 22, 38, 29, 40, 03, 36, 27, 19, 14, 42, 48, 06, and 07.
Random No. State x_i y_i (x_i − x̄)² (y_i − ȳ)² (x_i − x̄)(y_i − ȳ)
01 AL 348.334 408.978 303627.6 21302.6 80424.302
03 AZ 431.439 54.633 218948.3 250299.3 234099.570
04 AR 848.317 907.700 2605.2 124445.1 −18005.653
05 CA 3928.732 1343.461 9177106.0 621777.6 2388748.500
06 CO 906.281 315.809 47.9 57179.9 −1655.427
07 CT 4.373 7.130 800998.3 300087.3 490274.840
14 IN 1022.782 1213.024 15233.5 433084.8 81224.255
19 ME 51.539 8.849 718797.2 298206.9 462979.800
22 MI 440.518 323.028 210534.2 53779.6 106406.960
23 MN 2466.892 1354.768 2457163.0 639737.2 1253769.700
27 NE 3585.406 1337.852 7214853.0 612963.4 2102960.000
29 NH 0.471 6.044 807998.0 301278.3 493388.550
Continued.
Given N = 50 and X̄ = 878.16. Now from the above table, n = 21, ȳ = 554.93223, x̄ = 899.35809, s_x² = 1,282,260, s_y² = 233,397.6, s_xy = 442,921.47, r = 0.617, and f = 0.42.
Thus the ratio estimate of the average real estate farm loans in the United States is
ȳ_R = ȳ(X̄/x̄) = 554.93223 × (878.16/899.35809) = $541.85.
An estimate of MSE(ȳ_R) is given by
{(1 − f)/n}[s_y² + r²s_x² − 2r s_xy]
= {(1 − 0.42)/21}[233397.6 + (0.617)² × 1282260 − 2 × 0.617 × 442921.47]
= 4832.64.
Using Table 2 from the Appendix, the 95% confidence interval for the average real estate farm loans is given by
3.2.2 PRODUCT ESTIMATOR
Murthy (1964) considered another estimator of the population mean Ȳ using the known population mean X̄ of the auxiliary variable, the product estimator
ȳ_P = ȳ(x̄/X̄). (3.2.2.1)
Theorem 3.2.2.1. The exact bias in the product estimator ȳ_P of the population mean Ȳ is given by
B(ȳ_P) = {(1 − f)/n} Ȳ ρ_xy C_x C_y. (3.2.2.2)
Proof. In terms of ε₀ and ε₁ we have ȳ_P = Ȳ(1 + ε₀)(1 + ε₁). Taking expected values and using the results of section 3.1 we have
E(ȳ_P) = Ȳ{1 + ((1 − f)/n) ρ_xy C_x C_y}.
Thus the bias in the product estimator ȳ_P of the population mean is given by
B(ȳ_P) = E(ȳ_P) − Ȳ = {(1 − f)/n} Ȳ ρ_xy C_x C_y.
Hence the theorem.
Theorem 3.2.2.2. The mean squared error of the product estimator ȳ_P, to the first order of approximation, is given by
MSE(ȳ_P) = {(1 − f)/n} Ȳ² [C_y² + C_x² + 2ρ_xy C_y C_x]. (3.2.2.5)
Proof. By the definition of mean squared error (MSE), using (3.2.2.3), neglecting higher order terms and using results from section 3.1, we have
MSE(ȳ_P) = Ȳ² E[ε₀ + ε₁]² = Ȳ² E[ε₀² + ε₁² + 2ε₀ε₁] = {(1 − f)/n} Ȳ² [C_y² + C_x² + 2ρ_xy C_y C_x].
Hence the theorem.
Theorem 3.2.2.3. An estimator of the MSE of the product estimator ȳ_P, to the first order of approximation, is given by
{(1 − f)/n}[s_y² + r²s_x² + 2r s_xy]. (3.2.2.6)
Theorem 3.2.2.4. The product estimator ȳ_P is more efficient than the sample mean ȳ if
ρ_xy (C_y/C_x) < −1/2. (3.2.2.7)
Proof. The proof follows from the fact that the product estimator ȳ_P is more efficient than the sample mean ȳ if
MSE(ȳ_P) < V(ȳ),
or if C_x² + 2ρ_xy C_y C_x < 0, i.e., if ρ_xy < −C_x/(2C_y).
In the condition (3.2.2.7), if we assume that C_y ≈ C_x, then it holds for all values of the correlation coefficient ρ_xy in the range [−1.0, −0.5). Thus we have the following theorem.
Theorem 3.2.2.5. The product estimator ȳ_P is more efficient than the sample mean ȳ if ρ_xy < −0.5, i.e., if the correlation between X and Y is negative and high.
Remark 3.2.2.1. We observed that the product and ratio estimators are better than the sample mean if the value of ρ_xy lies in the interval [−1.0, −0.5) or (+0.5, +1.0], respectively. Thus the sample mean estimator remains better than both the ratio and product estimators of the population mean if ρ_xy lies in the range [−0.5, +0.5].
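The remark can be checked numerically: dropping the common factor ((1 − f)/n)Ȳ², the three first-order MSEs compare as below (a sketch with C_y = C_x = 1; the function name is illustrative):

```python
def scaled_mses(rho, Cy=1.0, Cx=1.0):
    """First-order MSEs of the mean, ratio and product estimators with the
    common factor ((1 - f)/n) * Ybar^2 dropped."""
    return {
        "mean": Cy ** 2,
        "ratio": Cy ** 2 + Cx ** 2 - 2 * rho * Cy * Cx,
        "product": Cy ** 2 + Cx ** 2 + 2 * rho * Cy * Cx,
    }

best = {}
for rho in (-0.8, 0.0, 0.8):
    m = scaled_mses(rho)
    best[rho] = min(m, key=m.get)      # estimator with the smallest MSE
```

The winner switches exactly as the remark says: product for strongly negative ρ, mean for middling ρ, ratio for strongly positive ρ.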
Example 3.2.2.1. Assume that the average age, 67.267 years, of the subjects is known, as shown in population 2 in the Appendix. Assuming that as the age of a person increases the sleeping hours decrease, apply the product method of estimation for estimating the average sleep time in the particular village under study. Also find an estimator of the mean squared error of the product estimator and deduce a 95% confidence interval.
Solution. The sample information is as follows.

i y_i x_i (y_i − ȳ)² (x_i − x̄)² (y_i − ȳ)(x_i − x̄)
1 408 55 132.25 110.25 −120.75
2 420 67 552.25 2.25 35.25
3 456 56 3540.25 90.25 −565.25
4 345 78 2652.25 156.25 −643.75
5 360 71 1332.25 30.25 −200.75
6 390 66 42.25 0.25 −3.25
Sum 2379 393 8251.50 389.50 −1498.50

Here y_i = duration of sleep (in minutes), x_i = age of subjects (≥ 50 years), n = 6, ȳ = 396.5, x̄ = 65.5, s_x² = 77.9, s_y² = 1650.3, s_xy = −299.7, and r = ȳ/x̄ = 6.053. Also we are given X̄ = 67.267, N = 30 and f = 0.20.
Thus the product estimate of the average sleep time, Ȳ (say), is given by
ȳ_P = ȳ(x̄/X̄) = 396.5 × (65.5/67.267) = 386.08,
and an estimate of MSE(ȳ_P) is given by
{(1 − f)/n}[s_y² + r²s_x² + 2r s_xy]
= {(1 − 0.20)/6}[1650.3 + (6.053)² × 77.9 − 2 × 6.053 × 299.7] = 116.83.
A (1 − α)100% confidence interval for the population mean Ȳ is given by
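The psychologist's numbers can be replayed directly (a sketch mirroring Example 3.2.2.1; tiny differences from the book's figures come from its rounding of r):

```python
y = [408, 420, 456, 345, 360, 390]   # sleep duration (minutes)
x = [55, 67, 56, 78, 71, 66]         # age (years)
X_bar, N = 67.267, 30                # known mean age, population size
n = len(y)
f = n / N
ybar, xbar = sum(y) / n, sum(x) / n
r = ybar / xbar
s2y = sum((v - ybar) ** 2 for v in y) / (n - 1)
s2x = sum((v - xbar) ** 2 for v in x) / (n - 1)
sxy = sum((u - ybar) * (v - xbar) for u, v in zip(y, x)) / (n - 1)
y_prod = ybar * xbar / X_bar                                   # product estimate
mse_hat = (1 - f) / n * (s2y + r ** 2 * s2x + 2 * r * sxy)     # estimated MSE
```

Because s_xy is negative here, the `+ 2 r s_xy` term reduces the estimated MSE, which is exactly why the product estimator pays off under negative correlation.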
Example 3.2.2.2. The duration of sleep (in minutes) and age of 30 people aged 50 and over living in a small village of the United States are given in population 2. Suppose a psychologist selected an SRSWOR sample of six individuals to collect the required information. Find the relative efficiency of the product estimator, for estimating average duration of sleep using age as an auxiliary variable, with respect to the usual estimator of population mean.
Solution. Using the description of population 2 given in the Appendix we have Y_i = duration of sleep (in minutes), X_i = age of subjects (≥ 50 years), N = 30, X = 2018, Y = 11526, X̄ = 67.267, Ȳ = 384.2, S_y² = 3582.58, S_x² = 85.237, C_y² = 0.0243, C_x² = 0.0188, S_xy = −472.607, and ρ_xy = −0.8552.
Thus we have
MSE(ȳ_P) = {(1 − f)/n} Ȳ² [C_y² + C_x² + 2ρ_xy C_x C_y]
= {(1 − 0.20)/6} × (384.2)² × [0.0243 + 0.0188 − 2 × 0.8552 × √(0.0243 × 0.0188)]
= 128.759.
Also
V(ȳ) = {(1 − f)/n} S_y² = {(1 − 0.20)/6} × 3582.58 = 477.677.
Thus the percent relative efficiency (RE) of the product estimator ȳ_P with respect to the usual estimator ȳ is given by
RE = V(ȳ) × 100/MSE(ȳ_P) = (477.677/128.759) × 100 = 371.0%.
Corollary 3.2.2.1. The minimum sample size for the relative standard error (RSE) to be less than or equal to a fixed value φ is given by
n ≥ [φ²Ȳ²/(S_y² + R²S_x² + 2R S_xy) + 1/N]⁻¹. (3.2.2.8)
3.2.3 REGRESSION ESTIMATOR
Consider the difference estimator ȳ_dif = ȳ + d(X̄ − x̄), where d is a suitably chosen constant. Evidently E(ȳ_dif) = Ȳ; thus the difference estimator ȳ_dif is unbiased for the population mean Ȳ. The variance of the estimator ȳ_dif is given by
V(ȳ_dif) = E[ȳ_dif − Ȳ]²
= E[Ȳ(1 + ε₀) − dX̄ε₁ − Ȳ]² = E[Ȳε₀ − dX̄ε₁]²
= E[Ȳ²ε₀² + d²X̄²ε₁² − 2dȲX̄ε₀ε₁]
= {(1 − f)/n}[Ȳ²C_y² + d²X̄²C_x² − 2dȲX̄ρ_xy C_x C_y]. (3.2.3.4)
Minimising (3.2.3.4) with respect to d, the minimum variance of the difference estimator is
{(1 − f)/n}[S_y² − S_xy²/S_x²] = {(1 − f)/n} S_y²[1 − S_xy²/(S_x² S_y²)] = {(1 − f)/n} S_y²(1 − ρ_xy²). (3.2.3.6)
For the optimum value of d = ρ_xy (C_y/C_x)(Ȳ/X̄) = S_xy/S_x² = β (the regression coefficient, say) the difference estimator becomes
ȳ_dif = ȳ + (S_xy/S_x²)(X̄ − x̄). (3.2.3.7)
Thus the difference estimator becomes non-functional if the value of the regression coefficient β = S_xy/S_x² is unknown. In such situations, Hansen, Hurwitz, and Madow (1953) consider the linear regression estimator of the population mean Ȳ as
ȳ_LR = ȳ + β̂(X̄ − x̄), (3.2.3.8)
where β̂ = s_xy/s_x² denotes the estimator of the regression coefficient β = S_xy/S_x². Then we have the following theorems:
Theorem 3.2.3.1. The bias in the linear regression estimator ȳ_LR of the population mean Ȳ is given by
B(ȳ_LR) = {(1 − f)/n} βX̄ C_x (λ₀₃ − λ₁₂/ρ_xy). (3.2.3.9)
Proof. The linear regression estimator ȳ_LR, in terms of ε₀, ε₁, ε₃ and ε₄, can easily be written as
ȳ_LR = Ȳ(1 + ε₀) + {S_xy(1 + ε₄)/(S_x²(1 + ε₃))}[X̄ − X̄(1 + ε₁)]
= Ȳ(1 + ε₀) + β(1 + ε₄)(1 + ε₃)⁻¹[X̄ − X̄(1 + ε₁)].
Using the binomial expansion (1 + ε₃)⁻¹ = 1 − ε₃ + ε₃² + O(ε₃) we obtain
ȳ_LR = Ȳ(1 + ε₀) − βX̄[ε₁ + ε₁ε₄ − ε₁ε₃ + O(ε)]. (3.2.3.10)
Taking expected values on both sides of (3.2.3.10) and neglecting higher order terms, we obtain
E(ȳ_LR) = Ȳ − {(1 − f)/n} βX̄ [C_x λ₁₂/ρ_xy − C_x λ₀₃] = Ȳ + {(1 − f)/n} βX̄ C_x [λ₀₃ − λ₁₂/ρ_xy].
Thus the bias is given by (3.2.3.9). Hence the theorem.
Theorem 3.2.3.2. The mean squared error of the linear regression estimator ȳ_LR, to the first order of approximation, is
MSE(ȳ_LR) = {(1 − f)/n} S_y²(1 − ρ_xy²).
Proof. By the definition of mean squared error (MSE), using (3.2.3.10) and neglecting higher order terms, we have
MSE(ȳ_LR) = E[ȳ_LR − Ȳ]² = E[Ȳε₀ − βX̄ε₁]²
= {(1 − f)/n}[S_y² + S_xy²/S_x² − 2S_xy²/S_x²] = {(1 − f)/n}[S_y² − S_xy²/S_x²]
= {(1 − f)/n} S_y²(1 − ρ_xy²).
Hence the theorem.
Theorem 3.2.3.3. An estimator of the mean squared error of the linear regression estimator ȳ_LR, to the first order of approximation, is given by
{(1 − f)/n} s_y²(1 − r_xy²), where r_xy = s_xy/(s_x s_y).
Theorem 3.2.3.4. The linear regression estimator ȳ_LR is always more efficient than the sample mean ȳ if ρ_xy ≠ 0.
Remark 3.2.3.1. If β̂ = ȳ/x̄ then the linear regression estimator ȳ_LR reduces to the usual ratio estimator ȳ_R, and if β̂ = −ȳ/X̄ then the linear regression estimator ȳ_LR reduces to the usual product estimator ȳ_P.
Example 3.2.3.1. Apply the regression method of estimation for estimating the average amount of the real estate farm loans (in $000) during 1997. Also find an estimator of the mean squared error of the regression estimator and deduce a 95% confidence interval. Assume that the average amount $878.16 of nonreal estate farm loans (in $000) for the year 1997 is known.
Solution. From the sample information, we have n = 8, ȳ = 348.3554, x̄ = 531.4353, s_x² = 170082.85, s_y² = 131919.67, s_xy = 118102.55, β̂ = s_xy/s_x² = 0.6943, and r_xy = s_xy/(s_x s_y) = 0.7884. Also we are given X̄ = 878.16, N = 50 and f = 0.16.
Thus the regression estimate of the average amount of real estate farm loans during 1997, Ȳ (say), is given by
ȳ_LR = ȳ + β̂(X̄ − x̄) = 348.3554 + 0.6943 × (878.16 − 531.4353) = 589.08.
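The regression step itself is one line once β̂ is in hand; a sketch from the summary statistics above (the small gap from the printed 589.08 comes from the book's rounding of β̂ to 0.6943):

```python
ybar, xbar = 348.3554, 531.4353      # sample means of y and x
s2x, s2y, sxy = 170082.85, 131919.67, 118102.55
X_bar = 878.16                       # known population mean of x
beta_hat = sxy / s2x                 # estimated regression coefficient
y_lr = ybar + beta_hat * (X_bar - xbar)   # regression estimate of Ybar
```

Note that the estimate moves ȳ upward here because the sample happened to under-represent x (x̄ < X̄) and the regression coefficient is positive.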
Solution. From the description of population 1 given in the Appendix we have
Ȳ = 555.43, X̄ = 878.16, S_y² = 342021.5, ρ_xy = 0.8038, and N = 50.
Also from Example 3.2.1.2 we have
MSE(ȳ_R) = 17606.39.
Now
MSE(ȳ_LR) = {(1 − f)/n} S_y²(1 − ρ_xy²) = {(1 − 0.16)/8} × 342021.5 × (1 − 0.8038²) = 12,709.5.
What is the minimum sample size required for the relative standard error (RSE) to be equal to 12.5%? Use the data as shown in population 1 of the Appendix.
Example 3.2.3.4. A bank manager selects an SRSWOR sample of eighteen states
from population 1 of the Appendix and collects information about real estate farm
loans and nonreal estate farm loans. Estimate the average real estate farm loans by
using the regression method of estimation, given that the average amount of nonreal
estate farm loans in the United States is known to be equal to $878.16.
Solution. The bank manager used the 19th and 20th columns of the Pseudo-Random
Numbers (PRN) given in Table 1 of the Appendix to select the following 18 distinct
random numbers between 1 and 50: 16, 31, 50, 29, 08, 33, 19, 28, 11, 07, 27, 37,
48, 22, 24, 46, 41, and 32.
[Table of the 18 sampled states with y_i, x_i, (y_i - \bar{y})^2, (x_i - \bar{x})^2, and (y_i - \bar{y})(x_i - \bar{x}) omitted.]
Here N = 50 and \bar{X} = 878.16. The above table shows n = 18, \bar{y} = 304.5265,
\bar{x} = 631.8754, s_x^2 = 982834.03, s_y^2 = 147134.11, and s_{xy} = 343672.33. Thus
\hat{\beta} = 0.3496, r_{xy} = 0.9037, and f = 0.36.
Thus the regression estimate of the average real estate farm loans in the United
States is
\bar{y}_{LR} = \bar{y} + \hat{\beta}(\bar{X} - \bar{x}) = 304.5265 + 0.3496 \times (878.16 - 631.8754) = 390.627.
Using Table 2 from the Appendix the 95% confidence interval for the average real
estate farm loans is given by
390.627 \pm 2.120\sqrt{959.059}, or [324.973, 456.280].
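The interval above can be verified from the sample quantities (Python sketch; t = 2.120 is the tabulated value for n - 2 = 16 degrees of freedom):

```python
import math

# 95% CI: estimate +/- t * sqrt(mse_hat), with
# mse_hat = ((1 - f)/n) * s_y^2 * (1 - r_xy^2).
n, f = 18, 0.36
s_y2, r_xy = 147134.11, 0.9037
y_lr = 390.627

mse_hat = ((1 - f) / n) * s_y2 * (1 - r_xy**2)   # ~959.06
half = 2.120 * math.sqrt(mse_hat)
print(round(y_lr - half, 2), round(y_lr + half, 2))  # [324.97, 456.28]
```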
Units   A    B    C    D    E
y_i     9    11   13   16   21
x_i     14   18   19   20   24
Do the following:
( a ) Select all possible SRSWOR samples each of n = 3 units;
( b ) Find the variance of the sample mean estimator by definition;
( c ) Find the variance of the sample mean estimator using the formula. Comment;
( d ) Find the exact mean square error of the ratio estimator by definition;
( e ) Find the approximate mean square error of the ratio estimator using first order
approximation;
( f) Find the ratio of approximate mean square error to that of exact mean square
error of the ratio estimator and comment;
( g ) Find the exact mean square error of the regression estimator using definition;
( h ) Find the approximate mean square error of the regression estimator using first
order approximation;
( i ) Find the ratio of approximate mean square error of the regression estimator to
that of the exact mean square error and comment;
( j ) Find the exact relative efficiency of the ratio estimator with respect to the sample
mean estimator;
( k ) Find the approximate relative efficiency of the ratio estimator with respect to
the sample mean estimator and comment;
( l ) Find the exact relative efficiency of the regression estimator with respect to the
sample mean estimator;
(m) Find the approximate relative efficiency of the regression estimator with respect
to the sample mean and comment.
Solution. ( a ) From Chapter 1 we have the following information for this population:
\bar{Y} = 14, \bar{X} = 19, S_y^2 = 22, S_x^2 = 13, S_{xy} = 16.25, \rho_{xy} = 0.96, and \beta = 1.25. Also
we list all \binom{5}{3} = 10 possible samples of n = 3 units taken from the population of N = 5
units.
( b ) Exact V(\bar{y}) = \sum_{t=1}^{10} P_t \left( \bar{y}_t - \bar{Y} \right)^2 = 2.933.
( c ) The variance of the sample mean \bar{y} by the formula is given by
V(\bar{y}) = \frac{(1-f)}{n} S_y^2 = \frac{(1 - 3/5)}{3} \times 22 = 2.933.
We can see from ( b ) and ( c ) that the exact variance and the variance by the formula
are the same.
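Parts ( b ) and ( c ) can be checked by brute force (Python sketch; unit labels are ours):

```python
from itertools import combinations
from statistics import mean

# All C(5,3) = 10 equally likely SRSWOR samples from the population above.
y = [9, 11, 13, 16, 21]
N, n = 5, 3
Y_bar, S_y2 = mean(y), 22

means = [mean(s) for s in combinations(y, n)]
v_exact = sum((m - Y_bar)**2 for m in means) / len(means)
v_formula = (1 - n/N) / n * S_y2
print(round(v_exact, 3), round(v_formula, 3))  # both 2.933
```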
( d ) The exact mean square error of the ratio estimator \bar{y}_R(t) = \bar{y}_t \left( \bar{X}/\bar{x}_t \right) is given by
Exact MSE(\bar{y}_R) = \sum_{t=1}^{10} P_t \left( \bar{y}_R(t) - \bar{Y} \right)^2 = 0.714.
( e ) The approximate mean square error of the ratio estimator, with R = \bar{Y}/\bar{X} = 14/19, is
Approx. MSE(\bar{y}_R) = \frac{(1-f)}{n}\left[ S_y^2 + R^2 S_x^2 - 2R S_{xy} \right] = \frac{(1 - 3/5)}{3}\left[ 22 + (14/19)^2 \times 13 - 2 \times (14/19) \times 16.25 \right] = 0.681.
( f) The ratio of approximate mean square error to the exact mean square error is
given by
Ratio of Mean Square Errors = \frac{Approx. MSE(\bar{y}_R)}{Exact MSE(\bar{y}_R)} = \frac{0.681}{0.714} = 0.953.
Note that this ratio of the mean square errors approaches unity if the sample size and
population size are such that f = n/N \to 0.
( g ) The exact mean square error of the regression estimator by definition is
Exact MSE(\bar{y}_{LR}) = \sum_{t=1}^{10} P_t \left( \bar{y}_{LR}(t) - \bar{Y} \right)^2 = 0.596.
( h ) The approximate mean square error of the regression estimator is
Approx. MSE(\bar{y}_{LR}) = \frac{(1-f)}{n} S_y^2 \left[ 1 - \rho_{xy}^2 \right] = \frac{(1 - 3/5)}{3} \times 22 \times \left[ 1 - 0.96^2 \right] = 0.230.
(i ) The ratio of approximate mean square error to the exact mean square error of
the linear regression estimator is given by
Ratio of Mean Square Errors = \frac{Approx. MSE(\bar{y}_{LR})}{Exact MSE(\bar{y}_{LR})} = \frac{0.230}{0.596} = 0.386.
Note that, for this particular example, the ratio of the approximate mean square error to the
exact mean square error is far from one, but if f = n/N \to 0 then this ratio
approaches unity.
(j ) The exact relative efficiency (RE) of the ratio estimator with respect to sample
mean estimator is
Exact RE of the Ratio Estimator = \frac{V(\bar{y}) \times 100}{Exact MSE(\bar{y}_R)} = \frac{2.933 \times 100}{0.714} = 410.78\%.
( k ) The approximate relative efficiency of the ratio estimator with respect to the
sample mean estimator is
Approximate RE of the Ratio Estimator = \frac{V(\bar{y}) \times 100}{Approx. MSE(\bar{y}_R)} = \frac{2.933 \times 100}{0.681} = 430.69\%.
It shows that the approximate relative efficiency expression for the ratio estimator
gives a slightly higher efficiency than in reality.
( l ) The exact relative efficiency of the regression estimator with respect to the
sample mean estimator is
Exact RE of the Regression Estimator = \frac{V(\bar{y}) \times 100}{Exact MSE(\bar{y}_{LR})} = \frac{2.933 \times 100}{0.596} = 492.11\%.
( m ) The approximate relative efficiency of the linear regression estimator with
respect to the sample mean estimator is
Approx. RE of the Regression Estimator = \frac{V(\bar{y}) \times 100}{Approx. MSE(\bar{y}_{LR})} = \frac{2.933 \times 100}{0.230} = 1275.21\%.
It also shows that the approximate relative efficiency expression for the regression
estimator gives higher efficiency than in reality.
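All of the exact and approximate quantities in this example can be reproduced by enumerating the ten samples (Python sketch; we obtain 0.714 and 0.681 for the ratio estimator, and about 0.595 and 0.225 for the regression estimator, against the text's 0.596 and 0.230, the latter because the text rounds \rho to 0.96 before squaring):

```python
from itertools import combinations
from statistics import mean

y = [9, 11, 13, 16, 21]
x = [14, 18, 19, 20, 24]
N, n = 5, 3
Y_bar, X_bar = mean(y), mean(x)

def s2(v):                       # sample variance (divisor len-1)
    m = mean(v)
    return sum((t - m)**2 for t in v) / (len(v) - 1)

def sxy(u, v):                   # sample covariance (divisor len-1)
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

mse_r = mse_lr = 0.0
for idx in combinations(range(N), n):
    ys, xs = [y[i] for i in idx], [x[i] for i in idx]
    yb, xb = mean(ys), mean(xs)
    mse_r += (yb * X_bar / xb - Y_bar)**2 / 10            # ratio estimate
    beta = sxy(ys, xs) / s2(xs)
    mse_lr += (yb + beta * (X_bar - xb) - Y_bar)**2 / 10  # regression estimate

# First-order approximations from the text
fpc = (1 - n/N) / n
R = Y_bar / X_bar
approx_r = fpc * (s2(y) + R**2 * s2(x) - 2 * R * sxy(y, x))
rho2 = sxy(y, x)**2 / (s2(y) * s2(x))
approx_lr = fpc * s2(y) * (1 - rho2)
print(round(mse_r, 3), round(approx_r, 3), round(mse_lr, 3), round(approx_lr, 3))
```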
Caution: Be careful while using the approximate expression for the mean square error of
the linear regression estimator, or the approximate expression for estimating that mean
square error. The true interval estimate of the population mean may be wider than the one
you construct with the approximate results.
Note the graphical situations shown in Figure 3.2.1 for the use of the ratio,
product, and regression estimators in actual practice.
The following table collects some information about these three
estimators which will be useful to the readers:
5 | Ratio estimator: We have to estimate only one model parameter, so the degrees of
freedom for constructing confidence interval estimates will be df = (n-1). | Product
estimator: If both variables are positive (x > 0 and y > 0) but the correlation is
negative, then we have both intercept and slope, and then we must use df = (n-2). |
Regression estimator: Here we have two unknown parameters, viz. intercept and slope,
thus we must use df = (n-2). Its justification is given in Section 3.6.
Srivastava (1967) considered another estimator of the population mean, \bar{Y}, using the
known population mean, \bar{X}, of the auxiliary variable, as a power transformation
estimator given by
\bar{y}_{pw} = \bar{y}\left( \bar{x}/\bar{X} \right)^{\alpha}.  (3.2.4.1)
Theorem 3.2.4.1. The bias in the power transformation estimator \bar{y}_{pw}, to the first
order of approximation, is given by
B(\bar{y}_{pw}) = \frac{(1-f)}{n}\bar{Y}\left[ \frac{\alpha(\alpha-1)}{2}C_x^2 + \alpha\rho_{xy}C_xC_y \right].  (3.2.4.2)
Proof. The power transformation estimator \bar{y}_{pw}, in terms of \varepsilon_0 and \varepsilon_1, can easily
be written as
\bar{y}_{pw} = \bar{Y}(1+\varepsilon_0)(1+\varepsilon_1)^{\alpha} = \bar{Y}(1+\varepsilon_0)\left( 1 + \alpha\varepsilon_1 + \frac{\alpha(\alpha-1)}{2}\varepsilon_1^2 + O(\varepsilon_1^3) \right).  (3.2.4.3)
Taking expected values on both sides of (3.2.4.3) and using results from section
3.1, we obtain (3.2.4.2). Hence the theorem.
Theorem 3.2.4.2. The minimum mean squared error of the power transformation
estimator \bar{y}_{pw}, to the first order of approximation, is given by
Min. MSE(\bar{y}_{pw}) = \frac{(1-f)}{n} S_y^2 \left( 1 - \rho_{xy}^2 \right),
attained for the optimum choice \alpha = -\rho_{xy}C_y/C_x.
Proof. By the definition of mean squared error (MSE), using (3.2.4.3) and again
neglecting the higher order terms we have
MSE(\bar{y}_{pw}) = E\left[ \bar{y}_{pw} - \bar{Y} \right]^2 = E\left[ \bar{Y}\left( 1 + \varepsilon_0 + \alpha\varepsilon_1 + O(\varepsilon^2) \right) - \bar{Y} \right]^2
= \bar{Y}^2 E\left[ \varepsilon_0^2 + \alpha^2\varepsilon_1^2 + 2\alpha\varepsilon_0\varepsilon_1 \right].
The power \alpha depends upon the optimum values of unknown parameters. Thus the
estimator \bar{y}_{pw} is not practicable, and we have the following corollary.
Corollary. A practicable power transformation estimator
\bar{y}_{pw(pract)} of the population mean \bar{Y} is given by
\bar{y}_{pw(pract)} = \bar{y}\left( \bar{x}/\bar{X} \right)^{\hat{\alpha}},  (3.2.4.8)
where
\hat{\alpha} = -(\bar{x}s_{xy})/(\bar{y}s_x^2)
is a consistent estimator of \alpha. Note that while making a confidence interval estimate
with the power transformation estimator the degrees of freedom will be (n-2).
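As an illustration (ours, not the book's), plugging the summary statistics of Example 3.2.3.4 into the practicable estimator gives the following Python sketch, with \hat{\alpha} computed as in (3.2.4.8):

```python
# Practicable power transformation estimator on the Example 3.2.3.4 data:
# alpha_hat = -(x_bar * s_xy) / (y_bar * s_x^2).
y_bar, x_bar, X_bar = 304.5265, 631.8754, 878.16
s_x2, s_xy = 982834.03, 343672.33

alpha_hat = -(x_bar * s_xy) / (y_bar * s_x2)
y_pw = y_bar * (x_bar / X_bar) ** alpha_hat   # close to the regression estimate
print(round(alpha_hat, 4), round(y_pw, 2))
```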
Remark 3.2.4.1. The difference estimator \bar{y}_{dif} of the population mean \bar{Y}, given as
\bar{y}_{dif} = \bar{y} + d(\bar{X} - \bar{x}),  (3.2.4.9)
has variance equal to the mean squared error of the linear regression
estimator for the optimum value of d = S_{xy}/S_x^2 = \beta. Again note that the degrees of
freedom for constructing confidence interval estimates will be df = (n-1), because
the slope is assumed to be known, but we estimate the intercept.
or
\bar{y}_{nsu} = \bar{y}\left( \bar{X}/\bar{x} \right)^{n/(N-n)},  (3.2.5.2)
where \bar{X} is the known population mean of the auxiliary variable. In terms of \varepsilon_0 and \varepsilon_1,
\bar{y}_{nsu} = \bar{Y}(1+\varepsilon_0)(1+\varepsilon_1)^{-n/(N-n)} \approx \bar{Y}\left( 1 + \varepsilon_0 - \frac{n}{N-n}\varepsilon_1 - \frac{n}{N-n}\varepsilon_0\varepsilon_1 \right).
Taking expected values on both sides we have
E(\bar{y}_{nsu}) = \bar{Y}E\left( 1 + \varepsilon_0 - \frac{n}{N-n}\varepsilon_1 - \frac{n}{N-n}\varepsilon_0\varepsilon_1 \right)
= \bar{Y} - \frac{n}{N-n} \cdot \frac{1-f}{n} \bar{Y}\rho_{xy}C_xC_y = \bar{Y} - \frac{\bar{Y}\rho_{xy}C_xC_y}{N}.
Thus the bias in the estimator \bar{y}_{nsu} is given by
B(\bar{y}_{nsu}) = E(\bar{y}_{nsu}) - \bar{Y} = -\frac{\bar{Y}\rho_{xy}C_xC_y}{N} = -\frac{S_{xy}}{N\bar{X}},
which proves the theorem.
Theorem 3.2.5.2. The mean squared error of the estimator \bar{y}_{nsu} is given by
MSE(\bar{y}_{nsu}) = \frac{(1-f)}{n}\bar{Y}^2\left[ C_y^2 + g^2C_x^2 - 2g\rho_{xy}C_xC_y \right],  (3.2.5.3)
where g = \frac{n}{N-n}.
Proof. We have
MSE(\bar{y}_{nsu}) = E\left[ \bar{y}_{nsu} - \bar{Y} \right]^2 = \bar{Y}^2 E\left[ \varepsilon_0 - g\varepsilon_1 \right]^2
= \frac{(1-f)}{n}\bar{Y}^2\left[ C_y^2 + g^2C_x^2 - 2g\rho_{xy}C_xC_y \right].
Theorem 3.2.5.3. The estimator \bar{y}_{nsu} is more efficient than the ratio estimator \bar{y}_R if
n < \frac{N}{2} \quad and \quad \rho_{xy} < \frac{N}{2(N-n)},  (3.2.5.4)
assuming that the correlation coefficient \rho_{xy} is positive.
Proof. The estimator \bar{y}_{nsu} will be more efficient than the ratio estimator \bar{y}_R if
MSE(\bar{y}_{nsu}) < MSE(\bar{y}_R),
or \frac{(1-f)}{n}\bar{Y}^2\left[ C_y^2 + g^2C_x^2 - 2g\rho_{xy}C_xC_y \right] < \frac{(1-f)}{n}\bar{Y}^2\left[ C_y^2 + C_x^2 - 2\rho_{xy}C_xC_y \right],
or (g^2-1)C_x^2 - 2(g-1)\rho_{xy}C_xC_y < 0, or (g-1)(g+1)C_x^2 - 2(g-1)\rho_{xy}C_xC_y < 0,
or (g-1)\left[ (g+1)C_x^2 - 2\rho_{xy}C_xC_y \right] < 0.  (3.2.5.5)
This holds if
\frac{n-N+n}{N-n} < 0 \quad and \quad \rho_{xy}\frac{C_y}{C_x} < \frac{g+1}{2},
or n < \frac{N}{2} \quad and \quad \rho_{xy}\frac{C_y}{C_x} < \frac{n+N-n}{2(N-n)} = \frac{N}{2(N-n)}.
For C_y \approx C_x we have n < \frac{N}{2} and \rho_{xy} < \frac{N}{2(N-n)}.
This condition holds in practice. For example, if N = 100 and n = 30, then \rho_{xy} is
required to be less than 100/140 = 0.714, which is commonly the case.
Alternatively, (3.2.5.5) also holds if
\frac{n-N+n}{N-n} > 0 \quad and \quad \rho_{xy}\frac{C_y}{C_x} > \frac{g+1}{2},
or n > \frac{N}{2} \quad and \quad \rho_{xy}\frac{C_y}{C_x} > \frac{n+N-n}{2(N-n)} = \frac{N}{2(N-n)}.
For C_y \approx C_x we have n > \frac{N}{2} and \rho_{xy} > \frac{N}{2(N-n)}.
This condition will not hold in practice. For example, if N = 100 and n = 70 then the
value of \rho_{xy} needs to be more than 100/60 = 1.667, which is not possible. Hence the
theorem.
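The condition of Theorem 3.2.5.3 can also be checked numerically (Python sketch; we take C_y = C_x = 1 and drop the common factor \bar{Y}^2):

```python
# MSE comparison of y_nsu and y_R for N = 100, n = 30 (C_y = C_x = 1).
N, n = 100, 30
g = n / (N - n)
fpc = (1 - n / N) / n
bound = N / (2 * (N - n))             # rho_xy must stay below ~0.714

def mse_nsu(rho): return fpc * (1 + g * g - 2 * g * rho)
def mse_ratio(rho): return fpc * (2 - 2 * rho)

print(round(bound, 3))
print(mse_nsu(0.5) < mse_ratio(0.5))   # rho below the bound: y_nsu wins
print(mse_nsu(0.9) < mse_ratio(0.9))   # rho above the bound: y_R wins
```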
Note that |u - 1| < 1; thus the higher order terms can be neglected. Using (3.2.6.2)
and (3.2.6.3) in (3.2.6.1) we obtain
t_g = \bar{y}\left[ 1 + (u-1)H_1 + (u-1)^2H_2 + \cdots \right],  (3.2.6.4)
where H_1 = \frac{\partial H}{\partial u}\big|_{u=1} and H_2 = \frac{1}{2}\frac{\partial^2 H}{\partial u^2}\big|_{u=1} denote the first and second order partial
derivatives of H with respect to u, and are known constants. Evidently the class
of estimators t_g given at (3.2.6.4) can easily be written in terms of \varepsilon_0 and \varepsilon_1 as
Theorem 3.2.6.1. The bias in the general class of estimators t_g defined at (3.2.6.1),
to the first order of approximation, is
B(t_g) = \frac{(1-f)}{n}\bar{Y}\left[ H_2C_x^2 + H_1\rho_{xy}C_xC_y \right].  (3.2.6.6)
Theorem 3.2.6.2. The minimum mean squared error of the general class of
estimators t_g defined at (3.2.6.1), to the first order of approximation, is given by
Min. MSE(t_g) = \frac{(1-f)}{n}\bar{Y}^2C_y^2\left( 1 - \rho_{xy}^2 \right),  (3.2.6.7)
which is attained for the optimum value
H_1 = -\rho_{xy}\frac{C_y}{C_x}.  (3.2.6.9)
One may note here that the regression estimator and the difference estimator are not special
cases of the general class of estimators defined in (3.2.6.1). Srivastava (1980)
defined another class of estimators, named the wider class of estimators, as
t_w = H(\bar{y}, u),  (3.2.7.1)
where H(\bar{y}, u) is a function of \bar{y} and u and satisfies the following regularity
conditions:
( a ) The point (\bar{y}, u) assumes values in a closed convex subset R_2 of two-
dimensional real space containing the point (\bar{Y}, 1);
( b ) The function H(\bar{y}, u) is continuous and bounded in R_2;
( c ) H(\bar{Y}, 1) = \bar{Y} and H_0(\bar{Y}, 1) = 1, where H_0(\bar{Y}, 1) denotes the first order partial
derivative of H with respect to \bar{y};
( d ) The first and second order partial derivatives of H(\bar{y}, u) exist and are
continuous and bounded in R_2.
Expanding H(\bar{y}, u) about the point (\bar{Y}, 1) in a second order Taylor's series we have
t_w = H(\bar{y}, u) = H\left[ \bar{Y} + (\bar{y}-\bar{Y}), 1 + (u-1) \right]
= H(\bar{Y},1) + (\bar{y}-\bar{Y})\frac{\partial H}{\partial \bar{y}}\Big|_{(\bar{Y},1)} + (u-1)\frac{\partial H}{\partial u}\Big|_{(\bar{Y},1)} + (u-1)^2\frac{1}{2}\frac{\partial^2 H}{\partial u^2}\Big|_{(\bar{Y},1)}
+ (\bar{y}-\bar{Y})^2\frac{1}{2}\frac{\partial^2 H}{\partial \bar{y}^2}\Big|_{(\bar{Y},1)} + (\bar{y}-\bar{Y})(u-1)\frac{\partial^2 H}{\partial \bar{y}\,\partial u}\Big|_{(\bar{Y},1)} + \cdots  (3.2.7.2)
where we write H_2 = \frac{1}{2}\frac{\partial^2 H}{\partial u^2}\Big|_{(\bar{Y},1)}, H_3 = \frac{\partial^2 H}{\partial \bar{y}\,\partial u}\Big|_{(\bar{Y},1)}, and H_4 = \frac{1}{2}\frac{\partial^2 H}{\partial \bar{y}^2}\Big|_{(\bar{Y},1)}.
Thus we have the following theorems.
Theorem 3.2.7.1. The asymptotic bias in the wider class of estimators t_w of the
population mean \bar{Y} is:
B(t_w) = \frac{(1-f)}{n}\left[ \bar{Y}\rho_{xy}C_xC_yH_3 + C_x^2H_2 + \bar{Y}^2C_y^2H_4 \right].  (3.2.7.4)
Proof. The wider class of estimators t_w, in terms of \varepsilon_0 and \varepsilon_1, can easily be
written as
(3.2.7.5)
Taking expected values on both sides of (3.2.7.5) and using the definition of bias,
we obtain (3.2.7.4). Hence the theorem.
Theorem 3.2.7.2. The minimum mean squared error of the wider class of
estimators, t_w, is given by
Min. MSE(t_w) = \frac{(1-f)}{n} S_y^2 \left( 1 - \rho_{xy}^2 \right).
In this section, we will show that the known variance of the auxiliary variable can
also be used as a benchmark, in addition to the known population total or mean of
the auxiliary variable, to improve the estimators of the finite population mean of the
study variable under certain circumstances.
where u = \bar{x}/\bar{X}, v = s_x^2/S_x^2, and H(u, v) is a function of u and v such that:
( a ) The point (u, v) assumes values in a closed convex subset R_2 of two-
dimensional real space containing the point (1, 1);
( b ) The function H(u, v) is continuous and bounded in R_2;
( c ) H(1, 1) = 1;
( d ) The first and second order partial derivatives of H(u, v) exist and are continuous
and bounded in R_2.
Thus all ratio and product type estimators of the population mean \bar{Y} defined as
\bar{y}_1 = \bar{y}\left( \frac{\bar{X}}{\bar{x}} \right)\left( \frac{S_x^2}{s_x^2} \right), \quad
\bar{y}_2 = \bar{y}\left( \frac{\bar{X}}{a\bar{x} + (1-a)\bar{X}} \right)\left( \frac{S_x^2}{\gamma s_x^2 + (1-\gamma)S_x^2} \right), \quad and \quad
\bar{y}_3 = \bar{y}\left( \frac{\bar{X}}{\bar{x}} \right)^{\alpha}\left( \frac{S_x^2}{s_x^2} \right)^{\gamma}
are special cases of the class of estimators defined in (3.2.8.1).
Expanding H(u, v) about the point (1, 1) in a second order Taylor's series we obtain
\bar{y}_{SJ} = \bar{y}H(u,v) = \bar{y}H\left[ 1+(u-1), 1+(v-1) \right]
\approx \bar{y}\left[ H(1,1) + (u-1)\frac{\partial H}{\partial u}\Big|_{(1,1)} + (v-1)\frac{\partial H}{\partial v}\Big|_{(1,1)} + (u-1)^2\frac{1}{2}\frac{\partial^2 H}{\partial u^2}\Big|_{(1,1)} + (v-1)^2\frac{1}{2}\frac{\partial^2 H}{\partial v^2}\Big|_{(1,1)} + (u-1)(v-1)\frac{\partial^2 H}{\partial u\,\partial v}\Big|_{(1,1)} + \cdots \right]
= \bar{Y}(1+\varepsilon_0)\left[ 1 + \varepsilon_1H_1 + \varepsilon_3H_2 + \varepsilon_1^2H_3 + \varepsilon_3^2H_4 + \varepsilon_1\varepsilon_3H_5 + \cdots \right]
\approx \bar{Y}\left[ 1 + \varepsilon_0 + \varepsilon_1H_1 + \varepsilon_3H_2 + \varepsilon_1^2H_3 + \varepsilon_3^2H_4 + \varepsilon_1\varepsilon_3H_5 + \varepsilon_0\varepsilon_1H_1 + \varepsilon_0\varepsilon_3H_2 + \cdots \right]  (3.2.8.2)
where \varepsilon_3 = (s_x^2 - S_x^2)/S_x^2 and
H_1 = \frac{\partial H}{\partial u}\Big|_{(1,1)}, \quad H_2 = \frac{\partial H}{\partial v}\Big|_{(1,1)}, \quad H_3 = \frac{1}{2}\frac{\partial^2 H}{\partial u^2}\Big|_{(1,1)}, \quad H_4 = \frac{1}{2}\frac{\partial^2 H}{\partial v^2}\Big|_{(1,1)}, \quad and \quad H_5 = \frac{\partial^2 H}{\partial u\,\partial v}\Big|_{(1,1)}.
Thus we have the following theorems:
Theorem 3.2.8.1. The asymptotic bias in the class of ratio type estimators \bar{y}_{SJ} is
Proof. Taking expected values on both sides of (3.2.8.2) we have
E(\bar{y}_{SJ}) = \bar{Y}E\left[ 1 + \varepsilon_0 + \varepsilon_1H_1 + \varepsilon_3H_2 + \varepsilon_1^2H_3 + \varepsilon_3^2H_4 + \varepsilon_1\varepsilon_3H_5 + \varepsilon_0\varepsilon_1H_1 + \varepsilon_0\varepsilon_3H_2 + \cdots \right]
and B(\bar{y}_{SJ}) = E(\bar{y}_{SJ}) - \bar{Y}.
Theorem 3.2.8.2. The minimum MSE of the class of estimators YSJ is given by
MSE(\bar{y}_{SJ}) = E\left[ \bar{y}_{SJ} - \bar{Y} \right]^2 = \bar{Y}^2E\left[ \varepsilon_0 + \varepsilon_1H_1 + \varepsilon_3H_2 + O(\varepsilon^2) \right]^2
= \bar{Y}^2E\left[ \varepsilon_0^2 + \varepsilon_1^2H_1^2 + \varepsilon_3^2H_2^2 + 2\varepsilon_0\varepsilon_1H_1 + 2\varepsilon_0\varepsilon_3H_2 + 2\varepsilon_1\varepsilon_3H_1H_2 \right].  (3.2.8.5)
Srivastava and Jhajj (1981) also considered a wider class of estimators of the
population mean \bar{Y} as
\bar{y}_{SJ(w)} = H(\bar{y}, u, v),  (3.2.8.8)
( a ) The point (\bar{y}, u, v) assumes values in a closed convex subset R_3 of three-
dimensional real space containing the point (\bar{Y}, 1, 1);
( d ) The first and second order partial derivatives of H(\bar{y}, u, v) exist, and are
continuous and bounded, in R_3.
Expanding H(\bar{y}, u, v) about the point (\bar{Y}, 1, 1) in a second order Taylor's series we
have
(3.2.8.9)
where
\frac{\partial H}{\partial \bar{y}}\Big|_{(\bar{Y},1,1)} = 1, \quad H_1 = \frac{\partial H}{\partial u}\Big|_{(\bar{Y},1,1)}, \quad H_2 = \frac{\partial H}{\partial v}\Big|_{(\bar{Y},1,1)}, \quad H_3 = \frac{1}{2}\frac{\partial^2 H}{\partial \bar{y}^2}\Big|_{(\bar{Y},1,1)}, \quad H_4 = \frac{1}{2}\frac{\partial^2 H}{\partial u^2}\Big|_{(\bar{Y},1,1)},
H_5 = \frac{1}{2}\frac{\partial^2 H}{\partial v^2}\Big|_{(\bar{Y},1,1)}, \quad H_6 = \frac{1}{2}\frac{\partial^2 H}{\partial \bar{y}\,\partial u}\Big|_{(\bar{Y},1,1)}, \quad H_7 = \frac{1}{2}\frac{\partial^2 H}{\partial u\,\partial v}\Big|_{(\bar{Y},1,1)}, \quad and \quad H_8 = \frac{1}{2}\frac{\partial^2 H}{\partial v\,\partial \bar{y}}\Big|_{(\bar{Y},1,1)}.
Theorem 3.2.8.4. The minimum MSE of the class of estimators \bar{y}_{SJ(w)} is given by
Min. MSE(\bar{y}_{SJ(w)}) = \frac{(1-f)}{n} S_y^2 \left[ 1 - \rho_{xy}^2 - \frac{(\lambda_{12} - \rho_{xy}\lambda_{03})^2}{\lambda_{04} - 1 - \lambda_{03}^2} \right],
which is attained for the optimum values
H_1 = -\frac{\bar{Y}C_y}{C_x} \cdot \frac{\rho_{xy}(\lambda_{04}-1) - \lambda_{12}\lambda_{03}}{\lambda_{04} - 1 - \lambda_{03}^2} \quad and \quad H_2 = -\bar{Y}C_y \cdot \frac{\lambda_{12} - \rho_{xy}\lambda_{03}}{\lambda_{04} - 1 - \lambda_{03}^2}.  (3.2.8.15)
Remark 3.2.8.1.
( b ) The asymptotic minimum mean squared error of the ratio type and the wider
class of estimators remains the same.
( c ) Note that \lambda_{12} and \lambda_{03} are odd ordered moments. In case X and Y follow the
bivariate normal distribution, both \lambda_{12} and \lambda_{03} are zero. In such situations the
minimum mean squared error of the class of estimators proposed by Srivastava and
Jhajj (1981) reduces to the mean squared error of the usual linear regression
estimator. Thus there is no advantage in using the known variance of the auxiliary
variable for the construction of the estimator of the population mean \bar{Y} if the joint
distribution of the study variable Y and auxiliary variable X is a bivariate normal
distribution.
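Remark ( c ) can be illustrated by simulation (a sketch of ours, not from the book; we use the standardized moments \lambda_{03} = \mu_{03}/\sigma_x^3 and \lambda_{12} = \mu_{12}/(\sigma_y\sigma_x^2), which should be near zero for bivariate normal data):

```python
import random

# Draw correlated normal pairs and estimate the odd moments lambda_03, lambda_12.
rng = random.Random(42)
n, rho = 200_000, 0.8
xs, ys = [], []
for _ in range(n):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    xs.append(z1)
    ys.append(rho * z1 + (1 - rho**2) ** 0.5 * z2)  # correlated normal pair

mx, my = sum(xs) / n, sum(ys) / n
sx = (sum((v - mx)**2 for v in xs) / n) ** 0.5
sy = (sum((v - my)**2 for v in ys) / n) ** 0.5
lam03 = sum((v - mx)**3 for v in xs) / (n * sx**3)
lam12 = sum((b - my) * (a - mx)**2 for a, b in zip(xs, ys)) / (n * sy * sx**2)
print(abs(lam03) < 0.05, abs(lam12) < 0.05)   # both near zero
```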
( e ) There are a large number of estimators belonging to the same class of estimators
with the same minimum asymptotic mean square error, so it is difficult to select an
estimator for a particular survey, and there is no theoretical technique available in
the literature to select an estimator.
Solution. From the description of the population 1 given in the Appendix we have
\bar{Y} = 555.43, \bar{X} = 878.16, C_y^2 = 1.1086, \lambda_{03} = 1.5936, \rho_{xy} = 0.8038, N = 50,
\lambda_{12} = 1.0982, and \lambda_{04} = 4.5247.
Now, taking n = 8 and f = 0.16,
V(\bar{y}_{LR}) = \frac{(1-f)}{n} S_y^2 (1 - \rho_{xy}^2) = 12709.55
and
Min. MSE(\bar{y}_{SJ}) = \frac{(1-f)}{n} S_y^2 \left[ 1 - \rho_{xy}^2 - \frac{(\lambda_{12} - \rho_{xy}\lambda_{03})^2}{\lambda_{04} - 1 - \lambda_{03}^2} \right] = 11491.74.
Thus the percent relative efficiency (RE) of the general class of estimators, \bar{y}_{SJ}, with
respect to the linear regression estimator, \bar{y}_{LR}, is given by
RE = \frac{V(\bar{y}_{LR}) \times 100}{Min. MSE(\bar{y}_{SJ})} = \frac{12709.55 \times 100}{11491.74} = 110.59\%.
It should be noted that in this case the relative efficiency is independent of the sample
size n.
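These figures can be reproduced directly (Python sketch; n = 8 and f = 0.16 as in the earlier examples on this population):

```python
# Relative efficiency of the Srivastava-Jhajj class vs the regression estimator.
S_y2, rho = 342021.5, 0.8038
lam12, lam03, lam04 = 1.0982, 1.5936, 4.5247
n, f = 8, 0.16
fpc = (1 - f) / n

v_lr = fpc * S_y2 * (1 - rho**2)                          # ~12709.5
gain = (lam12 - rho * lam03)**2 / (lam04 - 1 - lam03**2)  # benefit of known S_x^2
mse_sj = fpc * S_y2 * (1 - rho**2 - gain)                 # ~11491.9
print(round(100 * v_lr / mse_sj, 2))                      # ~110.59
```

Note that the factor (1-f)/n cancels in the ratio, which is why the relative efficiency does not depend on the sample size.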
The next section of this chapter is devoted to constructing unbiased ratio
and product type estimators of the population mean. We will discuss
Quenouille's method, the interpenetrating sampling method, exactly unbiased ratio and
product type estimators, and bias filtration techniques.
We have observed that the ratio and product type estimators are biased. Several
researchers have attempted to reduce the bias of these estimators. We would also
like to discuss a few methods to construct unbiased ratio and product type
estimators of the population mean before going on to the problems of estimation of
finite population variance, correlation coefficient, and regression coefficient.
( a ) \bar{y}_{R1} = \bar{y}_1\left( \bar{X}/\bar{x}_1 \right), where \bar{y}_1 = n^{-1}\sum_{i=1}^{n} y_i and \bar{x}_1 = n^{-1}\sum_{i=1}^{n} x_i are the first half sample
means for the Y and X variables, respectively;
( b ) \bar{y}_{R2} = \bar{y}_2\left( \bar{X}/\bar{x}_2 \right), where \bar{y}_2 = n^{-1}\sum y_i and \bar{x}_2 = n^{-1}\sum x_i are the second half sample
means for the Y and X variables, respectively;
( c ) \bar{y}_R = \bar{y}\left( \bar{X}/\bar{x} \right), where \bar{y} = (2n)^{-1}\sum_{i=1}^{2n} y_i and \bar{x} = (2n)^{-1}\sum_{i=1}^{2n} x_i are the sample means for
the Y and X variables based on the full sample;
and
a = -\frac{(N-2n)}{2N}.  (3.2.9.5)
Proof. We have
E(\bar{y}_Q) = E\left[ a(\bar{y}_{R1} + \bar{y}_{R2}) + (1-2a)\bar{y}_R \right] = a\left[ E(\bar{y}_{R1}) + E(\bar{y}_{R2}) \right] + (1-2a)E(\bar{y}_R).
To the first order of approximation the bias vanishes if
2a\left( \frac{1}{n} - \frac{1}{N} \right) + (1-2a)\left( \frac{1}{2n} - \frac{1}{N} \right) = 0,
or if
a = -\frac{(N-2n)}{2N}.
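That a = -(N-2n)/(2N) solves the bias equation exactly can be checked with rational arithmetic (Python sketch; the (N, n) pairs are arbitrary):

```python
from fractions import Fraction

# 2a(1/n - 1/N) + (1 - 2a)(1/(2n) - 1/N) = 0 at a = -(N - 2n)/(2N).
lhs_values = []
for N, n in [(50, 5), (100, 20), (40, 8)]:
    a = Fraction(-(N - 2 * n), 2 * N)
    lhs = (2 * a * (Fraction(1, n) - Fraction(1, N))
           + (1 - 2 * a) * (Fraction(1, 2 * n) - Fraction(1, N)))
    lhs_values.append(lhs)
print(lhs_values)   # exactly zero in every case
```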
[Figure: bias of the estimator plotted against the sample size n.]
For more details, one can refer to Singh and Singh (1993), Murthy (1962) and Rao
(1965a) . The reduction in bias to the desired degree by using the method of
Quenouille (1956) has also been discussed by Singh (1979) .
Let us first present an idea about interpenetrating samples. If we want to select
n units with SRSWOR sampling, we can select k independent samples each of
size m = n/k, where we assume that n/k is an integer. We draw m units out of the N
units, then put back these m units so as to keep the population size the same. To
make the k samples independent, each individual sample of m units is selected
with SRSWOR sampling. Now we have k samples each of size m. From the j-th
sample, a ratio type estimator to estimate the population mean \bar{Y} is
\bar{y}_{Rj} = \bar{y}_j\left( \bar{X}/\bar{x}_j \right),
where \bar{y}_j = m^{-1}\sum_{i=1}^{m} y_i and \bar{x}_j = m^{-1}\sum_{i=1}^{m} x_i denote the j-th sample means for
the Y and X variables, respectively, for j = 1, 2, ..., k. Let us define a new estimator of
the population mean \bar{Y} as
\bar{y}_{RK} = \frac{1}{k}\sum_{j=1}^{k} \bar{y}_{Rj} = \frac{1}{k}\sum_{j=1}^{k} \bar{y}_j\left( \frac{\bar{X}}{\bar{x}_j} \right).  (3.2.9.6)
Also from the full sample information, we have the usual ratio estimator of the
population mean \bar{Y} given by
\bar{y}_R = \bar{y}\left( \bar{X}/\bar{x} \right),
and
Note that m units are drawn k times from a population of size N, which is equivalent
to drawing a sample of size n = km from a population of size kN. Thus we have the
following theorem:
Theorem 3.2.9.2. An unbiased estimator of the population mean \bar{Y} is given by
\bar{y}_u = \frac{k\bar{y}_R - \bar{y}_{RK}}{k-1}.  (3.2.9.9)
Proof. We have
E(\bar{y}_u) = E\left[ \frac{k\bar{y}_R - \bar{y}_{RK}}{k-1} \right] = \frac{kE(\bar{y}_R) - E(\bar{y}_{RK})}{k-1}.
Theorem 3.2.9.3. The variance of the unbiased estimator \bar{y}_u of the population mean
\bar{Y} is
Note that k > 1; thus the unbiased estimator \bar{y}_u is less efficient than the ratio
estimator \bar{y}_R in the case of finite populations.
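A small exact enumeration (a sketch of ours on the five-unit population used earlier in this chapter, not from the book) shows how \bar{y}_u removes the first-order bias; on a genuine finite population the correction is only approximate:

```python
from itertools import combinations
from statistics import mean

# k = 2 independent SRSWOR half-samples of size m = 2: enumerate all 10 x 10
# equally likely outcomes and compare the biases of y_RK, y_R, and y_u.
ypop = [9, 11, 13, 16, 21]
xpop = [14, 18, 19, 20, 24]
N, k, m = 5, 2, 2
Xbar, Ybar = mean(xpop), mean(ypop)

subs = list(combinations(range(N), m))
w = 1 / len(subs)**2
e_rk = e_r = e_u = 0.0
for s1 in subs:
    for s2 in subs:
        rats = [sum(ypop[i] for i in s) / sum(xpop[i] for i in s) * Xbar
                for s in (s1, s2)]
        y_rk = mean(rats)                 # mean of the k ratio estimates
        pooled = list(s1) + list(s2)      # pooled sample of size km
        y_r = sum(ypop[i] for i in pooled) / sum(xpop[i] for i in pooled) * Xbar
        y_u = (k * y_r - y_rk) / (k - 1)  # bias-corrected estimator
        e_rk += w * y_rk; e_r += w * y_r; e_u += w * y_u

# y_RK and y_R are biased downward here; y_u is nearly unbiased.
print(round(e_rk - Ybar, 4), round(e_r - Ybar, 4), round(e_u - Ybar, 4))
```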
Example 3.2.9.1. Select three different samples each of five units by using
SRSWOR sampling from the population 1 given in the Appendix. Collect the
information on the real and nonreal estate farm loans from the states selected in
each sample. The average nonreal estate farm loan is assumed to be known. Obtain
three different ratio estimates of the average real estate farm loans from the
information collected in the three samples. Pool the information collected in the three
samples to obtain a pooled ratio estimate of the average real estate farm loans.
( a ) Derive an unbiased estimate of the average real estate farm loans.
( b ) Construct a 95% confidence interval.
Given: Average nonreal estate farm loans $878.16.
Sample 1
Random Number (1 <= R_i <= 50) | State | Real estate farm loans, y_i | Nonreal estate farm loans, x_i
01 | AL | 408.978  | 348.334
23 | MN | 1354.768 | 2466.892
46 | VA | 321.583  | 188.477
04 | AR | 907.700  | 848.317
32 | NY | 201.631  | 426.274
Sum |   | 3194.660 | 4278.294
[Tables for Samples 2 and 3 omitted; the real estate farm loan values of one of them are
6.044, 1213.024, 1100.745, 323.028, and 553.266, with sum 3196.107.]
Pooled Sample
State | y_i | x_i | (y_i - \bar{y}) | (x_i - \bar{x}) | (y_i - \bar{y})^2 | (x_i - \bar{x})^2 | (y_i - \bar{y})(x_i - \bar{x})
AL | 408.978  | 348.334  | -198.1400 | -343.722 | 39259.3  | 118144.9  | 68104.989
MN | 1354.768 | 2466.892 | 747.6503  | 1774.836 | 558981.0 | 3150042.0 | 1326956.627
VA | 321.583  | 188.477  | -285.5350 | -503.579 | 81530.1  | 253591.9  | 143789.300
AR | 907.700  | 848.317  | 300.5823  | 156.261  | 90349.7  | 24417.5   | 46969.256
Continued .....
A ratio estimate of the average real estate farm loans from the pooled sample
information is given by
An unbiased estimate of the average real estate farm loans in the United States is
Using Table 2 from the Appendix the 95% confidence interval of the average
amount of the real estate farm loans in the United States is