Unit 1 Notes
Preliminaries
Population:
In statistics, a population refers to the entire group of individuals, objects, or events that
we want to study and draw conclusions about. It includes all the elements that share a
common characteristic or set of characteristics. The population is the target of our
investigation, and we aim to make inferences or generalize findings about the population
based on collected data.
Examples of populations:
All registered voters in a country.
Every apple tree in an orchard.
All employees working in a particular company.
All cars produced by a specific manufacturer in a given year.
However, for most statistical investigations, complete enumeration of the population is
impractical. For example, to get an idea of the average per capita (monthly) income of
the people in India, we would have to enumerate the earnings of every individual in the
country, which is a very difficult task.
If the population is infinite, complete enumeration is not possible. Also, if the units are
destroyed during inspection (e.g., inspection of crackers, explosive materials, etc.), 100%
inspection, though possible, is not at all desirable. Even when the population is finite or
non-destructive inspection is possible, we do not seek 100% inspection for various
reasons, including administrative and financial constraints, time limitations,
and other practical considerations. Instead, we use sampling as a method to gather
information.
Sample:
A sample is a subset or smaller representative selection taken from a population. It is a
practical and feasible way to collect data and make inferences about the population as a
whole. By studying the sample, we can gain insights into the characteristics, behaviors, or
properties of the larger population.
Examples of samples:
Surveying 500 randomly selected households to estimate the average household income
in a city.
Conducting a focus group with 10 individuals to gather opinions about a new product.
Testing a sample of 100 patients to evaluate the effectiveness of a new medication.
Examining a sample of 50 schools to assess their performance on standardized tests.
In each of these examples, the sample represents a smaller subset of the respective
population. By studying and analyzing the sample data, researchers can make
generalizations or draw conclusions about the larger population.
It's important to note that the quality and representativeness of the sample play a crucial
role in the accuracy and validity of the inferences made about the population. Various
sampling methods are employed to ensure the sample is representative, minimizing biases
and increasing the likelihood of generalizability.
Sampling and census are two methods used in data collection, with sampling being a
more commonly used approach. Let's go over some basic concepts related to sampling
and census:
Sampling: Sampling is the process of selecting a subset of individuals, objects, or events
from the population to gather data. The selected subset is called a sample. Sampling
allows researchers to collect information from a smaller group and make inferences about
the larger population.
Census: A census is a data collection method that aims to collect information from every
individual, object, or event in the population. It attempts to gather data from the entire
population rather than a sample. Conducting a census can be time-consuming, expensive,
and impractical for large populations.
Additional Notes:
An estimator and an estimate are two related concepts used in statistics.
Estimator: An estimator is a rule or a function that is used to calculate an estimate of an
unknown parameter based on observed data. In simpler terms, an estimator is a method or
a formula that is used to make an educated guess about an unknown quantity or
parameter.
Example of an estimator: In a population, you want to estimate the average height of
individuals. You collect a random sample of 100 individuals and measure their heights.
To estimate the average height of the entire population, you can use the sample mean (x̄)
as an estimator. The sample mean is a formula: the sum of the observed heights divided by
the number of individuals in the sample.
Estimate: An estimate, on the other hand, is the calculated value or approximation of the
unknown parameter based on the data and the estimator. It is the specific value obtained
by applying the estimator to the observed data.
Example of an estimate: Continuing with the previous example, if the sample mean (x̄)
of the observed heights in the random sample is 175 cm, then 175 cm would be the
estimate of the average height of the population. It represents the specific value obtained
by using the estimator (sample mean) on the observed data.
To summarize, an estimator is a method or formula used to make an educated guess about
an unknown parameter, while an estimate is the specific value obtained by applying the
estimator to the observed data.
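The distinction can be sketched in code: the estimator is the function, and the estimate is the number it returns for one particular sample. The five heights below are hypothetical values chosen for illustration, not data from these notes.

```python
def sample_mean(observations):
    """Estimator: a rule that maps any sample to a single number."""
    return sum(observations) / len(observations)


# Hypothetical sample of heights in cm (illustrative values only).
heights_cm = [172.0, 168.5, 181.0, 175.5, 178.0]

# Estimate: the specific value obtained by applying the estimator to this sample.
estimate = sample_mean(heights_cm)
print(f"Estimated average height: {estimate} cm")
```

A different sample passed to the same estimator would generally give a different estimate.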
SAMPLING DISTRIBUTION:
Sampling Distribution: The number of possible samples of size n that can be drawn
(without replacement) from a finite population of size N is $\binom{N}{n}$. (If N is large
or infinite, then we can draw a large number of such samples.) For each of these samples,
we can compute a statistic, say t (e.g., mean, variance, etc.), which will vary from sample
to sample. The collection of these statistic values, obtained from each sample, can be
organized into a frequency distribution called the sampling distribution of the statistic.
Thus, we can have the sampling distribution of the sample mean $\bar{x}$, the sample
variance $s^2$, etc.
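As a minimal sketch of this idea, the code below lists every possible sample of size n from a tiny population and tabulates the sample means. The four-unit population is a made-up example, and the samples are assumed to be drawn without replacement and without regard to order.

```python
from collections import Counter
from itertools import combinations
from math import comb

# Hypothetical population of N = 4 values (illustrative only).
population = [2, 4, 6, 8]
n = 2

# Every possible sample of size n, drawn without replacement.
samples = list(combinations(population, n))
print("Number of samples:", len(samples), "which equals C(N, n) =", comb(len(population), n))

# The statistic t is taken to be the sample mean; compute it for each sample.
means = [sum(sample) / n for sample in samples]

# Frequency distribution of the statistic: the sampling distribution of the mean.
sampling_distribution = Counter(means)
for value in sorted(sampling_distribution):
    print(f"mean = {value}: frequency {sampling_distribution[value]}")
```

For a population this small the full sampling distribution can be listed; for realistic N and n it is usually studied theoretically or by simulation.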
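Standard errors (S.E.) of some statistics for large samples, where n is the sample size,
σ the population standard deviation and P the population proportion:

Statistic                            Standard error
Sample proportion p                  $\sqrt{P(1 - P)/n}$
Sample quartiles                     $1.36263\,\sigma/\sqrt{n}$
Sample median                        $1.25331\,\sigma/\sqrt{n}$
Sample correlation coefficient r     $(1 - \rho^2)/\sqrt{n}$, ρ being the population correlation coefficient
Sample moment $\mu_3$                $\sigma^3 \sqrt{96/n}$
Sample moment $\mu_4$                $\sigma^4 \sqrt{96/n}$
Coefficient of variation (V)         $\frac{V}{\sqrt{2n}} \sqrt{1 + \frac{2V^2}{10^4}} \approx \frac{V}{\sqrt{2n}}$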
Utility of Standard Error (SE):
S.E. plays a very important role in the large sample theory and forms the basis of the
testing of hypothesis. If ‘t’ is any statistic, then for large samples, we have
$$Z = \frac{t - E(t)}{\sqrt{V(t)}} \sim N(0, 1) \implies Z = \frac{t - E(t)}{\mathrm{S.E.}(t)} \sim N(0, 1)$$
Therefore, if the absolute difference between the observed and the expected (hypothetical)
value of the statistic is greater than 1.96 * S.E.(t), i.e., |t − E(t)| > 1.96 * S.E.(t), the
hypothesis is rejected at the 5% level of significance. Similarly, if
|t − E(t)| ≤ 1.96 * S.E.(t), the difference is not regarded as significant at the 5% level of
significance. In other words, the difference t − E(t) could have arisen due to fluctuations
of sampling, and the data do not provide any evidence against the null hypothesis, which
may, therefore, be accepted at the 5% level of significance.
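The decision rule can be sketched as a short function. The statistic is assumed here to be a sample mean with S.E. equal to σ/√n, and the numbers (σ = 10, n = 100, hypothetical mean 175) are illustrative assumptions, not values from these notes.

```python
import math


def z_statistic(t, expected, standard_error):
    """Standardized difference between the observed and expected value of a statistic."""
    return (t - expected) / standard_error


def is_significant(t, expected, standard_error, critical=1.96):
    """True if |t - E(t)| exceeds critical * S.E.(t), i.e. reject at the 5% level."""
    return abs(z_statistic(t, expected, standard_error)) > critical


# Assumed setting: sample mean of n = 100 observations, population s.d. sigma = 10.
sigma, n = 10.0, 100
se = sigma / math.sqrt(n)

print(z_statistic(177.5, 175.0, se), is_significant(177.5, 175.0, se))
print(z_statistic(176.0, 175.0, se), is_significant(176.0, 175.0, se))
```

The value 1.96 is the two-sided 5% critical point of the standard normal distribution, so the rule applies only when the large-sample normal approximation is reasonable.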
The magnitude of the standard error gives an index of the precision of the estimate of the
parameter. The reciprocal of the standard error is taken as the measure of reliability or
precision of the sample.
1. Objectives of the Survey. The objectives of a survey are the specific goals or purposes
that guide the survey research. These objectives outline what the survey aims to achieve
and the information it seeks to gather. Sometimes, even the organization or group funding
a survey may not have a clear idea of what they want to achieve or how they will use the
results. It is important for the survey sponsors to make sure that their objectives align
with the resources they have, such as the money, people, and time needed to obtain and
analyze the survey results.
2. Defining the Population to be Sampled. The population, i.e., the aggregate of objects
(animate or inanimate) from which the sample is chosen, should be defined in clear and
unambiguous terms. For example, in sampling of farms, clear-cut rules must be framed to
define a farm regarding shape, size, etc., keeping in mind the border-line cases, to enable
the investigator to decide in the field without much hesitation whether or not to include a
given farm in the population.
But practical difficulties in handling certain segments of the population may point to their
elimination from the scope of the survey. Consequently, for reasons of practicability or
convenience, the population to be sampled (the sampled population) may differ from, and in
fact be more restricted than, the population for which results are wanted (the target
population).
Additional Notes:
When selecting farms for a sample survey, it is necessary to establish specific rules that
clearly define what constitutes a farm in terms of its shape, size, and other relevant
factors. These rules should be designed in a way that allows the investigator to make
decisions in the field without much difficulty, even in border-line cases where it is
uncertain whether a farm belongs in the survey population.
Defining the population to be sampled means determining the specific group or set of
individuals that will be the focus of the survey. It involves clearly identifying the
characteristics or criteria that individuals must possess in order to be included in the
population.
Imagine you are conducting a survey on smartphone usage among college students. In
this case, the population to be sampled would be college students who own and use
smartphones. The characteristics that define this population include being enrolled in
college and owning a smartphone.
To define the population more precisely, you might specify additional criteria such as the
minimum age of the students (e.g., 18 years and above), the type of college (e.g.,
universities or community colleges), and the level of education (e.g., undergraduate or
graduate students).
By clearly defining the population, you establish the boundaries within which you will
select your sample. This ensures that the sample accurately represents the target group
and allows you to draw meaningful conclusions about smartphone usage among college
students.
3. The Frame and Sampling Units. The population must be capable of division into
what are called sampling units for purposes of sample selection. The sampling units
must cover the entire population and they must be distinct, unambiguous, and
non-overlapping in the sense that every element of the population belongs to one and only
one sampling unit. For example, in a socio-economic survey for selecting people in a town,
the sampling unit might be an individual person, a family, a household or a block in a
locality.
In order to cover the population decided upon, there should be some list, map or other
acceptable material, called the frame, which serves as a guide to the population to be
covered. The construction of the frame is often one of the major practical problems
since it is the frame which determines the structure of the sample survey. The lists
which have been routinely collected for some purpose, are usually found to be
incomplete or partly illegible or often contain an unknown amount of duplication.
Such lists should be carefully scrutinized and examined to ensure that they are free from
these defects and are up-to-date. If they are not up-to-date, they should be brought up-to-
date before using them. A good frame is hard to come by and only good experience helps
to construct a good frame.
Additional Notes:
In the context of sampling, the terms "frame" and "sampling units" refer to important
concepts related to identifying and selecting elements from a population.
Frame: A frame is a list or a source that contains all the individuals or elements that
make up the target population. It serves as a reference or sampling frame from which the
sample will be selected. A frame should ideally include every member of the population
to ensure representativeness.
Example: Let's say we want to conduct a survey to understand the preferences of college
students in a particular city. The frame would be a comprehensive list of all the colleges
or universities in that city. Each college or university would be represented as a separate
entry in the frame.
Sampling Units: Sampling units are the distinct units into which the population is divided
for the purpose of selection. The sample is formed by selecting some of these units, and
they are the entities on which observations or measurements are made. The sampling units
should be clearly defined and non-overlapping, so that every element of the population
belongs to one and only one unit.
Example: Continuing with the college student survey example, the sampling units would
be the individual students within each college or university. Each student would represent
a separate sampling unit, and the selection process would involve randomly choosing
students from each college to form the sample.
It is essential to have a well-defined frame and clearly identified sampling units to ensure
that the sample accurately represents the population and allows for unbiased and reliable
inferences to be drawn.
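A minimal sketch of how a frame is used in practice: a routinely collected list is first cleaned of duplicate entries, and the sample is then drawn from the cleaned frame. The unit identifiers, the sample size and the fixed seed are all hypothetical choices made for illustration.

```python
import random

# Hypothetical routinely collected list of sampling units; it contains duplicates.
raw_list = ["A01", "A02", "A02", "A03", "A04", "A05", "A05", "A06"]

# Clean the list so that every unit appears exactly once (original order is kept).
frame = list(dict.fromkeys(raw_list))
print("Frame:", frame)

# Draw a simple random sample of 3 units without replacement from the frame.
rng = random.Random(42)  # fixed seed only so that the sketch is reproducible
sample = rng.sample(frame, k=3)
print("Selected units:", sample)
```

Removing duplicates matters because a unit listed twice would otherwise have twice the chance of selection.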
4. Data to be collected. The data should be collected keeping in view the objectives of
the survey. The tendency should not be to collect too much data, some of which are never
subsequently examined and analyzed. A practical method is to chalk out an outline of the
tables that the survey should produce. This would help in eliminating the collection of
irrelevant information and ensure that no essential data are omitted.
Additional Notes:
When collecting data for a survey, it is crucial to align the data collection process with
the survey's objectives. It is not advisable to gather excessive amounts of data, especially
if some of it will not be examined or analyzed later. A practical approach is to create an
outline of the tables or results that the survey aims to produce. This helps eliminate the
collection of irrelevant information and ensures that no vital data are accidentally omitted.
By focusing on the intended outcomes, the data collection process can be streamlined and
more effective.
5. The Questionnaire or Schedule. Having decided about the type of the data to be
collected, the next important part of the sample survey is the construction of the
questionnaire (to be filled in by the respondent) or schedule of enquiry (to be completed
by the interviewer) which requires skill, special technique as well as familiarity with the
subject-matter under study. The questions should be clear, brief, corroborative, non-
offending, courteous in tone, unambiguous and to the point so that not much scope of
guessing is left on the part of the respondent or interviewer. Suitable and detailed
instructions for filling-up the questionnaire or schedule should also be prepared.
Additional Notes:
Once the type of data to be collected is determined, the next crucial step in a sample
survey is creating a questionnaire or interview schedule. This task requires skill,
knowledge of the subject, and expertise in designing effective survey instruments. The
questions should be clear, concise, respectful, and easy to understand for both
respondents and interviewers. It is important to avoid ambiguity and provide clear
instructions for completing the questionnaire or schedule.
8. Selection of Proper Sampling Design: The size of the sample (n), the procedure of
selection and the estimation of the population parameters along with their margins of
uncertainty are some of the important statistical problems that should receive the most
careful attention.
A number of designs (plans) for the selection of a sample are available and a judicious
selection will guarantee good and reliable estimates. For each sampling plan, rough
estimates of sample size ‘n’ can be obtained for a desired degree of precision. The relative
costs and time involved should also be considered before making a final selection of the
sampling plan.
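As a rough illustration of obtaining n for a desired degree of precision, the sketch below uses the common large-sample rule n = (z σ / e)² for estimating a mean. It assumes simple random sampling from a large population, a prior guess at the standard deviation σ, and a chosen margin of error e; the function name and the numbers are hypothetical.

```python
import math


def required_sample_size(sigma, margin_of_error, z=1.96):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the margin of error."""
    return math.ceil((z * sigma / margin_of_error) ** 2)


# Assumed inputs: guessed population s.d. of 15 units, 95% confidence (z = 1.96).
print(required_sample_size(sigma=15, margin_of_error=3))
print(required_sample_size(sigma=15, margin_of_error=1.5))
```

Halving the margin of error roughly quadruples the required sample size, which is why cost and time have to be weighed against precision when choosing a plan.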
From a practical point of view, a small pretest (i.e., trying out the questionnaire
and field methods on a small scale) has been found to be immensely useful. It always
helps to decide upon an effective method of asking questions and results in the
improvement of the questionnaire. Additionally, conducting a small-scale survey can
reveal potential issues and challenges that could become significant in a larger survey,
such as exceeding the budget or time constraints set for the project.
10. Summary and Analysis of the Data: The analysis of the data may be broadly
classified into the following heads:
(a) Scrutiny and editing of the data. An initial quality check should be carried
out by the supervisory staff while the investigators are in the field. Accordingly,
the schedules should be thoroughly scrutinized to examine the plausibility and
consistency of the data obtained. The scrutiny or editing of the completed
questionnaires will help in amending recording errors or in eliminating data that
are obviously erroneous and inconsistent.
(b) Tabulation of data. Before carrying out the tabulation of the data, we must
decide about the procedure for tabulation of the data which are incomplete
due to non-response to certain items in the questionnaire and where certain
questions are deleted in editing process. The method of tabulation, viz., hand
tabulation or machine tabulation, will depend upon the quantity of the data. For
a large-scale survey, machine tabulation will obviously be much quicker and
economical. For a large-scale sample survey, the use of code numbers for
qualitative variables is essential for machine tabulation. With simple
questionnaires, the answers can sometimes be pre-coded, i.e., entered in a manner
in which they can be conveniently or routinely transferred to mechanical
equipment such as personal computers, etc. Finally, the tables that lead to the
estimates are prepared.
(c) Statistical analysis. After the data has been properly scrutinized, edited and
tabulated, a very careful statistical analysis is to be made. Different methods of
estimation may be available for the same data. Appropriate formulae should then
be used to provide final estimates of the required information. Efforts should be
made to keep the procedure free from errors.
11. Information gained for Future Surveys: Any completed survey is helpful in
providing a note of caution and taking lessons from it for designing future
surveys. The data obtained from a completed sample, including measures like
means, standard deviations, and variability, along with the associated cost, can
provide valuable guidance for improving future sampling methods. Moreover, in
any complex survey, things usually do not go exactly as planned. Any completed
sample may serve as a lesson to the organizers for future surveys in
recognizing and rectifying the mistakes committed in the execution of the
survey.
Additional Notes:
The Law of Statistical Regularity is derived from the mathematical theory of
probability. According to [Link], “the Law of Statistical Regularity formulated in the
mathematical theory of probability lays down that a moderately large number of items
chosen at random from a very large group are almost sure to have the characteristics of
the large group.”
For example, if we want to find out the average income of 10,000 people, we take a
random sample of 100 people and find the average. Suppose another person takes another
random sample of 100 people from the same population and finds the average; the two
averages are likely to differ only slightly. Likewise, if the average income of the same
10,000 people is found by the census method, the result will be more or less the same as
the sample averages.
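The example can be tried out with a small simulation. The incomes below are generated artificially (uniformly between 10,000 and 50,000, an assumption made only for this sketch), two independent random samples of 100 are drawn, and their averages are compared with the census average.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed only so that the sketch is reproducible

# Artificial population of 10,000 incomes (assumed uniform; illustrative only).
population = [rng.randint(10_000, 50_000) for _ in range(10_000)]
population_mean = statistics.mean(population)  # the "census" result

# Two investigators each draw their own random sample of 100 people.
sample_a = rng.sample(population, 100)
sample_b = rng.sample(population, 100)
mean_a = statistics.mean(sample_a)
mean_b = statistics.mean(sample_b)

print("Census average:  ", population_mean)
print("Sample A average:", mean_a)
print("Sample B average:", mean_b)
```

The sample averages will not equal the census average exactly, but with random selection they typically fall close to it.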
Additional Notes:
(i) Sampling Errors. Sampling errors have their origin in sampling since only a small
portion of the population was used to estimate population characteristics and draw
inferences about the population. As a result, sampling errors are not present in a complete
enumeration survey.
1. Faulty selection of the sample. Some of the bias is introduced by the use of a defective
sampling technique for the selection of a sample, e.g., purposive or judgment sampling, in
which the investigator deliberately selects a "representative" sample to obtain certain
results. This bias can be overcome by strictly adhering to a simple random sample, or by
selecting a sample at random subject to restrictions which, while improving the accuracy,
are of such a nature that they do not introduce bias in the results.
Additional Notes:
Faulty selection of the sample refers to a situation where the process of choosing
participants or elements for a sample is defective, leading to a sample that does not
accurately represent the target population or research objectives. This can introduce bias
and weaken the validity and generalizability of the study's findings.
There are several ways in which the selection of a sample can be faulty:
Non-random sampling: If the sampling method used is not random, the resulting sample may
be biased and fail to represent the population.
Examples of non-random sampling methods include convenience sampling, where easily
accessible individuals are chosen, or purposive sampling, where specific individuals are
deliberately selected based on certain characteristics.
Selection bias: This occurs when certain segments or groups of the population are more
likely to be included or excluded from the sample, leading to a distorted representation.
For instance, if a survey is conducted only among online users, it excludes individuals
who do not have internet access, resulting in a biased sample.
Voluntary response bias: This occurs when participants self-select to be part of the
sample, typically in response to an invitation or call for participation. Voluntary response
samples tend to be biased as they attract individuals with strong opinions or experiences,
while others may choose not to participate.
Under-coverage: Under-coverage happens when certain segments of the population have
a lower chance of being included in the sample. For example, if a survey is conducted
through telephone calls and excludes households without landline phones, it
underrepresents the population using only mobile phones.
Sampling frame issues: A faulty sampling frame, which is the list or description of the
target population, can lead to a biased sample. If the sampling frame is outdated,
incomplete, or inaccurate, it may exclude certain individuals or include duplicates,
compromising the representativeness of the sample.
It is crucial to use appropriate sampling techniques and methods to minimize or avoid
these sources of bias and ensure the selection of a representative and unbiased sample. A
carefully selected sample increases the validity and reliability of research findings,
allowing for accurate generalizations and inferences about the target population.
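The effect of under-coverage can be illustrated with a toy calculation. The eight households, their landline status and their incomes are invented for this sketch; the point is only that averaging over the covered part of the population can differ from the average over the whole population.

```python
# Hypothetical households: (has_landline, monthly income in thousands).
households = [
    (True, 52), (True, 48), (True, 50), (True, 55),
    (False, 30), (False, 35), (False, 28), (False, 32),
]

# Average over the whole target population.
all_incomes = [income for _, income in households]
true_mean = sum(all_incomes) / len(all_incomes)

# A telephone survey restricted to landlines can only reach the covered households.
covered_incomes = [income for has_landline, income in households if has_landline]
covered_mean = sum(covered_incomes) / len(covered_incomes)

bias = covered_mean - true_mean
print("Target population mean:", true_mean)
print("Covered households mean:", covered_mean)
print("Bias from under-coverage:", bias)
```

No increase in sample size removes this bias, because the excluded households can never be selected.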
Additional Notes:
Faulty demarcation of sampling units refers to an error or mistake in defining the
boundaries or units for selecting samples in a research study or survey. It occurs when the
sampling units are not appropriately defined or identified, leading to a flawed or biased
representation of the population.
To understand this concept with a simple example, let's consider a study that aims
to investigate the average income of people in a particular city. The researcher decides to
collect data by selecting households as the sampling units. However, due to faulty
demarcation, the researcher includes commercial properties and businesses along with
residential households.
In this case, the faulty demarcation of sampling units would result in a biased
sample. The inclusion of commercial properties and businesses can significantly skew the
results since they generally have higher income levels compared to residential
households. The study's findings would not accurately represent the average income of
people in the city because the sampling units were not properly defined or demarcated.
To avoid faulty demarcation of sampling units, researchers need to clearly define
the boundaries and criteria for selecting samples. In this example, the researcher should
have restricted the sampling units to residential households only to obtain a more accurate
representation of the population's income levels.
4. Constant error due to improper choice of the statistics for estimating the population
parameters. For example, if $x_1, x_2, \ldots, x_n$ is a sample of independent observations,
then the sample variance
$$s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
as an estimate of the population variance $\sigma^2$ is biased, whereas the statistic
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
is an unbiased estimate of $\sigma^2$.
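The bias can be checked exactly on a tiny example. The sketch below takes a hypothetical population {1, 2, 3}, lists all equally likely samples of n = 2 independent draws (with replacement), and averages the two variance statistics over those samples using exact fractions.

```python
from fractions import Fraction
from itertools import product


def sum_sq_dev(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = Fraction(sum(values), len(values))
    return sum((x - mean) ** 2 for x in values)


population = [1, 2, 3]  # hypothetical population (illustrative only)
n = 2

# Population variance sigma^2.
sigma_sq = sum_sq_dev(population) / len(population)

# All equally likely samples of n independent observations.
samples = list(product(population, repeat=n))

# Average value of each statistic over every possible sample.
avg_divisor_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_divisor_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print("sigma^2:", sigma_sq)
print("average with divisor n:", avg_divisor_n)
print("average with divisor n - 1:", avg_divisor_n_minus_1)
```

The divisor-n statistic averages to (n − 1)/n times σ², so it underestimates σ² on average, while the divisor-(n − 1) statistic averages to σ² itself.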
Remark. An increase in the sample size (i.e., the number of units in the sample) usually
results in a decrease in sampling error. In fact, in many situations the sampling error is
inversely proportional to the square root of the sample size, as illustrated
in Fig. 7·1.
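The inverse-square-root relationship in the remark above can be seen numerically for the sample mean, whose standard error under simple random sampling from a large population is σ/√n. The value σ = 12 and the sample sizes are assumptions chosen for illustration.

```python
import math


def standard_error_of_mean(sigma, n):
    """Standard error of the sample mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)


sigma = 12.0  # assumed population standard deviation
for n in (25, 100, 400):
    print(f"n = {n:3d}: S.E. = {standard_error_of_mean(sigma, n)}")
```

Each fourfold increase in n only halves the standard error, so gains in precision become progressively more expensive.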
(ii) Non-sampling Errors. As distinct from sampling errors which are due to the
inductive process of inferring about the population based on a sample, the non-sampling
errors primarily arise at the stages of observation, ascertainment and processing of the
data and are thus present in both the complete enumeration survey and the sample survey.
Thus, the data obtained in a complete census, although free from sampling errors,
would still be subject to non-sampling errors, whereas data obtained in a sample
survey would be subject to both sampling and non-sampling errors.
Non-sampling errors can occur at every stage of the planning or execution of census or
sample survey. The preparation of an exhaustive list of all the sources of non-sampling
errors is a very difficult task. However, a careful examination of the major phases of a
survey (complete or sample) indicates that some of the more important non-sampling
errors arise from the following factors:
1. Faulty Planning or Definitions. The planning of a survey consists in explicitly stating
the objectives of the survey. These objectives are then translated into (i) a set of
definitions of the characteristics for which data are to be collected, and (ii) into a set of
specifications for collecting, processing and publishing. Here the non-sampling errors can
be due to:
(a) Data specification being inadequate and inconsistent with respect to the objectives of
the survey.
(b) Error due to location of the units and actual measurement of the characteristics, errors
in recording the measurements, errors due to ill-designed questionnaire, etc.
(c) Lack of trained and qualified investigators and lack of adequate supervisory staff.
2. Response Errors. These errors are introduced as a result of the responses furnished by
the respondents and may be due to any of the following reasons:
(i) Response errors may be accidental. For example, the respondent may misunderstand
a particular question and accordingly furnish improper information un-intentionally.
(ii) Prestige bias. An appeal to the pride or prestige of person interviewed may introduce
yet another kind of bias, called prestige bias by virtue of which he may upgrade his
education, intelligence quotient, occupation, income, etc., or downgrade his age, thus
resulting in wrong answers.
(iii) Self-interest. Quite often, in order to safeguard one's self-interest, one may give
incorrect information, e.g., a person may give an underestimate of his salary or
production and an over-statement of his expenses or requirements, etc.
(iv) Bias due to interviewer. Sometimes the interviewer may affect the accuracy of the
response by the way he asks questions or records them. The information obtained on
suggestions from the interviewer is very likely to be influenced by interviewer's beliefs
and prejudices.
(v) Failure of respondent’s memory. One source of error which is common to most of
the methods of collecting information is that of ‘recall’. Many of the questions in surveys
refer to happenings or conditions in the past and there is a problem both of remembering
the event and associating it with the correct time period.
3. Non-response Biases. Non-response biases occur if full information is not obtained on
all the sampling units. In house-to-house surveys, non-response usually results if the
respondent is not found at home even after repeated calls, or if he/she is unable to furnish
the information on all the questions or if he/she refuses to answer certain questions.
Therefore, some bias is introduced as a consequence of the exclusion of a section of the
population with certain peculiar characteristics, due to non-response.
4. Errors in Coverage. If the objectives of the survey are not precisely stated in clear-cut
terms, this may result in (i) the inclusion in the survey of certain units which are not to
be included, or (ii) the exclusion of certain units which were to be included in the survey
under the objectives. For example, in a census to determine the number of individuals in
the age group, say, 20 years to 50 years, more or less serious errors may occur in deciding
whom to enumerate unless the particular community or area is specified, and also the time
at which the age is to be reckoned.
5. Compiling Errors. Various operations of data processing such as editing and coding
of responses, tabulation and summarizing the original observations made in the survey
are a potential source of error. Compilation errors are subject to control through
verification, consistency check, etc.
6. Publication Errors. Publication errors, i.e., the errors committed during presentation
and printing of tabulated results are basically due to two sources. The first refers to the
mechanics of publication—the proofing error and so on. The other, which is of a more
serious nature, lies in the failure of the survey organization to point out the limitations of
the statistics.
Additional Notes:
Publication errors in presenting and printing tabulated results can arise from two sources.
The first source involves mechanical issues during the publication process, such as
proofreading errors. These errors can occur due to oversight or mistakes made during the
printing or presentation stages.
The second source of publication errors is more significant and pertains to the survey
organization's failure to acknowledge the limitations of the statistics being reported. In
other words, it refers to the failure to clearly communicate the constraints and potential
shortcomings of the data and statistical analysis used in the publication.
To put it simply, publication errors can occur due to mistakes made during the printing or
presentation process. Additionally, they can also arise when survey organizations do not
adequately highlight the limitations and potential weaknesses of the statistics they are
reporting.
Remarks 1. In a sample survey, non-sampling errors may also arise due to defective
frame and faulty selection of sampling units.
2. It is obvious that the non-sampling errors are likely to be more serious in a
complete census as compared to a sample survey since in a sample survey the non-
sampling errors can be reduced to a greater extent by employing qualified, trained and
experienced personnel, better supervision and better equipment for processing and
analyzing relatively smaller data as compared to a complete census.
It has already been pointed out that usually sampling error decreases with an
increase in sample size. On the other hand, as the sample size increases, the non-sampling
error tends to increase. Accordingly, as sample size increases, the behavior of non-
sampling error is likely to be opposite to that of sampling error.
3. Quite often, the non-sampling error in a complete census is greater than both
the sampling and non-sampling errors taken together in a sample survey. Obviously in
such situations sample survey is to be preferred to complete enumeration survey.
3. Greater Accuracy of Results. The results of a sample survey are usually much more
reliable than those obtained from a complete census due to the following reasons:
(i) It is always possible to determine the extent of the sampling errors, and
(ii) The non-sampling errors due to factors such as training of field workers,
measuring and recording observations, location of units, incompleteness of
returns, biases due to interviewers, etc. are likely to be of a more serious nature
in a complete census than in a sample survey. In a sample survey, non-sampling
errors can be controlled more effectively by employing more qualified and better
trained personnel, better supervision and better equipment for processing and
analysis of relatively limited data. Moreover, it is easier to guard against
incomplete and inaccurate returns. There can be a follow-up in case of
non-response or incomplete response. Effective control of non-sampling errors
more than compensates for the errors in the estimates due to sampling. As such,
more sophisticated statistical techniques can be employed to obtain relatively
more reliable results.
Additional Notes:
(i) The results of a sample survey are typically more dependable compared to those
obtained from a complete census for the following reason: The extent of sampling errors
can always be determined.
In other words, when conducting a sample survey, it is possible to assess and quantify the
errors that may arise due to sampling. This allows researchers to estimate the accuracy
and reliability of the survey results. On the other hand, in a complete census where data is
collected from the entire population, there is no sampling error since every individual is
accounted for. However, the absence of sampling error does not guarantee the absence of
other errors or biases in the data collection process.
To summarize, sample surveys provide greater accuracy of results because the potential
errors associated with sampling can be measured and accounted for, whereas in a
complete census, there is no sampling error, but other types of errors may still be present.
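As a rough illustration of how the extent of sampling error can be determined from the sample itself, the sketch below estimates the standard error of a sample mean. It assumes simple random sampling without replacement and uses the standard finite population correction (1 - n/N); the function name `mean_with_standard_error` and the data are hypothetical.

```python
import math
import random
import statistics


def mean_with_standard_error(sample, population_size):
    """Return (sample mean, estimated standard error of that mean).

    Assumes simple random sampling without replacement from a population
    of `population_size` units.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    s2 = statistics.variance(sample)      # sample variance, divisor n - 1
    fpc = 1 - n / population_size         # finite population correction
    return mean, math.sqrt(fpc * s2 / n)


# Synthetic population of 5,000 values (illustrative only).
rng = random.Random(11)
population = [rng.gauss(50, 12) for _ in range(5_000)]
sample = rng.sample(population, 400)

mean, se = mean_with_standard_error(sample, len(population))
print(f"estimated mean = {mean:.2f}, standard error = {se:.2f}")
```

When the "sample" is the whole population (n = N), the correction factor is zero and so is the estimated sampling error, which matches the remark that a complete census has no sampling error. Non-sampling errors are not captured by this calculation.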
(ii) Non-sampling errors, which stem from various factors such as the training of field
workers, the accuracy of measuring and recording observations, the placement of units,
incomplete data returns, and biases introduced by interviewers, are more likely to be
significant in a complete census compared to a sample survey.
In other words, when conducting a complete census where data is collected from every
unit in the population, the potential for non-sampling errors tends to be higher. Factors
such as insufficient training of field workers, errors in measurement or recording of data,
issues with the geographical placement of units, incomplete data returns, and biases
introduced by interviewers can have a more pronounced impact on the accuracy and
quality of the data obtained in a complete census.
On the other hand, in a sample survey, where data is collected from a subset of the
population, the potential for non-sampling errors is relatively lower. This is because the
sampling process allows for more controlled data collection procedures, improved
training of field workers, and better management of potential biases.
To summarize, non-sampling errors are more likely to be significant in a complete census
compared to a sample survey due to factors such as field worker training, accuracy of
measurements, unit placement, incomplete data returns, and interviewer biases. The
sampling process in a survey helps mitigate some of these issues, resulting in potentially
greater data accuracy.
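The argument above can be illustrated with a toy simulation. Everything numeric here is an assumption made for the sake of the example: the population is synthetic, and the recording biases (5 units for a thinly supervised census, 1 unit for a small, well-trained survey team) are invented, not taken from any real survey.

```python
import random
import statistics


def measured(values, bias, noise_sd, rng):
    """Recorded values: the true value plus a systematic bias and random noise."""
    return [v + bias + rng.gauss(0, noise_sd) for v in values]


rng = random.Random(7)

# Synthetic population of 50,000 true values (illustrative only).
population = [rng.gauss(100, 20) for _ in range(50_000)]
true_mean = statistics.fmean(population)

# Census: every unit is covered, but recording is assumed to be biased by 5 units.
census_estimate = statistics.fmean(measured(population, bias=5.0, noise_sd=10.0, rng=rng))

# Sample survey: 1,000 units, recording assumed to be biased by only 1 unit.
sample = rng.sample(population, 1_000)
survey_estimate = statistics.fmean(measured(sample, bias=1.0, noise_sd=10.0, rng=rng))

census_error = abs(census_estimate - true_mean)   # non-sampling error only
survey_error = abs(survey_estimate - true_mean)   # sampling + non-sampling error

print(f"census error = {census_error:.2f}")
print(f"survey error = {survey_error:.2f}")
```

Under these assumed magnitudes the survey's total error stays below the census's non-sampling error. With different assumptions about the biases the comparison could go the other way, so this is a sketch of the reasoning rather than evidence for it.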
4. Greater Scope. A sample survey generally has greater scope than a complete
census. Complete enumeration is impracticable, even inconceivable, if the survey
requires highly trained personnel and more sophisticated equipment for the collection
and analysis of the data. Since a sample survey saves time and money, it is possible to
conduct a thorough and intensive enquiry, because more detailed information can be
obtained from a small group of respondents.
5. If the population is too large, as for example the trees in a jungle, we are left with
no option but to resort to sampling.
6. If testing is destructive, i.e., if the quality of an article can be determined only
by destroying the article in the process of testing, as for example:
(i) testing the quality of milk or chemical salt by analysis,
(ii) testing the breaking strength of chalks,
(iii) testing of crackers and explosives,
(iv) testing the life of an electric tube or bulb, etc.,
complete enumeration is impossible and sampling technique is the only method to be
used in such cases.
Sampling theory has its own limitations and problems, which may be briefly outlined
as follows:
1. Proper care should be taken in the planning and execution of the sample
survey, otherwise the results obtained might be inaccurate and misleading.
2. Sampling theory requires the services of trained and qualified personnel and
sophisticated equipment for its planning, execution and analysis. In the absence of
these, the results of the sample survey are not trustworthy.
3. If information is required about each and every unit of the universe,
there is no way but to resort to complete enumeration. Moreover, if time and money are
not important factors or if the universe is not too large, a complete census may be better
than any sampling method.
7.8.1. Subjective (or Purposive or Judgment) Sampling. In this sampling, the sample
is selected with definite purpose in view and the choice of the sampling units depends
entirely on the discretion and judgment of the investigator. This sampling suffers from
drawbacks of favoritism and nepotism depending upon the beliefs and prejudices of
the investigator and thus does not give a representative sample of the population. For
example, if an investigator wants to give the picture that the standard of living has
increased in the city of New Delhi, he may take individuals in the sample from the posh
localities like Defence Colony, South Extension, Golf Link, Jor Bagh, Chanakyapuri,
etc., and ignore the localities where low-income group and middle-class families live.
This sampling method is rarely used and cannot be recommended for general use, since it
is often biased by the element of subjectivity on the part of the investigator (i.e., by the
investigator's personal taste or opinion rather than facts). However, if the investigator is
experienced and skilled and this sampling is carefully applied, then judgment samples
may yield valuable results.
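The bias described in the standard-of-living example can be made concrete with a small sketch. The city below is entirely hypothetical: the three locality types, their sizes and their income levels are invented for illustration, and the "judgment" rule (pick only posh households) is a deliberately extreme assumption.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical city: each household has a locality type and a monthly income.
households = (
    [("posh", rng.gauss(200_000, 40_000)) for _ in range(1_000)]
    + [("middle", rng.gauss(60_000, 15_000)) for _ in range(5_000)]
    + [("low", rng.gauss(20_000, 5_000)) for _ in range(4_000)]
)
true_mean = statistics.fmean([income for _, income in households])

# Judgment sample: the investigator hand-picks 200 households from posh localities only.
posh_incomes = [income for kind, income in households if kind == "posh"]
judgment_mean = statistics.fmean(posh_incomes[:200])

# Simple random sample of the same size drawn from the whole city.
random_sample = rng.sample(households, 200)
random_mean = statistics.fmean([income for _, income in random_sample])

print(f"true mean     = {true_mean:10.0f}")
print(f"judgment mean = {judgment_mean:10.0f}")
print(f"random mean   = {random_mean:10.0f}")
```

The hand-picked sample overstates the city's average income by a wide margin, while the random sample of the same size lands much closer to the true figure.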
Additional Notes:
Subjective or judgment sampling is a method of selecting a sample based on the
researcher's personal judgment or expertise. It involves hand-picking individuals or
elements for the sample based on specific characteristics or knowledge about the
population.
For example, let's say a researcher wants to study the impact of a new educational
program on student performance. Instead of randomly selecting students, the researcher
may choose a sample based on their judgment. They might select students who are known
to be high achievers, struggling students, or those who have shown a particular interest in
the subject. The researcher's subjective judgment is used to identify individuals who are
likely to provide valuable insights into the impact of the program.
7.8.3. Mixed Sampling. If the samples are selected partly according to some laws of
chance and partly according to a fixed sampling rule (no assignment of probabilities),
they are termed as mixed samples and the technique of selecting such samples is known
as mixed sampling.
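The notes do not name a specific mixed sampling scheme. One commonly cited instance is systematic sampling with a random start, where the starting unit is chosen by chance and every later unit follows a fixed rule. The sketch below shows that scheme as an assumed example; the function name `systematic_sample` and the numbered list of units are hypothetical.

```python
import random


def systematic_sample(units, interval, rng=None):
    """Pick every `interval`-th unit after a randomly chosen starting point."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    rng = rng or random.Random()
    start = rng.randrange(interval)   # chance: random start among the first `interval` units
    return units[start::interval]     # fixed rule: every `interval`-th unit thereafter


# Hypothetical frame of 100 units numbered 1..100, sampling every 10th unit.
units = list(range(1, 101))
sample = systematic_sample(units, interval=10, rng=random.Random(3))
print(sample)
```

Only the first unit is selected by chance. Once it is known, the rest of the sample is fully determined, which is what makes the scheme partly probabilistic and partly rule-based.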