
Chapter Four

Chapter Four discusses statistical estimation, focusing on the process of using sample data to estimate population parameters through point and interval estimators. It highlights the limitations of point estimators and introduces interval estimators, which provide a range of values likely to contain the population parameter, along with the concepts of unbiasedness, consistency, and relative efficiency. The chapter also covers the construction of confidence intervals for population means, both when the population standard deviation is known and unknown.

Uploaded by

Getnet

Chapter Four

Statistical Estimation

4.1 Basic Concepts

As we explained in Chapter 1, statistical inference is the process by which we acquire
information and draw conclusions about populations from samples. There are two general
procedures for making inferences about populations: estimation and hypothesis testing. In this
chapter, we introduce the concepts and foundations of estimation and demonstrate them with
simple examples.
Estimation is the process of using statistics from sample data to estimate the parameters of the
population. A statistic is a random variable whose value depends on which sample is drawn from the
population. For instance, the sample mean X̄ is a point estimator of the population mean μ, and the
sample proportion ṗ is a point estimator of the population proportion p.
Point and Interval Estimator
There are two types of estimator: point and interval estimators.
Point Estimator- A point estimator draws inferences about a population by estimating the
value of an unknown parameter using a single value or point. For example, the sample mean is a
point estimator of the population mean, the sample standard deviation is a point estimator of the
population standard deviation, and the sample proportion is a point estimator of the population proportion.
There are three drawbacks to using point estimators. First, it is virtually certain that the estimate
will be wrong. (The probability that a continuous random variable will equal a specific value is
0; that is, the probability that X̄ will exactly equal μ is 0.) Second, we often need to know how
close the estimator is to the parameter. Third, in drawing inferences about a population, it is
intuitively reasonable to expect that a large sample will produce more accurate results because it
contains more information than a smaller sample does. But point estimators don’t have the
capacity to reflect the effects of larger sample sizes. Consequently, we use the second method of
estimating a population parameter, the interval estimator.
Interval Estimator- An interval estimator draws inferences about a population by estimating the
value of an unknown parameter using an interval. The purpose of an interval estimate is to
provide information about how close the point estimate, provided by the sample, is to the value
of the population parameter.
In this chapter, we show how to compute interval estimates of a population mean μ and a
population proportion p.
The general form of an interval estimate of a population mean is X̄ ± margin of error. Similarly,
the general form of an interval estimate of a population proportion is ṗ ± margin of error. For the
mean, the interval is obtained by adding the margin of error, $z_{\alpha/2}\,\sigma/\sqrt{n}$, to the
sample mean and subtracting it from the sample mean.
To illustrate the difference between point and interval estimators, suppose that a statistics
professor wants to estimate the mean summer income of his second-year business students.
Selecting 25 students at random, he calculates the sample mean weekly income to be $400. The
point estimate is the sample mean. In other words, he estimates the mean weekly summer income
of all second-year business students to be $400. Using the technique described subsequently, he
may instead use an interval estimate; he estimates that the mean weekly summer income of
second-year business students lies between $380 and $420.
Numerous applications of estimation occur in the real world. For example, television network
executives want to know the proportion of television viewers who are tuned in to their networks;
an economist wants to know the mean income of university graduates; and a medical researcher
wishes to estimate the recovery rate of heart attack victims treated with a new drug. In each of
these cases, to accomplish the objective exactly, the statistics practitioner would have to examine
each member of the population and then calculate the parameter of interest. For instance,
network executives would have to ask each person in the country what he or she is watching to
determine the proportion of people who are watching their shows. Because there are millions of
television viewers, the task is both impractical and prohibitively expensive.
An alternative would be to take a random sample from this population, calculate the sample
proportion, and use that as an estimator of the population proportion. The use of the sample
proportion to estimate the population proportion seems logical.
The selection of the sample statistic to be used as an estimator, however, depends on the
characteristics of that statistic. Naturally, we want to use the statistic with the following most
desirable qualities.
1. Unbiased Estimator- An unbiased estimator of a population parameter is an estimator
whose expected value is equal to that parameter.
This means that if you were to take an infinite number of samples and calculate the value of the
estimator in each sample, the average value of the estimators would equal the parameter. This
amounts to saying that, on average, the sample statistic is equal to the parameter (i.e., $E(\bar{X}) = \mu$).
We also know that the sample proportion is an unbiased estimator of the population proportion
because E(ṗ) = p and that the difference between two sample means is an unbiased estimator of
the difference between two population means because $E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$.
Knowing that an estimator is unbiased only assures us that its expected value equals the
parameter; it does not tell us how close the estimator is to the parameter. Another desirable
quality is that as the sample size grows larger, the sample statistic should come closer to the
population parameter. This quality is called consistency.
2. Consistency- An unbiased estimator is said to be consistent if the difference between the
estimator and the parameter grows smaller as the sample size grows larger.
The measure we use to gauge closeness is the variance (or the standard deviation). Thus, X̄ is a
consistent estimator of μ because the variance of X̄ is $\sigma^2/n$. This implies that as n grows larger, the
variance of X̄ grows smaller. As a consequence, an increasing proportion of sample means falls
close to μ. Similarly, ṗ is a consistent estimator of p because it is unbiased and the variance of ṗ
is p(1-p)/n, which grows smaller as n grows larger.
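The shrinking variance of X̄ can be checked with a small simulation. The sketch below is only an illustration: the population values (mean 50, standard deviation 10), the sample sizes, the number of replications, and the function name `variance_of_sample_means` are all assumptions chosen for the demonstration, not values from this chapter.

```python
import random
import statistics


def variance_of_sample_means(n, replications=1000, mu=50.0, sigma=10.0, seed=1):
    """Empirical variance of the sample mean over many samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(replications)
    ]
    return statistics.pvariance(means)


for n in (10, 100, 400):
    empirical = variance_of_sample_means(n)
    theoretical = 10.0 ** 2 / n  # sigma^2 / n for the assumed population
    print(f"n={n:4d}  empirical={empirical:7.4f}  theoretical={theoretical:7.4f}")
```

The empirical variances should fall close to σ²/n and decrease as n increases, which is the behavior the definition of consistency describes.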
A third desirable quality is relative efficiency, which compares two unbiased estimators of a
parameter.
3. Relative Efficiency- If there are two unbiased estimators of a parameter, the one whose
variance is smaller is said to have relative efficiency.
Statisticians have established that the sample median is an unbiased estimator but that its
variance is greater than that of the sample mean (when the population is normal). As a
consequence, the sample mean is relatively more efficient than the sample median when
estimating the population mean.
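A simulation can make the comparison concrete. The following sketch assumes a standard normal population, samples of size 25, and 4,000 replications; these settings and the function name `estimator_variances` are illustrative choices, not part of the chapter.

```python
import random
import statistics


def estimator_variances(n=25, replications=4000, mu=0.0, sigma=1.0, seed=7):
    """Return (variance of sample means, variance of sample medians)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(replications):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)


var_mean, var_median = estimator_variances()
print(f"variance of sample mean:   {var_mean:.4f}")
print(f"variance of sample median: {var_median:.4f}")
```

Under these assumptions the variance of the median should come out noticeably larger than the variance of the mean, in line with the statement that the mean is the relatively more efficient estimator for a normal population.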

4.2 Interval Estimator of Population Mean
4.2.1 Interval Estimator for Population Mean When σ is Known
In order to develop an interval estimate of a population mean, either the population standard
deviation σ or the sample standard deviation s must be used to compute the margin of error. In
most applications σ is not known, and s is used to compute the margin of error. In some
applications, however, large amounts of relevant historical data are available and can be used to
estimate the population standard deviation prior to sampling. In addition, in quality control
applications where a process is assumed to be operating correctly, or “in control,” it is
appropriate to treat the population standard deviation as known. We refer to such cases as the σ
known case. In this section, we introduce an example in which it is reasonable to treat σ as
known and show how to construct an interval estimate for this case.
We now describe how an interval estimator is produced from a sampling distribution. Suppose
we have a population with mean μ and standard deviation σ . The population mean is assumed to
be unknown, and our task is to estimate its value. As we just discussed, the estimation procedure
requires the statistics practitioner to draw a random sample of size n and calculate the sample
mean X̄ .
The central limit theorem presented earlier states that X̄ is normally distributed if X is normally
distributed, or approximately normally distributed if X is non-normal and n is sufficiently large.
This means that the variable
$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
is standard normally distributed (or approximately so).
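Thus, we can develop the following probability statement associated with the sampling
distribution of the mean:
$$P\left(\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
which was derived from
$$P\left(-z_{\alpha/2} < z < z_{\alpha/2}\right) = 1 - \alpha$$
Using a similar algebraic manipulation, we can express the probability in a slightly different
form:
$$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$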
Notice that in this form the population mean is in the center of the interval created by adding and
subtracting $z_{\alpha/2}$ standard errors (the margin of error) to and from the sample mean. It is important for
you to understand that this is merely another form of probability statement about the sample
mean. This equation says that, with repeated sampling from this population, the proportion of
values of X̄ for which the interval
$$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
includes the population mean μ is equal to 1 − α. This form of probability statement is very useful
to us because it is the confidence interval estimator of μ.
A confidence interval is a range of values constructed from sample data so that the population
parameter is likely to occur within that range at a specified probability. The specified probability
is called the level of confidence.
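The repeated-sampling reading of the confidence level can be illustrated by simulation. In the sketch below, the population (normal with mean 100 and standard deviation 15), the sample size of 25, and the function name `coverage` are hypothetical choices made only for the demonstration.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(mu=100.0, sigma=15.0, n=25, confidence=0.95,
             replications=5000, seed=3):
    """Fraction of simulated intervals X-bar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(replications):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / replications


rate = coverage()
print(f"proportion of intervals containing mu: {rate:.3f}")
```

With many replications the proportion should be close to 0.95. Any single interval either contains μ or does not; the 95% describes the long-run behavior of the procedure.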

The probability 1 − α is called the confidence level (or confidence coefficient).
$\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n}$ is called the lower confidence limit (LCL).
$\bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}$ is called the upper confidence limit (UCL).
Because the confidence level is the probability that the interval includes the actual value of μ, we
generally set 1 − α close to 1 (usually between 0.90 and 0.99). In the table below we list four
commonly used confidence levels and their associated values of $z_{\alpha/2}$. For example, if the
confidence level is 1 − α = 0.95, then α = 0.05, α/2 = 0.025, and $z_{\alpha/2} = z_{0.025} = 1.96$. The resulting
confidence interval estimator is then called the 95% confidence interval estimator of µ.

Four Commonly Used Confidence Levels and z α/2

1 − α    α       α/2      z α/2
0.90     0.10    0.05     z 0.05 = 1.645
0.95     0.05    0.025    z 0.025 = 1.96
0.98     0.02    0.01     z 0.01 = 2.33
0.99     0.01    0.005    z 0.005 = 2.575
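The tabulated values can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is introduced here only for illustration.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2}, the value with area alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"1 - alpha = {level:.2f}   z = {z_critical(level):.4f}")
```

The computed values agree with the table to the precision shown there. For 0.99 the more precise value is about 2.5758, which printed tables give as 2.575, 2.576, or 2.58 depending on their rounding convention.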
Example: The Doll Computer Company makes its own computers and delivers them directly to
customers who order them via the Internet. Doll competes primarily on price and speed of
delivery. To achieve its objective of speed, Doll makes each of its five most popular computers
and transports them to warehouses across the country. The computers are stored in the
warehouses from which it generally takes 1 day to deliver a computer to the customer. This
strategy requires high levels of inventory that add considerably to the cost. To lower these costs,
the operations manager wants to use an inventory model.
He notes that both daily demand and lead time are random variables. He concludes that demand
during lead time is normally distributed, and he needs to know the mean to compute the optimum
inventory level. He observes 25 lead time periods and records the demand during each period.
These data are listed here. The manager would like a 95% confidence interval estimate of the
mean demand during lead time. From long experience, the manager knows that the standard
deviation is 75 computers. Construct a 95% confidence interval for the mean demand during lead time.
Demand during lead-time
235 261 374 466 316 309 499 253 334
421 374 361 535 296 514 462 369
330 302 344 386 332 348 439 394

Solution
We need four values to construct the confidence interval estimate of µ. They are $\bar{X}$, $z_{\alpha/2}$, σ, and n.
$$\bar{X} = \frac{\sum x_i}{n} = \frac{9{,}254}{25} = 370.16$$
The confidence level is set at 95%; thus, 1 − α = 0.95, α = 0.05, and α/2 = 0.025.
From the above table we can find $z_{\alpha/2} = z_{0.025} = 1.96$.
Substituting these values into the confidence interval estimator, we find
$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96\,\frac{75}{\sqrt{25}} = 370.16 \pm 29.40 = (340.76,\ 399.56)$$
Here the numerical value 29.40 represents the margin of error.

Interpretation: The operations manager estimates that the mean demand during lead-time lies
between 340.76 and 399.56. He can use this estimate as an input in developing an inventory
policy.
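The calculation can be reproduced in a few lines of Python. This is a sketch, not part of the original solution: the 25 demand values are read from the data table above (they sum to 9,254, matching the solution), and the function name `z_interval` is an illustrative choice.

```python
import math
from statistics import NormalDist, fmean

# Demand during lead time for the 25 observed periods (from the example).
demand = [
    235, 261, 374, 466, 316, 309, 499, 253, 334,
    421, 374, 361, 535, 296, 514, 462, 369,
    330, 302, 344, 386, 332, 348, 439, 394,
]


def z_interval(data, sigma, confidence):
    """Confidence interval for the mean when sigma is treated as known."""
    n = len(data)
    xbar = fmean(data)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return xbar, margin, (xbar - margin, xbar + margin)


xbar, margin, (lcl, ucl) = z_interval(demand, sigma=75, confidence=0.95)
print(f"sample mean = {xbar:.2f}, margin of error = {margin:.2f}")
print(f"95% interval: ({lcl:.2f}, {ucl:.2f})")
```

Rounded to two decimals, the result agrees with the hand calculation: 370.16 ± 29.40, or (340.76, 399.56).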
Practical Advice
If the population follows a normal distribution, the confidence interval provided by the confidence
interval estimator is exact. In other words, if the estimator were used repeatedly to
generate 95% confidence intervals, exactly 95% of the intervals generated would contain the
population mean. If the population does not follow a normal distribution, the confidence interval
provided by the estimator will be approximate. In this case, the
quality of the approximation depends on both the distribution of the population and the sample
size.
In most applications, a sample size of n ≥ 30 is adequate when using the expression to develop
an interval estimate of a population mean. If the population is not normally distributed, but is
roughly symmetric, sample sizes as small as 15 can be expected to provide good approximate
confidence intervals. With smaller sample sizes, the expression should only be used if the analyst
believes, or is willing to assume, that the population distribution is at least approximately
normal.
4.2.2 Interval Estimator of Population Mean When σ is Unknown
When developing an interval estimate of a population mean we usually do not have a good
estimate of the population standard deviation either. In these cases, we must use the same sample
to estimate both μ and σ. This situation represents the σ unknown case. When s is used to
estimate σ, the margin of error and the interval estimate for the population mean are based on a
probability distribution known as the t distribution (student t distribution). Although the
mathematical development of the t distribution is based on the assumption of a normal
distribution for the population we are sampling from, research shows that the t distribution can
be successfully applied in many situations where the population deviates significantly from
normal. Later in this section we provide guidelines for using the t distribution if the population is
not normally distributed.
The t distribution is a family of similar probability distributions, with a specific t distribution
depending on a parameter known as the degrees of freedom. The t distribution with one degree
of freedom is unique, as is the t distribution with two degrees of freedom, with three degrees of
freedom, and so on. As the number of degrees of freedom increases, the difference between the t
distribution and the standard normal distribution becomes smaller and smaller.
Note that a t distribution with more degrees of freedom exhibits less variability and more closely
resembles the standard normal distribution. Note also that the mean of the t distribution is zero.

We place a subscript on t to indicate the area in the upper tail of the t distribution. For example,
just as we used z0.025 to indicate the z value providing a 0.025 area in the upper tail of a standard
normal distribution, we will use t0.025 to indicate a 0.025 area in the upper tail of a t distribution.
In general, we will use the notation tα/2 to represent a t value with an area of α/2 in the upper tail
of the t distribution.
To find a t value we can use the t distribution table. Each row in the table corresponds to a separate
t distribution with the degrees of freedom shown. For example, for a t distribution with 9
degrees of freedom, t0.025 = 2.262. Similarly, for a t distribution with 60 degrees of freedom,
t0.025 = 2.000. As the degrees of freedom continue to increase, t 0.025 approaches z 0.025 = 1.96. In
fact, the standard normal distribution z values can be found in the infinite degrees of freedom
row (labeled ∞ ) of the t distribution table. If the degrees of freedom exceed 100, the infinite
degrees of freedom row can be used to approximate the actual t value; in other words, for more
than 100 degrees of freedom, the standard normal z value provides a good approximation to the t
value.
The following characteristics of the t distribution are based on the assumption that the
population of interest is normal, or nearly normal.
a. It is, like the z distribution, a continuous distribution.
b. It is, like the z distribution, bell-shaped and symmetrical.
c. There is not one t distribution, but rather a "family" of t distributions. All t distributions
have a mean of 0, but their standard deviations differ according to the sample size, n.
There is a t distribution for a sample size of 20, another for a sample size of 22, and so
on. The standard deviation for a t distribution with 5 observations is larger than for a t
distribution with 20 observations.
d. The t distribution is more spread out and flatter at the center than the standard normal
distribution. As the sample size increases, however, the t distribution approaches the
standard normal distribution, because the errors in using s to estimate σ decrease with
larger samples.
To develop a confidence interval for the population mean using the t distribution, we adjust the
above formula to:
$$\bar{X} \pm t\,\frac{s}{\sqrt{n}}$$
To put it another way, to develop a confidence interval for the population mean with an unknown
population standard deviation we:
i. Assume the sample is from a normal population.
ii. Estimate the population standard deviation (σ) with the sample standard deviation (s).
iii. Use the t distribution rather than the z distribution.


We should be clear at this point. We usually employ the standard normal distribution when the
sample size is at least 30. We should, strictly speaking, base the decision whether to use z or t on
whether σ is known or not. When σ is known, we use z; when it is not, we use t. The rule of
using z when the sample is 30 or more is based on the fact that the t distribution approaches the
normal distribution as the sample size increases.
When the sample reaches 30, there is little difference between the z and t values, so we may
ignore the difference and use z. We will show this when we discuss the details of the t
distribution and how to find values in a t distribution. The following chart summarizes the
decision-making process.

Determining when to use the z or t distribution

Is the population normal?
   NO: Is n 30 or more?
      NO: Use an appropriate nonparametric test.
      YES: Use the z distribution.
   YES: Is the population standard deviation known?
      NO: Use the t distribution.
      YES: Use the z distribution.
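The decision chart can also be written as a small function. The sketch below simply encodes the chart's branches; the function name `choose_distribution` and its return labels are hypothetical, introduced for illustration.

```python
def choose_distribution(population_normal, n, sigma_known):
    """Return 'z', 't', or 'nonparametric' following the decision chart."""
    if not population_normal:
        # Non-normal population: the choice depends only on the sample size.
        return "z" if n >= 30 else "nonparametric"
    # Normal population: the choice depends on whether sigma is known.
    return "z" if sigma_known else "t"


print(choose_distribution(population_normal=True, n=10, sigma_known=False))  # t
print(choose_distribution(population_normal=True, n=25, sigma_known=True))   # z
print(choose_distribution(population_normal=False, n=12, sigma_known=False)) # nonparametric
```

The first call matches a small sample from a normal population with σ unknown, the second matches a normal population with σ known, and the third matches a small sample from a non-normal population.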

The following example will illustrate a confidence interval for a population mean when the
population standard deviation is unknown and how to find the appropriate value of t in a table.
Example: A tire manufacturer wishes to investigate the tread life of its tires. A sample of 10
tires driven 50,000 miles revealed a sample mean of 0.32 inch of tread remaining with a standard
deviation of 0.09 inch. Construct a 95 percent confidence interval for the population mean.
Would it be reasonable for the manufacturer to conclude that after 50,000 miles the population
mean amount of tread remaining is 0.30 inches?
Solution: To begin, we assume the population distribution is normal. In this case, we don't have a
lot of evidence, but the assumption is probably reasonable. We do not know the population
standard deviation, but we know the sample standard deviation, which is 0.09 inches. To use the
central limit theorem, we need a large sample, that is, a sample of 30 or more. In this instance
there are only 10 observations in the sample. Hence, we cannot use the central limit theorem. We
use the formula: $\bar{X} \pm t\,\dfrac{s}{\sqrt{n}}$.
To find the value of t we use the t distribution table. The first step for locating t is to move across
the row identified for "Confidence Intervals" to the level of confidence requested. In this case we
want the 95 percent level of confidence, so we move to the column headed "95%." The column
on the left margin is identified as "df." This refers to the number of degrees of freedom. The
number of degrees of freedom is the number of observations in the sample minus the number of
samples, written n ─ 1. In this case it is 10 - 1 = 9. For a 95 percent level of confidence and 9
degrees of freedom, we select the row with 9 degrees of freedom. The value of t is 2.262.

To determine the confidence interval we substitute the values in the formula:
$$\bar{X} \pm t\,\frac{s}{\sqrt{n}} = 0.32 \pm 2.262\,\frac{0.09}{\sqrt{10}} = 0.32 \pm 0.064$$
The endpoints of the confidence interval are 0.256 and 0.384. How do we interpret this result? It
is reasonable to conclude that the population mean is in this interval. The manufacturer can be
reasonably sure (95 percent confident) that the mean remaining tread depth is between 0.256 and
0.384 inches. Because the value of 0.30 is in this interval, it is possible that the mean of the
population is 0.30.
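The arithmetic of the tire example can be checked with a short script. This sketch takes the t value 2.262 directly from the t table, as the solution does, because the Python standard library does not provide t quantiles; the function name `t_interval` is an illustrative choice.

```python
import math


def t_interval(xbar, s, n, t_value):
    """Confidence interval for the mean using a t value looked up for n - 1 df."""
    margin = t_value * s / math.sqrt(n)
    return margin, (xbar - margin, xbar + margin)


# Values from the tire example; 2.262 is the table value for 95% and 9 df.
margin, (lower, upper) = t_interval(xbar=0.32, s=0.09, n=10, t_value=2.262)
print(f"margin of error = {margin:.3f}")
print(f"95% interval: ({lower:.3f}, {upper:.3f})")
print("0.30 inside interval:", lower <= 0.30 <= upper)
```

Rounded to three decimals the endpoints are 0.256 and 0.384, and 0.30 lies between them, as stated in the interpretation.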
The reason the number of degrees of freedom associated with the t value in the above expression is
n − 1 concerns the use of s as an estimate of the population standard deviation σ. The
expression for the sample standard deviation is
$$s = \sqrt{\frac{\sum (x_i - \bar{X})^2}{n - 1}}$$
Degrees of freedom refer to the number of independent pieces of information that go into the
computation of $\sum (x_i - \bar{X})^2$. The n pieces of information involved in computing $\sum (x_i - \bar{X})^2$ are as follows:
$x_1 - \bar{X}$, $x_2 - \bar{X}$, . . . , $x_n - \bar{X}$. In a previous chapter we indicated that $\sum (x_i - \bar{X}) = 0$ for any data set. Thus, only n
− 1 of the $x_i - \bar{X}$ values are independent; that is, if we know n − 1 of the values, the remaining value
can be determined exactly by using the condition that the sum of the $x_i - \bar{X}$ values must be 0.
Thus, n − 1 is the number of degrees of freedom associated
with $\sum (x_i - \bar{X})^2$ and hence the number of degrees of freedom for the t distribution.
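A tiny numerical example may help. The four data values below are hypothetical, chosen only to show that the deviations from the sample mean sum to zero, so the last deviation is fixed once the other n − 1 are known.

```python
import math
import statistics

data = [4.0, 7.0, 9.0, 12.0]            # hypothetical observations
xbar = statistics.fmean(data)            # sample mean
deviations = [x - xbar for x in data]    # x_i minus the sample mean

# The deviations sum to zero, so the last one follows from the others.
recovered_last = -sum(deviations[:-1])

# Sample standard deviation with n - 1 in the denominator.
s = math.sqrt(sum(d ** 2 for d in deviations) / (len(data) - 1))

print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
print("last deviation recovered from the others:", recovered_last)
print(f"s = {s:.4f}")
```

The value of s computed this way matches `statistics.stdev(data)`, which uses the same n − 1 divisor.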

4.3 Interval Estimator for Population Proportion


Proportion is the fraction, ratio, or percent indicating the part of the sample or the population
having a particular trait of interest.
As an example of a proportion, a recent survey indicated that 92 out of 100 surveyed favored the
continued use of daylight savings time in the summer. The sample proportion is 92/100, or .92,
or 92 percent. If we let ṗ represent the sample proportion, x the number of "successes," and n the
number of items sampled, we can determine a sample proportion as ṗ = x/n.
The population proportion is identified by p. Therefore, p refers to the percent of successes in
the population. The population proportion is unknown and it is estimated by sample proportion.

To develop a confidence interval for a proportion, we need to meet the following assumptions.
1. The binomial conditions, discussed in Chapter 2, have been met. Briefly, these conditions
are:
a. The sample data is the result of counts.
b. There are only two possible outcomes. (We usually label one of the outcomes a
"success" and the other a "failure.")

c. The probability of a success remains the same from one trial to the next.
d. The trials are independent. This means the outcome on one trial does not affect the
outcome on another.
2. The values np and n(1 - p) should both be greater than or equal to 5. This condition allows
us to employ the standard normal distribution, that is, Z, to complete a confidence interval.

With the margin of error, the general expression for an interval estimate of a population proportion is
as follows:
$$\dot{p} \pm z_{\alpha/2}\,\sigma_{\dot{p}}, \qquad \text{where } \sigma_{\dot{p}} = \sqrt{\frac{p(1-p)}{n}}$$
But $\sigma_{\dot{p}} = \sqrt{p(1-p)/n}$ cannot be used directly in the computation of the margin of error because p
will not be known; p is what we are trying to estimate. So ṗ is substituted for p, and the
interval estimate of a population proportion is given by
$$\dot{p} \pm z_{\alpha/2}\sqrt{\frac{\dot{p}(1-\dot{p})}{n}}$$
When developing confidence intervals for proportions, the quantity $z_{\alpha/2}\sqrt{\dot{p}(1-\dot{p})/n}$ provides the
margin of error.

Example: The following example illustrates the computation of the margin of error and interval
estimate for a population proportion. A national survey of 900 women golfers was conducted to
learn how women golfers view their treatment at golf courses in the United States. The survey
found that 396 of the women golfers were satisfied with the availability of tee times. Thus, the
point estimate of the proportion of the population of women golfers who are satisfied with the
availability of tee times is 396/900 = 0.44. Using the above expression and a 95% confidence
level,


$$\dot{p} \pm z_{\alpha/2}\sqrt{\frac{\dot{p}(1-\dot{p})}{n}} = 0.44 \pm 1.96\sqrt{\frac{0.44(1-0.44)}{900}} = 0.44 \pm 0.0324 = 0.4076 \text{ to } 0.4724$$
Thus, the margin of error is 0.0324 and the 95% confidence interval estimate of the population
proportion is 0.4076 to 0.4724. Using percentages, the survey results enable us to state with 95%
confidence that between 40.76% and 47.24% of all women golfers are satisfied with the
availability of tee times.
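The golf survey calculation can be reproduced as follows. This is a minimal sketch that uses the table value z = 1.96, as the example does; the function name `proportion_interval` and the variable names are illustrative.

```python
import math


def proportion_interval(successes, n, z):
    """Interval estimate of a population proportion from a sample count."""
    p_sample = successes / n
    margin = z * math.sqrt(p_sample * (1 - p_sample) / n)
    return p_sample, margin, (p_sample - margin, p_sample + margin)


p_sample, margin, (lower, upper) = proportion_interval(successes=396, n=900, z=1.96)

# The normal approximation requires both counts to be at least 5.
assert 900 * p_sample >= 5 and 900 * (1 - p_sample) >= 5

print(f"sample proportion = {p_sample:.2f}, margin of error = {margin:.4f}")
print(f"95% interval: {lower:.4f} to {upper:.4f}")
```

Rounded to four decimals this gives 0.44 ± 0.0324, or 0.4076 to 0.4724, matching the hand calculation.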
4.4 Determining the Sample Size
A concern that usually arises when designing a statistical study is "How many items should be in
the sample?" If a sample is too large, money is wasted collecting the data. Similarly, if the
sample is too small, the resulting conclusions will be uncertain. The necessary sample size
depends on three factors:
i. The level of confidence desired.
ii. The margin of error the researcher will tolerate.
iii. The variability in the population being studied.
The first factor is the level of confidence. Those conducting the study select the level of
confidence. The 95% and the 99% levels of confidence are the most common, but any value
between 0 and 100 percent is possible. The 95% level of confidence corresponds to a $z_{\alpha/2}$ value
of 1.96, and a 99% level of confidence corresponds to a $z_{\alpha/2}$ value of 2.58. The higher the level of
confidence selected, the larger the size of the corresponding sample.

The second factor is the allowable error. The maximum allowable error (or margin of error
designated as E), is the amount that is added and subtracted to the sample mean (or sample
proportion) to determine the endpoints of the confidence interval. It is the amount of error those
conducting the study are willing to tolerate. It is also one-half the width of the corresponding
confidence interval. A small allowable error will require a large sample. A large allowable error
will permit a smaller sample.

The third factor in determining the size of a sample is the population standard deviation. If the
population is widely dispersed, a large sample is required. On the other hand, if the population is
concentrated (homogeneous), the required sample size will be smaller. However, it may be
necessary to use an estimate for the population standard deviation. Here are three suggestions for
finding that estimate.
i. Use a comparable study. Use this approach when there is an estimate of the
dispersion available from another previous study. Suppose we want to estimate the
number of hours worked per week by refuse workers. Information from certain state or
federal agencies who regularly sample the workforce might be useful to provide an
estimate of the standard deviation. If a standard deviation observed in a previous study is
thought to be reliable, it can be used in the current study to help provide an approximate
sample size.
ii. Use a range-based approach. To use this approach we need to know or have an
estimate of the largest and smallest values in the population. Recall from Chapter 2,
where we described the Empirical Rule, that virtually all the observations could be
expected to be within plus or minus 3 standard deviations of the mean, assuming that the
distribution was approximately normal. Thus, the distance between the largest and the
smallest values is 6 standard deviations. We could estimate the standard deviation as one-
sixth of the range. For example, the director of operations at University Bank wants an
estimate of the number of checks written per month by college students. She believes that
the distribution is approximately normal, the minimum number of checks written is 2 per
month, and the most is 50 per month. The range of the number of checks written per
month is 48, found by 50 - 2. The estimate of the standard deviation then would be 8
checks per month, 48/6.
iii. Conduct a pilot study. This is the most common method. Suppose we want an estimate
of the number of hours per week worked by students enrolled in the College of Business at the
University of Addis. To test the validity of our questionnaire, we use it on a small sample of
students. From this small sample we compute the standard deviation of the number of hours
worked and use this value to determine the appropriate sample size.

To understand how the sample size determination process works, we return to the σ known case
presented in the above section. The confidence interval estimate is
$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Let E denote the desired margin of error, so that
$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Solving for $\sqrt{n}$, we have
$$\sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{E}$$
Squaring both sides of this equation, we obtain
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2}$$
Note that we use the same formula to determine the sample size when σ is unknown, substituting
an estimate of σ obtained by one of the methods suggested above.
Example: A student in public administration wants to determine the mean amount members of
city councils in large cities earn per month as remuneration for being a council member. The
error in estimating the mean is to be less than $100 with a 95 percent level of confidence. The
student found a report by the Department of Labor that estimated the standard deviation to be
$1,000. What is the required sample size?
Solution: The maximum allowable error, E, is $100. The value of z α/ 2 for a 95 percent level of
confidence is 1.96, and the estimate of the standard deviation is $1,000. Substituting these values
into the above formula gives the required sample size as:
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \frac{(1.96)^2 (1{,}000)^2}{(100)^2} = \frac{3{,}841{,}600}{10{,}000} = 384.16$$
The computed value of 384.16 is rounded up to 385. A sample of 385 is required to meet the
specifications.

If the student wants to increase the level of confidence, for example to 99 percent, this will
require a larger sample. The z value corresponding to the 99 percent level of confidence is 2.58.

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \frac{(2.58)^2 (1{,}000)^2}{(100)^2} = \frac{6{,}656{,}400}{10{,}000} = 665.64$$
We recommend a sample size of 666. Observe how much the change in the confidence level
changed the size of the sample. An increase from the 95 percent to the 99 percent level of
confidence resulted in an increase of 281 observations. This could greatly increase the cost of the
study, in terms of both time and money. In return, the study's conclusions can be stated with
greater confidence. Hence, the level of confidence should be considered carefully.
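The two sample size calculations for the city council example can be reproduced with a short function. The sketch below uses the z values quoted in the text (1.96 and 2.58) and rounds up to the next whole observation; the function name `sample_size_for_mean` is an illustrative choice.

```python
import math


def sample_size_for_mean(z, sigma, margin):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the margin."""
    return math.ceil((z * sigma / margin) ** 2)


n_95 = sample_size_for_mean(z=1.96, sigma=1000, margin=100)
n_99 = sample_size_for_mean(z=2.58, sigma=1000, margin=100)
print("95% confidence:", n_95)          # 385
print("99% confidence:", n_99)          # 666
print("additional observations:", n_99 - n_95)  # 281
```

Rounding up rather than to the nearest integer ensures that the margin of error is no larger than the allowable error E.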

The procedure just described can be adapted to determine the sample size for a proportion.
Again, three items need to be specified: the desired level of confidence, the margin of error in the
population proportion and an estimate of the population proportion.

The formula to determine the sample size for a proportion is derived as follows. Let the margin of
error for the proportion be denoted by E:
$$E = z_{\alpha/2}\sqrt{\frac{\dot{p}(1-\dot{p})}{n}}$$
Solving this equation for n provides a formula for the sample size that will provide a margin of
error of size E.

$$n = \frac{z_{\alpha/2}^2\,\dot{p}(1-\dot{p})}{E^2}$$
Note, however, that we cannot use this formula to compute the sample size that will provide the
desired margin of error because ṗ will not be known until after we select the sample. What we
need, then, is a planning value for ṗ that can be used to make the computation. Using p* to
denote the planning value for ṗ, the following formula can be used to compute the sample size
that will provide a margin of error of size E.
$$n = \frac{z_{\alpha/2}^2\,p^*(1-p^*)}{E^2}$$

In practice, the planning value p* can be chosen by one of the following procedures.
a) Use the sample proportion from a previous sample of the same or similar units.
b) Use a pilot study to select a preliminary sample. The sample proportion from this sample
can be used as the planning value, p*.
c) Use judgment or a “best guess” for the value of p*.
d) If none of the preceding alternatives apply, use a planning value of p* = 0.50.
Example: The study in the previous example also estimates the proportion of cities that have
private refuse collectors. The student wants the estimate to be within 0.10 of
the population proportion, the desired level of confidence is 90 percent, and no estimate is
available for the population proportion. What is the required sample size?
Solution: The estimate of the population proportion is to be within 0.10, so E = 0.10. The desired
level of confidence is 0.90, which corresponds to a z α/ 2 value of 1.65. Because no estimate of the
population proportion is available, we use p* = 0.50. The suggested number of observations is
$$n = \frac{z_{\alpha/2}^2\,p^*(1-p^*)}{E^2} = \frac{(1.65)^2\,0.5\,(1-0.5)}{(0.1)^2} = \frac{0.680625}{0.01} = 68.0625$$
Therefore, the student needs a random sample of 69 cities.
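The same calculation in code, as a sketch using the z value 1.65 quoted in the solution. The function name `sample_size_for_proportion` is illustrative, and the second call with a planning value of 0.30 is a hypothetical comparison, not part of the example.

```python
import math


def sample_size_for_proportion(z, planning_value, margin):
    """Sample size for a proportion using the planning value p*."""
    return math.ceil(z ** 2 * planning_value * (1 - planning_value) / margin ** 2)


n_default = sample_size_for_proportion(z=1.65, planning_value=0.50, margin=0.10)
n_other = sample_size_for_proportion(z=1.65, planning_value=0.30, margin=0.10)
print("p* = 0.50:", n_default)  # 69
print("p* = 0.30:", n_other)    # 58
```

The product p*(1 − p*) is largest at p* = 0.50, so this planning value gives the largest sample size of any choice of p*, which is why it is the safe default when no estimate is available.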
