
Engineering Data Analysis Guide

CSU is committed to transforming lives through high-quality education and innovative research. The document discusses estimation, including defining point estimates and point estimators. It explains that a point estimate is a single value used to estimate an unknown population parameter, while a point estimator is the statistic used to compute the point estimate. The document also covers calculating confidence intervals for a population mean using large sample sizes, and defines key terms like confidence interval, confidence level, and confidence coefficient.

Uploaded by

Joemar Subong

Republic of the Philippines
Cagayan State University
Carig Campus
COLLEGE OF ENGINEERING

FLIPPED NOTES NUMBER 7

In partial fulfilment for the requirements of the course
ENGINEERING DATA ANALYSIS

By:
SUBONG, JOEMAR D.
BACANI, VALERIE ELAINE M.
DOCA, AL JOHNKENETH A.
TANNAGAN, NOREEN G.

January 06, 2020

CSU Vision
Transforming lives by Educating for the BEST.

CSU Mission
CSU is committed to transform the lives of people and communities through high quality
instruction and innovative research, development, production and extension.

CSU – IGA
Competence
Social Responsibility
Unifying Presence

COE – IGA
Innovative Thinking
Synthesis
Personal Responsibility
Empathy
Research Skill
Entrepreneurial Skill


Estimation

I. Introduction

Objectives:

• Explain the concept of estimation
• Differentiate point estimate and point estimator
A population is a group of phenomena that have something in common; it is the group to be
studied, and population data is a collection of all elements in the population. Statistical
inference may be divided into two major areas: parameter estimation and hypothesis testing.
Populations are characterized by descriptive measures called parameters, which are typically
written using Greek letters. The population mean is μ (mu), the population variance is σ²
(sigma squared), and the population standard deviation is σ (sigma). Inferences about
parameters are based on sample statistics. For example, the population mean (μ) is estimated
by the sample mean (x̄), and the population variance (σ²) is estimated by the sample
variance (s²).

As an example of a parameter estimation problem, suppose that a structural engineer is analyzing the
tensile strength of a component used in an automobile chassis. Since variability in tensile
strength is naturally present between the individual components because of differences in raw
material batches, manufacturing processes, and measurement procedures, the engineer is
interested in estimating the mean tensile strength of the components. In practice, the engineer
will use sample data to compute a number that is in some sense a reasonable value of the true
mean. This number is called a point estimate.

Definition 1
A point estimate of a population parameter is a single value of a statistic used to estimate the
value of the target parameter. For example, the sample mean x̄ is a point estimate of the
population mean μ. Similarly, the sample proportion p is a point estimate of the population
proportion P.

In general, if X is a random variable with probability distribution f(x), characterized
by the unknown parameter μ, and if X1, X2, …, Xn is a random sample of size n from X, the
statistic X̄ = h(X1, X2, …, Xn) is called a point estimator of μ. Note that X̄ is a random
variable because it is a function of random variables. After the sample has been selected,
X̄ takes on a particular numerical value x̄ called the point estimate of μ.

Definition 2
A point estimate of some population parameter μ is a single numerical value x̄ of a statistic X̄.
The statistic X̄ is called the point estimator.

An estimator should be “close” in some sense to the true value of the unknown parameter.
Formally, we say that X̄ is an unbiased estimator of μ if the expected value of X̄ is equal to μ.
This is equivalent to saying that the mean of the probability distribution of X̄ (or the mean of
the sampling distribution of X̄) is equal to μ.

The point estimator X̄ is an unbiased estimator for the parameter μ if

$$E(\bar{X}) = \mu$$

If the estimator is not unbiased, then the difference

$$E(\bar{X}) - \mu$$

is called the bias of the estimator X̄.

When an estimator is unbiased, the bias is zero; that is, $E(\bar{X}) - \mu = 0$.
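
The idea of unbiasedness can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes a hypothetical normal population (mean 70.6, standard deviation 3.0, values chosen only for illustration), draws many samples, and averages the resulting sample means. If the sample mean is unbiased, that average should be very close to the population mean.

```python
import random
import statistics

def mean_of_sample_means(mu, sigma, n, repetitions, seed=0):
    """Average the sample means of many simulated samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample_means = []
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_means.append(statistics.fmean(sample))
    return statistics.fmean(sample_means)

# Hypothetical population parameters, for illustration only.
MU, SIGMA = 70.6, 3.0
estimate = mean_of_sample_means(MU, SIGMA, n=30, repetitions=2000)
print(f"average of sample means = {estimate:.3f}")
print(f"estimated bias          = {estimate - MU:+.3f}")
```

The estimated bias is not exactly zero because only finitely many samples are simulated, but it shrinks as the number of repetitions grows.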

Example 1 If the average height of 100 randomly selected men aged 18 is 70.6 inches, then
we would say that the average height of all 18-year-old men is (at least approximately) 70.6
inches.

Explanation Estimating a population parameter like this by a single number is called point
estimation. The only drawback with a point estimate is that it gives no indication of how
reliable the estimate is. In brief, in the case of estimating a population mean μ we use a
formula to compute from the data a number E, called the margin of error of the estimate, and
form the interval [x̄ − E, x̄ + E]. We do this in such a way that a certain proportion, say 95%,
of all the intervals constructed from sample data by means of this formula contains the
unknown parameter μ. Such an interval is called a 95% confidence interval for μ. (Shafer
and Zhang, 2019)

II. Estimation of a population mean: Large-sample case

Objectives:

• To become familiar with the concept of an interval estimate of the population mean
• To calculate the confidence interval estimating the population mean
• To determine the relationship of confidence interval with confidence coefficient and
  sample size

The term large-sample refers to the sample being of a sufficiently large size that we
can apply the Central Limit Theorem to determine the form of the sampling distribution of
x̄. The Central Limit Theorem says that, for large samples (samples of size n ≥ 30), when
viewed as a random variable the sample mean X̄ is approximately normally distributed with
mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. The
Empirical Rule says that we must go about two standard deviations from the mean to capture
95% of the values of X̄ generated by sample after sample. A more precise distance based on
the normality of X̄ is 1.960 standard deviations, which is $E = 1.960\,\sigma/\sqrt{n}$.
It is standard practice to identify the level of confidence in terms of the area α in the
two tails of the distribution of X̄ when the middle part specified by the level of confidence is
taken out. Figure 1 shows the general situation and Figure 2 shows the case of 95%
confidence.

Figure 1. For 100(1−α)% confidence the area in each tail is α/2

The z-value that cuts off a right tail of area c is denoted $z_c$. Thus the number 1.960 in the
example is $z_{0.025}$, which is $z_{\alpha/2}$ for α = 1 − 0.95 = 0.05.

Figure 2. For 95% confidence the area in each tail is α/2 = 0.025

The level of confidence can be any number between 0 and 100%, but the most common
values are probably 90% (α=0.10), 95% (α=0.05), and 99% (α=0.01).
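
The critical values for these common confidence levels can also be computed instead of being read from a table. A minimal sketch using Python's standard library follows; the helper name `z_critical` is an illustrative choice, not something defined in these notes.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2}: the z-value that cuts off a right tail of area alpha/2."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
# 90% confidence: z = 1.645
# 95% confidence: z = 1.960
# 99% confidence: z = 2.576
```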

Example 2 A sample of size 49 has sample mean 35 and sample standard deviation 14.
Construct a 98% confidence interval for the population mean using this information. Interpret
its meaning.

Solution:

For confidence level 98%, α = 1 − 0.98 = 0.02, so $z_{\alpha/2} = z_{0.01}$. From the critical
values table, we read directly that $z_{0.01} = 2.326$. Thus

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} = 35 \pm 2.326\left(\frac{14}{\sqrt{49}}\right) = 35 \pm 4.652 \approx 35 \pm 4.7$$

We are 98% confident that the population mean μ lies in the interval [30.3, 39.7], in the sense
that in repeated sampling 98% of all intervals constructed from the sample data in this
manner will contain μ.
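
The arithmetic of Example 2 can be reproduced with a short script. This is a sketch under the large-sample assumption (n ≥ 30, with s standing in for σ); the function name `large_sample_mean_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def large_sample_mean_ci(xbar, s, n, confidence):
    """Large-sample interval xbar +/- z_{alpha/2} * s / sqrt(n)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * s / math.sqrt(n)  # margin of error E
    return xbar - margin, xbar + margin

# Example 2: n = 49, sample mean 35, sample standard deviation 14, 98% confidence.
low, high = large_sample_mean_ci(xbar=35, s=14, n=49, confidence=0.98)
print(f"({low:.1f}, {high:.1f})")  # (30.3, 39.7)
```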

Definition 3
A confidence interval for a parameter is an interval of numbers within which we expect the true
value of the population parameter to be contained. It is a range of possible values the
parameter might take, controlling the probability that the parameter is not lower than the
lowest value in this range and not higher than the highest value. It is used to express the
precision and uncertainty associated with a particular sampling method. A confidence interval
consists of three parts: a confidence level, a statistic, and a margin of error. The endpoints
of the interval are computed based on sample information.
Example 3 A random sample of 120 students from a large university yields mean
GPA 2.71 with sample standard deviation 0.51. Construct a 90% confidence interval for the
mean GPA of all students at the university.

Solution:

For confidence level 90%, α = 1 − 0.90 = 0.10, so $z_{\alpha/2} = z_{0.05}$. From the critical
values table we read directly that $z_{0.05} = 1.645$. Since n = 120, x̄ = 2.71, and s = 0.51,

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} = 2.71 \pm 1.645\left(\frac{0.51}{\sqrt{120}}\right) = 2.71 \pm 0.0766$$

One may be 90% confident that the true average GPA of all students at the university is
contained in the interval (2.71−0.08, 2.71+0.08)=(2.63,2.79).

Intervals with a confidence level other than 95% can be calculated by choosing a different
value of α, that is, a confidence coefficient other than .95.

Definition 4
The confidence coefficient is the proportion of times that a confidence interval encloses the true
value of the population parameter if the confidence interval procedure is used repeatedly a very
large number of times.
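
Definition 4 describes a long-run proportion, which can be approximated by simulation. The sketch below is illustrative only: it assumes a hypothetical normal population with known σ (mean 70.6, standard deviation 3.0) and the 95% margin E = 1.960σ/√n, builds many intervals, and counts how often they enclose μ.

```python
import math
import random
import statistics

def coverage(mu, sigma, n, repetitions, z=1.960, seed=1):
    """Fraction of simulated intervals [xbar - E, xbar + E] that contain mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    margin = z * sigma / math.sqrt(n)  # margin of error E (sigma treated as known)
    hits = 0
    for _ in range(repetitions):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / repetitions

# Hypothetical population parameters, for illustration only.
rate = coverage(mu=70.6, sigma=3.0, n=36, repetitions=2000)
print(f"proportion of intervals containing mu: {rate:.3f}")
```

The printed proportion should be close to 0.95, the confidence coefficient, and it settles nearer to 0.95 as the number of repetitions increases.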

The first step in constructing a confidence interval with any desired confidence coefficient is
to notice that, for a 95% confidence interval, the confidence coefficient of 95% is equal to the
total area under the sampling distribution (1.00), less .05 of the area, which is divided equally
between the two tails of the distribution. Thus, each tail has an area of .025. Second, consider
that the tabulated value of z that cuts off an area of .025 in the right tail of the standard
normal distribution is 1.96. The value z = 1.96 is also the distance, in standard
deviations of x̄, between x̄ and each endpoint of the 95% confidence interval. By assigning a
confidence coefficient other than .95 to a confidence interval, we change the area under the
sampling distribution between the endpoints of the interval, which in turn changes the tail area
associated with z. Thus, this z-value provides the key to constructing a confidence interval
with any desired confidence coefficient.

Large-sample (1 − α)100% confidence interval for a population mean, μ

$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$

where $z_{\alpha/2}$ is the z-value that locates an area of α/2 to its right, σ is the standard
deviation of the population from which the sample was selected, n is the sample size, and x̄
is the value of the sample mean.

Assumption: n ≥ 30
[When the value of σ is unknown, the sample standard deviation s may be used to
approximate σ in the formula for the confidence interval. The approximation is generally
quite satisfactory when n ≥ 30.]
Example 4 A number of seniors in a certain university were asked to report the
number of hours they spent on their studies during a certain week. Results show that the
average was 40 hours and the standard deviation was 10 hours. A new study aims to determine
whether students are now studying more than they used to. Suppose 50 students are
interviewed and the results yield the statistics x̄ = 41.5 hours and s = 9.2 hours.

Estimate μ, the mean number of hours spent on study, using a 99% confidence interval.
Interpret the interval in terms of the problem.

Solution The general form of a large-sample 99% confidence interval for μ is

$$\bar{x} \pm 2.58\left(\frac{\sigma}{\sqrt{n}}\right) \approx \bar{x} \pm 2.58\left(\frac{s}{\sqrt{n}}\right) = 41.5 \pm 2.58\left(\frac{9.2}{\sqrt{50}}\right) = 41.5 \pm 3.36$$

or (38.14, 44.86).

Therefore, we can be 99% confident that the interval (38.14, 44.86) encloses the true mean
weekly time spent on study. Since all the values in the interval fall above 38 hours and
below 45 hours, we conclude that students now tend to spend more than 6 hours and less than
7.5 hours per day on average (supposing that they do not study on Sunday).

Example 5 Refer to Example 4.


a. Using the sample information in Example 4, construct a 95% confidence interval for
mean weekly time spent on study of all students.
b. For a fixed sample size, how is the width of the confidence interval related to the
confidence coefficient?

Solution
a. The form of a large-sample 95% confidence interval for a population mean μ is

$$\bar{x} \pm 1.96\left(\frac{\sigma}{\sqrt{n}}\right) \approx \bar{x} \pm 1.96\left(\frac{s}{\sqrt{n}}\right) = 41.5 \pm 1.96\left(\frac{9.2}{\sqrt{50}}\right) = 41.5 \pm 2.55$$

or (38.95, 44.05).

b. The 99% confidence interval for μ was determined in Example 4 to be (38.14, 44.86),
while the 95% confidence interval obtained in this example is (38.95, 44.05). From this,
it is concluded that the 95% confidence interval is narrower than the 99% confidence
interval.

A narrow confidence interval enables more precise population estimates. The width of the
confidence interval is a function of two elements: the confidence level and the sampling
error. The greater the confidence level, the wider the confidence interval.

Relationship between width of confidence interval and confidence coefficient

For a fixed sample size, the width of the confidence interval increases as the confidence
coefficient increases (the relationship is increasing, though not a strict proportionality).
Therefore, a wider interval is needed to have greater confidence that it contains the
true parameter value.
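
The relationship can be seen by holding the sample fixed and varying only the confidence level. The sketch below reuses the statistics from Example 4 (s = 9.2, n = 50); the helper name `interval_width` is hypothetical, and the critical values are computed rather than rounded to table values, so the widths may differ slightly in the last digit from hand calculations.

```python
import math
from statistics import NormalDist

def interval_width(s, n, confidence):
    """Full width 2 * z_{alpha/2} * s / sqrt(n) of a large-sample interval."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return 2.0 * z * s / math.sqrt(n)

# Same sample (s = 9.2, n = 50) at three confidence levels.
widths = {level: interval_width(9.2, 50, level) for level in (0.90, 0.95, 0.99)}
for level, width in widths.items():
    print(f"{level:.0%} confidence: width = {width:.2f} hours")
```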
Example 6 Refer to Example 4.
a. Assume that the given values of the statistics x̄ and s were based on a sample of size n =
100 instead of a sample of size n = 50. Construct a 99% confidence interval for μ, the
population mean weekly time spent on study of all students in the university this year.
b. For a fixed confidence coefficient, how is the width of the confidence interval related to
the sample size?

Solution
a. Substitution of the values of the sample statistics into the general formula for a 99%
confidence interval for μ yields

$$\bar{x} \pm 2.58\left(\frac{\sigma}{\sqrt{n}}\right) \approx \bar{x} \pm 2.58\left(\frac{s}{\sqrt{n}}\right) = 41.5 \pm 2.58\left(\frac{9.2}{\sqrt{100}}\right) = 41.5 \pm 2.37$$

or (39.13, 43.87).

b. The 99% confidence interval based on a sample of size n = 100, constructed in part a is
(39.13, 43.87) and the 99% confidence interval based on a sample of size n = 50 is (38.14,
44.86). From this, it is concluded that the 99% confidence interval based on a sample of
size n = 100 is narrower than the latter.

Relationship between width of confidence interval and sample size

For a fixed confidence coefficient, the width of the confidence interval decreases as the
sample size increases. More precisely, the width is inversely proportional to the square
root of the sample size, so quadrupling n halves the width.
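
The effect of sample size can be tabulated directly from the margin-of-error formula. The sketch below uses s = 9.2 and the critical value 2.58 from Examples 4 and 6; the sample sizes 200 and 400 are hypothetical additions included only to show the trend.

```python
import math

def margin_of_error(s, n, z):
    """Half-width z * s / sqrt(n) of a large-sample confidence interval."""
    return z * s / math.sqrt(n)

Z_99 = 2.58  # critical value used in Examples 4 and 6
for n in (50, 100, 200, 400):  # 200 and 400 are illustrative, not from the examples
    print(f"n = {n:3d}: margin of error = {margin_of_error(9.2, n, Z_99):.2f}")
```

The rows for n = 50 and n = 100 reproduce the margins 3.36 and 2.37 found in Examples 4 and 6, and the margin for n = 400 is half the margin for n = 100.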

The interpretation of a 95% confidence interval is that 95% of the intervals constructed in this
manner will contain the population mean. Thus, any interval computed in this manner has a
95% confidence of containing the population mean. By changing the constant from 1.96 to
1.645, a 90% confidence interval can be obtained. It should be noted from the formula for an
interval estimate that a 90% confidence interval is narrower than a 95% confidence interval
and as such has a slightly smaller confidence of including the population mean. Lower levels
of confidence lead to even narrower intervals. In practice, a 95% confidence interval is
the most widely used.

In this section, interval estimation of the population mean μ based on a large sample was
introduced. Terms such as confidence interval and confidence coefficient were presented,
and the relationship of the confidence interval with the confidence coefficient and the
sample size was analyzed.
III. Estimation of a population mean: small sample case

Objectives

• To identify the properties of Student’s t-distribution
• To compute the confidence interval estimating the population mean with small
  samples using the t-distribution

In an actual computation, the value of the population standard deviation is rarely known.
Although this is not a real problem when the sample size is large, the situation is entirely
different when the sample size is small, because a small sample causes inaccuracies in the
confidence interval. William S. Gosset (1876–1937) of the Guinness brewery in Dublin, Ireland
ran into this problem when his experiments with hops and barley produced very few samples.
Just replacing σ with s did not produce accurate results when he tried to calculate a
confidence interval. He realized that he could not use a normal distribution for the
calculation; he found that the actual distribution depends on the sample size. This problem
led him to “discover” what is called the Student’s t-distribution. The name comes from the
fact that Gosset wrote under the pen name “Student.” Up until the mid-1970s, some
statisticians used the normal distribution approximation for large sample sizes and used the
Student’s t-distribution only for sample sizes of at most 30 observations.

In the previous section, the Central Limit Theorem was used to estimate the
population mean from a large sample. With a small sample the theorem cannot be relied
upon, so the following assumption about the population must hold instead.

Assumption required for estimating μ based on small samples (n < 30)

The population from which the sample is selected has an approximate normal distribution.

If this assumption is valid, then we may again use x̄ as a point estimate for μ, and
the general form of a small-sample confidence interval for μ is as shown in the next box.

Small-sample confidence interval for μ

If σ is known

$$\bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$

If σ is unknown

$$\bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$

where the distribution of t is based on (n − 1) degrees of freedom.


If the population standard deviation is unknown and the sample size n is small then
when the sample standard deviation s is substituted for σ the normal approximation is no
longer valid. Therefore, Student’s t-distribution with n−1 degrees of freedom is used.

Properties of the Student’s t-Distribution

• The graph for the Student’s t-distribution is similar to the standard normal curve, and
  at infinite degrees of freedom it is the normal distribution. This can be confirmed by
  reading the bottom line of the t table at infinite degrees of freedom for a familiar level
  of confidence, e.g. at column 0.05, 95% level of confidence, the t-value is 1.96.
• The mean for the Student’s t-distribution is zero and the distribution is symmetric
  about zero, similar to the standard normal distribution.
• The Student’s t-distribution has more probability in its tails than the standard normal
  distribution because the spread of the t-distribution is greater than the spread of the
  standard normal. Therefore the graph of the Student’s t-distribution will be thicker in the
  tails and shorter in the center than the graph of the standard normal distribution.
• The exact shape of the Student’s t-distribution depends on the degrees of freedom. As
  the degrees of freedom increase, the graph of the Student’s t-distribution becomes more
  like the graph of the standard normal distribution.
• The underlying population of individual observations is assumed to be normally
  distributed with unknown population mean μ and unknown population standard deviation σ.
  This assumption comes from the Central Limit Theorem because the individual observations
  in this case are the x̄s of the sampling distribution. The size of the underlying population
  is generally not relevant unless it is very small. If it is normal then the assumption is
  met and doesn’t need discussion.

Figure 3.1 Student’s t-distribution

As indicated by the figure, as the sample size n increases, Student’s t-distribution ever more
closely resembles the standard normal distribution. Although there is a different t-distribution
for every value of n, once the sample size is 30 or more it is typically acceptable to use the
standard normal distribution instead.
Example 7 A sample of size 15 drawn from a normally distributed population has sample
mean 35 and sample standard deviation 14. Construct a 95% confidence interval for the
population mean, and interpret its meaning.

Solution:

Since the population is normally distributed, the sample is small, and the population standard
deviation is unknown, the formula that applies is

$$\bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$

Confidence level 95% means that

α = 1 − 0.95 = 0.05

so α/2 = 0.025. Since the sample size is n = 15, there are n − 1 = 14 degrees of freedom and
$t_{0.025} = 2.145$. Thus

$$\bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) = 35 \pm 2.145\left(\frac{14}{\sqrt{15}}\right) = 35 \pm 7.8$$

Therefore one may be 95% confident that the true value of μ is contained in the interval
(35 − 7.8, 35 + 7.8) = (27.2, 42.8).
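
The same calculation can be scripted. Python's standard library has no t-distribution quantile function, so this minimal sketch takes the critical value as an input read from a t table (here 2.145 for 14 degrees of freedom, as in Example 7); the function name `small_sample_mean_ci` is hypothetical.

```python
import math

def small_sample_mean_ci(xbar, s, n, t_crit):
    """Interval xbar +/- t_crit * s / sqrt(n).

    t_crit is t_{alpha/2} with n - 1 degrees of freedom, read from a t table.
    """
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Example 7: n = 15, sample mean 35, s = 14, t_{0.025} = 2.145 with 14 df.
low, high = small_sample_mean_ci(xbar=35, s=14, n=15, t_crit=2.145)
print(f"({low:.1f}, {high:.1f})")  # (27.2, 42.8)
```

Compared with Example 2, which used the same sample mean and standard deviation with n = 49, the interval here is much wider, reflecting both the smaller sample and the heavier tails of the t-distribution.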

Example 8 A random sample of 12 students from a large university yields mean
GPA 2.71 with sample standard deviation 0.51. Construct a 90% confidence interval for the
mean GPA of all students at the university. Assume that the numerical population of GPAs
from which the sample is taken has a normal distribution.

Solution:

Since the population is normally distributed, the sample is small, and the population standard
deviation is unknown, the formula that applies is

$$\bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$

Confidence level 90% means that

α = 1 − 0.90 = 0.10

so α/2 = 0.05. Since the sample size is n = 12, there are n − 1 = 11 degrees of freedom and
$t_{0.05} = 1.796$. Thus

$$\bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) = 2.71 \pm 1.796\left(\frac{0.51}{\sqrt{12}}\right) = 2.71 \pm 0.26$$

Therefore, one may be 90% confident that the true average GPA of all students at the
university is contained in the interval (2.71−0.26, 2.71+0.26)=(2.45,2.97)

Example 9 The average earnings per share (EPS) for 10 industrial stocks randomly selected
from those listed on the Dow-Jones Industrial Average was found to be X̄ = 1.85 with a
standard deviation of s = 0.395. Calculate a 99% confidence interval for the average EPS of all
the industrials listed on the DJIA.

Solution

To help visualize the process of calculating a confidence interval we draw the appropriate
distribution for the problem. In this case this is the Student’s t because we do not know the
population standard deviation and the sample is small, less than 30.
To find the appropriate t-value requires two pieces of information, the level of confidence
desired and the degrees of freedom. The question asked for a 99% confidence level. On the
graph this is shown where (1-α) , the level of confidence , is in the unshaded area. The tails,
thus, have .005 probability each, α/2. The degrees of freedom for this type of problem is n-1=
9. From the Student’s t table, at the row marked 9 and column marked .005, is the number of
standard deviations to capture 99% of the probability, 3.2498. These are then placed on the
graph remembering that the Student’s t is symmetrical and so the t-value is both plus or
minus on each side of the mean.

Inserting these values into the formula gives the result. These values can be placed on the
graph to see the relationship between the distribution of the sample means, X̄’s, and the
Student’s t-distribution.

$$\bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) = 1.85 \pm 3.2498\left(\frac{0.395}{\sqrt{10}}\right) = 1.85 \pm 0.406$$

$$1.444 \le \mu \le 2.256$$

We state the formal conclusion as :

With a 99% confidence level, the average EPS of all the industrials listed on the DJIA is from
1.44 to 2.26.
IV. Estimation of a population proportion

Objectives

• Compute the confidence interval to estimate a population proportion
• Interpret the confidence interval in context

This section focuses on the method for estimating a population proportion. The
procedure to find the confidence interval for a population proportion is similar to that for the
population mean; the formulas are different but conceptually identical, and they are based
upon the same mathematical foundation given by the Central Limit Theorem. For a problem to
fall under this section, the underlying distribution must have a binary random variable and
therefore be a binomial distribution. (There is no mention of a mean or average.) If X is a
binomial random variable, then X ~ B(n, p) where n is the number of trials and p is the
probability of a success. To form a sample proportion, take X, the random variable for the
number of successes, and divide it by n, the number of trials (or the sample size). The random
variable P′ (read “P prime”) is the sample proportion,

$$P' = \frac{X}{n}$$

(Sometimes the random variable is denoted as P̂, read “P hat”.)

p′ = the estimated proportion of successes or sample proportion of successes (p′ is
a point estimate for p, the true population proportion; q = 1 − p is the probability of a
failure in any one trial)
x = the number of successes in the sample
n = the size of the sample

The formula for the confidence interval for a population proportion follows the same
format as that for an estimate of a population mean and is shown below.

Large-sample (1 − α)100% confidence interval for a population proportion, p

$$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}} \approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

where p̂ is the sample proportion of observations with the characteristic of interest, and
q̂ = 1 − p̂.

Remember that as p moves further from 0.5 the binomial distribution becomes less
symmetrical. Because we are approximating the binomial with the symmetrical normal
distribution, the further from symmetrical the binomial becomes, the less confidence we
have in the estimate.
This conclusion can be demonstrated through the following analysis. Proportions are
based upon the binomial probability distribution. The possible outcomes are binary, either
“success” or “failure”. This gives rise to a proportion, meaning the percentage of the
outcomes that are “successes”. It was shown that the binomial distribution could be fully
understood if we knew only the probability of a success in any one trial, called p. The mean
and the standard deviation of the binomial were found to be:

$$\mu = np$$

$$\sigma = \sqrt{npq}$$

It was also shown that the binomial could be approximated by the normal distribution if BOTH
np and nq were greater than 5. Unfortunately, there is no correction factor for cases where the
sample size is small, so np′ and nq′ must always be greater than 5 to develop an interval
estimate for p.

Example 9 According to a 2010 report from the American Council on Education, females
make up 57% of the college population in the United States. Students in a statistics class at
Tallahassee Community College want to determine the proportion of female students at TCC.
They select a random sample of 135 TCC students and find that 72 are female, which is a
sample proportion of 72 / 135 ≈ 0.533. So 53.3% of the students in the sample are female.
What can they conclude about the proportion of females at the college? How confident can
they be in their estimate?

Solution:
Step 1. Find a confidence interval.
Note that a confidence interval comes from a normal model of the sampling distribution and
there are two conditions for using a normal model for sample proportions:
• The sample must be random.
• The expected number of successes in the sample, np, and the expected number of
  failures, n(1 − p), are both greater than or equal to 10. In symbols, this is np ≥ 10
  and n(1 − p) ≥ 10. Recall that success doesn’t mean good and failure doesn’t mean bad.
  A success is just what we are counting.
Advanced theory tells us that if the actual number of successes and failures in the sample are
greater than or equal to 10, then a normal model is still a good fit.
This sample contains 72 successes (female students) and 63 failures (male students). Both are
greater than 10. We therefore use the normal model for the sampling distribution.
Step 2. Find the margin of error:
A sample proportion is only an estimate of the population proportion; the two are generally
not equal, so there is some error due to random chance. The standard deviation of the sample
proportions is used to describe the amount of error that is expected in random samples. This
is called the standard error. When using a normal model for the sampling distribution, 95% of
sample proportions estimate the population proportion within approximately 2 standard errors.
So the margin of error is the following:

$$2\sqrt{\frac{p(1-p)}{n}}$$

Now let’s calculate the margin of error for the TCC estimate of 53.3%. Since the population
proportion p is unknown, the margin of error cannot be calculated from this formula directly.
The solution to this problem is to estimate the standard error using the sample proportion in
place of p. This is called the estimated standard error, and the formula is:

$$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

For this example, the estimated standard error is

$$\sqrt{\frac{0.533(1-0.533)}{135}} \approx 0.043$$

So the margin of error for the 95% confidence interval is:

$$2\sqrt{\frac{0.533(1-0.533)}{135}} \approx 2(0.043) = 0.086$$

Step 3. Find the confidence interval
We can interpret the margin of error by saying we are 95% confident that the proportion of
all students at TCC who are female is within 0.086 of our sample proportion of 0.533. We
can then write the interval in the following form:

$$\hat{p} \pm \text{margin of error} = 0.533 \pm 0.086$$

When we add and subtract the margin of error from the sample proportion, the confidence
interval is 0.447 to 0.619.
Conclusion:
We are 95% confident that the proportion of all TCC students who are female is between
0.447 and 0.619.
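
The three steps of this example can be sketched in code. The function below is illustrative and its name is hypothetical: it checks the condition of at least 10 successes and 10 failures, computes the estimated standard error, and uses the multiplier 2 from Step 2 (a value such as 1.96 could be passed instead for the z-based formula in the box above).

```python
import math

def proportion_ci(successes, n, multiplier=2.0):
    """Interval p_hat +/- multiplier * sqrt(p_hat * (1 - p_hat) / n)."""
    failures = n - successes
    if successes < 10 or failures < 10:
        raise ValueError("normal model not justified: need at least 10 successes and 10 failures")
    p_hat = successes / n
    std_error = math.sqrt(p_hat * (1.0 - p_hat) / n)  # estimated standard error
    margin = multiplier * std_error                   # margin of error
    return p_hat - margin, p_hat + margin

# TCC example: 72 female students in a random sample of 135.
low, high = proportion_ci(successes=72, n=135)
print(f"({low:.3f}, {high:.3f})")  # (0.447, 0.619)
```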
V. Estimation of the difference between two population means: Large and Small Independent samples

In Section II, we learned how to estimate the parameter μ based on a large sample from a
single population. We now proceed to a technique for using the information in two samples to
estimate the difference between two population means.

Objectives

• To construct a confidence interval estimating the difference in the means of two
  distinct populations using large and small independent samples

The figure below illustrates the conceptual framework of investigation in this section. Each
population has a mean and a standard deviation. We arbitrarily label one population as
Population 1 and the other as Population 2, and subscript the parameters with the
numbers 1 and 2 to tell them apart. We draw a random sample from Population 1 and label
the sample statistics it yields with the subscript 1. Without reference to the first sample we
draw a sample from Population 2 and label its sample statistics with the subscript 2.

Figure 5.1 Independent Sampling from Two Populations

Definition 5
Independence. Samples from two distinct populations are independent if each one is drawn
without reference to the other, and has no connection with the other.

The goal is to use the information in the samples to estimate the difference μ1−μ2 in the
means of the two populations and to make statistically valid inferences about it.

Since the mean x̄1 of the sample drawn from Population 1 is a good estimator of μ1 and the
mean x̄2 of the sample drawn from Population 2 is a good estimator of μ2, a reasonable
point estimate of the difference μ1 − μ2 is x̄1 − x̄2. In order to widen this point estimate
into a confidence interval, we first suppose that both samples are large, that is, that
both n1 ≥ 30 and n2 ≥ 30. If so, then the following formula for a confidence interval for
μ1 − μ2 is valid. The symbols $s_1^2$ and $s_2^2$ denote the squares of s1 and s2. (In the
relatively rare case that both population standard deviations σ1 and σ2 are known they would
be used instead of the sample standard deviations.)

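Large-sample (1 − α)100% confidence interval for (μ1 − μ2)

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma_{(\bar{x}_1 - \bar{x}_2)} = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

(Note: We have used the sample variances $s_1^2$ and $s_2^2$ as approximations to the corresponding
population parameters.)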

The assumptions upon which the above procedure is based are the following:

Assumptions required for large-sample estimation of (μ1 − μ2)

1. The two random samples are selected in an independent manner from the target populations.
That is, the choice of elements in one sample does not affect, and is not affected by, the choice
of elements in the other sample.
2. The sample sizes n1 and n2 are sufficiently large (at least 30).

Example 10 To compare customer satisfaction levels of two competing cable television
companies, 174 customers of Company 1 and 355 customers of Company 2 were randomly
selected and were asked to rate their cable companies on a five-point scale, with 1 being least
satisfied and 5 most satisfied. The survey results are summarized in the following table:

                             Company 1     Company 2
Sample size                  n1 = 174      n2 = 355
Sample mean                  x̄1 = 3.51     x̄2 = 3.24
Sample standard deviation    s1 = 0.51     s2 = 0.52

Construct a point estimate and a 99% confidence interval for μ1−μ2, the difference in average
satisfaction levels of customers of the two companies as measured on this five-point scale.

Solution:

The point estimate of μ1−μ2 is

$$\bar{x}_1 - \bar{x}_2 = 3.51 - 3.24 = 0.27$$

In words, we estimate that the average customer satisfaction level for
Company 1 is 0.27 points higher on this five-point scale than it is for Company 2.

The 99% confidence level means that α = 1 − 0.99 = 0.01, so that $z_{\alpha/2} = z_{0.005} = 2.576$. Thus

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 0.27 \pm 2.576\sqrt{\frac{0.51^2}{174} + \frac{0.52^2}{355}} = 0.27 \pm 0.12$$

We are 99% confident that the difference in the population means lies in the
interval [0.15,0.39], in the sense that in repeated sampling 99% of all intervals constructed
from the sample data in this manner will contain μ1−μ2. In the context of the problem we say
we are 99% confident that the average level of customer satisfaction for Company 1 is
between 0.15 and 0.39 points higher, on this five-point scale, than that for Company 2.
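
The arithmetic of Example 10 can be checked with a short script. This is a minimal sketch, not part of the original notes: the function name `large_sample_mean_diff_ci` is illustrative, and the critical value is taken from Python's `statistics.NormalDist` instead of a printed z-table.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_mean_diff_ci(x1, s1, n1, x2, s2, n2, confidence):
    """Large-sample z interval for mu1 - mu2 (assumes n1 >= 30 and n2 >= 30)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)      # z_{alpha/2}
    point = x1 - x2                                   # point estimate
    margin = z * sqrt(s1**2 / n1 + s2**2 / n2)        # half-width of the interval
    return point, margin, (point - margin, point + margin)


# Summary statistics from Example 10 (cable company satisfaction survey)
point, margin, (low, high) = large_sample_mean_diff_ci(
    3.51, 0.51, 174, 3.24, 0.52, 355, confidence=0.99
)
print(f"{point:.2f} +/- {margin:.2f} -> [{low:.2f}, {high:.2f}]")
```

Run as written, this should reproduce the interval [0.15, 0.39] obtained by hand above.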

When estimating the difference between two population means, based on small samples from
each population, we must make specific assumptions about the relative frequency
distributions of the two populations, as indicated in the box.

Assumptions required for small-sample estimation of (μ1 − μ2)

1. Both of the populations from which the samples are selected have relative frequency distributions
that are approximately normal.
2. The variances $\sigma_1^2$ and $\sigma_2^2$ of the two populations are equal.
3. The random samples are selected in an independent manner from the two populations.

When these assumptions are satisfied, we may use the procedure specified in the next box to
construct a confidence interval for (μ1 − μ2), based on small samples (n1 < 30 and n2 < 30) from
the respective populations.

Small-sample (1 − α)100% confidence interval for (μ1 − μ2)

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

and the value of $t_{\alpha/2}$ is based on (n1 + n2 − 2) degrees of freedom.

Example 11 A software company markets a new computer game with two experimental
packaging designs. Design 1 is sent to 11 stores; their average sales the first month is 52 units
with sample standard deviation 12 units. Design 2 is sent to 6 stores; their average sales the
first month is 46 units with sample standard deviation 10 units. Construct a point estimate and
a 95% confidence interval for the difference in average monthly sales between the two
package designs.

Solution:

The point estimate of μ1−μ2 is

$$\bar{x}_1 - \bar{x}_2 = 52 - 46 = 6$$

In words, we estimate that the average monthly sales for Design 1 is 6 units more per month
than the average monthly sales for Design 2.

To apply the formula for the confidence interval we must find $t_{\alpha/2}$.

The 95% confidence level means that α = 1 − 0.95 = 0.05, so that $t_{\alpha/2} = t_{0.025}$. In the row of the
t-distribution table with the heading df = 11 + 6 − 2 = 15 we read that $t_{0.025} = 2.131$. From the
formula for the pooled sample variance we compute

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(10)(12^2) + (5)(10^2)}{15} = 129.3$$

Thus

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 6 \pm (2.131)\sqrt{129.3\left(\frac{1}{11} + \frac{1}{6}\right)} \approx 6 \pm 12.3$$

We are 95% confident that the difference in the population means lies in the
interval [−6.3,18.3], in the sense that in repeated sampling 95% of all intervals constructed
from the sample data in this manner will contain μ1−μ2. Because the interval contains both
positive and negative values the statement in the context of the problem is that we
are 95% confident that the average monthly sales for Design 1 is between 18.3 units higher
and 6.3 units lower than the average monthly sales for Design 2.
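
The pooled-variance calculation of Example 11 can be sketched in a few lines. This is an illustrative sketch only: the function name `pooled_t_ci` is an assumption, and the critical value 2.131 is passed in as read from the t-table, because the Python standard library has no t-distribution quantile function.

```python
from math import sqrt


def pooled_t_ci(x1, s1, n1, x2, s2, n2, t_crit):
    """Small-sample pooled-variance t interval for mu1 - mu2.

    t_crit is t_{alpha/2} with n1 + n2 - 2 degrees of freedom, read from a table.
    Assumes normal populations with equal variances and independent samples.
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled sample variance
    point = x1 - x2
    margin = t_crit * sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, point, margin


# Example 11: packaging designs, t_{0.025} with 15 df = 2.131 (table value)
sp2, point, margin = pooled_t_ci(52, 12, 11, 46, 10, 6, t_crit=2.131)
print(f"pooled variance = {sp2:.1f}")
print(f"interval = [{point - margin:.1f}, {point + margin:.1f}]")
```

The printed interval should agree with the hand computation 6 ± 12.3.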
VI. Estimation of the difference between two population means: Matched pairs

The procedure for estimating the difference between two population means presented in
Section 5 was based on the assumption that the samples were randomly and independently
selected from the target populations. Sometimes we can obtain more information about the
difference between population means (μ1 − μ2) by selecting paired observations.

Objectives
• To compute the confidence interval estimating the difference in the means of two
distinct populations using paired samples
• To test hypotheses using the critical value approach

Assumptions required for estimation of (μ1 − μ2): Matched pairs


1. The sample paired observations are randomly selected from the target population of paired
observations.
2. The population of paired differences is normally distributed.

Small-sample (1 − α)100% confidence interval for μd = (μ1 − μ2)

Let d1, d2, ..., dn represent the differences between the pair-wise observations in a random sample
of n matched pairs. Then the small-sample confidence interval for μd = (μ1 − μ2) is

$$\bar{d} \pm t_{\alpha/2}\left(\frac{s_d}{\sqrt{n}}\right)$$

where $\bar{d}$ is the mean of the n sample differences, $s_d$ is their standard deviation, and $t_{\alpha/2}$ is based
on (n − 1) degrees of freedom.

Note that the population of differences must be normally distributed.

Testing hypotheses concerning the difference of two population means using paired
difference samples is done precisely as it is done for independent samples, although now the
null and alternative hypotheses are expressed in terms of μd instead of μ1−μ2. Thus the null
hypothesis will always be written

H0:μd=D0

The three forms of the alternative hypothesis, with the terminology for each case, are:

Table 6.1 Three forms of hypothesis

Form of Ha       Terminology
Ha: μd < D0      Left-tailed
Ha: μd > D0      Right-tailed
Ha: μd ≠ D0      Two-tailed

The same conditions on the population of differences that were required for
constructing a confidence interval for the difference of the means must also be met when
hypotheses are tested. Here is the standardized test statistic that is used in the test.

STANDARDIZED TEST STATISTIC FOR HYPOTHESIS TESTS CONCERNING THE
DIFFERENCE BETWEEN TWO POPULATION MEANS: PAIRED DIFFERENCE
SAMPLES

$$T = \frac{\bar{d} - D_0}{s_d/\sqrt{n}}$$

where there are n pairs, $\bar{d}$ is the mean and $s_d$ is the standard deviation of their differences.
The test statistic has Student’s t-distribution with df = n − 1 degrees of freedom.

The population of differences must be normally distributed.

Example 12 Suppose that n = 10 pairs of achievement test scores are given in Table 7.
Find a 95% confidence interval for the difference in mean achievement, μd = (μ1 − μ2).

Table 7 Reading achievement test scores for Example 12

Student pair        1    2    3    4    5    6    7    8    9   10
Method 1 score     78   63   72   89   91   49   68   76   85   55
Method 2 score     71   44   61   84   74   51   55   60   77   39
Pair difference     7   19   11    5   17   -2   13   16    8   16

Solution The differences between matched pairs of reading achievement test scores are
computed as

d = (method 1 score - method 2 score)


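The mean, variance, and standard deviation of the differences are

$$\bar{d} = \frac{\sum d}{n} = \frac{110}{10} = 11.0$$

$$s_d^2 = \frac{\sum d^2 - \dfrac{\left(\sum d\right)^2}{n}}{n - 1} = \frac{1{,}594 - \dfrac{(110)^2}{10}}{9} = \frac{1{,}594 - 1{,}210}{9} = 42.6667$$

$$s_d = \sqrt{42.67} = 6.53$$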

The value of $t_{.025}$, based on (n − 1) = 9 degrees of freedom, is given in Table 2 of Appendix C
as $t_{.025} = 2.262$. Substituting these values into the formula for the confidence interval, we
obtain

$$\bar{d} \pm t_{.025}\left(\frac{s_d}{\sqrt{n}}\right) = 11.0 \pm 2.262\left(\frac{6.53}{\sqrt{10}}\right) = 11.0 \pm 4.7$$

or (6.3, 15.7).

We estimate, with 95% confidence, that the difference between mean reading achievement
test scores for methods 1 and 2 falls within the interval from 6.3 to 15.7. Since all the values
within the interval are positive, method 1 seems to produce a mean achievement test score
that is substantially higher than the mean score for method 2.
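
The matched-pairs computation of Example 12 can be reproduced from the raw scores in Table 7. The sketch below is illustrative: the variable names are assumptions, and the critical value 2.262 is entered as the table value quoted in the solution.

```python
from math import sqrt
from statistics import mean, stdev

# Scores from Table 7 (one entry per student pair)
method1 = [78, 63, 72, 89, 91, 49, 68, 76, 85, 55]
method2 = [71, 44, 61, 84, 74, 51, 55, 60, 77, 39]
T_CRIT = 2.262  # t_{0.025} with n - 1 = 9 degrees of freedom (table value)

# Pair differences, computed as method 1 score minus method 2 score
diffs = [a - b for a, b in zip(method1, method2)]

d_bar = mean(diffs)                 # mean of the differences
s_d = stdev(diffs)                  # sample standard deviation (divides by n - 1)
margin = T_CRIT * s_d / sqrt(len(diffs))
low, high = d_bar - margin, d_bar + margin

print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}")
print(f"95% CI for mu_d: ({low:.1f}, {high:.1f})")
```

Working from the raw data in this way also guards against transcription errors in the intermediate sums.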

Example 13: Using the Critical Value Approach

Using the data of Table 6.1, test the hypothesis that mean fuel economy for Type 1 gasoline
is greater than that for Type 2 gasoline against the null hypothesis that the two formulations
of gasoline yield the same mean fuel economy. Test at the 5% level of significance using the
critical value approach, given $\bar{d} = 0.14$, $s_d = 0.16$, and n = 9.

Solution:

The only part of the table that we use is the third column, the differences.

• Step 1. Since the differences were computed in the order Type 1 mpg − Type 2 mpg,
better fuel economy with Type 1 fuel corresponds to μd = μ1 − μ2 > 0. Thus the test is
H0: μd = 0
vs.
Ha: μd > 0 at α = 0.05

• Step 2. Since the sampling is in pairs the test statistic is

$$T = \frac{\bar{d} - D_0}{s_d/\sqrt{n}}$$

• Step 3. Inserting the given values and D0 = 0 into the formula for the test statistic gives

$$T = \frac{\bar{d} - D_0}{s_d/\sqrt{n}} = \frac{0.14}{0.16/\sqrt{9}} \approx 2.6$$

• Step 4. Since the symbol in Ha is “>” this is a right-tailed test, so there is a single
critical value, $t_\alpha = t_{0.05}$ with n − 1 = 8 degrees of freedom, which from the row labeled df = 8
is read off as 1.860. The rejection region is [1.860, ∞).
• Step 5. Since T exceeds 1.860, the test statistic falls in the rejection region. The decision
is to reject H0. In the context of the problem our conclusion is:
Conclusion: The data provide sufficient evidence, at the 5% level of significance, to conclude
that the mean fuel economy provided by Type 1 gasoline is greater than that for
Type 2 gasoline.
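
The five steps of the critical value approach can be sketched as follows. This is a hedged illustration: the function name is an assumption, the critical value 1.860 is the table value quoted in Step 4, and n = 9 is taken as the number of pairs implied by the 8 degrees of freedom used there.

```python
from math import sqrt


def paired_t_statistic(d_bar, s_d, n, d0=0.0):
    """Standardized test statistic T = (d_bar - D0) / (s_d / sqrt(n))."""
    return (d_bar - d0) / (s_d / sqrt(n))


# Summary values from Example 13; n = 9 is assumed from df = n - 1 = 8
D_BAR, S_D, N = 0.14, 0.16, 9
T_CRIT = 1.860  # t_{0.05} with 8 degrees of freedom (table value)

t_stat = paired_t_statistic(D_BAR, S_D, N)

# Right-tailed test: reject H0 when T falls in [t_crit, infinity)
reject_h0 = t_stat >= T_CRIT

print(f"T = {t_stat:.3f}, critical value = {T_CRIT}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

With these rounded summary values the statistic comes out near 2.6, well inside the rejection region, so the decision matches the conclusion above.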

VII. Estimation of the difference between two population proportions

This section extends the method of Section 4 to the case in which we want to estimate the
difference between two population proportions. Suppose we wish to compare the proportions
of two populations that have a specific characteristic, such as the proportion of men who are
left-handed compared to the proportion of women who are left-handed.

Objectives
• To construct a confidence interval estimating the difference in the proportions of two
distinct populations that have a particular characteristic of interest

The figure below illustrates the conceptual framework of our investigation. Each population
is divided into two groups, the group of elements that have the characteristic of interest (for
example, being left-handed) and the group of elements that do not. We arbitrarily label one
population as Population 1 and the other as Population 2, and subscript the proportion of each
population that possesses the characteristic with the number 1 or 2 to tell them apart. We
draw a random sample from Population 1 and label the sample statistic it yields with the
subscript 1. Without reference to the first sample we draw a sample from Population 2 and
label its sample statistic with the subscript 2.

Figure 7.1 Independent Sampling from Two Populations In Order to Compare Proportions
The goal is to use the information in the samples to estimate the difference p1−p2 in the two
population proportions and to make statistically valid inferences about it.

To judge the reliability of the point estimate $(\hat{p}_1 - \hat{p}_2)$, we need to know the characteristics of
its performance in repeated independent sampling from two populations. This information is
provided by the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$, shown in the next box.

Sampling distribution of $(\hat{p}_1 - \hat{p}_2)$

For sufficiently large sample sizes, n1 and n2, the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$, based on
independent random samples from two populations, is approximately normal with

Mean:
$$\mu_{(\hat{p}_1 - \hat{p}_2)} = (p_1 - p_2)$$

and

Standard deviation:
$$\sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$

where q1 = 1 − p1 and q2 = 1 − p2.

It follows that a large-sample confidence interval for (p1 − p2) may be obtained as shown in
the box.

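Large-sample (1 − α)100% confidence interval for (p1 − p2)

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\,\sigma_{(\hat{p}_1 - \hat{p}_2)} \approx (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

where $\hat{p}_1$ and $\hat{p}_2$ are the sample proportions of observations with the characteristic of
interest, $\hat{q}_1 = 1 - \hat{p}_1$, and $\hat{q}_2 = 1 - \hat{p}_2$.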

Assumption: The samples are sufficiently large so that the approximation is valid. As a general
rule of thumb, we will require that the intervals

$$\hat{p}_1 \pm 2\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1}} \quad\text{and}\quad \hat{p}_2 \pm 2\sqrt{\frac{\hat{p}_2\hat{q}_2}{n_2}}$$

do not contain 0 or 1.

Example 14 The department of code enforcement of a county government issues permits to
general contractors to work on residential projects. For each permit issued, the department
inspects the result of the project and gives a “pass” or “fail” rating. A failed project must be
re-inspected until it receives a pass rating. The department had been frustrated by the high
cost of re-inspection and decided to publish the inspection records of all contractors on the
web. It was hoped that public access to the records would lower the re-inspection rate. A year
after the web access was made public, two samples of records were randomly selected. One
sample was selected from the pool of records before the web publication and one after. The
proportion of projects that passed on the first inspection was noted for each sample. The
results are summarized below. Construct a point estimate and a 90% confidence interval for
the difference in the passing rate on first inspection between the two time periods.

                     No public web access    Public web access
Sample size          n1 = 500                n2 = 100
Sample proportion    p̂1 = 0.67               p̂2 = 0.80

Solution

The point estimate of p1−p2 is

$$\hat{p}_1 - \hat{p}_2 = 0.67 - 0.80 = -0.13$$

Because the “No public web access” population was labeled as Population 1 and the “Public
web access” population was labeled as Population 2, in words this means that we estimate
that the proportion of projects that passed on the first inspection increased by 13 percentage
points after records were posted on the web.

The sample sizes are sufficiently large for constructing a confidence interval since for sample 1:

$$3\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1}} = 3\sqrt{\frac{(0.67)(0.33)}{500}} = 0.06$$

so that

$$\left[\hat{p}_1 - 3\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1}},\ \hat{p}_1 + 3\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1}}\right] = [0.67 - 0.06,\ 0.67 + 0.06] = [0.61,\ 0.73] \subset [0, 1]$$

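And for sample 2:

$$3\sqrt{\frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = 3\sqrt{\frac{(0.8)(0.2)}{100}} = 0.12$$

so that

$$\left[\hat{p}_2 - 3\sqrt{\frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}},\ \hat{p}_2 + 3\sqrt{\frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}\right] = [0.8 - 0.12,\ 0.8 + 0.12] = [0.68,\ 0.92] \subset [0, 1]$$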

To apply the formula for the confidence interval, we first observe that the 90% confidence
level means that α = 1 − 0.90 = 0.10, so that $z_{\alpha/2} = z_{0.05} = 1.645$. Thus the desired confidence
interval is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = -0.13 \pm 1.645\sqrt{\frac{(0.67)(0.33)}{500} + \frac{(0.8)(0.2)}{100}} = -0.13 \pm 0.07$$

The 90% confidence interval is [−0.20,−0.06]. We are 90% confident that the difference in
the population proportions lies in the interval [−0.20,−0.06], in the sense that in repeated
sampling 90% of all intervals constructed from the sample data in this manner will
contain p1−p2. Taking into account the labeling of the two populations, this means that we
are 90% confident that the proportion of projects that pass on the first inspection is between 6
and 20 percentage points higher after public access to the records than before.
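
Example 14 combines a sample-size check with the interval itself, and both can be sketched together. The function names below are illustrative assumptions, and the critical value comes from `statistics.NormalDist` in place of the z-table; the three-standard-deviation check mirrors the one used in the solution.

```python
from math import sqrt
from statistics import NormalDist


def three_sd_interval(p_hat, n):
    """Interval p_hat +/- 3*sqrt(p_hat*(1 - p_hat)/n) used to check sample size."""
    half = 3 * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def two_proportion_ci(p1_hat, n1, p2_hat, n2, confidence):
    """Large-sample z interval for p1 - p2 from independent samples."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    point = p1_hat - p2_hat
    margin = z * sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return point, margin


# Example 14: pass rates before (sample 1) and after (sample 2) web publication
for p_hat, n in [(0.67, 500), (0.80, 100)]:
    lo, hi = three_sd_interval(p_hat, n)
    print(f"check interval [{lo:.2f}, {hi:.2f}] inside [0, 1]: {0 <= lo and hi <= 1}")

point, margin = two_proportion_ci(0.67, 500, 0.80, 100, confidence=0.90)
low, high = point - margin, point + margin
print(f"{point:.2f} +/- {margin:.2f} -> [{low:.2f}, {high:.2f}]")
```

Both check intervals lie inside [0, 1], and the resulting interval should match [−0.20, −0.06].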

VIII. Choosing the sample size

Objective

• To compute the sample size required to estimate population parameters for a population
mean, a population proportion, and two independent samples.

Calculating the right sample size is crucial to gaining accurate information. In fact, in a
survey, the confidence level and margin of error depend almost solely on the number of
responses received.

The first thing to understand is the difference between confidence levels and margins of
error. Simply put, a confidence level describes how sure you can be that your results are
accurate, whereas the margin of error shows the range the survey results would fall between
if our confidence level held true.

Sample Size for Population Mean µ

The margin of error (MOE) for the 95% confidence interval (CI) for µ is

$$\text{MOE} = E \approx \frac{2s}{\sqrt{n}}$$

where s is the standard deviation of the sample, and the 95% CI is

$$\bar{X} \pm \text{MOE}$$

Example 15 A manufacturer of cereal boxes wants to know the mean weight of the boxes it
produces. Previous studies have shown the population standard deviation of the weights of
the boxes to be 0.1 ounces. They would like to estimate µ with 95% confidence and have the
MOE no greater than 0.012.
Solution

The 95% CI for µ depends on the MOE, which depends on s and the sample size n:

$$\text{MOE} = E \approx \frac{2s}{\sqrt{n}}$$

The sample standard deviation s is an estimate for the population standard deviation σ. If we
happen to know σ, we will use it instead of s for our MOE:

$$\text{MOE} = E \approx \frac{2\sigma}{\sqrt{n}}$$

This client has asked that the MOE be no greater than 0.012, and we know σ = 0.1:

$$0.012 = \frac{2(0.1)}{\sqrt{n}}$$

Solving for n gives:

$$(0.012)^2 = \left(\frac{2(0.1)}{\sqrt{n}}\right)^2$$

$$0.000144 = \frac{0.04}{n}$$

$$n = 277.78 \approx 278$$

To have a 95% confidence interval for µ with a MOE of 0.012, this company will have to
sample 278 boxes.

The sample size needed to be 95% confident that $\bar{x}$, the sample mean, will be within MOE of
the population mean, µ, is

$$n \ge \left(\frac{2\sigma}{\text{MOE}}\right)^2$$
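
Solving the margin-of-error equation for n and rounding up can be wrapped in a small helper. This is a minimal sketch: the function name and the `multiplier` parameter are illustrative, with the default of 2 being the approximate 95% multiplier used in this section. The intermediate `round` only guards against floating-point noise before taking the ceiling.

```python
from math import ceil


def sample_size_for_mean(sigma, moe, multiplier=2.0):
    """Smallest n with multiplier * sigma / sqrt(n) <= moe.

    multiplier = 2 is the approximate 95% value used in these notes.
    """
    n_exact = (multiplier * sigma / moe) ** 2
    return ceil(round(n_exact, 6))   # always round up to a whole observation


# Example 15: sigma = 0.1 ounces, MOE no greater than 0.012
print(sample_size_for_mean(sigma=0.1, moe=0.012))
```

Rounding up instead of to the nearest integer ensures that the achieved margin of error does not exceed the requested one.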

Sample Size for Population Proportion p

The margin of error (MOE) for the 95% confidence interval (CI) for a population proportion p
is

$$\text{MOE} = E \approx 2\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

where $\hat{p}$ is the sample proportion and the 95% CI is

$$\hat{p} \pm \text{MOE}$$

Example 16 A manufacturer of coats wants to know the proportion of coats, p, it is
producing with defective zippers. They would like to estimate p with 95% confidence and
have the MOE no greater than 0.04.

Solution

At 95% confidence, the MOE depends on $\hat{p}$ and the sample size n:

$$\text{MOE} = 2\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

It turns out that the margin of error is largest when $\hat{p}$ = 0.5. So, since $\hat{p}$ is unknown before
we collect our data, we’ll plug in $\hat{p}$ = 0.5, which gives us the widest possible MOE (i.e., we’re
assuming the worst-case scenario for estimating p).
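
One way to see why 0.5 is the worst case, offered here as a supporting sketch that is not part of the original solution, is to complete the square in the product under the square root:

```latex
\hat{p}\,(1 - \hat{p}) \;=\; \frac{1}{4} - \left(\hat{p} - \frac{1}{2}\right)^{2} \;\le\; \frac{1}{4},
\qquad \text{with equality only when } \hat{p} = \frac{1}{2}.
```

Because the MOE increases with $\hat{p}(1 - \hat{p})$ for fixed n, it is largest at $\hat{p} = 0.5$, so a sample size chosen for that value is large enough for any other value of $\hat{p}$.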

After plugging in $\hat{p} = 0.5$ and the requested MOE from the client, this gives us

$$0.04 = 2\sqrt{\frac{0.5(1 - 0.5)}{n}}$$

$$(0.04)^2 = \left(2\sqrt{\frac{0.5(0.5)}{n}}\right)^2$$

$$0.0016 = 4 \cdot \frac{0.5(0.5)}{n} = \frac{1}{n}$$

$$n = \frac{1}{0.0016} = 625$$

To have a 95% confidence interval for p with a MOE of 0.04, this company will have to
sample 625 coats.

The sample size needed to be 95% confident that $\hat{p}$, the sample proportion, will be within MOE
of the population proportion, p, is

$$n \ge \frac{1}{\text{MOE}^2}$$
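
The worst-case sample size for a proportion can be sketched the same way. The function and parameter names are illustrative assumptions; `p_guess=0.5` is the worst-case value used in Example 16, and the multiplier of 2 is the approximate 95% value used in this section.

```python
from math import ceil


def sample_size_for_proportion(moe, p_guess=0.5, multiplier=2.0):
    """Smallest n with multiplier * sqrt(p(1 - p)/n) <= moe.

    With p_guess = 0.5 and multiplier = 2 this reduces to n >= 1 / moe**2.
    """
    n_exact = multiplier**2 * p_guess * (1 - p_guess) / moe**2
    # round() removes floating-point noise so exact integers are not bumped up
    return ceil(round(n_exact, 6))


# Example 16: MOE no greater than 0.04, no prior information about p
print(sample_size_for_proportion(moe=0.04))
```

If a credible prior estimate of p is available, passing it as `p_guess` gives a smaller required sample than the worst case.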

Sample Sizes for Two Independent Samples

In studies where the plan is to estimate the difference in means between two independent
populations, the formula for determining the sample sizes required in each comparison group is
given below:

$$n = 2\left(\frac{Z\sigma}{E}\right)^2$$

where n is the sample size required in each group, Z is the value from the standard normal
distribution reflecting the confidence level that will be used, E is the desired margin of error,
and σ reflects the standard deviation of the outcome variable.

When a confidence interval estimate for the difference in means is generated, Sp, the pooled
estimate of the common standard deviation, can be used as a measure of variability in the
outcome, where Sp is computed as

$$S_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Example 17 An investigator wants to plan a clinical trial to evaluate the efficacy of a new
drug designed to increase HDL cholesterol (the "good" cholesterol). The plan is to enrol
participants and to randomly assign them to receive either the new drug or a placebo. HDL
cholesterol will be measured in each participant after 12 weeks on the assigned treatment.
Based on prior experience with similar trials, the investigator expects that 10% of all
participants will be lost to follow up or will drop out of the study over 12 weeks. A 95%
confidence interval will be estimated to quantify the difference in mean HDL levels between
patients taking the new drug as compared to placebo. The investigator would like the margin
of error to be no more than 3 units. How many patients should be recruited into the study?  

Solution

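The sample sizes are computed as follows:

$$n = 2\left(\frac{Z\sigma}{E}\right)^2 = 2\left(\frac{1.96\,\sigma}{3}\right)^2 \approx 250$$

Here Z = 1.96 for 95% confidence and E = 3. The stated result of 250 per group corresponds to a
standard deviation of about σ ≈ 17.1 units for HDL cholesterol.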

Samples of size n1=250 and n2=250 will ensure that the 95% confidence interval for the
difference in mean HDL levels will have a margin of error of no more than 3 units. These
sample sizes refer to the numbers of participants with complete data. The investigators
hypothesized a 10% attrition (or drop-out) rate (in both groups). In order to ensure that the
total sample size of 500 is available at 12 weeks, the investigator needs to recruit more
participants to allow for attrition.  

N (number to enroll) * (% retained) = desired sample size

Therefore N (number to enroll) = desired sample size/(% retained)

N = 500/0.90 = 556

If they anticipate a 10% attrition rate, the investigators should enroll 556 participants. This
will ensure N = 500 with complete data at the end of the trial.
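
The two steps of Example 17, sizing each group and then inflating for attrition, can be sketched as below. The function names are illustrative, and the value `SIGMA_ASSUMED = 17.1` is an assumption: it is the standard deviation that makes the formula give 250 per group, not a figure stated in the example.

```python
from math import ceil


def per_group_size(sigma, moe, z=1.96):
    """n = 2 * (z * sigma / moe)**2, rounded up, for each comparison group."""
    return ceil(round(2 * (z * sigma / moe) ** 2, 6))


def number_to_enroll(total_needed, attrition_rate):
    """Inflate the required total so enough participants remain after drop-out."""
    return ceil(round(total_needed / (1 - attrition_rate), 6))


SIGMA_ASSUMED = 17.1   # hypothetical value, chosen to match the stated n = 250

n = per_group_size(SIGMA_ASSUMED, moe=3)            # participants per group
total = number_to_enroll(2 * n, attrition_rate=0.10)

print(f"per group: {n}, complete-data total: {2 * n}, enroll: {total}")
```

Under that assumption the script returns 250 per group and 556 to enroll, in line with the figures in the solution.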

IX. Estimation of a population variance


The previous sections show that confidence intervals can be used to estimate the unknown
value of a population mean or a proportion. The normal and Student’s t distributions are used
for developing these estimates. However, the variability of a population is also important. As
we have learned, less variability is almost always better. We use the chi-square distribution
(pronounced “kigh-square”) to construct confidence intervals (estimates) of variances or
standard deviations.

Objectives

• Find critical values for the χ2 distribution
• Construct and interpret confidence intervals about σ2 and σ

χ2 (chi-square) distribution

Suppose we take a random sample of size n from a normal population with mean µ and
standard deviation σ. Then the sample statistic

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$

follows a χ2 distribution with n − 1 degrees of freedom, where s2 represents the sample variance.

Properties of the χ2 (chi-square) distribution

• The total area under the χ2 curve equals 1.

• The value of the χ2 random variable is never negative, so the χ2 curve starts at 0.
However, it extends indefinitely to the right, with no upper bound.
(When the sample variance s2 is close to the population variance σ2, the value of χ2
will be close to the number of degrees of freedom n − 1, and n − 1 is positive, so χ2 will
be positive. This explains why the χ2 graph begins at 0.)

• Because of the characteristics just described, the χ2 curve is right skewed. In other
words, the chi-square distribution is not symmetric.

• There is a different curve for each number of degrees of freedom, n − 1. As the number
of degrees of freedom increases, the χ2 curve begins to look more symmetric.

Example 18 Find the critical values in the Χ2 distribution which separate the middle 95%
from the 2.5% in each tail, assuming there are 12 degrees of freedom.

Solution

Using the table of Chi-square distribution in the appendix, it can be inferred that the two
critical values given by the above conditions are 4.404 and 23.337.

For easier reference, the table below shows the answer, highlighted in red along the
corresponding row.

Finding Critical Values for the χ2

To construct the confidence intervals, we need to find the critical values of a chi-square
distribution for the given confidence level 100 (1 – α)%. We can use either the chi-square
table (table A-4) or technology. Table A-4 shows the degrees of freedom in the left column.
The area to the right of the χ2 critical value is given across the top of the table. (See appendix)

Since the chi-square distribution is not symmetric, we cannot construct the confidence interval
for σ2 using the “point estimate ± margin of error” method. We must find two different
chi-square critical values for each confidence interval at the given confidence level 100(1 – α)%.

A (1 − α)100% confidence interval for a population variance, σ2

$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n - 1)s^2}{\chi^2_{(1 - \alpha/2)}}$$

where $\chi^2_{\alpha/2}$ and $\chi^2_{(1 - \alpha/2)}$ are values of χ2 that locate an area of α/2 to the right and α/2 to the left,
respectively, of a chi-square distribution based on (n − 1) degrees of freedom.

Assumption: The population from which the sample is selected has an approximate normal
distribution.

Example 19 Suppose a sample of 30 ECC students are given an IQ test. If the sample has a
standard deviation of 12.23 points, find a 90% confidence interval for the population standard
deviation.

Solution
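We first need to find the critical values:

$$\chi^2_{1 - \alpha/2} = \chi^2_{0.95,\,29} \approx 17.708 \quad\text{and}\quad \chi^2_{\alpha/2} = \chi^2_{0.05,\,29} \approx 42.557$$

Then the confidence interval is:

$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{1 - \alpha/2}}$$

$$\frac{(30 - 1)12.23^2}{42.557} < \sigma^2 < \frac{(30 - 1)12.23^2}{17.708}$$

$$101.9249 < \sigma^2 < 244.9472$$

$$10.10 < \sigma < 15.65$$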

So we are 90% confident that the standard deviation of the IQ of ECC students is between
10.10 and 15.65 points.
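
The interval of Example 19 can be checked with a short sketch. The function name is illustrative, and the two chi-square critical values are passed in as the table values quoted in the solution, since the Python standard library does not provide chi-square quantiles.

```python
from math import sqrt


def variance_ci(s, n, chi2_right, chi2_left):
    """Confidence interval for sigma^2 from a normal population.

    chi2_right: critical value with area alpha/2 to its right (the larger value).
    chi2_left:  critical value with area alpha/2 to its left (the smaller value).
    """
    ss = (n - 1) * s**2
    # Dividing by the larger critical value gives the lower limit, and vice versa
    return ss / chi2_right, ss / chi2_left


# Example 19: n = 30, s = 12.23, 90% confidence, 29 degrees of freedom
var_low, var_high = variance_ci(12.23, 30, chi2_right=42.557, chi2_left=17.708)
sd_low, sd_high = sqrt(var_low), sqrt(var_high)

print(f"variance: ({var_low:.2f}, {var_high:.2f})")
print(f"standard deviation: ({sd_low:.2f}, {sd_high:.2f})")
```

Taking square roots of the variance limits gives the interval for σ, which should agree with the hand result to two decimal places.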

Summary
This chapter presented the technique of estimation - that is, using sample information to make
an inference about the value of a population parameter, or the difference between two
population parameters. In each instance, we presented the point estimate of the parameter of
interest, its sampling distribution, the general form of a confidence interval, and any
assumptions required for the validity of the procedure. In addition, we provided techniques
for determining the sample size necessary to estimate each of these parameters.
References

Shafer, D. S. & Zhang, Z. (2013). Introductory Statistics. Flat World Knowledge Inc.

Myers, S. L., Ye, K., & Walpole, R. E. (2007). Probability & statistics for engineers &
scientists. 8th Ed. Upper Saddle River, NJ: Pearson Prentice Hall.

Sullivan, L. Power and Sample Size Determination. Boston University School of Public
Health.

Lane. D., (2012). Introductory Statistics.

Introduction to Statistics. Lumen learning.com

Appendix

Table 1. T-distribution table


Table 2 Chi-square distribution table
