Lecture Note: Statistical Inference
Course Title: Statistical Inference

Course code: Stat 3052


Credit: 5 EtCTS
Credit hours: 3 (3 lecture hrs + 2 tutorial hrs)
Instructor’s Name: Kenenisa T. (MSc.)
Email: kenenisatadesse@[Link]

April, 2020
Jimma, Ethiopia
Statistical Inference (Stat 3052)
Outline
1 Chapter 0: Preliminaries
    Definitions of Some Basic Terms
    Sampling Distribution
    What is Statistical Inference?
2 Chapter 1: Parametric Point Estimation
    Methods of Finding Parametric Point Estimators
    Maximum Likelihood (ML) Method
    Properties of MLE
    Method of Moments
    Properties of Point Estimators
    Unbiased Estimators
    Mean Square Error (MSE) of an Estimator
    Efficiency of an Estimator
Chapter 0: Preliminaries

The aim of statistical inference is to make determinations about the unknown constants, known as parameters, of the underlying distribution.
To emphasize the basic concepts, this preliminary chapter begins with a review of the definitions of terms related to random sampling and the sampling distributions of some common estimators.
The first step in statistical inference is point estimation, in which we compute a single value (a statistic) from the sample data to estimate a population parameter.
The general concept of point estimators, methods of finding them, and their properties are discussed in Chapter 1.
Interval estimation, a method of obtaining, at a given level of confidence (or probability), two statistics whose range includes an unknown but fixed parameter, is discussed in Chapter 2.
Chapter 3 covers a second major area of statistical inference, testing of hypotheses. The significance of differences between parameters estimated from two or more samples, such as the difference between two population means, is also included in that chapter.
Nonparametric methods, which are not based on specific sampling distributions, are discussed in Chapter 4 (group work to be presented by students).


Definitions of Some Basic Terms

Population refers to all elements of interest, characterized by a distribution F with some parameter θ ∈ Θ (where Θ, the set of its possible values, is called the parameter space).
Sample is the set of data X_1, ..., X_n, a selected subset of the population; n is the sample size.
Remember, to use sample data for inference, the sample needs to be representative of the population for the question(s) of interest in our study.
Let X_1, ..., X_n be a random sample (independent and identically distributed, iid) from a distribution with cumulative distribution function (cdf) F(x; θ). The cdf admits a probability mass function (pmf) in the discrete case and a probability density function (pdf) in the continuous case; in either case, we write this function as f(x; θ).
A parameter is a number associated with a population characteristic. It is usually assumed to be fixed but unknown; thus, we estimate the parameter using sample information.
Examples of population parameters: the population mean (µ) and the population variance (σ²).
A statistic (or estimate) is a number computed from a sample. A statistic estimates a parameter, and it changes with each new sample.
A statistic is any function of the observations in a random sample that involves no unknown parameter.


Examples of a Statistic
For example, the sample mean, the sample variance, and the sample proportion p are statistics, and they are also random variables.
Let X_1, ..., X_n be a random sample taken from a population. The sample mean:

    \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.    (1)

The sample variance (biased):

    S^2 = \frac{1}{n}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2.    (2)

The sample variance (unbiased):

    S^{2*} = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2.    (3)

The sample proportion, where x is the number of observations in the class of interest:

    p = \frac{x}{n}.    (4)


Sampling Distribution
Since a statistic is a function of random variables, it is itself a random variable, and it has a probability distribution. The distribution of a statistic is called the sampling distribution of the statistic because it depends on the sample chosen.
Example: Let X = {X_1, ..., X_n} be a random sample (iid) taken from a normal population with mean µ and variance σ², i.e., X_i ~ N(µ, σ²) for each member of a sample of size n.
I The sampling distribution of the sample mean X̄:

    E[\bar{X}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i]
               = \frac{1}{n}\sum_{i=1}^{n} \mu, \quad \text{since } E[X_i] = \mu
               = \frac{1}{n}(n\mu) = \mu.

    Var[\bar{X}] = Var\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \left(\frac{1}{n}\right)^2 \sum_{i=1}^{n} Var[X_i] = \left(\frac{1}{n}\right)^2 \sum_{i=1}^{n} \sigma^2, \quad \text{since the } X_i\text{'s are iid}
                 = \frac{1}{n^2}\left(n\sigma^2\right) = \frac{\sigma^2}{n}.

Therefore, X̄ ~ N(µ, σ²/n) is the sampling distribution of X̄.
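The sampling distribution above can be checked by simulation. The following minimal sketch (illustrative only, not part of the original notes; µ = 5, σ = 2, n = 30 are assumed values, and numpy is assumed available) draws many samples and compares the empirical mean and variance of X̄ with µ and σ²/n:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
mu, sigma, n, reps = 5.0, 2.0, 30, 100_000

# Draw `reps` samples of size n and compute the sample mean of each.
xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(f"mean of X-bar: {xbars.mean():.4f}  (theory: {mu})")
print(f"var  of X-bar: {xbars.var():.4f}  (theory: {sigma**2 / n:.4f})")
```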
Assessment

Problem: Let X = {X_1, ..., X_n} be a random sample (iid) taken from a normal population with mean µ and variance σ², i.e., X_i ~ N(µ, σ²) for each member of a sample of size n.
1. Define Z = \frac{\bar{X} - \mu}{\sqrt{\sigma^2 / n}}.
    a. Are there any parameters in the function Z? What are they?
    b. Is Z a statistic? Why?
    c. Derive E[Z].
    d. Derive Var[Z].
    e. Give the sampling distribution of Z.
2. Define W = \frac{X - \mu}{\sigma}.
    a. Are there any parameters in the function W? What are they?
    b. Is W a statistic? Why?
    c. Derive E[W].
    d. Derive Var[W].
    e. Give the sampling distribution of W.


What is Statistical Inference?
Statistics is closely related to probability theory, but the two have entirely different goals.
Recall from probability theory that a typical probability problem starts with some assumptions about the distribution of a random variable (e.g., that it is binomial), and the objective is to derive properties (probabilities, expected values, etc.) of that random variable based on the stated assumptions.
In statistics, a sample from a given population is observed, and the goal is to learn something about that population based on the sample.

Statistical Inference / Inferential Statistics

is, conceptually, the process of drawing conclusions about a population based on samples that are subject to random variation.

Every scientific discipline applies statistics to extract relevant information from a given sample of data. The procedure, which leads to conclusions regarding a population (all possible observations of the process or phenomenon), is called statistical inference.
This course develops the mathematical theory behind these procedures, building mostly on calculus and probability.

Types of Statistical Inference:

I Parameter estimation:
    1 Point estimation;
    2 Interval estimation;
I Hypothesis testing;
I Nonparametric methods.


Chapter 1: Parametric Point Estimation
In this chapter, methods of parameter estimation called
point estimation are introduced.
One assumes for this purpose that the distribution of the population is
known. However, the values of the parameters of the distribution have to
be estimated from a sample of data, that is, a subset of the population.
One also assumes that the sample is random.

Definition:
A point estimate of some population parameter θ is a single numerical value of a statistic θ̂. The statistic θ̂ is called the point estimator.

As an example, X̄ is a point estimator of µ, that is, µ̂ = X̄; and S² is a point estimator of σ², that is, σ̂² = S².
The main objective of this chapter is to draw a random sample of size n, X_1, ..., X_n, from the underlying distribution and, on the basis of it, to construct a point estimate (or estimator) for θ, that is, a statistic θ̂ = θ̂(X_1, ..., X_n) ∈ Θ, which is used for estimating θ.


Methods of Finding Parametric Point Estimators

There is any number of estimates one may construct; thus we need to adopt certain principles or methods for constructing θ̂.

The methods of finding parametric point estimators are:

1 Maximum likelihood estimation (MLE).
2 Estimation by the method of moments.
3 The method of least squares estimation.

The least squares method is commonly used in Regression Analysis (Stat 2041), Statistical Methods (Stat 1013), Time Series Analysis (Stat 2042), and other statistics courses.


Maximum Likelihood (ML) Method
Perhaps the most widely accepted principle is the so-called principle of Maximum Likelihood (ML).
Let X be a r.v. with p.d.f. f(.; θ), where θ is an unknown parameter lying in a parameter space Θ.
The objective is to estimate θ on the basis of a random sample of size n from f(.; θ), X_1, X_2, ..., X_n.
Then, replacing θ in f(.; θ) by a "good" estimate of it, one would expect to be able to use the resulting p.d.f. for these purposes.
This principle dictates that we form the joint p.d.f. of the observed values of the X_i's, viewed as a function of θ (and call it the likelihood function), and maximize the likelihood function with respect to θ.
The maximizing point (assuming it exists and is unique) is a function of X_1, X_2, ..., X_n, and is what we call the Maximum Likelihood Estimate (MLE) of θ.
The notation used for the likelihood function is L(θ | X_1, X_2, ..., X_n). Then, we have that:

    L(\theta \mid X_1, X_2, \dots, X_n) = f(X_1;\theta) \times f(X_2;\theta) \times \cdots \times f(X_n;\theta) = \prod_{i=1}^{n} f(X_i;\theta), \quad \theta \in \Theta.    (5)

A value of θ which maximizes L(θ | X) is called a Maximum Likelihood Estimate (MLE) of θ.
Clearly, the MLE depends on X, and we usually write θ̂ = θ̂(X).
Thus, L(θ̂ | X) = max{L(θ | X); θ ∈ Θ}.


MLE Cont...
Once we adopt the Maximum Likelihood Principle, the maximization is usually carried out through differentiation.
It must be stressed that, whenever a maximum is sought by differentiation, the second-order derivative(s) must also be examined to confirm a maximum.
Also, maximization of the likelihood function, which is a product of n factors, is equivalent to maximization of its logarithm (always with base e), which is a sum of n summands and thus much easier to work with.
REMARK 1: Recall that a function y = g(x) attains a maximum at a point x = x_0 if

    \frac{d}{dx} g(x)\Big|_{x=x_0} = 0 \quad \text{and} \quad \frac{d^2}{dx^2} g(x)\Big|_{x=x_0} < 0.

Example: Consider the Bernoulli pmf for a discrete random variable taking values x_j ∈ {0, 1}, with parameter p:

    f(x_j \mid p) = p^{x_j} (1-p)^{1-x_j}, \quad x_j = 0, 1,

where X is a discrete variable and p is a parameter. The likelihood is

    L(p \mid X) = \prod_{j=1}^{n} f(x_j; p) = \prod_{j=1}^{n} p^{x_j} (1-p)^{1-x_j}, \quad x_j = 0, 1.


MLE Example 1
Taking the logarithm of both sides gives:

    \ln L(p \mid X) = \ln\left(\prod_{j=1}^{n} p^{x_j} (1-p)^{1-x_j}\right)
                    = \ln(p) \sum_{j=1}^{n} x_j + \ln(1-p) \sum_{j=1}^{n} (1 - x_j)
                    = \ln(p) \sum_{j=1}^{n} x_j + \ln(1-p) \left(n - \sum_{j=1}^{n} x_j\right),

and differentiating with respect to p:

    \frac{d \ln L(p \mid X)}{dp} = \frac{\sum_{j=1}^{n} x_j}{p} - \frac{n - \sum_{j=1}^{n} x_j}{1-p}.

Hence, by equating the foregoing derivative to zero, the estimate of the parameter is obtained as

    \hat{p} = \frac{\sum_{j=1}^{n} x_j}{n}.
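As a numeric cross-check of the Bernoulli MLE (an illustration, not part of the original notes; the data vector is an assumption), one can maximize the log-likelihood on a grid and compare with the closed form p̂ = Σx_j / n:

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 0, 1])          # assumed toy data
p_grid = np.linspace(0.001, 0.999, 9999)

# log L(p | x) = ln(p) * sum(x) + ln(1-p) * (n - sum(x))
loglik = np.log(p_grid) * x.sum() + np.log1p(-p_grid) * (len(x) - x.sum())

print("grid MLE   :", p_grid[loglik.argmax()])   # ~0.625
print("closed form:", x.sum() / len(x))          # 0.625
```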


MLE Example 2
Let X_1, ..., X_n be a continuous random sample from the N(µ, σ²) distribution, with parameter space Θ = (µ, σ²), where one of the parameters is assumed known. Determine the MLE of the other (unknown) parameter.

    f(x_j; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-1}{2\sigma^2}(x_j - \mu)^2\right], \quad -\infty < x_j < \infty, \ \sigma^2 > 0.

Case 1: Let µ be unknown.

    L(\mu \mid X) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[\frac{-1}{2\sigma^2}(x_j - \mu)^2\right]
                  = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left[\frac{-1}{2\sigma^2}\sum_{j=1}^{n}(x_j - \mu)^2\right]

    \ln L(\mu \mid X) = -n \ln\sqrt{2\pi} - n \ln\sigma - \frac{1}{2\sigma^2}\sum_{j=1}^{n}(x_j - \mu)^2.    (6)


MLE Example 2 cont...
Taking the partial derivative of Equation (6) with respect to µ and equating the result to zero:

    \frac{\partial \ln L(\mu \mid X)}{\partial \mu} = \frac{\partial}{\partial \mu}\left[-n \ln\sqrt{2\pi} - n \ln\sigma - \frac{1}{2\sigma^2}\sum_{j=1}^{n}(x_j - \mu)^2\right]
                                                    = \frac{1}{\sigma^2}\sum_{j=1}^{n}(x_j - \mu) = 0
    \Rightarrow \sum_{j=1}^{n}(x_j - \mu) = 0
    \Rightarrow \sum_{j=1}^{n} x_j - n\mu = 0
    \Rightarrow \hat{\mu} = \frac{\sum_{j=1}^{n} x_j}{n} = \bar{X}.


MLE Example 2 cont...

Case 2: Let σ be unknown.

Taking the partial derivative of Equation (6) with respect to σ and equating the result to zero:

    \frac{\partial \ln L(\sigma \mid X)}{\partial \sigma} = \frac{\partial}{\partial \sigma}\left[-n \ln\sqrt{2\pi} - n \ln\sigma - \frac{1}{2\sigma^2}\sum_{j=1}^{n}(x_j - \mu)^2\right]
                                                          = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{j=1}^{n}(x_j - \mu)^2 = 0
    \Rightarrow \sum_{j=1}^{n}(x_j - \mu)^2 = n\sigma^2
    \Rightarrow \hat{\sigma}^2 = \frac{\sum_{j=1}^{n}(x_j - \mu)^2}{n} = S^2.
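The same closed forms can be verified numerically. The sketch below (illustrative only; the synthetic data and the use of scipy.optimize are assumptions) minimizes the negative normal log-likelihood and compares with µ̂ = X̄ and σ̂² from the derivations above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=3.0, size=500)    # assumed synthetic data

def negloglik(params):
    mu, log_sigma = params                       # optimize log(sigma) so sigma > 0
    sigma = np.exp(log_sigma)
    # negative of: -n ln(sigma) - (1/(2 sigma^2)) sum (x - mu)^2  (constant dropped)
    return 0.5 * np.sum((x - mu) ** 2) / sigma**2 + len(x) * np.log(sigma)

res = minimize(negloglik, x0=[0.0, 0.0])
print("numeric :", res.x[0], np.exp(res.x[1]) ** 2)
print("closed  :", x.mean(), ((x - x.mean()) ** 2).mean())
```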


Properties of MLE

Properties of Maximum Likelihood Estimation

Under very general and non-restrictive conditions, when the sample size n is large and θ̂ is the maximum likelihood estimator of the parameter θ:
1. θ̂ is an approximately unbiased estimator for θ;
2. the variance of θ̂ is nearly as small as the variance that could be obtained with any other estimator; and
3. θ̂ has an approximately normal distribution.


Method of Moments
Definition (Moments): Let X_1, ..., X_n be a random sample from either a probability mass function or a probability density function with r unknown parameters θ_1, ..., θ_r. The moment estimators θ̂_1, ..., θ̂_r are found by equating the first r population moments to the first r sample moments and solving the resulting equations for the unknown parameters.
This methodology applies in principle whenever there are r parameters involved, Θ = {θ_1, ..., θ_r}, or, as we say, when Θ has r coordinates, r ≥ 1.
In such a case, we have to assume that the first r moments of the X_i's are finite; that is, the k-th population moment is

    m_k(\theta_1, \dots, \theta_r) = E\left[X^k\right], \quad k = 1, 2, \dots, r.    (7)

Then form the k-th sample moments

    \mu_k = \frac{1}{n}\sum_{j=1}^{n} X_j^k, \quad k = 1, \dots, r,    (8)

and equate the sample moments in Equation (8) to the corresponding population moments in Equation (7); that is,

    m_k = \mu_k, \quad k = 1, \dots, r;    (9)

i.e., we solve for each parameter from m_1 = µ_1, m_2 = µ_2, ..., m_r = µ_r.
Assuming that we can solve Equation (9) for θ_1, ..., θ_r, and that the solutions are unique, we arrive at what we call the moment estimates of the parameters θ_1, ..., θ_r.


Examples of the Method of Moments

Example 1: Let X_1, ..., X_n be a continuous random sample from the N(µ, σ²) distribution, with parameter space Θ = (µ, σ²).
The first sample moment is \mu_1 = \frac{1}{n}\sum_{j=1}^{n} X_j = \bar{X}, and the first population moment is m_1 = E[X] = \mu.
Equating µ_1 = m_1:

    \frac{1}{n}\sum_{j=1}^{n} X_j = \mu.

Therefore, \hat{\mu} = \frac{1}{n}\sum_{j=1}^{n} X_j = \bar{X}.


Examples of the Method of Moments cont...
For the moment estimate of σ²:
\mu_2 = \frac{1}{n}\sum_{j=1}^{n} X_j^2 and m_2 = E[X^2] = \sigma^2 + \mu^2 (verify!).
Equating µ_2 = m_2:

    \frac{1}{n}\sum_{j=1}^{n} X_j^2 = \sigma^2 + \mu^2
    \frac{1}{n}\sum_{j=1}^{n} X_j^2 = \sigma^2 + \bar{X}^2, \quad \text{since } \hat{\mu} = \bar{X}
    \frac{1}{n}\sum_{j=1}^{n} X_j^2 - \bar{X}^2 = \sigma^2
    \frac{1}{n}\sum_{j=1}^{n} \left(X_j - \bar{X}\right)^2 = \sigma^2.

Thus, \hat{\sigma}^2 = \frac{1}{n}\sum_{j=1}^{n} \left(X_j - \bar{X}\right)^2.
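A short sketch of the method of moments for a normal sample (illustrative only; the data are synthetic and assumed, numpy is assumed available):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=7.0, scale=1.5, size=1000)   # assumed synthetic sample

mu_hat = x.mean()                               # mu-hat = first sample moment
sigma2_hat = (x**2).mean() - mu_hat**2          # sigma^2-hat = mu_2 - mu_1^2

print(mu_hat, sigma2_hat)                       # close to 7.0 and 2.25
```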


Assessment
1 Let X_1, ..., X_n be a discrete random sample from the Poisson(λ), λ > 0, distribution, with parameter space Θ = (λ):
    f(x_i; \lambda) = \frac{e^{-\lambda}\lambda^{x_i}}{x_i!}, \quad \lambda > 0, \ x_i = 0, 1, \dots, \ i = 1, 2, \dots, n.
  Determine the MLE of the unknown parameter λ.
2 Let X_1, ..., X_n be a continuous random sample from the negative exponential distribution exp(λ), λ > 0, with parameter space Θ = (λ):
    f(x_i; \lambda) = \lambda e^{-\lambda x_i}, \quad \lambda > 0, \ x_i > 0, \ i = 1, 2, \dots, n.
  Derive the MLE of the unknown parameter λ.
3 Given the pdf
    f(x_i; \theta) = \theta^2 x_i e^{-\theta x_i}, \quad \theta > 0, \ x_i > 0, \ i = 1, 2, \dots, n,
  derive the MLE of the unknown parameter θ.
4 Given the pdf
    f(x_i; \alpha, \beta) = \frac{1}{\beta} e^{-(x_i - \alpha)/\beta}, \quad \alpha \in \mathbb{R}, \ \beta > 0, \ x_i > \alpha, \ i = 1, 2, \dots, n,
  derive the MLE of the unknown parameters α and β when
    a α is unknown and β is known;
    b β is unknown and α is known.
5 Suppose that X_1, ..., X_n is a random sample from an exponential distribution with parameter λ. Now there is only one parameter to estimate. Show that the moment estimator of λ is λ̂ = 1/X̄.


Properties of Point Estimators
Note that we may have several different choices for the point estimator of a parameter. Thus, in order to decide which point estimator of a particular parameter is the best one to use, we need to examine their statistical properties and develop criteria for comparing estimators.

Properties of Point Estimators

A point estimator can be evaluated based on:
1. Unbiasedness (mean): is the mean of the estimator equal to the actual parameter?
2. Efficiency (variance): is the variance of the estimator as small as possible?
3. Consistency (sample size): does the probability distribution of the estimator become concentrated on the parameter as the sample size increases?




Unbiased Estimators
Definition:
1. Unbiasedness: An estimator θ̂ of an unknown parameter θ is unbiased if
    E[\hat{\theta}] = \theta, \quad \text{for } \theta \in \Theta.
Otherwise, it is a biased estimator of θ.
2. Bias: If an estimator θ̂ of a parameter θ is biased, then
    \text{Bias}(B) = E[\hat{\theta}] - \theta
is called the bias of θ̂.

The main point here is how close an estimator is, on average, to the true value of the unknown parameter.
When an estimator is unbiased, the bias is zero.
Example 1: Suppose that X is a random variable with mean µ and variance σ². Let X_1, ..., X_n be a random sample of size n from the population represented by X. Show that X̄ and S^{2*}, defined in Equations (1) and (3), are unbiased estimators of µ and σ², respectively.
Discussion 1:

    E[\bar{X}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}(n\mu) = \mu, \quad \text{since } E[X_i] = \mu.

Therefore, X̄ is an unbiased estimator of the population mean µ.


Discussion 2:

    E[S^{2*}] = E\left[\frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2\right]
              = \frac{1}{n-1}\, E\left[\sum_{i=1}^{n} \left(X_i^2 - 2\bar{X}X_i + \bar{X}^2\right)\right]
              = \frac{1}{n-1}\, E\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right]
              = \frac{1}{n-1}\left(\sum_{i=1}^{n} E[X_i^2] - n E[\bar{X}^2]\right), \quad \text{since } E[X_i^2] = \mu^2 + \sigma^2 \text{ and } E[\bar{X}^2] = \mu^2 + \frac{\sigma^2}{n} \ (**)
              = \frac{1}{n-1}\left(n\left(\mu^2 + \sigma^2\right) - n\left(\mu^2 + \frac{\sigma^2}{n}\right)\right)
              = \frac{1}{n-1}(n-1)\,\sigma^2 = \sigma^2.

Therefore, S^{2*} is an unbiased estimator of the population variance σ².

To show that E[X_i^2] = \mu^2 + \sigma^2 (**):

    Var[X_i] = E\left[(X_i - \mu)^2\right]
    \sigma^2 = E\left[X_i^2 - 2\mu X_i + \mu^2\right]
             = E[X_i^2] - 2\mu E[X_i] + \mu^2
             = E[X_i^2] - 2\mu^2 + \mu^2
             = E[X_i^2] - \mu^2
    \Rightarrow E[X_i^2] = \sigma^2 + \mu^2.

To show that E[\bar{X}^2] = \mu^2 + \frac{\sigma^2}{n} (**):

    Var[\bar{X}] = E[\bar{X}^2] - \left(E[\bar{X}]\right)^2
    \frac{\sigma^2}{n} = E[\bar{X}^2] - \mu^2
    \Rightarrow E[\bar{X}^2] = \mu^2 + \frac{\sigma^2}{n}.


Example 2: Suppose that X is a random variable with mean µ and variance σ². Let X_1, ..., X_n be a random sample of size n from the population represented by X. Show that S², defined in Equation (2), is a biased estimator of σ².
Discussion 3:

    E[S^2] = E\left[\frac{1}{n}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2\right]
           = \frac{1}{n}\, E\left[\sum_{i=1}^{n} \left(X_i^2 - 2\bar{X}X_i + \bar{X}^2\right)\right]
           = \frac{1}{n}\, E\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right]
           = \frac{1}{n}\left(\sum_{i=1}^{n} E[X_i^2] - n E[\bar{X}^2]\right)
           = \frac{1}{n}\left(n\left(\mu^2 + \sigma^2\right) - n\left(\mu^2 + \frac{\sigma^2}{n}\right)\right), \quad \text{see } (**) \text{ above}
           = \frac{1}{n}(n-1)\,\sigma^2 = \sigma^2 - \frac{\sigma^2}{n}.

Therefore, S² is a biased estimator of the population variance σ², with bias B = −σ²/n. The bias is negative: the MLE of the variance tends to underestimate σ².
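The bias B = −σ²/n can be seen directly by simulation. The following sketch (illustrative only; σ² = 4 and n = 10 are assumed values) compares the two variance estimators:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, n, reps = 4.0, 10, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_biased = samples.var(axis=1, ddof=0)      # divides by n
s2_unbiased = samples.var(axis=1, ddof=1)    # divides by n-1

print("E[S^2]  ~", s2_biased.mean(), " (theory:", sigma2 * (n - 1) / n, ")")
print("E[S^2*] ~", s2_unbiased.mean(), "(theory:", sigma2, ")")
```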
Mean Square Error (MSE) of an Estimator
Sometimes it is necessary to use a biased estimator. In such cases, the mean square error of the estimator can be important.
The MSE of an estimator is the expected squared difference between θ̂ and θ.
Definition: Mean Square Error (MSE)
The mean square error of an estimator θ̂ of the parameter θ is defined as
    MSE(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right].

Assertion:
The mean square error of θ̂ is equal to the variance of the estimator plus the squared bias. That is,
    MSE(\hat{\theta}) = Var[\hat{\theta}] + (\text{Bias})^2.

Proof:

    MSE(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right] = E\left[\left((\hat{\theta} - E[\hat{\theta}]) + (E[\hat{\theta}] - \theta)\right)^2\right]
                      = E\left[(\hat{\theta} - E[\hat{\theta}])^2\right] + \left(E[\hat{\theta}] - \theta\right)^2 + 2\,\underbrace{E\left[\hat{\theta} - E[\hat{\theta}]\right]}_{=0 \ (***)}\left(E[\hat{\theta}] - \theta\right)    (10)
                      = Var[\hat{\theta}] + (\text{Bias})^2.
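The decomposition MSE = Var + Bias² can be checked numerically, e.g., for the biased variance estimator S² (an illustrative sketch with assumed parameters, continuing the previous simulation setup):

```python
import numpy as np

rng = np.random.default_rng(7)
sigma2, n, reps = 4.0, 10, 200_000

s2 = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n)).var(axis=1, ddof=0)

mse = ((s2 - sigma2) ** 2).mean()                    # E[(theta-hat - theta)^2]
var_plus_bias2 = s2.var() + (s2.mean() - sigma2) ** 2
print(mse, var_plus_bias2)                           # the two agree (up to rounding)
```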


Efficiency of an Estimator
The mean square error is an important criterion for comparing two estimators.
The term efficiency is used as a relative measure of the variance of the sampling distribution, with efficiency increasing as the variance decreases.
One may search among unbiased estimators to find the one with the smallest variance and call it the most efficient.
Definition: Efficiency
An estimator that has minimum mean square error among all possible unbiased estimators is called an efficient estimator.

The mean square error of an estimator, which equals the sum of its variance and the square of its bias, can be used as a relative measure of efficiency (RE) when comparing two or more estimators.

Definition: Relative Efficiency

Let θ̂_1 and θ̂_2 be two estimators of the parameter θ, and let MSE(θ̂_1) and MSE(θ̂_2) be their mean square errors. Then the RE of θ̂_2 to θ̂_1 is defined as

    RE = \frac{MSE(\hat{\theta}_1)}{MSE(\hat{\theta}_2)}.    (11)

Remark:
If this relative efficiency is less than 1, we conclude that θ̂_1 is a more efficient estimator of θ than θ̂_2, in the sense that it has a smaller mean square error.


Efficiency

Figure: The density function of an efficient estimator, exemplified by a normal density with σ = 0.5. The dotted line indicates a less efficient estimator (σ = 1).


Example of RE

Example: The unbiased estimated mean of the densities of 40 concrete test cubes is 2445 kg/m³. However, if we had only the first five test cubes, the second unbiased estimated mean would be 2431 kg/m³. Hence the relative efficiency, as given by the ratio of the MSE values, varies inversely with the ratio of variances:

    RE = \frac{MSE(\hat{\theta}_1)}{MSE(\hat{\theta}_2)} = \frac{\sigma^2 / n_1}{\sigma^2 / n_2} = \frac{\sigma^2 / 40}{\sigma^2 / 5} = \frac{1}{8} < 1.

This result confirms what we already know: the large-sample estimator of the mean is more efficient than the one based on a smaller sample.
The efficiency is seen to be proportional to the sample size n.


Consistency of an Estimator
A consistent estimator of a parameter θ produces statistics that converge to θ in probability.

Definition: Consistency
An estimator θ̂_n, based on a sample of size n, is a consistent estimator of a parameter θ if, for any positive number ε,

    \lim_{n \to \infty} \Pr\left[\,\left|\hat{\theta}_n - \theta\right| \leq \varepsilon\,\right] = 1.    (12)

As n grows, the estimator collapses onto the true value of the parameter; thus we have asymptotic unbiasedness.
One finds, however, that sometimes an unbiased estimator may not be consistent.
Example: In Equations (2) and (3) we considered two estimators (S² and S^{2*}) of the variance σ²; both are consistent, even though S² is biased.


Assessment
1. Suppose we have independently distributed random samples of size 2n from a population denoted by X, with E[X] = µ and Var[X] = σ². Let

    \bar{X}_1 = \frac{1}{2n}\sum_{i=1}^{2n} X_i \quad \text{and} \quad \bar{X}_2 = \frac{1}{n}\sum_{i=1}^{n} X_i

be two estimators of µ. Which is the better estimator of µ? Explain your choice.
2. Let X_1, ..., X_n denote a random sample from a population having mean µ and variance σ². Consider the following estimators of µ:

    \hat{\theta}_1 = \frac{X_1 + X_2 + \dots + X_7}{7} \quad \text{and} \quad \hat{\theta}_2 = \frac{2X_1 - X_6 + X_4}{2}.

  a. Is either estimator unbiased?
  b. Which estimator is best? In what sense is it best?
  c. Calculate the relative efficiency of the two estimators.
3. Suppose that θ̂_1 and θ̂_2 are estimators of the parameter θ. We know that E[θ̂_1] = θ, E[θ̂_2] = θ/2, Var[θ̂_1] = 10 and Var[θ̂_2] = 4. Which estimator is best? In what sense is it best?


Chapter 2: Parametric Interval Estimation
So far we have discussed the point estimation of a parameter, or more precisely, point estimation of several real-valued parametric functions.
For continuous distributions, the probability that the point estimator exactly equals the value of the parameter being estimated is zero.
Hence, it seems desirable that a point estimate be accompanied by some measure of the possible error of the estimate.
For instance, a point estimate may be accompanied by an interval about the point estimate, together with some measure of assurance that the true value of the parameter lies within the interval.
Instead of inferring that the true value of the parameter is a single point, we might infer that the true value is contained in some interval. This is called the problem of interval estimation.
In this chapter, methods of parameter estimation called interval estimation are introduced.
An interval estimate for a population parameter θ is called a confidence interval (CI).
We cannot be certain that the interval contains the true, unknown population parameter: we only use a sample from the full population to compute both the point estimate and the interval estimate.
However, the confidence interval is constructed so that we have high confidence that it does contain the unknown population parameter θ.
Conveniently, it is easy to determine such intervals in many cases, and the same data that provided the point estimate are typically used.


Basics of Parametric Interval Estimation
A confidence interval estimate for θ is an interval of the form L ≤ θ ≤ U, where the endpoints L and U are statistics computed from the sample data.
Because different samples will produce different values of L and U, these endpoints are values of random variables L and U, respectively.
Parametric Confidence Interval
Suppose that we can determine values of L and U such that the following probability statement is true:

    \Pr[L \leq \theta \leq U] = 1 - \alpha, \quad \text{where } 0 \leq \alpha \leq 1.    (13)

There is a probability of (1 − α) of selecting a sample for which the CI will contain the true value θ.
The endpoints or bounds L and U are called the lower and upper confidence limits, respectively, and 1 − α is called the confidence coefficient.

Remark: The length of the confidence interval is the difference between the upper and lower confidence limits, given by U − L.
Example: Suppose that X_1, ..., X_n is a random sample from a normal distribution with unknown mean µ and known variance σ². From the results of Chapter 0, we know that the sample mean X̄ is normally distributed with mean µ and variance σ²/n. We may standardize X̄ by subtracting the mean and dividing by the standard deviation, which gives

    Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}.    (14)

Now Z has a standard normal distribution, that is, Z ~ N(0, 1).
Creating a new random variable Z by this transformation is referred to as standardizing; Z has pdf

    f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} z^2},    (15)

which is independent of the true values of the unknown parameters µ and σ².


Confidence interval for the mean µ (when σ² is known)
The random variable Z represents the distance of X̄ from its mean µ in terms of the standard error σ/√n.
This is the key step in calculating a probability for an arbitrary normal random variable.
From Equations (14) and (15), where only µ is unknown, we construct a confidence interval for µ with confidence coefficient 1 − α.
1. Let µ be unknown. Consider any two points L < U from the normal tables for which Pr[L ≤ Z ≤ U] = 1 − α, where Z ~ N(0, 1). In particular, take U = Z_{α/2} and L = −Z_{α/2}.
It follows that:

    \Pr[L \leq Z \leq U] = 1 - \alpha
    \Pr\left[-Z_{\alpha/2} \leq \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \leq Z_{\alpha/2}\right] = 1 - \alpha, \quad \text{for all } \mu
    \Pr\left[-Z_{\alpha/2}\,\sigma/\sqrt{n} \leq \bar{X} - \mu \leq Z_{\alpha/2}\,\sigma/\sqrt{n}\right] = 1 - \alpha
    \Pr\left[-\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n} \leq -\mu \leq -\bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}\right] = 1 - \alpha
    \Pr\left[\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n} \leq \mu \leq \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}\right] = 1 - \alpha.

Definition: Confidence Interval for the unknown mean parameter µ (when σ² is known)
Let X̄ be the mean of a random sample of size n drawn from a normal population with known standard deviation σ. The 100(1 − α)% central two-sided confidence interval for the population mean µ is given by:

    \Pr\left[\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n} \leq \mu \leq \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}\right] = 1 - \alpha.    (16)

That is, µ lies in the interval \left(\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n}, \ \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}\right).
Figure: Standard normal pdf showing the two-sided confidence interval.
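As a quick illustration of Equation (16), the following sketch computes a two-sided z-interval numerically (the summary values are taken from the compressive-strength example below; scipy is assumed available for the normal quantile):

```python
import math
from scipy.stats import norm

xbar, sigma, n, alpha = 60.14, 5.02, 40, 0.05   # summaries from the example below
z = norm.ppf(1 - alpha / 2)                     # Z_{alpha/2} = 1.96
half = z * sigma / math.sqrt(n)

print(f"95% CI for mu: ({xbar - half:.2f}, {xbar + half:.2f})")  # (58.58, 61.70)
```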


Confidence Interval for the Variance σ² (when µ is known)

2. Let σ² be unknown. Set

    S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \mu)^2.

Recall that

    \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_{(n-1)}.

From the chi-square tables, determine any pair 0 < L < U for which Pr[L ≤ X ≤ U] = 1 − α, where X ~ χ²_{n−1}. Then we have

    \Pr\left[L \leq \frac{(n-1)S^2}{\sigma^2} \leq U\right] = 1 - \alpha, \quad \text{for all } \sigma^2
    \Pr\left[\frac{1}{U} \leq \frac{\sigma^2}{(n-1)S^2} \leq \frac{1}{L}\right] = 1 - \alpha
    \Pr\left[\frac{(n-1)S^2}{U} \leq \sigma^2 \leq \frac{(n-1)S^2}{L}\right] = 1 - \alpha
    \Pr\left[\frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}} \leq \sigma^2 \leq \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}\right] = 1 - \alpha.


Definition: Confidence Interval for the Variance σ² (when µ is known)
Let S² be the variance estimate from a random sample of size n drawn from a normal distribution with unknown variance. The 100(1 − α)% equi-tailed two-sided confidence interval for the population variance σ² is as follows:

    \Pr\left[\frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}} \leq \sigma^2 \leq \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}\right] = 1 - \alpha.    (17)

That is, σ² lies in the interval \left(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}}, \ \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}\right),
where χ²_{n−1,α/2} and χ²_{n−1,1−α/2} are the values that a χ²_{n−1} variate exceeds with probabilities α/2 and (1 − α/2), respectively.

Figure: Equal-tails confidence interval for the variance (chi-squared distribution).
Remark:
The corresponding one-sided upper confidence limit for σ² is defined by:

    \Pr\left[\sigma^2 \leq \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha}}\right] = 1 - \alpha.    (18)

Example: The compressive strengths of 40 test cubes of concrete samples have sample mean and sample standard deviation of 60.14 and 5.02 N/mm², respectively. We also assume that the compressive strengths are normally distributed. To facilitate the application, let us assume that the estimated standard deviation of 5.02 N/mm² is the true known value.
a. Construct a 95% confidence interval for the population mean µ.
b. Construct an upper one-sided 99% confidence limit for the population variance.
c. Construct a 95% two-sided confidence limit for the population variance.
Discussion: Given: n = 40, X̄ = 60.14 and S = 5.02.
a. From the standardized normal table, Z_{α/2} = Z_{0.025} = 1.96. Using Equation (16), we have

    \Pr\left[\bar{X} - Z_{\alpha/2}\,S/\sqrt{n} \leq \mu \leq \bar{X} + Z_{\alpha/2}\,S/\sqrt{n}\right] = 1 - \alpha
    \Pr\left[60.14 - 1.96 \times 5.02/\sqrt{40} \leq \mu \leq 60.14 + 1.96 \times 5.02/\sqrt{40}\right] = 0.95
    \Pr[58.58 \leq \mu \leq 61.70] = 0.95.

Therefore we are 95% confident that the interval (58.58, 61.70) includes the true population mean µ.
The length of the confidence interval is 61.70 − 58.58 = 3.12.
Example cont...
b. From the χ² table, χ²_{n−1,1−α} = χ²_{39,0.99} = 21.426. Using Equation (18), we have

    \Pr\left[\sigma^2 \leq \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha}}\right] = 1 - \alpha
    \Pr\left[\sigma^2 \leq \frac{39(5.02)^2}{21.426}\right] = 0.99
    \Pr\left[\sigma^2 \leq \frac{39(25.2004)}{21.426}\right] = 0.99 \Rightarrow \Pr\left[\sigma^2 \leq 45.87\right] = 0.99.

Hence the 99% upper confidence limit for σ is 6.76 N/mm².
c. From the χ² table, χ²_{n−1,α/2} = χ²_{39,0.025} = 58.120 and χ²_{n−1,1−α/2} = χ²_{39,0.975} = 23.654. Using Equation (17), we have

    \Pr\left[\frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}} \leq \sigma^2 \leq \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}\right] = 1 - \alpha
    \Pr\left[\frac{39(25.2004)}{58.120} \leq \sigma^2 \leq \frac{39(25.2004)}{23.654}\right] = 0.95
    \Pr\left[\frac{982.816}{58.120} \leq \sigma^2 \leq \frac{982.816}{23.654}\right] = 0.95 \Rightarrow \Pr\left[16.9 \leq \sigma^2 \leq 41.55\right] = 0.95.

Hence the 95% two-sided confidence limits for σ are (4.11, 6.45) in N/mm².
This interval is fairly wide because there is a lot of variability in the measured compressive strengths of the concrete cubes.
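A minimal sketch of the chi-square interval in Equation (17), using this example's n = 40 and S = 5.02 (scipy assumed available):

```python
from scipy.stats import chi2

n, s2, alpha = 40, 5.02**2, 0.05
lo = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, df=n - 1)   # divide by 58.120
hi = (n - 1) * s2 / chi2.ppf(alpha / 2, df=n - 1)       # divide by 23.654

print(f"95% CI for sigma^2: ({lo:.1f}, {hi:.1f})")       # ~ (16.9, 41.5)
```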
Simultaneous Confidence Intervals for the Mean and Variance (Small Sample)

In Examples 1 and 2, the position was adopted that only one of the parameters of the N(µ, σ²) distribution was unknown. In practice, both µ and σ² are most often unknown. In this subsection we pave the way to solving that problem.

Confidence Interval for the Mean (σ² unknown)

Let X_1, ..., X_n be a random sample from the N(µ, σ²) distribution, where both µ and σ² are unknown.
To construct confidence intervals for µ and σ², each with confidence coefficient 1 − α, we use

    \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \quad \text{and} \quad \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{\sigma}\right)^2 \sim \chi^2_{n-1},

where S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2, and these two random variables are independent.
It follows that their ratio

    \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2 (n-1)}}} = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1} \quad (t\text{-distribution with } n-1 \text{ degrees of freedom}).


Simultaneous Confidence Interval cont...
As usual, from the t-tables determine any pair L < U such that P[L ≤ X ≤ U] = 1 − α, where X ~ t_{n−1}.
Let L = −t_{n−1;α/2} and U = t_{n−1;α/2}. It follows that:

    P\left[L \leq \frac{\bar{X} - \mu}{S/\sqrt{n}} \leq U\right] = 1 - \alpha
    P\left[-t_{n-1;\alpha/2} \leq \frac{\bar{X} - \mu}{S/\sqrt{n}} \leq t_{n-1;\alpha/2}\right] = 1 - \alpha
    P\left[-t_{n-1;\alpha/2}\,\frac{S}{\sqrt{n}} \leq \bar{X} - \mu \leq t_{n-1;\alpha/2}\,\frac{S}{\sqrt{n}}\right] = 1 - \alpha
    P\left[\bar{X} - t_{n-1;\alpha/2}\,\frac{S}{\sqrt{n}} \leq \mu \leq \bar{X} + t_{n-1;\alpha/2}\,\frac{S}{\sqrt{n}}\right] = 1 - \alpha.    (19)

Definition:
If X̄ and S are the sample mean and sample standard deviation of a random sample X_1, X_2, ..., X_n from a normal distribution with unknown variance σ², a 100(1 − α)% confidence interval for the population mean µ is

    \left(\bar{X} - t_{n-1;\alpha/2}\,\frac{S}{\sqrt{n}}, \ \bar{X} + t_{n-1;\alpha/2}\,\frac{S}{\sqrt{n}}\right),

where t_{n−1;α/2} is the upper 100α/2 percentage point of the t distribution with n − 1 degrees of freedom.

One-sided confidence bounds for the mean based on the t-distribution are also of interest: simply use only the appropriate lower or upper confidence limit from Equation (19) and replace t_{n−1;α/2} by t_{n−1;α}.
Simultaneous Confidence Interval cont...

Figure: Two-sided confidence interval for the population mean using the t-distribution.


Confidence Interval for σ² (when µ is unknown)

The construction of a confidence interval for σ² in the presence of an unknown µ is easier.
We have already mentioned that (n−1)S²/σ² ~ χ²_{(n−1)}, so repeating the earlier process yields the confidence interval

    \left(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}}, \ \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}\right),    (20)

where S^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2.
Note that the confidence interval in Equation (19) differs from Equation (16) in that σ in Equation (16) is replaced by the estimate S, and the constant Z_{α/2} in Equation (16) is adjusted to t_{n−1;α/2}.
Likewise, the confidence intervals in Equations (17) and (20) are of the same form, with the only difference that the (unknown) µ in Equation (17) is replaced by its estimate X̄ in Equation (20).


Example
An article in the Journal of Heat Transfer (Trans. ASME, Sec. C, 96, 1974, p. 59) described a new method of measuring the thermal conductivity of Armco iron. Using a temperature of 100°F and a power input of 550 watts, the following 10 measurements of thermal conductivity (in Btu/hr·ft·°F) were obtained:
41.60, 41.48, 42.34, 41.95, 41.86, 42.18, 41.72, 42.26, 41.81, 42.04
A point estimate of the mean thermal conductivity at 100°F and 550 watts is

    \bar{X} = \frac{41.60 + 41.48 + \dots + 42.04}{10} = 41.924 \ \text{Btu/hr·ft·°F},

and a point estimate of the sample standard deviation is:

    S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2} = \sqrt{\frac{(41.60 - 41.924)^2 + (41.48 - 41.924)^2 + \dots + (42.04 - 41.924)^2}{9}} = 0.284 \ \text{Btu/hr·ft·°F}.

The estimated standard error of X̄ is

    \sigma_{\bar{X}} = \frac{S}{\sqrt{n}} = \frac{0.284}{\sqrt{10}} = 0.0898.

If we can assume that thermal conductivity is normally distributed, then since σ² is unknown and n = 10 is small, it is advisable to use the t-distribution.
From the Student t-distribution table, t_{n−1,α/2} = t_{9,0.025} = 2.262.
Thus, at the 95% confidence level, the true mean thermal conductivity µ lies within the interval

    \bar{X} \pm t_{n-1,\alpha/2}\,\frac{S}{\sqrt{n}} = 41.924 \pm 2.262(0.0898) = (41.721, \ 42.127).


Example cont...
To construct a 95% confidence interval for σ², use χ²_{n−1,α/2} = χ²_{9,0.025} = 19.02 and χ²_{n−1,1−α/2} = χ²_{9,0.975} = 2.70:

    \left(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}}, \ \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}\right) = \left(\frac{9(0.284^2)}{\chi^2_{9,0.025}}, \ \frac{9(0.284^2)}{\chi^2_{9,0.975}}\right) = \left(\frac{0.726}{19.02}, \ \frac{0.726}{2.70}\right) = (0.038, \ 0.269).

This last expression may be converted into a confidence interval on the standard deviation σ by taking the square root of both sides, resulting in (0.195, 0.518).
Therefore, at the 95% level of confidence, the thermal conductivity data indicate that the process standard deviation could be as small as 0.195 Btu/hr·ft·°F and as large as 0.518 Btu/hr·ft·°F.
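Both intervals of this example can be reproduced from the raw data. A minimal sketch (numpy and scipy assumed available):

```python
import numpy as np
from scipy import stats

x = np.array([41.60, 41.48, 42.34, 41.95, 41.86, 42.18,
              41.72, 42.26, 41.81, 42.04])
n, alpha = len(x), 0.05
xbar, s = x.mean(), x.std(ddof=1)

t = stats.t.ppf(1 - alpha / 2, df=n - 1)               # 2.262
print("mu     :", (xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n)))

chi_hi = stats.chi2.ppf(1 - alpha / 2, df=n - 1)       # 19.02
chi_lo = stats.chi2.ppf(alpha / 2, df=n - 1)           # 2.70
print("sigma^2:", ((n - 1) * s**2 / chi_hi, (n - 1) * s**2 / chi_lo))
```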


A Large-Sample Confidence Interval for a Population Proportion
It is often necessary to construct confidence intervals for a population proportion.

Population Proportion
Suppose that a random sample of size n has been taken from a large (possibly infinite) population and that X ≤ n observations in this sample belong to a class of interest. Then

    \hat{P} = \frac{X}{n}    (21)

is a point estimator of the proportion p of the population that belongs to this class, where n and p are the parameters of a binomial distribution.

The sampling distribution of P̂ is approximately normal with mean p and variance p(1 − p)/n, provided p is not too close to either 0 or 1 and n is relatively large.
Assuming that np and n(1 − p) are both greater than 5,

    Z = \frac{X - np}{\sqrt{np(1-p)}} = \frac{\frac{X}{n} - p}{\sqrt{\frac{p(1-p)}{n}}} \quad \text{(dividing by } n\text{)} = \frac{\hat{P} - p}{\sqrt{\frac{p(1-p)}{n}}} \approx N(0, 1).


A Large-Sample CI for a Population Proportion
To construct the confidence interval for p, note that

    \Pr\left[-Z_{\alpha/2} \leq Z \leq Z_{\alpha/2}\right] = 1 - \alpha
    \Pr\left[-Z_{\alpha/2} \leq \frac{\hat{P} - p}{\sqrt{\frac{p(1-p)}{n}}} \leq Z_{\alpha/2}\right] = 1 - \alpha
    \Pr\left[\hat{P} - Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \leq p \leq \hat{P} + Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right] = 1 - \alpha.

Definition: Confidence Interval for a Population Proportion

A 100(1 − α)% confidence interval for p is

    \left(\hat{P} - Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}, \ \hat{P} + Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right),    (22)

where the quantity \sqrt{p(1-p)/n} in Equation (22) is called the standard error of the point estimator P̂. In practice, the unknown p under the square root is replaced by its estimate P̂, as in the example below.


Example
In a random sample of 85 automobile engine crankshaft bearings, 10 have a surface finish that is rougher than the specifications allow. Construct a 95% two-sided confidence interval for p.
Discussion:
A point estimate of the proportion of bearings in the population that exceed the roughness specification is P̂ = 10/85 = 0.12, and

    \hat{P} - Z_{0.025}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}} \leq p \leq \hat{P} + Z_{0.025}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}
    0.12 - 1.96\sqrt{\frac{0.12(0.88)}{85}} \leq p \leq 0.12 + 1.96\sqrt{\frac{0.12(0.88)}{85}}
    0.05 \leq p \leq 0.19.
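A minimal sketch of the large-sample (Wald-type) interval in Equation (22) with the bearing counts (scipy assumed available):

```python
import math
from scipy.stats import norm

x, n, alpha = 10, 85, 0.05
p_hat = x / n
half = norm.ppf(1 - alpha / 2) * math.sqrt(p_hat * (1 - p_hat) / n)

print(f"95% CI for p: ({p_hat - half:.2f}, {p_hat + half:.2f})")  # (0.05, 0.19)
```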


Confidence Interval for the Difference Between Two Sample Means cont...
Assumption: σ₁² = σ₂² = σ²; that is, we have to assume that the variances, although unknown, are equal.
Recall that X̄ − µ₁ ~ N(0, σ²/m) and Ȳ − µ₂ ~ N(0, σ²/n).

By independence of the X's and Y's, it follows that:

    \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{m} + \frac{1}{n}}} \sim N(0, 1).    (23)

Further recall that if S_X^2 = \frac{1}{m-1}\sum_{i=1}^{m} \left(X_i - \bar{X}\right)^2 and S_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2, then

    \frac{(m-1)S_X^2}{\sigma^2} \sim \chi^2_{(m-1)} \quad \text{and} \quad \frac{(n-1)S_Y^2}{\sigma^2} \sim \chi^2_{(n-1)}.

By independence of the X's and Y's,

    \frac{(m-1)S_X^2 + (n-1)S_Y^2}{\sigma^2} \sim \chi^2_{(m+n-2)}.    (24)

From Equations (23) and (24),

    \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2}\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}} \sim t_{m+n-2}.    (25)


Confidence Interval for the Difference Between Two Sample Means cont...
Definition: CI for the Difference of Two Means from Two Independent Populations
Let X_1, ..., X_m and Y_1, ..., Y_n be two independent random samples from the N(µ₁, σ₁²) and N(µ₂, σ₂²) distributions, respectively, with µ₁, µ₂, σ₁² and σ₂² all unknown.
Thus, from the t-distribution in Equation (25), the confidence interval for the difference of the true means (µ₁ − µ₂) is

    \left(\bar{X} - \bar{Y}\right) \pm t_{m+n-2,\alpha/2}\sqrt{\frac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2}\left(\frac{1}{m} + \frac{1}{n}\right)}.    (26)

Figure: Confidence interval for the difference of two means.


CI for the Ratio of Variances σ₁²/σ₂², for Two Samples from Independent Normal Populations
Recall once more that (m−1)S_X²/σ₁² ~ χ²_{(m−1)} and (n−1)S_Y²/σ₂² ~ χ²_{(n−1)}.
By independence of the X's and Y's,

    \frac{S_Y^2 / \sigma_2^2}{S_X^2 / \sigma_1^2} = \frac{\sigma_1^2}{\sigma_2^2} \times \frac{S_Y^2}{S_X^2} \sim F_{n-1,m-1}.    (27)

From the F-tables, determine any pair (L, U) with 0 < L < U such that P(L ≤ X ≤ U) = 1 − α, where X ~ F_{n−1,m−1}.
Then,

    \Pr\left[L \leq \frac{\sigma_1^2}{\sigma_2^2} \times \frac{S_Y^2}{S_X^2} \leq U\right] = 1 - \alpha
    \Pr\left[L\,\frac{S_X^2}{S_Y^2} \leq \frac{\sigma_1^2}{\sigma_2^2} \leq U\,\frac{S_X^2}{S_Y^2}\right] = 1 - \alpha
    \Pr\left[F_{n-1,m-1;1-\alpha/2}\,\frac{S_X^2}{S_Y^2} \leq \frac{\sigma_1^2}{\sigma_2^2} \leq F_{n-1,m-1;\alpha/2}\,\frac{S_X^2}{S_Y^2}\right] = 1 - \alpha.

CI for the Ratio of Variances σ₁²/σ₂²
Let X_1, ..., X_m and Y_1, ..., Y_n be two independent random samples from the N(µ₁, σ₁²) and N(µ₂, σ₂²) distributions, respectively, with µ₁, µ₂, σ₁² and σ₂² all unknown.
A 100(1 − α)% confidence interval for σ₁²/σ₂² is then

    \left(F_{n-1,m-1;1-\alpha/2}\,\frac{S_X^2}{S_Y^2}, \ F_{n-1,m-1;\alpha/2}\,\frac{S_X^2}{S_Y^2}\right).    (28)
Figure: Confidence interval for the ratio of variances from two independent populations (F-distribution).


Example
The summary statistics below come from two catalyst types, with 8 pilot-plant samples taken from each, which are being analyzed to determine how they affect the mean yield of a chemical process. Specifically, the 1st catalyst is currently in use, and the 2nd catalyst is also acceptable.

Table: Catalyst Yield Data

    Catalyst   Sample size   Sample mean     Sample standard deviation
    1          m = 8         X̄ = 92.255      S_X = 2.39
    2          n = 8         Ȳ = 92.733      S_Y = 2.98
Discussion
Construct a confidence interval for the difference between the mean yields. Use α = 0.05 and assume equal variances.
Using Equation (26) with t_{m+n−2,α/2} = t_{14,0.025} = 2.145 and m = n = 8, m + n − 2 = 14:

    \left(\bar{X} - \bar{Y}\right) \pm t_{m+n-2,\alpha/2}\sqrt{\frac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2}\left(\frac{1}{m} + \frac{1}{n}\right)}
    = (92.255 - 92.733) \pm 2.145\sqrt{\frac{7(2.39)^2 + 7(2.98)^2}{14}\left(\frac{1}{8} + \frac{1}{8}\right)}
    = -0.478 \pm 2.145(1.350)
    = -0.478 \pm 2.897 = (-3.375, \ 2.419).

Construct a confidence interval for the ratio of the yield variances σ₁²/σ₂². Use α = 0.05.
Using Equation (28) with F_{n−1,m−1;1−α/2} = F_{7,7;0.975} = 1/F_{7,7;0.025} = 1/4.99 = 0.200 and F_{n−1,m−1;α/2} = F_{7,7;0.025} = 4.99:

    \left(F_{7,7;0.975}\,\frac{S_X^2}{S_Y^2}, \ F_{7,7;0.025}\,\frac{S_X^2}{S_Y^2}\right) = \left(0.200 \times \frac{2.39^2}{2.98^2}, \ 4.99 \times \frac{2.39^2}{2.98^2}\right) = (0.200(0.643), \ 4.99(0.643)) = (0.129, \ 3.209).
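Both parts of this discussion can be reproduced with a short sketch (scipy assumed available; the summary statistics are those in the table above):

```python
import math
from scipy.stats import t, f

m, xbar, sx = 8, 92.255, 2.39
n, ybar, sy = 8, 92.733, 2.98
alpha = 0.05

sp2 = ((m - 1) * sx**2 + (n - 1) * sy**2) / (m + n - 2)    # pooled variance
half = t.ppf(1 - alpha / 2, m + n - 2) * math.sqrt(sp2 * (1 / m + 1 / n))
print("mu1-mu2  :", (xbar - ybar - half, xbar - ybar + half))   # (-3.37, 2.42)

ratio = sx**2 / sy**2
lo = f.ppf(alpha / 2, n - 1, m - 1) * ratio
hi = f.ppf(1 - alpha / 2, n - 1, m - 1) * ratio
print("var ratio:", (lo, hi))                                   # (0.13, 3.21)
```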


Assessment

1. Suppose that X̄ = 102, n = 50, and σ² = 10. What is a 95% confidence interval for µ?
2. A survey was made in the core course, asking (among other things) the annual salary of the jobs that the students had before enrolling as full-time PhD students. Here is a subset (n = 10) of those responses (in thousands of dollars):
20, 34, 52, 21, 26, 29, 71, 41, 23, 67
a. Construct a 95% confidence interval for the true average income for incoming full-time PhD students.
b. Construct a 95% confidence interval for the true standard deviation of income for incoming full-time PhD students.
3. A forester wishes to estimate the average number of "count trees" per acre (trees larger
than a specified size) on a 2,000-acre plantation. She can then use this information to
determine the total timber volume for trees in the plantation. A random sample of n = 50
one-acre plots is selected and examined. The average (mean) number of count trees per
acre is found to be 27.3, with a standard deviation of 12.1. Use this information to
construct a 99% confidence interval for µ, the mean number of count trees per acre for the
entire plantation.



Chapter 3: Basics of Hypothesis Testing

In the previous chapter we illustrated how to construct a confidence interval estimate of a parameter from sample data.
However, many problems in decision making require that we decide whether to accept or reject a statement about some parameter. The statement is called a hypothesis, and the decision-making procedure about the hypothesis is called hypothesis testing.
Definition: A statistical hypothesis is a statement about the parameters of one or more populations.
A random sample is taken from the population, and statistical hypotheses, called the null hypothesis and its alternative, are declared. Then a statistical test is made.
If the observed random sample does not support the model or theory postulated, the null hypothesis is rejected in favor of the alternative one, which may then be considered true.
However, if the observations are in agreement, the null hypothesis is not rejected. This does not necessarily mean that it is accepted; it suggests that there is insufficient evidence in the data against the null hypothesis in favor of the alternative.


Two-Sided Hypothesis Test
A test of any hypothesis such as
    H₀: θ = θ₀ versus Hₐ: θ ≠ θ₀
is called a two-sided test, because it is important to detect differences from the hypothesized value θ₀ of the parameter that lie on either side of θ₀.
In such a test, the critical region is split into two parts, with (usually) equal probability placed in each tail of the distribution of the test statistic.
For example, if Z₀ is a standardized normally distributed test statistic, the critical regions can be visualized as in the figure below.

Figure: The distribution of Z when H₀: µ = µ₀ is true, with the critical region for the two-sided alternative Hₐ: µ ≠ µ₀.


One-Sided Hypothesis Test
We may also develop procedures for testing hypotheses where the alternative hypothesis is one-sided:
    H₀: θ = θ₀ versus Hₐ: θ < θ₀, or
    H₀: θ = θ₀ versus Hₐ: θ > θ₀.
If the alternative hypothesis is Hₐ: θ > θ₀, the critical region should lie in the upper tail of the distribution of the test statistic, whereas if the alternative hypothesis is Hₐ: θ < θ₀, the critical region should lie in the lower tail. Consequently, these tests are called one-tailed tests.

Figure: Critical regions for the one-sided alternative Hₐ: θ > θ₀ (left) and the one-sided alternative Hₐ: θ < θ₀ (right), for a standardized normally distributed Z.


Rejection Regions

Critical values: the values of the test statistic that separate the rejection and non-rejection regions. They are the boundary values corresponding to the preset significance level.
Rejection region: the set of values of the test statistic that leads to rejection of H₀.
Non-rejection region: the set of values not in the rejection region, which leads to non-rejection of H₀.


The Procedure for Hypothesis Tests
As just outlined, hypothesis testing concerns one or more parameters and the related probability distribution. The following basic steps are recommended when applying hypothesis-testing methodology.
1. From the problem context, identify the parameter of interest.
2. State the null hypothesis, H₀, in terms of a population parameter, such as µ or σ².
3. Specify an appropriate alternative hypothesis, Hₐ, in terms of the same population parameter.
4. Choose a significance level, α.
5. Determine an appropriate test statistic, substituting the quantities given by the null hypothesis but not the observed values. State which statistical distribution is being used; this may require an assumption about the underlying distribution.
6. Compute any necessary sample quantities, assuming that the null hypothesis is true, and substitute these into the equation for the test statistic.
7. State the rejection region (also called the critical region) for the test statistic.
8. Decide whether or not H₀ should be rejected, and report that decision in the problem context, based on the observed level of significance (p-value).
9. State a conclusion: either do not reject the null hypothesis, or reject it in favour of the alternative hypothesis.


Types of Possible Error
We may decide to take some action on the basis of the test of significance, such as adjusting the process if a result is statistically significant. But we can never be completely certain we are taking the right action.
There are two types of possible error which we must consider.

Table: Types of Possible Error

                          H₀ True            H₀ False
    Fail to reject H₀     Correct decision   Type II error
    Reject H₀             Type I error       Correct decision

The Type I error specification is the probability of making an error when the null hypothesis is true. This specification is commonly represented by the symbol α.
For example, if we say that a test has α ≤ 0.05, we guarantee that if the null hypothesis is true, the test will err no more than 1 time in 20.
P(Type I error) = P(rejecting H₀ when H₀ is true) = α (the significance level).
The Type II error specification is the probability of making an error when the null hypothesis is false. This specification is commonly represented by the symbol β.
P(Type II error) = P(accepting H₀ when H₀ is false) = β.
For example, if we say that β is unknown for a test, we cannot guarantee how the test will behave when the null hypothesis is actually false.
The power of a test is the probability of correctly rejecting the null hypothesis when it is false; that is, the power of a test is the probability of making the correct decision when the alternative hypothesis is true.
Thus the power is 1 − β.
P-Value
The p-value is the smallest significance level at which the null hypothesis can be rejected
given the observed data; it is not the probability that the null hypothesis is true.
We noted previously that reporting the results of a hypothesis test in terms of a P-value is
very useful because it conveys more information than just the simple statement "reject H0 "
or "fail to reject H0 ".
The p-value is a number between 0 and 1 that represents a probability.
The observed level of significance or p-value is the probability of obtaining a result as far
away from the expected value as the observation is, or farther, purely by chance, when the
null hypothesis is true.
Notice that a smaller observed level of significance provides stronger evidence against the
null hypothesis.
If this observed level of significance is small enough, we conclude that the null hypothesis
is not plausible.
In many instances we choose a critical level of significance before observations are made.
The most common choices for the critical level of significance are 10%, 5%, and 1%.
If the observed level of significance is smaller than a particular critical level of significance,
we say that the result is statistically significant at that level of significance.
If the observed level of significance is not smaller than the critical level of significance, we
say that the result is not statistically significant at that level of significance.
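
As a small illustration (the z values below are hypothetical), the observed level of
significance for a two-sided z-test can be computed directly from the standard normal cdf:

from scipy.stats import norm

# Hypothetical observed z statistics and their two-sided p-values.
for z in (1.50, 2.10, 3.00):
    p_value = 2 * (1 - norm.cdf(abs(z)))
    print(f"z = {z:.2f} -> p-value = {p_value:.4f}")
# The result is statistically significant at the 5% level whenever p-value < 0.05.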



Test of the Mean of a Normal Population when the Population Variance is Known

Assume that a random sample X1 , X2 , . . . , Xn has been taken from the population. Based
on our previous discussion, the sample mean X̄ is an unbiased point estimator of µ with
variance σ 2 /n.
1. The test of hypotheses:
H0 : µ = µ0 versus Ha : µ ≠ µ0
where µ0 is a specified constant.
2. The test statistic:
Zcal = (X̄ − µ0 )/(σ/√n)    (29)
3. If the null hypothesis H0 : µ = µ0 is true, E(X̄) = µ0 and Zcal ∼ N (0, 1), the standard
normal distribution. Hence the probability is 1 − α that the test statistic Zcal falls between
−Zα/2 and Zα/2 , where Zα/2 is the upper 100α/2 percentage point of the standard normal
distribution. That is,
Pr(−Zα/2 ≤ Zcal ≤ Zα/2 ) = 1 − α.
4. Rejection Region: Reject H0 if the observed value of the test statistic Zcal is either
Zcal > Zα/2 or Zcal < −Zα/2 .



Example
The Texas A & M agricultural extension service wants to determine whether the mean yield
per acre (in bushels) for a particular variety of soybeans has increased during the current
year over the mean yield in the previous 2 years when µ = 520 bushels per acre. The
research statement is that yield in the current year has increased above 520, to be tested
at the α = 0.025 significance level. Suppose we have decided to take a sample of n = 36
one-acre plots, and from these data we compute ȳ = 573 and S = 124. Can we conclude
that the mean yield for all farms is above 520?
Discussion: Assume that σ can be estimated by S.
1. The test of hypotheses:
H0 : µ ≤ 520 versus Ha : µ > 520
2. The test statistic:
Zcal = (ȳ − µ0 )/(S/√n) = (573 − 520)/(124/√36) = 53/20.67 = 2.56

3. Rejection region: Reject H0 if Zcal > Zα = Z0.025 = 1.96; here Zcal = 2.56 > 1.96.


4. Conclusion: We reject the null hypothesis in favor of the research hypothesis and
conclude that the average soybean yield per acre is greater than 520.
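The soybean calculation can be checked in a few lines of Python (a sketch using scipy's
normal distribution; the numbers are those of the example):

from math import sqrt
from scipy.stats import norm

n, ybar, mu0, s, alpha = 36, 573.0, 520.0, 124.0, 0.025
z_cal = (ybar - mu0) / (s / sqrt(n))    # 2.56, as computed by hand
z_crit = norm.ppf(1 - alpha)            # upper alpha point: 1.96
p_value = 1 - norm.cdf(z_cal)           # one-sided p-value, about 0.005
print(z_cal, z_crit, p_value)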
Summary of Statistical Test of Hypothesis of the Mean for Large n ≥ 30 (σ known)



Hypothesis Testing on the Mean of a Population with Unknown Variance σ²

The important point upon which the test procedure relies is that if X1 , X2 , . . . , Xn is a
random sample from a normal distribution with mean µ and unknown variance σ², then the
random variable
T = (X̄ − µ)/(S/√n)    (30)
has a t distribution with n − 1 degrees of freedom.
Based on our previous discussion, the sample mean X̄ is an unbiased point estimator of µ
with estimated standard error S/√n, and we have:
1. The test of hypotheses:
H0 : µ = µ0 versus Ha : µ ≠ µ0
where µ0 is a specified constant.
2. The test statistic:
Tc = (X̄ − µ0 )/(S/√n)    (31)
3. To control the Type I error probability at the desired level α, take the t percentage points
−tα/2, n−1 and tα/2, n−1 as the boundaries of the critical region, so that we reject
H0 : µ = µ0 if
Tc > tα/2, n−1 or Tc < −tα/2, n−1



Example
An airline wants to evaluate the depth perception of its pilots over the age of 50. A random
sample of n = 14 airline pilots over the age of 50 are asked to judge the distance between
two markers placed 20 feet apart at the opposite end of the laboratory. The sample data
listed here are the pilots’ error (recorded in feet) in judging the distance.
2.7, 2.4, 1.9, 2.6, 2.4, 1.9, 2.3,
2.2, 2.5, 2.3, 1.8, 2.5, 2.0, 2.2
Use the sample data to test the hypothesis that the average error in depth perception for
the company's pilots over the age of 50 is 2.00 feet, at the α = 0.05 significance level.
Discussion: The sample size n = 14 is small, so we assume the data are normally
distributed. Verify that X̄ = 2.26 and S = 0.28.
1. Hypothesis:
H0 : µ = 2.00 versus Ha : µ ≠ 2.00
2. Test statistic:
Tc = (X̄ − µ0 )/(S/√n) = (2.26 − 2.00)/(0.28/√14) = 0.26/0.0748 = 3.474
3. Critical region: From the t-distribution table, tα/2,n−1 = t0.025,13 = 2.16.
4. Conclusion: Since Tc = 3.474 > t0.025,13 = 2.16, H0 is rejected: the average error in
depth perception for the company's pilots over the age of 50 differs from 2.00 feet.
Exercise:
a. Compute the upper and lower one-sided tests at the same significance level.
b. Compute a 95% confidence interval on µ, the average error in depth perception for the
company’s pilots over the age of 50.
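
As a check, the worked example above can be reproduced with scipy.stats.ttest_1samp,
which performs exactly this two-sided one-sample t-test (a sketch; the output differs
slightly from the hand calculation because X̄ = 2.26 and S = 0.28 were rounded):

import numpy as np
from scipy.stats import ttest_1samp

errors = np.array([2.7, 2.4, 1.9, 2.6, 2.4, 1.9, 2.3,
                   2.2, 2.5, 2.3, 1.8, 2.5, 2.0, 2.2])
t_stat, p_value = ttest_1samp(errors, popmean=2.00)
print(t_stat, p_value)   # t about 3.5 from the raw data; p < 0.05, so H0 is rejected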



Hypothesis Testing on the Mean of a Population with Unknown Variance σ²

Figure: Critical regions for the two-sided alternative Ha : θ ≠ θ0 (a), the one-sided
alternative Ha : θ > θ0 (left), and the one-sided alternative Ha : θ < θ0 (right), for
Student-t distributed T.



Tests for a Population Variance, σ²

Let X1 , X2 , . . . , Xn be a random sample from a normal distribution with mean µ and
unknown variance σ². We have already mentioned that
(n − 1)S²/σ² ∼ χ²(n−1) ,
and the same pivotal quantity gives the confidence interval
( (n − 1)S²/χ²n−1,α/2 , (n − 1)S²/χ²n−1,1−α/2 ),
where S² = (1/(n − 1)) Σ (Xi − X̄)², the sum running over i = 1, . . . , n.
1. The test of hypotheses:
H0 : σ² = σ0² versus Ha : σ² ≠ σ0²
where σ0² is a specified constant population variance.
2. The test statistic:
χ²cal = (n − 1)S²/σ0²    (32)
3. To control the Type I error probability at the desired level α, take as the boundaries of
the critical region the χ² percentage points χ²n−1,1−α/2 and χ²n−1,α/2 (the lower and
upper α/2 points), so that we reject H0 : σ² = σ0² if
χ²cal > χ²n−1,α/2 or χ²cal < χ²n−1,1−α/2
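
A short sketch of this variance test in Python (the sample numbers are hypothetical,
chosen only to illustrate the mechanics):

from scipy.stats import chi2

# Hypothetical: test H0: sigma^2 = 0.01 vs Ha: sigma^2 != 0.01, n = 20, S^2 = 0.0153.
n, s2, sigma0_sq, alpha = 20, 0.0153, 0.01, 0.05
chi2_cal = (n - 1) * s2 / sigma0_sq          # 29.07
lower = chi2.ppf(alpha / 2, df=n - 1)        # lower alpha/2 point, about 8.91
upper = chi2.ppf(1 - alpha / 2, df=n - 1)    # upper alpha/2 point, about 32.85
print(chi2_cal, (lower, upper))
print("reject H0" if (chi2_cal < lower or chi2_cal > upper) else "fail to reject H0")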





Tests on a Population Proportion

Recall that a random sample of size n has been taken from a large (possibly infinite)
population and that X (≤ n) observations in this sample belong to a class of interest.
Then p̂ = X/n is a point estimator of the proportion p of the population that belongs to
this class. Note that n and p are the parameters of the binomial distribution of X, which
has mean np and variance np(1 − p); if p is not too close to either 0 or 1 and n is
relatively large, X is approximately normally distributed.
1. The test of hypotheses:
H0 : p = p0 versus Ha : p ≠ p0
where p is the binomial parameter, assuming that X ∼ N (np0 , np0 (1 − p0 )) approximately.
2. The test statistic:
Zcal = (X − np0 )/√(np0 (1 − p0 ))    (33)
3. Rejection Region: Reject H0 if the observed value of the test statistic Zcal is either
Zcal > Zα/2 or Zcal < −Zα/2 .



Example
A semiconductor manufacturer produces controllers used in automobile engine
applications. The customer requires that the process fallout or fraction defective at a
critical manufacturing step not exceed 0.05 and that the manufacturer demonstrate
process capability at this level of quality using α = 0.05. The semiconductor manufacturer
takes a random sample of 200 devices and finds that four of them are defective. Can the
manufacturer demonstrate process capability for the customer?
Discussion: X = 4, α = 0.05, n = 200 and p0 = 0.05.
1. The test of hypotheses:
H0 : p = 0.05 versus Ha : p < 0.05
2. Rejection Region: Reject H0 if the observed value of the test statistic satisfies
Zcal < −Zα = −Z0.05 = −1.645.
3. The test statistic:
Zcal = (X − np0 )/√(np0 (1 − p0 )) = (4 − 200(0.05))/√(200(0.05)(1 − 0.05)) = −1.95

4. Conclusion: Reject H0 since Zcal = −1.95 < −Zα = −Z0.05 = −1.645. We conclude
that the process is capable.
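
The semiconductor calculation can be verified directly (a sketch using the example's
numbers):

from math import sqrt
from scipy.stats import norm

x, n, p0, alpha = 4, 200, 0.05, 0.05
z_cal = (x - n * p0) / sqrt(n * p0 * (1 - p0))   # -1.95
z_crit = -norm.ppf(1 - alpha)                    # -1.645
p_value = norm.cdf(z_cal)                        # lower one-sided p-value, about 0.026
print(z_cal, z_crit, p_value)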



Hypothesis Tests for a Difference in Means of Two Normal Distributions, Variances Unknown
Let X11 , X12 , . . . , X1n1 be a random sample of n1 observations from the first population and
X21 , X22 , . . . , X2n2 be a random sample of n2 observations from the second population.
Let X̄1 , X̄2 , S1² and S2² be the sample means and sample variances, respectively.
 
The expected value of the difference in sample means is E(X̄1 − X̄2 ) = µ1 − µ2 , so
X̄1 − X̄2 is an unbiased estimator of the difference in means (Verify!).
We consider tests of hypotheses on the difference in means µ1 − µ2 of two normal
distributions where the variances σ1² and σ2² are unknown. A t-statistic will be used to
test these hypotheses.
Two different situations must be treated. In the first case, we assume that the variances of
the two normal distributions are unknown but equal; that is, σ1² = σ2² = σ². In the
second, we assume that σ1² and σ2² are unknown and not necessarily equal.
Case 1: σ1² = σ2² = σ²
1. The variance of X̄1 − X̄2 is
Var(X̄1 − X̄2 ) = σ1²/n1 + σ2²/n2 = σ²(1/n1 + 1/n2 )
2. The pooled estimator of σ², denoted by Sp², is defined by
Sp² = ((n1 − 1)S1² + (n2 − 1)S2²)/(n1 + n2 − 2)
3. The statistic
Tc = (X̄1 − X̄2 − (µ1 − µ2 ))/(Sp √(1/n1 + 1/n2 ))
has a t-distribution with n1 + n2 − 2 degrees of freedom.


Hypotheses Tests for a Difference in Means for Equal Variance

1. Test hypothesis: H0 : µ1 − µ2 = D0 versus Ha : µ1 − µ2 ≠ D0
2. Test statistic:
Tc = (X̄1 − X̄2 − D0 )/(Sp √(1/n1 + 1/n2 ))    (34)



Example
The summary statistics given below come from two catalyst types, with n = 8 samples of
each taken in the pilot plant, and are analyzed to determine how the catalysts affect the
mean yield of a chemical process. Specifically, the 1st catalyst is currently in use, and the
2nd catalyst would be acceptable if it does not change the yield.

Table: Catalyst Yield Data (summary statistics)

Catalyst 1: n1 = 8, X̄1 = 92.255, S1 = 2.39
Catalyst 2: n2 = 8, X̄2 = 92.733, S2 = 2.98
Example
A test is run in the pilot plant and results in the data shown in Table above. Is there any
difference between the mean yields? Use α = 0.05, and assume equal variances.
1. Test hypothesis: H0 : µ1 − µ2 = 0 versus Ha : µ1 − µ2 ≠ 0
2. Pooled variance:
Sp² = ((n1 − 1)S1² + (n2 − 1)S2²)/(n1 + n2 − 2) = (7(2.39²) + 7(2.98²))/(8 + 8 − 2) = 7.30
⇒ Sp = √7.30 = 2.70
3. Rejection region: Reject H0 if Tc > tα/2,14 = t0.025,14 = 2.145 or
Tc < −tα/2,14 = −t0.025,14 = −2.145
4. Test statistic:
Tc = (X̄1 − X̄2 − 0)/(Sp √(1/n1 + 1/n2 )) = (92.255 − 92.733)/(2.70 √(1/8 + 1/8)) = −0.35

5. Conclusion: Since −t0.025,14 = −2.145 < Tc = −0.35 < t0.025,14 = 2.145, H0 is not
rejected. That is, at the α = 0.05 level of significance, we do not have strong evidence to
conclude that catalyst 2 results in a mean yield that differs from the mean yield when
catalyst 1 is used.
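The catalyst example can be verified from the summary statistics alone (a sketch;
scipy.stats.ttest_ind would give the same result from the raw data, which the notes do
not list):

from math import sqrt
from scipy.stats import t

n1, xbar1, s1 = 8, 92.255, 2.39
n2, xbar2, s2 = 8, 92.733, 2.98
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)    # pooled variance, 7.30
t_cal = (xbar1 - xbar2) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))  # -0.35
t_crit = t.ppf(0.975, df=n1 + n2 - 2)                          # 2.145
print(t_cal, t_crit)   # |t_cal| < t_crit, so H0 is not rejected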
Tests on Two Population Proportions
Recall that random samples of sizes n1 and n2 have been taken from two large (possibly
infinite) populations and that X1 (≤ n1 ) and X2 (≤ n2 ) observations in these samples
belong to a class of interest.
Then p̂1 = X1 /n1 and p̂2 = X2 /n2 are point estimators of the population proportions p1
and p2 that belong to this class.
Note that ni and pi are the parameters of the binomial distribution of Xi , with mean ni pi
and variance ni pi (1 − pi ); if pi is not too close to either 0 or 1 and ni is relatively large,
Xi is approximately normally distributed.
1. The test of hypotheses:
H0 : p1 = p2 versus Ha : p1 ≠ p2
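The notes break off at this point. For completeness, here is a hedged sketch of the
standard large-sample two-proportion z-test; the pooled estimate p̂ = (X1 + X2)/(n1 + n2)
is the usual choice under H0 but is not spelled out in the source, and the counts below
are hypothetical.

from math import sqrt
from scipy.stats import norm

def two_proportion_z(x1, n1, x2, n2):
    """Large-sample z-test of H0: p1 = p2 versus Ha: p1 != p2."""
    p_pool = (x1 + x2) / (n1 + n2)    # pooled proportion estimate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, 2 * (1 - norm.cdf(abs(z)))    # statistic and two-sided p-value

# Hypothetical counts: 36 of 200 vs 24 of 200 items in the class of interest.
print(two_proportion_z(36, 200, 24, 200))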

