
Chapter 6 - Statistical Estimation

Unbiased estimators & Minimum variance,

Method of Moments (M.M.),

Maximum Likelihood estimators (M.L.E.)

Estimators

The objective of statistics is to make an inference about a population based on information contained in a sample. Since populations are characterized by numerical descriptive measures called parameters, the objective of many statistical investigations is to make an inference about one or more parameters. Estimation has many practical applications. For example, a notebook manufacturer might be interested in estimating the proportion p of notebooks that would fail prior to the expiration of the one-year guarantee period. Other important population parameters are the population mean (µ), population variance (σ²), and population standard deviation (σ). For example, if a factory makes a type of scale, potential customers might wish to estimate the mean measurement of an object with a standard (known) weight of 4 kg, as well as the standard deviation (or variance) of the measurements. To simplify our terminology, we will call the parameter of interest the target parameter θ.

Some Properties of Point Estimators

An estimator (normally in the form of an equation) is a rule that tells us how to calculate an estimate (a number) based on the measurements contained in a sample. It is possible to obtain many different estimators (rules) for a single population parameter. This should not be surprising. Ten engineers, each assigned to estimate the cost of construction of a new cruise terminal, would most likely arrive at different estimates. Such engineers, called estimators in the construction industry, use certain fixed guidelines plus intuition to arrive at their estimates. Each represents a unique, subjective rule for obtaining a single estimate. This brings us to the most important point: some estimators are good and some are bad. How do we define good or bad here?


Desired Properties of θ̂

If X1, X2, . . . , Xn represents a random sample to be taken from a population, then µ̂ = X̄, where X̄ is the sample mean. Its value is NOT FIXED. It may vary from sample to sample. Certainly its value is NOT identical to µ in general, since µ is a fixed quantity, although unknown to us.

We use µ̂ to stand for an estimator of µ or an estimate for µ.

To estimate the mean weight (µ) of the 7 million residents of HK, a sample of size 8 is taken. The results (in kg) are as follows: 71.2, 60, 55.3, 65.4, 32.7, 78.6, 68.8, 59.6. Estimate µ.

From the 8 data values we easily obtain µ̂ = X̄ = 61.45. The estimate µ̂ = X̄ = 61.45 uses ALL the information in the sample FAIRLY. There is no intention to overestimate or underestimate µ. X̄ is called an unbiased estimator of µ.

To appreciate this, suppose we adopt a rule of dropping the largest value and using the average of the remaining data as an estimate for µ. Then, say, for the above data

    µ̃ = (71.2 + 60 + 55.3 + 65.4 + 32.7 + 68.8 + 59.6)/7 = 59.

This is a downward biased estimate for µ. In the long run µ̃ underestimates µ and is BIASED towards light-weight people and BIASED against heavy people, thus is NOT fair.
Similarly, to estimate σ², we use

    σ̂² = S² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)²

instead of

    σ̂² = D² = (1/n) Σ_{i=1}^n (Xi − X̄)².

This is because D² gives a downward biased estimate for σ², while S² is an UNBIASED estimator.

Unbiasedness Property. Let θ̂ be a point estimator of a parameter θ. Then θ̂ is an unbiased estimator if E(θ̂) = θ. That is, we would like the estimates of θ to be clustered around its true value, which makes θ̂ a good estimator (in some sense).

Minimum Variance. In addition to unbiasedness, we would like the spread of the sampling
distribution of the estimator to be as small as possible. In other words, we want Var(θ̂) to be
a minimum.

Example 1:
(a) Let θ be the parameter of interest and let θ̂ be an estimator of that parameter. Select the
correct definition for an unbiased estimator from the following:

A : If E(θ̂) = θ, then θ is an unbiased estimator of θ̂.


B : If E(θ) = θ̂, then θ is an unbiased estimator of θ̂.
C : If E(θ̂) = θ, then θ̂ is an unbiased estimator of θ. *
D : If E(θ) = θ̂, then θ̂ is an unbiased estimator of θ.
E : None of these.

(b) Let Xi have mean µ and variance σ², for i = 1, 2, . . . , n. Prove that E(X̄) = µ.

Now Xi ∼ ?(µ, σ²) for i = 1, 2, . . . , n, and

    X̄ = (X1 + X2 + · · · + Xn)/n.

Hence

    E(X̄) = E[(X1 + X2 + · · · + Xn)/n] = (1/n)[E(X1) + E(X2) + · · · + E(Xn)]
          = (1/n)(µ + µ + · · · + µ) = (1/n)(nµ) = µ.

Thus, E(X̄) = µ. That is, X̄ is an unbiased estimator of µ.

Example 2:
(a) Let θ be the parameter of interest and let θ̂ be an estimator of θ. Which of the following is
the correct definition of an unbiased estimator?

(A) If E(θ) = θ̂, then θ is an unbiased estimator of θ̂.


(B) If E(θ̂) = θ̂, then θ̂ is an unbiased estimator of θ.
(C) If E(θ) = θ̂, then θ̂ is an unbiased estimator of θ.
(D) If E(θ̂) = θ, then θ is an unbiased estimator of θ̂.
(E) If E(θ̂) = θ, then θ̂ is an unbiased estimator of θ. *

(b) Let {X1, X2, X3} be a random sample taken from a population where the mean is µ and the variance is σ².

(i) Show that X̄ is an unbiased estimator of µ.

Now Xi ∼ ?(µ, σ²) for i = 1, 2, 3, and

    X̄ = (X1 + X2 + X3)/3.

    E(X̄) = E[(X1 + X2 + X3)/3] = (1/3)[E(X1) + E(X2) + E(X3)]
          = (1/3)(µ + µ + µ) = (1/3)(3µ) = µ.

Thus, E(X̄) = µ. That is, X̄ is an unbiased estimator of µ.

(ii) What is the variance of X̄?

    Var(X̄) = σ²/n = σ²/3.

(iii) Another estimator of µ is suggested, namely X̃ = (2X1)/5 + X2/5 + (2X3)/5. Show that X̃ is also an unbiased estimator of µ.

    E(X̃) = E[(2X1)/5 + X2/5 + (2X3)/5] = (2/5)E(X1) + (1/5)E(X2) + (2/5)E(X3)
          = 2µ/5 + µ/5 + 2µ/5 = 5µ/5 = µ.

Thus, E(X̃) = µ. That is, X̃ is also an unbiased estimator of µ.

(iv) Find the variance of X̃.

    Var(X̃) = Var[(2X1)/5 + X2/5 + (2X3)/5]
            = (4/25)Var(X1) + (1/25)Var(X2) + (4/25)Var(X3)
            = 4σ²/25 + σ²/25 + 4σ²/25
            = 9σ²/25.

(v) By comparing your answers in part (ii) and part (iv), which estimator of µ, X̄ or X̃, is better? Why?

    Var(X̄) = σ²/3 = 25σ²/75  and  Var(X̃) = 9σ²/25 = 27σ²/75.

Var(X̄) is smaller than Var(X̃).

Therefore, X̄ is the B.L.U.E. - the Best Linear Unbiased Estimator - since it is a U.M.V.E. - Unbiased Minimum Variance Estimator.
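To see these variance calculations numerically, here is a minimal Python sketch (assuming NumPy is available; the normal population with µ = 10 and σ = 2 is an arbitrary illustrative choice, while the sample size 3 and the weights 2/5, 1/5, 2/5 are taken from this example). It estimates the mean and variance of both estimators by simulation.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 10.0, 2.0                 # illustrative population parameters
    reps = 200_000                        # number of simulated samples of size 3

    samples = rng.normal(mu, sigma, size=(reps, 3))
    x_bar = samples.mean(axis=1)                                        # (X1 + X2 + X3)/3
    x_tilde = (2*samples[:, 0] + samples[:, 1] + 2*samples[:, 2]) / 5   # (2X1 + X2 + 2X3)/5

    print("mean of Xbar  ", x_bar.mean(),  " var of Xbar  ", x_bar.var(),  " (theory:", sigma**2/3, ")")
    print("mean of Xtilde", x_tilde.mean(), " var of Xtilde", x_tilde.var(), " (theory:", 9*sigma**2/25, ")")

Both estimators average to about 10 (unbiasedness), while the simulated variance of X̄ is close to σ²/3 and that of X̃ is close to 9σ²/25, matching the comparison above.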

Example 3:
Let {X1, X2, X3, X4} be a random sample from a population where the mean is µ and the variance is σ².

(i) Show that X̄ is an unbiased estimator of µ.

Now Xi ∼ ?(µ, σ²) for i = 1, 2, 3, 4, and

    X̄ = (X1 + X2 + X3 + X4)/4.

    E(X̄) = E[(X1 + X2 + X3 + X4)/4] = (1/4)[E(X1) + E(X2) + E(X3) + E(X4)]
          = (1/4)(µ + µ + µ + µ) = (1/4)(4µ) = µ.

Thus, E(X̄) = µ. That is, X̄ is an unbiased estimator of µ.

(ii) What is the variance of X̄?

    Var(X̄) = σ²/n = σ²/4.

(iii) Another estimator of µ is suggested, X̃ = (3X1)/8 + X2/8 + X3/8 + (3X4)/8. Show that X̃ is also an unbiased estimator of µ.

    E(X̃) = E[(3X1)/8 + X2/8 + X3/8 + (3X4)/8] = (3/8)E(X1) + (1/8)E(X2) + (1/8)E(X3) + (3/8)E(X4)
          = 3µ/8 + µ/8 + µ/8 + 3µ/8 = 8µ/8 = µ.

Thus, E(X̃) = µ. That is, X̃ is also an unbiased estimator of µ.

(iv) Find the variance of X̃.

    Var(X̃) = Var[(3X1)/8 + X2/8 + X3/8 + (3X4)/8]
            = (9/64)Var(X1) + (1/64)Var(X2) + (1/64)Var(X3) + (9/64)Var(X4)
            = 9σ²/64 + σ²/64 + σ²/64 + 9σ²/64 = 20σ²/64 = 5σ²/16.

(v) By comparing your answers in part (ii) and part (iv), which estimator of µ, X̄ or X̃, is better? Why?

    Var(X̄) = σ²/4 = 4σ²/16  and  Var(X̃) = 5σ²/16.

Var(X̄) is smaller than Var(X̃).

Therefore, X̄ is the B.L.U.E. - the Best Linear Unbiased Estimator - since it is a U.M.V.E. - Unbiased Minimum Variance Estimator.

Example 4: Show that S² is an unbiased estimator of σ². Show that D² is biased. (Assume a normal population.)

    Z = (Xi − µ)/σ is a standard normal random variable,
    Z² = (Xi − µ)²/σ² ∼ χ²_1 . . . (1)
    Σ_{i=1}^n (Xi − µ)²/σ² ∼ χ²_n . . . (2)
    Σ_{i=1}^n (Xi − X̄)²/σ² ∼ χ²_{n−1} . . . (3)
    Σ_{i=1}^n (Xi − X̄)² ∼ σ² χ²_{n−1}  ******

Hence

    S² = Σ_{i=1}^n (Xi − X̄)²/(n − 1) ∼ σ² χ²_{n−1}/(n − 1),

    E(S²) = E[Σ_{i=1}^n (Xi − X̄)²/(n − 1)] = E[σ² χ²_{n−1}/(n − 1)]
          = [σ²/(n − 1)] E(χ²_{n−1}) = [σ²/(n − 1)] × (n − 1) . . . (4)
          = σ².

Therefore S² is an unbiased estimator of σ².



Notes:

(1) If you square a standard normal random variable, you get a random variable that follows
a Chi-squared distribution with 1 degree of freedom (df).

(2) If you add up n independent Chi-squared random variables (with 1 df each) you get a
random variable that follows a Chi-squared distribution with n degrees of freedom.

(3) If you replace a parameter with its estimate you lose 1 degree of freedom.

(4) The mean of a Chi-squared random variable is equal to the degrees of freedom of the
distribution.

Starting again from ******,

    D² = Σ_{i=1}^n (Xi − X̄)²/n ∼ σ² χ²_{n−1}/n,

    E(D²) = E[Σ_{i=1}^n (Xi − X̄)²/n] = E[σ² χ²_{n−1}/n]
          = (σ²/n) E(χ²_{n−1}) = (σ²/n) × (n − 1) . . . (4)
          = [(n − 1)/n] σ²
          ≠ σ².

Therefore, D² is NOT an unbiased estimator of σ².

Note: S is NOT an unbiased estimator of σ.

A more formal proof is given below.



    E[Σ_{i=1}^n (Xi − X̄)²] = E[Σ_{i=1}^n Xi² − nX̄²]
      = E[Σ_{i=1}^n Xi² − (1/n)(X1 + X2 + · · · + Xn)²]
      = E[Σ_{i=1}^n Xi² − (1/n) Σ_{i=1}^n Σ_{j=1}^n Xi Xj]
      = E[Σ_{i=1}^n Xi² − (1/n) Σ_{i=1}^n Xi² − (2/n) Σ_{i<j} Xi Xj]
      = [(n − 1)/n] Σ_{i=1}^n E(Xi²) − (2/n) Σ_{i<j} E(Xi Xj)
      = (n − 1)(µ² + σ²) − (2/n) · [n(n − 1)/2] · E(Xi)E(Xj)
      = (n − 1)(µ² + σ²) − (n − 1)µ²
      = (n − 1)σ².
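As a quick numerical check of this result, here is a minimal Python sketch (assuming NumPy; the normal population and the choices n = 5 and σ² = 4 are purely illustrative). It compares the long-run averages of S² and D² over many simulated samples.

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma2, n = 0.0, 4.0, 5          # illustrative population and sample size
    reps = 200_000

    x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
    s2 = x.var(axis=1, ddof=1)           # S^2: divide by n - 1
    d2 = x.var(axis=1, ddof=0)           # D^2: divide by n

    print("average S^2:", s2.mean(), " (target:", sigma2, ")")
    print("average D^2:", d2.mean(), " (target:", (n - 1) / n * sigma2, ")")

The average of S² comes out close to σ² = 4, while the average of D² is close to (n − 1)σ²/n = 3.2, illustrating the downward bias.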

Some Commonly Used Unbiased Point Estimators

    Target parameter (θ)   Sample size(s)   Point estimator (θ̂)   Var(θ̂)

    µ                      n                X̄                      σ²/n
    p                      n                p̂                      pq/n
    µ1 − µ2                n1, n2           X̄1 − X̄2                σ1²/n1 + σ2²/n2
    p1 − p2                n1, n2           p̂1 − p̂2                p1 q1/n1 + p2 q2/n2
    σ²                     n                S²                     2σ⁴/(n − 1)

How to Construct Estimators of an Unknown Parameter?

We will briefly discuss two important estimation methods (techniques) in statistics here, namely
the Method of Moments and the Maximum Likelihood Estimation Method.

Method of Moments (M.M.)


The Method of Moments is a very simple procedure for finding an estimator of one or
more parameters. The method of moments involves equating sample moments with theoretical
moments. So, let’s start by making sure we recall the definitions of theoretical moments, as
well as learn the definitions of sample moments.

Definitions:

(1) E(X^k) is the k-th (theoretical) raw moment of the random variable X (about the origin), for k = 1, 2, . . .

(2) E[(X − µ)^k] is the k-th (theoretical) central moment of X (about the mean), for k = 1, 2, . . .

The basic idea behind this form of the method is to:

(1) Equate the first sample moment about the origin, M1 = (1/n) Σ_{i=1}^n Xi = X̄, to the first theoretical moment E(X).

(2) Equate the second sample moment about the origin, M2 = (1/n) Σ_{i=1}^n Xi², to the second theoretical moment E(X²).

(3) Continue equating sample moments about the origin, Mk, with the corresponding theoretical moments E(X^k), k = 3, 4, . . ., until you have the same number of equations and unknown parameters.

(4) Solve for the unknown parameters.

The resulting estimators are called method of moments estimators (or simply moment estimators).

Example 5
Let X1, X2, . . . , Xn be Bernoulli random variables with parameter p. What is the method of moments estimator of p?

Solution: For X ∼ Bin(n = 1, p), the first theoretical moment about the origin is E(X) = np = p, since n = 1.

We have just one parameter for which we are trying to derive the method of moments estimator. Therefore, we need just one equation. Equating the first theoretical moment about the origin with the corresponding sample moment, we get p = X̄.

We just need to put a "hat" on the parameter to make it clear that it is an estimator. We can also subscript the estimator with "MM" to indicate that it is the method of moments estimator:

    p̂_MM = X̄.

Example 6
Let X1, X2, . . . , Xn be normal random variables with mean µ and variance σ². What are the method of moments estimators of the mean, µ, and variance, σ²?

Solution: Here X ∼ N(µ, σ²). The first and second theoretical moments about the origin are E(X) = µ and E(X²) = σ² + µ².

In this case, we have two parameters for which we are trying to derive method of moments estimators. Therefore, we need two equations here. Equating the first theoretical moment about the origin with the corresponding sample moment, we get

    µ̂_MM = X̄.

And, equating the second theoretical moment about the origin with the corresponding sample moment, we get

    E(X²) = σ² + µ² = (1/n) Σ_{i=1}^n Xi².

Now, the first equation tells us that the method of moments estimator of the mean µ is the sample mean. And, substituting the sample mean in for µ in the second equation and solving for σ², we get that the method of moments estimator of the variance σ² is

    σ̂²_MM = (1/n) Σ_{i=1}^n Xi² − X̄² = (1/n) Σ_{i=1}^n (Xi − X̄)².
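As a quick illustration, here is a minimal Python sketch of these two formulas (assuming NumPy; the data are simulated purely for illustration, with true µ = 5 and σ² = 9).

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(loc=5.0, scale=3.0, size=1000)   # simulated sample

    mu_mm = x.mean()                                # first sample moment
    sigma2_mm = (x**2).mean() - x.mean()**2         # second sample moment minus (first moment)^2

    print("mu_MM     =", mu_mm)
    print("sigma2_MM =", sigma2_mm)                 # same value as np.var(x, ddof=0)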

Example 7
Let X1, X2, . . . , Xn be gamma random variables with parameters α and λ. What are the method of moments estimators of α and λ?

Solution: The probability density function is f(x) = [λ^α/Γ(α)] x^{α−1} e^{−λx} for x > 0, with

    E(X) = α/λ  and  Var(X) = α/λ².

Thus, α/λ = X̄, or equivalently α̂_MM = λX̄.

Substitute this into E(X²) = Var(X) + [E(X)]² and solve for λ:

    E(X²) = α/λ² + α²/λ²,
    (1/n) Σ_{i=1}^n Xi² = X̄/λ + X̄²,
    X̄/λ = (1/n) Σ_{i=1}^n Xi² − X̄² = (Σ_{i=1}^n Xi² − nX̄²)/n,

    λ̂_MM = nX̄ / (Σ_{i=1}^n Xi² − nX̄²).
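A short Python sketch of these formulas (minimal, assuming NumPy; the shape and rate values used to simulate the data are illustrative only):

    import numpy as np

    rng = np.random.default_rng(3)
    alpha_true, lam_true = 2.0, 0.5
    # NumPy's gamma generator uses shape and scale = 1/rate
    x = rng.gamma(shape=alpha_true, scale=1.0/lam_true, size=2000)

    xbar = x.mean()
    m2 = (x**2).mean()                    # second sample moment about the origin

    lam_mm = xbar / (m2 - xbar**2)        # n*xbar / (sum Xi^2 - n*xbar^2), written with averages
    alpha_mm = lam_mm * xbar              # alpha_hat = lambda_hat * xbar

    print("alpha_MM =", alpha_mm, "  lambda_MM =", lam_mm)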

Example 8
Sometimes M.M.E.'s (method of moments estimators, or moment estimators) are meaningless. Consider the following example.

A random variable X follows the uniform distribution Uni(0, θ), and we obtained some realized values of X as x1 = 2, x2 = 3, x3 = 5 and x4 = 12.

We know that

    E(X) = θ/2,

so the M.M.E. of θ is

    θ̂_MM = 2X̄.

The value of X̄ = (2 + 3 + 5 + 12)/4 = 5.5, so the estimate for θ in this example is

    θ̂_MM = 2(5.5) = 11.

However, this estimate for θ, which is the upper bound of the uniform distribution, is clearly not acceptable because the realized value x4 = 12 has already exceeded this estimated "bound".
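A quick check of this in Python (a minimal sketch; only the four observed values above are used):

    data = [2, 3, 5, 12]
    theta_mm = 2 * sum(data) / len(data)                 # 2 * 5.5 = 11.0
    print(theta_mm, max(data), theta_mm < max(data))     # the MM "upper bound" sits below an observed value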

Let’s look at another method for finding estimators of parameters.



Maximum Likelihood Estimation (M.L.E.)

Introduction

In 1922, the famous English statistician Sir Ronald A. Fisher proposed a new method of parameter estimation, namely maximum likelihood estimation (M.L.E.).

The idea is to find the value of the unknown parameter(s) under which the data actually observed would be the most likely outcome. Calculus is used to maximize this "likelihood".

Suppose X1, X2, . . . , Xn have joint density function f(x1, . . . , xn; θ), where θ is the parameter of the joint probability distribution.

Given observed values X1 = x1, X2 = x2, . . . , Xn = xn, the likelihood of θ is the function

    L(θ) = L(θ; x1, . . . , xn) = f(x1, . . . , xn; θ),

which is the probability of observing the given data as a function of θ. The maximum likelihood estimate (M.L.E.) of θ, namely θ̂, is the value of θ that maximizes L(θ) [or equivalently ℓ(θ) = ln L(θ)]: it is the value that makes the observed data the most probable. If the Xi's are i.i.d., then the likelihood function simplifies to

    L(θ) = Π_{i=1}^n f(xi; θ),

and the task is to maximize

    ℓ(θ) = ln L(θ) = Σ_{i=1}^n ln f(xi; θ)

with respect to θ.
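In practice the maximization can also be done numerically. The sketch below is a minimal example (assuming NumPy and SciPy are available; the exponential model and the simulated data are illustrative only). It minimizes the negative log-likelihood and recovers the closed-form answer λ̂ = 1/X̄ derived later in this chapter.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(4)
    x = rng.exponential(scale=2.0, size=500)      # data simulated from Exp(rate = 0.5)

    def neg_log_lik(lam):
        # Exp(lambda) log-likelihood: n*ln(lambda) - lambda*sum(x); return its negative
        if lam <= 0:
            return np.inf
        return -(len(x) * np.log(lam) - lam * x.sum())

    res = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), method="bounded")
    print("numerical MLE:", res.x, "  closed form 1/xbar:", 1.0 / x.mean())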

Summary of M.L.E. in Common Distributions

    Distribution   Support                 Parameter(s)   MLE for single   MLE                               Same as   Unbiased?
    of X                                   concerned      observation                                        MME?

    Bin(n, p)      {0, 1, 2, . . . , n}    p              X/n              X̄/n                               Yes       Yes
    Poi(λ)         {0, 1, 2, . . .}        λ              X                X̄                                 Yes       Yes
    Geo(p)         {1, 2, 3, . . .}        p              1/X              1/X̄                               Yes       No*
    Geo(p)         {0, 1, 2, . . .}        p              1/(X + 1)        1/(X̄ + 1)                         Yes       No
    NegBin(r, p)   {r, r+1, r+2, . . .}    p              r/X              r/X̄                               Yes       No
    Exp(λ)         R+                      λ              1/X              1/X̄                               Yes       No
    Gam(α, λ)      R+                      λ              α/X              α/X̄                               Yes       No#
    N(µ, σ²)       R                       µ              X                X̄                                 Yes       Yes
    N(µ, σ²)       R                       σ²             (X − µ)²         (1/n) Σ_{i=1}^n (Xi − µ)²         N/A       Yes
    N(µ, σ²)       R                       µ, σ²          N/A              X̄ and (1/n) Σ_{i=1}^n (Xi − X̄)²   Yes, Yes  Yes, No
    U(0, θ)        (0, θ)                  θ              X                X_(n)                             No        No^

* For X ∼ Geo(p),

    E(p̂_ML) = E(1/X)
            = Σ_{x=1}^∞ (1/x) (1 − p)^{x−1} p
            = [p/(1 − p)] Σ_{x=1}^∞ (1 − p)^x / x
            = [p/(1 − p)] Σ_{x=1}^∞ ∫_p^1 (1 − t)^{x−1} dt
            = [p/(1 − p)] ∫_p^1 Σ_{x=1}^∞ (1 − t)^{x−1} dt
            = [p/(1 − p)] ∫_p^1 (1/t) dt
            = [p/(1 − p)] [ln|t|]_p^1
            = −p ln p / (1 − p) ≠ p.

# For X ∼ Gam(α, λ) with α > 1,

    E(λ̂_ML) = E(α/X)
            = ∫_0^∞ (α/x) [λ^α/Γ(α)] x^{α−1} e^{−λx} dx
            = ∫_0^∞ [α/(α − 1)] λ [λ^{α−1}/Γ(α − 1)] x^{α−2} e^{−λx} dx
            = [αλ/(α − 1)] ∫_0^∞ [λ^{α−1}/Γ(α − 1)] x^{α−2} e^{−λx} dx
            = αλ/(α − 1) ≠ λ,

since the last integrand is the Gam(α − 1, λ) density and integrates to 1.

It can also be seen that when α = 1, i.e., X ∼ Gam(1, λ) ≡ Exp(λ), E(λ̂_ML) does not exist at all.

^ For X ∼ Uni(0, θ), the maximum of n observations, X_(n), has pdf

    f(x) = (d/dx) P(X_(n) ≤ x)
         = (d/dx) P{max(X1, X2, . . . , Xn) ≤ x}
         = (d/dx) P(X1 ≤ x ∩ X2 ≤ x ∩ · · · ∩ Xn ≤ x)
         = (d/dx) [P(X1 ≤ x) P(X2 ≤ x) · · · P(Xn ≤ x)]
         = (d/dx) (x/θ)^n
         = n x^{n−1}/θ^n,  for 0 < x < θ,

and zero otherwise. Therefore,

    E[X_(n)] = ∫_0^θ x · (n x^{n−1}/θ^n) dx
             = (n/θ^n) ∫_0^θ x^n dx
             = (n/θ^n) [x^{n+1}/(n + 1)]_0^θ
             = nθ/(n + 1) ≠ θ.

Binomial distribution

Suppose that X ∼ Bin(n, p) with unknown parameter p; the pmf is

    p(x) = (n choose x) p^x (1 − p)^{n−x},  for x = 0, 1, 2, . . . , n.

The likelihood function is

    L(p; x) = (n choose x) p^x (1 − p)^{n−x}.

The log-likelihood function is

    ℓ(p; x) = ln L(p; x) = ln(n choose x) + x ln p + (n − x) ln(1 − p).

Then,

    ∂ℓ/∂p = 0  ⟹  x/p − (n − x)/(1 − p) = 0  ⟹  p = x/n.

The MLE of p is

    p̂_ML = X/n.

Note: MLE and MME are the same in this case because E(X) = np.

Remark:
With m observations, the likelihood function is

    L(p; x) = Π_{i=1}^m L(p; xi) = Π_{i=1}^m (n choose xi) p^{xi} (1 − p)^{n−xi}
            = [Π_{i=1}^m (n choose xi)] × p^{Σ_{i=1}^m xi} (1 − p)^{Σ_{i=1}^m (n−xi)}.

The log-likelihood function is

    ℓ(p; x) = ln L(p; x) = Σ_{i=1}^m ln(n choose xi) + (Σ_{i=1}^m xi) ln p + [Σ_{i=1}^m (n − xi)] ln(1 − p).

Then,

    ∂ℓ/∂p = 0  ⟹  (Σ_{i=1}^m xi)/p − (mn − Σ_{i=1}^m xi)/(1 − p) = 0
           ⟹  mx̄/p − (mn − mx̄)/(1 − p) = 0
           ⟹  p = x̄/n.

The MLE of p is

    p̂_ML = X̄/n.
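For concreteness, a minimal Python sketch of the m-observation estimator (the counts and the value of n below are made up purely for illustration):

    x = [3, 5, 4, 6, 2]                 # m = 5 hypothetical observed counts, each from Bin(n, p)
    n = 10
    p_ml = sum(x) / (len(x) * n)        # equivalently xbar / n
    print(p_ml)                         # 20 / 50 = 0.4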

Poisson distribution

Suppose that X ∼ Poi(λ) with unknown parameter λ; the pmf is

    p(x) = e^{−λ} λ^x / x!,  for x = 0, 1, 2, . . .

The likelihood function is

    L(λ; x) = e^{−λ} λ^x / x!.

The log-likelihood function is

    ℓ(λ; x) = ln L(λ; x) = −λ + x ln λ − ln x!.

Then,

    ∂ℓ/∂λ = 0  ⟹  −1 + x/λ = 0  ⟹  λ = x.

The MLE of λ is

    λ̂_ML = X.

Note: MLE and MME are the same in this case because E(X) = λ.

Geometric distribution

Suppose that X ∼ Geo(p), where p is an unknown parameter, with pmf

    p(x) = (1 − p)^{x−1} p,  for x = 1, 2, 3, . . .

[Here X is the number of trial(s) until the first success.]

The likelihood function is

    L(p; x) = (1 − p)^{x−1} p.

The log-likelihood function is

    ℓ(p; x) = ln L(p; x) = (x − 1) ln(1 − p) + ln p.

Then,

    ∂ℓ/∂p = 0  ⟹  −(x − 1)/(1 − p) + 1/p = 0  ⟹  p = 1/x.

The MLE of p is

    p̂_ML = 1/X.

Note: MLE and MME are the same in this case because E(X) = 1/p.

Negative Binomial distribution

Suppose that X ∼ NegBin(r, p), where p is an unknown parameter, with pmf

    p(x) = (x−1 choose r−1) p^r (1 − p)^{x−r},  for x = r, r + 1, r + 2, . . .

[Here X is the number of trial(s) until the r-th success.]

The likelihood function is

    L(p; x) = (x−1 choose r−1) p^r (1 − p)^{x−r}.

The log-likelihood function is

    ℓ(p; x) = ln L(p; x) = ln(x−1 choose r−1) + r ln p + (x − r) ln(1 − p).

Then,

    ∂ℓ/∂p = 0  ⟹  r/p − (x − r)/(1 − p) = 0  ⟹  p = r/x.

The MLE of p is

    p̂_ML = r/X.

Note: MLE and MME are the same in this case because E(X) = r/p.

Exponential distribution

Suppose that X ∼ Exp(λ) with unknown parameter λ; the pdf is

    f(x) = λ e^{−λx},  for x > 0.

The likelihood function is

    L(λ; x) = λ e^{−λx}.

The log-likelihood function is

    ℓ(λ; x) = ln L(λ; x) = ln λ − λx.

Then,

    ∂ℓ/∂λ = 0  ⟹  1/λ − x = 0  ⟹  λ = 1/x.

The MLE of λ is

    λ̂_ML = 1/X.

Note: MLE and MME are the same in this case because E(X) = 1/λ.

Gamma distribution with known shape parameter α

Suppose that X ∼ Gam(α, λ) with unknown rate parameter λ; the pdf is

    f(x) = [λ^α/Γ(α)] x^{α−1} e^{−λx}  for x > 0, and 0 otherwise.

The likelihood function is

    L(α, λ; x) = [λ^α/Γ(α)] x^{α−1} e^{−λx}.

The log-likelihood function is

    ℓ(α, λ; x) = ln L(α, λ; x) = α ln λ − ln Γ(α) + (α − 1) ln x − λx.

Then,

    ∂ℓ/∂λ = 0  ⟹  α/λ − x = 0  ⟹  λ = α/x.

The MLE of λ is

    λ̂_ML = α/X.

Note: MLE and MME are the same in this case because E(X) = α/λ.

Remark: For a Gam(α, λ) distribution with known rate parameter λ but unknown shape parameter α, the MLE of α has no closed form.

Normal distribution with known variance parameter σ²

Suppose that X ∼ N(µ, σ²) with unknown mean parameter µ; the pdf is

    f(x) = [1/√(2πσ²)] e^{−(x−µ)²/(2σ²)},  for −∞ < x < ∞.

The likelihood function is

    L(µ, σ²; x) = [1/√(2πσ²)] e^{−(x−µ)²/(2σ²)}.

The log-likelihood function is

    ℓ(µ, σ²; x) = ln L(µ, σ²; x) = −(1/2) ln(2πσ²) − (x − µ)²/(2σ²).

Then,

    ∂ℓ/∂µ = 0  ⟹  (x − µ)/σ² = 0  ⟹  µ = x.

The MLE of µ is

    µ̂_ML = X.

Note: MLE and MME are the same in this case because E(X) = µ.

Normal distribution with known mean parameter µ

Suppose that X ∼ N(µ, σ²) with unknown variance parameter σ² (not σ); the pdf is

    f(x) = [1/√(2πσ²)] e^{−(x−µ)²/(2σ²)},  for −∞ < x < ∞.

The likelihood function is

    L(µ, σ²; x) = [1/√(2πσ²)] e^{−(x−µ)²/(2σ²)}.

The log-likelihood function is

    ℓ(µ, σ²; x) = ln L(µ, σ²; x) = −(1/2) ln(2π) − (1/2) ln(σ²) − (x − µ)²/(2σ²).

Then,

    ∂ℓ/∂σ² = 0  ⟹  −1/(2σ²) + (x − µ)²/[2(σ²)²] = 0  ⟹  σ² = (x − µ)².

The MLE of σ² is

    σ̂²_ML = (X − µ)².

Remark:
With n observations, the likelihood function is

    L(µ, σ²; x) = Π_{i=1}^n L(µ, σ²; xi) = Π_{i=1}^n [1/√(2πσ²)] e^{−(xi−µ)²/(2σ²)}
                = [1/(2πσ²)]^{n/2} e^{−(1/(2σ²)) Σ_{i=1}^n (xi−µ)²}.

The log-likelihood function is

    ℓ(µ, σ²; x) = ln L(µ, σ²; x) = −(n/2) ln(2π) − (n/2) ln(σ²) − [1/(2σ²)] Σ_{i=1}^n (xi − µ)².

Then,

    ∂ℓ/∂σ² = 0  ⟹  −n/(2σ²) + [1/(2(σ²)²)] Σ_{i=1}^n (xi − µ)² = 0  ⟹  σ² = (1/n) Σ_{i=1}^n (xi − µ)².

The MLE of σ² is

    σ̂²_ML = (1/n) Σ_{i=1}^n (Xi − µ)².

Normal distribution with no known parameters

Suppose that X ∼ N(µ, σ²) with unknown mean parameter µ and unknown variance parameter σ² (not σ); the pdf is

    f(x) = [1/√(2πσ²)] e^{−(x−µ)²/(2σ²)},  for −∞ < x < ∞.

With n observations, the likelihood function is

    L(µ, σ²; x) = Π_{i=1}^n L(µ, σ²; xi) = Π_{i=1}^n [1/√(2πσ²)] e^{−(xi−µ)²/(2σ²)}
                = [1/(2πσ²)]^{n/2} e^{−(1/(2σ²)) Σ_{i=1}^n (xi−µ)²}.

The log-likelihood function is

    ℓ(µ, σ²; x) = ln L(µ, σ²; x) = −(n/2) ln(2π) − (n/2) ln(σ²) − [1/(2σ²)] Σ_{i=1}^n (xi − µ)².

For the MLE of µ, consider

    ∂ℓ/∂µ = 0  ⟹  (1/σ²) Σ_{i=1}^n (xi − µ) = 0  ⟹  µ = (1/n) Σ_{i=1}^n xi = x̄.

The MLE of µ is

    µ̂_ML = X̄.

For the MLE of σ², consider

    ∂ℓ/∂σ² = 0  ⟹  −n/(2σ²) + [1/(2(σ²)²)] Σ_{i=1}^n (xi − µ)² = 0  ⟹  σ² = (1/n) Σ_{i=1}^n (xi − µ)².

The MLE of σ² is

    σ̂²_ML = (1/n) Σ_{i=1}^n (Xi − µ)².

Here µ is unknown, so it is substituted by its MLE µ̂_ML = X̄, that is,

    σ̂²_ML = (1/n) Σ_{i=1}^n (Xi − X̄)² = [(n − 1)/n] · [1/(n − 1)] Σ_{i=1}^n (Xi − X̄)² = [(n − 1)/n] S²,

where S² is the sample variance.

Note 1: MLE and MME are the same in this case because

    E(X) = µ,
    E(X²) = σ² + µ²
    ⟹  σ² = E(X²) − [E(X)]²
    ⟹  σ̂²_MM = m′₂ − (m′₁)²,

where m′₁ = (1/n) Σ_{i=1}^n Xi and m′₂ = (1/n) Σ_{i=1}^n Xi² are the first and second sample moments about the origin, and

    σ̂²_ML = (1/n) Σ_{i=1}^n (Xi − X̄)²
           = (1/n) Σ_{i=1}^n (Xi² − 2Xi X̄ + X̄²)
           = (1/n) [Σ_{i=1}^n Xi² − 2X̄ Σ_{i=1}^n Xi + nX̄²]
           = (1/n) [Σ_{i=1}^n Xi² − 2X̄ · nX̄ + nX̄²]
           = (1/n) [Σ_{i=1}^n Xi² − nX̄²]
           = (1/n) Σ_{i=1}^n Xi² − X̄²
           = m′₂ − (m′₁)².

Note 2: These estimators are not unbiased, as

    E(σ̂²_ML) = E(σ̂²_MM) = E([(n − 1)/n] S²) = [(n − 1)/n] σ² ≠ σ².
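In NumPy these two versions of the variance estimate correspond to the ddof argument; a minimal sketch (the simulated data are illustrative only):

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.normal(loc=1.0, scale=2.0, size=50)

    sigma2_ml = np.var(x, ddof=0)        # MLE / MME: divide by n
    s2 = np.var(x, ddof=1)               # sample variance S^2: divide by n - 1
    print(sigma2_ml, s2, (len(x) - 1) / len(x) * s2)   # the last value equals sigma2_ml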

Uniform distribution

Suppose that X ∼ Uni(0, θ) with unknown parameter θ; the pdf is

    f(x) = 1/θ  for 0 < x < θ, and 0 otherwise.

With n observations, the likelihood function is

    L(θ; x) = Π_{i=1}^n L(θ; xi) = Π_{i=1}^n (1/θ) = 1/θ^n,

which is a decreasing function of θ.

That is, the smaller θ is, the larger the likelihood function is. Therefore, we look for the smallest value of θ that is compatible with the n observations. Since all the observations of X (i.e., x1, x2, . . . , xn) must lie between 0 and θ, the smallest admissible value of θ is the maximum of these n observations. Thus, the MLE of θ is

    θ̂_ML = max(X1, X2, . . . , Xn) = X_(n),

that is, the n-th order statistic of X, or the largest observation in the sample.
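Revisiting the data of Example 8 with both estimators, a minimal Python sketch:

    data = [2, 3, 5, 12]
    theta_mm = 2 * sum(data) / len(data)   # 11.0, smaller than the largest observation
    theta_ml = max(data)                   # 12, always compatible with the data
    print(theta_mm, theta_ml)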

Exercises

Exercise 1: Let {X1, X2, . . . , Xn} be a random sample from a distribution with pdf

    f(x) = θ x^{θ−1},  for 0 < x < 1,

and zero otherwise, where θ is the unknown parameter.

(a) Find the maximum likelihood estimator of θ.

(b) Use the result in part (a) to estimate θ from the following observed sample:

    {0.718, 0.571, 0.662, 0.975, 0.746, 0.979, 0.429, 0.509, 0.876, 0.666}

Solution:

Likelihood function:

    L(θ; x) = Π_{i=1}^n θ xi^{θ−1} = θ^n (Π_{i=1}^n xi)^{θ−1}.

Log-likelihood function:

    ℓ(θ; x) = ln L(θ; x) = n ln θ + (θ − 1) Σ_{i=1}^n ln xi.

Consider

    ∂ℓ(θ; x)/∂θ = 0  ⟹  n/θ + Σ_{i=1}^n ln xi = 0  ⟹  θ = −n (Σ_{i=1}^n ln xi)^{−1}.

The log-likelihood function attains its maximum at θ = −n (Σ_{i=1}^n ln xi)^{−1}, as the second derivative ∂²ℓ(θ; x)/∂θ² = −n/θ² < 0.

Therefore the MLE of θ is given by

    θ̂_ML = −n (Σ_{i=1}^n ln Xi)^{−1} = −[(1/n) Σ_{i=1}^n ln Xi]^{−1}.

Note that

    (1/10) Σ_{i=1}^{10} ln xi = −0.3704.

The maximum likelihood estimate of θ is

    θ̂_ML = −1/(−0.3704) = 2.7000.
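The same computation in Python (a minimal sketch using only the ten observations above):

    import math

    x = [0.718, 0.571, 0.662, 0.975, 0.746, 0.979, 0.429, 0.509, 0.876, 0.666]
    mean_log = sum(math.log(v) for v in x) / len(x)    # about -0.3704
    theta_ml = -1 / mean_log                           # about 2.70
    print(mean_log, theta_ml)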

Exercise 2: Eddy's budgerigar, Marigold, chirps for X minutes until she is fed. The probability density function of X is given as follows:

    f(x) = Cx e^{−λx}  for x > 0, and 0 otherwise,

where C and λ are two positive constants.

(a) Express C in terms of λ.

(b) Determine E(X) and E(1/X) in terms of λ.

(c) Show that the moment estimator of λ is

    λ̂ = 2/X̄.

(d) Show that the maximum likelihood estimator of λ is the same as its moment estimator.

(e) Eddy records the chirping time (in minutes) of Marigold for 20 days as follows:

    2, 5, 5, 7, 8, 8, 9, 12, 12, 12, 12, 12, 15, 16, 18, 18, 18, 18, 18, 25

(i) Estimate the parameter λ and write down the corresponding variance of X.

(ii) Eddy makes 5 more observations and re-estimates λ using all the observed data. He finds that with the new estimate the standard deviation of the chirping time is 8 minutes. Determine the average chirping time of the 5 new observations.

Solution: Eddy's budgerigar, Marigold, chirps for X minutes until she is fed. The probability density function of X is given as follows:

    f(x) = Cx e^{−λx}  for x > 0, and 0 otherwise,

where C and λ are two positive constants.

(a) Express C in terms of λ.

Checking your formula sheet, X follows a gamma distribution. A gamma distribution Γ(n, λ) has density

    f(x) = [λ^n/Γ(n)] x^{n−1} e^{−λx}  for x > 0, and 0 otherwise.

Therefore, by comparing, we have n = 2 and X ∼ Γ(2, λ). Thus,

    C = λ²/Γ(2) = λ²/1! = λ².

(b) Determine E(X) and E(1/X) in terms of λ.

Check your formula sheet to get

    E(X) = n/λ = 2/λ.

On the other hand,

    E(1/X) = ∫_0^∞ (1/x) f(x) dx
           = ∫_0^∞ (1/x) λ² x e^{−λx} dx
           = λ ∫_0^∞ λ e^{−λx} dx
           = λ × 1
           = λ.

(c) Show that the moment estimator of λ is λ̂ = 2/X̄.

The first moment equation gives X̄ = E(X) = 2/λ. Therefore,

    λ̂ = 2/X̄.

(d) Show that the maximum likelihood estimator of λ is the same as its moment estimator.

For n observations x1, . . . , xn, consider the likelihood function

    L(x1, . . . , xn; λ) = Π_{i=1}^n f(xi; λ)
                        = Π_{i=1}^n λ² xi e^{−λxi}
                        = λ^{2n} (Π_{i=1}^n xi) e^{−λ Σ_{i=1}^n xi}.

The log-likelihood is

    ℓ(x1, . . . , xn; λ) = ln L(x1, . . . , xn; λ) = 2n ln λ + ln Π_{i=1}^n xi − λ Σ_{i=1}^n xi.

(Note: ln Π_{i=1}^n xi = Σ_{i=1}^n ln xi.)

Consider

    ∂ℓ/∂λ = dℓ/dλ = 2n/λ − Σ_{i=1}^n xi.

For ∂ℓ/∂λ = 0,

    2n/λ̂ − Σ_{i=1}^n xi = 0,
    2n/λ̂ = Σ_{i=1}^n xi,
    λ̂ = 2n/Σ_{i=1}^n xi = 2/X̄.

Remark: Check the second derivative,

    ∂²ℓ/∂λ² = −2n/λ² < 0,

implying that the above indeed gives a maximum.

(e) Eddy records the chirping time (in minutes) of Marigold for 20 days as follows:

    2, 5, 5, 7, 8, 8, 9, 12, 12, 12, 12, 12, 15, 16, 18, 18, 18, 18, 18, 25

(i) Estimate the parameter λ and write down the corresponding variance of X.

Using the results of (c) or (d),

    λ̂ = 2/x̄ = 2/[(2 + 5 + · · · + 25)/20] = 2/(250/20) = 2/12.5 = 0.16.

Check your formula sheet to get

    Var(X) = n/λ̂² = 2/0.16² = 78.125.

(ii) Eddy makes 5 more observations and re-estimates λ using all the observed data. He finds that with the new estimate the standard deviation of the chirping time is 8 minutes. Determine the average chirping time of the 5 new observations.

Consider

    Var(X) = 2/λ̂² = 8² = 64.

We have

    λ̂ = √(2/64) = √2/8,
    2/x̄ = √2/8,
    x̄ = 16/√2 = 8√2.

Hence

    [2 + 5 + · · · + 25 + (sum of 5 new obs)]/(20 + 5) = 8√2,
    [250 + (sum of 5 new obs)]/25 = 8√2,
    sum of 5 new obs = 200√2 − 250,
    (sum of 5 new obs)/5 = 40√2 − 50 = 6.56854188. . .

The average chirping time of the 5 new observations is approximately 6.57 minutes.
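A short Python check of parts (e)(i) and (e)(ii) (a minimal sketch; it uses only the 20 recorded times above):

    import math

    times = [2, 5, 5, 7, 8, 8, 9, 12, 12, 12, 12, 12, 15, 16, 18, 18, 18, 18, 18, 25]
    lam_hat = 2 / (sum(times) / len(times))          # part (i): 2 / 12.5 = 0.16
    var_hat = 2 / lam_hat**2                         # 78.125

    xbar_all = 2 / math.sqrt(2 / 64)                 # part (ii): mean implied by sd = 8, i.e. 8*sqrt(2)
    new_avg = (25 * xbar_all - sum(times)) / 5       # average of the 5 new observations
    print(lam_hat, var_hat, new_avg)                 # 0.16, 78.125, about 6.57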

Exercise 3: A theory suggests that X, the number of accidents incurred by a worker per year, has the following probability function:

    f(x) = θ(1 − θ)^x  for x = 0, 1, 2, . . ., and 0 otherwise,

where 0 < θ < 1 is an unknown parameter of the distribution of X, with mean (1 − θ)/θ and variance (1 − θ)/θ². Let {X1, . . . , Xn} be a random sample of size n from the population.

(a) Construct θ̂, the maximum likelihood estimator of θ, in terms of X1, . . . , Xn.

(b) Construct θ̃, the moment estimator of θ, in terms of X1, . . . , Xn.

Solution:

(a) Construct θ̂, the maximum likelihood estimator of θ, in terms of X1 , . . . , Xn .

For n observations x1, . . . , xn, consider the likelihood function

    L(θ; x1, . . . , xn) = Π_{i=1}^n f(θ; xi)
                        = Π_{i=1}^n θ(1 − θ)^{xi}
                        = θ^n (1 − θ)^{Σ_{i=1}^n xi}.

The log-likelihood is

    ℓ(θ; x1, . . . , xn) = ln L(θ; x1, . . . , xn) = n ln θ + (Σ_{i=1}^n xi) ln(1 − θ).

Consider the first derivative

    ∂ℓ/∂θ = n/θ − (Σ_{i=1}^n xi)/(1 − θ).

Now set ∂ℓ/∂θ = 0. Thus,

    n/θ̂ = (Σ_{i=1}^n xi)/(1 − θ̂),
    (1 − θ̂)/θ̂ = (Σ_{i=1}^n xi)/n = x̄,
    1 − θ̂ = x̄ θ̂,
    1 = x̄ θ̂ + θ̂,
    1 = θ̂(x̄ + 1).

Therefore, θ̂ = 1/(X̄ + 1).

Check the second derivative: since

    ∂ℓ/∂θ = nθ^{−1} − (Σ_{i=1}^n xi)(1 − θ)^{−1},

we have

    ∂²ℓ/∂θ² = −nθ^{−2} − (Σ_{i=1}^n xi)(1 − θ)^{−2} = −n/θ² − nx̄/(1 − θ)².

When θ = 1/(x̄ + 1),

    ∂²ℓ/∂θ² = −n/θ² − nx̄/(1 − θ)² = −n(x̄ + 1)² − n(x̄ + 1)²/x̄ = −n(x̄ + 1)²(1 + 1/x̄) < 0.

Therefore, θ̂ = 1/(X̄ + 1) is a maximum as long as X̄ > 0.

(b) Construct θ̃, the moment estimator of θ, in terms of X1, . . . , Xn.

Given E(X) = (1 − θ)/θ, solve X̄ = (1 − θ̃)/θ̃ for θ̃:

    θ̃ = 1/(X̄ + 1).
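A minimal Python sketch of this common estimator (the accident counts below are hypothetical, made up only to illustrate the formula):

    x = [0, 1, 0, 2, 0, 1, 0, 0, 3, 1]       # hypothetical accident counts for 10 workers
    theta_hat = 1 / (sum(x) / len(x) + 1)    # both the MLE and the moment estimator
    print(theta_hat)                         # 1 / (0.8 + 1) = 0.555...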

Exercise 4: A population has a density function given by

    f(x) = (θ + 1)x^θ  for 0 ≤ x ≤ 1,

where θ is an unknown positive parameter, and {X1, X2, . . . , Xn} is a random sample from the population.

(a) Find the maximum likelihood estimator (M.L.E.) θ̂1 of θ.

(b) Find the population mean E(X) in terms of θ.

(c) Hence, find a moment estimator θ̂2 of θ by the method of moments.

Solution:

(a) Find the maximum likelihood estimator (M.L.E.) θ̂1 of θ.

For n observations x1, . . . , xn, consider the likelihood function

    L(θ; x1, . . . , xn) = Π_{i=1}^n f(θ; xi)
                        = Π_{i=1}^n (θ + 1)xi^θ
                        = (θ + 1)^n (x1 x2 · · · xn)^θ.

The log-likelihood is

    ℓ(θ; x1, . . . , xn) = ln L(θ; x1, . . . , xn) = n ln(θ + 1) + θ ln(x1 x2 · · · xn).

(Note: ln Π_{i=1}^n xi = Σ_{i=1}^n ln xi.)

Consider the first derivative

    ∂ℓ/∂θ = dℓ/dθ = n/(θ + 1) + Σ_{i=1}^n ln xi.

For ∂ℓ/∂θ = 0,

    n/(θ̂ + 1) + Σ_{i=1}^n ln xi = 0,
    n/(θ̂ + 1) = −Σ_{i=1}^n ln xi,
    θ̂ + 1 = −n / Σ_{i=1}^n ln xi.

Therefore,

    θ̂1 = −1 − n / Σ_{i=1}^n ln Xi.

Remark: Since ∂ℓ/∂θ = n(θ + 1)^{−1} + Σ_{i=1}^n ln xi, the second derivative is

    ∂²ℓ/∂θ² = −n(θ + 1)^{−2} = −n/(θ + 1)² < 0,

therefore the above indeed gives a maximum.

(b) Find the population mean E(X) in terms of θ.

    E(X) = ∫_0^1 x(θ + 1)x^θ dx = ∫_0^1 (θ + 1)x^{θ+1} dx = [(θ + 1)x^{θ+2}/(θ + 2)]_0^1 = (θ + 1)/(θ + 2).

(c) Hence, find a moment estimator θ̂2 of θ by the method of moments.

From part (b), E(X) = (θ + 1)/(θ + 2). Solve X̄ = (θ̂ + 1)/(θ̂ + 2) for θ̂:

    θ̂ + 1 = X̄θ̂ + 2X̄,
    θ̂ − X̄θ̂ = 2X̄ − 1,
    θ̂(1 − X̄) = 2X̄ − 1.

Therefore, θ̂2 = (2X̄ − 1)/(1 − X̄).
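A minimal Python sketch comparing the two estimators (the observations below, all in (0, 1), are made up purely for illustration):

    import math

    x = [0.9, 0.7, 0.95, 0.8, 0.6]                       # hypothetical sample from the population
    theta_1 = -1 - len(x) / sum(math.log(v) for v in x)  # MLE from part (a)
    xbar = sum(x) / len(x)
    theta_2 = (2 * xbar - 1) / (1 - xbar)                # moment estimator from part (c)
    print(theta_1, theta_2)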
