Statistical Estimation Essentials
Estimators
An estimator (normally in the form of an equation) is a rule that tells us how to calculate
an estimate (a number) based on the measurements contained in a sample. It is possible to
obtain many different estimators (rules) for a single population parameter. This should not be
surprising. Ten engineers, each assigned to estimate the cost of construction of a new cruise
terminal, would most likely arrive at different estimates. Such engineers, called estimators in
the construction industry, use certain fixed guidelines plus intuition to arrive at their estimates.
Each represents a unique, subjective rule for obtaining a single estimate. This brings us
to the most important point: some estimators are good and some are bad. How do we
define good or bad here?
Desired Properties of θ̂
To estimate the mean weight (µ) of 7 million residents of HK, a sample of size 8 is taken. The
results (in kg) are as follows: 71.2, 60, 55.3, 65.4, 32.7, 78.6, 68.8, 59.6. Estimate µ.
From the 8 data values we can easily obtain µ̂ = X̄ = 61.45. The estimate µ̂ = X̄ = 61.45 uses ALL
the information in the sample FAIRLY. There is no intention to overestimate or underestimate
µ. It is called an unbiased estimator of µ.
To appreciate this, suppose we adopt a rule of dropping the largest value and using the average
of the remaining data as an estimate for µ. Such a rule systematically tends to produce estimates
below µ, i.e., it is a biased estimator.
Minimum Variance. In addition to unbiasedness, we would like the spread of the sampling
distribution of the estimator to be as small as possible. In other words, we want Var(θ̂) to be
a minimum.
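To see both properties concretely, here is a small simulation sketch in Python. It uses an illustrative normal population with µ = 62 and σ = 10 (these values are assumptions for the illustration, not taken from the HK sample) and compares the usual sample mean with the "drop the largest value" rule: the sample mean averages to µ, while the other rule is systematically too low.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 62.0, 10.0, 8, 100_000      # illustrative population, not the HK data

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)                                 # usual sample mean
drop_max = np.sort(x, axis=1)[:, :-1].mean(axis=1)    # drop the largest value, average the rest

print("average of X-bar         ≈", xbar.mean(), "  (close to mu = 62)")
print("average of drop-max rule ≈", drop_max.mean(), "  (systematically below mu)")
print("Var(X-bar) ≈", xbar.var(), " vs sigma^2/n =", sigma**2 / n)
```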
Example 1:
(a) Let θ be the parameter of interest and let θ̂ be an estimator of that parameter. Select the
correct definition for an unbiased estimator from the following:
(b) Let Xi have mean µ and variance σ², for i = 1, 2, . . . , n. Prove that E(X̄) = µ, where
\[
\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}.
\]
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right)
= \frac{1}{n}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right] \\
&= \frac{1}{n}(\mu + \mu + \cdots + \mu) = \frac{1}{n}(n\mu) = \mu.
\end{align*}
Thus, E(X̄) = µ. That is, X̄ is an unbiased estimator of µ.
Example 2:
(a) Let θ be the parameter of interest and let θ̂ be an estimator of θ. Which of the following is
the correct definition of an unbiased estimator?
(b) Let {X1, X2, X3} be a random sample taken from a population where the mean is µ and
the variance is σ².
Thus, E(X̃) = µ. That is, X̃ is also an unbiased estimator of µ.
(v) By comparing your answers in part (ii) and part (iv), which estimator of µ, X̄ or X̃, is
better? Why?
\[
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{3} = \frac{25\sigma^2}{75}
\quad\text{and}\quad
\operatorname{Var}(\tilde{X}) = \frac{9\sigma^2}{25} = \frac{27\sigma^2}{75}.
\]
Since Var(X̄) < Var(X̃) and both estimators are unbiased, the sample mean X̄ is the better (more efficient) estimator of µ.
Example 3:
Let {X1, X2, X3, X4} be a random sample from a population where the mean is µ and the
variance is σ².
(v) By comparing your answers in part (ii) and part (iv), which estimator of µ, X̄ or X̃, is
better? Why?
\[
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{4} = \frac{4\sigma^2}{16}
\quad\text{and}\quad
\operatorname{Var}(\tilde{X}) = \frac{5\sigma^2}{16}.
\]
Again, Var(X̄) < Var(X̃), so the sample mean X̄ is the better estimator of µ.
Why do we divide by n − 1 in the sample variance S²? Suppose X1, X2, . . . , Xn is a random
sample from a normal population with mean µ and variance σ². Then
\[
Z = \frac{X_i - \mu}{\sigma} \text{ is a standard normal random variable,}
\]
\[
Z^2 = \frac{(X_i - \mu)^2}{\sigma^2} \sim \chi^2_1, \quad \ldots (1)
\]
\[
\frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\sigma^2} \sim \chi^2_n, \quad \ldots (2)
\]
\[
\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1}, \quad \ldots (3)
\]
\[
\sum_{i=1}^{n}(X_i - \bar{X})^2 \sim \sigma^2 \chi^2_{n-1}. \quad (*)
\]
Hence
\[
S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1} \sim \frac{\sigma^2 \chi^2_{n-1}}{n-1},
\]
\begin{align*}
E(S^2) &= E\!\left[\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}\right]
= E\!\left[\frac{\sigma^2 \chi^2_{n-1}}{n-1}\right] \\
&= \frac{\sigma^2}{n-1} E(\chi^2_{n-1}) = \frac{\sigma^2}{n-1} \times (n-1) \quad \ldots (4) \\
&= \sigma^2.
\end{align*}
Notes:
(1) If you square a standard normal random variable, you get a random variable that follows
a Chi-squared distribution with 1 degree of freedom (df).
(2) If you add up n independent Chi-squared random variables (with 1 df each) you get a
random variable that follows a Chi-squared distribution with n degrees of freedom.
(3) If you replace a parameter with its estimate you lose 1 degree of freedom.
(4) The mean of a Chi-squared random variable is equal to the degrees of freedom of the
distribution.
By contrast, if we divide by n instead of n − 1, we obtain a biased estimator:
\[
D^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n} \sim \frac{\sigma^2 \chi^2_{n-1}}{n},
\]
\begin{align*}
E(D^2) &= E\!\left[\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}\right]
= E\!\left[\frac{\sigma^2 \chi^2_{n-1}}{n}\right] \\
&= \frac{\sigma^2}{n} E(\chi^2_{n-1}) = \frac{\sigma^2}{n} \times (n-1) \quad \ldots (4) \\
&\neq \sigma^2.
\end{align*}
The key fact E[Σ(Xi − X̄)²] = (n − 1)σ² can also be obtained directly, without the chi-squared argument:
\begin{align*}
E\!\left[\sum_{i=1}^{n}(X_i-\bar{X})^2\right]
&= E\!\left[\sum_{i=1}^{n}X_i^2 - n\bar{X}^2\right] \\
&= E\!\left[\sum_{i=1}^{n}X_i^2 - \frac{1}{n}(X_1+X_2+\cdots+X_n)^2\right] \\
&= E\!\left[\sum_{i=1}^{n}X_i^2 - \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}X_iX_j\right] \\
&= E\!\left[\sum_{i=1}^{n}X_i^2 - \frac{1}{n}\sum_{i=1}^{n}X_i^2 - \frac{2}{n}\sum_{i<j}X_iX_j\right] \\
&= \frac{n-1}{n}\sum_{i=1}^{n}E\!\left(X_i^2\right) - \frac{2}{n}\sum_{i<j}E\!\left(X_iX_j\right) \\
&= (n-1)\left(\mu^2+\sigma^2\right) - \frac{2}{n}\cdot\frac{n(n-1)}{2}\,E(X_i)E(X_j) \\
&= (n-1)\left(\mu^2+\sigma^2\right) - (n-1)\mu^2 \\
&= (n-1)\sigma^2.
\end{align*}
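The two expectations, E(S²) = σ² and E(D²) = (n − 1)σ²/n, can be checked by simulation. The sketch below uses illustrative values (σ = 2, n = 5) and simply compares the long-run averages of S² and D² with the theoretical values.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.0, 2.0, 5, 200_000     # illustrative values

x = rng.normal(mu, sigma, size=(reps, n))
s2 = x.var(axis=1, ddof=1)   # S^2, divisor n - 1
d2 = x.var(axis=1, ddof=0)   # D^2, divisor n

print("mean of S^2 ≈", s2.mean(), " vs sigma^2 =", sigma**2)
print("mean of D^2 ≈", d2.mean(), " vs (n-1)/n * sigma^2 =", (n - 1) / n * sigma**2)
```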
For reference, the table below lists some common parameters, their usual point estimators and the variances of those estimators.

Parameter θ     Sample size(s)    Point estimator θ̂    Variance of θ̂
p               n                 p̂                    pq/n
µ1 − µ2         n1, n2            Ȳ1 − Ȳ2              σ1²/n1 + σ2²/n2
p1 − p2         n1, n2            p̂1 − p̂2              p1q1/n1 + p2q2/n2
σ²              n                 S²                    2σ⁴/(n − 1)
We will briefly discuss two important estimation methods (techniques) in statistics here, namely
the Method of Moments and the Maximum Likelihood Estimation Method.
Definitions:
(1) E(X^k) is the k-th (theoretical) raw moment of the random variable X (about the origin),
for k = 1, 2, . . .
(2) E[(X − µ)^k] is the k-th (theoretical) central moment of X (about the mean), for k =
1, 2, . . .
The corresponding k-th sample moment about the origin is M_k = (1/n) Σ_{i=1}^{n} X_i^k. The
method of moments first equates M_1 with E(X), then (if needed) M_2 with E(X²), and so on:
(3) Continue equating sample moments about the origin, M_k, with the corresponding theoretical
moments E(X^k), k = 3, 4, . . ., until you have the same number of equations as unknown
parameters.
The resulting estimators are called method of moments estimators (or simply moment estimators).
Example 5
Let X1 , X2 , . . . , Xn be Bernoulli random variables with parameter p. What is the method of
moments estimator of p?
Solution: For X ∼ Bin(n = 1, p), the first theoretical moment about the origin is
E(X) = np = p since n = 1.
We have just one parameter for which we are trying to derive the method of moments estimator.
Therefore, we need just one equation. Equating the first theoretical moment about the origin
with the corresponding sample moment, we get p = X̄.
We just need to put a "hat" on the parameter to make it clear that it is an estimator. We
can also subscript the estimator with "MM" to indicate that it is the method of
moments estimator:
\[
\hat{p}_{MM} = \bar{X}.
\]
Example 6
Let X1, X2, . . . , Xn be normal random variables with mean µ and variance σ². What are the
method of moments estimators of the mean, µ, and the variance, σ²?
Solution: Here X ∼ N(µ, σ²). The first and second theoretical moments about the origin are
E(X) = µ and E(X²) = σ² + µ².
In this case, we have two parameters for which we are trying to derive method of moments
estimators. Therefore, we need two equations here. Equating the first theoretical moment
about the origin with the corresponding sample moment, we get
\[
\hat{\mu}_{MM} = \bar{X}.
\]
And, equating the second theoretical moment about the origin with the corresponding sample
moment, we get
\[
E(X^2) = \sigma^2 + \mu^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2.
\]
Now, the first equation tells us that the method of moments estimator of the mean µ is the
sample mean. And, substituting the sample mean in for µ in the second equation and solving
for σ 2 , we get that the method of moments estimator of the variance σ 2 is
\[
\hat{\sigma}^2_{MM} = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2.
\]
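In code, the two moment estimators amount to computing the first two sample moments. The sketch below does this for a simulated normal sample; the simulated parameter values are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=10.0, scale=3.0, size=500)   # simulated N(10, 9) sample (illustrative)

m1 = x.mean()                 # first sample moment about the origin
m2 = np.mean(x**2)            # second sample moment about the origin

mu_mm = m1                    # method of moments estimate of mu
sigma2_mm = m2 - m1**2        # method of moments estimate of sigma^2

print("mu_MM     =", mu_mm)
print("sigma2_MM =", sigma2_mm)
```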
Example 7
Let X1 , X2 , . . . , Xn be gamma random variables with parameters α and λ. What are the method
of moments estimators of α and λ?
Solution: The probability density function is
\[
f(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}, \quad \text{for } x > 0,
\]
with
\[
E(X) = \frac{\alpha}{\lambda} \quad\text{and}\quad \operatorname{Var}(X) = \frac{\alpha}{\lambda^2}.
\]
Equating the first moments,
\[
\frac{\alpha}{\lambda} = \bar{X} \iff \hat{\alpha}_{MM} = \lambda\bar{X}.
\]
Equating the second moments,
\begin{align*}
E(X^2) &= \frac{\alpha}{\lambda^2} + \frac{\alpha^2}{\lambda^2}, \\
\frac{\sum_{i=1}^{n} X_i^2}{n} &= \frac{\bar{X}}{\lambda} + \bar{X}^2, \\
\frac{\bar{X}}{\lambda} &= \frac{\sum_{i=1}^{n} X_i^2}{n} - \bar{X}^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n}.
\end{align*}
Therefore,
\[
\hat{\lambda}_{MM} = \frac{n\bar{X}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}.
\]
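The two formulas translate directly into code. The sketch below applies them to a simulated Gam(α = 3, λ = 2) sample; the chosen parameter values are illustrative only, and note that numpy parameterizes the gamma distribution by shape and scale = 1/λ.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha_true, lam_true = 3.0, 2.0
# numpy uses shape and scale = 1/rate
x = rng.gamma(shape=alpha_true, scale=1.0 / lam_true, size=2000)

n = len(x)
xbar = x.mean()
lam_mm = n * xbar / (np.sum(x**2) - n * xbar**2)   # lambda_hat_MM
alpha_mm = lam_mm * xbar                           # alpha_hat_MM = lambda_hat * xbar

print("lambda_MM =", lam_mm, "  alpha_MM =", alpha_mm)
```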
Example 8
Sometimes, M.M.E.'s (method of moments estimators, or moment estimators) are meaningless.
Consider the following example.
A random variable X follows the uniform distribution Uni(0, θ) and we obtained some realized
values of X as x1 = 2, x2 = 3, x3 = 5 and x4 = 12.
We know that
\[
E(X) = \frac{\theta}{2},
\]
so the M.M.E. of θ is
\[
\hat{\theta}_{MM} = 2\bar{X}.
\]
The value of X̄ = (2 + 3 + 5 + 12)/4 = 5.5, so the estimate for θ in the above example is
θ̂_MM = 2 × 5.5 = 11.
However, this estimate for θ, which is the upper bound of the uniform distribution, is clearly not
acceptable because the realized value x4 = 12 has already exceeded this estimated "bound".
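The failure is just arithmetic: θ̂_MM = 11 while the largest observation is 12. A two-line check in Python:

```python
x = [2, 3, 5, 12]
theta_mm = 2 * sum(x) / len(x)               # moment estimate 2 * xbar
print(theta_mm, max(x), theta_mm < max(x))   # 11.0 12 True -> estimate below an observed value
```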
Introduction
In 1922, the famous English statistician Sir Ronald A. Fisher proposed a new method of parameter
estimation, namely maximum likelihood estimation (M.L.E.).
The idea is to find the value of the unknown parameter(s) that makes the observed (realized)
data the most likely possible outcome. Calculus is used to maximize the "likelihood".
The likelihood function based on observed data x1, x2, . . . , xn is L(θ) = f(x1, x2, . . . , xn; θ),
which is the probability of observing the given data as a function of θ. The maximum
likelihood estimate (M.L.E.) for θ, namely θ̂, is the value of θ that maximizes L(θ) [or equiva-
lently ℓ(θ) = ln L(θ)]: it is the value that makes the observed data the most probable. If the
Xi's are i.i.d., then the likelihood function simplifies to
\[
L(\theta) = \prod_{i=1}^{n} f(x_i; \theta),
\]
and we maximize
\[
\ell(\theta) = \ln L(\theta) = \sum_{i=1}^{n} \ln f(x_i; \theta)
\]
with respect to θ.
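When no closed form is available, or simply as a check, ℓ(θ) can be maximized numerically. The sketch below evaluates the exponential log-likelihood ℓ(λ) = n ln λ − λ Σ xᵢ on a grid for a small made-up data set and compares the grid maximizer with the closed-form value 1/x̄, the MLE for exponential data derived later in these notes.

```python
import numpy as np

x = np.array([0.8, 1.5, 0.3, 2.2, 1.1])        # illustrative data, assumed Exp(lambda)

def loglik(lam):
    # l(lambda) = n*ln(lambda) - lambda*sum(x)
    return len(x) * np.log(lam) - lam * x.sum()

grid = np.linspace(0.01, 5.0, 100_000)          # crude grid search over lambda
lam_hat = grid[np.argmax(loglik(grid))]

print("grid maximizer ≈", lam_hat, "  closed form 1/xbar =", 1.0 / x.mean())
```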
Note, however, that maximum likelihood estimators are not necessarily unbiased.
* For X ∼ Geo(p), the MLE p̂_ML = 1/X (see the Geometric distribution section below) satisfies
\begin{align*}
E(\hat{p}_{ML}) = E\!\left(\frac{1}{X}\right)
&= \sum_{x=1}^{\infty} \frac{1}{x}(1-p)^{x-1}p \\
&= \frac{p}{1-p}\sum_{x=1}^{\infty} \frac{1}{x}(1-p)^{x} \\
&= \frac{p}{1-p}\sum_{x=1}^{\infty} \int_{p}^{1} (1-t)^{x-1}\,dt \\
&= \frac{p}{1-p}\int_{p}^{1} \sum_{x=1}^{\infty} (1-t)^{x-1}\,dt \\
&= \frac{p}{1-p}\int_{p}^{1} \frac{1}{t}\,dt \\
&= \frac{p}{1-p}\Big[\ln|t|\Big]_{p}^{1} \\
&= -\frac{p\ln p}{1-p} \neq p.
\end{align*}
It can also be seen that when α = 1, i.e., X ∼ Gam(1, λ) ≡ Exp(λ), E(λ̂_ML) does not exist
at all.
* Similarly, for X ∼ Uni(0, θ), the MLE is θ̂_ML = X_(n) = max(X1, X2, . . . , Xn) (see the
Uniform distribution section below), which is also biased. The pdf of X_(n) is
\begin{align*}
f(x) &= \frac{d}{dx} P\!\left(X_{(n)} \le x\right) \\
&= \frac{d}{dx} P\{\max(X_1, X_2, \ldots, X_n) \le x\} \\
&= \frac{d}{dx} P(X_1 \le x \cap X_2 \le x \cap \cdots \cap X_n \le x) \\
&= \frac{d}{dx}\left[P(X_1 \le x)P(X_2 \le x)\cdots P(X_n \le x)\right] \\
&= \frac{d}{dx}\left(\frac{x}{\theta}\right)^{n} \\
&= \frac{nx^{n-1}}{\theta^{n}}, \quad \text{for } 0 < x < \theta,
\end{align*}
and zero otherwise. Therefore,
\[
E\!\left(X_{(n)}\right) = \int_{0}^{\theta} x\,\frac{nx^{n-1}}{\theta^{n}}\,dx
= \frac{n}{\theta^{n}}\int_{0}^{\theta} x^{n}\,dx
= \frac{n}{\theta^{n}}\left[\frac{x^{n+1}}{n+1}\right]_{0}^{\theta}
= \frac{n\theta}{n+1} \neq \theta.
\]
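This bias is easy to see by simulation; the sketch below uses the illustrative values θ = 10 and n = 5.

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 10.0, 5, 200_000               # illustrative values

x_max = rng.uniform(0.0, theta, size=(reps, n)).max(axis=1)

print("E(X_(n)) ≈", x_max.mean())                # close to n*theta/(n+1)
print("n*theta/(n+1) =", n * theta / (n + 1))
print("bias-corrected (n+1)/n * X_(n) ≈", ((n + 1) / n * x_max).mean())   # close to theta
```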
Binomial distribution
For X ∼ Bin(n, p), a single observation x has pmf
\[
p(x) = \binom{n}{x} p^{x}(1-p)^{n-x}, \quad x = 0, 1, \ldots, n,
\]
so that
\[
\ell(p; x) = \ln\binom{n}{x} + x\ln p + (n-x)\ln(1-p).
\]
Then,
\[
\frac{\partial \ell}{\partial p} = 0 \implies \frac{x}{p} - \frac{n-x}{1-p} = 0 \implies p = \frac{x}{n}.
\]
The MLE of p is
\[
\hat{p}_{ML} = \frac{X}{n}.
\]
Note: MLE and MME are the same in this case because E(X) = np.
Remark:
With m observations, the likelihood function is
\[
L(p; \mathbf{x}) = \prod_{i=1}^{m} L(p; x_i)
= \prod_{i=1}^{m} \binom{n}{x_i} p^{x_i}(1-p)^{n-x_i}
= \left[\prod_{i=1}^{m}\binom{n}{x_i}\right] \times p^{\sum_{i=1}^{m}x_i}(1-p)^{\sum_{i=1}^{m}(n-x_i)}.
\]
Then,
\begin{align*}
\frac{\partial \ell}{\partial p} = 0
&\implies \frac{\sum_{i=1}^{m}x_i}{p} - \frac{mn-\sum_{i=1}^{m}x_i}{1-p} = 0 \\
&\implies \frac{m\bar{x}}{p} - \frac{mn-m\bar{x}}{1-p} = 0 \\
&\implies p = \frac{\bar{x}}{n}.
\end{align*}
The MLE of p is
\[
\hat{p}_{ML} = \frac{\bar{X}}{n}.
\]
Poisson distribution
For X ∼ Poi(λ), a single observation x has pmf
\[
p(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad \text{for } x = 0, 1, 2, \ldots,
\]
so that ℓ(λ; x) = −λ + x ln λ − ln(x!). Then,
\[
\frac{\partial \ell}{\partial \lambda} = 0 \implies -1 + \frac{x}{\lambda} = 0 \implies \lambda = x.
\]
The MLE of λ is
\[
\hat{\lambda}_{ML} = X.
\]
Note: MLE and MME are the same in this case because E(X) = λ.
Geometric distribution
For X ∼ Geo(p), a single observation x has pmf p(x) = (1 − p)^{x−1} p for x = 1, 2, . . ., so that
ℓ(p; x) = (x − 1) ln(1 − p) + ln p. Then,
\[
\frac{\partial \ell}{\partial p} = 0 \implies -\frac{x-1}{1-p} + \frac{1}{p} = 0 \implies p = \frac{1}{x}.
\]
The MLE of p is
\[
\hat{p}_{ML} = \frac{1}{X}.
\]
Note: MLE and MME are the same in this case because E(X) = 1/p.
Note: For the negative binomial distribution, MLE and MME are likewise the same because E(X) = r/p.
Exponential distribution
For X ∼ Exp(λ), a single observation x has pdf f(x) = λe^{−λx} for x > 0, so that
ℓ(λ; x) = ln λ − λx. Then,
\[
\frac{\partial \ell}{\partial \lambda} = 0 \implies \frac{1}{\lambda} - x = 0 \implies \lambda = \frac{1}{x}.
\]
The MLE of λ is
\[
\hat{\lambda}_{ML} = \frac{1}{X}.
\]
Note: MLE and MME are the same in this case because E(X) = 1/λ.
Note: For the Gam(α, λ) distribution with known shape parameter α (and unknown rate λ),
MLE and MME are the same because E(X) = α/λ.
Remark: For a Gam(α, λ) distribution with known rate parameter λ but unknown shape
parameter α, the MLE of α has no closed form.
Note: For the N(µ, σ²) distribution with known variance σ² (and unknown mean µ), MLE and
MME are the same because E(X) = µ.
Suppose that X ∼ N(µ, σ²) with known mean µ but unknown variance parameter σ² (not σ). The pdf is
\[
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad \text{for } -\infty < x < \infty.
\]
The likelihood function is
\[
L(\mu, \sigma^2; x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.
\]
The log-likelihood function is
\[
\ell(\mu, \sigma^2; x) = \ln L(\mu, \sigma^2; x) = -\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln(\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}.
\]
Then,
\[
\frac{\partial \ell}{\partial \sigma^2} = 0 \implies -\frac{1}{2\sigma^2} + \frac{(x-\mu)^2}{2(\sigma^2)^2} = 0 \implies \sigma^2 = (x-\mu)^2.
\]
The MLE of σ² is
\[
\hat{\sigma}^2_{ML} = (X-\mu)^2.
\]
Remark:
With n observations, the likelihood function is
\[
L(\mu, \sigma^2; \mathbf{x}) = \prod_{i=1}^{n} L(\mu, \sigma^2; x_i)
= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}
= \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}.
\]
Then,
\[
\frac{\partial \ell}{\partial \sigma^2} = 0 \implies -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^{n}(x_i-\mu)^2 = 0 \implies \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2.
\]
The MLE of σ² is
\[
\hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(X_i-\mu)^2.
\]
Suppose that X ∼ N(µ, σ²) with unknown mean parameter µ and unknown variance parameter
σ² (not σ). The pdf is
\[
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad \text{for } -\infty < x < \infty.
\]
The MLE of µ is
\[
\hat{\mu}_{ML} = \bar{X}.
\]
The MLE of σ² is
\[
\hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2.
\]
Note 1: MLE and MME are the same in this case because
\[
E(X) = \mu, \qquad E(X^2) = \sigma^2 + \mu^2
\implies \sigma^2 = E(X^2) - [E(X)]^2
\implies \hat{\sigma}^2_{MM} = m_2' - (m_1')^2,
\]
where m′₁ = X̄ and m′₂ = (1/n) Σ Xᵢ² are the first and second sample moments about the origin,
and
\begin{align*}
\hat{\sigma}^2_{ML} &= \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2 \\
&= \frac{1}{n}\sum_{i=1}^{n}\left(X_i^2 - 2X_i\bar{X} + \bar{X}^2\right) \\
&= \frac{1}{n}\left(\sum_{i=1}^{n}X_i^2 - 2\bar{X}\sum_{i=1}^{n}X_i + n\bar{X}^2\right) \\
&= \frac{1}{n}\left(\sum_{i=1}^{n}X_i^2 - 2\bar{X}\cdot n\bar{X} + n\bar{X}^2\right) \\
&= \frac{1}{n}\left(\sum_{i=1}^{n}X_i^2 - n\bar{X}^2\right) \\
&= \frac{1}{n}\sum_{i=1}^{n}X_i^2 - \bar{X}^2 \\
&= m_2' - (m_1')^2.
\end{align*}
Uniform distribution
For X ∼ Uni(0, θ), f(x; θ) = 1/θ for 0 ≤ x ≤ θ, so the likelihood based on n observations is
L(θ; x) = 1/θ^n, provided θ ≥ max(x1, . . . , xn), and zero otherwise.
That is, the smaller θ is, the larger the likelihood function is. Therefore, we look for the smallest
value that θ can take based on the n observations. Since all the observations
of X (i.e., x1, x2, . . . , xn) must lie between 0 and θ, the smallest admissible value of θ is the
maximum of these n observations. Thus, the MLE of θ is
\[
\hat{\theta}_{ML} = X_{(n)} = \max(X_1, X_2, \ldots, X_n),
\]
that is, the n-th order statistic of X, or the largest observation in the sample.
Exercises
Exercise 1: Let X1, X2, . . . , Xn be a random sample from a distribution with probability
density function f(x; θ) = θx^{θ−1}, for 0 < x < 1, where θ > 0.
(a) Find the maximum likelihood estimator of θ.
(b) Use the result in part (a) to estimate θ from the following observed sample:
{0.718, 0.571, 0.662, 0.975, 0.746, 0.979, 0.429, 0.509, 0.876, 0.666}
Solution:
(a) Log-likelihood function:
\[
\ell(\theta; \mathbf{x}) = \ln L(\theta; \mathbf{x}) = n\ln\theta + (\theta-1)\sum_{i=1}^{n}\ln x_i.
\]
Consider
\[
\frac{\partial}{\partial \theta}\ell(\theta; \mathbf{x}) = 0
\implies \frac{n}{\theta} + \sum_{i=1}^{n}\ln x_i = 0
\implies \theta = -n\left(\sum_{i=1}^{n}\ln x_i\right)^{-1}.
\]
The log-likelihood function attains its maximum at this value of θ, as the second derivative
\[
\frac{\partial^2}{\partial\theta^2}\ell(\theta; \mathbf{x}) = -\frac{n}{\theta^2} < 0.
\]
Therefore the MLE for θ is given by
\[
\hat{\theta}_{ML} = -n\left(\sum_{i=1}^{n}\ln X_i\right)^{-1} = -\left(\frac{1}{n}\sum_{i=1}^{n}\ln X_i\right)^{-1}.
\]
(b) Note that
\[
\frac{1}{10}\sum_{i=1}^{10}\ln x_i = -0.3704.
\]
The maximum likelihood estimate for θ is
\[
\hat{\theta}_{ML} = -\frac{1}{-0.3704} = 2.7000.
\]
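The arithmetic in part (b) can be reproduced in a few lines of Python:

```python
import numpy as np

x = np.array([0.718, 0.571, 0.662, 0.975, 0.746,
              0.979, 0.429, 0.509, 0.876, 0.666])

mean_log = np.log(x).mean()
theta_ml = -1.0 / mean_log

print("mean of ln(x_i) ≈", round(mean_log, 4))   # about -0.3704
print("theta_ML        ≈", round(theta_ml, 4))   # about 2.70
```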
Exercise 2: Eddy's budgerigar, Marigold, chirps for X minutes until she is fed. The probability
density function of X is given as follows:
\[
f(x) = \begin{cases} Cxe^{-\lambda x}, & \text{for } x > 0; \\ 0, & \text{otherwise}, \end{cases}
\]
where C and λ are two positive constants.
Eddy records the chirping time (in minutes) of Marigold for 20 days as follows:
2, 5, 5, 7, 8, 8, 9, 12, 12, 12, 12, 12, 15, 16, 18, 18, 18, 18, 18, 25
(i) Estimate the parameter λ and write down the corresponding variance of X.
(ii) Eddy makes 5 more observations and re-estimates λ using all the observed data. He
finds that with the new estimation the standard deviation of the chirping time is 8
minutes. Determine the average chirping time of the 5 new observations.
Solution:
(c) Show that the moment estimator of λ is λ̂ = 2/X̄.
Since f is the Gam(2, λ) density (so that C = λ²), the first moment gives E(X) = 2/λ, and
equating it with the sample mean yields X̄ = 2/λ. Therefore,
\[
\hat{\lambda} = \frac{2}{\bar{X}}.
\]
(d) Show that the maximum likelihood estimator of λ is the same as its moment estimator.
The likelihood is
\[
L(x_1, \ldots, x_n; \lambda) = \prod_{i=1}^{n} \lambda^2 x_i e^{-\lambda x_i}
= \lambda^{2n}\left(\prod_{i=1}^{n} x_i\right) e^{-\lambda\sum_{i=1}^{n} x_i}.
\]
The loglikelihood is
\[
\ell(x_1, \ldots, x_n; \lambda) = \ln L(x_1, \ldots, x_n; \lambda)
= 2n\ln\lambda + \ln\left(\prod_{i=1}^{n} x_i\right) - \lambda\sum_{i=1}^{n} x_i.
\]
\[
\left(\text{Note: } \ln\prod_{i=1}^{n} x_i = \sum_{i=1}^{n}\ln x_i.\right)
\]
Consider
\[
\frac{\partial \ell}{\partial \lambda} = \frac{d\ell}{d\lambda} = \frac{2n}{\lambda} - \sum_{i=1}^{n} x_i.
\]
For ∂ℓ/∂λ = 0,
\begin{align*}
\frac{2n}{\hat{\lambda}} - \sum_{i=1}^{n} x_i &= 0, \\
\frac{2n}{\hat{\lambda}} &= \sum_{i=1}^{n} x_i, \\
\hat{\lambda} &= \frac{2n}{\sum_{i=1}^{n} x_i} = \frac{2}{\bar{X}}.
\end{align*}
(e) Eddy records the chirping time (in minutes) of Marigold for 20 days as follows:
2, 5, 5, 7, 8, 8, 9, 12, 12, 12, 12, 12, 15, 16, 18, 18, 18, 18, 18, 25
(i) Estimate the parameter λ and write down the corresponding variance of X.
(ii) Eddy makes 5 more observations and re-estimates λ using all the observed data. He
finds that with the new estimation the standard deviation of the chirping time is 8
minutes. Determine the average chirping time of the 5 new observations.
(i) The sample mean of the 20 observations is x̄ = 250/20 = 12.5, so λ̂ = 2/x̄ = 0.16 and the
corresponding variance is Var(X) = 2/λ̂² = 78.125.
(ii) The new standard deviation is 8 minutes, so consider
\[
\operatorname{Var}(X) = \frac{2}{\hat{\lambda}^2} = 8^2 = 64.
\]
We have
\[
\hat{\lambda} = \sqrt{\frac{2}{64}} = \frac{\sqrt{2}}{8},
\qquad
\frac{2}{\bar{x}} = \frac{\sqrt{2}}{8},
\qquad
\bar{x} = \frac{16}{\sqrt{2}} = 8\sqrt{2}.
\]
That is, the mean of all 25 observations is 8√2 ≈ 11.31 minutes, so their total is 25 × 8√2 = 200√2 ≈ 282.84
minutes. The first 20 observations sum to 250 minutes, so the average chirping time of the 5 new
observations is (200√2 − 250)/5 = 40√2 − 50 ≈ 6.57 minutes.
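A short numerical check of parts (i) and (ii), assuming only the Gam(2, λ) model already used above:

```python
import numpy as np

x20 = np.array([2, 5, 5, 7, 8, 8, 9, 12, 12, 12, 12, 12,
                15, 16, 18, 18, 18, 18, 18, 25], dtype=float)

# part (i): lambda_hat = 2 / xbar and Var(X) = 2 / lambda_hat^2
lam_20 = 2.0 / x20.mean()
print("lambda_hat (20 obs) =", lam_20, "  Var(X) =", 2.0 / lam_20**2)

# part (ii): new SD is 8, so 2 / lambda_new^2 = 64  ->  lambda_new = sqrt(2)/8
lam_new = np.sqrt(2.0 / 64.0)
xbar_25 = 2.0 / lam_new                            # mean of all 25 observations
mean_new5 = (25 * xbar_25 - x20.sum()) / 5
print("average of the 5 new observations ≈", mean_new5)   # about 6.57 minutes
```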
Exercise 3: A theory suggests that X, the number of accidents incurred by a worker per
year, has the following probability function:
\[
f(x) = \begin{cases} \theta(1-\theta)^{x}, & \text{for } x = 0, 1, 2, \ldots; \\ 0, & \text{otherwise}, \end{cases}
\]
where 0 < θ < 1 is an unknown parameter of the distribution of X, with mean (1 − θ)/θ and
variance (1 − θ)/θ². Let {X1, . . . , Xn} be a random sample of size n from the population.
(a) Construct θ̂, the maximum likelihood estimator of θ, in terms of X1, . . . , Xn.
(b) Construct θ̃, the moment estimator of θ, in terms of X1, . . . , Xn.
Solution:
(a) The likelihood is
\[
L(\theta; x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i; \theta)
= \prod_{i=1}^{n} \theta(1-\theta)^{x_i}
= \theta^{n}(1-\theta)^{\sum_{i=1}^{n} x_i}.
\]
The loglikelihood is
\[
\ell(\theta; x_1, \ldots, x_n) = \ln L(\theta; x_1, \ldots, x_n)
= n\ln\theta + \sum_{i=1}^{n} x_i \ln(1-\theta).
\]
Now set
\[
\frac{\partial \ell}{\partial \theta} = \frac{n}{\theta} - \frac{\sum_{i=1}^{n} x_i}{1-\theta} = 0.
\]
Thus,
\begin{align*}
\frac{n}{\hat{\theta}} &= \frac{\sum_{i=1}^{n} x_i}{1-\hat{\theta}}, \\
\frac{1-\hat{\theta}}{\hat{\theta}} &= \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x}, \\
1-\hat{\theta} &= \bar{x}\hat{\theta}, \\
1 &= \bar{x}\hat{\theta} + \hat{\theta} = \hat{\theta}(\bar{x}+1).
\end{align*}
Therefore,
\[
\hat{\theta} = \frac{1}{\bar{X}+1}.
\]
Check the second derivative:
\[
\frac{\partial^2 \ell}{\partial \theta^2} = -n\theta^{-2} - \sum_{i=1}^{n} x_i (1-\theta)^{-2}
= -\frac{n}{\theta^2} - \frac{n\bar{x}}{(1-\theta)^2}.
\]
When θ = 1/(x̄ + 1),
\[
\frac{\partial^2 \ell}{\partial \theta^2} = -\frac{n}{\theta^2} - \frac{n\bar{x}}{(1-\theta)^2}
= -n(\bar{x}+1)^2 - \frac{n(\bar{x}+1)^2}{\bar{x}}
= -n(\bar{x}+1)^2\left(1 + \frac{1}{\bar{x}}\right) < 0.
\]
Therefore, θ̂ = 1/(X̄ + 1) is indeed a maximum (as long as X̄ > 0, i.e., at least one accident is observed).
(b) Construct θ̃, the moment estimator of θ, in terms of X1, . . . , Xn.
Given E(X) = (1 − θ)/θ, solve
\[
\bar{X} = \frac{1-\tilde{\theta}}{\tilde{\theta}}
\]
for θ̃, which gives
\[
\tilde{\theta} = \frac{1}{\bar{X}+1}.
\]
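Both estimators reduce to 1/(X̄ + 1). As a quick check, the sketch below simulates accident counts from this distribution with an arbitrary true value θ = 0.3 (numpy's geometric generator counts trials rather than failures, so one is subtracted to match the pmf θ(1 − θ)^x).

```python
import numpy as np

rng = np.random.default_rng(5)
theta_true, n = 0.3, 5000                         # illustrative values

# f(x) = theta * (1 - theta)^x, x = 0, 1, 2, ... counts failures before the first
# success; numpy's geometric counts trials, so subtract 1.
x = rng.geometric(theta_true, size=n) - 1

theta_hat = 1.0 / (x.mean() + 1.0)                # MLE = moment estimator
print("theta_hat ≈", theta_hat, " (true value", theta_true, ")")
```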
Exercise 4: Let {X1, . . . , Xn} be a random sample from a distribution with probability density
function f(x; θ) = (θ + 1)x^θ, for 0 < x < 1, where θ > −1 is an unknown parameter. Construct
(a) the maximum likelihood estimator θ̂1 and (b) the moment estimator θ̂2 of θ.
Solution:
(a) The likelihood is
\[
L(\theta; x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i; \theta)
= \prod_{i=1}^{n} (\theta+1)x_i^{\theta}
= (\theta+1)^{n}(x_1 x_2 \cdots x_n)^{\theta}.
\]
The loglikelihood is
\[
\ell(\theta; x_1, \ldots, x_n) = \ln L(\theta; x_1, \ldots, x_n)
= n\ln(\theta+1) + \theta\ln(x_1 x_2 \cdots x_n).
\]
\[
\left(\text{Note: } \ln\prod_{i=1}^{n} x_i = \sum_{i=1}^{n}\ln x_i.\right)
\]
\[
\frac{\partial \ell}{\partial \theta} = \frac{d\ell}{d\theta} = \frac{n}{\theta+1} + \sum_{i=1}^{n}\ln(x_i).
\]
For ∂ℓ/∂θ = 0,
\begin{align*}
\frac{n}{\hat{\theta}+1} + \sum_{i=1}^{n}\ln(x_i) &= 0, \\
\frac{n}{\hat{\theta}+1} &= -\sum_{i=1}^{n}\ln(x_i), \\
\hat{\theta}+1 &= \frac{-n}{\sum_{i=1}^{n}\ln(x_i)}.
\end{align*}
Therefore,
\[
\hat{\theta}_1 = -1 - \frac{n}{\sum_{i=1}^{n}\ln(X_i)}.
\]
Remark: Check the second derivative. Since ∂ℓ/∂θ = n(θ + 1)^{−1} + Σ ln(x_i),
\[
\frac{\partial^2 \ell}{\partial \theta^2} = -n(\theta+1)^{-2} = \frac{-n}{(\theta+1)^2} < 0,
\]
therefore the above is a maximum.
(b) The first theoretical moment is
\[
E(X) = \int_{0}^{1} x(\theta+1)x^{\theta}\,dx
= \int_{0}^{1} (\theta+1)x^{\theta+1}\,dx
= \left[\frac{(\theta+1)x^{\theta+2}}{\theta+2}\right]_{0}^{1}
= \frac{\theta+1}{\theta+2}.
\]
Equating it with the sample mean, (θ̂ + 1)/(θ̂ + 2) = X̄, so
\begin{align*}
\hat{\theta}+1 &= \bar{X}\hat{\theta} + 2\bar{X}, \\
\hat{\theta} - \bar{X}\hat{\theta} &= 2\bar{X} - 1, \\
\hat{\theta}(1-\bar{X}) &= 2\bar{X} - 1.
\end{align*}
Therefore,
\[
\hat{\theta}_2 = \frac{2\bar{X}-1}{1-\bar{X}}.
\]
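Since f(x) = (θ + 1)x^θ on (0, 1) is the Beta(θ + 1, 1) density, the two estimators can be compared on simulated data; the true value θ = 2 below is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(6)
theta_true, n = 2.0, 5000                              # illustrative values

# f(x) = (theta + 1) x^theta on (0, 1) is the Beta(theta + 1, 1) density
x = rng.beta(theta_true + 1.0, 1.0, size=n)

theta_ml = -1.0 - n / np.log(x).sum()                  # MLE:  -1 - n / sum(ln x_i)
theta_mm = (2.0 * x.mean() - 1.0) / (1.0 - x.mean())   # MoM: (2*xbar - 1) / (1 - xbar)

print("theta_ML ≈", theta_ml, "  theta_MM ≈", theta_mm)
```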