Chapter 7 for BST 695: Special Topics in Statistical Theory.
Kui Zhang, 2011
Chapter 7 – Methods of Finding Estimators
Section 7.1 – Introduction
Definition 7.1.1 A point estimator is any function $W(\mathbf{X}) = W(X_1, X_2, \ldots, X_n)$ of a sample; that is, any statistic is a point estimator.
Notes:
estimator: a function of the sample, $W(\mathbf{X}) = W(X_1, X_2, \ldots, X_n)$
estimate: the realized value (a number) of an estimator, $w(\mathbf{x}) = w(x_1, x_2, \ldots, x_n)$
Section 7.2 – Methods of Finding Estimators
7.2.1 Method of Moments (MME)
Notes:
the oldest method, dating back at least to Karl Pearson in the late 1800s
the idea is simple; however, the resulting estimators can often be improved upon
Let $X_1, \ldots, X_n$ be iid from a pmf or pdf $f(x \mid \theta_1, \ldots, \theta_k)$. We have

1st sample moment: $m_1 = \frac{1}{n}\sum_{i=1}^{n} X_i$

1st population moment: $\mu_1' = E X^1 = \mu_1'(\theta_1, \ldots, \theta_k)$

$k$th sample moment: $m_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$

$k$th population moment: $\mu_k' = E X^k = \mu_k'(\theta_1, \ldots, \theta_k)$

To get the MME: "Equate" the first $k$ sample moments to the corresponding $k$ population moments and solve these equations for $(\theta_1, \ldots, \theta_k)$ in terms of $(m_1, \ldots, m_k) = \left(\frac{1}{n}\sum_{i=1}^{n} X_i, \frac{1}{n}\sum_{i=1}^{n} X_i^2, \ldots, \frac{1}{n}\sum_{i=1}^{n} X_i^k\right)$.
Example 7.2.1 (Normal method of moments) Suppose $X_1, \ldots, X_n$ are iid from a $n(\theta, \sigma^2)$. In this case, $k = 2$, $\theta_1 = \theta$, and $\theta_2 = \sigma^2$.
Solution: From $m_1 = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X} = \theta$ and $m_2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 = \theta^2 + \sigma^2$, we can get:
$$\tilde{\theta} = \bar{X} \quad \text{and} \quad \tilde{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{n-1}{n} S^2.$$
Example 7.2.2 (Binomial method of moments) Suppose $X_1, \ldots, X_n$ are iid from a binomial$(m, p)$ where both $m$ and $p$ are unknown. In this case, $k = 2$, $\theta_1 = m$, and $\theta_2 = p$.
Solution: From $\bar{X} = mp$ and $\frac{1}{n}\sum_{i=1}^{n} X_i^2 = mp(1-p) + m^2 p^2 = mp(1 - p + mp)$, we have:
$$\tilde{m} = \frac{\bar{X}^2}{\bar{X} - \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2} \quad \text{and} \quad \tilde{p} = \frac{\bar{X}}{\tilde{m}}.$$
Note: Method of moments may give estimates that are outside the range of the parameters.
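As a quick numerical illustration (a minimal sketch; the data below are made up), the binomial MME from Example 7.2.2 can produce an estimate of $m$ outside the parameter range whenever the sample variance term is close to or exceeds $\bar{X}$:

```python
import numpy as np

# Hypothetical counts assumed to come from a binomial(m, p) model
x = np.array([5, 4, 6, 7, 3, 5, 6, 8, 4, 5])

xbar = x.mean()
v = ((x - xbar) ** 2).mean()      # (1/n) * sum (x_i - xbar)^2, as in Example 7.2.2

m_mme = xbar**2 / (xbar - v)      # may be non-integer, smaller than max(x), or even negative
p_mme = xbar / m_mme

print(m_mme, p_mme)               # if v >= xbar, m_mme is negative: outside the parameter range
```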
7.2.2 Maximum Likelihood (MLE)
Let $X_1, \ldots, X_n$ be iid from a pmf or pdf $f(x \mid \theta_1, \ldots, \theta_k)$. The likelihood function is defined by
$$L(\boldsymbol{\theta} \mid \mathbf{x}) = L(\theta_1, \ldots, \theta_k \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta_1, \ldots, \theta_k).$$
Definition 7.2.4 For each sample point $\mathbf{x}$, let $\hat{\theta}(\mathbf{x})$ be a parameter value at which $L(\boldsymbol{\theta} \mid \mathbf{x})$ attains its maximum as a function of $\boldsymbol{\theta}$, with $\mathbf{x}$ held fixed. A maximum likelihood estimator (MLE) of the parameter $\boldsymbol{\theta}$ based on a sample $\mathbf{X}$ is $\hat{\theta}(\mathbf{X})$.
Notes:
1. Finding the MLE can be difficult in some cases.
2. The MLE may be obtained through differentiation, but in some cases differentiation will not work.
3. When differentiation is used to find the MLE, it is often easier to work with the natural log of the likelihood.
4. Maximization should be carried out only over the range of the parameter.
5. If the MLE cannot be obtained analytically, it can be obtained numerically (a minimal numerical sketch follows).
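As a sketch of the numerical route in note 5 (the gamma model, data, and starting values here are hypothetical, chosen only for illustration), a general-purpose optimizer can be applied to the negative log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

# Hypothetical data assumed to come from a gamma(alpha, beta) model
x = np.array([1.2, 0.8, 2.5, 1.7, 0.9, 3.1, 1.4, 2.2])

def neg_log_lik(params):
    alpha, beta = params
    if alpha <= 0 or beta <= 0:          # keep the search inside the parameter space
        return np.inf
    return -np.sum(gamma.logpdf(x, a=alpha, scale=beta))

# Numerical MLE: minimize the negative log-likelihood from a rough starting value
result = minimize(neg_log_lik, x0=[1.0, 1.0], method="Nelder-Mead")
alpha_hat, beta_hat = result.x
print(alpha_hat, beta_hat)
```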
Example 7.2.5 (Normal likelihood) Let $X_1, \ldots, X_n$ be iid from a $n(\theta, 1)$. Show that $\bar{X}$ is the MLE of $\theta$ using derivatives.
Solution:
Step 1: Find the solutions of the equation $\frac{d}{d\theta} L(\theta \mid \mathbf{x}) = 0$, which gives the possible solutions.
Step 2: Verify that the solution achieves the global maximum ($\frac{d^2}{d\theta^2} L(\theta \mid \mathbf{x}) \big|_{\theta = \bar{x}} < 0$ in this case).
Step 3: Check the boundaries ($\theta \to \pm\infty$ in this case; it is not necessary in this case).
Example 7.2.6 Recall Theorem 5.2.4 (p. 212) part (a): If $x_1, \ldots, x_n$ are any numbers and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, then for any real number $a$,
$$\sum_{i=1}^{n} (x_i - a)^2 \ge \sum_{i=1}^{n} (x_i - \bar{x})^2,$$
with equality if and only if $a = \bar{x}$. This implies that for any $\theta$,
$$\exp\!\left(-\frac{1}{2}\sum_{i=1}^{n} (x_i - \theta)^2\right) \le \exp\!\left(-\frac{1}{2}\sum_{i=1}^{n} (x_i - \bar{x})^2\right),$$
with equality if and only if $\theta = \bar{x}$. So $\bar{X}$ is the MLE.
Example 7.2.7 (Bernoulli MLE) Let $X_1, \ldots, X_n$ be iid Bernoulli($p$). Find the MLE of $p$, where $0 \le p \le 1$. Note that we include the possibility that $p = 0$ or $p = 1$.
Solution: Use the natural log of the likelihood function.
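A sketch of the standard calculation, writing $y = \sum_{i=1}^{n} x_i$ for the number of successes:
$$\log L(p \mid \mathbf{x}) = y \log p + (n - y)\log(1 - p), \qquad \frac{d}{dp}\log L(p \mid \mathbf{x}) = \frac{y}{p} - \frac{n - y}{1 - p} = 0 \;\Longrightarrow\; \hat{p} = \frac{y}{n} = \bar{x},$$
with the boundary cases $y = 0$ and $y = n$ handled separately (there the likelihood is maximized at $\hat{p} = 0$ and $\hat{p} = 1$, respectively).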
Example 7.2.8 (Restricted range MLE) Let $X_1, \ldots, X_n$ be iid from a $n(\theta, 1)$, where $\theta \ge 0$.
Solution: Without any restriction, $\bar{X}$ is the MLE. So when $\bar{x} \ge 0$, $\hat{\theta} = \bar{x}$. When $\bar{x} < 0$, $L(\theta \mid \mathbf{x})$ achieves its maximum over $\theta \ge 0$ at $\hat{\theta} = 0$, so $\hat{\theta} = 0$ in this situation. In summary:
$$\hat{\theta} = \bar{X}\, I_{[0, \infty)}(\bar{X}) = \begin{cases} \bar{X}, & \bar{X} \ge 0; \\ 0, & \bar{X} < 0. \end{cases}$$
Example 7.2.9 (Binomial MLE, unknown number of trials) Let $X_1, \ldots, X_n$ be iid binomial($k$, $p$). Find the MLE of $k$, where $p$ is known and $k$ is unknown. (This is an example where differentiation will not be used to obtain the MLE.)
Solution: The likelihood function is:
$$L(k \mid p, \mathbf{x}) = \prod_{i=1}^{n} \binom{k}{x_i} p^{x_i} (1-p)^{k - x_i}.$$
Then consider the ratio $L(k \mid p, \mathbf{x}) / L(k-1 \mid p, \mathbf{x})$ and determine the values of $k \ge \max_i x_i$ at which the likelihood stops increasing.
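A minimal numerical sketch of this search (the data and the known $p$ below are hypothetical): evaluate the log-likelihood over a grid of candidate $k$ values starting at $\max_i x_i$ and take the maximizer.

```python
import numpy as np
from scipy.stats import binom

# Hypothetical data assumed to be iid binomial(k, p) with p known and k unknown
x = np.array([3, 5, 4, 6, 2, 5, 4])
p = 0.4

k_candidates = np.arange(x.max(), x.max() + 50)   # k must be at least max(x_i)
log_lik = np.array([binom.logpmf(x, k, p).sum() for k in k_candidates])

k_hat = k_candidates[np.argmax(log_lik)]
print(k_hat)
```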
Invariance Property of Maximum Likelihood Estimators
Definition Consider a function $\tau(\theta)$ that is not necessarily one-to-one, so that for a given value $\eta$ there may be more than one value $\theta$ such that $\tau(\theta) = \eta$. The induced likelihood function, $L^*$, of $\eta = \tau(\theta)$ is given by:
$$L^*(\eta \mid \mathbf{x}) = \sup_{\{\theta:\, \tau(\theta) = \eta\}} L(\theta \mid \mathbf{x}).$$
The value $\hat{\eta}$ that maximizes $L^*(\eta \mid \mathbf{x})$ will be called the MLE of $\eta = \tau(\theta)$.
Theorem 7.2.10 (Invariance Property of MLEs) If $\hat{\theta}$ is the MLE of $\theta$, then for any function $\tau(\theta)$, the MLE of $\tau(\theta)$ is $\tau(\hat{\theta})$.
Example Let $X_1, \ldots, X_n$ be iid $n(\theta, 1)$. The MLE of $\theta^2$ is $\bar{X}^2$.
Example Let $X_1, \ldots, X_n$ be iid binomial($k$, $p$) where $k$ is known and $p$ is unknown. Find the MLE of the variance and of the standard deviation of $X_1$.
Solution: The MLE of $p$ is $\hat{p} = \bar{X}/k$. The variance of $X_1$ is $kp(1-p)$, so its MLE is $k\hat{p}(1-\hat{p}) = \bar{X}(1 - \bar{X}/k)$.
The standard deviation of $X_1$ is $\sqrt{kp(1-p)}$, so its MLE is $\sqrt{k\hat{p}(1-\hat{p})} = \sqrt{\bar{X}(1 - \bar{X}/k)}$.
Example Let $X_1, \ldots, X_n$ be iid Poisson($\lambda$). Find the MLE of $P(X = 0)$.
Solution: The MLE of $\lambda$ is $\hat{\lambda} = \bar{X}$. Because $P(X = 0) = \exp(-\lambda)$, the MLE of $P(X = 0)$ is $\exp(-\bar{X})$.
Note: Theorem 7.2.10 includes the multivariate case. If the MLE of $(\theta_1, \ldots, \theta_k)$ is $(\hat{\theta}_1, \ldots, \hat{\theta}_k)$, and if $\tau(\theta_1, \ldots, \theta_k)$ is any function of the parameter vector, then by the invariance property of the MLE, the MLE of $\tau(\theta_1, \ldots, \theta_k)$ is $\tau(\hat{\theta}_1, \ldots, \hat{\theta}_k)$.
Example 7.2.11 (Normal MLEs, $\mu$ and $\sigma^2$ unknown) Let $X_1, \ldots, X_n$ be iid from a $n(\mu, \sigma^2)$ where both $\mu$ and $\sigma^2$ are unknown. Then the MLE of $\mu$ is $\hat{\mu} = \bar{X}$ and the MLE of $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{n-1}{n} S^2.$$
Solution: Verify these estimators using (a) univariate calculus (this Example, Example 7.2.11) and (b) multivariate
calculus (Example 7.2.12).
Notes:
1. The MLE is susceptible to numerical instability when the MLEs do not have an explicit expression.
2. How sensitive is the MLE to measurement error in the data? (see Example 7.2.13.)
7.2.3 Bayes Estimators
Bayesian Approach to Statistics
The parameter $\theta$ is treated as a random quantity described by a probability distribution known as the prior distribution.
A sample is then taken from the population indexed by $\theta$.
The prior distribution is updated with this sample information to get what is known as the posterior distribution, using Bayes' rule (Theorem 1.3.5, p. 23). Let $\pi(\theta)$ denote the prior distribution of $\theta$ and let $f(\mathbf{x} \mid \theta)$ be the sampling distribution of the sample. The posterior distribution of $\theta$ given the sample $\mathbf{x}$ is given by
$$\pi(\theta \mid \mathbf{x}) = \frac{f(\mathbf{x} \mid \theta)\, \pi(\theta)}{m(\mathbf{x})},$$
where $m(\mathbf{x})$ is the marginal distribution of $\mathbf{x}$, i.e., $m(\mathbf{x}) = \int f(\mathbf{x} \mid \theta)\, \pi(\theta)\, d\theta$.
The posterior distribution is then used to make statements about $\theta$. For instance, the mean of the posterior distribution may be used as a point estimate of $\theta$.
Example 7.2.14 (Binomial Bayes estimation) Let $X_1, \ldots, X_n$ be iid Bernoulli($p$), where $p$ is unknown. Then $Y = \sum_{i=1}^{n} X_i$ is binomial($n$, $p$). We assume the prior distribution on $p$ is beta($\alpha$, $\beta$). The posterior distribution of $p$ given $Y = y$, $f(p \mid y)$, is beta($y + \alpha$, $n - y + \beta$). Hence the Bayes estimate for $p$ is the mean of the posterior distribution, i.e.,
$$\hat{p}_B = \frac{y + \alpha}{\alpha + \beta + n}.$$
Note that the mean of the prior distribution is $\alpha/(\alpha + \beta)$, and $\hat{p}_B$ may be written as
$$\hat{p}_B = \left(\frac{n}{\alpha + \beta + n}\right) \frac{y}{n} + \left(\frac{\alpha + \beta}{\alpha + \beta + n}\right) \frac{\alpha}{\alpha + \beta}.$$
Hence, the Bayes estimator is a linear combination of the sample mean and the prior mean, with weights determined by $\alpha$, $\beta$, and $n$.
Note: The prior and the posterior distributions are both beta distributions.
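A minimal numerical sketch of Example 7.2.14 (the prior parameters and data below are made up for illustration):

```python
import numpy as np

# Hypothetical Bernoulli data and a beta(alpha, beta) prior on p
x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])
alpha, beta = 2.0, 2.0

n, y = len(x), x.sum()

# Posterior is beta(y + alpha, n - y + beta); its mean is the Bayes estimate
post_a, post_b = y + alpha, n - y + beta
p_bayes = post_a / (post_a + post_b)

# Equivalent weighted-average form: weight on the sample mean vs. the prior mean
p_bayes_alt = (n / (alpha + beta + n)) * (y / n) \
            + ((alpha + beta) / (alpha + beta + n)) * (alpha / (alpha + beta))

print(p_bayes, p_bayes_alt)   # both equal (y + alpha) / (alpha + beta + n)
```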
Definition 7.2.15 Let $\mathcal{F}$ denote the class of pdfs or pmfs $f(x \mid \theta)$ (indexed by $\theta$). A class $\Pi$ of prior distributions is a conjugate family for $\mathcal{F}$ if the posterior distribution is in the class $\Pi$ for all $f \in \mathcal{F}$, all priors in $\Pi$, and all $x \in \mathcal{X}$.
Example 7.2.16 (Normal Bayes estimation) Let $X \sim n(\theta, \sigma^2)$ where $\sigma^2$ is known. We assume the prior distribution on $\theta$ is $n(\mu, \tau^2)$. The posterior distribution of $\theta$ given $X = x$ is also normal (homework problem) with mean
$$E(\theta \mid x) = \frac{\tau^2}{\tau^2 + \sigma^2}\, x + \frac{\sigma^2}{\tau^2 + \sigma^2}\, \mu$$
and variance
$$\operatorname{Var}(\theta \mid x) = \frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2}.$$
Notes:
1. The normal family is its own conjugate family.
2. If the prior information is vague (i.e., $\tau^2$ is very large), then more weight is given to the sample data.
3. If the prior information is good (i.e., $\tau^2$ is small relative to $\sigma^2$), then more weight is given to the prior mean.
Section 7.3 – Methods of Evaluating Estimators
7.3.1 Mean Squared Error
Definition 7.3.1 The mean squared error (MSE) of an estimator $W$ of a parameter $\theta$ is the function of $\theta$ defined by
$$\operatorname{MSE}_\theta(W) = E_\theta (W - \theta)^2 = \operatorname{Var}_\theta W + (\operatorname{Bias}_\theta W)^2,$$
where $\operatorname{Bias}_\theta W = E_\theta W - \theta$.
Definition 7.3.2 The bias of a point estimator $W$ of a parameter $\theta$ is the difference between the expected value of $W$ and $\theta$. An estimator whose bias is identically (in $\theta$) equal to 0 is called unbiased and satisfies $E_\theta W = \theta$ for all $\theta$.
If $W$ is unbiased, then
$$\operatorname{MSE}_\theta(W) = E_\theta (W - \theta)^2 = \operatorname{Var}_\theta W.$$
Example 7.3.3 (Normal MSE) Let $X_1, \ldots, X_n$ be iid from a $n(\mu, \sigma^2)$. We know that $\bar{X}$ and $S^2$ are unbiased estimators of $\mu$ and $\sigma^2$, respectively:
$$E \bar{X} = \mu \quad \text{and} \quad E S^2 = \sigma^2 \quad \text{for all } \mu \text{ and } \sigma^2$$
(which is true even without normality; see Theorem 5.2.6). Thus
$$\operatorname{MSE}(\bar{X}) = E(\bar{X} - \mu)^2 = \operatorname{Var} \bar{X} = \frac{\sigma^2}{n},$$
and
$$\operatorname{MSE}(S^2) = E(S^2 - \sigma^2)^2 = \operatorname{Var} S^2 = \frac{2\sigma^4}{n-1}.$$
(Recall that $(n-1)S^2/\sigma^2 \sim$ chi-square with $n-1$ degrees of freedom, which is gamma$((n-1)/2,\, 2)$, and $\operatorname{Var}(Y) = \alpha\beta^2$ if $Y \sim$ gamma$(\alpha, \beta)$.)
Example 7.3.4 Let $X_1, \ldots, X_n$ be iid from a $n(\mu, \sigma^2)$. Recall that the MLE (and MME) of $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{n-1}{n} S^2.$$
Note that
$$E(\hat{\sigma}^2) = E\!\left(\frac{n-1}{n} S^2\right) = \frac{n-1}{n} \sigma^2$$
and
$$\operatorname{Var} \hat{\sigma}^2 = \operatorname{Var}\!\left(\frac{n-1}{n} S^2\right) = \left(\frac{n-1}{n}\right)^2 \operatorname{Var}(S^2) = \frac{2(n-1)\sigma^4}{n^2},$$
so that
$$\operatorname{MSE}(\hat{\sigma}^2) = \operatorname{Var}(\hat{\sigma}^2) + [\operatorname{Bias}(\hat{\sigma}^2)]^2 = \frac{2(n-1)\sigma^4}{n^2} + \frac{1}{n^2}\sigma^4 = \frac{2n-1}{n^2}\sigma^4.$$
From these formulas, you can verify that $\hat{\sigma}^2$ has a smaller MSE than $S^2$.
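A quick numerical check of this claim (a sketch only; $\sigma^4$ is set to 1 since both formulas scale with it):

```python
import numpy as np

sigma4 = 1.0                               # sigma^4; both MSEs are proportional to it
n = np.arange(2, 11)

mse_s2 = 2 * sigma4 / (n - 1)              # MSE of the unbiased estimator S^2
mse_mle = (2 * n - 1) * sigma4 / n**2      # MSE of the MLE sigma-hat^2

print(np.all(mse_mle < mse_s2))            # True: the MLE has uniformly smaller MSE
```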
Example 7.3.5 (MSE of binomial Bayes estimators) Suppose $X_1, \ldots, X_n$ are iid Bernoulli($p$).
(1) MLE: $\hat{p} = \bar{X}$ is an unbiased estimator of $p$, and
$$\operatorname{MSE}(\hat{p}) = E_p(\hat{p} - p)^2 = \operatorname{Var}_p(\bar{X}) = \frac{p(1-p)}{n}.$$
(2) Bayes estimator: $\hat{p}_B = \dfrac{Y + \alpha}{\alpha + \beta + n}$ is a biased estimator, because
$$E_p(\hat{p}_B) = \frac{np + \alpha}{\alpha + \beta + n} \ne p.$$
The MSE of pˆ B is
$$\begin{aligned}
\operatorname{MSE}(\hat{p}_B) &= \operatorname{Var}_p(\hat{p}_B) + [E_p(\hat{p}_B) - p]^2 \\
&= \operatorname{Var}_p\!\left(\frac{Y + \alpha}{\alpha + \beta + n}\right) + \left(\frac{np + \alpha}{\alpha + \beta + n} - p\right)^2 \\
&= \frac{np(1-p)}{(\alpha + \beta + n)^2} + \left(\frac{np + \alpha}{\alpha + \beta + n} - p\right)^2.
\end{aligned}$$
If we choose $\alpha = \beta = \sqrt{n/4}$, we have $\operatorname{MSE}(\hat{p}_B) = \dfrac{n}{4(n + \sqrt{n})^2}$, a constant in $p$, and $\hat{p}_B = \dfrac{Y + \sqrt{n/4}}{n + \sqrt{n}}$. In this situation, we can determine which of these two estimators is better in terms of the MSE.
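A minimal sketch of that comparison (sample sizes chosen only for illustration): plugging both MSE formulas into Python shows the Bayes estimator doing better near $p = 1/2$ for small $n$, and the MLE doing better over most of the range for large $n$.

```python
import numpy as np

def mse_mle(p, n):
    return p * (1 - p) / n

def mse_bayes(p, n):
    # alpha = beta = sqrt(n/4), which makes the MSE constant in p
    a = b = np.sqrt(n / 4)
    mean = (n * p + a) / (a + b + n)
    var = n * p * (1 - p) / (a + b + n) ** 2
    return var + (mean - p) ** 2

p = np.linspace(0.01, 0.99, 99)
for n in (4, 400):
    # Fraction of the p-grid on which the Bayes estimator has smaller MSE
    frac_bayes_better = np.mean(mse_bayes(p, n) < mse_mle(p, n))
    print(n, frac_bayes_better)
```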
Skip equivariance example (Example 7.3.6)
7.3.2 Best Unbiased Estimator
Consider the class of estimators
$$\mathcal{C}_\tau = \{W : E_\theta W = \tau(\theta)\}.$$
For any $W_1, W_2 \in \mathcal{C}_\tau$, we have $\operatorname{Bias}_\theta W_1 = \operatorname{Bias}_\theta W_2$, so
$$\operatorname{MSE}(W_1) - \operatorname{MSE}(W_2) = E_\theta(W_1 - \tau(\theta))^2 - E_\theta(W_2 - \tau(\theta))^2 = \operatorname{Var}_\theta(W_1) - \operatorname{Var}_\theta(W_2).$$
Definition 7.3.7 An estimator $W^*$ is a best unbiased estimator of $\tau(\theta)$ if it satisfies $E_\theta W^* = \tau(\theta)$ for all $\theta$ and, for any other estimator $W$ with $E_\theta W = \tau(\theta)$, we have $\operatorname{Var}_\theta(W^*) \le \operatorname{Var}_\theta(W)$ for all $\theta$. $W^*$ is also called a uniform minimum variance unbiased estimator (UMVUE) of $\tau(\theta)$.
Note:
A UMVUE may not necessarily exist.
If a UMVUE exists, it is unique (Theorem 7.3.19).
Example 7.3.8 (Poisson unbiased estimation) Let $X_1, \ldots, X_n$ be iid from a Poisson($\lambda$). Note that $E(\bar{X}) = \lambda$ and $E(S^2) = \lambda$ for all $\lambda$. Thus, both $\bar{X}$ and $S^2$ are unbiased estimators of $\lambda$.
Also, note that the class of estimators given by
$$W_a(\bar{X}, S^2) = a\bar{X} + (1-a)S^2$$
is a class of unbiased estimators of $\lambda$ for $0 \le a \le 1$.
To determine which estimator has the smallest MSE, we would need to calculate $\operatorname{Var}(\bar{X})$, $\operatorname{Var}(S^2)$, and $\operatorname{Var}(a\bar{X} + (1-a)S^2)$. The calculation can be very lengthy.
The question here is: how can we find the best, i.e., smallest-variance, unbiased estimator?
Theorem 7.3.9 (Cramér-Rao inequality) Let $X_1, \ldots, X_n$ be a sample with pdf $f(\mathbf{x} \mid \theta)$, and let $W(\mathbf{X}) = W(X_1, \ldots, X_n)$ be any estimator satisfying
$$\frac{d}{d\theta} E_\theta W(\mathbf{X}) = \int_{\mathcal{X}} \frac{\partial}{\partial \theta} \left[ W(\mathbf{x}) f(\mathbf{x} \mid \theta) \right] d\mathbf{x}$$
and
$$\operatorname{Var}_\theta(W(\mathbf{X})) < \infty.$$
Then
$$\operatorname{Var}_\theta(W(\mathbf{X})) \ge \frac{\left(\frac{d}{d\theta} E_\theta W(\mathbf{X})\right)^2}{E_\theta\!\left(\left(\frac{\partial}{\partial \theta} \log f(\mathbf{X} \mid \theta)\right)^2\right)},$$
where log is the natural logarithm.
Corollary 7.3.10 (Cramér-Rao inequality, iid case) If the assumptions of Theorem 7.3.9 are satisfied and, additionally, $X_1, \ldots, X_n$ are iid with pdf $f(x \mid \theta)$, then
$$\operatorname{Var}_\theta(W(\mathbf{X})) \ge \frac{\left(\frac{d}{d\theta} E_\theta W(\mathbf{X})\right)^2}{n\, E_\theta\!\left(\left(\frac{\partial}{\partial \theta} \log f(X \mid \theta)\right)^2\right)}.$$
Notes:
1. The quantity $E_\theta\!\left(\left(\frac{\partial}{\partial \theta} \log f(\mathbf{X} \mid \theta)\right)^2\right)$ is called the information number, or Fisher information, of the sample.
2. The information number gives a bound on the variance of the best unbiased estimator of $\theta$.
3. As the information number increases, we have more information about $\theta$, and we have a smaller bound.
The following lemma helps in the computation of the CRLB.
Lemma 7.3.11 If $f(x \mid \theta)$ satisfies
$$\frac{d}{d\theta} E_\theta\!\left(\frac{\partial}{\partial \theta} \log f(X \mid \theta)\right) = \int \frac{\partial}{\partial \theta}\!\left[\left(\frac{\partial}{\partial \theta} \log f(x \mid \theta)\right) f(x \mid \theta)\right] dx$$
(true for an exponential family), then
$$E_\theta\!\left(\left(\frac{\partial}{\partial \theta} \log f(X \mid \theta)\right)^2\right) = -E_\theta\!\left(\frac{\partial^2}{\partial \theta^2} \log f(X \mid \theta)\right).$$
Example 7.3.12 Recall the Poisson problem. We will show that $\bar{X}$ is the UMVUE of $\lambda$.
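A sketch of the calculation: for the Poisson pmf $f(x \mid \lambda) = e^{-\lambda} \lambda^x / x!$,
$$\frac{\partial}{\partial \lambda} \log f(x \mid \lambda) = \frac{x}{\lambda} - 1, \qquad -E_\lambda\!\left(\frac{\partial^2}{\partial \lambda^2} \log f(X \mid \lambda)\right) = E_\lambda\!\left(\frac{X}{\lambda^2}\right) = \frac{1}{\lambda},$$
so the CRLB for unbiased estimators of $\lambda$ is $\lambda/n$. Since $E_\lambda \bar{X} = \lambda$ and $\operatorname{Var}_\lambda(\bar{X}) = \lambda/n$ attains this bound, $\bar{X}$ is the UMVUE of $\lambda$.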
Note:
A key assumption of the Cramér-Rao Theorem is that one can differentiate under the integral sign. Below is an example where this assumption is not satisfied.
Example 7.3.13 (Unbiased estimator for the scale uniform) Let $X_1, \ldots, X_n$ be iid with pdf
$$f(x \mid \theta) = 1/\theta, \quad 0 < x < \theta.$$
Note:
The Cramér-Rao Lower Bound (CRLB) is not guaranteed to be sharp, i.e., there is no guarantee that the CRLB can be attained.
Example 7.3.14 (Normal variance bound) Let $X_1, \ldots, X_n$ be iid $n(\mu, \sigma^2)$. We have:
$$\text{CRLB} = \frac{2\sigma^4}{n} \quad \text{but} \quad \operatorname{Var}(S^2) = \frac{2\sigma^4}{n-1},$$
hence $S^2$ has variance larger than the CRLB.
Question:
How do we know if there exists an unbiased estimator that achieves the CRLB?
Corollary 7.3.15 (Attainment) Let $X_1, \ldots, X_n$ be iid with pdf $f(x \mid \theta)$, where $f(x \mid \theta)$ satisfies the conditions of the Cramér-Rao Theorem. Let $L(\theta \mid \mathbf{x}) = \prod_{i=1}^{n} f(x_i \mid \theta)$ denote the likelihood function. If $W(\mathbf{X}) = W(X_1, \ldots, X_n)$ is any unbiased estimator of $\tau(\theta)$, then $W(\mathbf{X})$ attains the CRLB if and only if
$$a(\theta)\left[W(\mathbf{x}) - \tau(\theta)\right] = \frac{\partial}{\partial \theta} \log L(\theta \mid \mathbf{x})$$
for some function $a(\theta)$.
Example 7.3.16 Recall the normal problem.
$$L(\mu, \sigma^2 \mid \mathbf{x}) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left\{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2\right\},$$
so that
$$\frac{\partial}{\partial \sigma^2} \log L(\mu, \sigma^2 \mid \mathbf{x}) = \frac{n}{2\sigma^4}\left(\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{n} - \sigma^2\right).$$
If $\mu$ is known, the CRLB is achieved and the UMVUE is $W(\mathbf{X}) = \frac{1}{n}\sum_{i=1}^{n} (X_i - \mu)^2$. Otherwise, no unbiased estimator of $\sigma^2$ will achieve the CRLB.
Question:
1. What can we do to find the "best" estimator if $f(x \mid \theta)$ does not satisfy the assumptions of the Cramér-Rao Theorem?
2. If the CRLB is not attainable, how do we know whether our estimator is the "best"?
7.3.3 Sufficiency and Unbiasedness
Recall two important results:
$$E(X) = E[E(X \mid Y)] \quad \text{and} \quad \operatorname{Var}(X) = \operatorname{Var}[E(X \mid Y)] + E[\operatorname{Var}(X \mid Y)].$$
Theorem 7.3.17 (Rao-Blackwell) Let $W$ be any unbiased estimator of $\tau(\theta)$, and let $T$ be a sufficient statistic for $\theta$. Define $\phi(T) = E(W \mid T)$. Then $E_\theta(\phi(T)) = \tau(\theta)$ and $\operatorname{Var}_\theta(\phi(T)) \le \operatorname{Var}_\theta(W)$ for all $\theta$; i.e., $\phi(T)$ is a uniformly better unbiased estimator of $\tau(\theta)$.
Notes:
1. Conditioning any unbiased estimator on a sufficient statistic will result in an improved estimator.
2. To find the UMVUE, we only need to consider functions of the sufficient statistic.
3. Sufficiency is needed so that the resulting quantity (estimator) after conditioning on the sufficient statistic does not depend on $\theta$.
Example 7.3.18 (Conditioning on an insufficient statistic) Let $X_1$ and $X_2$ be iid from $n(\theta, 1)$. Then $\bar{X}$ is an unbiased estimator (and a sufficient statistic) of $\theta$. Suppose we condition $\bar{X}$ on $X_1$, which is not a sufficient statistic. Let $\phi(X_1) = E(\bar{X} \mid X_1)$. Then $\phi(X_1)$ is unbiased for $\theta$ and has a smaller variance than $\bar{X}$, but $\phi(X_1) = (X_1 + \theta)/2$ depends on $\theta$ and hence is not a valid estimator.
Theorem 7.3.19 If $W$ is a best unbiased estimator of $\tau(\theta)$, then $W$ is unique.
Let $W$ be such that $E_\theta(W) = \tau(\theta)$ and let $U$ be such that $E_\theta(U) = 0$ for all $\theta$. Then
$$\phi_a = W + aU,$$
where $a$ is a constant, forms a class of unbiased estimators of $\tau(\theta)$ with
$$\operatorname{Var}_\theta(\phi_a) = \operatorname{Var}_\theta W + 2a \operatorname{Cov}_\theta(W, U) + a^2 \operatorname{Var}_\theta U.$$
Question:
Which is a better estimator, $W$ or $\phi_a$?
Theorem 7.3.20 If $E_\theta W = \tau(\theta)$, then $W$ is the best unbiased estimator of $\tau(\theta)$ if and only if $W$ is uncorrelated with all unbiased estimators of 0.
Example 7.3.21 (Unbiased estimators of 0) Let $X$ be an observation from a uniform$(\theta, \theta+1)$ distribution. Then
$$E_\theta X = \int_\theta^{\theta+1} x\, dx = \theta + \frac{1}{2} \quad \text{and} \quad \operatorname{Var}_\theta X = \frac{1}{12}.$$
Therefore, $X - \frac{1}{2}$ is an unbiased estimator of $\theta$. We will show that $X - \frac{1}{2}$ is correlated with an unbiased estimator of 0, and hence cannot be a best unbiased estimator of $\theta$.
Note:
If a family of pdfs $f(x \mid \theta)$ has the property that there are no unbiased estimators of 0 other than 0 itself, then our search would be ended, since $\operatorname{Cov}_\theta(W, 0) = 0$. What is this property called?
Example 7.3.22 (continuation of Example 7.3.13) Let $X_1, \ldots, X_n$ be iid uniform$(0, \theta)$. Then $\frac{n+1}{n} Y$, where $Y = X_{(n)}$, is an unbiased estimator of $\theta$.
Solution:
1. The conditions of the Cramér-Rao Theorem were not satisfied.
2. By the Rao-Blackwell Theorem, we only need to consider unbiased estimators of $\theta$ based on $Y$.
3. $Y$ is a complete sufficient statistic; therefore $Y$ is uncorrelated with all unbiased estimators of 0, since the only such estimator is 0 itself.
4. Hence $\frac{n+1}{n} Y$ is the best unbiased estimator of $\theta$.
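A small simulation sketch (the values of $\theta$, $n$, and the number of replications are arbitrary) comparing $\frac{n+1}{n} X_{(n)}$ with the moment-based unbiased estimator $2\bar{X}$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 3.0, 10, 100_000

x = rng.uniform(0, theta, size=(reps, n))

est_order = (n + 1) / n * x.max(axis=1)    # ((n+1)/n) * X_(n)
est_moment = 2 * x.mean(axis=1)            # 2 * Xbar, also unbiased for theta

# Both means should be close to theta; the order-statistic estimator has much smaller variance
print(est_order.mean(), est_order.var())
print(est_moment.mean(), est_moment.var())
```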
Important Note:
What is critical is the completeness of the family of distributions of the sufficient statistic, not the completeness of the original family.
Theorem 7.3.23 Let $T$ be a complete sufficient statistic for a parameter $\theta$, and let $\phi(T)$ be any estimator based only on $T$. Then $\phi(T)$ is the unique best unbiased estimator of its expected value.
Result:
If $T$ is a complete sufficient statistic for a parameter $\theta$ and $h(X_1, \ldots, X_n)$ is any unbiased estimator of $\tau(\theta)$, then $\phi(T) = E[h(X_1, \ldots, X_n) \mid T]$ is the unique best unbiased estimator of $\tau(\theta)$.
Example 7.3.24 (Binomial best unbiased estimation) Let $X_1, \ldots, X_n$ be iid binomial$(k, \theta)$. We want to estimate
$$\tau(\theta) = P_\theta(X = 1) = k\theta(1-\theta)^{k-1}.$$
Solution: Recall that $\sum_{i=1}^{n} X_i \sim$ binomial$(kn, \theta)$ is a complete sufficient statistic for $\theta$.
Question:
How do we find an unbiased estimator of $\tau(\theta)$? Once we find an unbiased estimator, how do we get the best unbiased estimator?
7.3.4 Loss Function Optimality
Decision Theory:
Setting: Observed data $\mathbf{X} = \mathbf{x}$, where $\mathbf{X} \sim f(\mathbf{x} \mid \theta)$.
Let $\mathcal{A}$ = action space, i.e., the set of allowable decisions regarding $\theta$.
Definition: A loss function $L(\theta, a)$ is a nonnegative function that generally increases as the distance between an action $a$ and $\theta$ increases.
Note:
$L(\theta, \theta) = 0$ (What does this mean? The loss is minimal if the action is correct.)
If $\theta$ is real-valued, two commonly used loss functions are
absolute error loss $L(\theta, a) = |a - \theta|$: relatively more penalty on small discrepancies
squared error loss $L(\theta, a) = (a - \theta)^2$: relatively more penalty on large discrepancies
Other examples:
$$L(\theta, a) = \begin{cases} (a - \theta)^2, & a < \theta; \\ 10(a - \theta)^2, & a \ge \theta, \end{cases}$$
which penalizes overestimation more than underestimation;
$$L(\theta, a) = \frac{(a - \theta)^2}{|\theta| + 1},$$
which penalizes errors in estimation more if $\theta$ is near 0 than if $|\theta|$ is large.
Definition: In decision theoretic analysis, the quality of an estimator, $\delta(\mathbf{X})$, is quantified by its risk function, defined by
$$R(\theta, \delta) = E_\theta L(\theta, \delta(\mathbf{X})),$$
i.e., at a given $\theta$ the risk function is the average loss that will be incurred if the estimator $\delta(\mathbf{X})$ is used.
Notes:
MSE is an example of a risk function, with respect to the squared error loss:
$$R(\theta, \delta) = E_\theta L(\theta, \delta(\mathbf{X})) = E_\theta(\delta(\mathbf{X}) - \theta)^2 = \operatorname{Var}_\theta(\delta(\mathbf{X})) + (\operatorname{Bias}_\theta(\delta(\mathbf{X})))^2.$$
We want to find an estimator whose risk function is small for all $\theta$ relative to another estimator. However, most of the time the risk functions of two estimators cross.
Example 7.3.25 (Binomial risk functions) Recall Example 7.3.5, comparing the Bayes estimator and the MLE of the Bernoulli parameter $p$:
$$\hat{p}_B = \frac{\sum_{i=1}^{n} X_i + \sqrt{n/4}}{n + \sqrt{n}} \quad \text{and} \quad \hat{p} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Example 7.3.26 (Risk of normal variance) Let $X_1, \ldots, X_n$ be iid from $n(\mu, \sigma^2)$. We want to estimate $\sigma^2$, considering estimators of the form $\delta_b(\mathbf{X}) = bS^2$.
Solution: Recall that $E S^2 = \sigma^2$ and, for normal samples, $\operatorname{Var}(S^2) = \dfrac{2\sigma^4}{n-1}$.
The risk function with respect to the squared error loss is
$$\begin{aligned}
R((\mu, \sigma^2), \delta_b) &= \operatorname{Var}(bS^2) + (E(bS^2) - \sigma^2)^2 \\
&= b^2 \operatorname{Var}(S^2) + (b\sigma^2 - \sigma^2)^2 \\
&= \left[\frac{2b^2}{n-1} + (b-1)^2\right] \sigma^4.
\end{aligned}$$
Notes:
1. The resulting risk function does not depend on $\mu$.
2. The value of $b$ that minimizes this risk function is $b = \dfrac{n-1}{n+1}$. Thus, for every value of $(\mu, \sigma^2)$, the estimator with the smallest risk among all estimators of the form $\delta_b(\mathbf{X}) = bS^2$ is
$$\frac{n-1}{n+1} S^2 = \frac{1}{n+1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
(see Figure 7.3.2, p. 351, for $n = 5$).
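A short sketch evaluating this risk (in units of $\sigma^4$) on a grid of $b$ values and confirming that the minimizer is approximately $(n-1)/(n+1)$:

```python
import numpy as np

n = 5
b = np.linspace(0.2, 1.5, 1301)

# Risk of b*S^2 under squared error loss, in units of sigma^4
risk = 2 * b**2 / (n - 1) + (b - 1) ** 2

b_min = b[np.argmin(risk)]
print(b_min, (n - 1) / (n + 1))   # approximately 0.667 = (n-1)/(n+1) for n = 5
```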
Example 7.3.27 (Variance estimation using Stein's loss) Let $X_1, \ldots, X_n$ be iid from a population with positive, finite variance $\sigma^2$. We want to estimate $\sigma^2$.
Solution: Consider estimators of the form $\delta_b(\mathbf{X}) = bS^2$ and the loss function
$$L(\sigma^2, a) = \frac{a}{\sigma^2} - 1 - \log\frac{a}{\sigma^2} \quad \text{(attributed to Stein).}$$
In this case, the risk function is given by
$$R(\sigma^2, \delta_b) = E\!\left(\frac{bS^2}{\sigma^2} - 1 - \log\frac{bS^2}{\sigma^2}\right) = b - 1 - \log b - E\!\left(\log\frac{S^2}{\sigma^2}\right).$$
Note that $E(\log(S^2/\sigma^2))$ does not depend on $b$. To minimize this risk function, we find the $b$ that minimizes $b - \log(b)$, which is $b = 1$. Hence, the estimator with the smallest risk for all values of $\sigma^2$ is
$$\delta_1(\mathbf{X}) = S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2.$$
Bayesian Approach to Loss Function
Definition: Given a prior distribution $\pi(\theta)$, the Bayes risk of an estimator $\delta$ is
$$\int_\Theta R(\theta, \delta)\, \pi(\theta)\, d\theta = \int_\Theta \left(\int_{\mathcal{X}} L(\theta, \delta(\mathbf{x}))\, f(\mathbf{x} \mid \theta)\, d\mathbf{x}\right) \pi(\theta)\, d\theta,$$
and the estimator that results in the smallest value of the Bayes risk is known as the Bayes rule with respect to the prior $\pi(\theta)$.
Note that
$$\int_\Theta R(\theta, \delta)\, \pi(\theta)\, d\theta = \int_{\mathcal{X}} \left[\int_\Theta L(\theta, \delta(\mathbf{x}))\, \pi(\theta \mid \mathbf{x})\, d\theta\right] m(\mathbf{x})\, d\mathbf{x},$$
where the quantity in the square brackets is known as the posterior expected loss.
The action $\delta(\mathbf{x})$ that minimizes the posterior expected loss will also minimize the Bayes risk.
Example 7.3.28 (Two Bayes rules) Suppose we want to estimate $\theta$.
1. For the squared error loss, the posterior expected loss is
$$\int (\theta - a)^2\, \pi(\theta \mid \mathbf{x})\, d\theta = E\left((\theta - a)^2 \mid \mathbf{X} = \mathbf{x}\right),$$
where $\theta \sim \pi(\theta \mid \mathbf{x})$. This is minimized by $\delta^\pi(\mathbf{x}) = E(\theta \mid \mathbf{x})$, so the Bayes rule is the mean of the posterior distribution (Example 2.2.6).
2. For the absolute error loss, the posterior expected loss is
$$\int |\theta - a|\, \pi(\theta \mid \mathbf{x})\, d\theta = E\left(|\theta - a| \mid \mathbf{X} = \mathbf{x}\right),$$
which is minimized by taking $\delta^\pi(\mathbf{x})$ to be the median of the posterior distribution (Exercise 2.18).
Example 7.3.29 (Normal Bayes estimates) Let $X_1, \ldots, X_n$ be iid from $n(\theta, \sigma^2)$ and let $\pi(\theta)$ be $n(\mu, \tau^2)$, where $\sigma^2$, $\mu$, $\tau^2$ are known. From Example 7.2.16 and your homework problem (Exercise 7.22), the posterior distribution of $\theta$ given $\bar{X} = \bar{x}$ is normal with mean
$$E(\theta \mid \bar{x}) = \frac{\tau^2}{\tau^2 + \sigma^2/n}\, \bar{x} + \frac{\sigma^2/n}{\tau^2 + \sigma^2/n}\, \mu$$
and variance
$$\operatorname{Var}(\theta \mid \bar{x}) = \frac{\tau^2\, \sigma^2/n}{\tau^2 + \sigma^2/n}.$$
1. For the squared error loss, $\delta^\pi(\mathbf{x}) = E(\theta \mid \bar{x})$.
2. For the absolute error loss, $\delta^\pi(\mathbf{x})$ = median of the posterior distribution $= E(\theta \mid \bar{x})$ (since the posterior is normal, its mean and median coincide).