Introduction to maximum likelihood (ML) Lecture
Søren Højsgaard, Ulrich Halekoh
Unit of Statistics and Decision Analysis, Faculty of Agricultural Sciences, University of Aarhus
November 26, 2007
http://gbi.agrsci.dk/statistics/courses/Rcourse-DJF2008/
Contents

1 Introduction
2 Estimation
3 The likelihood function
  3.1 ML Estimates and ML estimators
  3.2 Properties of ML estimators
  3.3 Variance of MLE
  3.4 Maximizing the likelihood
  3.5 Calculating the asymptotic variance
  3.6 Transformation of MLE
4 Tests of hypotheses
  4.1 The likelihood ratio test
  4.2 The Wald test
  4.3 The score test*
5 Confidence intervals
  5.1 Wald based intervals and the Wald test
  5.2 Confidence intervals based on logodds
  5.3 Likelihood ratio based intervals
  5.4 Comparisons of the intervals
  5.5 Interpretation of confidence intervals
  5.6 Reparameterizing the binomial model*
6 Taylor approximation*
7 Newton–Raphson method*
8 Transformations of normal variables: the ∆-method*

Introduction

The presented material aims at giving a brief introduction to likelihood methods and related topics. These topics will be dealt with over and over again throughout the course. Therefore, do not panic if they seem very abstract at first reading; the topics will become more concrete later on.
Estimation
An estimate (more precisely, a point estimate) of an unknown parameter θ in a model is a guess based on observed data y.

Example 2.1 Out of n = 20 male tobacco budworm moths (Heliothis virescens), y = 4 died after they were given a specific dose of the pesticide trans-cypermethrin. We assume that each moth has the same probability θ of dying and that the moths die independently of each other. Then the observations follow a binomial model Y ∼ bin(n, θ), i.e. the probability of observing y deaths out of n is

p(y; θ) = P(Y = y; θ) = (n choose y) · θ^y (1 − θ)^(n−y)

where (n choose y) = n!/(y!(n − y)!) with n! = n · (n − 1) · (n − 2) · · · 2 · 1.
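As a quick illustration (our own R snippet, not part of the original notes), this probability can be evaluated for a given value of θ:

p <- function(theta) choose(20, 4) * theta^4 * (1 - theta)^16   # p(4; theta) from Example 2.1
p(0.2)                              # about 0.218
dbinom(4, size = 20, prob = 0.2)    # the same value via R's built-in binomial pmf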
A natural estimate of θ is the proportion θ̂ = y/n = 4/20 = 0.2.

We speak of an interval estimate (θ_L(y), θ_U(y)) if we state that θ_L(y) < θ < θ_U(y) with some degree of certainty.

Example 2.2 A 95% confidence interval for θ is given by

(θ_L(y); θ_U(y)) = (θ̂ − 1.96·√Var(θ̂); θ̂ + 1.96·√Var(θ̂)) = (0.2 − 0.18; 0.2 + 0.18) = (0.02; 0.38).
The likelihood function
With n = 20 and y = 4 we have (20 choose 4) = 4845, so the probability of observing y = 4 dead moths is

p(4; θ) = 4845 · θ^4 (1 − θ)^16.

Once data are observed, p(y; θ) is a function of the unknown parameter θ only. We call this the likelihood function, written L(θ) = p(y; θ).
Figure 1 shows L(θ) = p(4; θ) for different values of θ, i.e. how likely it is to observe y = 4 under different values of θ. The value θ̂ which maximizes L(θ) is called the maximum likelihood estimate, or MLE for short. Figure 1 suggests that θ̂ = 0.2 = 4/20, the relative frequency.
Figure 1: Plot of the likelihood function Lik(theta, y = 4, n = 20) against theta.

One can think of θ̂ as the value of θ which maximizes the probability of observing the data which one actually has observed. In that sense θ̂ is the value of θ which fits the data most closely.
3.1 ML Estimates and ML estimators
The value θ̂(y = 4) = 4/20 = 0.2 is the maximum likelihood estimate (MLE). The question is how good the estimate is. If n = 200 and y = 40, the MLE is also 0.2, but intuition suggests that the estimate is more precise in the latter case. Hence there must be a characteristic of the estimate which is not simply the estimate itself. To discuss this we need to make the distinction between an estimate and an estimator. Consider the model Y ∼ bin(n, θ). Here Y is a random variable, and the outcome of Y (the number of dead moths) will not be the same when the experiment is repeated several times. Hence θ̂(Y) = Y/n is a random variable because it is a function of Y. We say that θ̂(Y) is the maximum likelihood estimator.
Defining the function L(θ) in R:

Lik <- function(theta, y, n) {
    choose(n, y) * theta^y * (1 - theta)^(n - y)
}
If you find it difficult to think about θ̂(Y) as a random variable in abstract terms, then think about the distribution of θ̂(Y) as what we would get if the experiment was repeated a large number of times: Suppose the experiment with n = 20 is repeated, say, 1000 times, and that y_1, y_2, ..., y_1000 denote the numbers of dead moths in these experiments. Based on these experiments we can calculate the estimates θ̂(y_k) = θ̂_k = y_k/n for k = 1, ..., 1000. We can then look at the distribution of the θ̂_k's. We cannot repeat the experiment in practice, but we can simulate data from a model where the true value is θ = 0.2. This scheme can be repeated for other values of n, e.g. n = 200; see Figure 2.
We have choose(n, y) = (n choose y). Note that we just wrote the probability function of the binomial distribution, and hence an alternative definition would use

Lik <- function(theta, y, n) {
    dbinom(y, size = n, prob = theta)
}
The plot is produced by

theta <- seq(0, 1, 0.05)
plot(Lik(theta, y = 4, n = 20) ~ theta, type = "b")
Figure 2: Distribution of the MLE θ̂ when θ = 0.2. Left: n = 20; right: n = 200.
We generate 1000 binomially distributed observations, once with n = 20 and once with n = 200:
set.seed(234)
y.20 <- rbinom(1000, size = 20, prob = 0.2)
y.200 <- rbinom(1000, size = 200, prob = 0.2)
3.2 Properties of ML estimators
Figure 2 illustrates three important points about the distribution of θ̂(Y):

1. The distribution is centered around the true value θ = 0.2. More precisely, E(θ̂(Y)) ≈ θ.
2. The variance v = Var(θ̂(Y)) becomes smaller when n becomes large. In Section 3.5 we outline how an estimate of this variance can be calculated from the likelihood function.
3. The distribution is approximately normal. More precisely, θ̂(Y) is approximately N(θ, v).

An additional property is:

4. Nice transformations of an MLE are approximately normal: if g(θ) is a smooth monotone function (and g' its first derivative), then g(θ̂(Y)) is approximately N(g(θ), g'(θ)²·v).

To be able to reproduce the results we initialize the random number generator with set.seed, a function that takes an integer. Now we estimate the probability θ for each of the 1000 repetitions:

set.seed(234)
theta.20 <- y.20/20
theta.200 <- y.200/200

The histograms (Fig. 2) are produced with

par(mfrow = c(1, 2))
hist(theta.20, xlim = c(0, 0.6), xlab = bquote(hat(theta)), main = "")
hist(theta.200, xlim = c(0, 0.6), xlab = bquote(hat(theta)), main = "")
3.3 Variance of MLE

The key to discussing how good the estimate θ̂(y) is lies in considering the variance of the estimator. With n = 20 and y = 4, as well as with n = 200 and y = 40, we get θ̂ = 0.2. Yet intuition suggests that an estimate based on 200 observations is more precise than an estimate based on 20 observations. Figure 2 shows that the estimator has much smaller variance when n = 200 than when n = 20. A simple way of summarizing the variation is to calculate the empirical variance of the simulated θ̂_k's. We then find that Var(θ̂_k) ≈ 0.008 for n = 20 and Var(θ̂_k) ≈ 0.001 for n = 200:

var(theta.20)

[1] 0.00761606

var(theta.200)

[1] 0.000799019

For n = 20: √Var(θ̂) ≈ 0.09; for n = 200: √Var(θ̂) ≈ 0.03.

3.4 Maximizing the likelihood

In Example 2.1 we defined

L(θ) = p(4; θ) = (20 choose 4) · θ^4 (1 − θ)^16

to be the likelihood function. Since the term (20 choose 4) does not involve θ (it is just a constant) we can ignore it and define the likelihood function to be simply L(θ) = θ^4 (1 − θ)^16, or more generally L(θ) = θ^y (1 − θ)^(n−y).

It is often easier to maximize the log-likelihood function

l(θ) = log L(θ) = y log θ + (n − y) log(1 − θ)   (1)

than the likelihood function L(θ) itself. To maximize the log-likelihood function one has to

1. calculate the first derivative l'(θ) of l(θ), and
2. solve the equation l'(θ) = 0 to find the maximum likelihood estimate θ̂.
3. It must also be investigated whether θ̂ is indeed a maximum; we shall not go into such details here.

The function l'(θ) is called the score function, and the equation l'(θ) = 0 to be solved is called the likelihood equation. For the binomial model we find

l'(θ) = y/θ − (n − y)/(1 − θ).   (2)

Solving l'(θ) = 0 for θ is straightforward for the binomial model and gives the relative frequency:

θ̂ = y/n.

(This follows from y/θ − (n − y)/(1 − θ) = 0, which implies (1 − θ)y − (n − y)θ = 0, whereby y = θy + nθ − θy = nθ.)

In most practical applications the likelihood equation cannot be solved as easily as above. Instead one has to use iterative methods; see Section 7.
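As a quick numerical check (our own R snippet, not part of the original notes; the name binlogLik is ours), the log-likelihood in Eq. (1) can also be maximized with R's general-purpose optimize function, which reproduces θ̂ = y/n without any algebra:

binlogLik <- function(theta, y, n) y * log(theta) + (n - y) * log(1 - theta)
# maximize over theta in (0, 1) for the moth data y = 4, n = 20
optimize(binlogLik, interval = c(0.001, 0.999), y = 4, n = 20, maximum = TRUE)$maximum
# close to 4/20 = 0.2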
3.5 Calculating the asymptotic variance

The variance of the ML estimator is approximately

v̂ = 1/(−l''(θ̂)).

Differentiating the score function and changing sign gives

−l''(θ) = y/θ² + (n − y)/(1 − θ)².   (3)

In practice θ is unknown, but it can be justified to plug the estimate θ̂ = y/n into −l''(θ), and this gives

−l''(θ̂) = n/(θ̂(1 − θ̂)).

So the variance becomes

v̂ = θ̂(1 − θ̂)/n = y(n − y)/n³.

Example 3.1 For the moth data in Example 2.1 we have θ̂ = 0.2 and hence v̂ = 4 · 16/20³ = 0.008.

3.6 Transformation of MLE

A smooth monotone transformation of the MLE is approximately normal.

Example 3.2 In the moth example it may be more natural to work with odds: the estimated odds of dying are ψ̂ = θ̂/(1 − θ̂) = 0.2/0.8 = 0.25. Let ψ = g(θ) = θ/(1 − θ); then ψ̂ = g(θ̂) = 0.25. Differentiation gives g'(θ) = 1/(1 − θ)². In Examples 2.1 and 3.1 we found that θ̂ = 0.2 and Var(θ̂) ≈ 0.008. Hence g'(0.2) = 1.56, so ψ̂ is approximately N(ψ, 1.56² · 0.008) = N(ψ, 0.02).
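A small R sketch (our own code, not from the original notes) reproduces these numbers:

y <- 4; n <- 20
theta.hat <- y/n
v.hat <- y * (n - y)/n^3                 # plug-in variance, 0.008 (Example 3.1)
psi.hat <- theta.hat/(1 - theta.hat)     # estimated odds, 0.25
g.prime <- 1/(1 - theta.hat)^2           # g'(theta) for g(theta) = theta/(1 - theta)
v.psi <- g.prime^2 * v.hat               # approximate variance of the odds, about 0.02
c(v.hat = v.hat, psi.hat = psi.hat, v.psi = v.psi)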
Tests of hypotheses
4.1 The likelihood ratio test
Suppose interest is in testing whether θ is equal to a specific fixed value θ₀. For example, suppose interest is in testing the hypothesis that θ = θ₀ = 0.3 for the moth data. We mention three tests: the likelihood ratio test, the Wald test and the score test. These are asymptotically equivalent in the sense that when the amount of information in the data tends to infinity, they give the same result. Still, the likelihood ratio test is often the preferred one.
The maximum likelihood estimate θ̂ is the value of θ which gives the observed data the highest probability, namely L(θ̂) = p(y; θ̂). If the value θ₀ assigns nearly the same probability L(θ₀) = p(y; θ₀) as θ̂ does, we would be tempted to accept the hypothesis that θ = θ₀. This suggests considering the likelihood ratio test statistic Q defined by

Q = L(θ₀)/L(θ̂).

Clearly Q is a number between 0 and 1, and values close to 1 are in favor of the hypothesis.
If the hypothesis H0: θ = θ₀ is true, then

−2 log Q = 2(log L(θ̂) − log L(θ₀)) = 2(l(θ̂) − l(θ₀))   (4)

has (when n is large) approximately a χ² distribution with 1 degree of freedom. Large values of −2 log Q lead to rejection of the hypothesis. In Figure 3 it can be seen that −2 log Q is twice the vertical distance between the value of l at θ̂ and at θ₀.

Figure 3: Overview of the likelihood ratio test, the Wald test and the score test. The curve is the log-likelihood l(θ); marked are l(θ̂), l(θ₀) and the slope l'(θ₀).
4.2 The Wald test

When n is large and the hypothesis is true, the distribution of the Wald test statistic

W = −(θ̂ − θ₀)² · l''(θ̂)

will look like a χ² distribution with 1 degree of freedom. Recall that Var(θ̂) ≈ 1/(−l''(θ̂)), so

W = (θ̂ − θ₀)²/Var(θ̂).   (5)

The Wald test compares the values of θ̂ and θ₀ directly, corresponding to the horizontal distance in Figure 3.

Note: In the literature one frequently uses the term Wald test for

Z = (θ̂ − θ₀)/√Var(θ̂).

If the hypothesis is true, then Z approximately has an N(0, 1) distribution. Note that Z² = W.

4.3 The score test*

When n is large and the hypothesis is true, the distribution of the score test statistic

S = −l'(θ₀)²/l''(θ₀)   (6)

will look like a χ² distribution with 1 degree of freedom. The score test is obtained by considering the slope of l at the point θ₀. It is known that the slope of l at θ̂ is 0 (l'(θ̂) = 0 by definition of the MLE). Hence values of l'(θ₀) near 0 will also speak in favor of the hypothesis.

Example 4.1 For the moth data we find l(θ̂) = l(0.20) = −10.008 and l(θ₀) = l(0.3) = −10.523. Hence the likelihood ratio test statistic is −2 log Q = 1.03. The Wald test statistic is obtained by first calculating −l''(θ̂) = 125 and then W = 1.25. To calculate the score test statistic we note that l'(θ₀) = −9.52 and −l''(θ₀) = 77.10; hence S = 1.18. The 95% quantile of the χ² distribution with 1 degree of freedom is 3.84, so none of the tests is significant.
We define the log-likelihood function, Eq. (1):

logLik <- function(theta, y, n) { y * log(theta) + (n - y) * log(1 - theta) }

the first derivative, Eq. (2):

logLikD <- function(theta, y, n) { y/theta - (n - y)/(1 - theta) }

and the second derivative, cf. Eq. (3):

logLikDD <- function(theta, y, n) { -y/theta^2 - (n - y)/(1 - theta)^2 }

Hence we have the LR test statistic (Eq. 4)

-2 * (logLik(0.3, 4, 20) - logLik(0.2, 4, 20))

[1] 1.02928

the Wald test statistic W (Eq. 5)

-(0.2 - 0.3)^2 * logLikDD(0.2, 4, 20)

[1] 1.25

and the score test statistic S (Eq. 6)

-logLikD(0.3, 4, 20)^2/logLikDD(0.3, 4, 20)

[1] 1.17647

Confidence intervals

A common way of presenting the precision of an estimate is a confidence interval, which is an interval estimate. Many textbooks on statistics present the following (approximate) 95% confidence interval for θ for the moth example:

θ̂ ± 1.96·√Var(θ̂) = θ̂ ± 1.96·√(y(n − y)/n³).   (7)

5.1 Wald based intervals and the Wald test

Wald based intervals and the Wald test are closely connected: Consider testing the hypothesis H0: θ = θ₀ for a range of different values of θ₀, e.g. 0.05, 0.10, 0.15, 0.20, . . . Because θ̂ is approximately normal, one way of testing the hypothesis is to use the statistic

z = (θ̂ − θ₀)/√Var(θ̂)

which under the hypothesis approximately has an N(0, 1) distribution.
Thus if the test is at the 5% significance level, the hypothesis is rejected if |z| > 1.96. Some of the hypotheses will be accepted; others will be rejected. The values of θ₀ that lead to acceptance of the hypothesis H0: θ = θ₀ define the confidence interval. A simple calculation gives the interval as

(θ̂ − 1.96·√Var(θ̂); θ̂ + 1.96·√Var(θ̂)).

Example 5.1 A Wald confidence interval based on Eq. (7) for n = 20, y = 4 and for n = 10, y = 2 (in both cases θ̂ = 0.2) is given in Table 1. Note that Var(θ̂) = θ̂(1 − θ̂)/n = y(n − y)/n³.

Table 1: Confidence interval (approximate) for probabilities.

                2.5%   97.5%
n = 20, y = 4   0.02    0.38
n = 10, y = 2  -0.05    0.45

Note: Since 0 ≤ θ ≤ 1, it does not make much sense to have a negative lower boundary on a confidence interval. The problem arises for two reasons:

1. The Wald test relies on the MLE being approximately normal. This approximation is generally poor when n is small.
2. The parameter is bounded to 0 ≤ θ ≤ 1. This is not accounted for in the approximate interval.

5.2 Confidence intervals based on logodds

Example 5.2 Instead of working with probabilities in the binomial example we can work with logodds (see Section 5.6), defined as

η = log(θ/(1 − θ)).

When θ varies from 0 to 1, η will vary from −∞ to ∞, and θ can be expressed in terms of the logodds as

θ = e^η/(1 + e^η).   (8)

The MLE for η is the empirical logodds

η̂ = log(y/(n − y))

and the variance of the estimator is Var(η̂) ≈ n/(y(n − y)).

Since the MLE η̂ is approximately normal, an approximate 95% confidence interval for η is

η̂ ± 1.96·√Var(η̂) = η̂ ± 1.96·√(n/(y(n − y))).   (9)
Recall that to get from logodds to probabilities we use the inverse-logit transformation, i.e. the logistic function (8). The same transformation is then applied to the end points of the confidence intervals to obtain an approximate confidence interval for θ in the two cases.
Reconsider the moth example with n = 20 moths and y = 4 dead moths. Consider also the case with n = 10 moths and y = 2 dead moths. In both cases θ̂ = 4/20 = 2/10 = 0.2, while η̂ = log(4/16) = log(2/8) = log(0.25) = −1.39. Confidence intervals for the logodds are in the two cases given in Table 2.

Table 2: Confidence interval (approximate) for logodds.

                2.5%   97.5%
n = 20, y = 4  -2.48   -0.29
n = 10, y = 2  -2.94    0.16
The confidence interval in the first row of Table 2 (Eq. 9) is

eta <- log(4/(20 - 4))
CI.eta <- eta + 1.96 * c(-1, 1) * sqrt(20/(4 * (20 - 4)))
CI.eta
[1] -2.481968 -0.290621
The result is given in the first two columns of Table 3. Note that these confidence intervals are not symmetric around θ̂ = 0.2, and they are always between 0 and 1. Compare the intervals to those based on the proportions themselves (Table 1).

Table 3: Confidence interval (approximate) for probabilities.

                transforming logits     likelihood based
                2.5%   97.5%            2.5%   97.5%
n = 20, y = 4   0.08    0.43            0.07    0.41
n = 10, y = 2   0.05    0.54            0.04    0.50
The confidence interval for the probability (n = 20, y = 4), obtained by transforming the log-odds via the logistic function, is
plogis(CI.eta)
[1] 0.077132 0.427852
The LR-based confidence interval cannot be calculated with the computational and conceptual means available at the moment. Just for later reference, one would fit a simple logistic regression model and use the function confint (after loading the package MASS) for the confidence interval calculation:
M <- glm(cbind(4, 20 - 4) ~ 1, family = binomial)
eta <- coef(M)
library(MASS)
CI.eta <- confint(M)
    2.5 %    97.5 %
-2.636494 -0.383287
CI <- plogis(CI.eta)
    2.5 %    97.5 %
0.0668264 0.4053344
5.3 Likelihood ratio based intervals

An alternative is to base the interval on the likelihood ratio test. The test is now based on the fact that −2 log Q = 2(l(θ̂) − l(θ₀)) approximately has a χ² distribution with 1 degree of freedom when H0 is true.

We can then find the interval of θ₀'s which do not lead to rejection of H0, in the same spirit as before but with one exception: it is not possible to write down the interval explicitly as before. It is necessary to run through a series of possible values of θ₀ and see whether the hypothesis is accepted or not. Yet, with modern computers, this is not a problem. The likelihood based intervals for the example are given in the rightmost two columns of Table 3. These intervals will always be within [0, 1].

5.4 Comparisons of the intervals

For large samples the different confidence intervals discussed here are (asymptotically) equivalent, meaning that the intervals will be very similar. For small samples and/or parameter estimates close to the limit of the permissible values the intervals can be quite different, and Wald intervals based on parameters close to their limits can be quite misleading. In this case it is recommended either to calculate intervals for parameters which are not restricted to lie in a specific interval and then transform the intervals, or, alternatively, to calculate the likelihood based intervals.
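As a small sketch of how such intervals can be obtained (our own code, not from the original notes), the Wald interval of Section 5.1 can be written down directly, and the likelihood ratio interval of Section 5.3 can be found by scanning a grid of θ₀ values and keeping those not rejected by the LR test:

y <- 4; n <- 20
theta.hat <- y/n
logL <- function(theta) y * log(theta) + (n - y) * log(1 - theta)

# Wald interval (Section 5.1), cf. Table 1
theta.hat + c(-1, 1) * 1.96 * sqrt(y * (n - y)/n^3)

# Likelihood ratio interval by grid search (Section 5.3), cf. Table 3
theta0 <- seq(0.001, 0.999, by = 0.001)
keep <- 2 * (logL(theta.hat) - logL(theta0)) <= qchisq(0.95, df = 1)
range(theta0[keep])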
5.5 Interpretation of confidence intervals
A common mistake is to interpret a confidence interval along the following lines: "There is 95% probability that θ falls in the interval [0.02; 0.38]." It is not correct to attach such a probabilistic meaning to a confidence interval. There is only one true value of θ, and it will either lie in the interval or not, as the case might be.

To associate a probability with a confidence interval, imagine repeating the experiment under identical conditions a large number of times. For each experiment the confidence interval for θ is calculated, as illustrated in Figure 4. Some of these intervals will include the true parameter value and some will not. The relative frequency with which the intervals include the true value is called the coverage probability. If the interval calculated for each experiment is a 95% confidence interval, the coverage probability is 95%; that is, in 95% of the experiments the confidence interval will include the true parameter. But that is not the same as saying that there is 95% probability that the true value of θ is contained in the interval [0.02; 0.38]. For this reason we use an alternative word: we are 95% confident that θ is in the interval [0.02; 0.38].

Figure 4: Coverage of confidence intervals.
5.6 Reparameterizing the binomial model*
Recall the binomial probability function for Y ∼ bin(n, θ):

p(y; θ) = P(Y = y; θ) = (n choose y) · θ^y (1 − θ)^(n−y).   (10)

We can reparameterize the model as follows: Define the logodds as

η = log(θ/(1 − θ)).

Then, when θ varies from 0 to 1, η will vary from −∞ to ∞, and θ can be expressed in terms of the logodds as

θ = e^η/(1 + e^η).   (11)

If we substitute (11) into (10) then we get

p(y; η) = P(Y = y; η) = (n choose y) · (e^η/(1 + e^η))^y · (1/(1 + e^η))^(n−y).   (12)

Now, (12) is also the binomial probability function; it is just expressed in terms of another parameter, the logodds η. This is called a re-parametrization of the model. We can then proceed as before: The loglikelihood becomes (where we ignore the term (n choose y)) after some calculations

l(η) = yη − n log(1 + e^η).

The score function becomes

l'(η) = y − n·e^η/(1 + e^η).

Solving l'(η) = 0 gives the (empirical) logodds:

η̂ = log(y/(n − y)).

A very important property of ML estimates is illustrated by the following: Inserting η̂ into θ = e^η/(1 + e^η) gives θ̂. Hence it does not matter whether the model is parametrized in terms of θ or η; this is a general result on ML estimates. The variance of the estimator is Var(η̂) ≈ n/(y(n − y)).

Taylor approximation*

The simplest version of the Taylor approximation (also called a first order Taylor approximation) is based on approximating a nonlinear function h(x) by a linear function l(x). Let x0 and x be two numbers (not too far apart) and assume that h is a nice (i.e. differentiable) function. Then it is known that

h(x) ≈ h(x0) + h'(x0)(x − x0)   (13)
     = [h(x0) − h'(x0)·x0] + h'(x0)·x = a + b·x = l(x)   (14)

i.e. a = [h(x0) − h'(x0)·x0] and b = h'(x0). The further x is away from x0, the worse this approximation becomes.
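A minimal R sketch (our own code, using the odds transformation from Example 3.2 purely as an illustration) shows how the linear approximation behaves near and away from x0:

h <- function(x) x/(1 - x)                       # a nonlinear function
h1 <- function(x) 1/(1 - x)^2                    # its first derivative
x0 <- 0.2
taylor <- function(x) h(x0) + h1(x0) * (x - x0)  # the linear approximation l(x), Eq. (14)
x <- c(0.18, 0.20, 0.22, 0.30)
rbind(exact = h(x), approx = taylor(x))          # close near x0, worse further away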
Newton–Raphson method*
Suppose we wish to solve the equation h(x) = 0 for x. A classical method for doing so is the Newton–Raphson algorithm: Suppose that x0 is an initial guess for a solution. Then the Taylor approximation gives

0 = h(x) ≈ h(x0) + h'(x0)(x − x0).

Solving the last equation gives x − x0 ≈ −h(x0)/h'(x0) and hence

x ≈ x0 − h(x0)/h'(x0).

This suggests an iterative scheme known as Newton–Raphson iteration: Start with some initial value x0 and calculate successively

x1 = x0 − h(x0)/h'(x0),   x2 = x1 − h(x1)/h'(x1),   x3 = ...

In most (but certainly not all) problems arising in practice there will be very little change in the x-values after a few iterations.

Example 7.1 We wish to solve the likelihood equation

S(θ) = l'(θ) = y/θ − (n − y)/(1 − θ) = 0

for the binomial example with y = 4 and n = 20. Differentiating the score function gives

S'(θ) = −y/θ² − (n − y)/(1 − θ)².

With a simple pocket calculator we can now do the following: Starting with θ0 = 0.1 we get the sequence of values

0.100  0.153  0.191  0.200  0.200  0.200

where we note that the scheme converges after 4 iterations. One can then stop, and the final value is taken to be the maximum likelihood estimate.
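The same iteration is easy to do in R; this is our own sketch (not code from the original notes) and it reproduces the sequence above:

y <- 4; n <- 20
S  <- function(theta)  y/theta - (n - y)/(1 - theta)        # score function l'(theta)
dS <- function(theta) -y/theta^2 - (n - y)/(1 - theta)^2    # its derivative l''(theta)
theta <- 0.1                                                # starting value
for (i in 1:5) {
    cat(format(round(theta, 3), nsmall = 3), "")
    theta <- theta - S(theta)/dS(theta)                     # Newton-Raphson update
}
cat(format(round(theta, 3), nsmall = 3), "\n")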
Transformations of normal variables: the ∆-method*

It is well known that a linear function of a normal random variable is itself normal: if x ∼ N(μ, σ²) and l(x) = a + bx, then z = l(x) = a + bx ∼ N(a + bμ, b²σ²). A nonlinear function z = h(x) of x is, however, not normal. Yet, in some cases z is not too far from being normal; more specifically, h(x) is approximately distributed as

N(h(μ), h'(μ)²·σ²).
This claim can be loosely justified as follows: We apply the Taylor approximation (see Section 6) to h(x), where we let x0 = E(x) = μ. Then

z = h(x) ≈ h(μ) + h'(μ)(x − μ)   (15)
         = [h(μ) − h'(μ)·μ] + h'(μ)·x = a + b·x.   (16)
From (16) it follows that h(x) is approximately a linear function of x, which is normal, so h(x) is approximately normal. It only remains to determine the (approximate) mean and variance of h(x). To do so, recall that E(a + bx) = a + b·E(x) and Var(a + bx) = b²·Var(x). From (16) it now follows that E(h(x)) ≈ h(μ) and Var(h(x)) ≈ h'(μ)²·Var(x). How good the approximation is depends on the shape of h and on the variance σ²: the closer h is to being linear and the smaller σ² is, the better the approximation.
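A short simulation sketch (our own code, with μ and σ chosen to mimic θ̂ for n = 20) illustrates how well the ∆-method approximation works for the odds transformation:

set.seed(1)
mu <- 0.2; sigma <- 0.09                     # roughly the mean and sd of theta-hat for n = 20
x <- rnorm(10000, mean = mu, sd = sigma)
h <- function(x) x/(1 - x)                   # odds transformation
c(mean(h(x)), var(h(x)))                     # simulated mean and variance of h(x)
c(mu/(1 - mu), (1/(1 - mu)^2)^2 * sigma^2)   # delta-method approximations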