
Simulation I

Presidency University

December, 2024
Introduction

I Want to deal with random experiments.

I Various techniques to predict the outcome of such experiments:

1. Wait and See: design winning strategies by trial and error.
2. Solve Probability Models: assume a definite mathematical model and use it to predict the outcome; this sometimes gets complicated.
3. Simulate Probability Models: also start with a mathematical model, but instead of analysing it mathematically we use computers to perform the virtual random experiment following that model, and then analyze the artificial data generated by the computers. This is similar to "wait and see", except that we do not need to wait long in reality.
Concept of Simulation

I Assume a mathematical model.

I Use computers to perform the random experiment artificially.

I Computers can perform the random experiment artificially because they can generate random numbers.1

I Use the artificial data generated by the computer to analyze the model and predict the outcome.

1 The random numbers generated by computers are not random in an absolute sense; they are only pseudo-random numbers.
Why do we simulate?

I To have a better understanding of known probability models.

I To visualize a probability model with examples of outcomes of a random experiment (which in reality may be hard to obtain).

I To get an idea about the result of a statistical model which cannot be solved explicitly using a formula.

I To judge the performance of a model before applying it to a real data situation.
I Suppose we have a random sample X = (X1, X2, ..., Xn) from a population with c.d.f. F and we want to perform inference regarding F.

I Let T(X) be a statistic on the basis of which we can perform inference regarding F.

I In order to judge the performance of T, we need to have an idea about the probability distribution of T, which we refer to as the sampling distribution of T.

I For example, if T is an estimator we can judge its performance by computing SE(T) (or Var(T)).

I If T is a test statistic, we need to calculate the probabilities of Type-I and Type-II errors from the sampling distribution of T.
Response of classical statisticians

I Restrict to tractable special cases: the idea was to focus only on very simple setups for which the sampling distribution can be obtained exactly. This can be done in two ways:
I use only statistics which have a simple mathematical form (like linear functions of the data): that is why the sample mean became so popular as a statistic;
I assume that the probability distributions featured in the stochastic model take one of a few forms for which exact calculation is possible, analytically or via tabulated special functions: that is why the normal distribution became so popular.
I Combining the two, we have the most popular combination for doing statistical inference: sample mean and normal distribution.
I Appeal to asymptotics: the assumption of normality generally does not hold in practice, but we can show, under reasonable conditions, that even for different parent distributions the distribution of the sample mean is approximately normal if we have large samples.
I Up through about the 1960s, statistics was split between developing general ideas about how to draw and evaluate inferences with stochastic models, and working out the properties of inferential procedures in tractable special cases (especially the linear-and-Gaussian case), or under asymptotic approximations.

I This yoked a very broad and abstract theory of inference to very narrow and concrete practical formulas, an uneasy combination often preserved in basic statistics classes.

I For example, if we have a random sample X1, X2, ..., Xn from an N(µ, 1) population, we are taught to estimate µ by X̄ because it seems reasonable to estimate the population mean by the sample mean. But one could also have treated µ as the population median and tried to estimate it by the sample median. We generally do not proceed like that because finding the sampling distribution of the sample median is not easy.
Rise of machines - Use of simulation

I Things began to change with the advent of cheap and fast computers.

I In the mid-1940s a new field of simulation emerged in statistics, which gave statisticians a new way to use complex statistical models and complicated functions of the data.

I In the method of simulation we also start with a probability model. But instead of dealing with it mathematically, we make the computer perform (or simulate) the random experiment following the model.

I The method of simulation became widely used in many inferential procedures, such as finding standard errors of estimates, constructing level α tests, and constructing confidence intervals with a given confidence coefficient.
Uniform Random Variables: The starting point
I If we can generate X ∼ U(0, 1) (which we can do by runif()), we can generate random variables from other discrete and continuous distributions as well.

I Suppose we want to generate a Bernoulli random variable with probability of success 0.7.

bernoulli=function(prob)
{
u=runif(1); x=NULL;
if (u<prob) x=1
else x=0
return(x)
}
bernoulli(0.7) #--call the function

[1] 1
Using it further

I Suppose we want to simulate a Geometric(0.8) random variable.

x=1
y=bernoulli(0.8)
while(y!=1)
{
y=bernoulli(0.8)
x=x+1
}
x

[1] 1
More Complicated Ones

I How can we generate a Poisson or a hypergeometric random variable using the above technique? In general, we can use the following fact:

Fact
Suppose we want to generate X having p.m.f. P(X = xj) = pj, j = 0, 1, ..., with Σj pj = 1. We generate U ∼ U(0, 1) and set

        x0   if U < p0
        x1   if p0 ≤ U < p0 + p1
  X =   ...
        xj   if p0 + ... + p(j−1) ≤ U < p0 + ... + pj
        ...
Algorithm

I The preceding fact can be written algorithmically as :


I Generate a random U ∼ U(0, 1)
I If U < p0 stop and set X = x0
I If U < p0 + p1 stop and set X = x1
I If U < p0 + p1 + p2 stop and set X = x2 and so on....
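
As a concrete illustration, here is a minimal R sketch of this algorithm for a finite support (the function name discrete_sample and the example vectors are ours, not from the slides):

discrete_sample=function(xs,ps)        # xs: support points, ps: probabilities summing to 1
{
u=runif(1)
cumprob=cumsum(ps)                     # p0, p0+p1, p0+p1+p2, ...
i=min(which(u<cumprob))                # first index whose cumulative probability exceeds u
return(xs[i])
}
discrete_sample(c(0,1,2),c(0.2,0.5,0.3))  # one draw with P(X=0)=0.2, P(X=1)=0.5, P(X=2)=0.3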
Generating Poisson Distribution

poi_mass=function(x,lambda)
{
return( exp(-lambda)*(lambda^x)/factorial(x))
}
poi_sample=function(lambda)
{
U=runif(1); i=0; cumprob=poi_mass(0,lambda)
while(U>cumprob)
{
i=i+1
cumprob=cumprob+poi_mass(i,lambda)
}
return(i)
}
poi_sample(5)

[1] 4
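
As a quick informal check (ours, not part of the slides), the relative frequencies produced by poi_sample can be compared with dpois; the exact numbers will of course vary from run to run.

draws=replicate(10000,poi_sample(5))
round(table(draws)/10000,3)     # empirical p.m.f. from the simulated draws
round(dpois(0:10,5),3)          # theoretical Poisson(5) probabilities for comparison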
Drawing random sample
I Recall that randomly drawing a sample unit from n units is the same as drawing a discrete uniform {1, 2, ..., n} random variable. Hence we should know how to simulate a discrete uniform value if we have U ∼ U(0, 1).
I Note that if U ∼ U(0, 1), then X = [nU] + 1 has a discrete uniform distribution over {1, 2, ..., n}.
I In R, we shall make use of the sample() command for both with-replacement and without-replacement sampling.

set.seed(100)
sample(c("A","B","C","D","E"),size=3)

[1] "B" "C" "E"

sample(c("A","B","C","D","E"),size=3,replace=T)

[1] "C" "A" "B"

sample(100,size=5)

[1] 78 23 86 70 4
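
The transformation X = [nU] + 1 mentioned above can also be coded directly; this small sketch (the function name disc_unif is ours) mimics one equal-probability draw from {1, 2, ..., n} without using sample().

disc_unif=function(n)
{
u=runif(1)
return(floor(n*u)+1)    # [nU]+1 is discrete uniform on {1,...,n}
}
disc_unif(52)           # e.g. one card position chosen at random from a deck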
Unequal Probability Sampling

set.seed(100)
sample(c("A","B","C"),size=2,prob=c(0.1,0.4,0.5))

[1] "C" "B"


What about Continuous Distributions?

I For continuous distributions we use the following fact:

Fact
(Probability Integral Transformation) If X has an absolutely continuous distribution, then the c.d.f. of X evaluated at X, F(X), has a U(0, 1) distribution.
I Suppose we want to generate X from an Exp(λ) distribution. Then we set

F(X) = 1 − e^(−λX) = U, where U ∼ U(0, 1).

Hence X = −(1/λ) ln(1 − U) is the required random variable.

I However, this method does not work if F does not have a closed form, e.g. the normal, gamma and beta distributions.
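
A minimal R sketch of this inverse-transform step for Exp(λ) (the function name exp_sample is ours; rexp() is the built-in alternative):

exp_sample=function(n,lambda)
{
u=runif(n)
return(-log(1-u)/lambda)   # X = -(1/lambda) ln(1-U) has the Exp(lambda) distribution
}
x=exp_sample(10000,2)
mean(x)                    # should be close to 1/lambda = 0.5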
Generating Normal Variate I

I For generation of standard normal variates we use the Box-Muller transformation: for independent U1, U2 ∼ U(0, 1),
Z1 = √(−2 ln U1) cos(2πU2) and Z2 = √(−2 ln U1) sin(2πU2)
are independent N(0, 1) variables.
I For generation of X ∼ N(µ, σ²), we first generate Z ∼ N(0, 1) and then take X = µ + σZ.

normal=function(n)
{
U1<-runif(n)
U2<-runif(n)
C<-sqrt(-2*log(U1))
Z1<-C*cos(2*pi*U2)    # Box-Muller pair
Z2<-C*sin(2*pi*U2)
return(Z1)
}
n<-100000
Z=normal(n)
hist(Z,prob=T)
curve(dnorm(x),-3,3,add=T)
Generating Normal Variate II

[Figure: Histogram of Z (density scale) with the N(0, 1) density curve overlaid.]
Distributions Arising from the Normal Distribution

I Once we can generate normal random variables we can use standard results to generate random variables from other distributions arising from the normal, for example the χ²-, t- and F-distributions and many more.

I We can generate samples from the bivariate normal distribution by the following fact: if (X, Y) ∼ N2(0, 0, 1, 1, ρ), then Z1 = X and Z2 = (Y − ρX)/√(1 − ρ²) are iid N(0, 1) variates, where −1 < ρ < 1.

I Equivalently, if we generate Z1, Z2 iid N(0, 1), then setting X = Z1 and Y = √(1 − ρ²) Z2 + ρZ1 gives a pair of random variables that have the N2(0, 0, 1, 1, ρ) distribution.

I More generally, if we can generate samples (X, Y) from N2(0, 0, 1, 1, ρ), then we can obtain (U, V) ∼ N2(µ1, µ2, σ1², σ2², ρ) by U = µ1 + σ1 X and V = µ2 + σ2 Y.
Code I

binorm=function(n,rho)
{
x<- numeric(n); y<- numeric(n)
for (i in 1:n)
{
z1 <- normal(1)
z2 <- normal(1)
x[i] <- z1
y[i]<-rho*z1+sqrt(1-rho^2)*z2
}
return(cbind(x,y))
}

n <- 1000 ;rho <- -0.75


data=binorm(n,rho)
plot(data)
Code II

[Figure: Scatter plot of the 1000 simulated (x, y) pairs, showing the negative correlation (ρ = −0.75).]
Generating Multivariate Normal

Fact
If X ∼ Np(µ, Σ) then there exists a non-singular T such that TΣT' = Ip, i.e. T(X − µ) ∼ Np(0, Ip).
I Such a T is obtained from the Cholesky decomposition of Σ: if Σ = LL' with L lower triangular, we can take T = L^(−1), so that T^(−1) = L.

I Generate p iid N(0, 1) variates to form Y and take X = µ + T^(−1) Y = µ + LY.

Code in R

multinorm=function(mean,Sigma)
{
p=length(mean)
L=t(chol(Sigma))      # lower-triangular Cholesky factor, L %*% t(L) = Sigma
y=rnorm(p)            # p iid N(0,1) variates
x=mean+(L%*%y)        # x ~ N_p(mean, Sigma)
return(x)
}
mean=c(10,9,11)
S=matrix(c(4,3,2,3,10,6,2,6,16),nrow=3)
multinorm(mean,S)
[,1]
[1,] 10.606520
[2,] 9.360652
[3,] 10.948511
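
As an informal check (ours, not from the slides), one can draw many vectors with the function above and compare the sample mean vector and covariance matrix with mean and S; with a few thousand draws the estimates should be close.

draws=t(replicate(5000,as.vector(multinorm(mean,S))))
round(colMeans(draws),2)   # should be close to (10, 9, 11)
round(cov(draws),2)        # should be close to S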
Working with inbuilt R functions

I Suppose we want to generate random variables from N(µ, σ²)

rnorm(n=2,mean=5,sd=0.01) # 2 samples from N(5,0.01^2)

[1] 5.002859 5.004906

I Now let us find P(X ≤ x), i.e. Φ((x − µ)/σ)

pnorm(4,mean=5,sd=0.01) # P(X<=4) for N(5,0.01^2)

[1] 0

pnorm(7,mean=5,sd=0.01,lower.tail=F) # P(X>7) for same

[1] 0
I We can also compute the normal quantiles zα

qnorm(0.05,mean=5,sd=0.01) # lower 0.05 point

[1] 4.983551

qnorm(0.01,mean=5,sd=0.01,lower.tail=F) #upper 0.01 point

[1] 5.023263

I We can also compute the normal density (1/σ) φ((x − µ)/σ)

dnorm(2,mean=5,sd=0.01) # density at x=2

[1] 0

dnorm(5,mean=5,sd=0.01) # density at x=5

[1] 39.89423
Plotting the normal density
x=seq(-3,3,by=0.01); y=dnorm(x)
plot(x,y,type="l",main="Density of N(0,1)",
ylab=expression(phi(x)))

[Figure: Density of N(0,1) — the density φ(x) plotted for x in (−3, 3).]
Other Standard Distributions in R

Distribution      Sample                                       P(X ≤ x)   zα        Density
Binomial          rbinom(n, size, prob)                        pbinom     qbinom    dbinom
Poisson           rpois(n, lambda)                             ppois      qpois     dpois
Neg. Binomial     rnbinom(n, size, prob, mu)                   pnbinom    qnbinom   dnbinom
Geometric         rgeom(n, prob)                               pgeom      qgeom     dgeom
Hypergeometric    rhyper(nn, m, n, k)                          phyper     qhyper    dhyper
Uniform           runif(n, min = 0, max = 1)                   punif      qunif     dunif
Exponential       rexp(n, rate = 1)                            pexp       qexp      dexp
Cauchy            rcauchy(n, location = 0, scale = 1)          pcauchy    qcauchy   dcauchy
t                 rt(n, df, ncp)                               pt         qt        dt
F                 rf(n, df1, df2, ncp)                         pf         qf        df
χ2                rchisq(n, df, ncp = 0)                       pchisq     qchisq    dchisq
Gamma             rgamma(n, shape, rate = 1, scale = 1/rate)   pgamma     qgamma    dgamma
Beta              rbeta(n, shape1, shape2, ncp = 0)            pbeta      qbeta     dbeta

Acceptance-Rejection Algorithm

I Let X be a random variable with density f(x) and c.d.f. FX(x) such that it is hard to simulate a value of X directly. We might then wish to use the acceptance-rejection (A-R) algorithm.
I Suppose we have another random variable Y with density g(y) such that it is easy to simulate a value of Y. If there exists a constant a such that f(x)/g(x) ≤ a for all x, then we can use Y to simulate a value of X according to the following algorithm:
I generate Y with p.d.f. g()
I generate U from U(0, 1)
I while U > f(Y)/(a g(Y))
I generate Y
I generate U
I set X = Y
Why does this method work?

I We need to show that P(X ≤ x) = FX(x).

I We define B to be the event that Y has been accepted in some loop, i.e., U ≤ f(Y)/(a g(Y)).

I First we note that

P(X ≤ x) = P(Y ≤ x | B) = P((Y ≤ x) ∩ B) / P(B)

I Now

P(B) = P(U ≤ f(Y)/(a g(Y)))
     = ∫_{−∞}^{∞} P(U ≤ f(y)/(a g(y))) g(y) dy
     = ∫_{−∞}^{∞} [f(y)/(a g(y))] g(y) dy
     = ∫_{−∞}^{∞} f(y)/a dy = 1/a
I Further

P((Y ≤ x) ∩ B) = ∫_{−∞}^{∞} P((Y ≤ x) ∩ B | Y = y) g(y) dy
              = ∫_{−∞}^{∞} P((Y ≤ x) ∩ {U ≤ f(Y)/(a g(Y))} | Y = y) g(y) dy
              = ∫_{−∞}^{x} P(U ≤ f(y)/(a g(y))) g(y) dy
              = ∫_{−∞}^{x} [f(y)/(a g(y))] g(y) dy = FX(x)/a

I Thus P(X ≤ x) = FX(x), which means the algorithm is producing X from the distribution with c.d.f. FX, as required.
Example: Generating Beta Variables

I Suppose we wish to simulate a random variable X having density

f(x) = 60x³(1 − x)², 0 ≤ x ≤ 1.

I Step I (Choose g(y)): let us take Y ∼ U(0, 1).
I Step II (Find a): recall that a should be such that f(x)/g(x) ≤ a for all x.
I This implies 60x³(1 − x)² ≤ a for all x ∈ [0, 1].
I Note that a = 3 will satisfy this.
Example: Algorithm

I generate Y ∼ U(0, 1)
I generate U ∼ U(0, 1)
I while U > 20Y 3 (1 − Y )2
I generate Y
I generate U
I set X = Y
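
A minimal R implementation of this algorithm (the function name ar_beta is ours; the choice a = 3, and hence the acceptance condition U ≤ 20Y³(1 − Y)², follows the slides):

ar_beta=function()
{
repeat
{
y=runif(1)                         # candidate from g = U(0,1)
u=runif(1)
if(u<=20*y^3*(1-y)^2) return(y)    # accept if U <= f(Y)/(a g(Y)) with a = 3
}
}
x=replicate(10000,ar_beta())
hist(x,prob=TRUE)
curve(60*x^3*(1-x)^2,0,1,add=TRUE)   # target density for comparison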
Efficiency of the Acceptance-Rejection Algorithm

I Let N be the number of loops in the A-R algorithm until acceptance (so N is random).

I As before, let B be the event that Y has been accepted in a loop, i.e. U ≤ f(Y)/(a g(Y)).

I We saw earlier that P(B) = 1/a. Then N has a geometric distribution and E(N) = a.

I So clearly we would like a to be as small as possible. Note that necessarily a ≥ 1.
Example (Contd.)

I Our constant a had to satisfy 60x³(1 − x)² ≤ a for all x ∈ [0, 1].

I Thus we take a as

a = max_{x∈[0,1]} 60x³(1 − x)²

I Using calculus we find this value to be 2.073 (attained at x = 3/5).

I Hence the algorithm will hopefully work faster if we take a = 2.073.
How do we choose g?

I We would like to choose g() to minimize the computational load;
I we can do this by taking g() 'close' to f();
I then a is close to 1 and fewer iterations are required.
I But there is a trade-off: if g() is close to f() then it will probably also be hard to simulate from g().
I So we often need to strike a balance between having a 'nice' g() and a small a.
Mixture Distribution

I Suppose again that we want to simulate a value of a random variable X which has cumulative distribution function FX.

I We can often write FX as

FX(x) = Σ_{j≥1} πj Fj(x)

where the Fj's are also c.d.f.'s, πj > 0 for all j, and Σj πj = 1.

I Equivalently, if the densities exist then we can write

fX(x) = Σ_{j≥1} πj fj(x).

I Such a representation often occurs very naturally.


I For example, consider a
X ∼ Hyperexponential(λ1 , α1 , λ2 , α2 , ..., λn , αn ) so that
n
X
f (x) = αj λj e −λj x
j=1

n
X
where λi , αi ≥ 0, and αi = 1. Here αi = 0 for i > n.
i=1
I For example, consider a
X ∼ Hyperexponential(λ1 , α1 , λ2 , α2 , ..., λn , αn ) so that
n
X
f (x) = αj λj e −λj x
j=1

n
X
where λi , αi ≥ 0, and αi = 1. Here αi = 0 for i > n.
i=1

I Suppose now that it is difficult to simulate X directly using the


inverse transform method.
I For example, consider X ∼ Hyperexponential(λ1, α1, λ2, α2, ..., λn, αn), so that

f(x) = Σ_{j=1}^{n} αj λj e^(−λj x)

where λi, αi ≥ 0 and Σ_{i=1}^{n} αi = 1. Here αi = 0 for i > n.

I Suppose now that it is difficult to simulate X directly using the inverse transform method.

I Then we could use the composition method instead.
Composition Algorithm

1. Generate a random variable I that is distributed on the non-negative


integers so that P(I = j) = πj .
2. If I = j, then simulate Yj from Fj
3. Set X = Yj .
X has the desired distribution

I We have

P(X ≤ x) = Σ_{j≥1} P(X ≤ x | I = j) P(I = j)
         = Σ_{j≥1} P(Yj ≤ x) P(I = j)
         = Σ_{j≥1} Fj(x) πj = FX(x).
Example (Contd.)

I Let X ∼ Hyperexponential(λ1, α1, λ2, α2), so that

fX(x) = α1 λ1 e^(−λ1 x) + α2 λ2 e^(−λ2 x).

I Then we have α1 = π1, α2 = π2 and

f1(x) = λ1 e^(−λ1 x),  f2(x) = λ2 e^(−λ2 x).

I The following algorithm will then generate a sample of X:

I generate U1
I if U1 ≤ π1 then set i = 1
I else set i = 2
I generate U2
I set X = −(1/λi) log U2.
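
A minimal R sketch of this composition step for the two-component case (the function name hyperexp and the parameter values in the call are ours):

hyperexp=function(alpha1,lambda1,lambda2)
{
u1=runif(1)
lambda=if(u1<=alpha1) lambda1 else lambda2   # choose the component
u2=runif(1)
return(-log(u2)/lambda)                      # X = -(1/lambda_i) log U2 ~ Exp(lambda_i)
}
x=replicate(10000,hyperexp(0.3,1,5))
mean(x)                                      # should be close to 0.3*(1/1) + 0.7*(1/5) = 0.44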
Mixture of Normal I

n <- 5000
x1 <- rnorm(n, 0, 1)
x2 <- rnorm(n, 3, 1)
u <- runif(n)
k <- as.integer(u > 0.5) #vector of 0's and 1's
x <- k * x1 + (1-k) * x2
#plot the density of the mixture
#with the densities of the components
plot(density(x), xlim=c(-7,7), ylim=c(0,.5),
lwd=3, xlab="x", main="")
curve(dnorm(x,0,1),from=-7,to=7,n=5000,add=T)
curve(dnorm(x,3,1),from=-7,to=7,n=5000,add=T)
Mixture of Normal II

[Figure: Kernel density estimate of the simulated mixture (bold) together with the N(0, 1) and N(3, 1) component densities.]
Finding Probabilities

Fact
The probability of any event A can be interpreted as the long-run relative frequency of the event A, i.e.

(no. of repetitions resulting in A) / (total number of repetitions)

as the number of repetitions n → ∞.
I Hence, to compute the probability of an event A by simulation, we simulate a large number n of cases and count the number of times the event A has occurred. If this number is m, then the probability of the event A can be approximated by m/n.
Drawing a Card

I A card is drawn at random from a full pack of 52 cards. Find the probability that the drawn card is a picture card (i.e. king, queen or jack).

pic=NULL
for ( i in 1:100000)
{
x=sample(52,size=1)             # draw one card position from 1..52
if (any(x%%13==c(11,12,0)))     # J, Q, K are positions 11, 12, 13 in each suit, i.e. residues 11, 12, 0 mod 13
pic=c(pic,1)
}
sum(pic)/100000

[1] 0.23035
Divisibility Test

I A number is chosen at random from 1 to 1000. Find the probability that it is divisible by 3, 5, or 6.

count=0
for (i in 1:10000)
{
num=sample(1000,1)
if(num%%3==0 || num%%5==0 || num%%6==0) count=count+1
}
count/10000

[1] 0.4616
Urn-Ball Problem
I Suppose an urn contains 7 white and 5 black balls. 3 balls are
chosen at random without replacement. Find the probability that
I all the 3 balls are white.
I 2 are white and 1 is black.

count1=0; count2=0;
balls= [Link](c(rep("W",7),rep("B",5)))
for ( i in 1:10000)
{
chosen= sample(balls,3);
if (all(chosen=="W")) count1=count1+1
if (table(chosen)["W"]==2) count2=count2+1
}
count1/10000; count2/10000;

[1] 0.1596
[1] 0.4799
Birthday Problem

I In a class of 25 students, find the probability that at least two


students share the same birthday.

sims <- 5000 # Specify number of simulations


count <- 0
for (i in 1:sims)
{
class<-sample(365,25,replace=TRUE) # SRSWR
if(length(unique(class)) < length(class))
{
count<-count + 1
}
}
count/sims

[1] 0.5684
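
The simulated value can be compared with the exact probability 1 − (365 · 364 ··· 341)/365^25, computed below; the simulation above should be close to it.

1-prod((365-(0:24))/365)   # exact probability, approximately 0.569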
Card Shuffling

I We often speak of a well-shuffled deck of cards. When we shuffle a deck by hand, the shuffling is always imperfect (not fully random). We can simulate these imperfect shuffles on a computer.

I The simplest method is "cutting" the deck.

I We cut the deck at some random point chosen somewhere around the middle of the deck, and then put the lower part on top of the upper part.

I We shall simulate this shuffle.

Simulating a Cut Shuffle

cut=function(deck)
{
x=rbinom(1,52,0.5) #choose a random cut point near middle
temp=c(deck[(x+1):52],deck[1:x])
return(temp)
}
cut(1:52)

[1] 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
[26] 52 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
[51] 25 26
Riffle Shuffle

I A much more reliable way is the riffle shuffle (also known as the Faro shuffle or dovetail shuffle).

I First split the deck into two parts just as in the cut method. Take the top half in your left hand and the other half in your right. Release the cards randomly from both hands.

I Mathematically, if at any stage there are a cards in your left hand and b cards in your right, then the next card comes from the left hand with probability a/(a+b) and from the right with probability b/(a+b).
Simulating Riffle Shuffle

riffle=function(deck)
{
n=length(deck)
x=rbinom(1,52,0.5)
left=deck[1:x]; right=deck[(x+1):52];
k=0; a=length(left); b=length(right); tab=NULL;
for(drop in 1:52)
{
ind=rbinom(1,1,a/(a+b))
if (ind==1)
{
tab[k+1] = left[a]
left = left[1:(a-1)]
k = k + 1; a = a - 1;
}
else
{
tab[k+1] = right[b]
right = right[1:(b-1)]
k = k + 1; b = b - 1;
}
}
return(tab)
}
Simulating Riffle Shuffle

riffle(1:52)

[1] 29 28 52 51 50 27 26 49 25 48 24 47 23 22 21 46 45 20 44 43 42 41 40 39 19
[26] 38 37 18 36 17 16 15 14 13 35 12 11 10 9 8 7 6 34 33 5 4 3 32 2 31
[51] 1 30
Comparing the techniques

I Consider two consecutive cards (say the 23rd and 24th cards) in a pack of cards. If in the original unshuffled pack the 23rd card is followed by the 24th card, then what is the probability that the 24th card still follows the 23rd one in the shuffled deck?

I If the deck is well shuffled, then this probability should be 1/52.

I We now shuffle our virtual deck 8 times using the above two methods and compute the above probability.
Simulating the probability

count1=0; count2=0;
for(simul in 1:1000)
{
deck1= 1:52; deck2=1:52;
for( i in 1:8)
{
deck1=cut(deck1)
deck2 = riffle(deck2)
}

for(j in 1:51)
{
if(deck1[j]==23 & deck1[j+1]==24)
count1= count1 + 1
if(deck2[j]==23 & deck2[j+1]==24)
count2= count2 + 1
}
}
print(count1/1000); print(count2/1000);

[1] 0.996
[1] 0.017
Central Limit Theorem

Theorem
(iid case) Let X1, X2, ..., Xn be iid random variables with mean µ and variance σ² < ∞, and let Sn = X1 + X2 + ... + Xn. Then

(Sn − E(Sn)) / √Var(Sn) → N(0, 1)

in law as n → ∞.
Central Limit Theorem I

I Let U1, U2, ..., Un be iid U(0, 1) variables. Then

Zn = (U1 + U2 + ... + Un − n/2) / √(n/12) → N(0, 1)

in distribution as n → ∞.

n<-100
k<-10000
U<-runif(n*k)
M<-matrix(U,n,k)
X<-apply(M,2,'sum')
Z<-(X-n/2)/sqrt(n/12)
par(mfrow=c(1,2))
hist(Z)
qqnorm(Z)
qqline(Z,col="red")
Central Limit Theorem II

[Figure: Histogram of Z and normal Q-Q plot; the points lie close to the reference line, as expected under the CLT.]
Cauchy Distribution I

I It is interesting to see the asymptotic behaviour of Sn in the case of the Cauchy(0, 1) distribution, whose mean does not exist.

n<-100
k<-10000
U<-rcauchy(n*k)
M<-matrix(U,n,k)
X<-apply(M,2,'sum')
Z<-(X-n/2)/sqrt(n/12)
par(mfrow=c(1,2))
hist(Z)
qqnorm(Z)
qqline(Z,col="red")
Cauchy Distribution II

[Figure: Histogram of Z and normal Q-Q plot for the Cauchy case; the sample is dominated by extreme values (of the order of ±40000) and shows no normal limiting behaviour.]
Laws of large numbers

I The weak law of large numbers says that for any ε > 0 the sequence of probabilities

P(|Sn/n − µ| < ε) → 1 as n → ∞

I Consider i.i.d. coin flips, that is, Bernoulli trials with p = µ = 1/2.

I We find the probability P(|Sn/n − µ| < ε) in R and illustrate the limiting behavior, with ε = 0.01.
Plotting the Probability I

wlln<-function(n,eps,p)
{
pbinom(n*p+n*eps,n,p)-pbinom(n*p-n*eps,n,p)
}
prob=NULL
for (n in 1:10000)
prob[n]=wlln(n,0.01,0.5)
plot(prob,type="l",xlab="n",ylab="probability")
Plotting the Probability II

[Figure: P(|Sn/n − µ| < 0.01) plotted against n = 1, ..., 10000; the probability increases towards 1.]
Strong Law of large numbers

I The strong law of large numbers says that

Sn/n → µ with probability 1 as n → ∞

I Consider i.i.d. coin flips, that is, Bernoulli trials with p = µ = 1/2.

I The sum Sn is a binomial random variable.

Illustrating Strong Law I

slln<-function(n,p)
{
x=rbinom(1,size=n,prob=p)
return(x)
}
value=NULL
for( n in 1:10000)
value[n]=slln(n,0.5)/n

plot(value,type="l",xlab="n",ylab="sample mean")
abline(h=0.5)
Illustrating Strong Law II

[Figure: Sample mean Sn/n plotted against n = 1, ..., 10000, with a horizontal line at 0.5; the sample mean settles down at 0.5.]
Using Simulation to construct Tests

I Simulation can be used to construct tests in situations where the exact sampling distribution of the test statistic is hard to find, even under the null hypothesis.

I More specifically, we use simulation to find the 100α% cut-off points.

I Suppose we have a sample of size 101 from a Beta(5, b) distribution.

I We want to test H0 : b = 5 against H1 : b < 5.

I To get an idea about the nature of the test we plot the density function for different values of b.
Plot Of Beta Densities I

par(mfrow=c(1,3))
x=seq(0,1,by=0.01)
y1=dbeta(x,shape1=5,shape2=2)
y2=dbeta(x,shape1=5,shape2=5)
y3=dbeta(x,shape1=5,shape2=10)
plot(x,y1,type="l",xlab="b<5")
abline(v=median(rbeta(101,shape1=5,shape2=2)),col="red")
plot(x,y2,type="l",xlab="b=5")
abline(v=median(rbeta(101,shape1=5,shape2=5)),col="red")
plot(x,y3,type="l",xlab="b>5")
abline(v=median(rbeta(101,shape1=5,shape2=10)),col="red")
Plot Of Beta Densities II

[Figure: Beta(5, b) densities for b = 2 (panel b < 5), b = 5 (panel b = 5) and b = 10 (panel b > 5), each with the median of a simulated sample of size 101 marked by a vertical red line.]
What type of test shall we perform?

I From the figures we see that the sample median can be used as a test statistic.

I Also, a right-tailed test based on the median will be appropriate.

I Thus we shall reject H0 if the sample median exceeds some value c.

I We want to find the test at the 90% level of significance.

I We shall use the simulation technique to find the cut-off point c.

Plot PH0 (me > c) for different c I

set.seed(100);prob=NULL; j=1;
C=seq(0.2,0.9,by=0.001)
for ( c in C)
{
prob[j]=0
for(i in 1:100)
{
x=rbeta(101,shape1=5,shape2=5)
me=median(x)
if(me>c) prob[j]=prob[j]+1
}
prob[j]=prob[j]/100
j=j+1
}
plot(C,prob,type="l"); abline(h=0.9,col="red");
Plot PH0 (me > c) for different c II

1.0
0.8
0.6
prob

0.4
0.2
0.0

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

C
Now let's find the c

We continue the search for the c for which PH0(me > c) is closest to 0.9.

C[which(prob>0.89 & prob<0.91)]

[1] 0.473 0.475

prob[which(C %in% C[which(prob>0.89 & prob<0.91)])]

[1] 0.9 0.9

So we can take c to be 0.473.

Thus our test rule is: Reject H0 if the sample median exceeds 0.473.
Approximating Expectation

I We have already seen that the sample average x̄ converges to the population mean µ with probability 1 by the Strong Law of Large Numbers. Thus the expected value of any function can be approximated by a sample average:

(1/N) Σ_{i=1}^{N} f(Xi) → E[f(X)] with probability 1 as N → ∞,

if X1, X2, ... is an iid sequence of random variables with the same distribution as X.

I A Monte Carlo method for estimating E(f(X)) is a numerical method based on the approximation

E[f(X)] ≈ (1/N) Σ_{i=1}^{N} f(Xi)

where X1, X2, ... is an iid sequence of random variables with the same distribution as X.
Application of Monte Carlo

I We can apply Monte Carlo to approximate any expectation. How does that help us?

I A wide range of summary measures can be expressed as expectations of some quantities, so we can approximate them as well, e.g. the variance, whose Monte Carlo estimate is the sample variance; similarly for moments.

I In fact we can express any probability as the expected value of an indicator random variable, so we can approximate probabilities as well.
Monte-Carlo estimate of probability

I Suppose we need to estimate P(X ∈ A) for some random variable X having distribution f.

I Now we can write

P(X ∈ A) = E(I(X ∈ A)) = ∫ I(x ∈ A) f(x) dx.

I Hence the Monte-Carlo estimate will be

(1/N) Σ_{i=1}^{N} I(xi ∈ A) = p

where p is the proportion of observations in A out of a sample of size N from distribution f.

I Hence the Monte-Carlo estimate of a probability is the long-run relative frequency.
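
For instance (our example, not from the slides), P(Z > 1.96) for Z ∼ N(0, 1) can be approximated by the proportion of simulated values exceeding 1.96 and compared with pnorm:

N=100000
z=rnorm(N)
mean(z>1.96)                    # Monte-Carlo estimate of P(Z > 1.96)
pnorm(1.96,lower.tail=FALSE)    # exact value, about 0.025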
Another Application: Monte Carlo Integration

I Consider the integral ∫_a^b f(x) dx.

I We can express this integral as the expectation of some random variable.

I Let X, X1, X2, ... be iid U(a, b), i.e. the density of each Xj is φ(x) = (1/(b−a)) I_[a,b](x).

I Then

∫_a^b f(x) dx = (b − a) ∫_a^b f(x) φ(x) dx
             = (b − a) E(f(X))
             ≈ ((b − a)/N) Σ_{j=1}^{N} f(Xj)

for large N.
Integration

I Evaluate the integral ∫_0^{2π} e^(cos x) dx.

I We generate samples Xj from U(0, 2π).

I Then use the approximation

∫_0^{2π} e^(cos x) dx ≈ (2π/N) Σ_{j=1}^{N} e^(cos Xj)
Code in R

N=1000
x=runif(N,min=0,max=(2*pi))
value=sum(exp(cos(x)))
value=(2*pi)*value/N
value

[1] 8.188166
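
As an informal check (not in the original slides), R's numerical integrator gives the same integral; the Monte-Carlo value above should be within sampling error of it.

integrate(function(x) exp(cos(x)),0,2*pi)   # about 7.95; compare with the Monte-Carlo value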
Family Planning

I Suppose a couple is planning to have children until they have one


child of each sex. Assuming male and female child are equally
probable, how many children can they expect to have?

count=NULL
for ( i in 1:1000)
{
child=sample(0:1,1)
while(length(unique(child))<2)
child=c(child,sample(0:1,1))
count[i]=length(child)
}
mean(count)

[1] 3.008
Example

I Suppose we want to estimate Φ(t) = P(X ≤ t), where X ∼ N(0, 1), for a fixed t.
I We can do that in several ways:

1. Generate N observations from the N(0, 1) distribution and find the proportion of observations not exceeding t.
2. Generate N observations from U(0, t) and evaluate the integral

   0.5 + ∫_0^t (1/√(2π)) e^(−u²/2) du

3. Generate N observations from U(0, 1) and evaluate the integral

   0.5 + (1/√(2π)) ∫_0^1 t e^(−(tv)²/2) dv

   which is obtained from the previous integral by the transformation v = u/t.

I The last two procedures assume that t > 0; for t < 0 we can use Φ(t) = 1 − Φ(−t).
I We now estimate Φ(t) for different values of t and compare with the inbuilt pnorm function of R.
x <- seq(.1, 2.5, length = 10)
m <- 10000
method1 <- numeric(length(x))
method2 <- numeric(length(x))
method3 <- numeric(length(x))
method4 <- numeric(length(x))

z <- rnorm(m)
dim(x) <- length(x)
method1 <- apply(x, MARGIN = 1,
FUN = function(x, z) {mean(z < x)}, z = z)

for (i in 1:length(x)) {
u=runif(m,min=0,max=x[i])
g= exp(-(u^2)/2)
method2[i]=mean(g)*x[i] / sqrt(2 * pi) + 0.5
}

u <- runif(m)
for (i in 1:length(x)) {
g <- x[i] * exp(-(u * x[i])^2 / 2)
method3[i] <- mean(g) / sqrt(2 * pi) + 0.5
}

method4=pnorm(x)

print(round(rbind(x, method1, method2, method3,method4), 3))

[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
x 0.100 0.367 0.633 0.900 1.167 1.433 1.700 1.967 2.233 2.500
method1 0.542 0.643 0.738 0.818 0.880 0.925 0.955 0.976 0.987 0.992
method2 0.540 0.643 0.737 0.816 0.879 0.922 0.958 0.975 0.984 0.996
method3 0.540 0.643 0.737 0.816 0.878 0.924 0.956 0.975 0.987 0.993
method4 0.540 0.643 0.737 0.816 0.878 0.924 0.955 0.975 0.987 0.994
Bias and Variance

I The Monte Carlo estimate Zn^MC of E(f(X)) has

bias(Zn^MC) = 0

and

MSE(Zn^MC) = Var(Zn^MC) = (1/n) Var(f(X)).
Mean or Median

I Problem: One can estimate the center of a symmetric distribution either with the sample mean or with the sample median. Which of the two estimators is better when the underlying population is
a) the standard normal,
b) the t2 distribution?
For Normal Distribution

n.sim=1000; n.samp=50;
m=NULL; me=NULL;
for (i in 1:n.sim)
{
x=rnorm(n.samp)
m[i]=mean(x); me[i]=median(x);
}
se.m= sqrt(mean(m^2)-((mean(m))^2))
se.me= sqrt(mean(me^2)-((mean(me))^2))
mean(m); mean(me);

[1] -0.004079595
[1] -0.009236697

se.m; se.me;

[1] 0.1412464
[1] 0.1722919
For t-distribution

n.sim=1000; n.samp=50;
m=NULL; me=NULL;
for (i in 1:n.sim)
{
x=rt(n.samp,df=2)
m[i]=mean(x); me[i]=median(x);
}
se.m= sqrt(mean(m^2)-((mean(m))^2))
se.me= sqrt(mean(me^2)-((mean(me))^2))
mean(m); mean(me);

[1] 0.01256094
[1] 0.006310878

se.m; se.me;

[1] 0.5231233
[1] 0.202407
Level α test

I Problem: Draw a random sample of size n from an N(µ, 1) population, and on the basis of that sample consider the UMP test of size α = 0.05 for testing H0 : µ = 0 against H1 : µ > 0. Verify using the Monte Carlo method whether P(Type I error) ≈ α.

I The level α UMP test is given by

φ = 1 if x̄ > τα/√n,   φ = 0 otherwise,

where τα is the upper α point of N(0, 1), and P(Type I error) = E[φ | H0].

I This expectation is approximated as follows: draw the random sample several times (say, m times) and at each stage observe the value of φ as φi. Then our estimate will be (1/m) Σ_{i=1}^{m} φi.
Checking the level in R

n=100
m=500
x=NULL
p=0
for(i in 1:m)
{
x=rnorm(n)
if(mean(x)>qnorm(0.95)/sqrt(n)) p=p+1
}
p/m

[1] 0.048
Power of Test

I Problem: Suppose (X, Y) ∼ N2(50, 65, 4, 9, ρ). We want to test H0 : ρ = 0 against H1 : ρ > 0. Find the power curve of the test.

I Let (Xi, Yi), i = 1, 2, ..., n be a random sample. Then the test statistic is

t = √(n − 2) r / √(1 − r²) ∼ t_{n−2} under H0

I Hence the power function is β(ρ) = P(t > tα,n−2 | ρ).

I How do we compute this probability?

I Apply the Monte-Carlo technique.

Power Curve in R I

options(warn=-1)
require(MASS,quiet=T)
n=50; m=1000;p=seq(0,0.6,0.01); prob=NULL;
for( i in 1:length(p))
{
s=0
for(j in 1:m)
{
u=mvrnorm(n,c(50,65),
matrix(c(4,p[i]*2*3,p[i]*2*3,9),nrow=2))
r=cor(u)[1,2]; t=r*sqrt((n-2)/(1-r^2));
if(t>qt(0.95,n-2)) s=s+1
}
prob[i]=s/m
}
plot(p,prob,type="l",xlab=expression(rho),main="Power Curve")
Power Curve in R II

[Figure: Power Curve — estimated power of the test plotted against ρ for ρ in (0, 0.6); the power rises from about 0.05 at ρ = 0 towards 1.]