
VIASM Lecture Series on

Introduction to Bayesian Computation

Lecture 2: Simple Monte Carlo methods

Tran Minh Ngoc, University of Sydney Business School


Dao Viet Hung, University of New South Wales

Learning objectives

▶ Understand what the Monte Carlo method is and why it is important
▶ Understand how computers can generate random numbers
▶ Be able to use methods such as the inverse method, the transformation method and the accept-reject method to generate random variables
▶ Know how to use the importance sampling method to estimate expectations

Monte Carlo methods
Monte Carlo simulation is at the heart of scientific computing.

Table of contents

Monte Carlo Integration

Random Variable Generation

Importance Sampling

Recommended reading

[AO], chapters 2, 3 and 9

[Link]

Outline

Monte Carlo Integration

Random Variable Generation

Importance Sampling

Monte Carlo Integration
Introduction
A basic problem in statistics is to compute an integral of the form

    I = E[h(X)] = ∫_X h(x)π(x) dx

where π(x) is a pdf with support X, X ∼ π(x), and h(x) is some function on X. Often, X is high dimensional.
E.g.

    μ_X = E[X] = ∫ x π(x) dx,    V(X) = ∫ (x − μ_X)² π(x) dx

    P(X ∈ A) = ∫ I_A(x) π(x) dx

where I_A(x) = 1 if and only if x ∈ A.

In most cases we can't compute I analytically and have to use computers.
Monte Carlo Integration
Deterministic numerical methods

Suppose that x is scalar:

    I = E[h(X)] = ∫_a^b h(x)π(x) dx

Basic idea (a minimal code sketch follows this list)
▶ Divide the domain into a grid a = x_0 < x_1 < ... < x_n = b
▶ Estimate I by Î = Σ_{i=1}^n h(ξ_i)π(ξ_i)(x_i − x_{i−1}) for some ξ_i ∈ [x_{i−1}, x_i)
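For concreteness, a minimal Matlab sketch of this idea using the midpoint rule, with the illustrative choices h(x) = x² and π the standard normal pdf restricted to [−5, 5] (the result should be close to 1):

% Midpoint-rule sketch (illustrative: h(x) = x.^2, pi = N(0,1) pdf on [-5,5])
a = -5; b = 5; n = 1000;
xg = linspace(a, b, n+1);             % grid a = x_0 < ... < x_n = b
xi = (xg(1:end-1) + xg(2:end))/2;     % midpoints xi_i in [x_{i-1}, x_i)
Ihat = sum(xi.^2 .* normpdf(xi) .* diff(xg))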

Major drawbacks:
▶ Very inefficient for high-dimensional integrals
▶ The domain must be finite: finite limits a and b
Monte Carlo Integration
Monte Carlo methods

Suppose that we are able to use a computer to generate n samples X_i ∼iid π(x), i = 1, ..., n. Let

    Î_n := (1/n) Σ_{i=1}^n h(X_i)

By the LLN, Î_n → I almost surely and V(Î_n) = V(h(X_1))/n → 0 as n → ∞, regardless of the dimension of the integral.
Note that numerical methods are deterministic, while Monte Carlo methods are stochastic.
Numerical methods spend computational resources equally over the entire domain, while Monte Carlo methods focus on regions with high π-density.
Monte Carlo Integration
A bit about history and the name “Monte Carlo”

▶ Dates back as early as the 18th century, to the famous Buffon's needle problem²
▶ The modern version of Monte Carlo methods was invented in the late 1940s by Stanislaw Ulam, a Polish-American mathematician, while he was working on nuclear weapons projects during WWII.
▶ Because the projects were confidential, Ulam's work required a code name; the name Monte Carlo was used, referring to the Monte Carlo Casino in Monaco, where Ulam's relatives used to gamble. Yes, it's weird!
▶ Nowadays, Monte Carlo methods are used in almost every scientific field

² [Link]
Monte Carlo Integration

Example.

    I = ∫_R x² (1/√(2π)) exp(−x²/2) dx = 1

Let's try the following in Matlab, for each n in the table below:
x = normrnd(0,1,n,1); Ihat = sum(x.*x)/n

    n      100      1000     10000    1,000,000
    Î_n    0.8098   0.9702   1.0057   1.0010
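A sketch of the loop behind this table (results will vary with the random seed):

% Monte Carlo estimates of I = E[X^2] for increasing n
for n = [100 1000 10000 1000000]
    x = normrnd(0,1,n,1);
    Ihat = sum(x.*x)/n;
    fprintf('n = %8d, Ihat = %.4f\n', n, Ihat);
end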

Beyond Monte Carlo...

Randomization seems to be a good solution to many scientific problems.

▶ The basic equation in scientific computing,

    Ax = b,

with a HUGE A, can be randomized! Exponentially faster than the deterministic approach
▶ A quantum computer can be viewed as randomizing a classical computer, which is based on deterministic principles
Monte Carlo Integration

But wait...

How can we use a computer to generate samples X_i ∼iid π(x)?

Note: there are two different uses of the notation X ∼ π(x). One means "distributed as" and the other means "sampled from".
Outline

Monte Carlo Integration

Random Variable Generation

Importance Sampling

Random Variable Generation

How can a computer generate random numbers X_i ∼iid π(x)?
Albert Einstein once said "God does not play dice". It seems he wasn't comfortable with "randomness". Then how come a computer can mimic "randomness"?
Yet, we can teach a computer how to generate random numbers, but only from the uniform distribution U(0, 1), the simplest distribution.
Random Variable Generation

Strategy:
▶ Design a computer algorithm that generates a sequence of numbers (u_1, ..., u_n) in [0, 1]
▶ Test the hypothesis that these numbers are iid ∼ U(0, 1)
▶ If we cannot reject that hypothesis when n is large enough, then these numbers can be treated as iid samples from U(0, 1)

Definition. A random number generator is an algorithm which, starting from an initial value u_0 and using a transformation rule D, produces a sequence u_i = D(u_{i−1}) of values in [0, 1]. These numbers (u_1, ..., u_n) pass the usual hypothesis tests that they are iid ∼ U(0, 1). A minimal code sketch follows below.
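As a concrete illustration, here is a minimal linear congruential generator in Matlab with the classic "minimal standard" constants a = 7^5 and m = 2^31 − 1. This is illustrative only; Matlab's default generator is the far stronger Mersenne Twister.

% Minimal linear congruential generator: u_i = D(u_{i-1})
a = 7^5; m = 2^31 - 1;   % classic "minimal standard" constants
s = 12345;               % the seed: initial integer state u_0
n = 5;
u = zeros(n,1);
for i = 1:n
    s = mod(a*s, m);     % deterministic update of the state
    u(i) = s/m;          % map the state to [0,1]
end
disp(u')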

Random Variable Generation

The random seed refers to how the initial value u_0 is set. If you use the same random seed, the computer produces exactly the same pseudo-random numbers.
Notice the difference in the following
>> normrnd(0,1,1,2)
ans =
0.5377 1.8339
>> normrnd(0,1,1,2)
ans =
-2.2588 0.8622
>> rng(1000)
>> normrnd(0,1,1,2)
ans =
0.3106 -1.3391
>> rng(1000)
>> normrnd(0,1,1,2)
ans =
0.3106 -1.3391

Random Variable Generation

All computers nowadays are equipped with a random number generator. There are several random number generators; we don't discuss them here.
We assume that we can generate U ∼ U(0, 1). That's all we need to be able to generate a random variable X ∼ π(x) for any pdf π(x).
Just as the basic elements of computers are sequences of the binary digits 0 and 1, uniform random numbers U ∼ U(0, 1) are the basic building blocks of all Monte Carlo methods.
Random Variable Generation
Inverse method

We want to generate a rv X with pdf π(x) and cdf F(x). Assume that the inverse function F⁻¹ of F exists.
▶ Generate U ∼ U(0, 1)
▶ Set X = F⁻¹(U)

Then the generated X is a random variable whose cdf is indeed F(x). This is because

    F_X(x) = P(X ≤ x) = P(F⁻¹(U) ≤ x) = P(U ≤ F(x)) = F(x)
Random Variable Generation
Inverse method

Example. The exponential distribution Exp(λ = 1) has the pdf

    f(x) = e^{−x}, x > 0.

Its cdf is

    F(x) = 1 − e^{−x}, x > 0.

The range of F(x) is [0, 1]. Given u ∈ [0, 1] in this range, the solution of F(x) = u is

    x = F⁻¹(u) = −log(1 − u).

If U ∼ U(0, 1), then X = −log(1 − U) ∼ Exp(λ = 1).
Random Variable Generation
Inverse method
The following Matlab code generates 10, 000 observations from
Exp(1):
n = 10000;
u = unifrnd(0,1,n,1);
x = -log(1-u);
hist(x)

In general, F(x) = 1 − e^{−λx}, x > 0, is the cdf of the exponential distribution Exp(λ), with F⁻¹(u) = −log(1 − u)/λ.
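As a quick sanity check of this general formula, a minimal sketch with the illustrative choice λ = 2:

% Inverse-method sketch for Exp(lambda); lambda = 2 is an illustrative choice
n = 10000; lambda = 2;
u = unifrnd(0,1,n,1);
x = -log(1-u)/lambda;   % F^{-1}(u) = -log(1-u)/lambda
mean(x)                 % should be close to 1/lambda = 0.5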
Random Variable Generation
Transformation Methods

Useful when the distribution of interest is linked in a relatively simple way to other distributions that are easy to simulate.
For example, if Z ∼ N(0, 1), Y ∼ χ²_ν and Y is independent of Z, then

    T = Z / √(Y/ν)

is a Student's t random variable, t_ν.
Hence, if you know how to sample from N(0, 1) and χ²_ν, you can generate T.
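A minimal Matlab sketch of this transformation, assuming chi2rnd from the Statistics Toolbox (the same toolbox that provides normrnd) is available; ν = 5 is an illustrative choice:

% Transformation sketch: t_nu from N(0,1) and chi-square with nu dof
n = 10000; nu = 5;
z = normrnd(0,1,n,1);
y = chi2rnd(nu,n,1);    % Y ~ chi^2_nu, independent of Z
t = z./sqrt(y/nu);      % T ~ t_nu
hist(t)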

Random Variable Generation
Example.
▶ χ²_2 is Exp(1/2) (check yourself!), so if U ∼ U(0, 1) then X = −2 log(1 − U) ∼ χ²_2.
▶ If X ∼ χ²_r, Y ∼ χ²_s and X, Y are independent, then X + Y ∼ χ²_{r+s}.

This implies that if U_j ∼iid U(0, 1), then

    Y = −2 Σ_{j=1}^ν log(1 − U_j) ∼ χ²_{2ν}
The following Matlab code generates 10000 observations from χ²_{10}:
n = 10000; nu = 5;
u = unifrnd(0,1,n,nu);
y = -2*sum(log(1-u),2);
hist(y)

Random Variable Generation

Example. (Homework!) Let X be a discrete rv with

    X = −1 with prob. 0.2,   X = 0 with prob. 0.5,   X = 1 with prob. 0.3

Write computer code to generate 10000 observations of X and plot their histogram.

Random Variable Generation

Before we move on, think about how you can generate from the standard normal distribution N(0, 1).

Random Variable Generation
Accept-Reject Methods

Difficulties with the inverse and transformation methods:
▶ It might be difficult to compute the inverse cdf F⁻¹, e.g., the inverse of the normal cdf
▶ Many distributions don't have a simple relation with distributions that are easy to sample from
▶ In some cases the density π is known only up to a constant: π(x) = k(x)/C with the normalising constant C = ∫ k(x) dx unknown

The key idea of Accept-Reject methods: use a simpler density g, which is easy to sample from, as an instrumental density to assist in generating from π.
Accept-Reject Methods
We want to sample from the density π(x) = k(x)/C, where C = ∫ k(x) dx might be unknown. We sometimes write π(x) ∝ k(x).
Algorithm. Find a density g and a constant M s.t. k(x) ≤ M g(x) for all x in the support of π.
(i) Generate X ∼ g and U ∼ U(0, 1)
(ii) Accept Y = X if U ≤ k(X)/(M g(X)); otherwise return to (i)

Proof. The joint density of X and U is f_{X,U}(x, u) = g(x)f_U(u) = g(x). Then

    P(Y ≤ y) = P(X ≤ y | U ≤ k(X)/(M g(X)))
             = P(X ≤ y, U ≤ k(X)/(M g(X))) / P(U ≤ k(X)/(M g(X)))
             = [∫_{−∞}^y ∫_0^{k(x)/(M g(x))} f_{X,U}(x, u) du dx] / [∫_{−∞}^∞ ∫_0^{k(x)/(M g(x))} f_{X,U}(x, u) du dx]
             = ∫_{−∞}^y (k(x)/C) dx = F(y)
Accept-Reject Methods
The algorithm works for any M > 0 as long as

    k(x) ≤ M g(x) for all x in the support of π.

This condition requires that the tails of g(x) cover the tails of π(x). (Draw a picture to convince yourself!)
The probability of accepting Y = X is

    P(U ≤ k(X)/(M g(X))) = C/M

So the smaller M, the more efficient the algorithm.

Note that

    C = ∫ k(x) dx ≤ ∫ M g(x) dx = M ∫ g(x) dx = M
Accept-Reject Methods

Example. We want to sample from the truncated normal density π(x) ∝ k(x) = exp(−x²/2) I_{x≥0}. Select the exponential distribution Exp(λ) as the instrumental distribution, g(x) = λ exp(−λx) I_{x≥0}. Then

    k(x)/g(x) = (1/λ) exp(−x²/2 + λx) ≤ (1/λ) exp(λ²/2), ∀x,

since −x²/2 + λx is maximised at x = λ. Hence we have

    k(x) ≤ M g(x) for all x

with M = (1/λ) exp(λ²/2).
Accept-Reject Methods

The following Matlab code generates 10000 observations from π(x) with λ = 4. (Note that Matlab's exprnd is parameterised by the mean, which is 1/λ for rate λ, and that the acceptance ratio is k(x)/(M g(x)) = exp(−x²/2 + λx)/(Mλ).)
tic
n = 10000;
y = zeros(n,1);
lambda = 4;
M = 1/lambda*exp(lambda^2/2);
for i = 1:n
    stop = 0;
    while ~stop
        x = exprnd(1/lambda);   % exprnd takes the mean 1/lambda, not the rate
        u = unifrnd(0,1);
        if u <= exp(-x^2/2 + lambda*x)/(M*lambda), stop = 1; end
    end
    y(i) = x;
end
CPU = toc % CPU time taken
CPU =

  265.5920
Accept-Reject Methods

The following Matlab code generates 10000 observations from π(x) with λ = 1:
tic
n = 10000;
y = zeros(n,1);
lambda = 1;
M = 1/lambda*exp(lambda^2/2);
for i = 1:n
    stop = 0;
    while ~stop
        x = exprnd(1/lambda);   % mean 1/lambda = 1 when the rate lambda = 1
        u = unifrnd(0,1);
        if u <= exp(-x^2/2 + lambda*x)/(M*lambda), stop = 1; end
    end
    y(i) = x;
end
CPU = toc % CPU time taken
CPU =

  0.1702
Accept-Reject Methods

Did you notice the CPU times taken for the two different choices of λ?
The upper bound M = (1/λ) exp(λ²/2) is minimised at λ = 1. The best λ is therefore 1, giving M = √e.
Accept-Reject Methods

Advantages of the Accept-Reject method:
▶ it provides a generic method to simulate from any density π that is known up to a normalising constant
▶ the samples generated are independent of each other

Disadvantage: it might be difficult to select g and M.

There are alternative methods for sampling from π(x) (covered in the next lecture):
▶ Markov chain Monte Carlo methods
▶ Gibbs sampling
Random Variable Generation

▶ The inverse method, the transformation method and the accept-reject method are general methods for generating random variables from a distribution π(x).
▶ There are highly efficient methods specifically designed for some special distributions such as N(0, 1) (not covered here, please see the [AO] book).
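One classical example of such a special-purpose method is the Box-Muller transform, which turns two iid U(0, 1) draws into two iid N(0, 1) draws (a standard technique, though not necessarily the one [AO] emphasises); a minimal sketch:

% Box-Muller sketch: two iid N(0,1) draws from two iid U(0,1) draws
n = 10000;
u1 = unifrnd(0,1,n,1);
u2 = unifrnd(0,1,n,1);
r = sqrt(-2*log(u1));    % radius
z1 = r.*cos(2*pi*u2);    % z1 and z2 are iid N(0,1)
z2 = r.*sin(2*pi*u2);
hist([z1; z2])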

Outline

Monte Carlo Integration

Random Variable Generation

Importance Sampling

Monte Carlo problems

Up to this point, we have seen that Monte Carlo methods deal with two main problems:

P1 Generating samples from a probability distribution of interest with pdf π(x)
P2 Estimating an integral of the form

    I = ∫ h(x)π(x) dx = E_{X∼π(x)}[h(X)]

for some function h(x)

Problem 2 can be solved via Problem 1. Sometimes, however, it is more convenient and more efficient to solve Problem 2 directly. Importance sampling is such a technique.
Importance Sampling
Monte Carlo Integration
We want to compute the integral

    I = ∫_X h(x)π(x) dx

where π(x) is a pdf and h(x) is some function on X.

Recall that, if we are able to generate a sample X_i ∼iid π(x), then

    Î_n := (1/n) Σ_{i=1}^n h(X_i) → I almost surely

as n → ∞. However, sometimes
▶ it is challenging to sample directly from π
▶ simulation from π is not necessarily optimal in terms of having a small variance V(Î_n)
▶ π(x) is known only up to a constant: π(x) ∝ k(x)
Importance Sampling
π(x) is fully known
Suppose for now that π(x) is fully known, but it is challenging to sample from it directly.
Let g(x) be an instrumental density that is easy to sample from and such that supp(g) covers supp(π). The key idea is

    I = ∫_X h(x)π(x) dx = ∫_X h(x) (π(x)/g(x)) g(x) dx = E_{X∼g}[h(X)π(X)/g(X)]

Importance Sampling algorithm
▶ Generate X_i ∼ g(x), i = 1, ..., n
▶ Estimate I by

    Î_n = (1/n) Σ_{i=1}^n h(X_i) π(X_i)/g(X_i)

Check that Î_n is unbiased! A minimal code sketch follows below.
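A minimal Matlab sketch, with the illustrative choices π = N(0, 1), h(x) = x², and a proposal g that is normal with mean 0 and standard deviation 2 (the exact answer is I = 1):

% Importance sampling sketch: I = E[X^2] under pi = N(0,1)
n = 100000;
x = normrnd(0,2,n,1);                  % X_i ~ g (normal, sd 2)
w = normpdf(x,0,1)./normpdf(x,0,2);    % weights pi(X_i)/g(X_i)
Ihat = mean(x.^2 .* w)                 % should be close to 1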


Importance Sampling
π(x) is fully known

▶ g is called the proposal density or instrumental density
▶ If supp(g) covers supp(π), then Î_n → I almost surely. Under some mild conditions,

    V(Î_n) = (1/n) E_g[(h(X)π(X)/g(X) − I)²] → 0 as n → ∞

and the CLT applies, which is useful for constructing confidence intervals for I
▶ Importance Sampling is one of the most widely used simulation techniques
Importance Sampling
π(x) is not fully known
If π(x) is known only up to a normalising constant, π(x) ∝ k(x), i.e. π(x) = k(x)/C with C unknown, then I is estimated by

    Ĩ_n = [Σ_{i=1}^n h(X_i)k(X_i)/g(X_i)] / [Σ_{i=1}^n k(X_i)/g(X_i)]

where X_i ∼ g(x), i = 1, ..., n. A minimal code sketch follows below.

▶ This estimator is no longer unbiased
▶ Check that Ĩ_n → I almost surely!
▶ Under some mild conditions, we also have that

    V(Ĩ_n) → 0 as n → ∞

and the CLT applies for Ĩ_n
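A minimal Matlab sketch, with the illustrative choices k(x) = exp(−x²/2) (the unnormalised N(0, 1) density), h(x) = x², and the same proposal g as above (the exact answer is again I = 1):

% Self-normalised IS sketch: pi known only up to a constant, pi(x) prop. to k(x)
n = 100000;
x = normrnd(0,2,n,1);                % X_i ~ g (normal, sd 2)
w = exp(-x.^2/2)./normpdf(x,0,2);    % unnormalised weights k(X_i)/g(X_i)
Itilde = sum(x.^2 .* w)/sum(w)       % should be close to 1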

