0% found this document useful (0 votes)

85 views13 pages

Importance Sampling

Uploaded by

dhoang6679

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

85 views13 pages

Importance Sampling

Uploaded by

dhoang6679

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 13

Chapter 6

Importance sampling

6.1 The basics

To movtivate our discussion consider the following situation. We want to use Monte Carlo to
compute µ = E[X]. There is an event E such that P (E) is small but X is small outside of E.
When we run the usual Monte Carlo algorithm the vast majority of our samples of X will be
outside E. But outside of E, X is close to zero. Only rarely will we get a sample in E where
X is not small.

Most of the time we think of our problem as trying to compute the mean of some random
variable X. For importance sampling we need a little more structure. We assume that the
random variable we want to compute the mean of is of the form f (X) ~ where X ~ is a random
~ is absolutely continous and let p(~x) be
vector. We will assume that the joint distribution of X
the density. (Everything we will do also works for the case where the random vector X ~ is
discrete.) So we focus on computing
Z
~ =
Ef (X) f (~x)p(~x)dx (6.1)

Sometimes people restrict the region of integration to some subset D of Rd . (Owen does this.)
We can (and will) instead just take p(x) = 0 outside of D and take the region of integration
to be Rd .

The idea of importance sampling is to rewrite the mean as follows. Let q(x) be another
probability density on Rd such that q(x) = 0 implies f (x)p(x) = 0. Then
Z Z
f (x)p(x)
µ= f (x)p(x)dx = q(x) dx (6.2)
q(x)

1
We can write the last expression as
 
~ X)
f (X)p( ~
Eq   (6.3)
~
q(X)

where Eq is the expectation for a probability measure for which the distribution of X ~ is q(x)
rather than p(x). The density p(x) is called the nominal or target distribution , q(x) the
importance or proposal distribution and p(x)/q(x) the likelihood ratio. Note that we assumed
that f (x)p(x) = 0 whenever q(x) = 0. Note that we do not have to have p(x) = 0 for all x
where q(x) = 0.
~ 1, · · · , X
The importance sampling algorithm is then as follows. Generate samples X ~n
according to the distribution q(x). Then the estimator for µ is

1X n ~ i )p(X
f (X ~ i)
µ̂q = (6.4)
n i=1 q(X ~ i)

Of course this is doable only if f (x)p(x)/q(x) is computable.

Theorem 1 µ̂q is an unbaised estimator of µ, i.e., Eq µ̂q = µ. Its variance is σq2 /n where
Z
f (x)2 p(x)2 Z
(f (x)p(x) − µq(x))2
σq2 = 2
dx − µ = dx (6.5)
q(x) q(x)

Proof: Straightforward. QED

We can think of this importance sampling Monte Carlo algorithm as just ordinary Monte
~ X)/q(
Carlo applied to Eq [f (X)p( ~ ~
X)]. So a natural estimator for the variance is
 2
1X n ~ i )p(X
f (X ~ i)
σ̂q2 =  − µ̂q  (6.6)
n i=1 q(X ~ i)

What is the optimal choice of the importance distribution q(x)? Looking at the theorem we
see that if we let q(x) = f (x)p(x)/µ, then the variance will be zero. This is a legitimate
probability density if f (x) ≥ 0. Of course we cannot really do this since it would require
knowing µ. But this gives us a strategy. We would like to find a density q(x) which is close to
being proportional to f (x)p(x).

What if f (x) is not positive? Then we will show that the variance is minimized by taking q(x)
to be proportional to |f (x)|p(x).
Theorem 2 Let q(x) = |f (x)|p(x)/c where c is the constant that makes this a probability
density. Then for any probability density q(x) we have σq ≤ σq

Proof: Note that c = |f (x)|q(x)dx.

Z
f (x)2 p(x)2
σq − µ2 = dx (6.7)
q(x)
Z
= c |f (x)|p(x)dx (6.8)
Z 2
= |f (x)|p(x)dx (6.9)
!2
|f (x)|p(x)
Z
= q(x)dx (6.10)
q(x)
Z
f (x)2 p(x)2
≤ q(x)dx (6.11)
q(x)2
Z
f (x)2 p(x)2
= dx (6.12)
q(x)
= σq − µ2 (6.13)

where we have used the Cauchy Schwarz inequality with respect to the probaility measure
q(x)dx. (One factor is the function 1.) QED.

Since we do not know f (x)p(x)dx, we probably do not know |f (x)|p(x)dx either. So the
R R

optimal sampling density given in the theorem is not realizable. But again, it gives us a
strategy. We want a sampling density which is approximately proportional to |f (x)|p(x).

Big warning: Even if the original f (X) ~ has finite variance, there is no guarantee that σq will
be finite. Discuss heavy tails and light tails.

How the sampling distribution should be chosen depends very much on the particular
problem. Nonetheless there are some general ideas which we illustrate with some trivial
examples.

If the function f (x) is unbounded then ordinary Monte Carlo may have a large variance,
possibly even infinite. We may be able to use importance sampling to turn a problem with an
unbounded random variable into a problem with a bounded random variable.

Example We want to compute the integral

Z 1
I= x−α e−x dx (6.14)
0
where 0 < α < 1. So the integral is finite, but the integrand is unbounded. We take
f (x) = x−α e−x and the nominal distribution is the uniform distribution on [0, 1]. Note that f
will have infinite variance if α ≤ −1/2.

We take the sampling distribution to be

1
q(x) = x−α (6.15)
1−α
on [0, 1]. This can be sampled using inversion. We have

p(x)
f (x) = e−x (1 − α) (6.16)
q(x)

So we do a Monte Carlo simulation of Eq [e−X (1 − α)] where X has distribution q. Note that
e−X (1 − α) is a bounded random variable.

The second general idea we illustrate involves rare-event simulation. This refers to the
situation where you want to compute the probabily of an event when that probability is very
small.

Example: Let Z have a standard normal distribution. We want to compute P (Z ≥ 4). We

could do this by a Monte Carlo simulation. We generate a bunch of samples of Z and count
how many satisfy Z ≥ 4. The problem is that there won’t be very many (probably zero). If
p = Pq(Z ≥ 4), then the variance of 1Z≥4 is p(1 − p) ≈ p. So the error with n samples is of
order p/n. So this is small, but it will be small compared to p only if n is huge.

Our nominal distribution is

1 1
p(x) = √ exp(− x2 ) (6.17)
2π 2
We take the sampling distribution to be
(
e−(x−4) , if x ≥ 4,
q(x) = (6.18)
0, if x < 4,

The sampling distribution is an exponential shifted to the right by 4. In other words, if Y has
an exponential distribution with mean 1, then Y + 4 has the distribution q. The probability
we want to compute is
Z
p = 1x≥4 p(x) dx (6.19)
Z
p(x)
= 1x≥4 q(x) dx (6.20)
q(x)
The likehood ratio is
p(x) 1 1
w(x) = = √ exp(− x2 + x − 4) (6.21)
q(x) 2π 2
√
On [4, ∞) this function is decreasing. So its maximum is at 4 where its value is exp(−8)/ 2π
which is really small. The variance is no bigger than the second moment which is bounded by
this number squared. This is exp(−16)/2π. Compare this with the variance of ordinary MC
which saw was of the order of p which is on the order of exp(−8). So the decrease in the
variance is huge.

Example We return to the network example, following Kroese’s review article. Let
U1 , U2 , · · · , U5 be independent and uniform on [0, 1]. Let Ti be Ui multiplied by the approriate
constant to give the desired distribution for the times Ti . We want to estimate the mean of
f (U1 , · · · , U5 ) where f is the minimum time. The nominal density is p(u) = 1 on [0, 1]5 . For
our sampling density we take
5
νi uiνi −1
Y
g(u) = (6.22)
i=1

where the νi are parameters. (This is a special case of the beta distribution.) Note that νi = 1
gives the nominal distribution p. There is no obvious choice for the νi . Kroese finds that with
ν = (1.3, 1.1, 1.1, 1.3, 1.1) the variance is reduced by roughly a factor of 2.

We have discussed importance sampling in the setting where we want to estimate E[f (X)] ~
and X ~ is jointly absolutely continuous. Everything we have done works if X ~ is a discrete RV.
For this discussion I will drop the vector notation. So suppose we want to compute
µ = E[f (X)] where X is discrete with probability mass function p(x), i.e., p(x) = P (X = x).
If q(x) is another discrete distribution such that q(x) = 0 implies f (x)p(x) = 0, then we have
X X f (x)p(x) f (x)p(x)
µ = E[f (X)] = f (x)p(x) = q(x) = Eq [ ] (6.23)
x x q(x) q(x)
where Eq means expectation with repect to q(x).

Example - union counting problem (from Fishman)

We have a finite set which we will take to just be {1, 2, · · · , r} and will call Ω. We also have a
collection Sj , j = 1, · · · , m of subsets of Ω. We know r, the cardinality of Ω and the
cardinalities |Sj | of all the given subsets. Througout this example we use | | to denote the
cardinality of a set. We want to compute l = |U | where
U = ∪m
j=1 Sj (6.24)
We assume that r and l are huge so that we cannot do this explicitly by finding all the
elements in the union. We can do this by a straightforward Monte Carlo if two conditions are
met. First, we can sample from the uniform distribution on Ω. Second, given an ω ∈ Ω we can
determine if ω ∈ Sj in a reasonable amount of time. The MC algorithm is then to generate a
large number, n, of samples ωi from the uniform distribution on Ω and let X be the number
that are in the union U . Our estimator is then rX/n. We are computing Ep [f (ω)], where

f (ω) = r1ω∈U (6.25)

We are assuming r and n are both large, but suppose r/n is small. Then this will be an
inefficient MC method.

For our importance sampling algorithm, define s(ω) to be the number of subsets Sj that
contain ω, i.e.,

s(ω) = |{j : ω ∈ Sj }| (6.26)

and let s = s(ω). Note that s = |Sj |. The importance distribution is taken to be
P P
ω j

s(ω)
q(ω) = (6.27)
s
The likelihood ratio is just
p(ω) s
= (6.28)
q(ω) rs(ω)
s
Note that q(ω) is zero when f (ω) is zero. So f (ω)q(ω)/p(ω) is just s(ω)
. We then do a Monte
Carlo to estimate
s
Eq [ ] (6.29)
s(ω)
However, is it really feasible to sample from the q distribution? Since l is huge a direct
attempt to sample from it may be impossible. We make two assumptions. We assume we
know |Sj | for all the subsets, and we assume that for each j, we can sample from the uniform
distribution on Sj . Then we can sample from q as follows. First generate a random
J ∈ {1, 2, · · · , m} with
|Sj |
P (J = j) = Pm (6.30)
i=1 |Si |

Then sample ω from the uniform distribution on SJ . To see that this gives the desired density
q(), first note that if ω is not in ∪i Si , then there is no chance of picking ω. If ω is in the
union, then
m
X X 1 |Sj | s(ω)
P (ω) = P (ω|J = j)P (J = j) = Pm = (6.31)
j=1 j:ω∈Sj |Sj | i=1 |Si | s
Fishman does a pretty complete study of the variance for this importance sampling algorithm.
Here we will just note the following. The variance will not depend on n. So if n, r are huge
but r/n is small then the importance sampling algorithm will certainly do better than the
simple Monte Carlo of just sampling uniformly from Ω.

Stop - Mon, 2/15

6.2 Self-normalized importance sampling

In many problems the density we want to sample from is only known up to an unknown
constant, i.e., p(x) = cp p0 (x) where p0 (x) is known, but cp is not. Of course cp is determined
by the requirement that the integral of p(x) be 1, but we may not be able to compute the
integral. Suppose we are in this situation and we have another density q(x) that we can
sample from. It is also possible that q(x) is only known up to a constant, i.e., q(x) = cq q0 (x)
were q0 (x) is known but cq is not known.

The idea of self-normalizing is based on

Z Z
f (x)p(x)
f (x)p(x)dx = q(x)dx (6.32)
q(x)
f (x)p(x)
q(x)dx
R
q(x)
= R p(x) (6.33)
q(x)
q(x)dx
f (x)p0 (x)
q(x)dx
R
q0 (x)
= R p0 (x) (6.34)
q0 (x)
q(x)dx
f (x)w(x)q(x)dx
R
= (6.35)
w(x)q(x)dx
R

Eq [f (x)w(x)]
= (6.36)
Eq [w(x)]

where w(x) = p0 (x)/q0 (x) is a known function.

The self-normalized importance sampling algorithm is as follows. We generate samples

~ 1, · · · , X
X ~ n according to the distribution q(x). Our estimator for µ = R f (x)p(x)dx is
Pn
i=1
~ i )w(X
f (X ~ i)
µ̂ = (6.37)
Pn
w(X ~ i)
i=1

Theorem 3 hypotheses The estimator µ̂ converges to µ with probability 1.

Proof: Note that

1 Pn ~ ~ 1 Pn ~ i ) cq p(X~ i )
f (X 1 Pn ~ i ) p(X~ i )
f (X
n i=1 f (Xi )w(Xi ) n i=1 cp q(X~ )
i n i=1 q(X~ )
i
µ̂ = = = (6.38)
1 Pn ~
i=1 w(Xi )
1 Pn ~ i)
cq p(X 1 Pn ~ i)
p(X
n n i=1 cp q(X
~ i) n i=1 q(X
~ i)

Now apply the strong law of large number to the numerator and denominator separately.
~
Remember that X is sampled from q(x), so the numerator converges to f (x)p(x)dx = µ.
R

The denominator converges to p(x)dx = 1. QED

It should be noted that the expected value of µ̂ is not exactly µ. The estimator is slightly
biased.

To find a confidence interval for self normalized importance sampling we need to compute the
variance of µ̂. We already did this using the delta method. In µ̂ the numerator is the sample
mean for f w and the denominator is the sample mean for w. Plugging this into our result
from the delta method we find that an estimator for the variance of µ̂ is
Pn
i=1
~ i )2 (f (X
w(X ~ i ) − µ̂)2
(6.39)
Pn
( i=1 w(X ~ i ))2

~ i )/ Pnj=1 w(X
If we let wi = w(X ~ j ), then this is just
n
~ i ) − µ̂)2
wi2 (f (X
X
(6.40)
i=1

In ordinary Monte Carlo all of our samples contribute with equal weight. In importance
sampling we give them different weights. The total weight of the weights is ni=1 wi . It is
P

possible that most of this weight is concentrated in a just a few weights. If this happens we
expect the important sampling Monte Carlo will have large error. We might hope that when
this happens our estimate of the variance will be large and so this will alert us to the
problem. However, our estimate of the variance σq uses the same set of weights, so it may not
be accruate when this happens.

Another way to check if we are getting grossly imbalanced weights is to compute an effective
sample size. Consider the following toy problem. Let w1 , · · · , wn be constants (not random).
Let Z1 , · · · , Zn be i.i.d. random variables with common variance σ 2 . An estimator for the
mean of the Zi is
Pn
w i Zi
µ̂ = Pi=1
n (6.41)
i=1 wi

The variance of µ̂ is
Pn
2 w2
var(µ̂) = σ Pni=1 i 2 (6.42)
( i=1 wi )
Now define the number of effective samples ne to be the number of independent samples we
would need to get the same variance if we did not use the weights. In this case the variance is
σ 2 /ne . So
Pn
( wi )2
ne = Pi=1
n 2
(6.43)
i=1 wi

As an example, suppose that k of the wi equal 1 and the rest are zero. The a trivial
calculation shows ne = k.

Note that this definition of the effective sample size only involves the weights. It does not take
f into account. One can also define an effective sample size that depends on f . See Owen.

6.3 Variance minimization and exponential tilting

Rather than consider all possbile choices for the sampling distribution q(x), one strategy is to
restrict the set of q(x) we consider to some family of distributions and minimize the variance
σq over this family. So we assume we have a family of distributions p(x, θ) where θ
parameterizes the family. Here x is multidimensional and so is θ, but the dimensions need not
be the same. We let θ0 be the parameter value that corresponds to our nominal distribution.
So p(x) = p(x, θ0 ). The weighting function is
p(x, θ0 )
w(x, θ) = (6.44)
p(x, θ)
The importance sampling algorithm is based on
Z Z
~ =
µ = Eθ0 [f (X)] f (x)p(x, θ0 )dx = ~
f (x)w(x, θ)p(x, θ)dx = Eθ [f (X)w( ~ θ)]
X, (6.45)

The variance for this is

Z
2
σ (θ) = f (x)2 w(x, θ)2 p(x, θ)dx − µ2 (6.46)
We want to minimize this as a function of θ. One approach would be for each different value
of θ we run a MC simulation where we sample from p(x, θ) and use these samples to estimate
σ 2 (θ). This is quite expensive since it involves a simulation for every value of θ we need to
consider.

A faster approach to search for the best θ is the following. Rewrite the variance as
Z
σ 2 (θ) = ~ 2 w(X,
f (x)2 w(x, θ)p(x, θ0 )dx − µ2 = Eθ0 [f (X) ~ θ)] − µ2 (6.47)

~1, X
Now we run a single MC simulation where we sample from p(x, θ0 ). Let X ~ 2 , · · · , X~m be
the samples. We then use the following to estimate σθ :
m
2 1 X
σ0ˆ(θ) = ~ i )2 w(X
f (X ~ i , θ) − µ2 (6.48)
m i=1

The subscript 0 on the estimator is to remind us that we used a sample from p(x, θ0 ) rather
than p(x, θ) in the estimation. We then use our favorite numerical optimization method for
minimizing a function of several variables to find the minimum of this as a function of θ. Let
θ∗ be the optimal value.
~ according
Now we return to the original problem of estimating µ. We generate samples of X
to the distribution p(x, θ∗ ). We then let
n
1X ~ i )w(X
~ i , θ∗ )
µ̂ = f (X (6.49)
n i=1

The variance of this estimator is σθ2∗ /n. Our estimator for the variance σθ2∗ is
m
ˆ ∗) = 1 X ~ i )2 w(X
~ i , θ)2
σ 2 (θ f (X (6.50)
m i=1

The above algorithm can fail completely if the distribution p(x, θ0 ) is too far from a good
sampling distribution. We illustrate this with an example.

Example: We return to an earlier example. Z is a standard normal RV and we want to

compute P (Z > 4). We take for our family the normal distributions with variance 1 and
mean θ. So
1 1
p(x, θ) = √ exp(− (x − θ)2 ) (6.51)
2π 2
and the nominal density p(x) is p(x, 0).

MORE MORE MORE MORE

We can fix the problem above as follows. Instead of doing our single MC run by sampling
from p(x, θ0 ), we sample from p(x, θr ) that θr , our “reference” θ, is our best guess for a good
choice of θ. Rewrite the variance as
Z
2
σ (θ) = f (x)2 w(x, θ)2 p(x, θ)dx − µ2 (6.52)
Z
p(x, θ0 )2
= f (x)2 p(x, θr )dx − µ2 (6.53)
p(x, θ)p(x, θr )
~ 1 , · · · , X~n from p(x, θr ). Our estimator for the variance σ(θ)2 is
We then generate samples X
then

1X n ~ i )2 p(X
f (X ~ i , θ0 ) 2
σr2ˆ(θ) = − µ̂2 (6.54)
~ ~
n i=1 p(Xi , θ)p(Xi , θr )

For several well-known classes of distributions the ratio p(x)/q(x) takes a simple form. An
exponential family is a family such that
p(x; θ) = exp((η(θ), T (x)) − A(x) − C(θ)) (6.55)
for functions η(θ), T (X), A(x), C(θ). The following are examples. A multivariate normal with
a fixed covariance matrix is an exponential distribution where the means are the parameters.
The possion distribution is an exponential family where the parameter is the usual
(one-dimensional) λ. If we fixed the number of trials, then the binominal distribution is an
exponential family with parameter p. The gamma distribution is also an exponential family.
In many cases the weight function just reduces to exp((θ, x)). Even if p(x) does not come
from an exponential family we can still look for a proposal density of the form
1
q(x) = exp((θ, x))p(x) (6.56)
Z(θ)
where Z(θ) is just the normalizing constant. Importance sampling in this case is often called
exponential tilting.

Example Comment on network example.

Stop - Wed, Feb 17

6.4 Processes

Now suppose that instead of a random vector we have a stochastic process X1 , X2 , X3 , · · ·. We

will let X stand for X1 , X2 , X3 , · · ·. We want to estimate the mean of a function of the process
µ = f (X). It doesn’t make sense to try to give a probability density for the full infinite
process. Instead we specify it through conditional densities:
p1 (x1 ), p2 (x2 |x1 ), p3 (x3 |x1 , x2 ), · · · , pn (xn |x1 , x2 , · · · , xn−1 ), · · ·. Note that it is immediate from
the definition of conditional density that

p(x1 , x2 , · · · , xn ) = pn (xn |x1 , x2 , · · · , xn−1 )pn−1 (xn−1 |x1 , x2 , · · · , xn−2 ) (6.57)

· · · p3 (x3 |x1 , x2 )p2 (x2 |x1 )p1 (x1 ) (6.58)

We specify the proposal density in the same way:

q(x1 , x2 , · · · , xn ) = qn (xn |x1 , x2 , · · · , xn−1 )qn−1 (xn−1 |x1 , x2 , · · · , xn−2 ) (6.59)

· · · q3 (x3 |x1 , x2 )q2 (x2 |x1 )q1 (x1 ) (6.60)

So the likehood function is

Y pn (xn |x1 , x2 , · · · , xn−1 )
w(x) = (6.61)
n≥1 qn (xn |x1 , x2 , · · · , xn−1 )

An infinite product raises convergence questions. But in applications f typically either

depends on a fixed, finite number of the Xi or f depends on a finite but random number of
the Xi . So suppose that f only depends on X1 , · · · , XM where M may be random. To be
more precise we assume that there is a random variable M taking values in the non-negative
integers such that if we are given that M = m, then f (X1 , X2 , · · ·) only depends on
X1 , · · · , Xm . So we can write
∞
X
f (X1 , X2 , · · ·) = 1M =m fm (X1 , · · · , Xm ) (6.62)
m=1

We also assume that M is a stopping time. This means that the event M = m only depends
on X1 , · · · , Xm . Now we define
∞ m
X Y pn (xn |x1 , x2 , · · · , xn−1 )
w(x) = 1M =m (x1 , · · · , xm ) (6.63)
m=1 n=1 qn (xn |x1 , x2 , · · · , xn−1 )

Example - random walk exit: This follows an example in Owens. Let ξi be an i.i.d.
sequence of random variables. Let X0 = 0 and
n
X
Xn = ξi (6.64)
i=1
In probability this is called a random walk. It starts at 0. Now fix an interval (a, b) with
0 ∈ (a, b). We run the run until it exits this interval and then ask whether it exited to the
right or the left. So we let

M = inf{n : Xn ≥ b or Xn < a} (6.65)

So the stopping condition is XM ≥ b or XM ≤ a. Then we want to compute µ = P (XM ≥ b).

We are particularily interested in the case where Eξi < 0. So the walk drifts to the left on
average and the probability µ will be small if b is relatively large.

We take the walk to have steps with a normal distribution with variance 1 and mean −1. So
the walk drifts to the left. We take (a, b) = (−5, 10). We run the walk until is exits this
interval and want to compute the probability it exits to the right. This is a very small
probability. So the ξi are independent normal random variables with variance 1 and mean −1.

The conditional densities that determine the nominal distribution are given by
1 1
p(xn |x1 , · · · , xn−1 ) = p(xn |xn−1 ) = fξn (xn − xn−1 ) = √ exp(− (xn − xn−1 − θ0 )2 ) (6.66)
2π 2
In our example we take θ0 = −1. MORE Explain how we sample this A Monte Carlo
simulation with no importance sampling with 106 samples produced no samples that exited to
the right. So it gives the useless estimate p̂ = 0.

For the sampling distribution we take a random walk whose step distribution is normal with
variance 1 and mean θ. So
1 1
q(xn |x1 , · · · , xn−1 ) = q(xn |xn−1 ) = √ exp(− (xn − xn−1 − θ)2 ) (6.67)
2π 2
The weight factors are then
1 1
wn (x1 , · · · , xn ) = exp((θ0 − θ)(xn − xn−1 ) − θ02 + θ2 ) (6.68)
2 2

With no idea of how to choose θ, we try θ = 0 and find with 106 samples

p = 6.74 × 10−10 ± 0.33 × 10−10 (6.69)

The confidence intervals is rather large, so we do a longer run with 107 samples and find

p = 6.53 × 10−10 ± 0.098 × 10−10 (6.70)

The choice of θ = 0 is far from optimal. More on this in a homework problem.

Rarefied Gas Dynamics - DSMC Course
No ratings yet
Rarefied Gas Dynamics - DSMC Course
50 pages
Variance Reduction Techniques in Monte Carlo
No ratings yet
Variance Reduction Techniques in Monte Carlo
8 pages
Monte Carlo & Markov Chains Lecture
No ratings yet
Monte Carlo & Markov Chains Lecture
18 pages
Slides No Break
No ratings yet
Slides No Break
77 pages
On The Markov Chain Monte Carlo (MCMC) Method: Rajeeva L Karandikar
No ratings yet
On The Markov Chain Monte Carlo (MCMC) Method: Rajeeva L Karandikar
24 pages
Sampling
No ratings yet
Sampling
100 pages
MCMC Brief
100% (1)
MCMC Brief
69 pages
Statistics 202C Study Guide: Part I: Sampling Basic Unstructured Distributions and Monte Carlo Basics
No ratings yet
Statistics 202C Study Guide: Part I: Sampling Basic Unstructured Distributions and Monte Carlo Basics
14 pages
Monte Carlo Approximation Basics
No ratings yet
Monte Carlo Approximation Basics
4 pages
Book Prob 2012 Probability and Statistics
No ratings yet
Book Prob 2012 Probability and Statistics
5 pages
Probability & Random Variables Guide
No ratings yet
Probability & Random Variables Guide
3 pages
Statistical Computing
No ratings yet
Statistical Computing
9 pages
Probability & Statistics Primer
No ratings yet
Probability & Statistics Primer
19 pages
Monte Carlo Integration & Variance Reduction
No ratings yet
Monte Carlo Integration & Variance Reduction
54 pages
Basic Probability & Statistics Review
No ratings yet
Basic Probability & Statistics Review
20 pages
Approximate Inference Via Variational Sampling
No ratings yet
Approximate Inference Via Variational Sampling
13 pages
Econometrics Homework Solutions
No ratings yet
Econometrics Homework Solutions
11 pages
Monte Carlo Integration With Subtraction
No ratings yet
Monte Carlo Integration With Subtraction
19 pages
Probability Basics for Students
No ratings yet
Probability Basics for Students
56 pages
MCMC Sampling - Class 2025
No ratings yet
MCMC Sampling - Class 2025
101 pages
Monte Carlo Integration
No ratings yet
Monte Carlo Integration
38 pages
Markov Chain Monte Carlo For Computing Rare Event Probabilities For A Heavy Tailed Random Walk
No ratings yet
Markov Chain Monte Carlo For Computing Rare Event Probabilities For A Heavy Tailed Random Walk
18 pages
Empirical Finance7
No ratings yet
Empirical Finance7
30 pages
CPSC 440: Advanced Machine Learning: Monte Carlo Methods
No ratings yet
CPSC 440: Advanced Machine Learning: Monte Carlo Methods
30 pages
Chapter 1 - Basic Principles of Monte Carlo Methods
No ratings yet
Chapter 1 - Basic Principles of Monte Carlo Methods
25 pages
Project
No ratings yet
Project
44 pages
College Statistics
No ratings yet
College Statistics
244 pages
CPSC 540: Machine Learning: Monte Carlo Methods
No ratings yet
CPSC 540: Machine Learning: Monte Carlo Methods
32 pages
Importance Sampling Techniques Overview
No ratings yet
Importance Sampling Techniques Overview
84 pages
Hitchhiker S Guide To Probability
No ratings yet
Hitchhiker S Guide To Probability
6 pages
Statistics
No ratings yet
Statistics
4 pages
MC RiskManage
No ratings yet
MC RiskManage
19 pages
Stat513 l11
No ratings yet
Stat513 l11
17 pages
2 Statistical Definitions: 2.1 Probability Density Function
No ratings yet
2 Statistical Definitions: 2.1 Probability Density Function
9 pages
Annotated L19
No ratings yet
Annotated L19
53 pages
Statistics Course Review Notes
No ratings yet
Statistics Course Review Notes
20 pages
Final Soln
No ratings yet
Final Soln
5 pages
Odf-SC1 05 MonteCarlo
No ratings yet
Odf-SC1 05 MonteCarlo
22 pages
Monte Carlo Techniques for Analysts
No ratings yet
Monte Carlo Techniques for Analysts
6 pages
Monte Carlo Integration Lecture
No ratings yet
Monte Carlo Integration Lecture
8 pages
Siggraph03
No ratings yet
Siggraph03
24 pages
2012 Rare EventSimulation
No ratings yet
2012 Rare EventSimulation
30 pages
Pseph QuantileEstimation
No ratings yet
Pseph QuantileEstimation
21 pages
323 Egec
No ratings yet
323 Egec
18 pages
Review
No ratings yet
Review
6 pages
Importance Sampling Via Simulacrum: Alan E. Wessel
No ratings yet
Importance Sampling Via Simulacrum: Alan E. Wessel
13 pages
Probability Exam Cheat Sheet
No ratings yet
Probability Exam Cheat Sheet
2 pages
R300 Advanced Econometrics Methods Lecture Slides
No ratings yet
R300 Advanced Econometrics Methods Lecture Slides
362 pages
Lecture 1
No ratings yet
Lecture 1
10 pages
Conditional Distributions
No ratings yet
Conditional Distributions
5 pages
Macroeconomics Computational Methods
No ratings yet
Macroeconomics Computational Methods
72 pages
Simulating Discrete Distributions
No ratings yet
Simulating Discrete Distributions
3 pages
Continuous Random Variables
No ratings yet
Continuous Random Variables
28 pages
Iterative and Non-Iterative Simulation Algorithms
No ratings yet
Iterative and Non-Iterative Simulation Algorithms
6 pages
17 Montecarlo
No ratings yet
17 Montecarlo
35 pages
Part 33421a
No ratings yet
Part 33421a
79 pages
Tungban Probabilistic ML 2021 - 04 - Sampling
No ratings yet
Tungban Probabilistic ML 2021 - 04 - Sampling
24 pages
Low Variance Sampling Techniques For Particle Filter
No ratings yet
Low Variance Sampling Techniques For Particle Filter
7 pages
Why Practitioners Should Use Bayesian
No ratings yet
Why Practitioners Should Use Bayesian
18 pages
2001 Pollard Paris-Lectures On Le Cam
No ratings yet
2001 Pollard Paris-Lectures On Le Cam
2 pages
Unified Inference For Variational Bayesian Linear State Space Model
No ratings yet
Unified Inference For Variational Bayesian Linear State Space Model
8 pages
Bayesianismvs Frequentism Instatisticalinference: Motivation: The Discovery of The Higgs Particle
No ratings yet
Bayesianismvs Frequentism Instatisticalinference: Motivation: The Discovery of The Higgs Particle
24 pages
IRC Lecture8 2019 Copula
No ratings yet
IRC Lecture8 2019 Copula
37 pages
Continuous-Time Asset Pricing Inference
No ratings yet
Continuous-Time Asset Pricing Inference
46 pages
List II 2015
No ratings yet
List II 2015
110 pages
Portfolios For Long Term Investors
No ratings yet
Portfolios For Long Term Investors
42 pages
Bayesian Estimation of State Space Models
No ratings yet
Bayesian Estimation of State Space Models
49 pages
Measure Change for Probability Estimation
No ratings yet
Measure Change for Probability Estimation
2 pages
LDPC Codes
No ratings yet
LDPC Codes
3 pages
Robust Bayesian Analysis For Econometrics
No ratings yet
Robust Bayesian Analysis For Econometrics
41 pages
Ecta - Higher Order Properties of GMM and Generalized - 2004
No ratings yet
Ecta - Higher Order Properties of GMM and Generalized - 2004
37 pages
Blind Source Separation
No ratings yet
Blind Source Separation
64 pages
The Law of Sines and Cosine
100% (1)
The Law of Sines and Cosine
3 pages
Mat3100 Lecture Notes 6 2018 PDF
No ratings yet
Mat3100 Lecture Notes 6 2018 PDF
7 pages
EC1307
No ratings yet
EC1307
12 pages
Applied Time Series Analysis Guide
No ratings yet
Applied Time Series Analysis Guide
46 pages
Standard Univariate Continuous Probability Distributions
No ratings yet
Standard Univariate Continuous Probability Distributions
11 pages
Right-Invariant Poisson-Nijenhuis Structure On Lie Groupoids
No ratings yet
Right-Invariant Poisson-Nijenhuis Structure On Lie Groupoids
11 pages
(Lecture Notes) Romyar Sharifi - Iwasawa Theory (2017)
No ratings yet
(Lecture Notes) Romyar Sharifi - Iwasawa Theory (2017)
155 pages
Sme Discrete Ramdom Variable
No ratings yet
Sme Discrete Ramdom Variable
73 pages
10 Formulas
No ratings yet
10 Formulas
11 pages
Newton's Method for Root Finding
No ratings yet
Newton's Method for Root Finding
5 pages
JNTUH Mathematics-II Course Notes
No ratings yet
JNTUH Mathematics-II Course Notes
90 pages
Edrolo VCE Mathematical Methods Units 1&2 (2025) Exam Preparation Mini Mock Exam B
No ratings yet
Edrolo VCE Mathematical Methods Units 1&2 (2025) Exam Preparation Mini Mock Exam B
14 pages
MathsPaper2QuestionCalc JUNE2014
No ratings yet
MathsPaper2QuestionCalc JUNE2014
28 pages
Definite Integration TN
No ratings yet
Definite Integration TN
12 pages
Partial Differential Equations PDF
100% (1)
Partial Differential Equations PDF
126 pages
Error Function Reference Guide
No ratings yet
Error Function Reference Guide
2 pages
(CETS Math) Word Problem Drills 1 & 2
No ratings yet
(CETS Math) Word Problem Drills 1 & 2
4 pages
7 LP Simplex Maximization
No ratings yet
7 LP Simplex Maximization
18 pages
Functional Analysis for Math Students
No ratings yet
Functional Analysis for Math Students
12 pages
2025 SHS2 End of Semester Exams Mathematics Elective 1
No ratings yet
2025 SHS2 End of Semester Exams Mathematics Elective 1
4 pages
MTL712 Finite Difference Methods Hyperbolic PDEs 240814 130646
No ratings yet
MTL712 Finite Difference Methods Hyperbolic PDEs 240814 130646
2 pages
Abstract Algebra Key Terms
No ratings yet
Abstract Algebra Key Terms
4 pages
Hackerrank
No ratings yet
Hackerrank
4 pages
Linear Programming Fundamentals 2Nd Year - Ensia: Exercice C1
No ratings yet
Linear Programming Fundamentals 2Nd Year - Ensia: Exercice C1
2 pages
8th Math Curriculum Map
100% (1)
8th Math Curriculum Map
7 pages
Pdes Mod 3 Summary
No ratings yet
Pdes Mod 3 Summary
20 pages
Functions of Random Variables Overview
No ratings yet
Functions of Random Variables Overview
34 pages
3D Geometry: Plane Equations & Distances
No ratings yet
3D Geometry: Plane Equations & Distances
2 pages

Importance Sampling

Uploaded by

Importance Sampling

Uploaded by

Chapter 6

6.1 The basics

Of course this is doable only if f (x)p(x)/q(x) is computable.

Proof: Straightforward. QED

Proof: Note that c = |f (x)|q(x)dx.

Example We want to compute the integral

We take the sampling distribution to be

Example: Let Z have a standard normal distribution. We want to compute P (Z ≥ 4). We

Our nominal distribution is

Example - union counting problem (from Fishman)

f (ω) = r1ω∈U (6.25)

s(ω) = |{j : ω ∈ Sj }| (6.26)

Stop - Mon, 2/15

6.2 Self-normalized importance sampling

The idea of self-normalizing is based on

where w(x) = p0 (x)/q0 (x) is a known function.

The self-normalized importance sampling algorithm is as follows. We generate samples

Theorem 3 hypotheses The estimator µ̂ converges to µ with probability 1.

Proof: Note that

The denominator converges to p(x)dx = 1. QED

6.3 Variance minimization and exponential tilting

The variance for this is

Example: We return to an earlier example. Z is a standard normal RV and we want to

MORE MORE MORE MORE

Example Comment on network example.

Stop - Wed, Feb 17

Now suppose that instead of a random vector we have a stochastic process X1 , X2 , X3 , · · ·. We

p(x1 , x2 , · · · , xn ) = pn (xn |x1 , x2 , · · · , xn−1 )pn−1 (xn−1 |x1 , x2 , · · · , xn−2 ) (6.57)

We specify the proposal density in the same way:

q(x1 , x2 , · · · , xn ) = qn (xn |x1 , x2 , · · · , xn−1 )qn−1 (xn−1 |x1 , x2 , · · · , xn−2 ) (6.59)

So the likehood function is

An infinite product raises convergence questions. But in applications f typically either

M = inf{n : Xn ≥ b or Xn < a} (6.65)

So the stopping condition is XM ≥ b or XM ≤ a. Then we want to compute µ = P (XM ≥ b).

p = 6.74 × 10−10 ± 0.33 × 10−10 (6.69)

p = 6.53 × 10−10 ± 0.098 × 10−10 (6.70)

The choice of θ = 0 is far from optimal. More on this in a homework problem.

You might also like