An introduction to
simulation-based inference
51st SLAC Summer Institute
August 16, 2023
Gilles Louppe
[email protected]
1 / 36
2 / 36
v_x = v cos(α),  v_y = v sin(α),
dx/dt = v_x,  dy/dt = v_y,  dv_y/dt = −G.
3 / 36
import numpy as np
from numpy import random

G = 9.81  # acceleration due to gravity (m/s^2)

def simulate(v, alpha, dt=0.001):
    v_x = v * np.cos(alpha)            # x velocity (m/s)
    v_y = v * np.sin(alpha)            # y velocity (m/s)
    y = 1.1 + 0.3 * random.normal()    # noisy initial height (m)
    x = 0.0
    while y > 0:                       # simulate until ball hits floor
        v_y += dt * -G                 # acceleration due to gravity
        x += dt * v_x
        y += dt * v_y
    return x + 0.25 * random.normal()  # noisy measurement of the landing position (m)
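For example, repeated calls with the same parameters return different landing positions, since both the initial height and the final measurement are noisy:

for _ in range(3):
    print(simulate(v=10.0, alpha=np.pi / 4))  # a different landing point each call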
4 / 36
5 / 36
What parameter values θ are the most plausible?
6 / 36
7 / 36
Outline
1. Simulation-based inference
2. Algorithms
Neural ratio estimation
Neural posterior estimation
Neural score estimation
3. Diagnostics
8 / 36
Simulation-based inference
8 / 36
Scientific simulators
9 / 36
θ, z, x ∼ p(θ, z, x)
10 / 36
θ, z ∼ p(θ, z∣x)
11 / 36
12 / 36
p(x∣θ) = ∭ p(z_p∣θ) p(z_s∣z_p) p(z_d∣z_s) p(x∣z_d) dz_p dz_s dz_d
yikes!
13 / 36
Bayesian inference
Start with
a simulator that can generate N samples x_i ∼ p(x_i∣θ_i),
a prior model p(θ),
observed data x_obs ∼ p(x_obs∣θ_true).
Then, estimate the posterior
p(θ∣x_obs) = p(x_obs∣θ) p(θ) / p(x_obs).
14 / 36
15 / 36
Algorithms
15 / 36
―
Credits: Cranmer, Brehmer and Louppe, 2020. 16 / 36
Approximate Bayesian Computation (ABC)
Issues:
How should one choose x′? The tolerance ϵ? The distance ∣∣ ⋅ ∣∣?
No tractable posterior.
Need to run new simulations for new data or new prior.
―
Credits: Johann Brehmer. 17 / 36
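A minimal rejection-ABC sketch for the toy projectile simulator above; the prior, the tolerance eps, the observation x_obs, and the absolute-difference distance are illustrative assumptions, not prescriptions:

def abc_rejection(x_obs, prior_sample, eps=0.5, n_trials=10_000):
    accepted = []
    for _ in range(n_trials):
        v, alpha = prior_sample()        # theta' ~ p(theta)
        x = simulate(v, alpha)           # x' ~ p(x | theta')
        if abs(x - x_obs) < eps:         # keep theta' if ||x' - x_obs|| < eps
            accepted.append((v, alpha))
    return accepted                      # approximate posterior samples

# e.g. with a uniform prior over speed and angle
posterior_samples = abc_rejection(
    x_obs=9.5,
    prior_sample=lambda: (random.uniform(5, 15), random.uniform(0, np.pi / 2)),
)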
―
Credits: Cranmer, Brehmer and Louppe, 2020. 18 / 36
Neural ratio estimation
The likelihood-to-evidence ratio r(x∣θ) = p(x∣θ)/p(x) = p(x, θ)/(p(x)p(θ)) can be learned, even if neither the likelihood nor the evidence can be evaluated:
[Figure: a classifier is trained to distinguish joint samples x, θ ∼ p(x, θ) from independent samples x, θ ∼ p(x)p(θ); its output yields the estimator r̂(x∣θ).]
―
Credits: Cranmer et al, 2015; Hermans et al, 2020. 19 / 36
The solution d found after training approximates the optimal classifier
d(x, θ) ≈ d*(x, θ) = p(x, θ) / (p(x, θ) + p(x)p(θ)).
Therefore,
r(x∣θ) = p(x∣θ)/p(x) = p(x, θ)/(p(x)p(θ)) ≈ d(x, θ)/(1 − d(x, θ)) = r̂(x∣θ).
20 / 36
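A minimal training sketch of such a classifier in PyTorch; the network architecture, the batch shapes, and the in-batch shuffling used to approximate p(θ)p(x) are illustrative assumptions:

import torch
import torch.nn as nn

class RatioClassifier(nn.Module):
    def __init__(self, dim_theta, dim_x, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_theta + dim_x, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, theta, x):
        return self.net(torch.cat([theta, x], dim=-1))  # logit of d(x, theta)

def nre_loss(classifier, theta, x):
    # pairs from the joint p(theta, x) are labelled 1; shuffling theta within
    # the batch approximates samples from p(theta)p(x), labelled 0
    logit_joint = classifier(theta, x)
    logit_marginal = classifier(theta[torch.randperm(len(theta))], x)
    bce = nn.functional.binary_cross_entropy_with_logits
    return (bce(logit_joint, torch.ones_like(logit_joint))
            + bce(logit_marginal, torch.zeros_like(logit_marginal)))

# after training, log r̂(x|theta) is simply the classifier logit, since d/(1 - d) = exp(logit)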
p(θ∣x) ≈ r̂(x∣θ) p(θ)
21 / 36
Constraining dark matter with stellar streams
Interaction of Pal 5 with two …
―
Image credits: C. Bickel/Science; D. Erkal. 22 / 36
―
Credits: Hermans et al, 2021. 23 / 36
Preliminary results for GD-1 suggest a preference for CDM over WDM.
24 / 36
Neural Posterior Estimation
min_{q_ϕ}  E_{p(x)} [ KL( p(θ∣x) ∣∣ q_ϕ(θ∣x) ) ]
25 / 36
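Since KL(p(θ∣x) ∣∣ q_ϕ(θ∣x)) = E_{p(θ∣x)}[log p(θ∣x) − log q_ϕ(θ∣x)] and the first term does not depend on ϕ, minimizing this objective amounts to maximizing E_{p(θ,x)}[log q_ϕ(θ∣x)], which can be estimated from simulated pairs (θ, x) alone. A minimal training-step sketch, where the conditional density estimator q (with a log_prob method), the optimizer, and the tensors theta, x are all assumed:

loss = -q.log_prob(theta, x).mean()  # Monte Carlo estimate of -E[log q_phi(theta | x)]
loss.backward()                      # gradient with respect to phi
optimizer.step()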
Normalizing flows
A normalizing flow is a sequence of invertible transformations f_k that map a simple distribution p_0 to a more complex distribution p_K.
By the change of variables formula, the log-likelihood of a sample x is given by
log p(x) = log p(z_0) − ∑_{k=1}^{K} log ∣det J_{f_k}(z_{k−1})∣.
26 / 36
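A one-dimensional sketch of this formula, assuming K hypothetical invertible affine steps f_k(z) = a_k z + b_k with a_k > 0:

import torch

base = torch.distributions.Normal(0.0, 1.0)             # p(z_0)
a = torch.tensor([1.5, 0.7, 2.0])                        # scales of f_1 ... f_K
b = torch.tensor([0.1, -0.3, 0.5])                       # shifts of f_1 ... f_K

def log_prob(x):
    z, log_det = x, 0.0
    for k in reversed(range(len(a))):                    # invert x = f_K(... f_1(z_0) ...)
        z = (z - b[k]) / a[k]
        log_det = log_det + torch.log(torch.abs(a[k]))   # accumulates sum_k log|det J_{f_k}|
    return base.log_prob(z) - log_det                    # log p(x) = log p(z_0) - sum_k log|det J_{f_k}(z_{k-1})|

In neural posterior estimation, the transformations are additionally conditioned on the observation x, so that the flow models q_ϕ(θ∣x) directly.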
Exoplanet atmosphere characterization
―
Credits: NASA/JPL-Caltech, 2010. 27 / 36
―
Credits: Vasist et al, 2023. 28 / 36
Diagnostics
28 / 36
p̂(θ∣x) = sbi(p(x∣θ), p(θ), x)
We must make sure that our approximate simulation-based inference algorithms can (at least) produce faithful inferences on the (expected) observations.
How do we know this is good enough?
29 / 36
Mode convergence
The maximum a posteriori estimate converges towards the nominal value θ* for an increasing number of independent and identically distributed observables x_i ∼ p(x∣θ*):
lim_{N→∞} arg max_θ p(θ∣{x_i}_{i=1}^N) = lim_{N→∞} arg max_θ p(θ) ∏_{x_i} r(x_i∣θ) = θ*.
―
Credits: Brehmer et al, 2019. 30 / 36
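A grid-based sketch of this pooling for a scalar parameter; the helpers log_r(x, thetas) (evaluating log r̂(x∣θ) on a grid), prior(thetas), and the list observations are all assumed:

import numpy as np

thetas = np.linspace(0.0, 1.0, 1000)        # parameter grid (assumed 1D)
log_post = np.log(prior(thetas))            # log p(theta), up to a constant
for x_i in observations:                    # i.i.d. observables x_i ~ p(x | theta*)
    log_post += log_r(x_i, thetas)          # add log r̂(x_i | theta)
theta_map = thetas[np.argmax(log_post)]     # approaches theta* as the number of observables grows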
Coverage diagnostic
For x, θ ∼ p(x, θ), compute the 1 − α credible interval based on p̂(θ∣x).
If the fraction of samples for which θ is contained within the interval is larger than the nominal coverage probability 1 − α, then the approximate posterior p̂(θ∣x) has coverage.
―
Credits: Hermans et al, 2021; Siddharth Mishra-Sharma, 2021. 31 / 36
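A sketch of this diagnostic for a scalar parameter, assuming hypothetical helpers prior_sample(), simulate(theta), and posterior_sample(x, n) that draws from p̂(θ∣x); the central credible interval is one possible choice of interval:

import numpy as np

def empirical_coverage(prior_sample, simulate, posterior_sample, alpha=0.05, n_trials=1000):
    hits = 0
    for _ in range(n_trials):
        theta = prior_sample()                                     # theta ~ p(theta)
        x = simulate(theta)                                        # x ~ p(x | theta)
        samples = posterior_sample(x, n=1000)                      # draws from p̂(theta | x)
        lo, hi = np.quantile(samples, [alpha / 2, 1 - alpha / 2])  # central 1 - alpha credible interval
        hits += lo <= theta <= hi
    return hits / n_trials                                         # should be >= 1 - alpha if conservative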
―
Credits: Hermans et al, 2021. 32 / 36
What if diagnostics fail?
33 / 36
Balanced NRE
Enforce neural ratio estimation to be conservative by using binary classifiers d̂ that are balanced, i.e. such that
E_{p(θ,x)}[ d̂(θ, x) ] = E_{p(θ)p(x)}[ 1 − d̂(θ, x) ].
―
Credits: Delaunoy et al, 2022. 34 / 36
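One way to encourage this balance, sketched on top of the NRE loss above (reusing its classifier and imports); the squared penalty and the weight lam are assumptions of the sketch:

def bnre_loss(classifier, theta, x, lam=100.0):
    logit_joint = classifier(theta, x)
    logit_marginal = classifier(theta[torch.randperm(len(theta))], x)
    bce = nn.functional.binary_cross_entropy_with_logits
    loss = (bce(logit_joint, torch.ones_like(logit_joint))
            + bce(logit_marginal, torch.zeros_like(logit_marginal)))
    # the balance condition above is equivalent to E_joint[d] + E_marginal[d] = 1
    balance = torch.sigmoid(logit_joint).mean() + torch.sigmoid(logit_marginal).mean()
    return loss + lam * (balance - 1.0) ** 2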
―
Credits: Delaunoy et al, 2022. 35 / 36
Summary
Advances in deep learning have enabled new approaches to statistical
inference.
This is a major evolution in the statistical capabilities for science, as it enables the analysis of complex models and data without simplifying assumptions.
Inference remains approximate and requires careful validation.
Obstacles remain to be overcome, such as the curse of dimensionality and
the need for large amounts of data.
36 / 36
The end.