Differentiability and Approximation of Probability Functions under Gaussian Mixture Models
Abstract In this work, we study probability functions associated with Gaussian mixture models. Our
primary focus is on extending the use of spherical radial decomposition for multivariate Gaussian random
vectors to the context of Gaussian mixture models, which are not inherently spherical but only condition-
ally so. Specifically, the conditional probability distribution, given a random parameter of the random
vector, follows a Gaussian distribution, allowing us to apply Bayesian analysis tools to the probability
function. This assumption, together with spherical radial decomposition for Gaussian random vectors,
enables us to represent the probability function as an integral over the Euclidean sphere. Using this
representation, we establish sufficient conditions to ensure the differentiability of the probability function
and provide an integral representation of its gradient. Furthermore, leveraging the Bayesian decomposition, we approximate the probability function using random sampling over the parameter space and the
Euclidean sphere. Finally, we present numerical examples that illustrate the advantages of this approach
over classical approximations based on random vector sampling.
The first author was partially supported by ANID-Chile grant Exploración 13220097. The second and third authors were
partially supported by Centro de Modelamiento Matemático (CMM), ACE210010 and FB210005, BASAL funds for center
of excellence, as well as ANID-Chile grants MATH-AMSUD 23-MATH-17, ECOS-ANID ECOS320027, Fondecyt Regular
1220886, Fondecyt Regular 1240120, and Exploración 13220097. Additionally, the second author was supported by ANID-
Chile grants MATH-AMSUD 23-MATH-09 and Fondecyt Regular 1240335.
Gonzalo Contador
Universidad Técnica Federico Santa Marı́a, Santiago, Chile. E-mail: [Link]@[Link]
Pedro Pérez-Aros
Departamento de Ingenierı́a Matemática and Centro de Modelamiento Matemático (CNRS UMI 2807), Universidad de
Chile, Santiago, Chile. E-mail: pperez@[Link]
Emilio Vilches
Instituto de Ciencias de la Ingenierı́a, Universidad de O’Higgins, Rancagua, Chile.
Centro de Modelamiento Matemático (CNRS UMI 2807), Santiago, Chile. E-mail: [Link]@[Link]
1 Introduction
In this paper, we adopt the standard notation commonly used in variational analysis, optimization,
statistics, and probability theory. For more details and comprehensive coverage, we refer the reader to
the monographs [2, 14, 16, 20, 21].
It is known that if ξ has a (multivariate) normal distribution ξ ∼ N(µ, Σ), then ξ admits a so-called spherical radial decomposition representation, given by
ξ = µ + R L Θ,
where R follows a χ-distribution with m degrees of freedom, Θ follows a uniform distribution ζΘ over the m-dimensional unit sphere S^{m−1} := {z ∈ R^m : Σ_{i=1}^m z_i^2 = 1}, and L is the (unique) Cholesky decomposition of Σ, such that LL^⊤ = Σ.
Now, given a probability function φ(x) := P(g(x, ξ) ≤ 0), for ξ ∼ N(µ, Σ), the above decomposition allows us to rewrite φ as
φ(x) = ∫_{S^{m−1}} e(x, v) dζΘ(v),
where ζΘ is the uniform measure on S^{m−1}, and e(x, v) represents the probability of satisfying the constraint for a given x when Θ = v is fixed. Specifically,
e(x, v) = ∫_{{r ≥ 0 : g(x, µ + rLv) ≤ 0}} χ(r) dr,   (2)
where χ denotes the density of the χ-distribution with m degrees of freedom.
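To make the decomposition concrete, the following minimal NumPy sketch (our own illustration; the function and variable names are not from the paper) draws samples of ξ = µ + RLΘ and checks that they reproduce N(µ, Σ).

import numpy as np

rng = np.random.default_rng(0)

def sample_spherical_radial(mu, Sigma, size):
    # xi = mu + R * (L @ Theta) with R ~ chi_m, Theta uniform on S^{m-1}, and L L^T = Sigma
    m = len(mu)
    L = np.linalg.cholesky(Sigma)                         # lower-triangular Cholesky factor
    R = np.sqrt(rng.chisquare(df=m, size=size))           # chi_m = sqrt of chi-squared_m
    Z = rng.standard_normal((size, m))
    Theta = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # uniform direction on the sphere
    return mu + (R[:, None] * Theta) @ L.T

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
X = sample_spherical_radial(mu, Sigma, 200_000)
print(X.mean(axis=0))            # close to mu
print(np.cov(X, rowvar=False))   # close to Sigma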
A (finite) Gaussian mixture model assumes instead that the density of ξ is a convex combination of k Gaussian densities, where k ∈ N is a fixed integer, µ_1, …, µ_k are the means and Σ_1, …, Σ_k are the covariances of the k Gaussian components, and w_1, …, w_k are the mixing proportions, which are positive and sum to 1. In this way, the model can be interpreted as ξ having a multivariate normal distribution with parameters µ_i and Σ_i with probability w_i.
To obtain the infinite limit, the authors in [17] place a prior on the (one-dimensional) common means,
with group variances given Gaussian and Gamma priors, respectively, and the mixing proportion vector
w = (w1 , . . . , wk )⊤ given a Dirichlet prior. Subsequent work also incorporates uncertainty quantification
on the number of categories [11]. We will generalize this framework so that µ and Σ are selected according
to a density m over a subset of feasible means and covariance matrices, specifically,
ξ|µ, Σ ∼ N (µ, Σ) and (µ, Σ) ∼ m.
This formulation will be detailed in Section 2.3.
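The hierarchical model above can be sampled exactly as written: first draw the parameter, then draw ξ conditionally. A schematic NumPy sketch, using a hypothetical two-component discrete prior m as a stand-in for a general distribution over means and covariances:

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical prior m: component c is chosen with probability w[c]; then xi | c ~ N(mu_c, Sigma_c).
w = np.array([0.3, 0.7])
mus = [np.array([0.0, 0.0]), np.array([3.0, -1.0])]
Sigmas = [np.eye(2), np.array([[1.0, 0.4], [0.4, 0.5]])]

def sample_hierarchical(size):
    comps = rng.choice(len(w), size=size, p=w)   # draw C ~ m
    draws = np.array([rng.multivariate_normal(mus[c], Sigmas[c]) for c in comps])
    return draws, comps

X, C = sample_hierarchical(10_000)
# A continuous prior m is handled the same way: replace rng.choice by a sampler for (mu, Sigma).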
2.3 Mathematical Formulation and Spherical Radial Decomposition for Gaussian Mixture Models
Let us consider a probability space (Ω, S, P), a random vector (ξ, C) : Ω → Ξ × C ⊂ Rm+p , and a set
of conditional probability measures P := {Pc (A) = E(1A |C = c) : c ∈ C} over Ξ, where the parameter
space C ⊂ Rp is a Borel measurable set. We denote by m the measure induced by the second coordinate
of (ξ, C) over the Borel σ-algebra B(C), that is,
m(B) := P(C ∈ B) for all B ∈ B(C),   (4)
and the chance-constrained problem (1) is then rewritten as what we will refer to from now on as a Bayesian Chance Constraint (BCC) problem with confidence level p:
min f(x)   s.t.   P(g(x, ξ) ≤ 0) = ∫_C P_c(g(x, ξ) ≤ 0) dm(c) ≥ p.
We observe that the above formulation coincides with (1) when the distribution of ξ is fully specified,
i.e., when P = {P} and m = δP . In the context of an infinite Gaussian mixture model, we consider a
probability space (C, F, m) where the random variable C has probability distribution m as defined in (4).
In what follows, we consider measurable functions z : C → R^m and L : C → R^{m×m} such that
a) For each c ∈ C, conditional on the event C = c, the vector ξ ∼ N(z(c), Σ(c)), i.e.,
P_c(ξ ∈ A) := P(ξ ∈ A | C = c) = ∫_A h_c(z) dz  for all Borel sets A ⊂ R^m,
where
h_c(z) := (1/√((2π)^m det Σ(c))) exp(−(1/2)(z − z(c))^⊤ Σ^{−1}(c)(z − z(c)))
and L(c) is the Cholesky factor of Σ(c), i.e., Σ(c) := L(c)^⊤ L(c).
Hence, the function h_c can be understood as the conditional density of ξ given C = c, which allows us to define the conditional probability function as
φ_c(x) := P_c(g(x, ξ) ≤ 0) = P(g(x, ξ) ≤ 0 | C = c) = ∫_{{z : g(x,z) ≤ 0}} h_c(z) dz.   (6)
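For illustration (an affine constraint of our own choosing, not an example from the paper), if g(x, ξ) = a(x)^⊤ξ − b(x), then a(x)^⊤ξ is univariate Gaussian under P_c and (6) reduces to a normal CDF; a short sketch:

import numpy as np
from scipy.stats import norm

def phi_c_affine(a, b, z_c, Sigma_c):
    # P_c(a^T xi <= b) with xi | C = c ~ N(z(c), Sigma(c)):
    # a^T xi ~ N(a^T z(c), a^T Sigma(c) a), so the probability is a univariate normal CDF.
    mean = a @ z_c
    std = np.sqrt(a @ Sigma_c @ a)
    return norm.cdf((b - mean) / std)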
Condition a) guarantees that φc is well-defined for almost every c ∈ C (see, e.g., [4]), and from (6), it
follows that
P(g(x, ξ) ≤ 0) = ∫_C P_c(g(x, ξ) ≤ 0) dm(c) = E(φ_C(x)).
It is worth emphasizing that the probability defined in (6) can be interpreted as the probability of
satisfying the constraint, calculated after randomly selecting a Gaussian distribution according to m.
Applying the spherical radial decomposition on the event C = c for a Gaussian random variable ξ, as described in Section 2.1, we obtain the following parameterized decomposition:
ξ = z(c) + R L(c) Θ,  conditionally on C = c.
Moreover, it is clear that Θ and R are independent, with Θ uniform over the sphere S^{m−1} and R following a χ-distribution with m degrees of freedom.
The conditional probability function (6) can be expressed as
φ_c(x) = ∫_{S^{m−1}} e_c(x, v) dζΘ(v),
where e_c := e(c, ·, ·) and e : C × S^{m−1} × R^n → R ∪ {+∞} is the parametric radial probability function in (2) on the event C = c; specifically,
e_c(x, v) = ∫_{{r ≥ 0 : g(x, rL(c)v + z(c)) ≤ 0}} χ(r) dr.   (7)
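Numerically, when g is convex in its second argument and the Slater condition below holds, the feasible radii form an interval [0, ρ_c(x, v)] (see Lemma 1 in the appendix), so e_c(x, v) = F_{χ_m}(ρ_c(x, v)). The following sketch (our own; the bracketing bound r_max and the tolerance are implementation choices) finds ρ_c by bisection for a user-supplied g:

import numpy as np
from scipy.stats import chi

def rho_c(g, x, v, z_c, L_c, r_max=1e3, tol=1e-10):
    # Largest r >= 0 with g(x, z(c) + r L(c) v) <= 0; assumes g(x, z(c)) < 0 and g convex in z,
    # so the sublevel set along the ray is an interval containing 0.
    ray = lambda r: g(x, z_c + r * (L_c @ v))
    if ray(r_max) <= 0:
        return np.inf                        # the ray stays feasible up to r_max
    lo, hi = 0.0, r_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if ray(mid) <= 0 else (lo, mid)
    return 0.5 * (lo + hi)

def e_c(g, x, v, z_c, L_c, m):
    # Parametric radial probability (7): the chi_m mass of the feasible radii.
    r = rho_c(g, x, v, z_c, L_c)
    return 1.0 if np.isinf(r) else chi.cdf(r, df=m)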
In what follows, we say that x̄ is a z-uniform Slater point if there exists an open neighborhood U of x̄
and γ0 > 0 such that
g(x, z(c)) < −γ0 , for all x ∈ U and c ∈ C. (8)
Proposition 1 Assume that z and Σ are measurable functions. Then, the parametric radial probability
function e : C ×Sm−1 ×Rn → R∪{+∞}, defined in (7), is F ⊗B(Sm−1 )⊗B(Rn ) measurable. Furthermore,
for any (c, v) ∈ C × Sm−1 the function ec (·, v) is continuous at any z-uniform Slater point (8).
Before establishing the differentiability of the probability function defined in (9), we need to introduce a
condition that allows us to control the gradient of g locally at x̄, but globally on (z, c) ∈ Rm × C. In our
GMM setting, we consider the so-called exponential growth condition, first introduced in [24]. Without
this condition, differentiability may fail even for Gaussian distributions (see, e.g., [6, 24]).
Definition 1 (Exponential growth condition) We say that a function g satisfies the exponential
growth condition at x̄ if one can find constants ε, a, b > 0 such that
∥∇x g(x, z)∥ ≤ a exp(b∥z∥) for all (x, z) ∈ Bε (x̄) × Rm .
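For instance (an illustrative choice of ours rather than an example from the paper), the bilinear function g(x, z) = x^⊤Qz − β satisfies the exponential growth condition at every x̄: since t ≤ exp(t) for t ≥ 0,
∥∇_x g(x, z)∥ = ∥Qz∥ ≤ ∥Q∥ ∥z∥ ≤ ∥Q∥ exp(∥z∥),
so one may take any ε > 0, a = ∥Q∥, and b = 1.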
The following result proves the continuous differentiability of the probability function (9) and provides
an integral formula for its gradient.
The proof of the above result will follow directly from a more general result presented in the appendix,
which provides an explicit representation for computing the integral formula in (10) in terms of the
function g, along with a detailed technical analysis of the estimation process for the integrand (see
Theorem 3). We refer to (17) for an explicit formula for ∇x ec (x, v) in terms of the data of the problem.
In this section, we propose a sampling method to approximate the probability function Φ in (9) and
its gradient. As Theorem 1 guarantees continuity of Φ, our goal is to obtain an expression that remains
continuous in x. Therefore, we avoid relying on naive i.i.d.-based Monte Carlo approximations of the form
Φ̃_N(x) = (1/N) Σ_{i=1}^N 1_{{g(x, ξ_i) ≤ 0}}.   (11)
Instead, we propose a Monte Carlo approximation to estimate Φ(x) and ∇Φ(x) that relies on an i.i.d. sample {(v_i, c_i)}_{i∈N} drawn from ζΘ ⊗ m. Alternatively, if directly sampling from ζΘ ⊗ m is challenging, we can use asymptotically exact surrogates such as Markov Chain Monte Carlo (MCMC) methods [8, 13], since such a sequence has the same asymptotic distribution as an i.i.d. ζΘ ⊗ m sequence after a sufficiently large number of steps [19, Theorem 7.4]. By averaging the radial probability function (7) across the sample, we define the empirical spherical approximation to (9) and its gradient (10) as
Φ_N(x) = (1/N) Σ_{i=1}^N e_{c_i}(x, v_i)   and   ∇Φ_N(x) = (1/N) Σ_{i=1}^N ∇_x e_{c_i}(x, v_i).   (12)
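The following self-contained sketch illustrates (12) on a toy model of our own choosing (an affine constraint g(x, ξ) = ξ₁ − x and a two-component prior; this is not the paper's numerical example): along each ray the radius ρ_c(x, v) is available in closed form, the gradient term uses (17), and Φ_N is compared with the exact mixture probability.

import numpy as np
from scipy.stats import chi, norm

rng = np.random.default_rng(2)
m = 2                                  # dimension of xi

# Hypothetical two-component prior m on (z(c), L(c)).
w = np.array([0.5, 0.5])
zs = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
Ls = [np.linalg.cholesky(np.eye(m)), np.linalg.cholesky(np.diag([0.25, 1.0]))]

def draw_sample(N):
    # i.i.d. draws (v_i, c_i) from zeta_Theta (x) m
    cs = rng.choice(len(w), size=N, p=w)
    Z = rng.standard_normal((N, m))
    vs = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return vs, cs

def estimators(x, vs, cs):
    # Phi_N(x) and Phi_N'(x) for the constraint xi_1 <= x (requires x > z(c)_1, the Slater condition)
    phi, dphi = 0.0, 0.0
    for v, c in zip(vs, cs):
        slope = (Ls[c] @ v)[0]         # coefficient of r along the ray xi_1 = z(c)_1 + r * slope
        if slope <= 0:
            phi += 1.0                 # the ray never violates the constraint: e_c = 1
            continue
        rho = (x - zs[c][0]) / slope   # finite radius rho_c(x, v)
        phi += chi.cdf(rho, df=m)
        dphi += chi.pdf(rho, df=m) / slope   # formula (17): chi(rho_c) * d rho_c / d x
    return phi / len(cs), dphi / len(cs)

vs, cs = draw_sample(1000)
print(estimators(1.5, vs, cs))
# Exact probability for comparison: sum_i w_i * NormalCDF((x - z_i[0]) / sqrt(Sigma_i[0, 0])).
print(sum(w[i] * norm.cdf((1.5 - zs[i][0]) / Ls[i][0, 0]) for i in range(2)))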
Hence, under mild regularity and growth conditions, the hypotheses in [21, Theorem 9.60] are satisfied,
leading to the following convergence theorem, which is proved in the appendix.
The above theorem enables us to establish asymptotic distributional results to assess consistency of the
approximations in (12) and to establish the asymptotic normality of their errors with controlled variance.
The result is the following:
Corollary 1 Under the assumptions of Theorem 2, for all x ∈ X, the following holds:
√N [Φ_N(x) − Φ(x)] →_d N(0, σ_x²)   and   √N [∇Φ_N(x) − ∇Φ(x)] →_d N(0, Σ_x²)   as N → +∞,
where
σ_x² := ∫_{S^{m−1}×C} e_c(x, v)² d(ζΘ ⊗ m)(v, c) − Φ(x)²,
Σ_x² := ∫_{S^{m−1}×C} [∇_x e_c(x, v)][∇_x e_c(x, v)]^⊤ d(ζΘ ⊗ m)(v, c) − ∇Φ(x)∇Φ(x)^⊤.
Proof Fix x ∈ X. We consider the case where {(v_i, c_i)}_{i=1}^N is an i.i.d. sample from ζΘ ⊗ m; the case of MCMC is similar (see, e.g., [19]). Indeed, the sequence e_{c_i}(x, v_i) is i.i.d. with mean Φ(x) and variance σ_x². In the same way, the sequence ∇_x e_{c_i}(x, v_i) is i.i.d. with mean ∇Φ(x) and covariance matrix Σ_x².
Therefore, the result follows from the Central Limit Theorem [2, Theorem 27.2].
Corollary 2 Under the assumptions of Theorem 2, for all x ∈ X, the following holds:
The naive approximation Φ̃_N in (11) is also asymptotically normal. However, for x ∈ X, its asymptotic variance is σ̃_x² := Φ(x) − Φ(x)². Since for any (c, v) we have e_c(x, v) ≥ e_c(x, v)², it follows that Φ(x) ≥ ∫_{S^{m−1}×C} e_c(x, v)² d(ζΘ ⊗ m)(v, c). Hence, σ̃_x² ≥ σ_x². Therefore, estimating the probability function by Φ_N is uniformly more efficient than doing so via Φ̃_N.
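In practice, σ_x² can be estimated from the same sample used for Φ_N and compared with the naive variance σ̃_x² = Φ(x)(1 − Φ(x)); a small sketch (the function name and the 1.96 quantile are our choices):

import numpy as np

def clt_summary(e_values, z=1.96):
    # e_values[i] = e_{c_i}(x, v_i) for an i.i.d. sample (v_i, c_i) of size N
    e = np.asarray(e_values)
    N = e.size
    phi = e.mean()                    # Phi_N(x)
    var_spherical = e.var(ddof=1)     # estimates sigma_x^2 = E[e^2] - Phi(x)^2
    var_naive = phi * (1.0 - phi)     # estimates Phi(x) - Phi(x)^2, the naive variance
    half_width = z * np.sqrt(var_spherical / N)
    return phi, var_spherical, var_naive, (phi - half_width, phi + half_width)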
where F_{χ²_{k,λ}} denotes the cumulative distribution function of a non-central chi-squared random variable with k degrees of freedom and noncentrality parameter λ, and f_{χ²_{k,λ}} denotes its corresponding density. Furthermore, the derivative of the probability function is therefore given by
∇Φ(x) = (Γ(2δ) / (2^{2δ−2} Γ(δ)²)) ∫_{−1}^{1} (1 − c²)^{δ−1} f_{χ²_{2, c²}}(∥x∥² + 2) dc · x.
On the other hand,
g(x, (0, c)⊤ ) ≤ −1 for all x ∈ Rn and c ∈ [−1, 1].
Hence, assumption (8) is satisfied with γ0 = 1/2. On the other hand, Figure 1 shows that the empirical spherical approximation ΦN yields a markedly smoother estimate of the probability function compared to the naive estimator (11), as shown in the leftmost plot. Unlike Φ̃N, the smooth profile of ΦN admits a continuous derivative ∇ΦN, which reasonably approximates the gradient of the probability function ∇Φ, as illustrated in the plot on the right. The largest empirical error in the gradient approximation is less than 0.0007, and this high accuracy is achieved across the entire domain with a relatively small sample size.
Fig. 1 Estimates for N = 100 and true values of the probability function Φ (left) and its derivative Φ′ (right). Naive estimator (11) in blue, empirical spherical approximation (12) in red, and true functions in black.
Declarations
Data availability No datasets were generated or analyzed during the current study.
Conflict of interest The authors have no conflict of interest to declare.
Proof of Proposition 1. Observe that the function (x, v, c, r) 7→ χ(r)1_{{r′ ≥ 0 : g(x, r′L(c)v + z(c)) ≤ 0}}(r) is measurable. Hence, using Fubini's Theorem we get that
(c, v, x) 7→ ∫_{{r ≥ 0 : g(x, rL(c)v + z(c)) ≤ 0}} χ(r) dr
is measurable, which proves the measurability of e. Now, let us fix (c, v) ∈ C × Sm−1 and let x ∈ Rn be a z-uniform Slater
point. Consider N > 0 and a sequence x_k → x. Define the functions
η_k(r) := χ(r)1_{{r′ ≥ 0 : g(x_k, r′L(c)v + z(c)) ≤ 0}}(r)   and   η(r) := χ(r)1_{{r′ ≥ 0 : g(x, r′L(c)v + z(c)) ≤ 0}}(r).
First, note that |ηk (r)| ≤ χ(r) for all r ∈ [0, +∞). Furthermore, due to the convexity of g with respect to its second
argument, it is straightforward to show that ηk converges almost everywhere to η as k → ∞. Therefore, by the Lebesgue
Dominated Convergence theorem, we have that
e_c(x, v) = ∫_R η(r) dr = lim_{k→∞} ∫_R η_k(r) dr = lim_{k→∞} e_c(x_k, v),
which proves the continuity of e_c(·, v) at x. ⊓⊔
For x ∈ R^n, consider the radius function
ρ_c(x, v) := sup{r ≥ 0 : g(x, rL(c)v + z(c)) ≤ 0},   (13)
and the sets
F(x) := {(v, c) ∈ S^{m−1} × C : ∃ r > 0 s.t. g(x, z(c) + rL(c)v) > 0},
I(x) := {(v, c) ∈ S^{m−1} × C : ∀ r > 0, g(x, z(c) + rL(c)v) < 0}.
The following lemma is a direct adaptation of the arguments presented in [27, Lemmas 3.2, 3.3 and 3.4] (see also [25]).
Lemma 1 Suppose that condition (8) holds at x̄. Then, for every (x, v, c) ∈ U × S^{m−1} × C, the set {r ≥ 0 : g(x, rL(c)v + z(c)) ≤ 0} = [0, ρ_c(x, v)] (with the convention that [0, ∞] = [0, ∞)), and
⟨∇_z g(x, z(c) + ρ_c(x, v)L(c)v), L(c)v⟩ ≥ −g(x, z(c)) / ρ_c(x, v).   (14)
Moreover,
e_c(x, v) = ∫_0^{ρ_c(x,v)} χ(r) dr,  for every (x, v, c) ∈ B_ε(x̄) × S^{m−1} × C.   (15)
Furthermore, the function ρ is B(R^n) ⊗ B(S^{m−1}) ⊗ F measurable, and for every sequence (x_k, v_k) → (x, v), we have that ρ_c(x_k, v_k) → ρ_c(x, v) for all c ∈ C. In addition, if ρ_c(x, v) < +∞, then the function ρ_c is continuously differentiable with respect to x, with gradient
∇_x ρ_c(x, v) = − ∇_x g(x, z(c) + ρ_c(x, v)L(c)v) / ⟨∇_z g(x, z(c) + ρ_c(x, v)L(c)v), L(c)v⟩.   (16)
The measurability of ρ follows from the equality, valid for every ν ≥ 0,
{(x, v, c) ∈ U × S^{m−1} × C : ρ_c(x, v) > ν} = {(x, v, c) ∈ U × S^{m−1} × C : g(x, νL(c)v + z(c)) < 0},
where the set on the right-hand side of the equality is measurable due to the continuity of g and measurability of z and L.
Now, we establish a lower bound for the inner product between the partial gradient of g with respect to z and the directions
over the sphere. This technical inequality follows from the convexity of the function g (with respect to z), and its proof is
obtained by adapting the arguments from [27, Lemma 3.3] to our setting.
Proposition 2 Suppose that (8) holds, and consider (x, v, c) ∈ U × S^{m−1} × C with (v, c) ∈ F(x). Let r0 > 0 be such that {r ≥ 0 : g(x, rL(c)v + z(c)) ≤ 0} = [0, r0]. Then ⟨∇_z g(x, z(c) + r0 L(c)v), L(c)v⟩ ≥ −r0^{−1} g(x, z(c)) > 0.
The following result concerns the technical task of estimating the gradients of the (parameterized) radius function defined in
(13). Since we cannot assume a priori that the radial probability function is differentiable, the proof of this result relies on
techniques from variational analysis and generalized differentiation. We refer to [14, 20] for the primary notation and the
tools used in the proof.
Theorem 3 Let x̄ be a z-uniform Slater point, and suppose that g satisfies the exponential growth condition at x̄. Then, there exists ε > 0 such that for all (x, v, c) ∈ B_ε(x̄) × S^{m−1} × C the function e_c is continuously differentiable with respect to x, with gradient
∇_x e_c(x, v) = χ(ρ_c(x, v)) ∇_x ρ_c(x, v)  if (v, c) ∈ F(x),   and   ∇_x e_c(x, v) = 0  if (v, c) ∈ I(x).   (17)
Moreover, there exists κ̂ ≥ 0 such that
∥∇_x e_c(x, v)∥ ≤ κ̂,  for all (x, v, c) ∈ B_ε(x̄) × S^{m−1} × C.   (18)
Proof Let U be the neighborhood in (8). We divide the proof into three claims.
Claim 1: For all (x, v, c) ∈ U × Sm−1 × C, the following inclusions hold:
∂ˆx ec (x, v) ⊂ {χ(ρc (x, v))∇x ρc (x, v)} for (v, c) ∈ F (x) and ∂ˆx ec (x, v) ⊂ {0} for (v, c) ∈ I(x), (19)
where ∂ˆx ec (x, v) denotes the Fréchet subdifferential of the function ec with respect to x.
Proof of Claim 1: Indeed, let x ∈ U . Then, by Lemma 1, we have that (17) holds whenever (v, c) ∈ F (x). On the other
hand, if (v, c) ∈ I(x), then ec (x, v) = 1, which implies that ec attains its maximum. It is then straightforward to show that
∂ˆx ec (x, v) ⊂ {0}.
Claim 2: There exist γ > 0 and κ̂ ≥ 0 such that ∥x∗∥ ≤ κ̂ for all x∗ ∈ ∂ˆ_x e_c(x, v) and all (x, v, c) ∈ B_γ(x̄) × S^{m−1} × C.
Proof of Claim 2: Let ε, a, b > 0 be the parameters in the definition of the exponential growth condition. By shrinking ε
if necessary, we can assume that Bε (x̄) ⊂ U , where U is the neighborhood of x̄ from (8). Now, consider x ∈ Bε (x̄) and
x∗ ∈ ∂ˆx ec (x, v). Then, using (3), (16), and (19), we get
∥x∗∥ ≤ c0 γ0^{−1} ρ_c(x, v)^m exp(−ρ_c(x, v)²/2) a exp(bρ_c(x, v)∥L(c)∥) exp(b∥z(c)∥)
     ≤ a c0 γ0^{−1} exp(bη0) ρ_c(x, v)^m exp(−ρ_c(x, v)²/2) exp(bη0 ρ_c(x, v)).   (20)
Consequently, ∥x∗∥ ≤ κ̂ := a c0 γ0^{−1} exp(bη0) sup_{r≥0} r^m exp(−r²/2 + bη0 r) < +∞, which proves the claim.
Proof of Theorem 1 By using (18) and the classical theorem on the interchange of integration and differentiation, we conclude
that the formula (10) holds within a suitable neighborhood of x̄. Furthermore, considering the integrand in (10) and the
uniform boundedness in (18), we can establish the local continuity of the gradient around x̄. ⊓⊔
Proof of Theorem 2: On the one hand, notice that for every x ∈ X, the following inequality holds:
On the other hand, by (18) and a classical argument of compactness, we can ensure that there exists κ̂ > 0 and an open
set U′ ⊃ X such that ∥∇_x e_c(x, v)∥ ≤ κ̂ for all (x, v, c) ∈ U′ × S^{m−1} × C. For x̄ ∈ X and a sequence r_k ↓ 0, define
∆_{1,k}(v, c) = sup_{x∈B_{r_k}(x̄)} |e_c(x, v) − e_c(x̄, v)|   and   ∆_{2,k}(v, c) = sup_{x∈B_{r_k}(x̄)} ∥∇_x e_c(x, v) − ∇_x e_c(x̄, v)∥.
By Theorem 3, we have that for j = 1, 2, as k → ∞, ∆j,k (v, c) → 0 (ζΘ ⊗ m) almost surely. Since ∆1,k (v, c) ≤ 2 and
∆2,k (v, c) ≤ 2κ̂, the dominated convergence theorem implies that for j = 1, 2,
lim_{k→∞} ∫_{C×S^{m−1}} ∆_{j,k}(v, c) d(ζΘ ⊗ m)(v, c) = ∫_{C×S^{m−1}} lim_{k→∞} ∆_{j,k}(v, c) d(ζΘ ⊗ m)(v, c) = 0.
By the Strong Law of Large Numbers [2, Theorem 6.2] (or equivalently by [19, Theorem 7.4] when using an MCMC-type sampler), it follows that for fixed k ∈ N and x̄ ∈ X, as N → ∞, almost surely both
sup_{x∈B_{r_k}(x̄)} |Φ_N(x) − Φ_N(x̄)| ≤ N^{−1} Σ_{i=1}^N ∆_{1,k}(c_i, v_i) → E(∆_{1,k})
and
sup_{x∈B_{r_k}(x̄)} ∥∇Φ_N(x) − ∇Φ_N(x̄)∥ ≤ N^{−1} Σ_{i=1}^N ∆_{2,k}(c_i, v_i) → E(∆_{2,k}).
As the right-hand sides above converge to 0 as k → ∞, for any ε > 0 we can find k̃ such that, almost surely for all sufficiently large N,
sup_{|x−x̄|<r_{k̃}} |Φ_N(x) − Φ_N(x̄)| ≤ ε/3   and   sup_{|x−x̄|<r_{k̃}} ∥∇Φ_N(x) − ∇Φ_N(x̄)∥ ≤ ε/3.
Since the sets B_{r_{k̃}}(x̄) for x̄ ∈ X cover X, the compactness of X implies that there exists a finite collection x̄_1, …, x̄_F and K = max{k̃_1, …, k̃_F} such that B_{r_K}(x̄_1), …, B_{r_K}(x̄_F) cover X. Therefore, for x̄ ∈ X, there exist indices 1 ≤ j0 ≤ F and
1 ≤ j1 ≤ F such that
|Φ_N(x̄) − Φ_N(x̄_{j0})| ≤ sup_{|x−x̄_{j0}|<r_K} |Φ_N(x) − Φ_N(x̄_{j0})| ≤ ε/3,
and
∥∇Φ_N(x̄) − ∇Φ_N(x̄_{j1})∥ ≤ sup_{|x−x̄_{j1}|<r_K} ∥∇Φ_N(x) − ∇Φ_N(x̄_{j1})∥ ≤ ε/3.
The compactness of X and continuity of Φ and its gradient ensure that, possibly enlarging F and K, for each x̄ ∈ X there exists 1 ≤ j ≤ F such that |Φ(x̄) − Φ(x̄_j)| ≤ ε/3 and ∥∇Φ(x̄) − ∇Φ(x̄_j)∥ ≤ ε/3.
For each 1 ≤ j ≤ F, the quantities h(c, v) = N^{−1} Σ_{i=1}^N e_{c_i}(x̄_j, v_i) and h̃(c, v) = N^{−1} Σ_{i=1}^N ∇e_{c_i}(x̄_j, v_i) converge almost surely to Φ(x̄_j)
and ∇Φ(x̄j ), respectively. Thus, we can find k̂ such that |ΦN (x̄j ) − Φ(x̄j )| ≤ ε/3 and |∇ΦN (x̄j ) − ∇Φ(x̄j )| ≤ ε/3 whenever
N ≥ k̂, for every j. Finally, by the triangle inequality, it follows that for arbitrary ε > 0 and all N > max{k̂, K}, sup_{x̄∈X} |Φ_N(x̄) − Φ(x̄)| ≤ ε and sup_{x̄∈X} ∥∇Φ_N(x̄) − ∇Φ(x̄)∥ ≤ ε almost surely, which completes the proof. ⊓⊔
References
1. T. Aktekin and T. Ekin. Stochastic call center staffing with uncertain arrival, service and abandonment rates: A Bayesian perspective. Naval Research Logistics (NRL), 63(6):460–478, 2016.
2. P. Billingsley. Probability and Measure. John Wiley and Sons, second edition, 1986.
3. N. Chitsazan, H. V. Pham, and F. T.-C. Tsai. Bayesian chance-constrained hydraulic barrier design under geological
structure uncertainty. Groundwater, 53(6):908–919, 2015.
4. A. M. Faden. The existence of regular conditional probabilities: necessary and sufficient conditions. Ann. Probab.,
13(1):288–298, 1985.
5. T. González Grandón, R. Henrion, and P. Pérez-Aros. Dynamic probabilistic constraints under continuous random
distributions. Math. Program., pages 1065–1096, 2020.
6. A. Hantoute, R. Henrion, and P. Pérez-Aros. Subdifferential characterization of probability functions under Gaussian
distribution. Math. Program., 174(1-2 (B)):167–194, 2019.
7. P. Jaiswal, H. Honnappa, and V. A. Rao. Bayesian joint chance constrained optimization: Approximations and statistical
consistency. SIAM J. Optim., 33(3):1968–1995, 2023.
8. G. L. Jones. On the Markov chain central limit theorem. Probability Surveys, 1:299–320, 2004.
9. J. Kirschner, I. Bogunovic, S. Jegelka, and A. Krause. Distributionally robust Bayesian optimization. In S. Chiappa
and R. Calandra, editors, Proceedings of the Twenty Third International Conference on Artificial Intelligence and
Statistics, volume 108 of Proc. Mach. Learn. Res., pages 2174–2184. PMLR, 26–28 Aug 2020.
10. T. L. Lai, H. Xing, and Z. Chen. Mean–variance portfolio optimization when means and covariances are unknown. Ann. Appl. Stat., 5(2A):798–823, 2011.
11. A. Matza and Y. Bistritz. Infinite Gaussian mixture modeling with an improved estimation of the number of clusters.
Proceedings of the AAAI Conference on Artificial Intelligence, 35(10):8921–8929, 2021.
12. G. McLachlan and D. Peel. Finite mixture models. Wiley Ser. Probab. Stat. Wiley, New York, 2000.
13. K. L. Mengersen. MCMC convergence diagnostics: a review. Bayesian statistics, 6:415–440, 1999.
14. B. S. Mordukhovich. Variational analysis and applications. Springer Monogr. Math. Springer, Cham, 2018.
15. A. Papoulis and S.U. Pillai. Probability, Random Variables, and Stochastic Processes. McGraw-Hill Series in Electrical
Engineering. Communications and Information Theory. Tata McGraw-Hill, 2002.
16. A. Prékopa. Stochastic programming, volume 324 of Mathematics and its Applications. Kluwer Academic Publishers
Group, Dordrecht, 1995.
17. C. Rasmussen. The infinite Gaussian mixture model. In S. Solla, T. Leen, and K. Müller, editors, Advances in Neural
Information Processing Systems, volume 12, pages 554–560. MIT Press, 04 2000.
18. D. Reynolds. Gaussian mixture models. In S. Z. Li and A. K. Jain, editors, Encyclopedia of Biometrics, pages 827–832.
Springer US, Boston, MA, 2015.
19. C.P. Robert and G. Casella. Monte Carlo statistical methods. Springer Verlag, 2004.
20. R.T. Rockafellar and R. J.-B. Wets. Variational Analysis, volume 317 of Grundlehren der mathematischen Wis-
senschaften. Springer Verlag Berlin, 3rd edition, 2009.
21. A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on stochastic programming, volume 9 of MOS-SIAM Ser.
Optim. SIAM, Philadelphia, PA., second edition, 2014. Modeling and theory.
22. A. Shapiro, E. Zhou, and Y. Lin. Bayesian distributionally robust optimization. SIAM J. Optim., 33(2):1279–1304,
2023.
23. W. van Ackooij, I. Aleksovska, and M. Munoz-Zuniga. (Sub-)differentiability of probability functions with elliptical
distributions. Set-Valued Var. Anal., 26(4):887–910, 2018.
24. W. van Ackooij and R. Henrion. Gradient formulae for nonlinear probabilistic constraints with Gaussian and Gaussian-
like distributions. SIAM J. Optim., 24(4):1864–1889, 2014.
25. W. van Ackooij and R. Henrion. (Sub-) Gradient formulae for probability functions of random inequality systems under
Gaussian distribution. SIAM J. Uncertain. Quantif., 5(1):63–87, 2017.
26. W. van Ackooij and P. Pérez-Aros. Generalized differentiation of probability functions acting on an infinite system of
constraints. SIAM J. Optim., 29(3):2179–2210, 2019.
27. W. van Ackooij and P. Pérez-Aros. Generalized differentiation of probability functions: Parameter dependent sets given
by intersections of convex sets and complements of convex sets. Appl. Math. Optim., 85(1):2, Jan 2022.