
Differentiability and Approximation of Probability Functions under

Gaussian Mixture Models: A Bayesian Approach

Gonzalo Contador · Pedro Pérez-Aros · Emilio Vilches


arXiv:2411.02721v1 [[Link]] 5 Nov 2024

Received: date / Accepted: date

Abstract In this work, we study probability functions associated with Gaussian mixture models. Our
primary focus is on extending the use of spherical radial decomposition for multivariate Gaussian random
vectors to the context of Gaussian mixture models, which are not inherently spherical but only condition-
ally so. Specifically, the conditional probability distribution, given a random parameter of the random
vector, follows a Gaussian distribution, allowing us to apply Bayesian analysis tools to the probability
function. This assumption, together with spherical radial decomposition for Gaussian random vectors,
enables us to represent the probability function as an integral over the Euclidean sphere. Using this
representation, we establish sufficient conditions to ensure the differentiability of the probability function
and provide an integral representation of its gradient. Furthermore, leveraging the Bayesian decomposi-
tion, we approximate the probability function using random sampling over the parameter space and the
Euclidean sphere. Finally, we present numerical examples that illustrate the advantages of this approach
over classical approximations based on random vector sampling.

Keywords stochastic programming · chance constraint · probability functions · Gaussian mixture models · Bayesian statistics

The first author was partially supported by ANID-Chile grant Exploración 13220097. The second and third authors were
partially supported by Centro de Modelamiento Matemático (CMM), ACE210010 and FB210005, BASAL funds for center
of excellence, as well as ANID-Chile grants MATH-AMSUD 23-MATH-17, ECOS-ANID ECOS320027, Fondecyt Regular
1220886, Fondecyt Regular 1240120, and Exploración 13220097. Additionally, the second author was supported by ANID-
Chile grants MATH-AMSUD 23-MATH-09 and Fondecyt Regular 1240335.

Gonzalo Contador
Universidad Técnica Federico Santa Marı́a, Santiago, Chile. E-mail: [Link]@[Link]
Pedro Pérez-Aros
Departamento de Ingenierı́a Matemática and Centro de Modelamiento Matemático (CNRS UMI 2807), Universidad de
Chile, Santiago, Chile. E-mail: pperez@[Link]
Emilio Vilches
Instituto de Ciencias de la Ingenierı́a, Universidad de O’Higgins, Rancagua, Chile.
Centro de Modelamiento Matemático (CNRS UMI 2807), Santiago, Chile. E-mail: [Link]@[Link]

Mathematics Subject Classification (2000) Primary: 90C15, 65K10 · Secondary: 62F15

1 Introduction

A chance-constrained optimization problem is a mathematical program of the form:

min f (x) s.t. P (g(x, ξ) ≤ 0) ≥ p, (1)

where f : Rn → R is the objective function, ξ is an m-dimensional random vector defined on a probability space, g : Rn × Rm → R represents an inequality constraint, and p ∈ (0, 1) is a reliability parameter.
In this context, a vector x ∈ Rn is feasible for the optimization problem (1) if and only if the random
inequality g(x, ξ) ≤ 0 is satisfied with probability of at least p. For classical monographs on optimization
problems with probabilistic constraints, we refer to [16, 21].
To efficiently compute numerical solutions for chance-constrained optimization problems, access to both
the values and gradients of the probability function, denoted as Φ(x) := P (g(x, ξ) ≤ 0), is crucial. A
typical procedure for computing this function involves sampling the random vector ξ using standard
Monte Carlo techniques and then approximating the actual probability value through a sample average.
The Law of Large Numbers guarantees that this approach converges towards the real probability as the
number of sampled values of ξ approaches infinity. However, this approach presents notable challenges,
especially when the random inequality g(x, ξ) ≤ 0 has a nonlinear structure with respect to the variable
ξ. Such a sampling procedure introduces inaccuracies in the probability computations, complicating the
solution process for the optimization problem. This complexity arises from the need not only to compute
probability values but also to determine gradients of probability functions across multiple iterations of
various optimization methods, ultimately making this approach impractical for addressing problem (1).
Hence, there is a clear need for an efficient procedure to compute both the values and gradients of prob-
ability functions. In this context, some authors have utilized the so-called spherical radial decomposition
as an effective method for computing probability function values. This decomposition has proven to be
especially promising for studying first-order variational information of the probability function. Specifically, it provides insights into gradients when the probability function is smooth and subgradients in the non-differentiable case. Most works related to this approach involve the analysis of elliptically symmetric
distributions, such as multivariate Gaussian distributions (see, e.g., [23–27] and the references therein).
Nevertheless, various random phenomena exist for which elliptically symmetric distributions are not an adequate model. Our motivation here is to provide a framework for solving chance-constrained problems in which ξ does not exhibit an inherently spherical structure as described above but does so conditionally,
based on the realization of a latent variable, as in the case of Gaussian mixture models. This framework
leverages known results from the spherical case and extends them to the conditionally Gaussian scenario.
Gaussian mixture models are probabilistic models used to represent normally distributed subpopulations
within an overall population [12]. A Gaussian mixture model with K components is parameterized by a distribution of nonnegative mixture component weights c and each component's mean and covariance matrix, with density given by h(ξ) = Σ_{i=1}^{K} ci hi(ξ). Here, the weights satisfy Σ_{i=1}^{K} ci = 1 and can be viewed as either learned or as an a priori distribution over components, where ci = P(ξ comes from component i) [18].
The function hi represents the density of a Gaussian vector with mean µi and covariance matrix Σi .

Extensions of this model to a combination of uncountably many Gaussian distributions, weighted by a probability density, are well-established in Bayesian analysis [17].
A recent advancement in chance-constrained optimization is the introduction of a Bayesian framework that accounts for uncertainty in P by placing a prior distribution over the elements P ∈ P of a set of probability measures; this also allows practitioners to incorporate their beliefs about the overall random structure of the problem. Authors have employed specific conjugate structures to derive
closed-form posterior constraint distributions in optimization problems related to staffing [1], hydraulic
engineering [3], and portfolio allocation [10]. More recently, approximations to posterior distributions
have been developed and shown to be asymptotically optimal and consistent in a statistical sense [7].
Contributions by [9] and [22] have incorporated Bayesian analysis techniques in numerical optimization,
using observations of random phenomena ξ1 , . . . , ξn to approximate the true constraint distribution.
In most cases, however, the decision x (a 'here-and-now' decision) must be made before the random
phenomena are observed. For example, in [5], the design of a mechanical structure (encoded by x), which
is constructed once and permanently, must withstand future random forces ξ with high probability.
Our proposal leverages a Bayesian formulation of the uncertainty set to derive a unified radial probability
function. This approach transforms the original probability function into an expectation over conditional
probability functions. Unlike the worst-case scenario probability formulation (which maximizes over all
possible realizations of probability estimates without considering the observer’s ability to estimate the
likelihood of these conditional probabilities occurring; see, e.g., [24, 26]), our method facilitates the study
of the differentiability of the probability function by combining classical differentiation techniques under
integration with spherical radial decomposition. Additionally, it enables the estimation of the true prob-
ability function through empirical approximations based on sampling from the parameter set (Bayesian
approximation) and the unit sphere (spherical radial decomposition).
The paper is organized as follows. In Section 2, we introduce the main notation used throughout the work
and provide the mathematical formulation of our approach. In Section 3, we examine the differentiability
of the probability function under Gaussian mixture models and present an integral representation using
the spherical radial decomposition (see Theorem 1). Section 4 is dedicated to formally establishing that
an approximation based on an i.i.d. sample over the set of parameters and the m-dimensional unit sphere,
with ad hoc probability distributions, converges uniformly over compact sets under mild assumptions,
both in terms of function values and gradient information (see Theorem 2). In Section 5, we illustrate
our results with a numerical example. For clarity, the proofs of the main results are presented in the appendix.

2 Notation and Preliminaries

In this paper, we adopt the standard notation commonly used in variational analysis, optimization,
statistics, and probability theory. For more details and comprehensive coverage, we refer the reader to
the monographs [2, 14, 16, 20, 21].

2.1 Normal Distribution and Spherical Radial Decomposition

It is known that if ξ has a (multivariate) normal distribution ξ ∼ N(µ, Σ), then ξ admits a so-called spherical radial decomposition, given by

ξ = µ + RLΘ,

where R follows a χ-distribution with m degrees of freedom, Θ follows a uniform distribution ζΘ over the m-dimensional unit sphere S^{m−1} := {z ∈ Rm : Σ_{i=1}^m z_i² = 1}, and L is the (unique) Cholesky factor of Σ, such that LL⊤ = Σ.
Now, given a probability function φ(x) := P(g(x, ξ) ≤ 0) for ξ ∼ N(µ, Σ), the above decomposition allows us to rewrite φ as

φ(x) = ∫_{S^{m−1}} e(x, v) dζΘ(v),

where ζΘ is the uniform measure on S^{m−1}, and e(x, v) represents the probability of satisfying the constraint for a given x when Θ = v is fixed. Specifically,

e(x, v) = ∫_{{r ≥ 0 : g(x, µ + rLv) ≤ 0}} χ(r) dr, (2)

where χ is the density of a χ-distribution with m degrees of freedom, defined by

χ(r) := (2^{1−m/2} / Γ(m/2)) r^{m−1} exp(−r²/2). (3)
For further details, see [24, 25] and the references therein.
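For illustration, the following R sketch (ours, not part of the paper; the constraint g(x, z) = ∥z∥² − ∥x∥² − 2 anticipates the example of Section 5, and names such as e_xv are hypothetical) compares the spherical radial estimate of φ(x) with a naive Monte Carlo estimate for a bivariate Gaussian.

# Minimal sketch: spherical radial estimate of P(g(x, xi) <= 0) for
# xi ~ N(mu, Sigma) in R^2 with g(x, z) = ||z||^2 - (||x||^2 + 2).
set.seed(1)
mu    <- c(0, 0.5)
Sigma <- diag(2)
L     <- t(chol(Sigma))                  # lower triangular, L %*% t(L) = Sigma
x     <- 1.0                             # decision variable (scalar here)
b     <- x^2 + 2                         # threshold in g(x, z) = ||z||^2 - b

N     <- 1e4
theta <- runif(N, 0, 2 * pi)
V     <- cbind(cos(theta), sin(theta))   # i.i.d. uniform directions on S^1

# For a direction v, e(x, v) = P(R <= rho) with R ~ chi_2 (Rayleigh), where rho
# is the positive root of ||mu + r L v||^2 = b (here L is the identity).
e_xv <- function(v) {
  bq  <- sum(mu * v); cq <- sum(mu^2) - b
  rho <- -bq + sqrt(bq^2 - cq)           # positive root exists since ||mu||^2 < b
  1 - exp(-rho^2 / 2)                    # chi_2 cdf at rho
}
phi_srd <- mean(apply(V, 1, e_xv))

# Naive Monte Carlo estimate for comparison
xi        <- matrix(rnorm(2 * N), N, 2) %*% t(L) + matrix(mu, N, 2, byrow = TRUE)
phi_naive <- mean(rowSums(xi^2) <= b)
c(spherical = phi_srd, naive = phi_naive)

Both estimates agree, but the spherical radial one varies smoothly with x, which is the property exploited in Sections 3 and 4.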

2.2 Gaussian Mixture Models

A finite Gaussian mixture model in Rm can be written as a density p of the form

ξ ∼ p( · | µ1, . . . , µk, Σ1, . . . , Σk, w1, . . . , wk) = Σ_{i=1}^{k} wi N(µi, Σi),

where k ∈ N is a fixed integer, µ1, . . . , µk are the means and Σ1, . . . , Σk are the covariances of k Gaussian distributions, and w1, . . . , wk are the mixing proportions, which are positive and sum to 1. In this way, the model can be interpreted as ξ having a multivariate normal distribution with parameters µi and Σi with probability wi, for i = 1, . . . , k.
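As a small illustration (ours, assuming the mvtnorm package; the two components below are arbitrary), drawing from a finite mixture amounts to sampling a component label with probabilities w and then sampling from the selected Gaussian, while the density is the weighted sum of the component densities.

library(mvtnorm)                      # dmvnorm / rmvnorm

w     <- c(0.3, 0.7)
mu    <- list(c(0, -1), c(0, 1))
Sigma <- list(diag(2), 0.5 * diag(2))

rgmm <- function(n) {                 # sample n points from the mixture
  lab <- sample(seq_along(w), n, replace = TRUE, prob = w)
  t(sapply(lab, function(i) rmvnorm(1, mu[[i]], Sigma[[i]])))
}
dgmm <- function(z) {                 # mixture density at a point z
  sum(sapply(seq_along(w), function(i) w[i] * dmvnorm(z, mu[[i]], Sigma[[i]])))
}

xi <- rgmm(1000)
dgmm(c(0, 0))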
To obtain the infinite limit, the authors in [17] place priors on the (one-dimensional) component means and group variances, given Gaussian and Gamma distributions, respectively, and on the mixing proportion vector w = (w1, . . . , wk)⊤, given a Dirichlet prior. Subsequent work also incorporates uncertainty quantification
on the number of categories [11]. We will generalize this framework so that µ and Σ are selected according
to a density m over a subset of feasible means and covariance matrices, specifically,
ξ|µ, Σ ∼ N (µ, Σ) and (µ, Σ) ∼ m.
This formulation will be detailed in Section 2.3.

2.3 Mathematical Formulation and Spherical Radial Decomposition for Gaussian Mixture Models

Let us consider a probability space (Ω, S, P), a random vector (ξ, C) : Ω → Ξ × C ⊂ Rm+p , and a set
of conditional probability measures P := {Pc (A) = E(1A |C = c) : c ∈ C} over Ξ, where the parameter
space C ⊂ Rp is a Borel measurable set. We denote by m the measure induced by the second coordinate
of (ξ, C) over the Borel σ-algebra B(C), that is,

m(A) = P (ω ∈ Ω : C(ω) ∈ A) . (4)

Hence, the chance constraint in (1) can be written as

P(g(x, ξ) ≤ 0) = ∫_C Pc(g(x, ξ) ≤ 0) dm(c),

and the chance-constrained problem (1) is then rewritten as what we will refer to from now on as a Bayesian Chance Constraint (BCC) problem with confidence level p:

min f(x)  s.t.  P(g(x, ξ) ≤ 0) = ∫_C Pc(g(x, ξ) ≤ 0) dm(c) ≥ p.

We observe that the above formulation coincides with (1) when the distribution of ξ is fully specified,
i.e., when P = {P} and m = δP . In the context of an infinite Gaussian mixture model, we consider a
probability space (C, F, m) where the random variable C has probability distribution m as defined in (4).
In what follows, we consider measurable functions z : C → Rm and L : C → Rm×m such that

a) For each c ∈ C, conditional on the event C = c, the vector ξ ∼ N (z(c), Σ(c)), i.e.,

Pc(ξ ∈ A) := P(ξ ∈ A | C = c) = ∫_A hc(z) dz for all Borel sets A ⊂ Rm,

where

hc(z) := (1/√((2π)^m det Σ(c))) exp(−(1/2)(z − z(c))⊤ Σ^{−1}(c)(z − z(c))).

We refer to [23, 26] for more details.


b) There exists η0 > 0 such that

∥z(c)∥ + ∥L(c)∥ ≤ η0 for all c ∈ C, (5)

where L(c) is the Cholesky factor of Σ(c), i.e., Σ(c) := L(c)⊤L(c).

Hence, the expression hc (ξ) can be understood as the conditional density of ξ given C = c, which allows
us to define the conditional probability function as
φc(x) := Pc(g(x, ξ) ≤ 0) = P(g(x, ξ) ≤ 0 | C = c) = ∫_{{z : g(x, z) ≤ 0}} hc(z) dz. (6)

Condition a) guarantees that φc is well-defined for almost every c ∈ C (see, e.g., [4]), and from (6), it
follows that
P(g(x, ξ) ≤ 0) = ∫_C Pc(g(x, ξ) ≤ 0) dm(c) = E(φC(x)).

It is worth emphasizing that the probability defined in (6) can be interpreted as the probability of
satisfying the constraint, calculated after randomly selecting a Gaussian distribution according to m.
Applying the spherical radial decomposition on the event C = c for a Gaussian random variable ξ, as
described in Section 2.1, we obtain the following parameterized decomposition:

ξ|{C = c} = z(c) + RL(c)Θ.

Moreover, it is clear that Θ and R are independent, with Θ uniform over the sphere S^{m−1} and R following a χ-distribution with m degrees of freedom.
The conditional probability function (6) can be expressed as

φc(x) = ∫_{S^{m−1}} ec(x, v) dζΘ(v),

where ec := e(c, ·, ·) and e : C × S^{m−1} × Rn → R ∪ {+∞} is the parametric radial probability function in (2) on the event C = c; specifically,

ec(x, v) = ∫_{{r ≥ 0 : g(x, rL(c)v + z(c)) ≤ 0}} χ(r) dr. (7)

In what follows, we say that x̄ is a z-uniform Slater point if there exists an open neighborhood U of x̄
and γ0 > 0 such that
g(x, z(c)) < −γ0 , for all x ∈ U and c ∈ C. (8)

3 Gradient of Probability Functions under Gaussian Mixture Models

In this section, we study the following (unconditional) probability function:

Φ(x) := E(φC(x)) = ∫_C φc(x) dm(c), (9)

where φc(x) is defined in (6).


The following result, proven in the appendix, establishes that the above probability function is well-
defined.

Proposition 1 Assume that z and Σ are measurable functions. Then, the parametric radial probability
function e : C ×Sm−1 ×Rn → R∪{+∞}, defined in (7), is F ⊗B(Sm−1 )⊗B(Rn ) measurable. Furthermore,
for any (c, v) ∈ C × Sm−1 the function ec (·, v) is continuous at any z-uniform Slater point (8).

Before establishing the differentiability of the probability function defined in (9), we need to introduce a
condition that allows us to control the gradient of g locally at x̄, but globally on (z, c) ∈ Rm × C. In our
GMM setting, we consider the so-called exponential growth condition, first introduced in [24]. Without
this condition, differentiability may fail even for Gaussian distributions (see, e.g., [6, 24]).

Definition 1 (Exponential growth condition) We say that a function g satisfies the exponential
growth condition at x̄ if one can find constants ε, a, b > 0 such that
∥∇x g(x, z)∥ ≤ a exp(b∥z∥) for all (x, z) ∈ Bε (x̄) × Rm .

The following result proves the continuous differentiability of the probability function (9) and provides
an integral formula for its gradient.

Theorem 1 Let g : Rn × Rm → R be a continuously differentiable function satisfying the exponential growth condition at x̄. Assume that g is convex in the second argument and x̄ is a z-uniform Slater point as in (8). Then, the probability function Φ defined in (9) is continuously differentiable on a neighborhood U′ of x̄, and

∇Φ(x) = ∫_{S^{m−1}×C} ∇x ec(x, v) d(ζΘ ⊗ m)(v, c), for all x ∈ U′. (10)

The proof of the above result will follow directly from a more general result presented in the appendix,
which provides an explicit representation for computing the integral formula in (10) in terms of the
function g, along with a detailed technical analysis of the estimation process for the integrand (see
Theorem 3). We refer to (17) for an explicit formula for ∇x ec (x, v) in terms of the data of the problem.

4 Approximation of Probability Functions via Monte Carlo Simulation

In this section, we propose a sampling method to approximate the probability function Φ in (9) and
its gradient. As Theorem 1 guarantees continuity of Φ, our goal is to obtain an expression that remains
continuous in x. Therefore, we avoid relying on naive i.i.d.-based Monte Carlo approximations of the form
Φ̃N(x) = (1/N) Σ_{i=1}^{N} 1_{{g(x, ξi) ≤ 0}}. (11)

Instead, we propose a Monte Carlo approximation to estimate Φ(x) and ∇Φ(x) that relies on an i.i.d. sample {(vi, ci)}_{i∈N}. Alternatively, if directly sampling from ζΘ ⊗ m is challenging, we can use asymptotic surrogate Markov Chain Monte Carlo (MCMC) methods [8, 13], as such a sequence has the same asymptotic distribution as an i.i.d. ζΘ ⊗ m sequence after a sufficiently large number of steps [19, Theorem 7.4]. By averaging the radial probability function (7) across the sample, we define the empirical spherical approximation to (9) and its gradient (10) as

ΦN(x) = (1/N) Σ_{i=1}^{N} e_{ci}(x, vi)  and  ∇ΦN(x) = (1/N) Σ_{i=1}^{N} ∇x e_{ci}(x, vi). (12)
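A schematic R implementation of (12) for m = 2 (ours, not the authors' companion script; the helper names make_Phi_N, sample_c, z_of_c and L_of_c are hypothetical) computes each e_{ci}(x, vi) from the radius function (13), located here by a one-dimensional root search; the gradient estimator is obtained analogously from the explicit formula (17) in the appendix.

# The user supplies g(x, z), a sampler for c ~ m (scalar c assumed), the
# conditional mean z(c) and the Cholesky factor L(c).  For each sampled pair
# (v_i, c_i), e_{c_i}(x, v_i) = P(R <= rho) with R ~ chi_2, i.e. 1 - exp(-rho^2/2).
make_Phi_N <- function(g, sample_c, z_of_c, L_of_c, N, r_max = 50) {
  cs    <- sample_c(N)                               # c_i ~ m, i.i.d.
  theta <- runif(N, 0, 2 * pi)
  Vs    <- cbind(cos(theta), sin(theta))             # Theta_i uniform on S^1
  rho   <- function(x, v, c) {                       # radius function (13)
    h <- function(r) g(x, z_of_c(c) + r * as.vector(L_of_c(c) %*% v))
    if (h(r_max) <= 0) return(Inf)                   # an "infinite direction"
    uniroot(h, c(0, r_max))$root                     # h(0) < 0 by the Slater condition (8)
  }
  function(x) {                                      # Phi_N as a function of x
    r <- sapply(seq_len(N), function(i) rho(x, Vs[i, ], cs[i]))
    mean(1 - exp(-r^2 / 2))                          # infinite rho contributes 1
  }
}

For the example of Section 5 one would take sample_c <- function(n) 2 * rbeta(n, delta, delta) - 1, z_of_c <- function(c) c(0, c), and L_of_c <- function(c) diag(2).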

Hence, under mild regularity and growth conditions, the hypotheses in [21, Theorem 9.60] are satisfied,
leading to the following convergence theorem, which is proved in the appendix.

Theorem 2 Let {(vi, ci)}_{i=1}^N be an i.i.d. sequence obtained from ζΘ ⊗ m or from a surrogate Markov Chain method with stationary distribution ζΘ ⊗ m. Consider a compact set X ⊂ Rn such that, for every x ∈ X, condition (8) and the exponential growth condition hold at x. Then, with probability 1, the approximations ΦN(x) and ∇ΦN(x) defined in (12) converge uniformly on X to Φ and ∇Φ, respectively, as N → ∞.

The above theorem enables us to derive asymptotic distributional results that assess the consistency of the approximations in (12) and establish the asymptotic normality of their errors with controlled variance.
The result is the following:

Corollary 1 Under the assumptions of Theorem 2, for all x ∈ X, the following holds:
√N [ΦN(x) − Φ(x)] →d N(0, σx²) and √N [∇ΦN(x) − ∇Φ(x)] →d N(0, Σx²) as N → +∞,

where

σx² := ∫_{S^{m−1}×C} ec(x, v)² d(ζΘ ⊗ m)(v, c) − Φ(x)²,

Σx² := ∫_{S^{m−1}×C} [∇x ec(x, v)][∇x ec(x, v)]⊤ d(ζΘ ⊗ m)(v, c) − ∇Φ(x)∇Φ(x)⊤.

Proof Fix x ∈ X. We consider the case where {(vi, ci)}_{i=1}^N is an i.i.d. sample from ζΘ ⊗ m; the case of MCMC is similar (see, e.g., [19]). Indeed, the sequence e_{ci}(x, vi) is i.i.d. with mean Φ(x) and variance σx². In the same way, the sequence ∇x e_{ci}(x, vi) is i.i.d. with mean ∇Φ(x) and covariance matrix Σx².
Therefore, the result follows from the Central Limit Theorem [2, Theorem 27.2].
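In practice, Corollary 1 also yields an asymptotic confidence interval for Φ(x) from the same sample used in (12); a minimal R sketch (ours; the vector e is assumed to contain the evaluated values e_{ci}(x, vi)):

phi_conf_int <- function(e, alpha = 0.05) {
  N  <- length(e)
  mu <- mean(e)                          # Phi_N(x)
  se <- sd(e) / sqrt(N)                  # plug-in estimate of sigma_x / sqrt(N)
  mu + c(-1, 1) * qnorm(1 - alpha / 2) * se
}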

The following result establishes convergence in probability for our approximations.

Corollary 2 Under the assumptions of Theorem 2, for all x ∈ X, the following holds:

ΦN (x) →P Φ(x) and ∇ΦN (x) →P ∇Φ(x) as N → ∞.

Proof Due to Corollary 1, we have ΦN(x) − Φ(x) = OP(N^{−1/2}) = oP(1); hence, for all ε > 0,

P(|ΦN(x) − Φ(x)| > ε) → 0 as N → ∞.

The proof is similar for the case of the gradient.

The naive approximation Φ̃N in (11) is also asymptotically normal. However, for x ∈ X, its asymptotic variance is σ̃x² := Φ(x) − Φ(x)². Since ec(x, v) ≥ ec(x, v)² for any (c, v), it follows that Φ(x) ≥ ∫_{S^{m−1}×C} ec(x, v)² d(ζΘ ⊗ m)(v, c). Hence, σ̃x² ≥ σx². Therefore, estimating the probability function by ΦN is uniformly more efficient than doing so via Φ̃N.

5 Estimation of the Probability Function and Its Gradient: A Numerical Example

In this section, we illustrate the quality of the proposed approximations.


Let C ∈ [−1, 1] be a random variable such that D := (C + 1)/2 follows, for some δ > 0, a Beta(δ, δ) distribution, that is,

m(c) := (Γ(2δ) / (2^{2δ−1} Γ(δ)²)) (1 − c²)^{δ−1} for c ∈ [−1, 1],

where Γ(·) denotes the Gamma function. Let ξ ∼ N((0, C)⊤, I₂), with I₂ the 2 × 2 identity matrix, and suppose that g : Rn × R² → R is given by

g(x, ξ) = ∥ξ∥² − ∥x∥² − 2.
Hence, on the one hand, the probability (9) becomes

Φ(x) = ∫_{−1}^{1} P(g(x, ξ) ≤ 0 | C = c) dm(c) = (Γ(2δ) / (2^{2δ−1} Γ(δ)²)) ∫_{−1}^{1} (1 − c²)^{δ−1} F_χ^{2,c²}(∥x∥² + 2) dc,

where F_χ^{k,λ} denotes the cumulative distribution function of a non-central chi-squared random variable with k degrees of freedom and noncentrality parameter λ, and f_χ^{k,λ} denotes its corresponding density. Furthermore, the derivative of the probability function is therefore given by

∇Φ(x) = ( (Γ(2δ) / (2^{2δ−2} Γ(δ)²)) ∫_{−1}^{1} (1 − c²)^{δ−1} f_χ^{2,c²}(∥x∥² + 2) dc ) x.
On the other hand,

g(x, (0, c)⊤) ≤ −1 for all x ∈ Rn and c ∈ [−1, 1],

so assumption (8) is satisfied with γ0 = 1/2. Moreover,

∥∇x g(x, ξ)∥ = 2∥x∥ ≤ 2(∥x̄∥ + ε) for all x ∈ Bε(x̄) and ξ ∈ R²,

which implies the exponential growth condition from Definition 1 at any point x̄.


The spherical radial decomposition for ξ is ξ = (0, c)⊤ + RL(c)Θ, where L(c) is the identity matrix and R² ∼ χ²₂, i.e., R has the Rayleigh distribution [15]. The function defined in (3) takes the form χ(r) = r e^{−r²/2} for r > 0. Moreover, it is clear that for fixed (c, v), we have

g(x, (rv1, c + rv2)⊤) ≤ 0 ⟺ r ≤ ρc(x, v) := −cv2 + √(∥x∥² + 2 − c²v1²),

which implies that

ec(x, v) = 1 − e^{−ρc(x,v)²/2}  and  ∇x ec(x, v) = ρc(x, v) e^{−ρc(x,v)²/2} x / √(∥x∥² + 2 − c²v1²).
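A minimal R sketch of this example (ours, independent of the companion script; variable names are our own) draws the sample {(vi, ci)}, evaluates ΦN and its derivative from the closed forms above, and compares them with the true values obtained by numerically integrating the non-central chi-squared cdf and density.

set.seed(1)
delta <- 2.5; N <- 100
cs <- 2 * rbeta(N, delta, delta) - 1          # c_i: D ~ Beta(delta, delta), C = 2D - 1
th <- runif(N, 0, 2 * pi)
v1 <- cos(th); v2 <- sin(th)                  # Theta_i uniform on S^1

Phi_N <- function(x) {                        # empirical spherical approximation (12)
  rho <- -cs * v2 + sqrt(x^2 + 2 - cs^2 * v1^2)
  mean(1 - exp(-rho^2 / 2))
}
dPhi_N <- function(x) {                       # its derivative (scalar x >= 0)
  s   <- sqrt(x^2 + 2 - cs^2 * v1^2)
  rho <- -cs * v2 + s
  mean(rho * exp(-rho^2 / 2) * x / s)
}
Phi_true <- function(x) {                     # Phi(x) by numerical integration
  integrate(function(c) dbeta((c + 1) / 2, delta, delta) / 2 *
              pchisq(x^2 + 2, df = 2, ncp = c^2), -1, 1)$value
}
dPhi_true <- function(x) {                    # Phi'(x) by numerical integration
  2 * x * integrate(function(c) dbeta((c + 1) / 2, delta, delta) / 2 *
                      dchisq(x^2 + 2, df = 2, ncp = c^2), -1, 1)$value
}

xs <- seq(0, 4, by = 0.5)
cbind(x = xs, PhiN = sapply(xs, Phi_N), Phi = sapply(xs, Phi_true),
      dPhiN = sapply(xs, dPhi_N), dPhi = sapply(xs, dPhi_true))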

The GitHub page [Link] contains simulation results and an R script for conducting further simulations with user-defined parameters. Figure 1 presents simulation-based
results for x ∈ [0, 4], δ = 2.5 and a sample size N = 100, demonstrating that the proposed approximation
ΦN in (12) is smooth and offers significantly higher accuracy as an estimator of the true probability

function compared to the naive estimator (11), as shown in the leftmost plot. Unlike Φ̃N , the smooth
profile of ΦN admits a continuous derivative ∇ΦN , which reasonably approximates the gradient of the
probability function ∇Φ, as illustrated in the plot on the right. The highest empirical error in the gradient approximation is less than 0.0007, and this high accuracy is achieved across the entire
domain with a relatively small sample size.

[Figure 1: left panel "Constraint probability" showing the estimates ΦN and Φ̃N together with the true Φ as functions of x; right panel "Gradient of probability function" showing ∇ΦN and the true ∇Φ.]

Fig. 1 Estimates for N = 100 and true values of the probability function Φ (left) and its derivative Φ′ (right). Naive estimator (11) in blue, empirical spherical approximation (12) in red, and true functions in black.

Declarations

Data availability No datasets were generated or analyzed during the current study.
Conflict of interest The authors have no conflict of interest to declare.

A Well-Posedness of the Probability Function

Proof of Proposition 1. Observe that the function (x, v, c, r) ↦ χ(r) 1_{{r′ ≥ 0 : g(x, r′L(c)v + z(c)) ≤ 0}}(r) is measurable. Hence, using Fubini's Theorem we get that

(c, v, x) ↦ ∫_{{r′ ≥ 0 : g(x, r′L(c)v + z(c)) ≤ 0}} χ(r) dr

is measurable, which proves the measurability of e. Now, let us fix (c, v) ∈ C × S^{m−1} and let x ∈ Rn be a z-uniform Slater point. Consider a sequence xk → x and define the functions

ηk(r) := χ(r) 1_{{r′ ≥ 0 : g(xk, r′L(c)v + z(c)) ≤ 0}}(r)  and  η(r) := χ(r) 1_{{r′ ≥ 0 : g(x, r′L(c)v + z(c)) ≤ 0}}(r).

First, note that |ηk (r)| ≤ χ(r) for all r ∈ [0, +∞). Furthermore, due to the convexity of g with respect to its second
argument, it is straightforward to show that ηk converges almost everywhere to η as k → ∞. Therefore, by the Lebesgue
Dominated Convergence theorem, we have that
ec(x, v) = ∫_ℝ η(r) dr = lim_{k→∞} ∫_ℝ ηk(r) dr = lim_{k→∞} ec(xk, v),

which implies that the function ec (·, v) is continuous at x.

B Differentiability of the Bayesian probability function

Define the sets of finite and infinite directions as follows:

F (x) := {(v, c) ∈ Sm−1 × C : ∃r > 0 s.t. g(x, z(c) + rL(c)v) > 0},
I(x) := {(v, c) ∈ Sm−1 × C : ∀r > 0 g(x, z(c) + rL(c)v) < 0}.

Furthermore, we introduce the radius function ρ : Rn × S^{m−1} × C → R+, defined by

ρc (x, v) := ρ(x, v, c) := sup{r ≥ 0 : g(x, rL(c)v + z(c)) ≤ 0}. (13)

The following proposition is a direct adaptation of the arguments presented in [27, Lemmas 3.2, 3.3 and 3.4] (see
also [25]).

Lemma 1 Suppose that condition (8) holds at x̄. Then, for every (x, v, c) ∈ U × Sm−1 × C, the set {r ≥ 0 : g(x, rL(c)v +
z(c)) ≤ 0} = [0, ρc (x, v)] (with the convention that [0, ∞] = [0, ∞)), and

⟨∇z g(x, z(c) + ρc(x, v)L(c)v), L(c)v⟩ ≥ −g(x, z(c)) / ρc(x, v). (14)

Moreover, the function e, defined in (7), satisfies the identity

ec(x, v) = ∫_0^{ρc(x,v)} χ(r) dr, for every (x, v, c) ∈ Bε(x̄) × S^{m−1} × C. (15)

Furthermore, the function ρ is B(Rn ) ⊗ B(Sm−1 ) ⊗ F measurable, and for every sequence (xk , vk ) → (x, v), we have that
ρc (xk , vk ) → ρc (x, v) for all c ∈ C. In addition, if ρc (x, v) < +∞, then the function ρc is continuously differentiable with
respect to x, with gradient
∇x ρc(x, v) = − ∇x g(x, z(c) + ρc(x, v)L(c)v) / ⟨∇z g(x, z(c) + ρc(x, v)L(c)v), L(c)v⟩. (16)

Proof The equality


{r ≥ 0 : g(x, rL(c)v + z(c)) ≤ 0} = [0, ρc (x, v)]
follows from the convexity of g with respect to z, which directly implies (15) (see, e.g., [26, Lemma 2.4] or [24, Lemma 3.2]
for further details). Furthermore, the measurability of ρ follows from the equality: for ν ≥ 0,

{(x, v, c) ∈ U × S^{m−1} × C : ρ(x, v, c) > ν} = {(x, v, c) ∈ U × S^{m−1} × C : g(x, νL(c)v + z(c)) < 0},

where the set on the right-hand side of the equality is measurable due to the continuity of g and measurability of z and L.

Now, we establish a lower bound for the inner product between the partial gradient of g with respect to z and the directions
over the sphere. This technical inequality follows from the convexity of the function g (with respect to z), and its proof is
obtained by adapting the arguments from [27, Lemma 3.3] to our setting.

Proposition 2 Suppose that (8) holds, and consider (x, v, c) ∈ U × S^{m−1} × C with (v, c) ∈ F(x). Let r0 > 0 be such that {r ≥ 0 : g(x, rL(c)v + z(c)) ≤ 0} = [0, r0]. Then ⟨∇z g(x, z(c) + r0 L(c)v), L(c)v⟩ ≥ −r0^{−1} g(x, z(c)) > 0.

The following result pertains to a technical estimate of the gradients of the (parameterized) radius function defined in
(13). Since we cannot assume a priori that the radial probability function is differentiable, the proof of this result relies on
techniques from variational analysis and generalized differentiation. We refer to [14, 20] for the primary notation and the
tools used in the proof.

Theorem 3 Let x̄ be a z-uniform Slater point, and suppose that g satisfies the exponential growth condition at x̄. Then, there exists ε > 0 such that for all (x, v, c) ∈ Bε(x̄) × S^{m−1} × C the function ec is continuously differentiable with respect to x, with gradient

∇x ec(x, v) = χ(ρc(x, v)) ∇x ρc(x, v) if (v, c) ∈ F(x), and ∇x ec(x, v) = 0 if (v, c) ∈ I(x). (17)

Furthermore, there exists κ̂ ≥ 0 such that

∥∇x ec(x, v)∥ ≤ κ̂, for all (x, v, c) ∈ Bε(x̄) × S^{m−1} × C. (18)

Proof Let U be the neighborhood in (8). We divide the proof into three claims.
Claim 1: For all (x, v, c) ∈ U × Sm−1 × C, the following inclusions hold:

∂ˆx ec (x, v) ⊂ {χ(ρc (x, v))∇x ρc (x, v)} for (v, c) ∈ F (x) and ∂ˆx ec (x, v) ⊂ {0} for (v, c) ∈ I(x), (19)

where ∂ˆx ec (x, v) denotes the Fréchet subdifferential of the function ec with respect to x.
Proof of Claim 1: Indeed, let x ∈ U . Then, by Lemma 1, we have that (17) holds whenever (v, c) ∈ F (x). On the other
hand, if (v, c) ∈ I(x), then ec (x, v) = 1, which implies that ec attains its maximum. It is then straightforward to show that
∂ˆx ec (x, v) ⊂ {0}.
Claim 2: There exist γ > 0 and κ̂ ≥ 0 such that

∂ˆx ec (x, v) ⊂ κ̂B, for all (x, v, c) ∈ Bγ (x̄) × Sm−1 × C.

Proof of Claim 2: Let ε, a, b > 0 be the parameters in the definition of the exponential growth condition. By shrinking ε
if necessary, we can assume that Bε (x̄) ⊂ U , where U is the neighborhood of x̄ from (8). Now, consider x ∈ Bε (x̄) and
x∗ ∈ ∂ˆx ec (x, v). Then, using (3), (16), and (19), we get

∥x∗∥ ≤ χ(ρc(x, v)) ∥∇x ρc(x, v)∥ = c0 ρc(x, v)^{m−1} exp(−ρc(x, v)²/2) ∥∇x g(x, z(c) + ρc(x, v)L(c)v)∥ / |⟨∇z g(x, z(c) + ρc(x, v)L(c)v), L(c)v⟩|,

where c0 = 2^{1−m/2}/Γ(m/2). Now, as a consequence of (5), (8), (14), and the exponential growth condition, we have that

∥x∗∥ ≤ c0 γ0^{−1} ρc(x, v)^m exp(−ρc(x, v)²/2) a exp(bρc(x, v)∥L(c)∥) exp(b∥z(c)∥)
     ≤ a c0 γ0^{−1} exp(bη0) ρc(x, v)^m exp(−ρc(x, v)²/2) exp(bη0 ρc(x, v)). (20)

Consequently,

∥x∗∥ ≤ a c0 γ0^{−1} exp(bη0) sup{r^m exp(−r²/2) exp(bη0 r) : r ≥ 0} =: κ̂.

This concludes the proof of the claim.


Claim 3: For all (v, c) ∈ Sm−1 × C, the function x 7→ ec (x, v) is continuously differentiable over Bγ/2 (x̄). Consequently,
(17) and (18) hold.
Proof of Claim 3: Fix (v, c) ∈ Sm−1 × C. First, using [14, Theorem 4.15], we conclude that x 7→ ec (x, v) is Lipschitz
continuous around any x ∈ Bε/2 (x̄). Second, due to Lemma 1, it is enough to prove continuous differentiability when
(v, c) ∈ I(x). Indeed, consider a sequence xk → x and x∗k ∈ ∂ˆx ec (xk , v); we will prove that x∗k → 0. If (v, c) ∈ I(xk ),
we have that ∥x∗k ∥ = 0, so we can assume that (v, c) ∈ F (xk ) for all k ∈ N. Then, by Lemma 1 and using the estimate
given in (20), we have that x∗k → 0. Finally, [14, Theorem 4.17] implies that x 7→ ec (x, v) is strictly differentiable at x with
∂x ec(x, v) = {0}, and from the arbitrariness of x, we obtain its continuous differentiability. ⊓⊔

Proof of Theorem 1 By using (18) and the classical theorem on the interchange of integration and differentiation, we conclude
that the formula (10) holds within a suitable neighborhood of x̄. Furthermore, considering the integrand in (10) and the
uniform boundedness in (18), we can establish the local continuity of the gradient around x̄. ⊓⊔

C Approximation of Bayesian Probability Functions

Proof of Theorem 2: On the one hand, notice that for every x ∈ X, the following inequality holds:

ec(x, v) ≤ 1, for all (c, v) ∈ C × S^{m−1}.

On the other hand, by (18) and a classical compactness argument, we can ensure that there exist κ̂ > 0 and an open
set U ′ ⊃ X such that

∥∇x ec (x, v)∥ ≤ κ̂, for all (x, v, c) ∈ U ′ × Sm−1 × C.

For rk a positive sequence converging to 0 and arbitrary x̄ ∈ X, define

∆1,k(v, c) = sup_{x ∈ B_{rk}(x̄)} |ec(x, v) − ec(x̄, v)|  and  ∆2,k(v, c) = sup_{x ∈ B_{rk}(x̄)} |∇x ec(x, v) − ∇x ec(x̄, v)|.

By Theorem 3, we have that for j = 1, 2, as k → ∞, ∆j,k (v, c) → 0 (ζΘ ⊗ m) almost surely. Since ∆1,k (v, c) ≤ 2 and
∆2,k (v, c) ≤ 2κ̂, the dominated convergence theorem implies that for j = 1, 2,
lim_{k→∞} ∫_{C×S^{m−1}} ∆j,k(v, c) d(ζΘ ⊗ m)(v, c) = ∫_{C×S^{m−1}} lim_{k→∞} ∆j,k(v, c) d(ζΘ ⊗ m)(v, c) = 0.

By the Strong Law of Large Numbers [2, Theorem 6.2] (or equivalently by [19, Theorem 7.4] when using an MCMC-type sampler), it follows that for fixed k ∈ N and x̄ ∈ X, as N → ∞, almost surely both

sup_{x ∈ B_{rk}(x̄)} |ΦN(x) − ΦN(x̄)| ≤ N^{−1} Σ_{i=1}^{N} ∆1,k(ci, vi) → E(∆1,k),

and

sup_{x ∈ B_{rk}(x̄)} |∇ΦN(x) − ∇ΦN(x̄)| ≤ N^{−1} Σ_{i=1}^{N} ∆2,k(ci, vi) → E(∆2,k).

As the right-hand sides above converge to 0 as k → ∞, for any ε > 0, we can find k̃ such that

sup_{|x−x̄|<r_{k̃}} |ΦN(x) − ΦN(x̄)| ≤ ε/3  and  sup_{|x−x̄|<r_{k̃}} |∇ΦN(x) − ∇ΦN(x̄)| ≤ ε/3.

Since the sets B_{r_{k̃}}(x̄) for x̄ ∈ X cover X, the compactness of X implies that there exists a finite collection x̄1, . . . , x̄F and K = max{k̃1, · · · , k̃F} such that B_{rK}(x̄1), . . . , B_{rK}(x̄F) cover X. Therefore, for x̄ ∈ X, there exist indices 1 ≤ j0 ≤ F and 1 ≤ j1 ≤ F such that

|ΦN(x̄) − ΦN(x̄j0)| ≤ sup_{|x−x̄j0|<rK} |ΦN(x) − ΦN(x̄j0)| ≤ ε/3,

and

|∇ΦN(x̄) − ∇ΦN(x̄j1)| ≤ sup_{|x−x̄j1|<rK} |∇ΦN(x) − ∇ΦN(x̄j1)| ≤ ε/3.

The compactness of X and continuity of Φ and its gradient ensure that, possibly enlarging F and K, for each x̄ ∈ X, there exists 1 ≤ j ≤ F such that

sup_{|x−x̄j|<rK} |Φ(x) − Φ(x̄j)| ≤ ε/3  and  sup_{|x−x̄j|<rK} |∇Φ(x) − ∇Φ(x̄j)| ≤ ε/3.

For each 1 ≤ j ≤ F, ΦN(x̄j) = N^{−1} Σ_{i=1}^{N} e_{ci}(x̄j, vi) and ∇ΦN(x̄j) = N^{−1} Σ_{i=1}^{N} ∇x e_{ci}(x̄j, vi) converge almost surely to Φ(x̄j) and ∇Φ(x̄j), respectively. Thus, we can find k̂ such that |ΦN(x̄j) − Φ(x̄j)| ≤ ε/3 and |∇ΦN(x̄j) − ∇Φ(x̄j)| ≤ ε/3 whenever N ≥ k̂, for every j. Finally, by the triangle inequality, it follows that, for arbitrary ε > 0 and all N sufficiently large,

sup_{x∈X} |ΦN(x) − Φ(x)| ≤ ε  and  sup_{x∈X} |∇ΦN(x) − ∇Φ(x)| ≤ ε.

Thus, the uniform convergence of ΦN and ∇ΦN is proven. ⊓⊔


References

1. T. Aktekin and T. Ekin. Stochastic call center staffing with uncertain arrival, service and abandonment rates: A
bayesian perspective. Naval Research Logistics (NRL), 63(6):460–478, 2016.
2. P. Billingsley. Probability and Measure. John Wiley and Sons, second edition, 1986.
3. N. Chitsazan, H. V. Pham, and F. T.-C. Tsai. Bayesian chance-constrained hydraulic barrier design under geological
structure uncertainty. Groundwater, 53(6):908–919, 2015.
4. A. M. Faden. The existence of regular conditional probabilities: necessary and sufficient conditions. Ann. Probab.,
13(1):288–298, 1985.
5. T. González Grandón, R. Henrion, and P. Pérez-Aros. Dynamic probabilistic constraints under continuous random
distributions. Math. Program., pages 1065–1096, 2020.
6. A. Hantoute, R. Henrion, and P. Pérez-Aros. Subdifferential characterization of probability functions under Gaussian
distribution. Math. Program., 174(1-2 (B)):167–194, 2019.
7. P. Jaiswal, H. Honnappa, and V. A. Rao. Bayesian joint chance constrained optimization: Approximations and statistical
consistency. SIAM J. Optim., 33(3):1968–1995, 2023.
8. G. L. Jones. On the Markov chain central limit theorem. Probability Surveys, 1:299–320, 2004.
9. J. Kirschner, I. Bogunovic, S. Jegelka, and A. Krause. Distributionally robust bayesian optimization. In S. Chiappa
and R. Calandra, editors, Proceedings of the Twenty Third International Conference on Artificial Intelligence and
Statistics, volume 108 of Proc. Mach. Learn. Res., pages 2174–2184. PMLR, 26–28 Aug 2020.
10. T. L. Lai, H. Xing, and Z. Chen. Mean–variance portfolio optimization when means and covariances are unknown.
Ann. Appl. Stat., 5(2A):798–823, 2011.
11. A. Matza and Y. Bistritz. Infinite Gaussian mixture modeling with an improved estimation of the number of clusters.
Proceedings of the AAAI Conference on Artificial Intelligence, 35(10):8921–8929, 2021.
12. G. McLachlan and D. Peel. Finite mixture models. Wiley Ser. Probab. Stat. Wiley, New York, 2000.
13. K. L. Mengersen. MCMC convergence diagnostics: a review. Bayesian statistics, 6:415–440, 1999.
14. B. S. Mordukhovich. Variational analysis and applications. Springer Monogr. Math. Springer, Cham, 2018.
15. A. Papoulis and S.U. Pillai. Probability, Random Variables, and Stochastic Processes. McGraw-Hill Series in Electrical
Engineering. Communications and Information Theory. Tata McGraw-Hill, 2002.
16. A. Prékopa. Stochastic programming, volume 324 of Mathematics and its Applications. Kluwer Academic Publishers
Group, Dordrecht, 1995.
17. C. Rasmussen. The infinite Gaussian mixture model. In S. Solla, T. Leen, and K. Müller, editors, Advances in Neural
Information Processing Systems, volume 12, pages 554–560. MIT Press, 04 2000.

18. D. Reynolds. Gaussian mixture models. In S. Z. Li and A. K. Jain, editors, Encyclopedia of Biometrics, pages 827–832.
Springer US, Boston, MA, 2015.
19. C.P. Robert and G. Casella. Monte Carlo statistical methods. Springer Verlag, 2004.
20. R.T. Rockafellar and R. J.-B. Wets. Variational Analysis, volume 317 of Grundlehren der mathematischen Wis-
senschaften. Springer Verlag Berlin, 3rd edition, 2009.
21. A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on stochastic programming, volume 9 of MOS-SIAM Ser.
Optim. SIAM, Philadelphia, PA., second edition, 2014. Modeling and theory.
22. A. Shapiro, E. Zhou, and Y. Lin. Bayesian distributionally robust optimization. SIAM J. Optim., 33(2):1279–1304,
2023.
23. W. van Ackooij, I. Aleksovska, and M. Munoz-Zuniga. (Sub-)differentiability of probability functions with elliptical
distributions. Set-Valued Var. Anal., 26(4):887–910, 2018.
24. W. van Ackooij and R. Henrion. Gradient formulae for nonlinear probabilistic constraints with Gaussian and Gaussian-
like distributions. SIAM J. Optim., 24(4):1864–1889, 2014.
25. W. van Ackooij and R. Henrion. (Sub-) Gradient formulae for probability functions of random inequality systems under
Gaussian distribution. SIAM J. Uncertain. Quantif., 5(1):63–87, 2017.
26. W. van Ackooij and P. Pérez-Aros. Generalized differentiation of probability functions acting on an infinite system of
constraints. SIAM J. Optim., 29(3):2179–2210, 2019.
27. W. van Ackooij and P. Pérez-Aros. Generalized differentiation of probability functions: Parameter dependent sets given
by intersections of convex sets and complements of convex sets. Appl. Math. Optim., 85(1):2, Jan 2022.
