Random Batch Method With Momentum Correction: Zhao Chen Zhang
1. Introduction
1.1. Interacting particle systems
Interacting particle systems are a class of stochastic models that have gained significant attention in the realm
of applied mathematics and theoretical physics. These systems, comprised of a large number of individual particles
or agents, exhibit complex and often emergent behaviors that arise from the intricate interactions among the
particles[10, 11]. In general, the interacting particle systems can be represented by the following equation:
$$dX^i = -\nabla V\big(X^i\big)\,dt + \frac{1}{N-1}\sum_{j\neq i} K\big(X^i - X^j\big)\,dt + \sigma\,dB^i. \tag{1.1}$$
In this stochastic differential equation, 𝐾(⋅) is the kernel of the interacting particle system, 𝑉 represents the external flow, independent of the particle interactions, 𝑋 𝑖 represents some property of particle 𝑖, such as its position, 𝐵 𝑖 is a stochastic disturbance, and the diffusion coefficient 𝜎 represents the strength of the disturbance.
At the core of interacting particle systems lies the concept of randomness and probability. Each particle or agent
within the system is governed by a set of stochastic differential equations, which capture the dynamic evolution of the
system over time. These equations incorporate the interactions between the particles, enabling the modeling of a wide
range of phenomena, from the diffusion of molecules in a fluid to the collective behavior of social networks.
The mathematical analysis of interacting particle systems requires an in-depth understanding of probability
theory, stochastic processes, and partial differential equations. In addition to their theoretical significance, interacting
particle systems are widely used in many fields, such as physics, chemistry, biology, economics, and social sciences
[24, 25, 26, 27]. From modeling the spread of outbreaks to understanding financial market dynamics, these systems
have proven to be powerful frameworks for capturing the complex interactions in the world around us.
Due to the significant theoretical and practical implications of interacting particle systems, finding effective
solutions to these systems is of great importance. In other words, devising accurate and efficient numerical methods
for solving interacting particle systems is a problem worthy of in-depth investigation.
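To fix ideas before discussing numerical methods, system (1.1) can be simulated directly with an Euler–Maruyama scheme at O(N²) cost per step, which is precisely the cost that random batch methods aim to reduce. The sketch below is a minimal one-dimensional illustration; the quadratic potential and attractive kernel are assumed for the example only:

```python
import math
import random

def simulate_ips(X, K, gradV, sigma, tau, steps, rng):
    """Euler-Maruyama discretization of the first-order system (1.1):
    dX^i = -grad V(X^i) dt + (1/(N-1)) sum_{j != i} K(X^i - X^j) dt + sigma dB^i.
    Each step costs O(N^2) because every pair of particles interacts."""
    N = len(X)
    for _ in range(steps):
        drift = []
        for i in range(N):
            inter = sum(K(X[i] - X[j]) for j in range(N) if j != i) / (N - 1)
            drift.append(-gradV(X[i]) + inter)
        # Brownian increment dB^i ~ N(0, tau), so its standard deviation is sqrt(tau).
        X = [X[i] + tau * drift[i] + sigma * math.sqrt(tau) * rng.gauss(0.0, 1.0)
             for i in range(N)]
    return X

# Illustrative 1-D choices (assumed for the example): V(x) = x^2/2, K(z) = -z/2.
rng = random.Random(0)
X0 = [rng.uniform(-1.0, 1.0) for _ in range(50)]
X1 = simulate_ips(X0, K=lambda z: -0.5 * z, gradV=lambda x: x,
                  sigma=0.1, tau=0.01, steps=100, rng=rng)
```

With these contractive toy choices the cloud of particles concentrates near the origin; the point of the sketch is only the O(N²) inner double loop.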
∗ Corresponding author: [email protected] (Y. Zhao)
3. The paper provides a theoretical analysis, proving an upper bound on the error of the RBM-M algorithm. It also analyzes in detail the reasons why this error is smaller than that of the RBM algorithm. In addition, from a probabilistic perspective, the statistical properties of RBM-M are demonstrated (see Theorem 3.4, Theorem 3.5, and Figure 7).
4. The RBM-M algorithm introduces a hyperparameter 𝛽, which allows for the flexible control of the correction
strength. As the singularity of the system increases, 𝛽 can be adjusted to obtain higher accuracy. Consequently,
the RBM-M algorithm exhibits improved robustness and flexibility, making it suitable for a wider range of
example systems.
Section 2 is devoted to our proposed algorithm, a momentum-based modification of the random batch method, henceforth referred to as RBM-M. Section 3 provides a rigorous analysis of the error characteristics of the RBM-M approach. Section 4 details numerical experiments conducted to validate the efficacy of the conclusions and algorithms associated with our RBM-M method. Finally, Section 5 discusses potential future research directions and offers concluding remarks.
2. RBM-M
2.1. The deficiency of RBM
While the Random Batch Method (RBM) offers numerous advantages, it also possesses a certain limitation.
Specifically, in order to obtain more accurate results, it is necessary to impose certain restrictions on the interacting
particle system to ensure that the system’s singularity is not excessively large. The following conclusions provide an
analysis of the error associated with RBM under specific conditions [1].
Proposition 2.1. Suppose 𝑉 is strongly convex on ℝ𝑑 so that 𝑥 ↦ 𝑉(𝑥) − ½|𝑥|² is convex, and ∇𝑉, ∇²𝑉 have polynomial growth (i.e., |∇𝑉(𝑥)| + |∇²𝑉(𝑥)| ≤ 𝐶(1 + |𝑥|^𝑞) for some 𝑞 > 0). Assume 𝐾(⋅) is bounded and Lipschitz on ℝ𝑑 with Lipschitz constant 𝐿, and has bounded second-order derivatives. Then the error of RBM satisfies:
$$\sup_{t\ge 0}\|Z(t)\| \le C\sqrt{\frac{\tau}{p-1}+\tau^2} \le C\sqrt{\tau}.$$
It can be observed that in order to minimize the numerical error of the RBM as the time discretization step size
𝜏 approaches zero, it is necessary to ensure that the kernel function 𝐾 of the interacting particle system is relatively
smooth, and the external flow 𝑉 also satisfies certain properties.
However, not all interacting particle systems possess these desirable properties. Furthermore, when the singularity
of the particle system is relatively large, the error associated with RBM can become substantial, and it may even be
the case that the method does not converge at all. Therefore, the challenge of improving the algorithm to enhance the
stability of RBM and obtain accurate results in the presence of particle systems with relatively large singularities is a
problem that merits further investigation.
$$K_\delta(z) = K(z)\,\frac{|z|^2}{|z|^2+\delta^2}. \tag{2.1}$$
$$K_2(b-a) = \frac{C\,(b_y - a_y)}{\|b-a\|^2},$$
where 𝑎 and 𝑏 represent the positions of the two particles, the subscript indicates whether the component is in the x-direction or the y-direction, and 𝐶 is a constant.
For this kernel function, there is a singularity at z = 0. Hence, a regularization method is adopted:
$$K_\delta(z) = K(z)\cdot\frac{\|z\|^2}{\|z\|^2+\delta^2} = C\cdot\frac{z}{\|z\|^2+\delta^2}.$$
Therefore, the new kernel function 𝐾_δ(𝑧) is continuous on ℝ𝑑 and clearly globally bounded, hence 𝐾_δ(𝑧) is Lipschitz continuous on ℝ𝑑. The interacting particle system then satisfies the requirements of Proposition 2.1, so its error is guaranteed by that proposition.
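A quick numerical illustration of this regularization (with the constant C = 1 assumed for the example): the raw kernel blows up near the origin, while the regularized kernel stays bounded, with maximum value 1/(2δ) attained at ‖z‖ = δ:

```python
def K_singular(z):
    """Singular 2-D kernel K(z) = C * z / ||z||^2 (componentwise), with C = 1."""
    r2 = z[0] ** 2 + z[1] ** 2
    return (z[0] / r2, z[1] / r2)

def K_delta(z, delta):
    """Regularized kernel K_delta(z) = K(z) * ||z||^2 / (||z||^2 + delta^2)
    = C * z / (||z||^2 + delta^2): continuous at z = 0 and globally bounded."""
    r2 = z[0] ** 2 + z[1] ** 2
    return (z[0] / (r2 + delta ** 2), z[1] / (r2 + delta ** 2))

# Near the origin the raw kernel blows up while K_delta stays bounded:
z = (1e-6, 0.0)
raw = K_singular(z)[0]       # order 1e6
reg = K_delta(z, 0.1)[0]     # order 1e-4
```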
Algorithm 1 RBM-M
Set 𝐾(𝑋 𝑖 − 𝑋 𝑗) = 0 at time 0.
for m in 1 : [T/𝜏] do
    Divide {1, 2, … , 𝑝𝑛} into n batches randomly.
    for each batch 𝑞 do
        Update the 𝑋 𝑖's (𝑖 ∈ 𝑞) by solving the following SDE for 𝑡 ∈ [𝑡_{𝑚−1}, 𝑡_𝑚):
        $$K_{cor}\big(X^i-X^j\big) = \beta K_{t-1}\big(X^i-X^j\big) + (1-\beta)K_t\big(X^i-X^j\big),$$
        $$dX^i = -\nabla V\big(X^i\big)\,dt + \frac{1}{p-1}\sum_{j\in q,\,j\neq i}K_{cor}\big(X^i-X^j\big)\,dt + \sigma\,dB^i.$$
    end for
end for
The RBM-M is designed to address the challenges associated with interacting particle systems, where the evolution
time spans the interval [0, 𝑇 ], and the discrete time step is denoted as 𝜏. The algorithmic flow can be outlined as follows:
For the solution of the governing equations at each time step, the total 𝑁 particles are divided into 𝑛 groups, each containing 𝑝 particles (if the division is not even, the final group may have fewer than 𝑝 particles). For each particle within a group, the local interactions are computed and used as a proxy for the global interactions to solve the governing equations. After solving the equations for one time step, the particle grouping is reorganized, and this process is repeated until the final time 𝑇 is reached. The interactions calculated within this framework are the cornerstone of the RBM approach to modeling interacting particle systems.
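One step of Algorithm 1 can be sketched in a few lines. The sketch below is a minimal one-dimensional interpretation: batches of size p are reshuffled at every step, the previous batch interaction K_{t−1} is stored per particle, and the corrected drift uses K_cor = βK_{t−1} + (1−β)K_t. The kernel and potential are illustrative choices, not those of the paper's experiments:

```python
import math
import random

def rbm_m_step(X, K_prev, K, gradV, sigma, tau, p, beta, rng):
    """One step of Algorithm 1 (RBM-M) in one dimension: reshuffle particles
    into batches of size p, form the corrected interaction
    K_cor = beta * K_prev + (1 - beta) * K_batch, then do an Euler-Maruyama
    update. Returns the new positions and the per-particle batch interactions."""
    N = len(X)
    idx = list(range(N))
    rng.shuffle(idx)
    X_new = list(X)
    K_batch = [0.0] * N
    for start in range(0, N, p):
        batch = idx[start:start + p]
        for i in batch:
            inter = sum(K(X[i] - X[j]) for j in batch if j != i) / max(len(batch) - 1, 1)
            K_batch[i] = inter
            k_cor = beta * K_prev[i] + (1.0 - beta) * inter
            X_new[i] = (X[i] - tau * gradV(X[i]) + tau * k_cor
                        + sigma * math.sqrt(tau) * rng.gauss(0.0, 1.0))
    return X_new, K_batch

# Illustrative choices (assumed for the example): K(z) = -z, V(x) = x^2/2.
rng = random.Random(1)
X = [rng.uniform(-1.0, 1.0) for _ in range(100)]
K_prev = [0.0] * 100          # K(X^i - X^j) = 0 at time 0, as in Algorithm 1
for _ in range(20):
    X, K_prev = rbm_m_step(X, K_prev, K=lambda z: -z, gradV=lambda x: x,
                           sigma=0.1, tau=0.01, p=10, beta=0.1, rng=rng)
```

Note that only one extra list of stored interactions is needed, matching the remark below that the additional cost of the correction is negligible.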
$$K_{cor}\big(X^i-X^j\big) = \beta K_{t-1}\big(X^i-X^j\big) + (1-\beta)K_t\big(X^i-X^j\big). \tag{2.2}$$
The RBM-M employs an exponentially weighted averaging technique, where each update step combines two components: the interaction at the current time instance and the particle dynamics from the previous time step. The hyperparameter 𝛽, which can be determined empirically, plays a crucial role in this process.
When 𝛽 is smaller, the correction effect is less pronounced, and the update equation primarily considers the
information from the current time step. Conversely, as 𝛽 increases, the impact of momentum becomes more significant,
leading to a stronger corrective effect. However, this also introduces the potential for additional errors.
At the k-th evolution step, the 𝐾_cor term incorporates information from all k preceding evolutions, thereby enhancing the stability and robustness of the algorithm. This approach can be visualized more intuitively through Figure 1. In the diagram, from top to bottom, the first (green) line represents the real interaction, the second (yellow) line represents the RBM-M calculation, and the third (orange) line represents the RBM calculation. The gray dashed line represents the momentum correction effect, that is, the improvement of RBM-M over RBM.
Remark 2.1. In the RBM-M, the hyperparameter 𝛽 exhibits a distinct role compared to Stochastic Gradient Descent
(SGD) with momentum. In RBM-M, the particle position changes at each step, leading to a different distribution at every
time step. Consequently, the previous information can introduce new errors in addition to the correction. To mitigate
Remark 2.2. Compared to the original RBM, RBM-M introduces additional computations when calculating the
interaction. However, this extra information is already available, and no additional data is required. Equivalently, the
extra computation per evolution in RBM-M amounts to two multiplications and one addition, which can be considered
negligible. As a result, the actual computational complexity of RBM-M remains 𝑂(𝑁), similar to RBM.
$$\tilde u^0_p = u^0_p, \tag{3.1}$$
$$\tilde u^i_p = \beta\,\tilde u^{i-1}_p + (1-\beta)\,u^i_p \quad (i \ge 1). \tag{3.2}$$
Expanding this exponentially weighted average recursively, we obtain:
$$\begin{aligned}\tilde u^n_p &= \beta\,\tilde u^{n-1}_p + (1-\beta)u^n_p \\ &= \beta\big(\beta\,\tilde u^{n-2}_p + (1-\beta)u^{n-1}_p\big) + (1-\beta)u^n_p \\ &= \beta^n u^0_p + \beta^{n-1}(1-\beta)u^1_p + \cdots + \beta(1-\beta)u^{n-1}_p + (1-\beta)u^n_p \\ &= \beta^n u^0_p + (1-\beta)\sum_{i=1}^n \beta^{n-i}u^i_p.\end{aligned}$$
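The recursion (3.1)–(3.2) and its closed-form expansion can be checked numerically; the small sketch below compares the two on arbitrary sample values:

```python
def ewma(u, beta):
    """Recursion (3.1)-(3.2): u~_0 = u_0, u~_i = beta*u~_{i-1} + (1-beta)*u_i."""
    ut = u[0]
    for ui in u[1:]:
        ut = beta * ut + (1.0 - beta) * ui
    return ut

def ewma_closed_form(u, beta):
    """Closed form: beta^n * u_0 + (1-beta) * sum_{i=1}^n beta^(n-i) * u_i."""
    n = len(u) - 1
    return beta ** n * u[0] + (1.0 - beta) * sum(
        beta ** (n - i) * u[i] for i in range(1, n + 1))

vals = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
a = ewma(vals, 0.3)
b = ewma_closed_form(vals, 0.3)
# a and b agree up to floating-point rounding
```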
Next, we will estimate the error of the RBM-M algorithm. Before that, we should note that based on our previous
assumptions, the parameters 𝛽 and 𝜏 are both very small. Then, we will present some propositions that will be used to
prove the error bound.
First, we will analyze the errors in adjacent time steps of the equation:
Proposition 3.1. Assuming the conditions of Proposition 2.1 are met, then
$$\|u^t_p - u^{i-1}_p\| \le C\tau, \quad t\in(t_{i-1},t_i). \tag{3.3}$$
Theorem 3.2. Suppose 𝑉 is strongly convex on ℝ𝑑 so that 𝑥 ↦ 𝑉(𝑥) − ½|𝑥|² is convex, and ∇𝑉, ∇²𝑉 have polynomial growth (i.e., |∇𝑉(𝑥)| + |∇²𝑉(𝑥)| ≤ 𝐶(1 + |𝑥|^𝑞) for some 𝑞 > 0). Assume 𝐾(⋅) is bounded and Lipschitz on ℝ𝑑 with Lipschitz constant 𝐿, and has bounded second-order derivatives. Then the error of RBM-M satisfies:
$$\|E - \tilde u^n_p\| \le C(1-\beta^2)\sqrt{\frac{\tau}{p-1}+\tau^2} + C\beta(1-\beta)\tau + O(\beta^2) \lesssim C(1-\beta^2)\sqrt{\tau}. \tag{3.4}$$
$$\|E-\tilde u^n_p\| \le \Big\|E-\Big(\beta^n u^0_p+(1-\beta)\sum_{i=1}^n\beta^{n-i}u^i_p\Big)\Big\|.$$
$$\Big\|E-\Big(\beta^n u^0_p+(1-\beta)\sum_{i=1}^n\beta^{n-i}u^i_p\Big)\Big\| \le \big\|(1-\beta^2)E-(1-\beta)\beta u^{n-1}_{p_1}-(1-\beta)u^n_{p_2}\big\|+O(\beta^2),$$
$$\begin{aligned}&\big\|(1-\beta^2)E-(1-\beta)\beta u^{n-1}_{p_1}-(1-\beta)u^n_{p_2}\big\|+O(\beta^2)\\&=\big\|(1-\beta^2)E-(1-\beta)\beta u^n_{p_1}-(1-\beta)u^n_{p_2}+(1-\beta)\beta\big(u^n_{p_1}-u^{n-1}_{p_1}\big)\big\|+O(\beta^2)\\&\le\beta(1-\beta)\big\|E-u^n_{p_1}\big\|+(1-\beta)\big\|E-u^n_{p_2}\big\|+\beta(1-\beta)\big\|u^n_{p_1}-u^{n-1}_{p_1}\big\|+O(\beta^2).\end{aligned}$$
By Proposition 2.1:
$$\|E-u^n_{p_1}\| \le C\sqrt{\frac{\tau}{p-1}+\tau^2}, \qquad \|E-u^n_{p_2}\| \le C\sqrt{\frac{\tau}{p-1}+\tau^2}.$$
By Proposition 3.1:
$$\|u^n_{p_1}-u^{n-1}_{p_1}\| \le C\tau.$$
$$\begin{aligned}\|E-\tilde u^n_p\| &\le C(1-\beta)\sqrt{\frac{\tau}{p-1}+\tau^2}+C\beta(1-\beta)\sqrt{\frac{\tau}{p-1}+\tau^2}+C\beta(1-\beta)\tau+O(\beta^2)\\&\le C(1-\beta^2)\sqrt{\frac{\tau}{p-1}+\tau^2}+C\beta(1-\beta)\tau+O(\beta^2)\\&\lesssim C(1-\beta^2)\sqrt{\tau}.\end{aligned}$$
We have thus mathematically proven that the error bound of RBM-M is smaller than that of RBM. The error of RBM-M can be decomposed into two terms. The first term of (3.4) is:
$$(1-\beta^2)\sqrt{\frac{\tau}{p-1}+\tau^2},$$
which represents the main part of the error and corresponds to the error of RBM. The second term of (3.4) is:
where the expectation is taken with respect to the random division of batches. Moreover, the variance is given by
$$\operatorname{Var}(X_{m,i}(x)) = \mathbb{E}[X_{m,i}(x)]^2 = \Big(\frac{1}{p-1}-\frac{1}{N-1}\Big)\Lambda_i(t), \tag{3.7}$$
where
$$\Lambda_i(t) = \frac{1}{N-2}\sum_{j\neq i}\Big|K(x_i-x_j) - \frac{1}{N-1}\sum_{l\neq i}K(x_i-x_l)\Big|^2. \tag{3.8}$$
$$\begin{aligned}\tilde K^n_i &= \beta\,\tilde K^{n-1}_i + \frac{1-\beta}{p-1}\sum_{j\in C_n,\,j\neq i}K\big(x^n_i-x^n_j\big) \\ &= \frac{\beta^n}{p-1}\sum_{j\in C_0,\,j\neq i}K\big(x^0_i-x^0_j\big) + \frac{1-\beta}{p-1}\sum_{s=1}^n \beta^{n-s}\sum_{j\in C_s,\,j\neq i}K\big(x^s_i-x^s_j\big).\end{aligned}$$
$$\mathbb{E}(X_{n,i}(x)) = \mathbb{E}\Big(\frac{1}{p-1}\sum_{j\in C,\,j\neq i}K(x_i-x_j) - \frac{1}{N-1}\sum_{j\neq i}K(x_i-x_j)\Big) = 0.$$
Then we have:
$$\begin{aligned}\|\mathbb{E}(Y_{n,i}(x))\| &\le \beta^n\Big\|\mathbb{E}\Big(\frac{1}{p-1}\sum_{j\in C_0,j\neq i}K\big(x^0_i-x^0_j\big)-\frac{1}{N-1}\sum_{j\neq i}K\big(x^n_i-x^n_j\big)\Big)\Big\| \qquad (3.13)\\&\quad+\cdots+\beta(1-\beta)\Big\|\mathbb{E}\Big(\frac{1}{p-1}\sum_{j\in C_{n-1},j\neq i}K\big(x^{n-1}_i-x^{n-1}_j\big)-\frac{1}{N-1}\sum_{j\neq i}K\big(x^n_i-x^n_j\big)\Big)\Big\|\\&\quad+(1-\beta)\Big\|\mathbb{E}\Big(\frac{1}{p-1}\sum_{j\in C_n,j\neq i}K\big(x^n_i-x^n_j\big)-\frac{1}{N-1}\sum_{j\neq i}K\big(x^n_i-x^n_j\big)\Big)\Big\|\\&\le \frac{\beta^n}{N-1}\sum_{j\neq i}\big\|K\big(x^0_i-x^0_j\big)-K\big(x^n_i-x^n_j\big)\big\|\\&\quad+\cdots+\frac{\beta(1-\beta)}{N-1}\sum_{j\neq i}\big\|K\big(x^{n-1}_i-x^{n-1}_j\big)-K\big(x^n_i-x^n_j\big)\big\|\\&\quad+\frac{1-\beta}{N-1}\sum_{j\neq i}\big\|K\big(x^n_i-x^n_j\big)-K\big(x^n_i-x^n_j\big)\big\|.\end{aligned}$$
By Proposition 3.1, the Lipschitz continuity of 𝐾, and the boundedness of ‖𝑥‖, we easily obtain:
$$\begin{aligned}\|\mathbb{E}(Y_{n,i}(x))\| &\le C_1\frac{\beta^n}{N-1}n\tau + C_2\frac{\beta^{n-1}(1-\beta)}{N-1}(n-1)\tau + \cdots + C_{n-1}\frac{\beta(1-\beta)}{N-1}\tau \\ &\le \frac{C\tau}{N-1}\big(n\beta^n + (n-1)\beta^{n-1} + \cdots + \beta\big). \qquad (3.15)\end{aligned}$$
Multiplying this sum by 𝛽 and subtracting (the standard shifted-subtraction trick) gives:
$$n\beta^n + (n-1)\beta^{n-1} + \cdots + \beta = \frac{\beta-\beta^{n+1}}{(1-\beta)^2} - \frac{n\beta^{n+1}}{1-\beta}. \tag{3.16}$$
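Identity (3.16) is easy to verify numerically:

```python
def weighted_sum(beta, n):
    """Left side of (3.16): n*beta^n + (n-1)*beta^(n-1) + ... + 1*beta."""
    return sum(k * beta ** k for k in range(1, n + 1))

def closed_form(beta, n):
    """Right side of (3.16): (beta - beta^(n+1))/(1-beta)^2 - n*beta^(n+1)/(1-beta)."""
    return (beta - beta ** (n + 1)) / (1.0 - beta) ** 2 \
        - n * beta ** (n + 1) / (1.0 - beta)

for beta in (0.1, 0.3, 0.9):
    for n in (1, 5, 50):
        assert abs(weighted_sum(beta, n) - closed_form(beta, n)) < 1e-9
```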
We next establish some properties of the variance showing that the variance of RBM-M is more stable:
Theorem 3.5. Assume that when 𝑡₁ ≠ 𝑡₂, 𝐾(𝑥ᵢ^{𝑡₁} − 𝑥ⱼ^{𝑡₁}) and 𝐾(𝑥ᵢ^{𝑡₂} − 𝑥ⱼ^{𝑡₂}) are independent (this is reasonable since the regrouping at different time steps is independent), and that 𝜏 → 0, 𝛽 → 0. Then we have the following estimate:
$$\operatorname{Var}(Y_{n,i}(x)) < \Big(\frac{(1-\beta)^2}{1-\beta^2} + o\big(\beta^{2n+1}\big)\Big)\operatorname{Var}(X_{n,i}) + 2\tau\,\frac{\beta^2(1-\beta)^2}{(1-\beta^2)^2} \lesssim \frac{(1-\beta)^2}{1-\beta^2}\operatorname{Var}(X_{n,i}), \tag{3.18}$$
where Var(X_{n,i}) is the variance of RBM.
Proof. According to the definition of variance:
$$\operatorname{Var}(Y_{n,i}) = \mathbb{E}\big(Y_{n,i}^2\big) - \big(\mathbb{E}(Y_{n,i})\big)^2,$$
$$\begin{aligned}\mathbb{E}\big(Y_{n,i}^2\big) &= \mathbb{E}\Big|\tilde K^n_i - \frac{1}{N-1}\sum_{j\neq i}K\big(x^n_i-x^n_j\big)\Big|^2\\&= \mathbb{E}\big|\tilde K^n_i\big|^2 - \frac{2}{N-1}\sum_{j\neq i}K\big(x^n_i-x^n_j\big)\,\mathbb{E}\big|\tilde K^n_i\big| + \frac{1}{(N-1)^2}\sum_{j\neq i}K\big(x^n_i-x^n_j\big)\sum_{k\neq i}K\big(x^n_i-x^n_k\big),\end{aligned}$$
$$\begin{aligned}\big(\mathbb{E}(Y_{n,i})\big)^2 &= \Big(\mathbb{E}\Big(\tilde K^n_i - \frac{1}{N-1}\sum_{j\neq i}K\big(x^n_i-x^n_j\big)\Big)\Big)^2\\&= \big(\mathbb{E}(\tilde K^n_i)\big)^2 - \frac{2}{N-1}\sum_{j\neq i}K\big(x^n_i-x^n_j\big)\,\mathbb{E}\big|\tilde K^n_i\big| + \frac{1}{(N-1)^2}\sum_{j\neq i}K\big(x^n_i-x^n_j\big)\sum_{k\neq i}K\big(x^n_i-x^n_k\big).\end{aligned}$$
Subtracting the two expressions, the cross terms cancel:
$$\operatorname{Var}(Y_{n,i}) = \mathbb{E}\big|\tilde K^n_i\big|^2 - \Big(\mathbb{E}\Big(\frac{\beta^n}{p-1}\sum_{j\in C_0,j\neq i}K\big(x^0_i-x^0_j\big) + \frac{1-\beta}{p-1}\sum_{s=1}^n\beta^{n-s}\sum_{j\in C_s,j\neq i}K\big(x^s_i-x^s_j\big)\Big)\Big)^2.$$
By the independence assumption, the variance splits across time steps:
$$\begin{aligned}\operatorname{Var}(Y_{n,i}) &= \beta^{2n}\Big(\mathbb{E}\Big|\frac{1}{p-1}\sum_{j\in C_0,j\neq i}K\big(x^0_i-x^0_j\big)\Big|^2 - \Big(\frac{1}{N-1}\sum_{j\neq i}K\big(x^0_i-x^0_j\big)\Big)^2\Big)\\&\quad+(1-\beta)^2\sum_{s=1}^n\beta^{2(n-s)}\Big(\mathbb{E}\Big|\frac{1}{p-1}\sum_{j\in C_s,j\neq i}K\big(x^s_i-x^s_j\big)\Big|^2 - \Big(\frac{1}{N-1}\sum_{j\neq i}K\big(x^s_i-x^s_j\big)\Big)^2\Big),\end{aligned}$$
hence:
$$\operatorname{Var}(Y_{n,i}) < \frac{(1-\beta)^2}{1-\beta^2}\operatorname{Var}(X_{n,i}).$$
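The variance bound (3.18) can be illustrated with a toy Monte Carlo experiment: if the per-step batch estimates were i.i.d. (the idealized setting of Theorem 3.5), the exponentially weighted average would reduce the variance by roughly (1−β)²/(1−β²) = (1−β)/(1+β). The sketch below uses standard normal surrogates for the batch estimates; it illustrates the statistics only and is not a simulation of the particle system:

```python
import random
import statistics

def ewma_final(samples, beta):
    """Exponentially weighted average of a sequence, returning the last value."""
    out = samples[0]
    for s in samples[1:]:
        out = beta * out + (1.0 - beta) * s
    return out

rng = random.Random(7)
beta, n_steps, n_trials = 0.5, 200, 3000
plain, smoothed = [], []
for _ in range(n_trials):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    plain.append(xs[-1])                    # RBM-style: the latest batch estimate
    smoothed.append(ewma_final(xs, beta))   # RBM-M-style: EWMA of batch estimates

ratio = statistics.variance(smoothed) / statistics.variance(plain)
predicted = (1.0 - beta) / (1.0 + beta)    # (1-beta)^2/(1-beta^2) from (3.18)
```

With β = 0.5 the predicted reduction factor is 1/3, and the empirical ratio agrees up to sampling noise.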
Through the above theoretical analysis, we know that when the hyperparameter 𝛽 is appropriately selected, the
error of the RBM-M algorithm is smaller than that of the RBM, and it exhibits better properties in terms of statistics
and probability. Next, we will conduct some numerical experiments to verify our theoretical conclusions from the
numerical perspective.
$$H_N\big(\tilde\rho^N_t \mid \rho^N_t\big) = \frac{1}{N}\int_{\mathbb{R}^{Nd}} \tilde\rho^N_t \log\frac{\tilde\rho^N_t}{\rho^N_t}\,d\mathbf{x}. \tag{3.19}$$
This part of the proof is mainly referenced from [28, 29], using the solution and corrected kernel of RBM-M to
replace the solution and kernel of RBM.
$$\partial_t\rho^N_t + \sum_{i=1}^N \operatorname{div}_{x_i}\Big(\rho^N_t\big(b(x_i) + K*\rho_t(x_i)\big)\Big) = \sigma\sum_{i=1}^N \Delta_{x_i}\rho^N_t. \tag{3.20}$$
For the original RBM and RBM-M solutions, the Euler–Maruyama scheme is generally used to solve the SDE and simulate the system evolution. To make the distribution smoother, the discrete solution is interpolated continuously in time, which yields the following equation:
$$X_t = X_{T_k} + (t-T_k)\,b\big(X_{T_k}\big) + (t-T_k)\,K^{C_k}\big(X_{T_k}\big) + \sqrt{2\sigma}\,\big(W_t - W_{T_k}\big), \quad t\in(T_k,T_{k+1}). \tag{3.21}$$
In this way, 𝑡 can take values continuously in (𝑇_𝑘, 𝑇_{𝑘+1}), giving a smooth simulation at every time.
For RBM,
$$K^{C_k}\big(X^i_t\big) := \bar K^i_t = \frac{1}{p-1}\sum_{j\in C_k,\,j\neq i}K\big(X^i_t - X^j_t\big). \tag{3.22}$$
For the continuous formulation (3.21), the continuous solution distributions obtained by RBM and RBM-M are denoted by 𝜚ᴺ_𝑡 and 𝜚̃ᴺ_𝑡, respectively, and for them the Liouville equation holds [28]:
Proposition 3.6. Denote by 𝜚^{𝑁,𝐶}_𝑡 the probability density function of 𝑋_𝑡 = (𝑋¹_𝑡, …, 𝑋ᴺ_𝑡) for 𝑡 ∈ [𝑇_𝑘, 𝑇_{𝑘+1}). Then the following Liouville equation holds:
$$\partial_t\varrho^{N,C}_t + \sum_{i=1}^N \operatorname{div}_{x_i}\Big(\varrho^{N,C}_t\big(\bar b^{C,i}_t(x) + \bar K^{C,i}_t(x)\big)\Big) = \sigma\sum_{i=1}^N \Delta_{x_i}\varrho^{N,C}_t, \tag{3.24}$$
where
$$\bar b^{C,i}_t(x) = \mathbb{E}\big[b\big(X^i_{T_k}\big) \mid X_t = x, C\big], \quad t\in[T_k,T_{k+1}), \tag{3.25}$$
and
$$\bar K^{C,i}_t(x) := \mathbb{E}\big[\bar K^i_t \mid X_t = x, C\big], \quad t\in[T_k,T_{k+1}). \tag{3.26}$$
where
$$\tilde b^{C,i}_t(x) = \mathbb{E}\big[b\big(\tilde X^i_{T_k}\big) \mid \tilde X_t = x, C\big], \quad t\in[T_k,T_{k+1}), \tag{3.28}$$
To prove (3.19), let us start with the Fisher information 𝐼(⋅), which satisfies the convolution inequality:
$$\frac{1}{I(p*q)} \ge \frac{1}{I(p)} + \frac{1}{I(q)}. \tag{3.31}$$
Define the deterministic mapping:
$$\tilde\psi^{C_k}_\tau(\mathbf{x}) := \mathbf{x} + \tau\big(\mathbf{b}(\mathbf{x}) + \tilde{\mathbf{K}}^{\mathbf{C_k}}(\mathbf{x})\big), \tag{3.32}$$
where 𝐱, 𝐛(𝐱), 𝐊̃^{𝐂_𝐤}(𝐱) denote the stacked forms of the corresponding variables, i.e.,
$$\mathbf{x} = (x_1,\cdots,x_N)\in\mathbb{R}^{Nd}, \quad \mathbf{b}(\mathbf{x}) = \big(b(x_1),\cdots,b(x_N)\big)^T\in\mathbb{R}^{Nd}, \quad \tilde{\mathbf{K}}^{\mathbf{C_k}}(\mathbf{x}) = \big(\tilde K^{C_k}(x_1),\cdots,\tilde K^{C_k}(x_N)\big)^T\in\mathbb{R}^{Nd},$$
and define the pushforward density
$$\tilde p_k(\mathbf{z}) = \frac{\tilde\varrho^{N,C}_{T_k}(\mathbf{x})}{\det\big(\nabla\tilde\psi^{C_k}_\tau(\mathbf{x})\big)}.$$
Let $q_\tau$ denote the $Nd$-dimensional Gaussian distribution $\mathcal{N}(0, 2\sigma\tau\mathbf{I}_{Nd})$. By (3.21) and (3.32), we have $\tilde\varrho^{N,C}_{T_{k+1}}(\mathbf{x}) = \tilde p_k(\mathbf{x}) * q_\tau$. Note that for RBM-M, the velocity field has the same form as for RBM; moreover, if $K(\cdot)$ is Lipschitz, then $K_{cor}(\cdot)$ is also Lipschitz. We then have the following upper bound for $\tilde\varrho^{N,C}_{T_k}$:
$$\tilde\varrho^{N,C}_{T_k} \le \max\Big(\rho^N_0\,\frac{1+\tau(r+L)}{(1-\tau(r+L))^2},\; M^N,\; \frac{Nd(r+L)(3+\tau(r+L))}{2\sigma}\Big). \tag{3.33}$$
For the specific proof process, see [28]. Recalling (3.21), $\tilde\varrho^{N,C}_t$ is the probability density of RBM-M for a given sequence of batches $\boldsymbol{C} := (C_0, C_1, \cdots, C_k, \cdots)$, so that $\tilde\rho^N_{T_k} = \mathbb{E}_{\boldsymbol{C}}\big[\tilde\varrho^{N,\boldsymbol{C}}_{T_k}\big]$. Moreover, by the Markov property, we are able to define
$$\tilde\rho^{N,C_k}_t := \mathbb{E}\big[\tilde\varrho^{N,\boldsymbol{C}}_t \mid C_i,\, i\ge k\big], \quad t\in[T_k,T_{k+1}).$$
$$F(x_i) = \frac{1}{N}\sum_{j=1}^N K\big(x_i-x_j\big), \qquad \bar K^{C,i}_t(x) = \frac{1}{p}\sum_{j\in C_i} K\big(x_i-x_j\big),$$
and assume
$$\operatorname{ess\,sup}_C\big\|\bar K^{C,i}_t - F\big\|_{L^\infty(\mathbb{R}^d)} < +\infty;$$
then, we have,
$$\sup_t H_N\big(\tilde\rho^N_t \mid \rho^N_t\big) \le H_N\big(\tilde\rho^N_0 \mid \rho^N_0\big) + C_1\tau^2 + \frac{C_2}{N}.$$
Proof. First we switch the order of integration and differentiation, then replace ∂_𝑡ρ̃ᴺ_𝑡 and ∂_𝑡ρᴺ_𝑡 using (3.24) and (3.20), and finally apply Green's formula:
$$\begin{aligned}
\frac{d}{dt}H_N\big(\tilde\rho^N_t \mid \rho^N_t\big) &= \frac{1}{N}\frac{d}{dt}\int_{\mathbb{R}^{Nd}}\tilde\rho^N_t\log\frac{\tilde\rho^N_t}{\rho^N_t}\,d\mathbf{x}\\
&= \frac{1}{N}\int_{\mathbb{R}^{Nd}}\partial_t\tilde\rho^N_t\log\frac{\tilde\rho^N_t}{\rho^N_t}\,d\mathbf{x} + \frac{1}{N}\int_{\mathbb{R}^{Nd}}\tilde\rho^N_t\Big(\frac{\partial_t\tilde\rho^N_t}{\tilde\rho^N_t}-\frac{\partial_t\rho^N_t}{\rho^N_t}\Big)d\mathbf{x}\\
&= \frac{1}{N}\int_{\mathbb{R}^{Nd}}\partial_t\tilde\rho^N_t\Big(\log\frac{\tilde\rho^N_t}{\rho^N_t}+1\Big)d\mathbf{x} + \frac{1}{N}\int_{\mathbb{R}^{Nd}}\partial_t\rho^N_t\Big(-\frac{\tilde\rho^N_t}{\rho^N_t}\Big)d\mathbf{x}\\
&= \frac{1}{N}\sum_{i=1}^N\mathbb{E}_{C_k}\int_{\mathbb{R}^{Nd}}\Big[-\operatorname{div}_{x_i}\Big(\tilde\rho^{N,C_k}_t\big(\tilde b^{C_k,i}_t(\mathbf{x})+\tilde K^{C_k,i}_t(\mathbf{x})\big)\Big)+\sigma\Delta_{x_i}\tilde\rho^N_t\Big]\Big(\log\frac{\tilde\rho^N_t}{\rho^N_t}+1\Big)d\mathbf{x}\\
&\quad+\frac{1}{N}\sum_{i=1}^N\int_{\mathbb{R}^{Nd}}\Big[-\operatorname{div}_{x_i}\Big(\rho^N_t\big(b(x_i)+K*\rho_t(x_i)\big)\Big)+\sigma\Delta_{x_i}\rho^N_t\Big]\Big(-\frac{\tilde\rho^N_t}{\rho^N_t}\Big)d\mathbf{x}\\
&= \frac{1}{N}\sum_{i=1}^N\mathbb{E}_{C_k}\int_{\mathbb{R}^{Nd}}\Big(\tilde\rho^{N,C_k}_t\big(\tilde b^{C_k,i}_t(\mathbf{x})+\tilde K^{C_k,i}_t(\mathbf{x})\big)-\sigma\nabla_{x_i}\tilde\rho^N_t\Big)\cdot\nabla_{x_i}\log\frac{\tilde\rho^N_t}{\rho^N_t}\,d\mathbf{x}\\
&\quad+\frac{1}{N}\sum_{i=1}^N\int_{\mathbb{R}^{Nd}}\Big(\rho^N_t\big(b(x_i)+K*\rho_t(x_i)\big)-\sigma\nabla_{x_i}\rho^N_t\Big)\cdot\Big(-\nabla_{x_i}\frac{\tilde\rho^N_t}{\rho^N_t}\Big)d\mathbf{x}.
\end{aligned}$$
Noting that $\nabla_{x_i}\log\frac{\tilde\rho^N_t}{\rho^N_t}=\frac{\rho^N_t}{\tilde\rho^N_t}\nabla_{x_i}\frac{\tilde\rho^N_t}{\rho^N_t}$ and introducing the auxiliary variables $\tilde K^i_t$ and $F(x_i)=\frac{1}{N}\sum_{j\neq i}K(x_i-x_j)$, the above equation is equivalent to
$$\begin{aligned}
\frac{d}{dt}H_N\big(\tilde\rho^N_t\mid\rho^N_t\big) &= \frac{1}{N}\sum_{i=1}^N\mathbb{E}_{C_k}\int_{\mathbb{R}^{Nd}}\tilde\rho^{N,C_k}_t\big(\tilde b^{C_k,i}_t(\mathbf{x})-b(x_i)\big)\cdot\nabla_{x_i}\log\frac{\tilde\rho^N_t}{\rho^N_t}\,d\mathbf{x} &(3.34)\\
&\quad+\frac{1}{N}\sum_{i=1}^N\mathbb{E}_{C_k}\int_{\mathbb{R}^{Nd}}\Big(\tilde\rho^{N,C_k}_t\tilde K^{C_k,i}_t(\mathbf{x})-\tilde\rho^{N,C_k}_t\tilde K^i_t(x_i)\Big)\cdot\nabla_{x_i}\log\frac{\tilde\rho^N_t}{\rho^N_t}\,d\mathbf{x} &(3.35)\\
&\quad+\frac{1}{N}\sum_{i=1}^N\mathbb{E}_{C_k}\int_{\mathbb{R}^{Nd}}\Big(\tilde\rho^{N,C_k}_t\tilde K^i_t(x_i)-\tilde\rho^N_t F(x_i)\Big)\cdot\nabla_{x_i}\log\frac{\tilde\rho^N_t}{\rho^N_t}\,d\mathbf{x} &(3.36)\\
&\quad+\frac{1}{N}\sum_{i=1}^N\int_{\mathbb{R}^{Nd}}\big(F(x_i)-K*\rho_t(x_i)\big)\tilde\rho^N_t\cdot\nabla_{x_i}\log\frac{\tilde\rho^N_t}{\rho^N_t}\,d\mathbf{x}. &(3.37)
\end{aligned}$$
$$\begin{aligned}
&\frac{1}{N}\sum_{i=1}^N\mathbb{E}_{C_k}\int_{\mathbb{R}^{Nd}}\Big(\tilde\rho^{N,C_k}_t\tilde K^i_t(x_i)-\tilde\rho^N_t F(x_i)\Big)\cdot\nabla_{x_i}\log\frac{\tilde\rho^N_t}{\rho^N_t}\,d\mathbf{x}\\
&= \frac{1}{N}\sum_{i=1}^N\int_{\mathbb{R}^{Nd}}\Big(\mathbb{E}_{C_k,\tilde C_k}\Big[\big(\tilde\rho^{N,C_k}_t-\tilde\rho^{N,\tilde C_k}_t\big)\big(\tilde K^i_t(x_i)-F(x_i)\big)\Big]+\mathbb{E}_{C_k}\big[\tilde K^i_t(x_i)-F(x_i)\big]\,\tilde\rho^N_t\Big)\cdot\nabla_{x_i}\log\frac{\tilde\rho^N_t}{\rho^N_t}\,d\mathbf{x}.
\end{aligned}$$
Noting Young's inequality 𝑎𝑏 ≤ 𝑟𝑎² + 𝑏²/(4𝑟) and applying it, we get
$$\begin{aligned}
&\mathbb{E}_{C_k,\tilde C_k}\int_{\mathbb{R}^{Nd}}\big(\tilde K^i_t(x_i)-F(x_i)\big)\big(\tilde\rho^{N,C_k}_t-\tilde\rho^{N,\tilde C_k}_t\big)\cdot\nabla_{x_i}\log\frac{\tilde\rho^N_t}{\rho^N_t}\,d\mathbf{x}\\
&\le \frac{2}{\sigma}\,\mathbb{E}_{C_k,\tilde C_k}\Big[\int_{\mathbb{R}^{Nd}}\big|\tilde K^i_t(x_i)-F(x_i)\big|^2\,\frac{\big|\tilde\rho^{N,C_k}_t-\tilde\rho^{N,\tilde C_k}_t\big|^2}{\tilde\rho^{N,\tilde C_k}_t}\,d\mathbf{x}\Big]+\frac{\sigma}{8}\int_{\mathbb{R}^{Nd}}\tilde\rho^N_t\Big|\nabla_{x_i}\log\frac{\tilde\rho^N_t}{\rho^N_t}\Big|^2 d\mathbf{x},
\end{aligned}$$
$$\begin{aligned}
&\int_{\mathbb{R}^{Nd}}\mathbb{E}_{C_k}\big[F(x_i)-\tilde K^i_t(x_i)\big]\,\tilde\rho^N_t\cdot\nabla_{x_i}\log\frac{\tilde\rho^N_t}{\rho^N_t}\,d\mathbf{x}\\
&\le \frac{2}{\sigma}\int_{\mathbb{R}^{Nd}}\Big(\mathbb{E}_{C_k}\big[\tilde K^i_t(x_i)-F(x_i)\big]\Big)^2\tilde\rho^N_t\,d\mathbf{x}+\frac{\sigma}{8}\int_{\mathbb{R}^{Nd}}\tilde\rho^N_t\Big|\nabla_{x_i}\log\frac{\tilde\rho^N_t}{\rho^N_t}\Big|^2 d\mathbf{x}.
\end{aligned}$$
$$\mathbb{E}(f\log f)-(\mathbb{E}f)\log(\mathbb{E}f)\le\frac{1}{2}\mathbb{E}\big(f\,|\nabla\log f|^2\big). \tag{3.39}$$
Applying (3.39), we obtain
$$\begin{aligned}
\int_{\mathbb{R}^{Nd}}\tilde\rho^N_t\Big|\nabla_{x_i}\log\frac{\tilde\rho^N_t}{\rho^N_t}\Big|^2 d\mathbf{x} &= \mathbb{E}_{\rho^N_t}\Big[\frac{\tilde\rho^N_t}{\rho^N_t}\Big|\nabla_{x_i}\log\frac{\tilde\rho^N_t}{\rho^N_t}\Big|^2\Big]\\
&\ge 2\,\mathbb{E}_{\rho^N_t}\Big[\frac{\tilde\rho^N_t}{\rho^N_t}\log\frac{\tilde\rho^N_t}{\rho^N_t}\Big]-2\,\mathbb{E}_{\rho^N_t}\Big[\frac{\tilde\rho^N_t}{\rho^N_t}\Big]\log\mathbb{E}_{\rho^N_t}\Big[\frac{\tilde\rho^N_t}{\rho^N_t}\Big]\ge c\,H_N\big(\tilde\rho^N_t\mid\rho^N_t\big).
\end{aligned}$$
The same method is applied to the other terms in (3.34)–(3.37), and combining them yields:
$$\frac{d}{dt}H_N\big(\tilde\rho^N_t\mid\rho^N_t\big)\le-\frac{\sigma}{2C_{LS}}H_N\big(\tilde\rho^N_t\mid\rho^N_t\big)+\frac{2}{\sigma N}\sum_{i=1}^N\mathbb{E}_{C_k}\int_{\mathbb{R}^d}\big|\tilde b^i_t(x)-b(x_i)\big|^2\tilde\rho^{N,C_k}_t\,dx,$$
where $C_{LS}$ and $c$ are positive constants. Since RBM and RBM-M are almost identical except for the kernel, the estimates of identical terms such as $J_1$ and $J_4$ can be taken directly from the analysis in [28]. The estimates of the terms related to $\tilde K(\cdot)$ are given here. For $J_2$, we have $\tilde K^i_t(x_i) = \mathbb{E}\big[\tilde K^i_t(\tilde X^i_t)\mid\tilde{\mathbf{X}}_t=\mathbf{x},C_k\big]$, and we apply Taylor's expansion for any $t\in[T_k,T_{k+1})$:
$$\begin{aligned}\tilde K^{C_k,i}_t(x_i)-\tilde K^i_t(x_i) &= \mathbb{E}\big[\tilde K^i_t\big(\tilde X^i_{T_k}\big)-\tilde K^i_t\big(\tilde X^i_t\big)\mid\tilde{\mathbf{X}}_t=\mathbf{x},C_k\big]\\ &= \mathbb{E}\big[\tilde X^i_{T_k}-\tilde X^i_t\mid\tilde{\mathbf{X}}_t=\mathbf{x},C_k\big]\cdot\nabla_{x_i}\tilde K^i_t(x_i)+\hat r_t(x_i),\end{aligned}$$
Note that the Hessians of $K$ have at most polynomial growth. Ignoring the higher orders of $\beta$, and using $\sup_{t\ge 0}\mathbb{E}\big(|\tilde X^i_t|^p\mid C\big)\le C_p$ from [30]:
$$\begin{aligned}\big|\hat r_t(x_i)\big| &\le \int_0^1\mathbb{E}\Big[\big|\nabla^2_{x_i}\tilde K^i_t\big((1-s)\tilde X^i_t+s\tilde X^i_{T_k}\big)\big|\cdot\big|\tilde X^i_{T_k}-\tilde X^i_t\big|^2\mid\tilde{\mathbf{X}}_t=\mathbf{x},C_k\Big]ds\\ &\lesssim C\,\mathbb{E}\Big[\Big|\beta\big(1+|\tilde X^i_t|^p+|\tilde X^i_{T_k}|^p\big)+\beta(1-\beta)\big(1+|\tilde X^i_{t-1}|^p+|\tilde X^i_{T_k-1}|^p\big)\Big|\cdot\big|\tilde X^i_{T_k}-\tilde X^i_t\big|^2\mid\tilde{\mathbf{X}}_t=\mathbf{x},C_k\Big]\\ &\le \tilde C\,\mathbb{E}\Big[\big|\tilde X^i_{T_k}-\tilde X^i_t\big|^2\mid\tilde{\mathbf{X}}_t=\mathbf{x},C_k\Big].\end{aligned}$$
The remaining part of $J_2$ does not involve the kernel, and the conclusion follows from the proof in [28]:
$$J_2 \le \tilde c'\,\tau^2\Big(3 + H_N\big(\tilde\rho^N_{T_k}\mid\rho^N_{T_k}\big)\Big).$$
Then for $J_3$, using the assumption that $|\bar K^i_t - F(x_i)|$ is bounded and that the kernel $K(\cdot)$ is Lipschitz continuous:
$$\big|\tilde K^i_t(x_i)-F(x_i)\big| \le \big|\bar K^i_t(x_i)-F(x_i)\big|+\big|\bar K^i_t(x_i)-\tilde K^i_t(x_i)\big| \le C_{bound}+C_L L,$$
Table 1
Error of RBM-M for different time steps 𝜏.

𝜏 (×10⁻³)   error
16          0.0188
4           0.0102
1           0.0051
0.25        0.0025
which means $|\tilde K^i_t - F(x_i)|$ is also bounded; then we only need to estimate
$$\mathbb{E}_{C_k,\tilde C_k}\Big[\int_{\mathbb{R}^{Nd}}\frac{\big|\tilde\rho^{N,C_k}_t-\tilde\rho^{N,\tilde C_k}_t\big|^2}{\tilde\rho^{N,\tilde C_k}_t}\,d\mathbf{x}\Big],$$
which has been given by [28]. So we have
$$J_3 \le \tilde c''\,\tau^2\Big(1 + H_N\big(\tilde\rho^N_{T_k}\mid\rho^N_{T_k}\big)\Big).$$
Finally, combining all of the above results, we obtain
$$\frac{d}{dt}H_N\big(\tilde\rho^N_t\mid\rho^N_t\big)\le-C_0\,H_N\big(\tilde\rho^N_t\mid\rho^N_t\big)+C_1\tau^2+\frac{C_2}{N},$$
and then
$$\sup_t H_N\big(\tilde\rho^N_t\mid\rho^N_t\big)\le H_N\big(\tilde\rho^N_0\mid\rho^N_0\big)+C_1\tau^2+\frac{C_2}{N}.$$
4. Numerical experiment
This section presents a set of numerical experiments that verify the theories of the previous section. In the following experiments, regardless of whether the RBM or the RBM-M algorithm is used, the total number of particles 𝑁 in the interacting particle system is 10,000, and the number of particles 𝑝 in each group is 360. This setting takes computational efficiency into account while avoiding the numerical instability caused by too few particles per group.
The following numerical experiments correspond one-to-one to the preceding theoretical analysis, covering the convergence order after regularizing the kernel function, the error of RBM-M, and the good statistical properties of RBM-M. In addition, the numerical experiments supplement the theoretical results to some extent.
4.1. The error is proportional to √𝜏
According to Theorem 3.2, the error of RBM-M should be proportional to the square root of the discrete time step 𝜏.
We select the Biot–Savart kernel to verify this conclusion. In the corresponding interacting particle system, the kernel function is [9]:
$$K(z) = \frac{z^\perp}{\|z\|^2}.$$
Here 𝑧⊥ is the vector perpendicular to 𝑧 (i.e., 𝑧 rotated by 90°). For simplicity, the stochastic disturbance is taken to be standard normal, and the external flow 𝑉 of the system is 0.
The numerical results are recorded in Table 1 and strongly support the claim: as the discrete time step becomes ¼ of its original value, the error of the algorithm roughly halves. This shows that the error is proportional to √𝜏, which is exactly what the theory predicts.
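The observed convergence order can be extracted from the Table 1 data by comparing successive error ratios:

```python
import math

# (tau, error) pairs read from Table 1 (tau in units of 1e-3).
data = [(16.0, 0.0188), (4.0, 0.0102), (1.0, 0.0051), (0.25, 0.0025)]

# Estimated order between consecutive rows: log(e1/e2) / log(tau1/tau2).
orders = [math.log(e1 / e2) / math.log(t1 / t2)
          for (t1, e1), (t2, e2) in zip(data, data[1:])]
```

The estimated orders cluster around 0.5, consistent with the O(√τ) rate of Theorem 3.2.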
$$dV^i_t = \frac{1}{N}\sum_{j\neq i} K\big(\big|X^j_t - X^i_t\big|\big)\big(V^j_t - V^i_t\big)\,dt + \sigma\,dB^i_t, \tag{4.2}$$
In this second-order system, the variable 𝑥 can be interpreted as the position of the particle, and the variable 𝑣 can be
interpreted as the velocity of the particle. This provides a clear physical interpretation of the entire system. The system
is a concrete representation of Newton’s second law of motion. Equation (4.1) shows that the position of the particle
is determined by the product of velocity and time, while Equation (4.2) can be understood as the effect of an external
force on the particle, namely the acceleration, and the velocity is determined by the acceleration.
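One Euler–Maruyama step of the second-order system (4.1)–(4.2) can be sketched as follows; the communication weight K(r) = 1/(1+r²) is an illustrative Cucker–Smale-type choice, assumed for the example rather than taken from the paper's experiments:

```python
import math
import random

def second_order_step(X, V, K, sigma, tau, rng):
    """One Euler-Maruyama step of the second-order system:
    dX^i = V^i dt (eq. 4.1),
    dV^i = (1/N) sum_{j != i} K(|X^j - X^i|)(V^j - V^i) dt + sigma dB^i (eq. 4.2)."""
    N = len(X)
    V_new = []
    for i in range(N):
        force = sum(K(abs(X[j] - X[i])) * (V[j] - V[i])
                    for j in range(N) if j != i) / N
        V_new.append(V[i] + tau * force + sigma * math.sqrt(tau) * rng.gauss(0.0, 1.0))
    X_new = [X[i] + tau * V[i] for i in range(N)]
    return X_new, V_new

rng = random.Random(2)
X = [rng.uniform(-1.0, 1.0) for _ in range(30)]
V = [0.0] * 30                 # initial velocities are zero, as in the text
for _ in range(20):
    X, V = second_order_step(X, V, K=lambda r: 1.0 / (1.0 + r * r),
                             sigma=0.1, tau=0.001, rng=rng)
```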
We set the initial conditions of the stochastic differential equation (SDE) such that the initial values of 𝑋 are
distributed uniformly within a unit circle, while the initial values of 𝑉 are set to 0. Additionally, the evolution time
is set to 𝑇 = 0.02𝑠, and the discrete time step is 𝜏 = 0.001𝑠. In this case, we will compare the real solution of the
interacting particle system, the numerical solution of RBM, and the numerical solution of RBM-M (with a control
parameter 𝛽 = 0.01). The results are shown in Figures 2 and 3.
It can be intuitively observed from the above figure that both RBM and RBM-M have achieved good results.
In addition to the visual comparison in the figures above, the specific 𝐿² errors of RBM and RBM-M are also listed here. The 𝐿² error is defined as:
$$l = \sqrt{\sum_{i=1}^N\big(x_i - x_i^{true}\big)^2}. \tag{4.4}$$
Here 𝑥𝑖 represents the numerical value and 𝑥𝑖𝑡𝑟𝑢𝑒 represents the real solution. Under this definition, the specific
numerical result is: the error of RBM is 4.489 × 10−2 , and the error of RBM-M is 4.419 × 10−2 . In this example,
although the error of RBM-M is slightly lower than that of RBM, the improvement is not significant. We speculate that because this second-order interacting particle system is relatively smooth, the correction brought by the introduction of momentum is small. At the same time, the introduction of momentum may itself add some error, which is why the control parameter 𝛽 is kept small in this case. From another perspective, however, the performance of RBM-M should be no worse than that of RBM, provided that the parameter 𝛽 is reasonably selected.
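The error metric (4.4) is straightforward to compute; a minimal sketch:

```python
import math

def l2_error(x, x_true):
    """L2 error of (4.4): l = sqrt(sum_i (x_i - x_true_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_true)))

assert abs(l2_error([1.0, 2.0], [1.0, 0.0]) - 2.0) < 1e-12
```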
To further demonstrate that the RBM-M algorithm can indeed achieve good results when the interacting particle
system has a large singularity, we construct the following system:
$$dX^i = -\nabla V\big(X^i\big)\,dt + \frac{1}{N-1}\sum_{j\neq i}K\big(X^i - X^j\big)\,dt + \sigma\,dB^i, \tag{4.5}$$
$$K_y(s_i - s) = \frac{\cosh(y_i - y)}{\|s_i - s\|^2}. \tag{4.7}$$
Here 𝑠 = (𝑥, 𝑦) represents the position of a particle, and 𝐾ₓ(⋅) and 𝐾_y(⋅) represent the components of the kernel function in the x- and y-directions, respectively. The additional system flow is:
$$-\nabla V(s_i) = \begin{pmatrix}0 \\ \cos(x_i)\end{pmatrix}. \tag{4.8}$$
Due to the large singularity of this system, RBM-M theoretically needs a stronger correction effect, so the control parameter is given a larger value, 𝛽 = 0.1. The remaining settings are consistent with the second-order system above: the initial values of 𝑋 are distributed uniformly in a unit circle, the terminal evolution time is 𝑇 = 0.02𝑠, and the discrete time step is 𝜏 = 0.001𝑠. The reference solution, the RBM solution, and the RBM-M solution are shown in Figure 4.
Visually, there is again little difference between the two numerical algorithms. Comparing the 𝐿² error defined in (4.4), however, the error of RBM is 7.946 × 10⁻², while the error of RBM-M is only 4.691 × 10⁻². In this example the gap between the two algorithms is substantial, and RBM-M clearly outperforms RBM.
In addition, it should be noted that in this example, solving the original equation with 𝑁 = 10,000 particles takes 54.48𝑠, while with the same number of particles it takes only 4.09𝑠 with RBM and 4.45𝑠 with RBM-M. This shows that both RBM and RBM-M significantly reduce the computation time, and that RBM-M adds little overhead compared to RBM.

Table 2
The effect of parameter 𝛽 on the algorithm.
At the same time, this example allows us to explore the influence of the control parameter 𝛽 on the correction effect of the RBM-M algorithm. We take 𝛽 from {0.04, 0.06, 0.08, 0.1, 0.12}, run RBM-M with each selected 𝛽, and compare the accuracy of the algorithm under the different parameter settings; the results are in Table 2.
As can be seen from Table 2, as the control parameter 𝛽 increases from 0.04 to 0.12, the error of RBM-M becomes 56.41%, 43.19%, 43.67%, 59.09% and 79.88% of the original error, respectively; that is, when 𝛽 is selected reasonably, the error can be reduced to less than half of the original. Due to the large singularity of the system, the corrective effect of the algorithm is significant. Therefore, at first, as 𝛽 increases, the performance of the RBM-M algorithm improves gradually. However, as 𝛽 continues to increase, the corrective effect saturates, the positive impact of further increasing 𝛽 becomes limited, and the error it introduces grows, resulting in a deterioration of the final outcome. In summary, when the system singularity is small, the selected 𝛽 should be small; when the system singularity is large, the selected 𝛽 should be large. In addition, we also tested other systems. The kernel functions employed in the various systems are as follows:
K1: Biot–Savart kernel:
$$K(z) = \frac{z^\perp}{\|z\|^2}.$$
K2: second-order system:
$$dV^i_t = \frac{1}{N}\sum_{j\neq i} K\big(\big|X^j_t - X^i_t\big|\big)\big(V^j_t - V^i_t\big)\,dt + \sigma\,dB^i_t.$$
$$K(\mathbf{x}) = C_r\,e^{-\|\mathbf{x}\|/l_r} - C_a\,e^{-\|\mathbf{x}\|/l_a}.$$
y-dimension:
$$\frac{e^{-(y_i-y)^2}}{\|s_i-s\|}.$$
y-dimension:
$$\frac{\cosh(y_i-y)}{\|s_i-s\|^2}.$$
$$K(z) = \frac{\ln\big(1-e^{z^2}\big)}{\tan(\|z\|^2)/10 + 2\pi}.$$
$$f(x) = \begin{cases} -\dfrac{1}{x-1}, & x < 1-\alpha, \\[2pt] \dfrac{|x-1|}{\alpha^2}, & 1-\alpha < x < 1+\alpha, \\[2pt] \dfrac{1}{x-1}, & x > 1+\alpha, \end{cases} \tag{4.9}$$
where 𝛼 is a parameter measuring the steepness of the function: the smaller 𝛼 is, the steeper the function (when selecting this function, the reason for not centering it at 0 is to avoid divisions by 0 that may reduce numerical accuracy). Figure 5 shows the graph of 𝑓(𝑥).
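The piecewise function (4.9) is continuous at the junctions x = 1 ± α (both sides approach 1/α there), and its peak grows like 1/α as α shrinks; a small sketch checking this:

```python
def f(x, alpha):
    """Piecewise function (4.9); smaller alpha makes the kink at x = 1 steeper."""
    if x < 1.0 - alpha:
        return -1.0 / (x - 1.0)
    if x > 1.0 + alpha:
        return 1.0 / (x - 1.0)
    return abs(x - 1.0) / alpha ** 2

# Continuity at the junctions; both one-sided values approach 1/alpha:
for alpha in (0.5, 0.1, 0.01):
    left = f(1.0 - alpha - 1e-12, alpha)
    mid = f(1.0 - alpha + 1e-12, alpha)
    assert abs(left - mid) < 1e-6 / alpha
```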
If 𝛼 → 0, the function 𝑓 (𝑥) becomes a discontinuous function. In this situation, there is a significant degree of
singularity associated with 𝑓 (𝑥). If we were to use the standard RBM approach in this case, the error would be quite
large. However, as demonstrated by the proof presented earlier, the RBM-M algorithm outperforms the standard RBM
when the function exhibits a high degree of singularity. Let us now compare the effects of these two approaches in
figure 6:
The figure compares the errors of the two methods. Each data point represented in the figure is the average of 10 repeated calculations, in order to mitigate the effects of random factors. The closer the value on the x-axis is to 1, the smaller 𝛼 is, which means the function is steeper and has greater singularity. From the graph, it can be observed that as 𝛼 decreases, the singularity of the function increases, and the error of the standard RBM becomes increasingly larger than that of RBM-M.
The specific data is in table 4.
In this numerical experiment, we set the RBM-M coefficient 𝛽 to 0.1. When the function is flatter, we use a smaller 𝛽, and when the function is steeper, we use a larger 𝛽 (the final gap narrows because the singularity of the function becomes very large, so an even larger 𝛽 should be selected). All of these choices follow Theorem 3.4.
Moreover, we can explain the main idea of RBM and RBM-M from a sampling perspective. As (3.5) and (3.11) show, the core concept is to use the distribution of a few particles to estimate the overall distribution of all particles. When the singularity of the kernel is relatively small and the distribution is relatively flat, this estimate is easy. However, if the kernel singularity is relatively high, as mentioned earlier, accurate estimation becomes more challenging. Figure 7 illustrates the sampling processes of the two methods. In each time step, the RBM
Table 4
Comparison of errors of two algorithms for functions with different singularity.
Data availability
No data was used for the research described in the article.
Acknowledgement
Z.-W. Zhang’s research is . J.-R. Chen’s work . The authors are also very grateful to anonymous reviewers for their
valuable comments and suggestions, which have greatly helped to improve our manuscript.
References
[1] S. Jin, L. Li, J.-G. Liu, Random batch methods (RBM) for interacting particle systems, Journal of Computational Physics 400 (2020) 108877. doi:10.1016/j.jcp.2019.108877.
[2] Z. Wang, J. Xin, Z. Zhang, DeepParticle: Learning invariant measure by a deep neural network minimizing Wasserstein distance on data generated from an interacting particle method, Journal of Computational Physics 464 (2022) 111309. doi:10.1016/j.jcp.2022.111309.
[3] B. Perthame, PDE models for chemotactic movements: Parabolic, hyperbolic and kinetic, Applications of Mathematics 49 (2004) 539–564.
[4] L. Greengard, V. Rokhlin, A fast algorithm for particle simulations, Journal of Computational Physics 73 (2) (1987) 325–348. doi:10.1016/0021-9991(87)90140-9.
[5] J. S. Bagla, TreePM: A code for cosmological N-body simulations, Journal of Astrophysics and Astronomy 23 (1999) 185–196.
[6] Y. Yao, J. Capecelatro, An accurate particle-mesh method for simulating charged particles in wall-bounded flows, Powder Technology 387 (2021) 239–250. doi:10.1016/j.powtec.2021.04.012.