Random Batch Method with Momentum Correction

Yanshun Zhao^a, Jingrun Chen^b,∗ and Zhiwen Zhang^c,∗


a School of Mathematical Sciences, University of Science and Technology of China, Hefei 230026, People's Republic of China
b School of Mathematical Sciences, University of Hong Kong, Pokfulam, Hong Kong
c School of Mathematical Sciences and Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou 215132, People's Republic of China


arXiv:2412.15581v1 [math.NA] 20 Dec 2024

ARTICLE INFO

Keywords: Interacting particle systems, Fast algorithm, Random batch method, Momentum

MSC: 65F10, 65K05, 65Y20

ABSTRACT

The Random Batch Method (RBM) is an effective technique to reduce the computational complexity when solving certain stochastic differential equations (SDEs) involving interacting particles. It transforms the computational complexity from O(N²) to O(N), where N represents the number of particles. However, the traditional RBM can only be effectively applied to interacting particle systems with relatively smooth kernel functions. To address the non-convergence of the RBM in particle systems with significant singularities, we propose enhanced methods to make the modified algorithm more widely applicable. The improvement primarily revolves around a momentum-like correction, and we refer to the enhanced algorithm as the Random Batch Method with Momentum Correction (RBM-M). We provide a theoretical proof that controls the error of the algorithm, which ensures that under ideal conditions it has a smaller error than the original algorithm. Finally, numerical experiments demonstrate the effectiveness of the RBM-M algorithm.

1. Introduction
1.1. Interacting particle systems
Interacting particle systems are a class of stochastic models that have gained significant attention in applied mathematics and theoretical physics. These systems, composed of a large number of individual particles or agents, exhibit complex and often emergent behaviors that arise from the intricate interactions among the particles [10, 11]. In general, an interacting particle system can be represented by the following equation:
$$dX^i = -\nabla V\left(X^i\right)dt + \frac{1}{N-1}\sum_{j\neq i} K\left(X^i - X^j\right)dt + \sigma\,dB^i. \qquad (1.1)$$

In this stochastic differential equation, 𝐾(⋅) is the kernel of the interacting particle system, 𝑉 represents the external flow independent of the particle interactions, 𝑋^𝑖 represents some property of the 𝑖-th particle, such as its position, 𝐵^𝑖 is a stochastic disturbance (a Brownian motion), and the diffusion coefficient 𝜎 represents the strength of the disturbance.
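To make the cost of direct simulation concrete, equation (1.1) can be discretized with an Euler–Maruyama scheme, as sketched below. The quadratic potential and the smooth kernel are illustrative stand-ins, not choices made in the paper:

```python
import numpy as np

def simulate_direct(X0, K, grad_V, sigma, tau, T, seed=0):
    """Euler-Maruyama discretization of (1.1):
    dX^i = -grad V(X^i) dt + (1/(N-1)) sum_{j != i} K(X^i - X^j) dt + sigma dB^i.
    Summing over all pairs makes each step cost O(N^2)."""
    rng = np.random.default_rng(seed)
    X = X0.copy()
    N, d = X.shape
    for _ in range(int(T / tau)):
        diffs = X[:, None, :] - X[None, :, :]            # (N, N, d) pairwise differences
        inter = K(diffs).sum(axis=1) - K(np.zeros(d))    # remove the j = i self term
        X = X + (-grad_V(X) + inter / (N - 1)) * tau     # drift update
        X = X + sigma * np.sqrt(tau) * rng.standard_normal((N, d))  # diffusion update
    return X

# Illustrative (not from the paper): quadratic confinement V(x) = |x|^2 / 2
# and a smooth, bounded interaction kernel.
grad_V = lambda x: x
K_smooth = lambda z: -z * np.exp(-np.sum(z**2, axis=-1, keepdims=True))
X_final = simulate_direct(np.random.default_rng(1).standard_normal((50, 2)),
                          K_smooth, grad_V, sigma=0.1, tau=0.01, T=1.0)
```

The O(N²) per-step cost of this direct scheme is exactly what the fast methods discussed next aim to avoid.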
At the core of interacting particle systems lies the concept of randomness and probability. Each particle or agent
within the system is governed by a set of stochastic differential equations, which capture the dynamic evolution of the
system over time. These equations incorporate the interactions between the particles, enabling the modeling of a wide
range of phenomena, from the diffusion of molecules in a fluid to the collective behavior of social networks.
The mathematical analysis of interacting particle systems requires an in-depth understanding of probability
theory, stochastic processes, and partial differential equations. In addition to their theoretical significance, interacting
particle systems are widely used in many fields, such as physics, chemistry, biology, economics, and social sciences
[24, 25, 26, 27]. From modeling the spread of outbreaks to understanding financial market dynamics, these systems
have proven to be powerful frameworks for capturing the complex interactions in the world around us.
Due to the significant theoretical and practical implications of interacting particle systems, finding effective
solutions to these systems is of great importance. In other words, devising accurate and efficient numerical methods
for solving interacting particle systems is a problem worthy of in-depth investigation.
∗ Corresponding author.
[email protected] (Y. Zhao)

Zhao et al.: Preprint submitted to Elsevier Page 1 of 25


Upon initial consideration, the solution to this problem may appear to be a straightforward numerical approach
to solving the underlying stochastic differential equations, which can be addressed using techniques such as the
Euler method [12] or Runge-Kutta method [13]. However, further examination reveals certain challenges. Examining
equation (1.1), we can observe that if we attempt to solve it directly using numerical methods, the computational complexity will scale quadratically with the number of particles, 𝑁, in the interacting particle system. This is because, for each particle, the equation accounts for the interactions with all other particles, resulting in a per-particle cost of 𝑂(𝑁). With 𝑁 particles in the system, the overall computational complexity becomes 𝑂(𝑁²).
For interacting particle systems with a large number of particles, this quadratic computational complexity poses a
significant challenge, as it leads to lengthy simulation times. Therefore, if we aim to solve large-scale interacting particle
systems in industrial applications, it may not be appropriate to directly apply methods like the Euler or Runge-Kutta
techniques to obtain numerical solutions to the stochastic differential equations. Instead, we need to develop improved
algorithms that can reduce the computational complexity as much as possible while maintaining the necessary accuracy.

1.2. Related work


To mitigate the computational complexity inherent in simulating interacting particle systems, various methods have
been proposed by previous researchers. For instance, the Fast Multipole Method (FMM) has been a widely adopted
technique since the late 20th century [4]. The FMM provides a means to reduce the computational complexity to
𝑂(𝑁 log 𝑁), or even 𝑂(𝑁) in certain cases, by exploiting the inherent hierarchical structure of the problem. The core
idea of the FMM is to partition the computational domain into a hierarchical tree-like structure, where each node in the
tree represents a cluster of particles or objects. Through the use of a series of multipole expansions and local Taylor
series expansions, the algorithm is able to efficiently approximate the long-range interactions between distant clusters,
while accurately computing the short-range interactions between nearby particles. The implementation of the FMM
involves several key steps, including the construction of the hierarchical tree structure, the computation of multipole
and local Taylor series expansions, and the efficient traversal of the tree to accumulate the contributions from distant
clusters. The algorithm also employs various optimization techniques, such as adaptive refinement of the tree structure
and the utilization of parallel computing architectures, to further enhance its computational performance.
Additionally, the Tree Code method is another fast algorithm for calculating multi-body interactions. This method
also utilizes a hierarchical tree structure to represent the particle distribution, and can significantly reduce the
computational complexity by aggregating the contributions from distant particles [5].
Furthermore, there exist mesh-based methods, such as the Particle-Mesh Method [6], which transform the
continuous space into a grid and perform calculations on the grid. This approach can leverage fast Fourier transform
and other techniques to efficiently compute long-range interactions, thereby substantially reducing the computational
complexity.
In recent years, a novel algorithm known as the Random Batch Method (RBM) has gained significant attention from
many researchers due to its simplicity of implementation and impressive computational efficiency [1, 16]. The core
principle underlying RBM is to divide the particle system into smaller, randomly selected batches, and then perform
the computations on these batches rather than the entire system. This approach effectively mitigates the computational
complexity from the traditional 𝑂(𝑁 2 ) scaling to a more manageable 𝑂(𝑁) or even better scaling. The key advantage
of RBM lies in its ability to approximate the long-range interactions between distant particles by considering only
the nearby particles within each randomly selected batch. Furthermore, RBM can be easily parallelized, and this
parallelization capability serves to further enhance the computational performance of the algorithm, making it well-
suited for simulating large-scale particle systems. Notably, the implementation of RBM is much simpler compared to
the more sophisticated Fast Multipole Method (FMM) and tree code methods. This unique combination of simplicity
and high computational efficiency has rendered RBM a popular choice among researchers and practitioners across a
wide range of applications, including fluid mechanics, plasma physics, and molecular dynamics.
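A minimal sketch of one standard RBM step, assuming batches of size p drawn by a uniform random shuffle (the kernel and potential are illustrative placeholders, not choices from the paper):

```python
import numpy as np

def rbm_step(X, K, grad_V, sigma, tau, p, rng):
    """One RBM step: shuffle the N particles into batches of size p and let each
    particle interact only with the others in its batch, for an O(Np) = O(N) cost."""
    N, d = X.shape
    idx = rng.permutation(N)
    for start in range(0, N, p):
        batch = idx[start:start + p]
        m = len(batch)                                  # the last batch may be smaller
        if m < 2:
            continue
        Xb = X[batch]
        diffs = Xb[:, None, :] - Xb[None, :, :]         # (m, m, d) within-batch differences
        inter = K(diffs).sum(axis=1) - K(np.zeros(d))   # drop the j = i self term
        Xb = Xb + (-grad_V(Xb) + inter / (m - 1)) * tau
        X[batch] = Xb + sigma * np.sqrt(tau) * rng.standard_normal((m, d))
    return X

# Illustrative placeholder dynamics: quadratic confinement and a smooth kernel.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 2))
for _ in range(100):                                    # 100 steps of size tau = 0.01
    X = rbm_step(X, lambda z: -z * np.exp(-np.sum(z**2, axis=-1, keepdims=True)),
                 lambda x: x, sigma=0.1, tau=0.01, p=4, rng=rng)
```

Re-shuffling at every step is what makes the batch interaction an unbiased surrogate for the full sum, as quantified in Section 3.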

1.3. Advantages of the Algorithm


However, despite the good results achieved by the above fast algorithms, some limitations remain. Therefore, this paper proposes the RBM with Momentum Correction (RBM-M) algorithm, which is an evolution of the standard RBM algorithm and offers several advantages. The main contributions of this work are as follows:
1. The RBM-M algorithm shares a similar underlying concept with the RBM algorithm, making it relatively simple
to understand and implement.



2. In the face of particle systems with relatively large singularities, the RBM-M algorithm can achieve higher accuracy than the standard RBM (see Theorem 3.2), while maintaining a similar computational time. Numerical experiments demonstrate that the error of RBM-M can be reduced to half that of RBM in some cases (see Table 3 for details).

3. The paper provides a theoretical analysis, proving an upper bound on the error of the RBM-M algorithm, and analyzes in detail why this error is smaller than that of the RBM algorithm. In addition, from a probabilistic perspective, the statistical properties of RBM-M are demonstrated (see Theorem 3.4, Theorem 3.5, and Figure 7 for details).

4. The RBM-M algorithm introduces a hyperparameter 𝛽, which allows for flexible control of the correction strength. As the singularity of the system increases, 𝛽 can be adjusted to obtain higher accuracy. Consequently, the RBM-M algorithm exhibits improved robustness and flexibility, making it suitable for a wider range of particle systems.
Section 2 is devoted to our proposed algorithm, a momentum-based modification of the random batch method, henceforth referred to as RBM-M. We then provide a rigorous analysis of the error characteristics of this
RBM-M approach in Section 3. Section 4 will then detail numerical experiments conducted to validate the efficacy of
the conclusions and algorithms associated with our RBM-M method. Finally, Section 5 will discuss potential future
research directions and offer concluding remarks.

2. RBM-M
2.1. The deficiency of RBM
While the Random Batch Method (RBM) offers numerous advantages, it also possesses a certain limitation.
Specifically, in order to obtain more accurate results, it is necessary to impose certain restrictions on the interacting
particle system to ensure that the system’s singularity is not excessively large. The following conclusions provide an
analysis of the error associated with RBM under specific conditions [1].

Proposition 2.1. Suppose 𝑉 is strongly convex on ℝ^𝑑 so that 𝑥 ↦ 𝑉(𝑥) − ½|𝑥|² is convex, and ∇𝑉, ∇²𝑉 have polynomial growth (i.e., |∇𝑉(𝑥)| + |∇²𝑉(𝑥)| ≤ 𝐶(1 + |𝑥|^𝑞) for some 𝑞 > 0). Assume 𝐾(⋅) is bounded and Lipschitz on ℝ^𝑑 with Lipschitz constant 𝐿, and has bounded second-order derivatives. Then the error of RBM satisfies:

$$\sup_{t\ge 0}\|Z(t)\| \le C\sqrt{\frac{\tau}{p-1} + \tau^2} \le C\sqrt{\tau}.$$

It can be observed that in order to minimize the numerical error of the RBM as the time discretization step size
𝜏 approaches zero, it is necessary to ensure that the kernel function 𝐾 of the interacting particle system is relatively
smooth, and the external flow 𝑉 also satisfies certain properties.
However, not all interacting particle systems possess these desirable properties. Furthermore, when the singularity
of the particle system is relatively large, the error associated with RBM can become substantial, and it may even be
the case that the method does not converge at all. Therefore, the challenge of improving the algorithm to enhance the
stability of RBM and obtain accurate results in the presence of particle systems with relatively large singularities is a
problem that merits further investigation.

2.2. Regularization of kernel


When the RBM was first introduced, certain restrictions were imposed on the kernel function of the interacting
particle systems in order to ensure the numerical solution converges to the true solution. Since these restrictions on the
kernel function limit the scope of applicability, it would be desirable to explore ways to relax these constraints, thereby
expanding the range of problems that RBM can effectively address [14, 15]. The regularization is:

$$K_\delta(z) = K(z)\,\frac{|z|^2}{|z|^2 + \delta^2}. \qquad (2.1)$$



This is a very simple modification, but it achieves good results. After this regularization, some kernels are smoothed so that they satisfy the convergence condition, such as the Keller–Segel kernel [17] and the Biot–Savart kernel [18]. Take the Keller–Segel (KS) kernel as an example.

For the KS model, the kernel function 𝐾 in two-dimensional space is:

$$K(z) = C\cdot\frac{z}{\|z\|^2}.$$
Specifically, in the 2D example, the x-component is

$$K_1(b-a) = \frac{C\,(b_x - a_x)}{\|b-a\|^2},$$

and the y-component is

$$K_2(b-a) = \frac{C\,(b_y - a_y)}{\|b-a\|^2},$$

where 𝑎 and 𝑏 represent the positions of the two particles, the subscripts indicate whether the component is in the x- or y-direction, and 𝐶 is a constant.
This kernel function has a singularity at 𝑧 = 0. Hence, the regularization is adopted:

$$K_\delta(z) = K(z)\cdot\frac{\|z\|^2}{\|z\|^2 + \delta^2} = C\cdot\frac{z}{\|z\|^2 + \delta^2}.$$

The new kernel function 𝐾_𝛿(𝑧) is therefore continuous on ℝ^𝑑, and clearly 𝐾_𝛿(𝑧) is globally bounded; hence 𝐾_𝛿(𝑧) is Lipschitz continuous on ℝ^𝑑. The interacting particle system then satisfies the requirements of Proposition 2.1, so its error is guaranteed by the proposition.
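A small numerical check of this regularization, with C and δ chosen arbitrarily for illustration, shows how K_δ removes the blow-up near z = 0 while leaving the far field essentially unchanged:

```python
import numpy as np

def K_ks(z, C=1.0):
    """Singular Keller-Segel-type kernel K(z) = C z / |z|^2 (blows up at z = 0)."""
    r2 = np.sum(z**2, axis=-1, keepdims=True)
    return C * z / r2

def K_ks_delta(z, delta, C=1.0):
    """Regularized kernel K_delta(z) = C z / (|z|^2 + delta^2), bounded by C / (2 delta)."""
    r2 = np.sum(z**2, axis=-1, keepdims=True)
    return C * z / (r2 + delta**2)

z_near = np.array([1e-3, 0.0])   # close to the singularity
z_far = np.array([3.0, 4.0])     # far from the singularity
# Near the origin the raw kernel is huge while the regularized one stays bounded:
print(np.linalg.norm(K_ks(z_near)), np.linalg.norm(K_ks_delta(z_near, delta=0.1)))
# Far from the origin the two kernels nearly coincide:
print(np.linalg.norm(K_ks(z_far) - K_ks_delta(z_far, delta=0.1)))
```

The bound C/(2δ) follows from maximizing |z|/(|z|² + δ²) at |z| = δ, which is why the regularized kernel fits the boundedness hypothesis of Proposition 2.1.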

2.3. Correcting the algorithm with momentum


Based on the introduction of previous related work, we have gained a certain understanding of the RBM. The key idea behind RBM is to leverage randomness to process all particles in batches. Guided by this principle, small, randomly selected batches of particle interactions are used, thereby reducing the computational cost of binary interactions in the 𝑁-particle system from 𝑂(𝑁²) per time step to 𝑂(𝑁) per time step.
Carefully examining the algorithmic steps of RBM, one can observe that its underlying concept is highly similar
to that of Stochastic Gradient Descent (SGD) [7, 19]. Both methods involve processing the entire dataset by randomly
selecting a portion of the samples, with the role of this partial sample approximating that of the entire dataset. Using
this random subset instead of the full dataset can significantly reduce the computational complexity.
However, there are certain issues associated with the use of SGD. One of the primary shortcomings is the inherent
noise in the gradient estimates. Since the gradient is computed using a single data point or a small batch of data points,
the resulting gradient can be noisy and may not accurately represent the true gradient of the entire dataset. This can
lead to oscillations in the parameter updates and slow down the convergence of the algorithm, particularly in the later
stages of the optimization process. To address this issue, various techniques have been proposed, such as the use of
momentum. Momentum-based methods, like the Nesterov Accelerated Gradient (NAG) algorithm, incorporate the
previous gradient updates to smooth out the parameter updates and accelerate the convergence [8, 20].
Upon a careful examination of the challenges encountered with the RBM, it has been observed that the algorithm’s
accuracy is diminished when applied to interacting particle systems with relatively large singularities. We postulate that
this issue arises from a similar shortcoming as that of Stochastic Gradient Descent (SGD) algorithms. Specifically, since
the particle interactions are computed using small batches of data points, the resulting interactions may not accurately
represent the true interactions of the entire particle system. This can lead to inaccurate calculation of the interaction
terms when the governing equations are updated, consequently introducing substantial errors.
Drawing inspiration from the advancements in momentum correction for SGD, we have explored the feasibility of incorporating momentum into the RBM algorithm to rectify its numerical performance. Accordingly, we propose a variant, named RBM with Momentum Correction (RBM-M), which aims to address the limitations observed in the original RBM formulation:

Algorithm 1 RBM-M
  Set 𝐾(𝑋^𝑖 − 𝑋^𝑗) = 0 when time = 0
  for m in 1 : [𝑇/𝜏] do
      Divide {1, 2, …, 𝑝𝑛} into 𝑛 batches randomly.
      for each batch 𝑞 do
          Update the 𝑋^𝑖's (𝑖 ∈ 𝑞) by solving the following SDE for 𝑡 ∈ [𝑡_{𝑚−1}, 𝑡_𝑚):
              𝐾_cor(𝑋^𝑖 − 𝑋^𝑗) = 𝛽 𝐾_{𝑡−1}(𝑋^𝑖 − 𝑋^𝑗) + (1 − 𝛽) 𝐾_𝑡(𝑋^𝑖 − 𝑋^𝑗)
              𝑑𝑋^𝑖 = −∇𝑉(𝑋^𝑖) 𝑑𝑡 + (1/(𝑝−1)) Σ_{𝑗∈𝑞, 𝑗≠𝑖} 𝐾_cor(𝑋^𝑖 − 𝑋^𝑗) 𝑑𝑡 + 𝜎 𝑑𝐵^𝑖
      end for
  end for

The RBM-M is designed to address the challenges associated with interacting particle systems, where the evolution
time spans the interval [0, 𝑇 ], and the discrete time step is denoted as 𝜏. The algorithmic flow can be outlined as follows:
For the solution of the governing equations at each time step, the total 𝑁 particles are divided into 𝑛 groups, each
containing 𝑝 particles (if the division is not evenly distributed, the final group may have fewer than 𝑝 particles). For
each element within the group 𝐶𝑝 , the local interactions are computed and used as a proxy for the global interactions
to solve the governing equations. After solving the equations for one time step, the particle grouping is reorganized,
and the aforementioned process is repeated until the final time 𝑇 is reached. The interactions calculated within this
algorithmic framework serve as the cornerstone for the RBM’s approach to modeling interacting particle systems.
$$K_{cor}\left(X^i - X^j\right) = \beta K_{t-1}\left(X^i - X^j\right) + (1-\beta)\,K_t\left(X^i - X^j\right). \qquad (2.2)$$

The RBM-M employs an exponential weighted averaging technique, where each update step comprises two com-
ponents: the interaction of the current time instance and the particle dynamics from the previous time step. The
hyperparameter 𝛽, which can be determined empirically, plays a crucial role in this process.
When 𝛽 is smaller, the correction effect is less pronounced, and the update equation primarily considers the
information from the current time step. Conversely, as 𝛽 increases, the impact of momentum becomes more significant,
leading to a stronger corrective effect. However, this also introduces the potential for additional errors.
By the k-th evolution, the 𝐾_cor term incorporates information from all the preceding k evolutions, thereby enhancing the stability and robustness of the algorithm. This approach can be visualized more intuitively through Figure 1. In the diagram, from top to bottom, the first (green) line represents the real interaction, the second (yellow) line represents the RBM-M calculation, and the third (orange) line represents the RBM calculation. The gray dashed line represents the momentum correction effect, that is, the improvement of RBM-M over RBM.

Figure 1: RBM-M algorithm diagram.
Remark 2.1. In the RBM-M, the hyperparameter 𝛽 exhibits a distinct role compared to Stochastic Gradient Descent
(SGD) with momentum. In RBM-M, the particle position changes at each step, leading to a different distribution at every
time step. Consequently, the previous information can introduce new errors in addition to the correction. To mitigate the impact of these introduced errors, the hyperparameter 𝛽 in RBM-M is required to be relatively small, typically less
than 0.1, as further analyzed in the theoretical section. In contrast, SGD with momentum operates on a consistent data
distribution, where the gradient of each update is theoretically subject to the same distribution. Therefore, to enhance
the algorithm’s stability, the updates in SGD with momentum are made in small increments, i.e., 𝛽 is set to a large
value, and the newly added information is only modified by a small amplitude.

Remark 2.2. Compared to the original RBM, RBM-M introduces additional computations when calculating the
interaction. However, this extra information is already available, and no additional data is required. Equivalently, the
extra computation per evolution in RBM-M amounts to two multiplications and one addition, which can be considered
negligible. As a result, the actual computational complexity of RBM-M remains 𝑂(𝑁), similar to RBM.
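One natural reading of Algorithm 1, following the exponentially weighted average K̃ analyzed in Section 3.2, is to store each particle's interaction term from the previous step and blend it with the freshly computed batch interaction. The sketch below assumes this per-particle storage (the paper does not prescribe a data layout), with the same illustrative kernel and potential as before:

```python
import numpy as np

def rbm_m_step(X, K_tilde, K, grad_V, sigma, tau, p, beta, rng):
    """One RBM-M step. K_tilde[i] stores particle i's interaction term from the
    previous step; the corrected term blends it with the current batch interaction,
    K_cor = beta * K_previous + (1 - beta) * K_current, as in equation (2.2)."""
    N, d = X.shape
    idx = rng.permutation(N)
    for start in range(0, N, p):
        batch = idx[start:start + p]
        m = len(batch)
        if m < 2:
            continue
        Xb = X[batch]
        diffs = Xb[:, None, :] - Xb[None, :, :]
        current = (K(diffs).sum(axis=1) - K(np.zeros(d))) / (m - 1)
        K_tilde[batch] = beta * K_tilde[batch] + (1 - beta) * current  # momentum correction
        Xb = Xb + (-grad_V(Xb) + K_tilde[batch]) * tau
        X[batch] = Xb + sigma * np.sqrt(tau) * rng.standard_normal((m, d))
    return X, K_tilde

# K_tilde starts at zero, matching "K = 0 when time = 0" in Algorithm 1.
rng = np.random.default_rng(0)
X, K_tilde = rng.standard_normal((30, 2)), np.zeros((30, 2))
for _ in range(100):
    X, K_tilde = rbm_m_step(X, K_tilde,
                            lambda z: -z * np.exp(-np.sum(z**2, axis=-1, keepdims=True)),
                            lambda x: x, sigma=0.1, tau=0.01, p=4, beta=0.05, rng=rng)
```

The overhead relative to plain RBM is one multiply-add per particle per step, consistent with the O(N) complexity claim of Remark 2.2.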

3. Theoretical analysis of RBM-M


Having described the RBM-M algorithm in detail, we now provide a proof of the algorithm's performance.

3.1. The error of RBM-M


First, we denote the true solution as 𝐸. Additionally, we represent the solution obtained by the RBM algorithm as 𝑢^𝑖_𝑝 and the solution obtained by the RBM-M algorithm as 𝑢̃^𝑖_𝑝, where the index 𝑖 denotes the 𝑖-th iteration and 𝑝 represents the number of particles per group.

Second, let us review the algorithmic process of RBM-M:

$$\tilde{u}^0_p = u^0_p, \qquad (3.1)$$

$$\tilde{u}^i_p = \beta\,\tilde{u}^{i-1}_p + (1-\beta)\,u^i_p \quad (i\ge 1). \qquad (3.2)$$

Through some calculations related to exponential weighted averaging, we obtain:

$$\begin{aligned}\tilde{u}^n_p &= \beta\tilde{u}^{n-1}_p + (1-\beta)u^n_p\\ &= \beta\big(\beta\tilde{u}^{n-2}_p + (1-\beta)u^{n-1}_p\big) + (1-\beta)u^n_p\\ &= \beta^n u^0_p + \beta^{n-1}(1-\beta)u^1_p + \cdots + \beta(1-\beta)u^{n-1}_p + (1-\beta)u^n_p\\ &= \beta^n u^0_p + (1-\beta)\sum_{i=1}^n \beta^{n-i}u^i_p.\end{aligned}$$
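The equivalence of the recursive form (3.2) and this unrolled closed form can be checked numerically with arbitrary stand-in values for the u_p^i:

```python
import numpy as np

rng = np.random.default_rng(0)
beta, n = 0.05, 20
u = rng.standard_normal(n + 1)      # stand-ins for u_p^0, ..., u_p^n

# Recursive form (3.1)-(3.2): u~^i = beta * u~^{i-1} + (1 - beta) * u^i
u_tilde = u[0]
for i in range(1, n + 1):
    u_tilde = beta * u_tilde + (1 - beta) * u[i]

# Unrolled closed form: beta^n u^0 + (1 - beta) sum_{i=1}^n beta^{n-i} u^i
closed = beta**n * u[0] + (1 - beta) * sum(beta**(n - i) * u[i] for i in range(1, n + 1))
assert np.isclose(u_tilde, closed)
```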

Next, we will estimate the error of the RBM-M algorithm. Before that, we should note that based on our previous
assumptions, the parameters 𝛽 and 𝜏 are both very small. Then, we will present some propositions that will be used to
prove the error bound.
First, we will analyze the errors in adjacent time steps of the equation:
Proposition 3.1. Assuming the conditions in Proposition 2.1 are met, then

$$\|u^t_p - u^{i-1}_p\| \le C\tau, \quad t\in(t_{i-1}, t_i). \qquad (3.3)$$

The proposition can be proved directly by the boundedness of kernel function.


With the preliminary preparation, the error bounds of RBM-M can be proved:

Theorem 3.2. Suppose 𝑉 is strongly convex on ℝ^𝑑 so that 𝑥 ↦ 𝑉(𝑥) − ½|𝑥|² is convex, and ∇𝑉, ∇²𝑉 have polynomial growth (i.e., |∇𝑉(𝑥)| + |∇²𝑉(𝑥)| ≤ 𝐶(1 + |𝑥|^𝑞) for some 𝑞 > 0). Assume 𝐾(⋅) is bounded and Lipschitz on ℝ^𝑑 with Lipschitz constant 𝐿, and has bounded second-order derivatives. Then the error of RBM-M satisfies:

$$\|E - \tilde{u}^n_p\| \le C(1-\beta^2)\sqrt{\frac{\tau}{p-1} + \tau^2} + C\beta(1-\beta)\tau + O(\beta^2) \lesssim C(1-\beta^2)\sqrt{\tau}. \qquad (3.4)$$



Proof. Substituting the definition of 𝑢̃^𝑛_𝑝:

$$\|E - \tilde{u}^n_p\| \le \Big\|E - \Big(\beta^n u^0_p + (1-\beta)\sum_{i=1}^n \beta^{n-i}u^i_p\Big)\Big\|.$$

Since 𝛽 is very small, the higher-order terms in 𝛽 can be ignored:

$$\Big\|E - \Big(\beta^n u^0_p + (1-\beta)\sum_{i=1}^n \beta^{n-i}u^i_p\Big)\Big\| \le \|(1-\beta^2)E - (1-\beta)\beta u^{n-1}_{p_1} - (1-\beta)u^n_{p_2}\| + O(\beta^2),$$

where 𝑝₁ and 𝑝₂ are two different groupings in RBM.

Let 𝑢^{𝑛−1}_{𝑝₁} continue running for time 𝜏, considering 𝑢^𝑛_{𝑝₁}:

$$\begin{aligned}&\|(1-\beta^2)E - (1-\beta)\beta u^{n-1}_{p_1} - (1-\beta)u^n_{p_2}\| + O(\beta^2)\\
&= \|(1-\beta^2)E - (1-\beta)\beta u^n_{p_1} - (1-\beta)u^n_{p_2} + (1-\beta)\beta(u^n_{p_1} - u^{n-1}_{p_1})\| + O(\beta^2)\\
&\le \beta(1-\beta)\|E - u^n_{p_1}\| + (1-\beta)\|E - u^n_{p_2}\| + \beta(1-\beta)\|u^n_{p_1} - u^{n-1}_{p_1}\| + O(\beta^2).\end{aligned}$$

By Proposition 2.1:

$$\|E - u^n_{p_1}\| \le C\sqrt{\frac{\tau}{p-1} + \tau^2}, \qquad \|E - u^n_{p_2}\| \le C\sqrt{\frac{\tau}{p-1} + \tau^2}.$$

By Proposition 3.1:

$$\|u^n_{p_1} - u^{n-1}_{p_1}\| \le C\tau.$$

Note that, compared to 𝜏, 𝜏² is very small, so:

$$\begin{aligned}\|E - \tilde{u}^n_p\| &\le C(1-\beta)\sqrt{\frac{\tau}{p-1} + \tau^2} + C\beta(1-\beta)\sqrt{\frac{\tau}{p-1} + \tau^2} + C\beta(1-\beta)\tau + O(\beta^2)\\
&\le C(1-\beta^2)\sqrt{\frac{\tau}{p-1} + \tau^2} + C\beta(1-\beta)\tau + O(\beta^2)\\
&\lesssim C(1-\beta^2)\sqrt{\tau}.\end{aligned}$$

We have mathematically proven that the error of the RBM-M is smaller than that of the RBM.
The error of the RBM-M can be decomposed into two terms. The first term of the error (3.4) is

$$C(1-\beta^2)\sqrt{\frac{\tau}{p-1} + \tau^2},$$

which represents the main part of the error and corresponds to the error of the RBM. The second term of the error (3.4) is

$$C\beta(1-\beta)\tau + O(\beta^2),$$



which represents the additional error introduced by the momentum term.
Based on the form of the error expression, we hypothesize that the RBM-M algorithm exhibits better performance when the kernel of the RBM has a relatively large singularity. The rationale is that when the kernel has a large singularity, the RBM error $C\sqrt{\frac{\tau}{p-1} + \tau^2}$ is significant, and the effect of the coefficient (1 − 𝛽²) becomes more prominent. The additional term 𝐶𝛽(1 − 𝛽)𝜏 + 𝑂(𝛽²), however, has little relevance to the singularity of the kernel: it simply grows as 𝛽 grows. Therefore, the overall error of the RBM-M is a balanced result. When the singularity of the kernel is large, the advantages brought by the momentum correction outweigh the disadvantages caused by the additional error term, enabling the RBM-M to achieve a more accurate solution. We validate this hypothesis through numerical experiments in the subsequent analysis.

3.2. Good statistical properties about variance and expectations


On the other hand, we can explain the advantages of the RBM-M algorithm from a probabilistic perspective. We
first consider the situation of the RBM, for which the following theoretical analysis [1] has been provided, with its
expectation and variance respectively:
Proposition 3.3. Consider 𝑝 ≥ 2 and a given fixed 𝑥 = (𝑥₁, …, 𝑥_𝑁) ∈ ℝ^{𝑁𝑑}. Denote:

$$X_{m,i}(x) = \frac{1}{p-1}\sum_{j\in C_i,\,j\neq i}K(x_i - x_j) - \frac{1}{N-1}\sum_{j\neq i}K(x_i - x_j). \qquad (3.5)$$

Then, for all 𝑖,

$$\mathbb{E}X_{m,i}(x) = 0, \qquad (3.6)$$

where the expectation is taken with respect to the random division of batches. Moreover, the variance is given by

$$\mathrm{Var}(X_{m,i}(x)) = \mathbb{E}[X_{m,i}(x)]^2 = \Big(\frac{1}{p-1} - \frac{1}{N-1}\Big)\Lambda_i(t), \qquad (3.7)$$

where

$$\Lambda_i(t) = \frac{1}{N-2}\sum_{j\neq i}\Big|K(x_i - x_j) - \frac{1}{N-1}\sum_{l\neq i}K(x_i - x_l)\Big|^2. \qquad (3.8)$$

Then we consider RBM-M, denoting:

$$\tilde{K}(x^i_0 - x^j_0) = K(x^i_0 - x^j_0), \qquad (3.9)$$

$$\tilde{K}(x^i_n - x^j_n) = \beta\,\tilde{K}(x^i_{n-1} - x^j_{n-1}) + (1-\beta)\,K(x^i_n - x^j_n) \quad (n\ge 1), \qquad (3.10)$$

where 𝑥^𝑖, 𝑥^𝑗 denote different particles and the subscript 𝑛 denotes the 𝑛-th time step.


Let 𝐾̃ be the exponentially weighted average of the interacting particle term:

$$\begin{aligned}\tilde{K}^i_n &= \beta\tilde{K}^i_{n-1} + \frac{1-\beta}{p-1}\sum_{j\in C_n,\,j\neq i}K(x^i_n - x^j_n)\\
&= \frac{\beta^n}{p-1}\sum_{j\in C_0,\,j\neq i}K(x^i_0 - x^j_0) + \frac{1-\beta}{p-1}\sum_{s=1}^n\beta^{n-s}\sum_{j\in C_s,\,j\neq i}K(x^i_s - x^j_s),\end{aligned}$$

where 𝐶_𝑠 denotes the batch containing 𝑖 at the 𝑠-th step.


Define:

$$Y_{n,i}(x) = \tilde{K}^i_n - \frac{1}{N-1}\sum_{j\neq i}K(x^i_n - x^j_n). \qquad (3.11)$$



Next, we calculate and prove some statistical properties of RBM-M. The first is unbiasedness, stated in the following theorem:

Theorem 3.4. The interaction term of RBM-M is an asymptotically unbiased estimate of the true interaction term, which means that:

$$\lim_{\substack{\tau\to 0\\ \beta\to 0}}\mathbb{E}(Y_{n,i}(x)) = 0. \qquad (3.12)$$

Proof. By the mathematical definition of expectation, we know that

$$\mathbb{E}(X_{n,i}(x)) = \mathbb{E}\Big(\frac{1}{p-1}\sum_{j\in C,\,j\neq i}K(x^i - x^j) - \frac{1}{N-1}\sum_{j\neq i}K(x^i - x^j)\Big) = 0.$$

Then we have:

$$\begin{aligned}\|\mathbb{E}(Y_{n,i}(x))\| &\le \beta^n\,\Big\|\mathbb{E}\Big(\frac{1}{p-1}\sum_{j\in C_0,\,j\neq i}K(x^i_0 - x^j_0) - \frac{1}{N-1}\sum_{j\neq i}K(x^i_n - x^j_n)\Big)\Big\|\\
&\quad + \cdots + \beta(1-\beta)\,\Big\|\mathbb{E}\Big(\frac{1}{p-1}\sum_{j\in C_{n-1},\,j\neq i}K(x^i_{n-1} - x^j_{n-1}) - \frac{1}{N-1}\sum_{j\neq i}K(x^i_n - x^j_n)\Big)\Big\|\\
&\quad + (1-\beta)\,\Big\|\mathbb{E}\Big(\frac{1}{p-1}\sum_{j\in C_n,\,j\neq i}K(x^i_n - x^j_n) - \frac{1}{N-1}\sum_{j\neq i}K(x^i_n - x^j_n)\Big)\Big\| \qquad (3.13)\\
&\le \frac{\beta^n}{N-1}\sum_{j\neq i}\|K(x^i_0 - x^j_0) - K(x^i_n - x^j_n)\|\\
&\quad + \cdots + \frac{\beta(1-\beta)}{N-1}\sum_{j\neq i}\|K(x^i_{n-1} - x^j_{n-1}) - K(x^i_n - x^j_n)\|\\
&\quad + \frac{1-\beta}{N-1}\sum_{j\neq i}\|K(x^i_n - x^j_n) - K(x^i_n - x^j_n)\|.\end{aligned}$$

By Proposition 3.1, the Lipschitz continuity of 𝐾, and the boundedness of ‖𝑥‖, we easily get:

$$\|K(x^i_n - x^j_n) - K(x^i_0 - x^j_0)\| \le Cn\tau. \qquad (3.14)$$

Substituting equation (3.14) into equation (3.13):

$$\|\mathbb{E}(Y_{n,i}(x))\| \le \frac{\beta^n}{N-1}C_1 n\tau + \frac{\beta^{n-1}(1-\beta)}{N-1}C_2(n-1)\tau + \cdots + \frac{\beta(1-\beta)}{N-1}C_{n-1}\tau \le \frac{C\tau}{N-1}\big(n\beta^n + (n-1)\beta^{n-1} + \cdots + \beta\big). \qquad (3.15)$$

Using misalignment subtraction (the standard technique for arithmetic–geometric series):

$$n\beta^n + (n-1)\beta^{n-1} + \cdots + \beta = \frac{\beta - \beta^{n+1}}{(1-\beta)^2} - \frac{n\beta^{n+1}}{1-\beta}. \qquad (3.16)$$

Substituting equation (3.16) into equation (3.15):

$$\|\mathbb{E}(Y_{n,i}(x))\| \le \frac{C\tau}{N-1}\Big(\frac{\beta-\beta^{n+1}}{(1-\beta)^2} - \frac{n\beta^{n+1}}{1-\beta}\Big) \le \frac{C\beta\tau}{(1-\beta)^2(N-1)}. \qquad (3.17)$$



So we get:

$$\lim_{\substack{\tau\to 0\\ \beta\to 0}}\mathbb{E}(Y_{n,i}(x)) = 0.$$
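The misalignment-subtraction identity (3.16) and the resulting uniform bound β/(1 − β)² used in (3.17) are easy to confirm numerically for arbitrary β and n:

```python
beta, n = 0.3, 12

# Left side of (3.16): the arithmetic-geometric sum n b^n + (n-1) b^{n-1} + ... + b.
lhs = sum(k * beta**k for k in range(1, n + 1))

# Right side of (3.16), obtained by misalignment subtraction.
rhs = (beta - beta**(n + 1)) / (1 - beta)**2 - n * beta**(n + 1) / (1 - beta)
assert abs(lhs - rhs) < 1e-12

# The uniform bound used to reach (3.17):
assert lhs <= beta / (1 - beta)**2
```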

Next, some properties of the variance show that the variance of RBM-M is more stable:
Theorem 3.5. Assume that for 𝑡₁ ≠ 𝑡₂, 𝐾(𝑥^𝑖_{𝑡₁} − 𝑥^𝑗_{𝑡₁}) and 𝐾(𝑥^𝑖_{𝑡₂} − 𝑥^𝑗_{𝑡₂}) are independent (this is reasonable, since the regrouping at different time steps is independent), and 𝜏 → 0, 𝛽 → 0. Then we have the following estimate:

$$\mathrm{Var}(Y_{n,i}(x)) < \Big(\frac{(1-\beta)^2}{1-\beta^2} + o(\beta^{2n+1})\Big)\mathrm{Var}(X_{n,i}) + \frac{2\beta^2\tau}{(1-\beta^2)^2} \lesssim \frac{(1-\beta)^2}{1-\beta^2}\,\mathrm{Var}(X_{n,i}), \qquad (3.18)$$

where Var(𝑋_{𝑛,𝑖}) is the variance of RBM.
Proof. According to the definition of variance,

$$\mathrm{Var}(Y_{n,i}) = \mathbb{E}(Y_{n,i}^2) - (\mathbb{E}(Y_{n,i}))^2.$$

Processing the two terms separately:

$$\begin{aligned}\mathbb{E}(Y_{n,i}^2) &= \mathbb{E}\Big|\tilde{K}^i_n - \frac{1}{N-1}\sum_{j\neq i}K(x^i_n - x^j_n)\Big|^2\\
&= \mathbb{E}|\tilde{K}^i_n|^2 - \frac{2}{N-1}\sum_{j\neq i}K(x^i_n - x^j_n)\,\mathbb{E}(\tilde{K}^i_n) + \frac{1}{(N-1)^2}\sum_{j\neq i}K(x^i_n - x^j_n)\sum_{k\neq i}K(x^i_n - x^k_n),\end{aligned}$$

$$\begin{aligned}(\mathbb{E}(Y_{n,i}))^2 &= \Big(\mathbb{E}\Big(\tilde{K}^i_n - \frac{1}{N-1}\sum_{j\neq i}K(x^i_n - x^j_n)\Big)\Big)^2\\
&= (\mathbb{E}(\tilde{K}^i_n))^2 - \frac{2}{N-1}\sum_{j\neq i}K(x^i_n - x^j_n)\,\mathbb{E}(\tilde{K}^i_n) + \frac{1}{(N-1)^2}\sum_{j\neq i}K(x^i_n - x^j_n)\sum_{k\neq i}K(x^i_n - x^k_n).\end{aligned}$$

Rewriting the variance based on the two expectation calculations:

$$\begin{aligned}\mathrm{Var}(Y_{n,i}) &= \mathbb{E}|\tilde{K}^i_n|^2 - (\mathbb{E}(\tilde{K}^i_n))^2\\
&= \mathbb{E}\Big|\frac{\beta^n}{p-1}\sum_{j\in C_0,\,j\neq i}K(x^i_0 - x^j_0) + \frac{1-\beta}{p-1}\sum_{s=1}^n\beta^{n-s}\sum_{j\in C_s,\,j\neq i}K(x^i_s - x^j_s)\Big|^2\\
&\quad - \Big(\mathbb{E}\Big(\frac{\beta^n}{p-1}\sum_{j\in C_0,\,j\neq i}K(x^i_0 - x^j_0) + \frac{1-\beta}{p-1}\sum_{s=1}^n\beta^{n-s}\sum_{j\in C_s,\,j\neq i}K(x^i_s - x^j_s)\Big)\Big)^2.\end{aligned}$$

Using the independence of 𝐾(𝑥^𝑖_{𝑡₁} − 𝑥^𝑗_{𝑡₁}) and 𝐾(𝑥^𝑖_{𝑡₂} − 𝑥^𝑗_{𝑡₂}) for 𝑡₁ ≠ 𝑡₂:

$$\begin{aligned}\mathrm{Var}(Y_{n,i}) &= \beta^{2n}\Big(\mathbb{E}\Big|\frac{1}{p-1}\sum_{j\in C_0,\,j\neq i}K(x^i_0 - x^j_0)\Big|^2 - \Big(\frac{1}{N-1}\sum_{j\neq i}K(x^i_0 - x^j_0)\Big)^2\Big)\\
&\quad + (1-\beta)^2\sum_{s=1}^n\beta^{2(n-s)}\Big(\mathbb{E}\Big|\frac{1}{p-1}\sum_{j\in C_s,\,j\neq i}K(x^i_s - x^j_s)\Big|^2 - \Big(\frac{1}{N-1}\sum_{j\neq i}K(x^i_s - x^j_s)\Big)^2\Big).\end{aligned}$$



Denote

\[
X_{s,i}(x) = \frac{1}{p-1}\sum_{j\in C_s, j\neq i} K(x_s^i - x_s^j) - \frac{1}{N-1}\sum_{j\neq i} K(x_s^i - x_s^j).
\]

Note that $\mathrm{Var}(X_{n,i}(x))$ is the variance of RBM; then:

\[
|\mathrm{Var}(Y_{n,i})| = \big| \beta^{2n}\mathrm{Var}(X_{0,i}) + \beta^{2(n-1)}(1-\beta)^2\mathrm{Var}(X_{1,i}) + \cdots + \beta^2(1-\beta)^2\mathrm{Var}(X_{n-1,i}) + (1-\beta)^2\mathrm{Var}(X_{n,i}) \big|,
\]

because $K(\cdot)$ is bounded, so:

\[
\Big| K(x_0^i - x_0^j) - \frac{1}{N-1}\sum_{k\neq i} K(x_0^i - x_0^k) \Big| \le \Big| K(x_n^i - x_n^j) - \frac{1}{N-1}\sum_{k\neq i} K(x_n^i - x_n^k) \Big| + 2n\tau,
\]

hence:

\[
\mathrm{Var}(X_{0,i}) \le \mathrm{Var}(X_{n,i}) + 2Cn\tau,
\]

adding up the variances over all time steps:

\[
\begin{aligned}
\mathrm{Var}(Y_{n,i}) &< \big( (1-\beta)^2 + \beta^2(1-\beta)^2 + \cdots + \beta^{2n-2}(1-\beta)^2 + \beta^{2n} \big)\mathrm{Var}(X_{n,i}) \\
&\quad + C\tau\big( 2n\beta^{2n} + 2(n-1)\beta^{2n-2} + \cdots + \beta^2(1-\beta)^2 \big) \\
&\le \Big( \frac{(1-\beta)^2}{1-\beta^2} + \frac{2\beta^{2n+1}}{1+\beta} \Big)\mathrm{Var}(X_{n,i}) + \frac{2C\beta^2\tau}{(1-\beta^2)^2}.
\end{aligned}
\]
So when $\tau \to 0$ and ignoring higher-order terms of $\beta$, we can get:

\[
\mathrm{Var}(Y_{n,i}) < \frac{(1-\beta)^2}{1-\beta^2}\,\mathrm{Var}(X_{n,i}).
\]

Through the above theoretical analysis, we know that when the hyperparameter $\beta$ is appropriately selected, the error of the RBM-M algorithm is smaller than that of RBM, and RBM-M exhibits better statistical and probabilistic properties. Next, we conduct numerical experiments to verify these theoretical conclusions from the numerical perspective.

3.3. The error estimation of rescaled relative entropy

The joint distribution $\tilde\rho_t^N$ of all particles simulated by RBM-M can be controlled by the joint distribution $\rho_t^N$ of the reference particle system. To prove this, the joint distribution obtained from RBM is introduced as an auxiliary intermediate quantity. What this part needs to estimate is the rescaled relative entropy between RBM-M and the reference solution:

\[
\mathcal{H}_N(\tilde\rho_t^N \mid \rho_t^N) = \frac{1}{N}\int_{\mathbb{R}^{Nd}} \tilde\rho_t^N \log\frac{\tilde\rho_t^N}{\rho_t^N}\, d\mathbf{x}. \tag{3.19}
\]
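For intuition, a discrete (histogram) analogue of (3.19) can be computed as follows; this is a sketch under our own discretization choices, not the estimator used in the paper:

```python
import numpy as np

def rescaled_relative_entropy(p, q, N):
    """Discrete analogue of (3.19): (1/N) * sum_k p_k * log(p_k / q_k)
    for two probability vectors p, q over the same bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # the convention 0 * log(0) = 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask]))) / N

# identical distributions give zero; differing ones give a positive value
assert rescaled_relative_entropy([0.5, 0.5], [0.5, 0.5], N=2) == 0.0
assert rescaled_relative_entropy([0.5, 0.5], [0.25, 0.75], N=2) > 0.0
```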

This part of the proof mainly follows [28, 29], using the solution and corrected kernel of RBM-M in place of the solution and kernel of RBM.



For convenience, the velocity field $-\nabla V(x)$ is denoted by $b(x)$. The joint distribution $\rho_t^N$ that the $N$-particle system obeys at time $t$ satisfies the Fokker-Planck equation:

\[
\partial_t \rho_t^N + \sum_{i=1}^N \mathrm{div}_{x_i}\big( \rho_t^N \big( b(x_i) + K * \rho_t(x_i) \big) \big) = \sigma \sum_{i=1}^N \Delta_{x_i}\rho_t^N. \tag{3.20}
\]

For the original RBM and RBM-M solutions, the Euler scheme is generally used to solve the SDE and simulate the system evolution. To make the distribution smoother, the solution process is continuized, giving the following equation:

\[
X_t = X_{T_k} + (t - T_k)\, b(X_{T_k}) + (t - T_k)\, K^{C_k}(X_{T_k}) + \sqrt{2\sigma}\,(W_t - W_{T_k}), \quad t \in (T_k, T_{k+1}). \tag{3.21}
\]

In this way, $t$ takes values continuously in $(T_k, T_{k+1})$, giving a smooth simulation at every time.
For RBM,

\[
K^{C_k}(X_t^i) := \overline{K}_t^i = \frac{1}{p-1}\sum_{j\in C_k, j\neq i} K(X_t^i - X_t^j), \tag{3.22}
\]

and for RBM-M,

\[
K^{C_k}(X_t^i) := \tilde K_t^i = \frac{1}{p-1}\sum_{j\in C_k, j\neq i} K_{cor}(X_t^i - X_t^j) = \frac{1}{p-1}\sum_{j\in C_k, j\neq i} \big( (1-\beta) K(X_t^i - X_t^j) + \beta K_{cor}(X_{t-1}^i - X_{t-1}^j) \big), \tag{3.23}
\]

where the weights are written so that $\beta$ multiplies the history term, consistent with the expansion of $\tilde K_n^i$ used in Section 3.
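A minimal sketch of how (3.22) and (3.23) differ in practice, with the correction applied at the level of the averaged batch force and the convention that $\beta$ weights the history term (the function names and 1D setup are ours):

```python
import numpy as np

def batch_force_rbm(x, batch, i, K):
    """Plain RBM (3.22): average the kernel over the current batch only."""
    others = [j for j in batch if j != i]
    return sum(K(x[i] - x[j]) for j in others) / len(others)

def batch_force_rbm_m(x, batch, i, K, beta, prev):
    """RBM-M (3.23): exponential average of batch forces,
    K_cor(t) = (1 - beta) * K(t) + beta * K_cor(t-1),
    where `prev` is the corrected force carried over from the previous step."""
    current = batch_force_rbm(x, batch, i, K)
    return (1.0 - beta) * current + beta * prev

x = np.array([0.0, 1.0, 2.0, 4.0])
f = batch_force_rbm(x, [0, 1, 2], 0, lambda z: z)  # ((0-1) + (0-2)) / 2 = -1.5
g = batch_force_rbm_m(x, [0, 1, 2], 0, lambda z: z, beta=0.5, prev=0.0)  # -0.75
```

With $\beta = 0$ the correction vanishes and RBM-M reduces to plain RBM.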

For the continuized formula (3.21), the continuous solution distributions obtained by RBM and RBM-M are denoted by $\varrho_t^N$ and $\tilde\varrho_t^N$, respectively. For them, the Liouville equation holds [28]:
Proposition 3.6. Denote by $\varrho_t^{N,C}$ the probability density function of $X_t = \big( X_t^1, \ldots, X_t^N \big)$ for $t \in [T_k, T_{k+1})$. Then the following Liouville equation holds:

\[
\partial_t \varrho_t^{N,C} + \sum_{i=1}^N \mathrm{div}_{x_i}\Big( \varrho_t^{N,C}\big( \overline{b}_t^{C,i}(x) + \overline{K}_t^{C,i}(x) \big) \Big) = \sigma \sum_{i=1}^N \Delta_{x_i}\varrho_t^{N,C}, \tag{3.24}
\]

where

\[
\overline{b}_t^{C,i}(x) = \mathbb{E}\big[ b\big( X_{T_k}^i \big) \mid X_t = x, C \big], \quad t \in [T_k, T_{k+1}), \tag{3.25}
\]

and

\[
\overline{K}_t^{C,i}(x) := \mathbb{E}\big[ \overline{K}_t^i \mid X_t = x, C \big], \quad t \in [T_k, T_{k+1}). \tag{3.26}
\]

Here, $x = (x_1, \ldots, x_N) \in \mathbb{R}^{Nd}$.


Using the same proof technique, we obtain the following lemma for RBM-M:

Lemma 3.1. Denote by $\tilde\varrho_t^{N,C}$ the probability density function of $X_t = \big( X_t^1, \ldots, X_t^N \big)$ for $t \in [T_k, T_{k+1})$. Then the following Liouville equation holds:

\[
\partial_t \tilde\varrho_t^{N,C} + \sum_{i=1}^N \mathrm{div}_{x_i}\Big( \tilde\varrho_t^{N,C}\big( \tilde b_t^{C,i}(x) + \tilde K_t^{C,i}(x) \big) \Big) = \sigma \sum_{i=1}^N \Delta_{x_i}\tilde\varrho_t^{N,C}, \tag{3.27}
\]

where

\[
\tilde b_t^{C,i}(x) = \mathbb{E}\big[ b\big( X_{T_k}^i \big) \mid X_t = x, C \big], \quad t \in [T_k, T_{k+1}), \tag{3.28}
\]

and

\[
\tilde K_t^{C,i}(x) := \mathbb{E}\big[ \tilde K_t^i \mid X_t = x, C \big], \quad t \in [T_k, T_{k+1}). \tag{3.29}
\]

To prove (3.19), let us start with the Fisher information, which is defined by:

\[
\mathcal{I}(\rho) = \int \rho\, |\nabla \log \rho|^2\, dx. \tag{3.30}
\]

We know that the Fisher information satisfies the following inequality:

\[
\frac{1}{\mathcal{I}(p * q)} \ge \frac{1}{\mathcal{I}(p)} + \frac{1}{\mathcal{I}(q)}. \tag{3.31}
\]
Define a deterministic mapping:

\[
\tilde\psi_\tau^{C_k}(\mathbf{x}) := \mathbf{x} + \tau\big( \mathbf{b}(\mathbf{x}) + \tilde{\mathbf{K}}^{C_k}(\mathbf{x}) \big), \tag{3.32}
\]

where $\mathbf{x}$, $\mathbf{b}(\mathbf{x})$, $\tilde{\mathbf{K}}^{C_k}(\mathbf{x})$ denote the stacked forms of the corresponding variables, i.e.

\[
\mathbf{x} = (x_1, \ldots, x_N) \in \mathbb{R}^{Nd}, \quad
\mathbf{b}(\mathbf{x}) = (b(x_1), \ldots, b(x_N))^T \in \mathbb{R}^{Nd}, \quad
\tilde{\mathbf{K}}^{C_k}(\mathbf{x}) = (\tilde K^{C_k}(x_1), \ldots, \tilde K^{C_k}(x_N))^T \in \mathbb{R}^{Nd}.
\]

Let $\tilde p_k(\cdot)$ be the density of the random variable $\tilde{\mathbf{Z}}_k = \tilde\psi_\tau^{C_k}(\mathbf{X}_k)$; then, by a change of variables, we get:

\[
\tilde p_k(\mathbf{z}) = \frac{\tilde\varrho_{T_k}^{N,C}(\mathbf{x})}{\det\big( \nabla \tilde\psi_\tau^{C_k}(\mathbf{x}) \big)}.
\]
( )
Let $q_\tau$ denote the $Nd$-dimensional Gaussian distribution $\mathcal{N}\big( 0, 2\sigma\tau \mathbf{I}_{Nd} \big)$. By (3.21) and (3.32), we have $\tilde\varrho_{T_{k+1}}^{N,C}(\mathbf{x}) = \tilde p_k(\mathbf{x}) * q_\tau$. Note that for RBM-M, the velocity field is the same as for RBM. Moreover, if $K(\cdot)$ is Lipschitz, then $K_{cor}(\cdot)$ is also Lipschitz. Then we have the following upper bound for $\mathcal{I}\big( \tilde\varrho_{T_k}^{N,C} \big)$:

\[
\mathcal{I}\big( \tilde\varrho_{T_k}^{N,C} \big) \le \max\Big( \mathcal{I}\big( \rho_0^N \big)\, \frac{1+\tau(r+L)}{(1-\tau(r+L))^2},\; MN,\; \frac{Nd(r+L)(3+\tau(r+L))}{2\sigma} \Big). \tag{3.33}
\]

For the specific proof, see [28]. Recalling (3.21), $\tilde\varrho_t^{N,\boldsymbol{C}}$ is the probability density of RBM-M for a given sequence of batches $\boldsymbol{C} := \big( C_0, C_1, \ldots, C_k, \ldots \big)$, so that $\tilde\rho_{T_k}^N = \mathbb{E}_{\boldsymbol{C}}\big[ \tilde\varrho_{T_k}^{N,\boldsymbol{C}} \big]$. Moreover, by the Markov property, we are able to define

\[
\tilde\rho_t^{N,C_k} := \mathbb{E}\big[ \tilde\varrho_t^{N,\boldsymbol{C}} \mid C_i, i \ge k \big], \quad t \in [T_k, T_{k+1}).
\]

Next is a concrete analysis of (3.19), where its time derivative is computed.


Theorem 3.7. Suppose:
(a) the field $b: \mathbb{R}^d \to \mathbb{R}^d$ and the interaction kernel $K(\cdot)$ are Lipschitz;
(b) the Hessians of $b$ and $K$ have at most polynomial growth:

\[
|\nabla^2 b(x)| \le C(1+|x|)^q, \quad |\nabla^2 K(x)| \le \tilde C(1+|x|)^q;
\]

(c) $\overline{K}^{C,i} - F$ is uniformly bounded, where:

\[
F(x_i) = \frac{1}{N}\sum_{j=1}^N K\big( x_i - x_j \big), \quad
\overline{K}_t^{C,i}(x) = \frac{1}{p}\sum_{j\in C_i} K\big( x_i - x_j \big), \quad
\mathrm{esssup}_C \big\| \overline{K}_t^{C,i} - F \big\|_{L^\infty(\mathbb{R}^d)} < +\infty.
\]

Then we have

\[
\sup_t \mathcal{H}_N\big( \tilde\rho_t^N \mid \rho_t^N \big) \le \mathcal{H}_N\big( \tilde\rho_0^N \mid \rho_0^N \big) + C_1\tau^2 + \frac{C_2}{N}.
\]

Proof. First we switch the order of integration and differentiation, then replace $\partial_t \tilde\rho_t^N$ and $\partial_t \rho_t^N$ using (3.27) and (3.20), and finally apply Green's formula:

\[
\begin{aligned}
\frac{d}{dt}\mathcal{H}_N\big( \tilde\rho_t^N \mid \rho_t^N \big)
&= \frac{1}{N}\frac{d}{dt}\int_{\mathbb{R}^{Nd}} \tilde\rho_t^N \log\frac{\tilde\rho_t^N}{\rho_t^N}\, d\mathbf{x} \\
&= \frac{1}{N}\int_{\mathbb{R}^{Nd}} \partial_t\tilde\rho_t^N \log\frac{\tilde\rho_t^N}{\rho_t^N}\, d\mathbf{x} + \frac{1}{N}\int_{\mathbb{R}^{Nd}} \tilde\rho_t^N\Big( \frac{\partial_t\tilde\rho_t^N}{\tilde\rho_t^N} - \frac{\partial_t\rho_t^N}{\rho_t^N} \Big)\, d\mathbf{x} \\
&= \frac{1}{N}\int_{\mathbb{R}^{Nd}} \partial_t\tilde\rho_t^N \Big( \log\frac{\tilde\rho_t^N}{\rho_t^N} + 1 \Big)\, d\mathbf{x} + \frac{1}{N}\int_{\mathbb{R}^{Nd}} \partial_t\rho_t^N \Big( -\frac{\tilde\rho_t^N}{\rho_t^N} \Big)\, d\mathbf{x} \\
&= -\frac{1}{N}\sum_{i=1}^N \mathbb{E}_{C_k}\int_{\mathbb{R}^{Nd}} \Big[ \mathrm{div}_{x_i}\Big( \tilde\rho_t^{N,C_k}\big( \tilde b_t^{C_k,i}(\mathbf{x}) + \tilde K_t^{C_k,i}(\mathbf{x}) \big) \Big) - \sigma\Delta_{x_i}\tilde\rho_t^{N,C_k} \Big] \Big( \log\frac{\tilde\rho_t^N}{\rho_t^N} + 1 \Big)\, d\mathbf{x} \\
&\quad + \frac{1}{N}\sum_{i=1}^N \int_{\mathbb{R}^{Nd}} \Big[ -\mathrm{div}_{x_i}\Big( \rho_t^N\big( b(x_i) + K*\rho_t(x_i) \big) \Big) + \sigma\Delta_{x_i}\rho_t^N \Big] \Big( -\frac{\tilde\rho_t^N}{\rho_t^N} \Big)\, d\mathbf{x} \\
&= \frac{1}{N}\sum_{i=1}^N \mathbb{E}_{C_k}\int_{\mathbb{R}^{Nd}} \Big( \tilde\rho_t^{N,C_k}\big( \tilde b_t^{C_k,i}(\mathbf{x}) + \tilde K_t^{C_k,i}(\mathbf{x}) \big) - \sigma\nabla_{x_i}\tilde\rho_t^{N,C_k} \Big) \cdot \nabla_{x_i}\log\frac{\tilde\rho_t^N}{\rho_t^N}\, d\mathbf{x} \\
&\quad + \frac{1}{N}\sum_{i=1}^N \int_{\mathbb{R}^{Nd}} \Big( \rho_t^N\big( b(x_i) + K*\rho_t(x_i) \big) - \sigma\nabla_{x_i}\rho_t^N \Big) \cdot \Big( -\nabla_{x_i}\frac{\tilde\rho_t^N}{\rho_t^N} \Big)\, d\mathbf{x}.
\end{aligned}
\]

Taking note of $\nabla_{x_i}\log\frac{\tilde\rho_t^N}{\rho_t^N} = \frac{\rho_t^N}{\tilde\rho_t^N}\nabla_{x_i}\frac{\tilde\rho_t^N}{\rho_t^N}$ and introducing the auxiliary variables $\tilde K_t^i$ and $F(x_i) = \frac{1}{N-1}\sum_{j\neq i} K(x_i - x_j)$, the above equation is equivalent to

\[
\begin{aligned}
\frac{d}{dt}\mathcal{H}_N\big( \tilde\rho_t^N \mid \rho_t^N \big)
={}& \frac{1}{N}\sum_{i=1}^N \mathbb{E}_{C_k}\int_{\mathbb{R}^{Nd}} \tilde\rho_t^{N,C_k}\big( \tilde b_t^{C_k,i}(\mathbf{x}) - b(x_i) \big) \cdot \nabla_{x_i}\log\frac{\tilde\rho_t^N}{\rho_t^N}\, d\mathbf{x} && (3.34) \\
&+ \frac{1}{N}\sum_{i=1}^N \mathbb{E}_{C_k}\int_{\mathbb{R}^{Nd}} \Big( \tilde\rho_t^{N,C_k}\tilde K_t^{C_k,i}(\mathbf{x}) - \tilde\rho_t^{N,C_k}\tilde K_t^i(x_i) \Big) \cdot \nabla_{x_i}\log\frac{\tilde\rho_t^N}{\rho_t^N}\, d\mathbf{x} && (3.35) \\
&+ \frac{1}{N}\sum_{i=1}^N \mathbb{E}_{C_k}\int_{\mathbb{R}^{Nd}} \Big( \tilde\rho_t^{N,C_k}\tilde K_t^i(x_i) - \tilde\rho_t^N F(x_i) \Big) \cdot \nabla_{x_i}\log\frac{\tilde\rho_t^N}{\rho_t^N}\, d\mathbf{x} && (3.36) \\
&+ \frac{1}{N}\sum_{i=1}^N \int_{\mathbb{R}^{Nd}} \big( F(x_i) - K*\rho_t(x_i) \big)\, \tilde\rho_t^N \cdot \nabla_{x_i}\log\frac{\tilde\rho_t^N}{\rho_t^N}\, d\mathbf{x} && (3.37) \\
&- \frac{\sigma}{N}\sum_{i=1}^N \int_{\mathbb{R}^{Nd}} \tilde\rho_t^N \Big| \nabla_{x_i}\log\frac{\tilde\rho_t^N}{\rho_t^N} \Big|^2 d\mathbf{x}. && (3.38)
\end{aligned}
\]
Here, unlike [28], $\mathbb{E}_C\big( \tilde K_t^i \big) = F(x_i)$ no longer holds, i.e. $\mathbb{E}_C\big( \tilde K_t^i \big)$ is no longer an unbiased estimate of $F(x_i)$, so extra processing is required. Consider another batch $\tilde C_k$ independent of $C_k$; then,

\[
\begin{aligned}
&\frac{1}{N}\sum_{i=1}^N \mathbb{E}_{C_k}\int_{\mathbb{R}^{Nd}} \Big( \tilde\rho_t^{N,C_k}\tilde K_t^i(x_i) - \tilde\rho_t^N F(x_i) \Big) \cdot \nabla_{x_i}\log\frac{\tilde\rho_t^N}{\rho_t^N}\, d\mathbf{x} \\
&= \frac{1}{N}\sum_{i=1}^N \mathbb{E}_{C_k,\tilde C_k}\int_{\mathbb{R}^{Nd}} \Big[ \big( \tilde\rho_t^{N,C_k} - \tilde\rho_t^{N,\tilde C_k} \big)\big( \tilde K_t^i(x_i) - F(x_i) \big) + \mathbb{E}_{C_k}\big[ \tilde K_t^i(x_i) - F(x_i) \big]\, \tilde\rho_t^N \Big] \cdot \nabla_{x_i}\log\frac{\tilde\rho_t^N}{\rho_t^N}\, d\mathbf{x}.
\end{aligned}
\]

Taking note of $ab \le ra^2 + \frac{1}{4r}b^2$ and applying it:

\[
\begin{aligned}
&\mathbb{E}_{C_k,\tilde C_k}\int_{\mathbb{R}^{Nd}} \big( \tilde K_t^i(x_i) - F(x_i) \big)\big( \tilde\rho_t^{N,C_k} - \tilde\rho_t^{N,\tilde C_k} \big) \cdot \nabla_{x_i}\log\frac{\tilde\rho_t^N}{\rho_t^N}\, d\mathbf{x} \\
&\le \frac{2}{\sigma}\mathbb{E}_{C_k,\tilde C_k}\Bigg[ \int_{\mathbb{R}^{Nd}} \big| \tilde K_t^i(x_i) - F(x_i) \big|^2 \frac{\big| \tilde\rho_t^{N,C_k} - \tilde\rho_t^{N,\tilde C_k} \big|^2}{\tilde\rho_t^{N,\tilde C_k}}\, d\mathbf{x} \Bigg] + \frac{\sigma}{8}\int_{\mathbb{R}^{Nd}} \tilde\rho_t^N \Big| \nabla_{x_i}\log\frac{\tilde\rho_t^N}{\rho_t^N} \Big|^2 d\mathbf{x},
\end{aligned}
\]

\[
\begin{aligned}
&\int_{\mathbb{R}^{Nd}} \mathbb{E}_{C_k}\big[ F(x_i) - \tilde K_t^i(x_i) \big]\, \tilde\rho_t^N \cdot \nabla_{x_i}\log\frac{\tilde\rho_t^N}{\rho_t^N}\, d\mathbf{x} \\
&\le \frac{2}{\sigma}\int_{\mathbb{R}^{Nd}} \Big( \mathbb{E}_{C_k}\big[ \tilde K_t^i(x_i) - F(x_i) \big] \Big)^2 \tilde\rho_t^N\, d\mathbf{x} + \frac{\sigma}{8}\int_{\mathbb{R}^{Nd}} \tilde\rho_t^N \Big| \nabla_{x_i}\log\frac{\tilde\rho_t^N}{\rho_t^N} \Big|^2 d\mathbf{x}.
\end{aligned}
\]

Applying (3.17) of Theorem 3.4, we have

\[
\int_{\mathbb{R}^{Nd}} \Big( \mathbb{E}_{C_k}\big[ \tilde K_t^i(x_i) - F(x_i) \big] \Big)^2 \tilde\rho_t^N\, d\mathbf{x} \le c\beta^2\tau^2,
\]

where $c$ is a constant. Then, consider the Log-Sobolev inequality:

\[
\mathbb{E}(f\log f) - (\mathbb{E}f)\log(\mathbb{E}f) \le \frac{1}{2}\mathbb{E}\big( f|\nabla\log f|^2 \big). \tag{3.39}
\]
Applying (3.39),

\[
\begin{aligned}
\int_{\mathbb{R}^{Nd}} \tilde\rho_t^N \Big| \nabla_{x_i}\log\frac{\tilde\rho_t^N}{\rho_t^N} \Big|^2 d\mathbf{x}
&= \mathbb{E}_{\rho_t^N}\Bigg( \frac{\tilde\rho_t^N}{\rho_t^N}\Big| \nabla_{x_i}\log\frac{\tilde\rho_t^N}{\rho_t^N} \Big|^2 \Bigg) \\
&\ge 2\,\mathbb{E}_{\rho_t^N}\Big( \frac{\tilde\rho_t^N}{\rho_t^N}\log\frac{\tilde\rho_t^N}{\rho_t^N} \Big) - 2\,\mathbb{E}_{\rho_t^N}\Big( \frac{\tilde\rho_t^N}{\rho_t^N} \Big)\log\mathbb{E}_{\rho_t^N}\Big( \frac{\tilde\rho_t^N}{\rho_t^N} \Big) \ge c\,\mathcal{H}_N\big( \tilde\rho_t^N \mid \rho_t^N \big).
\end{aligned}
\]

The same method is applied to the other terms in (3.34)-(3.38); combining them gives:

\[
\begin{aligned}
\frac{d}{dt}\mathcal{H}_N\big( \tilde\rho_t^N \mid \rho_t^N \big)
&\le -\frac{\sigma}{2C_{LS}}\mathcal{H}_N\big( \tilde\rho_t^N \mid \rho_t^N \big) + \frac{2}{\sigma N}\sum_{i=1}^N \mathbb{E}_{C_k}\int_{\mathbb{R}^d} \big| \tilde b_t^i(x) - b(x_i) \big|^2 \tilde\rho_t^{N,C_k}\, dx \\
&\quad + \frac{2}{\sigma N}\sum_{i=1}^N \mathbb{E}_{C_k}\int_{\mathbb{R}^d} \big| \tilde K_t^{C_k,i}(x_i) - \tilde K_t^i(x_i) \big|^2 \tilde\rho_t^{N,C_k}\, dx \\
&\quad + \frac{2}{\sigma N}\sum_{i=1}^N \mathbb{E}_{C_k,\tilde C_k}\Bigg[ \int_{\mathbb{R}^{Nd}} \big| \tilde K_t^i(x_i) - F(x_i) \big|^2 \frac{\big| \tilde\rho_t^{N,C_k} - \tilde\rho_t^{N,\tilde C_k} \big|^2}{\tilde\rho_t^{N,\tilde C_k}}\, d\mathbf{x} \Bigg] \\
&\quad + \frac{2}{\sigma N}\sum_{i=1}^N \int_{\mathbb{R}^{Nd}} \tilde\rho_t^N \big| F(x_i) - K*\rho_t(x_i) \big|^2 d\mathbf{x} + c\beta^2\tau^2 \\
&:= -\frac{\sigma}{2C_{LS}}\mathcal{H}_N\big( \tilde\rho_t^N \mid \rho_t^N \big) + J_1 + J_2 + J_3 + J_4 + c\beta^2\tau^2,
\end{aligned}
\]

where $C_{LS}$ and $c$ are positive constants. Since RBM and RBM-M are almost identical except for the kernel, the estimates of the common terms $J_1$ and $J_4$ can be taken directly from the analysis in [28]. The estimates of the terms related to $\tilde K(\cdot)$ are given here. For $J_2$, we have $\tilde K_t^i(x_i) = \mathbb{E}\big[ \tilde K_t^i\big( \tilde X_t^i \big) \mid \tilde{\mathbf{X}}_t = \mathbf{x}, C_k \big]$, and we apply Taylor's expansion for any $t \in [T_k, T_{k+1})$:

\[
\begin{aligned}
\tilde K_t^{C_k,i}(x_i) - \tilde K_t^i(x_i)
&= \mathbb{E}\big[ \tilde K_t^i\big( \tilde X_{T_k}^i \big) - \tilde K_t^i\big( \tilde X_t^i \big) \mid \tilde{\mathbf{X}}_t = \mathbf{x}, C_k \big] \\
&= \mathbb{E}\big[ \tilde X_{T_k}^i - \tilde X_t^i \mid \tilde{\mathbf{X}}_t = \mathbf{x}, C_k \big] \cdot \nabla_{x_i}\tilde K_t^i(x_i) + \hat r_t(x_i),
\end{aligned}
\]

where the remainder takes the form

\[
\hat r_t(x_i) := \frac{1}{2}\mathbb{E}\bigg[ \big( \tilde X_{T_k}^i - \tilde X_t^i \big)^{\otimes 2} : \int_0^1 \nabla_{x_i}^2 \tilde K_t^i\big( (1-s)\tilde X_t^i + s\tilde X_{T_k}^i \big)\, ds \;\Big|\; \tilde{\mathbf{X}}_t = \mathbf{x}, C_k \bigg].
\]

Note that the Hessians of $K$ have at most polynomial growth; ignoring higher-order terms in $\beta$ and using $\sup_{t\ge 0}\mathbb{E}\big( |\tilde X_t^i|^p \,\big|\, C \big) \le C_p$ from [30]:

\[
\begin{aligned}
\big| \hat r_t(x_i) \big|
&\le \int_0^1 \mathbb{E}\Big[ \big| \nabla_{x_i}^2 \tilde K_t^i\big( (1-s)\tilde X_t^i + s\tilde X_{T_k}^i \big) \big| \cdot \big| \tilde X_{T_k}^i - \tilde X_t^i \big|^2 \,\Big|\, \tilde{\mathbf{X}}_t = \mathbf{x}, C_k \Big]\, ds \\
&\lesssim C\,\mathbb{E}\Big[ \Big| \beta\big( 1 + |\tilde X_t^i|^p + |\tilde X_{T_k}^i|^p \big) + \beta(1-\beta)\big( 1 + |\tilde X_{t-1}^i|^p + |\tilde X_{T_k-1}^i|^p \big) \Big| \cdot \big| \tilde X_{T_k}^i - \tilde X_t^i \big|^2 \,\Big|\, \tilde{\mathbf{X}}_t = \mathbf{x}, C_k \Big] \\
&\le \tilde C\,\mathbb{E}\Big[ \big| \tilde X_{T_k}^i - \tilde X_t^i \big|^2 \,\Big|\, \tilde{\mathbf{X}}_t = \mathbf{x}, C_k \Big].
\end{aligned}
\]

The remaining part of $J_2$ has nothing to do with the kernel, and the conclusion follows from the proof in [28]:

\[
J_2 \le \tilde c'\tau^2\Big( 3 + \frac{1}{N}\mathcal{I}\big( \tilde\rho_{T_k}^N \big) \Big).
\]
Then, for $J_3$, through the assumption that $\big| \overline{K}_t^i - F(x_i) \big|$ is bounded and the kernel $K(\cdot)$ is Lipschitz continuous:

\[
\big| \tilde K_t^i(x_i) - F(x_i) \big| \le \big| \overline{K}_t^i(x_i) - F(x_i) \big| + \big| \overline{K}_t^i(x_i) - \tilde K_t^i(x_i) \big| \le C_{bound} + C_L L,
\]



Table 1
The relationship between the error and the time step.

𝜏 (× 10^{-3}) error
16 0.0188
4 0.0102
1 0.0051
0.25 0.0025

which means $\big| \tilde K_t^i - F(x_i) \big|$ is also bounded; then we only need to estimate $\mathbb{E}_{C_k,\tilde C_k}\Big[ \int_{\mathbb{R}^{Nd}} \frac{| \tilde\rho_t^{N,C_k} - \tilde\rho_t^{N,\tilde C_k} |^2}{\tilde\rho_t^{N,\tilde C_k}}\, d\mathbf{x} \Big]$, which has been given by [28]. So we have

\[
J_3 \le \tilde c''\tau^2\Big( 1 + \frac{1}{N}\mathcal{I}\big( \tilde\rho_{T_k}^N \big) \Big).
\]
Finally, combining all of the above results, we obtain

\[
\frac{d}{dt}\mathcal{H}_N\big( \tilde\rho_t^N \mid \rho_t^N \big) \le -C_0\,\mathcal{H}_N\big( \tilde\rho_t^N \mid \rho_t^N \big) + C_1\tau^2 + \frac{C_2}{N},
\]

and then

\[
\sup_t \mathcal{H}_N\big( \tilde\rho_t^N \mid \rho_t^N \big) \le \mathcal{H}_N\big( \tilde\rho_0^N \mid \rho_0^N \big) + C_1\tau^2 + \frac{C_2}{N}.
\]

4. Numerical experiment
This section presents a set of numerical experiments to help verify the theories of the previous section. In the following numerical experiments, for both the RBM and RBM-M algorithms, the total number of particles $N$ in the interacting particle system is 10,000, and the number of particles $p$ in each group is 360. This setting takes computational efficiency into account while avoiding the numerical instability caused by too few particles per group.
The following numerical experiments correspond one-to-one with the previous theoretical analysis, including the convergence order of the regularized kernel function, the error of RBM-M, and the good statistical properties of RBM-M. In addition, the theoretical results are supplemented to some extent by the numerical experiments.

4.1. The error is proportional to 𝜏
According to Theorem 3.2 described above, the error of RBM-M should be proportional to the discrete time step $\tau$.
We select the Biot-Savart kernel to verify this conclusion. In the corresponding interacting particle system, the kernel function is [9]:

\[
K(z) = \frac{z^\perp}{\| z \|^2}.
\]

Here $z^\perp$ denotes the vector perpendicular to $z$. For simplicity, the noise driving the diffusion term $\sigma$ is taken to follow the standard normal distribution, and the applied field $V$ of the system is 0.
The numerical results are recorded in Table 1, and they support this conclusion well: as the discrete time step becomes 1/4 of the original, the error of the algorithm basically becomes half of the original. This shows that the error is proportional to $\tau$, which is exactly what our theory tells us.



4.2. Compare the error between RBM-M and RBM
In this section, we will primarily focus on comparing the algorithmic accuracy of RBM-M and RBM under
various scenarios (primarily involving different kernel functions, representing different interacting particle systems).
Some of these interacting particle systems are derived from real physical equations, while others are self-constructed
systems with significant singularity, which are specifically designed to verify the stability of the algorithms. The
comparison will include both first-order and higher-order systems. By examining several numerical examples, we
can comprehensively evaluate the performance between the two algorithms.
As an illustrative example, we will choose a second-order interacting particle system. The specific form of this
second-order system is:

\[
dX_t^{i,N} = V_t^{i,N}\, dt, \tag{4.1}
\]

\[
dV_t^i = \frac{1}{N}\sum_{j\neq i} K\big( | X_t^j - X_t^i | \big)\big( V_t^j - V_t^i \big)\, dt + \sigma\, dB_t^i, \tag{4.2}
\]

where the kernel function $K$ has the concrete form:

\[
K\big( | X_t^j - X_t^i | \big) = \frac{X_t^j - X_t^i}{1 + | X_t^j - X_t^i |^2}. \tag{4.3}
\]

In this second-order system, the variable 𝑥 can be interpreted as the position of the particle, and the variable 𝑣 can be
interpreted as the velocity of the particle. This provides a clear physical interpretation of the entire system. The system
is a concrete representation of Newton’s second law of motion. Equation (4.1) shows that the position of the particle
is determined by the product of velocity and time, while Equation (4.2) can be understood as the effect of an external
force on the particle, namely the acceleration, and the velocity is determined by the acceleration.
We set the initial conditions of the stochastic differential equation (SDE) such that the initial values of $X$ are distributed uniformly within a unit circle, while the initial values of $V$ are set to 0. Additionally, the evolution time is set to $T = 0.02s$, and the discrete time step is $\tau = 0.001s$. In this case, we compare the true solution of the interacting particle system, the numerical solution of RBM, and the numerical solution of RBM-M (with control parameter $\beta = 0.01$). The results are shown in figures 2 and 3.

Figure 2: Second-order system – position x.

It can be intuitively observed from the figures above that both RBM and RBM-M achieve good results.
In addition to the visual comparison, the specific $L^2$ errors of RBM and RBM-M are also listed here. The $L^2$ error is defined as:

\[
l = \sqrt{ \sum_{i=1}^N \big( x_i - x_i^{true} \big)^2 }. \tag{4.4}
\]
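The error measure (4.4) is simply the Euclidean norm of the particle-wise deviations; in code (a one-line sketch):

```python
import numpy as np

def l2_error(x, x_true):
    """L2 error of (4.4): sqrt of the summed squared deviations,
    equivalent to np.linalg.norm(x - x_true)."""
    return float(np.sqrt(np.sum((np.asarray(x) - np.asarray(x_true))**2)))
```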



Figure 3: Second-order system – velocity v.

Here $x_i$ represents the numerical value and $x_i^{true}$ represents the true solution. Under this definition, the specific numerical results are: the error of RBM is $4.489 \times 10^{-2}$, and the error of RBM-M is $4.419 \times 10^{-2}$. In this example, although the error of RBM-M is slightly lower than that of RBM, the improvement is not significant. We speculate that because this second-order interacting particle system is relatively smooth, the correction brought by the introduction of momentum is not substantial; at the same time, the momentum itself may introduce some error, so the control parameter $\beta$ takes a small value in this case. From another perspective, however, the performance of RBM-M should be no worse than RBM, provided the parameter $\beta$ is reasonably selected.
To further demonstrate that the RBM-M algorithm can indeed achieve good results when the interacting particle system has a large singularity, we construct the following system:

\[
dX^i = -\nabla V\big( X^i \big)\, dt + \frac{1}{N-1}\sum_{j\neq i} K\big( X^i - X^j \big)\, dt + \sigma\, dB^i, \tag{4.5}
\]

where the kernel function is:

\[
K_x(s_i - s) = \frac{x_i - x}{\cosh(\| s_i - s \|^2)}, \tag{4.6}
\]

\[
K_y(s_i - s) = \frac{\cosh(y_i - y)}{\| s_i - s \|^2}. \tag{4.7}
\]

Here $s = (x, y)$ represents the position of the particle, and $K_x(\cdot)$ and $K_y(\cdot)$ represent the components of the kernel function in the $x$- and $y$-directions, respectively. The additionally imposed system field is:

\[
-\nabla V(s_i) = \begin{pmatrix} 0 \\ \cos(x_i) \end{pmatrix}. \tag{4.8}
\]

Due to the large singularity of this system, RBM-M theoretically needs a stronger correction effect, so the control parameter $\beta = 0.1$ takes a larger value here. The remaining settings are consistent with the second-order system above: the initial values of $X$ are distributed uniformly in a unit circle, the terminal evolution time is $T = 0.02s$, and the discrete time step is $\tau = 0.001s$. The reference solution, the RBM solution, and the RBM-M solution are shown in figure 4.
Visually, there is again little difference between the two numerical algorithms in the figure. Comparing the $L^2$ errors according to (4.4), however, the error of RBM is $7.946 \times 10^{-2}$, while the error of RBM-M is only $4.691 \times 10^{-2}$. In this example, the difference between the two algorithms is obvious, and RBM-M performs clearly better than RBM.
In addition, it should be noted that in this example, if the original equation is solved, when 𝑁 = 10, 000 particles
are selected, it takes 54.48𝑠 to solve, while with the same number of particles, it only takes 4.09𝑠 to use RBM and 4.45𝑠



Figure 4: A case of large singularity.

Table 2
The effect of parameter 𝛽 on the algorithm.

𝛽 RBM error RBM-M error
0.04 8.029 × 10^{-2} 4.529 × 10^{-2}
0.06 7.984 × 10^{-2} 3.449 × 10^{-2}
0.08 8.057 × 10^{-2} 3.519 × 10^{-2}
0.10 7.946 × 10^{-2} 4.695 × 10^{-2}
0.12 8.041 × 10^{-2} 6.423 × 10^{-2}

to use RBM-M. This shows that both RBM and RBM-M can significantly reduce the computation time, and that RBM-M requires little additional time compared to RBM.
At the same time, this example lets us explore the influence of the control parameter $\beta$ on the correction effect of the RBM-M algorithm. We take $\beta$ from $\{0.04, 0.06, 0.08, 0.1, 0.12\}$ and compare the accuracy of the RBM-M algorithm under the different parameter choices; the results are in table 2.
As can be seen from the table, as the control parameter $\beta$ increases from 0.04 to 0.12, the error of RBM-M is 56.41%, 43.19%, 43.67%, 59.09%, and 79.88% of the RBM error, respectively; that is, when $\beta$ is selected reasonably, the error can be reduced to less than half of the original. Due to the large singularity of the system, the corrective effect of the algorithm is significant. Therefore, at first, as $\beta$ increases, the performance of the RBM-M algorithm improves gradually. However, as $\beta$ continues to increase, the corrective effect saturates, the benefit of further increasing $\beta$ becomes limited, and the introduced error grows, so the final outcome deteriorates. In summary, when the system singularity is small, the selected $\beta$ should be small; when the system singularity is large, the selected $\beta$ should be large. In addition, we also tested other systems; the kernel functions employed in the various systems are as follows:
K1: Biot-Savart kernel:

\[
K(z) = \frac{z^\perp}{\| z \|^2}.
\]
K2: second-order system:

\[
dX_t^{i,N} = V_t^{i,N}\, dt,
\]

\[
dV_t^i = \frac{1}{N}\sum_{j\neq i} K\big( | X_t^j - X_t^i | \big)\big( V_t^j - V_t^i \big)\, dt + \sigma\, dB_t^i.
\]



Table 3
Error contrast between RBM and RBM-M.
kernel RBM error RBM-M error
1 8.0033 × 10^{-3} 7.8905 × 10^{-3}
2 4.849 × 10^{-2} 4.819 × 10^{-2}
3 7.817 × 10^{-2} 7.792 × 10^{-2}
4 7.946 × 10^{-2} 4.691 × 10^{-2}
5 3.589 × 10^{-1} 2.342 × 10^{-1}
6 4.289 × 10^{-1} 2.423 × 10^{-1}

K3: repulsive-attractive Morse potential [23]:

\[
K(\mathbf{x}) = C_r e^{-\| \mathbf{x} \| / l_r} - C_a e^{-\| \mathbf{x} \| / l_a}.
\]

K4: with the particle position $s_i = (x_i, y_i)$,

x-dimension:
\[
\frac{x_i - x}{\cosh(\| s_i - s \|^2)},
\]

y-dimension:
\[
\frac{e^{-(y_i - y)^2}}{\| s_i - s \|}.
\]

K5: with the particle position $s_i = (x_i, y_i)$,

x-dimension:
\[
\frac{\sinh(x_i - x)}{\| s_i - s \|^2},
\]

y-dimension:
\[
\frac{\cosh(y_i - y)}{\| s_i - s \|^2}.
\]

K6: with the particle position $s_i = (x_i, y_i)$,

\[
K(z) = \frac{\ln(1 - \exp(z^2))}{\tan(\| z \|^2)/10 + 2\pi}.
\]

The corresponding $L^2$ errors are shown in table 3.

The first column lists the different kernels, the second column the error of the RBM solution, and the third column the error of the RBM-M solution. Note that since the final states evolved from the various equations differ (some distributions are more concentrated, others more diffuse), comparing absolute errors between different equations is not very meaningful; the focus should be on comparing the errors of the two algorithms.
It can be clearly observed that as the kernel singularity increases, the effect of momentum correction becomes more
pronounced. This suggests that the local momentum algorithm has significant effects in certain situations.



4.3. Probabilistic explanation that RBM-M is more stable than RBM
Now, let us consider characterizing the singularity of kernels. We define a family of similar kernels as:

\[
f(x) = \begin{cases}
-\dfrac{1}{x-1}, & x < 1-\alpha, \\[4pt]
\dfrac{|x-1|}{\alpha^2}, & 1-\alpha < x < 1+\alpha, \\[4pt]
\dfrac{1}{x-1}, & x > 1+\alpha,
\end{cases} \tag{4.9}
\]

where $\alpha$ is a parameter measuring the steepness of the function: the smaller $\alpha$ is, the steeper the function (when constructing the function, the reason for not centering it at 0 is to avoid divisions by 0 that may decrease numerical accuracy). Figure 5 shows $f(x)$:

Figure 5: The steepness of different 𝛼.
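A direct transcription of (4.9), assigning the boundary points to the middle branch (where the adjacent branches agree anyway):

```python
def f(x, alpha):
    """Piecewise kernel (4.9): smaller alpha means a steeper function near x = 1."""
    if x < 1.0 - alpha:
        return -1.0 / (x - 1.0)
    elif x > 1.0 + alpha:
        return 1.0 / (x - 1.0)
    else:
        return abs(x - 1.0) / alpha**2
```

The three branches match at $x = 1 \mp \alpha$, where all give $1/\alpha$.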

If 𝛼 → 0, the function 𝑓 (𝑥) becomes a discontinuous function. In this situation, there is a significant degree of
singularity associated with 𝑓 (𝑥). If we were to use the standard RBM approach in this case, the error would be quite
large. However, as demonstrated by the proof presented earlier, the RBM-M algorithm outperforms the standard RBM
when the function exhibits a high degree of singularity. Let us now compare the effects of these two approaches in
figure 6:
The figure compares the errors of the two methods. Each data point is the average of 10 repeated runs, in order to mitigate the effects of random factors. The closer the $x$-axis value is to 1, the smaller $\alpha$ is, meaning the function is steeper and has greater singularity. From the graph, it can be observed that as $\alpha$ decreases, the singularity of the function increases, and the error of the standard RBM becomes increasingly larger than that of RBM-M.
The specific data is in table 4.
In this numerical experiment, we set the RBM-M coefficient $\beta$ to 0.1. When the function is flatter, a smaller $\beta$ should be used, and when the function is steeper, a larger $\beta$ (the final gap narrows because the singularity of the function becomes too large, and a larger $\beta$ should then be selected). All of these choices follow Theorem 3.4.
Moreover, we can explain the main idea of RBM and RBM-M from a sampling perspective. (3.5) and (3.11) have explained that the core concept is to use the distribution of a few particles to estimate the overall distribution of all particles. When the singularity of the kernel is relatively small and the distribution is relatively flat, it is easier to estimate. However, if the kernel singularity is relatively high, as mentioned earlier, it becomes more challenging to estimate accurately.

Figure 6: Comparison of errors between RBM and RBM-M.

Table 4
Comparison of errors of the two algorithms for functions with different singularity.

𝛼 RBM RBM-M advantage
5 × 10^{-3} 0.1743 0.1562 0.0181
2.5 × 10^{-3} 0.2627 0.1752 0.0875
1 × 10^{-3} 0.5601 0.3679 0.1923
7.5 × 10^{-4} 0.6784 0.4317 0.2467
5 × 10^{-4} 0.9236 0.5218 0.3418
2.5 × 10^{-4} 1.1161 0.8827 0.2835

Figure 7: The sampling of RBM and RBM-M.

Figure 7 illustrates the sampling processes of the two methods. In each time step, the RBM requires a single sampling operation, while the RBM-M performs multiple sampling iterations due to its consideration of multiple time steps. The pink circular data points represent the particles sampled by the RBM at a specific time step, whereas the black data points depict the samples generated by the RBM-M. Since each step in the RBM-M process influences the subsequent evolution of the system, it is analogous to having a greater number of data points being sampled. The transparency of the black data points in the figure indicates the intensity of the influence of the data on the RBM-M results. The earlier the evolution time, the more transparent the corresponding data points, and the effects of the system's previous evolution gradually diminish. This visual representation highlights the key differences between the sampling approaches of the RBM and RBM-M methods, underscoring the importance of considering multiple time steps in RBM-M and the resulting impact on the overall system evolution.
If we directly solve the stochastic differential equations (SDEs), we will certainly sample the steep parts of the function. However, if we use the standard RBM, our sampling may not necessarily represent the entire distribution: if the steep part of the function is not drawn during sampling, $\big| \frac{1}{p-1}\sum_{j\in C_i, j\neq i} K(x^i - x^j) \big|$ will be slightly too small. Even if the steep part of the function is successfully drawn, the large outlier is damped only by the coefficient $\frac{1}{p-1}$, when it should accurately be $\frac{1}{N-1}$, so $\big| \frac{1}{p-1}\sum_{j\in C_i, j\neq i} K(x^i - x^j) \big|$ will be slightly too large. Since RBM-M uses exponential average weighting to take more particles into account, it can to some extent avoid the errors caused by sampling. Furthermore, it can be inferred from (3.18) that the variance of RBM-M is slightly smaller than that of the standard RBM, and both should be (asymptotically) unbiased estimates in the ideal setting, so RBM-M is more likely to converge to a good result.

5. Conclusions and future work


The RBM-M algorithm is an extension of the RBM algorithm that introduces a momentum correction. Theoretical
derivation and numerical experiments have shown that the RBM-M algorithm can achieve better results than the
standard RBM algorithm when the interacting particle system has a relatively large singularity, and the value of the
control parameter 𝛽 should be relatively large in such cases. Conversely, when the singularity of the interacting particle
system is relatively small, the value of the control parameter 𝛽 should be relatively small.
Furthermore, there is potential to expand the application domain of this algorithm to make it suitable for a wider
range of particle systems. For example, other algorithms such as Adam [21] and RMSProp [22] could be considered
for the correction of RBM. Simultaneously, identifying the optimal parameter 𝛽 is also a valuable area of research.

Data availability
No data was used for the research described in the article.

Acknowledgement
Z.-W. Zhang’s research is . J.-R. Chen’s work . The authors are also very grateful to anonymous reviewers for their
valuable comments and suggestions, which have greatly helped to improve our manuscript.

References
[1] S. Jin, L. Li, J.-G. Liu, Random batch methods (RBM) for interacting particle systems, Journal of Computational Physics 400 (2020) 108877.
doi:https://doi.org/10.1016/j.jcp.2019.108877.
[2] Z. Wang, J. Xin, Z. Zhang, Deepparticle: Learning invariant measure by a deep neural network minimizing wasserstein distance on data
generated from an interacting particle method, Journal of Computational Physics 464 (2022) 111309. doi:https://doi.org/10.1016/
j.jcp.2022.111309.
[3] B. Perthame, Pde models for chemotactic movements: Parabolic, hyperbolic and kinetic, Applications of Mathematics 49 (2004) 539–564.
[4] L. Greengard, V. Rokhlin, A fast algorithm for particle simulations, Journal of Computational Physics 73 (2) (1987) 325–348. doi:https:
//doi.org/10.1016/0021-9991(87)90140-9.
[5] J. S. Bagla, TreePM: A code for cosmological N-body simulations, Journal of Astrophysics and Astronomy 23 (1999) 185–196.
[6] Y. Yao, J. Capecelatro, An accurate particle-mesh method for simulating charged particles in wall-bounded flows, Powder Technology 387
(2021) 239–250. doi:https://doi.org/10.1016/j.powtec.2021.04.012.



[7] S. Ruder, An overview of gradient descent optimization algorithms, ArXiv abs/1609.04747 (2016).
[8] J. Lin, C. Song, K. He, L. Wang, J. E. Hopcroft, Nesterov accelerated gradient and scale invariance for improving transferability of adversarial
examples, ArXiv abs/1908.06281 (2019).
[9] L.-P. Chaintron, A. Diez, Propagation of chaos: A review of models, methods and applications. i. models and methods, Kinetic and Related
Models (2022).
[10] P. Clifford, A. Sudbury, A model for spatial conflict, Biometrika 60 (3) (1973) 581–588. doi:10.1093/biomet/60.3.581.
[11] J. M. Swart, A course in interacting particle systems, arXiv: Probability (2017).
[12] A. Ferreiro-Castilla, A. E. Kyprianou, R. Scheichl, An euler-poisson scheme for lévy driven sdes, arXiv: Probability (2013).
[13] J. Hong, C. Huang, X. Wang, Strong convergence rate of runge–kutta methods and simplified step-$n$ euler schemes for sdes driven by
fractional brownian motions, arXiv: Numerical Analysis (2017).
[14] Z. Wang, J. X. Xin, Z. Zhang, DeepParticle: learning invariant measure by a deep neural network minimizing Wasserstein distance on data generated from an interacting particle method, Journal of Computational Physics 464 (2022) 111309.
[15] Z. Wang, J. X. Xin, Z. Zhang, A DeepParticle method for learning and generating aggregation patterns in multi-dimensional Keller–Segel chemotaxis systems, arXiv preprint arXiv:2209.00109 (2022).
[16] S. Jin, L. Li, Random batch methods for classical and quantum interacting particle systems and statistical samplings, arXiv preprint arXiv:2104.04337 (2021).
[17] J. Liu, L. Wang, Z. Zhou, Positivity-preserving and asymptotic preserving method for 2D Keller–Segel equations, Mathematics of Computation 87 (2018) 1165–1189.
[18] M. Eisterer, The significance of solutions of the inverse Biot–Savart problem in thick superconductors, Superconductor Science and Technology 18 (2005) S58–S62.
[19] X. Chen, Z. S. Wu, M. Hong, Understanding gradient clipping in private SGD: A geometric perspective, arXiv preprint arXiv:2006.15429 (2020).
[20] L. Kong, M. Tao, Quantitative convergences of Lie group momentum optimizers, arXiv preprint arXiv:2405.20390 (2024).
[21] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).
[22] D. Xu, S. Zhang, H. Zhang, D. P. Mandic, Convergence of the RMSProp deep learning method with penalty for nonconvex optimization, Neural Networks 139 (2021) 17–23. doi:https://doi.org/10.1016/j.neunet.2021.02.011.
[23] J. A. Carrillo, Y. Huang, S. Martin, Explicit flock solutions for quasi-Morse potentials, European Journal of Applied Mathematics 25 (2014) 553–578.
[24] P. Degond, M. Engel, Coagulation–fragmentation model for animal group-size statistics, Journal of Nonlinear Science 27 (2017) 379–424.
[25] T. Vicsek, A. Czirók, E. Ben-Jacob, I. Cohen, O. Shochet, Novel type of phase transition in a system of self-driven particles, Physical Review Letters 75 (6) (1995) 1226–1229.
[26] J. A. Carrillo, S. Lisini, E. Mainini, Uniqueness for Keller–Segel-type chemotaxis models, Discrete and Continuous Dynamical Systems 34 (2014) 1319–1338.
[27] S. Motsch, E. Tadmor, Heterophilious dynamics enhances consensus, SIAM Review 56 (2014) 577–621.
[28] Z. Huang, S. Jin, L. Li, Mean field error estimate of the random batch method for large interacting particle system, arXiv preprint arXiv:2403.08336 (2024).
[29] L. Li, Y. Wang, A sharp uniform-in-time error estimate for stochastic gradient Langevin dynamics, arXiv preprint arXiv:2207.09304 (2022).
[30] S. Jin, L. Li, On the mean field limit of the random batch method for interacting particle systems, Science China Mathematics 65 (2022) 169–202.