
Journal of Robotics and Control (JRC)

Volume 6, Issue 4, 2025


ISSN: 2715-5072, DOI: 10.18196/jrc.v6i4.27678

Hybrid Path Planning for Wheeled Mobile Robot Based on RRT-star Algorithm and Reinforcement Learning Method
Hoang-Long Pham 1, Nhu-Nghia Bui 2, Thai-Viet Dang 3*
1 Research Institute of Post and Telecommunication, Posts and Telecommunications Institute of Technology, Vietnam
2, 3 Department of Mechatronics, School of Mechanical Engineering, Hanoi University of Science and Technology, Vietnam
Email: 1 Longph@[Link], 2 Nghia.BN215977@[Link], 3 [Link]@[Link]
*Corresponding Author

Abstract—In the field of wheeled mobile robots (WMRs), path planning is a critical concern. WMRs employ advanced algorithms to find a feasible path from a starting point to a specific destination. This paper proposes efficient and optimal path planning for WMRs, integrating collision avoidance strategies and smoothing techniques to determine the best route during navigation. The proposed hybrid path planning consists of an improved RRTstar algorithm and a reinforcement learning method. The RRTstar algorithm employs random sampling in conjunction with a reinforcement learning model to purposefully guide the sampling process towards areas that demonstrate an increased likelihood of successful navigation completion. The proposed RRTstar-RL algorithm generates significantly shorter trajectories compared to the traditional RRT and RRTstar methods. Specifically, the path length with the proposed algorithm is 11.323 meters, while the lengths for RRT and RRTstar are 15.74 and 14.40 meters, respectively. Moreover, the optimization of computation time, especially when using pre-trained data, greatly speeds up the path-finding calculation. In particular, the time needed to generate the optimal path with the RRTstar-RL algorithm is 2.02 times faster than that of RRTstar and 1.6 times faster than RRT. Finally, the proposed RRTstar-RL algorithm has been successfully verified for feasibility and effectively meets numerous objectives established during simulations and validation experiments.

Keywords—Wheeled Mobile Robots; Reinforcement Learning; RRTstar; Path Planning.

I. INTRODUCTION

In recent years, mobile robots have gained significant prominence in the domains of automation and robotics, particularly in navigation tasks within environments such as warehouses and manufacturing facilities. Motion and path planning are fundamental components that enable wheeled mobile robots (WMRs) to navigate autonomously [1]-[5]. Upon acquiring a global or local map through environmental perception, the robot must formulate a feasible trajectory from its initial position to the desired destination. These robots are required to operate with both flexibility and precision to effectively execute their assigned tasks. Prior research has investigated various image processing techniques and control strategies aimed at enhancing the operational flexibility and cognitive capabilities of WMRs. Traditional control methods are often effective only when the specific characteristics of the system and the precise locations of tracked objects are known [6]-[10]. For example, visual servoing utilizes continuous visual feedback to guide a mobile robot toward a stationary target [11]-[17]. However, achieving this objective can be particularly challenging in environments characterized by complex and unpredictable behaviors, as well as uncertain disturbances [18], [19]. Such irregular fluctuations in control systems can considerably affect the efficacy and stability of control mechanisms. Consequently, it is imperative to develop a system that demonstrates exceptional tracking capabilities to improve the performance of vision-based mobile robots. A critical component in the pursuit of autonomy is motion planning, which enables WMRs to determine their own trajectories [20], [21]. Once equipped with either a global or local map through environmental awareness, WMRs must formulate a feasible path from their initial position to their intended destination. This journey must comply with specific criteria, which may include reducing operational costs, identifying the most expedient route, or minimizing travel time [22], [23].

Path planning algorithms function as proficient navigators, adeptly determining routes from initial points to destinations while skillfully circumventing obstacles [24], [25]. These algorithms can be classified into two primary categories: global and local path planning. Global path planning serves as a comprehensive strategist, identifying a sequence of critical waypoints that connect the starting point to the endpoint [26]-[30]. It utilizes three fundamental techniques: graph search, sampling search, and dynamic search. Graph search algorithms, such as Dijkstra's [31] and Astar [32], excel in low-dimensional spaces, ensuring a comprehensive exploration of potential routes. In contrast, sampling search algorithms are suited for high-dimensional spaces, providing a probabilistic assurance of pathfinding. Dynamic search, on the other hand, improves the connectivity of path nodes, albeit at the expense of some completeness. Conversely, local path planning operates as a meticulous artist, generating precise trajectories from the start node to the target node within a localized area [33]-[38]. This category encompasses traditional methods such as the Rapidly exploring Random Tree (RRT) [39], Time-Elastic Band (TEB) [40], Dynamic Window Approach (DWA) [41], [42], Artificial Potential Field (APF) [43], [44], and Neural Network Methods (NNM) [45], [46]. As we consider future developments, the trajectory of motion planning for WMRs


is evident: it is transitioning from broad, coarse path planning in spatial contexts to detailed trajectory planning in temporal domains, facilitated by the continuously advancing capabilities of computational power.

In a comprehensive examination of path-finding algorithms, Liu et al. [47] presented the weighted Astar algorithm to complete the WMR's understanding. Concurrently, Feng et al. [48] achieved significant advancements of the bidirectional search algorithm for optimal values. Dang et al. [49] introduced the innovative jump point search (JPS) algorithm for eliminating redundant Astar path points in grid maps; the hybrid JBS-A*B algorithm and improved DWA increased safety in local areas. Building upon this progress, Esmaiel et al. [50] developed the LQR-RRTstar algorithm, which utilizes a Linear Quadratic Regulator (LQR) to determine the optimal path for extended random tree nodes within a specified timeframe in dynamic environments. Wang et al. [51] addressed this challenge by formulating a quadratic convex optimization problem aimed at minimizing the discrepancy between the current and ideal states, thereby directly determining the optimal trajectory under dynamic constraints. Finally, Zhang et al. [52] introduced the Flat-RRTstar algorithm, specifically designed for differentially flat systems, in which trajectory kinematic constraints derive optimal motion primitives between two grid states, ultimately producing suboptimal trajectories that connect two nodes.

The analysis presented above clearly indicates that various navigation strategies possess distinct advantages and disadvantages. To facilitate seamless movement and enhance the stability of WMRs while tracking their trajectories, we propose a hybrid path planning approach that integrates an improved RRTstar algorithm with a RL method [53], [54]. Initially, the RRTstar algorithm is refined to leverage the benefits of reinforcement learning. To address the inefficiencies associated with the sampling process of the RRTstar algorithm, we have incorporated a reinforcement learning framework. This framework comprises an Actor, which determines the appropriate actions to be taken, and a Critic, which evaluates the effectiveness of these actions and provides feedback for necessary adjustments to the WMRs in dynamic environments. The Actor is constructed using the U-Net architecture [55], [56], which is responsible for generating probability maps, while the Critic employs the MobileNetV2 architecture [57]-[60] to assess the current policy, i.e., the weight patterns produced by the Actor, in relation to the achievement of the reward parameter. The proposed RRTstar-RL algorithm offers significant advantages, including a reduction in inefficient sampling, the extraction of map features that incorporate trained data into the pathfinding process, and an acceleration of processing time due to a decreased need for sample review. Two basic drawbacks of RRTstar have been addressed: the inefficient random sampling process and the lack of connectivity between samples. Finally, the feasibility of the proposed RRTstar-RL algorithm has been successfully validated, demonstrating its effectiveness in achieving various objectives established during simulations and validation experiments.

II. PROPOSED METHOD

A. RRTstar Algorithms

The RRTstar algorithm represents an advancement over the conventional RRT methodology (see Fig. 1), particularly in the domains of pathfinding and optimization. Its primary aim is to determine the most efficient and feasible route for navigation from a specified starting point x_init to a designated destination x_goal. The operational framework involves the random selection of points, referred to as x_rand, which facilitates the expansion of the search tree. Subsequently, the algorithm identifies the vertices within the tree, denoted as x_near, that are closest to x_rand, while adhering to a predetermined distance threshold s to ensure that the trajectory from x_near to x_new remains unobstructed by obstacles. The neighboring points are then evaluated, and the parent point with the lowest associated cost is selected. Ultimately, the neighboring points are reconnected to optimize the overall path, culminating in a complete route.

Fig. 1. Traditional RRT algorithm

The RRTstar algorithm diverges from the conventional approach of selecting the nearest point as the parent point for the new point. As illustrated in Fig. 2, this process involves drawing a circle around the nearest point x_nearest and assessing the distance between any point within this circle and the new point x_new, in Fig. 2(a). If the distance from x_nearest to x_new is less than the distance from x_new to other points (q1 or q2), a connection is established between x_nearest and x_new, in Fig. 2(b). Additionally, it is necessary to evaluate the shortest distance between x_nearest and q2. Should the distance from x_new to q2 be shorter, the parent of q2 is updated to x_new (see Fig. 2(c)).

Fig. 2. RRTstar algorithm

Within the context of this research, the authors have developed the RRTstar algorithm to facilitate enhancements when utilizing reinforcement learning techniques. In particular, x_rand points are selected randomly from the configuration space in accordance with the established probability map, which delineates the marginal probability along the X-axis, in (1), and the conditional probability along the Y-axis, in (2):

P(x) = Σ_y P(x, y)    (1)

P(y | x) = P(x, y) / P(x)    (2)

The sampled point always satisfies:

x_rand ∼ P(x, y), x_rand ∈ C_free    (3)

where C_free is the free configuration space.

The nearest-point search selects the k neighbors closest to the point x_rand in the tree:

N_k(x_rand) = argmin_{x_i ∈ T} ‖x_rand − x_i‖_2, i = 1...k    (4)

where T is the set of existing points in the search tree and ‖x_rand − x_i‖_2 is the Euclidean distance between x_rand and x_i. A point x_new is created in the direction from x_near to x_rand, satisfying the allowed distance:

x_new = x_near + min(range, ‖x_rand − x_near‖) · (x_rand − x_near) / ‖x_rand − x_near‖    (5)

The candidate points are checked for collisions with obstacles, and only collision-free points are retained. The optimal parent is then selected from the k neighbors so that the total cost to the current point is minimal. The selection function is:

x_parent = argmin_{x_i} (c(x_i) + ‖x_i − x_new‖)    (6)

where c(x_i) is the path cost from the start to x_i. The parents of the points are updated if traversing x_new costs less:

∀x_i ∈ N_k(x_new): if c(x_new) + ‖x_new − x_i‖ < c(x_i) ⇒ update parent of x_i    (7)

B. Reinforcement Learning (RL) Method

The primary challenge lies in estimating the RRTstar function with a limited number of iterations, utilizing a floating-point value for each iteration. While RRTstar can identify optimal motion paths, its sampling process is characterized by inefficiency, as it does not leverage any information regarding the environment and fails to derive insights from previously solved problems. To address these limitations, the authors have incorporated a reinforcement learning framework into the RRTstar algorithm. This integration involves an Actor that determines the appropriate actions to take, while a Critic evaluates the effectiveness of these actions and provides feedback for necessary adjustments. The overall architecture of this enhanced framework is depicted in Fig. 3.

Fig. 3. The architecture of RRTstar-RL algorithm

This study diverges from the conventional approach of employing random sampling of points in RRTstar by utilizing a RL model to strategically direct sampling towards regions that exhibit a higher navigation completion rate. During the training phase, the model generates weight maps that correspond to the processed map data, thereby optimizing both path cost and computational efficiency. The RL model is predicated on three fundamental components: state, action, and reward. The state is defined as the input environment, which encompasses information regarding the starting point, destination, and obstacles. These elements are integrated into a corresponding dataset for both training and inference purposes, with each instance encoded as a tensor of dimensions (3, H, W). The action is conceptualized as a weight map, wherein each pixel represents the probability of sampling a point at its respective coordinates. The computational framework is constructed based on a normal distribution, denoted as N(μ, σ²), where:

μ, σ² = Actor(map) ∈ R^{H×W}    (8)

In (8), the parameters of the normal distribution are processed through the Sigmoid function, resulting in a range of values between 0 and 1, thereby generating a probability map. The reward function is subsequently determined using the following formula:

Reward = 1000 − Cost    (9)

A reward value of zero signifies the absence of a satisfactory path within the environment. The model consists of two primary components: the Actor and the Critic. The Actor is designed using the U-Net architecture, which is responsible for generating probability maps or, in other terms, producing actions based on the given input. In contrast, the Critic utilizes the MobileNetV2 architecture to assess the current policy π, i.e., the weight patterns generated by the Actor, in relation to the attainment of the reward parameter. This dual structure enhances the model's capability and efficiency in pathfinding within the environment. U-Net is inherently a flexible model with configurable input heads, offering stable performance thanks to its symmetric architecture, and it is readily set up and customized with input data from the environment. MobileNetV2 is specially designed for high speed and small model size with minimal computational parameters; this architecture is suitable for accelerating the computation and evaluating the effectiveness of the generated actions, from which the optimal navigation process is selected. The evaluation is conducted through the value function:

V(s) = E_π[ Σ_{t=0}^{∞} γ^t r_t ]    (10)

where V(s) is the value function at the corresponding state s, E_π is the expectation under policy π mapped from the state, γ ∈ [0, 1) is the discount factor, and r_t is the reward at time step t. V(s) is compared against the actual rewards to obtain the advantage A, which participates in calculating the Critic's loss:

A = Reward − V(s)    (11)
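To make the sampling scheme in (1)-(3) concrete, the following is a minimal sketch of drawing x_rand from a probability map restricted to C_free. This is an illustrative reconstruction, not the authors' code: the array shapes, the free-space mask, and the function name are assumptions.

```python
import numpy as np

def sample_from_probability_map(prob_map, free_mask, rng):
    """Draw x_rand ~ P(x, y) restricted to C_free: first sample a column x
    from the marginal P(x) = sum_y P(x, y), then a row y from the
    conditional P(y | x) = P(x, y) / P(x)."""
    # Zero out occupied cells so every sample lies in C_free.
    p = prob_map * free_mask
    p = p / p.sum()
    # Marginal over columns (X-axis), eq. (1).
    p_x = p.sum(axis=0)
    x = rng.choice(p.shape[1], p=p_x)
    # Conditional over rows (Y-axis) given the chosen column, eq. (2).
    p_y_given_x = p[:, x] / p_x[x]
    y = rng.choice(p.shape[0], p=p_y_given_x)
    return x, y

rng = np.random.default_rng(0)
prob_map = np.ones((4, 4))   # uniform weight map, stand-in for the Actor output
free_mask = np.ones((4, 4))
free_mask[1, 2] = 0.0        # one hypothetical occupied cell
x, y = sample_from_probability_map(prob_map, free_mask, rng)
```

In the full pipeline the weight map would come from the trained Actor rather than being uniform; the decomposition into marginal and conditional draws is what makes per-pixel sampling cheap.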

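The extend step in (4)-(7) — nearest-neighbor lookup, steering, parent selection, and rewiring — can be sketched in plain Python as follows. This is a hedged illustration under assumed helpers: the Node container, the stubbed collision_free check, and the step size range_ are not from the paper.

```python
import math

class Node:
    def __init__(self, x, y, parent=None, cost=0.0):
        self.x, self.y, self.parent, self.cost = x, y, parent, cost

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def steer(x_near, x_rand, range_):
    """Eq. (5): move from x_near toward x_rand by at most range_."""
    d = dist((x_near.x, x_near.y), x_rand)
    if d == 0.0:
        return (x_near.x, x_near.y)
    step = min(range_, d)
    return (x_near.x + step * (x_rand[0] - x_near.x) / d,
            x_near.y + step * (x_rand[1] - x_near.y) / d)

def extend(tree, x_rand, range_=1.0, k=5, collision_free=lambda p, q: True):
    # Nearest vertex in the tree, eq. (4).
    x_near = min(tree, key=lambda n: dist((n.x, n.y), x_rand))
    p_new = steer(x_near, x_rand, range_)
    # k nearest neighbors of the new point.
    neighbors = sorted(tree, key=lambda n: dist((n.x, n.y), p_new))[:k]
    # Choose the parent minimizing total cost, eq. (6).
    parent = min((n for n in neighbors if collision_free((n.x, n.y), p_new)),
                 key=lambda n: n.cost + dist((n.x, n.y), p_new))
    new = Node(*p_new, parent=parent,
               cost=parent.cost + dist((parent.x, parent.y), p_new))
    tree.append(new)
    # Rewire, eq. (7): reroute neighbors through the new node if cheaper.
    for n in neighbors:
        c = new.cost + dist((new.x, new.y), (n.x, n.y))
        if c < n.cost and collision_free((new.x, new.y), (n.x, n.y)):
            n.parent, n.cost = new, c
    return new
```

Feeding x_rand from the learned probability map instead of a uniform draw is the only change the hybrid method makes to this loop; the parent-selection and rewiring steps are standard RRTstar.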

Therefore, the simulated probability distributions and maps are illustrated in Fig. 4 and Fig. 5, respectively.

Fig. 4. Probability map

Fig. 5. The training scenarios in the dataset (above) and the trained weight maps (below), with a color scale from light to dark, where light indicates accessible regions and dark indicates collision-prone regions

The correlation between values and optimal completion rates during the pathfinding process facilitates the rapid convergence of the RRT-star algorithm. The integration of learned data minimizes the formation of redundant and inefficient search trees. Ultimately, this leads to the identification of the optimal path from the initial point to the target destination. In conclusion, the authors employ a RL framework in conjunction with RRT-star, which offers significant advantages, including the reduction of inefficient sampling, the extraction of map features that integrate trained data into the pathfinding process within the environment, and an acceleration of processing time due to a decreased necessity for sample review.

III. RESULTS AND DISCUSSION

The experimental robot model is a three-wheeled mobile robot equipped with a Lidar and a computer, as shown in Fig. 6.

Fig. 6. Practical three-wheeled mobile robot

Utilizing the established and trained environment, the authors assess and contrast the trajectories produced by the proposed model with those generated by the RRTstar algorithm. The findings are presented in Fig. 7.

Fig. 7. Comparison of the path results calculated from the proposed method (blue) with the RRT-Star algorithms

Based on the probability of regions in the map, the proposed RRTstar-RL algorithm produces significantly shorter trajectories than the RRTstar algorithm. In Fig. 8, the paths generated based on the reinforcement learning model have an average length of 11.323 meters, while the experiments with the traditional RRT algorithm [62] have an average length of 15.74 meters and RRTstar [61] an average length of 14.40 meters. It is clear that the proposed RRTstar-RL algorithm produces paths that are closer to the ideal path than the unimproved approaches. In the case of densely obstructed maps, RRTstar sometimes fails to find any path from the starting point to the destination, while the proposed model almost never fails to find a path. The authors conclude that the use of reinforcement learning combined with the RRTstar algorithm significantly improves path-finding efficiency in diverse environmental scenarios. This demonstrates the potential and advantages of integrating the proposed method into path planning problems for autonomous systems, especially intelligent mechatronic systems [63], [64].

Visual analyses employing metrics such as computation time, path length, and optimality have been systematically conducted to demonstrate the superior performance of the proposed method. Notably, the optimization of computation time, particularly when utilizing pre-trained data, significantly accelerates the path-finding process. Specifically, the RRTstar-RL algorithm generates the optimal path 2.02 times faster than RRTstar and 1.6 times faster than RRT. Furthermore, improvements in path length have been empirically validated through comparative tables. Based on the referenced studies, the authors conclude that integrating reinforcement learning within the RRTstar framework substantially enhances path-finding efficiency across diverse environments, yielding shorter travel distances and significantly reduced computation times, while effectively optimizing parameter utilization. The reinforcement learning model processes 5.8 million parameters and directly outputs the optimal movement trajectory, a procedure considerably faster than traditional path planning algorithms that rely on exhaustive sampling and map-based path searches. Additionally, the relatively small computational footprint of RRTstar-RL facilitates its deployment on resource-constrained systems with minimal computational overhead.

Fig. 8. Comparison of the proposed RRT-star-RL algorithm with other methods based on the metrics: computing time, path length and optimality

In order to enhance the accuracy of the proposed methodology, the RRTstar-RL algorithm has been developed to improve the efficiency of data processing within dynamic robotic environments. This algorithm is designed to effectively plan paths that ensure successful navigation to designated destinations while eliminating superfluous waypoints, thereby achieving the shortest possible path distance. Furthermore, the RRTstar-RL algorithm is integrated with a trajectory smoothing technique utilizing B-spline, which contributes to the stability of the trajectory tracking process and minimizes the error associated with the three-wheel mobile robot's turning angles, as shown in Fig. 9.

Fig. 9. The processing of the calculated trajectories based on the proposed RRTstar-RL algorithm

Finally, the petal-shaped complex trajectory tracking control process was implemented in practical settings to evaluate its efficacy. Fig. 10 illustrates that, utilizing the navigation plan derived from the proposed RRTstar-RL algorithm, the three-wheeled mobile robot demonstrates stable movement and precise navigation to each peak of the petal trajectory. This is achieved while maintaining stable trajectory tracking and minimizing error, as indicated by the black line during its motion.

Fig. 10. Three-wheel mobile robot's three-petal trajectory tracking process

IV. CONCLUSIONS

The paper proposes an efficient and optimal path planning approach for WMRs, which incorporates collision avoidance strategies and smoothing techniques to identify the most effective navigation route. The proposed hybrid path planning framework integrates an enhanced RRTstar algorithm with a reinforcement learning methodology. The path length achieved with the proposed algorithm decreased by 28% and 21% compared to the RRT and RRTstar algorithms, respectively. Furthermore, the optimization of

computational time, particularly when utilizing pre-trained Sustainable Future: Conceptual Framework, Scenarios, and
Multidiscipline Perspectives, pp 275-285, 2024, doi: 10.1007/978-3-
data, substantially accelerates the path-finding calculation 031-65656-9_28.
process. Notably, the time required to generate the optimal
[16] V. T. Nguyen, N. N. Bui, D. M. C. Tran, P. X. Tan, and T. V. Dang,
path using the RRTstar-RL algorithm is 2.02 times faster than “FDE-Net: Lightweight Depth Estimation for Monocular Cameras,”
that of RRTstar and 1.6 times faster than RRT. Ultimately, The 13th International Symposium on Information and Communication
the proposed RRTstar-RL path planning algorithm has been Technology (SOICT 2024), pp. 3-13, 2025, doi: 10.1007/978-981-96-
successfully validated for feasibility and effectively fulfills 4282-3_1.
multiple objectives established during simulation and [17] T. V. Dang, V. D. Ngo, M. Q. Ngo, and N. T. Bui, “OD-CT3D: Object
Detection Model based on CentreTrack 3D for Mobile Robot Global
validation experiments. The challenges of optimizing the Path Planning,” 5th International Conference on Intelligent Systems &
model to achieve fast inference speed, minimizing Networks, 2025.
computational parameters are the premise for future jobs. [18] Y. Zheng et al., “Adaptive fuzzy sliding mode control of uncertain
nonholonomic wheeled mobile robot with external disturbance and
REFERENCES actuator saturation,” Information Sciences, vol. 663, no. 120303, 2024.
[1] V. T. Nguyen, D. N. Duong, and T. V. Dang, “Optimal Two-Wheeled [19] Y. Wu and W. Yu, “Asymptotic tracking control of uncertain
Self-Balancing Mobile Robot Strategy of Navigation using Adaptive nonholonomic wheeled mobile robot with actuator saturation and
Fuzzy controller-based KD-SegNet,” Intelligent Service Robotics, pp. external disturbances,” Neural Computing and Applications, vol. 32,
1-25, 2025, doi: 10.1007/s11370-025-00606-0. no. 2, 2020, doi: 10.1007/s00521-019-04373-9.
[2] T. T. Le, K. K. Phung Cong, and T. V. Dang, “An Improved Coverage [20] A. D. Nguyen, T. D. Vu, Q. A. Vu, and T. V. Dang, “Research on
Path planning for Service robots based on Backtracking method,” MM Modeling and Object Tracking for Robot Arm based on Deep
Science Journal, vol. 10, pp. 7464-7468, 2024, doi: Reinforcement Learning,” MM Science Journal, vol. 6, pp. 8459-8463,
10.17973/MMSJ.2024_10_2024063. 2025, doi: 10.17973/MMSJ.2025_06_2025059.
[3] A. Abadi et al., “Robust Tracking Control of Wheeled Mobile Robot [21] B. Shi et al., “An intelligence enhancement method for USV navigation
Based on Differential Flatness and Sliding Active Disturbance visual measurement based on variable gradient soft-threshold
Rejection Control: Simulations and Experiments,” Sensor, vol. 24, no. correction,” Measurement, vol. 242, no. 116201, 2025, doi:
2849, 2024, doi: 10.3390/s24092849. 10.1016/[Link].2024.116201.
[4] M. Y. Silaa, A. Bencherif, and O. Barambones, “Indirect Adaptive [22] T. V. Dang, “Optimization Hybrid Path Planning based on A-star
Control Using Neural Network and Discrete Extended Kalman Filter Algorithm combining with DWA,” MM Science Journal, vol. 10, pp.
for Wheeled Mobile Robot,” Actuators, vol. 13, no. 51, 2024, doi: 7551-7555, 2024, doi: 10.17973/MMSJ.2024_10_2024077.
10.3390/act13020051. [23] T. V. Dang and D. S. Nguyen, “Optimal Navigation Based on
[5] J. Lin et al., “Combined Localization Method for Multimodal Wheel- Improved A* Algorithm for Mobile Robot,” Intelligent Systems and
Track Robots in Sheltered Space,” IEEE Access, vol. 12, pp. 47271- Networks, pp. 574-580, 2023, doi: 10.1007/978-981-99-4725-6_68.
47282, 2024, doi: 10.1109/ACCESS.2024.3364068. [24] T. P. Nguyen, H. Nguyen, and N. Q. T. Ha, “Towards sustainable
[6] B. Kazed and A. Guessoum, “A Lyapunov based posture controller for scheduling of a multi-automated guided vehicle system for collision
a differential drive mobile robot,” IAES International Journal of avoidance,” Computers and Electrical Engineering, vol. 120, no.
Robotics and Automation (IJRA), vol. 13, no. 1, pp. 1-10, 2024, doi: 109824, 2024, doi: 10.1016/[Link].2024.109824.
10.11591/ijra.v13i1. [25] C. C. Huang, C. H. Huang, and J. S. Shaw, “Development of an AMR
[7] S. Sachan and P. M. Pathak, “Addressing unpredictable movements of Applying Cartographer Combined with Visual Odometry for
dynamic obstacles with deep reinforcement learning to ensure safe Navigation,” Journal of Applied Science and Engineering, vol. 28, no
navigation for omni-wheeled mobile robot,” Proc. I. Mech. E. Part C: 1, pp. 35-40, 2024, doi: 10.6180/jase.202501_28(1).0004.
J. Mechanical Engineering Science, vol. 239, no. 4, pp. 1267-1293, [26] Z. Lin, L. Lu, Y. Yuan, and H. Zhao, “A novel robotic path planning
2024, doi: 10.1177/09544062241281115. method in grid map context based on D* lite algorithm and deep
[8] S. Yang et. al., “A RISE-based asymptotic prescribed performance learning,” J. Circuits Syst. Comput., vol. 33, no. 4, p. 2450057, 2023,
trajectory tracking control of two-wheeled self-balancing mobile doi: 10.1142/S0218126624500579.
robot,” Nonlinear Dyn., vol. 112, pp. 15327-15348, 2024, doi: [27] L. Liu et al., “Global dynamic path planning fusion algorithm
10.1007/s11071-024-09569-w. combining jump-A* Algorithm and dynamic window approach,” IEEE
[9] H. Xue, S. Lu, and C. Zhang, “An Adaptive Control Based on Access, vol. 9, pp. 19632-19638, 2021, doi:
Improved Gray Wolf Algorithm for Mobile Robots,” Applied Sciences, 10.1109/ACCESS.2021.3052865.
vol. 14, no. 7092, 2024, doi: 10.3390/app14167092. [28] Z. Xunyu, T. Jun, H. Huosheng, and P. Xiafu, “Hybrid path planning
[10] C. Li and Z. Li, “Dynamic Modeling and Disturbance-Observer- based on safe A* Algorithm and adaptive window approach for mobile
Enhanced Control for Mecanum-Wheeled Vehicles Under Load and robot in large-scale dynamic environment,” J. Intell. Rob. Syst., vol.
Noise Disturbance,” Mathematics, vol. 13, no. 789, 2025, doi: 99, no. 2, pp. 65-77, 2020, doi: 10.1007/s10846-019-01112-z.
10.3390/math13050789. [29] C. C. Hsu, Y. J. Chen, M. C. Lu, and L. S. An, “Hybrid path planning
[11] T. V. Dang and N. T. Bui, “Multi-Scale Fully Convolutional Network- incorporating global and local search for Mobile robot,” Conference
Based Semantic Segmentation for Mobile Robot Navigation,” Towards Autonomous Robotic Systems, vol. 7429, 2012, doi:
Electronics, vol. 12, no. 3, p. 533, 2023, doi: 10.1007/978-3-642-32527-4_50.
10.3390/electronics12030533. [30] M. Imran and F. Kunwar, “A Hybrid path planning technique
[12] T. V. Dang and N. T. Bui, “Obstacle Avoidance Strategy for Mobile developed by integrating global and local path planner,” 2016
Robot based on Monocular Camera,” Electronics, vol. 12, no. 8, p. International Conference on Intelligent Systems Engineering (ICISE),
1932, 2023, doi: 10.3390/electronics12081932. pp. 118-122, 2016, doi: 10.1109/INTELSE.2016.7475172.
[13] H. T. Linh and T. V. Dang, “An ultra-fast Semantic Segmentation [31] L. S. Liu et al., “Path planning for smart car based on Dijkstra
Model for AMR's Path Planning,” Journal of Robotics and Control, algorithm and dynamic window approach,” Wirel. Commun. Mob.
vol. 4, no. 3, pp. 424-430, 2023, doi: 10.18196/jrc.v4i3.18758. Comput., vol. 4, pp. 1-12, 2021, doi: 10.1088/10.1155/2021/8881684.
[14] T. V. Dang and N. T. Bui, “Design the Abnormal Object Detection [32] R. Song, Y. Liu, and R. Bucknall, “Smoothed A∗ algorithm for
System Using Template Matching and Subtract Background practical unmanned surface vehicle path planning,” Appl. Ocean Res.,
Algorithm,” Proceedings of the 3rd Annual International Conference vol. 83, no. 6, pp. 9-20, 2019, doi: 10.1016/[Link].2018.12.001.
on Material, Machines and Methods for Sustainable Development, pp. [33] L. Zhao, G. Li, and H. Zhang, “Global and Local Awareness: Combine
87-95, 2024, doi: 10.1007/978-3-031-57460-3_10. Reinforcement Learning and Model-Based Control for Collision
[15] T. V. Dang, N. N. Bui, and N. T. Bui, “Binary-SegNet: Efficient Avoidance,” IEEE Open Journal of Intelligent Transportation Systems,
Convolutional Architecture for Semantic Segmentation Based on vol. 5, pp. 422-432, 2024, doi: 10.1109/OJITS.2024.3424587.
Monocular Camera,” From Smart City to Smart Factory for

Hoang-Long Pham, Hybrid Path Planning for Wheeled Mobile Robot Based on RRT-star Algorithm and Reinforcement
Learning Method
Journal of Robotics and Control (JRC) ISSN: 2715-5072 2051
[34] Y. Zhu, J. Zhu, and P. Zhang, “Local obstacle avoidance control for multi-axle and multi-steering-mode wheeled robot based on window-zone division strategy,” Robotics and Autonomous Systems, vol. 183, no. 104843, 2025, doi: 10.1016/[Link].2024.104843.
[35] L. Xiang, X. Li, H. Liu, and P. Li, “Parameter Fuzzy Self-Adaptive Dynamic Window Approach for Local Path Planning of Wheeled Robot,” IEEE Open Journal of Intelligent Transportation Systems, vol. 3, pp. 1-6, 2021, doi: 10.1109/OJITS.2021.3137931.
[36] M. Kobayashi and N. Motoi, “Local Path Planning: Dynamic Window Approach with Virtual Manipulators Considering Dynamic Obstacles,” IEEE Access, vol. 10, pp. 17018-17029, 2022, doi: 10.1109/ACCESS.2022.3150036.
[37] M. Kobayashi, H. Zuski, T. Nakamura, and N. Motoi, “Local Path Planning: Dynamic Window Approach With Q-Learning Considering Congestion Environments for Mobile Robot,” IEEE Access, vol. 11, pp. 96733-96742, 2023, doi: 10.1109/ACCESS.2023.3311023.
[38] M. Kramer and T. Bertram, “Improving Local Trajectory Optimization by Enhanced Initialization and Global Guidance,” IEEE Access, vol. 10, pp. 29633-29645, 2022, doi: 10.1109/ACCESS.2022.3159233.
[39] T. V. Dang, “Research and design of a path planning using an improved RRT* algorithm for an autonomous mobile robot,” MM Science Journal, vol. 10, pp. 6712-6716, 2023, doi: 10.17973/MMSJ.2023_10_2023051.
[40] T. V. Dang, “Autonomous mobile robot path planning based on enhanced A* algorithm integrating with time elastic band,” MM Science Journal, vol. 10, pp. 6717-6722, 2023, doi: 10.17973/MMSJ.2023_10_2023052.
[41] T. V. Dang, D. S. Nguyen, and N. T. Bui, “Hybrid Path Planning for Mobile Robot based on Improved A* Fusion Dynamic Window Approach,” Proceedings of the International Conference on Intelligent Systems and Networks, pp. 82-88, 2024, doi: 10.1007/978-981-97-5504-2_10.
[42] Y. Sun et al., “Local Path Planning for Mobile Robots Based on Fuzzy Dynamic Window Algorithm,” Sensors, vol. 23, no. 8260, 2023, doi: 10.3390/s23198260.
[43] Y. Chen, G. Bai, Y. Zhan, X. Hu, and J. Liu, “The USV Based on Improved ACO-APF Hybrid Algorithm with Adaptive Early-Warning,” IEEE Access, vol. 9, pp. 40728-40742, 2021, doi: 10.1109/ACCESS.2021.3062375.
[44] Y. Ji, L. Ni, C. Zhao, C. Lei, and Y. Du, “TriPField: A 3D Potential Field Model and Its Applications to Local Path Planning of Autonomous Vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 3, 2023, doi: 10.1109/TITS.2022.3231259.
[45] D. Li, W. Yin, W. E. Wong, M. Jian, and M. Chau, “Quality-Oriented Hybrid Path Planning Based on A∗ and Q-Learning for Unmanned Aerial Vehicle,” IEEE Access, vol. 10, pp. 7664-7674, 2021, doi: 10.1109/ACCESS.2021.3139534.
[46] C. Wang, X. Yang, and H. Li, “Improved Q-Learning Applied to Dynamic Obstacle Avoidance and Path Planning,” IEEE Access, vol. 10, pp. 92879-92888, 2022, doi: 10.1109/ACCESS.2022.3203072.
[47] Y. Liu, C. Wang, and H. Wu, “Mobile Robot Path Planning Based on Kinematically Constrained A-Star Algorithm and DWA Fusion Algorithm,” Mathematics, vol. 11, no. 4552, 2023, doi: 10.3390/math11214552.
[48] Y. Feng, W. Zhang, and J. Zhu, “Application of an Improved A* Algorithm for the Path Analysis of Urban Multi-Type Transportation Systems,” Appl. Sci., vol. 13, no. 13090, 2023, doi: 10.3390/app132413090.
[49] T. V. Dang and P. X. Tan, “Hybrid Mobile Robot Path Planning Using Safe JBS-A*B Algorithm and Improved DWA Based on Monocular Camera,” Journal of Intelligent & Robotic Systems, vol. 110, no. 151, pp. 1-21, 2024, doi: 10.1007/s10846-024-02179-z.
[50] H. Esmaiel, G. Zhao, Z. A. Qasem, J. Qi, and H. Sun, “Double-Layer RRT* Objective Bias Anytime Motion Planning Algorithm,” Robotics, vol. 13, no. 41, 2024, doi: 10.3390/robotics13030041.
[51] H. Wang, X. Zhou, and J. Li, “Improved RRT* Algorithm for Disinfecting Robot Path Planning,” Sensors, vol. 24, no. 1520, 2024, doi: 10.3390/s24051520.
[52] Q. Zhang, Y. Liu, and J. Qin, “An Informed-Bi-Quick RRT* Algorithm Based on Offline Sampling: Motion Planning Considering Multiple Constraints for a Dual-Arm Cooperative System,” Actuators, vol. 13, no. 75, 2024, doi: 10.3390/act13020075.
[53] H. Han, J. Wang, L. Kuang, X. Han, and H. Xue, “Improved Robot Path Planning Method Based on Deep Reinforcement Learning,” Sensors, vol. 23, no. 5622, 2023, doi: 10.3390/s23125622.
[54] C. H. Nguyen, Q. A. Vu, K. K. Phung Cong, and T. V. Dang, “Optimal obstacle avoidance strategy using deep reinforcement learning based on stereo camera,” MM Science Journal, vol. 10, pp. 7556-7561, 2024, doi: 10.17973/MMSJ.2024_10_2024078.
[55] C. Copurkaya, E. Meriç, F. P. Akbulut, and C. Catal, “A multi-pretraining U-Net architecture for semantic segmentation,” Signal Image and Video Processing, vol. 19, no. 8, 2025, doi: 10.1007/s11760-025-04125-4.
[56] K. Rezaee et al., “Hand gestures classification of sEMG signals based on BiLSTM-metaheuristic optimization and hybrid U-Net-MobileNetV2 encoder architecture,” Scientific Reports, vol. 14, no. 1, 2024, doi: 10.1038/s41598-024-82676-1.
[57] Q. Zhu et al., “A study on expression recognition based on improved mobilenetV2 network,” Scientific Reports, vol. 14, no. 1, 2024, doi: 10.1038/s41598-024-58736-x.
[58] Z. Li, “Enhancing Tea Leaf Disease Identification with Lightweight MobileNetV2,” Computers, Materials & Continua, vol. 80, no. 1, pp. 679-694, 2024, doi: 10.32604/cmc.2024.051526.
[59] T. V. Dang, D. M. C. Tran, and P. X. Tan, “IRDC-Net: Lightweight Semantic Segmentation Network Based on Monocular Camera for Mobile Robot Navigation,” Sensors, vol. 23, no. 15, p. 6907, 2023, doi: 10.3390/s23156907.
[60] T. V. Dang, X. T. Phan, and N. N. Bui, “KD-SegNet: Efficient Semantic Segmentation Network with Knowledge Distillation Based on Monocular Camera,” Computers, Materials & Continua, vol. 82, no. 2, pp. 2001-2026, 2025, doi: 10.32604/cmc.2025.060605.
[61] M. Zhang, S. Liu, Q. Zhou, and X. Han, “A novel path planning scheme based on Fast-IBi-RRT* algorithm for industrial robots,” Applied Intelligence, vol. 55, no. 11, 2025, doi: 10.1007/s10489-025-06694-w.
[62] S. Lei et al., “Research on improved RRT path planning algorithm based on multi-strategy fusion,” Scientific Reports, vol. 15, no. 1, 2025, doi: 10.1038/s41598-025-92675-5.
[63] V. T. Nguyen, C. D. Do, T. V. Dang, T. L. Bui, and P. X. Tan, “A Comprehensive RGB-D Dataset for 6D Pose Estimation for Industrial Robots Pick and Place: Creation and Real-World Validation,” Results in Engineering, vol. 24, no. 103459, 2024, doi: 10.1016/[Link].2024.103459.
[64] T. V. Dang, D. M. C. Tran, N. N. Bui, and P. X. Tan, “ELDE-Net: Efficient Light-weight Depth Estimation Network for Deep Reinforcement Learning-based Mobile Robot Path Planning,” Computers, Materials & Continua, 2025.