Bug report
Required Info:
- Operating System:
- Computer:
- ROS2 Version:
- Version or commit hash:
- DDS implementation:
Steps to reproduce issue
When launching an application that is using the lifecycle manager with multiple lifecycle nodes, a deadlock can be triggered.
If a shutdown is initiated (CTRL+C) while the lifecycle nodes are being brought up, the bringup and shutdown sequences of the lifecycle manager run at the same time.
The resulting race condition can leave the bringup logic blocked in spin_until_future_complete for one of the nodes. That future never completes because shutdown has already been initiated.
Right here:
if (callback_group_executor_.spin_until_future_complete(future_result) !=
Expected behavior
The bringup sequence should be aborted and all nodes should be shut down.
On the jazzy branch I have added a stop() function in service_client.hpp. This stop function cancels any running spin* functions on service_client's internal executor.
void stop()
{
  if (client_) {
    callback_group_executor_.cancel();
  }
}
If we then call this stop function in the destructor of LifecycleServiceClient (lifecycle_service_client.hpp), the deadlock is avoided.
~LifecycleServiceClient()
{
  change_state_.stop();
}
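To illustrate the idea outside of ROS, here is a minimal sketch that uses only the C++ standard library. ToyExecutor, SpinResult and simulate_shutdown_during_bringup are hypothetical names made up for this example; they are not nav2 or rclcpp APIs, and the real executor may behave differently (for instance, the cancel flag here is sticky, which I am assuming only for the sketch). It shows a wait on a response that never arrives being released by a cancel() call from another thread.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical result type for this sketch (not rclcpp::FutureReturnCode).
enum class SpinResult { Success, Interrupted };

// Hypothetical stand-in for the service client's internal executor.
class ToyExecutor
{
public:
  // Polls the future until it is ready or cancel() has been called.
  template<typename T>
  SpinResult spin_until_future_complete(const std::future<T> & future)
  {
    while (!canceled_.load()) {
      if (future.wait_for(std::chrono::milliseconds(1)) ==
        std::future_status::ready)
      {
        return SpinResult::Success;
      }
    }
    return SpinResult::Interrupted;
  }

  // Asks any running spin loop to return. Sticky in this sketch.
  void cancel() {canceled_.store(true);}

private:
  std::atomic<bool> canceled_{false};
};

// Models bringup waiting on a node that never answers while the
// shutdown path calls cancel() from a second thread.
SpinResult simulate_shutdown_during_bringup()
{
  ToyExecutor executor;
  std::promise<bool> never_fulfilled;  // the response is never sent
  std::future<bool> response = never_fulfilled.get_future();

  std::thread shutdown([&executor]() {
      std::this_thread::sleep_for(std::chrono::milliseconds(20));
      executor.cancel();
    });

  // Without the cancel() above, this call would never return.
  SpinResult result = executor.spin_until_future_complete(response);
  shutdown.join();
  return result;
}
```

In this sketch the blocked call returns SpinResult::Interrupted once cancel() runs, which is the behavior the stop() function is meant to provide for the bringup sequence.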
I have not tested this on Kilted or main. The lifecycle manager has changed since Jazzy, but I think the deadlock is still there.
Location of the blocking spin_until_future_complete call: navigation2/nav2_util/include/nav2_util/service_client.hpp, line 117 in 3109c3a