Deterministically shut down all connections on ApiListener::Stop()#10460

Merged
yhabteab merged 4 commits into master from apilistener-shutdown-conns-on-stop
Jun 16, 2025
Conversation

@jschmidt-icinga
Contributor

@jschmidt-icinga jschmidt-icinga commented Jun 3, 2025

Deterministically shut down all connections on ApiListener::Stop()

Currently, while the shutdown order is determined by Icinga 2 object priorities, the connection objects on the backend can continue to affect the state of the program even after seemingly having been shut down (i.e. after the Stop() method of ApiListener has been called). In practice, the connection objects will mostly stay alive until the very end, when icinga2 exits, and will answer all kinds of requests until then.

To mitigate some of the effects of ApiListener's associated objects after Stop() has returned, this PR performs these additional steps (in order):

  • Exit from ListenerCoroutineProc
  • Disconnect remaining endpoint and anonymous JsonRpcConnections
  • Abort HTTP API requests that are modifying the program state and reply with 503: Service unavailable

This PR relates to #10179 and implements one of the TODOs in @Al2Klimov's comment.

Exiting from ListenerCoroutineProc

ListenerCoroutineProc is stopped by closing the acceptor socket, then waiting for the completion of outstanding NewClientHandlers using a separate wait group.
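As a rough stdlib-only model of this mechanism (illustrative only, not Icinga 2's actual WaitGroup implementation): each NewClientHandler enters the group before doing work, and the stop path closes the acceptor and then joins the group, which refuses newcomers and waits for the remaining handlers to drain.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Simplified stand-in for the wait group described above: handlers enter
// before doing work; Join() refuses newcomers and waits for the rest.
class WaitGroup
{
public:
	// Returns false once Join() has started, so no new handler begins.
	bool TryEnter()
	{
		std::unique_lock lock(m_Mutex);
		if (m_Joining)
			return false;
		++m_Active;
		return true;
	}

	void Leave()
	{
		std::unique_lock lock(m_Mutex);
		if (--m_Active == 0)
			m_CV.notify_all();
	}

	// Block new entries and wait until all active handlers have left.
	void Join()
	{
		std::unique_lock lock(m_Mutex);
		m_Joining = true;
		m_CV.wait(lock, [this] { return m_Active == 0; });
	}

private:
	std::mutex m_Mutex;
	std::condition_variable m_CV;
	int m_Active = 0;
	bool m_Joining = false;
};
```
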

Shutting down the remaining endpoint and anonymous JsonRpcConnections

Since the ListenerCoroutineProc will not take any new connections now, and m_ReconnectTimer has already been stopped, we can be sure that no endpoints are currently connecting and we can simply iterate over all endpoint objects and Disconnect() those that are still connected.

This means that some check results are still discarded when the wait group is stopped. A solution to this will require a message to be passed between the nodes and could be addressed in a future PR.
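The disconnect pass can be modelled with hypothetical types (not Icinga 2's real Endpoint/JsonRpcConnection classes): since no new connections are being accepted and the reconnect timer is stopped, every remaining client is reachable by iterating the endpoint list.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Illustrative stand-ins for the real connection objects.
struct JsonRpcClient
{
	bool Connected = true;
	void Disconnect() { Connected = false; }
};

struct Endpoint
{
	std::string Name;
	std::vector<std::shared_ptr<JsonRpcClient>> Clients;
};

// Disconnect() every client that is still connected; returns the count.
int DisconnectEndpointClients(std::vector<Endpoint>& endpoints)
{
	int disconnected = 0;
	for (auto& endpoint : endpoints) {
		for (auto& client : endpoint.Clients) {
			if (client->Connected) {
				client->Disconnect();
				++disconnected;
			}
		}
	}
	return disconnected;
}
```
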

Aborting HTTP API requests

In API requests that modify (or may modify) the program state, the wait group will be locked. For potentially longer-running requests, the stopped condition is used to abort loops that iterate over config objects early and return a 503 error code:

{
    "error": 503,
    "status": "Shutting down."
}
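A minimal sketch of this abort path, with hypothetical names (the real handlers use Boost.Beast request/response types): a state-modifying handler checks the shared shutdown flag on every iteration over the config objects and bails out with the 503 body shown above.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <vector>

// Illustrative response type, not the real HttpHandler interface.
struct HttpResponse
{
	int Status;
	std::string Body;
};

HttpResponse ModifyObjects(const std::vector<std::string>& objects,
	const std::atomic<bool>& shuttingDown)
{
	for (const auto& object : objects) {
		if (shuttingDown.load()) {
			// Abort early so ApiListener::Stop() isn't delayed.
			return {503, R"({"error":503,"status":"Shutting down."})"};
		}
		(void)object; // ... apply the modification here ...
	}
	return {200, R"({"results":[]})"};
}
```
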

fixes #10179

@cla-bot cla-bot bot added the cla/signed label Jun 3, 2025
@jschmidt-icinga jschmidt-icinga added this to the 2.15.0 milestone Jun 3, 2025
@jschmidt-icinga jschmidt-icinga force-pushed the apilistener-shutdown-conns-on-stop branch 4 times, most recently from 12aa63f to a5b8289 Compare June 3, 2025 13:22
@jschmidt-icinga jschmidt-icinga force-pushed the apilistener-shutdown-conns-on-stop branch from a5b8289 to 8a201f9 Compare June 6, 2025 08:50
@jschmidt-icinga
Contributor Author

No need to review this now. I'm still looking at a few things. I will update the PR description and add a comment here when I'm done.

@jschmidt-icinga jschmidt-icinga force-pushed the apilistener-shutdown-conns-on-stop branch from 8a201f9 to f38c98a Compare June 6, 2025 13:46
@jschmidt-icinga
Contributor Author

jschmidt-icinga commented Jun 6, 2025

Changes:

  • ListenerCoroutineProc is now exited by closing the socket. The wait group is still needed to track ongoing NewClientHandler() instances to be sure we hit all live connections in the following disconnects.
  • JsonRpcConnection::Disconnect() is still called immediately, but the lock on the wait group is now held across the read and write coroutines, outside the loop.
  • I've reduced the messages about disconnecting API clients to LogInformation. If we want LogWarning for cases where this happens outside a normal shutdown, I can make them ask ApiListener first whether this is a shutdown.
  • HttpHandlers now get passed an atomic boolean to signal the abort condition instead of the wait group. I've also reduced the situations where this abort is done, as per the reviewer's suggestions. This is triggered in a separate AbortProcessingResponse() method.
  • HttpServerConnection is now part of the same wait group as everything else, except the one around NewClientHandler.
  • I've moved the grace period timer inside DisconnectHttpConnections() and renamed them.
  • This PR no longer adds a Cancel() method and the lock_shared() interface to WaitGroup, since they are no longer needed (at the moment).

Caveats:

  • There is still a small potential for a situation where a connection is accepted in ListenerCoroutineProc and immediately closed, but I don't think it will happen often, and it should be harmless.
  • There is still an m_ListenerWaitGroup, and I'm afraid it's needed to wait on potentially multiple outstanding NewClientHandlers to complete. I could move this wait group to be local to AddListener() and have it captured by the coroutines, but that would be more effort than I think is justified.
  • JsonRpcConnections may need some more attention if we want them to do what I tried to do initially, i.e. ensure that all messages other nodes have sent us get fully processed. After thinking about this, it will at the very least involve some additional communication.

@jschmidt-icinga jschmidt-icinga force-pushed the apilistener-shutdown-conns-on-stop branch 2 times, most recently from 350c91a to 7ca5994 Compare June 10, 2025 07:52
@jschmidt-icinga
Contributor Author

jschmidt-icinga commented Jun 10, 2025

Incorporated some offline suggestions and fixed a crash due to a pointer initialization I managed to "clean up" during one of the recent rebases.

Comment on lines +499 to +503
try {
	acceptor->close();
} catch (const std::exception& ex) {
	Log(LogCritical, "ApiListener")
		<< "Failed to close acceptor socket: " << ex.what();
}
Member

Why can't this block...

@jschmidt-icinga jschmidt-icinga force-pushed the apilistener-shutdown-conns-on-stop branch from 7ca5994 to e1e0467 Compare June 11, 2025 09:04
@jschmidt-icinga
Contributor Author

Again I've simplified and reduced the scope of the PR, following the suggestion of @Al2Klimov. I also think I've addressed the remaining comments that still apply.

Comment on lines +71 to +74
std::shared_lock wgLock(*m_WaitGroup, std::try_to_lock);
if (!wgLock) {
	return;
}
Member

With this lock held not only for the message processing itself but also during socket operations, this raises a question: are these operations unblocked within a reasonable time (very few seconds) during shutdown?

Contributor Author

From my testing, yes. In the previous iteration of the PR I was scared this would be a situation similar to the async_write() in HttpServerConnection blocking the shutdown. But unlike there, here the coroutine mostly sits in the async_read() without actually making progress (i.e. giving other operations a chance to run) or actively reading the rather small RPC messages (like check results).

Then again, the same corner cases could arise: large messages over very, very slow connections. But even in that case, I think it would be better to let the message finish than to interrupt reading it in the middle, even if it takes a second or two.

To get around this, acquiring the lock somewhere else isn't even enough; I would have to pass something into NetString::ReadStringFromStream to interrupt reading the message, which I really don't want to do.

Contributor Author

After thinking about this some more, unlike in the previous iteration of the PR (which had the WaitGroup::Cancel() in a timeout), I could now get away with just locking the WaitGroup per iteration, after the read. But again, it depends on what we want: do we want to shut down fast, or do we want to make sure things we started get finished? We can't have our cake and eat it too.
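Sketched with illustrative names and a plain std::shared_mutex standing in for the wait group, per-iteration locking looks like this: the blocking read runs unlocked, and the shared lock is only taken around the processing of a message that has already been received in full, so a shutdown (which takes the exclusive lock) stops the loop at the next message boundary.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <shared_mutex>
#include <string>
#include <vector>

// Hypothetical model of the read loop discussed above: the wait-group lock
// only guards message processing, not the read itself.
void ReadLoop(std::shared_mutex& waitGroup,
	const std::function<std::optional<std::string>()>& readMessage,
	const std::function<void(const std::string&)>& messageHandler)
{
	for (;;) {
		auto msg = readMessage(); // blocking read, not covered by the lock
		if (!msg)
			break; // connection closed

		std::shared_lock lock(waitGroup, std::try_to_lock);
		if (!lock)
			break; // shutdown in progress: stop before processing more

		messageHandler(*msg);
	}
}
```
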

Member

or do we want to make sure things we started get finished

That just depends on your interpretation of what counts as started. I'd be totally fine with only counting the message processing itself, not the read operation.

The other extreme (I haven't checked in detail yet whether something like this could actually happen) would be that the remote crashes so that the connection just hangs in the middle of reading a single JSON-RPC message. In that situation, the shutdown shouldn't block until the CheckLiveness() coroutine kills the connection.

The nicest option would be to signal the intent to shut down to the connection, process all outstanding messages and then cleanly shut down the connection, with an additional quick timeout that kills the connection if this doesn't happen within a few seconds and then only waits for the processing of the current message (so that if processing of a message started, it also finishes). However, I don't see how an implementation without introducing new cluster messages to notify the peer about the intent to shut down would work any better than more or less randomly selecting a last message after which the connection is closed, which should be pretty much what you get when locking around each MessageHandler() individually.

Contributor Author

I don't see how an implementation without introducing new cluster messages [...]

I get that and also commented on it here (last point).

I'd be totally fine with only counting the message processing itself, not the read operation.

Alright, so I'd move the lock to the top of MessageHandler(). Is that fine? Should the lock in WriteOutgoingMessages() stay then, or should I rather remove it entirely? There's a timeout in Disconnect(), but I guess that can fail to cancel the operation too under certain conditions.

@jschmidt-icinga jschmidt-icinga force-pushed the apilistener-shutdown-conns-on-stop branch from e1e0467 to 44c902d Compare June 13, 2025 07:44
@jschmidt-icinga
Contributor Author

I've pushed an update that should address the remaining comments, with the exception of the wait group lock in JsonRpcConnection::WriteOutgoingMessages, which I'm still not sure about.

Another change is that I'm now using a boost::signal + post instead of the post + coroutine I used to stop the listener before. I've also moved the lock inside the loop because I had, after all, encountered rare situations where, as @julianbrost warned, the operation_aborted error was not thrown and the socket was closed after async_accept succeeded, trapping ListenerCoroutineProc in an endless loop of bad_file_descriptor exceptions. Asio coroutines are a lot of fun.

@julianbrost
Member

the operation_aborted error was not thrown and the socket was closed after async_accept succeeded, trapping ListenerCoroutineProc in an endless loop of bad_file_descriptor exceptions

Did this just happen randomly? Were you able to make this happen more reliably by creating some specific scenario?

I've also moved the lock inside the loop

Does that prevent the bad_file_descriptor situation itself or just the endless loop as that locking operation acts pretty much like a loop condition (like for example while (acceptor->is_open()) or while (m_ListenerWaitGroup->IsLockable()))?

@julianbrost
Member

Sorry, I forgot to address the following in a previous comment/review:

with the exception of the wait group lock in JsonRpcConnection::WriteOutgoingMessages, which I'm still not sure about.

For me, it's pretty similar to the one in the read loop, and I'd say don't add it unless:

  1. You can reason that the socket operations that are done while it's held are unblocked quickly on shutdown; or
  2. You say it's needed for correctness, but in that case we need to ensure that the socket operations are unblocked at shutdown.

@jschmidt-icinga
Contributor Author

Did this just happen randomly? Were you able to make this happen more reliably by creating some specific scenario?

I think it's the same situation I've observed in #5581, with the coroutine yielding again when the async operation is actually already done. Here, this would mean closing the socket comes too late to make async_accept() throw, but obviously still closes the socket. I thought about alternatively checking whether the wait group is no longer lockable when catching the bad_file_descriptor exception, but I liked this option better, and it should avoid the exception entirely now that I've moved the lock.

that locking operation acts pretty much like a loop condition

Well, it acts as both a lock and a loop condition. I can also keep it outside the loop and just do while (IsLockable()) if you prefer it that way. We wouldn't even need to check the shared_lock if we use it as a loop condition.

@jschmidt-icinga
Contributor Author

  2. You say it's needed for correctness, but in that case we need to ensure that the socket operations are unblocked at shutdown.

We can ensure they are unblocked, but not guarantee it's within a reasonable time: even when the 5s timeout in Disconnect() fires, the async_write() will not yield on very slow but steady connections, same as in HttpServerConnection. I'll just remove it.

@julianbrost
Member

julianbrost commented Jun 13, 2025

the same situation I've observed in #5581

Is that really the correct PR you wanted to reference?

I was asking primarily because I didn't understand how changing the scope of the locking should solve bad file descriptor errors, and if these appear randomly, that's always a hint that something could be wrong with the code.1

In the current state of the PR, I think there could still be a small race condition in that respect:

void ApiListener::StopListener()
{
	m_OnListenerShutdown();
	m_ListenerWaitGroup->Join();
}

The accepting socket is already closed by line 521, but the wait group you check before starting a new async_accept() is only signaled to stop in line 523, so couldn't that still start one (or even a few) more accepts until the signal from the wait group propagates to the listener?

std::shared_lock wgLock(*m_ListenerWaitGroup, std::try_to_lock);
if (!wgLock) {
	return;
}

asio::ip::tcp::socket socket(io);
server->async_accept(socket.lowest_layer(), yc);

So I think while (server->is_open()) might even be the more correct option, as that's already synchronized with the corresponding close() call by the use of the common strand for both.
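The loop-condition idea can be illustrated with a sequential mock (real code runs on an Asio strand, which is what makes checking is_open() race-free; the mock models that serialization by being single-threaded): the accept loop keeps running only while the acceptor is open, so a close() observed between accepts ends the loop cleanly instead of producing bad_file_descriptor errors.

```cpp
#include <cassert>

// Hypothetical mock of the acceptor: a fixed number of connections arrive,
// then StopListener() closes it (simulated inside Accept()).
struct MockAcceptor
{
	bool Open = true;
	int Pending; // connections queued before the acceptor is closed

	explicit MockAcceptor(int pending) : Pending(pending) {}

	bool IsOpen() const { return Open; }
	void Close() { Open = false; }

	// Accepts one connection; once the queue is drained, simulate the
	// shutdown path closing the acceptor from the same strand.
	bool Accept()
	{
		if (!Open)
			return false;
		if (Pending == 0) {
			Close();
			return false;
		}
		--Pending;
		return true;
	}
};

int RunAcceptLoop(MockAcceptor& acceptor)
{
	int accepted = 0;
	while (acceptor.IsOpen()) { // the loop condition proposed above
		if (acceptor.Accept())
			++accepted; // hand off to NewClientHandler() here
	}
	return accepted;
}
```
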

Footnotes

  1. If a syscall actually returns EBADFD that could be a hint that a fd number is used after it was closed. At least in a multi-threaded application, instead of receiving that error, there's a chance that the fd number could have been reused already and instead of getting an error, the operation could also have been performed on a random other file instead. However, with Asio in between, that also maps operations on closed socket objects (which wrap a fd) to that error without ever issuing a syscall so that use-after-close isn't an issue (unless you have code that reuses the Asio socket object).

@jschmidt-icinga jschmidt-icinga force-pushed the apilistener-shutdown-conns-on-stop branch from 44c902d to fb81054 Compare June 13, 2025 10:39
@jschmidt-icinga
Contributor Author

Is that really the correct PR you wanted to reference?

Sorry, no. I screwed up the autocomplete. I meant #10139.

and if these appear randomly, that's always a hint that something could be wrong with the code.

I mean, it didn't appear randomly; it appeared sporadically when, due to timing, closing the socket didn't cancel an ongoing async_accept().

Anyway, I've just pushed yet another update.

Note: I just noticed that ModifyObjectsHandler currently always sends http::status::ok as a result, despite other error conditions existing. I'm not going to change that here, but maybe doing the same as DeleteObjectsHandler would be a good thing.

@julianbrost
Member

I mean, it didn't appear randomly, it appeared sporadically

Randomly in the sense that I don't understand the pattern and it looks random to me. ;)

Just remove "randomly" from my sentence; seeing a bad file descriptor error anywhere is rarely a good sign.

Member

@julianbrost julianbrost left a comment

We should be getting close to the finish line now as I'm getting more into some details and a bit of code style. Overall functionality should be good from looking at the code, I'll give it a test later today.

@Al2Klimov @yhabteab I only expect some small changes here, so in case you want to take another look, now (or after the next push) would be a good time.

The wait group gets passed to HttpServerConnection, then down to the
HttpHandlers. For those handlers that modify the program state, the
wait group is locked so ApiListener will wait on Stop() for the
request to complete. If the request iterates over config objects,
a further check on the state of the wait group is added to abort early
and not delay program shutdown. In that case, 503 responses will be
sent to the client.

Additionally, in HttpServerConnection, no requests beyond the one
already started will be allowed once the wait group is joining.
@jschmidt-icinga jschmidt-icinga force-pushed the apilistener-shutdown-conns-on-stop branch from fb81054 to 82bb636 Compare June 13, 2025 12:51
@jschmidt-icinga
Contributor Author

jschmidt-icinga commented Jun 13, 2025

Fixed the trailing whitespace, added the type and name fields, and moved the new Dictionary directly into emplace_back.

Member

@julianbrost julianbrost left a comment

Thank you very much!

@Al2Klimov @yhabteab I only expect some small changes here, so in case you want to take another look, now (or after the next push) would be a good time.

Not merging yet until I get some feedback there.

@Al2Klimov Al2Klimov self-requested a review June 13, 2025 14:53
@yhabteab
Member

yhabteab herewith dismisses all his requested changes :)! Just joking - I didn't request anything explicitly here on purpose, but only submitted a few comments, so my review shouldn't block this in any way. Since Alex already requested a review from himself, it should be fine when both of you approve it!

@Al2Klimov Al2Klimov removed their request for review June 16, 2025 14:50
@yhabteab
Member

My overloaded Docker setup found nothing suspicious, so this should be fine. Thanks to all of you 🎉!

@yhabteab yhabteab merged commit dcef22e into master Jun 16, 2025
25 checks passed
@yhabteab yhabteab deleted the apilistener-shutdown-conns-on-stop branch June 16, 2025 15:47


Development

Successfully merging this pull request may close these issues.

Icinga2 looses state history and notifications during restart

4 participants