
Handle config package updates gracefully #10584

Merged: yhabteab merged 3 commits into support/2.15 from broken-config-stage-updates-215 on Oct 8, 2025



yhabteab (Member) commented Oct 8, 2025

yhabteab and others added 3 commits October 8, 2025 13:33
Previously, we used a simple boolean to track the state of package updates and
did not reset it when config validation succeeded. The assumption was that if the
config validated successfully beforehand, the worker would also reload it
successfully afterwards and the old worker would be terminated. However, this
assumption does not always hold, for a number of reasons that I can't even think
of right now. The most obvious one is that the config might change again after we
validated it successfully but before the worker was able to reload it. If that
happens, the new worker might fail to validate the config due to the recent
changes. In that case the old worker remains active and the flag stays set to
true, causing every subsequent request to fail with a `423` until you manually
restart the Icinga 2 service.

So, to prevent such a situation, we now additionally track the last time a reload
failed, and allow bypassing the `m_RunningPackageUpdates` flag only if that
last-reload-failed time has changed since the previous request.
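
The gating logic described above can be illustrated with a minimal sketch. This is not the code from the commit: only the `m_RunningPackageUpdates` name comes from the description, while the `PackageUpdateGate` class, its methods, `m_LastSeenReloadFailedTime`, and the use of a `double` timestamp are assumptions made purely for illustration.

```cpp
#include <mutex>

// Hypothetical sketch: everything except the m_RunningPackageUpdates name
// is an illustrative assumption, not the actual Icinga 2 implementation.
class PackageUpdateGate {
public:
    // Returns true if a package update may start, false if the request
    // should be rejected (the situation answered with a 423).
    // lastReloadFailedTime is the timestamp of the most recent failed
    // reload as known to the caller (0 if no reload has failed yet).
    bool TryBegin(double lastReloadFailedTime) {
        std::lock_guard<std::mutex> lock(m_Mutex);

        // The flag may be stale if a reload failed after a successful
        // validation. Bypass it only when a reload failure has been
        // recorded since the previous accepted request.
        if (m_RunningPackageUpdates &&
            lastReloadFailedTime == m_LastSeenReloadFailedTime) {
            return false;
        }

        m_LastSeenReloadFailedTime = lastReloadFailedTime;
        m_RunningPackageUpdates = true;
        return true;
    }

    // Called when config validation fails: no reload will follow,
    // so the flag can be reset right away.
    void EndOnValidationFailure() {
        std::lock_guard<std::mutex> lock(m_Mutex);
        m_RunningPackageUpdates = false;
    }

private:
    std::mutex m_Mutex;
    bool m_RunningPackageUpdates = false;
    double m_LastSeenReloadFailedTime = 0;
};
```

In this sketch the flag stays set after a successful validation, as before, but a request that observes a newer reload failure timestamp is let through instead of being rejected until the service is restarted.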

Once the new worker process has read the config, it also includes an
`include */include.conf` statement within the config packages root
directory, and from then on we must not allow any stage directory of a
config package to be deleted. Otherwise, when the worker actually
evaluates that include statement, it will fail to find the directory
where the include file is located, or the `active.conf` file, which is
included from each stage's `include.conf` file, thus causing the worker
to fail.
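
One way to picture this restriction is a small guard that refuses stage deletions while a reload is in progress. The sketch below is an assumption-laden illustration, not the change made in this PR: the `StageDeletionGuard` class, its method names, and the caller-supplied `deleter` callback are all hypothetical.

```cpp
#include <functional>
#include <mutex>
#include <string>

// Hypothetical sketch: class, method names, and the deleter callback are
// illustrative assumptions, not the actual Icinga 2 implementation.
class StageDeletionGuard {
public:
    // Mark the window in which the worker may still evaluate
    // `include */include.conf` and the per-stage include files.
    void ReloadStarted() {
        std::lock_guard<std::mutex> lock(m_Mutex);
        m_ReloadInProgress = true;
    }

    void ReloadFinished() {
        std::lock_guard<std::mutex> lock(m_Mutex);
        m_ReloadInProgress = false;
    }

    // Runs the deleter for the given stage unless a reload is in progress.
    // The mutex is held during deletion so that a reload cannot start
    // while a stage directory is being removed.
    bool TryDeleteStage(const std::string& stage,
                        const std::function<void(const std::string&)>& deleter) {
        std::lock_guard<std::mutex> lock(m_Mutex);
        if (m_ReloadInProgress) {
            return false; // caller should report the conflict to the client
        }
        deleter(stage);
        return true;
    }

private:
    std::mutex m_Mutex;
    bool m_ReloadInProgress = false;
};
```

Passing the actual removal in as a callback keeps the sketch independent of any filesystem layout; a real implementation would know where stage directories live and how to remove them.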

Co-Authored-By: Johannes Schmidt <[email protected]>
yhabteab added this to the 2.15.1 milestone Oct 8, 2025
yhabteab added the area/api (REST API) label Oct 8, 2025
cla-bot (bot) added the cla/signed label Oct 8, 2025
yhabteab added the bug (Something isn't working) label Oct 8, 2025
yhabteab requested a review from julianbrost October 8, 2025 11:35
yhabteab merged commit 82a711c into support/2.15 Oct 8, 2025
29 checks passed
yhabteab deleted the broken-config-stage-updates-215 branch October 8, 2025 14:39

Labels: area/api (REST API), bug (Something isn't working), cla/signed


2 participants