mainloop: fix config revert persistence #831

Merged
bazsi merged 2 commits into axoflow:main from MrAnno:fix-reload on Oct 27, 2025

Conversation

@MrAnno (Contributor) commented Oct 22, 2025

For example, kept-alive connections were closed during config revert.

Signed-off-by: László Várady <[email protected]>
@bazsi (Member) commented Oct 27, 2025

This looks good to me. I am not sure why the macOS job failed; I am rerunning it on the suspicion that it was a glitch.

All green on my local Linux box, just like all the rest of the tests.

bazsi merged commit 5de3ceb into axoflow:main on Oct 27, 2025
26 of 27 checks passed
@HofiOne (Contributor) commented Oct 29, 2025

@MrAnno, @bazsi Unfortunately, this one makes things worse.

Now cfg_persist_config_move resets the old_config->persist (of course), but this happens before cfg_deinit(), which actually triggers the persist update — meaning it won’t happen at all now.

In the case of the afsocket-source (and every item that wants to update the persist), it will not be able to save its states (mainly the socket/connection if keep-alive(yes)), as the persist is already "gone".
So, currently, the keep-alive option does not take effect at all with this change.

Because of all the above, the original order

  self->old_config->persist = persist_config_new();
  cfg_deinit(self->old_config);
  cfg_persist_config_move(self->old_config, self->new_config);

was correct here, I guess.


Let’s just focus on the keep-alive() issue (the problem can occur with any contradicting option setting between the old and new config).

The long-standing issue is that setting keep-alive(no) in the new config when it was yes in the old one causes the newly initialized config sockets to fail to bind, because the old ones remain open due to keep-alive(yes).

It’s a bit of a catch-22, because either:

  • we have to close the old sockets if we detect this situation, but
  • then we lose the ability to revert the config if any later config element init fails (since the sockets are already closed)

Because of this, I prefer a solution like the one I am testing here.

It simply detects the situation above and, in that case, processes the persist state as if keep-alive() were set to yes.

Why I think this is acceptable:

  • it keeps the reversible state — config revert will still work in case of errors,
  • it only happens at the first reload — further reloads will use the new option as expected,
  • this is consistent with the opposite case (no -> yes), where the new keep-alive() setting only takes effect after it has been loaded and applied at least once. (yes, I know, in that case there is nothing to revert, but...)

So, focusing on solving the original issue, I think it’s better to prioritize the reversibility of config reloads in case of an error, and accept this slightly “strange” behavior.

Any other, better idea?

@MrAnno (Contributor, Author) commented Oct 29, 2025

I fixed the revert mechanism, not the apply.

@HofiOne (Contributor) commented Oct 29, 2025

The revert might be fixed, but the original functionality of keep-alive() is broken now, unfortunately.

@MrAnno (Contributor, Author) commented Oct 29, 2025

This PR fixed a revert issue that made keep-alive(yes) misbehave during configuration revert. With this fix, keep-alive finally behaves during revert as documented, and nothing else got worse, at least not in AxoSyslog.

The issue you describe has been there for 10+ years, even in syslog-ng 3.6:

podman run -it docker.io/balabit/syslog-ng:3.6 -Fdev

[2025-10-29T13:24:46.245619] Error binding socket; addr='AF_INET(0.0.0.0:4444)', error='Address already in use (98)'
[2025-10-29T13:24:46.245644] Error initializing message pipeline;
[2025-10-29T13:24:46.245652] Error initializing new configuration, reverting to old config

If you need our help, open a separate issue.

@HofiOne (Contributor) commented Oct 29, 2025

Probably my bad, sorry.

