Skip to content

fix: Reload the rspamd service instead of restarting it#4632

Merged
casperklein merged 12 commits intodocker-mailserver:masterfrom
scristian71:fix-rspamd-dkim-restart
Jan 11, 2026
Merged

fix: Reload the rspamd service instead of restarting it#4632
casperklein merged 12 commits intodocker-mailserver:masterfrom
scristian71:fix-rspamd-dkim-restart

Conversation

@scristian71
Copy link
Copy Markdown
Contributor

@scristian71 scristian71 commented Jan 3, 2026

Description

Context: #4579 (comment)

Fixes the failure by requesting Rspamd reload configuration via sending a HUP signal to the main rspamd process. supervisorctl restart rspamd was prone to a race condition causing tests to fail and sometimes putting the rspamd service into a bad state.

As per review below, the decision was to switch send the signal via supervisorctl signal <signal> <service name>.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • [] My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (README.md or the documentation under docs/)
  • If necessary, I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have added information about changes made in this PR to CHANGELOG.md

@polarathene
Copy link
Copy Markdown
Member

You should provide more context with the PR, appears to be as identified here: #4579 (comment)

End of that comment I am unsure of the current PR approach resolving the problem properly due to change detection still being racey (although the issue should be less frequently encountered by dropping the redundant restart). I proposed for change detection to be refactored with adopting watchexec in future that would resolve the concern.

@scristian71
Copy link
Copy Markdown
Contributor Author

In rspamd is restarted twice, once in _setup_default_signing_conf and the second time in _final_steps as it is mentioned in the referenced issue.
In my case I have failed tests:
https://github.com/scristian71/docker-mailserver/actions/runs/20618435610
And also if I run locally "[Rspamd] (DKIM)" I had random failures.

@polarathene
Copy link
Copy Markdown
Member

polarathene commented Jan 6, 2026

TL;DR:

  • Please try and verify the following change suggestions (changing from restart to reload via SIGHUP):
    • Change:
      function _final_steps() {
      # We need to restart Rspamd so the changes take effect immediately.
      if ! supervisorctl restart rspamd; then
      _log 'warn' 'Could not restart Rspamd via Supervisord'
      fi
      _log 'trace' 'Finished DKIM key creation'
      }

      to:
      function _final_steps() {
        # Request Rspamd to reload it's config immediately (instead of waiting on change detection):
        pkill --signal SIGHUP $(pgrep --oldest rspamd)
      
        _log 'trace' 'Finished DKIM key creation'
      }
    • Change:
      _log 'debug' 'Rspamd configuration has changed - restarting service'
      supervisorctl restart rspamd

      to:
      _log 'debug' 'Reloading Rspamd configuration'
      pkill --signal SIGHUP $(pgrep --oldest rspamd)

In my case I have failed tests:
https://github.com/scristian71/docker-mailserver/actions/runs/20618435610
And also if I run locally "[Rspamd] (DKIM)" I had random failures.

We recently landed a change that improved the rspamd test to avoid a failure that started appearing.

EDIT: I see your test run was from the merge commit of that PR, so the same spurious failure detailed in #4579 and referenced in #4627 (comment)

Change detection is presently at a default 2s poll rate, Rspamd can take a few seconds to complete a restart IIRC. So depending on how close the two restarts are from change detection and the setup config dkim script, any of the relevant rspamd_dkim.bats test cases may still fail.

EDIT: Sending a SIGHUP signal to Rspamd for it to reload config might avoid the concern.


In rspamd is restarted twice, once in _setup_default_signing_conf and the second time in _final_steps as it is mentioned in the referenced issue.

Yes, but it will still be possible that it is triggered and causes this failure even with your PR here.

Here is our change detection logic for rspamd support, it will detect the DKIM files created, then at the end of the snippet restart the rspamd service:

function _rspamd_changes() {
# RSPAMD_DMS_D='/tmp/docker-mailserver/rspamd'
if [[ ${CHANGED} =~ ${RSPAMD_DMS_D}/.* ]]; then
# "${RSPAMD_DMS_D}/override.d"
if [[ ${CHANGED} =~ ${RSPAMD_DMS_OVERRIDE_D}/.* ]]; then
_log 'trace' 'Rspamd - Copying configuration overrides'
rm "${RSPAMD_OVERRIDE_D}"/*
cp "${RSPAMD_DMS_OVERRIDE_D}"/* "${RSPAMD_OVERRIDE_D}"
fi
# "${RSPAMD_DMS_D}/custom-commands.conf"
if [[ ${CHANGED} =~ ${RSPAMD_DMS_CUSTOM_COMMANDS_F} ]]; then
_log 'trace' 'Rspamd - Generating new configuration from custom commands'
_rspamd_handle_user_modules_adjustments
fi
# "${RSPAMD_DMS_D}/dkim"
if [[ ${CHANGED} =~ ${RSPAMD_DMS_DKIM_D} ]]; then
_log 'trace' 'Rspamd - DKIM files updated'
fi
_log 'debug' 'Rspamd configuration has changed - restarting service'
supervisorctl restart rspamd
fi
}

The DKIM keys generated into the DMS Config Volume rspamd/dkim/ dir will trigger that:

function _create_keys() {
if [[ ${KEYTYPE} == 'rsa' ]]; then
local BASE_FILE_NAME="${RSPAMD_DMS_DKIM_D}/${KEYTYPE}-${KEYSIZE}-${SELECTOR}-${DOMAIN}"
KEYTYPE_OPTIONS=('-b' "${KEYSIZE}")
_log 'info' "Creating DKIM keys of type '${KEYTYPE}' and length '${KEYSIZE}' with selector '${SELECTOR}' for domain '${DOMAIN}'"
else
local BASE_FILE_NAME="${RSPAMD_DMS_DKIM_D}/${KEYTYPE}-${SELECTOR}-${DOMAIN}"
KEYTYPE_OPTIONS=('-t' "${KEYTYPE}")
_log 'info' "Creating DKIM keys of type '${KEYTYPE}' with selector '${SELECTOR}' for domain '${DOMAIN}'"
fi
PUBLIC_KEY_FILE="${BASE_FILE_NAME}.public.txt"
PUBLIC_KEY_DNS_FILE="${BASE_FILE_NAME}.public.dns.txt"
PRIVATE_KEY_FILE="${BASE_FILE_NAME}.private.txt"

Just depends on the timing 🤔

The logic for both were contributed for DMS v13 back in 2023, where I raised some concerns:


So technically both restarts in this Rspamd DKIM script could be avoided I think? (LDAP is now enabled for change detection support, so I think that'll go smoothly). Doing so would make the following commentary a bit redundant?:

# We copy here immediately in order to not rely on the changedetector - this way, users
# can immediately use the new keys. The file should not already exist in ${RSPAMD_OVERRIDE_D}
# since it would have been copied already.
cp "${DEFAULT_CONFIG_FILE}" "${RSPAMD_OVERRIDE_D}/dkim_signing.conf"
chown _rspamd:_rspamd "${DEFAULT_CONFIG_FILE}" "${RSPAMD_OVERRIDE_D}/dkim_signing.conf"

We may still want to go ahead with the immediate copy though, but instead of a restart a SIGHUP could be sent to request rspamd to reload it's configs via pkill --signal SIGHUP $(pgrep --oldest rspamd) (Side-note: We don't have rspamd covered in our service restart tests)

Copilot AI review requested due to automatic review settings January 6, 2026 21:42
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes a duplicated rspamd restart call and changes the remaining restart operations to reload operations using SIGHUP signals. The change aims to address issue #4579 by eliminating redundant service restarts and using a more lightweight reload mechanism instead of full restarts.

  • Replaced supervisorctl restart rspamd with pkill --signal SIGHUP $(pgrep --oldest rspamd) in change detection script
  • Removed one redundant supervisorctl restart rspamd call from rspamd-dkim script
  • Changed the remaining restart in rspamd-dkim to use SIGHUP signal reload

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.

File Description
target/scripts/check-for-changes.sh Changes rspamd restart to reload via SIGHUP when configuration changes are detected
target/bin/rspamd-dkim Removes one redundant restart call and changes the final restart to a reload via SIGHUP

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread target/scripts/check-for-changes.sh Outdated
_log 'debug' 'Rspamd configuration has changed - restarting service'
supervisorctl restart rspamd
_log 'debug' 'Reloading Rspamd configuration'
pkill --signal SIGHUP $(pgrep --oldest rspamd)
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command substitution in the pkill command should be quoted to prevent word splitting and globbing issues. Without quotes, if pgrep returns multiple PIDs or unexpected output, the command could fail or behave unexpectedly.

Suggested change
pkill --signal SIGHUP $(pgrep --oldest rspamd)
pkill --signal SIGHUP "$(pgrep --oldest rspamd)"

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the explanation makes sense as the service restarting should be synncronized in some way.
I did the requested changes in the branch but now there is another issue:
https://github.com/scristian71/docker-mailserver/actions/runs/20763050368/job/59622717076
Will try to analyze in the next days.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new issue looks to be:

  • RSpamd has a wrong config and pkill "reloads" the config but fails to check the status (rspamd is dead)
  • User fixes the config
  • now a pkill cannot send a signal to a not existing process, and we need to probably start rspamd with supervisorctl start rspamd

(We are also constantly failing the "[Rspamd] (DKIM) argument 'keysize' is applied correctly for RSA keys" test on our CI)

Copy link
Copy Markdown
Contributor

@ap-wtioit ap-wtioit Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also pkill does not accept pids, when using pgrep to get the pid use kill:

image

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here i changed it to:

    local rspamd_pid
    rspamd_pid=$(pgrep --oldest rspamd) || rspamd_pid=""
    if [[ -z ${rspamd_pid} ]]; then
      _log 'debug' 'Starting Rspamd after configuration change'
      supervisorctl start rspamd
    else
      _log 'debug' 'Reloading Rspamd configuration'
      if ! kill -s SIGHUP "${rspamd_pid}"; then
        _log 'error' "Failed to send SIGHUP to Rspamd (PID: ${rspamd_pid})"
      fi
    fi

Comment thread target/scripts/check-for-changes.sh Outdated
_log 'debug' 'Rspamd configuration has changed - restarting service'
supervisorctl restart rspamd
_log 'debug' 'Reloading Rspamd configuration'
pkill --signal SIGHUP $(pgrep --oldest rspamd)
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pkill command lacks error handling. If rspamd is not running, pgrep will fail and the pkill command will execute with no PID argument, which will also fail. This change detection script does not have strict error handling (no set -e), so the failure will be silent. Consider adding error handling to log a warning if the reload fails, or verify that rspamd is running before attempting to signal it.

Suggested change
pkill --signal SIGHUP $(pgrep --oldest rspamd)
local rspamd_pid
rspamd_pid=$(pgrep --oldest rspamd) || rspamd_pid=""
if [[ -z ${rspamd_pid} ]]; then
_log 'warn' 'Rspamd reload requested, but no rspamd process found'
else
if ! pkill --signal SIGHUP "${rspamd_pid}"; then
_log 'warn' "Failed to send SIGHUP to Rspamd (PID: ${rspamd_pid})"
fi
fi

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@ap-wtioit ap-wtioit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the code changes that worked for me locally based on your current PR.

  • checking if there is an rspamd pid
  • reloading (with sighup) if rspamd pid can be found
  • checking that kill was executed successfully
  • logging errors not warnings (i missed that pkill did nothing when using the AI suggestion at first and only logging a warning)
  • checking the status of rspamd after changes in rspamd-dkim to emit an error if config is broken (the previous supervisorctl restart rspamd did return an error if the config was broken) but ignoring the error in check-for-changes (as this seems to loop and should start rspamd again after the user fixes the config error)

Comment thread target/bin/rspamd-dkim
@@ -275,10 +272,8 @@ function _transform_public_key_file_to_dns_record_contents() {
}

function _final_steps() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what seems to work for us:

function _final_steps() {
  local rspamd_pid
  rspamd_pid=$(pgrep --oldest rspamd) || rspamd_pid=""
  if [[ -z ${rspamd_pid} ]]; then
    supervisorctl start rspamd
  elif ! kill -s SIGHUP "${rspamd_pid}"; then
    _log 'error' "Failed to send SIGHUP to Rspamd (PID: ${rspamd_pid})"
  fi
  # check status of rspamd after a possible reload (that could have killed rspamd in case of a breaking config)
  supervisorctl status rspamd
  _log 'trace' 'Finished DKIM key creation'
}

Comment thread target/scripts/check-for-changes.sh Outdated
_log 'debug' 'Rspamd configuration has changed - restarting service'
supervisorctl restart rspamd
_log 'debug' 'Reloading Rspamd configuration'
pkill --signal SIGHUP $(pgrep --oldest rspamd)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here i changed it to:

    local rspamd_pid
    rspamd_pid=$(pgrep --oldest rspamd) || rspamd_pid=""
    if [[ -z ${rspamd_pid} ]]; then
      _log 'debug' 'Starting Rspamd after configuration change'
      supervisorctl start rspamd
    else
      _log 'debug' 'Reloading Rspamd configuration'
      if ! kill -s SIGHUP "${rspamd_pid}"; then
        _log 'error' "Failed to send SIGHUP to Rspamd (PID: ${rspamd_pid})"
      fi
    fi

@scristian71
Copy link
Copy Markdown
Contributor Author

Comment thread target/bin/rspamd-dkim Outdated
Comment on lines +277 to +278
rspamd_pid=$(pgrep --oldest rspamd) || rspamd_pid=""
if [[ -z ${rspamd_pid} ]]; then
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rspamd_pid=$(pgrep --oldest rspamd) || rspamd_pid=""
if [[ -z ${rspamd_pid} ]]; then
rspamd_pid=$(supervisorctl pid rspamd)
if (( rspamd_pid == 0 )); then

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bit cleaner. If rspamd is running, the PID is returned, otherwise 0.

Copy link
Copy Markdown
Member

@polarathene polarathene Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: Not sure why this is listed under my posted review, it's a reply to earlier feedback.


I'm not sure if it's better since a condition is still required? supervisorctl pid <service name> is neat, but the main difference is instead of an empty value on failure it returns a 0 🤔

Then again, it's clearer for intent than the pgrep command, and might actually be better for our older test case on process restarts where I noted a few services that were less than obvious when piecing that together that extra context was required for maintenance.

Why the (( rspamd_pid == 0 )) condition vs [[ rspamd_pid -eq 0 ]]? (which is what we typically use elsewhere in our scripts AFAIK?)

Copy link
Copy Markdown
Member

@casperklein casperklein Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason was not to avoid the condition. "grepping" for something that we (supervisord) already know seemed odd to me.

Why the (( rspamd_pid == 0 )) condition vs [[ rspamd_pid -eq 0 ]]? (which is what we typically use elsewhere in our scripts AFAIK?)

That was just a mistake. Personally I use (( )) for pure arithmetic stuff. You are right, in DMS, we should stick to the style we typically use elsewhere 👍

Comment thread target/bin/rspamd-dkim Outdated
Comment on lines +283 to +284
# check status of rspamd after a possible reload (that could have killed rspamd in case of a breaking config)
supervisorctl status rspamd
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work as you think. If rspamd fails to start due an invalid configuration, supervisor will immediately start rspam again (in a loop).
The output will almost always be "running".

Copy link
Copy Markdown
Contributor

@ap-wtioit ap-wtioit Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok than my local tests were not doing what i thought was going on. I was on the wrong track there. I mistook the pkill not working with pids (that i discovered later) for one of the negative tests leaving the config in a dirty state.

@polarathene
Copy link
Copy Markdown
Member

polarathene commented Jan 7, 2026

This is in response to @ap-wtioit review input, but on the main thread to keep the added information below visible.

Also pkill does not accept pids, when using pgrep to get the pid use kill

Whoops that's my bad sorry 😓

I was referencing some test logic I wrote a while back but didn't pay attention to how the PID was used.


This is additional context for my benefit on the topic. No need to read below this unless it interests you 👍

Notes

Using the rspamd process name with --oldest should still work well, just pgrep itself isn't necessary:

$ pgrep --oldest rspamd
1360

$ pgrep --list-full rspamd
1360 rspamd: main process
1639 rspamd: rspamd_proxy process (127.0.0.1:11332)
1640 rspamd: rspamd_proxy process (127.0.0.1:11332)
1641 rspamd: controller process (0.0.0.0:11334)
1642 rspamd: hs_helper process

# Reload (NOTE: `--echo` produces the output lines instead of being silent):
$ pkill --echo --signal SIGHUP rspamd
rspamd killed (pid 1360)
rspamd killed (pid 1639)
rspamd killed (pid 1640)
rspamd killed (pid 1641)
rspamd killed (pid 1642)

# Only sends signal to the main rspamd process:
# (NOTE: This still respawns the child processes)
$ pkill --oldest --echo --signal SIGHUP rspamd
rspamd killed (pid 1360)

# Original PID 1360 wasn't actually "killed":
# (Child processes were respawned)
$ pgrep --full --list-full rspamd
1360 rspamd: main process
2115 rspamd: rspamd_proxy process (127.0.0.1:11332)
2116 rspamd: rspamd_proxy process (127.0.0.1:11332)
2117 rspamd: controller process (0.0.0.0:11334)
2118 rspamd: hs_helper process

# Process tree (`pstree` via `psmisc` package):
$ pstree -p -s "$(pgrep --oldest rspamd)"
dumb-init(1)---supervisord(7)---rspamd(1360)-+-rspamd(2115)
                                             |-rspamd(2116)
                                             |-rspamd(2117)
                                             `-rspamd(2118)

NOTE: The increasingly larger PID of the child processes appears to be from check-for-changes.sh calling sleep 2 in a loop with each invocation incrementing the PID (along with any other commands I use via CLI).

Related logs (different PIDs from another run), completes within 1 second:

Rspamd SIGHUP reload event logs
2026-01-07 20:59:33 #635(main) <e96fe3>; main; rspamd_hup_handler: rspamd 3.12.1 is requested to reload configuration
# ...
2026-01-07 20:59:34 #635(main) <e96fe3>; main; reread_config: config has been reread successfully
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_hup_handler: spawn workers with a new config
2026-01-07 20:59:34 #635(main) <e96fe3>; main; spawn_workers: worker of type fuzzy(localhost:11335) is disabled in the config, skip spawning
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_fork_worker: prepare to fork process rspamd_proxy (0); listen on: 127.0.0.1:11332
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_fork_worker: prepare to fork process rspamd_proxy (1); listen on: 127.0.0.1:11332
2026-01-07 20:59:34 #1261(rspamd_proxy) <e96fe3>; main; rspamd_worker_set_limits: use system max file descriptors limit: 1024KiB cur and 1024KiB max
2026-01-07 20:59:34 #1261(rspamd_proxy) <e96fe3>; main; rspamd_worker_set_limits: use system max core size limit: -1B cur and -1B max
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_fork_worker: prepare to fork process controller (0); listen on: 0.0.0.0:11334
2026-01-07 20:59:34 #1262(rspamd_proxy) <e96fe3>; main; rspamd_worker_set_limits: use system max file descriptors limit: 1024KiB cur and 1024KiB max
2026-01-07 20:59:34 #1262(rspamd_proxy) <e96fe3>; main; rspamd_worker_set_limits: use system max core size limit: -1B cur and -1B max
2026-01-07 20:59:34 #635(main) <e96fe3>; main; spawn_workers: worker of type normal(127.0.0.1:11333) is disabled in the config, skip spawning
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_fork_worker: prepare to fork process hs_helper (0), no bind socket
2026-01-07 20:59:34 #1263(controller) <e96fe3>; main; rspamd_worker_set_limits: use system max file descriptors limit: 1024KiB cur and 1024KiB max
2026-01-07 20:59:34 #1263(controller) <e96fe3>; main; rspamd_worker_set_limits: use system max core size limit: -1B cur and -1B max
2026-01-07 20:59:34 #1263(controller) <19ntq8>; controller; rspamd_controller_password_sane: your normal password is not encrypted, we strongly recommend to replace it with the encrypted one
2026-01-07 20:59:34 #1263(controller) <19ntq8>; controller; rspamd_controller_password_sane: enable password is not set, so you should filter controller availability by using of firewall or `secure_ip` option
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_hup_handler: workers spawning has been finished
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_hup_handler: kill old workers
2026-01-07 20:59:34 #635(main) <e96fe3>; main; kill_old_workers: send signal to worker 806
2026-01-07 20:59:34 #635(main) <e96fe3>; main; kill_old_workers: send signal to worker 805
2026-01-07 20:59:34 #635(main) <e96fe3>; main; kill_old_workers: send signal to worker 804
2026-01-07 20:59:34 #635(main) <e96fe3>; main; kill_old_workers: send signal to worker 807
2026-01-07 20:59:34 #1264(hs_helper) <e96fe3>; main; rspamd_worker_set_limits: use system max file descriptors limit: 1024KiB cur and 1024KiB max
2026-01-07 20:59:34 #1264(hs_helper) <e96fe3>; main; rspamd_worker_set_limits: use system max core size limit: -1B cur and -1B max
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_check_termination_clause: hs_helper process 807 terminated normally
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_cld_handler: do not respawn process hs_helper after found terminated process with pid 807
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_check_termination_clause: rspamd_proxy process 804 terminated normally
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_cld_handler: do not respawn process rspamd_proxy after found terminated process with pid 804
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_check_termination_clause: controller process 806 terminated normally
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_cld_handler: do not respawn process controller after found terminated process with pid 806
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_check_termination_clause: rspamd_proxy process 805 terminated normally
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_cld_handler: do not respawn process rspamd_proxy after found terminated process with pid 805

Rspamd service management and failure handling

If Rspamd service itself is not running, I don't think we need to start the service.

Supervisord handles restarting the main process when the service was started during container init, and it will handle restarting if the rspamd process is killed abruptly too (which is what the linked test case covers, albeit it is missing rspamd coverage it should still be handled the same).

  • RSpamd has a wrong config and pkill "reloads" the config but fails to check the status (rspamd is dead)

Rspamd shouldn't be dead?

The actions run failure linked by @scristian71 has test cases fail with:

ERROR rspamd-dkim: Unexpected error occurred :: script = /usr/local/bin/rspamd-dkim  | function = _final_steps | command = pkill --signal SIGHUP $(pgrep --oldest rspamd) | line = 276 | exit code = 1

Not too different from the error at the start of my associated report (minus the earlier log lines related to a supervisorctl restart rspamd):

rspamd: ERROR (not running)
rspamd: ERROR (abnormal termination)
ERROR rspamd-dkim: Unexpected error occured :: script = /usr/local/bin/rspamd-dkim  | function = _setup_default_signing_conf | command = supervisorctl restart rspamd | line = 260 | exit code = 7

However the current failure of (exit code = 1) is occurring because (as you mentioned) the failure for pkill to kill by PID input:

$ pkill --signal SIGHUP 123
$ echo $?
1

I also tried to trigger a bad config failure, but this didn't affect the rspamd process in anyway. Even with a reload invoked.. when using rspamadm configdump it showed the bad config but no failure (possibly because the tool just dumps config), nor was anything showing up in /var/log/mail/rspamd.log (it would probably log errors if I triggered the functionality for the /etc/rspamd/override.d/multimap.conf invalid config I added).

I'm lacking further context on where bad config is causing rspamd to die in our test suite 😅


Side-Note - Accidental supervisord reload impact (SIGHUP to PID 1)

If we ran into a scenario that caused pkill --signal SIGHUP '' (by getting an empty pgrep subshell response with the quote wrapping advised from AI suggestions), this would actually restart everything in the container as supervisord receives the SIGHUP signal. That in turn brings down Rspamd temporarily.

$ docker compose up -d --force-recreate
$ docker compose exec dms bash

# When no PID is given, the empty string input sends the signal to every process:
# (thus the `supervisord` process, as an empty string matches all processes)
$ pkill --signal SIGHUP ""

# The previous command also booted you out of the container (bash process reloaded):
$ docker compose exec dms bash

# `supervisord` reloads due to `SIGHUP`, this restarts the DMS container init script
# which results in rspamd and other services being started again...
# - New rspamd process running for 17s
# - NOTE: If you ran this command too early, it'll still have the existing rspamd service time
$ ps -p "$(pgrep --oldest rspamd)" -o etimes=
     17

Inspecting /var/log/supervisor/supervisord.log we can see this is the case:

2026-01-07 21:16:27,106 WARN exited: mailserver (terminated by SIGHUP; not expected)
2026-01-07 21:16:27,164 WARN exited: update-check (terminated by SIGHUP; not expected)
2026-01-07 21:16:27,164 WARN exited: changedetector (terminated by SIGHUP; not expected)
2026-01-07 21:16:27,165 WARN received SIGHUP indicating restart request
2026-01-07 21:16:54,648 WARN stopped: cron (terminated by SIGTERM)

After the pkill command boots you out of container, we can go back in and see how this affects the rspamd service that supervisord manages:

$ supervisorctl status rspamd
error: <class 'xmlrpc.client.Fault'>, <Fault 6: 'SHUTDOWN_STATE'>: file: /usr/lib/python3.11/xmlrpc/client.py line: 668

# Try again:
$ supervisorctl status rspamd
rspamd                           STOPPED   Not started

# Try again:
$ supervisorctl status rspamd
rspamd                           RUNNING   pid 6271, uptime 0:00:01

EDIT: Turns out it wasn't just PID 1 and forwarding to supervisord that our ENTRYPOINT + CMD instructions in the Dockerfile configure, but all processes are matched and received the SIGHUP 😬

Console output for context
$ pkill --echo --signal SIGHUP ''

dumb-init killed (pid 1)
supervisord killed (pid 6)
tail killed (pid 7768)
cron killed (pid 7908)
rsyslogd killed (pid 7912)
dovecot killed (pid 7918)
anvil killed (pid 7923)
log killed (pid 7924)
config killed (pid 7925)
update-check.sh killed (pid 7927)
redis-server killed (pid 7953)
sleep killed (pid 7966)
rspamd killed (pid 7967)
pidproxy killed (pid 7973)
postfix-script killed (pid 7977)
check-for-chang killed (pid 8004)
rspamd killed (pid 8006)
rspamd killed (pid 8007)
rspamd killed (pid 8008)
rspamd killed (pid 8011)
master killed (pid 8093)
pickup killed (pid 8094)
qmgr killed (pid 8095)
bash killed (pid 8096)
sleep killed (pid 9126)

I could see how the supervisord reload would result in encountering rspamd in a bad state, or the process not being visible briefly (or appearing dead).

pkill --signal SIGHUP $(pgrep --oldest rspamd) still results in a exit code of 1 just the same though.. it's only a bigger concern if it was given an empty input with via quotes:

# Error (exit code 2):
$ pkill --signal SIGHUP $(pgrep --oldest rspamdx)
pkill: no matching criteria specified
Try `pkill --help' for more information.

# The `pgrep` subshell returns an empty return value,
# all PID are matched and given the `SIGHUP` signal:
$ pkill --signal SIGHUP "$(pgrep --oldest rspamdx)"

Copy link
Copy Markdown
Member

@polarathene polarathene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit inclined towards pkill as shown below, but open to feedback from @casperklein or @georglauterbach if using supervisorctl pid <service name> is the preferred approach (both variants are covered below as suggestions).


I was a bit slow to give feedback, but thanks for looking into this too @ap-wtioit ❤️

We want to avoid duplicating this functionality across the two locations, historically that diverges over time with contributors / reviewers forgetting to carry changes over to the other location of duplicate code.

Instead we now have the utils.sh helper that we can extract that to, both of these files have that already in scope, so you just need to apply the suggestions I've provided and add the following to our utils.sh:

function _reload_rspamd() {
  local PID_RSPAMD
  PID_RSPAMD="$(pgrep --oldest rspamd)"

  if [[ -n ${PID_RSPAMD} ]]; then
    _log 'debug' 'Reloading Rspamd configuration'
    if ! kill -s SIGHUP "${PID_RSPAMD}"; then
      _log 'error' "Failed to send `SIGHUP` to Rspamd (PID: ${PID_RSPAMD})"
    fi
  fi
}

@casperklein made a nice suggestion for getting the rspamd PID via supervisorctl pid rspamd. This returns 0 on failure (eg: service not running).

The behaviour of kill with a PID of 0 kills processes in the same process group as the current process:

https://man7.org/linux/man-pages/man2/kill.2.html
If pid equals 0, then sig is sent to every process in the process group of the calling process.

Which in this case would terminate check-for-changes.sh process or your shell running the setup CLI command. Neither ideal, so we still need the variable + condition:

function _reload_rspamd() {
  local PID_RSPAMD
  PID_RSPAMD=$(supervisorctl pid rspamd)

  if [[ ${PID_RSPAMD} -ne 0 ]]; then
    _log 'debug' 'Reloading Rspamd configuration'
    if ! kill -s SIGHUP "${PID_RSPAMD}"; then
      _log 'error' "Failed to send `SIGHUP` to Rspamd (PID: ${PID_RSPAMD})"
    fi
  fi
}
  • -ne operator for number comparison (note the lack of quote wrapping PID_RSPAMD now.
  • Probably still want the conditional as there is output you cannot silence if encountering an error and we may still want to output our own error message?:
    $ kill -s SIGHUP 12345
    bash: kill: (12345) - No such process
    
    $ echo $?
    1
    That wouldn't happen with the logic AFAIK though..
Variant - Without error support

The error log could be omitted if we're sure PID_RSPAMD being a value other than 0 is valid? (no string output from supervisorctl):

function _reload_rspamd() {
  local PID_RSPAMD
  PID_RSPAMD=$(supervisorctl pid rspamd)

  if [[ ${PID_RSPAMD} -ne 0 ]]; then
    _log 'debug' 'Reloading Rspamd configuration'
    kill -s SIGHUP "${PID_RSPAMD}"
  fi
}
$ kill -s SIGHUP $(supervisorctl pid what)
bash: kill: No: arguments must be process or job IDs
bash: kill: such: arguments must be process or job IDs
bash: kill: process: arguments must be process or job IDs
bash: kill: what: arguments must be process or job IDs

Actual failure in the above case would occur at the -ne conditional:

[[: No such process what: syntax error in expression (error token is "such process what")
# If switching to a string value and using `!= '0'` comparision instead of `-ne 0`:
$ kill -s SIGHUP "$(supervisorctl pid what)"
bash: kill: No such process what: arguments must be process or job IDs

Alternatively... if we don't bother with the PID we can just use pkill (as I demonstrated here):

function _reload_rspamd() {
  _log 'debug' 'Reloading Rspamd configuration'
  if ! pkill --signal SIGHUP --oldest 'rspamd'; then
    _log 'error' "Failed to send `SIGHUP` to the main rspamd process"
  fi
}

Comment thread target/bin/rspamd-dkim Outdated
Comment thread target/scripts/check-for-changes.sh Outdated
Comment thread target/bin/rspamd-dkim Outdated
Comment on lines +277 to +278
rspamd_pid=$(pgrep --oldest rspamd) || rspamd_pid=""
if [[ -z ${rspamd_pid} ]]; then
Copy link
Copy Markdown
Member

@polarathene polarathene Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: Not sure why this is listed under my posted review, it's a reply to earlier feedback.


I'm not sure if it's better since a condition is still required? supervisorctl pid <service name> is neat, but the main difference is instead of an empty value on failure it returns a 0 🤔

Then again, it's clearer for intent than the pgrep command, and might actually be better for our older test case on process restarts where I noted a few services that were less than obvious when piecing that together that extra context was required for maintenance.

Why the (( rspamd_pid == 0 )) condition vs [[ rspamd_pid -eq 0 ]]? (which is what we typically use elsewhere in our scripts AFAIK?)

@ap-wtioit
Copy link
Copy Markdown
Contributor

ap-wtioit commented Jan 8, 2026

I was about to write a comment for the _reload_rspamd utils method. When i read Supervisor/supervisor#53 once again and came to Supervisor/supervisor#53 (comment). It seems there is a supervisorctl signal SIGHUP rspamd that should do what we need to reload Rspamd.

Edit: confirmed that for us the tests are working with supervisorctl signal SIGHUP rspamd in both target/bin/rspamd-dkim and target/scripts/check-for-changes.sh

@scristian71
Copy link
Copy Markdown
Contributor Author

I have applied the suggested changes. Locally the tests are passing.

@polarathene polarathene changed the title fix: Remove duplicated and possibly wrong rspamd restart fix: Reload the rspamd service instead of restarting it Jan 9, 2026
georglauterbach
georglauterbach previously approved these changes Jan 9, 2026
Copy link
Copy Markdown
Member

@georglauterbach georglauterbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't read the whole conversation, but the changes look sensible.

Copy link
Copy Markdown
Member

@polarathene polarathene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll apply these (nevermind, not given permission to do so)

In addition to these final changes, please add a Changelog entry just above this section:

- **Internal:**
- `ENABLE_QUOTAS=1` - When an alias has multiple addresses, the first local mailbox address found will be used for the Dovecot dummy account workaround ([#4581](https://github.com/docker-mailserver/docker-mailserver/pull/4581))
- Change Detection service - Added support for responding to updated DMS config (_Rspamd and TLS certificates_) to `ACCOUNT_PROVISIONER=LDAP` ([#4627](https://github.com/docker-mailserver/docker-mailserver/pull/4627))

Copy/paste this:

- **Rspamd:**
  - Configuration changes now trigger a service reload instead of a restart ([#4632](https://github.com/docker-mailserver/docker-mailserver/pull/4632))

Comment thread target/bin/rspamd-dkim Outdated
Comment thread target/scripts/helpers/utils.sh Outdated
Copy link
Copy Markdown
Member

@polarathene polarathene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 Thanks for taking the time to contribute! ❤️

@polarathene polarathene enabled auto-merge (squash) January 10, 2026 20:25
@polarathene polarathene disabled auto-merge January 10, 2026 20:25
Copy link
Copy Markdown
Member

@polarathene polarathene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, once this change is applied we can merge :)

Comment thread target/scripts/helpers/utils.sh Outdated
@casperklein casperklein enabled auto-merge (squash) January 11, 2026 17:46
@casperklein casperklein merged commit 066d314 into docker-mailserver:master Jan 11, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants