fix: Reload the `rspamd` service instead of restarting it by scristian71 · Pull Request #4632 · docker-mailserver/docker-mailserver

scristian71 · 2026-01-03T11:50:08Z

Description

Fixes the failure by requesting Rspamd reload configuration via sending a HUP signal to the main rspamd process. supervisorctl restart rspamd was prone to a race condition causing tests to fail and sometimes putting the rspamd service into a bad state.

As per review below, the decision was to switch send the signal via supervisorctl signal <signal> <service name>.

Type of change

Bug fix (non-breaking change which fixes an issue)

Checklist

[] My code follows the style guidelines of this project
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (README.md or the documentation under docs/)
If necessary, I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have added information about changes made in this PR to CHANGELOG.md

polarathene · 2026-01-04T02:21:07Z

You should provide more context with the PR, appears to be as identified here: #4579 (comment)

End of that comment I am unsure of the current PR approach resolving the problem properly due to change detection still being racey (although the issue should be less frequently encountered by dropping the redundant restart). I proposed for change detection to be refactored with adopting watchexec in future that would resolve the concern.

scristian71 · 2026-01-05T18:47:56Z

In rspamd is restarted twice, once in _setup_default_signing_conf and the second time in _final_steps as it is mentioned in the referenced issue.
In my case I have failed tests:
https://github.com/scristian71/docker-mailserver/actions/runs/20618435610
And also if I run locally "[Rspamd] (DKIM)" I had random failures.

polarathene · 2026-01-06T00:55:01Z

TL;DR:

Please try and verify the following change suggestions (changing from restart to reload via SIGHUP):

Change:

docker-mailserver/target/bin/rspamd-dkim

Lines 277 to 284 in 0f5763c

    
           function _final_steps() { 
        
             # We need to restart Rspamd so the changes take effect immediately. 
        
             if ! supervisorctl restart rspamd; then 
        
               _log 'warn' 'Could not restart Rspamd via Supervisord' 
        
             fi 
        
             _log 'trace' 'Finished DKIM key creation' 
        
           }

to:

function _final_steps() {
  # Request Rspamd to reload it's config immediately (instead of waiting on change detection):
  pkill --signal SIGHUP $(pgrep --oldest rspamd)

  _log 'trace' 'Finished DKIM key creation'
}

Change:

docker-mailserver/target/scripts/check-for-changes.sh

Lines 209 to 210 in 0f5763c

_log 'debug' 'Rspamd configuration has changed - restarting service'

supervisorctl restart rspamd

to:
```
_log 'debug' 'Reloading Rspamd configuration'
pkill --signal SIGHUP $(pgrep --oldest rspamd)
```

In my case I have failed tests:
https://github.com/scristian71/docker-mailserver/actions/runs/20618435610
And also if I run locally "[Rspamd] (DKIM)" I had random failures.

~~We recently landed a change that improved the rspamd test to avoid a failure that started appearing.~~

EDIT: I see your test run was from the merge commit of that PR, so the same spurious failure detailed in #4579 and referenced in #4627 (comment)

Change detection is presently at a default 2s poll rate, Rspamd can take a few seconds to complete a restart IIRC. So depending on how close the two restarts are from change detection and the setup config dkim script, any of the relevant rspamd_dkim.bats test cases may still fail.

EDIT: Sending a SIGHUP signal to Rspamd for it to reload config might avoid the concern.

In rspamd is restarted twice, once in _setup_default_signing_conf and the second time in _final_steps as it is mentioned in the referenced issue.

Yes, but it will still be possible that it is triggered and causes this failure even with your PR here.

Here is our change detection logic for rspamd support, it will detect the DKIM files created, then at the end of the snippet restart the rspamd service:

docker-mailserver/target/scripts/check-for-changes.sh

Lines 187 to 212 in 0f5763c

    
           function _rspamd_changes() { 
        
             # RSPAMD_DMS_D='/tmp/docker-mailserver/rspamd' 
        
             if [[ ${CHANGED} =~ ${RSPAMD_DMS_D}/.* ]]; then 
        
               # "${RSPAMD_DMS_D}/override.d" 
        
               if [[ ${CHANGED} =~ ${RSPAMD_DMS_OVERRIDE_D}/.* ]]; then 
        
                 _log 'trace' 'Rspamd - Copying configuration overrides' 
        
                 rm "${RSPAMD_OVERRIDE_D}"/* 
        
                 cp "${RSPAMD_DMS_OVERRIDE_D}"/* "${RSPAMD_OVERRIDE_D}" 
        
               fi 
        
               # "${RSPAMD_DMS_D}/custom-commands.conf" 
        
               if [[ ${CHANGED} =~ ${RSPAMD_DMS_CUSTOM_COMMANDS_F} ]]; then 
        
                 _log 'trace' 'Rspamd - Generating new configuration from custom commands' 
        
                 _rspamd_handle_user_modules_adjustments 
        
               fi 
        
               # "${RSPAMD_DMS_D}/dkim" 
        
               if [[ ${CHANGED} =~ ${RSPAMD_DMS_DKIM_D} ]]; then 
        
                 _log 'trace' 'Rspamd - DKIM files updated' 
        
               fi 
        
               _log 'debug' 'Rspamd configuration has changed - restarting service' 
        
               supervisorctl restart rspamd 
        
             fi 
        
           }

The DKIM keys generated into the DMS Config Volume rspamd/dkim/ dir will trigger that:

docker-mailserver/target/bin/rspamd-dkim

Lines 163 to 176 in 0f5763c

    
           function _create_keys() { 
        
             if [[ ${KEYTYPE} == 'rsa' ]]; then 
        
               local BASE_FILE_NAME="${RSPAMD_DMS_DKIM_D}/${KEYTYPE}-${KEYSIZE}-${SELECTOR}-${DOMAIN}" 
        
               KEYTYPE_OPTIONS=('-b' "${KEYSIZE}") 
        
               _log 'info' "Creating DKIM keys of type '${KEYTYPE}' and length '${KEYSIZE}' with selector '${SELECTOR}' for domain '${DOMAIN}'" 
        
             else 
        
               local BASE_FILE_NAME="${RSPAMD_DMS_DKIM_D}/${KEYTYPE}-${SELECTOR}-${DOMAIN}" 
        
               KEYTYPE_OPTIONS=('-t' "${KEYTYPE}") 
        
               _log 'info' "Creating DKIM keys of type '${KEYTYPE}' with selector '${SELECTOR}' for domain '${DOMAIN}'" 
        
             fi 
        
             PUBLIC_KEY_FILE="${BASE_FILE_NAME}.public.txt" 
        
             PUBLIC_KEY_DNS_FILE="${BASE_FILE_NAME}.public.dns.txt" 
        
             PRIVATE_KEY_FILE="${BASE_FILE_NAME}.private.txt"

Just depends on the timing 🤔

The logic for both were contributed for DMS v13 back in 2023, where I raised some concerns:

Sending a SIGHUP signal would be better than a restart to reload Rspamd instead of a full service restart. There was a concern expressed about SIGHUP not being supported and it's not entirely clear if that's still valid in 2026.
The change detection concern was also raised at the time. There was a reply but it was focused on the dkim_signing.conf file only, the DKIM keys still result in a trigger.

So technically both restarts in this Rspamd DKIM script could be avoided I think? (LDAP is now enabled for change detection support, so I think that'll go smoothly). Doing so would make the following commentary a bit redundant?:

docker-mailserver/target/bin/rspamd-dkim

Lines 253 to 257 in 0f5763c

    
           # We copy here immediately in order to not rely on the changedetector - this way, users 
        
           # can immediately use the new keys. The file should not already exist in ${RSPAMD_OVERRIDE_D} 
        
           # since it would have been copied already. 
        
           cp "${DEFAULT_CONFIG_FILE}" "${RSPAMD_OVERRIDE_D}/dkim_signing.conf" 
        
           chown _rspamd:_rspamd "${DEFAULT_CONFIG_FILE}" "${RSPAMD_OVERRIDE_D}/dkim_signing.conf"

We may still want to go ahead with the immediate copy though, but instead of a restart a SIGHUP could be sent to request rspamd to reload it's configs via pkill --signal SIGHUP $(pgrep --oldest rspamd) (Side-note: We don't have rspamd covered in our service restart tests)

Copilot

Pull request overview

This PR removes a duplicated rspamd restart call and changes the remaining restart operations to reload operations using SIGHUP signals. The change aims to address issue #4579 by eliminating redundant service restarts and using a more lightweight reload mechanism instead of full restarts.

Replaced supervisorctl restart rspamd with pkill --signal SIGHUP $(pgrep --oldest rspamd) in change detection script
Removed one redundant supervisorctl restart rspamd call from rspamd-dkim script
Changed the remaining restart in rspamd-dkim to use SIGHUP signal reload

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.

File	Description
target/scripts/check-for-changes.sh	Changes rspamd restart to reload via SIGHUP when configuration changes are detected
target/bin/rspamd-dkim	Removes one redundant restart call and changes the final restart to a reload via SIGHUP

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-06T21:47:30Z

-    _log 'debug' 'Rspamd configuration has changed - restarting service'
-    supervisorctl restart rspamd
+    _log 'debug' 'Reloading Rspamd configuration'
+    pkill --signal SIGHUP $(pgrep --oldest rspamd)


The command substitution in the pkill command should be quoted to prevent word splitting and globbing issues. Without quotes, if pgrep returns multiple PIDs or unexpected output, the command could fail or behave unexpectedly.

Suggested change

pkill --signal SIGHUP $(pgrep --oldest rspamd)

pkill --signal SIGHUP "$(pgrep --oldest rspamd)"

All the explanation makes sense as the service restarting should be synncronized in some way.
I did the requested changes in the branch but now there is another issue:
https://github.com/scristian71/docker-mailserver/actions/runs/20763050368/job/59622717076
Will try to analyze in the next days.

The new issue looks to be:

RSpamd has a wrong config and pkill "reloads" the config but fails to check the status (rspamd is dead)

User fixes the config

now a pkill cannot send a signal to a not existing process, and we need to probably start rspamd with supervisorctl start rspamd

(We are also constantly failing the "[Rspamd] (DKIM) argument 'keysize' is applied correctly for RSA keys" test on our CI)

Also pkill does not accept pids, when using pgrep to get the pid use kill:

And here i changed it to:

local rspamd_pid rspamd_pid=$(pgrep --oldest rspamd) || rspamd_pid="" if [[ -z ${rspamd_pid} ]]; then _log 'debug' 'Starting Rspamd after configuration change' supervisorctl start rspamd else _log 'debug' 'Reloading Rspamd configuration' if ! kill -s SIGHUP "${rspamd_pid}"; then _log 'error' "Failed to send SIGHUP to Rspamd (PID: ${rspamd_pid})" fi fi

Copilot · 2026-01-06T21:47:30Z

-    _log 'debug' 'Rspamd configuration has changed - restarting service'
-    supervisorctl restart rspamd
+    _log 'debug' 'Reloading Rspamd configuration'
+    pkill --signal SIGHUP $(pgrep --oldest rspamd)


The pkill command lacks error handling. If rspamd is not running, pgrep will fail and the pkill command will execute with no PID argument, which will also fail. This change detection script does not have strict error handling (no set -e), so the failure will be silent. Consider adding error handling to log a warning if the reload fails, or verify that rspamd is running before attempting to signal it.

Suggested change

pkill --signal SIGHUP $(pgrep --oldest rspamd)

local rspamd_pid

rspamd_pid=$(pgrep --oldest rspamd) || rspamd_pid=""

if [[ -z ${rspamd_pid} ]]; then

_log 'warn' 'Rspamd reload requested, but no rspamd process found'

else

if ! pkill --signal SIGHUP "${rspamd_pid}"; then

_log 'warn' "Failed to send SIGHUP to Rspamd (PID: ${rspamd_pid})"

fi

fi

ap-wtioit

I added the code changes that worked for me locally based on your current PR.

checking if there is an rspamd pid
reloading (with sighup) if rspamd pid can be found
checking that kill was executed successfully
logging errors not warnings (i missed that pkill did nothing when using the AI suggestion at first and only logging a warning)
checking the status of rspamd after changes in rspamd-dkim to emit an error if config is broken (the previous supervisorctl restart rspamd did return an error if the config was broken) but ignoring the error in check-for-changes (as this seems to loop and should start rspamd again after the user fixes the config error)

ap-wtioit · 2026-01-07T11:27:18Z

@@ -275,10 +272,8 @@ function _transform_public_key_file_to_dns_record_contents() {
 }

 function _final_steps() {


This is what seems to work for us:

function _final_steps() { local rspamd_pid rspamd_pid=$(pgrep --oldest rspamd) || rspamd_pid="" if [[ -z ${rspamd_pid} ]]; then supervisorctl start rspamd elif ! kill -s SIGHUP "${rspamd_pid}"; then _log 'error' "Failed to send SIGHUP to Rspamd (PID: ${rspamd_pid})" fi # check status of rspamd after a possible reload (that could have killed rspamd in case of a breaking config) supervisorctl status rspamd _log 'trace' 'Finished DKIM key creation' }

ap-wtioit · 2026-01-07T11:28:49Z

-    _log 'debug' 'Rspamd configuration has changed - restarting service'
-    supervisorctl restart rspamd
+    _log 'debug' 'Reloading Rspamd configuration'
+    pkill --signal SIGHUP $(pgrep --oldest rspamd)


And here i changed it to:

local rspamd_pid rspamd_pid=$(pgrep --oldest rspamd) || rspamd_pid="" if [[ -z ${rspamd_pid} ]]; then _log 'debug' 'Starting Rspamd after configuration change' supervisorctl start rspamd else _log 'debug' 'Reloading Rspamd configuration' if ! kill -s SIGHUP "${rspamd_pid}"; then _log 'error' "Failed to send SIGHUP to Rspamd (PID: ${rspamd_pid})" fi fi

scristian71 · 2026-01-07T20:39:38Z

I have added the changes suggested by @ap-wtioit and the tests are passing now:

https://github.com/scristian71/docker-mailserver/actions/runs/20793106916

casperklein · 2026-01-07T21:46:51Z

+  rspamd_pid=$(pgrep --oldest rspamd) || rspamd_pid=""
+  if [[ -z ${rspamd_pid} ]]; then


Suggested change

rspamd_pid=$(pgrep --oldest rspamd) || rspamd_pid=""

if [[ -z ${rspamd_pid} ]]; then

rspamd_pid=$(supervisorctl pid rspamd)

if (( rspamd_pid == 0 )); then

I think this is a bit cleaner. If rspamd is running, the PID is returned, otherwise 0.

EDIT: Not sure why this is listed under my posted review, it's a reply to earlier feedback.

I'm not sure if it's better since a condition is still required? supervisorctl pid <service name> is neat, but the main difference is instead of an empty value on failure it returns a 0 🤔

Then again, it's clearer for intent than the pgrep command, and might actually be better for our older test case on process restarts where I noted a few services that were less than obvious when piecing that together that extra context was required for maintenance.

Why the (( rspamd_pid == 0 )) condition vs [[ rspamd_pid -eq 0 ]]? (which is what we typically use elsewhere in our scripts AFAIK?)

The reason was not to avoid the condition. "grepping" for something that we (supervisord) already know seemed odd to me.

Why the (( rspamd_pid == 0 )) condition vs [[ rspamd_pid -eq 0 ]]? (which is what we typically use elsewhere in our scripts AFAIK?)

That was just a mistake. Personally I use (( )) for pure arithmetic stuff. You are right, in DMS, we should stick to the style we typically use elsewhere 👍

casperklein · 2026-01-07T22:06:55Z

+  # check status of rspamd after a possible reload (that could have killed rspamd in case of a breaking config)
+  supervisorctl status rspamd


This doesn't work as you think. If rspamd fails to start due an invalid configuration, supervisor will immediately start rspam again (in a loop).
The output will almost always be "running".

Ok than my local tests were not doing what i thought was going on. I was on the wrong track there. I mistook the pkill not working with pids (that i discovered later) for one of the negative tests leaving the config in a dirty state.

polarathene · 2026-01-07T22:42:03Z

This is in response to @ap-wtioit review input, but on the main thread to keep the added information below visible.

Also pkill does not accept pids, when using pgrep to get the pid use kill

Whoops that's my bad sorry 😓

I was referencing some test logic I wrote a while back but didn't pay attention to how the PID was used.

This is additional context for my benefit on the topic. No need to read below this unless it interests you 👍

Notes

Using the rspamd process name with --oldest should still work well, just pgrep itself isn't necessary:

$ pgrep --oldest rspamd
1360

$ pgrep --list-full rspamd
1360 rspamd: main process
1639 rspamd: rspamd_proxy process (127.0.0.1:11332)
1640 rspamd: rspamd_proxy process (127.0.0.1:11332)
1641 rspamd: controller process (0.0.0.0:11334)
1642 rspamd: hs_helper process

# Reload (NOTE: `--echo` produces the output lines instead of being silent):
$ pkill --echo --signal SIGHUP rspamd
rspamd killed (pid 1360)
rspamd killed (pid 1639)
rspamd killed (pid 1640)
rspamd killed (pid 1641)
rspamd killed (pid 1642)

# Only sends signal to the main rspamd process:
# (NOTE: This still respawns the child processes)
$ pkill --oldest --echo --signal SIGHUP rspamd
rspamd killed (pid 1360)

# Original PID 1360 wasn't actually "killed":
# (Child processes were respawned)
$ pgrep --full --list-full rspamd
1360 rspamd: main process
2115 rspamd: rspamd_proxy process (127.0.0.1:11332)
2116 rspamd: rspamd_proxy process (127.0.0.1:11332)
2117 rspamd: controller process (0.0.0.0:11334)
2118 rspamd: hs_helper process

# Process tree (`pstree` via `psmisc` package):
$ pstree -p -s "$(pgrep --oldest rspamd)"
dumb-init(1)---supervisord(7)---rspamd(1360)-+-rspamd(2115)
                                             |-rspamd(2116)
                                             |-rspamd(2117)
                                             `-rspamd(2118)

NOTE: The increasingly larger PID of the child processes appears to be from check-for-changes.sh calling sleep 2 in a loop with each invocation incrementing the PID (along with any other commands I use via CLI).

Related logs (different PIDs from another run), completes within 1 second:

Rspamd SIGHUP reload event logs

2026-01-07 20:59:33 #635(main) <e96fe3>; main; rspamd_hup_handler: rspamd 3.12.1 is requested to reload configuration
# ...
2026-01-07 20:59:34 #635(main) <e96fe3>; main; reread_config: config has been reread successfully
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_hup_handler: spawn workers with a new config
2026-01-07 20:59:34 #635(main) <e96fe3>; main; spawn_workers: worker of type fuzzy(localhost:11335) is disabled in the config, skip spawning
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_fork_worker: prepare to fork process rspamd_proxy (0); listen on: 127.0.0.1:11332
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_fork_worker: prepare to fork process rspamd_proxy (1); listen on: 127.0.0.1:11332
2026-01-07 20:59:34 #1261(rspamd_proxy) <e96fe3>; main; rspamd_worker_set_limits: use system max file descriptors limit: 1024KiB cur and 1024KiB max
2026-01-07 20:59:34 #1261(rspamd_proxy) <e96fe3>; main; rspamd_worker_set_limits: use system max core size limit: -1B cur and -1B max
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_fork_worker: prepare to fork process controller (0); listen on: 0.0.0.0:11334
2026-01-07 20:59:34 #1262(rspamd_proxy) <e96fe3>; main; rspamd_worker_set_limits: use system max file descriptors limit: 1024KiB cur and 1024KiB max
2026-01-07 20:59:34 #1262(rspamd_proxy) <e96fe3>; main; rspamd_worker_set_limits: use system max core size limit: -1B cur and -1B max
2026-01-07 20:59:34 #635(main) <e96fe3>; main; spawn_workers: worker of type normal(127.0.0.1:11333) is disabled in the config, skip spawning
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_fork_worker: prepare to fork process hs_helper (0), no bind socket
2026-01-07 20:59:34 #1263(controller) <e96fe3>; main; rspamd_worker_set_limits: use system max file descriptors limit: 1024KiB cur and 1024KiB max
2026-01-07 20:59:34 #1263(controller) <e96fe3>; main; rspamd_worker_set_limits: use system max core size limit: -1B cur and -1B max
2026-01-07 20:59:34 #1263(controller) <19ntq8>; controller; rspamd_controller_password_sane: your normal password is not encrypted, we strongly recommend to replace it with the encrypted one
2026-01-07 20:59:34 #1263(controller) <19ntq8>; controller; rspamd_controller_password_sane: enable password is not set, so you should filter controller availability by using of firewall or `secure_ip` option
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_hup_handler: workers spawning has been finished
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_hup_handler: kill old workers
2026-01-07 20:59:34 #635(main) <e96fe3>; main; kill_old_workers: send signal to worker 806
2026-01-07 20:59:34 #635(main) <e96fe3>; main; kill_old_workers: send signal to worker 805
2026-01-07 20:59:34 #635(main) <e96fe3>; main; kill_old_workers: send signal to worker 804
2026-01-07 20:59:34 #635(main) <e96fe3>; main; kill_old_workers: send signal to worker 807
2026-01-07 20:59:34 #1264(hs_helper) <e96fe3>; main; rspamd_worker_set_limits: use system max file descriptors limit: 1024KiB cur and 1024KiB max
2026-01-07 20:59:34 #1264(hs_helper) <e96fe3>; main; rspamd_worker_set_limits: use system max core size limit: -1B cur and -1B max
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_check_termination_clause: hs_helper process 807 terminated normally
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_cld_handler: do not respawn process hs_helper after found terminated process with pid 807
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_check_termination_clause: rspamd_proxy process 804 terminated normally
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_cld_handler: do not respawn process rspamd_proxy after found terminated process with pid 804
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_check_termination_clause: controller process 806 terminated normally
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_cld_handler: do not respawn process controller after found terminated process with pid 806
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_check_termination_clause: rspamd_proxy process 805 terminated normally
2026-01-07 20:59:34 #635(main) <e96fe3>; main; rspamd_cld_handler: do not respawn process rspamd_proxy after found terminated process with pid 805

Rspamd service management and failure handling

If Rspamd service itself is not running, I don't think we need to start the service.

Supervisord handles restarting the main process when the service was started during container init, and it will handle restarting if the rspamd process is killed abruptly too (which is what the linked test case covers, albeit it is missing rspamd coverage it should still be handled the same).

RSpamd has a wrong config and pkill "reloads" the config but fails to check the status (rspamd is dead)

Rspamd shouldn't be dead?

The actions run failure linked by @scristian71 has test cases fail with:

ERROR rspamd-dkim: Unexpected error occurred :: script = /usr/local/bin/rspamd-dkim  | function = _final_steps | command = pkill --signal SIGHUP $(pgrep --oldest rspamd) | line = 276 | exit code = 1

Not too different from the error at the start of my associated report (minus the earlier log lines related to a supervisorctl restart rspamd):

rspamd: ERROR (not running)
rspamd: ERROR (abnormal termination)
ERROR rspamd-dkim: Unexpected error occured :: script = /usr/local/bin/rspamd-dkim  | function = _setup_default_signing_conf | command = supervisorctl restart rspamd | line = 260 | exit code = 7

However the current failure of (exit code = 1) is occurring because (as you mentioned) the failure for pkill to kill by PID input:

$ pkill --signal SIGHUP 123
$ echo $?
1

I also tried to trigger a bad config failure, but this didn't affect the rspamd process in anyway. Even with a reload invoked.. when using rspamadm configdump it showed the bad config but no failure (possibly because the tool just dumps config), nor was anything showing up in /var/log/mail/rspamd.log (it would probably log errors if I triggered the functionality for the /etc/rspamd/override.d/multimap.conf invalid config I added).

I'm lacking further context on where bad config is causing rspamd to die in our test suite 😅

Side-Note - Accidental `supervisord` reload impact (`SIGHUP` to PID 1)

If we ran into a scenario that caused pkill --signal SIGHUP '' (by getting an empty pgrep subshell response with the quote wrapping advised from AI suggestions), this would actually restart everything in the container as supervisord receives the SIGHUP signal. That in turn brings down Rspamd temporarily.

$ docker compose up -d --force-recreate
$ docker compose exec dms bash

# When no PID is given, the empty string input sends the signal to every process:
# (thus the `supervisord` process, as an empty string matches all processes)
$ pkill --signal SIGHUP ""

# The previous command also booted you out of the container (bash process reloaded):
$ docker compose exec dms bash

# `supervisord` reloads due to `SIGHUP`, this restarts the DMS container init script
# which results in rspamd and other services being started again...
# - New rspamd process running for 17s
# - NOTE: If you ran this command too early, it'll still have the existing rspamd service time
$ ps -p "$(pgrep --oldest rspamd)" -o etimes=
     17

Inspecting /var/log/supervisor/supervisord.log we can see this is the case:

2026-01-07 21:16:27,106 WARN exited: mailserver (terminated by SIGHUP; not expected)
2026-01-07 21:16:27,164 WARN exited: update-check (terminated by SIGHUP; not expected)
2026-01-07 21:16:27,164 WARN exited: changedetector (terminated by SIGHUP; not expected)
2026-01-07 21:16:27,165 WARN received SIGHUP indicating restart request
2026-01-07 21:16:54,648 WARN stopped: cron (terminated by SIGTERM)

After the pkill command boots you out of container, we can go back in and see how this affects the rspamd service that supervisord manages:

$ supervisorctl status rspamd
error: <class 'xmlrpc.client.Fault'>, <Fault 6: 'SHUTDOWN_STATE'>: file: /usr/lib/python3.11/xmlrpc/client.py line: 668

# Try again:
$ supervisorctl status rspamd
rspamd                           STOPPED   Not started

# Try again:
$ supervisorctl status rspamd
rspamd                           RUNNING   pid 6271, uptime 0:00:01

EDIT: Turns out it wasn't just PID 1 and forwarding to supervisord that our ENTRYPOINT + CMD instructions in the Dockerfile configure, but all processes are matched and received the SIGHUP 😬

Console output for context

$ pkill --echo --signal SIGHUP ''

dumb-init killed (pid 1)
supervisord killed (pid 6)
tail killed (pid 7768)
cron killed (pid 7908)
rsyslogd killed (pid 7912)
dovecot killed (pid 7918)
anvil killed (pid 7923)
log killed (pid 7924)
config killed (pid 7925)
update-check.sh killed (pid 7927)
redis-server killed (pid 7953)
sleep killed (pid 7966)
rspamd killed (pid 7967)
pidproxy killed (pid 7973)
postfix-script killed (pid 7977)
check-for-chang killed (pid 8004)
rspamd killed (pid 8006)
rspamd killed (pid 8007)
rspamd killed (pid 8008)
rspamd killed (pid 8011)
master killed (pid 8093)
pickup killed (pid 8094)
qmgr killed (pid 8095)
bash killed (pid 8096)
sleep killed (pid 9126)

I could see how the supervisord reload would result in encountering rspamd in a bad state, or the process not being visible briefly (or appearing dead).

pkill --signal SIGHUP $(pgrep --oldest rspamd) still results in a exit code of 1 just the same though.. it's only a bigger concern if it was given an empty input with via quotes:

# Error (exit code 2):
$ pkill --signal SIGHUP $(pgrep --oldest rspamdx)
pkill: no matching criteria specified
Try `pkill --help' for more information.

# The `pgrep` subshell returns an empty return value,
# all PID are matched and given the `SIGHUP` signal:
$ pkill --signal SIGHUP "$(pgrep --oldest rspamdx)"

polarathene

I'm still a bit inclined towards pkill as shown below, but open to feedback from @casperklein or @georglauterbach if using supervisorctl pid <service name> is the preferred approach (both variants are covered below as suggestions).

I was a bit slow to give feedback, but thanks for looking into this too @ap-wtioit ❤️

We want to avoid duplicating this functionality across the two locations, historically that diverges over time with contributors / reviewers forgetting to carry changes over to the other location of duplicate code.

Instead we now have the utils.sh helper that we can extract that to, both of these files have that already in scope, so you just need to apply the suggestions I've provided and add the following to our utils.sh:

function _reload_rspamd() {
  local PID_RSPAMD
  PID_RSPAMD="$(pgrep --oldest rspamd)"

  if [[ -n ${PID_RSPAMD} ]]; then
    _log 'debug' 'Reloading Rspamd configuration'
    if ! kill -s SIGHUP "${PID_RSPAMD}"; then
      _log 'error' "Failed to send `SIGHUP` to Rspamd (PID: ${PID_RSPAMD})"
    fi
  fi
}

@casperklein made a nice suggestion for getting the rspamd PID via supervisorctl pid rspamd. This returns 0 on failure (eg: service not running).

The behaviour of kill with a PID of 0 kills processes in the same process group as the current process:

https://man7.org/linux/man-pages/man2/kill.2.html
If pid equals 0, then sig is sent to every process in the process group of the calling process.

Which in this case would terminate check-for-changes.sh process or your shell running the setup CLI command. Neither ideal, so we still need the variable + condition:

function _reload_rspamd() {
  local PID_RSPAMD
  PID_RSPAMD=$(supervisorctl pid rspamd)

  if [[ ${PID_RSPAMD} -ne 0 ]]; then
    _log 'debug' 'Reloading Rspamd configuration'
    if ! kill -s SIGHUP "${PID_RSPAMD}"; then
      _log 'error' "Failed to send `SIGHUP` to Rspamd (PID: ${PID_RSPAMD})"
    fi
  fi
}

-ne operator for number comparison (note the lack of quote wrapping PID_RSPAMD now.
Probably still want the conditional as there is output you cannot silence if encountering an error and we may still want to output our own error message?:
```
$ kill -s SIGHUP 12345
bash: kill: (12345) - No such process

$ echo $?
1
```
That wouldn't happen with the logic AFAIK though..

Variant - Without error support

The error log could be omitted if we're sure PID_RSPAMD being a value other than 0 is valid? (no string output from supervisorctl):

function _reload_rspamd() {
  local PID_RSPAMD
  PID_RSPAMD=$(supervisorctl pid rspamd)

  if [[ ${PID_RSPAMD} -ne 0 ]]; then
    _log 'debug' 'Reloading Rspamd configuration'
    kill -s SIGHUP "${PID_RSPAMD}"
  fi
}

$ kill -s SIGHUP $(supervisorctl pid what)
bash: kill: No: arguments must be process or job IDs
bash: kill: such: arguments must be process or job IDs
bash: kill: process: arguments must be process or job IDs
bash: kill: what: arguments must be process or job IDs

Actual failure in the above case would occur at the -ne conditional:

[[: No such process what: syntax error in expression (error token is "such process what")

# If switching to a string value and using `!= '0'` comparision instead of `-ne 0`:
$ kill -s SIGHUP "$(supervisorctl pid what)"
bash: kill: No such process what: arguments must be process or job IDs

Alternatively... if we don't bother with the PID we can just use pkill (as I demonstrated here):

function _reload_rspamd() {
  _log 'debug' 'Reloading Rspamd configuration'
  if ! pkill --signal SIGHUP --oldest 'rspamd'; then
    _log 'error' "Failed to send `SIGHUP` to the main rspamd process"
  fi
}

polarathene · 2026-01-07T22:40:14Z

+  rspamd_pid=$(pgrep --oldest rspamd) || rspamd_pid=""
+  if [[ -z ${rspamd_pid} ]]; then


EDIT: Not sure why this is listed under my posted review, it's a reply to earlier feedback.

I'm not sure if it's better since a condition is still required? supervisorctl pid <service name> is neat, but the main difference is instead of an empty value on failure it returns a 0 🤔

Then again, it's clearer for intent than the pgrep command, and might actually be better for our older test case on process restarts where I noted a few services that were less than obvious when piecing that together that extra context was required for maintenance.

Why the (( rspamd_pid == 0 )) condition vs [[ rspamd_pid -eq 0 ]]? (which is what we typically use elsewhere in our scripts AFAIK?)

ap-wtioit · 2026-01-08T06:41:31Z

I was about to write a comment for the _reload_rspamd utils method. When i read Supervisor/supervisor#53 once again and came to Supervisor/supervisor#53 (comment). It seems there is a supervisorctl signal SIGHUP rspamd that should do what we need to reload Rspamd.

Edit: confirmed that for us the tests are working with supervisorctl signal SIGHUP rspamd in both target/bin/rspamd-dkim and target/scripts/check-for-changes.sh

Co-authored-by: Brennan Kinney <[email protected]>

scristian71 · 2026-01-08T23:27:56Z

I have applied the suggested changes. Locally the tests are passing.

georglauterbach

Didn't read the whole conversation, but the changes look sensible.

polarathene

~~I'll apply these~~ (nevermind, not given permission to do so)

In addition to these final changes, please add a Changelog entry just above this section:

docker-mailserver/CHANGELOG.md

Lines 11 to 13 in 0f5763c

    
           - **Internal:** 
        
             - `ENABLE_QUOTAS=1` - When an alias has multiple addresses, the first local mailbox address found will be used for the Dovecot dummy account workaround ([#4581](https://github.com/docker-mailserver/docker-mailserver/pull/4581)) 
        
             - Change Detection service - Added support for responding to updated DMS config (_Rspamd and TLS certificates_) to `ACCOUNT_PROVISIONER=LDAP` ([#4627](https://github.com/docker-mailserver/docker-mailserver/pull/4627))

Copy/paste this:

- **Rspamd:**
  - Configuration changes now trigger a service reload instead of a restart ([#4632](https://github.com/docker-mailserver/docker-mailserver/pull/4632))

Co-authored-by: Brennan Kinney <[email protected]>

polarathene

LGTM 👍 Thanks for taking the time to contribute! ❤️

polarathene

My mistake, once this change is applied we can merge :)

Co-authored-by: Brennan Kinney <[email protected]>

Cristian Spiridon and others added 2 commits January 2, 2026 09:54

fix: Remove duplicated and possibly wrong rspamd restart

4b63a1e

Merge branch 'master' into fix-rspamd-dkim-restart

bbb0c64

polarathene mentioned this pull request Jan 6, 2026

bug report: Rspamd crashing in loop from setup config dkim #4579

Open

1 task

change restart with SIGUP fro rspamd

e7854c0

Copilot AI review requested due to automatic review settings January 6, 2026 21:42

Copilot started reviewing on behalf of scristian71 January 6, 2026 21:43 View session

Copilot AI reviewed Jan 6, 2026

View reviewed changes

ap-wtioit reviewed Jan 7, 2026

View reviewed changes

Add @ap-wtioit suggestions

50d60ed

casperklein reviewed Jan 7, 2026

View reviewed changes

polarathene requested changes Jan 7, 2026

View reviewed changes

polarathene added this to the v16.0.0 milestone Jan 7, 2026

polarathene added area/scripts area/features service/security/dkim-dmarc-spf service/security/rspamd kind/bug/fix A fix (PR) for a confirmed bug labels Jan 7, 2026

scristian71 and others added 4 commits January 8, 2026 23:07

Update target/scripts/check-for-changes.sh

7684592

Co-authored-by: Brennan Kinney <[email protected]>

Update target/bin/rspamd-dkim

0f173d2

Co-authored-by: Brennan Kinney <[email protected]>

Applying @ap-wtioit suggestion for rspamd configuration reload

60d9b95

Add new line at the end of the file

63e443d

polarathene changed the title ~~fix: Remove duplicated and possibly wrong rspamd restart~~ fix: Reload the rspamd service instead of restarting it Jan 9, 2026

polarathene requested a review from casperklein January 9, 2026 02:28

polarathene requested a review from georglauterbach January 9, 2026 02:28

georglauterbach previously approved these changes Jan 9, 2026

View reviewed changes

polarathene requested changes Jan 9, 2026

View reviewed changes

Comment thread target/bin/rspamd-dkim Outdated

Comment thread target/scripts/helpers/utils.sh Outdated

Remove unnecessary rspamd status query

1b4610c

Co-authored-by: Brennan Kinney <[email protected]>

scristian71 dismissed georglauterbach’s stale review via 1b4610c January 10, 2026 17:42

scristian71 and others added 2 commits January 10, 2026 17:42

Update target/scripts/helpers/utils.sh

496f56c

Co-authored-by: Brennan Kinney <[email protected]>

Update CHANGELOG

65e5385

polarathene approved these changes Jan 10, 2026

View reviewed changes

polarathene enabled auto-merge (squash) January 10, 2026 20:25

polarathene disabled auto-merge January 10, 2026 20:25

polarathene requested changes Jan 10, 2026

View reviewed changes

Comment thread target/scripts/helpers/utils.sh Outdated

Update log message to be accepted by linter.

76c793b

Co-authored-by: Brennan Kinney <[email protected]>

casperklein approved these changes Jan 11, 2026

View reviewed changes

casperklein enabled auto-merge (squash) January 11, 2026 17:46

polarathene approved these changes Jan 11, 2026

View reviewed changes

casperklein merged commit 066d314 into docker-mailserver:master Jan 11, 2026
7 checks passed

	pkill --signal SIGHUP $(pgrep --oldest rspamd)
	pkill --signal SIGHUP "$(pgrep --oldest rspamd)"

-    pkill --signal SIGHUP $(pgrep --oldest rspamd)
+    local rspamd_pid
+    rspamd_pid=$(pgrep --oldest rspamd) || rspamd_pid=""
+    if [[ -z ${rspamd_pid} ]]; then
+      _log 'warn' 'Rspamd reload requested, but no rspamd process found'
+    else
+      if ! pkill --signal SIGHUP "${rspamd_pid}"; then
+        _log 'warn' "Failed to send SIGHUP to Rspamd (PID: ${rspamd_pid})"
+      fi
+    fi

		@@ -275,10 +272,8 @@ function _transform_public_key_file_to_dns_record_contents() {
		}

		function _final_steps() {

		rspamd_pid=$(pgrep --oldest rspamd) \|\| rspamd_pid=""
		if [[ -z ${rspamd_pid} ]]; then

		# check status of rspamd after a possible reload (that could have killed rspamd in case of a breaking config)
		supervisorctl status rspamd

	- Internal:
	- `ENABLE_QUOTAS=1` - When an alias has multiple addresses, the first local mailbox address found will be used for the Dovecot dummy account workaround ([#4581](https://github.com/docker-mailserver/docker-mailserver/pull/4581))
	- Change Detection service - Added support for responding to updated DMS config (_Rspamd and TLS certificates_) to `ACCOUNT_PROVISIONER=LDAP` ([#4627](https://github.com/docker-mailserver/docker-mailserver/pull/4627))

Uh oh!

Conversation

scristian71 commented Jan 3, 2026 • edited by polarathene Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Checklist

Uh oh!

polarathene commented Jan 4, 2026

Uh oh!

scristian71 commented Jan 5, 2026

Uh oh!

polarathene commented Jan 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ap-wtioit Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

ap-wtioit left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

scristian71 commented Jan 7, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

polarathene Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

casperklein Jan 8, 2026 • edited by polarathene Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ap-wtioit Jan 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

polarathene commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Notes

Rspamd service management and failure handling

Side-Note - Accidental supervisord reload impact (SIGHUP to PID 1)

Uh oh!

polarathene left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

polarathene Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ap-wtioit commented Jan 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

scristian71 commented Jan 3, 2026 •

edited by polarathene

Loading

polarathene commented Jan 6, 2026 •

edited

Loading

ap-wtioit Jan 7, 2026 •

edited

Loading

polarathene Jan 7, 2026 •

edited

Loading

casperklein Jan 8, 2026 •

edited by polarathene

Loading

ap-wtioit Jan 8, 2026 •

edited

Loading

polarathene commented Jan 7, 2026 •

edited

Loading

Side-Note - Accidental `supervisord` reload impact (`SIGHUP` to PID 1)

polarathene Jan 7, 2026 •

edited

Loading

ap-wtioit commented Jan 8, 2026 •

edited

Loading

polarathene left a comment •

edited

Loading