Description
Miscellaneous first checks
- I checked that all ports are open and not blocked by my ISP / hosting provider.
- I know that SSL errors are likely the result of a wrong setup on the user side and not caused by DMS itself. I'm confident my setup is correct.
Affected Component(s)
postsrsd
What happened and when does this occur?
Another user reported this same issue to upstream postsrsd earlier in the year (specifically with DMS).
I noticed this while working on a PR and debugging local test failures. Why the tests do not fail in our CI is unclear (might be due to older environment/packages in CI?):
✗ checking configuration: hostname/domainname override: check headers of received mail [235]
(from function `assert_success' in file test/test_helper/bats-assert/src/assert.bash, line 114,
in test file test/mail_hostname.bats, line 104)
`assert_success' failed
-- command failed --
status : 1
output : ls: cannot access '/var/mail/localhost.localdomain/user1/new': No such file or directory
--
✗ checking configuration: non-subdomain: check headers of received mail [308]
(from function `assert_success' in file test/test_helper/bats-assert/src/assert.bash, line 114,
in test file test/mail_hostname.bats, line 163)
`assert_success' failed
-- command failed --
status : 1
output : ls: cannot access '/var/mail/localhost.localdomain/user1/new': No such file or directory
--
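For reference, a minimal sketch of how I re-run just the failing file locally (assuming bats-core is on the PATH and the image the test expects has already been built; the helper libraries are vendored under test/test_helper, as the assertion paths above show):
$ bats --filter 'check headers of received mail' test/mail_hostname.bats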
I had a process monitor active in the background and noticed that each of the four containers spun up by that test was using 25% CPU for its postsrsd process.
When starting a single container in my VM (2 vCPU), postsrsd consistently showed 50% CPU usage, despite having no mail to process.
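For the record, a quick sketch of how I watched this (the container name postsrsd_cpu is an assumption matching the repro command further below, and this assumes a single container is running):
$ docker stats --no-stream postsrsd_cpu     # per-container CPU, from the host
$ top -p "$(pgrep -x postsrsd)"             # per-process view, host PID namespace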
Ports 10001 and 10002 are not yet bound on localhost:
$ ss -tulpn | grep LISTEN
tcp LISTEN 0 4096 127.0.0.1:10024 0.0.0.0:*
tcp LISTEN 0 100 127.0.0.1:10025 0.0.0.0:*
tcp LISTEN 0 100 0.0.0.0:587 0.0.0.0:*
tcp LISTEN 0 100 0.0.0.0:143 0.0.0.0:* users:(("dovecot",pid=642,fd=32))
tcp LISTEN 0 100 0.0.0.0:465 0.0.0.0:*
tcp LISTEN 0 100 0.0.0.0:65265 0.0.0.0:* users:(("dovecot",pid=642,fd=54))
tcp LISTEN 0 100 0.0.0.0:25 0.0.0.0:*
tcp LISTEN 0 4096 127.0.0.1:8891 0.0.0.0:*
tcp LISTEN 0 4096 127.0.0.1:8893 0.0.0.0:*
tcp LISTEN 0 100 [::]:587 [::]:*
tcp LISTEN 0 100 [::]:143 [::]:* users:(("dovecot",pid=642,fd=33))
tcp LISTEN 0 100 [::]:465 [::]:*
tcp LISTEN 0 100 [::]:65265 [::]:* users:(("dovecot",pid=642,fd=55))
tcp LISTEN 0 100 [::]:25 [::]:*
The postsrsd process is running:
$ ps aux | grep postsrsd
root 591 0.0 0.0 4864 3872 pts/0 S 02:37 0:00 /bin/bash /usr/local/bin/postsrsd-wrapper.sh
root 623 0.0 0.0 2420 1592 pts/0 S 02:37 0:00 /bin/sh /etc/init.d/postsrsd start
root 774 98.2 0.0 2332 708 pts/0 R 02:37 3:01 /usr/sbin/postsrsd -f 10001 -r 10002 -d example.test -s /etc/postsrsd.secret -a = -n 4 -N 4 -u postsrsd -p /var/run/postsrsd.pid -c /var/lib/postsrsd -l 127.0.0.1 -D -X
However, after roughly 10 minutes of running like this, the heavy postsrsd CPU activity ceases and the ports do show as bound. It's unclear why it takes this long.
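To check from inside the container whether the service ports have come up yet (a sketch; same container name as in the repro command below):
$ docker exec postsrsd_cpu sh -c "ss -tln | grep -E ':1000[12]' || echo 'SRS ports not bound yet'"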
What did you expect to happen?
postsrsd should not be consuming this much CPU, nor taking about 10 minutes to finish starting up. Is it possibly stuck in this upstream while loop?
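A sketch for confirming what the busy process is actually spending its time on, in case that helps pin down the loop (assumes strace is available on the host; the PID is the host-side one):
$ pid="$(pgrep -x postsrsd)"        # host-side PID of the container's postsrsd
$ sudo cat "/proc/${pid}/syscall"   # the syscall it is currently in, if any
$ sudo strace -c -f -p "${pid}"     # attach briefly, then Ctrl+C for a syscall summary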
How do we replicate the issue?
- The next step uses the config dir of the failed test container, but it should reproduce with a minimal config containing postfix-accounts.cf from the make generate-accounts command.
- Run:
  docker run --rm -it \
    --volume "${PWD}/test/duplicate_configs/mail_override_hostname:/tmp/docker-mailserver" \
    --volume "${PWD}/test/test-files:/tmp/docker-mailserver-test:ro" \
    -e ENABLE_SRS=1 \
    -h 'example.com' \
    --name 'postsrsd_cpu' \
    "mailserver/docker-mailserver:edge"
- Monitor the postsrsd process; it should show high CPU activity (this can be observed on the host and does not require running anything within the container).
- Run docker exec -it postsrsd_cpu bash in a separate terminal, and then try to send a mail: sendmail -f '[email protected]' '[email protected]' < /tmp/docker-mailserver-test/email-templates/existing-user1.txt
- DMS will respond to it but fail, as it can't reach the PostSRSD service on localhost:10001; the service is running (ps aux | grep postsrsd) but the ports are not yet bound (ss -tulpn | grep LISTEN), and our logs appear to be useless (cat /var/log/supervisor/postsrsd.log only has a line about the service being started, with nothing added once it quiets down 10 minutes later).
- Monitor the postsrsd process; it should cease the heavy workload after a while, with the ports for the service finally bound. Mail can then be sent to the local user account as before and actually be received (ls /var/mail/localhost.localdomain/user1). A rough way to time this delay is sketched right after this list.
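To put a number on the delay, a rough sketch that times how long port 10001 takes to come up after starting the container (same container name as above):
$ time docker exec postsrsd_cpu sh -c "until ss -tln | grep -q ':10001'; do sleep 5; done"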
DMS version
v11.1
What operating system is DMS running on?
Linux
What instruction set architecture is DMS running on?
x86_64 / AMD64
What container orchestration tool are you using?
Docker
Docker CLI command
docker run --rm -it \
--volume "${PWD}/test/duplicate_configs/mail_override_hostname:/tmp/docker-mailserver" \
--volume "${PWD}/test/test-files:/tmp/docker-mailserver-test:ro" \
-e ENABLE_SRS=1 \
-h 'example.com' \
--name 'postsrsd_cpu' \
'mailserver/docker-mailserver:edge'
Relevant log output
Nothing relevant was logged, other than the failure to handle the mail due to the service not yet being up.
Other relevant information
I suspect the environment may have something to do with it.
Environment is:
- VMWare Workstation Player 16.3 on a Linux 5.17 x86_64 host.
- VM guest is Fedora 36 (5.18.9 kernel), Docker version is 20.10.17.
- CPU (2vCPU) under 50% load reported in the guest OS, 4GB RAM + 4GB swap (ZRAM), 2GB of each free.
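If the environment is the differentiator, one setting that might be worth comparing between this VM and CI is the file descriptor limit Docker hands to containers; this is purely an assumption on my part, prompted by the startup-loop suspicion above:
$ ulimit -n                                    # on the VM guest
$ docker exec postsrsd_cpu sh -c 'ulimit -n'   # inside the running container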
Network details, in case something is going wrong with the binding step:
$ ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
1511: eth0@if1512: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.17.0.3/16 brd 172.17.255.255 scope global eth0
valid_lft forever preferred_lft forever
Code of conduct
- I have read this project's Code of Conduct and I agree
- I have read the README and the documentation and I searched the issue tracker but could not find a solution
Improvements to this form?
Markdown syntax gets eaten when some inputs are wrapped in code fences upon submission, while other inputs assume a format such as yml (for sharing docker-compose.yml config) despite the form saying you can also provide a docker CLI command instead.