chore: Simplify `compose.yaml` healthcheck by polarathene · Pull Request #4498 · docker-mailserver/docker-mailserver

polarathene · 2025-06-01T07:39:34Z

Description

The current example healthcheck `ss --listening --tcp | grep -P 'LISTEN.+:smtp' || exit 1` can be simplified to `nc -z localhost 25` (our tests use this via a helper to wait for Postfix to be ready). It effectively performs the same check (that a service is listening on the port), but without the extra output or the need to set an exit status (both of which could also be resolved with `grep -q`).

The `nc` command is a bit terser and more direct though. Port `25` can alternatively be replaced with the service name `smtp` (`nc -z localhost smtp`), just as the `ss` output displays it (resolved via `grep smtp /etc/services`), but I figured most would be more familiar with the port number itself.
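
For illustration, the name-to-port mapping mentioned above can be sketched as a small lookup over services(5)-formatted text. The helper name `service_port` and the sample line are assumptions made for this example (the sample mimics a typical `/etc/services` entry); neither is part of this PR.

```shell
# Hypothetical helper: print the TCP port registered for a service name,
# reading services(5)-formatted text (e.g. /etc/services) on stdin.
service_port() {
  awk -v name="$1" '$1 == name && $2 ~ /\/tcp$/ { sub(/\/tcp$/, "", $2); print $2; exit }'
}

# Assumed sample line in the usual /etc/services layout.
printf 'smtp\t\t25/tcp\t\tmail\n' | service_port smtp   # prints: 25
```

On a real system the same helper could presumably be fed the actual file instead: `service_port smtp < /etc/services`.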

The `nc` check also has the benefit of less noisy healthcheck logs, which currently look like this every 30 secs:

$ docker inspect --format='{{json .State.Health}}' dms | jq

{
  "Status": "healthy",
  "FailingStreak": 0,
  "Log": [
    {
      "Start": "2025-06-01T07:16:35.79643154Z",
      "End": "2025-06-01T07:16:35.842821137Z",
      "ExitCode": 0,
      "Output": "LISTEN 0      100          0.0.0.0:smtp             0.0.0.0:*          \nLISTEN 0      100             [::]:smtp                [::]:*          \n"
    },
    {
      "Start": "2025-06-01T07:17:05.843928898Z",
      "End": "2025-06-01T07:17:05.891460075Z",
      "ExitCode": 0,
      "Output": "LISTEN 0      100          0.0.0.0:smtp             0.0.0.0:*          \nLISTEN 0      100             [::]:smtp                [::]:*          \n"
    },
    {
      "Start": "2025-06-01T07:17:35.892520906Z",
      "End": "2025-06-01T07:17:35.938991775Z",
      "ExitCode": 0,
      "Output": "LISTEN 0      100          0.0.0.0:smtp             0.0.0.0:*          \nLISTEN 0      100             [::]:smtp                [::]:*          \n"
    },
    {
      "Start": "2025-06-01T07:18:05.939841294Z",
      "End": "2025-06-01T07:18:06.003672308Z",
      "ExitCode": 0,
      "Output": "LISTEN 0      100          0.0.0.0:smtp             0.0.0.0:*          \nLISTEN 0      100             [::]:smtp                [::]:*          \n"
    },
    {
      "Start": "2025-06-01T07:18:36.004749936Z",
      "End": "2025-06-01T07:18:36.048725208Z",
      "ExitCode": 0,
      "Output": "LISTEN 0      100          0.0.0.0:smtp             0.0.0.0:*          \nLISTEN 0      100             [::]:smtp                [::]:*          \n"
    }
  ]
}

With nc instead:

$ docker inspect --format='{{json .State.Health}}' dms | jq

{
  "Status": "healthy",
  "FailingStreak": 0,
  "Log": [
    {
      "Start": "2025-06-01T07:20:00.464604128Z",
      "End": "2025-06-01T07:20:00.516787463Z",
      "ExitCode": 0,
      "Output": ""
    },
    {
      "Start": "2025-06-01T07:20:30.517462343Z",
      "End": "2025-06-01T07:20:30.578389769Z",
      "ExitCode": 0,
      "Output": ""
    },
    {
      "Start": "2025-06-01T07:21:00.579295302Z",
      "End": "2025-06-01T07:21:00.621559922Z",
      "ExitCode": 0,
      "Output": ""
    },
    {
      "Start": "2025-06-01T07:21:30.622531376Z",
      "End": "2025-06-01T07:21:30.67123444Z",
      "ExitCode": 0,
      "Output": ""
    }
  ]
}

Type of change

Improvement (non-breaking change that does improve existing functionality)

casperklein · 2025-06-01T10:57:45Z

[..] or need to set an exit status

Curl uses a lot of different return codes. The healthcheck however expects just 0 or 1. || exit 1 converts return codes > 1 to 1.

I don't like the new approach, because it pollutes the mail.log every 30 seconds with something like:

Jun  1 12:45:32 mail postfix/postscreen[18492]: CONNECT from [127.0.0.1]:46020 to [127.0.0.1]:25
Jun  1 12:45:32 mail postfix/postscreen[18492]: ALLOWLISTED [127.0.0.1]:46020
Jun  1 12:45:32 mail postfix/smtpd[18493]: connect from localhost[127.0.0.1]
Jun  1 12:45:32 mail opendmarc[2397]: ignoring connection from localhost
Jun  1 12:45:32 mail postfix/smtpd[18493]: lost connection after CONNECT from localhost[127.0.0.1]
Jun  1 12:45:32 mail postfix/smtpd[18493]: disconnect from localhost[127.0.0.1] commands=0/0

This might also introduce problems with fail2ban. IIRC, when using mode ddos or aggressive, it looks for "connect" / "disconnect" patterns.

Edit:

To achieve a cleaner output, we could use: ss --ipv4 --listening --tcp --numeric | grep -o '0.0.0.0:25', which only returns 0.0.0.0:25
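
A minimal sketch of what that pattern does, run against canned text rather than a live `ss` call. Both sample lines are assumptions modeled on the `ss` output shown earlier in this thread, with numeric ports as `--numeric` would print them. The last command is a stricter variant added for comparison only, not something proposed in this PR.

```shell
# Assumed sample of `ss --ipv4 --listening --tcp --numeric` output.
sample='LISTEN 0      100          0.0.0.0:25             0.0.0.0:*'

# -o prints only the matched text, so the healthcheck log stores one short token.
printf '%s\n' "$sample" | grep -o '0.0.0.0:25'      # prints: 0.0.0.0:25

# The pattern is an unanchored regex, so a listener on a port that merely
# starts with 25 (assumed sample) is reported the same way.
other='LISTEN 0      100          0.0.0.0:2525           0.0.0.0:*'
printf '%s\n' "$other" | grep -o '0.0.0.0:25'       # prints: 0.0.0.0:25

# Fixed-string (-F), whole-word (-w) matching rejects that second case.
printf '%s\n' "$other" | grep -owF '0.0.0.0:25' || echo "no listener on port 25"
```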

polarathene · 2025-06-01T21:46:38Z

Curl uses a lot of different return codes. The healthcheck however expects just 0 or 1. || exit 1 converts return codes > 1 to 1.

Where is curl involved here? Isn't the exit status coming from the grep command otherwise?

I don't like the new approach, because it pollutes the mail.log every 30 seconds with something like:

I was not aware of that and you raise very valid points, thanks for pointing that out! 🙏

To achieve a cleaner output, we could use: ss --ipv4 --listening --tcp --numeric | grep -o '0.0.0.0:25', which only returns 0.0.0.0:25

Is there value in that output being stored repeatedly in the healthcheck log?

Shouldn't we just use grep -q for 0/1 exit status? As per the grep docs:

Exit Status

Normally, the exit status is 0 if selected lines are found and 1 otherwise.
But the exit status is 2 if an error occurred, unless the -q or --quiet or --silent option is used and a selected line is found.
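
The three outcomes described in that excerpt can be reproduced with a short sketch, assuming GNU grep. The sample line is copied from the healthcheck log earlier in this thread; the `/nonexistent/ss-output.txt` path is a made-up name used only to provoke an error.

```shell
# Sample line copied from the healthcheck log output earlier in this thread.
sample='LISTEN 0      100          0.0.0.0:smtp             0.0.0.0:*'

status=0
printf '%s\n' "$sample" | grep -q ':smtp' || status=$?
echo "match:    $status"   # match:    0

status=0
printf '%s\n' "$sample" | grep -q ':imap' || status=$?
echo "no match: $status"   # no match: 1

# An unreadable input file is an error. -q does not change the status here
# (no line was selected), and -s only suppresses the error message.
status=0
grep -qs ':smtp' /nonexistent/ss-output.txt || status=$?
echo "error:    $status"   # error:    2
```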

polarathene

Verified this check with the healthcheck adjusted to interval: 1s:

$ docker inspect --format='{{json .State.Health}}' dms | jq

{
  "Status": "healthy",
  "FailingStreak": 0,
  "Log": [
    {
      "Start": "2025-06-01T21:57:33.747428151Z",
      "End": "2025-06-01T21:57:33.827584046Z",
      "ExitCode": 1,
      "Output": ""
    },
    {
      "Start": "2025-06-01T21:57:34.828241261Z",
      "End": "2025-06-01T21:57:34.919379591Z",
      "ExitCode": 1,
      "Output": ""
    },
    {
      "Start": "2025-06-01T21:57:35.919784141Z",
      "End": "2025-06-01T21:57:35.965090221Z",
      "ExitCode": 0,
      "Output": ""
    },
    {
      "Start": "2025-06-01T21:57:36.965467119Z",
      "End": "2025-06-01T21:57:37.012927453Z",
      "ExitCode": 0,
      "Output": ""
    },
    {
      "Start": "2025-06-01T21:57:38.013727139Z",
      "End": "2025-06-01T21:57:38.059783409Z",
      "ExitCode": 0,
      "Output": ""
    }
  ]
}

NOTE: Only the last 5 checks are stored in the log output, hence the reduced interval to confirm.

casperklein · 2025-06-02T00:34:22Z

Where is curl involved here? Isn't the exit status coming from the grep command otherwise?

My bad. I was looking at another healthcheck example while writing. It was however just to explain the || exit 1 usage / best-practice.

Shouldn't we just use grep -q for 0/1 exit status?

That will work. Only disadvantage: when grep fails (return code 2) for whatever reason, it will happen silently.

casperklein · 2025-06-02T00:36:32Z

    #   - NET_ADMIN
    healthcheck:
-      test: "ss --listening --tcp | grep -P 'LISTEN.+:smtp' || exit 1"
+      test: "ss --listening --tcp | grep --silent ':smtp'"


AFAIK our Postfix listens only on IPv4. Therefore we could make it more explicit with:

Suggested change

-      test: "ss --listening --tcp | grep --silent ':smtp'"
+      test: "ss --listening --ipv4 --tcp | grep --silent ':smtp'"

polarathene · Jun 2, 2025

I'm not sure if we should? If you had an IPv6 only container (I think this is possible in Docker now at least), the healthcheck could be agnostic to that.

DMS itself though does have a fair bit of 127.0.0.1 hard-coded I think instead of localhost, so it may not support IPv6 only out of the box 😅

If there is a good reason to specifically filter to IPv4 only though, we could do that. But I don't think there are any issues with also detecting listening on IPv6?
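
As a rough sketch of the point about staying agnostic: the `':smtp'` pattern itself matches listeners of either address family, so any IPv4-only restriction comes from the `--ipv4` flag passed to `ss`, not from `grep`. The two lines below are copied from the healthcheck log output at the top of this thread.

```shell
# Both listener lines as recorded in the healthcheck log earlier in this thread.
ipv4='LISTEN 0      100          0.0.0.0:smtp             0.0.0.0:*'
ipv6='LISTEN 0      100             [::]:smtp                [::]:*'

# The same grep test passes for either address family.
for line in "$ipv4" "$ipv6"; do
  if printf '%s\n' "$line" | grep --silent ':smtp'; then
    echo "healthy"
  else
    echo "unhealthy"
  fi
done
```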

polarathene · 2025-06-02T04:55:33Z

Shouldn't we just use grep -q for 0/1 exit status?

That will work. Only disadvantage: when grep fails (return code 2) for whatever reason, it will happen silently.

I assume the exit status of 2 becomes 1 in that case, but I'm not sure how to verify that.

If grep is given an invalid arg or invalid filepath to search you still get an exit status of 2, despite --quiet / --silent. Presumably a valid command can error and instead of 2 you'd get 1?

I suppose we could keep || exit 1 tacked on as a precaution? 🤷‍♂️

https://docs.docker.com/reference/dockerfile/#healthcheck

The command's exit status indicates the health status of the container.

The possible values are:

0: success - the container is healthy and ready for use

1: unhealthy - the container isn't working correctly

2: reserved - don't use this exit code
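
A minimal sketch of what the `|| exit 1` suffix guards against, given the reserved status quoted above. The wrapper name `as_healthcheck` is hypothetical and exists only for this illustration; `sh -c 'exit 2'` stands in for a command that fails with a status other than 1.

```shell
# Hypothetical wrapper: run a command and collapse any failure status to 1,
# mirroring what appending `|| exit 1` does in the healthcheck test string.
as_healthcheck() {
  # Subshell so that `exit 1` ends only the check, not the calling shell.
  ( "$@" || exit 1 )
}

status=0
as_healthcheck sh -c 'exit 2' || status=$?
echo "failing command: $status"     # failing command: 1

status=0
as_healthcheck true || status=$?
echo "succeeding command: $status"  # succeeding command: 0
```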

polarathene

Adding `--ipv4` as suggested by @casperklein. Should someone bring up an IPv6-only DMS, we can discuss dropping it then 🤔

Adding back `|| exit 1` for broader compatibility. Some errors encountered by grep still return 2 despite the `--silent` / `--quiet` option being set.

chore: Simplify compose.yaml healthcheck

a76b065

polarathene added this to the v15.1.0 milestone Jun 1, 2025

polarathene requested review from casperklein and georglauterbach June 1, 2025 07:39

polarathene self-assigned this Jun 1, 2025

polarathene added the kind/improvement Improve an existing feature, configuration file or the documentation label Jun 1, 2025

tests: Update healthcheck test

a0cb91d

georglauterbach previously approved these changes Jun 1, 2025

polarathene commented Jun 1, 2025

Comment thread compose.yaml Outdated

Comment thread test/tests/serial/tests.bats Outdated

Apply suggestions from code review

7b37166

polarathene dismissed georglauterbach’s stale review via 7b37166 June 1, 2025 22:00

polarathene requested a review from georglauterbach June 1, 2025 22:05

casperklein reviewed Jun 2, 2025

polarathene commented Jun 2, 2025

Comment thread compose.yaml Outdated

Comment thread test/tests/serial/tests.bats Outdated

Apply suggestions from code review

d8ee526

georglauterbach approved these changes Jun 2, 2025

Merge branch 'master' into chore/simplify-example-healthcheck

23abd92

polarathene enabled auto-merge (squash) June 2, 2025 07:27

polarathene merged commit 3c193a1 into master Jun 2, 2025
2 checks passed

polarathene deleted the chore/simplify-example-healthcheck branch June 2, 2025 07:27

chore: Simplify `compose.yaml` healthcheck #4498
polarathene merged 5 commits into master from chore/simplify-example-healthcheck
