Skip to content

fix: Ensure main log file is tailed from the start#4146

Merged
polarathene merged 7 commits intomasterfrom
fix/logs-tail-from-start
Aug 3, 2024
Merged

fix: Ensure main log file is tailed from the start#4146
polarathene merged 7 commits intomasterfrom
fix/logs-tail-from-start

Conversation

@polarathene
Copy link
Copy Markdown
Member

@polarathene polarathene commented Jul 31, 2024

Description

I noticed a difference in log output when troubleshooting a recent issue. Depending on if ENABLE_AMAVIS was on/off, earlier logs in this file did not appear. The touch doesn't seem relevant as logs from starting up daemons were already present, but in this case only slightly delayed amavis logs were displayed.

Syntax of the tail command was slightly off, seems to require a + in front of the number given to -n 👍

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change that does improve existing functionality)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • New and existing unit tests pass locally with my changes
  • I have added information about changes made in this PR to CHANGELOG.md

@polarathene polarathene added area/scripts kind/bug/fix A fix (PR) for a confirmed bug labels Jul 31, 2024
@polarathene polarathene added this to the v14.1.0 milestone Jul 31, 2024
@polarathene polarathene self-assigned this Jul 31, 2024
@casperklein
Copy link
Copy Markdown
Member

Syntax of the tail command was slightly off, seems to require a + in front of the number given to -n 👍

I think you misunderstood the intention of "-F -n 0". It means, follow/display the file, start from the very end. So essentially, only display log lines, that are for the current session.

-n +0 / -n +1 means, display the file from line 0/1. IMO, the latter is more logical.

Your change will display the whole log file on each DMS startup. Depending on logrotation interval and amount of log file lines, this can be a lot and may take some time to process.

With our default LOGROTATE_INTERVAL=weekly, up to one week of mail.log entries would be displayed on startup.

I see no value in displaying older log entrys than the current DMS session.

but in this case only slightly delayed amavis logs were displayed.

That sounds like a race condition, when new log lines from the current sessions are written, while tail haven't been executed yet.

We could check, what the line count of mail.log is, before starting the daemons. And then using that value for tail:

TAIL_START=$(wc -l < mail.log)

tail -F -n +$TAIL_START

(not tested).

@polarathene
Copy link
Copy Markdown
Member Author

Your change will display the whole log file on each DMS startup. Depending on logrotation interval and amount of log file lines, this can be a lot and may take some time to process.

Thanks for pointing this out. My main concern was that log lines were not being output (an error) as they were racey from daemons starting prior to the tail.

but in this case only slightly delayed amavis logs were displayed.

That sounds like a race condition, when new log lines from the current sessions are written, while tail haven't been executed yet.

We could check, what the line count of mail.log is, before starting the daemons. And then using that value for tail

Yeah, the missing log lines were about 0.5-1 second offset from the displayed Amavis ones. Some would show when Amavis was disabled but not the OpenDKIM/OpenDMARC IIRC (these were the earliest log lines).

Your suggestion sounds fine 👍 I'll verify and update the PR tomorrow with that.

Copy link
Copy Markdown
Member

@georglauterbach georglauterbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With @casperklein's explanation, I now understand the issue, and I agree with him. I'll block on this via review so that it does not get merged by accident.

@polarathene
Copy link
Copy Markdown
Member Author

I'll open a separate issue for the bug below.

Resolved - Cause is not from this PR

After verifying @casperklein advice as working well, I noticed that if I force stop the container (docker compose up for the container restart, followed by CTRL+C twice after container restart startup was successful_), there was a loop of error spam to /var/log/mail.log and /var/log/syslog, with lines like:

2024-08-02T00:13:59.113778+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)

I'm not sure exactly why the changes I've done here make that happen? (EDIT: Not due to this PR, see update at tend of message)

Without them I only get that line logged once (internally since prior to this PR it's too early to output). It's a bit racey, and I know why it's a problem, but I just don't understand why the change here was introducing the crash loop 🤔

After the standard Amavis log lines, it's just this over and over:

dms-1  | 2024-08-02 00:14:06,365 WARN exited: dovecot (exit status 89; not expected)
dms-1  | 2024-08-02 00:14:06,387 WARN exited: dovecot (exit status 89; not expected)
dms-1  | 2024-08-02 00:14:06,409 WARN exited: dovecot (exit status 89; not expected)
dms-1  | 2024-08-02 00:14:06,433 WARN exited: dovecot (exit status 89; not expected)
dms-1  | 2024-08-02T00:14:06.129730+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:06.151205+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:06.171097+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:06.192423+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:06.213524+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:06.234588+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:06.255106+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:06.277583+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:06.299474+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:06.322426+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:06.343959+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:06.365593+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:06.386614+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:06.408952+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:06.432847+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02 00:14:07,133 WARN exited: dovecot (exit status 89; not expected)
dms-1  | 2024-08-02 00:14:07,160 WARN exited: dovecot (exit status 89; not expected)
dms-1  | 2024-08-02 00:14:07,184 WARN exited: dovecot (exit status 89; not expected)
dms-1  | 2024-08-02 00:14:07,210 WARN exited: dovecot (exit status 89; not expected)
dms-1  | 2024-08-02 00:14:07,237 WARN exited: dovecot (exit status 89; not expected)
dms-1  | 2024-08-02T00:14:07.132721+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:07.160243+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:07.183814+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:07.210406+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02T00:14:07.237414+00:00 mail dovecot: master: Fatal: Dovecot is already running with PID 133 (read from /run/dovecot/master.pid)
dms-1  | 2024-08-02 00:14:08,132 WARN exited: dovecot (exit status 89; not expected)
dms-1  | 2024-08-02 00:14:08,152 WARN exited: dovecot (exit status 89; not expected)
dms-1  | 2024-08-02 00:14:08,174 WARN exited: dovecot (exit status 89; not expected)
dms-1  | 2024-08-02 00:14:08,198 WARN exited: dovecot (exit status 89; not expected)
dms-1  | 2024-08-02 00:14:08,220 WARN exited: dovecot (exit status 89; not expected)
dms-1  | 2024-08-02 00:14:08,244 WARN exited: dovecot (exit status 89; not expected)
dms-1  | 2024-08-02 00:14:08,266 WARN exited: dovecot (exit status 89; not expected)

That'd be fixed if we ensure PID is cleaned up for container restarts too:

function _setup_apply_fixes_after_configuration() {
_log 'trace' 'Removing leftover PID files from a stop/start'
find /var/run/ -not -name 'supervisord.pid' -name '*.pid' -delete
touch /dev/shm/supervisor.sock
_log 'debug' 'Checking /var/mail permissions'
if ! _chown_var_mail_if_necessary; then
_dms_panic__general 'Failed to fix /var/mail permissions'
fi
_log 'debug' 'Removing files and directories from older versions'
rm -rf /var/mail-state/spool-postfix/{dev,etc,lib,pid,usr,private/auth}
_rspamd_get_envs
# /tmp/docker-mailserver/rspamd/dkim
if [[ -d ${RSPAMD_DMS_DKIM_D} ]]; then
_log 'debug' "Ensuring '${RSPAMD_DMS_DKIM_D}' is owned by '_rspamd:_rspamd'"
chown -R _rspamd:_rspamd "${RSPAMD_DMS_DKIM_D}"
fi
}

_register_setup_function '_setup_apply_fixes_after_configuration'

It's not really a blocker for this PR, some change here just better surfaced the regression introduced with v14 for the container restart skip feature. I'd still like to know what this PR does to cause the loop though 🤔

EDIT: My :edge image was built from 9175424 (about 4 weeks old), I was testing these changes with a volume mount to replace the file with a master copy modified, so that might have something to do with it..

UPDATE: Not this PR 😅 I could reproduce without the volume mount with both the old :edge and current :edge images, so definitely a race condition from unclean container exit that is not handled properly from v14 with #3929

For reference these are the SHA256 image digests:

# Either DockerHub or GHCR registries are fine, same digest for both.
# GHCR is easier to browse by digest, DockerHub only seems to list current digest for the published tag.
# https://github.com/docker-mailserver/docker-mailserver/pkgs/container/docker-mailserver/versions

# Current `:edge`
# Commit: https://github.com/docker-mailserver/docker-mailserver/commit/2f8ad142eccfe3d6291373629c5b1dbf89b71f52
ghcr.io/docker-mailserver/docker-mailserver@sha256:22bdbd1288a42ed7fe9e304cc0c99b022933ece73658d6b5b7f183387b58e226

# The older `:edge` image I had:
# Commit: https://github.com/docker-mailserver/docker-mailserver/commit/9175424d0f3c67622c030311cc720d7477f1c2b6
ghcr.io/docker-mailserver/docker-mailserver@sha256:e9c54948daed0f1d9d08872ea37e9cbd2a4446da094778b8a4e096c73db869a5

Comment thread target/scripts/start-mailserver.sh Outdated
0 or 1 is equivalent for a fresh file, but when log lines already exist, you want to start from the line after the last one, not at it. Thus always increment the count by 1.
Comment thread CHANGELOG.md Outdated
@polarathene
Copy link
Copy Markdown
Member Author

Verified locally. Should be sorted now 👍

georglauterbach
georglauterbach previously approved these changes Aug 2, 2024
casperklein
casperklein previously approved these changes Aug 2, 2024
Comment thread target/scripts/start-mailserver.sh
Comment thread target/scripts/start-mailserver.sh Outdated
@polarathene polarathene dismissed stale reviews from casperklein and georglauterbach via e350170 August 2, 2024 20:48
@polarathene polarathene merged commit 526fd64 into master Aug 3, 2024
@polarathene polarathene deleted the fix/logs-tail-from-start branch August 3, 2024 00:04
@georglauterbach georglauterbach removed this from the v14.1.0 milestone Sep 26, 2024
@georglauterbach georglauterbach added this to the v15.0.0 milestone Sep 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/scripts kind/bug/fix A fix (PR) for a confirmed bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants