Skip to content

chore: Remove delay starting the change detection service#3064

Merged
polarathene merged 8 commits intodocker-mailserver:masterfrom
polarathene:chore/changedetection-remove-init-delay
Feb 18, 2023
Merged

chore: Remove delay starting the change detection service#3064
polarathene merged 8 commits intodocker-mailserver:masterfrom
polarathene:chore/changedetection-remove-init-delay

Conversation

@polarathene
Copy link
Copy Markdown
Member

@polarathene polarathene commented Feb 6, 2023

Description

There does not seem to be a need anymore for delaying the service availability by 10 seconds when starting a container.

  • A test needed to be adjusted as Amavis was not ready to receive emails without the extra time (delay) that the change detection service provided. This is not a bug with the change detector :)
  • The change event polling now only replaces the existing checksum when the contents has actually changed (after processing). I don't expect that to cause regressions, but it might reduce some redundant I/O each poll?
  • The change service is now the last daemon started.

AFAIK, with the current start-up scripts the change detector service should not have an issue with conflicting writes:

  • It would be started after all our setup scripts have run (initial checksum created), followed by permission fixes, user-patches and other daemons being started.
  • For first runs without any accounts, this occurs during setup scripts, before the change detector service is running, so again I don't see any race conditions?
  • This was apparently a concern / issue in the past. One concern raised was multiple containers sharing a config volume (such as over a network NFS share), I'm not sure that a fixed sleep would alleviate that though?

History + References

Change Detector start-up flow (preparing initial checksum)

_setup => _prepare_for_change_detection => _monitored_files_checksums >"${CHKSUM_FILE}"

_log 'info' "Welcome to docker-mailserver $(</VERSION)"
_register_functions
_check
_setup
[[ ${LOG_LEVEL} =~ (debug|trace) ]] && print-environment
_apply_fixes
_start_misc
_setup_user_patches
_start_daemons
# marker to check if container was restarted
date >/CONTAINER_START
_log 'info' "${HOSTNAME} is up and running"

function _setup
{
_log 'info' 'Configuring mail server'
for FUNC in "${FUNCS_SETUP[@]}"
do
${FUNC}
done
# All startup modifications to configs should have taken place before calling this:
_prepare_for_change_detection
}

# Global checksum file used to track when monitored files have changed in content:
# shellcheck disable=SC2034
CHKSUM_FILE=/tmp/docker-mailserver-config-chksum
# Once container startup scripts complete, take a snapshot of
# the config state via storing a list of files content hashes.
function _prepare_for_change_detection
{
_log 'debug' 'Setting up configuration checksum file'
_log 'trace' "Creating '${CHKSUM_FILE}'"
_monitored_files_checksums >"${CHKSUM_FILE}"
}

Type of change

  • Improvement (non-breaking change that does improve existing functionality)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • If necessary I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

There should be no concern with conflicts as any writes should have already been done by the time this daemon service is started.
The change event for adding a user can be processed much sooner now, which means Amavis may not yet be ready.

Added extra condition to wait on at least the Amavis port being reachable, and some failure asserts with the mail queue to better catch / debug when this problem occurs.
@polarathene polarathene added area/scripts kind/improvement Improve an existing feature, configuration file or the documentation labels Feb 6, 2023
@polarathene polarathene added this to the v12.0.0 milestone Feb 6, 2023
@polarathene polarathene self-assigned this Feb 6, 2023
@polarathene polarathene changed the title chore: changedetection remove init delay chore: Remove delay starting the change detection service Feb 6, 2023
georglauterbach
georglauterbach previously approved these changes Feb 6, 2023
Copy link
Copy Markdown
Member

@georglauterbach georglauterbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-ran the tests as we were seeing weird issues with smtp_delivery.bats. Looks fine now though. You could do me a favour and run the tests locally a few times so we can be more sure about not having introduced a new race condition here.


_wait_for_smtp_port_in_container
# Even if the Amavis port is reachable at this point, it may still refuse connections?
_wait_for_tcp_port_in_container 10024
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to wait for port 10025 too?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not actually sure when these are used 😅

Perhaps I'm listening on the wrong port since 10024 appears to be for sending and 10025 for receiving? I just know that when I queried the mail queue, sometimes a connection refused error mentioned port 10024.

You could do me a favour and run the tests locally a few times so we can be more sure about not having introduced a new race condition here.

I ran it many times actually and wasn't able to produce it locally. I had this change ready around a month ago, but back then it occurred more often. I had a solution to delay sending mail a little bit with sleep 1:

  # NOTE: If change detection begins too early, it can cause connection issues with Amavis.
  # `sleep 1` can be added before or after to prevent that from happening.
  sleep 1

I am pretty sure that the issue is more to do with Amavis not being ready quick enough, as before with the 10 sec change event delay, there was plenty of time. There might be a small window after the change event with the reloads on amavis and postfix (plus dovecot for delivery I guess).


I'll be getting some rest, but will address this concern when I'm back 👍

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be getting some rest, but will address this concern when I'm back 👍

Alright! I too think this is about a Amavis-timing-issue. I'd add the check for port 10025 and I'm not against a small sleep as mentioned in your comment (with additional info on why it's needed, just like in your comment).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add the check for port 10025

Will this be added or should I approve now?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update it, sorry been a bit busy and since CI is broken presently for tests it seems I should wait until we've resolved that 😅

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add the check for port 10025

Will this be added or should I approve now? 😆

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, I don't believe there needs to be a check for port 10025. AFAIK this is a port on Postfix to receive mail back from Amavis after it's done processing. I am doubtful it'll make a difference when there's a connection error logged about port 10024. The sleep should give Amavis enough time.

@polarathene polarathene enabled auto-merge (squash) February 9, 2023 11:39
@polarathene
Copy link
Copy Markdown
Member Author

@casperklein ready to approve 😅

@georglauterbach
Copy link
Copy Markdown
Member

@casperklein ready to approve 😅

If you're really short on time, just let us briefly know :) Maybe we can then go ahead and merge a few PRs with only reviewer :)

@polarathene polarathene merged commit 1c8a160 into docker-mailserver:master Feb 18, 2023
@georglauterbach
Copy link
Copy Markdown
Member

georglauterbach commented Feb 18, 2023

🤦🏼 I did not see auto-merge was enabled...

The merge happened because I changed the required reviewers for the time being to 1. Rationale: This project is maintained by volunteers, and we cannot ask people to spend time on it when they do not have time for it. On the other hand, we have at least 1 reviewer available at the moment, so I think for the time being, it is safe to merge some PRs with only 1 approval, especially when PRs are waiting for a longer time to get merged (hindering more work to happen ATM). Best case would be to have 2 frequent reviewers again. @casperklein just keep us updated on your current status, I can change the required approvals back at any time :)

I also wanted to ask @polarathene if you're fine with this being merged before actually merging it, but the auto-merge was just too fast.. :D

@polarathene
Copy link
Copy Markdown
Member Author

I changed the required reviewers for the time being to 1. Rationale: This project is maintained by volunteers, and we cannot ask people to spend time on it when they do not have time for it.

On the other hand, we have at least 1 reviewer available at the moment, so I think for the time being, it is safe to merge some PRs with only 1 approval, especially when PRs are waiting for a longer time to get merged (hindering more work to happen ATM).

I think that this is the right thing to do.

We have not been getting new maintainers joining, and many of us are strained for time. I know I've over-committed myself already and need to focus on other tasks outside of DMS.

As much as I'd like to contribute more, it's a huge time sink that's difficult to justify the amount of free time I invest into it 😅

@georglauterbach
Copy link
Copy Markdown
Member

No worries, you have already contributed so much and you added so much value to this project ❤️ I know from experience that maintaing takes a lot of time, absolutely right :D I think what would be most valuable is to still have a few people around for reviewing some of the proposed changes - one does not need to invest heavily and doing changes themselves; reviewing is very important in the end :)

@casperklein
Copy link
Copy Markdown
Member

@casperklein just keep us updated on your current status

I am still here. I just had some two extreme busy weeks and had no time for reviews. If PRs are not related to the current test improvements, I'll review them in more depth. So I don't approve them in hurry to get them merged asap (without reviewing them properly). I will take a look at the merged PRs, in case something important was missed.

@georglauterbach
Copy link
Copy Markdown
Member

@casperklein just keep us updated on your current status

I am still here. I just had some two extreme busy weeks and had no time for reviews.

I hope I did not offend you in any way :) I am going to be on vacation and I'll be very busy the next weeks, so I wanted to get some PRs out of the way, to have peace of mind during my vacation :D

If PRs are not related to the current test improvements, I'll review them in more depth. So I don't approve them in hurry to get them merged asap (without reviewing them properly). I will take a look at the merged PRs, in case something important was missed.

👍🏼

@casperklein
Copy link
Copy Markdown
Member

I hope I did not offend you in any way :)

No you didn't. In march I'll be more active again.

@polarathene polarathene mentioned this pull request Feb 23, 2023
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/scripts kind/improvement Improve an existing feature, configuration file or the documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants