check-for-changes.sh performance concerns on NFS #2098

@NorseGaud

Subject

#2096 sparked my interest in this. The addmailuser script seems to be a bit slow for some users (NFS mostly):

> I notice that when calling addmailuser it becomes very slow when number of mailboxes increases.
>
> I did not notice slowness too much when using non-NFS e.g. local HDD or AWS EBS storage
>
> Now I am using NFS (actually AWS EFS) storage since deploying as docker in AWS ECS.
>
> For example I have 2500 mailboxes and a call to addmailuser takes 60 seconds.

Ok, so this is understandable due to the following code I added in one of my last PRs:

# Prevent infinite loop in tests like "checking accounts: user3 should have been added to /tmp/docker-mailserver/postfix-accounts.cf even when that file does not exist"
if [[ -e "/tmp/docker-mailserver-config-chksum" ]]
then
  while [[ ! -d "/var/mail/${DOMAIN}/${USER}" ]]
  do
    echo "Waiting for dovecot to create /var/mail/${DOMAIN}/${USER}..."
    sleep 1
  done
fi

This code assists with NFS or other slow volumes in use by the mail server. We could debate why slow volumes are being used, but I'll save that for another ticket :)
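As written, the loop above waits indefinitely if the directory never appears. A minimal sketch of a bounded variant follows; the function name wait_for_dir, its arguments, and the 60-second default are assumptions made for illustration and are not part of docker-mailserver:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of docker-mailserver): wait until a directory
# exists, but give up once a timeout (in seconds) has elapsed.
wait_for_dir() {
  local dir="$1"
  local timeout="${2:-60}" # assumed default, chosen only for illustration
  local waited=0

  while [[ ! -d "${dir}" ]]
  do
    if (( waited >= timeout ))
    then
      echo "Gave up waiting for ${dir} after ${timeout}s" >&2
      return 1
    fi
    echo "Waiting for ${dir}..."
    sleep 1
    waited=$(( waited + 1 ))
  done
  return 0
}
```

In addmailuser this could be called as wait_for_dir "/var/mail/${DOMAIN}/${USER}" 120, so that a slow or stuck check-for-changes.sh run produces an error instead of hanging the caller.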

So, with all of this said, slow volumes don't actually seem to be the root cause. The root problem seems to be that check-for-changes.sh takes far too long to apply its updates. I therefore did some digging in a forked branch: https://github.com/NorseGaud/docker-mailserver/commits/check-for-changes-performance

Here are the results while running a modified check-for-changes.sh that outputs run time and STDOUT/ERR:


  1. I added emails one by one using for n in {1..10}; do ./usr/local/bin/addmailuser test${n}@pierce.us XXXXX; done in the running container in my production setup, which uses NFS. check-for-changes.sh was modified only to show the run time; no other changes were made:
root@ip-172-31-3-247:/# ./usr/local/bin/check-for-changes.sh
DONE TOTAL RUNTIME SECONDS: 0
DONE TOTAL RUNTIME SECONDS: 0
DONE TOTAL RUNTIME SECONDS: 0
DONE TOTAL RUNTIME SECONDS: 0
DONE TOTAL RUNTIME SECONDS: 0
DONE TOTAL RUNTIME SECONDS: 0
---------- Created [email protected] ----------------------
[ WARNING ]  File not found for certificate in check_for_changes.sh
postfix: stopped
postfix: started
dovecot: stopped
dovecot: started
DONE TOTAL RUNTIME SECONDS: 6
---------- Created [email protected] ----------------------
[ WARNING ]  File not found for certificate in check_for_changes.sh
postfix: stopped
postfix: started
dovecot: stopped
dovecot: started
DONE TOTAL RUNTIME SECONDS: 18

Every subsequent email took ~18 seconds as well.

  2. I then added code that backgrounds almost all of the commands check-for-changes.sh runs, waits for each PID (wait "${WAIT_FOR_PIDS[*]}"), and only then lets the supervisorctl restart commands run. The results were exactly the same, which indicated to me that restarting postfix/dovecot was the primary cause of the delay. I commented out the dovecot restart and found that postfix by itself took ~15 seconds to restart ON THE SECOND RESTART.
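The backgrounding approach described above can be sketched roughly as follows. This is not the code from the forked branch: step_one and step_two are placeholders standing in for the real commands, and run_steps_in_parallel is a hypothetical name. One detail worth noting: when the array holds more than one PID, the quoted form "${WAIT_FOR_PIDS[*]}" expands to a single word containing all PIDs, which bash's wait does not accept as a PID, so this sketch waits on each PID in turn instead.

```shell
#!/usr/bin/env bash
# Sketch only: the two steps are placeholders, not real check-for-changes.sh commands.
step_one() { sleep 1; }
step_two() { sleep 1; }

# Hypothetical wrapper: run independent steps in the background, wait for every
# PID, then report the total runtime in the same format as the output above.
run_steps_in_parallel() {
  local start=${SECONDS}
  local -a WAIT_FOR_PIDS=()
  local pid
  local failures=0

  step_one &
  WAIT_FOR_PIDS+=("$!")
  step_two &
  WAIT_FOR_PIDS+=("$!")

  # Wait for each background job individually so failures can be counted.
  for pid in "${WAIT_FOR_PIDS[@]}"
  do
    wait "${pid}" || failures=$(( failures + 1 ))
  done

  # The service restarts would only happen here, after all background work is done.
  echo "DONE TOTAL RUNTIME SECONDS: $(( SECONDS - start ))"
  return "${failures}"
}
```

Because the restarts still run after the wait, this only shortens the work that precedes them, which is consistent with the unchanged results reported above.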

While parallelization of certain things in check-for-changes.sh might be a good idea, postfix's restart time is crazy long.
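To see which restart is responsible for the delay, each command can be timed on its own. The helper below is a minimal sketch; timed is a hypothetical name, and it relies only on bash's built-in SECONDS counter, so the resolution is whole seconds:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, then print how long it took and its
# exit status to stderr. The command's own exit status is passed through.
timed() {
  local start=${SECONDS}
  local status=0
  "$@" || status=$?
  echo "$* took $(( SECONDS - start ))s (exit ${status})" >&2
  return "${status}"
}

# Intended usage inside the running container (not executed here):
#   timed supervisorctl restart postfix
#   timed supervisorctl restart dovecot
```

Running the postfix line twice in a row would show whether the second restart is the slow one, as observed above.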

I'm going to dig into whether there is anything we can do about this, but I wanted to post it here so the community can make recommendations as well.

Labels: kind/improvement, meta/closed due to age or inactivity, meta/stale, priority/low
