relieve some pain: restart `check-for-changes.sh` regularly by georglauterbach · Pull Request #2398 · docker-mailserver/docker-mailserver

georglauterbach · 2022-02-07T10:28:15Z

Description

The script does not need to run every single second. This is not to say that running every second uses a huge amount of resources, but every ~~ten~~ two seconds is more than enough. This _should_ act like a plaster when it comes to #2348 - I KNOW THIS IS **NOT** A FIX. But this way, people can, first of all, stay on Debian 11 and DMS 10.4.0; secondly, we may get some more feedback on this mysterious issue because people stay on DMS 10.4.0. Restarting the script will provide a new PID and should, in theory, stop the resource leakage. I am not sure whether we want to go through with it though, as this will also disguise the error at hand to some degree.

Independently of #2348, increasing the timeout is a very valid concern I had, and on my deployment, I already raised it.

Fixes nothing, but should increase the time before DMS starts using all the RAM on a system.
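
For readers unfamiliar with what such a change detector does, here is a minimal sketch of a checksum-based polling loop with a configurable sleep interval. It is not the code of `check-for-changes.sh`; the function names, the state file, and the message are hypothetical and only illustrate where the interval discussed above would come into play.

```bash
#!/usr/bin/env bash
# Sketch only: all names below are hypothetical, not taken from check-for-changes.sh.

# Print one checksum line per existing file given as argument.
checksums_of() {
  local file
  for file in "$@"; do
    if [[ -f "${file}" ]]; then
      sha512sum -- "${file}"
    fi
  done
}

# Return 0 if the checksums differ from those stored in the state file
# (and update the state file); return 1 if nothing changed.
changed_since_last_check() {
  local state_file="${1}"; shift
  local current
  current="$(checksums_of "$@")"
  if [[ -f "${state_file}" && "${current}" == "$(<"${state_file}")" ]]; then
    return 1
  fi
  printf '%s\n' "${current}" > "${state_file}"
  return 0
}

# Poll forever; the first argument is the sleep interval in seconds.
watch_files() {
  local interval="${1}" state_file="${2}"; shift 2
  while true; do
    if changed_since_last_check "${state_file}" "$@"; then
      echo "change detected"
    fi
    sleep "${interval}"
  done
}
```

With this shape, raising the interval from one to two or ten seconds only changes how often the checksums are computed; it does not change what happens per iteration.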

Type of change

Improvement (non-breaking change that does improve existing functionality)

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (README.md or the documentation under docs/)
If necessary I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

georglauterbach · 2022-02-07T11:59:54Z

@polarathene would providing the one failing test with more time help here, or do you think the issue is somewhere else entirely?

oblitum · 2022-02-07T13:08:36Z

This is probably related to PR #2157.

georglauterbach · 2022-02-07T13:12:15Z

> This is probably related to PR #2157.

It is :D But #2157 has stalled.

I'm not even sure whether this PR is the right way, but I'm running my deployment with 10s and all is well for me. Independently of the problem I think that increasing sleep time is a good idea.

casperklein · 2022-02-07T13:23:08Z

> Fixes nothing, but should increase the time before DMS starts using all the RAM on a system.

I think a better approach would be to restart the changedetector service once a day or so.
Your solution only slows down a potential memory leak, but will not prevent it. Restarting the service should, however, "fix" it.

georglauterbach · 2022-02-07T15:10:24Z

Good idea. Will `supervisorctl restart changedetector` suffice here?

casperklein · 2022-02-07T17:33:54Z

Yes, I think so. But haven't tested it. If the PID of the script changes afterwards, it works.
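
The PID check suggested above could be sketched as follows. This is an illustration under assumptions, not the function that ended up in the PR: the helper name `restart_and_verify_pid` is hypothetical, and the program name `changedetector` is taken from this thread. It relies on the `pid` and `restart` actions of `supervisorctl`.

```bash
#!/usr/bin/env bash
# Sketch only: the helper name is hypothetical; "changedetector" is the
# supervisor program name used in this discussion.

# Restart a supervisor-managed program and succeed only if its PID changed.
restart_and_verify_pid() {
  local program="${1}"
  local pid_before pid_after
  pid_before="$(supervisorctl pid "${program}")"
  supervisorctl restart "${program}" >/dev/null || return 1
  pid_after="$(supervisorctl pid "${program}")"
  echo "PID before: ${pid_before}, PID after: ${pid_after}"
  [[ "${pid_before}" != "${pid_after}" ]]
}

# Usage (inside the container):
#   restart_and_verify_pid changedetector
```

A changed PID only confirms that a fresh process is running; it says nothing about why the old one was leaking.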

casperklein · 2022-02-07T17:37:20Z

BTW: While reviewing the check-for-changes script, I stumbled upon

docker-mailserver/target/scripts/helper-functions.sh

Lines 245 to 254 in 7b21db7

    
    (
      cd /tmp/docker-mailserver || exit 1
      exec sha512sum 2>/dev/null -- \
        postfix-accounts.cf \
        postfix-virtual.cf \
        postfix-aliases.cf \
        dovecot-quotas.cf \
        /etc/letsencrypt/acme.json \
        ${CERT_FILES[@]}
    )

@georglauterbach @polarathene Do you have any idea why a subshell plus an exec statement is used here? IMO it should work without those too. I cannot imagine a reason for that.

georglauterbach · 2022-02-07T17:43:32Z

I see no reason why the subshell is there either... maybe the writer wanted to be able to not run the SHA-sum when he/she could not change directories, but this is an awful way of doing it. I think this can be done in a better way :D
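
To make the question concrete, here is a small sketch comparing the subshell + `exec` pattern with two alternatives. The function names and the idea of passing the directory as the first argument are hypothetical, chosen for illustration; this is not necessarily what a refactoring in the project would look like.

```bash
#!/usr/bin/env bash
# Sketch only: function names and parameters are hypothetical.

# Variant 1: the pattern from the quoted snippet. The subshell keeps `cd`
# from affecting the caller; `exec` replaces the subshell with sha512sum.
checksums_subshell_exec() {
  local dir="${1}"; shift
  (
    cd "${dir}" || exit 1
    exec sha512sum 2>/dev/null -- "$@"
  )
}

# Variant 2: no `exec`. The subshell still isolates `cd`, but stays alive
# until sha512sum has finished.
checksums_subshell() {
  local dir="${1}"; shift
  (
    cd "${dir}" || exit 1
    sha512sum 2>/dev/null -- "$@"
  )
}

# Variant 3: no explicit subshell; the directory is restored by hand.
checksums_pushd() {
  local dir="${1}"; shift
  local status=0
  pushd "${dir}" >/dev/null || return 1
  sha512sum 2>/dev/null -- "$@" || status=$?
  popd >/dev/null || return 1
  return "${status}"
}
```

All three print the same checksum lines and leave the caller's working directory untouched; they differ only in how many shell processes exist while `sha512sum` runs.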

oblitum · 2022-02-07T22:41:05Z

> I think this can be done in a better way :D

TBH, this spot in the code is the strange part that makes me feel it's the culprit. I wish that, if you folks could refactor this to something less astonishing, we could then test whether that actually removes the issue, before having a PR applied that could daily kill the problem.

georglauterbach · 2022-02-07T22:51:01Z

I will see where I can provide something here unless @casperklein beats me to it :D @casperklein maybe you can have a look at the mailbox in #2361 if you have some time - not sure how to approach this properly, and you have more experience in this regard.

casperklein · 2022-02-07T23:41:13Z

function _fix_restart_changedetector_daily

One last thing: I think it's a good idea to add a small comment referencing the issue, so it's clear why this function exists.
Otherwise it LGTM 👍

Edit:

> before having a PR applied that could daily kill the problem.

Valid point. Once this "fix" is applied and works, people will not notice that there is a problem, which makes it harder to find the root cause and apply a final fix.

oblitum · 2022-02-08T01:05:53Z

Just to be clear, I suspect the subshell part because it does a full clone of the shell script state (without good reason, AFAIK), and this cloned state might be growing each time it happens, accumulating on top of the previous state (although I haven't yet worked out what could be increasing linearly), while leaving leftovers behind in the parent shell process. This could explain why both RAM and CPU usage increase linearly.
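
As a small illustration of the "clone of shell state" mentioned above: a Bash subshell starts with a copy of the parent's variables and runs in its own process, and its changes do not flow back. The variable names are hypothetical. The sketch only shows these copy semantics; it does not show whether that copy is the source of the leak.

```bash
#!/usr/bin/env bash
# Sketch only: variable names are hypothetical.

counter=0
cache=("a" "b")

# The subshell starts with a copy of the parent's variables ...
child_view="$(
  counter=$((counter + 1))
  cache+=("c")
  echo "${counter} ${#cache[@]} ${BASHPID}"
)"

# ... but nothing it changes flows back into the parent.
parent_view="${counter} ${#cache[@]} ${BASHPID}"

echo "child:  ${child_view}"
echo "parent: ${parent_view}"
```

The first two fields differ between the two lines (the child saw its own modified copy), and so does the third, because the subshell runs under a different PID.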

NorseGaud · 2022-02-08T03:32:08Z

I think that's a really good point.

NorseGaud · 2022-02-08T03:33:58Z

I've seen exec used instead of eval, but both for the same purpose: allowing interpolation of dynamic variables to happen before execution. That is likely the reason here, given the array expansion/interpolation happening. Though it should work just fine without exec, AFAIK.
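
For reference, a short sketch of how array expansion, `exec`, and `eval` behave in Bash. The variable and function names are hypothetical. It shows that array expansion happens before the command runs regardless of `exec`, that `exec` replaces the current (sub)shell, and that only `eval` re-parses a string as shell code.

```bash
#!/usr/bin/env bash
# Sketch only: variable and function names are hypothetical.

files=("a b.pem" "c.pem")

# Array expansion happens before the command runs, with or without exec.
count_args() { echo "$#"; }
quoted="$(count_args "${files[@]}")"     # each element stays one word
unquoted="$(count_args ${files[@]})"     # elements are split on whitespace

# exec replaces the current (sub)shell, so later commands never run.
after_exec="$( (exec echo "replaced"; echo "never printed") )"

# eval re-parses a string as shell code, which exec does not do.
template='echo "${#files[@]} files"'
after_eval="$(eval "${template}")"

echo "quoted=${quoted} unquoted=${unquoted}"
echo "after_exec=${after_exec}"
echo "after_eval=${after_eval}"
```

One side note that follows from the first part: an unquoted `${CERT_FILES[@]}` would split any path containing whitespace into several arguments.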

georglauterbach · 2022-02-08T11:12:29Z

Currently testing the new version locally. If this works, I'll create another PR and we can close this one.

While the removal of exec is certainly worthwhile, if this fixes the resource leak, it seems as if the problem is with the version of Bash on Debian 11 (some bug in Bash?), or the exec does something very weird we do not understand. I highly doubt that this is a problem with the underlying exec syscall, but maybe Docker does a weird translation as well. Who knows...

georglauterbach · 2022-02-08T13:15:19Z

I will close this in favor of #2401.

polarathene · 2022-02-16T07:32:56Z

Just to add to the discussion: it may have been related to a syscall change with the newer Debian image, perhaps not due to Bash itself but to another core package (EDIT: glibc, AFAIK).

I don't recall the specifics, but Alpine has a similar problem that I became aware of a week or so ago (EDIT: this answer explains it; for Debian 10 vs 11 it seems to be related to the glibc update, notably with kernels prior to 5.8), where a change since their 3.14 release resulted in a syscall that older kernels didn't support properly. I was surprised, as it stopped the hash command in Bash from working correctly (the Sentry.io CLI installer script was failing because of that).
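
To check whether a host falls into the "kernels prior to 5.8" group mentioned above, a helper along these lines could be used. The function name is hypothetical, and the 5.8 threshold is taken from this comment rather than verified here; the kernel version alone may also not be the whole story.

```bash
#!/usr/bin/env bash
# Sketch only: the helper name is hypothetical; the 5.8 threshold comes
# from the comment above.

# Succeed if the kernel release (default: `uname -r`) is older than
# MAJOR.MINOR. Usage: kernel_older_than MAJOR MINOR [RELEASE]
kernel_older_than() {
  local wanted_major="${1}" wanted_minor="${2}"
  local release="${3:-$(uname -r)}"
  local major minor
  major="${release%%.*}"        # "5.4.0-96-generic" -> "5"
  minor="${release#*.}"         # -> "4.0-96-generic"
  minor="${minor%%[!0-9]*}"     # -> "4"
  (( major < wanted_major || (major == wanted_major && minor < wanted_minor) ))
}

# Usage:
#   if kernel_older_than 5 8; then echo "kernel is older than 5.8"; fi
```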

georglauterbach · 2022-02-16T08:42:59Z

Very interesting, so it was as I suspected earlier in the issue discussion (a kernel thing...). Thank you for sharing the information here, very much appreciated!

EDIT: glibc is broken anyway (if one believes L. Torvalds :D)

oblitum · 2022-02-16T18:38:39Z

@polarathene TBH I'm not sure how that relates to a leak due to subshelling; I couldn't connect the dots. The linked problem is about a file-writable test failing, which does not seem related at first glance.

georglauterbach added area/scripts kind/improvement Improve an existing feature, configuration file or the documentation labels Feb 7, 2022

georglauterbach requested a review from a team February 7, 2022 10:28

georglauterbach self-assigned this Feb 7, 2022

restart changedetector instead of increasing timeout by 10

815f904

casperklein reviewed Feb 7, 2022

Comment thread target/scripts/startup/fixes-stack.sh Outdated

georglauterbach changed the title ~~relieve some pain: increased sleep duration for check-for-changes.sh~~ relieve some pain: restartcheck-for-changes.sh regularly Feb 7, 2022

georglauterbach changed the title ~~relieve some pain: restartcheck-for-changes.sh regularly~~ relieve some pain: restart check-for-changes.sh regularly Feb 7, 2022

georglauterbach added 2 commits February 7, 2022 22:17

Merge remote-tracking branch 'origin/master' into pain-reduction/chec…

74a9739

…k-for-changes

added shebang and made file executable

3d1d28f

georglauterbach requested a review from a team February 7, 2022 21:31

georglauterbach marked this pull request as draft February 8, 2022 11:48

georglauterbach mentioned this pull request Feb 8, 2022

improvement: get rid of subshell + exec in helper-functions.sh #2401

Merged

7 tasks

georglauterbach closed this Feb 8, 2022

georglauterbach deleted the pain-reduction/check-for-changes branch February 8, 2022 13:15

5 participants
