relieve some pain: restart check-for-changes.sh regularly#2398
relieve some pain: restart check-for-changes.sh regularly#2398georglauterbach wants to merge 4 commits intomasterfrom
check-for-changes.sh regularly#2398Conversation
The script does not need to run every single second. This is not to say that there is huge resource usage due to running every second, but every ten seconds is more than enough. This _should_ act like a plaster when it comes to #2348 - I KNOW THIS IS **NOT** A FIX. But this way, people can first of all stay on Debian 11 and DMS 10.4.0, secondly, we may get some more feedback on this mysterious issue due to people staying in DMS 10.4.0.
|
@polarathene would providing the one failing test with more time help here, or do you think the issue is somewhere else entirely? |
|
This is probably related to PR #2157. |
I think a better approach would be, to restart the changedetector service once a day or so. |
|
Good idea. Will |
|
Yes, I think so. But haven't tested it. If the PID of the script changes afterwards, it works. |
|
BTW: While reviewing the check-for-changes script, I stumbled upon docker-mailserver/target/scripts/helper-functions.sh Lines 245 to 254 in 7b21db7 @georglauterbach @polarathene Do you have an idea, why there is a subshell used + |
I see no reason why the subshell is there either... maybe the writer wanted to be able to not run the SHA-sum when he/she could not change directories, but this is an awful way of doing it. I think this can be done in a better way :D |
check-for-changes.shcheck-for-changes.sh regularly
check-for-changes.sh regularlycheck-for-changes.sh regularly
TBH, this spot in the code is the strange part that makes me feel it's the culprit. I wish that, if you folks could refactor this to something less astonishing, we could then test whether that actually removes the issue, before having a PR applied that could daily kill the problem. |
I will see where I can provide something here unless @casperklein beats me to it :D @casperklein maybe you can have a look at the mailbox in #2361 if you have some time - not sure how to approach this properly, and you have more experience in this regard. |
One last thing: I think it's a good idea, to add a small comment referencing the issue, so it's clear, why this function exists. Edit:
Valid point. Once this "fix" is applied and works, people will not notice that there is a problem, which makes it harder to find the root cause and apply a final fix. |
|
Just to be clear, I suspect of sub shell part because it does a full clone of shell script state (without due reason afaik), and this state cloning might be increasing each time it happens, accumulating over previous state (although I couldn't yet realize what could be linearly increasing), while leaving leftovers behind in the parent shell process. This can explain why both RAM and CPU usage increases linearly. |
I think that's a really good point. |
|
I've seen exec used instead of eval but both for the same purpose: allowing interpolation of dynamic variables to happen before execution. Likely why because of the array expansion/interpolation happening. Though, it should work just fine without exec afaik |
|
Currently testing the new version locally. If this works, I'll create another PR and we can close this one. While the removal of |
|
I will close this in favor of #2401. |
|
Just to add to the discussion. It was maybe related to syscall change with the newer Debian image, perhaps not due to Bash itself but another core package (EDIT: glibc AFAIK). I don't recall the specifics but Alpine has a similar problem I became aware of a week or so ago (EDIT: this answer explains it, for Debian 10 vs 11 it seems to be related to the glibc update, notably with kernels prior to 5.8), where a change since their 3.14 release resulted in a syscall that older kernels didn't support properly.. I was surprised as it broke the |
|
Very interesting, so it was as I suspected earlier in the issue discussion (a kernel thing...). Thank you for sharing the information here, very much appreciated! EDIT: |
|
@polarathene tbh I'm not sure how that relates to a leak due to subshell'ing, I couldn't connect the dots. Pointed problem refers to failing testing file writable, seems not related at first. |
Description
The script does not need to run every single second. This is not to say
that there is huge resource usage due to running every second, but every
tentwo seconds is more than enough.This should act like a plaster whenI KNOW THIS IS NOT A FIX.it comes to #2348 -
But this way, peopleRestarting the script will provide a new PID and should in theory stop the resource leakage. Not sure whether we want to go through with it though, as this will also disguise the error at hand to some degree.can first of all stay on Debian 11 and DMS 10.4.0, secondly, we may get
some more feedback on this mysterious issue due to people staying in DMS
10.4.0.
Independently of #2348, increasing the timeout is a very valid concern I had, and on my deployment, I already raised it.
Fixes nothing, but should increase the time before DMS starts using all the RAM on a system.
Type of change
Checklist:
docs/)