Skip to content

scripts: refactored check-for-changes.sh#2498

Merged
georglauterbach merged 11 commits intomasterfrom
scripts/refactoring/check-for-changes
Apr 2, 2022
Merged

scripts: refactored check-for-changes.sh#2498
georglauterbach merged 11 commits intomasterfrom
scripts/refactoring/check-for-changes

Conversation

@georglauterbach
Copy link
Copy Markdown
Member

@georglauterbach georglauterbach commented Mar 21, 2022

Description

Slightly altered check-for-changes.sh and adjusted it to the new log. Removed _notify as it is now not used anywhere else. I will remove DMS_DEBUG with a separate PR.

Type of change

  • Improvement (non-breaking change that does improve existing functionality)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (README.md or the documentation under docs/)
  • If necessary I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

I refactored `check-for-changes.sh` and used the new log. `_notify` can
therefore be deleted as it is used no more. The `_get_date` function was
moved to `utils.sh`.
Let'sEncrypt and SSL manual tests were adjusted to the new log.
Moreover,

```
assert_output --partial 'Change detected'
```

was dropped because it appears first in the logs, and due to the now
more verbose log, we'd need to increase the lenght of the `tail`ed
output, but this an unreliable method. I therefore simply dropped this
message, and I think this is fine, because the other messages are still
fine for testing whether all is well with the script and its
functionality.
@georglauterbach georglauterbach added area/scripts kind/improvement Improve an existing feature, configuration file or the documentation labels Mar 21, 2022
@georglauterbach georglauterbach added this to the v11.0.0 milestone Mar 21, 2022
@georglauterbach georglauterbach self-assigned this Mar 21, 2022
@georglauterbach georglauterbach marked this pull request as draft March 21, 2022 12:03
polarathene
polarathene previously approved these changes Mar 22, 2022
Comment thread target/scripts/check-for-changes.sh
Comment thread target/scripts/check-for-changes.sh
Comment thread test/mail_ssl_letsencrypt.bats
Comment thread target/scripts/check-for-changes.sh Outdated
@georglauterbach georglauterbach marked this pull request as ready for review March 22, 2022 10:51
@georglauterbach georglauterbach marked this pull request as draft March 22, 2022 11:13
@georglauterbach
Copy link
Copy Markdown
Member Author

Feel free to have a look. I'm waiting on #2499 to get merged because this PR holds the _log_with_date function, which I want to use in check-for-changes.sh as well.

So here is what happens:

1. The test starts and the mail server logs the message about the
   certificate being extracted during startup in the normal logs (all is
   well)
2. The `_acme_rsa` runs and extracts (FOR THE SAME DOMAIN) (all is well)
3. In the end, `_acme_wildcard` run, (FOR ANOTHER DOMAIN mind you). This
   results in another entry being created under `/etc/letsencrypt/live`
   which causes the changedetector to run twice, not once (which is
   correct). Now, all is not well. Due to the (in total) third run, the
   log does not contain the `'/etc/letsencrypt/acme.json' has changed...`
   message anymore when `tail`ing normally (because the `acme.json`
   hasn't changed again).

   I therefore

   1. Added `-3000` to tail more log. This is fine because with the
      first test (`_acme_rsa`), we cannot get false positives as it is
      the first test; with `_acme_wildcard` I carefully measured the
      logs to not have a false positive (now, this test WAS and STILL IS
      very sensible to log adjustments - I'm not changing this in this
      PR).
    2. Added more `sleep` duration for two changedetector runs to
       complete without problems. This prevents timing issues in `_acme_wildcard`.
    3. Obviously, the restart count needed to be adjusted as well (as
       the changedetector now restards thrice and not twice as
       expected). Looking at the whole file is an improvement because
       now we will _always_ have the total amount of restarts available.

I guess this did not come up due to how the logs were written before and
because of "timing issues". This could and should be resolved in the
future.
@georglauterbach
Copy link
Copy Markdown
Member Author

georglauterbach commented Mar 22, 2022

@polarathene I added a longer commit message for 4e40ce1 (which adjusts the LE test) if you're curious why I did what I did.

@polarathene
Copy link
Copy Markdown
Member

polarathene commented Mar 22, 2022

From commit message:

In the end, _acme_wildcard run, (FOR ANOTHER DOMAIN mind you). This results in another entry being created under /etc/letsencrypt/live which causes the changedetector to run twice, not once (which is correct).

Now, all is not well. Due to the (in total) third run, the log does not contain the '/etc/letsencrypt/acme.json' has changed... message anymore when tailing normally (because the acme.json hasn't changed again).

  1. Added more sleep duration for two changedetector runs to complete without problems. This prevents timing issues in _acme_wildcard.
  2. Obviously, the restart count needed to be adjusted as well

The third change event shouldn't be occurring? We identified this was triggered by the wildcard cert files, yet the rsa cert files did not trigger a change (as they shouldn't) despite different contents to the ecdsa cert files. The wildcard files should not be triggering a change event either... I prefer we resolve that bug tbh.

I detailed all of this with full flow and likely lines that contain the bug in the original PR discussion about this.

Getting restart counts from the full log is great, but adjusting the expected restart count in the test to accommodate the bug is not - when we should be able to fix it instead of ignore it.

So, apparently

1. `check-for-changes.sh` needs HOSTNAME and DOMAINNAME to
   to be set properly (this is now fixed again, I removed it earlier)
2. POSTMASTER_ADDRESS was not properly set to begin with, so this PR
   actually also fixes a bug :D

The regression should be fixed now and I reverted parts of the test back
to their original state.
@georglauterbach
Copy link
Copy Markdown
Member Author

@polarathene I found the regression, and 31c91be fixes the issue. I reverted the necessary test pieces back to their original state :)

@georglauterbach georglauterbach marked this pull request as ready for review March 23, 2022 10:50
@polarathene
Copy link
Copy Markdown
Member

@polarathene I found the regression, and 31c91be fixes the issue.

Referencing the commit message:

So, apparently

  1. check-for-changes.sh needs HOSTNAME and DOMAINNAME to to be set properly (this is now fixed again, I removed it earlier)
  2. POSTMASTER_ADDRESS was not properly set to begin with, so this PR actually also fixes a bug :D

HOSTNAME and DOMAINNAME are needed for the ssl.sh helper right? Or other scripts? I wonder if there is a maintenance concern with the helpers we source if vars like those two internal ones can be expected in scripts but check-for-changes.sh doesn't keep in sync.. since I don't think we have much tests for such (basically the whole test suite again after a change event which is a bit tricky).

@georglauterbach
Copy link
Copy Markdown
Member Author

georglauterbach commented Mar 23, 2022

@polarathene I found the regression, and 31c91be fixes the issue.

Referencing the commit message:

So, apparently

  1. check-for-changes.sh needs HOSTNAME and DOMAINNAME to to be set properly (this is now fixed again, I removed it earlier)
  2. POSTMASTER_ADDRESS was not properly set to begin with, so this PR actually also fixes a bug :D

HOSTNAME and DOMAINNAME are needed for the ssl.sh helper right? Or other scripts? I wonder if there is a maintenance concern with the helpers we source if vars like those two internal ones can be expected in scripts but check-for-changes.sh doesn't keep in sync.. since I don't think we have much tests for such (basically the whole test suite again after a change event which is a bit tricky).

I guess they are needed by ssl.sh, yes, but other functions also rely on other env variables. Because we're currently using HOSTNAME and not something like DMS_HOSTNAME, we're not adding this to /etc/dms-settings and we therefore cannot just rely on this file. But we already discussed that the whole DNS/HOSTNAME/DOMAINNAME/etc. thing will change in the future.

Keeping them in sync is a whole other task 😉 There is a maintenance concern, but there is currently no need to worry because we do not rely on this and we do not expect these variables to change during the lifetime of the container.

Comment thread target/scripts/helpers/utils.sh Outdated
Moreover, I opted to source `/etc/dms-settings` as a whole to
future-proof the script. When the DNS adjustments PRs (that do not exist
by now but will exit in the future) are done, we can then remove
`_obtain_hostname_and_domainname` because we're already writing the
variables to `/etc/dms-settings`. I left instructions in the script in
the form of TODO comments.
@georglauterbach
Copy link
Copy Markdown
Member Author

All important changes are included now. This PR is officially ready for review @casperklein @polarathene :)

Because we now log the date for all messages of the changedetector, we
need to `tail` a bit more log than before.
@georglauterbach georglauterbach merged commit 35fb744 into master Apr 2, 2022
@georglauterbach georglauterbach deleted the scripts/refactoring/check-for-changes branch April 2, 2022 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/scripts kind/improvement Improve an existing feature, configuration file or the documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants