ci: make helpers more robust#3137
Conversation
You won't like it, but trust me when I tell Ye: I BANGED MY HEAD AGAINST THIS WALL FOR HOURS AND DID NOT FIND A BETTER SOLUTION, and I think I'm quite good at Bash. For review's sake you will likely need to look at it, but afterwards: Just use it.. and don't think about it. I already did for you.
`setup.bash` is now more robust, due to the additional comments and the new `__abort` function.
Since the sending functions were called inside `export X=$()`, their return value was not properly set in case they failed. This has been corrected: the functions now take the variable name to store the ID in (if you need th ID), so they do not need the command substitution The function that checks whether a variable is set has been made "public" (I always smile while writing this..), so others can use it as well.
Very nice! The changes work, and they found a situation in which a container name was supplied that was deprecated.
64735a7 to
6104960
Compare
I don't like using bash, but this approach as a fix... I'd rather swap to python? 🤣 ( I'm not sure what failure in the tests you're referring to. I was able to resolve them locally and only saw one bizarre failure with the The caught typo/bug from this approach
That's good, and I imagine felt rewarding for the effort 😛 Although inspecting that typo call: docker-mailserver/test/helper/common.bash Lines 256 to 262 in b451742 The arg docker-mailserver/test/helper/common.bash Lines 25 to 67 in b451742 The current logic wasn't designed to detect the issue. The arg was provided, but lacked the prefix, so it checked the next condition, Is it an improvement, or the absence of extra logic?So how did your changes in this PR behave differently to catch the incorrect (yet corrected internally prior to this PR) container name? __SET_CONTAINER_NAME_WITH_POS_ARG_1='local CONTAINER_NAME=${1:-${CONTAINER_NAME:?Container name unset and explicit name not provided}}'
# ...
function _wait_for_smtp_port_in_container() {
eval "${__SET_CONTAINER_NAME_WITH_POS_ARG_1}"
_wait_for_tcp_port_in_container 25
}Isn't that just behaving the same as: function _wait_for_smtp_port_in_container() {
local CONTAINER_NAME=${1:-${CONTAINER_NAME:?Container name unset and explicit name not provided}}
_wait_for_tcp_port_in_container 25
}
Difficult to approveI'm still not sure I can get onboard with the proposed changes here. They don't look good from a maintenance perspective personally...
I did read over your comments in the commits that describe how the eval approach is meant to prevent an issue, but I was having some trouble understanding how that applies to our existing tests / helpers. I've probably encountered it or something similar in the past as some tests have been very frustrating to write / refactor due to weird behaviour, costing me time to investigate or find some alternative approach to work around, but this is more of a gripe with using bash itself I think.. It may be better to eventually consider moving the tests away from BATS, or at least delegating some of it like helpers to something more pleasant to work with 😅 |
|
The current issue is that we cannot use
I may have put to much strain on that. This is not actually what this PR solves. It solved the above mentioned issues.
Indeed. But since we do this at like 10 different points in the code, I wanted to have it as a variable. PS: THe check for the container name prefix is something I wasn't sure to begin with, and now that I see it obscures these issues, I am fond of removing it. |
Could you please reference a line where this has actually happened?
This is an improvement to workaround problems from using bash to run tests, at the expense of adding code smells into the helpers.
The
It technically worked as a safe guard though? It didn't allow an explicit container name that wasn't prefixed with what we'd expect. Instead of throwing an error (which from what I understand would not be caught with existing approach), it used the effectively guaranteed That does seem bad since that won't be obvious or caught, and probably risks multi-container tests where the That would be a valid reason, but for that particular concern with the function return code that the whole I'm still uncomfortable about the proposed changes. I won't block if @casperklein feels ok accepting the PR. |
In your recent PR about the Postfix log: https://github.com/docker-mailserver/docker-mailserver/actions/runs/4298377341/jobs/7492449282#step:6:99 This test should never have run because sending the mail and acquiring the ID failed in the first place. You didn't notice because only later is the function retrieving the ID failing (as it is not called inside
After showing the above, I am sorry to inform you that this is wrong.
Oh, but there is.
Bash itself has so many inherent concepts you could call code smells, I am not even sure wheter the
It is causing real problems. And they are hidden behind so much complexity, it's hard for anyone, let alone people who are not Bash experts, to understand. It took me an hour to figure out where the issue was coming from, but several to come up with a solution. I tried to explain it with the comments above the new ENV variables. It boils down to: everyone except @casperklein and me will currently have problems understanding the real issue when debugging this, because the actual error originates somewhere else. Sending an email is an integral part of our tests; and it is currently flawed. Errors will only show up later, and may be unrelated - which is the worst you can have, because you start searching at the wrong place. About the container name mismatch, this is just a nice addition we get out of these changes. But is not this PR's primary concern. |
| # | ||
| # @param ${1} = variable name that is to be checked | ||
| function _check_if_var_is_set() { | ||
| local -n REF=${1:?No variable name given to __check_if_set} |
There was a problem hiding this comment.
| local -n REF=${1:?No variable name given to __check_if_set} | |
| local -n REF=${1:?No variable name given to __check_if_var_is_set} |
Are you referring to this message: docker-mailserver/test/helper/common.bash Lines 63 to 66 in f0edcc2
Then the echo could be changed to output to FD3 |
|
Already tried that. While In the end, I don't want to pressure to merge this PR. Maybe it suffices to know that there are such issues. @polarathene is right, and I'd probably say the same: this is not a very nice change. The only reason I propose it is because I consider the alternative worse. It's all just about Bash being ugly in this regard. |
|
Maybe I miss something.. I did some tests ( without "local", error shows up and the test failswith "local", the error is hidden and the test succeedwith "local", but variable assignment in a separate step, the error is shown and the test failsIf I understand correctly, the issue you describe is the second example? Take a look at the third one, that should fix it. See also here. |
Just to clarify that I'm following along correctly. Test failure was reported in this test case when calling this helper method: docker-mailserver/test/helper/common.bash Lines 469 to 480 in f0edcc2 But actual failure should have been raised in docker-mailserver/test/tests/parallel/set1/spam_virus/rspamd.bats Lines 32 to 34 in f0edcc2 docker-mailserver/test/helper/sending.bash Lines 59 to 80 in f0edcc2 Where the actual cause of failure was already known as being due to changing the mail logging, Postfix was no longer sending logs to syslog to add into docker-mailserver/test/helper/sending.bash Lines 74 to 79 in f0edcc2 So the log file exists, no error from tac there, the grep calls silently fail and we end up with The assert that is meant to catch that failure didn't trigger? And then we echo the empty string back to the var in This PR attempts to resolve this by refactoring the helper method logic cited to: _run_in_container_bash "tac /var/log/mail/mail.log \
| grep -E -m 1 'postfix/smtpd.*: [A-Z0-9]+: client=localhost' \
| grep -E -o '[A-Z0-9]{9,12}'"
assert_success
# shellcheck disable=SC2154
ENV_VAR_NAME="${lines[0]}"
assert_not_equal "${ENV_VAR_NAME}" ''
Ok so given the above, current behaviour resulted in the Or I'm mistaken, and docker-mailserver/test/helper/common.bash Line 476 in f0edcc2 So what is different beyond Where does all the other additions from this PR come into play (the actual stuff that I'm uncomfortable with), and how does that better handle this failure scenario?
CI gave an error about where the failure happened:
I'm not against that change, but I'm curious how the
I don't see one. Tests are passing and working correctly AFAIK, the failure was still caught? I understand that there is the risk of that not happening, but there is no known issues at present that have been missed, that this PR is actually resolving right now. Thus not an immediate issue to resolve before v12, nothing is broken or blocked that would prevent the release of v12 that would depend upon this PR. All it has raised awareness of in our tests was a typo you missed while refactoring a test, which while awesome to see, was not causing any actual problems requiring the immediate attention of maintainers.
As expressed above. I remain at a stance that I disagree. I feel that this complicates maintenance further, which as you mention is already a concern. This adds another layer to that debugging experience to consider, and I wouldn't be surprised if someone else has a similar experience where these changes only slowed them down further with their debugging tbh.
While I agree. There is better alternative approaches to take. That particular concern can be resolved, but then the changes proposed here linger as technical debt, something that we've been frustratingly clearing out from several years of contributors / maintainers decisions in the past. If I've understood the scenario properly, the flaw is related to sending a mail and expecting a value returned back as the mail ID, with the problem being it may be empty. We have checks in place, especially when passing these expected values to other methods as args, that are correctly catching that issue and failing. It doesn't pin-point where the exact failure originated, and that is not ideal, but it's presently not undetected, as your PR here has proved. |
I remain ambivalent towards this PR. If @casperklein agrees with you that it's worthwhile, by all means merge.
I think this is the best course of action right now. The PR can be referenced in As you've mentioned elsewhere, this is also a problem in our scripts outside of tests.
I know it's awful, and we're not exactly attracting new maintainers over the years. We can only improve the DX so much with bash scripts, and I am of the opinion this is only going to make new potential contributors more intimidated. Our tests tend to be the biggest issue new contributors face. Realistically, if the project isn't able to attract new maintainers (or regular contributors), then it is at risk as our availability declines, I know that I can't keep up my frequency as of late or I'll burn out. Best we can do is have the project in good shape, along with the test suite as we've been focused on. So that it does become more approachable. I do my best to add traceability to issues / PRs for that purpose, and you both do a terrific job with your contributions. Eventually, we may shift away relying on bash. I don't expect it to be much better for attracting new contributors as we'd like pursue Rust instead of languages with larger communities like Python, but at least for us it'll be more robust / reliable. |
@georglauterbach documented this in the PR
The referenced shellcheck |
I described exactly this case in my comments in While I still believe this would be worthwhile, I can live with the issues our current code has and I will close this, because we have and had enough work already. I will address some notes below for understanding and context though. |
While this assumption seems logical, it's not quite on point. The whole point is the removal of
Eval will do an in-place evaluation, i.e. as if you wrote It would resolve the mentioned issue by failing much earlier (and properly) where the error actually happened - in
AFAIK, they do not quite run correctly in that the error aborts the tests at the right place. The failure was caught later because I am a good Bash programmer who made sure the
While you say "not ideal", I'd say it's plain wrong. I like to think about it this way: fn main() {
let id = if let Ok(value) = some_function_that_can_fail {
value
} else {
const some_nonsense_value_even_though_this_is_the_err_case__which_will_only_show_up_later: ...
some_nonsense_value_even_though_this_is_the_err_case__which_will_only_show_up_later
};
let log = print_log_for_id(id)?;
}Clippy would likely kill you for such a sin. |
Alright. I am sorry that I've missed that! NOW I THINK I've fully understand the issue (otherwise I give up 😆) and why you chose the eval way in this PR 👍 One last try from my side to solve this riddle 😆 Instead of using the same var name for global and local, Details#!/bin/bash
set -ueo pipefail
CONTAINER_NAME="dmstest"
function __handle_container_name() {
if [[ -n ${1:-} ]] && [[ ${1:-} =~ ^dms-test_ ]]
then
printf '%s' "${1}"
return 0
elif [[ -n ${CONTAINER_NAME+set} ]]
then
printf '%s' "${CONTAINER_NAME}"
return 0
else
echo 'ERROR: (helper/common.sh) Container name was either provided explicitly without the required "dms-test_" prefix, or CONTAINER_NAME is not set for implicit usage' >&2
exit 1
fi
}
function foo() {
local LOCAL_CONTAINER_NAME
LOCAL_CONTAINER_NAME=$(__handle_container_name "${1:-}")
echo "container name: $LOCAL_CONTAINER_NAME"
}
# use global CONTAINER_NAME
foo
# use custom container name passed as parameter
foo dms-test_my_custom_container_name
# wrong custom container name passed as parameter, fall back to use global CONTAINER_NAME instead
foo wrong_custom_container_name
unset CONTAINER_NAME
# no global CONTAINER_NAME, but custom name passed as parameter
foo dms-test_my_custom_container_name
# no global CONTAINER_NAME and wrong custom container name --> print error and exit 1
foo wrong_custom_container_name
# no global CONTAINER_NAME and no custom container name --> print error and exit 1
fooOutput: container name: dmstest
container name: dms-test_my_custom_container_name
container name: dmstest
container name: dms-test_my_custom_container_name
ERROR: (helper/common.sh) Container name was either provided explicitly without the required "dms-test_" prefix, or CONTAINER_NAME is not set for implicit usage |
Yeah, thought about that as well! This would only work if you do it like this: function some_function() {
local LOCAL_CONTAINER_NAME
LOCAL_CONTAINER_NAME=$(__handle_container_name "${1:-}")
local CONTAINER_NAME=${LOCAL_CONTAINER_NAME}
...
}because functions you call afterwards expect I didn't want to implement this because this add 2 additonal lines, and for someone looking at it, it seems to make no sense at all :D |
Description
A PR I was not fond of making, but current helpers are not actually failing as intended when an argument is wrong/unset. This PR corrects issues observed after investigating errors in #3134.
I'd strongly advise to go through this commit-by-commit. I tried to provide good commit messages; the code is documented as well.
Type of change
Checklist:
docs/)