ci: make helpers more robust by georglauterbach · Pull Request #3137 · docker-mailserver/docker-mailserver

georglauterbach · 2023-03-01T15:01:19Z

Description

A PR I was not fond of making, but current helpers are not actually failing as intended when an argument is wrong/unset. This PR corrects issues observed after investigating errors in #3134.

I'd strongly advise to go through this commit-by-commit. I tried to provide good commit messages; the code is documented as well.

Type of change

Bug fix (non-breaking change which fixes an issue)

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (README.md or the documentation under docs/)
If necessary I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

You won't like it, but trust me when I tell Ye: I BANGED MY HEAD AGAINST THIS WALL FOR HOURS AND DID NOT FIND A BETTER SOLUTION, and I think I'm quite good at Bash. For review's sake you will likely need to look at it, but afterwards: Just use it.. and don't think about it. I already did for you.

`setup.bash` is now more robust, due to the additional comments and the new `__abort` function.

Since the sending functions were called inside `export X=$()`, their return value was not properly set in case they failed. This has been corrected: the functions now take the variable name to store the ID in (if you need th ID), so they do not need the command substitution The function that checks whether a variable is set has been made "public" (I always smile while writing this..), so others can use it as well.

Very nice! The changes work, and they found a situation in which a container name was supplied that was deprecated.

polarathene · 2023-03-02T06:14:13Z

You won't like it, but trust me when I tell Ye: I BANGED MY HEAD AGAINST THIS WALL FOR HOURS AND DID NOT FIND A BETTER SOLUTION, and I think I'm quite good at Bash. - Commit message

I don't like using bash, but this approach as a fix... I'd rather swap to python? 🤣 (nginx-proxy does this btw, I contributed to their tests in the past and it wasn't too bad, initial hurdles aside)

I'm not sure what failure in the tests you're referring to. I was able to resolve them locally and only saw one bizarre failure with the setup_cli.bats test failing trying to access the smtp_delivery.bats container.

The caught typo/bug from this approach

Very nice! The changes work, and they found a situation in which a container name was supplied that was deprecated. - Commit message

That's good, and I imagine felt rewarding for the effort 😛

Although inspecting that typo call:

docker-mailserver/test/helper/common.bash

Lines 256 to 262 in b451742

    
           # Wait for SMTP port (25) to become ready. 
        
           # 
        
           # @param ${1} = name of the container [OPTIONAL] 
        
           function _wait_for_smtp_port_in_container() { 
        
             local CONTAINER_NAME=$(__handle_container_name "${1:-}") 
        
             _wait_for_tcp_port_in_container 25 
        
           }

The arg ${1} is indeed not the container name we want, but the CONTAINER_NAME resolved here would be the desired container name, so it functions as expected.

docker-mailserver/test/helper/common.bash

Lines 25 to 67 in b451742

    
           # Properly handle the container name given to tests. This makes the whole 
        
           # test suite more robust as we can be sure that the container name is 
        
           # properly set. Sometimes, we need to provide an explicit container name; 
        
           # this function eases the pain by either providing the explicitly given 
        
           # name or `CONTAINER_NAME` if it is set. 
        
           # 
        
           # @param ${1} = explicit container name [OPTIONAL] 
        
           # 
        
           # ## Attention 
        
           # 
        
           # Note that this function checks whether the name given to it starts with 
        
           # the prefix `dms-test_`. One must adhere to this naming convention. 
        
           # 
        
           # ## Panics 
        
           # 
        
           # If neither an explicit non-empty argument is given nor `CONTAINER_NAME` 
        
           # is set. 
        
           # 
        
           # ## "Calling Convention" 
        
           # 
        
           # This function should be called the following way: 
        
           # 
        
           #     local SOME_VAR=$(__handle_container_name "${X:-}") 
        
           # 
        
           # Where `X` is an arbitrary argument of the function you're calling. 
        
           # 
        
           # ## Note 
        
           # 
        
           # This function is internal and should not be used in tests. 
        
           function __handle_container_name() { 
        
             if [[ -n ${1:-} ]] && [[ ${1:-} =~ ^dms-test_ ]] 
        
             then 
        
               printf '%s' "${1}" 
        
               return 0 
        
             elif [[ -n ${CONTAINER_NAME+set} ]] 
        
             then 
        
               printf '%s' "${CONTAINER_NAME}" 
        
               return 0 
        
             else 
        
               echo 'ERROR: (helper/common.sh) Container name was either provided explicitly without the required "dms-test_" prefix, or CONTAINER_NAME is not set for implicit usage' >&2 
        
               exit 1 
        
             fi 
        
           }

The current logic wasn't designed to detect the issue. The arg was provided, but lacked the prefix, so it checked the next condition, CONTAINER_NAME was set, thus it was considered valid. If the test didn't declare CONTAINER_NAME, the test immediately fails saying it must be set before the tests begin.

Is it an improvement, or the absence of extra logic?

So how did your changes in this PR behave differently to catch the incorrect (yet corrected internally prior to this PR) container name?

__SET_CONTAINER_NAME_WITH_POS_ARG_1='local CONTAINER_NAME=${1:-${CONTAINER_NAME:?Container name unset and explicit name not provided}}'

# ...

function _wait_for_smtp_port_in_container() {
  eval "${__SET_CONTAINER_NAME_WITH_POS_ARG_1}"
  _wait_for_tcp_port_in_container 25
}

Isn't that just behaving the same as:

function _wait_for_smtp_port_in_container() {
  local CONTAINER_NAME=${1:-${CONTAINER_NAME:?Container name unset and explicit name not provided}}
  _wait_for_tcp_port_in_container 25
}

Where it was provided the arg mail_with_imap, thus that is used as the CONTAINER_NAME, and it fails to find the appropriate container?
Previous functionality worked here correctly, but it only accepted a container name arg if the expected prefix was there, otherwise it used an existing CONTAINER_NAME as fall back.
The only difference was additional logic to filter the arg by prefix, and since CONTAINER_NAME must be set before test cases are run, it'd always have that as a fallback, thus no error would be omitted in this scenario.

Difficult to approve

I'm still not sure I can get onboard with the proposed changes here. They don't look good from a maintenance perspective personally...

I'd like some concrete example of where tests aren't reliable, and these changes prove worthwhile.
I'll probably wait for @casperklein input this time around as bash scripts aren't my forte.

I did read over your comments in the commits that describe how the eval approach is meant to prevent an issue, but I was having some trouble understanding how that applies to our existing tests / helpers.

I've probably encountered it or something similar in the past as some tests have been very frustrating to write / refactor due to weird behaviour, costing me time to investigate or find some alternative approach to work around, but this is more of a gripe with using bash itself I think.. It may be better to eventually consider moving the tests away from BATS, or at least delegating some of it like helpers to something more pleasant to work with 😅

georglauterbach · 2023-03-03T08:26:15Z

The current issue is that we cannot use local VAR=$(...) since this hides errors that should actually be caught. But at the same time, we cannot use local CONTANER_NAME ; __handle_container_name .... So this PR solves the former and the latter issue.

So how did your changes in this PR behave differently to catch the incorrect (yet corrected internally prior to this PR) container name?

I may have put to much strain on that. This is not actually what this PR solves. It solved the above mentioned issues.

Isn't that just behaving the same as:

Indeed. But since we do this at like 10 different points in the code, I wanted to have it as a variable.

PS: THe check for the container name prefix is something I wasn't sure to begin with, and now that I see it obscures these issues, I am fond of removing it.

polarathene · 2023-03-03T10:45:23Z

The current issue is that we cannot use local VAR=$(...) since this hides errors that should actually be caught.

Could you please reference a line where this has actually happened?

I understand that you're saying it can happen, or that it's an issue while writing tests.
I am of the understanding that we don't have it actually impacting any of the current test suite hiding errors that should be raised.
If there's no immediate issue to resolve, the priority to review and merge before v12 release seems low?

So this PR solves the former and the latter issue.

This is an improvement to workaround problems from using bash to run tests, at the expense of adding code smells into the helpers.

Indeed. But since we do this at like 10 different points in the code, I wanted to have it as a variable.

The local CONTAINER_NAME=? I'm not sure it's worthwhile unless causing real problems :(

PS: The check for the container name prefix is something I wasn't sure to begin with, and now that I see it obscures these issues, I am fond of removing it.

It technically worked as a safe guard though?

It didn't allow an explicit container name that wasn't prefixed with what we'd expect. Instead of throwing an error (which from what I understand would not be caught with existing approach), it used the effectively guaranteed CONTAINER_NAME as a fallback.

That does seem bad since that won't be obvious or caught, and probably risks multi-container tests where the CONTAINER_NAME is changed confusing to debug if using a container name that didn't use the prefix convention.

That would be a valid reason, but for that particular concern with the function return code that the whole eval approach is meant to resolve, couldn't it raise failure internally via the assert calls from BATS? I assume that would be effective, but maybe not.

I'm still uncomfortable about the proposed changes. I won't block if @casperklein feels ok accepting the PR.

georglauterbach · 2023-03-03T14:44:12Z

Could you please reference a line where this has actually happened?

In your recent PR about the Postfix log: https://github.com/docker-mailserver/docker-mailserver/actions/runs/4298377341/jobs/7492449282#step:6:99

This test should never have run because sending the mail and acquiring the ID failed in the first place. You didn't notice because only later is the function retrieving the ID failing (as it is not called inside $(). This is bad because it hides the actual error.

I am of the understanding that we don't have it actually impacting any of the current test suite hiding errors that should be raised.

After showing the above, I am sorry to inform you that this is wrong.

If there's no immediate issue to resolve

Oh, but there is.

This is an improvement to workaround problems from using bash to run tests, at the expense of adding code smells into the helpers.

Bash itself has so many inherent concepts you could call code smells, I am not even sure wheter the eval ... really is Bash's worst - probably not. Variable scope is though..

The local CONTAINER_NAME=? I'm not sure it's worthwhile unless causing real problems :(

It is causing real problems. And they are hidden behind so much complexity, it's hard for anyone, let alone people who are not Bash experts, to understand. It took me an hour to figure out where the issue was coming from, but several to come up with a solution. I tried to explain it with the comments above the new ENV variables. It boils down to: everyone except @casperklein and me will currently have problems understanding the real issue when debugging this, because the actual error originates somewhere else. Sending an email is an integral part of our tests; and it is currently flawed. Errors will only show up later, and may be unrelated - which is the worst you can have, because you start searching at the wrong place.

About the container name mismatch, this is just a nice addition we get out of these changes. But is not this PR's primary concern.

casperklein · 2023-03-03T23:29:48Z

+#
+# @param ${1} = variable name that is to be checked
+function _check_if_var_is_set() {
+  local -n REF=${1:?No variable name given to __check_if_set}


Suggested change

local -n REF=${1:?No variable name given to __check_if_set}

local -n REF=${1:?No variable name given to __check_if_var_is_set}

casperklein · 2023-03-04T00:17:40Z

The current issue is that we cannot use local VAR=$(...) since this hides errors that should actually be caught.

Are you referring to this message:

docker-mailserver/test/helper/common.bash

Lines 63 to 66 in f0edcc2

    
           else 
        
             echo 'ERROR: (helper/common.sh) Container name was either provided explicitly without the required "dms-test_" prefix, or CONTAINER_NAME is not set for implicit usage' >&2 
        
             exit 1 
        
           fi

exit 1 should make bats fail the test, right?

Then the echo could be changed to output to FD3 echo "error: bla" >&3. This way, the error message would be visible.

georglauterbach · 2023-03-04T00:35:20Z

Already tried that. While exit 1 ends the test, same issue: for reasons, the error shown after the echo is messed up: in case setup_file failed, the error says teardown_file failed (with status 0).

In the end, I don't want to pressure to merge this PR. Maybe it suffices to know that there are such issues. @polarathene is right, and I'd probably say the same: this is not a very nice change. The only reason I propose it is because I consider the alternative worse. It's all just about Bash being ugly in this regard.

casperklein · 2023-03-04T01:28:24Z

Maybe I miss something.. I did some tests (test/tests/foo):

without "local", error shows up and the test fails

load "${REPOSITORY_ROOT}/test/helper/common"

@test "foo" {
  FOO=$(__handle_container_name ${1:-})
}

test/bats/bin/bats -T test/tests/foo.bats 
foo.bats
 ✗ foo [5]
   (in test file test/tests/foo.bats, line 6)
     `FOO=$(__handle_container_name ${1:-})' failed
   ERROR: (helper/common.sh) Container name was either provided explicitly without the required "dms-test_" prefix, or CONTAINER_NAME is not set for implicit usage

1 test, 1 failure in 0 seconds

with "local", the error is hidden and the test succeed

REPOSITORY_ROOT=/docker/build/docker-mailserver/

load "${REPOSITORY_ROOT}/test/helper/common"

@test "foo" {
  local FOO=$(__handle_container_name ${1:-})
}

test/bats/bin/bats -T test/tests/foo.bats 
foo.bats
 ✓ foo [6]

1 test, 0 failures in 0 seconds

with "local", but variable assignment in a separate step, the error is shown and the test fails

load "${REPOSITORY_ROOT}/test/helper/common"

@test "foo" {
  local FOO
  FOO=$(__handle_container_name ${1:-})

}

test/bats/bin/bats -T test/tests/foo.bats 
foo.bats
 ✗ foo [5]
   (in test file test/tests/foo.bats, line 6)
     `FOO=$(__handle_container_name ${1:-})' failed
   ERROR: (helper/common.sh) Container name was either provided explicitly without the required "dms-test_" prefix, or CONTAINER_NAME is not set for implicit usage

1 test, 1 failure in 0 seconds

If I understand correctly, the issue you describe is the second example? Take a look at the third one, that should fix it. See also here.

polarathene · 2023-03-04T01:33:35Z

This test should never have run because sending the mail and acquiring the ID failed in the first place.
You didn't notice because only later is the function retrieving the ID failing (as it is not called inside $().
This is bad because it hides the actual error.

Just to clarify that I'm following along correctly.

Test failure was reported in this test case when calling this helper method:

docker-mailserver/test/tests/parallel/set1/spam_virus/rspamd.bats

Line 63 in f0edcc2

_print_mail_log_for_id "${MAIL_ID1}"

docker-mailserver/test/helper/common.bash

Lines 469 to 480 in f0edcc2

    
           # Filters the mail log for lines that belong to a certain email identified 
        
           # by its ID. You can obtain the ID of an email you want to send by using 
        
           # `_send_email_and_get_id`. 
        
           # 
        
           # @param ${1} = email ID 
        
           # @param ${2} = container name [OPTIONAL] 
        
           function _print_mail_log_for_id() { 
        
             local MAIL_ID=${1:?Mail ID must be provided} 
        
             local CONTAINER_NAME=$(__handle_container_name "${2:-}") 
        
             _run_in_container grep -F "${MAIL_ID}" /var/log/mail.log 
        
           }

But actual failure should have been raised in setup() here?:

docker-mailserver/test/tests/parallel/set1/spam_virus/rspamd.bats

Lines 32 to 34 in f0edcc2

    
           # We will send 3 emails: the first one should pass just fine; the second one should 
        
           # be rejected due to spam; the third one should be rejected due to a virus. 
        
           export MAIL_ID1=$(_send_email_and_get_id 'email-templates/existing-user1')

docker-mailserver/test/helper/sending.bash

Lines 59 to 80 in f0edcc2

    
           function _send_email_and_get_id() { 
        
             local TEMPLATE_FILE=${1:?Must provide name of template file} 
        
             local NC_PARAMETERS=${2:-0.0.0.0 25} 
        
             local MAIL_ID 
        
             assert_not_equal "${NC_PARAMETERS}" '' 
        
             assert_not_equal "${CONTAINER_NAME:-}" '' 
        
             _wait_for_empty_mail_queue_in_container 
        
             _send_email "${TEMPLATE_FILE}" 
        
             _wait_for_empty_mail_queue_in_container 
        
             # The unique ID Postfix (and other services) use may be different in length 
        
             # on different systems (e.g. amd64 (11) vs aarch64 (10)). Hence, we use a 
        
             # range to safely capture it. 
        
             MAIL_ID=$(_exec_in_container tac /var/log/mail.log              \ 
        
               | grep -E -m 1 'postfix/smtpd.*: [A-Z0-9]+: client=localhost' \ 
        
               | grep -E -o '[A-Z0-9]{9,12}' || true) 
        
             assert_not_equal "${MAIL_ID}" '' 
        
             echo "${MAIL_ID}" 
        
           }

Where the actual cause of failure was already known as being due to changing the mail logging, Postfix was no longer sending logs to syslog to add into /var/log/mail/mail.log:

docker-mailserver/test/helper/sending.bash

Lines 74 to 79 in f0edcc2

    
             MAIL_ID=$(_exec_in_container tac /var/log/mail.log              \ 
        
               | grep -E -m 1 'postfix/smtpd.*: [A-Z0-9]+: client=localhost' \ 
        
               | grep -E -o '[A-Z0-9]{9,12}' || true) 
        
             assert_not_equal "${MAIL_ID}" '' 
        
             echo "${MAIL_ID}"

So the log file exists, no error from tac there, the grep calls silently fail and we end up with || true at the end.

The assert that is meant to catch that failure didn't trigger? And then we echo the empty string back to the var in setup().

This PR attempts to resolve this by refactoring the helper method logic cited to:

  _run_in_container_bash "tac /var/log/mail/mail.log \
    | grep -E -m 1 'postfix/smtpd.*: [A-Z0-9]+: client=localhost' \
    | grep -E -o '[A-Z0-9]{9,12}'"
  assert_success

  # shellcheck disable=SC2154
  ENV_VAR_NAME="${lines[0]}"
  assert_not_equal "${ENV_VAR_NAME}" ''

BATS run is used to leverage assert_success and extract the expected return value via BATS $lines var.
The helper method now takes a string arg to reference an external var with local -n ENV_VAR_NAME, where the value is assigned instead of the echo approach.

Ok so given the above, current behaviour resulted in the || true fallback, and the assert_not_equal didn't consider that equal to an empty string.

Or I'm mistaken, and _exec_in_container() with the implicit CONTAINER_NAME had a problem, so instead of || true, some other output was returned? Seems unlikely, as that'd still have echo'd a value for MAIL_ID1 in setup(), which is missing as per the CI error log:

docker-mailserver/test/helper/common.bash

Line 476 in f0edcc2

local MAIL_ID=${1:?Mail ID must be provided}

docker-mailserver/test/tests/parallel/set1/spam_virus/rspamd.bats

Line 63 in f0edcc2

_print_mail_log_for_id "${MAIL_ID1}"

docker-mailserver/test/tests/parallel/set1/spam_virus/rspamd.bats

Line 34 in f0edcc2

export MAIL_ID1=$(_send_email_and_get_id 'email-templates/existing-user1')

So what is different beyond _run_in_container_bash paired with the removal of || true that I'm assuming is the actual problem (beyond the assert_not_equal failing to catch it), now resolved by assert_success being required instead?

Where does all the other additions from this PR come into play (the actual stuff that I'm uncomfortable with), and how does that better handle this failure scenario?

After showing the above, I am sorry to inform you that this is wrong.

CI gave an error about where the failure happened:

not ok 56 [Rspamd] normal mail passes fine in 164ms
# (from function `_print_mail_log_for_id' in file test/helper/common.bash, line 476,
#  in test file test/tests/parallel/set1/spam_virus/rspamd.bats, line 63)
#   `_print_mail_log_for_id "${MAIL_ID1}"' failed
# /home/runner/work/docker-mailserver/docker-mailserver/test/helper/common.bash: line 476: 1: Mail ID must be provided

_print_mail_log_for_id() call at rspamd.bats:L63 => common.bash:L476, arg check for MAIL_ID.
I can then look up the rspamd.bats test, and see that this arg is an export MAIL_ID1 defined in the setup() calling _send_email_and_get_id().
This is where the failure actually happened, and much earlier. I agree with you. But it's not something hidden from me to follow that trail.
Your solution was to avoid subshells and rely on BATS run with assert_success even though the problem appears to be related to the assert_not_equal not working as intended? (due to running in a subshell I assume?) So we pass a string arg to provide an external var to update?

I'm not against that change, but I'm curious how the eval approach in this PR assists with resolving that, or is that a separate fix bundled alongside it?

Oh, but there is.

I don't see one. Tests are passing and working correctly AFAIK, the failure was still caught?

I understand that there is the risk of that not happening, but there is no known issues at present that have been missed, that this PR is actually resolving right now.

Thus not an immediate issue to resolve before v12, nothing is broken or blocked that would prevent the release of v12 that would depend upon this PR.

All it has raised awareness of in our tests was a typo you missed while refactoring a test, which while awesome to see, was not causing any actual problems requiring the immediate attention of maintainers.

It is causing real problems. And they are hidden behind so much complexity, it's hard for anyone, let alone people who are not Bash experts, to understand. It took me an hour to figure out where the issue was coming from, but several to come up with a solution.

As expressed above. I remain at a stance that I disagree. I feel that this complicates maintenance further, which as you mention is already a concern.

This adds another layer to that debugging experience to consider, and I wouldn't be surprised if someone else has a similar experience where these changes only slowed them down further with their debugging tbh.

Sending an email is an integral part of our tests; and it is currently flawed. Errors will only show up later, and may be unrelated - which is the worst you can have, because you start searching at the wrong place.

While I agree. There is better alternative approaches to take.

That particular concern can be resolved, but then the changes proposed here linger as technical debt, something that we've been frustratingly clearing out from several years of contributors / maintainers decisions in the past.

If I've understood the scenario properly, the flaw is related to sending a mail and expecting a value returned back as the mail ID, with the problem being it may be empty. We have checks in place, especially when passing these expected values to other methods as args, that are correctly catching that issue and failing.

It doesn't pin-point where the exact failure originated, and that is not ideal, but it's presently not undetected, as your PR here has proved.

polarathene · 2023-03-04T01:33:42Z

But is not this PR's primary concern.

It's the only issue presently caught / resolved by the changes in this PR is my point.
The referenced scenario wasn't difficult to follow the trail and your fix refactored that helpers logic to resolve it. I don't see where the other proposed changes that are more invasive come into play with that failure.

I remain ambivalent towards this PR. If @casperklein agrees with you that it's worthwhile, by all means merge.

Maybe it suffices to know that there are such issues.

I think this is the best course of action right now. The PR can be referenced in helper/common.bash and / or maintainer docs to raise awareness of the problem with debugging tests.

As you've mentioned elsewhere, this is also a problem in our scripts outside of tests.

The only reason I propose it is because I consider the alternative worse. It's all just about Bash being ugly in this regard.

I know it's awful, and we're not exactly attracting new maintainers over the years. We can only improve the DX so much with bash scripts, and I am of the opinion this is only going to make new potential contributors more intimidated.

Our tests tend to be the biggest issue new contributors face. Realistically, if the project isn't able to attract new maintainers (or regular contributors), then it is at risk as our availability declines, I know that I can't keep up my frequency as of late or I'll burn out.

Best we can do is have the project in good shape, along with the test suite as we've been focused on. So that it does become more approachable. I do my best to add traceability to issues / PRs for that purpose, and you both do a terrific job with your contributions.

Eventually, we may shift away relying on bash. I don't expect it to be much better for attracting new contributors as we'd like pursue Rust instead of languages with larger communities like Python, but at least for us it'll be more robust / reliable.

polarathene · 2023-03-04T01:41:12Z

If I understand correctly, the issue you describe is the second example? Take a look at the third one, that should fix it. See also here.

@georglauterbach documented this in the PR common.bash under "Container name handling functionality":

functions that want to provide the users of test helper functions with a simple API that

is able to take the container name as an explicit argument via a positional parameter;

or use the globally provided CONTAINER_NAME if no explicit parameter is provided;

or abort if 1. and 2. fail.

This cannot be done with functions though. The caller needs to use local CONTAINER_NAME to only override CONTAINER_NAME for the scope of the function.

Hence, you could either use local CONTAINER_NAME=$(__some_handling_function ${SOME_POSITIONAL_PARAMETER}), but this way, the test will not actually abort since the return value in the case of the abort will not be properly set due to local (see https://www.shellcheck.net/wiki/SC2155);

or you use local CONTAINER_NAME and then CONTAINER_NAME=$(__some_handling_function ${SOME_POSITIONAL_PARAMETER}), but this way, you have just introduced CONTAINER_NAME as an empty variable, and __some_handling_function is unable to get the global value.

The referenced shellcheck SC2155 is there in the 1st option (your 2nd example), while the next option at the end (your third example) mentions a problem with introducing an empty variable.

georglauterbach · 2023-03-04T11:17:23Z

If I understand correctly, the issue you describe is the second example? Take a look at the third one, that should fix it. See also here.

I described exactly this case in my comments in common.sh. Doesn't work. When you use local CONTAINER_NAME, the global name if hidden and you cannot fall back anymore.

While I still believe this would be worthwhile, I can live with the issues our current code has and I will close this, because we have and had enough work already. I will address some notes below for understanding and context though.

georglauterbach · 2023-03-04T11:34:17Z

So what is different beyond _run_in_container_bash paired with the removal of || true that I'm assuming is the actual problem (beyond the assert_not_equal failing to catch it), now resolved by assert_success being required instead?

While this assumption seems logical, it's not quite on point. The whole point is the removal of $(), which in turn raises the issue about the need to define the variable beforehand, which does not work either.

I'm not against that change, but I'm curious how the eval approach in this PR assists with resolving that, or is that a separate fix bundled alongside it?

Eval will do an in-place evaluation, i.e. as if you wrote local CONTAINER_NAME=${1:-${CONTAINER_NAME:? ... directly. You could do that, indeed - this would resolve the need for eval. But I figured always writing the whole thing is error-prone, hence the eval.

It would resolve the mentioned issue by failing much earlier (and properly) where the error actually happened - in setup_file.

Tests are passing and working correctly AFAIK, the failure was still caught?

AFAIK, they do not quite run correctly in that the error aborts the tests at the right place. The failure was caught later because I am a good Bash programmer who made sure the _print_mail_log_for_id made assertions to ID. Otherwise, a completely different error would have been shown: assert_output would fail, which is even further away from the issue. But as long as we write the helpers, and folks do not need to touch them, I guess you're right again.

It doesn't pin-point where the exact failure originated, and that is not ideal, but it's presently not undetected, as your PR here has proved.

While you say "not ideal", I'd say it's plain wrong. I like to think about it this way:

fn main() {
  let id = if let Ok(value) = some_function_that_can_fail {
    value
  } else {
    const some_nonsense_value_even_though_this_is_the_err_case__which_will_only_show_up_later: ...
    some_nonsense_value_even_though_this_is_the_err_case__which_will_only_show_up_later
  };

  let log = print_log_for_id(id)?; 
}

Clippy would likely kill you for such a sin.

casperklein · 2023-03-04T14:33:09Z

I described exactly this case in my comments in common.sh. Doesn't work. When you use local CONTAINER_NAME, the global name if hidden and you cannot fall back anymore.

Alright. I am sorry that I've missed that! NOW I THINK I've fully understand the issue (otherwise I give up 😆) and why you chose the eval way in this PR 👍

One last try from my side to solve this riddle 😆 Instead of using the same var name for global and local, LOCAL_CONTAINER_NAME is used.

Details

#!/bin/bash

set -ueo pipefail

CONTAINER_NAME="dmstest"

function __handle_container_name() {
  if [[ -n ${1:-} ]] && [[ ${1:-} =~ ^dms-test_ ]]
  then
    printf '%s' "${1}"
    return 0
  elif [[ -n ${CONTAINER_NAME+set} ]]
  then
    printf '%s' "${CONTAINER_NAME}"
    return 0
  else
    echo 'ERROR: (helper/common.sh) Container name was either provided explicitly without the required "dms-test_" prefix, or CONTAINER_NAME is not set for implicit usage' >&2
    exit 1
  fi
}

function foo() {
	local LOCAL_CONTAINER_NAME
	LOCAL_CONTAINER_NAME=$(__handle_container_name "${1:-}")
	echo "container name: $LOCAL_CONTAINER_NAME"
}

# use global CONTAINER_NAME
foo

# use custom container name passed as parameter
foo dms-test_my_custom_container_name

# wrong custom container name passed as parameter, fall back to use global CONTAINER_NAME instead
foo wrong_custom_container_name

unset CONTAINER_NAME

# no global CONTAINER_NAME, but custom name passed as parameter
foo dms-test_my_custom_container_name

# no global CONTAINER_NAME and wrong custom container name --> print error and exit 1
foo wrong_custom_container_name

# no global CONTAINER_NAME and no custom container name --> print error and exit 1
foo

Output:

container name: dmstest
container name: dms-test_my_custom_container_name
container name: dmstest
container name: dms-test_my_custom_container_name
ERROR: (helper/common.sh) Container name was either provided explicitly without the required "dms-test_" prefix, or CONTAINER_NAME is not set for implicit usage

georglauterbach · 2023-03-04T14:44:33Z

One last try from my side to solve this riddle laughing Instead of using the same var name for global and local, LOCAL_CONTAINER_NAME is used.

Yeah, thought about that as well! This would only work if you do it like this:

function some_function() {
  local LOCAL_CONTAINER_NAME
  LOCAL_CONTAINER_NAME=$(__handle_container_name "${1:-}")
  local CONTAINER_NAME=${LOCAL_CONTAINER_NAME}

  ...
}

because functions you call afterwards expect CONTAINER_NAME to be properly set again (otherwise you gain nothing from just setting LOCAL_CONTAINER_NAME).

I didn't want to implement this because this add 2 additonal lines, and for someone looking at it, it seems to make no sense at all :D

update BATS submodule

3de3a2c

georglauterbach added area/ci kind/bugfix labels Mar 1, 2023

georglauterbach added this to the v12.0.0 milestone Mar 1, 2023

georglauterbach requested a review from polarathene March 1, 2023 15:01

georglauterbach self-assigned this Mar 1, 2023

georglauterbach added 5 commits March 1, 2023 16:17

improve setup functionality for tests

6cdc8aa

`setup.bash` is now more robust, due to the additional comments and the new `__abort` function.

adjust helpers to use new container name handling

aa26a14

found and fixed error not previously discovered

6104960

Very nice! The changes work, and they found a situation in which a container name was supplied that was deprecated.

georglauterbach force-pushed the ci/make-sending-mails-more-robust branch from 64735a7 to 6104960 Compare March 1, 2023 15:18

georglauterbach mentioned this pull request Mar 1, 2023

config/ENV: improve Postfix config for spoof protection #3127

Merged

8 tasks

georglauterbach added priority/high priority/medium labels Mar 1, 2023

georglauterbach mentioned this pull request Mar 3, 2023

[BUG] _shutdown will continue when called in a subshell (causing UB) #3140

Closed

casperklein reviewed Mar 3, 2023

View reviewed changes

georglauterbach closed this Mar 4, 2023

georglauterbach deleted the ci/make-sending-mails-more-robust branch March 4, 2023 11:17

	local -n REF=${1:?No variable name given to __check_if_set}
	local -n REF=${1:?No variable name given to __check_if_var_is_set}

Uh oh!

Conversation

georglauterbach commented Mar 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Checklist:

Uh oh!

polarathene commented Mar 2, 2023

The caught typo/bug from this approach

Is it an improvement, or the absence of extra logic?

Difficult to approve

Uh oh!

georglauterbach commented Mar 3, 2023

Uh oh!

polarathene commented Mar 3, 2023

Uh oh!

georglauterbach commented Mar 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

casperklein Mar 3, 2023

Choose a reason for hiding this comment

Uh oh!

casperklein commented Mar 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

georglauterbach commented Mar 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

casperklein commented Mar 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

polarathene commented Mar 4, 2023

Uh oh!

polarathene commented Mar 4, 2023

Uh oh!

polarathene commented Mar 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

georglauterbach commented Mar 4, 2023

Uh oh!

georglauterbach commented Mar 4, 2023

Uh oh!

casperklein commented Mar 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

georglauterbach commented Mar 4, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

georglauterbach commented Mar 1, 2023 •

edited

Loading

georglauterbach commented Mar 3, 2023 •

edited

Loading

casperklein commented Mar 4, 2023 •

edited

Loading

georglauterbach commented Mar 4, 2023 •

edited

Loading

casperklein commented Mar 4, 2023 •

edited

Loading

polarathene commented Mar 4, 2023 •

edited

Loading

casperklein commented Mar 4, 2023 •

edited

Loading