ci: misc test enhancements#2815
Conversation
Lacking justification for being in PR.
@polarathene do you have any idea, btw, why the test job does not use the cache from the build job?
So, I know it's not pretty, but I spent too much time yesterday without really getting to the bottom of it. Therefore, I figured adding the
I haven't looked in detail at this current issue. But another pragmatic approach could be: find out when the failing tests started, and then revert the last changes made to the tests. That should fix it, or am I wrong? Something was introduced that caused that issue. So instead of fixing the symptoms, I would roll back the change that introduced the issue in the first place.
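For reference, the mechanical way to find the first bad change would be a `git bisect` run. A minimal sketch, not something anyone in this thread actually ran; the `make tests` target is an assumption about how the suite is invoked:

```bash
# Sketch: bisect to the commit that first broke the test suite.
git bisect start
git bisect bad HEAD                      # current state: tests fail
git bisect good <last-known-good-sha>    # placeholder for a passing commit
git bisect run make tests                # assumes a make target that runs the suite
git bisect reset
```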
IIRC #2729 showed that CI is fragile. While I do not think #2729 introduced a bug, it unveiled the fragility. That's why it is hard to pinpoint: there is no obvious change that serves as the cause.
As I said above, this works in theory, but it is rather difficult in this case for the reasons mentioned above. I'm not for introducing
The build job did not create a new cache; it had a primary-key cache hit. The file contents that derive the cache key have been seen by the CI before. When this happens, even if something has changed that invalidates that cache, the cache action unfortunately does not upload a new one. An upload only happens when the expected cache key for the contents (the primary key that we used) didn't have a match and a fallback cache key was used instead.

So we restored a previous cached build that was uploaded under the same key, but for some reason still got invalidated. I only remember this happening while I worked on the feature, and I am wondering if we implemented the re-usable workflows correctly, or if they're incompatible here. I'd need to check whether the generated cache key is for this PR's contents, or if it was generated from

Cache key CI used:

I just noticed this is quite a problem, with commits on master often failing to pass/complete the Build, Test & Deploy workflows.
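To restate the mechanism being described: a rough sketch of the save/restore decision the cache action makes. The helper names (`restore_exact`, `restore_by_prefix`, `save_cache`) are made up for illustration, not the action's real API:

```bash
# Sketch of the cache-action behaviour described above (hypothetical helpers).
key="cache-buildx-$(sha256sum Dockerfile | cut -d' ' -f1)"  # primary key derived from file contents

if restore_exact "${key}"; then
  # Primary-key hit: the restored cache is reused as-is, and NO new cache
  # is uploaded at the end of the job - even if the restored contents
  # turn out to be invalidated.
  :
else
  restore_by_prefix 'cache-buildx-'  # fallback match via restore-keys
  # ... build runs, possibly on top of a stale fallback cache ...
  save_cache "${key}"                # upload only happens on this miss path
fi
```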
That is the same cache key used by this PR's CI build/test as was used on master for merging that recent PR. It was also the cache key used in that PR's last commits (merges from master; that was the specific job that generated the cache key quoted above), which isn't too surprising.

We can also see that non-merge commits in that PR were using different cache keys, as they should (this one's build step restored a fallback key, then generated a new one, which was used to restore the cache in later commits on the PR and to generate the same cache key for the image build being used here). Oh ok, I just noticed that this PR doesn't contribute anything that should cause a new image cache key to be created, so it appears to be behaving correctly.

Looking back at the related PR merge commit on master and the CI build job there: it does not have access to the known build cache key from the PR due to the security scoping stuff, so it needs to do a fresh build/cache upload of that. And we see that it selects and restores the

Anyway, that cache from whenever that was is invalidated and we get a fresh build. At the end of the build job, since we did not restore a primary cache key for the given contents, it uploads the same key this PR is using to the master branch cache. Now any branch/PR that derives this cache key should have access to that, and it should presently be the most recent fallback cache key too.

I am not sure what exactly is causing this invalidation, but I can see that the previous version of this PR was able to build using the cache for this same cache key:

Your last commit on that PR that built with cache was:
It's fine for now, and I doubt we're going to benefit much time-wise by reverting. It's a minor hit considering how long tests normally take, and we know that the

However, we can do better:

- docker-mailserver/test/test_helper/common.bash Lines 120 to 122 in ff96950
- docker-mailserver/test/test_helper/common.bash Lines 204 to 216 in ff96950
- docker-mailserver/test/test_helper/common.bash Lines 243 to 249 in ff96950
- docker-mailserver/test/test_helper/common.bash Lines 89 to 95 in ff96950

These are the methods you're adding sleep around, and apart from the helper I wrote with a fixed

docker-mailserver/test/test_helper/common.bash Lines 8 to 9 in ff96950

If 15 secs of sleep is really all that's required, surely we could just bump up the default a bit more. Otherwise I'd expect failures to still occur, and increasing

If we still get failures, it's not something that extra time is likely to resolve, and more an indication that something else is going wrong (as has been suggested in the earlier version of this PR for what we could try). Additionally, as per my comments here, we can switch

I also suggested considering
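For context, the retry helpers referenced above follow this general shape. A minimal sketch, assuming a hypothetical helper name and a 120-second default; the actual names and values live in `test/test_helper/common.bash` at ff96950 and may differ:

```bash
# Sketch of a retry-with-timeout helper (name and default are assumptions,
# not copied from common.bash). Retries a command once per second until it
# succeeds or the timeout budget is spent.
TEST_TIMEOUT_IN_SECONDS=${TEST_TIMEOUT_IN_SECONDS:-120}

_run_until_success_or_timeout() {
  local remaining=${TEST_TIMEOUT_IN_SECONDS}
  while ! "${@}"; do
    if (( --remaining <= 0 )); then
      echo "Timed out waiting for: ${*}" >&2
      return 1
    fi
    sleep 1
  done
}
```

With a helper like this, bumping a single ENV replaces scattered fixed `sleep` calls, which is the trade-off being weighed above.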
docker-mailserver/test/tests.bats Line 77 in dd6f967

Worth noting: this is a failure because of the return code/exit status of the

There is nothing in the output that indicates the failure, and if previous

Casper looked into a similar
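As an illustration of why such a failure can be silent: a bats test that asserts only on an exit status produces no diagnostic output of its own, whereas `run` captures the status and output for inspection. A hypothetical example, not the actual test at tests.bats line 77:

```bash
# Hypothetical bats test: `run` captures the command's exit status and output,
# so a failing assertion can surface them instead of failing silently.
@test "example: surface the exit status" {
  run docker exec mail some_command   # container name and command are placeholders
  # bats only prints a test's output when it fails, so log the captured state:
  echo "status=${status} output=${output}"
  [[ ${status} -eq 0 ]]
}
```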
Very nice investigative work @polarathene! I think this PR has some nice additions too that should make it into v12 later :) Feel free to review :D

PS: Sorry for the force-push - forgot to sign the commit.
force-pushed from c5b9741 to f5fad61
A few questions, and a request to remove the `TEST_TIMEOUT_IN_SECONDS` addition at the start of `tests.bats`.
The rest of the feedback is just optional suggestions for `clamav.bats`, as the "checking" bit seemed redundant and I'd rather the space suffix be moved out of the ENV - although the suggestions reduce the usefulness of the ENV a bit 😅
Co-authored-by: Brennan Kinney <[email protected]>
- removed unnecessary configuration
- added test for virus detection to `clamav.bats`
I have changed the title as @casperklein proposed (it was indeed not fitting anymore).
polarathene left a comment
LGTM 👍
Two small questions remain :)
Description
Fresh start of #2811
Type of change
Checklist: