Fix racey TestHealthKillContainer#42958
Merged
thaJeztah merged 1 commit intomoby:masterfrom Oct 21, 2021
Merged
Conversation
Before this change if you assume that things work the way the test expects them to (it does not, but lets assume for now) we aren't really testing anything because we are testing that a container is healthy before and after we send a signal. This will give false positives even if there is a bug in the underlying code. Sending a signal can take any amount of time to cause a container to exit or to trigger healthchecks to stop or whatever. Now lets remove the assumption that things are working as expected, because they are not. In this case, `top` (which is what is running in the container) is actually exiting when it receives `USR1`. This totally invalidates the test. We need more control and knowledge as to what is happening in the container to properly test this. This change introduces a custom script which traps `USR1` and flips the health status each time the signal is received. We then send the signal twice so that we know the change has occurred and check that the value has flipped so that we know the change has actually occurred. Signed-off-by: Brian Goff <[email protected]>
thaJeztah
approved these changes
Oct 21, 2021
|
|
||
| while true; do sleep 1; done | ||
| ` | ||
| c.Config.Cmd = []string{"/bin/sh", "-c", cmd} |
Member
There was a problem hiding this comment.
Actually surprised the multiline cmd works. Makes sense that it does, just that I didn't directly thing of doing it this way (without writing it to a file)
Nice! 👍👍
tonistiigi
approved these changes
Oct 21, 2021
Member
tonistiigi
left a comment
There was a problem hiding this comment.
LGTM, but maybe slightly better would be to use different signals (usr1, usr2) to set healthy vs unhealthy instead of flipping between them. This would make it more certain that we are reacting to the correct kill call. Just checking [ ! -f /unhealthy ] could be simpler as well then.
samuelkarp
approved these changes
Oct 21, 2021
Member
samuelkarp
left a comment
There was a problem hiding this comment.
LGTM, glad this is an actual test now.
Member
|
Failure is unrelated, and a known flaky test |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Before this change if you assume that things work the way the test
expects them to (it does not, but lets assume for now) we aren't really
testing anything because we are testing that a container is healthy
before and after we send a signal. This will give false positives even
if there is a bug in the underlying code. Sending a signal can take any
amount of time to cause a container to exit or to trigger healthchecks
to stop or whatever.
Now lets remove the assumption that things are working as expected,
because they are not.
In this case,
top(which is what is running in the container) isactually exiting when it receives
USR1.This totally invalidates the test.
We need more control and knowledge as to what is happening in the
container to properly test this.
This change introduces a custom script which traps
USR1and flips thehealth status each time the signal is received.
We then send the signal twice so that we know the change has occurred
and check that the value has flipped so that we know the change has
actually occurred.
Fixes #41930