freezer: add delay after freeze #2941
Merged
kolyshkin merged 1 commit into opencontainers:master from May 6, 2021
AkihiroSuda
reviewed
May 6, 2021
libcontainer/cgroups/fs/freezer.go
Outdated
    }

    if i%25 == 24 {
        // A short sleep before reading back also helps.
Member
Could you update the comment to clarify this helps what
Contributor
Author
There's a big comment above telling the whole story...
Ah, I just found it now contradicts what I say here. Will fix.
Contributor
Author
This is from a "good" VM (once you start testing, GHA gives you a good VM...). Out of 400 runs, only 147 needed more than 1 retry. Out of those that needed a retry, there are peaks at 24 and 49, meaning the sleep helps.
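To make the "peaks at 24 and 49" reasoning concrete, here is a minimal sketch of how retry counts could be bucketed. It assumes the reported retry count equals the zero-based loop index `i` from the reviewed snippet, so a count satisfying `i%25 == 24` means the state was read back right after the short sleep. The names `confirmedAfterSleep`, `retryHistogram`, and `countAfterSleep` are hypothetical and not part of the PR.

```go
package main

// sleepEveryN mirrors the i%25 == 24 condition in the reviewed snippet.
const sleepEveryN = 25

// confirmedAfterSleep reports whether a freeze confirmed on zero-based
// iteration i was read back right after the short sleep.
// Assumption: the logged retry count equals the loop index i.
func confirmedAfterSleep(i int) bool {
	return i%sleepEveryN == sleepEveryN-1
}

// retryHistogram counts how many runs needed each number of retries.
func retryHistogram(retries []int) map[int]int {
	h := make(map[int]int)
	for _, r := range retries {
		h[r]++
	}
	return h
}

// countAfterSleep returns how many runs were confirmed right after a sleep.
func countAfterSleep(retries []int) int {
	n := 0
	for _, r := range retries {
		if confirmedAfterSleep(r) {
			n++
		}
	}
	return n
}
```

Feeding the per-run retry counts from the debug output into `retryHistogram` would show whether buckets 24 and 49 stand out, which is the pattern described above.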
Contributor
LGTM
I hate to keep adding those kludges, but lately TestFreeze (and TestSystemdFreeze) from libcontainer/integration fails a lot.

The failure comes and goes, and is probably caused by a slow host allocated for the test, and a slow VM on top of it.

To remediate, add a small sleep on every 25th iteration in between asking the kernel to freeze and checking its status.

In the worst case scenario (failure to freeze), this adds about 0.4 ms (40 x 10 us) to the duration of the call.

It is hard to measure how this affects CI as GHA plays roulette when allocating a node to run the test on, but it seems to help. With additional debug info, I saw somewhat frequent "frozen after 24 retries" or "frozen after 49 retries", meaning it succeeded right after the added sleep.

While at it, rewrite/improve the comments.

Signed-off-by: Kir Kolyshkin <[email protected]>
b787703 to 524abc5
mrunalp
approved these changes
May 6, 2021
AkihiroSuda
approved these changes
May 6, 2021
This was referenced May 6, 2021
I hate to keep adding those kludges (for earlier ones, see #2918, #2791, #2774)
but lately TestFreeze (and TestSystemdFreeze) from libcontainer/integration
fails a lot (see #2907).
The failure comes and goes, and is probably caused by a slow host
allocated for the test, and a slow VM on top of it.
To remediate, add a small sleep on every 25th iteration in between
asking the kernel to freeze and checking its status.
In the worst case scenario (failure to freeze), this adds about 0.4 ms
(40 x 10 us) to the duration of the call.
It is hard to measure how this affects CI as GHA plays roulette when
allocating a node to run the test on, but it seems to help. With
additional debug info, I saw somewhat frequent "frozen after 24 retries"
or "frozen after 49 retries", meaning it succeeded right after the added
sleep.
While at it, rewrite/improve the comments.
Fixes: #2907.
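
The retry scheme described above can be sketched roughly as follows. This is a simplified illustration, not the code from libcontainer/cgroups/fs/freezer.go: the `stateFile` interface and `fakeState` type are hypothetical stand-ins for the cgroup `freezer.state` file, and the limit of 1000 iterations is an assumption inferred from the "40 x 10 us" figure (40 sleeps, one on every 25th iteration).

```go
package main

import (
	"errors"
	"time"
)

// stateFile is a hypothetical stand-in for the cgroup v1 freezer.state file.
type stateFile interface {
	Write(state string) error
	Read() (string, error)
}

const (
	maxRetries = 1000 // assumption: 40 sleeps on every 25th iteration
	sleepEvery = 25
	shortSleep = 10 * time.Microsecond
)

// freeze repeatedly asks for FROZEN and reads the state back. On every
// 25th iteration it sleeps briefly between the write and the read.
// It returns the zero-based iteration on which FROZEN was confirmed.
func freeze(f stateFile) (int, error) {
	for i := 0; i < maxRetries; i++ {
		if err := f.Write("FROZEN"); err != nil {
			return i, err
		}
		if i%sleepEvery == sleepEvery-1 {
			// Give a slow host a moment before reading back.
			time.Sleep(shortSleep)
		}
		state, err := f.Read()
		if err != nil {
			return i, err
		}
		if state == "FROZEN" {
			return i, nil
		}
	}
	return maxRetries, errors.New("unable to freeze")
}

// worstCaseSleep is the total sleep added when the freeze never succeeds.
func worstCaseSleep() time.Duration {
	return time.Duration(maxRetries/sleepEvery) * shortSleep
}

// fakeState reports FREEZING for the first `after` reads, then FROZEN.
type fakeState struct{ reads, after int }

func (s *fakeState) Write(string) error { return nil }

func (s *fakeState) Read() (string, error) {
	s.reads++
	if s.reads > s.after {
		return "FROZEN", nil
	}
	return "FREEZING", nil
}
```

With `fakeState{after: 24}`, `freeze` confirms the state on iteration 24, which is the first iteration that sleeps before reading back. `worstCaseSleep()` evaluates to 400 µs under the assumed constants, matching the 0.4 ms estimate.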