Skip to content

integration/container: Deflake TestWaitBlocked and TestWaitRestartedContainer#49618

Merged
vvoland merged 2 commits intomoby:masterfrom
vvoland:container-wait-deflake
Mar 14, 2025
Merged

integration/container: Deflake TestWaitBlocked and TestWaitRestartedContainer#49618
vvoland merged 2 commits intomoby:masterfrom
vvoland:container-wait-deflake

Conversation

@vvoland
Copy link
Copy Markdown
Contributor

@vvoland vvoland commented Mar 10, 2025

Remove Parallel execution and increase stop timeout seems to help.

@vvoland vvoland self-assigned this Mar 10, 2025
@vvoland vvoland force-pushed the container-wait-deflake branch 6 times, most recently from ba8bb31 to 3cca323 Compare March 11, 2025 12:09
@vvoland vvoland changed the title container/wait: Fix early return for containers in Created state integration/container: Deflake TestWaitBlocked and TestWaitRestartedContainer Mar 11, 2025
@vvoland vvoland force-pushed the container-wait-deflake branch 2 times, most recently from 4e4a511 to 876007c Compare March 12, 2025 15:00
@vvoland vvoland marked this pull request as ready for review March 12, 2025 16:42
@vvoland vvoland requested review from robmry and thaJeztah March 12, 2025 19:13
@robmry
Copy link
Copy Markdown
Contributor

robmry commented Mar 12, 2025

Does the t.Parallel break it because some of the container-stop is serialised in the daemon, so the stops end up queued anyway, then 2s isn't enough for the last in the queue? (Just wondering if increasing the 2s timeout works as an alternative, because testing stops running in parallel might uncover something interesting?)

@vvoland vvoland force-pushed the container-wait-deflake branch from 876007c to 16d7d8a Compare March 13, 2025 12:16
@vvoland
Copy link
Copy Markdown
Contributor Author

vvoland commented Mar 13, 2025

Nope, the 10s timeout is hit on the daemon side so it's not about the 2s wait timeout.

I don't have a good explanation yet why removing Parallel works. Or maybe it actually doesn't and I wasn't able to hit the jackpot 🙈

I'm still willing to investigate this more, but would like to first make sure that removing Parallel is the key here.

@vvoland
Copy link
Copy Markdown
Contributor Author

vvoland commented Mar 13, 2025

Actually, let me try to test one more change on this one.

@vvoland vvoland marked this pull request as draft March 13, 2025 12:19
@vvoland vvoland force-pushed the container-wait-deflake branch from 16d7d8a to a39d6c9 Compare March 13, 2025 14:56
vvoland added 2 commits March 13, 2025 18:09
It seems to help with the flakiness in the CI.
However, I can't reproduce the flakiness locally.

Signed-off-by: Paweł Gronowski <[email protected]>
@vvoland vvoland force-pushed the container-wait-deflake branch from a39d6c9 to ac34bd9 Compare March 13, 2025 17:09
@vvoland
Copy link
Copy Markdown
Contributor Author

vvoland commented Mar 13, 2025

I can't reproduce the flakiness locally so I think it's related to the system load of the CI machine? 🤷🏻

@vvoland vvoland marked this pull request as ready for review March 14, 2025 11:55
@vvoland
Copy link
Copy Markdown
Contributor Author

vvoland commented Mar 14, 2025

Reverted to the initial approach

Copy link
Copy Markdown
Contributor

@robmry robmry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for investigating ... is the conclusion that parallel stops actually are flaky in the daemon, so it's not "just" a test problem? (Do we need an issue to track that?)

Either way, we can't have flaky tests - so this change LGTM.

@vvoland
Copy link
Copy Markdown
Contributor Author

vvoland commented Mar 14, 2025

No concrete idea yet, for some reason the container just doesn't exit after the daemon sends the SIGTERM.
Perhaps the trap 'exit 0' TERM doesn't have the chance to execute before ContainerStop is called? 🤔 But I have no explanation why removing the Parallel would help that.

I'll open a ticket for it and keep an eye on this test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky test: TestWaitBlocked, TestWaitRestartedContainer Flaky test: TestWaitRestartedContainer

2 participants