Fix hang when container fails to start#5060
Conversation
|
Should also add a test that would catch a regression such as this, but I'll leave that for tomorrow 😅 |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #5060 +/- ##
=======================================
Coverage 61.06% 61.07%
=======================================
Files 295 295
Lines 20667 20670 +3
=======================================
+ Hits 12621 12624 +3
Misses 7147 7147
Partials 899 899 |
|
Will this affect client-side handling of |
|
I don't think it should, that check/path happens before the changes in this patch/the other changes right? cli/cli/command/container/utils.go Lines 26 to 28 in 647ccf3 |
f431514 to
109a06e
Compare
109a06e to
32d7da6
Compare
vvoland
left a comment
There was a problem hiding this comment.
LGTM; left a small nit, but feel free to ignore 😅
Signed-off-by: Laura Brehm <[email protected]>
Signed-off-by: Laura Brehm <[email protected]>
thaJeztah
left a comment
There was a problem hiding this comment.
LGTM
We should probably look at some of our tests to make sure cleanup happens; it looks like icmd.RunCmd may be broken (as it doesn't set the timeout on the cmd that's passed to icmd.WaitOnCmd). But while doing a little test (running sleep 10 instead of invalid command) I noticed that the container is not cleaned up when the test completes; we may either need some cleanup step as part of tests, or as part of the setup of the e2e tests 🤔
|
Agree! I'll write a ticket somewhere for us to remember to do that, should look at all the tests and check that. |
|
Thanks! Yes, looks like a tracking ticket won't hurt. In most cases it probably shouldn't be problematic, but it could explain some flakiness (containers from other tests still being around, causing tests to be inconsistent). In either case, not critical for this PR; I think this PR should be good to go |
- What I did
fixes #5053
When running a container, if it fails to start, we immediately cancel the context used for
waitExitOrRemovedand, if running the container with--rm, wait on the channel:cli/cli/command/container/run.go
Lines 196 to 205 in 870ad7f
However, in 840016e, we added the following line
cli/cli/command/container/utils.go
Lines 40 to 41 in 647ccf3
but in this case, we return from the goroutine without notifying the caller, which causes the caller to hang.
- How I did it
We should not be cancelling the context used to wait on the container exit/removal in this case, and it's only happening by accident due to this line, which is there for other reasons
cli/cli/command/container/run.go
Lines 196 to 199 in 05cba3f
So I used a new context for the
waitExitOrRemovedcall.Also, close the returned channel to unblock any caller when we hit this branch, and added another
closeto a case down where we added another return without notifying the caller.- How to verify it
Reproduce such as:
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)