Add timeout in CLI waiting for autoremove#31744
Conversation
ccf8195 to
1cda2b7
Compare
This makes sure the CLI doesn't hang in case, for whatever reason, an event never comes. Times out 30seconds after the `die` event is received. Additionally polls the container inspect API every 5 seconds Signed-off-by: Brian Goff <[email protected]>
1cda2b7 to
6059e44
Compare
|
So many levels of indentation makes my head hurt. That's even after you removed one. Was there an issue that prompted this work? |
MHBauer
left a comment
There was a problem hiding this comment.
as far as I understand it LGTM.
Only question is curious why to put the new arg to waitExitOrRemoved "in the middle" and not on the right edge.
| var removeErr error | ||
| statusChan := make(chan int) | ||
| exitCode := 125 | ||
| ctxWithCancel, cancel := context.WithCancel(ctx) |
There was a problem hiding this comment.
Cancellable context, good to see this used.
| } | ||
|
|
||
| go func() { | ||
| statusChan <- waitExitedOrRemovedEvent(ctxWithCancel, dockerCli, containerID, waitRemove, options) |
There was a problem hiding this comment.
start the event processor in a separate thread
| } | ||
|
|
||
| func waitRemovedPolling(ctx context.Context, dockerCli *command.DockerCli, containerID string) { | ||
| ticker := time.NewTicker(5 * time.Second) |
There was a problem hiding this comment.
Is this anything we need to worry about being configurable?
| select { | ||
| case <-attachDone: | ||
| case <-ctxWithCancel.Done(): | ||
| return |
There was a problem hiding this comment.
bail early if we've finished and cleaned up already.
| case <-ctx.Done(): | ||
| return | ||
| case <-ticker.C: | ||
| _, err := dockerCli.Client().ContainerInspect(ctx, containerID) |
There was a problem hiding this comment.
Anything else we need to do in here? Or just loop until the container is gone?
There was a problem hiding this comment.
So, I guess if anything happened that caused container failed to remove, the CLI could still hang? right?
There was a problem hiding this comment.
Sorry, I didn't see how the 30s works. In my understanding, if we failed to remove container, and container is always there, waitRemovedPolling won't exit, isn't that true?
There was a problem hiding this comment.
The context will get cancelled.
There was a problem hiding this comment.
The waitRemovedPolling rely on line 34: https://github.com/docker/docker/pull/31744/files#diff-1fe5ea7c675b9700809b89d15a414576R34 to cancel the context right? and line 34 waitExitedOrRemovedEvent has a param of ctxWithCancel which doesn't have a timeout, so it still rely on Event API. If container fail to remove, there's no destroy event and nobody will cancel the context. 30s timeout only works after waitRemovedPolling returns.
That's why I think it removal failed, this could still hang~
There was a problem hiding this comment.
@WeiZhang555 Today, a remove can't fail (when using --force, which autoremove does) unless there is something horribly wrong on the system (and kill -9 doesn't work).
There was a problem hiding this comment.
We can certainly put a timeout on the poller, or start warning after a period of time or something, but the user can just as easily cancel out of it.
| eventq, errq := dockerCli.Client().Events(eventCtx, eventOpts) | ||
|
|
||
| exitCode = 125 // set default exit code | ||
| eventProcessor := func(e events.Message) bool { |
There was a problem hiding this comment.
process a single event and indicate if we've received an event telling us to finish or not.
| } | ||
| case err := <-errq: | ||
| logrus.Errorf("error getting events from daemon: %v", err) | ||
| for { |
There was a problem hiding this comment.
loop forever processing events.
|
@MHBauer Mostly because I had to rebase this change on some other refactorings that happened and just kept stuff in place. |
|
The most recent other person prior in this code looks like @WeiZhang555. Any opinions here? |
|
I can understand the logic, this looks more complicated than it's first implemented, I'm just wondering if this is worth it without a specific issue as @MHBauer said. If the |
|
@WeiZhang555 Yes there is a specific issue, a couple really:
I believe @tonistiigi has also mentioned adjusting the wait API to support removal as well. |
|
The implementation looks good to me 😄 |
|
Closing for now in favor of #32237 |
This makes sure the CLI doesn't hang in case, for whatever reason, an
event never comes.