daemon: daemon.containerRestart: don't cancel restart on context cancel #46688
thaJeztah merged 1 commit into moby:master
Conversation
Looking to see if I can come up with a test somehow. I can probably write a test that immediately cancels the context, but that wouldn't be testing the "half-way" situation (although knowing that we already fixed canceling the context for
thaJeztah force-pushed from fe084a1 to ffd9765 (Compare)
|
Test is not happy on Windows 😞
thaJeztah force-pushed from c72740e to 88225ad (Compare)
corhere
left a comment
Ideally, the decision on whether the container-restart operation is cancelable should be left up to the caller of daemon.ContainerRestart(). But there are two problems with that:
- It would make `daemon.ContainerRestart()` inconsistent with `daemon.ContainerStop()`. (#46687)
- `daemon.ContainerRestart()` starts by calling `daemon.GetContainer()`, which we eventually want to plumb contexts into for containerd reasons. We need a way for the caller to be able to apply a different cancelation policy to the `GetContainer()` call than to the stop-restart operation itself.
Addressing those problems is a bigger task which would blow up the scope of this PR, so would best be saved for a follow-up.
```go
			return events.Message{}, nil
		}
		return events.Message{}, err
	case <-time.After(restartTimeout):
```
Each turn of the loop resets the timeout. That doesn't seem right. Shouldn't this timer be started outside the loop?
Doh! I recall I went looking to see if we have a utility for handling the event stream in tests, landed on TestEventsVolumeCreate as "somewhat close", and copied that:
moby/integration/system/event_test.go
Lines 139 to 155 in 452ca90
As we're filtering, we'd only expect a single event here, so I can just remove the for loop altogether.
Actually, I wouldn't even need the closure at all here, so let me inline the things.
thaJeztah force-pushed from 88225ad to 2397030 (Compare)
commit def549c passed through the context to the daemon.ContainerStart function. As a result, restarting containers is no longer an atomic operation, because a context cancellation could interrupt the restart (between "stopping" and "(re)starting"), resulting in the container being stopped, but not restarted.

Restarting a container, or more precisely, making a successful request on the `/containers/{id}/restart` endpoint, should be an atomic operation.

This patch uses a context.WithoutCancel for restart requests.

It's worth noting that daemon.containerStop already uses context.WithoutCancel, so in that function we'll be wrapping the context twice, but this should likely not cause issues (just redundant for this code-path).

Before this patch, starting a container that bind-mounts the docker socket, then restarting itself from within the container, would cancel the restart operation. The container would be stopped, but not started after that:

```console
docker run -dit --name myself -v /var/run/docker.sock:/var/run/docker.sock docker:cli sh
docker exec myself sh -c 'docker restart myself'
docker ps -a
CONTAINER ID   IMAGE        COMMAND                  CREATED          STATUS                       PORTS     NAMES
3a2a741c65ff   docker:cli   "docker-entrypoint.s…"   26 seconds ago   Exited (128) 7 seconds ago             myself
```

With this patch: the stop still cancels the exec, but does not cancel the restart operation, and the container is started again:

```console
docker run -dit --name myself -v /var/run/docker.sock:/var/run/docker.sock docker:cli sh
docker exec myself sh -c 'docker restart myself'
docker ps
CONTAINER ID   IMAGE        COMMAND                  CREATED              STATUS         PORTS     NAMES
4393a01f7c75   docker:cli   "docker-entrypoint.s…"   About a minute ago   Up 4 seconds             myself
```

Signed-off-by: Sebastiaan van Stijn <[email protected]>
thaJeztah force-pushed from 2397030 to aeb8972 (Compare)
```go
	// Make restart request, but cancel the request before the container
	// is (forcibly) killed.
	ctx2, cancel := context.WithTimeout(ctx, 100*time.Millisecond)
```
Do we need the timeout? Can we use WithCancel here?
Hm, good point; not sure why I did the timeout here 🤔 this may have been from an earlier iteration where I tried a slightly different approach 🙈
Guess it doesn't hurt, but indeed it probably wasn't needed.
commit def549c (#44365) passed through the context to the daemon.ContainerStart function. As a result, restarting containers is no longer an atomic operation, because a context cancellation could interrupt the restart (between "stopping" and "(re)starting"), resulting in the container being stopped, but not restarted.

Restarting a container, or more precisely, making a successful request on the `/containers/{id}/restart` endpoint, should be an atomic operation.

This patch uses a `context.WithoutCancel` for restart requests.
It's worth noting that daemon.containerStop already uses context.WithoutCancel, so in that function, we'll be wrapping the context twice, but this should likely not cause issues (just redundant for this code-path).
Before this patch, starting a container that bind-mounts the docker socket, then restarting itself from within the container would cancel the restart operation. The container would be stopped, but not started after that:
With this patch: the stop still cancels the exec, but does not cancel the restart operation, and the container is started again:
- What I did
- How I did it
- How to verify it
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)