Skip to content

daemon: daemon.containerRestart: don't cancel restart on context cancel#46688

Merged
thaJeztah merged 1 commit intomoby:masterfrom
thaJeztah:restart_nocancel
Oct 26, 2023
Merged

daemon: daemon.containerRestart: don't cancel restart on context cancel#46688
thaJeztah merged 1 commit intomoby:masterfrom
thaJeztah:restart_nocancel

Conversation

@thaJeztah
Copy link
Copy Markdown
Member

@thaJeztah thaJeztah commented Oct 20, 2023

commit def549c (#44365) passed through the context to the daemon.ContainerStart function. As a result, restarting containers no longer is an atomic operation, because a context cancellation could interrupt the restart (between "stopping" and "(re)starting"), resulting in the container being stopped, but not restarted.

Restarting a container, or more factually; making a successful request on the /containers/{id]/restart endpoint, should be an atomic operation.

This patch uses a context.WithoutCancel for restart requests.

It's worth noting that daemon.containerStop already uses context.WithoutCancel, so in that function, we'll be wrapping the context twice, but this should likely not cause issues (just redundant for this code-path).

Before this patch, starting a container that bind-mounts the docker socket, then restarting itself from within the container would cancel the restart operation. The container would be stopped, but not started after that:

docker run -dit --name myself -v /var/run/docker.sock:/var/run/docker.sock docker:cli sh
docker exec myself sh -c 'docker restart myself'

docker ps -a
CONTAINER ID   IMAGE         COMMAND                  CREATED          STATUS                       PORTS     NAMES
3a2a741c65ff   docker:cli    "docker-entrypoint.s…"   26 seconds ago   Exited (128) 7 seconds ago             myself

With this patch: the stop still cancels the exec, but does not cancel the restart operation, and the container is started again:

docker run -dit --name myself -v /var/run/docker.sock:/var/run/docker.sock docker:cli sh
docker exec myself sh -c 'docker restart myself'
docker ps
CONTAINER ID   IMAGE        COMMAND                  CREATED              STATUS         PORTS     NAMES
4393a01f7c75   docker:cli   "docker-entrypoint.s…"   About a minute ago   Up 4 seconds             myself

- What I did

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@thaJeztah
Copy link
Copy Markdown
Member Author

Looking if i can come up with a test somehow. I can probably write a test that immediately cancels the context, but that wouldn't be testing the "half-way" situation (although knowing that we already fixed canceling the context for containerStop (in #45738), perhaps that's still enough.

@thaJeztah
Copy link
Copy Markdown
Member Author

thaJeztah commented Oct 21, 2023

Test is not happy on windows 😞

=== FAIL: github.com/docker/docker/integration/container TestContainerRestartWithCancelledRequest (4.63s)
    restart_test.go:286: assertion failed: error is not nil: timeout waiting for restart event
    restart_test.go:244: Cleaning up test container failed with error: 
      Error response from daemon: container c25e1bc8c9082a4c579db9d41ceee15c20019b58c4638cdd941b3d3781d915a1: 
        driver "windowsfilter" failed to remove root filesystem: second rename attempt following detach failed: 
          rename C:\Users\runneradmin\AppData\Local\Temp\moby-root\windowsfilter\c25e1bc8c9082a4c579db9d41ceee15c20019b58c4638cdd941b3d3781d915a1 C:\Users\runneradmin\AppData\Local\Temp\moby-root\windowsfilter\c25e1bc8c9082a4c579db9d41ceee15c20019b58c4638cdd941b3d3781d915a1-removing: 
            Access is denied.: 
              rename C:\Users\runneradmin\AppData\Local\Temp\moby-root\windowsfilter\c25e1bc8c9082a4c579db9d41ceee15c20019b58c4638cdd941b3d3781d915a1 C:\Users\runneradmin\AppData\Local\Temp\moby-root\windowsfilter\c25e1bc8c9082a4c579db9d41ceee15c20019b58c4638cdd941b3d3781d915a1-removing: 
                Access is denied.

@thaJeztah thaJeztah force-pushed the restart_nocancel branch 2 times, most recently from c72740e to 88225ad Compare October 23, 2023 09:41
@thaJeztah
Copy link
Copy Markdown
Member Author

@vvoland @rumpl PTAL (writing the test was a bit finicky; suggestions on better ways to test this welcome!)

Copy link
Copy Markdown
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, the decision on whether the container-restart operation is cancelable should be left up to the caller of daemon.ContainerRestart(). But there are two problems with that:

  1. It would make daemon.ContainerRestart() inconsistent with daemon.ContainerStop(). (#46687)
  2. daemon.ContainerRestart() starts by calling daemon.GetContainer(), which we eventually want to plumb contexts into for containerd reasons. We need a way for the caller to be able to apply a different cancelation policy to the GetContainer() call than to the stop-restart operation itself.

Addressing those problems is a bigger task which would blow up the scope of this PR, so would best be saved for a follow-up.

Comment thread integration/container/restart_test.go Outdated
return events.Message{}, nil
}
return events.Message{}, err
case <-time.After(restartTimeout):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each turn of the loop resets the timeout. That doesn't seem right. Shouldn't this timer be started outside the loop?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh! I recall I went looking if we have a utility for handling the event stream in tests, and landed on TestEventsVolumeCreate as "somewhat close", and copied that

getEvents := func(messages <-chan events.Message, errs <-chan error) ([]events.Message, error) {
var evts []events.Message
for {
select {
case m := <-messages:
evts = append(evts, m)
case err := <-errs:
if err == io.EOF {
return evts, nil
}
return nil, err
case <-time.After(time.Second * 3):
return nil, errors.New("timeout hit")
}
}
}

As we're filtering, we'd only expect a single event here, so I can just remove the for loop altogether.

Actually, I wouldn't even need the closure at all here, so let me inline the things.

commit def549c passed through the context
to the daemon.ContainerStart function. As a result, restarting containers
no longer is an atomic operation, because a context cancellation could
interrupt the restart (between "stopping" and "(re)starting"), resulting
in the container being stopped, but not restarted.

Restarting a container, or more factually; making a successful request on
the `/containers/{id]/restart` endpoint, should be an atomic operation.

This patch uses a context.WithoutCancel for restart requests.

It's worth noting that daemon.containerStop already uses context.WithoutCancel,
so in that function, we'll be wrapping the context twice, but this should
likely not cause issues (just redundant for this code-path).

Before this patch, starting a container that bind-mounts the docker socket,
then restarting itself from within the container would cancel the restart
operation. The container would be stopped, but not started after that:

    docker run -dit --name myself -v /var/run/docker.sock:/var/run/docker.sock docker:cli sh
    docker exec myself sh -c 'docker restart myself'

    docker ps -a
    CONTAINER ID   IMAGE         COMMAND                  CREATED          STATUS                       PORTS     NAMES
    3a2a741c65ff   docker:cli    "docker-entrypoint.s…"   26 seconds ago   Exited (128) 7 seconds ago             myself

With this patch: the stop still cancels the exec, but does not cancel the
restart operation, and the container is started again:

    docker run -dit --name myself -v /var/run/docker.sock:/var/run/docker.sock docker:cli sh
    docker exec myself sh -c 'docker restart myself'
    docker ps
    CONTAINER ID   IMAGE        COMMAND                  CREATED              STATUS         PORTS     NAMES
    4393a01f7c75   docker:cli   "docker-entrypoint.s…"   About a minute ago   Up 4 seconds             myself

Signed-off-by: Sebastiaan van Stijn <[email protected]>
@thaJeztah
Copy link
Copy Markdown
Member Author

okay; this should be ready for re-review; @vvoland @rumpl @corhere

@thaJeztah thaJeztah merged commit e0476be into moby:master Oct 26, 2023
@thaJeztah thaJeztah deleted the restart_nocancel branch October 26, 2023 07:50

// Make restart request, but cancel the request before the container
// is (forcibly) killed.
ctx2, cancel := context.WithTimeout(ctx, 100*time.Millisecond)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the timeout? Can we use WithCancel here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, good point; not sure why I did the timeout here 🤔 this may have been from an earlier iteration where I tried a slightly different approach 🙈

Guess it doesn't harm, but indeed probably wouldn't have been needed.

Copy link
Copy Markdown
Member

@rumpl rumpl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM just have one question

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants