Merged
7 changes: 7 additions & 0 deletions daemon/stop.go
@@ -93,6 +93,9 @@ func (daemon *Daemon) containerStop(ctx context.Context, ctr *container.Containe
defer cancel()

if status := <-ctr.Wait(subCtx, containertypes.WaitConditionNotRunning); status.Err() == nil {
// Ensure container status changes are committed by handler of container exit before returning control to the caller
ctr.Lock()
defer ctr.Unlock()
Comment on lines 95 to +98
thaJeztah (Member):
Just a quick blurb: I started to dig into this PR again to check code paths, but didn't finish, so let me jot down some thoughts I had while looking.

I started to wonder if there's a bug in ctr.Wait, or at least it looks odd that we have a function that returns a channel (likely intended to handle synchronisation) but whose contract appears to be "broken" (i.e., it signals before the work is fully done).

Perhaps it's "ok" to use the ctr.Lock() -> Unlock to work around that issue as a short-term solution; I'm mostly wondering if we're still papering over the underlying issue, and whether this is something that should ideally be fixed in Wait (or elsewhere).
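To make the workaround concrete, here is a toy Go model (not the actual moby code; `toyContainer` and `stopAndObserve` are illustrative names) of why taking the container lock after the Wait channel fires acts as a barrier: the exit handler holds the lock while committing the new state, so a caller that receives from the channel and then locks cannot observe the pre-commit state.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// toyContainer stands in for container.Container: state guarded by a mutex.
type toyContainer struct {
	mu      sync.Mutex
	stopped bool
}

// stopAndObserve spawns a fake exit handler that signals the wait
// channel before its state commit is finished, then returns the state
// the caller sees after <-wait followed by Lock().
func stopAndObserve() bool {
	c := &toyContainer{}
	wait := make(chan struct{})

	go func() {
		c.mu.Lock()
		close(wait)                       // the Wait-style channel fires early...
		time.Sleep(10 * time.Millisecond) // ...simulating a slow state commit
		c.stopped = true
		c.mu.Unlock()
	}()

	<-wait
	c.mu.Lock() // barrier: blocks until the handler finishes committing
	defer c.mu.Unlock()
	return c.stopped
}

func main() {
	fmt.Println("observed stopped =", stopAndObserve())
}
```

Without the final Lock(), the caller could read `stopped == false` even though the wait channel already fired; with it, the read always sees the committed state.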

mabrarov (Contributor Author), Jun 11, 2025:
Hi @thaJeztah,

I was thinking the same, but I didn't find a better solution for container.State.Wait.

Ideally, all code that wants to access (e.g. read) the state of a container should use ctr.Lock() + ctr.Unlock(). That is a pretty clear requirement which was already met (even before this PR). The daemon.Daemon.containerStop method (in its current implementation) doesn't need to access container state directly - all it needs is to wait for the container to stop, and that requirement was in place even before this PR (please note that with this PR we make the /containers/{id}/stop Docker Engine API slower by introducing additional synchronization - that's why I'm worried about a regression).

IMHO, the issue is located not in container.State.Wait, but in the daemon.Daemon.Containers method and in the daemon.Daemon.containersReplica field (the "read-only view" used by daemon.Daemon.Containers). I understand the need for the daemon.Daemon.containersReplica field, but it looks like this read-only view introduced some cases which lack synchronization with the daemon.Daemon.containers field. #50133 is just one of these cases; others may exist in the /containers/{id}/kill and /containers/{id}/wait APIs (just replace /containers/{id}/stop with these APIs in #50133). Please take a look at testcontainers/testcontainers-go#3197, which uses the /containers/{id}/json API (which reads container state from daemon.Daemon.containers and not from daemon.Daemon.containersReplica) - this workaround demonstrates that there is no issue (with container.State.Wait) when using the daemon.Daemon.containers field with ctr.Lock() + ctr.Unlock().

Maybe we should accept that any Docker Engine API that uses the daemon.Daemon.containersReplica field can return outdated (delayed) container state, and highlight that in the API documentation (e.g. split the read-only API from the read-write API, then describe synchronization between API calls). Otherwise, even without concurrent access to the Docker Engine API, we leave users of this API without an understanding of:

  1. Synchronization points (i.e. where slowness / lock can happen).
  2. Visibility of state changes.

An alternative solution (the direction this PR takes, but only for the /containers/{id}/stop API endpoint) is to ensure that daemon.Daemon.containersReplica is updated before completion of any API call which modifies the state of container(s).

Thank you.
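A rough sketch of that alternative, as a self-contained Go toy (the names replicaView, commit, and stopContainer are illustrative, not the actual moby APIs): a state-mutating handler commits its change to the read-only replica view before returning, so a subsequent read served from the replica cannot observe stale state.

```go
package main

import (
	"fmt"
	"sync"
)

// replicaView stands in for the read-only container view
// (daemon.Daemon.containersReplica in the discussion above).
type replicaView struct {
	mu    sync.RWMutex
	state map[string]string
}

func (r *replicaView) commit(id, state string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.state[id] = state
}

func (r *replicaView) get(id string) string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.state[id]
}

// stopContainer models a state-mutating API handler: the replica is
// updated before the handler returns control to the caller.
func stopContainer(r *replicaView, id string) {
	// ... the actual stop work would happen here ...
	r.commit(id, "exited") // replica committed before returning
}

func main() {
	r := &replicaView{state: map[string]string{"c1": "running"}}
	stopContainer(r, "c1")
	fmt.Println(r.get("c1")) // a read after stop returns sees "exited"
}
```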

// container did exit, so ignore any previous errors and return
return nil
}
@@ -122,5 +125,9 @@ func (daemon *Daemon) containerStop(ctx context.Context, ctr *container.Containe
// container did exit, so ignore previous errors and continue
}

// Ensure container status changes are committed by handler of container exit before returning control to the caller
ctr.Lock()
defer ctr.Unlock()
Comment on lines +128 to +130
Member:
Looks like this caused a regression; see docker/compose#12950 (comment)

The ctr.Lock() makes this function wait for the container to die, but because it produces the stop event in a defer, the stop event now gets produced AFTER the die event;

moby/daemon/stop.go

Lines 74 to 78 in 8e6cd44

defer func() {
	if retErr == nil {
		daemon.LogContainerEvent(ctr, events.ActionStop)
	}
}()

Member:
Looks like the defer was introduced by me in 952902e - but that change doesn't look critical; mostly cleanup; we can probably inline it.
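As a toy illustration of why inlining changes the ordering (not the actual moby code; the trace strings just stand in for the emitted events and the new lock barrier): a deferred call runs on return, after every statement in the function body, while an inlined call runs exactly where it is written.

```go
package main

import "fmt"

// withDefer registers the "stop event" emission in a defer, so it runs
// only after the barrier (standing in for the new ctr.Lock()) near the
// end of the function.
func withDefer() []string {
	var trace []string
	func() {
		defer func() { trace = append(trace, "stop event") }() // registered early, runs last
		trace = append(trace, "barrier (ctr.Lock)")
	}()
	return trace
}

// inlined emits the event in place, before reaching the barrier.
func inlined() []string {
	var trace []string
	trace = append(trace, "stop event")
	trace = append(trace, "barrier (ctr.Lock)")
	return trace
}

func main() {
	fmt.Println(withDefer()) // [barrier (ctr.Lock) stop event]
	fmt.Println(inlined())   // [stop event barrier (ctr.Lock)]
}
```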


return nil
}