-
Notifications
You must be signed in to change notification settings - Fork 18.9k
fix: daemon: state of stopped container visible to other queries when container is stopped #50136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -93,6 +93,9 @@ func (daemon *Daemon) containerStop(ctx context.Context, ctr *container.Containe | |||||||||||
| defer cancel() | ||||||||||||
|
|
||||||||||||
| if status := <-ctr.Wait(subCtx, containertypes.WaitConditionNotRunning); status.Err() == nil { | ||||||||||||
| // Ensure container status changes are committed by handler of container exit before returning control to the caller | ||||||||||||
| ctr.Lock() | ||||||||||||
| defer ctr.Unlock() | ||||||||||||
| // container did exit, so ignore any previous errors and return | ||||||||||||
| return nil | ||||||||||||
| } | ||||||||||||
|
|
@@ -122,5 +125,9 @@ func (daemon *Daemon) containerStop(ctx context.Context, ctr *container.Containe | |||||||||||
| // container did exit, so ignore previous errors and continue | ||||||||||||
| } | ||||||||||||
|
|
||||||||||||
| // Ensure container status changes are committed by handler of container exit before returning control to the caller | ||||||||||||
| ctr.Lock() | ||||||||||||
| defer ctr.Unlock() | ||||||||||||
|
Comment on lines
+128
to
+130
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like this caused a regression; see docker/compose#12950 (comment) The Lines 74 to 78 in 8e6cd44
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like the
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||||||||
|
|
||||||||||||
| return nil | ||||||||||||
| } | ||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick blurb - I started to dig into this PR again to check code-paths, but didn't finish, so let me jot down some thoughts I had while looking;
I started to wonder if there's a bug in
ctr.Wait, or at least it looks odd that we have a function that's returning a channel (with likely the purpose to handle synchronisation), but that contract to be "broken" (i.e., it returns before it's fully done).Perhaps it's "ok" to use the
ctr.Lock() -> Unlockto work around that issue as a short-term solution; mostly wondering if we're still tapering over the underlying issue, and if this is something ideally fixed inWait(or elsewhere).Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @thaJeztah,
I was thinking about the same, but I didn't find a better solution for
container.State.Wait.Ideally, all code which wants to access (e.g. read) state of container should use
ctr.Lock()+ctr.Unlock(). It's pretty clear requirement which is already met (even before this PR).daemon.Daemon.containerStopmethod (within its current implementation) doesn't need to access state of container directly - all it needs is to wait for container to be stopped and this requirement is in place even before this PR (please note that with this PR we make /containers/{id}/stop Docker Engine API slower by introducing additional synchronization - that's the reason I'm worried about regression).IMHO, the issue is located not in
container.State.Wait, but indaemon.Daemon.Containersmethod and indaemon.Daemon.containersReplicafield ("read-only view", which is used bydaemon.Daemon.Containersmethod). I understand the need ofdaemon.Daemon.containersReplicafield, but it looks like this read-only view introduced some cases which lack synchronization withdaemon.Daemon.containersfield. #50133 is just one of these cases, other can be in /containers/{id}/kill and /containers/{id}/wait API (just replace/containers/{id}/stopwith these APIs in #50133). Please take a look at testcontainers/testcontainers-go#3197 which uses /containers/{id}/json API (which reads state of container fromdaemon.Daemon.containersand not fromdaemon.Daemon.containersReplica) - this workaround demonstrates that there is no issue (withcontainer.State.Wait) when usingdaemon.Daemon.containersfield withctr.Lock()+ctr.Unlock().Maybe we should realize that all Docker Engine API which uses
daemon.Daemon.containersReplicafield can get outdated (delayed) state of containers and highlight that in API documentation (e.g. split read-only API from read-write API, then describe synchronization b/w API calls). Otherwise, even without concurrent access to Docker Engine API we leave users of this API without understanding of:Alternative solution (the way this PR goes, but only for /containers/{id}/stop API endpoint) is to ensure that
daemon.Daemon.containersReplicais updated before completion of any API call which modifies state of container(s).Thank you.