
No container locks on docker ps #31273

Merged
aaronlehmann merged 18 commits into moby:master from fabiokung:consistent-ro-view
Jun 23, 2017

Conversation

@fabiokung
Contributor

- What I did

Fixes #28183

We are seeing a fair amount of lockups on docker ps. More details here.

While things seem to get better with every new version of Docker, there may always be a legitimate reason to hold container locks during some not-so-quick operations. Container queries (docker ps) currently try to grab a lock on every single container being inspected, so the risk of lockups is high.

I started by trying to pick up @cpuguy83's work on #30225 (the memdb branch). Completely moving the container in-memory store to a consistent ACID DB is a noble cause, but I quickly realized it would be a huge undertaking (references that would need to be deep copied, structs that would need to be broken apart, etc.).

This is a more incremental step towards that goal. Containers keep being stored in the existing in-memory store, and mutations still grab a Lock() to avoid races. We then keep a consistent view of all containers rendered during all of these mutations, so readers (queries) do not need to Lock() anything.

Queries and docker ps are now very cheap. There are now virtually no chances of them getting stuck in a lockup, at the expense of some more lock contention during some mutation operations, because the current MemDB implementation (using hashicorp/go-memdb) does a table-level Lock() during write transactions.

These write locks are very short and hopefully won't be a problem (memdb.Save()). If they are, in the future another in-memory ACID implementation that supports row level locking could be investigated. Or replications could be done asynchronously (and optimistically), reducing lock contentions but causing the read-only view to be eventually consistent.

There is also admittedly some risk of missing parts of the code that mutate containers without replicating state when they should. However, since we already replicate container state to disk (container.ToDisk()), any such cases should be treated as bugs and covered by tests.

- How I did it

All data that is necessary to serve queries (docker ps) is snapshotted during operations that mutate that data. Typically these mutations already hold a lock on the container object they are mutating. All places in the code calling container.ToDisk() and container.ToDiskLocking() are good candidates to also replicate state to the in-memory rendered consistent view.

Queries use a read-only transaction on the replicated in-memory DB, and don't need to grab locks on each individual container being inspected.
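The overall pattern can be sketched as follows. This is a simplified stand-in for illustration only: the PR actually replicates into a hashicorp/go-memdb store, while here a map guarded by a single RWMutex plays the role of the table-level lock that go-memdb takes during write transactions. All type and method names (Snapshot, ViewDB, Save, All) are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Snapshot holds only the read-only fields queries need (hypothetical subset).
type Snapshot struct {
	ID    string
	Name  string
	State string
}

// ViewDB mimics the replicated view: one table guarded by one lock, similar
// to go-memdb's table-level lock during write transactions.
type ViewDB struct {
	mu        sync.RWMutex
	snapshots map[string]Snapshot
}

func NewViewDB() *ViewDB {
	return &ViewDB{snapshots: make(map[string]Snapshot)}
}

// Save replaces the stored snapshot for a container. It is called from
// mutation paths (which already hold the container's own lock); the
// store-wide write lock is held only briefly.
func (db *ViewDB) Save(s Snapshot) {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.snapshots[s.ID] = s
}

// All serves queries (docker ps) without touching any container lock.
func (db *ViewDB) All() []Snapshot {
	db.mu.RLock()
	defer db.mu.RUnlock()
	out := make([]Snapshot, 0, len(db.snapshots))
	for _, s := range db.snapshots {
		out = append(out, s)
	}
	return out
}

func main() {
	db := NewViewDB()
	// A mutation path: with the container's lock held, render and replicate.
	db.Save(Snapshot{ID: "c1", Name: "web", State: "running"})
	db.Save(Snapshot{ID: "c1", Name: "web", State: "exited"}) // later mutation
	fmt.Println(len(db.All()))
}
```

A reader that iterates the result of All() can never block on a stuck container, which is the point of the change; the trade-off is the brief store-wide write lock on every replication.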

In the future, more read operations can be served from the in-memory ACID store, e.g.:

  • docker inspect (shouldn't be too hard, but will require deep copying some pointers currently being held by the container during Snapshot())
  • healthcheck probe results
  • stats collection
  • container.RWLayer
  • ExecConfigs

This way we can incrementally move things as needed/wanted, and potentially one day completely eliminate container locks (#30225). Eventually, container.ToDisk() could also be replaced by checkpointing the in-memory DB to disk.

I explicitly left out some code paths that mutate parts of the container that queries don't currently care about:

  • container.RemovalInProgress
  • container.BaseFS (all calls to mount.Mount() may mutate the container: daemon.ContainerArchivePath, daemon.ContainerCopy, daemon.ContainerExtractToDir, daemon.ContainerStatPath, ... )
  • container.HasBeenManuallyStopped
  • container.HostsPath, container.ResolvConfPath: being mutated by some networking code paths

- How to verify it

All related docker ps and GetContainerApi cli-integration tests are passing. This should be treated as an internal refactoring; no functionality is being added or changed.

- Description for the changelog

lock-free docker ps, reducing the chances of daemon lockups during queries

- A picture of a cute animal (not mandatory but encouraged)

My dog, Jilly:


@icecrime
Contributor

Ping @cpuguy83 @tonistiigi 🎉

@fabiokung
Contributor Author

also related to #28754

@thaJeztah
Member

Thanks for working on this @fabiokung !

Comment thread container/snapshot.go Outdated
Member


Restarted

Contributor Author


oops, good catch

@aaronlehmann

I like the idea. This looks way simpler than the original iteration that involved deep copies.

It looks like daemon.containersReplica.Save(c.Snapshot()) is a common pattern, and it needs to be protected by the container lock to avoid transactions overwriting each other. Maybe it would be good idea to define a method on Container that does this, and add a prominent comment that it should only be called with the lock held (or it could acquire the lock itself, if this pattern works for the call sites).
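The suggested shape might look like the sketch below. This is not the PR's actual code: the Container, ViewDB, and method names here are hypothetical stand-ins, and the snapshot contents are reduced to a single field, just to show the "mutate and replicate under one lock" pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// Snapshot is a hypothetical, minimal query-visible view of a container.
type Snapshot struct{ ID, State string }

// ViewDB is a stand-in for daemon.containersReplica.
type ViewDB struct {
	mu        sync.Mutex
	snapshots map[string]Snapshot
}

func (db *ViewDB) Save(s Snapshot) {
	db.mu.Lock()
	defer db.mu.Unlock()
	if db.snapshots == nil {
		db.snapshots = map[string]Snapshot{}
	}
	db.snapshots[s.ID] = s
}

type Container struct {
	sync.Mutex
	ID    string
	State string
}

// snapshot renders the query-visible view; callers must hold c.Lock().
func (c *Container) snapshot() Snapshot {
	return Snapshot{ID: c.ID, State: c.State}
}

// checkpointAndSave is one method that mutates under the container lock and
// replicates before unlocking, so two concurrent transactions can never
// overwrite each other's snapshots out of order.
func (c *Container) checkpointAndSave(db *ViewDB, newState string) {
	c.Lock()
	defer c.Unlock()
	c.State = newState
	db.Save(c.snapshot())
}

func main() {
	db := &ViewDB{}
	c := &Container{ID: "c1"}
	c.checkpointAndSave(db, "running")
	fmt.Println(db.snapshots["c1"].State)
}
```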

@fabiokung
Contributor Author

Good point on moving the replication to Container. I'll take a stab at it.

@tonistiigi
Member

I'm a bit torn about this. I stand by my earlier comment that we shouldn't be afraid of locks, just of the wrong implementation of them. This problem doesn't only show up in performance; most places where we call IsRunning/IsPaused/IsRestarting are clear races. If we had proper synchronization between lifecycle states, a plain RWMutex would probably even perform better than this, without the extra maintenance cost. But I don't see anyone taking on that refactor, at least not before everything about the containerd integration is cleared up. This version does look simpler than the earlier one. Not going to block if other maintainers want to go forward with it.

Comment thread container/snapshot.go Outdated
Member


This is effectively the same thing as the deep copy, and has to be maintained in the same way.
It ends up being pretty fragile, as any change to any of the underlying types could wind up breaking things, e.g. adding a new reference type to a struct that is not properly copied.

Contributor Author


It is similar, but not necessarily the same. It's more of a mapping to what queries need to see, rather than only a (deep) copy. The rest of the code already has precedent for similar mappings to different views/structs at different layers.

It is fragile in the same sense that public API handlers can break when new fields are added and not properly mapped to the structs being serialized for responses.

Member


Instead of defining a new type do you think we could use the API container type?

Contributor Author


I started with that, but it has a pointer to *NetworkSettings and was missing a few fields that daemon/list.go requires. Since types.Container is part of the public API, I didn't want to break API compatibility or mess with its JSON serialization.

It's probably possible, but a bit riskier I think.

@fabiokung
Contributor Author

@tonistiigi fair point. This doesn't block going back to locking for reads when/if you are comfortable enough that all concurrency and locking is being properly handled across the codebase. It would be a bit of throwaway work, but at least it fixes the problem we are having today, until a more long-term solution lands (proper synchronization between lifecycle states as you propose, "lock-free" transactional mutations everywhere, ...).

I'm not proposing we lock down on a decision on how concurrency will be handled long term, but this is a big pain right now and I don't feel we can wait much for a big re-architecture to happen.

@fabiokung
Contributor Author

@aaronlehmann I moved the checkpointing/replication operation to *Container. How does it look now?

I like it more, since it is now clearer that checkpointing should probably be done when saving state to disk. It also allowed container.snapshot() to stay unexported.

@aaronlehmann

Thanks, I think that's an improvement.

@dnephin
Member

dnephin commented Mar 9, 2017

This needs a rebase, but it looks like some of the requested changes were made, so it could use another review as well.

@fabiokung fabiokung force-pushed the consistent-ro-view branch from 8bf082e to 882fbe3 Compare March 10, 2017 17:56
@fabiokung
Contributor Author

Rebased.

Comment thread container/snapshot.go Outdated
Member


Could you add a comment about how to maintain this structure?
E.g. why HostConfig.{NetworkMode, Isolation} are needed but others are not.

Member


Also, I wonder whether we can embed api/types.Container for deduplication

Contributor Author


I'll add a comment about the structure.

Re: using types.Container, here's a brief discussion about it we had on a commit that was previously on this PR: #31273 (comment)

@cpuguy83
Member

I wonder if instead of having this intermediate object, we can store a duplicate of the container object like so:

(warning, pseudo-code follows)

func (c *Container) ToDisk() {
    // normal stuff

    snapshot := &Container{ID: c.ID}
    snapshot.FromDisk()
    c.snapshotsStore.Add(snapshot)
}

@fabiokung
Contributor Author

@cpuguy83 isn't that what we discussed here: #31273 (comment) ?

Or is it something else I'm missing?

@cpuguy83
Member

@fabiokung Not the same: this would be the actual *container.Container type being stored in the snapshot store, which is fully copied by definition since it'll be unmarshalled from the on-disk JSON.

@thaJeztah
Member

The dump shows up in the daemon logs, not on the command line. I'm not sure docker 1.6.2 had this already; possibly it needs debug to be enabled on the daemon.

@thaJeztah
Member

That debugging functionality was added in docker 1.7 #10786

Anyway, this is not really the best location for this discussion - feel free to message me on Slack if you need assistance

@gyliu513

I googled for the Docker Slack channel before, but found no helpful info. What is the link to the Docker Slack channel? @thaJeztah

@thaJeztah
Member

@gyliu513

gyliu513 commented Jun 29, 2017

Thanks @thaJeztah. As I haven't been approved to join the community yet, I want to ask you one last question here; hope that's OK. ;-)

Regarding debug mode: it is really helpful for troubleshooting. The question is: is it a good idea to enable debug mode in a production environment? I do want to enable it, but I'm not sure whether it adds any overhead, especially in production.

@gyliu513

Got the answer myself: we can turn on the debug mode automatically, cool!

@gyliu513

gyliu513 commented Jun 30, 2017

@fabiokung

Container queries (docker ps) currently try to grab a lock on every single container being inspected

One question about the issue: before your fix, why did docker ps need to grab a lock for each single container? Why do read-only operations require a lock?

Another issue I want to mention: when this happens, not only does docker ps hang, docker images hangs as well. Does your fix cover this? Also, do you know why docker images hangs too? Other commands such as docker inspect <container id> and docker logs <container id> work fine.

@fabiokung
Contributor Author

@gyliu513 it uses locks to prevent partial reads and data corruption, since container data is being concurrently modified. Even for read-only operations, locking is how it ensures the state being read is consistent.

This fix only applies to docker ps and is a first step toward fixing the others you mentioned (I listed them in the description). docker inspect will also hang if you happen to hit one of the containers whose lock is held somewhere else.
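To illustrate why even readers need synchronization, consider a state with two related fields. This is a deliberately simplified, hypothetical sketch (not Docker's actual state type): if a reader looked at Running and Pid without the lock while a writer updated both, it could observe an inconsistent pair, e.g. Running=true together with a stale Pid.

```go
package main

import (
	"fmt"
	"sync"
)

// State is a simplified multi-field container state (hypothetical).
type State struct {
	mu      sync.Mutex
	Running bool
	Pid     int
}

// SetRunning updates both fields atomically with respect to readers.
func (s *State) SetRunning(pid int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.Running = true
	s.Pid = pid
}

// Read returns a consistent (Running, Pid) pair; without taking the lock,
// a concurrent read would be a data race and could see a torn update.
func (s *State) Read() (bool, int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.Running, s.Pid
}

func main() {
	s := &State{}
	s.SetRunning(4242)
	running, pid := s.Read()
	fmt.Println(running, pid)
}
```

The snapshot approach in this PR sidesteps the problem for queries: the pair is rendered once, under the writer's lock, so readers of the replica never need to lock at all.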

@krmayankk

@fabiokung this fix is available in what docker versions?

@thaJeztah
Member

@krmayankk this will be included in docker 17.07 and up

@thiagoalves
Contributor

@thaJeztah How about docker-ee? Will we have it in 17.06-ee3 or ee4? Or should we wait until 17.09?

@cpuguy83
Member

cpuguy83 commented Sep 6, 2017

@thiagoalves It will not be in EE until the next major release. I do not see this one getting backported.

@cpuguy83
Member

cpuguy83 commented Sep 6, 2017

@thiagoalves btw, if you have a specific case where you are seeing a deadlock, please report so it can be fixed. This patch is just making the deadlock less apparent, not actually fixing deadlocks.

@thiagoalves
Contributor

@mlespiau

@robertglen

17.07 sounds great; when can we expect to see it? I'm only seeing 17.05, with the last commit from May of this year, and old-school docker 1.13.1 from February of this year.

How long do critical fixes affecting an entire ecosystem of projects typically bake before a release is cut?

@cpuguy83
Member

@robertglen 17.07 was released in July....

@tonistiigi
Member

@robertglen You probably need to update your apt/yum repositories from dockerproject.org to download.docker.com. https://docs.docker.com/engine/installation/

@robertglen

heh, OK. I just went to the code portion of this repo and looked at branches and tags, which stop at 17.05, and had completely dismissed visiting Docker's website :\ my bad.

@euank
Contributor

euank commented Nov 22, 2017

@robertglen

The Moby project doesn't make releases of this repo; rather, Docker Inc. makes releases of the Docker Community Edition software (which includes code from this repo). Those releases are tagged in the docker-ce repository.
