
No container locks on docker ps #31273

Merged
aaronlehmann merged 18 commits into moby:master from fabiokung:consistent-ro-view
Jun 23, 2017

Conversation

@fabiokung
Contributor

- What I did

Fixes #28183

We are seeing a fair amount of lockups on docker ps. More details here.

While things seem to get better with every new version of Docker, there may always be a legitimate reason to hold container locks during some not-so-quick operations. Container queries (docker ps) currently try to grab a lock on every single container being inspected, so the risk of lockups is high.

I started by trying to pick up @cpuguy83's work on #30225 (the memdb branch). Completely moving the container in-memory store to a consistent ACID DB is a noble cause, but I quickly realized it would be a huge undertaking (references that would need to be deep copied, structs that would need to be broken apart, etc.).

This is a more incremental step towards that goal. Containers keep being stored in the existing in-memory store, and mutations still grab a Lock() to avoid races. We then keep a consistent view of all containers rendered during all of these mutations, so readers (queries) do not need to Lock() anything.

Queries and docker ps are now very cheap. There are now virtually no chances of them getting stuck in a lockup, at the expense of some more lock contention during some mutation operations, because the current MemDB implementation (using hashicorp/go-memdb) does a table-level Lock() during write transactions.

These write locks are very short and hopefully won't be a problem (memdb.Save()). If they are, in the future another in-memory ACID implementation that supports row level locking could be investigated. Or replications could be done asynchronously (and optimistically), reducing lock contentions but causing the read-only view to be eventually consistent.

There is also admittedly some risk of missing parts of the code that mutate containers without replicating state when they should. However, since we already replicate container state to disk (container.ToDisk()), any such cases should be treated as bugs and covered by tests.

- How I did it

All data that is necessary to serve queries (docker ps) is snapshotted during operations that mutate that data. Typically these mutations already hold a lock on the container object they are mutating. All places in the code calling container.ToDisk() and container.ToDiskLocking() are good candidates to also replicate state to the in-memory rendered consistent view.

Queries use a read-only transaction on the replicated in-memory DB, and don't need to grab locks on each individual container being inspected.
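The overall pattern can be sketched as follows. This is a simplified stand-in for illustration only: the PR actually replicates into a hashicorp/go-memdb store, while here a map guarded by a single RWMutex plays the role of the table-level lock that go-memdb takes during write transactions. All type and method names (Snapshot, ViewDB, Save, All) are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Snapshot holds only the read-only fields queries need (hypothetical subset).
type Snapshot struct {
	ID    string
	Name  string
	State string
}

// ViewDB mimics the replicated view: one table guarded by one lock, similar
// to go-memdb's table-level lock during write transactions.
type ViewDB struct {
	mu        sync.RWMutex
	snapshots map[string]Snapshot
}

func NewViewDB() *ViewDB {
	return &ViewDB{snapshots: make(map[string]Snapshot)}
}

// Save replaces the stored snapshot for a container. It is called from
// mutation paths (which already hold the container's own lock); the
// store-wide write lock is held only briefly.
func (db *ViewDB) Save(s Snapshot) {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.snapshots[s.ID] = s
}

// All serves queries (docker ps) without touching any container lock.
func (db *ViewDB) All() []Snapshot {
	db.mu.RLock()
	defer db.mu.RUnlock()
	out := make([]Snapshot, 0, len(db.snapshots))
	for _, s := range db.snapshots {
		out = append(out, s)
	}
	return out
}

func main() {
	db := NewViewDB()
	// A mutation path: with the container's lock held, render and replicate.
	db.Save(Snapshot{ID: "c1", Name: "web", State: "running"})
	db.Save(Snapshot{ID: "c1", Name: "web", State: "exited"}) // later mutation
	fmt.Println(len(db.All()))
}
```

A reader that iterates the result of All() can never block on a stuck container, which is the point of the change; the trade-off is the brief store-wide write lock on every replication.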

In the future, more read operations can be served from the in-memory ACID store, e.g.:

  • docker inspect (shouldn't be too hard, but will require deep copying some pointers currently being held by the container during Snapshot())
  • healthcheck probe results
  • stats collection
  • container.RWLayer
  • ExecConfigs

This way we can incrementally move things as needed/wanted, and potentially one day completely eliminate container locks (#30225). Eventually, container.ToDisk() could also be replaced by checkpointing the in-memory DB to disk.

I explicitly left out some code paths that mutate parts of the container that queries don't currently care about:

  • container.RemovalInProgress
  • container.BaseFS (all calls to mount.Mount() may mutate the container: daemon.ContainerArchivePath, daemon.ContainerCopy, daemon.ContainerExtractToDir, daemon.ContainerStatPath, ... )
  • container.HasBeenManuallyStopped
  • container.HostsPath, container.ResolvConfPath: being mutated by some networking code paths

- How to verify it

All related docker ps and GetContainerApi cli-integration tests are passing. This should be treated as an internal refactoring; no functionality is being added or changed.

- Description for the changelog

lock-free docker ps, reducing the chances of daemon lockups during queries

- A picture of a cute animal (not mandatory but encouraged)

My dog, Jilly:


@icecrime
Contributor

Ping @cpuguy83 @tonistiigi 🎉

@fabiokung
Contributor Author

also related to #28754

@thaJeztah
Member

Thanks for working on this @fabiokung !

Comment thread container/snapshot.go Outdated
Member


Restarted

Contributor Author


oops, good catch

@aaronlehmann

I like the idea. This looks way simpler than the original iteration that involved deep copies.

It looks like daemon.containersReplica.Save(c.Snapshot()) is a common pattern, and it needs to be protected by the container lock to avoid transactions overwriting each other. Maybe it would be good idea to define a method on Container that does this, and add a prominent comment that it should only be called with the lock held (or it could acquire the lock itself, if this pattern works for the call sites).
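The suggested shape might look like the sketch below. This is not the PR's actual code: the Container, ViewDB, and method names here are hypothetical stand-ins, and the snapshot contents are reduced to a single field, just to show the "mutate and replicate under one lock" pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// Snapshot is a hypothetical, minimal query-visible view of a container.
type Snapshot struct{ ID, State string }

// ViewDB is a stand-in for daemon.containersReplica.
type ViewDB struct {
	mu        sync.Mutex
	snapshots map[string]Snapshot
}

func (db *ViewDB) Save(s Snapshot) {
	db.mu.Lock()
	defer db.mu.Unlock()
	if db.snapshots == nil {
		db.snapshots = map[string]Snapshot{}
	}
	db.snapshots[s.ID] = s
}

type Container struct {
	sync.Mutex
	ID    string
	State string
}

// snapshot renders the query-visible view; callers must hold c.Lock().
func (c *Container) snapshot() Snapshot {
	return Snapshot{ID: c.ID, State: c.State}
}

// checkpointAndSave is one method that mutates under the container lock and
// replicates before unlocking, so two concurrent transactions can never
// overwrite each other's snapshots out of order.
func (c *Container) checkpointAndSave(db *ViewDB, newState string) {
	c.Lock()
	defer c.Unlock()
	c.State = newState
	db.Save(c.snapshot())
}

func main() {
	db := &ViewDB{}
	c := &Container{ID: "c1"}
	c.checkpointAndSave(db, "running")
	fmt.Println(db.snapshots["c1"].State)
}
```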

@fabiokung
Contributor Author

Good point on moving the replication to Container. I'll take a stab at it.

@tonistiigi
Member

I'm a bit torn about this. I stand by my earlier comment that we shouldn't be afraid of locks, just of the wrong implementation of them. This problem doesn't only show up in performance; most places where we call IsRunning/IsPaused/IsRestarting are clear races. If we had proper synchronization between lifecycle states, a plain RWMutex would probably even perform better than this, without the extra maintenance cost. But I don't see anyone taking on that refactor, at least not before everything about the containerd integration is cleared up. This version does look simpler than the earlier one. Not going to block if other maintainers want to go forward with it.

Comment thread container/snapshot.go Outdated
Member


This is effectively the same thing as the deep copy, and has to be maintained in the same way.
It ends up being pretty fragile, as any change to any of the underlying types could wind up breaking things, e.g. adding a new reference type to a struct that is not properly copied.

Contributor Author


It is similar, but not necessarily the same. It's more of a mapping to what queries need to see, rather than only a (deep) copy. The rest of the code already has precedent for similar mappings to different views/structs at different layers.

It is fragile in the same sense that public API handlers can break when new fields are added and not properly mapped to the structs being serialized for responses.

Member


Instead of defining a new type do you think we could use the API container type?

Contributor Author


I started with that, but it has a pointer to *NetworkSettings and was missing a few fields that daemon/list.go requires. Since types.Container is part of the public API, I didn't want to break API compatibility or mess with its JSON serialization.

It's probably possible, but a bit riskier I think.

@fabiokung
Contributor Author

@tonistiigi fair point. This doesn't block going back to locking for reads when/if you are comfortable enough that all concurrency and locking is being properly handled across the codebase. It would be a bit of throwaway work, but at least it fixes the problem we are having today, until a more long-term solution lands (proper synchronization between lifecycle states as you propose, "lock-free" transactional mutations everywhere, ...).

I'm not proposing we lock down on a decision on how concurrency will be handled long term, but this is a big pain right now and I don't feel we can wait much for a big re-architecture to happen.

@fabiokung
Contributor Author

@aaronlehmann I moved the checkpointing/replication operation to *Container. How does it look now?

I like it more, since it is now clearer that checkpointing should probably be done when saving state to disk. It also allowed container.snapshot() to stay unexported.

@aaronlehmann

Thanks, I think that's an improvement.

@dnephin
Member

dnephin commented Mar 9, 2017

This needs a rebase, but it looks like some of the requested changes were made, so it could use another review as well.

@fabiokung fabiokung force-pushed the consistent-ro-view branch from 8bf082e to 882fbe3 Compare March 10, 2017 17:56
@fabiokung
Contributor Author

Rebased.

Comment thread container/snapshot.go Outdated
Member


Could you add a comment about how to maintain this structure?
E.g. why HostConfig.{NetworkMode, Isolation} are needed but others are not.

Member


Also, I wonder whether we can embed api/types.Container for deduplication

Contributor Author


I'll add a comment about the structure.

Re: using types.Container, here's a brief discussion about it we had on a commit that was previously on this PR: #31273 (comment)

@cpuguy83
Member

I wonder if instead of having this intermediate object, we can store a duplicate of the container object like so:

(warning, pseudo-code follows)

func (c *Container) ToDisk() {
    // normal stuff

    snapshot := &Container{ID: c.ID}
    snapshot.FromDisk()
    c.snapshotsStore.Add(snapshot)
}

@fabiokung
Contributor Author

@cpuguy83 isn't that what we discussed here: #31273 (comment) ?

Or is it something else I'm missing?

@cpuguy83
Member

@fabiokung Not the same: this would be the actual *container.Container type being stored in the snapshot store, which is fully copied by definition since it'll be unmarshalled from the on-disk JSON.

@thaJeztah
Member

The dump shows up in the daemon logs, not on the command line. I'm not sure docker 1.6.2 had this already; possibly it needs debug to be enabled on the daemon.

@thaJeztah
Member

That debugging functionality was added in docker 1.7 #10786

Anyway, this is not really the best location for this discussion - feel free to message me on Slack if you need assistance

@gyliu513

I googled for the Docker Slack channel before, but found no helpful info. What is the link to the Docker Slack channel? @thaJeztah

@thaJeztah
Member

@gyliu513

gyliu513 commented Jun 29, 2017

Thanks @thaJeztah. As I haven't been approved to join the community yet, I want to ask you one last question here; hope that's OK. ;-)

Regarding debug mode: it is really helpful for troubleshooting. The question is: is it a good idea to enable debug mode in a production environment? I do want to enable it, but I'm not sure whether it adds any overhead, especially in production.

@gyliu513

Got the answer myself: we can turn on the debug mode automatically, cool!

@gyliu513

gyliu513 commented Jun 30, 2017

@fabiokung

Container queries (docker ps) currently try to grab a lock on every single container being inspected

One question about the issue: before your fix, why did docker ps need to grab a lock for each single container? Why do read-only operations require a lock?

Another issue I want to mention: when this happens, not only does docker ps hang, docker images hangs as well. Does your fix cover this? Also, do you know why docker images hangs too? Other commands such as docker inspect <container id> and docker logs <container id> work fine.

@fabiokung
Contributor Author

@gyliu513 it uses locks to prevent partial reads and data corruption, since container data is being concurrently modified. Even for read-only operations, locking is how it ensures the state being read is consistent.

This fix only applies to docker ps and is a first step toward fixing the others you mentioned (I listed them in the description). docker inspect will also hang if you happen to hit one of the containers whose lock is held somewhere else.
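To illustrate why even readers need synchronization, consider a state with two related fields. This is a deliberately simplified, hypothetical sketch (not Docker's actual state type): if a reader looked at Running and Pid without the lock while a writer updated both, it could observe an inconsistent pair, e.g. Running=true together with a stale Pid.

```go
package main

import (
	"fmt"
	"sync"
)

// State is a simplified multi-field container state (hypothetical).
type State struct {
	mu      sync.Mutex
	Running bool
	Pid     int
}

// SetRunning updates both fields atomically with respect to readers.
func (s *State) SetRunning(pid int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.Running = true
	s.Pid = pid
}

// Read returns a consistent (Running, Pid) pair; without taking the lock,
// a concurrent read would be a data race and could see a torn update.
func (s *State) Read() (bool, int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.Running, s.Pid
}

func main() {
	s := &State{}
	s.SetRunning(4242)
	running, pid := s.Read()
	fmt.Println(running, pid)
}
```

The snapshot approach in this PR sidesteps the problem for queries: the pair is rendered once, under the writer's lock, so readers of the replica never need to lock at all.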

@krmayankk

@fabiokung this fix is available in what docker versions?

@thaJeztah
Member

@krmayankk this will be included in docker 17.07 and up

@thiagoalves
Contributor

@thaJeztah How about docker-ee? Will we have it in 17.06-ee3 or ee4? Or should we wait until 17.09?

@cpuguy83
Member

cpuguy83 commented Sep 6, 2017

@thiagoalves It will not be in EE until the next major release. I do not see this one getting backported.

@cpuguy83
Member

cpuguy83 commented Sep 6, 2017

@thiagoalves btw, if you have a specific case where you are seeing a deadlock, please report so it can be fixed. This patch is just making the deadlock less apparent, not actually fixing deadlocks.

@thiagoalves
Contributor

@mlespiau

@robertglen

17.07 sounds great; when can we expect to see it? I'm only seeing 17.05, with the last commit from May of this year, and old-school docker 1.13.1 from February of this year.

How long do critical fixes affecting an entire ecosystem of projects typically bake before a release is cut?

@cpuguy83
Member

@robertglen 17.07 was released in July....

@tonistiigi
Member

@robertglen You probably need to update your apt/yum repositories from dockerproject.org to download.docker.com. https://docs.docker.com/engine/installation/

@robertglen

heh, OK. I just went to the code portion of this repo and looked at branches and tags, which stop at 17.05, and had completely dismissed visiting Docker's website :\ my bad.

@euank
Contributor

euank commented Nov 22, 2017

@robertglen

The Moby project doesn't make releases of this repo; rather, Docker Inc. makes releases of the Docker Community Edition software (which includes code from this repo). Those releases are tagged in the docker-ce repository.
