Skip to content

Lessen mount lock contention in layer store#39135

Merged
kolyshkin merged 2 commits intomoby:masterfrom
kolyshkin:layer-mount-lock
May 9, 2019
Merged

Lessen mount lock contention in layer store#39135
kolyshkin merged 2 commits intomoby:masterfrom
kolyshkin:layer-mount-lock

Conversation

@kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Apr 24, 2019

While testing massive parallel container creation and removal, we see there's a lock contention in layer store that affects container creation performance (in other words, many threads are waiting for the lock). With this patches, container creation time (wall clock time) is about 2x to 3x less than before (measured with both aufs and overlay2 graph drivers).

The (global, per-layerstore) lock in question is held for the duration of

  • CreateRWLayer()
  • GetRWLayer()
  • ReleaseRWLayer()
  • CreateRWLayerByGraphID()

The lock contention observed mostly between two instances of CreateRWLayer()

Here's a stupid script I am using to test the performance before and after:

#!/bin/bash

NUMCTS=1000
PARALLEL=100

create() {
	# -v 'vol_{}:/vol'
	time ( seq $NUMCTS | parallel -j$PARALLEL docker create alpine top > /dev/null )
	echo "^^^ CREATE TIME ^^^"
}

remove() {
	time ( docker ps -qa | shuf | tail -n $NUMCTS | parallel -j$PARALLEL docker rm -f '{}' > /dev/null )
	echo "^^^ REMOVE TIME ^^^"
}

# precreate $NUMCTS containers
create
echo

# create another $NUMCTS while removing $NUMCTS containers
create &
remove &
wait

# remove the rest of it
remove

Results are:

  1. Container creation time (the first timing print by the script) is about 40s before, about 20s after this patchset.

  2. Concurrent container creation/removal time (the second and the third timing printed by the script) are about 70s before, 40s after this patchset.

  3. Container removal times (last timing printed by the script) stay the same at about 30s.

@GordonTheTurtle
Copy link

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "layer-mount-lock" [email protected]:kolyshkin/moby.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842358683792
editor opens
change each 'pick' to 'edit'
save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

@GordonTheTurtle GordonTheTurtle added dco/no Automatically set by a bot when one of the commits lacks proper signature status/0-triage labels Apr 24, 2019
@kolyshkin
Copy link
Contributor Author

My concern is locking after this patches might not be adequate; perhaps we need to add a per-id (rather than global) lock (using pkg/locker) on top of it (but I have yet to look into this).

@kolyshkin kolyshkin changed the title Lessen mount lock contention in layer store [WIP] Lessen mount lock contention in layer store Apr 24, 2019
@kolyshkin
Copy link
Contributor Author

perhaps we need to add a per-id (rather than global) lock (using pkg/locker) on top of it

added (not sure if it's needed though)

@kolyshkin
Copy link
Contributor Author

@dmcgowan PTAL

@dmcgowan
Copy link
Member

A finer grain lock is needed in this case. The mounted layer operates on a reference map so there does require locking to prevent alterations on that object in parallel. It maybe possible to just add that lock to *mountedLayer, assuming the global lock has already synchronized creation of the same names.

@kolyshkin
Copy link
Contributor Author

A finer grain lock is needed in this case. The mounted layer operates on a reference map so there does require locking to prevent alterations on that object in parallel. It maybe possible to just add that lock to *mountedLayer, assuming the global lock has already synchronized creation of the same names.

@xinfengliu if you have time to address the above comments? I'm currently away for DockerCon and don't have time :(

@xinfengliu
Copy link
Contributor

@kolyshkin , yes mountedLayer need a lock. Sorry I'm on China public holidays (May 1~3) now , I will spend some time on it but I'm not sure I can make it by the end of this week.

@kolyshkin kolyshkin force-pushed the layer-mount-lock branch from 1a695ce to 1aba6e5 Compare May 3, 2019 17:03
@kolyshkin kolyshkin mentioned this pull request May 3, 2019
@kolyshkin
Copy link
Contributor Author

@xinfengliu no problem, I think I fixed it.

@dmcgowan PTAL at the last commit (I decided to not use RWMutex for the sake of simplicity). Do we need same for roLayer.references map?

@kolyshkin kolyshkin requested a review from tianon as a code owner May 4, 2019 00:31
@kolyshkin kolyshkin force-pushed the layer-mount-lock branch from e07fb1f to 1aba6e5 Compare May 4, 2019 00:32
Add a mutex to protect concurrent access to mountedLayer.references map.

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin force-pushed the layer-mount-lock branch from 1aba6e5 to fb499c6 Compare May 6, 2019 17:56
@GordonTheTurtle GordonTheTurtle removed the dco/no Automatically set by a bot when one of the commits lacks proper signature label May 6, 2019
@kolyshkin kolyshkin force-pushed the layer-mount-lock branch from fb499c6 to d4c5ddf Compare May 6, 2019 17:59
@dmcgowan
Copy link
Member

dmcgowan commented May 6, 2019

assuming the global lock has already synchronized creation of the same names.

I don't see this part being done. Possibly setting a placeholder in the map and clear it if there is an error. In that case a second call for the same name would get the conflicting error. I agree locking for the entire save is unnecessary just to prevent simultaneous creation of the same name.

@kolyshkin kolyshkin force-pushed the layer-mount-lock branch from d4c5ddf to 3a0eaeb Compare May 7, 2019 02:10
@codecov
Copy link

codecov bot commented May 7, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@994007d). Click here to learn what that means.
The diff coverage is 66.66%.

@@            Coverage Diff            @@
##             master   #39135   +/-   ##
=========================================
  Coverage          ?   37.02%           
=========================================
  Files             ?      612           
  Lines             ?    45448           
  Branches          ?        0           
=========================================
  Hits              ?    16829           
  Misses            ?    26333           
  Partials          ?     2286

@kolyshkin kolyshkin force-pushed the layer-mount-lock branch from 3a0eaeb to f52aa5c Compare May 7, 2019 02:11
@kolyshkin
Copy link
Contributor Author

assuming the global lock has already synchronized creation of the same names.

I don't see this part being done. Possibly setting a placeholder in the map and clear it if there is an error. In that case a second call for the same name would get the conflicting error. I agree locking for the entire save is unnecessary just to prevent simultaneous creation of the same name.

@dmcgowan thanks for review! Sorry I missed that part entirely; I think I know what you mean now, please see the updated commit. I had to introduce the third state of ls.mount[name] when it's not empty but the value is nil, meaning the name being created. All the other places treat nil as non-existent, except for CreateRWLayer() which treats it as a name conflict. I'm not in love with this approach as we can have potential nil dereferences but it looks cleaner than yet another lock or map (such as map of names being created).

Oh, if you can think of any test cases that can be created, I'm happy to implement those.

@kolyshkin kolyshkin force-pushed the layer-mount-lock branch from f52aa5c to f0c95a2 Compare May 9, 2019 17:44
@kolyshkin
Copy link
Contributor Author

kolyshkin commented May 9, 2019

Some changes based on @dmcgowan and my own review:

  • added locking to loadMount() (just for consistency/uniformity, techinically it's not required in there);
  • added missing Unlock() to an error path in CreateRWLayer();
  • simplified return statements in CreateRWLayer as we use named return values;
  • simplified map element presence check by dropping ok assignment and check (if m, ok := l.mounts[name]; ok && m != nil --> if m := l.mounts[name]; m != nil).

Goroutine stack analisys shown some lock contention
while doing massively (100 instances of `docker rm`)
parallel image removal, with many goroutines waiting
for the mountL mutex. Optimize it.

With this commit, the above operation is about 3x
faster, with no noticeable change to container
creation times (tested on aufs and overlay2).

kolyshkin@:
- squashed commits
- added description
- protected CreateRWLayer against name collisions by
temporary assiging nil to ls.mounts[name], and treating
nil as "non-existent" in all the other functions.

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin force-pushed the layer-mount-lock branch from f0c95a2 to 05250a4 Compare May 9, 2019 18:05
Copy link
Member

@dmcgowan dmcgowan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kolyshkin kolyshkin changed the title [WIP] Lessen mount lock contention in layer store Lessen mount lock contention in layer store May 9, 2019
@kolyshkin kolyshkin merged commit a4d6993 into moby:master May 9, 2019
@kolyshkin
Copy link
Contributor Author

kolyshkin commented May 13, 2019

A comment from @tonistiigi :

have some concerns about #39135. eg.in https://github.com/moby/moby/pull/39135/files#diff-5137c87d28c11ce0691d2f12e124409eR587 deleteRef, hasRef + remove is not atomic anymore. so if you call GetRWLayer after hasReference you get a rwlayer but the storage gets deleted in another goroutine

is to be addressed in a followup (update: #39209)

buildroot-auto-update pushed a commit to buildroot/buildroot that referenced this pull request Jun 30, 2019
Fixes CVE-2018-15664: API endpoints behind the 'docker cp' command are
vulnerable to a symlink-exchange attack with Directory Traversal, giving
attackers arbitrary read-write access to the host filesystem with root
privileges, because daemon/archive.go does not do archive operations on a
frozen filesystem (or from within a chroot).

And includes additional post-18.09.6 fixes:

Builder
- Fixed a panic error when building dockerfiles that contain only comments.
  moby/moby#38487
- Added a workaround for GCR authentication issue. moby/moby#38246
- Builder-next: Fixed a bug in the GCR token cache implementation
  workaround.  moby/moby#39183

Runtime
- Added performance optimizations in aufs and layer store that helps in
  massively parallel container creation and removal.  moby/moby#39107,
  moby/moby#39135
- daemon: fixed a mirrors validation issue. moby/moby#38991
- Docker no longer supports sorting UID and GID ranges in ID maps.
  moby/moby#39288

Logging
- Added a fix that now allows large log lines for logger plugins.
  moby/moby#39038

Signed-off-by: Peter Korsgaard <[email protected]>
Signed-off-by: Arnout Vandecappelle (Essensium/Mind) <[email protected]>
buildroot-auto-update pushed a commit to buildroot/buildroot that referenced this pull request Jun 30, 2019
Fixes CVE-2018-15664: API endpoints behind the 'docker cp' command are
vulnerable to a symlink-exchange attack with Directory Traversal, giving
attackers arbitrary read-write access to the host filesystem with root
privileges, because daemon/archive.go does not do archive operations on a
frozen filesystem (or from within a chroot).

And includes additional post-18.09.6 fixes:

Builder
- Fixed a panic error when building dockerfiles that contain only comments.
  moby/moby#38487
- Added a workaround for GCR authentication issue. moby/moby#38246
- Builder-next: Fixed a bug in the GCR token cache implementation
  workaround.  moby/moby#39183

Runtime
- Added performance optimizations in aufs and layer store that helps in
  massively parallel container creation and removal.  moby/moby#39107,
  moby/moby#39135
- daemon: fixed a mirrors validation issue. moby/moby#38991
- Docker no longer supports sorting UID and GID ranges in ID maps.
  moby/moby#39288

Logging
- Added a fix that now allows large log lines for logger plugins.
  moby/moby#39038

Signed-off-by: Peter Korsgaard <[email protected]>
Signed-off-by: Arnout Vandecappelle (Essensium/Mind) <[email protected]>
paralin pushed a commit to skiffos/buildroot that referenced this pull request Jul 2, 2019
Fixes CVE-2018-15664: API endpoints behind the 'docker cp' command are
vulnerable to a symlink-exchange attack with Directory Traversal, giving
attackers arbitrary read-write access to the host filesystem with root
privileges, because daemon/archive.go does not do archive operations on a
frozen filesystem (or from within a chroot).

And includes additional post-18.09.6 fixes:

Builder
- Fixed a panic error when building dockerfiles that contain only comments.
  moby/moby#38487
- Added a workaround for GCR authentication issue. moby/moby#38246
- Builder-next: Fixed a bug in the GCR token cache implementation
  workaround.  moby/moby#39183

Runtime
- Added performance optimizations in aufs and layer store that helps in
  massively parallel container creation and removal.  moby/moby#39107,
  moby/moby#39135
- daemon: fixed a mirrors validation issue. moby/moby#38991
- Docker no longer supports sorting UID and GID ranges in ID maps.
  moby/moby#39288

Logging
- Added a fix that now allows large log lines for logger plugins.
  moby/moby#39038

Signed-off-by: Peter Korsgaard <[email protected]>
Signed-off-by: Arnout Vandecappelle (Essensium/Mind) <[email protected]>
paralin pushed a commit to skiffos/buildroot that referenced this pull request Jul 2, 2019
Fixes CVE-2018-15664: API endpoints behind the 'docker cp' command are
vulnerable to a symlink-exchange attack with Directory Traversal, giving
attackers arbitrary read-write access to the host filesystem with root
privileges, because daemon/archive.go does not do archive operations on a
frozen filesystem (or from within a chroot).

And includes additional post-18.09.6 fixes:

Builder
- Fixed a panic error when building dockerfiles that contain only comments.
  moby/moby#38487
- Added a workaround for GCR authentication issue. moby/moby#38246
- Builder-next: Fixed a bug in the GCR token cache implementation
  workaround.  moby/moby#39183

Runtime
- Added performance optimizations in aufs and layer store that helps in
  massively parallel container creation and removal.  moby/moby#39107,
  moby/moby#39135
- daemon: fixed a mirrors validation issue. moby/moby#38991
- Docker no longer supports sorting UID and GID ranges in ID maps.
  moby/moby#39288

Logging
- Added a fix that now allows large log lines for logger plugins.
  moby/moby#39038

Signed-off-by: Peter Korsgaard <[email protected]>
Signed-off-by: Arnout Vandecappelle (Essensium/Mind) <[email protected]>
buildroot-auto-update pushed a commit to buildroot/buildroot that referenced this pull request Jul 7, 2019
Fixes CVE-2018-15664: API endpoints behind the 'docker cp' command are
vulnerable to a symlink-exchange attack with Directory Traversal, giving
attackers arbitrary read-write access to the host filesystem with root
privileges, because daemon/archive.go does not do archive operations on a
frozen filesystem (or from within a chroot).

And includes additional post-18.09.6 fixes:

Builder
- Fixed a panic error when building dockerfiles that contain only comments.
  moby/moby#38487
- Added a workaround for GCR authentication issue. moby/moby#38246
- Builder-next: Fixed a bug in the GCR token cache implementation
  workaround.  moby/moby#39183

Runtime
- Added performance optimizations in aufs and layer store that helps in
  massively parallel container creation and removal.  moby/moby#39107,
  moby/moby#39135
- daemon: fixed a mirrors validation issue. moby/moby#38991
- Docker no longer supports sorting UID and GID ranges in ID maps.
  moby/moby#39288

Logging
- Added a fix that now allows large log lines for logger plugins.
  moby/moby#39038

Signed-off-by: Peter Korsgaard <[email protected]>
Signed-off-by: Arnout Vandecappelle (Essensium/Mind) <[email protected]>
(cherry picked from commit 13cf6f0)
Signed-off-by: Peter Korsgaard <[email protected]>
buildroot-auto-update pushed a commit to buildroot/buildroot that referenced this pull request Jul 7, 2019
Fixes CVE-2018-15664: API endpoints behind the 'docker cp' command are
vulnerable to a symlink-exchange attack with Directory Traversal, giving
attackers arbitrary read-write access to the host filesystem with root
privileges, because daemon/archive.go does not do archive operations on a
frozen filesystem (or from within a chroot).

And includes additional post-18.09.6 fixes:

Builder
- Fixed a panic error when building dockerfiles that contain only comments.
  moby/moby#38487
- Added a workaround for GCR authentication issue. moby/moby#38246
- Builder-next: Fixed a bug in the GCR token cache implementation
  workaround.  moby/moby#39183

Runtime
- Added performance optimizations in aufs and layer store that helps in
  massively parallel container creation and removal.  moby/moby#39107,
  moby/moby#39135
- daemon: fixed a mirrors validation issue. moby/moby#38991
- Docker no longer supports sorting UID and GID ranges in ID maps.
  moby/moby#39288

Logging
- Added a fix that now allows large log lines for logger plugins.
  moby/moby#39038

Signed-off-by: Peter Korsgaard <[email protected]>
Signed-off-by: Arnout Vandecappelle (Essensium/Mind) <[email protected]>
(cherry picked from commit cdbb3ce)
Signed-off-by: Peter Korsgaard <[email protected]>
buildroot-auto-update pushed a commit to buildroot/buildroot that referenced this pull request Jul 7, 2019
Fixes CVE-2018-15664: API endpoints behind the 'docker cp' command are
vulnerable to a symlink-exchange attack with Directory Traversal, giving
attackers arbitrary read-write access to the host filesystem with root
privileges, because daemon/archive.go does not do archive operations on a
frozen filesystem (or from within a chroot).

And includes additional post-18.09.6 fixes:

Builder
- Fixed a panic error when building dockerfiles that contain only comments.
  moby/moby#38487
- Added a workaround for GCR authentication issue. moby/moby#38246
- Builder-next: Fixed a bug in the GCR token cache implementation
  workaround.  moby/moby#39183

Runtime
- Added performance optimizations in aufs and layer store that helps in
  massively parallel container creation and removal.  moby/moby#39107,
  moby/moby#39135
- daemon: fixed a mirrors validation issue. moby/moby#38991
- Docker no longer supports sorting UID and GID ranges in ID maps.
  moby/moby#39288

Logging
- Added a fix that now allows large log lines for logger plugins.
  moby/moby#39038

Signed-off-by: Peter Korsgaard <[email protected]>
Signed-off-by: Arnout Vandecappelle (Essensium/Mind) <[email protected]>
(cherry picked from commit 13cf6f0)
Signed-off-by: Peter Korsgaard <[email protected]>
buildroot-auto-update pushed a commit to buildroot/buildroot that referenced this pull request Jul 7, 2019
Fixes CVE-2018-15664: API endpoints behind the 'docker cp' command are
vulnerable to a symlink-exchange attack with Directory Traversal, giving
attackers arbitrary read-write access to the host filesystem with root
privileges, because daemon/archive.go does not do archive operations on a
frozen filesystem (or from within a chroot).

And includes additional post-18.09.6 fixes:

Builder
- Fixed a panic error when building dockerfiles that contain only comments.
  moby/moby#38487
- Added a workaround for GCR authentication issue. moby/moby#38246
- Builder-next: Fixed a bug in the GCR token cache implementation
  workaround.  moby/moby#39183

Runtime
- Added performance optimizations in aufs and layer store that helps in
  massively parallel container creation and removal.  moby/moby#39107,
  moby/moby#39135
- daemon: fixed a mirrors validation issue. moby/moby#38991
- Docker no longer supports sorting UID and GID ranges in ID maps.
  moby/moby#39288

Logging
- Added a fix that now allows large log lines for logger plugins.
  moby/moby#39038

Signed-off-by: Peter Korsgaard <[email protected]>
Signed-off-by: Arnout Vandecappelle (Essensium/Mind) <[email protected]>
(cherry picked from commit cdbb3ce)
Signed-off-by: Peter Korsgaard <[email protected]>
@thaJeztah thaJeztah added this to the 20.03.0 milestone Apr 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants