[18.09 backport ENGCORE-830] aufs optimizations #39107 by kolyshkin · Pull Request #262 · docker-archive/engine

kolyshkin · 2019-06-04T22:37:45Z

Backport

aufs optimizations moby/moby#39107

which required

Clean cherry-pick (after cherry-picking all the required commits).

git cherry-pick -x 893b24b80db170279d5c9532ed508a81c328de5e
git cherry-pick -x b941f081523cce7defafd4457f380442e10fa345
git cherry-pick -x 47c51447e1b6dacf92b40574f6f929958ca9d621 6fd26c4185ce408e72de1d2728e75f8ed74e39f0
git cherry-pick -x f93750b2c4d5f6144f0790ffa89a2a43c6ddbc6ddbc58a7916f74b4def55f577b52291da3c097b80 4beee98026feabe4f4f0468215b8fd9b56f90d5e 5873768dbe3be2733874b8cf68cb492817f81a94 5cd62852fa199f272b542d828c8c5d1db427ea53 d58c434bffef76e48bff75ede290937874488dd3 8fda12c6078ed5c86be0822a7a980c6fbc9220bf ae431b10a9508e2bf3b1782e9d435855e3e13977

As standard mount.Unmount does what we need, let's use it. In addition, this adds ignoring "not mounted" condition, which was previously implemented (see PR#33329, commit cfa2591) via a very expensive call to mount.Mounted(). Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 77bc327)

Signed-off-by: Omri Shiv <[email protected]> (cherry picked from commit fe1083d)

It has been pointed out that we're ignoring EINVAL from umount(2) everywhere, so let's move it to a lower-level function. Also, its implementation should be the same for any UNIX incarnation, so let's consolidate it. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 90be078)

@cpuguy83

The errors returned from Mount and Unmount functions are raw syscall.Errno errors (like EPERM or EINVAL), which provides no context about what has happened and why. Similar to os.PathError type, introduce mount.Error type with some context. The error messages will now look like this: > mount /tmp/mount-tests/source:/tmp/mount-tests/target, flags: 0x1001: operation not permitted or > mount tmpfs:/tmp/mount-test-source-516297835: operation not permitted Before this patch, it was just > operation not permitted [v2: add Cause()] [v3: rename MountError to Error, document Cause()] [v4: fixes; audited all users] [v5: make Error type private; changes after @cpuguy83 reviews] Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 6533136) Signed-off-by: Kir Kolyshkin <[email protected]>

The function is not needed as it's just a shallow wrapper around unix.Mount(). Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 2f98b5f)

Both mount and unmount calls are already protected by fine-grained (per id) locks in Get()/Put() introduced in commit fc1cf19 ("Add more locking to storage drivers"), so there's no point in having a global lock in mount/unmount. The only place from which unmount is called without any locking is Cleanup() -- this is to be addressed in the next patch. This reverts commit 824c24e. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit f93750b)

1. Use mount.Unmount() which ignores EINVAL ("not mounted") error, and provides better error diagnostics (so we don't have to explicitly add target to error messages). 2. Since we're ignoring "not mounted" error, we can call multiple unmounts without any locking -- but since "auplink flush" is still involved and can produce an error in logs, let's keep the check for fs being mounted (it's just a statfs so should be fast). 2. While at it, improve the "can't unmount" error message in Put(). Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 4beee98)

Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 5873768)

Apparently there is some kind of race in aufs kernel module code, which leads to the errors like: [98221.158606] aufs au_xino_create2:186:dockerd[25801]: aufs.xino create err -17 [98221.162128] aufs au_xino_set:1229:dockerd[25801]: I/O Error, failed creating xino(-17). [98362.239085] aufs au_xino_create2:186:dockerd[6348]: aufs.xino create err -17 [98362.243860] aufs au_xino_set:1229:dockerd[6348]: I/O Error, failed creating xino(-17). [98373.775380] aufs au_xino_create:767:dockerd[27435]: open /dev/shm/aufs.xino(-17) [98389.015640] aufs au_xino_create2:186:dockerd[26753]: aufs.xino create err -17 [98389.018776] aufs au_xino_set:1229:dockerd[26753]: I/O Error, failed creating xino(-17). [98424.117584] aufs au_xino_create:767:dockerd[27105]: open /dev/shm/aufs.xino(-17) So, we have to have a lock around mount syscall. While at it, don't call the whole Unmount() on an error path, as it leads to bogus error from auplink flush. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 5cd6285)

In case there are a big number of layers, so that mount data won't fit into a single memory page (4096 bytes on most platforms, which is good enough for about 40 layers, depending on how long graphdriver root path is), we supply additional layers with O_REMOUNT, as described in aufs documentation. Problem is, the current implementation does that one layer at a time (i.e. there is one mount syscall per each additional layer). Optimize the code to supply as many layers as we can fit in one page (basically reusing the same code as for the original mount). Note, per aufs docs, "[a]t remount-time, the options are interpreted in the given order, e.g. left to right" so we should be good. Tested on an image with ~100 layers. Before (35 syscalls): > [pid 22756] 1556919088.686955 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", "aufs", 0, "br:/mnt/volume_sfo2_09/docker-au"...) = 0 <0.000504> > [pid 22756] 1556919088.687643 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c451b0, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000105> > [pid 22756] 1556919088.687851 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c451ba, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000098> > ..... (~30 lines skipped for clarity) > [pid 22756] 1556919088.696182 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c45310, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000266> After (2 syscalls): > [pid 24352] 1556919361.799889 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/8e7ba189e347a834e99eea4ed568f95b86cec809c227516afdc7c70286ff9a20", "aufs", 0, "br:/mnt/volume_sfo2_09/docker-au"...) = 0 <0.001717> > [pid 24352] 1556919361.801761 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/8e7ba189e347a834e99eea4ed568f95b86cec809c227516afdc7c70286ff9a20", 0xc000dbecb0, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.001358> Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit d58c434)

Do not use filepath.Walk() as there's no requirement to recursively go into every directory under mnt -- a (non-recursive) list of directories in mnt is sufficient. With filepath.Walk(), in case some container will fail to unmount, it'll go through the whole container filesystem which is both excessive and useless. This is similar to commit f1a4592 ("devmapper.shutdown: optimize") While at it, raise the priority of "unmount error" message from debug to a warning. Note we don't have to explicitly add `m` as unmount error (from pkg/mount) will have it. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 8fda12c)

Running a bundled aufs benchmark sometimes results in this warning: > WARN[0001] Couldn't run auplink before unmount /tmp/aufs-tests/aufs/mnt/XXXXX error="exit status 22" storage-driver=aufs If we take a look at what aulink utility produces on stderr, we'll see: > auplink:proc_mnt.c:96: /tmp/aufs-tests/aufs/mnt/XXXXX: Invalid argument and auplink exits with exit code of 22 (EINVAL). Looking into auplink source code, what happens is it tries to find a record in /proc/self/mounts corresponding to the mount point (by using setmntent()/getmntent_r() glibc functions), and it fails. Some manual testing, as well as runtime testing with lots of printf added on mount/unmount, as well as calls to check the superblock fs magic on mount point (as in graphdriver.Mounted(graphdriver.FsMagicAufs, target) confirmed that this record is in fact there, but sometimes auplink can't find it. I was also able to reproduce the same error (inability to find a mount in /proc/self/mounts that should definitely be there) using a small C program, mocking what `auplink` does: ```c #include <stdio.h> #include <err.h> #include <mntent.h> #include <string.h> #include <stdlib.h> int main(int argc, char **argv) { FILE *fp; struct mntent m, *p; char a[4096]; char buf[4096 + 1024]; int found =0, lines = 0; if (argc != 2) { fprintf(stderr, "Usage: %s <mountpoint>\n", argv[0]); exit(1); } fp = setmntent("/proc/self/mounts", "r"); if (!fp) { err(1, "setmntent"); } setvbuf(fp, a, _IOLBF, sizeof(a)); while ((p = getmntent_r(fp, &m, buf, sizeof(buf)))) { lines++; if (!strcmp(p->mnt_dir, argv[1])) { found++; } } printf("found %d entries for %s (%d lines seen)\n", found, argv[1], lines); return !found; } ``` I have also wrote a few other C proggies -- one that reads /proc/self/mounts directly, one that reads /proc/self/mountinfo instead. They are also prone to the same occasional error. It is not perfectly clear why this happens, but so far my best theory is when a lot of mounts/unmounts happen in parallel with reading contents of /proc/self/mounts, sometimes the kernel fails to provide continuity (i.e. it skips some part of file or mixes it up in some other way). In other words, this is a kernel bug (which is probably hard to fix unless some other interface to get a mount entry is added). Now, there is no real fix, and a workaround I was able to come up with is to retry when we got EINVAL. It usually works on the second attempt, although I've once seen it took two attempts to go through. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit ae431b1)

thaJeztah

LGTM

andrewhsu

LGTM

kolyshkin and others added 12 commits June 4, 2019 15:21

fix typo

b941f08

Signed-off-by: Omri Shiv <[email protected]> (cherry picked from commit fe1083d)

aufs: get rid of mount()

023b63a

The function is not needed as it's just a shallow wrapper around unix.Mount(). Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 2f98b5f)

aufs: aufsMount: better errors for unix.Mount()

4ab3020

Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 5873768)

kolyshkin force-pushed the 18.09-backport-aufs-lock branch from 25705ba to c303e63 Compare June 5, 2019 18:51

kolyshkin changed the title ~~[18.09 backport] aufs optimizations #39107~~ [18.09 backport] ENGCORE-830 aufs optimizations #39107 Jun 5, 2019

kolyshkin changed the title ~~[18.09 backport] ENGCORE-830 aufs optimizations #39107~~ [18.09 backport ENGCORE-830] aufs optimizations #39107 Jun 5, 2019

thaJeztah added this to the 18.09.7 milestone Jun 11, 2019

thaJeztah approved these changes Jun 11, 2019

View reviewed changes

andrewhsu approved these changes Jun 17, 2019

View reviewed changes

andrewhsu merged commit 95d8b39 into docker-archive:18.09 Jun 17, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[18.09 backport ENGCORE-830] aufs optimizations #39107#262

[18.09 backport ENGCORE-830] aufs optimizations #39107#262
andrewhsu merged 12 commits intodocker-archive:18.09from
kolyshkin:18.09-backport-aufs-lock

kolyshkin commented Jun 4, 2019

Uh oh!

thaJeztah left a comment

Uh oh!

andrewhsu left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

kolyshkin commented Jun 4, 2019

Uh oh!

thaJeztah left a comment

Choose a reason for hiding this comment

Uh oh!

andrewhsu left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants