This repository was archived by the owner on Oct 13, 2023. It is now read-only.
[18.09 backport ENGCORE-830] aufs optimizations #39107#262
Merged
andrewhsu merged 12 commits intodocker-archive:18.09from Jun 17, 2019
Merged
[18.09 backport ENGCORE-830] aufs optimizations #39107#262andrewhsu merged 12 commits intodocker-archive:18.09from
andrewhsu merged 12 commits intodocker-archive:18.09from
Conversation
As standard mount.Unmount does what we need, let's use it. In addition, this adds ignoring "not mounted" condition, which was previously implemented (see PR#33329, commit cfa2591) via a very expensive call to mount.Mounted(). Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 77bc327)
Signed-off-by: Omri Shiv <[email protected]> (cherry picked from commit fe1083d)
It has been pointed out that we're ignoring EINVAL from umount(2) everywhere, so let's move it to a lower-level function. Also, its implementation should be the same for any UNIX incarnation, so let's consolidate it. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 90be078)
The errors returned from Mount and Unmount functions are raw syscall.Errno errors (like EPERM or EINVAL), which provides no context about what has happened and why. Similar to os.PathError type, introduce mount.Error type with some context. The error messages will now look like this: > mount /tmp/mount-tests/source:/tmp/mount-tests/target, flags: 0x1001: operation not permitted or > mount tmpfs:/tmp/mount-test-source-516297835: operation not permitted Before this patch, it was just > operation not permitted [v2: add Cause()] [v3: rename MountError to Error, document Cause()] [v4: fixes; audited all users] [v5: make Error type private; changes after @cpuguy83 reviews] Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 6533136) Signed-off-by: Kir Kolyshkin <[email protected]>
The function is not needed as it's just a shallow wrapper around unix.Mount(). Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 2f98b5f)
Both mount and unmount calls are already protected by fine-grained (per id) locks in Get()/Put() introduced in commit fc1cf19 ("Add more locking to storage drivers"), so there's no point in having a global lock in mount/unmount. The only place from which unmount is called without any locking is Cleanup() -- this is to be addressed in the next patch. This reverts commit 824c24e. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit f93750b)
1. Use mount.Unmount() which ignores EINVAL ("not mounted") error,
and provides better error diagnostics (so we don't have to explicitly
add target to error messages).
2. Since we're ignoring "not mounted" error, we can call
multiple unmounts without any locking -- but since "auplink flush"
is still involved and can produce an error in logs, let's keep
the check for fs being mounted (it's just a statfs so should be fast).
2. While at it, improve the "can't unmount" error message in Put().
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 4beee98)
Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 5873768)
Apparently there is some kind of race in aufs kernel module code, which leads to the errors like: [98221.158606] aufs au_xino_create2:186:dockerd[25801]: aufs.xino create err -17 [98221.162128] aufs au_xino_set:1229:dockerd[25801]: I/O Error, failed creating xino(-17). [98362.239085] aufs au_xino_create2:186:dockerd[6348]: aufs.xino create err -17 [98362.243860] aufs au_xino_set:1229:dockerd[6348]: I/O Error, failed creating xino(-17). [98373.775380] aufs au_xino_create:767:dockerd[27435]: open /dev/shm/aufs.xino(-17) [98389.015640] aufs au_xino_create2:186:dockerd[26753]: aufs.xino create err -17 [98389.018776] aufs au_xino_set:1229:dockerd[26753]: I/O Error, failed creating xino(-17). [98424.117584] aufs au_xino_create:767:dockerd[27105]: open /dev/shm/aufs.xino(-17) So, we have to have a lock around mount syscall. While at it, don't call the whole Unmount() on an error path, as it leads to bogus error from auplink flush. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 5cd6285)
In case there are a big number of layers, so that mount data won't fit
into a single memory page (4096 bytes on most platforms, which is good
enough for about 40 layers, depending on how long graphdriver root path
is), we supply additional layers with O_REMOUNT, as described in aufs
documentation.
Problem is, the current implementation does that one layer at a time
(i.e. there is one mount syscall per each additional layer).
Optimize the code to supply as many layers as we can fit in one page
(basically reusing the same code as for the original mount).
Note, per aufs docs, "[a]t remount-time, the options are interpreted
in the given order, e.g. left to right" so we should be good.
Tested on an image with ~100 layers.
Before (35 syscalls):
> [pid 22756] 1556919088.686955 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", "aufs", 0, "br:/mnt/volume_sfo2_09/docker-au"...) = 0 <0.000504>
> [pid 22756] 1556919088.687643 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c451b0, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000105>
> [pid 22756] 1556919088.687851 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c451ba, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000098>
> ..... (~30 lines skipped for clarity)
> [pid 22756] 1556919088.696182 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c45310, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000266>
After (2 syscalls):
> [pid 24352] 1556919361.799889 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/8e7ba189e347a834e99eea4ed568f95b86cec809c227516afdc7c70286ff9a20", "aufs", 0, "br:/mnt/volume_sfo2_09/docker-au"...) = 0 <0.001717>
> [pid 24352] 1556919361.801761 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/8e7ba189e347a834e99eea4ed568f95b86cec809c227516afdc7c70286ff9a20", 0xc000dbecb0, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.001358>
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit d58c434)
Do not use filepath.Walk() as there's no requirement to recursively go into every directory under mnt -- a (non-recursive) list of directories in mnt is sufficient. With filepath.Walk(), in case some container will fail to unmount, it'll go through the whole container filesystem which is both excessive and useless. This is similar to commit f1a4592 ("devmapper.shutdown: optimize") While at it, raise the priority of "unmount error" message from debug to a warning. Note we don't have to explicitly add `m` as unmount error (from pkg/mount) will have it. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 8fda12c)
Running a bundled aufs benchmark sometimes results in this warning:
> WARN[0001] Couldn't run auplink before unmount /tmp/aufs-tests/aufs/mnt/XXXXX error="exit status 22" storage-driver=aufs
If we take a look at what aulink utility produces on stderr, we'll see:
> auplink:proc_mnt.c:96: /tmp/aufs-tests/aufs/mnt/XXXXX: Invalid argument
and auplink exits with exit code of 22 (EINVAL).
Looking into auplink source code, what happens is it tries to find a
record in /proc/self/mounts corresponding to the mount point (by using
setmntent()/getmntent_r() glibc functions), and it fails.
Some manual testing, as well as runtime testing with lots of printf
added on mount/unmount, as well as calls to check the superblock fs
magic on mount point (as in graphdriver.Mounted(graphdriver.FsMagicAufs, target)
confirmed that this record is in fact there, but sometimes auplink
can't find it. I was also able to reproduce the same error (inability
to find a mount in /proc/self/mounts that should definitely be there)
using a small C program, mocking what `auplink` does:
```c
#include <stdio.h>
#include <err.h>
#include <mntent.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
FILE *fp;
struct mntent m, *p;
char a[4096];
char buf[4096 + 1024];
int found =0, lines = 0;
if (argc != 2) {
fprintf(stderr, "Usage: %s <mountpoint>\n", argv[0]);
exit(1);
}
fp = setmntent("/proc/self/mounts", "r");
if (!fp) {
err(1, "setmntent");
}
setvbuf(fp, a, _IOLBF, sizeof(a));
while ((p = getmntent_r(fp, &m, buf, sizeof(buf)))) {
lines++;
if (!strcmp(p->mnt_dir, argv[1])) {
found++;
}
}
printf("found %d entries for %s (%d lines seen)\n", found, argv[1], lines);
return !found;
}
```
I have also wrote a few other C proggies -- one that reads
/proc/self/mounts directly, one that reads /proc/self/mountinfo instead.
They are also prone to the same occasional error.
It is not perfectly clear why this happens, but so far my best theory
is when a lot of mounts/unmounts happen in parallel with reading
contents of /proc/self/mounts, sometimes the kernel fails to provide
continuity (i.e. it skips some part of file or mixes it up in some
other way). In other words, this is a kernel bug (which is probably
hard to fix unless some other interface to get a mount entry is added).
Now, there is no real fix, and a workaround I was able to come up
with is to retry when we got EINVAL. It usually works on the second
attempt, although I've once seen it took two attempts to go through.
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit ae431b1)
25705ba to
c303e63
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Backport
which required
Clean cherry-pick (after cherry-picking all the required commits).