
[WIP] daemon/archive: attempt to resolve leaking mounts during cp #43583

Closed
neersighted wants to merge 1 commit into moby:master from neersighted:pull/43583

Conversation

@neersighted (Member) commented May 11, 2022

- What I did

This is a naive first attempt at resolving the issue reported in #38995 and #43390. This approach does pass the integration tests locally, and it passes my minimal reproduction. It additionally passes the short stress-test of the daemon.

However, the attached integration test, while correct, does not fail without this patch. This is because the bug is not triggered under Docker-in-Docker -- to observe it, we have to be in the 'true root' mount namespace.
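As a rough illustration of why Docker-in-Docker masks the bug, one can check whether the current process shares PID 1's mount namespace. This is a sketch only, not part of the patch; it assumes a readable host `/proc` and sufficient privileges to inspect PID 1:

```shell
#!/usr/bin/env sh
# Sketch: rough check for running in the 'true root' mount namespace.
# Assumes /proc is the host procfs and PID 1 is inspectable (typically
# requires root). Not part of this patch.
self=$(readlink /proc/self/ns/mnt)
init=$(readlink /proc/1/ns/mnt 2>/dev/null)
if [ -n "$init" ] && [ "$self" = "$init" ]; then
  echo "same mount namespace as PID 1 (root namespace)"
else
  echo "nested or undetermined mount namespace"
fi
```

Under Docker-in-Docker the daemon sits in a nested mount namespace, so mounts it makes never reach the host's root namespace and the leak cannot be observed.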

There are a couple of other possibilities we can explore -- the first one I see is an [alternate test location](https://github.com/moby/moby/blob/c55a4ac7795c7606b548b38e24673733481e2167/daemon/daemon_linux_test.go). Testing the daemon without the container abstraction would take this from a regression test for `docker cp` to a test of the soundness of mount propagation in the special case of mounting the daemon root into the container.

The second alternative is a completely different, larger-scale change, to use a disposable mount namespace in the engine to do all container mounting in daemon/archive. This is more invasive and harder to prototype, but less likely to have issues in the future as we can completely sidestep this category of issues (interactions with the root mount namespace).
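As a minimal sketch of that second idea (illustrative only, not the proposed engine change; assumes unprivileged user namespaces are available so `unshare -r` works, otherwise run as root without `-r`), a mount performed inside a throwaway mount namespace never appears in the host namespace:

```shell
#!/usr/bin/env sh
# Sketch: a mount made in a disposable mount namespace does not
# propagate back to the host namespace. util-linux unshare gives the
# new namespace private propagation by default.
before=$(wc -l < /proc/self/mountinfo)
unshare -r -m sh -c 'mount -t tmpfs none /tmp && wc -l < /proc/self/mountinfo'
after=$(wc -l < /proc/self/mountinfo)
echo "host delta: $((after - before))"   # expect 0: nothing leaked
```

Doing all `daemon/archive` container mounting inside such a namespace would sidestep propagation into the root namespace entirely, at the cost of a much more invasive change.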

Eyeballs and input on this approach, or on the merits of the other two approaches (as well as anything I haven't foreseen), are appreciated. I would like feedback from upstream (especially on how to test/validate this) before I go too far ahead with more revisions.

A prior attempt at solving this can be found at #38993.

Signed-off-by: Bjorn Neergaard [email protected]

- How I did it

The current approach is ugly, but results in a minimally invasive change.

- Description for the changelog

Resolve mounts leaking during docker cp when the docker root is mounted into the container (#38995, #43390).

- How to verify it

Minimal reproduction:

docker run --name mount-repro -d -v /var:/hostvar:rslave busybox sleep 1d
docker exec -it mount-repro cat /proc/self/mountinfo | wc -l   # baseline mount count
docker cp mount-repro:/etc/passwd ./testfile
docker exec -it mount-repro cat /proc/self/mountinfo | wc -l   # without the fix, this count grows
docker rm -f mount-repro
rm testfile

Stress-test this change:

#!/usr/bin/env bash

set -e

NCONTAINERS=${NCONTAINERS:-$((2*$(nproc)))}
NPARALLEL=${NPARALLEL:-4}
DURATION=${DURATION:-1m}

echo "Creating $NCONTAINERS containers for stress test..."
for _ in $(seq "$NCONTAINERS"); do
  docker run -d -v /var:/hostvar:rslave busybox sleep 1d | tee -a stress-containers
done

OUTDIR=$(mktemp -d)

cleanup() {
  echo "Reaping tasks..."
  for job in $(jobs -p); do
    kill "$job"
  done

  wait

  echo "Killing containers..."
  parallel -u -j"$NPARALLEL" -a stress-containers docker rm -f '{}'
  rm -f stress-containers
  rm -rf "$OUTDIR"
}; trap 'cleanup' EXIT

copy() {
  echo "Starting copy..."
  while test -f stress-containers; do
    parallel -u -j"$NPARALLEL" -a <(shuf stress-containers) docker cp '{}':/etc/passwd "$(mktemp "${OUTDIR}/stress-XXXX")"
  done
}; copy &

restart() {
  echo "Starting restart..."
  while test -f stress-containers; do
    parallel -u -j"$NPARALLEL" -a <(shuf stress-containers) docker restart -t0 '{}'
  done
}; restart &

sleep "$DURATION"
echo "Test complete!"

Guidance is needed on a proper regression test as the current tests will not fail under Docker-in-Docker.

@neersighted (Member Author) commented:
Per discussion offline in the maintainers meeting, this approach is intractable because of the testing issue. While future test harnesses may be able to do end-to-end testing including the kernel <-> daemon interface, we don't have that capability at present and this is a band-aid.

The conclusion was that docker cp needs to be re-engineered to avoid this category of bug. This also has the side-effect that a new design will be much easier to refactor when the eventual containerd snapshotter refactor happens.
