Skip to content

Fix 'docker cp' mount loop#44171

Closed
corhere wants to merge 2 commits intomoby:masterfrom
corhere:fix-mount-loop
Closed

Fix 'docker cp' mount loop#44171
corhere wants to merge 2 commits intomoby:masterfrom
corhere:fix-mount-loop

Conversation

@corhere
Copy link
Contributor

@corhere corhere commented Sep 22, 2022

- What I did

  • Fixed the 'docker cp' mount loop issue without resorting to MS_UNBINDABLE or temporary mount namespaces
  • Made the issue reproducible in a DinD environment so it can be covered by a regression test

- How I did it

  • I got the issue to reproduce in a DinD container by changing the mount propagation mode to shared, which matches the behaviour of SystemD PID 1 outside of a container.
  • Force the mount propagation mode of the container rootfs (in the initial mount namespace) to private before the container starts so that any mounts under it after do not propagate anywhere.

- How to verify it
Check out the first commit and run the integration test to observe it failing. Repeat with the second commit to observe it passing.

- Description for the changelog

  • Fixed docker cp failing with "no space left on device" when the daemon root is mounted into the container as a volume

- A picture of a cute animal (not mandatory but encouraged)

// unconditionally unmount it on cleanup without a chance of unmounting
// any mounts managed by the graph driver.
rootfs := container.BaseFS.Path()
return mount.Mount(rootfs, rootfs, "", "bind,private")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of making each container rootfs private, we could solve the issue in daemon.setupDaemonRootPropagation by ensuring the propagation of /var/lib/docker is slave or private rather than the current behaviour of ensuring it is shared or slave. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might not be a great idea, come to think of it. If /var/lib/docker had slave or private propagation and a container is started with /var/lib/docker as a volume, any rootfs mounts for other containers would get copied into the container namespace with a propagation mode which would prevent unmounts of the other containers' rootfs'es from propagating into that container? Should the daemon be forcing the propagation mode of /var/lib/docker to shared(+slave)?

corhere and others added 2 commits September 22, 2022 11:43
Modify the DinD entrypoint scripts to make the issue reproducible inside
a DinD container.

Co-authored-by: Bjorn Neergaard <[email protected]>
Signed-off-by: Bjorn Neergaard <[email protected]>
Signed-off-by: Cory Snider <[email protected]>
On modern Linux systems, the propagation mode for all mounts in the
initial namespace is set to shared early in boot. The propagation mode
of any container root directory is therefore also set to shared on such
systems as the default propagation mode of any mount is inherited from
the propagation mode of its parent mount. The runtime changes the
propagation mode of the container namespace's '/' mount to slave (by
default) to prevent mounts inside the container from propagating out
into the initial namespace, but any filesystems mounted under the rootfs
in the initial namespace could still propagate into the container.

When a 'docker cp' command is invoked, the daemon mounts all container
volumes under the rootfs directory in the initial namespace so that
files can be copied into and out of paths located inside volumes. The
mounts propagate into the container's namespace, overmounting the volume
mounts with themselves. This is not problematic in trivial cases as the
unmounts also propagate into the container namespace, reverting the
overmounts. However, when a parent directory of the container's rootfs
is configured as a container volume, the mounts propagated into the
container namespace for 'docker cp' are not fully reversed when the
daemon unmounts them in the initial namespace. Subsequent commands
propagate new mounts into the container namespace which do not get
cleaned up, which build up until the mount table fills to its limit and
subsequent mount operations fail with ENOSPC.

Set the propagation mode for the container rootfs path to private before
starting the container so that child mounts added in the initial
namespace after the fact never propagate into the container namespace in
the first place.

Signed-off-by: Cory Snider <[email protected]>
@corhere
Copy link
Contributor Author

corhere commented Sep 22, 2022

Come to think of it, making the container root fs mounts private wouldn't work as it would be subject to all those same lovely race conditions. If another container was to be started with /var/lib/docker bind-mounted in while a docker cp operation was in progress, all those volume mounts would be copied into the started container's namespace with private propagation and so would not be unmounted from that container's namespace when the docker cp operation unmounts them from the initial namespace. It really is looking like our only viable options are unbindable mounts or performing the copy in a different mount namespace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant