Conversation
daemon/daemon_unix.go
Outdated
| // unconditionally unmount it on cleanup without a chance of unmounting | ||
| // any mounts managed by the graph driver. | ||
| rootfs := container.BaseFS.Path() | ||
| return mount.Mount(rootfs, rootfs, "", "bind,private") |
There was a problem hiding this comment.
Instead of making each container rootfs private, we could solve the issue in daemon.setupDaemonRootPropagation by ensuring the propagation of /var/lib/docker is slave or private rather than the current behaviour of ensuring it is shared or slave. WDYT?
There was a problem hiding this comment.
That might not be a great idea, come to think of it. If /var/lib/docker had slave or private propagation and a container is started with /var/lib/docker as a volume, any rootfs mounts for other containers would get copied into the container namespace with a propagation mode which would prevent unmounts of the other containers' rootfs'es from propagating into that container? Should the daemon be forcing the propagation mode of /var/lib/docker to shared(+slave)?
cf5c763 to
9aa38cc
Compare
Modify the DinD entrypoint scripts to make the issue reproducible inside a DinD container. Co-authored-by: Bjorn Neergaard <[email protected]> Signed-off-by: Bjorn Neergaard <[email protected]> Signed-off-by: Cory Snider <[email protected]>
On modern Linux systems, the propagation mode for all mounts in the initial namespace is set to shared early in boot. The propagation mode of any container root directory is therefore also set to shared on such systems as the default propagation mode of any mount is inherited from the propagation mode of its parent mount. The runtime changes the propagation mode of the container namespace's '/' mount to slave (by default) to prevent mounts inside the container from propagating out into the initial namespace, but any filesystems mounted under the rootfs in the initial namespace could still propagate into the container. When a 'docker cp' command is invoked, the daemon mounts all container volumes under the rootfs directory in the initial namespace so that files can be copied into and out of paths located inside volumes. The mounts propagate into the container's namespace, overmounting the volume mounts with themselves. This is not problematic in trivial cases as the unmounts also propagate into the container namespace, reverting the overmounts. However, when a parent directory of the container's rootfs is configured as a container volume, the mounts propagated into the container namespace for 'docker cp' are not fully reversed when the daemon unmounts them in the initial namespace. Subsequent commands propagate new mounts into the container namespace which do not get cleaned up, which build up until the mount table fills to its limit and subsequent mount operations fail with ENOSPC. Set the propagation mode for the container rootfs path to private before starting the container so that child mounts added in the initial namespace after the fact never propagate into the container namespace in the first place. Signed-off-by: Cory Snider <[email protected]>
9aa38cc to
373ce66
Compare
|
Come to think of it, making the container root fs mounts private wouldn't work as it would be subject to all those same lovely race conditions. If another container was to be started with |
- What I did
MS_UNBINDABLEor temporary mount namespaces- How I did it
- How to verify it
Check out the first commit and run the integration test to observe it failing. Repeat with the second commit to observe it passing.
- Description for the changelog
docker cpfailing with "no space left on device" when the daemon root is mounted into the container as a volume- A picture of a cute animal (not mandatory but encouraged)