Skip to content

introduce DOCKER_REMOUNT_SYS_RW (allow Docker-in-Kata without --privileged)#191

Closed
AkihiroSuda wants to merge 1 commit intodocker-library:masterfrom
AkihiroSuda:kata
Closed

introduce DOCKER_REMOUNT_SYS_RW (allow Docker-in-Kata without --privileged)#191
AkihiroSuda wants to merge 1 commit intodocker-library:masterfrom
AkihiroSuda:kata

Conversation

@AkihiroSuda
Copy link
Copy Markdown
Contributor

@AkihiroSuda AkihiroSuda commented Sep 12, 2019

Docker-in-Kata can be launched with --privileged but it ruins the benefit of Kata because it mounts /dev from the host.

Now Docker-in-Kata can be launched without --privileged:

$ docker run \
  --runtime kata \
  -e DOCKER_REMOUNT_SYS_RW=1 \
  --cap-add all \
  --security-opt seccomp=unconfined \
  --security-opt systempaths=unconfined \
  docker:dind

Tested with Kata Containers 1.8.0
(1.8.1 is broken: kata-containers/runtime#2047)

Alternative to moby/moby#39702

Signed-off-by: Akihiro Suda [email protected]

@AkihiroSuda AkihiroSuda changed the title introduce DOCKER_REMOUNT_SYS_RW (allow Docker-in-Kata without --privi… introduce DOCKER_REMOUNT_SYS_RW (allow Docker-in-Kata without --privileged) Sep 12, 2019
@thaJeztah
Copy link
Copy Markdown
Contributor

Does this change also allow DIND on non-kata runtimes?
Perhaps this would be a good example for inclusion in the docs for this image 🤔 (or would it be too specific if it's only for kata containers?)

--cap-add all

Curious; do we know if a more limited set of caps works?

…leged)

Docker-in-Kata can be launched with `--privileged` but it ruins the benefit of
Kata because it mounts `/dev` from the host.

Now Docker-in-Kata can be launched without `--privileged`:

  $ docker run --runtime kata -e DOCKER_REMOUNT_SYS_RW=1 --cap-add all --security-opt seccomp=unconfined --security-opt systempaths=unconfined docker:dind

Tested with Kata Containers 1.8.0
(1.8.1 is broken: kata-containers/runtime#2047)

Alternative to moby/moby#39702

Signed-off-by: Akihiro Suda <[email protected]>
@AkihiroSuda
Copy link
Copy Markdown
Contributor Author

Does this change also allow DIND on non-kata runtimes?

This doesn't seem to work on runc, however, even if it worked with runc, it doesn't make sense, because systempaths=unconfined allows container breakout (which is not problematic for Kata as long as the host devices are not mounted)

Curious; do we know if a more limited set of caps works?

Probably we can drop some capabilities related to power management and Smack stuffs, but I don't see any benefit because anyway we need systempaths=unconfined

#
# Full flags needed for Docker-in-Kata:
# --runtime kata -e DOCKER_REMOUNT_SYS_RW=1 --cap-add all --security-opt seccomp=unconfined --security-opt systempaths=unconfined
echo "Remounting /sys in RW"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the comment above;

  • Should this print a WARNING: ?
  • Should this produce an error if runtime is default/runc ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this print a WARNING: ?

I don't think this is WARNING. Because dind is typically launched with --privileged and sysfs is already RW

Should this produce an error if runtime is default/runc ?

I'm not sure there is a robust way to detect runc

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure there is a robust way to detect runc

Perhaps if --runtime is not set, idk (it was just a thought)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't inspect docker run flags from containers :p

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♂ arch, my bad, wasn't thinking; was thinking the flag was passed to the dind daemon 😂

@tianon
Copy link
Copy Markdown
Member

tianon commented Sep 12, 2019

Hilariously, this used to be a major part of the purpose of the hack/dind script in Docker's upstream repository. 😄 (before Docker mounted /sys/fs/cgroup by default)

@AkihiroSuda
Copy link
Copy Markdown
Contributor Author

@tianon

Do you think this should be moved to hack/dind in the moby repo?

@tianon
Copy link
Copy Markdown
Member

tianon commented Sep 19, 2019

I do think this functionality makes more sense there, yes.

Do you think there's anything that can be done to auto-detect the "bad" state to some degree?

You can see the old "cgroups mounting" implementation in the commit which deleted it, which might be useful for reference: moby/moby@81aa1b5#diff-fbcad9c68dbfcabbf5f6bf87019cc571

As a perhaps cleaner (in this case, meaning more generic) idea, is it potentially possible to detect that the /sys/fs/cgroup mounts are insufficient and simply remove them entirely, creating a whole new mount hierarchy to replace them with appropriate mounts? (which is roughly what that old code did)

https://github.com/tianon/cgroupfs-mount/blob/481f8b684fd67656f9d4c79828b48e24b6f9f321/cgroupfs-mount is probably another OK example, although it might have more edge cases (often used by folks/distributions without systemd automatically mounting a reasonable hierarchy).

This is also further complicated by cgroups v2, right?

@AkihiroSuda
Copy link
Copy Markdown
Contributor Author

Kata maintainers seems preferring moby/moby#39702

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants