privileged_without_host_devices=true prevents running containers within DIND #5679

@dtnyn

This duplicates issue #6643 from the old CRI project repo, since we are still experiencing the same problem.

Description

When containerd is configured with privileged_without_host_devices=true and runc 1.0.0-rc91 or later is in use, we cannot run any containers inside a privileged DIND container.

We saw the following error when running a normal container in DIND on a node running containerd 1.4.3 with runc 1.0.0-rc95:

/ # docker container run alpine:3.7
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: failed to write "c 5:1 rwm": write /sys/fs/cgroup/devices/docker/d933b364769b1857434209b2576d82c40871670730ab190d983c4b0b50493045/devices.allow: operation not permitted: unknown.

This happens because runc 1.0.0-rc91 removed the whitelisted /dev/console rule (c 5:1 rwm), and oci.WithAllDevicesAllowed is not enabled for privileged containers when privileged_without_host_devices=true is configured. The device cgroup of the DIND container then contains:

/ # cat /sys/fs/cgroup/devices/devices.list
b *:* m
c *:* m
c 1:3 rwm
c 1:5 rwm
c 1:7 rwm
c 1:8 rwm
c 1:9 rwm
c 5:0 rwm
c 5:2 rwm
c 10:200 rwm
c 136:* rwm

With privileged=true and privileged_without_host_devices=false this is not a problem, because all devices are allowed:

/ # cat /sys/fs/cgroup/devices/devices.list
a *:* rwm
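
To make the difference between the two listings concrete, here is a minimal Go sketch that checks whether a requested device rule is covered by a devices.list. It is a simplified model, not the kernel's or runc's actual logic: it only tests whether a single existing entry covers the request and does not combine several partial entries. The names `parseRule`, `covers`, and `isAllowed` are hypothetical helpers introduced for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// rule is one entry of a cgroup v1 devices.list file, e.g. "c 136:* rwm".
type rule struct {
	typ    string // "a" (all), "b" (block) or "c" (char)
	major  string // device major number or "*"
	minor  string // device minor number or "*"
	access string // subset of "rwm" (read, write, mknod)
}

// parseRule parses a single "type major:minor access" line.
func parseRule(line string) (rule, error) {
	f := strings.Fields(line)
	if len(f) != 3 {
		return rule{}, fmt.Errorf("malformed rule %q", line)
	}
	major, minor, ok := strings.Cut(f[1], ":")
	if !ok {
		return rule{}, fmt.Errorf("malformed device numbers in %q", line)
	}
	return rule{typ: f[0], major: major, minor: minor, access: f[2]}, nil
}

// covers reports whether the allowed entry grants everything requested asks for.
func covers(allowed, requested rule) bool {
	if allowed.typ != "a" && allowed.typ != requested.typ {
		return false
	}
	if allowed.major != "*" && allowed.major != requested.major {
		return false
	}
	if allowed.minor != "*" && allowed.minor != requested.minor {
		return false
	}
	for _, a := range requested.access {
		if !strings.ContainsRune(allowed.access, a) {
			return false
		}
	}
	return true
}

// isAllowed reports whether any single entry in list covers request.
// Simplification: partial entries are not merged.
func isAllowed(list []string, request string) (bool, error) {
	req, err := parseRule(request)
	if err != nil {
		return false, err
	}
	for _, line := range list {
		r, err := parseRule(line)
		if err != nil {
			return false, err
		}
		if covers(r, req) {
			return true, nil
		}
	}
	return false, nil
}

// restrictedList is the listing seen with privileged_without_host_devices=true.
var restrictedList = []string{
	"b *:* m", "c *:* m", "c 1:3 rwm", "c 1:5 rwm", "c 1:7 rwm", "c 1:8 rwm",
	"c 1:9 rwm", "c 5:0 rwm", "c 5:2 rwm", "c 10:200 rwm", "c 136:* rwm",
}

// openList is the listing seen with privileged_without_host_devices=false.
var openList = []string{"a *:* rwm"}

func main() {
	for name, list := range map[string][]string{"restricted": restrictedList, "open": openList} {
		ok, err := isAllowed(list, "c 5:1 rwm")
		fmt.Println(name, ok, err)
	}
}
```

Under this model the request `c 5:1 rwm` is rejected by the restricted listing, because the only entry matching device 5:1 is `c *:* m`, which grants mknod but not read or write. The single `a *:* rwm` entry covers it.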

Steps to reproduce the issue:

Using the pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: dind-test
  namespace: default
spec:
  containers:
    - name: client
      image: docker:stable-dind
      command:
        - sleep
        - "86400"
      env:
        - name: DOCKER_HOST
          value: tcp://127.0.0.1:2375
    - name: dind
      image: docker:stable-dind
      env:
        - name: DOCKER_TLS_CERTDIR
      securityContext:
        privileged: true

and then exec-ing into the client container:

$ kubectl exec -it dind-test -c client -- sh
/ # docker container run alpine:3.7
Unable to find image 'alpine:3.7' locally
3.7: Pulling from library/alpine
5d20c808ce19: Pull complete 
Digest: sha256:8421d9a84432575381bfabd248f1eb56f3aa21d9d7cd2511583c68c9b7511d10
Status: Downloaded newer image for alpine:3.7
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: failed to write "c 5:1 rwm": write /sys/fs/cgroup/devices/docker/4dcb197f000dbe16e9aad23ae2c5105006b4e3e9ed41d5455436ee68cd606e13/devices.allow: operation not permitted: unknown.
/ # 

Workaround

The workaround of adding a lifecycle hook to the privileged Kubernetes container to update the cgroup still works:

      lifecycle:
        postStart:
          exec:
            command:
              - /bin/sh
              - -c
              - echo "a *:* rwm" > /sys/fs/cgroup/devices/devices.allow

However, as things stand, this hook would have to be added to every privileged DIND container spec for the daemon to run any container at all.

Proposed Solution

Given the comment about keeping the behaviour of privileged_without_host_devices unchanged (containerd/cri#1567 (comment)), the proposed solution is to add a new flag that enables oci.WithAllDevicesAllowed on top of privileged_without_host_devices when it is set.

diff --git a/pkg/cri/config/config.go b/pkg/cri/config/config.go
index b1c04add3..3814d099c 100644
--- a/pkg/cri/config/config.go
+++ b/pkg/cri/config/config.go
@@ -54,6 +54,10 @@ type Runtime struct {
        // PrivilegedWithoutHostDevices overloads the default behaviour for adding host devices to the
        // runtime spec when the container is privileged. Defaults to false.
        PrivilegedWithoutHostDevices bool `toml:"privileged_without_host_devices" json:"privileged_without_host_devices"`
+       // PrivilegedWithoutHostDevicesAllDevicesAllowed overloads the default behaviour for mounting host devices and
+       // whitelisting devices to the runtime spec when the container is privileged. Requires
+       // PrivilegedWithoutHostDevices to be enabled. Defaults to false.
+       PrivilegedWithoutHostDevicesAllDevicesAllowed bool `toml:"privileged_without_host_devices_all_devices_allowed" json:"privileged_without_host_devices_all_devices_allowed"`
        // BaseRuntimeSpec is a json file with OCI spec to use as base spec that all container's will be created from.
        BaseRuntimeSpec string `toml:"base_runtime_spec" json:"baseRuntimeSpec"`
 }
diff --git a/pkg/cri/server/container_create_linux.go b/pkg/cri/server/container_create_linux.go
index 26386e991..12ba708aa 100644
--- a/pkg/cri/server/container_create_linux.go
+++ b/pkg/cri/server/container_create_linux.go
@@ -221,6 +221,10 @@ func (c *criService) containerSpec(
                if !ociRuntime.PrivilegedWithoutHostDevices {
                        specOpts = append(specOpts, oci.WithHostDevices, oci.WithAllDevicesAllowed)
                } else {
+                       // allow rwm on all devices for the container
+                       if ociRuntime.PrivilegedWithoutHostDevicesAllDevicesAllowed {
+                               specOpts = append(specOpts, oci.WithAllDevicesAllowed)
+                       }
                        // add requested devices by the config as host devices are not automatically added
                        specOpts = append(specOpts, customopts.WithDevices(c.os, config),
                                customopts.WithCapabilities(securityContext, c.allCaps))
