privileged container's disk attribute is ro because the namespace of pause container is added #11270

@fengwei0328

Description

When I create a container with privileged enabled, its sysfs (/sys) is still mounted read-only: because the pause container's sysfs is ro, my privileged container's sysfs is ro as well.

Steps to reproduce the issue

pod.json

{
    "metadata": {
        "name": "privileged-pod",
        "namespace": "k8s.io",
        "uid": "hdishd83djaidwnduwk28bcsb"
    },
    "command": [
        "top"
    ],
    "log_directory": "/var/log/pods",
    "linux": {
        "security_context": {
            "privileged": true
        }
    }
}

container.json

{
    "metadata": {
        "name": "busybox-200-3"
    },
    "image": {
        "image": "docker.io/library/busybox"
    },
    "command": [
        "top"
    ],
    "tty": true,
    "stdin": true,
    "log_path": "busybox-200-3.log",
    "mounts": [
        {
            "container_path": "/sys",
            "host_path": "/sys"
        }
    ],
    "linux": {
        "security_context": {
            "privileged": true
        }
    }
}

Inside the container:

/ # mount | grep  sysfs
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
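The same check can be done programmatically by reading /proc/self/mounts and inspecting the options of the /sys entry. This is a minimal sketch; mountOptions and isReadOnly are hypothetical helpers written for this report, not part of containerd. When several entries are stacked on /sys, the sketch assumes the last one listed is the one in effect.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// mountOptions scans text in /proc/mounts format
// ("source target fstype options dump pass") and returns the option list
// of the last entry mounted at target. found is false if there is none.
func mountOptions(mounts, target string) (opts []string, found bool) {
	sc := bufio.NewScanner(strings.NewReader(mounts))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 4 && fields[1] == target {
			// Keep scanning: a later entry is stacked on top of earlier ones.
			opts, found = strings.Split(fields[3], ","), true
		}
	}
	return opts, found
}

// isReadOnly reports whether an option list contains "ro".
func isReadOnly(opts []string) bool {
	for _, o := range opts {
		if o == "ro" {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/self/mounts")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	opts, ok := mountOptions(string(data), "/sys")
	if !ok {
		fmt.Println("/sys is not mounted")
		return
	}
	fmt.Printf("/sys options: %v (read-only: %t)\n", opts, isReadOnly(opts))
}
```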

Examining a privileged container run through Kubernetes, we see this in the OCI bundle config.json:

    {
      "destination": "/sys",
      "type": "sysfs",
      "source": "sysfs",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "rw"
      ]
    },

Despite this, /sys ends up ro, because the pod sandbox's config.json has:

    {
      "destination": "/sys",
      "type": "sysfs",
      "source": "sysfs",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "ro"
      ]
    },
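To compare the sandbox bundle and the container bundle side by side, a small helper can decode each config.json and print the options of its /sys mount. This is a sketch under stated assumptions: ociMount and ociConfig are hypothetical local types that mirror the "mounts" entries of an OCI runtime config.json (the real type is specs.Mount from the runtime-spec Go module), and the file paths passed on the command line are whatever bundle directories you want to inspect.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// ociMount mirrors one entry of the "mounts" array in an OCI config.json.
type ociMount struct {
	Destination string   `json:"destination"`
	Type        string   `json:"type"`
	Source      string   `json:"source"`
	Options     []string `json:"options"`
}

// ociConfig decodes only the part of config.json needed here.
type ociConfig struct {
	Mounts []ociMount `json:"mounts"`
}

// sysMountOptions returns the options of the "/sys" mount in a config.json.
func sysMountOptions(configJSON []byte) ([]string, error) {
	var cfg ociConfig
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return nil, err
	}
	for _, m := range cfg.Mounts {
		if m.Destination == "/sys" {
			return m.Options, nil
		}
	}
	return nil, fmt.Errorf("no /sys mount in config")
}

func main() {
	// Usage: go run . <sandbox config.json> <container config.json>
	for _, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		opts, err := sysMountOptions(data)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", path, err)
			continue
		}
		fmt.Printf("%s: /sys %v\n", path, opts)
	}
}
```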

I traced the ro option to the default mounts applied to the pause container:

func defaultMounts() []specs.Mount {
	return []specs.Mount{
		// ... other default mounts elided ...
		{
			Destination: "/sys",
			Type:        "sysfs",
			Source:      "sysfs",
			Options:     []string{"nosuid", "noexec", "nodev", "ro"},
		},
		// ...
	}
}

As a workaround, I patched RunPodSandbox as follows:

// Create sandbox container.
// NOTE: sandboxContainerSpec SHOULD NOT have side
// effect, e.g. accessing/creating files, so that we can test
// it safely.
spec, err := c.sandboxContainerSpec(id, config, &image.ImageSpec.Config, metadata.NetNSPath, ociRuntime.PodAnnotations)
if err != nil {
	return cin, fmt.Errorf("failed to generate sandbox container spec: %w", err)
}

// Workaround: if the sandbox is privileged, mount sysfs rw in the
// sandbox (pause) container spec too. The generated Get* getters are
// nil-safe, so this does not panic when Linux or SecurityContext is unset.
if config.GetLinux().GetSecurityContext().GetPrivileged() {
	for i, m := range spec.Mounts {
		if m.Destination == "/sys" {
			spec.Mounts[i].Options = []string{"nosuid", "noexec", "nodev", "rw"}
			break
		}
	}
}
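A variant of the workaround flips only the "ro" flag instead of overwriting the whole option list, so any other options in the default sysfs mount survive if those defaults change. This is a standalone sketch: Mount is a hypothetical stand-in for specs.Mount reduced to the fields used here, and makeSysfsWritable is not an existing containerd function.

```go
package main

import "fmt"

// Mount is a stand-in for specs.Mount (OCI runtime-spec), reduced to the
// fields used here.
type Mount struct {
	Destination string
	Type        string
	Source      string
	Options     []string
}

// makeSysfsWritable replaces "ro" with "rw" in the options of the sysfs
// mount at /sys and leaves every other option and mount untouched.
// It reports whether such a mount was found.
func makeSysfsWritable(mounts []Mount) bool {
	for i := range mounts {
		m := &mounts[i]
		if m.Destination != "/sys" || m.Type != "sysfs" {
			continue
		}
		// Copy first so a slice shared with other specs is not mutated.
		opts := append([]string(nil), m.Options...)
		for j, o := range opts {
			if o == "ro" {
				opts[j] = "rw"
			}
		}
		m.Options = opts
		return true
	}
	return false
}

func main() {
	mounts := []Mount{{
		Destination: "/sys",
		Type:        "sysfs",
		Source:      "sysfs",
		Options:     []string{"nosuid", "noexec", "nodev", "ro"},
	}}
	makeSysfsWritable(mounts)
	fmt.Println(mounts[0].Options)
}
```

In the patch above this would be called as makeSysfsWritable on spec.Mounts (after adapting it to specs.Mount) inside the privileged check.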

I'm also working on support for passing mount options through, similar to the --mount flag of ctr and nerdctl. I had planned to finish that before reporting this, but in the meantime the workaround above avoids the problem.
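For illustration, a --mount style value could be parsed into a mount like this. The key names and the colon-separated options follow the example format shown in ctr's --mount help text (type=bind,src=/tmp,dst=/host,options=rbind:ro); parseMountFlag and Mount are hypothetical names, and this is not containerd's actual parser.

```go
package main

import (
	"fmt"
	"strings"
)

// Mount is a stand-in for specs.Mount (OCI runtime-spec), reduced to the
// fields a --mount flag would populate.
type Mount struct {
	Type        string
	Source      string
	Destination string
	Options     []string
}

// parseMountFlag parses a comma-separated key=value list such as
// "type=sysfs,src=sysfs,dst=/sys,options=nosuid:noexec:nodev:rw".
func parseMountFlag(s string) (Mount, error) {
	var m Mount
	for _, field := range strings.Split(s, ",") {
		key, val, ok := strings.Cut(field, "=")
		if !ok {
			return Mount{}, fmt.Errorf("invalid field %q: expected key=value", field)
		}
		switch key {
		case "type":
			m.Type = val
		case "src", "source":
			m.Source = val
		case "dst", "destination":
			m.Destination = val
		case "options":
			// Options are colon-separated so they do not clash with ','.
			m.Options = strings.Split(val, ":")
		default:
			return Mount{}, fmt.Errorf("unknown key %q", key)
		}
	}
	if m.Destination == "" {
		return Mount{}, fmt.Errorf("mount %q has no destination", s)
	}
	return m, nil
}

func main() {
	m, err := parseMountFlag("type=sysfs,src=sysfs,dst=/sys,options=nosuid:noexec:nodev:rw")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", m)
}
```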

Describe the results you received and expected

I expect a privileged container's sysfs to be mounted rw, at least when /sys is not bind-mounted from the host.

What version of containerd are you using?

containerd github.com/containerd/containerd/v2 v2.0.1 88aa2f5

Any other relevant information

No response

Show configuration if it is related to CRI plugin.

No response

Metadata

Assignees: No one assigned
Status: Done
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests