
Preserve cgroup mount options for privileged containers #12952

Merged
samuelkarp merged 4 commits into containerd:main from chrishenzie:mount-option-removal on Mar 24, 2026

@chrishenzie (Member) commented Feb 28, 2026

Privileged containers are not given a cgroup namespace of their own, so they run in the host's cgroup namespace. The CRI plugin only adds a cgroup namespace for non-privileged containers on cgroup v2 hosts:

```go
// cgroupns is used for hiding /sys/fs/cgroup from containers.
// For compatibility, cgroupns is not used when running in cgroup v1 mode or in privileged.
// https://github.com/containers/libpod/issues/4363
// https://github.com/kubernetes/enhancements/blob/0e409b47497e398b369c281074485c8de129694f/keps/sig-node/20191118-cgroups-v2.md#cgroup-namespace
if isUnifiedCgroupsMode() && !securityContext.GetPrivileged() {
	specOpts = append(specOpts, oci.WithLinuxNamespace(runtimespec.LinuxNamespace{Type: runtimespec.CgroupNamespace}))
}
```

Because the cgroup2 VFS superblock is shared with the host, mounting cgroup2 inside a privileged container with a different set of mount options can inadvertently change the mount options of the host's cgroup2 superblock. The container's cgroup mount options were previously hardcoded, so any host-only options such as nsdelegate or memory_recursiveprot were silently stripped from the host.

This PR fixes the issue by reading the host's /sys/fs/cgroup mount options during container creation and explicitly including them in the container's cgroup mount when the container is privileged.

An integration test is also included to verify that the host's cgroup mount options remain unchanged before and after running a privileged container.

The PR also updates the Vagrantfile and the cri-integration script to forward the RUNC_FLAVOR environment variable, so the integration test can be skipped for crun until nsdelegate support is added.
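A test guarded by RUNC_FLAVOR might look roughly like the sketch below. The helper name `cgroupOptsSkipReason` and the skip message are hypothetical, not taken from the PR; in a real Go integration test the returned reason would be passed to `t.Skip`.

```go
package main

import (
	"fmt"
	"os"
)

// cgroupOptsSkipReason is a hypothetical helper: it reports whether the
// cgroup mount option test should be skipped for the given RUNC_FLAVOR.
func cgroupOptsSkipReason(flavor string) (string, bool) {
	if flavor == "crun" {
		return "skipping for crun until nsdelegate support is added", true
	}
	// Any other value, including an unset variable, runs the test.
	return "", false
}

func main() {
	// In an integration test this branch would call t.Skip(reason).
	if reason, skip := cgroupOptsSkipReason(os.Getenv("RUNC_FLAVOR")); skip {
		fmt.Println(reason)
		return
	}
	fmt.Println("running cgroup mount option test")
}
```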

Assisted-by: gemini-cli

@samuelkarp @Divya063

@github-project-automation github-project-automation Bot moved this to Needs Triage in Pull Request Review Feb 28, 2026
@dosubot dosubot Bot added the area/cri Container Runtime Interface (CRI) label Feb 28, 2026
@chrishenzie chrishenzie added kind/bug and removed area/cri Container Runtime Interface (CRI) labels Feb 28, 2026
Comment thread internal/cri/opts/spec_linux_opts.go
@chrishenzie chrishenzie force-pushed the mount-option-removal branch 8 times, most recently from 2afa220 to 0debea3 March 3, 2026 07:04
@chrishenzie (Member, Author)

Continuing to wrestle with some integration test issues; converting back to draft until I sort those out.

@chrishenzie chrishenzie marked this pull request as draft March 3, 2026 16:36
@chrishenzie chrishenzie force-pushed the mount-option-removal branch from 0debea3 to 54f9e37 March 3, 2026 21:59
@chrishenzie chrishenzie marked this pull request as ready for review March 4, 2026 19:43
@dosubot dosubot Bot added the area/cri Container Runtime Interface (CRI) label Mar 4, 2026
Comment thread integration/container_cgroup_mount_options_linux_test.go Outdated
Comment thread integration/container_cgroup_mount_options_linux_test.go
Comment thread integration/container_cgroup_mount_options_linux_test.go Outdated
Comment thread internal/cri/opts/spec_linux_opts.go Outdated
Moves cgroup namespace addition logic higher in buildLinuxSpec so it
runs before any custom spec adjusters (such as WithMounts).

This is necessary because subsequent spec adjusters may want to inspect
the set of namespaces to make decisions (e.g., configuring mount options
based on whether or not they are shared with the host).

Signed-off-by: Chris Henzie <[email protected]>
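The ordering concern in this commit can be sketched with simplified stand-ins for the runtime-spec types. Spec options run in slice order, so an adjuster that inspects namespaces only sees the cgroup namespace if it was added earlier. The types, `withCgroupMount`, and its default option list below are hypothetical simplifications, not containerd's actual `buildLinuxSpec` or `WithMounts`.

```go
package main

import "fmt"

// Simplified stand-ins for the runtime-spec Spec, Linux, Mount and
// LinuxNamespace types (github.com/opencontainers/runtime-spec/specs-go).
type LinuxNamespace struct{ Type string }

type Linux struct{ Namespaces []LinuxNamespace }

type Mount struct {
	Destination string
	Options     []string
}

type Spec struct {
	Linux  *Linux
	Mounts []Mount
}

// SpecOpt adjusts a spec in place, in the spirit of containerd's oci.SpecOpts.
type SpecOpt func(*Spec) error

func withCgroupNamespace(s *Spec) error {
	s.Linux.Namespaces = append(s.Linux.Namespaces, LinuxNamespace{Type: "cgroup"})
	return nil
}

func hasCgroupNamespace(s *Spec) bool {
	for _, ns := range s.Linux.Namespaces {
		if ns.Type == "cgroup" {
			return true
		}
	}
	return false
}

// withCgroupMount is hypothetical: it adds hostOpts only when the container
// shares the host's cgroup namespace, so it must run after withCgroupNamespace.
func withCgroupMount(hostOpts []string) SpecOpt {
	return func(s *Spec) error {
		opts := []string{"nosuid", "noexec", "nodev", "relatime"} // assumed defaults
		if !hasCgroupNamespace(s) {
			opts = append(opts, hostOpts...)
		}
		s.Mounts = append(s.Mounts, Mount{Destination: "/sys/fs/cgroup", Options: opts})
		return nil
	}
}

// apply runs the adjusters in order, stopping at the first error.
func apply(s *Spec, opts ...SpecOpt) error {
	for _, o := range opts {
		if err := o(s); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	host := []string{"nsdelegate"}
	ordered := &Spec{Linux: &Linux{}}
	if err := apply(ordered, withCgroupNamespace, withCgroupMount(host)); err != nil {
		panic(err)
	}
	// Reversed order: the mount adjuster runs before the namespace exists and
	// wrongly treats a namespaced container as sharing the host's namespace.
	reversed := &Spec{Linux: &Linux{}}
	if err := apply(reversed, withCgroupMount(host), withCgroupNamespace); err != nil {
		panic(err)
	}
	fmt.Println(ordered.Mounts[0].Options)  // [nosuid noexec nodev relatime]
	fmt.Println(reversed.Mounts[0].Options) // [nosuid noexec nodev relatime nsdelegate]
}
```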
@chrishenzie chrishenzie force-pushed the mount-option-removal branch 2 times, most recently from 0141288 to 3e31e6e March 11, 2026 06:08
Comment thread integration/container_cgroup_mount_options_linux_test.go Outdated
@chrishenzie chrishenzie force-pushed the mount-option-removal branch from 3e31e6e to 9594a13 March 17, 2026 16:33
@samuelkarp (Member) left a comment

Minor comments, otherwise LGTM

Comment thread internal/cri/opts/spec_linux_opts.go Outdated
Comment thread integration/container_cgroup_mount_options_linux_test.go Outdated
Comment thread integration/container_cgroup_mount_options_linux_test.go Outdated
@chrishenzie chrishenzie force-pushed the mount-option-removal branch 2 times, most recently from c1e2aa0 to abb5aae March 18, 2026 05:10
Comment thread internal/cri/opts/spec_linux_opts.go Outdated
@github-project-automation github-project-automation Bot moved this from Needs Triage to Review In Progress in Pull Request Review Mar 23, 2026
Privileged containers are not given a cgroup namespace, so they share
the host's cgroup namespace. Mounting cgroup2 inside these containers
can inadvertently alter the host's cgroup2 VFS superblock mount options,
because that superblock is shared with the host.

To prevent this, update WithMounts to read the host's /sys/fs/cgroup
mount options and explicitly propagate nsdelegate and
memory_recursiveprot into the container's mount spec. This avoids
stripping them on the host when they are not in the hardcoded default
set.

Signed-off-by: Chris Henzie <[email protected]>
Update Vagrantfile and cri-integration test runner to forward
RUNC_FLAVOR to the test environment.

Allows integration tests to conditionally skip testing certain cgroup
mount setups when running against other runtimes that may not support
them yet.

Signed-off-by: Chris Henzie <[email protected]>
Verifies that running a privileged container does not alter host cgroup
mount options (specifically nsdelegate and memory_recursiveprot).

Creates a privileged sandbox and container, starts it, and compares the
host's /sys/fs/cgroup mount options before and after execution to
confirm they are unchanged.

Signed-off-by: Chris Henzie <[email protected]>
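When a before/after comparison like this fails, reporting exactly which options changed makes the failure easier to diagnose. The sketch below is not the PR's test code; `diffOptions` is a hypothetical helper and assumes both option lists were already read from the host (for example from /proc/self/mountinfo).

```go
package main

import "fmt"

// diffOptions is a hypothetical helper: it returns the options that were
// present before but not after (lost), and those that newly appeared (added).
func diffOptions(before, after []string) (lost, added []string) {
	inBefore := make(map[string]bool, len(before))
	for _, o := range before {
		inBefore[o] = true
	}
	inAfter := make(map[string]bool, len(after))
	for _, o := range after {
		inAfter[o] = true
		if !inBefore[o] {
			added = append(added, o)
		}
	}
	for _, o := range before {
		if !inAfter[o] {
			lost = append(lost, o)
		}
	}
	return lost, added
}

func main() {
	before := []string{"rw", "nsdelegate", "memory_recursiveprot"}
	after := []string{"rw"} // what the host would show if the options were stripped
	lost, added := diffOptions(before, after)
	fmt.Println(lost, added) // [nsdelegate memory_recursiveprot] []
}
```

In a Go integration test, a non-empty `lost` or `added` would typically be reported with `t.Fatalf`.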
@chrishenzie chrishenzie force-pushed the mount-option-removal branch from abb5aae to 0eef29a March 23, 2026 19:24
@samuelkarp samuelkarp removed the request for review from cpuguy83 March 24, 2026 22:04
@samuelkarp samuelkarp added this pull request to the merge queue Mar 24, 2026
@samuelkarp samuelkarp added cherry-pick/2.1.x Change to be cherry picked to release/2.1 branch cherry-pick/2.2.x Change to be cherry picked to release/2.2 branch labels Mar 24, 2026
Merged via the queue into containerd:main with commit 248b1a6 Mar 24, 2026
95 of 96 checks passed
@github-project-automation github-project-automation Bot moved this from Review In Progress to Done in Pull Request Review Mar 24, 2026
@chrishenzie (Member, Author)

/cherry-pick release/2.1
/cherry-pick release/2.2

@k8s-infra-cherrypick-robot

@chrishenzie: only containerd org members may request cherry picks. If you are already part of the org, make sure to change your membership to public. Otherwise you can still do the cherry-pick manually.

In response to this:

/cherry-pick release/2.1
/cherry-pick release/2.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@chrishenzie (Member, Author)

/cherry-pick release/2.1
/cherry-pick release/2.2

@k8s-infra-cherrypick-robot

@chrishenzie: new pull request created: #13119

@k8s-infra-cherrypick-robot

@chrishenzie: new pull request created: #13120

@chrishenzie chrishenzie deleted the mount-option-removal branch March 25, 2026 02:24
@samuelkarp samuelkarp added cherry-picked/2.1.x PR commits are cherry picked into the release/2.1 branch and removed cherry-pick/2.1.x Change to be cherry picked to release/2.1 branch labels Mar 25, 2026
@chrishenzie chrishenzie added cherry-picked/2.2.x PR commits are cherry-picked into release/2.2 branch and removed cherry-pick/2.2.x Change to be cherry picked to release/2.2 branch labels Apr 14, 2026

Labels

area/cri, cherry-picked/2.1.x, cherry-picked/2.2.x, kind/bug, size/L

5 participants