Preserve cgroup mount options for privileged containers#12952
Preserve cgroup mount options for privileged containers#12952samuelkarp merged 4 commits intocontainerd:mainfrom
Conversation
2afa220 to
0debea3
Compare
|
Continuing to wrestle some integration test issues, converting back to draft until I sort those out. |
0debea3 to
54f9e37
Compare
Moves cgroup namespace addition logic higher in buildLinuxSpec so it runs before any custom spec adjusters (such as WithMounts). This is necessary because subsequent spec adjusters may want to inspect the set of namespaces to make decisions (e.g., configuring mount options based on whether or not they are shared with the host). Signed-off-by: Chris Henzie <[email protected]>
0141288 to
3e31e6e
Compare
3e31e6e to
9594a13
Compare
samuelkarp
left a comment
There was a problem hiding this comment.
Minor comments, otherwise LGTM
c1e2aa0 to
abb5aae
Compare
Privileged containers don't have a cgroup namespace and share the host's cgroup namespace. Mounting cgroup2 inside these containers can inadvertently alter the host's cgroup2 VFS superblock mount options because they are shared. To prevent this, update WithMounts to read the host's /sys/fs/cgroup mount options and explicitly propagate nsdelegate and memory_recursiveprot into the container's mount spec. This avoids stripping them on the host when they are not in the hardcoded default set. Signed-off-by: Chris Henzie <[email protected]>
Update Vagrantfile and cri-integration test runner to forward RUNC_FLAVOR to the test environment. Allows integration tests to conditionally skip testing certain cgroup mount setups when running against other runtimes that may not support them yet. Signed-off-by: Chris Henzie <[email protected]>
Verifies that running a privileged container does not alter host cgroup mount options (specifically nsdelegate and memory_recursiveprot). Creates a privileged sandbox and container, starts it, and compares the host's /sys/fs/cgroup mount options before and after execution to guarantee safety. Signed-off-by: Chris Henzie <[email protected]>
abb5aae to
0eef29a
Compare
|
/cherry-pick release/2.1 |
|
@chrishenzie: only containerd org members may request cherry picks. If you are already part of the org, make sure to change your membership to public. Otherwise you can still do the cherry-pick manually. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/cherry-pick release/2.1 |
|
@chrishenzie: new pull request created: #13119 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@chrishenzie: new pull request created: #13120 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Privileged containers don't have a cgroup namespace, so by default they run in the host's cgroup namespace.
containerd/internal/cri/server/container_create.go
Lines 933 to 939 in d1d9d07
When mounting cgroup2 inside a privileged container, applying a different set of mount options can inadvertently alter the host's shared cgroup2 VFS superblock mount options. Because the container's mount options were previously hardcoded, any additional host mount options like
nsdelegateormemory_recursiveprotwould be accidentally stripped from the host.Fixes this issue by reading the host's
/sys/fs/cgroupmount options during container creation and explicitly including them if the container is privileged.An integration test is also included to verify that the host's cgroup mount options remain unchanged before and after running a privileged container.
Additionally updates the Vagrantfile and cri-integration script to forward the
RUNC_FLAVORenvironment variable to conditionally skip the integration test forcrununtil support is added fornsdelegate.Assisted-by: gemini-cli
@samuelkarp @Divya063