KEP-5714: Allow specifying whether to unshare cgroup namespaces #5715
AkihiroSuda wants to merge 1 commit into kubernetes:master
Conversation
AkihiroSuda commented Dec 3, 2025
- One-line PR description: Allow specifying whether to unshare cgroup namespaces
- Issue link: Allow specifying whether to unshare cgroup namespaces #5714
- Other comments:
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: AkihiroSuda

Needs approval from an approver in each of these files:
The motivation is to allow privileged pods to unshare cgroup namespaces.
TBH this feature seems like something that could be toggled on the CRI impl side. I don't know if the use case is broad enough to warrant a new pod spec field
NRI plugin could do it too
> TBH this feature seems like something that could be toggled on the CRI impl side.
Disagree. https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/pod-security-admission/policy/check_hostNamespaces.go expects the namespaces to be fully controlled in the pod spec.
> I don't know if the use case is broad enough to warrant a new pod spec field
I'd rather say every privileged pod should use this field to unshare the cgroup namespace, unless it really has to use the host cgroup namespace.
This applies to use cases with Podman/Buildah as well. https://www.redhat.com/en/blog/podman-inside-kubernetes
> expects the namespaces to be fully controlled in the pod spec.
Some of them; we never specify a UTS namespace because it's always pod level. That's a contract the kubelet asks the CRI to uphold, but there isn't much maintaining it outside of critest. My broader point is that there are plenty of cases where the CRI lies to the kubelet that it's doing something when really the CRI is fully following the user's intent (NRI does this, registry mirroring does this). This case could be covered by that as well, as there's nothing in the kubelet verifying the namespace is actually pod level.
> I'd rather say every privileged pod should use this field to unshare the cgroup namespace, unless it really has to use the host cgroup namespace.
yeah I get the point here, but personally I don't think we (as a kubernetes project) should try that hard to secure privileged pods. it's supposed to be a really heavy hammer. customizations can exist through NRI if needed, but the pod spec has a high barrier of entry and I'm not sure there is ecosystem integration that requires it be exposed on pod level right now.
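For context, CRI expresses namespace requests through a `NamespaceMode` enum (POD, CONTAINER, NODE, TARGET). A sketch of how a kubelet-side decision could map a hypothetical pod-level knob onto the cgroup namespace mode; the defaulting rule shown (privileged implies NODE unless the knob opts out) is an assumption modeled on today's runtime behavior on cgroup v2, not code from the KEP:

```go
package main

import "fmt"

// NamespaceMode mirrors the names of the CRI enum of the same name
// (represented here as strings for readability).
type NamespaceMode string

const (
	ModePod  NamespaceMode = "POD"
	ModeNode NamespaceMode = "NODE"
)

// cgroupNamespaceMode sketches how a kubelet could pick the cgroup
// namespace mode for a container. hostCgroupNS is the hypothetical pod
// field; nil means "unset", preserving the current default where
// privileged containers share the host (NODE) cgroup namespace.
func cgroupNamespaceMode(privileged bool, hostCgroupNS *bool) NamespaceMode {
	if hostCgroupNS != nil {
		if *hostCgroupNS {
			return ModeNode
		}
		return ModePod // explicit opt-out: unshare even when privileged
	}
	if privileged {
		return ModeNode // today's default for privileged on cgroup v2
	}
	return ModePod
}

func main() {
	f := false
	fmt.Println(cgroupNamespaceMode(true, nil), cgroupNamespaceMode(true, &f))
}
```

An NRI plugin could apply the same mapping without a spec field, which is the crux of the disagreement above: the behavior is implementable either way, but only the spec field makes it declarable and auditable.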
> some of them, we never specify uts namespace because it's always pod level.
The UTS namespace does not seem comparable to the cgroup namespace here, as UTSNS does not really affect anything but the hostname and the domainname, while cgroupNS affects the actual cgroup hierarchy.
### User Stories (Optional)
#### Story 1: BuildKit
See <https://github.com/moby/buildkit/pull/6368>:
> When buildkitd is run in a managed environment like Kubernetes without its own cgroup namespace
> (the default behavior of privileged pods in Kubernetes where cgroup v2 is in use; see cgroup v2 KEP),
> the OCI worker will spawn processes in cgroups that are outside of the cgroup hierarchy that was
> created for the buildkitd container, leading to incorrect resource accounting and enforcement
> which in turn can cause OOM errors and CPU contention on the node.

> yeah I get the point here, but personally I don't think we (as a kubernetes project) should try that hard to secure privileged pods.
I didn't say we should try to secure privileged pods.
The purpose of unsharing the cgroup namespace is just to keep the /sys/fs/cgroup hierarchy consistent with normal pods.
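The consistency point can be illustrated by what `/proc/self/cgroup` reports: in a private cgroup namespace the container's own cgroup appears as the root (`/`), while in the host namespace the full node-level path leaks through. A platform-independent sketch of that relative-path computation; the kubepods-style path below is a made-up example:

```go
package main

import (
	"fmt"
	"strings"
)

// visibleCgroupPath computes what a process would see in /proc/self/cgroup
// given its absolute cgroup path on the host and the cgroup that serves as
// its namespace root ("/" means the host namespace, i.e. no unsharing).
func visibleCgroupPath(hostPath, nsRoot string) string {
	if nsRoot == "/" {
		return hostPath // host cgroupns: the full node hierarchy is visible
	}
	if hostPath == nsRoot {
		return "/" // private cgroupns: the container's own cgroup is the root
	}
	// A descendant cgroup is shown relative to the namespace root.
	return "/" + strings.TrimPrefix(strings.TrimPrefix(hostPath, nsRoot), "/")
}

func main() {
	// Made-up kubepods-style path for illustration.
	p := "/kubepods/burstable/pod1234/ctr5678"
	fmt.Println(visibleCgroupPath(p, "/")) // what a privileged pod sees today
	fmt.Println(visibleCgroupPath(p, p))   // what an unshared pod would see
}
```

This is why software like BuildKit that writes to `/sys/fs/cgroup` relative to the root it observes ends up creating cgroups outside its own subtree when run in the host cgroup namespace.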
I'm open to what other maintainers think but I'm still personally not convinced the use cases are worth the API surface
I'm just chiming in as a user (and the author of the buildkit PR that @AkihiroSuda referenced).
It was surprising to me that a privileged pod shares the cgroupns of the host by default on a cgroup v2 system. I would expect that to be an opt-in setting much like hostNetwork and hostPID, since it has to do with visibility (and the assumption of isolation even when privileged) rather than the actual capabilities of the pod processes. The privileged flag is indeed a huge hammer when it comes to capabilities, but IMHO that should not come at the cost of isolation by default. Isolation via a cgroupns has real utility independent of the security context. In this case, it would prevent the resource accounting and enforcement of all processes on the node from being interfered with (unintentionally) by a single pod process (see the referenced PR).
Looking at past discussions around the cgroup v2 implementation, it seems like it was mainly for cgroup v1 backwards compatibility that the cgroup v2 implementation did not adopt an unshared cgroupns by default. Those concerns seem totally reasonable to me as a user. However, the utility of cgroupns isolation, which only became possible with cgroup v2, was never questioned in that discussion. Rather, it appears to have been acknowledged explicitly.
Given the decision to not unshare cgroupns by default but the acknowledgement of its utility with regards to isolation, it seems reasonable that the API would support it.
@kubernetes/sig-node-leads Could you take a look?
Currently, privileged pods inevitably use the host cgroup namespace, but this is relatively fragile with nested containers and does not work with runc v1.4.0:
We, the runc maintainers, will find a workaround to fix this regression, but for the long term it would be nice to allow specifying whether to use the host cgroup namespace or not.
Signed-off-by: Akihiro Suda <[email protected]>
7ec2861 to 870222e
@AkihiroSuda: The following tests failed.