-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[release/2.0] Fix pidfd leak in UnshareAfterEnterUserns #12178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release/2.0] Fix pidfd leak in UnshareAfterEnterUserns #12178
Conversation
|
Hi @k8s-infra-cherrypick-robot. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
it could be double close if using go1.23.0 or go1.23.1 to build
28a4aa1 to
39811fb
Compare
39811fb to
bd0b862
Compare
bd0b862 to
a0851c2
Compare
c2791df to
cc4496d
Compare
|
LGTM |
| return fmt.Errorf("unshare flags should not include user namespace") | ||
| } | ||
|
|
||
| if !SupportsPidFD() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fuweid I like moving this up here and verifying we have things we want before doing a bunch of work only to determine this later, but was it necessary as part of the backport? I might be missing the reasoning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a bug in go1.23.{0, 1}. we need to add some checks about go version before we close it. like
goVer := runtime.Version()
needDup := goVer == "go1.23.0" || goVer == "go1.23.1"
...
if pidfd == -1 || !SupportsPidFD() {
if pidfd != -1 && !needDup {
unix.Close(pidfd)
}
...
}
if needDup {
...
}
defer unix.Close(pidfd)
..
}
Sorry I made code hard to read in first revision 😢 . What do you think about that above change without moving SupportsPidFD() up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohhhh, I see. Did containerd drop support for < go 1.23.2 at some point since the 2.0 release (and hence why that's not there)?
So in go 1.23.{0,1} we can't close the pidfd we get back from StartProcess due to that bug, and therefore have to be extra careful about when we close it (only can close the dup'ed one)
I think your logic above makes sense! Maybe in your PR description + commit message after the SOB by Jose, but before yours include this information? I'm new to these projects here, but the kernel does that sort of thing for backports that need modification to make it clear that it didn't apply cleanly or needed some logical change to work well!
i.e.:
Fixes: 478d770008b0 ("mptcp: send out MP_FAIL when data checksum fails")
Cc: [email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
[ Conflicts in subflow.c, because commit f1f26512a9bf ("mptcp: use plain
bool instead of custom binary enum") and commit 46a5d3abedbe
("mptcp: fix typos in comments") are not in this version. Both are
causing conflicts in the context, and the same modifications can still
be applied. ]
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did containerd drop support for < go 1.23.2 at some point since the 2.0 release (and hence why that's not there)?
We don't use go1.23.2 to build but users may use old version. So we keep it here. Once v1.25 is out, we can remove this part of code and update go.mod to go1.24.
Updated the git commit message, please take a look. thanks!
| return fmt.Errorf("unshare flags should not include user namespace") | ||
| } | ||
|
|
||
| if !SupportsPidFD() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohhhh, I see. Did containerd drop support for < go 1.23.2 at some point since the 2.0 release (and hence why that's not there)?
So in go 1.23.{0,1} we can't close the pidfd we get back from StartProcess due to that bug, and therefore have to be extra careful about when we close it (only can close the dup'ed one)
I think your logic above makes sense! Maybe in your PR description + commit message after the SOB by Jose, but before yours include this information? I'm new to these projects here, but the kernel does that sort of thing for backports that need modification to make it clear that it didn't apply cleanly or needed some logical change to work well!
i.e.:
Fixes: 478d770008b0 ("mptcp: send out MP_FAIL when data checksum fails")
Cc: [email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
[ Conflicts in subflow.c, because commit f1f26512a9bf ("mptcp: use plain
bool instead of custom binary enum") and commit 46a5d3abedbe
("mptcp: fix typos in comments") are not in this version. Both are
causing conflicts in the context, and the same modifications can still
be applied. ]
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
UnshareAfterEnterUserns() creates a pidfd via os.StartProcess() with CLONE_PIDFD but fails to close the file descriptor in any code path, resulting in a file descriptor leak for every container that uses user namespace isolation. The leak occurs because: - The pidfd is created when PidFD field is set in SysProcAttr - The original defer block only calls PidfdSendSignal() and pidfdWaitid() - No code path calls unix.Close(pidfd) to release the file descriptor This causes one pidfd leak per container launch when user namespace isolation is enabled (e.g., Kubernetes pods with hostUsers: false). In production environments with high container churn, this can exhaust the system's file descriptor limit. Fix the leak by adding a defer statement immediately after process creation that ensures unix.Close(pidfd) is always called, regardless of which code path is taken. This guarantees cleanup even if the function returns early due to errors or lack of pidfd support. This follows the same cleanup pattern already established in core/mount/mount_idmapped_utils_linux.go:getUsernsFD() which properly closes its pidfd. Closes: containerd#12166 Signed-off-by: Jose Fernandez <[email protected]> [Move SupportsPidFD up to handle dupfd in Go 1.23.{0,1} and simplify backport] Signed-off-by: Wei Fu <[email protected]>
cc4496d to
5bec0a3
Compare
mikebrow
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
containerd 2.0.7 Welcome to the v2.0.7 release of containerd! The seventh patch release for containerd 2.0 includes various bug fixes and updates. * **containerd** * [**GHSA-pwhc-rpq9-4c8w**](GHSA-pwhc-rpq9-4c8w) * [**GHSA-m6hq-p25p-ffr2**](GHSA-m6hq-p25p-ffr2) * **runc** * [**GHSA-qw9x-cqr3-wc7r**](GHSA-qw9x-cqr3-wc7r) * [**GHSA-cgrx-mc8f-2prm**](GHSA-cgrx-mc8f-2prm) * [**GHSA-9493-h29p-rfm2**](GHSA-9493-h29p-rfm2) * **Disable event subscriber during task cleanup** ([containerd#12406](containerd#12406)) * **Add SystemdCgroup to default runtime options** ([containerd#12254](containerd#12254)) * **Fix userns with container image VOLUME mounts that need copy** ([containerd#12241](containerd#12241)) * **Add dial timeout field to hosts toml configuration** ([containerd#12136](containerd#12136)) * **Update runc binary to v1.3.3** ([containerd#12479](containerd#12479)) * **Fix lost container logs from quickly closing io** ([containerd#12376](containerd#12376)) * **Create bootstrap.json with 0644 permission** ([containerd#12184](containerd#12184)) * **Fix pidfd leak in UnshareAfterEnterUserns** ([containerd#12178](containerd#12178)) Please try out the release binaries and report any issues at https://github.com/containerd/containerd/issues. * Austin Vazquez * Phil Estes * Rodrigo Campos * Wei Fu * Akihiro Suda * Derek McGowan * Maksym Pavlenko * ningmingxiao * Kirtana Ashok * Akhil Mohan * Andrew Halaney * Jin Dong * Jose Fernandez * Mike Baynton * Philip Laine * Swagat Bora * wheat2018 <details><summary>56 commits</summary> <p> * Prepare release notes for v2.0.7 ([containerd#12482](containerd#12482)) * [`4931e24f1`](containerd@4931e24) Prepare release notes for v2.0.7 * [`205bc4f2d`](containerd@205bc4f) Update mailmap * [`5f708b76a`](containerd@5f708b7) Merge commit from fork * [`8cd112d82`](containerd@8cd112d) Fix directory permissions * [`05290b5bc`](containerd@05290b5) Merge commit from fork * [`4d1edf4ad`](containerd@4d1edf4) fix goroutine leak of container Attach * Update runc binary to v1.3.3 ([containerd#12479](containerd#12479)) * [`b46dc6a67`](containerd@b46dc6a) runc: Update runc binary to v1.3.3 * ci: bump Go 1.24.9; 1.25.3 ([containerd#12361](containerd#12361)) * [`5e9c82178`](containerd@5e9c821) Update GHA runners to use latest images for basic binaries build * [`7f59248dc`](containerd@7f59248) Update GHA runners to use latest image for most jobs * [`e1373e8a8`](containerd@e1373e8) ci: bump Go 1.24.9, 1.25.3 * [`e1a910a6a`](containerd@e1a910a) ci: bump Go 1.24.8; 1.25.2 * [`fd04b7f17`](containerd@fd04b7f) move exclude-dirs to issues.exclude-dirs * [`b49377975`](containerd@b493779) update golangci-lint to v1.64.2 * [`6e45022a1`](containerd@6e45022) build(deps): bump golangci/golangci-lint-action from 6.3.2 to 6.5.0 * [`09ce0f2a1`](containerd@09ce0f2) build(deps): bump golangci/golangci-lint-action from 6.2.0 to 6.3.2 * [`de63a740b`](containerd@de63a74) build(deps): bump golangci/golangci-lint-action from 6.1.1 to 6.2.0 * Fix lost container logs from quickly closing io ([containerd#12376](containerd#12376)) * [`f953ee8a3`](containerd@f953ee8) bugfix:fix container logs lost because io close too quickly * CI: update Fedora to 43 ([containerd#12448](containerd#12448)) * [`f6f15f513`](containerd@f6f15f5) CI: update Fedora to 43 * Disable event subscriber during task cleanup ([containerd#12406](containerd#12406)) * [`2a2329cbd`](containerd@2a2329c) cri/server/podsandbox: disable event subscriber * CI: skip ubuntu-24.04-arm on private repos ([containerd#12428](containerd#12428)) * [`dfb954743`](containerd@dfb9547) CI: skip ubuntu-24.04-arm on private repos * Remove additional fuzzers from instrumentation repo ([containerd#12420](containerd#12420)) * [`f6b02f6bb`](containerd@f6b02f6) Remove additional fuzzers from CI * runc:Update runc binary to v1.3.1 ([containerd#12275](containerd#12275)) * [`75c13ee3f`](containerd@75c13ee) runc:Update runc binary to v1.3.1 * Add SystemdCgroup to default runtime options ([containerd#12254](containerd#12254)) * [`427cdd06c`](containerd@427cdd0) add SystemdCgroup to default runtime options * install-runhcs-shim: fetch target commit instead of tags ([containerd#12255](containerd#12255)) * [`0b35e19fb`](containerd@0b35e19) install-runhcs-shim: fetch target commit instead of tags * Fix userns with container image VOLUME mounts that need copy ([containerd#12241](containerd#12241)) * [`3212afc2f`](containerd@3212afc) integration: Add test for directives with userns * [`b855c6e10`](containerd@b855c6e) cri: Fix userns with Dockerfile VOLUME mounts that need copy * Fix overlayfs issues related to user namespace ([containerd#12223](containerd#12223)) * [`05c0c99f4`](containerd@05c0c99) core/mount: Retry unmounting idmapped directories * [`afdede4ce`](containerd@afdede4) core/mount: Test cleanup of DoPrepareIDMappedOverlay() * [`47205f814`](containerd@47205f8) core/mount: Properly cleanup on doPrepareIDMappedOverlay errors * [`6f4abd970`](containerd@6f4abd9) core/mount: Don't call nil function on errors * [`a2f0d65d7`](containerd@a2f0d65) core/mount: Only idmap once per overlayfs, not per layer * [`1c32accd7`](containerd@1c32acc) Make ovl idmap mounts read-only * ci: bump Go 1.23.12, 1.24.6 ([containerd#12187](containerd#12187)) * [`9e72e91e6`](containerd@9e72e91) ci: bump Go 1.23.12, 1.24.6 * Create bootstrap.json with 0644 permission ([containerd#12184](containerd#12184)) * [`009622e04`](containerd@009622e) fix: create bootstrap.json with 0644 permission * Fix pidfd leak in UnshareAfterEnterUserns ([containerd#12178](containerd#12178)) * [`5bec0a332`](containerd@5bec0a3) sys: fix pidfd leak in UnshareAfterEnterUserns * Fix windows test failures ([containerd#12120](containerd#12120)) * [`2a2488131`](containerd@2a24881) Fix intermittent test failures on Windows CIs * [`018470948`](containerd@0184709) Remove WS2025 from CIs due to regression * Add dial timeout field to hosts toml configuration ([containerd#12136](containerd#12136)) * [`b50cbbc98`](containerd@b50cbbc) Add dial timeout field to hosts toml configuration </p> </details> This release has no dependency changes Previous release can be found at [v2.0.6](https://github.com/containerd/containerd/releases/tag/v2.0.6) * `containerd-<VERSION>-<OS>-<ARCH>.tar.gz`: ✅Recommended. Dynamically linked with glibc 2.31 (Ubuntu 20.04). * `containerd-static-<VERSION>-<OS>-<ARCH>.tar.gz`: Statically linked. Expected to be used on non-glibc Linux distributions. Not position-independent. In addition to containerd, typically you will have to install [runc](https://github.com/opencontainers/runc/releases) and [CNI plugins](https://github.com/containernetworking/plugins/releases) from their official sites too. See also the [Getting Started](https://github.com/containerd/containerd/blob/main/docs/getting-started.md) documentation.
This is an automated cherry-pick of #12167
/assign fuweid