Skip to content

Conversation

@k8s-infra-cherrypick-robot

This is an automated cherry-pick of #12167

/assign fuweid

@k8s-ci-robot
Copy link

Hi @k8s-infra-cherrypick-robot. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

fuweid
fuweid previously approved these changes Aug 7, 2025
@fuweid fuweid dismissed their stale review August 7, 2025 13:50

it could be double close if using go1.23.0 or go1.23.1 to build

@fuweid fuweid force-pushed the cherry-pick-12167-to-release/2.0 branch from 28a4aa1 to 39811fb Compare August 7, 2025 14:06
@fuweid fuweid force-pushed the cherry-pick-12167-to-release/2.0 branch from 39811fb to bd0b862 Compare August 7, 2025 14:07
@fuweid fuweid force-pushed the cherry-pick-12167-to-release/2.0 branch from bd0b862 to a0851c2 Compare August 7, 2025 14:21
@fuweid fuweid force-pushed the cherry-pick-12167-to-release/2.0 branch 2 times, most recently from c2791df to cc4496d Compare August 7, 2025 14:25
@jfernandez
Copy link
Contributor

LGTM

@fuweid
Copy link
Member

fuweid commented Aug 7, 2025

cc @dcantah @rata @halaney

return fmt.Errorf("unshare flags should not include user namespace")
}

if !SupportsPidFD() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fuweid I like moving this up here and verifying we have things we want before doing a bunch of work only to determine this later, but was it necessary as part of the backport? I might be missing the reasoning

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a bug in go1.23.{0, 1}. we need to add some checks about go version before we close it. like

goVer := runtime.Version()
needDup := goVer == "go1.23.0" || goVer == "go1.23.1"

...

if pidfd == -1 || !SupportsPidFD() {
       if pidfd != -1 && !needDup {
              unix.Close(pidfd)
       }
       ...
}

if needDup {
     ...
}

defer unix.Close(pidfd)

..
}

Sorry I made code hard to read in first revision 😢 . What do you think about that above change without moving SupportsPidFD() up?

Copy link
Contributor

@halaney halaney Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhhh, I see. Did containerd drop support for < go 1.23.2 at some point since the 2.0 release (and hence why that's not there)?

So in go 1.23.{0,1} we can't close the pidfd we get back from StartProcess due to that bug, and therefore have to be extra careful about when we close it (only can close the dup'ed one)

I think your logic above makes sense! Maybe in your PR description + commit message after the SOB by Jose, but before yours include this information? I'm new to these projects here, but the kernel does that sort of thing for backports that need modification to make it clear that it didn't apply cleanly or needed some logical change to work well!

i.e.:

 Fixes: 478d770008b0 ("mptcp: send out MP_FAIL when data checksum fails")
    Cc: [email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    [ Conflicts in subflow.c, because commit f1f26512a9bf ("mptcp: use plain
      bool instead of custom binary enum") and commit 46a5d3abedbe
      ("mptcp: fix typos in comments") are not in this version. Both are
      causing conflicts in the context, and the same modifications can still
      be applied. ]
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did containerd drop support for < go 1.23.2 at some point since the 2.0 release (and hence why that's not there)?

We don't use go1.23.2 to build but users may use old version. So we keep it here. Once v1.25 is out, we can remove this part of code and update go.mod to go1.24.

Updated the git commit message, please take a look. thanks!

return fmt.Errorf("unshare flags should not include user namespace")
}

if !SupportsPidFD() {
Copy link
Contributor

@halaney halaney Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhhh, I see. Did containerd drop support for < go 1.23.2 at some point since the 2.0 release (and hence why that's not there)?

So in go 1.23.{0,1} we can't close the pidfd we get back from StartProcess due to that bug, and therefore have to be extra careful about when we close it (only can close the dup'ed one)

I think your logic above makes sense! Maybe in your PR description + commit message after the SOB by Jose, but before yours include this information? I'm new to these projects here, but the kernel does that sort of thing for backports that need modification to make it clear that it didn't apply cleanly or needed some logical change to work well!

i.e.:

 Fixes: 478d770008b0 ("mptcp: send out MP_FAIL when data checksum fails")
    Cc: [email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    [ Conflicts in subflow.c, because commit f1f26512a9bf ("mptcp: use plain
      bool instead of custom binary enum") and commit 46a5d3abedbe
      ("mptcp: fix typos in comments") are not in this version. Both are
      causing conflicts in the context, and the same modifications can still
      be applied. ]
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

UnshareAfterEnterUserns() creates a pidfd via os.StartProcess() with
CLONE_PIDFD but fails to close the file descriptor in any code path,
resulting in a file descriptor leak for every container that uses user
namespace isolation.

The leak occurs because:
- The pidfd is created when PidFD field is set in SysProcAttr
- The original defer block only calls PidfdSendSignal() and
  pidfdWaitid()
- No code path calls unix.Close(pidfd) to release the file descriptor

This causes one pidfd leak per container launch when user namespace
isolation is enabled (e.g., Kubernetes pods with hostUsers: false). In
production environments with high container churn, this can exhaust the
system's file descriptor limit.

Fix the leak by adding a defer statement immediately after process
creation that ensures unix.Close(pidfd) is always called, regardless of
which code path is taken. This guarantees cleanup even if the function
returns early due to errors or lack of pidfd support.

This follows the same cleanup pattern already established in
core/mount/mount_idmapped_utils_linux.go:getUsernsFD() which properly
closes its pidfd.

Closes: containerd#12166
Signed-off-by: Jose Fernandez <[email protected]>
[Move SupportsPidFD up to handle dupfd in Go 1.23.{0,1} and simplify backport]
Signed-off-by: Wei Fu <[email protected]>
@fuweid fuweid force-pushed the cherry-pick-12167-to-release/2.0 branch from cc4496d to 5bec0a3 Compare August 7, 2025 17:34
Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-project-automation github-project-automation bot moved this from Needs Triage to Review In Progress in Pull Request Review Aug 7, 2025
@fuweid fuweid merged commit 2ea2055 into containerd:release/2.0 Aug 7, 2025
53 checks passed
@github-project-automation github-project-automation bot moved this from Review In Progress to Done in Pull Request Review Aug 7, 2025
@dmcgowan dmcgowan changed the title [release/2.0] sys: fix pidfd leak in UnshareAfterEnterUserns [release/2.0] Fix pidfd leak in UnshareAfterEnterUserns Nov 5, 2025
mansikulkarni96 added a commit to mansikulkarni96/containerd that referenced this pull request Dec 4, 2025
containerd 2.0.7

Welcome to the v2.0.7 release of containerd!

The seventh patch release for containerd 2.0 includes various bug fixes and updates.

* **containerd**
  * [**GHSA-pwhc-rpq9-4c8w**](GHSA-pwhc-rpq9-4c8w)
  * [**GHSA-m6hq-p25p-ffr2**](GHSA-m6hq-p25p-ffr2)

* **runc**
  * [**GHSA-qw9x-cqr3-wc7r**](GHSA-qw9x-cqr3-wc7r)
  * [**GHSA-cgrx-mc8f-2prm**](GHSA-cgrx-mc8f-2prm)
  * [**GHSA-9493-h29p-rfm2**](GHSA-9493-h29p-rfm2)

* **Disable event subscriber during task cleanup** ([containerd#12406](containerd#12406))
* **Add SystemdCgroup to default runtime options** ([containerd#12254](containerd#12254))
* **Fix userns with container image VOLUME mounts that need copy** ([containerd#12241](containerd#12241))

* **Add dial timeout field to hosts toml configuration** ([containerd#12136](containerd#12136))

* **Update runc binary to v1.3.3** ([containerd#12479](containerd#12479))
* **Fix lost container logs from quickly closing io** ([containerd#12376](containerd#12376))
* **Create bootstrap.json with 0644 permission** ([containerd#12184](containerd#12184))
* **Fix pidfd leak in UnshareAfterEnterUserns** ([containerd#12178](containerd#12178))

Please try out the release binaries and report any issues at
https://github.com/containerd/containerd/issues.

* Austin Vazquez
* Phil Estes
* Rodrigo Campos
* Wei Fu
* Akihiro Suda
* Derek McGowan
* Maksym Pavlenko
* ningmingxiao
* Kirtana Ashok
* Akhil Mohan
* Andrew Halaney
* Jin Dong
* Jose Fernandez
* Mike Baynton
* Philip Laine
* Swagat Bora
* wheat2018

<details><summary>56 commits</summary>
<p>

* Prepare release notes for v2.0.7 ([containerd#12482](containerd#12482))
  * [`4931e24f1`](containerd@4931e24) Prepare release notes for v2.0.7
  * [`205bc4f2d`](containerd@205bc4f) Update mailmap
  * [`5f708b76a`](containerd@5f708b7) Merge commit from fork
  * [`8cd112d82`](containerd@8cd112d) Fix directory permissions
  * [`05290b5bc`](containerd@05290b5) Merge commit from fork
  * [`4d1edf4ad`](containerd@4d1edf4) fix goroutine leak of container Attach
* Update runc binary to v1.3.3 ([containerd#12479](containerd#12479))
  * [`b46dc6a67`](containerd@b46dc6a) runc: Update runc binary to v1.3.3
* ci: bump Go 1.24.9; 1.25.3 ([containerd#12361](containerd#12361))
  * [`5e9c82178`](containerd@5e9c821) Update GHA runners to use latest images for basic binaries build
  * [`7f59248dc`](containerd@7f59248) Update GHA runners to use latest image for most jobs
  * [`e1373e8a8`](containerd@e1373e8) ci: bump Go 1.24.9, 1.25.3
  * [`e1a910a6a`](containerd@e1a910a) ci: bump Go 1.24.8; 1.25.2
  * [`fd04b7f17`](containerd@fd04b7f) move exclude-dirs to issues.exclude-dirs
  * [`b49377975`](containerd@b493779) update golangci-lint to v1.64.2
  * [`6e45022a1`](containerd@6e45022) build(deps): bump golangci/golangci-lint-action from 6.3.2 to 6.5.0
  * [`09ce0f2a1`](containerd@09ce0f2) build(deps): bump golangci/golangci-lint-action from 6.2.0 to 6.3.2
  * [`de63a740b`](containerd@de63a74) build(deps): bump golangci/golangci-lint-action from 6.1.1 to 6.2.0
* Fix lost container logs from quickly closing io ([containerd#12376](containerd#12376))
  * [`f953ee8a3`](containerd@f953ee8) bugfix:fix container logs lost because io close too quickly
* CI: update Fedora to 43 ([containerd#12448](containerd#12448))
  * [`f6f15f513`](containerd@f6f15f5) CI: update Fedora to 43
* Disable event subscriber during task cleanup ([containerd#12406](containerd#12406))
  * [`2a2329cbd`](containerd@2a2329c) cri/server/podsandbox: disable event subscriber
* CI: skip ubuntu-24.04-arm on private repos ([containerd#12428](containerd#12428))
  * [`dfb954743`](containerd@dfb9547) CI: skip ubuntu-24.04-arm on private repos
* Remove additional fuzzers from instrumentation repo ([containerd#12420](containerd#12420))
  * [`f6b02f6bb`](containerd@f6b02f6) Remove additional fuzzers from CI
* runc:Update runc binary to v1.3.1 ([containerd#12275](containerd#12275))
  * [`75c13ee3f`](containerd@75c13ee) runc:Update runc binary to v1.3.1
* Add SystemdCgroup to default runtime options ([containerd#12254](containerd#12254))
  * [`427cdd06c`](containerd@427cdd0) add SystemdCgroup to default runtime options
* install-runhcs-shim: fetch target commit instead of tags ([containerd#12255](containerd#12255))
  * [`0b35e19fb`](containerd@0b35e19) install-runhcs-shim: fetch target commit instead of tags
* Fix userns with container image VOLUME mounts that need copy ([containerd#12241](containerd#12241))
  * [`3212afc2f`](containerd@3212afc) integration: Add test for directives with userns
  * [`b855c6e10`](containerd@b855c6e) cri: Fix userns with Dockerfile VOLUME mounts that need copy
* Fix overlayfs issues related to user namespace ([containerd#12223](containerd#12223))
  * [`05c0c99f4`](containerd@05c0c99) core/mount: Retry unmounting idmapped directories
  * [`afdede4ce`](containerd@afdede4) core/mount: Test cleanup of DoPrepareIDMappedOverlay()
  * [`47205f814`](containerd@47205f8) core/mount: Properly cleanup on doPrepareIDMappedOverlay errors
  * [`6f4abd970`](containerd@6f4abd9) core/mount: Don't call nil function on errors
  * [`a2f0d65d7`](containerd@a2f0d65) core/mount: Only idmap once per overlayfs, not per layer
  * [`1c32accd7`](containerd@1c32acc) Make ovl idmap mounts read-only
* ci: bump Go 1.23.12, 1.24.6 ([containerd#12187](containerd#12187))
  * [`9e72e91e6`](containerd@9e72e91) ci: bump Go 1.23.12, 1.24.6
* Create bootstrap.json with 0644 permission ([containerd#12184](containerd#12184))
  * [`009622e04`](containerd@009622e) fix: create bootstrap.json with 0644 permission
* Fix pidfd leak in UnshareAfterEnterUserns ([containerd#12178](containerd#12178))
  * [`5bec0a332`](containerd@5bec0a3) sys: fix pidfd leak in UnshareAfterEnterUserns
* Fix windows test failures ([containerd#12120](containerd#12120))
  * [`2a2488131`](containerd@2a24881) Fix intermittent test failures on Windows CIs
  * [`018470948`](containerd@0184709) Remove WS2025 from CIs due to regression
* Add dial timeout field to hosts toml configuration ([containerd#12136](containerd#12136))
  * [`b50cbbc98`](containerd@b50cbbc) Add dial timeout field to hosts toml configuration
</p>
</details>

This release has no dependency changes

Previous release can be found at [v2.0.6](https://github.com/containerd/containerd/releases/tag/v2.0.6)
* `containerd-<VERSION>-<OS>-<ARCH>.tar.gz`:         ✅Recommended. Dynamically linked with glibc 2.31 (Ubuntu 20.04).
* `containerd-static-<VERSION>-<OS>-<ARCH>.tar.gz`:  Statically linked. Expected to be used on non-glibc Linux distributions. Not position-independent.

In addition to containerd, typically you will have to install [runc](https://github.com/opencontainers/runc/releases)
and [CNI plugins](https://github.com/containernetworking/plugins/releases) from their official sites too.

See also the [Getting Started](https://github.com/containerd/containerd/blob/main/docs/getting-started.md) documentation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

8 participants