
[release/2.1] Properly shutdown non-groupable shims to prevent resource leaks#11971

Merged
kiashok merged 1 commit into containerd:release/2.1 from k8s-infra-cherrypick-robot:cherry-pick-11916-to-release/2.1
Jun 10, 2025


@k8s-infra-cherrypick-robot

This is an automated cherry-pick of #11916

/assign fuweid

Previously, to address issue containerd#11708, PR containerd#11793 changed containerd to always
invoke the shim binary to establish shim connections, rather than reusing the
sandbox shim. However, this change did not ensure that the Shutdown API was
called to stop the shim process.

Starting with containerd v2.0.0, the Shutdown API is only invoked for sandbox
containers (when container.SandboxID is empty). This approach works for
groupable shims, where multiple containers share a single socket address and
only require a single Shutdown call. However, for non-groupable shims, each
container requires its own Shutdown call during cleanup to avoid leaking shim
processes.
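
The difference between the two shim kinds can be illustrated with a small model. This is only a sketch, not containerd's cleanup code: the `task` type, the `shutdownTargets` helper, and the socket paths are all hypothetical, and it assumes a shim can be identified by its socket address.

```go
package main

import "fmt"

// task is a hypothetical, simplified view of a container task and the
// socket address of the shim serving it. It is not a containerd type.
type task struct {
	ID          string
	ShimAddress string
}

// shutdownTargets returns the distinct shim addresses that need a Shutdown
// call when the given tasks are cleaned up. Tasks served by a groupable shim
// share one address and collapse into a single call; tasks served by
// non-groupable shims each contribute their own address.
func shutdownTargets(tasks []task) []string {
	seen := make(map[string]bool)
	var targets []string
	for _, t := range tasks {
		if seen[t.ShimAddress] {
			continue
		}
		seen[t.ShimAddress] = true
		targets = append(targets, t.ShimAddress)
	}
	return targets
}

func main() {
	// Hypothetical socket paths, for illustration only.
	grouped := []task{
		{ID: "sandbox", ShimAddress: "unix:///run/shim/pod-a.sock"},
		{ID: "c1", ShimAddress: "unix:///run/shim/pod-a.sock"},
	}
	ungrouped := []task{
		{ID: "sandbox", ShimAddress: "unix:///run/shim/sandbox.sock"},
		{ID: "c1", ShimAddress: "unix:///run/shim/c1.sock"},
	}
	fmt.Println(len(shutdownTargets(grouped)))   // one shared shim
	fmt.Println(len(shutdownTargets(ungrouped))) // one shim per container
}
```

In this model, a policy that shuts down only the sandbox's shim covers the grouped case but leaves the `c1` shim running in the ungrouped case, which is the kind of leak described above.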

Additionally, PR containerd#11793 introduced a corner case during upgrades:
- T1: An old containerd-shim-runc-v2 (<=v1.7.X) is running for pod A.
- T2: containerd is upgraded to v2.X.Y.
- T3: A new container A-C1 is created in pod A using the new shim-runc-v2 binary.
- T4: bootstrap.json indicates version:3 protocol, but it is downgraded to version:2 in memory.
- T5: containerd is restarted.
- T6: containerd fails to connect to A-C1.
- T7: The A-C1 container is left in EXITED status in the CRI plugin.

To address this, ensure that loadShimTask downgrades to version:2 if necessary,
and always invoke the Shutdown API for each non-groupable shim during cleanup to
prevent resource leaks and handle upgrade scenarios correctly.
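
The downgrade step might look roughly like the following. This is a minimal sketch under stated assumptions, not the actual `loadShimTask` implementation: `bootstrapParams`, `negotiateVersion`, and `loadVersion` are hypothetical names, only the `version` field mentioned above is modeled, and how containerd learns the highest version a running shim supports is left out (it is passed in as `shimMax`).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bootstrapParams is a hypothetical stand-in for the contents of
// bootstrap.json; only the version field mentioned above is modeled.
type bootstrapParams struct {
	Version int `json:"version"`
}

// negotiateVersion returns the API version to use with a shim: the version
// recorded on disk, capped at the highest version the running shim supports.
func negotiateVersion(recorded, shimMax int) int {
	if recorded > shimMax {
		return shimMax
	}
	return recorded
}

// loadVersion parses the recorded parameters and applies the same cap, so a
// reload after a restart reaches the same answer as the original in-memory
// downgrade did.
func loadVersion(raw []byte, shimMax int) (int, error) {
	var p bootstrapParams
	if err := json.Unmarshal(raw, &p); err != nil {
		return 0, fmt.Errorf("parse bootstrap params: %w", err)
	}
	return negotiateVersion(p.Version, shimMax), nil
}

func main() {
	// The file records version 3, but the running shim is assumed to
	// support only version 2.
	v, err := loadVersion([]byte(`{"version":3}`), 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(v)
}
```

The point of the sketch is that the cap is applied on every load rather than only when the container is first created, so the T5/T6 restart no longer trusts the version:3 value on disk for a shim that is still the old process.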

(Introduced by containerd#11793)

Signed-off-by: Wei Fu <[email protected]>
@k8s-ci-robot

Hi @k8s-infra-cherrypick-robot. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@dosubot dosubot Bot added the area/runtime Runtime label Jun 10, 2025
Member

@fuweid fuweid left a comment


LGTM on green

@github-project-automation github-project-automation Bot moved this from Needs Triage to Review In Progress in Pull Request Review Jun 10, 2025
@austinvazquez
Member

/ok-to-test

@kiashok kiashok merged commit 5531309 into containerd:release/2.1 Jun 10, 2025
99 of 102 checks passed
@github-project-automation github-project-automation Bot moved this from Review In Progress to Done in Pull Request Review Jun 10, 2025
@dmcgowan dmcgowan changed the title [release/2.1] *: properly shutdown non-groupable shims to prevent resource leaks [release/2.1] Properly shutdown non-groupable shims to prevent resource leaks Jun 10, 2025

7 participants