Skip to content

[carry-11761] core/runtime: should invoke shim binary if it doesn't support Sandbox API#11793

Merged
fuweid merged 3 commits intocontainerd:mainfrom
fuweid:carry-on-11761
May 6, 2025
Merged

[carry-11761] core/runtime: should invoke shim binary if it doesn't support Sandbox API#11793
fuweid merged 3 commits intocontainerd:mainfrom
fuweid:carry-on-11761

Conversation

@fuweid
Copy link
Copy Markdown
Member

@fuweid fuweid commented May 5, 2025

Since the v2.0.0 release, the shim manager has attempted to use a single ttrpc connection to manage multiple containers through one shim server. However, this approach may introduce compatibility issues with older shims. According to the current documentation, container resources are cleaned up via the shim delete binary call, which is triggered as a callback when the connection is closed. To ensure compatibility, containerd should always invoke the shim start binary to establish a new connection—regardless of the shim type—if the shim version is lower than 3.

This patch enables containerd to invoke the shim start binary call, but doing so reveals another issue:

  • A running pod is using an older containerd-shim-runc-v2 version (v1.7.x).
  • After upgrading to v2.x.y, the new containerd instance attempts to take over the running pod.
  • The old containerd-shim-runc-v2 binary has been replaced by the newer v2.x.y version.
    • The shim start call for the running pod now returns { "version": 3, "protocol": "ttrpc", "address": "EXISTING-SOCKET" }.
  • If a container in the pod exits and kubelet tries to recreate it, the shim manager attempts to connect using a version: 3 task client (containerd.io.v3.Task), which is not implemented in the old shim.

To address this, this patch also introduces a clientVersionDowngrader on shim, allowing the shim manager to retry with a lower client version for compatibility with older shims.


Closes: #11761
Closes: #9137
Closes: #11535
We should revert this change: #11741 from release/2.0

@dmcgowan @mikebrow @mxpv @samuelkarp
cc @yylt @chrishenzie

@github-project-automation github-project-automation Bot moved this to Needs Triage in Pull Request Review May 5, 2025
@dosubot dosubot Bot added area/cri Container Runtime Interface (CRI) area/runtime Runtime labels May 5, 2025
@fuweid fuweid force-pushed the carry-on-11761 branch from 44157a3 to cee2ebc Compare May 5, 2025 22:22
fuweid and others added 2 commits May 6, 2025 01:54
@fuweid fuweid force-pushed the carry-on-11761 branch from cee2ebc to c2f653e Compare May 6, 2025 05:59
@fuweid fuweid changed the title [carry-11761] core/runtime: should invoke shim binary if it doesn't support Sandbox API. [carry-11761] core/runtime: should invoke shim binary if it doesn't support Sandbox API May 6, 2025
@fuweid fuweid requested review from dmcgowan, mikebrow, mxpv and samuelkarp and removed request for dmcgowan May 6, 2025 15:02
@fuweid fuweid force-pushed the carry-on-11761 branch from c2f653e to 20a924f Compare May 6, 2025 16:51
Comment thread core/runtime/v2/shim_manager.go Outdated
Comment thread core/runtime/v2/task_manager.go Outdated
@fuweid fuweid force-pushed the carry-on-11761 branch from 20a924f to cd95eb7 Compare May 6, 2025 16:55
@fuweid fuweid requested a review from chrishenzie May 6, 2025 17:04
Comment thread core/runtime/v2/shim_manager.go Outdated
params = shimbinary.BootstrapParams{
Version: int(opts.Version),
Protocol: protocol,
Address: address,
Copy link
Copy Markdown
Member

@mxpv mxpv May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should introduce a concept of "supported features" on the shim side. The bootstrap parameters could return a list of supported features as strings (e.g., ["sandboxes", "streaming_io", ...]). This would allow containerd to explicitly check for feature support, rather than relying on assumptions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it sounds good to me. we can re-design this part. I think it's more easy to decide which client version we should use. but right now, version number is good enough to fix this compatible issue.

Signed-off-by: yang yang <[email protected]>
Signed-off-by: Wei Fu <[email protected]>
@fuweid fuweid force-pushed the carry-on-11761 branch from cd95eb7 to 5dc29f0 Compare May 6, 2025 17:20
@github-project-automation github-project-automation Bot moved this from Needs Triage to Review In Progress in Pull Request Review May 6, 2025
@dmcgowan dmcgowan added this pull request to the merge queue May 6, 2025
@dmcgowan dmcgowan added this to the 2.1 milestone May 6, 2025
@dmcgowan dmcgowan moved this from Review In Progress to Merge on Green in Pull Request Review May 6, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 6, 2025
@github-project-automation github-project-automation Bot moved this from Merge on Green to Review In Progress in Pull Request Review May 6, 2025
@fuweid fuweid added this pull request to the merge queue May 6, 2025
Merged via the queue into containerd:main with commit 3a1c2db May 6, 2025
101 of 103 checks passed
@github-project-automation github-project-automation Bot moved this from Review In Progress to Done in Pull Request Review May 6, 2025
@fuweid fuweid deleted the carry-on-11761 branch May 7, 2025 00:10
// It's rolled out together with the sandbox API and can be used
// to determine whether we should invoke the shim binary.
const supportSandboxAPIVersion = 3
if params.Version < supportSandboxAPIVersion {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this should not be here, only when params is valid should it be judged

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that param is empty, there are several cases we should invoke shim:

  • sandbox(pause) container (there is no sandbox ID yet)
  • created container without SandboxID field because it could be created by v1.7.
  • with SandboxID, but the sandbox is not created by podsandbox plugin
  • ...

So if param is empty, we should invoke shim.

return newPodTCtxWithRuntimeHandler(t, rSvc, name, ns, "", opts...)
}

func newPodTCtxWithRuntimeHandler(t *testing.T, rSvc cri.RuntimeService,
Copy link
Copy Markdown
Contributor

@yylt yylt May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a structure similar to the one below, I think this part of the code is easier to read and maintain.

type ctrdConfig struct {
	t       *testing.T
	workdir string
	version     int
}
type runtimeConfig struct {
	name        string
	runtimeType string
	runtimePath string
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new method is to allow us to run pod with different runtime. I think it's not related to runtime config you mentioned. Yeah, I think we should use function options here. I just don't get that why runtime handler is not that part of pod config. We can improve that in follow up

@yylt
Copy link
Copy Markdown
Contributor

yylt commented May 7, 2025

hm... there should have to be manually cherry picked to 2.0?

Have learned a lot, and before seems that there were too many hard coded parts, and some of them were difficult to understand

@fuweid
Copy link
Copy Markdown
Member Author

fuweid commented May 7, 2025

@yylt what part is hard code, would you please share that code? Or reference? Thanks

@yylt
Copy link
Copy Markdown
Contributor

yylt commented May 7, 2025

what part is hard code

i mean the #11761 which there are too many hard coded

smira added a commit to smira/pkgs that referenced this pull request May 19, 2025
When updating for containerd 2.1, I assumed the patch is no longer
needed, as it was included into containerd, but it turns out it got
reverted upstream: containerd/containerd#11793

So bring back the patch in updated form, as otherwise `gvisor` leaves
running processes behind on container stop.

Signed-off-by: Andrey Smirnov <[email protected]>
fuweid added a commit to fuweid/containerd that referenced this pull request Jun 2, 2025
Previously, to address issue containerd#11708, PR containerd#11793 changed containerd to always
invoke the shim binary to establish shim connections, rather than reusing the
sandbox shim. However, this change did not ensure that the Shutdown API was
called to stop the shim process.

Starting with containerd v2.0.0, the Shutdown API is only invoked for sandbox
containers (when container.SandboxID is empty). This approach works for
groupable shims, where multiple containers share a single socket address and
only require a single Shutdown call. However, for non-groupable shims, each
container requires its own Shutdown call during cleanup to avoid leaking shim
processes.

Additionally, PR containerd#11793 introduced a corner case during upgrades:
- T1: An old container-shim-runc-v2 (<=v1.7.X) is running for pod A.
- T2: containerd is upgraded to v2.X.Y.
- T3: A new container A-C1 is created in pod A using the new shim-runc-v2 binary.
- T4: bootstrap.json indicates version:3 protocol, but it is downgraded to version:2 in memory.
- T5: containerd is restarted.
- T6: containerd fails to connect to A-C1.
- T7: The A-C1 container is left in EXITED status in the CRI plugin.

To address this, ensure that loadShimTask downgrades to version:2 if necessary,
and always invoke the Shutdown API for each non-groupable shim during cleanup to
prevent resource leaks and handle upgrade scenarios correctly.

(Introduced by containerd#11793)

Signed-off-by: Wei Fu <[email protected]>
fuweid added a commit to fuweid/containerd that referenced this pull request Jun 2, 2025
Previously, to address issue containerd#11708, PR containerd#11793 changed containerd to always
invoke the shim binary to establish shim connections, rather than reusing the
sandbox shim. However, this change did not ensure that the Shutdown API was
called to stop the shim process.

Starting with containerd v2.0.0, the Shutdown API is only invoked for sandbox
containers (when container.SandboxID is empty). This approach works for
groupable shims, where multiple containers share a single socket address and
only require a single Shutdown call. However, for non-groupable shims, each
container requires its own Shutdown call during cleanup to avoid leaking shim
processes.

Additionally, PR containerd#11793 introduced a corner case during upgrades:
- T1: An old container-shim-runc-v2 (<=v1.7.X) is running for pod A.
- T2: containerd is upgraded to v2.X.Y.
- T3: A new container A-C1 is created in pod A using the new shim-runc-v2 binary.
- T4: bootstrap.json indicates version:3 protocol, but it is downgraded to version:2 in memory.
- T5: containerd is restarted.
- T6: containerd fails to connect to A-C1.
- T7: The A-C1 container is left in EXITED status in the CRI plugin.

To address this, ensure that loadShimTask downgrades to version:2 if necessary,
and always invoke the Shutdown API for each non-groupable shim during cleanup to
prevent resource leaks and handle upgrade scenarios correctly.

(Introduced by containerd#11793)

Signed-off-by: Wei Fu <[email protected]>
k8s-infra-cherrypick-robot pushed a commit to k8s-infra-cherrypick-robot/containerd that referenced this pull request Jun 10, 2025
Previously, to address issue containerd#11708, PR containerd#11793 changed containerd to always
invoke the shim binary to establish shim connections, rather than reusing the
sandbox shim. However, this change did not ensure that the Shutdown API was
called to stop the shim process.

Starting with containerd v2.0.0, the Shutdown API is only invoked for sandbox
containers (when container.SandboxID is empty). This approach works for
groupable shims, where multiple containers share a single socket address and
only require a single Shutdown call. However, for non-groupable shims, each
container requires its own Shutdown call during cleanup to avoid leaking shim
processes.

Additionally, PR containerd#11793 introduced a corner case during upgrades:
- T1: An old container-shim-runc-v2 (<=v1.7.X) is running for pod A.
- T2: containerd is upgraded to v2.X.Y.
- T3: A new container A-C1 is created in pod A using the new shim-runc-v2 binary.
- T4: bootstrap.json indicates version:3 protocol, but it is downgraded to version:2 in memory.
- T5: containerd is restarted.
- T6: containerd fails to connect to A-C1.
- T7: The A-C1 container is left in EXITED status in the CRI plugin.

To address this, ensure that loadShimTask downgrades to version:2 if necessary,
and always invoke the Shutdown API for each non-groupable shim during cleanup to
prevent resource leaks and handle upgrade scenarios correctly.

(Introduced by containerd#11793)

Signed-off-by: Wei Fu <[email protected]>
fuweid added a commit to fuweid/containerd that referenced this pull request Jun 11, 2025
Previously, to address issue containerd#11708, PR containerd#11793 changed containerd to always
invoke the shim binary to establish shim connections, rather than reusing the
sandbox shim. However, this change did not ensure that the Shutdown API was
called to stop the shim process.

Starting with containerd v2.0.0, the Shutdown API is only invoked for sandbox
containers (when container.SandboxID is empty). This approach works for
groupable shims, where multiple containers share a single socket address and
only require a single Shutdown call. However, for non-groupable shims, each
container requires its own Shutdown call during cleanup to avoid leaking shim
processes.

Additionally, PR containerd#11793 introduced a corner case during upgrades:
- T1: An old container-shim-runc-v2 (<=v1.7.X) is running for pod A.
- T2: containerd is upgraded to v2.X.Y.
- T3: A new container A-C1 is created in pod A using the new shim-runc-v2 binary.
- T4: bootstrap.json indicates version:3 protocol, but it is downgraded to version:2 in memory.
- T5: containerd is restarted.
- T6: containerd fails to connect to A-C1.
- T7: The A-C1 container is left in EXITED status in the CRI plugin.

To address this, ensure that loadShimTask downgrades to version:2 if necessary,
and always invoke the Shutdown API for each non-groupable shim during cleanup to
prevent resource leaks and handle upgrade scenarios correctly.

(Introduced by containerd#11793)

Signed-off-by: Wei Fu <[email protected]>
(cherry picked from commit 1ac97c2)
Signed-off-by: Wei Fu <[email protected]>
@fuweid fuweid added cherry-picked/2.0.x PR commits are cherry picked into the release/2.0 branch and removed cherry-pick/2.0.x Change to be cherry picked to release/2.0 branch labels Jun 11, 2025
Cindia-blue pushed a commit to Cindia-blue/containerd that referenced this pull request Aug 31, 2025
Previously, to address issue containerd#11708, PR containerd#11793 changed containerd to always
invoke the shim binary to establish shim connections, rather than reusing the
sandbox shim. However, this change did not ensure that the Shutdown API was
called to stop the shim process.

Starting with containerd v2.0.0, the Shutdown API is only invoked for sandbox
containers (when container.SandboxID is empty). This approach works for
groupable shims, where multiple containers share a single socket address and
only require a single Shutdown call. However, for non-groupable shims, each
container requires its own Shutdown call during cleanup to avoid leaking shim
processes.

Additionally, PR containerd#11793 introduced a corner case during upgrades:
- T1: An old container-shim-runc-v2 (<=v1.7.X) is running for pod A.
- T2: containerd is upgraded to v2.X.Y.
- T3: A new container A-C1 is created in pod A using the new shim-runc-v2 binary.
- T4: bootstrap.json indicates version:3 protocol, but it is downgraded to version:2 in memory.
- T5: containerd is restarted.
- T6: containerd fails to connect to A-C1.
- T7: The A-C1 container is left in EXITED status in the CRI plugin.

To address this, ensure that loadShimTask downgrades to version:2 if necessary,
and always invoke the Shutdown API for each non-groupable shim during cleanup to
prevent resource leaks and handle upgrade scenarios correctly.

(Introduced by containerd#11793)

Signed-off-by: Wei Fu <[email protected]>
Cindia-blue pushed a commit to Cindia-blue/containerd that referenced this pull request Sep 1, 2025
Previously, to address issue containerd#11708, PR containerd#11793 changed containerd to always
invoke the shim binary to establish shim connections, rather than reusing the
sandbox shim. However, this change did not ensure that the Shutdown API was
called to stop the shim process.

Starting with containerd v2.0.0, the Shutdown API is only invoked for sandbox
containers (when container.SandboxID is empty). This approach works for
groupable shims, where multiple containers share a single socket address and
only require a single Shutdown call. However, for non-groupable shims, each
container requires its own Shutdown call during cleanup to avoid leaking shim
processes.

Additionally, PR containerd#11793 introduced a corner case during upgrades:
- T1: An old container-shim-runc-v2 (<=v1.7.X) is running for pod A.
- T2: containerd is upgraded to v2.X.Y.
- T3: A new container A-C1 is created in pod A using the new shim-runc-v2 binary.
- T4: bootstrap.json indicates version:3 protocol, but it is downgraded to version:2 in memory.
- T5: containerd is restarted.
- T6: containerd fails to connect to A-C1.
- T7: The A-C1 container is left in EXITED status in the CRI plugin.

To address this, ensure that loadShimTask downgrades to version:2 if necessary,
and always invoke the Shutdown API for each non-groupable shim during cleanup to
prevent resource leaks and handle upgrade scenarios correctly.

(Introduced by containerd#11793)

Signed-off-by: Wei Fu <[email protected]>
Cindia-blue pushed a commit to Cindia-blue/containerd that referenced this pull request Oct 20, 2025
Previously, to address issue containerd#11708, PR containerd#11793 changed containerd to always
invoke the shim binary to establish shim connections, rather than reusing the
sandbox shim. However, this change did not ensure that the Shutdown API was
called to stop the shim process.

Starting with containerd v2.0.0, the Shutdown API is only invoked for sandbox
containers (when container.SandboxID is empty). This approach works for
groupable shims, where multiple containers share a single socket address and
only require a single Shutdown call. However, for non-groupable shims, each
container requires its own Shutdown call during cleanup to avoid leaking shim
processes.

Additionally, PR containerd#11793 introduced a corner case during upgrades:
- T1: An old container-shim-runc-v2 (<=v1.7.X) is running for pod A.
- T2: containerd is upgraded to v2.X.Y.
- T3: A new container A-C1 is created in pod A using the new shim-runc-v2 binary.
- T4: bootstrap.json indicates version:3 protocol, but it is downgraded to version:2 in memory.
- T5: containerd is restarted.
- T6: containerd fails to connect to A-C1.
- T7: The A-C1 container is left in EXITED status in the CRI plugin.

To address this, ensure that loadShimTask downgrades to version:2 if necessary,
and always invoke the Shutdown API for each non-groupable shim during cleanup to
prevent resource leaks and handle upgrade scenarios correctly.

(Introduced by containerd#11793)

Signed-off-by: Wei Fu <[email protected]>
Cindia-blue pushed a commit to Cindia-blue/containerd that referenced this pull request Nov 8, 2025
Previously, to address issue containerd#11708, PR containerd#11793 changed containerd to always
invoke the shim binary to establish shim connections, rather than reusing the
sandbox shim. However, this change did not ensure that the Shutdown API was
called to stop the shim process.

Starting with containerd v2.0.0, the Shutdown API is only invoked for sandbox
containers (when container.SandboxID is empty). This approach works for
groupable shims, where multiple containers share a single socket address and
only require a single Shutdown call. However, for non-groupable shims, each
container requires its own Shutdown call during cleanup to avoid leaking shim
processes.

Additionally, PR containerd#11793 introduced a corner case during upgrades:
- T1: An old container-shim-runc-v2 (<=v1.7.X) is running for pod A.
- T2: containerd is upgraded to v2.X.Y.
- T3: A new container A-C1 is created in pod A using the new shim-runc-v2 binary.
- T4: bootstrap.json indicates version:3 protocol, but it is downgraded to version:2 in memory.
- T5: containerd is restarted.
- T6: containerd fails to connect to A-C1.
- T7: The A-C1 container is left in EXITED status in the CRI plugin.

To address this, ensure that loadShimTask downgrades to version:2 if necessary,
and always invoke the Shutdown API for each non-groupable shim during cleanup to
prevent resource leaks and handle upgrade scenarios correctly.

(Introduced by containerd#11793)

Signed-off-by: Wei Fu <[email protected]>
yanghoeg pushed a commit to yanghoeg/pkgs that referenced this pull request May 4, 2026
When updating for containerd 2.1, I assumed the patch is no longer
needed, as it was included into containerd, but it turns out it got
reverted upstream: containerd/containerd#11793

So bring back the patch in updated form, as otherwise `gvisor` leaves
running processes behind on container stop.

Signed-off-by: Andrey Smirnov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cri Container Runtime Interface (CRI) area/runtime Runtime cherry-picked/2.0.x PR commits are cherry picked into the release/2.0 branch size/L

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Upgrade test failures on 2.1 beta when upgrading from 1.7 shim Add integration test for Sandbox shims

7 participants