Skip to content

cri: do not set sandbox id when using podsandbox type#11735

Merged
mxpv merged 1 commit intocontainerd:mainfrom
yylt:sbxid
Apr 22, 2025
Merged

cri: do not set sandbox id when using podsandbox type#11735
mxpv merged 1 commit intocontainerd:mainfrom
yylt:sbxid

Conversation

@yylt
Copy link
Copy Markdown
Contributor

@yylt yylt commented Apr 22, 2025

Fix #11708

The exit of the work container will call shutdown at 1.7, but only call shutdown when the sandbox exits at 2.0.

// some processes about shim.Task

kill    ->    delete    ->   shutdown     
            ( callback by backoff by wait )

// https://github.com/containerd/containerd/blob/fb4c30d4ede3531652d86197bf3fc9515e5276d9/core/runtime/v2/shim.go#L529-L532C46
	// Don't shutdown sandbox as there may be other containers running.
	// Let controller decide when to shutdown.
	if !sandboxed {
		if err := s.waitShutdown(ctx); err != nil {

Based on the above code, it seems to be an issue with gvisor, and the shim connection (write close at shim side) should be closed when call delete rpc.

Above, could solve problem #11708, but not sure it's worth.

If there were clearer documentation about the shim API, I think it would be more helpful for this issue

cc @mxpv

@k8s-ci-robot
Copy link
Copy Markdown

Hi @yylt. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@dosubot dosubot Bot added area/cri Container Runtime Interface (CRI) kind/enhancement labels Apr 22, 2025
@yylt yylt changed the title [runtime] not set sandbox id when podsandbox type [cri] not set sandbox id when podsandbox type Apr 22, 2025
@samuelkarp
Copy link
Copy Markdown
Member

/ok-to-test

Would you mind rebasing on the tip of main? We just merged a fix for the Vagrant CI failures.

@samuelkarp
Copy link
Copy Markdown
Member

This appears to fix #11708 for me.

If there were clearer documentation about the shim API, I think it would be more helpful for this issue

The protocol is described here and here. In particular:

containerd invokes the shim, including container configuration, which uses that information to decide whether to launch a new socket listener (1:1 shim to container) or use an existing one (1:many)

  • if existing, return the address of the existing socket and exit
  • if new, the shim:
    a. creates a new process to listen on a socket for ttRPC commands from containerd
    b. returns the address to that socket to containerd
    c. exits

The start command MUST write to stdout either the ttrpc address that the shim is serving its API on, or (experimental) a JSON structure in the following format (where protocol can be either "ttrpc" or "grpc"):
[...]
The address will be used by containerd to issue API requests for container operations.

The start command can either start a new shim or return an address to an existing shim based on the shim's logic.

I'm not wild about special-casing the "podsandbox" sandboxer here; it should really be that containerd is reading and respecting the value returned by the shim. However, I'm going to approve this anyway because I'd like to get this fixed for 2.0 (and we can clean it up afterward).

It looks like I have write permission on the branch, so I'll do the rebase onto main to fix CI.

@samuelkarp samuelkarp added area/runtime Runtime cherry-pick/2.0.x Change to be cherry picked to release/2.0 branch labels Apr 22, 2025
@samuelkarp
Copy link
Copy Markdown
Member

For more context on why this change works: prior to the change, containers using podsandbox sandboxer and a third-party shim that did not group would not receive a s.task.Shutdown. Setting sandboxed = false allows shimTask.delete to call s.waitShutdown (which eventually calls s.task.Shutdown after 3 seconds).

@samuelkarp samuelkarp changed the title [cri] not set sandbox id when podsandbox type cri: do not set sandbox id when using podsandbox type Apr 22, 2025
@dmcgowan dmcgowan added this to the 2.1 milestone Apr 22, 2025
@samuelkarp samuelkarp moved this from Needs Triage to Needs Reviewers in Pull Request Review Apr 22, 2025
@samuelkarp samuelkarp requested a review from mxpv April 22, 2025 18:28
@github-project-automation github-project-automation Bot moved this from Needs Reviewers to Review In Progress in Pull Request Review Apr 22, 2025
@mxpv mxpv added this pull request to the merge queue Apr 22, 2025
@mxpv
Copy link
Copy Markdown
Member

mxpv commented Apr 22, 2025

/cherry-pick release/2.0

@k8s-infra-cherrypick-robot
Copy link
Copy Markdown

@mxpv: once the present PR merges, I will cherry-pick it on top of release/2.0 in a new PR and assign it to you.

Details

In response to this:

/cherry-pick release/2.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@samuelkarp
Copy link
Copy Markdown
Member

@mxpv I did a manual cherry-pick in #11741 since git cherry-pick will produce a merge conflict.

@mxpv
Copy link
Copy Markdown
Member

mxpv commented Apr 22, 2025

Yup, just noticed it. I guess there is no way to cancel the request, so I'll close the PR once its opened.

Merged via the queue into containerd:main with commit 93bc845 Apr 22, 2025
100 of 103 checks passed
@github-project-automation github-project-automation Bot moved this from Review In Progress to Done in Pull Request Review Apr 22, 2025
@k8s-infra-cherrypick-robot
Copy link
Copy Markdown

@mxpv: #11735 failed to apply on top of branch "release/2.0":

Applying: not set sandbox id when use podsandbox type
Using index info to reconstruct a base tree...
M	internal/cri/server/container_create.go
Falling back to patching base and 3-way merge...
Auto-merging internal/cri/server/container_create.go
CONFLICT (content): Merge conflict in internal/cri/server/container_create.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 not set sandbox id when use podsandbox type

Details

In response to this:

/cherry-pick release/2.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@samuelkarp samuelkarp added cherry-picked/2.0.x PR commits are cherry picked into the release/2.0 branch and removed cherry-pick/2.0.x Change to be cherry picked to release/2.0 branch labels Apr 22, 2025
@yylt
Copy link
Copy Markdown
Contributor Author

yylt commented Apr 23, 2025

The protocol is described here and here. In particular:

Thanks, but when containerd invokes the shim for start new work container, not found daemon process, like containerd-shim-runc-v2 -namespace k8s.io -address [containerdsock] -id [work container unix domain socket] ..., look list it is clearly 1:1 (shim to container),

I believe I must have missed some information. Could you please clarify this? Thank you very much

@samuelkarp
Copy link
Copy Markdown
Member

I'm having trouble understanding your question.

When containerd starts a new container, it execs a new shim process (for example, containerd-shim-runc-v2). The shim process will then write a Unix domain socket address to stdout, and exit. From containerd's perspective, the contract is that the shim provides some valid Unix domain socket address, and then subsequently connects to that socket via ttrpc (or grpc) and can issue RPCs. From a shim author's perspective, the shim can choose any of the following scenarios:

  1. Have a 1:1 correspondence between shim:container (1 shim per container)
  2. Have a 1:many correspondence between shim:container (1 shim per group, which might be a Kubernetes pod)
  3. Have a 1:all correspondence between shim:container (1 shim per host)

The mechanism used by the shim to do this is deciding what address to return. In case (1) above, every container gets a unique address (and there is typically a unique process that listens on the socket). In case (2) above, each group of containers gets a unique address, and the shim logic gets to decide what constitutes a group (the containerd-shim-runc-v2 uses these labels to decide). In case (3) above, there is a single address that all containers get and a single process listening on the socket.

@yylt
Copy link
Copy Markdown
Contributor Author

yylt commented Apr 24, 2025

Thank you for your patience, @samuelkarp

finally, see https://github.com/containerd/containerd/blob/fb4c30d4ede3531652d86197bf3fc9515e5276d9/cmd/containerd-shim-runc-v2/manager/manager_linux.go#L198-L203C3, where use sandbox id replace container id

	for _, group := range groupLabels {
		if groupID, ok := spec.Annotations[group]; ok {
			grouping = groupID
			break
		}
	}

@fuweid
Copy link
Copy Markdown
Member

fuweid commented Apr 24, 2025

I think we should use containerd/1.7 server to create pod with shim-runc-v1. And then we upgrade it into v2.x and do test.

I would like to see there is e2e test case before release.

@yylt
Copy link
Copy Markdown
Contributor Author

yylt commented Apr 24, 2025

I think we should use containerd/1.7 server to create pod with shim-runc-v1. And then we upgrade it into v2.x and do test.

I would like to see there is e2e test case before release.

mean #11535?

@fuweid
Copy link
Copy Markdown
Member

fuweid commented Apr 28, 2025

@yylt For the runc-v1 binary, it doesn't support grouping containers in one shim. I think it's good chance for us to verify your change without introducing gVisor depedency. Based on this issue - #9727 (comment), we should check state dir after deleting one container. It should not have any leaky resource.

@yylt
Copy link
Copy Markdown
Contributor Author

yylt commented Apr 29, 2025

we should check state dir after deleting one container. It should not have any leaky resource.

Yes, we should check the status directory after remove container

At the same time, there is a problem, when upgrad from 1. x to 2.0, create and start new container based on sandbox which the old shim-v2 prepared, it is should also be expected to be successful.

The above modifications are on new pr. Could you take a look and provide suggestions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cri Container Runtime Interface (CRI) area/runtime Runtime cherry-picked/2.0.x PR commits are cherry picked into the release/2.0 branch kind/enhancement ok-to-test size/S

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

gVisor containerd shim incompatible w/ containerd 2.0

7 participants