Handle pod network teardown more carefully when setup failed. #12132
MrHohn wants to merge 1 commit into containerd:main
Conversation
@MrHohn: GitHub didn't allow me to request PR reviews from the following users: aojea, sameersaeed. Note that only containerd members and repo collaborators can review this PR, and authors cannot review their own PRs.
Hi @MrHohn. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here.
Force-pushed 35f4699 to 9ba9f99
Thanks @MrHohn, I tested creating a new pod and it seems to work as expected - a new pod cannot be created when no CNI plugins are initialized.
/cc @estesp
/ok-to-test
Looking at the pull-containerd-node-e2e failures - many of them seem unrelated to the changes here.

/retest
Force-pushed 9ba9f99 to 1c77f9a
```go
		return fmt.Errorf("failed to destroy network for sandbox %q: %w", id, err)
	}
	if err := c.teardownPodNetwork(ctx, sandbox); err != nil {
		return fmt.Errorf("failed to destroy network for sandbox %q: %w", id, err)
```
Why should you call the CNI plugin if you didn't get any result before?
It seems that the previous `if sandbox.CNIResult != nil {` check is OK.
I reverted this change because `CNIResult` will only be set if `setupPodNetwork` succeeded. In cases where both `setupPodNetwork` and `teardownPodNetwork` failed during sandbox creation, we end up never retrying `teardownPodNetwork` despite its earlier failure.
Yeah, that is why I added this comment: #12132 (comment). The `setupPodNetwork` error can be handled directly after that function returns, rather than in this magic defer that is so complex to understand... it also depends on two magic error variables, which makes it even more confusing.
There are a few more thoughts behind this - trying to merge the conversation into this thread: #12132 (comment).
```diff
 	if cleanupErr = c.teardownPodNetwork(deferCtx, sandbox); cleanupErr != nil {
 		log.G(ctx).WithError(cleanupErr).Errorf("Failed to destroy network for sandbox %q", id)
 	}
-	// ignoring failed to destroy networks when we failed to setup networks
-	if sandbox.CNIResult == nil {
+	// Ignoring network cleanup error if the CNI config/plugin failed to be initialized.
+	// It is safe to ignore in this scenario as no network plugins should have been executed.
+	if errors.Is(cleanupErr, ErrCNIConfigNotInitialized) ||
+		errors.Is(cleanupErr, cni.ErrCNINotInitialized) {
 		cleanupErr = nil
```
What makes this complex to reason about is embedding all the logic in this big defer. I think it is better to compartmentalize this logic and handle the errors coming from `c.setupPodNetwork` directly there: if the network plugin is not initialized, you don't have to call `teardownPodNetwork` and can just return an error, and the `if sandbox.CNIResult == nil` check remains the same.
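A minimal sketch of that compartmentalized handling, for illustration only - the surrounding function body, the `sandbox`/`id` variables, and the call shapes are assumed context taken from the diff above, not the actual containerd code:

```go
// Sketch: handle the setup error at the call site instead of in the defer.
if err := c.setupPodNetwork(ctx, &sandbox); err != nil {
	// If the CNI config/plugin was never initialized, no plugin ran,
	// so there is nothing to tear down - just surface the error.
	if errors.Is(err, ErrCNIConfigNotInitialized) ||
		errors.Is(err, cni.ErrCNINotInitialized) {
		return nil, fmt.Errorf("failed to setup network for sandbox %q: %w", id, err)
	}
	// Otherwise the chain may have partially executed; attempt teardown
	// now, and leave any teardown failure to be retried by StopPodSandbox.
	if cleanupErr := c.teardownPodNetwork(ctx, sandbox); cleanupErr != nil {
		log.G(ctx).WithError(cleanupErr).Errorf("Failed to destroy network for sandbox %q", id)
	}
	return nil, fmt.Errorf("failed to setup network for sandbox %q: %w", id, err)
}
```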
Thanks for the comment, Antonio. +1 that baking more complex logic into this defer block makes it hard to reason about the behavior. I will take your advice and see how to handle the error from `c.setupPodNetwork` directly.
(Sorry for the long comment! I was thinking out loud while amending the code.)

Coming back to this - I looked around at how `TeardownPodNetwork()` is triggered and haven't found an ideal way to handle this yet. I'd like to gather more input here.
So there are two separate issues this PR is focusing on (the aim is to address both; a toy model of the second issue follows this list):

- When `SetupPodNetwork` fails completely (e.g. due to CNI not being initialized), sandbox deletion will be blocked by `TeardownPodNetwork` failing for the same reason. This deadlock was previously addressed by checking `if CNIResult == nil` and ignoring the cleanup error.
- When `SetupPodNetwork` fails partially in the second half of the CNI chain, `TeardownPodNetwork` might fail early on (CNI executes in reverse order during teardown) and leak the network resources allocated by the first half. `TeardownPodNetwork` is then skipped for later retries because `CNIResult == nil`.
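To make the second issue concrete, here is a self-contained toy model (not containerd or go-cni code; all names are invented) of a two-plugin chain where ADD runs in order and DEL runs in reverse - a partial ADD followed by an early DEL failure strands the IP allocated by the first plugin:

```go
package main

import (
	"errors"
	"fmt"
)

// plugin is a toy stand-in for one entry in a chained CNI config.
type plugin struct {
	name string
	add  func() error
	del  func() error
}

// setup mimics CNI ADD: plugins run in order, stopping at the first failure.
func setup(chain []plugin) error {
	for _, p := range chain {
		if err := p.add(); err != nil {
			return fmt.Errorf("ADD %s: %w", p.name, err)
		}
	}
	return nil
}

// teardown mimics CNI DEL: plugins run in reverse order, also stopping
// at the first failure, which leaves earlier plugins' resources leaked.
func teardown(chain []plugin) error {
	for i := len(chain) - 1; i >= 0; i-- {
		if err := chain[i].del(); err != nil {
			return fmt.Errorf("DEL %s: %w", chain[i].name, err)
		}
	}
	return nil
}

func main() {
	allocated := false
	chain := []plugin{
		{
			name: "ipam", // first plugin: allocates the pod IP
			add:  func() error { allocated = true; return nil },
			del:  func() error { allocated = false; return nil },
		},
		{
			name: "portmap", // second plugin: fails during both ADD and DEL
			add:  func() error { return errors.New("plugin failure") },
			del:  func() error { return errors.New("plugin failure") },
		},
	}
	fmt.Println("setup:", setup(chain))       // fails at portmap, after ipam ran
	fmt.Println("teardown:", teardown(chain)) // fails before reaching ipam
	fmt.Println("IP still allocated:", allocated)
	// If further teardown retries are skipped because no CNI result was
	// recorded, the IP above stays leaked for the node's lifetime.
}
```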
There are two places where `TeardownPodNetwork` is called:

- `RunPodSandbox`: when `SetupPodNetwork` fails, `TeardownPodNetwork` is called in the defer func as the initial attempt to clean up network resources.
- `StopPodSandbox`: `TeardownPodNetwork` is called as part of the process of stopping the sandbox.
In both of the above places, `TeardownPodNetwork` will be skipped (or its error ignored) when `CNIResult == nil`. This check itself seems problematic, because a half-executed `SetupPodNetwork` could have allocated network resources while still resulting in a nil `CNIResult`, hence it is not always safe to skip the cleanup or ignore its error under this condition. Doing so prevents further retries of `TeardownPodNetwork`, eventually leading to permanently leaked resources (for the lifetime of the node).
One tricky part I'm seeing in the implementation: in `RunPodSandbox`, `TeardownPodNetwork` is conditionally called in the defer func based on local variables (`retErr` and `cleanupErr`). There is no obvious way to influence the execution of `TeardownPodNetwork` directly while handling the error returned by `SetupPodNetwork` (without major refactoring). The most straightforward way is to categorize `retErr` and conditionally call `TeardownPodNetwork` - and that's what this PR does at the moment.
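For readers unfamiliar with this code path, a condensed, simplified sketch of the defer pattern being described (the real `RunPodSandbox` does considerably more, and `sandbox`/`id` are assumed to be in scope):

```go
func (c *criService) RunPodSandbox(ctx context.Context, r *runtime.RunPodSandboxRequest) (_ *runtime.RunPodSandboxResponse, retErr error) {
	defer func() {
		if retErr == nil {
			return // success: nothing to clean up
		}
		// Only retErr and cleanupErr are visible here, so any decision about
		// skipping teardown or ignoring its failure has to be encoded as
		// checks on those two error values inside this defer.
		if cleanupErr := c.teardownPodNetwork(ctx, sandbox); cleanupErr != nil {
			log.G(ctx).WithError(cleanupErr).Errorf("Failed to destroy network for sandbox %q", id)
		}
	}()
	// ... create the sandbox, call c.setupPodNetwork, etc. ...
	return nil, nil
}
```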
If I take a step back and think of a more intuitive way of handling this: ideally I would adjust the Setup methods in the go-cni lib to return more accurate results. For instance, if the CNI plugins were executed only partially, the partial results should still be returned so that they can be stored by containerd. The decision on whether a cleanup is required should then be based on `CNIResult` for real, rather than on the ad-hoc conditions we have here.
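A hypothetical sketch of that direction - note that the real go-cni `Setup` returns a nil result on error, so returning partial results alongside the error is the proposed change, not current behavior, and `netPlugin`/`netnsPath` are assumed names:

```go
// Hypothetical: Setup returns whatever results the chain produced before
// failing, together with the error, instead of discarding them.
result, err := netPlugin.Setup(ctx, id, netnsPath, opts...)
if result != nil {
	// Store partial results too, so CNIResult reflects what was actually
	// allocated and teardown decisions can be based on it for real.
	sandbox.CNIResult = result
}
if err != nil {
	return nil, fmt.Errorf("failed to setup network for sandbox %q: %w", id, err)
}
```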
Would love to see what thoughts you might have on this :)
Force-pushed ed67626 to e95ba1e
Bumping this up again - hoping to bring this back on track. The aim of this fix is to stop the bleeding while minimizing the changes needed. We are keen to resolve the IP leaking issue and to make sure the resolution is backported to the impacted branches in a timely manner. Yes, there are various options to "correct" the behavior (e.g. #12130 (comment)) outside of what's proposed here. I'm open to any advice and feedback on this. Thanks!
Thanks for your work on this PR and patience in getting it reviewed. To the maintainers, this PR is an important fix for the GKE team. We've had customers report IP address leaks on their nodes, and our investigation traced the root cause to the change in network teardown logic in #10744. Given the customer impact, we would appreciate it if we could get this reviewed and merged as soon as possible. @mikebrow, @estesp, @aojea, could you please take a look when you have a moment?
Force-pushed e95ba1e to 0d13443
The test failure looks like the flake reported here: #12260

/retest
This is a follow up of both containerd#10744 and containerd#10839, where `teardownPodNetwork` may be skipped if setting up networks failed in the first place. In the situation illustrated in containerd#12130, for the chained CNI pattern the skipping logic may be problematic and can lead to network resource leakage (e.g. Pod IP). This PR attempts to make the sandbox network handling more robust by skipping the network teardown only when the error is related to CNI initialization, while allowing a teardown retry to happen if setup already partially allocated network resources.

Signed-off-by: Zihong Zheng <[email protected]>
Force-pushed 0d13443 to 2b67a18
PR needs rebase.
This is a follow up of both #10744 and #10839, where `teardownPodNetwork` may be skipped if setting up networks failed in the first place. In the situation illustrated in #12130, for the chained CNI pattern the skipping logic may be problematic and leads to network resource leakage (e.g. Pod IP).

This PR attempts to make the sandbox network handling more robust by skipping the network teardown only when the error is related to CNI initialization, while allowing a teardown retry to happen if setup already partially allocated network resources.
/cc @sameersaeed @mikebrow @aojea