
Conversation

@qiutongs
Contributor

@qiutongs qiutongs commented Aug 23, 2021

Signed-off-by: Qiutong Song [email protected]

Overview

Fix issue #5768

Change the order to:

  1. Create network namespace
  2. Create sandbox container
  3. Setup pod network
  4. Create sandbox container task
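For orientation, here is a minimal Go sketch of the reordered flow. The helper names are placeholders for illustration only, not the actual functions in pkg/cri/server/sandbox_run.go:

package main

import "fmt"

// Placeholder types and helpers, for illustration only; the real logic lives
// in pkg/cri/server/sandbox_run.go.
type sandbox struct{ id, netnsPath string }

func createNetNS(id string) (string, error) { return "/var/run/netns/cni-" + id, nil }

func createSandboxContainer(id, netns string) (*sandbox, error) { return &sandbox{id, netns}, nil }

func setupPodNetwork(sb *sandbox) error { return nil } // CNI ADD against sb.netnsPath

func createSandboxTask(sb *sandbox) error { return nil } // create/start the pause task

func runPodSandbox(id string) error {
	// 1. Create the network namespace so it exists before CNI runs.
	netns, err := createNetNS(id)
	if err != nil {
		return fmt.Errorf("create netns: %w", err)
	}
	// 2. Create the sandbox container (metadata only; nothing is running yet).
	sb, err := createSandboxContainer(id, netns)
	if err != nil {
		return fmt.Errorf("create sandbox container: %w", err)
	}
	// 3. Set up the pod network. If CNI fails here, no task has been started,
	//    so the failure is unwound before anything hard to clean up exists.
	if err := setupPodNetwork(sb); err != nil {
		return fmt.Errorf("setup pod network: %w", err)
	}
	// 4. Only now create and start the sandbox (pause) container task.
	return createSandboxTask(sb)
}

func main() { fmt.Println(runPodSandbox("example")) }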

Testing

make
make test
sudo make root-test
sudo make integration

Local Verification

Simulate IP Leakage with Chained Plugin

I can simulate the IP leakage case by modifying the CNI config /etc/cni/net.d/10-containerd-net.conflist.

Add a dummy plugin that doesn't exist to the plugins chain:

{
      "type": "dummy"
}
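For reference, the full conflist then looks roughly like this (a sketch based on containerd's example config; subnets and plugin fields will differ per setup). Because a chained config runs plugins in order, the bridge plugin allocates an IP through host-local before the nonexistent dummy plugin fails, which is exactly what leaks:

{
  "cniVersion": "1.0.0",
  "name": "containerd-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "ranges": [[{ "subnet": "10.88.0.0/16" }]],
        "routes": [{ "dst": "0.0.0.0/0" }]
      }
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true }
    },
    {
      "type": "dummy"
    }
  ]
}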

Thus, RunPodSandbox fails with the error:

ERRO[2022-04-06T08:02:59.236331342-07:00] RunPodSandbox for &PodSandboxMetadata{Name:nginx-pod,Uid:d52d99d3-b4d7-4715-a0fa-4aaca8218de1,Namespace:default,Attempt:0,} failed, error  error="failed to setup network for sandbox \"949d785ccf3d121f65407690ee2adf80d2b24f788f97cc688ed5ab4f049b7309\": failed to find plugin \"dummy\" in path [/opt/cni/bin]"
ERRO[2022-04-06T08:02:59.263476902-07:00] StopPodSandbox for "949d785ccf3d121f65407690ee2adf80d2b24f788f97cc688ed5ab4f049b7309" failed  error="failed to destroy network for sandbox \"949d785ccf3d121f65407690ee2adf80d2b24f788f97cc688ed5ab4f049b7309\": failed to find plugin \"dummy\" in path [/opt/cni/bin]"

Before the Fix

Kubelet keeps calling RunPodSandbox, and each attempt reserves a new IP. None of these IPs are released. This can be observed with the command below.

sudo ls /var/lib/cni/networks/containerd-net

After the Fix

Kubelet keeps calling StopPodSandbox after the initial RunPodSandbox. Even though the StopPodSandbox calls also fail, the IP is not leaked (observed by watching /var/lib/cni/networks/containerd-net as above; host-local ipam keeps one file per allocated IP in that directory).

Pod event

Normal   SandboxChanged          12m (x16 over 15m)  kubelet            Pod sandbox changed, it will be killed and re-created.

New Integration Test with Failpoint

# Need CNI 1.0+
$ script/setup/install-failpoint-binaries
$ ENABLE_CRI_SANDBOXES="" CONTAINERD_RUNTIME=runc FOCUS=TestRunPodSandboxWithShimStartAndTealdownCNIFailure make cri-integration
$ ENABLE_CRI_SANDBOXES="" CONTAINERD_RUNTIME=runc FOCUS=TestRunPodSandboxWithShimDeleteFailure make cri-integration

Note: run sudo rm -rf /var/lib/containerd-test to clean up state if the test fails; otherwise, subsequent runs will fail with "failed to reserve sandbox name".

New Integration Test

$ FOCUS=TestNetworkTeardownFailureWithoutIPLeakage make cri-integration
INFO[0000] Using the following image list: {Alpine:docker.io/library/alpine:latest BusyBox:docker.io/library/busybox:latest Pause:k8s.gcr.io/pause:3.6 ResourceConsumer:k8s.gcr.io/e2e-test-images/resource-consumer:1.10 VolumeCopyUp:ghcr.io/containerd/volume-copy-up:2.1 VolumeOwnership:ghcr.io/containerd/volume-ownership:2.1} 
=== RUN   TestNetworkTeardownFailureWithoutIPLeakage
    sandbox_clean_remove_test.go:136: Make sure host-local ipam and portmap are in use
    sandbox_clean_remove_test.go:152: Rename the portmap binary
    sandbox_clean_remove_test.go:157: Count the numbers of IPs allocated currently
    sandbox_clean_remove_test.go:163: Create a sandbox
E0507 19:38:18.844400 1848990 remote_runtime.go:135] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to setup network for sandbox "accdc40e1fbdc2d305634485c0246d0130ed016a2d53cbb4353803b0d2eccc73": plugin type="portmap" failed (add): failed to find plugin "portmap" in path [/opt/cni/bin]
    sandbox_clean_remove_test.go:189: Get pod information
    sandbox_clean_remove_test.go:206: [{pid } {ipc } {uts } {mount } {network /var/run/netns/cni-8171a6cd-01bc-7e96-55d6-a5fd1e953717}]
    sandbox_clean_remove_test.go:215: Count the numbers of IPs allocated after teardown failure
    sandbox_clean_remove_test.go:224: Should be able to stop and remove the sandbox
    sandbox_clean_remove_test.go:228: Count the numbers of IPs allocated after removing the sandbox
--- PASS: TestNetworkTeardownFailureWithoutIPLeakage (2.21s)
PASS

Before the fix, the test fails:

    sandbox_clean_remove_test.go:198: 
                Error Trace:    sandbox_clean_remove_test.go:198
                Error:          Should NOT be empty, but was 
                Test:           TestNetworkTeardownFailureWithoutIPLeakage
                Messages:       sandboxId should be found

@k8s-ci-robot

Hi @qiutongs. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@qiutongs
Contributor Author

/assign @Random-Liu

@theopenlab-ci

theopenlab-ci bot commented Aug 23, 2021

Build succeeded.

@Random-Liu
Member

Random-Liu commented Sep 9, 2021

With this change, even though the pod network setup and teardown are moved down, our problem is still not resolved:

  1. The sandbox container metadata will still be deleted from the containerd metadata store in the defer;
  2. The sandbox is not added into the sandbox store, so will not be listed by kubelet, thus no cleanup.

We can probably add the sandbox into the sandbox store in unknown state if there is any sandbox startup problem (remove the defer cleanup), and rely on kubelet to stop and remove the sandbox afterwards. A similar approach can be used for the container creation as well.

The assumption is that sandbox stop should be idempotent.

One thing that could be missing with the approach above is selinux.ReleaseLabel(sandbox.ProcessLabel). I'm not sure why it is not included in sandbox stop today.

@mikebrow mikebrow added the status/needs-update Awaiting contributor update label Oct 11, 2021
@mikebrow
Member

@qiutongs note @Random-Liu's request for changes

@estesp
Member

estesp commented Oct 28, 2021

any update here based on the feedback @qiutongs?

@qiutongs
Contributor Author

qiutongs commented Apr 4, 2022

Sorry, I paused this. Now I am prioritizing the fix. Will update asap.

@mikebrow
Member

mikebrow commented Apr 4, 2022

Sorry, I paused this. Now I am prioritizing the fix. Will update asap.

That works. I was poking around at this at the end of last week.. for performance reasons I was thinking we could run the pause container and CNI setup in parallel

wip commit over here:
mikebrow@17cb9da

they both take about the same amount of time.. thus running them in parallel will essentially make the pause container just a resource hit. Still have to resolve the idempotency issue.

@qiutongs qiutongs closed this Apr 5, 2022
@qiutongs qiutongs reopened this Apr 5, 2022
@theopenlab-ci

theopenlab-ci bot commented Apr 5, 2022

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.

@kevpar
Member

kevpar commented Apr 5, 2022

I think this will break VM-isolated containers on Windows. We rely on the CNI running before the pause container is started, since CNI creates the HCN endpoint, and we look for the HCN endpoint when creating the UVM, which is done as part of the pause container creation.

@kevpar
Member

kevpar commented Apr 5, 2022

@dcantah can you take a look to confirm if this will affect the vm-isolated container work you are doing?

@mikebrow
Member

mikebrow commented Apr 5, 2022

I think this will break VM-isolated containers on Windows. We rely on the CNI running before the pause container is started, since CNI creates the HCN endpoint, and we look for the HCN endpoint when creating the UVM, which is done as part of the pause container creation.

would you be ok with CNI setup and the create/start of the pause container running in parallel, if you were to add a wait check for the endpoint to exist?

@dcantah
Member

dcantah commented Apr 6, 2022

would you be ok with CNI setup and the create/start of the pause container running in parallel, if you were to add a wait check for the endpoint to exist?

@mikebrow Are you saying that if we made CNI and the pause container operations start at the same time, it would be alright on our end to just busy loop until an endpoint shows up (e.g. until CNI finishes)? I guess I'm asking whether you mean that in our shim we'd be fine looping for the existence of an ep
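For illustration, a bounded wait (rather than an unbounded busy loop) could look like the sketch below; endpointExists is a hypothetical stand-in for a real hcsshim/HCN lookup, not an actual API:

package shim

import (
	"context"
	"fmt"
	"time"
)

// endpointExists is a hypothetical placeholder for an actual HCN endpoint
// query; it is not a real hcsshim API.
func endpointExists(name string) bool { return false }

// waitForEndpoint polls until the endpoint shows up or the context expires,
// so the shim never spins forever if CNI setup never completes.
func waitForEndpoint(ctx context.Context, name string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if endpointExists(name) {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("waiting for endpoint %q: %w", name, ctx.Err())
		case <-ticker.C:
		}
	}
}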

@qiutongs qiutongs closed this Apr 6, 2022
@qiutongs qiutongs reopened this Apr 6, 2022
@qiutongs
Contributor Author

qiutongs commented Apr 6, 2022

With this change, even though the pod network setup and teardown are moved down, our problem is still not resolved:

  1. The sandbox container metadata will still be deleted from the containerd metadata store in the defer;
  2. The sandbox is not added into the sandbox store, so will not be listed by kubelet, thus no cleanup.

We can probably add the sandbox into the sandbox store in unknown state if there is any sandbox startup problem (remove the defer cleanup), and rely on kubelet to stop and remove the sandbox afterwards. A similar approach can be used for the container creation as well.

The assumption is that sandbox stop should be idempotent.

One thing that could be missing with the approach above is selinux.ReleaseLabel(sandbox.ProcessLabel). I'm not sure why it is not included in sandbox stop today.

I see what I missed. I updated my change based on your idea.
Working on reproducing this locally.

@theopenlab-ci

theopenlab-ci bot commented Apr 6, 2022

Build succeeded.

@qiutongs
Contributor Author

qiutongs commented Apr 6, 2022

I have successfully simulated the IP leakage case and verified that this fix resolves it. See #5904 (comment)

@mikebrow
Member

mikebrow commented Apr 6, 2022

would you be ok with CNI setup and the create/start of the pause container running in parallel, if you were to add a wait check for the endpoint to exist?

@mikebrow Are you saying that if we made CNI and the pause container operations start at the same time, it would be alright on our end to just busy loop until an endpoint shows up (e.g. until CNI finishes)? I guess I'm asking whether you mean that in our shim we'd be fine looping for the existence of an ep

well yes.. not for a long period of time.. creating a pod is down to about 250ms on my old bare metal server assuming we are not resource constrained; moving to parallel will make that about 180ms, again assuming we are not resource constrained. A few fixes that went in brought pod start down from a couple of seconds for our default containerd/CNI setup/install. We could alternatively just serialize for runtimes that have a dependency on the network setup completing before the pause container is created/started..
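A sketch of that parallelization idea (using golang.org/x/sync/errgroup; setupPodNetwork and startPauseContainer are placeholders for the real steps in the wip commit above):

package sandbox

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// Placeholders for the real CNI setup and pause-container start.
func setupPodNetwork(ctx context.Context) error     { return nil }
func startPauseContainer(ctx context.Context) error { return nil }

// runParallel kicks off CNI setup and the pause container concurrently; if
// either fails, the shared context is canceled and the first error returns.
func runParallel(ctx context.Context) error {
	g, ctx := errgroup.WithContext(ctx)
	g.Go(func() error { return setupPodNetwork(ctx) })
	g.Go(func() error { return startPauseContainer(ctx) })
	return g.Wait()
}

Runtimes that need the network up before the pause container starts (the Windows UVM case above) would keep the two steps serialized, or gate the container start on an endpoint wait.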

@mikebrow
Member

mikebrow commented Apr 6, 2022

With this change, even though the pod network setup and teardown are moved down, our problem is still not resolved:

  1. The sandbox container metadata will still be deleted from the containerd metadata store in the defer;
  2. The sandbox is not added into the sandbox store, so will not be listed by kubelet, thus no cleanup.

We can probably add the sandbox into the sandbox store in unknown state if there is any sandbox startup problem (remove the defer cleanup), and rely on kubelet to stop and remove the sandbox afterwards. A similar approach can be used for the container creation as well.
The assumption is that sandbox stop should be idempotent.
One thing that could be missing with the approach above is selinux.ReleaseLabel(sandbox.ProcessLabel). I'm not sure why it is not included in sandbox stop today.

I see what I missed. I updated my change based on your idea. Working on reproducing this locally.

I think Lantao was pushing for an approach entirely different from moving the network setup down.. I think he's saying set the pod status to unknown.. that will/should cause the pod to be destroyed by kubelet with a stop and remove. Thus we should be able to remove most of the defer logic.

@dcantah
Member

dcantah commented Apr 6, 2022

We could alternatively just serialize for runtimes that have a dependency on the network setup completing before the pause container is created/started..

This doesn't seem like a bad idea either, but I'm not sure the approach currently in the PR is the final one.. With what's in the diff now, at least, I don't think this would break Windows. As long as the network is up and running before we start the task, we should be alright.

@qiutongs
Contributor Author

qiutongs commented Apr 8, 2022

I think Lantao was pushing for an approach entirely different from moving the network setup down.. I think he's saying set the pod status to unknown.. that will/should cause the pod to be destroyed by kubelet with a stop and remove. Thus we should be able to remove most of the defer logic.

I synced with Lantao offline. He agreed that we could make a minimal change now that still attempts cleanup. But if the cleanup fails, we rely on kubelet to call StopPodSandbox to clean up.

In my change, the sandbox object remains in the "unknown" state and is added to the store.

sandboxstore.Status{
	State: sandboxstore.StateUnknown,
},
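Expanded into a self-contained sketch (stand-in types, for illustration only; the real types live in pkg/cri/store/sandbox), the error path looks roughly like:

package cri

// Minimal stand-ins for containerd's sandbox store types.
type State int

const (
	StateReady State = iota
	StateNotReady
	StateUnknown
)

type Status struct{ State State }

type Sandbox struct {
	ID     string
	Status Status
}

type Store struct{ sandboxes map[string]Sandbox }

func NewStore() *Store { return &Store{sandboxes: map[string]Sandbox{}} }

func (s *Store) Add(sb Sandbox) { s.sandboxes[sb.ID] = sb }

// On a startup failure, keep the sandbox in the store in an unknown state
// instead of deleting it in a deferred cleanup. Kubelet will then list it and
// call StopPodSandbox/RemovePodSandbox, which performs the idempotent teardown.
func recordFailedStartup(store *Store, sb Sandbox) {
	sb.Status = Status{State: StateUnknown}
	store.Add(sb)
}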

@samuelkarp
Member

@qiutongs Would you mind fixing DCO on the commit here so the rest of the CI checks can run?

Member

@mikebrow mikebrow left a comment


LGTM (late) with the listed follow-ups.. I think progress is made.

@samuelkarp samuelkarp added cherry-picked/1.6.x PR commits are cherry-picked into release/1.6 branch and removed cherry-pick/1.6.x labels Oct 4, 2022
@estesp estesp added cherry-picked/1.5.x PR commits are cherry-picked into release/1.5 branch and removed cherry-pick/1.5.x labels Oct 28, 2022
@samuelkarp samuelkarp added the cherry-picked/sbserver Changes are backported to sbserver label Nov 23, 2022
