Setup pod network after creating the sandbox container #5904

Conversation
|
Hi @qiutongs. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/assign @Random-Liu |
|
Build succeeded.
|
With this change, even though the pod network setup and teardown are moved down, our problem is still not resolved:
We can probably add the sandbox into the sandbox store in an unknown state if there is any sandbox startup problem (remove the defer cleanup), and rely on kubelet to stop and remove the sandbox afterwards. A similar approach can be used for container creation as well. The assumption is that sandbox stop should be idempotent. One thing that could be missing with the approach above is |
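A minimal sketch of that idea, with made-up types standing in for the CRI plugin's internals (Store, StateUnknown, and runPodSandbox here are illustrative, not containerd's actual API): on any startup failure the sandbox is kept in the store in an unknown state instead of being torn down inline, and kubelet's idempotent StopPodSandbox/RemovePodSandbox calls are relied on to clean it up.

```go
// Illustrative only: record a failed sandbox as "unknown" rather than
// cleaning it up inline; kubelet is expected to stop/remove it later.
package main

import (
	"errors"
	"fmt"
)

type State int

const (
	StateUnknown State = iota
	StateReady
)

type Sandbox struct {
	ID    string
	State State
}

type Store map[string]*Sandbox

// runPodSandbox runs the startup steps in order. On failure it keeps the
// sandbox in the store in StateUnknown instead of deferring a teardown,
// so a later (idempotent) StopPodSandbox can finish the cleanup.
func runPodSandbox(store Store, id string, steps []func() error) error {
	sb := &Sandbox{ID: id, State: StateUnknown}
	store[id] = sb
	for _, step := range steps {
		if err := step(); err != nil {
			return fmt.Errorf("sandbox %s left in unknown state: %w", id, err)
		}
	}
	sb.State = StateReady
	return nil
}

func main() {
	store := Store{}
	err := runPodSandbox(store, "sb-1", []func() error{
		func() error { return nil },                            // e.g. create sandbox container
		func() error { return errors.New("CNI setup failed") }, // e.g. pod network setup
	})
	fmt.Println(err, store["sb-1"].State == StateUnknown)
}
```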
|
@qiutongs note @Random-Liu's request for changes |
|
any update here based on the feedback @qiutongs? |
|
Sorry, I paused this. Now I am prioritizing the fix. Will update asap. |
That works. I was poking around at this at the end of last week.. for performance reasons I was thinking we could run the pause container and CNI setup in parallel (wip commit over here:). They both take about the same amount of time, thus running them in parallel will essentially make the pause container just a resource hit. Still have to resolve the idempotency issue. |
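A rough sketch of that parallel idea, under the assumption that the two steps are independent until the sandbox task is started; createPauseContainer and setupPodNetwork are hypothetical stand-ins, not the CRI plugin's real functions.

```go
// Illustrative sketch: run pause-container creation and CNI setup
// concurrently, and only start the sandbox task once both succeed.
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/errgroup"
)

func createPauseContainer(ctx context.Context, id string) error {
	time.Sleep(100 * time.Millisecond) // pretend work
	return nil
}

func setupPodNetwork(ctx context.Context, id string) error {
	time.Sleep(100 * time.Millisecond) // pretend work
	return nil
}

func runSandboxParallel(ctx context.Context, id string) error {
	g, gctx := errgroup.WithContext(ctx)
	g.Go(func() error { return createPauseContainer(gctx, id) })
	g.Go(func() error { return setupPodNetwork(gctx, id) })
	// If either half fails, the caller still has to tear down whatever the
	// other half created -- the idempotency problem mentioned above.
	return g.Wait()
}

func main() {
	start := time.Now()
	fmt.Println(runSandboxParallel(context.Background(), "sb-1"), time.Since(start))
}
```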
|
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset. |
|
I think this will break VM-isolated containers on Windows. We rely on the CNI running before the pause container is started, since CNI creates the HCN endpoint, and we look for the HCN endpoint when creating the UVM, which is done as part of the pause container creation. |
|
@dcantah can you take a look to confirm if this will affect the vm-isolated container work you are doing? |
Would you be ok with the CNI setup and the create/start of the pause container going on in parallel if you were to add a wait check for the endpoint to exist?
@mikebrow Are you saying that if we could make the CNI and pause container operations start at the same time, it would be alright on our end as long as we busy loop until an endpoint shows up (e.g. until CNI finishes)? I guess I'm asking whether you mean that, in our shim, we would be fine looping for the existence of an endpoint.
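One way to express that wait, sketched here with a hypothetical exists callback (on Windows it would wrap an HCN endpoint lookup in the shim; the poll interval and timeout are arbitrary assumptions):

```go
// Illustrative only: poll until the CNI-created endpoint shows up or the
// context times out.
package main

import (
	"context"
	"fmt"
	"time"
)

func waitForEndpoint(ctx context.Context, name string, exists func(string) bool) error {
	ticker := time.NewTicker(100 * time.Millisecond) // arbitrary poll interval
	defer ticker.Stop()
	for {
		if exists(name) {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("endpoint %q never appeared: %w", name, ctx.Err())
		case <-ticker.C:
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	// Simulate CNI finishing ~300ms after we start waiting.
	ready := time.Now().Add(300 * time.Millisecond)
	fmt.Println(waitForEndpoint(ctx, "pod-ep", func(string) bool { return time.Now().After(ready) }))
}
```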
I see what I missed. I updated my change based on your idea. |
|
Build succeeded.
|
I have successfully simulated the IP leakage case and verified this fix works for the case. See #5904 (comment) |
Well yes.. not for a long period of time. Creating a pod is down to about 250ms on my old bare metal server, assuming we are not resource constrained; moving to parallel would make that about 180ms, again assuming we are not resource constrained. A few fixes that went in brought pod start down from a couple of seconds for our default containerd/CNI setup/install. We could alternatively just serialize for runtimes that have a dependency on the network setup completing before the pause container is created/started..
I think Lantau was pushing for a different approach entirely from moving the network setup down.. I think he's saying to set the pod status to unknown.. that will/should cause the pod to be destroyed by kubelet with a stop and remove. Thus we should be able to remove most of the defer logic.
This doesn't seem like a bad idea either, but I'm not sure if the approach in the PR currently is the final one.. With what's in the diff now at least, I don't think this would break Windows. As long as the network is up and running before we start the task we should be alright. |
I synced with Lantau offline. He agreed that we could make minimal changes now, which still attempt the cleanup. But if the cleanup fails, we rely on kubelet to call StopPodSandbox to clean up. In my change, the |
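A sketch of that best-effort behavior, using illustrative function names rather than the actual code in this PR: the teardown error is logged instead of masking the original error, and kubelet's subsequent StopPodSandbox call (which must be idempotent) retries the cleanup.

```go
// Illustrative sketch of best-effort cleanup in RunPodSandbox.
package main

import (
	"errors"
	"fmt"
	"log"
)

func setupPodNetwork() error    { return nil }
func teardownPodNetwork() error { return errors.New("teardown failed (simulated)") }

// runPodSandbox attempts cleanup on failure, but if the cleanup itself
// fails it only logs and returns the original error; kubelet's later
// StopPodSandbox call is relied on to finish the teardown.
func runPodSandbox() (retErr error) {
	networkSetup := false
	defer func() {
		if retErr == nil || !networkSetup {
			return
		}
		if err := teardownPodNetwork(); err != nil {
			log.Printf("cleanup failed, deferring to StopPodSandbox: %v", err)
		}
	}()

	// ... create the sandbox container ...
	if err := setupPodNetwork(); err != nil {
		return err
	}
	networkSetup = true
	// ... start the sandbox container ...
	return errors.New("start failed (simulated)")
}

func main() { fmt.Println(runPodSandbox()) }
```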
|
@qiutongs Would you mind fixing DCO on the commit here so the rest of the CI checks can run? |
mikebrow left a comment
LGTM (late) with the listed followups.. I think progress is made.
Signed-off-by: Qiutong Song [email protected]
Overview
Fix issue #5768
Change the order so that the sandbox container is created first and the pod network is set up afterwards (rather than before), as in the sketch below:
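A minimal sketch of the reordering; the function names are illustrative stand-ins for the CRI plugin's real steps, not containerd's actual API.

```go
// Illustrative sketch of the new RunPodSandbox ordering.
package main

import (
	"context"
	"fmt"
)

func createSandboxContainer(ctx context.Context) error { return nil } // stub
func setupPodNetwork(ctx context.Context) error        { return nil } // stub
func startSandboxContainer(ctx context.Context) error  { return nil } // stub

func runPodSandbox(ctx context.Context) error {
	// Old order: setupPodNetwork -> createSandboxContainer -> start.
	// New order: create the container first, then set up the network,
	// then start the sandbox task.
	if err := createSandboxContainer(ctx); err != nil {
		return err
	}
	if err := setupPodNetwork(ctx); err != nil {
		return err // cleanup is best-effort; see the Testing notes below
	}
	return startSandboxContainer(ctx)
}

func main() { fmt.Println(runPodSandbox(context.Background())) }
```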
Testing
Local Verification
Simulate IP Leakage with Chained Plugin
I can simulate the IP leakage case by modifying the CNI config /etc/cni/net.d/10-containerd-net.conflist: add a dummy plugin that doesn't exist. Thus, RunPodSandbox fails with an error.
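For reference, the modified conflist would look roughly like the stock containerd example config with a nonexistent plugin appended; the subnet and plugin fields below are assumptions, and only the bogus last entry matters for the simulation.

```json
{
  "cniVersion": "0.4.0",
  "name": "containerd-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "ranges": [[{ "subnet": "10.88.0.0/16" }]],
        "routes": [{ "dst": "0.0.0.0/0" }]
      }
    },
    { "type": "portmap", "capabilities": { "portMappings": true } },
    { "type": "no-such-plugin" }
  ]
}
```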
Before the Fix
Kubelet keeps calling RunPodSandbox, and each time it reserves a new IP. None of the IPs are released. This can be observed with the command below.
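The exact command isn't preserved in this thread; assuming the default host-local IPAM state directory for the containerd-net network, something like the following shows the reserved IPs piling up:

```sh
# Hypothetical: host-local IPAM keeps one file per reserved IP here.
watch -n1 'ls /var/lib/cni/networks/containerd-net'
```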
After the Fix
Kubelet keeps calling StopPodSandbox after the initial RunPodSandbox. Even though all the StopPodSandbox calls also fail, the IP is not leaked (watching /var/lib/cni/networks/containerd-net as above). Pod event:
New Integration Test with Failpoint
Note: run sudo rm -rf /var/lib/containerd-test to clean up the state if the test fails. Otherwise, the tests will fail with "failed to reserve sandbox name" in subsequent runs.

New Integration Test

Before the fix, the test fails