Implement RestartAllContainers #134345
Conversation
Skipping CI for Draft Pull Request.
/cc marking for review
Force-pushed c6b7713 to 442a62a
Force-pushed 78eefbd to 3bb33af
/approve for api
/retest-required
// discovery and inference logic.
var AllFeatures = []nodedeclaredfeatures.Feature{
	inplacepodresize.Feature,
	restartallcontainers.Feature,
tallclair left a comment
Is there any way that the pod could transition out of a Failed / Succeeded state with this feature? I'm having a hard time reasoning through the code paths to convince myself that that's not the case.
// Kill and remove containers in reverse order. Source containers (which exited and triggered
// RestartAllContainers) are removed last.
Why does the order matter? Why are source containers separated out?
You resolved this, but didn't answer the question?
Before this change, was it possible for a pod to transition from the Running to Pending phase?
@tallclair it's impossible for a pod to transition out of a Failed / Succeeded state. This is enforced by the kubelet.generateAPIPodStatus function: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L1871-L1885
As for Running to Pending: I think there is no check / validation forbidding it, and it may happen if a sidecar gets restarted before the regular containers start running. However, I do think the pod should be kept in the Running state during the RestartAllContainers action. I did some refactoring and added unit tests to ensure the pod status stays Running during this restart.
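For reference, a minimal sketch of the terminal-phase guard being referenced, paraphrased from the linked generateAPIPodStatus code (names are simplified for illustration, this is not the verbatim kubelet implementation):

```go
package kubeletsketch

import v1 "k8s.io/api/core/v1"

// finalizePhase paraphrases the guard linked above: once a pod has been
// observed as Failed or Succeeded, any newly computed phase is forced back
// to that terminal value, so a pod can never leave a terminal phase,
// including during a RestartAllContainers action.
func finalizePhase(oldPhase, computedPhase v1.PodPhase) v1.PodPhase {
	if oldPhase == v1.PodFailed || oldPhase == v1.PodSucceeded {
		return oldPhase
	}
	return computedPhase
}
```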
		return
	}
}
if err := m.removeContainer(ctx, containerInfo.containerID.ID); err != nil {
removeContainer also removes the container logs. Is that desired? Normally logs aren't removed when a container restarts.
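To illustrate the distinction being drawn here, a sketch assuming the usual kubelet log layout in which each container attempt gets its own log file keyed by restart count; the function name and path layout are for illustration only, not the kubelet API:

```go
package logsketch

import (
	"fmt"
	"path/filepath"
)

// containerAttemptLogPath shows why log retention matters: a normal restart
// just adds a new attempt's log file, so earlier attempts' logs stay on disk,
// whereas removing the container deletes its log directory as well.
func containerAttemptLogPath(podLogDir, containerName string, restartCount int) string {
	return filepath.Join(podLogDir, containerName, fmt.Sprintf("%d.log", restartCount))
}
```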
removeContainer triggers the topology manager to remove the topology assignment for the container. In the case of pod-level resource managers, I think we want to make sure that this doesn't reset the pod-level topology assignment.
/cc @KevinTMtz @ffromani
Thanks for the ping. I will read the KEP and the code and incorporate this into the pod-level resource manager integration work.
Naive question because I'm not into this KEP's details: do we run admission again then? Because removing the topology assignment MAY mean that a pod which used to run with alignment guarantees loses them on restart, which seems undesirable. If this is real, we should probably add at least an admission check: if the topology manager is enforcing, a pod with the RestartAllContainers flag enabled (at a glance this seems to surface at the pod-spec level?) should not be admitted.
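A minimal sketch of the admission check suggested above, assuming "enforcing" means a restricted or single-numa-node topology policy and that usesRestartAllContainers can be derived from the pod spec; neither assumption reflects the actual kubelet admission plumbing or API surface:

```go
package admissionsketch

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// admitRestartAllContainers rejects the pod when an enforcing topology policy
// is active and the pod opts into RestartAllContainers, so that a pod cannot
// silently lose its alignment guarantees when all containers are torn down
// and recreated.
func admitRestartAllContainers(pod *v1.Pod, topologyPolicyIsEnforcing, usesRestartAllContainers bool) error {
	if topologyPolicyIsEnforcing && usesRestartAllContainers {
		return fmt.Errorf("pod %s/%s: RestartAllContainers is not allowed with an enforcing topology manager policy", pod.Namespace, pod.Name)
	}
	return nil
}
```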
RestartAllContainers does not seem to affect the overall pod; only its status might become Pending, which is not a concern (the CPU and Memory managers' reconcileState treats a pending pod as active). Regarding clearing the Topology manager's state, it appears not to produce any secondary effects (the CPU/Memory managers are the effective source of truth), and it seems right to clean up the container ID mappings.
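A sketch of the "pending counts as active" point above; this is an illustration of the claim, with a hypothetical helper name, not the actual reconcileState implementation:

```go
package resourcemanagersketch

import v1 "k8s.io/api/core/v1"

// podIsActiveForReconcile treats any non-terminal pod as active, so a pod
// that temporarily drops back to Pending during RestartAllContainers keeps
// its CPU/Memory assignments rather than having them garbage-collected.
func podIsActiveForReconcile(pod *v1.Pod) bool {
	return pod.Status.Phase != v1.PodSucceeded && pod.Status.Phase != v1.PodFailed
}
```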
As an action item, RestartAllContainers should add a test verifying that the restart flow does not affect the resource managers' functionality (the overall CPU and Memory manager features, not anything specific to PodLevelResourceManagers).
I created the following document with my findings and the discussion that I had with @yuanwang04, @ndixita and @SergeyKanzhelev: PodLevelResourceManagers & RestartAllContainers.
Any work that has to be done to address any situation created between PodLevelResourceManagers and other features can be followed in #136481.
It might be worth adding a note to the RestartAllContainers KEP with the reasoning behind removing in-memory topology manager state when a container is removed @yuanwang04
And just to make sure that we are not missing any critical details: by any chance, @ffromani @swatisehgal, are you aware of why the in-memory topology manager state is retained for containers after the topology manager is done making the affinity decisions? We could have very well cleared the memory used to store the topology manager state.
Adding a reference to a comment relevant to this discussion from the Pod Level Resource Managers KEP PR: kubernetes/enhancements#5775 (comment).
@tallclair: GitHub didn't allow me to request PR reviews from the following users: KevinTMtz. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
logger.V(3).Info("Removing container before pod restarts", "containerName", cName, "containerID", containerInfo.containerID, "pod", klog.KObj(pod))
removeContainerResult := kubecontainer.NewSyncResult(kubecontainer.RemoveContainer, cName)
result.AddSyncResult(removeContainerResult)
if containerInfo.kill {
I noticed that killContainersWithSyncResult kills the containers in parallel. Would that be preferred here?
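For illustration, a minimal sketch of the fan-out pattern being referenced; killOne is a hypothetical per-container kill function, and this is not the actual killContainersWithSyncResult code:

```go
package killsketch

import "sync"

// parallelKill starts one goroutine per container and waits for all of them,
// collecting any errors, mirroring the parallel-kill pattern mentioned above.
func parallelKill(containerIDs []string, killOne func(id string) error) []error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for _, id := range containerIDs {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			if err := killOne(id); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(id)
	}
	wg.Wait()
	return errs
}
```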
/hold Let's close on final comments and merge. @yuanwang04 please unhold at will whenever there is an lgtm.
LGTM label has been added. Git tree hash: 12ea3a573e06c086e46571468ab1867d0c9a23a5
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: SergeyKanzhelev, tallclair, thockin, yuanwang04. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/unhold
/lgtm
LGTM label has been added. Git tree hash: 06922a6b32b3b49288f8e3cfc497591609c98401
@yuanwang04: The following test failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/retest-required
Looks like this is failing in the serial lanes.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Implement RestartAllContainers KEP: kubernetes/enhancements#5532
Which issue(s) this PR is related to:
KEP: kubernetes/enhancements#5532
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: