-
Notifications
You must be signed in to change notification settings - Fork 42k
Prevent Kubelet from incorrectly interpreting "not yet started" pods as "ready to terminate pods" by unifying responsibility for pod lifecycle into pod worker #102344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9679e37 to
54021e0
Compare
|
/priority critical-urgent Race conditions in the kubelet mean that quickly deleted pods can leave dangling resources for significant amounts of time (although this is enough of a change that we shouldn't rush to merge it until we're confident it's both correct, safe, and improves testing overall) |
|
/triage accepted |
|
Doc is https://docs.google.com/document/d/1DvAmqp9CV8i4_zYvNdDWZW-V0FMk1NbZ8BAOvLRD3Ds/edit# (link in first comment is the RH internal draft) |
54021e0 to
e4d4b3b
Compare
* Bump the pod status and node status update timeouts to avoid flakes * Add a small delay after dbus restart to ensure dbus has enough time to restart to startup prior to sending shutdown signal * Change check of pod being terminated by graceful shutdown. Previously, the pod phase was checked to see if it was `Failed` and the pod reason string matched. This logic needs to change after 1.22 graceful node shutdown change introduced in PR kubernetes#102344 which changed behavior to no longer put the pods into a failed phase. Instead, the test now checks that containers are not ready, and the pod status message and reason are set appropriately. Signed-off-by: David Porter <[email protected]>
A number of race conditions exist when pods are terminated early in their lifecycle because components in the kubelet need to know "no running containers" or "containers can't be started from now on" but were relying on outdated state (they don't know whether the pod is setting up, tearing down, or can never be started again).
Only the pod worker knows whether containers are being started for a given pod, which is required to know when a pod is "terminated" (no running containers, none coming). Move that responsibility and podKiller function into the pod workers, and have everything that was killing the pod go into the UpdatePod loop. Have pod workers remain running until the pod is ready to be deleted in etcd (PodResourcesAreReclaimed would return true). Split syncPod into three phases - setup, terminate containers, and cleanup pod - and have transitions between those methods be visible to other components. After this change, to kill a pod you tell the pod worker to UpdatePod({UpdateType: SyncPodKill, Pod: pod}).
Several places in the kubelet were incorrect about whether they were handling terminating (should stop running, might have
containers) or terminated (no running containers) pods. The pod worker exposes methods that allow other loops to know when to set up or tear down resources based on the state of the pod - these methods remove the possibility of race conditions by ensuring a single component is responsible for knowing each pod's allowed state and other components
simply delegate to checking whether they are in the window by UID.
In addition, removing containers now no longer blocks final pod deletion in the API server and are handled as background cleanup.
See https://docs.google.com/document/d/1DvAmqp9CV8i4_zYvNdDWZW-V0FMk1NbZ8BAOvLRD3Ds/edit# for details
TODO:
What type of PR is this?
/kind bug
/kind flake
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: