Prevent pods from defaulting to zero second grace periods #102025

Conversation
/assign @liggitt

This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.

/priority important-soon

Wow, we have e2e tests that are racing on the same pod name: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/102025/pull-kubernetes-e2e-kind-ipv6/1393342213345251328 These tests were force deleting pods, then starting a pod of the same name and execing into it, which wouldn't necessarily guarantee that the exec session was even going to the new pod (at least, historically it wouldn't). I will fix the e2e tests that assume that deletion like this is safe (it isn't).
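For the e2e fix mentioned above, the safe pattern is to confirm the old pod object is actually gone before reusing its name. Below is a minimal client-go sketch of that pattern; the helper name, package, and polling interval are illustrative, not the actual test change.

```go
// Sketch: wait until a deleted pod has really disappeared from the API
// before creating a new pod with the same name, so a later exec cannot
// accidentally target the old instance.
package e2eutil

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForPodGone polls until Get returns NotFound or the timeout expires.
func waitForPodGone(ctx context.Context, c kubernetes.Interface, ns, name string, timeout time.Duration) error {
	return wait.PollImmediate(500*time.Millisecond, timeout, func() (bool, error) {
		_, err := c.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil // the name is now safe to reuse
		}
		return false, nil // still present (or a transient error); keep polling
	})
}
```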
2bd6593 to 372e30e

/hold for reviews and prevent accidental merging :)

This implies we have a kubelet bug where some events are being lost / the worker is saturated, or a call is failing in containerd. Needs investigation. This is yet another reason why force delete in e2e hides SLO failures in underlying components - I would expect subsecond container kills from the time the delete is received, even if volume cleanup takes longer. Something very wrong / inadequate is happening in the code + environment. @rphillips can you help me find an owner to run down what would cause the kubelet to get this bogged down (might be the container runtime, hard to say)?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-triage-robot: Closed this PR.
/reopen

This is still relevant. I ran into this unexpected behavior of
@pohly: Reopened this PR.

@smarterclayton: The following tests failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-triage-robot: Closed this PR.

/reopen

@pohly: Reopened this PR.

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-triage-robot: Closed this PR.
A zero-second grace period on pods is a special value and is intended for exceptional use (to break the safety guarantees when a node is partitioned). When spec.terminationGracePeriodSeconds (which overrides the default of 30s) is zero, pods are instantly deleted without waiting for the kubelet to ensure the process is not running. The intent of this default was not to allow users to bypass that process (since the normal behavior of a pod is not to bypass kubelet shutdown), and the documentation was written as such. Users who wish to bypass the kubelet's participation must set gracePeriod to zero on the delete options call.
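For contrast with the spec field, here is a hedged client-go sketch of that intended "break glass" path, where the caller opts in explicitly on the delete call itself. The function name and the namespace/pod arguments are illustrative, not part of this PR.

```go
// Sketch: force delete is requested on the delete call, not in the pod spec.
// GracePeriodSeconds: 0 in DeleteOptions removes the pod record immediately,
// without waiting for the kubelet to confirm the processes have stopped.
package example

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func forceDeletePod(ctx context.Context, c kubernetes.Interface, ns, name string) error {
	zero := int64(0)
	return c.CoreV1().Pods(ns).Delete(ctx, name, metav1.DeleteOptions{
		GracePeriodSeconds: &zero,
	})
}
```

The kubectl equivalent is `kubectl delete pod <name> --grace-period=0 --force`.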
With this change, the value 1 second is substituted when users specify 0, resulting in the kubelet participating in the deletion process and only a minor delay in practice.
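A minimal sketch of that substitution, assuming it is applied wherever the pod spec value is read or defaulted; this is not necessarily the exact hook point or code in this PR.

```go
// Sketch: treat a requested 0s grace period in the pod spec as 1s so the
// kubelet still participates in termination; only the value 0 is rewritten.
package example

import corev1 "k8s.io/api/core/v1"

func adjustTerminationGracePeriod(spec *corev1.PodSpec) {
	if spec.TerminationGracePeriodSeconds != nil && *spec.TerminationGracePeriodSeconds == 0 {
		one := int64(1)
		spec.TerminationGracePeriodSeconds = &one
	}
}
```

Users who really do want to bypass the kubelet can still do so, but only through the delete-options path shown earlier.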
The controlling design for pod safety is https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/pod-safety.md and I believe we simply missed this during the early review and no one ever blew themselves up with this footgun. We did not intend to offer the footgun as a feature of defaulting as described here in the doc:
That matches the value we placed in the field godoc:
The intent was that the default is the minimum, that the kubelet would be involved in deletion, and that gracePeriod=0 on a delete operation was the only way a user could perform a force delete. Unfortunately, we didn't actually handle 0 specially at the time we implemented this, and it has remained ambiguous since then.
As this changes the observed behavior of the system (prioritizing safety and the documented behavior over the unsafe behavior), this is also an API change. Kube is consistent by default, and bypassing the kubelet is both a safety hazard (accidental consistency loss) and an operational hazard (deleting a large number of pods without the kubelet means that resources are still allotted to those workers and could take an arbitrary amount of time to clean up).
Discovered while doing a review of clusters in the wild - some workloads were setting this, thinking that it simply meant "as fast as possible".
/kind bug
/kind api-change
/sig node