Prevent pods from defaulting to zero second grace periods #102025

Conversation
/assign @liggitt

This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.

/priority important-soon

Wow, we have e2e tests that are racing on the same pod name: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/102025/pull-kubernetes-e2e-kind-ipv6/1393342213345251328 These tests were force deleting pods, then starting a pod of the same name and execing into it, which wouldn't necessarily guarantee that the exec session was even going to the new pod (at least, historically it wouldn't). I will fix the e2e tests that assume that deletion like this is safe (it isn't).
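For the e2e fix mentioned above, the safe pattern is to confirm the old pod object is actually gone before reusing its name. Below is a minimal client-go sketch of that pattern; the helper name, package, and polling interval are illustrative, not the actual test change.

```go
// Sketch: wait until a deleted pod has really disappeared from the API
// before creating a new pod with the same name, so a later exec cannot
// accidentally target the old instance.
package e2eutil

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForPodGone polls until Get returns NotFound or the timeout expires.
func waitForPodGone(ctx context.Context, c kubernetes.Interface, ns, name string, timeout time.Duration) error {
	return wait.PollImmediate(500*time.Millisecond, timeout, func() (bool, error) {
		_, err := c.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil // the name is now safe to reuse
		}
		return false, nil // still present (or a transient error); keep polling
	})
}
```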
2bd6593 to 372e30e

/hold for reviews and prevent accidental merging :)

This implies we have a kubelet bug where some events are being lost / the worker is saturated, or a call is failing in containerd. Needs investigation. This is yet another reason why force delete in e2e hides SLO failures in underlying components - I would expect subsecond container kills from the time the delete is received, even if volume cleanup takes longer. Something very wrong / inadequate is happening in the code + environment. @rphillips can you help me find an owner to run down what would cause the kubelet to get this bogged down (might be the container runtime, hard to say)?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-triage-robot: Closed this PR.
/reopen

This is still relevant. I ran into this unexpected behavior of
@pohly: Reopened this PR.

@smarterclayton: The following tests failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-triage-robot: Closed this PR.

/reopen

@pohly: Reopened this PR.

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-triage-robot: Closed this PR.
A zero-second grace period on pods is a special value and is intended for exceptional use (to break the safety guarantees when a node is partitioned). When spec.terminationGracePeriodSeconds (which overrides the default of 30s) is zero, pods are instantly deleted without waiting for the kubelet to ensure the process is not running. The intent of this default was not to allow users to bypass that process (since the normal behavior of a pod is not to bypass kubelet shutdown), and the documentation was written as such. Users who wish to bypass the kubelet's participation must set gracePeriod to zero on the delete options call.
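For contrast with the spec field, here is a hedged client-go sketch of that intended "break glass" path, where the caller opts in explicitly on the delete call itself. The function name and the namespace/pod arguments are illustrative, not part of this PR.

```go
// Sketch: force delete is requested on the delete call, not in the pod spec.
// GracePeriodSeconds: 0 in DeleteOptions removes the pod record immediately,
// without waiting for the kubelet to confirm the processes have stopped.
package example

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func forceDeletePod(ctx context.Context, c kubernetes.Interface, ns, name string) error {
	zero := int64(0)
	return c.CoreV1().Pods(ns).Delete(ctx, name, metav1.DeleteOptions{
		GracePeriodSeconds: &zero,
	})
}
```

The kubectl equivalent is `kubectl delete pod <name> --grace-period=0 --force`.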
With this change, the value 1 second is substituted when users specify 0, resulting in the kubelet participating in the deletion process and only a minor delay in practice.
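A minimal sketch of that substitution, assuming it is applied wherever the pod spec value is read or defaulted; this is not necessarily the exact hook point or code in this PR.

```go
// Sketch: treat a requested 0s grace period in the pod spec as 1s so the
// kubelet still participates in termination; only the value 0 is rewritten.
package example

import corev1 "k8s.io/api/core/v1"

func adjustTerminationGracePeriod(spec *corev1.PodSpec) {
	if spec.TerminationGracePeriodSeconds != nil && *spec.TerminationGracePeriodSeconds == 0 {
		one := int64(1)
		spec.TerminationGracePeriodSeconds = &one
	}
}
```

Users who really do want to bypass the kubelet can still do so, but only through the delete-options path shown earlier.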
The controlling design for pod safety is https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/pod-safety.md and I believe we simply missed this during the early review and no one ever blew themselves up with this footgun. We did not intend to offer the footgun as a feature of defaulting as described here in the doc:
That matches the value we placed in the field godoc:
The intent was that the default is the minimum, that the kubelet would be involved in deletion, and that gracePeriod=0 on a delete operation was the only way a user could perform a force delete. Unfortunately, we didn't actually handle 0 specially at the time we implemented this, and it has remained ambiguous since then.
As this changes the observed behavior of the system (prioritizing safety and the documented behavior over the unsafe behavior), this is also an API change. Kube is consistent by default, and bypassing the kubelet is both a safety hazard (accidental consistency loss) and an operational hazard (deleting a large number of pods without the kubelet means that resources are still allotted to those workers and could take an arbitrary amount of time to clean up).
Discovered while doing a review of clusters in the wild - some workloads were setting this, thinking that it simply meant "as fast as possible".
/kind bug
/kind api-change
/sig node