
@smarterclayton
Contributor

@smarterclayton smarterclayton commented May 14, 2021

A zero-second grace period on pods is a special value intended for exceptional use (to break the safety guarantees when a node is partitioned). When `spec.terminationGracePeriodSeconds` (which overrides the default of 30s) is zero, pods are instantly deleted without waiting for the kubelet to ensure the process is no longer running. The intent of this defaulting behavior was never to allow users to bypass that process (the normal behavior of a pod is not to bypass kubelet shutdown), and the documentation was written as such. Users who wish to bypass the kubelet's participation must set gracePeriod to zero on the delete options call.
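
For reference, the explicit force-delete path looks roughly like the following with client-go; the namespace and pod name are placeholders, and this is an illustrative sketch rather than code from this PR:

```go
// Illustrative sketch of the intended force-delete path: pass
// gracePeriodSeconds=0 in the *delete options*, rather than setting
// spec.terminationGracePeriodSeconds=0 on the pod itself.
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func forceDeletePod(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	zero := int64(0)
	// GracePeriodSeconds=0 here is the explicit, opt-in force delete: the API
	// server removes the pod object without waiting for the kubelet.
	return client.CoreV1().Pods(namespace).Delete(ctx, name, metav1.DeleteOptions{
		GracePeriodSeconds: &zero,
	})
}

func main() {
	// Placeholder setup: load the default kubeconfig and force delete an example pod.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	if err := forceDeletePod(context.Background(), client, "default", "example-pod"); err != nil {
		panic(err)
	}
}
```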

With this change, a value of 1 second is substituted when users specify 0, so the kubelet participates in the deletion process and only a minor delay is observed in practice.
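
A minimal sketch of that substitution, using a hypothetical helper (this is not the actual diff in this PR):

```go
// Hypothetical helper illustrating the behavior described above: a pod-level
// value of 0 is treated as 1 so the kubelet still participates in shutdown.
func effectiveTerminationGracePeriodSeconds(specSeconds *int64) int64 {
	const defaultGracePeriodSeconds = 30
	if specSeconds == nil {
		return defaultGracePeriodSeconds // API default when the field is unset
	}
	if *specSeconds == 0 {
		return 1 // 0 remains reserved for explicit force delete via delete options
	}
	return *specSeconds
}
```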

The controlling design for pod safety is https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/pod-safety.md. I believe we simply missed this during the early review, and no one ever blew themselves up with this footgun. We did not intend to offer the footgun as a feature of defaulting, as the doc describes:

A user may specify an interval on the pod called the termination grace period that defines the minimum amount of time the pod will have to complete the termination phase, and all components will honor this interval.

That matches the wording we placed in the field godoc:

The grace period is the duration in seconds after the processes running in the pod are sent a termination signal and the time when the processes are forcibly halted with a kill signal.

The intent was that the default is the minimum, that the kubelet would be involved in deletion, and that gracePeriod=0 on a delete operation was the only way a user could perform a force delete. Unfortunately, we didn't actually handle 0 specially when we implemented this, and it has remained ambiguous since then.

As this changes the observed behavior of the system (prioritizing safety and the documented behavior over the unsafe behavior), this is also an API change. Kube is consistent by default, and bypassing the kubelet is both a safety hazard (accidental consistency loss) and an operational hazard (deleting a large number of pods without the kubelet means that resources are still allotted to those workloads and could take an arbitrary amount of time to clean up).

Discovered while doing a review of clusters in the wild: some workloads were setting this, thinking it simply meant "as fast as possible".

/kind bug
/kind api-change
/sig node

The pod `spec.terminationGracePeriodSeconds` field allowed a zero value to be copied directly to the pod during deletion, causing the pod to bypass kubelet shutdown (aka force deletion). This was unintentional - force deletion was only meant to be possible on an explicit delete with the gracePeriod field provided in delete options. Pods with `spec.terminationGracePeriodSeconds=0` are now interpreted the same as `spec.terminationGracePeriodSeconds=1` and will no longer be deleted without the kubelet having a chance to clean up the pod. The kubelet will shut the pod down as fast as it can, and force deleting a pod is now only possible by invoking delete or delete collection on pods with gracePeriod=0 set in the delete options.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 14, 2021
@k8s-ci-robot k8s-ci-robot requested review from deads2k and justinsb May 14, 2021 23:08
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 14, 2021
@smarterclayton
Contributor Author

/assign @liggitt
/assign @derekwaynecarr

@fejta-bot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@smarterclayton
Contributor Author

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 14, 2021
@smarterclayton
Contributor Author

Wow, we have e2e tests that are racing on the same pod name: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/102025/pull-kubernetes-e2e-kind-ipv6/1393342213345251328



Kubernetes e2e suite: [sig-network] Services should implement service.kubernetes.io/service-proxy-name (1m31s)

/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/network/service.go:1883 May 14 23:32:30.493: Unexpected error:
    <*errors.StatusError | 0xc0008e59a0>: {
        ErrStatus: {
            TypeMeta: {Kind: "", APIVersion: ""},
            ListMeta: {
                SelfLink: "",
                ResourceVersion: "",
                Continue: "",
                RemainingItemCount: nil,
            },
            Status: "Failure",
            Message: "object is being deleted: pods \"verify-service-down-host-exec-pod\" already exists",
            Reason: "AlreadyExists",
            Details: {
                Name: "verify-service-down-host-exec-pod",
                Group: "",
                Kind: "pods",
                UID: "",
                Causes: nil,
                RetryAfterSeconds: 0,
            },
            Code: 409,
        },
    }
    object is being deleted: pods "verify-service-down-host-exec-pod" already exists occurred
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/network/service.go:2705

These tests were force deleting pods, then starting a pod of the same name and execing into it, which wouldn't necessarily guarantee that an exec session was even going to the new pod (at least, historically it wouldn't). I will fix the e2e tests that assume that deletion like this is safe (it isn't).
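
One way such tests can avoid the race is to wait until the previous pod object is actually gone before creating a replacement with the same name; a rough client-go sketch under that assumption (the package, helper name, and timeouts are illustrative):

```go
// Illustrative sketch: poll until a deleted pod has actually disappeared
// before reusing its name, instead of relying on force deletion to make the
// name immediately available.
package e2eutil

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

func waitForPodGone(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	return wait.PollImmediate(time.Second, 2*time.Minute, func() (bool, error) {
		_, err := client.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil // pod object is fully deleted; safe to reuse the name
		}
		if err != nil {
			return false, err
		}
		return false, nil // pod still exists (possibly terminating); keep waiting
	})
}
```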

@k8s-ci-robot k8s-ci-robot added area/test sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 15, 2021
@dims
Member

dims commented May 15, 2021

/hold

for reviews and to prevent accidental merging :)

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 15, 2021
@smarterclayton
Contributor Author

smarterclayton commented May 15, 2021

Kubernetes e2e suite: [sig-api-machinery] Garbage collector should keep the rc around until all its pods are deleted if the deleteOptions says so in https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/102025/pull-kubernetes-e2e-kind-ipv6/1393391901620572160 is broken because it's taking over 1m30s to delete 10 pods that have a one-second termination grace period. That's a very long time, and it points to either kubelet teardown issues (several of which exist, and I suspect there may be undiagnosed containerd issues) or another issue with QPS. Looking at the pod remaining on the first worker:

May 15 02:42:53 kind-worker kubelet[251]: I0515 02:42:53.451499     251 kubelet.go:1948] "SyncLoop DELETE" source="api" pods=[gc-4860/simpletest.rc-2mlnf]
...
May 15 02:43:11 kind-worker kubelet[251]: I0515 02:43:11.814953     251 kubelet_pods.go:967] "Pod is terminated, but some containers are still running" pod="gc-4860/simpletest.rc-2mlnf"
...
May 15 02:44:16 kind-worker kubelet[251]: I0515 02:44:16.382949     251 kuberuntime_container.go:714] "Killing container with a grace period override" pod="gc-4860/simpletest.rc-2mlnf" podUID=ae1207d1-5e23-4844-aff1-8e5a6ee2b160 containerName="nginx" containerID="containerd://5aa9bb1fe965bfc14eb5f8ddeab3f0a22362c795b4b9ebba3656d8d9a91a0d57" gracePeriod=2
...
# this takes FAR too long - what is going on
...
May 15 02:44:16 kind-worker kubelet[251]: I0515 02:44:16.383385     251 event.go:291] "Event occurred" object="gc-4860/simpletest.rc-2mlnf" kind="Pod" apiVersion="v1" type="Normal" reason="Killing" message="Stopping container nginx"
May 15 02:44:16 kind-worker kubelet[251]: I0515 02:44:16.532080     251 kuberuntime_container.go:722] "Container exited normally" pod="gc-4860/simpletest.rc-2mlnf" podUID=ae1207d1-5e23-4844-aff1-8e5a6ee2b160 containerName="nginx" containerID="containerd://5aa9bb1fe965bfc14eb5f8ddeab3f0a22362c795b4b9ebba3656d8d9a91a0d57"

This implies we have a kubelet bug where some events are being lost or the worker is saturated, or a call is failing in containerd. Needs investigation. This is yet another reason why force delete in e2e hides SLO failures in underlying components: I would expect subsecond container kills from the time the delete is received, even if volume cleanup takes longer.

Something very wrong or inadequate is happening in the code and/or environment. @rphillips can you help me find an owner to run down what would cause the kubelet to get this bogged down (it might be the container runtime, hard to say)?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 13, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 13, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pohly
Contributor

pohly commented Sep 22, 2022

/reopen

This is still relevant. I ran into this unexpected behavior of TerminationGracePeriodSeconds=0 recently and created a doc update to warn about it (#112564), but personally I would prefer to remove this obviously unintended footgun - i.e., let's revive this PR and merge that instead of my doc update.

@k8s-ci-robot k8s-ci-robot reopened this Sep 22, 2022
@k8s-ci-robot
Contributor

@pohly: Reopened this PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@linux-foundation-easycla

linux-foundation-easycla bot commented Sep 22, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 22, 2022
@k8s-ci-robot
Contributor

@smarterclayton: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-e2e-gce-storage-slow | f5f0066d5a7622fa5f8cfd03de3ef9d5943b4525 | link | | /test pull-kubernetes-e2e-gce-storage-slow |
| pull-kubernetes-node-kubelet-serial-containerd | c5d320c94bc376bc04e0252578f2793490e07bdd | link | | /test pull-kubernetes-node-kubelet-serial-containerd |
| pull-kubernetes-node-kubelet-serial | c5d320c94bc376bc04e0252578f2793490e07bdd | link | | /test pull-kubernetes-node-kubelet-serial |
| pull-kubernetes-e2e-gci-gce-ingress | 61aea54 | link | false | /test pull-kubernetes-e2e-gci-gce-ingress |
| pull-kubernetes-e2e-ubuntu-gce-network-policies | 61aea54 | link | false | /test pull-kubernetes-e2e-ubuntu-gce-network-policies |
| pull-kubernetes-e2e-gce-ubuntu-containerd | 61aea54 | link | true | /test pull-kubernetes-e2e-gce-ubuntu-containerd |
| pull-kubernetes-e2e-gci-gce-ipvs | 61aea54 | link | false | /test pull-kubernetes-e2e-gci-gce-ipvs |
| pull-kubernetes-integration | 61aea54 | link | true | /test pull-kubernetes-integration |
| pull-kubernetes-conformance-kind-ipv6-parallel | 61aea54 | link | false | /test pull-kubernetes-conformance-kind-ipv6-parallel |
| pull-kubernetes-dependencies | 61aea54 | link | true | /test pull-kubernetes-dependencies |
| pull-kubernetes-node-e2e-containerd | 61aea54 | link | true | /test pull-kubernetes-node-e2e-containerd |
| pull-kubernetes-unit | 61aea54 | link | true | /test pull-kubernetes-unit |
| pull-kubernetes-e2e-kind-ipv6 | 61aea54 | link | true | /test pull-kubernetes-e2e-kind-ipv6 |
| pull-kubernetes-verify-govet-levee | 61aea54 | link | true | /test pull-kubernetes-verify-govet-levee |
| pull-kubernetes-conformance-kind-ga-only-parallel | 61aea54 | link | true | /test pull-kubernetes-conformance-kind-ga-only-parallel |
| pull-kubernetes-verify | 61aea54 | link | true | /test pull-kubernetes-verify |
| pull-kubernetes-e2e-gce-100-performance | 61aea54 | link | true | /test pull-kubernetes-e2e-gce-100-performance |
| pull-kubernetes-e2e-kind | 61aea54 | link | true | /test pull-kubernetes-e2e-kind |
| pull-kubernetes-typecheck | 61aea54 | link | true | /test pull-kubernetes-typecheck |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pohly
Contributor

pohly commented Jun 1, 2023

/reopen
/assign

@k8s-ci-robot
Contributor

@pohly: Reopened this PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jun 1, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

