Allowed shortened grace period for pods in Kubelet #98507

wzshiming · 2021-01-28T03:58:55Z

What type of PR is this?

/kind bug
/sig node

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #83916
Fixes #84298
Fixes #87039
Fixes #88613
xref #98506
Fixes #100695

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Allowed shortened grace period for pods in Kubelet

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

k8s-ci-robot · 2021-01-28T03:59:02Z

@wzshiming: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

smarterclayton · 2021-01-28T04:39:22Z

I’d expect a test case to show the problem failing before (or an e2e if it’s simple to do via delete override).

jqmichael

I think the issue isn't in the kubelet code, but on the server side.

	if objectMeta.GetDeletionTimestamp() != nil {
		// if we are already being deleted, we may only shorten the deletion grace period
		// this means the object was gracefully deleted previously but deletionGracePeriodSeconds was not set,
		// so we force deletion immediately
		// IMPORTANT:
		// The deletion operation happens in two phases.
		// 1. Update to set DeletionGracePeriodSeconds and DeletionTimestamp
		// 2. Delete the object from storage.
		// If the update succeeds, but the delete fails (network error, internal storage error, etc.),
		// a resource was previously left in a state that was non-recoverable.  We
		// check if the existing stored resource has a grace period as 0 and if so
		// attempt to delete immediately in order to recover from this scenario.
		if objectMeta.GetDeletionGracePeriodSeconds() == nil || *objectMeta.GetDeletionGracePeriodSeconds() == 0 {
			return false, false, nil
		}
		// only a shorter grace period may be provided by a user
		if options.GracePeriodSeconds != nil {
			period := int64(*options.GracePeriodSeconds)
			if period >= *objectMeta.GetDeletionGracePeriodSeconds() {
				return false, true, nil
			}

https://github.com/kubernetes/apiserver/blame/d4c9a195921609cf81e3e950beaf246f934e0f4c/pkg/registry/rest/delete.go#L96-L110

As the comment suggests, in phase 1 DeletionGracePeriodSeconds was set to a negative value (meaning DeletionTimestamp was in the past). In phase 2, options.GracePeriodSeconds has to be set even lower; otherwise the request is treated as a pending graceful deletion and the immediate delete is not performed.
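To make the comparison concrete, here is a minimal sketch that models only the two branches visible in the excerpt above. The names classify, outcome, and ptr are hypothetical and exist only for illustration; this is not the apiserver function, and anything the excerpt does not show is reported as such rather than guessed.

```go
package main

import "fmt"

// outcome labels the branches that are visible in the quoted excerpt.
type outcome string

const (
	deleteImmediately outcome = "delete immediately"
	pendingGraceful   outcome = "graceful deletion pending"
	notShown          outcome = "handled by code outside the excerpt"
)

// classify is a hypothetical model of the excerpt for an object whose
// DeletionTimestamp is already set. stored stands in for
// objectMeta.GetDeletionGracePeriodSeconds() and requested for
// options.GracePeriodSeconds.
func classify(stored, requested *int64) outcome {
	// A nil or zero stored grace period forces immediate deletion.
	if stored == nil || *stored == 0 {
		return deleteImmediately
	}
	// Only a strictly shorter grace period gets past this comparison.
	if requested != nil && *requested >= *stored {
		return pendingGraceful
	}
	return notShown
}

// ptr is a small helper that returns a pointer to v.
func ptr(v int64) *int64 { return &v }

func main() {
	// Stored -1, requested 0: 0 >= -1, so the delete stays pending.
	fmt.Println(classify(ptr(-1), ptr(0)))
	// Stored -1, requested -2: only an even lower value gets past the check.
	fmt.Println(classify(ptr(-1), ptr(-2)))
	// Stored 0: immediate deletion regardless of the request.
	fmt.Println(classify(ptr(0), ptr(0)))
}
```

With a stored value of -1, a force delete that sends 0 is classified as pending, which matches the stuck-pod behavior described here.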

I think there are two possible fixes.

1. Do not allow DeletionTimestamp to be set in the past (DeletionGracePeriodSeconds should be non-negative). In that case, we could convert options.GracePeriodSeconds to 0 if it is negative.
2. Accept that DeletionTimestamp could be set in the past. In that case, in phase 2, we treat a negative value of objectMeta.GetDeletionGracePeriodSeconds() the same as 0 and allow immediate deletion.
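A rough sketch of what each option could look like, written as standalone helpers rather than as a patch against delete.go. The function names clampGracePeriod and deleteNow are hypothetical, and where such logic would actually live in the apiserver is an assumption left open here.

```go
package main

import "fmt"

// clampGracePeriod sketches option 1: a negative requested grace period is
// converted to 0, so a negative value is never stored on the object.
func clampGracePeriod(requested int64) int64 {
	if requested < 0 {
		return 0
	}
	return requested
}

// deleteNow sketches option 2: a stored grace period that is nil, zero, or
// negative is treated as "delete immediately" in phase 2.
func deleteNow(stored *int64) bool {
	return stored == nil || *stored <= 0
}

func main() {
	neg := int64(-1)
	fmt.Println(clampGracePeriod(neg)) // option 1: -1 becomes 0
	fmt.Println(deleteNow(&neg))       // option 2: -1 is handled like 0
}
```

Option 1 prevents new objects from getting into the bad state, while option 2 also recovers objects that already carry a negative value, so the two are not mutually exclusive.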

Thoughts?

/CC: @smarterclayton @liggitt @lavalamp @deads2k

pkg/kubelet/kuberuntime/kuberuntime_container.go

lavalamp · 2021-01-28T19:09:11Z

DeletionGracePeriodSeconds should not accept negative values, how did that happen?

ayberk · 2021-01-28T19:30:24Z

@lavalamp We're still trying to figure it out, but the current hypothesis is that a third-party node termination handler manually set it to -1. There are no guards against negative values from what I can tell.

wzshiming · 2021-11-24T07:23:48Z

Hi @smarterclayton

I tested with the latest version; the problem persists and has gotten worse. Previously, the Pod could at least be deleted with --force. Now, even after a forced deletion, the Pod is still in the CRI.
It appears that syncTerminatingPod does not handle Context. Can we work around this problem first, to reduce the Pod leakage it may cause?

adisky · 2021-12-01T05:30:24Z

/cc

dims · 2022-01-05T13:44:50Z

/milestone v1.24

jyotimahapatra · 2022-02-09T03:05:32Z

Hi 👋 I'm checking in from the bug triage team for release 1.24. Is this PR targeted for release 1.24?

wzshiming · 2022-02-09T08:12:00Z

@jyotimahapatra
Yes
Since PR #102344 was merged, the problem exists in both 1.22 and 1.23: a Pod that is being deleted cannot be force-deleted. In earlier versions, the grace period could not be shortened, but the Pod could still be force-deleted.

dims · 2022-03-28T00:25:41Z

@ehashman @SergeyKanzhelev do we need this for v1.24?

ehashman · 2022-03-28T17:46:07Z

I don't think we've made enough progress on this to merge it by tomorrow.

/milestone clear

k8s-triage-robot · 2022-06-26T18:33:26Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

pacoxu · 2022-06-30T06:32:54Z

/remove-lifecycle stale
Does this fix #109352 as well?

k8s-triage-robot · 2022-09-28T06:45:37Z

/lifecycle stale

k8s-triage-robot · 2022-10-28T07:08:16Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot · 2022-11-05T18:34:39Z

@wzshiming: PR needs rebase.

k8s-triage-robot · 2022-12-10T03:03:23Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Reopen this PR with /reopen
Mark this PR as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot · 2022-12-10T03:03:27Z

@k8s-triage-robot: Closed this PR.

k8s-ci-robot requested review from SergeyKanzhelev and mtaufen January 28, 2021 03:59

wzshiming changed the title ~~Fix grace period override~~ [WIP] Fix grace period override Jan 28, 2021

wzshiming force-pushed the fix-grace-period-override branch 2 times, most recently from ae72c97 to eeff551 Compare January 28, 2021 15:40

wzshiming force-pushed the fix-grace-period-override branch from eeff551 to 80b6cd9 Compare January 28, 2021 15:50

jqmichael reviewed Jan 28, 2021

pkg/kubelet/kuberuntime/kuberuntime_container.go (outdated)

k8s-ci-robot removed the needs-rebase label Nov 24, 2021

k8s-ci-robot requested a review from adisky December 1, 2021 05:30

k8s-ci-robot added this to the v1.24 milestone Jan 5, 2022

wzshiming requested a review from smarterclayton February 9, 2022 08:14

pacoxu mentioned this pull request Mar 19, 2022

Pod is removed from store but the containers are not terminated #88613

Closed

k8s-ci-robot removed this from the v1.24 milestone Mar 28, 2022

ehashman mentioned this pull request Apr 12, 2022

should not override KillPodOptions.PodTerminationGracePeriodSecondsOverride by pod spec.TerminationGracePeriodSeconds #109412

Closed

k8s-ci-robot added the lifecycle/stale label Jun 26, 2022

k8s-ci-robot removed the lifecycle/stale label Jun 30, 2022

k8s-ci-robot added the lifecycle/stale label Sep 28, 2022

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Oct 28, 2022

k8s-ci-robot added the needs-rebase label Nov 5, 2022

pacoxu mentioned this pull request Nov 8, 2022

GracePeriod elegant delete can only be set once #113712

Open

k8s-ci-robot closed this Dec 10, 2022

pacoxu mentioned this pull request Jan 5, 2023

The second time the pod is deleted the grace period does not take effect #113883

Closed
