Allowed shortened grace period for pods in Kubelet #98507
Conversation
|
@wzshiming: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
I’d expect a test case that shows the problem failing before the fix (or an e2e test, if that is simple to do via a delete override). |
I think the issue isn't in the kubelet code, but on the server side.
if objectMeta.GetDeletionTimestamp() != nil {
// if we are already being deleted, we may only shorten the deletion grace period
// this means the object was gracefully deleted previously but deletionGracePeriodSeconds was not set,
// so we force deletion immediately
// IMPORTANT:
// The deletion operation happens in two phases.
// 1. Update to set DeletionGracePeriodSeconds and DeletionTimestamp
// 2. Delete the object from storage.
// If the update succeeds, but the delete fails (network error, internal storage error, etc.),
// a resource was previously left in a state that was non-recoverable. We
// check if the existing stored resource has a grace period as 0 and if so
// attempt to delete immediately in order to recover from this scenario.
if objectMeta.GetDeletionGracePeriodSeconds() == nil || *objectMeta.GetDeletionGracePeriodSeconds() == 0 {
return false, false, nil
}
// only a shorter grace period may be provided by a user
if options.GracePeriodSeconds != nil {
period := int64(*options.GracePeriodSeconds)
if period >= *objectMeta.GetDeletionGracePeriodSeconds() {
return false, true, nil
}
As the comment suggests, in phase 1 the DeletionGracePeriodSeconds was set to a negative value (meaning the DeletionTimestamp was in the past), so in phase 2 options.GracePeriodSeconds has to be set even lower; otherwise the request is considered a pending graceful deletion and the immediate delete is never performed.
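To make the failure mode concrete, here is a minimal, self-contained illustration (not code from this PR) of the comparison quoted above when the stored grace period is negative:

package main

import "fmt"

func main() {
	// Grace period stored on the object after phase 1 (e.g. set to -1 by a
	// third-party controller).
	stored := int64(-1)

	// Grace period a forced delete (--grace-period=0) supplies in phase 2.
	requested := int64(0)

	// The quoted check only starts a new deletion when the requested period
	// is strictly shorter than the stored one. Since 0 >= -1, the request is
	// treated as an already-pending graceful deletion and the immediate
	// delete never happens.
	fmt.Println(requested >= stored) // prints: true
}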
I think there are two possible fixes:
- Consider not allowing DeletionTimestamp to be set in the past (DeletionGracePeriodSeconds should be non-negative). In that case, we could convert options.GracePeriodSeconds to 0 if it is negative.
- Accept that DeletionTimestamp can be set in the past. In that case, in phase 2, we treat a negative value of objectMeta.GetDeletionGracePeriodSeconds() the same as 0 and allow the immediate delete (see the sketch after this list).
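A minimal sketch of what the second option could look like, expressed as a hypothetical standalone helper (the function name and structure are illustrative, not the actual patch): nil or negative stored grace periods are normalized to 0, so the object falls into the immediate-delete branch quoted above.

package main

import "fmt"

// effectiveGracePeriod is a hypothetical helper: it treats a nil or negative
// stored DeletionGracePeriodSeconds (i.e. a DeletionTimestamp already in the
// past) the same as 0, making the object eligible for immediate deletion.
func effectiveGracePeriod(stored *int64) int64 {
	if stored == nil || *stored <= 0 {
		return 0
	}
	return *stored
}

func main() {
	neg := int64(-1)
	fmt.Println(effectiveGracePeriod(&neg)) // 0 -> take the immediate-delete path

	pos := int64(30)
	fmt.Println(effectiveGracePeriod(&pos)) // 30 -> still pending graceful deletion
}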
Thoughts?
|
DeletionGracePeriodSeconds should not accept negative values; how did that happen? |
|
@lavalamp We're still trying to figure it out, but current hypothesis is a 3rd party node termination handler manually set it to -1. There are no guards against negative values from what I can tell. |
|
I tested with the latest version; the problem persists and has gotten worse. Previously the pod could at least be deleted with --force. Now, even when it is forcibly deleted, the Pod still remains in the CRI. |
|
/cc |
|
/milestone v1.24 |
|
Hi 👋 I'm checking in from the bug triage team for release 1.24. Is this PR targeted for release 1.24? |
|
@jyotimahapatra |
|
@ehashman @SergeyKanzhelev do we need this for v1.24? |
|
I don't think we've made enough progress on this to merge it by tomorrow. /milestone clear |
|
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the project's lifecycle rules.
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
|
/remove-lifecycle stale |
|
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the project's lifecycle rules.
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
|
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the project's lifecycle rules.
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
|
@wzshiming: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the project's lifecycle rules.
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
|
@k8s-triage-robot: Closed this PR. In response to the /close above:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What type of PR is this?
/kind bug
/sig node
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #83916
Fixes #84298
Fixes #87039
Fixes #88613
xref #98506
Fixes #100695
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: