Delay Deletion of a Pod until volumes are cleaned up #41456
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. The following people have approved this PR: dashpole. Needs approval from an approver in each of these OWNERS files. We suggest the following people:
The branch was force-pushed from ad3fd4f to 1d38818.
@dashpole Could you put some details about the issue you discovered in the previous PR here, so that we can learn something from it? Thanks!
@jingxu97 The issue from the previous PR, as shown here, was that some updates to the API server were breaking the invariant that pod status always progresses forward. For example, status can go PENDING -> RUNNING -> FAILED, but should never go PENDING -> RUNNING -> FAILED -> RUNNING, which is what was happening in the previous PR. By adding debugging statements and reproducing the bug, I was able to see that calls to statusManager.SetPodStatus were correctly updating the status to FAILED, and that a subsequent call to statusManager.TerminatePod was resetting the status to RUNNING. This led to the discovery of #41436 in TerminatePod, which I believe has solved the problem.
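For illustration, here is a minimal, self-contained Go sketch of that forward-only invariant; the `PodPhase` type, the rank map, and `isPhaseRegression` are hypothetical stand-ins for this comment only, not the actual statusManager code:

```go
package main

import "fmt"

// PodPhase mirrors the coarse pod lifecycle phases discussed above.
// This is a standalone illustration, not the Kubernetes API type.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// phaseRank orders phases so an update can be flagged if it would move a
// pod "backwards", e.g. FAILED -> RUNNING.
var phaseRank = map[PodPhase]int{
	PodPending:   0,
	PodRunning:   1,
	PodSucceeded: 2,
	PodFailed:    2,
}

// isPhaseRegression reports whether applying newPhase after oldPhase would
// violate the forward-only invariant described in the comment above.
func isPhaseRegression(oldPhase, newPhase PodPhase) bool {
	return phaseRank[newPhase] < phaseRank[oldPhase]
}

func main() {
	fmt.Println(isPhaseRegression(PodRunning, PodFailed)) // false: a normal transition
	fmt.Println(isPhaseRegression(PodFailed, PodRunning)) // true: the regression seen in the previous PR
}
```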
    if !ok {
    	return false
    }
    return !kubepod.IsMirrorPod(pod) && m.podDeletionSafety.OkToDeletePod(pod)
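The check above only reports a pod as deletable when it is not a mirror pod and the deletion-safety hook agrees. Below is a rough, self-contained sketch of what such a hook could look like; the `Pod` struct, `deletionSafetyProvider` interface, and `kubeletSafety` type are illustrative assumptions, not the kubelet's actual implementation:

```go
package main

import "fmt"

// Pod is a stripped-down stand-in for the API object; the real code works
// with *v1.Pod and the kubelet's container and volume state.
type Pod struct {
	Name              string
	RunningContainers int
	VolumesInUse      int
}

// deletionSafetyProvider is a hypothetical version of the hook consulted in
// the snippet above: the pod should only be removed from the API server once
// the node has finished tearing it down.
type deletionSafetyProvider interface {
	OkToDeletePod(pod *Pod) bool
}

// kubeletSafety says a pod is safe to delete only after all of its
// containers have exited and its volumes have been cleaned up.
type kubeletSafety struct{}

func (kubeletSafety) OkToDeletePod(pod *Pod) bool {
	return pod.RunningContainers == 0 && pod.VolumesInUse == 0
}

func main() {
	var safety deletionSafetyProvider = kubeletSafety{}
	fmt.Println(safety.OkToDeletePod(&Pod{Name: "a", VolumesInUse: 1})) // false: volumes still attached
	fmt.Println(safety.OkToDeletePod(&Pod{Name: "b"}))                  // true: nothing left to clean up
}
```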
#41507 might be relevant here.
Here we go again :)

/lgtm
Automatic merge from submit-queue (batch tested with PRs 41466, 41456, 41550, 41238, 41416) |
@dashpole: The following test(s) failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)

Create DefaultPodDeletionTimeout for e2e tests

In our e2e and e2e_node tests, we had a number of different timeouts for deletion. Recent changes to the way deletion works (kubernetes#41644, kubernetes#41456) have resulted in some timeouts in e2e tests. kubernetes#42661 was the most recent fix for this. Most of these tests are not meant to test pod deletion latency, but rather just to clean up pods after a test is finished. For this reason, we should change all these tests to use a standard, fairly high timeout for deletion.

cc @vishh @Random-Liu
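As a rough sketch of that idea, the snippet below shows a single shared deletion timeout used by test cleanup; the 3-minute value and the `deletePodAndWait` helper are assumptions for illustration, not the actual e2e framework code:

```go
package main

import (
	"fmt"
	"time"
)

// DefaultPodDeletionTimeout captures the idea in the PR above: one shared,
// fairly high timeout for tests that only need to clean up pods, instead of
// a different ad-hoc timeout per test. The 3-minute value is illustrative.
const DefaultPodDeletionTimeout = 3 * time.Minute

// deletePodAndWait is a hypothetical helper showing how a test would use the
// shared timeout rather than hard-coding its own.
func deletePodAndWait(podName string, timeout time.Duration) error {
	fmt.Printf("deleting %s, waiting up to %s for it to disappear\n", podName, timeout)
	// In a real test this would issue the delete and poll the API server
	// until the pod is gone or the timeout expires.
	return nil
}

func main() {
	_ = deletePodAndWait("test-pod", DefaultPodDeletionTimeout)
}
```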
#41436 fixed the bug that caused #41095 and #40239 to have to be reverted. Now that the bug is fixed, this shouldn't cause problems.
@vishh @derekwaynecarr @sjenning @jingxu97 @kubernetes/sig-storage-misc