[Bug Fix] Garbage Collect Node e2e Failing #42661
Conversation
/release-note-none

Flake PR: #38050

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

The following people have approved this PR: dashpole, dchen1107

Needs approval from an approver in each of these OWNERS files. We suggest the following people:

@k8s-bot non-cri e2e test this
```diff
 for _, pod := range test.testPods {
 	By(fmt.Sprintf("Deleting Pod %v", pod.podName))
-	f.PodClient().DeleteSync(pod.podName, &metav1.DeleteOptions{}, defaultRuntimeRequestTimeoutDuration)
+	f.PodClient().DeleteSync(pod.podName, &metav1.DeleteOptions{}, podDisappearTimeout)
```
nit: The name is odd. Could it be podDeletionTimeout?
The description is equally odd:

```go
// podDisappearTimeout is the timeout to wait node disappear.
podDisappearTimeout = time.Minute * 2
```
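For illustration only, the suggested rename with a corrected description might read like this (a sketch, not code from this PR):

```go
// podDeletionTimeout is how long to wait for a pod to disappear
// after it has been deleted.
podDeletionTimeout = 2 * time.Minute
```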
Might be worth doing a quick pass over the e2e node suite to rename these constants and move them to a single file.
There is also podWaitTimeout in lifecycle_hook_test.go, which is set to 3 minutes, and is used as a timeout for deletion.
Yeah. I feel we could add a default timeout to DeleteSync() itself, derived from the framework; that means fewer constants to deal with in such a workflow.
If this PR is urgent, don't block on these comments. A rough sketch of the idea follows.
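A minimal sketch of that suggestion, assuming a hypothetical shared default constant (the names here are illustrative, not the framework's actual API):

```go
// DeleteSync deletes the pod and waits for it to disappear. A zero timeout
// falls back to a shared default, so individual tests no longer need to
// define their own deletion-timeout constants.
func (c *PodClient) DeleteSync(name string, options *metav1.DeleteOptions, timeout time.Duration) {
	if timeout == 0 {
		timeout = DefaultPodDeletionTimeout // hypothetical shared default
	}
	// Delete the pod, tolerating the case where it is already gone.
	if err := c.Delete(name, options); err != nil && !apierrs.IsNotFound(err) {
		Failf("Failed to delete pod %q: %v", name, err)
	}
	// Poll until the pod no longer exists, up to the resolved timeout.
	gomega.Expect(WaitForPodToDisappear(c.f.ClientSet, c.f.Namespace.Name, name,
		labels.Everything(), 2*time.Second, timeout)).To(gomega.Succeed(),
		"wait for pod %q to disappear", name)
}
```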
#42734 creates a default and addresses these comments
@k8s-bot gce etcd3 e2e test this
@dashpole: The following test(s) failed:

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Automatic merge from submit-queue
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)

Create DefaultPodDeletionTimeout for e2e tests

In our e2e and e2e_node tests, we had a number of different timeouts for deletion. Recent changes to the way deletion works (kubernetes#41644, kubernetes#41456) have resulted in some timeouts in e2e tests; kubernetes#42661 was the most recent fix for this. Most of these tests are not meant to test pod deletion latency, but rather just to clean up pods after a test is finished. For this reason, we should change all these tests to use a standard, fairly high timeout for deletion.

cc @vishh @Random-Liu
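As a sketch of what such a standard default could look like (the 3-minute value follows the description above; treat the name and location as illustrative rather than the exact merged code):

```go
// DefaultPodDeletionTimeout is one shared, deliberately generous timeout for
// tests that delete pods only to clean up after themselves, not to measure
// deletion latency.
const DefaultPodDeletionTimeout = 3 * time.Minute
```

Cleanup call sites would then all pass the shared default to DeleteSync instead of defining per-file constants.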
This node e2e test uses its own deletion timeout (1 minute) instead of the default (3 minutes).
#41644 likely increased the time deletion takes; see that PR for the analysis.
There may be other problems with this test, but they are hard to distinguish from failures caused by hitting this low timeout.
This PR changes the Garbage Collector test to use the default timeout. This should allow us to discern if there are any actual bugs to fix.
cc @kubernetes/sig-node-bugs @calebamiles @derekwaynecarr