Ensure pod cgroup is deleted prior to deletion of pod #41644
Conversation
|
@derekwaynecarr might make more sense to put it here https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/status/status_manager.go#L453 That way |
|
@sjenning -- I struggled with that last night, but I can't delete the pod cgroup until I know the volumes are removed so memory charges don't propagate. I was also tempted to rename the method to EnsureOkToDeletePod or something similar. |
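For illustration, a minimal Go sketch of the ordering being discussed here: volumes must be reclaimed before the pod cgroup is torn down, and both must be gone before the pod can be deleted. The type and helper names below are assumptions, not the actual kubelet API.

```go
package main

import "fmt"

// Hypothetical stand-ins for kubelet state; the names are illustrative only.
type podCleanupState struct {
	volumesExist bool // memory-backed volumes still mounted?
	cgroupExists bool // pod-level cgroup sandbox still present?
}

// okToDeletePod sketches the ordering discussed above: volumes must be
// unmounted before the pod cgroup is removed, and both must be gone before
// the pod can be deleted from the apiserver.
func okToDeletePod(s *podCleanupState) bool {
	if s.volumesExist {
		// Deleting the cgroup now would propagate memory-backed volume
		// charges to the parent QoS cgroup, so keep waiting.
		return false
	}
	if s.cgroupExists {
		// Safe to tear down the pod cgroup once volumes are reclaimed.
		s.cgroupExists = false
	}
	return true
}

func main() {
	s := &podCleanupState{volumesExist: true, cgroupExists: true}
	fmt.Println(okToDeletePod(s)) // false: volumes still mounted
	s.volumesExist = false
	fmt.Println(okToDeletePod(s)) // true: cgroup removed, deletion can proceed
}
```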
|
iterating on this more, will poke for a second look. |
|
@derekwaynecarr just noting the problematic callsite for |
Force-pushed d7ea538 to cafb9f5
|
ok, there is a logic error in HandlePodCleanups that is causing a chicken/egg situation. |
Force-pushed cafb9f5 to 1b46910
Force-pushed 3a8777d to 06b294e
|
I think this should be good to go (passing node e2e will confirm it). Previously, we only deleted a cgroup if the pod was removed from the api server. I moved the code that reduces cpu limits so it's co-located with the pod volume check we wait on to resolve. If the volumes are gone, then I delete the cgroup as expected. I don't wait if the keep-terminated-pod-volumes flag is on, since operators have chosen to run wild-west in that model to support debugging. |
// If volumes have not been unmounted/detached, do not delete the cgroup
// so any memory backed volumes don't have their charges propagated to the
// parent cgroup. If the volumes still exist, reduce the cpu shares for any
// process in the cgroup to the minimum value while we wait. If the kubelet
The fact that cpu has been freed up needs to be communicated to the scheduler. Maybe we need to introduce a notion of Availability (in addition to Usage) at the nodes instead of having the scheduler calculate a summation across nodes.
possibly, definitely not for 1.6 ;-)
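A minimal sketch of the wait path described in the quoted comment above: while volumes remain, the pod cgroup is not deleted, but its cpu.shares is dropped to the kernel minimum so lingering processes get almost no CPU. The cgroupfs path and function name here are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// minShares is the smallest value the kernel accepts for cpu.shares.
const minShares = 2

// reducePodCPU drops cpu.shares for a pod cgroup to the minimum while we
// wait for its volumes to be unmounted. The cgroupfs layout used here is an
// assumption for illustration.
func reducePodCPU(podCgroupPath string) error {
	sharesFile := filepath.Join(podCgroupPath, "cpu.shares")
	return os.WriteFile(sharesFile, []byte(fmt.Sprintf("%d", minShares)), 0644)
}

func main() {
	// Hypothetical pod cgroup path under the cpu hierarchy.
	path := "/sys/fs/cgroup/cpu/kubepods/burstable/pod-example"
	if err := reducePodCPU(path); err != nil {
		fmt.Fprintln(os.Stderr, "could not reduce cpu.shares:", err)
	}
}
```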
|
Holding LGTM to resolve #41644 (comment) |
|
Sure, I'll run some tests. Give me an hour or so, and I'll post results here. |
|
If you have any scripts or e2es that you use to benchmark, share them with us and we can run them too.
|
Automatic merge from submit-queue (batch tested with PRs 41714, 41510, 42052, 41918, 31515): Disable cgroups-per-qos pending Burstable/cpu.shares being set. Disable cgroups-per-qos to allow kubemark problems to still be resolved. Re-enable it once the following merge: #41753 #41644 #41621. Enabling it before cpu.shares is set on QoS tiers can cause regressions, since Burstable and BestEffort pods are given equal time.
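For context on the regression described above, a rough sketch of how cpu.shares is conventionally derived from a CPU request: when no value is written for a QoS tier, sibling cgroups fall back to the kernel default of 1024 shares, so Burstable and BestEffort end up with equal CPU time under contention. The helper name and constants below are illustrative, not the exact kubelet code.

```go
package main

import "fmt"

const (
	sharesPerCPU  = 1024 // cgroup cpu.shares granted per full CPU
	milliCPUToCPU = 1000
	minShares     = 2 // kernel minimum for cpu.shares
)

// milliCPUToShares sketches how a CPU request is translated into cpu.shares.
// If no value is written (the situation described above), sibling cgroups
// keep the default of 1024 and share CPU time equally.
func milliCPUToShares(milliCPU int64) int64 {
	if milliCPU == 0 {
		return minShares
	}
	shares := (milliCPU * sharesPerCPU) / milliCPUToCPU
	if shares < minShares {
		return minShares
	}
	return shares
}

func main() {
	fmt.Println(milliCPUToShares(500))  // 512 shares for a 500m request
	fmt.Println(milliCPUToShares(2000)) // 2048 shares for a 2-CPU request
}
```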
|
@k8s-bot node e2e test this |
Force-pushed e886bf0 to d1597b7
|
rebased and re-tagging /lgtm |
|
@k8s-bot test this |
|
@k8s-bot kops aws e2e test this |
Force-pushed d1597b7 to 21a899c
|
rebased and retagging.... |
|
[APPROVALNOTIFIER] This PR is APPROVED. The following people have approved this PR: derekwaynecarr, vishh. Needs approval from an approver in each of these OWNERS files:
You can indicate your approval by writing |
|
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
|
@derekwaynecarr: The following test(s) failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Details. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
Automatic merge from submit-queue (batch tested with PRs 41644, 42020, 41753, 42206, 42212) |
Automatic merge from submit-queue: [Bug Fix] Garbage Collect Node e2e Failing. This node e2e test uses its own deletion timeout (1 minute) instead of the default (3 minutes). #41644 likely increased the time required for deletion; see that PR for analysis. There may be other problems with this test, but those are difficult to pick apart from hitting this low timeout. This PR changes the Garbage Collector test to use the default timeout, which should allow us to discern whether there are any actual bugs to fix. cc @kubernetes/sig-node-bugs @calebamiles @derekwaynecarr
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694): Create DefaultPodDeletionTimeout for e2e tests. In our e2e and e2e_node tests, we had a number of different timeouts for deletion. Recent changes to the way deletion works (kubernetes#41644, kubernetes#41456) have resulted in some timeouts in e2e tests. kubernetes#42661 was the most recent fix for this. Most of these tests are not meant to test pod deletion latency, but rather just to clean up pods after a test is finished. For this reason, we should change all these tests to use a standard, fairly high timeout for deletion. cc @vishh @Random-Liu
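A minimal sketch of what consolidating on a single, generous deletion timeout could look like in an e2e helper; the constant value and helper name are assumptions for illustration, not the actual test framework API.

```go
package main

import (
	"fmt"
	"time"
)

// DefaultPodDeletionTimeout sketches a single, generously sized timeout that
// cleanup paths in e2e tests could share instead of each picking their own.
// The 3-minute value here is an assumption for illustration.
const DefaultPodDeletionTimeout = 3 * time.Minute

// waitForPodGone is a stand-in for an e2e helper that polls until the pod is
// deleted or the shared timeout expires.
func waitForPodGone(podGone func() bool, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if podGone() {
			return nil
		}
		time.Sleep(2 * time.Second)
	}
	return fmt.Errorf("pod still present after %v", timeout)
}

func main() {
	// Example: a pod that is already gone returns immediately.
	err := waitForPodGone(func() bool { return true }, DefaultPodDeletionTimeout)
	fmt.Println(err) // <nil>
}
```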
What this PR does / why we need it:
This PR ensures that the kubelet removes the pod cgroup sandbox prior to deletion of a pod from the apiserver. We need this to ensure that the default behavior in the kubelet is to not leak resources.
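As a rough sketch of the end-to-end ordering this PR targets (simplified stand-ins, not the actual kubelet status manager code): the delete against the apiserver is only issued once the pod's volumes and cgroup sandbox have been reclaimed, so the kubelet does not leak node resources for pods that no longer exist in the API.

```go
package main

import (
	"errors"
	"fmt"
)

// podStatus is a simplified stand-in for what the kubelet tracks per pod.
type podStatus struct {
	volumesReclaimed bool
	cgroupDeleted    bool
}

// deleteFromAPIServer is a placeholder for the call that removes the pod
// object from the apiserver with a zero grace period.
func deleteFromAPIServer(name string) error {
	fmt.Println("deleting pod from apiserver:", name)
	return nil
}

// syncTerminatedPod sketches the ordering this PR enforces: the apiserver
// deletion happens only after the pod cgroup sandbox and volumes are gone.
func syncTerminatedPod(name string, st podStatus) error {
	if !st.volumesReclaimed || !st.cgroupDeleted {
		return errors.New("pod resources not yet reclaimed; retry later")
	}
	return deleteFromAPIServer(name)
}

func main() {
	fmt.Println(syncTerminatedPod("demo", podStatus{volumesReclaimed: true, cgroupDeleted: false}))
	fmt.Println(syncTerminatedPod("demo", podStatus{volumesReclaimed: true, cgroupDeleted: true}))
}
```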