
Conversation

@derekwaynecarr
Member

@derekwaynecarr derekwaynecarr commented Feb 17, 2017

What this PR does / why we need it:
This PR ensures that the kubelet removes the pod cgroup sandbox prior to deleting the pod from the apiserver. We need this so that the kubelet's default behavior is to not leak resources.
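To make the intent concrete, here is a minimal sketch of the gate this describes. The names (podState, okToDeleteFromAPIServer) are purely illustrative, not the kubelet's actual types or functions; the point is only that the pod object is not removed from the apiserver until its node-level resources, including the pod cgroup sandbox, are gone.

```go
package main

import "fmt"

// podState captures the node-level resources this PR is concerned with
// (hypothetical type; not the kubelet's actual representation).
type podState struct {
	name             string
	volumesUnmounted bool
	cgroupRemoved    bool
}

// okToDeleteFromAPIServer reports whether the pod object can be removed from
// the apiserver without leaking resources on the node.
func okToDeleteFromAPIServer(p podState) bool {
	return p.volumesUnmounted && p.cgroupRemoved
}

func main() {
	p := podState{name: "web-0", volumesUnmounted: true, cgroupRemoved: false}
	if okToDeleteFromAPIServer(p) {
		fmt.Printf("deleting %s from the apiserver\n", p.name)
	} else {
		fmt.Printf("holding %s: pod cgroup sandbox still present\n", p.name)
	}
}
```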

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 17, 2017

@derekwaynecarr
Member Author

fyi @sjenning @vishh @dashpole

@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-label-needed labels Feb 17, 2017
@derekwaynecarr derekwaynecarr added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Feb 17, 2017
@sjenning
Contributor

@derekwaynecarr it might make more sense to put it here: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/status/status_manager.go#L453

That way OkToDeletePod() has no side effects, and it is clear that the pod cgroup deletion happens right before the pod deletion in syncPod().

@derekwaynecarr
Member Author

@sjenning -- I struggled with that last night, but I can't delete the pod cgroup until I know the volumes are removed, so that memory charges don't propagate. I was also tempted to rename the method to EnsureOkToDeletePod or something similar.
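A minimal sketch of that ordering constraint, assuming hypothetical helper names (unmountVolumes, deletePodCgroup, ensureOkToDeletePod are illustrative, not the actual kubelet code): volumes must be torn down first, because memory-backed (tmpfs) volumes are charged to the pod cgroup, and deleting the cgroup while they are still mounted would migrate those charges to the parent.

```go
package main

import "fmt"

// unmountVolumes stands in for the kubelet's volume teardown step.
func unmountVolumes(pod string) error {
	fmt.Println("unmounting volumes for", pod)
	return nil
}

// deletePodCgroup stands in for removal of the pod-level cgroup sandbox.
func deletePodCgroup(pod string) error {
	fmt.Println("deleting pod cgroup for", pod)
	return nil
}

// ensureOkToDeletePod enforces the volumes-then-cgroup ordering before the
// pod can finally be removed from the apiserver.
func ensureOkToDeletePod(pod string) error {
	if err := unmountVolumes(pod); err != nil {
		return fmt.Errorf("volumes still mounted for %s: %w", pod, err)
	}
	return deletePodCgroup(pod)
}

func main() {
	if err := ensureOkToDeletePod("web-0"); err != nil {
		fmt.Println(err)
	}
}
```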

@derekwaynecarr derekwaynecarr changed the title Ensure pod cgroup is deleted prior to deletion of pod WIP: Ensure pod cgroup is deleted prior to deletion of pod Feb 17, 2017
@derekwaynecarr
Member Author

Iterating on this more; will poke for a second look.


@derekwaynecarr derekwaynecarr force-pushed the ensure-pod-cgroup-deleted branch 2 times, most recently from d7ea538 to cafb9f5 Compare February 17, 2017 17:24
@derekwaynecarr
Member Author

OK, there is a logic error in HandlePodCleanups that is causing a chicken-and-egg situation.

@derekwaynecarr derekwaynecarr force-pushed the ensure-pod-cgroup-deleted branch from cafb9f5 to 1b46910 Compare February 17, 2017 19:43
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 17, 2017
@derekwaynecarr derekwaynecarr force-pushed the ensure-pod-cgroup-deleted branch 2 times, most recently from 3a8777d to 06b294e Compare February 17, 2017 20:05
@derekwaynecarr derekwaynecarr changed the title WIP: Ensure pod cgroup is deleted prior to deletion of pod Ensure pod cgroup is deleted prior to deletion of pod Feb 17, 2017
@derekwaynecarr
Member Author

I think this should be good to go (passing node e2e will confirm it).

Previously, we only deleted a cgroup if the pod had been removed from the API server. I moved the code that reduces CPU limits so it is localized with the pod volume check that we wait on. If the volumes are gone, I delete the cgroup as expected. I don't block if the keep-terminated-volumes flag is on, since operators who run in that mode have chosen a wild-west model to support debugging.
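A minimal, self-contained sketch of that flow (all names are illustrative rather than the actual kubelet code; minCPUShares = 2 assumes the kernel's usual cpu.shares floor). The comment quoted below it is the diff context that the inline review refers to.

```go
package main

import "fmt"

const minCPUShares = 2 // assumed smallest cpu.shares value the kernel accepts

// terminatedPod is a hypothetical stand-in for the state the kubelet inspects
// during pod cleanup.
type terminatedPod struct {
	name               string
	volumesGone        bool
	keepTerminatedVols bool // mirrors the keep-terminated-volumes debug mode
}

// cleanupPodCgroup decides what to do with the pod cgroup of a terminated pod.
func cleanupPodCgroup(p terminatedPod) {
	if p.volumesGone || p.keepTerminatedVols {
		// Safe (or explicitly allowed by the operator) to remove the cgroup now.
		fmt.Printf("%s: deleting pod cgroup\n", p.name)
		return
	}
	// Volumes (possibly memory-backed) are still mounted: keep the cgroup so
	// their charges stay attributed to the pod, but squeeze cpu.shares to the
	// minimum while waiting for the next cleanup pass.
	fmt.Printf("%s: reducing cpu.shares to %d and waiting\n", p.name, minCPUShares)
}

func main() {
	cleanupPodCgroup(terminatedPod{name: "web-0", volumesGone: false})
	cleanupPodCgroup(terminatedPod{name: "web-1", volumesGone: true})
}
```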

// If volumes have not been unmounted/detached, do not delete the cgroup
// so any memory backed volumes don't have their charges propagated to the
// parent cgroup. If the volumes still exist, reduce the cpu shares for any
// process in the cgroup to the minimum value while we wait. if the kubelet
Contributor

The fact that cpu has been freed up needs to be communicated to the scheduler. Maybe we need to introduce a notion of Availability (in addition to Usage) at the nodes, instead of having the scheduler calculate the summation across nodes.

Member Author

Possibly, but definitely not for 1.6 ;-)

@vishh
Contributor

vishh commented Feb 17, 2017

What is the worst-case latency with this change? @dashpole ran some experiments; I wonder if we can run them on your PR too. @dashpole, thoughts?

I suspect the only variable here is the loop period of the HandlePodCleanups() method.

@vishh
Contributor

vishh commented Feb 17, 2017

Holding LGTM to resolve #41644 (comment)

@dashpole
Contributor

Sure, I'll run some tests. Give me an hour or so, and I'll post the results here.

@vishh
Contributor

vishh commented Feb 17, 2017 via email

k8s-github-robot pushed a commit that referenced this pull request Feb 25, 2017
Automatic merge from submit-queue (batch tested with PRs 41714, 41510, 42052, 41918, 31515)

Disable cgroups-per-qos pending Burstable/cpu.shares being set

Disable cgroups-per-qos so that kubemark problems can still be resolved.

Re-enable it once the following merge:
#41753
#41644
#41621

Enabling it before cpu.shares is set on the QoS tiers can cause regressions, since Burstable and BestEffort pods would be given equal CPU time.
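To illustrate why this matters, here is a rough sketch assuming the conventional 1024-shares-per-core mapping; the numbers and the milliCPUToShares helper are illustrative, not the exact kubelet computation. Without explicit cpu.shares on the QoS tier cgroups, the Burstable and BestEffort parents both sit at the default of 1024 and split CPU evenly under contention.

```go
package main

import "fmt"

// milliCPUToShares converts a CPU request in millicores to cpu.shares using
// the common 1024-shares-per-core convention, with a floor of 2 (assumed
// kernel minimum). It mirrors the idea, not the exact kubelet code.
func milliCPUToShares(milliCPU int64) int64 {
	shares := milliCPU * 1024 / 1000
	if shares < 2 {
		return 2
	}
	return shares
}

func main() {
	// Without per-QoS weighting, both tiers default to the same weight.
	fmt.Println("unweighted: Burstable=1024, BestEffort=1024 (equal CPU under contention)")

	// With weighting, Burstable is sized from its pods' CPU requests and
	// BestEffort is pinned near the minimum.
	burstableRequestMilli := int64(1500) // e.g. pods requesting 1.5 CPU in total
	fmt.Println("weighted: Burstable =", milliCPUToShares(burstableRequestMilli), "shares, BestEffort =", 2, "shares")
}
```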
@derekwaynecarr
Member Author

@k8s-bot node e2e test this

@derekwaynecarr derekwaynecarr added this to the v1.6 milestone Feb 28, 2017
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 28, 2017
@derekwaynecarr derekwaynecarr force-pushed the ensure-pod-cgroup-deleted branch from e886bf0 to d1597b7 Compare February 28, 2017 19:57
@k8s-github-robot k8s-github-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 28, 2017
@derekwaynecarr
Member Author

Rebased and re-tagging /lgtm

@derekwaynecarr derekwaynecarr added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 28, 2017
@derekwaynecarr
Member Author

@k8s-bot test this

@derekwaynecarr
Member Author

@k8s-bot kops aws e2e test this

@derekwaynecarr derekwaynecarr force-pushed the ensure-pod-cgroup-deleted branch from d1597b7 to 21a899c Compare March 1, 2017 20:30
@derekwaynecarr
Member Author

Rebased and re-tagging...

@derekwaynecarr derekwaynecarr added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Mar 1, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

The following people have approved this PR: derekwaynecarr, vishh

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-ci-robot
Contributor

k8s-ci-robot commented Mar 1, 2017

@derekwaynecarr: The following test(s) failed:

Test name: Jenkins GCE Node e2e
Commit: 21a899c
Rerun command: @k8s-bot node e2e test this


@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 41644, 42020, 41753, 42206, 42212)

@k8s-github-robot k8s-github-robot merged commit ddd8b5c into kubernetes:master Mar 1, 2017
k8s-github-robot pushed a commit that referenced this pull request Mar 8, 2017
Automatic merge from submit-queue

[Bug Fix] Garbage Collect Node e2e Failing

This node e2e test uses its own deletion timeout (1 minute) instead of the default (3 minutes).
#41644 likely increased the time required for deletion; see that PR for the analysis.
There may be other problems with this test, but they are difficult to pick apart while we keep hitting this low timeout.

This PR changes the Garbage Collector test to use the default timeout.  This should allow us to discern if there are any actual bugs to fix.

cc @kubernetes/sig-node-bugs @calebamiles @derekwaynecarr
jwhonce pushed a commit to jwhonce/kubernetes that referenced this pull request Mar 9, 2017
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)

Create DefaultPodDeletionTimeout for e2e tests

In our e2e and e2e_node tests, we had a number of different timeouts for deletion.
Recent changes to the way deletion works (kubernetes#41644, kubernetes#41456) have resulted in some timeouts in e2e tests.  kubernetes#42661 was the most recent fix for this.
Most of these tests are not meant to test pod deletion latency, but rather just to clean up pods after a test is finished.
For this reason, we should change all these tests to use a standard, fairly high timeout for deletion.

cc @vishh @Random-Liu