
Conversation

@derekwaynecarr
Member

@derekwaynecarr derekwaynecarr commented Feb 17, 2017

What this PR does / why we need it:
This PR ensures that the kubelet removes the pod cgroup sandbox prior to deleting the pod from the apiserver. We need this so that the kubelet's default behavior is to not leak resources.
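To make the intent concrete, here is a minimal sketch of the gate this describes. The names (podState, okToDeleteFromAPIServer) are purely illustrative, not the kubelet's actual types or functions; the point is only that the pod object is not removed from the apiserver until its node-level resources, including the pod cgroup sandbox, are gone.

```go
package main

import "fmt"

// podState captures the node-level resources this PR is concerned with
// (hypothetical type; not the kubelet's actual representation).
type podState struct {
	name             string
	volumesUnmounted bool
	cgroupRemoved    bool
}

// okToDeleteFromAPIServer reports whether the pod object can be removed from
// the apiserver without leaking resources on the node.
func okToDeleteFromAPIServer(p podState) bool {
	return p.volumesUnmounted && p.cgroupRemoved
}

func main() {
	p := podState{name: "web-0", volumesUnmounted: true, cgroupRemoved: false}
	if okToDeleteFromAPIServer(p) {
		fmt.Printf("deleting %s from the apiserver\n", p.name)
	} else {
		fmt.Printf("holding %s: pod cgroup sandbox still present\n", p.name)
	}
}
```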

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 17, 2017

@derekwaynecarr
Member Author

fyi @sjenning @vishh @dashpole

@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-label-needed labels Feb 17, 2017
@derekwaynecarr derekwaynecarr added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Feb 17, 2017
@sjenning
Contributor

@derekwaynecarr it might make more sense to put it here: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/status/status_manager.go#L453

That way OkToDeletePod() has no side effects, and it is clear that the pod cgroup deletion happens right before the pod deletion in syncPod().

@derekwaynecarr
Member Author

@sjenning -- I struggled with that last night, but I can't delete the pod cgroup until I know the volumes are removed, so that memory charges don't propagate. I was also tempted to rename the method to EnsureOkToDeletePod or something similar.
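A minimal sketch of that ordering constraint, assuming hypothetical helper names (unmountVolumes, deletePodCgroup, ensureOkToDeletePod are illustrative, not the actual kubelet code): volumes must be torn down first, because memory-backed (tmpfs) volumes are charged to the pod cgroup, and deleting the cgroup while they are still mounted would migrate those charges to the parent.

```go
package main

import "fmt"

// unmountVolumes stands in for the kubelet's volume teardown step.
func unmountVolumes(pod string) error {
	fmt.Println("unmounting volumes for", pod)
	return nil
}

// deletePodCgroup stands in for removal of the pod-level cgroup sandbox.
func deletePodCgroup(pod string) error {
	fmt.Println("deleting pod cgroup for", pod)
	return nil
}

// ensureOkToDeletePod enforces the volumes-then-cgroup ordering before the
// pod can finally be removed from the apiserver.
func ensureOkToDeletePod(pod string) error {
	if err := unmountVolumes(pod); err != nil {
		return fmt.Errorf("volumes still mounted for %s: %w", pod, err)
	}
	return deletePodCgroup(pod)
}

func main() {
	if err := ensureOkToDeletePod("web-0"); err != nil {
		fmt.Println(err)
	}
}
```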

@derekwaynecarr derekwaynecarr changed the title Ensure pod cgroup is deleted prior to deletion of pod WIP: Ensure pod cgroup is deleted prior to deletion of pod Feb 17, 2017
@derekwaynecarr
Member Author

Iterating on this more; will poke for a second look.


@derekwaynecarr derekwaynecarr force-pushed the ensure-pod-cgroup-deleted branch 2 times, most recently from d7ea538 to cafb9f5 Compare February 17, 2017 17:24
@derekwaynecarr
Member Author

OK, there is a logic error in HandlePodCleanups that is causing a chicken-and-egg situation.

@derekwaynecarr derekwaynecarr force-pushed the ensure-pod-cgroup-deleted branch from cafb9f5 to 1b46910 Compare February 17, 2017 19:43
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 17, 2017
@derekwaynecarr derekwaynecarr force-pushed the ensure-pod-cgroup-deleted branch 2 times, most recently from 3a8777d to 06b294e Compare February 17, 2017 20:05
@derekwaynecarr derekwaynecarr changed the title WIP: Ensure pod cgroup is deleted prior to deletion of pod Ensure pod cgroup is deleted prior to deletion of pod Feb 17, 2017
@derekwaynecarr
Member Author

I think this should be good to go (passing node e2e will confirm it).

Previously, we only deleted a cgroup if the pod had been removed from the API server. I moved the code that reduces CPU limits so it is localized with the pod volume check that we wait on. If the volumes are gone, I delete the cgroup as expected. I don't block if the keep-terminated-volumes flag is on, since operators who run in that mode have chosen a wild-west model to support debugging.
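A minimal, self-contained sketch of that flow (all names are illustrative rather than the actual kubelet code; minCPUShares = 2 assumes the kernel's usual cpu.shares floor). The comment quoted below it is the diff context that the inline review refers to.

```go
package main

import "fmt"

const minCPUShares = 2 // assumed smallest cpu.shares value the kernel accepts

// terminatedPod is a hypothetical stand-in for the state the kubelet inspects
// during pod cleanup.
type terminatedPod struct {
	name               string
	volumesGone        bool
	keepTerminatedVols bool // mirrors the keep-terminated-volumes debug mode
}

// cleanupPodCgroup decides what to do with the pod cgroup of a terminated pod.
func cleanupPodCgroup(p terminatedPod) {
	if p.volumesGone || p.keepTerminatedVols {
		// Safe (or explicitly allowed by the operator) to remove the cgroup now.
		fmt.Printf("%s: deleting pod cgroup\n", p.name)
		return
	}
	// Volumes (possibly memory-backed) are still mounted: keep the cgroup so
	// their charges stay attributed to the pod, but squeeze cpu.shares to the
	// minimum while waiting for the next cleanup pass.
	fmt.Printf("%s: reducing cpu.shares to %d and waiting\n", p.name, minCPUShares)
}

func main() {
	cleanupPodCgroup(terminatedPod{name: "web-0", volumesGone: false})
	cleanupPodCgroup(terminatedPod{name: "web-1", volumesGone: true})
}
```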

// If volumes have not been unmounted/detached, do not delete the cgroup
// so any memory backed volumes don't have their charges propagated to the
// parent cgroup. If the volumes still exist, reduce the cpu shares for any
// process in the cgroup to the minimum value while we wait. if the kubelet
Contributor

The fact that cpu has been freed up needs to be communicated to the scheduler. Maybe we need to introduce a notion of Availability (in addition to Usage) at the nodes, instead of having the scheduler calculate the summation across nodes.

Member Author

Possibly, but definitely not for 1.6 ;-)

@vishh
Contributor

vishh commented Feb 17, 2017

What is the worst-case latency with this change? @dashpole ran some experiments; I wonder if we can run them on your PR too. @dashpole, thoughts?

I suspect the only variable here is the loop period of the HandlePodCleanups() method.

@vishh
Contributor

vishh commented Feb 17, 2017

Holding LGTM to resolve #41644 (comment)

@dashpole
Contributor

Sure, I'll run some tests. Give me an hour or so, and I'll post the results here.

@vishh
Contributor

vishh commented Feb 17, 2017 via email

k8s-github-robot pushed a commit that referenced this pull request Feb 25, 2017
Automatic merge from submit-queue (batch tested with PRs 41714, 41510, 42052, 41918, 31515)

Disable cgroups-per-qos pending Burstable/cpu.shares being set

Disable cgroups-per-qos so that kubemark problems can still be resolved.

Re-enable it once the following merge:
#41753
#41644
#41621

Enabling it before cpu.shares is set on the QoS tiers can cause regressions, since Burstable and BestEffort pods would be given equal CPU time.
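To illustrate why this matters, here is a rough sketch assuming the conventional 1024-shares-per-core mapping; the numbers and the milliCPUToShares helper are illustrative, not the exact kubelet computation. Without explicit cpu.shares on the QoS tier cgroups, the Burstable and BestEffort parents both sit at the default of 1024 and split CPU evenly under contention.

```go
package main

import "fmt"

// milliCPUToShares converts a CPU request in millicores to cpu.shares using
// the common 1024-shares-per-core convention, with a floor of 2 (assumed
// kernel minimum). It mirrors the idea, not the exact kubelet code.
func milliCPUToShares(milliCPU int64) int64 {
	shares := milliCPU * 1024 / 1000
	if shares < 2 {
		return 2
	}
	return shares
}

func main() {
	// Without per-QoS weighting, both tiers default to the same weight.
	fmt.Println("unweighted: Burstable=1024, BestEffort=1024 (equal CPU under contention)")

	// With weighting, Burstable is sized from its pods' CPU requests and
	// BestEffort is pinned near the minimum.
	burstableRequestMilli := int64(1500) // e.g. pods requesting 1.5 CPU in total
	fmt.Println("weighted: Burstable =", milliCPUToShares(burstableRequestMilli), "shares, BestEffort =", 2, "shares")
}
```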
@derekwaynecarr
Member Author

@k8s-bot node e2e test this

@derekwaynecarr derekwaynecarr added this to the v1.6 milestone Feb 28, 2017
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 28, 2017
@derekwaynecarr derekwaynecarr force-pushed the ensure-pod-cgroup-deleted branch from e886bf0 to d1597b7 Compare February 28, 2017 19:57
@k8s-github-robot k8s-github-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 28, 2017
@derekwaynecarr
Member Author

Rebased and re-tagging /lgtm

@derekwaynecarr derekwaynecarr added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 28, 2017
@derekwaynecarr
Member Author

@k8s-bot test this

@derekwaynecarr
Member Author

@k8s-bot kops aws e2e test this

@derekwaynecarr derekwaynecarr force-pushed the ensure-pod-cgroup-deleted branch from d1597b7 to 21a899c Compare March 1, 2017 20:30
@derekwaynecarr
Member Author

Rebased and re-tagging...

@derekwaynecarr derekwaynecarr added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Mar 1, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

The following people have approved this PR: derekwaynecarr, vishh

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-ci-robot
Contributor

k8s-ci-robot commented Mar 1, 2017

@derekwaynecarr: The following test(s) failed:

Test name: Jenkins GCE Node e2e
Commit: 21a899c
Rerun command: @k8s-bot node e2e test this


@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 41644, 42020, 41753, 42206, 42212)

@k8s-github-robot k8s-github-robot merged commit ddd8b5c into kubernetes:master Mar 1, 2017
k8s-github-robot pushed a commit that referenced this pull request Mar 8, 2017
Automatic merge from submit-queue

[Bug Fix] Garbage Collect Node e2e Failing

This node e2e test uses its own deletion timeout (1 minute) instead of the default (3 minutes).
#41644 likely increased the time required for deletion; see that PR for the analysis.
There may be other problems with this test, but they are difficult to pick apart while we keep hitting this low timeout.

This PR changes the Garbage Collector test to use the default timeout.  This should allow us to discern if there are any actual bugs to fix.

cc @kubernetes/sig-node-bugs @calebamiles @derekwaynecarr
jwhonce pushed a commit to jwhonce/kubernetes that referenced this pull request Mar 9, 2017
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)

Create DefaultPodDeletionTimeout for e2e tests

In our e2e and e2e_node tests, we had a number of different timeouts for deletion.
Recent changes to the way deletion works (kubernetes#41644, kubernetes#41456) have resulted in some timeouts in e2e tests.  kubernetes#42661 was the most recent fix for this.
Most of these tests are not meant to test pod deletion latency, but rather just to clean up pods after a test is finished.
For this reason, we should change all these tests to use a standard, fairly high timeout for deletion.

cc @vishh @Random-Liu