Pod level in place pod resize - alpha #132919
k8s-ci-robot merged 16 commits into kubernetes:master
Conversation
Skipping CI for Draft Pull Request.

This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.
/ok-to-test
Force-pushed from 1c21c5b to b2b7220.
The WiFi isn't working at the conference, so I won't be able to review this until I get back to my hotel room tonight. @natasha41575 can you help review today? If you lgtm I'll try to just approve tonight. |
```go
// comparison (DeepEqual) against oldPod.Spec. If the specs match after masking,
// it guarantees that no changes were made other than the allowed resource modifications
// needed for the resize.
func validatePodLevelResourcesResize(newPod, oldPod *core.Pod, podSpecToMutate *core.PodSpec, specPath *field.Path, opts PodValidationOptions) field.ErrorList {
```
Add some of these cases to the e2e tests in test/e2e/common/node/pod_resize.go, in doPodResizePatchErrorTests
```go
// owner: @ndixita
// kep: https://kep.k8s.io/5419
//
// Enables specifying resources at pod-level.
```
update the description
```go
GetContainerResources(podUID types.UID, containerName string) (v1.ResourceRequirements, bool)
GetPodResourceInfoMap() PodResourceInfoMap
GetPodResourceInfo(podUID types.UID) (PodResourceInfo, bool)
GetPodLevelResources(podUID types.UID) (*v1.ResourceRequirements, bool)
```
(tiny optional nit) the analogous GetContainerResources returns (v1.ResourceRequirements, bool), should we keep it consistent and have the same return values?
```go
type writer interface {
	SetContainerResources(podUID types.UID, containerName string, resources v1.ResourceRequirements) error
	SetPodResourceInfo(podUID types.UID, resourceInfo PodResourceInfo) error
	SetPodLevelResources(podUID types.UID, alloc *v1.ResourceRequirements) error
```

```go
podInfo, ok := s.podResources[podUID]
if !ok {
	podInfo.PodLevelResources = &v1.ResourceRequirements{}
```
does this line do anything? You are immediately overwriting it in the next line
```go
	expectPodSyncTriggered: "true",
},
{
	name: "pod-level: Request CPU and memory increase - expect InProgress",
```
Thanks for adding these cases. They look good. I would like to request one or two cases with both pod-level and container-level resources.
```go
if tt.inPlacePodLevelResizeEnabled {
	featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.InPlacePodLevelResourcesVerticalScaling, true)
}
```
I see a few other places where the same comment applies; you can update them all.
```diff
-if tt.inPlacePodLevelResizeEnabled {
-	featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.InPlacePodLevelResourcesVerticalScaling, true)
-}
+featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.InPlacePodLevelResourcesVerticalScaling, tt.inPlacePodLevelResizeEnabled)
```
```go
// testPods[0] has the highest priority, as it doesn't increase resource requests (pod-level).
// testPods[1] has the next highest priority, as it doesn't increase resource requests (container-level).
```
(nonblocking)

- Is the pod with pod-level resources being prioritized over the pod with container-level resources intentional? (I think the answer is, and should be, no: they are equal priority and either being first is acceptable.) Let's leave a comment to state as such.
- What I want this test to cover is that the resize of pod-level resources is taken into account when determining priority. As in, resizes that do not increase any requests are prioritized over resizes that do increase pod-level requests. To that end, I think some of the other pods should be converted to using pod-level resources. One option to ensure all our bases are covered:
  - testPods[2] should have both pod- and container-level resources specified, with only pod-level resources being increased
  - testPods[3] should be converted to having only pod-level resources, with the resize increasing them
  - testPods[4] should have both pod- and container-level resources, with the resize increasing only container-level resources
  - testPods[5] and testPods[6] cover the case of only container-level resources being specified and resized
```go
currentResources := containerResourcesFromRequirements(&actuatedResources)
var actuatedPodResources *v1.ResourceRequirements
if utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodLevelResourcesVerticalScaling) {
	actuatedPodResources, _ = m.actuatedState.GetPodLevelResources(pod.UID)
```
do we need to check the return value of found here (similar to what we do with container resources above)?

It's just used for logging, but probably good to know?
```go
if m.isContainerResourceResizeInProgress(allocatedPod, podStatus) {
	return true
}

return m.isPodLevelResourcesResizeInProgress(allocatedPod, podStatus)
```
optional nit, just a personal style preference
```diff
-if m.isContainerResourceResizeInProgress(allocatedPod, podStatus) {
-	return true
-}
-return m.isPodLevelResourcesResizeInProgress(allocatedPod, podStatus)
+return m.isContainerResourceResizeInProgress(allocatedPod, podStatus) || m.isPodLevelResourcesResizeInProgress(allocatedPod, podStatus)
```
```go
// b) api-server in services doesn't start with --enable-admission-plugins=ResourceQuota
// and is not possible to start it from TEST_ARGS
// Above tests in test/e2e/node/pod_resize.go
var _ = SIGDescribe("Pod InPlace Resize Container", func() {
```
I don't want to change this here because this description is included in the name of the test in conformance.yaml (see https://github.com/kubernetes/kubernetes/pull/135067/files)
i.e. make sure there's no diff when you run ./hack/update-conformance.sh
> The WiFi isn't working at the conference, so I won't be able to review this until I get back to my hotel room tonight.
> @natasha41575 can you help review today? If you lgtm I'll try to just approve tonight.
I'm out of time today. I have a few outstanding comments here and also above, but besides those I think this largely LGTM if they are addressed.

I will also add a caveat that I did not get around to looking at the e2e tests super carefully after they were updated to make everything pass, so that still needs another eye on it.
```go
	return resizeResult
}

updateActuatedPodLevelResources := func(resourceName v1.ResourceName) error {
```
Is it possible to move this function definition to right before it's called (L903) so that the code block is within the FG?
/hold
Signed-off-by: ndixita <[email protected]>
```go
podWithoutContainerResources := pod.DeepCopy()
podWithoutContainerResources.Spec.Containers[0].Resources = v1.ResourceRequirements{}
podWithoutContainerResources.Spec.InitContainers[0].Resources = v1.ResourceRequirements{}
```
I think the pod has 2 regular containers and 2 init containers. Also, did you mean for this pod to have pod-level resources?
```go
// The implementation of CPU/Memory limit calculation is duplicated here and in
// pkg/kubelet/kuberuntime/kuberuntime_container_linux.go (getCPULimits/getMemoryLimits).
// Refactor into a shared utility function.
func GetLimits(res *ResourceOpts) v1.ResourceList {
```
nit: why the ResourceOpts struct, rather than just passing 2 parameters?
While testing I swapped the two params :| so I kept the struct for clarity. Alternatively, I could pass container resources as v1.ResourceRequirements and pod resources as *v1.ResourceRequirements.
```go
if utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodLevelResourcesVerticalScaling) {
	if err = updateActuatedPodLevelResources(rName); err != nil {
		logger.Error(err, "Failed to update pod-level actuated resources", "resource", rName, "pod", klog.KObj(pod))
		return nil
	}
```
```go
if actuatedPod.Spec.Resources == nil {
```
nit: this and the nil check below seem unnecessary.
```go
if utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodLevelResourcesVerticalScaling) {
	apiPodStatus.Resources = kl.convertToAPIPodLevelResourcesStatus(pod, oldPodStatus)
	opts := resourcehelper.PodResourcesOptions{
		SkipPodLevelResources: !utilfeature.DefaultFeatureGate.Enabled(features.PodLevelResources),
```
This should exclude overhead, right?
```go
func (kl *Kubelet) convertToAPIPodLevelResourcesStatus(allocatedPod *v1.Pod, oldPodStatus v1.PodStatus) *v1.ResourceRequirements {
	if allocatedPod.Status.Phase != v1.PodRunning {
```
I think we discussed just leaving the actual resources unset when the pod isn't running?
```go
if !opts.SkipPodLevelResources && IsPodLevelRequestsSet(pod) {
	var effectiveReqs v1.ResourceList
	if opts.InPlacePodLevelResourcesVerticalScalingEnabled && opts.UseStatusResources {
```
These changes should have unit test coverage.
```go
if podResources != nil {
```
nit: nil check is unnecessary
/approve

Looks like lint / verify has some cleanup to make happy.
/test pull-kubernetes-unit-windows-master
Signed-off-by: ndixita <[email protected]>
/retest

/retest
/test pull-kubernetes-node-e2e-containerd
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dom4ha, liggitt, ndixita, tallclair

/test pull-kubernetes-e2e-gce
Signed-off-by: ndixita <[email protected]>
/retest
@ndixita: The following tests failed.

/test pull-kubernetes-e2e-gce
```go
if oldPodInfo.Spec.Resources == nil && newPodInfo.Spec.Resources == nil {
	return false
}
return !reflect.DeepEqual(oldPodInfo.Spec.Resources, newPodInfo.Spec.Resources)
```
I didn't catch this before, but I think we previously forbade updating a pod's pod-level resources? So can this logic just be:
```diff
-if oldPodInfo.Spec.Resources == nil && newPodInfo.Spec.Resources == nil {
-	return false
-}
-return !reflect.DeepEqual(oldPodInfo.Spec.Resources, newPodInfo.Spec.Resources)
+return oldPodInfo.Spec.Resources != nil || newPodInfo.Spec.Resources != nil
```
If we do need the resource comparison, it should use api.Semantic.DeepEqual instead.
/lgtm

Some follow-up to do, but I think this is good enough for alpha.
LGTM label has been added.

Git tree hash: 87219c958ab2fc2dea5777e8039fae6ced1de7d4
/hold cancel
What type of PR is this?
/kind feature
/kind api-change
What this PR does / why we need it:
This PR adds support for IPPR + Pod Level Resources, allowing in-place resizing of pod-level resources (alpha stage).
Which issue(s) this PR is related to:
Issue: kubernetes/enhancements#5419
KEP: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/5419-pod-level-resources-in-place-resize/README.md
https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2837-pod-level-resource-spec/README.md
Does this PR introduce a user-facing change?