windows: Add runhcs-wcow-hypervisor runtimeclass to the default config #6901
kevpar merged 1 commit into containerd:main
Conversation
Given that this is part of #6862, will this runtime class work as-is once added, or does it depend on other (yet-to-come) feature work?

@kevpar It needs a small change in the shim to set the HyperV runtime spec field mentioned in the description. Just posted the shim PR here: microsoft/hcsshim#1388
(force-pushed: e365018 → 8cc87af → 75f4f1f)
kevpar left a comment:
LGTM pending CI. Should we wait until the hcsshim change is in before merging?

@kevpar I just checked it in, but if you mean waiting until we vendor in a tag here that contains it, that would make sense to me. Without it, you'd end up getting a process-isolated container at the moment.

Gonna leave this as a draft until we get an hcsshim tag with the work (microsoft/hcsshim@18f4761) to make use of this.
// ScaleCpuLimitsToSandbox indicates that container CPU limits should
// be adjusted to account for the difference in number of cores between the
// host and UVM. This should only be turned on if SandboxIsolation is 1.
"ScaleCpuLimitsToSandbox": true,
Can we be more specific here? Is this Shares/Count/Maximum? Or is it just Maximum?
I'll update to be more specific. It's referring to Maximum only: https://github.com/microsoft/hcsshim/blob/25b67340dfe7eb35a4591fddf97025cf7e417f69/internal/hcsoci/hcsdoc_wcow.go#L195
Done, let me know if that's sufficient
Hmm. The more I think about this, I don't know if it's a correct "static" answer. When I create a spec, I will do things like this:
C1.Max = 25% * 100 == 2500
C2.Max = 50% * 100 == 5000
etc.
When I create the default pod size as 4 vCPU vs 8 vCPU, I still want that ratio since it's "percentage" based. I only want that ratio to change when it's "units" (millicores) based. So for example, if the spec was:
C1.Cpus == "1500m"
C2.Cpus == "3000m"
then indeed I have to use "maximum", and I would indeed need it scaled out of the UVM size. However, given the above and the fact that the pod size is created before the containers, it seems likely that the caller knows the size of the pod as a whole. IE: it will be CEILING(3 + 1.5) == 5 vCPU for the UVM. Given that they know it's 5 vCPUs when they schedule, shouldn't the caller then scale the millicores to the pod size?
But I can see why a caller wouldn't want this logic as well and would prefer automagic behavior. Should this be dynamic, or at least overrideable, when scheduling a pod?
Am I crazy? (<- yes)
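The percentage-vs-millicores distinction above can be sketched in Go. The helper names and the 1..10000 job-object "CPU maximum" scale are illustrative assumptions for this thread, not hcsshim code:

```go
package main

import (
	"fmt"
	"math"
)

// cpuMaxFromPercent converts a percentage limit (e.g. 25%) to the
// 1..10000 "CPU maximum" scale used by Windows job objects. Percentage
// limits are ratio-based, so the value is the same no matter how many
// cores the machine has.
func cpuMaxFromPercent(percent float64) int {
	return int(percent * 100)
}

// cpuMaxFromMillicores converts a millicore limit (e.g. "1500m" -> 1500)
// to a CPU maximum relative to a machine with cpuCount cores. Unlike a
// percentage, the result depends on which core count it is measured
// against, which is exactly the host-vs-UVM scaling question here.
func cpuMaxFromMillicores(millicores, cpuCount int) int {
	max := int(math.Ceil(float64(millicores) / float64(cpuCount*1000) * 10000))
	if max > 10000 {
		max = 10000
	}
	return max
}

func main() {
	fmt.Println(cpuMaxFromPercent(25))         // 2500 on any machine
	fmt.Println(cpuMaxFromMillicores(1500, 8)) // 1875 against an 8-core host
	fmt.Println(cpuMaxFromMillicores(1500, 5)) // 3000 against a 5-vCPU UVM
}
```

The same 25% limit yields 2500 everywhere, but the same 1500m limit maps to different maximums on the host and in the UVM, which is why only millicore-derived limits need rescaling.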
This comment has the background behind the field's existence: https://github.com/microsoft/hcsshim/blob/main/internal/hcsoci/hcsdoc_wcow.go#L195-L237
Kind of in the same realm: the thing that sucks about the hyper-v multi-container pod work atm is that, unlike Kata, we can't hot-add CPUs/memory at runtime, so a client/user needs to supply the UVM size up front by calculating what the containers will end up using in total. The Kata folks actually added a change to k8s to have the kubelet itself do this tallying and then just pass it to the runtime, as they'd prefer not to hot-add. From what I remember it's only tallied for Linux pods though 😿 https://github.com/containerd/containerd/blob/main/pkg/cri/annotations/annotations.go#L35-L39. We need to do the same for Windows for this hyper-v work; it'd be a much better story.
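The up-front tallying described here can be sketched like so (podVCPUCount is an illustrative helper, not a containerd or hcsshim API):

```go
package main

import "fmt"

// podVCPUCount sketches the sizing a client must do before creating the
// pod: sum the containers' millicore limits and round up to whole vCPUs
// to pick the UVM size, since CPUs can't be hot-added later.
func podVCPUCount(millicores ...int) int {
	total := 0
	for _, m := range millicores {
		total += m
	}
	return (total + 999) / 1000 // ceiling division to whole vCPUs
}

func main() {
	// CEILING(3000m + 1500m) == 5 vCPUs for the UVM, per the example above.
	fmt.Println(podVCPUCount(3000, 1500)) // 5
}
```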
In previous testing we found that k8s tests will set a CPU % limit based on the host core count, then measure how much CPU the guest gets, and expect it to equal that % of the host count. Once we introduced hv-isolation, the default of scaling the % to the guest core count became an issue: the k8s tests failed because they did not get as much CPU as expected.
This setting just says "when the orchestrator asks for 50% CPU, make it 50% of the host, rather than 50% of the UVM". In practice this seems to match the expectations of callers (like those tests) which set this value for process isolation today, and who would probably be surprised if switching to hv-isolation suddenly changed things.
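The semantics described here can be sketched as follows. The exact formula is an assumption based on the linked hcsshim comment, not verbatim hcsshim code: with the setting off, a job-object maximum is interpreted against the UVM's cores; with it on, the maximum is stretched by the host:UVM core ratio so the container receives roughly that percentage of the host's CPU instead.

```go
package main

import "fmt"

// effectiveCPUMax sketches ScaleCpuLimitsToSandbox: scaleToSandbox=false
// leaves the limit relative to the UVM; scaleToSandbox=true stretches it
// by hostCores/uvmCores so the absolute CPU matches the host-relative
// percentage the orchestrator asked for (capped at 100%).
func effectiveCPUMax(max, hostCores, uvmCores int, scaleToSandbox bool) int {
	if !scaleToSandbox {
		return max
	}
	scaled := max * hostCores / uvmCores
	if scaled > 10000 { // job-object maximums top out at 100%
		scaled = 10000
	}
	return scaled
}

func main() {
	// A 25% limit on an 8-core host, running in a 4-core UVM:
	fmt.Println(effectiveCPUMax(2500, 8, 4, false)) // 2500 -> 25% of the UVM (1 core)
	fmt.Println(effectiveCPUMax(2500, 8, 4, true))  // 5000 -> 50% of the UVM (2 cores, i.e. 25% of the host)
}
```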
@jterry75 Does the above address the concerns/comments you had?
Yea, it addresses it. I'm just still not sure how it works from a k8s perspective, because I don't use % in the pod spec; I use core counts. But I will take your word for it that you have thought through this rock-solid plan :)
(force-pushed: 75f4f1f → 7768ac2)
Need to rebase this finally 🐳; checked in a shim tag that will allow this.
As part of the effort of getting hypervisor isolated Windows container support working for the CRI entrypoint here, add the runhcs-wcow-hypervisor handler to the default config. This sets the correct SandboxIsolation value that the Windows shim uses to differentiate process vs. hypervisor isolation. This change additionally sets the wcow-process runtime to pass through io.microsoft.container* annotations and the hypervisor runtime to accept io.microsoft.virtualmachine* annotations.

Note that for K8s users this runtime handler will need to be configured by creating the corresponding RuntimeClass resources on the cluster, as it's not the default runtime.

Signed-off-by: Daniel Canter <[email protected]>
(force-pushed: 7768ac2 → f0036cb)
As part of the effort of getting Windows hypervisor isolated container support working for the CRI entrypoint here, add the runhcs-wcow-hypervisor handler to the default config. This sets the correct SandboxIsolation value that the Windows shim uses to differentiate process vs. hypervisor isolation. This additionally allows io.microsoft.container* annotations for the wcow-process runtime and io.microsoft.virtualmachine* annotations for the vm based runtime.
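A rough sketch of the runtime entry this adds; the key names follow containerd's CRI config conventions, but treat the exact shape as an assumption and refer to the merged default config for the authoritative version:

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor]
  runtime_type = "io.containerd.runhcs.v1"
  # Allow VM-scoped annotations through to the shim.
  pod_annotations = ["io.microsoft.virtualmachine.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor.options]
    # Tells the Windows shim to use hypervisor rather than process isolation.
    SandboxIsolation = 1
```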
Note that for K8s users this runtime handler will need to be configured by creating the corresponding RuntimeClass resources on the cluster as it's not the default runtime.
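For reference, the RuntimeClass resource mentioned here would look roughly like this sketch; the metadata name is the user's choice, while the handler must match the runtime name in the containerd config:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: runhcs-wcow-hypervisor  # any name; pods reference it via runtimeClassName
handler: runhcs-wcow-hypervisor # must match the containerd runtime name
```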
This needs a change in the shim that actually sets the HyperV field on the runtime spec to work properly; see microsoft/hcsshim#1388.