windows: Add runhcs-wcow-hypervisor runtimeclass to the default config #6901
kevpar merged 1 commit into containerd:main
Conversation
Given that this is part of #6862, will this runtime class work as-is once added, or does it depend on other (yet-to-come) feature work?

@kevpar It needs a small change in the shim to set the HyperV runtime spec field mentioned in the description. Just posted the shim PR here: microsoft/hcsshim#1388
(force-pushed: e365018 → 8cc87af → 75f4f1f)
kevpar left a comment:
LGTM pending CI. Should we wait until the hcsshim change is in before merging?

@kevpar I just checked it in, but if you mean waiting until we vendor in a tag here that contains it, that would make sense to me. Without it, you'd end up getting a process-isolated container at the moment.

Gonna leave this as a draft until we get an hcsshim tag with the work (microsoft/hcsshim@18f4761) to make use of this.
// ScaleCpuLimitsToSandbox indicates that container CPU limits should
// be adjusted to account for the difference in number of cores between the
// host and UVM. This should only be turned on if SandboxIsolation is 1.
"ScaleCpuLimitsToSandbox": true,
Can we be more specific here? Is this Shares/Count/Maximum? Or is it just Maximum?
I'll update to be more specific. It's referring to Maximum only: https://github.com/microsoft/hcsshim/blob/25b67340dfe7eb35a4591fddf97025cf7e417f69/internal/hcsoci/hcsdoc_wcow.go#L195
Done, let me know if that's sufficient
Hmm. The more I think about this, I don't know if it's a correct "static" answer. When I create a spec, I will do things like this:
C1.Max = 25% * 100 == 2500
C2.Max = 50% * 100 == 5000
etc.
When I create the default pod size as 4 vCPU vs 8 vCPU, I still want that ratio since it's "percentage" based. I only want that ratio to change when it's "units" (millicores) based. So for example, if the spec was:
C1.Cpus == "1500m"
C2.Cpus == "3000m"
then indeed I have to use "maximum", and I would indeed need it scaled out of the UVM size. However, given the above and the fact that the pod size is created before the containers, it seems likely that the caller knows the size of the pod as a whole. IE: it will be CEILING(3 + 1.5) == 5 vCPU for the UVM. Given that they know it's 5 vCPUs when they schedule, shouldn't the caller then scale the millicores to the pod size?
But I can see why a caller wouldn't want this logic as well and would prefer automagic behavior. Should this be dynamic, or at least overrideable, when scheduling a pod?
Am I crazy? (<- yes)
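The percentage-vs-millicores distinction above can be sketched in Go. The helper names and the 1..10000 job-object "CPU maximum" scale are illustrative assumptions for this thread, not hcsshim code:

```go
package main

import (
	"fmt"
	"math"
)

// cpuMaxFromPercent converts a percentage limit (e.g. 25%) to the
// 1..10000 "CPU maximum" scale used by Windows job objects. Percentage
// limits are ratio-based, so the value is the same no matter how many
// cores the machine has.
func cpuMaxFromPercent(percent float64) int {
	return int(percent * 100)
}

// cpuMaxFromMillicores converts a millicore limit (e.g. "1500m" -> 1500)
// to a CPU maximum relative to a machine with cpuCount cores. Unlike a
// percentage, the result depends on which core count it is measured
// against, which is exactly the host-vs-UVM scaling question here.
func cpuMaxFromMillicores(millicores, cpuCount int) int {
	max := int(math.Ceil(float64(millicores) / float64(cpuCount*1000) * 10000))
	if max > 10000 {
		max = 10000
	}
	return max
}

func main() {
	fmt.Println(cpuMaxFromPercent(25))         // 2500 on any machine
	fmt.Println(cpuMaxFromMillicores(1500, 8)) // 1875 against an 8-core host
	fmt.Println(cpuMaxFromMillicores(1500, 5)) // 3000 against a 5-vCPU UVM
}
```

The same 25% limit yields 2500 everywhere, but the same 1500m limit maps to different maximums on the host and in the UVM, which is why only millicore-derived limits need rescaling.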
This comment has the background behind the field's existence: https://github.com/microsoft/hcsshim/blob/main/internal/hcsoci/hcsdoc_wcow.go#L195-L237
Kind of in the same realm: the thing that sucks about the hyper-v multi-container pod work atm is that, unlike Kata, we can't hot-add CPUs/memory at runtime, so a client/user needs to supply the UVM size up front by calculating what the containers will end up using in total. The Kata folks actually added a change to k8s to have the kubelet itself do this tallying and then just pass it to the runtime, as they'd prefer not to hot-add. From what I remember it's only tallied for Linux pods though 😿 https://github.com/containerd/containerd/blob/main/pkg/cri/annotations/annotations.go#L35-L39. We need to do the same for Windows for this hyper-v work; it'd be a much better story.
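The up-front tallying described here can be sketched like so (podVCPUCount is an illustrative helper, not a containerd or hcsshim API):

```go
package main

import "fmt"

// podVCPUCount sketches the sizing a client must do before creating the
// pod: sum the containers' millicore limits and round up to whole vCPUs
// to pick the UVM size, since CPUs can't be hot-added later.
func podVCPUCount(millicores ...int) int {
	total := 0
	for _, m := range millicores {
		total += m
	}
	return (total + 999) / 1000 // ceiling division to whole vCPUs
}

func main() {
	// CEILING(3000m + 1500m) == 5 vCPUs for the UVM, per the example above.
	fmt.Println(podVCPUCount(3000, 1500)) // 5
}
```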
In previous testing we found that k8s tests will set a CPU % limit based on the host core count, then measure how much CPU the guest gets, and expect it to equal that % of the host count. Once we introduced hv-isolation, the default of scaling the % to the guest core count became an issue: the k8s tests failed because they did not get as much CPU as expected.
This setting just says "when the orchestrator asks for 50% CPU, make it 50% of the host, rather than 50% of the UVM". In practice this seems to match the expectations of callers (like those tests) which set this value for process isolation today, and who would probably be surprised if switching to hv-isolation suddenly changed things.
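The semantics described here can be sketched as follows. The exact formula is an assumption based on the linked hcsshim comment, not verbatim hcsshim code: with the setting off, a job-object maximum is interpreted against the UVM's cores; with it on, the maximum is stretched by the host:UVM core ratio so the container receives roughly that percentage of the host's CPU instead.

```go
package main

import "fmt"

// effectiveCPUMax sketches ScaleCpuLimitsToSandbox: scaleToSandbox=false
// leaves the limit relative to the UVM; scaleToSandbox=true stretches it
// by hostCores/uvmCores so the absolute CPU matches the host-relative
// percentage the orchestrator asked for (capped at 100%).
func effectiveCPUMax(max, hostCores, uvmCores int, scaleToSandbox bool) int {
	if !scaleToSandbox {
		return max
	}
	scaled := max * hostCores / uvmCores
	if scaled > 10000 { // job-object maximums top out at 100%
		scaled = 10000
	}
	return scaled
}

func main() {
	// A 25% limit on an 8-core host, running in a 4-core UVM:
	fmt.Println(effectiveCPUMax(2500, 8, 4, false)) // 2500 -> 25% of the UVM (1 core)
	fmt.Println(effectiveCPUMax(2500, 8, 4, true))  // 5000 -> 50% of the UVM (2 cores, i.e. 25% of the host)
}
```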
@jterry75 Does the above address the concerns/comments you had?
Yea, it addresses it. I'm just still not sure how it works from a k8s perspective, because I don't use % in the pod spec; I use core counts. But I will take your word for it that you have thought through this rock-solid plan :)
(force-pushed: 75f4f1f → 7768ac2)
Need to rebase this finally 🐳; checked in a shim tag that will allow this.
As part of the effort of getting hypervisor isolated Windows container support working for the CRI entrypoint here, add the runhcs-wcow-hypervisor handler to the default config. This sets the correct SandboxIsolation value that the Windows shim uses to differentiate process vs. hypervisor isolation. This change additionally sets the wcow-process runtime to pass through io.microsoft.container* annotations and the hypervisor runtime to accept io.microsoft.virtualmachine* annotations.

Note that for K8s users this runtime handler will need to be configured by creating the corresponding RuntimeClass resources on the cluster, as it's not the default runtime.

Signed-off-by: Daniel Canter <[email protected]>
(force-pushed: 7768ac2 → f0036cb)
As part of the effort of getting Windows hypervisor isolated container support working for the CRI entrypoint here, add the runhcs-wcow-hypervisor handler to the default config. This sets the correct SandboxIsolation value that the Windows shim uses to differentiate process vs. hypervisor isolation. This additionally allows io.microsoft.container* annotations for the wcow-process runtime and io.microsoft.virtualmachine* annotations for the vm based runtime.
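A rough sketch of the runtime entry this adds; the key names follow containerd's CRI config conventions, but treat the exact shape as an assumption and refer to the merged default config for the authoritative version:

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor]
  runtime_type = "io.containerd.runhcs.v1"
  # Allow VM-scoped annotations through to the shim.
  pod_annotations = ["io.microsoft.virtualmachine.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor.options]
    # Tells the Windows shim to use hypervisor rather than process isolation.
    SandboxIsolation = 1
```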
Note that for K8s users this runtime handler will need to be configured by creating the corresponding RuntimeClass resources on the cluster as it's not the default runtime.
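For reference, the RuntimeClass resource mentioned here would look roughly like this sketch; the metadata name is the user's choice, while the handler must match the runtime name in the containerd config:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: runhcs-wcow-hypervisor  # any name; pods reference it via runtimeClassName
handler: runhcs-wcow-hypervisor # must match the containerd runtime name
```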
This needs a change in the shim that actually sets the HyperV field on the runtime spec to work properly; see microsoft/hcsshim#1388.