windows: Add runhcs-wcow-hypervisor runtimeclass to the default config #6901

Merged: kevpar merged 1 commit into containerd:main from dcantah:add-wcowhyp-runtime on Sep 8, 2022

Conversation

@dcantah (Member) commented on May 6, 2022

As part of the effort to get Windows hypervisor-isolated container support working for the CRI entrypoint here, add the runhcs-wcow-hypervisor handler to the default config. This sets the correct SandboxIsolation value that the Windows shim uses to differentiate process vs. hypervisor isolation. It additionally allows io.microsoft.container* annotations for the wcow-process runtime and io.microsoft.virtualmachine* annotations for the VM-based runtime.
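
For reference, rendered as a CRI config.toml snippet the new handler entry looks roughly like this (a sketch: the option set shown is illustrative of the shape, but SandboxIsolation = 1 is what selects hypervisor isolation):

    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor]
      runtime_type = "io.containerd.runhcs.v1"
      # Let VM-sizing annotations pass through to the shim.
      pod_annotations = ["io.microsoft.virtualmachine.*"]
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor.options]
        # 0 = process isolation, 1 = hypervisor isolation for the sandbox.
        SandboxIsolation = 1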

Note that for K8s users this runtime handler will need to be configured by creating the corresponding RuntimeClass resources on the cluster as it's not the default runtime.
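
A minimal RuntimeClass for that would look like the following (the metadata name is the caller's choice; the handler must match the runtime key in the containerd config):

    apiVersion: node.k8s.io/v1
    kind: RuntimeClass
    metadata:
      name: runhcs-wcow-hypervisor   # what pods reference via runtimeClassName
    handler: runhcs-wcow-hypervisor  # must match the containerd runtime name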

To work properly this needs a change in the shim that actually sets the HyperV field on the runtime spec; see microsoft/hcsshim#1388. Done.

PS C:\Users\TestVMAdmin\Desktop> .\crictl.exe pods
POD ID              CREATED             STATE             NAME       NAMESPACE      ATTEMPT      RUNTIME
363319e093c7a       11 minutes ago      Ready             pod        default        1            runhcs-wcow-hypervisor

PS C:\Users\TestVMAdmin\Desktop> hcsdiag list
363319e093c7a688d8d18ad17f5d384e60cb805ba7bf128ea93f0613af5483e9@vm
    VM,                         Running, C2C42EC1-5040-5D0A-AC86-F4BACF6B75CB, containerd-shim-runhcs-v1.exe

(Outdated review comment threads on pkg/cri/config/config_windows.go.)
@kevpar (Member) commented on May 6, 2022

Given that this is part of #6862, will this runtime class work as-is once added, or does it depend on other (yet to come) feature work?

@dcantah (Member Author) commented on May 6, 2022

@kevpar It needs a small change in the shim to set the HyperV runtime spec field that's mentioned in the description. Just posted the shim PR: microsoft/hcsshim#1388

@dcantah dcantah force-pushed the add-wcowhyp-runtime branch from e365018 to 8cc87af on May 6, 2022 07:52
@dcantah dcantah added this to the 1.7 milestone May 6, 2022
@dcantah dcantah force-pushed the add-wcowhyp-runtime branch from 8cc87af to 75f4f1f on May 6, 2022 17:35
@kevpar (Member) left a comment

LGTM pending CI. Should we wait till the hcsshim change is in before merging?

@dcantah (Member Author) commented on May 6, 2022

> LGTM pending CI. Should we wait till the hcsshim change is in before merging?

@kevpar I just checked it in, but if you mean wait until we vendor in a tag that contains it here, that would make sense to me. Without it you'd end up getting a process-isolated container at the moment.

@dcantah dcantah changed the title from "windows: Add runhcs-wcow-hypervisor runtimeclass to the default config" to "[Do not merge] windows: Add runhcs-wcow-hypervisor runtimeclass to the default config" on May 6, 2022
@dcantah dcantah changed the title from "[Do not merge] windows: Add runhcs-wcow-hypervisor runtimeclass to the default config" back to "windows: Add runhcs-wcow-hypervisor runtimeclass to the default config" on May 6, 2022
@dcantah dcantah marked this pull request as draft May 6, 2022 20:54
@dcantah (Member Author) commented on May 6, 2022

Gonna leave this as a draft until we get an hcsshim tag with the work to make use of this, in microsoft/hcsshim@18f4761.

Review comment thread on pkg/cri/config/config_windows.go, on these lines:

    // ScaleCpuLimitsToSandbox indicates that container CPU limits should
    // be adjusted to account for the difference in number of cores between the
    // host and UVM. This should only be turned on if SandboxIsolation is 1.
    "ScaleCpuLimitsToSandbox": true,
@jterry75 (Contributor) commented:
Can we be more specific here? Is this Shares/Count/Maximum? Or is it just Maximum?

@dcantah (Member Author) replied:

Done, let me know if that's sufficient

@jterry75 (Contributor) replied:

Hmm. The more I think about this, I don't know if it's a correct "static" answer. When I create a spec I will do things like this:

C1.Max = 25% * 10000 == 2500
C2.Max = 50% * 10000 == 5000
etc.

When I create the default pod size as 4 vCPU vs. 8 vCPU, I still want that ratio, since it's "percentage" based. I only want that ratio to change when it's "units" (millicores) based. So for example, if the spec was:

C1.Cpus == "1500m"
C2.Cpus == "3000m"

then indeed I have to use "maximum" and I would indeed need it scaled relative to the UVM size. However, given the above and the fact that the pod size is created before the containers, it seems likely that the caller knows the size of the pod as a whole. IE: it will be CEILING(3 + 1.5) == 5 vCPU for the UVM. Given that they know it's 5 vCPUs when they schedule, shouldn't the caller then scale the millicores to the pod size?

But I can see why a caller wouldn't want this logic as well, and would want automagic behavior. Should this be dynamic, or at least overrideable when scheduling a pod?

Am I crazy? (<- yes)

@dcantah (Member Author) replied:

This comment has the background behind the field's existence: https://github.com/microsoft/hcsshim/blob/main/internal/hcsoci/hcsdoc_wcow.go#L195-L237

Kind of in the same realm: the thing that sucks about the hyper-v multi-container pod work at the moment is that, unlike Kata, we can't hot-add CPUs/memory at runtime, so a client/user needs to supply the UVM size up front by calculating what the containers will end up using in total. The Kata folks actually added a change to k8s to have the kubelet itself do this tallying and just pass it to the runtime, as they'd prefer not to hot-add. From what I remember it's only tallied for Linux pods though 😿 https://github.com/containerd/containerd/blob/main/pkg/cri/annotations/annotations.go#L35-L39. We need to do the same for Windows for this hyper-v work; it'd be a much better story.
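
The annotations at that link are, from memory, the kubelet-tallied pod-level resource hints; a sketch of those Go constants (names per containerd's pkg/cri/annotations, cited from memory):

    // Pod-level resource hints the kubelet tallies for VM-based runtimes.
    SandboxCPUPeriod = "io.kubernetes.cri.sandbox-cpu-period"
    SandboxCPUQuota  = "io.kubernetes.cri.sandbox-cpu-quota"
    SandboxCPUShares = "io.kubernetes.cri.sandbox-cpu-shares"
    SandboxMem       = "io.kubernetes.cri.sandbox-mem"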

@kevpar (Member) replied:

In previous testing we found that k8s tests will set a CPU % limit based on the host core count, then measure how much CPU the guest gets, and expect it to be equal to that % of the host count. Once we introduced hv-isolation, the default of scaling the % to the guest core count became an issue: the k8s tests failed because they did not get as much CPU as expected.

This setting just says "when the orchestrator asks for 50% CPU, make it 50% of the host, rather than 50% of the UVM". In practice this seems to match the expectations (like those of the tests) of callers setting this value for process isolation today, who would probably be surprised if switching to hv-isolation suddenly changed things.
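
To make that concrete with illustrative numbers: say the host has 8 cores, the UVM has 4, and the orchestrator asks for a 25% CPU limit. With ScaleCpuLimitsToSandbox the limit applied inside the UVM is scaled by the host/UVM core ratio:

    scaled limit = requested % * (host cores / UVM cores)
                 = 25% * (8 / 4)
                 = 50% of the UVM's 4 cores (2 cores' worth == 25% of the host's 8)

Without the scaling, 25% of the UVM would be only 1 core's worth, half of what the same limit yields under process isolation.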

@dcantah (Member Author) replied:

@jterry75 Does the above address the concerns/comments you had?

@jterry75 (Contributor) replied:

Yea, it addresses it. I'm just still not sure how it works from a k8s perspective, because I don't use % in the pod spec; I use core counts. But I will take your word for it that you have thought through this rock-solid plan :)

@dcantah dcantah force-pushed the add-wcowhyp-runtime branch from 75f4f1f to 7768ac2 on May 10, 2022 23:37
@dcantah dcantah closed this May 11, 2022
@dcantah dcantah reopened this May 11, 2022
@dcantah dcantah marked this pull request as ready for review August 17, 2022 20:22
@dcantah (Member Author) commented on Aug 17, 2022

Need to rebase this finally 🐳, checked in a shim tag that will allow this.

As part of the effort of getting hypervisor isolated windows container
support working for the CRI entrypoint here, add the runhcs-wcow-hypervisor
handler for the default config. This sets the correct SandboxIsolation
value that the Windows shim uses to differentiate process vs. hypervisor
isolation. This change additionally sets the wcow-process runtime to
passthrough io.microsoft.container* annotations and the hypervisor runtime
to accept io.microsoft.virtualmachine* annotations.

Note that for K8s users this runtime handler will need to be configured by
creating the corresponding RuntimeClass resources on the cluster as it's
not the default runtime.

Signed-off-by: Daniel Canter <[email protected]>
@dcantah dcantah force-pushed the add-wcowhyp-runtime branch from 7768ac2 to f0036cb on August 19, 2022 14:59
@kevpar kevpar merged commit de509c0 into containerd:main Sep 8, 2022