Merged
23 changes: 22 additions & 1 deletion pkg/cri/config/config_windows.go
@@ -40,7 +40,28 @@ func DefaultConfig() PluginConfig {
NoPivot: false,
Runtimes: map[string]Runtime{
"runhcs-wcow-process": {
Type: "io.containerd.runhcs.v1",
ContainerAnnotations: []string{"io.microsoft.container.*"},
},
"runhcs-wcow-hypervisor": {
Type: "io.containerd.runhcs.v1",
PodAnnotations: []string{"io.microsoft.virtualmachine.*"},
ContainerAnnotations: []string{"io.microsoft.container.*"},
// Full set of Windows shim options:
// https://pkg.go.dev/github.com/Microsoft/hcsshim/cmd/containerd-shim-runhcs-v1/options#Options
Options: map[string]interface{}{
// SandboxIsolation specifies the isolation level of the sandbox.
// PROCESS (0) and HYPERVISOR (1) are the valid options.
"SandboxIsolation": 1,
// ScaleCpuLimitsToSandbox indicates that the container's CPU
// maximum value (the portion of processor cycles a container can
// use, expressed as a percentage times 100) should be adjusted
// to account for the difference in the number of cores between the
// host and UVM.
//
// This should only be turned on if SandboxIsolation is 1.
"ScaleCpuLimitsToSandbox": true,
Contributor
Can we be more specific here? Is this Shares/Count/Maximum? Or is it just Maximum?

Member Author
Done, let me know if that's sufficient

Contributor
Hmm. The more I think about this, I don't know if it's a correct "static" answer. When I create a spec I will do things like this:

C1.Max = 25% * 100 == 2500
C2.Max = 50% * 100 == 5000
etc.

When I create the default pod size as 4 vCPU vs 8 vCPU, I still want that ratio since it's "percentage" based. I only want that ratio to change when it's "units" (millicores) based. So for example if the spec was:

C1.Cpus == "1500m"
C2.Cpus == "3000m"

Then indeed I have to use "maximum" and I would indeed then need it scaled relative to the UVM size. However, given the above and the fact that the pod size is created before the containers, it seems likely that the caller knows the size of the pod as a whole, i.e. it will be CEILING(3 + 1.5) == 5 vCPU for the UVM. Given that they know it's 5 vCPUs when they schedule, shouldn't the caller then scale the millicores to the pod size?

But I can see why a caller wouldn't want this logic and would prefer automagic behavior. Should this be dynamic, or at least overridable, when scheduling a pod?

Am I crazy? (<- yes)

Member Author
This comment has the background behind the field's existence: https://github.com/microsoft/hcsshim/blob/main/internal/hcsoci/hcsdoc_wcow.go#L195-L237

Kind of in the same realm: the thing that sucks about the hyper-v multi-container pod work atm is that, unlike Kata, we can't hot-add cpus/mem at runtime, so a client/user needs to supply the UVM size up front by calculating what the containers will end up using in total. The Kata folks actually added a change to k8s to have the kubelet itself do this tallying and then just pass it to the runtime, as they'd prefer not to hot-add. From what I remember it's only tallied for Linux pods though 😿 https://github.com/containerd/containerd/blob/main/pkg/cri/annotations/annotations.go#L35-L39. We need to do the same for Windows for this hyper-v work; it'd be a much better story.

Member
In previous testing we found that k8s tests will set a CPU % limit based on the host core count, and then measure how much CPU the guest gets, and expect it to be equal to that % of the host count. Once we introduced hv-isolation, the default of scaling the % to the guest core count became an issue as the k8s tests failed as they did not get as much CPU as expected.

This setting just says "when the orchestrator asks for 50% CPU, make it 50% of the host, rather than 50% of the UVM". In practice this seems to match expectations (like those of the tests) which are setting this value for process-isolation today, and would probably be surprised if switching to hv-isolation suddenly changed things.
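That behavior can be sketched as a simple rescaling of the container's maximum by the host-to-UVM core ratio. This is a hedged illustration only: the real logic lives in hcsshim, and the function name here is hypothetical. It again assumes the percentage * 100 convention, so 10000 == 100% of the machine the rate is enforced on:

```go
package main

import "fmt"

// scaleCPUMaximumToSandbox rescales a CPU maximum (percentage * 100,
// so 5000 == 50%) that the orchestrator computed against the host's
// core count, so that when it is enforced inside a UVM with a
// different core count it still grants the same share of the host.
func scaleCPUMaximumToSandbox(maximum, hostCores, uvmCores int) int {
	scaled := maximum * hostCores / uvmCores
	// The enforced rate is capped at 100% of the UVM and floored at
	// a minimum of 1, since the UVM cannot grant more than it has.
	if scaled > 10000 {
		scaled = 10000
	}
	if scaled < 1 {
		scaled = 1
	}
	return scaled
}

func main() {
	// "50% CPU" computed against an 8-core host, enforced in a
	// 4-core UVM: the rate doubles so the container still gets
	// roughly 4 host cores' worth of cycles.
	fmt.Println(scaleCPUMaximumToSandbox(5000, 8, 4)) // 10000

	// Host and UVM core counts match: the value passes through.
	fmt.Println(scaleCPUMaximumToSandbox(2500, 8, 8)) // 2500
}
```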

Member Author
@jterry75 Does the above address the concerns/comments you had?

Contributor
Yea, it addresses it. I'm just still not sure how it works from a k8s perspective, because I don't use % in the pod spec; I use core counts. But I will take your word for it that you have thought through this rock-solid plan :)

},
},
},
},
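For reference, the defaults added above correspond to a user-facing containerd `config.toml` along these lines. This is a sketch only: the section paths follow the containerd 1.x CRI plugin conventions and should be checked against the containerd version in use.

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor]
  runtime_type = "io.containerd.runhcs.v1"
  pod_annotations = ["io.microsoft.virtualmachine.*"]
  container_annotations = ["io.microsoft.container.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor.options]
    # SandboxIsolation: PROCESS (0) or HYPERVISOR (1)
    SandboxIsolation = 1
    # Scale container CPU maximums by the host/UVM core ratio
    ScaleCpuLimitsToSandbox = true
```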