
Windows Hyper-V Container Support For CRI #6862

@dcantah

What is the problem you're trying to solve

We'd like to support launching hypervisor-isolated Windows containers through the CRI entry point to light up this scenario for K8s. Containerd itself can already launch Hyper-V containers via the WithWindowsHyperV client option, as well as through the ctr testing tool's --isolation flag; however, nothing in the CRI plugin makes use of this functionality at the moment.
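For reference, a minimal sketch of what launching a Hyper-V isolated container through the existing client API looks like today; the pipe address, namespace, container ID, and image below are illustrative:

package main

import (
    "context"

    "github.com/containerd/containerd"
    "github.com/containerd/containerd/namespaces"
    "github.com/containerd/containerd/oci"
)

func main() {
    // Default containerd named pipe on Windows; adjust for your install.
    client, err := containerd.New(`\\.\pipe\containerd-containerd`)
    if err != nil {
        panic(err)
    }
    defer client.Close()

    ctx := namespaces.WithNamespace(context.Background(), "default")

    image, err := client.GetImage(ctx, "mcr.microsoft.com/windows/nanoserver:1809")
    if err != nil {
        panic(err)
    }

    // WithWindowsHyperV fills in the Windows.HyperV field of the runtime
    // spec, which tells the runhcs shim to use hypervisor isolation.
    container, err := client.NewContainer(ctx, "wcow-hyperv-test",
        containerd.WithNewSnapshot("wcow-hyperv-test-snap", image),
        containerd.WithNewSpec(oci.WithImageConfig(image), oci.WithWindowsHyperV),
    )
    if err != nil {
        panic(err)
    }
    defer container.Delete(ctx, containerd.WithSnapshotCleanup)
}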

Describe the solution you'd like

There are a few spots that would need to change to add "full" support, but for the 1.7 timeframe, getting in the minimal amount needed to launch and manage these containers doesn't require a great deal.

Initial Support (1.7 timeframe)

Filling in the HyperV runtime spec field

The Windows Containerd shim exposes a SandboxIsolation enum that can be used to tell the shim what kind of container/pod to launch. This field, in combination with new runtime class definitions in Containerd, is how we can differentiate between process and hypervisor isolation for Windows (on the Kubernetes side, a RuntimeClass object maps the pod's runtimeClassName to the handler of the same name in Containerd's config). Below is an example pod spec and runtime class definition in Containerd's config file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wcow-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: wcow
  template:
    metadata:
      labels:
        app: wcow
    spec:
      runtimeClassName: runhcs-wcow-hypervisor  <----------------
      containers:
      - name: servercore
        image: mcr.microsoft.com/windows/servercore:1809
        ports:
        - containerPort: 80
          protocol: TCP
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor]
        base_runtime_spec = ""
        cni_conf_dir = ""
        cni_max_conf_num = 0
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = false
        runtime_engine = ""
        runtime_path = ""
        runtime_root = ""
        runtime_type = "io.containerd.runhcs.v1"
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor.options]
          Debug = true
          DebugType = 2
          SandboxImage = "mcr.microsoft.com/windows/servercore:1809"
          SandboxPlatform = "windows/amd64"
          SandboxIsolation = 1 <-------------------
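For context, SandboxIsolation = 1 above maps to the HYPERVISOR member of the shim's isolation enum (0, the default, is process isolation). A reference sketch of the two values; in real code these are the generated options.Options_PROCESS and options.Options_HYPERVISOR constants from the shim options package:

package main

import "fmt"

// SandboxIsolation mirrors the isolation enum in the runhcs shim options
// proto; sketched here for reference only.
type SandboxIsolation int32

const (
    Process    SandboxIsolation = 0 // default: process-isolated containers
    Hypervisor SandboxIsolation = 1 // hypervisor isolation, selected by the config above
)

func main() {
    fmt.Println("SandboxIsolation =", int32(Hypervisor)) // prints 1
}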

We could additionally expand the default CRI config that Containerd generates for Windows when one isn't supplied in the config file. We would have to keep updating this to include new runtimes any time a new OS release/container image pair is made available.

// DefaultConfig returns default configurations of CRI plugin.
func DefaultConfig() PluginConfig {
    //
    // New Additions
    //
    ws2019Opts := options.Options{
        SandboxImage:     "mcr.microsoft.com/windows/nanoserver:1809",
        SandboxPlatform:  "windows/amd64",
        SandboxIsolation: options.Options_HYPERVISOR,
    }
    ws2022Opts := options.Options{
        SandboxImage:     "mcr.microsoft.com/windows/nanoserver:ltsc2022",
        SandboxPlatform:  "windows/amd64",
        SandboxIsolation: options.Options_HYPERVISOR,
    }
    // 
    // End of new additions
    //
    return PluginConfig{
        CniConfig: CniConfig{
            NetworkPluginBinDir:       filepath.Join(os.Getenv("ProgramFiles"), "containerd", "cni", "bin"),
            NetworkPluginConfDir:      filepath.Join(os.Getenv("ProgramFiles"), "containerd", "cni", "conf"),
            NetworkPluginMaxConfNum:   1,
            NetworkPluginConfTemplate: "",
        },
        ContainerdConfig: ContainerdConfig{
            Snapshotter:        containerd.DefaultSnapshotter,
            DefaultRuntimeName: "runhcs-wcow-process",
            NoPivot:            false,
            Runtimes: map[string]Runtime{
                "runhcs-wcow-process": {
                    Type:                 "io.containerd.runhcs.v1",
                    ContainerAnnotations: []string{"io.microsoft.container.*"},
                },
                //
                // New additions
                //
                "runhcs-wcow-hypervisor-1809": {
                    Type:                 "io.containerd.runhcs.v1",
                    PodAnnotations:       []string{"io.microsoft.virtualmachine.*"},
                    ContainerAnnotations: []string{"io.microsoft.container.*"},
                    Options:              ws2019Opts,
                },
                "runhcs-wcow-hypervisor-17763": {
                    Type:                 "io.containerd.runhcs.v1",
                    PodAnnotations:       []string{"io.microsoft.virtualmachine.*"},
                    ContainerAnnotations: []string{"io.microsoft.container.*"},
                    Options:              ws2019Opts,
                },
                "runhcs-wcow-hypervisor-20348": {
                    Type:                 "io.containerd.runhcs.v1",
                    PodAnnotations:       []string{"io.microsoft.virtualmachine.*"},
                    ContainerAnnotations: []string{"io.microsoft.container.*"},
                    Options:              ws2022Opts,
                },
                "runhcs-wcow-hypervisor-21H2": {
                    Type:                 "io.containerd.runhcs.v1",
                    PodAnnotations:       []string{"io.microsoft.virtualmachine.*"},
                    ContainerAnnotations: []string{"io.microsoft.container.*"},
                    Options:              ws2022Opts,
                },
                //
                // End of new additions
                //
            },
        },
        // … Omitted other fields …
    }
}

Resource Limits For the VM

One way the Windows shim supports setting resource limits (memory, vCPU count) for the lightweight VM is via annotations. The virtual machine based annotations all begin with io.microsoft.virtualmachine.*, so, building on the section above, we would allow these annotations through the PodAnnotations and ContainerAnnotations fields as shown.

An example pod spec asking for the VM hosting the pod's containers to boot with 4GB of memory and 4 virtual processors is below:

apiVersion: v1
kind: Pod
metadata:
  name: wcow-test
  labels:
    app: wcow
  annotations:
    io.microsoft.virtualmachine.computetopology.memory.sizeinmb: "4096"
    io.microsoft.virtualmachine.computetopology.processor.count: "4"
spec:
  runtimeClassName: runhcs-wcow-hypervisor  <----------------
  containers:
  - name: servercore
    image: mcr.microsoft.com/windows/servercore:1809
    ports:
    - containerPort: 80
      protocol: TCP
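To illustrate the allowlisting, here's a rough sketch of how glob patterns like io.microsoft.virtualmachine.* could be matched against a pod's annotations before they're passed through to the shim. The passThroughAnnotations helper is hypothetical, not the CRI plugin's actual code:

package main

import (
    "fmt"
    "path"
)

// passThroughAnnotations keeps only the annotations whose keys match one of
// the configured glob patterns (e.g. PodAnnotations = ["io.microsoft.virtualmachine.*"]).
func passThroughAnnotations(patterns []string, annotations map[string]string) map[string]string {
    out := map[string]string{}
    for key, value := range annotations {
        for _, pattern := range patterns {
            if ok, _ := path.Match(pattern, key); ok {
                out[key] = value
                break
            }
        }
    }
    return out
}

func main() {
    allowed := []string{"io.microsoft.virtualmachine.*"}
    podAnnotations := map[string]string{
        "io.microsoft.virtualmachine.computetopology.memory.sizeinmb": "4096",
        "io.microsoft.virtualmachine.computetopology.processor.count": "4",
        "unrelated.example.com/other":                                 "dropped",
    }
    fmt.Println(passThroughAnnotations(allowed, podAnnotations))
}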

Another way resource limits could be set, although the values would be fixed for the duration of a deployment unless Containerd was restarted or the value was overridden by an annotation, is via the vm_processor_count and vm_memory_size_in_mb fields present in the Windows shim-specific options.

This could be extended further by having the runtime class encode the resource limits in its name; for example, runhcs-wcow-hypervisor-20348-1vp2gb:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor-20348-1vp2gb.options]
    Debug = true
    DebugType = 2
    SandboxPlatform = "windows/amd64"
    SandboxIsolation = 1
    VmProcessorCount = 1
    VmMemorySizeInMb = 2048
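For reference, the same options expressed through the shim's generated options struct (assuming the options package generated from the runhcs options proto in the hcsshim repo):

package main

import (
    "fmt"

    "github.com/Microsoft/hcsshim/cmd/containerd-shim-runhcs-v1/options"
)

func main() {
    // Mirrors the runhcs-wcow-hypervisor-20348-1vp2gb TOML options above.
    opts := options.Options{
        SandboxPlatform:  "windows/amd64",
        SandboxIsolation: options.Options_HYPERVISOR,
        VmProcessorCount: 1,
        VmMemorySizeInMb: 2048,
    }
    fmt.Printf("%+v\n", opts)
}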

Testing

This is tricky, as GitHub Actions runners don't support nested virtualization. We'll likely need to do something similar to the approach the Windows periodic tests use and allocate Azure VMs to do our bidding (https://github.com/containerd/containerd/blob/main/.github/workflows/windows-periodic.yml). This might be the most work.

"Full Support"

Pulling images that don't match the host's build

One of the pros of Hyper-V containers is that you're not constrained to the Windows host's build number for image choice (a ws2019 host no longer has to use only a 1809/ws2019 image). However, the Windows platform matching code is finicky and tough to get right, and the main selling point for these containers is really security. I'd be alright punting on the platform package changes until we know the right approach, and just getting in the work to be able to launch these in general.
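As a rough illustration of the matching in question, containerd's platforms package compares an image's os.version field against the host build on Windows; the snippet below is illustrative only (the image platform shown is made up):

package main

import (
    "fmt"

    "github.com/containerd/containerd/platforms"
    ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

func main() {
    // The host platform; on Windows, OSVersion carries the build number
    // (e.g. "10.0.17763") that the default matcher compares against.
    fmt.Println(platforms.Format(platforms.DefaultSpec()))

    // An image built for a different Windows release. The default matcher
    // rejects this on a 17763 host today, even though a Hyper-V isolated
    // container could actually run it.
    img := ocispec.Platform{OS: "windows", Architecture: "amd64", OSVersion: "10.0.20348.169"}
    fmt.Println(platforms.Default().Match(img))
}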

Resource Limits Looking Forward

There are platform limitations to supporting vCPU hot-add, but ideally K8s would tally up the total resource limits by adding up the container resource limits in the pod and send the sum in some field for Windows. If that comes to fruition we'll need to do something with this data; a sketch of the idea is below. Writing this down mainly for future reference.
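A hypothetical sketch of that tally, just to pin the idea down (none of these names exist in CRI today):

package main

import "fmt"

// containerLimits is a hypothetical stand-in for whatever per-container
// resource fields Kubernetes would send for Windows.
type containerLimits struct {
    MemoryInMb     uint64
    ProcessorCount int32
}

// podVMSize sums the per-container limits to size the pod's utility VM.
func podVMSize(containers []containerLimits) (memMb uint64, cpus int32) {
    for _, c := range containers {
        memMb += c.MemoryInMb
        cpus += c.ProcessorCount
    }
    return memMb, cpus
}

func main() {
    mem, cpus := podVMSize([]containerLimits{
        {MemoryInMb: 2048, ProcessorCount: 2},
        {MemoryInMb: 1024, ProcessorCount: 1},
    })
    fmt.Printf("boot the UVM with %dMB of memory and %d vCPUs\n", mem, cpus)
}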

Additional context

Thanks for reading the wall of text :)

Tracking

1.7

Future
