What is the problem you're trying to solve
We'd like to support launching hypervisor-isolated Windows containers through the CRI entry point to light up this scenario for K8s. Support for launching Hyper-V containers is already present in containerd itself via the WithWindowsHyperV client option, as well as the ctr testing tool's --isolation flag; however, nothing in the CRI plugin makes use of this functionality at the moment.
Describe the solution you'd like
There are a few spots that would need to change to add "full" support, but for the 1.7 timeframe the minimal amount of work needed to launch and manage these containers is not a great deal.
Initial Support (1.7 timeframe)
Filling in the HyperV runtime spec field
The Windows containerd shim exposes a SandboxIsolation enum that can be used to tell the shim what kind of container/pod to launch. This field, in combination with new runtime class definitions in containerd, is how we can differentiate between process and hypervisor isolation for Windows. Below is an example pod spec and a runtime class definition in containerd's config file:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wcow-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: wcow
  template:
    metadata:
      labels:
        app: wcow
    spec:
      runtimeClassName: runhcs-wcow-hypervisor # <----------------
      containers:
      - name: servercore
        image: mcr.microsoft.com/windows/servercore:1809
        ports:
        - containerPort: 80
          protocol: TCP
```
```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor]
    base_runtime_spec = ""
    cni_conf_dir = ""
    cni_max_conf_num = 0
    container_annotations = []
    pod_annotations = []
    privileged_without_host_devices = false
    runtime_engine = ""
    runtime_path = ""
    runtime_root = ""
    runtime_type = "io.containerd.runhcs.v1"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor.options]
      Debug = true
      DebugType = 2
      SandboxImage = "mcr.microsoft.com/windows/servercore:1809"
      SandboxPlatform = "windows/amd64"
      SandboxIsolation = 1 # <-------------------
```
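For the pod spec's runtimeClassName to resolve, Kubernetes also needs a RuntimeClass object whose handler matches the runtime name in the containerd config. A minimal sketch (the metadata.name is arbitrary; the handler must match the key in the runtimes table):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: runhcs-wcow-hypervisor
handler: runhcs-wcow-hypervisor
```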
We could also expand the default CRI config that containerd uses for Windows when one is not supplied in the config file. We would have to keep updating this to include new runtimes any time a new OS release/container image pair is made available.
```go
// DefaultConfig returns default configurations of CRI plugin.
func DefaultConfig() PluginConfig {
	//
	// New additions
	//
	ws2019Opts := options.Options{
		SandboxImage:     "mcr.microsoft.com/windows/nanoserver:1809",
		SandboxPlatform:  "windows/amd64",
		SandboxIsolation: options.Options_HYPERVISOR,
	}
	ws2022Opts := options.Options{
		SandboxImage:     "mcr.microsoft.com/windows/nanoserver:ltsc2022",
		SandboxPlatform:  "windows/amd64",
		SandboxIsolation: options.Options_HYPERVISOR,
	}
	//
	// End of new additions
	//
	return PluginConfig{
		CniConfig: CniConfig{
			NetworkPluginBinDir:       filepath.Join(os.Getenv("ProgramFiles"), "containerd", "cni", "bin"),
			NetworkPluginConfDir:      filepath.Join(os.Getenv("ProgramFiles"), "containerd", "cni", "conf"),
			NetworkPluginMaxConfNum:   1,
			NetworkPluginConfTemplate: "",
		},
		ContainerdConfig: ContainerdConfig{
			Snapshotter:        containerd.DefaultSnapshotter,
			DefaultRuntimeName: "runhcs-wcow-process",
			NoPivot:            false,
			Runtimes: map[string]Runtime{
				"runhcs-wcow-process": {
					Type:                 "io.containerd.runhcs.v1",
					ContainerAnnotations: []string{"io.microsoft.container.*"},
				},
				//
				// New additions
				//
				"runhcs-wcow-hypervisor-1809": {
					Type:                 "io.containerd.runhcs.v1",
					PodAnnotations:       []string{"io.microsoft.virtualmachine.*"},
					ContainerAnnotations: []string{"io.microsoft.container.*"},
					Options:              ws2019Opts,
				},
				"runhcs-wcow-hypervisor-17763": {
					Type:                 "io.containerd.runhcs.v1",
					PodAnnotations:       []string{"io.microsoft.virtualmachine.*"},
					ContainerAnnotations: []string{"io.microsoft.container.*"},
					Options:              ws2019Opts,
				},
				"runhcs-wcow-hypervisor-20348": {
					Type:                 "io.containerd.runhcs.v1",
					PodAnnotations:       []string{"io.microsoft.virtualmachine.*"},
					ContainerAnnotations: []string{"io.microsoft.container.*"},
					Options:              ws2022Opts,
				},
				"runhcs-wcow-hypervisor-21H2": {
					Type:                 "io.containerd.runhcs.v1",
					PodAnnotations:       []string{"io.microsoft.virtualmachine.*"},
					ContainerAnnotations: []string{"io.microsoft.container.*"},
					Options:              ws2022Opts,
				},
				//
				// End of new additions
				//
			},
		},
		// … Omitted other fields …
	}
}
```
Resource Limits For the VM
One way the Windows shim supports setting resource limits (memory, vCPU count) for the lightweight VM is via annotations. The virtual machine based annotations all begin with io.microsoft.virtualmachine.*, which ties into the section above: the PodAnnotations and ContainerAnnotations fields shown there are what allow these annotations through.
An example pod spec asking for the VM hosting the containers in the pod to boot with 4GB of memory and 4 vCPUs is below:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: wcow-test
  labels:
    app: wcow
  annotations:
    io.microsoft.virtualmachine.computetopology.memory.sizeinmb: "4096"
    io.microsoft.virtualmachine.computetopology.processor.count: "4"
spec:
  runtimeClassName: runhcs-wcow-hypervisor # <----------------
  containers:
  - name: servercore
    image: mcr.microsoft.com/windows/servercore:1809
    ports:
    - containerPort: 80
      protocol: TCP
```
Another way resource limits could be set would be the vm_processor_count and vm_memory_size_in_mb fields present in the Windows shim specific options, although those values would be fixed for the duration of a deployment unless containerd was restarted or the value was overridden by specifying an annotation.
This could be extended further by having the runtime class specify the resource limits in the name. For example, runhcs-wcow-hypervisor-20348-1vp2gb:
```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor-20348-1vp2gb.options]
  Debug = true
  DebugType = 2
  SandboxPlatform = "windows/amd64"
  SandboxIsolation = 1
  VmProcessorCount = 1
  VmMemorySizeInMb = 2048
```
Testing
This is tricky, as GitHub Actions runners don't support nested virtualization. We'll likely need to do something similar to the approach the Windows periodic tests use and allocate Azure VMs to do our bidding (https://github.com/containerd/containerd/blob/main/.github/workflows/windows-periodic.yml). This might be the most work.
"Full Support"
Pulling images that don't match the host's build
One of the pros of Hyper-V containers is that you're not constrained to the Windows host's build number for image choice (a ws2019 host no longer has to use only a 1809/ws2019 image). However, the Windows platform matching code is finicky and tough to get right, and the main selling point for these containers is really security. I'd be alright punting on the platform package changes until we know the right approach, and just getting in the work to be able to launch these containers in general.
Resource Limits Looking Forward
There are platform limitations to supporting vCPU hot-add, but ideally K8s would tally up the total resource limits by adding up the container resource limits in the pod and send them in some field for Windows. If that comes to fruition, we'll need to do something with this data. Writing this down mainly for future reference.
Additional context
Thanks for reading the wall of text :)
Tracking
1.7
Future