Skip to content

Windows snapshotter touch ups and new functionality#6918

Merged
kzys merged 1 commit intocontainerd:mainfrom
dcantah:windows-snapshotter-cleanup
Jun 10, 2022
Merged

Windows snapshotter touch ups and new functionality#6918
kzys merged 1 commit intocontainerd:mainfrom
dcantah:windows-snapshotter-cleanup

Conversation

@dcantah
Copy link
Copy Markdown
Member

@dcantah dcantah commented May 10, 2022

This change does a couple things to remove some cruft/unused functionality
in the Windows snapshotter, as well as add a way to specify the rootfs
size in bytes for a Windows container via a new field added in the CRI api in
k8s 1.24. Setting the rootfs/scratch volume size was assumed to be working
prior to this but turns out not to be the case.

Previously I'd added a change to pass any annotations in the containerd
snapshot form (containerd.io/snapshot/*) as labels for the containers
rootfs snapshot. This was added as a means for a client to be able to provide
containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb as an
annotation and have that be translated to a label and ultimately set the
size for the scratch volume in Windows. However, this actually only worked if
interfacing with the CRI api directly (crictl) as Kubernetes itself will
fail to validate annotations that if split by "/" end up with > 2 parts,
which the snapshot labels will (containerd.io / snapshot / foobarbaz).

With this in mind, passing the annotations and filtering to
containerd.io/snapshot/* is moot, so I've removed this code in favor of
a new snapshotterOpts() function that will return platform specific
snapshotter options if ones exist. Now on Windows we can just check if
RootfsSizeInBytes is set on the WindowsContainerResources struct and
then return a snapshotter option that sets the right label.

So all in all this change:

  • Gets rid of code to pass CRI annotations as labels down to snapshotters.

  • Gets rid of the functionality to create a 1GB sized scratch disk if
    the client provided a size < 20GB. This code is not used currently and
    has a few logical shortcomings as it won't be able to create the disk
    if a container is already running and using the same base layer. WCIFS
    (driver that handles the unioning of windows container layers together)
    holds open handles to some files that we need to delete to create the
    1GB scratch disk is the underlying problem.

  • Deprecates the containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb
    label in favor of a new containerd.io/snapshot.rootfs.size-bytes label.
    The previous label/annotation wasn't being used by us, and from a cursory
    github search wasn't being used by anyone else either. Now that there is a CRI
    field to specify the size, this should just be a field that users can set
    on their pod specs and don't need to concern themselves with what it eventually
    gets translated to, but I've added a more generic label for non-CRI clients.

Verified rootfs size work via:

{
	"metadata": {
		"name": "rootfs-size-test"
	},
	"image": {
		"image": "mcr.microsoft.com/windows/nanoserver:1809"
	},
	"command": [
		"ping",
		"-t",
		"127.0.0.1"
	],
        "windows": {
                "resources": {
                        "rootfs_size_in_bytes": 26843545600 
                }
        }
}
PS C:\Users\Administrator\Desktop> $pod = .\crictl.exe -t 20s --config .\crictl.yaml runp .\configs\pod.json ; $c = .\crictl.e
xe -t 20s create $pod .\configs\ctr.json .\configs\pod.json ; .\crictl.exe -t 20s start $c
e8fce4f3b03d225a8a82a318876abb7534b893d21491d14804d5938ccf45fe50

PS C:\Users\Administrator\Desktop> .\crictl.exe exec -it $c cmd
Microsoft Windows [Version 10.0.17763.2928]
(c) 2018 Microsoft Corporation. All rights reserved.

C:\>dir
 Volume in drive C has no label.
 Volume Serial Number is 5CA1-BDE0

 Directory of C:\

05/05/2022  09:36 AM             5,510 License.txt
05/05/2022  09:42 AM    <DIR>          Users
05/12/2022  07:14 PM    <DIR>          Windows
               1 File(s)          5,510 bytes
               2 Dir(s)  26,647,625,728 bytes free

@dcantah dcantah force-pushed the windows-snapshotter-cleanup branch 2 times, most recently from 2fd2bba to 907ddeb Compare May 10, 2022 11:59
@dcantah dcantah marked this pull request as draft May 10, 2022 12:02
@jterry75
Copy link
Copy Markdown
Contributor

High level comments:

  1. What about sandbox create for snapshot ops?
  2. I don't understand why it's safe to remove the < 20GB logic. Can you explain more here?

@kevpar kevpar self-assigned this May 11, 2022
@kevpar
Copy link
Copy Markdown
Member

kevpar commented May 11, 2022

@dcantah just let me know when you're ready for review.

@dcantah
Copy link
Copy Markdown
Member Author

dcantah commented May 13, 2022

High level comments:

  1. What about sandbox create for snapshot ops?
  2. I don't understand why it's safe to remove the < 20GB logic. Can you explain more here?

@jterry75

  1. There's no windows resources on the pod sandbox object, so there's nothing for us to check. I guess, would there be any use for having a bigger scratch for the pause container? (or maybe you were thinking of something for the UVMs scratch whenever wcow-hyp works)
  2. wcifs holds open handles to files that a container accesses in the layer which apparently means SYSTEM_BASE/SOFTWARE_BASE and pals, and creating that 1GB scratch requires deleting the directory first https://github.com/microsoft/hcsshim/blob/master/computestorage/helpers.go#L33-L40. So the only time this would work is if a container wasn't already running that's based off of the same base layer (which there isn't many choices 🤣). I think a better approach is to make the 1gb scratch when we're unpacking the image instead somehow. I can add this info to the commit, it's pretty barebones in that regard.. Or we might split this PR up if it's too much crammed in one anyways.

@dcantah dcantah force-pushed the windows-snapshotter-cleanup branch 2 times, most recently from 2d46b08 to e0b0e4a Compare May 13, 2022 02:22
@dcantah dcantah marked this pull request as ready for review May 13, 2022 02:26
@jterry75
Copy link
Copy Markdown
Contributor

@dcantah - Yea I was getting ahead of myself I guess. Was thinking about wcow/lcow-hyp. This is fine for wcow-process

@jterry75
Copy link
Copy Markdown
Contributor

Also, I should have read 2 first lol. You just blew my mind. Are you telling me that the writable layer is intrinsically tied to the prepared layers once a single container is running? That doesn't sound right. It's just an empty disk right? Its irrespective of the chain underneath it right?

@dcantah
Copy link
Copy Markdown
Member Author

dcantah commented May 16, 2022

Also, I should have read 2 first lol. You just blew my mind. Are you telling me that the writable layer is intrinsically tied to the prepared layers once a single container is running? That doesn't sound right. It's just an empty disk right? Its irrespective of the chain underneath it right?

Sorry that was poorly worded/just plain wrong. wcifs will have a handle to files that are accessed in the scratch during the containers uptime.

@dcantah dcantah marked this pull request as draft May 17, 2022 01:05
@dcantah dcantah force-pushed the windows-snapshotter-cleanup branch 2 times, most recently from a9851ad to d7d66fe Compare May 17, 2022 03:03
@dcantah dcantah marked this pull request as ready for review May 17, 2022 03:03
@dcantah
Copy link
Copy Markdown
Member Author

dcantah commented May 17, 2022

@kevpar @jterry75 Okay, this is ready for review.


log.G(ctx).Debugf("Container %q spec: %#+v", id, spew.NewFormatter(spec))

snapshotterOpt := snapshots.WithLabels(snapshots.FilterInheritedLabels(config.Annotations))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not pass annotations as snapshot labels anymore?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the reasoning is that the only label names that passed this check either have two /, or are containerd.io/snapshot.ref. The first seems valid as long as we decide to not care about CRI API outside of Kubelet (a decent decision IMO). The second we should double check to see if anyone is doing.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I got rid of it because for actual Kubernetes usage it was doing nothing, and to be more specific we'd never even get as far as Containerd as an annotation in the containerd.io/snapshot/* form would fail validation. It seemed sane to me to not leave in something that would only take effect for crictl/some other cri client that isn't the Kubelet itself.

Copy link
Copy Markdown
Member Author

@dcantah dcantah May 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And that's a good point on the second.. let me check. I think that's actually used for remote snapshotters now that I'm thinking of it

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, the label is set by the client itself during unpack. I think we're ok here. Don't see anybody passing this as an annotation to be set for the rootfs snapshot (which wouldn't make sense afaik)

labels[labelSnapshotRef] = chainID

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is safe enough since k8s doesn't seem to provide a mechanism for the user to set annotations directly on the container spec, and even if they it seems unlikely someone would be manually setting containerd.io/snapshot.ref with a correct chain ID and depending on this behavior. So while technically a breaking change this seems okay to me. Let's leave this thread open to see if another maintainer agrees though.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

Comment thread snapshots/windows/windows.go
Comment thread snapshots/windows/windows.go Outdated
@dcantah dcantah force-pushed the windows-snapshotter-cleanup branch from d7d66fe to 65314c9 Compare May 18, 2022 12:35
@dcantah dcantah added this to the 1.7 milestone May 18, 2022
@dcantah dcantah force-pushed the windows-snapshotter-cleanup branch from 65314c9 to c16b04a Compare May 18, 2022 14:58
Copy link
Copy Markdown
Contributor

@jterry75 jterry75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Comment thread pkg/cri/server/container_create_windows.go Outdated
@dcantah dcantah force-pushed the windows-snapshotter-cleanup branch from c16b04a to ca09fca Compare May 18, 2022 21:28
Comment thread pkg/cri/server/container_create_windows.go
Comment thread snapshots/windows/windows.go
@dcantah dcantah force-pushed the windows-snapshotter-cleanup branch from ca09fca to bdf4bda Compare May 19, 2022 22:54
Comment thread snapshots/windows/windows.go
Copy link
Copy Markdown
Contributor

@jterry75 jterry75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks Danny!

@dcantah
Copy link
Copy Markdown
Member Author

dcantah commented Jun 6, 2022

Seems this needs a rebase. Will do shortly

This change does a couple things to remove some cruft/unused functionality
in the Windows snapshotter, as well as add a way to specify the rootfs
size in bytes for a Windows container via a new field added in the CRI api in
k8s 1.24. Setting the rootfs/scratch volume size was assumed to be working
prior to this but turns out not to be the case.

Previously I'd added a change to pass any annotations in the containerd
snapshot form (containerd.io/snapshot/*) as labels for the containers
rootfs snapshot. This was added as a means for a client to be able to provide
containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb as an
annotation and have that be translated to a label and ultimately set the
size for the scratch volume in Windows. However, this actually only worked if
interfacing with the CRI api directly (crictl) as Kubernetes itself will
fail to validate annotations that if split by "/" end up with > 2 parts,
which the snapshot labels will (containerd.io / snapshot / foobarbaz).

With this in mind, passing the annotations and filtering to
containerd.io/snapshot/* is moot, so I've removed this code in favor of
a new `snapshotterOpts()` function that will return platform specific
snapshotter options if ones exist. Now on Windows we can just check if
RootfsSizeInBytes is set on the WindowsContainerResources struct and
then return a snapshotter option that sets the right label.

So all in all this change:
- Gets rid of code to pass CRI annotations as labels down to snapshotters.

- Gets rid of the functionality to create a 1GB sized scratch disk if
the client provided a size < 20GB. This code is not used currently and
has a few logical shortcomings as it won't be able to create the disk
if a container is already running and using the same base layer. WCIFS
(driver that handles the unioning of windows container layers together)
holds open handles to some files that we need to delete to create the
1GB scratch disk is the underlying problem.

- Deprecates the containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb
label in favor of a new containerd.io/snapshot/windows/rootfs.sizebytes label.
The previous label/annotation wasn't being used by us, and from a cursory
github search wasn't being used by anyone else either. Now that there is a CRI
field to specify the size, this should just be a field that users can set
on their pod specs and don't need to concern themselves with what it eventually
gets translated to, but non-CRI clients can still use the new label/deprecated
label as usual.

- Add test to cri integration suite to validate expanding the rootfs size.

Signed-off-by: Daniel Canter <[email protected]>
@dcantah dcantah force-pushed the windows-snapshotter-cleanup branch from bdf4bda to 44e12dc Compare June 6, 2022 22:01
@dcantah
Copy link
Copy Markdown
Member Author

dcantah commented Jun 6, 2022

@kevpar @kzys Anyone able to give a second look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants