-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Add support for cimfs snapshotter & differ #8807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @ambarve. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
80b7a6e to
65dba00
Compare
integration/client/go.mod
Outdated
| replace github.com/AdaLogics/go-fuzz-headers => github.com/AdamKorcz/go-fuzz-headers-1 v0.0.0-20230111232327-1f10f66a31bf | ||
| replace ( | ||
| github.com/AdaLogics/go-fuzz-headers => github.com/AdamKorcz/go-fuzz-headers-1 v0.0.0-20230111232327-1f10f66a31bf | ||
| github.com/Microsoft/hcsshim => github.com/ambarve/hcsshim v0.9.1-0.20230715003907-4c7289998f8a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this PR still WIP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WIP is only for the vendoring commit. Once the hcsshim PRs are merged and we have a tag on hcsshim, I will update the go.mod with that tag and removed the WIP tag.
|
I see that We just discussed the same topic in #8789: containerd has several mount/unmount codepaths that bypass the snapshotter and ended up with "containerd snapshotter API needs to be improved". But in that case it can be delayed (there's only one snapshotter anyway). But I don't see how current logic can work with WCIFS+CimFS. |
|
@slonopotamus CimFs works evne without making any changes to the existing mount APIs because
|
|
Just in case: I'm not insisting on any changes in current PR, I'm just trying to build a mental model about all this mount stuff in my head) |
|
@AkihiroSuda & @slonopotamus, I have thought about the Mounts of CimFs based snapshots and I have decided to not include support for Mounting/Unmounting of CimFs based snapshots in this PR. Let me know what you think. P.S: This PR is still marked as WIP because the hcsshim commit that this PR vendors in is still from my private branch. But rest of the code is ready to review. |
0604de4 to
c67c646
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't been through this heavily, but my understanding is that this snapshotter is just a thin wrapper around the existing windows snapshotter, to insert imported-from-content-store CIMs as parent layers, but then subsequent locally-created WCIFS layers are used as is the case today, so a container running a locally-built container images will look like CIMFS+CIMFS+WCIFS+WCIFS+Scratch.
Why isn't this part of the wcowSnapshotter, enabled when OS support is present, and maybe with a feature flag? As it stands, this only applies to layers imported from the content store, which wcowSnapshotter is already aware of and handles specially, so being more special should be fine, and won't require users to make config changes.
For its current use-case of imported-from-content-store layers, the only thing you lose is local host mount support for layer-creation use, and I understand it's desired to fix that. The current windowsSnapshotter can't export/diff imported-from-content-store layers either, so that lack isn't a blocker for this to be the default snapshotter on Windows.
Anyway, if I'm parsing this correctly, the snapshotter as-is would be suitable to use in a CRI config under a k8s cluster or similar situation where you only ever import layers from the content store and run a single scratch layer on top of them.
Hmm... Does CimFS (or UnionFS) support not creating a scratch layer at all for a read-only root mount? I'm not sure I've ever tried read-only root on a Windows container.
It seems most of the heavy lifted is done by hcsshim when it sees a cimfs mount type, and there's nothing equivalent to func (s *wcowSnapshotter) convertScratchToReadOnlyLayer for creating new CIM layers.
In the future, it'd be nice if convertScratchToReadOnlyLayer or equivalent could output CIM layers, since that'd let us speed-up build times for large filesystem layers (which currently pay the price of backuptar twice, see #5384), and not require a push-delete-and-pull to take advantage of the speed increases from CIM usage when creating the next layer.
Separately, if we had tar export support for CIM layers, then we could delete the local sandbox.vhdx after convertScratchToReadOnlyLayer, and save like 50% of the storage cost of locally-built Windows container layers. (As noted in #5384, this might already have been solved in the RS5 computestorage API, but I never got any of those working in containerd before I went on sabbatical.)
Edit: I looked again at the CimFS Windows API docs, and there's actually nothing to extract data from a CIM except to mount it as a filesystem? It's a write-only API? Maybe we end up needing Windows-equivalent of the walking differ, and that solves #5384 too.
Lack of mounting support will prevent the snapshot test suite being able to be run. (The Import/Export Layer tests can still run without mounting, but you also don't have Export support here, so it's really just the ImportLayer test that can be run in the current state.)
My current understanding from the discussion is that the lack of mounting support is more about containerd mount management than underlying infeasibility.
We'd need to support CIMFS+CIMFS+WCIFS+Scratch mounting as single volume or mount-point. That support could be exported from hcsshim, and could also support WCIFS-Base+WCIFS+Scratch in a single API (or two, one CIMFS and one non-CIMFS, since we know the CIMFS-ness from the Mount object), hiding the Apply/Prepare dance from containerd, (and maybe even the BindFS call? Maybe UnionFS replaces this call?) i.e. Windows mount code could look like Windows Apply code, and be just a single hand-off to hcsshim rather than a bunch of hcsshim API calls with no intervening logic.
That said, I also haven't been through the hcsshim-side code, so I don't know if this is actually possible, or if CIMFS+CIMFS+WCIFS+Scratch is actually only possible inside a container. (Although that'd block host-process too, right? Or maybe the recent silo changes made this trade-off, in exchange for supporting the virtual filesystem thing whose name escapes me.)
It'd be nice if we also get read-only mount support (CIMFS+CIMFS+WCIFS, no scratch) out of this to save creating a new sandbox VHDX for Snapshotter.View, but that's only tangential here. (Maybe "UnionFS" can solve this? Will it support WCIFS layers too?)
archive/tar_opts_windows.go
Outdated
| // Container Layer written in the CIM format. The caller must be holding | ||
| // SeBackupPrivilege and SeRestorePrivilege. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From #8346, I thought these privileges were not going to be necessary anymore?
I see the same comment at https://github.com/ambarve/hcsshim/blob/26f23d5c59383ccb23e8d7a99c80c9fc29cdf62f/pkg/ociwclayer/cim/import.go#L25-L26, but can't see anything at https://learn.microsoft.com/en-us/windows/win32/api/_cimfs/ that explains this need.
I also note that the differ appears to deliberately not take these privileges, i.e. that's the main difference between windowsDiff.Apply and cimDiff.Apply, so I assume this is a documentation cut-and-paste issue.
I see the CimFS sample code takes those privileges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I forgot to update these comments. I have fixed it here. I will fix it in hcsshim too.
|
I like the idea of combining the cimfs snapshotter with the existing windows snapshotter. Originally, we created a separate snapshotter because the cimfs snapshotter used to automatically mount the cim whenever a scratch layer was created with a cim based parent layer (hcsshim didn't mount cims in that case). But now that we don't have this, I am open to the idea of combining these two snapshotters. Let me look into it a bit more. When you say, "The current windowsSnapshotter can't export/diff imported-from-content-store layers either", what do you mean? A snapshotter isn't responsible for generating the diff, right? For now, the only intended use of this snapshotter is for imported image layers. We currently don't support exorting a cim based layer or a scratch layer (created with UnionFS) to tar but we do plan to add that support in the near future. Mounting support, especially in a way that handles both legacy (WCIFS) & CimFS based layers is somewhat complicated. For example, current mount support for windows snapshots is only used for locally mounting the snapshots (for image builds etc.). It doesn't handle the snapshot mounts by a shim process. Plus, even if we add the mount support for CimFS based snapshots in containerd right now, that won't be of much use as layer export isn't supported. That's why, we plan to get this PR merged in first and then continue working on adding layer export & mount support. Btw, I would like to clarify that we won't be using WCIFS with CimFS at all. CimFS will be used with UnionFS. |
72a2142 to
17c626d
Compare
What I mean is that even with the existing WCIFS-based Snapshotter, when you import a layer from the content store, you only have the WCIFS read-only layer (files-on-disk). And hcsshim's v1 layer API can only generate a tar stream from diffing a WCIFS read-write layer (VHDX file) against its parent read-only layer, not from a read-only layer against its parent read-only layer. So yeah, the Snapshotter doesn't create the diff, I just tie them together in my head. Frankly, I feel like a Snapshot plugin should be able to provide tarstream handlers in a neater way than the existing Anyway, see containerd/snapshots/windows/windows.go Lines 495 to 500 in a3c68d1
It's possible hcsshim's v2 layer API (computestorage) allows exporting from diffing two WCIFS read-only layers (there's a constant that suggests this is possible), but I never got computestorage's hcsshim projection able to do anything before I went on sabbatical, so I never tried it. Your comment makes sense to me. We definitely need export before mount (I'm somewhat familiar with the ordering of features needed here. ^_^) so I'm mostly interested in what's future-feasible versus what's impossible, in terms of how this feature is positioned/advertised/used. I was thinking about this, and it seemed like the CimFS C++ API docs implied that the intent was that a single CIM would hold multiple layers, e.g. the deduplication features and forks, but this implementation is using a CIM-per-layer? I thought the multiple layers in one big CIM file would be nice, because that would deduplicate between multiple patch versions of the Windows OS second layer, which I suspect is 95% identical between adjacent patches; I haven't checked this though, so that's ±95%. |
|
I will have to investigate a bit more to find out if currently there is a way of generating a diff from two read-only layers. But for CimFs, we do plan on adding that support i.e. we will add a way to read a layer CIM into a tar.
Edit: |
|
In general, I think the interface between containerd and the hcsshim code (primarily the For instance, rather than passing a path to a directory in the CimFS mount, and assuming that for directory I hope that this can reduce coupling between the two packages, e.g. so that they no longer need to agree on a common directory structure with The way I view this is the code in hcsshim should be written to be generic as to where it's being used (for instance, it doesn't need to know what a "snapshotter" is or how it's organized). Then the containerd code enforces the specific structure and organization that it wants for the snapshotter. |
bc22f4c to
eb2314b
Compare
Discussed this with Kevin offline. We want to get initial CimFS changes merged before containerd 2.0 release. To avoid any further delays for that plans we will merge the CimFS changes without any further refactoring of the layer management code in hcsshim. However, as soon as these changes are merged, we will make additional changes to refactor layer management functions in hcsshim so that they allow a more flexible interface for different types of snapshotter Mounts when running containers. |
vendor/github.com/Microsoft/hcsshim/pkg/ociwclayer/cim/import.go
Outdated
Show resolved
Hide resolved
52f79cb to
175722d
Compare
kevpar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM to get the code in to enable usage of CimFS in 2.0.
As discussed previously, I still believe more work is needed to better decouple the containerd snapshotter/differ from the implementation in hcsshim. Amit is planning on doing this work next, and I don't think this PR needs to be blocked on it. The decoupling work should be done before CimFS becomes the default snapshotter (which I guess would likely be for 2.1 at this point).
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Details about CimFs project are discussed in containerd#8346 Signed-off-by: Amit Barve <[email protected]>
Details described in #8346