Skip to content

Implement Windows snapshotter and differ#1969

Merged
dmcgowan merged 6 commits intocontainerd:masterfrom
darstahl:WindowsSnapshotter5
Jan 25, 2018
Merged

Implement Windows snapshotter and differ#1969
dmcgowan merged 6 commits intocontainerd:masterfrom
darstahl:WindowsSnapshotter5

Conversation

@darstahl
Copy link
Copy Markdown
Contributor

@darstahl darstahl commented Jan 5, 2018

This implements the snapshotter and differ for Windows container layers, and enables them for use by ctr run.

It is currently split into two commits, one to implement the snapshotter and differ, and one to start using it in ctr. I can split this into two PRs if it would be helpful.

The snapshotter View method currently creates a new layer, but it should really just use the already created layer and create Mounts to use it. This works fine currently, but is needed for DiffMounts

TODO:

  • Change snapshotter View to not create a new layer

  • Enable tests to start using snapshotter and differ

  • Move getFileSystemType to continuity

A followup PR needs to implement Windows layer Diff, and changes the snapshotter commit to read the changes out of the sandbox layer, and put them in a Read only layer, as the current implementation splits changes to the VHD from those to the on disk layer.

@stevvooe @dmcgowan @crosbymichael @jessvalarezo

@darstahl
Copy link
Copy Markdown
Contributor Author

darstahl commented Jan 5, 2018

I'm creating this new PR as a replacement for #1613 because so much has changed since then that a new PR makes sense to clean up the noise.

Comment thread mount/mount_windows.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is something going to be done after this defer at some point in the future? Right now it is not doing anything

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I had something after the defer in a previous version. Removing it.

Comment thread mount/mount_windows.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't using JSON here still problematic since the comma won't be escaped? What is the main motivation behind encoding with JSON

Copy link
Copy Markdown
Contributor Author

@darstahl darstahl Jan 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent was just to pack the parent layer paths in a single mount in a way that would be able to get back the same list when mounting. JSON was used because it would still be readable, but any way to Marshal should be ok, as long as all characters in the path are OK.

I don't think comma is a problem, as on Windows this is not actually used by a mount syscall similar to Unix, but rather is parsed by containerd and passed as extra info to HCS to assemble the layer. I might be missing somewhere where this struct is serialized though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, the serialization from go array should only go to protobuf types and back. We have a ctr command which gives the mount command, but we can just have it output something else on Windows since a Unix mount command doesn't make sense here.

Comment thread snapshots/windows/windows.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the expense of calculating this on Windows? Full walk?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size of the layer will come for free (ish) when I add the followup PR which implements Diff and redo Commit to build an actual layer that works for both process and hyperv isolation modes by diffing the current layer from the parent, and creating a new RO layer.

The size in usage is the one where I still need to work out what to do, and if we even have permission to do a full walk in all cases.

Comment thread snapshots/windows/windows.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does a rename do on Windows?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just changes the name, doesn't do any actual data moves (very cheap). The intent is to have a short transaction, and do the heavy lifting outside the transaction.

Comment thread snapshots/windows/windows.go Outdated
Copy link
Copy Markdown
Member

@dmcgowan dmcgowan Jan 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also what is the cost of this destroy? I have some in progress work to make this asynchronous, it would probably be good to do here too

see #1833

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I alluded to in the above comment, this is the costly part of remove. Async would be really great.

Comment thread snapshots/windows/windows.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious why the parentPath gets passed in twice, once as an argument and again as the first element of the path array?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is just a direct map to the platform function, and the argument is not actually used in the platform currently, but can't be changed, as changing platform function signatures without adding a new name is not fun for anyone.

So just legacy.. If I get a chance, I'd like to fix the hcsshim function to not require it, but that requires a breaking change.

Comment thread snapshots/windows/windows.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for setting quotas?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the default quota is 20GB (since it needs to be stored in a VHD, it cannot be unbounded). It can be expanded to larger than 20GB, but cannot be shrunk.

@dmcgowan
Copy link
Copy Markdown
Member

dmcgowan commented Jan 5, 2018

I am impressed with how straightforward the snapshotter is 😄

Copy link
Copy Markdown
Contributor

@mlaventure mlaventure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is 😍!

I unfortunately don't have the capacity to test on windows anymore, but nothing wrong jumped out after my first pass through, just nits.

So glad to see this going forward :)

Comment thread diff/windows/windows.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to have those as metrics?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think Windows currently supports metrics in containerd. I don't think these are needed as metrics, but it is something I would love to get working on Windows!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using prometheus, it should just be a matter of defining and registering the metrics :)

Comment thread diff/windows/windows.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from memory this look like duplicated code from other places in the repo.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there's a chunk of shared code here. Is there a good place to store code shared between packages in containerd that might not have use outside containerd? This could be something like func IsCompressedTar(desc ocispec.Descriptor) (bool, error), but I don't see a good place to have that function exist.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe somewhere in the image package?

Comment thread snapshots/windows/windows.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need for the () version

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread snapshots/windows/utilities.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is only used in snashots/windows/widnows.go it could just be in that file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed it, as logging the rollback failures is actually not very useful at all (it only fails if the transaction is already committed, at which point we don't care)

Comment thread snapshots/windows/windows.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess (from reading getFileSystemType()) root[0] is the drive letter. https://golang.org/pkg/path/filepath/#VolumeName would make this more obvious.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. That is much clearer. Didn't know about that function.

Comment thread snapshots/windows/windows.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if is unnecessary

return info, err

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Comment thread snapshots/windows/windows.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of tracking commited?

Looking at the implementation of Tx.Rollback I don't think this error is ever relevant. The only error is "transaction was already closed", which I think would only occur if commited == true, so this will never log anything.

Maybe remove the committed, and ignore the error from Rollback() ? Then you can always just defer tx.Rollback(), which seems to be what other implementations are doing.

Copy link
Copy Markdown
Contributor Author

@darstahl darstahl Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. For some reason I thought that rollback shouldn't be called after a commit, but I don't know why. Changing this to be similar to other implementations

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the other snapshotters are doing almost the exact same thing such as https://github.com/containerd/containerd/blob/master/snapshots/overlay/overlay.go#L290 (bool rollback instead of bool committed, but the exact same idea) and https://github.com/containerd/containerd/blob/master/snapshots/overlay/overlay.go#L204 (setting t = nil to abandon the rollback instead of using a bool).

Same thing in naive. Looks like the other snapshotters should also be fixed to just use defer tx.Rollback() regardless to me.

Comment thread diff/windows/windows.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return errdefs.ErrNotImplemented?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. fixed.

@darstahl darstahl force-pushed the WindowsSnapshotter5 branch 3 times, most recently from 0d0dc71 to 0da3e63 Compare January 12, 2018 01:23
@darstahl
Copy link
Copy Markdown
Contributor Author

darstahl commented Jan 17, 2018

Windows tests are (sortof) passing! The cleanup is failing due to attempting to delete the root folder via os.Remove rather than hcsshim.DestroyLayer. I'll resolve that tomorrow.

@darstahl darstahl force-pushed the WindowsSnapshotter5 branch from f3d7691 to 7015486 Compare January 19, 2018 20:26
@codecov-io
Copy link
Copy Markdown

codecov-io commented Jan 19, 2018

Codecov Report

Merging #1969 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1969   +/-   ##
=======================================
  Coverage   45.42%   45.42%           
=======================================
  Files          96       96           
  Lines        9427     9427           
=======================================
  Hits         4282     4282           
  Misses       4435     4435           
  Partials      710      710
Flag Coverage Δ
#linux 50.34% <ø> (ø) ⬆️
#windows 40.29% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 12eaf13...95a0b3a. Read the comment docs.

@darstahl darstahl changed the title [WIP] Implement Windows snapshotter and differ Implement Windows snapshotter and differ Jan 19, 2018
@darstahl
Copy link
Copy Markdown
Contributor Author

All current feedback is addressed, and CI is passing. Only item leftover is to move getFileSystemType to Continuity. I think this is OK to do in a followup so it doesn't block this PR.

Comment thread mount/mount_windows.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this can be done after m.GetParentPaths() to avoid allocating the memory if not necessary

Copy link
Copy Markdown
Contributor

@mlaventure mlaventure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few nits, LGTM

Comment thread diff/windows/windows.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using prometheus, it should just be a matter of defining and registering the metrics :)

Comment thread mount/mount_windows.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could be a single var block with the above

Comment thread snapshots/windows/windows.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would do that first to avoid loading the dll if not needed

@darstahl darstahl force-pushed the WindowsSnapshotter5 branch from 80524e1 to 8ed6376 Compare January 22, 2018 23:13
@darstahl
Copy link
Copy Markdown
Contributor Author

darstahl commented Jan 23, 2018

nits fixed and rebased on master. @stevvooe @dmcgowan PTAL

edit: I can clean up the commits back into 2 or 3 if preferred.

This implements the Windows snapshotter and diff Apply function.
This allows for Windows layers to be created, and layers to be pulled
from the hub.

Signed-off-by: Darren Stahl <[email protected]>
This changes the Windows runtime to use the snapshotter and differ
created layers, and updates the ctr commands to use the snapshotter and differ.

Signed-off-by: Darren Stahl <[email protected]>
Signed-off-by: Darren Stahl <[email protected]>
@darstahl darstahl force-pushed the WindowsSnapshotter5 branch from 8ed6376 to 95a0b3a Compare January 23, 2018 23:39
Copy link
Copy Markdown
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no ability to test, but have reviewed the code. LGTM

@jessvalarezo
Copy link
Copy Markdown
Contributor

Cool! LGTM!

A question:
I did a quick search for if runtime.GOOS == "windows" to see which tests are still being skipped on windows. From my understanding, some are skipped due to the snapshotter not being implemented, others because they don't apply to windows. Will the follow-up PR remove some of these skips? I'm seeing the diff tests and a writer commit test (whose functionality you mentioned in the PR description), but am curious what will happen with TestImageIsUnpacked, TestMetadata and TestSnapshotterClient?

@stevvooe
Copy link
Copy Markdown
Member

LGTM

I took a good look at this yesterday and it is looking nice and simple.

Comment thread mount/mount_windows.go
@@ -9,15 +16,70 @@ var (

// Mount to the provided target
func (m *Mount) Mount(target string) error {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This target is completely unused, but that makes using this function kind of awkward. The concern is that code cannot rely on calling this function to get access to the data through the mount. This could complicate builder use cases which are trying to build off the snapshot pattern. Could this code be unified with the code in windows/hcsshim.go which does mounting to provide the mount at a destination path? Doing the equivalent of a mount + bind mount to achieve this would be fine if that is even possible.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it is awkward to not mount at target.

It is not possible to specify the mount location currently, as the layer is mounted as a volume on the host, not at a path on the host. I think it might be possible to mount the resulting volume on the host at a given path, but this would only be delaying solving the same problem again later, as LCOW layers cannot be mounted on the host regardless due to limitations in NTFS. See #1359 for more context on this.

On Windows, it makes much more sense to get the mounted path, rather than provide a path to mount at. I'm not sure if mounting the volume on the host would fix this for WCOW, but I can attempt if we think it is worth fixing it here before we deal with the LCOW situation.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss further later, possibly start to include @tonistiigi in our sync ups

@dmcgowan
Copy link
Copy Markdown
Member

LGTM

@dmcgowan dmcgowan merged commit 7e44035 into containerd:master Jan 25, 2018
@darstahl
Copy link
Copy Markdown
Contributor Author

@jessvalarezo Yes, there are certainly test that need to be enabled on Windows still. I think some of them can already be enabled, but I did not look at all of them yet.

From the looks of it, TestImageIsUnpacked, and TestSnapshotterClient should work. Not sure about TestMetadata, but maybe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants