Skip to content

Conversation

@slonopotamus
Copy link
Contributor

@slonopotamus slonopotamus commented Jul 9, 2023

This is somewhat an alternative to #5935. I went a completely different route and leveraged FUSE bindfs that provides an equivalent of mount --bind.

This PR mostly consists of tossing functions around, so they better align with platforms. macOS-specific stuff happens in mount/mount_darwin.go. As you can see there, I'm somewhat struggling with Go process API. If there are easier ways to do what I am trying to achieve (see comments), I'm all ears and really willing to simplify the code.

This code is currently being used in rund - a very experimental and WIP containerd shim that allows running macOS inside "containers", lightly isolated from host machine via chroot and (in future) sandbox-exec.


Results of containerd tests:

cd snapshots/native && sudo go test -test.root -v .
<snip>
--- PASS: TestNative (0.00s)
    --- PASS: TestNative/StatActive (0.19s)
    --- PASS: TestNative/StatComitted (0.20s)
    --- PASS: TestNative/TransitivityTest (0.42s)
    --- PASS: TestNative/ViewReadonly (0.24s)
    --- PASS: TestNative/PreareViewFailingtest (0.28s)
    --- PASS: TestNative/Update (0.53s)
    --- PASS: TestNative/RootPermission (0.14s)
    --- PASS: TestNative/CloseTwice (0.15s)
    --- PASS: TestNative/Chown (0.67s)
    --- PASS: TestNative/StatInWalk (0.23s)
    --- PASS: TestNative/Rename (0.70s)
    --- PASS: TestNative/Basic (0.75s)
    --- PASS: TestNative/RemoveIntermediateSnapshot (0.37s)
    --- PASS: TestNative/Walk (0.42s)
    --- PASS: TestNative/MoveFileFromLowerLayer (0.66s)
    --- PASS: TestNative/DeletedFilesInChildSnapshot (0.57s)
    --- PASS: TestNative/RemoveDirectoryInLowerLayer (0.70s)
    --- PASS: TestNative/Remove (0.24s)
    --- PASS: TestNative/DirectoryPermissionOnCommit (0.64s)
    --- PASS: TestNative/LayerFileupdate (4.59s)
    --- PASS: TestNative/128LayersMount (10.62s)
PASS
ok      github.com/containerd/containerd/snapshots/native       10.634s

UPD: actually, with this PR, macOS passes the whole sudo make test test-root

@k8s-ci-robot
Copy link

Hi @slonopotamus. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

)

// fuseSuperMagic is defined in statfs(2)
const fuseSuperMagic = 0x65735546
Copy link
Contributor Author

@slonopotamus slonopotamus Jul 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code can possibly be enabled on FreeBSD, but it needs testing, so I'd prefer to leave it Linux-specific as it was for now.

@slonopotamus slonopotamus changed the title Add support for mounts on Darwin Add support for bind-mounts on Darwin (a.k.a. "make native snapshotter work") Jul 9, 2023
@dcantah
Copy link
Member

dcantah commented Jul 10, 2023

Some small things to start, but I need to give the actual approach some thought later today. cc @mxpv for another macOS user 😆

@slonopotamus
Copy link
Contributor Author

slonopotamus commented Jul 10, 2023

Some small things to start, but I need to give the actual approach some thought later today

Well, my whole idea of this PR is that we just do what other platforms do - we bind-mount (although in a less traditional way, via FUSE bindfs).

@slonopotamus slonopotamus force-pushed the macos-bind-mount branch 3 times, most recently from f4ad8a1 to 3a6f893 Compare July 10, 2023 17:26
@slonopotamus slonopotamus requested a review from dcantah July 10, 2023 17:41
@slonopotamus slonopotamus force-pushed the macos-bind-mount branch 2 times, most recently from 5d80864 to 4f4c2ce Compare July 11, 2023 19:34
@slonopotamus
Copy link
Contributor Author

I believe that Linux sandbox CI failure is unrelated to this PR.

@dcantah
Copy link
Member

dcantah commented Jul 14, 2023

Sorry for the delay here. My main concern is everything kind of feels.. hacky? for our options on Darwin which is just due to the state of things: there's no kernel support for any of the usual niceties which I'm sure you know at this point 😪. Though, I see the value in "deploy/manage thing like a container even if all the container bits aren't fully there" as there's HostProcess containers on Windows. It feels off that we need to have some third party fuse kext (or the usermode version when/if that issue is fixed) to get this to work, but I'm not completely opposed to giving this a shot as we can experiment while we work towards 2.0. My minds wandering on if there's some way to achieve what we want here just as a remote snapshotter or similar 🤔. I do think if we did get this in we'll have to add some docs stating for Darwin if you want to use the native snapshotter what dependencies you'll now need

@TBBle
Copy link
Contributor

TBBle commented Jul 14, 2023

There was some discussion about mount plugins at #7055 (comment), and #8347 probably falls into the same area-of-interest.

@slonopotamus
Copy link
Contributor Author

slonopotamus commented Jul 14, 2023

My main concern is everything kind of feels.. hacky?

I'd rephase this "everything feels an MVP". I do not claim that what I did here is the macOS snapshotter. It is just 35 lines of code that give us a snapshotter and allow to move on with experimenting on runtime. As far as I understand, this is the whole idea of native snapshotter. It is not designed to be high-performant or production-grade. I believe that 35 lines of code is not that much of a maintenance burden and could be merged without much concerns.

I also want to note that attempts to bring in a more full-blown solution are already stuck for two years.

Also, I have to plans to merge rund into containerd. What I'm trying to patch on containerd side is bare minimum that would make things work.

or the usermode version when/if that issue is fixed

Btw, all those were fixed.

@dcantah
Copy link
Member

dcantah commented Jul 14, 2023

Hacky wasn't a shot at this work, more at the current state of what we have to work with on Mac. It's hard to make a call on any of the solutions people have been trialing given that, but this seems the most container-like.

Like I said I think now is a decent time to experiment given the next release is up for some breaking changes, and if we find something new and shiny we can pivot or if not and it doesn't go as planned we can go back to the drawing board.

Btw, all those were fixed.

Sweet!

@slonopotamus
Copy link
Contributor Author

slonopotamus commented Jul 14, 2023

Hacky wasn't a shot at this work, more at the current state of what we have to work with on Mac.

There's another path of Hypervisor framework runtime that would produce containers kinda similar to Hyper-V isolation Windows. By no means I'm trying to close that door.

Like I said I think now is a decent time to experiment given the next release is up for some breaking changes

I'm no tying my work to any of containerd releases. rund is deliberately an out-of-tree shim and it may never become of sufficient quality that would be worth PRing into containerd. I also work on it in my spare time, so no specific calendar milestones and work often gets interrupted by higher-priority tasks.


So, I say it again: I treat this PR as a smallest amount of changes to containerd that produces a functional snapshotter on macOS (currently there are none).

@dcantah
Copy link
Member

dcantah commented Jul 14, 2023

There's another path of Hypervisor framework runtime that would produce containers kinda similar to Hyper-V isolation Windows. By no means I'm trying to close that door.

Yea I know this is focused on Mac native containers and not linux on darwin, although I think there's obviously merit for that

I'm not tying my work to any of containerd releases.

By experiment for the next release I mean anything added to containerd (e.g. this change or any future that you may end up needing to trial MacOS bits) not the shim you're working on

@slonopotamus
Copy link
Contributor Author

Yea I know this is focused on Mac native containers and not linux on darwin

Nono, you can also run macOS inside Hypervisor. See Tart for example. For some reason they decided to make a separate command-line tool instead shaping their tool as a shim, but at least we can use Tart to see what is possible using Hypervisor in principle.

@dcantah
Copy link
Member

dcantah commented Jul 14, 2023

Also I'm planning to block off some time to try rund next week 😆

@slonopotamus
Copy link
Contributor Author

to try rund next week

It's a bit too early :D It's missing a process subreaper yet, so whatever involving more that one process leaks stuff and tends to hang during shutdown.

@dcantah
Copy link
Member

dcantah commented Jul 14, 2023

Nono, you can also run macOS inside Hypervisor.

Oop, I shouldn't have assumed which platform. My mind always goes to Linux for virt

@dcantah
Copy link
Member

dcantah commented Jul 18, 2023

The idea (in my head albeit right now) would be to have a plugin that you can optionally compile in that would handle these. We'd have a standardized cmdline interface for the binaries (must have both mount and unmount commands, support {insert flags} etc.) and you'd communicate with it via a new grpc API probably. So instead of mount.MountAll, you'd call client.MountAll or something similar. If the plugin doesn't have a mapping for that mount type it'll just use the builtin versions (e.g just call mount.Mount internally).

@slonopotamus
Copy link
Contributor Author

@dcantah Feel free to ping me when there's something to play with. I'll try to use that stuff to my macOS needs. For now... Looks like rund will have to live with containerd fork.

AkihiroSuda added a commit that referenced this pull request Jul 18, 2023
Revert "Add support for bind-mounts on Darwin (a.k.a. "make native snapshotter work")" (#8789)
@slonopotamus
Copy link
Contributor Author

As I understand, #5935 should be closed with the same reasoning.

@thehajime
Copy link
Contributor

Also native snapshotter worked just fine before this change.

I'm not sure how you verified that without a runtime. See runtime/v2/bundle.go in this PR diff. Current code just skips

I think @mxpv just mentioned about this
#6329

@thehajime
Copy link
Contributor

I lost the context...

What happened between merging this PR (e5a49e6) and revert this PR (#8838) ?

It would be nice to discuss on a public channel but cannot find any on this.

@AkihiroSuda
Copy link
Member

I lost the context...

What happened between merging this PR (e5a49e6) and revert this PR (#8838) ?

It would be nice to discuss on a public channel but cannot find any on this.

#8789 (comment)
#8789 (comment)

The PR was merged too quickly without establishing consensus, so we wanted to revert this until we can find a proper interface to shell out a third party mount helper without depending on it by default.

@thehajime
Copy link
Contributor

@AkihiroSuda thanks, makes sense.

@slonopotamus
Copy link
Contributor Author

just mentioned about this #6329

You're missing the point. Yes, current containerd code allows snapshotter to download and unpack images. No, current containerd code is not allowing to Just Write a runtime because certain codepaths inside containerd itself expect to be able to mount/unmount.

@bergwolf
Copy link
Contributor

@dcantah Any updates on the mount plugin idea? It would be great to see a common solution to this.

@dcantah
Copy link
Member

dcantah commented Aug 22, 2023

@bergwolf I've been working on it in my spare time since about a week and a half ago. Curious, do you have a use case? The only one I could think of so far is this Darwin case so you could "overload" what a bind mount does and proxy it to an external binary.

@dcantah
Copy link
Member

dcantah commented Aug 22, 2023

If there's appetite for this outside of that case I can put some more priority into the work

@AkihiroSuda
Copy link
Member

What about this?

Probably containerd/containerd/mount should have some function like RegisterHelper(fsType string, helper func(m Mount, target string)(err error)).

BuildKit may have [mountHelpers] section in buildkitd.toml to specify third party mount binaries like "bindfs" mount helper.

The default config of [mountHelpers] should be just empty, at least for the initial release.

Originally posted by @AkihiroSuda in moby/buildkit#4059 (comment)

@bergwolf
Copy link
Contributor

@dcantah Yes, we do have a use case. I'm working on Nydus and right now we use a binary mount helper to drop certain options returned by the snapshotter. We've opened #6746 to extened the Mount struct but clearly that is not an acceptable solution. So if an external mount plugin is preferred, we'd appreciate to support it so that we can drop the binary mount helper.

@bergwolf
Copy link
Contributor

@AkihiroSuda Still, we'd have to maintain and ship an external binary. Is it possible to optionally register a GRPC endpoint (as @dcantah proposed) instead?

@dcantah
Copy link
Member

dcantah commented Aug 22, 2023

@dcantah Yes, we do have a use case. I'm working on Nydus and right now we use a binary mount helper to drop certain options returned by the snapshotter. We've opened #6746 to extened the Mount struct but clearly that is not an acceptable solution. So if an external mount plugin is preferred, we'd appreciate to support it so that we can drop the binary mount helper.

Awesome okay, I'll prioritize drafting up my initial ideas into an issue then and then let me know if whatever the model is would work for your use case!

@AkihiroSuda
Copy link
Member

Is it possible to optionally register a GRPC endpoint (as @dcantah proposed) instead?

I guess we need both, for those who consume containerd as a library.

@jiangliu
Copy link
Contributor

@dcantah Yes, we do have a use case. I'm working on Nydus and right now we use a binary mount helper to drop certain options returned by the snapshotter. We've opened #6746 to extened the Mount struct but clearly that is not an acceptable solution. So if an external mount plugin is preferred, we'd appreciate to support it so that we can drop the binary mount helper.

Awesome okay, I'll prioritize drafting up my initial ideas into an issue then and then let me know if whatever the model is would work for your use case!

Thanks for prioritizing this work:)
So will the mount helper support two modes?

  • execute the actual mount operation
  • modify fields in MountInfo structure and then call mount.Mount to do actual mount operation

@dcantah
Copy link
Member

dcantah commented Aug 23, 2023

@jiangliu My idea so far is your mount helper could do whatever it wants. You'd receive all of the args and options that the mount syscall would receive and you're free to interpret or not honor any of them, much in the same way your helper does right now with ignoring the extra data. We'd modify the bits in containerd that call mount (runc shim for example) to call into the mount plugin instead. If you configured a mount type to point to your binary in containerds config file mount plugin section, then containerd upon receiving the request will invoke your binary and pass over everything it needs to perform the mount (this is undecided so far, e.g. just pass args on the cmdline or have the binary read a json blob from stdin, either might work).

We'd have a section in containerds config where you could setup mappings from mount type to binary:

[plugins."io.containerd.grpc.v1.mount".mappings]
        [plugins."io.containerd.grpc.v1.mount".mappings.overlay]
            helper = "/path/to/overlay-helper"
        [plugins."io.containerd.grpc.v1.mount".mappings.bind]
            helper = "/path/to/bind-helper"

where anything that doesn't have a registered "override" would just have containerd call mount.All as it does today and nothing changes. This leaves the current behavior the exact same for folks who don't need to use this functionality.

@jadnhm jadnhm mentioned this pull request Nov 21, 2023
6 tasks
// Mount to the provided target.
func (m *Mount) mount(target string) error {
// See https://github.com/slonopotamus/containerd-darwin-mount-helper for reference implementation
const commandName = "containerd-darwin-mount-helper"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we can reach consensus on reviving this PR if this is just an environment variable like CONTAINERD_DARWIN_MOUNT_HELPER or CONTAINERD_DARWIN_MOUNT_HELPER_EXPERIMENTAL that takes the path of the binary that is compatible with https://github.com/slonopotamus/containerd-darwin-mount-helper .

cc @containerd/committers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CONTAINERD_DARWIN_MOUNT_HELPER_EXPERIMENTAL sounds fairly reasonable to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a big fan of the generic helper interface which would be Darwin specific and not a well defined plugin interface. The idea around a large mount plugin interface doesn't really need to apply here either though, as the idea behind that interface before was always to go from custom mount types to system mount types. Even though "bindfs" is not a default util, I think we could easily just call out to "bindfs" and "hdiutil" in the Darwin mount code to cover a large range of the normal "system" mounts that would be expected there. Since "bindfs" is not a default, we could make that swappable or disabled via environment variables, but still utilizing the bindfs args. For apfs or other block type mounts, hdiutil can be relied on without needing to have custom binaries which just wrap hdiutil.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@slonopotamus @mxpv WDYT? ↑

Copy link
Contributor Author

@slonopotamus slonopotamus Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I somehow missed this.

  1. Both containerd and buildkit expect across numerous places in codebase that they can just umount() things. bindfs working through either macFUSE or fuse-t supports that.
  2. Containerd expects that bind-mounts exist and work. Even native snapshotter depends on bind mounts: https://github.com/containerd/containerd/blob/v2.0.0-rc.4/plugins/snapshots/native/native_default.go#L21, that's why darwin-containers@416899f hack was applied (it can be reverted if we have proper bind-mounts).
  3. I agree with @dmcgowan that introducing a plugin API is an overkill. And it is very likely to be a weird plugin API: do we expect that a single helper is supposed to support all kinds of mounts? If there are multiple helpers, codebase needs to track which helper was used for mounting to use the same helper for unmount. This even can be across processes (containerd cleans up things in case shim dies).
  4. Yes, binds is kind of a hack. Nobody says currently this is a production-scale code. If there emerges a better solution in the future, I will raise both of my hands for using it.
  5. We're talking about a diff with negative line delta :)

So, I'd like to split concerns.

First one is the need of bind-mounts that is not going to go anywhere. If you have other solutions besides bindfs, let's discuss them. if there are none, I don't see what's the benefit of adding wrapper layer. And API that is architected based on a single usecase usually fails to support the second when it emerges.

Second one is ineffectiveness of native snapshotter. This is a separate issue that can be solved independently (either via sparse-bundle approach suggested in #5935 or some FUSE solution or new FSKit API). Even if native snapshotter is slow, it is functionally correct and allows experimenting. My changes do not aim to solve this issue, I am only concentrated on bind-mounts that will be needed regardless of what snapshotter is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AkihiroSuda Let's suppose someone created a bindfs-like FOSS filesystem based on FSKit. Would dependency on such thing be considered acceptable for containerd as a bind mounts implementation? According to FSKit documentation, it would be used via mount -t MyAwesomeBind /src /dst. Plain unmount would also work for this fs. Given that Darwin doesn't have builtin bind fs type, such implementation could even just use that name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, maybe it should be rather a built-in feature of containerd

cc @mxpv @dmcgowan

Copy link
Contributor Author

@slonopotamus slonopotamus Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it should be rather a built-in feature of containerd

Well, the issue is that FSKit API is currently only available for Swift and Objective-C. Quick Google search suggests it is possible to call Objective-C via cgo, though that might be not a pleasant experience.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just call out to hdiutil for now?

Copy link
Contributor Author

@slonopotamus slonopotamus Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just call out to hdiutil for now?

No, because we don't know what args to pass to hdiutil yet (that imaginary FSKit-based bind-mount filesystem is nonexistent currently).

slonopotamus added a commit to darwin-containers/containerd that referenced this pull request Aug 30, 2024
…implementation

After these changes, in order to add Darwin bind-mount implementation, one only needs:
* Adjust HasBindMounts definition in mount.go
* Provide implementation in mount_darwin.go

As a bonus, Linux FUSE-related code was moved to a separate file and possibly could be reused on FreeBSD, though this needs testing.

There was no consensus on adding dependency on bindfs, that seems to be the only working solution for bind-mounts on Darwin as of today, in containerd#8789, that's why the actual implementation is not added in current PR.
slonopotamus added a commit to darwin-containers/containerd that referenced this pull request Aug 30, 2024
…implementation

After these changes, in order to add Darwin bind-mount implementation, one only needs:
* Adjust HasBindMounts definition in mount.go
* Provide implementation in mount_darwin.go

As a bonus, Linux FUSE-related code was moved to a separate file and possibly could be reused on FreeBSD, though this needs testing.

There was no consensus on adding dependency on bindfs, that seems to be the only working solution for bind-mounts on Darwin as of today, in containerd#8789, that's why the actual implementation is not added in current PR.

Signed-off-by: Marat Radchenko <[email protected]>
slonopotamus added a commit to darwin-containers/containerd that referenced this pull request Aug 30, 2024
…implementation

After these changes, in order to add Darwin bind-mount implementation, one only needs:
* Adjust HasBindMounts definition in mount.go
* Provide implementation in mount_darwin.go

There was no consensus on adding dependency on bindfs, that seems to be the only working solution for bind-mounts on Darwin as of today, in containerd#8789, that's why the actual implementation is not added in current PR.

As a bonus, Linux FUSE-related code was moved to a separate file and possibly could be reused on FreeBSD, though this needs testing.

Signed-off-by: Marat Radchenko <[email protected]>
slonopotamus added a commit to darwin-containers/containerd that referenced this pull request Aug 30, 2024
…implementation

After these changes, in order to add Darwin bind-mount implementation, one only needs:
* Adjust HasBindMounts definition in mount.go
* Provide implementation in mount_darwin.go

There was no consensus on adding dependency on bindfs, that seems to be the only working solution for bind-mounts on Darwin as of today, in containerd#8789, that's why the actual implementation is not added in current PR.

As a bonus, Linux FUSE-related code was moved to a separate file and possibly could be reused on FreeBSD, though this needs testing.

Signed-off-by: Marat Radchenko <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.