containerd-shim: Do not remount root MS_SLAVE#1141
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1141 +/- ##
======================================
Coverage 28.1% 28.1%
======================================
Files 28 28
Lines 2832 2832
======================================
Hits 796 796
Misses 1887 1887
Partials 149 149Continue to review full report at Codecov.
|
|
Because the shim mounts the rootfs, we also need to make sure it is unmounted before the shim exits with this change. |
|
Thanks, I'll look into that. Perhaps that need also explains the |
|
Updated with a version which removes the mount completely and also cleans up the rootfs mount. PTAL. (transcribed from #1132 (comment) which I confusingly put in the wrong place) |
1d46787 to
1f93aac
Compare
|
LGTM |
estesp
left a comment
There was a problem hiding this comment.
Seems reasonable; just a few thoughts on the cleanup function
There was a problem hiding this comment.
relatively insignificant, but maybe it's simpler just to do nrRootMounts = len(r.Rootfs) here at the top and not worry about counting in the loop?
There was a problem hiding this comment.
This was deliberate in order to only undo the work we succeeded with. Although that is somewhat undermined by my not actually calling cleanupMounts in the error path for the Mount call itself!
There was a problem hiding this comment.
Actually, the more i look at this why don't you just loop until you receive a mount error that the path is not mounted instead of keeping this count around?
There was a problem hiding this comment.
You can just loop until you hit EINVAL
There was a problem hiding this comment.
TL;DR, yes, I should loop until EINVAL.
I was thinking it was possible that the caller might have also mounted some stuff on the rootfs before calling create? It'd be weird for containerd to undo that. But thinking some more the rootfs here is entirely internal to containerd(-shim) and the user doesn't really formally know where it would be, which I suppose is why were have the rootfs slice in the first place...
There was a problem hiding this comment.
EINVAL covers a multitude of sins according to umount(2), including "bad flags" in addition to "not a mount point" (I'm not too worried about "called with MNT_EXPIRE and either MNT_DETACH or MNT_FORCE").
I think it's still the right way to go though.
There was a problem hiding this comment.
what about making this a defered function with a success variable declared as false before the defer definition. then the unmount can be gated by if !success {, and only set success to true at the one success path at the bottom before returning the pid?
There was a problem hiding this comment.
In this case, wouldn't success be defined as success := err == nil. If so, can jut check if the err is nil at the top of the closure.
There was a problem hiding this comment.
The err captured by the closure may not be the same err as gets assigned to further down the function, since err can be shadowed (e.g. if err := foo(); err != nil { return err } shadows the previous err, I think)
There was a problem hiding this comment.
Yeah, my assumption is the err == nil defer style would require a careful undoing of err scoping in all the if blocks with a function-scoped var err error at the top, right?
There was a problem hiding this comment.
Even then I think you can shadow without the compiler warning about it if the new declaration happens in a nested scope, so there would be a danger of forgetting and doing an err := somewhere.
There was a problem hiding this comment.
On error cases, won't this ultimately give a partially unwound stack of mounts anyway?
Should it just return on the first error?
There was a problem hiding this comment.
I figured there was no harm in trying to do all of them, but yes it is likely not to do much in practice after the first failure and will certainly not unmount the full stack if one of them fails.
There was a problem hiding this comment.
I think it is still best to try.
We expect this (or something very similar) to be merged soon, it fixes linuxkit#2131 so moving ahead now. The new alpine mirror is linuxkit/alpine:ce99c286eec27d0e722fc0e6ac2a3d2b1abc874e Signed-off-by: Ian Campbell <[email protected]>
|
I made the change to use |
|
Rebased onto alpha0. |
We expect this (or something very similar) to be merged soon, it fixes linuxkit#2131 so moving ahead now. The new alpine mirror is linuxkit/alpine:6832775a7e861ee2d7842e157688ece52d007142 Signed-off-by: Ian Campbell <[email protected]>
estesp
left a comment
There was a problem hiding this comment.
One minor nit: it seems idiomatic in Go to not assign zero values; so var success bool is usually more accepted than success := false
Regardless, LGTM
Any other maintainers want to weigh in? I believe @ijc has responded to all feedback
|
Thanks @estesp I fixed the zero val assign. |
Mounting as MS_SLAVE here breaks use cases which want to use rootPropagation=shared in order to expose mounts to the host (and other containers binding the same subtree), mounting as e.g. MS_SHARED is pointless in this context so just remove. Having done this we also need to arrange to manually clean up the mounts on delete, so do so. Note that runc will also setup root as required by rootPropagation, defaulting to MS_PRIVATE. Fixes containerd#1132. Signed-off-by: Ian Campbell <[email protected]>
Signed-off-by: Ian Campbell <[email protected]>
This avoids someone adding a new error path and forgetting to call the cleanup function. We prefer to use an explicit flag to gate the clean rather than relying on `err != nil` so we don't have to rely on people never accidentally shadowing the `err` as seen by the closure. Signed-off-by: Ian Campbell <[email protected]>
This is simpler than trying to count how many successful mounts we made. Signed-off-by: Ian Campbell <[email protected]>
|
Rebased onto alpha1. |
|
LGTM This looks okay. The only area that I'm worried about is unmount mounts that weren't mounted by the shim. |
Since 'containerd/containerd#1141' has been merged into containerd git repo, so we don't need the below repo 'https://github.com/ijc/containerd.git' as the containerd git tree in that pkg. Moreover, with the 'ijc' containerd git tree, we'll have below error message when build the pkg/containerd: fatal: reference is not a tree: d42cb88ba2b08d2ca6c8c017d629b394bf1dd08c The command '/bin/sh -c git checkout $CONTAINERD_COMMIT' returned a non-zero code: 128. Except that above issue, we need to update the "github.com/Sirupsen/logrus" pkg imported, because the latest code has change the 'Sirupsen' to 'sirupsen', else the build process will complain that that pkg can't be found. We also need to update the caller of NewTask(.) adapting to the latest implementation of NewTask() to avoid the type conflict issue. Signed-off-by: Dennis Chen <[email protected]>
Mounting as MS_SLAVE here breaks use cases which want to use
rootPropagation=shared in order to expose mounts to the host (and other
containers binding the same subtree).
Note that runc will also setup root as required by rootPropagation, defaulting
to MS_PRIVATE.
Fixes #1132.
Signed-off-by: Ian Campbell [email protected]