Support running the Engine daemon inside a user namespace#20902
Support running the Engine daemon inside a user namespace#20902hallyn wants to merge 1 commit intomoby:masterfrom
Conversation
|
Hi @hallyn, thanks for contribution. Could you do the following :
|
|
D'oh! the third patch (which edits vendor/) was supposed to be removed and is in fact already a PR against opencontainers/runc. Removing it for real now. |
6dda0d6 to
7101e12
Compare
7101e12 to
412c77a
Compare
|
ping @estesp for user namespaces |
|
Got panics on CI |
pkg/chrootarchive/archive_unix.go
Outdated
There was a problem hiding this comment.
This is the cause of the test panic as options is still nil at this point.
However, I assume this is actually being passed in via the JSON "options" object and shouldn't be set here anyway, so maybe a stray leftover from testing?
|
Hm, I don't know what happened there, that doesn't look like my original patch... checking. |
|
Last step here is to handle cross-platform issue with the RunningInUserNS: Running make cross will reveal the issue (the libcontainer/system/linux.go functions are only compiled on I think the cleanest way is a runC PR to have a stub for non-Linux that returns |
|
@hallyn I created opencontainers/runc#620 to solve this; I'm hoping to vendor in opencontainers/runc very soon for the shared namespace work that is now in libcontainer/nsenter; so hopefully we can get this solved ASAP. |
|
@hallyn you can include the change from opencontainers/runc#620 in your PR by adding the file here https://github.com/docker/docker/tree/master/vendor/src/github.com/opencontainers/runc/libcontainer/system. This will make the "vendor" CI check fail, but the other tests should then complete successfully. Once the RunC PR is merged, and vendored, you can rebase your PR to make the vendor check pass |
|
Thanks, done, will watch for the runc pr merge... |
pkg/chrootarchive/diff_unix.go
Outdated
There was a problem hiding this comment.
options is nil until the line further down, so this fails with panic: runtime error: invalid memory address or nil pointer dereference 😄
There was a problem hiding this comment.
actually, even worse -- options doesn't appear to ever be set to non-nil within this function
There was a problem hiding this comment.
? the diff I have here has options.InUserNS = true happening after the unmarshalling
There was a problem hiding this comment.
Hmm, @mwhudson managed to trigger a panic here on s390x, so I wonder if there's some edge case where json.Unmarshal doesn't create the struct?
There was a problem hiding this comment.
Ah nevermind, that's a different set of changes from v1.10.2...hallyn:v1.10.0.serge.2#diff-4166b9ad558bc9c8d0ff7b01b69c128aR34 causing that one -- carry on! 👍
There was a problem hiding this comment.
Quoting Tianon Gravi ([email protected]):
@@ -49,6 +51,10 @@ func applyLayer() {
fatal(err)
}
- if inUserns {
options.InUserNS = true(even still, couldn't this just be
options.InUserNS = inUserns? or does setting it regardless like that have other consequences?)
nu, bc we must detect it before the chroot.
There was a problem hiding this comment.
Quoting Tianon Gravi ([email protected]):
@@ -49,6 +51,10 @@ func applyLayer() {
fatal(err)
}
- if inUserns {
options.InUserNS = trueAh nevermind, that's a different set of changes from v1.10.2...hallyn:v1.10.0.serge.2#diff-4166b9ad558bc9c8d0ff7b01b69c128aR34 causing that one -- carry on! 👍
yeah i had it wrong originally
There was a problem hiding this comment.
fwiw, I triggered this by running "sudo adt-run --unbuilt-tree . --- null". and I just tried the same on arm64 with the same results, so it would probably happen on amd64 too...
There was a problem hiding this comment.
Exactly which commit id were you using?
There was a problem hiding this comment.
That was my fault -- he was testing on essentially ab8e54b, which is missing 45afdc9af9c300887cfd9952e04f5ba7d500f4d9 (which fixes the panic).
|
@hallyn we need to wait until the containerd integration PR merges (#20662) to get rid of your vendor commit--that PR will update the In the meantime, what do you think about logging the "skips" for unsupported file types when tarring/untarring in the archive packages? That way there will be some record if bugs come up in the future about "missing" content? |
bdb2d33 to
a32e59c
Compare
|
Hi, rebased and added the log msg @estesp mentioned. |
|
code LGTM. Per comment in the more recent PR re: userns + AUFS, @thaJeztah do you know which doc would be the right place to explain this behavior/availability of running the engine in a userns? |
|
Ah ok if you're using the stock xenial kernel then the patch
in the closed merge request I sent earlier should work for you.
It'll use the 'nsroot=' field in mountinfo to figure out what
it needs. (The nsroot= is not likely to go upstream, hence
the closed merge request)
|
|
So I still can't get this to work. |
|
@cpuguy83 - can you show the output of 'lxc config show containername --expanded' ? lxc launch ubuntu;xenial x1 -p default -p docker and use the docker.io package in the archive, and see whether that also fails for you? |
|
ping? |
|
There should be a test added for this, I bet you can use unshare, and add it to the daemon suite |
|
otherwise we will never know if it breaks :) |
|
A test would definately be good. IMO this was a bugfix not a feature so a testcase could be a follow-on, but especially now that it's been months since the patch was written, it would already be nice to have one. I'm all for someone resubmitting my patch with an added-on test :) |
|
Ping @estesp: halp! |
|
I may also try with lxc, but I thought using Ran into several issues with an ubuntu image that I could mostly get around with the following daemon startup: However, on Maybe LXC is set up better for this nesting, but I would like to figure out how to get runc as a valid "candidate" for the outer layer of the nesting if at all possible. Any thoughts @hallyn ? |
|
Ah, this is caused by #22506; seems we will need to extend this patch to rely on/fallback to |
|
Getting further after fallback to "real choot" if nested userns, but shm tmpfs mount seems to be a problem; or a real problem is not logged and the cleanup of mounts is just a side effect: At this point I have to stop for now, but suggestions welcome. Looks like we are close to a testable scenario with runc, maybe? |
|
ping @mlaventure perhaps you could have a look? |
|
Here's the real issue; at least running under |
|
I'm trying to catch up this PR because I also want to run docker inside user namespace (unpriv LXC container) and want to reproduce the errors locally first. @estesp Mind sharing in what env you are using? Specifically, I like to know
Also, I'm curious if running the latest docker inside userns becomes possible without changing runc/containerd? Or is it something we'll find out eventually? |
|
@kimh it is possible with LXC and this PR plus an additional change that is required since this PR was created (I mentioned it above related to chroot/untar fallback in a comment 12 days ago). I will be carrying this PR with the additional change, but had been stuck on trying to see if I could get a working test using I did try via LXC today and with this PR + my added change I can run the daemon with " |
|
@estesp So if my understanding is correct, my added change refers to fallback to chroot under nested userns ? Any chance you can share the change with us? I looked for the branch in your fork but not sure which one it is. I'm thrilled about testing this 💣 |
| case os.ModeNamedPipe: | ||
| fallthrough | ||
| case os.ModeSocket: | ||
| if rsystem.RunningInUserNS() { |
There was a problem hiding this comment.
this seems be a typo. according to commit message, it should skip in os.ModeDevice case, not in os.ModeSocket
This change was introduced early in the development of rootless support, before all the kinks were worked out and rootlesskit was built. The author was testing the daemon by inside a user namespace set up by runc, observed that the unshare(2) syscall was returning EPERM, and assumed that it was a fundamental limitation of user namespaces. Seeing as the kernel documentation (of today) disagrees with that assessment and that unshare demonstrably works inside user namespaces, I can only assume that the EPERM was due to a quirk of their test environment, such as a seccomp filter set up by runc blocking the unshare syscall. moby#20902 (comment) Mount namespaces are necessary to address moby#38995 and moby#43390. Revert the special-casing so those issues can also be fixed for rootless daemons. This reverts commit dc95056. Signed-off-by: Cory Snider <[email protected]>
This change was introduced early in the development of rootless support, before all the kinks were worked out and rootlesskit was built. The author was testing the daemon by inside a user namespace set up by runc, observed that the unshare(2) syscall was returning EPERM, and assumed that it was a fundamental limitation of user namespaces. Seeing as the kernel documentation (of today) disagrees with that assessment and that unshare demonstrably works inside user namespaces, I can only assume that the EPERM was due to a quirk of their test environment, such as a seccomp filter set up by runc blocking the unshare syscall. moby#20902 (comment) Mount namespaces are necessary to address moby#38995 and moby#43390. Revert the special-casing so those issues can also be fixed for rootless daemons. This reverts commit dc95056. Signed-off-by: Cory Snider <[email protected]>
This change was introduced early in the development of rootless support, before all the kinks were worked out and rootlesskit was built. The author was testing the daemon by inside a user namespace set up by runc, observed that the unshare(2) syscall was returning EPERM, and assumed that it was a fundamental limitation of user namespaces. Seeing as the kernel documentation (of today) disagrees with that assessment and that unshare demonstrably works inside user namespaces, I can only assume that the EPERM was due to a quirk of their test environment, such as a seccomp filter set up by runc blocking the unshare syscall. moby#20902 (comment) Mount namespaces are necessary to address moby#38995 and moby#43390. Revert the special-casing so those issues can also be fixed for rootless daemons. This reverts commit dc95056. Signed-off-by: Cory Snider <[email protected]>
This change was introduced early in the development of rootless support, before all the kinks were worked out and rootlesskit was built. The author was testing the daemon by inside a user namespace set up by runc, observed that the unshare(2) syscall was returning EPERM, and assumed that it was a fundamental limitation of user namespaces. Seeing as the kernel documentation (of today) disagrees with that assessment and that unshare demonstrably works inside user namespaces, I can only assume that the EPERM was due to a quirk of their test environment, such as a seccomp filter set up by runc blocking the unshare syscall. moby#20902 (comment) Mount namespaces are necessary to address moby#38995 and moby#43390. Revert the special-casing so those issues can also be fixed for rootless daemons. This reverts commit dc95056. Signed-off-by: Cory Snider <[email protected]>
This change was introduced early in the development of rootless support, before all the kinks were worked out and rootlesskit was built. The author was testing the daemon by inside a user namespace set up by runc, observed that the unshare(2) syscall was returning EPERM, and assumed that it was a fundamental limitation of user namespaces. Seeing as the kernel documentation (of today) disagrees with that assessment and that unshare demonstrably works inside user namespaces, I can only assume that the EPERM was due to a quirk of their test environment, such as a seccomp filter set up by runc blocking the unshare syscall. moby/moby#20902 (comment) Mount namespaces are necessary to address #38995 and #43390. Revert the special-casing so those issues can also be fixed for rootless daemons. This reverts commit 047305a. Signed-off-by: Cory Snider <[email protected]>
Two commits to accomodate things we cannot do when running docker from a container which is in a user namespace. Combined with one more patch to libcontainer itself, this suffices to run docker inside a user namespaced container.