Start containers in their own cgroup namespaces#38377
Start containers in their own cgroup namespaces#38377yongtang merged 2 commits intomoby:masterfrom rgulewich:38332-cgroup-ns
Conversation
|
Please sign your commits following these rules: $ git clone -b "38332-cgroup-ns" [email protected]:rgulewich/moby.git somewhere
$ cd somewhere
$ git commit --amend -s --no-edit
$ git push -fAmending updates the existing PR. You DO NOT need to open a new one. |
thaJeztah
left a comment
There was a problem hiding this comment.
Thanks! Only gave this a cursory glance, but left some comments.
Looks like this information will be returned by the /info endpoint, so we'll need changes to the swagger file https://github.com/moby/moby/blob/master/api/swagger.yaml, and the API changelog https://github.com/moby/moby/blob/master/docs/api/version-history.md as well
There was a problem hiding this comment.
Perhaps we should have a testEnv.SupportsCgroupNS() utility for this 🤔
pkg/sysinfo/sysinfo_linux.go
Outdated
There was a problem hiding this comment.
We may want to extract it to a function, e.g.
| sysInfo.CgroupNamespaces = true | |
| sysInfo.CgroupNamespaces = cgroupNamespacesSupported() |
Looking at that, perhaps this should be a utility in libcontainer; somewhere around here; https://github.com/opencontainers/runc/blob/f5b99917df9fbe1d9a4114966fb088dd6860e85a/libcontainer/cgroups/utils.go#L30-L45
and/or in the containerd/cgroups package https://github.com/containerd/cgroups/blob/master/utils.go
There was a problem hiding this comment.
Good idea - it makes sense to me for this to live in libcontainer. What do you think about adding it in its own sub-package of libcontainer? Everything in the libcontainer/cgroups package deals with cgroups themselves, whereas this is actually a namespace - I worry that putting them together in the cgroups package might make this more confusing.
So maybe something like libcontainer/namespace/cgroups.go?
daemon/oci_linux.go
Outdated
There was a problem hiding this comment.
Wondering if this should be enabled by default if the host support it, or if it should be an opt-in/opt-out
I'm not too familiar with the feature itself, but my train of thought is; is this causing a change in behavior? Would there be situations where namespaced cgroups are not desirable?
There was a problem hiding this comment.
I was on the fence about this myself - I listed a couple other options in the GH issue. @cpuguy83, any thoughts?
There was a problem hiding this comment.
The main change in behaviour is that /proc/self/cgroup will now hide the Docker-related stuff. Personally I think it should be enabled by default and shouldn't be daemon-wide at all -- it should be treated like all other namespaces. My only worry is that sharing a cgroup with --cgroup-parent will now become more complicated because if you have two containers in the same cgroup but in different cgroup namespaces then things start to get a bit weird (especially with nsdelegate thrown in).
There was a problem hiding this comment.
Sorry, I missed this notification.
Agree we should enable by default, but make sure not to break users who expect things like cgroup-parent and privileged to work a certain way.
There was a problem hiding this comment.
The current behaviour is to enable it by default (just like any other namespace), except if the container is privileged.
The main side effect (which shows up in the integration tests), is that the cgroup hierarchy is no longer visible from inside the container. I don't know if this qualifies as a breaking change or not.
@cyphar / @cpuguy83, what are your concerns about running with --cgroup-parent? I can add some integration tests to confirm the behaviour.
There was a problem hiding this comment.
The main side effect (which shows up in the integration tests), is that the cgroup hierarchy is no longer visible from inside the container. I don't know if this qualifies as a breaking change or not.
This is the main purpose of cgroup namespaces, so it's good that it works. 😉 I don't think it'll count as a breaking change unless some folks have really tightly integrated into what /proc/self/cgroups contains which really isn't a good idea anyway.
what are your concerns about running with
--cgroup-parent?
Since writing the above comment I've remembered that cgroup sharing doesn't exist in Docker (getting two containers to share a cgroup) and so --cgroup-parent shouldn't be a problem after all. The concern would be that if you have two containers sharing a cgroup you would want them to share cgroup namespaces.
There was a problem hiding this comment.
We do support sharing of most namespaces, so we could add support for cgroup namespace sharing.
|
CI failures look related |
Codecov Report
@@ Coverage Diff @@
## master #38377 +/- ##
=========================================
Coverage ? 36.99%
=========================================
Files ? 612
Lines ? 45468
Branches ? 0
=========================================
Hits ? 16823
Misses ? 26352
Partials ? 2293 |
|
@thaJeztah This change doesn't change the Should it be added to that endpoint? |
|
@rgulewich plz check failing tests |
|
(reserved for my derek commands) |
|
@rgulewich #38426 removed SameHostDaemon check so CI fails now to: so can your rebase and update tests? |
|
@olljanat Pushed changes. The experimental failures don't look to be related, unless I'm mistaken. |
|
Would anyone be able to give a more thorough review of this? I agree with @cyphar's comment in opencontainers/runc#1976 that moving the detection code into libcontainer isn't really necessary (though that PR is there if that's preferable). @cpuguy83, would you mind checking that any concerns you have are covered? @AkihiroSuda / @yongtang / @olljanat / @kolyshkin / @justincormack - you all look to have committed code in this area somewhat recently. Would any of you mind reviewing? Thanks! |
|
I'm a bit worried that we are having to change tests and as such making a breaking change... but maybe the tests are just not good tests. |
|
@cpuguy83 - Fair enough. I wouldn't want to break someone who was depending on this behaviour. Would a run-time flag to enable / disable this for the entire daemon be sufficient, do you think? |
|
All tests passed, merging. |
|
Is there docker CLI pr for overriding |
|
@AkihiroSuda I don't think there is one yet (at least; I don't see one linked) |
|
@rgulewich
It doesn't fail in PRs and in master just because it is skipped because of the system condition checks. Can you follow-up please? |
Good catch, @tonistiigi - please see my PR here: #39578 |
- What I did
This adds cgroup namespace support to docker. It's enabled for all containers that are not run with
--privileged, if the kernel supports it.Fixes #38332.
- How I did it
Now that runc supports cgroup namespaces, just add a namespace of type "cgroup" to the list.
At daemon startup, this looks for
/proc/self/ns/cgroupto be present to determine whether or not to enable it.- How to verify it
Run a new container and verify that its pid 1 is running in its own cgroup namespace.
- Description for the changelog
Add cgroup namespace support for Linux containers
- A picture of a cute animal (not mandatory but encouraged)