Skip to content

Start containers in their own cgroup namespaces#38377

Merged
yongtang merged 2 commits intomoby:masterfrom
rgulewich:38332-cgroup-ns
May 12, 2019
Merged

Start containers in their own cgroup namespaces#38377
yongtang merged 2 commits intomoby:masterfrom
rgulewich:38332-cgroup-ns

Conversation

@rgulewich
Copy link
Contributor

@rgulewich rgulewich commented Dec 14, 2018

- What I did

This adds cgroup namespace support to docker. It's enabled for all containers that are not run with --privileged, if the kernel supports it.

Fixes #38332.

- How I did it

Now that runc supports cgroup namespaces, just add a namespace of type "cgroup" to the list.

At daemon startup, this looks for /proc/self/ns/cgroup to be present to determine whether or not to enable it.

- How to verify it

Run a new container and verify that its pid 1 is running in its own cgroup namespace.

- Description for the changelog

Add cgroup namespace support for Linux containers

- A picture of a cute animal (not mandatory but encouraged)

@GordonTheTurtle
Copy link

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "38332-cgroup-ns" [email protected]:rgulewich/moby.git somewhere
$ cd somewhere
$ git commit --amend -s --no-edit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

@GordonTheTurtle GordonTheTurtle added dco/no Automatically set by a bot when one of the commits lacks proper signature status/0-triage and removed dco/no Automatically set by a bot when one of the commits lacks proper signature labels Dec 14, 2018
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Only gave this a cursory glance, but left some comments.

Looks like this information will be returned by the /info endpoint, so we'll need changes to the swagger file https://github.com/moby/moby/blob/master/api/swagger.yaml, and the API changelog https://github.com/moby/moby/blob/master/docs/api/version-history.md as well

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should have a testEnv.SupportsCgroupNS() utility for this 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to extract it to a function, e.g.

Suggested change
sysInfo.CgroupNamespaces = true
sysInfo.CgroupNamespaces = cgroupNamespacesSupported()

Looking at that, perhaps this should be a utility in libcontainer; somewhere around here; https://github.com/opencontainers/runc/blob/f5b99917df9fbe1d9a4114966fb088dd6860e85a/libcontainer/cgroups/utils.go#L30-L45

and/or in the containerd/cgroups package https://github.com/containerd/cgroups/blob/master/utils.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea - it makes sense to me for this to live in libcontainer. What do you think about adding it in its own sub-package of libcontainer? Everything in the libcontainer/cgroups package deals with cgroups themselves, whereas this is actually a namespace - I worry that putting them together in the cgroups package might make this more confusing.

So maybe something like libcontainer/namespace/cgroups.go?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if this should be enabled by default if the host support it, or if it should be an opt-in/opt-out

I'm not too familiar with the feature itself, but my train of thought is; is this causing a change in behavior? Would there be situations where namespaced cgroups are not desirable?

@AkihiroSuda @crosbymichael ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was on the fence about this myself - I listed a couple other options in the GH issue. @cpuguy83, any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main change in behaviour is that /proc/self/cgroup will now hide the Docker-related stuff. Personally I think it should be enabled by default and shouldn't be daemon-wide at all -- it should be treated like all other namespaces. My only worry is that sharing a cgroup with --cgroup-parent will now become more complicated because if you have two containers in the same cgroup but in different cgroup namespaces then things start to get a bit weird (especially with nsdelegate thrown in).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed this notification.

Agree we should enable by default, but make sure not to break users who expect things like cgroup-parent and privileged to work a certain way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current behaviour is to enable it by default (just like any other namespace), except if the container is privileged.

The main side effect (which shows up in the integration tests), is that the cgroup hierarchy is no longer visible from inside the container. I don't know if this qualifies as a breaking change or not.

@cyphar / @cpuguy83, what are your concerns about running with --cgroup-parent? I can add some integration tests to confirm the behaviour.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main side effect (which shows up in the integration tests), is that the cgroup hierarchy is no longer visible from inside the container. I don't know if this qualifies as a breaking change or not.

This is the main purpose of cgroup namespaces, so it's good that it works. 😉 I don't think it'll count as a breaking change unless some folks have really tightly integrated into what /proc/self/cgroups contains which really isn't a good idea anyway.

what are your concerns about running with --cgroup-parent?

Since writing the above comment I've remembered that cgroup sharing doesn't exist in Docker (getting two containers to share a cgroup) and so --cgroup-parent shouldn't be a problem after all. The concern would be that if you have two containers sharing a cgroup you would want them to share cgroup namespaces.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do support sharing of most namespaces, so we could add support for cgroup namespace sharing.

@thaJeztah
Copy link
Member

CI failures look related

23:54:57 FAIL: docker_cli_build_test.go:3973: DockerSuite.TestBuildContainerWithCgroupParent
23:54:57 
23:54:57 docker_cli_build_test.go:3996:
23:54:57     c.Fatalf("There is no expected memory cgroup with parent /%s/: %s", cgroupParent, result.Combined())
23:54:57 ... Error: There is no expected memory cgroup with parent /test/: Sending build context to Docker daemon  2.048kB


23:54:57 Step 1/2 : FROM busybox
23:54:57  ---> 6ad733544a63
23:54:57 Step 2/2 : RUN cat /proc/self/cgroup
23:54:57  ---> Running in a799443f87c8
23:54:57 11:devices:/
23:54:57 10:pids:/
23:54:57 9:cpuset:/
23:54:57 8:cpu,cpuacct:/
23:54:57 7:freezer:/
23:54:57 6:net_cls,net_prio:/
23:54:57 5:hugetlb:/
23:54:57 4:blkio:/
23:54:57 3:perf_event:/
23:54:57 2:memory:/
23:54:57 1:name=systemd:/
23:54:57 Removing intermediate container a799443f87c8
23:54:57  ---> 48b98162374e
23:54:57 Successfully built 48b98162374e
23:54:57 Successfully tagged buildcgroupparent:latest
23:54:57 

00:15:14 FAIL: docker_cli_run_test.go:3237: DockerSuite.TestRunContainerWithCgroupParent
00:15:14 
00:15:14 docker_cli_run_test.go:3242:
00:15:14     // cgroup-parent relative path
00:15:14     testRunContainerWithCgroupParent(c, "test", "cgroup-test")
00:15:14 docker_cli_run_test.go:3267:
00:15:14     c.Fatalf("unexpected cgroup paths. Expected at least one cgroup path to have suffix %q. Cgroup Paths: %v", expectedCgroup, cgroupPaths)
00:15:14 ... Error: unexpected cgroup paths. Expected at least one cgroup path to have suffix "test/ee671df4af17e4f7594f6199410af9e09025b4cefb4f3fdef3391f35eaf976d5". Cgroup Paths: map[cpu,cpuacct:/ hugetlb:/ blkio:/ memory:/ name=systemd:/ perf_event:/ devices:/ pids:/ cpuset:/ freezer:/ net_cls,net_prio:/]
00:15:14 


00:16:09 FAIL: docker_cli_run_test.go:3272: DockerSuite.TestRunInvalidCgroupParent
00:16:09 
00:16:09 docker_cli_run_test.go:3276:
00:16:09     testRunInvalidCgroupParent(c, "../../../../../../../../SHOULD_NOT_EXIST", "SHOULD_NOT_EXIST", "cgroup-invalid-test")
00:16:09 docker_cli_run_test.go:3307:
00:16:09     c.Fatalf("unexpected cgroup paths. Expected at least one cgroup path to have suffix %q. Cgroup Paths: %v", expectedCgroup, cgroupPaths)
00:16:09 ... Error: unexpected cgroup paths. Expected at least one cgroup path to have suffix "SHOULD_NOT_EXIST/42c0cf9e9b4892b42a23b57f943d417a52a8373608c7d69b32e56bc528bc0761". Cgroup Paths: map[pids:/ net_cls,net_prio:/ blkio:/ perf_event:/ name=systemd:/ devices:/ cpuset:/ cpu,cpuacct:/ freezer:/ hugetlb:/ memory:/]
00:16:09 


00:24:18 FAIL: docker_cli_daemon_test.go:1857: DockerDaemonSuite.TestDaemonCgroupParent
00:24:18 
00:24:18 [d2f8a22850347] waiting for daemon to start
00:24:18 [d2f8a22850347] daemon started
00:24:18 
00:24:18 docker_cli_daemon_test.go:1881:
00:24:18     c.Assert(found, checker.True, check.Commentf("Cgroup path for container (%s) doesn't found in cgroups file: %s", expectedCgroup, cgroupPaths))
00:24:18 ... obtained bool = false
00:24:18 ... Cgroup path for container (test/0c465dd7a395dc6ce8c0d6a292233f3580b0192f20db92712fb892077c8c8268) doesn't found in cgroups file: map[devices:/ pids:/ cpuset:/ freezer:/ perf_event:/ memory:/ name=systemd:/ cpu,cpuacct:/ net_cls,net_prio:/ hugetlb:/ blkio:/]
00:24:18 
00:24:18 [d2f8a22850347] exiting daemon
00:24:18 [d2f8a22850347] waiting for daemon to start
00:24:18 [d2f8a22850347] daemon started
00:24:18 
00:24:18 [d2f8a22850347] exiting daemon
00:24:18 

@thaJeztah thaJeztah added status/1-design-review status/failing-ci Indicates that the PR in its current state fails the test suite and removed status/0-triage labels Dec 15, 2018
@codecov
Copy link

codecov bot commented Dec 16, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@b20a14b). Click here to learn what that means.
The diff coverage is 36.58%.

@@            Coverage Diff            @@
##             master   #38377   +/-   ##
=========================================
  Coverage          ?   36.99%           
=========================================
  Files             ?      612           
  Lines             ?    45468           
  Branches          ?        0           
=========================================
  Hits              ?    16823           
  Misses            ?    26352           
  Partials          ?     2293

@rgulewich
Copy link
Contributor Author

@thaJeztah This change doesn't change the /info endpoint on my machine:

$ curl -sS http://localhost:4243/v1.26/info | jq . | grep -i cgroup
  "CgroupDriver": "cgroupfs",

Should it be added to that endpoint?

@olljanat
Copy link
Contributor

@rgulewich plz check failing tests

@olljanat
Copy link
Contributor

olljanat commented Jan 11, 2019

(reserved for my derek commands)

@derek derek bot added rebuild/z and removed rebuild/z status/failing-ci Indicates that the PR in its current state fails the test suite labels Jan 11, 2019
@olljanat
Copy link
Contributor

@rgulewich #38426 removed SameHostDaemon check so CI fails now to:

07:13:07 # github.com/docker/docker/integration-cli [github.com/docker/docker/integration-cli.test]
07:13:07 integration-cli/docker_cli_daemon_test.go:1858:18: undefined: SameHostDaemon
07:13:07 integration-cli/docker_cli_run_test.go:3239:18: undefined: SameHostDaemon

so can your rebase and update tests?

@rgulewich
Copy link
Contributor Author

@olljanat Pushed changes. The experimental failures don't look to be related, unless I'm mistaken.

@rgulewich
Copy link
Contributor Author

rgulewich commented Feb 14, 2019

Would anyone be able to give a more thorough review of this? I agree with @cyphar's comment in opencontainers/runc#1976 that moving the detection code into libcontainer isn't really necessary (though that PR is there if that's preferable). @cpuguy83, would you mind checking that any concerns you have are covered?

@AkihiroSuda / @yongtang / @olljanat / @kolyshkin / @justincormack - you all look to have committed code in this area somewhat recently. Would any of you mind reviewing?

Thanks!

@cpuguy83
Copy link
Member

I'm a bit worried that we are having to change tests and as such making a breaking change... but maybe the tests are just not good tests.
I feel like the cgroup API is pretty well defined here and depending on a particular heirarchy for a cgroup seems wrong.... but still worried that people could be doing this, particularly because there's no way to turn it off.

@rgulewich
Copy link
Contributor Author

@cpuguy83 - Fair enough. I wouldn't want to break someone who was depending on this behaviour. Would a run-time flag to enable / disable this for the entire daemon be sufficient, do you think?

@yongtang
Copy link
Member

All tests passed, merging.

@AkihiroSuda
Copy link
Member

Is there docker CLI pr for overriding default-cgroupns-mode ?

@thaJeztah
Copy link
Member

@AkihiroSuda I don't think there is one yet (at least; I don't see one linked)

@tonistiigi
Copy link
Member

@rgulewich TestCgroupNamespacesRunPrivileged added in this test seems to be failing consistently with

assertion failed: error is not nil: Error response from daemon: privileged mode is incompatible with private cgroup namespaces. You must run the container in the host cgroup namespace when running privileged mode

eg. https://jenkins.dockerproject.org/job/Docker%20Master/label=centos-7-noselinux-devicemapper/10631/consoleFull

It doesn't fail in PRs and in master just because it is skipped because of the system condition checks. Can you follow-up please?

@rgulewich
Copy link
Contributor Author

@rgulewich TestCgroupNamespacesRunPrivileged added in this test seems to be failing consistently with

assertion failed: error is not nil: Error response from daemon: privileged mode is incompatible with private cgroup namespaces. You must run the container in the host cgroup namespace when running privileged mode

eg. https://jenkins.dockerproject.org/job/Docker%20Master/label=centos-7-noselinux-devicemapper/10631/consoleFull

It doesn't fail in PRs and in master just because it is skipped because of the system condition checks. Can you follow-up please?

Good catch, @tonistiigi - please see my PR here: #39578

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cgroup namespace support