-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
systemd: support cgroup parent with specified slice #336
Conversation
@hqhq Thanks for reviving this! :) I think we can review your PR. |
ping @LK4D4 @crosbymichael @mrunalp @vishh What do you think about this PR. A lot of users want If we can't consist them, and use Maybe we can reopen #141 to finish the discussion. And I'm not a systemd expert, please correct me if I am wrong. |
Systemd allows creation of hierarchical slices. Why can't we pass the cgroups path to a slice as Since runc is attempting to implement the OCI Spec, we should probably move this discussion to OCI spec. FYI: The Spec currently supports taking in an absolute path to cgroup. If runc were to implement that, |
would be nicer to drop systemd support since 50%+ of the code is ours setting things up that systemd does not. Its so bad. If we want the container in the systemd scope for a unit then we can just use the |
If the end-goal is to track a container using a unit file, we need to On Mon, Oct 26, 2015 at 10:38 AM, Michael Crosby [email protected]
|
@crosbymichael I understand we can use systemd unit files to manage runc as a systemd service. But I think that's not good enough for users. And we still have to support systemd for Docker right? Nowadays almost all distros are using systemd, giving up systemd support would be a very big deal, I think we don't have choice but enhance systemd support. ping @philips since go-systemd is from CoreOS repo, what do you think about problems we are facing with systemd? Such like this |
IIUC, users want to create systemd services which are docker containers. On Wed, Oct 28, 2015 at 11:29 PM, Qiang Huang [email protected]
|
@vishh I don't think so, in my understanding, users want to create a Docker container by @crosbymichael Correct me if I'm wrong. |
Currently when cgroups are managed by systemd, each container runs in a systemd scope. AFAICT, scopes are not hierarchical, so we cannot nest a container's scope in another (like raw cgroup fs). However, systemd slices are hierarchical. This can be used to group containers and control resource limits on them. To group containers, users should be able to create a slice definition under /usr/lib/systemd/system, say "limits.slice" and run containers in it using --cgroup-parent option: Grouping of containers can be observed under limits.slice using systemd-cgls. Implementing a change that honors the cgroup-parent as the slice is straightforward. If we are okay with the following side-effects of the change, then we can proceed to implement it:
ping @crosbymichael @LK4D4 @hqhq @vishh |
@@ -151,8 +151,8 @@ func (m *Manager) Apply(pid int) error { | |||
properties []systemdDbus.Property | |||
) | |||
|
|||
if c.Slice != "" { | |||
slice = c.Slice | |||
if c.Parent != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When invoked via docker, c.Parent will never be NULL. The default parent cgroup is "docker".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But Docker can not be the only user, if we enable systemd cgroup for runc, runc itself can set c.Parent
to anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docker sets Cgroup Parent to be "docker" in daemon/execdriver/native/template/default_template_linux.go. This should be changed to "system.slice" in case of systemd cgroups.
@anusha-ragunathan That is an accurate summary. What this PR is doing is fixing the current behavior which is absolutely wrong. I don't see why we shouldn't allow this PR. It fixes systemd behavior to be correct while not really affecting those who want to continue using fs implementation. |
@hqhq This needs rebase btw. |
@mrunalp @anusha-ragunathan +1 to ensuring that |
Pick up opencontainers#119 Fixes: moby/moby#16681 Signed-off-by: Qiang Huang <[email protected]>
3c4b761
to
7695a0d
Compare
@mrunalp Rebased. |
// Parent slice to use for systemd TODO: remove in favor or parent | ||
Slice string `json:"slice"` | ||
// ScopePrefix decribes prefix for the scope name | ||
ScopePrefix string `json:"scope_prefix"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Docker, ScopePrefix should always be "docker".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that needs to be done on Docker side and of course we will do that.
@mrunalp @hqhq: I see that parent-cgroup needs to work with systemd. The issue with this PR is that the The fact that runc now has an in-built support for systemd dilutes my argument though :( |
@vishh i thought runc did not support systemd cgroups or is that some other feature? |
@vishh The problem this is trying to address is in docker that you can see below. Instead of renaming the scope, it should use a different parent slice as specified.
|
@crosbymichael: You are correct. |
The problem is no one really knows what to expect with systemd and hierarchies because it does not seem to have that concept. @anusha-ragunathan has been doing alot of research on how systemd handles cgroups and its like two totally different worlds with scopes and slices. It does not seem to appear that sytemd supports hierarchies like people want, alot of the parent child relation ship is encoded into the name but @anusha-ragunathan can explain a little better. |
@crosbymichael systemd supports hierarchy of slices and at the lowest levels there are scopes/services. By allowing slices as values that one could pass to cgroup-parent we have parity with the fs implementation. The current behavior of renaming the scope is plain wrong. |
systemd (208 in my case. default for RHEL 7.1) allows hierarchy of slices, but no hierarchy of scopes.
systemd-run --scope sleep 180 systemd-run --scope --slice=run-3894.scope sleep 180 However, note that systemd-cgls doesnt nest the scopes. It simply creates a slice in p1's scopename and run p2 in it, while p1 is run under system.slice. If systemd cgroups requires a way to group containers, then slices is the way to go. It's not equivalent to raw cgroupfs tree hierarchy which can allow deep nesting. If the --cgroup-parent option should not be overloaded to accomodate both cgroupfs and systemd cgroups, then having a separate option (such as --systemd-slice) might be the correct way to proceed. |
@anusha-ragunathan Yes, that's what I said in #336 (comment) regarding hierarchy in systemd. In systemd a slice is essentially a parent for either sub-slices or scopes/services at the lowest level. Hence, IMO, --cgroup-parent should be allowed to take systemd slices as arguments. |
@crosbymichael Agree with you. |
I'll try to summarize:
|
@anusha-ragunathan I proposed a new option like |
@hqhq : I'm perfectly fine with using --cgroup-parent to take a slice name for systemd cgroup. We can clearly document this difference in behavior. |
I think its reasonable that if I do the following on a $ systemctl status foo
● foo.service - Foo that does something important
Loaded: loaded (/etc/systemd/system/foo.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2015-12-03 14:07:38 EST; 1 day 3h ago
Main PID: 15864 (foo)
CGroup: /important.slice/foo.service For user's on those systems, they will think Anything else is just confusing to the end-user. So +1 @anusha-ragunathan |
@crosbymichael @LK4D4 @vishh WDYT? |
@mrunalp Ok for me |
@mrunalp: Which approach are you referring to? |
@vishh I am talking about allowing --cgroup-parent to take slices as arguments |
@LK4D4 We need not even expose the scope prefix as a cmdline/API option. For docker it could just be set to "docker". |
@mrunalp Yeah, I meant that we doing that option only for docker. |
LGTM |
LGTM. On Wed, Dec 9, 2015 at 12:12 PM, Mrunal Patel [email protected]
|
@crosbymichael @LK4D4 ping |
LGTM |
systemd: support cgroup parent with specified slice
Note that there are two remaining tasks to call this feature complete:
@hqhq : Will you be taking care of this in a follow-up change? Thanks. |
Whatever PR does the follow-up can a link be provided to those that subscribed this PR? |
isn't this gonna be configurable henceforth at the daemon level? On Fri, Dec 11, 2015 at 11:57 AM, Derek Carr [email protected]
|
@anusha-ragunathan I am working on integrating this with docker. Should have a PR out today. |
@vishh That is correct but the defaults need to be modified to be "system.slice" when started with --exec-opt native.cgroupdriver=systemd |
Here is the docker PR moby/moby#18612 |
Pick up #119
Fixes: moby/moby#16681
Signed-off-by: Qiang Huang [email protected]