podman: add new cgroup mode split#6666
podman: add new cgroup mode split#6666openshift-merge-robot merged 1 commit intocontainers:masterfrom
Conversation
|
@vrothberg PTAL |
vrothberg
left a comment
There was a problem hiding this comment.
The commit messages states conmon-delegated but the code adds a no-conmon-delegated. I'd guess we should update the docs as well.
Do we need to update the systemd unit files in pkg/system/generate?
libpod/container_internal_linux.go
Outdated
There was a problem hiding this comment.
Can we make "no-conmon-delegated" a constant somewhere?
There was a problem hiding this comment.
The commit messages states
conmon-delegatedbut the code adds ano-conmon-delegated. I'd guess we should update the docs as well.
Thanks, I've renamed it to conmon-delegated and added a constant
Do we need to update the systemd unit files in
pkg/system/generate?
I don't think we can do that yet. crun needs containers/crun#409
0148073 to
8832f34
Compare
docs/source/markdown/podman-run.1.md
Outdated
There was a problem hiding this comment.
If it's also specific to systemd cgroups (looks like it is) we should also mention that
There was a problem hiding this comment.
not sure it is so specific to systemd. It can be used also without systemd, even if it is less useful
There was a problem hiding this comment.
this will also break remote. We probably need a check to not permit the option being used as remote as it changes the cgroup of the Podman process itself
|
Overall, I like this. I think it does break |
e168a7c to
173e414
Compare
|
Is there a reason why you did cgroups 2 only? This approach works for cgroups 1: I'd happily make this coexist if need be. |
|
For that matter, why is cgroup-parent so draconian? It only allows items to end in .slice. And yet this works: in config.json Container cgroups could be implemented with --cgroup-parent /machine.slice/%n/container, if the cgroup-parent check weren't rejecting it. This commit proves it works. Would a better implementation be implementing --conmon-cgroup-parent /machine.slice/%n/supervisor? |
we could relax them.
The current implementation works well also without systemd, i.e. you have a cgroup you own and you want to create both conmon+container cgroup in it. |
just simplicity. We can surely add cgroup v1 support. It must be done for each controller separately, it is not enough to get the cgroup from the pid controller |
I read the cgroup from the pid controller, but in the Move call, I move all of them... that should do it, right? I don't know if pids will always be 3 - I can fix the find if need be. |
there could be different cgroups for each controller, the safest is to read each controller and then do the move as part of the same loop for that controller |
... except you were using the same call to determine the name of the supervisor cgroup - so I could only return 1 So should the supervisor cgroup name be based on the pids namespace, and the move do something else? i.e. just tack "container" onto the end? tack "container" onto the end if it's not /? |
|
A couple thoughts here:
|
FYI top is working for me - it properly shows the /slice/unit/container cgroup Stats is not. |
for cgroup v2 there cannot be processes in the inner groups. We need to make sure the conmon process is already in the right sub cgroup before it starts (or have a way to communicate to conmon to move itself). We need to do such steps before setting up the container, the previous code path cannot be used.
there is only one hierarchy in cgroup v2, so that is enough. In any case, I think I can just make my PR more generic to handle cgroup v1 too, even if it is not necessary to move the conmon process in a sub cgroup |
|
pr updated w/ podman stats working I'll be the first to admit, I don't fully understand all the implications, and I don't have a cgv2 machine to test on. I do know this all works on cgv1, and it looks simpler and reuses code paths that were already present and seem to make sense. But I don't see how podman stats is going to work if you never pass the cgroup parent back through the container object... Thanks! |
|
FCOS appears to be running cg1/2 hybrid, and pkg/cgroups doesn't adjust /sys/fs/cgroup/unified. Is the long term plan to fix pkg/cgroups or deprecate it? |
25e14a5 to
66c9e08
Compare
|
LGTM |
|
@mheon @vrothberg PTAL |
|
We want to hold this until Monday - we're in a semi-feature-freeze state for v2.0.0 release, which should happen in a few hours. Will do a full review once that is cut. |
There was a problem hiding this comment.
Can we just call this current? I don't see where split- adds any additional information to the user.
There was a problem hiding this comment.
we may need current in future to really mean current and use the cgroup where Podman is running instead of creating two sub-cgroups
There was a problem hiding this comment.
reuse-current or perhaps just reuse then?
66c9e08 to
c555172
Compare
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: giuseppe The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
c555172 to
ac92674
Compare
|
Can you add the cgroup-parent validation to |
When running under systemd there is no need to create yet another cgroup for the container. With conmon-delegated the current cgroup will be split in two sub cgroups: - supervisor - container The supervisor cgroup will hold conmon and the podman process, while the container cgroup is used by the OCI runtime (using the cgroupfs backend). Closes: containers#6400 Signed-off-by: Giuseppe Scrivano <[email protected]>
ac92674 to
6ee5f74
Compare
added and pushed a new version |
|
ready for review |
|
LGTM |
1 similar comment
|
LGTM |
| The **enabled** option will create a new cgroup under the cgroup-parent. | ||
| The **disabled** option will force the container to not create CGroups, and thus conflicts with CGroup options (**--cgroupns** and **--cgroup-parent**). | ||
| The **no-conmon** option disables a new CGroup only for the **conmon** process. | ||
| The **split** option splits the current cgroup in two sub-cgroups: one for conmon and one for the container payload. It is not possible to set **--cgroup-parent** with **split**. |
There was a problem hiding this comment.
We really, really, really need to create a conmon man page that we can add a link to for pages like this. @haircommander any chance of making this happen, even if the page is terribly short and sweet?
This is not a holdback for this review at this time.
|
/lgtm |
add a new cgroup mode split.
When running under systemd there is no need to create yet another cgroup for the container.
With conmon-delegated the current cgroup will be split in two sub cgroups:
The supervisor cgroup will hold conmon and the podman process, while the container cgroup is used by the OCI runtime (using the cgroupfs backend).
Closes: #6400
Signed-off-by: Giuseppe Scrivano [email protected]