-
-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: introduce support for cgroup freezer #13512
Conversation
CI fails because I've messed up test-suite integration. I was running the test case manually and I should have tried to run it as part of the overall test-suite. I will update the PR with the fix. |
The fail on Arch is most likely a timing issue (as it failed with The issue would be, probably, present in Ubuntu CI as well, but in this case the test is skipped there:
|
Also, would it make sense to force cgroups v2 for this particular test? Something along: KERNEL_APPEND="$KERNEL_APPEND systemd.unified_cgroup_hierarchy=true" in the systemd/test/TEST-06-SELINUX/test.sh Line 16 in b1ccd38
This way the test could test what it's supposed to even on systems which default to legacy/hybrid hierarchies. |
would it be possible to have a "systemctl is-frozen" ? is this visible on "systemctl status" ? |
Yes, it is visible in systemctl output.
|
I was playing with the failing test case and I've figured out that timeout happens because apparently dbus reply about the unit being frozen was not sent. Previously in Now I've changed the code so that reply is sent directly from |
CI si finally all green. @cdown I'd appreciate if you could have another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concept generally looks like it's about done, just some things to clean up. Thanks!
@cdown I've force pushed my branch. I've addressed (hopefully) everything that you've pointed out, except two points.
Thank you very much for the review! I owe you a🍺 or two. |
Ubuntu CI failures seem unrelated (most likely caused by #13794). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks mostly OK, but there are at least two high-level issues:
- the kernel check is not forward compatible
- operation on slices ignores unsupported children, but I think it shouldn't.
More detailed comments inline.
Callers of cg_get_keyed_attribute_full() can now specify via the flag whether the missing keyes in cgroup attribute file are OK or not. Also the wrappers for both strict and graceful version are provided.
Updated. @poettering PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge this once the tests pass. I think it'd be good to move stuff to src/shared/ as mentioned below, but there'll be probably other things to move, so this can be done in a separate PR.
@msekletar you promised another PR with docs on nested operation ;)
@@ -149,6 +149,20 @@ bool cg_ns_supported(void) { | |||
return enabled; | |||
} | |||
|
|||
bool cg_freezer_supported(void) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this should go in shared/, not /basic/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? sounds generic enough and similar enough to many other cg_xyz() checkers to stay here...
[FREEZER_THAWING] = "thawing", | ||
}; | ||
|
||
DEFINE_STRING_TABLE_LOOKUP(freezer_state, FreezerState); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This too.
src/core/dbus-unit.c
Outdated
@@ -719,7 +719,7 @@ int bus_unit_method_clean(sd_bus_message *message, void *userdata, sd_bus_error | |||
if (r == -EUNATCH) | |||
return sd_bus_error_setf(error, BUS_ERROR_NOTHING_TO_CLEAN, "No matching resources found."); | |||
if (r == -EBUSY) | |||
return sd_bus_error_setf(error, BUS_ERROR_UNIT_BUSY, "Unit is not inactive or has pending job."); | |||
return sd_bus_error_setf(error, BUS_ERROR_UNIT_BUSY, "Unit is inactive or has pending job."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, the wording was correct. we can only clean a unit that is inactive and has no job queued.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've dropped the commit that introduced the change. I got confused by double negative and as I've never played with clean so far it didn't occur to me it is actually correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The grammar is so tortuous that I was confused too (I suggested the change).
did another review round. found some more minor stuff, ptal |
Fixed. @poettering PTAL. |
lgtm. Now I hope the CIs can be convinced too! |
With cgroup v2 the cgroup freezer is implemented as a cgroup attribute called cgroup.freeze. cgroup can be frozen by writing "1" to the file and kernel will send us a notification through "cgroup.events" after the operation is finished and processes in the cgroup entered quiescent state, i.e. they are not scheduled to run. Writing "0" to the attribute file does the inverse and process execution is resumed. This commit exposes above low-level functionality through systemd's DBus API. Each unit type must provide specialized implementation for these methods, otherwise, we return an error. So far only service, scope, and slice unit types provide the support. It is possible to check if a given unit has the support using CanFreeze() DBus property. Note that DBus API has a synchronous behavior and we dispatch the reply to freeze/thaw requests only after the kernel has notified us that requested operation was completed.
@poettering thanks for the reviews! Btw, I did one more fix-up to make Travis clang (asan+ubsan) job green (hopefully). |
Ubuntu CI failure seems unrelated, from the log it looks like some Openstack issue. Travis CI failure also seems unrelated (docker exec failure). |
I restarted at least the Travis CI job. |
Hi, good job pulling this off. I only got down to review this now, so let me know if you prefer followup here or would this better be filed as issue(s) against Freezing vs jobsThis manifests in cases like: Recursive behavior(I recall discussing this but can't find that now and it may be less relevant to the current version anyway.)
or
In the former case, it seems to be that user's request to have child.service frozen was wiped by thawing parent.slice. [1] In the latter case, thawing the child.service correctly doesn't change reported state from frozen (because of the parent), however, the change after second call is weird. I'd expect the following behavior
[1] Although lexically substituting freeze/thaw with start/stop would make the behavior familiar (in a sense that child.service can be started via dependency by parent.slice even though user requested child.service to stop), I think it's not what is intended when configuring properties in the hierarchy. I'd like to know your opinion. Freezing vs jobs 2During my experiments (with 9e12d5b) I run into the following situation. I didn't catch the steps to reproduce it, so it's rather a hint.
Minor remarksI suggest making freezing/thawing messages same logging level as starting/stopping. I can imagine troubleshooting some issues with regular logging may get cumbersome. I ran into misaligned coloring, this should fix it. |
@Werkov it would be much appreciated if you could file separate issue. Thanks! |
Hi Michal Hope you are doing good. I use RHEL 8.3. My systemd still getting error "thawing". Any official fix? Please help me |
Please update to latest RHEL-8.3 systemd package (systemd-239-41.el8_3). |
With cgroupv2 the cgroup freezer is implemented as a cgroup
attribute (knob) called cgroup.freeze. cgroup can be frozen by writing
"1" to the file and kernel will send us a notification through
"cgroup.events" after the operation is finished and processes in the
cgroup entered quiescent state, i.e. they are not scheduled to
run. Writing "0" to the attribute file does the inverse and process
execution is resumed. See [1] for more details of the new interface.
This commit exposes above low-level functionality through systemd's DBus
API. Two new methods on each unit object are introduced, Freeze() and
Thaw(). However, each unit type must provide specialized implementation
for these methods, otherwise, we return an error (EOPNOTSUPP) to the
client requesting the operation on the unit type w/o the support. So far
only service, scope, and slice unit have support.
Note that DBus API has a synchronous behavior and we dispatch the reply
to freeze/thaw requests only after the kernel has notified us that
requested operation was completed. systemctl verbs that wrap the dbus
calls are also provided.
Implementation is accompanied by very basic integration test.
[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html?highlight=freezer#core-interface-files