Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: introduce support for cgroup freezer #13512

Merged
merged 4 commits into from
May 1, 2020
Merged

Conversation

msekletar
Copy link
Contributor

With cgroupv2 the cgroup freezer is implemented as a cgroup
attribute (knob) called cgroup.freeze. cgroup can be frozen by writing
"1" to the file and kernel will send us a notification through
"cgroup.events" after the operation is finished and processes in the
cgroup entered quiescent state, i.e. they are not scheduled to
run. Writing "0" to the attribute file does the inverse and process
execution is resumed. See [1] for more details of the new interface.

This commit exposes above low-level functionality through systemd's DBus
API. Two new methods on each unit object are introduced, Freeze() and
Thaw(). However, each unit type must provide specialized implementation
for these methods, otherwise, we return an error (EOPNOTSUPP) to the
client requesting the operation on the unit type w/o the support. So far
only service, scope, and slice unit have support.
Note that DBus API has a synchronous behavior and we dispatch the reply
to freeze/thaw requests only after the kernel has notified us that
requested operation was completed. systemctl verbs that wrap the dbus
calls are also provided.

Implementation is accompanied by very basic integration test.

[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html?highlight=freezer#core-interface-files

@msekletar
Copy link
Contributor Author

CI fails because I've messed up test-suite integration. I was running the test case manually and I should have tried to run it as part of the overall test-suite. I will update the PR with the fix.

@mrc0mmand
Copy link
Member

mrc0mmand commented Sep 17, 2019

The fail on Arch is most likely a timing issue (as it failed with Call failed: Connection timed out). I guess tweaking the busctl timeout via --timeout might help in this case, as the test runs in a VM without KVM acceleration.

The issue would be, probably, present in Ubuntu CI as well, but in this case the test is skipped there:

+ set -e
+ set -o pipefail
+ unit=foo.service
+ unit_file=/etc/systemd/system/foo.service
+ test -e /sys/fs/cgroup/system.slice/cgroup.freeze
+ echo OK
+ exit 0

@mrc0mmand
Copy link
Member

mrc0mmand commented Sep 17, 2019

Also, would it make sense to force cgroups v2 for this particular test? Something along:

KERNEL_APPEND="$KERNEL_APPEND systemd.unified_cgroup_hierarchy=true"

in the test.sh file, as we, sort of, already do with the SELinux test:

KERNEL_APPEND="$KERNEL_APPEND selinux=1 security=selinux"

This way the test could test what it's supposed to even on systems which default to legacy/hybrid hierarchies.

@boucman
Copy link
Contributor

boucman commented Sep 20, 2019

would it be possible to have a "systemctl is-frozen" ? is this visible on "systemctl status" ?

@msekletar
Copy link
Contributor Author

Yes, it is visible in systemctl output.

[msekleta@f30 ~]$ sudo systemctl freeze cups
[sudo] password for msekleta:
[msekleta@f30 ~]$ sudo systemctl status cups
● cups.service - CUPS Scheduler
   Loaded: loaded (/usr/lib/systemd/system/cups.service; disabled; vendor preset: disabled)
   Active: active (frozen) since Fri 2019-09-20 18:05:29 CEST; 26s ago
     Docs: man:cupsd(8)
 Main PID: 1472 (cupsd)
   Status: "Scheduler is running..."
    Tasks: 1
   Memory: 6.2M
      CPU: 27ms
   CGroup: /system.slice/cups.service
           └─1472 /usr/sbin/cupsd -l

Sep 20 18:05:29 f30 systemd[1]: Starting CUPS Scheduler...
Sep 20 18:05:29 f30 systemd[1]: Started CUPS Scheduler.

@msekletar
Copy link
Contributor Author

I was playing with the failing test case and I've figured out that timeout happens because apparently dbus reply about the unit being frozen was not sent. Previously in unit_frozen() I've added the unit to dbus queue so once it is dispatched in next event loop iteration we send out the pending reply in bus_unit_send_pending_freezer_message().

Now I've changed the code so that reply is sent directly from unit_frozen(). I don't think that it is the best solution but it makes test case to PASS. I will keep it for now, until I figure out some better fix.

@msekletar
Copy link
Contributor Author

CI si finally all green. @cdown I'd appreciate if you could have another look.

@msekletar msekletar removed the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Sep 25, 2019
Copy link
Member

@cdown cdown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concept generally looks like it's about done, just some things to clean up. Thanks!

@msekletar
Copy link
Contributor Author

@cdown I've force pushed my branch.

I've addressed (hopefully) everything that you've pointed out, except two points.

  • #if HAVE_SELINUX - see the rationale in the commit message
  • message that unit type doesn't support freezing - I propose to adjust the message if it proves to be confusing (unless you block merging before we change it :)).

Thank you very much for the review! I owe you a🍺 or two.

@msekletar
Copy link
Contributor Author

Ubuntu CI failures seem unrelated (most likely caused by #13794).

Copy link
Member

@keszybz keszybz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks mostly OK, but there are at least two high-level issues:

  • the kernel check is not forward compatible
  • operation on slices ignores unsupported children, but I think it shouldn't.

More detailed comments inline.

Callers of cg_get_keyed_attribute_full() can now specify via the flag whether the
missing keyes in cgroup attribute file are OK or not. Also the wrappers for both
strict and graceful version are provided.
@msekletar
Copy link
Contributor Author

Updated. @poettering PTAL

@msekletar msekletar removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Apr 29, 2020
Copy link
Member

@keszybz keszybz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this once the tests pass. I think it'd be good to move stuff to src/shared/ as mentioned below, but there'll be probably other things to move, so this can be done in a separate PR.

@msekletar you promised another PR with docs on nested operation ;)

@@ -149,6 +149,20 @@ bool cg_ns_supported(void) {
return enabled;
}

bool cg_freezer_supported(void) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this should go in shared/, not /basic/.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? sounds generic enough and similar enough to many other cg_xyz() checkers to stay here...

[FREEZER_THAWING] = "thawing",
};

DEFINE_STRING_TABLE_LOOKUP(freezer_state, FreezerState);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too.

@keszybz keszybz added the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Apr 29, 2020
@@ -719,7 +719,7 @@ int bus_unit_method_clean(sd_bus_message *message, void *userdata, sd_bus_error
if (r == -EUNATCH)
return sd_bus_error_setf(error, BUS_ERROR_NOTHING_TO_CLEAN, "No matching resources found.");
if (r == -EBUSY)
return sd_bus_error_setf(error, BUS_ERROR_UNIT_BUSY, "Unit is not inactive or has pending job.");
return sd_bus_error_setf(error, BUS_ERROR_UNIT_BUSY, "Unit is inactive or has pending job.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, the wording was correct. we can only clean a unit that is inactive and has no job queued.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've dropped the commit that introduced the change. I got confused by double negative and as I've never played with clean so far it didn't occur to me it is actually correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grammar is so tortuous that I was confused too (I suggested the change).

@poettering
Copy link
Member

did another review round. found some more minor stuff, ptal

@msekletar
Copy link
Contributor Author

Fixed. @poettering PTAL.

@poettering
Copy link
Member

lgtm. Now I hope the CIs can be convinced too!

With cgroup v2 the cgroup freezer is implemented as a cgroup
attribute called cgroup.freeze. cgroup can be frozen by writing "1"
to the file and kernel will send us a notification through
"cgroup.events" after the operation is finished and processes in the
cgroup entered quiescent state, i.e. they are not scheduled to
run. Writing "0" to the attribute file does the inverse and process
execution is resumed.

This commit exposes above low-level functionality through systemd's DBus
API. Each unit type must provide specialized implementation for these
methods, otherwise, we return an error. So far only service, scope, and
slice unit types provide the support. It is possible to check if a
given unit has the support using CanFreeze() DBus property.

Note that DBus API has a synchronous behavior and we dispatch the reply
to freeze/thaw requests only after the kernel has notified us that
requested operation was completed.
@msekletar
Copy link
Contributor Author

@poettering thanks for the reviews!

Btw, I did one more fix-up to make Travis clang (asan+ubsan) job green (hopefully).

@msekletar
Copy link
Contributor Author

Ubuntu CI failure seems unrelated, from the log it looks like some Openstack issue. Travis CI failure also seems unrelated (docker exec failure).

@mrc0mmand
Copy link
Member

Ubuntu CI failure seems unrelated, from the log it looks like some Openstack issue. Travis CI failure also seems unrelated (docker exec failure).

I restarted at least the Travis CI job.

@keszybz keszybz removed the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label May 1, 2020
@keszybz keszybz merged commit b76ef59 into systemd:master May 1, 2020
@Werkov
Copy link
Contributor

Werkov commented May 6, 2020

Hi, good job pulling this off. I only got down to review this now, so let me know if you prefer followup here or would this better be filed as issue(s) against master.

Freezing vs jobs

This manifests in cases like: systemd-run --slice=frozen.slice cmd or systemctl stop unit-frozen-by-ancestor.slice (the latter mainly visible with ExecStop=). At one moment I considered making freezing/thawing a first class type of job, however, that wouldn't compose nicely with established state machine (active, inactive, ...). In the end, I think it'd be simpler to reject jobs (basically anything but JOB_DONE to fail deps) on units that are currently in frozen context (regardless of freezing origin, i.e. frozen ancestor or self).

Recursive behavior

(I recall discussing this but can't find that now and it may be less relevant to the current version anyway.)
The behavior I observe is

: nothing frozen at the beginning
systemctl freeze child.service # effectively frozen, reported state is frozen
systemctl freeze parent.slice # effectively frozen, reported state is frozen
systemctl thaw parent.slice # everything is thawed

or

: nothing frozen at the beginning
systemctl freeze parent.slice
: child.service effectively frozen, reported state is frozen
systemctl thaw child.service # effectively frozen, reported state remains frozen
systemctl thaw child.service # effectively frozen, reported state switches to thawing

In the former case, it seems to be that user's request to have child.service frozen was wiped by thawing parent.slice. [1]

In the latter case, thawing the child.service correctly doesn't change reported state from frozen (because of the parent), however, the change after second call is weird.

I'd expect the following behavior

: nothing frozen at the beginning
systemctl freeze child.service # effectively frozen, reported state is frozen
systemctl freeze parent.slice # effectively frozen, reported state is frozen
systemctl thaw parent.slice # child effectively frozen, reported state is frozen

: nothing frozen at the beginning
systemctl freeze parent.slice
: child.service effectively frozen, reported state is frozen-ancestor
systemctl thaw child.service # effectively frozen, reported state remains frozen-ancestor
...
systemctl freeze child.service # effectively frozen, reported state switches to frozen

[1] Although lexically substituting freeze/thaw with start/stop would make the behavior familiar (in a sense that child.service can be started via dependency by parent.slice even though user requested child.service to stop), I think it's not what is intended when configuring properties in the hierarchy. I'd like to know your opinion.

Freezing vs jobs 2

During my experiments (with 9e12d5b) I run into the following situation. I didn't catch the steps to reproduce it, so it's rather a hint.

image:/etc/systemd/system # systemctl status test.slice
● test.slice
Loaded: loaded
Active: active since Wed 2020-05-06 18:09:32 CEST; 21s ago
Tasks: 4
Memory: 3.2M
CPU: 75ms
CGroup: /test.slice
├─looper10.service
│ ├─495 /usr/bin/bash /root/looper 10
│ └─537 sleep 1
└─looper11.service
├─502 /usr/bin/bash /root/looper 11
└─536 sleep 1

May 06 18:09:49 image looper[502]: s11
May 06 18:09:49 image looper[495]: s10
May 06 18:09:50 image looper[502]: s11
May 06 18:09:50 image looper[495]: s10
May 06 18:09:51 image looper[502]: s11
May 06 18:09:51 image looper[495]: s10
May 06 18:09:52 image looper[502]: s11
May 06 18:09:52 image looper[495]: s10
May 06 18:09:53 image looper[502]: s11
May 06 18:09:53 image looper[495]: s10
image:/etc/systemd/system # systemctl list-jobs
No jobs running.
image:/etc/systemd/system # systemctl freeze test.slice
Failed to freeze unit test.slice: Unit has a pending job.

Minor remarks

I suggest making freezing/thawing messages same logging level as starting/stopping. I can imagine troubleshooting some issues with regular logging may get cumbersome.

I ran into misaligned coloring, this should fix it.

@msekletar
Copy link
Contributor Author

@Werkov it would be much appreciated if you could file separate issue. Thanks!

@Werkov
Copy link
Contributor

Werkov commented May 19, 2020

I've filed #15849, #15850 and #15851, I skipped Freezing vs jobs 2 for the lack of data.
A bit more time passed than I expected but the points should still be valid.

@venuprsd
Copy link

venuprsd commented Dec 9, 2020

Hi Michal

Hope you are doing good. I use RHEL 8.3. My systemd still getting error "thawing". Any official fix? Please help me

@msekletar
Copy link
Contributor Author

Hope you are doing good. I use RHEL 8.3. My systemd still getting error "thawing". Any official fix? Please help me

Please update to latest RHEL-8.3 systemd package (systemd-239-41.el8_3).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

8 participants