sleep: always thaw user.slice even if freezing failed by msizanoen1 · Pull Request #25662 · systemd/systemd

msizanoen1 · 2022-12-07T06:19:58Z

FreezeUnit can fail even when some units did got frozen, causing some user units to be frozen. A possible symptom is [email protected] being frozen while still being able to log in over SSH.

Fixes: #25356

msizanoen1 · 2022-12-07T09:31:10Z

Apparently trying to freeze user.slice currently always fail on [email protected] as it doesn't have a realized cgroup but an active state, causing unit_cgroup_freezer_action to return -EBUSY, which might cause the FreezeUnit call to fail halfway through after freezing [email protected] or some session-*.scope unit. Shutting down the system still works as systemd will automatically thaw [email protected] on shutdown.

bluca · 2022-12-07T09:31:53Z

Which processes are you talking about exactly? Have you filed bugs with those? Most of the comments in that issue are talking about various kernel versions, with reports that issue go away by changing the kernel.

msizanoen1 · 2022-12-07T09:33:42Z

The reporters have later stated that all tested kernel versions suffer from the issue.

msizanoen1 · 2022-12-07T09:36:01Z

FreezeUnit failing halfway through without thawing units is very bad, as it means some units won't get thawed later, which might be what results in [email protected] freezing while still being able to login using SSH.

cc @poettering

bluca · 2022-12-07T09:51:43Z

FreezeUnit failing halfway through without thawing units is very bad, as it means some units won't get thawed later, which might be what results in [email protected] freezing while still being able to login using SSH.

Makes sense, then please change that - simply set it so that thawing is always done, even if freezing returned an error. It's a no-op if a cgroup isn't frozen, so shouldn't matter too much.

src/core/cgroup.c

src/core/dbus-unit.c

poettering · 2022-12-07T15:14:51Z

/cc @msekletar (as the original author of the freeze/thaw logic)

src/core/cgroup.c

`FreezeUnit` can fail even when some units did got frozen, causing some user units to be frozen. A possible symptom is `[email protected]` being frozen while still being able to log in over SSH.

…r thaw This ensures that services with `RemainAfterExit` but without any process running won't cause failure during freeze.

This ensures starting a new unit under a frozen slice work as expected.

Sometimes a freeze operation can hang due to the presence of kernel threads inside the unit cgroup (e.g. QEMU-KVM). This ensures that the ThawUnit operation invoked by systemd-sleep at wakeup always thaws the unit.

msizanoen1 · 2022-12-08T04:22:50Z

Seems like the CI failures are due to Launchpad being down: https://github.com/systemd/systemd/actions/runs/3644468625/jobs/6153747721

poettering · 2022-12-08T11:30:52Z

src/sleep/sleep.c

                return log_debug_errno(r, "Failed to open connection to systemd: %m");

+        /* Wait for 1.5 seconds at maximum for freeze operation */
+        sd_bus_set_method_call_timeout(bus, 1500 * USEC_PER_MSEC);


hmm, on an overloaded system 1.5s sounds a bit too little. let's make this 5s?

also, please cast result to (void), given you knowingly ignore any errors here. static checkers care for this.

This has to be 1.5 seconds. FreezeUnit will just hang if there's a QEMU/KVM process running in user.slice.

src/core/cgroup.c

poettering · 2022-12-08T11:34:48Z

lgtm, just some nitpicks

The `frozen` state can be `0` while the processes are indeed frozen (see last commit). Therefore do not respect cgroup.events when checking whether thawing is necessary.

A FreezeUnit operation can hang due to the presence of kernel threads (see last 2 commits). Keeping the default configuration will mean the system will hang for 25 seconds in suspend waiting for the response. 1.5 seconds should be sufficient for most cases.

Rename the field to reflect the new semantics.

keszybz · 2022-12-08T21:15:48Z

I guess we can backport this, but I skipped it in this batch because it's quite a few commits and it would be better to let them bake in main for a week.

bluca · 2022-12-08T21:19:12Z

We definitely need it in stable, but yes it's better to let it marinate a bit

keszybz · 2022-12-14T16:19:18Z

First five commits queued up v252.4.

yuwata added the sleep label Dec 7, 2022

bluca added the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Dec 7, 2022

bluca added reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks and removed needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer labels Dec 7, 2022

msizanoen1 force-pushed the s2h-nosuspend-user-proc branch from c3f78ff to 1ea6029 Compare December 7, 2022 09:56

msizanoen1 changed the title ~~Revert "sleep: freeze and thaw user.slice to avoid wasting power/resu…~~ sleep: always thaw user.slice even if freezing failed Dec 7, 2022

bluca reviewed Dec 7, 2022

View reviewed changes

src/core/cgroup.c Outdated Show resolved Hide resolved

msizanoen1 force-pushed the s2h-nosuspend-user-proc branch 3 times, most recently from c31907c to f36d485 Compare December 7, 2022 10:06

bluca removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Dec 7, 2022

yuwata reviewed Dec 7, 2022

View reviewed changes

src/core/cgroup.c Outdated Show resolved Hide resolved

yuwata added the pid1 label Dec 7, 2022

msizanoen1 force-pushed the s2h-nosuspend-user-proc branch from f36d485 to 045c924 Compare December 7, 2022 10:39

msizanoen1 mentioned this pull request Dec 7, 2022

systemd:amd64 (252-2 -> 252.1-1) brakes suspend/resume #25356

Closed

msizanoen1 force-pushed the s2h-nosuspend-user-proc branch 4 times, most recently from 2403b12 to 264ac23 Compare December 7, 2022 14:39

poettering reviewed Dec 7, 2022

View reviewed changes

src/core/cgroup.c Outdated Show resolved Hide resolved

poettering requested changes Dec 7, 2022

View reviewed changes

src/core/cgroup.c Show resolved Hide resolved

src/core/dbus-unit.c Show resolved Hide resolved

msizanoen1 force-pushed the s2h-nosuspend-user-proc branch 2 times, most recently from 71a2c31 to 9238e4a Compare December 7, 2022 15:40

poettering reviewed Dec 7, 2022

View reviewed changes

src/core/cgroup.c Outdated Show resolved Hide resolved

msizanoen1 added 4 commits December 8, 2022 00:54

sleep: always thaw user.slice even if freezing failed

efa736d

`FreezeUnit` can fail even when some units did got frozen, causing some user units to be frozen. A possible symptom is `[email protected]` being frozen while still being able to log in over SSH.

core/slice: skip member units without realized cgroup during freeze o…

fcb0878

…r thaw This ensures that services with `RemainAfterExit` but without any process running won't cause failure during freeze.

core/cgroup: thaw slice of unit when thawing unit

a14137d

This ensures starting a new unit under a frozen slice work as expected.

core/unit: allow overriding an ongoing freeze operation

3d19e12

Sometimes a freeze operation can hang due to the presence of kernel threads inside the unit cgroup (e.g. QEMU-KVM). This ensures that the ThawUnit operation invoked by systemd-sleep at wakeup always thaws the unit.

msizanoen1 force-pushed the s2h-nosuspend-user-proc branch 3 times, most recently from 75f8d9b to d2da003 Compare December 8, 2022 01:46

msizanoen1 requested a review from poettering December 8, 2022 04:23

poettering reviewed Dec 8, 2022

View reviewed changes

src/core/cgroup.c Outdated Show resolved Hide resolved

poettering reviewed Dec 8, 2022

View reviewed changes

src/core/cgroup.c Outdated Show resolved Hide resolved

poettering added the good-to-merge/with-minor-suggestions label Dec 8, 2022

msizanoen1 added 3 commits December 8, 2022 12:58

core/cgroup: ignore kernel cgroup.events when thawing

7fcd269

The `frozen` state can be `0` while the processes are indeed frozen (see last commit). Therefore do not respect cgroup.events when checking whether thawing is necessary.

core: pending_freezer_{message => invocation}

af1e336

Rename the field to reflect the new semantics.

msizanoen1 force-pushed the s2h-nosuspend-user-proc branch from d2da003 to af1e336 Compare December 8, 2022 11:58

bluca approved these changes Dec 8, 2022

View reviewed changes

bluca added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed needs-stable-backport and removed good-to-merge/with-minor-suggestions labels Dec 8, 2022

poettering merged commit d20ea2c into systemd:main Dec 8, 2022

msizanoen1 deleted the s2h-nosuspend-user-proc branch December 8, 2022 16:25

keszybz removed the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Dec 8, 2022

n-peugnet mentioned this pull request Dec 12, 2022

sleep: several fixlets #25374

Merged

keszybz removed the needs-stable-backport label Dec 14, 2022

Uh oh!

Comments

Conversation

msizanoen1 commented Dec 7, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

msizanoen1 commented Dec 7, 2022

Uh oh!

bluca commented Dec 7, 2022

Uh oh!

msizanoen1 commented Dec 7, 2022

Uh oh!

msizanoen1 commented Dec 7, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bluca commented Dec 7, 2022

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

poettering commented Dec 7, 2022

Uh oh!

Uh oh!

msizanoen1 commented Dec 8, 2022

Uh oh!

poettering Dec 8, 2022

Choose a reason for hiding this comment

Uh oh!

msizanoen1 Dec 8, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

poettering commented Dec 8, 2022

Uh oh!

keszybz commented Dec 8, 2022

Uh oh!

bluca commented Dec 8, 2022

Uh oh!

keszybz commented Dec 14, 2022

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

6 participants

msizanoen1 commented Dec 7, 2022 •

edited

Loading

msizanoen1 commented Dec 7, 2022 •

edited

Loading