Skip to content

Comments

sleep: always thaw user.slice even if freezing failed#25662

Merged
poettering merged 7 commits intosystemd:mainfrom
msizanoen1:s2h-nosuspend-user-proc
Dec 8, 2022
Merged

sleep: always thaw user.slice even if freezing failed#25662
poettering merged 7 commits intosystemd:mainfrom
msizanoen1:s2h-nosuspend-user-proc

Conversation

@msizanoen1
Copy link
Contributor

@msizanoen1 msizanoen1 commented Dec 7, 2022

FreezeUnit can fail even when some units did got frozen, causing some user units to be frozen. A possible symptom is [email protected] being frozen while still being able to log in over SSH.

Fixes: #25356

@yuwata yuwata added the sleep label Dec 7, 2022
@msizanoen1
Copy link
Contributor Author

Apparently trying to freeze user.slice currently always fail on [email protected] as it doesn't have a realized cgroup but an active state, causing unit_cgroup_freezer_action to return -EBUSY, which might cause the FreezeUnit call to fail halfway through after freezing [email protected] or some session-*.scope unit. Shutting down the system still works as systemd will automatically thaw [email protected] on shutdown.

@bluca
Copy link
Member

bluca commented Dec 7, 2022

Which processes are you talking about exactly? Have you filed bugs with those? Most of the comments in that issue are talking about various kernel versions, with reports that issue go away by changing the kernel.

@bluca bluca added the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Dec 7, 2022
@msizanoen1
Copy link
Contributor Author

The reporters have later stated that all tested kernel versions suffer from the issue.

@msizanoen1
Copy link
Contributor Author

msizanoen1 commented Dec 7, 2022

FreezeUnit failing halfway through without thawing units is very bad, as it means some units won't get thawed later, which might be what results in [email protected] freezing while still being able to login using SSH.

cc @poettering

@bluca
Copy link
Member

bluca commented Dec 7, 2022

FreezeUnit failing halfway through without thawing units is very bad, as it means some units won't get thawed later, which might be what results in [email protected] freezing while still being able to login using SSH.

Makes sense, then please change that - simply set it so that thawing is always done, even if freezing returned an error. It's a no-op if a cgroup isn't frozen, so shouldn't matter too much.

@bluca bluca added reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks and removed needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer labels Dec 7, 2022
@msizanoen1 msizanoen1 force-pushed the s2h-nosuspend-user-proc branch from c3f78ff to 1ea6029 Compare December 7, 2022 09:56
@msizanoen1 msizanoen1 changed the title Revert "sleep: freeze and thaw user.slice to avoid wasting power/resu… sleep: always thaw user.slice even if freezing failed Dec 7, 2022
@msizanoen1 msizanoen1 force-pushed the s2h-nosuspend-user-proc branch 3 times, most recently from c31907c to f36d485 Compare December 7, 2022 10:06
@bluca bluca removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Dec 7, 2022
@yuwata yuwata added the pid1 label Dec 7, 2022
@msizanoen1 msizanoen1 force-pushed the s2h-nosuspend-user-proc branch from f36d485 to 045c924 Compare December 7, 2022 10:39
@msizanoen1 msizanoen1 force-pushed the s2h-nosuspend-user-proc branch 4 times, most recently from 2403b12 to 264ac23 Compare December 7, 2022 14:39
@poettering
Copy link
Member

/cc @msekletar (as the original author of the freeze/thaw logic)

@msizanoen1 msizanoen1 force-pushed the s2h-nosuspend-user-proc branch 2 times, most recently from 71a2c31 to 9238e4a Compare December 7, 2022 15:40
`FreezeUnit` can fail even when some units did got frozen, causing some
user units to be frozen. A possible symptom is `[email protected]` being
frozen while still being able to log in over SSH.
…r thaw

This ensures that services with `RemainAfterExit` but without any
process running won't cause failure during freeze.
This ensures starting a new unit under a frozen slice work as expected.
Sometimes a freeze operation can hang due to the presence of kernel
threads inside the unit cgroup (e.g. QEMU-KVM). This ensures that the
ThawUnit operation invoked by systemd-sleep at wakeup always thaws the
unit.
@msizanoen1 msizanoen1 force-pushed the s2h-nosuspend-user-proc branch 3 times, most recently from 75f8d9b to d2da003 Compare December 8, 2022 01:46
@msizanoen1
Copy link
Contributor Author

@msizanoen1 msizanoen1 requested a review from poettering December 8, 2022 04:23
return log_debug_errno(r, "Failed to open connection to systemd: %m");

/* Wait for 1.5 seconds at maximum for freeze operation */
sd_bus_set_method_call_timeout(bus, 1500 * USEC_PER_MSEC);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, on an overloaded system 1.5s sounds a bit too little. let's make this 5s?

also, please cast result to (void), given you knowingly ignore any errors here. static checkers care for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to be 1.5 seconds. FreezeUnit will just hang if there's a QEMU/KVM process running in user.slice.

@poettering
Copy link
Member

lgtm, just some nitpicks

The `frozen` state can be `0` while the processes are indeed frozen (see
last commit). Therefore do not respect cgroup.events when checking
whether thawing is necessary.
A FreezeUnit operation can hang due to the presence of kernel threads
(see last 2 commits). Keeping the default configuration will mean the
system will hang for 25 seconds in suspend waiting for the response. 1.5
seconds should be sufficient for most cases.
Rename the field to reflect the new semantics.
@msizanoen1 msizanoen1 force-pushed the s2h-nosuspend-user-proc branch from d2da003 to af1e336 Compare December 8, 2022 11:58
@bluca bluca added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed needs-stable-backport and removed good-to-merge/with-minor-suggestions labels Dec 8, 2022
@poettering poettering merged commit d20ea2c into systemd:main Dec 8, 2022
@msizanoen1 msizanoen1 deleted the s2h-nosuspend-user-proc branch December 8, 2022 16:25
@keszybz keszybz removed the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Dec 8, 2022
@keszybz
Copy link
Member

keszybz commented Dec 8, 2022

I guess we can backport this, but I skipped it in this batch because it's quite a few commits and it would be better to let them bake in main for a week.

@bluca
Copy link
Member

bluca commented Dec 8, 2022

We definitely need it in stable, but yes it's better to let it marinate a bit

@keszybz
Copy link
Member

keszybz commented Dec 14, 2022

First five commits queued up v252.4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

systemd:amd64 (252-2 -> 252.1-1) brakes suspend/resume

6 participants