sleep: always thaw user.slice even if freezing failed #25662
poettering merged 7 commits into systemd:main
|
Apparently trying to freeze |
|
Which processes are you talking about exactly? Have you filed bugs with those? Most of the comments in that issue are talking about various kernel versions, with reports that the issue goes away when changing the kernel. |
|
The reporters have later stated that all tested kernel versions suffer from the issue. |
|
cc @poettering |
Makes sense, then please change that - simply set it so that thawing is always done, even if freezing returned an error. It's a no-op if a cgroup isn't frozen, so shouldn't matter too much. |
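The control flow suggested in this comment can be sketched as a small simulation. This is not the code in `src/sleep/sleep.c`: `fake_unit`, `fake_freeze`, `fake_thaw`, `fake_suspend`, and `sleep_cycle` are hypothetical names, and continuing with the suspend after a failed freeze is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a unit's cgroup; not a systemd type. */
struct fake_unit {
    bool frozen;        /* are the unit's processes currently frozen? */
    int freeze_result;  /* value the simulated freeze request reports */
    int thaw_calls;     /* number of thaw requests issued */
};

/* Simulated freeze: can report an error although processes did get frozen. */
static int fake_freeze(struct fake_unit *u) {
    u->frozen = true;
    return u->freeze_result;
}

/* Simulated thaw: harmless when nothing is frozen. */
static int fake_thaw(struct fake_unit *u) {
    u->thaw_calls++;
    u->frozen = false;
    return 0;
}

/* Simulated suspend step; always succeeds in this sketch. */
static int fake_suspend(void) {
    return 0;
}

/* Freeze, suspend, then thaw unconditionally. Returns the freeze result so
 * the caller can still log it (an assumption of this sketch). */
int sleep_cycle(struct fake_unit *u) {
    assert(u != NULL);

    int freeze_r = fake_freeze(u);
    (void) fake_suspend();   /* suspend proceeds either way (assumption) */
    (void) fake_thaw(u);     /* never skipped, even if freeze_r < 0 */

    return freeze_r;
}
```

With `freeze_result` set to `-ETIMEDOUT`, `sleep_cycle` still issues exactly one thaw and leaves the simulated unit unfrozen.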
|
/cc @msekletar (as the original author of the freeze/thaw logic) |
`FreezeUnit` can fail even when some units did get frozen, leaving some user units frozen. A possible symptom is `[email protected]` being frozen while still being able to log in over SSH.
…r thaw This ensures that services with `RemainAfterExit` but without any process running won't cause failure during freeze.
This ensures that starting a new unit under a frozen slice works as expected.
Sometimes a freeze operation can hang due to the presence of kernel threads inside the unit cgroup (e.g. QEMU-KVM). This ensures that the ThawUnit operation invoked by systemd-sleep at wakeup always thaws the unit.
|
Seems like the CI failures are due to Launchpad being down: https://github.com/systemd/systemd/actions/runs/3644468625/jobs/6153747721 |
src/sleep/sleep.c
    return log_debug_errno(r, "Failed to open connection to systemd: %m");

    /* Wait for 1.5 seconds at maximum for freeze operation */
    sd_bus_set_method_call_timeout(bus, 1500 * USEC_PER_MSEC);
hmm, on an overloaded system 1.5s sounds a bit too little. let's make this 5s?
also, please cast result to (void), given you knowingly ignore any errors here. static checkers care for this.
This has to be 1.5 seconds. FreezeUnit will just hang if there's a QEMU/KVM process running in user.slice.
|
lgtm, just some nitpicks |
The `frozen` state can be `0` while the processes are indeed frozen (see last commit). Therefore do not respect cgroup.events when checking whether thawing is necessary.
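The difference between the two checks can be sketched as follows. In cgroup v2, `cgroup.events` carries a `frozen` key; the helper names `parse_frozen`, `should_thaw_old`, and `should_thaw_new` are hypothetical and only illustrate the decision, not the actual systemd code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return the value of the "frozen" key in cgroup.events content,
 * or -1 if the key is not present. */
int parse_frozen(const char *events) {
    const char *p = events;

    while (p != NULL && *p != '\0') {
        int v;

        if (sscanf(p, "frozen %d", &v) == 1)
            return v;

        p = strchr(p, '\n');   /* advance to the next line, if any */
        if (p != NULL)
            p++;
    }

    return -1;
}

/* Earlier behaviour as described in the commit message (sketch):
 * thaw only if the kernel reports frozen=1. */
bool should_thaw_old(const char *events) {
    return parse_frozen(events) == 1;
}

/* Behaviour described by this commit (sketch): ignore the reported
 * state and always attempt the thaw. */
bool should_thaw_new(const char *events) {
    (void) events;
    return true;
}
```

For content such as `"populated 1\nfrozen 0\n"`, the old check would skip the thaw even though processes may be stuck frozen, while the new check attempts it regardless.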
A FreezeUnit operation can hang due to the presence of kernel threads (see last 2 commits). Keeping the default configuration will mean the system will hang for 25 seconds in suspend waiting for the response. 1.5 seconds should be sufficient for most cases.
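As a rough illustration of the numbers in this commit message, the sketch below compares the two timeouts in microseconds. The macro and function names are hypothetical (the unit constants mirror the `USEC_PER_MSEC` used in the diff), and the 25-second default is taken from the text above rather than looked up.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC   UINT64_C(1000000)
#define USEC_PER_MSEC  UINT64_C(1000)

/* 25 s default, as stated in the commit message. */
#define DEFAULT_CALL_TIMEOUT_USEC (25 * USEC_PER_SEC)
/* 1.5 s, matching 1500 * USEC_PER_MSEC in the diff. */
#define FREEZE_CALL_TIMEOUT_USEC  (1500 * USEC_PER_MSEC)

/* Worst-case suspend delay avoided when a hung call is cut off at
 * short_usec instead of default_usec. */
uint64_t suspend_delay_saved_usec(uint64_t default_usec, uint64_t short_usec) {
    assert(default_usec >= short_usec);
    return default_usec - short_usec;
}
```

Under these assumptions a hung `FreezeUnit` call delays suspend by at most 1.5 seconds instead of 25, a difference of 23.5 seconds.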
Rename the field to reflect the new semantics.
|
I guess we can backport this, but I skipped it in this batch because it's quite a few commits and it would be better to let them bake in |
|
We definitely need it in stable, but yes it's better to let it marinate a bit |
|
First five commits queued up for v252.4. |
Fixes: #25356