Skip to content

sd-shutdown improvements#8429

Merged
keszybz merged 7 commits intosystemd:masterfrom
medhefgo:sd-shutdown
Mar 13, 2018
Merged

sd-shutdown improvements#8429
keszybz merged 7 commits intosystemd:masterfrom
medhefgo:sd-shutdown

Conversation

@medhefgo
Copy link
Contributor

This should make sure the shutdown process only logs when it should. I'm not too sure about the last patch, though. It feels a bit ugly, but it gets the job done.

I am also wondering whether the sd-shutdown umount loop shouldn't be turned into a while loop. We are already checking if we are making any changes during iterations, and currently, there is potential for mounts not to get handled should we have a crazy nested setup that goes beyond FINALIZE_ATTEMPTS.

Fixes #8155

Copy link
Member

@keszybz keszybz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, some minor nitpicks below.

n_failed ++;
continue;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's any reason for this chunk.

if (r < 0)
return r;
if (!isempty(unknown_flags))
log_warning("Ignoring unkown mount flags '%s'.", unknown_flags);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in "unkown"


/* MS_BIND is special. If it is provided it will only make the mount-point
* read-only. If left out, the super block itself is remounted, which we want. */
m->remount_flags = (m->remount_flags|MS_REMOUNT|MS_RDONLY) & ~MS_BIND;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the whole commit: please provide justification for this change in the patch comment. What are the benefits? For which file systems does this matter?

if (nonunmountable_path(m->path))
/* Remount failed, but try unmounting anyway,
* unless this is a mount point we want to skip. */
if (nonunmountable_path(m->path)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, isn't this patch made obsolete by the subsequent patch to skip remount-ro on api filesystems?

@keszybz keszybz added the pid1 label Mar 12, 2018
This most likely amounts to no real benefits and is just here for
completeness sake.
In the case of some api filesystems remounting read-only fails
while unmounting succeeds.
There is little point in logging about unmounting errors if the
exact mountpoint will be successfully unmounted in a later retry
due unmounts below it having been removed.

Additionally, don't log those errors if we are going to switch back
to a initrd, because that one is also likely to finalize the remaining
mountpoints. If not, it will log errors then.
@medhefgo
Copy link
Contributor Author

Made some changes.

Originally, I thought that the kernel might be complaining about the flags not being the same (sans the ro) when remounting the cgroup tree. But it turned out not to do anything. I decided to keep it, because the manpage said that the flags should be set that way after all. In fact, I was the one that added the code to reuse the mount options some time ago. I simply must've skipped this detail the last time, or I would've added it then.

Yes, not remounting api filesystems in the first place would make the umount after failed remount obsolete. But I figured it would future-proof the whole thing. If some new (api) fs should come up that cannot be remounted but also isn't detected as api yet, we would once again be in a situation where we cannot unmount / or /oldroot just because of some (api) fs mounted below.

@keszybz
Copy link
Member

keszybz commented Mar 13, 2018

OK, this seems good to merge. This area is tricky, so I'm not 100% sure that it's all correct, but we're pretty far from a release now, so if there are any regressions we should catch them. Let's keep an eye out.

@keszybz keszybz merged commit d4f5c00 into systemd:master Mar 13, 2018
@sueridgepipe
Copy link

sueridgepipe commented Mar 13, 2018

Has this been tested with a LUKS encrypted root partition?

This patch was just introduced into Manjaro unstable branch. All my systems have an encrypted root partition and all crash during shutdown / reboot. Hard power off is required.

screenshot_20180314_084333-1

This is a virtualbox screen grab, but the text is identical on bare metal systems with the same crash.

The following systemd-cryptsetup errors are also generated during shutdown / reboot.

Mar 13 manjaro systemd-cryptsetup[14160]: Failed to deactivate: Device or resource busy
Mar 13 manjaro systemd-cryptsetup[14161]: Failed to deactivate: Device or resource busy
Mar 13 manjaro kernel: watchdog: watchdog0: watchdog did not stop!

With a non encrypted root partition reboot and shutdown work normally.

@medhefgo
Copy link
Contributor Author

I haven't tested this with encrypted root. Can you git bisect this to a specific commit?

@hphilm
Copy link

hphilm commented Mar 14, 2018

@sueridgepipe: please get the sources of our package and add the commits one by one after line 67 to see which commit is the culprit.

@keszybz
Copy link
Member

keszybz commented Mar 14, 2018

I think I have a fix for you, no need to bisect. I'll push a PR shortly.

@keszybz
Copy link
Member

keszybz commented Mar 14, 2018

See #8425 and in particular b936186.

@hphilm
Copy link

hphilm commented Mar 14, 2018

@keszybz: I assume you mean pull #8452

@evverx evverx added ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR merged-but-needs-fixing labels Mar 14, 2018
@evverx
Copy link
Contributor

evverx commented Mar 14, 2018

I added the "ci-fails" and "merged-but-needs-fixing" labels because the CI had failed with

[�[0;32m  OK  �[0m] Stopped Remount Root and Kernel File Systems.
[�[0;32m  OK  �[0m] Removed slice system-systemd\x2dfsck.slice.
[�[0;32m  OK  �[0m] Stopped Cryptography Setup for varcrypt.
[�[0;32m  OK  �[0m] Reached target Unmount All Filesystems.
[�[0;32m  OK  �[0m] Removed slice system-systemd\x2dcryptsetup.slice.
[�[0;32m  OK  �[0m] Reached target Shutdown.
[�[0;32m  OK  �[0m] Reached target Final Step.
         Starting Power-Off...
[   21.831426] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
[   21.831426] 
[   21.832309] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 4.4.0-116-generic #140-Ubuntu
[   21.832309] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[   21.832309]  0000000000000086 609837e11178841e ffff88001c73fc48 ffffffff813ffc13
[   21.832309]  ffffffff81cbc7b0 ffff88001c73fce0 ffff88001c73fcd0 ffffffff81191327
[   21.832309]  ffff880000000010 ffff88001c73fce0 ffff88001c73fc78 609837e11178841e
[   21.832309] Call Trace:
[   21.832309]  [<ffffffff813ffc13>] dump_stack+0x63/0x90
[   21.832309]  [<ffffffff81191327>] panic+0xd3/0x227
[   21.832309]  [<ffffffff810865c4>] do_exit+0xaf4/0xb00
[   21.832309]  [<ffffffff81086653>] do_group_exit+0x43/0xb0
[   21.832309]  [<ffffffff81092e74>] get_signal+0x294/0x600
[   21.832309]  [<ffffffff8102e567>] do_signal+0x37/0x6f0
[   21.832309]  [<ffffffff8109073a>] ? signal_wake_up_state+0x2a/0x30
[   21.832309]  [<ffffffff81090876>] ? complete_signal+0x106/0x1f0
[   21.832309]  [<ffffffff810cd681>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[   21.832309]  [<ffffffff8100320c>] exit_to_usermode_loop+0x8c/0xd0
[   21.832309]  [<ffffffff81003c26>] prepare_exit_to_usermode+0x26/0x30
[   21.832309]  [<ffffffff8184fbe2>] retint_user+0x8/0x50
[   21.832309] Kernel Offset: disabled
[   21.832309] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
[   21.832309] 

@medhefgo medhefgo deleted the sd-shutdown branch March 16, 2018 16:45
floion added a commit to balena-os/meta-balena that referenced this pull request Nov 7, 2018
OS 2.26.0+rev1 reports the following at reboot:
systemd-shutdown[1]: Failed to wait for process: Protocol error
systemd-shutdown[1]: Failed to wait for process: Protocol error

As per systemd/systemd#8155 (comment), systemd/systemd#8429
needs to be backported.

Change-type: minor
Changelog-entry: Backport systemd sd-shutdown improvements
Signed-off-by: Florin Sarbu <[email protected]>
floion added a commit to balena-os/meta-balena that referenced this pull request Nov 14, 2018
OS 2.26.0+rev1 reports the following at reboot:
systemd-shutdown[1]: Failed to wait for process: Protocol error
systemd-shutdown[1]: Failed to wait for process: Protocol error

As per systemd/systemd#8155 (comment), systemd/systemd#8429
needs to be backported.

We currently backport these patches for sumo only

Change-type: minor
Changelog-entry: Backport systemd sd-shutdown improvements for sumo
Signed-off-by: Florin Sarbu <[email protected]>
floion added a commit to balena-os/meta-balena that referenced this pull request Nov 14, 2018
OS 2.26.0+rev1 reports the following at reboot:
systemd-shutdown[1]: Failed to wait for process: Protocol error
systemd-shutdown[1]: Failed to wait for process: Protocol error

As per systemd/systemd#8155 (comment), systemd/systemd#8429
needs to be backported.

We currently backport these patches for sumo only

Change-type: minor
Changelog-entry: Backport systemd sd-shutdown improvements for sumo versions
Signed-off-by: Florin Sarbu <[email protected]>
floion added a commit to balena-os/meta-balena that referenced this pull request Nov 14, 2018
OS 2.26.0+rev1 reports the following at reboot:
systemd-shutdown[1]: Failed to wait for process: Protocol error
systemd-shutdown[1]: Failed to wait for process: Protocol error

As per systemd/systemd#8155 (comment), systemd/systemd#8429
needs to be backported.

We currently backport these patches for sumo only

Change-type: minor
Changelog-entry: Backport systemd sd-shutdown improvements for sumo versions
Signed-off-by: Florin Sarbu <[email protected]>
floion added a commit to balena-os/meta-balena that referenced this pull request Nov 14, 2018
OS 2.26.0+rev1 reports the following at reboot:
systemd-shutdown[1]: Failed to wait for process: Protocol error
systemd-shutdown[1]: Failed to wait for process: Protocol error

As per systemd/systemd#8155 (comment), systemd/systemd#8429
needs to be backported.

We currently backport these patches for sumo only

Change-type: minor
Changelog-entry: Backport systemd sd-shutdown improvements for sumo versions
Signed-off-by: Florin Sarbu <[email protected]>
agherzan pushed a commit to balena-os/meta-balena that referenced this pull request Nov 16, 2018
OS 2.26.0+rev1 reports the following at reboot:
systemd-shutdown[1]: Failed to wait for process: Protocol error
systemd-shutdown[1]: Failed to wait for process: Protocol error

As per systemd/systemd#8155 (comment), systemd/systemd#8429
needs to be backported.

We currently backport these patches for sumo only

Change-type: minor
Changelog-entry: Backport systemd sd-shutdown improvements for sumo versions
Signed-off-by: Florin Sarbu <[email protected]>
agherzan pushed a commit to balena-os/meta-balena that referenced this pull request Nov 23, 2018
OS 2.26.0+rev1 reports the following at reboot:
systemd-shutdown[1]: Failed to wait for process: Protocol error
systemd-shutdown[1]: Failed to wait for process: Protocol error

As per systemd/systemd#8155 (comment), systemd/systemd#8429
needs to be backported.

We currently backport these patches for sumo only

Change-type: minor
Changelog-entry: Backport systemd sd-shutdown improvements for sumo versions
Signed-off-by: Florin Sarbu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR merged-but-needs-fixing pid1

Development

Successfully merging this pull request may close these issues.

5 participants