Issue description
This issue is a placeholder for all those users raising their voices in response to the deprecation of NixOps deployment.autoLuks introduced in #61321 (and backported to 19.03).
Please let us know if you are using the feature!
NixOps deployment.autoLuks is a feature to automatically handle block devices and luks encryption without storing secrets on the target devices.
Even in its current state it seems to be halfway broken (e.g. removing a LUKS device panics systemd), and people have expressed doubts about whether it is being used at all.
Looking at the NixOps repository and searching public infrastructure repositories did not turn up a large (or any) userbase for the feature. We are therefore asking for feedback from anyone who is using it.
The changes previously done to our systemd fork included changes to the startup unit ordering. The local filesystems were no longer part of the very basic system init, allowing sshd and similar processes to start without finishing all mount units.
Due to those relaxed boot requirements a bunch of errors with state and runtime directories appeared. There were some fixes but they are still incomplete (e.g. nixos-rebuild switch regenerates all the state directories but reboots do not have the same guarantee).
Backing out of these changes and restoring a sane boot order, at the price of a few more lines of configuration in NixOps setups, seems like a reasonable tradeoff.
Why did this become necessary?
In the past our systemd fork carried a patch (NixOS/systemd@ce79214) that removed local-fs.target from sysinit.target. This allowed services such as sshd to start before all of the local filesystems were mounted, which made it possible to send over keys using sshd.service. While probably a plausible workaround at the time, this caused some odd behavior down the road.
Until roughly 2014, systemd didn't support _netdev and consequently struggled with all kinds of network block devices.
Since systemd gained support for managing StateDirectory, RuntimeDirectory, etc. (https://www.freedesktop.org/software/systemd/man/systemd.exec.html) and systemd-tmpfiles (https://www.freedesktop.org/software/systemd/man/systemd-tmpfiles.html), and their usage has increased (even inside systemd itself), the number of unexpected side effects has grown.
While probably not noticeable for most people, there is a race condition between the directories in /run/ and /var/lib/ being generated and the rest of the system coming up. In many cases we might just be lucky that all the directories exist. In general it led to many PreStart scripts that create those directories if they are missing. Those scripts in turn had to be privileged, since most daemons do not run with root privileges. The option we used to turn those scripts into privileged scripts is now deprecated. We have an ongoing effort to replace them where possible (#56265, #62050, …).
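As an illustration of what replacing such a PreStart script can look like, here is a minimal sketch of a NixOS service that lets systemd create its directories. The service name `myservice` and the use of `pkgs.hello` as the daemon are hypothetical placeholders, and `DynamicUser` is just one way to run unprivileged; the actual changes in the referenced pull requests may look different.

```nix
{ pkgs, ... }:
{
  # Hypothetical service; "myservice" and pkgs.hello are placeholders.
  systemd.services.myservice = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      # Run without root privileges.
      DynamicUser = true;
      # systemd creates /var/lib/myservice and /run/myservice before
      # starting the process, so no privileged PreStart script is needed.
      StateDirectory = "myservice";
      RuntimeDirectory = "myservice";
      ExecStart = "${pkgs.hello}/bin/hello";
    };
  };
}
```

Because the directories are declared on the unit itself, they are created every time the service starts, including after a reboot.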
Besides that, we are trying to reduce the number of custom patches applied to systemd. In the long run it should become easier to maintain our systemd package. Eventually we would like to upstream some of our changes in a portable way. Things that aren't strictly required for systemd to work on NixOS should therefore go away.
What can I do to make it work again?
Make sure you add _netdev to the mount options of all the filesystems you are mounting via the autoLuks module. Adding that option moves them from local-fs.target to remote-fs.target, which allows your system to start sshd even without the LUKS volumes. Afterwards you can use nixops send-keys again.
Do not forget to read the error message you got and set the option that was mentioned there.
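A minimal sketch of the _netdev change, assuming a volume named `secretdisk` on `/dev/xvdf` mounted at `/data` (all three names are hypothetical, as is the ext4 filesystem type). The additional option named in the error message is deliberately not shown here; take it from the message you received.

```nix
{
  # Hypothetical LUKS volume managed by the NixOps autoLuks module.
  deployment.autoLuks.secretdisk = {
    device = "/dev/xvdf";
  };

  fileSystems."/data" = {
    # The unlocked volume is assumed to appear under /dev/mapper/<name>.
    device = "/dev/mapper/secretdisk";
    fsType = "ext4";
    # _netdev orders the mount under remote-fs.target instead of
    # local-fs.target, so sshd can start before the volume is unlocked.
    options = [ "_netdev" ];
  };
}
```

After deploying, running nixops send-keys should let the volume be unlocked and mounted.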