[WIP] nixos/systemd-boot: boot counting and automatic fallback#84204
[WIP] nixos/systemd-boot: boot counting and automatic fallback#84204danielfullmer wants to merge 3 commits intoNixOS:masterfrom
Conversation
|
cc @adisbladis for interest |
9584565 to
4c5d3e4
Compare
|
@danielfullmer with 245.3 merged into master, can you rebase this PR and see if it's still broken upstream? |
4c5d3e4 to
2fe51f7
Compare
|
@flokli Sure, just rebased. There haven't been any relavent changes to Since I haven't received any response about the menu ordering here systemd/systemd#15256, I'll send a patch upstream to see if that sparks any discussion. I also recently patched Just to be clear, the logic is now this: We select the default entry specified in Upstream PR here: systemd/systemd#15577 |
0db9bed to
2f2e083
Compare
|
I just prepared the bump to We also might need to address #86422 (comment), so actual sd-boot fixes manifest on existing installations. |
2f2e083 to
c03ed62
Compare
|
I've resolved the earlier issue with the My fix was this: instead of setting (for example) Since the proper default generation might not be the highest numbered generation, I've renamed the default entry to (for example) This approach has the additional benefit that non-default system profiles would be respected. If the user has run |
c03ed62 to
72a37ec
Compare
|
Hi @danielfullmer, I just wanted to swing by and say: thank you for your work on this! I tested this on my machine just now and it works flawlessly. Is there anything I can do to help move this into a mergeable state? EDIT: Removed from the Staging project because it looks like this is no longer a mass rebuild. |
cole-h
left a comment
There was a problem hiding this comment.
Something I just noticed (after having switched off of using this PR) -- I still have boot entries with counters attached, but I also have the same profiles without counters. e.g.:
$ ls /boot/loader/entries
[...]
.rwxr-xr-x 690 root 22 Mar 11:49 nixos-generation-70+0-1.conf
.rwxr-xr-x 580 root 23 Mar 13:29 nixos-generation-70.conf
.rwxr-xr-x 690 root 22 Mar 11:49 nixos-generation-71+0-1.conf
.rwxr-xr-x 580 root 23 Mar 13:29 nixos-generation-71.conf
.rwxr-xr-x 690 root 22 Mar 11:49 nixos-generation-72+0-1.conf
.rwxr-xr-x 580 root 23 Mar 13:29 nixos-generation-72.conf
.rwxr-xr-x 690 root 22 Mar 11:49 nixos-generation-73+0-3.conf
.rwxr-xr-x 580 root 23 Mar 13:29 nixos-generation-73.conf
.rwxr-xr-x 580 root 23 Mar 13:29 nixos-generation-74.conf
.rwxr-xr-x 580 root 23 Mar 13:29 nixos-generation-76.conf
.rwxr-xr-x 690 root 22 Mar 11:49 nixos-generation-default-74+3.conf
If boot counters are ever disabled and we see that there are profiles with a counter attached, what should be done? Nothing (and continue to regenerate the counted profiles on nixos-rebuild switch)? Skip it (e.g. don't regenerate that profile)?
|
|
||
| generation_dir = os.readlink(system_dir(self.profile, self.generation)) | ||
| tmp_path = self.path + ".tmp" | ||
| kernel_params = "systemConfig=%s init=%s/init " % (generation_dir, generation_dir) |
There was a problem hiding this comment.
As of 5276ebb, systemConfig has been removed.
| kernel_params = "systemConfig=%s init=%s/init " % (generation_dir, generation_dir) | |
| kernel_params = "init=%s/init " % generation_dir |
|
I marked this as stale due to inactivity. → More info |
|
This still looks interesting, can we do something to help? |
|
I marked this as stale due to inactivity. → More info |
|
Still interested, can we help you @danielfullmer ? |
|
Man this has been on my TODO list for so long, apologies for not taking it the last 20% and trying to get it merged. I now have a concrete need for this feature so I intend to work on this soon (on the order of weeks). If anyone gets to it before I do: I know we'll at least need to (1) Resolve any conflicts and ensure the feature still works (2) Figure out a solution to @cole-h 's issue of how to handle boot entries if we turn the feature off and on, and (3) Write some documentation/release notes for the feature. This is very likely too late to make it into 22.05, but it'd be nice to get it merged into master for the release after that. |
|
@danielfullmer Gentlest of pings :) |
|
Partly related, I have a half-baked implementation of this for grub at home. Main issue is that it's shitty to implement when we want proper counting and that I cannot get the test runner to do what I want |
|
As part of Cachix Deploy, I'd very much like to sponsor this work to get it fixed for everyone in NixOS itself. Let me know if someone is willing to get it done :) |
|
@domenkozar I'm actually currently working on bringing this PR up-to-date with master right now. I know I've been meaning to work on this for a while now, and I don't want to prevent you from sponsoring someone else to work on this if needed. But if I don't have something within ~2 weeks, then take that as a sign that I might not get to it for a while longer. |
Are you unable to get sponsored for this? Not sure if that's what @domenkozar could have had in mind. |
It might be complicated for some unrelated reasons--but I'll reach out directly. |
I'm interested into taking over for this and bringing this to completion if @danielfullmer cannot really anymore. |
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: |
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: |
|
@dasJ Could you share more about your grub implementation? Would love to see this PR go forward! |
|
Closing in favor of #273062 which is close to completion. |
Motivation for this change
This draft PR adds support for
systemd-boot"automatic boot assessment".For background information, please first read the following document: https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT/
Setting
boot.loader.systemd-boot.counters.enable = trueenables this optional functionality.Since what a "successful boot" exactly means is somewhat context dependent, systemd provides
boot-complete.targetwhich serves as a synchronization point that an end user could order necessary services before. The boot entry will be marked as successfully booting only if this target succeeds.For unattended boots, there are some other potentially useful options such as adding
panic=1toboot.kernelParamsor settingFailureAction=rebootfor certain necessary services. However, these are end-user concerns that are somewhat orthogonal to boot counting and automatic fallback, and they probably do not belong in this PR.This is a draft PR, as there are a few issues that still need to be resolved:
systemd-boottest needs a writable boot disk (since boot counters are state included in the entry filenames). I have a hack enabling this, which should be cleaned up somehow.(edit: the issue below has been resolved to my satisfaction)
Perhaps the biggest current issue which could use some feedback on is how we set the "default" for
systemd-boot. Normally,systemd-boot-builder.pysets thedefaultentry inloader.conf. The logic insystemd-bootmeans that this entry would always be selected--overriding anything having to do with boot counters. I've disabled settingdefaultin this draft PR, and sosystemd-bootshould select the latest generation which does not have 0 tries left. However, this would break the ability for the user to switch back to a previous generation, which is clearly undesirable. Perhaps we need to further patchsystemd-bootto usedefaultonly if it has remaining tries left, and otherwise select the latest generation with remaining tries left.Things done
sandboxinnix.confon non-NixOS linux)nix-shell -p nixpkgs-review --run "nixpkgs-review wip"./result/bin/)nix path-info -Sbefore and after)