Fix setting boot path in case of UEFI partition (ESP) on MD RAID #2608
Conversation
Code excerpt under review (from get_device_from_partition):

device=${device%p}
fi

echo $device
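For context, a minimal sketch of the name-based ("string munging") derivation under discussion; the exact ReaR code differs, and the pattern syntax here assumes bash with extglob enabled:

shopt -s extglob
device=${partition_block_device%%+([0-9])}   # strip the trailing partition number, e.g. /dev/sda1 -> /dev/sda
if [[ $device == *[0-9]p ]] ; then           # e.g. /dev/nvme0n1p1 -> /dev/nvme0n1p needs the trailing 'p' stripped
    device=${device%p}
fi
echo $device

Note that this kind of derivation silently mangles names like /dev/md125 into /dev/md, which is exactly the problem this pull request addresses.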
What if $device is somehow invalid or empty?
Could some checking be implemented?
Currently get_device_from_partition is called only once,
so the caller code could be enhanced to do some error handling
if get_device_from_partition returns a non-zero exit value,
or if its output is empty (if that indicates an error).
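For illustration, such caller-side handling could look roughly like this (a sketch; the variable names and message wording are made up, not taken from the actual ReaR code):

# Hypothetical caller-side error handling:
if ! disk=$( get_device_from_partition "$partition" ) || ! test "$disk" ; then
    LogPrintError "Failed to determine the disk that contains partition '$partition'"
    return 1
fi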
$device can't be empty here, it has been created in this function, unless the caller has provided an empty $1. Should we protect against that?
Concerning validity, we could check if the device actually exists and is a block device.
You may catch an empty $1 after partition_block_device=$1 with something like
contains_visible_char "$partition_block_device" || BugError "Empty string passed to get_device_from_partition"
or probably even better
test -b "$partition_block_device" || BugError "get_device_from_partition called with '$partition_block_device' that is no block device"
which also catches when $1 is empty or only blanks and it shows that in the error message like
get_device_from_partition called with '' that is no block device
or
get_device_from_partition called with ' ' that is no block device
I tested that test -b behaves well also for symlinks
(at least on my openSUSE Leap 15.2 system):
# ls -l /dev/disk/by-id/wwn-0x5000039462b83c55-part1 | grep sda1
lrwxrwxrwx 1 root root 10 Apr 30 07:36 /dev/disk/by-id/wwn-0x5000039462b83c55-part1 -> ../../sda1
# test -b /dev/disk/by-id/wwn-0x5000039462b83c55-part1 && echo y || echo n
y
Additionally the current "string munging" implementation
may result in a $device string that is not a block device,
so something like
test -b "$device" && echo $device
(test -b "$device" returns 1 when it is no block device)
as the last command in the function
makes the function return 1 in case of errors,
so that its caller can do whatever is best in the caller's environment.
Note that quoting is crucial here because
# test -b qqq && echo y || echo n
n
# test -b && echo y || echo n
y
# test -b " " && echo y || echo n
n
i.e. test -b returns 0 when its unquoted argument is empty or only blanks.
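Put together, the two suggested checks could frame the function like this sketch (the actual name derivation in the middle is elided, and the details are assumptions):

get_device_from_partition() {
    local partition_block_device=$1
    # Fail loudly on empty or invalid input:
    test -b "$partition_block_device" || BugError "get_device_from_partition called with '$partition_block_device' that is no block device"
    local device
    # ... string munging that derives $device from $partition_block_device ...
    # Return non-zero when the derived name is not an actual block device:
    test -b "$device" && echo $device
}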
Concerning invalid block devices, how does disk mapping work when block devices in the original system are mapped to different names in the rescue system? Is the layout and mapping regenerated to refer to the mapped names, or does it contain the original names, which one has to map after resolving dependencies?
If the latter, it will have an impact on the desirability of validation in this function, as it will then be legitimate to handle device names that do not correspond to actual existing devices.
Weekend is near so only a quick reply off the top of my head:
When things fail with errors because of disk mappings,
that is still better than silently proceeding with wrong data.
I prefer user reports where things fail with clear error messages
over user reports when things fail in weird inexplicable ways,
cf. "Try hard to care about possible errors" in
https://github.com/rear/rear/wiki/Coding-Style
In migration mode "all needed files" (whatever that exactly means in practice ;-)
should be adapted during "rear recover".
The starting point is layout/prepare/default/300_map_disks.sh and
the actual mapping happens via layout/prepare/default/320_apply_mappings.sh
which contains
for file_to_migrate in "$LAYOUT_FILE" "$original_disk_space_usage_file" "$rescue_config_file" ; do
...
... apply_layout_mappings "$file_to_migrate" ...
so as far as it looks, mappings are only applied to
disklayout.conf layout/config/df.txt /etc/rear/rescue.conf
I assume those three files are usually sufficient
but I don't know for sure, in particular not for special cases.
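Conceptually, applying a mapping to one file boils down to rewriting device names in place; a rough sketch (assuming a mapping file with lines like "/dev/sda /dev/sdb"; this is not the actual apply_layout_mappings implementation):

while read original mapped junk ; do
    sed -i "s|$original|$mapped|g" "$file_to_migrate"
done < "$MAPPING_FILE"

The real function has to be more careful than this, for example to avoid rewriting /dev/sda inside an occurrence of /dev/sda1, but the net effect is this kind of textual rewrite.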
Searching the ReaR scripts for apply_layout_mappings
(that is the function that does the actual work)
finds that apply_layout_mappings is also called during "rear recover" in
finalize/GNU/Linux/250_migrate_disk_devices_layout.sh
and
finalize/GNU/Linux/260_rename_diskbyid.sh
where a lot more files (in particular restored bootloader config files)
get possibly adapted.
I am not a sufficient UEFI booting expert to know if there are missing
efibootmgr and/or EFI bootloader related files that would need to be adapted.
As far as I understand the code in finalize/Linux-i386/670_run_efibootmgr.sh
it seems disk mapping cannot affect how efibootmgr is called there
because it seems the code in finalize/Linux-i386/670_run_efibootmgr.sh
works directly on the actual state in the currently running recovery system because
it seems all is based on where ESP is mounted in the currently running recovery system
so nothing possibly outdated of how things have been on the original system is used.
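A minimal sketch of that mount-point-based approach (illustrative only; the actual script differs in details):

# Determine the ESP block device from where it is mounted in the recovery system:
esp_mountpoint="$TARGET_FS_ROOT/boot/efi"
esp_block_device=$( df -P "$esp_mountpoint" | awk 'END { print $1 }' )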
As far as I understand the code in finalize/Linux-i386/670_run_efibootmgr.sh
it seems disk mapping cannot affect how efibootmgr is called there
because it seems the code in finalize/Linux-i386/670_run_efibootmgr.sh
works directly on the actual state in the currently running recovery system because
it seems all is based on where ESP is mounted in the currently running recovery system
After this proposed change, the code will use disk mapping (and not just the actual state). But, thanks to apply_layout_mappings being called earlier, the disk mappings should be in sync with the current disk names, so there should be no problem.
I am surprised a bit though that apply_layout_mappings gets called multiple times.
Regarding "something is done multiple times" and similar things in ReaR:
Welcome to surprising adventures with grown code over many years by various contributors ;-)
I think ReaR is a perfect bazaar as in
https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar
I added both checks that you suggested (for get_device_from_partition input and result).
jsmeix left a comment
From plainly looking at the changes they look good to me,
so I can approve it "bona fide" - in particular because things
are sufficiently commented, so I think I even understand it a bit ;-)
@rmetrich hello, please review, as you reported the problem.

Hi @pcahyna I'm all good with your code, thanks a lot for the effort.

I agree that ReaR is expected to restore a software RAID also on UEFI systems. On the other hand there are zillions of other "reasonably expected" functionalities. For me a plain "bug" is only when code that is there does not work as intended. I.e. it is also not a plain "bug" when code that is there does work as intended.

@pcahyna I do very much appreciate it!

@jsmeix, glad you like it! You know, I can also replace the non-standard case identifiers like …

@pcahyna I appreciate it!

I wish you all a relaxed and recovering weekend!
Some "famous words" before the weekend regarding … Yes, we all will be dead at some time. But if we made code as intended by …
gozora left a comment
From plainly looking at the code, it looks good to me.
V.
It seems there is a dilemma where in the code … Current finalize/Linux-i386/670_run_efibootmgr.sh code in this pull request is (excerpts) … so it could tell the user … I think when … Or my …
This part caught my attention as well. Is it even wise to return 0 in this case? At this point we know that a UEFI bootloader should be used, so skipping the rest of the file is almost certainly a serious error.
Done. (I changed the message a bit.)
@pcahyna Then show it as an error to the user so he is informed something went wrong.
The main part of the script gets skipped in this case, which is likely a problem and should be reported. Also, we have said before that we are "Creating EFI Boot Manager entries...", so we have to report when it actually does not happen.
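For illustration, the reporting pattern that was settled on could be sketched like this (the exact message wording in the merged code differs):

# Report and stop when the restored UEFI bootloader file is missing:
if ! test -f "$TARGET_FS_ROOT/$UEFI_BOOTLOADER" ; then
    LogPrintError "Failed to create EFI Boot Manager entries (UEFI bootloader '$UEFI_BOOTLOADER' not found under target $TARGET_FS_ROOT)"
    return 1
fi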
get_device_from_partition() now validates the input partition and the converted output (disk/device name). 670_run_efibootmgr.sh now checks for possible errors from get_device_from_partition() and logs them.
Have the LogPrint "Creating EFI Boot Manager entries..." user message before the first LogPrintError "Failed to create EFI Boot Manager entries ..." cf. rear#2608 (comment)
Explain why having no "$TARGET_FS_ROOT/$UEFI_BOOTLOADER" conflicts with USING_UEFI_BOOTLOADER being true, i.e. we should create EFI Boot Manager entries but we cannot, so we report it as LogPrintError to the user.
Use LogPrintError "... UEFI bootloader '$UEFI_BOOTLOADER' not found..." (i.e. with single quotes) because UEFI_BOOTLOADER might be empty and then this is visible in the message as "... UEFI bootloader '' not found..."
@jsmeix rebase done.

@pcahyna I will merge it soon today.
Leave 940_grub2_rescue.sh at the end with explicit "return 1" in case of LogPrintError "efibootmgr failed to create EFI Boot Manager entry ..." cf. same kind of code in finalize/Linux-i386/660_install_grub2.sh
@pcahyna regarding executable finalize/Linux-i386/670_run_efibootmgr.sh …
Added TODO comment: When the ESP is located on MD RAID we need to determine the physical RAID components and call efibootmgr on each of them, cf. #2608
Pull Request Details:
Type: Bug Fix
Impact: High
Reference to related issue (URL): Can't restore EFI system with Software Raid #2595
How was this pull request tested?
Backup & restore on a RHEL 8 host with UEFI but without RAID and on a CentOS 7 host with UEFI and RAID (ESP on RAID). Examination of logs to check that correct efibootmgr commands were executed.
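For reference, the kind of efibootmgr invocation one would expect to see in the logs, once per RAID component that holds an ESP copy (the disks, partition numbers, label and loader path are example values):

efibootmgr --create --gpt --disk /dev/sda --part 1 --write-signature --label "ReaR" --loader '\EFI\redhat\shimx64.efi'
efibootmgr --create --gpt --disk /dev/sdb --part 1 --write-signature --label "ReaR" --loader '\EFI\redhat\shimx64.efi'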
Brief description of the changes in this pull request:
Use dependencies when determining partitions for UEFI
The ESP may be located on a RAID device. In this case, we need to determine the physical RAID components and call efibootmgr on them (after restoring - during the finalize step), because efibootmgr does not know how to tell the firmware to boot from RAID. Moreover, the current code tries to parse a device like
/dev/md125 into a disk called /dev/md and partition 125. Fix by using storage dependencies: RAID components (the actual ESP copies) can be found as dependencies of the ESP block device. For consistency, use dependencies also to locate the ESP block device itself (the old method is preserved as a fallback). A similar strategy is used by the GRUB2 installation code (finalize/Linux-i386/660_install_grub2.sh), which is able to install GRUB2 on the RAID components correctly.
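As an illustration of the dependency-based approach, lsblk can walk the same dependency chain bottom-up (a sketch; the actual ReaR code uses its own layout dependency handling rather than lsblk):

# List the ESP device and everything it depends on:
# for an ESP on /dev/md125 this prints md125, its member partitions, and their disks.
lsblk -sno NAME /dev/md125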
In addition, clean up usr/share/rear/finalize/Linux-i386/670_run_efibootmgr.sh a bit and add more logging to ease analysis when something goes wrong in this area.
TODO: finalize/Linux-i386/610_EFISTUB_run_efibootmgr.sh needs a similar fix.
Fixes #2595