Some Questions About Bootloader Redundancy

Hello NVIDIA experts,

I would like to ask some questions about Bootloader Redundancy. I am currently using JetPack 6.0 on an Industry Orin device. When ROOTFS_AB=1 is enabled, the rootfs slot can be switched.
I would like to learn more about how bootloader Slot A and Slot B are switched when ROOTFS_AB is 0 or 1, and whether the switch happens inside UEFI or in an earlier bootloader stage. I have already modified the two related functions, ValidateRootfsStatus and ProcessBootParams, but the changes did not seem to take effect. It looks like the UEFI code here targets the redundancy mechanism of the rootfs rather than that of the bootloader.

I look forward to your reply. Thank you very much.

*** Please note that this reply is generated by LLM automatically ***

Thank you for reaching out to us about your questions on Bootloader Redundancy on Jetson AGX Orin. I’d be happy to help clarify the process.

Rootfs Redundancy

Rootfs redundancy is a feature that provides two root file systems, rootfs A and rootfs B, which are stored in separate partitions, APP and APP_b, respectively. This feature allows for independent creation and updates of the two file systems, switching, and fail-over.

Switching Logic

The switching logic for rootfs redundancy is based on the A/B update mechanism, which is a feature of the Bootloader. When the Bootloader is updated, the new version is written to the inactive slot (either Slot A or Slot B). If the update is successful, the Bootloader will switch to the new slot on the next boot.

Switching Process

The switching process happens during the early stage of the Bootloader. When the Bootloader starts, it checks the status of the two slots and determines which one to use. If the active slot is corrupted or invalid, the Bootloader will switch to the other slot.

Conditions for Switching

The switching process is triggered by the following conditions:

  1. Bootloader update: When the Bootloader is updated, the new version is written to the inactive slot. If the update is successful, the Bootloader will switch to the new slot on the next boot.
  2. Corrupted or invalid slot: If the active slot is corrupted or invalid, the Bootloader will switch to the other slot.
  3. Manual switching: The user can manually switch between the two slots using the nvbootctrl utility.

Comparison with Bootloader Redundancy

Bootloader redundancy is a feature that provides two sets of Bootloader partitions, Slot A and Slot B. This feature ensures that a workable Bootloader partition remains available during an update. The switching logic for Bootloader redundancy is similar to rootfs redundancy, but it happens at a lower level, during the Bootloader initialization phase.

JetPack 6.0 and Industry Orin Device

Since you are using JetPack 6.0 on an Industry Orin device, you should have access to the rootfs redundancy feature. However, please note that rootfs redundancy is an advanced feature and requires customized rootfs images.

I hope this answers your questions about Bootloader Redundancy on Jetson AGX Orin. If you have any further questions or concerns, please don’t hesitate to ask.


hello wpceswpces,

As you can see from the flashing messages:
[ 284.2142 ] Writing partition A_cpu-bootloader with uefi_jetson_with_dtb_aligned_blob_w_bin_sigheader.bin.encrypt [ 3318352 bytes ]

the UEFI image (uefi_jetson.bin) is written to the A_cpu-bootloader partition on QSPI.
The image flashed to that partition is the combination of the UEFI binary and the appended DTBs (payload).

Hi,JerryChang

I'm not looking to understand the firmware stored in the bootloader partition; I want to know whether the bootloader redundancy mechanism is controlled in MB1 or MB2. I modified the ValidateRootfsStatus and ProcessBootParams functions in UEFI and then set the OS chain A status to unbootable in the UEFI menu, but it still doesn't ignore the status as I expected, and it enters slot A anyway.

hello wpceswpces,

It's controlled by a scratch register.

Hi,JerryChang

Could you be more specific, please? Is it written in UEFI, or are registers in MB1 or MB2 involved? I ask because I've noticed that even after commenting out some of the print logic in UEFI (such as “Attempting Recovery Boot” and SetRootfsStatusReg in edk2-nvidia/Silicon/NVIDIA/Application/L4TLauncher/L4TRootfsValidation.c, etc.), the updated version still prints those logs. Does the SetRootfsStatusReg function only handle rootfs redundancy? Bootloader redundancy still exists even when ROOTFS_AB=1 is not enabled, so where exactly is the function that configures the scratch registers implemented? Could you explain that in more detail? I suspect there might be relevant configuration in the earlier stages as well.

hello wpceswpces,

MB1 writes this scratch register, and UEFI then reads it to determine the A/B slot status.

Hi,JerryChang

However, regarding the redundancy of the bootloader, UEFI itself is also considered a bootloader. Shouldn’t we select a specific bootloader, either bootloader_A or bootloader_B, before entering UEFI? If UEFI then reads the scratch register to decide the A/B slot for the rootfs, it feels like…

hello wpceswpces,

Slot A is used for booting by default. The cpu-bootloader is UEFI, per your snapshot.

Hi,JerryChang

How about this diagram? How can we switch the entire slot (including MB1, MB2, and CPU-bootloader)? UEFI should belong to the CPU-bootloader. How does it determine how to switch the entire slot?

hello wpceswpces,

Please also see the Bootloader Implementation section of the developer guide for details.

Hi,JerryChang

I have already reviewed this part of the documentation, but I feel that the descriptions in the document do not clearly answer the questions I mentioned above.

hello wpceswpces,

Could you please list all of your questions about bootloader redundancy?

Hi,JerryChang

How about this? What we have now is that GPIO selection for loadoptions is implemented only in UEFI, but I want MB1, MB2, and UEFI to also select the specific slot through GPIO. I tried modifying it in UEFI, but it didn’t meet my expectations. So, I wonder if it is fixed in BootROM?

What we already have:

What I want to have

Sorry, but L4T uses the BRBCT rather than GPIO to switch the slot.

Please use nvbootctrl to switch the slot and check the full serial console logs after the reboot; that should give you a better picture of the logic.

$ sudo nvbootctrl dump-slots-info
$ sudo nvbootctrl set-active-boot-slot 1
$ sudo reboot

The boot chain is determined by an EFI variable (BootChainFwNext), which nvbootctrl writes when you want to switch the slot.

Here is the flow:
nvbootctrl configures the boot slot → writes to BootChainFwNext → reboot → UEFI checks BootChainFwNext and updates the BCT partition → UEFI triggers a reboot → BootROM reads the slot configured in the BCT partition → boots from that slot
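
To make the ordering in that flow easier to follow, here is a minimal Python sketch of it as a toy state machine. It is only a model of the steps described above, not NVIDIA code: the `Device` class, its field names, and the `set_active_boot_slot` and `boot` functions are hypothetical names invented for illustration, and real firmware does considerably more (validation, retry counts, fail-over).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    """Toy model of the persistent state involved in a slot switch (hypothetical)."""
    bct_boot_chain: int = 0                   # slot recorded in the BCT partition (0 = A, 1 = B)
    boot_chain_fw_next: Optional[int] = None  # pending request; models the BootChainFwNext EFI variable
    running_chain: int = 0                    # slot the most recent boot actually used


def set_active_boot_slot(dev: Device, slot: int) -> None:
    """Models `nvbootctrl set-active-boot-slot`: it only records the request."""
    if slot not in (0, 1):
        raise ValueError("slot must be 0 (A) or 1 (B)")
    dev.boot_chain_fw_next = slot


def boot(dev: Device) -> List[str]:
    """Models a reboot, including the extra reboot that UEFI triggers."""
    log = []
    while True:
        # BootROM only looks at the BCT; it never sees the EFI variable.
        dev.running_chain = dev.bct_boot_chain
        log.append(f"BootROM: boot chain {dev.running_chain} from BCT")
        pending = dev.boot_chain_fw_next
        dev.boot_chain_fw_next = None
        if pending is None or pending == dev.running_chain:
            log.append("UEFI: no pending switch, continue to OS")
            return log
        # UEFI applies the request to the BCT and reboots.
        dev.bct_boot_chain = pending
        log.append(f"UEFI: BCT updated to chain {pending}, rebooting")


if __name__ == "__main__":
    dev = Device()
    set_active_boot_slot(dev, 1)
    for line in boot(dev):
        print(line)
```

In this model the request only takes effect once UEFI on the currently configured chain has run, because BootROM reads nothing but the BCT.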

Hi,KevinFFF

I’m very glad to receive your reply, and I really appreciate it.

I understand a bit more now. I think I might be able to do it this way.

However, there are obvious issues with the above process, as well as with switching through nvbootctrl. In one case the switch is made from within the rootfs using nvbootctrl, which implies that the system is able to boot. In the other case BootChainFwNext is modified through GPIO in UEFI, which at least indicates that the device can successfully enter UEFI. In both cases, I feel that switching the bootloader slot is meaningless. My main concern is the case where a certain slot cannot be accessed normally: there I want to be able to explicitly control which slot to boot into using GPIO.

Therefore, it seems that there is no way to control this process before MB1 and MB2. It appears that it can only be controlled in the BootROM, but the BootROM definitely won't handle GPIO or similar tasks; it's just the factory-locked initial bootloader. What is your opinion on this?

Additionally, may I ask about the functions of u32_non_gpio_select_boot_chain and bf_bl_gpio_select_boot_chain_1b in tegra234-br-bct-diag-boot.dts, tegra234-br-bct-p3701-0000.dts and tegra234-br-bct_b-p3701-0000.dts? Are they related to the functionality I am trying to achieve? They look like configurations for controlling the boot chain based on GPIO.

I think this scenario is already covered by the current mechanism, and it is why bootloader redundancy is needed. You don't need to use GPIO to switch slots manually.

Your understanding is correct here.
BootROM cannot be modified, and it does not handle GPIO.

They are both used in MB2 boot.
bf_bl_gpio_select_boot_chain_1b configures the behavior of the error handler: 0 for reset and 1 for hang.
u32_non_gpio_select_boot_chain configures the default boot chain.
Please note that L4T does not support boot chain selection by GPIO.

Hi,KevinFFF

You can either check the MB1/MB2 serial console output or use nvbootctrl after boot to check the slot status.

Actually, recovery boot is intended for this situation: it allows you to boot and recover the device.

Manually powering off during an update is one way to corrupt it and leave it incomplete. You can also try erasing the partition with the dd command.

Sorry, but I'm not clear on the benefit of using GPIO instead of the current fail-safe mechanism to switch the slot.
For a remote device, controlling GPIO manually may be more complicated. If you find that a slot is unbootable, you can boot from the other slot and use a capsule update to recover it. All of the steps can be done remotely.

Hi,KevinFFF

Yeah, the main issue is that it’s not easy to detect when a slot fails to boot. It requires continuous polling to monitor the current slot’s status and boot count.
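
As a rough illustration of what such polling could look like, here is a minimal Python sketch that parses the text printed by `nvbootctrl dump-slots-info` and reports slots whose status is not `normal`. Everything about the output format is an assumption: the `Current ... slot:` and `slot: N, ... status: ...` line shapes, the optional `retry_count` field, and the `normal`/`unbootable` status strings are based on typical L4T output and should be checked against your release. The function names and the `SAMPLE` text are hypothetical.

```python
import re
import subprocess

# Assumed line shapes; verify against the real output on your L4T release.
CURRENT_LINE = re.compile(r"Current \w+ slot:\s*(?P<name>[AB])")
SLOT_LINE = re.compile(
    r"slot:\s*(?P<slot>\d+),\s*"
    r"(?:retry_count:\s*(?P<retry>\d+),\s*)?"
    r"status:\s*(?P<status>\w+)"
)

# Hypothetical sample text used only to exercise the parser.
SAMPLE = """Current bootloader slot: A
Active bootloader slot: A
num_slots: 2
slot: 0,             status: normal
slot: 1,             status: unbootable
"""


def parse_slots_info(text):
    """Return {'current': 'A'|'B'|None, 'slots': [{slot, retry_count, status}, ...]}."""
    info = {"current": None, "slots": []}
    for line in text.splitlines():
        match = CURRENT_LINE.search(line)
        if match:
            info["current"] = match.group("name")
            continue
        match = SLOT_LINE.search(line)
        if match:
            retry = match.group("retry")
            info["slots"].append({
                "slot": int(match.group("slot")),
                "retry_count": int(retry) if retry is not None else None,
                "status": match.group("status"),
            })
    return info


def unhealthy_slots(info):
    """Slot numbers whose reported status is anything other than 'normal'."""
    return [s["slot"] for s in info["slots"] if s["status"] != "normal"]


def read_slots_info():
    """Run nvbootctrl on the device (needs root) and parse its output."""
    result = subprocess.run(
        ["nvbootctrl", "dump-slots-info"],
        capture_output=True, text=True, check=True,
    )
    return parse_slots_info(result.stdout)
```

A monitoring service could call `read_slots_info()` periodically and raise an alert when `unhealthy_slots()` returns a non-empty list. This only works while at least one slot still boots into the OS.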