How to enable A/B redundancy in Jetson TX2
Introduction to A/B redundancy
A/B redundancy lets a system recover when one of its system partitions fails. Each partition has a mirror; if the main partition fails, the system falls back to the mirror (or recovery) partition.
NVIDIA Jetson TX2 has this capability. The main partitions do not have any suffix, whereas the recovery partitions have a _b suffix. For example:
kernel-dtb: Main partition
kernel-dtb_b: Recovery partition
However, A/B redundancy is disabled by default on the Jetson TX2, so the partitions with the _b suffix are not used when a main partition fails.
This article describes how to enable it and how to recover a TX2 after a partition failure. Recovery requires A/B redundancy to be enabled first. To enable it, either flash the new C-Boot configuration using flash.sh or enable it from user space using nv_update_engine.
General information
Relevant directories
Jetpack is organized into directories. The following names are used throughout this article to refer to them:
1. Jetpack installation directory: $JETPACKDIR
2. Bootloader directory: JETSON_BOOTLOADER=$JETPACKDIR/64_TX2/Linux_for_Tegra/bootloader
Each directory contains important files:
1. $JETPACKDIR: It contains all the Jetpack files, the NVIDIA flashing binaries, and the files the Tegra boards need to work properly. These files are in a folder named 64_TX2 inside it.
2. $JETSON_BOOTLOADER: It contains the bootable image with the bootloader (./boot.img), the filesystem packed in an image file (./system.img), and the device tree that is transferred to the encrypted partition (if you do not define the device tree in the boot script configuration, the TX2 uses the DTB from this partition). It also contains the binaries used for signing, encrypting, enabling A/B redundancy, and writing files to the Jetson board.
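As a convenience, both names can be exported in the host shell before running the commands in this article. The following is a minimal sketch, assuming a hypothetical install location ($HOME/JetPack) that you should adjust to your setup. The helper check_bootloader_dir is not part of Jetpack; it only confirms that the two bootloader files used later in this article are present.

```shell
# Assumption: Jetpack was installed under $HOME/JetPack; adjust as needed.
JETPACKDIR="${JETPACKDIR:-$HOME/JetPack}"
JETSON_BOOTLOADER="$JETPACKDIR/64_TX2/Linux_for_Tegra/bootloader"
export JETPACKDIR JETSON_BOOTLOADER

# Hypothetical helper: report which of the files used in this article are
# missing from a bootloader directory. Returns non-zero if any is missing.
check_bootloader_dir() {
    dir="$1"
    missing=0
    for f in smd_info.cfg nv_smd_generator; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_bootloader_dir "$JETSON_BOOTLOADER"
```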
Test environment
All the steps and procedures described below were tested on a Jetson TX2 with the following Jetpack versions:
Jetpack 3.2.1
Jetpack 3.3
Enabling the A/B redundancy from flash
A configuration file enables or disables A/B redundancy. It is located at $JETSON_BOOTLOADER/smd_info.cfg and is likely to have the following settings:
...
# SMD metadata information
< VERSION 3 >
#
# Config 1: Disable A/B support (Default)
#
# slot info order is important!
# <priority> <suffix> <retry_count> <boot_successful>
15 _a 7 1
#
# Config 2: Enable redundancy support (by removing comments ##)
#
##< REDUNDANCY_USER 1 >
# slot info order is important!
# <priority> <suffix> <retry_count> <boot_successful>
##15 _a 7 1
##14 _b 7 1
Config 1 disables redundancy and Config 2 enables it. To enable redundancy, uncomment the Config 2 lines that start with ## and comment out the slot line of Config 1. The resulting settings look like this:
# SMD metadata information
< VERSION 3 >
#
# Config 1: Disable A/B support (Default)
#
# slot info order is important!
# <priority> <suffix> <retry_count> <boot_successful>
##15 _a 7 1
#
# Config 2: Enable redundancy support (by removing comments ##)
#
< REDUNDANCY_USER 1 >
# slot info order is important!
# <priority> <suffix> <retry_count> <boot_successful>
15 _a 7 1
14 _b 7 1
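The same edit can be scripted instead of done by hand. The following is a minimal sketch, not an NVIDIA tool: the function name enable_ab_config is hypothetical, and it assumes that only the Config 2 lines start with ## and that the only uncommented slot line is the Config 1 entry, as in the default file shown earlier. It keeps a backup next to the original file.

```shell
# Sketch: switch an smd_info.cfg from Config 1 to Config 2.
# Assumption: the file still has its default contents (run this only once,
# a second run would swap the two configs back into an inconsistent state).
enable_ab_config() {
    cfg="$1"
    cp "$cfg" "$cfg.bak" || return 1    # keep a backup of the original
    awk '
        # Lines starting with "##" belong to Config 2: uncomment them.
        /^##/ { sub(/^##/, ""); print; next }
        # An uncommented slot line belongs to Config 1: comment it out.
        /^[0-9]+[ \t]+_[a-z]/ { print "##" $0; next }
        # Everything else is copied unchanged.
        { print }
    ' "$cfg.bak" > "$cfg"
}

# Usage: enable_ab_config "$JETSON_BOOTLOADER/smd_info.cfg"
```

Comparing the result against the backup (for example with diff) before generating the slot metadata is a cheap way to confirm that only the intended lines changed.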
After modifying smd_info.cfg, the next step is to generate the slot metadata (SMD) binary, slot_metadata.bin, which C-Boot uses to control the boot partitions. To do so, from $JETSON_BOOTLOADER/, run:
cd $JETSON_BOOTLOADER
sudo ./nv_smd_generator smd_info.cfg slot_metadata.bin
Finally, run the NVIDIA flashing tool:
cd $JETPACKDIR/64_TX2/Linux_for_Tegra
sudo ./flash.sh jetson-tx2 mmcblk0p1
Enabling the A/B redundancy from user space
This option is ideal for devices that were flashed with an SMD image that has A/B disabled. To verify that A/B redundancy is disabled, run the following command on the TX2:
sudo nvbootctrl dump-slots-info
Its output shows whether redundancy is already enabled. For example:
- A/B disabled output
magic:0x43424e00, version: 3 features: 0 num_slots: 1
slot: 0, priority: 15, suffix: _a, retry_count: 7, boot_successful: 1
slot: 1, priority: 0, suffix: , retry_count: 0, boot_successful: 0
- A/B enabled output
magic:0x43424e00, version: 3 features: 3 num_slots: 2
slot: 0, priority: 15, suffix: _a, retry_count: 7, boot_successful: 1
slot: 1, priority: 14, suffix: _b, retry_count: 7, boot_successful: 1
As shown, when redundancy is disabled, slot 1 has 0 in its priority, retry_count, and boot_successful values and has no suffix.
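This check can be automated by looking at the suffix of slot 1. The following is a minimal sketch: the function name ab_is_enabled is hypothetical, and it assumes that the output keeps the "slot: N, ... suffix: _x, ..." layout of the samples above. It reads the dump on standard input and returns success only when slot 1 has a suffix.

```shell
# Sketch: decide from `nvbootctrl dump-slots-info` output (read on stdin)
# whether slot 1 is populated, which indicates that A/B is enabled.
ab_is_enabled() {
    awk '
        # Slot 1 with a non-empty suffix such as "_b" means A/B is enabled.
        /^[ \t]*slot:[ \t]*1,/ && /suffix:[ \t]*_[a-z]+,/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on the TX2:
#   if sudo nvbootctrl dump-slots-info | ab_is_enabled; then
#       echo "A/B redundancy is enabled"
#   fi
```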
If redundancy is disabled, you can enable it by running:
sudo nv_update_engine --enable-ab
Note: This command will fail if redundancy was already enabled.
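Because the command fails on a system that already has redundancy enabled, a script may want to check first. The following is a minimal sketch: the function name enable_ab_if_needed is hypothetical, and it assumes that an enabled system reports "num_slots: 2" as in the sample output above.

```shell
# Sketch: enable A/B redundancy only when it is currently disabled.
# Assumption: "num_slots: 2" in the dump means redundancy is already enabled.
enable_ab_if_needed() {
    if sudo nvbootctrl dump-slots-info | grep -q 'num_slots:[[:space:]]*2'; then
        echo "A/B redundancy is already enabled; nothing to do"
    else
        sudo nv_update_engine --enable-ab
    fi
}

# Usage on the TX2: enable_ab_if_needed
```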
Verifying that A/B redundancy is enabled
To check that the process was successful, run on the TX2:
sudo nvbootctrl dump-slots-info
It should show an output similar to this:
slot: 0, priority: 15, suffix: _a, retry_count: 7, boot_successful: 1
slot: 1, priority: 14, suffix: _b, retry_count: 7, boot_successful: 1
Both slots should have valid information in their properties.
TX2 A/B redundancy uses groups of partitions called "slots". A slot contains all the partitions the TX2 needs to boot properly. The TX2 has two slots: slot 0 for the principal partitions and slot 1 for the recovery partitions. The names of the principal and recovery partitions differ only in the suffix. For example, the principal partition that stores the DTB is kernel-dtb and its recovery partition is kernel-dtb_b; the _b suffix marks it as a recovery partition.
Also, the priority order defines which slot is used for booting. In the example shown above, C-Boot will use slot 0 during the boot process.
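To make the priority rule concrete, the following sketch predicts the boot slot from a dump. It is a simplification under stated assumptions, not C-Boot's actual selection logic: it treats a slot as bootable when its retry_count is greater than zero or its boot_successful flag is 1, and picks the bootable slot with the highest priority. The function name next_boot_slot is hypothetical.

```shell
# Sketch: print the slot expected to boot next, reading
# `nvbootctrl dump-slots-info` output on stdin.
# Assumption: bootable means retry_count > 0 or boot_successful == 1.
next_boot_slot() {
    awk '
        /^[ \t]*slot:/ {
            gsub(/,/, "")                       # drop commas, fields re-split
            for (i = 1; i < NF; i++) {
                if ($i == "slot:")            slot  = $(i + 1)
                if ($i == "priority:")        prio  = $(i + 1)
                if ($i == "retry_count:")     retry = $(i + 1)
                if ($i == "boot_successful:") ok    = $(i + 1)
            }
            # Keep the bootable slot with the highest priority seen so far.
            if ((retry + 0 > 0 || ok + 0 == 1) && prio + 0 > best) {
                best = prio + 0
                pick = slot
            }
        }
        END { if (pick == "") exit 1; print pick }
    '
}

# Usage on the TX2: sudo nvbootctrl dump-slots-info | next_boot_slot
```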
Recovering the system after a partition failure
If A/B redundancy is enabled and a principal partition (for example, kernel-dtb) gets corrupted, the TX2 falls back to the recovery partition (kernel-dtb_b), which has the same content the main partition had. However, this process disables the principal partition indefinitely. To go back to the principal partition after fixing it, re-enable it on the target board (TX2) using:
SLOT=0
sudo nvbootctrl set-active-boot-slot $SLOT
nvbootctrl set-active-boot-slot allows you to enable a slot again. SLOT=0 selects the principal partitions and SLOT=1 selects the redundant (recovery) partitions.
Use case
Suppose that kernel-dtb gets corrupted after an upgrade attempt. When the TX2 reboots, it falls back to kernel-dtb_b and the system is usable again. After fixing kernel-dtb, the user reboots the TX2, but it keeps falling back to kernel-dtb_b.
To see what happened:
sudo nvbootctrl dump-slots-info
Giving:
slot: 0, priority: 15, suffix: _a, retry_count: 0, boot_successful: 0
slot: 1, priority: 14, suffix: _b, retry_count: 7, boot_successful: 1
Note that retry_count and boot_successful are both zero for slot 0, so slot 0 will not be used.
To solve this issue, the user has to enable slot 0 and mark it bootable once more. To do so:
SLOT=0
sudo nvbootctrl set-active-boot-slot $SLOT
This leads to:
slot: 0, priority: 15, suffix: _a, retry_count: 7, boot_successful: 0
slot: 1, priority: 14, suffix: _b, retry_count: 7, boot_successful: 1
A non-zero retry_count gives slot 0 a try on the next boot. If it boots properly, the TX2 will continue booting from slot 0 and will use kernel-dtb again.
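The recovery steps of this use case can be collected in a small helper. The following is a minimal sketch: the function name restore_principal_slot is hypothetical, and it assumes that nvbootctrl get-current-slot prints only the slot number (0 or 1). It does not repair the partition itself; it only re-activates slot 0 once the partition has been fixed.

```shell
# Sketch: after repairing the principal partitions, make slot 0 active again.
# Assumption: `nvbootctrl get-current-slot` prints just the slot number.
restore_principal_slot() {
    current="$(sudo nvbootctrl get-current-slot)" || return 1
    if [ "$current" = "0" ]; then
        echo "Already running from slot 0"
        return 0
    fi
    sudo nvbootctrl set-active-boot-slot 0 || return 1
    # Slot 0 should now show a non-zero retry_count.
    sudo nvbootctrl dump-slots-info
    echo "Reboot the TX2 to try slot 0"
}

# Usage on the TX2: restore_principal_slot
```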
Disabling the A/B redundancy from user space
You can disable A/B redundancy, regardless of whether it was enabled from flash or from user space, by running:
sudo nv_update_engine --disable-ab
Note: This command will fail if the redundancy was already disabled or if the system is currently running on slot 1 (B).
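Both failure conditions can be checked before calling the command. The following is a minimal sketch: the function name disable_ab_if_safe is hypothetical, and it assumes that an enabled system reports "num_slots: 2" and that nvbootctrl get-current-slot prints only the slot number.

```shell
# Sketch: disable A/B redundancy only when the command is expected to work.
# Assumptions: "num_slots: 2" means redundancy is enabled, and
# `nvbootctrl get-current-slot` prints just the slot number.
disable_ab_if_safe() {
    if ! sudo nvbootctrl dump-slots-info | grep -q 'num_slots:[[:space:]]*2'; then
        echo "A/B redundancy is already disabled; nothing to do"
        return 0
    fi
    current="$(sudo nvbootctrl get-current-slot)" || return 1
    if [ "$current" != "0" ]; then
        echo "Running from slot $current; switch back to slot 0 first" >&2
        return 1
    fi
    sudo nv_update_engine --disable-ab
}

# Usage on the TX2: disable_ab_if_safe
```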
Conclusion
A/B redundancy is disabled by default on the TX2. To enable it, either modify smd_info.cfg, run nv_smd_generator, and flash the TX2, or enable it from user space using nv_update_engine. If you use the flash method and do not want to rebuild the filesystem, you can pass the -r parameter to flash.sh.
In addition, the following commands are useful on the TX2 to check and control the status of A/B redundancy:
nvbootctrl get-current-slot: It shows the current slot.
nvbootctrl set-active-boot-slot $SLOT: It selects $SLOT for the next boot.
nvbootctrl dump-slots-info: It shows the details of all slots.