System stalled

Hi NVIDIA team,
The system stalls periodically. When I type on the keyboard, nothing appears on screen, and later all the characters show up at once.
The serial port prints the following log:
"
��ph =��ter idle ta��GPC ��sk.
��> logic map: 0=>0 1=>2 2=>1
GPC ph => logic map: 0=>0 1=>2 2=>1
��INFO: END TASK:MB��
INFO: enter idle task.
INFO: END TASK:MB��
��ph =��ter idle ta��GPC ��sk.
��> logic map: 0=>0 1=>2 2=>1
GPC ph => logic map: 0=>0 1=>2 2=>1
��INFO: END TASK:MB��
INFO: enter idle task.
INFO: END TASK:MB��
��ph =��ter idle ta��GPC ��sk.
��> logic map: 0=>0 1=>2 2=>1
GPC ph => logic map: 0=>0 1=>2 2=>1
��INFO: END TASK:MB��
INFO: enter idle task.
INFO: END TASK:MB��
��ph =��ter idle ta��GPC ��sk.
��> logic map: 0=>0 1=>2 2=>1
GPC ph => logic map: 0=>0 1=>2 2=>1
��INFO: END TASK:MB��
INFO: enter idle task.
INFO: END TASK:MB��
��ph =��ter idle ta��GPC ��sk.
��> logic map: 0=>0 1=>2 2=>1
GPC ph => logic map: 0=>0 1=>2 2=>1
��INFO: END TASK:MB��
INFO: enter idle task.
INFO: END TASK:MB��
��ph =��ter idle ta��GPC ��sk.

"
The HW is Thor, and the SW is JetPack 7 GA.
Could you provide any suggestions?
BR

Sometimes we can capture a kernel log as follows:
"
[ 168.467378] rcu: INFO: rcu_sched self-detected stall on CPU
[ 168.467395] rcu: (t=5250 jiffies g=20469 q=19504 ncpus=14)0x4000000000000000 softirq=13642/13642 fqs=2610
csdsdewedwedededwedewdee[ 186.423397] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-… } 5575 jiffies s: 1369 root: 0x1/.
[ 186.423413] rcu: blocking rcu_node structures (internal RCU debug):
[ 186.423426] NMI backtrace for cpu 0

"

Can the issue be reproduced on the Thor devkit?

Yes, it can be reproduced on the Thor devkit.

Hi ekeechg,

Please use nv_tcu_demuxer and refer to the "Tegra Combined UART" page of the NVIDIA Jetson Linux Developer Guide to access the serial console through the CCPLEX: 0 port and capture the logs.

The CCPLEX0.txt log is empty.

Please check the other logs to find where it gets stuck or whether there are any errors.
You could also check BPMP.txt and TZ0.txt.
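
As a rough aid for that check, here is a minimal Python sketch that flags empty log files and lists lines containing error-like keywords. It assumes the demuxed logs sit together in one directory under the file names mentioned in this thread. The keyword list and the helper names `scan_log` and `scan_dir` are illustrative assumptions, not part of any NVIDIA tool.

```python
from pathlib import Path

# Assumption: these keywords are only a starting point; adjust them to
# whatever your firmware and kernel actually print.
KEYWORDS = ("error", "fail", "stall", "timeout", "panic")


def scan_log(path, keywords=KEYWORDS):
    """Return (is_empty, hits); hits is a list of (line_number, line)."""
    # Decode leniently because serial captures often contain stray bytes.
    text = Path(path).read_bytes().decode("utf-8", errors="replace")
    hits = [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if any(keyword in line.lower() for keyword in keywords)
    ]
    return (len(text.strip()) == 0, hits)


def scan_dir(log_dir, names=("CCPLEX0.txt", "BPMP.txt", "TZ0.txt")):
    """Scan the named log files that exist in log_dir."""
    report = {}
    for name in names:
        path = Path(log_dir) / name
        if path.exists():
            report[name] = scan_log(path)
    return report
```

Calling `scan_dir("path/to/logs")` returns a dictionary keyed by file name, so an empty CCPLEX0.txt shows up as `(True, [])` while the other files list their suspicious lines.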


Sometimes the Xorg process uses 100% CPU and the Ubuntu UI stalls.

dmesg shows the following log:
[ 131.312363] rcu: INFO: rcu_sched self-detected stall on CPU
[ 131.312367] rcu: 0-…: (21000 ticks this GP) idle=dca4/1/0x4000000000000000 softirq=4078/4078 fqs=7144
[ 131.312375] rcu: (t=21007 jiffies g=5805 q=6703 ncpus=14)
[ 131.312379] CPU: 0 PID: 3245 Comm: Xorg Tainted: G W OE 6.8.12-rt-tegra #1
[ 131.312383] Hardware name: NVIDIA NVIDIA Jetson AGX Thor Developer Kit/Jetson, BIOS 38.1.0-gcid-41656216 08/07/2025
[ 131.312385] pstate: 03400009 (nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 131.312389] pc : prepare_to_wait_event+0x74/0x17c
[ 131.312399] lr : dce_wait_cond_wait_interruptible+0x180/0x228 [tegra_dce]
[ 131.312417] sp : ffff8000a188b280
[ 131.312419] x29: ffff8000a188b280 x28: 0000000000000010 x27: ffffd7cfb3ff4d20
[ 131.312424] x26: ffffd7cfb290a1f0 x25: ffffd7cfb290a1f0 x24: ffffd7cfb290b638
[ 131.312429] x23: ffff8000a188b2b0 x22: 0000000000000000 x21: 0000000000000001
[ 131.312434] x20: ffff8000a188b2b0 x19: ffffd7cfb290b638 x18: ffffffffffffffff
[ 131.312439] x17: 0000000000000000 x16: ffffd7cfe7ec69b8 x15: ffff8000a188b130
[ 131.312444] x14: ffff8000a188b2c8 x13: 2e676e6979727465 x12: 72203a6465747075
[ 131.312449] x11: ffff00012ebf5c00 x10: 253de0d6288ba31b x9 : 3a03ed6e6f00448e
[ 131.312454] x8 : ffff8000a188b2c8 x7 : 747075727265746e x6 : 692074696177205d
[ 131.312459] x5 : 0000000000000000 x4 : ffff00012ebf5b40 x3 : ffff8000a188b2c8
[ 131.312464] x2 : ffff8000a188b2c8 x1 : ffff8000a188b2c8 x0 : fffffffffffffe00
[ 131.312469] Call trace:
[ 131.312470] prepare_to_wait_event+0x74/0x17c
[ 131.312477] dce_wait_cond_wait_interruptible+0x180/0x228 [tegra_dce]
[ 131.312491] dce_client_ipc_wait+0xd4/0x188 [tegra_dce]
[ 131.312505] dce_ipc_send_message_sync+0x90/0x288 [tegra_dce]
[ 131.312519] tegra_dce_client_ipc_send_recv+0x94/0x1d0 [tegra_dce]
[ 131.312532] nv_tegra_dce_client_ipc_send_recv+0x38/0x64 [nvidia]
[ 131.313320] dceclientSendRpc_IMPL+0x64/0xe0 [nvidia]
[ 131.314050] _dceRpcIssueAndWait.isra.0+0x80/0x100 [nvidia]
[ 131.314762] rpcRmApiControl_dce+0xc8/0x1b0 [nvidia]
[ 131.315471] rmresControl_Prologue_IMPL+0xb4/0x1c0 [nvidia]
[ 131.316193] resControl_IMPL+0xec/0x1d0 [nvidia]
[ 131.316920] serverControl+0x3b8/0x4a0 [nvidia]
[ 131.317632] _rmapiRmControl+0x474/0x6a0 [nvidia]
[ 131.318329] rmapiControlWithSecInfo+0xa8/0x150 [nvidia]
[ 131.319020] rmapiControlWithSecInfoTls+0x74/0xe0 [nvidia]
[ 131.319708] _nv04ControlWithSecInfo.constprop.0+0x80/0xa0 [nvidia]
[ 131.320395] Nv04ControlKernel+0x50/0x60 [nvidia]
[ 131.321081] nvkms_call_rm+0x58/0x94 [nvidia_modeset]
[ 131.321222] nvRmApiControl+0x50/0x70 [nvidia_modeset]
[ 131.321358] __arm64_sys_ioctl+0xac/0xf0
[ 131.321368] invoke_syscall+0x48/0x114
[ 131.321376] el0_svc_common.constprop.0+0x40/0xe0
[ 131.321384] do_el0_svc+0x1c/0x28
[ 131.321393] el0_svc+0x30/0xa8
[ 131.321402] el0t_64_sync_handler+0x120/0x12c
[ 131.321411] el0t_64_sync+0x194/0x198

The stall can be reproduced on the devkit.
The SW is JP7 GA.
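
To make a trace like the one above easier to compare across runs, here is a minimal Python sketch that pulls out the stall count, the stalled task, and how many call-trace frames belong to each kernel module. The regular expressions are assumptions based only on the lines pasted in this thread, and `summarize` is a hypothetical helper. Converting jiffies to seconds needs the kernel's CONFIG_HZ value, which is not stated here, so `hz` is an optional parameter rather than a built-in constant.

```python
import re
from collections import Counter

STALL_RE = re.compile(r"rcu: INFO: \w+ (self-detected stall|detected expedited stalls)")
JIFFIES_RE = re.compile(r"\(t=(\d+) jiffies")
TASK_RE = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (\S+)")
# Matches frames such as "symbol+0x74/0x17c" with an optional "[module]" suffix.
FRAME_RE = re.compile(r"\]\s+([\w.]+)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+(?: \[(\w+)\])?\s*$")


def summarize(dmesg_text, hz=None):
    """Summarize RCU stall reports found in dmesg text."""
    summary = {"stalls": 0, "jiffies": None, "seconds": None,
               "task": None, "modules": Counter()}
    in_trace = False
    for line in dmesg_text.splitlines():
        if STALL_RE.search(line):
            summary["stalls"] += 1
        match = JIFFIES_RE.search(line)
        if match:
            summary["jiffies"] = int(match.group(1))
            if hz:  # hz must come from your kernel config (CONFIG_HZ)
                summary["seconds"] = summary["jiffies"] / hz
        match = TASK_RE.search(line)
        if match:
            summary["task"] = {"cpu": int(match.group(1)),
                               "pid": int(match.group(2)),
                               "comm": match.group(3)}
        if "Call trace:" in line:
            in_trace = True
            continue
        if in_trace:
            match = FRAME_RE.search(line)
            if match:
                # Frames without a "[module]" suffix are built into the kernel.
                summary["modules"][match.group(2) or "kernel"] += 1
            else:
                in_trace = False
    return summary
```

Feeding it the saved output of dmesg, for example `summarize(open("dmesg.txt").read())`, shows at a glance which task stalled and which modules appear in the call trace.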

Were you running any application at that moment?

Please also share /var/log/Xorg.0.log and the output of nvidia-smi from your board for further checking.

I wasn't running any application at that moment.
When Xorg goes to 100% CPU, the UI does not respond and I have to reboot the board.

What does the following log mean?
"GPC ph => logic map: 0=>0 1=>2 2=>1
��INFO: END TASK:MB��
INFO: enter idle task.
INFO: END TASK:MB��
��ph =��ter idle ta��GPC ��sk.
"
When it shows up on the serial console, the UI is stuck.

Those logs are output from BPMP.
I think they are harmless: I see them on my setup too, but I don't hit the system hang when they appear.

It seems there’s no result when you run nvidia-smi.
I can get the output from nvidia-smi as follows.

$ nvidia-smi
Tue Sep 23 08:21:11 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.00                 Driver Version: 580.00         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA Thor                    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   N/A  N/A             N/A  /  N/A  | Not Supported          |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2081      G   /usr/lib/xorg/Xorg                        0MiB |
|    0   N/A  N/A            2255      G   /usr/bin/gnome-shell                      0MiB |
+-----------------------------------------------------------------------------------------+

Did you run sudo ./apply_binaries.sh --openrm before flashing?
