Orin AGX Bug: Workqueue Lockup When Using GPU

pierce · November 4, 2025, 6:54pm

Hello!

We are running into an issue where we get a Kernel Lockup intermittently on an Orin AGX (example log below).

We have seen this issue using both L4T 35.3.1 and L4T 35.6.2.
Increasing vm.min_free_kbytes seems to reduce how often the lockup occurs. Decreasing it to 50 MB makes the issue happen fairly reliably after ~20 minutes. Increasing it to 8 GB or 16 GB seems to reduce the occurrence to once every few hours, but does not eliminate it.
This only occurs when using applications that utilize the GPU.
We have not been able to get reliable reproduction steps or an example exhibiting the problem unfortunately
There are a few similar threads on the forums here, but none with a clear resolution.
When the lockup occurs, the system becomes unusable and requires a power cycle to regain functionality.
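
To put numbers on the vm.min_free_kbytes observation, the setting can be logged alongside free memory until the lockup hits. The following is a minimal Python sketch, assuming only the standard Linux files /proc/meminfo and /proc/sys/vm/min_free_kbytes; the function names are hypothetical and not part of any NVIDIA tooling.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def sample():
    """Return one snapshot of the watermark setting and current free memory."""
    meminfo = parse_meminfo(Path("/proc/meminfo").read_text())
    min_free = int(Path("/proc/sys/vm/min_free_kbytes").read_text())
    return {
        "min_free_kbytes": min_free,
        "MemFree": meminfo.get("MemFree"),
        "MemAvailable": meminfo.get("MemAvailable"),
    }
```

Calling sample() every few seconds and appending the result to a file would show how close MemFree was to the configured watermark just before the last record was written.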

Any help or insight would be appreciated!

[   34.655758] Adding 2678708k swap on /dev/zram11.  Priority:5 extents:1 across:2678708k SS
[  523.231403] BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 31s!
[  523.231403] BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 31s!
[  523.231685] Showing busy workqueues and worker pools:
[  523.231691] workqueue events: flags=0x0
[  523.231697]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=3/256 refcnt=4
[  523.231709]     pending: vmpressure_work_fn, free_work, kfree_rcu_monitor
[  523.231745] workqueue rcu_gp: flags=0x8
[  523.231748]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  523.231753]     pending: wait_rcu_exp_gp
[  523.231760] workqueue mm_percpu_wq: flags=0x8
[  523.231763]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=3/256 refcnt=6
[  523.231768]     pending: drain_local_pages_wq BAR(2626), vmstat_update, lru_add_drain_per_cpu BAR(85)
[  523.231785] workqueue pm: flags=0x4
[  523.231787]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  523.231792]     in-flight: 25:pm_runtime_work
[  523.231799] workqueue cgroup_destroy: flags=0x0
[  523.231801]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/1 refcnt=2
[  523.231806]     pending: css_release_work_fn
[  523.231853] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=31s workers=3 idle: 3908 431

pierce · November 4, 2025, 11:06pm

This seems similar to another report that also shows “pm_runtime_work” as an in-flight workqueue item: Agx orin(jetpack5.1.2) report errors "BUG: workqueue lockup" - #8 by 1712127445

Does anyone know what pm_runtime_work is?

1712127445 · November 5, 2025, 1:48am

This problem is still unsolved.

AastaLLL · November 5, 2025, 3:35am

Hi,

There is no conclusion on the topic you mentioned since we don’t have a way to reproduce this.

pierce: "We have not been able to get reliable reproduction steps or an example exhibiting the problem unfortunately"

Could you help us find a way to reproduce this? Or is this only reproducible with some internal source?

Thanks.

pierce · November 5, 2025, 4:05pm

Thanks for checking in @AastaLLL

Unfortunately, we have only been able to reproduce this when running our own internal source, so we don’t have a simple reproduction step to provide.

We will try turning on the debug logging you mentioned in another thread and see if we can capture a kernel log: echo 0x20 > /sys/kernel/debug/gpu.0/log_mask
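
Since the system needs a power cycle after the lockup, the last kernel messages are easy to lose unless they are synced to disk as they arrive. The following is a minimal Python sketch of one way to capture them, assuming the standard /dev/kmsg record format ("priority,sequence,timestamp_us,flags;message") and root access. The function names and output path are hypothetical, and the script does not set log_mask itself.

```python
import os


def parse_kmsg_record(record):
    """Split one /dev/kmsg record into its header fields and message."""
    header, _, message = record.partition(";")
    fields = header.split(",")
    priority = int(fields[0])
    return {
        "level": priority & 7,        # syslog severity
        "facility": priority >> 3,    # syslog facility
        "seq": int(fields[1]),
        "seconds": int(fields[2]) / 1_000_000,  # timestamp is in microseconds
        "message": message.rstrip("\n"),
    }


def capture(out_path, kmsg_path="/dev/kmsg"):
    """Append kernel records to out_path, syncing each one to disk."""
    with open(kmsg_path, "r", errors="replace") as kmsg, open(out_path, "a") as out:
        for record in kmsg:
            if record.startswith(" "):
                # Continuation lines carry key=value metadata; skip them.
                continue
            rec = parse_kmsg_record(record)
            out.write(f"[{rec['seconds']:12.6f}] {rec['message']}\n")
            out.flush()
            os.fsync(out.fileno())  # force the line to storage before the next read
```

Running capture("/var/log/lockup_kern.log") as root in the background before starting the GPU application would leave a file whose last lines are the final messages written before the hang. Syncing every record costs some I/O, which may matter with verbose GPU logging enabled.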

Interestingly, we don’t ever see this problem on an Orin NX running the same L4T and same source code.

Thanks again,
-Pierce

pierce · November 5, 2025, 7:35pm

Hi All (and especially @AastaLLL)

Here is a captured kernlog with the extra GPU debugging enabled:
captured_kern.log (8.1 MB)

Hopefully that helps yield some clues.

AastaLLL · November 6, 2025, 9:03am

Hi,

Thanks for the update.
We will check it for clues.

Are you able to share some details about the use case?
For example, what kind of CUDA kernel is in your code?
Is this a multi-threading or multi-process scenario?

Thanks.

pierce · November 6, 2025, 10:09pm

Hi @AastaLLL
We are currently using CUDA 11.4.
Yes, this is a multi-threaded application, but only one application is using the GPU.

Thanks again!
-Pierce

AastaLLL · November 10, 2025, 7:18am

Hi,

Is r36 an option for you?
If yes, could you check whether the same issue also occurs on JetPack 6?

Thanks.

pierce · November 10, 2025, 4:44pm

Hi @AastaLLL

Unfortunately, upgrading to L4T 36 is not an easy option for us because the base OS changes from Ubuntu 20 to Ubuntu 22.

Is there a specific fix you are thinking about that is in L4T 36?

Thanks

AastaLLL · November 12, 2025, 7:57am

Hi,

In the forum topic you shared, the user didn’t share a reproducible app either.

As this is not a known issue, we need to reproduce locally and gather more information before providing further suggestions.

Thanks.

AastaLLL · December 4, 2025, 4:50am

Hi,

Have you found a way to reproduce this that you can share with us?
We need to reproduce this locally to gather more information for the lockup issue.

Thanks.
