I ran into the following errors while running inference on the GPU:
] nvgpu: 57000000.gpu __nvgpu_timeout_expired_msg_cpu:94 [ERR] Timeout detected @ gk20a_fifo_is_preempt_pending+0x68/0x128 [nvgpu]
Nov 23 00:47:47 prasselmagicbox kernel: [693378.508823] nvgpu: 57000000.gpu gk20a_fifo_is_preempt_pending:3006 [ERR] preempt timeout: id: 4 id_type: 1
Nov 23 00:47:47 prasselmagicbox kernel: [693378.519097] nvgpu: 57000000.gpu gk20a_fifo_preempt_tsg:3140 [ERR] preempt timed out for tsgid: 4, ctxsw timeout will trigger recovery if needed
Nov 23 00:47:47 prasselmagicbox kernel: [693378.533115] nvgpu: 57000000.gpu gk20a_tsg_unbind_channel:169 [ERR] Channel 507 unbind failed, tearing down TSG 4
Nov 23 00:47:47 prasselmagicbox kernel: [693378.644945] nvgpu: 57000000.gpu __nvgpu_timeout_expired_msg_cpu:94 [ERR] Timeout detected @ gk20a_fifo_is_preempt_pending+0x68/0x128 [nvgpu]
Nov 23 00:47:47 prasselmagicbox kernel: [693378.658149] nvgpu: 57000000.gpu gk20a_fifo_is_preempt_pending:3006 [ERR] preempt timeout: id: 4 id_type: 1
Nov 23 00:47:47 prasselmagicbox kernel: [693378.668421] nvgpu: 57000000.gpu gk20a_fifo_preempt_tsg:3140 [ERR] preempt timed out for tsgid: 4, ctxsw timeout will trigger recovery if needed
Nov 23 00:47:50 prasselmagicbox kernel: [693381.682952] nvgpu: 57000000.gpu __nvgpu_timeout_expired_msg_cpu:94 [ERR] Timeout detected @ gk20a_fifo_runlist_wait_pending+0x68/0x128 [nvgpu]
Nov 23 00:47:50 prasselmagicbox kernel: [693381.696353] nvgpu: 57000000.gpu gk20a_fifo_runlist_wait_pending:3409 [ERR] runlist wait timeout: runlist id: 0
Nov 23 00:47:50 prasselmagicbox kernel: [693381.706833] nvgpu: 57000000.gpu gk20a_fifo_update_runlist_locked:3705 [ERR] runlist 0 update timeout
Nov 23 00:47:50 prasselmagicbox kernel: [693381.816800] nvgpu: 57000000.gpu __nvgpu_timeout_expired_msg_cpu:94 [ERR] Timeout detected @ gk20a_fifo_is_preempt_pending+0x68/0x128 [nvgpu]
Nov 23 00:47:50 prasselmagicbox kernel: [693381.830013] nvgpu: 57000000.gpu gk20a_fifo_is_preempt_pending:3006 [ERR] preempt timeout: id: 4 id_type: 1
Nov 23 00:47:50 prasselmagicbox kernel: [693381.840284] nvgpu: 57000000.gpu gk20a_fifo_preempt_tsg:3140 [ERR] preempt timed out for tsgid: 4, ctxsw timeout will trigger recovery if needed
Nov 23 00:47:50 prasselmagicbox kernel: [693381.854276] nvgpu: 57000000.gpu gk20a_tsg_unbind_channel:169 [ERR] Channel 503 unbind failed, tearing down TSG 4
Nov 23 00:47:51 prasselmagicbox kernel: [693381.966136] nvgpu: 57000000.gpu __nvgpu_timeout_expired_msg_cpu:94 [ERR] Timeout detected @ gk20a_fifo_is_preempt_pending+0x68/0x128 [nvgpu]
Nov 23 00:47:51 prasselmagicbox kernel: [693381.979348] nvgpu: 57000000.gpu gk20a_fifo_is_preempt_pending:3006 [ERR] preempt timeout: id: 4 id_type: 1
Nov 23 00:47:51 prasselmagicbox kernel: [693381.989619] nvgpu: 57000000.gpu gk20a_fifo_preempt_tsg:3140 [ERR] preempt timed out for tsgid: 4, ctxsw timeout will trigger recovery if needed
Nov 23 00:47:54 prasselmagicbox kernel: [693385.004048] nvgpu: 57000000.gpu __nvgpu_timeout_expired_msg_cpu:94 [ERR] Timeout detected @ gk20a_fifo_runlist_wait_pending+0x68/0x128 [nvgpu]
Nov 23 00:47:54 prasselmagicbox kernel: [693385.017444] nvgpu: 57000000.gpu gk20a_fifo_runlist_wait_pending:3409 [ERR] runlist wait timeout: runlist id: 0
Nov 23 00:47:54 prasselmagicbox kernel: [693385.027802] nvgpu: 57000000.gpu gk20a_fifo_update_runlist_locked:3705 [ERR] runlist 0 update timeout
Nov 23 00:47:54 prasselmagicbox kernel: [693385.137879] nvgpu: 57000000.gpu __nvgpu_timeout_expired_msg_cpu:94 [ERR] Timeout detected @ gk20a_fifo_is_preempt_pending+0x68/0x128 [nvgpu]
Nov 23 00:47:54 prasselmagicbox kernel: [693385.151126] nvgpu: 57000000.gpu gk20a_fifo_is_preempt_pending:3006 [ERR] preempt timeout: id: 4 id_type: 1
Nov 23 00:47:54 prasselmagicbox kernel: [693385.161422] nvgpu: 57000000.gpu gk20a_fifo_preempt_tsg:3140 [ERR] preempt timed out for tsgid: 4, ctxsw timeout will trigger recovery if needed
Nov 23 00:47:54 prasselmagicbox kernel: [693385.175415] nvgpu: 57000000.gpu gk20a_tsg_unbind_channel:169 [ERR] Channel 504 unbind failed, tearing down TSG 4
Nov 23 00:47:54 prasselmagicbox kernel: [693385.286931] nvgpu: 57000000.gpu __nvgpu_timeout_expired_msg_cpu:94 [ERR] Timeout detected @ gk20a_fifo_is_preempt_pending+0x68/0x128 [nvgpu]
Nov 23 00:47:54 prasselmagicbox kernel: [693385.300141] nvgpu: 57000000.gpu gk20a_fifo_is_preempt_pending:3006 [ERR] preempt timeout: id: 4 id_type: 1
Nov 23 00:47:54 prasselmagicbox kernel: [693385.310404] nvgpu: 57000000.gpu gk20a_fifo_preempt_tsg:3140 [ERR] preempt timed out for tsgid: 4, ctxsw timeout will trigger recovery if needed
Nov 23 00:47:57 prasselmagicbox kernel: [693388.324612] nvgpu: 57000000.gpu __nvgpu_timeout_expired_msg_cpu:94 [ERR] Timeout detected @ gk20a_fifo_runlist_wait_pending+0x68/0x128 [nvgpu]
Nov 23 00:47:57 prasselmagicbox kernel: [693388.338018] nvgpu: 57000000.gpu gk20a_fifo_runlist_wait_pending:3409 [ERR] runlist wait timeout: runlist id: 0
Nov 23 00:47:57 prasselmagicbox kernel: [693388.348372] nvgpu: 57000000.gpu gk20a_fifo_update_runlist_locked:3705 [ERR] runlist 0 update timeout
Nov 23 00:47:57 prasselmagicbox kernel: [693388.458426] nvgpu: 57000000.gpu __nvgpu_timeout_expired_msg_cpu:94 [ERR] Timeout detected @ gk20a_fifo_is_preempt_pending+0x68/0x128 [nvgpu]
Nov 23 00:47:57 prasselmagicbox kernel: [693388.471637] nvgpu: 57000000.gpu gk20a_fifo_is_preempt_pending:3006 [ERR] preempt timeout: id: 4 id_type: 1
Nov 23 00:47:57 prasselmagicbox kernel: [693388.481930] nvgpu: 57000000.gpu gk20a_fifo_preempt_tsg:3140 [ERR] preempt timed out for tsgid: 4, ctxsw timeout will trigger recovery if needed
Nov 23 00:47:57 prasselmagicbox kernel: [693388.495921] nvgpu: 57000000.gpu gk20a_tsg_unbind_channel:169 [ERR] Channel 506 unbind failed, tearing down TSG 4
Nov 23 00:47:57 prasselmagicbox kernel: [693388.607715] nvgpu: 57000000.gpu __nvgpu_timeout_expired_msg_cpu:94 [ERR] Timeout detected @ gk20a_fifo_is_preempt_pending+0x68
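The log above repeats the same few messages, so it can help to tally the `[ERR]` lines per reporting function. This is only a minimal sketch: the regular expression is my own guess at the line layout based on the lines pasted here, and `summarize_nvgpu_errors` is a hypothetical helper name, not part of any NVIDIA tool.

```python
import re
from collections import Counter

# Assumed layout, taken from the lines in this post:
#   "... nvgpu: 57000000.gpu <function>:<line> [ERR] <message>"
NVGPU_ERR = re.compile(r"nvgpu: \S+ +(\w+):(\d+) +\[ERR\] (.*)")


def summarize_nvgpu_errors(lines):
    """Count nvgpu [ERR] lines per reporting function."""
    counts = Counter()
    for line in lines:
        match = NVGPU_ERR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Two lines copied from the log above, used as sample input.
sample = [
    "Nov 23 00:47:47 prasselmagicbox kernel: [693378.508823] nvgpu: 57000000.gpu "
    "gk20a_fifo_is_preempt_pending:3006 [ERR] preempt timeout: id: 4 id_type: 1",
    "Nov 23 00:47:47 prasselmagicbox kernel: [693378.519097] nvgpu: 57000000.gpu "
    "gk20a_fifo_preempt_tsg:3140 [ERR] preempt timed out for tsgid: 4, "
    "ctxsw timeout will trigger recovery if needed",
]

for func, n in summarize_nvgpu_errors(sample).most_common():
    print(f"{n:4d}  {func}")
```

To run it on a full log, pass an open file (or any iterable of lines) to `summarize_nvgpu_errors` instead of `sample`.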
CPU issue: what is this process, irq/79-gk20a_s?
Nov 27 10:28:17 prasselmagicbox kernel: [ 286.607894] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:18 prasselmagicbox kernel: [ 287.631862] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:19 prasselmagicbox kernel: [ 288.659846] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:20 prasselmagicbox kernel: [ 289.679855] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:21 prasselmagicbox kernel: [ 290.707802] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:22 prasselmagicbox kernel: [ 291.727781] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:23 prasselmagicbox kernel: [ 292.751782] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:24 prasselmagicbox kernel: [ 293.775762] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:25 prasselmagicbox kernel: [ 294.799717] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:26 prasselmagicbox kernel: [ 295.823715] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:27 prasselmagicbox kernel: [ 296.847682] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:28 prasselmagicbox kernel: [ 297.871658] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:29 prasselmagicbox kernel: [ 298.895652] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:30 prasselmagicbox kernel: [ 299.919620] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:31 prasselmagicbox kernel: [ 300.943603] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:32 prasselmagicbox kernel: [ 301.967609] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:33 prasselmagicbox kernel: [ 302.991588] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Nov 27 10:28:34 prasselmagicbox kernel: [ 304.015742] tegra_soctherm 700e2000.soctherm: soctherm: trip temperature 2147483647 forced to 127000
Board information: Jetson Nano 4GB

