Libnvrm_gpu.so: NvRmGpuLibOpen failed, error=4

Hi,

I am working on a custom system using an AGX Orin 64G SOM, with custom software based on Yocto Kirkstone.

I think I have a problem with nvidia-container-runtime; for example, nvidia-container-cli info doesn’t print the GPU information:

root@ubuntu:~# nvidia-container-cli info
libnvrm_gpu.so: NvRmGpuLibOpen failed, error=4
nvidia-container-cli: initialization error: cuda error: unknown error

My Docker runtime configuration:

root@change-me:~# cat /etc/docker/daemon.json
{
  "bip": "240.10.0.1/24", 
  "fixed-cidr": "240.10.0.0/24" ,
  "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
    }},
  "default-runtime": "nvidia"
}

I have checked if the drivers are loaded:

root@ubuntu:~# lsmod | grep nvgpu
nvgpu                2662400  0
nvmap                 229376  2 nvgpu

Checked if there are GPU devices:

root@ubuntu:~# ls -l /dev/nv*
crw-rw---- 1 root video 506,   0 Apr 28  2022 /dev/nvhost-ctrl
crw-rw---- 1 root video 506,  58 Apr 28  2022 /dev/nvhost-ctrl-isp
crw-rw---- 1 root video 506,  22 Apr 28  2022 /dev/nvhost-ctrl-nvdec
crw-rw---- 1 root video 506,  50 Apr 28  2022 /dev/nvhost-ctrl-nvdla0
crw-rw---- 1 root video 506,  54 Apr 28  2022 /dev/nvhost-ctrl-nvdla1
crw-rw---- 1 root video 506,  46 Apr 28  2022 /dev/nvhost-ctrl-pva0
crw-rw---- 1 root video 506,  57 Apr 28  2022 /dev/nvhost-isp
crw-rw---- 1 root video 506,  41 Apr 28  2022 /dev/nvhost-isp-thi
crw-rw---- 1 root video 506,  13 Apr 28  2022 /dev/nvhost-msenc
crw-rw---- 1 root video 506,  29 Apr 28  2022 /dev/nvhost-nvcsi
crw-rw---- 1 root video 506,  21 Apr 28  2022 /dev/nvhost-nvdec
crw-rw---- 1 root video 506,  49 Apr 28  2022 /dev/nvhost-nvdla0
crw-rw---- 1 root video 506,  53 Apr 28  2022 /dev/nvhost-nvdla1
crw-rw---- 1 root video 506,   5 Apr 28  2022 /dev/nvhost-nvjpg
crw-rw---- 1 root video 506,   9 Apr 28  2022 /dev/nvhost-nvjpg1
crw-rw---- 1 root video 506,  17 Apr 28  2022 /dev/nvhost-ofa
crw-rw---- 1 root video 505,   0 Apr 28  2022 /dev/nvhost-power-gpu
crw-rw---- 1 root video 506,  45 Apr 28  2022 /dev/nvhost-pva0
crw-rw---- 1 root video 506,  25 Apr 28  2022 /dev/nvhost-tsec
crw-rw---- 1 root video 506,  61 Apr 28  2022 /dev/nvhost-vi0
crw-rw---- 1 root video 506,  33 Apr 28  2022 /dev/nvhost-vi0-thi
crw-rw---- 1 root video 506,  65 Apr 28  2022 /dev/nvhost-vi1
crw-rw---- 1 root video 506,  37 Apr 28  2022 /dev/nvhost-vi1-thi
crw-rw---- 1 root video 506,   1 Apr 28  2022 /dev/nvhost-vic
crw-rw-rw- 1 root root  195, 254 Apr 28  2022 /dev/nvidia-modeset
crw-rw-rw- 1 root root  195,   0 Apr 28  2022 /dev/nvidia0
crw-rw-rw- 1 root root  195, 255 Apr 28  2022 /dev/nvidiactl
crw-rw---- 1 root video  10,  55 Apr 28  2022 /dev/nvmap
crw------- 1 root root  234,   0 Apr 28  2022 /dev/nvme0
brw-rw---- 1 root disk  259,   0 Apr 28  2022 /dev/nvme0n1
crw-rw-rw- 1 root root  235,   0 Apr 28  2022 /dev/nvsciipc

/dev/nvgpu:
total 0
drwxr-xr-x 2 root root 60 Apr 28  2022 igpu0

Verified that the library exists:

root@change-me:/usr/lib# ls -l /usr/lib/libnvrm_gpu.so
-rw-r--r-- 1 root root 379584 Mar  9  2018 /usr/lib/libnvrm_gpu.so

I am not sure how to track down the root cause of the problem.

*** Please note that this reply is generated automatically by an LLM ***

To address the issue you’re experiencing with nvidia-container-runtime not printing GPU information on your custom system (Jetson AGX Orin 64G SOM with Yocto Kirkstone), let’s break down the potential steps and considerations to help you track down the root of the problem.

1. Verify NVIDIA Drivers and GPU Devices

First, ensure that the NVIDIA drivers are correctly installed and loaded. You can use the nvidia-smi command to check if the GPU is recognized by the system. If nvidia-smi doesn’t show your GPU or doesn’t work, it could indicate a problem with the driver installation.

  • Command: nvidia-smi
  • Expected Outcome: Detailed information about your GPU, including its model, memory usage, and temperature.

If the drivers seem to be installed but nvidia-smi doesn’t work, you might need to reinstall the drivers or check for any specific instructions related to the Jetson AGX Orin and Yocto Kirkstone.

2. Check for GPU Devices

Ensure that the system recognizes the GPU device. You can do this by checking the output of lspci or looking for GPU-related devices in /dev.

  • Command: lspci | grep -i nvidia
  • Expected Outcome: A list of NVIDIA devices recognized by the system.

3. Library Existence and Permissions

Verify that the necessary libraries for GPU support are present and have the correct permissions. This includes libraries like libnvidia-ml.so for NVIDIA Management Library (NVML) support.

  • Command: ldconfig -p | grep libnvidia-ml
  • Expected Outcome: The path to libnvidia-ml.so indicating it’s available and linked.
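
Note that on Jetson devices with an integrated GPU, NVML (libnvidia-ml) is usually not shipped at all, so a broader check for the Tegra GPU libraries may be more telling. A hedged sketch (exact library locations vary between stock L4T and Yocto/meta-tegra builds):

# look for the Tegra CUDA/GPU libraries instead of NVML
ldconfig -p | grep -E 'libcuda|libnvrm'
ls -l /usr/lib/libnvrm_gpu.so /usr/lib/aarch64-linux-gnu/tegra/ 2>/dev/null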

4. Docker Runtime and NVIDIA Container Runtime

Given that you’re using Docker, ensure that the NVIDIA Container Runtime is properly installed and configured. This runtime provides the necessary support for GPU acceleration within Docker containers.

  • Check Installation: Verify that nvidia-container-runtime is installed and up-to-date.
  • Configure Docker: Ensure Docker is configured to use the NVIDIA runtime. You can do this by editing the Docker daemon configuration file (/etc/docker/daemon.json) to include the NVIDIA runtime.

Example /etc/docker/daemon.json:

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
  • Restart Docker: After configuring, restart the Docker service to apply the changes.
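
For example, on a systemd-based system:

sudo systemctl restart docker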

5. Testing with a Simple Container

Try running a simple Docker container that uses the GPU to verify if the issue is specific to your custom software or a more general problem with GPU support in Docker.

  • Command: docker run --gpus all nvidia/cuda:11.0-base-ubuntu20.04 nvidia-smi
  • Expected Outcome: The output of nvidia-smi from within the container, indicating GPU support is working.

6. Custom Software and Yocto Kirkstone

If the above steps indicate that GPU support is working outside of your custom software, the issue might be specific to your application or its configuration within Yocto Kirkstone.

  • Review Application Logs: Look for any error messages related to GPU initialization or access.
  • Check Dependencies: Ensure all necessary dependencies for GPU support are included in your Yocto build.
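
For example, a quick check on the running target, assuming the image ships opkg as its package manager (this may differ in your Yocto build):

# list installed packages related to the GPU/container stack
opkg list-installed | grep -iE 'nvidia|cuda|container'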

Conclusion

By systematically checking each component involved in providing GPU support to your custom application on the Jetson AGX Orin with Yocto Kirkstone, you should be able to identify and potentially resolve the issue preventing nvidia-container-runtime from printing GPU information. If problems persist, consider seeking support from NVIDIA’s developer forums or Yocto community resources for more tailored advice.

*** Please give a thumbs up if you like the answer, or provide feedback to help us improve it. ***

The file /etc/docker/daemon.json already includes the NVIDIA runtime (see the first post).

Check for GPU Devices

root@ubuntu:~# lspci | grep -i nvidia
0004:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)
0005:00:00.0 PCI bridge: NVIDIA Corporation Device 229a (rev a1)

The command ldconfig -p | grep libnvidia-ml doesn’t produce any output, but I have another system that uses an AGX Orin where this command doesn’t produce any output either, and there nvidia-container-cli info does provide an output.

root@ubuntu:~# ldconfig -p | grep libnvidia-ml
root@ubuntu:~# 

Hi, I have the same error (libnvrm_gpu.so: NvRmGpuLibOpen failed, error=4) running this container: nvcr.io/nvidia/cuda:11.4.2-runtime-ubuntu20.04

root@change-me:~# docker run --rm --runtime=nvidia nvcr.io/nvidia/cuda:11.4.2-runtime-ubuntu20.04
libnvrm_gpu.so: NvRmGpuLibOpen failed, error=4

==========
== CUDA ==
==========

CUDA Version 11.4.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

*************************
** DEPRECATION NOTICE! **
*************************
THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
    https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md

Hi,

nvcr.io/nvidia/cuda:11.4.2-runtime-ubuntu20.04 doesn’t support iGPU, so it won’t work.
Usually, you will see the igpu tag if a container can work on Jetson.

The nvidia-container-cli is no longer supported on Jetsons that use iGPUs.
So the error is expected.

To check GPU functionality, you can use the command below instead:

$ sudo docker run --rm --runtime nvidia nvcr.io/nvidia/l4t-cuda:12.6.11-runtime nvidia-smi

==========
== CUDA ==
==========

CUDA Version 12.6.11

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Wed Jul 23 03:45:22 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 540.4.0                Driver Version: 540.4.0      CUDA Version: 12.6     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Orin (nvgpu)                  N/A  | N/A              N/A |                  N/A |
| N/A   N/A  N/A               N/A /  N/A | Not Supported        |     N/A          N/A |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Thanks.

Hi, thanks for the information

This is my output using nvcr.io/nvidia/l4t-cuda:12.6.11-runtime nvidia-smi:

root@ubuntu:~# sudo docker run --rm --runtime nvidia nvcr.io/nvidia/l4t-cuda:12.6.11-runtime nvidia-smi
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: /usr/bin/nvidia-container-runtime did not terminate successfully: exit status 1: unknown.

Try this. I don’t remember exactly what in it fixed a similar error for me, but it just worked on JetPack 6.2.1:

docker run -it --net=host --runtime nvidia --privileged --ipc=host --ulimit \
memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/l4t-cuda:12.6.11-runtime \
nvidia-smi

Hi,

I just changed the CUDA version, since we have version 11.4.19-1-r0.

Awesome, these parameters make the error go away:

root@ubuntu:~# docker run -it --net=host --runtime nvidia --privileged --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/l4t-cuda:11.4.19-runtime nvidia-smi

==========
== CUDA ==
==========

CUDA Version 11.4.19

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: nvidia-smi: not found

I suppose nvidia-smi doesn’t exist in this container

I need to figure out why these parameters avoid the error (libnvrm_gpu.so: NvRmGpuLibOpen failed, error=4), because without them it still appears:

root@ubuntu:~# sudo docker run --rm --runtime nvidia nvcr.io/nvidia/l4t-cuda:11.4.19-runtime nvidia-smi
libnvrm_gpu.so: NvRmGpuLibOpen failed, error=4

==========
== CUDA ==
==========

CUDA Version 11.4.19

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: nvidia-smi: not found
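
One way to narrow down which of those options actually makes the difference would be to add them back one at a time (just a sketch, reusing the image tag from above) and check whether the libnvrm_gpu.so error still appears in the startup banner:

# try each option separately on top of the minimal failing command
sudo docker run --rm --runtime nvidia --privileged nvcr.io/nvidia/l4t-cuda:11.4.19-runtime nvidia-smi
sudo docker run --rm --runtime nvidia --ipc=host nvcr.io/nvidia/l4t-cuda:11.4.19-runtime nvidia-smi
sudo docker run --rm --runtime nvidia --ulimit memlock=-1 nvcr.io/nvidia/l4t-cuda:11.4.19-runtime nvidia-smi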

When I run this command:

docker run -it --net=host --runtime nvidia --privileged --ipc=host \
--ulimit memlock=-1 --ulimit stack=67108864 \
nvcr.io/nvidia/l4t-cuda:11.4.19-runtime nvidia-smi

I see this in the dmesg log:

[207939.958665] nvgpu: 17000000.ga10b engine_fb_queue_set_element_use_state:144  [ERR]  FBQ last received queue element not processed yet queue_pos 0
[207939.958672] nvgpu: 17000000.ga10b        nvgpu_engine_fb_queue_push:373  [ERR]  fb-queue element in use map is in invalid state
[207939.958674] nvgpu: 17000000.ga10b        nvgpu_engine_fb_queue_push:401  [ERR]  falcon id-0, queue id-1, failed
[207939.958677] nvgpu: 17000000.ga10b                     pmu_write_cmd:174  [ERR]  fail to write cmd to queue 1
[207939.958678] nvgpu: 17000000.ga10b             nvgpu_pmu_rpc_execute:713  [ERR]  Failed to execute RPC status=0xffffffea, func=0x3
[207939.958680] nvgpu: 17000000.ga10b gv100_pmu_lsfm_bootstrap_ls_falcon:95   [ERR]  Failed to execute RPC, status=0xffffffea
[207939.958682] nvgpu: 17000000.ga10b nvgpu_pmu_lsfm_bootstrap_ls_falcon:128  [ERR]  LSF Load failed
[207939.958684] nvgpu: 17000000.ga10b nvgpu_gr_falcon_load_secure_ctxsw_ucode:714  [ERR]  Unable to recover GR falcon
[207939.958686] nvgpu: 17000000.ga10b        nvgpu_gr_falcon_init_ctxsw:159  [ERR]  fail
[207939.958690] nvgpu: 17000000.ga10b nvgpu_cic_mon_report_err_safety_services:92   [ERR]  Error reporting is not supported in this platform
[207939.958692] nvgpu: 17000000.ga10b      gr_init_ctxsw_falcon_support:833  [ERR]  FECS context switch init error
[207939.958694] nvgpu: 17000000.ga10b            nvgpu_finalize_poweron:1010 [ERR]  Failed initialization for: g->ops.gr.gr_init_support
[207939.983170] nvgpu: 17000000.ga10b                 gk20a_power_write:127  [ERR]  power_node_write failed at busy

On my AGX Orin, with the container running bash and nvidia-smi executed on the command line in another terminal, I get:

sudo dmesg|grep -i gpu
[    9.104212] nvgpu: 17000000.gpu          nvgpu_nvhost_syncpt_init:122  [INFO]  syncpt_unit_base 60000000 syncpt_unit_size 4000000 size 10000
[   10.082265] thermal-trip-event gpu-throttle-alert: cooling device registered.
[   16.225593] nvgpu: 17000000.gpu                  gk20a_scale_init:541  [INFO]  enabled scaling for GP
[   19.252491] [drm] [nvidia-drm] [GPU ID 0x00020000] Loading driver

Hi,

We started supporting nvidia-smi in JetPack 6.
In the JetPack 5 environment, please try the deviceQuery CUDA app.

Thanks.

The deviceQuery app doesn’t seem to be available in this container:

root@ubuntu:/# find | grep deviceQuery
root@ubuntu:/# 

I copied the app from another board, but I think it was not a good idea; the app doesn’t seem to be compatible:

root@ubuntu:/# ./workspace/deviceQuery 
./workspace/deviceQuery: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./workspace/deviceQuery)
root@ubuntu:/# ldd --version          
ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31

Note: use git clone -b (your CUDA version).

I don’t think you need freeglut3, but when I ran cmake it complained about not finding it.

sudo apt install freeglut3 freeglut3-dev
git clone -b v11.4 https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples
mkdir build && cd build
cmake ..
cd Samples/1_Utilities/deviceQuery
make
./deviceQuery

Thanks whitesscott

This is the output

../../bin/aarch64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 100
-> no CUDA-capable device is detected
Result = FAIL

Any idea?

Here’s the NVIDIA container runtime troubleshooting guide.


Was the ‘fail’ error inside the Docker container or on the AGX Orin itself? The only thing I can think of is: are these env variables set?

LD_LIBRARY_PATH=/usr/local/cuda/lib64
CUDA_HOME=/usr/local/cuda
PATH=$PATH:/usr/local/cuda/bin
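
If it helps, those can also be passed straight into the container on the docker run line, for example (reusing the l4t-cuda image tag from earlier in this thread; adjust the paths to match what the image actually ships):

docker run -it --runtime nvidia \
    -e LD_LIBRARY_PATH=/usr/local/cuda/lib64 \
    -e CUDA_HOME=/usr/local/cuda \
    nvcr.io/nvidia/l4t-cuda:11.4.19-runtime bash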

Hi,

Yes, I ran the command from inside the container, and I have the same problem after defining the env variables.

I think the problem is related to the errors that I see when I run the container:

[496526.513436] nvgpu: 17000000.ga10b                nvgpu_pmu_cmd_post:591  [ERR]  FBQ cmd setup failed
[496526.513443] nvgpu: 17000000.ga10b             nvgpu_pmu_rpc_execute:713  [ERR]  Failed to execute RPC status=0xfffffff4, func=0x3
[496526.513446] nvgpu: 17000000.ga10b gv100_pmu_lsfm_bootstrap_ls_falcon:95   [ERR]  Failed to execute RPC, status=0xfffffff4
[496526.513447] nvgpu: 17000000.ga10b nvgpu_pmu_lsfm_bootstrap_ls_falcon:128  [ERR]  LSF Load failed
[496526.513450] nvgpu: 17000000.ga10b nvgpu_gr_falcon_load_secure_ctxsw_ucode:714  [ERR]  Unable to recover GR falcon
[496526.513451] nvgpu: 17000000.ga10b        nvgpu_gr_falcon_init_ctxsw:159  [ERR]  fail
[496526.513456] nvgpu: 17000000.ga10b nvgpu_cic_mon_report_err_safety_services:92   [ERR]  Error reporting is not supported in this platform
[496526.513458] nvgpu: 17000000.ga10b      gr_init_ctxsw_falcon_support:833  [ERR]  FECS context switch init error
[496526.513460] nvgpu: 17000000.ga10b            nvgpu_finalize_poweron:1010 [ERR]  Failed initialization for: g->ops.gr.gr_init_support
[496526.537987] nvgpu: 17000000.ga10b                 gk20a_power_write:127  [ERR]  power_node_write failed at busy

Outside of the container I see the following.

tegrastats reports -256C for the GPU:

07-27-2025 17:49:32 RAM 1120/62781MB (lfb 11654x4MB) CPU [10%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,4%@729,0%@729] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0,0] VIC_FREQ 729 APE 174 CV0@-256C [email protected] Tboard@30C [email protected] [email protected] [email protected] CV1@-256C GPU@-256C [email protected] [email protected] CV2@-256C VDD_GPU_SOC 1622mW/1622mW VDD_CPU_CV 464mW/464mW VIN_SYS_5V0 3477mW/3477mW VDDQ_VDD2_1V8AO 297mW/297mW

There is no GPU status:

root@ubuntu:~# sudo cat /sys/kernel/debug/gpu.0/status
cat: /sys/kernel/debug/gpu.0/status: Bad address

nvpmodel is also reporting problems with the GPU:

root@ubuntu:~# nvpmodel -m 3
NVPM ERROR: Error opening /sys/devices/17000000.ga10b/devfreq_dev/available_frequencies: 2
NVPM ERROR: failed to read PARAM GPU: ARG FREQ_TABLE: PATH /sys/devices/17000000.ga10b/devfreq_dev/available_frequencies
NVPM ERROR: failed to set power mode!
NVPM ERROR: optMask is 1, no request for power mode

The directory devfreq_dev doesn’t exist.

root@ubuntu:~# ls /sys/devices/17000000.ga10b/devfreq_dev/
ls: cannot access '/sys/devices/17000000.ga10b/devfreq_dev/': No such file or directory

On my AGX Orin the path goes through platform/bus@0:

ll /sys/devices/platform/bus@0/17000000.gpu/devfreq/17000000.gpu/
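
If the sysfs layout differs between kernels or device trees, something like this (a generic sketch) can help locate where the GPU node and its devfreq entry actually live:

# search for the GPU node by its base address and list registered devfreq devices
find /sys/devices -maxdepth 5 -name '*17000000*' 2>/dev/null
ls -l /sys/class/devfreq/ 2>/dev/null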


I don’t know if this will help; it worked for me on an AGX Orin with JetPack 6.2.1:

docker pull nvcr.io/nvidia/pytorch:25.06-py3-igpu

Note that the pytorch:25.06-py3-igpu image is over 11 GB when decompressed.
Make sure that you are using a Docker container image with the igpu suffix.

According to this OE4T discussion, here is the docker run command to use:

docker run \
    --name "pytorch_igpu" \
    --rm -itd \
    --runtime nvidia \
    --network=host --ipc=host --privileged \
    --volume /dev:/dev \
    --env USER="$USER" \
    --user="$(id -u "$USER")":"$(id -g "$USER")" \
    --group-add sudo --group-add dialout --group-add video \
    --volume="/etc/group:/etc/group:ro" \
    --volume="/etc/passwd:/etc/passwd:ro" \
    --volume="/etc/shadow:/etc/shadow:ro" \
    --volume="/etc/sudoers.d:/etc/sudoers.d:ro" \
    --env="DISPLAY" \
    --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
    --env HOME="$HOME" \
    --workdir="$PWD" \
    --volume "$HOME":"$HOME" \
    --env HISTFILE="$HOME"/.bash_history_sr \
    --volume "/opt/resources/settings":"/opt/resources/settings" \
    "nvcr.io/nvidia/pytorch:25.06-py3-igpu" 

docker attach pytorch_igpu

python

import torch

print(torch.__version__)
print(torch.cuda.is_available())
print(torch.empty((1, 2), device=torch.device("cuda")))

python /workspace/tutorials/recipes_source/recipes/defining_a_neural_network.py

cp $PathTo/deviceQuery .

./workspace/deviceQuery

Here’s a similar post to yours and a suggestion of a possible solution for the “nvgpu_gr_falcon_init_ctxsw:159 [ERR] fail” errors.
