gdscheck says GDS is supported, but nvidia-fs shows errors

I am using Ubuntu 24.04 with Linux kernel 6.8.0-48-generic, an NVMe SSD, and an NVIDIA Corporation GA107M [GeForce RTX 3050 Ti Mobile] GPU.
The driver version is 570.172.08 and the CUDA version is 12.8.

This is my gdscheck output:

GDS release version: 1.14.1.1
 nvidia_fs version:  2.26 libcufile version: 2.12
 Platform: x86_64
 ============
 ENVIRONMENT:
 ============
 =====================
 DRIVER CONFIGURATION:
 =====================
 NVMe P2PDMA        : Unsupported
 NVMe               : Supported
 NVMeOF             : Supported
 SCSI               : Unsupported
 ScaleFlux CSD      : Unsupported
 NVMesh             : Unsupported
 DDN EXAScaler      : Unsupported
 IBM Spectrum Scale : Unsupported
 NFS                : Unsupported
 BeeGFS             : Unsupported
 ScaTeFS            : Unsupported
 WekaFS             : Unsupported
 Userspace RDMA     : Unsupported
 --Mellanox PeerDirect : Enabled
 --rdma library        : Not Loaded (libcufile_rdma.so)
 --rdma devices        : Not configured
 --rdma_device_status  : Up: 0 Down: 0
 =====================
 CUFILE CONFIGURATION:
 =====================
 properties.use_pci_p2pdma : false
 properties.use_compat_mode : true
 properties.force_compat_mode : false
 properties.gds_rdma_write_support : true
 properties.use_poll_mode : false
 properties.poll_mode_max_size_kb : 4
 properties.max_batch_io_size : 128
 properties.max_batch_io_timeout_msecs : 5
 properties.max_direct_io_size_kb : 16384
 properties.max_device_cache_size_kb : 131072
 properties.per_buffer_cache_size_kb : 1024
 properties.max_device_pinned_mem_size_kb : 33554432
 properties.posix_pool_slab_size_kb : 4 1024 16384 
 properties.posix_pool_slab_count : 128 64 64 
 properties.rdma_peer_affinity_policy : RoundRobin
 properties.rdma_dynamic_routing : 0
 fs.generic.posix_unaligned_writes : false
 fs.lustre.posix_gds_min_kb: 0
 fs.beegfs.posix_gds_min_kb: 0
 fs.scatefs.posix_gds_min_kb: 0
 fs.weka.rdma_write_support: false
 fs.gpfs.gds_write_support: false
 fs.gpfs.gds_async_support: true
 profile.nvtx : false
 profile.cufile_stats : 0
 miscellaneous.api_check_aggressive : false
 execution.max_io_threads : 4
 execution.max_io_queue_depth : 128
 execution.parallel_io : true
 execution.min_io_threshold_size_kb : 8192
 execution.max_request_parallelism : 4
 properties.force_odirect_mode : false
 properties.prefer_iouring : false
 =========
 GPU INFO:
 =========
 GPU index 0 NVIDIA GeForce RTX 3050 Ti Laptop GPU bar:1 bar size (MiB):4096 supports GDS, IOMMU State: Disabled
 ==============
 PLATFORM INFO:
 ==============
 Found ACS enabled for switch 0000:00:01.0
 IOMMU: disabled
 Nvidia Driver Info Status: Supported(Nvidia Open Driver Installed)
 Cuda Driver Version Installed:  12080
 Platform: XPS 15 9520, Arch: x86_64(Linux 6.8.0-48-generic)
 Platform verification succeeded
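
To pull individual flags out of a report like this one, the text can be parsed mechanically. Here is a minimal Python sketch, assuming the gdscheck output has been saved as plain text. The helper name `parse_gdscheck` is hypothetical and not part of any NVIDIA tool, and lines containing several colons (such as the GPU INFO row) produce only rough entries.

```python
def parse_gdscheck(text):
    """Collect 'key : value' lines from saved gdscheck output into a dict."""
    settings = {}
    for raw in text.splitlines():
        # Split on the first colon only; the value may contain more colons.
        key, sep, value = raw.partition(":")
        key, value = key.strip(), value.strip()
        # Section titles ("ENVIRONMENT:") and banner rows ("====") carry no value.
        if not sep or not key or not value:
            continue
        settings[key] = value
    return settings


# A few lines copied from the report above.
sample = """\
 =====================
 DRIVER CONFIGURATION:
 =====================
 NVMe P2PDMA        : Unsupported
 NVMe               : Supported
 properties.use_pci_p2pdma : false
 properties.use_compat_mode : true
 properties.force_compat_mode : false
 IOMMU: disabled
"""

settings = parse_gdscheck(sample)
for key in ("NVMe", "NVMe P2PDMA", "properties.use_compat_mode"):
    print(f"{key} = {settings[key]}")
```

This only restates what gdscheck printed. It does not test whether the nvidia-fs kernel module can actually pin GPU memory.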

This is the dmesg output:

[  119.210065] nvidia_fs: registered correctly with major number 502
[  119.210813] max_peer_devs : 64 and max_pci_depth : 16
[  119.211685] nvidia-fs:warning: error retrieving numa node for device 0000:01:00.0 
[  119.211697] nvidia-fs:warning: error retrieving numa node for device 0000:03:00.0 
[  119.211712] nvidia-fs:warning: error retrieving numa node for device 0000:01:00.0 
[  119.211721] nvidia-fs:warning: error retrieving numa node for device 0000:02:00.0 
[  147.202440] audit: type=1400 audit(1756719516.504:303): apparmor="DENIED" operation

These devices correspond to the two NVMe drives and the GPU.

Another error was:

[ 1255.901058] nvidia-fs:nvfs_pin_gpu_pages:1336 Error ret -22 invoking nvidia_p2p_get_pages_persistent
                va_start=0x7e7f75000000/va_end=0x7e7f7501ffff/rounded_size=0x20000/gpu_buf_length=0x20000
[ 1255.901834] nvidia-fs:nvfs_pin_gpu_pages:1336 Error ret -22 invoking nvidia_p2p_get_pages_persistent
                va_start=0x7e7f75020000/va_end=0x7e7f7511ffff/rounded_size=0x100000/gpu_buf_length=0x100000
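
The numbers in these two messages can be checked by hand. A small Python sketch, using only the values printed above, maps the return code -22 to its errno name and confirms that each va_start/va_end range has the reported gpu_buf_length. The variable names are just for illustration.

```python
import errno

# Return code reported by nvidia_p2p_get_pages_persistent in the dmesg lines.
ret = -22
errno_name = errno.errorcode[-ret]
print("ret", ret, "->", errno_name)

# (va_start, va_end, gpu_buf_length) copied from the two messages.
ranges = [
    (0x7E7F75000000, 0x7E7F7501FFFF, 0x20000),
    (0x7E7F75020000, 0x7E7F7511FFFF, 0x100000),
]

sizes = []
for va_start, va_end, gpu_buf_length in ranges:
    size = va_end - va_start + 1  # va_end is inclusive
    sizes.append(size)
    print(hex(va_start), size, "bytes,", size // 1024, "KiB,",
          "matches gpu_buf_length:", size == gpu_buf_length)
```

The first range works out to 131072 bytes, the same size that appears in the cuFileBufRegister failure in cufile.log.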

The cufile.log looks like this:

 01-09-2025 14:57:37:93 [pid=22681 tid=22681] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 14:57:37:94 [pid=22681 tid=22681] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 14:57:37:94 [pid=22681 tid=22681] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 14:58:13:571 [pid=22798 tid=22798] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 14:58:13:571 [pid=22798 tid=22798] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 14:58:13:572 [pid=22798 tid=22798] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 14:58:19:879 [pid=22824 tid=22824] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 14:58:19:880 [pid=22824 tid=22824] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 14:58:19:880 [pid=22824 tid=22824] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR  0:522 nvidia-fs MAP ioctl failed : ioctl_return: -22 ioctl_ret: -1
 01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR  0:555 map failed

 01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR  cufio-obj:156 error allocating nvfs handle, size: 131072
 01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR  cufio_core:1659 cuFileBufRegister error, object allocation failed
 01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR  cufio_core:1740 cuFileBufRegister error cufile success
 01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR  0:522 nvidia-fs MAP ioctl failed : ioctl_return: -22 ioctl_ret: -1
 01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR  0:555 map failed

 01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR  0:874 Buffer map failed for PCI-Group: 0 GPU: 0
 01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR  0:1299 failed to get bounce buffer for PCI group 0 GPU 0
 01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR  cufio_core:3391 Final direct subio failed retval  -5011  buf_offset:  0  file_offset:  0  size:  131072
 01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR  cufio_core:3412 Setting I/O to failed. Expected I/O Size  131072  actual:  0
 01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR  cufio:146 cuFileBufDeregister error, object for device pointer is not registered
 01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR  cufio:172 cuFileBufDeregister error: device pointer lookup failure
 01-09-2025 15:26:56:720 [pid=11934 tid=11934] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 15:26:56:729 [pid=11934 tid=11934] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 15:26:56:730 [pid=11934 tid=11934] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 15:27:05:113 [pid=11970 tid=11970] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 15:27:05:113 [pid=11970 tid=11970] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 15:27:05:114 [pid=11970 tid=11970] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR  0:522 nvidia-fs MAP ioctl failed : ioctl_return: -22 ioctl_ret: -1
 01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR  0:555 map failed

 01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR  cufio-obj:156 error allocating nvfs handle, size: 131072
 01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR  cufio_core:1659 cuFileBufRegister error, object allocation failed
 01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR  cufio_core:1740 cuFileBufRegister error cufile success
 01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR  0:522 nvidia-fs MAP ioctl failed : ioctl_return: -22 ioctl_ret: -1
 01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR  0:555 map failed

 01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR  0:874 Buffer map failed for PCI-Group: 0 GPU: 0
 01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR  0:1299 failed to get bounce buffer for PCI group 0 GPU 0
 01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR  cufio_core:3391 Final direct subio failed retval  -5011  buf_offset:  0  file_offset:  0  size:  131072
 01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR  cufio_core:3412 Setting I/O to failed. Expected I/O Size  131072  actual:  0
 01-09-2025 15:27:05:240 [pid=11970 tid=11970] ERROR  cufio:146 cuFileBufDeregister error, object for device pointer is not registered
 01-09-2025 15:27:05:240 [pid=11970 tid=11970] ERROR  cufio:172 cuFileBufDeregister error: device pointer lookup failure
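
Most of this log is the same topology message repeated, so collapsing it by message makes the actual failure easier to spot. Here is a rough Python sketch, assuming the line layout seen above (timestamp, `[pid tid]`, level, `source:line`, message). `summarize_cufile_log` and `LINE_RE` are hypothetical names, not part of libcufile.

```python
import re
from collections import Counter

# Layout assumed from the log above:
#   DD-MM-YYYY HH:MM:SS:ms [pid=N tid=N] LEVEL  source:line message
LINE_RE = re.compile(
    r"^\s*\S+ \S+ \[pid=\d+ tid=\d+\] (?P<level>\w+)\s+"
    r"(?P<source>\S+?):(?P<line>\d+) (?P<msg>.*)$"
)


def summarize_cufile_log(text):
    """Count log entries grouped by (source, message)."""
    counts = Counter()
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if m:  # blank lines and unrecognized lines are skipped
            counts[(m["source"], m["msg"].strip())] += 1
    return counts


# Three lines copied from the log above.
sample = """\
 01-09-2025 14:58:19:879 [pid=22824 tid=22824] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 14:58:19:880 [pid=22824 tid=22824] ERROR  cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
 01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR  0:522 nvidia-fs MAP ioctl failed : ioctl_return: -22 ioctl_ret: -1
"""

counts = summarize_cufile_log(sample)
for (source, msg), n in counts.most_common():
    print(f"{n:3d}  {source}: {msg}")
```

Grouping this way separates the repeated `cufio-topo-nvfs` entries from the `MAP ioctl failed` entries, which carry the same -22 seen in dmesg.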

Can you help me with how to proceed, or is GDS not available for this GPU?

You might find a more knowledgeable audience by posting over on this list.

GDS P2P mode is only supported on data center and professional visualization (Pro Vis) GPUs, because it depends on the GPU exposing BAR1 space. Your GPU model is not supported.
