I am using Ubuntu 24.04 with Linux kernel 6.8.0-48-generic, an NVMe SSD, and an NVIDIA Corporation GA107M [GeForce RTX 3050 Ti Mobile] GPU.
The driver version is 570.172.08 and the CUDA version is 12.8.
This is my gdscheck output:
GDS release version: 1.14.1.1
nvidia_fs version: 2.26 libcufile version: 2.12
Platform: x86_64
============
ENVIRONMENT:
============
=====================
DRIVER CONFIGURATION:
=====================
NVMe P2PDMA : Unsupported
NVMe : Supported
NVMeOF : Supported
SCSI : Unsupported
ScaleFlux CSD : Unsupported
NVMesh : Unsupported
DDN EXAScaler : Unsupported
IBM Spectrum Scale : Unsupported
NFS : Unsupported
BeeGFS : Unsupported
ScaTeFS : Unsupported
WekaFS : Unsupported
Userspace RDMA : Unsupported
--Mellanox PeerDirect : Enabled
--rdma library : Not Loaded (libcufile_rdma.so)
--rdma devices : Not configured
--rdma_device_status : Up: 0 Down: 0
=====================
CUFILE CONFIGURATION:
=====================
properties.use_pci_p2pdma : false
properties.use_compat_mode : true
properties.force_compat_mode : false
properties.gds_rdma_write_support : true
properties.use_poll_mode : false
properties.poll_mode_max_size_kb : 4
properties.max_batch_io_size : 128
properties.max_batch_io_timeout_msecs : 5
properties.max_direct_io_size_kb : 16384
properties.max_device_cache_size_kb : 131072
properties.per_buffer_cache_size_kb : 1024
properties.max_device_pinned_mem_size_kb : 33554432
properties.posix_pool_slab_size_kb : 4 1024 16384
properties.posix_pool_slab_count : 128 64 64
properties.rdma_peer_affinity_policy : RoundRobin
properties.rdma_dynamic_routing : 0
fs.generic.posix_unaligned_writes : false
fs.lustre.posix_gds_min_kb: 0
fs.beegfs.posix_gds_min_kb: 0
fs.scatefs.posix_gds_min_kb: 0
fs.weka.rdma_write_support: false
fs.gpfs.gds_write_support: false
fs.gpfs.gds_async_support: true
profile.nvtx : false
profile.cufile_stats : 0
miscellaneous.api_check_aggressive : false
execution.max_io_threads : 4
execution.max_io_queue_depth : 128
execution.parallel_io : true
execution.min_io_threshold_size_kb : 8192
execution.max_request_parallelism : 4
properties.force_odirect_mode : false
properties.prefer_iouring : false
=========
GPU INFO:
=========
GPU index 0 NVIDIA GeForce RTX 3050 Ti Laptop GPU bar:1 bar size (MiB):4096 supports GDS, IOMMU State: Disabled
==============
PLATFORM INFO:
==============
Found ACS enabled for switch 0000:00:01.0
IOMMU: disabled
Nvidia Driver Info Status: Supported(Nvidia Open Driver Installed)
Cuda Driver Version Installed: 12080
Platform: XPS 15 9520, Arch: x86_64(Linux 6.8.0-48-generic)
Platform verification succeeded
These are the dmesg messages:
[ 119.210065] nvidia_fs: registered correctly with major number 502
[ 119.210813] max_peer_devs : 64 and max_pci_depth : 16
[ 119.211685] nvidia-fs:warning: error retrieving numa node for device 0000:01:00.0
[ 119.211697] nvidia-fs:warning: error retrieving numa node for device 0000:03:00.0
[ 119.211712] nvidia-fs:warning: error retrieving numa node for device 0000:01:00.0
[ 119.211721] nvidia-fs:warning: error retrieving numa node for device 0000:02:00.0
[ 147.202440] audit: type=1400 audit(1756719516.504:303): apparmor="DENIED" operation
These devices correspond to the two NVMe drives and the GPU.
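To double-check which PCI address is which device, a small sketch like the following could be used. It reads the standard Linux sysfs attributes `class` and `numa_node` under `/sys/bus/pci/devices`; the helper name `pci_info` is hypothetical and only for illustration. A class value starting with `0x0108` indicates an NVMe controller, and `0x0300` or `0x0302` indicates a display/3D controller. A `numa_node` of `-1` means the kernel reports no NUMA affinity for the device, which on a single-socket laptop is a plausible (but unconfirmed) reason for the warnings above.

```python
from pathlib import Path


def pci_info(addr, root="/sys/bus/pci/devices"):
    """Return the sysfs 'class' and 'numa_node' values for a PCI address.

    Attributes that cannot be found are returned as None, so the function
    also works on machines where the device does not exist.
    """
    dev = Path(root) / addr
    info = {}
    for attr in ("class", "numa_node"):
        f = dev / attr
        info[attr] = f.read_text().strip() if f.is_file() else None
    return info


if __name__ == "__main__":
    # Addresses taken from the dmesg warnings and from cufile.log.
    for addr in ("0000:01:00.0", "0000:02:00.0", "0000:03:00.0", "0000:00:14.3"):
        print(addr, pci_info(addr))
```

The last address, 0000:00:14.3, is the one cufile.log reports as missing from its topology table; checking its class shows what kind of device that is.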
Another dmesg error was:
[ 1255.901058] nvidia-fs:nvfs_pin_gpu_pages:1336 Error ret -22 invoking nvidia_p2p_get_pages_persistent
va_start=0x7e7f75000000/va_end=0x7e7f7501ffff/rounded_size=0x20000/gpu_buf_length=0x20000
[ 1255.901834] nvidia-fs:nvfs_pin_gpu_pages:1336 Error ret -22 invoking nvidia_p2p_get_pages_persistent
va_start=0x7e7f75020000/va_end=0x7e7f7511ffff/rounded_size=0x100000/gpu_buf_length=0x100000
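For reference, the numbers in these two messages can be decoded with a short sketch. It assumes the return value is a negated Linux errno, as is usual for kernel modules; the helper names `decode` and `span_bytes` are hypothetical.

```python
import errno
import os


def decode(ret):
    """Map a negative kernel return value to its errno name and description."""
    code = abs(ret)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def span_bytes(va_start, va_end):
    """Size in bytes of the inclusive virtual address range [va_start, va_end]."""
    return va_end - va_start + 1


if __name__ == "__main__":
    print(decode(-22))
    # The two ranges from the dmesg lines above.
    print(span_bytes(0x7e7f75000000, 0x7e7f7501ffff) // 1024, "KiB")
    print(span_bytes(0x7e7f75020000, 0x7e7f7511ffff) // 1024, "KiB")
```

So both calls fail with EINVAL. The first range is 128 KiB, the same 131072 bytes that cufile.log reports for the failed cuFileBufRegister. The second is 1024 KiB, which matches properties.per_buffer_cache_size_kb and is therefore probably the internal bounce buffer, though that link is my assumption.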
The cufile.log looks like this:
01-09-2025 14:57:37:93 [pid=22681 tid=22681] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
01-09-2025 14:57:37:94 [pid=22681 tid=22681] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
01-09-2025 14:57:37:94 [pid=22681 tid=22681] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
01-09-2025 14:58:13:571 [pid=22798 tid=22798] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
01-09-2025 14:58:13:571 [pid=22798 tid=22798] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
01-09-2025 14:58:13:572 [pid=22798 tid=22798] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
01-09-2025 14:58:19:879 [pid=22824 tid=22824] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
01-09-2025 14:58:19:880 [pid=22824 tid=22824] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
01-09-2025 14:58:19:880 [pid=22824 tid=22824] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR 0:522 nvidia-fs MAP ioctl failed : ioctl_return: -22 ioctl_ret: -1
01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR 0:555 map failed
01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR cufio-obj:156 error allocating nvfs handle, size: 131072
01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR cufio_core:1659 cuFileBufRegister error, object allocation failed
01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR cufio_core:1740 cuFileBufRegister error cufile success
01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR 0:522 nvidia-fs MAP ioctl failed : ioctl_return: -22 ioctl_ret: -1
01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR 0:555 map failed
01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR 0:874 Buffer map failed for PCI-Group: 0 GPU: 0
01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR 0:1299 failed to get bounce buffer for PCI group 0 GPU 0
01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR cufio_core:3391 Final direct subio failed retval -5011 buf_offset: 0 file_offset: 0 size: 131072
01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR cufio_core:3412 Setting I/O to failed. Expected I/O Size 131072 actual: 0
01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR cufio:146 cuFileBufDeregister error, object for device pointer is not registered
01-09-2025 14:58:20:7 [pid=22824 tid=22824] ERROR cufio:172 cuFileBufDeregister error: device pointer lookup failure
01-09-2025 15:26:56:720 [pid=11934 tid=11934] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
01-09-2025 15:26:56:729 [pid=11934 tid=11934] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
01-09-2025 15:26:56:730 [pid=11934 tid=11934] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
01-09-2025 15:27:05:113 [pid=11970 tid=11970] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
01-09-2025 15:27:05:113 [pid=11970 tid=11970] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
01-09-2025 15:27:05:114 [pid=11970 tid=11970] ERROR cufio-topo-nvfs:78 pci device not present in topology device attribute table: 0000:00:14.3
01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR 0:522 nvidia-fs MAP ioctl failed : ioctl_return: -22 ioctl_ret: -1
01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR 0:555 map failed
01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR cufio-obj:156 error allocating nvfs handle, size: 131072
01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR cufio_core:1659 cuFileBufRegister error, object allocation failed
01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR cufio_core:1740 cuFileBufRegister error cufile success
01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR 0:522 nvidia-fs MAP ioctl failed : ioctl_return: -22 ioctl_ret: -1
01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR 0:555 map failed
01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR 0:874 Buffer map failed for PCI-Group: 0 GPU: 0
01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR 0:1299 failed to get bounce buffer for PCI group 0 GPU 0
01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR cufio_core:3391 Final direct subio failed retval -5011 buf_offset: 0 file_offset: 0 size: 131072
01-09-2025 15:27:05:239 [pid=11970 tid=11970] ERROR cufio_core:3412 Setting I/O to failed. Expected I/O Size 131072 actual: 0
01-09-2025 15:27:05:240 [pid=11970 tid=11970] ERROR cufio:146 cuFileBufDeregister error, object for device pointer is not registered
01-09-2025 15:27:05:240 [pid=11970 tid=11970] ERROR cufio:172 cuFileBufDeregister error: device pointer lookup failure
Can you help me with how to proceed, or is GDS not available for this GPU?