GPUDirect Storage (GDS) to SSD with RDMA

Hello, I’m trying to set up GPUDirect Storage to an SSD, but I think something went wrong in the installation.
When I run this command: /usr/local/cuda-12.4/gds/tools/gdscheck.py -p, I get the following in my terminal:

 GDS release version: 1.9.1.3
 nvidia_fs version:  2.17 libcufile version: 2.12
 Platform: x86_64
 ============
 ENVIRONMENT:
 ============
 =====================
 DRIVER CONFIGURATION:
 =====================
 NVMe               : Unsupported
 NVMeOF             : Unsupported
 SCSI               : Unsupported
 ScaleFlux CSD      : Unsupported
 NVMesh             : Unsupported
 DDN EXAScaler      : Unsupported
 IBM Spectrum Scale : Unsupported
 NFS                : Unsupported
 BeeGFS             : Unsupported
 WekaFS             : Unsupported
 Userspace RDMA     : Unsupported
 --Mellanox PeerDirect : Enabled
 --rdma library        : Not Loaded (libcufile_rdma.so)
 --rdma devices        : Not configured
 --rdma_device_status  : Up: 0 Down: 0
 =====================
 CUFILE CONFIGURATION:
 =====================
 properties.use_compat_mode : true
 properties.force_compat_mode : false
 properties.gds_rdma_write_support : true
 properties.use_poll_mode : false
 properties.poll_mode_max_size_kb : 4
 properties.max_batch_io_size : 128
 properties.max_batch_io_timeout_msecs : 5
 properties.max_direct_io_size_kb : 16384
 properties.max_device_cache_size_kb : 131072
 properties.max_device_pinned_mem_size_kb : 33554432
 properties.posix_pool_slab_size_kb : 4 1024 16384 
 properties.posix_pool_slab_count : 128 64 32 
 properties.rdma_peer_affinity_policy : RoundRobin
 properties.rdma_dynamic_routing : 0
 fs.generic.posix_unaligned_writes : false
 fs.lustre.posix_gds_min_kb: 0
 fs.beegfs.posix_gds_min_kb: 0
 fs.weka.rdma_write_support: false
 fs.gpfs.gds_write_support: false
 profile.nvtx : false
 profile.cufile_stats : 0
 miscellaneous.api_check_aggressive : false
 execution.max_io_threads : 4
 execution.max_io_queue_depth : 128
 execution.parallel_io : true
 execution.min_io_threshold_size_kb : 8192
 execution.max_request_parallelism : 4
 properties.force_odirect_mode : false
 properties.prefer_iouring : false
 =========
 GPU INFO:
 =========
 GPU index 0 NVIDIA RTX 6000 Ada Generation bar:1 bar size (MiB):256 supports GDS, IOMMU State: Disabled
 ==============
 PLATFORM INFO:
 ==============
 IOMMU: disabled
 Nvidia Driver Info Status: Supported only on (nvidia-fs version <= 2.17.4)
 Cuda Driver Version Installed:  12040
 Platform: Precision 7960 Rack, Arch: x86_64(Linux 6.2.0-060200-generic)
 Platform verification succeeded

In the output above you can see that NVMe is not supported, the rdma library is not loaded, the rdma devices are not configured, and the rdma device status is not Up: 1. Moreover, I’m not sure about the line "Nvidia Driver Info Status: Supported only on (nvidia-fs version <= 2.17.4)". Is this a warning, or do I have to change the version of nvidia-fs?
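For what it’s worth, this is how I compared the loaded module against that notice (modinfo is standard, and the /proc file is created by nvidia_fs itself; its first lines include the driver version, as shown further below):

modinfo -F version nvidia_fs
cat /proc/driver/nvidia-fs/stats | head -n 4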

I also checked that nvidia_fs and nvidia_peermem are loaded:

celis@celis-bb-rpr-1:~$ lsmod | grep nvidia_peermem
nvidia_peermem         16384  0
nvidia              54411264  32 nvidia_uvm,nvidia_peermem,nvidia_modeset
ib_uverbs             196608  3 nvidia_peermem,rdma_ucm,mlx5_ib
(base) celis@celis-bb-rpr-1:~$ lsmod | grep nvidia_fs
nvidia_fs             278528  0

Versions installed:
My kernel version is: 6.2.0-060200-generic
CUDA version: release 12.4, V12.4.131
Nvidia Driver version: 550.163.01
DOCA-OFED version: OFED-internal-25.01-0.6.0:
GPU NVIDIA: RTX 6000 Ada
Nvidia-fs version: 2.17.4
GDS version: 1.9.1.3

Looks like you have the proprietary RM driver installed. nvidia-fs.ko is a GPL v2 module and needs OpenRM for newer Linux versions.

Please try installing the open RM driver, and also install the required NVMe support with DOCA.
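For example, with the classic MLNX_OFED installer that looks roughly like the sketch below. The driver version and installer flags are illustrative; with a DOCA-packaged OFED the equivalent is choosing the appropriate installation profile, so check the DOCA documentation for your release:

sudo apt install nvidia-driver-560-open        # open kernel modules flavor of the driver
sudo ./mlnxofedinstall --with-nvmf --enable-gds --add-kernel-support
sudo update-initramfs -u && sudo reboot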


Perfect, now NVMe is supported, but Mellanox PeerDirect is disabled, the rdma library is not loaded, the rdma devices are not configured, and the rdma device status shows Up: 0. We installed the NVIDIA open driver 560, and the kernel is 6.8.0-52-generic.

GDS release version: 1.9.1.3
nvidia_fs version:  2.25 libcufile version: 2.12
Platform: x86_64
============
ENVIRONMENT:
============
=====================
DRIVER CONFIGURATION:
=====================
NVMe               : Supported
NVMeOF             : Unsupported
SCSI               : Unsupported
ScaleFlux CSD      : Unsupported
NVMesh             : Unsupported
DDN EXAScaler      : Unsupported
IBM Spectrum Scale : Unsupported
NFS                : Unsupported
BeeGFS             : Unsupported
WekaFS             : Unsupported
Userspace RDMA     : Unsupported
--Mellanox PeerDirect : Disabled
--rdma library        : Not Loaded (libcufile_rdma.so)
--rdma devices        : Not configured
--rdma_device_status  : Up: 0 Down: 0
=====================
CUFILE CONFIGURATION:
=====================
properties.use_compat_mode : true
properties.force_compat_mode : false
properties.gds_rdma_write_support : true
properties.use_poll_mode : false
properties.poll_mode_max_size_kb : 4
properties.max_batch_io_size : 128
properties.max_batch_io_timeout_msecs : 5
properties.max_direct_io_size_kb : 16384
properties.max_device_cache_size_kb : 131072
properties.max_device_pinned_mem_size_kb : 33554432
properties.posix_pool_slab_size_kb : 4 1024 16384 
properties.posix_pool_slab_count : 128 64 32 
properties.rdma_peer_affinity_policy : RoundRobin
properties.rdma_dynamic_routing : 0
fs.generic.posix_unaligned_writes : false
fs.lustre.posix_gds_min_kb: 0
fs.beegfs.posix_gds_min_kb: 0
fs.weka.rdma_write_support: false
fs.gpfs.gds_write_support: false
profile.nvtx : false
profile.cufile_stats : 0
miscellaneous.api_check_aggressive : false
execution.max_io_threads : 4
execution.max_io_queue_depth : 128
execution.parallel_io : true
execution.min_io_threshold_size_kb : 8192
execution.max_request_parallelism : 4
properties.force_odirect_mode : false
properties.prefer_iouring : false
=========
GPU INFO:
=========
GPU index 0 NVIDIA RTX 6000 Ada Generation bar:1 bar size (MiB):256 supports GDS, IOMMU State: Disabled
GPU index 1 NVIDIA RTX 6000 Ada Generation bar:1 bar size (MiB):256 supports GDS, IOMMU State: Disabled
==============
PLATFORM INFO:
==============
IOMMU: disabled
Nvidia Driver Info Status: Supported(Nvidia Open Driver Installed)
Cuda Driver Version Installed:  12060
Platform: Precision 7960 Rack, Arch: x86_64(Linux 6.8.0-52-generic)
Platform verification succeeded

cat /proc/driver/nvidia-fs/stats
GDS Version: 1.14.0.28 
NVFS statistics(ver: 4.0)
NVFS Driver(version: 2.25.6)
Mellanox PeerDirect Supported: False
IO stats: Disabled, peer IO stats: Disabled
Logging level: info
 
Active Shadow-Buffer (MiB): 0
Active Process: 0
Reads				: err=0 io_state_err=0
Sparse Reads		        : n=0 io=0 holes=0 pages=0 
Writes				: err=0 io_state_err=0 pg-cache=0 pg-cache-fail=0 pg-cache-eio=0
Mmap				: n=0 ok=0 err=0 munmap=0
Bar1-map			: n=0 ok=0 err=0 free=0 callbacks=0 active=0 delay-frees=0
Error				: cpu-gpu-pages=0 sg-ext=0 dma-map=0 dma-ref=0
Ops				: Read=0 Write=0 BatchIO=0

Also, the ofed_info command shows the different versions installed, for your information.
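In case it helps, these are the commands I use to check the OFED version and how the ConnectX ports map to network interfaces (both ship with MLNX_OFED/DOCA-OFED; output omitted here):

ofed_info -s
ibdev2netdev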

Thank you!!


You don’t need nvidia_peermem or dmabuf support if you only need local SSD support.

nvidia_peermem or dmabuf support is only needed for userspace RDMA, WekaFS, or GPFS.

If you need to configure userspace RDMA for GPFS or WEKA, please see the individual sections in the NVIDIA GPUDirect Storage Installation and Troubleshooting Guide.

Yes, but in my case I need RDMA with peermem, as I have a ConnectX NIC through which we transfer data to NVMe disks and GPUs hosted on other servers.
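So I will try listing the ConnectX interfaces in /etc/cufile.json, which as far as I understand from the installation guide is where the RDMA NICs for cuFile are configured (the addresses below are placeholders for our ports’ IPs):

{
  "properties": {
    "rdma_dev_addr_list": [ "192.168.0.12", "192.168.1.12" ]
  }
}

After that, gdscheck -p should report the rdma devices as configured.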