DGX Spark GPUDirect RDMA

Does the DGX Spark support GPUDirect RDMA? On a normal x86 system with a GPU and a ConnectX card, the nvidia-peermem module has to be installed and loaded before calling ibv_reg_mr on a GPU buffer, otherwise the call seg faults. There does not appear to be a peermem module pre-installed on the Spark, and I do get a seg fault when calling ibv_reg_mr with a GPU buffer. Does peermem need to be installed, or does some other module need to be installed or activated (I have seen an nvidia-p2p module mentioned for the Orin)? Or do I need an entirely different approach on the Spark, or is this just not currently supported?
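
For reference, this is roughly the registration pattern I mean (a minimal sketch with a made-up helper name; queue-pair setup and error handling are omitted):

#include <infiniband/verbs.h>
#include <cuda_runtime.h>

/* Minimal sketch of the pattern in question (hypothetical helper, error
 * handling trimmed). On x86 with a ConnectX NIC this only works once
 * nvidia-peermem is loaded, since the HCA has to register the raw device
 * pointer. */
int register_gpu_buffer(struct ibv_pd *pd, size_t len, struct ibv_mr **mr_out)
{
    void *gpu_buf = NULL;

    if (cudaMalloc(&gpu_buf, len) != cudaSuccess)
        return -1;

    /* With GPUDirect RDMA, the device pointer is registered directly. */
    struct ibv_mr *mr = ibv_reg_mr(pd, gpu_buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        cudaFree(gpu_buf);
        return -1;
    }

    *mr_out = mr;
    return 0;
}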

Specifically, I get this error when trying to load the peermem module:

modprobe: ERROR: could not insert 'nvidia_peermem': Invalid argument

So maybe the module does exist, but just isn't loading for some reason.

Same problem here. Some debug:

sonata@spark-05de:~$ sudo modprobe nvidia-peermem
[sudo] password for sonata:
modprobe: ERROR: could not insert 'nvidia_peermem': Invalid argument
sonata@spark-05de:~$ sudo modinfo nvidia-peermem
filename:       /lib/modules/6.11.0-1016-nvidia/kernel/nvidia-580-open/nvidia-peermem.ko
version:        580.95.05
license:        Dual BSD/GPL
description:    NVIDIA GPU memory plug-in
author:         Yishai Hadas
srcversion:     890AFFA635D55BDFFC7CFAE
depends:
name:           nvidia_peermem
vermagic:       6.11.0-1016-nvidia SMP preempt mod_unload modversions aarch64
sig_id:         PKCS#7
signer:         Canonical Ltd. Kernel Module Signing
sig_key:        E9:DF:13:0F:92:92:A9:B7
sig_hashalgo:   sha512
signature:      9A:D7:8E:63:77:91:1B:A6:83:83:C0:E8:17:92:DF:2B:B9:9A:52:C4:
		54:45:6F:87:DF:03:F8:CE:C5:F3:A8:43:8B:D5:72:A2:BC:4A:D5:44:
		56:B7:2C:FD:F1:5F:1F:A7:43:9F:27:BF:9D:AE:53:A0:94:B5:3F:31:
		AC:84:07:6A:6C:A0:2D:B3:CE:F5:1E:AF:63:26:DF:93:FB:D8:06:C5:
		A6:52:DE:B4:F3:6E:1B:4C:AA:D7:D9:40:13:5A:2B:4D:0C:56:43:0F:
		7C:40:ED:4B:7C:DA:3A:97:17:8C:A9:58:69:94:CD:02:5E:A1:2E:3E:
		B5:16:10:22:BD:0F:26:8F:8A:D2:55:B4:21:BD:C4:D7:57:EC:AC:F6:
		FD:18:CA:F7:70:C8:26:E9:E7:86:F3:BF:F8:D3:74:EE:E1:04:AF:EF:
		ED:D2:AA:08:3B:17:F5:47:00:47:C4:B8:6C:B3:5C:B2:58:A0:BE:01:
		C2:55:0F:F9:90:B8:6E:F1:B6:4E:9C:C4:6E:B2:87:6C:D2:56:68:E8:
		8B:CB:70:51:4E:E4:ED:89:56:31:7F:66:26:60:53:BB:4A:0A:5D:C8:
		5E:26:8E:EE:C7:AC:84:2B:80:2A:B2:48:40:4E:7D:85:E7:71:BF:ED:
		BD:A9:A9:40:70:CA:BE:25:95:DD:39:38:A5:F3:29:E4:53:58:C3:E0:
		78:EA:7A:D5:30:1A:AC:7B:49:EF:08:AB:A1:19:EC:FD:4E:2D:0D:59:
		6E:39:71:BD:A0:DA:2D:33:5E:14:F1:7D:F2:2D:C0:C2:5B:A8:E0:FD:
		1C:E7:0A:40:39:7B:6A:64:FE:D7:10:51:D0:1F:35:68:72:F0:40:30:
		8A:05:FC:15:84:E1:96:09:99:2B:3C:D5:04:7D:50:B7:23:DE:07:AA:
		19:FC:3B:8F:94:AA:55:E2:AF:28:4C:13:96:04:8B:55:D7:66:3E:B5:
		6B:A8:11:AE:D3:C9:1D:F6:61:A8:29:57:7E:2F:44:A9:9E:78:15:0B:
		0F:9C:6D:D9:1E:5E:31:19:E1:20:AB:E5:3B:BE:F0:72:AA:F0:B3:63:
		33:FE:DA:DA:23:FC:87:A7:59:46:68:8B:DD:E2:87:EB:46:BE:78:3C:
		DD:BC:9B:F6:DA:78:13:2C:FC:0F:40:31:48:65:BF:38:BD:A1:92:F6:
		B4:51:68:17:E4:DA:F9:DB:5E:5C:94:E5:FA:4F:0F:78:6A:3E:71:66:
		C7:C0:A2:4E:60:0A:4F:05:51:71:B4:8E:29:B2:01:59:9E:4F:F0:1C:
		74:53:CD:54:4D:33:DD:3A:81:75:F8:38:EC:1D:92:AF:E2:D7:7A:21:
		D6:1F:F8:DB:99:74:D8:20:51:71:75:68
parm:           peerdirect_support:Set level of support for Peer-direct, 0 [default] or 1 [legacy, for example MLNX_OFED 4.9 LTS] (int)
parm:           persistent_api_support:Set level of support for persistent APIs, 0 [legacy] or 1 [default] (int)

dmesg is clean; there is nothing from peermem in it.

So, my dmesg was also clean, and modinfo for my peermem module looks the same. The interesting thing, though, is the depends: line. It is empty, whereas on a normal system with peermem it looks like:

depends: nvidia,ib_uverbs

This led me to investigate the module further, as I don't understand why the Spark module wouldn't also depend on the nvidia and ib_uverbs modules. So I ran the following:

objdump -d /lib/modules/6.11.0-1016-nvidia/kernel/nvidia-580-open/nvidia-peermem.ko 

And got:

/lib/modules/6.11.0-1016-nvidia/kernel/nvidia-580-open/nvidia-peermem.ko:     file format elf64-littleaarch64


Disassembly of section .init.text:

0000000000000000 <init_module-0x8>:
   0:	d503201f 	nop
   4:	d503201f 	nop

0000000000000008 <init_module>:
   8:	d503201f 	nop
   c:	d503201f 	nop
  10:	128002a0 	mov	w0, #0xffffffea            	// #-22
  14:	d65f03c0 	ret

Disassembly of section .exit.text:

0000000000000000 <cleanup_module>:
   0:	d65f03c0 	ret

Disassembly of section .plt:

0000000000000000 <.plt>:
	...

Disassembly of section .text.ftrace_trampoline:

0000000000000000 <.text.ftrace_trampoline>:
	...

This is why the module returns the invalid argument: it is, in essence, empty. However, the source for the peermem module does seem to be present on the Spark in /usr/src/nvidia-580.95.05/nvidia-peermem. Looking through the code, the module expects NV_MLNX_IB_PEER_MEM_SYMBOLS_PRESENT to be defined; otherwise its init function just returns -EINVAL, which becomes the 'Invalid argument' we see when running modprobe. See here:

https://github.com/NVIDIA/open-gpu-kernel-modules/blob/2b436058a616676ec888ef3814d1db6b2220f2eb/kernel-open/nvidia-peermem/nvidia-peermem.c#L641
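
To illustrate why an init function that just returns -EINVAL shows up as exactly this modprobe error, here is a minimal stand-alone module with the same shape (a hypothetical illustration, not NVIDIA's code):

/* peermem_stub.c - hypothetical illustration, not NVIDIA's code: a module
 * whose init collapses to "return -EINVAL" unless the symbol-check macro
 * was defined at build time. */
#include <linux/module.h>
#include <linux/init.h>
#include <linux/errno.h>

static int __init stub_init(void)
{
#if defined(NV_MLNX_IB_PEER_MEM_SYMBOLS_PRESENT)
    /* the real module registers its peer-memory client with ib_core here */
    return 0;
#else
    /* modprobe reports this as "Invalid argument" */
    return -EINVAL;
#endif
}

static void __exit stub_exit(void)
{
}

module_init(stub_init);
module_exit(stub_exit);
MODULE_LICENSE("Dual BSD/GPL");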

Also, NV_MLNX_IB_PEER_MEM_SYMBOLS_PRESENT seems to get defined here when building the module:

https://github.com/NVIDIA/open-gpu-kernel-modules/blob/2b436058a616676ec888ef3814d1db6b2220f2eb/kernel-open/conftest.sh#L3277

I assume it was not defined when the module on the Spark was built, since there is no /usr/src/ofa_kernel directory or DKMS source for OFED on the Spark (or wherever the module was actually built). That is why I think the module is empty and does not load.

So maybe the DOCA-OFED drivers need to be installed and the module rebuilt or replaced? However, the DGX OS user guide specifically states that the Spark doesn't require DOCA-OFED.

So I'm not really sure what is going on. Maybe someone from NVIDIA could respond and clarify whether GPUDirect/peermem should work, will be supported at some point, isn't intended to work, or whether some other method should be used as an alternative.

The DGX Spark SoC is characterized by a unified memory architecture.

For performance reasons, specifically for CUDA contexts associated with the iGPU, the system memory returned by the pinned device memory allocators (e.g. cudaMalloc) cannot be coherently accessed by the CPU complex or by I/O peripherals such as PCI Express devices.

Hence the GPUDirect RDMA technology is not supported, and the mechanisms for direct I/O based on that technology, for example nvidia-peermem (for DOCA-Host), dma-buf or GDRCopy, do not work.

A compliant application should programmatically introspect the relevant platform capabilities, e.g. by querying CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_SUPPORTED (related to the nv-p2p kernel APIs) or CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED (related to dma-buf), and use an appropriate fallback.
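
A minimal probe of those two attributes could look like this (a sketch using the CUDA driver API; error checking omitted):

#include <cuda.h>
#include <stdio.h>

int main(void)
{
    CUdevice dev;
    int rdma = 0, dmabuf = 0;

    cuInit(0);
    cuDeviceGet(&dev, 0);

    /* Can the nv-p2p path (GPUDirect RDMA / nvidia-peermem) be used? */
    cuDeviceGetAttribute(&rdma,
        CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_SUPPORTED, dev);

    /* Can device memory be exported as a dma-buf for registration? */
    cuDeviceGetAttribute(&dmabuf,
        CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED, dev);

    printf("GPUDirect RDMA supported: %d, dma-buf supported: %d\n",
           rdma, dmabuf);

    /* Per the note above, both are expected to be 0 on DGX Spark, so the
     * application should take the host-pinned fallback path. */
    return 0;
}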

For example, for Linux RDMA applications based on the ib verbs library, we suggest allocating the communication buffers with the cudaHostAlloc API and registering them with the ibv_reg_mr function.
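
A sketch of that fallback, assuming the rest of the verbs setup (device, protection domain, queue pair) already exists and with error handling trimmed:

#include <infiniband/verbs.h>
#include <cuda_runtime.h>

/* Hypothetical helper: allocate a pinned host buffer and register it with
 * verbs. On the Spark's unified memory the GPU can also access this
 * allocation directly, so no device-pointer registration is needed. */
int alloc_and_register_fallback(struct ibv_pd *pd, size_t len,
                                void **buf_out, struct ibv_mr **mr_out)
{
    void *buf = NULL;

    /* Page-locked, host-resident allocation visible to CPU, GPU and NIC. */
    if (cudaHostAlloc(&buf, len, cudaHostAllocDefault) != cudaSuccess)
        return -1;

    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        cudaFreeHost(buf);
        return -1;
    }

    *buf_out = buf;
    *mr_out = mr;
    return 0;
}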
