Describe the bug
ucx_info and ucx_perftest both report the following error: dc_mlx5.c:329 UCX ERROR mlx5dv_create_qp(mlx5_0:1, DCI): failed: Invalid argument.
Steps to Reproduce
UCX version: UCT version=1.10.0 revision c7add93
UCX build config: --prefix=$PREFIX --enable-debug --enable-assertions --enable-params-check --enable-frame-pointer --enable-backtrace-detail
Setup and versions
lsb_release -a:
LSB Version: :core-4.1-aarch64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 8.1.1911 (Core)
Release: 8.1.1911
Codename: Core
ofed_info -s: MLNX_OFED_LINUX-5.1-0.6.6.0
rpm -q rdma-core: rdma-core-51mlnx1-1.51066.aarch64
rpm -q libibverbs: libibverbs-51mlnx1-1.51066.aarch64
Additional information (depending on the issue)
For ucx_info -d, this happens when it tries to print info about the dc_mlx5 transport.
For ucx_perftest, it happens when running any UCP test without any environment variable set.
All issues go away if I add --without-dc to the configure script.
This doesn't happen with UCX 1.9.0, where the dc transport is enabled and works correctly.
This also doesn't happen when built against MLNX_OFED_LINUX-4.5-1.0.1.0 on another ThunderX2 machine, but it looks like dc is automatically disabled there.
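For anyone trying to reproduce this, the two builds compared above can be sketched as a dry-run script. It only prints the configure command lines and does not execute them. The flags are the ones quoted in this report; the default PREFIX value and the assumption that ucx_info ends up in $PREFIX/bin are illustrative, not taken from the original setup.

```shell
#!/bin/sh
# Dry-run sketch: prints the failing and workaround configure commands.
# PREFIX default is a placeholder (assumption); override it as needed.
PREFIX="${PREFIX:-$HOME/ucx-install}"

# Build options as listed under "UCX build config" in this report.
BASE_FLAGS="--prefix=$PREFIX --enable-debug --enable-assertions --enable-params-check --enable-frame-pointer --enable-backtrace-detail"

# Print a configure command line; an optional argument is appended as an extra flag.
configure_cmd() {
    printf './configure %s%s\n' "$BASE_FLAGS" "${1:+ $1}"
}

FAILING_CMD=$(configure_cmd)
WORKAROUND_CMD=$(configure_cmd --without-dc)

echo "Failing build:    $FAILING_CMD"
echo "Workaround build: $WORKAROUND_CMD"
# Assumes the usual autotools layout, with binaries installed under $PREFIX/bin.
echo "Then check with:  $PREFIX/bin/ucx_info -d"
```

Running the printed commands in a UCX 1.10.0 source tree, then running ucx_info -d from each install, should show whether the DCI creation error appears only in the build without --without-dc.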