Cray MPICH/ROCm compatibility for GPU-aware MPI #4729

Merged
WeiqunZhang merged 1 commit into AMReX-Codes:development from jaharris87:cray-mpich_rocm_gpu-aware_mpi_compatability
Oct 24, 2025

Conversation

@jaharris87
Contributor

Summary

Implements a check in the OLCF makefile that compares the ROCm version supported by Cray MPICH with the ROCm version in the current environment. If the major versions differ, the library for GPU-aware MPI is not linked, because linking it would cause an error at runtime.

Additional background

  • If the ROCm versions are incompatible, the user will see a runtime error like the following: error while loading shared libraries: libamdhip64.so.6: cannot open shared object file: No such file or directory.
  • Supported major ROCm version for Cray MPICH is determined by parsing ldd output for libmpi_gtl_hsa.so.
  • The user's ROCm version is determined by parsing the output of hipconfig --version.
  • The user must also make sure the craype-accel-amd-gfx90a module (on Frontier) is not loaded, as loading it will automatically try to link GTL.
  • If running with a ROCm version that is not supported by the Cray MPICH version, the user must also prepend CRAY_LD_LIBRARY_PATH to LD_LIBRARY_PATH and disable GPU-aware MPI with export MPICH_GPU_SUPPORT_ENABLED=0.
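
The major-version comparison described above can be sketched in shell as follows. This is an illustration only, not the code merged into the OLCF makefile: the function names are hypothetical, and it assumes that the ldd output for libmpi_gtl_hsa.so contains an entry of the form libamdhip64.so.N and that hipconfig --version prints a string beginning with MAJOR.MINOR.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical and do not come from AMReX.

# Print the major version N from the first "libamdhip64.so.N" entry found in
# ldd-style text passed as $1 (assumed to be how the supported ROCm major shows up).
gtl_rocm_major() {
    printf '%s\n' "$1" \
        | sed -n 's/.*libamdhip64\.so\.\([0-9][0-9]*\).*/\1/p' \
        | head -n 1
}

# Print the major version from a version string passed as $1,
# assumed to look like "MAJOR.MINOR.PATCH-hash".
env_rocm_major() {
    printf '%s\n' "$1" | head -n 1 | cut -d. -f1
}

# Return 0 only when both majors are non-empty and equal,
# i.e. when linking the GPU-aware MPI library is considered safe.
rocm_majors_match() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" = "$2" ]
}
```

In practice the first argument of gtl_rocm_major would be the captured output of ldd on libmpi_gtl_hsa.so, and the argument of env_rocm_major would be the captured output of hipconfig --version. Treating an empty result as a mismatch is a conservative choice made for this sketch: if either version cannot be determined, the GPU-aware library is not linked.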

Checklist

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • changes answers in the test suite to more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • include documentation in the code and/or rst files, if appropriate

@zingale
Member

zingale commented Oct 23, 2025

I can build and run with this on Frontier.

However, ROCm 7 seems to give wrong answers for Castro (and the code crashes).

@WeiqunZhang
Member

The makefile changes work as expected.

@WeiqunZhang WeiqunZhang merged commit 3446dfb into AMReX-Codes:development Oct 24, 2025
85 of 88 checks passed
3 participants