
CMake: HIP Modernizing & RDC#2031

Merged
WeiqunZhang merged 4 commits into AMReX-Codes:development from ax3l:cmake-hipRDC on Jul 1, 2021

ax3l (Member) commented May 18, 2021

Summary

Add -fgpu-rdc flags to HIP if requested via AMReX_GPU_RDC.

Modernize the HIP logic with the recommended CMake targets, which avoid the flaky hipcc compiler scripts: https://rocmdocs.amd.com/en/latest/Installation_Guide/Using-CMake-with-AMD-ROCm.html#using-hip-in-cmake
Add support for AMD's clang++/clang compilers for HIP instead of using the legacy hipcc Perl wrapper as the C++ compiler.
This also improves support for the Cray compiler wrappers, which also use clang++/clang underneath (Spock/OLCF).
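As a rough illustration of how the RDC request surfaces to a user, the sketch below prints the CMake cache options for a HIP build with RDC switched on or off. It assumes a standalone AMReX configure where `AMReX_GPU_BACKEND=HIP` selects HIP (the WarpX recipes in this PR use `-DWarpX_COMPUTE=HIP` instead); `AMReX_AMD_ARCH` and `AMReX_GPU_RDC` are the options named in this PR. The helper name `amrex_hip_cmake_args` is hypothetical and not part of AMReX.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AMReX): print CMake cache options for a
# HIP build, one per line. AMReX_GPU_BACKEND=HIP is an assumption for a
# standalone AMReX build; AMReX_AMD_ARCH and AMReX_GPU_RDC come from this PR.
amrex_hip_cmake_args() {
    local arch="${1:-gfx908}"  # target AMD GPU architecture
    local rdc="${2:-OFF}"      # ON requests relocatable device code (-fgpu-rdc)
    case "$rdc" in
        ON|OFF) ;;
        *) echo "RDC must be ON or OFF, got: $rdc" >&2; return 1 ;;
    esac
    printf '%s\n' \
        "-DAMReX_GPU_BACKEND=HIP" \
        "-DAMReX_AMD_ARCH=${arch}" \
        "-DAMReX_GPU_RDC=${rdc}"
}

# Example (requires ROCm and an AMReX source tree):
#   export CC=$(which clang) CXX=$(which clang++)
#   mapfile -t args < <(amrex_hip_cmake_args gfx908 ON)
#   cmake -S . -B build "${args[@]}"
```

Keeping the options in a bash array preserves each `-D` flag as a single argument when it is passed on to `cmake`.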

Close #1688

Additional background

Relocatable-device-code (RDC) flags are needed for extern device variable support (for codes that use global variables on device). They are also needed when linking with Ascent.

Follow-up to #2029

With HIP GPU RDC, emitting and linking static libraries gets more involved:
ROCm/rccl#260

Tests

  • Compiled in CI
  • Compile on Tulip (ROCm 4.1.1) with Cray Wrappers:
module unload cuda11.2
module load craype-accel-amd-gfx908
module load rocm/4.1.1
module load cmake/3.18.2
# default: cray-mvapich2/2.3.5

export CC=$(which cc)
export CXX=$(which CC)
export CXXFLAGS="--rocm-path=${ROCM_PATH}"

cmake -S . -B build -DWarpX_COMPUTE=HIP -DAMReX_AMD_ARCH=gfx908
cmake --build build -j 32
# In linking of final executable:
# error: Illegal instruction detected: VOP* instruction violates constant bus restriction
# renamable $sgpr4_sgpr5 = V_CMP_EQ_U64_e64 $exec, killed $vcc, implicit $exec, debug-location !293636; /opt/rocm-4.1.1/hip/../include/hip/hcc_detail/device_functions.h:1021:12
  • Compile on Tulip (ROCm 4.1.1) with AMD Clang:
module unload cuda11.2
module load craype-accel-amd-gfx908
module load rocm/4.1.1
module load cmake/3.18.2
# default: cray-mvapich2/2.3.5 has CUDA awareness built in and requires CUDA libraries at link time
module unload cray-mvapich2
module load cray-mvapich2_nogpu

export CC=$(which clang)
export CXX=$(which clang++)  # legacy works, too: export CXX=$(which hipcc)
# export CXXFLAGS="--rocm-path=${ROCM_PATH}"
export LDFLAGS="-L${CRAYLIBS_X86_64} $(CC --cray-print-opts=libs) -lmpi"

cmake -S . -B build -DWarpX_COMPUTE=HIP -DAMReX_AMD_ARCH=gfx908 -DMPI_CXX_COMPILER=$(which CC) -DMPI_C_COMPILER=$(which cc) -DMPI_COMPILER_FLAGS="--cray-print-opts=all"
cmake --build build -j 32
  • Compile on Spock (ROCm 4.1.0) with Cray Wrappers:
# module load DefApps/alt
module switch cce cce/12.0.0
# -> `cce/11.0.4` ships a Clang that is too old (Clang 11-based), while the ROCm 4.1.0 Clang is already based on Clang 12.
# -> use `cce/12.0.0`
module load craype-accel-amd-gfx908
module load cmake
module load ninja
module load rocm/4.1.0

export CC=$(which cc)
export CXX=$(which CC)

cmake -S . -B build -DWarpX_COMPUTE=HIP -DAMReX_AMD_ARCH=gfx908
cmake --build build
# In the final link of the executable:
# error: Illegal instruction detected: VOP* instruction violates constant bus restriction
# renamable $sgpr4_sgpr5 = V_CMP_EQ_U64_e64 $exec, killed $vcc, implicit $exec, debug-location !322291; nccs-svm1_sw/spock/spack-envs/views/rocm-4.1.0/hip/include/hip/hcc_detail/device_functions.h:1021:12
# clang-12: error: linker command failed with exit code 1 (use -v to see invocation)
  • Compile on Spock (ROCm 4.1.0) with AMD Clang w/o MPI:
# module load DefApps/alt
module switch cce cce/12.0.0
module load craype-accel-amd-gfx908
module load cmake
module load ninja
module load rocm/4.1.0

export CC=$(which clang)
export CXX=$(which clang++)

cmake -S . -B build -DWarpX_COMPUTE=HIP -DAMReX_AMD_ARCH=gfx908 -DWarpX_MPI=OFF
cmake --build build -j 64
  • Compile on Spock (ROCm 4.1.0) with AMD Clang w/ MPI (2 tests: w/ and w/o RDC):
# module load DefApps/alt
module switch cce cce/12.0.0
module load craype-accel-amd-gfx908
module load cmake
module load ninja
module load rocm/4.1.0

export CC=$(which clang)
export CXX=$(which clang++)
export LDFLAGS="-L${CRAYLIBS_X86_64} $(CC --cray-print-opts=libs) -lmpi"

cmake -S . -B build -DWarpX_COMPUTE=HIP -DAMReX_AMD_ARCH=gfx908 -DMPI_CXX_COMPILER=$(which CC) -DMPI_C_COMPILER=$(which cc) -DMPI_COMPILER_FLAGS="--cray-print-opts=all"
cmake --build build -j 64
  • Compile on Spock (ROCm 4.1.0) with legacy logic (hipcc) w/ MPI:
# module load DefApps/alt
module switch cce cce/12.0.0
module load craype-accel-amd-gfx908
module load cmake
module load ninja
module load rocm/4.1.0

export CC=$(which clang)
export CXX=$(which hipcc)
export LDFLAGS="-L${CRAYLIBS_X86_64} $(CC --cray-print-opts=libs) -lmpi"

cmake -S . -B build -DWarpX_COMPUTE=HIP -DAMReX_AMD_ARCH=gfx908 -DMPI_CXX_COMPILER=$(which CC) -DMPI_C_COMPILER=$(which cc) -DMPI_COMPILER_FLAGS="--cray-print-opts=all"
cmake --build build

Checklist

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • changes answers in the test suite to more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • are described in the proposed changes to the AMReX documentation, if appropriate

ax3l (Member, Author) commented May 19, 2021

hipcc is a ridiculous compiler wrapper: if we want to use device RDC, we need to link our device libraries with -l and -L and cannot simply forward them as objects on the hipcc link line.
ROCm/hip#2154 (esp. ROCm/hip#2154 (comment))
ROCm/rccl#260

This is what happens when one perls around with CLI arguments in the compiler world.
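The workaround described here, passing device libraries to hipcc as `-L<dir> -l<name>` rather than as a bare archive path, can be sketched as a small path rewrite. This is a minimal sketch assuming static archives named `lib<name>.a`; the helper `lib_to_link_flags` and the example path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite a static-library path such as
# /opt/amrex/lib/libamrex.a into "-L/opt/amrex/lib -lamrex", the -L/-l form
# that, per this discussion, hipcc needs for device-RDC libraries.
lib_to_link_flags() {
    local lib="$1"
    local dir name
    dir=$(dirname -- "$lib")
    name=$(basename -- "$lib")
    case "$name" in
        lib*.a) ;;
        *) echo "expected a lib<name>.a archive, got: $lib" >&2; return 1 ;;
    esac
    name="${name#lib}"   # drop the "lib" prefix
    name="${name%.a}"    # drop the ".a" suffix
    printf '%s\n' "-L${dir} -l${name}"
}

# Example: extend the link flags of a hipcc-based build
#   LDFLAGS="${LDFLAGS} $(lib_to_link_flags /opt/amrex/lib/libamrex.a)"
```

Rejecting anything that is not `lib<name>.a` keeps the rewrite from silently producing a bogus `-l` flag for object files or shared libraries.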

Update: Ohhh

For cmake, clang++ with the hip::device cmake target should be preferred over hipcc

Ref.: https://rocmdocs.amd.com/en/latest/Installation_Guide/Using-CMake-with-AMD-ROCm.html#using-hip-in-cmake

Update: Fortran builds with hip::device expect us to use the flang shipped with ROCm as the Fortran compiler.

ax3l (Member, Author) commented May 19, 2021

Either we drop HIP + Fortran support in CMake for now, work around the flaky hipcc script by adding -l and -L for static RDC builds, or hope that ROCm/hip#2190 gets merged at some point.
We could probably also unwrap and redefine our own hip::device target.

WeiqunZhang (Member)

As far as I know, only Castro requires RDC, but it does not use CMake or AMReX as a library. So maybe we can wait.

ax3l changed the title from "CMake: HIP RDC" to "[WIP] CMake: HIP Modernizing & RDC" on May 19, 2021
ax3l mentioned this pull request on May 25, 2021
ax3l force-pushed the cmake-hipRDC branch 2 times, most recently from aea51f3 to 08f1771 on May 26, 2021 18:43
ax3l force-pushed the cmake-hipRDC branch 4 times, most recently from e5ec258 to 2b35103 on May 28, 2021 05:15
ax3l force-pushed the cmake-hipRDC branch 2 times, most recently from 007945c to 84a1fd9 on June 15, 2021 21:26
ax3l mentioned this pull request on Jun 15, 2021
ax3l force-pushed the cmake-hipRDC branch 3 times, most recently from 9a2697e to d315a34 on June 21, 2021 19:53
Add `-fgpu-rdc` flags to HIP.
Relocatable-device-code (RDC) flags are needed for `extern` device
variable support (for codes that use global variables on device).
ax3l (Member, Author) commented Jun 23, 2021

Hi @jmsexton03 @mwm126 et al.,

in this PR, I am modernizing the ROCm support so that we can also build with the AMD clang++/clang compilers shipped in /opt/rocm-4.1.1/llvm/bin/ instead of relying solely on the (flaky) hipcc Perl script.

Until ROCm 4.4 is released, there are a few limitations with respect to Fortran, which I think neither of you uses. I documented them in the docs, but the TL;DR is:

  • with hipcc, you cannot use relocatable device code with CMake:
ar: two different operation options specified
Use of uninitialized value $HIPLDARCHFLAGS in concatenation (.) or string at /usr/bin/hipcc line 667.
  • with AMD clang++/clang, you cannot use Fortran with CMake until ROCm 4.4 is released

This PR adds support for the 2nd option, since it is also the recommended way to use ROCm with CMake:
https://rocmdocs.amd.com/en/latest/Installation_Guide/Using-CMake-with-AMD-ROCm.html#using-hip-in-cmake
AMD's Clang additions are also periodically upstreamed to LLVM, and the Cray wrappers (cc/CC) also appear to call clang++/clang directly, shipping their own build of Clang.

In ROCm 4.4, the Fortran issue will be solved as well (ROCm/hip#2280), and then the 2nd option should generally be preferred for all builds.
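Choosing between the two options in a build script can be sketched as "prefer AMD's clang++, fall back to hipcc". This is a minimal sketch, not the PR's CMake logic; it assumes the ROCm layout `<ROCM_PATH>/llvm/bin/clang++` (as in /opt/rocm-4.1.1/llvm/bin/) and `<ROCM_PATH>/bin/hipcc`, and the helper name `pick_hip_cxx` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the preferred HIP C++ compiler under a ROCm
# install. Assumed layout: <rocm>/llvm/bin/clang++ and <rocm>/bin/hipcc.
pick_hip_cxx() {
    local rocm="${1:-${ROCM_PATH:-/opt/rocm}}"
    if [ -x "${rocm}/llvm/bin/clang++" ]; then
        # AMD clang++: the option recommended for CMake builds
        printf '%s\n' "${rocm}/llvm/bin/clang++"
    elif [ -x "${rocm}/bin/hipcc" ]; then
        # legacy Perl wrapper: fallback only
        printf '%s\n' "${rocm}/bin/hipcc"
    else
        echo "no HIP C++ compiler found under ${rocm}" >&2
        return 1
    fi
}

# Example:
#   export CXX=$(pick_hip_cxx /opt/rocm-4.1.1)
```

Checking for executables rather than trusting `PATH` avoids silently picking up an unrelated system clang++ that lacks HIP support.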

I added a CI entry for both the legacy wrapper and the direct use of AMD's clang++. I am pinging you just to make sure this PR does not break your workflows before we merge it.

ax3l changed the title from "[WIP] CMake: HIP Modernizing & RDC" to "CMake: HIP Modernizing & RDC" on Jun 23, 2021
ax3l changed the title from "CMake: HIP Modernizing & RDC" to "[WIP] CMake: HIP Modernizing & RDC" on Jun 23, 2021
ax3l force-pushed the cmake-hipRDC branch 2 times, most recently from 2b29b6c to 55e3077 on June 23, 2021 22:52
ax3l changed the title from "[WIP] CMake: HIP Modernizing & RDC" to "CMake: HIP Modernizing & RDC" on Jun 24, 2021
ax3l (Member, Author) commented Jun 30, 2021

Hi @jmsexton03 @mwm126, are you ok if we merge this? :)



Linked issue: CMake & HIP: -fgpu-rdc
