Start HIP explicitly before MPI/AMReX initialization #574

Merged

baperry2 merged 2 commits into AMReX-Combustion:development on Oct 6, 2025
Conversation
This avoids potential SDMA or HSA initialization races observed on large Frontier runs.
Contributor

Interesting that we still need to do this. We had been doing it on another project for years: https://github.com/Exawind/exawind-driver/blob/e56ebdf38484b7c425b773800d8ce317fb25f768/app/exawind/exawind.cpp#L40-L43
jrood-nrel approved these changes on Oct 6, 2025.

baperry2 approved these changes on Oct 6, 2025.
Collaborator

Thanks @bssoriano!
Description
When running PeleLMeX on Frontier at large scale (>256 nodes), some jobs failed during MPI initialization with errors such as:
srun: error: frontier02517: task 1529: Segmentation fault (core dumped)
srun: Terminating StepId=3784394.0

or

inet_recv: unexpected socket EOF on frontier0XXXX
_pmi_network_allgather: _pmi_inet_recv from target failed
Fatal error in PMPI_Init: Other MPI error, PMI_Allgather failed: -1

The failures were intermittent at small scale but consistent at 256+ nodes. HIP was being initialized implicitly and concurrently across all ranks after MPI_Init or during early GPU-aware MPI setup. At very large node counts, this caused contention and race conditions in ROCm's SDMA and HSA bring-up routines.
Fix
An explicit call to hipInit(0); is now issued before MPI or AMReX initialization. This ensures that the HIP runtime and HSA driver are fully brought up on each rank before MPI communication setup begins, preventing concurrent lazy initialization of HIP contexts across thousands of ranks. A sketch of the resulting initialization order is shown below.
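For reference, here is a minimal sketch of the ordering described above, assuming a standalone driver that calls MPI_Init and amrex::Initialize directly; the actual PeleLMeX entry point may be structured differently.

```cpp
#include <cstdio>
#include <hip/hip_runtime.h>
#include <mpi.h>
#include <AMReX.H>

int main(int argc, char* argv[])
{
    // Explicitly bring up the HIP runtime (and with it the HSA driver)
    // on this rank before any MPI traffic. The flags argument to
    // hipInit must currently be 0.
    hipError_t err = hipInit(0);
    if (err != hipSuccess) {
        std::fprintf(stderr, "hipInit failed: %s\n", hipGetErrorString(err));
        return 1;
    }

    // Only after HIP is initialized do we start MPI and AMReX, so that
    // GPU-aware MPI setup never triggers lazy HIP/HSA initialization.
    MPI_Init(&argc, &argv);
    amrex::Initialize(argc, argv);

    // ... application run ...

    amrex::Finalize();
    MPI_Finalize();
    return 0;
}
```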
Validation
The fix was tested on Frontier (OLCF) in 8 separate runs at 512 nodes.
All runs completed successfully without the previous startup crashes.