MFIter: Device sync optimization #4897

Merged: WeiqunZhang merged 1 commit into AMReX-Codes:development from WeiqunZhang:mfiter_device_sync_pre on Jan 15, 2026


Conversation

@WeiqunZhang
Member

For safety reasons, we have a sync before starting jobs on non-default streams (i.e., streams with index > 0) in MFIter. But the previous approach of delaying the sync until necessary is actually slower than performing a sync on stream 0 (i.e., the default stream in amrex, not to be confused with CUDA's legacy default stream) at the beginning. This is because stream sync is relatively cheap when there are no active jobs in that stream. For example, the sequence of sync, kernel, kernel, and sync is faster than the sequence of kernel, sync, kernel, and sync, where there are no previous active jobs.

I tested this on Perlmutter and Frontier. The new approach is about 5-10% faster for the following code.

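// Benchmark: a 256^3 domain split into boxes of at most 128 cells per side.
BoxArray ba(Box(IntVect(0), IntVect(255)));
ba.maxSize(128);
MultiFab mf(ba, DistributionMapping{ba}, 1, 0); // 1 component, 0 ghost cells
mf.setVal(1); // initial call outside the timed region
auto t0 = amrex::second(); // start time
for (int i = 0; i < 10; ++i)
{
    mf.setVal(t0);
}
auto t1 = amrex::second(); // elapsed time is t1 - t0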

@WeiqunZhang WeiqunZhang requested a review from atmyers January 14, 2026 22:37
@WeiqunZhang WeiqunZhang merged commit 46479e6 into AMReX-Codes:development Jan 15, 2026
74 checks passed
@WeiqunZhang WeiqunZhang deleted the mfiter_device_sync_pre branch January 15, 2026 01:37
WeiqunZhang added a commit to WeiqunZhang/amrex that referenced this pull request Jan 15, 2026
The issue was an out-of-bounds error in MFIter when the local size is zero.
WeiqunZhang added a commit that referenced this pull request Jan 15, 2026
The issue was an out-of-bounds error in MFIter when the local size is zero.