Generalize SIMD Single Source Design #4924

Merged
ax3l merged 2 commits into AMReX-Codes:development from ax3l:topic-simd-parallelfor
Feb 5, 2026

@ax3l
Member

@ax3l ax3l commented Jan 27, 2026

Add ParallelForSIMD<T>

This adds another template overload to ParallelForSIMD.

A typical user pattern for maximum control so far is:

#ifdef AMREX_USE_SIMD
if constexpr (amrex::simd::is_vectorized<T>) {
    amrex::ParallelForSIMD<T::simd_width>(np, pushSingleParticle);
} else
#endif
{
    amrex::ParallelFor(np, pushSingleParticle);  // GPU & non-SIMD CPU
}

This simplifies it to:

amrex::ParallelForSIMD<T>(np, pushSingleParticle);

indicating there might be a SIMD path if T (e.g., a functor) implements it.

One can still call ParallelForSIMD with an explicit SIMD width (int), as before.
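
The dispatch described above can be sketched outside of AMReX roughly as follows. This is a minimal, self-contained illustration and not the AMReX implementation: the `sketch` namespace, the `is_vectorized` detection via a static `simd_width` member, the two example functors, and the convention of passing the chunk width to the callback as a `std::integral_constant` are all assumptions made for this example. It requires C++17.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical trait: true when T exposes a static `simd_width` member.
template <typename T, typename = void>
struct is_vectorized_t : std::false_type {};
template <typename T>
struct is_vectorized_t<T, std::void_t<decltype(T::simd_width)>> : std::true_type {};
template <typename T>
inline constexpr bool is_vectorized = is_vectorized_t<T>::value;

// Scalar loop: the callback sees one index per call.
template <typename F>
void ParallelFor (int n, F&& f)
{
    for (int i = 0; i < n; ++i) { f(i); }
}

// Explicit-width loop: full chunks of WIDTH, then a width-1 remainder.
// Passing the width as std::integral_constant is an assumption of this sketch.
template <int WIDTH, typename F>
void ParallelForSIMD (int n, F&& f)
{
    int i = 0;
    for (; i + WIDTH <= n; i += WIDTH) { f(i, std::integral_constant<int, WIDTH>{}); }
    for (; i < n; ++i) { f(i, std::integral_constant<int, 1>{}); }
}

// Type-based overload: takes the width from T, or falls back to the scalar loop.
template <typename T, typename F>
void ParallelForSIMD (int n, F&& f)
{
    if constexpr (is_vectorized<T>) {
        ParallelForSIMD<T::simd_width>(n, std::forward<F>(f));
    } else {
        ParallelFor(n, std::forward<F>(f));
    }
}

// Example functor without simd_width: always runs on the scalar path.
struct ScalarPush {
    std::vector<double>* x;
    int* calls;
    void operator() (int i) const { (*x)[i] += 1.0; ++(*calls); }
};

// Example functor with simd_width: handles W consecutive elements per call.
struct VectorPush {
    static constexpr int simd_width = 4;
    std::vector<double>* x;
    int* calls;
    template <int W>
    void operator() (int i, std::integral_constant<int, W>) const
    {
        for (int k = 0; k < W; ++k) { (*x)[i + k] += 1.0; }  // stands in for one SIMD operation
        ++(*calls);
    }
};

static_assert(!is_vectorized<ScalarPush>);
static_assert(is_vectorized<VectorPush>);

} // namespace sketch
```

In this sketch, `sketch::ParallelForSIMD<sketch::VectorPush>(10, f)` invokes the functor four times (two chunks of four plus two single-element remainder calls), while `sketch::ParallelForSIMD<sketch::ScalarPush>(10, f)` invokes it ten times. The two `ParallelForSIMD` templates can coexist because one takes a non-type (`int`) and the other a type as its first template parameter.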

Generalized Particle Load/Store

A typical SIMD user pattern for particle SoA kernels was:

SIMDParticleReal<SIMD_WIDTH> part_x;
part_x.copy_from(&m_part_x[i], stdx::element_aligned);

el.compute(part_x);

#ifdef AMREX_USE_SIMD
if constexpr (is_nth_arg_non_const<&el::compute, n>)
    part_x.copy_to(&m_part_x[i], stdx::element_aligned);
#endif

This simplifies it to:

decltype(auto) x = load_1d(m_part_x, i);

el.compute(x);

store_1d<&el::compute, 0>(x, m_part_x, i);

and can now also be used on the GPU path, where for now the load is a transparent pointer forward/dereference and the store is a no-op.
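
To illustrate the load/compute/store pattern, here is a minimal stand-alone sketch. It is not the AMReX implementation and the real signatures may differ: `Pack` is a toy replacement for a SIMD register type, `Drift`, `push`, and `method_args` are hypothetical helpers, and selecting the width through an explicit template parameter on `load_1d` is an assumption of this example. The scalar path returns a reference into the array (so the store has nothing to do), while the packed path copies in and writes back only arguments that the method takes as non-const references. It requires C++17.

```cpp
#include <array>
#include <cassert>
#include <tuple>
#include <type_traits>

namespace sketch {

// Toy stand-in for a SIMD register holding W doubles.
template <int W>
struct Pack {
    std::array<double, W> v{};
    void copy_from (double const* p) { for (int k = 0; k < W; ++k) { v[k] = p[k]; } }
    void copy_to (double* p) const { for (int k = 0; k < W; ++k) { p[k] = v[k]; } }
    Pack& operator+= (Pack const& o) { for (int k = 0; k < W; ++k) { v[k] += o.v[k]; } return *this; }
    friend Pack operator* (double a, Pack p) { for (auto& e : p.v) { e *= a; } return p; }
};

// Argument types of a const member function, collected in a tuple.
template <typename M> struct method_args;
template <typename C, typename R, typename... A>
struct method_args<R (C::*)(A...) const> { using type = std::tuple<A...>; };

// True if the N-th argument of Method is not const-qualified (i.e., an output).
template <auto Method, int N>
inline constexpr bool is_nth_arg_non_const = !std::is_const_v<std::remove_reference_t<
    std::tuple_element_t<N, typename method_args<decltype(Method)>::type>>>;

// Load: width 1 forwards a reference into the array; otherwise copy into a Pack.
template <int W>
decltype(auto) load_1d (double* ptr, int i)
{
    if constexpr (W == 1) { return (ptr[i]); }
    else { Pack<W> p; p.copy_from(ptr + i); return p; }
}

// Store: write a Pack back only if argument N of Method is an output.
// On the scalar path x already aliases ptr[i], so nothing happens.
template <auto Method, int N, typename T>
void store_1d ([[maybe_unused]] T const& x, [[maybe_unused]] double* ptr, [[maybe_unused]] int i)
{
    if constexpr (!std::is_same_v<T, double> && is_nth_arg_non_const<Method, N>) {
        x.copy_to(ptr + i);
    }
}

// Hypothetical element: advances x by ds * px; x is an output, px an input.
template <typename T>
struct Drift {
    double ds;
    void compute (T& x, T const& px) const { x += ds * px; }
};

// Single-source kernel body for one index (W == 1) or one chunk (W > 1).
template <int W>
void push (double ds, double* x, double* px, int i)
{
    using T = std::remove_reference_t<decltype(load_1d<W>(x, i))>;
    Drift<T> el{ds};

    decltype(auto) part_x  = load_1d<W>(x, i);
    decltype(auto) part_px = load_1d<W>(px, i);

    el.compute(part_x, part_px);

    store_1d<&Drift<T>::compute, 0>(part_x, x, i);    // T&: written back
    store_1d<&Drift<T>::compute, 1>(part_px, px, i);  // T const&: skipped
}

static_assert(is_nth_arg_non_const<&Drift<double>::compute, 0>);
static_assert(!is_nth_arg_non_const<&Drift<double>::compute, 1>);

} // namespace sketch
```

Calling `sketch::push<4>(ds, x, px, 0)` followed by `sketch::push<1>(ds, x, px, 4)` on five-element arrays processes one chunk of four and one scalar remainder with the same kernel body, which is the single-source property the pattern is after.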

Combined

Using these patterns together, one can now write single-source SIMD-CPU/non-SIMD-CPU/GPU kernels, e.g., BLAST-ImpactX/impactx#1279

Follow-up to #4520

Checklist

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • changes answers in the test suite to more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • include documentation in the code and/or rst files, if appropriate

This adds another template overload to `ParallelForSIMD`.

A typical user pattern for maximum control so far is:
```
if constexpr (amrex::simd::is_vectorized<T>) {
    amrex::ParallelForSIMD<T::simd_width>(np, pushSingleParticle);  // TODO: test 2,4,8, ... and handle remainder
} else
{
    amrex::ParallelFor(np, pushSingleParticle);
}
```

This simplifies it to:
```
amrex::ParallelForSIMD<T>(np, pushSingleParticle);
```
indicating there might be a templated path.
@ax3l ax3l marked this pull request as ready for review January 28, 2026 20:32
@ax3l ax3l force-pushed the topic-simd-parallelfor branch from 877dde1 to 40f36a6 Compare January 28, 2026 20:32
@WeiqunZhang
Member

I don't think amrex::ParallelForSIMD<T>(np, pushSingleParticle); compiles if AMREX_USE_GPU is true, because that function only exists in AMReX_GpuLaunchFunctsC.H, which is only included when AMREX_USE_GPU is not defined.

@ax3l ax3l force-pushed the topic-simd-parallelfor branch from 8da037c to 963e095 Compare February 3, 2026 08:19
AMREX_ATTRIBUTE_FLATTEN_FOR
void ParallelForSIMD (N n, L && f) noexcept
{
#ifdef AMREX_USE_SIMD
Member Author

One could maybe add another "and not GPU" here, just to be super sure if someone turns on SIMD and GPU at the same time to stay with the GPU path... Not sure if relevant.
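
The extra guard suggested here could look roughly like the following stand-alone sketch. It is only an illustration of the preprocessor condition, not the merged code: the `sketch` namespace, the `Path` enum, and `selected_path` are hypothetical, and both macros are defined by hand to simulate a build with SIMD and GPU enabled at the same time.

```cpp
#include <cassert>

// Simulate a build where both options are switched on
// (normally these would be set by the build system).
#define AMREX_USE_SIMD
#define AMREX_USE_GPU

namespace sketch {

enum class Path { simd_cpu, generic };

// Hypothetical helper: reports which branch the guard below selects.
constexpr Path selected_path ()
{
#if defined(AMREX_USE_SIMD) && !defined(AMREX_USE_GPU)
    return Path::simd_cpu;   // SIMD only when no GPU backend is active
#else
    return Path::generic;    // GPU or non-SIMD CPU
#endif
}

// With both macros defined, the generic (GPU) path is chosen.
static_assert(selected_path() == Path::generic);

} // namespace sketch
```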

@ax3l ax3l merged commit 48c2794 into AMReX-Codes:development Feb 5, 2026
74 checks passed
@ax3l ax3l deleted the topic-simd-parallelfor branch February 5, 2026 19:25
@ax3l ax3l mentioned this pull request Feb 6, 2026
5 tasks
@ax3l
Member Author

ax3l commented Feb 6, 2026

Tests in #4938

ax3l added a commit that referenced this pull request Feb 7, 2026
## Summary

Needed to cover SIMD features and generic SIMD/non-SIMD patterns.
Bravely vibe coded, but now reviewed and improved for sensibility.

## Additional background

#4924 #4607 #4600  #4520

## Checklist

The proposed changes:
- [ ] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate

---------

Co-authored-by: Weiqun Zhang <[email protected]>