Generalize SIMD Single Source Design by ax3l · Pull Request #4924 · AMReX-Codes/amrex

ax3l · 2026-01-27T22:55:42Z

Add `ParallelForSIMD<T>`

This adds another template overload to ParallelForSIMD.

So far, a typical user pattern that gives maximum control is:

```C++
#ifdef AMREX_USE_SIMD
if constexpr (amrex::simd::is_vectorized<T>) {
    amrex::ParallelForSIMD<T::simd_width>(np, pushSingleParticle);
} else
#endif
{
    amrex::ParallelFor(np, pushSingleParticle);  // GPU & non-SIMD CPU
}
```

This simplifies it to:

```C++
amrex::ParallelForSIMD<T>(np, pushSingleParticle);
```

indicating there might be a SIMD path if T (e.g., a functor) implements it.

One can still call ParallelForSIMD with an explicit SIMD width (int), as before.
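The two call forms can coexist because C++ allows function templates to be overloaded on the kind of their first template parameter (a type versus an `int`), much like `std::get<0>` versus `std::get<T>`. Below is a minimal, standalone C++17 sketch of that dispatch, under stated assumptions: it does not use AMReX, and `parallel_for_simd`, the `is_vectorized` detection via a `static constexpr int simd_width` member, and the `f(i, width)` call convention are hypothetical stand-ins, not the actual `amrex::ParallelForSIMD` implementation.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for amrex::simd::is_vectorized<T>: a type opts
// into the SIMD path by exposing `static constexpr int simd_width` > 1.
template <typename T, typename = void>
struct is_vectorized : std::false_type {};

template <typename T>
struct is_vectorized<T, std::void_t<decltype(T::simd_width)>>
    : std::bool_constant<(T::simd_width > 1)> {};

template <typename T>
inline constexpr bool is_vectorized_v = is_vectorized<T>::value;

// Overload 1: explicit SIMD width W (an int), as before.
// Calls f(i, width) per chunk start i; width is a compile-time
// std::integral_constant so the body can pick a SIMD or scalar code path.
template <int W, typename N, typename F>
void parallel_for_simd (N n, F&& f)
{
    static_assert(W >= 1, "SIMD width must be positive");
    assert(n >= 0);
    N i = 0;
    for (; i + W <= n; i += W) { f(i, std::integral_constant<int, W>{}); }  // full chunks
    for (; i < n; ++i)         { f(i, std::integral_constant<int, 1>{}); }  // scalar remainder
}

// Overload 2: pass a type T (e.g., the functor); dispatch at compile time.
template <typename T, typename N, typename F>
void parallel_for_simd (N n, F&& f)
{
    if constexpr (is_vectorized_v<T>) {
        parallel_for_simd<T::simd_width>(n, std::forward<F>(f));
    } else {
        parallel_for_simd<1>(n, std::forward<F>(f));  // plain scalar loop
    }
}

// Hypothetical demo types.
struct ScalarPush {};                                     // no simd_width -> scalar
struct VecPush { static constexpr int simd_width = 4; };  // 4-wide SIMD

// Records the width passed to each call, to show which path was taken.
template <typename T>
std::vector<int> widths_used (int n)
{
    std::vector<int> w;
    parallel_for_simd<T>(n, [&](int /*i*/, auto width) { w.push_back(decltype(width)::value); });
    return w;
}
```

With this layout, `parallel_for_simd<VecPush>(10, f)` makes two 4-wide calls followed by two scalar remainder calls, while a type without `simd_width` falls back to a plain loop; `parallel_for_simd<4>(n, f)` still selects the `int` overload directly.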

Generalized Particle Load/Store

A typical SIMD user pattern for particle SoA kernels was:

```C++
SIMDParticleReal<SIMD_WIDTH> part_x;
part_x.copy_from(&m_part_x[i], stdx::element_aligned);

el.compute(part_x);

#ifdef AMREX_USE_SIMD
if constexpr (is_nth_arg_non_const<&el::compute, n>)
    part_x.copy_to(&m_part_x[i], stdx::element_aligned);
#endif
```

This simplifies it to:

```C++
decltype(auto) x = load_1d(m_part_x, i);

el.compute(x);

store_1d<&el::compute, 0>(x, m_part_x, i);
```

This form can now also be used on the GPU path, where, for now, the load is a transparent pointer forward/dereference and the store is a no-op.
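The load/store pair can be sketched in the same spirit: on the scalar (and GPU) path the load returns a reference into the array and the store does nothing, while on the SIMD path the load copies a chunk into a pack and the store writes it back only if the element's `compute` takes that argument by non-const reference. This is a hedged, self-contained C++17 illustration: `Pack`, `SimdIndex`, `nth_arg_non_const_v` (a stand-in for the `is_nth_arg_non_const` trait above), `Shift`, and `push` are hypothetical names, and the real AMReX signatures of `load_1d`/`store_1d` may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical SIMD pack of W doubles (stand-in for a real SIMD vector type).
template <int W>
struct Pack {
    std::array<double, W> v{};
    Pack& operator+= (double s) { for (auto& e : v) { e += s; } return *this; }
};

// Hypothetical SIMD index: start of a chunk of W contiguous elements.
template <int W>
struct SimdIndex { int i; };

// Scalar/GPU path: the "load" just forwards a reference to p[i] (no copy).
inline double& load_1d (double* p, int i) { return p[i]; }

// SIMD path: copy W contiguous elements into a pack.
template <int W>
Pack<W> load_1d (double* p, SimdIndex<W> idx)
{
    assert(p != nullptr);
    Pack<W> x;
    for (int k = 0; k < W; ++k) { x.v[k] = p[idx.i + k]; }
    return x;
}

// Parameter types of a (const or non-const) member function pointer.
template <typename F> struct member_args;
template <typename C, typename R, typename... A>
struct member_args<R (C::*)(A...)> { using type = std::tuple<A...>; };
template <typename C, typename R, typename... A>
struct member_args<R (C::*)(A...) const> { using type = std::tuple<A...>; };

template <auto MemFn, std::size_t N>
using nth_arg_t = std::tuple_element_t<N, typename member_args<decltype(MemFn)>::type>;

// True if the N-th parameter is a non-const lvalue reference (may be written).
template <auto MemFn, std::size_t N>
inline constexpr bool nth_arg_non_const_v =
    std::is_lvalue_reference_v<nth_arg_t<MemFn, N>> &&
    !std::is_const_v<std::remove_reference_t<nth_arg_t<MemFn, N>>>;

// Scalar/GPU path: x already aliases p[i], so there is nothing to write back.
template <auto MemFn, std::size_t N>
void store_1d (double& /*x*/, double* /*p*/, int /*i*/) {}

// SIMD path: write the pack back only if MemFn's N-th argument is mutable.
template <auto MemFn, std::size_t N, int W>
void store_1d (Pack<W> const& x, double* p, SimdIndex<W> idx)
{
    if constexpr (nth_arg_non_const_v<MemFn, N>) {
        for (int k = 0; k < W; ++k) { p[idx.i + k] = x.v[k]; }
    }
}

// Hypothetical element: shifts x by dx, so x is taken by non-const reference.
template <typename T>
struct Shift {
    double dx;
    void compute (T& x) const { x += dx; }
};
static_assert(nth_arg_non_const_v<&Shift<double>::compute, 0>);

// One kernel body for both paths; Idx is int (scalar) or SimdIndex<W> (SIMD).
template <typename T, typename Idx>
void push (Shift<T> const& el, double* part_x, Idx i)
{
    decltype(auto) x = load_1d(part_x, i);  // double& or Pack<W>
    el.compute(x);
    store_1d<&Shift<T>::compute, 0>(x, part_x, i);
}
```

Because `decltype(auto)` deduces `double&` for the scalar load, `el.compute(x)` writes straight into the array on that path; the copy-back loop only exists on the SIMD path, and only for mutable arguments.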

Combined

Using these patterns together, one can now write single-source SIMD-CPU/non-SIMD-CPU/GPU kernels, e.g., BLAST-ImpactX/impactx#1279

Follow-up to #4520

Checklist

The proposed changes:

fix a bug or incorrect behavior in AMReX
add new capabilities to AMReX
changes answers in the test suite to more than roundoff level
are likely to significantly affect the results of downstream AMReX users
include documentation in the code and/or rst files, if appropriate

This adds another template overload to `ParallelForSIMD`. So far, a typical user pattern that gives maximum control is:

```
if constexpr (amrex::simd::is_vectorized<T>) {
    amrex::ParallelForSIMD<T::simd_width>(np, pushSingleParticle); // TODO: test 2,4,8, ... and handle remainder
} else {
    amrex::ParallelFor(np, pushSingleParticle);
}
```

This simplifies it to:

```
amrex::ParallelForSIMD<T>(np, pushSingleParticle);
```

indicating there might be a templated path.

WeiqunZhang · 2026-02-02T17:43:39Z

I don't think `amrex::ParallelForSIMD<T>(np, pushSingleParticle);` compiles if `AMREX_USE_GPU` is true, because that function only exists in `AMReX_GpuLaunchFunctsC.H`, which is only included when `AMREX_USE_GPU` is not defined.

Src/Base/AMReX_GpuLaunchFunctsC.H

A typical SIMD user pattern for particle SoA kernels was:

```C++
SIMDParticleReal<SIMD_WIDTH> part_x;
part_x.copy_from(&m_part_x[i], stdx::element_aligned);
el.compute(part_x);
if constexpr (is_nth_arg_non_const<&el::compute, n>)
    part_x.copy_to(&m_part_x[i], stdx::element_aligned);
```

This simplifies it to:

```C++
decltype(auto) x = load_1d(m_part_x, i);
el.compute(x);
store_1d<&el::compute, 0>(x, m_part_x, i);
```

ax3l · 2026-02-03T08:33:37Z

Src/Base/AMReX_GpuLaunchFunctsSIMD.H

```diff
+AMREX_ATTRIBUTE_FLATTEN_FOR
+void ParallelForSIMD (N n, L && f) noexcept
+{
+#ifdef AMREX_USE_SIMD
```

One could maybe add another "and not GPU" check here, just to be sure that the GPU path is kept if someone turns on SIMD and GPU at the same time... Not sure if relevant.
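A minimal sketch of the extra "and not GPU" guard, written as a standalone C++ function so it compiles on its own. Only the macro names `AMREX_USE_SIMD` and `AMREX_USE_GPU` come from the discussion; the function `selected_path` and its return strings are hypothetical, and this is not the code that was merged.

```cpp
#include <cassert>
#include <string>

// Hypothetical guard: take the SIMD branch only if SIMD is enabled AND this
// is not a GPU build, so defining both AMREX_USE_SIMD and AMREX_USE_GPU
// keeps the GPU path. The macros are only tested for definedness here.
inline std::string selected_path ()
{
#if defined(AMREX_USE_SIMD) && !defined(AMREX_USE_GPU)
    return "simd-cpu";
#elif defined(AMREX_USE_GPU)
    return "gpu";
#else
    return "scalar-cpu";
#endif
}
```

Compiled without either macro, the sketch takes the plain scalar CPU branch.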

ax3l · 2026-02-06T00:47:50Z

Tests in #4938

## Summary

Needed to cover SIMD features and generic SIMD/non-SIMD patterns. Bravely vibe coded, but now reviewed and improved for sensibility.

## Additional background

#4924 #4607 #4600 #4520

## Checklist

The proposed changes:
- [ ] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX users
- [ ] include documentation in the code and/or rst files, if appropriate

---------

Co-authored-by: Weiqun Zhang <[email protected]>

ax3l added enhancement performance labels Jan 27, 2026

ax3l force-pushed the topic-simd-parallelfor branch from 5c3adb1 to 6cf39cb January 27, 2026 22:56

ax3l mentioned this pull request Jan 27, 2026

BeamOptics: Generalize SIMD Logic BLAST-ImpactX/impactx#1279

Merged

ax3l marked this pull request as ready for review January 28, 2026 20:32

ax3l force-pushed the topic-simd-parallelfor branch from 877dde1 to 40f36a6 January 28, 2026 20:32

ax3l requested review from AlexanderSinn, WeiqunZhang and atmyers January 28, 2026 20:33

ax3l mentioned this pull request Jan 28, 2026

Performance of WarpX in serial problems BLAST-WarpX/warpx#6487

Open

ax3l assigned WeiqunZhang Jan 29, 2026

ax3l commented Feb 3, 2026

Src/Base/AMReX_GpuLaunchFunctsC.H (outdated, resolved)

ax3l mentioned this pull request Feb 3, 2026

BeamOptics: Generalize SIMD Logic in AMReX BLAST-ImpactX/impactx#1289

Open

ax3l force-pushed the topic-simd-parallelfor branch from 40f36a6 to 8da037c February 3, 2026 08:17

ax3l force-pushed the topic-simd-parallelfor branch from 8da037c to 963e095 February 3, 2026 08:19

ax3l commented Feb 3, 2026

WeiqunZhang approved these changes Feb 4, 2026

ax3l merged commit 48c2794 into AMReX-Codes:development Feb 5, 2026
74 checks passed

ax3l deleted the topic-simd-parallelfor branch February 5, 2026 19:25

ax3l mentioned this pull request Feb 6, 2026

Test: SIMD #4938

Merged

ax3l mentioned this pull request Feb 26, 2026

SIMD: Add where and ternary operator #5095

Open

ax3l merged 2 commits into AMReX-Codes:development from ax3l:topic-simd-parallelfor
2 participants