Relaxing streamSynchronize locations #595

Merged
baperry2 merged 12 commits into AMReX-Combustion:development from ThomasHowarth:stream_sync
Jan 22, 2026

@ThomasHowarth
Contributor

This PR is a follow-up to #556, but it relaxes where the streamSynchronize calls are performed. I've also fused a couple of other kernels (e.g. sequential Copy + Multiply/Divide).

@baperry2 baperry2 requested a review from jrood-nrel November 24, 2025 17:01
@baperry2
Collaborator

@jrood-nrel - can you run the script you set up for the last PR on this?

Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes GPU synchronization patterns and fuses sequential operations to improve performance. The main changes move streamSynchronize() calls from inside loops (after each iteration) to outside loops (after all iterations complete), and fuse sequential Copy+Multiply/Divide operations into single ParallelFor kernels.
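The synchronization change described above can be illustrated with a minimal, self-contained sketch. It does not use AMReX: `ToyStream`, `scale_sync_inside`, and `scale_sync_outside` are hypothetical names invented for illustration, and the toy stream simply queues work and runs it when synchronized. The sketch assumes that nothing on the host reads the results between iterations, which is the condition under which moving the synchronization out of the loop is safe.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a GPU stream (hypothetical, not an AMReX type):
// kernels are queued on launch and only run when synchronize() is called.
struct ToyStream {
    std::vector<std::function<void()>> pending;
    int sync_count = 0;

    void launch(std::function<void()> kernel) { pending.push_back(std::move(kernel)); }

    void synchronize() {
        for (auto& kernel : pending) { kernel(); }
        pending.clear();
        ++sync_count;
    }
};

// Before: synchronize after every box, so the host waits once per iteration.
inline int scale_sync_inside(std::vector<std::vector<double>>& boxes, double factor) {
    ToyStream stream;
    for (auto& box : boxes) {
        stream.launch([&box, factor] { for (auto& v : box) { v *= factor; } });
        stream.synchronize();
    }
    return stream.sync_count;
}

// After: queue every kernel first, then synchronize once after the loop.
// Safe here only because the host does not read `boxes` inside the loop.
inline int scale_sync_outside(std::vector<std::vector<double>>& boxes, double factor) {
    ToyStream stream;
    for (auto& box : boxes) {
        stream.launch([&box, factor] { for (auto& v : box) { v *= factor; } });
    }
    stream.synchronize();
    return stream.sync_count;
}
```

Both functions leave `boxes` in the same final state; the second performs one synchronization instead of one per box. On a real device the benefit comes from letting the host keep queuing kernels without stalling, which this toy model does not attempt to measure.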

Key changes:

  • Relocated streamSynchronize() from inside loops to outside loops across multiple files, reducing synchronization overhead
  • Fused sequential MultiFab operations (Copy+Multiply/Divide) into single ParallelFor kernels
  • Refactored loop structures in projection code to enable better synchronization placement
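As a rough illustration of the kernel fusion mentioned in the list above, the sketch below contrasts a copy followed by a multiply with a single fused pass. It uses plain `std::vector` rather than MultiFab, the function names `copy_then_multiply` and `fused_copy_multiply` are hypothetical, and all three arrays are assumed to have the same length.

```cpp
#include <cstddef>
#include <vector>

// Unfused: two passes over the data, analogous to a Copy followed by a Multiply.
// Assumes dst, src, and fac all have the same size.
inline void copy_then_multiply(std::vector<double>& dst,
                               const std::vector<double>& src,
                               const std::vector<double>& fac) {
    for (std::size_t i = 0; i < dst.size(); ++i) { dst[i] = src[i]; }
    for (std::size_t i = 0; i < dst.size(); ++i) { dst[i] *= fac[i]; }
}

// Fused: one pass that reads src and fac and writes dst once,
// analogous to doing both operations inside a single kernel.
inline void fused_copy_multiply(std::vector<double>& dst,
                                const std::vector<double>& src,
                                const std::vector<double>& fac) {
    for (std::size_t i = 0; i < dst.size(); ++i) { dst[i] = src[i] * fac[i]; }
}
```

The two versions produce the same values; the fused form touches `dst` once instead of twice and, on a GPU, would correspond to one kernel launch rather than two.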

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| PeleLMeX_Utils.cpp | Moved streamSynchronize outside loop in floorSpecies function |
| PeleLMeX_UMac.cpp | Consolidated streamSynchronize calls to end of addChiIncrement function |
| PeleLMeX_TransportProp.cpp | Fused lambda_turb computation kernel and moved sync outside loop; fused diff_aux computation with cp calculation |
| PeleLMeX_Soot.cpp | Moved streamSynchronize outside loop in clipSootMoments function |
| PeleLMeX_Projection.cpp | Refactored loop structure and added conditional syncs before CPU operations; moved final sync outside loops |
| PeleLMeX_Forces.cpp | Moved streamSynchronize outside loop in addSpark function |
| PeleLMeX_DiffusionOp.cpp | Consolidated streamSynchronize calls outside loops in diffuse_scalar and compute_divtau |
| PeleLMeX_Diffusion.cpp | Fused spec_boundary computation kernels and moved syncs outside loops in multiple functions |
| PeleLMeX_Advection.cpp | Moved streamSynchronize outside loops in updateVelocity, getScalarAdvForce, and updateScalarComp |

@jrood-nrel
Contributor

Sure, I can try to check for diffs.

@jrood-nrel
Contributor

jrood-nrel commented Jan 21, 2026

I checked the diffs on this, and no specific file produces any significant differences, so I'm satisfied.

@baperry2 baperry2 enabled auto-merge (squash) January 21, 2026 23:15
@jrood-nrel
Contributor

#616 has the fixes for test_masscons.py

@baperry2 baperry2 merged commit 68e5998 into AMReX-Combustion:development Jan 22, 2026
34 checks passed
@ThomasHowarth ThomasHowarth deleted the stream_sync branch January 26, 2026 17:05
4 participants