Relaxing streamSynchronize locations #595
baperry2 merged 12 commits into AMReX-Combustion:development
@jrood-nrel - can you run the script you set up for the last PR on this?
Pull request overview
This PR optimizes GPU synchronization patterns and fuses sequential operations to improve performance. The main changes move streamSynchronize() calls from inside loops (after each iteration) to outside loops (after all iterations complete), and fuse sequential Copy+Multiply/Divide operations into single ParallelFor kernels.
Key changes:
- Relocated `streamSynchronize()` from inside loops to outside loops across multiple files, reducing synchronization overhead
- Fused sequential MultiFab operations (Copy+Multiply/Divide) into single ParallelFor kernels
- Refactored loop structures in projection code to enable better synchronization placement
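The effect of relocating the synchronization can be sketched in plain C++, without AMReX or a GPU. Everything below is a hypothetical stand-in: `ToyStream` is a toy queue that models a stream (it is not an AMReX type), and the flooring kernel is only loosely modeled on the kind of per-box work the PR touches. The sketch shows that both placements produce the same data, while the second performs one synchronization instead of one per box.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy model of a GPU stream (not an AMReX type): launched work
// is only queued, and runs when synchronize() is called.
struct ToyStream {
    std::vector<std::function<void()>> pending;
    int sync_count = 0;

    void launch(std::function<void()> kernel) {
        assert(kernel && "kernel must be callable");
        pending.push_back(std::move(kernel));
    }

    void synchronize() {
        for (auto& kernel : pending) kernel();  // drain all queued work
        pending.clear();
        ++sync_count;
    }
};

using Boxes = std::vector<std::vector<double>>;

// Queue a kernel that floors every value in one box at zero.
void launch_floor(ToyStream& stream, std::vector<double>& box) {
    stream.launch([&box] {
        for (double& v : box) v = std::max(v, 0.0);
    });
}

// Synchronize after every iteration; returns the number of syncs performed.
int floor_sync_inside(Boxes& boxes) {
    ToyStream stream;
    for (auto& box : boxes) {
        launch_floor(stream, box);
        stream.synchronize();
    }
    return stream.sync_count;
}

// Queue all iterations first, then synchronize once after the loop.
int floor_sync_outside(Boxes& boxes) {
    ToyStream stream;
    for (auto& box : boxes) launch_floor(stream, box);
    stream.synchronize();
    return stream.sync_count;
}
```

In this model the single sync is only safe because nothing on the host reads `boxes` between the launches and the final `synchronize()`; the real code has the same constraint, which is presumably why the projection changes keep syncs ahead of CPU operations.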
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| PeleLMeX_Utils.cpp | Moved streamSynchronize outside loop in floorSpecies function |
| PeleLMeX_UMac.cpp | Consolidated streamSynchronize calls to end of addChiIncrement function |
| PeleLMeX_TransportProp.cpp | Fused lambda_turb computation kernel and moved sync outside loop; fused diff_aux computation with cp calculation |
| PeleLMeX_Soot.cpp | Moved streamSynchronize outside loop in clipSootMoments function |
| PeleLMeX_Projection.cpp | Refactored loop structure and added conditional syncs before CPU operations; moved final sync outside loops |
| PeleLMeX_Forces.cpp | Moved streamSynchronize outside loop in addSpark function |
| PeleLMeX_DiffusionOp.cpp | Consolidated streamSynchronize calls outside loops in diffuse_scalar and compute_divtau |
| PeleLMeX_Diffusion.cpp | Fused spec_boundary computation kernels and moved syncs outside loops in multiple functions |
| PeleLMeX_Advection.cpp | Moved streamSynchronize outside loops in updateVelocity, getScalarAdvForce, and updateScalarComp |
Sure, I can try to check for diffs.
I checked the diffs on this, and no specific file causes any significant differences, so I'm satisfied.
#616 has the fixes for
This PR is a follow-up to #556, but it relaxes where the streamSynchronize calls are performed. I've also fused a couple of other kernels (e.g. sequential Copy+Multiply/Divide).
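As a rough illustration of the Copy+Multiply fusion, here is a minimal plain C++ sketch. It does not use the AMReX API: the function names are hypothetical, and `std::vector` stands in for the MultiFab data. The unfused version makes two passes over the destination (standing in for two kernel launches), while the fused version does the same arithmetic in one pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: one pass copies, a second pass scales (two separate "kernels").
void copy_then_multiply(const std::vector<double>& src,
                        std::vector<double>& dst, double factor) {
    assert(src.size() == dst.size());
    for (std::size_t i = 0; i < src.size(); ++i) dst[i] = src[i];
    for (std::size_t i = 0; i < dst.size(); ++i) dst[i] *= factor;
}

// Fused: a single pass reads src once and writes the scaled value to dst.
void fused_copy_multiply(const std::vector<double>& src,
                         std::vector<double>& dst, double factor) {
    assert(src.size() == dst.size());
    for (std::size_t i = 0; i < src.size(); ++i) dst[i] = src[i] * factor;
}
```

Both functions leave `dst` with the same values; the fused form simply avoids re-reading and re-writing `dst`, and on a GPU it would also save one kernel launch.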