Kernel fusing and stream synchronizes #556

Merged
baperry2 merged 102 commits into AMReX-Combustion:development from ThomasHowarth:kernel_fusing
Nov 21, 2025

@ThomasHowarth
Contributor

@ThomasHowarth ThomasHowarth commented Aug 19, 2025

This is a pretty extensive PR, which may take a fair amount of time to review.
It attempts to make everything consistent and make (micro-)optimisations where possible.
I'll start with a draft as I continue working on it. Here's a preliminary list of changes:

If any additional consistent patterns emerge, I'll add them here.

@jrood-nrel
Contributor

I talked about this with @marchdf today. I'm hoping to set up an automated test script where we can plug in each of these file changes one by one and quantify the diffs in some kind of plot, since we expect the results to be nondeterministic even without the changes. I'm going to work on setting up some testing for this, because I would like to get these changes in, but we want to be careful.

@jrood-nrel
Contributor

OK, I wrote a script that checked the changes in each file one at a time and compared the results against reference plot files for the flamesheet case and an EB case. I also ran the cases with no changes twice to understand the inherent nondeterminism. Here are a few of the main variables, all concatenated together; the absolute errors (on the left) are all not far from machine precision, so I think I'm satisfied.

 x_velocity                                       0                         0
 y_velocity                                       0                         0
 z_velocity                                       0                         0
 density                                          0                         0
 x_velocity                                       0                         0
 y_velocity                                       0                         0
 z_velocity                                       0                         0
 density                                          0                         0
 x_velocity                                       0                         0
 y_velocity                                       0                         0
 z_velocity                                       0                         0
 density                                          0                         0
 x_velocity                                       0                         0
 y_velocity                                       0                         0
 z_velocity                                       0                         0
 density                                          0                         0
 x_velocity                         2.753353101e-14           3.924120597e-15
 y_velocity                         7.993605777e-15           2.344838948e-15
 z_velocity                         2.888299106e-15            0.001072104352
 density                            7.549516567e-15           5.147540068e-15
 x_velocity                         7.460698725e-14           9.827722554e-15
 y_velocity                         1.776356839e-14           4.163458001e-15
 z_velocity                         6.996735778e-15            0.001245946531
 density                            2.087219286e-14           1.423143431e-14
 x_velocity                         5.217018224e-15           1.807580086e-14
 y_velocity                         6.806190946e-15           2.245464095e-14
 z_velocity                         1.132427485e-14           1.308674338e-14
 density                            3.108624469e-15           2.788503014e-15
 x_velocity                         1.608793394e-14           4.385254253e-14
 y_velocity                         1.812113827e-14           5.050365669e-14
 z_velocity                         2.742250871e-14           3.077952812e-14
 density                            2.664535259e-15           2.389904308e-15
 x_velocity                          5.41788836e-14           7.721656659e-15
 y_velocity                         1.643130076e-14           4.819946727e-15
 z_velocity                         2.148005881e-15           0.0007977283194
 density                            1.421085472e-14           9.689487188e-15
 x_velocity                         1.350031198e-13           1.778349796e-14
 y_velocity                          4.04121181e-14           9.471866952e-15
 z_velocity                         1.063935007e-14            0.001894771897
 density                            3.264055692e-14           2.225554088e-14
 x_velocity                         3.450039733e-15           1.195361575e-14
 y_velocity                         5.113847917e-15           1.687134842e-14
 z_velocity                         8.659739592e-15           1.000750964e-14
 density                            1.998401444e-15           1.792609081e-15
 x_velocity                         1.187570008e-14           3.237082181e-14
 y_velocity                         1.424853074e-14           3.971069003e-14
 z_velocity                         2.398081733e-14           2.691651042e-14
 density                            2.886579864e-15           2.589063001e-15
 x_velocity                          1.33226763e-14           1.898768031e-15
 y_velocity                         5.107025913e-15            1.49809155e-15
 z_velocity                          2.18772152e-15           0.0008119462849
 density                            5.107025913e-15           3.482159458e-15
 x_velocity                         5.950795412e-14           7.838778704e-15
 y_velocity                         1.287858709e-14           3.018507051e-15
 z_velocity                         6.089509513e-15            0.001084458006
 density                            1.976196984e-14           1.347444312e-14
 x_velocity                         3.307141887e-15           1.145850668e-14
 y_velocity                         4.737719548e-15           1.563044473e-14
 z_velocity                         7.771561172e-15           8.981098398e-15
 density                            2.664535259e-15           2.390145441e-15
 x_velocity                         1.217255463e-14           3.317998892e-14
 y_velocity                         1.554981899e-14           4.333738356e-14
 z_velocity                         2.053912596e-14           2.305349272e-14
 density                            2.442490654e-15           2.190745616e-15
 x_velocity                         1.953992523e-14           2.784859779e-15
 y_velocity                         8.881784197e-15           2.605376609e-15
 z_velocity                         2.801130485e-15            0.001040343608
 density                            4.440892099e-15           3.027964746e-15
 x_velocity                         5.595524044e-14           7.370791916e-15
 y_velocity                         1.643130076e-14           3.851198651e-15
 z_velocity                         5.869397026e-15            0.001045194413
 density                            1.154631946e-14            7.87270834e-15
 x_velocity                         3.119574911e-15           1.080862908e-14
 y_velocity                         5.472347835e-15           1.805409322e-14
 z_velocity                         7.882583475e-15           9.109399803e-15
 density                            2.442490654e-15           2.190966654e-15
 x_velocity                         1.206402599e-14            3.28841612e-14
 y_velocity                         1.696129944e-14           4.727118304e-14
 z_velocity                         2.298161661e-14           2.579498915e-14
 density                            3.108624469e-15           2.788221693e-15
 x_velocity                         1.509903313e-14           2.151937102e-15
 y_velocity                         3.774758284e-15           1.107285059e-15
 z_velocity                         2.133743172e-15           0.0007919078089
 density                            4.662936703e-15           3.179362983e-15
 x_velocity                         6.394884622e-14           8.423762189e-15
 y_velocity                         1.065814104e-14           2.498074801e-15
 z_velocity                           6.7712925e-15            0.001205785866
 density                            1.998401444e-14           1.362584136e-14
 x_velocity                         4.166263688e-15            1.44351715e-14
 y_velocity                         5.432979438e-15           1.792421099e-14
 z_velocity                         9.658940314e-15           1.116222229e-14
 density                            2.664535259e-15           2.390145441e-15
 x_velocity                         1.587749029e-14           4.327891457e-14
 y_velocity                         2.052584448e-14           5.720557878e-14
 z_velocity                         2.997602166e-14           3.364563802e-14
 density                            3.108624469e-15           2.788221693e-15
 x_velocity                         1.687538997e-14           2.405106172e-15
 y_velocity                         3.996802889e-15           1.172419474e-15
 z_velocity                         2.792042083e-15            0.001036121686
 density                            4.662936703e-15           3.179362983e-15
 x_velocity                         6.394884622e-14           8.423762189e-15
 y_velocity                         1.065814104e-14           2.498074801e-15
 z_velocity                         7.985377386e-15            0.001422087608
 density                            2.065014826e-14           1.408003607e-14
 x_velocity                         4.633337984e-15           1.605347943e-14
 y_velocity                         5.869951877e-15           1.936584836e-14
 z_velocity                         1.076916334e-14           1.244523635e-14
 density                            2.664535259e-15           2.390145441e-15
 x_velocity                         1.581525709e-14            4.31092791e-14
 y_velocity                         2.442547575e-14           6.807386066e-14
 z_velocity                          3.14193116e-14           3.526561319e-14
 density                            2.664535259e-15           2.389904308e-15
 x_velocity                         1.154631946e-14            1.64559896e-15
 y_velocity                         3.497202528e-15            1.02586704e-15
 z_velocity                         2.245761516e-15           0.0008336199059
 density                            3.996802889e-15           2.725168272e-15
 x_velocity                          6.30606678e-14           8.306765492e-15
 y_velocity                         2.442490654e-14           5.724754751e-15
 z_velocity                         6.928475925e-15            0.001233928907
 density                            2.042810365e-14           1.392863783e-14
 x_velocity                         5.155598171e-15           1.786299412e-14
 y_velocity                         7.267624003e-15           2.397697755e-14
 z_velocity                         9.880984919e-15           1.141882511e-14
 density                            1.998401444e-15           1.792609081e-15
 x_velocity                         1.601171452e-14           4.364478341e-14
 y_velocity                         2.333587967e-14           6.503715373e-14
 z_velocity                         3.186340081e-14           3.576406708e-14
 density                            2.886579864e-15           2.589063001e-15
 x_velocity                         3.641531521e-14           5.189965951e-15
 y_velocity                         1.065814104e-14           3.126451931e-15
 z_velocity                         2.361892463e-15           0.0008766049183
 density                            1.088018564e-14           7.418513628e-15
 x_velocity                         8.615330671e-14           1.134867962e-14
 y_velocity                         2.042810365e-14           4.787976701e-15
 z_velocity                         9.271752268e-15             0.00165114229
 density                            2.176037128e-14           1.483702726e-14
 x_velocity                         4.341470759e-15            1.50422248e-14
 y_velocity                         6.141774914e-15           2.026263318e-14
 z_velocity                         1.099120794e-14           1.270183916e-14
 density                            1.998401444e-15           1.792609081e-15
 x_velocity                         1.707206425e-14           4.653508813e-14
 y_velocity                         2.159764609e-14           6.019269249e-14
 z_velocity                         3.053113318e-14           3.426870539e-14
 density                            3.330669074e-15           2.987380385e-15
 x_velocity                          5.41788836e-14           7.721656659e-15
 y_velocity                         1.509903313e-14           4.429140235e-15
 z_velocity                         2.767343465e-15            0.001026965487
 density                            1.265654248e-14           8.629699527e-15
 x_velocity                         1.358912982e-13           1.790049465e-14
 y_velocity                           3.5971226e-14           8.431002452e-15
 z_velocity                         1.076910238e-14            0.001917871727
 density                            3.108624469e-14           2.119575322e-14
 x_velocity                         3.292938838e-15            1.14092963e-14
 y_velocity                         5.074011957e-15           1.673992363e-14
 z_velocity                          8.54871729e-15           9.879208237e-15
 density                            2.664535259e-15           2.390145441e-15
 x_velocity                         1.292260569e-14           3.522448054e-14
 y_velocity                         1.625582264e-14           4.530501747e-14
 z_velocity                         2.475797345e-14           2.778880474e-14
 density                            2.886579864e-15           2.589063001e-15
 x_velocity                         2.575717417e-14           3.670951526e-15
 y_velocity                         7.105427358e-15           2.084301287e-15
 z_velocity                          2.30838752e-15           0.0008567257923
 density                            5.551115123e-15           3.784955933e-15
 x_velocity                         6.217248938e-14           8.189768795e-15
 y_velocity                         1.643130076e-14           3.851198651e-15
 z_velocity                         6.505472208e-15            0.001158568658
 density                            2.131628207e-14           1.453423078e-14
 x_velocity                         3.908982513e-15           1.354374979e-14
 y_velocity                         5.288060571e-15             1.7446102e-14
 z_velocity                         7.993605777e-15           9.237701209e-15
 density                            3.108624469e-15           2.788503014e-15
 x_velocity                         1.274425444e-14            3.47383301e-14
 y_velocity                         1.639703998e-14           4.569859053e-14
 z_velocity                         2.298161661e-14           2.579498915e-14
 density                            2.886579864e-15           2.589063001e-15
 x_velocity                         3.197442311e-14           4.557043274e-15
 y_velocity                         9.325873407e-15            2.73564544e-15
 z_velocity                         2.557646086e-15           0.0009494356442
 density                            9.992007222e-15           6.812920679e-15
 x_velocity                         8.082423619e-14           1.064669943e-14
 y_velocity                         2.220446049e-14           5.204322501e-15
 z_velocity                         7.426920758e-15             0.00132262767
 density                             2.24265051e-14           1.529122197e-14
 x_velocity                          4.96087546e-15            1.71883235e-14
 y_velocity                          5.74214477e-15           1.894419362e-14
 z_velocity                         9.769962617e-15            1.12905237e-14
 density                            3.108624469e-15           2.788503014e-15
 x_velocity                         1.589288597e-14           4.332088014e-14
 y_velocity                         2.478989304e-14           6.908949255e-14
 z_velocity                         2.986499936e-14           3.352102455e-14
 density                            2.886579864e-15           2.589063001e-15
 x_velocity                         2.842170943e-14           4.050705132e-15
 y_velocity                         9.325873407e-15            2.73564544e-15
 z_velocity                         1.719649791e-15           0.0006386880725
 density                            9.547918012e-15           6.510124204e-15
 x_velocity                         8.260059303e-14           1.088069283e-14
 y_velocity                          2.26485497e-14           5.308408951e-15
 z_velocity                         7.650887193e-15            0.001362488885
 density                             2.24265051e-14           1.529122197e-14
 x_velocity                         4.769513777e-15           1.652529808e-14
 y_velocity                         6.841708732e-15           2.257181943e-14
 z_velocity                         1.310063169e-14           1.513956587e-14
 density                            3.108624469e-15           2.788503014e-15
 x_velocity                         1.817698485e-14           4.954688428e-14
 y_velocity                         1.959313246e-14           5.460610809e-14
 z_velocity                         3.264055692e-14            3.66363614e-14
 density                            2.664535259e-15           2.389904308e-15
 x_velocity                         3.819167205e-14           5.443135022e-15
 y_velocity                         1.154631946e-14           3.386989592e-15
 z_velocity                         1.917120863e-15           0.0007116322192
 density                            8.659739592e-15           5.904531255e-15
 x_velocity                         9.414691249e-14           1.240164989e-14
 y_velocity                         2.797762022e-14           6.557446352e-15
 z_velocity                         6.989017829e-15            0.001244620397
 density                            2.087219286e-14           1.423143431e-14
 x_velocity                         3.247239717e-15           1.125095906e-14
 y_velocity                         4.777265823e-15           1.576091379e-14
 z_velocity                         8.659739592e-15           1.000750964e-14
 density                            1.998401444e-15           1.792609081e-15
 x_velocity                         1.208646898e-14           3.294533637e-14
 y_velocity                         1.665770928e-14           4.642507652e-14
 z_velocity                         2.164934898e-14           2.429962746e-14
 density                            3.108624469e-15           2.788221693e-15
 x_velocity                         2.309263891e-14            3.29119792e-15
 y_velocity                         6.661338148e-15           1.954032457e-15
 z_velocity                         2.918495203e-15            0.001083812972
 density                            4.662936703e-15           3.179362983e-15
 x_velocity                         5.773159728e-14            7.60478531e-15
 y_velocity                         2.042810365e-14           4.787976701e-15
 z_velocity                         4.701996354e-15           0.0008373129132
 density                            1.287858709e-14           8.781097764e-15
 x_velocity                         4.498951125e-15           1.558785903e-14
 y_velocity                         6.352577698e-15           2.095810306e-14
 z_velocity                         1.043609643e-14           1.206033213e-14
 density                            3.108624469e-15           2.788503014e-15
 x_velocity                         1.629241447e-14           4.440991623e-14
 y_velocity                          2.06055164e-14           5.742762461e-14
 z_velocity                         3.375077995e-14           3.788249614e-14
 density                            2.886579864e-15           2.589063001e-15
 x_velocity                         2.309263891e-14            3.29119792e-15
 y_velocity                         6.661338148e-15           1.954032457e-15
 z_velocity                         2.376207034e-15           0.0008818154002
 density                            5.329070518e-15           3.633557695e-15
 x_velocity                         6.394884622e-14           8.423762189e-15
 y_velocity                          1.82076576e-14           4.267544451e-15
 z_velocity                          6.16257195e-15            0.001097509768
 density                            2.087219286e-14           1.423143431e-14
 x_velocity                         3.127923268e-15           1.083755426e-14
 y_velocity                         5.075184251e-15           1.674379121e-14
 z_velocity                         7.993605777e-15           9.237701209e-15
 density                            3.108624469e-15           2.788503014e-15
 x_velocity                         1.387388468e-14           3.781747989e-14
 y_velocity                         1.728483892e-14           4.817288837e-14
 z_velocity                         2.509104036e-14           2.816264516e-14
 density                            2.442490654e-15           2.190745616e-15
 x_velocity                         3.019806627e-14           4.303874203e-15
 y_velocity                         8.881784197e-15           2.605376609e-15
 z_velocity                         2.089250878e-15           0.0007754934396
 density                            7.549516567e-15           5.147540068e-15
 x_velocity                         7.815970093e-14           1.029570934e-14
 y_velocity                          2.26485497e-14           5.308408951e-15
 z_velocity                         6.618129906e-15            0.001178539541
 density                            2.153832668e-14           1.468562902e-14
 x_velocity                         5.218481897e-15           1.808087216e-14
 y_velocity                         7.580673827e-15            2.50097757e-14
 z_velocity                         1.310063169e-14           1.513956587e-14
 density                            2.664535259e-15           2.390145441e-15
 x_velocity                         1.476705043e-14           4.025207398e-14
 y_velocity                         1.928746876e-14           5.375422261e-14
 z_velocity                         2.886579864e-14           3.239950328e-14
 density                            3.330669074e-15           2.987380385e-15
 x_velocity                          1.33226763e-14           1.898768031e-15
 y_velocity                         5.107025913e-15            1.49809155e-15
 z_velocity                         2.841687066e-15            0.001054644682
 density                            4.884981308e-15           3.330761221e-15
 x_velocity                          6.30606678e-14           8.306765492e-15
 y_velocity                          1.33226763e-14           3.122593501e-15
 z_velocity                         6.293269477e-15            0.001120797438
 density                            2.087219286e-14           1.423143431e-14
 x_velocity                         3.382927619e-15           1.172108728e-14
 y_velocity                         5.263483063e-15           1.736501712e-14
 z_velocity                          8.10462808e-15           9.366002615e-15
 density                            3.108624469e-15           2.788503014e-15
 x_velocity                          1.19374996e-14           3.253927515e-14
 y_velocity                         1.763639148e-14            4.91526662e-14
 z_velocity                         2.309263891e-14           2.591960262e-14
 density                            2.664535259e-15           2.389904308e-15
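
For readers who want to condense a listing like the one above, here is a minimal sketch that reduces it to the largest value seen per variable in each column. It assumes every data line has the form `name value value`, as in the output above. The function name `summarize_errors` is hypothetical and is not part of the script used in this thread. The comment identifies the left column as absolute error; the right column is not labelled in the thread, so the code treats it only as "second column".

```python
from collections import defaultdict


def summarize_errors(text):
    """Return {variable: (max of first column, max of second column)}.

    Assumes each data line looks like "name value value"; any other
    line (blank lines, headers) is skipped.
    """
    worst = defaultdict(lambda: [0.0, 0.0])
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # not a "name value value" line
        name, first, second = parts
        try:
            a, b = float(first), float(second)
        except ValueError:
            continue  # non-numeric columns, ignore the line
        worst[name][0] = max(worst[name][0], a)
        worst[name][1] = max(worst[name][1], b)
    return {name: tuple(vals) for name, vals in worst.items()}


# A few lines copied from the listing above, used as sample input.
sample = """
 x_velocity                                       0                         0
 x_velocity                         2.753353101e-14           3.924120597e-15
 z_velocity                         2.888299106e-15            0.001072104352
 density                            7.549516567e-15           5.147540068e-15
"""

for name, (first_max, second_max) in summarize_errors(sample).items():
    print(f"{name:12s} {first_max:.3e} {second_max:.3e}")
```

Feeding the full listing through the same function gives one line per variable, which makes it easier to see which variable and which column carry the largest values.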

@baperry2
Collaborator

@jrood-nrel I don't think I understand your output. What are all the different comparisons? Can you include species and temperature as well?

@ThomasHowarth
Contributor Author

Sorry for the delay - I've fixed the issue with the merge. Hopefully this is in good enough shape now... Perhaps @jrood-nrel can just confirm these numbers for @baperry2?

I'll open another PR playing around with the position of the synchronizes, but I feel it's best we just get this in ASAP - it's been in the pipeline a while.

@jrood-nrel
Contributor

I gave @baperry2 the output for all the variables. I'm still fine with merging this.

@baperry2
Collaborator

I started going through what Jon sent me a while ago but then got distracted by other stuff. I'll get back on it.

Collaborator

@baperry2 baperry2 left a comment

Just some minor comments; the only thing that definitely needs to be changed before merging is the soot comment in PeleLMeX_Plot.cpp.

Collaborator

@baperry2 baperry2 left a comment

Should be good to go once tests pass. Thanks again for doing this!

@baperry2 baperry2 merged commit c8b3030 into AMReX-Combustion:development Nov 21, 2025
24 checks passed

5 participants