Kernel fusing and stream synchronizes#556
Kernel fusing and stream synchronizes#556baperry2 merged 102 commits intoAMReX-Combustion:developmentfrom
Conversation
|
I talked about this with @marchdf today. I'm hoping to set up an automated test script where we can plug each of these file changes in 1 by 1 and quantify the diffs in some kind of plot since we expect the results to be nondeterministic even without the changes. I'm going to work on trying to set up some testing for this because I would like to get these changes in but we want to be careful. |
|
Ok I wrote a script that checked the changes in each file one at a time and had it compare against reference plot files for the flamesheet case and an EB case. I also ran the cases with no changes twice to understand the inherent non determinism. Here is a few of the main variables all catted that show the absolute error (on the left) are all not far from machine precision so I think I'm satisfied. |
|
@jrood-nrel I don't think I understand your output. What are all the different comparisons? Can you include species and temperature as well? |
|
Sorry for the delay - I've fixed the issue with the merge. Hopefully this is in good enough shape now... Perhaps @jrood-nrel can just confirm these numbers for @baperry2 ? I'll open another PR playing around with the position of the synchronizes, but I feel it's best we just get this in ASAP - it's been in the pipeline a while. |
|
I gave @baperry2 the output for all the variables. I'm still fine with merging this. |
|
Started going through what Jon sent me a while ago but then got distracted by other stuff, I'll get back on it. |
baperry2
left a comment
There was a problem hiding this comment.
Just some minor comments, the only thing that definitely needs to be changed before merging is the soot comment in PeleLMeX_Plot.cpp
baperry2
left a comment
There was a problem hiding this comment.
Should be good to go once tests pass. Thanks again for doing this!
This is a pretty extensive PR, which may take a fair amount of time to review.
It attempts to make everything consistent and make (micro-)optimisations where possible.
I'll start with a draft as I continue working on it. Here's a preliminary list of changes:
std::endl;->\n(Consistency, types, micro-optimisations #562)forloops:i++->++i(Consistency, types, micro-optimisations #562)constandconstexpr(Consistency, types, micro-optimisations #562)PeleLM::namespace in several places (particularlyPeleLM::TimeStamp) (Consistency, types, micro-optimisations #562)int,Real,TimeStamp(and other small types, unless they are modified), and by reference for anything in anamrex::Vector, or large types. (Consistency, types, micro-optimisations #562)amrex::ParallelFor(MultiFab...where possible. There is more to be done here, but I'm not sure if an overload/workaround exists for working onnodaltilebox's. I also need a little guidance to determine exactly when I canamrex::Gpu::streamSynchronize()in or outside oflevloops.[=]) has been replaced with explicit capturing (Consistency, types, micro-optimisations #562, mostly)m_incompressible) are pulled outside ofParallelForloops where possible. (Consistency, types, micro-optimisations #562)reserveandemplace_backwhere possible, rather thanresizeanddefine, to constructMultiFabs in place. (Consistency, types, micro-optimisations #562)If any additional consistent patterns emerge, I'll add them here.