Enable OpenMP in addParticles#4615
Conversation
|
I think The omp parallel region could include the for loop over levels. |
|
Won't this have threading issues in |
Good point - we still need to do that part in serial. |
| for (int lev = 0; lev < other.numLevels(); ++lev) | ||
| { | ||
| const auto& plevel_other = other.GetParticles(lev); | ||
| for(MFIter mfi = other.MakeMFIter(lev); mfi.isValid(); ++mfi) |
There was a problem hiding this comment.
This is not related to OpeMP. But we could use this version MFIter MakeMFIter (int lev, const MFItInfo& info) const to disable gpu device sync here and above. If we disable it here, we need to add Gpu::streamSynchronizeAll() after the loop over levels.
Do you want to do this in a follow-up or this PR?
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
There was a problem hiding this comment.
I will try this in a follow-on PR.
|
I did a quick test in ImpactX with 10M particles and looking at init, e.g., with I see a 2x speedup for |
|
Tested scalability a bit more for a 10M particle test, I think this looks reasonable (tapers off quickly, no benefit from threads in the old one): |
|
Correctness is fine, too: ImpactX tests all pass locally. cmake --fresh -S . -B build -DImpactX_PYTHON=ON -DpyAMReX_IPO=OFF -DImpactX_PYTHON_IPO=OFF -DImpactX_FFT=ON -DImpactX_amrex_src=$HOME/src/amrex
cmake --build build -j 6
ctest --test-dir build --output-on-failure -j 6 |
Turn on OMP threading in addParticles.
The proposed changes: