Add ParticleIDWrapper::make_invalid() #3735
WeiqunZhang merged 1 commit into AMReX-Codes:development
Force-pushed: c9b0c39 to c5ef706
Title changed: ParticleIDWrapper::negate() to ParticleIDWrapper::flip_valid() and ::is_valid()
Force-pushed: 1e5d663 to 2d9510f
Title changed: ParticleIDWrapper::flip_valid() and ::is_valid() to ParticleIDWrapper::make_valid()
Title changed: ParticleIDWrapper::make_valid() to ParticleIDWrapper::make_invalid()
A cheaper and explicit way to swap validity sign on particle ids. Not the same as `id = -id`, but also reversible.
@WeiqunZhang @atmyers @AlexanderSinn ready for review now - let me know if this looks legit
We will wait till the next release tomorrow.
It's just adding and not changing stuff, so it should be pretty safe, but I will also only need it after the release tomorrow, so no rush.
As another optimization, I explored using 32bit registers via tricks like:
```cpp
bool is_valid () const noexcept
{
    // the leftmost bit is our id's inverse sign
    auto const * const i32 = (uint32_t*)&m_idata;
    return *i32 >> 31;
}
```
This does what one expects on CPU (DWORD over QWORD, 32bit register used over 64bit one) and on CUDA GPUs (SM_80) it demotes a 64bit load to a 32bit one & reduces one
Replacing a 64 bit load with a 32 bit one? Don't do this: the 64 bit load would be coalesced, but the 32 bit load would not, because there would be a gap to the next thread. The 32 bit version might be slower.
You are right. Yeah, I thought to load coalesced and then copy into a 32bit register and do the rest of the ops there... but this micro-optimization does not seem worth it.
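For illustration, the "load the full word, then work on a 32bit value" idea from the last comment could be sketched as below. This is only a sketch: the function name `is_valid_via_upper_half` is hypothetical, and it assumes the validity flag is the most significant bit of the 64bit word. It says nothing about which instructions a given compiler emits.

```cpp
#include <cstdint>

// Illustrative only (hypothetical helper, not part of AMReX).
// Assumption: the validity flag is the most significant bit of the 64-bit word.
inline bool is_valid_via_upper_half (std::uint64_t idcpu) noexcept
{
    // The caller passes the whole 64-bit word, so the memory access stays 64-bit wide.
    // Narrowing happens by value: no pointer cast, so no dependence on byte order.
    auto const hi = static_cast<std::uint32_t>(idcpu >> 32);

    // The flag is now the top bit of the 32-bit value.
    return (hi >> 31) != 0;
}
```

Because the narrowing is done on a value rather than through a reinterpreted pointer, the result does not depend on which half of the word sits first in memory.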
Summary
A cheaper way to swap validity sign on particle ids, as needed to select and track particles from one kernel to another (e.g., boundary condition treatment, re-emission physics, scraping of particles, etc.).
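As a rough sketch of the idea, the snippet below flips a validity flag on a packed 64bit id with a single bit operation. `PackedId` and `make_packed_id` are hypothetical stand-ins, not the AMReX class, and the layout (bit 63 as the validity flag with 1 meaning valid, the remaining bits as payload) is an assumption made for this example.

```cpp
#include <cstdint>

// Hypothetical stand-in for a packed particle id; not the AMReX class.
// Assumed layout: bit 63 is the validity flag (1 = valid), bits 0..62 hold the payload.
struct PackedId
{
    std::uint64_t bits = 0;

    static constexpr std::uint64_t valid_mask = std::uint64_t(1) << 63;

    // Clear the flag with a single AND; the payload is untouched.
    void make_invalid () noexcept { bits &= ~valid_mask; }

    // Set the flag with a single OR, which undoes make_invalid().
    void make_valid () noexcept { bits |= valid_mask; }

    // Test the flag with one shift.
    bool is_valid () const noexcept { return (bits >> 63) != 0; }

    // Payload without the flag bit.
    std::uint64_t payload () const noexcept { return bits & ~valid_mask; }
};

// Illustrative helper: build a valid id from a payload.
inline PackedId make_packed_id (std::uint64_t payload) noexcept
{
    PackedId p;
    p.bits = (payload & ~PackedId::valid_mask) | PackedId::valid_mask;
    return p;
}
```

In this sketch, one kernel could call `make_invalid()` on the particles it selects, and a later kernel could pick out the entries for which `is_valid()` is false and restore them with `make_valid()`, since the payload bits are never modified.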
With our current encoding, `ParticleIDWrapper::make_invalid()` is the same as `id = -id`, but cheaper.
Improvements:
Additional background
Host Code
https://godbolt.org/z/KPjzExWz1
CUDA Device Code
PTX: https://godbolt.org/z/6En5rK14o
SASS for SM_80: https://godbolt.org/z/d6zYfxaKG
`id = -id`: now saves 4 registers 🎉
Interesting: there are still no 64bit shifts / but shuffles in CUDA hardware...