For our code, we need a deterministic version of SumBoundary.
We have implemented a deterministic particle deposition algorithm and verified that it is reproducible for individual FABs. However, because SumBoundary uses atomics, the result is no longer reproducible on GPU after calling SumBoundary.
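To illustrate the kind of non-reproducibility meant here, below is a minimal standalone sketch. It is not AMReX code and does not reflect how SumBoundary is implemented. It assumes the problem comes from floating-point addition being non-associative, so that atomic adds applied in a varying order can give different sums. The `Contribution` struct, the `source_id` field, and both function names are hypothetical and exist only for this example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record: one contribution to a single destination cell.
// source_id stands in for something like the index of the source box;
// it is assumed to be unique per contribution.
struct Contribution {
    int source_id;
    double value;
};

// Sum in arrival order. This models atomic adds, where the order in which
// contributions land can vary from run to run.
double sum_in_arrival_order(const std::vector<Contribution>& contributions) {
    double sum = 0.0;
    for (const auto& c : contributions) {
        sum += c.value;
    }
    return sum;
}

// Sum in a fixed order (sorted by source_id). With unique ids the order of
// additions, and therefore the rounded result, no longer depends on arrival.
double sum_in_fixed_order(std::vector<Contribution> contributions) {
    std::sort(contributions.begin(), contributions.end(),
              [](const Contribution& a, const Contribution& b) {
                  return a.source_id < b.source_id;
              });
    double sum = 0.0;
    for (const auto& c : contributions) {
        sum += c.value;
    }
    return sum;
}
```

With the contributions `{0, 1e16}`, `{1, 1.0}`, `{2, -1e16}` delivered in two different arrival orders, `sum_in_arrival_order` can return different values, while `sum_in_fixed_order` returns the same value for both. A deterministic SumBoundary would need some equivalent fixed ordering of the overlapping contributions, at whatever cost that implies on GPU.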
Somewhat related: #3739.
@chongchonghe @WeiqunZhang