Reuse GPU tags in MLEBABecLap::applyBC() #4882
WeiqunZhang merged 10 commits into AMReX-Codes:development
|
This is currently broken. Using more than one AMR level causes convergence issues. |
|
Slight performance improvement when reusing GPU tags in GPU Device: NVIDIA A5000 Test: After Before |
|
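For readers following the thread, here is a minimal sketch of the general tag-reuse idea: build the list of boundary tags once and keep it across calls instead of rebuilding it on every `applyBC()`. This is not the code in this PR and does not use the AMReX API. `BndryTag`, `CachedBCOperator`, `invalidateTags()`, and the placeholder kernel are all hypothetical names chosen for illustration, and the loop runs on the host rather than on a GPU.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-face boundary tag; not an AMReX type.
struct BndryTag {
    int box_id;  // index of the box the face belongs to
    int face;    // 0..5 for the six faces of a 3D box
};

// Hypothetical operator that caches its tag list between applyBC() calls.
class CachedBCOperator {
public:
    explicit CachedBCOperator(int nboxes) : m_nboxes(nboxes) {}

    // Fills one ghost value per tag. The tag list is built only when no
    // valid cached list exists; later calls reuse it.
    void applyBC(std::vector<double>& ghost) {
        if (!m_tags_valid) {
            buildTags();
            m_tags_valid = true;
        }
        ghost.assign(m_tags.size(), 0.0);
        for (std::size_t i = 0; i < m_tags.size(); ++i) {
            // Placeholder for the real boundary kernel.
            ghost[i] = static_cast<double>(m_tags[i].box_id * 6 + m_tags[i].face);
        }
    }

    // Call this if the grids change so that stale tags are not reused.
    void invalidateTags() { m_tags_valid = false; }

    int numTagBuilds() const { return m_num_builds; }
    std::size_t numTags() const { return m_tags.size(); }

private:
    // Rebuilds the full tag list: six faces for each box.
    void buildTags() {
        m_tags.clear();
        for (int b = 0; b < m_nboxes; ++b) {
            for (int f = 0; f < 6; ++f) {
                m_tags.push_back(BndryTag{b, f});
            }
        }
        ++m_num_builds;
    }

    int m_nboxes;
    bool m_tags_valid = false;
    int m_num_builds = 0;
    std::vector<BndryTag> m_tags;
};
```

In this sketch, calling `applyBC()` repeatedly on the same operator triggers a single tag build. A new build happens only after `invalidateTags()`, which is the part that needs care whenever the underlying grids can change.
|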
/run-hpsf-gitlab-ci |
|
GitLab CI 1380239 finished with status: success. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1380239. |
|
It looks like this might be a very small performance improvement at best. Adding |
|
The total runtime is faster by 3.7%. I think that's quite decent if repeatable. |
|
You can benchmark with both versions |
|
I could rewrite this without using |
|
Sure. That would be better. Thanks! |
|
This is great! Thanks! I did a test using 4 AMD GPUs on a Frontier node with 512^3 cells. The applyBC time went down from 0.157 to 0.118. The total MLMG::solve time went down from 0.679 to 0.639, consistent with the performance improvement of applyBC. I used |
|
Oh wow that's great. In that case, I will update |
|
Please do MLCellLinOp::applyBC() in a different PR. |
|
@WeiqunZhang Is there a specific reason for iterating over |
|
I don't think there are any reasons other than personal taste. |
Do you plan to rewrite it? Either way is fine with me. |
Same optimization as implemented in #4882.
---------
Co-authored-by: Ankith A Das
Co-authored-by: Weiqun Zhang