
Remove sort in init EB and increase parallelism #644

Merged
marchdf merged 16 commits into development from pfx-sum
Jul 14, 2023


marchdf commented Jun 22, 2023

This PR increases the parallelism in how we define the EB structs. It also removes all the sort calls.
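The PR text does not spell out how the sorts were replaced, but the branch name (pfx-sum) suggests a prefix sum. As a minimal sketch of that general technique, assuming a per-cell 0/1 flag array: an exclusive prefix sum over the flags gives each flagged cell its slot in a compact output list, so the list comes out ordered by construction and needs no sort. The function name `compact_flagged` and the flag layout are hypothetical, and this is plain serial C++17, not the PeleC or AMReX implementation.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not from PeleC): return the indices of all cells whose
// flag is 1, in ascending order, without calling a sort.
// Assumes every entry of `flags` is either 0 or 1.
std::vector<int> compact_flagged(const std::vector<int>& flags)
{
    const std::size_t n = flags.size();

    // offsets[i] = number of flagged cells strictly before cell i.
    std::vector<int> offsets(n, 0);
    std::exclusive_scan(flags.begin(), flags.end(), offsets.begin(), 0);

    // Total number of flagged cells: last offset plus last flag.
    const int total = (n == 0) ? 0 : offsets[n - 1] + flags[n - 1];
    std::vector<int> out(static_cast<std::size_t>(total));

    // Each flagged cell writes to its own slot, so the iterations are
    // independent of one another and could run in parallel.
    for (std::size_t i = 0; i < n; ++i) {
        if (flags[i] != 0) {
            out[static_cast<std::size_t>(offsets[i])] = static_cast<int>(i);
        }
    }
    return out;
}
```

Both the scan and the scatter loop map naturally onto parallel hardware, which is one plausible reason such a change would help most on GPU.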

EB-C1 on 1 CPU rank without OpenMP shows no diffs.

Things left to do:

  • try CPU + MPI on a bigger problem
  • try GPU
  • profile the challenge problem
  • more clean up

@marchdf marchdf requested a review from jrood-nrel June 22, 2023 22:16

marchdf commented Jun 23, 2023

Ok good news and bad news.

First, the good news: still no diffs with MPI or on GPU. On GPU, with the challenge problem, initialize_eb2_structs is significantly faster.
This PR:

PeleC::initialize_eb2_structs()                     51      0.211      0.211      0.211   0.30%

development branch:

PeleC::initialize_eb2_structs()                     51      11.74      11.74      11.74  19.92%

So yay.

The bad news: the time spent in pc_apply_face_stencil blew up.
This PR:

PeleC::pc_apply_face_stencil()                     800      23.41      23.41      23.41  33.80%

development branch:

PeleC::pc_apply_face_stencil()                     800      1.004      1.004      1.004   1.71%

I think what's going on is that I have a bunch of duplicate faces and I am not catching those.
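To illustrate the suspicion: if the face list contains repeated entries, any loop over that list does proportionally more work. Below is a minimal diagnostic sketch for counting repeats. The `Face` layout (cell index plus a direction) and the function name are hypothetical and not taken from PeleC, and the `std::set` is used only for clarity in a serial check, not as a suggestion for the GPU code path.

```cpp
#include <cstddef>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical face record: a cell index (i, j, k) plus the face-normal
// direction (0, 1, or 2). The real PeleC data layout may differ.
struct Face {
    int i, j, k, dir;
};

// Count how many entries repeat a face that already appeared earlier in the list.
std::size_t count_duplicate_faces(const std::vector<Face>& faces)
{
    std::set<std::tuple<int, int, int, int>> seen;
    std::size_t duplicates = 0;
    for (const Face& f : faces) {
        // insert() returns {iterator, bool}; the bool is false if the key already existed.
        const bool inserted = seen.insert(std::make_tuple(f.i, f.j, f.k, f.dir)).second;
        if (!inserted) {
            ++duplicates;
        }
    }
    return duplicates;
}
```

A nonzero count on the face list built by the new code would support the duplicate-face explanation; a zero count would point elsewhere.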


marchdf commented Jul 11, 2023

Ok I think I figured it out.

On the challenge problem, pc_apply_face_stencil goes back to:

PeleC::pc_apply_face_stencil()                     800      1.067      1.067      1.067   2.28%

which is basically the same as development.
And initialize_eb2_structs drops to:

PeleC::initialize_eb2_structs()                     51    0.07289    0.07289    0.07289   0.16%

which is about a 160x speedup for that function relative to development (11.74 vs. 0.07289).
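The quoted speedup can be checked directly from the profiler rows in this thread. A minimal sketch, where `speedup` is a hypothetical helper and the inputs are the timing columns copied from the rows above:

```cpp
// Ratio of baseline time to new time; values above 1 mean the new code is faster.
// Both arguments must be in the same unit and `current` must be nonzero.
double speedup(double baseline, double current)
{
    return baseline / current;
}
```

With the numbers reported here, `speedup(11.74, 0.07289)` is roughly 161 for initialize_eb2_structs, while `speedup(1.004, 1.067)` is slightly below 1 for pc_apply_face_stencil, consistent with "basically the same as development".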

@marchdf marchdf force-pushed the pfx-sum branch 3 times, most recently from 6e87941 to 730ade4 Compare July 12, 2023 19:54

marchdf commented Jul 12, 2023

I am getting unexpected diffs on GPU with the challenge problem. I don't see them on CPU on my laptop...

@marchdf marchdf marked this pull request as ready for review July 13, 2023 19:25
@marchdf marchdf marked this pull request as draft July 13, 2023 19:25
@marchdf marchdf marked this pull request as ready for review July 14, 2023 13:35
@marchdf marchdf enabled auto-merge (squash) July 14, 2023 13:36
@marchdf marchdf merged commit f9be122 into development Jul 14, 2023
@marchdf marchdf deleted the pfx-sum branch July 14, 2023 17:53