
Improve benchmark tests #4842

Merged
simone-silvestri merged 14 commits into main from ss/improve-benchmarks-pipeline
Oct 13, 2025

Conversation

@simone-silvestri
Collaborator

I am afraid that, given the very low velocity values and the almost constant temperature and salinity, the benchmark tests do not run in a fair way. I had already noticed a case where performance depended on the initial conditions (i.e., if the velocities are zero, the kernel automatically skips computing the reconstructions).

This PR changes the initial conditions of the benchmark tests to make sure they are fair: a fluid stratified in z with much larger velocity fluctuations that cannot die out in the first iteration.
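The idea can be sketched outside of Oceananigans. Below is a minimal NumPy sketch (grid size, amplitudes, and the seed are hypothetical, not the PR's actual setup) of a z-stratified temperature field plus seeded, genuinely nonzero velocity fluctuations:

```python
import numpy as np

# Hypothetical sizes and amplitudes, for illustration only.
nz = 16
z = np.linspace(-1000.0, 0.0, nz)   # depth coordinate (m)

rng = np.random.default_rng(1234)   # fixed seed (the Julia analogue is Random.seed!)

# Temperature stratified in z: warm at the surface, cooler at depth.
T = 20.0 + 0.01 * z                 # 10 degC at -1000 m, 20 degC at the surface

# Velocity fluctuations large enough not to die out in the first iteration,
# so advection kernels cannot short-circuit on zero velocities.
u = 0.1 * rng.standard_normal(nz)

assert np.all(np.diff(T) > 0)       # stably stratified: T increases toward the surface
assert np.any(u != 0.0)             # velocities are genuinely nonzero
```

Seeding the generator keeps the benchmark deterministic across runs while still exercising the full advection path.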

@simone-silvestri added the benchmark performance label (runs preconfigured benchmarks and spits out timing) Oct 10, 2025
@simone-silvestri
Collaborator Author

The benchmarks seem to be fair, as this PR has no effect on the performance results.
We could probably close this PR, though I would prefer merging it so we know we have a solid test case.
However, do any of you know what all these test failures unrelated to this PR are?
@navidcy @glwagner @siddharthabishnu

Updated the bottom function to use a fixed denominator for bathymetry calculations and added Random.seed! for consistency in tests.
@simone-silvestri
Collaborator Author

Also, a random bathymetry with values going from -5000 to -1000 m, as it is right now, is not really representative of an ocean simulation, where the bathymetry is somewhat smoother (the benchmarks here hit far more reduced-order advection near the immersed boundary than would occur in a real simulation). Therefore, I think it is better to use a sloped bathymetry.
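To illustrate the difference (function names and ranges below are hypothetical, not the ones in the PR), compare an uncorrelated random bottom with a smooth slope spanning the same depth range:

```python
import numpy as np

def random_bottom(x, rng):
    """Old-style bottom: uncorrelated depths between -5000 m and -1000 m.
    Neighboring cells can jump by thousands of meters."""
    return rng.uniform(-5000.0, -1000.0, size=np.shape(x))

def sloped_bottom(x, x_max=1.0, h_deep=-5000.0, h_shallow=-1000.0):
    """Smooth linear slope: -5000 m at x = 0, -1000 m at x = x_max."""
    return h_deep + (h_shallow - h_deep) * np.asarray(x) / x_max

x = np.linspace(0.0, 1.0, 8)
h = sloped_bottom(x)

# The slope changes gently between neighboring cells (~571 m per step here),
# so the fraction of cells near the immersed boundary is more realistic.
assert np.all(np.abs(np.diff(h)) < 600.0)
```

With the random bottom, almost every column has an abrupt immersed-boundary transition, which over-weights the reduced-order code paths in the timing.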

@glwagner
Member

The benchmarks should measure time per time-step (not SYPD).
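For reference, SYPD and time per time-step are interconvertible once the model time step is fixed, which is why measuring the latter directly is cleaner. A quick sketch (the numbers are hypothetical):

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

def sypd(dt_model, wall_seconds_per_step):
    """Simulated years per wall-clock day, given the model time step (s)
    and the measured wall time per step (s)."""
    steps_per_wall_day = SECONDS_PER_DAY / wall_seconds_per_step
    return dt_model * steps_per_wall_day / SECONDS_PER_YEAR

# e.g. a 20-minute model step taking 0.1 s of wall time per step:
print(round(sypd(1200.0, 0.1), 2))  # ~32.88 SYPD
```

Time per step is the directly measured quantity; SYPD folds in the choice of Δt, so two runs with different Δt are not comparable in SYPD even when the kernels are identical.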

@glwagner
Member

Where are the benchmarks uploaded? Can we put a link into the documentation or readme or somewhere?

@glwagner (Member) left a comment

are there benchmarks for NonhydrostaticModel?

cc @giordano

@simone-silvestri
Collaborator Author

The benchmarks should measure time per time-step (not SYPD).

They do better (for GPU only)! They measure the timing of each kernel independently. For the CPU tests, I have this open PR #4711, which probably still needs some work for the formatting of the output.

are there benchmarks for NonhydrostaticModel?

No, but we can include them for sure. The tests are run on Tartarus, and a .txt file summarizing the performance of the tested simulation is uploaded as a buildkite artifact.

@glwagner
Member

and a .txt file summarizing the performance of the tested simulation is uploaded as a buildkite artifact.

How can one view the artifact? Is there documentation or instructions somewhere?

@glwagner
Member

They do better (for GPU only)! They measure the timing of each kernel independently.

Why does it matter what the bathymetry is then?

@simone-silvestri
Collaborator Author

Where are the benchmarks uploaded? Can we put a link into the documentation or readme or somewhere?

The tests are run on Tartarus, and a .txt file summarizing the performance of the tested simulation is uploaded as a Buildkite artifact. They are not really user-friendly; they are more tests for developers, to make sure we don't hit a performance regression:

This is an example of the output of the benchmark tests (Buildkite artifact: diff_immersed_output.txt).

The artifacts are uploaded on the Buildkite pipeline, under the artifacts tab. For example:
https://buildkite.com/clima/oceananigans-benchmarks-1/builds/3442#0199cdc9-5dc2-4d76-a0dd-849eb1184893

The bathymetry plays a role because of the different methods used near the immersed boundary, which modify the profile of the simulation.

@glwagner
Member

The bathymetry plays a role because of the different methods used near the immersed boundary, which modify the profile of the simulation.

But if we are using non-short-circuiting logic, the same methods are called in every cell regardless of the immersed boundary.

@simone-silvestri
Collaborator Author

I am not sure that always happens on a GPU; there might be some optimizations that occur under the hood.

@glwagner
Member

I am not sure that always happens on a GPU; there might be some optimizations that occur under the hood.

I don't think it's possible because the bathymetry is determined by an array whose values can change / are only known at runtime (not compile time)

@simone-silvestri
Collaborator Author

Yeah, but at runtime, how CUDA organizes execution might depend on possible zeros.
I noticed that if you multiply a very complicated computation (for example, a WENO reconstruction) by zero, the reconstruction is avoided altogether.
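A toy CPU-side Python sketch of the effect being described (an illustration of the idea, not actual GPU code): when the multiplier is known to be zero, the expensive reconstruction can be skipped without changing the result.

```python
def weno_like_reconstruction(stencil):
    # Stand-in for an expensive WENO reconstruction: many flops per call.
    weights = (0.1, 0.6, 0.3)
    return sum(w * s for w, s in zip(weights, stencil))

def advective_flux(u, stencil):
    # Naive form: always pays for the reconstruction.
    return u * weno_like_reconstruction(stencil)

def advective_flux_guarded(u, stencil):
    # What a compiler or the hardware may effectively do: u == 0 makes the
    # product zero regardless of the stencil, so skip the reconstruction.
    if u == 0.0:
        return 0.0
    return u * weno_like_reconstruction(stencil)

stencil = (1.0, 2.0, 3.0)
assert advective_flux(0.0, stencil) == advective_flux_guarded(0.0, stencil) == 0.0
assert abs(advective_flux(2.0, stencil) - advective_flux_guarded(2.0, stencil)) < 1e-12
```

If a benchmark initializes all velocities to zero, it effectively times the cheap guarded path, which is why nonzero initial fluctuations matter for a fair measurement.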


@giordano
Collaborator

To make Greg happy, we could automatically upload the benchmark results somewhere so that we can produce charts like https://enzymead.github.io/Enzyme-JAX/benchmarks/ and https://enzymead.github.io/Reactant.jl/benchmarks/. Those use https://github.com/benchmark-action/github-action-benchmark, but we can explore alternative visualisations if you prefer; this is just to give an idea. This could be another thing I can work on next month, if you're interested.

@simone-silvestri
Collaborator Author

simone-silvestri commented Oct 13, 2025

I would be very happy if you would like to take this on! Also, this could help with #4711. I can work on adding some relevant non-hydrostatic benchmarks in another PR (if I can find some time).

@giordano
Collaborator

I opened #4851 to track this task.

@simone-silvestri
Collaborator Author

In the meantime, I will merge this PR, which just changes the initial condition and physical setup slightly.

@simone-silvestri simone-silvestri merged commit 6a75213 into main Oct 13, 2025
58 of 59 checks passed
@simone-silvestri simone-silvestri deleted the ss/improve-benchmarks-pipeline branch October 13, 2025 12:00
