Conversation
Attaching my checkpoint comparison script.
I've added |
Thanks! It'd be good to also add a test (idea: set up a small simulation, run it until the end while dumping a checkpoint partway through, then also resume from the checkpoint and compare the results of the first run and the resumed one?)
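A minimal sketch of that round-trip test, assuming Breeze restarts follow the Oceananigans `Checkpointer`/`pickup` API; the grid, model, time step, and initial condition below are placeholders, not the PR's actual setup:

```julia
using Test
using Oceananigans  # assumption: Breeze follows the Oceananigans Checkpointer API

# Placeholder setup: a small, deterministic 2D simulation.
function build_simulation()
    grid = RectilinearGrid(size=(16, 16), extent=(1, 1), topology=(Periodic, Periodic, Flat))
    model = NonhydrostaticModel(; grid, advection=WENO())
    set!(model, u=(x, y) -> sin(2π * x))
    return Simulation(model, Δt=0.01, stop_iteration=75)
end

# Run 1: straight through to iteration 75, dumping a checkpoint at iteration 50.
sim1 = build_simulation()
sim1.output_writers[:chk] = Checkpointer(sim1.model; schedule=IterationInterval(50), prefix="chk")
run!(sim1)

# Run 2: pick up from the latest (iteration-50) checkpoint and run to the same end point.
sim2 = build_simulation()
sim2.output_writers[:chk] = Checkpointer(sim2.model; schedule=IterationInterval(50), prefix="chk")
run!(sim2, pickup=true)

# The restarted run should reproduce the continuous run bitwise.
@test parent(sim1.model.velocities.u) == parent(sim2.model.velocities.u)
```

Note that the second run also needs a `Checkpointer` with the same `prefix` attached, so `pickup=true` can locate the latest checkpoint file.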
to generalize prognostic_state and restore_prognostic_state! to all timesteppers
@giordano I've added a test per your suggestion. Verified that I get the same diffs (no bitwise or approximate agreement at this point) as with my previous workflow.
```julia
all_match_approx &= all(d -> d ≈ 0, momentum_diffs)
all_match_bitwise &= all(d -> d == 0, momentum_diffs)

println(all_match_bitwise ? "\nPASS: restart is bitwise identical to no-restart." :
                            "\nFAIL: restart differs from no-restart.")
println(all_match_approx ? "\nPASS: restart is approximately identical to no-restart." :
                           "\nFAIL: restart significantly differs from no-restart.")
```
Rather than printing PASS or FAIL, use the @test macro from the Test standard library, which would loudly error out in case of a check failure. See the other test files in this directory for inspiration about the test organisation (just a random example: test/advection_schemes.jl)
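For illustration, the PASS/FAIL prints could become something like the sketch below (`momentum_diffs` stands in for the per-field differences the script computes). One caveat worth flagging: in Julia, `d ≈ 0` with default tolerances is true only when `d == 0` exactly, because the default relative tolerance is scaled by the magnitudes being compared (zero here), so an approximate check against zero needs an explicit `atol`:

```julia
using Test

momentum_diffs = [0.0, 0.0, 0.0]  # placeholder: per-component diffs between the two runs

@testset "Checkpoint restart agreement" begin
    # Bitwise agreement between the restarted and continuous runs:
    @test all(d -> d == 0, momentum_diffs)

    # Approximate agreement: `d ≈ 0` degenerates to exact equality when comparing
    # against zero, so supply an absolute tolerance explicitly.
    @test all(d -> isapprox(d, 0; atol=1e-12), momentum_diffs)
end
```

With `@test`, a failure errors out loudly in CI instead of silently printing FAIL.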
Thanks for the tip @giordano, this is still a WIP -- I'm trying to get Breeze restarts to be bitwise equal like Oceananigans at the moment, then I'll clean up the testing...
```julia
# ── Compare final states ─────────────────────────────────────────────────────

println("\nComparing no-restart vs restarted final states:")
```
For testing Breeze we use ParallelTestRunner.jl, which allows us to run the tests in parallel. As a side effect, though, this captures all the output printed to screen and shows it only at the end rather than live, making prints typically a bit useless, as they aren't shown while the tests are running anyway. But specific to this case, I think most of the prints below should be replaced by @tests, as suggested above.
```diff
 - `implicit_solver`: Optional implicit solver for diffusion
 """
-struct SSPRungeKutta3{FT, U0, TG, TI} <: AbstractTimeStepper
+struct SSPRungeKutta3{FT, U0, TG, TI} <: AbstractBreezeTimeStepper
```
I believe we will want to move this upstream to Oceananigans at some point. What makes this time-stepper specific to Breeze?
The Oceananigans timesteppers store
This reverts commit 433d31b.
```julia
# By default, don't compute tendencies, cf. Oceananigans:
# "After restoring from a checkpoint, skip tendency computation since the restored
#  tendencies are already correct."
function TimeSteppers.update_state!(model::AtmosphereModel, callbacks=[]; compute_tendencies=false)
```
I believe this is what broke the geostrophic_subsidence_forcings and forcing_and_boundary_conditions tests, right?
Further restart testing using CliMA/Oceananigans.jl#5379
I think the relative difference is more informative than the absolute one, which depends on the scale of the numbers involved.
Agree that the relative diff often adds context @giordano, but I don't think it changes the story here. I think the diffs are significant -- see the updated table.
That seems to be even worse 😄
```julia
function Oceananigans.prognostic_state(model::AtmosphereModel)
    state = (clock = prognostic_state(model.clock),
             timestepper = prognostic_state(model.timestepper))
    return merge(state, prognostic_fields(model))
end
```
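To make the save/restore round trip concrete, here is a self-contained toy illustration of the pattern that `prognostic_state` and `restore_prognostic_state!` implement; the `ToyModel`, `toy_state`, and `toy_restore!` names are stand-ins for illustration only, not Breeze or Oceananigans API:

```julia
# Toy model with a "clock" and one prognostic field.
struct ToyModel
    clock::Base.RefValue{Int}
    u::Vector{Float64}
end

# Capture a checkpointable snapshot of the prognostic state.
toy_state(m::ToyModel) = (clock = m.clock[], u = copy(m.u))

# Copy a snapshot back into the model.
function toy_restore!(m::ToyModel, state)
    m.clock[] = state.clock
    m.u .= state.u
    return m
end

m = ToyModel(Ref(0), zeros(4))
m.u .= [1.0, 2.0, 3.0, 4.0]; m.clock[] = 10
s = toy_state(m)              # "write checkpoint"
m.u .= 0; m.clock[] = 0       # perturb the model
toy_restore!(m, s)            # "restore from checkpoint"
@assert m.clock[] == 10 && m.u == [1.0, 2.0, 3.0, 4.0]
```

Anything the model re-derives inside `update_state!` (diagnostics, averaged fields) doesn't need to live in the snapshot, which is the question being discussed below for `model.forcing`.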
I believe that model.forcing can also have a prognostic state, which is missing here (in particular the SubsidenceForcing, which depends on horizontal averages?)
For SubsidenceForcing, the averaged_field is diagnosed every update_state!
The dry thermal bubble is probably a good one to focus on at first since it's relatively simple and has no forcing or microphysics... The fact that the acoustic wave works but the thermal bubble does not is interesting. The difference is that the acoustic wave uses
Also, it might be sufficient for this PR to enable checkpointing for one of the dynamics + simple models. We can work on ensuring that all of the dynamics, forcings, boundary conditions, etc. are supported in future PRs.
Question: are you comparing using the CPU or the GPU architecture? I'm mildly sure our GPU examples aren't fully reproducible even on the same machine because we don't fix the GPU RNG seed correctly (I believe CUDA.jl documentation about this isn't accurate, or maybe just out of date).
CPU |
Changing the thermal bubble from anelastic to compressible results in a perfect restart. |
Ok, that's interesting: so there seems to be something not right with one dynamics but not the other? At least that's partially reassuring 😁
may need |
@glwagner I don't think there are any prognostic variables in dynamics. |
|
Adding another data point. This is the RICO example run for 24 hours. "run1" and "run2" are the same case, but run on different machines -- effectively different realizations of the same conditions. "run2" was run from 0-24h (continuous) and also 0-8h, 8-24h (restarted). Presented profiles are averages over the last four hours. I think this actually looks pretty good, any thoughts @glwagner @giordano?
but there are for
CompressibleDynamics is capable of a perfect restart, Anelastic (what I'm testing) is not. |

Addresses #443
Following https://github.com/CliMA/Oceananigans.jl/blob/51c67b5dc9445fe3b9015b3ccad4ebdfb89258f1/docs/src/simulations/checkpointing.md, I've independently verified that Oceananigans restarts work flawlessly -- bitwise agreement.
The same workflow applied to the free convection demo (https://numericalearth.github.io/BreezeDocumentation/stable/#Quick-Start)
--vs--
gives qualitatively identical results. The fields differ by: