Skip to content

Remove initialization_update_state! and clean up initialization#5420

Merged
glwagner merged 20 commits intomainfrom
glw/remove-initialization-update-state
Mar 26, 2026
Merged

Remove initialization_update_state! and clean up initialization#5420
glwagner merged 20 commits intomainfrom
glw/remove-initialization-update-state

Conversation

@glwagner
Copy link
Copy Markdown
Member

@glwagner glwagner commented Mar 21, 2026

Summary

This PR intends to clarify the meaning of "initialization" versus other actions associated with set!. A principal source of confusion is initialization_update_state!, whose name was intended to mean "update state, during initialization". However, the name seemed to be interpreted as "update state AND initialization", such that this function was overloaded with initialization activities. To fix this confusion this PR eliminates initialization_update_state!.

Another related issue is that set! was being misinterpreted as an initial condition function that can be called only once. This is not true; the intent of the design is that set! can be called multiple times. This actually can be useful for some cases where a second call to set! may use a computation from the first call. There isn't really a use case for this as far as I can tell for ocean modeling, but it applies to other kinds of models (like atmosphere models and as a corollary, for coupled models), and I believe consistency and non-surprising behavior across the ecosystem is important.

To help resolve these inconsistencies, this PR creates a distinction between "state reconciliation" which is needed to ensure consistency between 1) the 3D velocity field and the barotropic mode and 2) between the grid metrics and surface displacement for MutableVerticalDiscretization. In particular, state reconcilation needs to be performed whenever the model state is changed --- so it can occur more often than "initialization", which occurs only once. Moreover state reconciliation is "idempotent", in that multiple calls to reconcile_state!(model) (a new function for this PR) have no additional effect.

In contrast to state reconcilation, initialization is NOT guaranteed to be idempotent, and therefore must strictly be called only once. Initialization is now performed by time_step!(simulation), which is called by run!. This also means that "bare" time-stepping with time_step!(model, dt) (ie without Simulation) is not guaranteed correct unless initialize!(model) is called beforehand. Right now this issue only applies to models that use the Lagrangian-averaged DynamicSmagorinsky closure.

Note that we also call reconcile_state!(model) during initialization (recall that reconcile_state! is idempotent; so it can be called as often as needed), and we call fill_halo_regions! on prognostic variables as well. This resolves an issue raised on #5352 where halo filling needs to be moved out of update_state! (meaning that update_state! can no longer fulfill the role of a complete auxiliary state update). After this PR, a total auxiliary state update requires calling both reconcile_state! (which is not called during time-stepping), and update_state! which is called during time-stepping.

In summary this PR

  • Removes the confusing initialization_update_state! function, replacing it with a clear separation of concerns
  • Introduces reconcile_state! — ensures auxiliary state (barotropic velocities, vertical coordinate σ) is consistent with prognostic fields after external mutation (e.g. via set!)
  • Renames initialize_free_surface!reconcile_free_surface! and initialize_vertical_coordinate!reconcile_vertical_coordinate! to reflect their true role as reconciliation, not one-time initialization
  • Renames maybe_initialize_state!maybe_prepare_first_time_step! (calls only update_state! at iteration 0 as a safety net for bare model time-stepping)
  • Adds reconcile_state kwarg to set! for advanced users
  • Updates model interface documentation with idempotency requirements
  • Bumps version to 0.106.1

Function roles after this PR

Function Idempotent? Purpose
reconcile_state! Yes Make auxiliary state that isn't changed by update_state! (U,V,σ) consistent with prognostic fields
update_state! Yes Compute auxiliary state except the parts handled by reconcile_state!: recompute diagnostics, fill halos (async), compute tendencies
initialize! No reconcile_state! + sync halo fills + closure field init
maybe_prepare_first_time_step! Yes Safety net: calls reconcile_state! and update_state! at iter 0 for bare model time-stepping

Closes discussion from #5352.

Test plan

  • Smoke tests: model construction, set!, SplitExplicit barotropic mode, Simulation.run!, bare time_step!
  • HFSM tests: 152 passed, 7 broken (pre-existing)
  • Simulation tests: 55/55 passed
  • Time stepping tests: 118 passed, 2 broken (pre-existing)
  • Reactant HFSM tests: 2 errors (pre-existing MLIR failures, same on main)
  • Multi-region / CubedSphere tests (running)
  • CI

🤖 Generated with Claude Code

glwagner and others added 4 commits March 20, 2026 18:55
In tick!, both clock.last_Δt and clock.last_stage_Δt were set to the
same Δt.mlir_data, causing them to alias. Inside @trace for loops,
this triggers "Attempt to donate a buffer which is also used by the
same call to Execute()" because XLA tries to donate one while reading
the other in the while loop carry. Break the alias by creating a copy
via Δt + zero(Δt) for last_stage_Δt.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Replace the confusing `initialization_update_state!` with a clear separation
of concerns between `reconcile_state!`, `initialize!`, and `update_state!`:

- `reconcile_state!`: ensures auxiliary state (barotropic velocities, vertical
  coordinate scaling) is consistent with prognostic fields. Called in `set!`
  and `initialize!`. Idempotent.

- `initialize!`: one-time setup before first time step. Calls `reconcile_state!`,
  synchronous halo fills, and `initialize_closure_fields!`. Allowed to be
  non-idempotent.

- `update_state!`: idempotent recomputation of diagnostics/derived quantities.
  Called every time step and after `set!`.

Renames:
- `initialize_free_surface!` → `reconcile_free_surface!`
- `initialize_vertical_coordinate!` → `reconcile_vertical_coordinate!`
- `maybe_initialize_state!` → `maybe_initialize!` (now also calls `initialize!`)

The `set!` function gains a `reconcile_state` kwarg (default `true`) for
advanced users who set barotropic velocities independently.

Bump version to 0.106.1.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
maybe_initialize! should only call update_state!, not initialize!.
Calling initialize! inside time_step! would cause double-initialization
when used with Simulation (which already calls initialize! in
Simulation.initialize!).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Also move reconcile_state! definition to TimeSteppers (where update_state!
lives) and call it from maybe_prepare_first_time_step! so that bare
time_step!(model, Δt) reconciles state on the first step.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
glwagner and others added 6 commits March 21, 2026 06:53
XLA folds Δt + zero(Δt) → Δt, so the aliasing between last_Δt and
last_stage_Δt persists. Use Reactant.Ops.optimization_barrier which
XLA cannot fold, ensuring distinct buffers for the while loop carry.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add `materialize_clock!(clock, timestepper)` called in HFSM and NH model
constructors. For QuasiAdamsBashforth2TimeStepper, this sets
`clock.last_Δt = clock.last_stage_Δt` to ensure they are distinct objects.
This is needed for Reactant, where aliased ConcreteRNumber fields cause
XLA buffer donation errors in compiled loops.

With materialize_clock! breaking the alias at construction time, the
optimization_barrier workaround in the Reactant tick! override is no
longer needed and is removed.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
materialize_clock! in the constructor doesn't prevent MLIR-level aliasing
that occurs at runtime inside tick! when both last_Δt and last_stage_Δt
are assigned the same Δt value. The optimization_barrier is still needed
until Reactant provides a proper buffer copy op.

See #5389 (comment)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
materialize_clock! in the model constructor ensures last_Δt and
last_stage_Δt are distinct ConcreteRNumber objects. Assigning the same
value to both in tick! via .mlir_data does not re-alias the objects.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
For QAB2, last_Δt and last_stage_Δt always hold the same value after
tick!. In the Reactant extension, materialize_clock! aliases them via
setfield! (bypassing the ReactantClock setproperty! override) so that
Reactant's tracer sees one buffer, avoiding XLA buffer donation errors.

The src/ QAB2 method is removed since aliasing is only needed for
Reactant's ConcreteRNumber fields (normal Float64 fields are immutable
values with no aliasing concern).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@simone-silvestri
Copy link
Copy Markdown
Collaborator

I think this is ready to merge. We just need to fix the multi-region tests, I can take a look.

Copy link
Copy Markdown
Member Author

@glwagner glwagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 87.09677% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.01%. Comparing base (85ba915) to head (8d89c68).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
ext/OceananigansReactantExt/Models.jl 0.00% 2 Missing ⚠️
ext/OceananigansReactantExt/TimeSteppers.jl 75.00% 1 Missing ⚠️
src/TimeSteppers/TimeSteppers.jl 75.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5420   +/-   ##
=======================================
  Coverage   74.01%   74.01%           
=======================================
  Files         400      400           
  Lines       22875    22876    +1     
=======================================
+ Hits        16930    16932    +2     
+ Misses       5945     5944    -1     
Flag Coverage Δ
buildkite 68.87% <87.09%> (-0.03%) ⬇️
julia 68.87% <87.09%> (-0.03%) ⬇️
reactant_unit 3.77% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@glwagner glwagner merged commit f231920 into main Mar 26, 2026
70 checks passed
@glwagner glwagner deleted the glw/remove-initialization-update-state branch March 26, 2026 05:16
@giordano giordano added the breaking change 💔 Concerning a change which breaks the API label Mar 26, 2026
giordano added a commit to JuliaRegistries/General that referenced this pull request Mar 26, 2026
giordano added a commit to JuliaRegistries/General that referenced this pull request Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking change 💔 Concerning a change which breaks the API

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants