(0.105.4) Add basic tests for Reactant-Oceananigans correctness by glwagner · Pull Request #5093 · CliMA/Oceananigans.jl

glwagner · 2025-12-31T17:40:01Z

This PR starts to port functionality over from GB-25 for testing the correctness of reactant-compiled kernels that perform basic Oceananigans tasks, like filling halos or computing tendencies.

This PR starts with tests for fill halo regions.

cc @giordano @wsmoses

test/test_reactant_correctness.jl

glwagner · 2026-01-14T01:36:47Z

@wsmoses @jlk9 it looks like a few things don't work with LatitudeLongitudeGrid --- either coriolis=HydrostaticSphericalCoriolis() or advection=WENO(). If this jives with what you've seen I can try to generate a non-Oceananigans MWE.

glwagner · 2026-01-14T18:38:42Z

@jtbuch and @dkytezab are interested in this

dkytezab · 2026-01-14T20:01:57Z

@glwagner seems like the problem also applies to RectilinearGrid if you change the x, y boundary conditions to Bounded.

test/test_reactant_correctness.jl

src/Models/HydrostaticFreeSurfaceModels/compute_hydrostatic_free_surface_tendencies.jl

src/Models/HydrostaticFreeSurfaceModels/update_hydrostatic_free_surface_model_state.jl

src/Models/HydrostaticFreeSurfaceModels/compute_hydrostatic_free_surface_buffers.jl

ext/OceananigansReactantExt/Architectures.jl

glwagner · 2026-03-07T00:05:17Z

@wsmoses tests pass on my mac so this may be read to merge

glwagner · 2026-03-07T00:30:44Z

drat

wsmoses · 2026-03-07T22:04:10Z

test/test_reactant_correctness.jl

+    )
+
+    # Note: raise=true mode is needed for autodiff but triggers non-deterministic
+    # segfaults in Reactant's CanonicalizeLoopsPass (MLIR bug). All tests below


is there an issue open for this that can be linked to?

i think this is out of date bc the tests below do use raise_first=true

unresolve if not true!

test/test_reactant_correctness.jl

glwagner · 2026-03-08T18:26:27Z

The regression tests are failing bc of a prior issue (#5354) so I will merge this!

@glwagner

…A#5093) * Add basic reactant correctness tests * comment out other tests * add more grids * implement take Gu tendency * test with different raise options * test raise=true and raise=false * raise=true * add fake halo filling kernels * filter triply periodic tests * add reactant to weakdeps * add more tests * updates * clean up * update report * fix * fix * update report * Update test_reactant_correctness.jl * Refactor new_data function to remove ReactantState * Change total_size function parameter type to ShardedGrid * try to restrict offset_array for sharding a bit more * test raise_first too * fix imports * Comment out failing Reactant correctness tests and remove raise=true mode Disable raise=true mode across all tests due to non-deterministic segfaults in Reactant's CanonicalizeLoopsPass (MLIR bug). Comment out TripolarGrid halo tests (Zipper BC compilation crashes) and HydrostaticFreeSurfaceModel time-stepping tests (MLIR codegen error: incorrect operand count). The remaining 102 tests (halo filling + Gu tendencies) pass reliably. Co-Authored-By: Claude Opus 4.6 <[email protected]> * Fix CI to actually run Reactant correctness tests TEST_GROUP was set to "reactant" which matches no group in runtests.jl, so CI was running 0 Reactant tests and reporting success. Co-Authored-By: Claude Opus 4.6 <[email protected]> * Attempt fix * Fix fix * Whitespace * Apply suggestion from @glwagner * Apply suggestion from @glwagner * Delete test/fake_halo_fill_kernels.jl * Delete test/test_fake_halo_fill.jl * Delete reactant_raise_true_report.md * Apply suggestion from @glwagner * Apply suggestion from @glwagner * Apply suggestion from @glwagner * Update runtests.jl * Apply suggestion from @glwagner * Apply suggestion from @glwagner * Update test_reactant_correctness.jl * Apply suggestion from @glwagner * Use halo=(3,3,3) in Reactant correctness tests and skip initialization_update_state\! for Reactant models - Increase halo size from (1,1,1) to (3,3,3) in fill_halo_regions\! tests - Re-enable Bounded LLG topology test - Add no-op initialization_update_state\! for ReactantHFSM to defer kernel execution to first_time_step\! (runs inside @compile context) Co-Authored-By: Claude Opus 4.6 <[email protected]> * Revert fill_halo tests to halo=(1,1,1) to avoid MLIR pass bug on Linux x64 halo=(3,3,3) with periodic boundaries triggers a Reactant MLIR optimization pass failure on Linux x64. Keep halo=(1,1,1) for fill_halo tests (which pass) while compute_simple_Gu\! retains halo=(3,3,3) as required by WENO. Co-Authored-By: Claude Opus 4.6 <[email protected]> * Add MWE for Reactant MLIR pass failure on periodic halo kernels Standalone reproducer (no Oceananigans dependency) for the MLIR optimization pass bug on Linux x64 with halo size H=3. Co-Authored-By: Claude Opus 4.6 <[email protected]> * Simplify MWE to single kernel Co-Authored-By: Claude Opus 4.6 <[email protected]> * Update MWE to reproduce ka_with_reactant error without CUDA Remove `using CUDA` so the MWE reproduces the same MethodError as Oceananigans: ka_with_reactant has no method without ReactantCUDAExt. Co-Authored-By: Claude Opus 4.6 <[email protected]> * Comment out RectilinearGrid Gu tests with Periodic topologies Reactant's MLIR raise pass fails on periodic halo-filling kernels with halo >= 2 ("cannot raise if yet" error). Add standalone MWE reproducer. Co-Authored-By: Claude Opus 4.6 <[email protected]> * Delete test/reactant_raise_periodic_halo_mwe.jl * Update MWE to reproduce compute_simple_Gu\! CI test failure Minimal script using Oceananigans that reproduces the fill_halo_regions\! raise=true failure on periodic topologies with halo=(3,3,3). Co-Authored-By: Claude Opus 4.6 <[email protected]> * Fix MWE imports and usage instructions Co-Authored-By: Claude Opus 4.6 <[email protected]> * Use view instead of indexing in cpu_face_constructor_r (CliMA#5376) * Use view instead of indexing in cpu_face_constructor_r Avoids scalar indexing errors when grid data is stored on non-CPU architectures (e.g. Reactant ConcreteIFRTArray). Co-Authored-By: Claude Opus 4.6 <[email protected]> * Add on_architecture(::CPU, ::OffsetArray{..., <:AnyConcreteReactantArray}) The generic fallback on_architecture(arch, a) = a doesn't convert OffsetArrays wrapping Reactant arrays. This adds a proper method that converts the underlying Reactant array to Array while preserving the OffsetArray wrapper. Co-Authored-By: Claude Opus 4.6 <[email protected]> --------- Co-authored-by: Claude Opus 4.6 <[email protected]> * Delete test/reactant_raise_periodic_halo_mwe.jl * Add maybe_initialize_state\! no-op for Reactant models The iteration == 0 check in maybe_initialize_state\! evaluates at trace time (ConcreteRNumber(0) == 0 is true), causing a redundant update_state\! to be compiled into every time_step\!. Make it a no-op for Reactant models since first_time_step\! handles initialization explicitly. Co-Authored-By: Claude Opus 4.6 <[email protected]> * Apply suggestion from @glwagner * Replace bcs... splatting with explicit indexing in fill_halo_event\! Extends the fix from 0807efe to polar_boundary_condition.jl and fill_halo_regions_open.jl. Reactant cannot reconcile different call arity from splatting variable-length tuples into the same kernel. Co-Authored-By: Claude Opus 4.6 <[email protected]> * Pass clock and fields to only_local_halos fill_halo_regions\! calls Avoids empty args tuple in fill_halo_event\!, which causes Reactant MLIR compilation failures (empty tuples get eliminated at call sites but not in function definitions, causing operand count mismatch). Co-Authored-By: Claude Opus 4.6 <[email protected]> * Add nothing to tendency kernels for Reactant compatibility KA kernels with implicit Float64 return values (from array assignment as last expression) cause Reactant to generate a Returns{Float64} parameter in the LLVM function definition that doesn't match the call site. Adding `nothing` as the last expression prevents this. Co-Authored-By: Claude Opus 4.6 <[email protected]> * Add kernel_clock workaround for Reactant MLIR/LLVM lowering bug Reactant/Enzyme generates a phantom pointer parameter when mutable structs containing ConcreteIFRTNumber scalar fields are placed in tuples passed to KA kernels. This causes 'llvm.call incorrect number of operands' errors. The workaround converts Clock to a NamedTuple (via kernel_clock) before placing it in kernel argument tuples. The NamedTuple has identical field access semantics and doesn't trigger the LLVM bug. - kernel_clock(clock) defaults to identity (no-op on CPU/GPU) - Reactant extension overrides to return NamedTuple - Applied to all HFSM call sites: tendency kernels, fill_halo_regions\!, implicit_step\!, flux BCs, split-explicit substepping Co-Authored-By: Claude Opus 4.6 <[email protected]> * revert * Apply suggestion from @glwagner * Apply suggestion from @glwagner * Apply suggestion from @glwagner * Apply suggestion from @glwagner * Apply suggestion from @glwagner * Apply suggestion from @glwagner * Test more things * is it about the halo size? * run Reactant tests on 1.11.9 * put halo back to 3 * simplify SubArray convertion to arch * emit something silly for prettytime of tracednumber * try to run reactant tests with checkbounds=auto * fix dispatch * resolve multi region ambiguity * Apply suggestion from @glwagner * multiregion fill halo fix --------- Co-authored-by: William Moses <[email protected]> Co-authored-by: dkytezab <[email protected]> Co-authored-by: Claude Opus 4.6 <[email protected]>

glwagner added 4 commits December 31, 2025 10:10

Add basic reactant correctness tests

fa9a124

comment out other tests

b402543

add more grids

b06f8ee

implement take Gu tendency

5d80bf7

wsmoses reviewed Dec 31, 2025

View reviewed changes

test/test_reactant_correctness.jl Outdated Show resolved Hide resolved

glwagner added 2 commits December 31, 2025 23:21

test with different raise options

27bc61a

test raise=true and raise=false

dfdd2ec

navidcy added testing 🧪 Tests get priority in case of emergency evacuation extensions 🧬 labels Jan 2, 2026

glwagner added 2 commits January 2, 2026 18:08

raise=true

ee69652

add fake halo filling kernels

6f4e215

glwagner mentioned this pull request Jan 13, 2026

Updating the Oceananigans version to v104.2, ClimaOcean to v0.9.0 PRONTOLab/GB-25#250

Open

glwagner and others added 4 commits January 13, 2026 16:22

Merge branch 'main' into glw/reactant-correctness

b629342

filter triply periodic tests

2311018

add reactant to weakdeps

2927e40

add more tests

7d3c9b6

glwagner added 3 commits January 13, 2026 22:35

updates

7c2b543

clean up

a11b0a2

update report

56d84ae

fix

b983553

glwagner commented Jan 14, 2026

View reviewed changes

test/test_reactant_correctness.jl Outdated Show resolved Hide resolved

glwagner and others added 6 commits January 14, 2026 17:39

fix

69e6b9f

update report

3af982f

Update test_reactant_correctness.jl

f7eb17e

Merge branch 'main' into glw/reactant-correctness

ba20224

Merge branch 'main' into glw/reactant-correctness

47c49b1

Merge branch 'main' into glw/reactant-correctness

a09a329

Apply suggestion from @glwagner

42e99be

glwagner commented Mar 6, 2026

View reviewed changes

src/Models/HydrostaticFreeSurfaceModels/compute_hydrostatic_free_surface_tendencies.jl Outdated Show resolved Hide resolved

Apply suggestion from @glwagner

ba3c531

glwagner commented Mar 6, 2026

View reviewed changes

src/Models/HydrostaticFreeSurfaceModels/update_hydrostatic_free_surface_model_state.jl Outdated Show resolved Hide resolved

Apply suggestion from @glwagner

27ea367

glwagner commented Mar 6, 2026

View reviewed changes

src/Models/HydrostaticFreeSurfaceModels/compute_hydrostatic_free_surface_buffers.jl Outdated Show resolved Hide resolved

Apply suggestion from @glwagner

f2d795b

glwagner commented Mar 6, 2026

View reviewed changes

ext/OceananigansReactantExt/Architectures.jl Outdated Show resolved Hide resolved

glwagner added 2 commits March 6, 2026 21:13

Apply suggestion from @glwagner

97c4a93

Test more things

cfb9277

glwagner added 8 commits March 6, 2026 15:31

is it about the halo size?

ced236d

run Reactant tests on 1.11.9

b1aab8d

put halo back to 3

8bb6914

simplify SubArray convertion to arch

3c155f5

emit something silly for prettytime of tracednumber

6d07d17

try to run reactant tests with checkbounds=auto

da22e14

fix dispatch

de89568

resolve multi region ambiguity

aa31721

wsmoses reviewed Mar 7, 2026

View reviewed changes

glwagner commented Mar 7, 2026

View reviewed changes

test/test_reactant_correctness.jl Outdated Show resolved Hide resolved

glwagner added 2 commits March 7, 2026 22:06

Apply suggestion from @glwagner

634bd19

multiregion fill halo fix

9b950b8

glwagner merged commit 7d463d6 into main Mar 8, 2026
74 of 79 checks passed

glwagner deleted the glw/reactant-correctness branch March 8, 2026 18:26

giordano mentioned this pull request Mar 18, 2026

Loosen to oceananigans 0.96 [rather than patch release] PRONTOLab/GB-25#265

Open

glwagner mentioned this pull request Mar 29, 2026

Revert "Switch biases to be enums rather than distinct types. (#5318)" #5449

Merged

Conversation

glwagner commented Dec 31, 2025

Uh oh!

Uh oh!

glwagner commented Jan 14, 2026

Uh oh!

glwagner commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dkytezab commented Jan 14, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

glwagner commented Mar 7, 2026

Uh oh!

glwagner commented Mar 7, 2026

Uh oh!

wsmoses Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

glwagner Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

glwagner Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

glwagner commented Mar 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

glwagner commented Jan 14, 2026 •

edited

Loading