Skip to content

Add AD component to acoustic wave example#565

Open
dkytezab wants to merge 26 commits intomainfrom
dkz/feature-grad-acoustic
Open

Add AD component to acoustic wave example#565
dkytezab wants to merge 26 commits intomainfrom
dkz/feature-grad-acoustic

Conversation

@dkytezab
Copy link
Copy Markdown
Collaborator

  • Adds Enzyme, Reactant to deps for docs
  • Adds an AD visualization to the Breeze documentation
  • Computes derivative of density perturbation at point after an acoustic wave has propagated
image

@dkytezab dkytezab requested a review from giordano March 17, 2026 17:32
@glwagner
Copy link
Copy Markdown
Member

excited to see the docs preview

Copy link
Copy Markdown
Member

@giordano giordano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you need to add the example to

Breeze.jl/docs/make.jl

Lines 30 to 46 in 7fa5816

examples = [
Example("Stratified dry thermal bubble", "dry_thermal_bubble", true),
Example("Cloudy thermal bubble", "cloudy_thermal_bubble", true),
Example("Cloudy Kelvin-Helmholtz instability", "cloudy_kelvin_helmholtz", true),
Example("Shallow cumulus convection (BOMEX)", "bomex", true),
Example("Precipitating shallow cumulus (RICO)", "rico", false),
Example("Convection over prescribed sea surface temperature (SST)", "prescribed_sea_surface_temperature", true),
Example("Inertia gravity wave: many time steppers", "inertia_gravity_wave", true),
Example("Single column radiation", "single_column_radiation", true),
Example("Stationary parcel model", "stationary_parcel_model", true),
Example("Rising parcel: adiabatic ascent", "rising_parcels", true),
Example("Acoustic wave in shear layer", "acoustic_wave", true),
Example("Cloud formation in prescribed updraft", "kinematic_driver", true),
Example("Splitting supercell", "splitting_supercell", false),
Example("Tropical cyclone world", "tropical_cyclone_world", false),
Example("Diurnal cycle of radiative convection", "radiative_convection", false),
]
to actually build it. The boolean argument is whether to build the example in all PRs. How long does it take for you to run this? Edit: ah, nevermind, it's adding stuff to an existing example!

using Oceananigans.Architectures: ReactantState
using Reactant: @trace

Reactant.set_default_backend("cpu")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, any specific reason for running this on CPU?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a cheap case, so it doesn't require the GPU. I think our default thinking is to run expensive things on GPU, cheap things on CPU to save resources. Do you agree with that strategy or should we run everything on GPU?

Copy link
Copy Markdown
Member

@giordano giordano Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, sounds sensible! I just wanted to understand the motivation. If it's fast already on CPU no need to use the GPU.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the main motivation then is the fact that the example on main is on CPU:

grid = RectilinearGrid(size = (Nx, Nz), x = (-Lx/2, Lx/2), z = (0, Lz),
topology = (Periodic, Flat, Bounded))

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bulk of the time here is spent in compilation. Oh reactant, my reactant.... ❤️

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One day CompilerCaching.jl will make our lives easier.

@giordano giordano added examples 👩‍🏫 not _everyone_ was born knowing how to use Breeze reactant ☣ towards a differentiable earth labels Mar 17, 2026
@bischtob
Copy link
Copy Markdown
Collaborator

bischtob commented Mar 17, 2026

@dkytezab is there a way we can compare to an estimate or exact solution to make sure the answer is correct?

@giordano
Copy link
Copy Markdown
Member

https://github.com/NumericalEarth/Breeze.jl/actions/runs/23207750341/job/67447818390?pr=565#step:7:31

Warning: detected a stack overflow; program state may be corrupted, so further execution might be unreliable.
ERROR: LoadError: StackOverflowError:
Stacktrace:
 [1] promote_to(::Type{Reactant.TracedRArray{Float64, 3}}, rhs::Oceananigans.Fields.Field{Oceananigans.Grids.Center, Oceananigans.Grids.Center, O

CC @Pangoraw

@Pangoraw
Copy link
Copy Markdown
Collaborator

Pangoraw commented Mar 17, 2026

Warning: detected a stack overflow; program state may be corrupted, so further execution might be unreliable.

we can reproduce with show(dδρᵢ).

Not sure how to reproduce this but on 1.12 we also hit the 'llvm.call' op incorrect number of operands (3) for callee (expecting: 4) error.

@dkytezab
Copy link
Copy Markdown
Collaborator Author

dkytezab commented Mar 17, 2026

There are some pretty big discrepancies between the finite-difference derivatives and automatic derivatives. I think this is unrelated to timestepping: there is a difference even if we just call set! and don't time step.

edit: there is some problem differentiating through update_state! which gets called before timestepping. For zero timesteps, the automatic derivatives are nearly-uniformly 1 except at the corners/sides where they are 4/2. This is obviously surprising, because there is zero propagation time, so the sensitivity should only be nonzero at the receiver. One explanation is that we are appending the adjoint of global reads (i.e. writes) hence the 1's, and the adjoints of several halo fills. Hence the 2's and 4's.

derivative of just calling set!(model; ρ = ρᵗ, θ = θ₀, u = uᵇᵍ)

image

@giordano
Copy link
Copy Markdown
Member

#575 will allow new Ocenanigans, which will get the fix for the docs crash in.

@Pangoraw
Copy link
Copy Markdown
Collaborator

Unless we build docs on 1.11 it will also need CliMA/Oceananigans.jl#5415 (or upcoming call conv reactant fix) to get around the next crash in the pipeline.

@giordano
Copy link
Copy Markdown
Member

Ok, now we're getting (unfortunately Literate swallows full error messages and stacktraces):

ERROR: LoadError: LoadError: "failed to run pass manager on module"
in expression starting at /__w/Breeze.jl/Breeze.jl/docs/src/literated/acoustic_wave.md:2
when executing the following code block from inputfile `/__w/Breeze.jl/Breeze.jl/examples/acoustic_wave.jl`
```julia
@info "Compiling differentiated model — this may take a minute..."
compiled_grad = Reactant.@compile raise=true raise_first=true sync=true grad_loss(
    model_ad, dmodel_ad, δρᵢ, dδρᵢ,
    ρᵗ, ρᵇᵍ, uᵇᵍ, θ₀, Δt, Nsteps, target_i, target_k)
@info "Running gradient..."
dδρ, J = compiled_grad(
    model_ad, dmodel_ad, δρᵢ, dδρᵢ,
    ρᵗ, ρᵇᵍ, uᵇᵍ, θ₀, Δt, Nsteps, target_i, target_k)
xs = xnodes(grid_ad, Center())
zs = znodes(grid_ad, Center())
x_target = xs[target_i]
z_target = zs[target_k]
@info @sprintf("Receiver density J = %.6e  at (x=%.1f m, z=%.1f m) after %d steps",
               Float64(only(J)), x_target, z_target, Nsteps)
```

Is that what CliMA/Oceananigans.jl#5415 addresses?

@giordano
Copy link
Copy Markdown
Member

Still getting the same error after checking out CliMA/Oceananigans.jl@4da9e65

@dkytezab
Copy link
Copy Markdown
Collaborator Author

Yeah, there is some underlying issue with Reactant + Julia 1.12 codegen. See EnzymeAD/Reactant.jl#2701

@giordano
Copy link
Copy Markdown
Member

Should we switch the docs to using julia v1.11 then? It's doable, but I'll need to change the docker image.

@glwagner
Copy link
Copy Markdown
Member

Should we switch the docs to using julia v1.11 then? It's doable, but I'll need to change the docker image.

I'm ok with this esp if its computationally feasible

@giordano
Copy link
Copy Markdown
Member

Docs job is failing because stuff like (not the only case, there are multiple errors)

- `w`: parcel vertical velocity [m/s] (prognostic for `PrognosticVerticalVelocity`,
zero for `PrescribedVerticalVelocity`)
- `ρ`: **environmental** density at parcel height [kg/m³] (interpolated from background
profile, not the parcel's own density). The parcel density is computed from
`density(𝒰, constants)` using the ideal gas law applied to the parcel's thermodynamic state.
is being misinterpreted as links (pattern [...] (...)). I can't fix it now

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@giordano
Copy link
Copy Markdown
Member

giordano commented Mar 21, 2026

Preview at https://numericalearth.github.io/BreezeDocumentation/previews/PR565/literated/acoustic_wave/#Differentiability:-sensitivity-to-the-initial-perturbation. But figures aren't displayed anywhere because the figure objects aren't ever at the end of a "cell", which is what's displayed below. Edit: should be fixed in the latest version

Should we wait for EnzymeAD/Reactant.jl#2704 to make it into a new release so that we can go back to Julia v1.12 here? Edit: Reactant tests in #577 with Julia v1.12 are passing, so we may wait for a new Reactant version and go back to building the docs with Julia v1.12 👀

@giordano
Copy link
Copy Markdown
Member

giordano commented Mar 21, 2026

image

Does log10(error) larger than 0 mean that the relative error is larger than 100%? 🤔 Edit: yes, that's visible in the generated docs:

[ Info: Rel. error: min=2.01e-04  median=4.08e-02  mean=1.29e-01  max=3.18e+00

@giordano
Copy link
Copy Markdown
Member

After playing with some parameters locally, I'm convinced that the thing here is that the derivative in those two points is "very small", and with AD vs FD we get two different "very small" derivatives (things like -3.69401e-6 vs -1.42948e-5), but by its nature, finite differences isn't necessarily The Right Answer™️. Should we add a comment along the lines of "while it's useful to compare the patterns between AD and FD, FD is inherently unstable, and as such it doesn't necessarily provide a reliable reference", to justify the difference?

@glwagner
Copy link
Copy Markdown
Member

Maybe we should figure out how to compute a gradient that will be reasonably smooth and easy to access accuracy

@giordano giordano force-pushed the dkz/feature-grad-acoustic branch from e6e9614 to c3ddd47 Compare March 23, 2026 00:14
@dkytezab
Copy link
Copy Markdown
Collaborator Author

I can work on a suitable example

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

examples 👩‍🏫 not _everyone_ was born knowing how to use Breeze reactant ☣ towards a differentiable earth

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants