Skip to content

Replace hypot(x, y) with sqrt(x*x + y*y) for GPU compatibility#48

Merged
siddharthabishnu merged 78 commits intomainfrom
sb/non-uniform-conformal-mapping
Oct 20, 2025
Merged

Replace hypot(x, y) with sqrt(x*x + y*y) for GPU compatibility#48
siddharthabishnu merged 78 commits intomainfrom
sb/non-uniform-conformal-mapping

Conversation

@siddharthabishnu
Copy link
Copy Markdown
Collaborator

@siddharthabishnu siddharthabishnu commented Oct 17, 2025

This change replaces calls to hypot(x, y) with sqrt(x*x + y*y) to improve GPU compatibility.

The hypot function was lowering to the CUDA symbol __nv_hypot, which caused JIT compilation failures in Oceananigans tests.

Using explicit multiplication avoids external libdevice dependencies and ensures portability across GPU backends.

Added the script 'elliptic_quasi_conformal_cubed_sphere_grid.jl' in the 'src'
directory, which leverages elliptic grid generation techniques to enlarge corner
cells of the conformal cubed sphere while preserving near-orthogonality. This
potentially offers a solution to both computational intensity and grid alignment
concerns.
Refined the quasi-conformal cubed sphere grid using the Ensemble Kalman
Inversion (EKI) method. EKI, an advanced derivative-free data assimilation
technique, iteratively adjusts model parameters to optimize performance and
accuracy, based on the comparison of model outputs with observational data. In
our application, EKI is employed to optimize the parameters of elliptic grid
generation, aiming to enhance the grid's quality. This improvement is measured
by steering three key grid diagnostics---orthogonality, isotropy, and uniformity
of cell sizes---towards their ideal values representing the “observational data”
in our optimization process.
Implement Haversine formula to compute spherical distance.
@siddharthabishnu siddharthabishnu force-pushed the sb/non-uniform-conformal-mapping branch from f9ef5af to d697527 Compare October 17, 2025 21:40
@navidcy
Copy link
Copy Markdown
Member

navidcy commented Oct 17, 2025

Two questions:

  • Is x*x different from x^2?

  • I don't understand the issue here... I thought these methods are called on elements of the grid, thus x, y, z are numbers.

@siddharthabishnu
Copy link
Copy Markdown
Collaborator Author

siddharthabishnu commented Oct 17, 2025

Two questions:

  • Is x*x different from x^2?
  • I don't understand the issue here... I thought these methods are called on elements of the grid, thus x, y, z are numbers.
  • I believe x*x is faster than x^2 on the GPU, which is the main reason I chose to use it.
  • The issue itself isn’t here but in Oceananigans, which throws the following error:
JIT session error: Symbols not found: [ __nv_hypot ] lookupError Failed to materialize symbols: { (enzymejitdl_23, { entry }) } [2025/10/17 20:03:20.897] ERROR Compilation failed, MLIR module written to /tmp/reactant_cZpeeZ/module_000_I6HH_post_all_pm.mlir -@-> /var/lib/buildkite-agent/.julia-oceananigans/packages/Reactant/IgTfV/src/mlir/IR/Pass.jl:119 Reactanigans unit tests: Error During Test at /var/lib/buildkite-agent/Oceananigans.jl-26323/test/test_reactant.jl:64 Got exception outside of a @test "failed to run pass manager on module"

On the GPU, the hypot function lowers to the CUDA symbol __nv_hypot, which the Reactant/XLA toolchain isn’t finding. To resolve this, I replaced it with a GPU-safe equivalent that uses sqrt, which avoids the external CUDA dependency.

@navidcy
Copy link
Copy Markdown
Member

navidcy commented Oct 18, 2025

are we missing an Adapt statement here?

cc @simone-silvestri, @glwagner

@simone-silvestri
Copy link
Copy Markdown

simone-silvestri commented Oct 20, 2025

I don't think so, it looks like an issue with Reactant/XLA. I guess if we can use a simpler function that does the same thing that's ok.

@siddharthabishnu siddharthabishnu merged commit a3fea33 into main Oct 20, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants