Load XESFM after Oceananigans #5393

Merged
giordano merged 1 commit into main from mg/xesfm
Mar 13, 2026

@giordano
Collaborator Author

The Enzyme environment can now be instantiated correctly while still using MPICH_jll v5, which confirms the proposed solution is effective.

@xkykai
Collaborator

xkykai commented Mar 13, 2026

I am using this opportunity to learn about this issue from @giordano.

In my use case I would like to use MPITrampoline, which trampolines to a custom CUDA-aware MPI bundled with NVHPC. I'm not asking because I'm too lazy to check it myself; I would like to understand the problem. Is this expected to work against all kinds of MPI?

So what happens when XESMF is loaded after Oceananigans, given that I've configured the system to use MPITrampoline? Does conda try to compile something with the MPI that I specify?

Before that, when I was running scripts that involve XESMF, I also ran into HDF5_jll and NetCDF_jll compilation failures, with complaints that I was not using the right MPI (which is conda's version, I believe).

@giordano
Collaborator Author

The problem is that the Python package brings its own libmpi:

julia> using XESMF, Libdl

julia> filter(contains("libmpi.so"), dllist())[1]
"/home/runner/work/Oceananigans.jl/Oceananigans.jl/.CondaPkg/.pixi/envs/default/lib/libmpi.so.12"

When another library on Julia's side then needs a libmpi.so.12, pixi's one is already loaded, and there's a problem if the two aren't compatible. For example, libmpifort.so from MPICH_jll v5 expects libmpi.so.12 to provide the symbol MPIR_fortran_false, which was introduced in mpich v5.0.0, so things break if pixi's mpich is older.
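To check whether the libmpi that got loaded first actually exports a given symbol, something along these lines could be used. This is only a sketch: `provides_symbol` is a hypothetical helper name, not part of MPI.jl or XESMF, and it relies solely on the `Libdl` standard library.

```julia
using Libdl

# Hypothetical helper: report whether the shared library at `path`
# exports `symbol`. Returns false if the library cannot be opened.
function provides_symbol(path::AbstractString, symbol::Symbol)
    # For a path taken from dllist(), dlopen hands back a handle to the copy
    # that is already loaded. With throw_error=false it returns `nothing`
    # instead of throwing when the library cannot be opened.
    handle = dlopen(path; throw_error=false)
    handle === nothing && return false
    # dlsym with throw_error=false returns `nothing` for a missing symbol.
    found = dlsym(handle, symbol; throw_error=false) !== nothing
    dlclose(handle)
    return found
end

# Usage, once some libmpi has been loaded in the session:
#   libmpi = filter(contains("libmpi.so"), dllist())[1]
#   provides_symbol(libmpi, :MPIR_fortran_false)
```

If this returns `false` for the libmpi.so.12 that is already in the process, a libmpifort.so that needs that symbol would presumably fail to resolve it.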

If you were using MPITrampoline, I believe that'd still try to load libmpi.so.12 at some point, but if you have XESMF already loaded then you'd have the wrong libmpi.so.12 around, which goes boom. By loading Oceananigans (which loads MPI.jl automatically) first, we're now forcing Julia's libmpi.so.12 to be loaded first, not letting pixi's library get in the way.
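A rough way to see which libmpi won the race in a given session is sketched below. `libmpi_origin` is a hypothetical helper, and it assumes Linux library naming (`libmpi.so…`) and that the Python environment lives under a `.CondaPkg` directory, as in the path printed above.

```julia
using Libdl

# Hypothetical helper: classify which libmpi, if any, is already loaded.
# `libs` defaults to the libraries loaded in the current session.
function libmpi_origin(libs = dllist())
    # Keep only libmpi itself; libmpifort.so and friends do not match.
    mpi = filter(p -> startswith(basename(p), "libmpi.so"), libs)
    isempty(mpi) && return :none
    # Assumption: a `.CondaPkg` path component marks the pixi/conda copy.
    return any(p -> occursin(".CondaPkg", p), mpi) ? :condapkg : :other
end
```

With the load order from this PR, calling `libmpi_origin()` right after `using Oceananigans` and before `using XESMF` should not report `:condapkg`; if it does, something else pulled in the Python-side library earlier.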

@giordano giordano merged commit 680904c into main Mar 13, 2026
75 of 79 checks passed
@giordano giordano deleted the mg/xesfm branch March 13, 2026 21:41
briochemc added a commit to briochemc/Oceananigans.jl that referenced this pull request Mar 15, 2026
…polarGrid' into bp/offline-ACCESS-OM2

* origin/bp-claude/distributed-FPivot-TripolarGrid: (40 commits)
  Fix implicit MPI import in OrthogonalSphericalShellGrids
  Minimize test_mpi_tripolar.jl diff: rename vars, add diagnostic printing
  Fix formatting of reference list in index.md (CliMA#5397)
  Fix stale UZBC import in OrthogonalSphericalShellGrids
  Remove tmp_MPI_Gadi/ dev workspace and stray test output files
  Code review fixes: remove hardcoded Float64, expand index-tracing docs
  FC/FF col 1 fix: conjugate MPI exchange for non-fixed-point ranks (Rx > 2)
  Wider north buffer optimization: eliminate conjugate MPI, skip x-halo re-exchange for CC/CF
  Add new paper reference to Oceananigans documentation (CliMA#5395)
  Fix parallel test script: source defaults.sh, use --project=.
  Explicit imports in Grids (CliMA#5391)
  Load XESFM after Oceananigans (CliMA#5393)
  Fix rm ENOENT in simulation testsets: use force=true for NFS cleanup
  GPU isbits fix: make Tripolar.fold_topology a type parameter, consolidate tests
  Clean up distributed fold: GPU-safe views, batched MPI, DRY fold-line helpers
  Rewrite distributed fold pipeline: 4-step switch_north_halos! for Rx > 2
  Fix nranks guard: use exact equality instead of >= for partition tests
  Add Partition(4,2) index tracing test to diagnose fold corner bug
  Update simulation_debug for large-pencil 4×2 debugging
  Fix FPivot south masking and increase grid sizes for simulation tests
  ...
3 participants