
Fix incorrect site configs, revert zlib-ng to zlib, update Pre-configured Sites #1558

Merged
climbfuji merged 9 commits into JCSDA:release/1.9.0 from rickgrubin-noaa:fix_site_configs
Mar 12, 2025

@rickgrubin-noaa
Collaborator

Summary

Multiple issues addressed:

  • Fix incorrect module file syntax (hercules, orion, ursa)
  • Revert zlib-ng (ursa)
  • Update Pre-configured Sites (hercules, jet, orion, ursa)

Testing

Incorrect site configs:

  • manually edited the resulting module files; they load correctly

Revert zlib-ng:

  • reinstalled spack-stack

Documentation:

  • built on ReadTheDocs to verify accuracy and formatting

Systems affected

  • hercules
  • orion
  • ursa

Dependencies

None

Issue(s) addressed

Link the issues addressed or resolved by this PR (use Fixes #??? for fully resolved issues)

Checklist

  • This PR addresses one issue/problem/enhancement, or has a very good reason for not doing so.
  • These changes have been tested on the affected systems and applications.
  • All dependency PRs/issues have been resolved and this PR can be merged.

@rickgrubin-noaa rickgrubin-noaa requested review from RatkoVasic-NOAA and climbfuji and removed request for climbfuji March 12, 2025 17:22
@rickgrubin-noaa rickgrubin-noaa self-assigned this Mar 12, 2025

@climbfuji climbfuji left a comment

The one-off modifications for Orion and Hercules are ok for release/1.9.0. However, we don't want to merge this information back, and we need a permanent and automated solution for develop.

@rickgrubin-noaa
Collaborator Author

Agree that an automated solution is required.

spack-managed-x86-64_v3/v1.0 must be specified when creating / building the stack in order to be able to load the oneAPI compiler and MPI modules.

Given its presence in the site config files, it is necessarily added to the meta modules generated by spack stack setup-meta-modules.

I've looked at Lmod's inherit() function; it's not clear to me that it can be applied practically here.

The interim solution in this PR is to not load spack-managed-x86-64_v3/v1.0, but instead to prepend the path it sets to MODULEPATH, thus preventing the default spack-managed tree from being swapped out. The default MODULEPATH is:

$ echo $MODULEPATH
/apps/spack-managed/modulefiles/linux-rocky9-x86_64/Core:/apps/other/modulefiles:/apps/containers/modulefiles:/apps/licensed/modulefiles

Loading spack-managed-x86-64_v3/v1.0 yields:

$ module load spack-managed-x86-64_v3/v1.0
$ echo $MODULEPATH
/apps/spack-managed-x86_64_v3-v1.0/modulefiles/Core:/apps/other/modulefiles:/apps/containers/modulefiles:/apps/licensed/modulefiles

The difference is the x86_64_v3-v1.0 path, and the module's use of pushenv() is intentional:

The old and new stacks are not compatible. The expectation is to load spack-managed before executing any other module use statements. pushenv is the only mechanism that allows for swapping between the two stacks that consistently removes the old tree, but returns it on unload. Using the "remove_path" function doesn't revert the changes.
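
The two behaviors described above can be contrasted with a small bash sketch that manipulates MODULEPATH as a plain string. It assumes only what is stated here: loading spack-managed-x86-64_v3/v1.0 replaces the Core entry via pushenv() and restores the old value on unload, while the interim approach prepends the v1.0 Core path (as module use does). The function names are hypothetical, the saved-value stack is a simplification of whatever Lmod tracks internally, and no Lmod commands are invoked.

```shell
#!/usr/bin/env bash
# Sketch only: emulates, on a plain string, the two ways of exposing the
# v1.0 tree. No Lmod commands are called; all function names are hypothetical.

DEFAULT_CORE=/apps/spack-managed/modulefiles/linux-rocky9-x86_64/Core
V1_CORE=/apps/spack-managed-x86_64_v3-v1.0/modulefiles/Core
OTHER=/apps/other/modulefiles:/apps/containers/modulefiles:/apps/licensed/modulefiles

saved_modulepaths=()   # stack of previous MODULEPATH values

# pushenv-style swap: save the current value, then replace it outright.
fake_load_swap() {
  saved_modulepaths+=("$MODULEPATH")
  MODULEPATH="$1"
}

# Undo the swap: restore the most recently saved value, then pop it.
fake_unload_swap() {
  local n=${#saved_modulepaths[@]}
  (( n > 0 )) || return 1
  MODULEPATH="${saved_modulepaths[n-1]}"
  unset "saved_modulepaths[n-1]"
}

# Interim approach in this PR: prepend, as "module use" does.
prepend_modulepath() {
  MODULEPATH="$1${MODULEPATH:+:${MODULEPATH}}"
}

MODULEPATH="${DEFAULT_CORE}:${OTHER}"
before="$MODULEPATH"

fake_load_swap "${V1_CORE}:${OTHER}"
after_swap="$MODULEPATH"        # default Core tree no longer present
fake_unload_swap
after_unload="$MODULEPATH"      # original value restored

prepend_modulepath "$V1_CORE"
after_prepend="$MODULEPATH"     # both Core trees present, v1.0 first

printf '%s\n' "$after_swap" "$after_unload" "$after_prepend"
```

The swap is fully reversible but hides the default tree while loaded; prepending keeps both trees visible, which is why it works as a stopgap.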

I'm not sure what constitutes a repeatable and automated solution, as it's specific to certain hosts. Perhaps it would work to modify MODULEPATH (in setup.sh, or after sourcing that file) to be what's required to find the compiler and MPI modules?
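
One possible shape for the MODULEPATH idea above, sketched in bash under stated assumptions: a step run after source setup.sh that prepends the v1.0 Core tree only on the hosts that need it, and only once. The function name adjust_modulepath_for_site, passing the site name as an argument, and the orion/hercules host list are assumptions for illustration, not part of spack-stack.

```shell
#!/usr/bin/env bash
# Hypothetical post-setup hook: prepend the spack-managed x86_64_v3 v1.0 tree
# on hosts that need it to find the oneAPI compiler and MPI modules.
V1_CORE=/apps/spack-managed-x86_64_v3-v1.0/modulefiles/Core

adjust_modulepath_for_site() {
  local site="$1"
  case "$site" in
    orion|hercules)   # assumed host list
      # Skip if already present so repeated calls are harmless.
      case ":${MODULEPATH}:" in
        *":${V1_CORE}:"*) ;;
        *) MODULEPATH="${V1_CORE}${MODULEPATH:+:${MODULEPATH}}" ;;
      esac
      ;;
    *) ;;  # other sites: leave MODULEPATH untouched
  esac
  export MODULEPATH
}

# Example: after "source setup.sh" on Hercules (site name supplied by the caller).
MODULEPATH=/apps/other/modulefiles
adjust_modulepath_for_site hercules
adjust_modulepath_for_site hercules   # second call is a no-op
echo "$MODULEPATH"
```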

@RatkoVasic-NOAA
Collaborator

How about switching to intel-oneapi-compilers/2024.1.0 instead of intel-oneapi-compilers/2024.2.1?
Is there anything that we have in 2024.2 that we don't have in 2024.1?

@climbfuji
Collaborator

climbfuji commented Mar 12, 2025

2024.1 isn't supported: if I recall correctly, there are bugs in icx and icpx that prevent us from using it or building Python stacks with it.

Let's table this for now and let's make sure we have an issue in spack-stack to look into it after the 1.9.1 release.

@climbfuji
Collaborator

Several other site configs have instructions that must be run before building spack-stack environments (e.g., before source setup.sh, spack stack create env ..., etc.) and/or before loading spack-stack modules. See https://github.com/JCSDA/spack-stack/wiki/spack%E2%80%90stack%E2%80%901.9.1-release-documentation-(placeholder), for example Derecho. This is absolutely ok.

@climbfuji climbfuji merged commit ba656ce into JCSDA:release/1.9.0 Mar 12, 2025
8 of 9 checks passed
@rickgrubin-noaa
Collaborator Author

This makes sense, and I suspect that it will work.

I am not convinced that it fixes the loading of modules for users: the spack-stack meta modules (stack-oneapi, stack-intel-oneapi-mpi) will attempt to load intel-oneapi-compilers/2024.2.1 and intel-oneapi-mpi/2021.13.1, and for that to succeed, one must first load spack-managed-x86-64_v3/v1.0, which takes us right back to the issue "Orion & Hercules module config issue for spack-stack-1.9.0".
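
The ordering constraint described here can be expressed as a preflight check, sketched in bash, that a user script might run before module load stack-oneapi. It assumes, per this thread, that the oneAPI compiler and MPI modules are only found once the v1.0 Core tree is on MODULEPATH; check_oneapi_prereqs is a hypothetical name, and the script does not call Lmod.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check before loading the stack-oneapi meta module.
# Assumption (from this thread): intel-oneapi-compilers/2024.2.1 and
# intel-oneapi-mpi/2021.13.1 are only found once the v1.0 Core tree is on
# MODULEPATH, i.e. after loading spack-managed-x86-64_v3/v1.0.
REQUIRED_TREE=/apps/spack-managed-x86_64_v3-v1.0/modulefiles/Core

check_oneapi_prereqs() {
  case ":${MODULEPATH}:" in
    *":${REQUIRED_TREE}:"*)
      return 0 ;;
    *)
      echo "error: load spack-managed-x86-64_v3/v1.0 before stack-oneapi" >&2
      return 1 ;;
  esac
}

# Example: the default MODULEPATH fails the check; adding the v1.0 tree passes it.
MODULEPATH=/apps/spack-managed/modulefiles/linux-rocky9-x86_64/Core
check_oneapi_prereqs 2>/dev/null && default_ok=yes || default_ok=no
MODULEPATH="${REQUIRED_TREE}:${MODULEPATH}"
check_oneapi_prereqs && v1_ok=yes || v1_ok=no
echo "default: $default_ok, with v1.0 tree: $v1_ok"
```

A check like this only reports the problem earlier; it does not remove the need to load spack-managed-x86-64_v3/v1.0 first.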

@rickgrubin-noaa rickgrubin-noaa deleted the fix_site_configs branch March 21, 2025 21:56

3 participants