Enable modules for packages not compiled with preferred compiler (update spack-stack setup-meta-modules); remove need for external ecflow #1257

Merged
climbfuji merged 31 commits into JCSDA:develop from
climbfuji:feature/boost_gcc4intel
Aug 28, 2024
@climbfuji
Collaborator

@climbfuji climbfuji commented Aug 20, 2024

Summary

Note. I pinky swear that I am going to rewrite that old setup-meta-modules extension that was cobbled together in an afternoon three years ago with the thinking "let's try something and then do it properly next month".

This PR makes the necessary changes to the site configs and the setup-meta-modules extension so that certain packages can be built with compilers other than the preferred compiler while their modules can still be loaded. We can only do this now that we have the concept of a preferred compiler in our environments. One assumption made here is that the MPI provider, if any, is compiled with the preferred compiler (I think this is a reasonable assumption to make).

Our use cases are:

  1. bison needs to be compiled with gcc when the preferred compiler is oneapi. Strictly speaking, we don't need the bison module in this case, but I tested it on my laptop and it works.
  2. ecflow and boost must be compiled with gcc when the preferred compiler is intel. This allows us to move away from external ecflow packages that don't work with the proposed update of Python to 3.11.7 (because the external ecflow was compiled with an old Python 3.9). In this case, we need the ecflow modulefile. I tested this on the Ubuntu CI runner and on Narwhal.

Caveat: I have not tested whether this new capability works with packages that depend on MPI (which is compiled with the preferred compiler) but are themselves compiled with a different compiler (e.g. something like intel-oneapi-mpi/2021.12.0/gcc/11.2.0 when the packages using the preferred compiler would have intel-oneapi-mpi/2021.12.0/intel/2021.12.0).
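
To make the caveat concrete, here is a minimal sketch that splits a hierarchy path like the two examples above and checks which compiler it names. The helper names `split_mpi_hierarchy` and `built_with_preferred`, and the assumption that such paths always have exactly four components, are illustrative only and are not taken from the setup-meta-modules extension.

```python
from typing import NamedTuple


class MpiHierarchy(NamedTuple):
    mpi_name: str
    mpi_version: str
    compiler_name: str
    compiler_version: str


def split_mpi_hierarchy(path: str) -> MpiHierarchy:
    """Split '<mpi>/<mpi version>/<compiler>/<compiler version>' into its parts.

    Hypothetical helper: the four-component layout is inferred from the
    example paths in the caveat, not from the real extension.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 4:
        raise ValueError(f"unexpected hierarchy path: {path!r}")
    return MpiHierarchy(*parts)


def built_with_preferred(path: str, preferred_compiler: str) -> bool:
    """Return True if the compiler component names the preferred compiler."""
    return split_mpi_hierarchy(path).compiler_name == preferred_compiler
```

With `intel` as the preferred compiler, the second example path would pass this check and the first (the gcc one) would not, which is the untested combination.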

Still todo:

  • Configure gcc compiler used as backend for Intel for Atlantis, Gaea C5, Gaea C6
    • Atlantis deferred to a follow-up PR that also configures the oneAPI compiler
    • Gaea C5 and Gaea C6 - @AlexanderRichert-NOAA @RatkoVasic-NOAA any last-minute changes for the gcc backend for Gaea C5 and C6 for this PR, or do you want to do this as a follow-up PR/when we roll out the release?
  • Remove external ecflow from packages.yaml and make sure every site has an external qt@5 in its packages.yaml
  • Fix unit test, or disable because we are going to rewrite the setup-meta-modules extension after the 1.8.0 release. Yes, this time for sure! done

Testing

  • Tested for oneapi / bison on @climbfuji's laptop (unified environment)
  • Tested for intel / ecflow on Ubuntu CI runner and on Narwhal
  • More testing?

Applications affected

None (no changes to how applications are run)

Systems affected

All using Intel or oneAPI compilers

Dependencies

none

Issue(s) addressed

Link the issues addressed or resolved by this PR (use Fixes #??? for fully resolved issues)

Checklist

  • This PR addresses one issue/problem/enhancement, or has a very good reason for not doing so.
  • These changes have been tested on some of the affected systems and applications.
  • All dependency PRs/issues have been resolved and this PR can be merged.

@ashley314 ashley314 mentioned this pull request Aug 22, 2024
3 tasks
@climbfuji climbfuji force-pushed the feature/boost_gcc4intel branch from e0c6d92 to 5b75c23 Compare August 22, 2024 18:29
…n env using one principal (preferred) compiler
@climbfuji climbfuji force-pushed the feature/boost_gcc4intel branch from 5b75c23 to 5d22dd7 Compare August 22, 2024 18:38
@climbfuji climbfuji force-pushed the feature/boost_gcc4intel branch from 8ac23f7 to 99aa58d Compare August 23, 2024 14:58
logging.info(" ... ... appending {} to MODULEPATHS_SAVE".format(modulepath_save))
MODULEPATHS_SAVE.append(modulepath_save)

# For tcl modules remove the compiler prefices from the module contents
Collaborator Author

This block (removing the compiler prefixes for tcl modules) was moved up because it is needed for all compilers, not just the preferred compiler. This allows us to skip the remainder of the loop for compilers that are not the preferred compiler.
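
The control flow described in this comment can be sketched roughly as follows. This is not the code from the extension: `process_compilers` and the two callbacks are hypothetical stand-ins, chosen only to show the shared step running before the early `continue`.

```python
import logging
from typing import Callable, Iterable, List


def process_compilers(
    compilers: Iterable[str],
    preferred_compiler: str,
    strip_tcl_prefixes: Callable[[str], None],
    build_meta_modules: Callable[[str], None],
) -> List[str]:
    """Run the tcl prefix step for every compiler, the rest only for the preferred one.

    All names here are illustrative assumptions, not the extension's API.
    """
    handled = []
    for compiler in compilers:
        # Needed for all compilers, so it runs before the early exit below.
        strip_tcl_prefixes(compiler)
        if compiler != preferred_compiler:
            logging.info(" ... skipping remainder of loop for %s", compiler)
            continue
        # Only the preferred compiler reaches the meta-module step.
        build_meta_modules(compiler)
        handled.append(compiler)
    return handled
```

Passing `["gcc", "intel"]` with `intel` as the preferred compiler would call the prefix step for both compilers and the meta-module step for `intel` only.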

@climbfuji climbfuji force-pushed the feature/boost_gcc4intel branch from ded8f03 to 871fd3d Compare August 23, 2024 16:17
@climbfuji climbfuji marked this pull request as ready for review August 23, 2024 18:09
Collaborator

@RatkoVasic-NOAA RatkoVasic-NOAA left a comment

Successfully installed intel on Hercules.
Approved.

@climbfuji climbfuji self-assigned this Aug 27, 2024
@ashley314
Collaborator

I was able to install on S4, although ecflow might not have compiled successfully with gcc. I also cannot figure out how to load the Spack-built ecflow instead of the version at /data/prod/jedi/spack-stack/.

image

@climbfuji
Collaborator Author

@ashley314 Your issue may be that you still have an external ecflow in the S4 site config. You will want to remove that and make sure you have an external qt@5 instead. And you will want to remove the ecflow module from the list of excluded modules in the site config. I think @srherbener can help you (I'll be talking about the PR in the spack-stack meeting today).
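
As a rough aid for the three checks suggested here, the sketch below inspects a site config that has already been loaded into plain Python dicts. The function name `check_site_config`, the dict layout (package name mapped to an `externals` list of entries with a `spec` key), and the flat list of excluded modules are assumptions made for illustration; the real site config files may be organized differently.

```python
from typing import Dict, List


def check_site_config(packages: Dict[str, dict], excluded_modules: List[str]) -> List[str]:
    """Return the suggested fixes that still apply to this (hypothetical) site config."""
    problems = []
    # Suggestion 1: the site should no longer declare an external ecflow.
    if packages.get("ecflow", {}).get("externals"):
        problems.append("remove the external ecflow entry")
    # Suggestion 2: an external qt@5 should be declared instead.
    qt_entries = packages.get("qt", {}).get("externals", [])
    qt_specs = [entry.get("spec", "") for entry in qt_entries]
    if not any(spec.startswith("qt@5") for spec in qt_specs):
        problems.append("add an external qt@5")
    # Suggestion 3: ecflow must not be in the list of excluded modules.
    if "ecflow" in excluded_modules:
        problems.append("remove ecflow from the excluded modules")
    return problems
```

An empty return value means none of the three conditions from this comment is violated; it does not guarantee that the ecflow module will show up.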

Collaborator

@AlexanderRichert-NOAA AlexanderRichert-NOAA left a comment

This seems to work fine based on testing on my local machine (namely, loading stack-intel reveals gcc-built modules, e.g., boost).

@climbfuji
Collaborator Author

> This seems to work fine based on testing on my local machine (namely, loading stack-intel reveals gcc-built modules, e.g., boost).

Thanks for testing @AlexanderRichert-NOAA ! I'll wait for approval from JCSDA before I merge this.

@climbfuji climbfuji requested a review from srherbener August 27, 2024 20:44
@ashley314
Collaborator

@climbfuji thanks for the suggestions. I worked with @srherbener: we removed ecflow as an external package, and qt was already declared. The exclude for ecflow also needed to be removed from the modules file. We were then able to install ecflow and made it through the lmod refresh with everything looking good. We ran the meta module script, but when trying to load the jedi environment, ecflow is still not showing up. Do you have any ideas on what else is missing?

image

Although still seeing only boost:
image

@climbfuji
Collaborator Author

> @climbfuji thanks for the suggestions. I worked with @srherbener: we removed ecflow as an external package, and qt was already declared. The exclude for ecflow also needed to be removed from the modules file. We were then able to install ecflow and made it through the lmod refresh with everything looking good. We ran the meta module script, but when trying to load the jedi environment, ecflow is still not showing up. Do you have any ideas on what else is missing?
>
> image
>
> Although still seeing only boost:
>
> image

I'll ping you in slack

packages:
  all:
    compiler:: [[email protected]]
    compiler:: [[email protected]] # todo: add gcc here
Collaborator

Just want to double check. Is the intention to wait until after the 1.8.0 release to make these changes for Gaea?

Collaborator Author

We'll do this as part of the site config updates on the release branch and then bring it back to develop if that makes sense?

Collaborator

Sure, that makes sense. Happy to approve!

@climbfuji climbfuji merged commit bcd873d into JCSDA:develop Aug 28, 2024
@climbfuji climbfuji deleted the feature/boost_gcc4intel branch August 28, 2024 15:25
@climbfuji climbfuji mentioned this pull request Aug 30, 2024
74 tasks
5 participants