Merge spack develop as of 2023/07/10 into jcsda_emc_spack_stack#672

Merged
AlexanderRichert-NOAA merged 27 commits into JCSDA:develop from climbfuji:feature/merge_spack_develop_20230710 on Aug 8, 2023
@climbfuji
Collaborator

@climbfuji climbfuji commented Jul 14, 2023

Summary

This is the spack-stack PR for JCSDA/spack#295 (merge spack develop as of 2023/07/10 into jcsda_emc_spack_stack).

Update: many serious bugs with this version of spack.

Note: The only way to convince spack to use an existing external git was to add buildable: False to packages.yaml and to run spack concretize --reuse.
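
For illustration only, the kind of packages.yaml entry described in the note can be sketched as a small Python helper that builds the fragment and prints it as JSON (which is also valid YAML). The git version and the /usr prefix below are placeholder assumptions, not values taken from this PR.

```python
import json


def external_package_entry(name, version, prefix):
    """Build a packages.yaml fragment that marks `name` as an external,
    non-buildable package. `version` and `prefix` are supplied by the caller."""
    return {
        "packages": {
            name: {
                # buildable: False tells spack never to build this package itself
                "buildable": False,
                "externals": [{"spec": f"{name}@{version}", "prefix": prefix}],
            }
        }
    }


if __name__ == "__main__":
    # Placeholder version and prefix; adjust to the git actually installed.
    fragment = external_package_entry("git", "2.39.2", "/usr")
    print(json.dumps(fragment, indent=2))
```

The printed fragment would still need to be merged into the site or environment packages.yaml by hand.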

Testing

  • CI testing
  • jedi-bundle testing on AWS Parallel Cluster with Intel: Only the following tests fail, and these are not due to the spack update (yaml validation error in soca):
The following tests FAILED:
	1521 - test_soca_forecast_pseudo (Failed)
	1548 - test_soca_hofx_4d_pseudo (Failed)
	1555 - test_soca_3dvarfgat_pseudo (Failed)
  • There is still some uncertainty about how to deal with _libiconv vs _iconv on macOS with this update, but a working version for Rosetta2 on my macOS gives the following ctest errors for jedi-bundle (the usual/known problems):
The following tests FAILED:
	 14 - test_util_signal_trap (Failed)
	336 - saber_test_error_covariance_training_bump_hdiag-nicas_2_1-1 (Failed)
	412 - saber_test_error_covariance_training_bump_hdiag-nicas_2_2-1 (Failed)
	1042 - ufo_test_tier1_test_ufo_qc_variableassignment (Failed)
	1162 - ufo_test_tier1_test_ufo_opr_gnssrorefmetoffice (Failed)
	1166 - ufo_test_tier1_test_ufo_opr_gnssrobendmetoffice_nopseudo (Failed)
	1174 - ufo_test_tier1_test_ufo_opr_groundgnssmetoffice (Failed)
	1227 - ufo_test_tier1_test_ufo_opr_scatwind_neutral_metoffice (Failed)
	1400 - fv3jedi_test_tier1_errorcovariance (Failed)
	1444 - fv3jedi_test_tier1_errorcovariance_bump (Failed)
	1507 - test_soca_errorcovariance (Failed)
Errors while running CTest
Output from these tests are in: /Users/heinzell/work/spack-stack/spack-stack-update-from-spack-dev-20230710/skylab-testing/build-release/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

Notes from Alex:

  • added commit 27f148f, which reflects the change in modules.yaml config from blacklist/whitelist to exclude/include.
  • updated py-netcdf4 to work with recent py-numpy
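
As a rough sketch of the blacklist/whitelist to exclude/include change mentioned above, the following Python helper renames those two keys in an already-loaded modules.yaml dictionary. It is not the code from commit 27f148f; the sample structure is an assumption, and other renamed keys in spack's modules schema are not handled here.

```python
# Old key -> new key, limited to the two renames named in the note above.
RENAMES = {"blacklist": "exclude", "whitelist": "include"}


def rename_module_keys(node):
    """Return a copy of a parsed modules.yaml tree with the old keys renamed.

    Dictionaries and lists are walked recursively; all other values are
    returned unchanged. Key collisions (old and new key both present) are
    not resolved in this sketch.
    """
    if isinstance(node, dict):
        return {RENAMES.get(key, key): rename_module_keys(value)
                for key, value in node.items()}
    if isinstance(node, list):
        return [rename_module_keys(item) for item in node]
    return node


if __name__ == "__main__":
    # Hypothetical fragment of a modules.yaml file, already parsed into a dict.
    old = {"modules": {"default": {"lmod": {"blacklist": ["gcc"],
                                            "whitelist": ["python"]}}}}
    print(rename_module_keys(old))
```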

Applications affected

Potentially all.

Systems affected

All.

Dependencies

Issue(s) addressed

Resolves #667

Checklist

  • This PR addresses one issue/problem/enhancement, or has a very good reason for not doing so.
  • These changes have been tested on the affected systems and applications.
  • All dependency PRs/issues have been resolved and this PR can be merged.

@climbfuji climbfuji force-pushed the feature/merge_spack_develop_20230710 branch from debed70 to 7baeee6 on July 17, 2023 19:38
@climbfuji
Collaborator Author

climbfuji commented Jul 19, 2023

Here are my notes for conflicts/changes/things in the spack submodule to watch out for:

first up: GREP FOR "DH* 20230710"

resolved conflicts in /.github/workflows/unit_tests.yaml

resolved conflicts in /.github/workflows/valid-style.yml

git checkout --theirs lib/spack/spack/environment/environment.py

git checkout --theirs lib/spack/spack/test/cmd/env.py

git checkout --theirs lib/spack/spack/test/env.py

resolved conflicts in  lib/spack/spack/test/modules/lmod.py

resolved conflicts in  share/spack/templates/container/singularity.def

resolved conflicts in  var/spack/repos/builtin/packages/esmf/package.py

resolved conflicts in  var/spack/repos/builtin/packages/fms/package.py

resolved conflicts in  var/spack/repos/builtin/packages/freetype/package.py

resolved conflicts in  var/spack/repos/builtin/packages/g2c/package.py

resolved conflicts in  var/spack/repos/builtin/packages/gettext/package.py

resolved conflicts in  var/spack/repos/builtin/packages/git/package.py

git checkout --theirs var/spack/repos/builtin/packages/glib/package.py

resolved conflicts in  var/spack/repos/builtin/packages/hdf/package.py

resolved conflicts in  var/spack/repos/builtin/packages/ip/package.py

resolved conflicts in  var/spack/repos/builtin/packages/libtiff/package.py

resolved conflicts in  var/spack/repos/builtin/packages/madis/package.py

resolved conflicts in  var/spack/repos/builtin/packages/mpich/package.py
--> FIXED TODO! SEND THIS RIGHT BACK TO SPACK DH* 20230710

git checkout --theirs var/spack/repos/builtin/packages/netcdf-c/package.py

resolved conflicts in  var/spack/repos/builtin/packages/openblas/package.py

resolved conflicts in  var/spack/repos/builtin/packages/py-contourpy/package.py

resolved conflicts in  var/spack/repos/builtin/packages/py-h5py/package.py

resolved conflicts in  var/spack/repos/builtin/packages/py-ruamel-yaml-clib/package.py

resolved conflicts in  var/spack/repos/builtin/packages/sp/package.py

git checkout --theirs var/spack/repos/builtin/packages/subversion/package.py

Major update/bug fix in var/spack/repos/builtin/packages/bufr/

Things to note:

  1. One of the changes in spack now creates jedi-fv3-env/1.0.0 instead of xjedi-fv3-env/unified-dev. We should consider replacing, in a follow-up PR, the version 1.0.0 in the jcsda-emc-bundles packages.py with develop or main.

  2. There is now a warning at the end of the spack install command on systems using lua that is annoying but doesn’t cause any harm:
    ==> Warning: detected custom TCL modules configuration in /mnt/experiments-efs/the-real-dom/sp-st-upd-20230710-tst/merge_dev_20230710/envs/unified-env/common, while TCL module file generation for the default module set is disabled. In Spack v0.20 module file generation has been disabled by default. To enable it run:

    $ spack config add 'modules:default:enable:[tcl]'
    That’s ok for now, but since we need to keep supporting tcl for dinosaur cray systems, we need to find a better way for spack-stack-1.5.0. Maybe have the lua/tcl module configs in separate files, detect from the site config used for spack stack create env which one is used (but may need an additional flag to override the default lua for generic linux.default and macos.default), and then copy the correct one.
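
The idea in item 2 could look roughly like the sketch below: pick the module system from the site config unless an override is given, then copy the matching file into the environment. The file names modules_lmod.yaml and modules_tcl.yaml, the function names, and the override parameter are all hypothetical; nothing like this exists in spack-stack as of this PR.

```python
import shutil
from pathlib import Path

VALID_MODULE_SYSTEMS = ("lmod", "tcl")


def pick_module_config(site_module_system, override=None):
    """Return the (hypothetical) config file name for the chosen module system.

    `override` plays the role of the additional flag mentioned above and
    takes precedence over what the site config says.
    """
    choice = override or site_module_system
    if choice not in VALID_MODULE_SYSTEMS:
        raise ValueError(f"unknown module system: {choice!r}")
    return f"modules_{choice}.yaml"


def install_module_config(common_dir, env_dir, site_module_system, override=None):
    """Copy the selected config into the environment as modules.yaml."""
    source = Path(common_dir) / pick_module_config(site_module_system, override)
    target = Path(env_dir) / "modules.yaml"
    shutil.copyfile(source, target)
    return target
```

Keeping the choice in one small function would make it easy to default generic sites to lmod while still letting a user ask for tcl explicitly.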

Collaborator

@srherbener srherbener left a comment

I tested on my mac arm64 laptop. spack-stack built successfully and jedi-bundle built successfully (but I had to uninstall git from homebrew and use /usr/bin/git instead because of the "_iconv symbol not found" issue).

I tried running ctest and ran into the "_iconv symbol not found" issue there as well.

@climbfuji climbfuji added the bug Something is not working label Jul 22, 2023
@srherbener
Collaborator

I have tested jedi-bundle on Orion (Intel/Impi) and all ctests pass except for one ioda converter test: test_iodaconv_ncep_bufr. This test fails because the new [email protected] library build installs the python module ncepbufr in lib64, while the module file sets up PYTHONPATH to use lib.
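
A quick way to check where a python module actually landed under an install prefix is sketched below. The directory layout (lib or lib64, then python*/site-packages) is an assumption about a typical install, and the helper name is made up for this example.

```python
import tempfile
from pathlib import Path


def find_python_module_dirs(prefix, module_name):
    """List the site-packages directories under prefix/lib and prefix/lib64
    that contain `module_name` as a package, a .py file, or a compiled .so."""
    hits = []
    for libdir in ("lib", "lib64"):
        base = Path(prefix) / libdir
        if not base.is_dir():
            continue
        for site in sorted(base.glob("python*/site-packages")):
            if ((site / module_name).is_dir()
                    or (site / f"{module_name}.py").is_file()
                    or any(site.glob(f"{module_name}*.so"))):
                hits.append(site)
    return hits


if __name__ == "__main__":
    # Demo on a throwaway prefix that mimics the lib64-only install described above.
    with tempfile.TemporaryDirectory() as prefix:
        pkg = Path(prefix) / "lib64" / "python3.10" / "site-packages" / "ncepbufr"
        pkg.mkdir(parents=True)
        for found in find_python_module_dirs(prefix, "ncepbufr"):
            print(found)
```

Running this against the real bufr install prefix and comparing the result with the PYTHONPATH entry in the generated module file would show the lib vs lib64 mismatch directly.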

This is a very minor (and easily fixable) fault and weighing this against thrashing Dom's feature branches to get this fixed now, it seems that the best path forward is to defer fixing this until after these PRs are merged in.

This fault is not interfering with my skylab testing, it shouldn't interfere with any other testing, and we should be able to get a fix for it in plenty of time for the 1.5.0 release.

What do others think?

@AlexanderRichert-NOAA
Collaborator

@srherbener I'll look into it real quick and see if there's a straightforward way to fix it that won't require us to re-run tests. If not, then yes I'm okay with us making an issue for it and circling back to it later.

@srherbener
Collaborator

The skylab-atm-land test is working. The process completed two cycles successfully, which indicates that jedi-bundle and the software stack are good.

Collaborator

@srherbener srherbener left a comment

Mac and JEDI testing look good

@AlexanderRichert-NOAA
Collaborator

@srherbener I got a slightly different behavior when installing bufr on my machine (where it seems to install the python library both under lib/ and lib64/). If we're okay with proceeding for now, I'll create an issue and consult with the bufr and possibly spack developers and see where the issue lies.

@srherbener
Collaborator

> @srherbener I got a slightly different behavior when installing bufr on my machine (where it seems to install the python library both under lib/ and lib64/). If we're okay with proceeding for now, I'll create an issue and consult with the bufr and possibly spack developers and see where the issue lies.

@AlexanderRichert-NOAA thanks for looking into this. I am good with proceeding with the current [email protected] library configuration for now.

Also thanks for the further investigation with the bufr and spack developers!

@AlexanderRichert-NOAA AlexanderRichert-NOAA merged commit d09ff3d into JCSDA:develop Aug 8, 2023
@climbfuji
Collaborator Author

@AlexanderRichert-NOAA @srherbener @ulmononian Thanks very much for getting this merged while I was away.

I am curious if @srherbener had to use spack concretize --reuse to convince spack to use the external libiconv, or if the changes in the macos.default packages.yaml were enough. Hopefully the hardcoded version in that file won't cause problems later.

@srherbener
Collaborator

> @AlexanderRichert-NOAA @srherbener @ulmononian Thanks very much for getting this merged while I was away.
>
> I am curious if @srherbener had to use spack concretize --reuse to convince spack to use the external libiconv, or if the changes in the macos.default packages.yaml were enough. Hopefully the hardcoded version in that file won't cause problems later.

I did not need to run spack concretize --reuse, i.e., the changes in the macos.default packages.yaml were enough to get it working on my Mac.

I'm not crazy about the hardcoded version number either. Perhaps there is a better way to do this?

@climbfuji
Collaborator Author

> > @AlexanderRichert-NOAA @srherbener @ulmononian Thanks very much for getting this merged while I was away.
> >
> > I am curious if @srherbener had to use spack concretize --reuse to convince spack to use the external libiconv, or if the changes in the macos.default packages.yaml were enough. Hopefully the hardcoded version in that file won't cause problems later.
>
> I did not need to do spack concretize --reuse, ie the changes in the macos.default packages.yaml were enough to get it working on my Mac.
>
> I'm not crazy about the hardcoded version number either. Perhaps there is a better way to do this?

I tried this again today - didn't need --reuse either.

I think I will have a better solution for libiconv today - detect it as an external package so that the version number is correct. Hopefully this works with the macOS "magic" you described earlier.

@srherbener
Collaborator

@climbfuji just a note. I tried to get spack external find to locate libiconv, but didn't have any luck. I think this was happening because the file /usr/lib/libiconv.dylib doesn't actually exist on the file system.

I recall reading somewhere that you have to use dlopen to detect /usr/lib/libiconv.dylib. I wonder if that means that we could write a short script/program for the mac that does the dlopen and then we could discover the true libiconv version that way. I think dlopen is accessible from python.
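
A minimal sketch of that dlopen idea, using Python's ctypes: it tries to load the library by path and read an integer version symbol. It assumes the library exports GNU libiconv's _libiconv_version variable, encoded as (major << 8) + minor; whether the macOS system library does so has not been verified here, and the function names are made up for this example.

```python
import ctypes


def decode_libiconv_version(raw):
    """Decode GNU libiconv's packed version integer, (major << 8) + minor."""
    return f"{raw >> 8}.{raw & 0xFF}"


def probe_libiconv(path="/usr/lib/libiconv.dylib"):
    """Try to dlopen `path` and read its version.

    Returns a "major.minor" string, or None if the library cannot be loaded
    or does not export the assumed _libiconv_version symbol.
    """
    try:
        lib = ctypes.CDLL(path)  # dlopen under the hood; no file check first
    except OSError:
        return None
    try:
        raw = ctypes.c_int.in_dll(lib, "_libiconv_version").value
    except ValueError:
        return None
    return decode_libiconv_version(raw)


if __name__ == "__main__":
    print(probe_libiconv())
```

Because the load goes through dlopen rather than a filesystem lookup, this approach does not depend on /usr/lib/libiconv.dylib existing as a regular file.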

Labels

bug (Something is not working), INFRA (JEDI Infrastructure)

Development

Successfully merging this pull request may close these issues.

  • Update spack fork from authoritative repo (July 2023)
  • Issue: macOS CI testing no longer uses full binary cache

4 participants