Commit 34bfda1

climbfuji, RatkoVasic-NOAA, Ubuntu, and fmahebert authored
Add template for jedi-mpas-nvidia and documentation for setting up environment (#1084)

* First version of configs/templates/jedi-mpas-nvidia-dev template
* Add pkg-config to list of excluded lua/tcl modules
* Update configs/sites/noaa-gcloud/README.md: add R2D2 scrubber if applicable
* Add tier-2 section back in doc/source/PreConfiguredSites.rst
* Update submodule pointer for spack
* Update path to modulefiles on Hera
* Add a new section to doc/source/NewSiteConfigs.rst specifically for building the jedi-mpas-nvidia environment with the Nvidia compilers

Co-authored-by: RatkoVasic-NOAA <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Francois Hebert <[email protected]>
1 parent 3d1a782 commit 34bfda1

File tree

7 files changed: +251 additions, −5 deletions

configs/common/modules_lmod.yaml

Lines changed: 1 addition & 0 deletions

@@ -71,6 +71,7 @@ modules:
 - openssl
 - perl
 - pkgconf
+- pkg-config
 - qt
 - randrproto
 - readline
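For orientation, these entries live in the exclude list of the lmod section of the common modules config. A trimmed sketch of the surrounding structure — the nesting keys shown here are assumed from spack's modules configuration schema, and the actual spack-stack file may differ slightly:

```yaml
modules:
  default:
    lmod:
      # Packages for which no module file should be generated
      exclude:
      - openssl
      - perl
      - pkgconf
      - pkg-config   # added by this commit
```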

configs/common/modules_tcl.yaml

Lines changed: 1 addition & 0 deletions

@@ -73,6 +73,7 @@ modules:
 - openssl
 - perl
 - pkgconf
+- pkg-config
 - qt
 - randrproto
 - readline

configs/sites/noaa-gcloud/README.md

Lines changed: 8 additions & 0 deletions

@@ -18,6 +18,7 @@ yum install -y xorg-x11-apps
 yum install -y perl-IPC-Cmd
 yum install -y gettext-devel
 yum install -y m4
+yum install -y finger
 exit

 # Create a script that can be added to the cluster resource config so that these packages get installed automatically
@@ -37,10 +38,17 @@ yum install -y xorg-x11-apps
 yum install -y perl-IPC-Cmd
 yum install -y gettext-devel
 yum install -y m4
+yum install -y finger
 EOF

 chmod a+x /contrib/admin/basic_setup.sh

+# Enable R2D2 experiment scrubber in cron (if applicable)
+
+Refer to https://github.com/JCSDA-internal/jedi-tools/tree/develop/crontabs/noaa-gcloud
+
+The scripts are all set up in the /contrib space and should work after a restart of the cluster. Note, however, that any updates to R2D2 that require changes to the scrubber scripts must be applied to these scripts manually.
+
 # Create a mysql config for local R2D2 use (if applicable)

 sudo su
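The scrubber setup above could be wired into cron roughly as follows. This is a hedged sketch only: the real script names and schedules live in the jedi-tools crontabs directory linked above, and the path and timing shown here are hypothetical.

```
# Hypothetical crontab entry for the R2D2 experiment scrubber (every 6 hours);
# the actual script name comes from jedi-tools/crontabs/noaa-gcloud
0 */6 * * * /contrib/admin/r2d2_scrubber.sh >> /contrib/admin/r2d2_scrubber.log 2>&1
```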
configs/templates/jedi-mpas-nvidia-dev

Lines changed: 54 additions & 0 deletions

@@ -0,0 +1,54 @@
+# The intent of this template is to minimize the jedi-mpas-env virtual environment
+# to provide only the packages needed to compile jedi-bundle with mpas (only).
+# Updated April 2024 by Dom Heinzeller
+spack:
+  concretizer:
+    unify: when_possible
+  view: false
+  include:
+  - site
+  - common
+
+  specs:
+
+  # Externals or gcc-built packages
+  - cmake
+  - git
+  - git-lfs
+  - wget
+  - curl
+  - pkg-config
+  - python
+
+  # Several packages are commented out and not removed from the list;
+  # this is intentional since they may be needed for running ctest etc.
+
+  # Packages built with nvhpc
+  - zlib-api %nvhpc
+  - hdf5 %nvhpc
+  - netcdf-c %nvhpc ~blosc ~dap ~zstd
+  - netcdf-fortran %nvhpc
+  - parallel-netcdf %nvhpc
+  - parallelio %nvhpc
+  #- nccmp
+
+  - blas
+  - boost %nvhpc
+  #- bufr
+  - ecbuild %nvhpc
+  #- eccodes
+  - eckit %nvhpc
+  - ecmwf-atlas %nvhpc
+  - fckit %nvhpc
+  # Currently using openblas, would be nice if we could use the nvhpc package/provider for this
+  - fftw-api
+  # Doesn't build with nvhpc:
+  #- gsibec
+  - gsl-lite %nvhpc
+  - jedi-cmake %nvhpc
+  #- nlohmann-json
+  #- nlohmann-json-schema-validator
+  #- odc
+  - sp %nvhpc
+  - udunits %nvhpc
+  - jasper %nvhpc
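Per the documentation added elsewhere in this commit, the template above is instantiated with the spack-stack CLI. A minimal sketch (the site and environment names follow the docs in this commit):

```
spack stack create env --site linux.default --template jedi-mpas-nvidia-dev --name jedi-mpas-nvidia-env
cd envs/jedi-mpas-nvidia-env/
spack env activate .
```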

doc/source/NewSiteConfigs.rst

Lines changed: 171 additions & 1 deletion

@@ -13,7 +13,7 @@ It is also instructive to peruse the GitHub actions scripts in ``.github/workflo
 +-------------------------------------------+----------------------------------------------------------------------+---------------------------+
 | Compiler | Versions tested/in use in one or more site configs | Spack compiler identifier |
 +===========================================+======================================================================+===========================+
-| Intel classic (icc, icpc, ifort) | 2021.3.0 to the latest available version in oneAPI 2023.1.0 [#fn1]_ | ``intel@`` |
+| Intel classic (icc, icpc, ifort) | 2021.3.0 to the latest available version in oneAPI 2023.2.3 [#fn1]_ | ``intel@`` |
 +-------------------------------------------+----------------------------------------------------------------------+---------------------------+
 | Intel mixed (icx, icpx, ifort) | all versions up to latest available version in oneAPI 2023.1.0 | ``intel@`` |
 +-------------------------------------------+----------------------------------------------------------------------+---------------------------+
@@ -23,6 +23,8 @@ It is also instructive to peruse the GitHub actions scripts in ``.github/workflo
 +-------------------------------------------+----------------------------------------------------------------------+---------------------------+
 | LLVM clang (clang, clang++, w/ gfortran) | 10.0.0 to 14.0.3 | ``clang@`` |
 +-------------------------------------------+----------------------------------------------------------------------+---------------------------+
+| Nvidia HPC SDK (nvc, nvc++, nvfortran) | 12.3 (Nvidia HPC SDK 24.3) [#fn3]_ | ``nvhpc@`` |
++-------------------------------------------+----------------------------------------------------------------------+---------------------------+

 .. rubric:: Footnotes

@@ -33,6 +35,9 @@ It is also instructive to peruse the GitHub actions scripts in ``.github/workflo
    Note that ``[email protected]`` compiler versions are fully supported, and ``[email protected]`` will work but requires the :ref:`workaround noted below<apple-clang-15-workaround>`.
    Also, when using ``[email protected]`` you must use Command Line Tools version 15.1, and the Command Line Tools versions 15.3 and newer are not yet supported.

+.. [#fn3]
+   Support for Nvidia compilers is experimental and limited to a subset of packages. Please refer to :numref:`Section %s <NewSiteConfigs_Linux_CreateEnv_Nvidia>` below.
+
 .. _NewSiteConfigs_macOS:

 ------------------------------
@@ -419,6 +424,8 @@ The following instructions were used to prepare a basic Red Hat 8 system as it i

 This environment enables working with spack and building new software environments, as well as loading modules that are created by spack for building JEDI and UFS software.

+.. _NewSiteConfigs_Linux_Ubuntu_Prerequisites:
+
 Prerequisites: Ubuntu (one-off)
 -------------------------------------

@@ -473,6 +480,8 @@ The following instructions were used to prepare a basic Ubuntu 20.04 or 22.04 LT

 This environment enables working with spack and building new software environments, as well as loading modules that are created by spack for building JEDI and UFS software.

+.. _NewSiteConfigs_Linux_CreateEnv:
+
 Creating a new environment
 --------------------------

@@ -610,3 +619,164 @@ See the :ref:`documentation <Duplicate_Checker>` for usage information including
    spack stack setup-meta-modules

 15. You now have a spack-stack environment that can be accessed by running ``module use ${SPACK_STACK_DIR}/envs/unified-env.mylinux/install/modulefiles/Core``. The modules defined here can be loaded to build and run code as described in :numref:`Section %s <UsingSpackEnvironments>`.
+
+
+.. _NewSiteConfigs_Linux_CreateEnv_Nvidia:
+
+Creating a new environment with Nvidia compilers
+------------------------------------------------
+
+.. warning::
+   Support for Nvidia compilers is experimental and limited to a small subset of the packages in the unified environment. The Nvidia compilers are known for their bugs and flaws, and many packages simply don't build. The strategy for building environments with Nvidia is therefore the opposite of what it is with other supported compilers.
+
+In order to build environments with the Nvidia compilers, a different approach is needed than for our main compilers (GNU, Intel). Since many packages do not build with the Nvidia compilers, the idea is to provide as many packages as possible as external packages or to build them with ``gcc``. Because our spack extension ``spack stack setup-meta-modules`` does not support combinations of modules built with different compilers, packages that are not built with the Nvidia compilers need to fulfil the following two criteria:
+
+1. The package is used as a utility to build or run the code, but not linked into the application (this may be overly restrictive, but it ensures that the application will be able to leverage all of Nvidia's features, for example running on GPUs).
+
+2. One of the following applies:
+
+   a. The package is installed outside of the spack-stack environment and made available as an external package. A typical use case is a package that is installed using the OS package manager.
+
+   b. The package is built with another compiler (typically ``gcc``) within the same environment, and no modulefile is generated for the package. The spack modulefile generator in this case ensures that other packages that depend on this particular package have the necessary paths in their own modules. If the ``gcc`` compiler itself requires additional ``PATH``, ``LD_LIBRARY_PATH``, etc. variables to be set, these can be set in the spack compiler config for the Nvidia compiler (similar to how we configure the ``gcc`` backend for the Intel compiler).
+
+With all of that in mind, the following instructions were used on an Amazon Web Services EC2 instance running Ubuntu 22.04 to build an environment based on template ``jedi-mpas-nvidia-dev``. These instructions follow the one-off setup instructions in :numref:`Section %s <NewSiteConfigs_Linux_Ubuntu_Prerequisites>` and replace the instructions in :numref:`Section %s <NewSiteConfigs_Linux_CreateEnv>`.
+
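To illustrate criterion 2b above: extra ``PATH``/``LD_LIBRARY_PATH`` entries for a ``gcc`` backend can be attached to the Nvidia compiler entry in the site's ``compilers.yaml``. A hedged sketch following spack's compiler configuration schema — the gcc install paths here are hypothetical:

```yaml
compilers:
- compiler:
    spec: [email protected]
    paths:
      cc: /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/bin/nvc
      cxx: /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/bin/nvc++
      f77: /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/bin/nvfortran
      fc: /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/bin/nvfortran
    environment:
      prepend_path:
        # Make a gcc installed outside the default search paths usable at build time
        PATH: /opt/gcc/11/bin
        LD_LIBRARY_PATH: /opt/gcc/11/lib64
    flags: {}
    modules: []
    operating_system: ubuntu22.04
    target: x86_64
```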
+1. Follow the instructions in :numref:`Section %s <NewSiteConfigs_Linux_Ubuntu_Prerequisites>` to install the basic packages. In addition, install the following packages using ``apt``:
+
+   .. code-block:: console
+
+      sudo su
+      apt update
+      apt install -y cmake
+      apt install -y pkg-config
+      exit
+
+2. Download the latest version of the Nvidia HPC SDK following the instructions on the Nvidia website. For ``[email protected]``:
+
+   .. code-block:: console
+
+      curl https://developer.download.nvidia.com/hpc-sdk/ubuntu/DEB-GPG-KEY-NVIDIA-HPC-SDK | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-hpcsdk-archive-keyring.gpg
+      echo 'deb [signed-by=/usr/share/keyrings/nvidia-hpcsdk-archive-keyring.gpg] https://developer.download.nvidia.com/hpc-sdk/ubuntu/amd64 /' | sudo tee /etc/apt/sources.list.d/nvhpc.list
+      sudo su
+      apt update
+      apt-get install -y nvhpc-24-3
+      exit
+
+3. Load the correct module shipped with ``nvhpc-24-3``. Note that this is only required for ``spack`` to detect the compiler and ``openmpi`` library during the environment configuration below. It is not required when using the new environment to compile code.
+
+   .. code-block:: console
+
+      module purge
+      module use /opt/nvidia/hpc_sdk/modulefiles
+      module load nvhpc-openmpi3/24.3
+
+4. Clone spack-stack and its dependencies and activate the spack-stack tool.
+
+   .. code-block:: console
+
+      git clone --recurse-submodules https://github.com/jcsda/spack-stack.git
+      cd spack-stack
+
+      # Sources Spack from submodule and sets ${SPACK_STACK_DIR}
+      source setup.sh
+
+5. Create a pre-configured environment with the default (nearly empty) site config for Linux and activate it (optional: decorate the bash prompt with the environment name). At this point, only the ``jedi-mpas-nvidia-dev`` template is supported.
+
+   .. code-block:: console
+
+      spack stack create env --site linux.default --template jedi-mpas-nvidia-dev --name jedi-mpas-nvidia-env
+      cd envs/jedi-mpas-nvidia-env/
+      spack env activate [-p] .
+
+6. Temporarily set the environment variable ``SPACK_SYSTEM_CONFIG_PATH`` to modify site config files in ``envs/jedi-mpas-nvidia-env/site``:
+
+   .. code-block:: console
+
+      export SPACK_SYSTEM_CONFIG_PATH="$PWD/site"
+
+7. Find external packages and add them to the site config's ``packages.yaml``. If an external's bin directory hasn't been added to ``$PATH``, the command needs to be prefixed accordingly.
+
+   .. code-block:: console
+
+      spack external find --scope system \
+          --exclude bison --exclude cmake \
+          --exclude curl --exclude openssl \
+          --exclude openssh --exclude python
+      spack external find --scope system wget
+      spack external find --scope system openmpi
+      spack external find --scope system python
+      spack external find --scope system curl
+      spack external find --scope system pkg-config
+      spack external find --scope system cmake
+
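Step 7 populates ``site/packages.yaml`` with entries along these lines. The versions and prefixes below are illustrative only (a typical Ubuntu 22.04 install is assumed); the ``%nvhpc`` annotation on openmpi matches the external spec used in step 10:

```yaml
packages:
  cmake:
    externals:
    - spec: [email protected]
      prefix: /usr
  openmpi:
    externals:
    - spec: [email protected] %nvhpc
      prefix: /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/openmpi/openmpi-3.1.5
```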
+8. Find compilers and add them to the site config's ``compilers.yaml``:
+
+   .. code-block:: console
+
+      spack compiler find --scope system
+
+9. Unset the ``SPACK_SYSTEM_CONFIG_PATH`` environment variable:
+
+   .. code-block:: console
+
+      unset SPACK_SYSTEM_CONFIG_PATH
+
+10. Add the following block to ``envs/jedi-mpas-nvidia-env/spack.yaml`` (pay attention to the correct indentation; it should be at the same level as ``specs:``):
+
+    .. code-block:: yaml
+
+       packages:
+         all:
+           providers:
+             zlib-api: [zlib]
+             blas: [nvhpc]
+           compiler: [nvhpc]
+         nvhpc:
+           externals:
+           - spec: [email protected] %nvhpc
+             modules:
+             - nvhpc/24.3
+           buildable: false
+         python:
+           buildable: false
+           require:
+           - '@3.10.12'
+         curl:
+           buildable: false
+         cmake:
+           buildable: false
+         pkg-config:
+           buildable: false
+
+11. If you have manually installed lmod, you will need to update the site module configuration to use lmod instead of tcl. Skip this step if you followed the Ubuntu instructions above.
+
+    .. code-block:: console
+
+       sed -i 's/tcl/lmod/g' site/modules.yaml
+
+12. Process the specs and install.
+
+    It is recommended to save the output of concretize in a log file and inspect that log file using the :ref:`show_duplicate_packages.py <Duplicate_Checker>` utility.
+    This is done to find and eliminate duplicate package specifications, which can cause issues at the module creation step below. Specifically for this environment, the
+    concretizer log must be inspected to ensure that all packages being built are built with the Nvidia compiler (``%nvhpc``), except for those described at the beginning of this section.
+
+    .. code-block:: console
+
+       spack concretize 2>&1 | tee log.concretize
+       ${SPACK_STACK_DIR}/util/show_duplicate_packages.py -d [-c] log.concretize
+       spack install [--verbose] [--fail-fast] 2>&1 | tee log.install
+
+13. Create tcl module files (replace ``tcl`` with ``lmod`` if you have manually installed lmod):
+
+    .. code-block:: console
+
+       spack module tcl refresh
+
+14. Create meta-modules for compiler, mpi, python:
+
+    .. code-block:: console
+
+       spack stack setup-meta-modules
+
+15. You now have a spack-stack environment that can be accessed by running ``module use ${SPACK_STACK_DIR}/envs/jedi-mpas-nvidia-env/install/modulefiles/Core``. The modules defined here can be loaded to build and run code as described in :numref:`Section %s <UsingSpackEnvironments>`.

doc/source/PreConfiguredSites.rst

Lines changed: 15 additions & 3 deletions

@@ -521,9 +521,9 @@ The following is required for building new spack environments and for using spac
 .. code-block:: console

    module purge
-   module use /scratch1/NCEPDEV/jcsda/jedipara/spack-stack/modulefiles
-   module load miniconda/3.9.12
-   module load ecflow/5.5.3
+   module use /scratch1/NCEPDEV/nems/role.epic/modulefiles
+   module load miniconda3/4.12.0
+   module load ecflow/5.8.4

 For ``spack-stack-1.7.0`` with Intel, proceed with loading the following modules:

@@ -631,6 +631,18 @@ For ``spack-stack-1.7.0``, run:
    module load stack-openmpi/5.0.1
    module load stack-python/3.10.13

+.. _Preconfigured_Sites_Tier2:
+
+=============================================================
+Pre-configured sites (tier 2)
+=============================================================
+
+Tier 2 pre-configured sites are not officially supported by spack-stack. As such, instructions for these systems are provided in the form of a ``README.md`` in the site directory, or may not be available at all. Also, these site configs are not updated on the same regular basis as those of the tier 1 systems and may therefore be out of date and/or not working.
+
+The following sites have site configurations in the directory ``configs/sites/``:
+
+- TACC Frontera (``configs/sites/frontera/``)
+- AWS Single Node with Nvidia (NVHPC) compilers (``configs/sites/aws-nvidia/``)
+
 .. _Configurable_Sites_CreateEnv:

 ========================

spack (submodule pointer update)
