Add template for jedi-mpas-nvidia and documentation for setting up environment (#1084)
* First version of configs/templates/jedi-mpas-nvidia-dev template
* Add pkg-config to list of excluded lua/tcl modules
* Update configs/sites/noaa-gcloud/README.md: add R2D2 scrubber if applicable
* Add tier-2 section back in doc/source/PreConfiguredSites.rst
* Update submodule pointer for spack
* Update path to modulefiles on Hera
* Add a new section to doc/source/NewSiteConfigs.rst specifically for building the jedi-mpas-nvidia environment with the Nvidia compilers
Co-authored-by: Francois Hebert <[email protected]>
---------
Co-authored-by: RatkoVasic-NOAA <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Francois Hebert <[email protected]>
configs/sites/noaa-gcloud/README.md (8 additions, 0 deletions)
@@ -18,6 +18,7 @@ yum install -y xorg-x11-apps
 yum install -y perl-IPC-Cmd
 yum install -y gettext-devel
 yum install -y m4
+yum install -y finger
 exit

 # Create a script that can be added to the cluster resource config so that these packages get installed automatically
@@ -37,10 +38,17 @@ yum install -y xorg-x11-apps
 yum install -y perl-IPC-Cmd
 yum install -y gettext-devel
 yum install -y m4
+yum install -y finger
 EOF

 chmod a+x /contrib/admin/basic_setup.sh

+# Enable R2D2 experiment scrubber in cron (if applicable)
+
+Refer to https://github.com/JCSDA-internal/jedi-tools/tree/develop/crontabs/noaa-gcloud
+
+The scripts are all set up in the /contrib space and should work after a restart of the cluster. However, any updates to R2D2 that require changes to the scrubber scripts must be applied manually.
+
 # Create a mysql config for local R2D2 use (if applicable)
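The scrubber referenced above is driven by cron. The entry below is purely illustrative — the script path, name, and schedule are hypothetical; the actual crontabs live in the jedi-tools repository linked above:

```text
# Hypothetical crontab entry for the R2D2 experiment scrubber (illustration only)
0 3 * * * /contrib/admin/r2d2-scrubber/scrub_experiments.sh >> /contrib/admin/r2d2-scrubber/scrub.log 2>&1
```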
doc/source/NewSiteConfigs.rst

@@ -33,6 +35,9 @@ It is also instructive to peruse the GitHub actions scripts in ``.github/workflo
 Note that ``apple-clang@14`` compiler versions are fully supported, and ``apple-clang@15`` will work but requires the :ref:`workaround noted below<apple-clang-15-workaround>`.
 Also, when using ``apple-clang@15`` you must use Command Line Tools version 15.1; Command Line Tools versions 15.3 and newer are not yet supported.

+.. [#fn3]
+   Support for Nvidia compilers is experimental and limited to a subset of packages. Please refer to :numref:`Section %s <NewSiteConfigs_Linux_CreateEnv_Nvidia>` below.

 .. _NewSiteConfigs_macOS:

 ------------------------------
@@ -419,6 +424,8 @@ The following instructions were used to prepare a basic Red Hat 8 system as it i
 This environment enables working with spack and building new software environments, as well as loading modules that are created by spack for building JEDI and UFS software.

+.. _NewSiteConfigs_Linux_Ubuntu_Prerequisites:
+
 Prerequisites: Ubuntu (one-off)
 -------------------------------------
@@ -473,6 +480,8 @@ The following instructions were used to prepare a basic Ubuntu 20.04 or 22.04 LT
 This environment enables working with spack and building new software environments, as well as loading modules that are created by spack for building JEDI and UFS software.

+.. _NewSiteConfigs_Linux_CreateEnv:
+
 Creating a new environment
 --------------------------
@@ -610,3 +619,164 @@ See the :ref:`documentation <Duplicate_Checker>` for usage information including

      spack stack setup-meta-modules

15. You now have a spack-stack environment that can be accessed by running ``module use ${SPACK_STACK_DIR}/envs/unified-env.mylinux/install/modulefiles/Core``. The modules defined here can be loaded to build and run code as described in :numref:`Section %s <UsingSpackEnvironments>`.

.. _NewSiteConfigs_Linux_CreateEnv_Nvidia:

Creating a new environment with Nvidia compilers
------------------------------------------------
.. warning::
   Support for Nvidia compilers is experimental and limited to a small subset of the packages in the unified environment. The Nvidia compilers are known for their bugs and flaws, and many packages simply don't build. The strategy for building environments with the Nvidia compilers is therefore the opposite of the strategy used with the other supported compilers.

In order to build environments with the Nvidia compilers, a different approach is needed than for our main compilers (GNU, Intel). Since many packages do not build with the Nvidia compilers, the idea is to provide as many packages as possible as external packages, or to build them with ``gcc``. Because our spack extension ``spack stack setup-meta-modules`` does not support combinations of modules built with different compilers, packages not built with the Nvidia compilers need to fulfill the following two criteria:
1. The package is used as a utility to build or run the code, but is not linked into the application (this may be overly restrictive, but it ensures that the application will be able to leverage all of Nvidia's features, for example running on GPUs).

2. One of the following applies:

   a. The package is installed outside of the spack-stack environment and made available as an external package. A typical use case is a package that is installed using the OS package manager.

   b. The package is built with another compiler (typically ``gcc``) within the same environment, and no modulefile is generated for the package. The spack modulefile generator in this case ensures that other packages that depend on this particular package have the necessary paths in their own modules. If the ``gcc`` compiler itself requires additional ``PATH``, ``LD_LIBRARY_PATH``, etc. variables to be set, then these can be set in the spack compiler config for the Nvidia compiler (similar to how we configure the ``gcc`` backend for the Intel compiler).
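As an illustration of case 2b, a package built with ``gcc`` can be kept out of the generated modulefiles through the environment's module configuration. The fragment below is a hypothetical sketch — the package name is invented and this is not the actual ``jedi-mpas-nvidia-dev`` template:

```yaml
# Hypothetical spack.yaml fragment (illustration only)
spack:
  specs:
  - cmake %gcc          # built with gcc, per criterion 2b
  modules:
    default:
      tcl:
        exclude:        # no modulefile is generated for this package
        - cmake
```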
With all of that in mind, the following instructions were used on an Amazon Web Services EC2 instance running Ubuntu 22.04 to build an environment based on the template ``jedi-mpas-nvidia-dev``. These instructions follow the one-off setup instructions in :numref:`Section %s <NewSiteConfigs_Linux_Ubuntu_Prerequisites>` and replace the instructions in :numref:`Section %s <NewSiteConfigs_Linux_CreateEnv>`.
1. Follow the instructions in :numref:`Section %s <NewSiteConfigs_Linux_Ubuntu_Prerequisites>` to install the basic packages. In addition, install the following packages using ``apt``:

   .. code-block:: console

      sudo su
      apt update
      apt install -y cmake
      apt install -y pkg-config
      exit
2. Download the latest version of the Nvidia HPC SDK following the instructions on the Nvidia website. For ``nvhpc@24.3``:

   .. code-block:: console

      echo 'deb [signed-by=/usr/share/keyrings/nvidia-hpcsdk-archive-keyring.gpg] https://developer.download.nvidia.com/hpc-sdk/ubuntu/amd64 /' | sudo tee /etc/apt/sources.list.d/nvhpc.list
      sudo su
      apt update
      apt-get install -y nvhpc-24-3
      exit
3. Load the correct module shipped with ``nvhpc-24-3``. Note that this is only required for ``spack`` to detect the compiler and the ``openmpi`` library during the environment configuration below. It is not required when using the new environment to compile code.

   .. code-block:: console

      module purge
      module use /opt/nvidia/hpc_sdk/modulefiles
      module load nvhpc-openmpi3/24.3
4. Clone spack-stack and its dependencies and activate the spack-stack tool.

   .. code-block:: console

      # Sources Spack from submodule and sets ${SPACK_STACK_DIR}
      source setup.sh
5. Create a pre-configured environment with the default (nearly empty) site config for Linux and activate it (optional: decorate the bash prompt with the environment name). At this point, only the ``jedi-mpas-nvidia-dev`` template is supported.

6. Temporarily set the environment variable ``SPACK_SYSTEM_CONFIG_PATH`` to modify site config files in ``envs/jedi-mpas-nvidia-env/site``

   .. code-block:: console

      export SPACK_SYSTEM_CONFIG_PATH="$PWD/site"
7. Find external packages and add them to the site config's ``packages.yaml``. If an external package's ``bin`` directory hasn't been added to ``$PATH``, prefix the command accordingly.

   .. code-block:: console

      spack external find --scope system \
          --exclude bison --exclude cmake \
          --exclude curl --exclude openssl \
          --exclude openssh --exclude python
      spack external find --scope system wget
      spack external find --scope system openmpi
      spack external find --scope system python
      spack external find --scope system curl
      spack external find --scope system pkg-config
      spack external find --scope system cmake
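``spack external find`` records entries along the following lines in the site's ``packages.yaml``. The versions and prefixes below are made up for illustration; the actual values depend on what is installed on the system:

```yaml
# Illustrative packages.yaml entries -- actual versions/prefixes vary by system
packages:
  cmake:
    externals:
    - spec: cmake@3.22.1
      prefix: /usr
  pkg-config:
    externals:
    - spec: pkg-config@0.29.2
      prefix: /usr
```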
8. Find compilers and add them to the site config's ``compilers.yaml``

   .. code-block:: console

      spack compiler find --scope system
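``spack compiler find`` produces an entry roughly like the sketch below in the site's ``compilers.yaml``. The install paths are illustrative (they follow the usual Nvidia HPC SDK layout but vary by system), and the ``environment:`` block shows where additional ``PATH``/``LD_LIBRARY_PATH`` settings for a ``gcc`` backend could go, as discussed at the top of this section — the value shown is hypothetical:

```yaml
# Illustrative compilers.yaml entry; paths and OS tag vary by system
compilers:
- compiler:
    spec: nvhpc@24.3
    paths:
      cc: /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/bin/nvc
      cxx: /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/bin/nvc++
      f77: /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/bin/nvfortran
      fc: /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/bin/nvfortran
    operating_system: ubuntu22.04
    modules: []
    environment:
      prepend_path:
        PATH: /usr/bin    # hypothetical: expose a gcc backend to the Nvidia compiler
    flags: {}
```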
9. Unset the ``SPACK_SYSTEM_CONFIG_PATH`` environment variable

   .. code-block:: console

      unset SPACK_SYSTEM_CONFIG_PATH
10. Add the following block to ``envs/jedi-mpas-nvidia-env/spack.yaml`` (pay attention to the correct indentation; it should be at the same level as ``specs:``):

11. If you have manually installed lmod, you will need to update the site module configuration to use lmod instead of tcl. Skip this step if you followed the Ubuntu instructions above.

    .. code-block:: console

       sed -i 's/tcl/lmod/g' site/modules.yaml
12. Process the specs and install

    It is recommended to save the output of concretize in a log file and to inspect that log file using the :ref:`show_duplicate_packages.py <Duplicate_Checker>` utility.
    This is done to find and eliminate duplicate package specifications, which can cause issues at the module creation step below. Specifically for this environment, the
    concretizer log must be inspected to ensure that all packages being built are built with the Nvidia compiler (``%nvhpc``), except for those described at the beginning of this section.

    .. code-block:: console

       spack install [--verbose] [--fail-fast] 2>&1 | tee log.install
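One way to perform the ``%nvhpc`` check is a simple ``grep`` over the concretize log. The snippet below is a sketch: it fabricates a tiny sample log (a real one would come from piping ``spack concretize`` through ``tee``; the specs and hashes shown are invented) and prints any spec that was not concretized with the Nvidia compiler:

```shell
# Create a toy concretizer log; a real one comes from `spack concretize | tee log.concretize`
cat > log.concretize <<'EOF'
 -   abcdefg  hdf5@1.14.0%nvhpc@24.3
 -   hijklmn  netcdf-c@4.9.2%nvhpc@24.3
 -   opqrstu  cmake@3.22.1%gcc@11.4.0
EOF
# Print specs NOT built with %nvhpc -- each hit must match one of the exceptions above
grep -v '%nvhpc' log.concretize
```

Here the ``cmake`` line would be flagged; it is acceptable only if ``cmake`` is one of the packages deliberately built with ``gcc`` or provided externally.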
13. Create tcl module files (replace ``tcl`` with ``lmod`` if you have manually installed lmod)

    .. code-block:: console

       spack module tcl refresh
14. Create meta-modules for compiler, mpi, python

    .. code-block:: console

       spack stack setup-meta-modules
15. You now have a spack-stack environment that can be accessed by running ``module use ${SPACK_STACK_DIR}/envs/jedi-mpas-nvidia-env/install/modulefiles/Core``. The modules defined here can be loaded to build and run code as described in :numref:`Section %s <UsingSpackEnvironments>`.
doc/source/PreConfiguredSites.rst

Tier 2 preconfigured sites are not officially supported by spack-stack. As such, instructions for these systems are provided in the form of a `README.md` in the site directory, or may not be available at all. Also, these site configs are not updated on the same regular basis as those of the tier 1 systems and may therefore be out of date and/or not working.

The following sites have site configurations in directory `configs/sites/`:

- TACC Frontera (`configs/sites/frontera/`)
- AWS Single Node with Nvidia (NVHPC) compilers (`configs/sites/aws-nvidia/`)