cray platform: detect platform properly after module purge#12989

Merged
becker33 merged 52 commits into develop from
bugfix/cray-no-module-preloaded
May 5, 2020
@becker33
Member

@becker33 becker33 commented Oct 1, 2019

Currently, we assume the cray platform can be identified by the default modules.

We have discovered a Cray machine on which the default module list is "No modules loaded".

This caused some problems for Spack's platform detection and OS setup. This PR allows Spack to persevere despite those problems. In the worst case, in which the cle_release files have been removed from the system, we continue with os=cnlunknown instead of failing immediately.

NOTE 2019-10-01: The scope of this PR has expanded to include a set of hotfixes found/used at an internal hackathon at LLNL.

Member

@alalazo alalazo left a comment

Just a few comments, based on testing this PR on Cori after a module purge.

@chuckatkins

The CCE changes look like they're based on the assumption that v9.0 and greater use the new clang interface. This isn't necessarily the case, as the cce-legacy module can be used to preserve the older interface with newer versions of cce. So far in CMake we've been able to treat the new cce interface as proper clang and not make the distinction between clang and cray-clang. That has not been the case for other compilers like XL, where we explicitly have both XL and XLClang compiler IDs, but so far that hasn't been an issue for cce: we can just have Cray for the older legacy interface and have the new one use the same Clang interface as vanilla clang.

@chuckatkins

I suppose this goes back to a previous discussion we had regarding the cray detection. My question is whether Spack should even be detecting the cray platform if no modules are loaded. If no modules are loaded, shouldn't the machine be treated as vanilla Linux for the front end?

@becker33 becker33 force-pushed the bugfix/cray-no-module-preloaded branch from d77406e to 59ed98b Compare February 26, 2020 19:50
@becker33
Member Author

becker33 commented Apr 7, 2020

@tgamblin I think I've addressed all of your review comments.

Member

@tgamblin tgamblin left a comment

Still have some requests here -- the os.environ thing needs another tweak.

This method takes advantage of compiler modules to be cross-platform.
"""
# store environment to replace later
backup_env = os.environ
Member

@tgamblin tgamblin Apr 15, 2020

This still doesn't work because backup_env is a reference, not a copy. You want:

backup_env = os.environ.copy()
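
To illustrate the difference between a reference and a copy, here is a minimal standalone sketch. It is not code from this PR; the variable name `GR_DEMO_ENV_VAR` and the names `alias` and `snapshot` are made up for the example.

```python
import os

# Hypothetical variable name, used only for this illustration.
KEY = "GR_DEMO_ENV_VAR"
os.environ.pop(KEY, None)

alias = os.environ            # same object as os.environ, not a backup
snapshot = os.environ.copy()  # independent plain dict taken at this point

# Mutate the process environment after both "backups" were taken.
os.environ[KEY] = "1"

alias_sees_change = KEY in alias        # the alias follows every mutation
snapshot_sees_change = KEY in snapshot  # the snapshot does not

# Restore the process environment from the snapshot.
os.environ.clear()
os.environ.update(snapshot)
restored = KEY not in os.environ
```

Because `alias` is the live environment, restoring from it would be a no-op; only the copied dict can bring back the earlier state.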

Member

@tgamblin tgamblin left a comment

@becker33: needs one more pass.

modifications) to enable the compiler to run properly on any platform.
"""
# store environment to replace later
backup_env = os.environ.copy
Member

Still wrong 😢

Can you please add a test for this?
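
For context, `os.environ.copy` without parentheses binds the method itself rather than calling it, so nothing is saved. A rough sketch of the kind of test being requested could look like the following; `run_with_restored_env` and `GR_DEMO_ENV_VAR` are hypothetical names for illustration, not Spack functions.

```python
import os

backup_env = os.environ.copy          # missing parentheses: a bound method
is_method = callable(backup_env)      # it can be called, but it holds no data
is_mapping = isinstance(backup_env, dict)


def run_with_restored_env(func):
    """Hypothetical helper: run func, then restore os.environ afterwards."""
    backup = os.environ.copy()        # real snapshot, note the call
    try:
        return func()
    finally:
        os.environ.clear()
        os.environ.update(backup)


def test_environment_is_restored():
    key = "GR_DEMO_ENV_VAR"           # hypothetical variable name
    os.environ.pop(key, None)
    before = dict(os.environ)
    # The callable mutates the environment; the helper must undo that.
    run_with_restored_env(lambda: os.environ.__setitem__(key, "1"))
    assert dict(os.environ) == before


test_environment_is_restored()
```

A test along these lines would fail for both `backup_env = os.environ` and `backup_env = os.environ.copy`, which is the point of asking for one.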

Member

@tgamblin tgamblin left a comment

@becker33: nice! LGTM.

@becker33 becker33 merged commit dd3762d into develop May 5, 2020
@adamjstewart
Member

This PR broke Spack support on Blue Waters:

$ spack help
Traceback (most recent call last):
  File "/projects/eot/bbcj/stewart1/spack/bin/spack", line 64, in <module>
    sys.exit(spack.main.main())
  File "/mnt/b/projects/eot/bbcj/stewart1/spack/lib/spack/spack/main.py", line 767, in main
    if spack.config.get('config:debug'):
  File "/mnt/b/projects/eot/bbcj/stewart1/spack/lib/spack/spack/config.py", line 671, in get
    return config.get(path, default, scope)
  File "/mnt/b/projects/eot/bbcj/stewart1/spack/lib/spack/llnl/util/lang.py", line 552, in __getattr__
    return getattr(self.instance, name)
  File "/mnt/b/projects/eot/bbcj/stewart1/spack/lib/spack/llnl/util/lang.py", line 548, in instance
    self._instance = self.factory()
  File "/mnt/b/projects/eot/bbcj/stewart1/spack/lib/spack/spack/config.py", line 653, in _config
    _add_platform_scope(cfg, ConfigScope, name, path)
  File "/mnt/b/projects/eot/bbcj/stewart1/spack/lib/spack/spack/config.py", line 606, in _add_platform_scope
    platform = spack.architecture.platform().name
  File "/mnt/b/projects/eot/bbcj/stewart1/spack/lib/spack/llnl/util/lang.py", line 178, in _memoized_function
    func.cache[args] = func(*args)
  File "/mnt/b/projects/eot/bbcj/stewart1/spack/lib/spack/spack/architecture.py", line 517, in platform
    return platform_cls()
  File "/mnt/b/projects/eot/bbcj/stewart1/spack/lib/spack/spack/platforms/cray.py", line 49, in __init__
    for target in self._avail_targets():
  File "/mnt/b/projects/eot/bbcj/stewart1/spack/lib/spack/spack/platforms/cray.py", line 188, in _avail_targets
    craype_targets = target_names_from_modules(craype_modules)
  File "/mnt/b/projects/eot/bbcj/stewart1/spack/lib/spack/spack/platforms/cray.py", line 164, in target_names_from_modules
    for mod in modules:
TypeError: 'NoneType' object is not iterable

@adamjstewart
Member

adamjstewart commented May 9, 2020

Hopefully this is useful debugging information:

$ module avail -t craype-
/u/sciteam/stewart1/spack/share/spack/modules/cray-cnl5-interlagos:
/sw/EasyBuild/modules/all:
/sw/bw/modulefiles:
/sw/xe/modulefiles:
/usr/local/modulefiles:
/opt/cray/craype/2.5.16/modulefiles:
craype-abudhabi
craype-abudhabi-cu
craype-accel-host
craype-accel-nvidia20
craype-accel-nvidia35
craype-accel-nvidia52
craype-accel-nvidia60
craype-barcelona
craype-hugepages128K
craype-hugepages16M
craype-hugepages2M
craype-hugepages512K
craype-hugepages64M
craype-hugepages8M
craype-interlagos
craype-interlagos-cu
craype-istanbul
craype-mc12
craype-mc8
craype-network-gemini
craype-network-none
craype-shanghai
/opt/cray/modulefiles:
craype-installer/1.16.2
craype-installer/1.17.0
craype-installer/1.18.0
craype-installer/1.20.0
craype-installer/1.24.2
craype-installer/1.24.3(default)
/opt/modulefiles:
/opt/cray/gem/modulefiles:
$ ls /opt/cray/pe/craype/default/modulefiles
ls: cannot access /opt/cray/pe/craype/default/modulefiles: No such file or directory

The problem is that the second strategy, modules_from_listdir, returns None. We should check whether the result of available_craype_modules() is None before trying to iterate over it.
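
A possible shape for that guard, as a standalone sketch rather than the actual fix: `first_available_modules` is a hypothetical helper, and the two stand-in strategies below simply return None to mimic the failure in the traceback.

```python
def first_available_modules(strategies):
    """Return the first non-empty module list, or [] if every strategy fails."""
    for strategy in strategies:
        modules = strategy()
        if modules:          # skips both None and an empty list
            return modules
    return []


# Stand-ins for the real strategies; both fail here on purpose.
def modules_from_avail():
    return None


def modules_from_listdir():
    return None


craype_modules = first_available_modules(
    [modules_from_avail, modules_from_listdir]
)

# Iterating an empty list is safe, unlike iterating None.
targets = [mod for mod in craype_modules]
```

Returning an empty list instead of None keeps the later `for mod in modules` loop from raising a TypeError.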

@adamjstewart
Member

adamjstewart commented May 9, 2020

P.S. Quick workaround for anyone else who encounters this:

diff --git a/lib/spack/spack/platforms/cray.py b/lib/spack/spack/platforms/cray.py
index 2fccf2fe5..b036a507f 100644
--- a/lib/spack/spack/platforms/cray.py
+++ b/lib/spack/spack/platforms/cray.py
@@ -181,7 +181,7 @@ def modules_from_listdir():
         if getattr(self, '_craype_targets', None) is None:
             strategies = [
                 lambda: modules_in_output(module('avail', '-t', 'craype-')),
-                modules_from_listdir
+                #modules_from_listdir
             ]
             for available_craype_modules in strategies:
                 craype_modules = available_craype_modules()

@adamjstewart
Member

Also, note that before this PR, I got:

$ spack arch
cray-cnl5-interlagos

but now I only get:

$ spack arch
cray-cnl5-x86_64

so I think this also broke microarchitecture detection. That'll probably require someone else to debug.

@adamjstewart
Member

This PR also broke module loading in compilers.yaml. In my compilers.yaml, I have:

- compiler:
    paths:
      cc: cc
      cxx: CC
      f77: ftn
      fc: ftn
    operating_system: cnl5
    target: any
    modules:
    - PrgEnv-gnu
    - gcc/5.3.0
    environment: {}
    extra_rpaths: []
    flags: {}
    spec: gcc@5.3.0

This used to work fine. After updating to develop, I see an error message during installation:

configure: error: C compiler cannot create executables

When I inspect the config.log, it complains that there is no PrgEnv-* module loaded. If I explicitly load the same PrgEnv-gnu module in my job script, things succeed, but I later see errors like:

/opt/gcc/5.3.0/snos/lib/gcc/x86_64-suse-linux/5.3.0/include/ia32intrin.h: In function 'unsigned int __crc32b(unsigned int, unsigned char)':
/opt/gcc/5.3.0/snos/lib/gcc/x86_64-suse-linux/5.3.0/include/ia32intrin.h:63:39: error: '__builtin_ia32_crc32qi' was not declared in this scope
   return __builtin_ia32_crc32qi (__C, __V);
                                       ^
/opt/gcc/5.3.0/snos/lib/gcc/x86_64-suse-linux/5.3.0/include/ia32intrin.h: In function 'unsigned int __crc32w(unsigned int, short unsigned int)':
/opt/gcc/5.3.0/snos/lib/gcc/x86_64-suse-linux/5.3.0/include/ia32intrin.h:70:39: error: '__builtin_ia32_crc32hi' was not declared in this scope
   return __builtin_ia32_crc32hi (__C, __V);
                                       ^
/opt/gcc/5.3.0/snos/lib/gcc/x86_64-suse-linux/5.3.0/include/ia32intrin.h: In function 'unsigned int __crc32d(unsigned int, unsigned int)':
/opt/gcc/5.3.0/snos/lib/gcc/x86_64-suse-linux/5.3.0/include/ia32intrin.h:77:39: error: '__builtin_ia32_crc32si' was not declared in this scope
   return __builtin_ia32_crc32si (__C, __V);
                                       ^

I've never seen this before, but I'm guessing that it's because Spack is unsetting the LD_LIBRARY_PATH from the module. For now I'm just going to locally revert this commit so I can build the software I need to finish my final project.

     bash = Executable('/bin/bash')
     output = bash(
-        '-lc', 'echo $CRAY_CPU_TARGET',
+        '--norc', '--noprofile', '-lc', 'echo $CRAY_CPU_TARGET',
Member

@becker33 I am a bit late with this but what output do you expect from this command? The environment is clean and bash is called in the login mode with --noprofile. Doesn't this mean that the output is always empty?
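
One way to check this empirically is the small sketch below, which uses the standard `subprocess` module instead of Spack's `Executable`. It assumes that "clean" means a completely empty environment (`env={}`), which may not match what Spack actually passes, and `probe_cray_cpu_target` is a hypothetical helper name.

```python
import shutil
import subprocess


def probe_cray_cpu_target(bash_path):
    """Run the same bash arguments with an empty environment; return stdout."""
    result = subprocess.run(
        [bash_path, "--norc", "--noprofile", "-lc", "echo $CRAY_CPU_TARGET"],
        env={},                        # assumption: nothing is inherited
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    return result.stdout.strip()


bash_path = shutil.which("bash")
# None means bash was not found on this machine.
output = probe_cray_cpu_target(bash_path) if bash_path else None
print(repr(output))
```

With `--noprofile`, a login shell skips `/etc/profile` and the per-user profile files, so under these assumptions nothing is left to define `CRAY_CPU_TARGET`.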

@haampie haampie deleted the bugfix/cray-no-module-preloaded branch August 2, 2022 09:52