Skip to content

spack chain#8772

Merged
becker33 merged 93 commits intospack:developfrom
scheibelp:features/chain-limited
Mar 27, 2019
Merged

spack chain#8772
becker33 merged 93 commits intospack:developfrom
scheibelp:features/chain-limited

Conversation

@scheibelp
Copy link
Copy Markdown
Member

@scheibelp scheibelp commented Jul 21, 2018

@gartung @paul-chelarescu @amundson

See also: #8545

This attempts to connect a local Spack instance to another "upstream" instance, exactly like #8545 (and uses many ideas from it). To use this, see #8772 (comment)

The primary goal of this PR, quoting from #8772 (comment):

Our use-case for this would be having a "system" instance of Spack maintained by our team that could be "extended" by our users. We would like the "user" instance of Spack to use installed packages in the "system" instance whenever possible.

The implementation differs from #8545: it tries to interact only with the upstream database, and otherwise avoids parsing upstream config files with the local Spack instance. For example it does not maintain a parent_layout analogous to parent_db: it assigns Spec prefixes based on the paths stored in install records and avoids all interactions with the directory layouts of upstream Spack instances.

Similarly, this PR adds a capability for upstream Spack instances to generate a module index file that associates installed packages with their generated module files. This is used for spack module tcl find etc. to locate module files. I think this approach will be more resilient to version drift between a local/upstream spack instance. Currently, the module file index is implicitly generated when doing e.g. spack module tcl refresh.

(EDIT 11/21: originally the plan to handle module files for upstream packages was to invoke the upstream Spack binary to locate them but that ended up being problematic outside of environment setup scripts. The module indices now correctly locate upstream module files for Spack commands, but do not currently handle autoloading module file generation schemes)

This PR also tries to be careful about not writing any files to upstream database, including:

  • Don't acquire locks on upstream DBs
  • Don't attempt to reindex upstream DBs if an issue is detected

TODOs (updated 9/7):

  • (new: 11/27) update spack uninstall -a to uninstall all specs from the local Spack instance (but not attempt to uninstall any upstream Spack installs)
  • (pending review) (new: 8/24) update the csh environment script to retrieve modules from upstream Spack instances in the same manner setup-env.sh
  • (new: 9/7) add a test to check that in the simple case where a Spack instance depends on a single upstream instance and a dependency is uninstalled from the upstream instance, that this is reported (e.g. when using spack find)
  • Raise an exception if an upstream DB attempts to acquire a lock
  • Raise an exception if local and upstream DB versions don't match
  • Handle loading of modules for packages that are installed upstream
  • (not essential) (new: 8/23) update spack reindex to set path on InstallRecords so that upstream-installed externals can be used for older Spack instances (UPDATE) there is not actually sufficient information available for the downstream instance to handle this: Without extra information added to InstallRecord, the downstream Spack instance would need to examine the configuration files of the upstream instance to find the externals (which this PR avoids). This PR adds the external path to the InstallRecord, so the external package is available; external packages "look the same as" Spack-installed packages from the perspective of a downstream Spack instance.
  • (handled 11/26) Determine whether to support the following case: given databases X, Y, and Z, where X has Y and Z as an upstream, allow Y to use Z as an upstream (if that's desired, it isn't currently supported)
  • (in a future PR) (new: 11/21) handle autoloading module generation schemes where dependencies are associated with upstream Spack instances

@scheibelp

This comment has been minimized.

@scheibelp
Copy link
Copy Markdown
Member Author

scheibelp commented Jul 21, 2018

I thought I'd add an example of how you edit config.yaml to include upstream instances but that interferes with doc generation and testing so I'll add an example here:

UPDATED 12/12: upstream configuration has been moved to its own config file called upstreams.yaml; each upstream Spack instance includes a name specification; spack-install-prefix has been replaced with install_tree to match the variable as specified in config.yaml for the local Spack install. The new upstreams.yaml config file looks like:

upstreams:
  test-a:
    install_tree: /g/g17/scheibel/repos/test-upstream-spack/spack/opt/spack
    modules:
      tcl: /g/g17/scheibel/repos/test-upstream-spack/share/spack/modules

(OLD) UPDATED 11/21: spack-binary has been removed - it was not necessary. Specification of module roots is now combined with specification of upstream installations to ensure that order matches. For a given upstream Spack instance, a given module type (e.g. tcl) can now have at most one associated module root.

config:
  upstream_spack_installations:
    - spack-install-prefix: /g/g17/scheibel/repos/test-upstream-spack/spack/opt/spack
      modules:
        tcl: /g/g17/scheibel/repos/test-upstream-spack/share/spack/modules

(OLD) UPDATED 8/23: specification of upstream_spack_installations has been updated in the config.yaml. It is now a list of dictionaries; each dictionary has the properties spack-binary and spack-install-prefix. The group of spack installations is now a list to ensure that they are searched in the same order each time.

config:

  upstream_spack_installations:
    - spack-binary: /g/g17/scheibel/repos/test-upstream-spack/bin/spack
      spack-install-prefix: /g/g17/scheibel/repos/test-upstream-spack/spack/opt/spack

  # THIS IS THE SAME AS BEFORE - FROM 8/10
  upstream_module_roots:
    tcl:
      - /g/g17/scheibel/repos/test-upstream-spack/share/spack/modules

(OLD) UPDATED 8/10: originally the upstream installations config variable was list of Spack install directories. Now it is a dictionary of Spack binary to installation directory. The binary is needed to compute module file locations for upstream Spack instances. It is not strictly required to specify the install directory along with the Spack binary for current versions of Spack, since you could have Spack print out the install location. That being said I don't know how far back this support goes, and in general I don't want to invoke several instances of Spack each time I construct the Spack database.

config:
  upstream_spack_installations:
    /path/to/spack/bin/spack: /path/to/spack/opt/spack/

  # This is required if you want the upstream Spack modules files to be in MODULE_PATH
  upstream_module_roots:
    tcl:
      - /path/to/spack/share/spack/modules/

OLD:

config:
  # For now, you specify the path up to where the DB is stored (everything
  # before .spack-db)
  upstream_spack_installations:
    - /path/to/spack/opt/spack

EDIT: it also appears that if there is no upstream_spack_installations entry in config.yaml, that Spack will generally fail - definitely a bug with this PR! I'll be able to tackle that on 7/30. (Now handled)

@paul-chelarescu
Copy link
Copy Markdown
Contributor

Hello, I've tested this feature with two installations of spack and two packages, installation 1 with package A and installation 2 with package B -> A and I can confirm that in the second installation spack will only build package B. However I have a case where I point spack to a directory with many pre-built packages and then try to build package C which depends on all of those. Spack will proceed to install every dependency of C again. I am not sure why I get this in the second scenario and I wanted to point it out and that there may be something that is still missing from this feature.

@scheibelp
Copy link
Copy Markdown
Member Author

Hi @paul-chelarescu, as a sanity check: did you confirm that C depended on the exact same instance of B and A (i.e. that the dependency hashes were the same)? I think spack spec gives you an option to output the DAG hash (apologies but I'm not in a spot to run spack spec --help to make sure of that right now).

If the dependency hashes do match (i.e. they exist in the upstream DB and aren't being used) would it be possible to attach the package.py files for A/B/C? I've had luck for at least one case of A->B->C, namely sqlite, where all of it's dependencies were installed upstream (so perhaps it is clear how your case differs from sqlite and you can just mention that vs. including the package.py files).

I'll have a chance to experiment with this on 7/30

@amundson
Copy link
Copy Markdown

amundson commented Jul 26, 2018 via email

@scheibelp
Copy link
Copy Markdown
Member Author

How hard is it to set up a real test suite implementation of something like this?

I anticipate it will be somewhat challenging but I should be able to sort it out by the end of this week. I think either mock packages provided for Spack testing (which makes it easy to create dependency DAGs) or the mock testing repository (which includes packages which actually install) will be useful. More details on that soon.

@paul-chelarescu
Copy link
Copy Markdown
Contributor

paul-chelarescu commented Aug 1, 2018

@scheibelp I think I have a better clue why C was installing every dependency again, the .spack-db in the upstream was created using an older version of spack (tags/v0.11.0). There are None objects
passed into cmake because of these differences.

  >> 20    CMake Error at /cvmfs/sft.cern.ch/lcg/views/LCG_93/x86_64-slc6-gcc62-opt/cmake/DD4hepConfig.cmake:32 (include):
     21      include could not find load file:
     22    
     23        None/cmake/DD4hep.cmake

After your last commit which looks at the spec hash, spack will correctly link against a few of the packages there, but there are some which will fail because of the differences in the database versions. I will try to make another upstream installation with a newer version of spack, but this points out at least that this method is not backwards compatible with older versions of spack installations.

@scheibelp scheibelp changed the title [WIP] spack chain considerations [WIP] spack chain Aug 2, 2018
@scheibelp
Copy link
Copy Markdown
Member Author

I've now added tests to check the following:

  • Spec.concretize appropriately sets _installed_upstream on packages (when they are in an upstream DB)
  • Packages that are registered in an upstream database are not installed locally
  • Upstream packages have the expected prefix (e.g. vs. using the directory layout of the local Spack install)
  • Upstream packages are not added to the local DB

I'm going to update the PR description with remaining issues.

this points out at least that this method is not backwards compatible with older versions of spack installations.

Thanks for testing that! I have not checked what happens when using a different DB version. Minimally I should add an explicit check to assert that the upstream DB is the same version (vs. failing silently).

…pstream DBs (the error message for this should never be seen by an end user, only of there is an internal Spack error)
@scheibelp scheibelp force-pushed the features/chain-limited branch from fb31880 to e643681 Compare August 7, 2018 02:16
@paul-chelarescu
Copy link
Copy Markdown
Contributor

paul-chelarescu commented Aug 8, 2018

@scheibelp I have generated another spack installation in my upstream using a newer version of spack (commit c79cd5f) and ran this feature locally pointing at it. Every package is correctly flagged as being installed upstream, except for one of them:

==> Installing gaudi
==> Error: RuntimeError: Unable to locate python command in None/bin

/build/paul/spack/var/spack/repos/builtin/packages/python/package.py:419, in command:
        416            command = 'python'
        417        else:
        418            msg = 'Unable to locate {0} command in {1}'
  >>    419            raise RuntimeError(msg.format(self.name, self.prefix.bin))
        420
        421        # The python command may be a symlink if it was installed
        422        # with Homebrew. Since some packages try to determine the

I am trying to investigate the differences between the local installation of this package, which I have been able to link against, and this one above installed upstream which is failing. There are differences between the two index.json databases but it is not clear to me why and what difference exactly is causing this instance to pick up a None object.

Edit:
It appears that gaudi is the only package which, in my case, when installed upstream, has a different hash than when installed locally. Spack will therefore try to install it again. That should cause no problem if self.prefix.bin weren't None/bin, so there is clearly something wrong happening.

Edit2:
The reason spack cannot find the installation directory of python is that python is externally installed in this case, and the prefix path appears as None. When spack sees the spec for gaudi has a different hash in upstream, it attempts to build it and fails like above. I have replicated this locally by creating a spack installation in one directory A with package gaudi depending on externally installed python, manually changing the hash of this package inside .spack-db/index.json and then using a spack instance in directory B pointing to A as its upstream. This latter instance will recognize the difference in hashes, attempt to build the package and fail because it can't resolve the external dependency python. Alternatively, the same can be achieved it I remove this package from directory A. It appears that packages which are externally installed inside an upstream instance of spack will not be picked up by this feature.

…ernal_path' to get path of upstream externals
@scheibelp scheibelp reopened this Mar 9, 2019
Copy link
Copy Markdown
Member

@becker33 becker33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scheibelp one point in addition to the tests we've discussed.

def metadata_path(self, spec):
return os.path.join(self.path_for_spec(spec), self.metadata_dir)
if self.check_upstream and spec.package.installed_upstream:
# TODO: This assumes that older spack versions use the same
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metadata dir is configurable on the YamlDirectoryLayout object. I think we should remove that option (no code appears to use it) as part of this PR since we depend on it not being used in the future.

scheibelp and others added 10 commits March 15, 2019 12:34
…(not just for upstream installs); add a test for querying a downstream DB when a package is installed in both the upstream and downstream DB
…s; this follows several other hardcoded variables which must be consistent between upstream and downstream DBs. Also update DirectoryLayout.metadata_path to work entirely with Spec.prefix, since Spec.prefix is set from the DB when available (so metadata_path was duplicating that logic)
@scheibelp
Copy link
Copy Markdown
Member Author

#8772 (comment) and #8772 (comment) are now addressed. Also, note that 5c4330b is addressed by beb9a31.

I decided as part of this that YamlDirectoryLayout should defer entirely to Spec.prefix when constructing the metadata path, which meant a test in directory_layout had to be updated to update spack.store.layout (see the layout_and_dir fixture update in 2e39fdf).

Copy link
Copy Markdown
Member

@becker33 becker33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now.

@becker33 becker33 merged commit 99f35c3 into spack:develop Mar 27, 2019
@KineticTheory
Copy link
Copy Markdown
Contributor

I'm really excited to see this merged and I am ready to try it again. I know the yaml files changed format several times during the development process. Is there an official documentation location yet? I didn't see anything about spack chain at https://spack.readthedocs.io.

@gartung
Copy link
Copy Markdown
Member

gartung commented Mar 27, 2019

@chissg @drbenmorgan

@greenc-FNAL
Copy link
Copy Markdown
Member

greenc-FNAL commented Mar 28, 2019 via email

@sknigh
Copy link
Copy Markdown

sknigh commented Mar 28, 2019

@scheibelp I want to try this out, is any documentation available?

@scheibelp
Copy link
Copy Markdown
Member Author

@sknigh

I want to try this out, is any documentation available?

not in the actual docs, it is all here for now (the PR description and #8772 (comment)). Although docs are definitely worth adding and I can add a PR for something official in the next few days.

scheibelp added a commit that referenced this pull request Apr 10, 2019
Add documentation for the Spack chain feature added in #8772
scheibelp added a commit that referenced this pull request Apr 19, 2019
#11152 added documentation for #8772 but some details were based on
an earlier implementation that had changed by the time #8772 was
merged. In particular, #11152 mentioned that upstream Spack instances
were configured in config.yaml, when in fact they should be placed in
a separate upstreams.yaml config file; this PR updates the
documentation accordingly.
@tldahlgren tldahlgren mentioned this pull request Oct 22, 2019
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.