
Implemented 'stack' command#7081

Closed
obreitwi wants to merge 6 commits into spack:develop from electronicvisions:feature/stack_cmd_symlink

Conversation

@obreitwi
Member

@obreitwi obreitwi commented Jan 26, 2018

Stack the local spack repository on top of a remote repository that contains pre-built packages, thereby avoiding building packages twice.

Essentially, this adds installed packages in remote read-only spack installations (on the same system) to the current spack repository.

Motivation:
In our group we have a set of pre-built packages that reside in their own spack repository and are available system-wide in a read-only fashion. Up until now, there seemed to be no "proper" way to use these packages as dependencies for locally built specs in a separate spack repository.

This is especially useful when debugging new `package.py` files against the installed set of packages. Previously, the whole spack database had to be built a second time.

Implementation:
We symlink all installed specs from the remote repository into the local opt/spack path and reindex.
Because everything is linked via RPATHs, the remote packages have to remain where they are (hence there is no option to use hardlinks right now). When compiling packages in the local spack repository, the RPATHs might point to symlinks; however, I do not expect this to be an issue.
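The stacking idea above can be sketched in a few lines of plain Python. This is not the PR's actual code; `link_remote_prefixes` and the flat directory layout are simplifying assumptions for illustration.

```python
# Illustrative sketch: symlink every installed prefix from a read-only
# remote install tree into the local one, then reindex separately.
import os


def link_remote_prefixes(remote_root, local_root):
    """Symlink each package prefix found under remote_root into local_root."""
    linked = []
    for rel in sorted(os.listdir(remote_root)):
        src = os.path.join(remote_root, rel)
        tgt = os.path.join(local_root, rel)
        if os.path.lexists(tgt):
            continue  # already stacked, or installed locally
        # Symlink (not hardlink/copy): RPATHs baked into the remote
        # binaries keep resolving because the real files never move.
        os.symlink(src, tgt)
        linked.append(rel)
    return linked
```

After linking, a reindex of the local database would pick the stacked specs up.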

Comments?

Stack the current spack repository on top of a remote repository that
contains pre-built packages.

Essentially, this adds installed packages in remote read-only spack
installations (on the same system) to the current spack repository.

Motivation:
In our group we have a set of pre-built packages that reside in their
own spack repository and are available system-wide in a read-only
fashion. Up until now there seemed to be no "proper" way to use these
packages as dependencies for locally built specs in a separate spack
repository.
@alalazo
Member

alalazo commented Jan 26, 2018

In our group we have a set of pre-built packages that reside in their own spack repository and are available system-wide in a read-only fashion. Up until now there seemed to be no "proper" way to use these packages as dependencies for locally built specs in a separate spack repository.

We have exactly the same problem at my site, but opted to use relocation of binaries instead of sym-linking software or stacking the db. To foster the discussion, what we have in mind is:

  1. as part of deployment, build a mirror that contains binary caches of the software that is installed system wide (and made available via modules)

  2. expose the .spack/spec.yaml of the various packages via module files (i.e. we'll set an env variable for that)

  3. explain to users how to set up their local copy of Spack to point to our compilers + mirrors, and tell them to install from spec file

In practice a user should do something like this:

# 1. Clone Spack and activate shell support
$ git clone https://github.com/spack/spack.git

# 2. Symlink or copy part of our system configuration (modules.yaml, packages.yaml, compilers.yaml)
# 3. Load the module(s) they want to re-use or modify
$ module load gcc hdf5

# 4. Reproduce the installation via spec.yaml
$ spack install --use-cache -f ${HDF5_SPEC_YAML}

This won't exactly reuse the same software installed on the system but, on the other hand, it provides more isolation from it. For instance, you could decide for any reason to remove a package and that won't affect somebody that installed it in his own instance of Spack.

* updated section and level
level = "long" # TODO: re-check what 'level' is supposed to mean


def setup_parser(sp):
Contributor


You really do like your 2 letter variable names^^

I feel like there should be a "check remote" command; it should essentially just check the existence of the original file for each symlink. A poor man's implementation might simply remove all links and perform the stacking again. There should also probably be a section on "what probably happened if stuff randomly breaks" in the help text.

Member Author


Hmm.. in what way would it be different from just re-stacking? All symlinks with the same hash will be (deleted and) re-created.

I have yet to check what happens when reindexing the repository with dangling symlinks.

Maybe it would be better to add a "remove all dangling symlinks"-phase to reindex instead?
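The "remove all dangling symlinks" phase suggested above could look roughly like this with the standard library alone; `remove_dangling_symlinks` is a hypothetical helper name, not code from the PR.

```python
# Hypothetical cleanup pass: remove symlinks under the local install
# root whose targets no longer exist, before reindexing.
import os


def remove_dangling_symlinks(install_root):
    """Walk install_root and remove symlinks whose target no longer exists."""
    removed = 0
    for dirpath, dirnames, filenames in os.walk(install_root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.path.exists() follows symlinks, so a dangling link is a
            # path for which islink() is True but exists() is False.
            if os.path.islink(path) and not os.path.exists(path):
                os.remove(path)
                removed += 1
    return removed
```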

setup_parser.parser = sp

sp.add_argument(
'-v', '--verbose', action='store_true', default=False,
Contributor


General question: should there be a difference between -d and -v?

"""
config = spack.config.get_config("config")

# NOTE: This has to be kept in sync with spack/store.py!
Contributor


add a description of why this is necessary? Come to think of it: Why is it necessary at all? Or are you referring to the fact that the paths in the remote must look the same as in the local spack?

Member Author


It is necessary to create the directory layout the same way as the default spack does; if there is ever a switch away from YamlDirectoryLayout, this file has to be kept in sync.

# NOTE: This has to be kept in sync with spack/store.py!
layout = spack.directory_layout.YamlDirectoryLayout(
canonicalize_path(osp.join(remote, 'opt', 'spack')),
hash_len=config.get('install_hash_length'),
Contributor


this would use the hash_len of the current spack installation. What happens if the remote one has a different setting?

Member Author


Well, we are on the same machine, so I suppose there would be a system-wide hash_len setting that gets loaded automatically via the config machinery. If someone cares enough to change the default, they will care enough to make it system-wide, so it should not be our concern, imho.

Contributor


Ack, but if someone did do this, it should tell the user what the problem is instead of producing some weird traceback ending somewhere deep inside spack.
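One way to fail early with a readable message, as suggested above, would be a small up-front check. A minimal sketch, assuming spack-style prefix names ending in `-<hash>`; `RemoteLayoutMismatch` and `check_hash_len` are hypothetical names, not part of the PR.

```python
# Hypothetical pre-flight check: compare the local install_hash_length
# against the hash suffix actually used by a remote prefix directory.
class RemoteLayoutMismatch(Exception):
    pass


def check_hash_len(local_hash_len, remote_prefix_name):
    """Raise a readable error if the remote hash length differs.

    remote_prefix_name is e.g. 'zlib-1.2.11-abcdef0'.
    """
    remote_hash = remote_prefix_name.rsplit('-', 1)[-1]
    if local_hash_len is not None and len(remote_hash) != local_hash_len:
        raise RemoteLayoutMismatch(
            "remote prefix '{}' uses hash length {}, but this spack is "
            "configured with install_hash_length={}".format(
                remote_prefix_name, len(remote_hash), local_hash_len))
```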

layout = spack.directory_layout.YamlDirectoryLayout(
canonicalize_path(osp.join(remote, 'opt', 'spack')),
hash_len=config.get('install_hash_length'),
path_scheme=config.get('install_path_scheme'))
Contributor


see above

Member Author


likewise

if osp.exists(tgt):
if osp.islink(tgt):
os.remove(tgt)
else:
Contributor


there should probably be a flag on how to handle this case. In general it should still be a valid stacked spack if it has some of the remote's packages locally, but this might indicate that something is wrong -> allow for raising an error here?

Member Author


Well, if the hash is the same, it should be the same installed spec, just in a different location, so we can just change the link. If the file is installed locally, we do not overwrite it and print a warning. What would you want the switch to do, exactly?

Contributor


basically just: always take existing link, always take new link
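The two policies discussed here could be expressed as a small conflict-resolution helper. This is an illustrative sketch, not the PR's code; `stack_link` and the policy constants are made-up names.

```python
# Sketch of the two conflict policies for an already-existing target:
# keep the existing link, or re-point it at the new remote.
import os

KEEP_EXISTING = "keep"
TAKE_NEW = "overwrite"


def stack_link(src, tgt, policy=TAKE_NEW):
    """Create symlink tgt -> src, honouring the chosen conflict policy."""
    if os.path.islink(tgt):
        if policy == KEEP_EXISTING:
            return False  # leave the previously stacked link alone
        os.remove(tgt)  # re-point at the current remote
    elif os.path.exists(tgt):
        # A real local install always wins; never overwrite it.
        return False
    os.symlink(src, tgt)
    return True
```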

continue
fs.mkdirp(osp.dirname(tgt))
if verbose:
tty.debug("Linking {} -> {}".format(src, tgt))
Contributor


I'm not sure this shouldn't be an info; the debug output should then be something like the tgt and src paths above.

num_packages += 1

if verbose:
tty.info("Added {} packages from {}".format(num_packages, remote))
Contributor


This should always be reported, even if it's just to make sure that the user knows that something did happen.


spack.store.db.reindex(spack.store.layout)

if args.verbose:
Contributor


See above

verbose=args.verbose)
for remote in args.remotes))

spack.store.db.reindex(spack.store.layout)
Contributor


Add comment on why this is necessary. Also add an optional info-message?

@obreitwi
Member Author

@alalazo If I understand you correctly the user "only" copies/symlinks part of the configuration data and then builds the software on his own?

Or does each user copy the binaries? If so, wouldn't the RPATHs point to the old locations still, thereby breaking dependencies as soon as the binaries are removed from the cache?

I guess what I want the stack command to achieve is to have the remote repository act as your "mirror"/binary cache for free. As soon as something is removed from the remote repository the whole house of cards will come crumbling down, of course. All I want is to avoid users having to rebuild ~80 specs just because they want one additional spec that was not pre-built in the system-wide spack repository.

* removed `--verbose` argument

* changed all verbose statements to call `tty.debug`, left some info
  statements

* Added `-n`/`--no-stack-if-exists` argument that will ignore present
  symlinks if they exist. The default is to point all existing symlinks
  to the present remote repository.
@alalazo
Member

alalazo commented Jan 29, 2018

If I understand you correctly the user "only" copies/symlinks part of the configuration data and then builds the software on his own?

Correct, but with two caveats:

  1. users can rebuild exactly the same spec I installed, using the -f option of spack install
  2. users will install from a binary cache, so they won't lose a lot of time building their stack

We are starting to experiment with this, and the idea is to have the workflow stable and in production in July. Note that Spack already has every feature needed to support it (and at the moment the experiments we are conducting are proceeding without issues).

If so, wouldn't the RPATHs point to the old locations still, thereby breaking dependencies as soon as the binaries are removed from the cache?

No, Spack uses patchelf to relocate the binaries to the correct prefix.

@obreitwi
Member Author

users will install from a binary cache, so they won't lose a lot of time building their stack

Since our installs are rather large, it makes sense for people to use the cached version without copying. However, relocating packages in the event the cache gets outdated seems to be a desirable feature.

No, Spack uses patchelf to relocate the binaries to the correct prefix.

Ah, I was not aware of this functionality. Will you push your workflow upstream once it is ready? (So far I have only found lib/spack/spack/{relocate,binary_distribution}.py but no ready-to-use command exposing the functionality on the command line.)

Until then, spack stack will have to suffice for us.

@alalazo
Member

alalazo commented Jan 30, 2018

(So far I only found lib/spack/spack/{relocate,binary_distribution}.py but no ready-to-use command exposing the functionality to the command line.)

Just FYI, the command that exposes this functionality is spack buildcache. I think the user interface can be improved, but it does its job pretty well.

Creation of symlink was attempted even though target was already
present.
@citibeth
Member

Superseded by #8014

@tgamblin
Member

Closing in favor of #8014 but thanks a lot to @obreitwi for the contribution -- this idea is great.

@tgamblin tgamblin closed this May 10, 2018