
feature: sharing a spack instance#11871

Open
carsonwoods wants to merge 289 commits into spack:develop from carsonwoods:features/shared

Conversation

@carsonwoods
Contributor

@carsonwoods carsonwoods commented Jun 27, 2019

Closes #15939

These changes allow a single system-installed Spack to manage both an installation root maintained by a system admin and an installation root maintained by each user (admins can, for example, set config values that apply to every user of the Spack instance). Overall, the intent is to let admins deploy an instance of Spack that looks like any other system-installed tool.

This includes the following changes:

  • A single Spack instance can now manage multiple install trees (example syntax for selecting an install tree below)
  • config.yaml contains a "shared_install_trees" entry which administrators can use to manage an install tree that is meant to be shared
  • Spack now maintains upstreams differently. Previously, users had to maintain an upstreams.yaml file. Now the user can assign a single upstream to each install tree (the pointer is stored in the install tree itself, and Spack can manage more than two levels of install trees by recursively following these pointers)
  • Module roots move to the install tree root (not strictly required, but users generally need them located outside the Spack prefix)
  • A config scope is associated with the install tree itself (this allows admins to assign specific permissions to installed packages that are meant to be available to all users)
  • Environments are stored in the install root. This allows users of a shared Spack instance to make their own environments. It does not allow admins of the shared Spack instance to provide environments (that could be handled in a later PR)
  • For now it is assumed that users would not need to manage their own GPG keys: they would use the admin-created/added keys to sign new binary caches or to install from them.
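
The upstream-pointer scheme above (one pointer per install tree, followed recursively) can be sketched as follows. This is an illustrative model only, not Spack's actual implementation; the function name and data layout are assumptions:

```python
def upstream_chain(pointers, start):
    """Resolve the chain of install trees reachable from `start`.

    `pointers` maps each install tree name to its single upstream
    (or None at the end of the chain), mirroring the per-tree
    upstream pointer described above.
    """
    chain, seen = [], set()
    current = start
    while current is not None:
        if current in seen:
            # Guard against misconfigured circular upstream pointers.
            raise ValueError(f"upstream cycle detected at {current!r}")
        seen.add(current)
        chain.append(current)
        current = pointers[current]
    return chain
```

With pointers {'x': 'y', 'y': 'z', 'z': None}, resolving from 'x' yields the three-level chain ['x', 'y', 'z'].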

Usage

```shell
# Install foo to install tree x
spack --install-tree x install foo
# Set y as the upstream for x
spack --install-tree x init-upstream y
# If the config does not contain any shared_install_trees entries, this installs to var/spack
# Otherwise, it installs to ~/
spack install foo
```
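
For illustration only, the "shared_install_trees" entry in config.yaml might look something like the sketch below; the key names and layout are assumptions based on the description above, not necessarily the PR's final syntax:

```yaml
# Hypothetical config.yaml fragment: two admin-managed shared install
# trees, "x" and "y" (roots are placeholder paths).
config:
  shared_install_trees:
    x:
      root: /opt/spack-trees/x
    y:
      root: /opt/spack-trees/y
```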

This includes tests, for example:

  • Test that Spack can use upstreams.yaml but prefers using pointers
  • Test that Spack can use the older "install_tree" format in config.yaml

This has changed significantly since the PR was first created, so the old description is included below:

# Shared Spack
These changes add the ability for spack to operate in a "shared" mode where multiple users can use the same instance of spack without directly affecting one another. Previously, a similar setup was possible via users configuring their local ~/.spack configurations; however, doing so didn't stop users from accidentally affecting other users' packages/specs.

When shared mode is inactive, spack behaves like a normal spack instance. This would allow system admins to configure repos, mirrors, environments, etc. Those settings are shared by all users of this instance of spack.

When shared mode is enabled, spack treats the traditional installation locations as an upstream instance of spack, and the typical install/stage/cache/etc. locations are redirected to a directory that the user can specify by setting $SPACK_PATH=/some/directory/ in their environment.
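
The root-selection rule described here can be sketched as below. $SPACK_PATH comes from the description above; the ~/.spack fallback default and the function name are assumptions for illustration:

```python
import os

def install_root(spack_prefix, shared):
    """Pick the install root per the shared-mode rule described above.

    In shared mode, honor $SPACK_PATH if set (falling back to a
    hypothetical ~/.spack default); otherwise use the usual location
    under the spack prefix.
    """
    if shared:
        return os.environ.get("SPACK_PATH",
                              os.path.expanduser("~/.spack"))
    return os.path.join(spack_prefix, "opt", "spack")
```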

Users could still make their own local setting configurations in ~/.spack.

One additional change introduced in this feature: attempting to uninstall from an upstream instance of spack now raises an error rather than uninstalling the package.
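
The uninstall guard described above could be modeled like this; the names are illustrative, not Spack's real API:

```python
class UpstreamUninstallError(RuntimeError):
    """Raised when a spec lives in an upstream (read-only) instance."""

def uninstall(spec, local_specs, upstream_specs):
    """Remove `spec` from the local install tree, but refuse to touch
    anything installed in an upstream instance, per the behavior above."""
    if spec in upstream_specs:
        raise UpstreamUninstallError(
            f"cannot uninstall {spec!r}: installed in an upstream instance")
    local_specs.discard(spec)
```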

Commands Introduced

```shell
$ spack share activate
$ spack share status
==> Shared mode enabled/disabled
$ spack share deactivate
```

### WIP
Some aspects of this are still a work in progress. Currently, I have not implemented a good way to activate this version of spack. For a system-wide installation of spack, the `. $spack/share/spack/setup-env.sh` script could be hard to find. I experimented with creating a module file that runs that setup script; while that did work, it needs more work to be a viable way to load a shared spack.

@citibeth
Member

I think we'd be better served by a shared spack that just lets everyone install stuff into the same place. That will produce the most efficient sharing of installations between users.

@carsonwoods
Contributor Author

Ok, I can make that change. My only concern is that some users might not have permissions to install packages if the shared instance were placed at a system level.

@citibeth
Member

My issue here is... this looks a lot like other features that have been proposed or integrated. I'd like to better understand how it is similar to / different from them, and whether this is the way we want to go with Spack.

First, to better understand this PR:

  • Which of these are shared / not shared between users?

    • main spack directory?
    • tree of installed packages?
    • directory of generated modules?
  • Can you describe what problem this PR solves; and a typical use case?

  • Please look at Spack Chain (merged) and Spack Server (not implemented; Multi-User Spack: Spack Build Server #3156). Please compare / contrast this PR with those. And evaluate the merits of one approach vs the other.

I strongly encourage you to do this before writing more code on this feature.

@carsonwoods
Contributor Author

What is shared

The main spack directory is shared between users. Everyone uses the same main installation and program files to run spack. As for packages and modules, some are shared and some are not. Right now, when shared mode is enabled, a few things happen. First, spack changes where new packages are installed to wherever the user specifies via an environment variable. Packages (and their respective modulefiles) installed this way are not shared between users.

Second, spack adds the typical install directory ($spack/opt/spack) as an upstream (basically, it's treating itself as an upstream/chained instance of spack). That means any packages that were installed while shared mode was disabled become visible as upstream packages to users when shared mode is re-enabled. Because they are included in an upstream, packages installed while shared mode is disabled have their module files made visible to users.

Typical Use Case

The use case I envisioned is a multi-user system that provides its environment through spack. The system-wide environment would be managed through spack. Environments installed at the system level would be visible to all users, and changes could be made by modifying the environment or by creating different versions that co-exist. At the same time, if a user needed a package or install configuration that wasn't shipped with the system-wide environment, they could use the same instance of spack to install packages locally that only they use.

This would also allow for testing of new environment configurations without disrupting the existing environment on a system.

Differences/Similarities

Spack Chain

This is not so much an alternative to chaining spacks; rather, this feature is built on top of it. Instead of each user having to configure their own unique instance of spack to point at an upstream instance, this would be handled by a single instance of spack that treats itself as an upstream. This doesn't prevent other upstreams from being added if a user wanted to include packages from another instance of spack.

Spack Server

My shared spack is actually fairly similar to spack server in what it allows users to do, but the actual architecture of the feature is different. Rather than having a unique client/server relationship, there is still only a single instance of spack that manages everything. This means that builds between users are not shared (though more on that later). Also, because users are installing locally when they interface with spack, multiple users could interact with the same instance of spack more easily than if communication had to pass through a server. This also avoids having to juggle multiple permissions and install locations for packages.

While thinking about how to make packages be truly shared between users as well, I had the idea of adding each user as an upstream as well. I have yet to try this, and I am not sure that the permissions between users would allow this, but when a user first interfaces with a shared instance of spack, they could be registered as an upstream so that their packages become visible to other users.

@tgamblin
Member

tgamblin commented Jul 6, 2019

@becker33: can you please review this one? It looks like cool stuff is happening over at SNL 👍

@tjfulle
Contributor

tjfulle commented Jul 8, 2019

This feature seems ideal for the situation where a team of developers is working on a project with many dependencies. A single spack instance can install all of the dependencies in "unshared" mode, and developers can then use that same spack instance in "shared" mode during their development workflow. E.g., use spack in shared mode and `spack setup myproject` -- myproject would use dependencies from the shared location but install into the user's workspace, without having to worry about installing dependencies, clobbering others' work, setting up additional repos, configurations, etc.

@citibeth
Member

citibeth commented Jul 9, 2019 via email

@scheibelp scheibelp assigned scheibelp and unassigned becker33 Jul 15, 2019
@scheibelp scheibelp requested review from scheibelp and removed request for becker33 July 15, 2019 22:14
@scheibelp
Member

I'm wondering if the following may serve (or, if you point out issues with this suggestion for your use case, it would help me understand spack share better):

Say we have a system installation of Spack: spack-system. If we wanted a separate downstream Spack instance but did not want the user to have to replicate the config, we could place the desired configuration in a separate directory and when running either instance we could point to that config with -C like

```shell
/path/to/downstream/spack -C /path/to/upstream/config ...
/path/to/upstream/spack -C /path/to/upstream/config ...
```

(Normally git clone would be sufficient for this but in that case the user would have to remember to git pull any changes that have occurred)

This would still require each user to make at least one configuration change themselves: to add the upstream Spack to their upstreams.yaml. I think that could also be resolved by creating an additional config directory which contains the upstream config:

```shell
# /upstream-pointer-cfg-dir just contains an upstreams.yaml file that points to the upstream instance
/path/to/downstream/spack -C /path/to/upstream-pointer-cfg-dir/ -C /path/to/upstream/config ...
/path/to/upstream/spack -C /path/to/upstream/config ...
```

With that, the remaining work for each user would be something like configuring an alias so that they don't have to type -C all the time.
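
Concretely, the upstreams.yaml inside that pointer directory would follow Spack's existing upstreams config format, along these lines (the instance name and path are placeholders):

```yaml
# Contents of /upstream-pointer-cfg-dir/upstreams.yaml; "spack-system"
# and the install_tree path are placeholder values.
upstreams:
  spack-system:
    install_tree: /path/to/upstream/spack/opt/spack
```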

@citibeth
Member

citibeth commented Jul 16, 2019 via email

@tgamblin
Member

@citibeth:

> Every user who builds the same package with the same hash will get the same result.

There are some rather painful ways this is not completely true at the moment:

  1. See Specific targets #3206
  2. Build dependencies are not (yet) included in the DAG hash (can result in subtle differences)
  3. The hash of the package.py file is not included in the DAG hash.
  4. To support platforms like Cray, we only blacklist certain environment variables to clean the build environment, which is not as thorough as starting from scratch and constructing the build environment.
  5. We don't (yet) support building everything down to libc.
  6. master and develop versions (as you mention) at the moment.

Regardless, I think this point is valid -- we don't intend for things with the same hash to be different.

> Therefore, not sharing builds between users is never beneficial, IMHO.

The use cases this addresses are:

  1. I would consider it extremely beneficial not to share a build if, for example, I built something that was export-controlled. That's a very common use case for us. I would want to keep that private.
  2. This PR is meant to make it possible to have a central system installation of Spack that is shared among unprivileged users and the facility. The facility may not want to support all the things users want (e.g., ours does not). They also want users to rely on a common core of shared dependencies. This increases sharing by basically giving you something that is chained out of the box.
  3. Enabling this is critical for making Spack installable via PyPI. Currently, Spack requires write access to its own prefix. We need a version that does not require that to make it fit nicely into Python's provisioning model.
  4. This is further out, but we do support relocation. If enough users install something and the facility decides to support it, users can "push" a local installation to the central Spack installation. Or they could make a binary package of it and share it with the facility, and other users could consolidate to have their Spack instances remove local duplicate installations and re-RPATH to a newly provisioned central install. These are things we'd like to have eventually.

A properly set-up "Spack Server" system would replicate/automate how software was traditionally installed on HPC systems: I want a package, I make a request to the sysadmins, they install it and provide me a module.

We're aiming to support this type of thing through build farms and binary packages (see #11612). While that PR does not yet have a REST API as a front-end, you could imagine adding one -- that might be interesting at some point, but I can't say it's on the near-term priority list.

> But I don't think that's a key concern here at this point, because builds happen so rarely.

Builds happen every day, all the time at LLNL and other DOE sites. It's not a static environment.

> even if we never think more deeply about parallelization.

Parallelization (via locking) is on the roadmap for this fall.

Member

@tgamblin tgamblin left a comment

@carsonwoods: This is a good start! Thanks for taking it on. I have some change requests for you.

No modes

The main gist of what I'm requesting is that instead of having two modes (one for admins and one for users), there should still be only one mode, and the user (admin or otherwise) should be able to pick where to install packages. I think you can do this in a general way by still leaning heavily on the chaining functionality we've already implemented, but a Spack instance will have:

  1. A default install location of ~/.spack, as you've proposed here
  2. An "upstream" (à la Spack Chains) configured by default within the Spack prefix.

Instead of using the spack shared command to switch modes, I think this should add an argument to spack install to say where to install. See my note on this below; I think you could start by just adding support for spack install --global (to install to location 2 above) instead of doing the fully general thing where you could install to arbitrary upstreams.

This will allow admins to log in and spack install --global <pkg>, or even to activate an environment and just run spack install --global to get everything installed in the shared location. This also allows any user to first get a build working in their home directory, then install to the global location once they've debugged stuff. I think that will be a common use case for us -- it can take a while to get a build debugged.

Permissions

There is one thing I don't see addressed at all in this PR, and that is permissions. See the package permissions docs for how we currently handle per-package permission settings. The shared install location should support these types of permission settings and we should ensure that the group and world bits are set properly on installations. If you want to see a model of how this can work in practice, look at how git repository sharing works in the filesystem. I think the model for Spack can be similar.
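
For context, the per-package permission settings referenced above are configured in packages.yaml; a shared install tree would want something like the following (the group name is a placeholder):

```yaml
# Per-package permission settings as currently supported in packages.yaml:
# world-readable, group-writable installs owned by a shared group.
packages:
  all:
    permissions:
      read: world
      write: group
      group: spackusers
```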

Modules, environments, caches, etc.

Spack currently writes a bunch of other stuff into its prefix (modules, environments, caches) and I'd ideally like to see those moved to the home directory as well. I think we should think about how they factor into chaining -- environments in particular would be useful to have both in upstreams (sharable by anyone downstream) and locally in the home directory. Modules are already handled by chaining. /var/spack/cache -- the download cache -- should probably be moved as well, though it probably makes sense to allow that to be cached globally somehow (perhaps similarly to the permissions model on the install directory).

How does that sound? We should probably set up a telcon to discuss this, or cover it on a Spack weekly telcon.

@citibeth
Member

citibeth commented Jul 16, 2019 via email

@carsonwoods
Contributor Author

@tgamblin
Thanks for the review! I think that these suggestions make a lot of sense. I can start implementing some of these changes on my end and I'd be happy to have a telcon to discuss all this further.

@tgamblin
Member

@carsonwoods: sounds good -- let us know on Slack or here if you've got questions, or we can set up a call sometime.

@tgamblin
Member

@carsonwoods: Just FYI: if you can rebase on develop instead of a lot of merges, it may be easier to preserve your commits when this is finally merged.

@tjfulle
Contributor

tjfulle commented Jul 16, 2019

@tgamblin @citibeth @carsonwoods - I am following this issue with interest as it looks like it will be used for development of a code my group develops that has dozens of dependencies that are themselves being developed. The team's devops person will be the admin and developers the users. This makes #11919 all the more important - on some of my accounts at SNL, I have limited space and cannot install packages to my home directory.

I'll also echo what @tgamblin said earlier: it is not unusual for me to have different jobs compiling on many different machines most hours of the day.

@Jordan474
Contributor

I have a question on the wording of this PR:

  • Does it share a spack instance, or a spack install tree?

To me, "instance" means the spack git clone, and "install tree" means only spack/opt/spack (or whatever the install_tree config points to). Sharing an "instance" makes me think it would share more than the install tree (site-scope configurations of the "upstream" spack and things like that).

I would prefer to share only the spack install trees, because my CI/CD jobs deploy spack installations from a temporary git clone (the CI/CD job's working directory), and in deployment jobs I only configure spack to place the install_tree on a shared FS. Before sbang was moved into the install tree, I did keep the git clone on a shared FS, but it was very much not ideal (the shared FS is very slow).

  • Will I still be able to keep only the install tree (spack/opt/spack) of the upstream Spack and delete all the other files of the git clone?

FYI, for now I don't see any critical references to files of the git clone in our package installations (except for spack/env/gcc, but those cannot work anyway).

@carsonwoods
Contributor Author

carsonwoods commented Feb 28, 2023

@Jordan474 It's been a long time since I've worked on this PR (mainly due to the burden of maintaining such a large change while waiting for a review), but if I remember correctly, it was a shared Spack install tree. In retrospect, "shared Spack install tree" is clearer wording. If the PR gets more attention moving forward, I'll be sure to make the wording clear.

@alalazo alalazo modified the milestones: v0.20.0, v0.21.0 May 2, 2023
@tgamblin tgamblin modified the milestones: v0.21.0, v0.22.0 Oct 17, 2023
@ajyounge

Will this feature ever land in spack? If so, how many more years will it take?

@tgamblin
Member

@ajyounge: I'm hoping to have something like it for 0.22 in June, but we have had a lot of other things (compilers, solvers, end of ECP funding, etc.) to deal with lately.

@harshula
Contributor

FYI, Looks like changing the location of environments has been merged via #32836 .

@carsonwoods
Contributor Author

#32836 is a great feature (and would allow for shared environments managed by Spack), but it doesn't exactly accomplish what this PR set out to enable. The goal of this PR was to allow a shared environment to serve as a pre-built base that users can build on with Spack (i.e., a user can install a package they individually need and leverage dependencies in the shared Spack environment). My understanding is that, if a user has a Spack environment activated, they can't install a package without first adding it to the environment; instead they get an error that looks like this:

```shell
$ spack install zlib
==> Error: Cannot install 'zlib' because no matching specs are in the current environment. You can add specs to the environment with 'spack add zlib', or as part of the install command with 'spack install --add zlib'
```

In most computing environments where this feature would be useful, most users shouldn't be able to modify system environments. I suppose users could add those environment packages as externals, so their Spack treats them as packages and lets them interact with them without activating an environment, but that way you lose a lot of the nice features of Spack managing everything centrally.
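
For reference, the externals workaround described above would look roughly like this in a user's packages.yaml (the spec and prefix are placeholder values):

```yaml
# Hypothetical packages.yaml fragment registering a package from the
# shared environment as an external; spec and prefix are placeholders.
packages:
  zlib:
    externals:
    - spec: zlib@1.2.13
      prefix: /shared/spack/opt/spack/.../zlib-1.2.13-abcdef
    buildable: false
```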

Additionally, a big reason this PR was originally blocked from merging was ostensibly a desire to rework how Spack manages permissions in these shared environments, which I don't think #32836 makes any accommodations for. I'm unsure whether that is still a priority or a requirement, but it's worth mentioning.

@tgamblin
Member

Moving to 0.24 but meeting with SNL folks tomorrow to figure out a plan to actually get it done. FYI @psakievich

@tgamblin tgamblin modified the milestones: v0.23, v0.24 Oct 28, 2024
@scheibelp scheibelp mentioned this pull request Nov 15, 2024
@tgamblin tgamblin modified the milestones: v1.0.0, v1.1.0 Jun 9, 2025
@tgamblin tgamblin modified the milestones: v1.1.0, v1.2.0 Nov 3, 2025