feature: sharing a spack instance #11871
Conversation
|
I think we'd be better served by a shared spack that just lets everyone install stuff into the same place. That will produce the most efficient sharing of installations between users. |
|
Ok, I can make that change. My only concern is that some users might not have permissions to install packages if the shared instance were placed at a system level. |
|
My issue here is... this looks a lot like other features that have been proposed or integrated. I'd like to better understand how it is similar to / different from them, and whether this is the direction we want to go with Spack. First, to better understand this PR:
I strongly encourage you to do this before writing more code on this feature. |
What is shared
The main spack directory is shared between users. Everyone uses the same main installation and program files to run spack. As for packages and modules, some are shared and some are not. Right now, when shared mode is enabled a few things happen. First, spack changes where new packages are installed to wherever is specified by the user via an environment variable. Packages (and their respective modulefiles) installed this way are not shared between users. Second, spack adds the typical install directory as an upstream, so packages already installed there remain visible to everyone.
Typical Use Case
The use case that I envisioned is a multi-user system that provides its environment through spack. The system-wide environment would be managed through spack. Environments installed at the system level would be visible to all users, and changes could be made by modifying the environment or by creating different versions that co-exist. At the same time, if a user needed a package or install configuration that wasn't shipped with the system-wide environment, they could use the same instance of spack to install packages locally that only they are using. This would also allow for testing new environment configurations without disrupting the existing environment on a system.
Differences/Similarities
Spack Chain
This is not so much an alternative to chaining spacks; rather, this feature is built on top of it. Instead of each user having to configure their own unique instance of spack to point at an upstream instance, this would be handled by a single instance of spack that treats itself as an upstream. This doesn't prevent other upstreams from being added if a user wanted to include packages from another instance of spack.
Spack Server
My shared spack is actually fairly similar to spack server in what it allows users to do, but the architecture of the feature is different. Rather than a client/server relationship, there is still only a single instance of spack that manages everything. This means that builds between users are not shared (though more on that later). Also, because users interfacing with spack are installing locally, multiple users could interface with the same instance of spack more easily than if communication had to pass through a server. This also avoids having to juggle multiple permissions and install locations for packages. While thinking about how to make packages truly shared between users as well, I had the idea of adding each user as an upstream. I have yet to try this, and I am not sure that the permissions between users would allow it, but when a user first interfaces with a shared instance of spack, they could be registered as an upstream so that their packages become visible to other users. |
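To make the chaining idea above concrete, here is a minimal sketch of what registering the shared install location (and, hypothetically, another user) as upstreams could look like in `upstreams.yaml`. The paths and names are illustrative assumptions, not part of this PR:

```yaml
# Hypothetical upstreams.yaml for a shared instance; paths are illustrative.
upstreams:
  system-spack:
    install_tree: /opt/spack/opt/spack        # the shared, admin-managed install tree
  user-alice:
    install_tree: /home/alice/.spack-installs # a user registered as an upstream
```

Whether per-user upstreams like this would work in practice depends on filesystem permissions between users, as noted above.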
|
@becker33: can you please review this one? It looks like cool stuff is happening over at SNL 👍 |
|
This feature seems ideal for the situation where a team of developers is working on a project with many dependencies. A single spack instance can install all of the dependencies in "unshared" mode and developers can then use that same spack instance in "shared" mode during their development workflow: e.g., use spack in shared mode and install in the user's workspace, without having to worry about installing dependencies, clobbering others' work, setting up additional repos, configurations, etc. |
|
Thanks for explaining further. I am *really* not convinced. The core
reason why is simple: if user A has built a package with a particular hash,
and users A and B work together on a shared filesystem, then there is NO
REASON for B to ever rebuild that same package with the same hash. Every
user who builds the same package with the same hash will get the same
result. That is a fundamental feature of how Spack works.
Therefore, *not* sharing builds between users is never beneficial, IMHO.
It makes sense to me that Spack Environments, modules, other stuff might
not be shared. But the core builds themselves (i.e. the stuff that takes
the most CPU time), it makes no sense to NOT share.
A properly set-up "Spack Server" system would replicate / automate how
software was traditionally installed on HPC systems: I want a package, I
make a request to the sysadmins, they install it and provide me a module.
I agree there can be issues of queuing / executing build requests. Yes,
more people can be building at once if they all build separately than if
the requests are channeled through a server; unless Spack gets a little
smarter about building multiple things in parallel. But I don't think
that's a key concern here at this point, because builds happen so rarely.
My computers, for example, might spend 3-4 hours *per year* building. I
spend more time talking about Spack on GitHub than I do actually using
Spack. I just don't think queue contention will be a big issue, even if we
never think more deeply about parallelization.
but install in the user's workspace, without having to worry about
installing dependencies, clobbering others' work, setting up additional
repos, configurations, etc.
In theory you won't clobber other peoples' work because of Spack's use of
hashes.
I suppose this breaks down with `@master` and `@develop` versions. Argh.
|
|
I'm wondering if the following may serve (or if you point out issues with this suggestion for your use case, it would help me understand `spack share` better):
Say we have a system installation of Spack: `spack-system`. If we wanted a separate downstream Spack instance but did not want the user to have to replicate the config, we could place the desired configuration in a separate directory, and when running either instance we could point to that config with `-C`, like:
`/path/to/downstream/spack -C /path/to/upstream/config ...`
`/path/to/upstream/spack -C /path/to/upstream/config ...`
(Normally `git clone` would be sufficient for this, but in that case the user would have to remember to `git pull` any changes that have occurred.)
This would still require each user to make at least one configuration change themselves: to add the upstream Spack to their `upstreams.yaml`. I think that could also be resolved by creating an additional config directory which contains the upstream config:
`# /upstream-pointer-cfg-dir just contains an upstreams.yaml file that points to the upstream instance`
`/path/to/downstream/spack -C /path/to/upstream-pointer-cfg-dir/ -C /path/to/upstream/config ...`
`/path/to/upstream/spack -C /path/to/upstream/config ...`
With that, the work to be done for each user would be something like configuring an alias so that users don't have to type `-C` all the time. |
|
I use aliases for my spack commands
…On Mon, Jul 15, 2019 at 20:32 Peter Scheibel wrote:
I'm wondering if the following may serve (or if you point out issues with
this suggestion for your use case it would help me understand spack share
better):
Say we have a system installation of Spack: spack-system. If we wanted a
separate downstream Spack instance but did not want the user to have to
replicate the config, we could place the desired configuration in a
separate directory and when running either instance we could point to that
config with -C like
/path/to/downstream/spack -C /path/to/upstream/config ...
/path/to/upstream/spack -C /path/to/upstream/config ...
(Normally git clone would be sufficient for this but in that case the
user would have to remember to git pull any changes that have occurred)
This would still require each user to make at least one configuration
change themselves: to add the upstream Spack to their upstreams.yaml. I
think that could also be resolved by creating an additional config
directory which contains the upstream config:
# /upstream-pointer-cfg-dir just contains an upstreams.yaml file that points to the upstream instance
/path/to/downstream/spack -C /path/to/upstream-pointer-cfg-dir/ -C /path/to/upstream/config ...
/path/to/upstream/spack -C /path/to/upstream/config ...
With that the work to be done for each user would be something like
configuring an alias so that users don't have to type -C all the time.
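The alias setup mentioned above might look something like this in a user's shell init file — the alias name and all paths are illustrative assumptions carried over from the examples:

```sh
# ~/.bashrc (hypothetical paths): wrap the downstream spack so the -C flags
# pointing at the upstream-pointer dir and the shared config are implicit.
alias spack='/path/to/downstream/spack -C /path/to/upstream-pointer-cfg-dir/ -C /path/to/upstream/config'
```

With this in place, a plain `spack install <pkg>` would pick up both the shared configuration and the upstream registration automatically.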
|
@citibeth:

> Every user who builds the same package with the same hash will get the same result.

There are some rather painful ways this is not completely true at the moment:
1. See #3206
2. Build dependencies are not (yet) included in the DAG hash (which can result in subtle differences)
3. The hash of the package.py file is not included in the DAG hash.
4. To support platforms like Cray, we only blacklist certain environment variables to clean the build environment, which is not as thorough as starting from scratch and constructing the build environment.
5. We don't (yet) support building everything down to libc.
6. master and develop versions (as you mention) at the moment.

Regardless, I think this point is valid -- we don't *intend* for things with the same hash to be different.

> Therefore, *not* sharing builds between users is never beneficial, IMHO.

The use cases this addresses are:
1. I would consider it extremely beneficial not to share a build if, for example, I built something that was export-controlled. That's a very common use case for us. I would want to keep that private.
2. This PR is meant to make it possible to have a central system installation of Spack that is shared among unprivileged users *and* the facility. The facility may not want to support all the things users want (e.g., ours does not). They also want users to rely on a common core of shared dependencies. This increases sharing by basically giving you something that is chained out of the box.
3. Enabling this is critical for making Spack installable via PyPI. Currently, Spack requires write access to its own prefix. We need a version that does not require that to make it fit nicely into Python's provisioning model.
4. This is further out, but we *do* support relocation. If enough users install something and the facility decides to support it, users can "push" a local installation to the central Spack installation. Or they could make a binary package of it and share it with the facility, and other users could consolidate to have their Spack instances remove local duplicate installations and re-RPATH to a newly provisioned central install. These are things we'd like to have eventually.

> A properly set-up "Spack Server" system would replicate / automate how software was traditionally installed on HPC systems.

We're aiming to support this type of thing through build farms and binary packages (see #11612). While that PR does not yet have a REST API as a front-end, you could imagine adding one -- that might be interesting at some point, but I can't say it's on the near-term priority list.

> But I don't think that's a key concern here at this point, because builds happen so rarely.

Builds happen every day, all the time at LLNL and other DOE sites. It's not a static environment.

> even if we never think more deeply about parallelization.

Parallelization (via locking) is on the roadmap for this fall. |
tgamblin left a comment
@carsonwoods: This is a good start! Thanks for taking it on. I have some change requests for you.
No modes
The main gist of what I'm requesting is this: instead of having two modes (one for admins and one for users), there should still be only one mode, and the user (admin or otherwise) should be able to pick where to install packages. I think you should be able to do this in a general way by still leaning heavily on the chaining functionality we've already implemented, but a Spack instance will have:
- A default install location of ~/.spack, as you've proposed here
- An "upstream" (à la Spack Chains) configured by default within the Spack prefix
Instead of using the `spack share` command to switch modes, I think this should add an argument to `spack install` to say where to install. See my note on this below; I think you could start by just adding support for `spack install --global` (to install to location 2 above) instead of doing the fully general thing where you could install to arbitrary upstreams.
This will allow admins to log in and spack install --global <pkg>, or even to activate an environment and just run spack install --global to get everything installed in the shared location. This also allows any user to first get a build working in their home directory, then install to the global location once they've debugged stuff. I think that will be a common use case for us -- it can take a while to get a build debugged.
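A sketch of the workflow being requested here — note that `--global` is the proposed flag under discussion, not an existing option, and the package name is just an example:

```sh
# Debug the build in the default (home-directory) install tree first...
spack install hdf5
# ...then, once it works, install into the shared location (proposed flag):
spack install --global hdf5
# Or activate an environment and install its whole contents globally:
spack env activate myenv
spack install --global
```

The point of the two-step flow is exactly the "debug first, publish second" use case described above.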
Permissions
There is one thing I don't see addressed at all in this PR, and that is permissions. See the package permissions docs for how we currently handle per-package permission settings. The shared install location should support these types of permission settings and we should ensure that the group and world bits are set properly on installations. If you want to see a model of how this can work in practice, look at how git repository sharing works in the filesystem. I think the model for Spack can be similar.
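For reference, the per-package permission settings mentioned above are configured in `packages.yaml`; applying them to a shared install tree would mean something like a group-writable, world-readable layout. The group name here is a hypothetical example:

```yaml
packages:
  all:
    permissions:
      read: world          # anyone can read/execute installed packages
      write: group         # members of the group can install/modify
      group: spack-users   # hypothetical shared POSIX group
```

This mirrors the git shared-repository model: a setgid group owns the tree, and world access is read-only.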
Modules, environments, caches, etc.
Spack currently writes a bunch of other stuff into its prefix (modules, environments, caches) and I'd ideally like to see those moved to the home directory as well. I think we should think about how they factor into chaining -- environments in particular would be useful to have both in upstreams (sharable by anyone downstream) and locally in the home directory. Modules are already handled by chaining. /var/spack/cache -- the download cache -- should probably be moved as well, though it probably makes sense to allow that to be cached globally somehow (perhaps similarly to the permissions model on the install directory).
How does that sound? We should probably set up a telcon to discuss this, or cover it on a Spack weekly telcon.
|
This is an excellent discussion; I believe we should consider putting these points in the docs.
…On Tue, Jul 16, 2019 at 3:35 AM Todd Gamblin wrote:
@citibeth:
Every user who builds the same package with the same hash will get the
same result.
There are some rather painful ways this is not completely true at the
moment:
1. See #3206
2. Build dependencies are not (yet) included in the DAG hash (can
result in subtle differences)
3. The hash of the package.py file is not included in the DAG hash.
4. To support platforms like Cray, we only blacklist certain
environment variables to clean the build environment, which is not as
thorough as starting from scratch and constructing the build environment.
5. We don't (yet) support building everything down to libc.
6. master and develop versions (as you mention) at the moment.
Regardless, I think this point is valid -- we don't *intend* for things
with the same hash to be different.
Therefore, *not* sharing builds between users is never beneficial, IMHO.
The use cases this addresses are:
1. I would consider it extremely beneficial not to share a build if,
for example, I built something that was export-controlled. That's a very
common use case for us. I would want to keep that private.
2. This PR is meant to make it possible to have a central system
installation of Spack that is shared among unprivileged users *and*
the facility. The facility may not want to support all the things users
want (e.g., ours does not). They also want users to rely on a common core
of shared dependencies. This increases sharing by basically giving you
something that is chained out of the box.
3. Enabling this is critical for making Spack installable via PyPI.
Currently, Spack requires write access to its own prefix. We need a version
that does not require that to make it fit nicely into Python's provisioning
model.
4. This is further out, but we *do* support relocation. If enough
users install something and the facility decides to support it, users can
"push" a local installation to the central Spack installation. Or they
could make a binary package of it and share it with the facility, and other
users could consolidate to have their Spack instances remove local
duplicate installations and re-RPATH to a newly provisioned central
install. These are things we'd like to have eventually.
A properly set-up "Spack Server" system would replicate / automate how
software was traditionally installed on HPC systems: I want a package, I
make a request to the sysadmins, they install it and provide me a module.
We're aiming to support this type of thing through build farms and binary
packages (see #11612). While
that PR does not yet have a REST API as a front-end, you could imagine
adding one -- that might be interesting at some point, but I can't say it's
on the near-term priority list.
But I don't think that's a key concern here at this point, because builds
happen so rarely.
Builds happen every day, all the time at LLNL and other DOE sites. It's
not a static environment.
even if we never think more deeply about parallelization.
Parallelization (via locking) is on the roadmap for this fall.
|
|
@tgamblin |
|
@carsonwoods: sounds good -- let us know on Slack or here if you've got questions, or we can set up a call sometime. |
|
@carsonwoods: Just FYI: if you can rebase on develop instead of a lot of merges, it may be easier to preserve your commits when this is finally merged. |
|
@tgamblin @citibeth @carsonwoods - I am following this issue with interest as it looks like it will be used for development of a code my group develops that has dozens of dependencies that are themselves being developed. The team's devops person will be the admin and developers the users. This makes #11919 all the more important - on some of my accounts at SNL, I have limited space and cannot install packages to my home directory. I'll also echo what @tgamblin said earlier, it is not unusual for me to have different jobs compiling on many different machines most hours of the day. |
6d06a94 to 08b8238
|
I have a question on the wording of this PR:
To me, the "instance" means "the spack git clone", and "install tree" means only the `install_tree` directory. I would prefer to share only the spack install trees, because my CI/CD jobs deploy spack installations from a temporary git clone (the CI/CD job working directory), and in deployment jobs I only configure spack to output the install_tree to a shared FS. Before sbang was moved into the install tree, I did have the git clone on a shared FS, but it was very much not ideal (a very slow shared FS).
FYI, for now I don't see any critical references to files of the git clone in our package installations (except for |
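Sharing only the install tree, as described above, can be done in modern Spack by pointing `config.yaml` at a shared filesystem; the path here is a hypothetical example:

```yaml
config:
  install_tree:
    root: /shared/fs/spack/opt   # shared install tree on the shared FS;
                                 # the git clone itself stays local/ephemeral
```

Each ephemeral CI/CD clone can then write into the same shared tree without the clone itself living on the slow shared filesystem.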
|
@Jordan474 It's been a long time since I've worked on this PR (mainly due to the burden of maintaining such a large change while waiting for a review), but if I am remembering correctly it was a shared Spack install tree. In retrospect, "a shared Spack tree" is clearer wording. If the PR gets more attention moving forward I'll be sure to make the wording clear. |
|
Will this feature ever land in spack? If so, how many more years will it take? |
|
@ajyounge: I'm hoping to have something like it for 0.22 in June, but we have had a lot of other things (compilers, solvers, end of ECP funding, etc.) to deal with lately. |
|
FYI, it looks like changing the location of environments has been merged via #32836. |
|
#32836 is a great feature (and would allow for shared environments managed by Spack), but it doesn't exactly accomplish what this PR set out to enable. The goal of this PR was to allow a shared environment to serve as a pre-built base that users can build on top of using Spack (i.e., a user can install a package they individually need while leveraging dependencies in a shared Spack environment). My understanding is that, if a user has a Spack environment activated, they can't install a package without first adding it to the environment; otherwise they get an error. In most computing environments where this feature would be useful, most users shouldn't be able to modify system environments.

I suppose users could add those environment packages as externals, so their Spack treats them as packages and lets them interact with them without activating an environment, but you lose a lot of the nice features of Spack managing everything centrally that way.

Additionally, a big reason this PR was originally blocked from merging was ostensibly a desire to re-work how Spack manages permissions in these shared environments, which I don't think #32836 makes any accommodations for. I'm unsure if that is still a priority or a requirement, but it's worth mentioning. |
|
Moving to 0.24 but meeting with SNL folks tomorrow to figure out a plan to actually get it done. FYI @psakievich |
Closes #15939
Changes which allow for a single system-installed Spack, with one installation root maintained by a system admin and another maintained by each user (while, for example, admins can set config values that apply to any user of the Spack instance). Overall this is intended to allow admins to deploy an instance of Spack which looks like any other system-installed tool.
This includes the following changes:
Usage
This includes tests, for example:
This has changed significantly since the PR was first created, so the old description is included below:
# Shared Spack
These changes add the ability for spack to operate in a "shared" mode where multiple users can use the same instance of spack without directly affecting other users. Previously, a similar solution was possible by users configuring their local ~/.spack configurations; however, doing so didn't stop users from accidentally affecting other users' packages/specs.
When shared mode is inactive, spack behaves like a normal spack instance. This allows system admins to configure repos, mirrors, environments, etc. These settings are shared by all users of this instance of spack.
When shared mode is enabled, spack treats the traditional installation locations as an upstream instance of spack, and the typical install/stage/cache/etc. locations are set to a directory that the user can specify by setting $SPACK_PATH=/some/directory/ in their environment. Users can still make their own local configuration settings in ~/.spack.
One additional change introduced in this feature is that attempting to uninstall from an upstream instance of spack now produces an error rather than uninstalling the package.
### Commands Introduced
$ spack share activate
$ spack share status
==> Shared mode enabled/disabled
$ spack share deactivate
### WIP
Some aspects of this are still a work in progress. Currently I have not implemented a good way to activate this version of spack. For a system-wide installation of spack, the setup script (. $spack/share/spack/setup-env.sh) could be hard to find. I experimented with creating a module file that runs that setup script, and while that did work, it needs more work to be a viable way to load a shared spack.
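Putting the pieces of this description together, a user session with the shared instance might look roughly like the following sketch — the system-wide prefix and the per-user directory are hypothetical examples, and `spack share activate` is the command this PR introduces:

```sh
# Hypothetical system-wide prefix; adjust for your site.
export SPACK_ROOT=/opt/spack
. $SPACK_ROOT/share/spack/setup-env.sh    # make the shared spack available
export SPACK_PATH=$HOME/spack-local       # per-user install location (this PR)
spack share activate                      # switch this shell into shared mode
spack install zlib                        # installs under $SPACK_PATH, with the
                                          # system tree visible as an upstream
```

A module file that performs the first two steps would address the discoverability concern raised above.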