Maintain a view for an environment#10017

Merged
tgamblin merged 42 commits into spack:develop from
scheibelp:features/view-with-env
Apr 10, 2019

@scheibelp
Member

@scheibelp scheibelp commented Dec 5, 2018

This PR updates environments so that by default they are created with views. Currently my goal is to show how it works and get agreement on it. There are still tests etc. which are needed to complete it.

spack env create e1 #by default this will maintain a view in the directory Spack maintains for the env
spack env create e1 --with-view=/abs/path/to/anywhere
spack env create e1 --without-view

The manifest.yaml file now looks like:

spack:
  specs:
  - python
  view: true #or false, or a string

Existing environments will not automatically maintain views. I propose adding the following command to manipulate whether an env maintains a view (these commands aren't yet available):

spack env view --enable #by default create the view
spack env view --enable /abs/path/to/anywhere
spack env view --disable
spack env view --show #show where the view is maintained

(EDIT 4/8/19) The commands for managing a view for an environment have been added and have a slightly different syntax:

spack env view enable
spack env view enable /abs/path/to/anywhere
spack env view disable

Views are automatically updated when specs are installed to an environment. A view only maintains one copy of any package. An environment may refer to a package multiple times, in particular if it appears as a dependency. This PR establishes a prioritization for which environment specs are added to views: a spec has higher priority if it was concretized first. This does not necessarily exactly match the order in which specs were added, for example, given X->Z and Y->Z':

spack env activate e1
spack add X
spack install Y #immediately concretizes and installs Y and Z'
spack install #concretizes X and Z

In this case Z' will be favored over Z.
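
The precedence rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Spack's implementation: specs are modeled as plain strings, `select_for_view` is a hypothetical helper, and the pair `("Z", "Z'")` stands for the copy of Z that Y depends on.

```python
def select_for_view(concretization_order):
    """Pick one spec per package name, favoring whichever was concretized first.

    `concretization_order` is a list of (package_name, concrete_spec) pairs in
    the order the specs were concretized. The function and the data model are
    hypothetical and exist only for this illustration.
    """
    chosen = {}
    for name, spec in concretization_order:
        # A view holds only one copy of a package, so later copies are skipped.
        chosen.setdefault(name, spec)
    return chosen


# `spack install Y` concretizes Y and Z' first; the later `spack install`
# concretizes X and Z.
order = [("Y", "Y"), ("Z", "Z'"), ("X", "X"), ("Z", "Z")]
view = select_for_view(order)
print(view["Z"])  # Z'
```

Because the first concretized copy wins, swapping the two install commands in the example would put Z in the view instead of Z'.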

Specs in the environment must be concrete and installed to be added to the view, so there is another minor ordering effect: by default the view maintained for the environment ignores file conflicts between packages. If packages are not installed in order, and there are file conflicts, then the version chosen depends on the order.

Both ordering issues are avoided if spack install/spack add and spack install <spec> are not mixed.

(UPDATE 4/8/19) When activated, if an environment includes a view, this view will be added to PATH, CPATH, and other shell variables to expose the Spack environment in the user's shell.
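
As a rough sketch of what exposing a view through shell variables could look like: the helper below prepends view subdirectories to `PATH` and `CPATH`. The function name `activate_view` and the mapping of variables to `bin` and `include` are assumptions made for illustration; the description above only says that PATH, CPATH, and other variables are modified.

```python
import os


def activate_view(view_root, env=None):
    """Return a copy of `env` with the view's directories prepended.

    The variable-to-subdirectory mapping below is an assumption for this
    sketch, not a statement of what Spack sets.
    """
    env = dict(os.environ if env is None else env)
    subdirs = {"PATH": "bin", "CPATH": "include"}
    for var, subdir in subdirs.items():
        entry = os.path.join(view_root, subdir)
        old = env.get(var)
        # Prepend so that the view takes precedence over existing entries.
        env[var] = entry if not old else entry + os.pathsep + old
    return env


new_env = activate_view("/abs/path/to/view", {"PATH": "/usr/bin"})
print(new_env["PATH"])
print(new_env["CPATH"])
```

Only one directory per variable is added, regardless of how many packages the environment contains.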

@scheibelp scheibelp added the WIP label Dec 5, 2018
@healther
Contributor

healther commented Dec 5, 2018

I really want this (as I want to switch my private python stack to being spack managed), but my use case requires that all specs in an environment be concretized together, otherwise I'll end up with multiple versions of python, which will be incompatible with each other.

I still didn't get @citibeth's argument for having multiple versions of a library in an environment, so I'll just assume that it is a valid use case, nevertheless there should be the option to say something like spack env concretize --unique, which would essentially be the same as creating a superpackage with all dependencies

@tgamblin
Member

tgamblin commented Dec 8, 2018

@healther: I'm in favor of eventually making views concretize all together. The only reason as far as I'm concerned we don't do it right now is that the concretizer doesn't support it well.

@healther
Contributor

@tgamblin Do you have a timeframe in mind when the new concretiser will be available? From a conceptual point, isn't it "just" to treat an environment as an ad-hoc package? Or is it more complicated than that?

@tgamblin
Member

@healther: I'd rather have a reasonable guarantee that we won't run into conflicts, as well as good error messages, before moving to something like that. At the moment we could do what you suggest but I do not think we'd be able to tell users something intelligent about their choices.

@healther
Contributor

@tgamblin Yeah fair enough. It was more a "did I miss something fundamentally" than a "this is the correct thing" :)

Do you have an idea on the time frame of the new concretizer?

@citibeth
Member

citibeth commented Dec 10, 2018 via email

@healther
Contributor

OK, here's a really simple argument. You need to run two applications as part of your environment. The two applications require different versions or options on some underlying library. If/when that is the case, it is not (currently) possible to concretize the two together.

Sure, but you end up in a, at least potentially, broken environment, because there is no way to define the precedence of the library versions. I agree that there are cases that spack is not able to handle (yet), but that doesn't make your environment any more defined than doing spack load -r app1 app2 would.

I hear that you really want this. But if you want to switch your Python stack to Spack, you can do that today

It really is just a convenience thing for me. I'd like to only use one package manager, whereas right now I have a combination of spack (in the lab), pyenv, pip and brew. I could always do the same that we do in the lab, i.e. define a pseudo-package with all dependencies and a dummy install-method. But then I need to manually create the views, define my modules, and it quickly becomes tedious. So it's really not a blocking issue, but I think spack envs are more powerful than the combination of pyenv and brew that I'm currently using.

In that config, I specify Python 3.5.2. And it works... every single spec

That works, because none of your specs has a python@:2.8 dependency ;) otherwise you would end up with both and without a warning. My problem stemmed from py-jupyter-notebook's node-js dependency and I was on a branch that doesn't yet rely on the system-python.

As I said, my comments are more a reminder that this is still a wanted feature, but you shouldn't take it as a "this is blocking", in that case I'd be asking what I can do to fix it :)

@tgamblin
Member

@citibeth: I think this boils down to the difference between an environment intended for running code vs. one intended for building code. We're not going to deal with that here, but I have ideas for how we could. I actually think both of these design points are possible, maybe with a little config.

But again -- this is just a first cut. This PR just adds a view, with some precedence rules.

@citibeth
Member

citibeth commented Dec 12, 2018 via email

Member

@becker33 becker33 left a comment

Three bugs in the behavior of this PR

  1. When in an environment, spack uninstall should remove the package from the view.
  2. When in an environment, spack remove foo; spack concretize should remove foo from the view
  3. If foo is installed in Spack outside of an environment, spack add foo; spack concretize should add foo to the view.

All of these are examples of the general principle that the packages in the view should be identical to the packages listed under \d installed package(s) by the output of spack -e ENV find.

After install, uninstall, and concretize commands, we need to check the specs in the environment against the specs in the view, and add/remove from the view until they match.
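
The reconciliation step described here amounts to a set difference in each direction. A minimal sketch, assuming specs can be compared as hashable values (plain strings below) and using a hypothetical helper name `sync_view`:

```python
def sync_view(env_specs, view_specs):
    """Return (to_add, to_remove) so that the view matches the environment.

    Both arguments are iterables of hashable spec identifiers. This helper is
    hypothetical and only illustrates the add/remove-until-match idea.
    """
    env_specs, view_specs = set(env_specs), set(view_specs)
    to_add = env_specs - view_specs      # installed in the env, missing from the view
    to_remove = view_specs - env_specs   # still in the view, no longer in the env
    return to_add, to_remove


to_add, to_remove = sync_view({"python", "zlib"}, {"zlib", "foo"})
print(sorted(to_add), sorted(to_remove))  # ['python'] ['foo']
```

Running the same check after every install, uninstall, and concretize covers all three bugs listed above with a single code path.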

Also, we may want to consider not adding deps to the view. I think the view should only contain the root specs. If you care about a non-root spec, make it a root. I both think that's the behavior we want, and that obviates the need for some of the ordering logic you go through here.

@citibeth
Member

[Sorry I didn't realize this was a PR thread, not an email thread] Thoughts on the PR itself...

#by default this will maintain a view in the directory Spack maintains for the env

Not sure what I think of this. Spack generally provides two ways to enable a bunch of packages: (a) load modules, and (b) set up a view. There are pros and cons to these approaches, which I will try to list below. At this point in time, I don't think that Spack should prefer one over the other.

Pros/Cons:

  • Views increase the number of inodes required to build your environment. How much? It would be nice to know... but in any case, they might not be preferable in inode-constrained HPC environments.

  • Loading 100 different modules to load an environment can be a little slow on HPC systems with shared filesystems. This can/will be fixed by creating a single module that loads an entire environment in one shot. I'm not sure if it will "work great", or if things will still be slow if there are 100 directories in a $PATH environment variable. (My guess is it will work fine).

  • Loading modules involves making things work by setting environment variables. In the case of Python, that's like using LD_LIBRARY_PATH instead of RPATH, which we know sucks. Views offer the possibility to assemble Python (and R, etc) packages in a way that doesn't require env vars to be set properly. I don't know if this feature has been built yet; if not, it should be.

  • Some systems are boneheaded and want to see all their dependencies in a single folder. More commonly, many programs STILL assume that netcdf and netcdf-fortran share a directory. Views are more convenient than modules as a way to deal with these systems.

  • Some people don't use modules, or don't want to learn how to use them, or don't want to install modules on their system.

The manifest.yaml file now looks like:

What does view: true mean here? Does it mean you're adding views for Python? Or turning on view for the environment overall? (I think the latter probably). In that case, I think this functionality is in the wrong place. I believe an environment should be a set of packages to be used together. A view is one way that one might use an environment. It's easy to imagine an environment that one person wants to use as a view, and someone else wants to use with modules.

Therefore, the view is not an intrinsic part of the environment, and shouldn't be included as such.

I propose adding the following command to manipulate whether an env maintains a view (these commands aren't yet available):

Since views and module scripts are semantically equivalent, they need to work the same in Spack. Syntax for how they are accessed / created has historically been different because the features were developed separately by separate people, and nobody unified them. With the advent of environments, that needs to change.

But it does bring up some tricky issues... because although they should work the same from the user's point of view, they aren't generated the same underneath. Views are large and are created bit by bit, package by package; whereas module scripts are generated all-at-once based on a list of packages. Therefore, incremental changes are easier for views whereas globally regenerating is easier for module scripts. We should keep this in mind and implement things thoughtfully, but not burden the user with the difference.

Currently, the spack env module load command generates a module load script. (Ideally in the future it will simply generate a module, which is the same thing but faster). The workflow is... you build the environment, then you generate the module script as the last thing. That is clunky and error-prone. It would be nice if we had a command that turns on modules with an environment; and then keeps the module script sync'd up with the environment as the environment changes, just like this PR does for views.

The other option would be to implement views as modules are currently done: you build your environment, then you run a spack env view create kind of command, which generates a whole view in one shot. I don't think this is as good an option.

In any case, views and module scripts need to work the same. PRs should help them converge, not diverge. Therefore, this PR will probably need to pay some attention to module scripts as well as views.

spack env view --enable /abs/path/to/anywhere

The spack env module load command creates a load script inside Spack's environments/ directory; it gives you no choice of where to create that file. If you want it somewhere else, you can copy or (more likely) symlink to it. But in the interests of keeping views and env modules working alike, I think we need to add a command that creates an env module load script in an arbitrary location. Because I DO think we need this functionality for views.

QUESTION: Is there any value in generating more than one view from the same environment? I think probably not.

This PR establishes a prioritization for which environment specs are added to views:

The prioritization was already established in the previous environments PR and the module load script. Unless there's a serious problem with what we already have, prioritization needs to be the same when using views as when using modules.

Member

@citibeth citibeth left a comment

  • We need a generic term for "view or module script". Maybe "loader"? I'll go with that for now, but hopefully we'll think of something better.

  • It should be possible to specify, build and use environments without putting details of the preferred loader in the environment specification. (I suppose that if users WANT to put preferred loaders in their env spec, we shouldn't stop them).

  • Develop a syntax for this PR that is agnostic to the kind of loader you intend to use. I would suggest something allowing people to add as many loaders as they like to a single environment instance, something like:

spack env create
spack install X
spack env loader --add view   # Add a view in the default location in the *environments/* directory
spack env loader --add module  # Add a module / module load script in the *environments/* directory
spack env loader --add view /my/path   # Adds a second view in /my/path

Now every time you add/remove from the environment, it will add/remove from two views and one module load script. Sysadmins might appreciate this ability to simultaneously generate views and scripts, based on the preferences of different users.

  • Views and module scripts need to be semantically equivalent. Every feature needs to work the same way for both of them. Unit tests should be developed to ensure this.

  • I believe that spack env activate doesn't just set an idea of a "current" environment; it also loads the environment, at least as it's already been built? Does it do this by loading env vars similar to how it would work with modules? I don't know... but this looks like an area ripe for re-do, if we have more formalized notions of views / module scripts coming up. If you have a view enabled, should it work by setting the env vars to the view? If you have multiple loaders enabled, which one should it use to activate? (Probably the first one added).

  • A view for a particular environment should be well-defined... by what you get if you build the environment, then create the view. In this system, there are many paths by which you could create a view. For example, you could create an environment, add 3 packages, enable the view, remove one of the packages previously added, and then add two more. Both cases should / must end up with exactly the same result in the resulting view, anything else would be a bug. Unit tests need to be added to ensure that a variety of paths result in exactly the same view. If we can't find a way to reliably do that, we should go to the (less convenient) procedure of building the environment, then generating a view from it as a snapshot of that environment at that point in time.
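
The kind of unit test asked for in the last point could look roughly like the sketch below. It assumes a deliberately simplified model in which the view is regenerated purely from the environment's current set of specs, so precedence and file conflicts are ignored; `build_view` and `replay` are hypothetical names, not Spack functions.

```python
def build_view(env_specs):
    """Hypothetical: derive the view purely from the environment's current specs."""
    return frozenset(env_specs)


def replay(operations):
    """Apply ('add' | 'remove', spec) operations, regenerating the view each time."""
    specs, view = set(), frozenset()
    for op, spec in operations:
        if op == "add":
            specs.add(spec)
        else:
            specs.discard(spec)
        view = build_view(specs)
    return view


# Path 1: add three packages, remove one, then add two more.
incremental = replay([("add", "a"), ("add", "b"), ("add", "c"),
                      ("remove", "b"), ("add", "d"), ("add", "e")])
# Path 2: build the final environment directly.
direct = replay([("add", "a"), ("add", "c"), ("add", "d"), ("add", "e")])
assert incremental == direct
```

The assertion holds here only because the view is a pure function of the final spec set; an implementation that updates the view incrementally would need the same test to pass against its real output.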

@scheibelp
Member Author

What does view: true mean here? Does it mean you're adding views for Python? Or turning on view for the environment overall? (I think the latter probably)

view: true means there is a view and it is maintained inside of the environment directory (in this case, the absolute path is not stored in case the environment directory is moved). view: /some/filesystem/path means there is a view and it is maintained at the specified path. view: false means there is no view associated with the environment.
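
The three cases can be summarized as a small lookup. This is a sketch, not the PR's code: `resolve_view_path` is a hypothetical helper, and the subdirectory name `view` used for the `true` case is an assumption, since the name of the directory inside the environment is not stated here.

```python
import os


def resolve_view_path(view_setting, env_dir, default_name="view"):
    """Map the manifest's `view` value to a filesystem path, or None if disabled.

    `default_name` is an assumed directory name used only for this sketch.
    """
    if view_setting is False:
        return None  # view: false -> no view associated with the environment
    if view_setting is True:
        # view: true -> computed from the environment directory at use time,
        # so moving the environment directory moves the view with it.
        return os.path.join(env_dir, default_name)
    if isinstance(view_setting, str):
        return view_setting  # view: /some/filesystem/path
    raise ValueError("view must be true, false, or a path string")


print(resolve_view_path(True, "/path/to/env"))
print(resolve_view_path("/some/filesystem/path", "/path/to/env"))
print(resolve_view_path(False, "/path/to/env"))
```

Not storing an absolute path for `view: true` is what keeps the manifest valid after the environment directory is relocated.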

This PR establishes a prioritization for which environment specs are added to views:

The prioritization was already established in the previous environments PR and the module load script. Unless there's a serious problem with what we already have, prioritization needs to be the same when using views as when using modules.

That seems agreeable and also readily doable.

Develop a syntax for this PR that is agnostic to the kind of loader you intend to use.

In any case, views and module scripts need to work the same. PRs should help them converge, not diverge.

If this PR was merged as-is, it would add functionality for maintaining views in an automated manner. This isn't available yet for modules, so the type of divergence is less problematic (compared for example to your point about precedence). I think offering automatic management of a module file can be handled later.

It should be possible to specify, build and use environments without putting details of the preferred loader in the environment specification.

This could be moved into a separate config file: I do see that how the environment is exposed to one user may not make sense to place in the spack.yaml used to keep track of the environment, since that may contain details that only apply to the user on their system. However, two relevant points:

  • spack.yaml can already contain user-specific config file references
  • Users can specify --without-view when using a .lock or .yaml file to construct the view

A view for a particular environment should be well-defined... by what you get if you build the environment, then create the view. In this system, there are many paths by which you could create a view. For example, you could create an environment, add 3 packages, enable the view, remove one of the packages previously added, and then add two more. Both cases should / must end up with exactly the same result in the resulting view, anything else would be a bug.

I think that one consistency guarantee is that if a user copies an environment and creates a view from it, and then regenerates the view in the original environment from scratch, that they should be consistent; this PR does do that.

In this PR, environment views are automatically updated with spack install, and because of that, view contents can differ (as explained in the PR description) for a given concretization order depending on the commands executed to get there. Unless this PR enforced that spack install in an environment forced the concretization and install of all prior specs added to the environment, these inconsistencies are unavoidable. This PR does not preclude enforcing that later, so I don't think that is essential.

@citibeth
Member

Develop a syntax for this PR that is agnostic to the kind of loader you intend to use.

In any case, views and module scripts need to work the same. PRs should help them converge, not diverge.

If this PR was merged as-is, it would add functionality for maintaining views in an automated manner. This isn't available yet for modules, so the type of divergence is less problematic (compared for example to your point about precedence). I think offering automatic management of a module file can be handled later.

There are two issues here: (1) the proper syntax for these commands, and (2) implementing the functionality behind the syntax.

  • I would like to push to get the syntax right on this PR. The current syntax is not symmetrical between views vs. module scripts, and it needs to be. I'm not suggesting any specific syntax, but asking that someone take another go at it and come up with a suggestion that has this symmetry property.
  • Once we have the syntax right, we can re-do existing functionality into it. If there are things the new syntax allows that are not implemented yet, we can put "not implemented yet" error message stubs in their place.
  • We currently have a way to generate a module script from an environment. Why not just re-run that regeneration procedure every time an environment is changed, can that be put in this PR? Sure it sounds a little inefficient. But it should work and should be pretty easy to implement, and might work "well enough" that we never have to do something more clever. (In any case... the long-range plan is to replace module scripts with a single module for the entire environment. So no need to kill ourselves optimizing generation of module scripts today.)

It should be possible to specify, build and use environments without putting details of the preferred loader in the environment specification.

This could be moved into a separate config file: I do see that how the environment is exposed to one user may not make sense to place in the spack.yaml used to keep track of the environment, since that may contain details that only apply to the user on their system.

I'd be happy with just specifying on the command line whether you want a view or module script, etc. That can be accommodated without removing the functionality from spack.yaml. (But the functionality in spack.yaml will have to change to make it symmetrical for views vs. module scripts).

  • Users can specify --without-view when using a .lock or .yaml file to construct the view

I believe that the default should be no view or module script generation. Some people won't want views, and it seems counter-intuitive to have to say more to get Spack to do less.

I think that one consistency guarantee is that if a user copies an environment and creates a view from it, and then regenerates the view in the original environment from scratch, that they should be consistent; this PR does do that.

I would look at the bug reports already listed, and consider adding unit tests that would have caught those bugs.

In this PR, environment views are automatically updated with spack install, and because of that, view contents can differ (as explained in the PR description) for a given concretization order depending on the commands executed to get there. Unless this PR enforced that spack install in an environment forced the concretization and install of all prior specs added to the environment, these inconsistencies are unavoidable. This PR does not preclude enforcing that later, so I don't think that is essential.

My thoughts on this issue:

  1. Inconsistencies like this are subtle, and will catch unsuspecting users; who will then post bug reports about how the precedence doesn't work the way it is supposed to. I believe, therefore, that consistency needs to be enforced in THIS PR, not some future hypothetical PR.
  2. You have suggested one algorithm to maintain consistency. There are likely other, cheaper, algorithms that would also work. Spack has access to the previous and current states of the environment, and should be able to detect precedence problems --- and then fix them --- without re-doing everything that came before.
  3. That said, I would implement the simpler consistency algorithm in this PR. We might find it's fast enough.

@healther
Contributor

healther commented Dec 13, 2018

Loading 100 different modules to load an environment can be a little slow on HPC systems with shared filesystems. This can/will be fixed by creating a single module that loads an entire environment in one shot. I'm not sure if it will "work great", or if things will still be slow if there are 100 directories in a $PATH environment variable. (My guess is it will work fine).

Our experience tells us that having 100s of directories in $PATH makes things like tab-completion painfully slow (in fact that was the original motivation to implement this functionality).

Spack environments defines a precedence when there are conflicts.

I wasn't aware of that. Though I'd argue, that it is only barely more deterministic than spack load in the sense, that yes, given a close study of the documentation I now may influence the order. But for a naive user, there can still be multiple versions of a package (which can bite you, especially with python packages). Again I'm not arguing, that this is necessarily the only sane use case but it is also a sane use case and should be "detectable"

It is rare that somebody (in HPC land at least) needs to build code but not run anything.

There are use cases, i.e. when you use spack to provision a software environment for other people to work against. We do not necessarily want to force people to develop in our environment.

Spack generally provides two ways to enable a bunch of packages: (a) load modules, and (b) set up a view.

I don't see them as such opposites. Really for us views are a way to reduce the size of the env variables that need to be set! In essence we provide a module for each view, and each view essentially supplements the structure of / and in fact will be overlaid to / in the container provisioning. So views are a performance addition and not really an alternative to modules (leaving aside the annoyance of multiple packages requiring to be installed in a single PREFIX)

Out of curiosity: Do you see an advantage of not using a view? Except inode usage, which could be fixed by using hard links (except for git, because it already uses O(100) hard links in its own prefix...)

Inconsistencies like this are subtle, and will catch unsuspecting users

I agree, but I don't see how this behaviour is compatible with the "copy the definition file and you get the same result" philosophy. If the order of additions and deletions of specs is important, then how do we define the resulting environment consistently?
Independent of that: Yes we should not have two commands with identical results for the user, but different implementations (load all prefixes via a module or load the view-module) behaving differently for the user.

@citibeth
Member

citibeth commented Dec 13, 2018 via email

@healther
Contributor

There might be a difference between shells on this one; which shell are you using?

My personal experience was with bash, but at least zsh is similar. The problem isn't really the shell, but the filesystem: the shell has to go through all PATH directories in order to find all possible completions (or you use caching and run into the next source of trouble). For us a slow NFS system made the problems more pronounced; I'd suspect that my MacBook's SSD would cope better with that

  1. It looks like some kind of warning needs to be made when overrides happen. And the warning needs to differ when an explicit package (one mentioned explicitly in the env.yaml file) vs. implicit package gets overridden.

That's essentially what I'm saying!

  1. It has long been my belief that Spack works best when you ask it to concretize an environment, check over the environment produced, and then make changes to the various configurations / specs until you like what you see. I get the sense that a lot of people don't do this. But it's something I think we should encourage.

Expecting expertise from users is always a tricky thing. In my experience if you have expectations users will find a way to disappoint you :) And that's not a criticism of users. The whole point of using something like spack is not to have to take care of this yourself. Of course in practice there are limits to how user-proof a tool can become. But it is not unreasonable to expect a tool (especially one written in python) to work "as I intended" if it didn't complain.

In this case, the Spack person is creating an environment that will be used by its users to build and run software; so I don't think it's a counter-example to what I said.

It is somewhat. Because it is different software and it may very well be that the environment to build our software stack is vastly different from the environment a user of that software wants to build his/her project in. There is a difference between a development environment for a package (which spack should also absolutely support) and a development environment of a piece of software that depends on said package!
Think about for example boost. I'd like to be able to go to spack and tell it: I want to have an environment in which to develop boost. That means a) I want to have all dependencies installed, b) I want the sources of boost available and c) ideally I'd also want to be able to have spack invoke the build system. But that is fundamentally different from using spack to provide the boost library to users to develop their own software against

@citibeth
Member

citibeth commented Dec 14, 2018 via email

@becker33 becker33 dismissed their stale review December 18, 2018 22:12

I made the changes, but want to reserve the right to request additional changes.

@tgamblin tgamblin self-requested a review December 20, 2018 21:22
@tgamblin
Member

tgamblin commented Dec 20, 2018

Maybe I can answer some questions and add some clarity to this discussion.

On @citibeth's main point that views and modules should be equivalent and that people should be able to choose a loader -- I like the gist of this but the truth is they're not equivalent. Some reasons:

  1. Modules stop scaling if you load too many of them, both because you have to search an awful lot of paths, and because you actually run out of space in the environment. There is a character limit for the size of environments; we've hit this in our job submission system where jobs would fail due to super large environments.
  2. Modules (even Spack ones, though RPATHs help) load things piecemeal and force the consistency issue on the user. Even with systems like Lmod that enforce sensible swap semantics for some dependencies (generally compilers and MPI), Lmod does not enforce consistency for dependencies outside its "hierarchy". The truth is that dependencies are not hierarchical -- they're a DAG, and the hierarchy is a great way to pick a toolchain but it's not going to save you from yourself if you aren't using RPATH.
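
To give a feel for the scaling argument in point 1, the sketch below compares the length of a `PATH` built from one directory per package with a `PATH` that contains a single view directory. All paths are made up for illustration, and the count of 100 packages is borrowed from the discussion above rather than measured.

```python
import os

# Hypothetical install prefixes, one bin/ directory per package.
prefixes = [f"/opt/spack/pkg-{i}/bin" for i in range(100)]

# Loading one module per package appends every prefix to PATH.
path_with_modules = os.pathsep.join(prefixes)

# A view exposes all packages through a single merged directory.
path_with_view = "/opt/spack/env/view/bin"

print(len(path_with_modules), len(path_with_view))
print(len(path_with_modules.split(os.pathsep)), "directories vs. 1")
```

The same growth applies to every variable a module sets, which is how the total environment size can run into a system limit.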

That said, modules are not going away -- they're used all over in HPC and we're planning to keep them -- we have to support them. They are what HPC users are used to, and they are how Spack will continue to expose packages on clusters for the foreseeable future. But, virtual environments (with views) are an interface that can work anywhere, be more consistent, and doesn't have to rely on the module system. They're a chance to do something better. I would like it if eventually you could pass around spack.yaml or spack.lock files and reliably reproduce the same environment in a cross-platform way. Modules don't provide any of that reproducibility.

On specific technical points:

  1. I don't think that modules and views need a common interface, but I do think we should eventually have each environment generate a module that loads a view. I would like that to be a follow-on to this PR as there is a bit of re-engineering in the EnvironmentModifications that needs to be done to make it happen.
  2. I agree that if you run spack install in an environment, then the resulting view, concretized environment (lockfile) and generated module (TBD) should be kept in sync. @scheibelp's taken care to preserve that here. Basically if you install something (or everything) it should be in the view, and if you uninstall something, it should not.
  3. We made view generation default because we are trying to make the simple case simple (i.e. not a lot of options to remember). If you don't want a view in an environment, you can still turn it off. When we add per-environment modules, I would like to generate those by default too. The main use case I see for view-less environments is if you want to deal with something cross-compiled or something that it does not make sense to actually load into your current environment. I see that as the uncommon case rather than the common case, and I think it's ok to expect people who do that to add an option vs. people who don't know anything about cross-compilation, PATH, etc.
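
The sync rule in point 2 can be pictured as a set difference between what is installed and what is currently linked into the view. The sketch below is illustrative only: `plan_view_update` and the package names are hypothetical, and this is not how Spack implements it.

```python
def plan_view_update(installed, linked):
    """Return (to_add, to_remove) so that the view matches the installed specs."""
    installed, linked = set(installed), set(linked)
    to_add = sorted(installed - linked)     # installed but missing from the view
    to_remove = sorted(linked - installed)  # still linked but no longer installed
    return to_add, to_remove

to_add, to_remove = plan_view_update(
    installed={"python", "zlib"},
    linked={"zlib", "openssl"},
)
print("link:", to_add, "unlink:", to_remove)
```

Running the same plan after every install or uninstall is one simple way to keep the view from drifting away from the lockfile.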

I think that covers most of the discussion -- I hope that is ok with people. @becker33 is working on some finishing touches for this PR.

On build vs. run environments (this is future stuff):

  1. I think spack setup is nice and want to make it work, but I also like the idea of being able to build in a view without having to use spack setup. I would like to see more convergence between the Spack build environment and environments themselves. We're not there yet, but you could imagine generating a view that just has the right compiler wrappers in it.
  2. I agree that you don't build things often without wanting to run them, but we do have a lot of users who want to just run things. Generating an environment for building is actually more constraining because the transitive link dependencies need to be synced. Build dependencies can be free and unlinked (via RPATH). Generating an environment for running is easier because you only have to care about things to run.
  3. I'd like to handle both of these gracefully and we're working on it, but at the moment we have to settle for precedence rules.

Finally, note that @becker33 is working on "spack stacks", which will basically be like an environment (in terms of workflow) but aimed at facility deployment. In a stack, you'll be able to have per-environment module trees as well as combinatorial views (see #9679). The idea there is to make deploying an entire stack for a cluster as easy as making a slightly wordier spack.yaml file. Stacks will very much need modules so don't expect support for modules to wane.

@citibeth
Member

On @citibeth's main point that views and modules should be equivalent and that people should be able to choose a loader -- I like the gist of this but the truth is they're not equivalent.

I meant they both do the same thing in the end: take a bunch of things you've built, and load them all up together for use in a shell. My observation was that if two things accomplish the same goal, we should try to have the same UI to access both of them. (That's not to say the two things are the same in every way; if they were, we would only have one of them).

I don't think that modules and views need a common interface

I am frankly mystified why we would want to (or be willing to) maintain two UIs if one could suffice. Even if some features are hard to implement properly for modules vs. views at this time, I think we should plan the UI with the assumption that some day we will implement them all; and in the meantime throw a NotImplementedException or something.

Modules (even Spack ones, though RPATHs help) load things piecemeal and force the consistency issue on the user.

I agree that individual modules are a bad idea and Lmod does not magically fix things. The only way I use modules is I create an environment and then load all the modules that come with that environment. If I need to do something else, I unload everything, change my environment, and load up a new set of modules. I don't load modules piecemeal. And I don't use Lmod: it adds nothing to my use case, but is certainly harder to install, etc.

I believe our goal with the "module" flavor of environments should be to create a script / module / whatever you want to call it that, when you source/load it, sets up your environment variables properly to use the stuff that was built in the environment, without creating symlinks on the disk. And that script should be named after the environment, it should not have hashes in it. Spack should also be able to set environment variables as such without first creating a script to do so. The script is because in many settings, users don't want to learn Spack or type the spack command.

In summary, I see at least four ways that Spack can/should support loading things for use. I list them here from most heavyweight to least heavyweight:

A. Create a view with symlinks, along with a module/script that sets env vars properly to use that view.
B. Generate a module/script that sets env vars properly to use packages in their native installed location.
C. Set env vars properly to use packages in their native installed location, and launch a command (which could be your shell).
D. Set env vars properly to use packages in their native installed location, in the user's current shell.

Every time "module/script" is suggested, Spack could in theory be configurable to generate a module, or a simple bash script. I like bash scripts because they require no extra infrastructure. I like modules because they can be unloaded cleanly. Maybe (a) we can find a way to make bash scripts that can unload themselves cleanly, and (b) bash script generation can be set up as another form of module generation.
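
As a rough illustration of option (C) above, the following Python sketch builds a modified copy of the environment and launches a command with it, leaving the calling shell untouched. The function names and the install prefixes are hypothetical, and only `PATH` is handled; a real loader would also set library and include paths.

```python
import os
import subprocess
import sys

def env_for_prefixes(prefixes, base_env=None):
    """Return a copy of base_env with each prefix's bin/ prepended to PATH."""
    env = dict(os.environ if base_env is None else base_env)
    bin_dirs = [os.path.join(p, "bin") for p in prefixes]
    existing = env.get("PATH", "")
    env["PATH"] = os.pathsep.join(bin_dirs + ([existing] if existing else []))
    return env

def run_in_env(command, prefixes):
    """Run command with the modified environment and return its exit code."""
    return subprocess.run(command, env=env_for_prefixes(prefixes), check=False).returncode

# Hypothetical prefix; the child process sees it on PATH, the parent does not.
code = run_in_env(
    [sys.executable, "-c", "import os; print(os.environ['PATH'])"],
    ["/opt/example/zlib"],
)
print("exit code:", code)
```

Nothing is written to disk here, which is what makes this the lightweight end of the list; the trade-off is that whatever launches the command has to be present at load time.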

I would like to see more convergence between the Spack build environment and environments themselves.

I think that's a great idea.

** But remember that Spack build environments are different from Spack Environments because Spack creates a different build environment for each package. If you're building a DAG with 30 packages in it, then Spack will, over the course of building that DAG, generate 30 build environments. It is rare that users want to explicitly use that environment, and it makes sense for Spack to generate these build environments in as lightweight a manner as possible; generating 30 sets of symlink trees would make no sense. Therefore, Spack uses option (C) above to generate them. In some cases, however, users want to re-trace the steps of Spack, using the build environment provided by Spack. This option should be explicitly available --- and Spack should be able to bring forth a build environment using any of the 4 modes described above.

If we had this capability, then my guess is Spack Setup would not be needed. Remember: Spack Setup creates a script that sets up a build environment and then runs CMake. This could be replaced, in a less CMake-specific manner, by option (C) above. I'm convincing myself that this is the direction Spack Setup should move in.

Without Spack Setup, Spack builds every package in a DAG. With Spack Setup, it either builds or generates a setup script for each package in a DAG. Even if we re-do Spack Setup using the ideas above, syntax is still needed in the env.yaml files to tell Spack which packages in which DAGs of the environment should be marked setup.

There is a known (to me) problem here. Suppose I'm building Spack Environment E, in which A and B are marked as setup and B->A. When E gets built, A and B will not get built; instead, setup scripts for them are generated. Spack module generation (and presumably any other way of Spack generating a set of env vars) decides what env vars to set by looking at the directories left behind by the installed package. At the time E is installed, A and B won't yet be installed, and the modules (or whatever) generated for them will be missing important paths (i.e. they don't work). This could also be a problem for the setup script generated for B, since A has not yet been installed. I don't know why I haven't run into that issue. Once you've built and installed A and B (manually), THEN Spack is able to generate correct modules (env var settings) for them.
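
The failure mode described above can be reproduced with a small Python sketch. It assumes, as a simplification, that the generator only looks for a few well-known subdirectories; `env_vars_from_prefix` and the `SUBDIR_TO_VAR` mapping are hypothetical and not Spack's actual logic.

```python
import os
import tempfile

# Hypothetical mapping from a subdirectory to the variable it would extend.
SUBDIR_TO_VAR = {"bin": "PATH", "lib": "LD_LIBRARY_PATH", "include": "CPATH"}

def env_vars_from_prefix(prefix):
    """Return {variable: path} for each known subdirectory that exists in prefix."""
    found = {}
    for subdir, var in SUBDIR_TO_VAR.items():
        path = os.path.join(prefix, subdir)
        if os.path.isdir(path):
            found[var] = path
    return found

with tempfile.TemporaryDirectory() as prefix:
    before = env_vars_from_prefix(prefix)     # generated before the package is installed
    os.makedirs(os.path.join(prefix, "bin"))  # simulate the later manual install
    after = env_vars_from_prefix(prefix)      # regenerated once files exist

print("before install:", before)
print("after install: ", after)
```

The first result is empty because the prefix has no content yet, which is why a module written at that point is missing paths until it is regenerated.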

I've been hacking my way through this problem so far by regenerating all modules (across all of Spack) once I've built A and B. We will need a better way. This is going to be a problem we have to face for options (A) and (B) above --- basically any environment-loader generation that involves writing stuff to disk ahead of time and then using the environment without Spack.

** Another important difference between Spack Environments and Spack build environments is, Spack Environments can contain more than one DAG. Precedence / conflict resolution rules come into play that we don't have to worry about with a single build environment.

but you could imagine generating a view that just has the right compiler wrappers in it.

I really like the idea of a way to load Spack Environments (by whatever option above), and loading the compiler wrappers with them. Right now, spack setup does not give you the compiler wrappers, which don't work outside of Spack. So it's not the same as running the regular spack install. Does this mean that making compiler wrappers useful outside of Spack needs to be a sub-task here?

Modules stop scaling... because you actually run out of space in the environment. There is a character limit for the size of environments; we've hit this in our job submission system where jobs would fail due to super large environments.

Wouldn't this make Spack itself stop working, since Spack sets environment variables when constructing build environments? Have you encountered this problem inside Spack itself? If not, why not? How would we work around it when Spack needs to build stuff?

But, virtual environments (with views) are... a chance to do something better.

I hope I've convinced you that symlink trees are not a uniformly better way to load a Spack Environment, as compared with setting env vars. If they were, then we would be re-engineering Spack to create symlink trees for every build environment it needs.

I would like it if eventually you could pass around spack.yaml or spack.lock files and reliably reproduce the same environment in a cross-platform way. Modules don't provide any of that reproducibility.

I don't see how this is affected by how you load your Spack Environment, or how modules prohibit that. Is this because modules involve hashes, which vary between platforms? Remember I'm suggesting we generate a module/script for the environment as a whole, which eliminates hashes. I think all 4 ways of loading an environment could be made to be reliably cross-platform in this case.

I do think we should eventually have each environment generate a module that loads a view.

Yes that should be part of generating a view.

When we add per-environment modules, I would like to generate those by default too.

Currently, individual-package modules get generated, and Spack knows how to generate a module load script. I know that's not exactly the same as per-environment modules. But it's pretty close. And I think we should use it as a stand-in for per-environment modules for now, and then switch to real per-environment modules later.

The main use case I see for view-less environments is...

...that Spack uses them already internally.

Other reasons include:

  1. Your cluster has an I-node limit and you don't want to potentially double the number of I-nodes required to build your Spack Environment. (For this reason alone, I will avoid using symlink trees on our cluster).

  2. You have not hit an environment character limit on your system, and your bash runs fast enough searching through all those paths on your system, and you just don't want to have to maintain heavyweight state with your environment.

if you want to deal with something cross-compiled or something that it does not make sense to actually load into your current environment.

Can you elaborate? I don't understand. I think there are plenty of non-esoteric use cases already.

I think spack setup is nice... but you could imagine generating a view that just has the right compiler wrappers in it.

Remember that Spack Setup, as it exists today, does not provide the right compiler wrappers. It's still useful in spite of this obvious flaw, which I'd love to see fixed.

Stacks will very much need modules so don't expect support for modules to wane.

Will stacks require per-package modules, or per-environment modules? I'd love to see if there's a realistic way we can get rid of support for per-package modules. (That would probably involve a way to generate lightweight Spack Environments on the fly).

@becker33 becker33 closed this Jan 2, 2019
@becker33 becker33 reopened this Jan 2, 2019
@becker33
Member

becker33 commented Jan 3, 2019

@tgamblin @scheibelp This is passing tests now.

@scheibelp
Member Author

@tgamblin all comments are now addressed. For those where there was some back-and-forth conversation I left them open (but I consider everything resolved except #10017 (comment), which we agreed to handle later).

@alalazo
Member

alalazo commented Apr 10, 2019

As an aside on this #10017 (comment) I just added #11158 which should fit well for that purpose.

@tgamblin tgamblin dismissed citibeth’s stale review April 10, 2019 22:51

I think we've addressed most of the points in the discussion.

@tgamblin tgamblin merged commit ea1de6b into spack:develop Apr 10, 2019
becker33 added a commit that referenced this pull request Apr 16, 2019
greenc-FNAL added a commit to greenc-FNAL/spack that referenced this pull request Apr 18, 2019
becker33 added a commit that referenced this pull request Apr 18, 2019
becker33 pushed a commit that referenced this pull request Apr 22, 2019
Environments are now, by default, created with views.  When activated, if an environment includes a view, this view will be added to `PATH`, `CPATH`, and other shell variables to expose the Spack environment in the user's shell.

Example:

```
spack env create e1 #by default this will maintain a view in the directory Spack maintains for the env
spack env create e1 --with-view=/abs/path/to/anywhere
spack env create e1 --without-view
```

The `spack.yaml` manifest file now looks like this:

```
spack:
  specs:
  - python
  view: true #or false, or a string
```
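
For readers who want to see the three accepted `view` values side by side, here is a small Python sketch of how such a value could be interpreted. The function `resolve_view_path` and the default subdirectory name `view` are assumptions made for illustration; they are not taken from Spack's code.

```python
import os

def resolve_view_path(view, env_dir):
    """Map the manifest's `view` value to a directory, or None if the view is disabled."""
    if view is True:
        # Assumed default: a directory inside the one Spack maintains for the env.
        return os.path.join(env_dir, "view")
    if view is False or view is None:
        return None
    if isinstance(view, str):
        return view  # an explicit path chosen by the user
    raise TypeError(f"unsupported view value: {view!r}")

for value in (True, False, "/abs/path/to/anywhere"):
    print(repr(value), "->", resolve_view_path(value, "/path/to/envs/e1"))
```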

These commands can be used to control the view configuration for the active environment, without hand-editing the `spack.yaml` file:

```
spack env view enable
spack env view enable /abs/path/to/anywhere
spack env view disable
```

Views are automatically updated when specs are installed to an environment. A view only maintains one copy of any package. An environment may refer to a package multiple times, in particular if it appears as a dependency. This PR establishes a prioritization for which environment specs are added to views: a spec has higher priority if it was concretized first. This does not necessarily exactly match the order in which specs were added, for example, given `X->Z` and `Y->Z'`:

```
spack env activate e1
spack add X
spack install Y # immediately concretizes and installs Y and Z'
spack install # concretizes X and Z
```

In this case `Z'` will be favored over `Z`.
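
The prioritization rule can be sketched in a few lines of Python: walk the specs in the order they were concretized and keep the first one seen for each package name. `select_for_view` is a hypothetical helper, and the strings stand in for real concrete specs.

```python
def select_for_view(concretized_in_order):
    """Keep the first-concretized spec for each package name."""
    chosen = {}
    for name, spec in concretized_in_order:
        chosen.setdefault(name, spec)  # later specs for the same package are ignored
    return chosen

# `spack install Y` concretizes Y and Z' first; the later `spack install`
# concretizes X and Z.
order = [("Y", "Y"), ("Z", "Z'"), ("X", "X"), ("Z", "Z")]
print(select_for_view(order))
```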

Specs in the environment must be concrete and installed to be added to the view, so there is another minor ordering effect: by default the view maintained for the environment ignores file conflicts between packages. If packages are not installed in order, and there are file conflicts, then the version chosen depends on the order.

Both ordering issues are avoided if `spack install`/`spack add` and `spack install <spec>` are not mixed.
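
To show why ignoring file conflicts makes the result order-dependent, here is a minimal Python sketch. It assumes that the file linked first is the one that is kept; the text above only says the outcome depends on order, so that choice, the function `merge_into_view`, and the package names are all illustrative.

```python
def merge_into_view(packages, ignore_conflicts=True):
    """Map each relative file path to the package that provides it.

    `packages` is an ordered list of (name, files). With ignore_conflicts=True
    the first package to provide a path wins (an assumption of this sketch);
    otherwise a clash raises ValueError.
    """
    view = {}
    for name, files in packages:
        for path in files:
            if path in view and view[path] != name:
                if ignore_conflicts:
                    continue  # keep whichever file was linked first
                raise ValueError(f"{path} is provided by both {view[path]} and {name}")
            view[path] = name
    return view

a = ("pkg-a", ["bin/tool"])
b = ("pkg-b", ["bin/tool", "bin/other"])
print(merge_into_view([a, b]))  # bin/tool comes from pkg-a
print(merge_into_view([b, a]))  # bin/tool comes from pkg-b
```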
tgamblin pushed a commit that referenced this pull request Apr 22, 2019
tgamblin pushed a commit that referenced this pull request Apr 22, 2019
greenc-FNAL added a commit to greenc-FNAL/spack that referenced this pull request Apr 22, 2019
greenc-FNAL added a commit to greenc-FNAL/spack that referenced this pull request May 6, 2019
alalazo pushed a commit to alalazo/spack that referenced this pull request May 9, 2019
becker33 added a commit that referenced this pull request Jul 2, 2019
becker33 added a commit that referenced this pull request Jul 2, 2019
tgamblin pushed a commit that referenced this pull request Jul 16, 2019
becker33 added a commit that referenced this pull request Jul 16, 2019
@haampie
Copy link
Copy Markdown
Member

haampie commented Mar 15, 2022

@scheibelp any reason why the default is ignore_conflicts=True but not in spack view and other places where YamlFilesystemView is used?

I always believed the default was not to ignore conflicts.

Also I do remember conflict exceptions when creating environment views, so pretty sure it's rather inconsistent then.

7 participants