
Spack Environments (part 4): command-line, spack.yaml, and spack.lock#9612

Merged
tgamblin merged 47 commits into develop from features/environments-4
Nov 9, 2018

Conversation

@tgamblin
Member

@tgamblin tgamblin commented Oct 23, 2018

This supersedes #8231, reworks the API, and adds a lot of features.

Spack environments fill a number of needs:

  1. Allow users to easily work with distinct sets of packages and configurations, e.g., for different projects or different types of deployments;
  2. Allow users to reproduce a set of packages someone else built, either:
    a. Functionally, from the abstract specs of the prior installation (i.e., with a spack.yaml file)
    b. Exactly, from the concrete specs that were previously installed (i.e., with what bundler, npm, cargo, and pipenv call a "lockfile")
  3. Allow users to interact with sets of specs easily on the command line;
  4. Allow users to easily version dependency and configuration information, e.g., in a git repository; and
  5. Provide an easier alternative to individually loading modules for packages with "virtual environments".

I've attempted to refactor the code so that one basic Environment concept can provide all of these. This builds on the prior work by @scheibelp and @citibeth, and it attempts to integrate some of what is described at #7944.

I think it is easiest to describe what's implemented here in terms of two workflows:

Command-line environment usage

You can use environments to work with a subset of packages, like you'd normally work with Spack:

$ spack env create myenv         # create an environment
$ spack env list                 # what (named) environments are available?
==> 3 environments
    foo  my-env  my-project
$ spack env activate -p myenv    # "activate" it (-p optionally sets the prompt)
[myenv] $ spack find             # show packages that are in this environment
==> 0 installed packages.
[myenv] $ spack install libjpeg
[myenv] $ spack find             # now one's installed
==> 1 installed package.
-- darwin-highsierra-x86_64 / [email protected] -----------------
libjpeg@9c
[myenv] $ despacktivate          # deactivate (spack env deactivate also works :)
$

You can incrementally add things to environments:

$ spack env create myenv
$ spack env activate -p myenv
[myenv] $ spack env add openmpi  # add some packages
[myenv] $ spack env add hdf5
[myenv] $ spack env add libelf
[myenv] $ spack env status       # what've I done so far?
==> In environment myenv
added:
---- openmpi
---- hdf5
---- libelf
[myenv] $ spack env install      # install the whole thing
...
[myenv] $ spack env status
==> In environment myenv
concrete:
---- libelf
[+]  [email protected]%[email protected] arch=darwin-highsierra-x86_64
---- openmpi
[+]  [email protected]%[email protected]~cuda+cxx_exceptions fabrics= ~java~legacylaunchers~memchecker~pmi schedulers= ~sqlite3~thread_multiple+vt arch=darwin-highsierra-x86_64
---- hdf5
[+]  [email protected]%[email protected]~cxx~debug~fortran~hl+mpi+pic+shared~szip~threadsafe arch=darwin-highsierra-x86_64
[myenv] $ spack find
-- darwin-highsierra-x86_64 / [email protected] -----------------
[email protected]  [email protected]  [email protected]
[myenv] $ despacktivate          # deactivate (spack env deactivate also works :)
$

Environments also let you concretize groups of specs at the same time:

$ spack env create myenv
$ spack env activate -p myenv
[myenv] $ spack env add openmpi  # add some packages
[myenv] $ spack env add hdf5

[myenv] $ spack env concretize   # concretize everything that's added
[myenv] $ spack env status
==> In environment myenv
concrete:
---- openmpi
 -   [email protected]%[email protected]~cuda+cxx_exceptions fabrics= ~java~legacylaunchers~memchecker~pmi schedulers= ~sqlite3~thread_multiple+vt arch=darwin-highsierra-x86_64
---- hdf5
 -   [email protected]%[email protected]~cxx~debug~fortran~hl+mpi+pic+shared~szip~threadsafe arch=darwin-highsierra-x86_64

[myenv] $ spack env install      # install it all
...
[myenv] $ spack env status
==> In environment myenv
concrete:
---- openmpi
[+]  [email protected]%[email protected]~cuda+cxx_exceptions fabrics= ~java~legacylaunchers~memchecker~pmi schedulers= ~sqlite3~thread_multiple+vt arch=darwin-highsierra-x86_64
---- hdf5
[+]  [email protected]%[email protected]~cxx~debug~fortran~hl+mpi+pic+shared~szip~threadsafe arch=darwin-highsierra-x86_64

[myenv] $ despacktivate
$

Environments are created, by default, in $spack/var/spack/environments, and you refer to them by name. You can also create environments external to Spack in directories:

spack env create -d directory-environment
spack env activate -d ./directory-environment   # -d isn't necessary unless ambiguous

Also, you don't have to activate them to use them. Spack has a new
-e / --env option you can use to execute any Spack command in a
specific environment:

$ spack -e myenv find
-- darwin-highsierra-x86_64 / [email protected] -----------------
[email protected]  [email protected]  [email protected]

spack.yaml / spack.lock: environments in the filesystem

"Named" environments in var/spack/environments and directory environments are both just directories with two special files: spack.yaml and spack.lock.

spack.yaml

spack.yaml describes the specs you want in your environment. It's created when you do spack env create <name>, and it describes the specs you've added so far:

# This is a Spack Environment file.
#
# It describes a set of packages to be installed, along with
# configuration settings.
spack:
  # add package specs to the `specs` list
  specs:
  - hdf5
  - libelf
  - openmpi

This is what other project dependency managers call a manifest file --
a list of the things you want to install with a project. You can
maintain it by hand, or let commands like spack env create, spack env add,
spack env remove, and spack install update the spack.yaml file for you.
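To make the comment-preserving updates concrete, here is a minimal sketch of the idea, assuming a plain text-level edit of the manifest (Spack's real implementation is more sophisticated; the function name and YAML layout here are hypothetical):

```python
# Illustrative sketch (not Spack's actual code): append a spec to the
# `specs:` list of a spack.yaml manifest with a text-level edit, so
# existing comments survive untouched.

def add_spec(manifest: str, spec: str) -> str:
    out, inserted = [], False
    for line in manifest.splitlines():
        out.append(line)
        if not inserted and line.strip() == "specs:":
            out.append(f"  - {spec}")   # insert right after the specs: key
            inserted = True
    return "\n".join(out) + "\n"

manifest = """\
# This is a Spack Environment file.
spack:
  # add package specs to the `specs` list
  specs:
  - hdf5
"""
print(add_spec(manifest, "libelf"))
```

Because the edit never re-serializes the YAML, the comments in the file come through verbatim.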

In Spack, the spack.yaml file can also contain configuration:

# This is a Spack Environment file.
#
# It describes a set of packages to be installed, along with
# configuration settings.
spack:
  repos:
  - file:///usr/local/spack-repo

  # compiler locations for just this environment
  compilers:
    # ...

  # set custom stage areas for the environment
  config:
    template_dirs:
    - /path/to/my/local/module/templates
    build_stage:
    - /usr/workspace/myusername

  # package settings
  packages:
    all:
      compiler: [clang]
      providers:
        mpi: [openmpi]

  # add package specs to the `specs` list
  specs:
  - hdf5
  - libelf
  - openmpi

The sections in the file can be anything from the regular Spack
configuration files,
so you can do some sophisticated things here if you're creative. You can
also include configs from elsewhere:

spack:
  # include external configuration
  include:
  - ../special-config-directory/
  - ./config-file.yaml

  # add package specs to the `specs` list
  specs:
  - hdf5
  - libelf
  - openmpi

The included items can either be Spack configuration scopes (directories
with packages.yaml, config.yaml, compilers.yaml, etc.) or files
with all of those config sections merged into a single file (like in the
spack.yaml file above).
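As a rough model of how layered scopes combine, here is a minimal sketch assuming a later-scope-wins, key-by-key merge (Spack's actual merge logic lives in its config code and is more involved; this is illustrative only):

```python
# Hypothetical sketch of config-scope merging: scopes are merged
# key-by-key, with later scopes overriding earlier ones on conflicts.

def merge_scopes(*scopes: dict) -> dict:
    merged: dict = {}
    for scope in scopes:                      # later scopes win
        for key, value in scope.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_scopes(merged[key], value)  # recurse
            else:
                merged[key] = value           # override or set
    return merged

site = {"packages": {"all": {"compiler": ["gcc"]}}}
env = {"packages": {"all": {"compiler": ["clang"],
                            "providers": {"mpi": ["openmpi"]}}}}
print(merge_scopes(site, env))
```

Under this rule the environment's `compiler: [clang]` overrides the site's `[gcc]`, while keys set in only one scope pass through unchanged.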

Notice that you can have relative paths. Those are relative to the
spack.yaml file, so you can put it and its associated configuration in
a git repository if you want to.

spack install, spack spec, and other commands will use the
configuration from your environment if it is active. So you can easily
maintain multiple sets of configurations in environments, then switch
quickly between them by activating the one you want to use.

spack.yaml in code repositories

As is common elsewhere in the dependency management world (Pipenv, Cargo,
Bundler, etc.), you can put a spack.yaml file in your project's
repository and use it to make bootstrapping dependencies easier for your
users:

$ git clone https://github.com/myproj/myproj.git
$ cd myproj
$ ls
CONTRIBUTING.md  LICENSE  README.md  configure*  m4/  spack.yaml  src/
$ spack install
# ... all dependencies from spack.yaml are installed ...

spack install, when called without arguments in a repo that has a
spack.yaml, will concretize and install all the specs in the
spack.yaml file.

We chose the name spack.yaml to make it clear that this is a file that
Spack understands, and so that it would be distinct from other files at
the top level of a repo.

spack.lock and reproducing environments

The spack.lock "lockfile" is created whenever you install or concretize
an environment. If you run spack install as suggested in the previous
section, it will produce a file called spack.lock alongside
spack.yaml. This contains both the abstract specs that were used to
install the packages, and the full, concretized specs of these packages
and their dependencies. It's intended to allow you to reproduce an
environment exactly as it was built by someone else.
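As a rough illustration of what such a lockfile pairs together, here is a sketch of reading one (the field names and layout below are hypothetical, not Spack's exact schema):

```python
import json

# Hypothetical lockfile: abstract root specs the user asked for, keyed
# to the fully concretized specs that resulted from concretization.
lockfile = json.loads("""
{
  "roots": [
    {"hash": "abc123", "spec": "hdf5"},
    {"hash": "def456", "spec": "openmpi"}
  ],
  "concrete_specs": {
    "abc123": {"name": "hdf5", "version": "1.10.4"},
    "def456": {"name": "openmpi", "version": "3.1.3"}
  }
}
""")

# Map each abstract root to the exact version that was installed.
for root in lockfile["roots"]:
    concrete = lockfile["concrete_specs"][root["hash"]]
    print(f"{root['spec']} -> {concrete['name']}@{concrete['version']}")
```

Keeping both halves is what makes exact reproduction possible: the roots say what was asked for, and the concrete specs say precisely what was built.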

If you send someone a spack.yaml or spack.lock file, they can create
a new environment from these files with commands like these:

spack env create myenv spack.yaml

or:

spack env create myenv spack.lock

If you create an environment from spack.yaml, you'll get a new
environment with the same root specs (like a pip requirements file), and
it will be re-concretized when you install it on a new machine. If you
create it from spack.lock, you'll get that and the concrete specs to
reproduce things exactly. Currently that has to be completely exact,
but in the future we'll support generating something "as close as
possible" to the original environment on a new host.

Using environments

You can currently use environments by running spack env loads and
sourcing the resulting file:

$ spack env loads myenv
To load this environment, type:
   source /Users/spackuser/src/spack/var/spack/environments/myenv/loads

spack env loads generates a single file with module load calls for
all packages in the environment. You need a module system in your
environment to use it.

TODO (views): We will also be adding a view to each environment. A
view is a single prefix with all packages symlinked into it, like a
Python virtual environment. Activating an environment will add this
directory's paths to your environment. @scheibelp will be adding that
after he reviews this PR.

Some more technical details

On a technical level, this differs from the prior implementation in #8231 in a few important ways:

  1. There's only one manifest file format -- spack.yaml, and you can use it to initialize a new environment or to control an existing one.
  2. spack.yaml is updated with new abstract specs when you use spack env add, spack env remove, etc., and comments are preserved through these updates.
  3. spack.lock always contains the results of the last time the environment was concretized (both inputs and outputs). spack.yaml is the human-editable file with inputs, and spack.lock is machine-readable and exact. They're kept in sync.
  4. The command line is a bit different, but I hope that the activation concept along with embedding environment commands in regular Spack commands (spack find, spack spec, etc.) solves most of the issues.

Summary:

  • Move the old spack env command to spack build-env

  • Add a new spack env command:

    • creating and querying environments:
      • spack env create ENV: create a new environment
      • spack env destroy ENV: destroy an environment
      • spack env list: list available environments
      • spack env status [ENV]: get a list of what's been added/installed to this environment
    • activating and deactivating environments:
      • spack env activate ENV: activate the named environment (makes ENV args implicit)
      • spack env deactivate OR despacktivate: deactivate the currently activated environment
    • installing, adding/removing specs, etc:
      • spack env add SPEC: add a spec to the current environment
      • spack env remove SPEC: remove a spec from the current environment
      • spack env install: concretize (see below) and install all specs in an environment (you can optionally just install already concretized specs)
      • spack env concretize [ENV]: concretize all specs in the environment and write a spack.lock file
    • environment functionality embedded in regular Spack commands:
      • spack install SPEC: if an environment is activated, this now installs into the active environment
      • spack install: if a spack.yaml file is found in the current directory, this concretizes and installs all specs in that yaml file -- so you can keep an environment in a git repo outside Spack.
      • spack find: if an environment is active, spack find shows only specs installed in the current environment
      • spack spec: if an environment is active, spack spec and other commands concretize using configuration from the active environment
    • other, miscellaneous commands:
      • spack env loads [ENV]: generate a script that loads all modules for an environment
      • spack env stage [ENV]: stage all specs in an environment
      • spack location -e ENV: get the location of an environment
      • spack cd -e ENV: cd to an environment's directory
      • spack env uninstall: uninstall all specs from an environment
  • The spack command itself now has a -e option that you can use to specify an environment on the command line. This takes precedence over the current environment from spack env activate ENV

TODO:

  • add view to environment, add view directories to PATH, PYTHONPATH, etc. on activate (@scheibelp)
  • update documentation (docs still deal with old Spack environments)

Member

@citibeth citibeth left a comment


This is looking good! I've left a lot of comments, etc. The summary and task list here is a succinct and specific version of what I have to say. I would suggest focusing mainly on this, and skimming over the other text for additional context, details, etc.

  • Address requests for expanded explanations in documentation.
  • If two included config files set the same key or specify the same package in the packages.yaml section, establish a precedence between them. Don't just quit with an error (as it has worked in the past), which prevented any kind of overriding of package versions between configs.
  • Make config and spec precedence work in the same order as listed in the spack.yaml file (i.e., earlier vs. later has higher precedence).
  • Update precedence rules for packages loading in an environment so that explicitly named packages always take precedence over implicit packages.
  • Generate a single module for an environment. Remove spack env loads, which always felt like a hack.
  • If spack env loads is not removed, make sure that Spack module generation works within an activated environment. It should re-generate modules for just that environment.
  • Integrate Spack Environment garbage collection: scheibelp#1
  • Change YAML grammar to accommodate future inclusion of Spack Setup. See for example: https://github.com/citibeth/spack/blob/efischer/giss/var/spack/environments/twoway-dev-gibbs/env.yaml
  • Move description of spack.yaml inside a repo into a separate section of the manual. I'm not sure it's really Spack Environments, even if it relies on some of the same infrastructure to implement.
  • Make UI consistent. See explanation and additional tasks below.

UI Consistency

The current UI is inconsistent in a few ways:

  1. Some operations on a single environment exist as spack env sub-commands (eg: spack env add), whereas others exist as top-level commands (eg: spack install). Still others are top-level commands that take the environment name as an ad-hoc -e flag (eg: spack cd).
  2. Some spack env sub-commands duplicate functionality of a top-level command, but with a different name. eg: spack -e <ENV> find is essentially the same as spack env status <ENV>.
  3. Once a user has run spack env activate <ENV>, they are still required to (unnecessarily) give their env name again when they wish to modify it (eg: spack env add). Moreover, this requirement to re-state the env name exists for some things but not others.

The UI should be updated to conform to the following principles:

  1. Operations that deal with multiple environments, or managing or choosing environments (and only such operations), should be sub-commands of spack env.

  2. Operations that read/write a single environment should exist as a top-level command. They can be used in one of two ways, in a consistent manner: (a) spack -e [ENV] <command> or (b) spack env activate [ENV]; ...; spack <command>. Corollaries:

    1. All spack env sub-commands should take an environment: eg spack env <sub-command> [ENV]. Exceptions are:
      1. spack env deactivate, which complements spack env activate. There's no need for this to be a spack env sub-command, but it seems to make sense.
      2. spack env list, which lists Spack Environments.
    2. spack -e [ENV] env <sub-command> [ENV]... should always be an error, because there's no point in specifying the env name twice.
    3. If spack env <sub-command> [ENV] is run within a spack env activate context, then the activated env should be ignored.
  3. If a top-level command makes sense either with or without an environment (eg: spack install), then the env and non-env version should be merged together in one command.

  4. It's OK if a top-level command ONLY makes sense with an environment (eg: spack add). Running it without first activating an env (either through spack env activate or through spack -e prefix) should raise a runtime error.

  5. We should think long and hard whenever a spack env sub-command has the same name as a top-level command. If their functionality is the same, should they just be merged? If their functionality is different, will it be confusing to users? For example, spack activate and spack env activate are two totally different things.

With that in mind, I request the following changes:

  • Make sure you raise an error if spack env destroy [ENV] is called within a spack env activate [ENV] context. You should really de-activate your env before destroying it.
  • Consider that users can/will put stuff that Spack doesn't understand in a Spack Environment directory. spack env destroy should do an ls of the entire directory and ask the user to confirm deletion.
  • Consider renaming spack env destroy to spack env rm.
  • Get rid of despacktivate. That's a heavyweight solution to a non-problem. Users who want it can do alias despacktivate='spack env deactivate'.
  • Merge spack env status [ENV] into the top-level command spack find.
  • Consider renaming spack find to spack status, since it's always been highly confusable with spack list.
  • Move spack env add SPEC to the top-level command spack add SPEC. Same for spack env remove. It's an error if these are run without an environment.
  • Remove spack env install; this functionality is already in spack install.
  • Move spack env concretize to the top-level command spack concretize. It's an error if it's run without an env.
  • Merge spack env loads to the existing "top-level" spack module loads.
  • Consider moving spack module loads to a top-level spack loads, since that functionality has always been in a strange place.
  • Move spack env stage [ENV] to spack stage.
  • Merge spack location -e ENV to spack location. (i.e. it can be run, among other ways, with spack -e ENV location).
  • Merge spack cd -e ENV into just spack cd.
  • Merge spack env uninstall to just spack uninstall.

ncview: # Highest precedence
netcdf:
nco: # Lowest precedence
py-sphinx:
Member


Here we have a situation where configs take precedence if they're listed later, but specs take precedence if they're listed earlier. This is clearly an oversight in the API, which needs to be fixed. I would suggest making spec precedence work the same way as config precedence, since config precedence is an already-merged feature elsewhere.

#. Spack Environments may contain more than one version of the same
package; but only a single module for a package may be loaded.
Modules that occur in earlier specs listed in an environment take
precedence over modules that occur later.
Member


Nobody would (knowingly) put two versions of the same package in an environment. So explain why this would be a good thing: because your environment contains A and [email protected], and because A->[email protected].

In this case, it seems that the precedence rules are incomplete. If the user has placed A and [email protected] in an environment, then they would expect that A and [email protected] would both be loaded, regardless of which order they are listed in the file. Users might not even know that A depends on [email protected], and will be mystified when they get [email protected] instead of [email protected].

Of course... if the same package is explicitly listed twice, then we SHOULD choose between them based on order. Similarly, if the same package is implicitly listed twice, there should also be an order-based precedence. But explicitly listed packages should always take precedence over implicitly listed packages.
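The rule proposed here can be sketched as a toy model (the data layout and function below are hypothetical, purely to pin down the semantics):

```python
# Sketch of the proposed precedence rule: when several versions of a
# package could supply a module, explicitly listed (root) specs beat
# implicitly pulled-in dependencies, and list order breaks ties.

def pick_modules(specs):
    """specs: ordered list of (name, version, explicit) tuples."""
    chosen = {}
    # Two passes: explicit specs first, then implicit dependencies;
    # within each pass the earliest occurrence wins.
    for want_explicit in (True, False):
        for name, version, explicit in specs:
            if explicit == want_explicit and name not in chosen:
                chosen[name] = version
    return chosen

env = [
    ("A", "1.0", True),    # root: A depends on [email protected]
    ("B", "1.4", False),   # implicit dependency of A
    ("B", "1.7", True),    # root: user explicitly asked for [email protected]
]
print(pick_modules(env))   # B resolves to 1.7, not 1.4
```

This matches the user's mental model: having explicitly asked for [email protected], they get it regardless of where the implicit [email protected] dependency appears in the list.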

Member Author


This is in the old docs but it's still true.

@citibeth
Member

Comments based on the Docs in the main PR

Environments are created, by default, in $spack/var/spack/environments, and you refer to them by name. You can also create environments external to Spack in directories:

spack env create -d directory-environment
spack env activate -d ./directory-environment   # -d isn't necessary unless ambiguous

If you do all your work in Spack Environments, then it is now possible to garbage collect unnecessary packages. This is important on shared supercomputers with inode limits. BUT... Spack has to know about all the environments it is supposed to keep around. Creating environments outside of Spack will break garbage collection (as would using Spack without environments). The idea of creating an environment outside of Spack is an obvious feature; but I would want to have a clear idea of why we need it???

In any case, the downside of this feature should be documented.

Also, you don't have to activate them to use them. Spack has a new
-e / --env option you can use to execute any Spack command in a
specific environment:

This should have been said right at the top with spack activate. Also, don't bother saying the feature is "new"; all of Spack Environments are new.

spack.yaml describes the specs you want in your environment. It's created when you do spack env create <name>, and it describes the specs you've added so far:

I like that the former contents of the config/ sub-directory have been merged into the spack.yaml file. That's a better design. Now... is it possible to use a single YAML file for a config, rather than separate packages.yaml, etc? That would be nice.

This is what other project dependency managers call a manifest file --
a list of the things you want to install with a project.

It might be better to just describe Spack Environments; and then in a later place, relate Spack Environments to how other package managers do environments. Many of us are not familiar with other package managers, and I think it weakens our brand to always compare ourselves to them. I find the seemingly random references to other package managers to be jarring.

You can
maintain it by hand,

Good

spack:
  repos:
  - file:///usr/local/spack-repo

Why does repos require a URL but other things in the config just require a filename? That seems a bit confusing. What happens if you give a simple filename for the Spack Repo?

# set custom stage areas for the environment
config:
  template_dirs:
  - /path/to/my/local/module/templates
  build_stage:
  - /usr/workspace/myusername

Would use of this feature limit portability of Spack Environments between different systems? Actually I suppose that any path-based features could... Something should be said somewhere that when you port a Spack Environment, the paths need to be manually checked / updated. Actually, there probably needs to be a whole section on porting environments, once we (someone) have gotten practice with it.

The sections in the file can be anything from the regular Spack
configuration files,
so you can do some sophisticated things here if you're creative. You can
also include configs from elsewhere:

spack:
  # include external configuration
  include:
  - ../special-config-directory/
  - ./config-file.yaml

Can this do paths based on env vars, eg:

include:
- {HOME}/my-config.yaml

# add package specs to the `specs` list
specs:
- hdf5
- libelf
- openmpi

Note the change in syntax (YAML grammar) required to accommodate Spack Setup (coming in a future PR on the heels of this one).

https://github.com/citibeth/spack/blob/efischer/giss/var/spack/environments/twoway-dev-gibbs/env.yaml

    specs:
        modele-tests:                          # Highest precedence
            setup: [modele-tests]
        modele:
            setup: [modele, ibmisc, pism, icebin]
        modele-control:
        modele-utils:
        ncview:
        netcdf:
        nco:      

The included items can either be Spack configuration scopes (directories
with packages.yaml, config.yaml, compilers.yaml, etc.) or files
with all of those config sections merged into a single file (like in the
spack.yaml file above).

Wouldn't it be great if this could be the case for the rest of Spack as well and we didn't have to separate our (now numerous) configs into 5 different files each...????

Notice that you can have relative paths. Those are relative to the
spack.yaml file, so you can put it and its associated configuration in
a git repository if you want to.

We should expand more on why this is a good idea. You can fork Spack, then add your own environments to it. Now you have a single turn-key Spack solution for your users. They clone your fork and install the environment(s) you included in it. They don't have to do any crazy setup or configuration. When you want to upgrade your users, you can do it in a controlled manner by updating your environment(s) in the repo, pulling from the latest Spack upstream, etc. And you can test it all before deploying your updated fork to users.

spack.yaml in code repositories

As is common elsewhere in the dependency management world (Pipenv, Cargo,
Bundler, etc.), you can put a spack.yaml file in your project's
repository and use it to make bootstrapping dependencies easier for your
users:

$ git clone https://github.com/myproj/myproj.git
$ cd myproj
$ ls
CONTRIBUTING.md  LICENSE  README.md  configure*  m4/  spack.yaml  src/
$ spack install
# ... all dependencies from spack.yaml are installed ...

Woah... I feel we just moved into an entirely new feature space, this is not really Spack Environments anymore. So... it looks like spack install from within a project's git repo installs that project's dependencies? This brings up so many questions:

  1. Why don't we get dependencies from a package.py file (or equivalent), which package developers should already be writing, that is included in the repo? Now I'll have to be writing my dependencies so many times: (a) in CMake, (b) in package.py, and now (c) in spack.yaml???
  2. Does this, explicitly or implicitly, create, activate, load or otherwise use a Spack Environment?

spack install, when called without arguments in a repo that has a
spack.yaml, will concretize and install all the specs in the
spack.yaml file.

Isn't this normally what happens when you install a package, via package.py? Why do we need a new mechanism for it?

What would make more sense to me is if 3d party repos can ship a combination of Spack Configs and Spack Environments, allowing them to set up a recommended configuration of software versions for their software (as opposed to trying to encode all that in package.py, which has created problems in the past).

In general... I think this feature needs more thought, more documented/expected use cases, etc.

spack.lock and reproducing environments

The spack.lock "lockfile" is created whenever you install or concretize
an environment. If you run spack install as suggested in the previous
section, it will produce a file called spack.lock alongside
spack.yaml. This contains both the abstract specs that were used to
install the packages, and the full, concretized specs of these packages
and their dependencies. It's intended to allow you to reproduce an
environment exactly as it was built by someone else.

If you send someone a spack.yaml or spack.lock file, they can create
a new environment from these files with commands like these:

It seems from the docs this is not a lockfile, so why is it named like one? Is this just the old environment.json file rehashed?

spack env create -e spack.yaml

or:

spack env create -e spack.lock

If you create an environment from spack.yaml

In which directory will that environment be created? Does it copy the spack.yaml file you gave?

, you'll get a new
environment with the same root specs (like a pip requirements file), and

Again... no need to keep referencing other environment managers. It just confuses those of us who are not familiar with them.

If you
create it from spack.lock, you'll get that and the concrete specs to
reproduce things exactly.

Suppose I do:

  1. Create spack.yaml. Edit by hand, add comments.
  2. Concretize
  3. Move spack.lock to a new machine.
  4. Create new environment based on spack.lock.

Question: Will comments from (1) be preserved in (4)? If not... is there a procedure that allows for this to happen?

Currently that has to be completely exact,
but in the future we'll support generating something "as close as
possible" to the original environment on a new host.

Huh?

Activating an environment will add paths
for this directory. @scheibelp will be adding that after he reviews
this PR.

It is inconsistent if activating an env with views includes the env in your PATH, but activating an env without views does not. It also brings up the question: what if I make an env and I want to use it sometimes as a view and sometimes by loading a module? Now when I "activate" it, I will always be loading the view directory, even if I don't want to be using it that way at this time.

  1. There's only one manifest file format -- spack.yaml, and you can use it to initialize a new environment or to control an existing one.
  2. spack.yaml is updated with new abstract specs when you use spack env add, spack env remove, etc., and comments are preserved through these updates.
  3. spack.lock always contains the results of the last time the environment was concretized (both inputs and outputs). spack.yaml is the human-editable file with inputs, and spack.lock is machine-readable and exact. They're kept in sync.

These are solid improvements.

  1. The command line is a bit different, but I hope that the activation concept along with embedding environment commands in regular Spack commands (spack find, spack spec, etc.) solves most of the issues.

I believe this is heading in the right direction. If functionality is missing, it should be possible to add/tweak it in the future, without radically altering the UI.

  • Move the old spack env command to spack build-env

I feel we have confusion reigning. We now have the following ways to run something (including maybe a shell) with different env var settings:

  • spack build-env
  • spack env activate
  • Are there any other ways?

Meanwhile, spack activate does something completely different. Let's see if we can clean up the commands here, to avoid eternal confusion.

PS: Does spack build-env work within spack env activate???

@tgamblin
Member Author

I'll just mark the comments on the docs resolved -- the rst docs still need updating, and the PR is supposed to be a high-level version of what they'll look like.

@citibeth
Member

Yes... I would take the "requested changes" section seriously, the rest is commentary.

@tgamblin
Member Author

If you do all your work in Spack Environments, then it is now possible to garbage collect unnecessary packages. This is important on shared supercomputers with inode limits. BUT... Spack has to know about all the environments it is supposed to keep around. Creating environments outside of Spack will break garbage collection (as would using Spack without environments). The idea of creating an environment outside of Spack is an obvious feature; but I would want to have a clear idea of why we need it???

In any case, the downside of this feature should be documented.

There are basically two reasons to do this:

  1. It lets people script environments. If I write a workflow that creates/modifies/etc. environments, I have to come up with names for named environments, which is great for UI on my own machine (and easy to remember and manage), but it's a pain for scripting. If a script had to use a named environment, it would have to invent a name for it. We could add the moral equivalent of mktemp for environments, but I think it's more flexible to let the scripter pick the storage location, instead of requiring that everything be managed. I agree we should document that they can't be gc'd (though we don't have a gc yet 😄).
  2. You need this feature to support sticking a spack.yaml in a project repo, and I'm trying to leverage what we're doing for environments for UI and developer workflow. The manifest/lockfile pattern is something developers have been asking for, and I don't think any other package manager in the same category as Spack provides it. For example, Conda and Conan both give you the equivalent of requirements.txt but nothing like a lockfile.

FWIW, the managed package thing also comes up with spack chains (#8772). We decided to ignore it there but we plan to have commands that will copy needed packages into a sub-spack, instead of just linking. We could do something similar for environments but I have not thought it through. The model that comes immediately to mind is npm's local vs. global modules.

@tgamblin
Member Author

the rest is commentary

but I like the comments!

@citibeth
Member

citibeth commented Oct 23, 2018 via email

@tgamblin
Member Author

Yep I mean spack env activate. I really want to get rid of spack activate once environments are fully fleshed out.

Meaning (2) is not always desired; especially when a Spack Environment is not yet fully built.

I think this can be an option on activate (activate with and without loading the results). I agree you want this for things like x-compiled environments.

Tell them that you've created this nifty module named myenv that they can load with their trusted module load command --- and they will be very happy.

I think it would be easy to also generate modules for environments, with corresponding names, which would provide a familiar interface for HPC people.

@citibeth
Member

citibeth commented Oct 23, 2018 via email

@citibeth
Member

citibeth commented Oct 23, 2018 via email


@scheibelp scheibelp left a comment


Some preliminary comments, mostly on documentation.

I do have one request to change the code though: Database.query IMO should stay the same as it is and functions using it should take care to filter after the query.

For other folks reviewing this, FYI that several files were simply renamed, which accounts for at least 700 or so lines of the diff. This includes

  • spack/util/environment.py (moved from spack/environment.py, this may in particular be confusing because now environment.py is used to manage the Spack environment introduced in this PR)
  • spack/cmd/build_env.py (renamed from spack/cmd/env.py)
  • spack/test/cmd/build_env.py


$ spack env myenv create

Spack then creates the following files:

IMO this section is more detail than what should initially be presented to the user for using environments. This is the first time they are seeing commands. Ideally the most-common 5 or so commands should all fit together in one (non-huge) screen. Listing the files is important but can be done later.

The following files may be added to this directory by the user or
Spack:

* ``env.yaml``: Additional environment specification and configuration

At whatever point we get into this level of detail, it is worth being explicit that env.yaml controls normal Spack configuration that is customized on a per-environment basis, i.e. I want to distinguish customization of the environment from customization of Spack associated with the environment. This should also be mentioned in the section about advantages.

Also config/ is missing


All environments are stored in the ``var/spack/environments`` folder.

The following files may be added to this directory by the user or

"this directory" refers to var/spack/environments since that is the most recent object. It can be clarified with: The following files may be added to var/spack/environments/myenv

.. note::

#. If ``env.yaml`` exists, then Spack will no longer automatically
load from the default environment ``config/`` directory. This is a

It might be a bug but stating it as such is more distracting than helpful here. Just suggesting the fix is sufficient. We can add a TODO or issue to address this later.

elsewhere.


Initializing an Environment from a Template

The functionality referred to in this section is gone, so this section should be removed.

Args:
dicts (list): list of dictionaries

Return: (dict): a new ``dict`` ``update()``'d with each ``dict`` in

This is a bit literal. I'd prefer return: a combination of the dictionaries

Also: mention which dict takes precedence if the same key is defined in multiple dictionaries.


This is still too literal IMO, and still doesn't mention precedence.


_arguments['recurse_dependents'] = Args(
'-R', '--dependents', action='store_true', dest='dependents',
help='also uninstall any packages that depend on the ones given '

This help message appears to be a bit specific given that it appears in common/arguments.py

# TODO: like installed and known that can be queried? Or are
# TODO: these really special cases that only belong here?

# TODO: handling of hashes restriction is not particularly elegant.

IMO environment-constrained queries can just filter the list of specs after the fact. In particular I care because I know of at least one other PR touching this interface: #8772 (Spack chain).


+1


@tgamblin I still think Database should not be modified in this PR. I expanded on that at #9612 (comment)


@scheibelp: I didn't see a counterargument to the performance point I made. The reason I think this should be in query is because it really is just another constraint (subset of specs) and query is a general query function.

If you keep this outside query, then every query does as many spec compares as there are installed packages. If you push the hash comparison into query(), then we can quickly eliminate specs that aren't in the set of hashes, and only do spec comparisons on the things in the subset of hashes.

I could see having two query functions if they were separable and you could put the fast one first, but right now that's not possible without a bigger refactor of Database. Thoughts?
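The performance argument can be seen in a toy version of the query loop (all names here are hypothetical; the real `Database.query` is more involved). A set-membership test on the hash is O(1), so pushing the `hashes` restriction into the loop means the expensive spec comparison only runs on records that are actually in the environment:

```python
class Record(object):
    """Minimal stand-in for a database record (not Spack's real class)."""
    def __init__(self, name):
        self.name = name

    def satisfies(self, query_spec):
        # stand-in for the comparatively expensive Spec comparison
        return query_spec is None or self.name == query_spec

def query(database, query_spec=None, hashes=None):
    """If `hashes` is given, restrict results to that subset of specs."""
    results = []
    for dag_hash, record in database.items():
        if hashes is not None and dag_hash not in hashes:
            continue                      # cheap O(1) rejection first
        if record.satisfies(query_spec):  # expensive compare only here
            results.append(record)
    return results

db = {'aaa': Record('zlib'), 'bbb': Record('hdf5'), 'ccc': Record('zlib')}
env_hashes = {'aaa', 'bbb'}               # hashes in the environment
print([r.name for r in query(db, 'zlib', env_hashes)])   # ['zlib']
```

Filtering after the fact would instead call `satisfies` on all three records and then discard `ccc`'s match.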


I'd rather this were refactored into a 2-3 functions (in a separate PR) and that it stayed as simple as possible for now (even at the cost of find being a bit slower than it could be in an environment):

  • A function with the bulk of this logic that doesn't use self._data and just looks through a set of specs that it is given
  • A function that runs the first function with a set of specs

Then you can call the second function with self._data or "the set of specs in an environment"

IMO this is the sort of speedup that could be added later: I'd prefer to reduce complexity added by this PR since it adds quite a lot already.
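The proposed split could look roughly like this (signatures and names are my guesses, not a design from this PR): the bulk of the filtering logic takes an explicit iterable of records, and `query` just chooses which set to hand it:

```python
class Record(object):
    """Minimal stand-in for a database record."""
    def __init__(self, name):
        self.name = name

    def satisfies(self, query_spec):
        return query_spec is None or self.name == query_spec

def _query(records, query_spec=None):
    """Bulk of the logic: filters an explicit iterable, never self._data."""
    return [r for r in records if r.satisfies(query_spec)]

class Database(object):
    def __init__(self, data):
        self._data = data   # dag hash -> record

    def query(self, query_spec=None, records=None):
        # the caller picks the record set: the whole database by
        # default, or e.g. "the set of specs in an environment"
        if records is None:
            records = self._data.values()
        return _query(records, query_spec)

db = Database({'aaa': Record('zlib'), 'bbb': Record('hdf5')})
env_records = [db._data['bbb']]            # pretend this is an environment
print(len(db.query('zlib')))               # 1
print(len(db.query('zlib', env_records)))  # 0
```

As noted, this trades some speed in environments (every record still gets a full compare) for keeping `Database` itself untouched by environment logic.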

@scheibelp
Member

@citibeth I think there are several requests you make that can be done later:

Generate a single module for an environment. Remove spack env loads, which always felt like a hack.

Potentially useful but this can be done later

If spack env loads is not removed, make sure that Spack module generation works within an activated environment. It should re-generate modules for just that environment.

spack env loads only loads modules for specs in the environment. It doesn't affect module generation (or regeneration). If there is module-specific configuration, then it occurs to me that the module files for different environments will all be merged into the same directory (with potentially different schemes). IMO this can also be handled later.

Integrate Spack Environment garbage collection

This should happen after this PR is merged. GC is a great feature but not essential for core environment functionality.

Comments on UI consistency:

Once a user has run spack env activate, they are still required to (unnecessarily) give their env name again when they wish to modify it (eg: spack env add)

I don't think this is true: if an environment is active, spack env add <spec> will add the spec to the current active environment (see cmd/env.get_env).

That being said, there is the possibility of inconsistency for spack env add: the spack env add command has a -e option, so there are 3 possible specifications of the environment: spack -e <env1> env add -e <env2> spec, with the third environment being the currently active one (i.e. the SPACK_ENV variable is set).

To make this consistent with other commands, the spack env add -e functionality could be removed.

@tgamblin on that count, why was -e added to Spack's main.py vs. managed as a common arg (added to cmd/common/arguments.py)? I'm guessing it has something to do with config initialization but it would be helpful if you stated it here.
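To make the three specification mechanisms concrete, here is a toy resolution function. The precedence order shown is my assumption for illustration; the real get_env in spack/cmd/env.py may resolve (or reject) conflicts differently:

```python
import os

def get_env(subcommand_env=None, global_env=None):
    """Toy sketch of environment resolution. Assumed precedence:
    `spack env add -e <env>`  >  `spack -e <env>`  >  activated env.
    (Assumption for illustration, not Spack's documented behavior.)"""
    if subcommand_env is not None:
        return subcommand_env
    if global_env is not None:
        return global_env
    # `spack env activate` sets SPACK_ENV in the shell
    return os.environ.get('SPACK_ENV')

os.environ['SPACK_ENV'] = 'myenv'     # as if `spack env activate myenv` ran
print(get_env())                      # 'myenv'
print(get_env(global_env='proj'))     # 'proj'
print(get_env('other', 'proj'))       # 'other'
```

Removing the subcommand-level -e, as suggested above, would delete the first branch and with it one source of redundant specifications.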

Some operations on a single environment exist as spack env sub-commands (eg: spack env add), whereas others exist as top-level commands (eg: spack install). Still others are top-level commands that take the environment name as an ad-hoc -e flag (eg: spack cd).

I think that in different cases each of these approaches makes sense. This depends on whether:

  • a given action only makes sense within an environment
  • an existing Spack command could be modified to operate in the context of an environment

spack env add only makes sense in the context of an environment. One could offer spack add as a shorthand, but IMO exposing environment-specific operations as subcommands of spack env is more helpful; having two top-level commands, spack add and spack install, side by side would be confusing.

spack -e <env> install vs. spack env install vs. spack env install <env name>: spack install exists separate from environments and intends to install a single spec; it can take on additional meaning in the context of an active environment. spack env install installs all specs in an environment and depends on activation of the environment; spack env install <env name> does the same without requiring activation of the environment. Of the 3, only the 3rd is redundant (with the 2nd), but it is also the one that is the least important and the easiest to control.

Regarding spack cd: Other than spack cd, all Spack commands customize their environment by setting spack -e <env> command ... rather than spack command -e <env> .... I think spack cd is a special case. This is already something of a catch-all command: it lets you relocate your CWD to various locations that are relevant to Spack (it is based on spack location). Unlike spack install it isn't necessarily related to the current environment. For example spack cd -P takes you to the package repository directory.

@SteVwonder
Member

SteVwonder commented Oct 24, 2018

EDIT: This is an awesome feature! Thanks to everyone that helped bring it to life.

It seems from the docs this is not a lockfile, so why is it named like one? Is this just the old environment.json file rehashed?

I don't know the origin of the "lock" terminology, but I know Gem and NPM use "lock files" to store the concrete package versions installed. I agree that the naming is confusing, and it might be best to break with convention here and name it something else. Like maybe spack.concrete?

Consider renaming spack find to spack status, since it's always been highly confusable with spack list.

I agree that the list vs find naming is very confusing, but based on the open issue (#4159) I don't think there is consensus on what to change them to. I would recommend against changing the names of any top-level commands in this PR, as that would be a large breaking change. Maybe spack env status could become spack env find if consistency is of utmost importance.

Now... is it possible to use a single YAML file for a config, rather than separate packages.yaml, etc? That would be nice.

Wouldn't it be great if this could be the case for the rest of Spack as well and we didn't have to separate our (now numerous) configs into 5 different files each...????

👍 👍

I really want to get rid of spack activate once environments are fully fleshed out.

👍 👍


(The stuff below is largely an echo of @scheibelp's comment above)

Integrate Spack Environment garbage collection: scheibelp#1

I wouldn't hold up this PR waiting on this feature to be implemented. I think this should be a follow-on PR.

Move spack env add SPEC to the top-level command spack add SPEC. Same for spack env remove. It's an error if these are run without an environment.

Move spack env concretize to the top-level command spack concretize. It's an error if it's run without an env.

Not that I have any great alternatives in mind, but this smells off to me. It seems weird to have top-level commands that only work inside an environment. These commands seem ripe for sticking under spack env.

More specifically, I have an opposition to adding spack add as a top-level command. Both spack add and spack install, which semantically are very similar, would be top-level commands. I could see having both as top-level commands being confusing to new users, just as list/find and clean/purge can be confusing.

Sort of related: does anyone have a spack "cheat sheet" of commands? Having them all laid out in one place and organized by topic could be useful for this kind of discussion.

@citibeth
Member

citibeth commented Oct 24, 2018 via email

@citibeth
Member

citibeth commented Oct 24, 2018 via email

@tgamblin
Member Author

tgamblin commented Oct 24, 2018

I wanted to add a note on "lockfiles". The manifest/lockfile model has already become a thing -- it refers to locking the resolved versions of dependencies in most of the language-specific package managers out there right now. Here are some references:

Honorable mentions:

I went with spack.yaml and spack.lock because I see this model growing in the project dependency management space (all these package managers are in that space except Homebrew, and brew-bundle attempts to provide this for Homebrew). People have come to expect it, and I don't think we should try to rename something that has become a very familiar concept.

I do think we did better than these package managers by having notions of "abstract" and "concrete" specs and "concretization". I think that's a clearer concept than "dependency resolution", which is what most of the other tools call it. Dependency resolution typically only refers to setting specific versions, while concretization does that and compilers, variants, flags, etc., and I think that's an important distinction. I don't think it's hard or unintuitive to say "the spack.lock file contains all the concrete specs for an environment."
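The abstract-vs-concrete distinction can be sketched with a toy resolver (the defaults, field names, and helper below are invented for illustration; real concretization resolves a whole dependency DAG): spack.yaml holds the abstract input, and spack.lock records the fully determined output.

```python
# Toy "concretization": any field the user left unconstrained is
# filled in with a default. Defaults here are invented.
DEFAULTS = {'version': '1.2.11', 'compiler': '[email protected]', 'variants': '+shared'}

def concretize(abstract):
    concrete = dict(DEFAULTS)
    concrete.update(abstract)   # user constraints always win
    return concrete

# abstract spec: only what the user cares about (goes in spack.yaml)
abstract = {'name': 'zlib', 'compiler': '[email protected]'}
# concrete spec: everything pinned down (goes in spack.lock)
print(concretize(abstract))
```

Note how the "resolution" pins not just the version but also the compiler and variants, which is exactly the distinction from version-only dependency resolution made above.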

@scheibelp
Member

@tgamblin I think the most important outcome so far on the UI discussion is that the extra -e option should be removed from all spack env subcommands (i.e. no more spack env add -e), since it is already specified as part of spack -e <env>. That will avoid confusion WRT redundant environment specifications and keep things mostly consistent (except for spack cd). Relatively speaking, the rest of this is less important.

(less important) I argue below that spack -e <env> <command> ... should be spack <command> -e <env> ... (note this does not conflict with the request to get rid of spack env add -e <env> above - I am saying spack -e <env> env add should be spack env -e <env> add) but that's relatively speaking not as big a deal.

@citibeth

Because spack -e FOO command... is the same as spack env activate FOO; spack command. You can do the latter for any command, even when the command itself is insensitive to environment (eg spack list). So there's really no harm in being able to set the environment for any command using spack -e, even if the command doesn't use it.

You can already activate the environment-aware version of any spack command by first doing spack env activate <env> and in that case you do even less typing. If you replace spack -e <env> command ... with spack command -e <env> ... you then get a clear account of which commands are environment-sensitive "for free" (i.e. built into the command help) vs. having to read through docs. This would avoid the confusion with spack cd.

You forgot spack -e <env1> install <env2>. Which would be an error?

I didn't. That command is not valid. I do think though that users may end up getting confused about this unless one of the environment specification mechanisms is removed.

That being said, there is the possibility of inconsistency for spack env add: the spack env add command has a -e option,

A few points here:

  1. You have already put a global -e option on the whole Spack command.

The argument you are quoting in #9612 (comment) agrees with your response, which appears to be phrased as though it is a counterargument, so I am confused.

@citibeth
Member

citibeth commented Oct 24, 2018 via email

@citibeth
Member

citibeth commented Oct 24, 2018 via email

@citibeth
Member

citibeth commented Oct 24, 2018 via email

- uninstall now:
  - restricts its spec search to the current environment
  - removes uninstalled specs from the current environment
  - reports envs that still need specs you're trying to uninstall

- removed spack env uninstall command
- updated tests
- `spack env status` used to show install status; consolidate that into
  `spack find`.

- `spack env status` will still print out whether there is an active
  environment
- split 'environment' section into 'environments' and 'modules'
- move location to 'query packages' section
- move cd to developer section

- --env-dir no longer has a short option (was -E)
- -E now means "run without an environment" (no longer same as --env-dir)
- -D now means "run with this directory environment"
- remove short options for many infrequently used top-level commands
- The `Spec` class maintains a special `_patches_in_order_of_appearance`
  attribute on patch variants, but it was not preserved when specs are
  copied.

- This caused issues for some builds

- Add special logic to `Spec` to preserve this variant on copy

- TODO: in the long term we should get rid of the special variant and
  make it the responsibility of one of the variant classes.
- args weren't being delegated properly from CommentedMap to OrderedDict
- spack.yaml files in the current directory were picked up inconsistently
  -- make this a sure thing by moving that logic into find_environment()
  and moving find_environment() to main()

- simplify arguments to Spack command:
  - remove short args for infrequently used commands (--pdb/-D, -P, -s)
  - `spack -D` now forces an env with a directory
- to avoid changing spec hashes drastically, only add this attribute to
  differentiated abstract specs.

- otherwise assume that read-in specs are concrete
- all commands (except `spack find`, through `ConstraintAction`) now go
  through get_env() to get the active environment

- ev.active was hard to read -- and the name wasn't descriptive.
  - rename it to _active_environment to be more descriptive and to strongly
    indicate that spack.environment manages it
@tgamblin tgamblin force-pushed the features/environments-4 branch 2 times, most recently from a593bf9 to 84292e1 on November 9, 2018 at 07:50
@tgamblin tgamblin merged commit 423d3e7 into develop Nov 9, 2018
@tgamblin
Member Author

tgamblin commented Nov 9, 2018

@scheibelp @becker33: I integrated the setup-env.csh fix and removed docs for now to add them in a PR very soon.

We are still having an issue where Python 2.6 builds are hanging and we don't know why. We've seen this in some other builds as well. If it persists on develop we can let Python2.6 fail until we fix it; the Python 2.6 tests pass when I run them and when I log into a Travis container and run them, so I believe the failures are related to Travis, not Spack.

@tgamblin
Member Author

tgamblin commented Nov 9, 2018

At any rate, this is merged! Environments are in! Docs, a tutorial, and some new feature additions are coming soon.

@ax3l
Member

ax3l commented Nov 9, 2018

@tgamblin are there any plans to marry a repo's spack.yaml with a simplified package.py?

That would allow software developers to in-source describe and update dependencies instead of writing a lot of "if between release X and Y, depends on [email protected]:2.5, else [email protected] and if Y>3.4 also adds dependency on W" inside a single package.py. They get cluttered over time quickly. Maybe I am also missing some of the possible workflows.
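The clutter being described looks roughly like this inside a package.py. The depends_on stub, package names, and version ranges below are made up just to show how conditional rules pile up over releases:

```python
# Toy stand-in for Spack's depends_on directive, only to show how
# conditional dependency declarations accumulate over a package's
# history (all names and ranges below are hypothetical).
rules = []

def depends_on(spec, when=None):
    rules.append((spec, when))

depends_on('[email protected]:2.5', when='@1.0:2.9')   # releases X through Y
depends_on('[email protected]',     when='@3.0:')      # later releases switched
depends_on('w',          when='@3.5:')      # ...and grew a new dependency

print(len(rules))   # 3 rules where there was once a single depends_on
```

An in-source spack.yaml could instead describe only the current release's dependencies, leaving the historical conditionals behind.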

@citibeth
Member

citibeth commented Nov 9, 2018

From the timestamp, it looks like some midnight oil was burned on Spack Environments. Thank you Peter and Todd for all your hard work. This is truly a glorious day in Spack Land!

@tgamblin
Member Author

tgamblin commented Nov 11, 2018

That would allow software developers to in-source describe and update dependencies instead of writing a lot of "if between release X and Y, depends on [email protected]:2.5, else [email protected] and if Y>3.4 also adds dependency on W" inside a single package.py. They get cluttered over time quickly. Maybe I am also missing some of the possible workflows.

@ax3l: I think there's a long-term path there but right now that is hard. We're not a distributed package management system (yet) and the HPC ecosystem needs some packages to be curated. There is probably a balance where some packages can be managed more like they are in registry-based systems, where you have a package.py vary over time in a repo, and others can just be central ones. But I wouldn't expect that soon. I do like the idea, though.

@jcftang
Member

jcftang commented Nov 11, 2018

+1 I tend to agree with the need for curation in the various environments or stacks as a starting guide for others to modify locally

scottwittenburg added a commit to scottwittenburg/spack that referenced this pull request Dec 11, 2018
The goal is to try and get the full_hash computed during the
'buildcache check' to match the one computed (or looked up) during
the 'buildcache create' if nothing else has changed.  Not taking
patches into account during the latter was causing packages to
rebuild on every pipeline, even when it was unnecessary.
@tgamblin tgamblin mentioned this pull request Jul 20, 2019
Comment on lines +370 to +371
# If the command-line scope is present, it should always
# be the scope of highest precedence

Except when it shouldn't... I'd rather see this reverted.


Reverted in #32273

