
--no-add installs only specified specs and only if already present in… #22657

Merged
tgamblin merged 3 commits into spack:develop from scottwittenburg:no-add-option-for-env-install
May 7, 2021

Conversation

@scottwittenburg
Contributor

@scottwittenburg scottwittenburg commented Mar 30, 2021

… env

Fixes #23333

Allow installing one or more specs from command line as long as they are
already present in the environment. Change spec parsing to check active
environment for dag hash (in addition to checking installation database).

Instead of changing spec parsing, this now relies on #23194 for that functionality.

This is now decoupled from parsing hashes of env specs on the command line, and thus no longer requires #23194.

@haampie
Member

haampie commented Mar 30, 2021

I didn't thoroughly check, but are you aware of spec = spack.cmd.matching_spec_from_env(spec) which got added to a few commands (build-env, stage, location)? Looks similar

@scottwittenburg
Contributor Author

This is my attempt to implement #22586.

@tgamblin I think this is more or less what we talked about last Friday. If you know of the right reviewer, feel free to add them.

@scottwittenburg
Contributor Author

scottwittenburg commented Mar 30, 2021

I didn't thoroughly check, but are you aware of spec = spack.cmd.matching_spec_from_env(spec) which got added to a few commands (build-env, stage, location)? Looks similar

Well, I did try the method that this calls in the Environment class (which is called matching_spec), but when used on the hash of a dependency in the environment, it fails and prints a bunch of messages saying that the spec in question is a dependency of several other specs. Otherwise, I could have used that method to find the target for installation. Maybe I just misused it, though?

EDIT: Having looked at this carefully this week, I do think we should use env.matching_spec(spec) from the Environment class, due to how it intends to handle the case where a spec might match n roots and m deps (for n >= 0 and m >= 0). However, I ran into a couple of issues (one confirmed bug, one in question) and wrote them up in #23333. Depending on the resolution of the questioned bug (matching_spec can return an abstract spec rather than the concrete one from the environment), I plan to move forward using that method instead of searching through the specs myself.

@scottwittenburg scottwittenburg force-pushed the no-add-option-for-env-install branch from cf8bc2f to 947934a on March 31, 2021 15:14
@tgamblin tgamblin self-requested a review March 31, 2021 17:02
@scottwittenburg scottwittenburg force-pushed the no-add-option-for-env-install branch 3 times, most recently from 9b7b27f to 02f87cf on April 22, 2021 03:06
@scottwittenburg
Contributor Author

fyi @zackgalbreath

@hartzell
Contributor

@scottwittenburg -- just wanted to drop a note here to follow up on using this work to enable the CI pipeline to build buildcaches for environments that are concretized together.

@scottwittenburg scottwittenburg force-pushed the no-add-option-for-env-install branch 2 times, most recently from 6234ff3 to be338e5 on April 28, 2021 21:59
@scottwittenburg scottwittenburg force-pushed the no-add-option-for-env-install branch 2 times, most recently from 4bfb3ce to 7fdae50 on April 30, 2021 19:55
@scottwittenburg scottwittenburg force-pushed the no-add-option-for-env-install branch 2 times, most recently from b170b33 to 0da444a on May 1, 2021 01:43
@scottwittenburg
Contributor Author

@hartzell I think you should be able to use this branch now to run spack install --no-add <spec> in an environment that is concretized together. The only requirement is that each spec you provide on the command line must unambiguously match a spec already added to the environment.

As I mentioned in the description, I decoupled this PR from #23194, which allows parsing hashes on the command line for uninstalled specs spack has "seen" before (e.g. specs in active env or in binary mirrors). Consequently, it's not as easy to unambiguously specify a spec on the command line. Until we get that work in, my plan in CI is to use concrete spec yaml files written to disk to specify target specs unambiguously, as I have done all along. That's not strictly required though, abstract cli specs are just fine as long as they match exactly one spec in the active environment.

@scottwittenburg scottwittenburg force-pushed the no-add-option-for-env-install branch from 0da444a to ef385a8 on May 1, 2021 01:57
scheibelp and others added 3 commits May 1, 2021 09:48
…s a match with a root spec as well as with a dependency of a root spec
In an active, concretized environment, support installing one or more
cli specs only if they are already present in the environment.  The
`--no-add` option is the default for root specs, but optional for
dependency specs.  I.e. if you `spack install <depspec>` in an
environment, the dependency-only spec `depspec` will be added as a
root of the environment before being installed.  In addition,
`spack install --no-add <spec>` fails if it does not find an
unambiguous match for `spec`.
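The semantics described in this commit message can be sketched as decision logic. Everything below is a hypothetical illustration of the described behavior, not Spack's actual API:

```python
# Hedged sketch of the install-time decision logic described above; every name
# here is a hypothetical illustration, not Spack's actual implementation.

def decide_install_action(spec, root_matches, dep_matches, no_add):
    """What should `spack install <spec>` do in an active environment?

    root_matches / dep_matches: environment specs matching `spec`.
    no_add: whether --no-add was passed (implied for root matches).
    """
    matches = root_matches + dep_matches
    if no_add:
        # --no-add never modifies the environment and fails unless
        # exactly one environment spec matches.
        if len(matches) != 1:
            raise ValueError("no unambiguous match for %r" % spec)
        return ("install", matches[0])
    if root_matches:
        # Root specs behave as if --no-add were given: just install.
        return ("install", root_matches[0])
    if dep_matches:
        # A dependency-only match is added as a root before install.
        return ("add-as-root-and-install", dep_matches[0])
    # No match at all: the venv-like default adds the spec to the env.
    return ("add-concretize-install", spec)
```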
@scottwittenburg scottwittenburg force-pushed the no-add-option-for-env-install branch from ef385a8 to e703108 on May 1, 2021 15:48
@scottwittenburg scottwittenburg marked this pull request as ready for review May 3, 2021 16:55
@scottwittenburg
Contributor Author

@tgamblin I think this is ready; can you assign a reviewer based on availability, etc.? I think codecov is just wrong about the patch coverage (it's actually more like 100%), and on top of that it is complaining about uncovered lines in a module this PR doesn't even touch (lib/spack/spack/util/compression.py).

@haampie
Member

haampie commented May 5, 2021

Some commands have been made "environment aware" recently, including spack -e env build-env [spec], spack -e env stage [spec], and more. They filter concrete matching specs in the environment, and then apply stage or build-env to them.

This makes me wonder: shouldn't the behavior of --no-add be the default for consistency with the above?

I would be happy if this was the default behavior in environments:

spack -e env add <spec> # adds a non-concrete root spec <spec> without installing it
spack -e env install # installs the full environment
spack -e env install <spec> # installs all specs in the concrete environment matching <spec> or none if there is no match
spack -e env uninstall <spec> # uninstalls all specs in the concrete environment matching <spec> -- also this shouldn't remove root specs from the environment file!!
spack -e env uninstall # uninstalls the full environment without removing root specs from the environment file
spack -e env remove <spec> # removes matching root specs from the environment

@scottwittenburg
Contributor Author

@haampie I kind of agree with your assessment there, at least from my narrow perspective, which is mostly focused on gitlab CI pipelines. However, I've heard @tgamblin and @becker33 argue that spack environments should behave similarly to python virtual environments (so spack install foo should add foo to the environment).

@haampie
Member

haampie commented May 5, 2021

Yeah, spack -e env install xyz could still add a new root spec if there is no matching concrete spec in the environment.

In fact the current behavior of spack -e env build-env xyz is this:

  • if xyz matches a single concrete spec, give the build-env of that
  • if xyz does not match any concrete spec, concretize xyz in the context of the environment, and give the build-env of that

So the same could apply to spack install, right?

  • if xyz matches one or more concrete specs, install all of these
  • if there is no match, run spack add xyz, concretize, and "rerun" spack install xyz

@scottwittenburg
Contributor Author

So the same could apply to spack install, right?

  • if xyz matches one or more concrete specs, install all of these
  • if there is no match, run spack add xyz, concretize, and "rerun" spack install xyz

I'm inclined to agree with you here. Maybe I'm missing something, but this makes the most sense to me.

@adrienbernede
Contributor

adrienbernede commented May 5, 2021

Reading this discussion:

  • Environments are designed to mimic Python virtual environment behavior: add a top-level requirement, and dependencies are implied.
  • Environments do not allow independent builds of dependencies without adding them as top-level specs (hence this PR).
  • However, even outside CI, there can be a need to install a given package in the context of an environment without altering the environment definition (think facility software stacks, where an environment describes all the vetted specs).

This made me think that there is a need for "contexts" (a sort of 'read-only' environment). I am satisfied with the way this PR allows me to use an environment as a "context" description.

@scottwittenburg
Contributor Author

Sorry, accidental PR close there.

@scottwittenburg
Contributor Author

@tgamblin @becker33 Do either of you see any reason not to make this behavior the default (and consequently do away with the --no-add option) as suggested in the comment by @haampie?

Member

@tgamblin tgamblin left a comment


I think we want to keep the default venv behavior of adding for now, but I agree it's sort of a mixed metaphor. I think this is good as-is, but there probably needs to be a larger UI/UX investigation around environments and all the different use cases.

The issue is that they satisfy a lot of use cases. I don't think we should change the default install right now without considering how that affects them, and I think this PR at least gets us to where we can do what @haampie, @adrienbernede, and others want.

@tgamblin tgamblin merged commit 91de23c into spack:develop May 7, 2021
tgamblin pushed a commit that referenced this pull request May 28, 2021
### Overview

The goal of this PR is to make gitlab pipeline builds (especially build failures) more reproducible outside of the pipeline environment.  The two key changes here which aim to improve reproducibility are: 

1. Produce a `spack.lock` during pipeline generation which is passed to child jobs via artifacts.  This concretized environment is used both by generated child jobs as well as uploaded as an artifact to be used when reproducing the build locally.
2. In the `spack ci rebuild` command, if a spec needs to be rebuilt from source, do this by generating and running an `install.sh` shell script which is then also uploaded as a job artifact to be run during local reproduction.  

To make it easier to take advantage of improved build reproducibility, this PR also adds a new subcommand, `spack ci reproduce-build`, which, given a URL to job artifacts:

- fetches and unzips the job artifacts to a local directory
- looks for the generated pipeline yaml and parses it to find details about the job to reproduce
- attempts to provide a copy of the same version of spack used in the ci build
- if the ci build used a docker image, the command prints a `docker run` command you can run to get an interactive shell for reproducing the build
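The first two steps above can be sketched as plain Python. The function and file names are hypothetical illustrations; the real command does more, such as the docker and spack-version handling mentioned above:

```python
# Hedged sketch of the artifact-processing step of `spack ci reproduce-build`:
# unzip the downloaded job artifacts, then locate the generated pipeline yaml.
import os
import zipfile


def extract_and_find_pipeline(artifacts_zip, work_dir):
    """Extract a job-artifacts archive and return the generated pipeline yaml path."""
    with zipfile.ZipFile(artifacts_zip) as zf:
        zf.extractall(work_dir)
    # Walk the extracted tree looking for a yaml file that names the pipeline.
    for dirpath, _dirnames, filenames in os.walk(work_dir):
        for name in filenames:
            if name.endswith((".yml", ".yaml")) and "pipeline" in name:
                return os.path.join(dirpath, name)
    return None
```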

#### Some highlights

One consequence of this change will be much smaller pipeline yaml files.  By encoding the concrete environment in a `spack.lock` and passing to child jobs via artifacts, we will no longer need to encode the concrete root of each spec and write it into the job variables, greatly reducing the size of the generated pipeline yaml.

Additionally, `spack ci rebuild` output (stdout/stderr) is no longer internally redirected to a log file, so job output will appear directly in the gitlab job trace.  With debug logging turned on, this often results in log files getting truncated because they exceed the maximum amount of log output gitlab allows.  If this is a problem, you still have the option to `tee` command output to a file within the artifacts directory, as each generated job now exposes a `user_data` directory as an artifact, which you can fill with whatever you want in your custom job scripts.
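For example, a custom job script could preserve its own log as an artifact with `tee`; the directory names below are illustrative, assuming the default `jobs_scratch_dir` layout:

```shell
# Sketch: preserve full command output as a job artifact via tee,
# since spack ci rebuild no longer redirects output to a log file itself.
mkdir -p jobs_scratch_dir/user_data
echo "example build output" | tee jobs_scratch_dir/user_data/install_log.txt
```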

There are some changes to be aware of in how pipelines should be set up after this PR:

#### Pipeline generation

Because the pipeline generation job now writes a `spack.lock` artifact to be consumed by generated downstream jobs, `spack ci generate` takes a new option `--artifacts-root`, inside which it creates a `concrete_env` directory to place the lockfile.  This artifacts root directory is also where the `user_data` directory will live, in case you want to generate any custom artifacts.  If you do not provide `--artifacts-root`, the default is for it to create a `jobs_scratch_dir` within your `CI_PROJECT_DIR` (a gitlab predefined environment variable), or your current working directory if that variable isn't set. Here's the diff of the PR testing `.gitlab-ci.yml` taking advantage of the new option:

```
$ git diff develop..pipelines-reproducible-builds share/spack/gitlab/cloud_pipelines/.gitlab-ci.yml
diff --git a/share/spack/gitlab/cloud_pipelines/.gitlab-ci.yml b/share/spack/gitlab/cloud_pipelines/.gitlab-ci.yml
index 579d7b5..0247803 100644
--- a/share/spack/gitlab/cloud_pipelines/.gitlab-ci.yml
+++ b/share/spack/gitlab/cloud_pipelines/.gitlab-ci.yml
@@ -28,10 +28,11 @@ default:
     - cd share/spack/gitlab/cloud_pipelines/stacks/${SPACK_CI_STACK_NAME}
     - spack env activate --without-view .
     - spack ci generate --check-index-only
+      --artifacts-root "${CI_PROJECT_DIR}/jobs_scratch_dir"
       --output-file "${CI_PROJECT_DIR}/jobs_scratch_dir/cloud-ci-pipeline.yml"
   artifacts:
     paths:
-      - "${CI_PROJECT_DIR}/jobs_scratch_dir/cloud-ci-pipeline.yml"
+      - "${CI_PROJECT_DIR}/jobs_scratch_dir"
   tags: ["spack", "public", "medium", "x86_64"]
   interruptible: true
```

Notice how we replaced the specific pointer to the generated pipeline file with its containing folder, the same folder we passed as `--artifacts-root`.  This way anything in that directory (the generated pipeline yaml, as well as the concrete environment directory containing the `spack.lock`) will be uploaded as an artifact and available to the downstream jobs.
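The default described above (fall back to `CI_PROJECT_DIR`, then the current working directory) can be sketched in Python; the function name and parameters here are hypothetical, not Spack's actual implementation:

```python
# Hedged sketch of how the --artifacts-root default could be computed;
# illustrative only, see `spack ci generate` for the real logic.
import os


def artifacts_root(cli_value=None, environ=None):
    """Return the artifacts root: CLI value, else CI_PROJECT_DIR, else cwd."""
    environ = environ if environ is not None else os.environ
    if cli_value:
        return cli_value
    base = environ.get("CI_PROJECT_DIR") or os.getcwd()
    return os.path.join(base, "jobs_scratch_dir")
```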

#### Rebuild jobs

Rebuild jobs now must activate the concrete environment created by `spack ci generate` and provided via artifacts.  When the pipeline is generated, a directory called `concrete_environment` is created within the artifacts root directory, and this is where the `spack.lock` file is written to be passed to the generated rebuild jobs.  The artifacts root directory can be specified using the `--artifacts-root` option to `spack ci generate`, otherwise, it is assumed to be `$CI_PROJECT_DIR`.  The directory containing the concrete environment files (`spack.yaml` and `spack.lock`) is then passed to generated child jobs via the `SPACK_CONCRETE_ENV_DIR` variable in the generated pipeline yaml file.

When you don't provide custom `script` sections in your `mappings` within the `gitlab-ci` section of your `spack.yaml`, the default behavior of rebuild jobs is now to change into `SPACK_CONCRETE_ENV_DIR` and activate that environment.   If you do provide custom rebuild scripts in your `spack.yaml`, be aware those scripts should do the same thing: assume `SPACK_CONCRETE_ENV_DIR` contains the concretized environment to activate.  No other changes to existing custom rebuild scripts should be required as a result of this PR. 
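For illustration, a custom `script` section following this convention might look roughly like the sketch below in `spack.yaml`; the mapping criteria and tags are assumptions for the example, not a canonical configuration:

```yaml
# Sketch only: a gitlab-ci section whose custom rebuild script assumes the
# concrete environment was delivered via artifacts in SPACK_CONCRETE_ENV_DIR.
gitlab-ci:
  mappings:
    - match: ["os=ubuntu18.04"]
      runner-attributes:
        tags: ["docker"]
        script:
          - cd "${SPACK_CONCRETE_ENV_DIR}"
          - spack env activate --without-view .
          - spack ci rebuild
```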

As mentioned above, one key change made in this PR is the generation of the `install.sh` script by the rebuild jobs; that same script is both run by the CI rebuild job and exported as an artifact to aid in subsequent attempts to reproduce the build outside of CI.  The generated `install.sh` script contains only a single `spack install` command with arguments computed by `spack ci rebuild`.  If the install fails, the job trace in gitlab will contain instructions on how to reproduce the build locally:

```
To reproduce this build locally, run:
  spack ci reproduce-build https://gitlab.next.spack.io/api/v4/projects/7/jobs/240607/artifacts [--working-dir <dir>]
If this project does not have public pipelines, you will need to first:
  export GITLAB_PRIVATE_TOKEN=<generated_token>
... then follow the printed instructions.
```

When run locally, the `spack ci reproduce-build` command shown above will download and process the job artifacts from gitlab, then print out instructions you can copy-paste to run a local reproducer of the CI job.
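The single-command `install.sh` described above could be generated roughly like this sketch; the helper name and argument list are hypothetical, and `spack ci rebuild` computes the real arguments:

```python
# Hedged sketch: write an install.sh containing a single spack install
# command and mark it executable, as the rebuild job is described to do.
import os
import stat


def write_install_script(path, install_args):
    """Write a one-command shell script and make it executable."""
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
        f.write("spack install " + " ".join(install_args) + "\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path
```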

This PR includes a few other changes to the way pipelines work, see the documentation on pipelines for more details.

This PR relies on
~- [ ] #23194 to be able to refer to uninstalled specs by DAG hash~
EDIT: that is going to take longer to come to fruition, so for now, we will continue to install specs represented by a concrete `spec.yaml` file on disk.
- [x] #22657 to support installing a single spec already present in the active, concrete environment
RikkiButler20 pushed a commit to RikkiButler20/spack that referenced this pull request Jun 11, 2021


Development

Successfully merging this pull request may close these issues.

Error in Environment.matching_specs() method

6 participants