Skip to content

Add a top-level concretizer config scope and --fresh option#28468

Merged
becker33 merged 6 commits intodevelopfrom
concretization-config
Feb 16, 2022
Merged

Add a top-level concretizer config scope and --fresh option#28468
becker33 merged 6 commits intodevelopfrom
concretization-config

Conversation

@tgamblin
Copy link
Copy Markdown
Member

@tgamblin tgamblin commented Jan 17, 2022

We want to make --reuse the default concretization strategy, but to do that we need to have some configuration so that users can revert if they need to. Also, the concretizer is going to grow to have many more options, and we really need some structured place to specify them.

We currently have a couple places where concretizer options are specified (aside from the preferences in packages.yaml):

  1. The config:concretizer option. This chooses the solver (clingo or original).
  2. The concretization option in environments. This is not a top-level config section -- it's just for environments.

Extending either of these is awkward -- both are simple strings and we'd need to deprecate (or work around) the old config options to expand them into more extensive YAML options. Also, we'd eventually like to have one place where concretizer configuration is specified.

To avoid overlapping with either of these and to allow the most extensibility in the future, this adds a new concretizer config section that can be used in and outside of environments. There are two options right now: reuse and minimal. This can expand later. My expectation is that we will soon deprecate config:concretizer and warn when the user doesn't use clingo, and we will eventually (sometime later) move the together / separately options from concretization into the top-level concretizer section.

concretizer.yaml currently looks like this:

concretizer:
  # Whether to consider installed packages or packages from buildcaches when
  # concretizing specs. If `true`, we'll try to use as many installs/binaries
  # as possible, rather than building. If `false`, we'll always give you a fresh
  # concretization.
  reuse: false

Stay tuned for additional options like --minimal.

And there are two options on spack install, spack spec, etc.:

concretizer arguments:
  --fresh               do not reuse installed deps; build newest configuration
  --reuse               reuse installed dependencies/buildcaches when possible

--fresh is there for when we make --reuse the default. I could not think of a more intuitive way to express what Spack's default install routine, but here is the rationale -- see if you like it:

  1. Most package managers would call Spack's current default an "upgrade" install -- it prefers the latest versions. However, it's more than that -- it looks at whatever preferences you've set up, and it may change variants and things based on what's in the repo. So --upgrade didn't fully capture it (you might be "side-grading" or just re-concretizing).
  2. The default install tends to create new builds. I thought about --new for "new installation", but it might also reuse if there's a hash match, so --new didn't seem completely right.
  3. People who do not like --reuse -- e.g., people with CI use cases or nix/guix purists -- would call the default Spack install "deterministic", because it doesn't consider installation state. It's reproducible for the same spack version and packages. --deterministic and --reproducible seem too technical.

So I settled on --fresh, because I think it captures what we're doing -- making a fresh install from the latest packages. It also gives us a term we can use pretty easily.

When we make --reuse the default, it'll make more sense, as Spack will tend to reuse what you already have, and you may want to start with a completely --fresh install once those installations become too old.

There are a few additional technical refactors in here to make this stuff easier; the details of those are in the commit messages.

  • refactor spack.solver.asp.solve() into a class
  • add a concretizer config section and default concretizer.yaml
  • fix up how we handle mock configuration in conftest.py
  • introduce a ConfigSetAction to automatically set config options with argparse
  • introduce add_concretizer_args() method to make concretizer arg handling consistent across commands
  • rework call chain from commands -> solver to reduce the number of arguments and rely on config
  • introduce --fresh concretizer arg
  • docs

@adamjstewart
Copy link
Copy Markdown
Member

I actually prefer --deterministic or --reproducible over --fresh. Like you said, the people who will use this option are those who care about determinism/reproducibility. --fresh is kinda ambiguous much like --dirty.

Looking at this and #28470, I think it would be helpful to formulate as many possible concretization use cases as we possibly can. As of now, it's unclear to me if it makes sense to list these as booleans keys (i.e. some of these may not be compatible, or may not be binary). Here are some concretization schemes I think would be useful to have (names TBD):

Determinism

  • reuse none: spec does not depend on what is already installed (old behavior)
  • reuse all: reuse already installed packages whenever possible (new behavior)
  • reuse deps-only: reuse dependencies, but upgrade root specs (my desired behavior)
  • pure chaos: make all decisions randomly, good test of dependency constraints 😈

I think "reuse all" is a good default here.

Packages

  • minimal builds all: spec with the fewest nodes (useful for short build time)
  • minimal builds source: spec with fewest builds from source, buildcache installs don't matter
  • maximal builds all: spec with the most nodes (useful to stress test spack)
  • maximal builds source: spec with most builds from source

Versions

  • maximal versions all: install the newest versions of all packages (old behavior)
  • maximal versions root: install the newest versions of root specs, reuse installed (my desired behavior)
  • minimal versions all: install the oldest versions of all packages (useful for CI)
  • minimal versions root: install the oldest versions of root specs, newest version of deps (useful for CI)

See pypa/pip#8085 for some use cases for and debate on the minimal versions concretization option in pip. I would actually love this to ensure that my documented minimal supported versions remain up-to-date. This would also help to test Spack packages.

Compilers

  • single compiler: current behavior to minimize compiler conflict
  • reuse compiler: allow mixing and matching compilers or mixing and matching compiler flags to minimize rebuilds (dangerous)
  • system compiler: use system compiler for build deps, default/specified compiler for everything else (separate concretization of build deps PR)

Variants

  • minimal variants: disable all boolean variants, none value in multi-valued variants (useful for testing that deps are correct)
  • maximal variants: enable all boolean variants, all values in multi-valued variants (useful for populating buildcache with most powerful binaries)

For minimal variants, imagine just how many packages would break if you turned off all of the Python variants, we already hit this with packages like OpenCV/boost. For maximal variants, imagine distributing PyTorch/TF binaries built for every supported cuda_arch.

Architecture

  • optimized: only use binaries built for this specific microarchitecture
  • unoptimized: allow unoptimized binaries to avoid unnecessary builds

If you can think of any other possible use cases, feel free to update this comment.

Seeing these all laid out, it's unclear if it makes sense to have separate flags/keys for all of these or just a few flags/keys like --concretization-strategy=reuse-root. Of course, we don't necessarily have to support all of these, but I've already mentioned at least a few that I personally want that we don't yet have, and I'm sure others have more specific use cases.

@tgamblin tgamblin force-pushed the concretization-config branch from 4555ed2 to 2d101d9 Compare January 18, 2022 04:08
@tgamblin
Copy link
Copy Markdown
Member Author

tgamblin commented Jan 18, 2022

@adamjstewart: I actually like all of these. They're all doable. Some of them (like compilers) require some information that we don't have yet (specifically, compatibility between compilers and compiler versions for different languages). It might be worth converting this list to a discussion someplace so we can keep it around.

My worry is that we can't plan for all this. I can't really think of ways they won't work together (except minimizing and maximizing the same thing), so I'm not as worried about that, but naming them is going to be hard, and we need to get reuse as the default soon, so that we can switch to full hash and fix the current somewhat thrashy CI situation. E4S, it turns out, has around 30 builds that are in conflict with some other build -- in the sense that they share a DAG hash but not a full hash. We need that deployment model to start matching up -- we can't ignore build deps in our installs anymore, but I don't want to inflict that on users without default --reuse.

I would argue that --reuse and --fresh are special enough that they could have special options. Can we move the "how many kinds of optimization are there" discussion to #28470 and work on getting this change in so that we can make --reuse default sooner?

I think if we do that, the only decision left here is "what do we call --fresh. I have some opinions on that:

  • I don't think --deterministic is good because people will use it on develop and they'll say "it's not deterministic".
  • I don't think --reproducible is good because we still don't do fully reproducible builds in Spack -- the environment can matter.
  • I think --new is about as good as --fresh, but @becker33 didn't like --new for the reasons mentioned above.
  • I have really started liking --fresh because we can define it. I don't think Spack's default install mode right now is much like anything else.

One of the reasons I like --fresh is because it maps to a lot of use cases:

  • In CI, it means we're testing the latest configuration.
  • In an env or in regular Spack, it means we're upgrading -- but not just versions.
  • I think --latest could be good but it sort of implies just versions.
  • I think it works well in conversation, especially with a link to a precise definition in the docs.

Do you have other suggestions?

@adamjstewart
Copy link
Copy Markdown
Member

Yep, I definitely agree we should make --reuse the default soon, just trying to help with the naming discussion. Like you said, we don't want to have to support multiple naming schemes at the same time, so it's better to get this right the first time.

I can't really think of ways they won't work together (except minimizing and maximizing the same thing), so I'm not as worried about that

What about "reuse all" vs "reuse deps" vs "reuse none"? That set alone makes me wonder whether boolean key-value pairs make sense. We could have separate reuse: true/false, deps_only: true/false settings, but that wouldn't work if you want to reuse deps but maximize root spec versions. I think we're going to need a more complicate naming scheme than the one you're proposing here.

Another alternative to --fresh: --no-reuse. It isn't clear that --fresh and --reuse are opposites and mutually exclusive.

@scottwittenburg
Copy link
Copy Markdown
Contributor

It isn't clear that --fresh and --reuse are opposites and mutually exclusive.

I agree it's not intuitive that these are mutually exclusive, and that's the main thing I don't like about --fresh. I think I prefer --no-reuse, but based on the description it seems it might not be accurate. The current behavior is basically --fresh and we already get some reuse from that, so --no-reuse is slightly misleading. Is that the main argument against it?

@nilsvu
Copy link
Copy Markdown
Contributor

nilsvu commented Jan 19, 2022

For what it's worth, I like --fresh. It captures the user's intention pretty well (plus the more thought-out reasons given by @tgamblin). Whichever name you end up choosing, I would like to suggest adding a -U shorthand for the option, since it often has upgrade semantics and users are familiar with this option from package managers like pip.

Also, @adamjstewart I love your suggestion for a pure chaos config option. It would be super useful for CI testing.

@alalazo
Copy link
Copy Markdown
Member

alalazo commented Jan 19, 2022

I concur that --update, --new, --deterministic and --reproducible are all imprecise for the reasons explained in #28468 (comment). The --no-reuse option is not bad, but it is also imprecise since we could reuse on a hash match. Would that work if we turned it into the slightly longer --disregard-reuse?

@alalazo
Copy link
Copy Markdown
Member

alalazo commented Jan 19, 2022

All in all, I like both --no-reuse (or slight variation on the same theme) or --fresh.

@nilsvu
Copy link
Copy Markdown
Contributor

nilsvu commented Jan 19, 2022

Another suggestion: go with @adamjstewart's suggestion above right away(#28468 (comment)) and make the CLI option something like --reuse=none or --reuse=all.

@ashermancinelli
Copy link
Copy Markdown
Contributor

+1 for --no-reuse but a poll in the Spack slack channel seems in order. The full --concretization-strategy=foo is ultimately what I want, and would be extremely helpful for CI as has already been mentioned.

@aweits
Copy link
Copy Markdown
Contributor

aweits commented Jan 19, 2022

--deprioritize-reuse is a thought that just occurred to me...

@tgamblin
Copy link
Copy Markdown
Member Author

tgamblin commented Jan 19, 2022

I kicked off the poll with --no-reuse and --fresh, as we got a few votes for each of these here and on the call this morning (and I'm trying to keep it simple). Vote in Slack!

@tgamblin
Copy link
Copy Markdown
Member Author

@adamjstewart::

What about "reuse all" vs "reuse deps" vs "reuse none"?

I think we can handle cases like this with, e.g.:

Now:

concretizer:
    reuse: true

Later:

concretizer:
    reuse:
        from: [env, global, caches]
        only: [deps]

reuse: true would still imply a default configuration. We did this with view: true vs. view: with a full projection specification in environments already -- I think it's relatively easy to do this for concretizer prefs, too.

I know I said it was hard to transition from a string to a dict above, but it's fairly easy to do when you're expanding on the true case. config:concretizer is harder to transition this way b/c we want to deprecate original and eventually move it out of config. Similar for concretization: together in the env -- the semantics don't telescope as naturally. I think it can work at least for reuse.

@tgamblin tgamblin force-pushed the concretization-config branch from ae443f1 to 42db1d7 Compare January 20, 2022 22:07
@tgamblin
Copy link
Copy Markdown
Member Author

@adamjstewart: ok based on the Slack poll, I think --fresh won.

I think the last thing that we might need here is docs. Anything else missing in your view?

Screen Shot 2022-01-20 at 2 08 15 PM

@tgamblin tgamblin force-pushed the concretization-config branch from bec3d65 to b61efd6 Compare February 15, 2022 19:00
The solver has a lot of configuration associated with it. Rather
than adding arguments to everything, we should encapsulate that
in a class. This is the start of that work; it replaces `solve()`
and its kwargs with a class and properties.
The concretizer is going to grow to have many more configuration,
and we really need some structured config for that.

* We have the `config:concretizer` option that chooses the solver,
  but extending that is awkward (we'd need to replace a string with
  a `dict`) and the solver choice will be deprecated eventually.

* We have the `concretization` option in environments, but it's
  not a top-level config section -- it's just for environments,
  and it also only admits a string right now.

To avoid overlapping with either of these and to allow the most
extensibility in the future, this adds a new `concretizer` config
section that can be used in and outside of environments. There
is only one option right now: `reuse`.  This can expand to include
other options later.

Likely, we will soon deprecate `config:concretizer` and warn when
the user doesn't use `clingo`, and we will eventually (sometime later)
move the `together` / `separately` options from `concretization` into
the top-level `concretizer` section.

This commit just adds the new section and schema. Fully wiring it
up is TBD.
Config scopes were different for `config` and `mutable_config`,
and `mutable_config` did not have a command line scope.

- [x] Fix by consolidating the creation logic for the two fixtures.
`--reuse` was previously handled individually by each command that
needed it. We are growing more concretization options, and they'll
need their own section for commands that support them.

Now there are two concretization options:

* `--reuse`: Attempt to reuse packages from installs and buildcaches.
* `--fresh`: Opposite of reuse -- traditional spack install.

To handle thes, this PR adds a `ConfigSetAction` for `argparse`, so
that you can write argparse code like this:

```
     subgroup.add_argument(
        '--reuse', action=ConfigSetAction, dest="concretizer:reuse",
        const=True, default=None,
        help='reuse installed dependencies/buildcaches when possible'
     )
```

With this, you don't need to add logic to pull the argument out and
handle it; the `ConfigSetAction` just does it for you. This can probably
be used to clean up some other commands later, as well.

Code that was previously passing `reuse=True` around everywhere has
been refactored to use config, and config is set from the CLI using
a new `add_concretizer_args()` function in `spack.cmd.common.arguments`.

- [x] Add `ConfigSetAction` to simplify concretizer config on the CLI
- [x] Refactor code so that it does not pass `reuse=True` to every function.
- [x] Refactor commands to use `add_concretizer_args()` and to pass
      concretizer config using the config system.
* Document `concretizer.yaml`, `--reuse`, and `--fresh`.
@tgamblin tgamblin force-pushed the concretization-config branch from b61efd6 to dcb387f Compare February 15, 2022 19:23
@tgamblin tgamblin requested a review from becker33 February 15, 2022 19:23
@tgamblin tgamblin force-pushed the concretization-config branch from 5baccbe to e584f4c Compare February 15, 2022 20:08
@tgamblin tgamblin requested a review from becker33 February 15, 2022 20:09
@tgamblin tgamblin force-pushed the concretization-config branch from e584f4c to dd7e562 Compare February 15, 2022 20:30
becker33
becker33 previously approved these changes Feb 15, 2022
@tgamblin tgamblin force-pushed the concretization-config branch 4 times, most recently from 064cc77 to 423c599 Compare February 16, 2022 01:35
becker33
becker33 previously approved these changes Feb 16, 2022
Reuse previously was a very invasive change that required parameters to be added to all
the methods that called `concretize()` on a `Spec` object. With the addition of
concretizer configuration, we can use the config system to simplify this argument
passing and keep the code cleaner.

We decided that concretizer config options should be read at `Solver` instantiation
time, and if config changes between instnatiation of a particular solver and
`solve()` invocation, the `Solver` should use the settings from `__init__()`.

- [x] remove `reuse` keyword argument from most concretize functions
- [x] refactor usages to use `spack.config.override("concretizer:reuse", True)`
- [x] rework argument passing in `Solver` so that parameters are set from config
      at instantiation time
@becker33 becker33 merged commit b1ff9c0 into develop Feb 16, 2022
@becker33 becker33 deleted the concretization-config branch February 16, 2022 18:17
Comment on lines +569 to +572
# If we're only doing setup, just return an empty solve result
if setup_only:
return Result(specs)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is wrongly positioned, we should exit after we generated the error facts. I'll submit a bug fix asap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

commands concretization configuration defaults documentation Improvements or additions to documentation environments tests General test capability(ies)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants