Skip to content

Make system config path overridable#26341

Closed
haampie wants to merge 1 commit intospack:developfrom
haampie:feature/add-SPACK_SYSTEM_CONFIG_PATH-env-variable
Closed

Make system config path overridable#26341
haampie wants to merge 1 commit intospack:developfrom
haampie:feature/add-SPACK_SYSTEM_CONFIG_PATH-env-variable

Conversation

@haampie
Copy link
Copy Markdown
Member

@haampie haampie commented Sep 29, 2021

On shared systems with a system-wide versioned programming environment
where the user can switch between different versions (and test the
latest the greatest when they feel like it), it's convenient to provide
a config for compilers and packages depending on the programming
environment version. It'd also be nice if the user could use their own
version of Spack on their favorite commit, and pick up config without
having to back up their user config and replacing it with something
temporarily when switching programming environment versions.

With this change, people managing the system can provide compilers.yaml
and packages.yaml versioned with the programming environment:

/path/to/programming/environment/1.2.3/{compilers,packages}.yaml

and set SPACK_SYSTEM_CONFIG_PATH to point to this dir.

In particular, on Cray systems this variable could be set in a module
called spack-system-config, and the user could opt in to use it, and
doesn't have to know about whatever programming environment verison they
are in.

As a bonus, if somebody would ever put Spack config files in
/etc/spack, the user can now opt out of that by setting

SPACK_SYSTEM_CONFIG_PATH=/non-existing/dir

@spackbot-app spackbot-app bot added the tests General test capability(ies) label Sep 29, 2021
On shared systems with a system-wide versioned programming environment
where the user can switch between different versions (and test the
latest the greatest when they feel like it), it's convenient to provide
a config for compilers and packages depending on the programming
environment version. It'd also be nice if the user could use their own
version of Spack on their favorite commit, and pick up config without
having to back up their user config and replacing it with something
temporarily when switching programming environment versions.

With this change, people managing the system can provide compilers.yaml
and packages.yaml versioned with the programming environment:

```
/path/to/programming/environment/1.2.3/{compilers,packages}.yaml
```

and set `SPACK_SYSTEM_CONFIG_PATH` to point to this dir.

In particular, on Cray systems this variable could be set in a module
called `spack-system-config`, and the user could opt in to use it, and
doesn't have to know about whatever programming environment verison they
are in.

As a bonus, if somebody would ever put Spack config files in
`/etc/spack`, the user can now opt out of that by setting

```
SPACK_SYSTEM_CONFIG_PATH=/non-existing/dir
```
@haampie haampie force-pushed the feature/add-SPACK_SYSTEM_CONFIG_PATH-env-variable branch from a64d20f to f210b1f Compare September 29, 2021 15:48
@scheibelp scheibelp self-assigned this Sep 29, 2021
@scheibelp
Copy link
Copy Markdown
Member

We spoke yesterday about this and I wanted to list the alternatives discussed:

  • Using -C with spack (as in spack -C <dir> ...): the issue with this is that scopes added this way take priority over most other config scopes
  • Updating environments to allow loading a module when activating the environment: I think this appealed the most to the Spack team. In general it has the disadvantage that users have to remember to spack env activate ... where before they would only need to use a more familiar module load prgenv...
  • One possible solution we did not discuss: environments are able to include external config directories. If we could dereference shell variables in these entries, this would also be a means to have prgenv-specific configs
  • One possible solution we did not discuss: we currently have architecture-specific scopes that we automatically check for. We could further specify prgenv-specific scopes that we automatically look for (e.g. in the same way we check ~/.spack/ for a linux subdir, we could also check it for a prgenv-* subdir which matches the current loaded prgenv).

Issues with this approach

  • It is another, different, mechanism to update the config: ideally we have fewer ways to do one thing
  • We might remove the system config scope at some point in the future: we would prefer not to add features that depend on it

WRT the latter point, I mentioned that if you didn't mind this being removed in the future that we could add it. However I'm trying to consider whether that's healthy for the Spack code base.

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Sep 30, 2021

One possible solution we did not discuss: we currently have architecture-specific scopes that we automatically check for. We could further specify prgenv-specific scopes that we automatically look for (e.g. in the same way we check ~/.spack/ for a linux subdir, we could also check it for a prgenv-* subdir which matches the current loaded prgenv).

The issue I see with this is that Spack has to be aware of these environments, which may be undesirable, especially if there are other distro's with similar concepts for versioned programming environments (I just know of CentOS which has software collections allowing you to do e.g. scl enable devtoolset-9 bash to get a shell in a GCC 9 development environment).

Environments including config based on a variable are a nice and generic solution, but I'm afraid that apart from forgetting about activating an environment, the learning curve is too high. New users start with spack install xyz, not with spack env activate ..., so the default experience for new users would be bad (they'd start compiling mpich instead of using system cray-mpich or so).

ideally we have fewer ways to do one thing

Agreed, I didn't want to introduce yet another config scope for instance :p the system scope seems like a perfect fit, and is generally not used anyways.

If you didn't mind this being removed in the future that we could add it

I think I can live with it, when system / user config would be dropped, people would be forced to use environment config, right?


Btw, I just noticed that #23760 is essentially the same PR, but for user config. Also making that path overidable with an env var.

Actually, it's the equivalent of #11951. Even the variable name is exactly SPACK_<scope>_CONFIG_PATH.

Apparently many people want to override default paths with variables... the other PRs are from the perspective of the user working on multiple systems with a shared home folder, this PR is from the perspective of the system administrators providing multiple environments on a single system.

Also note that the behavior of this PR was requested in this comment: #23760 (comment) and got an upvote ;)


I think I'd be a big fan to have both these SPACK_SYSTEM_CONFIG_PATH and SPACK_USER_CONFIG_PATH pr's merged and then later superseded by PRs dropping these config scopes altogether... I realize I've also got scripts that set HOME to some temporary dir to avoid that Spack picks up config from ~ if the same home folder is mounted on a different system. It'd be great if I could set SPACK_USER_CONFIG_PATH=/nonexistent/dir/ instead, since tweaking HOME is a hack...

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Oct 5, 2021

@scheibelp here's a reason why environment includes are a bit awkward:

spack:
  include:
  - ${SPACK_SYSTEM_CONFIG_PATH}/config.yaml
  - ${SPACK_SYSTEM_CONFIG_PATH}/packages.yaml
  ...

If SPACK_SYSTEM_CONFIG_PATH is not set, then:

==> Error: Detected 1 missing include path(s):
   /path/to/environment/folder/${SPACK_SYSTEM_CONFIG_PATH}/config.yaml
Please correct and try again.

SPACK_SYSTEM_CONFIG_PATH must be set or it is taken as a literal value and you end up with a non-existing path; the yaml files have to be listed explicitly (what if there is only packages.yaml but no config.yaml?)

So, basically I'll have to go back to my environment file and modify it, and I have to actually check what config files the system provides.

That would not be necessary if SPACK_SYSTEM_CONFIG_PATH changed the system scope config path, since if it was not set, it would default to /etc/spack, and if it was set to a non-existing dir, it would not error.

Also, it seems like environment includes are not very well battle-tested... it's outputting daunting looking & redundant messages on concretization:

==> Warning: included configuration files should be updated manually [files=${SPACK_SYSTEM_CONFIG_PATH}/packages.yaml]

@scheibelp
Copy link
Copy Markdown
Member

scheibelp commented Oct 5, 2021

ideally we have fewer ways to do one thing

Agreed, I didn't want to introduce yet another config scope for instance :p the system scope seems like a perfect fit, and is generally not used anyways.

Interesting: I would actually say that adding another scope would be an explicit interaction with the existing API, whereas reusing an existing scope for a different purpose is a new approach. As I think on it more, I would in fact like to make this scope explicit, e.g. (in config.py):

#: Builtin paths to configuration files in Spack
configuration_paths = (
    ('defaults', os.path.join(spack.paths.etc_path, 'spack', 'defaults')),
    ('system', os.path.join(spack.paths.system_etc_path, 'spack')),
    ('site', os.path.join(spack.paths.etc_path, 'spack')),

    # I'd like to place a new scope here, that only gets placed here if $NEW_SHELL_VARIABLE is defined
    ('shell', '$NEW_SHELL_VARIABLE')

    ('user', spack.paths.user_config_path)
)

Do you think you would be able to do that? Or would that not be suitable for your purposes?

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Oct 5, 2021

But then what would you call it 😅 SPACK_ENV_VARIABLE_CONFIG_BETWEEN_SITE_AND_USER_SCOPE=... (okay, I'm pushing it a bit, but to me it seems rather abitrary that the envirnonment-variable-based-scope happens to come between site & user, whereas defauls -> system -> site -> user -> command line make a lot of sense to me)

is a new approach

another interpretation is that it's a "bugfix" for the following:

  • system scopes have the issue that they enforce one config across many spack instances & many different versions
  • user scopes have the issue that they enforce one config across many spack instances of a single user, and across multiple machines of different architectures with the same home folder

there should be a way to opt out, and SPACK_[SYSTEM|USER]_CONFIG_PATH is a simple fix until those scopes are removed altogether. Overriding at the environment level with x:: is another solution, but it has drawbacks, in particular having to use an environment, and having to repeat all the config from the layers below the one you're trying to opt out of.

@scheibelp
Copy link
Copy Markdown
Member

scheibelp commented Oct 5, 2021

But then what would you call it 😅 SPACK_ENV_VARIABLE_CONFIG_BETWEEN_SITE_AND_USER_SCOPE

Don't tempt me: I love long variable names

That being said, the scope itself I already proposed a name for: the shell scope. For the environment variable I would suggest SPACK_SHELL_SCOPE_DIRECTORY.

system scopes have the issue that they enforce one config across many spack instances & many different versions

There is in fact some config that you might want to share with everyone on a system:

  • Mirrors
  • Package repositories
  • Several options in config.yaml e.g. pointing users to the appropriate temp filesystem for stages

user scopes have the issue that they enforce one config across many spack instances of a single user, and across multiple machines of different architectures with the same home folder

I agree with this, but I'm not sure how it's relevant because this PR doesn't do anything about the user scope; I don't think it should since there are other concerns (i.e. why we have it in the first place, our plans to get rid of it, and what we need to do before we get rid of it).

okay, I'm pushing it a bit, but to me it seems rather abitrary that the envirnonment-variable-based-scope happens to come between site & user, whereas defauls -> system -> site -> user -> command line make a lot of sense to me

  • I argued above that you may have both Prgenv-specific and system-wide preferences: if you end up agreeing that they should both exist, then this probably has higher priority than the system scope
  • We want to allow users to have control over and be able to override the scope (so it has lower priority than the user scope)

So while I don't know if I have a strong opinion on where it should reside exactly, those two considerations establish the boundaries.

@scheibelp
Copy link
Copy Markdown
Member

We discussed this today in https://github.com/spack/spack/wiki/Telcon:-2021-10-06 and it came up that using env variables to override scopes may be controversial (although I still think using an env variable to add a new scope should be OK). It was suggested we check in on slack with @tgamblin

@cosmicexplorer
Copy link
Copy Markdown
Contributor

Regarding the above:

it came up that using env variables to override scopes may be controversial (although I still think using an env variable to add a new scope should be OK)

here are some thoughts I have which are adapted from my experience with the pants build tool.

Multiple config sources precedence

Pants has no concept of "system" or "site" config, and instead manages precedence within config files very simply: there's no precedence hierarchy to search through, instead it:

  • searches for pants.toml in the current directory,
  • on the command line, --pants-config-files=special-config-1.toml --pants-config-files=special-config-2.toml will apply the config variables from each successive file in order (so special-config-2.toml would override special-config-1.toml, and special-config-1.toml would override pants.toml),
  • and the config search path can be overwritten entirely with --pants-config-files=["special-config-1.toml"], which would ignore the default pants.toml (however, I'm not proposing this--see proposal below).

And, importantly, PANTS_CONFIG_FILES can be specified in the process environment, so that the selection of config files doesn't have to be provided on every single command line. In particular, I'm thinking about:

the other PRs are from the perspective of the user working on multiple systems with a shared home folder, this PR is from the perspective of the system administrators providing multiple environments on a single system.

It seems like this describes a conflict arising from spack's currently-hardcoded config precedence.

So attempting to apply that to the current issue, regarding:

Using -C with spack (as in spack -C <dir> ...): the issue with this is that scopes added this way take priority over most other config scopes

I'm thinking that if we can specify a set of config files (for spack, likely a list of directories of config files) in order within an environment/config scope to denote the linear precedence, we could avoid confusion regarding what gets overridden.

Proposal: configurable config precedence

Riffing off of @scheibelp's comment at #26341 (comment), I'm proposing the following.

If site admins could do e.g. this in the site config:

config:
    config-files-after:
    - extra-site-config-dir: "/etc/shared-stuff/.spack"

(where extra-site-config-dir is processed immediately after site config)

and a user could do this in their user config:

config:
    config-files-before:
    - shared-home-dir: "/home/shared-stuff/.spack"

(where shared-home-dir is processed immediately before the user config)

and a user could do this in their environment config:

spack:
    config-files-before:
    - some-specific-site-config-dir-provided-by-site-admins: "/etc/special-site-config"
    - this-repo-config: "./repo-config"

(where these are processed immediately before the env config)

Then we can enable all four of:

  1. site admins can split up their static, unchanging config for all users across separate dirs for readability (see extra-site-config-dir)
  2. site admins can provide configuration for specific use cases that doesn't automatically propagate to all spack users on the site (see some-specific-site-config-dir-provided-by-site-admins)
  3. individual users can refer to a shared home directory with precedence behind any of their user-specific config (see shared-home-dir)
  4. spack environments can selectively turn on and off specific site configurations (see some-specific-site-config-dir-provided-by-site-admins) as well as refer to spack configuration shared across environments located within the same code repository (see this-repo-config)

And importantly, none of this actually changes the precedence of config scopes, but merely allows specific config scopes to refer to more than one place--see below on bootstrap option parsing for why this is a useful result.

Bootstrap option parsing

As in pants, I believe this would require processing the config-files-{before,after} variables before other config variables. Pants refers to these as "bootstrap" options and has a specific option-processing phase for them: see https://github.com/pantsbuild/pants/blob/main/src/python/pants/option/options_bootstrapper.py. Note that pants is not able to understand pants_config_files being set within a config file itself--since spack relies more on config files than pants does, I think it's reasonable for us to go beyond pants's architectural decision in this case (see #25863 for more distinctions between spack and pants argument parsing).

This proposal initially had just a config-files variable which could be overridden entirely, but modified it to use config-files-{before,after} in response to the telcon notes:

Related: if we use env variables to control user/system scopes, then this also offers an option to "avoid" using a scope

  • Todd apparently had objections to prior PRs that do this

I think this is indeed better, since config-files:: doesn't allow prepending as with config-files-before, and because being able to overwrite config-files: entirely would probably make the proposed bootstrap option parsing significantly more confusing and complex.

Shell interpolation

Regarding this comment from @haampie:

Apparently many people want to override default paths with variables...

we could even extend this syntax to support interpolating shell variables:

spack:
    config-files-before:
    - this-env-config: "${MY_SPECIAL_SPACK_STUFF}/config-dir"
      env_var_dirs:
      - MY_SPECIAL_SPACK_STUFF

where env_var_dirs would be a special key that would ensure spack would check for the existence of the process environment variable and error out if it's not provided or doesn't exist. This addresses the concern from @haampie in:

SPACK_SYSTEM_CONFIG_PATH must be set or it is taken as a literal value and you end up with a non-existing path; the yaml files have to be listed explicitly (what if there is only packages.yaml but no config.yaml?)

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Oct 8, 2021

@cosmicexplorer I was hoping for a quick fix 😆 (I'll read your message, but I was thinking more about a temporary solution until we have the system/config scopes dropped altogether, which seems to be one of the plans in other discussions)

@sleak-lbl
Copy link
Copy Markdown
Contributor

adding my support to @haampie 's stopgap approach: I wrote an almost-identical patch to our local install to allow the same Spack instance (on a sitewide filesystem) to get the appropriate compilers.yaml and packages.yaml for the cluster being used (and set SPACK_SYSTEM_CONFIG_PATH in the modulefile) .. and it works nicely

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Oct 26, 2021

closing in favor of #26735

@haampie haampie closed this Oct 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tests General test capability(ies)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants