Refactor initialization and configuration to remove globals#7774
Refactor initialization and configuration to remove globals#7774
Conversation
In #2686, I made command-line configuration scopes. This allowed one to stack multiple configuration scopes specified on the command line. It's a feature I consider essential for environments, which you DON'T want the environment to rely on particular settings in the user's ~/.spack scope. When I wrote #2686, I became aware of the complexity in |
|
Until configuration scopes are implemented properly in #4585, the following quickie command-line utility would be REALLY useful: |
|
I take it back on the config scopes. I think that if we have appropriate config scope support in Spack Environments, we don't need it elsewhere. And we are almost there. |
scheibelp
left a comment
There was a problem hiding this comment.
Some preliminary comments/questions.
I don't have a thorough understanding of everything here yet.
| import argparse | ||
|
|
||
| import llnl.util.tty as tty | ||
|
|
There was a problem hiding this comment.
Would it be possible to move pure style changes like this into a separate commit?
| # Construct paths to special files in the archive dir used to | ||
| # keep track of whether patches were successfully applied. | ||
| archive_dir = self.stage.source_path | ||
| good_file = join_path(archive_dir, '.spack_patched') |
There was a problem hiding this comment.
These style changes (replacing join_path with os.path.join) are currently mixed into 2557ea6 - could they go in their own commit?
There was a problem hiding this comment.
I don't think this particular case is important (2557ea6 is simple even with that mixed in) but generally prefer that pure-style changes go in their own commits (and replacing join_path is close enough to a pure-style change to qualify IMO).
lib/spack/spack/__init__.py
Outdated
| # | ||
| # TODO: move all of these imports out of __init__ to avoid importing the whole | ||
| # TODO: world on Spack startup. There are some design changes that need to be | ||
| # TODO: made to enable this (decoupling Spec, repo, DB, and store state). |
There was a problem hiding this comment.
Is this TODO stale? If not then that sounds like a fair amount of additional work.
| """Get a config section or a single value from one. | ||
|
|
||
| Accepts a path syntax that allows us to grab nested config map | ||
| entries. Getting the 'config' section would look like:: |
There was a problem hiding this comment.
would look like:: -> would look like:
|
|
||
| def __init__(self, *args, **kwargs): | ||
| kwargs['default'] = spack.dirty | ||
| kwargs['default'] = spack.config.get('config:dirty') |
There was a problem hiding this comment.
This syntax (e.g. config:dirty) appears to be new. I had to scan through the commit messages to find 2b0c8f6 which explains it's purpose. Presumably the line
Simplify the spack.config interface to allow getting/setting either entire YAML files or specific parts (single variables).
is relevant? IMO it's worth mentioning the syntax there.
|
|
||
| spack.config.get('config') | ||
|
|
||
| and the ``dirty`` section in the ``config`` scope would be:: |
There was a problem hiding this comment.
Did you mean to call this the config scope here?
EDIT: I was assuming this would be the config section
lib/spack/spack/config.py
Outdated
|
|
||
| This is a singleton; it constructs one instance associated with this | ||
| module and returns it. It is bundled inside a function so that | ||
| configuratoin can be initialized lazily. |
| 'build_stage': ['path1', 'path2', 'path3']}} | ||
| # repos | ||
| def test_write_list_in_memory(config): | ||
| spack.config.set('repos', repos_low['repos'], scope='low') |
There was a problem hiding this comment.
Do you have to test spack.config vs. a Configuration object?
| return clist | ||
|
|
||
| def find_compiler(self, cmp_cls, *paths): | ||
| # function-local so that cnl doesn't depend on spack.config |
There was a problem hiding this comment.
What happens if this depends on spack.config?
There was a problem hiding this comment.
Still curious about this. I assume it has more to do with cyclic dependencies than initialization of state, at least for spack.spec. I'm generally wary of function-local imports - I suspect they are more difficult to maintain.
alalazo
left a comment
There was a problem hiding this comment.
I had so far a partial read of the PR, left a few comments, in particular on config.py
| throughout Spack and should bring in a minimal number of external | ||
| dependencies. | ||
| """ | ||
| import os |
There was a problem hiding this comment.
Minor nitpick: os.path is documented as a module, and usually I import it directly if I need to use any of its functions. Looking for best practices took me down into another python rabbit-hole.
| ('site', os.path.join(spack.paths.etc_path, 'spack')), | ||
|
|
||
| # User configuration can override both spack defaults and site config | ||
| ('user', spack.paths.user_config_path) |
There was a problem hiding this comment.
Not sure we should do it in this PR, but I think it's worth swapping user scope with site scope, and make site the default for configuration. Based on my experience using multiple Spack versions at the same time, writing by default to user config is very likely to create conflicts if the configuration file format changed in the meanwhile.
There was a problem hiding this comment.
I think this is good but for another PR.
| #: Hard-coded default values for some key configuration options. | ||
| #: This ensures that Spack will still work even if config.yaml in | ||
| #: the defaults scope is removed. | ||
| config_defaults = { |
| _validate_section(data, section_schemas[section]) | ||
|
|
||
| def __repr__(self): | ||
| return '<InternalConfigScope: %s>' % self.name |
There was a problem hiding this comment.
Isn't __repr__ usually supposed to return a string such that:
b = eval(repr(a))
assert b == a?
| ConfigScope('site/%s' % _platform, os.path.join(_site_path, _platform)) | ||
| If ``scope`` is ``None`` or not provided, return the merged contents | ||
| of all of Spack's configuration scopes. If ``scope`` is provided, | ||
| return only the confiugration as specified in that scope. |
lib/spack/spack/config.py
Outdated
|
|
||
| We use ``:`` as the separator, like YAML objects. | ||
|
|
||
| TODO: Currently only handles maps. Think about lists if neded. |
There was a problem hiding this comment.
Do you mean lists of objects? Something like:
spack.config.get('config:tests:0:url')
to get the url key from the item at index 0 of a list of things that looks like:
[{ url: 'foo' },
{ url: 'bar'}]? Or are you referring to another behavior that is not handled?
There was a problem hiding this comment.
I mean something like that, but I think we'd need to differentiate the syntax, e.g.:
spack.config.get('config:tests[0]:url')The keys following : are assumed to be strings, and result in index[key] in the Python code. The indexes for lists are integers, and you'd need to convert them before indexing. Yes, we could do that by looking at the type of structure we're indexing, but I think the above is clearer anyway because you cana clearly see what's expected to be a list and what's expected to be a dict in the path specifier.
I'm basically just saying we're not doing anything to address the list case currently, so you can't ask for, e.g.:
spack.config.get('repos[2]')There was a problem hiding this comment.
This function is used frequently so IMO the TODO should be a # comment vs. living in the docstring.
| value = value.get(key, default) | ||
| return value | ||
|
|
||
| def set(self, path, value, scope=None): |
There was a problem hiding this comment.
Don't know how much this would be a problem in reality, but I think the semantic of this class is kind of confusing. If I read the code correctly, by default when we get a value we retrieve a merged view of all the scopes. Instead, when we set a value, we set it in a particular scope.
Am I wrong saying that this means that:
# We set the value in the highest precedence scope by default
spack.config.set(somepath, a)
# We merge all the scopes by default
b = spack.config.get(somepath)
# This might be False, e.g. if 'somepath' is a dict spread over scopes
assert a == bmight happen? In case is this the behavior we want?
There was a problem hiding this comment.
This would never be false, unless you specified a scope in the first spack.config.set. FWIW, this new syntax doesn't actually change the semantics of working with the config system. It just makes it easier to partially update something without getting the entire dictionary for a level.
There was a problem hiding this comment.
Say we have two scopes for foo.yaml:
# first scope
foo:
bar: 5
# second scope
foo:
baz: 6Am I wrong on the following snippet:
spack.config.set('foo', {'foobar': 6})
b = spack.config.get('foo')
assert b == {'foobar': 6, 'baz': 6}?
There was a problem hiding this comment.
I see what you're saying -- I think I was wrong. I actually think b == {'bar': 5, 'foobar': 6, 'baz': 6} should be true, but we should support something like spack.config.set('foo:') for consistency (where the : means overwrite).
I guess the question is whether I should address that now or whether that can be another PR. I do think that the path syntax would be useful, e.g., for allowing these expressions to be used with spack config get.
|
|
||
|
|
||
| @contextmanager | ||
| def override(path, value): |
d96e954 to
f870f30
Compare
1484375 to
28e8f6a
Compare
|
@scheibelp @alalazo: this is ready for another review. It should be mostly done now. Note that I will be submitting some other PRs based on this one. I think this PR has gotten large enough with just the global removal that it should be split up. Probably best to look at each commit if you review this, as there are a lot of changes. |
29113cf to
674ef37
Compare
674ef37 to
0f12c70
Compare
cf6100e to
48d1a03
Compare
alalazo
left a comment
There was a problem hiding this comment.
A few more comments on the PR. I think we can use a different approach to lazy creation of objects to save some lines.
lib/spack/spack/__init__.py
Outdated
| ############################################################################## | ||
| import multiprocessing | ||
| import os | ||
| import sys |
There was a problem hiding this comment.
Is this import needed for something?
lib/spack/spack/caches.py
Outdated
| _fetch_cache = None | ||
|
|
||
|
|
||
| def misc_cache(): |
There was a problem hiding this comment.
I see you are using this pattern in several places. As a minor improvement, did you consider using something like:
class LazyObject(object):
def __init__(self, factory):
self.factory = factory
self._instance = None
@property
def instance(self):
if self._instance is None:
self._instance = self.factory()
return self._instance
def __getattr__(self, name):
return getattr(self.instance, name)
def __str__(self):
return str(self.instance)
def __repr__(self):
return repr(self.instance)? I think this could simplify a the code. For instance here we could write:
def _misc_cache_factory():
path = spack.config.get('config:misc_cache')
if not path:
path = os.path.join(spack.paths.user_config_path, 'cache')
path = canonicalize_path(path)
return FileCache(path)
misc_cache = LazyObject(_misc_cache_factory)and in client code refer to spack.caches.misc_cache instead of spack.caches.misc_cache()
lib/spack/spack/test/cc.py
Outdated
| self.cpp = Executable(join_path(spack.build_env_path, "cpp")) | ||
| self.cxx = Executable(join_path(spack.build_env_path, "c++")) | ||
| self.fc = Executable(join_path(spack.build_env_path, "fc")) | ||
| self.cc = Executable(join_path(build_env_path, "cc")) |
There was a problem hiding this comment.
To be consistent with other changes we should also change join_path to os.path.join here, and in general in other tests that have been touched by this PR. In the end we want to encourage the use of join_path only within package recipes, right?
|
|
||
| import fnmatch | ||
| import os | ||
| import fnmatch |
There was a problem hiding this comment.
Minor, but why swapping fnmatch with os? Can't we keep in general alphabetical ordering within the three groups of imports that appear in PEP8?
There was a problem hiding this comment.
I think it's a good idea but if we want that we should enforce it. I think my natural inclination is to put things like sys and os first. Can we do another PR where we alphabetize imports and put in a test to keep them that way? Seems reasonable to me to prevent import churn.
| for spec in specs: | ||
| spack.package_testing.test(spec.name) | ||
| tests = [spec.name for spec in specs] | ||
| kwargs['tests'] = tests |
There was a problem hiding this comment.
I like this change in the handling of tests! Maybe later we could extend it to accept a csv list, or someting like that, if we need the extra flexibility.
| value = value.get(key, default) | ||
| return value | ||
|
|
||
| def set(self, path, value, scope=None): |
There was a problem hiding this comment.
Say we have two scopes for foo.yaml:
# first scope
foo:
bar: 5
# second scope
foo:
baz: 6Am I wrong on the following snippet:
spack.config.set('foo', {'foobar': 6})
b = spack.config.get('foo')
assert b == {'foobar': 6, 'baz': 6}?
lib/spack/spack/config.py
Outdated
|
|
||
| def scopes(): | ||
| """Convenience function to get list of configuration scopes.""" | ||
| return config().scopes |
There was a problem hiding this comment.
Why do we need wrappers to mask the singleton? Can't we avoid this?
There was a problem hiding this comment.
@scheibelp thinks that this is more readable. I personally want to keep the wrappers to a minimum as I think it's better to have one way to access the function, but I also see the readability point. I specifically tried not to add too many wrappers, but added a few for common cases. For config in particular I think get/set read a lot better as wrappers, but I don't have too much of an opinion on this.
There was a problem hiding this comment.
My preference is consistency. Earlier we had
config.get('packages:...')
and
config.get_configuration().scopes
(get_configuration() has since been renamed to config()). Having two different ways of interacting with the singleton was confusing IMO. I think masking the singleton is OK as long as it's clear what is going on when someone looks at config.py specifically.
lib/spack/spack/modules/common.py
Outdated
|
|
||
| #: Root folders where the various module files should be written | ||
| roots = spack.config.get_config('config').get('module_roots', {}) | ||
| #: config section for this flie |
| # do any checks needed by the test suite | ||
| # return without building if only testing | ||
| if not self.unit_test_check(): | ||
| return |
There was a problem hiding this comment.
I am not sure I get the logic of this part. Is this used only within unit tests to disable building?
There was a problem hiding this comment.
@alalazo yes - perhaps the docstring for the function could be clearer?
Unit tests can override this function to perform checks after Package.install (and all post-install hooks) has run but before the database has been updated. The overridden function may indicate that the install procedure should terminate early (before updating the database) by returning
Falseor any value such thatbool(result) is False
There was a problem hiding this comment.
I think I addressed this below
| if dep: | ||
| merge = ( | ||
| # caller requested test dependencies | ||
| tests is True or (tests and self.name in tests) or |
There was a problem hiding this comment.
If we refactor this as we've discussed in the past, so that repo gets the class and not an instance,
What do you mean by this?
3d19e70 to
76136bc
Compare
- replace `spack.config.get_configuration()` with `spack.config.config()` - replace `get_config`/`update_config` with `get`, `set` - add a path syntax that can be used to refer to specific config options without firt getting the entire configuration dict - update usages of `get_config` and `update_config` to use `get` and `set`
- remove template_dirs global variable from __init__.py - also remove update_template_dirs fixture, which had no effect on test correctness
- remove variable from spack/__init__.py - clean up imports and some code structure in binary_distribution.py
- no longer require `spack_version` to be a Version (it isn't used that way anyway) - use a simple tuple `spack_version_info` with major, minor, patch versions - generate `spack_version` from the tuple
- refactor the way test dependencies are passed to the concretizer - remove global state - update tests
- rename `builtin_mock` and `refresh_builtin_mock` to the more clear `mock_packages` and `mutable_mock_packages`
- spack.repository module is now spack.repo - `spack.repo` is now `spack.repo.path()` and loaded lazily - Added `spack.repo.get()` and `spack.repo.all_package_names()` as convenience functions to simplify the new lazy interface. - updated tests and code
- spack.store was previously initialized at the spack.store module level, but this means the store has to be initialized on every spack call. - this moves the state in spack.store to a singleton so that the store is only initialized when needed.
- It turns out that jsonschema is one of the more expensive imports. - move imports of jsonschema into functions to avoid the performance hits for calls that don't need config.
- `spack.cmd.all_commands` does a directory listing on `lib/spack/spack/cmd`, regardless of whether it is needed - make this lazy so that the directory listing won't happen unless it's necessary.
- simplify the singleton pattern across the codebase - reduce lines of code needed for crufty initialization - reduce functions that need to mess with a global - Singletons whose semantics changed: - spack.store.store() -> spack.store - spack.repo.path() -> spack.repo.path - spack.config.config() -> spack.config.config - spack.caches.fetch_cache() -> spack.caches.fetch_cache - spack.caches.misc_cache() -> spack.caches.misc_cache
- Spack core has long used llnl.util.filesystem.join_path, but os.path.join is pretty much the same thing, and is more efficient. - Use os.path.join in the core Spack code from now on.
|
@alalazo @scheibelp: I've pushed some updates based on your comments, including adding @alalazo's singleton (lazyobject) and updating some comments that weren't clear. I think this is ready to go. The only thing I haven't done is update the Note: I reworded the way |
|
@scheibelp: if you think this is good i think you can rebase and merge. |
This is still WIP but it's almost done, so feel free to start reviewing. It hurt a lot but it's needed. 😄
Spack's
__init__.pyhas served us well but it's time to get rid of all the stuff in it. It's essentially a small set of global variables to manage the state of a Spack run. The fact that they are all in one top-level__init__.pyhas some serious negatives as Spack scales. Specifically:spackis loaded whenspack.anythingis imported. This means everything in__init__.pyhas to be imported for every Spack run, which causes startup time to suffer (see Remove case consistency check at startup. #7585 and Avoid stat-ing all packages at startup. #7587).__init__.pypollute the top-level Spack namespace. This will be dealt with in a future PR.The way
spack.configis currently written, it suffers from some of the same problems as (2), so it also needs to be made more lazy.This PR will do the following:
__init__.py, so that pieces of spack can be imported selectively.spack.configso that module-scope state is in aConfigurationclass. This makes it easy to overridespack.configinterface to allow getting/setting either entire YAML files or specific parts (single variables).spack.config, by adding new, internal-only configuration scopes (i.e. scopes that don't correspond to files). This unifies the way Spack handles user configuration settings and its own.__init__.pywith configuration options viaspack.configThe last TODO was to make
__init__.pyload lazily for thefrom spack import *case used bypackage.pyfiles, but I will continue this in another PR.This should allow #6903 and #7692 to move forward and should hopefully help reduce the remaining startup latency discussed in #7587. It should also make internal Spack code more clear, and give us more control going forward. Finally, it moves us a little closer to #1385.
@alalazo @scheibelp @becker33 @mwkrentel @cyrush FYI