Skip to content

Refactor initialization and configuration to remove globals#7774

Merged
scheibelp merged 25 commits intodevelopfrom
refactor/init-config
May 17, 2018
Merged

Refactor initialization and configuration to remove globals#7774
scheibelp merged 25 commits intodevelopfrom
refactor/init-config

Conversation

@tgamblin
Copy link
Copy Markdown
Member

@tgamblin tgamblin commented Apr 16, 2018

This is still WIP but it's almost done, so feel free to start reviewing. It hurt a lot but it's needed. 😄

Spack's __init__.py has served us well but it's time to get rid of all the stuff in it. It's essentially a small set of global variables to manage the state of a Spack run. The fact that they are all in one top-level __init__.py has some serious negatives as Spack scales. Specifically:

  1. spack is loaded when spack.anything is imported. This means everything in __init__.py has to be imported for every Spack run, which causes startup time to suffer (see Remove case consistency check at startup. #7585 and Avoid stat-ing all packages at startup. #7587).
  2. All the imports mean that we have very little control over initialization order in Spack, and it's easy to get circular dependencies among the top-level modules. It also means we can't force certain imports to after before certain command-line options (see spack option -C/--config to add dir to config search path #6903 and config and command-line options to disable locking #7692).
  3. The imports in __init__.py pollute the top-level Spack namespace. This will be dealt with in a future PR.

The way spack.config is currently written, it suffers from some of the same problems as (2), so it also needs to be made more lazy.

This PR will do the following:

  • Remove all module-scope initialization from __init__.py, so that pieces of spack can be imported selectively.
  • Rework spack.config so that module-scope state is in a Configuration class. This makes it easy to override
  • Simplify the spack.config interface to allow getting/setting either entire YAML files or specific parts (single variables).
  • Make it possible to handle command-line configuration settings and defaults through spack.config, by adding new, internal-only configuration scopes (i.e. scopes that don't correspond to files). This unifies the way Spack handles user configuration settings and its own.
  • Replace all globals in __init__.py with configuration options via spack.config
  • Convert significant module-scope initialization in submodules into lazy initialization through functions or classes.

The last TODO was to make __init__.py load lazily for the from spack import * case used by package.py files, but I will continue this in another PR.

This should allow #6903 and #7692 to move forward and should hopefully help reduce the remaining startup latency discussed in #7587. It should also make internal Spack code more clear, and give us more control going forward. Finally, it moves us a little closer to #1385.

@alalazo @scheibelp @becker33 @mwkrentel @cyrush FYI

@citibeth
Copy link
Copy Markdown
Member

by adding new, internal-only configuration scopes (i.e. scopes that don't correspond to files

In #2686, I made command-line configuration scopes. This allowed one to stack multiple configuration scopes specified on the command line. It's a feature I consider essential for environments, which you DON'T want the environment to rely on particular settings in the user's ~/.spack scope.

When I wrote #2686, I became aware of the complexity in __init.py. I am hopeful that this PR will reduce that complexity, and in general make re-implementing #2686 easier. But I don't plan to re-implement as before; what I ACTUALLY need is for configuration scopes (as set up in files) to be added to Spack environments (#4585). I will post more specifics there.

@citibeth
Copy link
Copy Markdown
Member

Until configuration scopes are implemented properly in #4585, the following quickie command-line utility would be REALLY useful: scope-merge a b c would read YAML files from a/, b/ and c/, merge them, and then write them to ~/spack.

@citibeth
Copy link
Copy Markdown
Member

I take it back on the config scopes. I think that if we have appropriate config scope support in Spack Environments, we don't need it elsewhere. And we are almost there.

Copy link
Copy Markdown
Member

@scheibelp scheibelp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some preliminary comments/questions.

I don't have a thorough understanding of everything here yet.

import argparse

import llnl.util.tty as tty

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to move pure style changes like this into a separate commit?

# Construct paths to special files in the archive dir used to
# keep track of whether patches were successfully applied.
archive_dir = self.stage.source_path
good_file = join_path(archive_dir, '.spack_patched')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These style changes (replacing join_path with os.path.join) are currently mixed into 2557ea6 - could they go in their own commit?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this particular case is important (2557ea6 is simple even with that mixed in) but generally prefer that pure-style changes go in their own commits (and replacing join_path is close enough to a pure-style change to qualify IMO).

#
# TODO: move all of these imports out of __init__ to avoid importing the whole
# TODO: world on Spack startup. There are some design changes that need to be
# TODO: made to enable this (decoupling Spec, repo, DB, and store state).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TODO stale? If not then that sounds like a fair amount of additional work.

"""Get a config section or a single value from one.

Accepts a path syntax that allows us to grab nested config map
entries. Getting the 'config' section would look like::
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would look like:: -> would look like:


def __init__(self, *args, **kwargs):
kwargs['default'] = spack.dirty
kwargs['default'] = spack.config.get('config:dirty')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This syntax (e.g. config:dirty) appears to be new. I had to scan through the commit messages to find 2b0c8f6 which explains it's purpose. Presumably the line

Simplify the spack.config interface to allow getting/setting either entire YAML files or specific parts (single variables).

is relevant? IMO it's worth mentioning the syntax there.


spack.config.get('config')

and the ``dirty`` section in the ``config`` scope would be::
Copy link
Copy Markdown
Member

@scheibelp scheibelp Apr 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to call this the config scope here?

EDIT: I was assuming this would be the config section


This is a singleton; it constructs one instance associated with this
module and returns it. It is bundled inside a function so that
configuratoin can be initialized lazily.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

configuration (typo)

Copy link
Copy Markdown
Member

@scheibelp scheibelp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more questions

'build_stage': ['path1', 'path2', 'path3']}}
# repos
def test_write_list_in_memory(config):
spack.config.set('repos', repos_low['repos'], scope='low')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have to test spack.config vs. a Configuration object?

return clist

def find_compiler(self, cmp_cls, *paths):
# function-local so that cnl doesn't depend on spack.config
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this depends on spack.config?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still curious about this. I assume it has more to do with cyclic dependencies than initialization of state, at least for spack.spec. I'm generally wary of function-local imports - I suspect they are more difficult to maintain.

Copy link
Copy Markdown
Member

@alalazo alalazo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had so far a partial read of the PR, left a few comments, in particular on config.py

throughout Spack and should bring in a minimal number of external
dependencies.
"""
import os
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick: os.path is documented as a module, and usually I import it directly if I need to use any of its functions. Looking for best practices took me down into another python rabbit-hole.

('site', os.path.join(spack.paths.etc_path, 'spack')),

# User configuration can override both spack defaults and site config
('user', spack.paths.user_config_path)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we should do it in this PR, but I think it's worth swapping user scope with site scope, and make site the default for configuration. Based on my experience using multiple Spack versions at the same time, writing by default to user config is very likely to create conflicts if the configuration file format changed in the meanwhile.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good but for another PR.

#: Hard-coded default values for some key configuration options.
#: This ensures that Spack will still work even if config.yaml in
#: the defaults scope is removed.
config_defaults = {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to this

_validate_section(data, section_schemas[section])

def __repr__(self):
return '<InternalConfigScope: %s>' % self.name
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't __repr__ usually supposed to return a string such that:

b = eval(repr(a))
assert b == a

?

ConfigScope('site/%s' % _platform, os.path.join(_site_path, _platform))
If ``scope`` is ``None`` or not provided, return the merged contents
of all of Spack's configuration scopes. If ``scope`` is provided,
return only the confiugration as specified in that scope.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: confiugration


We use ``:`` as the separator, like YAML objects.

TODO: Currently only handles maps. Think about lists if neded.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean lists of objects? Something like:

spack.config.get('config:tests:0:url')

to get the url key from the item at index 0 of a list of things that looks like:

[{ url: 'foo' },
 { url: 'bar'}]

? Or are you referring to another behavior that is not handled?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean something like that, but I think we'd need to differentiate the syntax, e.g.:

spack.config.get('config:tests[0]:url')

The keys following : are assumed to be strings, and result in index[key] in the Python code. The indexes for lists are integers, and you'd need to convert them before indexing. Yes, we could do that by looking at the type of structure we're indexing, but I think the above is clearer anyway because you cana clearly see what's expected to be a list and what's expected to be a dict in the path specifier.

I'm basically just saying we're not doing anything to address the list case currently, so you can't ask for, e.g.:

spack.config.get('repos[2]')

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used frequently so IMO the TODO should be a # comment vs. living in the docstring.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok moved

value = value.get(key, default)
return value

def set(self, path, value, scope=None):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know how much this would be a problem in reality, but I think the semantic of this class is kind of confusing. If I read the code correctly, by default when we get a value we retrieve a merged view of all the scopes. Instead, when we set a value, we set it in a particular scope.

Am I wrong saying that this means that:

# We set the value in the highest precedence scope by default
spack.config.set(somepath, a) 

# We merge all the scopes by default
b = spack.config.get(somepath)

# This might be False, e.g. if 'somepath' is a dict spread over scopes
assert a == b

might happen? In case is this the behavior we want?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would never be false, unless you specified a scope in the first spack.config.set. FWIW, this new syntax doesn't actually change the semantics of working with the config system. It just makes it easier to partially update something without getting the entire dictionary for a level.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say we have two scopes for foo.yaml:

# first scope
foo:
   bar: 5
# second scope
foo:
    baz: 6

Am I wrong on the following snippet:

spack.config.set('foo', {'foobar': 6}) 
b = spack.config.get('foo')
assert b == {'foobar': 6, 'baz': 6}

?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're saying -- I think I was wrong. I actually think b == {'bar': 5, 'foobar': 6, 'baz': 6} should be true, but we should support something like spack.config.set('foo:') for consistency (where the : means overwrite).

I guess the question is whether I should address that now or whether that can be another PR. I do think that the path syntax would be useful, e.g., for allowing these expressions to be used with spack config get.



@contextmanager
def override(path, value):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition!

@tgamblin tgamblin force-pushed the refactor/init-config branch from d96e954 to f870f30 Compare May 9, 2018 23:54
@tgamblin tgamblin changed the title Refactor initialization and configuration Refactor initialization and configuration to remove globals May 10, 2018
@tgamblin tgamblin force-pushed the refactor/init-config branch 2 times, most recently from 1484375 to 28e8f6a Compare May 10, 2018 00:14
@tgamblin
Copy link
Copy Markdown
Member Author

@scheibelp @alalazo: this is ready for another review. It should be mostly done now.

Note that I will be submitting some other PRs based on this one. I think this PR has gotten large enough with just the global removal that it should be split up. Probably best to look at each commit if you review this, as there are a lot of changes.

@tgamblin tgamblin force-pushed the refactor/init-config branch 4 times, most recently from 29113cf to 674ef37 Compare May 10, 2018 05:02
@tgamblin tgamblin force-pushed the refactor/init-config branch from 674ef37 to 0f12c70 Compare May 10, 2018 06:03
@tgamblin tgamblin force-pushed the refactor/init-config branch 2 times, most recently from cf6100e to 48d1a03 Compare May 13, 2018 06:32
Copy link
Copy Markdown
Member

@alalazo alalazo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments on the PR. I think we can use a different approach to lazy creation of objects to save some lines.

##############################################################################
import multiprocessing
import os
import sys
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this import needed for something?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope. fixed.

_fetch_cache = None


def misc_cache():
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you are using this pattern in several places. As a minor improvement, did you consider using something like:

class LazyObject(object):
    def __init__(self, factory):
        self.factory = factory
        self._instance = None
    
    @property
    def instance(self):
        if self._instance is None:
            self._instance = self.factory()
        return self._instance
    
    def __getattr__(self, name):
        return getattr(self.instance, name)
  
    def __str__(self):
        return str(self.instance)

    def __repr__(self):
        return repr(self.instance)

? I think this could simplify a the code. For instance here we could write:

def _misc_cache_factory():
    path = spack.config.get('config:misc_cache')
    if not path:
         path = os.path.join(spack.paths.user_config_path, 'cache')
    path = canonicalize_path(path)
    return FileCache(path)

misc_cache = LazyObject(_misc_cache_factory)

and in client code refer to spack.caches.misc_cache instead of spack.caches.misc_cache()

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

self.cpp = Executable(join_path(spack.build_env_path, "cpp"))
self.cxx = Executable(join_path(spack.build_env_path, "c++"))
self.fc = Executable(join_path(spack.build_env_path, "fc"))
self.cc = Executable(join_path(build_env_path, "cc"))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent with other changes we should also change join_path to os.path.join here, and in general in other tests that have been touched by this PR. In the end we want to encourage the use of join_path only within package recipes, right?


import fnmatch
import os
import fnmatch
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, but why swapping fnmatch with os? Can't we keep in general alphabetical ordering within the three groups of imports that appear in PEP8?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea but if we want that we should enforce it. I think my natural inclination is to put things like sys and os first. Can we do another PR where we alphabetize imports and put in a test to keep them that way? Seems reasonable to me to prevent import churn.

for spec in specs:
spack.package_testing.test(spec.name)
tests = [spec.name for spec in specs]
kwargs['tests'] = tests
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change in the handling of tests! Maybe later we could extend it to accept a csv list, or someting like that, if we need the extra flexibility.

value = value.get(key, default)
return value

def set(self, path, value, scope=None):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say we have two scopes for foo.yaml:

# first scope
foo:
   bar: 5
# second scope
foo:
    baz: 6

Am I wrong on the following snippet:

spack.config.set('foo', {'foobar': 6}) 
b = spack.config.get('foo')
assert b == {'foobar': 6, 'baz': 6}

?


def scopes():
"""Convenience function to get list of configuration scopes."""
return config().scopes
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need wrappers to mask the singleton? Can't we avoid this?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scheibelp thinks that this is more readable. I personally want to keep the wrappers to a minimum as I think it's better to have one way to access the function, but I also see the readability point. I specifically tried not to add too many wrappers, but added a few for common cases. For config in particular I think get/set read a lot better as wrappers, but I don't have too much of an opinion on this.

Copy link
Copy Markdown
Member

@scheibelp scheibelp May 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference is consistency. Earlier we had

config.get('packages:...')

and

config.get_configuration().scopes

(get_configuration() has since been renamed to config()). Having two different ways of interacting with the singleton was confusing IMO. I think masking the singleton is OK as long as it's clear what is going on when someone looks at config.py specifically.


#: Root folders where the various module files should be written
roots = spack.config.get_config('config').get('module_roots', {})
#: config section for this flie
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: file

# do any checks needed by the test suite
# return without building if only testing
if not self.unit_test_check():
return
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I get the logic of this part. Is this used only within unit tests to disable building?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alalazo yes - perhaps the docstring for the function could be clearer?

Unit tests can override this function to perform checks after Package.install (and all post-install hooks) has run but before the database has been updated. The overridden function may indicate that the install procedure should terminate early (before updating the database) by returning False or any value such that bool(result) is False

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I addressed this below

if dep:
merge = (
# caller requested test dependencies
tests is True or (tests and self.name in tests) or
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we refactor this as we've discussed in the past, so that repo gets the class and not an instance,

What do you mean by this?

@tgamblin tgamblin force-pushed the refactor/init-config branch 2 times, most recently from 3d19e70 to 76136bc Compare May 17, 2018 19:29
tgamblin added 19 commits May 17, 2018 12:30
- replace `spack.config.get_configuration()` with `spack.config.config()`

- replace `get_config`/`update_config` with `get`, `set`

- add a path syntax that can be used to refer to specific config options
  without firt getting the entire configuration dict

- update usages of `get_config` and `update_config` to use `get` and `set`
- remove template_dirs global variable from __init__.py

- also remove update_template_dirs fixture, which had no effect on test
  correctness
- remove variable from spack/__init__.py
- clean up imports and some code structure in binary_distribution.py
- no longer require `spack_version` to be a Version (it isn't used that
  way anyway)

- use a simple tuple `spack_version_info` with major, minor, patch
  versions

- generate `spack_version` from the tuple
- refactor the way test dependencies are passed to the concretizer
- remove global state
- update tests
- rename `builtin_mock` and `refresh_builtin_mock` to the more clear
  `mock_packages` and `mutable_mock_packages`
- spack.repository module is now spack.repo

- `spack.repo` is now `spack.repo.path()` and loaded lazily

- Added `spack.repo.get()` and `spack.repo.all_package_names()` as
  convenience functions to simplify the new lazy interface.

- updated tests and code
- spack.store was previously initialized at the spack.store module level,
  but this means the store has to be initialized on every spack call.

- this moves the state in spack.store to a singleton so that the store is
  only initialized when needed.
- It turns out that jsonschema is one of the more expensive imports.

- move imports of jsonschema into functions to avoid the performance hits
  for calls that don't need config.
- `spack.cmd.all_commands` does a directory listing on
  `lib/spack/spack/cmd`, regardless of whether it is needed

- make this lazy so that the directory listing won't happen unless it's
  necessary.
- simplify the singleton pattern across the codebase
- reduce lines of code needed for crufty initialization
- reduce functions that need to mess with a global

- Singletons whose semantics changed:
  - spack.store.store() -> spack.store
  - spack.repo.path() -> spack.repo.path
  - spack.config.config() -> spack.config.config
  - spack.caches.fetch_cache() -> spack.caches.fetch_cache
  - spack.caches.misc_cache() -> spack.caches.misc_cache
- Spack core has long used llnl.util.filesystem.join_path, but
  os.path.join is pretty much the same thing, and is more efficient.

- Use os.path.join in the core Spack code from now on.
@tgamblin
Copy link
Copy Markdown
Member Author

@alalazo @scheibelp: I've pushed some updates based on your comments, including adding @alalazo's singleton (lazyobject) and updating some comments that weren't clear. I think this is ready to go.

The only thing I haven't done is update the config syntax. I think it would be good to support lists and maybe :: overrides in set, and also it might be good to make it so you can spack config get these paths on the command line, but I think that's not needed for this PR. I'd rather commit something focused on that that we can review independently.

Note: I reworded the way database and install_mockery work based on the refactor of spack.store. I think they are a lot simpler now. I also removed some placed where these fixtures yielded custom objects as namedtuples. It seems like that was being used to expose both the original and the mock version of the fixture, but the real version was never used, and the tests would all call database.mock. I simplified this by just having the fixture yield the mock database.

@tgamblin
Copy link
Copy Markdown
Member Author

@scheibelp: if you think this is good i think you can rebase and merge.

@scheibelp scheibelp merged commit 5ef30e0 into develop May 17, 2018
@scheibelp scheibelp mentioned this pull request May 17, 2018
@tgamblin tgamblin deleted the refactor/init-config branch May 17, 2018 23:27
@tgamblin tgamblin mentioned this pull request Jun 24, 2018
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants