Skip to content

Refactor MockPackage into MockRepositoryBuilder#32273

Merged
alalazo merged 11 commits intospack:developfrom
alalazo:refactor/mock_repository_builder
Oct 11, 2022
Merged

Refactor MockPackage into MockRepositoryBuilder#32273
alalazo merged 11 commits intospack:developfrom
alalazo:refactor/mock_repository_builder

Conversation

@alalazo
Copy link
Copy Markdown
Member

@alalazo alalazo commented Aug 19, 2022

This PR factors together code to generate a mock repository with fake packages, which was present in multiple places in Spack. It also construct indexes injecting a reference to the repository they refer to, removing any direct or indirect reference to global state in the class.

Spec.virtual now use the index, and computation done to compute the index use Repository.is_virtual_safe.

The approach taken is to provide an API to generate a mock repository in a temporary directory, instead of having fake objects mimicking a RepoPath and a Package. One advantage, besides the deduplication of the code, is that there will be no maintenance required to adapt the mocks when the API of RepoPath and Package change, since we'll be using the actual classes.

The decorator spack.repo.use_repositories now has an additional argument which permits to override (or not) existing repositories. It also pushes a corresponding configuration scope into the global config, so to keep it consistent with the changes.

Modifications:

  • *Index objects get the repository they refer to during construction
  • Spec.virtual uses the index
  • Remove mock_package.py
  • Add MockRepositoryBuilder in spack.repo
  • Add debug print for pushing and popping config scopes
  • Changed spack.repo.use_repositories so that it can override or not previous repos
  • spack.repo.use_repositories updates spack.config.config according to the modifications done
  • spack -m is now reflected in the configuration
  • spack.concretize.concretize_specs_together has been simplified by using MockRepoBuilder
  • Removed a peculiar behavior from spack.config.Configuration where push would always bubble-up a scope named command_line if it existed

@spackbot-app spackbot-app bot added commands core PR affects Spack core functionality dependencies new-version tests General test capability(ies) utilities labels Aug 19, 2022
@alalazo alalazo force-pushed the refactor/mock_repository_builder branch from 22d34c2 to b9f18c1 Compare August 19, 2022 21:09
@spackbot-app spackbot-app bot added the sbang label Aug 19, 2022
Comment on lines -411 to -418
cmd_line_scope = None
if self.scopes:
highest_precedence_scope = list(self.scopes.values())[-1]
if highest_precedence_scope.name == "command_line":
# If the command-line scope is present, it should always
# be the scope of highest precedence
cmd_line_scope = self.pop_scope()
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tgamblin @scheibelp This was added in 4b2f51d which is part of #9612 There's no unit test failing on removing this behavior (in fact I have tests failing here due to this behavior + the refactor that keeps some of the context manager in sync with the configuration).

Was there any reason why this was added? Asking because if there is, we need to add an additional test and restore the behavior outside of this class, i.e. in the calling code:

config.push_scope(...)
config.move_as_top_scope("command_line")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want the user to be able to use command-line scopes to have the "last-say": I think they will be frustrated if other configuration implicitly overrides config that they set with -C. If I failed to test this, then that is my mistake. IMO the behavior is still worthwhile (and worth testing).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, my only point is that shouldn't be part of the Configuration.push behavior but should be checked at the caller site where we assemble the configuration. Rationale for saying that:

  1. Having an API where subsequent push and pop don't restore the initial state is very counter-intuitive.
  2. Configuration should not bother with the names of the scopes, it's code that uses that class that could/should give meaning to them

@alalazo alalazo force-pushed the refactor/mock_repository_builder branch 2 times, most recently from d0802b1 to acd378d Compare August 23, 2022 13:07
@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Aug 23, 2022

I have been timing a few solves on this branch:
radiuss_index_hist

and on develop:

radiuss_develop_hist

Using the index in Spec.virtual seem to have some effect in the setup phase. Not a game changer, but still an improvement.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Aug 29, 2022

@tgamblin @trws Would either of you have time to take a look at this refactoring? It started trying to inject repositories into the indexes that are used as caches at construction time, and ended up doing a few more things always related to repos. There's a comprehensive list in the description.

{% else %}
depends_on("{{ dep_spec }}")
{% endif %}
{% endfor %} No newline at end of file
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the mechanism from #27987 also be a suitable replacement? i.e. that PR adds the ability to define packages exactly and in-line with the testing code:

_pkgy = (
     "y",
     """\
 class Y(Package):
     version('2.5')
     version('2.4')
     version('2.3', deprecated=True)

     variant('shared', default=True,
             description='Build shared libraries')
 """,
 )

I think this direct approach is useful because users know exactly what the package "should be" (e.g. without doing any mental substitution based on the template).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I like the approach here more, since it is more compact and can be extended further. It also has a single place where these mock packages are collected, so we don't scatter the code around.

That said, the main point of this change was to get rid of MockPackageMultiRepo etc. since that machinery requires maintenance every time we change properties of repositories. Here I remove the mock class and add the possibility to create small repositories to be used during tests on the fly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I like the approach here more, since it is more compact and can be extended further.

I'm not sure what is more extensible than being able to write the package contents down to the character: a template is flexible but it will never be that flexible. Also, ultimately demands on the template will encourage for example replacing the hardcoded versions that you have there with a placeholder for versions that you insert: at that point, my concern is that you will have created an interface for creating packages which is analogous to (and redundant with) the package.py format.

It also has a single place where these mock packages are collected, so we don't scatter the code around.

I'm not sure what you mean here: test_save_dependency_spec_jsons_subset for example creates its own repo as part of the test, that seems equally as disperse as the preceding approach.

Additionally, IMO consolidation of mock packages is problematic: I see in many cases that builtin.mock packages are used for multiple tests and thereby aggregate unrelated behaviors that makes them hard to understand (e.g. in the process of understanding a unit test, a user tracks down the associated builtin.mock package and reads it, and then must parse out the components of that package which happen to relate to the specific test).

Copy link
Copy Markdown
Member Author

@alalazo alalazo Sep 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what is more extensible than being able to write the package contents down to the character: a template is flexible but it will never be that flexible.

This is not true, since writing a literal package can't possibly deal with dynamic content like:

def _concretize_specs_together_original(*abstract_specs, **kwargs):
abstract_specs = [spack.spec.Spec(s) for s in abstract_specs]
tmpdir = tempfile.mkdtemp()
builder = spack.repo.MockRepositoryBuilder(tmpdir)
# Split recursive specs, as it seems the concretizer has issue
# respecting conditions on dependents expressed like
# depends_on('foo ^[email protected]'), see issue #11160
split_specs = [
dep.copy(deps=False) for spec1 in abstract_specs for dep in spec1.traverse(root=True)
]
builder.add_package(
"concretizationroot", dependencies=[(str(x), None, None) for x in split_specs]
)
with spack.repo.use_repositories(builder.root, override=False):
# Spec from a helper package that depends on all the abstract_specs
concretization_root = spack.spec.Spec("concretizationroot")
concretization_root.concretize(tests=kwargs.get("tests", False))
# Retrieve the direct dependencies
concrete_specs = [concretization_root[spec.name].copy() for spec in abstract_specs]
return concrete_specs

where the package to be generated depends on the function arguments. I can, of course, concatenate strings etc. but at that point I would have re-created a basic template engine. That said my point is:

  • Before this PR we were mocking a Package without having a package.py available and that was brittle1
  • In Configure requirements for a package #27987 we introduced a mechanism to use a literal package.py in tests. This is fine with me, and I don't want to change that in this PR.
  • Here I introduce a class that can create a mock repository with mock package.py from API. That is flexible enough to be used in many places where we had similar code before (hence the saving of roughly 200 lines)

Additionally, IMO consolidation of mock packages is problematic:

To be clear, I don't plan to consolidate builtin.mock but only the tests that are using MockPackageMultiRepo and related classes

Footnotes

  1. Changing a few signatures caused tests relying on the mock to fail, and solving that was not straightforward.

have_name = self._have_name(pkg_name)
return have_name and pkg_name in self.provider_index

def is_virtual_safe(self, pkg_name):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get wanting to separate these, but doesn't this pessimize specs, especially merge_dependency, such that we can't ever use an index even if we have one? This method shows up pretty heavily on performance traces because of so frequently calling get_pkg_class through spec, so I've been keeping an eye on it.

# This method can be called while regenerating the provider index
# So we turn off using the index to detect virtuals
return spack.repo.path.is_virtual(self.name, use_index=False)
return spack.repo.path.is_virtual(self.name)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trws I didn't look at traces in this PR, but I would expect separating the function into two optimizes times. The timings I posted seem to show a consistent shave of 0,5 sec. on the setup phase across many specs (on my laptop with SSD).

Note that before we were using use_index=False because we called Spec.virtual even when reconstructing repo caches. In this PR I took care of using RepoPath.is_virtual_safe in those places so I could enable using the cache when calling Spec.virtual.

Copy link
Copy Markdown
Member Author

@alalazo alalazo Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more explicit, the use_index parameter was removed in this PR and old use_index=False maps to is_virtual_safe, while use_index=True maps to is_virtual.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, and if we want to go further in spec we could always do something like:

if currently_bootstrapping():
  is_virtual_safe()
else:
  is_virtual()

Works for me, thanks!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a last question on this one: is this going to regress #23661? The point of that PR was to avoid the massive stat we do on all packages for simple operations like spack load, where it is faster just to check each package individually. I wonder if doing this looks fast here but also slows down operations like that. What we want to avoid is the index check for simple operations where we don't have to reindex all virtuals.

@alalazo can you check on this and look at the issue that it fixes?

@alalazo alalazo force-pushed the refactor/mock_repository_builder branch 4 times, most recently from 3c8723d to b498cda Compare September 7, 2022 18:13
@alalazo alalazo force-pushed the refactor/mock_repository_builder branch from b498cda to 97bd867 Compare September 12, 2022 13:42
@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Sep 12, 2022

@spackbot run pipeline

@spackbot-app
Copy link
Copy Markdown

spackbot-app bot commented Sep 12, 2022

I've started that pipeline for you!

@alalazo alalazo force-pushed the refactor/mock_repository_builder branch from 97bd867 to 797b463 Compare September 21, 2022 12:46
Copy link
Copy Markdown
Member

@tgamblin tgamblin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much more readable and it's definitely an improvement. I would really like to get to inline package classes, e.g.:

class MyRepo(MockRepo):
    class A(Package):
        version("1.0", sha256="abcdefabcdefabcdef")
        depends_on("b")

    class B(Package):
        version("1.0", sha256="abcdefabcdefabcdef")

But I think moving to this first and trying that later makes sense.

I have a few change requests, and I'm not sure why the repository has to be passed to all the instance methods of the indexers and not just the constructor. Otherwise this LGTM.

if specs is None:
specs = []

assert repository is not None, "a 'repository=' argument is required"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here -- why not just omit the default value?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before, but if preferred I can move repository to the front in the signature

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 9916991

alalazo added 11 commits October 5, 2022 15:50
Modifications:
- Remove mock_package.py
- Add MockRepositoryBuilder in spack.repo
- Add debug print for pushing and popping config scopes
- Changed spack.repo.user_repositories so that it
  can override or not previous repos
- spack.repo.use_repo updates spack.config.config
  according to the modifications done
- `spack -m` is now reflected in the configuration
- spack.concretize.concretize_specs_together has been
  simplified by using MockRepoBuilder
Remove the assertion during initialization
Remove the assertion during initialization
@alalazo alalazo force-pushed the refactor/mock_repository_builder branch from 797b463 to 4a11505 Compare October 5, 2022 15:27
@alalazo alalazo closed this Oct 5, 2022
@alalazo alalazo reopened this Oct 5, 2022
@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Oct 5, 2022

@tgamblin This should be ready for a second review

Copy link
Copy Markdown
Member

@tgamblin tgamblin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one final question about this PR before it goes in -- we really need to be careful about using the virtual index in the wrong places because it can cause unexpected performance problems.

# This method can be called while regenerating the provider index
# So we turn off using the index to detect virtuals
return spack.repo.path.is_virtual(self.name, use_index=False)
return spack.repo.path.is_virtual(self.name)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a last question on this one: is this going to regress #23661? The point of that PR was to avoid the massive stat we do on all packages for simple operations like spack load, where it is faster just to check each package individually. I wonder if doing this looks fast here but also slows down operations like that. What we want to avoid is the index check for simple operations where we don't have to reindex all virtuals.

@alalazo can you check on this and look at the issue that it fixes?

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Oct 11, 2022

@tgamblin I don't see differences with develop on a test run on Marconi, so I think this is safe to merge. In case I am wrong, and we get user issues about being slow on GPFS, the workaround is to change a single line and use:

@property
def virtual(self):
    return spack.repo.path.is_virtual_safe(self.name)

in spack.spec.Spec.virtual, so it's very fast to act on this possible request.


The test performed on Marconi (CINECA). With this PR:

[mculpo02@login02 spack]$ spack clean -a
==> Removing all temporary build stages
==> Removing cached downloads
==> Removing install failure marks
==> Removing cached information on repositories
==> Removing python cache files
$ time spack load hdf5

real	1m21,228s
user	0m56,768s
sys	0m6,822s
$ time spack unload hdf5

real	0m2,165s
user	0m1,344s
sys	0m0,825s
$ time spack load hdf5

real	0m2,498s
user	0m1,775s
sys	0m0,728s
$ time spack load hdf5

real	0m2,427s
user	0m1,577s
sys	0m0,858s

while on develop:

$ git checkout develop
Switched to branch 'develop'
Your branch is up to date with 'upstream/develop'.
$ spack clean -a
==> Removing all temporary build stages
==> Removing cached downloads
==> Removing install failure marks
==> Removing cached information on repositories
==> Removing python cache files
$ time spack load hdf5

real	1m8,999s
user	0m57,426s
sys	0m6,980s
$ time spack unload hdf5

real	0m2,405s
user	0m1,524s
sys	0m0,883s
$ time spack load hdf5

real	0m2,740s
user	0m1,897s
sys	0m0,851s
$ time spack load hdf5

real	0m2,753s
user	0m1,866s
sys	0m0,896s

@alalazo alalazo merged commit de8c827 into spack:develop Oct 11, 2022
@alalazo alalazo deleted the refactor/mock_repository_builder branch October 11, 2022 17:28
luke-dt pushed a commit to dantaslab/spack that referenced this pull request Oct 12, 2022
Caches used by repositories don't reference the global spack.repo.path instance
anymore, but get the repository they refer to during initialization.
 
Spec.virtual now use the index, and computation done to compute the index 
use Repository.is_virtual_safe. 

Code to construct mock packages and mock repository has been factored into 
a unique MockRepositoryBuilder that is used throughout the codebase.

Add debug print for pushing and popping config scopes.

Changed spack.repo.use_repositories so that it can override or not previous repos

spack.repo.use_repositories updates spack.config.config according to the modifications done

Removed a peculiar behavior from spack.config.Configuration where push would always 
bubble-up a scope named command_line if it existed
@haampie
Copy link
Copy Markdown
Member

haampie commented Nov 4, 2024

This regressed overriding config from active environments:

spack env activate --temp
spack config add config:ccache:true
spack -c config:ccache:false config get config | grep ccache  # gives true instead of false

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Nov 4, 2024

I guess it's:

Removed a peculiar behavior from spack.config.Configuration where push would always bubble-up a scope named command_line if it existed

? I'll try to submit a fix quickly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants