Refactor MockPackage into MockRepositoryBuilder#32273
Refactor MockPackage into MockRepositoryBuilder#32273alalazo merged 11 commits intospack:developfrom
Conversation
22d34c2 to
b9f18c1
Compare
| cmd_line_scope = None | ||
| if self.scopes: | ||
| highest_precedence_scope = list(self.scopes.values())[-1] | ||
| if highest_precedence_scope.name == "command_line": | ||
| # If the command-line scope is present, it should always | ||
| # be the scope of highest precedence | ||
| cmd_line_scope = self.pop_scope() |
There was a problem hiding this comment.
@tgamblin @scheibelp This was added in 4b2f51d which is part of #9612 There's no unit test failing on removing this behavior (in fact I have tests failing here due to this behavior + the refactor that keeps some of the context manager in sync with the configuration).
Was there any reason why this was added? Asking because if there is, we need to add an additional test and restore the behavior outside of this class, i.e. in the calling code:
config.push_scope(...)
config.move_as_top_scope("command_line")There was a problem hiding this comment.
I want the user to be able to use command-line scopes to have the "last-say": I think they will be frustrated if other configuration implicitly overrides config that they set with -C. If I failed to test this, then that is my mistake. IMO the behavior is still worthwhile (and worth testing).
There was a problem hiding this comment.
Sure, my only point is that shouldn't be part of the Configuration.push behavior but should be checked at the caller site where we assemble the configuration. Rationale for saying that:
- Having an API where subsequent
pushandpopdon't restore the initial state is very counter-intuitive. Configurationshould not bother with the names of the scopes, it's code that uses that class that could/should give meaning to them
d0802b1 to
acd378d
Compare
| {% else %} | ||
| depends_on("{{ dep_spec }}") | ||
| {% endif %} | ||
| {% endfor %} No newline at end of file |
There was a problem hiding this comment.
Would the mechanism from #27987 also be a suitable replacement? i.e. that PR adds the ability to define packages exactly and in-line with the testing code:
_pkgy = (
"y",
"""\
class Y(Package):
version('2.5')
version('2.4')
version('2.3', deprecated=True)
variant('shared', default=True,
description='Build shared libraries')
""",
)
I think this direct approach is useful because users know exactly what the package "should be" (e.g. without doing any mental substitution based on the template).
There was a problem hiding this comment.
I think I like the approach here more, since it is more compact and can be extended further. It also has a single place where these mock packages are collected, so we don't scatter the code around.
That said, the main point of this change was to get rid of MockPackageMultiRepo etc. since that machinery requires maintenance every time we change properties of repositories. Here I remove the mock class and add the possibility to create small repositories to be used during tests on the fly.
There was a problem hiding this comment.
I think I like the approach here more, since it is more compact and can be extended further.
I'm not sure what is more extensible than being able to write the package contents down to the character: a template is flexible but it will never be that flexible. Also, ultimately demands on the template will encourage for example replacing the hardcoded versions that you have there with a placeholder for versions that you insert: at that point, my concern is that you will have created an interface for creating packages which is analogous to (and redundant with) the package.py format.
It also has a single place where these mock packages are collected, so we don't scatter the code around.
I'm not sure what you mean here: test_save_dependency_spec_jsons_subset for example creates its own repo as part of the test, that seems equally as disperse as the preceding approach.
Additionally, IMO consolidation of mock packages is problematic: I see in many cases that builtin.mock packages are used for multiple tests and thereby aggregate unrelated behaviors that makes them hard to understand (e.g. in the process of understanding a unit test, a user tracks down the associated builtin.mock package and reads it, and then must parse out the components of that package which happen to relate to the specific test).
There was a problem hiding this comment.
I'm not sure what is more extensible than being able to write the package contents down to the character: a template is flexible but it will never be that flexible.
This is not true, since writing a literal package can't possibly deal with dynamic content like:
spack/lib/spack/spack/concretize.py
Lines 753 to 774 in 97bd867
where the package to be generated depends on the function arguments. I can, of course, concatenate strings etc. but at that point I would have re-created a basic template engine. That said my point is:
- Before this PR we were mocking a
Packagewithout having apackage.pyavailable and that was brittle1 - In Configure requirements for a package #27987 we introduced a mechanism to use a literal
package.pyin tests. This is fine with me, and I don't want to change that in this PR. - Here I introduce a class that can create a mock repository with mock
package.pyfrom API. That is flexible enough to be used in many places where we had similar code before (hence the saving of roughly 200 lines)
Additionally, IMO consolidation of mock packages is problematic:
To be clear, I don't plan to consolidate builtin.mock but only the tests that are using MockPackageMultiRepo and related classes
Footnotes
-
Changing a few signatures caused tests relying on the mock to fail, and solving that was not straightforward. ↩
| have_name = self._have_name(pkg_name) | ||
| return have_name and pkg_name in self.provider_index | ||
|
|
||
| def is_virtual_safe(self, pkg_name): |
There was a problem hiding this comment.
I get wanting to separate these, but doesn't this pessimize specs, especially merge_dependency, such that we can't ever use an index even if we have one? This method shows up pretty heavily on performance traces because of so frequently calling get_pkg_class through spec, so I've been keeping an eye on it.
| # This method can be called while regenerating the provider index | ||
| # So we turn off using the index to detect virtuals | ||
| return spack.repo.path.is_virtual(self.name, use_index=False) | ||
| return spack.repo.path.is_virtual(self.name) |
There was a problem hiding this comment.
@trws I didn't look at traces in this PR, but I would expect separating the function into two optimizes times. The timings I posted seem to show a consistent shave of 0,5 sec. on the setup phase across many specs (on my laptop with SSD).
Note that before we were using use_index=False because we called Spec.virtual even when reconstructing repo caches. In this PR I took care of using RepoPath.is_virtual_safe in those places so I could enable using the cache when calling Spec.virtual.
There was a problem hiding this comment.
To be more explicit, the use_index parameter was removed in this PR and old use_index=False maps to is_virtual_safe, while use_index=True maps to is_virtual.
There was a problem hiding this comment.
Fair, and if we want to go further in spec we could always do something like:
if currently_bootstrapping():
is_virtual_safe()
else:
is_virtual()Works for me, thanks!
There was a problem hiding this comment.
I have a last question on this one: is this going to regress #23661? The point of that PR was to avoid the massive stat we do on all packages for simple operations like spack load, where it is faster just to check each package individually. I wonder if doing this looks fast here but also slows down operations like that. What we want to avoid is the index check for simple operations where we don't have to reindex all virtuals.
@alalazo can you check on this and look at the issue that it fixes?
3c8723d to
b498cda
Compare
b498cda to
97bd867
Compare
|
@spackbot run pipeline |
|
I've started that pipeline for you! |
97bd867 to
797b463
Compare
tgamblin
left a comment
There was a problem hiding this comment.
This looks much more readable and it's definitely an improvement. I would really like to get to inline package classes, e.g.:
class MyRepo(MockRepo):
class A(Package):
version("1.0", sha256="abcdefabcdefabcdef")
depends_on("b")
class B(Package):
version("1.0", sha256="abcdefabcdefabcdef")But I think moving to this first and trying that later makes sense.
I have a few change requests, and I'm not sure why the repository has to be passed to all the instance methods of the indexers and not just the constructor. Otherwise this LGTM.
lib/spack/spack/provider_index.py
Outdated
| if specs is None: | ||
| specs = [] | ||
|
|
||
| assert repository is not None, "a 'repository=' argument is required" |
There was a problem hiding this comment.
same here -- why not just omit the default value?
There was a problem hiding this comment.
Same as before, but if preferred I can move repository to the front in the signature
Modifications: - Remove mock_package.py - Add MockRepositoryBuilder in spack.repo - Add debug print for pushing and popping config scopes - Changed spack.repo.user_repositories so that it can override or not previous repos - spack.repo.use_repo updates spack.config.config according to the modifications done - `spack -m` is now reflected in the configuration - spack.concretize.concretize_specs_together has been simplified by using MockRepoBuilder
Remove the assertion during initialization
Remove the assertion during initialization
797b463 to
4a11505
Compare
|
@tgamblin This should be ready for a second review |
tgamblin
left a comment
There was a problem hiding this comment.
I have one final question about this PR before it goes in -- we really need to be careful about using the virtual index in the wrong places because it can cause unexpected performance problems.
| # This method can be called while regenerating the provider index | ||
| # So we turn off using the index to detect virtuals | ||
| return spack.repo.path.is_virtual(self.name, use_index=False) | ||
| return spack.repo.path.is_virtual(self.name) |
There was a problem hiding this comment.
I have a last question on this one: is this going to regress #23661? The point of that PR was to avoid the massive stat we do on all packages for simple operations like spack load, where it is faster just to check each package individually. I wonder if doing this looks fast here but also slows down operations like that. What we want to avoid is the index check for simple operations where we don't have to reindex all virtuals.
@alalazo can you check on this and look at the issue that it fixes?
|
@tgamblin I don't see differences with @property
def virtual(self):
return spack.repo.path.is_virtual_safe(self.name)in The test performed on Marconi (CINECA). With this PR: [mculpo02@login02 spack]$ spack clean -a
==> Removing all temporary build stages
==> Removing cached downloads
==> Removing install failure marks
==> Removing cached information on repositories
==> Removing python cache files
$ time spack load hdf5
real 1m21,228s
user 0m56,768s
sys 0m6,822s
$ time spack unload hdf5
real 0m2,165s
user 0m1,344s
sys 0m0,825s
$ time spack load hdf5
real 0m2,498s
user 0m1,775s
sys 0m0,728s
$ time spack load hdf5
real 0m2,427s
user 0m1,577s
sys 0m0,858swhile on $ git checkout develop
Switched to branch 'develop'
Your branch is up to date with 'upstream/develop'.
$ spack clean -a
==> Removing all temporary build stages
==> Removing cached downloads
==> Removing install failure marks
==> Removing cached information on repositories
==> Removing python cache files
$ time spack load hdf5
real 1m8,999s
user 0m57,426s
sys 0m6,980s
$ time spack unload hdf5
real 0m2,405s
user 0m1,524s
sys 0m0,883s
$ time spack load hdf5
real 0m2,740s
user 0m1,897s
sys 0m0,851s
$ time spack load hdf5
real 0m2,753s
user 0m1,866s
sys 0m0,896s |
Caches used by repositories don't reference the global spack.repo.path instance anymore, but get the repository they refer to during initialization. Spec.virtual now use the index, and computation done to compute the index use Repository.is_virtual_safe. Code to construct mock packages and mock repository has been factored into a unique MockRepositoryBuilder that is used throughout the codebase. Add debug print for pushing and popping config scopes. Changed spack.repo.use_repositories so that it can override or not previous repos spack.repo.use_repositories updates spack.config.config according to the modifications done Removed a peculiar behavior from spack.config.Configuration where push would always bubble-up a scope named command_line if it existed
|
This regressed overriding config from active environments: |
|
I guess it's:
? I'll try to submit a fix quickly. |


This PR factors together code to generate a mock repository with fake packages, which was present in multiple places in Spack. It also construct indexes injecting a reference to the repository they refer to, removing any direct or indirect reference to global state in the class.
Spec.virtualnow use the index, and computation done to compute the index useRepository.is_virtual_safe.The approach taken is to provide an API to generate a mock repository in a temporary directory, instead of having fake objects mimicking a
RepoPathand aPackage. One advantage, besides the deduplication of the code, is that there will be no maintenance required to adapt the mocks when the API ofRepoPathandPackagechange, since we'll be using the actual classes.The decorator
spack.repo.use_repositoriesnow has an additional argument which permits to override (or not) existing repositories. It also pushes a corresponding configuration scope into the global config, so to keep it consistent with the changes.Modifications:
*Indexobjects get the repository they refer to during constructionSpec.virtualuses the indexmock_package.pyMockRepositoryBuilderinspack.repospack.repo.use_repositoriesso that it can override or not previous reposspack.repo.use_repositoriesupdatesspack.config.configaccording to the modifications donespack -mis now reflected in the configurationspack.concretize.concretize_specs_togetherhas been simplified by usingMockRepoBuilderspack.config.Configurationwherepushwould always bubble-up a scope namedcommand_lineif it existed