Allow for multiple dependencies/dependents from the same package#28673
Conversation
| Return a list of dependency specs sorted topologically. The spec | ||
| argument is not modified in the process. | ||
|
|
||
| def topological_sort(spec, deptype='all'): |
There was a problem hiding this comment.
This function could use some comments
| original_spec = getattr(spec, 'wrapped_obj', spec) | ||
| self.wrapped_obj = original_spec | ||
| self.token = original_spec, name, query_parameters |
There was a problem hiding this comment.
Should this logic be pushed into lang.ObjectWrapper? It seems like it would apply to any object that cares about it's id, so we could use __getattr__ and __getitem__ to make sure methods are called on the wrapped object.
There was a problem hiding this comment.
Hmm, I need to think about that. I think __getattr__ wouldn't work since the way we create a wrapper object is to fake the class name and share the __dict__ object:
spack/lib/spack/llnl/util/lang.py
Lines 742 to 747 in c791ddc
We can probably create though a better API to say that a method should be forwarded to the real object.
deb51c4 to
950f436
Compare
becker33
left a comment
There was a problem hiding this comment.
I think this is ready to go, but I'm a little nervous that we could be missing something.
balay
left a comment
There was a problem hiding this comment.
Approving petsc/package.py change [checked a couple of builds - they go through fine]
There was a problem hiding this comment.
Ok I have two main things here:
-
Can you check the performance of this implementation? With something like
spack buildcache listwhere we have to construct and sort a lot of specs? I am worried that the edge structures are going to be too complicated and that we'll run into issues with large build caches. If there's no noticeable slowdown then I think that's fine, but it would be good to know b/c this is a major change to a core data structure. I made a few comments inline to this effect. -
The interface for accessing dependencies here seems awkward in places. I get why there are bothedges_from _dependents(name)andedges_to _dependencies(name), for symmetry, but essentially every place we ask for a dependency by name, there is going to be one element in that list. The dependencies case (where there will only be one) is far more common than the dependents case (where we have to think about multiple). So I think the interface should not be the same for these.I think there should be a way to get the single edge with, e.g.,edge_to_dependency(name)(which should fail with an assertion error if there are multiple), andedges_from _dependentsshould probably not take anameargument. I don't think this leaks out ofspec.pytoo much -- the dicts with list values are mostly hidden in the details of_EdgeMap, but there were places where I found the code harder to follow and had to think about the lists in_EdgeMapwhen they did not matter at all - that was hard to read.
lib/spack/spack/spec.py
Outdated
|
|
||
| return clone | ||
|
|
||
| def select_by(self, parent=None, child=None, dependency_types='all'): |
There was a problem hiding this comment.
dependency_types should be renamed to deptypes, because we call it that in every other place it's used as an argument.
There was a problem hiding this comment.
I slightly disagree with that. We call them both deptype and deptypes and we had bugs because of the two similar spellings1. Writing the full name in my opinion helps avoiding these kind of issues.
Footnotes
-
Actually looking again at the code base there are more bugs than I caught in the past. I'll submit a separate PR to fix them. See for instance this call: https://github.com/spack/spack/blob/1ddad522a4b1527d337bfface0a62b26b8520347/lib/spack/spack/package.py#L1192
wheredeptypesgets forwarded to this other method:
https://github.com/spack/spack/blob/1ddad522a4b1527d337bfface0a62b26b8520347/lib/spack/spack/spec.py#L1350
that gets adeptype(withouts) instead. ↩
There was a problem hiding this comment.
I think we should standardize on deptypes here and fix the other occurrences in the separate PR
|
For Todd's point (2), we need the dependency edge information for when we split build dependencies for cross-compilation. It's not possible yet, and it won't be in the first separate concretization of build deps PR, but we will need that eventually. I think that's a good enough reason to keep this now, since we'll need it in the long run. |
I agree with @becker33 That is one of the point why we have this PR, to allow splitting edges for the same dependency if need be. A prominent example would be Also, the common case - the one where we have a single dependency - should be covered by dep = root[dep_name] |
|
Ok, yes you guys are right -- when we properly handle cross-compilation and can split a dependency node between build and link/run environments, we'll care about this. So for point 2 above, yeah, let's keep the interface symmetric. I guess right now, |
|
Ok updated the review and resolved the points around multiple edges to dependencies. |
|
From a fresh clone of Spack, setting up the instance with these commands: $ spack solve zlib
$ spack mirror add E4S https://cache.e4s.io
$ spack buildcache keys -it
==> Fetching https://cache.e4s.io/build_cache/_pgp/25645FA2B218FE55B4EF649E4345F04B40005581.pub
gpg: key 4345F04B40005581: public key "University of Oregon - E4S" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: inserting ownertrust of 6and running on:
This is the timing I obtain on this branch: On so it seems there's a ~15-20% increase in time for this call. |
|
Can you profile it with pyinstrument? When I've looked at this in the past it was very sensitive to canonicalizing dependencies in too many places (as that happens a lot as part of traversal). I'm suspicious of that and maybe the new list structure -- but moreso the canonicalizing. |
|
Also another way to benchmark this that's faster than waiting for I tend to run pyinstrument like this, to show everything: python3 -m pyinstrument --show-all --hide zzz $(which spack) help
|
950f436 to
c19ef50
Compare
|
With c19ef50 I have this profiling buildcache_list.spy.txt obtained with: $ py-spy record -f speedscope -o buildcache_list.spy -s -n $(which spack) buildcache listIt can be seen at https://www.speedscope.app/ It seems we spend a lot of time (for what this PR is concerned) in the comparisons at: Line 826 in c19ef50 If I read the profiling data correctly canonicalizing doesn't have a major impact for the command above. |
5d77b88 to
df30e6c
Compare
| for edge in selected: | ||
| if id(dependency_spec) == id(edge.spec): | ||
| edge.add_type(deptype) | ||
| return |
There was a problem hiding this comment.
@tgamblin I still need to update this part, to do the symmetric operation on the parent It seems though we have no test that detected this. Ok, it turns out this is correct, since we push the same object into two dictionaries, so when we update the dependency types the other reference sees the update too.
|
@tgamblin From what I can see locally the timing of this branch is now more in line with $ time spack buildcache list
==> 0 cached builds.
==> You can query all available architectures with:
spack buildcache list --allarch
real 0m47,115s
user 0m44,382s
sys 0m0,356s
$ time spack buildcache list --allarch
[ ... ]
real 0m50,916s
user 0m48,500s
sys 0m0,429swhile on `develop: $ time spack buildcache list
==> 0 cached builds.
==> You can query all available architectures with:
spack buildcache list --allarch
real 0m46,886s
user 0m44,530s
sys 0m0,380s
$ time spack buildcache list --allarch
[ ... ]
real 0m44,941s
user 0m42,481s
sys 0m0,469sFor the record, I am using:
|
| key = edge.spec.name if self.store_by_child else edge.parent.name | ||
| current_list = self.edges.setdefault(key, []) | ||
| current_list.append(edge) | ||
| current_list.sort(key=_sort_by_dep_types) |
There was a problem hiding this comment.
Note that while this guarantees deterministic dependency traversal order, it doesn't guarantee a deterministic dependent traversal order for the same spec. I do not think that matters currently (we don't hash dependent traversals) but it's worth remembering.
| c2 = Spec('[email protected]').concretized() | ||
|
|
||
| p.add_dependency_edge(c1, deptype=c1_deptypes) | ||
| with pytest.raises(spack.error.SpackError): |
There was a problem hiding this comment.
I verified that this raises the right exception but for the future: this test should probably not have the parent and the dependencies as b. If we added cycle checking or something I could see this error getting confused with another.
This is true for a few tests above too
| raise spack.error.SpecError(msg.format(edge, deptype)) | ||
|
|
||
| for edge in selected: | ||
| if id(dependency_spec) == id(edge.spec): |
There was a problem hiding this comment.
this should probably be shortened to is but I think we can leave it for now.
tgamblin
left a comment
There was a problem hiding this comment.
I have a few really minor nitpicks above but none of them should impede getting this merged -- they can be follow-ons. LGTM! The new tests and minimal performance hit are great.
…age (spack#28673)" This reverts commit 2cd5c00.
|
@sethrj Changes to fix the package: spec.dependencies(deptype='link')As it is now it's looking for a dependency named |
fixes #11983
closes #16447
same as #21683 which I cannot reopen after a force push for some reason
This PR changes the internal representation of
Specto allow for multiple dependencies or dependents stemming from the same package. This change permits to represent cases which are frequent in cross compiled environments or to bootstrap compilers.Modifications:
DependencyMapwith_EdgeMap. The main differences are that the latter does not support direct item assignment and can be modified only through its API. It also provides aselect_bymethod to query items.Specto get list of dependencies or related edges.Since #22845 went in first, this PR reuses that format and thus it should not change hashes. What happens is that in the list of dependencies we may have the same package being present multiple times with different associated specs.