Skip to content

Allow for multiple dependencies/dependents from the same package#28673

Merged
tgamblin merged 29 commits intospack:developfrom
alalazo:features/subscript_api_multiple_nodes_per_package
Mar 10, 2022
Merged

Allow for multiple dependencies/dependents from the same package#28673
tgamblin merged 29 commits intospack:developfrom
alalazo:features/subscript_api_multiple_nodes_per_package

Conversation

@alalazo
Copy link
Copy Markdown
Member

@alalazo alalazo commented Jan 29, 2022

fixes #11983
closes #16447

same as #21683 which I cannot reopen after a force push for some reason

This PR changes the internal representation of Spec to allow for multiple dependencies or dependents stemming from the same package. This change permits to represent cases which are frequent in cross compiled environments or to bootstrap compilers.

Modifications:

  • Substitute DependencyMap with _EdgeMap. The main differences are that the latter does not support direct item assignment and can be modified only through its API. It also provides a select_by method to query items.
  • Reworked a few public APIs of Spec to get list of dependencies or related edges.
  • Added unit tests to prevent regression on Incomplete computation of installed dependents #11983 and prove the synthetic construction of specs with multiple deps from the same package.

Since #22845 went in first, this PR reuses that format and thus it should not change hashes. What happens is that in the list of dependencies we may have the same package being present multiple times with different associated specs.

Copy link
Copy Markdown
Member

@becker33 becker33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM

Return a list of dependency specs sorted topologically. The spec
argument is not modified in the process.

def topological_sort(spec, deptype='all'):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function could use some comments

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed in 950f436

Comment on lines +1143 to +1126
original_spec = getattr(spec, 'wrapped_obj', spec)
self.wrapped_obj = original_spec
self.token = original_spec, name, query_parameters
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this logic be pushed into lang.ObjectWrapper? It seems like it would apply to any object that cares about it's id, so we could use __getattr__ and __getitem__ to make sure methods are called on the wrapped object.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I need to think about that. I think __getattr__ wouldn't work since the way we create a wrapper object is to fake the class name and share the __dict__ object:

if type(self) not in wrapped_cls.__mro__:
self.__class__ = type(wrapped_name, (type(self), wrapped_cls), {})
else:
self.__class__ = type(wrapped_name, (wrapped_cls,), {})
self.__dict__ = wrapped_object.__dict__

We can probably create though a better API to say that a method should be forwarded to the real object.

@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch from deb51c4 to 950f436 Compare February 7, 2022 15:42
@alalazo alalazo requested a review from becker33 February 7, 2022 16:56
becker33
becker33 previously approved these changes Feb 7, 2022
Copy link
Copy Markdown
Member

@becker33 becker33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready to go, but I'm a little nervous that we could be missing something.

balay
balay previously approved these changes Feb 7, 2022
Copy link
Copy Markdown
Contributor

@balay balay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving petsc/package.py change [checked a couple of builds - they go through fine]

@tgamblin tgamblin mentioned this pull request Feb 8, 2022
10 tasks
Copy link
Copy Markdown
Member

@tgamblin tgamblin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I have two main things here:

  1. Can you check the performance of this implementation? With something like spack buildcache list where we have to construct and sort a lot of specs? I am worried that the edge structures are going to be too complicated and that we'll run into issues with large build caches. If there's no noticeable slowdown then I think that's fine, but it would be good to know b/c this is a major change to a core data structure. I made a few comments inline to this effect.

  2. The interface for accessing dependencies here seems awkward in places. I get why there are both edges_from _dependents(name) and edges_to _dependencies(name), for symmetry, but essentially every place we ask for a dependency by name, there is going to be one element in that list. The dependencies case (where there will only be one) is far more common than the dependents case (where we have to think about multiple). So I think the interface should not be the same for these.

    I think there should be a way to get the single edge with, e.g., edge_to_dependency(name) (which should fail with an assertion error if there are multiple), and edges_from _dependents should probably not take a name argument. I don't think this leaks out of spec.py too much -- the dicts with list values are mostly hidden in the details of _EdgeMap, but there were places where I found the code harder to follow and had to think about the lists in _EdgeMap when they did not matter at all - that was hard to read.


return clone

def select_by(self, parent=None, child=None, dependency_types='all'):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dependency_types should be renamed to deptypes, because we call it that in every other place it's used as an argument.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I slightly disagree with that. We call them both deptype and deptypes and we had bugs because of the two similar spellings1. Writing the full name in my opinion helps avoiding these kind of issues.

Footnotes

  1. Actually looking again at the code base there are more bugs than I caught in the past. I'll submit a separate PR to fix them. See for instance this call: https://github.com/spack/spack/blob/1ddad522a4b1527d337bfface0a62b26b8520347/lib/spack/spack/package.py#L1192
    where deptypes gets forwarded to this other method:
    https://github.com/spack/spack/blob/1ddad522a4b1527d337bfface0a62b26b8520347/lib/spack/spack/spec.py#L1350
    that gets a deptype (without s) instead.

Copy link
Copy Markdown
Member

@tgamblin tgamblin Mar 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should standardize on deptypes here and fix the other occurrences in the separate PR

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #29397 and f99c721

@becker33
Copy link
Copy Markdown
Member

For Todd's point (2), we need the dependency edge information for when we split build dependencies for cross-compilation. It's not possible yet, and it won't be in the first separate concretization of build deps PR, but we will need that eventually. I think that's a good enough reason to keep this now, since we'll need it in the long run.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Feb 23, 2022

we need the dependency edge information for when we split build dependencies for cross-compilation.

I agree with @becker33 That is one of the point why we have this PR, to allow splitting edges for the same dependency if need be. A prominent example would be python being a (build,run) dependency in a cross-compilation environment.

Also, the common case - the one where we have a single dependency - should be covered by __getitem__:

dep = root[dep_name]

@tgamblin
Copy link
Copy Markdown
Member

Ok, yes you guys are right -- when we properly handle cross-compilation and can split a dependency node between build and link/run environments, we'll care about this. So for point 2 above, yeah, let's keep the interface symmetric.

I guess right now, root[dep_name] will give you the link dependency if there happen to be two with the same name, which I guess is the right thing to do here. I guess we can make the indexing fancier if we want later -- e.g., root[dep_name, 'build'] if we really have to refine it.

@tgamblin
Copy link
Copy Markdown
Member

Ok updated the review and resolved the points around multiple edges to dependencies.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Feb 23, 2022

From a fresh clone of Spack, setting up the instance with these commands:

$ spack solve zlib
$ spack mirror add E4S https://cache.e4s.io
$ spack buildcache keys -it
==> Fetching https://cache.e4s.io/build_cache/_pgp/25645FA2B218FE55B4EF649E4345F04B40005581.pub
gpg: key 4345F04B40005581: public key "University of Oregon - E4S" imported
gpg: Total number processed: 1
gpg:               imported: 1
gpg: inserting ownertrust of 6

and running on:

  • Spack: 0.17.1-1343-455fd15e86
  • Python: 3.8.10
  • Platform: linux-ubuntu20.04-icelake
  • Concretizer: clingo

This is the timing I obtain on this branch:

$ time spack buildcache list
==> 0 cached builds.
==> You can query all available architectures with:
  spack buildcache list --allarch

real	1m8,325s
user	1m5,968s
sys	0m0,399s

$ time spack buildcache list --allarch
[ ... ]
real	1m13,419s
user	1m10,911s
sys	0m0,453s

On develop:

$ time spack buildcache list
==> 0 cached builds.
==> You can query all available architectures with:
  spack buildcache list --allarch

real	0m51,076s
user	0m38,516s
sys	0m0,918s

$ time spack buildcache list --allarch
[ ... ]
real	0m52,793s
user	0m50,168s
sys	0m0,424s

so it seems there's a ~15-20% increase in time for this call.

@tgamblin
Copy link
Copy Markdown
Member

Can you profile it with pyinstrument? When I've looked at this in the past it was very sensitive to canonicalizing dependencies in too many places (as that happens a lot as part of traversal). I'm suspicious of that and maybe the new list structure -- but moreso the canonicalizing.

@tgamblin
Copy link
Copy Markdown
Member

tgamblin commented Feb 23, 2022

Also another way to benchmark this that's faster than waiting for buildcache list is to concretize a large spec, save it as json, then run pyinstrument on a script that just loads it and traverses it a hundred (maybe a few hundred?) times.

I tend to run pyinstrument like this, to show everything:

python3 -m pyinstrument --show-all --hide zzz $(which spack) help

--hide zzz overrides the somewhat annoying default hide settings it has -- which apparently are not overridden by --show-all

@alalazo alalazo dismissed stale reviews from balay and becker33 via c19ef50 February 23, 2022 11:02
@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch from 950f436 to c19ef50 Compare February 23, 2022 11:02
@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Feb 23, 2022

With c19ef50 I have this profiling buildcache_list.spy.txt obtained with:

$ py-spy record -f speedscope -o buildcache_list.spy -s -n $(which spack) buildcache list

It can be seen at https://www.speedscope.app/ It seems we spend a lot of time (for what this PR is concerned) in the comparisons at:

if edge.spec == dep.spec and edge.parent == dep.parent:

If I read the profiling data correctly canonicalizing doesn't have a major impact for the command above.

@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch from 5d77b88 to df30e6c Compare March 7, 2022 15:03
Comment on lines +1447 to +1450
for edge in selected:
if id(dependency_spec) == id(edge.spec):
edge.add_type(deptype)
return
Copy link
Copy Markdown
Member Author

@alalazo alalazo Mar 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tgamblin I still need to update this part, to do the symmetric operation on the parent It seems though we have no test that detected this. Ok, it turns out this is correct, since we push the same object into two dictionaries, so when we update the dependency types the other reference sees the update too.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Mar 8, 2022

@tgamblin From what I can see locally the timing of this branch is now more in line with develop. On 43bd292 I got:

$ time spack buildcache list
==> 0 cached builds.
==> You can query all available architectures with:
  spack buildcache list --allarch

real	0m47,115s
user	0m44,382s
sys	0m0,356s

$ time spack buildcache list --allarch
[ ... ]
real	0m50,916s
user	0m48,500s
sys	0m0,429s

while on `develop:

$ time spack buildcache list
==> 0 cached builds.
==> You can query all available architectures with:
  spack buildcache list --allarch

real	0m46,886s
user	0m44,530s
sys	0m0,380s

$ time spack buildcache list --allarch
[ ... ]
real	0m44,941s
user	0m42,481s
sys	0m0,469s

For the record, I am using:

  • Spack: 0.17.1-1492-43bd292e13
  • Python: 3.8.10
  • Platform: linux-ubuntu20.04-icelake
  • Concretizer: clingo

key = edge.spec.name if self.store_by_child else edge.parent.name
current_list = self.edges.setdefault(key, [])
current_list.append(edge)
current_list.sort(key=_sort_by_dep_types)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that while this guarantees deterministic dependency traversal order, it doesn't guarantee a deterministic dependent traversal order for the same spec. I do not think that matters currently (we don't hash dependent traversals) but it's worth remembering.

c2 = Spec('[email protected]').concretized()

p.add_dependency_edge(c1, deptype=c1_deptypes)
with pytest.raises(spack.error.SpackError):
Copy link
Copy Markdown
Member

@tgamblin tgamblin Mar 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified that this raises the right exception but for the future: this test should probably not have the parent and the dependencies as b. If we added cycle checking or something I could see this error getting confused with another.

This is true for a few tests above too

raise spack.error.SpecError(msg.format(edge, deptype))

for edge in selected:
if id(dependency_spec) == id(edge.spec):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should probably be shortened to is but I think we can leave it for now.

Copy link
Copy Markdown
Member

@tgamblin tgamblin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few really minor nitpicks above but none of them should impede getting this merged -- they can be follow-ons. LGTM! The new tests and minimal performance hit are great.

@tgamblin tgamblin merged commit 2cd5c00 into spack:develop Mar 10, 2022
@alalazo alalazo deleted the features/subscript_api_multiple_nodes_per_package branch March 10, 2022 20:05
@sethrj
Copy link
Copy Markdown
Contributor

sethrj commented Mar 12, 2022

@alalazo I think this change is responsible for #29478: installing the exact same python spec with the new spack core fails because link_deps = spec.dependencies('link') seems to evaluate to an empty list now. Is this behavior change expected? What can be done to fix it?

@sethrj sethrj mentioned this pull request Mar 12, 2022
4 tasks
sethrj added a commit to sethrj/spack that referenced this pull request Mar 12, 2022
@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Mar 12, 2022

@sethrj Changes to fix the package:

spec.dependencies(deptype='link')

As it is now it's looking for a dependency named link. Let me submit a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incomplete computation of installed dependents

5 participants