Allow for multiple dependencies/dependents from the same package #21683

Closed
alalazo wants to merge 13 commits into spack:develop from
alalazo:features/subscript_api_multiple_nodes_per_package

Conversation

@alalazo (Member) commented Feb 15, 2021:

fixes #11983
closes #16447

This PR changes the internal representation of Spec to allow for multiple dependencies or dependents stemming from the same package. This makes it possible to represent cases that are frequent in cross-compiled environments or when bootstrapping compilers.

Modifications:

  • Substituted DependencyMap with _EdgeMap. The main differences are that the latter does not support direct item assignment and can be modified only through its API. It also provides a select_by method to query items.
  • Reworked a few public APIs of Spec to get lists of dependencies or related edges.
  • Added unit tests to prevent regressions on "Incomplete computation of installed dependents" (#11983) and to prove the synthetic construction of specs with multiple dependencies from the same package.
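As a rough illustration of the idea (a toy sketch with invented names, not the actual Spack implementation), an edge map keyed by package name that only grows through its API and supports filtered queries might look like:

```python
from collections import defaultdict

class Edge:
    """A parent -> child dependency edge with its dependency types."""
    def __init__(self, parent, child, deptypes):
        self.parent, self.child, self.deptypes = parent, child, deptypes

class EdgeMap:
    """Stores lists of edges keyed by package name; no direct item assignment."""
    def __init__(self, store_by_child=True):
        self.store_by_child = store_by_child
        self._edges = defaultdict(list)

    def add(self, edge):
        # Key by child or parent name depending on how this map is used
        key = edge.child if self.store_by_child else edge.parent
        self._edges[key].append(edge)

    def select_by(self, parent=None, child=None, dependency_types="all"):
        selected = [e for lst in self._edges.values() for e in lst]
        if parent is not None:
            selected = [e for e in selected if e.parent == parent]
        if child is not None:
            selected = [e for e in selected if e.child == child]
        if dependency_types != "all":
            selected = [e for e in selected
                        if set(dependency_types) & set(e.deptypes)]
        return selected

# Two distinct "gcc" edges from the same parent are now representable:
em = EdgeMap()
em.add(Edge("mpich", "gcc", ("build",)))
em.add(Edge("mpich", "gcc", ("run",)))
assert len(em.select_by(child="gcc")) == 2
assert len(em.select_by(child="gcc", dependency_types=("run",))) == 1
```

The key design point is that duplicate package names are legal: the map stores lists of edges per name rather than a single entry.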

Due to the change in the internal representation of specs, the YAML format for specs will change too, and the "dependencies" field will become a list of dictionaries instead of a single dictionary:

    dependencies:
      pkgconf:
      - hash: nip2nwwydp6asi4iiza37drmolecwzyg
        type:
        - build

This in turn will cause all of the hashes to change, so the PR is definitely not backward compatible with old installation hashes.

Update: since #22845 went in first, this PR reuses that format and thus should not change hashes. The difference is that the list of dependencies may now contain the same package multiple times, with different associated specs.
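For illustration, a toy node dictionary (field names and hashes are invented for this sketch, not taken from an actual spec file) shows how a list-based "dependencies" field can hold the same package name twice:

```python
# Hypothetical serialized node: with dependencies stored as a list, the same
# package ("gcc" here) can appear twice with different hashes and types.
node = {
    "name": "mypkg",
    "dependencies": [
        {"name": "gcc", "hash": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "type": ["build"]},
        {"name": "gcc", "hash": "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", "type": ["run"]},
    ],
}

# A dict keyed by name could never represent this without losing an entry.
gcc_deps = [d for d in node["dependencies"] if d["name"] == "gcc"]
assert len(gcc_deps) == 2
```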

@alalazo added the hash-change and bugfix (Something wasn't working, here's a fix) labels Feb 15, 2021
@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch 2 times, most recently from e33c575 to 72d8bab Compare February 18, 2021 22:52
@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch 2 times, most recently from ab6161d to f3e6b03 Compare March 5, 2021 13:55
@alalazo alalazo marked this pull request as ready for review March 5, 2021 15:27
@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch 2 times, most recently from f5cea50 to b2907d7 Compare March 31, 2021 19:32
@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch from b2907d7 to 2162313 Compare April 1, 2021 06:17
@alalazo (Member, Author) commented Apr 1, 2021:

@tgamblin Rebased on top of #21618 and ready for review

@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch from 2162313 to 83dcaa2 Compare April 15, 2021 17:04
@alalazo alalazo requested a review from scheibelp April 30, 2021 07:10
@scheibelp (Member) left a review comment:

I have a few questions to start with. Primarily I'm interested in why this changes both dependencies and dependents. My understanding is that just changing dependents would be sufficient to fix #11983 (more details in the comments).

    nodes.update(self_nodes)

    for name in nodes:
        # TODO: check if splice semantics is respected
Member:

I'm guessing the intent would be to resolve this TODO before merging this.

What is the effect of splicing in a dependency when there are multiple instances of the package in the DAG? I would assume that this function might need to specify deptypes in that case.

alalazo (Member, Author):

> I'm guessing the intent would be to resolve this TODO before merging this.

To my understanding splicing is not yet used in Spack; there have just been some commits to prepare for this functionality. We can either leave a TODO and check it in a later PR (since there are no test failures), or see if @becker33 or maybe @nhanford can help with that here.

Member:

As is, this TODO is too vague. We should explain why this PR may lead to a failure to respect splice semantics.

For starters, I think this doesn't handle the case where we depend on the same package once for linking and once for building, and we want to splice in a separate build dependency (for that, I think splice would need to allow specifying a dependency type). I think that should be handled here (so in that sense it's not a good example of what should go into the TODO), but I also want to consider other cases where the logic is incomplete.

Overall I think this will have to be understood better before this PR is merged (maybe we don't need to solve every problem, but the anticipated issues will need to be enumerated).

(for reference the PR that added splicing is #20262)

@scheibelp (Member) left a review comment:

I'm still reading through this to understand it. I have some additional questions.


        return clone

    def select_by(self, parent=None, child=None, dependency_types='all'):
Member:

This API seems a little strange to me because the edge map is entirely parents or entirely children. I think something like:

    def select_by(self, name=None, deptypes='all'):
        ...
        if self.store_by == 'child':
            selected = [d for d in selected if d.spec.name == name]

would avoid that confusion.

alalazo (Member, Author):

I think the API is correct, in the sense that an _EdgeMap can store an arbitrary list of edges, collected in a dictionary keyed by either parent name or child name. What I mean is that it is Spec objects that use _EdgeMap to store edges that all have the same parent or the same child, but an _EdgeMap can in principle store edges with completely different parents and children. Hence I think the API is fine for this class.

Member:

> What I mean is that it's specs that use _EdgeMap objects to store edges having all the same parent or all the same children, but an _EdgeMap object can in principle store completely different edges in terms of parents and children.

I'm not sure if by this you mean

  • you might want an edge map that stores edges only to parents, but for multiple children
  • you might want an edge map that stores edges to both parents and children (I assume not this since the EdgeMap has a store_by_child property, but I want to make sure)
  • something else?

@alalazo (Member, Author) commented Sep 14, 2021:

> This API seems a little strange to me because the edge map is entirely parents or entirely children.

Think, for example, of storing all the edges for an environment that has multiple root specs, and wanting to retrieve all the different edges from e.g. any hdf5 to any zlib. With the current API you can do that with this object.

Another way of saying this is that the current API allows using an _EdgeMap object outside of a Spec object, as a general collection of edges.

    # If there's something in the list, check if we need to update an
    # already existing entry.
    for dep in current_list:
        if edge.spec == dep.spec and edge.parent == dep.parent:
@scheibelp (Member) commented May 10, 2021:

Should this be is vs. ==?

    if edge.spec is dep.spec and edge.parent is dep.parent:

Specs may temporarily compare == even if we intend to manage two of them as separate objects (which eventually become unequal).

This may not be critical since the old concretizer should never do something like this and the new concretizer would (I think) only ever use a function like this when the specs were concrete.
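The concern about `==` versus `is` can be illustrated with a toy class (a stand-in for this discussion, not Spack's Spec):

```python
# Two "specs" that compare equal today may be mutated independently and
# become unequal later, so deduplicating edges on "==" can merge nodes
# that were meant to stay distinct.
class ToySpec:
    def __init__(self, name, version=None):
        self.name, self.version = name, version
    def __eq__(self, other):
        return (self.name, self.version) == (other.name, other.version)

a = ToySpec("zlib")
b = ToySpec("zlib")
assert a == b and a is not b   # equal now, but distinct objects
b.version = "1.2.11"           # "concretize" b differently
assert a != b                  # merging on "==" earlier would have lost b
```

This is exactly why the concern goes away if the specs are guaranteed concrete: concrete specs no longer mutate, so equality is stable.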

alalazo (Member, Author):

I think it should be ==, i.e. if we have equivalent nodes defining the edges, we don't care whether they are the same object in memory.

Member:

If the specs are not concrete, but later differ after becoming concrete, would we lose the edge information (i.e. only record one edge for what ends up later being two distinct dependencies)?

I think == is ok as long as we mandate that the specs are concrete.

Member:

If we turn it into one edge, then we know they will concretize the same way because the concretizer will treat them as a single spec.

@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch 2 times, most recently from 7be96f8 to 3e35267 Compare June 3, 2021 19:13
@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Jun 3, 2021

@scheibelp Ready for a second review

@scheibelp (Member) left a review comment:

This includes a couple of questions and responses. The largest request is likely that we should think through how this interacts with splice (more in the comments).


        Spack specs have a single root (the package being installed).
        """
        # FIXME: In the case of multiple parents this property does not
        # FIXME: make sense. Should we revisit the semantics?
Member:

If there are multiple parents there may still be a single root (this would occur for a dependency in a single concretized spec), but if there are multiple roots (e.g. after database reconstruction), this can raise an error. I think that would settle it.
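The suggested semantics can be sketched as follows (names are illustrative, not Spack's API): a node with multiple parents is fine as long as the DAG still has a single root, and an error is raised otherwise.

```python
# parents_of maps a node to the set of its parents; nodes with no
# parents are roots.
def roots(nodes, parents_of):
    return [n for n in nodes if not parents_of.get(n)]

def single_root(nodes, parents_of):
    found = roots(nodes, parents_of)
    if len(found) != 1:
        raise ValueError("expected one root, found %d" % len(found))
    return found[0]

# Diamond DAG: "d" has two parents, but "a" is still the unique root.
parents_of = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
assert single_root(["a", "b", "c", "d"], parents_of) == "a"
```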


@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch from 3e35267 to c93d9fe Compare July 29, 2021 18:56
@spackbot-app spackbot-app bot added build-systems tests General test capability(ies) labels Jul 29, 2021
@alalazo (Member, Author) commented Sep 20, 2021:

Generally I think 99% of the packages that use ^xyz in conflicts or when clauses refer to their direct dependencies, not to some arbitrary dependency of a dependency of a dependency.

I think that's already the semantics for packages. If you conflict with something it's supposed to be either a node attribute or a direct dependency (maybe through a virtual). We'll need to extend the DSL used on the command line in #15569 to allow referencing edge attributes, of which virtuals are an example.

Anyhow, this PR just changes the internal data representation, so no user-facing behavior is changed. There are APIs to discriminate among transitive dependencies, and the implementation of __getitem__ privileges link/run dependencies over build dependencies.

In a followup PR I'll build on this one to introduce separate concretization of build dependencies, and there we may have to face cases like the one you mentioned above.
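As a toy illustration of that lookup preference (assumed behavior for this sketch, not the actual __getitem__ implementation):

```python
# When several dependencies share a name, prefer an edge reachable via
# link/run dependency types over a build-only one.
def lookup(edges, name):
    matches = [e for e in edges if e["child"] == name]
    preferred = [e for e in matches if {"link", "run"} & set(e["type"])]
    return (preferred or matches)[0]

edges = [
    {"child": "gcc", "type": ["build"], "hash": "build-gcc"},
    {"child": "gcc", "type": ["run"], "hash": "run-gcc"},
]
assert lookup(edges, "gcc")["hash"] == "run-gcc"
```

If only a build edge exists, the fallback still returns it, so existing single-dependency lookups keep working.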

@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch from 8d52a24 to c465ab0 Compare January 3, 2022 09:07
Comment on lines +1534 to +1884
    # TODO: This assumes that each solve unifies dependencies
    dependencies[0].add_type(type)
Member:

This will be slightly tricky to update with separate concretization of build deps, but I think the PSIDs will give us a way to make it work.

    topological_order = []
    par = {}
    for name, specs in nodes.items():
        par[name] = [x for x in parents(specs) if x.name in nodes]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this still assumes there's only one node of a given name in the tree we're sorting -- I don't think this can graph the synthetic creations you use for testing.
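For reference, a sort keyed on node identity rather than package name would avoid the collision described above. A minimal Kahn-style sketch (not Spack code; plain strings stand in for unique node ids):

```python
from collections import defaultdict

def toposort(edges):
    """Topologically sort nodes given (parent, child) pairs."""
    indegree, children, nodes = defaultdict(int), defaultdict(list), set()
    for parent, child in edges:
        nodes.update((parent, child))
        indegree[child] += 1
        children[parent].append(child)
    ready = sorted(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.pop()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(nodes):
        raise ValueError("cycle detected")
    return order

# A diamond like the one in this thread: n1 -> {n2, n3} -> n4
order = toposort([("n1", "n2"), ("n1", "n3"), ("n2", "n4"), ("n3", "n4")])
assert order.index("n1") < order.index("n2") < order.index("n4")
```

Two distinct nodes of the same package would simply be two distinct keys here, so no edge information is collapsed.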

alalazo (Member, Author):

With the latest code, using this script:

    import spack.graph
    import spack.hash_types
    import spack.spec

    spack.hash_types.package_hash = spack.hash_types.SpecHashDescriptor(
        deptype=(), package_hash=True, name='package_hash', override=lambda s: ''
    )

    Spec = spack.spec.Spec

    nodes = {
        'n1': Spec('[email protected]'), 'n2': Spec('[email protected]'),
        'n3': Spec('[email protected]'), 'n4': Spec('[email protected]')
    }

    for s in nodes.values():
        s._mark_concrete()

    original = nodes['n1']
    nodes['n2'].add_dependency_edge(nodes['n4'], deptype=('build', 'link'))
    nodes['n3'].add_dependency_edge(nodes['n4'], deptype=('build', 'link'))
    original.add_dependency_edge(nodes['n2'], deptype='run')
    original.add_dependency_edge(nodes['n3'], deptype='link')

    spack.graph.graph_ascii(original)

I obtain the following graph:

    o [email protected]/bysqrwy
    |\
    | o [email protected]/tnvdtkr
    o | [email protected]/n4gla5x
    |/
    o [email protected]/gaeiwev

Comment on lines +923 to +954
    mutable_database.remove('mpileaks ^callpath ^mpich2')
    mutable_database.remove('callpath ^mpich2')
Member:

Would be easier to read in the other order, but not worth changing unless you're in this file anyway


@becker33 (Member) commented Jan 5, 2022:

This also needs to be updated for splicing

@alalazo (Member, Author) commented Jan 10, 2022:

@becker33 With respect to what we discussed in Slack to extend the splice API to:

    def splice(self, other, transitive, deptype):
        ...

where deptype is used to select which spec we want to splice into self, I think the API might not be general enough to cover cases that may happen when allowing multiple specs from the same package.

One (synthetic) use case that we can't cover with that interface is the following. Start with a spec like:

   A
 /   \
B     C

and say that we want to splice in that:

      C'
    /    \
  B_1'   B_2'

where the new C spec depends on two different B specs. In that case it's not a matter of selecting which node in the original spec needs to be spliced, but rather which one of B_1' or B_2' needs to be used for the spliced spec if transitive=True.1 I'm trying to work this out on paper, but I think we might need some "translation" dict that maps hashes of specs to be substituted to hashes of the specs that will substitute them. We might discuss this again offline, but I wanted to leave a sketch of the use case here.

@nhanford You might be interested in this case / have opinions on how to solve this issue.

Footnotes

  1. Another sensible option is to use all three B, B_1' and B_2' in the spliced spec.
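The "translation dict" idea from the comment above could look like this (the hashes are invented placeholders, not real spec hashes):

```python
# Map the hash of each node to be substituted to the hash of the node
# replacing it; nodes absent from the dict are kept as-is.
translation = {"hash_B": "hash_B1prime"}

def spliced_hash(original, translation):
    """Pick the replacement for a node, defaulting to the node itself."""
    return translation.get(original, original)

assert spliced_hash("hash_B", translation) == "hash_B1prime"
assert spliced_hash("hash_C", translation) == "hash_C"  # untouched node passes through
```

This removes the ambiguity in the example: the caller states explicitly that B maps to B_1' (or B_2'), instead of splice having to guess from a deptype alone.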

@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch 3 times, most recently from ccc77e3 to 69de40d Compare January 18, 2022 10:02
@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch 3 times, most recently from 860285c to 882cfeb Compare January 19, 2022 13:09
Introduce a new data structure, called _EdgeMap,
that maps package names to lists of DependencySpec
objects (edges in the DAG).

This data structure is used to track both
dependencies and dependents of a node. It
allows filtering the stored values by
parent or child name and by dependency types.

Unit tests added:
- Regression on 11983
- Synthetic construction of split dependency
- Synthetic construction of bootstrapping
@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch from 882cfeb to cd95594 Compare January 26, 2022 15:12
@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch from 2708c8b to 2fd1940 Compare January 26, 2022 22:09
@alalazo alalazo closed this Jan 29, 2022
@alalazo (Member, Author) commented Jan 29, 2022:

This is kind of weird. I force-pushed the branch to update the PR. The PR wasn't updated, and now I cannot reopen it.


Labels

bugfix (Something wasn't working, here's a fix), build-systems, commands, tests (General test capability(ies)), update-package, utilities

Successfully merging this pull request may close these issues:

  • Incomplete computation of installed dependents (#11983)

5 participants