Allow for multiple dependencies/dependents from the same package #21683

Closed
alalazo wants to merge 13 commits into spack:develop from
alalazo:features/subscript_api_multiple_nodes_per_package

Conversation

@alalazo (Member) commented Feb 15, 2021:

fixes #11983
closes #16447

This PR changes the internal representation of Spec to allow for multiple dependencies or dependents stemming from the same package. This makes it possible to represent cases that are frequent in cross-compiled environments or when bootstrapping compilers.

Modifications:

  • Substituted DependencyMap with _EdgeMap. The main differences are that the latter does not support direct item assignment and can be modified only through its API. It also provides a select_by method to query items.
  • Reworked a few public APIs of Spec to get lists of dependencies or related edges.
  • Added unit tests to prevent regressions on "Incomplete computation of installed dependents" (#11983) and to prove the synthetic construction of specs with multiple dependencies from the same package.
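As a rough illustration of the idea (a toy sketch with invented names, not the actual Spack implementation), an edge map keyed by package name that only grows through its API and supports filtered queries might look like:

```python
from collections import defaultdict

class Edge:
    """A parent -> child dependency edge with its dependency types."""
    def __init__(self, parent, child, deptypes):
        self.parent, self.child, self.deptypes = parent, child, deptypes

class EdgeMap:
    """Stores lists of edges keyed by package name; no direct item assignment."""
    def __init__(self, store_by_child=True):
        self.store_by_child = store_by_child
        self._edges = defaultdict(list)

    def add(self, edge):
        # Key by child or parent name depending on how this map is used
        key = edge.child if self.store_by_child else edge.parent
        self._edges[key].append(edge)

    def select_by(self, parent=None, child=None, dependency_types="all"):
        selected = [e for lst in self._edges.values() for e in lst]
        if parent is not None:
            selected = [e for e in selected if e.parent == parent]
        if child is not None:
            selected = [e for e in selected if e.child == child]
        if dependency_types != "all":
            selected = [e for e in selected
                        if set(dependency_types) & set(e.deptypes)]
        return selected

# Two distinct "gcc" edges from the same parent are now representable:
em = EdgeMap()
em.add(Edge("mpich", "gcc", ("build",)))
em.add(Edge("mpich", "gcc", ("run",)))
assert len(em.select_by(child="gcc")) == 2
assert len(em.select_by(child="gcc", dependency_types=("run",))) == 1
```

The key design point is that duplicate package names are legal: the map stores lists of edges per name rather than a single entry.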

Due to the change in the internal representation of specs, the YAML format for specs will change too, and the "dependencies" field will become a list of dictionaries instead of a single dictionary:

    dependencies:
      pkgconf:
      - hash: nip2nwwydp6asi4iiza37drmolecwzyg
        type:
        - build

This in turn will cause all of the hashes to change, so the PR is definitely not backward compatible with old installation hashes.

Update: since #22845 went in first, this PR reuses that format and thus should not change hashes. The difference is that the list of dependencies may now contain the same package multiple times, with different associated specs.
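For illustration, a toy node dictionary (field names and hashes are invented for this sketch, not taken from an actual spec file) shows how a list-based "dependencies" field can hold the same package name twice:

```python
# Hypothetical serialized node: with dependencies stored as a list, the same
# package ("gcc" here) can appear twice with different hashes and types.
node = {
    "name": "mypkg",
    "dependencies": [
        {"name": "gcc", "hash": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "type": ["build"]},
        {"name": "gcc", "hash": "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", "type": ["run"]},
    ],
}

# A dict keyed by name could never represent this without losing an entry.
gcc_deps = [d for d in node["dependencies"] if d["name"] == "gcc"]
assert len(gcc_deps) == 2
```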

@alalazo added the hash-change and bugfix (Something wasn't working, here's a fix) labels Feb 15, 2021
@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch 2 times, most recently from e33c575 to 72d8bab Compare February 18, 2021 22:52
@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch 2 times, most recently from ab6161d to f3e6b03 Compare March 5, 2021 13:55
@alalazo alalazo marked this pull request as ready for review March 5, 2021 15:27
@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch 2 times, most recently from f5cea50 to b2907d7 Compare March 31, 2021 19:32
@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch from b2907d7 to 2162313 Compare April 1, 2021 06:17
@alalazo (Member, Author) commented Apr 1, 2021:

@tgamblin Rebased on top of #21618 and ready for review

@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch from 2162313 to 83dcaa2 Compare April 15, 2021 17:04
@alalazo alalazo requested a review from scheibelp April 30, 2021 07:10
@scheibelp (Member) left a review comment:

I have a few questions to start with. Primarily I'm interested in why this changes both dependencies and dependents. My understanding is that just changing dependents would be sufficient to fix #11983 (more details in the comments).

    nodes.update(self_nodes)

    for name in nodes:
        # TODO: check if splice semantics is respected
Member:

I'm guessing the intent would be to resolve this TODO before merging this.

What is the effect of splicing in a dependency when there are multiple instances of the package in the DAG? I would assume that this function might need to specify deptypes in that case.

alalazo (Member, Author):

> I'm guessing the intent would be to resolve this TODO before merging this.

To my understanding splicing is not yet used in Spack; there have just been some commits to prepare for this functionality. We can either leave a TODO and check it in a later PR (since there are no test failures), or see if @becker33 or maybe @nhanford can help with that here.

Member:

As is, this TODO is too vague. We should explain why this PR may lead to a failure to respect splice semantics.

For starters, I think this doesn't handle the case where we depend on the same package once for linking and once for building, and we want to splice in a separate build dependency (for that, I think splice would need to allow specifying a dependency type). I think that should be handled here (so in that sense it's not a good example of what should go into the TODO), but I also want to consider other cases where the logic is incomplete.

Overall I think this will have to be understood better before this PR is merged (maybe we don't need to solve every problem, but the anticipated issues will need to be enumerated).

(for reference the PR that added splicing is #20262)

@scheibelp (Member) left a review comment:

I'm still reading through this to understand it. I have some additional questions.


        return clone

    def select_by(self, parent=None, child=None, dependency_types='all'):
Member:

This API seems a little strange to me because the edge map is entirely parents or entirely children. I think something like:

    def select_by(self, name=None, deptypes='all'):
        ...
        if self.store_by == 'child':
            selected = [d for d in selected if d.spec.name == name]

would avoid that confusion.

alalazo (Member, Author):

I think the API is correct, in the sense that an _EdgeMap can store an arbitrary list of edges, collected in a dictionary keyed by either parent name or child name. What I mean is that it is Spec objects that use _EdgeMap to store edges that all have the same parent or the same child, but an _EdgeMap can in principle store edges with completely different parents and children. Hence I think the API is fine for this class.

Member:

> What I mean is that it's specs that use _EdgeMap objects to store edges having all the same parent or all the same children, but an _EdgeMap object can in principle store completely different edges in terms of parents and children.

I'm not sure if by this you mean

  • you might want an edge map that stores edges only to parents, but for multiple children
  • you might want an edge map that stores edges to both parents and children (I assume not this since the EdgeMap has a store_by_child property, but I want to make sure)
  • something else?

@alalazo (Member, Author) commented Sep 14, 2021:

> This API seems a little strange to me because the edge map is entirely parents or entirely children.

Think, for example, of storing all the edges for an environment that has multiple root specs, and wanting to retrieve all the different edges from e.g. any hdf5 to any zlib. With the current API you can do that with this object.

Another way of saying this is that the current API allows using an _EdgeMap object outside of a Spec object, as a general collection of edges.

    # If there's something in the list, check if we need to update an
    # already existing entry.
    for dep in current_list:
        if edge.spec == dep.spec and edge.parent == dep.parent:
@scheibelp (Member) commented May 10, 2021:

Should this be is vs. ==?

    if edge.spec is dep.spec and edge.parent is dep.parent:

Specs may temporarily compare == even if we intend to manage two of them as separate objects (which eventually become unequal).

This may not be critical since the old concretizer should never do something like this and the new concretizer would (I think) only ever use a function like this when the specs were concrete.
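The concern about `==` versus `is` can be illustrated with a toy class (a stand-in for this discussion, not Spack's Spec):

```python
# Two "specs" that compare equal today may be mutated independently and
# become unequal later, so deduplicating edges on "==" can merge nodes
# that were meant to stay distinct.
class ToySpec:
    def __init__(self, name, version=None):
        self.name, self.version = name, version
    def __eq__(self, other):
        return (self.name, self.version) == (other.name, other.version)

a = ToySpec("zlib")
b = ToySpec("zlib")
assert a == b and a is not b   # equal now, but distinct objects
b.version = "1.2.11"           # "concretize" b differently
assert a != b                  # merging on "==" earlier would have lost b
```

This is exactly why the concern goes away if the specs are guaranteed concrete: concrete specs no longer mutate, so equality is stable.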

alalazo (Member, Author):

I think it should be ==, i.e. if we have equivalent nodes defining the edges, we don't care whether they are the same object in memory.

Member:

If the specs are not concrete, but later differ after becoming concrete, would we lose the edge information (i.e. only record one edge for what ends up later being two distinct dependencies)?

I think == is ok as long as we mandate that the specs are concrete.

Member:

If we turn it into one edge, then we know they will concretize the same way because the concretizer will treat them as a single spec.

@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch 2 times, most recently from 7be96f8 to 3e35267 Compare June 3, 2021 19:13
@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Jun 3, 2021

@scheibelp Ready for a second review

@scheibelp (Member) left a review comment:

This includes a couple of questions and responses. The largest request is likely that we should think through how this interacts with splice (more in the comments).


        Spack specs have a single root (the package being installed).
        """
        # FIXME: In the case of multiple parents this property does not
        # FIXME: make sense. Should we revisit the semantics?
Member:

If there are multiple parents there may still be a single root (this would occur for a dependency in a single concretized spec), but if there are multiple roots (e.g. after database reconstruction), this can raise an error. I think that would settle it.
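The suggested semantics can be sketched as follows (names are illustrative, not Spack's API): a node with multiple parents is fine as long as the DAG still has a single root, and an error is raised otherwise.

```python
# parents_of maps a node to the set of its parents; nodes with no
# parents are roots.
def roots(nodes, parents_of):
    return [n for n in nodes if not parents_of.get(n)]

def single_root(nodes, parents_of):
    found = roots(nodes, parents_of)
    if len(found) != 1:
        raise ValueError("expected one root, found %d" % len(found))
    return found[0]

# Diamond DAG: "d" has two parents, but "a" is still the unique root.
parents_of = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
assert single_root(["a", "b", "c", "d"], parents_of) == "a"
```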


@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch from 3e35267 to c93d9fe Compare July 29, 2021 18:56
@spackbot-app spackbot-app bot added build-systems tests General test capability(ies) labels Jul 29, 2021
@alalazo (Member, Author) commented Sep 20, 2021:

Generally I think 99% of the packages that use ^xyz in conflicts or when clauses refer to their direct dependencies, not to some arbitrary dependency of a dependency of a dependency.

I think that's already the semantics for packages. If you conflict with something it's supposed to be either a node attribute or a direct dependency (maybe through a virtual). We'll need to extend the DSL used on the command line in #15569 to allow referencing edge attributes, of which virtuals are an example.

Anyhow, this PR just changes the internal data representation, so no user-facing behavior is changed. There are APIs to discriminate among transitive dependencies, and the implementation of __getitem__ privileges link/run dependencies over build dependencies.

In a followup PR I'll build on this one to introduce separate concretization of build dependencies, and there we may have to face cases like the one you mentioned above.
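As a toy illustration of that lookup preference (assumed behavior for this sketch, not the actual __getitem__ implementation):

```python
# When several dependencies share a name, prefer an edge reachable via
# link/run dependency types over a build-only one.
def lookup(edges, name):
    matches = [e for e in edges if e["child"] == name]
    preferred = [e for e in matches if {"link", "run"} & set(e["type"])]
    return (preferred or matches)[0]

edges = [
    {"child": "gcc", "type": ["build"], "hash": "build-gcc"},
    {"child": "gcc", "type": ["run"], "hash": "run-gcc"},
]
assert lookup(edges, "gcc")["hash"] == "run-gcc"
```

If only a build edge exists, the fallback still returns it, so existing single-dependency lookups keep working.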

@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch from 8d52a24 to c465ab0 Compare January 3, 2022 09:07
Comment on lines +1534 to +1884
    # TODO: This assumes that each solve unifies dependencies
    dependencies[0].add_type(type)
Member:

This will be slightly tricky to update with separate concretization of build deps, but I think the PSIDs will give us a way to make it work.

    topological_order = []
    par = {}
    for name, specs in nodes.items():
        par[name] = [x for x in parents(specs) if x.name in nodes]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this still assumes there's only one node of a given name in the tree we're sorting -- I don't think this can graph the synthetic creations you use for testing.
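For reference, a sort keyed on node identity rather than package name would avoid the collision described above. A minimal Kahn-style sketch (not Spack code; plain strings stand in for unique node ids):

```python
from collections import defaultdict

def toposort(edges):
    """Topologically sort nodes given (parent, child) pairs."""
    indegree, children, nodes = defaultdict(int), defaultdict(list), set()
    for parent, child in edges:
        nodes.update((parent, child))
        indegree[child] += 1
        children[parent].append(child)
    ready = sorted(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.pop()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(nodes):
        raise ValueError("cycle detected")
    return order

# A diamond like the one in this thread: n1 -> {n2, n3} -> n4
order = toposort([("n1", "n2"), ("n1", "n3"), ("n2", "n4"), ("n3", "n4")])
assert order.index("n1") < order.index("n2") < order.index("n4")
```

Two distinct nodes of the same package would simply be two distinct keys here, so no edge information is collapsed.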

alalazo (Member, Author):

With the latest code, using this script:

    import spack.graph
    import spack.hash_types
    import spack.spec

    spack.hash_types.package_hash = spack.hash_types.SpecHashDescriptor(
        deptype=(), package_hash=True, name='package_hash', override=lambda s: ''
    )

    Spec = spack.spec.Spec

    nodes = {
        'n1': Spec('[email protected]'), 'n2': Spec('[email protected]'),
        'n3': Spec('[email protected]'), 'n4': Spec('[email protected]')
    }

    for s in nodes.values():
        s._mark_concrete()

    original = nodes['n1']
    nodes['n2'].add_dependency_edge(nodes['n4'], deptype=('build', 'link'))
    nodes['n3'].add_dependency_edge(nodes['n4'], deptype=('build', 'link'))
    original.add_dependency_edge(nodes['n2'], deptype='run')
    original.add_dependency_edge(nodes['n3'], deptype='link')

    spack.graph.graph_ascii(original)

I obtain the following graph:

    o [email protected]/bysqrwy
    |\
    | o [email protected]/tnvdtkr
    o | [email protected]/n4gla5x
    |/
    o [email protected]/gaeiwev

Comment on lines +923 to +954
    mutable_database.remove('mpileaks ^callpath ^mpich2')
    mutable_database.remove('callpath ^mpich2')
Member:

Would be easier to read in the other order, but not worth changing unless you're in this file anyway


@becker33 (Member) commented Jan 5, 2022:

This also needs to be updated for splicing

@alalazo (Member, Author) commented Jan 10, 2022:

@becker33 With respect to what we discussed in Slack to extend the splice API to:

    def splice(self, other, transitive, deptype):
        ...

where deptype is used to select which spec we want to splice into self, I think the API might not be general enough to cover cases that may happen when allowing multiple specs from the same package.

One (synthetic) use case that we can't cover with that interface is the following. Start with a spec like:

   A
 /   \
B     C

and say that we want to splice in that:

      C'
    /    \
  B_1'   B_2'

where the new C spec depends on two different B specs. In that case it's not a matter of selecting which node in the original spec needs to be spliced, but rather which one of B_1' or B_2' needs to be used for the spliced spec if transitive=True.1 I'm trying to work this out on paper, but I think we might need some "translation" dict that maps hashes of specs to be substituted to hashes of the specs that will substitute them. We might discuss this again offline, but I wanted to leave a sketch of the use case here.

@nhanford You might be interested in this case / have opinions on how to solve this issue.

Footnotes

  1. Another sensible option is to use all three B, B_1' and B_2' in the spliced spec.
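The "translation dict" idea from the comment above could look like this (the hashes are invented placeholders, not real spec hashes):

```python
# Map the hash of each node to be substituted to the hash of the node
# replacing it; nodes absent from the dict are kept as-is.
translation = {"hash_B": "hash_B1prime"}

def spliced_hash(original, translation):
    """Pick the replacement for a node, defaulting to the node itself."""
    return translation.get(original, original)

assert spliced_hash("hash_B", translation) == "hash_B1prime"
assert spliced_hash("hash_C", translation) == "hash_C"  # untouched node passes through
```

This removes the ambiguity in the example: the caller states explicitly that B maps to B_1' (or B_2'), instead of splice having to guess from a deptype alone.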

@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch 3 times, most recently from ccc77e3 to 69de40d Compare January 18, 2022 10:02
@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch 3 times, most recently from 860285c to 882cfeb Compare January 19, 2022 13:09
Introduce a new data structure, called _EdgeMap,
that maps package names to lists of DependencySpec
objects (edges in the DAG).

This data structure is used to track both
dependencies and dependents of a node. It
allows filtering the stored values by
parent or child name and by dependency types.

Unit tests added:
- Regression on 11983
- Synthetic construction of split dependency
- Synthetic construction of bootstrapping
@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch from 882cfeb to cd95594 Compare January 26, 2022 15:12
@alalazo alalazo force-pushed the features/subscript_api_multiple_nodes_per_package branch from 2708c8b to 2fd1940 Compare January 26, 2022 22:09
@alalazo alalazo closed this Jan 29, 2022
@alalazo (Member, Author) commented Jan 29, 2022:

This is kind of weird. I force-pushed the branch to update the PR. The PR wasn't updated, and now I cannot reopen it.


Labels

bugfix (Something wasn't working, here's a fix), build-systems, commands, tests (General test capability(ies)), update-package, utilities

Successfully merging this pull request may close these issues:

  • Incomplete computation of installed dependents (#11983)

5 participants