Allow for multiple dependencies/dependents from the same package #21683

alalazo wants to merge 13 commits into spack:develop
Conversation
scheibelp
left a comment
I have a few questions to start with. Primarily I'm interested in why this changes both dependencies and dependents. My understanding is that just changing dependents would be sufficient to fix #11983 (more details in the comments).
```python
nodes.update(self_nodes)

for name in nodes:
    # TODO: check if splice semantics is respected
```
I'm guessing the intent would be to resolve this TODO before merging this.
What is the effect of splicing in a dependency when there are multiple instances of the package in the DAG? I would assume that this function might need to specify deptypes in that case.
> I'm guessing the intent would be to resolve this TODO before merging this.

To my understanding splicing is not yet used in Spack; there have just been some commits to prepare for this functionality. We can either leave a TODO and check it in a later PR (since there are no test failures) or see if @becker33 or maybe @nhanford can help with that here.
As is, this TODO is too vague. We should explain why this PR may lead to a failure to respect splice semantics.
For starters, I think this doesn't handle the case where we depend on the same package once for linking and once for building, and we want to splice in a separate build dependency (for that, I think splice would need to allow specifying a dependency type). I think that should be handled here (so in that sense it's not a good example of what should go into the TODO), but I also want to consider other cases where the logic is incomplete.
Overall I think this will have to be understood better before this PR is merged (maybe we don't need to solve every problem, but the anticipated issues will need to be enumerated).
(for reference the PR that added splicing is #20262)
scheibelp
left a comment
I'm still reading through this to understand it. I have some additional questions.
```python
        return clone

    def select_by(self, parent=None, child=None, dependency_types='all'):
```
This API seems a little strange to me because the edge map is entirely parents or entirely children. I think something like:

```python
def select_by(self, name=None, deptypes='all'):
    ...
    if self.store_by == 'child':
        selected = [d for d in selected if d.spec.name == name]
```

would avoid that confusion.
I think the API is correct, in the sense that an _EdgeMap can store an arbitrary list of edges, which are collected in a dictionary keyed by either parent name or child name. What I mean is that it's specs that use _EdgeMap objects to store edges that all have the same parent or all the same child, but an _EdgeMap object can in principle store completely different edges in terms of parents and children. Hence I think the API is fine on this class.
> What I mean is that it's specs that use _EdgeMap objects to store edges having all the same parent or all the same children but an _EdgeMap object can in principle store completely different edges in terms of parents and children.

I'm not sure if by this you mean:

- you might want an edge map that stores edges only to parents, but for multiple children
- you might want an edge map that stores edges to both parents and children (I assume not this since the `_EdgeMap` has a `store_by_child` property, but I want to make sure)
- something else?
> This API seems a little strange to me because the edge map is entirely parents or entirely children.

Think for example of storing all the edges for an environment that has multiple root specs, and wanting to retrieve all the different edges from e.g. any hdf5 to any zlib. With the current API you can do that with this object. Another way of saying this is that the current API fits using an _EdgeMap object outside of a Spec object, as a collection of edges.
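A minimal standalone sketch of that usage (class and method names are illustrative, not Spack's actual implementation): an edge map keyed by child name can hold edges from several unrelated roots, and a `select_by`-style query filters across all of them.

```python
from collections import defaultdict


class Edge:
    """A parent -> child dependency edge with associated deptypes (toy model)."""
    def __init__(self, parent, child, deptypes):
        self.parent, self.child, self.deptypes = parent, child, set(deptypes)


class EdgeMap:
    """Collect arbitrary edges, keyed by either parent or child name."""
    def __init__(self, store_by_child=True):
        self.store_by_child = store_by_child
        self.edges = defaultdict(list)

    def add(self, edge):
        key = edge.child if self.store_by_child else edge.parent
        self.edges[key].append(edge)

    def select_by(self, parent=None, child=None, deptypes=None):
        # Flatten all stored edges, then filter by each given criterion.
        selected = [e for lst in self.edges.values() for e in lst]
        if parent is not None:
            selected = [e for e in selected if e.parent == parent]
        if child is not None:
            selected = [e for e in selected if e.child == child]
        if deptypes is not None:
            selected = [e for e in selected if e.deptypes & set(deptypes)]
        return selected


# Edges coming from unrelated roots, stored in one map
em = EdgeMap(store_by_child=True)
em.add(Edge('hdf5', 'zlib', ('link',)))
em.add(Edge('root1', 'hdf5', ('build', 'link')))
em.add(Edge('root2', 'zlib', ('run',)))
print(len(em.select_by(child='zlib')))  # 2: edges to zlib from any parent
```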
```python
# If there's something in the list, check if we need to update an
# already existing entry.
for dep in current_list:
    if edge.spec == dep.spec and edge.parent == dep.parent:
```
Should this be `is` vs. `==`?

```python
if edge.spec is dep.spec and edge.parent is dep.parent:
```

Specs may temporarily compare == even if we intend to manage two of them as separate objects (which eventually become unequal). This may not be critical since the old concretizer should never do something like this and the new concretizer would (I think) only ever use a function like this when the specs were concrete.
I think it should be ==, i.e. if we have equivalent nodes defining the edges, we don't care whether they are the same object in memory.
If the specs are not concrete, but later differ after becoming concrete, would we lose the edge information (i.e. only record one edge for what ends up later being two distinct dependencies)?
I think == is ok as long as we mandate that the specs are concrete.
If we turn it into one edge, then we know they will concretize the same way because the concretizer will treat them as a single spec.
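The merging behavior under discussion can be sketched like this (a hypothetical helper with toy edge objects, not the PR's actual code): an incoming edge that compares equal (==) on spec and parent extends the deptypes of the existing edge instead of adding a second edge.

```python
def add_or_update(current_list, edge):
    """Merge `edge` into `current_list`: if an equal edge (same spec and
    parent, compared with ==) already exists, extend its deptypes;
    otherwise append `edge` as a new entry."""
    for dep in current_list:
        if edge.spec == dep.spec and edge.parent == dep.parent:
            dep.deptypes |= edge.deptypes
            return dep
    current_list.append(edge)
    return edge


class FakeEdge:
    """Stand-in for a DependencySpec-like edge, for illustration only."""
    def __init__(self, parent, spec, deptypes):
        self.parent, self.spec, self.deptypes = parent, spec, set(deptypes)


edges = []
add_or_update(edges, FakeEdge('mpileaks', 'callpath', {'build'}))
add_or_update(edges, FakeEdge('mpileaks', 'callpath', {'link'}))
print(len(edges), sorted(edges[0].deptypes))  # 1 ['build', 'link']
```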
@scheibelp Ready for a second review.
scheibelp
left a comment
This includes a couple of questions and responses. The largest request is likely that we should think through how this interacts with splice (more in the comments).
```python
    Spack specs have a single root (the package being installed).
    """
    # FIXME: In the case of multiple parents this property does not
    # FIXME: make sense. Should we revisit the semantics?
```
If there are multiple parents there may still be a single root (this would occur for a dependency in a single concretized spec), but if there are multiple roots (e.g. after database reconstruction), this can raise an error. I think that would settle it.
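A toy illustration of the suggested semantics (the `Node` class and traversal here are invented for this sketch, not Spack's API): multiple parents are fine as long as they converge on a single root, while multiple roots raise an error.

```python
class Node:
    def __init__(self, name):
        self.name, self.parents = name, []


def roots_of(node):
    """Walk parent edges upward and collect every parentless ancestor."""
    seen, stack, found = set(), [node], set()
    while stack:
        n = stack.pop()
        if n.name in seen:
            continue
        seen.add(n.name)
        if not n.parents:
            found.add(n)
        stack.extend(n.parents)
    return found


def root(node):
    found = roots_of(node)
    if len(found) > 1:
        # e.g. after database reconstruction, per the comment above
        raise RuntimeError('spec has multiple roots')
    return found.pop()


# Diamond: d has two parents (b and c) but a single root a -> fine
a, b, c, d = Node('a'), Node('b'), Node('c'), Node('d')
d.parents = [b, c]
b.parents = [a]
c.parents = [a]
print(root(d).name)  # a
```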
I think that's already the semantics for packages: if you conflict with something, it's supposed to be either a node attribute or a direct dependency (maybe through a virtual). We'll need to extend the DSL used on the command line in #15569 to allow referencing edge attributes, of which virtuals are an example. Anyhow, this PR just changes the internal data representation, so no user-facing behavior changes. There are APIs to discriminate among transitive dependencies. In a follow-up PR I'll build on this one to introduce separate concretization of build dependencies, and there we may have to face cases like the one you mentioned above.
```python
# TODO: This assumes that each solve unifies dependencies
dependencies[0].add_type(type)
```
This will be slightly tricky to update with separate concretization of build deps, but I think the PSIDs will give us a way to make it work.
lib/spack/spack/graph.py (outdated)
```python
topological_order = []
par = {}
for name, specs in nodes.items():
    par[name] = [x for x in parents(specs) if x.name in nodes]
```
I think this still assumes there's only one node of a given name in the tree we're sorting -- I don't think this can graph the synthetic creations you use for testing.
With the latest code, using this script:

```python
import spack.graph
import spack.hash_types
import spack.spec

spack.hash_types.package_hash = spack.hash_types.SpecHashDescriptor(
    deptype=(), package_hash=True, name='package_hash', override=lambda s: ''
)

Spec = spack.spec.Spec
nodes = {
    'n1': Spec('[email protected]'),
    'n2': Spec('[email protected]'),
    'n3': Spec('[email protected]'),
    'n4': Spec('[email protected]'),
}
for s in nodes.values():
    s._mark_concrete()

original = nodes['n1']
nodes['n2'].add_dependency_edge(nodes['n4'], deptype=('build', 'link'))
nodes['n3'].add_dependency_edge(nodes['n4'], deptype=('build', 'link'))
original.add_dependency_edge(nodes['n2'], deptype='run')
original.add_dependency_edge(nodes['n3'], deptype='link')

spack.graph.graph_ascii(original)
```

I obtain the following graph:

```
o  [email protected]/bysqrwy
|\
| o  [email protected]/tnvdtkr
o |  [email protected]/n4gla5x
|/
o  [email protected]/gaeiwev
```
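The general concern in this thread (sorting keyed by name vs. by node) can be illustrated with a generic sketch, unrelated to Spack's actual graph code: a topological sort over node objects keeps two nodes distinct even when they share a name.

```python
from collections import defaultdict


def topological_order(edges):
    """Kahn's algorithm over hashable node objects; `edges` is a list of
    (parent, child) pairs. Nodes are distinguished by identity/equality,
    not by name, so duplicate names in the DAG are handled correctly."""
    indegree = defaultdict(int)
    children = defaultdict(list)
    nodes = set()
    for parent, child in edges:
        nodes.update((parent, child))
        children[parent].append(child)
        indegree[child] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(nodes):
        raise ValueError('cycle detected')
    return order
```

With name-keyed bookkeeping (`par[name] = ...`), two distinct nodes named `'p'` would collapse into one entry; here they stay separate because the algorithm is keyed on the node objects themselves.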
```python
mutable_database.remove('mpileaks ^callpath ^mpich2')
mutable_database.remove('callpath ^mpich2')
```
Would be easier to read in the other order, but not worth changing unless you're in this file anyway
This also needs to be updated for splicing.
@becker33 With respect to what we discussed in Slack, the idea was to extend the splice API to:

```python
def splice(self, other, transitive, deptype):
    ...
```

One (synthetic) use case that we can't cover with that interface is the following. Start with a spec like: … and say that we want to splice in that: … where the new …

@nhanford You might be interested in this case / have opinions on how to solve this issue.
Introduce a new data structure, called _EdgeMap, that maps package names to lists of DependencySpec objects (edges in the DAG). This data structure is used to track both dependencies and dependents of a node. It allows filtering the stored values by parent or child name and by dependency types.

Unit tests added:
- Regression on #11983
- Synthetic construction of a split dependency
- Synthetic construction of bootstrapping
This is kind of weird. I force-pushed the branch to update the PR. The PR wasn't updated and now I cannot reopen it.
fixes #11983
closes #16447

This PR changes the internal representation of `Spec` to allow for multiple dependencies or dependents stemming from the same package. This change permits representing cases which are frequent in cross-compiled environments or when bootstrapping compilers.

Modifications:
- Replaced `DependencyMap` with `_EdgeMap`. The main differences are that the latter does not support direct item assignment and can be modified only through its API. It also provides a `select_by` method to query items.
- Added methods to `Spec` to get lists of dependencies or the related edges.

Due to the change in the internal representation of specs, the YAML file for specs will change too, and the "dependencies" field will be a list of dictionaries instead of a single dictionary. ~~This in turn will cause all of the hashes to change, so the PR is definitely not backward compatible with old installation hashes.~~ Since #22845 went in first, this PR reuses that format and thus it should not change hashes. What happens is that in the list of dependencies we may have the same package present multiple times with different associated specs.
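As a rough illustration of the format change (the field names and hash values below are made up for this sketch, not the exact schema from #22845), the old single-dictionary form vs. the new list form might look like:

```yaml
# Old format: "dependencies" is a single dictionary keyed by package
# name, so each package can appear at most once.
#
# dependencies:
#   zlib:
#     hash: abc123
#     type: [build, link]

# New format: "dependencies" is a list, so the same package name may
# appear multiple times with different associated specs.
dependencies:
- name: zlib
  hash: abc123
  type: [build, link]
- name: zlib
  hash: def456
  type: [run]
```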
DependencyMapwith_EdgeMap. The main differences are that the latter does not support direct item assignment and can be modified only through its API. It also provides aselect_bymethod to query items.Specto get list of dependencies or related edges.Due to the change in the internal representation of specs, the YAML file for specs will change too and the "dependencies" field will be a list of dictionaries instead of a single dictionary:This in turn will cause all of the hashes to change, so the PR is definitely not backward compatible with old installation hashes.Since #22845 went in first, this PR reuses that format and thus it should not change hashes. What happens is that in the list of dependencies we may have the same package being present multiple times with different associated specs.