
ASP-based solver: unique objects for reused specs #39590

Merged: haampie merged 9 commits into spack:develop from alalazo:bugfix/double-python on Oct 19, 2023

Conversation


@alalazo alalazo commented Aug 23, 2023

fixes #39570

Reused specs used to be referenced directly from the built spec.

This could cause issues like #39570, where two objects in memory represented the same node, because two reused specs had been loaded from different sources but referred to the same spec by DAG hash.

The issue is solved here by registering the concrete specs to be reused in a container that ensures a single object per DAG hash.
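A minimal sketch of the idea (the toy `Node` and `SpecsByHash` names are illustrative, not Spack's actual API): a container keyed by DAG hash that hands out one canonical object per node and rewires dependency edges accordingly.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Toy stand-in for a concrete spec: a hash plus dependency edges."""
    dag_hash: str
    deps: List["Node"] = field(default_factory=list)


class SpecsByHash:
    """Keep exactly one Node object per DAG hash."""

    def __init__(self) -> None:
        self._data: Dict[str, Node] = {}

    def add(self, node: Node) -> Node:
        """Register node and return the canonical instance for its hash."""
        if node.dag_hash in self._data:
            return self._data[node.dag_hash]
        self._data[node.dag_hash] = node
        # Rewire dependencies so they point at canonical instances too.
        node.deps = [self.add(dep) for dep in node.deps]
        return node

    def __getitem__(self, dag_hash: str) -> Node:
        return self._data[dag_hash]
```

With a container like this, two specs loaded from different sources that both depend on the same node by DAG hash end up sharing a single in-memory object for it.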

@spackbot-app bot added the `core` (PR affects Spack core functionality) label Aug 23, 2023
@alalazo

This comment was marked as outdated.

@alalazo alalazo requested a review from tgamblin August 23, 2023 15:24
@alalazo alalazo force-pushed the bugfix/double-python branch from 156e802 to a8e99c7 Compare August 24, 2023 13:46
@spackbot-app bot added the `stand-alone-tests` (Stand-alone (or smoke) tests for installed packages) and `tests` (General test capability(ies)) labels Aug 24, 2023
@alalazo

This comment was marked as outdated.

@alalazo alalazo force-pushed the bugfix/double-python branch from a8e99c7 to 7f8c44a Compare October 16, 2023 14:39
@alalazo alalazo changed the title from "ASP-based solver: reconstruct edges for reused specs" to "ASP-based solver: unify spec objects for reused specs" Oct 16, 2023
@alalazo alalazo requested a review from haampie October 16, 2023 14:44
return False

# Unify objects in the container
for edge in reversed(spack.traverse.traverse_edges_topo([spec], direction="children")):
@haampie (Member) Oct 16, 2023

why edges / double loop / topo order?

@haampie (Member) Oct 16, 2023

FYI: if the goal is to exit early on a sub-DAG that was added before, don't use topological order, because it has to traverse the full DAG to order things :) Just use DFS and don't recurse into children that were already added (use a visitor directly, since an iterator can't be stopped from recursing; if that's too verbose, use recursion or keep an explicit stack).
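The suggestion above could be sketched roughly like this (a toy `Node` with `dag_hash`/`deps` attributes is assumed; this is not Spack's traversal API): an explicit stack plus a visited set, so the walk never descends into a sub-DAG whose root was seen before.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    dag_hash: str
    deps: List["Node"] = field(default_factory=list)


def visit_unseen(roots: List[Node], visited: Optional[Set[str]] = None) -> List[Node]:
    """DFS that exits early on sub-DAGs whose root hash is in `visited`."""
    visited = set() if visited is None else visited
    stack = list(roots)
    out = []
    while stack:
        node = stack.pop()
        if node.dag_hash in visited:
            # Concrete specs are closed under dependencies, so a seen hash
            # means the whole sub-DAG below it was handled: don't descend.
            continue
        visited.add(node.dag_hash)
        out.append(node)
        stack.extend(node.deps)
    return out
```

Unlike a topological ordering, this never pays to order nodes it is going to skip anyway.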

@alalazo (Member, Author) Oct 16, 2023

> why edges?

Because I forgot to update this function.

@alalazo (Member, Author)

> if the goal is to early exit on sub-dag added before

No, the goal is to unify objects for sub-DAGs added before, similarly to a round trip to/from the DB. That unifies the two Python objects for the same DAG that we have in memory.

@haampie (Member) Oct 16, 2023

Can you do an early exit, though? It doesn't make sense to iterate over everything and skip every node from each sub-DAG seen before.

@haampie (Member)

I mean, you must do this. Your current implementation effectively copies a database in O(n^2) time.

@haampie (Member) Oct 16, 2023

Actually, ReusableSpecsByHash is rather wasteful: it always copies everything.

By design, daghash(x) == daghash(y) implies id(x) == id(y) for all x and y from the same database. So the issue is just about merging databases (and adding a small number of specs from, say, spec.json files).

So, why not duplicate only the dependents of specs that occur in multiple databases? There's already a dictionary of DAG hashes per DB.
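Under the stated invariant (one object per DAG hash within a database), a merge could be sketched like this (toy `{hash: node}` dicts with `deps` lists; names are illustrative, not Spack's database API): pick one canonical object per hash across databases, then rewire edges, which only changes edges whose target hash occurred in more than one database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    dag_hash: str
    deps: List["Node"] = field(default_factory=list)


def merge_databases(dbs: List[Dict[str, Node]]) -> Dict[str, Node]:
    """Merge per-database {dag_hash: node} maps into one object per hash."""
    canonical: Dict[str, Node] = {}
    for db in dbs:
        for h, node in db.items():
            canonical.setdefault(h, node)  # first occurrence wins
    # Assuming each database is closed under dependencies, every dep's hash
    # is present; only edges to hashes seen in multiple databases change.
    for node in canonical.values():
        node.deps = [canonical[d.dag_hash] for d in node.deps]
    return canonical
```

Within a single database nothing is copied or rewired, since its nodes already satisfy the invariant.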

@haampie haampie changed the title from "ASP-based solver: unify spec objects for reused specs" to "ASP-based solver: unique objects for reused specs" Oct 16, 2023
@haampie (Member) left a review:

Let's avoid copies of all specs from all databases, and instead exploit the invariant that each database has one spec instance per unique DAG hash: if there are two databases A and B, it's only necessary to copy those nodes that are unique to A but have dependencies in common with B.

@alalazo (Member, Author) commented Oct 16, 2023

I'll look into that optimization, but for reference this is what I get with:

  • Spack: 0.21.0.dev0 (ef06f0d)
  • Python: 3.11.5
  • Platform: linux-ubuntu20.04-icelake
  • Concretizer: clingo

after adding:

$ spack mirror add v0.20.1 https://binaries.spack.io/v0.20.1

develop: 10999c0
PR: ef06f0d

radiuss-pr.csv
radiuss-develop.csv
radiuss.txt

[chart: radiuss]

and that is the reason why this implementation seemed viable. I'll try with a bigger buildcache (more traversals and copies) and see what I get.

@haampie (Member) commented Oct 16, 2023

Sure, but these things add up over time, so why not implement it efficiently from the start, if that's not difficult to achieve?

At the very least, make it linear time...

@haampie (Member) commented Oct 17, 2023

Notice: BinaryCacheIndex._mirrors_for_spec is a mapping from DAG hash to a list of mirrors, so it should be rather trivial to "dedupe" specs.

@alalazo (Member, Author) commented Oct 17, 2023

I implemented an explicit visitor. With that, using:

  • Spack: 0.21.0.dev0 (34b99e3)
  • Python: 3.11.5
  • Platform: linux-ubuntu20.04-icelake
  • Concretizer: clingo

with a setup like:

$ spack mirror add v0.20.1 https://binaries.spack.io/v0.20.1
$ spack mirror add E4S https://cache.e4s.io

develop: 348e5cb
PR: 34b99e3

radiuss-develop.csv
radiuss-pr.csv
radiuss.txt

[chart: radiuss]

The case of ascent, where develop is slower, seems to be a transitory issue: repeating that concretization leads to similar times in both the PR and develop. Overall, I wouldn't complicate the code further to optimize away the tiny difference in the setup phase.

@alalazo alalazo requested a review from haampie October 17, 2023 12:30
@haampie (Member) commented Oct 17, 2023

Hm, okay. The current implementation is roughly equivalent to `db = Database(); for x in xs: db.add(x); reusable_specs = db.query_local()`. Wouldn't that be simpler?

@alalazo (Member, Author) commented Oct 17, 2023

> Wouldn't that be simpler?

Yes, I looked into that as a first approach, but wouldn't that require a temporary DB on the filesystem? This one doesn't.1

Footnotes

  1. I agree, though, that the code should be consolidated at some point, among all the various flavors of "DBs".

@tgamblin tgamblin added this to the v0.21.0 milestone Oct 17, 2023
@alalazo alalazo force-pushed the bugfix/double-python branch from 34b99e3 to 16b3a2f Compare October 18, 2023 18:41
Reused specs used to be referenced directly into the built spec.

This might cause issues like in spack#39570 where two objects in
memory represent the same node, because two reused specs were
loaded from different sources but referred to the same spec
by DAG hash.

Here the issue is solved by registering the concrete specs to
be reused in a container that ensures their consistency.
@alalazo alalazo force-pushed the bugfix/double-python branch from 16b3a2f to c4d84b5 Compare October 18, 2023 21:15
assert spack.solver.asp._is_checksummed_version((v, v_opts)) == checksummed


class TestConcreteSpecsByHash:
@haampie (Member)

why is this a class?

@alalazo (Member, Author) Oct 19, 2023

Just to group tests for the ConcreteSpecsByHash class. This can be used during development:

$ spack unit-test lib/spack/spack/test/concretize.py::TestConcreteSpecsByHash

to run all related unit tests.

@haampie (Member)

all as in one, but sure ;p

@haampie haampie merged commit a1ca1a9 into spack:develop Oct 19, 2023
@alalazo alalazo deleted the bugfix/double-python branch October 19, 2023 14:09
mtaillefumier pushed a commit to mtaillefumier/spack that referenced this pull request Oct 23, 2023
Reused specs used to be referenced directly into the built spec.

This might cause issues like in issue 39570 where two objects in
memory represent the same node, because two reused specs were
loaded from different sources but referred to the same spec
by DAG hash.

The issue is solved by copying concrete specs to a dictionary keyed
by dag hash.
victoria-cherkas pushed a commit to victoria-cherkas/spack that referenced this pull request Oct 30, 2023
RikkiButler20 pushed a commit to RikkiButler20/spack that referenced this pull request Nov 2, 2023
gabrielctn pushed a commit to gabrielctn/spack that referenced this pull request Nov 24, 2023
mtaillefumier pushed a commit to mtaillefumier/spack that referenced this pull request Dec 14, 2023
RikkiButler20 pushed a commit to RikkiButler20/spack that referenced this pull request Jan 11, 2024
RikkiButler20 pushed a commit to RikkiButler20/spack that referenced this pull request Jan 31, 2024

Labels

`core` (PR affects Spack core functionality), `stand-alone-tests` (Stand-alone (or smoke) tests for installed packages), `tests` (General test capability(ies))

Projects

None yet

Development

Successfully merging this pull request may close these issues.

concretizer: reuse + lookup cause two instances of dag-hash equivalent spec in link-run subdag

3 participants