specs: move to new spec.json format with build provenance#22845
tgamblin merged 89 commits into spack:develop
Conversation
8e293f5 to 8003bf0
```python
        return hash(lang.tuplify(self._cmp_iter))

    def __reduce__(self):
        return _spec_from_dict, (self.to_dict(hash=ht.build_hash),)
```
I think this may be wrong, at least in develop, in the sense that using the default means using the DAG hash. The information returned here should be enough to reconstruct an equivalent Spec object when we unpickle (and I think pickle caches these results, though I can look for a definitive confirmation of that).
Since we serialize only to pass information to a spawned process on macOS, and we don't need persistence, I felt a change in the recipe was ruled out and the build hash was the right granularity. Using the DAG hash would make specs that differ only in their build deps equivalent to each other.
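For context on the `__reduce__` protocol discussed here: pickle calls `__reduce__` to obtain a `(callable, args)` pair, and on unpickling it calls `callable(*args)` to rebuild the object. A minimal standalone sketch of the same pattern (the `Node` and `node_from_dict` names are hypothetical, not Spack's):

```python
def node_from_dict(d):
    # Reconstruction function; pickle invokes this on load.
    return Node(d["name"], d["hash"])

class Node:
    def __init__(self, name, hash_):
        self.name = name
        self.hash = hash_

    def to_dict(self):
        return {"name": self.name, "hash": self.hash}

    def __reduce__(self):
        # Analogous to returning (_spec_from_dict, (self.to_dict(...),)):
        # serialize via the dict form at a chosen hash granularity.
        return node_from_dict, (self.to_dict(),)

original = Node("zlib", "abc123")
func, args = original.__reduce__()   # what pickle records
restored = func(*args)               # what pickle does on load
print(restored.name, restored.hash)  # zlib abc123
```

Whatever `to_dict` emits must therefore be sufficient to reconstruct an equivalent object, which is why the hash granularity passed to it matters.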
I think using the build hash makes sense. However, this PR subtly changes `to_dict` so that when the caller specifies a hash, it writes and "indexes by" (checks node uniqueness by) that hash. I was trying to avoid an implementation that looks at all the nodes to see whether they all have a full hash, then a build hash, and otherwise falls back to the DAG hash, but that might be necessary.
```python
if 'name' in node.keys():
    # New format
    name = node['name']
else:
    # Old format
    name = next(iter(node))
    node = node[name]
for h in ht.hashes:
    setattr(spec, h.attr, node.get(h.name, None))
```
Definitely not asking for a refactor here, but in general I wonder if when we upgrade formats we should have separate functions to read different formats.
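The suggestion above, one reader per format, could be sketched roughly like this (function names are hypothetical; this is not Spack's code, just an illustration of the dispatch idea):

```python
def _read_node_v1(node):
    # Old format: the node dict is keyed by package name,
    # e.g. {"zlib": {...attributes...}}.
    name = next(iter(node))
    return name, node[name]

def _read_node_v2(node):
    # New format: the name is an explicit field,
    # e.g. {"name": "zlib", ...attributes...}.
    return node["name"], node

def read_node(node):
    # Dispatch on the shape of the node dict rather than
    # interleaving both formats in one function body.
    reader = _read_node_v2 if "name" in node else _read_node_v1
    return reader(node)

print(read_node({"zlib": {"version": "1.2.11"}}))
print(read_node({"name": "zlib", "version": "1.2.11"}))
```

Keeping the readers separate means a future version 3 becomes a new function plus one line of dispatch, rather than another branch in shared parsing logic.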
🥳
FWIW, test speeds using the data from #25507 and the merge commit from this PR (there have been some new tests, PRs, and features in the meantime, of course): total time is -17.6%. Not sure what's up on macOS; it seems like it regressed before this PR.
#22845 revealed a long-standing bug that had never been triggered before, because the hashing algorithm had been stable for multiple years while the bug was in production.

The bug was that when reading a concretized environment, Spack did not properly read in the build hashes associated with the specs in the environment. Those hashes were recomputed (and, as long as we didn't change the algorithm, were recomputed identically). Spack's policy, though, is never to recompute a hash. Once something is installed, we respect its metadata hash forever, even if internally Spack changes the hashing method. Put differently, once something is concretized, it has a concrete hash, and that's it, forever. When we changed the hashing algorithm for performance in #22845, we exposed the bug.

This PR fixes the bug at its source by properly reading in the cached build hash attributes associated with the specs. I've also renamed some variables in the Environment class methods to make a mistake of this sort more difficult to make in the future.

* ensure environment build hashes are never recomputed
* add comment clarifying reattachment of env build hashes
* bump lockfile version and include specfile version in env meta
* Fix unit-test for v1 to v2 conversion

Co-authored-by: Massimiliano Culpo <[email protected]>
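The "never recompute a hash" policy can be illustrated with a small sketch (a hypothetical `SpecNode`, not Spack's class): a hash that was read from stored metadata always wins, and recomputation only happens when no stored hash exists:

```python
import hashlib
import json

class SpecNode:
    def __init__(self, data, stored_hash=None):
        self.data = data
        # Hash read back from the lockfile / metadata, if any.
        self._hash = stored_hash

    def dag_hash(self):
        # Policy: once a hash has been stored, never recompute it,
        # even if the hashing algorithm has changed since then.
        if self._hash is None:
            payload = json.dumps(self.data, sort_keys=True).encode()
            self._hash = hashlib.sha256(payload).hexdigest()
        return self._hash

fresh = SpecNode({"name": "zlib"})
cached = SpecNode({"name": "zlib"}, stored_hash="deadbeef")
print(cached.dag_hash())  # deadbeef: the stored hash is respected
print(fresh.dag_hash() == cached.dag_hash())  # False
```

The bug described above amounts to forgetting to pass `stored_hash` through when reloading, so the `if self._hash is None` branch silently recomputes, which is harmless only while the algorithm never changes.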
```python
if 'build_spec' in node.keys():
    node_hash = node[hash_type]
else:
    node_hash = node[hash_type]
```
IIRC this is vestigial from when I was experimenting with forcing a certain hash type for spliced specs; I must have quickly done a "replace all" to fix it. Please feel free to remove the unnecessary conditional, or let me know and I can do it.
Thanks @nhanford, I have removed the conditional in the linked PR above; we just wanted to make sure we weren't covering up the trail for someone diagnosing a potential future issue.
Change the internal representation of `Spec` to allow for multiple dependencies or dependents stemming from the same package. This change makes it possible to represent cases that are frequent in cross-compiled environments or when bootstrapping compilers.

Modifications:
- [x] Substitute `DependencyMap` with `_EdgeMap`. The main differences are that the latter does not support direct item assignment and can be modified only through its API. It also provides a `select_by` method to query items.
- [x] Reworked a few public APIs of `Spec` to get lists of dependencies or related edges.
- [x] Added unit tests to prevent regression on #11983 and prove the synthetic construction of specs with multiple deps from the same package.

Since #22845 went in first, this PR reuses that format and thus it should not change hashes. The same package may be present multiple times in the list of dependencies, with different associated specs (each with its own hash).
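The edge-map idea above can be sketched as a small container keyed by package name that tolerates multiple edges per name and is queried through a `select_by`-style method. This is a hypothetical illustration, not Spack's `_EdgeMap` implementation, and the method signatures are assumed:

```python
from collections import defaultdict

class EdgeMap:
    # Edges are only added or queried through the API; there is no
    # direct item assignment, mirroring the design described above.
    def __init__(self):
        self._edges = defaultdict(list)

    def add(self, name, spec_hash, deptypes=()):
        self._edges[name].append({"hash": spec_hash, "deptypes": tuple(deptypes)})

    def select_by(self, name=None, deptype=None):
        # Filter edges by package name and/or dependency type.
        out = []
        for n, edges in self._edges.items():
            if name is not None and n != name:
                continue
            for e in edges:
                if deptype is None or deptype in e["deptypes"]:
                    out.append((n, e))
        return out

edges = EdgeMap()
edges.add("gcc", "aaa111", deptypes=("build",))
edges.add("gcc", "bbb222", deptypes=("run",))  # same package, different spec
print(len(edges.select_by(name="gcc")))  # 2
```

A plain dict keyed by name (like the old `DependencyMap`) cannot express the last two lines, since the second `gcc` entry would overwrite the first.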
This is a major rework of Spack's core `spec.yaml` metadata format. It moves from `spec.yaml` to `spec.json` for speed, and it changes the format in several ways. Specifically:

- The format now has a `_meta` section with a version (now set to version 2). This will simplify major changes like this one in the future.
- (hashes: `hash`, `full_hash`, `build_hash`)
- `build_spec` provenance from the new splice method in class `Spec` (#20262) is included in the spec format. This means that, for spliced specs, we preserve the full provenance of how to build them, and we can reproduce a spliced spec from the original builds that produced it.

NOTE: Because we have switched the spec format, this PR changes Spack's hashing algorithm. This means that after this commit, Spack will think a lot of things need rebuilds.
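The versioned `_meta` section makes format changes detectable up front. A minimal sketch of a version check on load (key names and the `SPECFILE_VERSION` constant are illustrative, not Spack's exact API):

```python
import json

SPECFILE_VERSION = 2  # illustrative; this PR sets the format version to 2

def specfile_version(data):
    # The _meta section carries the format version; legacy files,
    # which predate _meta entirely, are treated as version 1.
    return data.get("_meta", {}).get("version", 1)

def check_readable(data):
    version = specfile_version(data)
    if version > SPECFILE_VERSION:
        raise ValueError("specfile version %d is newer than supported" % version)
    return version

new_style = json.loads('{"_meta": {"version": 2}, "spec": {"nodes": []}}')
legacy = json.loads('{"spec": []}')
print(check_readable(new_style), check_readable(legacy))  # 2 1
```

With this in place, a reader can refuse files written by a newer Spack instead of silently misparsing them.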
There are two major benefits this PR provides:
The old `spec.yaml` format continues to be supported, but it is now considered a legacy format, and Spack will opportunistically convert these to the new `spec.json` format.
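The opportunistic conversion could look roughly like this sketch, which upgrades an already-parsed legacy dict into the versioned envelope before writing it back out as JSON (key names are illustrative; this is not Spack's conversion code):

```python
import json

def ensure_json_format(data):
    # A legacy (spec.yaml-era) dict has no _meta section; wrap it in
    # the versioned envelope so it can be rewritten as spec.json.
    if "_meta" not in data:
        data = {"_meta": {"version": 2}, "spec": data}
    return json.dumps(data, sort_keys=True)

legacy = {"nodes": [{"name": "zlib"}]}
converted = json.loads(ensure_json_format(legacy))
print(converted["_meta"]["version"])  # 2
```

Doing the upgrade lazily, at the moment a legacy file is read, means no separate migration step is needed and old files keep working until they are next touched.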