45407af to 69430d1
lib/spack/spack/package.py
Outdated
| # typically allow for just retrieving the latest commit ID) | ||
| source_id = fs.for_package_version(self, self.version).source_id() | ||
| if not source_id: | ||
| return |
@scheibelp This causes failures because package_hash doesn't expect None in its use of this method, so it seems that something will need to be done here.
We should probably just hash an empty string if we do not have the content.
Eh, that seems flaky to me. The hash of a package changes if I delete a patch file on disk?
@scheibelp There's also a test failure in
I don't see that test here or in develop...scheibelp:features/package-hash, and it's been a while, so I'd have to see it to get a better understanding of why there is a failure now. Since then, the manifestation of patches as implicit variants (#5476) may have resulted in a conflict here.
The test is in
tgamblin left a comment
I added a bunch of comments. This looks mostly good -- thanks for rebasing and refactoring commits. The refactored commits made it much easier to review.
There are two goals with this PR:
1. Being able to tell two specs apart by their content hashes, so that the build farm can rebuild when a content hash has changed.
2. Eventually transition to using full hashes to identify specs.
Right now, we only want (1). Eventually, full hashes will replace the DAG hash, but not yet. Doing that now will cause many rebuilds, and the concretizer is not aggressive enough about reusing existing packages to handle it. Once the new concretizer is in (in a couple months) we can revisit the transition.
So, given that, I think we need the following changes:
1. Remove all the parsing logic added for full hashes, as well as the look-up-by-full-hash function. Eventually there will be only one (full) hash and we won't need to parse two types of hashes.
2. Remove the last commit that switches the DB to use full hashes (though it might be worthwhile to keep that on a separate branch to reuse later).
3. Store the full hash in `spec.yaml` and in the DB (and in binary package metadata), but do not use it to index or identify specs.
(3) is needed so that the build farm can tell when a package should be rebuilt due to a package file or source change. Eventually that should just be a check of the spec hash, but for now, it's a special check just for the build farm while we wait for the new concretizer.
Make sense?
lib/spack/spack/fetch_strategy.py
Outdated
| """ | ||
|
|
||
| def source_id(self): | ||
| pass |
This should probably raise NotImplementedError(), not just pass, as it's an error if an implementation doesn't provide a valid implementation of this function (or leave it to raise NotImplementedError if, for a given implementation, the function is never expected to be called).
This should also document the contract. What's it supposed to do? What type does it return (string, tuple, yaml-able thing?, etc)? Put a docstring on this.
It seems to need just a __str__ method and that's it.
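A minimal sketch of the contract requested above, under stated assumptions: `FetchStrategy` here is a simplified stand-in rather than Spack's real class, and `ArchiveFetchStrategy` with its `digest` argument is hypothetical. The base method documents the return type and raises `NotImplementedError` instead of silently returning `None`.

```python
class FetchStrategy:
    """Simplified stand-in for a fetcher base class (not Spack's real one)."""

    def source_id(self):
        """Return an identifier for the exact source this fetcher retrieves.

        The returned object only needs to support ``str()``. Subclasses
        that can identify their source must override this method.
        """
        raise NotImplementedError(
            "{0} must implement source_id()".format(type(self).__name__)
        )


class ArchiveFetchStrategy(FetchStrategy):
    """Hypothetical fetcher identified by the checksum of its archive."""

    def __init__(self, digest):
        self.digest = digest

    def source_id(self):
        # The archive checksum uniquely identifies the fetched content.
        return self.digest
```

With this shape, a fetcher that forgets to implement `source_id()` fails loudly at the call site rather than feeding `None` into the hash.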
lib/spack/spack/patch.py
Outdated
| return base64.b32encode(hashlib.md5(F.read()).digest()).lower() | ||
|
|
||
| @property | ||
| def file_hash(self): |
This is redundant now. Since #5476, all implementations of Patch have a sha256 attribute that you can use for this. URLPatches also have an archive_sha256, which allows us to checksum the downloaded archive, but you're guaranteed that the sha256 attribute is a content hash of the patch file. So replace this with sha256.
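For reference, a content hash of a patch file of the kind described above can be computed with the standard library alone. This is only an illustrative sketch: `patch_sha256` and its `chunk_size` parameter are hypothetical names, not part of Spack, and Spack's own `sha256` attribute may be populated differently.

```python
import hashlib


def patch_sha256(path, chunk_size=65536):
    """Return the hex sha256 digest of the file at ``path``.

    The file is read in chunks so large patches are not loaded
    into memory all at once.
    """
    hasher = hashlib.sha256()
    with open(path, "rb") as patch_file:
        for chunk in iter(lambda: patch_file.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```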
lib/spack/spack/util/package_hash.py
Outdated
| import re | ||
|
|
||
|
|
||
| attributes = ['homepage', 'url', 'list_url', 'extendable', 'parallel', |
These should live in package.py, as the list needs to be updated whenever a new attribute is added. If this list lives here it will probably get out of sync.
| return node | ||
|
|
||
|
|
||
| class TagMultiMethods(ast.NodeVisitor): |
| nodes.append((node, None)) | ||
|
|
||
|
|
||
| class ResolveMultiMethods(ast.NodeTransformer): |
lib/spack/spack/spec.py
Outdated
| # These are possible token types in the spec grammar. | ||
| # | ||
| HASH, DEP, AT, COLON, COMMA, ON, OFF, PCT, EQ, ID, VAL = range(11) | ||
| HASH, FULL_HASH, DEP, AT, COLON, COMMA, ON, OFF, PCT, EQ, ID, VAL = range(12) |
There's no reason to add this here, since the lexer doesn't actually have a sigil for parsing full hashes. I think it's totally reasonable to add print logic for the full hash. But I would not add parse logic for it yet. We are not transitioning to using full hashes to identify specs; we're just getting the package hashing in so that we can use it in the build farm.
lib/spack/spack/spec.py
Outdated
| specs.append(self.spec_by_hash()) | ||
| elif self.accept(FULL_HASH): | ||
| # We're finding a spec by hash | ||
| specs.append(self.spec_by_full_hash()) |
lib/spack/spack/spec.py
Outdated
| elif self.accept(FULL_HASH): | ||
| # We're finding a dependency by hash for an | ||
| # anonymous spec | ||
| dep = self.spec_by_full_hash() |
lib/spack/spack/spec.py
Outdated
|
|
||
| return matches[0] | ||
|
|
||
| def spec_by_full_hash(self): |
lib/spack/spack/spec.py
Outdated
| else: | ||
| raise InvalidHashError(spec, hash_spec.dag_hash()) | ||
|
|
||
| elif self.accept(FULL_HASH): |
This will be included in the full hash of packages.
These attributes are ignored when doing a content hash of a package.
I've removed the last two commits. This resolves the test failures and it takes out the
dc176c2 to 8622d32
The other commits are available here: https://github.com/mathstuf/spack/tree/package-hash-in-db
| """Get the first <bits> bits of the DAG hash as an integer type.""" | ||
| return base32_prefix_bits(self.dag_hash(), bits) | ||
|
|
||
| def full_hash(self, length=None): |
Still needs test coverage. The tests seem to call package_hash, but this doesn't. Why?
The tests were for package_hash directly. This calls content_hash which ends up calling package_hash later. Will add tests to this commit.
lib/spack/spack/package.py
Outdated
| if not source_id: | ||
| message = 'Missing a source id for {s.name}@{s.version}' | ||
| tty.warn(message.format(s=self)) | ||
| hashContent.append('') |
@mathstuf: I'd still leave a TODO here (in addition to the warning) with a note to fix this when we refactor fetchers.
I think you said in an earlier comment that this was brittle, and that if a tarball is removed, a package would then have a different hash. Can you elaborate on that? The hash for an existing installation will never change, because we store the hash that was generated at install time. So this may cause a rebuild if, e.g., the version is later checksummed, but it shouldn't cause any existing install to change.
Now that all sources and patches come with a .sha256, that's not a problem I think. I've put the comment back with the updated reasoning.
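To make the discussion concrete, here is a rough sketch of a content hash that falls back to an empty string when no source id is available. The function name, its arguments, and the choice of sha256 with base32 encoding are assumptions for illustration, not the code in this PR.

```python
import base64
import hashlib


def content_hash(source_id, patch_sha256s, package_hash):
    """Hash the concatenation of source id, patch checksums, and recipe hash.

    Hypothetical helper: the argument names and the digest choice are
    assumptions made for this sketch.
    """
    # TODO: revisit this fallback when fetchers are refactored; a missing
    # source id currently contributes nothing to the hash.
    parts = [source_id or ""]
    parts.extend(patch_sha256s)
    parts.append(package_hash)
    digest = hashlib.sha256("".join(parts).encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii").lower()
```

Because the hash is recorded at install time, a later change in the inputs (for example, a version gaining a checksum) produces a different hash for new builds without altering the hash stored for an existing installation.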
8622d32 to 09b5842
Failing on Python 3 compatibility
8c07814 to a3b84cb
I fixed the Python 3 incompatibility.
This will be included in the full hash of packages.
This helps to ensure that patches are applied consistently and will also be used as the source for the patch part of full package hashes.
This calculates a hash which depends on the complete content of the package including sources and the associated `package.py` file.
This hash includes the content of the package itself as well as the DAG for the package.
a3b84cb to 176c3cf
And Python3 tests passed locally.
Did this PR by any chance change the order in which patches are applied? Some patches need to be applied in a certain order. See #7543.
alalazo left a comment
Is there a reason why we started sorting patches by urls?
| patch_list | ||
| for spec, patch_list in self.patches.items() | ||
| if self.spec.satisfies(spec)) | ||
| return sorted(patchesToApply, key=lambda p: p.path_or_url) |
@adamjstewart This seems a likely candidate to inspect for the behavior seen in #7543 (we return the patches sorted by url, if I am not reading this wrongly)
To add to these comments: I checked, and we never really enforced a global order on patch application. What we did was enforce a local order among patches with the same constraint.
In other words, patches from the following directives:
patch('patch1', when='constraint1')
patch('patch2', when='constraint2')
patch('patch3', when='constraint1')
are stored in packages like:
{
'constraint1': ['patch1', 'patch3'],
'constraint2': ['patch2'],
}
using a dictionary. Before this PR, we were guaranteed that patch3 would be applied after patch1, but we had no guarantee on the order of application of patch2 with respect to the other two. After this PR, we are guaranteed to apply patches sorted by their paths or URLs. I think both strategies are wrong and we should apply patches in the order they are declared, if they match the constraint.
I'll submit a minimal PR to recover the old behavior, and work on a more permanent fix later, if people agree with my analysis above. @tgamblin @mathstuf
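The declaration-order strategy proposed above could look roughly like the following. This is a sketch only: `patches_in_declaration_order`, the `declared` list of pairs, and the `satisfies` callback are hypothetical and do not mirror Spack's internal data structures.

```python
def patches_in_declaration_order(declared, satisfies):
    """Return the patches whose constraint matches, in declaration order.

    ``declared`` is a list of ``(constraint, patch)`` pairs in the order
    the ``patch()`` directives appeared; ``satisfies`` is a callable that
    says whether the current spec satisfies a constraint.
    """
    return [patch for constraint, patch in declared if satisfies(constraint)]


# The three directives from the comment, kept as an ordered list of pairs
# instead of a dictionary keyed by constraint.
declared = [
    ("constraint1", "patch1"),
    ("constraint2", "patch2"),
    ("constraint1", "patch3"),
]
```

Keeping one flat list preserves the relative position of `patch2`, which is exactly the information lost when patches are grouped by constraint in a dictionary.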
| # Apply all the patches for specs that match this one | ||
| patched = False | ||
| for patch in patches: | ||
| for patch in self.patches_to_apply(): |
@adamjstewart And here we use the patches sorted differently.
fixes spack#7543 This is very likely a hot-fix, while a more permanent solution is needed. See this comment for more insight: spack#7193 (comment) on the problem, that probably asks
fixes spack#7543 This is very likely a hot-fix, while a more permanent solution is needed. See this comment for more insight: spack#7193 (comment) on the problem.
Fixes #7885 #7193 added the patches_to_apply function to collect patches which are then applied in Package.do_patch. However this only collects patches that are associated with the Package object and does not include Spec-related patches (which are applied by dependents, added in #5476). Spec.patches already collects patches from the package as well as those applied by dependents, so the Package.patches_to_apply function isn't necessary. All uses of Package.patches_to_apply are replaced with Package.spec.patches. This also updates Package.content_hash to require the associated spec to be concrete: Spec.patches is only set after concretization. Before this PR, it was possible for Package.content_hash to be valid before concretizing the associated Spec if all patches were associated with the Package (vs. being applied by dependents). This behavior was unreliable though so the change is unlikely to be disruptive.
We've included a package hash in Spack since #7193 for CI, and we started using it on the spec in #28504. However, what goes into the package hash is a bit opaque. Here's what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches` and in the `package_hash`), the hashes of sources used to build are conflated with the `package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the "same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same as before, except now resources are included, and we can see differences in archives and resources directly in the `spec.json`.

Note that we do not need to bump the spec meta version on this, as past versions of Spack can still read the new specs; they just will not notice the new fields (which is fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and track relevant security information (like `sha256`'s of archives). For example, we could do continuous scanning of a Spack installation based on these hashes, and if the `sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a fetcher for a `spec.json`.
- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as a special-case spec hash in `hash_types.py`, but it really doesn't belong there. Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py` is much simpler.
- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and include more information about artifacts in it.
- [x] Update package hash tests and make them check for artifact and resource hashes.
We've included a package hash in Spack since #7193 for CI, and we started using it on the spec in #28504. However, what goes into the package hash is a bit opaque. Here's what `spec.json` looks like now: ```json { "spec": { "_meta": { "version": 3 }, "nodes": [ { "name": "zlib", "version": "1.2.12", ... "patches": [ "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76" ], "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====", "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp" } ] } } ``` The `package_hash` there is a hash of the concatenation of: * A canonical hash of the `package.py` recipe, as implemented in #28156; * `sha256`'s of patches applied to the spec; and * Archive `sha256` sums of archives or commits/revisions of repos used to build the spec. There are some issues with this: patches are counted twice in this spec (in `patches` and in the `package_hash`), the hashes of sources used to build are conflated with the `package.py` hash, and we don't actually include resources anywhere. With this PR, I've expanded the package hash out in the `spec.json` body. Here is the "same" spec with the new fields: ```json { "spec": { "_meta": { "version": 3 }, "nodes": [ { "name": "zlib", "version": "1.2.12", ... "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4", "sources": [ { "type": "archive", "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9" } ], "patches": [ "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76" ], "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein" } ] } } ``` Now: * Patches and archive hashes are no longer included in the `package_hash`; * Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`; * `sources` will include resources for packages that have it; * Patches are the same as before -- but only represented once; and * The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only tells you that the `package.py` changed. 
The behavior of the DAG hash (which includes the `package_hash`) is basically the same as before, except now resources are included, and we can see differences in archives and resources directly in the `spec.json` Note that we do not need to bump the spec meta version on this, as past versions of Spack can still read the new specs; they just will not notice the new fields (which is fine, since we currently do not do anything with them). Among other things, this will more easily allow us to convert Spack specs to SBOM and track relevant security information (like `sha256`'s of archives). For example, we could do continuous scanning of a Spack installation based on these hashes, and if the `sha256`'s become associated with CVE's, we'll know we're affected. - [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a fetcher for a `spec.json`. - [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as a special-case spec hash in `hash_types.py`, but it really doesn't belong there. Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py` is much simpler. - [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and include more information about artifacts in it. - [x] Update package hash tests and make them check for artifact and resource hashes.
We've included a package hash in Spack since #7193 for CI, and we started using it on the spec in #28504. However, what goes into the package hash is a bit opaque. Here's what `spec.json` looks like now: ```json { "spec": { "_meta": { "version": 3 }, "nodes": [ { "name": "zlib", "version": "1.2.12", ... "patches": [ "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76" ], "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====", "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp" } ] } } ``` The `package_hash` there is a hash of the concatenation of: * A canonical hash of the `package.py` recipe, as implemented in #28156; * `sha256`'s of patches applied to the spec; and * Archive `sha256` sums of archives or commits/revisions of repos used to build the spec. There are some issues with this: patches are counted twice in this spec (in `patches` and in the `package_hash`), the hashes of sources used to build are conflated with the `package.py` hash, and we don't actually include resources anywhere. With this PR, I've expanded the package hash out in the `spec.json` body. Here is the "same" spec with the new fields: ```json { "spec": { "_meta": { "version": 3 }, "nodes": [ { "name": "zlib", "version": "1.2.12", ... "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4", "sources": [ { "type": "archive", "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9" } ], "patches": [ "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76" ], "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein" } ] } } ``` Now: * Patches and archive hashes are no longer included in the `package_hash`; * Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`; * `sources` will include resources for packages that have it; * Patches are the same as before -- but only represented once; and * The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only tells you that the `package.py` changed. 
The behavior of the DAG hash (which includes the `package_hash`) is basically the same as before, except now resources are included, and we can see differences in archives and resources directly in the `spec.json` Note that we do not need to bump the spec meta version on this, as past versions of Spack can still read the new specs; they just will not notice the new fields (which is fine, since we currently do not do anything with them). Among other things, this will more easily allow us to convert Spack specs to SBOM and track relevant security information (like `sha256`'s of archives). For example, we could do continuous scanning of a Spack installation based on these hashes, and if the `sha256`'s become associated with CVE's, we'll know we're affected. - [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a fetcher for a `spec.json`. - [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as a special-case spec hash in `hash_types.py`, but it really doesn't belong there. Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py` is much simpler. - [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and include more information about artifacts in it. - [x] Update package hash tests and make them check for artifact and resource hashes.
We've included a package hash in Spack since #7193 for CI, and we started using it on the spec in #28504. However, what goes into the package hash is a bit opaque. Here's what `spec.json` looks like now: ```json { "spec": { "_meta": { "version": 3 }, "nodes": [ { "name": "zlib", "version": "1.2.12", ... "patches": [ "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76" ], "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====", "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp" } ] } } ``` The `package_hash` there is a hash of the concatenation of: * A canonical hash of the `package.py` recipe, as implemented in #28156; * `sha256`'s of patches applied to the spec; and * Archive `sha256` sums of archives or commits/revisions of repos used to build the spec. There are some issues with this: patches are counted twice in this spec (in `patches` and in the `package_hash`), the hashes of sources used to build are conflated with the `package.py` hash, and we don't actually include resources anywhere. With this PR, I've expanded the package hash out in the `spec.json` body. Here is the "same" spec with the new fields: ```json { "spec": { "_meta": { "version": 3 }, "nodes": [ { "name": "zlib", "version": "1.2.12", ... "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4", "sources": [ { "type": "archive", "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9" } ], "patches": [ "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76" ], "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein" } ] } } ``` Now: * Patches and archive hashes are no longer included in the `package_hash`; * Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`; * `sources` will include resources for packages that have it; * Patches are the same as before -- but only represented once; and * The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only tells you that the `package.py` changed. 
The behavior of the DAG hash (which includes the `package_hash`) is basically the same as before, except now resources are included, and we can see differences in archives and resources directly in the `spec.json` Note that we do not need to bump the spec meta version on this, as past versions of Spack can still read the new specs; they just will not notice the new fields (which is fine, since we currently do not do anything with them). Among other things, this will more easily allow us to convert Spack specs to SBOM and track relevant security information (like `sha256`'s of archives). For example, we could do continuous scanning of a Spack installation based on these hashes, and if the `sha256`'s become associated with CVE's, we'll know we're affected. - [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a fetcher for a `spec.json`. - [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as a special-case spec hash in `hash_types.py`, but it really doesn't belong there. Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py` is much simpler. - [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and include more information about artifacts in it. - [x] Update package hash tests and make them check for artifact and resource hashes.
We've included a package hash in Spack since #7193 for CI, and we started using it on the spec in #28504. However, what goes into the package hash is a bit opaque. Here's what `spec.json` looks like now: ```json { "spec": { "_meta": { "version": 3 }, "nodes": [ { "name": "zlib", "version": "1.2.12", ... "patches": [ "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76" ], "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====", "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp" } ] } } ``` The `package_hash` there is a hash of the concatenation of: * A canonical hash of the `package.py` recipe, as implemented in #28156; * `sha256`'s of patches applied to the spec; and * Archive `sha256` sums of archives or commits/revisions of repos used to build the spec. There are some issues with this: patches are counted twice in this spec (in `patches` and in the `package_hash`), the hashes of sources used to build are conflated with the `package.py` hash, and we don't actually include resources anywhere. With this PR, I've expanded the package hash out in the `spec.json` body. Here is the "same" spec with the new fields: ```json { "spec": { "_meta": { "version": 3 }, "nodes": [ { "name": "zlib", "version": "1.2.12", ... "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4", "sources": [ { "type": "archive", "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9" } ], "patches": [ "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76" ], "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein" } ] } } ``` Now: * Patches and archive hashes are no longer included in the `package_hash`; * Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`; * `sources` will include resources for packages that have it; * Patches are the same as before -- but only represented once; and * The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only tells you that the `package.py` changed. 
The behavior of the DAG hash (which includes the `package_hash`) is basically the same as before, except now resources are included, and we can see differences in archives and resources directly in the `spec.json` Note that we do not need to bump the spec meta version on this, as past versions of Spack can still read the new specs; they just will not notice the new fields (which is fine, since we currently do not do anything with them). Among other things, this will more easily allow us to convert Spack specs to SBOM and track relevant security information (like `sha256`'s of archives). For example, we could do continuous scanning of a Spack installation based on these hashes, and if the `sha256`'s become associated with CVE's, we'll know we're affected. - [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a fetcher for a `spec.json`. - [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as a special-case spec hash in `hash_types.py`, but it really doesn't belong there. Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py` is much simpler. - [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and include more information about artifacts in it. - [x] Update package hash tests and make them check for artifact and resource hashes.
We've included a package hash in Spack since #7193 for CI, and we started using it on the spec in #28504. However, what goes into the package hash is a bit opaque. Here's what `spec.json` looks like now: ```json { "spec": { "_meta": { "version": 3 }, "nodes": [ { "name": "zlib", "version": "1.2.12", ... "patches": [ "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76" ], "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====", "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp" } ] } } ``` The `package_hash` there is a hash of the concatenation of: * A canonical hash of the `package.py` recipe, as implemented in #28156; * `sha256`'s of patches applied to the spec; and * Archive `sha256` sums of archives or commits/revisions of repos used to build the spec. There are some issues with this: patches are counted twice in this spec (in `patches` and in the `package_hash`), the hashes of sources used to build are conflated with the `package.py` hash, and we don't actually include resources anywhere. With this PR, I've expanded the package hash out in the `spec.json` body. Here is the "same" spec with the new fields: ```json { "spec": { "_meta": { "version": 3 }, "nodes": [ { "name": "zlib", "version": "1.2.12", ... "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4", "sources": [ { "type": "archive", "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9" } ], "patches": [ "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76" ], "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein" } ] } } ``` Now: * Patches and archive hashes are no longer included in the `package_hash`; * Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`; * `sources` will include resources for packages that have it; * Patches are the same as before -- but only represented once; and * The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only tells you that the `package.py` changed. 
The behavior of the DAG hash (which includes the `package_hash`) is basically the same as before, except now resources are included, and we can see differences in archives and resources directly in the `spec.json` Note that we do not need to bump the spec meta version on this, as past versions of Spack can still read the new specs; they just will not notice the new fields (which is fine, since we currently do not do anything with them). Among other things, this will more easily allow us to convert Spack specs to SBOM and track relevant security information (like `sha256`'s of archives). For example, we could do continuous scanning of a Spack installation based on these hashes, and if the `sha256`'s become associated with CVE's, we'll know we're affected. - [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a fetcher for a `spec.json`. - [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as a special-case spec hash in `hash_types.py`, but it really doesn't belong there. Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py` is much simpler. - [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and include more information about artifacts in it. - [x] Update package hash tests and make them check for artifact and resource hashes.
We've included a package hash in Spack since #7193 for CI, and we started using it on the spec in #28504. However, what goes into the package hash is a bit opaque. Here's what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches` and in the `package_hash`), the hashes of sources used to build are conflated with the `package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the "same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same as before, except now resources are included, and we can see differences in archives and resources directly in the `spec.json`.

Note that we do not need to bump the spec meta version on this, as past versions of Spack can still read the new specs; they just will not notice the new fields (which is fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and track relevant security information (like `sha256`'s of archives). For example, we could do continuous scanning of a Spack installation based on these hashes, and if the `sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a fetcher for a `spec.json`.
- [x] Simplify the way `package_hash()` is handled in Spack. Previously, it was handled as a special-case spec hash in `hash_types.py`, but it really doesn't belong there. Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py` is much simpler.
- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and include more information about artifacts in it.
- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
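The two `package_hash` values in the examples above differ visibly in length (56 characters ending in `====` versus 32 characters with no padding). A small sketch of why a base32-encoded `sha1` comes out shorter and unpadded is below. It uses only the Python standard library; the helper name `b32_digest` is hypothetical, and this is not claimed to be how Spack computes the hash internally, only an illustration of the encoding lengths.

```python
import base64
import hashlib


def b32_digest(data: bytes, algorithm: str) -> str:
    """Hash `data` with the named algorithm and return a lowercase base32 string."""
    digest = hashlib.new(algorithm, data).digest()
    return base64.b32encode(digest).decode("ascii").lower()


# A sha256 digest is 32 bytes, which is not a multiple of 5, so base32 pads it.
longer = b32_digest(b"example recipe contents", "sha256")

# A sha1 digest is 20 bytes, an exact multiple of 5, so no padding is needed.
shorter = b32_digest(b"example recipe contents", "sha1")

print(len(longer), longer.endswith("===="))
print(len(shorter), "=" in shorter)
```

The 32-character form has the same shape as the `hash` field next to it in the `spec.json`, which is presumably what "like other hashes in Spack" refers to.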
This is the `features/package-hash` branch rebased onto `develop` and split into smaller, more logical commits. Currently there are two test failures at the top of the branch, but the rest of the branch is passing the test suite.

Cc: @tgamblin @scheibelp
See #7119