Merged
scheibelp reviewed on Dec 24, 2021
cosmicexplorer previously approved these changes on Dec 25, 2021
cosmicexplorer (Contributor) left a comment:
This is awesome!!!!!!! I'm marking approve because I would accept this as-is, but will pay attention to any discussion on my review comments.
Force-pushed from cc15d63 to 92a0bf8.
We are planning to switch to using full hashes for Spack specs, which means that the
package hash will be included in the deployment descriptor, so we need a more robust
package hash than simply dumping the `repr` of the AST.

The AST repr that we previously used for package content is unreliable because it can
vary between Python versions (Python's AST actually changes fairly frequently).
- [x] change `package_hash`, `package_ast`, and `canonical_source` to accept a string for
alternate source instead of a filename.
- [x] consolidate package hash tests in `test/util/package_hash.py`.
- [x] remove old `package_content` method.
- [x] make `package_hash` do what `canonical_source_hash` was doing before.
- [x] modify `content_hash` in `package.py` to use the new `package_hash` function.
Co-authored-by: Danny McClanahan <[email protected]>
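To illustrate why hashing canonical source text is sturdier than hashing a dump of the AST, here is a minimal sketch that parses a recipe and re-emits it with the standard library's `ast.unparse()` (Python 3.9+) before hashing. `canonical_source_sketch` and `package_hash_sketch` are hypothetical names, not Spack's functions, and the SHA-256 hex output is an arbitrary choice for the sketch. The stdlib unparser's output is not guaranteed to be identical across Python versions either, so a real implementation still has to control that.

```python
import ast
import hashlib


def canonical_source_sketch(source: str) -> str:
    """Normalize source text: parsing drops comments and layout, and
    ast.unparse() re-emits the code in one fixed style (Python 3.9+)."""
    return ast.unparse(ast.parse(source))


def package_hash_sketch(source: str) -> str:
    """Hash the canonical source text rather than a dump of the AST."""
    canonical = canonical_source_sketch(source)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Sources that differ only in comments, blank lines, and redundant
# parentheses produce the same hash.
a = "x = 1  # set x\n\n\ny = (x +  2)\n"
b = "x = 1\ny = x + 2\n"
print(package_hash_sketch(a) == package_hash_sketch(b))  # True
```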
To make it easier to see how package hashes change and how they are computed, add two commands:

* `spack pkg source <spec>`: dumps the source code for a package to the terminal.
* `spack pkg source --canonical <spec>`: dumps the canonicalized source code for a package to the terminal. It strips comments, directives, and known-unused multimethods from the package. It is used to generate package hashes.
* `spack pkg hash <spec>`: gives the package hash for a particular spec. It is generated from the canonical source code for the spec.

- [x] add `spack pkg source` and `spack pkg hash`
- [x] add tests
- [x] fix bug in multimethod resolution with boolean `@when` values

Co-authored-by: Greg Becker <[email protected]>
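The canonical source strips, among other things, known-unused multimethods. As a rough sketch of one narrow case, assuming that a method decorated with the literal `@when(False)` can never be selected, such methods can be dropped from the AST before unparsing. `PruneWhenFalse` is hypothetical and is not the multimethod fix listed above; it deliberately leaves spec-string conditions like `@when("@1.0")` and computed conditions such as `@when(sys.platform == "darwin")` alone, since deciding those needs more context.

```python
import ast


def _literal_when(decorator):
    """Return the bool if `decorator` is when(True)/when(False), else None."""
    if (
        isinstance(decorator, ast.Call)
        and isinstance(decorator.func, ast.Name)
        and decorator.func.id == "when"
        and len(decorator.args) == 1
        and not decorator.keywords
        and isinstance(decorator.args[0], ast.Constant)
        and isinstance(decorator.args[0].value, bool)
    ):
        return decorator.args[0].value
    return None


class PruneWhenFalse(ast.NodeTransformer):
    """Drop methods guarded by the literal when(False); they can never be
    selected, so they should not influence a package hash."""

    def visit_FunctionDef(self, node):
        if any(_literal_when(d) is False for d in node.decorator_list):
            return None  # returning None removes the method from its parent
        return node  # other conditions (e.g. spec strings) are left alone


source = '''
class Example(Package):
    def install(self, spec, prefix):
        make("install")

    @when(False)
    def install(self, spec, prefix):
        raise RuntimeError("never used")

    @when("@1.0")
    def patch(self):
        pass
'''
print(ast.unparse(PruneWhenFalse().visit(ast.parse(source))))
```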
We can't tell `print(a, b, c)` and `print((a, b, c))` apart consistently: each of these expressions generates a different AST in Python 2 than in Python 3. However, we can decide that we don't care. This commit treats both of them the same when `unparse()` is called with `py_ver_consistent` set. This means that the package hash won't notice changes from printing a tuple to printing multiple values, but we don't care, because this is extremely unlikely to affect the build. More than likely, this is just an error message for the user of the package.

- [x] treat `print(a, b, c)` and `print((a, b, c))` the same in py2 and py3
- [x] add another package parsing test -- legion -- that exercises this feature
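A minimal sketch of the normalization idea, written against Python 3's `ast` module: rewrite a `print` call whose only argument is a tuple into a call that takes the tuple's elements as arguments, so both spellings unparse to the same text. `NormalizePrintTuple` and `normalized` are hypothetical names; in Spack this behavior sits in its own unparser behind `py_ver_consistent`.

```python
import ast


class NormalizePrintTuple(ast.NodeTransformer):
    """Rewrite print((a, b, c)) as print(a, b, c) so both unparse alike."""

    def visit_Call(self, node):
        self.generic_visit(node)  # normalize nested calls first
        if (
            isinstance(node.func, ast.Name)
            and node.func.id == "print"
            and len(node.args) == 1
            and not node.keywords
            and isinstance(node.args[0], ast.Tuple)
        ):
            node.args = list(node.args[0].elts)  # splat the tuple's elements
        return node


def normalized(source: str) -> str:
    return ast.unparse(NormalizePrintTuple().visit(ast.parse(source)))


print(normalized("print((a, b, c))") == normalized("print(a, b, c)"))  # True
```

Only `print` is rewritten; a tuple passed to any other call keeps its meaning and its parentheses.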
Handle complex f-strings. Backport of:
python/cpython@a993e90
…ream These are refactors that have happened in upstream `ast.unparse()`
These refactors have happened in upstream `ast.unparse()`
These are the unit tests from astunparse, converted to pytest, with a few backports from upstream cpython. These should hopefully keep `unparser.py` well covered as we change it.
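The core property such tests check can be sketched as a round trip: unparsing an AST and parsing the result again should give an identical tree. This sketch uses the standard library's `ast.unparse()` and plain `assert` rather than Spack's vendored unparser and pytest; `assert_roundtrip` is a hypothetical helper and the sample cases, including a nested f-string format spec, are illustrative only.

```python
import ast


def assert_roundtrip(source: str) -> None:
    """Check that parse(unparse(tree)) yields the same tree as parse(source)."""
    original = ast.parse(source)
    regenerated = ast.parse(ast.unparse(original))
    # ast.dump omits line/column attributes by default, so only structure counts.
    assert ast.dump(original) == ast.dump(regenerated), source


for case in [
    "x = (1, 2)",
    "def f(a, *args, b=1, **kw):\n    return a",
    "s = f'{value!r:>{width}}'",
    "with open(p) as fh, lock:\n    pass",
]:
    assert_roundtrip(case)
print("ok")
```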
Many packages implement logic at the class level to handle complex dependencies and
conflicts. Others have started using `with when("@1.0"):` blocks since we added that
capability. The loops and other control logic can cause some pure directive logic not to
be removed by our package hashing logic -- and in many cases that's a lot of code that
will cause unnecessary rebuilds.
This commit changes the unparser so that it will descend into these blocks. Specifically:
1. Descend into loops, if statements, and with blocks at the class level.
2. Don't look inside function definitions (in or outside a class).
3. Don't look at nested class definitions (they don't have directives).
4. Add logic to *remove* empty loops/with blocks/if statements if all directives
in them were removed.
This allows our package hash to ignore a lot of pure metadata that it was not ignoring
before, and makes it less sensitive to changes that do not affect the build.
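The four rules above can be sketched with Python's `ast` module. In this sketch, `DIRECTIVES` is an illustrative subset of directive names (an assumption, not Spack's list), a directive is assumed to be a bare call statement at class level, and `remove_directives` is a hypothetical helper rather than Spack's unparser logic. An `if` whose body empties but whose `else` branch survives gets a `pass` body so the output stays valid Python.

```python
import ast

# Illustrative subset of directive names (assumption, not Spack's full list).
DIRECTIVES = {"version", "variant", "depends_on", "conflicts", "patch", "provides"}


def _is_directive(stmt):
    """True for a bare call statement such as depends_on("zlib")."""
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Call)
        and isinstance(stmt.value.func, ast.Name)
        and stmt.value.func.id in DIRECTIVES
    )


def _strip(body):
    """Remove directives from a class-level statement list.

    Descends into for/while/if/with blocks (rule 1), never into function or
    nested class definitions (rules 2 and 3), and drops emptied blocks (rule 4).
    """
    kept = []
    for stmt in body:
        if _is_directive(stmt):
            continue
        if isinstance(stmt, (ast.For, ast.While, ast.If)):
            stmt.body = _strip(stmt.body)
            stmt.orelse = _strip(stmt.orelse)
            if not stmt.body:
                if not stmt.orelse:
                    continue  # the block held only directives
                stmt.body = [ast.Pass()]  # keep the surviving else branch valid
        elif isinstance(stmt, ast.With):
            stmt.body = _strip(stmt.body)
            if not stmt.body:
                continue
        kept.append(stmt)
    return kept


def remove_directives(source: str) -> str:
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            node.body = _strip(node.body) or [ast.Pass()]
    return ast.unparse(tree)


package = '''
class Example(Package):
    version("1.0")
    for v in ["a", "b"]:
        variant(v)
    with when("@1.0"):
        depends_on("zlib")
    if True:
        conflicts("%gcc")
    else:
        x = 1

    def install(self, spec, prefix):
        depends_on("untouched inside a method")
'''
print(remove_directives(package))
```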
In addition, we add `maintainers` and `tags` to the list of metadata attributes that
Spack should remove from packages when constructing canonical source for a package
hash.
- [x] Make unparser handle if/for/while/with at class level.
- [x] Add tests for control logic removal.
- [x] Add a test to ensure that all packages can not only be unparsed, but also
that their canonical source is still compilable. This is a test for
our control logic removal.
- [x] Add another unparse test package that has complex logic.
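A rough sketch of the metadata-attribute removal and of the compilability check described in the checklist: drop class-level `maintainers` and `tags` assignments, unparse, and call `compile()` on the result so a transformation that produces invalid Python fails loudly. `RemoveMetadataAttrs` and `canonical_and_compilable` are hypothetical names, and `METADATA_ATTRS` holds only the two names this commit adds, not Spack's full list. The checklist's test covers all packages; running this per-file check over each `package.py` would be one way to approximate it.

```python
import ast

# Only the two names this commit adds; the full list Spack uses is longer.
METADATA_ATTRS = {"maintainers", "tags"}


def _is_metadata_assignment(stmt):
    return isinstance(stmt, ast.Assign) and all(
        isinstance(target, ast.Name) and target.id in METADATA_ATTRS
        for target in stmt.targets
    )


class RemoveMetadataAttrs(ast.NodeTransformer):
    """Drop class-level assignments such as `maintainers = [...]`."""

    def visit_ClassDef(self, node):
        node.body = [s for s in node.body if not _is_metadata_assignment(s)]
        if not node.body:
            node.body = [ast.Pass()]  # an empty class body would not compile
        return node  # no generic_visit: nested definitions are left untouched


def canonical_and_compilable(source: str, filename: str = "<package>") -> str:
    """Canonicalize, then check that the result is still valid Python."""
    canonical = ast.unparse(RemoveMetadataAttrs().visit(ast.parse(source)))
    compile(canonical, filename, "exec")  # raises SyntaxError if we broke it
    return canonical


print(canonical_and_compilable(
    'class Example(Package):\n    maintainers = ["alice"]\n    tags = ["e4s"]\n'
))
```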
Force-pushed from ab57e93 to d2a9fcd.
alalazo approved these changes on Jan 12, 2022
tgamblin added a commit that referenced this pull request on Aug 22, 2022
We've included a package hash in Spack since #7193 for CI, and we started using it on the spec in #28504. However, what goes into the package hash is a bit opaque. Here's what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256` sums of patches applied to the spec; and
* Archive `sha256` sums, or commits/revisions of repos, used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches` and in the `package_hash`), the hashes of sources used to build are conflated with the `package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the "same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksums in the `spec.json`;
* `sources` will include resources for packages that have them;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only tells you that the `package.py` changed.
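The hash lengths in the two JSON examples can be checked with a little arithmetic: a SHA-1 digest is 20 bytes (160 bits), exactly 32 base32 characters with no padding, while a 32-byte digest such as SHA-256 needs 52 characters plus `====` padding, consistent with the old 56-character value. The sketch below computes a lowercase base32 SHA-1 over canonical recipe text and keeps source and patch checksums in separate fields, so changing a patch or archive no longer changes `package_hash`. `recipe_hash` and `node_hash_fields` are hypothetical helpers, lowercasing is assumed from the example values, and this is not Spack's `artifact_hashes()` implementation.

```python
import base64
import hashlib


def b32(digest: bytes) -> str:
    """Lowercase base32 text of a raw digest."""
    return base64.b32encode(digest).decode("ascii").lower()


def recipe_hash(canonical_recipe: str) -> str:
    """SHA-1 of the canonical recipe text: 20 bytes -> 32 chars, no '='."""
    return b32(hashlib.sha1(canonical_recipe.encode("utf-8")).digest())


def node_hash_fields(canonical_recipe, sources, patches):
    """Hypothetical helper: keep recipe, source, and patch hashes separate."""
    return {
        "package_hash": recipe_hash(canonical_recipe),
        "sources": [{"type": kind, "sha256": sha} for kind, sha in sources],
        "patches": list(patches),
    }


fields = node_hash_fields(
    "class Zlib(Package): pass",
    sources=[("archive", "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9")],
    patches=["0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"],
)
print(len(fields["package_hash"]))  # 32
```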
The behavior of the DAG hash (which includes the `package_hash`) is basically the same as before, except that now resources are included, and we can see differences in archives and resources directly in the `spec.json`.

Note that we do not need to bump the spec meta version for this, as past versions of Spack can still read the new specs; they just will not notice the new fields (which is fine, since we currently do not do anything with them).

Among other things, this will make it easier to convert Spack specs to SBOMs and track relevant security information (like `sha256` sums of archives). For example, we could continuously scan a Spack installation based on these hashes, and if the `sha256` sums become associated with CVEs, we'll know we're affected.

- [x] Add a method, `spec_attrs()`, to `FetchStrategy` that can be used to describe a fetcher for a `spec.json`.
- [x] Simplify the way `package_hash()` is handled in Spack. Previously, it was handled as a special-case spec hash in `hash_types.py`, but it really doesn't belong there. Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py` is much simpler.
- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and include more information about artifacts in it.
- [x] Update package hash tests and make them check for artifact and resource hashes.
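As a sketch of the security-scanning idea, the new fields make it straightforward to pull every archive and patch checksum out of a `spec.json` and compare them against a set of flagged checksums. The flagged set stands in for a hypothetical vulnerability feed; `artifact_checksums` and `affected_packages` are illustrative names. Treating a missing `sources` key as empty reflects that specs written before this change lack the new fields, and skipping sources without a `sha256` key is an assumption about non-archive sources.

```python
import json


def artifact_checksums(spec_json: str):
    """Yield (package name, sha256) pairs for sources and patches in a spec.json."""
    for node in json.loads(spec_json)["spec"]["nodes"]:
        for source in node.get("sources", []):  # absent in older specs
            if "sha256" in source:  # assumption: some sources may lack one
                yield node["name"], source["sha256"]
        for patch_sha in node.get("patches", []):
            yield node["name"], patch_sha


def affected_packages(spec_json: str, flagged_sha256s) -> list:
    """Names of packages built from any artifact whose checksum is flagged."""
    return sorted(
        {name for name, sha in artifact_checksums(spec_json) if sha in flagged_sha256s}
    )


ARCHIVE = "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
PATCH = "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
spec_json = json.dumps({"spec": {"_meta": {"version": 3}, "nodes": [
    {"name": "zlib", "version": "1.2.12",
     "sources": [{"type": "archive", "sha256": ARCHIVE}], "patches": [PATCH]},
]}})
print(affected_packages(spec_json, {ARCHIVE}))  # ['zlib']
```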
tgamblin added a commit that referenced this pull request on Oct 2, 2022
tgamblin added a commit that referenced this pull request on Dec 4, 2022
tgamblin added a commit that referenced this pull request on Dec 4, 2022
tgamblin added a commit that referenced this pull request on Mar 10, 2024
tgamblin added a commit that referenced this pull request on Jul 18, 2024
tgamblin added a commit that referenced this pull request on Jul 18, 2024
tgamblin added a commit that referenced this pull request on Jul 18, 2024
tgamblin
added a commit
that referenced
this pull request
Aug 12, 2024
tgamblin
added a commit
that referenced
this pull request
Aug 23, 2024
tgamblin
added a commit
that referenced
this pull request
Aug 23, 2024
tgamblin
added a commit
that referenced
this pull request
Sep 4, 2024
tgamblin
added a commit
that referenced
this pull request
Oct 22, 2024
tgamblin
added a commit
that referenced
this pull request
Oct 28, 2024
vjranagit
pushed a commit
to vjranagit/spack
that referenced
this pull request
Jan 18, 2026
This PR ensures that the package hash is consistent across Python versions, and adds a lot of tests to verify that. It also makes the package hash less sensitive, to avoid superfluous rebuilds in CI.
Now that we have #25310 (reusing installed packages), we want to make Spack hashes more fine-grained to preserve provenance. To do that we want to switch to using full hashes everywhere. Full hashes include all build dependencies, a hash of the canonicalized package file, and the hashes of artifacts and patches used in the build.
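To make the idea of a full hash concrete, here is a minimal sketch of a Merkle-style hash over a node record holding the three ingredients listed above. The record layout borrows field names from the `spec.json` examples earlier on this page. `node_hash`, the placeholder `app` node, and the use of sorted-key JSON are assumptions for illustration, not Spack's actual serialization.

```python
import base64
import hashlib
import json


def node_hash(node, dependency_hashes):
    """Hypothetical full hash of one node: its own metadata plus its dependencies' hashes."""
    record = dict(node)
    # Dependencies contribute only through their own hashes, so a change anywhere
    # below a node changes that node's hash too (a Merkle-style construction).
    record["dependencies"] = sorted(dependency_hashes)
    # sort_keys and fixed separators give a canonical byte string for the record
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii").lower()


zlib = {
    "name": "zlib",
    "version": "1.2.12",
    # hash of the canonicalized package.py
    "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
    # hashes of artifacts used in the build
    "sources": [{"type": "archive",
                 "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"}],
    # hashes of patches applied in the build
    "patches": ["0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"],
}
# Placeholder dependent package, for illustration only
app = {"name": "app", "version": "1.0", "package_hash": "placeholder", "sources": [], "patches": []}

zlib_hash = node_hash(zlib, [])
unpatched_zlib_hash = node_hash(dict(zlib, patches=[]), [])

# Dropping a patch changes zlib's hash, and that change propagates to its dependent
app_hash = node_hash(app, [zlib_hash])
app_hash_unpatched = node_hash(app, [unpatched_zlib_hash])
```

Sorting both the keys and the dependency hashes keeps the result independent of dictionary or traversal order. Only the content that is actually recorded affects the hash.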
The existing full hash worked by editing the AST of a `package.py` file. It removed multimethods where we could statically determine that they wouldn't be triggered, and it stripped comments, docstrings, and directives. Comments and docstrings don't affect the build, and directives are represented by other metadata on the spec. Once all these modifications were done, we dumped the `repr()` of the AST and hashed that.

There were a few problems with the old package hash:
1. The AST differs between Python versions, and so does its `repr()`. So, Spack running under Python 2.7, 3.5, and 3.8 may calculate entirely different full hashes for the same `package.py` file. This isn't as big a deal right now, but it needs to be fixed if we are going to move to full hashes.
2. We weren't removing directives that were nested in other statements (`if`, `for`, `with when()`, etc.), and that constitutes a lot of logic for some packages, even when all the loops are doing is setting up complex directives.
3. We weren't removing metadata like `maintainers` and `tags`, and we really don't want changes to those to require a rebuild.

This PR makes the following modifications:
- Add `spack.util.unparse`, which regenerates Python source code from a parsed AST. This is based on https://github.com/simonpercivall/astunparse, which is consistent across Python 2.7-3.8, with backports from the new `ast.unparse` function in Python 3.9+.
- The `unparse()` method implemented here is consistent for Python 2.7 and 3.5-3.10.
- Add a `py_ver_consistent=True` argument to `unparse()`, which handles the following cases:
  - `*args` and `**kwargs` are last in function calls, for consistency with the Python 2 AST (which has no idea what position they're in).
  - Treat `print()` as a function (Python 2's AST treats it as its own statement), and do not differentiate between `print(a, b, c)` and `print((a, b, c))`. These have different semantics in Python 2 and Python 3, but they almost certainly do not affect the results of builds, so switching between them won't affect the package hash.
  - Handle `\x` encoding in output consistently, and unparse unicode literals with a `u` prefix.
- `unparse()` is consistent with `ast.unparse()` when `py_ver_consistent` is not used, so that we may be able to switch to the builtin `ast.unparse()` one day (if we ever only support Python 3.9+). This is done with backports of things like `ast.unparse`'s parenthesis precedence algorithm.
- Update `package_hash.py` to preserve non-`when` decorators, and handle dynamic `@when` statements better. Add a bunch of tests for this.
- Remove all bare string statements from the `package.py` file. This is safe, as these strings are used as comments; they don't affect build results.
- Add commands to make it easier to see how package hashes are computed:
  - `spack pkg source <spec>`: dump source for a `package.py` file.
  - `spack pkg source --canonical <spec>`: dump canonical source for a `package.py` file, with comments and docstrings stripped and multimethods removed.
  - `spack pkg hash <spec>`: print the canonical package hash for the spec.
- Stop using the old, `repr()`-based `package_hash` and use the new, stable `unparse`-based one.
- Ignore metadata attributes (`maintainers`, `tags`) to make the hash less sensitive.

I've verified that this is consistent for Python 2.7 and 3.5-3.10 across all Spack packages by unparsing them and comparing their canonicalized source. The test packages in `lib/spack/spack/test/data/unparse` and the `test_package_hash_consistency()` test are meant to preserve this with some spot checks of interesting cases.

This should trigger a large CI rebuild on this PR, but it may reduce CI churn in the future (though I think CI was only using one version of Python, so maybe not).
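The canonicalize-then-hash pipeline described in this PR can be approximated with the standard library alone. The sketch below is an illustration under stated assumptions, not `spack.util.package_hash` or `spack.util.unparse`:

- It relies on the builtin `ast.unparse` (Python 3.9+), which the PR deliberately does not depend on.
- `METADATA_NAMES` and `DIRECTIVE_NAMES` are small, hypothetical sets.
- It only drops directives that sit directly in a class body, not ones nested under `if`, `for`, or `with when()`.
- It does no multimethod resolution.

```python
import ast
import base64
import hashlib

# Hypothetical name lists for illustration; Spack's real sets are larger and differ.
METADATA_NAMES = {"maintainers", "tags"}
DIRECTIVE_NAMES = {"version", "depends_on", "variant", "patch", "resource", "conflicts"}


def is_bare_string(stmt):
    """Docstrings and strings used as comments: they cannot affect the build."""
    return (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant)
            and isinstance(stmt.value.value, str))


def is_metadata_or_directive(stmt):
    """A `maintainers = ...`/`tags = ...` assignment or a direct directive call."""
    if isinstance(stmt, ast.Assign):
        return all(isinstance(t, ast.Name) and t.id in METADATA_NAMES for t in stmt.targets)
    if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
        func = stmt.value.func
        return isinstance(func, ast.Name) and func.id in DIRECTIVE_NAMES
    return False


class Canonicalizer(ast.NodeTransformer):
    def _strip(self, node, in_class):
        self.generic_visit(node)  # canonicalize nested nodes first
        node.body = [
            s for s in node.body
            if not is_bare_string(s) and not (in_class and is_metadata_or_directive(s))
        ] or [ast.Pass()]  # an empty body would not be valid Python
        return node

    def visit_Module(self, node):
        return self._strip(node, in_class=False)

    def visit_ClassDef(self, node):
        return self._strip(node, in_class=True)

    def visit_FunctionDef(self, node):
        return self._strip(node, in_class=False)

    def visit_Call(self, node):
        # Mirror the PR's choice to treat print((a, b, c)) the same as print(a, b, c)
        self.generic_visit(node)
        if (isinstance(node.func, ast.Name) and node.func.id == "print"
                and len(node.args) == 1 and isinstance(node.args[0], ast.Tuple)):
            node.args = list(node.args[0].elts)
        return node


def canonical_hash(source):
    """Parse, canonicalize, unparse (Python 3.9+), then base32-encode a sha1 of the text."""
    canonical = ast.unparse(Canonicalizer().visit(ast.parse(source)))
    return base64.b32encode(hashlib.sha1(canonical.encode("utf-8")).digest()).decode("ascii").lower()


original = '''
class Zlib(Package):
    """zlib compression library."""
    maintainers = ["someone"]
    version("1.2.12", sha256="abc")

    def install(self, spec, prefix):
        "A string used as a comment."
        print(("configuring", spec))
'''

reformatted = '''
class Zlib(Package):
    tags = ["core"]

    def install(self, spec, prefix):
        # comments never reach the AST
        print("configuring", spec)
'''
```

Comments never reach the AST, and docstrings, metadata, and class-level directives are dropped before unparsing. As a result, `original` and `reformatted` produce the same canonical text and hash. A change to the logic inside `install()` itself would still change the hash.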