Skip to content

Make the package hash consistent#28156

Merged
becker33 merged 21 commits intodevelopfrom
consistent-package-hash
Jan 12, 2022
Merged

Make the package hash consistent#28156
becker33 merged 21 commits intodevelopfrom
consistent-package-hash

Conversation

@tgamblin
Copy link
Copy Markdown
Member

@tgamblin tgamblin commented Dec 24, 2021

This PR ensures that the package hash is consistent across Python versions, and adds a lot of tests to ensure that. It also makes the package hash less sensitive, to avoid superfluous rebuilds in CI.

Now that we have #25310 (reusing installed packages), we want to make Spack hashes more fine-grained to preserve provenance. To do that we want to switch to using full hashes everywhere. Full hashes include all build dependencies, a hash of the canonicalized package file, and the hashes of artifacts and patches used in the build.

The existing full hash worked by editing the AST of a package.py file, removing multimethods (where statically possible b/c we know they won't be triggered), and stripping comments, docstrings, and directives. Comments and docstrings don't affect the build, and directives are represented by other metadata on the spec. Once all these modifications were done, we dumped the repr() of the AST and hashed that.

There were a few problems with the old package hash:

  1. The AST changes pretty frequently from Python version to Python version, and so does its repr(). So, Spack running under Python 2.7, 3.5, and 3.8 may calculate entirely different full hashes for the same package.py file. This isn't as big a deal right now, but it needs to be fixed if we are going to move to full hashes.
  2. Static multimethod reosolution (removing multi-methods if we know they won't be used) was not working in all cases in the previous version of the hash, and there weren't a lot of tests for it. This resulted in the old package hash stripping any function with a decorator from the canonical representation, so we would lose some functions that affected builds.
  3. The old docstring remover would only strip the first free-standing string in a module/class/method. We want to strip all of them (a few packages had extra strings lying around).
  4. The old package hash would not remove directive nested in complex logic at the package level (if, for, with when(), etc.), and that constitutes a lot of logic for some packages, even when all the loops are doing is setting up complex directives.
  5. The old package hash was considering changes to maintainers and tags, and we really don't want those to require a rebuild.

This PR makes the following modifications:

  • To solve the AST issue, add spack.util.unparse, which regenerates Python source code from a parsed AST. This is based on https://github.com/simonpercivall/astunparse, which is consistent across Python 2.7-3.8, with back ports from the new ast.unparse function in Python 3.9+.
    • The unparse() method implemented here is consistent for Python 2.7 and 3.5-3.10.
    • We added a py_ver_consistent=True argument to unparse(), which handles the following cases:
      • Ensure that *args and **kwargs are last in function calls, for consistency with the Python 2 AST (which has no idea what position they're in)
      • Always unparse print() as a function (Python 2's AST treats it as its own statement) and do not differentiate between print(a, b, c) and print((a, b, c)). These have different semantics in Python 2 and Python 3, but they almost certainly do not affect the results of builds, so switching between them won't affect the package hash.
      • Unparse unicode literals the way Python 2 would. Basically, this means that all non-ASCII characters will use \x encoding in output, and unicode literals will be unparsed with a u prefix.
    • Our unparse() is consistent with ast.unparse() when py_ver_consistent is not used, so that we may be able to switch to the builtin ast.unparse() one day (if we ever only support Python 3.9+). This is done with backports of things like ast.unparse's parenthesis precedence algorithm.
    • Add some tests that guarantee that packages will have the same hash across different python versions.
  • Fix multimethod resolution in package_hash.py to preserve non-when decorators, and handle dynamic @when statements better. Add a bunch of tests for this.
  • Fix docstring stripping to remove any free-standing string in a package.py file. This is safe as these strings are used as comments -- they don't affect build results.
  • Add some new commands to deal with package hashes:
    • spack pkg source <spec>: dump source for a package.py file
    • spack pkg source --canonical <spec>: dump canonical source for a package.py file, with comments and docstrings stripped and multimethods removed.
    • spack pkg hash <spec>: print the canonical package hash for the spec.
  • Remove the old AST-dump-based package_hash and use the new, stable unparse-based one.
  • New unparser descends into control logic a the package level and removes anything (loops, ifs, with statements) that is comprised only of directives. This makes the package hash less sensitive.
  • Added more attributes for removal (maintainers, tags) to make the hash less sensitive.

I've verified that this is consistent for Python 2.7 and 3.5-3.10 across all Spack packages by unparsing them and comparing their canonicalized source. The test packages in lib/spack/spack/test/data/unparse and the test_package_hash_consistency() test are meant to preserve this with some spot checks of interesting cases.

This should trigger a large CI rebuild on this PR, but it may reduce CI churn in the future (though I think CI was only using one version of Python, so maybe not).

cosmicexplorer
cosmicexplorer previously approved these changes Dec 25, 2021
Copy link
Copy Markdown
Contributor

@cosmicexplorer cosmicexplorer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome!!!!!!! I'm marking approve because I would accept this as-is, but will pay attention to any discussion on my review comments.

@tgamblin tgamblin force-pushed the consistent-package-hash branch 2 times, most recently from cc15d63 to 92a0bf8 Compare December 28, 2021 18:33
tgamblin and others added 8 commits January 12, 2022 01:34
We are planning to switch to using full hashes for Spack specs, which means that the
package hash will be included in the deployment descriptor. This means we need a more
robust package hash than simply dumping the `repr` of the AST.

The AST repr that we previously used for package content is unreliable because it can
vary between python versions (Python's AST actually changes fairly frequently).

- [x] change `package_hash`, `package_ast`, and `canonical_source` to accept a string for
      alternate source instead of a filename.
- [x] consolidate package hash tests in `test/util/package_hash.py`.
- [x] remove old `package_content` method.
- [x] make `package_hash` do what `canonical_source_hash` was doing before.
- [x] modify `content_hash` in `package.py` to use the new `package_hash` function.

Co-authored-by: Danny McClanahan <[email protected]>
To make it easier to see how package hashes change and how they are computed, add two
commands:

* `spack pkg source <spec>`: dumps source code for a package to the terminal

* `spack pkg source --canonical <spec>`: dumps canonicalized source code for a
   package to the terminal. It strips comments, directives, and known-unused
   multimethods from the package. It is used to generate package hashes.

* `spack pkg hash <spec>`: This gives the package hash for a particular spec.
  It is generated from the canonical source code for the spec.

- [x] `add spack pkg source` and `spack pkg hash`
- [x] add tests
- [x] fix bug in multimethod resolution with boolean `@when` values

Co-authored-by: Greg Becker <[email protected]>
We can't tell `print(a, b, c)` and `print((a, b, c))` apart -- both of these expressions
generate different ASTs in Python 2 and Python 3.  However, we can decide that we don't
care.  This commit treats both of them the same when `py_ver_consistent` is set with
`unparse()`.

This means that the package hash won't notice changes from printing a tuple to printing
multiple values, but we don't care, because this is extremely unlikely to affect the build.
More than likely this is just an error message for the user of the package.

- [x] treat `print(a, b, c)` and `print((a, b, c))` the same in py2 and py3
- [x] add another package parsing test -- legion -- that exercises this feature
…ream

These are refactors that have happened in upstream `ast.unparse()`
These refactors have happened in upstream `ast.unparse()`
These are the unit tests from astunparse, converted to pytest, with a few backports from
upstream cpython. These should hopefully keep `unparser.py` well covered as we change it.
Many packages implement logic at the class level to handle complex dependencies and
conflicts. Others have started using `with when("@1.0"):` blocks since we added that
capability. The loops and other control logic can cause some pure directive logic not to
be removed by our package hashing logic -- and in many cases that's a lot of code that
will cause unnecessary rebuilds.

This commit changes the unparser so that it will descend into these blocks. Specifically:

  1. Descend into loops, if statements, and with blocks at the class level.
  2. Don't look inside function definitions (in or outside a class).
  3. Don't look at nested class definitions (they don't have directives)
  4. Add logic to *remove* empty loops/with blocks/if statements if all directives
     in them were removed.

This allows our package hash to ignore a lot of pure metadata that it was not ignoring
before, and makes it less sensitive.

In addition, we add `maintainers` and `tags` to the list of metadata attributes that
Spack should remove from packages when constructing canonoical source for a package
hash.

- [x] Make unparser handle if/for/while/with at class level.
- [x] Add tests for control logic removal.
- [x] Add a test to ensure that all packages are not only unparseable, but also
      that their canonical source is still compilable. This is a test for
      our control logic removal.
- [x] Add another unparse test package that has complex logic.
@becker33 becker33 merged commit 54d741b into develop Jan 12, 2022
@becker33 becker33 deleted the consistent-package-hash branch January 12, 2022 14:14
tgamblin added a commit that referenced this pull request Aug 22, 2022
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.
tgamblin added a commit that referenced this pull request Oct 2, 2022
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.
tgamblin added a commit that referenced this pull request Dec 4, 2022
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.
tgamblin added a commit that referenced this pull request Dec 4, 2022
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.
tgamblin added a commit that referenced this pull request Mar 10, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.
tgamblin added a commit that referenced this pull request Jul 18, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.
tgamblin added a commit that referenced this pull request Jul 18, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
tgamblin added a commit that referenced this pull request Jul 18, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
tgamblin added a commit that referenced this pull request Aug 12, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
tgamblin added a commit that referenced this pull request Aug 23, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
tgamblin added a commit that referenced this pull request Aug 23, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
tgamblin added a commit that referenced this pull request Sep 4, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
tgamblin added a commit that referenced this pull request Oct 22, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
tgamblin added a commit that referenced this pull request Oct 28, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
@haampie haampie added the hash-change things that will change a lot of hashes label Nov 7, 2024
vjranagit pushed a commit to vjranagit/spack that referenced this pull request Jan 18, 2026
We've included a package hash in Spack since spack#7193 for CI, and we started using it on
the spec in spack#28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in spack#28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants