Skip to content

Features/package hash#7193

Merged
tgamblin merged 6 commits intospack:developfrom
mathstuf:features/package-hash
Mar 20, 2018
Merged

Features/package hash#7193
tgamblin merged 6 commits intospack:developfrom
mathstuf:features/package-hash

Conversation

@mathstuf
Copy link
Copy Markdown
Contributor

@mathstuf mathstuf commented Feb 6, 2018


This is the features/package-hash branch rebased onto develop and split into smaller, more logical commits. Currently there are two test failures on the top of the branch, but the rest of the branch is passing the test suite.

Cc: @tgamblin @scheibelp

See #7119

@mathstuf mathstuf force-pushed the features/package-hash branch from 45407af to 69430d1 Compare February 6, 2018 19:34
@mathstuf mathstuf mentioned this pull request Feb 6, 2018
3 tasks
# typically allow for just retrieving the latest commit ID)
source_id = fs.for_package_version(self, self.version).source_id()
if not source_id:
return
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scheibelp This causes failures because package_hash doesn't expect None in its use of this method, so it seems that something will need to be done here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we should make this just hash empty string if we do not have the content.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, that seems flaky to me. The hash of a package changes if I delete a patch file on disk?

@mathstuf
Copy link
Copy Markdown
Contributor Author

mathstuf commented Feb 6, 2018

@scheibelp There's also a test failure in test_dont_add_patches_to_installed_package that I don't fully understand. I don't really understand what the test is trying to do or why it worked before and doesn't now. As-is, it fails due to failing to do the fetch (understandable), but if I make it a local file patch instead, the assertion fails at the end of that test because the left side (I think it comes from querying the database) has a patch= part in its concretized spec. Where is this coming from that it didn't also happen before (it only fails in the last commit on this topic)?

@scheibelp
Copy link
Copy Markdown
Member

There's also a test failure in test_dont_add_patches_to_installed_package that I don't fully understand

I don't see that test here or in develop...scheibelp:features/package-hash and it's been a while so I'd have to see it to get a better understanding of why there is a failure now. Since then the manifestation of patches as implicit variants (5476) may have resulted in a conflict here.

@mathstuf
Copy link
Copy Markdown
Contributor Author

The test is in lib/spack/spack/test/install.py. It seems that you added it in #5580 :) .

@tgamblin tgamblin self-requested a review February 20, 2018 19:09
Copy link
Copy Markdown
Member

@tgamblin tgamblin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scheibelp @mathstuf:

I added a bunch of comments. This looks mostly good -- thanks for rebasing and refactoring commits. The refactored commits made it much easier to review.

There are two goals with this PR:

  1. Being able to tell two specs apart by their content hashes, so that the build farm can rebuild when a content hash has changed.
  2. Eventually transition to using full hashes to identify specs.

Right now, we only want (1). Eventually, full hashes will replace the DAG hash, but not yet. Doing that now will cause many rebuilds, and the concretizer is not aggressive enough about reusing existing packages to handle it. Once the new concretizer is in (in a couple months) we can revisit the transition.

So, given that, I think we need the following changes:

  1. Remove all the parsing logic added for full hashes, as well as the look-up-by-full-hash function. Eventually there will be only one (full) hash and we won't need to parse two types of hashes.
  2. Remove the last commit that switches the DB to use full hashes (though it might be worthwhile to keep that on a separate branch to reuse later).
  3. store the full hash in spec.yaml and in the DB (and in binary package metadata), but do not use it to index or identify specs.

(3) is needed so that the build farm can tell when a package should be rebuilt due to a package file or source change. Eventually that should just be a check of the spec hash, but for now, it's a special check just for the build farm while we wait for the new concretizer.

Make sense?

"""

def source_id(self):
pass
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably raise NotImplementedError(), not just pass, as it's an error if an implementation doesn't provide a valid implementation of this function (or leave it to raise NotImplementedError if, for a given implementation, the function is never expected to be called).

This should also document the contract. What's it supposed to do? What type does it return (string, tuple, yaml-able thing?, etc)? Put a docstring on this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to need just a __str__ method and that's it.

return base64.b32encode(hashlib.md5(F.read()).digest()).lower()

@property
def file_hash(self):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant now. Since #5476, all implementations of Patch have a sha256 attribute that you can use for this. URLPatches also have an archive_sha256, which allows us to checksum the downloaded archive, but you're guaranteed that the sha256 attribute is a content hash of the patch file. So replace this with sha256.

import re


attributes = ['homepage', 'url', 'list_url', 'extendable', 'parallel',
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should live in package.py, as the list needs to be updated whenever a new attribute is added. If this list lives here it will probably get out of sync.

return node


class TagMultiMethods(ast.NodeVisitor):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs docstring

nodes.append((node, None))


class ResolveMultiMethods(ast.NodeTransformer):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs docstring

# These are possible token types in the spec grammar.
#
HASH, DEP, AT, COLON, COMMA, ON, OFF, PCT, EQ, ID, VAL = range(11)
HASH, FULL_HASH, DEP, AT, COLON, COMMA, ON, OFF, PCT, EQ, ID, VAL = range(12)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reason to add this here, since the lexer doesn't actually have a sigil for parsing full hashes. I think it's totally reasonable to add print logic for the full hash. But I would not add parse logic for it yet. We are not transitioning to using full hashes to identify specs; we're just getting the package hashing in so that we can use it in the build farm.

specs.append(self.spec_by_hash())
elif self.accept(FULL_HASH):
# We're finding a spec by hash
specs.append(self.spec_by_full_hash())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

elif self.accept(FULL_HASH):
# We're finding a dependency by hash for an
# anonymous spec
dep = self.spec_by_full_hash()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove


return matches[0]

def spec_by_full_hash(self):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove; not needed

else:
raise InvalidHashError(spec, hash_spec.dag_hash())

elif self.accept(FULL_HASH):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove from parsing

This will be included in the full hash of packages.
These attributes are ignored when doing a content hash of a package.
@mathstuf
Copy link
Copy Markdown
Contributor Author

I've removed the last two commits. This resolves the test failures and it takes out the ${FULL_HASH} and database bits. Those will be available on another branch I'll push once testing the commits finishes locally.

@mathstuf mathstuf force-pushed the features/package-hash branch 2 times, most recently from dc176c2 to 8622d32 Compare February 21, 2018 17:49
@mathstuf
Copy link
Copy Markdown
Contributor Author

"""Get the first <bits> bits of the DAG hash as an integer type."""
return base32_prefix_bits(self.dag_hash(), bits)

def full_hash(self, length=None):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still needs test coverage. The tests seem to call package_hash, but this doesn't. Why?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests were for package_hash directly. This calls content_hash which ends up calling package_hash later. Will add tests to this commit.

if not source_id:
message = 'Missing a source id for {s.name}@{s.version}'
tty.warn(message.format(s=self))
hashContent.append('')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mathstuf: I'd still leave a TODO here (in addition to the warning) with a note to fix this when we refactor fetchers.

I think you said in an earlier comment that this was brittle, and that if a tarball is removed, a package would then have a different hash. Can you elaborate on that? The hash for an existing installation will never change, because we store the hash that was generated at install time. So this may cause a rebuild if, e.g., the version is later checksummed, but it shouldn't cause any existing install to change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that all sources and patches come with a .sha256, that's not a problem I think. I've put the comment back with the updated reasoning.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@mathstuf mathstuf force-pushed the features/package-hash branch from 8622d32 to 09b5842 Compare February 21, 2018 22:21
@tgamblin
Copy link
Copy Markdown
Member

Failing on Python 3 compatibility

@mathstuf mathstuf force-pushed the features/package-hash branch 2 times, most recently from 8c07814 to a3b84cb Compare February 27, 2018 20:04
@mathstuf
Copy link
Copy Markdown
Contributor Author

I fixed the Python3 incompat bit.

@mathstuf mathstuf changed the title WIP: Features/package hash Features/package hash Feb 28, 2018
This will be included in the full hash of packages.
This helps to ensure that patches are applied consistently and will also
be used as the source for the patch part of full package hashes.
This calculates a hash which depends on the complete content of the
package including sources and the associated `package.py` file.
This hash includes the content of the package itself as well as the DAG
for the package.
@mathstuf mathstuf force-pushed the features/package-hash branch from a3b84cb to 176c3cf Compare February 28, 2018 19:51
@mathstuf
Copy link
Copy Markdown
Contributor Author

And Python3 tests passed locally.

@aashish24
Copy link
Copy Markdown

@scheibelp @tgamblin

@tgamblin tgamblin merged commit 6458b15 into spack:develop Mar 20, 2018
@mathstuf mathstuf deleted the features/package-hash branch March 20, 2018 14:01
@adamjstewart
Copy link
Copy Markdown
Member

Did this PR by any chance change the order in which patches are applied? Some patches need to be applied in a certain order. See #7543.

Copy link
Copy Markdown
Member

@alalazo alalazo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we started sorting patches by urls?

patch_list
for spec, patch_list in self.patches.items()
if self.spec.satisfies(spec))
return sorted(patchesToApply, key=lambda p: p.path_or_url)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamjstewart This seems a likely candidate to inspect for the behavior seen in #7543 (we return the patches sorted by url, if I am not reading this wrongly)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add on these comments, I checked and we never really enforced a global order on patch application. What we were doing was to enforce a local order for the same constraint.

Said otherwise, patches from the following directives:

patch('patch1', when='constraint1')
patch('patch2', when='constraint2')
patch('patch3', when='constraint1')

are stored in packages like:

{
  'constraint1': ['patch1', 'patch3'],
  'constraint2': ['patch2'],
}

using a dictionary. Before this PR we were granted that patch3 would be applied after patch1, but we had no guarantee on the order of application for patch2 with respect to the other two. After this PR we are granted that we apply patches sorted by their path or urls. I think both strategies are wrong and we should apply patches in the order they are declared, if they match the constraint.

I'll submit a minimal PR to recover the old behavior, and work on a more permanent fix later, if people agree with my analysis above. @tgamblin @mathstuf

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @alalazo. @mathstuf is on vacation til fall so we'll have to go ahead without him. Sorry I missed this in review!

# Apply all the patches for specs that match this one
patched = False
for patch in patches:
for patch in self.patches_to_apply():
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamjstewart And here we use the patches sorted differently.

alalazo added a commit to epfl-scitas/spack that referenced this pull request Mar 22, 2018
fixes spack#7543

This is very likely an hot-fix, while a more permanent solution is
needed. See this comment for more insight:

 spack#7193 (comment)

on the problem, that probably asks
alalazo added a commit to epfl-scitas/spack that referenced this pull request Mar 22, 2018
fixes spack#7543

This is very likely an hot-fix, while a more permanent solution is
needed. See this comment for more insight:

 spack#7193 (comment)

on the problem.
tgamblin pushed a commit that referenced this pull request Mar 22, 2018
fixes #7543

This is very likely an hot-fix, while a more permanent solution is
needed. See this comment for more insight:

 #7193 (comment)

on the problem.
ch741 pushed a commit to ch741/spack that referenced this pull request Apr 20, 2018
fixes spack#7543

This is very likely an hot-fix, while a more permanent solution is
needed. See this comment for more insight:

 spack#7193 (comment)

on the problem.
scheibelp added a commit that referenced this pull request Jun 7, 2018
Fixes #7885

#7193 added the patches_to_apply function to collect patches which are then
applied in Package.do_patch. However this only collects patches that are
associated with the Package object and does not include Spec-related patches
(which are applied by dependents, added in #5476).

Spec.patches already collects patches from the package as well as those applied
by dependents, so the Package.patches_to_apply function isn't necessary. All
uses of Package.patches_to_apply are replaced with Package.spec.patches.

This also updates Package.content_hash to require the associated spec to be
concrete: Spec.patches is only set after concretization. Before this PR, it was
possible for Package.content_hash to be valid before concretizing the associated
Spec if all patches were associated with the Package (vs. being applied by
dependents). This behavior was unreliable though so the change is unlikely to
be disruptive.
tgamblin added a commit that referenced this pull request Aug 22, 2022
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.
tgamblin added a commit that referenced this pull request Oct 2, 2022
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.
tgamblin added a commit that referenced this pull request Dec 4, 2022
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.
tgamblin added a commit that referenced this pull request Dec 4, 2022
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.
tgamblin added a commit that referenced this pull request Mar 10, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.
tgamblin added a commit that referenced this pull request Jul 18, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.
tgamblin added a commit that referenced this pull request Jul 18, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
tgamblin added a commit that referenced this pull request Jul 18, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
tgamblin added a commit that referenced this pull request Aug 12, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
tgamblin added a commit that referenced this pull request Aug 23, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
tgamblin added a commit that referenced this pull request Aug 23, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
tgamblin added a commit that referenced this pull request Sep 4, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
tgamblin added a commit that referenced this pull request Oct 22, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
tgamblin added a commit that referenced this pull request Oct 28, 2024
We've included a package hash in Spack since #7193 for CI, and we started using it on
the spec in #28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in #28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
vjranagit pushed a commit to vjranagit/spack that referenced this pull request Jan 18, 2026
We've included a package hash in Spack since spack#7193 for CI, and we started using it on
the spec in spack#28504. However, what goes into the package hash is a bit opaque. Here's
what `spec.json` looks like now:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "package_hash": "pthf7iophdyonixxeed7gyqiksopxeklzzjbxtjrw7nzlkcqleba====",
        "hash": "ke4alug7ypoxp37jb6namwlxssmws4kp"
      }
    ]
  }
}
```

The `package_hash` there is a hash of the concatenation of:

* A canonical hash of the `package.py` recipe, as implemented in spack#28156;
* `sha256`'s of patches applied to the spec; and
* Archive `sha256` sums of archives or commits/revisions of repos used to build the spec.

There are some issues with this: patches are counted twice in this spec (in `patches`
and in the `package_hash`), the hashes of sources used to build are conflated with the
`package.py` hash, and we don't actually include resources anywhere.

With this PR, I've expanded the package hash out in the `spec.json` body. Here is the
"same" spec with the new fields:

```json
{
  "spec": {
    "_meta": {
      "version": 3
    },
    "nodes": [
      {
        "name": "zlib",
        "version": "1.2.12",
        ...
        "package_hash": "6kkliqdv67ucuvfpfdwaacy5bz6s6en4",
        "sources": [
          {
            "type": "archive",
            "sha256": "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9"
          }
        ],
        "patches": [
          "0d38234384870bfd34dfcb738a9083952656f0c766a0f5990b1893076b084b76"
        ],
        "hash": "ts3gkpltbgzr5y6nrfy6rzwbjmkscein"
      }
    ]
  }
}
```

Now:

* Patches and archive hashes are no longer included in the `package_hash`;
* Artifacts used in the build go in `sources`, and we tell you their checksum in the `spec.json`;
* `sources` will include resources for packages that have it;
* Patches are the same as before -- but only represented once; and
* The `package_hash` is a base32-encoded `sha1`, like other hashes in Spack, and it only
  tells you that the `package.py` changed.

The behavior of the DAG hash (which includes the `package_hash`) is basically the same
as before, except now resources are included, and we can see differences in archives and
resources directly in the `spec.json`

Note that we do not need to bump the spec meta version on this, as past versions of
Spack can still read the new specs; they just will not notice the new fields (which is
fine, since we currently do not do anything with them).

Among other things, this will more easily allow us to convert Spack specs to SBOM and
track relevant security information (like `sha256`'s of archives). For example, we could
do continuous scanning of a Spack installation based on these hashes, and if the
`sha256`'s become associated with CVE's, we'll know we're affected.

- [x] Add a method, `spec_attrs()` to `FetchStrategy` that can be used to describe a
      fetcher for a `spec.json`.

- [x] Simplify the way package_hash() is handled in Spack. Previously, it was handled as
      a special-case spec hash in `hash_types.py`, but it really doesn't belong there.
      Now, it's handled as part of `Spec._finalize_concretization()` and `hash_types.py`
      is much simpler.

- [x] Change `PackageBase.content_hash()` to `PackageBase.artifact_hashes()`, and
      include more information about artifacts in it.

- [x] Update package hash tests and make them check for artifact and resource hashes.

Signed-off-by: Todd Gamblin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants