Skip to content

Buildcache tarballs with rootfs structure#39341

Merged
alalazo merged 2 commits intospack:developfrom
haampie:feature/full-path-in-tarballs
Oct 3, 2023
Merged

Buildcache tarballs with rootfs structure#39341
alalazo merged 2 commits intospack:developfrom
haampie:feature/full-path-in-tarballs

Conversation

@haampie
Copy link
Copy Markdown
Member

@haampie haampie commented Aug 8, 2023

Two changes in this PR:

  1. Register absolute paths in tarballs*, which makes it easier
    to use them as container image layers, or rootfs in general, outside
    of Spack where no relocation can be done. Spack supports installing
    tarballs of this type already on develop. Tar file size would be
    a bit larger, but those repeated prefixes compress well. In practice
    I'm getting 0.4% size difference, so negligible.
  2. Assemble the tarfile entries "by hand", which has a few advantages:
    1. Avoid reading /etc/passwd, /etc/groups, /etc/nsswitch.conf
      which tar.add(dir) does for each file it adds
    2. Reduce the number of stat calls per file added by a factor two,
      compared to tar.add, which should help with slow, shared filesystems
      where these calls are expensive
    3. Create normalized TarInfo entries from the start, instead of letting
      Python create them and patching them after the fact
    4. Don't recurse into subdirs before processing files, to avoid
      keeping nested directories opened. This changes the tar entry
      order slightly, it's like sorting by (is_dir, name) instead of name

Part 1 is necessary for #38358

This change is forward incompatible, so old Spack 0.20 won't be able to
use tarballs generated with 0.21. (Unless #37441 is backported to 0.20)

* the leading / is stripped, following GNU tar defaults, so technically the
paths are relative to the root / and not absolute.

Creating a tarball from a [email protected] install

Measured using

PYTHONPATH=$spack/lib/spack/external:$spack/lib/spack/external/_vendoring:$spack/lib/spack strace -fc python3 -c 'from spack.binary_distribution import _do_create_tarball; _do_create_tarball("a.tar.gz", "/path/to/python-3.11.4-ygq322p7xh65pm5suad2fzx6q7eyf7xs", "/path/to/python-3.11.4-ygq322p7xh65pm5suad2fzx6q7eyf7xs", {})'

on an NVMe disk, so probably multiply the seconds column by a factor 100-1000 to translate to HPC shared filesystem performance ;)

Before this PR

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 24.60    0.033036           0     52882       218 newfstatat
 18.32    0.024602           0     25669       270 openat
 17.97    0.024129           0     40214           read
 14.18    0.019037           0     25655         4 close
  8.64    0.011603           0     25198         9 lseek
  5.23    0.007017           2      2792           write
  1.04    0.001397           1       734           getdents64

After this

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 25.39    0.013535           0     23634           read
 23.39    0.012466           0     19405       219 newfstatat
 13.14    0.007003           0      9090       270 openat
 11.37    0.006060           2      2792           write
  8.02    0.004275           0      8908         4 close
  6.58    0.003507           0      8620         9 lseek
  2.23    0.001186           1       734           getdents64

This includes (common) startup of Python itself... even then 2.7x fewer stat calls, and read dominating the syscalls as it should. write is of course small because of compression.

@spackbot-app spackbot-app bot added binary-packages core PR affects Spack core functionality tests General test capability(ies) labels Aug 8, 2023
@haampie haampie mentioned this pull request Aug 8, 2023
12 tasks
@haampie haampie force-pushed the feature/full-path-in-tarballs branch from f1fadf0 to 0e9d096 Compare August 8, 2023 20:50
Two changes in this PR:

1. Register absolute paths in tarballs, which makes it easier
   to use them as container image layers, or rootfs in general, outside
   of Spack. Spack supports this already on develop.
2. Assemble the tarfile entries "by hand", which has a few advantages:
   1. Avoid reading `/etc/passwd`, `/etc/groups`, `/etc/nsswitch.conf`
      which `tar.add(dir)` does _for each file it adds_
   2. Reduce the number of stat calls per file added by a factor two,
      compared to `tar.add`, which should help with slow, shared filesystems
      where these calls are expensive
   4. Create normalized `TarInfo` entries from the start, instead of letting
      Python create them and patching them after the fact
   5. Don't recurse into subdirs before processing files, to avoid
      keeping nested directories opened. (this changes the tar entry
      order slightly, it's like sorting by `(not is_dir, name)`.

This change is forward incompatible, so old Spack 0.20 won't be able to
use tarballs generated with 0.21. (Unless spack#37441 is backported to 0.20)
@haampie haampie force-pushed the feature/full-path-in-tarballs branch from 0e9d096 to 37be51b Compare August 8, 2023 20:52
@haampie
Copy link
Copy Markdown
Member Author

haampie commented Sep 6, 2023

@kwryankrattiger maybe you wanna review? Seems like nobody is interested :(

@alalazo alalazo self-assigned this Oct 2, 2023
Copy link
Copy Markdown
Member

@alalazo alalazo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the logic and LGTM. I have a couple of minor comments. I'll be testing this on some parallel filesystem next.

Args:
tar: tarfile object to add files to
prefix: absolute install prefix of spec"""
assert os.path.isabs(prefix) and os.path.isdir(prefix)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have a message here, otherwise this will print:

==> Error:

if triggered.

@alalazo
Copy link
Copy Markdown
Member

alalazo commented Oct 2, 2023

In terms of wall time, on an HPC cluster, develop shows:

$ time spack buildcache create ~/tmp/develop libpng
==> Selected 17 specs to push to file:///g/g92/culpo1/tmp/develop
gpg: checking the trustdb
gpg: marginals needed: 3  completes needed: 1  trust model: pgp
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
==> [ 1/17] Pushed [email protected]/rrjrgaw
==> [ 2/17] Pushed [email protected]/mjqcazt
==> [ 3/17] Pushed [email protected]/hsuxk52
==> [ 4/17] Pushed [email protected]/upwiaol
==> [ 5/17] Pushed [email protected]/itbpeol
==> [ 6/17] Pushed [email protected]/d542ymy
==> [ 7/17] Pushed ca-certificates-mozilla@2023-05-30/kuxdrfi
==> [ 8/17] Pushed [email protected]/kvputle
==> [ 9/17] Pushed [email protected]/oc4h6k6
==> [10/17] Pushed [email protected]/64ouqrb
==> [11/17] Pushed [email protected]/zh67wnb
==> [12/17] Pushed [email protected]/b5olrx4
==> [13/17] Pushed [email protected]/xcqqjcq
==> [14/17] Pushed [email protected]/jns5mse
==> [15/17] Pushed [email protected]/skzoqbs
==> [16/17] Pushed [email protected]/qyygqi4
==> [17/17] Pushed [email protected]/kduz2h6

real	1m35,619s
user	1m3,254s
sys	0m15,350s

while this PR shows:

$ time spack buildcache create ~/tmp/pr libpng
==> Selected 17 specs to push to file:///g/g92/culpo1/tmp/pr
==> [ 1/17] Pushed [email protected]/rrjrgaw
==> [ 2/17] Pushed [email protected]/mjqcazt
==> [ 3/17] Pushed [email protected]/hsuxk52
==> [ 4/17] Pushed [email protected]/upwiaol
==> [ 5/17] Pushed [email protected]/itbpeol
==> [ 6/17] Pushed [email protected]/d542ymy
==> [ 7/17] Pushed ca-certificates-mozilla@2023-05-30/kuxdrfi
==> [ 8/17] Pushed [email protected]/kvputle
==> [ 9/17] Pushed [email protected]/oc4h6k6
==> [10/17] Pushed [email protected]/64ouqrb
==> [11/17] Pushed [email protected]/zh67wnb
==> [12/17] Pushed [email protected]/b5olrx4
==> [13/17] Pushed [email protected]/xcqqjcq
==> [14/17] Pushed [email protected]/jns5mse
==> [15/17] Pushed [email protected]/skzoqbs
==> [16/17] Pushed [email protected]/qyygqi4
==> [17/17] Pushed [email protected]/kduz2h6

real	1m21,050s
user	0m53,942s
sys	0m13,770s

so a 17% speedup using this PR.

@alalazo alalazo enabled auto-merge (squash) October 3, 2023 07:10
@alalazo alalazo disabled auto-merge October 3, 2023 07:10
@alalazo alalazo enabled auto-merge (squash) October 3, 2023 07:11
@alalazo alalazo disabled auto-merge October 3, 2023 07:55
@alalazo alalazo merged commit 06057d6 into spack:develop Oct 3, 2023
@alalazo
Copy link
Copy Markdown
Member

alalazo commented Oct 3, 2023

Force merged, since cray-sles is down

AdhocMan pushed a commit to AdhocMan/spack that referenced this pull request Oct 10, 2023
Two changes in this PR:

1. Register absolute paths in tarballs, which makes it easier
   to use them as container image layers, or rootfs in general, outside
   of Spack. Spack supports this already on develop.
2. Assemble the tarfile entries "by hand", which has a few advantages:
   1. Avoid reading `/etc/passwd`, `/etc/groups`, `/etc/nsswitch.conf`
      which `tar.add(dir)` does _for each file it adds_
   2. Reduce the number of stat calls per file added by a factor two,
      compared to `tar.add`, which should help with slow, shared filesystems
      where these calls are expensive
   4. Create normalized `TarInfo` entries from the start, instead of letting
      Python create them and patching them after the fact
   5. Don't recurse into subdirs before processing files, to avoid
      keeping nested directories opened. (this changes the tar entry
      order slightly, it's like sorting by `(not is_dir, name)`.
tgamblin pushed a commit that referenced this pull request Oct 27, 2023
Credits to @ChristianKniep for advocating the idea of OCI image layers
being identical to spack buildcache tarballs.

With this you can configure an OCI registry as a buildcache:

```console 
$ spack mirror add my_registry oci://user/image # Dockerhub

$ spack mirror add my_registry oci://ghcr.io/haampie/spack-test # GHCR

$ spack mirror set --push --oci-username ... --oci-password ... my_registry  # set login credentials
```

which should result in this config:

```yaml
mirrors:
  my_registry:
    url: oci://ghcr.io/haampie/spack-test
    push:
      access_pair: [<username>, <password>]
```

It can be used like any other registry

```
spack buildcache push my_registry [specs...]
```

It will upload the Spack tarballs in parallel, as well as manifest + config
files s.t. the binaries are compatible with `docker pull` or `skopeo copy`.

In fact, a base image can be added to get a _runnable_ image:

```console
$ spack buildcache push --base-image ubuntu:23.04 my_registry python
Pushed ... as [image]:python-3.11.2-65txfcpqbmpawclvtasuog4yzmxwaoia.spack

$ docker run --rm -it [image]:python-3.11.2-65txfcpqbmpawclvtasuog4yzmxwaoia.spack
```

which should really be a game changer for sharing binaries.

Further, all content-addressable blobs that are downloaded and verified
will be cached in Spack's download cache. This should make repeated
`push` commands faster, as well as `push` followed by a separate
`update-index` command.

An end to end example of how to use this in Github Actions is here:

**https://github.com/haampie/spack-oci-buildcache-example**


TODO:

- [x] Generate environment modifications in config so PATH is set up
- [x] Enrich config with Spack's `spec` json (this is allowed in the OCI specification)
- [x] When ^ is done, add logic to create an index in say `<image>:index` by fetching all config files (using OCI distribution discovery API)
- [x] Add logic to use object storage in an OCI registry in `spack install`.
- [x] Make the user pick the base image for generated OCI images.
- [x] Update buildcache install logic to deal with absolute paths in tarballs
- [x] Merge with `spack buildcache` command
- [x] Merge #37441 (included here)
- [x] Merge #39077 (included here)
- [x] #39187 + #39285
- [x] #39341
- [x] Not a blocker: #35737 fixes correctness run env for the generated container images

NOTE:

1. `oci://` is unfortunately taken, so it's being abused in this PR to mean "oci type mirror". `skopeo` uses `docker://` which I'd like to avoid, given that classical docker v1 registries are not supported.
2. this is currently `https`-only, given that basic auth is used to login. I _could_ be convinced to allow http, but I'd prefer not to, given that for a `spack buildcache push` command multiple domains can be involved (auth server, source of base image, destination registry). Right now, no urllib http handler is added, so redirects to https and auth servers with http urls will simply result in a hard failure.

CAVEATS:

1. Signing is not implemented in this PR. `gpg --clearsign` is not the nicest solution, since (a) the spec.json is merged into the image config, which must be valid json, and (b) it would be better to sign the manifest (referencing both config/spec file and tarball) using more conventional image signing tools
2. `spack.binary_distribution.push` is not yet implemented for the OCI buildcache, only `spack buildcache push` is. This is because I'd like to always push images + deps to the registry, so that it's `docker pull`-able, whereas in `spack ci` we really wanna push an individual package without its deps to say `pr-xyz`, while its deps reside in some `develop` buildcache.
3. The `push -j ...` flag only works for OCI buildcache, not for others
victoria-cherkas pushed a commit to victoria-cherkas/spack that referenced this pull request Oct 30, 2023
Credits to @ChristianKniep for advocating the idea of OCI image layers
being identical to spack buildcache tarballs.

With this you can configure an OCI registry as a buildcache:

```console 
$ spack mirror add my_registry oci://user/image # Dockerhub

$ spack mirror add my_registry oci://ghcr.io/haampie/spack-test # GHCR

$ spack mirror set --push --oci-username ... --oci-password ... my_registry  # set login credentials
```

which should result in this config:

```yaml
mirrors:
  my_registry:
    url: oci://ghcr.io/haampie/spack-test
    push:
      access_pair: [<username>, <password>]
```

It can be used like any other registry

```
spack buildcache push my_registry [specs...]
```

It will upload the Spack tarballs in parallel, as well as manifest + config
files s.t. the binaries are compatible with `docker pull` or `skopeo copy`.

In fact, a base image can be added to get a _runnable_ image:

```console
$ spack buildcache push --base-image ubuntu:23.04 my_registry python
Pushed ... as [image]:python-3.11.2-65txfcpqbmpawclvtasuog4yzmxwaoia.spack

$ docker run --rm -it [image]:python-3.11.2-65txfcpqbmpawclvtasuog4yzmxwaoia.spack
```

which should really be a game changer for sharing binaries.

Further, all content-addressable blobs that are downloaded and verified
will be cached in Spack's download cache. This should make repeated
`push` commands faster, as well as `push` followed by a separate
`update-index` command.

An end to end example of how to use this in Github Actions is here:

**https://github.com/haampie/spack-oci-buildcache-example**


TODO:

- [x] Generate environment modifications in config so PATH is set up
- [x] Enrich config with Spack's `spec` json (this is allowed in the OCI specification)
- [x] When ^ is done, add logic to create an index in say `<image>:index` by fetching all config files (using OCI distribution discovery API)
- [x] Add logic to use object storage in an OCI registry in `spack install`.
- [x] Make the user pick the base image for generated OCI images.
- [x] Update buildcache install logic to deal with absolute paths in tarballs
- [x] Merge with `spack buildcache` command
- [x] Merge spack#37441 (included here)
- [x] Merge spack#39077 (included here)
- [x] spack#39187 + spack#39285
- [x] spack#39341
- [x] Not a blocker: spack#35737 fixes correctness run env for the generated container images

NOTE:

1. `oci://` is unfortunately taken, so it's being abused in this PR to mean "oci type mirror". `skopeo` uses `docker://` which I'd like to avoid, given that classical docker v1 registries are not supported.
2. this is currently `https`-only, given that basic auth is used to login. I _could_ be convinced to allow http, but I'd prefer not to, given that for a `spack buildcache push` command multiple domains can be involved (auth server, source of base image, destination registry). Right now, no urllib http handler is added, so redirects to https and auth servers with http urls will simply result in a hard failure.

CAVEATS:

1. Signing is not implemented in this PR. `gpg --clearsign` is not the nicest solution, since (a) the spec.json is merged into the image config, which must be valid json, and (b) it would be better to sign the manifest (referencing both config/spec file and tarball) using more conventional image signing tools
2. `spack.binary_distribution.push` is not yet implemented for the OCI buildcache, only `spack buildcache push` is. This is because I'd like to always push images + deps to the registry, so that it's `docker pull`-able, whereas in `spack ci` we really wanna push an individual package without its deps to say `pr-xyz`, while its deps reside in some `develop` buildcache.
3. The `push -j ...` flag only works for OCI buildcache, not for others
RikkiButler20 pushed a commit to RikkiButler20/spack that referenced this pull request Nov 2, 2023
Credits to @ChristianKniep for advocating the idea of OCI image layers
being identical to spack buildcache tarballs.

With this you can configure an OCI registry as a buildcache:

```console 
$ spack mirror add my_registry oci://user/image # Dockerhub

$ spack mirror add my_registry oci://ghcr.io/haampie/spack-test # GHCR

$ spack mirror set --push --oci-username ... --oci-password ... my_registry  # set login credentials
```

which should result in this config:

```yaml
mirrors:
  my_registry:
    url: oci://ghcr.io/haampie/spack-test
    push:
      access_pair: [<username>, <password>]
```

It can be used like any other registry

```
spack buildcache push my_registry [specs...]
```

It will upload the Spack tarballs in parallel, as well as manifest + config
files s.t. the binaries are compatible with `docker pull` or `skopeo copy`.

In fact, a base image can be added to get a _runnable_ image:

```console
$ spack buildcache push --base-image ubuntu:23.04 my_registry python
Pushed ... as [image]:python-3.11.2-65txfcpqbmpawclvtasuog4yzmxwaoia.spack

$ docker run --rm -it [image]:python-3.11.2-65txfcpqbmpawclvtasuog4yzmxwaoia.spack
```

which should really be a game changer for sharing binaries.

Further, all content-addressable blobs that are downloaded and verified
will be cached in Spack's download cache. This should make repeated
`push` commands faster, as well as `push` followed by a separate
`update-index` command.

An end to end example of how to use this in Github Actions is here:

**https://github.com/haampie/spack-oci-buildcache-example**


TODO:

- [x] Generate environment modifications in config so PATH is set up
- [x] Enrich config with Spack's `spec` json (this is allowed in the OCI specification)
- [x] When ^ is done, add logic to create an index in say `<image>:index` by fetching all config files (using OCI distribution discovery API)
- [x] Add logic to use object storage in an OCI registry in `spack install`.
- [x] Make the user pick the base image for generated OCI images.
- [x] Update buildcache install logic to deal with absolute paths in tarballs
- [x] Merge with `spack buildcache` command
- [x] Merge spack#37441 (included here)
- [x] Merge spack#39077 (included here)
- [x] spack#39187 + spack#39285
- [x] spack#39341
- [x] Not a blocker: spack#35737 fixes correctness run env for the generated container images

NOTE:

1. `oci://` is unfortunately taken, so it's being abused in this PR to mean "oci type mirror". `skopeo` uses `docker://` which I'd like to avoid, given that classical docker v1 registries are not supported.
2. this is currently `https`-only, given that basic auth is used to login. I _could_ be convinced to allow http, but I'd prefer not to, given that for a `spack buildcache push` command multiple domains can be involved (auth server, source of base image, destination registry). Right now, no urllib http handler is added, so redirects to https and auth servers with http urls will simply result in a hard failure.

CAVEATS:

1. Signing is not implemented in this PR. `gpg --clearsign` is not the nicest solution, since (a) the spec.json is merged into the image config, which must be valid json, and (b) it would be better to sign the manifest (referencing both config/spec file and tarball) using more conventional image signing tools
2. `spack.binary_distribution.push` is not yet implemented for the OCI buildcache, only `spack buildcache push` is. This is because I'd like to always push images + deps to the registry, so that it's `docker pull`-able, whereas in `spack ci` we really wanna push an individual package without its deps to say `pr-xyz`, while its deps reside in some `develop` buildcache.
3. The `push -j ...` flag only works for OCI buildcache, not for others
gabrielctn pushed a commit to gabrielctn/spack that referenced this pull request Nov 24, 2023
Credits to @ChristianKniep for advocating the idea of OCI image layers
being identical to spack buildcache tarballs.

With this you can configure an OCI registry as a buildcache:

```console 
$ spack mirror add my_registry oci://user/image # Dockerhub

$ spack mirror add my_registry oci://ghcr.io/haampie/spack-test # GHCR

$ spack mirror set --push --oci-username ... --oci-password ... my_registry  # set login credentials
```

which should result in this config:

```yaml
mirrors:
  my_registry:
    url: oci://ghcr.io/haampie/spack-test
    push:
      access_pair: [<username>, <password>]
```

It can be used like any other registry

```
spack buildcache push my_registry [specs...]
```

It will upload the Spack tarballs in parallel, as well as manifest + config
files s.t. the binaries are compatible with `docker pull` or `skopeo copy`.

In fact, a base image can be added to get a _runnable_ image:

```console
$ spack buildcache push --base-image ubuntu:23.04 my_registry python
Pushed ... as [image]:python-3.11.2-65txfcpqbmpawclvtasuog4yzmxwaoia.spack

$ docker run --rm -it [image]:python-3.11.2-65txfcpqbmpawclvtasuog4yzmxwaoia.spack
```

which should really be a game changer for sharing binaries.

Further, all content-addressable blobs that are downloaded and verified
will be cached in Spack's download cache. This should make repeated
`push` commands faster, as well as `push` followed by a separate
`update-index` command.

An end to end example of how to use this in Github Actions is here:

**https://github.com/haampie/spack-oci-buildcache-example**


TODO:

- [x] Generate environment modifications in config so PATH is set up
- [x] Enrich config with Spack's `spec` json (this is allowed in the OCI specification)
- [x] When ^ is done, add logic to create an index in say `<image>:index` by fetching all config files (using OCI distribution discovery API)
- [x] Add logic to use object storage in an OCI registry in `spack install`.
- [x] Make the user pick the base image for generated OCI images.
- [x] Update buildcache install logic to deal with absolute paths in tarballs
- [x] Merge with `spack buildcache` command
- [x] Merge spack#37441 (included here)
- [x] Merge spack#39077 (included here)
- [x] spack#39187 + spack#39285
- [x] spack#39341
- [x] Not a blocker: spack#35737 fixes correctness run env for the generated container images

NOTE:

1. `oci://` is unfortunately taken, so it's being abused in this PR to mean "oci type mirror". `skopeo` uses `docker://` which I'd like to avoid, given that classical docker v1 registries are not supported.
2. this is currently `https`-only, given that basic auth is used to login. I _could_ be convinced to allow http, but I'd prefer not to, given that for a `spack buildcache push` command multiple domains can be involved (auth server, source of base image, destination registry). Right now, no urllib http handler is added, so redirects to https and auth servers with http urls will simply result in a hard failure.

CAVEATS:

1. Signing is not implemented in this PR. `gpg --clearsign` is not the nicest solution, since (a) the spec.json is merged into the image config, which must be valid json, and (b) it would be better to sign the manifest (referencing both config/spec file and tarball) using more conventional image signing tools
2. `spack.binary_distribution.push` is not yet implemented for the OCI buildcache, only `spack buildcache push` is. This is because I'd like to always push images + deps to the registry, so that it's `docker pull`-able, whereas in `spack ci` we really wanna push an individual package without its deps to say `pr-xyz`, while its deps reside in some `develop` buildcache.
3. The `push -j ...` flag only works for OCI buildcache, not for others
mtaillefumier pushed a commit to mtaillefumier/spack that referenced this pull request Dec 14, 2023
Credits to @ChristianKniep for advocating the idea of OCI image layers
being identical to spack buildcache tarballs.

With this you can configure an OCI registry as a buildcache:

```console 
$ spack mirror add my_registry oci://user/image # Dockerhub

$ spack mirror add my_registry oci://ghcr.io/haampie/spack-test # GHCR

$ spack mirror set --push --oci-username ... --oci-password ... my_registry  # set login credentials
```

which should result in this config:

```yaml
mirrors:
  my_registry:
    url: oci://ghcr.io/haampie/spack-test
    push:
      access_pair: [<username>, <password>]
```

It can be used like any other registry

```
spack buildcache push my_registry [specs...]
```

It will upload the Spack tarballs in parallel, as well as manifest + config
files s.t. the binaries are compatible with `docker pull` or `skopeo copy`.

In fact, a base image can be added to get a _runnable_ image:

```console
$ spack buildcache push --base-image ubuntu:23.04 my_registry python
Pushed ... as [image]:python-3.11.2-65txfcpqbmpawclvtasuog4yzmxwaoia.spack

$ docker run --rm -it [image]:python-3.11.2-65txfcpqbmpawclvtasuog4yzmxwaoia.spack
```

which should really be a game changer for sharing binaries.

Further, all content-addressable blobs that are downloaded and verified
will be cached in Spack's download cache. This should make repeated
`push` commands faster, as well as `push` followed by a separate
`update-index` command.

An end to end example of how to use this in Github Actions is here:

**https://github.com/haampie/spack-oci-buildcache-example**


TODO:

- [x] Generate environment modifications in config so PATH is set up
- [x] Enrich config with Spack's `spec` json (this is allowed in the OCI specification)
- [x] When ^ is done, add logic to create an index in say `<image>:index` by fetching all config files (using OCI distribution discovery API)
- [x] Add logic to use object storage in an OCI registry in `spack install`.
- [x] Make the user pick the base image for generated OCI images.
- [x] Update buildcache install logic to deal with absolute paths in tarballs
- [x] Merge with `spack buildcache` command
- [x] Merge spack#37441 (included here)
- [x] Merge spack#39077 (included here)
- [x] spack#39187 + spack#39285
- [x] spack#39341
- [x] Not a blocker: spack#35737 fixes correctness run env for the generated container images

NOTE:

1. `oci://` is unfortunately taken, so it's being abused in this PR to mean "oci type mirror". `skopeo` uses `docker://` which I'd like to avoid, given that classical docker v1 registries are not supported.
2. this is currently `https`-only, given that basic auth is used to login. I _could_ be convinced to allow http, but I'd prefer not to, given that for a `spack buildcache push` command multiple domains can be involved (auth server, source of base image, destination registry). Right now, no urllib http handler is added, so redirects to https and auth servers with http urls will simply result in a hard failure.

CAVEATS:

1. Signing is not implemented in this PR. `gpg --clearsign` is not the nicest solution, since (a) the spec.json is merged into the image config, which must be valid json, and (b) it would be better to sign the manifest (referencing both config/spec file and tarball) using more conventional image signing tools
2. `spack.binary_distribution.push` is not yet implemented for the OCI buildcache, only `spack buildcache push` is. This is because I'd like to always push images + deps to the registry, so that it's `docker pull`-able, whereas in `spack ci` we really wanna push an individual package without its deps to say `pr-xyz`, while its deps reside in some `develop` buildcache.
3. The `push -j ...` flag only works for OCI buildcache, not for others
RikkiButler20 pushed a commit to RikkiButler20/spack that referenced this pull request Jan 11, 2024
Credits to @ChristianKniep for advocating the idea of OCI image layers
being identical to spack buildcache tarballs.

With this you can configure an OCI registry as a buildcache:

```console
$ spack mirror add my_registry oci://user/image # Dockerhub

$ spack mirror add my_registry oci://ghcr.io/haampie/spack-test # GHCR

$ spack mirror set --push --oci-username ... --oci-password ... my_registry  # set login credentials
```

which should result in this config:

```yaml
mirrors:
  my_registry:
    url: oci://ghcr.io/haampie/spack-test
    push:
      access_pair: [<username>, <password>]
```

It can be used like any other registry

```
spack buildcache push my_registry [specs...]
```

It will upload the Spack tarballs in parallel, as well as manifest + config
files s.t. the binaries are compatible with `docker pull` or `skopeo copy`.

In fact, a base image can be added to get a _runnable_ image:

```console
$ spack buildcache push --base-image ubuntu:23.04 my_registry python
Pushed ... as [image]:python-3.11.2-65txfcpqbmpawclvtasuog4yzmxwaoia.spack

$ docker run --rm -it [image]:python-3.11.2-65txfcpqbmpawclvtasuog4yzmxwaoia.spack
```

which should really be a game changer for sharing binaries.

Further, all content-addressable blobs that are downloaded and verified
will be cached in Spack's download cache. This should make repeated
`push` commands faster, as well as `push` followed by a separate
`update-index` command.

An end to end example of how to use this in Github Actions is here:

**https://github.com/haampie/spack-oci-buildcache-example**

TODO:

- [x] Generate environment modifications in config so PATH is set up
- [x] Enrich config with Spack's `spec` json (this is allowed in the OCI specification)
- [x] When ^ is done, add logic to create an index in say `<image>:index` by fetching all config files (using OCI distribution discovery API)
- [x] Add logic to use object storage in an OCI registry in `spack install`.
- [x] Make the user pick the base image for generated OCI images.
- [x] Update buildcache install logic to deal with absolute paths in tarballs
- [x] Merge with `spack buildcache` command
- [x] Merge spack#37441 (included here)
- [x] Merge spack#39077 (included here)
- [x] spack#39187 + spack#39285
- [x] spack#39341
- [x] Not a blocker: spack#35737 fixes correctness run env for the generated container images

NOTE:

1. `oci://` is unfortunately taken, so it's being abused in this PR to mean "oci type mirror". `skopeo` uses `docker://` which I'd like to avoid, given that classical docker v1 registries are not supported.
2. this is currently `https`-only, given that basic auth is used to login. I _could_ be convinced to allow http, but I'd prefer not to, given that for a `spack buildcache push` command multiple domains can be involved (auth server, source of base image, destination registry). Right now, no urllib http handler is added, so redirects to https and auth servers with http urls will simply result in a hard failure.

CAVEATS:

1. Signing is not implemented in this PR. `gpg --clearsign` is not the nicest solution, since (a) the spec.json is merged into the image config, which must be valid json, and (b) it would be better to sign the manifest (referencing both config/spec file and tarball) using more conventional image signing tools
2. `spack.binary_distribution.push` is not yet implemented for the OCI buildcache, only `spack buildcache push` is. This is because I'd like to always push images + deps to the registry, so that it's `docker pull`-able, whereas in `spack ci` we really wanna push an individual package without its deps to say `pr-xyz`, while its deps reside in some `develop` buildcache.
3. The `push -j ...` flag only works for OCI buildcache, not for others
RikkiButler20 pushed a commit to RikkiButler20/spack that referenced this pull request Jan 31, 2024
Credits to @ChristianKniep for advocating the idea of OCI image layers
being identical to spack buildcache tarballs.

With this you can configure an OCI registry as a buildcache:

```console
$ spack mirror add my_registry oci://user/image # Dockerhub

$ spack mirror add my_registry oci://ghcr.io/haampie/spack-test # GHCR

$ spack mirror set --push --oci-username ... --oci-password ... my_registry  # set login credentials
```

which should result in this config:

```yaml
mirrors:
  my_registry:
    url: oci://ghcr.io/haampie/spack-test
    push:
      access_pair: [<username>, <password>]
```

It can be used like any other registry

```
spack buildcache push my_registry [specs...]
```

It will upload the Spack tarballs in parallel, as well as manifest + config
files s.t. the binaries are compatible with `docker pull` or `skopeo copy`.

In fact, a base image can be added to get a _runnable_ image:

```console
$ spack buildcache push --base-image ubuntu:23.04 my_registry python
Pushed ... as [image]:python-3.11.2-65txfcpqbmpawclvtasuog4yzmxwaoia.spack

$ docker run --rm -it [image]:python-3.11.2-65txfcpqbmpawclvtasuog4yzmxwaoia.spack
```

which should really be a game changer for sharing binaries.

Further, all content-addressable blobs that are downloaded and verified
will be cached in Spack's download cache. This should make repeated
`push` commands faster, as well as `push` followed by a separate
`update-index` command.

An end to end example of how to use this in Github Actions is here:

**https://github.com/haampie/spack-oci-buildcache-example**

TODO:

- [x] Generate environment modifications in config so PATH is set up
- [x] Enrich config with Spack's `spec` json (this is allowed in the OCI specification)
- [x] When ^ is done, add logic to create an index in say `<image>:index` by fetching all config files (using OCI distribution discovery API)
- [x] Add logic to use object storage in an OCI registry in `spack install`.
- [x] Make the user pick the base image for generated OCI images.
- [x] Update buildcache install logic to deal with absolute paths in tarballs
- [x] Merge with `spack buildcache` command
- [x] Merge spack#37441 (included here)
- [x] Merge spack#39077 (included here)
- [x] spack#39187 + spack#39285
- [x] spack#39341
- [x] Not a blocker: spack#35737 fixes correctness run env for the generated container images

NOTE:

1. `oci://` is unfortunately taken, so it's being abused in this PR to mean "oci type mirror". `skopeo` uses `docker://` which I'd like to avoid, given that classical docker v1 registries are not supported.
2. this is currently `https`-only, given that basic auth is used to login. I _could_ be convinced to allow http, but I'd prefer not to, given that for a `spack buildcache push` command multiple domains can be involved (auth server, source of base image, destination registry). Right now, no urllib http handler is added, so redirects to https and auth servers with http urls will simply result in a hard failure.

CAVEATS:

1. Signing is not implemented in this PR. `gpg --clearsign` is not the nicest solution, since (a) the spec.json is merged into the image config, which must be valid json, and (b) it would be better to sign the manifest (referencing both config/spec file and tarball) using more conventional image signing tools
2. `spack.binary_distribution.push` is not yet implemented for the OCI buildcache, only `spack buildcache push` is. This is because I'd like to always push images + deps to the registry, so that it's `docker pull`-able, whereas in `spack ci` we really wanna push an individual package without its deps to say `pr-xyz`, while its deps reside in some `develop` buildcache.
3. The `push -j ...` flag only works for OCI buildcache, not for others
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

binary-packages core PR affects Spack core functionality tests General test capability(ies)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants