Various binary cache improvements #34371
Description
Over the last few weeks there has been quite a bit of discussion about binary caches.
This issue gives an overview of those discussions, the previous and current problems, and a suggested way forward.
The main problems that triggered these discussions:
1. Requests to `s3://` URLs are very slow, let's say 1-3 seconds per request. Compare this to ~150ms for the equivalent `https://` URL for our spack-binaries bucket.
2. Multiple requests per mirror were issued to locate a spec: `spec.yaml`, `spec.json`, `spec.json.sig`.
3. There are typically multiple mirrors configured, like 3-5.
Together these lead to a significant overhead when fetching binaries in CI; @blue42u reported a lot of time wasted (roughly 30 minutes) just trying to fetch binaries from mirrors.
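The three problems above compound multiplicatively, which is why the overhead gets so large. A back-of-the-envelope calculation with illustrative numbers (the latencies are the rough figures quoted above, the mirror count is an assumption within the typical 3-5 range):

```python
# Rough per-spec overhead on a full cache miss: every mirror is queried
# for every spec file variant before Spack gives up.
requests_per_mirror = 3   # spec.yaml, spec.json, spec.json.sig
mirrors = 4               # a typical 3-5 mirror setup
latency_s3 = 2.0          # seconds per slow s3:// request (1-3s reported)
latency_https = 0.15      # seconds per https:// request (~150ms reported)

miss_cost_s3 = requests_per_mirror * mirrors * latency_s3       # 24.0 seconds
miss_cost_https = requests_per_mirror * mirrors * latency_https # 1.8 seconds
```

Over hundreds of specs in a CI pipeline, tens of seconds per miss adds up to the half-hour figure reported above.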
We already had a small optimization to reduce the number of requests:
4. We check the local, offline cache of the mirror for a spec, and then prioritize mirrors with matching specs, typically resulting in direct hits.
However, this optimization has a bug: if the spec cannot be located in any local cache (either because none of the remotes have the spec at all, or because we have no local cache for the mirror), Spack does a partial update of the cache. Partial in the sense that it queries each mirror for the spec by directly fetching the relevant `spec.json` files. In this case the optimization does strictly damage: all mirrors are queried before a download starts, whereas without the "optimization" Spack would simply stop at the first mirror it can download from.
5. @blue42u changed the optimization in point 4 to go from a partial update to a full update of the cache if there was no local cache hit. The idea being: for the next spec to install, the cache can finally be exploited, and the optimization works as intended, with mirrors ordered to get direct hits.
However, point 5 only makes sense for terribly slow mirrors, since fetching an `index.json` with, say, 100K specs has a high startup cost. Slow mirrors are not the norm (think `file://` mirrors or mirrors on a local network with low latency), so this change makes the Spack experience worse for everyone else. For fast mirrors, we would really like to do direct fetches (and still use the fully offline mirror-order optimization).
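The fully offline mirror ordering argued for above can be sketched in a few lines. The names (`order_mirrors`, `local_indices`) are hypothetical, not Spack's actual API; the point is that ordering uses only previously cached indexes and issues no requests:

```python
def order_mirrors(mirrors, spec_hash, local_indices):
    """Order mirrors using only locally cached indexes (no network I/O):
    mirrors whose cached index.json lists the spec come first.
    Hypothetical helper; local_indices maps mirror URL -> set of spec hashes."""
    def miss(mirror):
        index = local_indices.get(mirror)
        return 0 if index is not None and spec_hash in index else 1
    # sorted() is stable, so ties keep the user's configured mirror order
    return sorted(mirrors, key=miss)

# Toy usage: only the second mirror's cached index contains the spec.
mirrors = ["s3://slow-bucket", "https://fast-mirror", "file:///local"]
ordered = order_mirrors(mirrors, "abc123", {"https://fast-mirror": {"abc123"}})
```

If no cached index mentions the spec, the configured order is kept unchanged and Spack can simply try mirrors one by one, stopping at the first hit.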
In fact, we have never had any issues with the https://mirror.spack.io URLs for sources; it would be absurd if Spack first downloaded an index of all sources available on mirror.spack.io just so it could consult that index when installing packages from sources.
What has not really been looked into is why these `s3://` requests are so slow in the first place, and it turns out it's because of various trivial issues:
6. Each `s3://` 404 would cause Spack to try to download `<failing url>/index.html`; this was fixed in #34325 (Stop checking for `{s3://path}/index.html`).
7. Spack creates an S3 client instance on each request, which itself requires one or more requests to S3 if no credentials are provided, causing a huge overhead. For me, reusing the same client instance makes requests 4x faster; see #34372 (s3: cache client instance).
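The idea behind caching the client in point 7 is plain memoization of an expensive constructor. A minimal generic sketch (not the actual Spack code; the stand-in constructor below represents something like boto3's client creation, which may do credential lookups over the network the first time):

```python
import functools

def cached_client(make_client):
    """Memoize an expensive client constructor so repeated requests with the
    same configuration reuse a single instance."""
    return functools.lru_cache(maxsize=None)(make_client)

# Stand-in for a real S3 client constructor; the list records how many
# times construction (and thus any credential lookup) actually happens.
constructions = []

@cached_client
def s3_client(region):
    constructions.append(region)
    return object()

a = s3_client("us-east-1")
b = s3_client("us-east-1")  # reuses the cached instance, no new construction
```

With the cache, a thousand requests pay the credential-lookup cost once instead of a thousand times, which is where the reported 4x speedup comes from.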
Next, what had not really been addressed:
8. Direct existence checks on mirrors are slower than necessary, because on a cache miss three requests are made: `spec.yaml`, `spec.json`, `spec.json.sig`. We can reduce this to one. `spec.yaml` was deprecated, so it is removed in #34347 (remove legacy yaml from buildcache fetch). And there is no technical reason to have a special `spec.json.sig` extension, so we can just stick to `spec.json` and have Spack peek into the file to see whether it is signed; I submitted #34350 (binary cache: do not create separate `spec.json.sig` files) for this. The only problem is that the latter is not forward compatible; it may need backporting to 0.19 if we're nice about it.
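Peeking into a single `spec.json` to decide whether it is signed is cheap, because a clearsigned file starts with the standard OpenPGP cleartext header. An illustrative helper (not Spack's actual code):

```python
# Standard OpenPGP cleartext-signature marker (RFC 4880 section 7).
PGP_CLEARSIGN_HEADER = "-----BEGIN PGP SIGNED MESSAGE-----"

def is_clearsigned(text):
    """Return True if the file content looks like a clearsigned document
    rather than plain JSON (illustrative helper)."""
    return text.lstrip().startswith(PGP_CLEARSIGN_HEADER)

signed = PGP_CLEARSIGN_HEADER + '\nHash: SHA256\n\n{"spec": {}}\n'
unsigned = '{"spec": {}}'
```

So one fetch of `spec.json` answers both questions at once: does the spec exist, and is it signed.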
When points 6-8 are all addressed, I expect the overhead (especially on the unhappy cache-miss path) to drop by at least a factor of 10.
Going forward, I think the highest priority is to fix point 7.
Then we should ensure the mirror-order optimization is always offline, which means partially reverting @blue42u's PR and adding `index_only=True` in the relevant place where a spec is searched for.
To make @blue42u happy, it could be useful to have a command like `spack mirror update` (or something along those lines) that updates the local binary index if necessary, which he can then run before `spack install` in CI.
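The core of such a command could boil down to one index fetch per mirror. Everything below (names, signatures) is a hypothetical sketch of the proposal, not an existing Spack interface:

```python
def update_local_indices(mirrors, fetch_index, store):
    """Refresh each mirror's locally cached index so later installs can
    order mirrors fully offline (hypothetical helper, not Spack's API)."""
    for mirror in mirrors:
        try:
            store(mirror, fetch_index(mirror))  # one index.json fetch per mirror
        except OSError:
            pass  # unreachable mirror: keep whatever stale cached index we had

# Toy usage with in-memory stand-ins for the fetch and the on-disk cache:
cache = {}
update_local_indices(
    ["https://fast-mirror"],
    fetch_index=lambda mirror: {"abc123"},
    store=cache.__setitem__,
)
```

Run once at the start of a CI job, this pays the index-download cost a single time, after which every `spack install` can rely on purely offline mirror ordering.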