Skip to content

spack repo update: fix broken partial package repository fetching#50997

Merged
becker33 merged 15 commits intospack:developfrom
alecbcs:fix/repo-update-full-clones
Jul 17, 2025
Merged

spack repo update: fix broken partial package repository fetching#50997
becker33 merged 15 commits intospack:developfrom
alecbcs:fix/repo-update-full-clones

Conversation

@alecbcs
Copy link
Copy Markdown
Member

@alecbcs alecbcs commented Jul 14, 2025

When we first added spack repo update I tried to be clever and perform partial clones to save users bandwidth.

After merging the initial spack repo update functionality in #50868, I started to notice periodic errors when updating my repository with the command and discovered that partial updates don't work how I thought they would. Right now we run git fetch --depth=20 and if there have been > 20 commits to the packages repository since you last pulled the develop branch you'll get an from git error about disconnected histories.

This happens because git only downloads the latest 20 commits and doesn't not how to fast forward your existing repository if there are gaps in the history.

(TLDR; spack repo update is broken right now and we shouldn't release v1.0 as it currently is. Sorry!!)

To fix this we should instead perform a full fetch of the repository when updating an existing repository to get the full history. Both CocoaPods and brew tried doing partial downloads of their package repositories and ultimately moved away from it due to requests from GitHub to reduce compute time on the backend resources. We're not nearly as big as either of those projects, but I think we should learn from them and follow suit in performing full clones for now until we hear otherwise from GitHub.

@alecbcs alecbcs added this to the v1.0.0 milestone Jul 14, 2025
@alecbcs alecbcs requested a review from haampie July 14, 2025 21:35
@alecbcs alecbcs force-pushed the fix/repo-update-full-clones branch 3 times, most recently from 45d0005 to 3959566 Compare July 14, 2025 21:45
@haampie
Copy link
Copy Markdown
Member

haampie commented Jul 14, 2025

did you try this? is it relevant? the comment is from 9 years ago.

@alecbcs
Copy link
Copy Markdown
Member Author

alecbcs commented Jul 14, 2025

We do see repositories getting throttled at LLNL from GitHub, but I'm not sure how I'd test the load on GitHub's side.

I think it's worth erring on the side of caution here until we hear otherwise from GitHub. I looked at brew and it still looks like they're preferring full clones over shallow clones for taps.

@haampie
Copy link
Copy Markdown
Member

haampie commented Jul 15, 2025

In my experience the initial shallow fetch is significantly faster, and for updates I expect the same.

There is also the use case of CI with ephemeral environments where full clones make little sense.

@alecbcs
Copy link
Copy Markdown
Member Author

alecbcs commented Jul 15, 2025

In my experience the initial shallow fetch is significantly faster, and for updates I expect the same.

The initial fetch is faster, but shallow updates don't really work. (I thought they'd work but it turns out they don't after living with spack repo update for a few weeks now.)

If > 20 commits have happened since you last ran spack repo update it fails with an error about disconnected histories.
(TLDR; repo update right now is broken is and we should fix it before we release tomorrow.)

To correctly do shallow updates we'd have to modify the depth argument dynamically to be the number of commits since the user's HEAD and I believe that's what folks are talking about here, and that operation is computationally expensive on GitHub's side.

If we drop the depth argument all together on the second time we run spack repo update (what I'd kinda assume would do a dynamic depth) git appears to just do a full clone in the fetch.

@alecbcs
Copy link
Copy Markdown
Member Author

alecbcs commented Jul 15, 2025

For CI/CD use cases I agree it would be really nice to keep the shallow cloning, I just don't know how to do that yet.

(Edit: testing with blobless clones I'm seeing near identical download times and comparable disk usage.)

@alecbcs alecbcs force-pushed the fix/repo-update-full-clones branch 2 times, most recently from 55cd369 to c000a07 Compare July 15, 2025 17:35
@alecbcs alecbcs changed the title Switch to full clones / fetches to improve ironically download speed spack repo update: fix broken partial package repository fetching Jul 15, 2025
@alecbcs alecbcs requested review from becker33 and tgamblin July 15, 2025 17:45
Copy link
Copy Markdown
Member

@becker33 becker33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The depth argument in _clone_or_pull is now vestigial.

How robust is the testing for this?

@tgamblin
Copy link
Copy Markdown
Member

tgamblin commented Jul 16, 2025

@alecbcs can we:

  1. Keep the initial clone shallow
  2. git fetch --unshallow before updating, if we notice an already-fetched repo is shallow?

spack blame already unshallows repos whenever it's run in order to get full blame history.

Doing this would:

  1. Keep the fast initial clones (e.g. for CI and places where you'll never update)
  2. Do a big fetch on your 2nd update (which I think is a reasonable predictor that you will do more updates)
  3. Allow updates to succeed without disconnected histories
  4. Ensure that this works for people who already cloned the package repo with spack and are having update issues.

Thoughts?

@alecbcs alecbcs force-pushed the fix/repo-update-full-clones branch from 3f26b62 to 9e88961 Compare July 16, 2025 19:01
@alecbcs
Copy link
Copy Markdown
Member Author

alecbcs commented Jul 16, 2025

I've dropped the --filter=blobs:none in favor of @tgamblin's suggestion since we're still supporting users on RHEL7 with old versions of git that don't yet support blobless clones.

I've tested this on a shallow download and it successfully updated the repository by deepening it to a full clone.

becker33
becker33 previously approved these changes Jul 16, 2025
alecbcs and others added 2 commits July 16, 2025 16:32
Co-authored-by: Todd Gamblin <[email protected]>
Signed-off-by: Alec Scott <[email protected]>
@alecbcs alecbcs force-pushed the fix/repo-update-full-clones branch from 20feb86 to 5673c46 Compare July 16, 2025 23:32
@alecbcs alecbcs force-pushed the fix/repo-update-full-clones branch from 03e31f4 to 721db75 Compare July 17, 2025 00:31
alecbcs added 3 commits July 16, 2025 17:35
@alecbcs alecbcs force-pushed the fix/repo-update-full-clones branch from 9ff71c6 to 5f557d0 Compare July 17, 2025 01:34
@alecbcs alecbcs force-pushed the fix/repo-update-full-clones branch from 5f557d0 to 8e0f9ff Compare July 17, 2025 01:37
@becker33 becker33 merged commit 4192d82 into spack:develop Jul 17, 2025
31 of 32 checks passed
@alecbcs alecbcs deleted the fix/repo-update-full-clones branch July 18, 2025 18:38
alstar555 pushed a commit to alstar555/spack that referenced this pull request Aug 27, 2025
…pack#50997)

* Switch to full clones / fetches to improve ironically download speed

---------

Signed-off-by: Alec Scott <[email protected]>
Signed-off-by: Alec Scott <[email protected]>
Co-authored-by: Todd Gamblin <[email protected]>
Signed-off-by: Angelica Loshak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants