Skip to content

fix: clean up corrupt partial downloads to prevent infinite extraction loop#1015

Closed
sheeki03 wants to merge 2 commits intocjpais:mainfrom
sheeki03:fix/partial-download-cleanup
Closed

fix: clean up corrupt partial downloads to prevent infinite extraction loop#1015
sheeki03 wants to merge 2 commits intocjpais:mainfrom
sheeki03:fix/partial-download-cleanup

Conversation

@sheeki03
Copy link
Copy Markdown

Before Submitting This PR

Please confirm you have done the following:

Human Written Description

When a directory-based model download completes but extraction fails (e.g. corrupt archive, interrupted extraction), the .partial tar.gz file is left behind. On the next download attempt the app resumes from EOF, thinks the download is complete, and retries extraction on the same broken archive — stuck in an infinite loop with no way to re-download.

This fix ensures the corrupt .partial is deleted when extraction fails so the next attempt starts a fresh download. It also distinguishes between archive corruption (delete .partial) and environmental failures like permissions or disk space (preserve .partial since the archive itself may be valid). The frontend is also fixed to clear stale download/progress state when extraction fails.

Related Issues/Discussions

Fixes #858

The maintainer suggested: "checksum and if the checksum does not match, delete the file and re-download." This PR addresses the core stuck-download loop without checksums (no canonical hash source is available). Checksum validation can be added as a follow-up if model hashes are published.

Community Feedback

Issue #858 has 2 comments including the maintainer confirming the bug and suggesting the fix direction.

Testing

Automated:

  • 1 new unit test (test_extraction_failure_cleans_up_partial_file) verifying that a corrupt .partial is deleted after extraction failure and the .extracting directory is cleaned up
  • All 4 model manager tests pass: cargo test --lib managers::model::tests

Manual (macOS, M4 MacBook Air, dev build via bun tauri dev):

  1. Corrupt archive cleanup (core bug):

    • Deleted extracted Moonshine V2 Tiny model directory
    • Created a 31MB corrupt .partial file (random bytes, not a valid tar.gz)
    • Restarted app, model showed as not downloaded
    • Clicked Download, app resumed from EOF, attempted extraction, extraction failed
    • .partial was deleted, backend state reset, UI cleared
    • Clicked Download again, fresh download started from scratch, completed successfully
  2. Resume regression (must not break):

    • Started downloading Moonshine V2 Tiny, cancelled mid-download at ~20MB
    • Restarted app, .partial preserved at 20MB
    • Clicked Download, resumed from byte 20159808 (not from 0%), completed successfully

Backend log evidence:

[INFO] Resuming download of model moonshine-tiny-streaming-en from byte 32505856
[INFO] Extracting archive for directory-based model: moonshine-tiny-streaming-en
[INFO] Starting fresh download of model moonshine-tiny-streaming-en
[INFO] Successfully extracted archive for model: moonshine-tiny-streaming-en

Screenshots/Videos (if applicable)

N/A — download/extraction behavior, not UI changes.

AI Assistance

  • AI was used (please describe below)

If AI was used:

  • Tools used: Claude Code
  • How extensively: Claude Code helped with implementation, code review, and test writing. The fix approach (scoped file handles for Windows compat, separate cleanup paths for corrupt archives vs environmental failures) and manual testing were done collaboratively.

…n loop

When extraction of a directory-based model fails (corrupt archive), the
.partial file was left in place and download state was not reset. This
caused the next download attempt to resume from EOF, "complete" instantly,
and retry extraction on the same broken archive — looping forever.

This fix:
- Scopes archive file handles so they are dropped before cleanup (Windows compat)
- Deletes .partial on extraction failure (corrupt archive) but preserves it
  on setup failures (permissions, disk space) since the archive may be valid
- Resets all backend state (is_downloading, partial_size, cancel flags)
- Clears frontend download state maps on extraction failure to prevent stuck UI
@cjpais
Copy link
Copy Markdown
Owner

cjpais commented Mar 12, 2026

I'm not convinced this is the way forward. If you as a human want to review the code and tell me I'm wrong let me know.

I can easily (trivially) compute the hashes. This to me seems like a band-aid solution which just adds more code and possibility for faults. The code is already a giant mess in this department, and I think it's almost better to just fundamentally rethink all of it, or rip it out.

@sheeki03
Copy link
Copy Markdown
Author

Gotcha...I avoided checksums, but I can just download the files from blob.handy.computer and compute them myself.
That would catch corruption before extraction even starts, which is simpler overall

@cjpais
Copy link
Copy Markdown
Owner

cjpais commented Mar 12, 2026

Yeah right now I think maybe going the simpler way might be best.

Certainly there's a way to handle partial downloads, I'm just a bit wondering how much it matters to handle this. More or less I think it would mean revisiting this code overall and seeing if there's a simpler solution. It seems like things are kind of just in a busted state and I would rather make a proper fix, or move to checksums.

Most of the models are a few hundred MB and not multi gig files

Copy link
Copy Markdown
Contributor

@VirenMohindra VirenMohindra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the fix is directionally correct — deleting .partial on extraction failure is the right call. left a few notes inline.

let _ = self.app_handle.emit(
"model-extraction-failed",
&serde_json::json!({
"model_id": model_id,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this cleanup block (unlock extracting, unlock models, unlock cancel_flags, emit event, return error) is repeated 3 times in this PR — here, in the create_dir_all failure path, and in the File::open failure path. that's ~25 lines × 3. a helper like fn fail_extraction(&self, model_id: &str, error_msg: &str, delete_partial: bool) would cut this to ~10 lines total and make the intent clearer.

if let Some(model) = models.get_mut(model_id) {
model.is_downloading = false;
model.partial_size = 0;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the actual one-line fix for #858 — nice. on current main, this line is missing and the .partial file survives extraction failure, causing the infinite loop. everything else in the PR is hardening around this.

Ok(f) => f,
Err(e) => {
// File::open failure is environmental, preserve .partial
let error_msg = format!("Failed to open archive: {e}");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch scoping the file handles for windows compat. worth keeping even in a slimmer version of this PR.

{
let mut extracting = self.extracting_models.lock().unwrap();
extracting.remove(model_id);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File::open failing on a .partial that we know exists is unusual — likely permissions or the file was deleted between the existence check and the open. preserving .partial here is fine but this is a very rare edge case. might not be worth 25 lines of dedicated handling vs just falling through to the same "delete and retry" path.

@cjpais
Copy link
Copy Markdown
Owner

cjpais commented Mar 18, 2026

also @tanshkoul if you want to help out, testing this pr, or helping to improve/review/etc would be very helpful

@cjpais
Copy link
Copy Markdown
Owner

cjpais commented Mar 19, 2026

closing in favor of #1095

@cjpais cjpais closed this Mar 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Partial downloads prevent the model from ever being downloaded again

3 participants