fix: clean up corrupt partial downloads to prevent infinite extraction loop#1015
sheeki03 wants to merge 2 commits into cjpais:main
Conversation
When extraction of a directory-based model fails (corrupt archive), the .partial file was left in place and download state was not reset. This caused the next download attempt to resume from EOF, "complete" instantly, and retry extraction on the same broken archive — looping forever.

This fix:
- Scopes archive file handles so they are dropped before cleanup (Windows compat)
- Deletes .partial on extraction failure (corrupt archive) but preserves it on setup failures (permissions, disk space), since the archive may be valid
- Resets all backend state (is_downloading, partial_size, cancel flags)
- Clears frontend download state maps on extraction failure to prevent stuck UI
I'm not convinced this is the way forward. If you as a human want to review the code and tell me I'm wrong let me know. I can easily (trivially) compute the hashes. This to me seems like a band-aid solution which just adds more code and possibility for faults. The code is already a giant mess in this department, and I think it's almost better to just fundamentally rethink all of it, or rip it out. |
Gotcha...I avoided checksums, but I can just download the files from blob.handy.computer and compute them myself. |
Yeah right now I think maybe going the simpler way might be best. Certainly there's a way to handle partial downloads, I'm just a bit wondering how much it matters to handle this. More or less I think it would mean revisiting this code overall and seeing if there's a simpler solution. It seems like things are kind of just in a busted state and I would rather make a proper fix, or move to checksums. Most of the models are a few hundred MB and not multi gig files |
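For the checksum route the maintainer is leaning toward, the core loop is simple: hash the completed download, compare against a published value, and on mismatch delete the file and restart. A minimal std-only sketch is below — the FNV-1a hash is used purely so the example compiles without external crates; a real integrity check would use SHA-256 (e.g. the `sha2` crate) against hashes published alongside the models on blob.handy.computer. All names here are illustrative, not the actual codebase.

```rust
use std::fs::File;
use std::io::{BufReader, Read};

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

// FNV-1a over a byte slice. Stand-in for a cryptographic hash.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

// Stream the archive in chunks so multi-hundred-MB models are never
// read into memory at once.
fn file_checksum(path: &str) -> std::io::Result<u64> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut hash = FNV_OFFSET;
    let mut buf = [0u8; 8192];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 { break; }
        for &b in &buf[..n] {
            hash ^= b as u64;
            hash = hash.wrapping_mul(FNV_PRIME);
        }
    }
    Ok(hash)
}

fn main() {
    // On mismatch against the known-good hash, the fix would be:
    // delete the .partial file and restart the download from scratch.
    println!("{:016x}", fnv1a_64(b"example archive bytes"));
}
```

This keeps the "corrupt vs valid" decision out of the extraction path entirely: a bad archive never reaches the extractor.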
VirenMohindra
left a comment
the fix is directionally correct — deleting .partial on extraction failure is the right call. left a few notes inline.
    let _ = self.app_handle.emit(
        "model-extraction-failed",
        &serde_json::json!({
            "model_id": model_id,
this cleanup block (unlock extracting, unlock models, unlock cancel_flags, emit event, return error) is repeated 3 times in this PR — here, in the create_dir_all failure path, and in the File::open failure path. that's ~25 lines × 3. a helper like fn fail_extraction(&self, model_id: &str, error_msg: &str, delete_partial: bool) would cut this to ~10 lines total and make the intent clearer.
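The suggested helper could look roughly like the following. This is a hypothetical, heavily simplified sketch — `ModelManager`, `fail_extraction`, and `extracting_models` mirror names from the review discussion, but the real struct also holds the models map, cancel flags, and a Tauri app handle, none of which are modeled here.

```rust
use std::collections::HashSet;
use std::fs;
use std::sync::Mutex;

// Simplified stand-in for the real manager (assumption, not actual code).
struct ModelManager {
    extracting_models: Mutex<HashSet<String>>,
    partial_dir: String,
}

impl ModelManager {
    // Single exit path for all three extraction-failure branches:
    // unlock state, optionally delete the corrupt .partial, report the error.
    fn fail_extraction(
        &self,
        model_id: &str,
        error_msg: &str,
        delete_partial: bool,
    ) -> Result<(), String> {
        self.extracting_models.lock().unwrap().remove(model_id);
        if delete_partial {
            // Corrupt archive: remove it so the next attempt starts fresh.
            let path = format!("{}/{}.partial", self.partial_dir, model_id);
            let _ = fs::remove_file(&path);
        }
        // The real helper would also reset is_downloading / partial_size
        // and emit "model-extraction-failed" to the frontend here.
        Err(format!("extraction failed for {model_id}: {error_msg}"))
    }
}

fn main() {}
```

Each failure site then collapses to a one-line `return self.fail_extraction(...)`, which also makes the delete-vs-preserve decision explicit at every call.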
    if let Some(model) = models.get_mut(model_id) {
        model.is_downloading = false;
        model.partial_size = 0;
    }
this is the actual one-line fix for #858 — nice. on current main, this line is missing and the .partial file survives extraction failure, causing the infinite loop. everything else in the PR is hardening around this.
    Ok(f) => f,
    Err(e) => {
        // File::open failure is environmental, preserve .partial
        let error_msg = format!("Failed to open archive: {e}");
good catch scoping the file handles for windows compat. worth keeping even in a slimmer version of this PR.
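The pattern being praised can be isolated to a few lines: on Windows, a file cannot be deleted while an open handle to it exists, so the `File` must go out of scope before any `remove_file` call. A minimal sketch under assumed names (the function and the gzip magic-byte check are illustrative, not the PR's actual extraction logic):

```rust
use std::fs::{self, File};
use std::io::Read;

// Validate an archive, then clean it up if it is corrupt. The handle is
// scoped in a block so it is dropped before the delete — required for
// remove_file to succeed on Windows.
fn validate_archive_or_cleanup(archive_path: &str) -> std::io::Result<()> {
    let looks_valid = {
        // Handle lives only inside this block...
        let mut f = File::open(archive_path)?;
        let mut magic = [0u8; 2];
        let n = f.read(&mut magic)?;
        // tar.gz archives start with the gzip magic bytes 0x1f 0x8b.
        n == 2 && magic == [0x1f, 0x8b]
    }; // ...and is dropped here, before any delete can happen.

    if !looks_valid {
        // Safe on Windows now that the handle is closed.
        fs::remove_file(archive_path)?;
    }
    Ok(())
}

fn main() {}
```

Without the inner block, the delete would fail with a sharing violation on Windows even though the same code passes on macOS and Linux.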
    {
        let mut extracting = self.extracting_models.lock().unwrap();
        extracting.remove(model_id);
    }
File::open failing on a .partial that we know exists is unusual — likely permissions or the file was deleted between the existence check and the open. preserving .partial here is fine but this is a very rare edge case. might not be worth 25 lines of dedicated handling vs just falling through to the same "delete and retry" path.
also @tanshkoul if you want to help out, testing this PR or helping to improve/review it would be very helpful
closing in favor of #1095 |
Before Submitting This PR
Please confirm you have done the following:
Human Written Description
When a directory-based model download completes but extraction fails (e.g. corrupt archive, interrupted extraction), the .partial tar.gz file is left behind. On the next download attempt the app resumes from EOF, thinks the download is complete, and retries extraction on the same broken archive — stuck in an infinite loop with no way to re-download.

This fix ensures the corrupt .partial is deleted when extraction fails so the next attempt starts a fresh download. It also distinguishes between archive corruption (delete .partial) and environmental failures like permissions or disk space (preserve .partial, since the archive itself may be valid). The frontend is also fixed to clear stale download/progress state when extraction fails.

Related Issues/Discussions
Fixes #858
The maintainer suggested: "checksum and if the checksum does not match, delete the file and re-download." This PR addresses the core stuck-download loop without checksums (no canonical hash source is available). Checksum validation can be added as a follow-up if model hashes are published.
Community Feedback
Issue #858 has 2 comments including the maintainer confirming the bug and suggesting the fix direction.
Testing
Automated:
- New test (test_extraction_failure_cleans_up_partial_file) verifying that a corrupt .partial is deleted after extraction failure and the .extracting directory is cleaned up
- cargo test --lib managers::model::tests

Manual (macOS, M4 MacBook Air, dev build via bun tauri dev):

Corrupt archive cleanup (core bug):
- Corrupt .partial file (random bytes, not a valid tar.gz)
- .partial was deleted, backend state reset, UI cleared

Resume regression (must not break):
- .partial preserved at 20MB

Backend log evidence:
Screenshots/Videos (if applicable)
N/A — download/extraction behavior, not UI changes.
AI Assistance
If AI was used: