fix: fix backup for output marked by update#3839
Conversation
📝 WalkthroughWalkthroughThe changes modify backup and restore behavior for output files in Snakemake jobs. Backup operations now conditionally check file existence before backing up, and restore operations log warnings instead of errors when backups don't exist, with differentiated cleanup behavior for file versus directory backups. Changes
Sequence DiagramsequenceDiagram
participant Job
participant File
participant Persistence
Note over Job,Persistence: New Backup Flow
Job->>File: Check if exists (await)
alt File exists
File-->>Job: True
Job->>Persistence: backup_output()
else File does not exist
File-->>Job: False
Note over Job: Skip backup
end
Job->>File: prepare()
Note over Job,Persistence: New Restore Flow
Persistence->>Persistence: Check if backup exists
alt Backup is file
Persistence->>File: Remove existing path
Persistence->>File: Copy backup file
Persistence->>File: Delete backup file
else Backup is directory
Persistence->>File: Remove existing path
Persistence->>File: Copy backup directory
Note over Persistence: Backup NOT deleted
else No backup exists
Persistence->>Persistence: Log warning
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
update
|
I think this PR is not finalized as one case I can think of is not implemented. |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/snakemake/jobs.py (1)
841-844: Conditional backup logic correctly implemented, but known cleanup gap remains.The existence check before backup creation properly addresses the case where an
updateoutput doesn't exist yet. However, the unresolved scenario you mentioned in the PR comments remains: if the file doesn't exist initially (no backup created) and the job fails mid-execution, the incomplete output won't be removed because thecleanupmethod at lines 954-977 excludes outputs flagged withupdate.To address the incomplete output cleanup, you could modify the
cleanupmethod to removeupdate-flagged outputs when no backup exists. Would you like me to propose an implementation, or open an issue to track this?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/snakemake/jobs.py(1 hunks)src/snakemake/persistence.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/jobs.pysrc/snakemake/persistence.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: kdm9
Repo: snakemake/snakemake PR: 3562
File: src/snakemake/checkpoints.py:90-90
Timestamp: 2025-05-06T01:37:23.382Z
Learning: In Snakemake checkpoints implementation, tracking only the first missing output for each checkpoint is sufficient, because if one output is missing, all outputs for that checkpoint are considered incomplete. This was the behavior before PR #3562 and maintained in the pluralized `checkpoints.get()` implementation.
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3600
File: src/snakemake/jobs.py:960-964
Timestamp: 2025-05-23T09:40:24.474Z
Learning: In the `cleanup` method of the `Job` class in `src/snakemake/jobs.py`, files in the `to_remove` list should be formatted with `fmt_iofile` without specifying `as_output=True` or `as_input=True` parameters, as these files should be displayed as generic files rather than specifically as output files.
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: Avoid adding input validation or error handling that unnecessarily complicates the code in the `snakemake` codebase, especially when the cases handled don't make sense.
Learnt from: mbhall88
Repo: snakemake/snakemake PR: 3188
File: tests/test_script/scripts/test.sh:4-4
Timestamp: 2024-11-07T00:32:44.137Z
Learning: In test scripts within the Snakemake project, concise code is preferred over verbose error handling when simplicity suffices.
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-10-06T14:09:54.370Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: In the `snakemake` codebase, exception chaining (using `raise ... from ...`) is avoided due to backward compatibility reasons. Do not suggest using exception chaining in `snakemake` codebase.
📚 Learning: 2025-05-23T09:40:24.474Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3600
File: src/snakemake/jobs.py:960-964
Timestamp: 2025-05-23T09:40:24.474Z
Learning: In the `cleanup` method of the `Job` class in `src/snakemake/jobs.py`, files in the `to_remove` list should be formatted with `fmt_iofile` without specifying `as_output=True` or `as_input=True` parameters, as these files should be displayed as generic files rather than specifically as output files.
Applied to files:
src/snakemake/jobs.py
📚 Learning: 2024-11-05T09:41:56.469Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3184
File: snakemake/dag.py:1340-1342
Timestamp: 2024-11-05T09:41:56.469Z
Learning: The method `self.workflow.persistence.software_stack_changed(job)` in `snakemake/persistence.py` is not asynchronous.
Applied to files:
src/snakemake/jobs.py
📚 Learning: 2024-10-04T16:12:18.927Z
Learnt from: lczech
Repo: snakemake/snakemake PR: 3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In `snakemake/scheduler.py`, avoid suggesting the use of `asyncio.gather` in the `jobs_rewards` method due to overhead concerns and the need for immediate results.
Applied to files:
src/snakemake/jobs.py
🧬 Code graph analysis (1)
src/snakemake/jobs.py (1)
src/snakemake/io/__init__.py (2)
is_flagged(979-982)exists(515-519)
🔇 Additional comments (1)
src/snakemake/persistence.py (1)
314-322: LGTM! Warning instead of error aligns with conditional backup behavior.The change from raising an error to logging a warning when no backup exists is correct for the new scenario where backups are only created for existing files. The if/elif/else structure clearly handles both directory and file backups, with proper cleanup in both cases.
🤖 I have created a release *beep* *boop* --- ## [9.14.0](v9.13.7...v9.14.0) (2025-11-27) ### Features * Support Hy in script directive ([#3824](#3824)) ([2329c9e](2329c9e)) ### Bug Fixes * Add support for pathlib in `notebook` field ([#3811](#3811)) ([7b2180a](7b2180a)) * Addressed race condition in workdir_handler.py ([#3844](#3844)) ([8dbfcfb](8dbfcfb)) * cleanup update-marked output files of failed jobs if there was no backup to restore them ([#3843](#3843)) ([41f1ce8](41f1ce8)) * correct Windows callable path handling ([#3832](#3832)) ([5caad70](5caad70)) * expand env vars on resources ([#3823](#3823)) ([fcfa1bc](fcfa1bc)) * fix backup for output marked by `update` ([#3839](#3839)) ([09c64b7](09c64b7)) * Minor fixes/additions to logging module. ([#3802](#3802)) ([3b3986d](3b3986d)) * mount local storage prefix into containers ([#3840](#3840)) ([f1e8b62](f1e8b62)) * properly format input/output files in case of missing rule to produce them ([#3849](#3849)) ([69d5d24](69d5d24)) * Unpack AnnotatedString in _apply_wildcards ([#3798](#3798)) ([7886508](7886508)) ### Performance Improvements * retrieve storage inputs immediately before scheduling jobs instead of before running the entire workflow ([#3850](#3850)) ([4ac6cda](4ac6cda)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
When an output file is marked to be updated snakemake fails if that file does not exist. That might happen when a file needs to be created by the rule before being updated in later workflow executions. This is fixed by validating if the output exists before creating a backup. In addition the behavior of restoring a backup after an error has been thrown needs to be adjusted. Previously an error was raised when no backup file exists for a file marked with `update`. As the newly introduced scenario might not have created a backup a waring is thrown instead of an error. ### QC <!-- Make sure that you can tick the boxes below. --> * [ ] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [ ] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **Bug Fixes** * Enhanced backup restoration to issue warnings instead of errors when backups are unavailable * Improved backup efficiency by only backing up files that exist * Refined directory backup handling to better preserve backup data after restoration <!-- end of auto-generated comment: release notes by coderabbit.ai -->
🤖 I have created a release *beep* *boop* --- ## [9.14.0](snakemake/snakemake@v9.13.7...v9.14.0) (2025-11-27) ### Features * Support Hy in script directive ([snakemake#3824](snakemake#3824)) ([2329c9e](snakemake@2329c9e)) ### Bug Fixes * Add support for pathlib in `notebook` field ([snakemake#3811](snakemake#3811)) ([7b2180a](snakemake@7b2180a)) * Addressed race condition in workdir_handler.py ([snakemake#3844](snakemake#3844)) ([8dbfcfb](snakemake@8dbfcfb)) * cleanup update-marked output files of failed jobs if there was no backup to restore them ([snakemake#3843](snakemake#3843)) ([41f1ce8](snakemake@41f1ce8)) * correct Windows callable path handling ([snakemake#3832](snakemake#3832)) ([5caad70](snakemake@5caad70)) * expand env vars on resources ([snakemake#3823](snakemake#3823)) ([fcfa1bc](snakemake@fcfa1bc)) * fix backup for output marked by `update` ([snakemake#3839](snakemake#3839)) ([09c64b7](snakemake@09c64b7)) * Minor fixes/additions to logging module. ([snakemake#3802](snakemake#3802)) ([3b3986d](snakemake@3b3986d)) * mount local storage prefix into containers ([snakemake#3840](snakemake#3840)) ([f1e8b62](snakemake@f1e8b62)) * properly format input/output files in case of missing rule to produce them ([snakemake#3849](snakemake#3849)) ([69d5d24](snakemake@69d5d24)) * Unpack AnnotatedString in _apply_wildcards ([snakemake#3798](snakemake#3798)) ([7886508](snakemake@7886508)) ### Performance Improvements * retrieve storage inputs immediately before scheduling jobs instead of before running the entire workflow ([snakemake#3850](snakemake#3850)) ([4ac6cda](snakemake@4ac6cda)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
When an output file is marked to be updated snakemake fails if that file does not exist. That might happen when a file needs to be created by the rule before being updated in later workflow executions. This is fixed by validating if the output exists before creating a backup. In addition the behavior of restoring a backup after an error has been thrown needs to be adjusted. Previously an error was raised when no backup file exists for a file marked with
update. As the newly introduced scenario might not have created a backup a waring is thrown instead of an error.QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
Release Notes