Skip to content

fix: fix backup for output marked by update#3839

Merged
johanneskoester merged 1 commit intomainfrom
fix/backup_output
Nov 19, 2025
Merged

fix: fix backup for output marked by update#3839
johanneskoester merged 1 commit intomainfrom
fix/backup_output

Conversation

@FelixMoelder
Copy link
Copy Markdown
Contributor

@FelixMoelder FelixMoelder commented Nov 13, 2025

When an output file is marked to be updated snakemake fails if that file does not exist. That might happen when a file needs to be created by the rule before being updated in later workflow executions. This is fixed by validating if the output exists before creating a backup. In addition the behavior of restoring a backup after an error has been thrown needs to be adjusted. Previously an error was raised when no backup file exists for a file marked with update. As the newly introduced scenario might not have created a backup a waring is thrown instead of an error.

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Enhanced backup restoration to issue warnings instead of errors when backups are unavailable
    • Improved backup efficiency by only backing up files that exist
    • Refined directory backup handling to better preserve backup data after restoration

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Nov 13, 2025

📝 Walkthrough

Walkthrough

The changes modify backup and restore behavior for output files in Snakemake jobs. Backup operations now conditionally check file existence before backing up, and restore operations log warnings instead of errors when backups don't exist, with differentiated cleanup behavior for file versus directory backups.

Changes

Cohort / File(s) Summary
Backup and Restore Logic
src/snakemake/jobs.py, src/snakemake/persistence.py
Modified backup behavior to conditionally check file existence before backing up. Refactored restore control flow from separate if checks to if/elif chain. Changed error handling to log warnings when backups don't exist. Directory backups are no longer deleted after restoration; only file backups are removed.

Sequence Diagram

sequenceDiagram
    participant Job
    participant File
    participant Persistence
    
    Note over Job,Persistence: New Backup Flow
    Job->>File: Check if exists (await)
    alt File exists
        File-->>Job: True
        Job->>Persistence: backup_output()
    else File does not exist
        File-->>Job: False
        Note over Job: Skip backup
    end
    Job->>File: prepare()
    
    Note over Job,Persistence: New Restore Flow
    Persistence->>Persistence: Check if backup exists
    alt Backup is file
        Persistence->>File: Remove existing path
        Persistence->>File: Copy backup file
        Persistence->>File: Delete backup file
    else Backup is directory
        Persistence->>File: Remove existing path
        Persistence->>File: Copy backup directory
        Note over Persistence: Backup NOT deleted
    else No backup exists
        Persistence->>Persistence: Log warning
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Attention areas:
    • Verify the conditional existence check in jobs.py integrates correctly with the async/await pattern
    • Confirm the if/elif refactoring in persistence.py covers all intended backup state cases
    • Validate that directory backup preservation (non-deletion) doesn't cause unintended side effects in cleanup workflows

Possibly related PRs

  • PR #3833: Introduces backup/restore methods and job-level backup handling for update-marked outputs; the current PR modifies the backup flow by adding existence checks and altering restore/cleanup behavior.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title directly summarizes the main change: fixing backup behavior for output marked with the update flag.
Description check ✅ Passed The description explains the problem, the solution, and includes the QC checklist template, though QC items are unchecked indicating incomplete preparation.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/backup_output

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@FelixMoelder FelixMoelder changed the title fix: fix backup output fix: fix backup for output marked by update Nov 13, 2025
@FelixMoelder
Copy link
Copy Markdown
Contributor Author

I think this PR is not finalized as one case I can think of is not implemented.
Let us assume an output is marked with update but the output file does not exist, yet.
In that case no backup will be created and in case of failing there will only be a warning that no backup exists. Still, it might be that some output has already been written before failing. If that is the case snakemake will not remove the output.
Solving this should be straightforward but I am not sure if there is some function to call for removing incomplete output.

@FelixMoelder FelixMoelder marked this pull request as ready for review November 14, 2025 07:48
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/snakemake/jobs.py (1)

841-844: Conditional backup logic correctly implemented, but known cleanup gap remains.

The existence check before backup creation properly addresses the case where an update output doesn't exist yet. However, the unresolved scenario you mentioned in the PR comments remains: if the file doesn't exist initially (no backup created) and the job fails mid-execution, the incomplete output won't be removed because the cleanup method at lines 954-977 excludes outputs flagged with update.

To address the incomplete output cleanup, you could modify the cleanup method to remove update-flagged outputs when no backup exists. Would you like me to propose an implementation, or open an issue to track this?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2329c9e and 3a423c7.

📒 Files selected for processing (2)
  • src/snakemake/jobs.py (1 hunks)
  • src/snakemake/persistence.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/jobs.py
  • src/snakemake/persistence.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: kdm9
Repo: snakemake/snakemake PR: 3562
File: src/snakemake/checkpoints.py:90-90
Timestamp: 2025-05-06T01:37:23.382Z
Learning: In Snakemake checkpoints implementation, tracking only the first missing output for each checkpoint is sufficient, because if one output is missing, all outputs for that checkpoint are considered incomplete. This was the behavior before PR #3562 and maintained in the pluralized `checkpoints.get()` implementation.
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3600
File: src/snakemake/jobs.py:960-964
Timestamp: 2025-05-23T09:40:24.474Z
Learning: In the `cleanup` method of the `Job` class in `src/snakemake/jobs.py`, files in the `to_remove` list should be formatted with `fmt_iofile` without specifying `as_output=True` or `as_input=True` parameters, as these files should be displayed as generic files rather than specifically as output files.
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: Avoid adding input validation or error handling that unnecessarily complicates the code in the `snakemake` codebase, especially when the cases handled don't make sense.
Learnt from: mbhall88
Repo: snakemake/snakemake PR: 3188
File: tests/test_script/scripts/test.sh:4-4
Timestamp: 2024-11-07T00:32:44.137Z
Learning: In test scripts within the Snakemake project, concise code is preferred over verbose error handling when simplicity suffices.
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-10-06T14:09:54.370Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: In the `snakemake` codebase, exception chaining (using `raise ... from ...`) is avoided due to backward compatibility reasons. Do not suggest using exception chaining in `snakemake` codebase.
📚 Learning: 2025-05-23T09:40:24.474Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3600
File: src/snakemake/jobs.py:960-964
Timestamp: 2025-05-23T09:40:24.474Z
Learning: In the `cleanup` method of the `Job` class in `src/snakemake/jobs.py`, files in the `to_remove` list should be formatted with `fmt_iofile` without specifying `as_output=True` or `as_input=True` parameters, as these files should be displayed as generic files rather than specifically as output files.

Applied to files:

  • src/snakemake/jobs.py
📚 Learning: 2024-11-05T09:41:56.469Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3184
File: snakemake/dag.py:1340-1342
Timestamp: 2024-11-05T09:41:56.469Z
Learning: The method `self.workflow.persistence.software_stack_changed(job)` in `snakemake/persistence.py` is not asynchronous.

Applied to files:

  • src/snakemake/jobs.py
📚 Learning: 2024-10-04T16:12:18.927Z
Learnt from: lczech
Repo: snakemake/snakemake PR: 3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In `snakemake/scheduler.py`, avoid suggesting the use of `asyncio.gather` in the `jobs_rewards` method due to overhead concerns and the need for immediate results.

Applied to files:

  • src/snakemake/jobs.py
🧬 Code graph analysis (1)
src/snakemake/jobs.py (1)
src/snakemake/io/__init__.py (2)
  • is_flagged (979-982)
  • exists (515-519)
🔇 Additional comments (1)
src/snakemake/persistence.py (1)

314-322: LGTM! Warning instead of error aligns with conditional backup behavior.

The change from raising an error to logging a warning when no backup exists is correct for the new scenario where backups are only created for existing files. The if/elif/else structure clearly handles both directory and file backups, with proper cleanup in both cases.

@johanneskoester johanneskoester merged commit 09c64b7 into main Nov 19, 2025
61 checks passed
@johanneskoester johanneskoester deleted the fix/backup_output branch November 19, 2025 08:59
johanneskoester pushed a commit that referenced this pull request Nov 27, 2025
🤖 I have created a release *beep* *boop*
---


##
[9.14.0](v9.13.7...v9.14.0)
(2025-11-27)


### Features

* Support Hy in script directive
([#3824](#3824))
([2329c9e](2329c9e))


### Bug Fixes

* Add support for pathlib in `notebook` field
([#3811](#3811))
([7b2180a](7b2180a))
* Addressed race condition in workdir_handler.py
([#3844](#3844))
([8dbfcfb](8dbfcfb))
* cleanup update-marked output files of failed jobs if there was no
backup to restore them
([#3843](#3843))
([41f1ce8](41f1ce8))
* correct Windows callable path handling
([#3832](#3832))
([5caad70](5caad70))
* expand env vars on resources
([#3823](#3823))
([fcfa1bc](fcfa1bc))
* fix backup for output marked by `update`
([#3839](#3839))
([09c64b7](09c64b7))
* Minor fixes/additions to logging module.
([#3802](#3802))
([3b3986d](3b3986d))
* mount local storage prefix into containers
([#3840](#3840))
([f1e8b62](f1e8b62))
* properly format input/output files in case of missing rule to produce
them ([#3849](#3849))
([69d5d24](69d5d24))
* Unpack AnnotatedString in _apply_wildcards
([#3798](#3798))
([7886508](7886508))


### Performance Improvements

* retrieve storage inputs immediately before scheduling jobs instead of
before running the entire workflow
([#3850](#3850))
([4ac6cda](4ac6cda))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
kjohnsen pushed a commit to kjohnsen/snakemake that referenced this pull request Dec 15, 2025
When an output file is marked to be updated snakemake fails if that file
does not exist. That might happen when a file needs to be created by the
rule before being updated in later workflow executions. This is fixed by
validating if the output exists before creating a backup. In addition
the behavior of restoring a backup after an error has been thrown needs
to be adjusted. Previously an error was raised when no backup file
exists for a file marked with `update`. As the newly introduced scenario
might not have created a backup a waring is thrown instead of an error.

### QC
<!-- Make sure that you can tick the boxes below. -->

* [ ] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [ ] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **Bug Fixes**
* Enhanced backup restoration to issue warnings instead of errors when
backups are unavailable
  * Improved backup efficiency by only backing up files that exist
* Refined directory backup handling to better preserve backup data after
restoration

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
kjohnsen pushed a commit to kjohnsen/snakemake that referenced this pull request Dec 15, 2025
🤖 I have created a release *beep* *boop*
---


##
[9.14.0](snakemake/snakemake@v9.13.7...v9.14.0)
(2025-11-27)


### Features

* Support Hy in script directive
([snakemake#3824](snakemake#3824))
([2329c9e](snakemake@2329c9e))


### Bug Fixes

* Add support for pathlib in `notebook` field
([snakemake#3811](snakemake#3811))
([7b2180a](snakemake@7b2180a))
* Addressed race condition in workdir_handler.py
([snakemake#3844](snakemake#3844))
([8dbfcfb](snakemake@8dbfcfb))
* cleanup update-marked output files of failed jobs if there was no
backup to restore them
([snakemake#3843](snakemake#3843))
([41f1ce8](snakemake@41f1ce8))
* correct Windows callable path handling
([snakemake#3832](snakemake#3832))
([5caad70](snakemake@5caad70))
* expand env vars on resources
([snakemake#3823](snakemake#3823))
([fcfa1bc](snakemake@fcfa1bc))
* fix backup for output marked by `update`
([snakemake#3839](snakemake#3839))
([09c64b7](snakemake@09c64b7))
* Minor fixes/additions to logging module.
([snakemake#3802](snakemake#3802))
([3b3986d](snakemake@3b3986d))
* mount local storage prefix into containers
([snakemake#3840](snakemake#3840))
([f1e8b62](snakemake@f1e8b62))
* properly format input/output files in case of missing rule to produce
them ([snakemake#3849](snakemake#3849))
([69d5d24](snakemake@69d5d24))
* Unpack AnnotatedString in _apply_wildcards
([snakemake#3798](snakemake#3798))
([7886508](snakemake@7886508))


### Performance Improvements

* retrieve storage inputs immediately before scheduling jobs instead of
before running the entire workflow
([snakemake#3850](snakemake#3850))
([4ac6cda](snakemake@4ac6cda))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants