feat: Add --cache to general_args for jobscripts#3080
feat: Add --cache to general_args for jobscripts#3080johanneskoester merged 2 commits intosnakemake:mainfrom
Conversation
Currently the `--cache` argument is not passed to the `general_args` used to build the execution command for jobs. This causes jobs executed to miss the `--cache` argument and thus ignore the caching functionality. This has been reported in executor plugins such as the slurm one: snakemake/snakemake-executor-plugin-slurm#46
WalkthroughThe pull request modifies the Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- snakemake/spawn_jobs.py (1 hunks)
Additional context used
Path-based instructions (1)
snakemake/spawn_jobs.py (1)
Pattern
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Additional comments not posted (1)
snakemake/spawn_jobs.py (1)
297-297: Approved: Addition of--cacheargument togeneral_args.The addition of the
--cacheargument at line 297 is correctly implemented and aligns with the PR objectives to enhance caching capabilities for remote jobs.Please ensure that the integration of the
--cacheargument is thoroughly tested, especially in environments where remote executors are used. This verification is crucial to confirm that the caching functionality behaves as expected across different scenarios.
|
johanneskoester
left a comment
There was a problem hiding this comment.
This is a very good catch! THanks a lot! I think your considerations are all fine. There is no downside to pass the argument like this.
In this PR the `--cache` is added to the `general_args`. This passes `--cache` to any executor that builds a job execution command using this method. Thus the `--cache` functionality is not ignored any more in remote executed jobs. Currently the `--cache` argument is not used and thus ignored when building the execution command for jobs depending on the `general_args` method of the `SpawnedJobArgsFactory`. Eg all executors depending on the [`RealExecutor` implementation ](https://github.com/snakemake/snakemake-interface-executor-plugins/blob/56b8c84a18b3b8b6a0497201cf17defbbaffa425/snakemake_interface_executor_plugins/executors/real.py#L141-L177) depend on this. This causes jobs executed via these executors to miss the `--cache` argument and thus ignore the caching functionality in generated jobs. This has been noticed in the Slurm executor plugin: snakemake/snakemake-executor-plugin-slurm#46 1. Arguably it would be more optimal to actually check the cache before submitting a job (ie if the rule is already cached, dont submit a job but directly link the results locally). In such a case the `job_exec` command would not even be needed in cases the cache is available, thus I dont think the implementation in this PR interferes with executors implementing such as feautre. 2. Enabling the `--cache` mechanism on remote systems, assumes that the remote executor has access to the same filesystem the cache resides on. To me it is the obligation of the user to make sure the exector used is compatible with caching and in case it is not to diseable the caching functionality. 3. Testing: I am currently not sure where/how to best test this. One test would be to run `general_args` with a `workflow` setting containing cache and check if it ends up in the generated `args` string - but this would require to mock lots of things, so would be quite a bit of effort. * [ ] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> - **New Features** - Enhanced workflow configuration by adding a new argument for caching settings, improving workflow management capabilities. - **Bug Fixes** - Improved handling of workflow execution related to caching behavior, ensuring smoother operation. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Co-authored-by: Vito Zanotelli <[email protected]>
🤖 I have created a release *beep* *boop* --- ## [8.20.3](v8.20.2...v8.20.3) (2024-09-09) ### Bug Fixes * Add --cache to general_args for jobscripts ([#3080](#3080)) ([884498b](884498b)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>



In this PR the
--cacheis added to thegeneral_args. This passes--cacheto any executor that builds a job execution command using this method. Thus the--cachefunctionality is not ignored any more in remote executed jobs.Background
Currently the
--cacheargument is not used and thus ignored when building the execution command for jobs depending on thegeneral_argsmethod of theSpawnedJobArgsFactory. Eg all executors depending on theRealExecutorimplementation depend on this.This causes jobs executed via these executors to miss the
--cacheargument and thus ignore the caching functionality in generated jobs.This has been noticed in the Slurm executor plugin: snakemake/snakemake-executor-plugin-slurm#46
Additional considerations
Arguably it would be more optimal to actually check the cache before submitting a job (ie if the rule is already cached, dont submit a job but directly link the results locally). In such a case the
job_execcommand would not even be needed in cases the cache is available, thus I dont think the implementation in this PR interferes with executors implementing such as feautre.Enabling the
--cachemechanism on remote systems, assumes that the remote executor has access to the same filesystem the cache resides on. To me it is the obligation of the user to make sure the exector used is compatible with caching and in case it is not to diseable the caching functionality.Testing: I am currently not sure where/how to best test this. One test would be to run
general_argswith aworkflowsetting containing cache and check if it ends up in the generatedargsstring - but this would require to mock lots of things, so would be quite a bit of effort.QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
New Features
Bug Fixes