Skip to content

feat: Add --cache to general_args for jobscripts#3080

Merged
johanneskoester merged 2 commits intosnakemake:mainfrom
FroeseLab:feature/cacheinjobs
Sep 9, 2024
Merged

feat: Add --cache to general_args for jobscripts#3080
johanneskoester merged 2 commits intosnakemake:mainfrom
FroeseLab:feature/cacheinjobs

Conversation

@votti
Copy link
Copy Markdown
Contributor

@votti votti commented Sep 9, 2024

In this PR the --cache is added to the general_args. This passes --cache to any executor that builds a job execution command using this method. Thus the --cache functionality is not ignored any more in remote executed jobs.

Background

Currently the --cache argument is not used and thus ignored when building the execution command for jobs depending on the general_args method of the SpawnedJobArgsFactory. Eg all executors depending on the RealExecutor implementation depend on this.
This causes jobs executed via these executors to miss the --cache argument and thus ignore the caching functionality in generated jobs.

This has been noticed in the Slurm executor plugin: snakemake/snakemake-executor-plugin-slurm#46

Additional considerations

  1. Arguably it would be more optimal to actually check the cache before submitting a job (ie if the rule is already cached, dont submit a job but directly link the results locally). In such a case the job_exec command would not even be needed in cases the cache is available, thus I dont think the implementation in this PR interferes with executors implementing such as feautre.

  2. Enabling the --cache mechanism on remote systems, assumes that the remote executor has access to the same filesystem the cache resides on. To me it is the obligation of the user to make sure the exector used is compatible with caching and in case it is not to diseable the caching functionality.

  3. Testing: I am currently not sure where/how to best test this. One test would be to run general_args with a workflow setting containing cache and check if it ends up in the generated args string - but this would require to mock lots of things, so would be quite a bit of effort.

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

  • New Features

    • Enhanced workflow configuration by adding a new argument for caching settings, improving workflow management capabilities.
  • Bug Fixes

    • Improved handling of workflow execution related to caching behavior, ensuring smoother operation.

Currently the `--cache` argument is not passed to the
`general_args` used to build the execution command for jobs.
This causes jobs executed to miss the `--cache` argument and thus
ignore the caching functionality.
This has been reported in executor plugins such as the slurm one:
snakemake/snakemake-executor-plugin-slurm#46
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Sep 9, 2024

Walkthrough

The pull request modifies the general_args function in the snakemake/spawn_jobs.py file by adding a new argument for workflow settings related to caching. This addition allows the function to accommodate additional configuration options, specifically w2a("workflow_settings.cache"), enhancing its interface for managing workflow execution and caching behavior.

Changes

File Change Summary
snakemake/spawn_jobs.py Added w2a("workflow_settings.cache") to general_args function.

Possibly related PRs


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 4fb71b3 and 86088a1.

Files selected for processing (1)
  • snakemake/spawn_jobs.py (1 hunks)
Additional context used
Path-based instructions (1)
snakemake/spawn_jobs.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Additional comments not posted (1)
snakemake/spawn_jobs.py (1)

297-297: Approved: Addition of --cache argument to general_args.

The addition of the --cache argument at line 297 is correctly implemented and aligns with the PR objectives to enhance caching capabilities for remote jobs.

Please ensure that the integration of the --cache argument is thoroughly tested, especially in environments where remote executors are used. This verification is crucial to confirm that the caching functionality behaves as expected across different scenarios.

@votti votti changed the title Add --cache to general_args for jobscripts feat: Add --cache to general_args for jobscripts Sep 9, 2024
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Sep 9, 2024

Copy link
Copy Markdown
Contributor

@johanneskoester johanneskoester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good catch! THanks a lot! I think your considerations are all fine. There is no downside to pass the argument like this.

@johanneskoester johanneskoester merged commit 43690be into snakemake:main Sep 9, 2024
johanneskoester pushed a commit that referenced this pull request Sep 9, 2024
In this PR the `--cache` is added to the `general_args`. This passes
`--cache` to any executor that builds a job execution command using this
method. Thus the `--cache` functionality is not ignored any more in
remote executed jobs.

Currently the `--cache` argument is not used and thus ignored when
building the execution command for jobs depending on the `general_args`
method of the `SpawnedJobArgsFactory`. Eg all executors depending on the
[`RealExecutor` implementation
](https://github.com/snakemake/snakemake-interface-executor-plugins/blob/56b8c84a18b3b8b6a0497201cf17defbbaffa425/snakemake_interface_executor_plugins/executors/real.py#L141-L177)
depend on this.
This causes jobs executed via these executors to miss the `--cache`
argument and thus ignore the caching functionality in generated jobs.

This has been noticed in the Slurm executor plugin:
snakemake/snakemake-executor-plugin-slurm#46

1. Arguably it would be more optimal to actually check the cache before
submitting a job (ie if the rule is already cached, dont submit a job
but directly link the results locally). In such a case the `job_exec`
command would not even be needed in cases the cache is available, thus I
dont think the implementation in this PR interferes with executors
implementing such as feautre.

2. Enabling the `--cache` mechanism on remote systems, assumes that the
remote executor has access to the same filesystem the cache resides on.
To me it is the obligation of the user to make sure the exector used is
compatible with caching and in case it is not to diseable the caching
functionality.

3. Testing: I am currently not sure where/how to best test this. One
test would be to run `general_args` with a `workflow` setting containing
cache and check if it ends up in the generated `args` string - but this
would require to mock lots of things, so would be quite a bit of effort.

* [ ] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

- **New Features**
- Enhanced workflow configuration by adding a new argument for caching
settings, improving workflow management capabilities.

- **Bug Fixes**
- Improved handling of workflow execution related to caching behavior,
ensuring smoother operation.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Vito Zanotelli <[email protected]>
johanneskoester pushed a commit that referenced this pull request Sep 9, 2024
🤖 I have created a release *beep* *boop*
---


##
[8.20.3](v8.20.2...v8.20.3)
(2024-09-09)


### Bug Fixes

* Add --cache to general_args for jobscripts
([#3080](#3080))
([884498b](884498b))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants