Commit bd94dc1

Hocnonsense, johanneskoester, and coderabbitai[bot] authored
fix: Confusion with Overriding input After Snakemake Modularization (#3714)
will fix #3713

## 1: Support recursive module modification via `WorkflowModifier`

To make workflows recursively apply modifiers like path prefix and rule name changes across nested modules, I introduced the following changes:

- src/snakemake/ruleinfo.py:
  - Made `RuleInfo` recursively apply `parent_modifier` to adjust paths and rule names.
- src/snakemake/path_modifier.py:
  - Added `inner_modifier` to support chained path transformations in nested modules.
- src/snakemake/modules.py:
  - Modified `get_name_modifier_func` and `WorkflowModifier.modify_rulename` to support recursive rule renaming.
  - Replaced `WorkflowModifier.skip_rule` with `avail_rulename` to ensure that `rule_whitelist` and `rule_exclude_list` are respected at every module level.

## 2: Make multiple use statements with specific_rule more reliable and strict

My understanding is that `rule.name` should be unique in a workflow. Therefore, a second `use rule from ... with ...` should only be allowed if the rule was previously imported with the same name via a wildcard (`use * from ...`) and is now being refined.

- to clarify:
  - A given rule from a module can be used **multiple times** in the same workflow, as long as each usage assigns it a **unique name**.
  - However, using the **same rule name** more than once is **not allowed**, to prevent accidental overwrites, just as a rule defined **outside** a module **cannot** be overwritten.
  - That’s the restriction behind the “only once” statement: it applies per final rule name, not per source rule.
- src/snakemake/rules.py:
  - The old logic stored rule dependencies as objects. If a rule was modified afterward, those changes wouldn't propagate.
  - Now, rule dependencies are resolved dynamically by name, so the updated rule with its modified parameters is used.
- src/snakemake/modules.py:
  - Stricter checks for rule overwriting. `ModuleInfo.use_rules` now uses `allow_rule_overwrite` for specific_rule only if it was previously imported via `use * from ...` and not already customized.
- src/snakemake/workflow.py:
  - Adjusted to work with the new `WorkflowModifier`.

## Other changes:

- Simplify `WorkflowModifier.__init__`. The module’s `__name__` is now assigned as `WorkflowModifier.namespace`, making workflow development and debugging easier.

### QC

* [x] The PR contains a test case for the changes or the changes are already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Multi-stage path modifiers, a public module registry, and improved name/path/wrapper modifier lifecycle for more predictable module imports.
* **Bug Fixes**
  * Prevents unintended duplicate rule creation from wildcard/name modifications and enforces explicit overwrite semantics; tighter path-replacement guards and more reliable dependency resolution.
* **Tests**
  * Updated fixtures for renamed, aliased and nested modules/rules to reflect modifier behavior changes.
* **Documentation**
  * Expanded module-import and rule-customization guidance with examples and conflict-resolution advice.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Johannes Köster <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
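
The naming policy described in section 2 can be pictured with a small toy model. This is only a sketch of the stated rules (a wildcard import never overwrites, a wildcard-imported name may be refined once, any other reuse of a final name is rejected). It is not Snakemake's implementation: `RuleRegistry`, `RuleNameError`, `use_all` and `use_one` are hypothetical names invented for this illustration.

```python
class RuleNameError(Exception):
    """Raised when a final rule name would be reused in a disallowed way."""


class RuleRegistry:
    """Hypothetical toy model of the naming policy, not Snakemake's code."""

    def __init__(self):
        # Maps each final rule name to how it was introduced:
        # "wildcard" (via `use rule * ...`) or "explicit" (by name).
        self.origin = {}

    def use_all(self, source_rules, prefix=""):
        """Model `use rule * from m as <prefix>*`."""
        final_names = [prefix + name for name in source_rules]
        # A wildcard import never overwrites: any collision is an error.
        for final in final_names:
            if final in self.origin:
                raise RuleNameError(f"{final!r} is already defined")
        for final in final_names:
            self.origin[final] = "wildcard"

    def use_one(self, source_rule, final_name=None):
        """Model `use rule r from m as <final_name> with: ...`."""
        final = final_name or source_rule
        # Only an untouched wildcard import may be refined under its name.
        if self.origin.get(final) == "explicit":
            raise RuleNameError(f"{final!r} was already defined or customized")
        self.origin[final] = "explicit"


registry = RuleRegistry()
registry.use_all(["some_task", "other_task"], prefix="other_")
registry.use_one("some_task", "other_some_task")  # refine once: allowed
registry.use_one("some_task", "else_some_task")   # new unique name: allowed

try:
    registry.use_one("some_task", "other_some_task")  # second refinement
except RuleNameError as err:
    print("rejected:", err)
```

In this model the restriction is keyed on the final rule name, so the same source rule can be imported any number of times as long as each import ends up under a different name.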
1 parent 3ffd8e1 commit bd94dc1

13 files changed

Lines changed: 295 additions & 160 deletions

docs/snakefiles/modularization.rst

Lines changed: 51 additions & 14 deletions
@@ -128,7 +128,7 @@ With Snakemake 6.0 and later, it is possible to define external workflows as mod
         snakefile:
             # here, plain paths, URLs and the special markers for code hosting providers (see below) are possible.
             "other_workflow/Snakefile"
-
+
     use rule * from other_workflow exclude ruleC as other_*
 
 The ``module other_workflow:`` statement registers the external workflow as a module, by defining the path to the main snakefile of ``other_workflow``.
@@ -160,18 +160,18 @@ It is possible to overwrite the global config dictionary for the module, which i
         # here, plain paths, URLs and the special markers for code hosting providers (see below) are possible.
         snakefile: "other_workflow/Snakefile"
         config: config["other-workflow"]
-
+
     use rule * from other_workflow as other_*
 
 In this case, any ``configfile`` statements inside the module are ignored.
 In addition, it is possible to skip any :ref:`validation <snakefiles_config_validation>` statements in the module, by specifying ``skip_validation: True`` in the module statement.
 Moreover, one can automatically move all relative input and output files of a module into a dedicated folder by specifying ``prefix: "foo"`` in the module definition, e.g. any output file ``path/to/output.txt`` in the module would be stored under ``foo/path/to/output.txt`` instead.
 This becomes particularly useful when combining multiple modules, see :ref:`use_with_modules`.
-However, if you have some input files that come from outside the workflow, you can use the ``local`` flag so that their path is not modified (see :ref:`snakefiles-storage-local-files`)..
+However, if you have some input files that come from outside the workflow, you can use the ``local`` flag so that their path is not modified (see :ref:`snakefiles-storage-local-files`).
 
-Instead of using all rules, it is possible to import specific rules.
-Specific rules may even be modified before using them, via a final ``with:`` followed by a block that lists items to overwrite.
-This modification can be performed after a general import, and will overwrite any unmodified import of the same rule.
+Instead of using all rules, you can selectively import specific rules from modules.
+These rules can also be modified during import via a final ``with:`` followed by a block that lists items to overwrite.
+This behaves similarly to inheriting an existing rule within the current workflow, but with a ``from`` statement to declare the original module (see :ref:`snakefiles-rule-inheritance`).
 
 .. code-block:: python
 
@@ -189,15 +189,52 @@ This modification can be performed after a general import, and will overwrite an
         output:
             "results/some-result.txt"
 
-By such a modifying use statement, any properties of the rule (``input``, ``output``, ``log``, ``params``, ``benchmark``, ``threads``, ``resources``, etc.) can be overwritten, except the actual execution step (``shell``, ``notebook``, ``script``, ``cwl``, or ``run``).
+When using the ``with:`` block, keyword arguments in ``params`` will be selectively replaced, while positional arguments are overwritten if provided.
+All other properties (e.g., ``input``, ``output``, ``log``, ``params``, etc.) will be fully overwritten with the values specified in the block, except the actual execution step (``shell``, ``notebook``, ``script``, ``cwl``, or ``run``).
+
+Note that the second use statement has to use the original rule name, not the one that has been prefixed with ``other_`` via the first use statement (there is no rule ``other_some_task`` in the module ``other_workflow``).
 
 .. note::
-    Modification of `params` allows the replacement of single keyword arguments. Keyword `params` arguments of the original rule that are not defined after `with` are inherited. Positional `params` arguments of the original rule are overwritten, if positional `params` arguments are given after `with`.
-    All other properties are overwritten with the values specified after `with`.
 
-Note that the second use statement has to use the original rule name, not the one that has been prefixed with ``other_`` via the first use statement (there is no rule ``other_some_task`` in the module ``other_workflow``).
-In order to overwrite the rule ``some_task`` that has been imported with the first ``use rule`` statement, it is crucial to ensure that the rule is used with the same name in the second statement, by adding an equivalent ``as`` clause (here ``other_some_task``).
-Otherwise, you will have two versions of the same rule, which might be unintended (a common symptom of such unintended repeated uses would be ambiguous rule exceptions thrown by Snakemake).
+    A rule cannot be overwritten under the same name, unless it was previously imported via `use rule * from ...` statement.
+    This is the **only allowed scenario** where an existing rule name may be overwritten, and is provided for convenience when selectively customizing some rules without introducing new names.
+    In such cases, the second statement uses the same final name as produced by the previous import (via the `as` clause).
+    Importantly, once a rule has been modified in this way, it cannot be redefined or modified again under the same name, but you should import under different names to customize the same rule multiple times:
+
+    .. code-block:: python
+
+        use rule * from other_workflow as other_*
+
+        use rule some_task from other_workflow as other_some_task with:
+            output:
+                "results/some-result.txt"
+
+        use rule some_task from other_workflow as else_some_task with:
+            output:
+                "custom_output.txt"
+
+    Once a rule has been modified this way under a given name, it **cannot** be redefined or modified again under the same name:
+
+    .. code-block:: python
+
+        use rule some_task from other_workflow as other_some_task with:
+            output:
+                "results/some-result.txt"
+
+        use rule some_task from other_workflow as other_some_task with:
+            threads: 1
+        # Not allowed: "other_some_task" was already defined above.
+
+    Similarly, if a `use rule * from ...` statement would result in a rule name that collides with a previously defined rule (regardless of its source), Snakemake will raise an error, and you should resolve the conflict by changing the import order or using a different `as` modifier:
+
+    .. code-block:: python
+
+        use rule some_task from other_workflow as else_some_task with:
+            output:
+                "custom_output.txt"
+
+        use rule * from other_workflow as else_*
+        # Will fail: "else_some_task" is already defined.
 
 Of course, it is possible to combine the use of rules from multiple modules (see :ref:`use_with_modules`), and via modifying statements they can be rewired and reconfigured in an arbitrary way.
 
@@ -209,7 +246,7 @@ Dynamic Modules
 
 With Snakemake 9.0 and later, it is possible to load modules dynamically by providing the ``name`` keyword inside the module definition.
 For example, by reading the module name from a config file or by iterating over several modules in a loop.
-For this, the module name is not specified directly after the ``module`` keyword, but by specifying the ``name`` parameter.
+For this, the module name is not specified directly after the ``module`` keyword, but by specifying the ``name`` parameter.
 
 
 .. code-block:: python
@@ -289,7 +326,7 @@ Code hosting providers
 ----------------------
 
 To obtain the correct URL to an external source code resource (e.g. a snakefile, see :ref:`snakefiles-modules`), Snakemake provides markers for code hosting providers.
-Currently, Github
+Currently, Github
 
 .. code-block:: python
 

docs/snakefiles/rules.rst

Lines changed: 6 additions & 0 deletions
@@ -2944,6 +2944,12 @@ In reality, one will often change more.
 Analogously to the ``use rule`` from external modules, any properties of the rule (``input``, ``output``, ``log``, ``params``, ``benchmark``, ``threads``, ``resources``, etc.) can be modified, except the actual execution step (``shell``, ``notebook``, ``script``, ``cwl``, or ``run``).
 All unmodified properties are inherited from the parent rule.
 
+.. important::
+    A rule cannot be redefined without renaming it using the ``as`` clause.
+    Otherwise, you will have two versions of the same rule, which might be unintended (a common symptom of such unintended repeated uses would be ambiguous rule exceptions thrown by Snakemake).
+    However, it is allowed to create **multiple modified versions** of the same rule, as long as each has a **unique name**.
+    The only exception is when a rule was previously imported via a general ``use rule * from`` statement, such rules may be **further modified once** under the same final name for convenience (see :ref:`snakefiles-modules`).
+
 .. note::
     Modification of `params` allows the replacement of single keyword arguments.
     Keyword `params` arguments of the original rule that are not defined after `with` are inherited.
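
The `params` behaviour documented in the note above (keyword arguments replaced one by one, positional arguments replaced as a whole when any are given) can be sketched in plain Python. This is an illustrative model only, not how Snakemake implements it; `merge_params` and its parameter names are hypothetical.

```python
def merge_params(parent_args, parent_kwargs, new_args=(), new_kwargs=None):
    """Toy model of how a `with:` block combines `params` (hypothetical helper)."""
    # Positional params: overwritten as a whole if any are given after `with`,
    # otherwise inherited unchanged from the original rule.
    args = tuple(new_args) if new_args else tuple(parent_args)
    # Keyword params: original values are inherited, and only the keys that
    # are defined after `with` are replaced.
    kwargs = {**parent_kwargs, **(new_kwargs or {})}
    return args, kwargs


# Only `mode` is redefined, so `ref.fa` and `depth` are inherited.
args, kwargs = merge_params(
    ("ref.fa",),
    {"depth": 4, "mode": "fast"},
    new_kwargs={"mode": "strict"},
)
print(args, kwargs)
```

Under these assumptions, every other property (`input`, `output`, `log`, and so on) would simply be replaced by whatever the `with:` block specifies, so no merge step is needed for them.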
