* adapt to changes in storage plugin interface, now passing the logger to the storage provider ([#3460](https://github.com/snakemake/snakemake/issues/3460)) ([ac34f11](https://github.com/snakemake/snakemake/commit/ac34f11e704730c0bf0b9a89adcfb8909fb0baf0))
10
+
* finalized support for access pattern annotation (sequential, random, multi), allowing storage plugins to decide about the most efficient provisioning approach (e.g. mounting vs. downloading a local copy) ([#3461](https://github.com/snakemake/snakemake/issues/3461)) ([871c5ab](https://github.com/snakemake/snakemake/commit/871c5ab83def456f7f239f212080bcef447dc08b))
11
+
* introduce ability to annotate access pattern (will e.g. be usable for optimizations in storage plugins) ([#3459](https://github.com/snakemake/snakemake/issues/3459)) ([6e5e65b](https://github.com/snakemake/snakemake/commit/6e5e65b4184b842df1519469f62e847fc52c2d20))
12
+
13
+
14
+
### Bug Fixes
15
+
16
+
* only warn upon positional parameter overwrite with "use rule" in case the number of positional parameters changes ([#3457](https://github.com/snakemake/snakemake/issues/3457)) ([ec18f98](https://github.com/snakemake/snakemake/commit/ec18f98fd94e6133980b4575f932be841e3d2dbf))
17
+
* setup logger for non-executing subcommands ([de5c7a3](https://github.com/snakemake/snakemake/commit/de5c7a35a5d439c073194cf8c30984463180a471))
18
+
19
+
20
+
### Documentation
21
+
22
+
* Fix indentation of datavzrd configuration of tutorial ([#3465](https://github.com/snakemake/snakemake/issues/3465)) ([96ec187](https://github.com/snakemake/snakemake/commit/96ec18700e3616794c4da72937e879f7e789b8e7))
*[#3412](https://github.com/snakemake/snakemake/issues/3412) - keep shadow folder of failed job if --keep-incomplete flag is set. ([#3430](https://github.com/snakemake/snakemake/issues/3430)) ([22978c3](https://github.com/snakemake/snakemake/commit/22978c3a9479d0f0a94f33ea74e91ce06f83d2d7))
14
-
* add flag --report-after-run to automatically generate the report after a successfull workflow run ([#3428](https://github.com/snakemake/snakemake/issues/3428)) ([b0a7f03](https://github.com/snakemake/snakemake/commit/b0a7f03e824beae5985e542d80d46c3d75bfc823))
42
+
* add flag --report-after-run to automatically generate the report after a successful workflow run ([#3428](https://github.com/snakemake/snakemake/issues/3428)) ([b0a7f03](https://github.com/snakemake/snakemake/commit/b0a7f03e824beae5985e542d80d46c3d75bfc823))
15
43
* add flatten function to IO utils ([#3424](https://github.com/snakemake/snakemake/issues/3424)) ([67fa392](https://github.com/snakemake/snakemake/commit/67fa392c1eedab2c7b0aaa5c38ac1d9403912497))
16
44
* add helper functions to parse input files ([#2918](https://github.com/snakemake/snakemake/issues/2918)) ([63e45a7](https://github.com/snakemake/snakemake/commit/63e45a70ae57bc46b345c69b7c2f18d7c811b176))
17
45
* Add option to print redacted file names ([#3089](https://github.com/snakemake/snakemake/issues/3089)) ([ba4d264](https://github.com/snakemake/snakemake/commit/ba4d2644aab18b43a8704e883f48c428d8b35d5a))
docs/executing/cli.rst
+6 -2 (6 additions & 2 deletions)
@@ -73,7 +73,11 @@ Non-local execution
73
73
^^^^^^^^^^^^^^^^^^^
74
74
75
75
Non-local execution on cluster or cloud infrastructure is implemented via plugins.
76
-
The `Snakemake plugin catalog <https://snakemake.github.io/snakemake-plugin-catalog>`_ lists available plugins and their documentation.
76
+
The `Snakemake plugin catalog <https://snakemake.github.io/snakemake-plugin-catalog>`__ lists available plugins and their documentation.
77
+
In general, the configuration boils down to specifying an executor plugin (e.g. for SLURM or Kubernetes) and, if needed, a :ref:`storage <default_storage>` plugin (e.g. in order to use S3 for input and output files or in order to efficiently use a shared network filesystem).
78
+
To maximize I/O performance over the network, it can be advisable to :ref:`annotate the input file access patterns of rules <storage-access-patterns>`.
79
+
Snakemake provides lots of tunables for non-local execution, which can all be found under :ref:`all_options` and in the plugin descriptions of the `Snakemake plugin catalog <https://snakemake.github.io/snakemake-plugin-catalog>`__.
80
+
In any case, cluster- or cloud-specific configuration entails many command line options, which should be persisted in a :ref:`profile <executing-profiles>`.
77
81
78
82
Dealing with very large workflows
79
83
---------------------------------
@@ -106,7 +110,7 @@ Snakemake will process beyond the rule ``myrule``, because all of its input file
106
110
Obviously, a good choice of the rule to perform the batching is a rule that has a lot of input files and upstream jobs, for example a central aggregation step within your workflow.
107
111
We advise all workflow developers to inform potential users of the best-suited batching rule.
docs/getting_started/migration.rst
+7 -1 (7 additions & 1 deletion)
@@ -11,6 +11,12 @@ Sometimes, new features are added that do not require, but make it strongly advi
11
11
12
12
Below are migration hints for particular Snakemake versions.
13
13
14
+
Migrating to Snakemake 9
15
+
------------------------
16
+
17
+
Between Snakemake 8 and Snakemake 9, there is only a single breaking change in how custom loggers are provided, such that hardly any user should be affected.
18
+
Custom log handlers are now specified as a logger plugin, either via ``--logger`` on the command line or via ``OutputSettings.log_handler_settings`` in the API.
19
+
14
20
Migrating to Snakemake 8
15
21
------------------------
16
22
@@ -571,7 +577,7 @@ Profiles
571
577
^^^^^^^^
572
578
573
579
Profiles can now be versioned.
574
-
If your profile makes use of settings that are available in version 8 or later, use the filename ``config.v8+.yaml`` for the profile configuration (see :ref:`profiles<profiles>`).
580
+
If your profile makes use of settings that are available in version 8 or later, use the filename ``config.v8+.yaml`` for the profile configuration (see :ref:`executing-profiles`).
docs/project_info/contributing.rst
+11 -4 (11 additions & 4 deletions)
@@ -56,8 +56,10 @@ Write Documentation
56
56
57
57
Snakemake could always use more documentation, whether as part of the official docs, in docstrings, or even on the web in blog posts, articles, and such.
58
58
59
-
Snakemake uses `Sphinx <https://sphinx-doc.org>`_ for the user manual (that you are currently reading).
60
-
See :ref:`project_info-doc_guidelines` on how the documentation reStructuredText is used.
Snakemake uses `Sphinx`_ for the user manual (that you are currently reading).
62
+
See :ref:`project_info-doc_guidelines` on how the reStructuredText is used for the documentation.
61
63
62
64
63
65
@@ -250,11 +252,16 @@ The existing unit tests should all cope with this, and in general you should avo
250
252
Documentation Guidelines
251
253
========================
252
254
255
+
The documentation uses `Sphinx`_ and is written in ``reStructuredText``.
256
+
For details on the syntax, see the `Sphinx primer on reStructuredText <https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#rst-primer>`_ and the `Sphinx documentation on cross-references <https://www.sphinx-doc.org/en/master/usage/referencing.html>`_.
257
+
253
258
For the documentation, please adhere to the following guidelines:
254
259
255
260
- Put each sentence on its own line; this makes tracking changes through Git SCM easier.
256
-
- Provide hyperlink targets, at least for the first two section levels.
257
-
For this, use the format ``<document_part>-<section_name>``, e.g., ``project_info-doc_guidelines``.
261
+
- Provide `hyperlink targets <https://www.sphinx-doc.org/en/master/usage/referencing.html#cross-referencing-arbitrary-locations>`_, at least for the first two section levels.
262
+
For this, use the format ``<document_part>-<section_name>``, for example ``project_info-doc_guidelines`` for the current section.
263
+
Set the hyperlink target right above the section heading with ``.. _project_info-doc_guidelines:``.
264
+
Reference the hyperlink (i.e. link to it) with ``:ref:`project_info-doc_guidelines```.
258
265
- Use the `section structure recommended by Sphinx <https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#sections>`_, which references the `recommendations in the Python Developer's Guide <https://devguide.python.org/documentation/markup/#sections>`_.
docs/snakefiles/rules.rst
+36 -3 (36 additions & 3 deletions)
@@ -23,7 +23,7 @@ However, rules can be much more complex, may use :ref:`plain python <snakefiles-
23
23
24
24
Inside the shell command, all local and global variables, especially input and output files can be accessed via their names in the `python format minilanguage <https://docs.python.org/py3k/library/string.html#formatspec>`_.
25
25
Here, input and output (and in general any list or tuple) automatically evaluate to a space-separated list of files (i.e. ``path/to/inputfile path/to/other/inputfile``).
26
-
From Snakemake 3.8.0 on, adding the special formatting instruction ``:q`` (e.g. ``"somecommand {input:q} {output:q}")``) will let Snakemake quote each of the list or tuple elements that contains whitespace.
26
+
From Snakemake 3.8.0 on, adding the special formatting instruction ``:q`` (e.g. ``"somecommand {input:q} {output:q}"``) will let Snakemake quote each of the list or tuple elements that contains whitespace.
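As a rough illustration of what quoting each list element achieves, the following Python sketch uses the standard library's ``shlex.quote``.
It is an approximation and not Snakemake's implementation: ``shlex.quote`` quotes any element that contains shell-unsafe characters (not only whitespace), and the helper name ``quote_items`` is made up for this example.

```python
import shlex


def quote_items(items):
    """Join items with spaces, shell-quoting each element that needs it."""
    # Hypothetical helper: mimics the effect of ``{input:q}`` on a list of paths.
    return " ".join(shlex.quote(str(item)) for item in items)


inputs = ["path/to/inputfile", "path/to/other inputfile"]
outputs = ["out dir/result.txt"]

# Elements without unsafe characters stay as they are; the others get single quotes.
command = "somecommand {} {}".format(quote_items(inputs), quote_items(outputs))
print(command)
# somecommand path/to/inputfile 'path/to/other inputfile' 'out dir/result.txt'
```

Without quoting, the shell would split ``path/to/other inputfile`` into two separate arguments.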
27
27
28
28
.. note::
29
29
@@ -842,7 +842,7 @@ Snakemake will always round the calculated value down (while enforcing a minimum
842
842
843
843
Starting from version 3.7, threads can also be a callable that returns an ``int`` value. The signature of the callable should be ``callable(wildcards[, input])`` (input is an optional parameter). It is also possible to refer to a predefined variable (e.g. ``threads: threads_max``), so that the number of cores for a set of rules can be changed in one place by altering the value of the variable ``threads_max``.
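A callable with the signature described above could look like the following sketch.
The policy (one thread per input file, capped at ``threads_max``) and the function name ``get_threads`` are assumptions chosen for illustration; ``SimpleNamespace`` merely stands in for the wildcards object so that the function can be tried outside of a workflow.

```python
from types import SimpleNamespace

# Predefined variable, as in ``threads: threads_max``.
threads_max = 8


def get_threads(wildcards, input=None):
    """Return an int thread count; ``input`` is optional, as in the signature above."""
    # Hypothetical policy: one thread per input file, at least 1, at most threads_max.
    n_inputs = len(input) if input is not None else 1
    return max(1, min(threads_max, n_inputs))


# Stand-in for the wildcards of a job; not the real Snakemake object.
wildcards = SimpleNamespace(sample="a")
print(get_threads(wildcards, ["a.1.fq", "a.2.fq", "a.3.fq"]))
```

In a rule, such a function would be passed as ``threads: get_threads``.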
844
844
845
-
Both threads can be defined (or overwritten) upon invocation (without modifying the workflow code) via `--set-threads` see :ref:`all_options` and via workflow profiles, see :ref:`profiles`.
845
+
Threads can be defined (or overwritten) upon invocation (without modifying the workflow code) via ``--set-threads``, see :ref:`all_options`, and via workflow profiles, see :ref:`executing-profiles`.
846
846
To quickly exemplify the latter, you could provide the following workflow profile in a file ``profiles/default/config.yaml`` relative to the Snakefile or the current working directory:
847
847
848
848
.. code-block:: yaml
@@ -957,7 +957,7 @@ Here, the value that the function ``get_mem_mb`` returns, grows linearly with th
957
957
Of course, any other arithmetic could be performed in that function.
958
958
959
959
Both threads and resources can be defined (or overwritten) upon invocation (without modifying the workflow code) via `--set-threads` and `--set-resources`, see :ref:`all_options`.
960
-
Or they can be defined via workflow :ref:`profiles`, with the variables listed above in the signature for usable callables.
960
+
Or they can be defined via workflow :ref:`executing-profiles`, with the variables listed above in the signature for usable callables.
961
961
You could, for example, provide the following workflow profile in a file ``profiles/default/config.yaml`` relative to the Snakefile or the current working directory:
962
962
963
963
.. code-block:: yaml
@@ -1799,6 +1799,8 @@ or the short form
1799
1799
will generate skeleton code in ``notebooks/hello.py.ipynb`` and additionally print instructions on how to open and execute the notebook in VSCode.
1800
1800
1801
1801
1802
+
.. _snakefiles_protected_temp:
1803
+
1802
1804
Protected and Temporary Files
1803
1805
-----------------------------
1804
1806
@@ -3058,6 +3060,37 @@ To avoid such leaks (only required if your template does something like that wit
3058
3060
shell:
3059
3061
"sometool {input} {output}"
3060
3062
3063
+
.. _snakefiles_default_flags:
3064
+
3065
+
Setting default flags
3066
+
---------------------
3067
+
3068
+
Snakemake allows the annotation of input and output files via so-called flags (see e.g. :ref:`snakefiles_protected_temp`).
3069
+
Sometimes, it can be useful to define that a certain flag shall be applied to all input or output files of a workflow.
3070
+
This can be achieved via the global ``inputflags`` and ``outputflags`` directives.
3071
+
Consider the following example:
3072
+
3073
+
.. code-block:: python

    outputflags:
        temp

    rule a:
        output:
            "test.out"
        shell:
            "echo test > {output}"
3083
+
3084
+
This would automatically mark the output file of rule ``a`` as temporary.
3085
+
The most convenient use case of this mechanism occurs in combination with :ref:`access pattern annotation <storage-access-patterns>`.
3086
+
In this case, the default access pattern can be set globally for all output files of a workflow.
3087
+
Only the few cases that differ then need an explicit access pattern annotation (see :ref:`storage-access-patterns` for an example).
3088
+
Whenever a rule defines a flag for a file, this flag will override the default flag of the same kind or any contradicting default flags (e.g. ``temp`` will override ``protected``).
3089
+
3090
+
Such default input and output flag specifications are always valid for all rules that follow them in the workflow definition.
3091
+
Importantly, they are also "namespaced" per module, meaning that ``inputflags`` and ``outputflags`` directives in a module only apply to the rules defined in that module.
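The override behavior described above can be modeled in a few lines of plain Python.
This is only an illustrative sketch and not Snakemake's implementation: the function ``resolve_flags``, the ``CONTRADICTS`` table, and the dictionary representation of flags are hypothetical, and only the ``temp``/``protected`` contradiction mentioned above is modeled.

```python
# Hypothetical table of mutually exclusive flags (only the documented pair).
CONTRADICTS = {"temp": {"protected"}, "protected": {"temp"}}


def resolve_flags(defaults, rule_flags):
    """Return the flags in effect for a file, letting rule-level flags win."""
    effective = dict(defaults)
    for kind, value in rule_flags.items():
        # A rule-level flag removes contradicting defaults ...
        for other in CONTRADICTS.get(kind, ()):
            effective.pop(other, None)
        # ... and replaces a default of the same kind.
        effective[kind] = value
    return effective


defaults = {"access": "sequential", "protected": True}
print(resolve_flags(defaults, {"access": "random", "temp": True}))
# {'access': 'random', 'temp': True}
```

Files without rule-level flags simply keep the defaults.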
docs/snakefiles/storage.rst
+59 (59 additions & 0 deletions)
@@ -43,6 +43,8 @@ In general, there are four ways to use a storage provider.
43
43
Using the S3 storage plugin, we will provide an example for all of the cases below.
44
44
For provider specific options (also for all options of the S3 plugin which are omitted here for brevity) and all available plugins see the `Snakemake plugin catalog <https://snakemake.github.io/snakemake-plugin-catalog>`_.
45
45
46
+
.. _default_storage:
47
+
46
48
As default provider
47
49
^^^^^^^^^^^^^^^^^^^
48
50
If you want all your input and output (which is not explicitly marked to come from
@@ -223,3 +225,60 @@ Usually, this can be done via environment variables, e.g. for S3::
223
225
224
226
export SNAKEMAKE_STORAGE_S3_ACCESS_KEY=...
225
227
export SNAKEMAKE_STORAGE_S3_SECRET_KEY=...
228
+
229
+
.. _storage-access-patterns:
230
+
231
+
Access pattern annotation
232
+
^^^^^^^^^^^^^^^^^^^^^^^^^
233
+
234
+
Storage providers can automatically optimize the provision of files based on how the files will be accessed by the respective job.
235
+
For example, if a file is only read sequentially, the storage provider can avoid downloading it and instead mount or symlink it (depending on the protocol) for on-demand access.
236
+
This can be beneficial, in particular if the sequential access involves only a small part of an otherwise large file.
237
+
The three access patterns that can be annotated are:
238
+
239
+
* ``access.sequential``: The file is read sequentially either from start to end or in (potentially disjoint) chunks, but always in order from the start to the end.
240
+
* ``access.random``: The file is read in a non-sequential order.
241
+
* ``access.multi``: The file is read sequentially, but potentially multiple times in parallel.
242
+
243
+
Snakemake considers an input file eligible for on-demand provisioning if it is accessed sequentially and by only one job at a time.
244
+
In all other cases (multi-access, random access, or sequential access by multiple jobs in parallel), the storage provider will download the file to the local filesystem before it is accessed by jobs.
245
+
In case no access pattern is annotated (the default), Snakemake will also download the file.
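The eligibility rule described above can be summarized as a small decision function.
The following Python sketch only restates that rule under simplifying assumptions; the function name ``provisioning`` and its parameters are hypothetical, and an actual storage plugin may decide differently or ignore the annotation.

```python
from typing import Optional


def provisioning(access: Optional[str], concurrent_jobs: int) -> str:
    """Return "on-demand" or "download" for an input file.

    access: "sequential", "random", "multi", or None if not annotated.
    concurrent_jobs: number of jobs reading the file at the same time.
    """
    # Only sequential access by a single job at a time is eligible.
    if access == "sequential" and concurrent_jobs <= 1:
        return "on-demand"
    # Random access, multi-access, parallel readers, or no annotation.
    return "download"


for pattern in ("sequential", "random", "multi", None):
    print(pattern, provisioning(pattern, concurrent_jobs=1))
```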
246
+
247
+
The access patterns can be annotated via flags.
248
+
Typically, one would define sequential access as the default pattern, since it tends to be the most common pattern in a workflow.
249
+
This can be done via the ``inputflags`` directive before defining any rule.
250
+
For specific files, the access pattern can be annotated by the respective flags ``access.sequential``, ``access.random``, or ``access.multi``.
251
+
252
+
.. code-block:: python

    inputflags:
        access.sequential


    rule a:
        input:
            access.random("test1.in")  # expected as local copy (because accessed randomly)
        output:
            "test1.out"
        shell:
            "cmd_b {input} {output}"


    rule b:
        input:
            access.multi("test1.out")  # expected as local copy (because accessed multiple times)
        output:
            "test2.{dataset}.out"
        shell:
            "cmd_b {input} {output}"


    rule c:
        input:
            "test2.{dataset}.out"  # expected as on-demand provisioning (because accessed sequentially, the default defined above)
        output:
            "test3.{dataset}.out"
        shell:
            "cmd_c {input} {output}"
283
+
284
+
Note that there is no guarantee that the storage provider makes use of this information, since the possibilities can vary between storage protocols and the development stage of the storage plugin.