Commit 000c13f

Merge branch 'main' into fix-3713
2 parents 88b9399 + 06ba7e6 commit 000c13f

55 files changed: 78604 additions & 329 deletions

CHANGELOG.md (7 additions, 0 deletions)

@@ -1,6 +1,13 @@
 # Changelog
 
+## [9.10.1](https://github.com/snakemake/snakemake/compare/v9.10.0...v9.10.1) (2025-09-01)
+
+### Performance Improvements
+
+* optimize persistence implementation (only write metadata once, reduce file operations for improving glusterfs performance) ([#3679](https://github.com/snakemake/snakemake/issues/3679)) ([122c713](https://github.com/snakemake/snakemake/commit/122c71379eeef6799a4448428594ec4f9b5b43ec))
+
 ## [9.10.0](https://github.com/snakemake/snakemake/compare/v9.9.0...v9.10.0) (2025-08-19)
 
docs/snakefiles/reporting.rst (23 additions, 0 deletions)

@@ -329,3 +329,26 @@ For example, this allows you to set a logo at the top (by using CSS to inject a
 For an example with a custom stylesheet defining a logo, see :download:`the report here <../../tests/test_report/expected-results/report.html>` (with a custom branding for the University of Duisburg-Essen).
 For the complete mechanics, you can also have a look at the `full example source code <https://github.com/snakemake/snakemake/tree/main/tests/test_report/>`__ and :download:`the custom stylesheet with the logo definition <../../tests/test_report/custom-stylesheet.css>`.
+
+Custom report metadata
+^^^^^^^^^^^^^^^^^^^^^^
+
+You can define custom metadata that is displayed on the landing page of the report.
+The metadata is provided as a `YTE <https://yte-template-engine.github.io>`_ YAML template.
+
+.. code-block:: bash
+
+    snakemake --report report.html --report-metadata yte_template.yaml
+
+An example metadata YAML template that records the working directory in which the workflow was run looks as follows.
+
+.. code-block:: yaml
+
+    __definitions__:
+      - import os
+
+    Workflow name: Test Workflow
+    Workdir: ?os.getcwd()
+    Contributors:
+      - Test Contributor
+      - Another Contributor
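The YTE semantics used in the template above (a value prefixed with `?` is evaluated as a Python expression against the names imported in `__definitions__`) can be illustrated with a minimal, self-contained sketch. `render_metadata` is a hypothetical helper for illustration only, not part of YTE or Snakemake:

```python
import os

def render_metadata(template, env):
    """Hypothetical helper: recursively evaluate values in a parsed YAML
    mapping. Strings starting with '?' are evaluated as Python expressions
    against `env`, mimicking YTE's value-expression syntax."""
    if isinstance(template, str) and template.startswith("?"):
        return eval(template[1:], env)
    if isinstance(template, dict):
        return {k: render_metadata(v, env) for k, v in template.items()}
    if isinstance(template, list):
        return [render_metadata(v, env) for v in template]
    return template

# Parsed form of the YAML template from the docs above; __definitions__ is
# modeled here by passing `os` in the evaluation environment.
template = {
    "Workflow name": "Test Workflow",
    "Workdir": "?os.getcwd()",
    "Contributors": ["Test Contributor", "Another Contributor"],
}
rendered = render_metadata(template, {"os": os})
print(rendered["Workdir"])  # prints the current working directory
```

Plain values pass through unchanged, while `?os.getcwd()` is resolved at render time, which is what makes the metadata dynamic per workflow run.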

docs/snakefiles/testing.rst (18 additions, 6 deletions)

@@ -13,14 +13,26 @@ By running
 
 Snakemake is instructed to take one representative job for each rule and copy its input files to a hidden folder ``.tests/unit``,
 along with generating test cases for Pytest_.
+Pytest_ tests can be run as:
 
-Importantly, note that such unit tests shall not be generated from big data, as they should usually be finished in a few seconds.
-Further, it makes sense to store the generated unit tests in version control (e.g. git), such that huge files are not recommended.
-Instead, we suggest to first execute the workflow that shall be tested with some kind of small dummy datasets, and then use the results thereof to generate the unit tests.
-The small dummy datasets can in addition be used to generate an integration test, that could e.g. be stored under ``.tests/integration``, next to the unit tests.
+.. code-block:: bash
+
+    pytest .tests/unit/
+
+or, optionally, if you want to use a local conda cache and disable pytest caching:
+
+.. code-block:: bash
+
+    pytest -p no:cacheprovider .tests/unit/ --conda-prefix /path/to/cache/conda/
 
 Each auto-generated unit test is stored in a file ``.tests/unit/test_<rulename>.py``, and executes just the one representative job of the respective rule.
 After successful execution of the job, it will compare the obtained results with those that have been present when running ``snakemake --generate-unit-tests``.
-By default, the comparison happens byte by byte (using ``cmp``). This behavior can be overwritten by modifying the test file.
+By default, the comparison happens byte by byte (using ``cmp/zcmp/bzcmp/xzcmp``). This behavior can be overridden by modifying the test file.
+
+NOTE: Importantly, such unit tests shall not be generated from big data, as they should usually finish within a few seconds.
+Furthermore, it makes sense to store the generated unit tests in version control (e.g. git), so huge files are not recommended.
+Instead, we suggest first executing the workflow to be tested with small dummy datasets while keeping all temp files (``--notemp``),
+and then using the results thereof to generate the unit tests.
+The small dummy datasets can in addition be used to build an integration test, stored e.g. under ``.tests/integration``, next to the unit tests.
 
-.. _Pytest: https://pytest.org
+.. _Pytest: https://pytest.org
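The byte-by-byte comparison described above can be approximated in plain Python with the standard library's `filecmp` module. This is a sketch of the idea, not the helper code Snakemake actually generates:

```python
import filecmp
import pathlib
import tempfile

def files_identical(a, b):
    # shallow=False forces a byte-by-byte content comparison,
    # analogous to running `cmp` on the two files.
    return filecmp.cmp(a, b, shallow=False)

tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "expected.txt").write_bytes(b"result\n")
(tmp / "obtained.txt").write_bytes(b"result\n")
(tmp / "different.txt").write_bytes(b"other\n")

print(files_identical(tmp / "expected.txt", tmp / "obtained.txt"))   # True
print(files_identical(tmp / "expected.txt", tmp / "different.txt"))  # False
```

Without `shallow=False`, `filecmp.cmp` would compare only `os.stat` signatures, which is why the explicit flag matters for a true content check.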

pyproject.toml (1 addition, 1 deletion)

@@ -51,7 +51,7 @@ dependencies = [
     "snakemake-interface-executor-plugins>=9.3.2,<10.0",
     "snakemake-interface-common>=1.20.1,<2.0",
     "snakemake-interface-storage-plugins>=4.1.0,<5.0",
-    "snakemake-interface-report-plugins>=1.1.0,<2.0.0",
+    "snakemake-interface-report-plugins>=1.2.0,<2.0.0",
     "snakemake-interface-logger-plugins>=1.1.0,<2.0.0",
     "snakemake-interface-scheduler-plugins>=2.0.0,<3.0.0",
     "tabulate",
src/snakemake/api.py (8 additions, 1 deletion)

@@ -19,6 +19,7 @@
     GroupSettings,
     SchedulingSettings,
     WorkflowSettings,
+    GlobalReportSettings,
 )
 
 if sys.version_info < MIN_PY_VERSION:

@@ -677,13 +678,15 @@ def create_report(
         self,
         reporter: str = "html",
         report_settings: Optional[ReportSettingsBase] = None,
+        global_report_settings: Optional[GlobalReportSettings] = None,
     ):
         """Create a report for the workflow.
 
         Arguments
         ---------
         report: Path -- The path to the report.
-        report_stylesheet: Optional[Path] -- The path to the report stylesheet.
+        report_settings: Optional[ReportSettingsBase] -- Report settings for the html report.
+        global_report_settings: Optional[GlobalReportSettings] -- Report settings that apply to all report plugins.
         reporter: str -- report plugin to use (default: html)
         """

@@ -693,9 +696,13 @@ def create_report(
         if report_settings is not None:
             report_plugin.validate_settings(report_settings)
 
+        if global_report_settings is None:
+            global_report_settings = GlobalReportSettings()
+
         self.workflow_api._workflow.create_report(
             report_plugin=report_plugin,
             report_settings=report_settings,
+            global_report_settings=global_report_settings,
         )
 
     @_no_exec
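The default handling of the new optional parameter can be sketched in isolation. The `GlobalReportSettings` dataclass below is a stand-in with the field name taken from the CLI change in this commit, not the real class:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

@dataclass
class GlobalReportSettings:
    # Stand-in for the real class; field name taken from the CLI diff.
    metadata_template: Optional[Path] = None

def create_report(
    global_report_settings: Optional[GlobalReportSettings] = None,
) -> GlobalReportSettings:
    # Mirrors the new default handling: callers may omit the settings
    # object and downstream code still receives a populated instance.
    if global_report_settings is None:
        global_report_settings = GlobalReportSettings()
    return global_report_settings

print(create_report().metadata_template)  # None
```

Normalizing `None` to a default instance at the API boundary keeps the workflow-internal `create_report` free of `None` checks.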

src/snakemake/cli.py (14 additions, 1 deletion)

@@ -67,6 +67,7 @@
     WorkflowSettings,
     StrictDagEvaluation,
     PrintDag,
+    GlobalReportSettings,
 )
 from snakemake.target_jobs import parse_target_jobs_cli_args
 from snakemake.utils import available_cpu_count, update_config

@@ -719,7 +720,9 @@ def get_argument_parser(profiles=None):
         "--keep-going",
         "-k",
         action="store_true",
-        help="Go on with independent jobs if a job fails.",
+        help="Go on with independent jobs if a job fails during execution. "
+        "This only applies to runtime failures in job execution, "
+        "not to errors during workflow parsing or DAG construction.",
     )
     group_exec.add_argument(
         "--rerun-triggers",

@@ -945,6 +948,13 @@ def get_argument_parser(profiles=None):
         help="Custom stylesheet to use for report. In particular, this can be used for "
         "branding the report with e.g. a custom logo, see docs.",
     )
+    group_report.add_argument(
+        "--report-metadata",
+        metavar="FILE",
+        type=Path,
+        help="Custom metadata to use for the landing page of the report. In particular, "
+        "this can be used to provide metadata in the report, e.g. the work directory, see docs.",
+    )
     group_report.add_argument(
         "--reporter",
         metavar="PLUGIN",

@@ -2102,6 +2112,9 @@ def args_to_api(args, parser):
             dag_api.create_report(
                 reporter=args.reporter,
                 report_settings=report_settings,
+                global_report_settings=GlobalReportSettings(
+                    metadata_template=args.report_metadata
+                ),
             )
         elif args.generate_unit_tests:
             dag_api.generate_unit_tests(args.generate_unit_tests)
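The new option follows the standard argparse pattern for `Path`-typed flags. A minimal self-contained sketch (a standalone parser, not Snakemake's full CLI):

```python
import argparse
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument(
    "--report-metadata",
    metavar="FILE",
    type=Path,  # argparse converts the string argument to a pathlib.Path
    help="Custom metadata template for the report landing page.",
)

args = parser.parse_args(["--report-metadata", "yte_template.yaml"])
print(args.report_metadata)                    # yte_template.yaml
print(isinstance(args.report_metadata, Path))  # True

# When the flag is omitted, the attribute defaults to None:
print(parser.parse_args([]).report_metadata)   # None
```

The `None` default is what allows the API side to fall back to an empty `GlobalReportSettings` when no metadata template was given.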

src/snakemake/persistence.py (93 additions, 78 deletions)

@@ -22,6 +22,7 @@
 from snakemake_interface_executor_plugins.persistence import (
     PersistenceExecutorInterface,
 )
+from snakemake_interface_executor_plugins.settings import ExecMode
 
 from snakemake.common.tbdstring import TBDString
 import snakemake.exceptions
@@ -311,58 +312,64 @@ async def finished(self, job):
             # do not store metadata if not requested
             return
 
-        code = self._code(job.rule)
-        input = self._input(job)
-        log = self._log(job)
-        params = self._params(job)
-        shellcmd = job.shellcmd
-        conda_env = self._conda_env(job)
-        software_stack_hash = self._software_stack_hash(job)
-        fallback_time = time.time()
-        for f in job.output:
-            rec_path = self._record_path(self._incomplete_path, f)
-            starttime = os.path.getmtime(rec_path) if os.path.exists(rec_path) else None
-            # Sometimes finished is called twice, if so, lookup the previous starttime
-            if not os.path.exists(rec_path):
-                starttime = self._read_record(self._metadata_path, f).get(
-                    "starttime", None
-                )
-
-            endtime = (
-                (await f.mtime()).local_or_storage()
-                if await f.exists()
-                else fallback_time
-            )
-
-            checksums = (
-                (infile, await infile.checksum(self.max_checksum_file_size))
-                for infile in job.input
-            )
-            self._record(
-                self._metadata_path,
-                {
-                    "record_format_version": RECORD_FORMAT_VERSION,
-                    "code": code,
-                    "rule": job.rule.name,
-                    "input": input,
-                    "log": log,
-                    "params": params,
-                    "shellcmd": shellcmd,
-                    "incomplete": False,
-                    "starttime": starttime,
-                    "endtime": endtime,
-                    "job_hash": hash(job),
-                    "conda_env": conda_env,
-                    "software_stack_hash": software_stack_hash,
-                    "container_img_url": job.container_img_url,
-                    "input_checksums": {
-                        infile: checksum
-                        async for infile, checksum in checksums
-                        if checksum is not None
-                    },
-                },
-                f,
-            )
+        if (
+            self.dag.workflow.exec_mode == ExecMode.DEFAULT
+            or self.dag.workflow.remote_execution_settings.immediate_submit
+        ):
+            code = self._code(job.rule)
+            input = self._input(job)
+            log = self._log(job)
+            params = self._params(job)
+            shellcmd = job.shellcmd
+            conda_env = self._conda_env(job)
+            software_stack_hash = self._software_stack_hash(job)
+            fallback_time = time.time()
+            for f in job.output:
+                rec_path = self._record_path(self._incomplete_path, f)
+                starttime = (
+                    os.path.getmtime(rec_path) if os.path.exists(rec_path) else None
+                )
+                # Sometimes finished is called twice, if so, lookup the previous starttime
+                if not os.path.exists(rec_path):
+                    starttime = self._read_record(self._metadata_path, f).get(
+                        "starttime", None
+                    )
+
+                endtime = (
+                    (await f.mtime()).local_or_storage()
+                    if await f.exists()
+                    else fallback_time
+                )
+
+                checksums = (
+                    (infile, await infile.checksum(self.max_checksum_file_size))
+                    for infile in job.input
+                )
+                self._record(
+                    self._metadata_path,
+                    {
+                        "record_format_version": RECORD_FORMAT_VERSION,
+                        "code": code,
+                        "rule": job.rule.name,
+                        "input": input,
+                        "log": log,
+                        "params": params,
+                        "shellcmd": shellcmd,
+                        "incomplete": False,
+                        "starttime": starttime,
+                        "endtime": endtime,
+                        "job_hash": hash(job),
+                        "conda_env": conda_env,
+                        "software_stack_hash": software_stack_hash,
+                        "container_img_url": job.container_img_url,
+                        "input_checksums": {
+                            infile: checksum
+                            async for infile, checksum in checksums
+                            if checksum is not None
+                        },
+                    },
+                    f,
+                )
         # remove incomplete marker only after creation of metadata record.
         # otherwise the job starttime will be missing.
         self._remove_incomplete_marker(job)
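The effect of the new guard is that metadata is persisted only by the main process (or by immediately submitted remote jobs), so each record is written once. A standalone sketch of just the gating condition, using a stand-in enum with illustrative member names rather than the real `ExecMode`:

```python
from enum import Enum, auto

class ExecMode(Enum):
    # Stand-in for snakemake_interface_executor_plugins.settings.ExecMode;
    # member names here are illustrative.
    DEFAULT = auto()
    REMOTE = auto()

def should_write_metadata(exec_mode, immediate_submit):
    # Mirrors the guard added in Persistence.finished(): only the default
    # (main) execution mode writes metadata, unless jobs are submitted
    # immediately, in which case the submitting side never sees completion.
    return exec_mode == ExecMode.DEFAULT or immediate_submit

print(should_write_metadata(ExecMode.DEFAULT, False))  # True
print(should_write_metadata(ExecMode.REMOTE, False))   # False
print(should_write_metadata(ExecMode.REMOTE, True))    # True
```

Writing each record exactly once is what reduces the file-operation count that motivated this change (see the GlusterFS note in the changelog).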
@@ -639,30 +646,32 @@ def _params(self, job):
     def _output(self, job):
         return sorted(job.output)
 
-    def _record(self, subject, json_value, id):
+    def _record(
+        self,
+        subject,
+        json_value,
+        id,
+        mode=stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP,
+    ):
         recpath = self._record_path(subject, id)
-        recdir = os.path.dirname(recpath)
-        os.makedirs(recdir, exist_ok=True)
-        # Write content to temporary file and rename it to the final file.
-        # This avoids race-conditions while writing (e.g. on NFS when the main job
-        # and the cluster node job propagate their content and the system has some
-        # latency including non-atomic propagation processes).
-        with tempfile.NamedTemporaryFile(
-            mode="w",
-            dir=recdir,
-            delete=False,
-            # Add short prefix to final filename for better debugging.
-            # This may not be the full one, because that may be too long
-            # for the filesystem in combination with the prefix from the temp
-            # file.
-            suffix=f".{os.path.basename(recpath)[:8]}",
-        ) as tmpfile:
-            json.dump(json_value, tmpfile)
-        # ensure read and write permissions for user and group
-        os.chmod(
-            tmpfile.name, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP
-        )
-        os.replace(tmpfile.name, recpath)
+        try:
+            recpath_stat = os.stat(recpath)
+        except FileNotFoundError:
+            recpath_stat = None
+        recdir = os.path.dirname(recpath)
+        os.makedirs(recdir, exist_ok=True)
+
+        with open(recpath, "w") as recfile:
+            json.dump(json_value, recfile)
+
+        # ensure read and write permissions for user and group if they don't
+        # include the required mode
+        if recpath_stat is None:
+            os.chmod(recpath, mode)
+        else:
+            existing = stat.S_IMODE(recpath_stat.st_mode)
+            new_mode = existing | mode
+            if existing != new_mode:
+                os.chmod(recpath, new_mode)
 
     def _delete_record(self, subject, id):
         try:
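The permission logic above only calls `chmod` when the existing bits do not already include the required mode, saving a metadata operation per record on filesystems like GlusterFS. A runnable sketch of just that widening step (`ensure_mode` is an illustrative helper, not Snakemake code):

```python
import os
import stat
import tempfile

REQUIRED = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP  # 0o660

def ensure_mode(path, required=REQUIRED):
    """Widen permissions only when needed: chmod is skipped entirely
    when the existing bits already include the required ones."""
    existing = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = existing | required
    if existing != new_mode:
        os.chmod(path, new_mode)
    return oct(stat.S_IMODE(os.stat(path).st_mode))

fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o600)     # start with user read/write only
print(ensure_mode(path))  # adds group read/write -> '0o660'
```

OR-ing in the required bits (rather than overwriting the mode) preserves any extra permissions the file already had, e.g. from a permissive umask.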
@@ -687,15 +696,21 @@ def _read_record_cached(self, subject, id):
     def _read_record_uncached(self, subject, id):
         if not self._exists_record(subject, id):
             return dict()
-        with open(self._record_path(subject, id), "r") as f:
+        path = self._record_path(subject, id)
+        with open(path, "r") as f:
             try:
                 return json.load(f)
-            except json.JSONDecodeError as e:
-                pass
-        # case: file is corrupted, delete it
-        logger.warning("Deleting corrupted metadata record.")
-        self._delete_record(subject, id)
-        return dict()
+            except json.JSONDecodeError:
+                # Since record writing cannot be reliably made atomic (some
+                # network filesystems, e.g. gluster, have issues with writing
+                # to a temp file and then moving it), we ignore corrupted or
+                # incompletely written records here.
+                # They can only occur if a snakemake process is running and one
+                # does a dry-run (or intentionally disables locking) at the
+                # same time.
+                logger.warning(
+                    f"Ignoring corrupted or currently written metadata record {path}."
+                )
+                return dict()
 
     def _exists_record(self, subject, id):
         return os.path.exists(self._record_path(subject, id))
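Because record writes are no longer funneled through a temp-file-plus-rename, a concurrent reader may observe a half-written file; the tolerant read above simply treats it like a missing record. The pattern can be sketched in isolation (`read_record` is an illustrative stand-in):

```python
import json
import os
import tempfile

def read_record(path):
    """Return the parsed record, or an empty dict when the file is missing,
    corrupted, or still being written (non-atomic writes on some network
    filesystems can expose partial content)."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        try:
            return json.load(f)
        except json.JSONDecodeError:
            return {}

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write('{"starttime": 1')  # truncated, as if another process is mid-write
print(read_record(path))  # {}
```

Returning an empty dict instead of deleting the file (as the old code did) avoids destroying a record that another process is in the middle of writing.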
