Snakemake version
7.15.1
Describe the bug
Rules downstream of a checkpoint aggregation rule are always re-executed, even when both the aggregation rule and the downstream rules have already run and their output files exist.
Minimal example
Using the docs example with two rules added downstream of aggregate, Snakemake keeps re-running process and process2 even though their output files exist and nothing upstream has changed. It doesn't matter whether there is one rule after aggregate or several; all of them are re-run every time. If you remove the downstream rules and go back to the exact docs code, things work as expected.
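Steps to reproduce (a sketch, assuming Snakemake 7.15.1 is installed and the Snakefile below is saved in an empty working directory):

```shell
# First run: everything executes, as expected.
snakemake --cores 1

# Second run: process and process2 are re-executed even though
# aggregated.txt, processed.txt, and processed2.txt all exist
# and nothing upstream has changed.
snakemake --cores 1
```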
```python
# a target rule to define the desired final output
rule all:
    input:
        "processed2.txt",


# the checkpoint that shall trigger re-evaluation of the DAG;
# a number of files is created in a defined directory
checkpoint somestep:
    output:
        directory("my_directory/"),
    shell:
        """
        mkdir my_directory/
        cd my_directory
        for i in 1 2 3; do touch $i.txt; done
        """


# input function for rule aggregate; returns paths to all files
# produced by the checkpoint 'somestep'
def aggregate_input(wildcards):
    checkpoint_output = checkpoints.somestep.get(**wildcards).output[0]
    return expand(
        "my_directory/{i}.txt",
        i=glob_wildcards(os.path.join(checkpoint_output, "{i}.txt")).i,
    )


rule aggregate:
    input:
        aggregate_input,
    output:
        "aggregated.txt",
    shell:
        "echo AGGREGATED > {output}"


rule process:
    input:
        "aggregated.txt",
    output:
        "processed.txt",
    shell:
        "echo PROCESSED > {output}"


rule process2:
    input:
        "processed.txt",
    output:
        "processed2.txt",
    shell:
        "echo PROCESSED2 > {output}"
```