
Regression after 7.7.0 (through 7.15.x): rules downstream of a checkpoint aggregation rule keep being re-executed even when all output files exist and nothing upstream has changed #1818

@hermidalc

Description


Snakemake version

7.15.1

Describe the bug

Rules downstream of a checkpoint aggregation rule are re-executed on every invocation, even when the aggregation rule and the downstream rules have already run and all of their output files exist.
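The expectation in this report is the usual make-style rule: a job only needs to run again if one of its outputs is missing or older than one of its inputs. Below is a minimal sketch of that check, assuming modification times are the only criterion. needs_rerun is a hypothetical helper for illustration, not Snakemake's API, and Snakemake itself may weigh other factors when deciding whether to rerun a job.

```python
import os
import tempfile
from pathlib import Path


def needs_rerun(outputs, inputs):
    """Hypothetical helper, not Snakemake's API: return True if any output is
    missing or older than the newest input. Only modification times are
    compared; any other rerun criteria are ignored."""
    if not all(os.path.exists(o) for o in outputs):
        return True  # a missing output always forces a rerun
    if not outputs or not inputs:
        return False  # nothing to compare against
    newest_input = max(os.path.getmtime(i) for i in inputs)
    oldest_output = min(os.path.getmtime(o) for o in outputs)
    return oldest_output < newest_input


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        aggregated = Path(tmp, "aggregated.txt")
        processed = Path(tmp, "processed.txt")
        aggregated.write_text("AGGREGATED\n")
        processed.write_text("PROCESSED\n")
        # Pin mtimes so the output is clearly newer than its input.
        os.utime(aggregated, (1_000, 1_000))
        os.utime(processed, (2_000, 2_000))
        print(needs_rerun([processed], [aggregated]))  # False: up to date
        # Giving the input a newer mtime should make the output stale.
        os.utime(aggregated, (3_000, 3_000))
        print(needs_rerun([processed], [aggregated]))  # True
```

Under this rule, once the outputs of aggregate, process and process2 exist and are each newer than their inputs, none of those jobs would be scheduled again.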

Minimal example

Using the checkpoint example from the docs and adding two rules downstream of aggregate, Snakemake keeps re-running the process and process2 rules even though their output files exist and nothing upstream has changed. It doesn't matter whether there is one rule after aggregate or several: all of them are re-run every time. If you remove the downstream rules and go back to the exact docs code, things work as expected.

```python
import os


# a target rule to define the desired final output
rule all:
    input:
        "processed2.txt",


# the checkpoint that shall trigger re-evaluation of the DAG
# an arbitrary number of files is created in a defined directory
checkpoint somestep:
    output:
        directory("my_directory/"),
    shell:
        """
        mkdir my_directory/
        cd my_directory
        for i in 1 2 3; do touch $i.txt; done
        """


# input function for rule aggregate, return paths to all files produced by the checkpoint 'somestep'
def aggregate_input(wildcards):
    checkpoint_output = checkpoints.somestep.get(**wildcards).output[0]
    return expand(
        "my_directory/{i}.txt",
        i=glob_wildcards(os.path.join(checkpoint_output, "{i}.txt")).i,
    )


rule aggregate:
    input:
        aggregate_input,
    output:
        "aggregated.txt",
    shell:
        "echo AGGREGATED > {output}"


# first rule downstream of the aggregation
rule process:
    input:
        "aggregated.txt",
    output:
        "processed.txt",
    shell:
        "echo PROCESSED > {output}"


# second rule downstream of the aggregation
rule process2:
    input:
        "processed.txt",
    output:
        "processed2.txt",
    shell:
        "echo PROCESSED2 > {output}"
```

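The input function of rule aggregate can be approximated in plain Python to show which files aggregated.txt ends up depending on once the checkpoint has run. This sketch assumes the checkpoint directory holds only flat {i}.txt files. aggregate_input_plain is a hypothetical stand-in that replaces checkpoints.somestep.get(...), glob_wildcards and expand with pathlib globbing, and it sorts its result, which is not assumed of Snakemake's own functions.

```python
import tempfile
from pathlib import Path


def aggregate_input_plain(checkpoint_dir, prefix="my_directory"):
    """Hypothetical stand-in for aggregate_input: list '<prefix>/{i}.txt' for
    every '{i}.txt' file directly inside the checkpoint's output directory."""
    stems = sorted(p.stem for p in Path(checkpoint_dir).glob("*.txt") if p.is_file())
    return [f"{prefix}/{i}.txt" for i in stems]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Mimic what checkpoint 'somestep' creates: 1.txt, 2.txt, 3.txt.
        for i in (1, 2, 3):
            Path(tmp, f"{i}.txt").touch()
        print(aggregate_input_plain(tmp))
        # ['my_directory/1.txt', 'my_directory/2.txt', 'my_directory/3.txt']
```
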
Metadata


Assignees: No one assigned

Labels: bug (Something isn't working)

Type: No type

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
