I'm running into a case where Snakemake is rerunning rules that have already been run, even though the output from those rules is still present. I am specifying all the desired output files in a "rule all". I ran the pipeline to the point where I had all the desired outputs from "rule B", and wanted to restart the pipeline and run just "rule A". But Snakemake reruns "rule B" even though all of its outputs are already present. This isn't the behavior I expect from Snakemake, which should only rerun the rules necessary to reach a target (here, the files specified in the rule all).
When I run Snakemake in dry-run mode, this appears at the end of the output:
Job counts:
    count    jobs
    1        count_matrix
    27       picard_fastq2sam
    27       star_align
    55
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
The output from picard_fastq2sam is used by star_align, and all the outputs from star_align are used by count_matrix. So only one rule should need to run, because the outputs from "picard_fastq2sam" and "star_align" are all already present. My rule all looks like this:
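In outline, it's an ordinary target rule whose input lists the final files the workflow should produce. A minimal sketch of that shape (the path below is an illustrative placeholder, not my actual output file):

```python
rule all:
    input:
        # Final target of the workflow; placeholder path for illustration,
        # not my real config.
        "results/counts/all.tsv"
```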
This workflow is based on the template from https://github.com/snakemake-workflows/rna-seq-star-deseq2, but I've modified it enough that I thought I should post here.
snakemake version is 6.0.5
Any hints on where to look? This is really the opposite of the behavior I expect from snakemake.
When I looked further into the --debug-dag output, I saw a number of blocks like this:
candidate job star_align
    wildcards: sample=8R_S14, unit=lane1
candidate job picard_fastq2sam
    wildcards: sample=8R_S14, unit=lane1
selected job picard_fastq2sam
    wildcards: sample=8R_S14, unit=lane1
file results/picard_fastq2sam/8R_S14-lane1.unaligned.bam:
    Producer found, hence exceptions are ignored.
selected job star_align
    wildcards: sample=8R_S14, unit=lane1
file results/star/8R_S14-lane1.ReadsPerGene.out.tab:
    Producer found, hence exceptions are ignored.
I'm not sure what this means, but it seems relevant.