I'm running into a case where Snakemake is rerunning rules that have already been run, even though the output from those rules is still present. I am specifying all the desired output files in a "rule all". I ran the pipeline to the point where I had all the desired outputs from "rule B", and wanted to restart the pipeline and run just "rule A". But Snakemake reruns "rule B" even though all of its outputs are already present. This isn't the behavior I expect from Snakemake, which should only rerun the rules necessary to reach a target (here, the files specified in the rule all).
When I run Snakemake in dry-run mode, this appears at the end of the output:
Job counts:
    count    jobs
    1        count_matrix
    27       picard_fastq2sam
    27       star_align
    55
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
The output from picard_fastq2sam is used by star_align, and all the outputs from star_align are used by count_matrix. So only one rule should need to run, because the outputs from "picard_fastq2sam" and "star_align" are all already present. My rule all looks like this:
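In outline, it's an ordinary target rule whose input lists the final files the workflow should produce. A minimal sketch of that shape (the path below is an illustrative placeholder, not my actual output file):

```python
rule all:
    input:
        # Final target of the workflow; placeholder path for illustration,
        # not my real config.
        "results/counts/all.tsv"
```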
This workflow is based on the template from https://github.com/snakemake-workflows/rna-seq-star-deseq2, but I've modified it enough that I thought I should post here.
snakemake version is 6.0.5
Any hints on where to look? This is really the opposite of the behavior I expect from snakemake.
When I looked further into the --debug-dag output, I saw a number of blocks like this:
candidate job star_align
    wildcards: sample=8R_S14, unit=lane1
candidate job picard_fastq2sam
    wildcards: sample=8R_S14, unit=lane1
selected job picard_fastq2sam
    wildcards: sample=8R_S14, unit=lane1
file results/picard_fastq2sam/8R_S14-lane1.unaligned.bam:
    Producer found, hence exceptions are ignored.
selected job star_align
    wildcards: sample=8R_S14, unit=lane1
file results/star/8R_S14-lane1.ReadsPerGene.out.tab:
    Producer found, hence exceptions are ignored.
I'm not sure what this means, but it seems relevant.