I am using Snakemake to submit jobs to the cluster. I would like to force a particular rule to run only after all other rules have finished, because the input files for this job (an R script) are not ready until then.
I saw in the Snakemake documentation that flag files can be used to force rule execution order: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#flag-files
I have several rules, but for the sake of simplicity I am showing only my Snakefile and the last two rules (rsem_model and tximport_rsem) below. On my qsub cluster workflow, I want tximport_rsem to execute only after rsem_model has finished. I tried the "touchfile" method, but I have not been able to get it working.
# Snakefile
rule all:
    input:
        expand("results/fastqc/{sample}_fastqc.zip", sample=samples),
        expand("results/bbduk/{sample}_trimmed.fastq", sample=samples),
        expand("results/bbduk/{sample}_trimmed_fastqc.zip", sample=samples),
        expand("results/bam/{sample}_Aligned.toTranscriptome.out.bam", sample=samples),
        expand("results/bam/{sample}_ReadsPerGene.out.tab", sample=samples),
        expand("results/quant/{sample}.genes.results", sample=samples),
        expand("results/quant/{sample}_diagnostic.pdf", sample=samples),
        expand("results/multiqc/project_QS_STAR_RSEM_trial.html"),
        expand("results/rsem_tximport/RSEM_GeneLevel_Summarization.csv"),
        expand("mytask.done")

rule clean:
    shell: "rm -rf .snakemake/"

include: 'rules/fastqc.smk'
include: 'rules/bbduk.smk'
include: 'rules/fastqc_after.smk'
include: 'rules/star_align.smk'
include: 'rules/rsem_norm.smk'
include: 'rules/rsem_model.smk'
include: 'rules/tximport_rsem.smk'
include: 'rules/multiqc.smk'

rule rsem_model:
    input:
        'results/quant/{sample}.genes.results'
    output:
        'results/quant/{sample}_diagnostic.pdf'
    params:
        plotmodel = config['rsem_plot_model'],
        prefix = 'results/quant/{sample}',
        touchfile = 'mytask.done'
    threads: 16
    priority: 60
    shell:"""
        touch {params.touchfile}
        {params.plotmodel} {params.prefix} {output}
        """

rule tximport_rsem:
    input: 'mytask.done'
    output:
        'results/rsem_tximport/RSEM_GeneLevel_Summarization.csv'
    priority: 50
    shell: "Rscript scripts/RSEM_tximport.R"

Here is the error I get when I try a dry run:
snakemake -np
Building DAG of jobs...
MissingInputException in line 1 of /home/yh6314/rsem/tutorials/QS_Snakemake/rules/tximport_rsem.smk:
Missing input files for rule tximport_rsem:
mytask.done
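
For comparison, the flag-file pattern on the linked documentation page declares the flag as a rule output wrapped in `touch()`. In rsem_model above, `mytask.done` appears only under `params`, so no rule lists it as an output, which is consistent with the MissingInputException raised while building the DAG. The following is a minimal sketch of the documented pattern, not a tested fix. The rule name `rsem_model_done` is hypothetical, and `samples` is assumed to be the same list used in `rule all`.

```snakemake
# Sketch only: `samples` is assumed to be the list already used in `rule all`.

# Hypothetical aggregation rule. It depends on every per-sample diagnostic PDF
# and declares the flag file as an output, so Snakemake knows which rule
# produces `mytask.done` when it builds the DAG.
rule rsem_model_done:
    input:
        expand("results/quant/{sample}_diagnostic.pdf", sample=samples)
    output:
        touch("mytask.done")

# Same rule as above, unchanged apart from layout: it now waits for the flag,
# which only exists once all rsem_model jobs have finished.
rule tximport_rsem:
    input:
        "mytask.done"
    output:
        "results/rsem_tximport/RSEM_GeneLevel_Summarization.csv"
    shell:
        "Rscript scripts/RSEM_tximport.R"
```

With this layout the `touch {params.touchfile}` line and the `touchfile` parameter in rsem_model would no longer be needed. An alternative under the same assumptions is to skip the flag file and list the `expand(...)` of the diagnostic PDFs directly as the input of tximport_rsem.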
One important thing to note: if I run this on the head node, I do not need the touch file and everything works fine.
I would appreciate suggestions and help to figure out a workaround.
Thanks in advance.