I'm relatively new to Snakemake, and I'm having trouble figuring out how to count the number of jobs per rule. The Snakefile I am using is below:
    rule test:
        input:
            files = expand("{file}", file=glob.glob("/home/MyData/input/*.csv"))
        output:
            out = expand("{file}", file=glob.glob("/home/MyData/output/*.csv"))
        run:
            with open(output.out, 'r') as input_stream:
                for file in input_stream:
                    print(file)
The Job counts output shows the following (when run with snakemake -j 4 test -n):
    Job counts:
        count   jobs
        1       test
        1
However, in a Snakemake tutorial I found online (link here), the Snakefile looks like this:
    configfile: "config.yaml"

    rule all:
        input:
            "plots/quals.svg",
            "calls/all.vcf",
            "mapped/",
            "mapped/"

    rule map_reads:
        input:
            "data/genome.fa",
            "data/samples/{sample}.fastq"
        output:
            pipe("mapped/{sample}.bam")
        conda:
            "envs/mapping.yaml"
        shell:
            "bwa mem {input} | samtools view -Sb > {output}"

    rule sort:
        input:
            "mapped/{sample}.bam"
        output:
            "mapped/{sample}.sorted.bam"
        conda:
            "envs/mapping.yaml"
        shell:
            "samtools sort -o {output} {input}"

    rule call:
        input:
            genome="data/genome.fa",
            bam=expand("mapped/{sample}.sorted.bam", sample=config["samples"])
        output:
            "calls/all.vcf"
        conda:
            "envs/calling.yaml"
        shell:
            "samtools mpileup -g -f {input.genome} {input.bam} | "
            "bcftools call -mv - > {output}"

    rule plot_qual:
        input:
            "calls/all.vcf"
        output:
            svg=report("plots/quals.svg", caption="report/plot-quals.rst")
        conda:
            "envs/stats.yaml"
        script:
            "scripts/plot-quals.py"
And the Job counts output looks like this (when run with snakemake -j 4 all -n):
    Job counts:
        count   jobs
        1       all
        1       call
        3       map_reads
        1       plot_qual
        3       sort
        9
With the config.yaml file looking like:
    samples:
      - A
      - B
      - C
How can I get my Job counts to show the number of input files run per rule?
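As a rough illustration of where those numbers come from: Snakemake creates one job per rule per distinct set of wildcard values, so a rule without wildcards yields a single job no matter how many files it lists. The plain-Python sketch below only imitates that bookkeeping and is not Snakemake's own code. The rule tables, the `count_jobs` helper, the file names `a`, `b`, `c`, and the restriction that a wildcard cannot contain `/` or `.` are all assumptions made for the example.

```python
import re
from collections import Counter

# Simplified, hypothetical rule tables: name -> (output pattern, input patterns).
TUTORIAL_RULES = {
    "map_reads": ("mapped/{sample}.bam", ["data/samples/{sample}.fastq"]),
    "sort": ("mapped/{sample}.sorted.bam", ["mapped/{sample}.bam"]),
    "call": ("calls/all.vcf", ["mapped/A.sorted.bam",
                               "mapped/B.sorted.bam",
                               "mapped/C.sorted.bam"]),
}

# A per-file variant of the `test` rule, with one wildcard per CSV file.
PER_FILE_RULES = {
    "test": ("output/{name}.csv", ["input/{name}.csv"]),
}


def to_regex(pattern):
    """Compile 'dir/{name}.ext' into a regex with one named group per wildcard."""
    pieces = re.split(r"\{(\w+)\}", pattern)  # odd indices are wildcard names
    regex = ""
    for i, piece in enumerate(pieces):
        # Assumption: wildcard values contain neither '/' nor '.'.
        regex += f"(?P<{piece}>[^/.]+)" if i % 2 else re.escape(piece)
    return re.compile(regex)


def count_jobs(rules, targets):
    """Count one job per (rule, wildcard values) needed to build the targets."""
    jobs = set()
    todo = list(targets)
    while todo:
        path = todo.pop()
        for name, (output, inputs) in rules.items():
            match = to_regex(output).fullmatch(path)
            if match:
                wildcards = match.groupdict()
                job = (name, tuple(sorted(wildcards.items())))
                if job not in jobs:
                    jobs.add(job)
                    # The inputs of this job become new files to resolve.
                    todo.extend(p.format(**wildcards) for p in inputs)
                break
        # Paths that match no rule are treated as existing source files.
    return Counter(name for name, _ in jobs)


tutorial_counts = count_jobs(TUTORIAL_RULES, ["calls/all.vcf"])
per_file_counts = count_jobs(
    PER_FILE_RULES, ["output/a.csv", "output/b.csv", "output/c.csv"]
)
```

Here `tutorial_counts` has one `call` job and three each of `sort` and `map_reads`, while the per-file `test` rule is counted three times because three different values of `name` are requested.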
The test rule has only one "instance" because it deals with multiple files by itself. If you want several "instances" of a rule to happen, you should make a rule that deals with one file, and another one that wants as input multiple files. Then snakemake will figure out that it needs to run the first one multiple times in order to produce the input that the other wants, and make as many jobs. – bli

open(output.out, 'r') works, given that output.out is a list (it is generated by expand, so it is a list.). I wrote some explanations about expand here: stackoverflow.com/a/50216057/1878788 You may be interested at other examples here stackoverflow.com/a/50837428/1878788 and here stackoverflow.com/a/44945591/1878788 – bli
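To make the point about lists concrete, here is a small plain-Python sketch. It assumes that output.out behaves like an ordinary Python list of paths, which is what the comment above describes. The temporary files and the `read_all` helper are hypothetical and exist only for the demonstration.

```python
import os
import tempfile


def read_all(paths):
    """Open each path in turn; `paths` plays the role of output.out."""
    lines = []
    for path in paths:
        with open(path) as stream:
            lines.extend(line.rstrip("\n") for line in stream)
    return lines


with tempfile.TemporaryDirectory() as tmp:
    # Create two small CSV files with hypothetical names.
    paths = []
    for name in ("a", "b"):
        path = os.path.join(tmp, name + ".csv")
        with open(path, "w") as handle:
            handle.write(f"{name},1\n")
        paths.append(path)

    # Passing the whole list to open() is rejected by Python.
    try:
        open(paths)
        open_on_list_failed = False
    except TypeError:
        open_on_list_failed = True

    # Iterating over the list and opening each file works.
    lines = read_all(paths)
```

Inside a rule that handles a single file per job, the same idea reduces to opening the one path that the job receives.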