Sorry if this is a naive question, but I'm still trying to wrap my head around the intricacies of Snakemake.

I have a directory containing a number of files that I want to apply a rule to in parallel (i.e. I want to submit the same script to the cluster, specifying a different input file for each submission).

I first tried using expand for the input files, but this only resulted in one job submission:

CHROMS = [str(c) for c in range(1, 23)] + ["X"]
rule vep:
    input:
        expand("data/split/chr{chrom}.vcf", 
               chrom=CHROMS)
    output:
        expand("data/vep/split/chr{chrom}.ann.vcf",
               chrom=CHROMS)
    shell:
        "vep "
        "{input} "
        "{output}"

Is there an alternative approach here?

Thank you!

1
Also, the shell command in its current form would result in an error, as each line in it would act as a separate command. Instead, you have to escape the newlines with \, i.e. "vep \ {input} \ {output}". Note: newlines next to \ do not seem to format properly in Stack Overflow comments. - Manavalan Gajapathy
@JeeYem I haven't tried, but isn't it possible that this syntax is actually valid, resulting in a concatenation of the 3 strings? - bli
I briefly checked it yesterday, and your syntax would cause a problem. Try the -p flag in snakemake to see the command that will be executed. - Manavalan Gajapathy
@bli The syntax as used is actually valid, and I was wrong. I saw a similar implementation elsewhere and, on testing, snakemake does concatenate them as you suggested. Sorry for the confusion. Not sure how I missed it the last time I tested it. - Manavalan Gajapathy
@JeeYem I think this is a general Python feature that allows splitting strings across several lines. If I remember correctly, I've seen it used, for instance, in argparse help messages. - bli
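
For reference, the behaviour discussed in these comments can be checked in plain Python. This is a minimal sketch: the variable name `command` is only illustrative, and the placeholders are left unformatted because only the string joining is being shown.

```python
# Adjacent string literals are joined by the Python parser, so the
# three-line shell directive from the question is one single string.
command = (
    "vep "
    "{input} "
    "{output}"
)
print(command)
```

No backslashes are needed, which matches what was eventually observed when testing with snakemake.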

1 Answer

Currently, your workflow indeed consists of applying the "vep" rule only once, executing vep with all your inputs and outputs as arguments. I don't know how vep works, but it is likely either failing or not doing what you expect.
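
To make this concrete, here is a rough plain-Python sketch of the single command the rule from the question ends up building. The list comprehensions are stand-ins for the two expand() calls, and the variable names are only illustrative; the one Snakemake behaviour relied on is that a list-valued {input} or {output} is substituted as space-separated paths.

```python
CHROMS = [str(c) for c in range(1, 23)] + ["X"]

# Plain-Python stand-ins for the two expand() calls in the question.
inputs = [f"data/split/chr{chrom}.vcf" for chrom in CHROMS]
outputs = [f"data/vep/split/chr{chrom}.ann.vcf" for chrom in CHROMS]

# A list-valued {input} or {output} is substituted as space-separated
# paths, so the whole rule yields one long command line.
command = "vep " + " ".join(inputs) + " " + " ".join(outputs)

print(len(inputs), "inputs and", len(outputs), "outputs in one command")
print(command[:60] + "...")
```

All 23 inputs and all 23 outputs land in one invocation, which is why only one job is submitted.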

You should probably write your rule's input and output without expansion, keeping {chrom} as a wildcard, and drive it with an "all" rule that does the expand:

CHROMS = [str(c) for c in range(1, 23)] + ["X"]


rule all:
    input:
        expand("data/vep/split/chr{chrom}.ann.vcf",
               chrom=CHROMS)

rule vep:
    input:
        "data/split/chr{chrom}.vcf"
    output:
        "data/vep/split/chr{chrom}.ann.vcf"
    shell:
        "vep "
        "{input} "
        "{output}"

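To generate the desired input of the "all" rule, snakemake will determine how many times and how (i.e. with what value for the chrom wildcard) it needs to apply the "vep" rule. Each application is a separate job, so the jobs can run in parallel or be submitted to the cluster individually.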
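
The idea behind this resolution step can be illustrated in plain Python. This is only a sketch of the principle, not Snakemake's actual implementation: the regular expression, the `jobs` list, and the other variable names are assumptions made for the illustration.

```python
import re

CHROMS = [str(c) for c in range(1, 23)] + ["X"]

# Targets requested by the "all" rule (what expand() would produce).
targets = [f"data/vep/split/chr{chrom}.ann.vcf" for chrom in CHROMS]

# Output pattern of the "vep" rule, with {chrom} turned into a named
# regex group. Illustration only; real matching is more general.
output_pattern = re.compile(r"data/vep/split/chr(?P<chrom>.+)\.ann\.vcf")

jobs = []
for target in targets:
    match = output_pattern.fullmatch(target)
    if match:
        chrom = match.group("chrom")
        # One independent job per target, each with its own input file.
        jobs.append((chrom, f"data/split/chr{chrom}.vcf", target))

print(len(jobs), "jobs")
print(jobs[0])
```

Each requested target matches the output pattern with a different chrom value, so the rule is instantiated once per chromosome rather than once overall.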

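Be sure to put the "all" rule before all other rules: when no target is given on the command line, snakemake uses the first rule in the Snakefile as the default target.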