3
votes

I am just getting started with Snakemake and was wondering what the "correct" way is to run a set of parameters on the same file, and how this would work when chaining rules.

For example, say I want to apply multiple normalization methods, followed by a clustering rule with a varying number of clusters k. What would be the best way to do this so that all combinations are run?

I started doing this:

INFILES = ["mytable"]

rule preprocess:
    input:
        bam=expand("data/{sample}.csv", sample=INFILES, param=config["normmethod"])
    output:
        bamo=expand("results/{sample}_pp_{param}.csv", sample=INFILES, param=config["normmethod"])
    script:
        "scripts/preprocess.py"

And then invoked the script via:

snakemake --config normmethod=Median

But that doesn't really scale to further options later in the workflow. For example, how would I incorporate this set of options automatically?

normmethods= ["Median", "Quantile"]
kclusters= [1,3,5,7,10]
Comments:

The expand in your input contains param, which does not appear in the string that has to be expanded. How does it behave? (bli)

The code above runs through. If I do 'snakemake --config normmethod=Median', the "Median" method is used. If I run the workflow with 'snakemake --config normmethod=Mean', the mean is used. Accordingly, the output files carry the normmethod param in their filename. (Kam Sen)
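
For reference, expand() can be called directly to see how it behaves when a keyword is not present in the pattern. A minimal sketch, assuming an environment where snakemake is installed (expand lives in snakemake.io):

from snakemake.io import expand

# The unused keyword still takes part in the combination product; the pattern
# is simply formatted without it, so the same path shows up once per value.
print(expand("data/{sample}.csv", sample=["mytable"], param=["Median", "Quantile"]))
# -> ['data/mytable.csv', 'data/mytable.csv']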

3 Answers

8
votes

You did well to use the expand() function in your rule.

For the parameters, I recommend using a configuration file containing all of them. Snakemake works with both YAML and JSON configuration files; the Snakemake documentation has all the information about these two formats.

In your case, you just have to write this in a YAML file:

INFILES: ["mytable"]

normmethods: ["Median", "Quantile"]
or
normmethods:
  - "Median"
  - "Quantile"

kclusters: [1, 3, 5, 7, 10]
or
kclusters:
  - 1
  - 3
  - 5
  - 7
  - 10

Write your rule like this:

rule preprocess:
    input:
        bam = expand("data/{sample}.csv",
                     sample = config["INFILES"])

    params:
        kcluster = config["kclusters"]

    output:
        bamo = expand("results/{sample}_pp_{method}_{cluster}.csv",
                      sample = config["INFILES"],
                      method = config["normmethods"],
                      cluster = config["kclusters"])

    script:
        "scripts/preprocess.py"

Then you just have to launch it like this:

snakemake --configfile  path/to/config.yml
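
(As an alternative to the --configfile flag, the Snakefile itself can declare the configuration file with a configfile: "path/to/config.yml" line at the top; both ways fill the same config dictionary.)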

To run with other parameters, you only have to modify your configuration file and not your Snakefile (fewer chances for mistakes), which is also better for readability and keeps the code clean.

EDIT:

Just to correct my own mistake: you don't need expand() on the input here, since you want to run the rule on one .csv file at a time. Just put the wildcard in the input and Snakemake will do its part:

rule preprocess:
    input:
        bam = "data/{sample}.csv"
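
Since the rule uses the script: directive, the script does not receive command-line arguments; instead, Snakemake injects a snakemake object into it. A minimal, hypothetical sketch of scripts/preprocess.py (the actual preprocessing logic is not given in the question, and pandas is only an assumption here):

# scripts/preprocess.py
# Sketch only: the `snakemake` object is injected by the script: directive,
# so input, output and params are available without any argument parsing.
import pandas as pd  # assumption: the tables are handled with pandas

infile = snakemake.input.bam           # e.g. "data/mytable.csv"
kclusters = snakemake.params.kcluster  # e.g. [1, 3, 5, 7, 10]

df = pd.read_csv(infile)
# ... normalization / preprocessing would go here ...
# bamo is a list here because the rule's output uses expand()
for outfile in snakemake.output.bamo:
    df.to_csv(outfile, index=False)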

5
votes

Seems you didn't pass the params to your script. How about something like the following?

import re
import os
import glob
normmethods= ["Median", "Quantile"] # can be set from config['normmethods']    
kclusters= [1,3,5,7,10]             # can be set from config['kclusters']
INFILES = ['results/' + re.sub(r'\.csv$', '_pp_' + m + '-' + str(k) + '.csv',
                               re.sub(r'^data/', '', file))
           for file in glob.glob("data/*.csv")
           for m in normmethods
           for k in kclusters]

rule cluster:
    input: INFILES

rule preprocess:
    input:
        bam="data/{sample}.csv"
    output:
        bamo="results/{sample}_pp_{m}-{k}.csv"
    run:
        os.system("scripts/preprocess.py %s %s %s %s" % (input.bam, output.bamo, wildcards.m, wildcards.k))
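
As a small aside (not part of the original answer): inside a run block, Snakemake also provides its shell() helper, which formats {input}, {output} and {wildcards} just like a shell directive would, so the os.system call could equivalently be written as:

    run:
        shell("scripts/preprocess.py {input.bam} {output.bamo} {wildcards.m} {wildcards.k}")
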
1
vote

This answer is similar to @Shiping's answer in that it uses wildcards in the output of a rule to implement multiple parameters per input file. However, this answer provides a more detailed example and avoids using a complex list comprehension, regular expressions, or the glob module.

@Pereira Hugo's approach uses one job to run all parameter combinations for one input file, whereas the approach in this answer uses one job to run one parameter combination for one input file, which makes it easier to parallelize the execution of each parameter combination on one input file.
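
For example, invoking Snakemake with several cores (snakemake --cores 4, or the short form -j 4) allows up to four of these single-combination preprocess jobs to run at the same time.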

Snakefile:

import os

data_dir = 'data'
sample_fns = os.listdir(data_dir)
sample_pfxes = list(map(lambda p: p[:p.rfind('.')],
                        sample_fns))

res_dir = 'results'

params1 = [1, 2]
params2 = ['a', 'b', 'c']

rule all:
    input:
        expand(os.path.join(res_dir, '{sample}_p1_{param1}_p2_{param2}.csv'),
               sample=sample_pfxes, param1=params1, param2=params2)

rule preprocess:
    input:
        csv=os.path.join(data_dir, '{sample}.csv')

    output:
        csv=os.path.join(res_dir, '{sample}_p1_{param1}_p2_{param2}.csv')

    shell:
        "ls {input.csv} && \
           echo P1: {wildcards.param1}, P2: {wildcards.param2} > {output.csv}"

Directory structure before running snakemake:

$ tree .
.
├── Snakefile
├── data
│   ├── sample_1.csv
│   ├── sample_2.csv
│   └── sample_3.csv
└── results

Run snakemake:

$ snakemake -p
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   all
    18  preprocess
    19

rule preprocess:
    input: data/sample_1.csv
    output: results/sample_1_p1_2_p2_a.csv
    jobid: 1
    wildcards: param2=a, sample=sample_1, param1=2

ls data/sample_1.csv &&          echo P1: 2, P2: a > results/sample_1_p1_2_p2_a.csv
data/sample_1.csv
Finished job 1.
1 of 19 steps (5%) done

rule preprocess:
    input: data/sample_2.csv
    output: results/sample_2_p1_2_p2_a.csv
    jobid: 2
    wildcards: param2=a, sample=sample_2, param1=2

ls data/sample_2.csv &&          echo P1: 2, P2: a > results/sample_2_p1_2_p2_a.csv
data/sample_2.csv
Finished job 2.
2 of 19 steps (11%) done

...

localrule all:
    input: results/sample_1_p1_1_p2_a.csv, results/sample_1_p1_2_p2_a.csv, results/sample_2_p1_1_p2_a.csv, results/sample_2_p1_2_p2_a.csv, results/sample_3_p1_1_p2_a.csv, results/sample_3_p1_2_p2_a.csv, results/sample_1_p1_1_p2_b.csv, results/sample_1_p1_2_p2_b.csv, results/sample_2_p1_1_p2_b.csv, results/sample_2_p1_2_p2_b.csv, results/sample_3_p1_1_p2_b.csv, results/sample_3_p1_2_p2_b.csv, results/sample_1_p1_1_p2_c.csv, results/sample_1_p1_2_p2_c.csv, results/sample_2_p1_1_p2_c.csv, results/sample_2_p1_2_p2_c.csv, results/sample_3_p1_1_p2_c.csv, results/sample_3_p1_2_p2_c.csv
    jobid: 0

Finished job 0.
19 of 19 steps (100%) done

Directory structure after running snakemake:

$ tree .
.
├── Snakefile
├── data
│   ├── sample_1.csv
│   ├── sample_2.csv
│   └── sample_3.csv
└── results
    ├── sample_1_p1_1_p2_a.csv
    ├── sample_1_p1_1_p2_b.csv
    ├── sample_1_p1_1_p2_c.csv
    ├── sample_1_p1_2_p2_a.csv
    ├── sample_1_p1_2_p2_b.csv
    ├── sample_1_p1_2_p2_c.csv
    ├── sample_2_p1_1_p2_a.csv
    ├── sample_2_p1_1_p2_b.csv
    ├── sample_2_p1_1_p2_c.csv
    ├── sample_2_p1_2_p2_a.csv
    ├── sample_2_p1_2_p2_b.csv
    ├── sample_2_p1_2_p2_c.csv
    ├── sample_3_p1_1_p2_a.csv
    ├── sample_3_p1_1_p2_b.csv
    ├── sample_3_p1_1_p2_c.csv
    ├── sample_3_p1_2_p2_a.csv
    ├── sample_3_p1_2_p2_b.csv
    └── sample_3_p1_2_p2_c.csv

Sample result:

$ cat results/sample_2_p1_1_p2_a.csv
P1: 1, P2: a