2
votes

I currently have a snakemake workflow that requires the use of lambda wildcards, set up as follows:

Snakefile:

configfile: "config.yaml"
workdir: config["work"]

rule all:
    input:
        expand("logs/bwa/{ref}.log", ref=config["refs"])

rule bwa_index:
    input:
        lambda wildcards: 'data/'+config["refs"][wildcards.ref]+".fna.gz"
    output:
        "logs/bwa/{ref}.log"
    log:
        "logs/bwa/{ref}.log"
    shell:
        "bwa index {input} 2&>1 {log}"

Config file:

work: /datasets/work/AF_CROWN_RUST_WORK/2020-02-28_GWAS

refs:
    12NC29: GCA_002873275.1_ASM287327v1_genomic
    12SD80: GCA_002873125.1_ASM287312v1_genomic

This works, but I've had to use a hack to get the output of bwa_index to play with the input of all. My hack is to generate a log file as part of bwa_index, set the log to the output of bwa_index, and then set the input of all to these log files. As I said, it works, but I don't like it. The problem is that the true outputs of bwa_index are of the format, for example, GCA_002873275.1_ASM287327v1_genomic.fna.sa. So, to specify these output files, I would need to use a lambda function for the output, something like:

rule bwa_index:
    input:
        lambda wildcards: 'data/'+config["refs"][wildcards.ref]+".fna.gz"
    output:
        lambda wildcards: 'data/'+config["refs"][wildcards.ref]+".fna.sa"
    log:
        "logs/bwa/{ref}.log"
    shell:
        "bwa index {input} 2&>1 {log}"

and then use a lambda function with expand for the input of rule all. However, snakemake will not accept functions as output, so I'm at a complete loss how to do this (other than my hack). Does anyone have suggestions of a sensible solution? TIA!

1

1 Answers

3
votes

You can use a simple python function in the inputs (as the lambda function) so I suggest you use it for the rule all.

configfile: "config.yaml"
workdir: config["work"]

def getTargetFiles():
    targets = list()
    for r in config["refs"]:
        targets.append("data/"+config["refs"][r]+".fna.sa")

    return targets

rule all:
    input:
        getTargetFiles()

rule bwa_index:
    input:
        "data/{ref}.fna.gz"
    output:
        "data/{ref}.fna.sa"
    log:
        "logs/bwa/{ref}.log"
    shell:
        "bwa index {input} 2&>1 {log}"

Careful here the wildcard {ref} is the value and not the key of your dictionnary so your log files will finally be named "logs/bwa/GCA_002873275.1_ASM287327v1_genomic.log", etc...