snakemake: rule's input with different patterns

Question

I am new to Snakemake and would like to use the following rule:

import pandas as pd

input_path = config["PATH"]
samples = pd.read_csv(config["METAFILE"], sep = '\t', header = 0)['sample']
# tmp_path is defined elsewhere in the Snakefile (not shown)

rule getPaired:
        output:
            fwd = temp(tmp_path + "/reads/{sample}_fwd.fastq.gz"),
            rev = temp(tmp_path + "/reads/{sample}_rev.fastq.gz")
        params:
            input_path = input_path
        run:
            shell("scp -i {params.input_path}/{wildcards.sample}_*1*.f*q.gz {output.fwd}")
            shell("scp -i {params.input_path}/{wildcards.sample}_*2*.f*q.gz {output.rev}")

Input files have different naming patterns:

{sampleID}_R[1-2]_001.fq.gz (for example: 2160_J15_S480_R1_001.fastq.gz)
{sampleID}_[1-2].fq.gz (for example: SRX000001_1.fq.gz)

The getPaired rule works for files named like {sample}_[1-2].fq.gz, but not for the {sample}_R[1-2]_001 pattern.

What am I doing wrong?
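One likely cause, judging only from the example names (the thread does not confirm it): the glob `{wildcards.sample}_*1*.f*q.gz` only requires a `1` somewhere after the sample name, and `_R2_001` contains one. The sketch below uses Python's `fnmatch.fnmatchcase`, which approximates shell globbing for plain file names, to list what each glob would match. The R2 and `_2` file names are assumed from the naming patterns above, and `glob_hits` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Example file names from the question; the R2/_2 mates are assumed
# to follow the same naming.
files = [
    "2160_J15_S480_R1_001.fastq.gz",
    "2160_J15_S480_R2_001.fastq.gz",
    "SRX000001_1.fq.gz",
    "SRX000001_2.fq.gz",
]


def glob_hits(sample, mate):
    """Return the files the rule's shell glob would match for one mate."""
    # Same shape as the glob in getPaired: {sample}_*1*.f*q.gz / {sample}_*2*.f*q.gz
    pattern = f"{sample}_*{mate}*.f*q.gz"
    return [f for f in files if fnmatchcase(f, pattern)]


for sample in ["SRX000001", "2160_J15_S480"]:
    print(sample, "fwd:", glob_hits(sample, 1))
    print(sample, "rev:", glob_hits(sample, 2))
```

For the `_R[1-2]_001` files the forward glob matches both mates (because `001` contains a `1`), so the forward command no longer receives a single file. Naming the mate explicitly (`_R1_001` versus `_1`) instead of relying on a wildcard avoids the ambiguity.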

Maarten-vd-Sande · Accepted Answer · 2020-02-21T11:07:57

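Use an input function: a Python function that receives the wildcards and returns the input path for that particular sample, so each sample can have its own directory and file extension. The example below is not exactly your setup, but it shows the idea: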

paths = {'sample1': '/home/jankees/data',
         'sample2': '/mnt/data',
         'sample3': '/home/christina/fastq'}

extensions = {'sample1': '.fq.gz',
              'sample2': '.fq.gz',
              'sample3': '.fastq.gz'}

def get_input(wildcards):
    # Build this sample's input path from its directory and file extension
    input_file = paths[wildcards.sample] + "/read/" + wildcards.sample + extensions[wildcards.sample]
    return input_file

rule all:
    input:
        ["sample1_trimmed.fastq.gz", 
         "sample2_trimmed.fastq.gz", 
         "sample3_trimmed.fastq.gz"]

rule trim:
    input:
        get_input
    output:
        "{sample}_trimmed.fastq.gz"
    shell:
        "touch {output}"
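Applied to the question, the same idea could look like the sketch below. A plain Python helper (hypothetical name `find_pair`) tries each naming scheme from the question and returns the first forward/reverse pair that exists on disk; `make_input_function` wraps it in a function that takes `wildcards`, as Snakemake input functions do. The pattern list, the two extensions, and both helper names are assumptions for illustration, and the `__main__` part only fakes the `wildcards` object so the code can be tried outside a workflow.

```python
import os
import tempfile
from types import SimpleNamespace

# Assumed naming schemes, taken from the question's examples. Both
# extensions are tried because the examples use .fq.gz and .fastq.gz.
PAIR_PATTERNS = [
    ("{sample}_R1_001", "{sample}_R2_001"),  # e.g. 2160_J15_S480_R1_001
    ("{sample}_1", "{sample}_2"),            # e.g. SRX000001_1
]
EXTENSIONS = [".fq.gz", ".fastq.gz"]


def find_pair(input_path, sample):
    """Return (forward, reverse) paths for the first naming scheme whose
    two files both exist in input_path."""
    for fwd_stem, rev_stem in PAIR_PATTERNS:
        for ext in EXTENSIONS:
            fwd = os.path.join(input_path, fwd_stem.format(sample=sample) + ext)
            rev = os.path.join(input_path, rev_stem.format(sample=sample) + ext)
            if os.path.exists(fwd) and os.path.exists(rev):
                return fwd, rev
    raise FileNotFoundError(f"no read pair for sample {sample!r} in {input_path}")


def make_input_function(input_path):
    """Build an input function that returns named inputs for one sample."""
    def get_paired(wildcards):
        fwd, rev = find_pair(input_path, wildcards.sample)
        return {"fwd": fwd, "rev": rev}
    return get_paired


if __name__ == "__main__":
    # Outside Snakemake: create empty example files and a fake wildcards object.
    with tempfile.TemporaryDirectory() as d:
        for name in ["2160_J15_S480_R1_001.fastq.gz", "2160_J15_S480_R2_001.fastq.gz",
                     "SRX000001_1.fq.gz", "SRX000001_2.fq.gz"]:
            open(os.path.join(d, name), "w").close()
        get_paired = make_input_function(d)
        for s in ["2160_J15_S480", "SRX000001"]:
            print(get_paired(SimpleNamespace(sample=s)))
```

In a Snakefile, getPaired could then declare `input: unpack(make_input_function(input_path))` and refer to `{input.fwd}` and `{input.rev}` in its commands instead of globbing; `unpack()` is how Snakemake accepts an input function that returns a dict of named inputs. Checking for existing files keeps the rule agnostic to which scheme a sample uses, at the cost of having to list every scheme up front.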
