0
votes

I try to use snakemake for map and merge some data obtaine from many lanes. I have some problems. What I want to do is this:

*.gz> 432_L001.sam, 432_L002.sam > 432_L001.sorted.bam,432_L002.sorted.bam> 432.bam

So starting from fastq from units create a unic bamfile with the name of the key of samples.

config.yaml

samples:
    "432": ["432_L001", "432_L002"]
    "433": ["433_LOO1","433_L002"]


units:

  "432_L001": [ "RAW/432_CGATGT_L001_R1_001.fastq.gz", "RAW/432_CGATGT_L001_R2_001.fastq.gz"]
  "432_L002": ["RAW/432_CGATGT_L002_R1_001.fastq.gz","RAW/432_CGATGT_L002_R2_001.fastq.gz"]
  "433_L001": ["RAW/433_CAGATC_L001_R1_001.fastq.gz","RAW/433_CAGATC_L001_R2_001.fastq.gz"]
  "433_L002": ["RAW/433_CAGATC_L002_R1_001.fastq.gz","RAW/433_CAGATC_L002_R2_001.fastq.gz"]

snakemake

rule all:
    input: expand("mapped_reads/merged_samples/{A}.bam", A=config["samples"]),
           expand("mapped_reads/bam/{unit}_sorted.bam",unit=config['units'])


include_prefix="rules"


include:
    include_prefix + "/bwa_mem.rules"
include:
    include_prefix + "/samfiles.rules"
include:
    include_prefix + "/picard.rules"

rules

    from snakemake.exceptions import MissingInputException

    rule bwa_mem:
        input:
            lambda wildcards: config["units"][wildcards.unit]
        output:
            temp("mapped_reads/sam/{unit}.sam")
        params:
            #sample=lambda wildcards, UNIT_TO_SAMPLE[wildcards.unit]
            #sample=lambda wildcards: units[wildcards.unit],
            genome= config["reference"]['genome_fasta']
        log:
            "mapped_reads/log/{unit}_bwa_mem.log"
        benchmark:
            "benchmarks/bwa/mem/{unit}.txt"
        threads: 8
        shell:
            '/illumina/software/PROG2/bwa-0.7.15/bwa mem '\
                    '-t {threads} {params.genome} {input} 2> {log} > {output}'
rule picard_SortSam:
   input:
       "mapped_reads/sam/{unit}.sam"
   output:
       temp("mapped_reads/bam/{unit}_sorted.bam")
   benchmark:
       "benchmarks/picard/SortSam/{unit}.txt"
   shell:
       "picard  SortSam I={input} O={output} SO=coordinate"

rule samtools_merge_bam:
    """
    Merge bam files for multiple units into one for the given sample.
    If the sample has only one unit, files will be copied.
    """
    input:
        lambda wildcards: expand("mapped_reads/bam/{unit}_sorted.bam",unit=config["samples"][wildcards.sample])
    output:
        "mapped_reads/merged_samples/{sample}.bam"
    benchmark:
        "benchmarks/samtools/merge/{sample}.txt"
    run:
        if len(input) > 1:
            shell("samtools merge {output} {input}")
        else:
            shell("cp {input} {output} && touch -h {output}")

If I use this code I have always this error:

InputFunctionException in line 50 of /home/maurizio/Desktop/TEST_exome/rules/bwa_mem.rules:
KeyError: '433_LOO1'
Wildcards:
unit=433_LOO1

How can resolve?

What it is wrong in this wildcard..??:

lambda wildcards: expand("mapped_reads/bam/{unit}_sorted.bam",unit=config["samples"][wildcards.sample])

1

1 Answers

1
votes

You seem to have a confusion between "0" and "O" in your config:

samples: "432": ["432_L001", "432_L002"] "433": ["433_LOO1","433_L002"] -----------------------------^^

"433_L001": ["RAW/433_CAGATC_L001_R1_001.fastq.gz","RAW... -----------^^