I try to use snakemake for map and merge some data obtaine from many lanes. I have some problems. What I want to do is this:
*.gz> 432_L001.sam, 432_L002.sam > 432_L001.sorted.bam,432_L002.sorted.bam> 432.bam
So starting from fastq from units create a unic bamfile with the name of the key of samples.
config.yaml
samples:
"432": ["432_L001", "432_L002"]
"433": ["433_LOO1","433_L002"]
units:
"432_L001": [ "RAW/432_CGATGT_L001_R1_001.fastq.gz", "RAW/432_CGATGT_L001_R2_001.fastq.gz"]
"432_L002": ["RAW/432_CGATGT_L002_R1_001.fastq.gz","RAW/432_CGATGT_L002_R2_001.fastq.gz"]
"433_L001": ["RAW/433_CAGATC_L001_R1_001.fastq.gz","RAW/433_CAGATC_L001_R2_001.fastq.gz"]
"433_L002": ["RAW/433_CAGATC_L002_R1_001.fastq.gz","RAW/433_CAGATC_L002_R2_001.fastq.gz"]
snakemake
rule all:
input: expand("mapped_reads/merged_samples/{A}.bam", A=config["samples"]),
expand("mapped_reads/bam/{unit}_sorted.bam",unit=config['units'])
include_prefix="rules"
include:
include_prefix + "/bwa_mem.rules"
include:
include_prefix + "/samfiles.rules"
include:
include_prefix + "/picard.rules"
rules
from snakemake.exceptions import MissingInputException
rule bwa_mem:
input:
lambda wildcards: config["units"][wildcards.unit]
output:
temp("mapped_reads/sam/{unit}.sam")
params:
#sample=lambda wildcards, UNIT_TO_SAMPLE[wildcards.unit]
#sample=lambda wildcards: units[wildcards.unit],
genome= config["reference"]['genome_fasta']
log:
"mapped_reads/log/{unit}_bwa_mem.log"
benchmark:
"benchmarks/bwa/mem/{unit}.txt"
threads: 8
shell:
'/illumina/software/PROG2/bwa-0.7.15/bwa mem '\
'-t {threads} {params.genome} {input} 2> {log} > {output}'
rule picard_SortSam:
input:
"mapped_reads/sam/{unit}.sam"
output:
temp("mapped_reads/bam/{unit}_sorted.bam")
benchmark:
"benchmarks/picard/SortSam/{unit}.txt"
shell:
"picard SortSam I={input} O={output} SO=coordinate"
rule samtools_merge_bam:
"""
Merge bam files for multiple units into one for the given sample.
If the sample has only one unit, files will be copied.
"""
input:
lambda wildcards: expand("mapped_reads/bam/{unit}_sorted.bam",unit=config["samples"][wildcards.sample])
output:
"mapped_reads/merged_samples/{sample}.bam"
benchmark:
"benchmarks/samtools/merge/{sample}.txt"
run:
if len(input) > 1:
shell("samtools merge {output} {input}")
else:
shell("cp {input} {output} && touch -h {output}")
If I use this code I have always this error:
InputFunctionException in line 50 of /home/maurizio/Desktop/TEST_exome/rules/bwa_mem.rules:
KeyError: '433_LOO1'
Wildcards:
unit=433_LOO1
How can resolve?
What it is wrong in this wildcard..??:
lambda wildcards: expand("mapped_reads/bam/{unit}_sorted.bam",unit=config["samples"][wildcards.sample])