I am new in using snakemake, I have a simple problem when doing mapping in snakemake. I have couples of _1.fastq.gz & _2.fastq.gz and I would like to do pair-end mapping for around 10 pairs of fastq.gz. So I wrote a snakemake file for this.
import os
import snakemake.io
import glob
(SAMPLES,READS,) = glob_wildcards("raw/{sample}_{read}.fastq.gz")
READS=["1","2"]
REF="/data/data/reference/refs/ucsc.hg19.fasta.gz"
rule all:
input: expand("raw/{sample}.bam",sample=SAMPLES)
rule bwa_map:
input:
ref=REF,
r1=expand("raw/{sample}_{read}.fastq.gz",sample=SAMPLES,read=READS),
r2=expand("raw/{sample}_{read}.fastq.gz",sample=SAMPLES,read=READS)
output: "raw/{sample}.bam"
shell: "bwa mem -M -t 8 {input.ref} {input.r1} {input.r2} | samtools view -Sbh - > {output}"
Error:
RuleException:
CalledProcessError in line 17 of /data/data/Samples/snakemake-example/WGS-test/step2.smk:
Command ' set -euo pipefail; bwa mem -M -t 8 /data/data/reference/refs/ucsc.hg19.fasta.gz raw/sub1_1.fastq.gz raw/sub1_2.fastq.gz raw/sub2_1.fastq.gz raw/sub2_2.fastq.gz raw/sub1_1.fastq.gz raw/sub1_2.fastq.gz raw/sub2_1.fastq.gz raw/sub2_2.fastq.gz raw/sub1_1.fastq.gz raw/sub1_2.fastq.gz raw/sub2_1.fastq.gz raw/sub2_2.fastq.gz raw/sub1_1.fastq.gz raw/sub1_2.fastq.gz raw/sub2_1.fastq.gz raw/sub2_2.fastq.gz | samtools view -Sbh - > raw/sub2.bam ' returned non-zero exit status 1.
File "/data/data/Samples/snakemake-example/WGS-test/step2.smk", line 17, in __rule_bwa_map
File "/root/miniconda3/envs/bioinfo/lib/python3.6/concurrent/futures/thread.py", line 56, in run
My desired output would be like generate all 10 bam files like
sub1.bam sub2.bam sub3.bam ...
It looks like put all the fastq files into a command. how can i separate them and run pair by pair automatically without using hard code method. Please advice.
expand
is generally needed only in the input of a rule that needs to gather results from several instances of an "upstream" rule. Theall
rule is typically used to do this gathering. – bli