
I want to run bcl2fastq to generate fastq files from bcl format.

Depending on the sequencing setup (the sequencing mode and how many indexes were used), it can generate read1, read2, index1 or read1, read2, index1, index2, and so on.

What I want to do is put the list of read outputs in the config.yaml file like this:

readids: ['I1','I2','R1','R2']

and let the rule figure out automatically how many outputs (fastq.gz files) it should generate.

How do I write the output section to achieve this?

Below is what I have. It only produces one output file per run of the rule, so the rule actually runs 4 times, once each for I1, I2, R1 and R2, which is not what I want. How do I fix this at line 45? In line 45, {readid} is supposed to be one of I1, I2, R1, R2.

 39 rule bcl2fastq:
 40     input:
 41         "/data/MiniSeq/test"
 42     params:
 43         prefix="0_fastq"
 44     output:
 45         "0_fastq/{runid}_S0_L001_{readid}_001.fastq.gz"
 46     log:
 47         "0_fastq/bcl2fastq_log.txt"
 48     shell:
 49         """
 50         bcl2fastq -R {input} -o {params.prefix} --create-fastq-for-index-reads --barcode-mismatches 1 --use-bases-mask {config[bcl2mask]} --minimum-trimmed-read-length 1 --mask-short-adapter-reads 1 --no-bgzf-compression &> {log}
 52
 53         """

1 Answer


You are looking for the expand() function, which fills in the given variables and returns a list of output files. You just need to be careful to escape any wildcards that should "survive the formatting" by using double curly brackets.

So in your case:

output:
      expand("0_fastq/{{runid}}_S0_L001_{readid}_001.fastq.gz", readid=config['readids'])

This will replace readid with each value given in config['readids'] and keep the runid wildcard.
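
To see why the double curly brackets matter, here is a minimal plain-Python sketch of the escaping rule. It only imitates the behaviour with str.format and is not Snakemake's own implementation; the hard-coded readids list is an assumption standing in for config['readids'].

```python
# Plain-Python illustration of the brace-escaping rule (not Snakemake's code).
pattern = "0_fastq/{{runid}}_S0_L001_{readid}_001.fastq.gz"

# Assumed stand-in for config['readids'] from config.yaml.
readids = ["I1", "I2", "R1", "R2"]

# str.format fills in {readid}; the escaped {{runid}} comes out as {runid},
# so it is still available as a wildcard afterwards.
outputs = [pattern.format(readid=r) for r in readids]

for path in outputs:
    print(path)
```

Each resulting path still contains {runid}, so the rule declares all four files as outputs of a single job per run ID instead of running once per read ID.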

Andreas