I was wondering if there is a way to have optional inputs in rules. An example case is excluding unpaired reads from alignment (or having only unpaired reads). A pseudo rule example:
```python
rule hisat2_align:
    input:
        # unpaired reads only exist for trimmed samples; '' otherwise
        rU=lambda wildcards: ('-U ' + read_files[wildcards.reads]['unpaired']) if wildcards.read_type == 'trimmed' else '',
        # first and second mates of the paired-end reads
        r1=lambda wildcards: '-1 ' + read_files[wildcards.reads]['R1'],
        r2=lambda wildcards: '-2 ' + read_files[wildcards.reads]['R2']
    output:
        'aligned/{reads}_{read_type}.sam'
    params:
        idx='index_prefix',
        extra=''
    shell:
        'hisat2 {params.extra} -x {params.idx} {input.rU} {input.r1} {input.r2} -S {output}'
```
Here, when the reads are not trimmed, `rU` evaluates to `''`, which results in a missing input file error.
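A minimal plain-Python sketch of why `''` fails, assuming Snakemake treats every string in `input` as a file path that must exist. The helper `missing_inputs` is hypothetical and only mimics that check: an empty string is still one (non-existent) path, whereas an empty list declares no paths at all.

```python
import os


def missing_inputs(paths):
    """Return the paths that do not exist on disk.

    Hypothetical stand-in for Snakemake's missing-input check.
    """
    return [p for p in paths if not os.path.exists(p)]


# '' is a single path that can never exist, so it is reported as missing.
print(missing_inputs([""]))

# An empty list declares no files, so nothing can be missing.
print(missing_inputs([]))
```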
I can work around this with a duplicate rule that has adjusted input/shell statements, or by handling the input through `params` (I'm sure there are other ways). I'm trying to handle this neatly so that this step can be run through a Snakemake wrapper (currently a custom one).
The closest example I've seen is https://groups.google.com/d/msg/snakemake/qX7RfXDTDe4/XTMOoJpMAAAJ and Johannes' answer there. But that covers a conditional assignment (e.g. `input: 'a' if condition else 'b'`), not an optional one.
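To make the distinction concrete, here is a sketch of two hypothetical input functions. For brevity they take a plain boolean instead of Snakemake's `wildcards` object, and `a`/`b` are the placeholder file names from the example above. A conditional input always yields exactly one file; an optional input yields either one file or none.

```python
def conditional_input(condition):
    # Conditional: always exactly one file, just a different one.
    return "a" if condition else "b"


def optional_input(condition):
    # Optional: either one file or no file at all (an empty list).
    return ["a"] if condition else []
```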
Any help/guidance will be appreciated.
P.S. Optional inputs could also help with a varying number of hisat2 indexes (as noted here: https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/hisat2.html).
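For the index case, one option is to declare the index files as a list derived from the prefix. This sketch assumes the naming that `hisat2-build` documents (`<prefix>.1.ht2` through `<prefix>.8.ht2`, with `.ht2l` for large indexes). The helper name and the `n_files` parameter are hypothetical; `n_files` only exists in case a build writes a different number of files.

```python
def hisat2_index_files(prefix, n_files=8, large=False):
    """List the index files expected for a hisat2 index prefix.

    Assumes <prefix>.<i>.ht2 naming (or .ht2l for large indexes).
    """
    suffix = "ht2l" if large else "ht2"
    return [f"{prefix}.{i}.{suffix}" for i in range(1, n_files + 1)]


print(hisat2_index_files("index_prefix")[:2])
```

Listing the names explicitly, rather than globbing for existing files, keeps the dependency visible to Snakemake even when the index is built by another rule; `params.idx` would still pass the bare prefix to `-x`.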
EDIT
To clarify the potential inputs:
1) Use single-end reads alone and declare them in `rU`. Read files for the sample might be:
sample1_single_1.fastq.gz
sample1_single_2.fastq.gz
In this case, `r1` and `r2` may be empty lists or not declared at all in the rule.
2) Use paired-end reads and declare them in `r1` and `r2`. Read files for the sample might be:
sample1_paired_1_R1.fastq.gz
sample1_paired_1_R2.fastq.gz
sample1_paired_2_R1.fastq.gz
sample1_paired_2_R2.fastq.gz
In this case, `rU` may be an empty list or not declared at all in the rule.
3) Use paired-end and single-end reads together (e.g. output from Trimmomatic, where some pairs are broken). Read files for the sample might be:
sample1_paired_1_R1.fastq.gz
sample1_paired_1_R2.fastq.gz
sample1_paired_2_R1.fastq.gz
sample1_paired_2_R2.fastq.gz
sample1_unpaired_1_R1.fastq.gz
sample1_unpaired_1_R2.fastq.gz
sample1_unpaired_2_R1.fastq.gz
sample1_unpaired_2_R2.fastq.gz
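All three cases can be covered by one function that sorts a sample's files into `rU`, `r1` and `r2` lists, any of which may come back empty. This is a sketch that assumes the naming scheme listed above. The function name and regex are hypothetical, and any file that is not a `_paired_<n>_R1/R2` mate is treated as unpaired (this covers both the `_single_` files and the Trimmomatic `_unpaired_` orphans).

```python
import re

# Matches paired mates such as sample1_paired_1_R1.fastq.gz; the captured
# group is the mate number (1 or 2). "_unpaired_" does not match because
# "paired" is preceded by "un", not by "_".
PAIRED = re.compile(r"_paired_\d+_R([12])\.fastq\.gz$")


def split_reads(files):
    """Sort a sample's read files into rU, r1 and r2 lists (possibly empty)."""
    groups = {"rU": [], "r1": [], "r2": []}
    # Sorting keeps the R1 and R2 lists in matching mate order.
    for path in sorted(files):
        match = PAIRED.search(path)
        if match:
            groups["r" + match.group(1)].append(path)
        else:
            groups["rU"].append(path)
    return groups
```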
As a solution, I ended up using @timofeyprodanov's approach. I didn't realize an empty list could be used for this. Thanks for all the answers and comments!
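A sketch of how the empty-list idea might be combined with the flags: the input functions return plain file lists (possibly `[]`), and the `-U`/`-1`/`-2` flags are added only for non-empty lists. It assumes that hisat2 accepts comma-separated file lists and, as in the rule above, that `-1`/`-2` and `-U` can be given in one call. `hisat2_read_args` and `_as_list` are hypothetical helpers, not part of Snakemake or the wrapper.

```python
import shlex


def _as_list(files):
    # Treat '', None and [] as "no files"; wrap a single path string in a list.
    if not files:
        return []
    return [files] if isinstance(files, str) else list(files)


def hisat2_read_args(rU, r1, r2):
    """Build hisat2 read arguments, leaving out any flag whose list is empty."""
    rU, r1, r2 = _as_list(rU), _as_list(r1), _as_list(r2)
    if len(r1) != len(r2):
        raise ValueError("r1 and r2 must list the same number of files")
    args = []
    if r1:
        # Comma-separated lists; shlex.quote only adds quotes when needed.
        args += ["-1", shlex.quote(",".join(r1)), "-2", shlex.quote(",".join(r2))]
    if rU:
        args += ["-U", shlex.quote(",".join(rU))]
    return " ".join(args)


print(hisat2_read_args(["s1.fq.gz", "s2.fq.gz"], [], []))
```

Inside the rule, a `params` function that receives `input` (for example `reads=lambda wildcards, input: hisat2_read_args(input.rU, input.r1, input.r2)`) could then replace the three `{input.*}` fields in the shell command, assuming named inputs that resolved to `[]` arrive as empty lists.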