I have implemented some NGS data analysis workflows with Nextflow. I used "Paired End" channels (fromFilePairs method) for some of my workflow processes. I ran into a problem I wasn't expecting after multiple workflow executions : my samples ID would sometimes be mixed, resulting in inaccurate outputs for the processes where it happened. I think this is related to the Non-deterministic input channels issue (https://www.nextflow.io/blog/2019/troubleshooting-nextflow-resume.html).
Let's suppose I apply my worklow on these paired-end files : sample1_R{1,2}.fastq, sample2_R{1,2}.fastq
process Step1 {
input:
tuple pair_ID, file(A) from channelA
tuple pair_ID, file(B) from channelB
tuple pair_ID, file(C) from channelC
...
}
For this kind of process with more than one "tuple pair_ID" as input, the data pair_ID (= my samples names) can be mixed up and my process would end up using randomly input files A and B of the sample1, and the input file C of the sample2 instead of all files (A,B,C) of the same pair_ID (key = only sample1 or only sample2). I had this randomly mixed input filenames issue (which impacted the outputs) after several workflow executions, after using -resume when an error occurred but also after full successful workflow runs.
In order to have the same key (pair_ID) between the input files emitted by each of the 3 channels, I used the join
operator:
Process Step1 {
input:
tuple pair_ID, file(A), file(B), file(C) from channelA.join(channelB).join(channelC)
...
}
This operator seems to make everything work as expected, I don't see any mix in my sample IDs and in my final outputs. In the doc (https://www.nextflow.io/docs/latest/operator.html?highlight=join#join), join
seems to be suited for a 2 channels use only, so I am unsure if I am using it right for 3 channels.
Is my method using join
legit ? Or does it still have some flaws ?
Is there a better way to correct my issue ?
If I am unsure that this method is correct to avoid any mix in my samples ID, I might change to another workflow management system such as Snakemake but I would really like to solve this issue and to continue using Nextflow.
Thank you in advance, don't hesitate if something isn't clear !