5
votes

I have a rule that iterates over a file, pulls out the FASTQ file paths, and runs Trim Galore on each FASTQ file. However, some of the files are corrupted or truncated, so Trim Galore fails to process them. It continues to run on the remaining files, but the overall rule fails and Snakemake deletes the output folder, including the successfully processed files. How do I retain the output folder?

I tried altering the shell command to ignore the exit status, but Snakemake seems to enforce set -euo pipefail within the shell section of the rule.

rule trimGalore:
    """
    This module takes in the temporary file created by parse sampleFile rule and determines if libraries are single end or paired end.
    The appropriate step for trimGalore is then run and a summary of the runs is produced in summary_tg.txt
    """
    input:
        rules.parse_sampleFile.output[1]+"singleFile.txt", rules.parse_sampleFile.output[1]+"pairFile.txt"
    output:
        directory(projectDir+"/trimmed_reads/")
    log:
        projectDir+"/logs/"+stamp+"_trimGalore.log"
    params:
        p = trimGaloreParams
    shell:
        """
        (awk -F "," '{{print $2}}' {input[0]} |while read i; do echo $(date +"%Y-%m-%d %H:%M:%S") >>{log}; echo "$USER">>{log}; trim_galore {params.p} --gzip -o {output} $i; done
        awk -F "," '{{print $2" "$3}}' {input[1]} |while read i; do echo $(date +"%Y-%m-%d %H:%M:%S") >>{log}; echo "$USER">>{log}; trim_galore --paired {params.p} --gzip -o {output} $i; done) 2>>{log}
        """

I am happy for it to continue processing the remaining FASTQ files if one fails, but I want the rule's output folder to be kept even when the job finishes with a failure, so that I can carry on working with the non-truncated files.

2
I found one workaround was to use params instead of output, and it did not seem to cause any issues. - jimh

2 Answers

1
votes

Currently, your rule declares the entire directory as its output, so if any error occurs along the way, Snakemake considers the job as a whole to have failed and discards the output (i.e. your entire folder).

The solution I could think of would be related to this section of the Snakemake docs, and the one just below it on Functions as input.

def myfunc(wildcards):
    return [... a list of input files depending on given wildcards ...]

rule:
    input: myfunc
    output: "someoutput.{somewildcard}.txt"
    shell: "..."

With this, you could iterate over your file so that Snakemake creates one job per FASTQ file; if an individual job fails, only that job's output file is removed.

Disclaimer: This is something I just learned and haven't tried yet, but it will be useful to me as well!
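
To make the one-job-per-FASTQ idea a bit more concrete, here is a minimal, untested-against-Snakemake sketch of the Python side. The helper names (read_single_end, sample_name, build_targets) are hypothetical, and it assumes that the FASTQ path sits in column 2 of the comma-separated file (as in the awk call in the question), that the file already exists when the Snakefile is parsed (if it is produced by another rule, a checkpoint would probably be needed instead), and that Trim Galore names single-end output <sample>_trimmed.fq.gz.

```python
import csv
import os


def read_single_end(path):
    """Return the FASTQ paths found in column 2 of a comma-separated file."""
    with open(path, newline="") as handle:
        return [row[1].strip() for row in csv.reader(handle) if len(row) > 1]


def sample_name(fastq):
    """Strip the directory and a trailing .gz, then .fastq or .fq."""
    name = os.path.basename(fastq)
    for suffix in (".gz", ".fastq", ".fq"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name


def build_targets(fastqs, out_dir):
    """Map sample name -> FASTQ path and list the expected trimmed files."""
    by_sample = {sample_name(f): f for f in fastqs}
    # Assumption: single-end Trim Galore output is <sample>_trimmed.fq.gz
    targets = [os.path.join(out_dir, s + "_trimmed.fq.gz") for s in by_sample]
    return by_sample, targets


# Hypothetical use inside a Snakefile (not executed here):
#
# FASTQ, TARGETS = build_targets(read_single_end("singleFile.txt"), "trimmed_reads")
#
# rule all:
#     input: TARGETS
#
# rule trim_single:
#     input: lambda wildcards: FASTQ[wildcards.sample]
#     output: "trimmed_reads/{sample}_trimmed.fq.gz"
#     shell: "trim_galore --gzip -o trimmed_reads {input}"
```

Running snakemake with --keep-going should then let the jobs for intact files finish even when the job for a truncated file fails, and only the failed job's output is discarded.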

0
votes

I had a similar-ish issue. My approach was to declare a dummy file as the output and move the real output directory to params.

rule trimGalore:
    """
    This module takes in the temporary file created by parse sampleFile rule and determines if libraries are single end or paired end.
    The appropriate step for trimGalore is then run and a summary of the runs is produced in summary_tg.txt
    """
    input:
        rules.parse_sampleFile.output[1]+"singleFile.txt", rules.parse_sampleFile.output[1]+"pairFile.txt"
    output:
        dummy = "dummy.txt",
    log:
        projectDir+"/logs/"+stamp+"_trimGalore.log"
    params:
        p = trimGaloreParams,
        dir = projectDir+"/trimmed_reads/"  # plain path; directory() is only meaningful in output
    shell:
        """
        (awk -F "," '{{print $2}}' {input[0]} |while read i; do echo $(date +"%Y-%m-%d %H:%M:%S") >>{log}; echo "$USER">>{log}; trim_galore {params.p} --gzip -o {params.dir} $i; done
        awk -F "," '{{print $2" "$3}}' {input[1]} |while read i; do echo $(date +"%Y-%m-%d %H:%M:%S") >>{log}; echo "$USER">>{log}; trim_galore --paired {params.p} --gzip -o {params.dir} $i; done) 2>>{log}  && touch {output.dummy}
        """

I cannot test this, and you may need to tinker a bit, but it may bear fruit.
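
If the shell's exit-status handling keeps getting in the way, one alternative is to drive the commands from Python and record failures instead of aborting. The following is only a sketch: run_tolerant is a hypothetical helper, and it assumes each command is given as an argument list (for example ["trim_galore", "--gzip", "-o", out_dir, fastq]), which could be built inside a run: block.

```python
import subprocess
import sys


def run_tolerant(commands, log_path):
    """Run each command in turn; log failures and carry on with the rest."""
    failed = []
    with open(log_path, "a") as log:
        for cmd in commands:
            # No check=True, so a non-zero exit does not raise an exception
            result = subprocess.run(cmd, stdout=log, stderr=log)
            if result.returncode != 0:
                log.write(f"FAILED ({result.returncode}): {' '.join(cmd)}\n")
                log.flush()
                failed.append(cmd)
    return failed


if __name__ == "__main__":
    # Stand-in commands so the sketch runs without trim_galore installed
    ok = [sys.executable, "-c", "print('trimmed')"]
    bad = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(len(run_tolerant([ok, bad, ok], "tolerant_demo.log")), "command(s) failed")
```

The dummy output file would then be touched only after the loop has finished, and the returned list tells you which FASTQ files need to be re-downloaded or excluded.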