
I'm new to snakemake and to using clusters, so I would appreciate any help!

I have a Snakefile that works fine on a server, but when I try to run it on the cluster, I have not found the right commands to submit a job and have it execute. It "stalls", as other users have reported: https://groups.google.com/forum/#!searchin/snakemake/cluster|sort:relevance/snakemake/dFxRIgKDxUU/od9az3MuBAAJ

I am running it on an SGE cluster where there is only one node (the head node) that we submit jobs through. We can't run jobs interactively or run intensive commands on the head node. Usually I would run a bwa command like so:

qsub -V -b y 'bwa mem -t 20 /reference/hg38.fa in/R_1.fastq in/R_2.fastq |samtools view -S -bh -@ 7 > aln_R.bam' 

So I followed the FAQ about submitting jobs to the cluster via the head node, which suggests this command:

qsub -N PIPE -cwd -j yes python snakemake --cluster "ssh user@headnode_address 'qsub -N pipe_task -j yes -cwd -S /bin/sh ' " -j

This did not work for me because my terminal expected python to be a file. To actually invoke snakemake, I had to use this:

qsub -V -N test -cwd -j y -b y snakemake --cluster "qsub " -j 1

The -b y flag tells qsub to accept the command either as a binary or as a script. If I run this, qstat shows the job running, but there is an internal error and it never finishes.

Also, the flags inside the quoted "qsub " string end up being treated as snakemake arguments. When I try to use SGE flags such as -j y, I get errors from snakemake along these lines:

qsub -V -N test -cwd -j y -b y snakemake --cluster "qsub -j y" -j 1
snakemake: error: argument --cores/--jobs/-j: invalid int value: 'y' 

I can submit the snakemake shell scripts in the tmp directory perfectly fine, but only if I drop the -b y flag and add the -S /bin/bash flag. So the scripts themselves work, but I think the way they are being pushed to the cluster from the head node is somehow not working. I could be totally off target as well! I would love any direction on how to talk to my sys-admins about SGE, because I don't really know what to ask them about my problem.

In conclusion: Has anyone else come across the need to invoke -b y for snakemake --cluster to run on SGE? And has it also treated "qsub" as a snakemake command? Or does anyone have another workaround for submitting jobs on the head node for SGE? What questions should I ask my SGE sys-admins?


2 Answers


To simplify things:

  1. You shouldn't need to name your job (-N PIPE)
  2. You shouldn't need to set the working directory (-cwd)
  3. Snakemake handles the STDOUT and STDERR of jobs itself, so you shouldn't need to merge them (-j yes)
  4. I don't know enough about this flag to say; keep it ('-b y')
  5. You might need the -S argument as well, see below.

Qsub Arguments:

[-b y[es]|n[o]]      handle command as binary
[-S path_list]       command interpreter to be used
[-V]                 export all environment variables

Try the calls below from the directory containing your Snakefile. My SGE cluster requires the '-S /bin/bash' argument. I have theories about '-S', but I cannot say for sure why it is needed. The answer in this post reflects a lot of my suspicions as to why: SGE Cluster - script fails after submission - works in terminal

TRY

$ snakemake --jobs 10 --cluster "qsub -V -b y"

OR

$ snakemake --jobs 10 --cluster "qsub -V -b y -S /bin/bash"

This way you have your Snakemake arguments (--jobs & --cluster), clearly separated from your qsub arguments (-V, -b & -S).

Your Snakefile should look something like this. It could be better coded, but this is the basic idea.

rule run_bwa:
    input:
        "in/R_1.fastq", "in/R_2.fastq"
    output:
        "aln_R.bam"
    shell:
        "bwa mem -t 20 /reference/hg38.fa {input} | samtools view -S -bh -@ 7 > {output}"
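As a side note, Snakemake fills in the {input} and {output} placeholders before the shell command runs, and multiple input files are space-joined. A simplified Python sketch of that substitution (real Snakemake does this internally with its own wildcard machinery):

```python
# Simplified sketch of Snakemake's placeholder substitution: the rule's
# shell string is a template, and the two input files are space-joined.
template = ("bwa mem -t 20 /reference/hg38.fa {input} "
            "| samtools view -S -bh -@ 7 > {output}")

rendered = template.format(
    input="in/R_1.fastq in/R_2.fastq",  # both inputs, space-separated
    output="aln_R.bam",
)
print(rendered)
```

The rendered string is exactly the bwa/samtools pipeline from the question, which is why the rule above reproduces your manual qsub command.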

EDIT Responding to OP's comment.

TL;DR I wish you the best. I don't think this is how Snakemake was intended to be used. Inti Pedroso re-invented the wheel; you will likely have to do the same. Since you reference his post as well, I will point out that he specifies that the sys-admins "prefer" not to have Snakemake run on the head node, out of fear it will consume too many resources.

PID   USER      PR   NI VIRT  RES  SHR S %CPU  %MEM  TIME+  COMMAND
26389 tboyarsk  19   0  318m  62m  11m R 99.8  0.1   0:10.96 snakemake

This is a 1000-job DAG using 14 of the 20+ Snakemake modules I have coded. It ends up using 100% of a CPU, but for less than 15 seconds, and memory usage didn't exceed 500MB. I strongly recommend you test the waters with your sys-admins one more time before you begin workarounds. Getting permission will save you a lot of time.

http://snakemake.readthedocs.io/en/stable/project_info/faq.html#how-can-i-run-snakemake-on-a-cluster-where-its-main-process-is-not-allowed-to-run-on-the-head-node

https://bitbucket.org/snakemake/snakemake/issues/25/running-snakemake-via-cluster-engine

I'm in the process of renaming these as per my employer's request, so they aren't super descriptive yet. 4 samples which, after realignment, are split and processed per chromosome prior to re-building, annotation, and summation of the data.

Job counts:
count   jobs
4   alignBAM
1   all
8   canonical
8   catVCF
4   cosmic
4   dpsnp
4   filteredBAM
4   indel
4   indexBAM
336 mPileSPLIT
4   markdupBAM
672 mpileup2SPLIT
4   sortBAM
8   tableGET
4   undoBAM
1069

EDIT May 26th 2017

Added to clarify resource consumption on the head node by a Snakemake submission of a large pipeline.

From experience, here's an idea of the strain/resource consumption on the head node caused by running this pipeline. Resource consumption peaks within the first 30 seconds of the pipeline being submitted. After that, head node resource consumption is trivial: the head node just uses minimal resources to monitor the jobs' status and submit the next call, as schedulers normally do. There are no further resource-intensive computations.

Scope

  • 17GB BAM Files (4 Samples)
  • Duration (6 hours when run in parallel)
  • Head node usage after the first 15-20 second DAG assembly is trivial.

Timeline

  1. Start
  2. 15-20 seconds of head node competition for resources (<500MB) while the DAG is determined and assembled.
  3. Jobs are qsub'd from the head node to child nodes via Snakemake, nearly instantly. Very little overhead, mostly string concatenation and variable linking. This continues until all the jobs have been submitted.

When you say you can't use nodes interactively are you sure your cluster admins have banned the use of qrsh and qlogin as well as ssh? Those two commands submit jobs to the cluster that can give you an interactive shell but are under the control of SGE.

My suspicion is that you are running into an issue with double parsing of the command line: once on job submission, and once when SGE tries to start your command. Rather than trying to submit the whole thing as a command line, write your snakemake command in a shell file and submit that (without -b y):

#!/bin/sh
#$ -S /bin/sh
exec snakemake -j 1 --cluster "qsub -j y"
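To see why the quoting breaks, here is a small Python sketch of the double parsing (a hypothetical illustration, not how SGE literally stores commands): the first shell pass strips the quotes, so the second pass splits the --cluster string apart and the stray -j y lands in snakemake's own arguments.

```python
import shlex

# The command line as typed: quotes protect the --cluster argument.
original = 'snakemake --cluster "qsub -j y" -j 1'

# First parse (your interactive shell): "qsub -j y" stays one word.
first_pass = shlex.split(original)
# -> ['snakemake', '--cluster', 'qsub -j y', '-j', '1']

# The words are re-joined with plain spaces; the quotes are now gone.
stored = " ".join(first_pass)

# Second parse (when the job is launched): the qsub flags leak apart, so
# snakemake sees its -j followed by 'y' and argparse rejects it.
second_pass = shlex.split(stored)
# -> ['snakemake', '--cluster', 'qsub', '-j', 'y', '-j', '1']
print(second_pass)
```

This reproduces the "invalid int value: 'y'" symptom from the question: after the second parse, -j is paired with 'y' instead of '1'. Submitting a shell file sidesteps the second parse entirely.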

Alternatively, create a wrapper script that embeds the options you want snakemake to use when invoking qsub for subordinate jobs:

#!/bin/sh
exec qsub -j y "$@"

Then tell snakemake to use that:

qsub -V -N test -cwd -j y -b y snakemake  -j 1 --cluster "wrapper"

Alternatively, play around with the command lines you have, adding extra layers of escaping and quoting until something works.