
I'm new to snakemake and to using clusters, so I would appreciate any help!

I have a Snakefile that works fine on a server, but when I try to run it on the cluster, I have not found the right commands to submit a job and have it execute. It "stalls", as other users have reported: https://groups.google.com/forum/#!searchin/snakemake/cluster|sort:relevance/snakemake/dFxRIgKDxUU/od9az3MuBAAJ

I am running it on an SGE cluster where there is only one node (the head node) that we submit jobs through. We can't run jobs interactively or run intensive commands on the head node. Usually I would run a bwa command like so:

qsub -V -b y 'bwa mem -t 20 /reference/hg38.fa in/R_1.fastq in/R_2.fastq |samtools view -S -bh -@ 7 > aln_R.bam' 

So I followed the FAQ about submitting jobs to the cluster via the head node, which suggests this command:

qsub -N PIPE -cwd -j yes python snakemake --cluster "ssh user@headnode_address 'qsub -N pipe_task -j yes -cwd -S /bin/sh ' " -j

This did not work for me because my terminal expected python to be a file. To actually invoke snakemake, I had to use this:

qsub -V -N test -cwd -j y -b y snakemake --cluster "qsub " -j 1

The -b y flag tells qsub to accept the command either as a binary or as a script. If I run this, qstat shows the job running, but there is an internal error and it never finishes.

Also, the flags inside the quoted "qsub " string end up being treated as snakemake arguments. When I try to use SGE flags such as -j y, I get errors from snakemake along these lines:

qsub -V -N test -cwd -j y -b y snakemake --cluster "qsub -j y" -j 1
snakemake: error: argument --cores/--jobs/-j: invalid int value: 'y' 

I can submit the snakemake shell scripts in the tmp directory perfectly fine, but only if I drop the -b y flag and add the -S /bin/bash flag. So the scripts themselves work, but I think the way they are being pushed to the cluster from the head node is somehow not working. I could be totally off target as well! I would love any direction on how to talk to my sys-admins about SGE, because I don't really know what to ask them about my problem.

In conclusion: Has anyone else come across the need to invoke -b y for snakemake --cluster to run on SGE? And has it also treated "qsub" as a snakemake command? Or does anyone have another workaround for submitting jobs on the head node for SGE? What questions should I ask my SGE sys-admins?


2 Answers


To simplify things:

  1. You shouldn't need to name your job (-N PIPE)
  2. You shouldn't need to set the working directory (-cwd)
  3. Snakemake handles the STDOUT and STDERR of jobs itself, so you shouldn't need to merge them (-j yes)
  4. I don't know enough about this flag to say; keep it ('-b y')
  5. You might need the -S argument as well, see below.

Qsub Arguments:

[-b y[es]|n[o]]      handle command as binary
[-S path_list]       command interpreter to be used
[-V]                 export all environment variables

Try the calls below from the directory containing your Snakefile. My SGE cluster requires the '-S /bin/bash' argument. I have theories about '-S', but I cannot say for sure why it is needed. The answer in this post reflects a lot of my suspicions as to why: SGE Cluster - script fails after submission - works in terminal

TRY

$ snakemake --jobs 10 --cluster "qsub -V -b y"

OR

$ snakemake --jobs 10 --cluster "qsub -V -b y -S /bin/bash"

This way you have your Snakemake arguments (--jobs & --cluster), clearly separated from your qsub arguments (-V, -b & -S).

Your Snakefile should look something like this. It could be better coded, but this is the basic idea.

rule run_bwa:
    input:
        "in/R_1.fastq", "in/R_2.fastq"
    output:
        "aln_R.bam"
    shell:
        "bwa mem -t 20 /reference/hg38.fa {input} | samtools view -S -bh -@ 7 > {output}"
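As a side note, Snakemake fills in the {input} and {output} placeholders before the shell command runs, and multiple input files are space-joined. A simplified Python sketch of that substitution (real Snakemake does this internally with its own wildcard machinery):

```python
# Simplified sketch of Snakemake's placeholder substitution: the rule's
# shell string is a template, and the two input files are space-joined.
template = ("bwa mem -t 20 /reference/hg38.fa {input} "
            "| samtools view -S -bh -@ 7 > {output}")

rendered = template.format(
    input="in/R_1.fastq in/R_2.fastq",  # both inputs, space-separated
    output="aln_R.bam",
)
print(rendered)
```

The rendered string is exactly the bwa/samtools pipeline from the question, which is why the rule above reproduces your manual qsub command.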

EDIT Responding to OP's comment.

TL;DR I wish you the best. I don't think this is how Snakemake was intended to be used. Inti Pedroso re-invented the wheel; you will likely have to do the same. Since you reference his post as well, I will point out that he specifies that the sys-admins "prefer" not to have Snakemake run on the head node, out of fear it will consume too many resources.

PID   USER      PR   NI VIRT  RES  SHR S %CPU  %MEM  TIME+  COMMAND
26389 tboyarsk  19   0  318m  62m  11m R 99.8  0.1   0:10.96 snakemake

This is a 1000-job DAG using 14 of the 20+ Snakemake modules I have coded. It ends up using 100% of a CPU, but for less than 15 seconds, and memory usage didn't exceed 500MB. I strongly recommend you test the waters with your sys-admins one more time before you begin workarounds. Getting permission will save you a lot of time.

http://snakemake.readthedocs.io/en/stable/project_info/faq.html#how-can-i-run-snakemake-on-a-cluster-where-its-main-process-is-not-allowed-to-run-on-the-head-node

https://bitbucket.org/snakemake/snakemake/issues/25/running-snakemake-via-cluster-engine

I'm in the process of renaming these as per my employer's request, so they aren't super descriptive yet. 4 samples which, after realignment, are split and processed per chromosome prior to re-building, annotation, and summation of the data.

Job counts:
count   jobs
4   alignBAM
1   all
8   canonical
8   catVCF
4   cosmic
4   dpsnp
4   filteredBAM
4   indel
4   indexBAM
336 mPileSPLIT
4   markdupBAM
672 mpileup2SPLIT
4   sortBAM
8   tableGET
4   undoBAM
1069

EDIT May 26th 2017

Added to clarify resource consumption on the head node by a Snakemake submission of a large pipeline.

From experience, here's an idea of the strain/resource consumption on the head node caused by running this pipeline. Resource consumption peaks within the first 30 seconds of the pipeline being submitted. After that, head node resource consumption is trivial: the head node just uses minimal resources to monitor the jobs' status and submit the next call, as schedulers normally do. There are no further resource-intensive computations.

Scope

  • 17GB BAM Files (4 Samples)
  • Duration (6 hours when run in parallel)
  • Head node usage after the first 15-20 second DAG assembly is trivial.

Timeline

  1. Start
  2. 15-20 seconds of head node competition for resources (<500MB) while the DAG is determined and assembled.
  3. Jobs are qsub'd from the head node to child nodes via Snakemake, nearly instantly. Very little overhead, mostly string concatenation and variable linking. This continues until all the jobs have been submitted.

When you say you can't use nodes interactively are you sure your cluster admins have banned the use of qrsh and qlogin as well as ssh? Those two commands submit jobs to the cluster that can give you an interactive shell but are under the control of SGE.

My suspicion is that you are running into an issue with double parsing of the command line: once on job submission, and once when SGE tries to start your command. Rather than trying to submit the whole thing as a command line, write your snakemake command in a shell file and submit that (without -b y):

#!/bin/sh
#$ -S /bin/sh
exec snakemake -j 1 --cluster "qsub -j y"
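To see why the quoting breaks, here is a small Python sketch of the double parsing (a hypothetical illustration, not how SGE literally stores commands): the first shell pass strips the quotes, so the second pass splits the --cluster string apart and the stray -j y lands in snakemake's own arguments.

```python
import shlex

# The command line as typed: quotes protect the --cluster argument.
original = 'snakemake --cluster "qsub -j y" -j 1'

# First parse (your interactive shell): "qsub -j y" stays one word.
first_pass = shlex.split(original)
# -> ['snakemake', '--cluster', 'qsub -j y', '-j', '1']

# The words are re-joined with plain spaces; the quotes are now gone.
stored = " ".join(first_pass)

# Second parse (when the job is launched): the qsub flags leak apart, so
# snakemake sees its -j followed by 'y' and argparse rejects it.
second_pass = shlex.split(stored)
# -> ['snakemake', '--cluster', 'qsub', '-j', 'y', '-j', '1']
print(second_pass)
```

This reproduces the "invalid int value: 'y'" symptom from the question: after the second parse, -j is paired with 'y' instead of '1'. Submitting a shell file sidesteps the second parse entirely.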

Alternatively, create a wrapper script that embeds the options you want snakemake to use when invoking qsub for subordinate jobs:

#!/bin/sh
exec qsub -j y "$@"

Then tell snakemake to use that:

qsub -V -N test -cwd -j y -b y snakemake  -j 1 --cluster "wrapper"

Alternatively, play around with the command lines you have, adding extra layers of escaping and quoting until something works.