4
votes

I am working on a rather complex snakemake workflow that spawns several hundred thousand jobs. Everything works: the workflow executes and the DAG gets created (thanks to the new checkpoint implementation), but it is unbearably slow. I think the bottleneck is in checking off the successfully completed jobs before continuing to the next batch. More annoyingly, snakemake does this check sequentially for all the jobs started in a batch, and the next batch is only executed once all checks have succeeded. Each job takes about one or two seconds to execute and runs in parallel (1 core per job), but snakemake then cycles through the job completion checks one at a time, taking 5 to 10 seconds per job, so the whole process per batch takes minutes. See the part of the log below, which shows the timestamps of consecutive jobs finished in the same batch: there is roughly an 11-second gap between consecutive timestamps.

Finished job 42227. 5853 of 230419 steps (3%) done [Thu Feb 28 09:41:09 2019]
Finished job 119645. 5854 of 230419 steps (3%) done [Thu Feb 28 09:41:21 2019]
Finished job 161354. 5855 of 230419 steps (3%) done [Thu Feb 28 09:41:32 2019]
Finished job 160224. 5856 of 230419 steps (3%) done [Thu Feb 28 09:41:42 2019]
Finished job 197063. 5857 of 230419 steps (3%) done [Thu Feb 28 09:41:53 2019]
Finished job 200063. 5858 of 230419 steps (3%) done [Thu Feb 28 09:42:04 2019]
Finished job 45227. 5859 of 230419 steps (3%) done [Thu Feb 28 09:42:15 2019]
Finished job 44097. 5860 of 230419 steps (3%) done [Thu Feb 28 09:42:26 2019]
Finished job 5387. 5861 of 230419 steps (3%) done [Thu Feb 28 09:42:37 2019]
Finished job 158354. 5862 of 230419 steps (3%) done [Thu Feb 28 09:42:48 2019]

So for 20 parallel processes, about 2 seconds are spent on computation, but the workflow then sits idle for 20 x 11 = 220 seconds before it continues with the next 20 jobs.

In the run above I had 200K+ jobs. If I scale down, the time between log entries becomes much shorter:

Finished job 2630. 5 of 25857 steps (0.02%) done [Thu Feb 28 10:14:31 2019]
Finished job 886. 6 of 25857 steps (0.02%) done [Thu Feb 28 10:14:31 2019]
Finished job 9606. 7 of 25857 steps (0.03%) done [Thu Feb 28 10:14:31 2019]
Finished job 4374. 8 of 25857 steps (0.03%) done [Thu Feb 28 10:14:32 2019]
Finished job 6118. 9 of 25857 steps (0.03%) done [Thu Feb 28 10:14:32 2019]
Finished job 7862. 10 of 25857 steps (0.04%) done [Thu Feb 28 10:14:32 2019]
Finished job 278. 11 of 25857 steps (0.04%) done [Thu Feb 28 10:14:32 2019]
Finished job 2022. 12 of 25857 steps (0.05%) done [Thu Feb 28 10:14:33 2019]

With 25K jobs the checking time is now < 1 second.

I hope I am simply missing an argument or parameter here, but I fear it may be something non-trivial.

I run snakemake with the following flags:

    snakemake --keep-going --snakefile My.Snakefile --configfile config.yaml -j 23 --max-jobs-per-second 23
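For reference, this is what those flags do, as I understand the snakemake CLI help:

    # --keep-going              : continue with independent jobs when one job fails
    # --snakefile / --configfile: my workflow definition and its configuration
    # -j 23                     : run at most 23 jobs (one core each) in parallel
    # --max-jobs-per-second 23  : throttle how many new jobs are started per second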

I am running my workflow locally on a 24 core system with 256 GB ram.

Any help to speed things up would greatly be appreciated!

John

1
In my experience snakemake does not scale well to 10K jobs or more; I have come to believe this is an inherent limitation, but the developers of snakemake may be able to suggest some optimisations. - Chris_Rands
I had the same problem (twitter.com/yokofakun/status/891937697983602688); I switched to nextflow.io. - Pierre
Note: for this kind of problem, you should ask on biostars.org or bioinformatics.stackexchange.com. - Pierre
Hi Pierre, the problem you describe in your tweet seems to be solved, or is at least not an issue for me. The DAG creation step for my full analysis (500K+ jobs) takes 2 minutes at startup and about 15 minutes after the first checkpoint completes and the DAG is updated, which I find acceptable given the complexity. It's just that each job requires 11 seconds to check for correct completion. - John van Dam

1 Answer

3
votes

I've 'solved' my issue for now by replacing the "one to many to one" parts of my workflow with calls to GNU parallel.

Pros:

  • No file checking, which removes the snakemake overhead
  • Removes the need for checkpoints
  • Simpler DAG

Cons:

  • No file checking, so failed jobs are harder to track down, and entire computationally heavy sections of the workflow would have to be redone, not just the files that failed.
  • I need to dive into other logs to find out what exactly went wrong.

I suggest using the --halt option with 'now,fail=1' and the --joblog option of parallel to catch problematic files.
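A minimal sketch of such an invocation (the input pattern, tool name and output paths are just placeholders, not my actual commands):

    # run one task per input file on 23 cores; stop immediately if any task fails,
    # and record per-task exit codes and runtimes in a job log for later inspection
    find input_dir -name '*.in' | \
        parallel --jobs 23 --halt now,fail=1 --joblog parallel.joblog \
            'my_tool {} > results/{/.}.out'

The joblog then tells you which inputs failed, so you can rerun only those.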

On a test set I've reduced my computation time from 4 hours to 15 minutes.

I still feel that snakemake should be able to handle this somehow, but I need to move on with my project.