
I have a series of jobs that need to be done, with no dependencies between them. I'm looking for a tool that will help me distribute these jobs to machines. The only restriction is that each machine should run only one job at a time. I'm trying to maximize throughput, because the jobs are not very balanced in size. My current hacked-together shell scripts are inefficient: I pre-build a per-machine job queue, so jobs can't move from the queue of a heavily loaded machine to one that is sitting idle because it has already finished everything.

Previous suggestions have included SLURM, which seems like overkill, and LoadLeveler, which seems like even more overkill.

GNU Parallel looks like almost exactly what I want, but the remote machines don't speak SSH; a custom job launcher (which has no queueing capabilities) is used instead. What I'd like is GNU Parallel where the machine name can simply be substituted into a shell script on the fly, just before the job is dispatched.

So, in summary:

  • Input: a list of jobs and a list of machines that can accept them. Goal: maximize throughput. A solution as close to plain shell as possible is preferred.

In the worst case, something can be hacked together in bash with lockfile (from procmail), but I feel as if a better solution must exist somewhere.

Have you considered using the shell's built-in job control? Something like: while the number of jobs is >= maxjobs, sleep .1; then start the next command with &. (comment by technosaurus)
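A minimal bash sketch of that suggestion, assuming bash (where jobs -r -p prints one PID per running background job). The helper name run_throttled is hypothetical. Note that it only caps how many commands run at once on the local host; by itself it does not choose which remote machine is free.

```shell
#!/bin/bash
# Sketch of the comment's idea: start each command in the background,
# but never let more than $1 of them run at the same time.
# run_throttled is a hypothetical helper name, not an existing tool.
run_throttled() {
  local maxjobs=$1 cmd
  shift
  for cmd in "$@"; do
    # Poll until a slot frees up; jobs -rp lists PIDs of running background jobs
    while [ "$(jobs -rp | wc -l)" -ge "$maxjobs" ]; do
      sleep 0.1
    done
    bash -c "$cmd" &
  done
  wait   # block until the remaining background commands have finished
}
```

For example, run_throttled 2 "job_a" "job_b" "job_c" keeps at most two of the three (hypothetical) commands running at any moment.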

2 Answers


Assuming your jobs are listed one per line in a text file jobs.tab, like:

/path/to/job1
/path/to/job2
...

Create dispatcher.sh as something like

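#!/bin/bash
mkfifo /tmp/jobs.fifo
while true; do
  IFS= read -r JOB
  if test -z "$JOB"; then
    break
  fi
  echo -n "Dispatching job $JOB .."
  # Opening the FIFO for writing blocks until a launcher opens it for reading
  printf '%s\n' "$JOB" > /tmp/jobs.fifo
  echo ".. taken!"
done
# Wake any launcher still blocked opening the FIFO (opening a FIFO read-write
# does not block on Linux), remove the FIFO, then close it so that launcher
# reads EOF and exits.
exec 3<> /tmp/jobs.fifo
rm /tmp/jobs.fifo
exec 3>&-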

and run one instance of

dispatcher.sh < jobs.tab

Now create launcher.sh as

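#!/bin/bash
# Usage: launcher.sh MACHINE
while true; do
  # Stops on EOF, or once the FIFO has been removed (the redirection fails)
  IFS= read -r JOB < /tmp/jobs.fifo || break
  if test -z "$JOB"; then
    break
  fi

  # launch job $JOB on machine $1 from your custom launcher

done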

Start the dispatcher first (it creates /tmp/jobs.fifo), then run one instance of launcher.sh per target machine, giving the machine as its first and only argument.
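With the FIFO scripts, the kernel decides which waiting launcher receives each line, and when several launchers are idle at once (for instance right at startup) they can race on the same FIFO and may split a line between them or read EOF and exit early. In the spirit of the lockfile idea from the question, here is a minimal sketch of a different technique: a shared queue file from which each per-machine worker pops one line under an exclusive lock. It assumes Linux with flock(1) from util-linux; run_on_machine, pop_job and worker are hypothetical names, and the queue file is consumed as jobs are taken.

```shell
#!/bin/bash
# Sketch: pull-based job queue shared by one worker per machine.
# ASSUMPTIONS: flock(1) from util-linux is available, and run_on_machine
# is a hypothetical stand-in for the site's custom job launcher.

# Hypothetical launcher stand-in; replace with the real dispatch command.
run_on_machine() {
  printf '%s %s\n' "$1" "$2"
}

# Remove and print the first line of queue file $1 while holding an
# exclusive lock. Exits with status 1 once the queue is empty or missing.
pop_job() {
  (
    exec 9> "$1.lock"   # lock file kept next to the queue
    flock 9             # block until no other worker holds the lock
    [ -s "$1" ] || exit 1
    IFS= read -r line < "$1"
    tail -n +2 "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    printf '%s\n' "$line"
  )                     # the lock is released when the subshell closes fd 9
}

# Run jobs from queue $2 on machine $1, one at a time, until none are left.
worker() {
  local machine=$1 queue=$2 job
  while job=$(pop_job "$queue"); do
    [ -n "$job" ] || continue   # skip blank lines
    run_on_machine "$machine" "$job"
  done
}
```

For example, copy jobs.tab to jobs.queue, then start one background worker per machine with for m in server1 server2; do worker "$m" jobs.queue & done; wait (server1 and server2 are placeholder host names). Because a worker only pops a job when its machine is idle, a machine that finishes early keeps pulling work instead of waiting on a pre-built queue.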


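GNU Parallel lets you use your own command instead of ssh in the -S (--sshlogin) argument, and -j1 runs only one job at a time on each machine. So this should work: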

# Used in place of ssh: $1 is the host; replace the echo with a call to your custom launcher
my_submit() { echo "On host $1 run command $3"; }
export -f my_submit
parallel -j1 -S "my_submit server1,my_submit server2" my_command ::: arg1 arg2