1
votes

I am new to Slurm and I also found the related questions about this topic. However, I am still confused about several points of how to use srun. According to the official document, srun will typically first allocate resources and then run the parallel jobs. For example, I want to run 20 tasks and if I submit my job based on the following script, I am not sure how many tasks are created. Because sbatch only takes care of allocating resources instead of executing program.

#!/bin/sh
#SBATCH -n 20
#SBATCH --mpi=pmi2
#SBATCH -o myoutputfile.txt
module load mpi/mpich-x86_64
mpirun mpiprogram < inputfile.txt

If I am trying to run sequential program like the following, I am not whether there will be a difference or not. For example, I can simply remove the srun command in this script. What will happen?

#!/bin/sh
#SBATCH -n 1
#SBATCH -N 1
srun tar zxf julia-0.3.11.tar.gz
echo "prefix=/software/julia-0.3.11" > julia/Make.user
cd julia
srun make
1

1 Answers

1
votes

The first example will spawn 20 tasks ; sbatch will request 20 CPUs and also set up the environment so that mpirun knows how many CPUs were requested for the job. mpirun will then spawn as many processes as were allocated (provided that OpenMPI was compiled with Slurm support).

The #SBATCH --mpi=pmi2 part is meant for srun so it will have no effect if srun is not called in the submission script.

In the second example, there will be no difference in the number of processes spawned as only one is needed. But, with srun, the output of sstat will be more reliable, the management of signals will be more precise, and the buffering of the output will be more controlled (via the srun command line options).

If you request multiple tasks, srun will instantiate that many processes. It can be an MPI program, or a sequential program that adapts its behaviour based on the SLURM_PROC_ID environment variable.

Also you can run multiple srun in the same submission script. Each instance of srun (called a "step") is then accounted separately in the accounting (sacct).

Finally, srun can use a subset of the allocation and organise the micro-scheduling of many small tasks in a single job (see the example in the srun manpage).