
I'm trying to use a cluster to run an MPI code. The cluster, named "aria", consists of 30 nodes, each with the following specs: 16 cores across 2 sockets (Intel Xeon E5-2650 v2), i.e. 32 logical CPUs with multithreading enabled, and 64 GB of 1866 MT/s main memory.

The Slurm batch script contains the following directives:

#SBATCH --ntasks=64                     # Number of MPI ranks
#SBATCH --cpus-per-task=1               # Number of cores per MPI rank
#SBATCH --nodes=2                       # Number of nodes
#SBATCH --ntasks-per-node=32             # How many tasks on each node
#SBATCH --ntasks-per-socket=16          # How many tasks on each CPU or socket
#SBATCH --mem-per-cpu=100mb             # Memory per core

When I submit the job, it is rejected with the message: sbatch: error: Batch job submission failed: Requested node configuration is not available, which is a little confusing. I'm submitting one task per CPU and dividing the tasks equally between nodes and sockets. Can anyone advise on what is wrong with the above configuration? And one more thing: what is the optimal configuration given the hardware specs?

Thanks in advance


1 Answer


Look at exactly what the nodes offer with the sinfo -Nl command.
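For instance, the following commands show what Slurm believes each node can schedule (the node name in the second command is just an illustration; substitute one of your cluster's node names):

```shell
# List every node with its schedulable CPU count; the S:C:T column
# shows sockets:cores:threads as configured in slurm.conf.
sinfo -Nl

# Inspect a single node in detail (CPUTot, Sockets, CoresPerSocket,
# ThreadsPerCore, RealMemory); "aria01" is a placeholder node name.
scontrol show node aria01
```

If CPUS (or CPUTot) reports 16 rather than 32, Slurm is scheduling physical cores only, and a request for 32 tasks per node cannot be satisfied.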

It could be that:

  • hyper-threading is not enabled (which is often the case on HPC clusters)
  • or one core is reserved for Slurm and the operating system
  • or hyper-threading is enabled but Slurm is configured to schedule physical cores only
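In the first and third cases, each node only offers 16 schedulable CPUs, so --ntasks-per-node=32 exceeds what any node provides. A sketch of a request that would fit under that assumption is to spread the same 64 ranks over 4 nodes:

```shell
#SBATCH --ntasks=64                     # Number of MPI ranks
#SBATCH --cpus-per-task=1               # One core per MPI rank
#SBATCH --nodes=4                       # 4 nodes x 16 physical cores = 64
#SBATCH --ntasks-per-node=16            # Matches the physical core count
#SBATCH --ntasks-per-socket=8           # 8 cores per socket on this CPU
#SBATCH --mem-per-cpu=100mb             # Memory per core
```

If sinfo shows 32 CPUs per node instead, the original request should work and the limit is likely elsewhere (e.g. a reserved core).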

As for the optimal job configuration, it depends on how 'optimal' is defined. For optimal time to solution, it is often better to let Slurm decide how to organise the ranks across the nodes, because it will then be able to start your job sooner:

#SBATCH --ntasks=64                     # Number of MPI ranks
#SBATCH --mem-per-cpu=100mb             # Memory per core

For optimal job performance (in the case of benchmarks, cost analysis, etc.) you will need to take network switches into account as well, although with 30 nodes you probably have only one switch:

#SBATCH --ntasks=64                     # Number of MPI ranks
#SBATCH --exclusive
#SBATCH --switches=1
#SBATCH --mem-per-cpu=100mb             # Memory per core

Using --exclusive will make sure your job does not share nodes with other jobs.
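Putting it together, a minimal complete job script for the performance-oriented case might look as follows (the application name is a placeholder for your MPI binary):

```shell
#!/bin/bash
#SBATCH --ntasks=64                     # Number of MPI ranks
#SBATCH --exclusive                     # Do not share nodes with other jobs
#SBATCH --switches=1                    # Keep all nodes behind one switch
#SBATCH --mem-per-cpu=100mb             # Memory per core

# srun launches one process per task as requested above;
# ./my_mpi_app is a hypothetical MPI executable.
srun ./my_mpi_app
```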