I want to parallelize an R script on an HPC cluster with the Slurm scheduler.
Slurm is configured with SelectTypeParameters=CR_Core_Memory.
Each compute node has 16 cores (32 threads).
I pass the R script to Slurm with the following template, using clustermq as the interface to Slurm:
#!/bin/sh
#SBATCH --job-name={{ job_name }}
#SBATCH --partition=normal
#SBATCH --output={{ log_file | /dev/null }} # you can add .%a for array index
#SBATCH --error={{ log_file | /dev/null }}
#SBATCH --mem-per-cpu={{ memory | 2048 }}
#SBATCH --cpus-per-task={{ n_cpus }}
#SBATCH --array=1-{{ n_jobs }}
#SBATCH --ntasks={{ n_tasks }}
#SBATCH --nodes={{ n_nodes }}
#ulimit -v $(( 1024 * {{ memory | 4096 }} ))
R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
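For context, this is roughly how the template is used from the R side (a minimal sketch following the clustermq documentation; the template path, the function fx and the template values are placeholders, not taken from my actual script):

library(clustermq)

# tell clustermq to submit via Slurm using the template above
# (the path is a placeholder)
options(
  clustermq.scheduler = "slurm",
  clustermq.template  = "~/clustermq_slurm.tmpl"
)

# fx is a placeholder workload; the template list fills {{ n_cpus }} etc.
fx <- function(x) x^2
Q(fx, x = 1:100,
  n_jobs = 1,
  template = list(n_cpus = 16, n_tasks = 1, n_nodes = 1))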
Within the R script I do "multicore" parallelization with 30 cores. I would like to use cores from multiple nodes to satisfy the requirement of 30 CPUs, e.g. 16 cores from node1 and 14 from node2.
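Concretely, the parallel part looks roughly like this (a sketch; heavy_computation stands in for the real workload, which is not shown here):

library(parallel)

# fork-based "multicore" parallelism: mclapply() forks the current R
# process, so all 30 workers run on the node hosting that process
results <- mclapply(
  seq_len(1000),
  function(i) heavy_computation(i),   # heavy_computation is a placeholder
  mc.cores = 30
)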
I tried using n_tasks = 2 and cpus-per-task = 16. With this, the job gets assigned to two nodes. However, only one node is doing computation (on 16 cores); the second node is allocated to the job but does nothing.
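One way to see which node the forked workers actually run on is a check like this inside the R script (a sketch):

library(parallel)

# each forked worker reports its hostname; with fork-based parallelism
# this only ever returns the single node running the master R process
unique(unlist(mclapply(1:30, function(i) Sys.info()[["nodename"]], mc.cores = 30)))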
In this question srun is used to split parallelism across nodes with foreach and Slurm IDs. I use neither srun nor foreach. Is there a way to achieve what I want with SBATCH and multicore parallelism?
(I know that I could use SelectTypeParameters=CR_CPU_Memory and have 32 threads available per node. However, the question is how to use cores/threads from multiple nodes in general, in order to scale up the parallelism.)
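For reference, that alternative would correspond to a slurm.conf along these lines (illustrative only; the actual cluster configuration is controlled by the admins):

# slurm.conf (excerpt, illustrative)
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory   # schedule hyperthreads as CPUs
                                     # (the cluster currently uses CR_Core_Memory)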