2
votes

We're using SLURM to manage job scheduling on our computing cluster, and we are experiencing a problem with memory management. Specifically, we can't figure out how to allocate memory for a specific task.

Consider the following setup:

  • Each node has 32GB memory
  • We have a SLURM job that sets --mem=24GB

Now, assume we want to run that SLURM job twice, concurrently. What I expect (or want) to happen is that when I queue it twice by calling sbatch runscript.sh twice, one of the two jobs runs on one node and the other runs on another node. However, as it currently stands, SLURM schedules both jobs on the same node.

One possible cause we've identified is that SLURM appears to check only whether the 24GB of memory is available (i.e., not actively used by other jobs), instead of checking whether it has already been requested/allocated.

The question here is: is it possible to allocate/reserve memory per task in SLURM?

Thanks for your help!

2
What is the value of SelectTypeParameters in slurm.conf? - Carles Fenoy
Thanks for your comment! SelectType=select/cons_res and SelectTypeParameters=CR_Core - engelen

2 Answers

2
votes

For Slurm to manage memory, the SelectTypeParameters setting must include memory as a consumable resource. Changing that parameter from CR_Core to CR_Core_Memory should be enough for Slurm to start managing memory.

If that is not set, --mem will not reserve memory; it only ensures that the node has enough memory configured.
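As a rough sketch of that change, the comment at the top of the block below shows what the relevant slurm.conf lines might look like, followed by a small helper that checks whether a given config file already treats memory as a consumable resource. The function name tracks_memory and the example path are hypothetical; the parameter values are the ones quoted in this thread.

```shell
# Intended slurm.conf lines (SelectType is the value reported in the comments
# on the question; only SelectTypeParameters changes):
#   SelectType=select/cons_res
#   SelectTypeParameters=CR_Core_Memory

# Hypothetical helper: succeeds if the given slurm.conf has an uncommented
# SelectTypeParameters line whose value mentions Memory.
tracks_memory() {
    grep -Ei '^[[:space:]]*SelectTypeParameters[[:space:]]*=.*Memory' "$1" > /dev/null
}

# Example usage (the path is an assumption; adjust it to your installation):
# tracks_memory /etc/slurm/slurm.conf && echo "memory is a consumable resource"
```

With the configuration from the question (CR_Core), the helper would report failure, which matches the behaviour described: both jobs land on the same node because memory is never reserved.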

More information here

1
votes

@CarlesFenoy's answer is good, but to answer

The question here is: is it possible to allocate/reserve memory per task in SLURM?

the parameter you are looking for is --mem-per-cpu, to be used in combination with --cpus-per-task
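A minimal sketch of a submission script using these two options, assuming one task with 4 CPUs and 6G per CPU so that the task ends up with the 24GB from the question. The numbers and the helper name mem_per_task_mb are placeholders, not taken from the original post, and the memory is only reserved if memory is configured as a consumable resource as described in the other answer.

```shell
#!/bin/bash
# Placeholder values: 1 task, 4 CPUs per task, 6G per CPU (4 * 6G = 24G per task).
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=6G

# Hypothetical helper: memory per task in MB, given CPUs per task and MB per CPU.
mem_per_task_mb() {
    echo $(( $1 * $2 ))
}

# Inside a job, Slurm exports SLURM_CPUS_PER_TASK and SLURM_MEM_PER_CPU (in MB);
# the fallback values below only apply when the script runs outside Slurm.
echo "memory per task: $(mem_per_task_mb "${SLURM_CPUS_PER_TASK:-4}" "${SLURM_MEM_PER_CPU:-6144}") MB"

# Replace this comment with the real workload, e.g.: srun ./my_program
```

Submitting it works the same way as before (sbatch runscript.sh); the per-task memory is then the product of the two options rather than a single --mem value for the whole node.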