1
votes

I've been wondering whether a diagram exists somewhere that explains exactly the "computing model" behind Slurm (if that makes sense). Basically, I am wondering how the concepts of "Node", "Task", "CPU", "Core" and "Thread" relate to each other, and how they map to the machine/system.

I've read the man pages for sbatch and srun, but I am not 100% sure I understood them. So far, my understanding is the following:

  • A Node is a Physical Machine (that's pretty easy and clear :D), and the machines are typically connected by a network. Above it's a distributed memory model, below it's a shared memory model (?).
  • A Task is typically a process. That is what would be a rank in MPI (right ?)
  • A CPU == Processor (?) is an actual physical CPU
  • A Core is part of a multicore CPU
  • A Thread is something that will run on a given core. Is that only useful for like hyperthreading ?

Is that right ?

Then, say I have a mix of MPI and OpenMP. I basically use the concept of core, cpu and thread for the OpenMP part, and then task & node for MPI.

Correct ?

Thanks :)


1 Answer

1
votes

A Node is a Physical Machine (that's pretty easy and clear :D), and the machines are typically connected by a network.

Indeed, a node is a physical machine: one motherboard with one or more sockets, each hosting one CPU package that contains several cores.

Above it's a distributed memory model, below it's a shared memory model (?).

That is a natural way of thinking, but the programming model and the hardware structure are two different things. You can have a distributed memory program (e.g. MPI) running inside a single node, and you can have a shared memory program running across several nodes using PGAS frameworks such as Coarray Fortran or SHMEM.

A Task is typically a process. That is what would be a rank in MPI (right ?)

Yes.

A CPU == Processor (?) is an actual physical CPU

In the Slurm context, a CPU is to be understood as a core on systems where hardware hyperthreading is disabled, and as a (hardware) thread on systems where hyperthreading is enabled.

But generally speaking, when you buy one CPU, you get one chip with several cores on it (CPU package) that fits into one socket.
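To make the counting concrete, here is a minimal sketch of how the number of "CPUs" in the Slurm sense follows from the hardware layout described above. The `NodeTopology` class and the socket/core/thread numbers are made up for illustration and are not part of any Slurm API; what a real cluster reports depends on how its administrators configured it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeTopology:
    """Hypothetical description of one node (illustration only, not a Slurm API)."""
    sockets: int
    cores_per_socket: int
    threads_per_core: int  # hardware threads each core can expose

    @property
    def cores(self) -> int:
        # Physical cores on the node: one CPU package per socket.
        return self.sockets * self.cores_per_socket

    def slurm_cpus(self, hyperthreading_enabled: bool) -> int:
        # Hyperthreading enabled: every hardware thread counts as one CPU.
        # Hyperthreading disabled: every core counts as one CPU.
        if hyperthreading_enabled:
            return self.cores * self.threads_per_core
        return self.cores


# Assumed example node: 2 sockets, 16 cores per socket, 2 hardware threads per core.
node = NodeTopology(sockets=2, cores_per_socket=16, threads_per_core=2)
print(node.slurm_cpus(hyperthreading_enabled=False))  # 32
print(node.slurm_cpus(hyperthreading_enabled=True))   # 64
```

So the same physical node can show up with 32 or 64 CPUs depending on whether hardware threads are exposed.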

A Core is part of a multicore CPU

A core is a distinct compute unit within the CPU package. It has its own arithmetic and logic unit, but shares some cache levels with the other cores.

A Thread is something that will run on a given core. Is that only useful for like hyper threading ?

Hardware threading is a technology that allows a single physical core to appear as two distinct compute cores, because some per-thread state, such as registers, is duplicated. Both hardware threads still share the same arithmetic and logic unit, though. They are useful when the workload involves a lot of I/O that leaves the arithmetic and logic unit idle during data transfers. Hardware hyperthreading is often disabled on compute clusters.

A software thread is a lightweight version of a process. A single process can be made of several threads, using libraries/tools such as pthreads or OpenMP.
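As a small illustration of "one process, several threads", the sketch below uses Python's standard `threading` module rather than pthreads or OpenMP (a substitution made only to keep the example short). The helper name `run_threads` is invented for this example. Each thread records the process ID it sees, and they all report the same one.

```python
import os
import threading


def run_threads(n_threads: int) -> list[tuple[int, int]]:
    """Start n_threads software threads and record (pid, thread id) for each."""
    records: list[tuple[int, int]] = []
    lock = threading.Lock()  # protects the shared list

    def worker() -> None:
        # All threads live in the same process, so os.getpid() is identical.
        with lock:
            records.append((os.getpid(), threading.get_ident()))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return records


records = run_threads(4)
pids = {pid for pid, _ in records}
print(f"{len(records)} threads ran in {len(pids)} process")
```

In Slurm terms, this whole program would be a single task, however many threads it starts.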

Then, say I have a mix of MPI and OpenMP. I basically use the concept of core, cpu and thread for the OpenMP part, and then task & node for MPI.

You can simply set --ntasks to the number of MPI processes you want, and --cpus-per-task to the number of OpenMP threads you want per process, assuming Slurm is configured with Cores as consumable resources.
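As a rough sketch of how the two options combine, the helper below reads the `SLURM_NTASKS` and `SLURM_CPUS_PER_TASK` environment variables that Slurm sets inside a job and works out the resulting layout. The function name `hybrid_layout` and the values 4 and 8 are assumptions chosen for illustration; they correspond to a job submitted with `--ntasks=4 --cpus-per-task=8`.

```python
import os


def hybrid_layout(env=None) -> dict:
    """Summarize an MPI + OpenMP layout from Slurm's job environment.

    Hypothetical helper: falls back to 1 when a variable is not set,
    e.g. when run outside a Slurm job.
    """
    env = os.environ if env is None else env
    ntasks = int(env.get("SLURM_NTASKS", "1"))                # MPI ranks
    cpus_per_task = int(env.get("SLURM_CPUS_PER_TASK", "1"))  # threads per rank
    return {
        "mpi_ranks": ntasks,
        "omp_threads_per_rank": cpus_per_task,
        "total_cpus": ntasks * cpus_per_task,
    }


# Simulated environment for a job submitted with --ntasks=4 --cpus-per-task=8.
example = hybrid_layout({"SLURM_NTASKS": "4", "SLURM_CPUS_PER_TASK": "8"})
print(example)
```

A common pattern is to set `OMP_NUM_THREADS` to the value of `SLURM_CPUS_PER_TASK` in the job script, so that each MPI rank starts as many OpenMP threads as it was allocated CPUs.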