Would someone be able to clarify what each of these things actually is? From what I gathered, nodes are computing points within the cluster, essentially single computers. Tasks are processes that can be executed either on a single node or on multiple nodes. And cores are basically how much of a CPU on a single node you want allocated to executing the task assigned to that CPU. Is this correct? Am I confusing something?
1 Answer
The terms can have different meanings in different contexts, but if we stick to a Slurm context:
A (compute) node is a computer that is part of a larger set of nodes (a cluster). Besides compute nodes, a cluster comprises one or more login nodes, file server nodes, management nodes, etc. A compute node offers resources such as processors, volatile memory (RAM), permanent disk space (e.g. SSD), accelerators (e.g. GPU), etc.
A core is the part of a processor that does the computations. A processor comprises multiple cores, as well as a memory controller, a bus controller, and possibly many other components. A processor in the Slurm context is referred to as a socket, which actually is the name of the slot on the motherboard that hosts the processor. A single core can have one or two hardware threads. This is a technology that allows virtually doubling the number of cores the operating system perceives while only doubling part of the core components -- typically the components related to memory and I/O and not the computation components. Hardware multi-threading is very often disabled in HPC.
A CPU in a general context refers to a processor, but in the Slurm context, a CPU is a consumable resource offered by a node. It can refer to a socket, a core, or a hardware thread, based on the Slurm configuration.
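To make the three possible meanings of "CPU" concrete, here is a minimal Python sketch that counts how many CPUs one node would offer under each interpretation. It is not Slurm code: the `NodeHardware` class, the `cpus_offered` function, and the example node (2 sockets, 8 cores per socket, 2 hardware threads per core) are all hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeHardware:
    """Hypothetical description of one compute node's processors."""
    sockets: int            # number of processors (sockets) in the node
    cores_per_socket: int   # cores in each processor
    threads_per_core: int   # hardware threads per core (1 if multi-threading is off)


def cpus_offered(node: NodeHardware, unit: str) -> int:
    """Count the CPUs a node offers when one CPU means `unit`.

    `unit` is one of "socket", "core" or "thread" (names chosen for this sketch).
    """
    if unit == "socket":
        return node.sockets
    cores = node.sockets * node.cores_per_socket
    if unit == "core":
        return cores
    if unit == "thread":
        return cores * node.threads_per_core
    raise ValueError(f"unknown unit: {unit!r}")


if __name__ == "__main__":
    node = NodeHardware(sockets=2, cores_per_socket=8, threads_per_core=2)
    for unit in ("socket", "core", "thread"):
        print(unit, cpus_offered(node, unit))
```

The same physical node is counted as 2, 16 or 32 CPUs depending on the unit, which is why a request for "N CPUs" only makes sense once you know how the cluster is configured.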
The rôle of Slurm is to match those resources to jobs. A job comprises one or more (sequential) steps, and each step has one or more (parallel) tasks. A task is an instance of a running program, i.e. a process, possibly along with subprocesses or software threads.
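The job / step / task hierarchy can be pictured as a small data structure. The sketch below is only an illustration in Python; the class names `Task`, `Step` and `Job`, the `cpus` field, and the example commands are hypothetical and are not part of Slurm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """One running program instance (a process and its threads)."""
    command: str
    cpus: int = 1  # cores this task needs for its threads/subprocesses


@dataclass
class Step:
    """A set of tasks that run in parallel."""
    tasks: List[Task]

    def cpus_needed(self) -> int:
        # Tasks in a step run at the same time, so their needs add up.
        return sum(task.cpus for task in self.tasks)


@dataclass
class Job:
    """A sequence of steps that run one after another."""
    steps: List[Step]

    def peak_cpus(self) -> int:
        # Steps are sequential, so the job needs as many CPUs as its
        # most demanding step, not the sum over all steps.
        return max(step.cpus_needed() for step in self.steps)


if __name__ == "__main__":
    job = Job(steps=[
        Step(tasks=[Task("simulate", cpus=2) for _ in range(4)]),  # 4 parallel tasks
        Step(tasks=[Task("postprocess", cpus=1)]),                 # 1 task afterwards
    ])
    print(job.peak_cpus())
```

In this example the first step needs 8 CPUs and the second needs 1, so the job as a whole needs 8 at its peak.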
Multiple tasks are dispatched on possibly multiple nodes depending on how many cores each task needs. The number of cores a task needs depends on the number of subprocesses or software threads in the instance of the running program. The idea is to map each software thread or subprocess to one core, and to make sure that all the cores assigned to a task are on the same node.
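The effect of keeping each task's cores on one node can be seen with a little arithmetic. The Python sketch below is a simplified model, not Slurm's actual scheduling algorithm: it assumes identical, otherwise empty nodes, and the function name `nodes_needed` and its parameters are made up for this example.

```python
import math


def nodes_needed(ntasks: int, cpus_per_task: int, cpus_per_node: int) -> int:
    """Nodes required when a task's cores may not be split across nodes.

    Simplified model: all nodes are identical and empty.
    """
    tasks_per_node = cpus_per_node // cpus_per_task
    if tasks_per_node == 0:
        raise ValueError("a single task needs more cores than one node has")
    return math.ceil(ntasks / tasks_per_node)


if __name__ == "__main__":
    # 5 tasks with 6 threads each, on nodes with 16 cores.
    # Only 2 such tasks fit on a node (12 cores used, 4 left idle).
    print(nodes_needed(ntasks=5, cpus_per_task=6, cpus_per_node=16))

    # Ignoring the same-node constraint, 30 cores would fit on fewer nodes.
    print(math.ceil(5 * 6 / 16))
```

Here the five tasks need three nodes, although their 30 cores in total would fit on two nodes if a task could be spread across machines. A multi-threaded program cannot be spread that way because its threads share memory, which is why the cores of one task are kept together.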