
When a program is run with a job scheduler, the scheduler allocates n processor cores (specified by the user) for the job. When a program using OpenMP runs, OpenMP will in general use OMP_NUM_THREADS threads, which for simplicity we'll say are each mapped to a different processor core.

OpenMP doesn't know anything about which cores were allocated to the program/job by the scheduler (as far as I know). Also, it is the OS, not OpenMP, that actually maps the OpenMP threads to cores.

My question is: what's going on behind the scenes so that the OpenMP threads are only mapped to cores that were allocated to the job by the job scheduler?

I want my question to be general, but if the process is really different across job schedulers, then an LSF-specific answer would be best.


1 Answer


The way it works is very simple - the DRM (distributed resource manager) limits the CPU affinity mask of the process before it is started. The affinity mask tells the OS scheduler on which logical CPUs the process can be scheduled; the default mask simply contains all available logical CPUs.

If not instructed otherwise, most OpenMP runtimes obtain that mask when the program starts and obey it while spawning new threads. Both the GNU and Intel OpenMP runtimes also examine the affinity mask in order to determine the default number of threads when OMP_NUM_THREADS is not set.

Most OpenMP runtimes additionally support their own binding mechanisms (also known as per-thread affinity), e.g. the KMP_AFFINITY variable of Intel OpenMP or the GOMP_CPU_AFFINITY variable of GNU OpenMP. Some of these can be instructed to respect the original mask, e.g. KMP_AFFINITY="respect,granularity=core" makes Intel OpenMP bind its threads only to the CPUs enabled in the affinity mask with which the process was started.

Under Linux there are two kinds of affinity masks. One could be considered soft and is set by the sched_setaffinity(2) syscall. This mask is soft because it can be overridden and expanded at any time. But Linux also provides the so-called cpusets (part of the cgroups framework) that function more or less like lightweight containers: one can create a cpuset, assign only certain logical CPUs to it, and that set is then AND-ed with whatever mask is requested via sched_setaffinity() to obtain the final mask that is actually applied. Cpusets therefore provide a hard mask - it cannot be extended; one can only use it or a subset of it (but never a superset).

sched_setaffinity() on Linux takes either PIDs or TIDs and can therefore set the affinity of individual threads, which is how OpenMP runtimes implement per-thread affinity. The equivalent pthreads call is pthread_setaffinity_np() - despite what the _np ("non-portable") suffix suggests, it is not part of POSIX, but it is available on most systems via glibc.

LSF (9.1.1 and later) supports affinity using Linux cpusets. See the LSF documentation for how to set it up if you are an LSF administrator, or how to request certain affinity settings for your jobs if you are a user.

Sun (errr... I mean Oracle) Grid Engine has some support for process affinity starting with version 6.2u5 if I recall correctly.