Consider a shared-memory system running Linux with 4 Intel Xeon E5 CPUs, each with 10 cores. PBS Pro is installed. Users who want to run a program on 30 cores submit it with, for example, qsub -l select=1:ncpus=30
For other software they set the thread count instead, with setenv OMP_NUM_THREADS 30
before launching it.
My question mainly concerns commercial software packages that are built around MPI. Disregarding PBS and qsub for a moment, all you do to run these programs is either choose the number of cores from a drop-down menu after the program starts, or give it at the prompt when launching, with something like ./cfd.exe -np 30
to use 30 cores.
The system has 4 physical sockets = 4 CPUs;
each CPU has 10 cores = 40 physical cores in total;
each core has hyperthreading, so cat /proc/cpuinfo
reports 80 logical CPUs, numbered from 0 to 79.
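To make the socket / core / logical CPU distinction concrete, here is a minimal Python sketch that counts all three from the text of /proc/cpuinfo. It assumes the x86 Linux layout, where each logical CPU block carries `processor`, `physical id` and `core id` fields; the function name `summarize_cpuinfo` is made up for illustration.

```python
def summarize_cpuinfo(text):
    """Return (sockets, physical_cores, logical_cpus) from /proc/cpuinfo text.

    Assumes x86-style fields: 'processor', 'physical id' and 'core id'.
    """
    sockets = set()
    cores = set()          # distinct (physical id, core id) pairs
    logical = 0
    physical_id = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue       # blank separator line between CPU blocks
        key = key.strip()
        value = value.strip()
        if key == "processor":
            logical += 1   # one block per logical CPU
            physical_id = None
        elif key == "physical id":
            physical_id = value
            sockets.add(value)
        elif key == "core id":
            # 'physical id' precedes 'core id' within each block
            cores.add((physical_id, value))
    return len(sockets), len(cores), logical
```

Passing it the contents of /proc/cpuinfo on the machine described above should give 4 sockets, 40 physical cores and 80 logical CPUs, provided those fields are present.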
q1: I am confused about when and how hyperthreading takes place: does it happen automatically behind the scenes, or do I have to invoke it manually somehow?
This applies to any system with many cores, but I will keep using the numbers above for simplicity. When PBS Pro and qsub are used and a user submits with qsub -l select=1:ncpus=20
they are allocated 10 physical cores numbered, say, 10..19 and also 10 virtual cores numbered 50..59. This brings me to question 2 below.
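One way to check an allocation like this from inside a job is to group the logical CPUs the process may run on by the physical core they belong to. The sketch below is only an illustration: it assumes Linux sysfs exposes `thread_siblings_list` for each CPU, and that PBS Pro is configured to restrict the job's CPU affinity, which depends on the site setup. The helper names are hypothetical.

```python
def parse_cpu_list(spec):
    """Expand a kernel CPU list such as '10,50' or '0-3,8' into sorted ints."""
    cpus = set()
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_siblings(cpu):
    """Logical CPUs that share a physical core with `cpu` (Linux sysfs)."""
    path = "/sys/devices/system/cpu/cpu{}/topology/thread_siblings_list".format(cpu)
    with open(path) as f:
        return parse_cpu_list(f.read())


def group_by_core(allowed, siblings_of=read_siblings):
    """Map each physical core (keyed by its sibling tuple) to the allowed CPUs on it."""
    groups = {}
    for cpu in sorted(allowed):
        key = tuple(siblings_of(cpu))
        groups.setdefault(key, []).append(cpu)
    return groups
```

Inside a job, `group_by_core(os.sched_getaffinity(0))` (after `import os`) would list which of the allocated logical CPUs are hyperthread siblings of each other. For the allocation described above, one would expect 10 groups of 2.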
q2: What is the correct way to run?
If /proc/cpuinfo reports a total of 80 CPUs, is it safe to assume I can always run ./cfd.exe -np 80
or setenv OMP_NUM_THREADS 80
and be sure that each core is not running at only 50%? Or should I never go above -np 40
and let the system handle the rest?
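The two candidate values for -np or OMP_NUM_THREADS can be made explicit with a small Python sketch. It assumes the numbering described in this question, where logical CPU n and n+40 are the two hyperthreads of the same physical core; `paired_siblings` and `worker_counts` are hypothetical names, and a real machine may number its CPUs differently.

```python
def paired_siblings(cpu, physical_cores=40):
    """Sibling pair under the ASSUMED numbering: cpu n and n+40 share a core."""
    base = cpu % physical_cores
    return (base, base + physical_cores)


def worker_counts(allowed, siblings_of=paired_siblings):
    """Return (physical_cores, logical_cpus) available within `allowed`."""
    allowed = set(allowed)
    cores = {tuple(sorted(siblings_of(cpu))) for cpu in allowed}
    return len(cores), len(allowed)


# Whole machine: 40 physical cores, 80 logical CPUs.
whole_machine = worker_counts(range(80))

# The 20-CPU allocation from above: CPUs 10..19 plus 50..59.
job = worker_counts(list(range(10, 20)) + list(range(50, 60)))
```

The first number is the "one worker per physical core" choice and the second is the "one worker per logical CPU" choice. The sketch only computes the two options. Which one runs faster depends on the workload and would have to be measured.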
I use CFD software as an example, but I am also asking about software that my coworkers and I have written using OpenMP and other parallel directives.
q3: Am I correct in thinking that if I launch a program and tell it to run on 4 cores (or it is hard-coded to use at most 4 cores in parallel), and the CPU is hyperthreading-capable, then hyperthreading happens automatically behind the scenes, so that disabling hyperthreading at the BIOS or EFI level would make my program run slower? Assume that the program and problem scale linearly: 8 cores are always twice as fast as 4, 16 cores always twice as fast as 8, and so on. Question 3 is the one I am most interested in understanding correctly.