I've written a parallel program in C using OpenMP.
I want to control the number of threads the program uses.
I'm using a system with:
- CentOS release 6.5 (Final)
- icc version 14.0.1 (gcc version 4.4.7 compatibility)
- 2x Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
The program that I run:
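    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    /* TABLE_SIZE, NUM_OF_THREADS and NUM_OF_REPETITION are set at compile time. */
    double t1[TABLE_SIZE];
    double t2[TABLE_SIZE];

    void test1(double t1[], double t2[]);

    int main(int argc, char** argv) {
        int i;

        omp_set_dynamic(0);   /* disable dynamic adjustment of the team size */
        omp_set_nested(0);    /* disable nested parallelism */
        omp_set_num_threads(NUM_OF_THREADS);

        /* Fill both tables with random values. */
        #pragma omp parallel for default(none) shared(t1, t2) private(i)
        for (i = 0; i < TABLE_SIZE; i++) {
            t1[i] = rand();
            t2[i] = rand();
        }

        /* Run the dot product NUM_OF_REPETITION times. */
        for (i = 0; i < NUM_OF_REPETITION; i++) {
            test1(t1, t2);
        }
        return 0;
    }

    /* Parallel dot product of t1 and t2; the result is discarded. */
    void test1(double t1[], double t2[]) {
        int i;
        double result = 0.0;

        #pragma omp parallel for default(none) shared(t1, t2) private(i) reduction(+:result)
        for (i = 0; i < TABLE_SIZE; i++) {
            result += t1[i] * t2[i];
        }
    }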
I'm running a script that sets TABLE_SIZE (2500, 5000, 100000, 1000000), NUM_OF_THREADS (1-24), and NUM_OF_REPETITION (50000 as 50k, 100000 as 100k, 1000000 as 1M) at compile time. The problem is that the computer does not always utilize all the threads I request. The problem seems to depend on TABLE_SIZE.
For example, when I compile the code with TABLE_SIZE=2500, all is fine up to NUM_OF_THREADS=20. Then weird things happen. When I set NUM_OF_THREADS=21, the program utilizes only 18 threads (I watch htop to see how many threads are running). When I set NUM_OF_THREADS=23 and NUM_OF_REPETITION=100k, it uses 18 threads, but if I change NUM_OF_REPETITION to 1M at NUM_OF_THREADS=23, it uses 19 threads.
When I change TABLE_SIZE to 5000, the anomaly starts at 18 threads. With NUM_OF_THREADS=18 and NUM_OF_REPETITION=1M, the program uses only 17 threads. With NUM_OF_THREADS=19 and NUM_OF_REPETITION=100k or 1M, it uses only 17 threads. If I change NUM_OF_THREADS to 24, the program uses 20 threads at NUM_OF_REPETITION=50k, 22 threads at NUM_OF_REPETITION=100k, and 23 threads at NUM_OF_REPETITION=1M.
This sort of inconsistency continues as TABLE_SIZE increases. The bigger the TABLE_SIZE, the sooner (at a lower NUM_OF_THREADS) the inconsistency occurs.
In this post (OpenMP set_num_threads() is not working) I read that omp_set_num_threads() sets the upper limit on the number of threads the program can use. As you can see, I've disabled dynamic teams, and the program is still not using all the threads. Setting the environment variables OMP_NUM_THREADS and OMP_DYNAMIC doesn't help either.
So I read part of the OpenMP 3.1 specification. It says the program should use the number of threads set by omp_set_num_threads(). Also, omp_get_max_threads() returns 24 available threads.
Any help would be greatly appreciated.