
I've written a parallel program in C using OpenMP.

I want to control the number of threads the program uses.

I'm using a system with:

  • CentOS release 6.5 (Final)
  • icc version 14.0.1 (gcc version 4.4.7 compatibility)
  • 2x Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz

The program that I run:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* TABLE_SIZE, NUM_OF_THREADS and NUM_OF_REPETITION are
   defined at compile time by the build script (see below) */
double t1[TABLE_SIZE];
double t2[TABLE_SIZE];

void test1(double t1[], double t2[]);  /* defined after main(), so declare it first */

int main(int argc, char** argv) {
    int i;  /* loop counter; made private in the parallel loop below */

    omp_set_dynamic(0);
    omp_set_nested(0);
    omp_set_num_threads(NUM_OF_THREADS);

    #pragma omp parallel for default(none) shared(t1, t2) private(i)
    for(i=0; i<TABLE_SIZE; i++) {
        t1[i] = rand();
        t2[i] = rand();
    }

    for(i=0; i<NUM_OF_REPETITION; i++) {
        test1(t1, t2);
    }

    return 0;
}

void test1(double t1[], double t2[]) {
    int i;
    double result = 0.0;  /* accumulator for the reduction; must start at zero */

    #pragma omp parallel for default(none) shared(t1, t2) private(i) reduction(+:result)
    for(i=0; i<TABLE_SIZE; i++) {
        result += t1[i]*t2[i];
    }
}

I'm running a script that sets TABLE_SIZE (2500, 5000, 100000, 1000000), NUM_OF_THREADS (1-24) and NUM_OF_REPETITION (50000 as 50k, 100000 as 100k, 1000000 as 1M) at compile time. The problem is that the computer does not utilize all of the requested threads all the time, and the problem seems to depend on TABLE_SIZE.
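
For reference, assuming the script passes the constants to icc as preprocessor defines, a single build would look something like this (the file and binary names are placeholders):

icc -openmp -DTABLE_SIZE=2500 -DNUM_OF_THREADS=21 -DNUM_OF_REPETITION=100000 main.c -o test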

For example, when I compile the code with TABLE_SIZE=2500, everything is fine up to NUM_OF_THREADS=20; then weird things start to happen. When I set NUM_OF_THREADS=21, the program utilizes only 18 threads (I watch htop to see how many threads are running). With NUM_OF_THREADS=23 and NUM_OF_REPETITION=100k it uses 18 threads, but if I change NUM_OF_REPETITION to 1M at NUM_OF_THREADS=23, it uses 19 threads.

When I change TABLE_SIZE to 5000, the anomaly starts at 18 threads: with NUM_OF_THREADS=18 and NUM_OF_REPETITION=1M the program uses only 17 threads, and with NUM_OF_THREADS=19 and NUM_OF_REPETITION=100k or 1M it also uses only 17. If I raise NUM_OF_THREADS to 24, the program uses 20 threads at NUM_OF_REPETITION=50k, 22 threads at 100k and 23 threads at 1M.

This sort of inconsistency continues as TABLE_SIZE increases: the bigger the TABLE_SIZE, the sooner (i.e. at a lower NUM_OF_THREADS) the inconsistency occurs.

In this post (OpenMP set_num_threads() is not working) I read that omp_set_num_threads() sets the upper limit on the number of threads the program may use. As you can see, I've disabled dynamic teams and the program is still not using all the threads. Setting the environment variables OMP_NUM_THREADS and OMP_DYNAMIC doesn't help either.
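
For example, exporting them in the shell before a run (the binary name test is a placeholder) changes nothing:

export OMP_NUM_THREADS=23
export OMP_DYNAMIC=false
./test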

So I went and read parts of the OpenMP 3.1 specification. It says that, with dynamic adjustment disabled, a parallel region should use the number of threads set by omp_set_num_threads(). Also, omp_get_max_threads() returns 24 available threads.
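
To double-check from inside the program rather than through htop, I can run a small standalone sketch like this (my own test, not part of the program above) that prints the team size the runtime actually creates:

#include <stdio.h>
#include <omp.h>

int main(void) {
    omp_set_dynamic(0);       /* disable dynamic teams, as in the real program */
    omp_set_num_threads(23);  /* request 23 threads */

    printf("omp_get_max_threads() = %d\n", omp_get_max_threads());

    #pragma omp parallel
    {
        /* omp_get_num_threads() reports the team size only when
           called from inside the parallel region */
        #pragma omp single
        printf("actual team size = %d\n", omp_get_num_threads());
    }
    return 0;
}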

Any help would be greatly appreciated.

1 Answer

I finally found a solution: setting the KMP_AFFINITY environment variable. It doesn't matter whether I set it to "compact" or "scatter" (for now I'm only interested in using all the threads).
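
Setting it is a one-liner in the shell before running the program (the binary name test is a placeholder); adding the verbose modifier also makes the runtime print how OpenMP threads are bound to cores:

export KMP_AFFINITY=verbose,scatter
./test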

This is what the documentation has to say (https://software.intel.com/en-us/articles/openmp-thread-affinity-control):

There are 2 considerations for OpenMP threading and affinity: First, determine the number of threads to utilize, and secondly, how to bind threads to specific processor cores.

If you do not set a value for KMP_AFFINITY, the OpenMP runtime is allowed to choose affinity for you. The value chosen depends on the CPU architecture and may change depending on what affinity is deemed most efficient FOR A VARIETY OF APPLICATIONS for that architecture.

Another source (https://software.intel.com/en-us/node/522691):

Affinity Types:

type = none (default)

Does not bind OpenMP* threads to particular thread contexts; however, if the operating system supports affinity, the compiler still uses the OpenMP* thread affinity interface to determine machine topology.

So I guess that because I did not have KMP_AFFINITY set, the OpenMP runtime chose what it considered the most efficient affinity on its own. Please correct me if I'm wrong.