OpenMP : Creating threads, parallelism, barrier in loop

Question

I am using OpenMP and trying to get a span log(n) for the algorithm prefix sum with a table of size n. For each cells of the table I have the sum of all previous values of the table.

In the solution I have got, there is a loop I cannot parallelize, and I am using the numbers of the n threads in the loop to work on each cell of the array T (so the thread number i works on T[i]).

EDIT : Here, it is the sequential algorithm to do this with a table T of n cells.

prefix_sum(int ** T, int n)
{
    for(i = 2; i <= n; i *= 2) // this loop cannot be parallelized
    {
        for (l=1; l <= n/i; l++) // this loop can be parallelized
        {
            T[l*i - 1] += T[l*i - i/2 - 1];
        }
    }

        for(i = n/2; i >= 2; i /= 2) // this loop cannot be parallelized
        {
            for (l = 1; l < n/i; l++) // this loop can be parallelized
            {
                T[i*l + i/2 - 1] += T[i*l - 1];
            }
        }
    }

    return T;
}

I want to avoid coding each loop like this :

#pragma omp single
{
    for (i=2; i <= n; i*=2)
    {
        #pragma omp parallel num_threads(n)
        {
            if ((omp_get_thread_num() % i) == (i - 1))
            {
                T[omp_get_thread_num()] += T[omp_get_thread_num() - i/2];
            }
        }
    }
}

because the openMP specification notifies that for each #pragma omp parallel the team of threads is created and I lost the span of log(n). In this code, I have a span of log^2(n) with the creation of threads. Consequently, I try to do each loop like this :

omp_set_num_threads(8); //this is a test code with 8 threads

#pragma omp parallel
{
    for (i = 0; i < 8; ++i)
    {
        printf("iteration : %d\n", i);

        #pragma omp barrier
    }
}

Each step of the for loops have to be done sequentially, so the idea was to parallelize them with n threads, and make all thread wait at the end of each step of the loop with the barrier.

But it seems like the barrier is stoping the parallelism. The execution displays this :

iteration : 0
iteration : 0
iteration : 0
iteration : 0
iteration : 0
iteration : 0
iteration : 0
iteration : 0
iteration : 1
iteration : 2
iteration : 3
iteration : 4
iteration : 5
iteration : 6
iteration : 7
^C <- the program loops infinitely...

It seems like only one thread executes the loop after the first barrier. From the openMP specification the sequence of worksharing regions and barrier regions encountered must be the same for every thread in a team and this program respect this restriction.

So I am wondering if someone knows how to make the threads wait for the others at each step of the loop.

What you mean by "it seems like the barrier is stoping the parallelism."? — Vladimir F
When I am printing the result table, I can see there is only the first iteration of the first loop which is done, and then I think there is only the master thread doing the others iterations, but as it has the number 0, it does nothing... — vlnk
Could you add the serial code before you tried using OpenMP so we have an idea what you're trying to achieve. — Z boson

yohjp yohjp · Accepted Answer · 2014-11-03T02:19:34

In your code, loop iterator i is shared unintentionally;

#pragma omp parallel

shall be

#pragma omp parallel private(i)

Live demo: http://melpon.org/wandbox/permlink/nE74kARjqoBjVWbr

OpenMP : Creating threads, parallelism, barrier in loop

1 Answers