I am using OpenMP and trying to get a span log(n) for the algorithm prefix sum with a table of size n. For each cells of the table I have the sum of all previous values of the table.
In the solution I have got, there is a loop I cannot parallelize, and I am using the numbers of the n threads in the loop to work on each cell of the array T (so the thread number i works on T[i]).
EDIT : Here, it is the sequential algorithm to do this with a table T of n cells.
prefix_sum(int ** T, int n)
{
for(i = 2; i <= n; i *= 2) // this loop cannot be parallelized
{
for (l=1; l <= n/i; l++) // this loop can be parallelized
{
T[l*i - 1] += T[l*i - i/2 - 1];
}
}
for(i = n/2; i >= 2; i /= 2) // this loop cannot be parallelized
{
for (l = 1; l < n/i; l++) // this loop can be parallelized
{
T[i*l + i/2 - 1] += T[i*l - 1];
}
}
}
return T;
}
I want to avoid coding each loop like this :
#pragma omp single
{
for (i=2; i <= n; i*=2)
{
#pragma omp parallel num_threads(n)
{
if ((omp_get_thread_num() % i) == (i - 1))
{
T[omp_get_thread_num()] += T[omp_get_thread_num() - i/2];
}
}
}
}
because the openMP specification notifies that for each #pragma omp parallel the team of threads is created and I lost the span of log(n). In this code, I have a span of log^2(n) with the creation of threads. Consequently, I try to do each loop like this :
omp_set_num_threads(8); //this is a test code with 8 threads
#pragma omp parallel
{
for (i = 0; i < 8; ++i)
{
printf("iteration : %d\n", i);
#pragma omp barrier
}
}
Each step of the for loops have to be done sequentially, so the idea was to parallelize them with n threads, and make all thread wait at the end of each step of the loop with the barrier.
But it seems like the barrier is stoping the parallelism. The execution displays this :
iteration : 0
iteration : 0
iteration : 0
iteration : 0
iteration : 0
iteration : 0
iteration : 0
iteration : 0
iteration : 1
iteration : 2
iteration : 3
iteration : 4
iteration : 5
iteration : 6
iteration : 7
^C <- the program loops infinitely...
It seems like only one thread executes the loop after the first barrier. From the openMP specification the sequence of worksharing regions and barrier regions encountered must be the same for every thread in a team and this program respect this restriction.
So I am wondering if someone knows how to make the threads wait for the others at each step of the loop.