Prerequisites:
- parallel engine: OpenMP 3.1+ (can be OpenMP 4.0 if needed)
- parallel constructs: OpenMP tasks
- compiler: gcc 4.9.x (supports OpenMP 4.0)
Input:
- C code with loops
- loops have a cross-iteration data dependency: iteration “i+1” needs data produced by iteration “i” (only this kind of dependency, nothing else)
- the loop body can be partially dependent (part of it does not depend on other iterations)
- the loop cannot be split into two loops; the loop body should remain in one piece
- anything reasonable can be added to the loop or to the loop body function definition
Code sample:
(Here the conf/config/configData variables are used for illustration purposes only; the main interest is the value/valueData variables.)
void loopFunc(const char* config, int* value)
{
    int conf;
    conf = prepare(config);          // independent, does not change “config”
    *value = process(conf, *value);  // dependent: takes the previous value, produces the next
    return;
}
int main()
{
    int N = 100;
    char* configData;  // never changes
    int valueData = 0; // initial value
    …
    for (int i = 0; i < N; i++)
    {
        loopFunc(configData, &valueData);
    }
    …
}
Need to:
- parallelise loop using omp tasks (omp for / omp sections cannot be used)
- “prepare” functions should be executed in parallel with other “prepare” or “process” functions
- “process” functions should be ordered according to data dependency
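(Side note: since the prerequisites allow OpenMP 4.0, the ordering of “process” could in principle be expressed with a task depend clause. The sketch below, using the same N, configData, valueData and the two-argument loopFunc from the code sample above, chains the tasks through an inout dependence on valueData. Because the whole loopFunc call becomes one ordered task, the “prepare” calls would no longer overlap, so this alone does not satisfy the requirements; it is only an untested illustration, not the implemented approach.)
// Sketch only: OpenMP 4.0 task dependencies; enforces the "process" order,
// but also serializes "prepare", which violates the requirement above.
#pragma omp parallel
#pragma omp single
{
    for (int i = 0; i < N; i++)
    {
        #pragma omp task shared(configData, valueData) firstprivate(i) depend(inout: valueData)
        loopFunc(configData, &valueData);   // original two-argument loopFunc
    }
    #pragma omp taskwait
}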
What has been proposed and implemented:
- define an integer flag
- assign it the number of the first iteration
- each iteration, when it needs its data, waits for the flag to become equal to its own iteration number
- update the flag value when the data for the next iteration is ready
Like this:
(A reminder: the conf/config/configData variables are used for illustration purposes only; the main interest is the value/valueData variables.)
void loopFunc(const char* config, int* value, volatile int *parSync, int iteration)
{
    int conf;
    conf = prepare(config);          // independent, does not change “config”
    while (*parSync != iteration)    // wait for the previous iteration to be ready
    {
        #pragma omp taskyield
    }
    *value = process(conf, *value);  // dependent: takes the previous value, produces the next
    *parSync = iteration + 1;        // inform the next iteration that its data is ready
    return;
}
int main()
{
    int N = 100;
    char* configData;  // never changes
    int valueData = 0; // initial value
    volatile int parallelSync = 0;
    …
    omp_set_num_threads(5);
    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < N; i++)
    {
        #pragma omp task shared(configData, valueData, parallelSync) firstprivate(i)
        loopFunc(configData, &valueData, &parallelSync, i);
    }
    #pragma omp taskwait
    …
}
What happened:
It fails. :)
The reason is that an OpenMP task occupies an OpenMP thread for as long as it runs, including while it busy-waits. For example, if we define 5 OpenMP threads (as in the code above):
- the “for” loop generates 100 tasks;
- the OpenMP runtime assigns 5 arbitrary tasks to the 5 threads and starts them.
If the task with i=0 is not among the started tasks (which happens from time to time), the running tasks wait forever, occupy their threads forever, and the task with i=0 never gets started. (The taskyield in the wait loop is only a hint to the runtime, so it does not guarantee that another task will be scheduled on that thread.)
What's next?
I have no other ideas for how to implement the required mode of computation.
Current solution
Thanks to @parallelgeek below for the idea.
int main()
{
    int N = 10;
    char* configData;  // never changes
    int valueData = 0; // initial value
    volatile int parallelSync = 0;
    int workers;
    volatile int workingTasks = 0;
    ...
    omp_set_num_threads(5);
    #pragma omp parallel
    #pragma omp single
    {
        workers = omp_get_num_threads()-1; // reserve 1 thread for task generation
        for (int i = 0; i < N; i++)
        {
            while (workingTasks >= workers) // throttle: never create more running tasks than worker threads
            {
                #pragma omp taskyield
            }
            #pragma omp atomic update
            workingTasks++;
            #pragma omp task shared(configData, valueData, parallelSync, workingTasks) firstprivate(i)
            {
                loopFunc(configData, &valueData, &parallelSync, i);
                #pragma omp atomic update
                workingTasks--;
            }
        }
        #pragma omp taskwait
    }
}
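A side note on the busy-wait in the generator loop: it reads the shared workingTasks counter through volatile only. Below is a minimal sketch of the same throttling wait using the OpenMP 3.1 "atomic read" form instead (the local variable snapshot is introduced here purely for illustration; this is an untested variant, not part of the accepted solution):
/* Same throttling wait as above, but with an explicit atomic read of the
   shared counter instead of relying on volatile (OpenMP 3.1 "atomic read"). */
int snapshot;
for (;;)
{
    #pragma omp atomic read
    snapshot = workingTasks;   // well-defined snapshot of the shared counter
    if (snapshot < workers)
        break;                 // a worker slot is free, go create the next task
    #pragma omp taskyield      // otherwise give the runtime a task scheduling point
}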
(Comment from Gilles: also try the OMP_WAIT_POLICY environment variable, set to PASSIVE before the program starts, in case it allows for better task scheduling.)