0 votes

I am sorry if this has been asked before; I could not find it. It is a simple question: I am trying to use OpenMP so that each thread runs all of the statements inside the for loop body.

Example: assume we have two CPUs and are therefore using two threads.

#pragma omp for schedule(dynamic)
for (int n = 0; n < n_size; ++n) {
    foo1();
    foo2();
}

I want Thread[1] to sequentially process foo1() and foo2() for its iteration, Thread[2] to process another iteration, again running both foo1() and foo2(), and so on. I tried using sections right after the for statement, but the program got stuck in a loop.

Any help would be appreciated.

Cheers, -Rawi

######################################################

Following the comments and the discussion below, here is a simple test program:

// put inside main()
int k;
#pragma omp parallel num_threads(2)
{
    #pragma omp for schedule(dynamic) // or schedule(static); I don't know which one is faster
    for (int n = 0; n < 4; ++n) {
        // #pragma omp single
        {
            k = 0;
            foo1(k);
            foo2(k);
        }
    }
}

// main ends here

foo1 prints k (which is passed by reference) and then increments it, and foo2 does the same, so k should never go above 2. Here is what the two functions look like:
void foo1(int &n) {
    cout << "calling foo1" << " k= " << n << " T[" << omp_get_thread_num() << endl;
    ++n;
}

void foo2(int &n) {
    cout << "calling foo2" << " k= " << n << " T[" << omp_get_thread_num() << endl;
    ++n;
}

Here is the output:

calling foo1 k= calling foo1 k= 0 T[00 T[1
calling foo2 k= 1 T[0
calling foo1 k= 0 T[0
calling foo2 k= 1 T[0

calling foo2 k= 2 T[1
calling foo1 k= 0 T[1
calling foo2 k= 1 T[1

As you can see, k was 2 for T[1] in foo2, when it should have been 1.

Why am I getting this error? foo2 depends on the values computed by foo1 (in my real application actual parameters are passed to the functions).
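To narrow this down, here is a stripped-down diagnostic sketch of the same loop (this is only for illustration; foo1/foo2 are replaced by a print of k's address). Since k is declared before the parallel region it is shared, so both threads report the same address, i.e. they read and increment one and the same variable:

#include <iostream>
#include <omp.h>
using namespace std;

int main() {
    int k; // declared before the parallel region, so it is shared between the threads
    #pragma omp parallel num_threads(2)
    {
        #pragma omp for schedule(dynamic)
        for (int n = 0; n < 4; ++n) {
            // both threads print the same address: they work on the same k
            cout << "n=" << n << " T[" << omp_get_thread_num() << "] &k=" << &k << endl;
        }
    }
    return 0;
}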

Using '#pragma omp single' helped a bit, but there was a comment saying it should not be nested like this! Here is the output after adding '#pragma omp single':

calling foo1 k= 0 T[0
calling foo2 k= 1 T[0
calling foo1 k= 0 T[1
calling foo2 k= 1 T[1

However, shouldn't there be four more output lines (for the odd values of n)?
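As far as I understand, a single region must not be nested directly inside an omp for (both are worksharing constructs), so the output of that version cannot really be reasoned about. A variant of the same idea that I believe is legal puts the single inside a plain loop that every thread runs, so that exactly one thread executes each iteration's body (a sketch only; foo1/foo2 are the functions from above):

#include <iostream>
#include <omp.h>
using namespace std;

// foo1/foo2 as above: print k and the thread id, then increment k
void foo1(int &n) { cout << "calling foo1" << " k= " << n << " T[" << omp_get_thread_num() << endl; ++n; }
void foo2(int &n) { cout << "calling foo2" << " k= " << n << " T[" << omp_get_thread_num() << endl; ++n; }

int main() {
    #pragma omp parallel num_threads(2)
    {
        for (int n = 0; n < 4; ++n) {   // plain loop: every thread steps through all n
            #pragma omp single          // ...but each iteration's body is run by exactly one thread
            {
                int k = 0;              // local to the block, so it is not shared
                foo1(k);
                foo2(k);
            }                           // implicit barrier at the end of the single region
        }
    }
    return 0;
}

Note that the implicit barrier at the end of each single serialises the iterations (the other thread just waits there), so this mainly demonstrates the data-sharing point rather than giving any speedup.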

All threads will run all the commands inside the scope of the loop, with different ranges of n passed to different threads. I conclude that I don't understand your question and that you should therefore explain more carefully what problem you have and what you are trying to achieve. - High Performance Mark
omp_get_thread_num returns the identity of the thread that runs it. - Fred Foo
I need each thread to sequentially process all the functions listed inside the for-loop block. This is because I have heap memory shared between foo1 and foo2, so if they are processed by different threads, which is what is happening right now, the result is not what I need. Of course, other threads should each work on a different iteration n. - Mohd
OpenMP is already doing what you want, based on the way you describe your question. I don't understand what the problem is. Maybe you have multiple threads trying to write to the same memory location (a race condition)? Can you provide a bit more code? - Z boson
I am not sure OpenMP is already doing that; my simple programs showed that some foo1 and foo2 calls are distributed among the available threads. I have found a directive that might help a bit, '#pragma omp single', which specifies that the given statement/block is executed by only one thread. Here is a snippet: #pragma omp parallel num_threads(2) { #pragma omp for schedule(static) for(int n=0; n<10; ++n) { #pragma omp single { foo1(n); foo2(n); } } } - Mohd

1 Answer

0 votes

Simply don't parallelize the for loop, but still put it inside a parallel region.

#pragma omp parallel
{
  for(int n=0; n<n_size; ++n)  // every thread will run all iterations
  { 
    foo1();
    foo2();
  }
  // no implicit barrier at this point (threads leave the loop at different times); the barrier comes at the end of the parallel region
}
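
If the goal from the question is kept (split the iterations across the threads, but have the same thread run foo1 and foo2 for a given n), a worksharing for already does that; the race in the posted program appears to come from k being shared. Here is a sketch under that assumption, with k declared inside the loop body so that it is private to the executing thread (foo1/foo2 as in the question):

#pragma omp parallel num_threads(2)
{
    #pragma omp for schedule(dynamic)
    for (int n = 0; n < 4; ++n) {
        int k = 0;   // declared inside the loop body, so each thread has its own k
        foo1(k);     // the same thread runs foo1 and foo2 for this iteration
        foo2(k);     // k is 1 here and 2 afterwards, as intended
    }
}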