I was trying to parallelize the following loop in my code with OpenMP:
double pottemp, pot2body;
pot2body = 0.0;
pottemp = 0.0;
#pragma omp parallel for reduction(+:pot2body) private(pottemp) schedule(dynamic)
for (int i = 0; i < nc2; i++)
{
    pottemp = ener2body[i]->calculatePot(ener2body[i]->m_mols);
    pot2body += pottemp;
}
Inside the function 'calculatePot', a very important loop has also been parallelized with OpenMP:
CEnergymulti::calculatePot(vector<CMolecule*> m_mols)
{
    ...
    #pragma omp parallel for reduction(+:dev) schedule(dynamic)
    for (int i = 0; i < i_max; i++)
    {
        ...
    }
}
So my parallelization involves nested parallel loops. When I removed the parallelization of the outermost loop, the program ran much faster than with the outermost loop parallelized. The test was performed on 8 cores.
I think this low parallel efficiency might be related to the nested loops. Someone suggested using 'collapse' when parallelizing the outermost loop. However, since there is still other code between the outermost loop and the inner loop, I was told that 'collapse' cannot be used in this case. Are there any other ways I could make this parallelization more efficient while still using OpenMP?
Thanks a lot.
If calculatePot is long running enough to warrant parallelization of the contained loop, it should offer enough parallelization to use up all available parallel resources. If it doesn't always do this you could use an omp if-clause not to parallelize the inner loop when it isn't useful – Grizzly
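A minimal sketch of the if-clause idea from the comment above, not the original code. The names calculatePotSketch and totalPotSketch, the sum-of-squares loop body, and the 10000-iteration threshold are all hypothetical stand-ins, and the schedule was switched to static only because the toy iterations cost the same. The inner loop is parallelized only when it is not already running inside a parallel region (checked with omp_in_parallel()) and has enough iterations to be worth the overhead.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for CEnergymulti::calculatePot:
// sums the squares of one work item.
double calculatePotSketch(const std::vector<double>& data)
{
    double dev = 0.0;
    const int i_max = static_cast<int>(data.size());

#ifdef _OPENMP
    // True when the caller is already inside an active parallel region.
    const bool alreadyParallel = omp_in_parallel();
#else
    const bool alreadyParallel = false;
#endif

    // When the if() expression is false, the loop is executed by a single
    // thread. The threshold 10000 is an arbitrary assumption; tune it by
    // measurement.
    #pragma omp parallel for reduction(+:dev) schedule(static) \
        if(!alreadyParallel && i_max > 10000)
    for (int i = 0; i < i_max; i++)
    {
        dev += data[i] * data[i];
    }
    return dev;
}

// Hypothetical stand-in for the outer loop over ener2body.
double totalPotSketch(const std::vector<std::vector<double>>& items)
{
    double pot2body = 0.0;
    const int nc2 = static_cast<int>(items.size());

    #pragma omp parallel for reduction(+:pot2body) schedule(dynamic)
    for (int i = 0; i < nc2; i++)
    {
        // Called from inside a parallel region, so the inner loop stays serial.
        pot2body += calculatePotSketch(items[i]);
    }
    return pot2body;
}
```

Called directly from serial code, calculatePotSketch can still use all threads for a large input; called through totalPotSketch, only the outer loop is split across threads. Whether this beats parallelizing only one of the two loops depends on nc2, i_max, and the cost per iteration, so it would need to be timed on the real workload.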