Consider the following scenario: I am writing a function that contains a computationally intensive loop, which I parallelized with TBB's parallel_for. The problem is that this function may be used on its own, where it benefits from the parallelization, or it may be called from within another loop. In the latter case, the outer loop can also be parallelized, and often it is better to parallelize only the outer loop.
Normally in TBB, parallelizing both the outer and the inner loop is not a problem: unlike OpenMP, nested parallelism in TBB does not result in additional threads being created; TBB merely creates more tasks. However, the overhead of creating more tasks in the inner loop is sometimes still undesirable (I observed a 40% slowdown in one extreme case).
So, is there a way to make TBB not create any tasks when parallel_for (or a similar algorithm) is invoked while another parallel_for is already executing? I am looking for something similar to the effect of OMP_NESTED=FALSE in OpenMP.