I have implemented a combinatorial search algorithm (for comparison to a more efficient optimization technique) and tried to improve its runtime with parfor
.
Unfortunately, the work assignments appear to be very badly unbalanced.
Each subitem i
has complexity of approximately nCr(N - i, 3)
. As you can see, the tasks i < N/4
involve significantly more work than i > 3*N/4
, yet it seems MATLAB is assigning all of i < N/4
to a single worker.
Is it true that MATLAB divides the work based on equally sized subsets of the loop range?
No, this question cites the documentation saying it does not.
Is there a convenient way to rebalance this without hardcoding the number of workers (e.g. if I require exactly 4 workers in the pool, then I could swap the two lowest bits of i
with two higher bits in order to ensure each worker received some mix of easy and hard tasks)?
I don't think a full "work-stealing" implementation is necessary, perhaps just assigning 1
, 2
, 3
, 4
to the workers, then when 4
completes first, its worker begins on item 5
, and so on. The size of each item is sufficiently larger than the number of iterations that I'm not too worried about the increased communication overhead.