0
votes

I have been running a program literally hundreds of times, but recently found that one input parameter set causes the following error:

In DElambda at 116
  In parallel_function>make_general_channel/channel_general at 879
  In remoteParallelFunction at 31
??? Error using ==> parallel_function at 598
The session that parfor is using has shut down

Error in ==> CreateCurve at 86
parfor j=1:10

??? The client lost connection to an unknown lab.
This  might be due to network problems, or the interactive matlabpool job might have errored. This is
causing: java.lang.OutOfMemoryError: GC overhead limit exceeded

It happens when I set the min and max values for the parameter search space to min [0;0] and max [1.5;1.5] and set the population size to 10k (it's differential evolution). I have not touched the other parameters at any point. Whenever I try to run it with the above parameters I get the error above.

However, when I drop the population size to 1k it converges (to an incorrect answer, due to insufficient searching). Alternatively, when I use a population size of 10k with any other set of parameters that I have tried, it works perfectly and converges to the correct solution.

Seems very odd.

I am currently re-running the problem parameter set using a for loop rather than the parfor loop (with matlabpool switched off), to see if this runs any better. Unfortunately this is very time consuming, so I won't know the results for a while.

In the meantime, can anyone explain what is causing this error? And/or tell me how to debug parallel code?

Just to add: the code ran fine with the rogue parameter set when I used for instead of parfor! So I really need to find some way of debugging in the parallel environment so that I can isolate and fix this bug. Using for rather than parfor is just too slow!
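A common pattern for this kind of debugging (a sketch, not from the original post) is to gate the loop on a flag, so the identical loop body can be run serially, where breakpoints and error messages work normally, or in parallel for speed. The names useParallel, results, and DElambda here are illustrative assumptions, not the poster's actual code:

```matlab
% Sketch: toggle between serial and parallel execution of the same body.
% Names (useParallel, results, DElambda) are illustrative only.
useParallel = false;   % set to true once the serial run is verified

if useParallel
    parfor j = 1:10
        results{j} = DElambda(j);   % runs on the workers
    end
else
    for j = 1:10
        results{j} = DElambda(j);   % runs on the client; breakpoints work
    end
end
```

Keeping the body identical in both branches means any difference in behaviour points at the parallel machinery (data transfer, worker memory) rather than the algorithm itself.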

2
As the error java.lang.OutOfMemoryError: GC overhead limit exceeded says, it's an out-of-memory error. Without seeing the code, it's hard to tell why those specific inputs cause the out-of-memory. – Oleg
I am happy to post the code, though it's pretty large! But I would like to find out how to debug such problems. – Bazman

2 Answers

2
votes

As @Oleg pointed out, older versions of Parallel Computing Toolbox had restrictive data-size limitations for transfers to and from PARFOR. This limitation was fixed in R2013a, but unfortunately the doc page @Oleg linked to was not updated. If you can, please try again using R2013a.

1
votes

To avoid out-of-memory errors you need a calculator and to know the limitations.

You are limited to 2 GB per data transfer between client and workers on a 64-bit OS, and 600 MB on a 32-bit OS. More details are in Object Data Size Limitations, and the limit is also mentioned in the parfor() documentation.

Then you need to calculate (with a calculator) how much data you are transferring within each loop, i.e. the size of the arrays created by the code.
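For example, you can let MATLAB do the arithmetic with whos, which reports the size of a variable in bytes. The array below (pop) is only a stand-in for whatever your loop actually builds:

```matlab
% Estimate how much data one iteration creates/transfers.
% 'pop' is a stand-in for your actual per-iteration arrays.
popSize = 10000;
nParams = 2;
pop = zeros(popSize, nParams);        % doubles: 8 bytes each
info = whos('pop');
fprintf('pop is %.2f MB\n', info.bytes / 2^20);
% 10000 x 2 doubles = 160,000 bytes, about 0.15 MB for this one array;
% multiply by the number of arrays per iteration to gauge total traffic.
```

A 10k population by itself is small; the trouble usually comes from intermediate arrays whose size scales with the population (e.g. a populationSize-by-populationSize distance matrix would be 10000 x 10000 doubles, roughly 800 MB, which would explain why 1k works and 10k does not).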