
I'm fitting models to several datasets with fminsearch, and I'm trying to do the fits in parallel. My code runs up to the start of the parfor loop, but the parfor loop seems to take forever to start (the first line inside the parfor is never executed). There are no errors; MATLAB just remains "Busy".

I'm running on a local cluster with 4 workers, started with matlabpool 4, which appears to start up fine. I'm running MATLAB R2014b 64-bit on Ubuntu 14.04.3, on a quad-core (8-thread) i7-3770K @ 3.50GHz with 24 GiB RAM (most of it unused, of course).

EDIT

Here is code that reproduces the problem!

File matlab_parfor_test_2.m:

function f=matlab_parfor_test_2
f={}; 
for i=1:400
  a=@(p)i*p;             % make some functions depending on i
  b=@(p)a(p)+0;          % a function depending on this
  f=[f { @(p)b(p) }];    % create a list of i functions using this
end

File matlab_parfor_test_1.m:

function matlab_parfor_test_1
f=matlab_parfor_test_2(); % create the functions
f=f(1:2);       % discard all but two functions
for i=1:2       % for each function                ('A')
  parfor j=1    % dummy parfor 
    tmp=f{i}; % just read a function from the cell ('B')
  end
end

On my machine, the time taken to get from 'A' to the first 'B' (i.e. the time taken to "enter" the parfor) depends on how many functions matlab_parfor_test_2 returns:

  400 functions returned: 20 s
  500 functions returned: 32 s
  600 functions returned: 45 s
  700 functions returned: 64 s
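A minimal sketch of a harness for repeating this measurement, assuming both test files are on the path and a pool is already open. The function name time_parfor_entry is hypothetical (save it as time_parfor_entry.m). Because nothing can be timed from inside the worker at point 'B', it times the whole dummy parfor, so each value is an upper bound on the time needed to enter the loop.

```matlab
function t = time_parfor_entry()
% Hypothetical timing harness for the experiment above.
% Assumes a parallel pool is already open (otherwise the first
% iteration also includes pool start-up time).
f = matlab_parfor_test_2();   % create the functions
f = f(1:2);                   % keep only two of them
t = zeros(1, 2);
for i = 1:2                   % point 'A'
    tStart = tic;
    parfor j = 1
        tmp = f{i};           % point 'B': just read a handle from the cell
    end
    t(i) = toc(tStart);       % covers sending f to the workers plus running the body
end
end
```

Calling `t = time_parfor_entry()` returns one elapsed time in seconds per outer iteration.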

This is very odd, because in test_1 I discard all but 2 of those functions! Why should the discarded functions cause a slowdown?

I thought MATLAB might not actually be freeing the unwanted functions in f, so I tried replacing f=f(1:2) with

f={f{1}, f{2}}; 

but this also did not help.
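One way to check what a single handle actually carries is to inspect it with the built-in functions() and to save it to a temporary MAT-file. The sketch below is a hypothetical diagnostic (report_handle_payload is not part of the original code). The saved size is only a rough proxy: MAT-files are compressed, and parfor uses its own serializer, so the numbers need not match what is sent to the workers.

```matlab
function nbytes = report_handle_payload()
% Hypothetical diagnostic: inspect the first handle returned by
% matlab_parfor_test_2 and estimate how much data it drags along.
f = matlab_parfor_test_2();
h = f{1};
s = functions(h);                  % struct describing the anonymous function
disp(s.function)                   % its text, i.e. @(p)b(p)
disp(fieldnames(s.workspace{1}))   % names of the variables it captured
fname = [tempname '.mat'];         % rough size proxy: save the handle to disk
save(fname, 'h');
d = dir(fname);
nbytes = d.bytes;
fprintf('saved size of f{1}: %d bytes\n', nbytes);
delete(fname);                     % clean up the temporary file
end
```

Following s.workspace{1}.b with functions() again shows the next level of captured variables (a, then i).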

If I replace the parfor with for, then of course it takes under 1 ms to execute.

Any ideas?

OLD VERSION OF QUESTION

function fit_all
  models = createModelFunctions();  % creates cell of function handles
  data   = { [1 2 3], [1 2 3] };    % create 2 data sets
  for i = 1:length(models)
    fprintf('model %g\n',i);
    parfor j = 1:length(data)
      fprintf('data %g\n',j);
      tmp = models{i};  % if I comment this line out, it runs fine!
      % p(j,:) = fminsearch(@(q)models{i}(q,data{j}), [0 0]);
    end
  end

The model functions are created in another file:

function models = createModelFunctions()
  models{1} = @(p,d) likelihoodfun(0,0,p,d);
  models{2} = @(p,d) likelihoodfun(1,0,p,d); 

function L = likelihoodfun(a,b,p,d)
  L = 0;  % placeholder: the actual likelihood maths goes here

When running fit_all, I expected to see model 1, data 1, data 2, model 2, etc. The output I actually get is:

model 1

Then it just stops: no prompt, MATLAB shows "Busy", and the UI and OS remain responsive as usual. System Monitor shows only one core is active. Execution never makes it into the parfor. If I press Ctrl+C at this point, after a 3-minute delay I get:

Operation terminated by user during parallel.internal.pool.serialize (line 21)
In distcomp.remoteparfor (line 69)
                serializedInitData = parallel.internal.pool.serialize(varargin);
In parallel_function>iMakeRemoteParfor (line 1060)
P = distcomp.remoteparfor(pool, W, @make_channel, parfor_C);
In parallel_function (line 444)
        [P, W] = iMakeRemoteParfor(pool, W, parfor_C);

If I comment out the indicated line, it works, so the problem seems to occur when I access the model functions. Similarly, it works fine if I change models to

 models={@sum,@sum}

i.e. the problem only appears when I use function handles created in another file.
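Since the Ctrl+C trace stops in parallel.internal.pool.serialize, one workaround worth trying (an assumption, not a confirmed fix) is to send the workers only the model index and let each worker rebuild the handles itself, so no function handle has to be serialized. The variant below, fit_all_local_handles, is hypothetical and assumes createModelFunctions.m is on the workers' path, which is normally the case for a local pool.

```matlab
function fit_all_local_handles
% Hypothetical variant of fit_all: handles are created on the workers,
% so only plain numbers (i, j) and data are sent to the pool.
nModels = numel(createModelFunctions());   % count only; handles stay on the client
data    = { [1 2 3], [1 2 3] };            % create 2 data sets
for i = 1:nModels
    fprintf('model %g\n', i);
    parfor j = 1:numel(data)
        fprintf('data %g\n', j);
        models = createModelFunctions();   % temporary variable, rebuilt on the worker
        tmp = models{i};                   % the handle never leaves the worker
    end
end
end
```

Rebuilding the handles in every iteration is cheap here, since createModelFunctions only creates a few anonymous functions.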

I am running MATLAB R2014a 64-bit on Windows and cannot reproduce your problem (it works fine). I would try two things: (1) put the parfor in the outer loop; (2) try to implement it without cell arrays. (2) is more of a shot in the dark, but I have had problems in the past with struct arrays and parfor, so maybe... - Itamar Katz
Hmm, you're right. It must be something to do with the function itself. If I replace it with a very simple external function, there's no problem. I'll experiment. - Sanjay Manohar
I meant that I tried it with your function (just setting L=0) and it works fine. - Itamar Katz
I have run some new tests! The results are unexpected. Question edited. Thanks - Sanjay Manohar

1 Answer

When I run your code on my machine, it runs fine on both Windows and Linux. However, the first run always takes a bit longer because a parallel pool has to be opened. Is this what you're referring to? If so, that is normal and expected behavior.

FYI, you should be using parpool instead of matlabpool. Maybe the legacy matlabpool code is having problems creating the pool? Also, make sure your code is not closing the parallel pool each time.
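A minimal sketch of opening the pool with parpool while reusing an existing pool rather than closing and reopening it. ensure_pool is a hypothetical helper name; gcp('nocreate') and parpool are available from R2013b, so both exist in R2014b.

```matlab
function pool = ensure_pool(nWorkers)
% Hypothetical helper: reuse the current pool if one is open,
% otherwise start a local pool with parpool (the replacement for matlabpool).
pool = gcp('nocreate');                  % returns [] if no pool is running
if isempty(pool)
    pool = parpool('local', nWorkers);   % e.g. 4 workers, like matlabpool 4
end
end
```

Calling `ensure_pool(4)` at the top of fit_all instead of matlabpool 4 means repeated runs keep using the same pool; note that if a pool of a different size is already open, it is returned unchanged.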

If none of that works, try the code on someone else's computer and see if you can re-create the problem.