0
votes

I am trying to implement the following pseudocode (a time-stepping loop with variable step size) using MATLAB's Parallel Computing Toolbox or MATLAB Distributed Computing Server. I already have a serial MATLAB implementation of this loop that works in MATLAB 2013a.

Given: u0, t0, T (initial and final time), and the initial step size h0

while t0 < T

    % First, compute U1 and U2, which depend on t0 and some known parameters
    U1(t0, h0, u0, parameters)
    U2(t0, h0, u0, parameters)
    % U1 and U2 are independent, so they can be computed in parallel

    % Next, compute U3, U4, U5, U6, which depend on t0, U1, U2, and known parameters
    U3(t0, h0, u0, U1, U2, parameters)
    U4(t0, h0, u0, U1, U2, parameters)
    U5(t0, h0, u0, U1, U2, parameters)
    U6(t0, h0, u0, U1, U2, parameters)
    % U3, U4, U5, U6 are independent, so they can also be computed in parallel

    % Finally, compute U7 and U8, which depend on U1, U2, ..., U6
    U7(t0, u0, h0, U1, U2, U3, U4, U5, U6)
    U8(t0, u0, h0, U1, U2, U3, U4, U5, U6)
    % U7 and U8 are also independent and can be computed in parallel as well

    % Do step size control here, then assign h0 := h_new
    t0 = t0 + h_new

end

Could you please suggest the best way to implement the above code with MATLAB's parallel tools? By "best" I mean the way that gives the largest speedup for the whole computation. I have access to the supercomputer LEO III, which has 162 compute nodes with 12 cores each (1944 cores in total).

My idea is to compute U1 and U2 at the same time on two separate workers (cores), each with its own memory. Using the results for U1 and U2, one can compute U3, U4, U5, U6 in the same way, and finally U7 and U8. For that I think I need to use parfor within a matlabpool? But I do not know how many loop indices (corresponding to the number of cores/processors) I need.

My questions are:

  1. Since I can use the supercomputer mentioned above, can I use MATLAB Distributed Computing Server?

  2. For this code, should I use Parallel Computing Toolbox or MATLAB Distributed Computing Server? With Parallel Computing Toolbox (local workers), I cannot specify which worker computes U1 and which computes U2 (and likewise for U3, U4, ...), since the workers share memory and run interactively. Is that right?

  3. If I use the proposed idea, how many workers will I need? Probably 8 cores? Is it better to use one compute node and request 9 cores (8 for the workers and one for the MATLAB session), or to use 8 compute nodes?

I am a beginner with Matlab Parallel Computing. Please give your suggestions! Thanks!

Peter

Does any iteration of the T-loop depend on past iterations? - Jonas
No, it doesn't! The T-loop (the while loop, you mean) is just for the time integration from t0 (initial time) to T (final time). - user3517471

2 Answers

0
votes

I suggest parallelizing the while-loop, since you want to distribute many iterations among the nodes. Parfor is the easiest way to get started with parallel computing, and it does a good job on straightforward problems like yours. Only go with the server if there are a lot of time steps that each take a significant amount of time, because any parallelization comes with a certain overhead.
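To get a feel for that overhead, one could time the same loop once with for and once with parfor. This is only a rough sketch: `work` is a made-up stand-in for one time step, and the iteration count is arbitrary. Depending on the MATLAB version and settings, the first parfor may also include pool start-up time, or may run serially if no pool is open.

```matlab
% Hypothetical workload standing in for one time step (an assumption,
% not the asker's actual computation).
work = @(k) sum(sin((1:2e5) * k));
n = 24;   % arbitrary number of independent iterations

% Serial reference run
serialOut = zeros(1, n);
tic
for k = 1:n
    serialOut(k) = work(k);
end
tSerial = toc;

% Same loop with parfor; parOut is a sliced output variable
parOut = zeros(1, n);
tic
parfor k = 1:n
    parOut(k) = work(k);
end
tParallel = toc;

fprintf('serial: %.3f s, parfor: %.3f s\n', tSerial, tParallel);
```

If the parfor time is not clearly below the serial time for a realistic workload, the per-iteration work is probably too small to pay for the communication.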

Computing locally allows you to make use of 12 cores in recent versions of MATLAB; make sure that you have enough RAM to keep 13 copies of your loop body in memory (one per worker, plus the client session). With a good processor architecture and no other programs competing for resources, it is fine to run on all cores.

Thus:

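timeSteps = t0:h0:T;   %# fixed grid of time points with step size h0

result = cell(length(timeSteps), 2);   %# preallocate the output

parfor timeIdx = 1:length(timeSteps)
    t0 = timeSteps(timeIdx);

    %# calculate all your u's here

    %# collect the output
    result{timeIdx,1} = U7;
    result{timeIdx,2} = U8;

end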
0
votes

All computations of U1, ..., U8 need to call a function that computes matrix-vector multiplications. Let's say that, for the moment, we do not care how long each of them takes (not much in my case). The problem is that, with the previous methods, U1, ..., U8 are not independent: to compute U_{i+1} you need U_{i}, so you have to compute them sequentially, one after the other. Now I have constructed a method that allows U1 and U2 to be computed at the same time (they are independent), and the same holds for U3, ..., U6 and for U7, U8. So I want to reduce the CPU time for the whole computation. That is why I think one could use MATLAB parallel computing.
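The staged structure described here could be sketched with one parfor per stage, looping over a cell array of function handles. Everything concrete below is an assumption for illustration: the matrix `A`, the values of `t0`, `h0`, `u0`, and the formulas inside the handles are placeholders, not the actual method. The stages have 2, 4, and 2 independent tasks, so at most 4 workers are busy at any time.

```matlab
% Placeholder problem data (assumptions, not the real problem)
A  = 2 * eye(3);
t0 = 0;  h0 = 0.5;  u0 = ones(3, 1);

% Stage 1: U1 and U2 depend only on t0, h0, u0
stage1 = {@(t, h, u) u + h * (A * u), ...
          @(t, h, u) u + 0.5 * h * (A * u)};
U12 = cell(1, 2);
parfor k = 1:2
    f = stage1{k};
    U12{k} = f(t0, h0, u0);
end
U1 = U12{1};  U2 = U12{2};

% Stage 2: U3..U6 depend on U1 and U2
stage2 = {@(t, h, u, a, b) u + h * (A * a), ...
          @(t, h, u, a, b) u + h * (A * b), ...
          @(t, h, u, a, b) u + h * (A * (a + b)) / 2, ...
          @(t, h, u, a, b) u + h * (A * (a - b))};
U36 = cell(1, 4);
parfor k = 1:4
    f = stage2{k};
    U36{k} = f(t0, h0, u0, U1, U2);
end

% Stage 3: U7 and U8 depend on U1..U6, passed as one cell array
Uall = [U12, U36];
stage3 = {@(t, h, u, U) u + h * (U{3} + U{4} + U{5} + U{6}) / 4, ...
          @(t, h, u, U) u + h * (U{1} + U{2}) / 2};
U78 = cell(1, 2);
parfor k = 1:2
    f = stage3{k};
    U78{k} = f(t0, h0, u0, Uall);
end
U7 = U78{1};  U8 = U78{2};
```

Each parfor sends its inputs to the workers and collects the results before the next stage can start, so with cheap matrix-vector products the communication may cost more than the computation saves. Timing it against the serial version is the only reliable check.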