I have the following pseudocode (a time-stepping loop with variable step size) that I am trying to implement using the Matlab Parallel Computing Toolbox or the Matlab Distributed Computing Server. I already have Matlab code for this loop that works in ordinary (serial) Matlab R2013a.
Given: u0, t0, T (initial and final time), and the initial step size h0
while t0 < T
    % the first step is to compute U1 and U2, which depend on t0, h0, u0 and some known parameters
    U1(t0, h0, u0, parameters)
    U2(t0, h0, u0, parameters)
    % U1 and U2 are independent of each other, so they can be computed in parallel
    % the next step is to compute U3, U4, U5, U6, which depend on t0, h0, u0, U1, U2 and the known parameters
    U3(t0, h0, u0, U1, U2, parameters)
    U4(t0, h0, u0, U1, U2, parameters)
    U5(t0, h0, u0, U1, U2, parameters)
    U6(t0, h0, u0, U1, U2, parameters)
    % U3, U4, U5, U6 are independent of each other, so they can also be computed in parallel
    % finally, compute U7 and U8, which depend on U1, U2, ..., U6
    U7(t0, u0, h0, U1, U2, U3, U4, U5, U6)
    U8(t0, u0, h0, U1, U2, U3, U4, U5, U6)
    % U7 and U8 are also independent of each other, so they can be computed in parallel as well
    % do step size control here, then assign h0 := h_new
    t0 = t0 + h_new
end
Could you please suggest the best way to implement the above code in parallel Matlab? By "best" I mean the way that gives the largest speedup for the whole computation. I have access to the supercomputer LEO III, which has 162 compute nodes with 12 cores each (1944 cores in total).
My idea is to compute U1 and U2 at the same time on two separate workers (cores), each with its own memory. Using the results for U1 and U2, I would compute U3, U4, U5, U6 in the same way, and finally U7 and U8. For that I think I need to use parfor inside a matlabpool session? But I do not know how many loop indices (corresponding to the number of cores/processors) I need.
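To make the idea concrete, here is a minimal sketch of what I have in mind. The stage functions, the parameter p, and the step size control are hypothetical placeholders (cheap anonymous functions and a fixed step), not my real code. As far as I understand, parfor does not let me choose which worker runs which iteration, and it simply runs serially if no pool is open.

```matlab
% Hypothetical placeholder stage functions; the real U1..U8 are far more expensive.
p = 0.5;                                     % stands in for "parameters"
stageA = {@(t,h,u,p) u + h*p, ...            % U1
          @(t,h,u,p) u - h*p};               % U2
stageB = {@(t,h,u,U1,U2,p) U1 + U2, ...      % U3
          @(t,h,u,U1,U2,p) U1 - U2, ...      % U4
          @(t,h,u,U1,U2,p) h*U1, ...         % U5
          @(t,h,u,U1,U2,p) h*U2};            % U6
stageC = {@(t,u,h,U) sum([U{:}]), ...        % U7
          @(t,u,h,U) max([U{:}])};           % U8

t0 = 0; T = 1; h0 = 0.25; u0 = 1;            % example values only
nSteps = 0;
while t0 < T
    % Stage 1: U1 and U2 are independent, one loop iteration each.
    R1 = cell(1, 2);
    parfor k = 1:2
        R1{k} = feval(stageA{k}, t0, h0, u0, p);
    end
    U1 = R1{1}; U2 = R1{2};

    % Stage 2: U3..U6 need U1 and U2, but not each other.
    R2 = cell(1, 4);
    parfor k = 1:4
        R2{k} = feval(stageB{k}, t0, h0, u0, U1, U2, p);
    end
    U = [R1, R2];                            % {U1, ..., U6}

    % Stage 3: U7 and U8 need U1..U6, but not each other.
    R3 = cell(1, 2);
    parfor k = 1:2
        R3{k} = feval(stageC{k}, t0, u0, h0, U);
    end
    U7 = R3{1}; U8 = R3{2};                  % would feed the step size control

    % Placeholder step size control: keeps the step fixed.
    h_new = h0;
    h0 = h_new;
    t0 = t0 + h_new;
    nSteps = nSteps + 1;
end
```

With this layout the widest stage has four independent tasks, so at most four workers would be busy at any time. Each parfor also has to send its inputs to the workers and collect the results, so I suspect this only pays off if every Ui is expensive to evaluate.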
My questions are:
1. Since I can use the supercomputer mentioned above, can I use the Matlab Distributed Computing Server on it?
2. For this code, should I use the Parallel Computing Toolbox or the Matlab Distributed Computing Server? With the Parallel Computing Toolbox (local workers), I cannot specify which worker computes U1 and which computes U2 (and likewise for U3, U4, ...), since the workers share memory and run interactively. Is that right?
3. If I use the proposed idea, how many workers will I need? Probably 8 cores? Is it better to use 1 compute node and request 9 cores (8 for the workers and one for the Matlab session), or to use 8 compute nodes?
I am a beginner with Matlab Parallel Computing. Please give your suggestions! Thanks!
Peter