3
votes

I have to optimize a piece of MATLAB code. The code is simple, yet it is a part of a calculation unit, which calls it ~8000 times (without redundancy) (This calculation unit is used ~10-20K times in real cases). The whole MATLAB code is quite long and complex (for a physicist, like me), yet MATLAB profiler claims that the following segment is responsible to nearly half the run time (!).

The code is essentially multiplying elementwise every permutation of 3 matrices from 3 groups (A,B,C) and sums it up with some weighting. Group A has a single matrix, group B has 4 matrices and group C has 7.

I tried some vectorizations techniques* yet at best got the same run time.

Using the MATLAB profiler I checked the total time spent at each line (for all 8000 calls) - I wrote those in comments.

for idx_b = 1:4

B_MAT=B_Container_Cell{idx_b};

for idx_c = 1:7

    C_MAT = C_Container_Cell{idx_b}(:,:,idx_c); % 60 sec

    ACB=A_MAT.*C_MAT.*B_MAT; % 20 sec

    Coeff_x = Coeff_x_Cell{idx_b}(p1,p2,idx_c,p3);
    Coeff_y = Coeff_y_Cell{idx_b}(p1,p2,idx_c,p3);
    Coeff_z = Coeff_z_Cell{idx_b}(p1,p2,idx_c,p3);

    Sum_x = Sum_x+Coeff_x.*ACB; % 15 sec
    Sum_y = Sum_y+Coeff_y.*ACB; % 15 sec
    Sum_z = Sum_z+Coeff_z.*ACB; % 15 sec

end

end

Some prior knowledge -
A_MAT is 1024x1024 complex double constant matrix defined ouside the loop
B_MAT is 1024x1024 double matrix, essentially sparse (only 0 and 1 values, ones are ~5% out of total elements)
C_MAT is 1024x1024 complex double

Sum_x/ Sum_y / Sum_z were properly initiated
Coeff_X / Coeff_y / Coeff_z are double scalars
p1,p2,p3 are parameters (constant for this code segment)

Does anybody know why the most consuming operation is variable assignment? (I tried to skip the assignment and replace C_MAT directly with it's expression, yet it worsens the performance)


Vectorization attempt
The techique I tried is to use cat, reshape and repmat to create 3 giant 2D matrices, element-wise multiply those and then put the all on top of each other (with reshape) and sum via the relevant dimention. The first matrix was A repeated 4*7=28 times, the second was the 4 B matrices repeated 7 times and the third was all C matrices spanned (=28 matrices).


Sample Input
The code on the following link generates sample input files. The run time with these variables (on my computer) is ~0.38 sec (the original code+variables ~0.42, the difference in my opinion is because the real C Cell container is very large, so extraction takes more time)

1
.* operation is commutative (unlike * matrix multiply), so you can perform A_MAT*B_MAT outside the inner loop. - Ben Voigt
@BenVoigt it doesn't change much the performance. - Alexander
You refer to vectorization techniques in your question, what have you tried? - BillBokeey
As you are talking about performance, please provide a fully reproducable situation (perhaps some extra code to generate input variables of the actual size, with their relevant properties). And of course make sure the runtimes match exactly with those inputs. If you indicate how fast you would expect it to go that could also help people to see whether certain tricks are worth trying. - Dennis Jaheruddin
If the container cells have identical sized matrices, why not use multi-dimensional arrays instead of the cell arrays that don't really support vectorized operations. Thus, B_Container_Cell could be replaced by 1024x1024x4 array, C_Container_Cell by 1024x1024x7x4 and similarly 4D arrays for Coeff_x_Cell, Coeff_y_Cell and Coeff_z_Cell each. Wouldn't that work? - Divakar

1 Answers

2
votes

Given that the arrays in the input cell arrays are of identical sizes, it might be a better idea to have the inputs stored as multi-dimensional arrays instead of cell arrays to leverage MATLAB's vectorized techniques, which in this case would be indexing for extracting specific elements and matrix-multiplication for sum-reduction. So, when forming the inputs, we could look to form multi-dimensional arrays corresponding to the inputs : B_Container_Cell, C_Container_Cell, Coeff_x_Cell, Coeff_y_Cell and Coeff_z_Cell. Now, these are 1D cell arrays with B_Container_Cell containing 2D arrays and rest have 3D arrays. Thus, when using multi-dimensinal arrays, we would have them as one additional dimension, i.e. they would be 3D and 4D arrays respectively.

To simulate their multi-dimensional array formats, let's convert the given cell arrays with concatenation using cat along their last+1 dimension, like so -

Bm = cat(3,B_Container_Cell{:});
Cm = cat(4,C_Container_Cell{:});

Cx = cat(4,Coeff_x_Cell{:});
Cy = cat(4,Coeff_y_Cell{:});
Cz = cat(4,Coeff_z_Cell{:});

Finally, the vectorized solution to use these multi-dimensional arrays and get the desired outputs -

%// Get ACB across all iterations and reshaped into (Nx28) shaped array
Ar = reshape(bsxfun(@times,bsxfun(@times,Cm,permute(Bm,[1,2,4,3])),A_MAT),[],28);

%// Use matrix-multiplication to sum reduce sliced versions of Cx, Cy and
%// Cz, to get respectived summed outputs
sz = size(A_MAT); %// Output array size
Sum_x_out = reshape(Ar*reshape(Cx(p1,p2,:,:),[],1),sz);
Sum_y_out = reshape(Ar*reshape(Cy(p1,p2,:,:),[],1),sz);
Sum_z_out = reshape(Ar*reshape(Cz(p1,p2,:,:),[],1),sz);

Please note that it doesn't look like the parameter p3 was used.

Runtime test results (for listed sample inputs) -

--------------------------------- With Original Approach
Elapsed time is 2.412417 seconds.
--------------------------------- With Proposed Approach
Elapsed time is 1.572035 seconds.