1
votes

I have the following code in MATLAB

[Mx,Nx] = size(x);
[My,Ny] = size(y);

padded_x = zeros(Mx+2*(My-1),Nx+2*(Ny-1));
padded_x(My:Mx+My-1,Ny:Ny+Nx-1) = x; 
y = rot90(y,2);

z = zeros(Mx+My-1,Nx+Ny-1);
for i=1:Mx+My-1
    for j=1:Nx+Ny-1
        z(i,j) = sum(sum(padded_x(i:i+My-1,j:j+Ny-1).*y));
    end
end

that is part of a 2D convolution implementation. Is there any way that this can become faster? e.g. vectorizing these 2 for loops? I know that there are faster algorithms that compute 2D convolution, but I want to speed this one up. So, I am not looking for an algorithm with different complexity, just something with lower complexity constant. I would also like to keep this in MATLAB and not to use MEX - files etc. Finally, the supplied conv2 function is also not the solution I am looking for.

1
and you don't want to use the supplied conv2 function? - gauteh
no, sorry I forgot to mention that! Actually conv2 uses FFT, so I thought it was obvious that it belongs to the "different algorithm" category - Controller
@Controller I was not aware that conv2 used fft. It is not very easy to find information about that either. However, does it matter if it does? - patrik
@patrik Well, yes it does in this case, because it leads to a different complexity. The concept is to see how these two algorithms (mine and conv2) scale with the size of inputs. I am just trying to find out if there can be a faster implementation of my algorithm in order to better approximate the difference between the two algorithms. - Controller
What are the typical datasizes for x and y? - Divakar

1 Answers

4
votes

For each iteration, you can replace the elementwise multiplication and double summations with a fast matrix multiplication.

That is -

z(i,j) = sum(sum(padded_x(i:i+My-1,j:j+Ny-1).*y));

would be replaced by -

M = padded_x(i:i+My-1,j:j+Ny-1);
z(i,j) = M(:).'*y(:);

Thus, the loopy portion of the original code could be replaced by -

z = zeros(Mx+My-1,Nx+Ny-1);
yr = y(:);
for i=1:Mx+My-1
    for j=1:Nx+Ny-1
         M = padded_x(i:i+My-1,j:j+Ny-1);
        z(i,j) = M(:).'*yr;
    end
end

Quick tests: With x and y as 200 x 200 each, the runtimes were -

------------------------- With Original Approach
Elapsed time is 10.357977 seconds.
------------------------- With Proposed Approach
Elapsed time is 5.209822 seconds.