
I need to convolve a matrix with many other matrices using as few calls to convn as possible.

For example, I have size(MyMat) = [fm, fm, 1, bSize] and size(masks) = [s, s, maskNum].

I want res(:,:,k,:) to be the result of convolving masks(:,:,k) with MyMat:

res(:,:,k,:)=convn(MyMat,masks(:,:,k));

Since the convolution takes up over 80% of the running time of my script and is called hundreds of thousands of times, I don't want to use a loop.

I'm looking for the fastest way to do this. Basically, I have bSize matrices, and I want to apply every convolution mask in masks to each of them with as few calls to the convolution function as possible.

The matrices are all small and non-sparse, so FFT-based convolution will probably slow it down (as a commenter here verified :) ).

(The reason there is a 1 in the size of MyMat is that I actually have more elements in that dimension, but I compute the convolution for each element of that dimension in a loop.)
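One way to drop that outer loop as well might be to fold the extra dimension into the batch dimension. When dimension 3 of the input is a singleton, convn with an s-by-s-by-maskNum masks array places each mask's 2-D convolution in its own output slice and convolves every slice along dimension 4 independently, so reshaping the extra elements into dimension 4 and back should give every (mask, element, matrix) combination from one call. A sketch with hypothetical sizes, where MyMatFull and c are made-up names for the unreduced array and the size of the looped dimension:

```matlab
% Hypothetical sizes; c is the dimension currently handled in a loop.
fm = 5; s = 3; maskNum = 4; c = 3; bSize = 6;
MyMatFull = rand(fm, fm, c, bSize);      % hypothetical unreduced input
masks = rand(s, s, maskNum);
n = fm + s - 1;                          % side of the 'full' output

% Fold dims 3 and 4 into one batch dimension so dim 3 becomes a singleton.
batched = reshape(MyMatFull, fm, fm, 1, c*bSize);
res = convn(batched, masks);             % n x n x maskNum x (c*bSize)
res = reshape(res, n, n, maskNum, c, bSize);

% Spot check one (mask, element, matrix) triple against plain conv2.
k = 2; ch = 3; b = 4;
ref = conv2(MyMatFull(:,:,ch,b), masks(:,:,k));
disp(max(max(abs(res(:,:,k,ch,b) - ref))))   % round-off level
```

The reshape works because MATLAB stores arrays column-major, so dimension 3 varies fastest inside the folded batch index and unfolds back in the same order.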

The main goal is simply to eliminate the need for the following loop, or to parallelize it with very little overhead, if possible:

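res = zeros(fm+s-1, fm+s-1, size(convMask,3), bSize);  % preallocate once
for i = 1:size(convMask,3)                              % one iteration per mask
    res(:,:,i,:) = convn(MyArray, convMask(:,:,i));
end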
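Since dimension 3 of MyMat is a singleton, the loop can likely be replaced by a single convn call on the whole 3-D masks array: the "full" convolution along dimension 3 then just places mask k in output slice k, and because masks has no fourth dimension, each of the bSize matrices is still convolved on its own. A minimal sketch with made-up sizes (the fm, s, maskNum and bSize values are hypothetical) that checks the single call against the loop; it says nothing about speed, which would still have to be timed on the real sizes.

```matlab
% Hypothetical example sizes.
fm = 6; s = 3; maskNum = 4; bSize = 5;
MyMat = rand(fm, fm, 1, bSize);
masks = rand(s, s, maskNum);

% Reference: one convn call per mask, into a preallocated array.
resLoop = zeros(fm+s-1, fm+s-1, maskNum, bSize);
for k = 1:maskNum
    resLoop(:,:,k,:) = convn(MyMat, masks(:,:,k));
end

% Single call: dim 3 of MyMat is a singleton, so each mask lands in its own
% output slice; dim 4 of masks is a singleton, so the bSize matrices stay
% independent.
resOne = convn(MyMat, masks);

disp(size(resOne))                          % 8 8 4 5
disp(max(abs(resOne(:) - resLoop(:))))      % round-off level
```

This only holds while dimension 3 of the input is a singleton; with a non-singleton dimension 3, convn would also convolve along it and mix the slices.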

Parallelizing on the GPU would be great if there's a way to do it with less overhead than the usual parfor.
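One possible GPU approach, assuming the Parallel Computing Toolbox and a supported GPU (not benchmarked here), is to skip parfor entirely: transfer the data once, make one convn call on gpuArray inputs covering all masks and all bSize matrices (valid because dimension 3 of MyMat is a singleton, so each mask fills its own output slice), and gather the result once. The sizes are hypothetical, single precision is a choice rather than a requirement, and the sketch falls back to the CPU when no GPU is detected.

```matlab
% Hypothetical sizes, roughly the 9x9x1x1000 case from the comments.
fm = 9; s = 3; maskNum = 8; bSize = 1000;
MyMat = rand(fm, fm, 1, bSize, 'single');
masks = rand(s, s, maskNum, 'single');

% gpuDeviceCount only exists with the Parallel Computing Toolbox;
% && short-circuits, so it is never called when the function is absent.
useGpu = exist('gpuDeviceCount') > 0 && gpuDeviceCount > 0;
if useGpu
    % One transfer in, one convn call for all masks and matrices, one out.
    res = gather(convn(gpuArray(MyMat), gpuArray(masks)));
else
    res = convn(MyMat, masks);           % CPU fallback, same result layout
end
disp(size(res))                          % 11 11 8 1000
```

Keeping the data on the GPU across many such calls, rather than transferring it each time, would probably matter more than the convolution itself at these sizes.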

Thank you!

What do you mean by "small"? 10-by-10-ish or 100-by-100-ish? – horchler
Matrix sizes can be anywhere from 1x1x10x1000 to 9x9x20x1000, but the convolutions are between matrices of size up to 9x9x1x1000 (and in the future maybe 21x21x1x1000). The convolutions will be applied with multiple masks, which account for the 3rd dimension. – user1999728

1 Answer


I assume that you are preallocating the array res correctly? Without a simple demo of what you're doing and an idea of the sizes of fm, s, etc., one can only guess at what will help. If your matrices are large enough, you might look into FFT-based convolution methods (there are some for convn on the MATLAB File Exchange). If the data is sparse (> 50% zeros), you could try converting the convolution into a matrix multiplication and using sparse data types. You could also try calling convn on a gpuArray if you have a decent GPU.
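The matrix-multiplication idea can be sketched as an im2col-style rewrite: copy every s-by-s window of the zero-padded inputs into the columns of one matrix, after which a single product with the 180-degree-rotated masks yields all maskNum × bSize full convolutions. This is a dense sketch with hypothetical sizes; it indexes the windows directly instead of calling im2col (Image Processing Toolbox), and it is checked only for correctness against convn, not for speed. Note that the patch matrix holds s*s times as many elements as one mask's worth of output.

```matlab
% Hypothetical sizes.
fm = 9; s = 3; maskNum = 8; bSize = 50;
MyMat = rand(fm, fm, 1, bSize);
masks = rand(s, s, maskNum);

n  = fm + s - 1;                 % side of the 'full' convolution output
Np = fm + 2*(s-1);               % side of the zero-padded input

% Zero-pad each matrix so sliding s-by-s windows cover the full output.
P = zeros(Np, Np, bSize);
P(s:s+fm-1, s:s+fm-1, :) = reshape(MyMat, fm, fm, bSize);

% Linear indices of every s-by-s window inside one padded slice.
[a, b] = ndgrid(1:s, 1:s);       % offsets inside a window
[r, c] = ndgrid(1:n, 1:n);       % top-left corner of each window
rowIdx = bsxfun(@plus, a(:), r(:).') - 1;
colIdx = bsxfun(@plus, b(:), c(:).') - 1;
lin = rowIdx + (colIdx - 1) * Np;                 % (s*s) x (n*n)
lin = bsxfun(@plus, lin(:), (0:bSize-1) * Np^2);  % shift into each slice

% Each column is one window, flattened column-major like a mask.
patches = reshape(P(lin), s*s, n*n*bSize);

% Convolution = correlation with the 180-degree-rotated mask.
W = reshape(masks(end:-1:1, end:-1:1, :), s*s, maskNum);

% One matrix product covers all masks and all matrices.
out = W.' * patches;                              % maskNum x (n*n*bSize)
res = permute(reshape(out, maskNum, n, n, bSize), [2 3 1 4]);

ref = convn(MyMat, masks);
disp(max(abs(res(:) - ref(:))))                   % round-off level
```

The index matrix lin depends only on fm, s and bSize, so in a script that repeats the convolution many times it could be computed once and reused.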