0
votes

I'm implementing separable convolution to speed up 2D Gaussian convolution.

clear all;
close all;

im = randi([0,255],1024,1024);

win = 7;
window = fspecial('gaussian',win,win/6);

[U, S, V] = svd(window);
v = U(:,1) * sqrt(S(1,1));
h = V(:,1)' * sqrt(S(1,1));

out1 = filter2(window, im);
out2 = filter2(h, filter2(v, im));
norm(out1 - out2)

tic
for i = 1:1000
    out1 = filter2(window, im);
end
toc

tic
for i = 1:1000
    out2 = filter2(h, filter2(v, im));
end
toc

The separable version is supposed to be faster by win*win/(win + win) = 2.5 times, but it is actually slower:

ans =
2.6250e-12

Elapsed time is 5.486270 seconds.
Elapsed time is 8.769868 seconds.

Was there any hidden implementation inside filter2?

2
You're making twice as many calls to filter2 in the second loop...it seems like this is the expected result. - gariepy
Please always read the source code before asking this kind of questions.. what makes you think that X should be faster than Y if you have no idea how Y works? - nirvana-msu
conv2 has a three-input version for separable kernels. Maybe that will be faster. You may need to flip some of the inputs to match the result of filter2 (filter2 computes correlation, not convolution) - Luis Mendo

2 Answers

1
votes

First of all, if you look at the source code of filter2, you can see that it is already implemented using the same separable SVD approach. This is the gist of what filter2 does in your case (but you really should just step through the code):

[U, S, V] = svd(window,'econ');
v = U(:,1) * sqrt(S(1,1));
h = V(:,1)' * sqrt(S(1,1));
out2 = conv2(v, h, im, 'same');

What you do is the same, but instead of using a more efficient single call conv2(v, h, im, 'same'), you end up doing recursive call conv2(conv2(im, v, 'same'), h, 'same'), which is clearly what makes it slower.

0
votes

Matlab's implementation of the built-in conv set of functions (including conv2) is sub-optimal. You can read about it (and the workarounds) here: http://undocumentedmatlab.com/blog/convolution-performance.

The basic idea is to use either the convolution theorem or a C++-based MEX implementation, or preferably both, as used by Bruno Luong's convfft utility.