4
votes

I´m trying to make an implementation of Gaussian blur for a school project. I need to make both a CPU and a GPU implementation to compare performance.

I am not quite sure that I understand how Gaussian blur works. So one of my questions is if I have understood it correctly?

Heres what I do now: I use the equation from wikipedia http://en.wikipedia.org/wiki/Gaussian_blur to calculate the filter. For 2d I take RGB of each pixel in the image and apply the filter to it by multiplying RGB of the pixel and the surrounding pixels with the associated filter position. These are then summed to be the new pixel RGB values. For 1d I apply the filter first horizontally and then vetically, which should give the same result if I understand things correctly. Is this result exactly the same result as when the 2d filter is applied?

Another question I have is about how the algorithm can be optimized. I have read that the Fast Fourier Transform is applicable to Gaussian blur. But I can't figure out how to relate it. Can someone give me a hint in the right direction?

Thanks.

1

1 Answers

8
votes

Yes, the 2D Gaussian kernel is separable so you can just apply it as two 1D kernels. Note that you can't apply these operations "in place" however - you need at least one temporary buffer to store the result of the first 1D pass.

FFT-based convolution is a useful optimisation when you have large kernels - this applies to any kind of filter, not just Gaussian. Just how big "large" is depends on your architecture, but you probably don't want to worry about using an FFT-based approach for anything smaller than, say, a 49x49 kernel. The general approach is:

  • FFT the image
  • FFT the kernel, padded to the size of the image
  • multiply the two in the frequency domain (equivalent to convolution in the spatial domain)
  • IFFT (inverse FFT) the result

Note that if you're applying the same filter to more than one image then you only need to FFT the padded kernel once. You still have at least two FFTs to perform per image though (one forward and one inverse), which is why this technique only becomes a computational win for large-ish kernels.