I am looking to speed up convolution with derivative-of-Gaussian kernels (up to order 2 or 3) on large medical images (512 x 512 x 1000, double precision) in one of our open-source toolkits. We currently do this convolution via FFT.
A friend suggested ArrayFire, and after reading this post I am trying to see whether we could adopt it. It seems like a great effort and would let us target multiple backends, though I am currently interested in CUDA alone since that is the hardware I have at hand.
I read this post on the forum, which says that convolution in ArrayFire switches to the frequency domain above a certain kernel size. I looked at the CUDA file convolve.cu but didn't find any FFT calls within ArrayFire, nor any use of cuFFT. Am I missing something?
Going forward, I would like to construct the derivative-of-Gaussian kernel directly in the frequency domain, multiply it with the FFT of the image, and transform the product back. I would also like to compare the speedup between building the convolution kernel in the spatial domain versus the frequency domain. Also, ArrayFire doesn't seem to provide a Gaussian kernel in 3D.
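
To make the frequency-domain idea concrete, here is a minimal sketch of what I have in mind. It uses NumPy as a stand-in, not ArrayFire, and the function names `gaussian_derivative_transfer` and `gaussian_derivative_fft` are made up for illustration. It assumes periodic boundaries, unit voxel spacing, the same sigma on every axis, and the continuous-domain result that the Fourier transform of the n-th derivative of a unit-area Gaussian is (i*w)^n * exp(-sigma^2 * w^2 / 2).

```python
import numpy as np


def gaussian_derivative_transfer(shape, sigma, orders):
    """Frequency response of a separable Gaussian derivative filter.

    Hypothetical helper: `orders` holds the derivative order per axis,
    e.g. (1, 0, 0) for a first derivative along axis 0.
    """
    H = np.ones(shape, dtype=np.complex128)
    for axis, (n, order) in enumerate(zip(shape, orders)):
        # Angular frequency in radians per sample along this axis.
        w = 2.0 * np.pi * np.fft.fftfreq(n)
        # (i*w)^order differentiates, the exponential smooths.
        h = (1j * w) ** order * np.exp(-0.5 * (sigma * w) ** 2)
        # Reshape so the 1D response broadcasts along its own axis.
        bshape = [1] * len(shape)
        bshape[axis] = n
        H = H * h.reshape(bshape)
    return H


def gaussian_derivative_fft(image, sigma, orders):
    """Filter `image` by multiplying its FFT with the transfer function."""
    H = gaussian_derivative_transfer(image.shape, sigma, orders)
    # The result is real up to rounding (and a negligible Nyquist term
    # for odd orders on even-sized axes), so keep the real part only.
    return np.real(np.fft.ifftn(np.fft.fftn(image) * H))


# Small periodic test volume: a sine wave along axis 0.
shape = (32, 16, 8)
k = 3
x = np.arange(shape[0])
w0 = 2.0 * np.pi * k / shape[0]
volume = np.broadcast_to(np.sin(w0 * x)[:, None, None], shape)

# Smoothed first derivative along axis 0.
d_dx = gaussian_derivative_fft(volume, sigma=2.0, orders=(1, 0, 0))
```

For a pure sine input the output should be a cosine scaled by w0 * exp(-sigma^2 * w0^2 / 2), which gives a simple correctness check before timing anything. On the real data a 512 x 512 x 1000 double volume is about 2 GB and its full complex spectrum about 4 GB, so a real-to-complex transform (`np.fft.rfftn` in NumPy) would probably be needed to keep memory in check.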