I am currently converting a C++ program to CUDA, and part of the program runs a fast Fourier transform. Originally I used FFTW, but I found that I can't call it from inside a kernel. I then rewrote that part using cuFFT, but I run into the same restriction!
Is there any FFT implementation that will run inside a CUDA kernel?
Can I just add __device__ to the FFTW library?
I would like to avoid having to initialize or call the FFT from the host. I want a function that runs entirely on the GPU, if one exists.