I'm working on a project that needs to run FFTs on both Nvidia and AMD graphics cards. I initially looked for a single library that would work on both (thinking this would be the OpenCL way), but I didn't have any luck.
Someone suggested that I would have to use each vendor's FFT implementation and write a wrapper that chooses which one to call based on the platform. I found AMD's implementation pretty easily, but in the meantime I'm actually working with an Nvidia card (which is also the more important one for my particular application).
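To make the wrapper idea concrete, here is a minimal sketch of the dispatch step only. It assumes the vendor string is whatever `clGetPlatformInfo` returns for `CL_PLATFORM_VENDOR`; the enum, the function name, and the substrings matched are all hypothetical choices for illustration, and the actual strings reported by a given driver should be checked.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical backend tags for the wrapper; not part of any library. */
typedef enum {
    FFT_BACKEND_UNKNOWN = 0,
    FFT_BACKEND_NVIDIA,  /* would route to Nvidia's FFT library */
    FFT_BACKEND_AMD      /* would route to AMD's FFT library */
} fft_backend;

/*
 * Pick a backend from the platform vendor string.
 * Assumption: `vendor` was obtained via
 * clGetPlatformInfo(platform, CL_PLATFORM_VENDOR, ...).
 * The substrings below are assumptions about what drivers report.
 */
fft_backend choose_fft_backend(const char *vendor)
{
    if (vendor == NULL) {
        return FFT_BACKEND_UNKNOWN;
    }
    if (strstr(vendor, "NVIDIA") != NULL) {
        return FFT_BACKEND_NVIDIA;
    }
    if (strstr(vendor, "Advanced Micro Devices") != NULL ||
        strstr(vendor, "AMD") != NULL) {
        return FFT_BACKEND_AMD;
    }
    return FFT_BACKEND_UNKNOWN;
}
```

The rest of the wrapper would switch on the returned value and call the matching vendor library; that part is not shown here because it depends on how the buffers are shared, which is the open question below.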
The only Nvidia implementation I can find is cuFFT. Does anyone know how I can actually use the cuFFT library from OpenCL? The only way I can think of is to have some CUDA code alongside my OpenCL code. I've read that I can't just use OpenCL buffers as CUDA pointers (see "Trying to mix in OpenCL with CUDA in NVIDIA's SDK template"). Instead, would I have to copy the buffers back to the host after running my OpenCL kernels and then copy them to the GPU again using the CUDA memory transfer routines? I don't really like this approach, as it seems to involve pointless memory transfers; I would much prefer to use cuFFT directly from OpenCL.