I know that, in general, CUDA kernels cannot be launched directly from a .cpp file. Instead, if such a capability is desired, the kernel must be wrapped in a CPU-callable function whose declaration goes into a .h file and whose implementation goes into the .cu file alongside the kernel.
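To make the setup concrete, here is a minimal sketch of the wrapper pattern I mean. The file names (`scale.h`, `scale.cu`, `main.cpp`) and the names `scale_kernel` and `scale_on_gpu` are hypothetical, made up for illustration, and error checking is left out for brevity.

```cuda
// ---- scale.h (hypothetical name): plain C++ declaration, no CUDA syntax ----
void scale_on_gpu(float* host_data, int n, float factor);

// ---- scale.cu (hypothetical name): kernel and wrapper, compiled by nvcc ----
#include <cuda_runtime.h>

// Multiplies each of the n elements by factor, one thread per element.
__global__ void scale_kernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// CPU-callable wrapper: copies the data to the device, launches the kernel,
// and copies the result back. Assumes n > 0; error checking is omitted.
void scale_on_gpu(float* host_data, int n, float factor)
{
    float* dev = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host_data, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
    scale_kernel<<<blocks, threads>>>(dev, n, factor);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(host_data, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
}

// ---- main.cpp (hypothetical name): host compiler, includes only scale.h ----
// float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};
// scale_on_gpu(v, 4, 2.0f);
```

The .cpp side sees only an ordinary function declaration, so it never has to parse `__global__` or the `<<<...>>>` launch syntax.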
However, this approach poses a problem if the kernel is templated on its type and one wishes to carry that template parameter through the CPU wrapper to the .cpp file. A template's definition must be visible wherever it is instantiated, so the wrapper's implementation would have to sit in the .h file next to its declaration, and the kernel launch it contains would then cause problems for whatever non-nvcc compiler includes that .h file.
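As a sketch of where this goes wrong, assume a hypothetical header `scale_t.h` holding a templated kernel and a templated wrapper (the names `scale_kernel_t` and `scale_on_gpu_t` are invented for illustration). It builds when included from a .cu file compiled by nvcc, but the header now contains a `<<<...>>>` launch, which a host-only compiler does not understand.

```cuda
// ---- scale_t.h (hypothetical name): template definitions live in the header ----
#include <cuda_runtime.h>

// Templated kernel: multiplies each of the n elements by factor.
template <typename T>
__global__ void scale_kernel_t(T* data, int n, T factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Templated wrapper. Its body must be visible at the point of instantiation,
// so it cannot be hidden away in a .cu file the way a non-template wrapper can.
// Assumes n > 0; error checking is omitted.
template <typename T>
void scale_on_gpu_t(T* host_data, int n, T factor)
{
    T* dev = nullptr;
    size_t bytes = n * sizeof(T);
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host_data, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
    // This launch is the line a non-nvcc compiler cannot parse.
    scale_kernel_t<T><<<blocks, threads>>>(dev, n, factor);

    cudaMemcpy(host_data, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
}
```

From a .cu file, a call such as `scale_on_gpu_t(v, 3, 2.0)` with `double v[3]` deduces `T = double`. The same call from a .cpp file is exactly what I would like to be able to write.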
Does anyone know of a way around this limitation? Perhaps there is none, as suggested by the fact that the fully templated CUDA Thrust library is directly callable only from .cu files (see here).