calling templated CUDA kernels from a .cpp file

Question

I know that, in general, CUDA kernels cannot be called directly from a .cpp file. Instead, if such capability is desired, a kernel must be wrapped in a CPU-callable function whose interface goes into a .h file and whose implementation goes into the .cu file along with the kernel.

However, this policy poses a problem if the kernel is templated on its type and one wishes to pass that genericity through the CPU wrapper to the .cpp file: a template's interface must be in the same file (.h) as its implementation, which causes problems for whatever non-nvcc compiler includes that .h file.

Does anyone know of a way around this limitation? Perhaps there is none, as suggested by the fact that the (fully templated) CUDA Thrust library is directly callable only from .cu files (see here)?

kangshiyin · Accepted Answer · 2013-10-15T18:14:21

You are right: a kernel template always has to be instantiated in a .cu file.

For simple enough function templates (e.g. only one type parameter), overloaded functions can sometimes fit your needs. Alternatively, you can create another template for the .cpp files.

kernel.cu

template <class T>
__global__ void kernel_axpy(T* x, T* y, int len) { ... }

void axpy(float* x, float* y, int len){ kernel_axpy<<<...>>>(x,y,len); }
void axpy(double* x, double* y, int len){ kernel_axpy<<<...>>>(x,y,len); }

axpy.h

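#include <iostream>

// Implemented in kernel.cu (compiled by nvcc).
extern void axpy(float* x, float* y, int len);
extern void axpy(double* x, double* y, int len);

// Fallback for types that have no CUDA implementation.
template <class T> void cpp_axpy(T* x, T* y, int len) { std::cerr << "Not implemented." << std::endl; }
// Full specializations are ordinary functions, so they need `inline` when defined in a header.
template <> inline void cpp_axpy<float>(float* x, float* y, int len) { axpy(x,y,len); }
template <> inline void cpp_axpy<double>(double* x, double* y, int len) { axpy(x,y,len); }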

main.cpp

#include "axpy.h"

...
{
    axpy(xx,yy,length);
    cpp_axpy<double>(xxx,yyy,lll);
}
...
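
A related alternative to the overloads above is explicit instantiation: the header carries only the template declaration, and the .cu file holds the definition plus one explicit instantiation per supported type. The sketch below is not from the answer. It collapses the header and the .cu file into a single translation unit so that it compiles with an ordinary C++ compiler, the names scale_add and axpy_wrapper.h are hypothetical, and a plain CPU loop stands in for the kernel launch.

```cpp
#include <cassert>
#include <cstddef>

// --- would live in axpy_wrapper.h (hypothetical name), included by .cpp files ---
// Declaration only: the host compiler never sees a definition or CUDA syntax.
template <class T>
void scale_add(T a, const T* x, T* y, std::size_t len);

// --- would live in the .cu file, compiled by nvcc ---
template <class T>
void scale_add(T a, const T* x, T* y, std::size_t len) {
    assert(x != nullptr && y != nullptr);
    // Assumption: this loop is a stand-in for a kernel launch computing y = a*x + y.
    for (std::size_t i = 0; i < len; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Explicit instantiations: object code is emitted for exactly these types,
// so callers in other translation units can link against them.
template void scale_add<float>(float, const float*, float*, std::size_t);
template void scale_add<double>(double, const double*, double*, std::size_t);
```

With this layout a .cpp file can call scale_add<float> or scale_add<double> as a normal template. Any other type compiles against the declaration but fails at link time, which replaces the runtime "Not implemented." fallback with a build error.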
