2 votes

I know that, in general, CUDA kernels cannot be called directly from a .cpp file. Instead, if such capability is desired, a kernel must be wrapped in a CPU-callable function whose interface goes into a .h file and whose implementation goes into the .cu file along with the kernel.
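For concreteness, a minimal sketch of that pattern (the names, kernel body, and launch configuration here are just my illustration):

add.h

void vector_add(const float* a, const float* b, float* out, int len);

add.cu

#include "add.h"

// a, b and out are assumed to be device pointers (e.g. from cudaMalloc).
__global__ void kernel_vector_add(const float* a, const float* b, float* out, int len)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < len) out[i] = a[i] + b[i];
}

// CPU-callable wrapper; only nvcc ever sees the launch syntax.
void vector_add(const float* a, const float* b, float* out, int len)
{
    int threads = 256;
    int blocks = (len + threads - 1) / threads;
    kernel_vector_add<<<blocks, threads>>>(a, b, out, len);
}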

However, abiding by this policy poses a problem if the kernel is templated on its element type and one wishes to pass that template parameter on through the CPU wrapper to the .cpp file: a template's implementation must be visible at the point of instantiation, i.e. in the .h file, so the wrapper's kernel-launch code would end up in a header that whatever non-nvcc compiler processes the .cpp file must also parse.

Does anyone know of a way around this limitation? Perhaps there is none, as suggested by the fact that the (fully templated) CUDA Thrust library is directly callable only from .cu files (see here)?


1 Answer

3 votes

You are right: a kernel template always has to be instantiated in a .cu file.
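For instance, one way to satisfy that requirement is to define the templated host wrapper in the .cu file and explicitly instantiate it there for the types you need. A sketch (hypothetical scale example; the launch configuration is arbitrary):

scale.h

template <class T> void scale(T* data, T factor, int len);

scale.cu

#include "scale.h"

template <class T>
__global__ void kernel_scale(T* data, T factor, int len)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < len) data[i] *= factor;
}

// Defined here so that only nvcc ever sees the launch syntax.
template <class T>
void scale(T* data, T factor, int len)
{
    kernel_scale<T><<<(len + 255) / 256, 256>>>(data, factor, len);
}

// Explicit instantiations: the only types a .cpp caller can link against.
template void scale<float>(float*, float, int);
template void scale<double>(double*, double, int);

A .cpp file that includes scale.h can then call scale<float>(...) or scale<double>(...), and the linker resolves the calls against these instantiations.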

For simple enough template functions (e.g. only one type parameter), overloaded functions can sometimes fit your needs. Alternatively, you can create a forwarding template for the .cpp side, as in the sketch below.

kernel.cu

template <class T>
__global__ void kernel_axpy(T* x, T* y, int len) { ... }

void axpy(float* x, float* y, int len){ kernel_axpy<<<...>>>(x,y,len); }
void axpy(double* x, double* y, int len){ kernel_axpy<<<...>>>(x,y,len); }

axpy.h

#include <iostream>

extern void axpy(float* x, float* y, int len);
extern void axpy(double* x, double* y, int len);

template <class T> void cpp_axpy(T* x, T* y, int len) { std::cerr << "Not implemented." << std::endl; }
// Explicit specializations behave like ordinary functions, so mark them inline
// to avoid ODR violations when this header is included from several .cpp files.
template <> inline void cpp_axpy<float>(float* x, float* y, int len) { axpy(x, y, len); }
template <> inline void cpp_axpy<double>(double* x, double* y, int len) { axpy(x, y, len); }

main.cpp

#include "axpy.h"

...
{
    axpy(xx,yy,length);
    cpp_axpy<double>(xxx,yyy,lll);
}
...