My problem is the following: I want to add CUDA code to an already existing C++ library and reuse my existing code as much as possible. To get polymorphism, I use template classes and template kernels. As a result, everything is implemented in .cpp, .h and .cuh files. No .cu file is involved, so nvcc is never invoked and the C++ compiler chokes on the `<<< >>>` kernel launch syntax.
I have already looked at "How to separate the kernel file CUDA with the main .cpp file" and "How to call a CUDA file from a C++ header file?", but I cannot find a design that solves my problem.
The files involved:
main.cpp
Instantiates a bunch of my already existing classes and passes them to a CudaPrepare class, which composes them and is responsible for preparing the data (primitive types only) that is passed to the CUDA code.
```cpp
#include "CudaPrepare.h"
#include "CudaSpecificType1.cuh"
#include "A.h" // already existing classes
#include "B.h" // already existing classes

int main()
{
    A a(...);
    B b(...);
    CudaSpecificType1 cudaType(...);
    CudaPrepare<CudaSpecificType1> cudaPrepare(a, b, cudaType);
    cudaPrepare.run();
    return 0;
}
```
CudaSpecificType1.cuh
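```cpp
class CudaSpecificType1
{
protected:
    /*
    a few members
    */
public:
    CudaSpecificType1(...) : /*initializations*/ {}
    // Called from the kernel on a const copy, so it must be const
    // and compiled for the device as well as the host.
    __host__ __device__ float polymorphicFunction(/*args*/) const;
};
```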
CudaPrepare.h
```cpp
#include "A.h" // already existing classes
#include "B.h" // already existing classes
#include "CudaClass.cuh"

template<typename T>
class CudaPrepare
{
protected:
    const A& a;
    const B& b;
    const T& t;
public:
    CudaPrepare(const A& a, const B& b, const T& t) : a(a), b(b), t(t) { /*some initialization stuff*/ }
    void run() const
    {
        /*
        data preparation: various discretizations, sticking to primitive types only, casting to single precision, etc.
        */
        CudaClass<T> cudaClass(t, /*all the prepared data here*/);
        cudaClass.run();
    }
};
```
CudaClass.cuh
```cpp
template <typename T>
__global__ void kernel(const T t, /*other args*/, float* results)
{
    int threadId = ...;
    results[threadId] = t.polymorphicFunction(...);
}

template<typename T>
class CudaClass
{
protected:
    const T& t;
    /*
    all the prepared data with primitive types
    */
public:
    CudaClass(const T& t, ...) : t(t) /*other initialization*/ {}
    void run() const
    {
        /*
        grid size calculation, CUDA memory allocation, data transfer to device...
        */
        // kernel invocation
        kernel<T><<</*grid & block size*/>>>(/*args*/);
        /*
        clean up with cudaFree(...);
        */
    }
};
```
As expected, the C++ compiler reports an error at the kernel launch. CudaClass::run() cannot simply be moved to a .cu file because the class is templated. The only option I can think of is to introduce a .cu file that replaces main.cpp, or one that contains a stub called from main.cpp, but then nvcc cannot handle some C++11 features. In particular, A.h and B.h contain a lot of enum classes...
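For reference, the stub idea above is commonly paired with explicit template instantiation, which targets the "the class is templated" objection: the header seen by the host compiler holds only a declaration of a templated launcher, while the .cu file holds the definition (with the `<<< >>>` launch) plus one explicit instantiation per concrete type. The sketch below shows only the C++ mechanics, in a single host-only file so it compiles with an ordinary C++ compiler; the intended file split is marked in comments and a host loop stands in for the kernel. `launchOnDevice` and `Square` are hypothetical names, not part of the code above. It assumes the .cu file needs nothing but the concrete types and primitive data, so A.h and B.h would stay in host-compiled files.

```cpp
// ---- launcher.h (hypothetical): safe to include from .cpp files built by the host compiler.
// It exposes only a declaration, so no CUDA syntax reaches the host compiler.
template <typename T>
void launchOnDevice(const T& t, const float* input, float* results, int n);

// ---- Hypothetical stand-in for CudaSpecificType1.
struct Square
{
    // In the CUDA version this member would be marked __host__ __device__.
    float polymorphicFunction(float x) const { return x * x; }
};

// ---- launcher.cu (hypothetical): compiled by nvcc in the real project.
template <typename T>
void launchOnDevice(const T& t, const float* input, float* results, int n)
{
    // Real version (assumption): cudaMalloc/cudaMemcpy the inputs, then
    //   kernel<T><<<grid, block>>>(t, d_input, d_results, n);
    // then copy the results back and cudaFree.
    // Here a host loop stands in for the kernel so the sketch runs without a GPU.
    for (int i = 0; i < n; ++i)
        results[i] = t.polymorphicFunction(input[i]);
}

// Explicit instantiation: makes the translation unit that sees the definition
// emit launchOnDevice<Square>, so the linker can resolve calls from .cpp files
// that only saw the declaration. One such line is needed per concrete type.
template void launchOnDevice<Square>(const Square&, const float*, float*, int);
```

With this split, a call such as `launchOnDevice(t, input, results, n);` from host code (for example inside `CudaPrepare::run()`) only needs the declaration, so the file containing it can stay a .cpp built by the host compiler.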