My problem is the following: I want to add CUDA code to an existing C++ library and reuse as much of my existing code as possible. To get polymorphism, I use template classes and template kernels, so everything is implemented in .cpp, .h and .cuh files. No .cu file is involved, so nvcc is never invoked and the C++ compiler chokes on the <<< >>> kernel invocation syntax.

I have already seen [How to separate the kernel file CUDA with the main .cpp file] and [How to call a CUDA file from a C++ header file?], but I cannot find any design that would solve my problem.

The files involved:

main.cpp

Instantiate a bunch of my existing classes and pass them to a CudaPrepare class that composes them and is responsible for preparing the data, using only primitive types, to be passed to the CUDA code.

#include "CudaPrepare.h"
#include "CudaSpecificType1.cuh"
#include "A.h" //already existing classes 
#include "B.h" //already existing classes

int main()
{
    A a(...);
    B b(...);
    CudaSpecificType1 cudaType(...);
    CudaPrepare<CudaSpecificType1> cudaPrepare(a, b, cudaType);
    cudaPrepare.run();

    return 0;
}

CudaSpecificType1.cuh

class CudaSpecificType1
{
protected:
/*
a few members
*/
public:
CudaSpecificType1(...) : /*initializations*/ {}
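// const because the kernel calls it on a const T; it also needs a __device__ qualifier to be callable from the kernel
float polymorphicFunction(/*args*/) const; 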
};

CudaPrepare.h

#include "A.h" //already existing classes 
#include "B.h" //already existing classes
#include "CudaClass.cuh" //needed for CudaClass<T> below

template<typename T>
class CudaPrepare
{
protected:
const A& a;
const B& b;
const T& t;
public:
CudaPrepare(const A& a, const B& b, const T& t): a(a), b(b), t(t) {/*some initialization stuff*/}
void run() const
{
/*
data preparation : various discretizations,  sticking to primitive type only, casting to single precision etc...
*/

CudaClass<T> cudaClass(t, /*all the prepared data here*/);
cudaClass.run();

}
};

CudaClass.cuh

template <typename T>
__global__ void kernel(const T t, /*other args*/, float* results)
{
int threadId = ...;
results[threadId] = t.polymorphicFunction(...);

}



template<typename T>
class CudaClass
{
protected:
const T& t;
/*
all the prepared data with primitive types
*/
public:
CudaClass(const T& t, ...) : t(t) /*other initialization*/ {}
void run() const
{
/*
grid size calculation, cuda memory allocation, data transfer to device...
*/
//kernel invocation
kernel<T><<</*grid & block size*/>>>(/*args*/);
/*
clean up with cudaFree(...);
*/
}
};

The C++ compiler gives an error at the kernel invocation, as expected. CudaClass::run() cannot simply be moved to a .cu file since the class is a template. The only thing I can think of is to introduce a .cu file that either replaces main.cpp or contains a stub called from main.cpp, but then nvcc cannot handle some C++11 features. In particular, A.h and B.h contain a lot of enum classes...

Have you tried CUDA 7? It supports C++11. And yes, I think you must introduce a .cu file which launches your kernel. - m.s.
@ms: if you use the driver API, then the <<<>>> style kernel launches are not necessary. - talonmies
Put the kernel launch in a .cu file, with a wrapper function around it. In your CudaClass.cuh, put your templated CUDA class with the run function, and have the run function call the wrapper function in the .cu file. If you want the wrapper function/kernel to be templated, then manually specialize it for the types you need. You've already said you intend to use the CUDA code "with only primitive types". That seems doable. - Robert Crovella
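
The wrapper-function design from the last comment could look roughly like the sketch below. It is a single listing standing in for three files, and it is only one possible reading of the comment. The names Scale, launchKernel and HD are hypothetical and not from the question. The __CUDACC__ guards are an added assumption: they let the same text build with nvcc (real kernel launch) or with a plain C++ compiler (host loop fallback), which is handy for trying the structure without a GPU. CUDA error checking is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expands to the CUDA qualifiers under nvcc and to nothing otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// ---- CudaSpecificType1.cuh (Scale is a hypothetical stand-in) ----
struct Scale {
    float factor;
    HD float polymorphicFunction(float x) const { return factor * x; }
};

// ---- CudaClass.cuh: safe to include from main.cpp, no <<< >>> here ----
template <typename T>
void launchKernel(const T& t, const float* in, float* out, int n);  // declaration only

template <typename T>
class CudaClass {
    const T& t;
public:
    explicit CudaClass(const T& t) : t(t) {}
    std::vector<float> run(const std::vector<float>& in) const {
        std::vector<float> out(in.size());
        launchKernel(t, in.data(), out.data(), static_cast<int>(in.size()));
        return out;
    }
};

// ---- CudaClass.cu: the only part that has to go through nvcc ----
#ifdef __CUDACC__
template <typename T>
__global__ void kernel(const T t, const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = t.polymorphicFunction(in[i]);
}
#endif

template <typename T>
void launchKernel(const T& t, const float* in, float* out, int n) {
    assert(n >= 0);
    if (n == 0) return;  // nothing to do, and an empty grid is not a valid launch
#ifdef __CUDACC__
    const std::size_t bytes = n * sizeof(float);
    float* dIn = 0;
    float* dOut = 0;
    cudaMalloc((void**)&dIn, bytes);
    cudaMalloc((void**)&dOut, bytes);
    cudaMemcpy(dIn, in, bytes, cudaMemcpyHostToDevice);
    const int block = 256;
    kernel<T><<<(n + block - 1) / block, block>>>(t, dIn, dOut, n);
    cudaMemcpy(out, dOut, bytes, cudaMemcpyDeviceToHost);  // blocks until the kernel is done
    cudaFree(dIn);
    cudaFree(dOut);
#else
    // Host fallback so the sketch also builds without nvcc.
    for (int i = 0; i < n; ++i) out[i] = t.polymorphicFunction(in[i]);
#endif
}

// One explicit instantiation per type that main.cpp is allowed to use.
template void launchKernel<Scale>(const Scale&, const float*, float*, int);
```

With this layout main.cpp only ever sees the declaration of launchKernel, so something like CudaClass<Scale>(s).run(data) compiles with the ordinary C++ compiler. A type without an explicit instantiation in the .cu file would show up as an undefined symbol at link time.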

1 Answer

I experimented with CUDA 7.0 (I was on 6.5 before). Sadly, there still seems to be no support for (at least) the following C++11 features:

  1. enum classes

  2. final keyword

  3. range based for loops

However, as suggested by Robert Crovella, explicit template instantiation solves the problem.

CudaClass.cuh must be split in two:

CudaClass.cuh

template <typename T>
__global__ void kernel(const T t, /*other args*/, float* results)
{
int threadId = ...;
results[threadId] = t.polymorphicFunction(...);

}



template<typename T>
class CudaClass
{
protected:
const T& t;
/*
all the prepared data with primitive types
*/
public:
CudaClass(const T& t, ...) : t(t) /*other initialization*/ {}

void run() const;

};

CudaClass.cu

#include "CudaClass.cuh"
#include "CudaSpecificType1.cuh"



//out-of-class definition of the member function, now compiled by nvcc
template<typename T>
void CudaClass<T>::run() const
{
/*
grid size calculation, cuda memory allocation, data transfer to device...
*/
//kernel invocation
kernel<T><<</*grid & block size*/>>>(/*args*/);
/*
clean up with cudaFree(...);
*/
}



//explicit instantiation, so that the kernel invocation can be in a .cu file;
//placed after the definition of run() so that run() is instantiated as well
template class CudaClass<CudaSpecificType1>;
/*
other explicit instantiations for various types
*/
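
For anyone who wants to try the mechanics without a CUDA toolchain, here is a host-only miniature of the same split. Runner and Doubler are hypothetical stand-ins for CudaClass and CudaSpecificType1, and the body of run() is a plain function call where the kernel launch would be. It is a sketch of the pattern, not the actual library code.

```cpp
#include <cassert>

// ---- header part: visible to every translation unit ----
template <typename T>
class Runner {                  // hypothetical stand-in for CudaClass
    const T& t;
public:
    explicit Runner(const T& t) : t(t) {}
    float run(float x) const;   // declared only, like CudaClass<T>::run()
};

struct Doubler {                // hypothetical stand-in for CudaSpecificType1
    float polymorphicFunction(float x) const { return 2.0f * x; }
};

// ---- source part: this is what would sit in the .cu file ----
template <typename T>
float Runner<T>::run(float x) const {
    return t.polymorphicFunction(x);   // the kernel launch would go here
}

// Explicit instantiation, placed after the definition of run() so that
// the member function is instantiated together with the class.
template class Runner<Doubler>;
```

Code that only includes the header part can call Runner<Doubler>::run() and will link against the instantiation above. Using Runner with any other type compiles but fails at link time, which is the price of keeping the definition out of the header.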