2
votes

I am writing a very long CUDA kernel, and it has become hard to read. Is there any way to organize a CUDA kernel by moving parts of it into functions defined outside the kernel? For example:

__global__ void CUDA_Kernel(int* a, int* b){
     //calling function 1
     //calling function 2
     //calculation function
         .......
}

2 Answers

4
votes

A function can be called from inside a kernel if it is defined using the __device__ keyword.

For example:

__device__ int test_fun(int val)
{
   return 2*val + 3;
}


__global__ void kern_test(int *data)
{
   // Global index of this thread across the whole grid
   int offset = blockDim.x * blockIdx.x + threadIdx.x;
   data[offset] = test_fun(data[offset]);
}
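To try the snippet above end to end, a minimal host-side sketch might look like the following. The array size `N`, the block size of 256, and the extra length parameter `n` used for a bounds check are assumptions for illustration and are not part of the original answer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int test_fun(int val)
{
    return 2 * val + 3;
}

// Same kernel as above, plus a length parameter (an assumption) so that
// threads past the end of the array do nothing.
__global__ void kern_test(int *data, int n)
{
    int offset = blockDim.x * blockIdx.x + threadIdx.x;
    if (offset < n)
        data[offset] = test_fun(data[offset]);
}

int main()
{
    const int N = 1000;                 // hypothetical array size
    int host[N];
    for (int i = 0; i < N; ++i)
        host[i] = i;

    // Copy the input to the device
    int *dev = nullptr;
    cudaMalloc(&dev, N * sizeof(int));
    cudaMemcpy(dev, host, N * sizeof(int), cudaMemcpyHostToDevice);

    // Round the block count up so every element is covered
    const int threads = 256;            // hypothetical block size
    const int blocks = (N + threads - 1) / threads;
    kern_test<<<blocks, threads>>>(dev, N);

    // Copy the result back (this call waits for the kernel to finish)
    cudaMemcpy(host, dev, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[10] = %d\n", host[10]);  // computed as 2*10 + 3
    return 0;
}
```

The bounds check matters whenever the grid holds more threads than the array has elements, which is the usual case when the block count is rounded up.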
4
votes

Yep. Define and call device functions:

__device__ float foo(float x, float y) {
    return x + y * x;
}
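The calling side could look something like this sketch. The kernel name `foo_kernel`, its parameters, the array size, and the use of managed memory are all hypothetical choices made to keep the example short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ float foo(float x, float y) {
    return x + y * x;
}

// Hypothetical kernel: applies foo element-wise to two input arrays.
__global__ void foo_kernel(const float *x, const float *y, float *out, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n)
        out[i] = foo(x[i], y[i]);
}

int main() {
    const int n = 512;                  // hypothetical array size
    float *x, *y, *out;

    // Managed memory is accessible from both host and device
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    foo_kernel<<<(n + 127) / 128, 128>>>(x, y, out, n);
    cudaDeviceSynchronize();            // wait before reading out[] on the host

    printf("out[0] = %f\n", out[0]);    // computed as 1 + 2*1
    cudaFree(x);
    cudaFree(y);
    cudaFree(out);
    return 0;
}
```

A long kernel can be split the same way: each logical step becomes its own `__device__` function, and the `__global__` kernel is reduced to a short sequence of calls.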