I am writing a very long CUDA kernel, and it is becoming hard to read. Is there any way to organize CUDA kernels using functions defined outside of the kernel? For example:
__global__ void CUDA_Kernel(int* a, int* b) {
    // call function 1
    // call function 2
    // calculation function
    .......
}
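For context, this is roughly what I am hoping is possible: a minimal sketch where the per-thread work is factored into `__device__` helper functions called from the kernel (the helper names `scale` and `combine` below are just placeholders for my real functions).

```cuda
// Placeholder helpers standing in for "function 1" and the calculation step.
// __device__ functions can be defined outside the kernel and called from it.
__device__ int scale(int x) {
    return 2 * x;
}

__device__ int combine(int x, int y) {
    return x + y;
}

__global__ void CUDA_Kernel(int* a, int* b) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // The kernel body stays short; the details live in the helpers.
    b[i] = combine(scale(a[i]), b[i]);
}
```

Is this the idiomatic way to do it, and are there performance implications (e.g. inlining) I should be aware of?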