I'm using CUDA and trying to pass a device function, via a function pointer, to a library that later calls it from its own device kernel, similar to the CUDA function pointer example.
The important sections of the code are:
/** Type definition for the execution function in #qsched_run. */
typedef void (*qsched_funtype)( int , void * );

__device__ void gpuTest(int type , void *data)
{
    ....
}

/* Device-side variable holding the device address of gpuTest. */
__device__ qsched_funtype function = gpuTest;

int main(...)
{
    /* Host-side copy of the device function pointer. */
    qsched_funtype func;
    //Various initialization setup.
    if( cudaMemcpyFromSymbol( &func , function , sizeof(qsched_funtype) ) != cudaSuccess)
        error("Failed to copy function pointer from device");
    qsched_run_CUDA( &s , func );
    return 0;
}
The qsched_run_CUDA function is a library function that does some initialization, copies the function pointer to the device (into a variable the library can see), and then runs a kernel that at some point calls gpuTest through that function pointer.
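For reference, here is a minimal single-file sketch of the pattern described above. It is not the QuickSched code: run_kernel is a hypothetical stand-in for the library kernel, the body of gpuTest is invented for illustration, and the pointer is passed as a kernel argument rather than copied into a device variable.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef void (*qsched_funtype)(int, void *);

// Illustrative body only (assumption): store `type` in the int that `data` points to.
__device__ void gpuTest(int type, void *data)
{
    *static_cast<int *>(data) = type;
}

// Device-side variable initialized with the device address of gpuTest.
__device__ qsched_funtype function = gpuTest;

// Hypothetical stand-in for the library kernel: calls through the pointer.
__global__ void run_kernel(qsched_funtype fun, int *out)
{
    fun(42, out);
}

int main()
{
    // Read the device address of gpuTest back to the host.
    qsched_funtype func = nullptr;
    if (cudaMemcpyFromSymbol(&func, function, sizeof(qsched_funtype)) != cudaSuccess) {
        fprintf(stderr, "Failed to copy function pointer from device\n");
        return 1;
    }

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Hand the pointer back to the device and call through it.
    run_kernel<<<1, 1>>>(func, d_out);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int h_out = 0;
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("value written through the function pointer: %d\n", h_out);
    cudaFree(d_out);
    return 0;
}
```

In this sketch everything lives in one translation unit. In my real setup the kernel is inside the library and gpuTest is in my own .cu file.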
The code compiles correctly provided I use -G with the following nvcc call:
nvcc -g -G -m64 -I../src ../src/.libs/libquicksched_cuda.a -L/home/aidan/cuda_6.0/lib -L/home/aidan/cuda_6.0/lib64 -lcudart -lcuda -DWITH_CUDA -gencode arch=compute_30,code=sm_30 -lgomp test_gpu_simple.cu -o out.out
where
../src/.libs/libquicksched_cuda.a
is the library containing the qsched_run_CUDA function.
As soon as I remove the -G flag from the nvcc call, everything breaks: the kernel launched in qsched_run_CUDA crashes with an invalid program counter error, and the function pointer (including the copy in my own .cu file) is set to 0x4.
Presumably I need to use separate compilation in CUDA ( http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#using-separate-compilation-in-cuda ), as vaguely explained in the question "Cuda function pointer consistency". However, I'm not sure how to do this when using library functions; neither nvcc's guide nor the Stack Overflow link makes it obvious.
Does anyone have experience with this? I briefly tried to work out how to use nvlink for this, but I didn't get far (it didn't seem happy with my passing it a library).
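To make the separate-compilation idea concrete, here is a rough sketch of what a library-side file built with relocatable device code might look like. It assumes that separate compilation really is what is needed here. The names example_lib.cu, libexample.a, fun_on_device, example_kernel and example_run are hypothetical and are not part of QuickSched. The build commands in the comments follow the pattern from nvcc's separate-compilation guide and are likewise an assumption about how this would be wired up.

```cuda
// example_lib.cu: hypothetical stand-in for the library side.
//
// Assumed build steps (relocatable device code, then a static library):
//   nvcc -arch=sm_30 -dc example_lib.cu -o example_lib.o
//   nvcc -lib example_lib.o -o libexample.a
// The user's file is compiled the same way, and nvcc performs the device link:
//   nvcc -arch=sm_30 -dc test_gpu_simple.cu -o test_gpu_simple.o
//   nvcc -arch=sm_30 test_gpu_simple.o libexample.a -o out.out
#include <cuda_runtime.h>

typedef void (*qsched_funtype)(int, void *);

// Device variable from which the library kernel reads the function pointer.
__device__ qsched_funtype fun_on_device;

// Library kernel: calls the user's device function through the stored pointer.
__global__ void example_kernel(void *data)
{
    fun_on_device(0, data);
}

// Host entry point: copy the pointer into the device variable, then launch.
// `func` must hold a device address obtained with cudaMemcpyFromSymbol.
cudaError_t example_run(qsched_funtype func, void *device_data)
{
    cudaError_t err = cudaMemcpyToSymbol(fun_on_device, &func, sizeof(qsched_funtype));
    if (err != cudaSuccess)
        return err;
    example_kernel<<<1, 1>>>(device_data);
    return cudaDeviceSynchronize();
}
```

The idea is that both the library and the calling code are compiled with -dc so that the device link step can combine them into a single device image. I have not confirmed that this resolves the 0x4 pointer.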
__noinline__, e.g. __device__ __noinline__ void gpuTest( ... ? A complete code that reproduces the issue would be useful. – Robert Crovella