cuda dynamic parallelism linkage error extern c

Question

I'm trying to compile and link a CUDA program that uses Dynamic Parallelism (Kepler, sm_35) as follows:

nvcc -m32 -arch=sm_35 -dc -Xcompiler '-fPIC' DFS_Solving.cu
nvcc -m32 -arch=sm_35 -Xcompiler '-fPIC' -dlink DFS_Solving.o -o link.o
gcc  -shared -Wl,-soname,libdfs.so -o libdfs.so DFS_Solving.o link.o -L/usr/local/cuda/lib  -lcudart
gcc -c proxy.c
gcc -o proxy proxy.o -L. -ldfs

And I always get the following error:

./libdfs.so: undefined reference to `__fatbinwrap_66_tmpxft_000015c6_00000000_12_cuda_device_runtime_compute_50_cpp1_ii_5f6993ef'                    
collect2: error: ld returned 1 exit status

But when I follow the same procedure for CUDA code that does not use Dynamic Parallelism, the program works.

Does anybody know what I can do to make this build work?

You're not linking against -lcudadevrt anywhere that I can see. Dynamic Parallelism has special compilation requirements. You might want to look at a makefile in a cuda dynamic parallelism sample code. SO expects that you provide a complete MCVE, not just your compilation steps. — Robert Crovella

Robert Crovella · Accepted Answer · 2015-03-12T15:49:29

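It appears that you are missing the linkage against -lcudadevrt. Code that uses CUDA Dynamic Parallelism (CDP) needs to be linked against the device runtime library, which is what the undefined cuda_device_runtime symbol in your error message refers to.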

Here's a fully worked example. My compile sequence is not identical to yours, but pretty close:

$ cat DFS_Solving.cu
#include <stdio.h>

extern "C"{

  void cuda_test();

}

__global__ void child_kernel(){

  printf("hello\n");

}

__global__ void parent_kernel(){

  child_kernel<<<1,1>>>();
  cudaDeviceSynchronize();
}

void cuda_test(){

  parent_kernel<<<1,1>>>();
  cudaDeviceSynchronize();
}
$ cat proxy.c
void cuda_test();

int main(){

  cuda_test();

}

$ nvcc -arch=sm_35 -dc -Xcompiler '-fPIC' DFS_Solving.cu
$ nvcc -arch=sm_35 -Xcompiler '-fPIC' -dlink DFS_Solving.o -o link.o
$ gcc  -shared -Wl,-soname,libdfs.so -o libdfs.so DFS_Solving.o link.o -L/usr/local/cuda/lib64  -lcudart -lcudadevrt
$ gcc -c proxy.c
$ g++ -o proxy proxy.o -L. -ldfs
$ ./proxy
hello
$

The CUDA samples also include several projects that demonstrate how to compile and link CDP code.
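
As a variation on the sequence above, nvcc can usually drive the device link and the shared-library link in a single invocation. The following is only a sketch under assumptions, not something taken from the answer: the ARCH variable and the build_dfs function name are hypothetical, the file names are the ones used in the worked example, and the script does nothing if nvcc is not on the PATH.

```shell
#!/bin/sh
# Sketch: build libdfs.so and the proxy executable, letting nvcc perform
# the device link. ARCH is an assumed default; override it to match the
# local GPU and toolkit (e.g. ARCH=sm_70 sh build.sh).
ARCH="${ARCH:-sm_35}"

# build_dfs is a hypothetical helper name. The body runs in a subshell so
# that "set -e" stops at the first failing command without affecting the caller.
build_dfs() (
  set -e
  # -rdc=true enables relocatable device code, which CDP requires;
  # -lcudadevrt links the device runtime, the library missing in the question.
  nvcc -arch="$ARCH" -rdc=true -Xcompiler -fPIC -shared \
       -o libdfs.so DFS_Solving.cu -lcudadevrt
  gcc -c proxy.c
  # The worked example links the final executable with g++, so do the same.
  g++ -o proxy proxy.o -L. -ldfs
)

if command -v nvcc >/dev/null 2>&1; then
  build_dfs
else
  echo "nvcc not found on PATH; nothing built"
fi
```

Depending on the loader configuration, running the result may require the directory containing libdfs.so to be on the library search path, for example LD_LIBRARY_PATH=. ./proxy.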
