In one application I have a number of CUDA kernels. Some use dynamic parallelism and some don't. I want to either provide a fallback when dynamic parallelism is not supported, or simply let the application continue with reduced or partially available features. How should I set up compilation for this?
At the moment I'm getting "invalid device function" when I run kernels that don't require compute capability 3.5, compiled with -arch=sm_35, on a GTX 670 (which supports at most sm_30).
AFAIK you can't pass multiple -arch=sm_* arguments, and using multiple -gencode=* arguments doesn't help. Also, for separable compilation I've had to create an additional object file using -dlink, but it doesn't get created when targeting compute 3.0 (nvlink fatal : no candidate found in fatbinary), because of -lcudadevrt, which I've needed for 3.5. How should I deal with this?
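The "invalid device function" error follows from the documented binary-compatibility rules. SASS (cubin) built for sm_XY runs only on devices with the same major version X and a minor version of at least Y. PTX built for compute_XY can be JIT-compiled for any device of compute capability X.Y or higher. The sketch below is a simplified model of those rules, not the CUDA runtime's actual loader. The names Image and has_compatible_image are hypothetical. It shows that an sm_35-only binary contains nothing a 3.0 device can run. A fat binary that also embeds an sm_30 image would give the 670 a usable image for the kernels that don't need dynamic parallelism. Under this model, the remaining obstacle is the device-link step with cudadevrt.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one image embedded in a fat binary.
struct Image {
    bool is_ptx;  // true: PTX (compute_XY), false: SASS (sm_XY)
    int major;    // X in compute_XY / sm_XY
    int minor;    // Y in compute_XY / sm_XY
};

// Simplified model (an assumption, not the runtime's real selection logic):
// returns true if at least one embedded image can run on a device of
// compute capability dev_major.dev_minor.
bool has_compatible_image(const std::vector<Image>& images, int dev_major, int dev_minor) {
    assert(dev_major >= 0 && dev_minor >= 0);
    for (const Image& img : images) {
        if (img.is_ptx) {
            // PTX can be JIT-compiled for the same or any newer compute capability.
            if (dev_major > img.major || (dev_major == img.major && dev_minor >= img.minor))
                return true;
        } else {
            // SASS runs only within the same major version, same or newer minor.
            if (dev_major == img.major && dev_minor >= img.minor)
                return true;
        }
    }
    return false;
}
```

For example, has_compatible_image({{false, 3, 5}}, 3, 0) is false (the situation in the question), while an image list containing both {false, 3, 0} and {false, 3, 5} is compatible with both a 3.0 and a 3.5 device.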
...nvcc build sequence, but I'm not going to go into the details of that. I believe that when CUDA 6 is available, it will no longer throw an error when linking cudadevrt against pre-cc3.5 code that does not otherwise attempt to use dynamic parallelism, and then this problem will be straightforward to solve. CUDA 6 should be available soon. – Robert Crovella
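Once a single binary links cleanly for both architectures, the application still has to choose at runtime between the dynamic-parallelism kernels and a fallback. Below is a minimal sketch of that decision, assuming hypothetical names (KernelPath, supports_dynamic_parallelism, choose_kernel_path). In a real application, major and minor would come from cudaGetDeviceProperties via the cudaDeviceProp::major and cudaDeviceProp::minor fields. Here they are plain parameters so the logic can be checked without a GPU. The only hard fact it encodes is that dynamic parallelism requires compute capability 3.5 or higher.

```cpp
#include <cassert>

// Which set of kernels to launch (hypothetical). In device code, the
// dynamic-parallelism kernels would typically be guarded with
// #if __CUDA_ARCH__ >= 350 so the sm_30 build still compiles.
enum class KernelPath { DynamicParallelism, Fallback };

// Dynamic parallelism requires compute capability 3.5 or higher.
bool supports_dynamic_parallelism(int major, int minor) {
    assert(major >= 0 && minor >= 0);
    return major > 3 || (major == 3 && minor >= 5);
}

// major/minor are assumed to come from cudaDeviceProp (via
// cudaGetDeviceProperties) in a real application.
KernelPath choose_kernel_path(int major, int minor) {
    return supports_dynamic_parallelism(major, minor) ? KernelPath::DynamicParallelism
                                                      : KernelPath::Fallback;
}
```

On the GTX 670 from the question (compute capability 3.0), choose_kernel_path(3, 0) selects KernelPath::Fallback, so the application keeps running with reduced features instead of failing.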