In my library I need to support devices of compute capability 2.0 and higher. For CC 3.5+ devices I’ve implemented optimized kernels which utilize Dynamic Parallelism. It seems that the nvcc compiler does not support DP when anything less than “compute_35,sm_35” is specified (I'm getting compiler/linker errors). My question is: what is the best way to support multiple kernel versions in such a case? Having multiple DLLs and choosing between them at runtime will work, but I was wondering if there is a better way.
UPDATE: I’m successfully using #if __CUDA_ARCH__ >= 350 for other things (like __ldg() etc.), but it does not work in the DP case, as I have to link with cudadevrt.lib, which produces the following error:
1>nvlink : fatal error : could not find compatible device code in C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v5.5/lib/Win32/cudadevrt.lib
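The runtime choice mentioned in the question can be sketched on the host side roughly as follows. This is only an illustration, not the asker's code: `KernelVariant`, `capability` and `select_variant` are hypothetical names, and the sketch assumes the major/minor compute capability has already been obtained (for example from the `major` and `minor` fields of `cudaDeviceProp` filled in by `cudaGetDeviceProperties`). It does not address the cudadevrt.lib link error itself.

```cpp
// Hypothetical helper names; nothing here comes from the original post.
// In real code, `major` and `minor` would be read from cudaDeviceProp
// after a call to cudaGetDeviceProperties().
enum class KernelVariant { Unsupported, Baseline, DynamicParallelism };

// Fold a compute capability into one comparable number, e.g. 3.5 -> 35.
constexpr int capability(int major, int minor) { return major * 10 + minor; }

// Pick the code path for a device:
//   CC 3.5 and above  -> kernels that use Dynamic Parallelism
//   CC 2.0 up to 3.4  -> baseline kernels
//   anything older    -> not supported by the library
constexpr KernelVariant select_variant(int major, int minor) {
    return capability(major, minor) >= 35 ? KernelVariant::DynamicParallelism
         : capability(major, minor) >= 20 ? KernelVariant::Baseline
                                          : KernelVariant::Unsupported;
}
```

The host would call `select_variant` once per device and then launch the matching kernel (or load the matching DLL). The check has to be made at runtime on the host because `__CUDA_ARCH__` is only available while compiling device code.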
#pragma comment(lib, ...). – Roger Dahl
1. __CUDA_ARCH__ is defined only for device code. 2. For some reason, #pragma comment(lib, ) does not work for that particular library (cudadevrt.lib). That is, if I replace it with, say, cudart.lib, then #pragma works just fine, but for cudadevrt.lib I'm getting errors like:
1>nvlink : error : Undefined reference to 'cudaLaunchDevice' in 'Win32/Debug/cdpSimplePrint.cu.obj'
– Alexey Kamenev