
Is it possible to have the Dynamic Parallelism feature on devices with compute capability less than 3.5? I know that nvcc will return an error if it encounters a kernel launch inside a kernel, but is it possible to declare some PTX-level function to support this feature? For example, here I have declared cudaLaunchDevice():

// PTX-level declaration of cudaLaunchDevice() when .address_size is 32
.extern .func(.param .b32 func_retval0) cudaLaunchDevice
(
    .param .b32 func,
    .param .b32 parameterBuffer,
    .param .align 4 .b8 gridDimension[12],
    .param .align 4 .b8 blockDimension[12],
    .param .b32 sharedMemSize,
    .param .b32 stream
)
;

The hope is that the CUDA-level declaration of this function in cuda_device_runtime_api.h would be mapped to the PTX-level declaration above. (Does the program have to be linked against something for this to work?)

//CUDA-level declaration of cudaLaunchDevice()
extern "C" __device__
cudaError_t cudaLaunchDevice(void *func, void *parameterBuffer,
                             dim3 gridDimension, dim3 blockDimension,
                             unsigned int sharedMemSize,
                             cudaStream_t stream);

(The code is from the CUDA C Programming Guide, v6.5.)
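
For comparison, this is how a device-side launch is normally built on hardware that does support Dynamic Parallelism (compute capability 3.5 and higher): the code is compiled with relocatable device code and linked against the device runtime library. A minimal sketch, with illustrative kernel names and an assumed file name dp_example.cu:

#include <cstdio>

__global__ void childKernel(int value)
{
    printf("child: block %d, thread %d, value %d\n", blockIdx.x, threadIdx.x, value);
}

__global__ void parentKernel()
{
    // The device runtime implements this <<<...>>> launch in terms of
    // cudaGetParameterBuffer()/cudaLaunchDevice() under the hood.
    childKernel<<<2, 4>>>(42);
    cudaDeviceSynchronize();   // wait for the child grid before the parent exits
}

int main()
{
    parentKernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}

// Build with relocatable device code and the device runtime library:
// nvcc -arch=sm_35 -rdc=true dp_example.cu -lcudadevrt -o dp_example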


1 Answer


Unfortunately, that does not work. You'll need a Kepler GK110-class GPU (compute capability 3.5) or a first- or second-generation Maxwell card.

"Dynamic Parallelism is only supported by devices of compute capability 3.5 and higher." Source: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#ixzz3OB61zgmK