
I would like to use dynamic parallelism for one of my algorithms. I launch a kernel from the CPU with 12 blocks * 512 threads, and I was wondering whether, when I use dynamic parallelism, each thread will launch the new kernel.

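__global__ void preprocessing(int x, int y);
__global__ void dynamic(int x, int y, int z);

int main(){
    int x = 0, y = 0, z = 0;      // placeholder arguments
    dynamic<<<12,512>>>(x,y,z);   // parent grid: 12 blocks * 512 threads
    cudaDeviceSynchronize();      // wait for the parent grid to finish
    return 0;
}

__global__ void dynamic(int x, int y, int z){
    preprocessing<<<1,100>>>(x,y);   // child launch: 1 block * 100 threads
}

__global__ void preprocessing(int x, int y){
    // do stuff
}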

Does this mean that each of my ~6000 threads (12 * 512 = 6144) will try to launch 100 threads? Or will my first kernel, dynamic, pause all ~6000 threads, launch 100 threads, and come back with the results to be processed by the original ~6000 threads?

And if so, how could I launch just 100 threads? Should I use a condition like

if(tid==1){
    preprocessing<<<1,100>>>(x,y);
}

1 Answer


Yes, in your first example, every thread that reaches the kernel launch will launch its own child kernel, so you would get 6144 child grids of 100 threads each. Device-side kernel launches are asynchronous, i.e. control returns immediately to the calling thread, just as on the host, so the parent threads are not paused.

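If you only want a single child kernel launched from the parent kernel, your second method would work, provided tid is a globally unique thread index (e.g. blockIdx.x * blockDim.x + threadIdx.x). If tid is just threadIdx.x, one thread in each of the 12 blocks passes the test and you get 12 child launches instead of one.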
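To see the difference for yourself, here is a minimal sketch that counts how many child grids actually run. The counter childGrids, the guard parameter and the host helper count_child_launches are hypothetical names added for illustration; they are not part of your code. It assumes a device of compute capability 3.5 or higher and a build with relocatable device code (nvcc -rdc=true, linked against cudadevrt).

```cuda
#include <cuda_runtime.h>

// Hypothetical counter (not in the original code): incremented once per child grid.
__device__ int childGrids;

__global__ void preprocessing(int x, int y)
{
    // Exactly one thread of each child grid records that the grid ran.
    if (threadIdx.x == 0) {
        atomicAdd(&childGrids, 1);
    }
    // do stuff
}

__global__ void dynamic(int x, int y, int z, bool guard)
{
    // Globally unique index across the whole parent grid.
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // guard == true : only one parent thread performs the launch.
    // guard == false: every parent thread launches its own child grid.
    if (!guard || tid == 0) {
        preprocessing<<<1, 100>>>(x, y);
    }
}

// Hypothetical host helper: runs the parent grid and returns the number
// of child grids that executed.
int count_child_launches(int blocks, int threads, bool guard)
{
    int zero = 0;
    int count = -1;
    cudaMemcpyToSymbol(childGrids, &zero, sizeof(int));
    dynamic<<<blocks, threads>>>(1, 2, 3, guard);
    // The parent grid is not complete until all of its children are,
    // so a host-side synchronize is enough here.
    cudaDeviceSynchronize();
    cudaMemcpyFromSymbol(&count, childGrids, sizeof(int));
    return count;
}
```

With guard set to true and your 12 * 512 configuration, the helper should report a single child grid. With guard set to false, try a small parent grid first (say 2 blocks of 4 threads, which gives 8 child grids), since a very large number of outstanding device-side launches can run into the device runtime's pending-launch limit.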

You may want to read the dynamic parallelism programming guide. You can play with the simple example given in the Programming Guidelines section and modify the kernel launch parameters to see one or multiple child kernels get launched.

Refer to p.11 of the programming guide for a description of asynchronous device kernel launches.