I would like to use dynamic parallelism for one of my algorithms. I launch a kernel from the CPU with 12 blocks * 512 threads, and I am wondering whether, once I use dynamic parallelism, each of those threads will launch the new kernel.
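__global__ void preprocessing(int x, int y) {
    // do stuff
}
__global__ void dynamic(int x, int y, int z) {
    // this launch statement is reached by every thread of the parent grid
    preprocessing<<<1, 100>>>(x, y);
}
int main() {
    int x = 0, y = 0, z = 0;  // placeholder values
    dynamic<<<12, 512, 0>>>(x, y, z);
    cudaDeviceSynchronize();  // wait for the parent grid to finish
    return 0;
}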
Does this mean that each of my ~6000 threads (12 * 512 = 6144) will try to launch 100 threads? Or will my first kernel, dynamic, pause all ~6000 threads, launch 100 threads once, and then come back with the results for my ~6000 original threads to process?
If each thread does launch its own kernel, how can I run only 100 threads in total? Should I use a condition like this?
if (tid == 1) {
    preprocessing<<<1, 100>>>(x, y);
}
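For reference, here is a minimal sketch of the guarded launch I have in mind, so the behavior can be checked directly. As far as I understand, every thread that reaches a launch statement starts its own child grid, so the guard has to use the global thread index and not just threadIdx.x (which would match once per block). The counter childThreads and the helper runGuardedLaunch are hypothetical names added only for this test, the kernel arguments are placeholders, and I use index 0 although any single index would do. It assumes a device of compute capability 3.5 or higher and a build with relocatable device code, e.g. nvcc -rdc=true test.cu -lcudadevrt.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical counter: each child thread increments it once, so the host
// can see how many child threads actually ran.
__device__ unsigned int childThreads = 0;

__global__ void preprocessing(int x, int y) {
    // Placeholder for the real work: just count this thread.
    atomicAdd(&childThreads, 1u);
}

__global__ void dynamic(int x, int y, int z) {
    // Global thread index across the whole parent grid (0 .. 6143 here).
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Only one thread in the entire grid performs the launch.
    if (tid == 0) {
        preprocessing<<<1, 100>>>(x, y);
    }
}

// Hypothetical helper: runs the parent grid and returns the child thread count.
unsigned int runGuardedLaunch() {
    unsigned int zero = 0, count = 0;
    cudaMemcpyToSymbol(childThreads, &zero, sizeof(zero));

    dynamic<<<12, 512>>>(1, 2, 3);   // placeholder arguments
    cudaDeviceSynchronize();         // parent grid finishes only after its child grids

    cudaMemcpyFromSymbol(&count, childThreads, sizeof(count));
    return count;
}

int main() {
    printf("child threads run: %u\n", runGuardedLaunch());
    return 0;
}
```

If the guard works as intended, the reported count should be 100. Removing the if should give 6144 * 100 instead, provided the device accepts that many pending launches. Note that in this sketch the other parent threads do not wait for the child grid; they simply continue past the if.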