In the CUDA 5 programming guide, the following is said:
"Launches may continue to a depth of 24 generations, but this depth will typically be limited by available resources on the GPU."
My questions are the following:
Does the CUDA runtime on the GPU guarantee that a depth of 24 can always be achieved, and might it in some circumstances even go beyond 24 (case A)? Or do they mean that 24 is the absolute maximum limit and that this number might in fact not be reached at runtime (case B)?
If case B, what happens when a kernel is launched on the GPU and there are not enough resources? Does the launch fail? (Weird if this is the case!)
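For what it's worth, here is a minimal sketch (not anyone's production code) of how a kernel using dynamic parallelism can check whether its own device-side child launch succeeded, since the device runtime exposes cudaGetLastError / cudaGetErrorString. It assumes compilation with something like nvcc -arch=sm_35 -rdc=true -lcudadevrt:

```
#include <cstdio>

#define MAX_DEPTH 24   // the documented generation limit

// Each generation launches one child until MAX_DEPTH, then checks
// whether the device-side launch itself reported an error.
__global__ void recurse(int depth)
{
    printf("generation %d\n", depth);
    if (depth + 1 >= MAX_DEPTH) return;

    recurse<<<1, 1>>>(depth + 1);           // nested (device-side) launch
    cudaError_t err = cudaGetLastError();   // device-side error check
    if (err != cudaSuccess)
        printf("launch failed at depth %d: %s\n",
               depth, cudaGetErrorString(err));
}

int main()
{
    recurse<<<1, 1>>>(0);        // the host launch is generation 0
    cudaDeviceSynchronize();     // wait for the whole launch tree
    return 0;
}
```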
I plan on writing a CUDA program and I would like to benefit from the Kepler architecture. My algorithm absolutely needs function recursion, typically to a depth of 15-19 (the recursion depth is bound to my data structures).
You can make your recursive function a __device__ function and use it recursively. You just won't have kernels launching at each round of recursion in that case. I tried a simple recursive factorial implementation in CUDA and was able to recurse up to a depth of 20, the limit of long int to store the result. Needs cc 2.0. - Robert Crovella
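A minimal sketch of the device-function recursion that comment describes (plain call-stack recursion inside one kernel, no dynamic parallelism; the names here are illustrative). Recursive __device__ functions need compute capability 2.0 or later, and long long is used instead of long int so the 64-bit width is explicit:

```
#include <cstdio>

// Ordinary call-stack recursion inside a single kernel; no child launches.
__device__ long long factorial(int n)
{
    return (n <= 1) ? 1LL : n * factorial(n - 1);
}

__global__ void fact_kernel(int n)
{
    printf("%d! = %lld\n", n, factorial(n));
}

int main()
{
    fact_kernel<<<1, 1>>>(20);   // 20! is the largest factorial a 64-bit int holds
    cudaDeviceSynchronize();
    return 0;
}
```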