The kernel uses (output of --ptxas-options=-v):
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info: Used 45 registers, 49152+0 bytes smem, 64 bytes cmem[0], 12 bytes cmem[16]
Launch with: kernelA<<<20,512>>>(float parmA, int paramB);
and it will run fine.
Launch with: kernelA<<<20,513>>>(float parmA, int paramB);
and I get the out-of-resources error (too many resources requested for launch).
Fermi device properties: 48 KB of shared memory per SM, 64 KB of constant memory, 32K registers per SM, a maximum of 1024 threads per block, compute capability 2.1 (sm_21).
I'm using all of my shared memory. By my estimate I should run out of register space at around 700 threads per block (32K registers / 45 registers per thread is about 728). Yet the kernel will not launch if I ask for more than half of the maximum threads per block (512 of 1024). It may just be a coincidence, but I doubt it.
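For reference, the register estimate can be written out as plain host-side arithmetic. This is only a rough sketch: the function names are made up for illustration, and it assumes each thread costs exactly the register count reported by ptxas, ignoring any per-warp allocation granularity the hardware may apply.

```cpp
// Naive register-budget estimate (hypothetical helper names).
// Assumes every thread costs exactly regsPerThread registers and ignores
// any hardware allocation granularity, which could lower the real limit.
constexpr int kWarpSize = 32;

// Largest thread count whose registers fit in the register file.
constexpr int maxThreadsByRegisters(int regsPerSM, int regsPerThread) {
    return regsPerSM / regsPerThread;
}

// Round a thread count down to a whole number of warps.
constexpr int roundDownToWarp(int threads) {
    return (threads / kWarpSize) * kWarpSize;
}

// Round a thread count up to a whole number of warps.
constexpr int roundUpToWarp(int threads) {
    return ((threads + kWarpSize - 1) / kWarpSize) * kWarpSize;
}

// Registers a block would need under the naive model.
constexpr int registersNeeded(int threadsPerBlock, int regsPerThread) {
    return roundUpToWarp(threadsPerBlock) * regsPerThread;
}
```

With 32768 registers and 45 registers per thread this gives 728 threads, or 704 after rounding down to whole warps. Under the same model a 513-thread block (17 warps) needs 24480 registers, which is still below 32768, so this estimate alone does not explain the failure at 513.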
- Why can't I use a full block of threads (1024)?
- Any guess as to which resource I'm running out of?
- I have often wondered where the state of stalled threads is kept while other warps run. What resource holds it?