I am having trouble understanding the occupancy calculator. I have some development code where 512 threads per block works fine, but 1024 threads per block gives garbage numbers.
I am running a Tesla C2050 on Windows 7, developing in Matlab (it's not my fault I have to use Matlab) through a mexFunction.
I thought I would play around with the occupancy calculator to try to find any other restrictions on my code that were affecting the results.
When I enter 1024 threads per block, the calculator reports 0% occupancy. With 512 threads, the occupancy is 33%. I would have thought that I would get at least something with 1024 threads. I have also noticed that both the code and the occupancy calculator give good results up to a maximum of 704 threads per block (a number that doesn't seem to correspond to anything real).
I believe my lack of understanding in this area is the reason I cannot correct the error I am seeing in the code. Can anyone explain why I'm getting these results?
The numbers are:
- compute capability 2.0
- shared memory size 49152
- threads per block 512 or 1024
- registers per thread 44
- shared memory per block 0
ptxas info : Used 44 registers, 232 bytes cmem[0], 144 bytes cmem[2], 28 bytes cmem[16]
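
For reference, here is a rough sketch in Python of the register arithmetic the calculator appears to apply to the numbers above. The per-multiprocessor limits (32768 registers, 48 warps, 8 blocks, a 64-register allocation unit, warps allocated in pairs) are assumptions taken from the calculator's compute capability 2.0 data, not from anything measured here, and the function name `occupancy` is made up for illustration. Shared memory is left out because the kernel uses none.

```python
import math

# Assumed per-multiprocessor limits for compute capability 2.0
# (from the occupancy calculator's GPU data; please double-check).
REGS_PER_SM = 32768          # 32-bit registers per multiprocessor
MAX_WARPS_PER_SM = 48        # 1536 threads / 32 threads per warp
MAX_BLOCKS_PER_SM = 8
WARP_SIZE = 32
REG_ALLOC_UNIT = 64          # registers are allocated per warp, in units of 64
WARP_ALLOC_GRANULARITY = 2   # warps are allocated in pairs


def occupancy(threads_per_block, regs_per_thread):
    """Estimate occupancy (0.0 to 1.0) when only registers and warp count limit it."""
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)

    # Registers needed by one warp, rounded up to the allocation unit.
    regs_per_warp = math.ceil(regs_per_thread * WARP_SIZE / REG_ALLOC_UNIT) * REG_ALLOC_UNIT

    # Warps that fit in the register file, rounded down to the warp granularity.
    warps_by_regs = REGS_PER_SM // regs_per_warp
    warps_by_regs -= warps_by_regs % WARP_ALLOC_GRANULARITY

    # Whole blocks that fit under each limit; the smallest one wins.
    blocks_by_regs = warps_by_regs // warps_per_block
    blocks_by_warps = MAX_WARPS_PER_SM // warps_per_block
    blocks = min(blocks_by_regs, blocks_by_warps, MAX_BLOCKS_PER_SM)

    return blocks * warps_per_block / MAX_WARPS_PER_SM


if __name__ == "__main__":
    for threads in (512, 704, 736, 1024):
        print(threads, "threads per block:", round(100 * occupancy(threads, 44)), "% occupancy")
```

Under these assumptions, 44 registers per thread means 1408 registers per warp, so only 22 warps (704 threads) fit in the register file. One 512-thread block (16 warps) fits, giving 16/48 = 33%, while a 1024-thread block needs 32 warps' worth of registers and does not fit at all, hence 0%. If that is right, the 1024-thread kernel launch would fail outright rather than run slowly, which would be consistent with the bad numbers.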