I used --ptxas-options=-v while compiling my .cu code, and it gave the following:
ptxas info: Used 74 registers, 124 bytes smem, 16 bytes cmem[1]
deviceQuery for my card returns the following:
rev: 2.0
name: Tesla C2050
total shared memory per block: 49152
total registers per block: 32768
Now, I enter these values into the CUDA Occupancy Calculator as follows:
1.) compute capability: 2.0
1.b) shared memory size config (bytes): 49152
2.) threads per block: x
registers per thread: 74
shared memory per block (bytes): 124
I was varying x (threads per block) so that x*74 <= 32768. For example, I enter 128 (or 256) in place of x. Am I entering all the values required by the occupancy calculator correctly? Thanks.
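
In case it helps, the check I am doing by hand can be sketched roughly as below. This is only a simplified estimate, not what the calculator does internally. It assumes the per-block register and shared memory figures from deviceQuery are also the per-multiprocessor limits, and that a compute capability 2.0 multiprocessor holds at most 48 warps and 8 blocks. It ignores register allocation granularity. The function name estimate_occupancy is made up for illustration.

```python
# Simplified occupancy estimate for a compute capability 2.0 device.
# Assumptions (not from the calculator itself): the limits below apply per
# multiprocessor, and register allocation granularity is ignored.
REGS_PER_SM = 32768       # from deviceQuery ("total registers per block")
SMEM_PER_SM = 49152       # from deviceQuery ("total shared memory per block")
MAX_WARPS_PER_SM = 48     # assumed limit for compute capability 2.0
MAX_BLOCKS_PER_SM = 8     # assumed limit for compute capability 2.0
WARP_SIZE = 32


def estimate_occupancy(threads_per_block, regs_per_thread=74, smem_per_block=124):
    """Return (resident blocks per multiprocessor, occupancy as a fraction)."""
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")

    # Threads are scheduled in whole warps, so round up to a warp multiple.
    warps_per_block = -(-threads_per_block // WARP_SIZE)
    regs_per_block = warps_per_block * WARP_SIZE * regs_per_thread

    # A single block that does not fit cannot be launched at all.
    if regs_per_block > REGS_PER_SM or smem_per_block > SMEM_PER_SM:
        return 0, 0.0

    # The tightest of the four limits decides how many blocks are resident.
    blocks = min(
        MAX_BLOCKS_PER_SM,
        MAX_WARPS_PER_SM // warps_per_block,
        REGS_PER_SM // regs_per_block,
        SMEM_PER_SM // smem_per_block,
    )
    return blocks, blocks * warps_per_block / MAX_WARPS_PER_SM


if __name__ == "__main__":
    for x in (128, 256):
        blocks, occupancy = estimate_occupancy(x)
        print(f"x={x}: {blocks} block(s) per SM, occupancy {occupancy:.2%}")
```

With 74 registers per thread, registers are the limiting resource in this estimate for both x = 128 and x = 256.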