
I ran deviceQuery and got the following result:

./deviceQuery Starting... 

 CUDA Device Query (Runtime API) version (CUDART static linking)

 Detected 1 CUDA Capable device(s)
 Device 0: "GeForce GTX 560 Ti"
 CUDA Driver Version / Runtime Version          5.0 / 5.0
 CUDA Capability Major/Minor version number:    2.1
 Total amount of global memory:                 1024 MBytes (1073283072 bytes)
 (8) Multiprocessors x ( 48) CUDA Cores/MP:    384 CUDA Cores
 GPU Clock rate:                                1701 MHz (1.70 GHz)
 Memory Clock rate:                             2052 Mhz
 Memory Bus Width:                              256-bit
 L2 Cache Size:                                 524288 bytes
 Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65535), 3D= (2048,2048,2048)    
 Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
 Total amount of constant memory:               65536 bytes
 Total amount of shared memory per block:       49152 bytes
 Total number of registers available per block: 32768
 Warp size:                                     32
 Maximum number of threads per multiprocessor:  1536
 Maximum number of threads per block:           1024
 Maximum sizes of each dimension of a block:    1024 x 1024 x 64
 Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535
 Maximum memory pitch:                          2147483647 bytes
 Texture alignment:                             512 bytes
 Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
 Run time limit on kernels:                     Yes
 Integrated GPU sharing Host Memory:            No
 Support host page-locked memory mapping:       Yes
 Alignment requirement for Surfaces:            Yes
 Device has ECC support:                        Disabled
 Device supports Unified Addressing (UVA):      Yes
 Device PCI Bus ID / PCI location ID:           1 / 0
 Compute Mode:
 < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)  >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GTX 560 Ti      

My understanding is that I can create a maximum of 65535 x 65535 x 65535 blocks with 1024 threads per block. Does that mean I can have a maximum of 65535 x 65535 x 65535 x 1024 threads? If not, what is the maximum number of threads I can have?

Can anyone clarify this?


2 Answers


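Your understanding is correct. In theory you can launch 65535 x 65535 x 65535 x 1024 threads in a single grid, but resource constraints may prevent you from reaching that maximum. For example, all threads of a block share 32768 registers, so a kernel launched with 1024 threads per block can use at most 32 registers per thread.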


You can't just multiply all the maximum grid dimensions and assume that you can have that many threads running at once, unfortunately. You have 8 multiprocessors (MPs) and a maximum of 1536 threads per MP, so at most 8 * 1536 = 12288 threads can be resident on the GPU at the same time.