
When I run the deviceQuery SDK sample, it shows the following stats:

Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535

So does that mean I can launch a maximum of 1024*65535*65535*65535 threads in total?
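The arithmetic in the question can be checked with a short C++ sketch. The constant names below are hypothetical and simply hold the deviceQuery values quoted above; they are not CUDA API. The per-block factor is the 1024 from "Maximum number of threads per block", not 1024 x 1024 x 64, because the per-dimension block limits do not multiply together. The product needs 64-bit arithmetic.

```cpp
#include <cstdint>

// Values copied from the deviceQuery output in the question (CC 2.0 device).
// Hypothetical names; in real code they correspond to the cudaDeviceProp
// fields maxThreadsPerBlock and maxGridSize[0..2].
constexpr std::uint64_t kMaxThreadsPerBlock = 1024;
constexpr std::uint64_t kMaxGridX = 65535;
constexpr std::uint64_t kMaxGridY = 65535;
constexpr std::uint64_t kMaxGridZ = 65535;

// Upper bound on threads in a single launch: every block holds the maximum
// number of threads and every grid dimension is at its limit.
constexpr std::uint64_t theoreticalMaxThreads(std::uint64_t threadsPerBlock,
                                              std::uint64_t gridX,
                                              std::uint64_t gridY,
                                              std::uint64_t gridZ) {
    return threadsPerBlock * gridX * gridY * gridZ;
}

// 1024 * 65535^3 = 288,217,182,213,504,000 (about 2.9e17). This fits in
// 64 bits but would overflow any 32-bit integer type.
static_assert(theoreticalMaxThreads(kMaxThreadsPerBlock, kMaxGridX, kMaxGridY,
                                    kMaxGridZ) == 288217182213504000ULL,
              "theoretical maximum for the CC 2.0 limits above");
```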

talonmies: Yes, that is the theoretical maximum.
Greg Smith: This is the theoretical maximum for a CC 2.0 device. Please refer to the table Technical Specifications per Compute Capability in the CUDA C Programming Guide for the device-specific limits. On CC 3.* devices the maximum grid size in the x dimension is increased to (2^31)-1 = 2147483647.
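To see what these per-compute-capability limits mean for a concrete launch configuration, here is a minimal C++ sketch that checks a grid/block pair against a set of limits. `LaunchLimits`, `Dim3` and `launchFits` are hypothetical names. The field names mirror `cudaDeviceProp` (`maxThreadsPerBlock`, `maxThreadsDim`, `maxGridSize`), which in real code you would fill at run time with `cudaGetDeviceProperties`. The CC 2.0 values come from the deviceQuery output above and the CC 3.x grid-x value from the comment.

```cpp
#include <cstdint>

// Hypothetical limits struct; field names mirror cudaDeviceProp.
struct LaunchLimits {
    std::uint64_t maxThreadsPerBlock;
    std::uint64_t maxThreadsDim[3];  // per-dimension block limits (x, y, z)
    std::uint64_t maxGridSize[3];    // per-dimension grid limits (x, y, z)
};

// Hypothetical stand-in for CUDA's dim3.
struct Dim3 {
    std::uint64_t x, y, z;
};

// CC 2.0 values from deviceQuery; CC 3.x raises only the grid x limit.
constexpr LaunchLimits kCC20{1024, {1024, 1024, 64}, {65535, 65535, 65535}};
constexpr LaunchLimits kCC3x{1024, {1024, 1024, 64}, {2147483647, 65535, 65535}};

// True if every grid and block dimension is in range AND the block's total
// thread count x*y*z does not exceed maxThreadsPerBlock. The product is only
// evaluated after each block dimension has passed its range check.
constexpr bool launchFits(const LaunchLimits& lim, Dim3 grid, Dim3 block) {
    return grid.x >= 1 && grid.y >= 1 && grid.z >= 1 &&
           block.x >= 1 && block.y >= 1 && block.z >= 1 &&
           grid.x <= lim.maxGridSize[0] && grid.y <= lim.maxGridSize[1] &&
           grid.z <= lim.maxGridSize[2] &&
           block.x <= lim.maxThreadsDim[0] && block.y <= lim.maxThreadsDim[1] &&
           block.z <= lim.maxThreadsDim[2] &&
           block.x * block.y * block.z <= lim.maxThreadsPerBlock;
}
```

For example, a 1D grid of 100000 blocks fails against the CC 2.0 limits but fits on CC 3.x. A 1024 x 1024 x 1 block fails on both, because it has more than 1024 threads in total even though each dimension is within range.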


As talonmies pointed out, that is the theoretical maximum. However, the number of threads you can actually launch also depends on the resources each thread uses. Each block executes on a single Streaming Multiprocessor (SM), and an SM has finite resources (especially registers and shared memory). Those resources can limit the number of threads per block to fewer than the maximum listed in your question, so keep an eye on how much each thread uses: if per-thread resource usage is high, you may not be able to reach that value.
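A rough way to see how per-thread resources cap the block size is sketched below in C++. It assumes CC 2.0 figures (32768 registers and 48 KB of shared memory per block, i.e. `regsPerBlock` and `sharedMemPerBlock` in `cudaDeviceProp`) and a simplified model in which registers are charged per warp of 32 threads with no further allocation granularity, so the real limit can be lower. The function `maxThreadsPerBlockFor` and its parameters are hypothetical. For a compiled kernel, `cudaFuncGetAttributes` reports the actual per-kernel `maxThreadsPerBlock`, and launching a larger block fails with `cudaErrorLaunchOutOfResources`.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed CC 2.0 hardware figures (simplified model, see lead-in).
constexpr std::uint64_t kWarpSize = 32;
constexpr std::uint64_t kHwMaxThreadsPerBlock = 1024;
constexpr std::uint64_t kRegsPerBlock = 32768;
constexpr std::uint64_t kSharedMemPerBlock = 48 * 1024;

// Hypothetical helper: largest block size, in whole warps, whose register
// and shared-memory demand fits in one block's budget. A value of 0 for a
// per-thread cost means "this resource is not used".
constexpr std::uint64_t maxThreadsPerBlockFor(std::uint64_t regsPerThread,
                                              std::uint64_t sharedBytesPerThread) {
    std::uint64_t limit = kHwMaxThreadsPerBlock;
    if (regsPerThread > 0) {
        limit = std::min(limit, kRegsPerBlock / regsPerThread);
    }
    if (sharedBytesPerThread > 0) {
        limit = std::min(limit, kSharedMemPerBlock / sharedBytesPerThread);
    }
    // Round down to a whole number of warps; for the register term this is
    // the same as charging registers per warp of 32 threads.
    return limit / kWarpSize * kWarpSize;
}
```

Under this model, a kernel using 63 registers per thread (the CC 2.0 per-thread maximum) can use at most 512 threads per block rather than 1024, and a kernel needing 64 bytes of shared memory per thread is capped at 768.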