4
votes

I am using a Tesla T10 device. It has 2 CUDA devices; the maximum number of threads in a block is 512, the maximum block dimensions are (512, 512, 64), the maximum grid size is (65535, 65535, 1), and each CUDA device has 30 multiprocessors.

Now I want to know how many threads I can run in parallel. I read previous answers, but none of them cleared my doubt. From what I read, I can run (30 * 512) threads in parallel (maxNoOfMultiprocessor * maxThreadBlockSize).

But when I launched 32 blocks of 512 threads, it still worked. How is that possible? I don't understand the "maximum threads in each dimension" and "maximum grid size" parts. Please explain with an example. Thanks in advance.

2
Maybe the last two blocks that crossed the limit go into a global scheduling queue, so the first 30 blocks finish first and the last two are executed afterwards. Maybe. – huseyin tugrul buyukisik
That means we can launch any number of thread blocks, keeping in mind the maximum of 512 threads per block, so the first 30 * 512 threads execute, then the next 30 * 512, and so on? – user2182259
But you can't be sure which block is executed before which. – huseyin tugrul buyukisik

2 Answers

5
votes

For the purposes of this discussion, forget about how many multiprocessors there are. It has nothing to do with how many blocks you can launch in a kernel (i.e. the grid.)

The number of threads you can run in parallel (i.e. that can execute simultaneously) is different than the number of threads you can launch, or the number of blocks you can launch.

Normally, you do not want to launch grids that have only as many threads as the machine can run at a given time (maxNoOfMultiprocessor * maxThreadBlockSize). The machine wants many more threads than that, so it can hide latency.

Your machine is limited to 512 threads per block, but you can launch a single-dimensional grid of up to 65535 blocks. This does not mean that all those blocks/threads are running simultaneously, but the machine will process them all eventually.

4
votes

You can create many more threads than the hardware is able to handle simultaneously. NVIDIA calls this "automatic scalability". If you have a card with 30 multiprocessors, 30 blocks run in parallel, then the remaining 2 blocks run afterwards. If you then run the same program with 32 blocks on a card with only 16 multiprocessors (supposing such a card exists), 16 blocks run first, and then the other 16.