Does a kernel's grid size determine the number of blocks, and the block size the number of threads?

Question

I can't seem to understand the wording for the CUDA kernel launch parameters <<<gridSize, blockSize>>>.

In the code I am reviewing they are defined as

const dim3 blockSize(1, 1, 1); 
const dim3 gridSize( 1, 1, 1);

If I replaced the hardcoded 1s with variables, would the following names be appropriate?

const dim3 blockSize(nThreadsX, nThreadsY, nThreadsZ); 
const dim3 gridSize(nBlocksX, nBlocksY, nBlocksZ);

Here the maximum value of any argument to blockSize would be set by the hardware (something like 512 or 1024?) and would be the maximum number of threads that can run in a block along a single dimension. Is that right?

Robert Crovella · Accepted Answer · 2014-09-09T02:03:59

Yes, the proposed naming is sensible. Those dim3 parameters are intended to represent (x,y,z) dimensions. The block is composed of threads. The grid is composed of blocks.
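As a rough illustration of that relationship, here is a minimal sketch that launches a kernel using the naming from the question. The kernel `recordBlockIndex`, the helper `divUp`, and the problem size are all hypothetical and chosen only for the example; the block shape of 16 x 8 x 1 is an arbitrary assumption, not a recommendation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks needed to cover n items, rounding up.
int divUp(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical kernel: each thread stores the linear index of its own block.
__global__ void recordBlockIndex(int *out, int width, int height)
{
    // Global coordinate = block offset within the grid + thread offset within the block.
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)  // the grid may overshoot the problem size
        out[y * width + x] = blockIdx.y * gridDim.x + blockIdx.x;
}

int main()
{
    const int width = 100, height = 60;  // assumed problem size

    // Threads per block in each dimension (assumed values).
    const int nThreadsX = 16, nThreadsY = 8, nThreadsZ = 1;
    // Blocks per grid in each dimension, rounded up to cover the problem.
    const int nBlocksX = divUp(width, nThreadsX);
    const int nBlocksY = divUp(height, nThreadsY);
    const int nBlocksZ = 1;

    const dim3 blockSize(nThreadsX, nThreadsY, nThreadsZ);
    const dim3 gridSize(nBlocksX, nBlocksY, nBlocksZ);

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, width * height * sizeof(int)) != cudaSuccess)
        return 1;

    recordBlockIndex<<<gridSize, blockSize>>>(d_out, width, height);
    cudaError_t err = cudaGetLastError();       // launch configuration errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();          // errors raised while running

    printf("grid: %u x %u x %u blocks, block: %u x %u x %u threads, status: %s\n",
           gridSize.x, gridSize.y, gridSize.z,
           blockSize.x, blockSize.y, blockSize.z,
           cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Inside the kernel, `blockDim` holds the values passed as blockSize and `gridDim` holds the values passed as gridSize, which is why the names in the question map cleanly onto threads and blocks.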

Using your naming, nBlocksX, nBlocksY and nBlocksZ must each stay within the corresponding hardware-defined limit. Those limits can be found in the programming guide (Table 12) or queried programmatically using a method such as the one in the deviceQuery sample app.

There are similar limits for nThreadsX, nThreadsY and nThreadsZ, but in addition, the product nThreadsX * nThreadsY * nThreadsZ must also satisfy another limit (the maximum number of threads per block, which is either 512 or 1024 for current CUDA GPU hardware).
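One way to check these limits programmatically is sketched below, in the spirit of what deviceQuery does. It uses `cudaGetDeviceProperties` and the `maxThreadsPerBlock`, `maxThreadsDim` and `maxGridSize` fields of `cudaDeviceProp`; the helper `fitsDeviceLimits`, the choice of device 0, and the sample block shapes are assumptions made for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if the launch configuration respects the per-dimension
// limits and the threads-per-block limit reported for the device.
bool fitsDeviceLimits(const cudaDeviceProp &prop, dim3 grid, dim3 block)
{
    const unsigned long long threadsPerBlock =
        (unsigned long long)block.x * block.y * block.z;

    const bool blockOk =
        block.x >= 1 && block.y >= 1 && block.z >= 1 &&
        block.x <= (unsigned)prop.maxThreadsDim[0] &&
        block.y <= (unsigned)prop.maxThreadsDim[1] &&
        block.z <= (unsigned)prop.maxThreadsDim[2] &&
        threadsPerBlock <= (unsigned long long)prop.maxThreadsPerBlock;

    const bool gridOk =
        grid.x >= 1 && grid.y >= 1 && grid.z >= 1 &&
        grid.x <= (unsigned)prop.maxGridSize[0] &&
        grid.y <= (unsigned)prop.maxGridSize[1] &&
        grid.z <= (unsigned)prop.maxGridSize[2];

    return blockOk && gridOk;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {  // device 0 is assumed
        printf("no usable CUDA device found\n");
        return 1;
    }

    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("max block dimensions:  %d x %d x %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("max grid dimensions:   %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    // Each dimension of the second shape may be allowed individually,
    // while the product of the three still exceeds the per-block limit.
    const dim3 gridSize(1, 1, 1);
    printf("32 x 32 x 1 fits: %s\n",
           fitsDeviceLimits(prop, gridSize, dim3(32, 32, 1)) ? "yes" : "no");
    printf("32 x 32 x 2 fits: %s\n",
           fitsDeviceLimits(prop, gridSize, dim3(32, 32, 2)) ? "yes" : "no");
    return 0;
}
```

Validating the configuration on the host before launching is optional, since an invalid configuration also shows up as a launch error, but it makes the two kinds of block limit (per dimension and per block) explicit.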
