0
votes

This is a conceptual question. In CUDA, gridDim, blockDim and threadIdx can be 1D, 2D or 3D. I wonder, how are their 2D and 3D versions interpreted?

In more detail, does CUDA treat multi-dimensional gridDim, blockDim and threadIdx simply as a linear sequence, in the same way that C stores multi-dimensional arrays? If not, how should we interpret multi-dimensional gridDim, blockDim and threadIdx?

Thanks.

Edit 1. This question is not a duplicate. I have come across the referenced question. It asks about the order of execution of GPU threads, not their layout, as this one does.

Edit 2. The answer to this question can be found at http://docs.nvidia.com/cuda/cuda-c-programming-guide/#thread-hierarchy. Thank you, @talonmies, for the reference. To sum up, multi-dimensional gridDim, blockDim and threadIdx exist for convenience. They can be interpreted just like a multi-dimensional array in column-major order.

2
What do you mean by "how are their 2D and 3D versions interpreted"? Could you expand your question a bit more? - haccks

2 Answers

1
votes

Quoting directly from the CUDA programming guide:

The index of a thread and its thread ID relate to each other in a straightforward way: For a one-dimensional block, they are the same; for a two-dimensional block of size (Dx, Dy), the thread ID of a thread of index (x, y) is (x + y Dx); for a three-dimensional block of size (Dx, Dy, Dz), the thread ID of a thread of index (x, y, z) is (x + y Dx + z Dx Dy).

So yes, the logical thread numbering in the programming model is sequential, with the x dimension varying fastest, then the y dimension, then the z dimension. This applies both to thread numbering within blocks and to block numbering within a grid. The numbering is analogous to multi-dimensional arrays in column-major order, although the actual threadIdx and blockIdx variables themselves are just structures reflecting internal thread and block identification words assigned by the scheduler to each thread or block.
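The formula quoted above can be checked on the host without a GPU. The following is a minimal sketch, not code from the programming guide: `Dim3`, `flatten`, `unflatten` and `global_id` are hypothetical names introduced here, with `Dim3` standing in for CUDA's built-in `dim3`/`uint3` types. The `global_id` helper additionally assumes one common convention for combining block and thread numbering into a single index; the guide does not prescribe it.

```cpp
// Hypothetical stand-in for CUDA's built-in dim3/uint3 types, so the
// index arithmetic can be checked on the host.
struct Dim3 {
    unsigned x, y, z;
};

// Flatten a 3D index within an extent: x varies fastest, then y, then z.
// This is the guide's formula x + y*Dx + z*Dx*Dy.
constexpr unsigned flatten(Dim3 idx, Dim3 dim) {
    return idx.x + idx.y * dim.x + idx.z * dim.x * dim.y;
}

// Inverse mapping: recover (x, y, z) from a linear ID.
constexpr Dim3 unflatten(unsigned id, Dim3 dim) {
    return Dim3{id % dim.x, (id / dim.x) % dim.y, id / (dim.x * dim.y)};
}

// Assumed convention (not from the guide): number blocks the same way as
// threads, then offset by the number of threads per block.
constexpr unsigned global_id(Dim3 blockIdx, Dim3 gridDim,
                             Dim3 threadIdx, Dim3 blockDim) {
    return flatten(blockIdx, gridDim) * (blockDim.x * blockDim.y * blockDim.z)
         + flatten(threadIdx, blockDim);
}

// For a block of size (4, 3, 2): a step in x moves the ID by 1,
// a step in y by Dx = 4, and a step in z by Dx*Dy = 12.
static_assert(flatten(Dim3{1, 0, 0}, Dim3{4, 3, 2}) == 1,  "x stride is 1");
static_assert(flatten(Dim3{0, 1, 0}, Dim3{4, 3, 2}) == 4,  "y stride is Dx");
static_assert(flatten(Dim3{0, 0, 1}, Dim3{4, 3, 2}) == 12, "z stride is Dx*Dy");
```

In a real kernel the same arithmetic would use the built-in `threadIdx`, `blockDim`, `blockIdx` and `gridDim` variables in place of the function arguments.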

Note that the numbering implied by threadIdx and blockIdx is just for programmer convenience and doesn't imply anything about the execution order of threads on the GPU.

0
votes

In more detail, does CUDA think of multi-dimensional gridDim, blockDim and threadIdx just as a linear sequence, in the same way that C stores multi-dimensional arrays?

Yes.
All multidimensional arrays are linearized in C. They are linearized in row-major order: all elements of a row are placed in consecutive locations, and the rows are then placed one after another in memory.
CUDA C also uses row-major layout for arrays. An example of a 2D array layout:

[Image: example of a 2D array in row-major layout]
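The row-major rule described above can be verified directly. The sketch below is illustrative only: `ROWS`, `COLS`, `row_major_offset`, `thread_id_2d` and `layout_is_row_major` are hypothetical names, and the mapping of x to the column and y to the row is a common convention assumed here, not something the thread itself states. Under that assumption, the 2D thread ID `x + y * Dx` coincides with the row-major offset `row * COLS + col`.

```cpp
#include <cstddef>

constexpr int ROWS = 3;
constexpr int COLS = 4;

// Offset (in elements) of a[row][col] from a[0][0] in row-major order.
constexpr int row_major_offset(int row, int col) {
    return row * COLS + col;
}

// 2D thread ID for a block that is Dx threads wide (x varies fastest).
constexpr int thread_id_2d(int x, int y, int Dx) {
    return x + y * Dx;
}

// Assumed convention: x indexes the column, y indexes the row.
// The thread ID then equals the row-major offset of element [y][x].
static_assert(thread_id_2d(2, 1, COLS) == row_major_offset(1, 2),
              "x -> column, y -> row gives the same linear order");

// Checks that a C/C++ 2D array really places a[row][col] at
// row_major_offset(row, col) elements past the start of the array.
inline bool layout_is_row_major() {
    static int a[ROWS][COLS];
    const char* base = reinterpret_cast<const char*>(a);
    for (int row = 0; row < ROWS; ++row) {
        for (int col = 0; col < COLS; ++col) {
            const char* p = reinterpret_cast<const char*>(&a[row][col]);
            const std::ptrdiff_t expected = static_cast<std::ptrdiff_t>(
                row_major_offset(row, col) * sizeof(int));
            if (p - base != expected) {
                return false;
            }
        }
    }
    return true;
}
```

This is why kernels that map `threadIdx.x` to the column index typically have consecutive thread IDs touching consecutive memory locations of a row-major array.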