If I launch a 2D kernel in which each thread operates on a single element of a 2D array, which way of accessing the element is coalesced? Is it array[x][y] or array[y][x]?
1 Answer
If x = threadIdx.x and y = threadIdx.y, then
array[y][x]
will be coalesced, but
array[x][y]
will not. The reason is that C uses row-major order: the last index varies fastest, so array[y][x] and array[y][x+1] occupy adjacent locations in memory. Threads within a CUDA block are ordered so that threadIdx.x varies fastest, then threadIdx.y, and finally threadIdx.z. Consecutive threads in a warp therefore differ in x, and they touch consecutive addresses only when x is the last index.
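To make the stride argument concrete, here is a minimal host-side sketch in plain C (not a kernel) that emulates two neighbouring threads, tx and tx + 1, and computes how far apart their elements sit in a row-major array. The function names and the width of 32 columns are assumptions chosen for illustration; they do not come from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Linear element offset of a[row][col] in a row-major array
   with `width` columns (the layout C uses for 2D arrays). */
size_t row_major_offset(size_t row, size_t col, size_t width) {
    return row * width + col;
}

/* Distance in elements between the accesses of threads tx and tx + 1
   (same ty) when each thread reads array[ty][tx]. */
size_t stride_yx(size_t tx, size_t ty, size_t width) {
    return row_major_offset(ty, tx + 1, width) - row_major_offset(ty, tx, width);
}

/* Same two threads, but each reads array[tx][ty] instead. */
size_t stride_xy(size_t tx, size_t ty, size_t width) {
    return row_major_offset(tx + 1, ty, width) - row_major_offset(tx, ty, width);
}

/* Hypothetical self-check with an assumed width of 32 columns. */
void check_strides(void) {
    const size_t width = 32;
    assert(stride_yx(0, 0, width) == 1);     /* neighbours are adjacent: coalesced */
    assert(stride_xy(0, 0, width) == width); /* neighbours are a full row apart */
}
```

With array[y][x], neighbouring threads are one element apart, so a warp's accesses fall into a contiguous run of memory. With array[x][y], they are a whole row apart, so each thread touches a different region.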