1
votes

If I launch a 2D kernel in which each thread operates on a single element of a 2D array, which way of accessing the element is coalesced? Is it array[x][y] or array[y][x]?


1 Answer

8
votes

If x = threadIdx.x and y = threadIdx.y, then

array[y][x]

will be coalesced, but

array[x][y]

will not. The reason is that C uses row-major order, i.e. the last index varies fastest, so array[y][x] and array[y][x+1] occupy adjacent locations in memory. Threads within a CUDA block are ordered so that threadIdx.x varies fastest, then threadIdx.y, and finally threadIdx.z. Neighbouring threads in a warp therefore normally differ by one in x, and they touch adjacent addresses only when x is the last index.