CUDA C Programming Guide: how do thread and block indexing calculations work?

Question

In the CUDA C Programming Guide, Chapter 2, Thread Hierarchy:

[Figure: relationship between blocks and threads]

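__global__ void MatAdd(float A[N][N], float B[N][N], float C[N][N])
{
  // Global index = threads in the preceding blocks + index within this block
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  int j = blockIdx.y * blockDim.y + threadIdx.y;
  // Guard against threads that fall outside the N x N matrix
  if (i < N && j < N)
    C[i][j] = A[i][j] + B[i][j];
}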

int main()
{
....
  dim3 threadPerBlock(16, 16);
  dim3 numBlocks(N / threadPerBlock.x, N / threadPerBlock.y);
  MatAdd<<<numBlocks, threadPerBlock>>>(A, B, C);
....
}

I'm new to this and can't make sense of "int i=blockIdx.x*blockDim.x+threadIdx.x". Why does this work? Can anyone explain it to me? Thanks a lot. For example, how would I identify Thread(1,1) in Block(1,1) using "i" and "j"?

Code belongs inside the post itself. Do not post images of code. — StoryTeller - Unslander Monica
Picture one showed the positional relationship between blocks and threads. — Y.fes
Picture 1 was the code; I have now posted it as text. (Sorry, I'm new to the site.) — Y.fes

Y.fes Y.fes · Accepted Answer · 2017-03-01T14:03:23

I found the answer in "CUDA Programming: A Developer's Guide to Parallel Computing with GPUs" by Shane Cook. Chapter 5 has a clear explanation.

For a 2D array, we need dim3 to create a 2D layout of threads. "dim3 threadPerBlock(16,16)" means that a single block has 16 threads along its x axis and 16 threads along its y axis. "dim3 numBlocks(N/threadPerBlock.x,N/threadPerBlock.y)" means that the grid has N/threadPerBlock.x blocks along the x axis and N/threadPerBlock.y blocks along the y axis.

gridDim.x or gridDim.y is the number of blocks along the x/y axis of the grid.
blockDim.x or blockDim.y is the number of threads along the x/y axis of a block.
threadIdx.x or threadIdx.y is the thread's index along the x/y axis within its block.
blockIdx.x or blockIdx.y is the block's index along the x/y axis within the grid.

So to find a thread's absolute index along an axis, we need to know how many blocks come before the current block and how many threads come before the current thread within that block. This is the same idea as addressing an element of a row-major array: (row * (sizeof(array_element) * width)) + (sizeof(array_element) * offset). Each preceding block contributes blockDim.x threads, so we get i = blockIdx.x*blockDim.x+threadIdx.x, and likewise j = blockIdx.y*blockDim.y+threadIdx.y.

[Figure: grid, block, and thread dimensions]
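To make the arithmetic concrete for the Thread(1,1) in Block(1,1) case from the question, here is a minimal host-only sketch in plain C++ that runs without a GPU. The names Idx2, global_index, flat_offset and example_thread_1_1_in_block_1_1 are made up for illustration and are not part of CUDA. The local variables only mimic the built-in blockDim, blockIdx and threadIdx with the values that thread would see under a 16x16 block size.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's built-in index types, so the
// arithmetic can be checked on the host. Not a CUDA type.
struct Idx2 { int x; int y; };

// Global index along one axis: threads in all preceding blocks
// (block_idx * block_dim) plus this thread's position in its own block.
constexpr int global_index(int block_idx, int block_dim, int thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// Element offset of (row, col) in a row-major array of the given width.
// Multiplying by sizeof(element) would turn this into a byte offset.
constexpr int flat_offset(int row, int col, int width) {
    return row * width + col;
}

// The example from the question: Thread(1,1) in Block(1,1),
// with 16 x 16 threads per block as in threadPerBlock(16,16).
inline Idx2 example_thread_1_1_in_block_1_1() {
    const Idx2 blockDim  = {16, 16};  // assumed launch configuration
    const Idx2 blockIdx  = {1, 1};
    const Idx2 threadIdx = {1, 1};
    return { global_index(blockIdx.x, blockDim.x, threadIdx.x),
             global_index(blockIdx.y, blockDim.y, threadIdx.y) };
}
```

With these values, i = 1*16 + 1 = 17 and j = 1*16 + 1 = 17, so that thread computes C[17][17]. Block(0,0) covers indices 0 to 15 along each axis, and Block(1,1) starts at 16, which is why the block index is multiplied by blockDim before the thread index is added.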
