I am trying to learn CUDA. I started with matrix multiplication on the GPU, with the help of this article. My main problem is that I am unable to understand how to access a 2D array in the kernel, since it works differently from the conventional method (matrix[i][j]). This is the part where I am stuck:
for (int i = 0; i < N; i++) {
tmpSum += A[ROW * N + i] * B[i * N + COL];
}
C[ROW * N + COL] = tmpSum;
I could understand how ROW and COL were derived:
int ROW = blockIdx.y*blockDim.y+threadIdx.y;
int COL = blockIdx.x*blockDim.x+threadIdx.x;
Any explanation with an example is highly appreciated. Thanks!
(i,j)th element of the result matrix. Since it takes a 1-D array, it can only find it as stacked rows. That's why ROW * N + i means the ROWth row and the ith element of that row, but this is the first matrix. The second matrix does not seem to be transposed prior to this kernel, so it scans through a single column instead of a row. – huseyin tugrul buyukisik
ROW*N and again i*N. This logic seems tricky. I am unable to visualize it @huseyintugrulbuyukisik – uttejh
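One way to visualize it is to work through a small case by hand. Assuming the matrices are stored row by row in a 1-D array (row-major, as the comment above describes), element (row, col) of an N x N matrix sits at index row * N + col. Take N = 3 and the thread with ROW = 1, COL = 2. The loop reads A at indices 1*3+0, 1*3+1, 1*3+2 = 3, 4, 5, which is row 1 read left to right, and reads B at indices 0*3+2, 1*3+2, 2*3+2 = 2, 5, 8, which is column 2 read top to bottom. The result goes to C at index 1*3+2 = 5. The sketch below emulates this on the CPU in plain C++, so it can be stepped through in a debugger; flatIndex, dotRowCol and matMul are hypothetical helper names that are not part of the original kernel, and the two outer loops stand in for the one-thread-per-element launch on the GPU.

```cpp
#include <cstddef>
#include <vector>

// Row-major flat index: element (row, col) of an N x N matrix stored as
// stacked rows in a 1-D array. Hypothetical helper, not in the original kernel.
inline std::size_t flatIndex(std::size_t row, std::size_t col, std::size_t N) {
    return row * N + col;
}

// CPU emulation of the work a single GPU thread does for its (ROW, COL).
float dotRowCol(const std::vector<float>& A, const std::vector<float>& B,
                std::size_t ROW, std::size_t COL, std::size_t N) {
    float tmpSum = 0.0f;
    for (std::size_t i = 0; i < N; i++) {
        // A[ROW * N + i]: stay in row ROW, move one element right per iteration.
        // B[i * N + COL]: stay in column COL, jump one full row (N elements) per iteration.
        tmpSum += A[flatIndex(ROW, i, N)] * B[flatIndex(i, COL, N)];
    }
    return tmpSum;
}

// Assumption: on the GPU each (ROW, COL) pair is handled by its own thread;
// here two nested loops visit the same pairs sequentially.
std::vector<float> matMul(const std::vector<float>& A, const std::vector<float>& B,
                          std::size_t N) {
    std::vector<float> C(N * N, 0.0f);
    for (std::size_t ROW = 0; ROW < N; ROW++) {
        for (std::size_t COL = 0; COL < N; COL++) {
            C[flatIndex(ROW, COL, N)] = dotRowCol(A, B, ROW, COL, N);
        }
    }
    return C;
}
```

For example, with N = 2, A = {1, 2, 3, 4} and B = {5, 6, 7, 8}, matMul returns {19, 22, 43, 50}, the same as multiplying [[1, 2], [3, 4]] by [[5, 6], [7, 8]] in the usual 2D notation.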