Edited to reflect the current state of the code after the linked question.
I am trying to reimplement basic matrix multiplication in CUDA. My code works for square matrices and for rectangular matrices whose dimensions are multiples of 8, but it does not appear to work for rectangular matrices whose dimensions are not multiples of 8.
The following is my kernel function:
__global__ void matrixMultiply(float * A, float * B, float * C,
                               int numARows, int numAColumns,
                               int numBRows, int numBColumns,
                               int numCRows, int numCColumns) {
    int Row = blockIdx.y * blockDim.y + threadIdx.y;
    int Col = blockIdx.x * blockDim.x + threadIdx.x;
    if (numAColumns != numBRows) return;
    if ((Row < numARows) && (Col < numBColumns)) {
        float Cvalue = 0;
        for (int k = 0; k < numAColumns; ++k)
            Cvalue += A[Row * numAColumns + k] * B[k * numBColumns + Col];
        C[Row * numCColumns + Col] = Cvalue;
    }
}
The following is the memory allocation and the host-to-device copies (for readability I have cut out the error checking):
cudaMalloc((void**) &deviceA, ARows*sizeof(float)*AColumns);
cudaMalloc((void**) &deviceB, BRows*sizeof(float)*BColumns);
cudaMalloc((void**) &deviceC, CRows*sizeof(float)*CColumns);
cudaMemcpy(deviceA, hostA, ARows*sizeof(float)*AColumns, cudaMemcpyHostToDevice);
cudaMemcpy(deviceB, hostB, BRows*sizeof(float)*BColumns, cudaMemcpyHostToDevice);
cudaMemcpy(deviceC, hostC, CRows*sizeof(float)*CColumns, cudaMemcpyHostToDevice);
The following is the kernel launch:
dim3 dimGrid((int)ceil(numCRows / 8.0) , (int)ceil(numCColumns / 8.0), 1);
dim3 dimBlock(8 , 8, 1);
matrixMultiply<<<dimGrid,dimBlock>>>(deviceA, deviceB, deviceC, numARows, AColumns, BRows, BColumns, CRows, CColumns);
And finally, copying the result back to the host:
cudaMemcpy(hostC, deviceC, CRows*sizeof(float)*CColumns, cudaMemcpyDeviceToHost);
I have traced my algorithm repeatedly and do not believe there is anything wrong with it, so I suspect the problem lies in the block/grid sizing scheme I've used. If anybody who knows CUDA/C better than I do (Ruby/JavaScript guy here) could take a look and walk me through exactly what I am doing wrong, I would be very grateful.
What happens if you change
if ((Row < numARows) && (Col < numBColumns)){
to
if ((Row < numCRows) && (Col < numCColumns)){
? Or, I'd like to see your numerical values for every row and column variable. – Robert Crovella