I have a problem accessing each element of a matrix in a CUDA kernel. I'm working with OpenCV and I'm trying to "do something" to each element of the matrix.
So, I'm converting a uint8_t matrix to a float matrix like this:
for(int i=0; i<inputMatrix.rows; ++i){
    for(int j=0; j<inputMatrix.cols * cn; j+=cn){
        examMatrix[i*inputMatrix.cols*cn + j + 0] = pixelPtr[i*inputMatrix.cols*cn + j + 0]; // B
        examMatrix[i*inputMatrix.cols*cn + j + 1] = pixelPtr[i*inputMatrix.cols*cn + j + 1]; // G
        examMatrix[i*inputMatrix.cols*cn + j + 2] = pixelPtr[i*inputMatrix.cols*cn + j + 2]; // R
    }
}
And this works for a 3-channel image, because if I create an output image from this matrix (after converting back to uint8_t), it looks the same as the input.
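Since the loop above uses the same flat index on both sides, the conversion can also be written as a single pass over the buffer. This is only a minimal sketch: toFloat is a hypothetical helper name, and it assumes the source rows are stored back to back with no padding (for a cv::Mat, that is the case when isContinuous() returns true).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the original post): converts an interleaved
// 8-bit buffer of rows x cols pixels with cn channels into floats.
// Assumes the rows are contiguous in memory, with no padding between them.
std::vector<float> toFloat(const std::uint8_t* src, std::size_t rows,
                           std::size_t cols, std::size_t cn) {
    const std::size_t total = rows * cols * cn;
    std::vector<float> dst(total);
    for (std::size_t k = 0; k < total; ++k) {
        dst[k] = static_cast<float>(src[k]);  // same flat index on both sides
    }
    return dst;
}
```

The result holds rows * cols * cn floats in the same B, G, R order as the input, so the GPU buffers would need the same number of elements.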
But I want to make some changes using CUDA.
I set the block size and grid size like this:
dim3 dimBlock(count, 3);
dim3 dimGrid( frameHeight/count, frameWidth/count);
Here count is the number of threads, 3 is the number of channels, and frameHeight and frameWidth are the frame dimensions.
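The launch configuration above can be checked with plain host arithmetic before touching the GPU. The sketch below is an illustration only: divUp, postThreads, coveringThreads and elementCount are hypothetical helper names, and the "covering" layout (a count x count block, with x running over the floats of a row and y over the rows) is one possible choice, not the configuration from the post.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: integer division rounded up, so a partial tile at the
// frame edge still gets a block (plain '/' truncates instead).
constexpr std::size_t divUp(std::size_t n, std::size_t d) {
    return (n + d - 1) / d;
}

// Threads launched by the configuration in the post:
// dimBlock(count, 3), dimGrid(frameHeight / count, frameWidth / count).
constexpr std::size_t postThreads(std::size_t frameHeight,
                                  std::size_t frameWidth, std::size_t count) {
    return (count * 3) * (frameHeight / count) * (frameWidth / count);
}

// Threads launched by an assumed alternative:
// dimBlock(count, count),
// dimGrid(divUp(frameWidth * 3, count), divUp(frameHeight, count)).
constexpr std::size_t coveringThreads(std::size_t frameHeight,
                                      std::size_t frameWidth, std::size_t count) {
    return (count * count) * divUp(frameWidth * 3, count) * divUp(frameHeight, count);
}

// Number of floats in a 3-channel frame.
constexpr std::size_t elementCount(std::size_t frameHeight, std::size_t frameWidth) {
    return frameHeight * frameWidth * 3;
}
```

With a hypothetical 640x480 frame and count = 16, the configuration from the post launches 57,600 threads for 921,600 floats, while the alternative launches exactly 921,600. When the sizes do not divide evenly, the rounded-up grid launches extra threads, so the kernel would need a bounds check.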
So, I allocated GPUexamMatrix and GPUresultMatrix and tried to access each element in the kernel. My kernel looks like this:
resultMatrix[(blockIdx.x * blockIdx.y) + (threadIdx.x * threadIdx.y)] = examMatrix[(blockIdx.x * blockIdx.y) + (threadIdx.x * threadIdx.y)];
So, as you can see, I simply tried to copy the matrix. After this operation, when I copied the matrix back to the host and printed it, I got really small or really big float numbers in it, not the numbers from examMatrix.
I suppose I'm doing something wrong inside the kernel. Any ideas?
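One way to inspect the index expression without a GPU is to emulate the launch on the host, with loops standing in for blockIdx and threadIdx, and count how many distinct elements the expression reaches. The sketch below is only that: both function names are hypothetical, and the row-major index is an assumed alternative in which x covers the frameWidth * 3 floats of a row and y covers the rows.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Host-side emulation: bx, by stand in for blockIdx and tx, ty for threadIdx.
// Counts the distinct elements reached by the index expression from the post.
std::size_t distinctPostWrites(std::size_t gridX, std::size_t gridY,
                               std::size_t blockX, std::size_t blockY) {
    std::set<std::size_t> seen;
    for (std::size_t by = 0; by < gridY; ++by)
        for (std::size_t bx = 0; bx < gridX; ++bx)
            for (std::size_t ty = 0; ty < blockY; ++ty)
                for (std::size_t tx = 0; tx < blockX; ++tx)
                    seen.insert(bx * by + tx * ty);  // expression from the post
    return seen.size();
}

// Same emulation with a row-major index (an assumed alternative):
//   x = blockIdx.x * blockDim.x + threadIdx.x   (float within the row)
//   y = blockIdx.y * blockDim.y + threadIdx.y   (row)
//   index = y * rowLen + x, where rowLen = frameWidth * 3
std::size_t distinctRowMajorWrites(std::size_t gridX, std::size_t gridY,
                                   std::size_t blockX, std::size_t blockY,
                                   std::size_t rowLen, std::size_t rows) {
    std::set<std::size_t> seen;
    for (std::size_t by = 0; by < gridY; ++by)
        for (std::size_t bx = 0; bx < gridX; ++bx)
            for (std::size_t ty = 0; ty < blockY; ++ty)
                for (std::size_t tx = 0; tx < blockX; ++tx) {
                    const std::size_t x = bx * blockX + tx;
                    const std::size_t y = by * blockY + ty;
                    if (x < rowLen && y < rows)  // skip threads past the frame edge
                        seen.insert(y * rowLen + x);
                }
    return seen.size();
}
```

For a hypothetical 4x4 three-channel frame (48 floats) with count = 2, the expression from the post reaches only 4 distinct elements, while the row-major index with a 4x4 block and a 3x1 grid reaches all 48. Elements that are never written keep whatever the allocation contained, which might account for the very small or very large floats.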
cuda-memcheck on your application to narrow it down. - Pavel
frameHeight/count - you sure it's fine with integer division? - Pavel
cudaMemcpy returned cudaErrorInvalidValue - Pavel