I am a CUDA beginner who has successfully compiled and run several code samples using CUDA libraries such as CUFFT and CUBLAS. Lately, however, I have been trying to generate my own simple kernels and am repeatedly receiving nonsense values back after calling my kernels. That is--when I pass a parameter into a kernel, set its value in the kernel, then try to copy the results back to the host and read the values later, they are bogus. I have tried many different simple tutorial kernels that seem to work for most people online, but I always get nonsensical values. For example...
#define SIZE 10
// Kernel definition, see also section 4.2.3 of Nvidia Cuda Programming Guide
__global__ void vecAdd(float* A, float* B, float* C) {
// threadIdx.x is a built-in variable provided by CUDA at runtime
int i = threadIdx.x;
A[i]=0;
B[i]=i;
C[i] = A[i] + B[i];
}
int main {
int N=SIZE;
float A[SIZE], B[SIZE], C[SIZE];
float *devPtrA;
float *devPtrB;
float *devPtrC;
int memsize= SIZE * sizeof(float);
cudaMalloc((void**)&devPtrA, memsize);
cudaMalloc((void**)&devPtrB, memsize);
cudaMalloc((void**)&devPtrC, memsize);
cudaMemcpy(devPtrA, A, memsize, cudaMemcpyHostToDevice);
cudaMemcpy(devPtrB, B, memsize, cudaMemcpyHostToDevice);
// __global__ functions are called: Func<<< Dg, Db, Ns >>>(parameter);
vecAdd<<<1, N>>>(devPtrA, devPtrB, devPtrC);
cudaMemcpy(C, devPtrC, memsize, cudaMemcpyDeviceToHost);
for (int i=0; i<SIZE; i++)
printf("C[%d]=%f\n",i,C[i]);
cudaFree(devPtrA);
cudaFree(devPtrA);
cudaFree(devPtrA);
}
This is a fairly straightforward problem; the results should be:
C[0]=0.000000
C[1]=1.000000
C[2]=2.000000
C[3]=3.000000
C[4]=4.000000
C[5]=5.000000
C[6]=6.000000
C[7]=7.000000
C[8]=8.000000
C[9]=9.000000
However, my awesome results are always random and generally look more like:
C[0]=nan
C[1]=-32813464158208.000000
C[2]=nan
C[3]=-27667211200843743232.000000
C[4]=34559834084263395806523272811251761152.000000
C[5]=9214363188332593152.000000
C[6]=nan
C[7]=-10371202300694685655937271382147072.000000
C[8]=121653576586393934243511643668480.000000
C[9]=-30648783863808.000000
So basically, when I pass parameters into a CUDA kernel with the intention of storing results within them to be copied back to the host, I tend to get out junk.
This one really has me stumped. Any help would be greatly appreciated.
Thanks.
cudaMalloc
andcudaMemcpy
withCUDA_SAFE_CALL
as currently you're not doing any error checking. – Paul R