I've written CUDA code for a Monte Carlo simulation. I have a number of particles and perform certain operations on them. To calculate the density in each cell of my 3D grid, I keep an array on the device in which a kernel assigns a cell ID to each particle. I then want to copy this device array to the host, calculate the density there and write the values to a file.
However, when I run my code, the cudaMemcpy call never returns and the code after it is not executed. I'm worried that I've done something wrong when allocating the arrays, and I would be happy if someone could point out my error.
Here's the important part of the code:
size_t sizeInt = dim * numParticles * sizeof(int);
...
int *h_cellIndex = NULL; // host
err = cudaHostAlloc((void **)&h_cellIndex, sizeInt, 0);
//int *h_cellIndex = (int*) malloc(sizeInt); // plain malloc instead didn't work either
...
int *d_cellIndex = NULL; // device
err = cudaMalloc((void **)&d_cellIndex, sizeInt);
...
// simulation starts
...
printf("copy\n");
cudaMemcpy(h_cellIndex, d_cellIndex, sizeInt, cudaMemcpyDeviceToHost);
printf("copy done\n");
As output, I see "copy" printed to the command line. Then nothing more happens: no segmentation fault, but also no further calculation.
Any idea what might be the problem?
Thanks in advance!
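Some background on why the symptom shows up at cudaMemcpy rather than at the kernel launch: kernel launches are asynchronous, so the host continues right away, while a plain cudaMemcpy on the default stream waits until all previously launched work on that stream has finished. A kernel that runs for a long time, or never terminates, therefore makes the copy look as if it hangs. Below is a minimal sketch of that behaviour, assuming a GPU without a short display watchdog. The kernel `spin` and the helper `timeLaunchAndCopy` are hypothetical stand-ins, not part of the original code.

```cuda
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

// Busy-waits for about `cycles` GPU clock cycles. It stands in for a
// long-running simulation kernel; an unbounded loop here would never finish.
__global__ void spin(long long cycles, int *out)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
    *out = 42;
}

// Launches spin() and then copies its result back on the default stream.
// Returns the copied value and reports how long each host call blocked.
int timeLaunchAndCopy(long long cycles, double *launchSec, double *copySec)
{
    int *d_out = NULL, h_out = 0;
    cudaMalloc((void **)&d_out, sizeof(int));
    cudaDeviceSynchronize();                   // keep context setup out of the timing

    auto t0 = std::chrono::steady_clock::now();
    spin<<<1, 1>>>(cycles, d_out);             // asynchronous: returns almost at once
    auto t1 = std::chrono::steady_clock::now();
    cudaMemcpy(&h_out, d_out, sizeof(int),     // waits until spin() has finished
               cudaMemcpyDeviceToHost);
    auto t2 = std::chrono::steady_clock::now();

    *launchSec = std::chrono::duration<double>(t1 - t0).count();
    *copySec   = std::chrono::duration<double>(t2 - t1).count();
    cudaFree(d_out);
    return h_out;
}

int main(void)
{
    double launchSec = 0.0, copySec = 0.0;
    int v = timeLaunchAndCopy(1000000000LL, &launchSec, &copySec);
    printf("launch %.4f s, copy %.4f s, value %d\n", launchSec, copySec, v);
    return 0;
}
```

On a typical GPU the reported copy time should be far larger than the launch time, even though the copy only moves a single int. The time is spent waiting for the kernel, not transferring data.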
Update: I added cudaDeviceSynchronize(); after each kernel call and noticed that one of the calculations entered an infinite loop and never returned. If you make your comment an answer, I would be happy to accept it as the solution. Thank you! - Thomas
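The debugging approach from the update can be written as a small error-checking pattern. Check cudaGetLastError() right after a launch to catch configuration errors, then call cudaDeviceSynchronize() to catch errors raised while the kernel runs. A kernel that never terminates blocks at that call, so a print after it shows which kernel is stuck. This is a debugging sketch only: the `CUDA_CHECK` macro, the `assignCells` kernel and the `fillCellIndex` helper are hypothetical. The extra synchronization also costs performance, so it is usually removed or compiled out once the bug is found.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line and the CUDA error string
// if a runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__, \
                    #call, cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Hypothetical stand-in for the kernel that assigns a cell ID per particle.
__global__ void assignCells(int *cellIndex, int n, int numCells)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) cellIndex[i] = i % numCells;
}

// Runs the kernel and copies the cell IDs into the host array.
void fillCellIndex(int *h_cellIndex, int n, int numCells)
{
    size_t bytes = (size_t)n * sizeof(int);
    int *d_cellIndex = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_cellIndex, bytes));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    assignCells<<<blocks, threads>>>(d_cellIndex, n, numCells);
    CUDA_CHECK(cudaGetLastError());      // invalid launch configuration, etc.
    CUDA_CHECK(cudaDeviceSynchronize()); // faults inside the kernel; a kernel
                                         // that never returns blocks here
    printf("kernel done\n");

    CUDA_CHECK(cudaMemcpy(h_cellIndex, d_cellIndex, bytes,
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_cellIndex));
}

int main(void)
{
    const int n = 1 << 20, numCells = 64;
    int *h_cellIndex = NULL;
    CUDA_CHECK(cudaHostAlloc((void **)&h_cellIndex, (size_t)n * sizeof(int),
                             cudaHostAllocDefault));
    fillCellIndex(h_cellIndex, n, numCells);
    printf("h_cellIndex[65] = %d\n", h_cellIndex[65]); // prints 1 (65 % 64)
    CUDA_CHECK(cudaFreeHost(h_cellIndex));
    return 0;
}
```

With a synchronize and a print after every kernel in the simulation loop, the last message printed before the program stalls points at the kernel that does not return.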