cudaMemcpy not responding when copying from device

Question

I've written CUDA code for a Monte Carlo simulation. I basically have a number of particles and perform certain operations. To calculate the density at each cell of my 3D grid, I have an array (on the device) where I assign a cell ID for each particle. This is done via CUDA. I then want to copy the device memory to host to calculate the density and write the values to file.

However, when I run my code, cudaMemcpy does not respond and the code after that statement is never executed. I'm worried that I've done something wrong when allocating the arrays, and I would be happy if someone could point out my error.

Here's the important part of the code:

size_t sizeInt = dim*numParticles *sizeof(int);
...
int *h_cellIndex = NULL; // host
err = cudaHostAlloc((void **)&h_cellIndex, sizeInt,0);
//int *h_cellIndex = (int*) malloc(sizeInt);  <- this instead didn't work either
...
int *d_cellIndex = NULL; // device
err = cudaMalloc((void **)&d_cellIndex, sizeInt);
...
// simulation starts
...
printf("copy\n");
cudaMemcpy(h_cellIndex,d_cellIndex,sizeInt,cudaMemcpyDeviceToHost);
printf("copy done\n");

As output, I see "copy" printed to command line. Then nothing more happens, no segmentation fault, but also no further calculation.

Any idea what might be the problem?

Thanks in advance!

I guess that your simulation is still running. The invocation of the kernel is asynchronous, so I think that it's your kernel being stuck. Just add a call to cudaDeviceSynchronize() after the kernel invocation and see whether it blocks there instead, in order to check it. — Sigi
Are you sure that there are no runtime errors occurring before the cudaMemcpy call? It is highly likely that your kernel is the culprit: either it is stuck in a loop or it is doing something that kills the context/application. Try running cuda-memcheck and update your question with the output. — talonmies
@Sigismondo : Thank you very much! I added cudaDeviceSynchronize(); after each kernel call and noticed that some calculation entered an infinite loop and didn't return. If you make your comment an answer, I would be happy to accept it as the solution. Thank you! — Thomas
I felt obliged to add a few lines to make this hint an answer...! :) So maybe it can be of help to somebody else. — Sigi

Sigi · Accepted Answer · 2014-02-17T10:04:55

I guess that your simulation is still running. The invocation of the kernel is asynchronous, so I think that it's your kernel being stuck. Just add a call to cudaDeviceSynchronize() after the kernel invocation and see whether it blocks there instead, in order to check it.
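To make that check concrete, here is a minimal sketch of synchronizing right after a launch. The kernel assignCellIndex, the particle count and the grid size are hypothetical stand-ins, not the asker's actual code; only the cudaGetLastError() / cudaDeviceSynchronize() pattern is the point.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the simulation kernel: one cell ID per particle.
__global__ void assignCellIndex(int *cellIndex, int numParticles, int numCells)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numParticles) {
        cellIndex[i] = i % numCells;
    }
}

int main()
{
    const int numParticles = 1 << 20;   // assumed size, for illustration only
    const int numCells = 64;            // assumed size, for illustration only
    const size_t sizeInt = numParticles * sizeof(int);

    int *d_cellIndex = NULL;
    cudaError_t err = cudaMalloc((void **)&d_cellIndex, sizeInt);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    const int threads = 256;
    const int blocks = (numParticles + threads - 1) / threads;
    assignCellIndex<<<blocks, threads>>>(d_cellIndex, numParticles, numCells);

    // Errors in the launch itself (e.g. a bad configuration) show up here.
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Blocks until the kernel has finished. If the program hangs between
    // these two printf calls, the kernel is what never returns.
    printf("waiting for kernel\n");
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("kernel done\n");

    cudaFree(d_cellIndex);
    return 0;
}
```

With several kernels in a row, repeating the synchronize-and-print step after each launch narrows down which one is stuck. It is a debugging aid; the extra synchronization can be removed once the culprit is found.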

In fact, kernels cannot return a value: their return type must be void. Since they run asynchronously, any error that occurs inside a kernel is reported by the next call that synchronizes with it: a call in stream 0 (the default stream), a call in the same stream, or an explicit synchronization.
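As a rough illustration of where such errors surface, the sketch below checks the return code of every runtime call. The CUDA_CHECK macro and the fillCellIndex kernel are hypothetical helpers written for this example; they are not part of the CUDA runtime or of the asker's code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e_));                          \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel; note the mandatory void return type.
__global__ void fillCellIndex(int *cellIndex, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        cellIndex[i] = i;
    }
}

int main()
{
    const int n = 1024;                 // assumed size, for illustration only
    const size_t bytes = n * sizeof(int);

    int *h_cellIndex = (int *)malloc(bytes);
    if (h_cellIndex == NULL) {
        fprintf(stderr, "host allocation failed\n");
        return EXIT_FAILURE;
    }
    int *d_cellIndex = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_cellIndex, bytes));

    fillCellIndex<<<(n + 255) / 256, 256>>>(d_cellIndex, n);
    CUDA_CHECK(cudaGetLastError());     // launch errors only

    // A blocking cudaMemcpy in the default stream waits for the kernel
    // launched above, so an error raised while the kernel was running
    // would be returned by this call rather than by the launch.
    CUDA_CHECK(cudaMemcpy(h_cellIndex, d_cellIndex, bytes,
                          cudaMemcpyDeviceToHost));

    printf("last cell index: %d\n", h_cellIndex[n - 1]);

    CUDA_CHECK(cudaFree(d_cellIndex));
    free(h_cellIndex);
    return 0;
}
```

Checking the launch and the following synchronizing call separately helps distinguish a bad launch configuration from a failure inside the kernel body. A kernel that never terminates produces no error code at all, which is why the copy in the question simply blocks.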
