
I have a CUDA kernel which is called many times and adds values to an array of allocated size N. I keep track of the number of inserted elements with a device variable that I update with atomicAdd.

When the number of added values approaches N, I would like to know it so I can call cudaMalloc again and reallocate the array. The most obvious solution is to do a cudaMemcpy of that device variable every time the kernel is called, and therefore keep track of the size of the array on the host. What I would like to know is whether there is a way to do the cudaMemcpy to the host ONLY when the values are approaching N.
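To make the setup concrete, here is a minimal sketch of the approach I am describing; the kernel, payload, and threshold are just illustrative placeholders:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ unsigned int d_count;       // number of elements inserted so far

__global__ void addValues(float *arr, unsigned int N)
{
    // Reserve one slot; only write if it is still inside the allocation.
    unsigned int idx = atomicAdd(&d_count, 1u);
    if (idx < N)
        arr[idx] = 42.0f;              // placeholder payload
}

int main()
{
    const unsigned int N = 1 << 20;
    float *d_arr;
    cudaMalloc(&d_arr, N * sizeof(float));

    unsigned int zero = 0, h_count = 0;
    cudaMemcpyToSymbol(d_count, &zero, sizeof(zero));

    for (int launch = 0; launch < 100; ++launch) {
        addValues<<<256, 256>>>(d_arr, N);
        // The cost in question: a device-to-host copy after EVERY launch.
        cudaMemcpyFromSymbol(&h_count, d_count, sizeof(h_count));
        if (h_count > (N / 10) * 9) {
            // Approaching N: allocate a bigger buffer, copy, free the old one.
            printf("buffer %u/%u full, reallocate here\n", h_count, N);
            break;
        }
    }
    cudaFree(d_arr);
    return 0;
}
```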

One possible solution I had thought of is setting the cudaError_t return value to 30 (cudaErrorUnknown), or some custom error, which I could later check. But I haven't found how to do it and I guess that it's not possible. Is there any way to do what I want and perform the cudaMemcpy only when the device finds that it's running out of space?


1 Answer


But I haven't found how to do it and I guess that it's not possible

Error codes from the runtime API are set by the host driver. They are not available to the programmer, and they cannot be set in kernels either. So your guess is correct. There are assertions available in device code for debugging, and there are ways to make a kernel abort abnormally, but the latter causes context destruction and the loss of the contents of device memory, which I suspect won't help you.

About the best you can do is use a mapped host or managed allocation as a way for the host to keep track of the consumption of the allocated memory on the device. Then you don't need an explicit memcpy and the latency is minimized. But you will need some sort of synchronization with the running kernel in that case.
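A minimal sketch of that idea, assuming a managed counter (a mapped pinned allocation obtained with cudaHostAlloc would work similarly); the names and the threshold are illustrative:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__managed__ unsigned int m_count = 0;  // directly visible to host and device

__global__ void addValues(float *arr, unsigned int N)
{
    unsigned int idx = atomicAdd(&m_count, 1u);
    if (idx < N)
        arr[idx] = 1.0f;               // placeholder payload
}

int main()
{
    const unsigned int N = 1 << 20;
    float *d_arr;
    cudaMalloc(&d_arr, N * sizeof(float));

    for (int launch = 0; launch < 100; ++launch) {
        addValues<<<256, 256>>>(d_arr, N);
        // The synchronization mentioned above: on many systems the host may
        // not touch managed memory while a kernel is still running, and even
        // where it may, the value would be racy without a sync point.
        cudaDeviceSynchronize();
        if (m_count > (N / 10) * 9) {  // no cudaMemcpy needed to read this
            printf("approaching capacity: %u of %u\n", m_count, N);
            break;                     // reallocate here
        }
    }
    cudaFree(d_arr);
    return 0;
}
```

The trade-off is that the explicit copy disappears but the synchronization does not; how cheap the host-side read is depends on whether the system supports concurrent managed access.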