I have a CUDA kernel that is called many times and adds some values to an array with allocated size N. I keep track of the number of inserted elements with a device variable that I update with atomicAdd.
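To make the setup concrete, here is a minimal sketch of what I mean. The names `d_count` and `append_kernel`, and the "keep even indices" rule, are placeholders I made up for illustration; my real kernel and insertion condition are different. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device-side counter: number of elements inserted so far.
__device__ unsigned int d_count = 0;

// Hypothetical kernel: each thread may append one value to `out`.
__global__ void append_kernel(int *out, unsigned int capacity, int n_candidates)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_candidates) return;

    if (i % 2 == 0) {  // stand-in for the real insertion condition
        // Reserve a unique slot; atomicAdd returns the value before the add.
        unsigned int slot = atomicAdd(&d_count, 1u);
        if (slot < capacity)  // never write past the allocation
            out[slot] = i;
    }
}

int main()
{
    const unsigned int N = 1024;  // allocated size of the array
    int *d_out = nullptr;
    cudaMalloc(&d_out, N * sizeof(int));

    append_kernel<<<4, 256>>>(d_out, N, 1000);
    cudaDeviceSynchronize();

    // Read the counter back to see how many slots were requested.
    unsigned int h_count = 0;
    cudaMemcpyFromSymbol(&h_count, d_count, sizeof(h_count));
    printf("inserted %u of %u\n", h_count, N);

    cudaFree(d_out);
    return 0;
}
```

Note that in this sketch the counter keeps increasing even when `slot >= capacity`, so a value above N would mean some insertions were dropped.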
When the number of added values approaches N, I would like to know about it so I can call cudaMalloc again and reallocate the array. The most obvious solution is to cudaMemcpy that device variable after every kernel call and so keep track of the array size on the host. What I would like to know is whether there is a way to do the cudaMemcpy to the host ONLY when the number of values is approaching N.
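For reference, the "obvious" approach I am trying to avoid looks roughly like the sketch below. It assumes a known upper bound on insertions per launch (`per_launch`, a made-up value) and a doubling growth policy; both are assumptions for the example, as are the names `d_count` and `append_kernel`. Error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ unsigned int d_count = 0;  // hypothetical insertion counter

// Hypothetical kernel: every thread below `per_launch` appends one value.
__global__ void append_kernel(int *out, unsigned int capacity, int per_launch)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= per_launch) return;
    unsigned int slot = atomicAdd(&d_count, 1u);
    if (slot < capacity) out[slot] = i;
}

int main()
{
    unsigned int capacity = 1024;  // current allocated size N
    const int per_launch = 200;    // assumed max insertions per launch
    int *d_out = nullptr;
    cudaMalloc(&d_out, capacity * sizeof(int));

    for (int iter = 0; iter < 20; ++iter) {
        append_kernel<<<1, 256>>>(d_out, capacity, per_launch);

        // The copy I would like to avoid: it runs after EVERY launch and
        // blocks the host until the kernel has finished.
        unsigned int h_count = 0;
        cudaMemcpyFromSymbol(&h_count, d_count, sizeof(h_count));

        // Grow before the next launch could overflow the array.
        if (h_count + per_launch > capacity) {
            unsigned int new_capacity = capacity * 2;
            int *d_new = nullptr;
            cudaMalloc(&d_new, new_capacity * sizeof(int));
            cudaMemcpy(d_new, d_out, h_count * sizeof(int),
                       cudaMemcpyDeviceToDevice);
            cudaFree(d_out);
            d_out = d_new;
            capacity = new_capacity;
        }
    }

    unsigned int total = 0;
    cudaMemcpyFromSymbol(&total, d_count, sizeof(total));
    printf("final count %u, capacity %u\n", total, capacity);

    cudaFree(d_out);
    return 0;
}
```

This works, but the device-to-host copy and the synchronization it implies happen on every iteration, even though the reallocation branch is taken only rarely.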
One possible solution I thought of is to set the cudaError_t return value to 30 (cudaErrorUnknown), or to some custom error, which I could check later. But I haven't found how to do it, and I guess that it's not possible. Is there any way to do what I want and do the cudaMemcpy only when the device finds that it's running out of memory?