I have a CUDA kernel written in numba-cuda that processes large arrays that do not fit in GPU memory all at once, so I have to call the kernel multiple times to process the entire arrays. The kernel is called in a loop and, inside the loop, after the GPU finishes the computation, I copy the results back and aggregate them into a host array.
My questions:
- What is the lifetime of a device array, and of an array that is copied to GPU memory? Are their values preserved from one kernel call to the next?
- Do I need to put the device array definitions inside the loop (before I call the kernel), or can I define them just once before entering the loop?
- Do I need to free/delete the device arrays manually in the code, or will the CUDA memory manager do it at the end of the program?
Thanks.