I need to have an OpenCL kernel iteratively update a buffer and return the results. To clarify:
- Send the initial buffer contents to the kernel
- Kernel/worker updates each element in the buffer
- Host code reads the results, ideally asynchronously, though I'm not sure how to do this without blocking the kernel.
- Kernel runs again, again updating each element, but the new value depends on the previous value.
- Repeat for some fixed number of iterations.
So far, I've been able to fake this by providing separate input and output buffers, copying the output back to the input when the kernel finishes executing, and restarting the kernel. This seems like a huge waste of time and of limited memory bandwidth, as the buffer is quite large (~1 GB).
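To make the workaround concrete, here is a minimal CPU-only sketch in plain C that models it, not the actual Cloo/OpenCL host code. The names `update_kernel` and `run_iterations` and the update rule `0.5 * x + 1` are hypothetical placeholders; the only property that matters is that each new value depends on the previous value of the same element. The `memcpy` stands in for the output-to-input copy, and the function returns how many bytes that copy moves in total.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the OpenCL kernel: "work-item" i computes a new
 * value for element i from the previous value of element i. The rule
 * 0.5 * x + 1 is a placeholder, not the real computation. */
static void update_kernel(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = 0.5f * in[i] + 1.0f;
}

/* Models the current workaround: run the kernel from `in` to `out`, then
 * copy `out` back over `in` before the next run. Returns the total number
 * of bytes moved by the copy-back step alone. */
static size_t run_iterations(float *in, float *out, size_t n, int iterations)
{
    assert(in != NULL && out != NULL && iterations >= 0);

    size_t bytes_copied = 0;
    for (int it = 0; it < iterations; ++it) {
        update_kernel(in, out, n);           /* kernel run */
        /* the host would read the results from `out` at this point */
        memcpy(in, out, n * sizeof(float));  /* the copy I'd like to avoid */
        bytes_copied += n * sizeof(float);
    }
    return bytes_copied;
}
```

Calling `run_iterations(in, out, n, k)` moves `k * n * sizeof(float)` bytes purely for the copy-back, so with a ~1 GB buffer every iteration pays for roughly one extra gigabyte of traffic on top of the kernel's own reads and writes.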
Any suggestions or examples? I'm pretty new to OpenCL, so this may have a very simple answer.
If it matters, I'm using Cloo/OpenCL.NET on an NVIDIA GTX 460 and two GTX 295s.