CUDA streams and cudaMemcpyAsync, as far as I know, require us to assign different kernels and memory operations to different streams in order to make GPU operations concurrent with CPU operations.
But is it possible to have one persistent kernel instead? This kernel would be launched once and loop forever, checking some flags to see whether a piece of data has arrived from the CPU, and then operating on it. When that piece of data is done, the GPU sets a flag for the CPU; the CPU sees it and copies the result back. The kernel never finishes running.
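Roughly what I have in mind, as an untested sketch (all names are my own; it assumes mapped pinned "zero-copy" memory, a 64-bit platform with unified addressing so the host pointer is usable on the device, and a single persistent block):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Shared "mailbox" polled by both sides; volatile so neither side caches the flags.
struct Mailbox {
    volatile int cpu_ready;   // CPU sets to 1 after writing `data`
    volatile int gpu_done;    // GPU sets to 1 after writing `result`
    volatile int quit;        // CPU sets to 1 to let the kernel exit
    float data;
    float result;
};

__global__ void persistentKernel(Mailbox *m)
{
    __shared__ int quit;                      // one consistent value for the whole block
    while (true) {
        if (threadIdx.x == 0) {
            while (!m->cpu_ready && !m->quit) { /* spin, waiting for the CPU */ }
            quit = m->quit;
        }
        __syncthreads();
        if (quit) return;

        if (threadIdx.x == 0) {
            m->result = m->data * 2.0f;       // placeholder for the real work
            m->cpu_ready = 0;
            __threadfence_system();           // make the result visible to the host first
            m->gpu_done = 1;                  // then signal the CPU
        }
        __syncthreads();
    }
}

int main()
{
    cudaSetDeviceFlags(cudaDeviceMapHost);    // allow mapped pinned allocations
    Mailbox *m;
    cudaHostAlloc(&m, sizeof(Mailbox), cudaHostAllocMapped);
    m->cpu_ready = m->gpu_done = m->quit = 0;

    persistentKernel<<<1, 32>>>(m);           // launched once, never relaunched

    m->data = 21.0f;
    __sync_synchronize();                     // GCC/Clang fence: data before flag
    m->cpu_ready = 1;

    while (!m->gpu_done) { /* spin on the host */ }
    printf("result = %f\n", m->result);

    m->quit = 1;                              // tell the kernel to exit
    cudaDeviceSynchronize();
    cudaFreeHost(m);
    return 0;
}
```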
Does this exist in the current CUDA programming model? What is the closest to this I can get?