I have written a program that uses two streams. Both streams operate on some data and write their results back to host memory. Here is the general structure of what I am doing:
loop {
    AsyncCpy(....,HostToDevice,Stream1);
    AsyncCpy(....,HostToDevice,Stream2);
    Kernel<<<...,Stream1>>>
    Kernel<<<...,Stream2>>>
    /* Write the results to host memory */
    AsyncCpy(....,DeviceToHost,Stream1);
    AsyncCpy(....,DeviceToHost,Stream2);
}
I want to do some work on the CPU once I know that StreamX has finished copying its results back to host memory. At the same time, I don't want to stop the loop from issuing asynchronous operations (memcpy or kernel execution).
Suppose I insert my host functions, say host_ftn1(..) and host_ftn2(..), like this:
loop {
    AsyncCpy(....,HostToDevice,Stream1);
    AsyncCpy(....,HostToDevice,Stream2);
    Kernel<<<...,Stream1>>>
    Kernel<<<...,Stream2>>>
    /* Write the results to host memory, to be processed by host_ftn1(..) */
    AsyncCpy(....,DeviceToHost,Stream1);
    /* Write the results to host memory, to be processed by host_ftn2(..) */
    AsyncCpy(....,DeviceToHost,Stream2);
    if(Stream1 results are copied to host)
        host_ftn1(..);
    if(Stream2 results are copied to host)
        host_ftn2(..);
}
This stalls the loop until the host functions host_ftn1 and host_ftn2 have finished executing. However, I don't want to hold up the GPU work, i.e. AsyncCpy(..) and Kernel<<<....,StreamX>>>, while the CPU is busy executing host_ftn1(..) and host_ftn2(..).
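To make the pattern concrete, here is a minimal sketch of the structure I am describing. It assumes the "results are copied to host" check is done with cudaStreamQuery, which returns cudaSuccess once all work queued on a stream has completed and cudaErrorNotReady otherwise. The kernel scale, the single host_ftn function, and the buffer sizes are hypothetical stand-ins, not my real code. The check itself does not block, but the thread that issues the GPU work is also the one running the host function, so nothing new is queued until it returns.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for Kernel: doubles every element.
__global__ void scale(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

// Hypothetical stand-in for host_ftn1 / host_ftn2: sums a result buffer.
static float host_ftn(const float *h, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; ++i) s += h[i];
    return s;
}

int main() {
    const int N = 1 << 16, ITERS = 4, NS = 2;   // illustrative sizes
    const size_t bytes = N * sizeof(float);
    cudaStream_t st[NS];
    float *h_in[NS], *h_out[NS], *d[NS];

    for (int s = 0; s < NS; ++s) {
        cudaStreamCreate(&st[s]);
        // Pinned host memory, so the async copies can overlap with other work.
        cudaMallocHost((void **)&h_in[s], bytes);
        cudaMallocHost((void **)&h_out[s], bytes);
        cudaMalloc((void **)&d[s], bytes);
        for (int i = 0; i < N; ++i) h_in[s][i] = 1.0f;
    }

    for (int it = 0; it < ITERS; ++it) {
        // Queue copy-in, kernel, and copy-out on each stream; none of these block.
        for (int s = 0; s < NS; ++s) {
            cudaMemcpyAsync(d[s], h_in[s], bytes, cudaMemcpyHostToDevice, st[s]);
            scale<<<(N + 255) / 256, 256, 0, st[s]>>>(d[s], N);
            cudaMemcpyAsync(h_out[s], d[s], bytes, cudaMemcpyDeviceToHost, st[s]);
        }

        // Poll each stream and process its results once they are on the host.
        bool pending[NS] = {true, true};
        int remaining = NS;
        while (remaining > 0) {
            for (int s = 0; s < NS; ++s) {
                if (!pending[s]) continue;
                cudaError_t q = cudaStreamQuery(st[s]);   // non-blocking check
                if (q == cudaSuccess) {
                    // The issuing thread is busy here, so no new GPU work
                    // is queued until host_ftn returns.
                    printf("iter %d stream %d sum %.1f\n", it, s, host_ftn(h_out[s], N));
                    pending[s] = false;
                    --remaining;
                } else if (q != cudaErrorNotReady) {
                    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(q));
                    return 1;
                }
            }
        }
    }

    for (int s = 0; s < NS; ++s) {
        cudaFree(d[s]);
        cudaFreeHost(h_in[s]);
        cudaFreeHost(h_out[s]);
        cudaStreamDestroy(st[s]);
    }
    return 0;
}
```

In this sketch the next iteration's copies and kernels are only queued after both host_ftn calls have returned, which is exactly the stall I want to avoid.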
Is there any solution or approach to this problem?