I have written a program that uses two streams. Both streams operate on some data and write the results back to host memory. Here is the generic structure of what I am doing:

loop {
AsyncCpy(....,HostToDevice,Stream1);
AsyncCpy(....,HostToDevice,Stream2);

Kernel<<<...,Stream1>>>
Kernel<<<...,Stream2>>>

/* Write the results on the host memory */
AsyncCpy(....,DeviceToHost,Stream1);  
AsyncCpy(....,DeviceToHost,Stream2);  
}

I want to do some work on the CPU once I know that StreamX has finished copying its results back to host memory. At the same time, I don't want to stop the loop from issuing async operations (memcpy or kernel execution).

If I insert my host functions, say host_ftn1(..) and host_ftn2(..), like this:

loop {
AsyncCpy(....,HostToDevice,Stream1);
AsyncCpy(....,HostToDevice,Stream2);

Kernel<<<...,Stream1>>>
Kernel<<<...,Stream2>>>

/* Write the results on the host memory to be processed by host_ftn1(..) */
AsyncCpy(....DeviceToHost,Stream1);
/* Write the results on the host memory to be processed by host_ftn2(..) */
AsyncCpy(....DeviceToHost,Stream2);  

if(Stream1 results are copied to host)
       host_ftn1(..);
if(Stream2 results are copied to host)
       host_ftn2(..);
}

the loop stalls until the host functions host_ftn1 and host_ftn2 finish executing. I don't want to hold up the GPU operations, i.e. AsyncCpy(..) and Kernel<<<....,StreamX>>>, while the CPU is busy executing host_ftn1(..) and host_ftn2(..).

Is there any solution or approach for this problem?

Try callbacks or different streams with events. - huseyin tugrul buyukisik
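
The events route mentioned in this comment could look roughly like the sketch below: an event is recorded after each device-to-host copy and polled with cudaEventQuery, which returns immediately instead of blocking the host. The kernel add_one, the host function host_ftn, the wrapper poll_streams_with_events, and the buffer sizes are all hypothetical names and values chosen for illustration; none of them come from the original program.

```cuda
#include <cuda_runtime.h>

// Sketch only: bail out with -1 on any CUDA error (resources are not released on that path).
#define CHECK(call) do { if ((call) != cudaSuccess) return -1; } while (0)

// Hypothetical kernel standing in for the question's Kernel: adds 1 to each element.
__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical host-side work on one stream's results.
static double host_ftn(const float* results, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += results[i];
    return sum;
}

// Returns how many streams had their results processed on the host, or -1 on a CUDA error.
int poll_streams_with_events() {
    const int n = 1 << 10;
    const size_t bytes = n * sizeof(float);
    cudaStream_t stream[2];
    cudaEvent_t copied[2];
    float* host[2];
    float* dev[2];
    bool handled[2] = {false, false};

    for (int s = 0; s < 2; ++s) {
        CHECK(cudaStreamCreate(&stream[s]));
        CHECK(cudaEventCreateWithFlags(&copied[s], cudaEventDisableTiming));
        CHECK(cudaMallocHost(&host[s], bytes));   // pinned memory so the copies can run asynchronously
        CHECK(cudaMalloc(&dev[s], bytes));
        for (int i = 0; i < n; ++i) host[s][i] = 0.0f;

        CHECK(cudaMemcpyAsync(dev[s], host[s], bytes, cudaMemcpyHostToDevice, stream[s]));
        add_one<<<(n + 255) / 256, 256, 0, stream[s]>>>(dev[s], n);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(host[s], dev[s], bytes, cudaMemcpyDeviceToHost, stream[s]));
        // The event completes once the device-to-host copy above has finished.
        CHECK(cudaEventRecord(copied[s], stream[s]));
    }

    int processed = 0;
    while (processed < 2) {
        for (int s = 0; s < 2; ++s) {
            if (handled[s]) continue;
            cudaError_t st = cudaEventQuery(copied[s]);   // non-blocking check
            if (st == cudaSuccess) {
                host_ftn(host[s], n);                     // results for stream s are on the host
                handled[s] = true;
                ++processed;
            } else if (st != cudaErrorNotReady) {
                return -1;
            }
        }
        // Further async work or unrelated CPU work could be issued here between polls.
    }

    for (int s = 0; s < 2; ++s) {
        cudaEventDestroy(copied[s]);
        cudaFree(dev[s]);
        cudaFreeHost(host[s]);
        cudaStreamDestroy(stream[s]);
    }
    return processed;
}
```

Unlike a stream callback, polling does not hold back later work in the same stream, so a real loop would need a separate host buffer and event for every iteration that may still be in flight.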

1 Answer

As huseyin tugrul buyukisik suggested, stream callbacks worked in this scenario. I have tested this with two streams.

The final design is as follows:

loop {
AsyncCpy(....,HostToDevice,Stream1);
AsyncCpy(....,HostToDevice,Stream2);

Kernel<<<...,Stream1>>>
Kernel<<<...,Stream2>>>

/* Write the results on the host memory to be processed by host_ftn1(..) */
AsyncCpy(....DeviceToHost,Stream1);
/* Write the results on the host memory to be processed by host_ftn2(..) */
AsyncCpy(....DeviceToHost,Stream2);  

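callback1(..);    // Stream callback added to Stream1: host work that runs once the preceding Stream1 operations complete
callback2(..);    // Stream callback added to Stream2: host work that runs once the preceding Stream2 operations complete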
}

See Stream Callbacks
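
A minimal sketch of this design, assuming cudaStreamAddCallback is the callback mechanism. The kernel scale, the struct StreamJob, the callback host_ftn, the wrapper run_pipeline, and the buffer sizes are hypothetical names and values for illustration, not taken from the original program. For brevity the sketch enqueues copy, kernel, copy, and callback per stream rather than interleaving the two streams step by step as in the outline above.

```cuda
#include <atomic>
#include <cuda_runtime.h>

// Sketch only: bail out with -1 on any CUDA error (resources are not released on that path).
#define CHECK(call) do { if ((call) != cudaSuccess) return -1; } while (0)

// Hypothetical kernel standing in for the question's Kernel: doubles each element.
__global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical per-stream state handed to the callback through userData.
struct StreamJob {
    float* host = nullptr;        // pinned host buffer (input and results)
    float* dev = nullptr;         // device buffer
    int n = 0;
    double last_sum = 0.0;        // written by the callback
    std::atomic<int> calls{0};    // how many times the callback ran
};

// Plays the role of host_ftn1/host_ftn2. It runs on a thread owned by the CUDA
// runtime, so the enqueueing loop is not blocked. It must not call the CUDA API.
void CUDART_CB host_ftn(cudaStream_t /*stream*/, cudaError_t status, void* userData) {
    StreamJob* job = static_cast<StreamJob*>(userData);
    if (status != cudaSuccess) return;
    double sum = 0.0;
    for (int i = 0; i < job->n; ++i) sum += job->host[i];
    job->last_sum = sum;
    job->calls.fetch_add(1);
}

// Enqueues `iters` rounds on two streams; returns the number of callbacks that ran, or -1.
int run_pipeline(int iters) {
    const int n = 1 << 10;
    const size_t bytes = n * sizeof(float);
    cudaStream_t streams[2];
    StreamJob jobs[2];

    for (int s = 0; s < 2; ++s) {
        CHECK(cudaStreamCreate(&streams[s]));
        CHECK(cudaMallocHost(&jobs[s].host, bytes));   // pinned memory so the copies can run asynchronously
        CHECK(cudaMalloc(&jobs[s].dev, bytes));
        jobs[s].n = n;
        for (int i = 0; i < n; ++i) jobs[s].host[i] = 1.0f;
    }

    for (int it = 0; it < iters; ++it) {
        for (int s = 0; s < 2; ++s) {
            CHECK(cudaMemcpyAsync(jobs[s].dev, jobs[s].host, bytes, cudaMemcpyHostToDevice, streams[s]));
            scale<<<(n + 255) / 256, 256, 0, streams[s]>>>(jobs[s].dev, n);
            CHECK(cudaGetLastError());
            CHECK(cudaMemcpyAsync(jobs[s].host, jobs[s].dev, bytes, cudaMemcpyDeviceToHost, streams[s]));
            // Runs after the copy above; later work in this stream waits until it returns.
            CHECK(cudaStreamAddCallback(streams[s], host_ftn, &jobs[s], 0));
        }
    }

    CHECK(cudaDeviceSynchronize());   // only so the sketch can read the counters and free memory
    int total = 0;
    for (int s = 0; s < 2; ++s) {
        total += jobs[s].calls.load();
        cudaFree(jobs[s].dev);
        cudaFreeHost(jobs[s].host);
        cudaStreamDestroy(streams[s]);
    }
    return total;
}
```

The enqueueing loop never waits on the callbacks. While a callback runs, only later work in its own stream is held back, and the other stream keeps going. Newer CUDA versions also provide cudaLaunchHostFunc, which serves the same purpose with a simpler callback signature.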