Is it possible to synchronize two CUDA streams without blocking the host? I know there's cudaStreamWaitEvent, which is non-blocking. But what about the creation and destruction of the events using cudaEventCreate and cudaEventDestroy?
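For reference, this is roughly the pattern I have in mind; the kernels, sizes, and stream setup below are placeholders I made up for illustration. The intent is that the consumer kernel in stream s2 only starts once s1 has reached the recorded event, while the host never blocks:

    // Sketch of the intended pattern (dummy kernels, arbitrary sizes).
    #include <cuda_runtime.h>

    __global__ void producer(float *d, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] = 2.0f * i;
    }

    __global__ void consumer(float *d, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] += 1.0f;
    }

    int main(void) {
        const int n = 1 << 20;
        float *d;
        cudaMalloc(&d, n * sizeof(float));

        cudaStream_t s1, s2;
        cudaStreamCreate(&s1);
        cudaStreamCreate(&s2);

        cudaEvent_t ev;
        cudaEventCreate(&ev);                          // is this call blocking?

        producer<<<(n + 255) / 256, 256, 0, s1>>>(d, n);
        cudaEventRecord(ev, s1);                       // event recorded after the work in s1

        cudaStreamWaitEvent(s2, ev, 0);                // s2 waits for ev; the host does not block
        consumer<<<(n + 255) / 256, 256, 0, s2>>>(d, n);

        cudaEventDestroy(ev);                          // is this blocking if ev has not completed yet?

        cudaDeviceSynchronize();
        cudaStreamDestroy(s1);
        cudaStreamDestroy(s2);
        cudaFree(d);
        return 0;
    }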
The documentation for cudaEventDestroy says:
In case event has been recorded but has not yet been completed when cudaEventDestroy() is called, the function will return immediately and the resources associated with event will be released automatically once the device has completed event.
What I don't understand here is the difference between a recorded event and a completed event. Also, this seems to imply that the call is blocking if the event has not yet been recorded. Can anyone shed some light on this?
An event is created when you call cudaEventCreate() on it. An event is recorded when you call cudaEventRecord() on it. An event is completed when the processing of the stream that the event has been recorded into reaches that event. For example, if I record an event into a stream immediately after a kernel call, then the event will be recorded but incomplete until the kernel call has finished processing. Once the kernel call finishes processing, the recorded event after it will be marked complete (and stream processing will continue). – Robert Crovella

The cudaEventDestroy call is not blocking if the event has not yet been recorded. – Robert Crovella
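To make the recorded/completed distinction from the first comment observable, here is a small sketch of my own (not from the thread above), assuming a kernel that runs long enough for the host to query the event while it is still in flight: cudaEventQuery returns cudaErrorNotReady while the event is recorded but not yet completed, and cudaSuccess once the stream has reached it.

    #include <cuda_runtime.h>
    #include <stdio.h>

    // A kernel that spins long enough for the host to observe both event states.
    __global__ void busy(float *x, int iters) {
        float v = x[threadIdx.x];
        for (int i = 0; i < iters; ++i) v = v * 1.0000001f + 1.0f;
        x[threadIdx.x] = v;
    }

    int main(void) {
        float *d;
        cudaMalloc(&d, 256 * sizeof(float));

        cudaStream_t s;
        cudaStreamCreate(&s);

        cudaEvent_t ev;
        cudaEventCreate(&ev);

        busy<<<1, 256, 0, s>>>(d, 1 << 24);   // long-running work in the stream
        cudaEventRecord(ev, s);               // event is now recorded behind that work

        // Recorded but not yet completed: the query should report "not ready".
        printf("right after record: %s\n", cudaGetErrorString(cudaEventQuery(ev)));

        cudaStreamSynchronize(s);             // wait until the stream reaches the event

        // Completed: the query now reports success.
        printf("after stream sync:  %s\n", cudaGetErrorString(cudaEventQuery(ev)));

        cudaEventDestroy(ev);
        cudaStreamDestroy(s);
        cudaFree(d);
        return 0;
    }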