Simplified problem
I have two host threads, each with its own command queue to the same GPU device. Both queues are out-of-order, with the execution order explicitly managed using wait events (the simplified example doesn't need this, but the actual application does).
ThreadA is a lightweight processing pipeline that runs in real time as new data is acquired. ThreadB is a heavyweight processing pipeline that uses the same input data but processes it asynchronously at a slower rate. I'm using a double buffer to keep the pipelines separate while allowing ThreadB to work on the same input data that ThreadA wrote to the device.
ThreadA's loop:
1. Pull an image from the network as data becomes available.
2. Write the image to the device buffer cl_mem BufferA using clEnqueueWriteBuffer(CommandQueueA).
3. Once the write is complete, invoke the image processing kernel KernelA using clEnqueueNDRangeKernel(CommandQueueA) (the kernel outputs its results to cl_mem OutputA).
4. Read the processed result from OutputA using clEnqueueReadBuffer(CommandQueueA).
ThreadB's loop:
1. Wait until enough time has elapsed (this thread does its work at a slower rate).
2. Copy BufferA to BufferB using clEnqueueCopyBuffer(CommandQueueB) (the double buffer swap).
3. Once the copy is complete, invoke the slower image processing kernel KernelB using clEnqueueNDRangeKernel(CommandQueueB) (the kernel outputs its results to cl_mem OutputB).
4. Read the processed result from OutputB using clEnqueueReadBuffer(CommandQueueB).
My Questions
There's a potential race condition between ThreadA's step 2 and ThreadB's step 2. I don't care which executes first; I just want to make sure I don't copy BufferA to BufferB while BufferA is being written to.
- Does OpenCL provide any implicit guarantees that this won't happen?
- If not, and I instead use clEnqueueCopyBuffer(CommandQueueA) in ThreadB's step 2 so that both the write and copy operations are in the same command queue, does that guarantee that they can't run simultaneously even though the queue allows out-of-order execution?
- If not, is there a better solution than adding the WriteBuffer's event in ThreadA to the wait list of the CopyBuffer command in ThreadB?
It seems like any of these should work, but I can't find where in the OpenCL spec it says this is fine. Please cite the OpenCL spec in your answers if possible.