
Simplified problem

I have two host threads, each with its own command queue to the same GPU device. Both queues are out-of-order, with the execution order managed explicitly through wait events (the simplified example doesn't need this, but the actual application does).

ThreadA is a lightweight processing pipeline that runs in real-time as new data is acquired. ThreadB is a heavyweight slower processing pipeline that uses the same input data but processes it asynchronously at a slower rate. I'm using a double buffer to keep the pipelines separate but allow ThreadB to work on the same input data written to device by ThreadA.

ThreadA's loop:

  1. Pull an image from the network as data becomes available
  2. Write image to device cl_mem BufferA using clEnqueueWriteBuffer(CommandQueueA)
  3. Invoke image processing KernelA using clEnqueueNDRangeKernel(CommandQueueA) once write is complete (kernel outputs results to cl_mem OutputA)
  4. Read processed result from OutputA using clEnqueueReadBuffer(CommandQueueA)

ThreadB's loop:

  1. Wait until enough time has elapsed (does work at slower rate)
  2. Copy BufferA to BufferB using clEnqueueCopyBuffer(CommandQueueB) (double buffer swap)
  3. Invoke slower image processing KernelB using clEnqueueNDRangeKernel(CommandQueueB) once copy is complete (kernel outputs results to cl_mem OutputB)
  4. Read processed result from OutputB using clEnqueueReadBuffer(CommandQueueB)

My Questions

There's a potential race condition between ThreadA's step 2 and ThreadB's step 2. I don't care which is executed first, I just want to make sure I don't copy BufferA to BufferB while BufferA is being written to.

  1. Does OpenCL provide any implicit guarantees that this won't happen?
  2. If not, suppose ThreadB's step 2 instead uses clEnqueueCopyBuffer(CommandQueueA), so that the write and the copy are in the same command queue. Does that guarantee they can't run simultaneously, even though the queue allows out-of-order execution?
  3. If not, is there a better solution than adding the WriteBuffer's event in ThreadA to the waitlist of the CopyBuffer command in ThreadB?

It seems like any of these should work, but I can't find where in the OpenCL spec it says this is fine. Please cite the OpenCL spec in your answers if possible.


1 Answer


Does OpenCL provide any implicit guarantees that this won't happen?

No, there is no implicit synchronization unless you use a single in-order command queue.

If not, suppose ThreadB's step 2 instead uses clEnqueueCopyBuffer(CommandQueueA), so that the write and the copy are in the same command queue. Does that guarantee they can't run simultaneously, even though the queue allows out-of-order execution?

No. Regardless of the queue's type (in-order or out-of-order), the OpenCL runtime does not track memory dependencies between commands. If one command depends on another, the user is responsible for specifying the corresponding events in its wait list.

The following quote from the spec supports this:

§3.2.1 Execution Model: Context and Command Queues

Out-of-order Execution: Commands are issued in order, but do not wait to complete before following commands execute. Any order constraints are enforced by the programmer through explicit synchronization commands.

It is not a direct answer to your question, but I assume that if any guarantees were provided, they should be mentioned in this section.

If not, is there a better solution than adding the WriteBuffer's event in ThreadA to the waitlist of the CopyBuffer command in ThreadB?

If you can use a single in-order queue, that would probably be more efficient than a cross-queue event, at least for some implementations.