
I've created an in-order OpenCL queue. My pipeline enqueues multiple kernels into the queue.

    queue = clCreateCommandQueue(cl.context, cl.device, 0, &cl.error);

    for (i = 0; i < num_kernels; i++) {
        clEnqueueNDRangeKernel(queue, kernels[i], dims, NULL, global_work_group_size, local_work_group_size, 0, NULL, &event);
    }

The output of kernels[0] is the input to kernels[1], the output of kernels[1] is the input to kernels[2], and so on.

Since my command queue is an in-order queue, my assumption is kernels[1] will start only after kernels[0] is completed.

  1. Is my assumption valid?
  2. Should I use clWaitForEvents to make sure the previous kernel is completed before enqueuing the next kernel?
  3. Is there any way I can stack multiple kernels into the queue & just pass the input to kernels[0] & directly get the output from the last kernel? (without having to enqueue every kernel one by one)

1 Answer


Your assumption is valid. You do not need to wait for events in an in-order queue. Take a look at the OpenCL doc:

https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCreateCommandQueue.html

If the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE property of a command-queue is not set, the commands enqueued to a command-queue execute in order. For example, if an application calls clEnqueueNDRangeKernel to execute kernel A followed by a clEnqueueNDRangeKernel to execute kernel B, the application can assume that kernel A finishes first and then kernel B is executed. If the memory objects output by kernel A are inputs to kernel B then kernel B will see the correct data in memory objects produced by execution of kernel A. If the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE property of a command-queue is set, then there is no guarantee that kernel A will finish before kernel B starts execution.
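In practice this means your loop can drop the per-kernel event entirely and synchronize only once at the end, e.g. with a blocking read. A minimal sketch (using the placeholder names from your question — `out_buf`, `out_size`, `host_out` are assumed to be the final output buffer and its host-side destination, and kernel arguments are assumed to be set already):

```c
/* In-order queue: no event waits are needed between kernels. */
for (cl_uint i = 0; i < num_kernels; i++) {
    cl_int err = clEnqueueNDRangeKernel(queue, kernels[i], dims, NULL,
                                        global_work_group_size,
                                        local_work_group_size,
                                        0, NULL, NULL); /* no event */
    /* check err here ... */
}

/* A blocking read (CL_TRUE) returns only after all preceding commands
   in the in-order queue have finished, so this both synchronizes the
   pipeline and fetches the final result. */
clEnqueueReadBuffer(queue, out_buf, CL_TRUE, 0, out_size, host_out,
                    0, NULL, NULL);
```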

As to question 3: yes, you'll need to enqueue every kernel you want to run explicitly; OpenCL has no built-in way to submit a whole pipeline as one command. Consider it a good thing, as there is no magic happening.

Of course you can always write your own helpers in C/C++ (or whatever host language you are using) that simplify this, and potentially hide the cumbersome kernel calls. Or use some GPGPU abstraction library to do the same.
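For instance, a tiny helper along these lines (hypothetical, not a standard API — the name and signature are illustrative, and the caller is assumed to have set all kernel arguments) hides the repetitive enqueue calls:

```c
/* Hypothetical helper: enqueue a chain of kernels into an in-order
   queue. Relies on in-order execution for the producer/consumer
   ordering, so no events are used. */
static cl_int enqueue_pipeline(cl_command_queue queue,
                               cl_kernel *kernels, cl_uint num_kernels,
                               cl_uint dims,
                               const size_t *global_size,
                               const size_t *local_size)
{
    for (cl_uint i = 0; i < num_kernels; i++) {
        cl_int err = clEnqueueNDRangeKernel(queue, kernels[i], dims, NULL,
                                            global_size, local_size,
                                            0, NULL, NULL);
        if (err != CL_SUCCESS)
            return err; /* bail out on the first failure */
    }
    return CL_SUCCESS;
}
```

Then the pipeline is a single call, and you only synchronize when you read the last kernel's output buffer back.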