I've created an in-order OpenCL queue. My pipeline enqueues multiple kernels into the queue.
queue = clCreateCommandQueue(cl.context, cl.device, 0, &cl.error);
for(i=0 ;i < num_kernels; i++){
clEnqueueNDRangeKernel(queue, kernels[i], dims, NULL, global_work_group_size, local_work_group_size, 0, NULL, &event);
}
The output of kernels[0] is intput to kernels[1]. Output of kernels[1] is input to kernels[2] and so on.
Since my command queue is an in-order queue, my assumption is kernels[1] will start only after kernels[0] is completed.
- Is my assumption valid?
- Should I use
clWaitForEventsto make sure the previous kernel is completed before enqueuing the next kernel? - Is there any way I can stack multiple kernels into the queue & just pass the input to kernels[0] & directly get the output from the last kernel? (without having to enqueue every kernel one by one)