Execution flow on the host just after starting the kernel

Question

I have a few questions I would like clarified. Sorry if they seem very basic.

When we launch a kernel with clEnqueueNDRangeKernel, what actually happens to the host code? Does it wait for the kernel to complete, or does it continue?
Say we have multiple kernels. What happens in that case? If one kernel has completed, can the host retrieve its results while the others are still computing?
I was reading the OpenCL spec for clCreateBuffer (link here). Check the description of the flag CL_MEM_USE_HOST_PTR. For your convenience I have posted it here: "it indicates that the application wants the OpenCL implementation to use memory referenced by host_ptr as the storage bits for the memory object."

I do not understand what exactly is meant by "application" and "OpenCL implementation". It also says that "OpenCL implementations are allowed to cache the buffer contents."

Alex F · Accepted Answer · 2012-07-12T07:25:30

The kernel is enqueued and executed asynchronously: clEnqueueNDRangeKernel returns immediately and the host program continues execution. To make the call effectively synchronous, wait on the event returned through its last (optional) parameter, or call clFinish to wait for all commands in the queue.
Multiple kernels enqueued on the same in-order command queue (the default) execute sequentially, in the order they were enqueued.
"Application" means your code running on the host. The "OpenCL implementation" is the library that implements the OpenCL interface. There are several implementations, from vendors such as AMD and NVIDIA.

An OpenCL program has three components: host code running on the CPU, the OpenCL library (the implementation), and kernels running on the device, typically a GPU.
