
I encountered a problem with asynchronous data transfer on the iGPU of an Intel Core i7-7600U.

The core part of the code from a trivial example:

std::vector<cl::Platform> platforms;
cl::Platform::get(&platforms);
auto platform = platforms.front();
std::vector<cl::Device> devices;
platform.getDevices(CL_DEVICE_TYPE_GPU, &devices);
auto device = devices.front();

cl::Context context(device);
char buf[16];
cl::Buffer memBuf(context, CL_MEM_READ_WRITE, sizeof(buf));

cl::CommandQueue queue(context, device);
cl::Event ev;
queue.enqueueWriteBuffer(memBuf, CL_FALSE, 0, sizeof(buf), buf, NULL, &ev);
cl_int status;
do {
    ev.getInfo(CL_EVENT_COMMAND_EXECUTION_STATUS, &status);
} while (status != CL_COMPLETE);
std::cout << "DONE" << std::endl;

It is supposed to busy-wait on the data transfer. A loop like this could be part of a scheduler (and is in StarPU).

However, the loop never exits. When I use the CPU instead (CL_DEVICE_TYPE_CPU), it works. When I use ev.wait() or queue.finish() instead of polling, it also works.

Is this an Intel bug? Or is there anything in the OpenCL standard that allows an implementation to delay scheduling a command until it is actually waited for?

For reference: Using Linux Mint, Kernel 4.13.0-32-generic #35~16.04.1-Ubuntu SMP.
OpenCL runtime from https://software.intel.com/en-us/articles/opencl-drivers (intel-opencl-r5.0 (SRB5.0) Linux driver package) and installed via:

sudo alien --to-deb *.rpm
sudo dpkg -i *.deb
sudo ln -s /opt/intel/opencl/include/CL /usr/local/include/CL
sudo apt-get install ocl-icd-libopencl1

1 Answer


OpenCL commands enqueued with clEnqueueXXX are not guaranteed to start executing until you call clFlush or clFinish.

  • clFlush basically moves all commands from CL_QUEUED state to CL_SUBMITTED state (see clGetEventInfo documentation for a list of all states).

  • clFinish does the same as clFlush, but also blocks until all commands in a queue finish execution.

In your example, add a clFlush call (queue.flush() in the C++ wrapper) before the loop to make it work.

Edit: some OpenCL implementations perform an implicit clFlush after each enqueue, so a command may start right after the clEnqueueXXX call. This behavior is not portable; in the general case you should still call clFlush or clFinish.