OpenCL global work size interpreted differently on Haswell & Kaby Lake iGPUs

Question

Our kernel is initialized with:

size_t localWorkSize[1] = {1};
size_t globalWorkSize[2] = {60, 80};

The kernel implements a typical convolution on an image. It works fine on a machine with a Kaby Lake iGPU, but on Haswell or Bay Trail machines the global work size is interpreted as {60, 60}, so the kernel executes with the wrong NDRange.

All systems report the same platform: OpenCL 1.2 beignet 1.3.

Is this a known issue? Or is there a hardware-dependent limit to the global work size? There doesn't seem to be any info on that in the OpenCL Programming Guide.

Sounds like a bug you need to raise with the Beignet developers. — pmdj

mogu mogu · Accepted Answer · 2019-02-28T09:59:28

The local work size and global work size arrays must have the same number of dimensions. See the documentation for clEnqueueNDRangeKernel:

local_work_size  Points to an array of work_dim unsigned values
global_work_size  Points to an array of work_dim unsigned values

Your code declares:

size_t localWorkSize[1] = {1};
size_t globalWorkSize[2] = {60, 80};

If you enqueue a kernel with those arrays and work_dim == 2, the driver will read them as

size_t localWorkSize[2] = {1, something};
size_t globalWorkSize[2] = {60, 80};

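where something is whatever happens to be on the stack after localWorkSize. Reading past the end of the array is undefined behavior, which would explain why the result differs between machines. Declare the local work size with two elements as well: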

size_t localWorkSize[2] = {1, 1};
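
One way to keep the two arrays from drifting apart is to size both with a single constant and check them before enqueueing. The following is only a sketch: WORK_DIM, ndrange_is_valid and work_group_count are hypothetical names, not part of OpenCL, and the divisibility check assumes the OpenCL 1.2 rule that each global size must be evenly divisible by the matching local size when a local size is given.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant: one value sizes both arrays and is also
 * what would be passed as the work_dim argument. */
#define WORK_DIM 2

const size_t global_work_size[WORK_DIM] = {60, 80};
const size_t local_work_size[WORK_DIM]  = {1, 1};

/* Hypothetical helper: true when every local size is non-zero and
 * evenly divides the global size of the same dimension. */
bool ndrange_is_valid(size_t work_dim,
                      const size_t *global,
                      const size_t *local)
{
    for (size_t d = 0; d < work_dim; ++d) {
        if (local[d] == 0 || global[d] % local[d] != 0) {
            return false;
        }
    }
    return true;
}

/* Hypothetical helper: total number of work-groups in the NDRange.
 * Only meaningful when ndrange_is_valid() returned true. */
size_t work_group_count(size_t work_dim,
                        const size_t *global,
                        const size_t *local)
{
    size_t count = 1;
    for (size_t d = 0; d < work_dim; ++d) {
        count *= global[d] / local[d];
    }
    return count;
}
```

With this layout, WORK_DIM would be passed as work_dim, and global_work_size and local_work_size as the corresponding arguments of clEnqueueNDRangeKernel, so all three always agree.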
