Our kernel is initialized with:
size_t localWorkSize[1] = {1};
size_t globalWorkSize[2] = {60, 80};
The kernel implements a typical convolution on an image file. It works fine on a machine with a Kaby Lake iGPU, but on Haswell or Bay Trail machines the global work size is interpreted as {60, 60}, so the kernel executes with the wrong NDRange.
On all systems our platform is OpenCL 1.2 (Beignet 1.3).
Is this a known issue? Or is there a hardware-dependent limit to the global work size? There doesn't seem to be any info on that in the OpenCL Programming Guide.
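One thing that may be worth ruling out first: in OpenCL 1.2, clEnqueueNDRangeKernel expects both the global and the local work size arrays to hold work_dim entries, and each global size to be a multiple of the matching local size. The enqueue call is not shown above, so the sketch below simply assumes work_dim = 2. check_ndrange is a hypothetical helper written for this post, not part of OpenCL or Beignet, and the array lengths are passed in by hand because C cannot recover them from a pointer.

```c
#include <stddef.h>

/* Hypothetical sanity check, not an OpenCL API.
 * Returns 0 when the arrays are consistent with work_dim, otherwise:
 *   -1  global array has fewer than work_dim entries
 *   -2  local array has fewer than work_dim entries
 *   -3  a global size is zero
 *   -4  a local size is zero or does not divide the global size
 * Passing local == NULL means "let the implementation pick". */
static int check_ndrange(size_t work_dim,
                         const size_t *global, size_t global_len,
                         const size_t *local, size_t local_len)
{
    if (global_len < work_dim)
        return -1;
    if (local != NULL && local_len < work_dim)
        return -2;

    for (size_t d = 0; d < work_dim; ++d) {
        if (global[d] == 0)
            return -3;
        if (local != NULL &&
            (local[d] == 0 || global[d] % local[d] != 0))
            return -4;
    }
    return 0;
}
```

With the sizes from this post and the assumed work_dim of 2, check_ndrange(2, globalWorkSize, 2, localWorkSize, 1) returns -2, because localWorkSize has only one entry. Whether that explains the {60, 60} behaviour on Haswell and Bay Trail is not something this check can confirm.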