0
votes

If I use a barrier in my kernel (with either CLK_LOCAL_MEM_FENCE or CLK_GLOBAL_MEM_FENCE), enqueueing it fails with a CL_INVALID_WORK_GROUP_SIZE error. The global work size is 512 and the local work size is 128; 65536 items have to be computed. The maximum work-group size of my device is 1024, and I am using only one dimension. I use JOCL for the Java bindings. The kernel is very simple:

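kernel void sum(global float *input, global float *output, const int numElements, local float *localCopy)
{
    // Copy this work-item's element from global memory into the work-group's local buffer.
    localCopy[get_local_id(0)] = input[get_global_id(0)];
    // Wait until every work-item in the group has written its element.
    barrier(CLK_LOCAL_MEM_FENCE); // or barrier(CLK_GLOBAL_MEM_FENCE)
}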

I run the kernel on an Intel(R) Xeon(R) CPU X5570 @ 2.93GHz and can use OpenCL 1.2. The calling code looks like this:

kernel.putArg(aCLBuffer).putArg(bCLBuffer).putArg(elementCount).putNullArg(localWorkSize);
queue.put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize);

But the error is always the same:

[...]can not enqueue 1DRange CLKernel [...] with gwo: null gws: {512} lws: {128} 
cond.: null events: null [error: CL_INVALID_WORK_GROUP_SIZE]

What am I doing wrong?

1
I'm not too familiar with the JOCL bindings, but the call putNullArg(localWorkSize) seems to allocate that many bytes of local memory. I think you need to try putNullArg(localWorkSize * 4), or the Java equivalent of putNullArg(localWorkSize * sizeof(float)). - mfa
Unless localWorkSize is already multiplied by the element size. - mfa
Thank you for your comment, but unfortunately it doesn't solve the problem. The JOCL documentation says nothing about a size in bytes ( goo.gl/ALkBLw ), just size, so I assume (and my tests agree) that this is not the problem. - Michael Dorner
Does the error happen for other group sizes, such as 1, 16 or 32? - mfa
It works for a local work size of 1, but not for anything larger than 1. - Michael Dorner

1 Answer

6
votes

This is expected behaviour on some OpenCL platforms. For example, on my Apple system, the CPU device has a maximum work-group size of 1024. However, if a kernel has a barrier inside, then the maximum work-group size for that specific kernel is reduced to 1.

You can query the maximum work-group size for a specific kernel by using the clGetKernelWorkGroupInfo function with the CL_KERNEL_WORK_GROUP_SIZE parameter. The value returned will be no more than the value returned by clGetDeviceInfo and CL_DEVICE_MAX_WORK_GROUP_SIZE, but is allowed to be less (as it is in this case).
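Once you have the per-kernel limit, the host code can clamp the requested local work size before enqueueing. The following is only a minimal sketch in plain Java: the class WorkGroupSizing and the method pickLocalSize are hypothetical names, not part of JOCL, and the kernelMaxWorkGroupSize argument is assumed to be the value you obtained from clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE through whatever your binding exposes for it. It also keeps the global work size evenly divisible by the local work size, which OpenCL 1.2 requires.

```java
// Hypothetical helper, not part of JOCL: picks a local work size that
// respects the per-kernel work-group limit reported by the OpenCL runtime.
final class WorkGroupSizing {

    /**
     * Returns the largest local work size that
     *  - does not exceed the requested local work size,
     *  - does not exceed the per-kernel maximum (assumed to come from
     *    clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE), and
     *  - divides the global work size evenly.
     */
    static long pickLocalSize(long globalSize, long requestedLocalSize, long kernelMaxWorkGroupSize) {
        if (globalSize < 1 || requestedLocalSize < 1 || kernelMaxWorkGroupSize < 1) {
            throw new IllegalArgumentException("all sizes must be at least 1");
        }
        // Start from the smallest of the three upper bounds.
        long candidate = Math.min(Math.min(requestedLocalSize, kernelMaxWorkGroupSize), globalSize);
        // Step down until the candidate divides the global size; 1 always does.
        while (globalSize % candidate != 0) {
            candidate--;
        }
        return candidate;
    }

    public static void main(String[] args) {
        // Device-wide limit of 1024: the requested size of 128 is kept.
        System.out.println(pickLocalSize(512, 128, 1024));
        // Per-kernel limit of 1, as in the barrier case above: falls back to 1.
        System.out.println(pickLocalSize(512, 128, 1));
    }
}
```

With the values from the question (global work size 512, requested local work size 128), a per-kernel limit of 1 makes this helper return 1, which matches the observation in the comments that only a local work size of 1 works on this device.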