
I'm having a weird issue with OpenCL where I call clEnqueueNDRangeKernel for a kernel that takes a large memory buffer as an argument. For a buffer smaller than 16384 bytes everything works fine. If I increase the buffer size beyond that, the call returns a CL_INVALID_KERNEL_ARGS error. As I understand it, this error is meant to indicate that an argument has not been set. Setting an argument to an invalid size, or to something that doesn't fit in memory, should trigger a different error.

Any ideas?

UPDATE

The answer from @mfa led me to look at the device specs again. It seems a 128*128 float array is exactly 64KB, which is the size of the card's constant memory. The global memory is much larger, so using __global instead of __constant for the kernel parameter fixes it.
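For anyone who wants to check the arithmetic in the update, here is a minimal sketch in plain C. It makes no OpenCL calls. The helper names float_array_bytes and fits_in_constant are made up for illustration, and constant_limit is assumed to be whatever the device reports for CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE through clGetDeviceInfo (65536 on the card described above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of a rows x cols array of float. */
size_t float_array_bytes(size_t rows, size_t cols)
{
    /* Guard against size_t overflow in the multiplication below. */
    assert(rows == 0 || cols <= SIZE_MAX / sizeof(float) / rows);
    return rows * cols * sizeof(float);
}

/*
 * Hypothetical helper, not part of OpenCL: returns 1 if a buffer of
 * `bytes` bytes fits within `constant_limit`, 0 otherwise.
 * `constant_limit` is assumed to be the device's reported
 * CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE.
 */
int fits_in_constant(size_t bytes, unsigned long long constant_limit)
{
    return bytes <= constant_limit;
}
```

With 4-byte floats, a 128*128 array comes out to exactly the 64KB limit, so it still fits. Anything larger does not, and that is the point where the kernel parameter would have to be declared __global instead of __constant.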

But I'm still confused: how is an out-of-memory argument for constant memory ever supposed to be caught? It seems to me that when I create the buffer it's not yet known whether it'll be used as constant or global... Is there any way to get a more helpful error message?

When you create the data, it is not yet known how it will be used. However, the HW has specific channels to broadcast small key parameters very quickly (like transform matrices in GL). That is the __constant memory. If your buffer is bigger than this memory, then the parameter you are passing to your kernel as constant is invalid, hence the error. - DarkZeros

1 Answer


Query your device for 'CL_DEVICE_MAX_PARAMETER_SIZE'; I bet it will be 16384. This value is 4096 for my Intel CPU.

The minimum values defined by the specs are:

  • OpenCL 1.0 - 256 bytes
  • OpenCL 1.1 - 1024 bytes
  • OpenCL 1.2 - 1024 bytes
  • OpenCL 2.0 - 1024 bytes

Read more on the clGetDeviceInfo page.
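As a rough sketch of how those minimums could be used, the plain C below encodes the list as a lookup and checks a set of argument sizes against a limit. Both function names are hypothetical, and nothing here calls OpenCL. It assumes the sizes are the arg_size values you would hand to clSetKernelArg, and that the limit is either the device's reported CL_DEVICE_MAX_PARAMETER_SIZE or the spec minimum as a conservative fallback.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Spec-minimum value of CL_DEVICE_MAX_PARAMETER_SIZE for the OpenCL
 * versions in the list above. Returns 0 for a version not in that list.
 */
size_t spec_min_parameter_size(int major, int minor)
{
    if (major == 1 && minor == 0)
        return 256;
    if ((major == 1 && (minor == 1 || minor == 2)) ||
        (major == 2 && minor == 0))
        return 1024;
    return 0;
}

/*
 * Hypothetical check: returns 1 if the argument sizes add up to at most
 * max_parameter_size, 0 otherwise. The comparison is arranged so the
 * running total can never overflow.
 */
int args_fit(const size_t *arg_sizes, size_t count, size_t max_parameter_size)
{
    size_t total = 0;
    assert(arg_sizes != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (arg_sizes[i] > max_parameter_size - total)
            return 0;
        total += arg_sizes[i];
    }
    return 1;
}
```

Checking against the spec minimum rather than the device's actual value is the pessimistic choice: a kernel whose arguments pass that check should not hit this particular limit on any conforming device of that version.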


Re: buffers and out-of-memory error

When you create a buffer, it is known which devices it could be attached to. clCreateBuffer takes the cl_context that the buffer will be associated with, and a previous call to clCreateContext took a list of devices to associate with the new context. But the buffer wasn't your problem.

__constant was your original issue. Your device won't let you use constant parameters totaling more than 4352 bytes, and you were trying to pass 65536 bytes. Global memory doesn't have such a low constraint (usually 50% of total memory for GPUs, and 25% for CPUs by default). The out-of-memory error will not be known or triggered until you try to set the kernel parameter to a value that is too large.

The problem happens at a low level, when the constant memory is to be shared (or copied; this is implementation dependent) for all work groups to use. There is only a finite amount of memory reserved in the compute unit for this use. Only so many transistors can be dedicated to this purpose before the designers of the chip would have to cut out ALU cores or other types of memory to allow a greater constant memory size. Even in OpenCL 2.0, the minimum is still only 1024 bytes.