
I want to get the maximum global work size. I don't want to rely on OpenCL choosing the best size for me, since what it picks may or may not be the maximum.

To do this, I want to specify the size myself when calling clEnqueueNDRangeKernel, e.g.:

clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL, 0, NULL, NULL);

The clGetKernelWorkGroupInfo documentation says:

CL_KERNEL_GLOBAL_WORK_SIZE : This provides a mechanism for the application to query the maximum global size that can be used to execute a kernel (i.e. global_work_size argument to clEnqueueNDRangeKernel) on a custom device given by device or a built-in kernel on an OpenCL device given by device.

How can I get CL_KERNEL_GLOBAL_WORK_SIZE with the OpenCL C++ bindings?

I tried this:

cl::array<size_t, 3> kernel_global_work_size = my_kernel.getWorkGroupInfo<CL_KERNEL_GLOBAL_WORK_SIZE>(my_device);

But I got this error:

cl2.hpp:5771:12: note: candidate: template<class T> cl_int cl::Kernel::getWorkGroupInfo(const cl::Device&, cl_kernel_work_group_info, T*) const
     cl_int getWorkGroupInfo(
            ^~~~~~~~~~~~~~~~
cl2.hpp:5771:12: note:   template argument deduction/substitution failed:
cl2.hpp:5782:9: note: candidate: template<int name> typename cl::detail::param_traits<cl::detail::cl_kernel_work_group_info, name>::param_type cl::Kernel::getWorkGroupInfo(const cl::Device&, cl_int*) const
         getWorkGroupInfo(const Device& device, cl_int* err = NULL) const

And with this code:

cl::array<size_t, 3> kernel_global_work_size;
my_kernel.getWorkGroupInfo<cl::array<size_t, 3>>(my_device, CL_KERNEL_GLOBAL_WORK_SIZE, &kernel_global_work_size);

I got OpenCL error -30 (CL_INVALID_VALUE).

my_kernel is not a built-in kernel, e.g.: cl::Kernel my_kernel = cl::Kernel(program, "my_kernel");
my_device is not a custom device, e.g.: cl::Device device = myDevices[0];


1 Answer


Yes, as your call matches the signature:

https://github.khronos.org/OpenCL-CLHPP/classcl_1_1_kernel.html

template <cl_int name>
typename detail::param_traits<detail::cl_kernel_work_group_info, name>::param_type
getWorkGroupInfo(const Device& device, cl_int* err = NULL) const;

It looks like the param_traits specialization, which is generated via macros, is not declared for CL_KERNEL_GLOBAL_WORK_SIZE. That would be a bug in the headers. (GitHub issue created by OP.)

Some info names appear to have no corresponding entry in the macro list that generates these traits.
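To illustrate why a missing traits entry produces a "template argument deduction/substitution failed" note, here is a minimal, self-contained sketch of the macro-generated traits pattern. It does not use the real OpenCL headers: the namespace sketch, the INFO_* constants, the macro name and has_param_type are all hypothetical stand-ins, and the real cl2.hpp layout may differ in detail.

```cpp
#include <cstddef>
#include <type_traits>

namespace sketch {

// Hypothetical stand-ins for two info names (placeholder values, not OpenCL's).
constexpr int INFO_WORK_GROUP_SIZE = 1;
constexpr int INFO_GLOBAL_WORK_SIZE = 2;

// Primary template is empty: a name without a specialization has no
// param_type, so a return type spelled in terms of it cannot be formed.
template <int Name> struct param_traits {};

// One specialization per (name, type) pair, generated by a macro.
#define SKETCH_DECLARE_PARAM_TRAITS(name, type) \
    template <> struct param_traits<name> { using param_type = type; };

SKETCH_DECLARE_PARAM_TRAITS(INFO_WORK_GROUP_SIZE, std::size_t)
// INFO_GLOBAL_WORK_SIZE is deliberately omitted to mimic the missing entry.

#undef SKETCH_DECLARE_PARAM_TRAITS

// C++11-compatible void_t helper.
template <typename...> struct make_void { using type = void; };
template <typename... Ts> using void_t = typename make_void<Ts...>::type;

// Detects at compile time whether param_traits<Name>::param_type exists.
template <int Name, typename = void>
struct has_param_type : std::false_type {};

template <int Name>
struct has_param_type<Name, void_t<typename param_traits<Name>::param_type>>
    : std::true_type {};

}  // namespace sketch
```

In this sketch, has_param_type<INFO_WORK_GROUP_SIZE>::value is true and has_param_type<INFO_GLOBAL_WORK_SIZE>::value is false. A function whose return type is param_traits<name>::param_type is simply dropped from overload resolution for the second name, which is consistent with the compiler notes quoted in the question.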

Alternatively, you can use the overload that returns an error code and writes the info to an output parameter. That should work around the issue:

template<typename T>
cl_int getWorkGroupInfo(const Device &device, cl_kernel_work_group_info name, T *param) const;

The call could look like this (note that the last argument is a pointer):

cl::array<size_t, 3> result;
cl_int err = kernel.getWorkGroupInfo<decltype(result)>(device, CL_KERNEL_GLOBAL_WORK_SIZE, &result);

My question to you would be: Did you try it yourself? Did the result not match your expectations?


Did you get a CL_INVALID_VALUE? That is expected here, because the documentation restricts this query:

[...] on a custom device given by device or a built-in kernel on an OpenCL device given by device.

If device is not a custom device or kernel is not a built-in kernel, clGetKernelArgInfo [sic; the function meant is clGetKernelWorkGroupInfo] returns the error CL_INVALID_VALUE.

See OpenCL 1.2 spec, pages 14 and 15:

Built-in Kernel: A built-in kernel is a kernel that is executed on an OpenCL device or custom device by fixed-function hardware or in firmware. Applications can query the built-in kernels supported by a device or custom device. A program object can only contain kernels written in OpenCL C or built-in kernels but not both. See also Kernel and Program.

Custom Device: An OpenCL device that fully implements the OpenCL Runtime but does not support programs written in OpenCL C. A custom device may be specialized non-programmable hardware that is very power efficient and performant for directed tasks or hardware with limited programmable capabilities such as specialized DSPs. Custom devices are not OpenCL conformant. Custom devices may support an online compiler. Programs for custom devices can be created using the OpenCL runtime APIs that allow OpenCL programs to be created from source (if an online compiler is supported) and/or binary, or from built-in kernels supported by the device. See also Device.

For regular kernels and devices, the standard constrains only the work-group size (a device property), while the global size is limited only by the range of the device's size_t. See clEnqueueNDRangeKernel.
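As a rough illustration of that last point, the sketch below checks whether a requested global size fits in a device-side size_t of a given width. It is plain C++ with no OpenCL calls. The helper names are hypothetical, and it assumes you have already obtained the device's address width in bits, for example from CL_DEVICE_ADDRESS_BITS.

```cpp
#include <cstdint>

namespace sketch {

// Largest value representable in a device-side size_t of the given width.
// Assumption: address_bits comes from the device (typically 32 or 64).
constexpr std::uint64_t max_device_size_t(unsigned address_bits) {
    return address_bits >= 64
               ? ~std::uint64_t{0}
               : ((std::uint64_t{1} << address_bits) - 1);
}

// True if a global work size in one dimension is within that range.
// This covers only the size_t range limit; work-group size limits are
// a separate constraint and are not checked here.
constexpr bool global_size_fits(std::uint64_t global_size, unsigned address_bits) {
    return global_size <= max_device_size_t(address_bits);
}

}  // namespace sketch
```

For example, global_size_fits(4294967296ULL, 32) is false because 2^32 does not fit in a 32-bit size_t, while the same size fits on a device reporting 64 address bits.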