
I'm learning to use OpenCL. For now my task is very simple: copy one large array to another, say a[301][300][300] to b[301][300][300]. It's just a test to help me understand what the global work size and local work size are. I use SVM to pass float8 vector arrays to the kernel:

__global float8* dts,
__global float8* dts_from_file

1. It seems I have to choose a global work size larger than the array size. In my test case,

size_t globalWorkSize[3] = { 128, 128, 256 };

(128*128*256*8) > 301*300*300. Otherwise I get truncated output. Am I right, or am I confused about the definition of the global work size? FYI:

    CL_DEVICE_ADDRESS_BITS=64
    CL_DEVICE_MAX_WORK_GROUP_SIZE=256
    CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS=3
    CL_DEVICE_MAX_WORK_ITEM_SIZES[0,1,2]=256, 256, 256

2. Is the local work size limited by CL_KERNEL_WORK_GROUP_SIZE=256?

size_t localWorkSize[3] = { 4,8,8 };

As soon as I change 4 to a larger value, clEnqueueNDRangeKernel fails with CL_INVALID_WORK_GROUP_SIZE. Is that because 4*8*8=256?

3. What about the global/local work sizes for multiple devices (CPU+GPU)? Do I need to specify a different work size for each device?

Thanks in advance.

What kind of device limits CL_DEVICE_MAX_WORK_ITEM_SIZES to just 256 x 256 x 256? Most I've seen have these at least 8192 x 8192 x (something) or larger. - Dithermaster

1 Answer


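The total local work size (the product of all dimensions) cannot exceed CL_DEVICE_MAX_WORK_GROUP_SIZE, and each dimension cannot exceed the corresponding entry of CL_DEVICE_MAX_WORK_ITEM_SIZES. The kernel-specific limit CL_KERNEL_WORK_GROUP_SIZE applies as well, which is why anything larger fails with CL_INVALID_WORK_GROUP_SIZE. In your case a local work size of {4, 8, 8} is OK, since 4*8*8 = 256.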
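
The two limits can be checked on the host before enqueueing. This is a minimal sketch in plain C, not actual OpenCL API code: `local_size_ok` and its parameter names are made up for illustration, and in a real program the limits would be queried with clGetDeviceInfo and clGetKernelWorkGroupInfo rather than hard-coded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if a 3D local work size respects
 * both the per-dimension limits (CL_DEVICE_MAX_WORK_ITEM_SIZES) and
 * the limit on the total number of work-items in one work-group
 * (CL_DEVICE_MAX_WORK_GROUP_SIZE or CL_KERNEL_WORK_GROUP_SIZE). */
static bool local_size_ok(const size_t local[3],
                          const size_t max_item_sizes[3],
                          size_t max_work_group_size)
{
    size_t total = 1;
    for (int d = 0; d < 3; ++d) {
        assert(local[d] > 0);              /* a zero dimension is never valid */
        if (local[d] > max_item_sizes[d])  /* per-dimension limit */
            return false;
        total *= local[d];
    }
    return total <= max_work_group_size;   /* limit on the product */
}
```

With the limits from the question (256 per dimension, 256 in total), {4, 8, 8} passes because the product is exactly 256, while {8, 8, 8} is rejected because the product is 512.
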
Each dimension of the global work size must be a multiple of the corresponding local work size. When each work-item processes only one piece of data, you have to round the global work size up so that it is at least as large as your array size, and add a check inside the kernel like this:

if (get_global_id(0) < array_size_x)    {   ...   }

Of course, each work-item can also process more than one element; the check then has to be adapted to that case.
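
To make the rounding and the guard concrete, here is a rough host-side sketch in plain C for the 1D case. It only emulates the launch with a loop; `round_up` and `guarded_copy` are hypothetical names, and in a real kernel `gid` would come from get_global_id(0).

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of the local work size. */
static size_t round_up(size_t n, size_t local)
{
    assert(local > 0);
    return (n + local - 1) / local * local;
}

/* Host-side stand-in for a 1D NDRange launch: the loop visits every
 * global id of the padded range, and the guard keeps the padding
 * work-items from touching memory past the end of the arrays.
 * Returns how many work-items actually copied an element. */
static size_t guarded_copy(const float *src, float *dst,
                           size_t n, size_t local)
{
    size_t global = round_up(n, local);   /* padded global work size */
    size_t copied = 0;
    for (size_t gid = 0; gid < global; ++gid) {
        if (gid < n) {                    /* same check as in the kernel */
            dst[gid] = src[gid];
            ++copied;
        }
    }
    return copied;
}
```

For the array in the question, 301*300*300 floats are 3386250 float8 elements, so with a 1D local size of 256 the padded global size would be `round_up(3386250, 256)`, and the surplus work-items simply fail the check and do nothing.
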

Each device has its own work-size limits, so the sizes must be set independently for each device. Making different devices run the same kernel at the same time is not an easy task.
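
One way to picture "set independently" is to compute a separate launch configuration per device. The sketch below is plain C under stated assumptions: `launch_cfg` and `make_cfg` are invented for illustration, the 1D range is assumed to be split into contiguous chunks, and the per-device maximum would really come from clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE for that device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device launch configuration for a 1D range. */
typedef struct {
    size_t offset;  /* first element handled by this device */
    size_t count;   /* number of real elements for this device */
    size_t local;   /* work-group size chosen for this device */
    size_t global;  /* count rounded up to a multiple of local */
} launch_cfg;

/* Clamp the preferred local size to what the device allows, then pad
 * the global size so it is a multiple of the chosen local size. */
static launch_cfg make_cfg(size_t offset, size_t count,
                           size_t preferred_local, size_t device_max_wg)
{
    assert(preferred_local > 0 && device_max_wg > 0);
    launch_cfg c;
    c.offset = offset;
    c.count = count;
    c.local = preferred_local < device_max_wg ? preferred_local
                                              : device_max_wg;
    c.global = (count + c.local - 1) / c.local * c.local;
    return c;
}
```

Each device's command queue would then get its own clEnqueueNDRangeKernel call with that device's `global` and `local`. If `offset` is passed as the global work offset, get_global_id(0) already includes it, so the bounds check in the kernel has to compare against `offset + count`.
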