I wrote an OpenCL kernel that applies a low-pass filter to an image. The kernel:

__kernel void applyLowPassFilter(__global int *image, __global int *rst,
                                 __local int *localMem) {
  int nCols = get_global_size(0); // width of image
  int nRows = get_global_size(1); // height of image

  int xg = get_global_id(0); // x index of global buffer
  int yg = get_global_id(1); // y index of global buffer

  int xl = get_local_id(0); // x index of local buffer

  localMem[xl] = image[yg * nCols + xg];
  barrier(CLK_LOCAL_MEM_FENCE);
  if (yg != 0) {
    rst[yg * nCols + xg] = (localMem[xl] + image[(yg - 1) * nCols + xg]) / 2;
  }
}
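
For reference, the arithmetic this kernel performs can be sketched on the CPU in plain C, which gives something to compare the device output against. This is only a minimal sketch: the function name `low_pass_reference` and its parameters are hypothetical and do not appear in the original code.

```c
#include <stddef.h>

/* CPU reference for the same vertical two-tap average as the kernel.
 * Hypothetical helper, not part of the original host code.
 * Row 0 is left untouched, just as the kernel skips yg == 0. */
void low_pass_reference(const int *image, int *rst, size_t width, size_t height)
{
    for (size_t y = 1; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            /* Average the pixel with the one directly above it. */
            rst[y * width + x] =
                (image[y * width + x] + image[(y - 1) * width + x]) / 2;
        }
    }
}
```

If the buffer read back from the device differs from this reference everywhere (for example, all zeros), the kernel most likely never ran, which points at the host setup rather than at the filter arithmetic.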

In the kernel, I want each work-group to stage its data in local memory and compute the result from it. So I set the global work size to W x H (W: width of the image, H: height of the image) and the local work size to W x 1, expecting W work-items per group and H groups in total. Host code:

    size_t globalItemSize[2];
    size_t localItemSize[2];
    globalItemSize[0] = W;
    globalItemSize[1] = H;
    localItemSize[0] = W;
    localItemSize[1] = 1;
    // Set cl kernel arguments.
    ret = clSetKernelArg(clKernel, 0, sizeof(cl_mem), (void *)&imageObj);
    ret = clSetKernelArg(clKernel, 1, sizeof(cl_mem), (void *)&rstObj);
    ret = clSetKernelArg(clKernel, 2, sizeof(int) * localItemSize[0], NULL); // local mem

However, the code doesn't work: the result image is all zeros. After experimenting, I found that it works if I use only global memory and never touch local memory. Am I doing something wrong when accessing local memory?

Are you actually checking any of the return values (error codes) of your calls? Are you sure there is enough local memory available? Are you sure your work group size is possible on the device you are using? It sounds like your kernel is probably not running at all because something is set up badly, and the returned error code(s) should tell you what the problem is. - pmdj
@pmdj I figured out the problem by tracing the error codes of each step, thanks. - wangx1ng

1 Answer

I figured it out by checking the error codes returned by the OpenCL calls. First, I got error -48 (CL_INVALID_KERNEL) after the clSetKernelArg calls, which strongly suggested that something was wrong with my kernel. I then dropped the third kernel parameter, which I had used to pass the local memory buffer, and declared the buffer with __local inside the kernel instead. At that point I got error -51 (CL_INVALID_ARG_SIZE), which prompted me to check my hardware's limit on the number of work-items per work-group with the clinfo command. Having found that limit, I changed localItemSize in dimension 0 from W to W/3, and then it worked.
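
To make this kind of tracing less painful, a small helper can turn the numeric codes into names. The sketch below is an assumption on my part, not code from my project: `cl_error_name` and `cl_ok` are hypothetical names, and only a handful of codes are listed. The numeric values are the ones defined in the standard OpenCL header (CL/cl.h), hard-coded here so the snippet compiles without the OpenCL SDK.

```c
#include <stdio.h>

/* Map a few OpenCL status codes to their names (hypothetical helper).
 * Values match CL/cl.h; the list is deliberately incomplete. */
const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -48: return "CL_INVALID_KERNEL";
    case -49: return "CL_INVALID_ARG_INDEX";
    case -50: return "CL_INVALID_ARG_VALUE";
    case -51: return "CL_INVALID_ARG_SIZE";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    case -55: return "CL_INVALID_WORK_ITEM_SIZE";
    default:  return "unknown OpenCL error";
    }
}

/* Return 1 on success; otherwise report the failing step and return 0. */
int cl_ok(int code, const char *what)
{
    if (code == 0) {
        return 1;
    }
    fprintf(stderr, "%s failed: %d (%s)\n", what, code, cl_error_name(code));
    return 0;
}
```

It would be used after every call, for example `if (!cl_ok(ret, "clSetKernelArg(2)")) return 1;`, so that the first failing step is reported instead of being overwritten by the next `ret = ...` assignment.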

Kernel code after modification:

__kernel void applyLowPassFilter(__global int *image, __global int *rst) {
  int nCols = get_global_size(0); // width of image
  int nRows = get_global_size(1); // height of image

  int xg = get_global_id(0); // x index of global buffer
  int yg = get_global_id(1); // y index of global buffer

  int xl = get_local_id(0); // x index of local buffer

  __local int localMem[212]; // 1/3 of image width
  localMem[xl] = image[yg * nCols + xg];
  barrier(CLK_LOCAL_MEM_FENCE);
  if (yg != 0) {
    rst[yg * nCols + xg] = (localMem[xl] + image[(yg - 1) * nCols + xg]) / 2;
  }
}

Parameter configuration in the host code:

    size_t globalItemSize[2];
    size_t localItemSize[2];
    globalItemSize[0] = W;
    globalItemSize[1] = H;
    localItemSize[0] = W / 3;
    localItemSize[1] = 1;
    // Set cl kernel arguments.
    ret = clSetKernelArg(clKernel, 0, sizeof(cl_mem), (void *)&imageObj);
    ret = clSetKernelArg(clKernel, 1, sizeof(cl_mem), (void *)&rstObj);
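
Rather than hard-coding W / 3, the local size could be derived from the device limit. In OpenCL 1.x the global size in each dimension must be evenly divisible by the local size, and the work-group size must not exceed the limit reported by the device (CL_DEVICE_MAX_WORK_GROUP_SIZE via clGetDeviceInfo, or CL_KERNEL_WORK_GROUP_SIZE via clGetKernelWorkGroupInfo). A minimal sketch in plain C follows; `pick_local_size` is a hypothetical helper, and the limit of 256 used in the example afterwards is an assumed value, not one read from my hardware.

```c
#include <stddef.h>

/* Return the largest divisor of global_size that does not exceed max_local.
 * Hypothetical helper; returns 0 if either argument is 0. */
size_t pick_local_size(size_t global_size, size_t max_local)
{
    if (global_size == 0 || max_local == 0) {
        return 0;
    }
    size_t candidate = global_size < max_local ? global_size : max_local;
    /* Walk down until the candidate divides the global size evenly.
     * This always terminates, because 1 divides everything. */
    while (global_size % candidate != 0) {
        --candidate;
    }
    return candidate;
}
```

For an image width of 636 and an assumed limit of 256, `pick_local_size(636, 256)` yields 212, which matches the W / 3 chosen by hand above. Note that the fixed-size `__local int localMem[212]` array in the kernel would then have to be at least as large as whatever value this returns.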