0
votes

I'm trying to learn OpenCL on a Mac, whose implementation appears to differ in places from the OpenCL book I'm reading. I want to be able to dynamically allocate local memory on the GPU. From what I've read, I need to use the clSetKernelArg function, but I can't get that to work in Xcode 6.4. Here's the code as it stands (never mind that it's a pointless program, I'm just trying to learn the syntax for shared memory). In Xcode, the kernel is written as a stand-alone .cl file, similar to CUDA, so that's a separate file.

add.cl:

kernel void add(int a, int b, global int* c, local int* d)
{
    d[0] = a;
    d[1] = b;
    *c = d[0] + d[1];
}

main.c:

#include <stdio.h>
#include <OpenCL/opencl.h>
#include "add.cl.h"

int main(int argc, const char * argv[]) {

    int a = 3;
    int b = 5;
    int c;
    int* cptr = &c;

    dispatch_queue_t queue = gcl_create_dispatch_queue(CL_DEVICE_TYPE_GPU, NULL);

    void* dev_c = gcl_malloc(sizeof(cl_int), NULL, CL_MEM_WRITE_ONLY);

    // attempt to create local memory buffer
    void* dev_d = gcl_malloc(2*sizeof(cl_int), NULL, CL_MEM_READ_WRITE); 
    // clSetKernelArg(add_kernel, 3, 2*sizeof(cl_int), NULL);

    dispatch_sync(queue, ^{

        cl_ndrange range = { 1, {0, 0, 0}, {1, 0, 0}, {1, 0, 0} };

        // This gives a warning: 
        // Warning: Incompatible pointer to integer conversion passing 'cl_int *' 
        //     (aka 'int *') to parameter of type 'size_t' (aka 'unsigned long')
        add_kernel(&range, a, b, (cl_int*)dev_c, (cl_int*)dev_d);

        gcl_memcpy((void*)cptr, dev_c, sizeof(cl_int));

    });

    printf("%d + %d = %d\n", a, b, c);

    gcl_free(dev_c);    
    dispatch_release(queue);
    return 0;
}

I've tried putting clSetKernelArg where indicated and it doesn't like the first argument:

Error: Passing 'void (^)(const cl_ndrange *, cl_int, cl_int, cl_int *, size_t)' to parameter of incompatible type 'cl_kernel' (aka 'struct _cl_kernel *')

I've looked and looked but can't find any examples illustrating this point within the Xcode environment. Can you point me in the right direction?

2
I should clarify, the warning in the add_kernel call refers to the last argument (dev_d). - Chopp
Where is add_kernel defined? My guess would be in add.cl.h, but you haven't shown us that. Also, .cl files are not Xcode-specific; you could do that with any compiler/IDE. You could also write the OpenCL kernel code inline if you wanted to, as it's done in the Hello World Example - UnholySheep
add_kernel is defined by the Xcode cl compiler. The code is in add.cl and the compiler creates the add_kernel function by appending the "_kernel". - Chopp

2 Answers

1
votes

Managed to solve this by ditching Apple's extensions and using standard OpenCL 1.2 calls. That means replacing gcl_malloc with clCreateBuffer, replacing dispatch_sync with clEnqueueNDRangeKernel, and most importantly, using clSetKernelArg with NULL in the last argument for local variables. Works like a charm.

Here's the new version:

char kernel_add[1024] =
"kernel void add(int a, int b, global int* c, local int* d) \
{\
    d[0] = a;\
    d[1] = b;\
    *c = d[0] + d[1];\
}";

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <OpenCL/opencl.h>

int main(int argc, const char * argv[]) {

    int a = 3;
    int b = 5;
    int c;

    cl_device_id device_id;
    int err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);

    cl_context context = clCreateContext(0, 1, &device_id, NULL, NULL, &err);    
    cl_command_queue queue = clCreateCommandQueue(context, device_id, 0, &err);

    const char* srccode = kernel_add;  // the kernel source string defined above
    cl_program program = clCreateProgramWithSource(context, 1, &srccode, NULL, &err);

    err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(program, "add", &err);  // must match the kernel function's name

    cl_mem dev_c = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(int), NULL, NULL);

    err = clSetKernelArg(kernel, 0, sizeof(int), &a);
    err |= clSetKernelArg(kernel, 1, sizeof(int), &b);
    err |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &dev_c);
    // local memory: pass the size in bytes and a NULL pointer; the kernel uses d[0] and d[1]
    err |= clSetKernelArg(kernel, 3, 2 * sizeof(int), NULL);

    size_t one = 1;
    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &one, NULL, 0, NULL, NULL);
    clFinish(queue);

    // blocking read of the result back to the host
    err = clEnqueueReadBuffer(queue, dev_c, CL_TRUE, 0, sizeof(int), &c, 0, NULL, NULL);
    printf("%d + %d = %d\n", a, b, c);

    clReleaseMemObject(dev_c);
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseCommandQueue(queue);
    clReleaseContext(context);

    return 0;
}
0
votes

In regular OpenCL, for a kernel parameter declared as a local pointer, you don't allocate a buffer and pass it in (as you're doing with dev_d). Instead you call clSetKernelArg with the size of the desired local storage and a NULL pointer. For your kernel, where d is argument 3 and holds two ints, that would be clSetKernelArg(kernel, 3, 2 * sizeof(cl_int), NULL). More typically the size scales with the work-group size, e.g. sizeof(cl_int) * local_work_size[0]. You'll have to translate that into the Xcode way if you insist on being platform-specific.
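
As a rough sketch of the sizing step: what you pass for a local argument is only a byte count. The helper below, local_arg_bytes, is a hypothetical name and not part of OpenCL or Apple's gcl API. The comment at the end shows where the value would go. The add_kernel line in that comment is an assumption, inferred from the size_t in the block signature quoted in the question's error message. I haven't verified it in Xcode 6.4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an OpenCL or gcl function): returns the number
 * of bytes to request for a local-memory kernel argument that holds
 * `count` elements of `elem_size` bytes each, or 0 if the product would
 * overflow size_t. */
size_t local_arg_bytes(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return 0;  /* overflow: treat as "no valid size" */
    }
    return count * elem_size;
}

/* Intended use with the question's kernel, where the local pointer `d`
 * is argument index 3 and needs two ints:
 *
 *   size_t d_bytes = local_arg_bytes(2, sizeof(cl_int));
 *
 *   // Standard OpenCL: a size plus a NULL pointer.
 *   clSetKernelArg(kernel, 3, d_bytes, NULL);
 *
 *   // Apple's generated block (assumption, see above): pass the size
 *   // itself as the last argument, with no gcl_malloc for d.
 *   add_kernel(&range, a, b, (cl_int*)dev_c, d_bytes);
 */
```

If the assumption about the generated block holds, the "incompatible pointer to integer conversion" warning in the question is the compiler saying exactly this: it wants a size there, not a buffer.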