I'm using a GPU that supports OpenCL 1.2 and the cl_khr_fp16 extension, which I enable with #pragma OPENCL EXTENSION cl_khr_fp16 : enable. I wanted to measure the performance improvement from switching floating-point precision from 32 to 16 bits. In my device kernel, I converted every float to half, as shown below:

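#pragma OPENCL EXTENSION cl_khr_fp16 : enable

/* Copies N half-precision values from X to Y. */
__kernel void copy_kernel(int N, __global half *X, __global half *Y)
{
    int i = get_global_id(0);
    if (i < N) Y[i] = X[i];   /* skip work-items past the end of the data */
}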

On the host side, the cl_mem buffers hold arrays of cl_half. The host program looks like this:

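/* X and Y are cl_mem buffers that each hold N cl_half values. */
void copy(int N, cl_mem X, cl_mem Y)
{
    cl_kernel kernel = get_copy_kernel();
    cl_command_queue queue = cl.queue;

    cl_uint i = 0;

    /* Check after every call so an earlier failure is not overwritten. */
    cl.error = clSetKernelArg(kernel, i++, sizeof(N), (void*) &N);
    check_error_cl(cl);
    cl.error = clSetKernelArg(kernel, i++, sizeof(cl_mem), (void*) &X);
    check_error_cl(cl);
    cl.error = clSetKernelArg(kernel, i++, sizeof(cl_mem), (void*) &Y);
    check_error_cl(cl);

    /* One work-item per element, no global offset, runtime-chosen local size. */
    size_t gsize = N;
    cl.error = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gsize, NULL, 0, NULL, NULL);
    check_error_cl(cl);
}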
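On the host, cl_half is only a 16-bit storage type with no arithmetic, so the cl_half arrays behind X and Y have to be filled with IEEE 754 binary16 bit patterns rather than assigned directly from float. A minimal sketch of that conversion in plain C, rounding to nearest even; half_bits, float_to_half and floats_to_halves are hypothetical names, and uint16_t stands in for cl_half so the sketch compiles without the OpenCL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for cl_half, which the OpenCL headers define as a
   16-bit unsigned integer; uint16_t keeps this sketch header-free. */
typedef uint16_t half_bits;

/* Convert one float to IEEE 754 binary16 bits, rounding to nearest even. */
static half_bits float_to_half(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);                 /* reinterpret the float's bits */

    uint32_t sign = (x >> 16) & 0x8000u;      /* sign moves to bit 15 */
    uint32_t exp  = (x >> 23) & 0xFFu;        /* 8-bit biased exponent */
    uint32_t mant = x & 0x7FFFFFu;            /* 23-bit mantissa */

    if (exp == 0xFFu)                         /* Inf stays Inf, NaN stays NaN */
        return (half_bits)(sign | 0x7C00u | (mant ? 0x200u : 0u));

    int32_t e = (int32_t)exp - 127 + 15;      /* rebias for binary16 */

    if (e >= 0x1F)                            /* too large: overflow to Inf */
        return (half_bits)(sign | 0x7C00u);

    if (e <= 0) {                             /* half subnormal or zero */
        if (e < -10)
            return (half_bits)sign;           /* underflow to signed zero */
        mant |= 0x800000u;                    /* make the implicit 1 explicit */
        uint32_t shift = (uint32_t)(14 - e);
        uint32_t out = mant >> shift;
        uint32_t rem = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (out & 1u)))
            out++;                            /* may carry into smallest normal */
        return (half_bits)(sign | out);
    }

    /* Normal range: keep the top 10 mantissa bits. */
    uint32_t out = sign | ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (out & 1u)))
        out++;                                /* a carry may round up to Inf */
    return (half_bits)out;
}

/* Fill a cl_half-style array from float data before uploading it. */
static void floats_to_halves(const float *src, half_bits *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t k = 0; k < n; k++)
        dst[k] = float_to_half(src[k]);
}
```

The converted array could then be uploaded with clCreateBuffer using CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR and a size of N * sizeof(cl_half) bytes.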

But when compiling the kernel, I get the following error:

Call parameter type does not match function signature!
  %32 = load half addrspace(1)* %31, align 2
 float  %33 = call float @llvm.nvvm.mul.rn.f(half %32, half %19)
Broken module found, compilation terminated!

1 Answer

You are passing a half variable to the kernel, but the kernel expects a pointer to an array of half values.

If you want to pass an array of halves to the GPU, you still have to use cl_mem objects, which then contain the array of halves.
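A cl_mem object holding N halves is just 2 * N bytes of binary16 data, so when Y is read back into a host cl_half array with clEnqueueReadBuffer, the values have to be decoded before they can be compared with the float input. A minimal sketch of that decode step in plain C, assuming uint16_t as a stand-in for cl_half; half_bits, half_to_float and halves_to_floats are hypothetical names, not OpenCL API functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for cl_half (a 16-bit unsigned storage type on the host). */
typedef uint16_t half_bits;

/* Decode IEEE 754 binary16 bits to float; every half value is exact in float. */
static float half_to_float(half_bits h)
{
    uint32_t sign = ((uint32_t)h & 0x8000u) << 16;
    uint32_t exp  = ((uint32_t)h >> 10) & 0x1Fu;
    uint32_t mant = (uint32_t)h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {                       /* Inf or NaN */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {                    /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {                   /* signed zero */
        bits = sign;
    } else {                                  /* subnormal: normalize it */
        uint32_t shifts = 0;
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            shifts++;
        }
        mant &= 0x3FFu;                       /* drop the now-implicit 1 */
        bits = sign | ((113u - shifts) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);              /* reinterpret bits as float */
    return f;
}

/* Decode an array read back from a cl_mem buffer of n cl_half values. */
static void halves_to_floats(const half_bits *src, float *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t k = 0; k < n; k++)
        dst[k] = half_to_float(src[k]);
}
```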