I'm using #pragma OPENCL EXTENSION cl_khr_fp16 : enable on a GPU that supports it, with OpenCL 1.2. I wanted to measure the performance improvement from changing the float precision from 32 to 16 bits. In my device kernel, I converted all float types to half, as shown below:
    __kernel void copy_kernel(int N, __global half *X, __global half *Y)
    {
        int i = get_global_id(0);
        if (i < N) Y[i] = X[i];
    }
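For reference, here is a minimal sketch of how support for the extension can be confirmed on the host. It assumes the extension string has already been fetched with clGetDeviceInfo and CL_DEVICE_EXTENSIONS, which returns a space-separated list of names. The helper name has_extension is hypothetical and not part of my program or of OpenCL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a whole
 * space-separated token in the extension string `exts`, else 0.
 * `exts` is assumed to come from clGetDeviceInfo(..., CL_DEVICE_EXTENSIONS, ...). */
static int has_extension(const char *exts, const char *name)
{
    size_t len = strlen(name);
    const char *p = exts;

    if (len == 0) return 0;

    while ((p = strstr(p, name)) != NULL) {
        /* Accept only whole-token matches, not prefixes of longer names. */
        int starts = (p == exts) || (p[-1] == ' ');
        int ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends) return 1;
        p += len;
    }
    return 0;
}
```

Calling has_extension(exts, "cl_khr_fp16") on the device's extension string tells whether half arithmetic is advertised at all. Matching whole tokens avoids a false positive from a longer name that merely starts with the same text.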
On the host side, I made the cl_mem buffers point to arrays of cl_half. The host program looks like this:
    void copy(int N, cl_mem X, cl_mem Y)
    {
        cl_kernel kernel = get_copy_kernel();
        cl_command_queue queue = cl.queue;
        cl_uint i = 0;

        /* Check each call so an earlier failure is not overwritten. */
        cl.error = clSetKernelArg(kernel, i++, sizeof(N), (void*) &N);
        check_error_cl(cl);
        cl.error = clSetKernelArg(kernel, i++, sizeof(X), (void*) &X);
        check_error_cl(cl);
        cl.error = clSetKernelArg(kernel, i++, sizeof(Y), (void*) &Y);
        check_error_cl(cl);

        /* One work-item per element; the runtime picks the local size. */
        size_t gsize = N;
        cl.error = clEnqueueNDRangeKernel(queue, kernel, 1, 0, &gsize, 0, 0, 0, NULL);
        check_error_cl(cl);
    }
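Since cl_half on the host is only a 16-bit unsigned integer typedef, the float data has to be converted to half bit patterns before it is written into the buffers. The following is a minimal sketch of such a conversion, assuming IEEE 754 binary32 floats and round-to-nearest-even. The name float_to_half_bits is hypothetical, and uint16_t stands in for cl_half so the snippet compiles without the OpenCL headers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a binary32 float to binary16 bits
 * (the layout stored in a cl_half), rounding to nearest even. */
static uint16_t float_to_half_bits(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);                 /* reinterpret the float bits */

    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t mag = x & 0x7FFFFFFFu;           /* magnitude without sign */

    if (mag > 0x7F800000u)                    /* NaN: keep it a quiet NaN */
        return (uint16_t)(sign | 0x7E00u);
    if (mag >= 0x47800000u)                   /* >= 65536 or Inf: overflow to Inf */
        return (uint16_t)(sign | 0x7C00u);

    if (mag >= 0x38800000u) {                 /* normal half range (>= 2^-14) */
        uint32_t v = mag - 0x38000000u;       /* rebias exponent: 127 -> 15 */
        v += 0x0FFFu + ((v >> 13) & 1u);      /* round to nearest even */
        return (uint16_t)(sign | (v >> 13));  /* may carry up to Inf (0x7C00) */
    }

    uint32_t e = mag >> 23;                   /* biased float exponent */
    if (e < 102u)                             /* below 2^-25: rounds to zero */
        return sign;

    /* Subnormal half: shift the 24-bit significand into place. */
    uint32_t m = (mag & 0x007FFFFFu) | 0x00800000u;
    uint32_t shift = 126u - e;                /* 14..24 */
    uint32_t half_m = m >> shift;
    uint32_t rem = m & ((1u << shift) - 1u);
    uint32_t halfway = 1u << (shift - 1u);
    if (rem > halfway || (rem == halfway && (half_m & 1u)))
        half_m++;                             /* round to nearest even */
    return (uint16_t)(sign | half_m);
}
```

Each element of the host float array would be passed through this function and the results stored in the cl_half array that backs X. Values outside the half range become infinity, which is worth keeping in mind when comparing 16-bit results against the 32-bit version.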
But when the kernel is compiled, I get the following error:
    Call parameter type does not match function signature!
    %32 = load half addrspace(1)* %31, align 2
    float %33 = call float @llvm.nvvm.mul.rn.f(half %32, half %19)
    Broken module found, compilation terminated!