I have a test kernel that I'm calling through the Julia OpenCL API. The fact that I'm calling it from Julia isn't important; it's just what I'm using to run OpenCL. Here is the code:
using OpenCL
const cl = OpenCL
device, ctx, queue = cl.create_compute_context()
C_buff = cl.Buffer(Float32, ctx, :w, 2)
const testkernel = """
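kernel void test(global float *C)
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);
    local float x;  // one copy of x per work-group
    if (lid == 0)
    {
        x = 0.0f;   // first work-item in the group initializes x
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    x += 1.0f;      // every work-item in the group adds 1
    barrier(CLK_LOCAL_MEM_FENCE);
    if (lid == 0)
    {
        C[gid / 2] = x;  // one result per work-group
    }
}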
"""
program = cl.Program(ctx, source=testkernel) |> cl.build!
kernel = cl.Kernel(program, "test")
# 4 work-items in total, 2 per work-group
cl.call(queue, kernel, 4, 2, C_buff)
cl.read(queue, C_buff)
What I can't figure out is why this returns the vector [1.0,1.0] when it seems like it should return [2.0,2.0]. I have 4 work-items split into two work-groups of 2 work-items each. One local float x is instantiated per work-group, and the first work-item in each work-group sets it to 0. Then every work-item in the work-group adds 1 to it, and since there are two work-items in each work-group, x should end up as 2. But when I read back C, I get a vector of ones instead.
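
To make the expectation concrete, here is a minimal sequential sketch in plain C of the result I have in mind. It is not OpenCL and it is not what the device does: it assumes the work-items in a group apply their increments one after another, which is exactly the assumption I am unsure about. The function name expected_result and its parameters are hypothetical, introduced only for this illustration.

```c
#include <stddef.h>

/* Sequential model of the behaviour I expect from the kernel.
   Assumption: work-items in a group update x one at a time, with no
   overlap. expected_result is a hypothetical helper, not an OpenCL call. */
void expected_result(float *C, size_t global_size, size_t local_size)
{
    size_t num_groups = global_size / local_size;
    for (size_t group = 0; group < num_groups; ++group) {
        float x = 0.0f;                    /* the lid == 0 work-item resets x */
        for (size_t lid = 0; lid < local_size; ++lid) {
            x += 1.0f;                     /* each work-item in the group adds 1 */
        }
        size_t gid = group * local_size;   /* global id of the lid == 0 work-item */
        C[gid / 2] = x;                    /* same index expression as the kernel */
    }
}
```

Called with a two-element array, a global size of 4, and a local size of 2, this model stores 2.0 in both elements, which is the result I expected from the kernel instead of the ones I actually get.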