Kernel code:
```c
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_amd_printf : enable
__kernel void calculate(__global double* in)
{
    int idx = get_global_id(0);                                      // statement 1
    printf("started for %d workitem\n", idx);                        // statement 2
    in[idx] = idx + 100;                                             // statement 3
    printf("value changed to %lf in %d workitem\n", in[idx], idx);  // statement 4
    barrier(CLK_GLOBAL_MEM_FENCE);                                   // statement 5
    printf("completed for %d workitem\n", idx);                      // statement 6
}
```
I launch the kernel with clEnqueueNDRangeKernel, passing as its argument an array of 5 doubles, each initialized to 0.0.
The global work size is 5, so each work-item handles one element of the array.
But as I understand barriers in theory: to synchronize work-items in a work-group, OpenCL provides the barrier function. It forces a work-item to wait until every other work-item in the group reaches the barrier. A barrier therefore ensures that every work-item has reached the same point in its processing, which is crucial when work-items must finish computing an intermediate result that later computation depends on.
Hence, I expected output like this:
```
started for 0 workitem
started for 1 workitem
value changed to 100.000000 in 0 workitem
value changed to 101.000000 in 1 workitem
started for 3 workitem
value changed to 103.000000 in 3 workitem
started for 2 workitem
value changed to 102.000000 in 2 workitem
started for 4 workitem
value changed to 104.000000 in 4 workitem
completed for 3 workitem
completed for 0 workitem
completed for 1 workitem
completed for 2 workitem
completed for 4 workitem
```
That is, I expected all the "completed" lines to appear together at the end, because the barrier should hold every work-item back until all of them have reached it.
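The ordering expected above can be sketched on the CPU with POSIX threads. This is an analogy, not OpenCL: each thread stands in for one work-item, all five threads are assumed to belong to a single work-group, and `pthread_barrier_wait` plays the role of `barrier()`. The helper names `run_work_group` and `all_values_before_completions` are hypothetical. Within that one group, every "value changed" event is logged before any "completed" event, although the order among the "value changed" lines still varies between runs. Per the definition quoted above, `barrier()` only synchronizes work-items of the same work-group, so this guarantee assumes all five work-items share one group.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>

/* Hypothetical CPU analogy: each thread plays one work-item of a single
 * work-group, and pthread_barrier_wait stands in for barrier(). */
#define GROUP_SIZE 5

static pthread_barrier_t group_barrier;
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
static char event_log[2 * GROUP_SIZE]; /* 'v' = value changed, 'c' = completed */
static int log_len = 0;
static double in[GROUP_SIZE];

/* Records and prints one event; the lock keeps log order == print order. */
static void log_event(char kind, int idx) {
    pthread_mutex_lock(&log_lock);
    event_log[log_len++] = kind;
    if (kind == 'v')
        printf("value changed to %f in %d workitem\n", in[idx], idx);
    else
        printf("completed for %d workitem\n", idx);
    pthread_mutex_unlock(&log_lock);
}

static void *work_item(void *arg) {
    int idx = (int)(long)arg;
    in[idx] = idx + 100;                  /* statement 3 */
    log_event('v', idx);                  /* statement 4 */
    pthread_barrier_wait(&group_barrier); /* statement 5: wait for the whole group */
    log_event('c', idx);                  /* statement 6 */
    return NULL;
}

/* Runs one "work-group" of GROUP_SIZE threads; returns the number of events. */
int run_work_group(void) {
    pthread_t threads[GROUP_SIZE];
    log_len = 0;
    pthread_barrier_init(&group_barrier, NULL, GROUP_SIZE);
    for (long i = 0; i < GROUP_SIZE; ++i)
        pthread_create(&threads[i], NULL, work_item, (void *)i);
    for (int i = 0; i < GROUP_SIZE; ++i)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&group_barrier);
    return log_len;
}

/* Returns 1 if every 'v' event precedes every 'c' event in the log. */
int all_values_before_completions(void) {
    int seen_completion = 0;
    for (int i = 0; i < log_len; ++i) {
        if (event_log[i] == 'c')
            seen_completion = 1;
        else if (seen_completion)
            return 0;
    }
    return 1;
}
```

A separate caveat applies to the real kernel: the OpenCL C specification does not guarantee any ordering of `printf` output from work-items executing concurrently, so printed order is a weak test of barrier behavior even inside a single work-group.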
But this is the output I actually get:
```
started for 0 workitem
value changed to 100.000000 in 0 workitem
completed for 0 workitem
started for 4 workitem
value changed to 104.000000 in 4 workitem
completed for 4 workitem
started for 1 workitem
started for 2 workitem
started for 3 workitem
value changed to 101.000000 in 1 workitem
value changed to 103.000000 in 3 workitem
completed for 3 workitem
value changed to 102.000000 in 2 workitem
completed for 2 workitem
completed for 1 workitem
```
Am I missing something in my logic? How does a barrier actually work in an OpenCL kernel?
To cross-check the values after the barrier without relying on print statements, I also tried this kernel:
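```c
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_amd_printf : enable
__kernel void calculate(__global double* in)
{
    int idx = get_global_id(0);
    in[idx] = idx + 100;
    barrier(CLK_GLOBAL_MEM_FENCE);
    if (idx == 0) {
        // Reverse the array. Save in[0] and in[1] first so that in[3] and
        // in[4] receive the original values, not the ones just overwritten.
        // in[2] is the middle element and stays in place.
        double t0 = in[0];
        double t1 = in[1];
        in[0] = in[4];
        in[1] = in[3];
        in[3] = t1;
        in[4] = t0;
    }
}
```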
After this kernel runs, I expect the array to be:
```
after arr[0] = 104.000000
after arr[1] = 103.000000
after arr[2] = 102.000000
after arr[3] = 101.000000
after arr[4] = 100.000000
```
But the results I get are:
```
after arr[0] = 0.000000
after arr[1] = 101.000000
after arr[2] = 102.000000
after arr[3] = 103.000000
after arr[4] = 104.000000
```
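The post-barrier reversal in the second kernel can be mirrored with POSIX threads as a CPU analogy (not OpenCL). The sketch saves `in[0]` and `in[1]` in temporaries so that `in[3]` and `in[4]` receive the original values. `run_single_group` assumes all five work-items share one work-group and yields the expected reversed array. `run_groups_of_one` is a purely hypothetical contrast in which each work-item is its own group of size 1 and the groups happen to run in index order; because a barrier never waits for other groups, work-item 0 reads `in[4]` before work-item 4 has written it. Both function names are illustrative, not part of any API.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define N 5

static pthread_barrier_t group_barrier;
static double in[N];

/* Mirrors the kernel body; pthread_barrier_wait stands in for barrier(). */
static void *calculate(void *arg) {
    int idx = (int)(long)arg;
    in[idx] = idx + 100;
    pthread_barrier_wait(&group_barrier);
    if (idx == 0) {
        double t0 = in[0], t1 = in[1]; /* keep originals before overwriting */
        in[0] = in[4];
        in[1] = in[3];
        in[3] = t1;
        in[4] = t0;
    }
    return NULL;
}

/* Assumption: all N work-items are in ONE group, so the barrier waits for all. */
void run_single_group(double out[N]) {
    pthread_t threads[N];
    for (int i = 0; i < N; ++i) in[i] = 0.0;
    pthread_barrier_init(&group_barrier, NULL, N);
    for (long i = 0; i < N; ++i)
        pthread_create(&threads[i], NULL, calculate, (void *)i);
    for (int i = 0; i < N; ++i)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&group_barrier);
    for (int i = 0; i < N; ++i) out[i] = in[i];
}

/* Hypothetical contrast: N groups of size 1, run one after another in index
 * order. Each barrier only waits for its own single work-item. */
void run_groups_of_one(double out[N]) {
    for (int i = 0; i < N; ++i) in[i] = 0.0;
    for (long g = 0; g < N; ++g) {
        pthread_barrier_init(&group_barrier, NULL, 1);
        calculate((void *)g);
        pthread_barrier_destroy(&group_barrier);
    }
    for (int i = 0; i < N; ++i) out[i] = in[i];
}
```

In the single-group run the array ends as 104, 103, 102, 101, 100. In the groups-of-one run, work-item 0 reverses a mostly unwritten array, and work-items 1 to 4 then overwrite their own slots, leaving 0, 101, 102, 103, 104. That scenario is one possible schedule, not a claim about how any particular runtime grouped these work-items.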