I have a problem with OpenCL 1.2.
I have a __global array in the kernel, and the group size is 1000.
The problem is that atomic_add() doesn't seem to work correctly.
My kernel code is:
__kernel void kernelfunction(__global uint32_t* buffer) {
    buffer[3] = 100;
    atomic_add(&buffer[3], 1);
    ...
}
If I launch 1000 threads, I expect the final value of buffer[3] to be 1100. Am I right?
But the result is not deterministic.
Sometimes it is 1100, sometimes 1064, sometimes 1093, and so on.
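One thing that may be worth ruling out, offered as a guess rather than a confirmed diagnosis: in the kernel as posted, every work-item runs the plain store buffer[3] = 100 before its atomic_add, so a store that lands late could overwrite increments other work-items have already made. The sketch below is a hypothetical single-threaded C model of one such interleaving. The function simulate and its parameters are made up for illustration and are not part of OpenCL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, NOT real OpenCL code.
 * Replays one possible interleaving of n work-items that each execute
 *     buffer[3] = 100; atomic_add(&buffer[3], 1);
 * `early` work-items finish completely before the remaining ones perform
 * their plain store, so their increments get overwritten.
 * Assumption: early < n. */
static uint32_t simulate(uint32_t n, uint32_t early)
{
    assert(early < n);
    uint32_t value = 0;

    /* Early work-items: store 100, then increment, one after another. */
    for (uint32_t i = 0; i < early; ++i) {
        value = 100;
        value += 1;
    }

    /* Remaining work-items all perform the plain store first.
     * This resets the value and discards the earlier increments. */
    for (uint32_t i = early; i < n; ++i) {
        value = 100;
    }

    /* Only the increments issued after the last store survive. */
    for (uint32_t i = early; i < n; ++i) {
        value += 1;
    }

    return value;
}
```

Under this model, simulate(1000, 0) gives 1100, simulate(1000, 7) gives 1093, and simulate(1000, 36) gives 1064. If this turns out to be the cause, one thing to try would be initializing buffer[3] from the host before the launch (for example with clEnqueueWriteBuffer or clEnqueueFillBuffer) instead of inside the kernel.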
What I have tried:
I also enabled the OpenCL extension as shown below:
#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
But the problem still exists.
In another project I created a simple OpenCL program, and atomic_add works correctly there. I've checked almost the entire project configuration, but I don't know where the problem is.
Can you help me? Thanks.