OpenCL provides local memory, which is similar to shared memory in CUDA. In CUDA we have to declare shared memory as volatile, because otherwise the compiler is free to optimize shared-memory accesses by keeping the values in registers. That becomes a problem when threads communicate with each other through that memory. My question: do we have to follow the same approach (using volatile) in an OpenCL kernel as well, and if so, how should I do it?
1 Answer
1) You don't need to use volatile with CUDA shared memory, as long as the threads synchronize with __syncthreads(). Here is a good answer explaining that. Quote:
__syncthreads() call is sufficient to force thread synchronization as well as to force any register-cached values in shared memory to be evicted back to shared memory.
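2) The OpenCL equivalent of __syncthreads() is barrier(CLK_LOCAL_MEM_FENCE). There is also a weaker mem_fence(), which only orders a work-item's own memory accesses without synchronizing the work-group, and which is (supposedly) comparable to CUDA's __threadfence() or __threadfence_block().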
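To make point 2 concrete, here is a minimal sketch of a work-group sum in OpenCL C that communicates through local memory without volatile. The kernel name group_sum and the argument names in, out and scratch are made up for illustration. It assumes the work-group size is a power of two, that the global size is a multiple of it, and that scratch holds one float per work-item.

```c
// Hypothetical kernel (names are illustrative, not from any library).
// Each work-group sums its slice of `in` and writes one value to `out`.
// Assumes: work-group size is a power of two; `scratch` has one float
// per work-item; `in` has get_global_size(0) elements.
__kernel void group_sum(__global const float *in,
                        __global float *out,
                        __local float *scratch)   // plain __local, no volatile
{
    const size_t lid = get_local_id(0);
    const size_t lsz = get_local_size(0);

    // Each work-item stages one element in local memory.
    scratch[lid] = in[get_global_id(0)];

    // Every work-item in the group waits here, and the local-memory
    // writes above become visible to the whole group.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Tree reduction: halve the number of active work-items each step.
    for (size_t stride = lsz / 2; stride > 0; stride >>= 1) {
        if (lid < stride) {
            scratch[lid] += scratch[lid + stride];
        }
        // The barrier sits outside the if, so all work-items reach it.
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    // Work-item 0 publishes the group's result.
    if (lid == 0) {
        out[get_group_id(0)] = scratch[0];
    }
}
```

On the host, the local buffer would typically be sized with clSetKernelArg(kernel, 2, local_size * sizeof(float), NULL). Note that the barrier must be reached by every work-item in the group, which is why it is not placed inside the if.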