I apologize in advance for the vagueness of this question.
Background:
I am attempting to write a morphological image-processing function in OpenCL. I have a __local buffer that stores data for every pixel (each pixel is handled by one work-item; no loop unrolling yet). Also, since I am early in testing, I am using only a single work-group (an 8x8-pixel image, so I can validate the results by hand).
Problem:
There are occasions when data from one, two, three, or even four pixels must be added into the buffer slot of another pixel. Since these are adjacent pixels in the same work-group, I am sure I am causing local memory bank conflicts. That's OK; speed isn't my top priority (yet!). However, these bank conflicts seem to be dropping and even corrupting data. I've been very careful not to overflow or overrun the buffers.
So, my first question is: is it, in fact, possible that the bank conflicts are causing data corruption and loss? The OpenCL spec seems to indicate that conflicting accesses should be serialized, which reduces bandwidth, but there is no mention of data loss.
My second question is: Help! What can I do about this?
Any guidance will be greatly appreciated - thanks!