This question is closely related to this one, but mine is specific to CUDA.
I have several threads in my kernel that may write the very same value to the same global memory location, without atomics or any synchronization between them. This has been working fine, but I'm afraid that it might be incorrect and that so far I have just been lucky.
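To make the pattern concrete, here is a minimal sketch of what I mean. It is not my real code: the kernel name `flag_if_negative`, the helper `run_same_value_write`, the `found` flag, and the input data are all made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread that sees a negative element stores the
// same constant (1) to the same global location, with no atomics or barriers.
__global__ void flag_if_negative(const int* data, int n, int* found)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] < 0) {
        *found = 1;  // many threads may execute this store concurrently
    }
}

// Hypothetical host helper: runs the kernel once and returns the flag,
// or -1 if any CUDA call fails.
int run_same_value_write()
{
    const int n = 1 << 20;
    int* d_data = nullptr;
    int* d_found = nullptr;
    int h_found = -1;

    if (cudaMalloc(&d_data, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_found, sizeof(int)) != cudaSuccess) {
        cudaFree(d_data);
        return -1;
    }

    // Setting every byte to 0xFF makes every int equal to -1, so all threads
    // take the branch and all of them write to *found.
    cudaMemset(d_data, 0xFF, n * sizeof(int));
    cudaMemset(d_found, 0, sizeof(int));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    flag_if_negative<<<blocks, threads>>>(d_data, n, d_found);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&h_found, d_found, sizeof(int),
                                 cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    cudaFree(d_found);
    return (err == cudaSuccess) ? h_found : -1;
}

int main()
{
    printf("found = %d\n", run_same_value_write());
    return 0;
}
```

The store in the kernel is a single aligned 32-bit write, and every thread that reaches it writes the same value; the host reads the flag only after the kernel has completed.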
Is there any possibility of memory corruption or unexpected behavior in this workflow (due to data races, cache synchronization, etc.)?