I came across the thread "Turning off coalescing" on the NVIDIA forum, where it is stated that "So far setting a memory pointer/array to "volatile" seems to help for massively random access. (Gives 50% more performance ?!)".
I am doing a finite-difference computation (a 3D stencil) on a Fermi GPU using CUDA and want to improve its performance. The 3D array is laid out in Z, Y, X order from slowest to fastest, so neighbouring elements along the z axis are a full XY plane apart in memory and access along z is effectively random. For that reason I suspect that using volatile might be a better choice. Currently I am using shared memory, declared as
__shared__ float plane2d[32][32];
When I tried using volatile as
volatile float **plane = plane2d;
I get this error:
error: a value of type "float (*)[16]" cannot be used to initialize an entity of type "volatile float **"
Can anyone tell me how to use volatile with a 2D array? A code example would be helpful. It would also be great if someone could tell me roughly how much performance gain I can expect.
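On the compile error itself: a 2D array decays to a pointer to its first row (a pointer to an array of floats), not to a pointer to a pointer, which is what the compiler is complaining about. The sketch below shows one declaration form that should type-check, assuming a single 32x32 tile and a trivial copy kernel. The names `copyTile`, `TILE`, and `plane2d` are made up for illustration (an identifier cannot begin with a digit, hence `plane2d`), and the sketch says nothing about whether volatile actually improves performance.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 32  // assumed tile edge, matching the 32x32 shared array

// Hypothetical kernel: copies one TILE x TILE tile through shared memory,
// accessing the tile through a volatile-qualified view.
__global__ void copyTile(const float *in, float *out)
{
    __shared__ float plane2d[TILE][TILE];

    // plane2d decays to float (*)[TILE], a pointer to a row of TILE floats.
    // The pointer must therefore be declared as pointer-to-array; adding
    // volatile to the element type is a qualification conversion.
    volatile float (*plane)[TILE] = plane2d;

    int x = threadIdx.x;
    int y = threadIdx.y;

    plane[y][x] = in[y * TILE + x];   // volatile store to shared memory
    __syncthreads();
    out[y * TILE + x] = plane[y][x];  // volatile load from shared memory
}

int main()
{
    const int n = TILE * TILE;
    float hIn[n], hOut[n];
    for (int i = 0; i < n; ++i) {
        hIn[i] = (float)i;
        hOut[i] = -1.0f;
    }

    float *dIn = NULL, *dOut = NULL;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);  // 1024 threads, one per tile element
    copyTile<<<1, block>>>(dIn, dOut);
    cudaMemcpy(hOut, dOut, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Count elements that did not survive the round trip.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (hOut[i] != hIn[i]) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(dIn);
    cudaFree(dOut);
    return mismatches != 0;
}
```

A simpler alternative, if every access should be volatile anyway, is to put the qualifier on the array itself (`__shared__ volatile float plane2d[TILE][TILE];`) and skip the extra pointer. Any speedup would have to be measured on the actual stencil kernel.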