Multi-dimensional shared arrays are ultimately laid out in the flat, linear shared memory address space, so neither form has a performance advantage over the other.
Using a multi-dimensional shared array instead of a flattened one spares you the manual index calculation. The compiler adds that index arithmetic to the generated code behind the scenes, which makes the source easier to read.
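To make the hidden index arithmetic concrete, here is a minimal host-side sketch of the row-major offset the compiler would compute for an array declared as `[32][2]`. The names `kRows`, `kCols` and `flatIndex` are made up for illustration and are not part of any CUDA API.

```cpp
#include <cstddef>

// Hypothetical dimensions, chosen to match an array declared as [32][2].
constexpr std::size_t kRows = 32;
constexpr std::size_t kCols = 2;

// Row-major offset of element [row][col]: the arithmetic you would write
// by hand for a flat array, and that the compiler generates for a 2D one.
constexpr std::size_t flatIndex(std::size_t row, std::size_t col) {
    return row * kCols + col;
}

// The last element of the 2D array is the last element of the flat one.
static_assert(flatIndex(kRows - 1, kCols - 1) == kRows * kCols - 1,
              "2D array and flat array cover the same range");
```

With these dimensions, element `[5][1]` of the 2D array sits at offset `5 * 2 + 1 = 11` of the equivalent flat array.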
One possible advantage of the flat representation over the multi-dimensional one is that it is easier to reason about and spot bank conflicts in shared memory accesses. Consider this shared memory buffer:
__shared__ int A[ 64 ];
If you access the buffer like this:
int laneID = threadIdx.x & 31;
int ret = A[ laneID * 2 ];
the stride of 2 in the index is visible right away, so the bank conflict between threads may be easier to spot than in the example below:
__shared__ int B[ 32 ][ 2 ];
where you access it like this:
int laneID = threadIdx.x & 31;
int ret = B[ laneID ][ 0 ];
Note that the two cases above are functionally equivalent.
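As a rough check of that equivalence, the sketch below computes on the host which bank each lane of a warp would touch for both access patterns. It assumes 32 banks, each 4 bytes wide, and a 4-byte `int`, so that an element index equals a bank word index. All names in it (`kNumBanks`, `bankFlat`, `bankTwoDim`, `conflictDegree`) are hypothetical helpers, not CUDA functions.

```cpp
#include <cstddef>

// Assumption: 32 shared memory banks, each 4 bytes wide, and 4-byte int,
// so consecutive int elements fall into consecutive banks.
constexpr std::size_t kNumBanks = 32;
constexpr std::size_t kWarpSize = 32;

constexpr std::size_t bankOfWord(std::size_t wordIndex) {
    return wordIndex % kNumBanks;
}

// Bank touched by A[ laneID * 2 ] with A declared as int A[ 64 ].
constexpr std::size_t bankFlat(std::size_t laneID) {
    return bankOfWord(laneID * 2);
}

// Bank touched by B[ laneID ][ 0 ] with B declared as int B[ 32 ][ 2 ];
// the row-major offset is laneID * 2 + 0.
constexpr std::size_t bankTwoDim(std::size_t laneID) {
    return bankOfWord(laneID * 2 + 0);
}

// Largest number of lanes in one warp that land in the same bank.
// Every lane reads a distinct address here, so this count is the
// conflict degree (same-address reads would be broadcast instead).
constexpr std::size_t conflictDegree() {
    std::size_t hits[kNumBanks] = {};
    std::size_t worst = 0;
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        std::size_t n = ++hits[bankFlat(lane)];
        if (n > worst) worst = n;
    }
    return worst;
}
```

Under these assumptions both patterns map lane `i` and lane `i + 16` to the same bank, which is a two-way bank conflict in either form.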