The latency of accessing shared memory

Question

Which of the two situations below has the longer latency?

1. The data is first loaded from global memory into shared memory, and then all threads access shared memory concurrently. Multiple threads may read the same data.
2. All threads access global memory directly, but neighboring threads read neighboring data.

Roger Dahl · Accepted Answer · 2012-12-08T15:10:54

If you plan on accessing each value only once, then you won't gain anything from using shared memory.

Values in shared memory are only valid within a block, so one or more threads in each block will have to load the values from global memory. So you're not able to avoid the global memory accesses.
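To make the point about reuse concrete, here is a rough count of memory operations for one block. It is a simplified model that ignores the L1/L2 caches and assumes the block needs `n_values` distinct values, each read `reuse` times; `direct_ops` and `staged_ops` are hypothetical helpers, not CUDA APIs.

```python
def direct_ops(n_values, reuse):
    """Every read goes straight to global memory (caches ignored)."""
    return {"global_reads": n_values * reuse,
            "shared_writes": 0,
            "shared_reads": 0}


def staged_ops(n_values, reuse):
    """The block copies each value into shared memory once, then reads it there."""
    return {"global_reads": n_values,       # the unavoidable initial loads
            "shared_writes": n_values,      # storing the tile in shared memory
            "shared_reads": n_values * reuse}


print(direct_ops(256, 1))
print(staged_ops(256, 1))
print(direct_ops(256, 8))
print(staged_ops(256, 8))
```

With `reuse = 1` the staged version does the same 256 global reads plus 512 extra shared-memory operations, so it cannot come out ahead. With `reuse = 8` it replaces 2048 global reads with 256.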

If you have a device of compute capability >= 2.0 (Fermi), values read from global memory are automatically cached in the L1 and L2 caches. L1 has the same latency as shared memory.

Latency is a fixed value that depends on which memory you're accessing. It doesn't change. Latency is always much lower for shared memory than for global memory.

I think what you might really be asking is what type of access would give you the best memory throughput. If you will be using each value only once, case (2) will give the best throughput. If you will be reusing values and have CC >= 2.0, letting L1 handle the caching is likely to give the best throughput. If you're reusing values on CC < 2.0, using shared memory will give the best throughput.

For values that are already stored in shared memory, case (1) gives better throughput whether or not it causes bank conflicts.
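Whether case (1) conflicts depends on which 32-bit words the threads of a warp read. The sketch below models a compute capability 2.x (Fermi) shared memory with 32 banks of 4-byte words, where threads reading the same word are served by one broadcast. It illustrates the rule and is not a cycle-accurate simulator; `bank_conflict_degree` is a made-up helper.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumed CC 2.x layout: 32 banks
WORD_BYTES = 4   # successive 32-bit words map to successive banks


def bank_conflict_degree(byte_addrs):
    """Serialized passes needed for one warp-wide shared-memory read.

    Threads reading the same 32-bit word share a broadcast, so only
    distinct words that land in the same bank conflict.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addrs:
        word = addr // WORD_BYTES
        words_per_bank[word % NUM_BANKS].add(word)
    return max((len(words) for words in words_per_bank.values()), default=0)


warp = range(32)
print(bank_conflict_degree([4 * t for t in warp]))    # consecutive floats -> 1
print(bank_conflict_degree([0] * 32))                 # all threads, same float -> 1
print(bank_conflict_degree([8 * t for t in warp]))    # stride of 2 floats -> 2
print(bank_conflict_degree([128 * t for t in warp]))  # stride of 32 floats -> 32
```

The second line covers the question's "same data for multiple threads" case: under this model it is a broadcast, not a conflict.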

Case (2) describes the optimal access pattern for global memory.
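A rough way to see why neighboring addresses are the best case: a warp's global load is served in whole cache lines, so its cost grows with the number of distinct lines the warp touches. The helper below just counts them. It assumes 4-byte floats and aligned 128-byte lines (the Fermi L1 line size); `cache_lines_touched` is a hypothetical name.

```python
LINE_BYTES = 128  # assumed Fermi L1 cache line size


def cache_lines_touched(byte_addrs, line_bytes=LINE_BYTES):
    """Distinct aligned line_bytes-sized lines one warp-wide load touches."""
    return len({addr // line_bytes for addr in byte_addrs})


warp = range(32)
base = 0x1000  # 128-byte aligned base address
print(cache_lines_touched([base + 4 * t for t in warp]))      # neighbors -> 1
print(cache_lines_touched([base + 4 + 4 * t for t in warp]))  # shifted one float -> 2
print(cache_lines_touched([base + 128 * t for t in warp]))    # stride of 32 floats -> 32
```

Neighboring threads reading neighboring floats need a single line per warp, which is why case (2) is the pattern to aim for when each value is read once.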
