11 votes

How does the L2 cache work in GPUs with the Kepler architecture in terms of locality of reference? For example, if a thread accesses an address in global memory and the value at that address is not in the L2 cache, how is the value cached? Is only that value cached (temporal locality)? Or are nearby values brought into the L2 cache as well (spatial locality)?

The picture below is from the NVIDIA whitepaper.

[Figure: Kepler memory hierarchy diagram, from the NVIDIA whitepaper]

The L2 cache was introduced with compute capability 2.0 and higher and continues to be supported on the Kepler architecture. The caching policy is LRU (least recently used), and its main purpose is to avoid the global memory bandwidth bottleneck. I read this in the book "CUDA Application Design and Development". Not sure if that answers your question. — Sagar Masuti
The L1 cache has a cache line size of 128 bytes; the L2 cache has a cache line size of 32 bytes, so an L2 miss triggers a 32-byte load. Note that Kepler does not normally have L1 enabled for ordinary global loads. — Robert Crovella

1 Answer

10 votes

A unified L2 cache was introduced with compute capability 2.0 and higher and continues to be supported on the Kepler architecture. The caching policy used is LRU (least recently used), and its main purpose is to avoid the global memory bandwidth bottleneck. GPU applications can exhibit both types of locality (temporal and spatial).
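Here is a minimal CUDA sketch of what the two kinds of locality look like in device code. The kernel names, array sizes, and launch configuration are my own illustrative choices, not from the question; the point is only that consecutive threads touching consecutive addresses exploit spatial locality, while repeated reads of the same small region exploit temporal locality and get served from cache after the first miss.

```cuda
#include <cstdio>

// Spatial locality: consecutive threads read consecutive 4-byte words,
// so a warp's 32 loads are serviced from a handful of contiguous lines.
__global__ void spatialKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];          // coalesced, contiguous access
}

// Temporal locality: every thread re-reads the same 32-float (128-byte)
// table, so after the first miss the reads hit in cache.
__global__ void temporalKernel(const float *table, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = table[i % 32];  // small region reused by all warps
}

int main()
{
    const int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    spatialKernel<<<(n + 255) / 256, 256>>>(in, out, n);
    temporalKernel<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```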

Whenever there is an attempt to read a specific memory address, the hardware first looks in the L1 and L2 caches; if the value is not found, it loads a full 128-byte cache line from global memory. This is the default mode. The diagram below shows why a 128-byte aligned access pattern gives good results.

[Diagram: aligned vs. misaligned 128-byte memory access patterns]
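To make the alignment effect concrete, here is a hedged sketch of an offset-copy kernel of the kind commonly used to demonstrate it (the kernel and sizes are illustrative, not from the answer). With `offset == 0`, each warp's 128-byte request lines up with a single cache line; with `offset == 1`, the same request straddles two lines, so the hardware must fetch twice as many lines. Timing the runs, e.g. with `cudaEvent` timers or a profiler, makes the difference visible.

```cuda
#include <cstdio>

// Copy with a configurable offset: a nonzero offset breaks the
// 128-byte alignment of each warp's memory request.
__global__ void offsetCopy(const float *in, float *out, int n, int offset)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n - offset)
        out[i] = in[i + offset];  // offset != 0 straddles two cache lines
}

int main()
{
    const int n = 1 << 22;
    float *in, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    // offset 0 is fully aligned; offsets 1 and 2 are misaligned.
    for (int offset = 0; offset <= 2; ++offset)
        offsetCopy<<<(n + 255) / 256, 256>>>(in, out, n, offset);
    cudaDeviceSynchronize();

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```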