How does L2 cache work in GPUs with Kepler architecture in terms of locality of references? For example if a thread accesses an address in global memory, supposing the value of that address is not in L2 cache, how is the value being cached? Is it temporal? Or are other nearby values of that address brought to L2 cache too (spatial)?
Below picture is from NVIDIA whitepaper.