Temporal locality is a property of applications and of the way they access data: data accessed recently is likely to be accessed again soon.
There is no single practical cache design that is optimal for all cases.
Certainly most applications exhibit some form of spatial locality (accessing nearby blocks), but temporal locality is not always present, e.g. with streaming access patterns.
The most important things to consider when designing a cache for temporal locality are the replacement policy and the cache-block allocation policy.
LRU replacement approaches the optimal case for most applications, but not all (there are pathological cases), and true LRU is practical only for caches with limited associativity (e.g. 4-way).
A cache designer may also choose the allocation policy for writes, e.g. write-no-allocate combined with a write-combining/merging buffer as an optimization. A sketch of LRU in a set-associative cache is given below.
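To make the replacement-policy point concrete, here is a minimal sketch (my own illustration, not a hardware description) of a set-associative cache with LRU replacement. Per-set timestamps stand in for the few LRU/pseudo-LRU bits that real hardware keeps; the cache geometry and names are assumptions chosen for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Way {
    bool     valid    = false;
    uint64_t tag      = 0;
    uint64_t last_use = 0;   // larger value = more recently used
};

class SetAssocLRU {
public:
    SetAssocLRU(std::size_t num_sets, std::size_t ways, std::size_t line_bytes)
        : sets_(num_sets, std::vector<Way>(ways)), line_bytes_(line_bytes) {}

    // Returns true on a hit; on a miss the least recently used way is replaced.
    bool access(uint64_t addr) {
        const uint64_t line = addr / line_bytes_;
        auto&          set  = sets_[line % sets_.size()];
        const uint64_t tag  = line / sets_.size();
        ++clock_;

        for (auto& w : set) {
            if (w.valid && w.tag == tag) {   // hit: update recency
                w.last_use = clock_;
                return true;
            }
        }
        // Miss: use an invalid way if one exists, otherwise evict the LRU way.
        Way* victim = &set[0];
        for (auto& w : set) {
            if (!w.valid) { victim = &w; break; }
            if (w.last_use < victim->last_use) victim = &w;
        }
        victim->valid    = true;
        victim->tag      = tag;
        victim->last_use = clock_;
        return false;
    }

private:
    std::vector<std::vector<Way>> sets_;
    std::size_t line_bytes_;
    uint64_t    clock_ = 0;
};
```

The per-way recency state is what makes true LRU costly to track as associativity grows, which is why highly associative caches tend to use pseudo-LRU or other approximations.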
Optimizing for temporal locality is generally a software task, where several techniques can be applied, such as blocking (tiling) and cache-oblivious algorithms; a blocking sketch follows below.
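As an example of blocking, here is a minimal sketch of tiled matrix multiplication for square N x N row-major matrices. The block size (64 here) is only an illustrative assumption; in practice it is tuned so that the tiles being worked on fit in the targeted cache level.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Computes C += A * B for N x N row-major matrices, one tile at a time,
// so that the elements touched in a tile are reused while still cached.
// The caller is expected to zero-initialize C.
void matmul_blocked(const std::vector<double>& A,
                    const std::vector<double>& B,
                    std::vector<double>& C,
                    std::size_t N, std::size_t Bsz = 64) {
    for (std::size_t ii = 0; ii < N; ii += Bsz)
        for (std::size_t kk = 0; kk < N; kk += Bsz)
            for (std::size_t jj = 0; jj < N; jj += Bsz)
                for (std::size_t i = ii; i < std::min(ii + Bsz, N); ++i)
                    for (std::size_t k = kk; k < std::min(kk + Bsz, N); ++k) {
                        const double a = A[i * N + k];
                        for (std::size_t j = jj; j < std::min(jj + Bsz, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```

A cache-oblivious algorithm achieves a similar effect by recursing on submatrices instead of using an explicit, tuned block size.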
There is also a very useful metric/method for characterizing the temporal locality of applications, called reuse distance (or stack distance). It assumes a fully associative cache with LRU replacement and a cache-line size of one word.
This is usually modeled as a stack: on a hit, the position of the accessed address in the stack is its reuse distance; the address is then moved to the top of the stack and the hole it leaves is squeezed out. From the measured distances you can generate histograms and see how the application would behave across different cache sizes (a sketch follows below).
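Here is a minimal sketch of that stack model for a word-granularity address trace. The linear search keeps the example short; practical reuse-distance tools use tree-based or approximate methods to handle long traces.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Returns one reuse distance per access: the number of distinct addresses
// touched since the previous access to the same address, or -1 on first use.
std::vector<long> reuse_distances(const std::vector<uint64_t>& trace) {
    std::vector<uint64_t> stack;            // front = most recently used
    std::vector<long>     dist;
    dist.reserve(trace.size());

    for (uint64_t addr : trace) {
        long d = -1;                         // -1 marks a cold (first) access
        for (std::size_t i = 0; i < stack.size(); ++i) {
            if (stack[i] == addr) {
                d = static_cast<long>(i);    // depth in the stack = reuse distance
                stack.erase(stack.begin() + i);  // squeeze out the hole
                break;
            }
        }
        stack.insert(stack.begin(), addr);   // move (or push) to the top = MRU
        dist.push_back(d);
    }
    return dist;
}

int main() {
    // Example trace: address 1 is reused after two distinct other addresses,
    // so its later accesses have reuse distance 2.
    std::vector<uint64_t> trace = {1, 2, 3, 1, 2, 4, 1};
    for (long d : reuse_distances(trace)) std::printf("%ld ", d);
    std::printf("\n");   // prints: -1 -1 -1 2 2 -1 2
}
```

Binning these distances into a histogram tells you, for any fully associative LRU cache of a given size, which accesses would hit and which would miss.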
There can be endless discussion on locality; it has been, and remains, an active research topic.