This is about the cache coherency protocol across different layers of cache. My understanding (x86_64) is that L1 is owned exclusively by a core, L2 is shared between 2 cores, and L3 is shared by all the cores in a CPU socket. I have read about how the MESI protocol functions, and about store buffers, invalidate queues, invalidate messages, etc. My question is whether MESI applies to L1 only or to L2 and L3 as well. Or is there a different cache synchronization mechanism for L2 and L3?
1 Answer
The number of cache levels, how each level is organized with respect to other processors or cores in the system, and the coherence protocol implemented in each cache are defined by the core microarchitecture, the uncore microarchitecture, and, in some cases, relevant boot-time configuration options. These design aspects vary by vendor, by processor generation, and between models within the same generation. There are a lot of different designs even if you just consider the processors released in the past few years.
The organization of the cache hierarchy is always clearly documented by Intel and AMD. However, the coherence protocols are not always clearly documented. You won't find a section in any official document that directly tells you all the protocols that caches use. Some hardware performance event names allude to what coherence protocol is used in the cache to which the events apply.
The instruction cache (L1I) always uses the SI protocol because a line is never modified between the point of fill and the point of invalidation. So an entry can only be in the S or I state. The M and E states are only relevant if the cache supports modifying an existing line.
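To make the two-state case concrete, here is a minimal sketch of an SI state machine for a read-only line. The event names (`FILL`, `READ_HIT`, `INVALIDATE`) and the helper functions are hypothetical names chosen for illustration; this is a toy model, not a description of how any particular L1I is implemented.

```python
from enum import Enum


class State(Enum):
    I = "Invalid"
    S = "Shared"


# Hypothetical event names for this sketch.
FILL = "fill"              # line brought in on a fetch miss
READ_HIT = "read_hit"      # instruction fetch hits the line
INVALIDATE = "invalidate"  # line dropped (eviction or snoop invalidation)

# There is no store event at all, so no state beyond S and I is reachable.
SI_TRANSITIONS = {
    (State.I, FILL): State.S,
    (State.S, READ_HIT): State.S,
    (State.S, INVALIDATE): State.I,
    (State.I, INVALIDATE): State.I,
}


def si_next(state, event):
    """Return the next state of a read-only line for the given event."""
    return SI_TRANSITIONS[(state, event)]


def run(events, state=State.I):
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = si_next(state, event)
    return state
```

For example, `run([FILL, READ_HIT, INVALIDATE])` walks a line from I to S and back to I. Since only one bit of information is needed to tell S from I, the state is effectively a valid bit.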
Some microarchitectures have caches that only support the write-through write hit policy. For example, the L1D in the AMD Bulldozer is a write-through cache. The M state doesn't make sense in a write-through cache. This means that the L1D either uses SI or ESI. SI is more likely because it requires only a single bit of state per entry.
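A toy model may help show why M is unnecessary in a write-through cache. The class below is a sketch under simplifying assumptions (a single cache, write hits only, a dictionary standing in for the next level); `ToyCache` and its fields are hypothetical names, not anything from AMD or Intel documentation.

```python
class ToyCache:
    """Single-cache toy model contrasting write-through and write-back write hits."""

    def __init__(self, write_through):
        self.write_through = write_through
        self.lines = {}       # address -> cached value
        self.dirty = set()    # addresses whose cached value is newer than the next level
        self.next_level = {}  # stands in for the L2 or memory

    def fill(self, addr):
        # Bring the line in from the next level (0 if never written).
        self.lines[addr] = self.next_level.get(addr, 0)

    def write_hit(self, addr, value):
        if addr not in self.lines:
            raise KeyError("this sketch only models write hits")
        self.lines[addr] = value
        if self.write_through:
            # The next level is updated immediately, so the line stays clean
            # and there is nothing for an M state to record.
            self.next_level[addr] = value
        else:
            # Write-back: the only up-to-date copy is now in this cache,
            # which is exactly what the M state tracks.
            self.dirty.add(addr)
```

With `ToyCache(write_through=True)`, the `dirty` set stays empty no matter how many write hits occur, whereas the write-back variant has to remember which lines must be written back on eviction.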
Intel processors almost always support the write-back policy in all data and unified caches. Old Intel processors (90s and early 2000s) with two levels of caches use MESI for the L1D and L2. Intel processors with three levels of caches also use MESI for the L1D and L2. The fact that four states are available doesn't necessarily mean that all are being used. A cache line whose physical address falls within a region with the write-through (WT) memory type doesn't use the M state. (It's possible that the type changed from WB to WT, so the first WT access could hit in M.) So the effective protocol for a WT line is ESI or SI.
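For reference, the textbook MESI transitions for a write-back line can be sketched as a small function. This is the simplified classroom version, not Intel's actual implementation; the event names and the `others_have_copy` flag are assumptions made for the sketch, and the bus traffic that accompanies each transition (RFOs, write-backs) is only noted in comments.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def next_state(state, event, others_have_copy=False):
    """Textbook MESI next-state function for one line in one cache."""
    if event == "local_read":
        if state is State.I:
            # Miss: the line arrives Shared if another cache holds it,
            # otherwise Exclusive.
            return State.S if others_have_copy else State.E
        return state  # read hits never change the state
    if event == "local_write":
        # From I or S this first requires a read-for-ownership that
        # invalidates other copies; from E the upgrade is silent.
        return State.M
    if event == "snoop_read":
        if state in (State.M, State.E):
            # A Modified line supplies or writes back its data first.
            return State.S
        return state
    if event == "snoop_invalidate":
        return State.I
    raise ValueError(f"unknown event: {event}")
```

For instance, `next_state(State.E, "local_write")` gives `State.M` without any bus transaction, which is the main benefit of having E in addition to S.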
The L3 caches in Intel processors starting with Nehalem-EX use the MESIF protocol with an inclusive directory (used on a hit) for the entire NUMA node. Nehalem-EX also employs an in-memory 2-state directory to track which lines are owned by the off-package IOH. The in-memory directory protocol changed in Westmere-EX, and then changed again in the Xeon E5, and again in the Xeon E5/E7 v2, and again in the Xeon E5/E7 v3. These processors also support multiple coherence protocols in the L3-miss scenario with different tradeoffs.
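The extra F (Forward) state in MESIF designates a single clean copy as the one that answers read requests, so multiple sharers don't all respond. Here is a rough sketch of that idea for clean lines only. It assumes the commonly described rule that the most recent requester receives F; the `read` function and the `holders` dictionary are hypothetical, dirty (M) lines and the directory are left out, and none of this reflects the details of Intel's implementation.

```python
def read(holders, requester):
    """Model a read of one clean line.

    holders maps core id -> state ('E', 'S' or 'F') and is updated in place.
    Returns the core that supplies the data, or None if memory supplies it.
    """
    if requester in holders:
        return requester  # hit in the requester's own cache

    responder = None
    for core, state in holders.items():
        if state in ("E", "F"):
            # The single designated copy forwards the data and drops to S.
            responder = core
            holders[core] = "S"
            break

    # Assumption of this sketch: the newest sharer becomes the forwarder;
    # a requester with no other sharers gets the line Exclusive.
    holders[requester] = "F" if holders else "E"
    return responder
```

Starting from an empty `holders`, successive reads by cores 0, 1 and 2 leave core 2 in F and the others in S, so at most one cache ever holds the line in F.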
I'm not sure what else to say to answer your question. I guess you could say that MESI is more or less applicable to the L2 and L3.