3
votes

This is about cache coherence protocols across different levels of cache. My understanding (x86-64) is that L1 is owned exclusively by a core, L2 is shared between two cores, and L3 is shared by all the cores in a CPU socket. I have read about how the MESI protocol works, and about store buffers, invalidate queues, invalidate messages, etc. My question is whether MESI applies to L1 only, or to L2 and L3 as well. Or is there a different synchronization mechanism for L2 and L3?

1
Yes, "coherency" is different between parent/child caches than between siblings. E.g., in Intel CPUs with an inclusive L3, L3 tags act as the "directory" to track which core might have a copy of a line in Shared vs. Exclusive/Modified state. See: Which cache mapping technique is used in intel core i7 processor? (So even between per-core private caches, it's not simple classic MESI with a shared bus.) – Peter Cordes
Maybe related: what's L3$ role part in MESI protocal / What cache coherence solution do modern x86 CPUs use? / Cache coherence- MESI protocol. (Those are mostly my answers, it turns out; I think some other users have written some!) – Peter Cordes
MESI really just refers to the stable states that a cache block can be in, not the specifics of the protocol, topology or implementation. If you want a thorough explanation of the states and actions available at different cache controllers for a variety of configurations, take a look at A Primer on Memory Consistency and Cache Coherence, Second Edition, especially Chapter 8. – hayesti

1 Answer

2
votes

The number of cache levels, how each level is organized with respect to other processors or cores in the system, and the coherence protocol implemented in each cache are defined by the core microarchitecture, the uncore microarchitecture, and, in some cases, relevant boot-time configuration options. These design aspects vary by vendor, by processor generation, and between models within the same generation. There are a lot of different designs even if you just consider the processors released in the past few years.

The organization of the cache hierarchy is always clearly documented by Intel and AMD. However, the coherence protocols are not always clearly documented. You won't find a section in any official document that directly tells you all the protocols that caches use. Some hardware performance event names allude to what coherence protocol is used in the cache to which the events apply.

The instruction cache (L1I) always uses the SI protocol because a line is never modified between the point of fill and the point of invalidation. So an entry can only be in the S or I state. The M and E states are only relevant if the cache supports modifying an existing line.
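To make the two-state behavior concrete, here is a minimal sketch of an SI state machine for a read-only cache line. The event names (`fill`, `fetch_hit`, `invalidate`, `evict`) are made up for this illustration and are not taken from any vendor documentation.

```python
from enum import Enum


class State(Enum):
    S = "Shared"
    I = "Invalid"


# Hypothetical event names, chosen only for this sketch.
TRANSITIONS = {
    (State.I, "fill"): State.S,        # miss brings the line in
    (State.S, "fetch_hit"): State.S,   # reads never change the state
    (State.S, "invalidate"): State.I,  # coherence invalidation
    (State.S, "evict"): State.I,       # replacement
}


def step(state, event):
    """Return the next state; unlisted (state, event) pairs are no-ops."""
    return TRANSITIONS.get((state, event), state)
```

There is no event that writes into a valid line, which is why no M or E state is needed in this model.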

Some microarchitectures have caches that only support the write-through write hit policy. For example, the L1D in the AMD Bulldozer is a write-through cache. The M state doesn't make sense in a write-through cache. This means that the L1D either uses SI or ESI. SI is more likely because it requires only a single bit of state per entry.
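The "single bit of state" point is just counting: with a dense binary encoding, n states need ceil(log2(n)) bits. The helper below is an illustrative sketch under that assumption; real hardware may encode states differently (for example, with separate valid and dirty bits).

```python
def state_bits(states):
    """Minimum bits per entry to encode the given states densely."""
    n = len(set(states))
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2.
    return max(1, (n - 1).bit_length())


for protocol in ("SI", "ESI", "MESI", "MESIF"):
    print(protocol, state_bits(protocol))
```

Under this encoding SI fits in 1 bit, while ESI already needs 2 bits, the same as MESI.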

Intel processors almost always support the write-back policy in all data and unified caches. Old Intel processors (90s and early 2000s) with two levels of caches use MESI for the L1D and L2. Intel processors with three levels of caches also use MESI for the L1D and L2. The fact that four states are available doesn't necessarily mean that all of them are being used. A cache line whose physical address falls within a region with the write-through (WT) memory type doesn't use the M state. (It's possible that the type changed from WB to WT, so the first WT access could hit in M.) So the effective protocol for a WT line is ESI or SI.
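As a rough illustration of why a WT line never reaches M, here is a toy model of a local write hit. It is a deliberate simplification and not a description of any real controller: the function names are hypothetical, and it assumes a WT write hit leaves the line's state unchanged (real hardware may instead update or invalidate the line).

```python
def write_hit(state, memory_type):
    """Toy model: state of a valid line after a local write hit."""
    if state not in ("M", "E", "S"):
        raise ValueError("write hit requires a valid line")
    if memory_type == "WB":
        # Write-back: the line becomes dirty. Gaining ownership
        # from S is omitted in this sketch.
        return "M"
    if memory_type == "WT":
        # Assumption: the data also goes to memory, so the write
        # itself never moves the line into M.
        return state
    raise ValueError("unknown memory type")


def reachable_states(memory_type):
    """Valid states reachable by write hits from a fill into E or S."""
    seen = {"E", "S"}
    frontier = list(seen)
    while frontier:
        nxt = write_hit(frontier.pop(), memory_type)
        if nxt not in seen:
            seen.add(nxt)
            frontier.append(nxt)
    return seen
```

In this model `reachable_states("WB")` contains M, while `reachable_states("WT")` is only E and S, which together with I gives the ESI protocol mentioned above.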

The L3 caches in Intel processors starting with Nehalem-EX use the MESIF protocol with an inclusive directory (used on a hit) for the entire NUMA node. Nehalem-EX also employs an in-memory 2-state directory to track which lines are owned by the off-package IOH. The in-memory directory protocol changed in Westmere-EX, and then changed again in the Xeon E5, and again in the Xeon E5/E7 v2, and again in the Xeon E5/E7 v3. These processors also support multiple coherence protocols in the L3-miss scenario with different tradeoffs.
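The idea of an inclusive directory that is consulted on a hit can be sketched as follows. The entry layout here (a state plus a per-core bitmask) and all names are assumptions for illustration only; the actual tag format is not described in this answer.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    state: str = "I"      # state recorded at the shared cache level
    core_valid: int = 0   # bit i set => core i may hold a copy


def record_fill(entry, core):
    """Note that `core` may now hold a private copy of the line."""
    entry.core_valid |= 1 << core


def cores_to_snoop(entry, requester, num_cores):
    """Cores, other than the requester, that might need a snoop."""
    return [
        c for c in range(num_cores)
        if c != requester and (entry.core_valid >> c) & 1
    ]


entry = DirectoryEntry(state="S")
record_fill(entry, 0)
record_fill(entry, 2)
print(cores_to_snoop(entry, requester=0, num_cores=4))  # [2]
```

The point of such a structure is that a hit only needs to contact the cores whose bits are set, rather than broadcasting to every core as classic bus-based MESI would.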

I'm not sure what else to say to answer your question. I guess you could say that MESI is more or less applicable to the L2 and L3.