What is the L3$ role in the MESI protocol?

Question

I'd like to know more details about MESI on Intel Broadwell.

Suppose a CPU socket has 6 cores, core 0 to core 5. Each core has its own L1$ and L2$, and all of them share the L3$. There is a variable x in shared memory, located in a cache line called XCacheL. The details of my question follow:

T1: Core 0, core 4, and core 5 have x = 100, and XCacheL is in the Shared state since 3 cores hold a copy of XCacheL.

T2: Core 0 needs to modify x, so core 0 broadcasts an invalidate signal. Core 4 and core 5 receive the signal and invalidate their copies of XCacheL. Core 0 modifies x to 200, and the XCacheL state is now Modified.

T3: Core 4 needs to read x, but its copy of XCacheL was invalidated in T2, so it fires a read miss and the following happens:

● Processor makes bus request to memory
● Snooping cache puts copy value on the bus
● Memory access is abandoned
● Local processor caches value
● Local copy tagged S
● Source (M) value copied back to memory
● Source value M -> S

So after T3, XCacheL is Shared in core 0 and core 4 and Invalid in core 5, and both the L3$ and main memory have the newest valid XCacheL.
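The T1 to T3 sequence above can be traced with a toy model. This is only a minimal sketch of textbook bus-snooping MESI for a single cache line, matching the bullet list in the question; it is not how Broadwell actually implements coherence, and the names `ToyBus` and `State` are hypothetical.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class ToyBus:
    """Textbook snooping MESI for ONE cache line (an assumption, not Broadwell's protocol)."""

    def __init__(self, n_cores, memory_value):
        self.state = [State.I] * n_cores   # per-core state of the line
        self.value = [None] * n_cores      # per-core cached value
        self.memory = memory_value         # value held by memory

    def read(self, core):
        if self.state[core] is not State.I:
            return self.value[core]        # hit in the private cache
        # Read miss: a Modified owner supplies the data and writes it back.
        owner = next((c for c, s in enumerate(self.state) if s is State.M), None)
        if owner is not None:
            self.memory = self.value[owner]
        # Every other valid copy (M or E) is downgraded to Shared.
        sharers = [c for c, s in enumerate(self.state) if s is not State.I]
        for c in sharers:
            self.state[c] = State.S
        self.value[core] = self.memory
        self.state[core] = State.S if sharers else State.E
        return self.value[core]

    def write(self, core, new_value):
        # Invalidate all other copies. The toy line holds a single value,
        # so the writer simply overwrites it and becomes Modified.
        for c in range(len(self.state)):
            if c != core:
                self.state[c] = State.I
        self.value[core] = new_value
        self.state[core] = State.M


bus = ToyBus(n_cores=6, memory_value=100)
for c in (0, 4, 5):        # T1: three cores read x
    bus.read(c)
bus.write(0, 200)          # T2: core 0 modifies x
bus.read(4)                # T3: core 4 read miss
print([s.name for s in bus.state], bus.memory)
```

In this model, core 5 is still Invalid after T3, which is exactly the situation T4 asks about.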

T4: Core 5 needs to read x. Its copy of XCacheL was invalidated in T2, but at this moment the L3$ has a correct copy of XCacheL. Would core 5 need to fire a read miss like core 4 did?

My guess is that there is no need: since the L3$ has a valid XCacheL, core 5 can reach the L3$ and fetch the right XCacheL from the L3$ into its own L1$, so core 5 won't fire a read miss.

Where the L3 is inclusive, it is probably faster to read the shared lines from there. Where it isn't, they are forwarded from the other caches. That's why MESIF exists. The uncore probably just broadcasts the request on the QPI/UPI link and either the L3, the iMC, or another core's homing agent responds to it. If that's what you mean by a read miss (sorry, I lack the terminology), then a core will still fire it. Actually, you always need to fire something to read from outside the core, even from L1. — Margaret Bloom
Transitioning from Modified directly to Shared upon read is not done on all processors. Sometimes it's good to invalidate, because the read will soon become a write and you want the line exclusively. See software.intel.com/en-us/forums/… — Leeor

Hadi Brais · Accepted Answer · 2019-01-21T20:11:43

It looks like you're talking about the Early Snoop algorithm where the caching agents of the L3 slices are responsible for sending snoops. So I'll answer the question according to that algorithm.

All Broadwell processors use an inclusive L3. So yes, core 5 will miss in its private L1 and L2 caches and a read request is sent to the caching agent of the L3 slice to which the requested line is mapped. The caching agent determines that it has the line and it is in the S state. Since it is a read request, the caching agent will send the cache line to core 5. The state of the line is not changed and no snoops are sent.
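The read path described above can be sketched roughly as follows. This is a simplified illustration, not Intel's implementation: `ToyL3Slice`, `L3Line`, the `sharers` presence set, and the `snoops_sent` counter are all hypothetical names, and only the case from the question (line present in the inclusive L3 in the S state) is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class L3Line:
    state: str                                  # "M", "E", "S", or "I" as seen by the L3 slice
    data: int
    sharers: set = field(default_factory=set)   # hypothetical record of cores holding a copy


@dataclass
class ToyL3Slice:
    lines: dict = field(default_factory=dict)   # address -> L3Line
    snoops_sent: int = 0

    def read_request(self, core, addr):
        """Handle a read that already missed in the core's private L1 and L2."""
        line = self.lines.get(addr)
        if line is None or line.state == "I":
            raise LookupError("L3 miss: memory path not modeled in this sketch")
        if line.state != "S":
            raise NotImplementedError("owner snoop path not modeled in this sketch")
        # Line is Shared in the inclusive L3: serve the data directly.
        # No snoop is sent and the line state stays S.
        line.sharers.add(core)
        return line.data


# T4 from the question: cores 0 and 4 share the line, core 5 reads it.
l3 = ToyL3Slice(lines={"XCacheL": L3Line(state="S", data=200, sharers={0, 4})})
value = l3.read_request(core=5, addr="XCacheL")
print(value, l3.lines["XCacheL"].state, l3.snoops_sent)
```

The point of the sketch is that core 5 still issues a request beyond its private caches, but the request is satisfied by the L3 slice alone.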
