I need the details of the MESI protocol for multicore processors, but I can't find them anywhere. Even http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.pdf doesn't contain enough detail. For instance, assume a private L1 and a shared L2 cache. If a line is in the exclusive state in L1, is it exclusive in L2 too (or invalid, because a line can be exclusive in only one cache)? Clearly, if another core writes this line, the previously exclusive line in L1 becomes invalid, but how does the state of the L2 cache line change? If a modified line in L1 is read by another core, does that line become shared, and is it written back to main memory through the L2 cache, or does it stay modified in L2 too?
2 Answers
The reason you are having trouble finding these answers is that the traditional protocols were not defined for hierarchical cache architectures, so the MESI protocol by itself doesn't define what happens when you have both an L1 and an L2 cache. It depends on other attributes of the system: whether the L2 is exclusive or inclusive of the L1, and whether the system supports cache-to-cache transfers.
If the L2 is designed to be exclusive of the L1 (i.e., it is guaranteed that the L2 and L1 never hold the same cache line), then any line in the L1 will be in the invalid state (basically not present) in the L2.
If the L2 is inclusive of the L1 (i.e., every line in the L1 must also have an entry in the L2), the entry in the L2 will contain a descriptor stating which L1 cache holds the line in the E state.
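To make the two inclusion policies concrete, here is a minimal Python sketch of what a shared L2 might record for a line that one core's L1 holds in the E state. The names `L2Entry` and `l2_view_of_exclusive_line` are hypothetical and chosen only for illustration; real hardware tracks this in tag and directory bits, not in objects like these.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class L2Entry:
    """Hypothetical model of what the shared L2 knows about one cache line."""
    present: bool            # True if the L2 has an entry for the line
    l1_owner: Optional[int]  # core whose L1 holds the line in E, if tracked


def l2_view_of_exclusive_line(policy: str, owner_core: int) -> L2Entry:
    """Return the L2's view of a line that core `owner_core` holds in E in its L1."""
    if policy == "exclusive":
        # L1 and L2 never hold the same line, so the L2 sees it as invalid.
        return L2Entry(present=False, l1_owner=None)
    if policy == "inclusive":
        # The L2 must have an entry, and that entry names the owning L1.
        return L2Entry(present=True, l1_owner=owner_core)
    raise ValueError(f"unknown inclusion policy: {policy!r}")
```

For example, `l2_view_of_exclusive_line("inclusive", 2)` yields an entry that is present and points at core 2, while the `"exclusive"` policy yields an entry that is not present at all.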
Whether the value is written out to L2 or memory when another core reads a line in the E or M state depends on whether your system supports cache-to-cache transfers. In the old days, when each chip had a single core and core-to-core communication was as expensive as a read or write to memory, systems would write the data to memory and make the other processor read it from there (this allowed them to avoid supporting cache-to-cache transfers). In a multi-core chip, talking via memory is far more expensive than talking to other cores on-chip, so almost all multi-core chips today support cache-to-cache transfers. Thus, a read of a line in the E or M state is not serviced by writing to memory.
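The decision described above can be sketched as a small Python function. This is only an illustration under stated assumptions: `service_remote_read` and its result keys are hypothetical names, and the sketch assumes that a line in E is clean (memory already has the current data), so only a line in M needs a write to memory when transfers are not supported.

```python
def service_remote_read(owner_state: str, cache_to_cache: bool) -> dict:
    """Describe how a read from another core is serviced.

    owner_state: state of the line in the owning L1, "M" or "E".
    cache_to_cache: whether the system supports cache-to-cache transfers.
    """
    if owner_state not in ("M", "E"):
        raise ValueError("this sketch only covers lines held in M or E")
    if cache_to_cache:
        # The owning cache supplies the data directly; no memory write is needed.
        return {"data_source": "owner_l1", "memory_write": False}
    # No cache-to-cache transfers: the reader gets the data from memory.
    # A dirty (M) line has to be written to memory first; a clean (E) line
    # is assumed to already match memory.
    return {"data_source": "memory", "memory_write": owner_state == "M"}
```

With `cache_to_cache=True`, both `"M"` and `"E"` report the owning L1 as the data source; with `cache_to_cache=False`, an `"M"` line additionally incurs the memory write that older single-core-per-chip systems relied on.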
I hope this helps.