cache coherency issue between two cores on same processor

Question

I have two processes, p1 and p2, running on different cores, c1 and c2, of the same physical processor. Each core has its own L1 and L2 cache, and both share a common L3 cache. Both p1 and p2 use a pointer ptr that lives in shared memory. Process p1 initializes ptr and p2 is supposed to simply use it. p2 crashes because it initially sees ptr as NULL (after some time it sees the correct value, possibly because of cache coherence). I have the following questions:

1. How can this situation arise (p2 seeing a null value of ptr) even though some form of cache coherency protocol is in use?
2. In a shared bus/memory architecture, different processors (on different sockets) usually follow bus-snooping protocols for cache coherence. Which cache coherence protocols are used when two cores on the same physical processor have private L1/L2 caches and share a common L3 cache?
3. Is there a way to check which cache coherence protocol is in use (this is on an Ubuntu 16.04 system)?

And you are sure that p1 has initialized the ptr before p2 reads it? How is the memory shared? — Erki Aring
From the debug logs, I can see that p1 has initialized ptr. p2 prints 'NULL' the first time it accesses the pointer and prints the correct value from the second time onwards. The memory is shared using mmap. — mezda

Peter Cordes · Accepted Answer · 2020-04-17T17:52:26

x86 is cache-coherent even across multiple sockets (like all other real-world ISAs that you can run std::thread across). x86's memory-ordering model is program-order + a store-buffer with store forwarding.

Formal model: A better x86 memory model: x86-TSO. Informally: http://preshing.com/20120930/weak-vs-strong-memory-models/

Lack of coherence is definitely not your bug. Once a store commits to L1d cache in one core, no other core can load the old value. (Because their copies of the line have all been invalidated so the core doing the modification can have exclusive ownership: MESI.)

Almost certainly p2 is reading the shared memory before p1 writes it. Coherence doesn't create synchronization on its own. If p1 and p2 both attach to the shared memory asynchronously, nothing stops p2 from reading before p1 writes.

You need some kind of data-ready flag which p2 checks with std::memory_order_acquire before reading the pointer. Or just spin on loading the pointer until you see a non-NULL value.
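A rough sketch of the data-ready flag idea follows. It uses two threads in one process as stand-ins for p1 and p2 so that it is easy to compile and run; the names Shared, ready, and run_flag_demo are made up for illustration and are not from the question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for what would live in the shared mapping.
struct Shared {
    int* ptr = nullptr;               // plain pointer, published via the flag
    std::atomic<bool> ready{false};   // data-ready flag
};

// Runs a producer ("p1") and a consumer ("p2") and returns the value
// the consumer read through the pointer.
int run_flag_demo() {
    int payload = 0;
    Shared shm;
    int seen = -1;

    std::thread consumer([&] {
        // Wait until the producer says the pointer is valid.
        while (!shm.ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        // The acquire load above synchronizes with the release store,
        // so the writes to payload and shm.ptr are visible here.
        assert(shm.ptr != nullptr);
        seen = *shm.ptr;
    });

    std::thread producer([&] {
        payload = 42;
        shm.ptr = &payload;
        // Release store: everything written above happens-before any
        // acquire load that observes ready == true.
        shm.ready.store(true, std::memory_order_release);
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling run_flag_demo() from main should give back the value the producer wrote, regardless of which thread starts first. In the real two-process case the flag would have to sit in the mmap'd region next to the pointer, and as far as I know that only works reliably when the std::atomic type is lock-free on the target.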

(Use std::memory_order_acquire on an atomic load of the pointer to avoid compile-time reordering, or runtime reordering on non-x86, with stuff you access later using that pointer. Strictly, only std::memory_order_consume would be needed for using a pointer, but compilers strengthen that to acquire. That's fine on x86; acquire is free there anyway.)
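Spinning on the pointer itself could look roughly like this. Again this is only a sketch with two threads standing in for the two processes, and run_pointer_demo is a hypothetical name. It assumes the pointer is declared as std::atomic<int*>, so the pointer doubles as its own "ready" signal.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Consumer spins on an atomic pointer until it sees non-null, then
// dereferences it. Returns the value read through the pointer.
int run_pointer_demo() {
    int payload = 0;
    std::atomic<int*> ptr{nullptr};   // starts out null, like in the question
    int seen = -1;

    std::thread consumer([&] {
        int* p;
        // Acquire load: once a non-null value is seen, the pointed-to
        // data written before the release store is visible as well.
        while ((p = ptr.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        assert(p == &payload);
        seen = *p;
    });

    std::thread producer([&] {
        payload = 42;                                      // initialize the data first
        ptr.store(&payload, std::memory_order_release);    // then publish the pointer
    });

    producer.join();
    consumer.join();
    return seen;
}
```

The consumer never dereferences a null pointer, because it only leaves the loop after loading a non-null value. With separate processes the same pattern should apply to an atomic pointer placed in the shared mapping, provided the pointer value is meaningful in both address spaces (for example, the region is mapped at the same address in both).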
