0 votes

Let's say that I have two threads that share the global variable x. Thread A's job is to set the value of x, and Thread B's job is to read x. Now each thread (or each core, I suppose) will have a cached copy of x.

Let's say that Thread A has set the value of x to 12345. Now Thread A's cache could remain unaltered because the CPU can schedule the set operation to be executed later; the cache coherence protocol will not act, so when Thread B reads the value of x, it will read an old value.

My question is: is there anything that prevents the CPU from waiting a long time (for example: 10 minutes) before executing its scheduled set operations?

Note: I know that I can use a memory barrier to force the CPU to execute its scheduled set operations immediately, but I am curious to know what can happen if I don't use a memory barrier.

Make use of volatile for that variable (x), so that neither Thread A nor Thread B caches its value. - ntshetty
No. But a lot of events, including dispatching exceptions, flush the store buffer anyway. - Margaret Bloom
Threading race bugs are triggered by delays that are measured in nanosecond units. That it is so short is what makes such bugs so drastically hard to debug. - Hans Passant
Tangentially related: The paper cs.tau.ac.il/~mad/publications/asplos2014-ffwsq.pdf suggests exploiting the bound on the store queue length to avoid using a fence. But note that the authors are reasoning about buffer lengths, not time. - Arch D. Robison

1 Answer

2 votes

No CPU documentation I've read in the past 15 years describes the time it takes to synchronize memory in terms more specific than "X is visible before Y". That's because memory protocols are so complex that it's practically impossible to put an upper bound on how much can happen before your write becomes visible (DMA, error correction, TLB lookups, SMM, etc.).

You could construct a theoretical scenario where your write never becomes visible; in fact, if you want to do that, just find the errata documents for CPUs and they'll have plenty of examples of how that can happen. But in practice? No, you'll never wait 10 minutes. The kernel you're running on will be receiving interrupts that perform memory reads and writes, which flush store buffers and evict your cache lines.

That being said, you should still use memory synchronization, but for a different reason: enforcing ordering. If the value x is the only piece of information you want to send to the other thread, it will eventually become readable and you can get away with not synchronizing. But this is almost never the case. Usually x is there to say that the value y contains something interesting, and you need proper synchronization to guarantee that y has the right contents by the time x becomes visible to the other thread.