If a CPU core uses a write buffer, then the load can bypass the most recent store to the referenced location from the write buffer, without waiting until it will appear in the cache. But, as it's written in A Primer on Memory Consistency and Coherence, if the CPU honors TSO memory model, then
... multithreading introduces a subtle write buffer issue for TSO. TSO write buffers are logically private to each thread context (virtual core). Thus, on a multithreaded core, one thread context should never bypass from the write buffer of another thread context. This logical separation can be implemented with per-thread-context write buffers or, more commonly, by using a shared write buffer with entries tagged by thread-context identifiers that permit bypassing only when tags match.
I can't grasp the necessity of this limitation. Could you please give me an example when allowing some thread to bypass a write buffer entry written by another thread on the same core leads to the violation of the TSO memory model?