I am writing a series of tests for a GPU's DRAM (global) memory, specifically targeting the AMD GCN architecture of the Tahiti and Hawaii model lines. These architectures have a write-back L2 cache.
What I want is to ensure that stores to global memory have actually been written out to DRAM, not just to the L2 cache, before another thread reads them.
The barrier and mem_fence documentation in the OpenCL spec states:
CLK_GLOBAL_MEM_FENCE - The barrier function will queue a memory fence to ensure correct ordering of memory operations to global memory. This can be useful when work-items, for example, write to buffer or image objects and then want to read the updated data.
However, this only enforces correct ordering. My question is: does it also trigger a write-back of the L2 cache data to global memory?
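To make the question concrete, here is a minimal sketch of the kind of kernel I have in mind. The kernel name, the buffer names, and the test pattern are all hypothetical, and it assumes a 1D NDRange with a uniform work-group size. Note that `barrier` only synchronizes work-items within the same work-group, so the sketch reads a neighbour inside the group. As far as I can tell, nothing in it reveals whether the load is served from L2 or from DRAM, which is exactly what I am unsure about.

```c
// OpenCL C kernel sketch. All names here (store_then_read, buf, mismatches)
// are hypothetical and only illustrate the store -> barrier -> load pattern.
__kernel void store_then_read(__global uint *buf,
                              volatile __global uint *mismatches)
{
    const size_t gid  = get_global_id(0);
    const size_t lid  = get_local_id(0);
    const size_t lsz  = get_local_size(0);
    const size_t base = gid - lid;   // global ID of the first work-item in this group

    // Each work-item stores a pattern derived from its own global ID.
    buf[gid] = (uint)gid ^ 0xA5A5A5A5u;

    // Orders the store above before the loads below, but only for
    // work-items in the same work-group.
    barrier(CLK_GLOBAL_MEM_FENCE);

    // Read back the value stored by the neighbouring work-item in the group.
    const size_t peer = base + (lid + 1) % lsz;
    if (buf[peer] != ((uint)peer ^ 0xA5A5A5A5u))
        atomic_inc(mismatches);      // count values that do not match the pattern
}
```

Even if `mismatches` stays at zero, that would only show that the ordering holds within a work-group. It would not show that the data ever left the L2 cache.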