As far as I know, both the compiler (software) and the CPU (hardware) may reorder instructions for performance reasons, and memory barriers can prevent that reordering. Barriers exist at the compiler level and at the CPU level.
MSDN says an Interlockedxxxx function "generates a full memory barrier (or fence) to ensure that memory operations are completed in order". Does "a full memory barrier" here mean a hardware barrier or a software (compiler) barrier?
What does boost::atomic actually do? Does it issue a hardware barrier? Does it flush the CPU cache or the store buffer?
Does the memory_order_acquire semantic produce a software barrier or a hardware barrier?
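
To make the question concrete, here is a minimal sketch of the kind of code I have in mind. It uses std::atomic as a stand-in for boost::atomic (my assumption is that the two behave the same for this purpose), and the names `payload`, `ready`, `producer`, `consumer`, `fences` and `run_handoff` are made up for illustration. It shows a release/acquire handoff, plus the two standard fence functions that seem to correspond to "compiler only" and "compiler plus CPU":

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // flag used to publish payload

void producer() {
    payload = 42;                                  // (1) ordinary store
    ready.store(true, std::memory_order_release);  // (2) (1) may not be moved after this store
}

int consumer() {
    // (3) acquire load: reads after it may not be moved before it
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;  // (4) sees 42 once (3) has observed the store from (2)
}

void fences() {
    // Restricts compiler reordering only; intended for ordering against a
    // signal handler running on the same thread.
    std::atomic_signal_fence(std::memory_order_seq_cst);
    // Orders memory operations between threads; whether this needs a CPU
    // fence instruction depends on the target architecture.
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Hypothetical helper: runs one producer/consumer handoff and returns
// the value the consumer observed.
int run_handoff() {
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    fences();
    assert(seen == 42);
    return seen;
}
```

Calling `run_handoff()` from `main` (built with thread support, for example `-pthread` on GCC or Clang) exercises the handoff. What I want to understand is what (2) and (3) turn into: only a restriction on the compiler, or also an instruction executed by the CPU?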