Note: I am not an expert on this topic, some of my statements are "what I heard on the internet", but I think I csan still clear up some misconceptions.
[edit] In general, I would rely on platform-specifics such as x86 atomic reads and lack of OOOX only in isolated, local optimizations that are guarded by an #ifdef checking the target platform, ideally accompanied by a portable solution in the #else path.
Things to look out for
- atomicity of read / write operations
- reordering due to compiler optimizations (this includes a different order seen by another thread due to simple register caching)
- out-of-order execution in the CPU
Possible misconceptions
1. As far as I understand it is, because function is protected and no reordering could be done by compiler inside.
[edit] To clarify: the _ReadWriteBarrier provides protection against instruction reordering, however, you have to look beyond the scope of the function. _ReadWriteBarrier has been fixed in VS 2010 to do that, earlier versions may be broken (depending on the optimizations they actually do).
Optimization isn't limited to functions. There are multiple mechanisms (automatic inlining, link time code generation) that span functions and even compilation units (and can provide much more significant optimizations than small-scoped register caching).
2. Visual C++ [...] makes volatile reads and writes atomic loads and stores,
Where did you find that? MSDN says that beyond the standard, will put memory barriers around reads and writes, no guarantee for atomic reads.
[edit] Note that C#, Java, Delphi etc. have different memory mdoels and may make different guarantees.
3. plain loads and stores are supposed to be atomic on x86 anyway, right?
No, they are not. Unaligned reads are not atomic. They happen to be atomic if they are well-aligned - a fact I'd not rely on unless it's isolated and easily exchanged. Otherwise your "simplificaiton fo x86" becomes a lockdown to that target.
[edit] Unaligned reads happen:
char * c = new char[sizeof(int)+1];
load(*(int *)c); // allowed by standard to be unaligned
load(*(int *)(c+1)); // unaligned with most allocators
#pragma pack(push,1)
struct
{
char c;
int i;
} foo;
load(foo.i); // caller said so
#pragma pack(pop)
This is of course all academic if you remember the parameter must be aligned, and you control all code. I wouldn't write such code anymore, because I've been bitten to often by laziness of the past.
4. Plain load has acquire semantics on x86, plain store has release semantics
No. x86 processors do not use out-of-order execution (or rather, no visible OOOX - I think), but this doesn't stop the optimizer from reordering instructions.
5. _ReadBarrier / _WriteBarrier / _ReadWriteBarrier do all the magic
They don't - they just prevent reordering by the optimizer. MSDN finally made it a big bad warning for VS2010, but the information apparently applies to previous versions as well.
Now, to your question.
I assume the purpose of the snippet is to pass any variable N, and load it (atomically?) The straightforward choice would be an interlocked read or (on Visual C++ 2005 and later) a volatile read.
Otherwise you'd need a barrier for both compiler and CPU before the read, in VC++ parlor this would be:
int load(int& var)
{
// force Optimizer to complete all memory writes:
// (Note that this had issues before VC++ 2010)
_WriteBarrier();
// force CPU to settle all pending read/writes, and not to start new ones:
MemoryBarrier();
// now, read.
int value = var;
return value;
}
Noe that _WriteBarrier has a second warning in MSDN:
*In past versions of the Visual C++ compiler, the _ReadWriteBarrier and _WriteBarrier functions were enforced only locally and did not affect functions up the call tree. These functions are now enforced all the way up the call tree.*
I hope that is correct. stackoverflowers, please correct me if I'm wrong.
std::memory_order_relaxedatomics. Also note that you should usually just use mfence (RWbarrier) after a store, not before & after a load. - Peter Cordes