I'm trying to implement a simple busy loop function.
It should poll a std::atomic variable at most spinCount times, and return true if the status changed (to anything other than NOT_AVAILABLE) within those tries, or false otherwise:
// noinline is just to be able to inspect the resulting ASM a bit easier - in final code, this function SHOULD be inlined!
__declspec(noinline) static bool trySpinWait(std::atomic<Status>* statusPtr, const int spinCount)
{
    int iSpinCount = 0;
    while (++iSpinCount < spinCount && statusPtr->load() == Status::NOT_AVAILABLE);
    return iSpinCount == spinCount;
}
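To make the intended behaviour concrete, here is a minimal sketch of the contract described above, written as a plain counted loop so that the return value follows directly from the description. The `Status` enum is not shown in my snippet, so the definition below (including the `AVAILABLE` enumerator) is a hypothetical stand-in, and `trySpinWaitSketch` is just an illustrative name.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the real Status type, which is not shown above.
enum class Status : int { NOT_AVAILABLE = 0, AVAILABLE = 1 };

// Polls *statusPtr at most spinCount times.
// Returns true as soon as a poll sees anything other than NOT_AVAILABLE,
// and false if every poll saw NOT_AVAILABLE (or spinCount <= 0).
static bool trySpinWaitSketch(const std::atomic<Status>* statusPtr, int spinCount)
{
    assert(statusPtr != nullptr);
    for (int i = 0; i < spinCount; ++i)
    {
        // load() with no argument uses std::memory_order_seq_cst.
        if (statusPtr->load() != Status::NOT_AVAILABLE)
            return true;
    }
    return false;
}
```

With this shape, a caller would write something like `if (trySpinWaitSketch(&status, 1000)) { /* status is ready */ }`; the early return makes it easy to see which exit path corresponds to which result.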
However, it seems that MSVC just optimizes the loop away in Release mode for Win64. I'm pretty bad with assembly, but it doesn't look to me like it's ever even trying to read the value of statusPtr at all:
int iSpinCount = 0;
000000013F7E2040 xor eax,eax
while (++iSpinCount < spinCount && statusPtr->load() == Status::NOT_AVAILABLE);
000000013F7E2042 inc eax
000000013F7E2044 cmp eax,edx
000000013F7E2046 jge trySpinWait+12h (013F7E2052h)
000000013F7E2048 mov r8d,dword ptr [rcx]
000000013F7E204B test r8d,r8d
000000013F7E204E je trySpinWait+2h (013F7E2042h)
return iSpinCount == spinCount;
000000013F7E2050 cmp eax,edx
000000013F7E2052 sete al
My impression was that std::atomic with std::memory_order_seq_cst (the default for load()) creates a compiler barrier that should prevent something like this, but it seems that's not the case (or rather, my understanding was probably wrong).
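For reference, this is how the same poll looks with the memory ordering spelled out explicitly instead of relying on the default. It is only a sketch under assumptions: the `Status` enum and the `publishStatus` producer are hypothetical (the real writer side isn't shown here), and whether acquire/release ordering is sufficient instead of sequential consistency depends on what the surrounding code needs.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the real Status type, which is not shown above.
enum class Status : int { NOT_AVAILABLE = 0, AVAILABLE = 1 };

// Hypothetical producer side: publishes the new status with release ordering,
// so writes made before this store are visible to a thread that acquires it.
static void publishStatus(std::atomic<Status>* statusPtr, Status newStatus)
{
    assert(statusPtr != nullptr);
    statusPtr->store(newStatus, std::memory_order_release);
}

// Consumer side: one atomic acquire load per iteration, at most spinCount iterations.
static bool trySpinWaitAcquire(const std::atomic<Status>* statusPtr, int spinCount)
{
    assert(statusPtr != nullptr);
    for (int i = 0; i < spinCount; ++i)
    {
        if (statusPtr->load(std::memory_order_acquire) != Status::NOT_AVAILABLE)
            return true;
    }
    return false;
}
```

In either variant the load is an atomic operation that the compiler has to perform on every iteration, so the ordering argument changes the synchronization guarantees rather than whether the variable gets read.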
What am I doing wrong here, or rather, how can I best implement that loop without having it optimized away, with the least impact on overall performance?
I know I could use #pragma optimize( "", off ), but (unlike in the example above) in my final code I'd very much like to have this call inlined into a larger function for performance reasons. It seems that this #pragma generally prevents inlining, though.
Appreciate any thoughts!
Thanks