Multithreading program stuck in optimized mode but runs normally in -O0

Question

I wrote a simple multithreading programs as follows:

static bool finished = false;

int func()
{
    size_t i = 0;
    while (!finished)
        ++i;
    return i;
}

int main()
{
    auto result=std::async(std::launch::async, func);
    std::this_thread::sleep_for(std::chrono::seconds(1));
    finished=true;
    std::cout<<"result ="<<result.get();
    std::cout<<"\nmain thread id="<<std::this_thread::get_id()<<std::endl;
}

It behaves normally in debug mode in Visual studio or -O0 in gcc and print out the result after 1 seconds. But it stuck and does not print anything in Release mode or -O1 -O2 -O3.

Comments are not for extended discussion; this conversation has been moved to chat. — Samuel Liew

Scheff's Cat Scheff's Cat · Accepted Answer · 2019-10-23T05:39:02

Two threads, accessing a non-atomic, non-guarded variable are U.B. This concerns finished. You could make finished of type std::atomic<bool> to fix this.

My fix:

#include <iostream>
#include <future>
#include <atomic>

static std::atomic<bool> finished = false;

int func()
{
    size_t i = 0;
    while (!finished)
        ++i;
    return i;
}

int main()
{
    auto result=std::async(std::launch::async, func);
    std::this_thread::sleep_for(std::chrono::seconds(1));
    finished=true;
    std::cout<<"result ="<<result.get();
    std::cout<<"\nmain thread id="<<std::this_thread::get_id()<<std::endl;
}

Output:

result =1023045342
main thread id=140147660588864

Live Demo on coliru

Somebody may think 'It's a bool – probably one bit. How can this be non-atomic?' (I did when I started with multi-threading myself.)

But note that lack-of-tearing is not the only thing that std::atomic gives you. It also makes concurrent read+write access from multiple threads well-defined, stopping the compiler from assuming that re-reading the variable will always see the same value.

Making a bool unguarded, non-atomic can cause additional issues:

The compiler might decide to optimize variable into a register or even CSE multiple accesses into one and hoist a load out of a loop.
The variable might be cached for a CPU core. (In real life, CPUs have coherent caches. This is not a real problem, but the C++ standard is loose enough to cover hypothetical C++ implementations on non-coherent shared memory where atomic<bool> with memory_order_relaxed store/load would work, but where volatile wouldn't. Using volatile for this would be UB, even though it works in practice on real C++ implementations.)

To prevent this to happen, the compiler must be told explicitly not to do.

I'm a little bit surprised about the evolving discussion concerning the potential relation of volatile to this issue. Thus, I'd like to spent my two cents:

Multithreading program stuck in optimized mode but runs normally in -O0

3 Answers