Scheff's answer describes how to fix your code. I thought I would add a little information on what is actually happening in this case.
I compiled your code at godbolt using optimisation level 1 (-O1
). Your function compiles like so:
func():
cmp BYTE PTR finished[rip], 0
jne .L4
.L5:
jmp .L5
.L4:
mov eax, 0
ret
So, what is happening here?
First, we have a comparison: cmp BYTE PTR finished[rip], 0
- this checks to see if finished
is false or not.
If it is not false (aka true) we should exit the loop on the first run. This accomplished by jne .L4
which jumps when not equal to label .L4
where the value of i
(0
) is stored in a register for later use and the function returns.
If it is false however, we move to
.L5:
jmp .L5
This is an unconditional jump, to label .L5
which just so happens to be the jump command itself.
In other words, the thread is put into an infinite busy loop.
So why has this happened?
As far as the optimiser is concerned, threads are outside of its purview. It assumes other threads aren't reading or writing variables simultaneously (because that would be data-race UB). You need to tell it that it cannot optimise accesses away. This is where Scheff's answer comes in. I won't bother to repeat him.
Because the optimiser is not told that the finished
variable may potentially change during execution of the function, it sees that finished
is not modified by the function itself and assumes that it is constant.
The optimised code provides the two code paths that will result from entering the function with a constant bool value; either it runs the loop infinitely, or the loop is never run.
at -O0
the compiler (as expected) does not optimise the loop body and comparison away:
func():
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], 0
.L148:
movzx eax, BYTE PTR finished[rip]
test al, al
jne .L147
add QWORD PTR [rbp-8], 1
jmp .L148
.L147:
mov rax, QWORD PTR [rbp-8]
pop rbp
ret
therefore the function, when unoptimised does work, the lack of atomicity here is typically not a problem, because the code and data-type is simple. Probably the worst we could run into here is a value of i
that is off by one to what it should be.
A more complex system with data-structures is far more likely to result in corrupted data, or improper execution.