Towards the end of his talk about memory barriers (fences), he gave the following example (Note: global is not of atomic type):
// thread 1                      // thread 2
widget *temp = new widget();
global = temp;
                                 global->do_something();
                                 global->do_something_else();
Later he said we should have full fences, as follows:
// thread 1                      // thread 2
widget *temp = new widget();
XX mb(); XXXXXXXXXXXXXXXXXXXXX
global = temp;
                                 temp2 = global;
                                 XX mb(); XXXXXXXXXXXXXXXX
                                 temp2->do_something();
                                 temp2 = global;
                                 XX mb(); XXXXXXXXXXXXXXXX
                                 temp2->do_something_else();
I wonder why thread 1 needs a barrier. global depends on temp, and the compiler wouldn't move global = temp; above the construction of temp anyway. One reason I can think of is that the assignment global = temp; could somehow be performed in the middle of the construction of new widget(), so that thread 2 would see a partially constructed widget through global. Is it possible for the assignment to global to be scheduled in the middle of constructing the new widget? Or is the memory barrier needed for some other reason?
Also, in thread 2, don't you need another barrier after temp2->do_something();? In the transformed form as it stands, the statements could still be reordered during execution like this:
temp2 = global;
XX mb(); XXXXXXXXXXXXXXXX
temp2 = global;
temp2->do_something();
XX mb(); XXXXXXXXXXXXXXXX
temp2->do_something_else();
This is not what was intended, since the member functions are now invoked on a single value of temp2 instead of a new value being read from global before do_something_else() executes.
If no such reordering is possible in thread 2, then why do we need the barriers in thread 2 in the first place, given that temp2->do_something(); can't be reordered before temp2 = global;?
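To make the question concrete, here is a minimal sketch of how I understand the fenced version would look in standard C++. It rests on several assumptions that are not from the talk: global is declared std::atomic<widget*> and accessed with relaxed operations (a plain shared pointer would be a data race in standard C++), mb() is approximated by std::atomic_thread_fence(std::memory_order_seq_cst), the widget members and run_example are hypothetical, and thread 2 spins until the pointer is published.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the widget type from the talk.
struct widget {
    int value = 0;
    widget() : value(42) {}
    int do_something() const { return value; }
    int do_something_else() const { return value + 1; }
};

// Assumption: atomic here, unlike the talk's example, to avoid a data race.
std::atomic<widget*> global{nullptr};

// Thread 1: construct, full fence, then publish the pointer.
void producer() {
    widget* temp = new widget();
    std::atomic_thread_fence(std::memory_order_seq_cst);  // stands in for mb()
    global.store(temp, std::memory_order_relaxed);
}

// Thread 2: read the pointer, full fence, then use the object.
int consumer() {
    widget* temp2 = nullptr;
    // Assumption: wait until thread 1 has published something.
    while ((temp2 = global.load(std::memory_order_relaxed)) == nullptr) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_seq_cst);  // stands in for mb()
    int a = temp2->do_something();

    temp2 = global.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // stands in for mb()
    int b = temp2->do_something_else();
    return a + b;
}

// Hypothetical driver: runs both threads once and returns thread 2's result.
int run_example() {
    int result = 0;
    std::thread t2([&] { result = consumer(); });
    std::thread t1(producer);
    t1.join();
    t2.join();
    delete global.exchange(nullptr);  // clean up the published widget
    return result;
}
```

In this sketch, the fence in producer followed by the relaxed store, paired with the relaxed load followed by the fence in consumer, is what orders the widget's construction before its use in thread 2. Whether that pairing is the reason for the barrier in the talk's example is exactly what I am asking.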